Robust Multi-Agent Bandits Over Undirected Graphs
Abstract.
We consider a multi-agent multi-armed bandit setting in which $n$ honest agents collaborate over a network to minimize regret, but $m$ malicious agents can disrupt learning arbitrarily. Assuming the network is the complete graph, existing algorithms incur $O\left((m + K/n)\frac{\log T}{\Delta}\right)$ regret in this setting, where $K$ is the number of arms and $\Delta$ is the arm gap. For $m \ll K/n$, this improves over the single-agent baseline regret of $O\left(K\frac{\log T}{\Delta}\right)$.
In this work, we show the situation is murkier beyond the case of a complete graph. In particular, we prove that if the state-of-the-art algorithm is used on the undirected line graph, honest agents can suffer (nearly) linear regret until time is doubly exponential in $K$ and $n$. In light of this negative result, we propose a new algorithm for which the $i$-th agent has regret $O\left((d_{\text{mal}}(i) + K/n)\frac{\log T}{\Delta}\right)$ on any connected and undirected graph, where $d_{\text{mal}}(i)$ is the number of $i$'s neighbors who are malicious. Thus, we generalize existing regret bounds beyond the complete graph (where $d_{\text{mal}}(i) = m$), and show the effect of malicious agents is entirely local (in the sense that only the malicious agents directly connected to $i$ affect its long-term regret).
1. Introduction
Motivated by applications including distributed computing, social recommendation systems, and federated learning, a number of recent papers have studied multi-agent variants of the classical multi-armed bandit problem. Typically, these variants involve a large number of agents playing a bandit while communicating over a network. The goal is to devise communication protocols that allow the agents to efficiently amalgamate information, thereby learning the bandit’s parameters more quickly than they could by running single-agent algorithms in isolation.
Among the many multi-agent variants considered in the literature (see Section 1.5), we focus on a particular setting studied in the recent line of work (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021; Vial et al., 2021). In these papers, $n$ agents play separate instances of the same $K$-armed bandit and are restricted to infrequent, pairwise, bit-limited communications. We recount two motivating applications from this prior work.
Example 1.
For an e-commerce site (e.g., Amazon), the agents model $n$ servers choosing one of $K$ products to show visitors to the site. The product selection problem can be viewed as a bandit – products are arms, while purchases yield rewards – and communication among the agents/servers is restricted by bandwidth.
Example 2.
For a social recommendation site (e.g., Yelp), the agents represent $n$ users choosing among $K$ items, such as restaurants. This is analogously modeled as a bandit, and communication is limited because agents/users are exposed to a small portion of all reviews.
To contextualize our contributions, we next discuss this line of work in more detail.
1.1. Fully cooperative multi-agent bandits
The goal of (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021) is to devise fully cooperative algorithms for which the cumulative regret of each agent is small (see (5) for the formal definition of regret). All of these papers follow a similar approach, which roughly proceeds as follows (see Section 3 for details):
-
•
The arms are partitioned into $n$ subsets of size roughly $K/n$, and each agent is assigned a distinct subset, called its sticky set, which it is responsible for exploring.
-
•
Occasionally, each agent asks a random neighbor for an arm recommendation; the neighbor responds with the arm it believes is best, which the requesting agent then begins playing.
For these algorithms, the regret analysis essentially contains two steps:
-
•
First, the authors show that the agent with the true best arm in its sticky set eventually identifies it as such. Thereafter, a gossip process unfolds. Namely, this agent recommends the best arm to its neighbors, who recommend it to their neighbors, etc., spreading the best arm to all agents. The spreading time (and thus the regret before this time) is shown to be polynomial in $K$, $n$, and $1/\Delta$, where $\Delta$ is the gap in mean reward between the two best arms.
-
•
Once the best arm spreads, agents play only it and their sticky sets, so long-term, they effectively face $O(K/n)$-armed bandits instead of the full $K$-armed bandit. By classical bandit results (see, e.g., (Auer et al., 2002)), this implies $O\left(\frac{K}{n}\cdot\frac{\log T}{\Delta}\right)$ regret over horizon $T$.
Hence, summing the two terms, (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021) provide regret bounds of the form (1) below. (More precisely, (Chawla et al., 2020; Newton et al., 2021) prove (1), while the corresponding term is larger in (Sankararaman et al., 2019).)
(1)
as compared to $O\left(K\frac{\log T}{\Delta}\right)$ regret for running a single-agent algorithm in isolation.
1.2. Robust multi-agent bandits on the complete graph
Despite these improved bounds, (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021) require all agents to execute the prescribed algorithm, and in particular, to recommend best arm estimates to their neighbors. As pointed out in (Vial et al., 2021), this may be unrealistic: in Example 2, review spam can be modeled as bad arm recommendations, while in Example 1, servers may fail entirely. Hence, (Vial et al., 2021) considers a more realistic setting where honest agents recommend best arm estimates but malicious agents recommend arbitrarily. For this setting, the authors propose a robust version of the algorithm from (Chawla et al., 2020) where honest agents block suspected malicious agents. More specifically, (Vial et al., 2021) considers the following blocking rule:
-
•
If an agent recommends an arm to an honest agent, but that arm subsequently performs poorly for the recipient – in the sense that the upper confidence bound (UCB) algorithm does not select it sufficiently often – then the recipient temporarily suspends communication with the recommender.
As shown in (Vial et al., 2021), this blocking scheme prevents each malicious agent from recommending more than a bounded number of bad arms long-term, which (effectively) results in an $O(m + K/n)$-armed bandit ($O(m)$ malicious recommendations, plus the $O(K/n)$-sized sticky set). Under the assumption that honest and malicious agents are connected by the complete graph, this allows (Vial et al., 2021) to prove
(2)
In (Vial et al., 2021), it is also shown that blocking is necessary: for any $n$, if even one malicious agent is present, the algorithm from (Chawla et al., 2020) (which does not use blocking) incurs regret no better than the single-agent baseline. Thus, one malicious agent negates the improvement over the single-agent baseline.
1.3. Objective and challenges
Our main goal is to generalize the results of (Vial et al., 2021) from the complete graph to the case where the honest agent subgraph is only connected and undirected. This is nontrivial because (Vial et al., 2021) relies heavily on the complete graph assumption. In particular, the analysis in (Vial et al., 2021) requires that the agent with the best arm in its sticky set itself recommends the best arm to each of the other honest agents. In other words, each honest agent relies on this particular agent to inform them of the best arm, which means it must be a neighbor of every other honest agent. Thus, to extend (2) beyond complete graphs, we need to show a gossip process unfolds (like in the fully cooperative case): this agent recommends the best arm to its neighbors, who recommend it to their neighbors, etc., spreading it through the network.
The challenge is that, while blocking is necessary to avoid the large regret described above, it also causes honest agents to accidentally block each other. Indeed, due to the aforementioned blocking rule and the noisy rewards, they will block each other until they collect enough samples to reliably identify good arms. From a network perspective, accidental blocking means that edges in the subgraph of honest agents temporarily fail. Consequently, it is not clear if the best arm spreads to all honest agents, or if (for example) this subgraph eventually becomes disconnected, preventing the spread and causing the agents who do not receive the best arm to suffer large regret.
Analytically, accidental blocking means we must deal with a gossip process over a dynamic graph. This process is extremely complicated, because the graph dynamics are driven by the bandit algorithms, which in turn affect the future evolution of the graph. Put differently, blocking causes the randomness of the communication protocol and that of the bandit algorithms to become interdependent. We note this does not occur for the original non-blocking algorithm, where the two sources of randomness can be cleverly decoupled and separately analyzed – see (Chawla et al., 2020, Proposition 4). Thus, in contrast to existing work, we need to analyze the interdependent processes directly.
1.4. Our contributions
Failure of the existing blocking rule: In Section 4, we show that the algorithm from (Vial et al., 2021) fails to achieve a regret bound of the form (2) for connected and undirected graphs in general. Toward this end, we define a natural "bad instance" in which the honest agent subgraph is an undirected line (thus connected) and all honest agents share a single malicious neighbor. For this instance, we propose a malicious strategy that causes honest agents to repeatedly block one another, which results in the best arm spreading extremely slowly. More specifically, we show that if honest agents run the algorithm from (Vial et al., 2021), then the best arm does not reach the honest agent at the end of the line until time is doubly exponential in the number of agents. Note (Vial et al., 2021) shows the best arm spreads polynomially fast for the complete graph, so we demonstrate a doubly exponential slowdown for complete versus line graphs. This is rather surprising, because for classical rumor processes that do not involve bandits or blocking (see, e.g., (Pittel, 1987)), the slowdown is only exponential (i.e., logarithmically many rounds suffice to spread a rumor on the complete graph versus polynomially many on the line graph). As a consequence of the slow spread, we show the algorithm from (Vial et al., 2021) suffers regret
(3)
i.e., it incurs (nearly) linear regret until the aforementioned doubly exponential time and thereafter incurs logarithmic regret, but with a huge additive term (see Theorem 1).
Refined blocking rule: In light of this negative result, we propose a refined blocking rule in Section 5. Roughly, our rule is as follows: an agent blocks a neighbor for recommending an arm if
-
•
the arm performs poorly, i.e., it is not chosen sufficiently often by UCB,
-
•
and the agent has not changed its own best arm estimate recently.
The second criterion is the key distinction from (Vial et al., 2021). Intuitively, it says that agents should not block for seemingly-poor recommendations until they become confident that their own best arm estimates have settled on truly good arms. This idea is the main new algorithmic insight of the paper. It is directly motivated by the negative result of Section 4; see Remark 5.
Gossip despite blocking: Analytically, our main contribution is to show that, with our refined blocking rule, the best arm quickly spreads to all honest agents. The proof is quite involved; we provide an outline in Section 7. At a very high level, the idea is to show that honest agents using our blocking rule eventually stop blocking each other. Thereafter, we can couple the arm spreading process with a much more tractable noisy rumor process that involves neither bandits nor blocking (see Definition 1), and that is guaranteed to spread the best arm in polynomial time.
Regret upper bound: Combining our novel gossip analysis with some existing regret minimization techniques, we show in Section 5 that our refined algorithm enjoys the regret bound
(4)
where $d_{\text{mal}}(i)$ is the number of malicious neighbors of agent $i$ (see Theorem 2). Thus, our result generalizes (2) from the complete graph (where $d_{\text{mal}}(i) = m$) to connected and undirected graphs. Moreover, note the leading term in (4) is entirely local – only the malicious agents directly connected to $i$ affect its long-term regret. For example, in the sparse regime where agent $i$ has few malicious neighbors, our leading term matches the one in (1) up to constants, which (we recall) (Chawla et al., 2020; Newton et al., 2021) proved in the case where there are no malicious agents anywhere in the network. In fact, for honest agents with no malicious neighbors, we can prove that the leading term in our regret bound matches the corresponding term from (Chawla et al., 2020), including constants (see Corollary 2). In other words, we show that for large horizons $T$, the effects of malicious agents do not propagate beyond one-step neighbors. Furthermore, we note that the additive term in (4) is polynomial in all parameters, whereas for the existing algorithm it can be doubly exponential in $K$ and $n$, as shown in (3) and discussed above.
Numerical results: In Section 6, we replicate the experiments from (Vial et al., 2021) and extend them from the complete graph to random graphs. Among other findings, we show that for moderate and small edge probabilities, respectively, the algorithm from (Vial et al., 2021) can perform worse than the non-blocking algorithm from (Chawla et al., 2020) and the single-agent baseline, respectively. In other words, the existing blocking rule becomes a liability as the edge probability decreases from the complete-graph case considered in (Vial et al., 2021). In contrast, we show that our refined rule has lower regret than (Chawla et al., 2020) across the range of edge probabilities tested. Additionally, it outperforms (Vial et al., 2021) on average for all but the largest edge probabilities and has much lower variance for smaller ones.
Summary: Ultimately, the high-level messages of this paper are twofold:
-
•
In multi-agent bandits with malicious agents, we can devise algorithms that simultaneously (1) learn useful information and spread it through the network via gossip, and (2) learn who is malicious and block them to mitigate the harm they cause. Moreover, this harm is local in the sense that it only affects one-hop neighbors.
-
•
However, blocking must be done carefully; algorithms designed for the complete graph may spread information extremely slowly on general graphs. In particular, the slowdown can be doubly exponential, much worse than the exponential slowdown of simple rumor processes.
1.5. Other related work
In addition to the paper (Vial et al., 2021) discussed above, several others have considered multi-agent bandits where some of the agents are uncooperative. In (Awerbuch and Kleinberg, 2008), the honest agents face a non-stochastic (i.e., adversarial) bandit (Auer et al., 1995) and communicate at every time step, in contrast to the stochastic bandit and limited communication of our work. The authors of (Mitra et al., 2021) consider the objective of best arm identification (Audibert and Bubeck, 2010) instead of cumulative regret. Most of their paper involves a different communication model where the agents/clients collaborate via a central server; Section 6 studies a “peer-to-peer” model which is closer to ours but requires additional assumptions on the number of malicious neighbors. A different line of work considers the case where an adversary can corrupt the observed rewards (see, e.g., (Bogunovic et al., 2020, 2021; Garcelon et al., 2020; Gupta et al., 2019; Jun et al., 2018; Kapoor et al., 2019; Liu and Shroff, 2019; Liu et al., 2021; Lykouris et al., 2018), and the references therein), which is distinct from the role that malicious agents play in our setting.
For the fully cooperative case, there are several papers with communication models that differ from the aforementioned (Chawla et al., 2020; Newton et al., 2021; Sankararaman et al., 2019). For example, agents in (Buccapatnam et al., 2015; Chakraborty et al., 2017) broadcast information instead of exchanging pairwise arm recommendations, communication in (Kolla et al., 2018; Lalitha and Goldsmith, 2021; Martínez-Rubio et al., 2019) is more frequent, the number of transmissions in (Madhushani and Leonard, 2021) depends on the problem instance and so could be large, and agents in (Landgren et al., 2016) exchange arm mean estimates instead of (bit-limited) arm indices.
More broadly, other papers have studied fully cooperative variants of different bandit problems. These include minimizing simple instead of cumulative regret (e.g., (Hillel et al., 2013; Szörényi et al., 2013)), minimizing the total regret across agents rather than ensuring all have low regret (e.g., (Dubey et al., 2020a; Wang et al., 2020)), contextual instead of multi-armed bandits (e.g., (Chawla et al., 2022; Dubey et al., 2020b; Dubey and Pentland, 2020; Korda et al., 2016; Tekin and Van Der Schaar, 2015)), adversarial rather than stochastic bandits (e.g., (Bar-On and Mansour, 2019; Cesa-Bianchi et al., 2016; Kanade et al., 2012)), and bandits that vary across agents (e.g., (Bistritz and Leshem, 2018; Shahrampour et al., 2017; Zhu et al., 2021)). Another long line of work features collision models where rewards are lower if multiple agents simultaneously pull the same arm (e.g., (Anandkumar et al., 2011; Avner and Mannor, 2014; Boursier and Perchet, 2019; Dakdouk et al., 2021; Kalathil et al., 2014; Liu and Zhao, 2010; Liu et al., 2020; Mansour et al., 2018; Rosenski et al., 2016)), unlike our model. Along these lines, other reward structures have been studied, such as reward being a function of the agents’ joint action (e.g., (Bargiacchi et al., 2018; Bistritz and Bambos, 2020; Kao et al., 2022)).
1.6. Organization
The rest of the paper is structured as follows. We begin in Section 2 with definitions. In Section 3, we introduce the algorithm from (Vial et al., 2021). Sections 4 and 5 discuss the existing and proposed blocking rules. Section 6 contains experiments. We discuss our analysis in Section 7 and close in Section 8.
2. Preliminaries
Communication network: Let $G = (V, E)$ be an undirected graph with vertex set $V = \{1, \ldots, n + m\}$. We call agents $1, \ldots, n$ the honest agents and assume they execute the forthcoming algorithm. The remaining $m$ agents are termed malicious; their behavior will be specified shortly. For instance, honest and malicious agents represent functioning and failed servers in Example 1. The edge set $E$ encodes which agents are allowed to communicate, e.g., if $(i, j) \in E$, the $i$-th and $j$-th servers can communicate in the forthcoming algorithm.
Denote by $E_{\text{hon}}$ the set of edges between honest agents and by $G_{\text{hon}} = (\{1, \ldots, n\}, E_{\text{hon}})$ the honest agent subgraph. For each honest agent $i$, we let $N(i)$ denote its neighbors, $N_{\text{hon}}(i)$ its honest neighbors, and $N_{\text{mal}}(i)$ its malicious neighbors. We write $d(i)$, $d_{\text{hon}}(i)$, and $d_{\text{mal}}(i)$ for the associated degrees, and write corresponding barred quantities for the maximal degrees. We make the following assumption, which generalizes the complete graph case of (Vial et al., 2021).
Assumption 1.
The honest agent subgraph $G_{\text{hon}}$ is connected, i.e., for any two honest agents $i$ and $j$, there exists a path $i = i_0, i_1, \ldots, i_k = j$ with $(i_{l-1}, i_l) \in E_{\text{hon}}$ for each $l$.
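Assumption 1 is straightforward to verify computationally. The following Python sketch (with hypothetical agent indexing and a hypothetical helper name) checks connectivity of the honest agent subgraph via breadth-first search.

```python
from collections import deque

def honest_subgraph_connected(honest_agents, edges):
    """Check Assumption 1: the subgraph induced by honest agents is connected.

    `honest_agents` is a set of agent ids; `edges` is a set of undirected
    pairs (i, j).  Only edges with both endpoints honest are traversed.
    """
    honest_agents = set(honest_agents)
    if not honest_agents:
        return True
    # Adjacency lists restricted to honest-honest edges.
    nbrs = {i: [] for i in honest_agents}
    for i, j in edges:
        if i in honest_agents and j in honest_agents:
            nbrs[i].append(j)
            nbrs[j].append(i)
    # Breadth-first search from an arbitrary honest agent.
    start = next(iter(honest_agents))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in nbrs[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == honest_agents

# Example: a line of 4 honest agents (0-1-2-3) plus a malicious agent 4
# connected to all of them; the honest subgraph is connected.
edges = {(0, 1), (1, 2), (2, 3), (0, 4), (1, 4), (2, 4), (3, 4)}
print(honest_subgraph_connected({0, 1, 2, 3}, edges))  # True
```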
Multi-armed bandit: We consider the standard stochastic multi-armed bandit (Lattimore and Szepesvári, 2020, Chapter 4). Denote by $K$ the number of arms and by $[K] = \{1, \ldots, K\}$ the set of arms. For each arm $k \in [K]$, we let $\nu_k$ be a reward distribution and associate with each honest agent an i.i.d. sequence of rewards sampled from $\nu_k$. The interpretation is that, if an honest agent chooses arm $k$ at time $t$, it earns the corresponding reward. The objective (to be formalized shortly) is reward maximization. In Example 2, for instance, $[K]$ represents the set of restaurants in a city, and the reward quantifies how much a person enjoys a restaurant if they dine there on a given day.
For each arm $k$, we let $\mu_k = \mathbb{E}_{X \sim \nu_k}[X]$ denote the corresponding expected reward. Without loss of generality, we assume the arms are labeled such that $\mu_1 \geq \mu_2 \geq \cdots \geq \mu_K$. We additionally assume the following, which generalizes the corresponding setting of (Vial et al., 2021). Notice that under this assumption, the arm gap $\Delta = \mu_1 - \mu_2$ is strictly positive.
Assumption 2.
Rewards are $[0, 1]$-valued, i.e., for each arm $k$, $\nu_k$ is a distribution over $[0, 1]$. Furthermore, the best arm is unique, i.e., $\mu_1 > \mu_2$.
Objective: For each honest agent $i$ and time $t$, let $I_t^{(i)}$ denote the arm chosen by honest agent $i$ at time $t$. Our goal is to minimize the regret $R_T^{(i)}$, which is the expected additive loss in cumulative reward for agent $i$'s sequence of arm pulls compared to the optimal policy that always chooses the best arm. More precisely, we define regret as follows:
(5) $R_T^{(i)} = T \mu_1 - \mathbb{E}\left[\sum_{t=1}^{T} \mu_{I_t^{(i)}}\right].$
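For concreteness, the following sketch shows how the pseudo-regret in (5) could be computed from a recorded sequence of arm pulls; the helper name and data layout are hypothetical, and the arm means are assumed to be sorted so that index 0 is the best arm.

```python
import numpy as np

def expected_regret(mu, pulls_per_agent, T=None):
    """Regret in the sense of (5): T * mu_1 minus the sum of the means of
    the arms an agent actually pulled (index 0 is assumed to be the best arm).

    `mu` is the vector of arm means (sorted in decreasing order) and
    `pulls_per_agent[i]` is the list of arm indices pulled by honest agent i.
    """
    mu = np.asarray(mu)
    best = mu.max()
    regret = {}
    for i, pulls in enumerate(pulls_per_agent):
        horizon = T if T is not None else len(pulls)
        regret[i] = horizon * best - mu[np.asarray(pulls[:horizon])].sum()
    return regret

# Toy example: two agents, three arms with means 0.9, 0.5, 0.2.
mu = [0.9, 0.5, 0.2]
pulls = [[0, 0, 1, 0], [2, 2, 1, 0]]
print(expected_regret(mu, pulls))  # approximately {0: 0.4, 1: 1.8}
```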
3. Algorithm
We next discuss the algorithm from (Vial et al., 2021) (Algorithm 1 below), which modifies the one from (Chawla et al., 2020) to include blocking. For ease of exposition, we begin by discussing the key algorithmic design principles from (Chawla et al., 2020) in Section 3.1. We then define Algorithm 1 formally in Section 3.2. Finally, we introduce and discuss one additional assumption in Section 3.3.
3.1. Key ideas of the non-blocking algorithm
We assume there are no malicious agents ($m = 0$) in this subsection and describe the non-blocking algorithm from (Chawla et al., 2020).
-
•
Phases: In (Chawla et al., 2020), the time steps are grouped into phases, whose role is twofold. First, within each phase, each honest agent only pulls arms belonging to a particular subset of the arms. We call these active sets and detail their construction next. Second, at the end of each phase, the agents construct new active sets by exchanging arm recommendations with neighbors, in a manner to be described shortly. See Figure 1 for a pictorial description. Notice that the phase durations are increasing, which will be discussed in Section 3.2.
-
•
Active sets: The active set will always contain a subset of arms that does not vary with the phase. Following (Chawla et al., 2020; Vial et al., 2021), we call this fixed subset the sticky set and its elements sticky arms. The sticky sets ensure that each arm is explored by some agent, as will be seen in the forthcoming example. In addition, the active set will contain two non-sticky arms that are dynamically updated across phases based on arm recommendations from neighbors.
-
•
Arm recommendations: After each phase, each agent contacts a random neighbor, who responds with whichever of their active arms performed "best" in the current phase. Upon receiving this recommendation, the agent adds it to its active set and discards whichever of its currently-active non-sticky arms performed "worse". (We quantify "best" and "worse" in the formal discussion of Section 3.2.)
Example 3.
Each subfigure of Figure 2 depicts honest agents as circles and their active sets as rectangles. The blue rectangles are sticky sets, the orange rectangles are non-sticky arms, and the arms are sorted by performance. For example, the left agent in Figure 2(a) has an active set consisting of its sticky set and two non-sticky arms, one of which it believes to be the best. Note the blue sticky sets partition the set of arms, so at each phase, each arm is active for some agent. This ensures the best arm is never permanently discarded during the arm recommendations discussed above. Figure 2(b) shows agents recommending the active arms they believe are best, and Figure 2(c) depicts the updated active sets. For instance, the left agent replaces its worse non-sticky arm with the recommendation it received. Figure 2(d) shows a later phase where the best arm has spread to all agents, who have all identified it as such. Thereafter, all agents recommend the best arm, so the active sets remain fixed (Figures 2(e) and 2(f)). Hence, all agents eventually exploit the best arm while only exploring a subset of the suboptimal arms (three instead of five here).
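The bookkeeping behind Example 3 can be sketched as follows; the helper name and the toy parameters (two agents, six arms) are hypothetical, and the sketch assumes the number of agents divides the number of arms.

```python
import random

def make_initial_active_sets(num_agents, num_arms, seed=0):
    """Illustration of the active-set structure from Section 3.1: the arms
    are partitioned into sticky sets (one per agent), and each agent's
    initial active set is its sticky set plus two arbitrary non-sticky arms.
    """
    rng = random.Random(seed)
    arms = list(range(num_arms))
    rng.shuffle(arms)
    size = num_arms // num_agents          # assumes num_agents divides num_arms
    sticky = {i: set(arms[i * size:(i + 1) * size]) for i in range(num_agents)}
    active = {}
    for i in range(num_agents):
        non_sticky = [k for k in range(num_arms) if k not in sticky[i]]
        active[i] = sticky[i] | set(rng.sample(non_sticky, 2))
    return sticky, active

sticky, active = make_initial_active_sets(num_agents=2, num_arms=6)
print(sticky)   # two disjoint 3-arm sticky sets covering all arms
print(active)   # each active set has 5 arms: 3 sticky + 2 non-sticky
```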
3.2. Formal definition of the blocking algorithm
The algorithm in (Vial et al., 2021) supplements the one from Section 3.1 with a blocking procedure. Specifically, honest agents run the algorithm from (Chawla et al., 2020) while maintaining blocklists of neighbors they are unwilling to communicate with. This approach is defined in Algorithm 1 and detailed next.
Inputs (Line 1): The first input is a standard UCB exploration parameter, which will be discussed shortly. The second input controls the lengths of the phases; namely, it determines the times at which each phase ends, and the phase durations grow with the phase index, as shown in Figure 1. The final input is the agent's sticky set (the blue sets in Example 3), which, as in (Vial et al., 2021), we assume is provided to the agents (see Section 3.3 for more details).
Initialization (Lines 1-1): To start, the agent initializes the times at which each phase ends, along with an empty blocklist. Additionally, it chooses two distinct (but otherwise arbitrary) non-sticky arms and constructs its initial active set. Notice that the active set contains the sticky set and two arms that depend on the phase, as described in Section 3.1.
UCB over the active set (Line 1): As was also mentioned in Section 3.1, the agent only pulls arms from its current active set. More specifically, at each time during the current phase, it chooses the active arm that maximizes the UCB index in Line 1 (see (Lattimore and Szepesvári, 2020, Chapters 7-10) for background). This index combines the empirical mean of the arm's observed rewards with an exploration bonus that shrinks as the number of pulls of the arm grows.
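A minimal sketch of this selection step is shown below, assuming a UCB index of the standard form (empirical mean plus an exploration bonus); the exact constants in the paper's index may differ, and all function and variable names are hypothetical.

```python
import math

def ucb_arm(active_set, pull_counts, reward_sums, t, alpha=4.0):
    """Pick the active arm maximizing a UCB index of the usual form
    empirical mean + sqrt(alpha * log t / pulls).  Here alpha stands in for
    the exploration parameter passed to Algorithm 1 (the exact index in the
    paper may differ in constants).  Unpulled arms get an infinite index.
    """
    def index(k):
        if pull_counts[k] == 0:
            return math.inf
        mean = reward_sums[k] / pull_counts[k]
        bonus = math.sqrt(alpha * math.log(max(t, 2)) / pull_counts[k])
        return mean + bonus
    return max(active_set, key=index)

# One step of play for an agent whose active set is {0, 3, 7}.
counts = {0: 10, 3: 2, 7: 0}
sums = {0: 7.0, 3: 1.0, 7: 0.0}
print(ucb_arm({0, 3, 7}, counts, sums, t=12))  # 7 (never pulled yet)
```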
Best arm estimate (Line 1): At the end of each phase, the agent defines its best arm estimate as the active arm it played the most in that phase. The intuition is that, for large horizons, the arm chosen most by UCB is a good estimate of the true best arm (Bubeck et al., 2011). Thus, because phase lengths are increasing (see Figure 1), this estimate will be a good estimate of the best active arm in late phases.
Blocklist update (Line 1): Next, the agent calls the Update-Blocklist subroutine to update its blocklist. The implementation of this subroutine is the key distinction between (Vial et al., 2021) and our work; we discuss the two implementations in Sections 4 and 5, respectively.
Arm recommendations (Line 1): Having updated its blocklist, the agent requests an arm recommendation via Algorithm 2. Algorithm 2 is a black box from the agent's perspective (i.e., the agent provides the input and observes the output); it samples a random non-blocked neighbor. If this neighbor is honest, it recommends its best arm estimate, while if malicious, it recommends an arbitrary arm. (Technically, malicious recommendations need to be measurable; see (Vial et al., 2021, Section 3) for details.)
Updating the active set (Lines 1-1): Finally, the agent updates its active set as in Section 3.1. In particular, if the recommendation is not currently active (if it is currently active, the active set remains unchanged; see Line 1 of Algorithm 1), the agent keeps the non-sticky arm that performed better in the current phase, in the sense that UCB chose it more often (following the above intuition from (Bubeck et al., 2011)). The other non-sticky arm is replaced by the recommendation, and the new active set again consists of the sticky set and two non-sticky arms.
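The end-of-phase bookkeeping described above can be summarized by the following sketch; the function signature is hypothetical, and the paper's Algorithm 1 performs the same updates via its own line-by-line bookkeeping.

```python
def end_of_phase_update(sticky, non_sticky, phase_pull_counts, recommendation):
    """End-of-phase bookkeeping sketched from Section 3.2 (hypothetical
    signature): the best arm estimate is the most played active arm, and if
    the received recommendation is not already active, it replaces the
    non-sticky arm that was played less this phase.
    """
    active = sticky | set(non_sticky)
    best_estimate = max(active, key=lambda k: phase_pull_counts.get(k, 0))
    if recommendation not in active:
        a, b = non_sticky
        # Keep the non-sticky arm UCB chose more often; drop the other.
        keep = a if phase_pull_counts.get(a, 0) >= phase_pull_counts.get(b, 0) else b
        non_sticky = (keep, recommendation)
    return best_estimate, non_sticky

# Agent with sticky set {0, 1}, non-sticky arms (4, 5), and a new
# recommendation of arm 2 that is not currently active.
best, non_sticky = end_of_phase_update({0, 1}, (4, 5), {0: 30, 1: 5, 4: 50, 5: 2}, 2)
print(best, non_sticky)  # 4 (4, 2)
```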
3.3. Additional assumption
Observe that Algorithm 1 does not preclude the case where the best arm is not in any honest agent's sticky set. In this case, the best arm may be permanently discarded, which causes linear regret even in the absence of malicious agents. For example, this would occur in Figure 2 if the best arm were not a sticky arm for the left agent (since the right agent discards it in Figure 2(c)). To prevent this situation, we will follow (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021; Vial et al., 2021) in assuming the following.
Assumption 3.
There exists an honest agent with the best arm in its sticky set.
Remark 1.
Remark 2.
The choice from Remark 1 requires the honest agents to know an order-accurate estimate of the number of honest agents in order to set the sticky set size appropriately and ensure that Assumption 3 holds. As discussed in (Vial et al., 2021, Remark 7), this amounts to knowing order-accurate estimates of $n + m$ and $n$. The former quantity is the total number of agents, knowledge of which is rather benign and is also assumed in the fully-cooperative setting (Sankararaman et al., 2019; Chawla et al., 2020; Newton et al., 2021). The latter requires the agents to know that, e.g., half of the others are honest, which is similar in spirit to the assumptions in related problems regarding social learning in the presence of adversarial agents (e.g., (LeBlanc et al., 2013)).
Remark 3.
Alternatively, we can avoid Assumption 3 entirely by defining the set of the arms to be those initially known by the honest agents (i.e., their sticky sets), rather than sampling the sticky sets from a larger “base set” as in Remark 1. In this alternative model, the honest agents aim to identify and spread through the network whichever of the initially-known arms is best, similar to what happens on platforms like Yelp (see Example 2). In contrast, the Section 2 model allows for the pathological case where the base set contains a better arm than any initially known to honest agents (e.g., where no honest Yelp user has ever dined at the best restaurant). Coping with these pathological cases either requires Assumption 3, or another mode of exploration (i.e., exploration of base arms) that obfuscates the key point of our work (collaborative bandit exploration amidst adversaries). For these reasons, we prefer the alternative model, but to enable a cleaner comparison with prior work (Vial et al., 2021), we restrict attention to the Section 2 model (which generalizes that of (Vial et al., 2021)).
4. Existing blocking rule
We can now define the blocking approach from (Vial et al., 2021), which is provided in Algorithm 3. In words, the rule is as follows: if the recommendation an agent received at the end of one phase is not that agent's most played arm in the subsequent phase, then the agent who recommended it is added to the blocklists of the next several phases, where the length of this block grows with the phase counter according to a tuning parameter. By Algorithm 2, this means the recommender is blocked (i.e., not communicated with) until a later phase, at the earliest. Thus, agents block others whose recommendations perform poorly – in the sense that UCB does not play them often – and the blocking becomes more severe as the phase counter grows. See (Vial et al., 2021, Remark 4) for further intuition.
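A minimal sketch of this decision, with hypothetical names and omitting the bookkeeping of how long the block lasts, is as follows.

```python
def existing_block_decision(recommended_arm, phase_pull_counts):
    """Existing rule from (Vial et al., 2021), as paraphrased above: block
    the recommender if the arm it recommended is not the agent's most played
    arm in the subsequent phase.  (The duration of the block, which grows
    with the phase counter, is handled separately by the blocklist.)
    """
    most_played = max(phase_pull_counts, key=phase_pull_counts.get)
    return recommended_arm != most_played

# The recommendation (arm 3) was played 40 times but arm 0 was played 60
# times, so the recommender gets blocked.
print(existing_block_decision(3, {0: 60, 3: 40, 5: 1}))  # True
```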
In the remainder of this section, we define a bad instance (Section 4.1) on which this blocking rule provably fails (Section 4.2). Our goal here is to demonstrate a single such instance in order to show this blocking rule must be refined. Therefore, we have opted for a concrete example, which includes some numerical constants (e.g., in (6) and in the statement of Theorem 1) that have no particular meaning. Nevertheless, the instance can be generalized; see Remark 4.
4.1. Bad instance
The network and bandit for the bad instance are as follows:
-
•
There is an even number of honest agents (at least four) arranged in a line, increasing in index from left to right, and there is a single malicious agent connected to each of the honest ones. Mathematically, the honest edges form a path and the malicious agent is adjacent to every honest agent (a small sketch of this network follows the list).
-
•
There are arms that generate deterministic rewards, with means given by
(6)
Intuitively, there are three sets of arms: the best arm, mediocre arms, and bad arms. We provide further intuition in the forthcoming proof sketch. For now, we highlight three key properties. First, the gap from mediocre to bad arms is constant. Second, the gaps between mediocre arms are doubly exponentially small. Third, the gap from the best arm to the mediocre arms is bounded away from zero, as shown in Appendix C.
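For concreteness, the network of this instance (but not the arm means in (6), which we omit) can be constructed as follows; the helper name is hypothetical.

```python
def bad_instance(num_honest):
    """Sketch of the Section 4.1 network (arm means omitted): honest agents
    0, ..., num_honest-1 form an undirected line, and a single malicious
    agent is connected to every honest agent.
    """
    assert num_honest >= 4 and num_honest % 2 == 0
    malicious = num_honest                       # id of the malicious agent
    line_edges = {(i, i + 1) for i in range(num_honest - 1)}
    hub_edges = {(i, malicious) for i in range(num_honest)}
    return line_edges | hub_edges, malicious

edges, mal = bad_instance(6)
print(sorted(edges))  # line 0-1-...-5 plus an edge (i, 6) for every honest i
```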
Observation 1.
Since rewards are deterministic, the most played arm in a phase is a deterministic function of the numbers of plays of the active arms at the beginning of the phase. Hence, when the recommendation is already active (in which case the active set is unchanged in Algorithm 1), the most played arm in the next phase is a function of information that is available to the malicious agent at the communication time. Consequently, the malicious agent can always recommend some currently active arm that will be most played in the next phase, thereby avoiding being blocked.
We make the following assumptions on Algorithms 1 and 2:
- •
-
•
Sticky sets have size one, and the initial active sets of the honest agents in the right half of the line contain only bad arms. Thus, active sets contain three arms, and the right half of the honest agents are initially only aware of the bad arms, i.e., of those that provide no reward.
Remark 4.
Note that Assumptions 1-3 all hold for this instance, and the chosen algorithm parameters are the ones used for the complete graph experiments in (Vial et al., 2021). Additionally, the instance can be significantly generalized – the key properties are that the numbers of arms and agents have the same scaling, the gaps from mediocre arms to the others are constant, the gaps among mediocre arms are doubly exponentially small, and a constant fraction of agents on the right initially only have bad arms active.
Finally, we define a particular malicious agent strategy, built around a doubly exponentially growing sequence of designated phases. The malicious recommendations are as follows:
-
•
At the designated phases, recommend a specific mediocre arm to a specific honest agent (as detailed in the proof sketch below).
-
•
Otherwise, recommend a currently active arm that will be most played in the subsequent phase (see Observation 1), so as to avoid being blocked.
Similar to the arm means, we will wait for the proof sketch to explain this strategy in more detail. For now, we only mention that the designated phases grow doubly exponentially, i.e.,
(7)
4.2. Negative result
We can now state the main result of this section. It shows that if the existing blocking rule from (Vial et al., 2021) is used on the above instance, then the honest agent at the end of the line suffers nearly linear regret until time exceeds a doubly exponential function of the number of agents.
Theorem 1.
Proof sketch.
We provide a complete proof in Appendix C but discuss the intuition here.
-
•
First, suppose the leftmost honest agent of the right half contacts the malicious agent at every phase up to a certain constant phase (this occurs with constant probability since that phase index is constant). Then the right half of the honest agents only have bad arms in their active sets at that phase. This is because their initial active sets only contain such arms (by assumption), the malicious agent only recommends currently-active arms before that phase, and no arm recommendations flow from the left half of the graph to the right half (they would first need to be sent to the leftmost agent of the right half, but we are assuming that agent only contacts the malicious agent before that phase).
-
•
Now consider phase . With constant probability, and both contact , who recommends a currently active (thus bad) arm and the mediocre arm , respectively. Then, again with constant probability, contacts at the next phase ; only has bad arms active and thus recommends a bad arm. Therefore, during phase , agent has the mediocre arm and some bad recommendation from in its active set. The inverse gap squared between these arms is constant, thus less than the length of phase (for appropriate ), so by standard bandit arguments (basically, noiseless versions of best arm identification results from (Bubeck et al., 2011)), will be most played. Consequently, by the blocking rule in Algorithm 3, blocks until phase .
-
•
We then use induction. For each ( in the previous bullet), suppose blocks between phases and . Then during these phases, no arm recommendations flow past , so agents only play arms . At phase , the malicious agent recommends and to agents and , respectively, and at the subsequent phase , recommends to . Similar to the previous bullet, we then show plays arm more than during phase and thus blocks until , completing the inductive step. The proof that is played more than during phase again follows from noiseless best arm identification, although unlike the previous bullet, the relevant arm gap is no longer constant (both could be mediocre arms). However, we chose the mediocre arm means such that their inverse gap squared is at most doubly exponential in , so by (7), the length of phase dominates it.
In summary, we show that due to blocking amongst honest agents, does not receive arm until phase , given that some constant probability events occur at each of the times . This allows us to show that, with probability at least , agent does not receive the best arm until phase , and thus does not play the best arm until time in expectation. Since is constant, we can lower bound regret similarly. ∎
5. Proposed blocking rule
To summarize the previous section, we showed that the existing blocking rule (Algorithm 3) may result in honest agents blocking too aggressively, which causes the best arm to spread very slowly. In light of this, we propose a relaxed blocking criterion (see Algorithm 4): at the end of each phase, an agent will block the neighbor who recommended an arm at the previous phase if
(8)
where the two thresholds are tuning parameters. Thus, the agent blocks if both of the following occur (see the sketch after this list):
-
•
The recommended arm performs poorly, in the sense that UCB has not chosen it sufficiently often by the end of the phase.
-
•
The agent has not changed its own best arm estimate for several phases. Intuitively, this can be viewed as a confidence criterion: if the agent has instead recently changed its estimate, then it is currently unsure which arm is best, so it should not block others for recommendations that appear suboptimal at first glance (i.e., those for which the first criterion in (8) may hold).
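A minimal sketch of the relaxed rule in (8) follows; the parameter names (play_threshold, stability_window) are hypothetical stand-ins for the two tuning parameters.

```python
def proposed_block_decision(pulls_of_recommended_arm, play_threshold,
                            phases_since_estimate_change, stability_window):
    """Sketch of the relaxed rule in (8) with hypothetical parameter names:
    block only if (i) the recommended arm was played fewer than
    `play_threshold` times by the end of the phase, AND (ii) the agent's own
    best arm estimate has been unchanged for at least `stability_window`
    phases.
    """
    performs_poorly = pulls_of_recommended_arm < play_threshold
    estimate_is_stable = phases_since_estimate_change >= stability_window
    return performs_poorly and estimate_is_stable

# The recommendation looks bad (only 3 plays vs. a threshold of 50), but the
# agent changed its own best arm estimate just 1 phase ago, so it does NOT
# block -- this is exactly what prevents the cascade of Section 4.
print(proposed_block_decision(3, 50, phases_since_estimate_change=1,
                              stability_window=5))  # False
```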
Remark 5.
The first criterion in (8) is a natural relaxation of demanding the recommended arm be most played. The second is directly motivated by the negative result from Section 4. In particular, recall from the Theorem 1 proof sketch that honest agents blocked one another shortly after receiving a new mediocre arm from the malicious agent. Thus, blocking amongst honest agents was always precipitated by the blocking agent changing its best arm estimate. The second criterion in (8) aims to avoid this.
Remark 6.
Our proposed rule has two additional tuning parameters compared to the existing one. For our theoretical results, these will be specified in Theorem 2; for experiments, they are discussed in Section 6. For now, we only mention that they should satisfy two properties. First, the play-count threshold should be sublinear in the phase length, so that the first criterion in (8) demands only a sublinear number of plays. Second, the stability window should grow with the phase counter, since (as discussed above) the second criterion represents the confidence in the best arm estimate, which grows as the number of reward observations increases.
In the remainder of this section, we introduce a further definition (Section 5.1), provide a general regret bound under our blocking rule (Section 5.2), and discuss some special cases (Section 5.3).
5.1. Noisy rumor process
As discussed in Section 1.4, we will show that under our proposed rule, (1) honest agents eventually stop blocking each other, and (2) honest agents with the best arm active will eventually recommend it to others. Thereafter, we essentially reduce the arm spreading process to a much simpler rumor process in which each honest agent contacts a uniformly random neighbor and, if that neighbor is an honest agent who knows the rumor (i.e., has the best arm active), then the neighbor informs the contacting agent of the rumor (i.e., recommends the best arm). The only caveat is that we make no assumption on the malicious agents' arm recommendations, so we have no control over whether or not they are blocked. In other words, the rumor process unfolds over a dynamic graph, where edges between honest and malicious agents may or may not be present, and we have no control over these dynamics.
In light of this, we take a worst-case view and lower bound the arm spreading process with a noisy rumor process that unfolds on the (static) honest agent subgraph. More specifically, we consider the process that tracks the honest agents informed of the rumor. Initially, only the agent from Assumption 3 is informed. Then at each phase, each honest agent contacts a random honest neighbor. If the contacted neighbor is informed, then the contacting agent becomes informed as well, subject to some noise: the information is transmitted only with a certain probability, which is upper bounded by the probability with which the agent receives the best arm in the process of the previous paragraph.
More formally, we define the noisy rumor process as follows. The key quantity in Definition 1 is the first phase at which all honest agents are informed. Analogous to (Chawla et al., 2020), our most general result will be in terms of the expected time at which this phase occurs. Under Assumption 1, the latter quantity is polynomial in the problem parameters, which cannot be improved in general (see Appendix D.4). However, Section 5.3 provides sharper bounds in some special cases.
Definition 1.
Let the noise probability be as above. For each honest agent, let there be an i.i.d. sequence of Bernoulli noise variables and an i.i.d. sequence of contacted neighbors chosen uniformly at random from that agent's honest neighbors. Inductively define the informed sets as follows: initially, only the agent from Assumption 3 is informed, and
(9)
Finally, let the spreading time be the first phase at which all honest agents are informed.
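The process in Definition 1 is simple to simulate; the following sketch (hypothetical names, toy line graph) returns the spreading time, i.e., the first phase at which all honest agents are informed.

```python
import random

def noisy_rumor_spreading_time(honest_neighbors, source, q, seed=0, max_phases=10**6):
    """Simulate the noisy rumor process of Definition 1 (hypothetical
    parameter names): initially only `source` is informed; in each phase,
    every honest agent contacts a uniformly random honest neighbor and, if
    that neighbor is informed, becomes informed with probability `q`.
    Returns the first phase at which all honest agents are informed.
    """
    rng = random.Random(seed)
    informed = {source}
    agents = list(honest_neighbors)
    for phase in range(1, max_phases + 1):
        newly = set()
        for i in agents:
            contact = rng.choice(honest_neighbors[i])
            if contact in informed and rng.random() < q:
                newly.add(i)
        informed |= newly
        if len(informed) == len(agents):
            return phase
    return max_phases

# Line graph on 6 honest agents; the source sits at one end.
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(noisy_rumor_spreading_time(nbrs, source=0, q=0.5))
```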
5.2. Positive result
We can now present the main result of this section: a regret upper bound for the proposed blocking rule. We state it first and then unpack the statement in some ensuing remarks. The proof of this result (and all others in this section) is deferred to Appendix D.
Theorem 2.
Let Assumptions 1-3 hold. Suppose we run Algorithm 1 and use Algorithm 4 as the Update-Blocklist subroutine with suitably chosen tuning parameters in (8). Also assume
(10)
Then for any honest agent and horizon , we have
(11)
where by convention if . Here is a term independent of satisfying
(12)
where hides dependencies on , , , , and and log dependencies on , , , and .
Remark 7.
The theorem shows that our algorithm's regret scales as in (11), plus an additive term that is independent of the horizon and polynomial in all other parameters. For the sticky set size chosen in Remark 1, the first term matches the bound stated in Section 1.4. Also, when the number of malicious neighbors is large, we recover the single-agent bound (including constants), i.e., if there are many malicious agents, honest ones fare no worse than the single-agent case.
Remark 8.
Remark 9.
The bound in Theorem 2 can be simplified under additional assumptions. For instance, in Example 2, it is reasonable to assume the number of arms is proportional to the number of agents (i.e., the number of restaurants is proportional to the population) and that the degrees are constant (as in sparse social networks). Under these assumptions, the choice from Remark 1, and the parameters from Remark 6, the theorem's regret bound simplifies further.
Remark 10.
Proof sketch.
Let us call the first phase after which the best arm is the most played arm for every honest agent the spreading phase. Before this phase, we simply upper bound regret by the elapsed time. The main novelty of our analysis is bounding the expected time at which the spreading phase occurs, chiefly in terms of the spreading time of the noisy rumor process from Section 5.1. We devote Section 7 to discussing this proof.
After the spreading phase, the best arm is active by definition, so the agent incurs regret logarithmic in the horizon. We let
(13)
be the earliest phase such that the agent blocks all suboptimal recommendations thereafter. We then split the phases after the spreading phase into two groups: those before this blocking phase and those after.
For the phases after the spreading phase but before the blocking phase, we consider three cases:
-
•
: In this case, there are no such phases, so there is nothing to prove.
- •
-
•
In the remaining case, we are considering phases where the best arm is most played by the agent but the agent does not yet block suboptimal recommendations. Note that no such phases arise for the existing blocking rule, so here the proof diverges from (Vial et al., 2021), and most of Appendix D.1 is dedicated to this case. Roughly speaking, the argument analogous to the previous case yields a regret bound involving the blocking phase, and we show this term is of the same order by deriving a tail bound for the blocking phase. The tail bound amounts to showing that, once the best arm is active, the agent can identify suboptimal arms as such within the phase. This in turn follows from best arm identification results and the growing phase lengths.
After the spreading phase, the best arm is most played for all honest agents, so they only recommend this arm. Thus, the agent only plays the best arm, its sticky arms, and any malicious recommendations. Consequently, to bound regret as in (11), we need to show each malicious neighbor contributes only a limited number of suboptimal recommendations. It is easy to see this is the case: if a malicious neighbor recommends a bad arm at some phase, it is blocked for an increasingly long stretch of subsequent phases, so the number of bad recommendations it makes grows only slowly with the horizon. Finally, an argument from (Vial et al., 2021) sharpens this count further. ∎
5.3. Special cases
We next discuss some special cases of our regret bound. First, as in (Chawla et al., 2020), we can prove an explicit bound assuming the honest agent subgraph is regular, i.e., every honest agent has the same number of honest neighbors.
Corollary 1.
Let the assumptions of Theorem 2 hold and further assume is -regular with . Let denote the conductance of . Then for any honest agent and horizon ,
(14)
(15) |
Remark 11.
This corollary includes the complete graph case studied in (Vial et al., 2021). In this case, the term (14) matches the corresponding term from (Vial et al., 2021) exactly, i.e., for large horizons, Corollary 1 is a strict generalization. Moreover, our additive term's dependence on the arm gap matches the fully cooperative case (Chawla et al., 2020), whereas the corresponding dependence in (Vial et al., 2021) is potentially much larger.
Remark 12.
Proof sketch.
In light of Theorem 2, we only need to bound the expected spreading time of the noisy rumor process. To do so, we consider the noiseless version of the process (defined in the same way but without the Bernoulli noise) and its spreading time. We then construct a coupling between the noisy and noiseless processes, which ensures that, with high probability, the noisy process spreads essentially as quickly as the noiseless one. Finally, using this coupling and a tail bound for the noiseless spreading time from (Chawla et al., 2020) (which draws upon the analysis of (Chierichetti et al., 2010)), we derive a tail bound for the noisy spreading time, which yields the desired bound. (When there are no malicious agents, (Chawla et al., 2020) proves a comparable bound, so ours generalizes theirs up to lower-order terms.) ∎
Finally, we can sharpen the above results for honest agents without malicious neighbors.
Corollary 2.
Remark 13.
Proof sketch.
Recall from the Theorem 2 proof sketch that the leading term arises from regret after the spreading phase. At any such phase, the best arm is most played for all honest agents (by definition), so when the agent has no malicious neighbors, all of its neighbors only recommend this arm. Therefore, the agent's active sets after this phase are fixed; they contain the best arm and a few suboptimal ones. Thus, the agent only plays a fixed set of suboptimal arms long-term, so in the worst case it incurs the standard UCB regret for that many arms. ∎
6. Numerical results
Thus far, we have shown the proposed blocking rule adapts to general graphs more gracefully than the existing one, at least in theory. We now illustrate this finding empirically.
Experimental setup: We follow (Vial et al., 2021, Section 6) except we extend those experiments to random graphs in which each edge is present independently with a given probability. For each edge probability and each of two malicious strategies (to be defined shortly), we conduct repeated trials of the following:
-
•
Fix the numbers of honest and malicious agents and generate the network as a random graph, resampling if necessary until the honest agent subgraph is connected (see Assumption 1).
-
•
Fix the number of arms and the two largest arm means, then sample the remaining arm means uniformly at random from a range below them (so the arm gap is strictly positive). For each arm, set its reward distribution to be Bernoulli with the corresponding mean.
-
•
Fix the sticky set size and sample the sticky sets uniformly at random from the subsets of that size, resampling if necessary until some honest agent has the best arm in its sticky set (see Assumption 3).
- •
Algorithmic parameters: We set the phase and exploration parameters as in Remarks 4 and 8. For the parameters in the proposed blocking rule, we choose values that differ from those specified in our theoretical results (which we found to be too conservative in practice) but that satisfy the key properties discussed in Remark 6.
Malicious strategies: Like (Vial et al., 2021), we use strategies we call the naive and smart strategies (they are called uniform and omniscient in (Vial et al., 2021)). The naive strategy simply recommends a uniformly random suboptimal arm. The smart strategy recommends the least played, inactive, suboptimal arm. Intuitively, this is a more devious strategy which forces the honest agent to play the recommended arm often in the next phase (to drive down its upper confidence bound). Consequently, the honest agent may play it most and discard a better arm in favor of it (see Algorithm 1).
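The two strategies can be sketched as follows; the function names and bookkeeping are hypothetical, and tie-breaking in the actual experiments may differ.

```python
import random

def naive_recommendation(num_arms, best_arm, rng):
    """Naive strategy: a uniformly random suboptimal arm."""
    suboptimal = [k for k in range(num_arms) if k != best_arm]
    return rng.choice(suboptimal)

def smart_recommendation(num_arms, best_arm, active_set, pull_counts):
    """Smart strategy (as paraphrased above): among suboptimal arms that the
    contacting agent does not currently have active, recommend the one it
    has played the least, to force many exploratory pulls next phase.
    """
    candidates = [k for k in range(num_arms)
                  if k != best_arm and k not in active_set]
    return min(candidates, key=lambda k: pull_counts.get(k, 0))

rng = random.Random(0)
print(naive_recommendation(5, best_arm=0, rng=rng))
print(smart_recommendation(5, best_arm=0, active_set={0, 1},
                           pull_counts={2: 9, 3: 1, 4: 4}))  # 3
```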
Results: In Figure 3, we plot the average and standard deviation (across trials) of the per-agent regret. For the naive strategy, the existing blocking rule eventually becomes worse than the no blocking baseline as the edge probability decreases. More strikingly, it even becomes worse than the no communication baseline for the smart strategy. In other words, honest agents would have been better off ignoring the network and simply running UCB on their own. As in Section 4, this is because accidental blocking causes the best arm to spread very slowly. Additionally, the standard deviation becomes much higher than for all other algorithms, suggesting that regret is significantly worse in some trials. In contrast, the proposed blocking rule improves as the edge probability decreases, because it is mild enough to spread the best arm at all edge probabilities, and for smaller ones, honest agents have fewer malicious neighbors (on average). We also observe that the proposed rule outperforms both baselines uniformly across edge probabilities. Additionally, it improves over the existing rule more dramatically for the smart strategy, i.e., when the honest agents face a more sophisticated adversary. Finally, it is worth acknowledging that the existing rule is better on the complete graph – although not in a statistically significant sense for the smart strategy – because it does spread the best arm quickly there (as shown in (Vial et al., 2021)), and thereafter it blocks malicious agents more aggressively.
Figure 3. Empirical results for synthetic data. Rows of subfigures correspond to the malicious strategy, while columns correspond to the edge probability of the random graph.
Other results: As in (Vial et al., 2021), we reran the simulations using arm means derived from the MovieLens dataset (Harper and Konstan, 2015). We also experimented with new variants of the smart and naive strategies, where the malicious agents follow these strategies if the best arm is active (in hopes of forcing honest agents to discard it) and recommend the second best arm otherwise. Intuitively, these variants differ in that malicious agents recommend good arms (i.e., the second best) more frequently, while still never revealing the best arm (the only one that leads to logarithmic regret). For all experiments, the key message – that the proposed blocking rule adapts to varying graph structures more gracefully than the existing one – is consistent. See Appendix B for details.
7. Gossip despite blocking
As discussed above, the main analytical contribution of this work is proving that the best arm spreads in a gossip fashion, despite accidental blocking. In this (technical) section, we provide a detailed sketch of this proof. We begin with a high-level outline. The key is to show that honest agents eventually stop blocking each other. This argument (roughly) proceeds as follows:
-
•
Step 1: First, we show that honest agents learn the arm statistics in a certain sense. More specifically, we provide a tail bound for a random phase such that, at all later phases, (1) each honest agent's most played arm in the phase is close to its true best active arm and (2) any active arm close to the true best one is played sufficiently often by the end of the phase.
-
•
Step 2: Next, we show that honest agents communicate with their neighbors frequently. In particular, we establish a tail bound for another random phase such that, thereafter, each honest agent contacts each of its honest neighbors at least once within any sufficiently long window of phases.
-
•
Step 3: Finally, we use the above to show that eventually, no blocking occurs amongst honest agents. The basic idea is as follows. Consider a sufficiently late phase, an honest agent, and an honest neighbor of that agent. If the agent has had the same best arm estimate for long enough – i.e., if the second blocking criterion in (8) holds – then the neighbor would have contacted the agent at some recent phase (by Step 2) and received that stable best arm estimate. Between that phase and the current one, the neighbor's most played arm cannot get significantly worse (by Step 1). Thus, if the agent asks the neighbor for a recommendation, the neighbor will respond with an arm whose mean is close to that of the agent's own best arm estimate, which the agent will then play sufficiently often (by Step 1). Hence, the first criterion in (8) fails, i.e., the two criteria cannot hold simultaneously.
In the next three sub-sections, we discuss these three steps. Then in Section 7.4, we describe how, once accidental blocking stops, the arm spreading process can be coupled to the noisy rumor process from Definition 1. Finally, in Section 7.5, we discuss how to combine all of these steps to bound the term from the Theorem 2 proof sketch.
7.1. Learning the arm statistics
Recall we assume , so for any , is the best arm in , i.e., . Therefore, for any , is the subset of arms at least -close to the best one. For each honest agent and phase , define
(16) |
where will be chosen shortly. Finally, define the random phase
(17) |
In words, this is the earliest phase such that, at all phases thereafter, (1) the most played arms are close to the best active arms and (2) all active arms close to the best are played sufficiently often.
As discussed above, Step 1 involves a tail bound for . The analysis is based on (Bubeck et al., 2011, Theorem 2), which includes a tail bound showing that the most played arm is -close to the best, provided that samples have been collected from each of the -far arms. In our case, phase lasts time steps, so each of active arms is played times on average. Hence, we can show the most played arm within the phase is -close to the best if we choose , which allows us to bound . Analogously, we choose and show that -close arms must be played times before they are distinguished as such, which allows us to bound . Taken together, we can prove a tail bound for (Lemma 9 in Appendix E.1) with these choices and .
7.2. Communicating frequently
Next, for any such that , let denote the event that did not send a recommendation to between phases and . Also define
(18) |
Here we abuse notation slightly; the union is over all (undirected) edges in but viewed as pairs of directed edges. Hence, at all phases , each honest agent receives a recommendation from each of its honest neighbors at some phase between and .
Step 2 involves the tail bound for that was mentioned above (see Lemma 10 in Appendix E.2). The proof amounts to bounding the probability of . Recall this event says did not contact for a recommendation at any phase . Clearly, this means did not block at any such phase. Hence, in the worst case, blocked just before , in which case was un-blocked at , where the inequality holds by assumption in Theorem 2. Hence, implies was not blocking between phases and , so each of the neighbors that contacted in these phases was sampled uniformly from a set containing , yet was never sampled. The probability of this decays exponentially in , which yields an exponential tail for .
7.3. Avoiding accidental blocking
Next, we show honest agents eventually stop blocking each other. Toward this end, we first note
(19) |
where the first inequality uses the definition of , and the second holds because and in Algorithm 1 (see Claim 13 in Appendix E.3 for details). In words, (19) says the best active arm can decay by at most at phase . Applying iteratively and since there are arms total, we then show (see Claim 14 in Appendix E.3)
(20) |
Combining the previous two inequalities, we conclude (see Corollary 3 in Appendix E.3)
(21) |
Now the key part of Step 3 is to use (21) to show (see Claim 15 in Appendix E.3)
(22) |
In words, this result says that if is sufficiently large, and if has sent a recommendation to since phase , then will not block at phase . The proof is by contradiction: if instead blocks at , then by Algorithm 4, has not changed its best arm estimate since phase , so it would have recommended to at some phase . Therefore, . Additionally, since , we know that for any , the choice of in Section 7.1 guarantees that
Combining these observations and using (21) (with and replaced by and ), we then show
(23) |
On the other hand, blocking at phase means plays the recommended arm fewer than times by the end of phase . Since , this implies (by definition of ) that , where as in Section 7.1. Combined with (23) and the choice from Theorem 2, we conclude . This contradicts the assumption in Theorem 2, which completes the proof of (22).
Finally, we use (22) to show honest agents eventually stop blocking each other entirely, i.e.,
(24) |
(see Lemma 11 in Appendix E.3). Intuitively, (24) holds because after new blocking stops (22), old blocking will eventually “wear off”. The proof is again by contradiction: if is blocking some honest at phase , the blocking must have started at some (else, it ends by ). Thus, by assumption , blocked at phase . But by definition of , would have contacted at some phase . Applying (22) (at phase ; note that by the above inequalities, , as required by (22)), we obtain a contradiction.
7.4. Coupling with noisy rumor process
To begin, we define an equivalent way to sample the contacted neighbor in Algorithm 2. (Claim 16 in Appendix E.4 verifies this equivalency; the proof is a straightforward application of the law of total probability.) This equivalent method will allow us to couple the arm spreading and noisy rumor processes through a set of primitive random variables. In particular, for each honest agent, we introduce two i.i.d. sequences of primitive random variables: one of uniformly random honest neighbors and one of auxiliary randomization variables. The contacted neighbor is then chosen according to two cases:
-
•
If , let and consider two sub-cases. First, if , set . Second, if , sample from uniformly.
-
•
If , sample from uniformly.
Next, we observe that since as by the choice of in Section 7.1 and by Assumption 2, we have for large enough . Paired with the definition of , this allows us to show that for all large and with (i.e., with the best arm active), (i.e., the best arm is played most). See Claim 17 in Appendix E.4 for the formal statement.
Finally, we observe that by (24), only the first case of the above sampling strategy occurs for large . Moreover, in this case, is Bernoulli with parameter
where the second inequality holds by Definition 1. Hence, the probability that , and thus the probability that contacts the random honest neighbor in the above sampling strategy, dominates the probability that contacts in the noisy rumor process of Definition 1. Additionally, by the previous paragraph, agents with the best arm active will recommend it (for large enough ). Taken together, we can show that the probability of receiving the best arm in the arm spreading process dominates the probability of being informed of the rumor in the noisy rumor process. This allows us to prove a tail bound for in terms of a tail bound for the random phase from Definition 1, on the event that the tails of and are sufficiently small (in the sense of (24); see Lemma 12 in Appendix E.4 for details).
7.5. Spreading the best arm
In summary, we prove tail bounds for and (Sections 7.1 and 7.2) and show the tails of are controlled by those of , provided the tails of and are not too heavy (Sections 7.3 and 7.4). Combining and summing tails allows us to bound in terms of (which accounts for the tails of and ) and (which accounts for the tail of ), as mentioned in the Theorem 2 proof sketch. See Theorem 3 and Corollary 4 in Appendix E.5 for details.
8. Conclusion
In this work, we showed that existing algorithms for multi-agent bandits with malicious agents fail to generalize beyond the complete graph. In light of this, we proposed a new blocking algorithm and showed it has low regret on any connected and undirected graph. This regret bound relied on the analysis of a novel process involving gossip and blocking. Our work leaves open several questions, such as whether our insights can be applied to multi-agent reinforcement learning.
Acknowledgements.
This work was supported by NSF Grants CCF 22-07547, CCF 19-34986, CNS 21-06801, 2019844, 2112471, and 2107037; ONR Grant N00014-19-1-2566; the Machine Learning Lab (MLL) at UT Austin; and the Wireless Networking and Communications Group (WNCG) Industrial Affiliates Program.
References
- Anandkumar et al. (2011) Animashree Anandkumar, Nithin Michael, Ao Kevin Tang, and Ananthram Swami. 2011. Distributed algorithms for learning and cognitive medium access with logarithmic regret. IEEE Journal on Selected Areas in Communications 29, 4 (2011), 731–745.
- Audibert and Bubeck (2010) Jean-Yves Audibert and Sébastien Bubeck. 2010. Best Arm Identification in Multi-Armed Bandits. In COLT-23th Conference on Learning Theory-2010. 13–p.
- Auer et al. (2002) Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 2-3 (2002), 235–256.
- Auer et al. (1995) Peter Auer, Nicolo Cesa-Bianchi, Yoav Freund, and Robert E Schapire. 1995. Gambling in a rigged casino: The adversarial multi-armed bandit problem. In Proceedings of IEEE 36th Annual Foundations of Computer Science. IEEE, 322–331.
- Avner and Mannor (2014) Orly Avner and Shie Mannor. 2014. Concurrent bandits and cognitive radio networks. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 66–81.
- Awerbuch and Kleinberg (2008) Baruch Awerbuch and Robert Kleinberg. 2008. Competitive collaborative learning. J. Comput. System Sci. 74, 8 (2008), 1271–1288.
- Bar-On and Mansour (2019) Yogev Bar-On and Yishay Mansour. 2019. Individual regret in cooperative nonstochastic multi-armed bandits. Advances in Neural Information Processing Systems 32 (2019), 3116–3126.
- Bargiacchi et al. (2018) Eugenio Bargiacchi, Timothy Verstraeten, Diederik Roijers, Ann Nowé, and Hado Hasselt. 2018. Learning to coordinate with coordination graphs in repeated single-stage multi-agent decision problems. In International conference on machine learning. PMLR, 482–490.
- Bistritz and Bambos (2020) Ilai Bistritz and Nicholas Bambos. 2020. Cooperative multi-player bandit optimization. Advances in Neural Information Processing Systems 33 (2020).
- Bistritz and Leshem (2018) Ilai Bistritz and Amir Leshem. 2018. Distributed multi-player bandits-a game of thrones approach. In Advances in Neural Information Processing Systems. 7222–7232.
- Bogunovic et al. (2020) Ilija Bogunovic, Andreas Krause, and Jonathan Scarlett. 2020. Corruption-tolerant Gaussian process bandit optimization. In International Conference on Artificial Intelligence and Statistics. PMLR, 1071–1081.
- Bogunovic et al. (2021) Ilija Bogunovic, Arpan Losalka, Andreas Krause, and Jonathan Scarlett. 2021. Stochastic linear bandits robust to adversarial attacks. In International Conference on Artificial Intelligence and Statistics. PMLR, 991–999.
- Boursier and Perchet (2019) Etienne Boursier and Vianney Perchet. 2019. SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits. Advances in Neural Information Processing Systems 32 (2019), 12071–12080.
- Bubeck et al. (2011) Sébastien Bubeck, Rémi Munos, and Gilles Stoltz. 2011. Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Computer Science 412, 19 (2011), 1832–1852.
- Buccapatnam et al. (2015) Swapna Buccapatnam, Jian Tan, and Li Zhang. 2015. Information sharing in distributed stochastic bandits. In 2015 IEEE Conference on Computer Communications (INFOCOM). IEEE, 2605–2613.
- Cesa-Bianchi et al. (2016) Nicolo Cesa-Bianchi, Claudio Gentile, Yishay Mansour, and Alberto Minora. 2016. Delay and cooperation in nonstochastic bandits. In Conference on Learning Theory, Vol. 49. 605–622.
- Chakraborty et al. (2017) Mithun Chakraborty, Kai Yee Phoebe Chua, Sanmay Das, and Brendan Juba. 2017. Coordinated Versus Decentralized Exploration In Multi-Agent Multi-Armed Bandits.. In IJCAI. 164–170.
- Chawla et al. (2020) Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, and Sanjay Shakkottai. 2020. The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. 3471–3481.
- Chawla et al. (2022) Ronshee Chawla, Abishek Sankararaman, and Sanjay Shakkottai. 2022. Multi-agent low-dimensional linear bandits. IEEE Trans. Automat. Control (2022).
- Chierichetti et al. (2010) Flavio Chierichetti, Silvio Lattanzi, and Alessandro Panconesi. 2010. Almost tight bounds for rumour spreading with conductance. In Proceedings of the forty-second ACM symposium on Theory of computing. 399–408.
- Dakdouk et al. (2021) Hiba Dakdouk, Raphaël Féraud, Romain Laroche, Nadège Varsier, and Patrick Maillé. 2021. Collaborative Exploration and Exploitation in massively Multi-Player Bandits. (2021).
- Dubey et al. (2020a) Abhimanyu Dubey et al. 2020a. Cooperative multi-agent bandits with heavy tails. In International Conference on Machine Learning. PMLR, 2730–2739.
- Dubey et al. (2020b) Abhimanyu Dubey et al. 2020b. Kernel methods for cooperative multi-agent contextual bandits. In International Conference on Machine Learning. PMLR, 2740–2750.
- Dubey and Pentland (2020) Abhimanyu Dubey and Alex 'Sandy' Pentland. 2020. Differentially-Private Federated Linear Bandits. Advances in Neural Information Processing Systems 33 (2020).
- Garcelon et al. (2020) Evrard Garcelon, Baptiste Roziere, Laurent Meunier, Jean Tarbouriech, Olivier Teytaud, Alessandro Lazaric, and Matteo Pirotta. 2020. Adversarial Attacks on Linear Contextual Bandits. Advances in Neural Information Processing Systems 33 (2020).
- Gupta et al. (2019) Anupam Gupta, Tomer Koren, and Kunal Talwar. 2019. Better Algorithms for Stochastic Bandits with Adversarial Corruptions. In Conference on Learning Theory. 1562–1578.
- Harper and Konstan (2015) F Maxwell Harper and Joseph A Konstan. 2015. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4 (2015), 1–19.
- Hillel et al. (2013) Eshcar Hillel, Zohar S Karnin, Tomer Koren, Ronny Lempel, and Oren Somekh. 2013. Distributed exploration in multi-armed bandits. In Advances in Neural Information Processing Systems. 854–862.
- Jun et al. (2018) Kwang-Sung Jun, Lihong Li, Yuzhe Ma, and Jerry Zhu. 2018. Adversarial attacks on stochastic bandits. Advances in Neural Information Processing Systems 31 (2018), 3640–3649.
- Kalathil et al. (2014) Dileep Kalathil, Naumaan Nayyar, and Rahul Jain. 2014. Decentralized learning for multiplayer multiarmed bandits. IEEE Transactions on Information Theory 60, 4 (2014), 2331–2345.
- Kanade et al. (2012) Varun Kanade, Zhenming Liu, and Bozidar Radunovic. 2012. Distributed non-stochastic experts. In Advances in Neural Information Processing Systems. 260–268.
- Kao et al. (2022) Hsu Kao, Chen-Yu Wei, and Vijay Subramanian. 2022. Decentralized cooperative reinforcement learning with hierarchical information structure. In International Conference on Algorithmic Learning Theory. PMLR, 573–605.
- Kapoor et al. (2019) Sayash Kapoor, Kumar Kshitij Patel, and Purushottam Kar. 2019. Corruption-tolerant bandit learning. Machine Learning 108, 4 (2019), 687–715.
- Kolla et al. (2018) Ravi Kumar Kolla, Krishna Jagannathan, and Aditya Gopalan. 2018. Collaborative learning of stochastic bandits over a social network. IEEE/ACM Transactions on Networking 26, 4 (2018), 1782–1795.
- Korda et al. (2016) Nathan Korda, Balázs Szörényi, and Li Shuai. 2016. Distributed clustering of linear bandits in peer to peer networks. In Journal of Machine Learning Research Workshop and Conference Proceedings, Vol. 48. International Machine Learning Society, 1301–1309.
- Lalitha and Goldsmith (2021) Anusha Lalitha and Andrea Goldsmith. 2021. Bayesian Algorithms for Decentralized Stochastic Bandits. IEEE Journal on Selected Areas in Information Theory 2, 2 (2021), 564–583.
- Landgren et al. (2016) Peter Landgren, Vaibhav Srivastava, and Naomi Ehrich Leonard. 2016. Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms. In 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 167–172.
- Lattimore and Szepesvári (2020) Tor Lattimore and Csaba Szepesvári. 2020. Bandit algorithms. Cambridge University Press.
- LeBlanc et al. (2013) Heath J LeBlanc, Haotian Zhang, Xenofon Koutsoukos, and Shreyas Sundaram. 2013. Resilient asymptotic consensus in robust networks. IEEE Journal on Selected Areas in Communications 31, 4 (2013), 766–781.
- Liu and Shroff (2019) Fang Liu and Ness Shroff. 2019. Data Poisoning Attacks on Stochastic Bandits. In International Conference on Machine Learning. 4042–4050.
- Liu et al. (2021) Junyan Liu, Shuai Li, and Dapeng Li. 2021. Cooperative Stochastic Multi-agent Multi-armed Bandits Robust to Adversarial Corruptions. arXiv preprint arXiv:2106.04207 (2021).
- Liu and Zhao (2010) Keqin Liu and Qing Zhao. 2010. Distributed learning in multi-armed bandit with multiple players. IEEE Transactions on Signal Processing 58, 11 (2010), 5667–5681.
- Liu et al. (2020) Lydia T Liu, Horia Mania, and Michael Jordan. 2020. Competing bandits in matching markets. In International Conference on Artificial Intelligence and Statistics. PMLR, 1618–1628.
- Lykouris et al. (2018) Thodoris Lykouris, Vahab Mirrokni, and Renato Paes Leme. 2018. Stochastic bandits robust to adversarial corruptions. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing. 114–122.
- Madhushani and Leonard (2021) Udari Madhushani and Naomi Leonard. 2021. When to call your neighbor? strategic communication in cooperative stochastic bandits. arXiv preprint arXiv:2110.04396 (2021).
- Mansour et al. (2018) Yishay Mansour, Aleksandrs Slivkins, and Steven Wu. 2018. Competing bandits: Learning under competition. In 9th Innovations in Theoretical Computer Science, ITCS 2018. Schloss Dagstuhl-Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 48.
- Martínez-Rubio et al. (2019) David Martínez-Rubio, Varun Kanade, and Patrick Rebeschini. 2019. Decentralized cooperative stochastic multi-armed bandits. Advances in Neural Information Processing Systems (2019).
- Mitra et al. (2021) Aritra Mitra, Hamed Hassani, and George Pappas. 2021. Exploiting Heterogeneity in Robust Federated Best-Arm Identification. arXiv preprint arXiv:2109.05700 (2021).
- Newton et al. (2021) Conor Newton, AJ Ganesh, and Henry Reeve. 2021. Asymptotic Optimality for Decentralised Bandits. In Reinforcement Learning in Networks and Queues, Sigmetrics 2021.
- Pittel (1987) Boris Pittel. 1987. On spreading a rumor. SIAM J. Appl. Math. 47, 1 (1987), 213–223.
- Rosenski et al. (2016) Jonathan Rosenski, Ohad Shamir, and Liran Szlak. 2016. Multi-player bandits–a musical chairs approach. In International Conference on Machine Learning. 155–163.
- Sankararaman et al. (2019) Abishek Sankararaman, Ayalvadi Ganesh, and Sanjay Shakkottai. 2019. Social learning in multi agent multi armed bandits. Proceedings of the ACM on Measurement and Analysis of Computing Systems 3, 3 (2019), 1–35.
- Shahrampour et al. (2017) Shahin Shahrampour, Alexander Rakhlin, and Ali Jadbabaie. 2017. Multi-armed bandits in multi-agent networks. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2786–2790.
- Szörényi et al. (2013) Balázs Szörényi, Róbert Busa-Fekete, István Hegedűs, Róbert Ormándi, Márk Jelasity, and Balázs Kégl. 2013. Gossip-based distributed stochastic bandit algorithms. In Journal of Machine Learning Research Workshop and Conference Proceedings, Vol. 2. International Machine Learning Society, 1056–1064.
- Tekin and Van Der Schaar (2015) Cem Tekin and Mihaela Van Der Schaar. 2015. Distributed online learning via cooperative contextual bandits. IEEE Transactions on Signal Processing 63, 14 (2015), 3700–3714.
- Vial et al. (2021) Daniel Vial, Sanjay Shakkottai, and R Srikant. 2021. Robust multi-agent multi-armed bandits. In Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing. 161–170.
- Wang et al. (2020) Po-An Wang, Alexandre Proutiere, Kaito Ariu, Yassir Jedra, and Alessio Russo. 2020. Optimal algorithms for multiplayer multi-armed bandits. In International Conference on Artificial Intelligence and Statistics. PMLR, 4120–4129.
- Zhu et al. (2021) Zhaowei Zhu, Jingxuan Zhu, Ji Liu, and Yang Liu. 2021. Federated bandit: A gossiping approach. In Abstract Proceedings of the 2021 ACM SIGMETRICS/International Conference on Measurement and Modeling of Computer Systems. 3–4.
Appendix A Notes on appendices
The appendices are organized as follows. First, Appendix B contains the additional numerical results that were mentioned in Section 6. Next, we prove Theorem 1 in Appendix C and all results from Section 5 in Appendix D. We then provide a rigorous version of the proof sketch from Section 7 in Appendix E. Finally, Appendix F contains some auxiliary results – namely, Appendix F.1 records some simple inequalities, Appendix F.2 provides some bandit results that are essentially known but stated in forms convenient to us, and Appendices F.3-F.4 contain some tedious calculations.
For the analysis, we use , , etc. to denote positive constants depending only on the algorithmic parameters , , , , and . Each is associated with a corresponding claim, e.g., with Claim 1. Within the proofs, we use , , etc. to denote constants whose values may change across proofs. Finally, denotes the indicator function, and are expectation and probability conditioned on all randomness before the -th communication period, and denotes the current phase at time .
Appendix B Additional experiments

Figure 4. Empirical results for real data. Rows of subfigures correspond to the malicious strategy, while columns correspond to the edge probability for the random graph.

Figure 5. Empirical results for synthetic data with and mixed malicious strategies.
As mentioned in Section 6, we also considered arm means derived from real data. The setup was the same as for the synthetic means, except for two changes (as in (Vial et al., 2021)): we choose instead of , and we sample uniformly from a set of arm means derived from MovieLens (Harper and Konstan, 2015) user film ratings via matrix completion; see (Vial et al., 2021, Section 6.2) for details. The results (Figure 4) are qualitatively similar to the synthetic case.
Finally, we repeated the synthetic data experiments from Section 6 with the intermediate graph parameter and two new malicious strategies called mixed naive and mixed smart. As discussed in Section 6, these approaches use a “mixed report” where the malicious agents more frequently recommend good arms – namely, the second best when the best is inactive and the naive or smart recommendation otherwise. Results are shown in Figure 5. They again reinforce the key message that the proposed rule adapts more gracefully to networks beyond the complete graph – in this case, our blocking rule has less than half of the regret of the existing one at the horizon . Additionally, we observe that the no-blocking algorithm from (Chawla et al., 2020) has much lower regret in Figure 5 than in Figure 3, though still higher than our proposed blocking algorithm. This suggests that our algorithm remains superior even for “nicer” malicious strategies under which blocking is less necessary (in the sense that (Chawla et al., 2020) has lower regret in Figure 5 than in Figure 3).
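As a rough illustration of the mixed strategies (a sketch with hypothetical names and signature; the precise strategies are the ones specified in Section 6), the malicious report can be summarized as follows.

```python
def mixed_recommendation(best_arm, second_best_arm, active_set, fallback):
    """Sketch of the 'mixed' malicious report: recommend the second-best arm when
    the best arm is inactive, and otherwise fall back to the naive or smart
    recommendation. All names here are illustrative placeholders."""
    if best_arm not in active_set:
        return second_best_arm
    return fallback
```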
Appendix C Proof of Theorem 1
We first observe that since , we can lower bound the arm gap as follows:
Next, we show that for phases , agents aware of arms at least as good as will (1) play such arms most often in phase and (2) have such arms active thereafter. The proof is basically a noiseless version of a known bandit argument, specialized to the setting of Section 4.
Lemma 1.
Under the assumptions of Theorem 1, for any , , and such that , we have and .
Proof.
First, we prove by contradiction that : suppose instead that . Let and . Then ; otherwise, since by assumption and is the most played arm in phase , we obtain
which is a contradiction. Furthermore, there clearly exists such that
Combining these observations, and since , we obtain that
By the UCB policy and since by assumption, the previous expression implies
(25) |
Since by the assumption and , we also know
(26) |
where we define . Note this function decreases on , since
where the second inequality is . Thus, since , we know . Combined with (25) and (26), we obtain . Finally, recall and , so . Combined with , we conclude
(27) |
We now show that in each of three cases, (27) yields a contradiction.
-
•
: By definition, the left side of (27) is and the right side is , and one can verify that .
-
•
: Here and , i.e., both arms are mediocre. By definition, we thus have , so to obtain a contradiction to (27), it suffices to show . We show by induction that this holds for all .
For , note (so ) and (so ). Thus, .
-
•
: Recall that in the previous case, we showed for any . Therefore, by assumption . Since in this case, we obtain a contradiction to (27).
Thus, we have established the first part of the lemma (). To show , we suppose instead that for some . Then is well-defined. If , then , which violates the assumption of the lemma, so we assume . In this case, we know (since is minimal) and (else, because , we would have ). But since (by assumption and ), this contradicts the first part of the lemma. ∎
Next, for each , we define the event
where by convention (so is well-defined). Thus, in words says (1) is not blocking at phase , (2) no honest agent has ever been aware of arms as good as up to phase , and (3) no such has ever blocked the malicious agent . Point (2) will allow us to show that agent does not become aware of the best arm until phase (when holds). The other events in the definition of will allow us to inductively lower bound their probabilities. The next lemma establishes the base of this inductive argument.
Lemma 2.
Under the assumptions of Theorem 1, .
Proof.
We first observe that at all phases , only the second case of the malicious strategy – where the malicious agent recommends to avoid blocking – arises, which implies . Therefore, it suffices to show , where we define
To do so, we will show and .
To show , first note that by the law of total expectation, we have
Now when occurs, the malicious strategy implies , so is sampled from a set of three agents which includes , for each . Since this sampling is independent, we conclude that when occurs,
Thus, combining the previous two expressions with the definition of and iterating, we obtain
To show , first observe that when occurs, does not contact at any phase , so . Thus, it only remains to show that implies for all and . Suppose instead that holds and for some such and . Let be the earliest phase at which it occurs; note by assumption that for . Let be an agent for which it occurs, i.e., is such that for some . Since is the earliest such phase and , we know was not active for at the previous phase , so it was recommended to at this phase. By the malicious strategy and , this implies that the agent who recommended to is honest, so was active for at the previous phase, which implies (else, we contradict the minimality of ). From the assumed line graph structure, we must have and , i.e., contacted at phase . But this contradicts the definition of , which stipulates that only contacts before . ∎
To continue the inductive argument, we lower bound in terms of .
Lemma 3.
Under the assumptions of Theorem 1, for any , we have .
Proof.
The proof is somewhat lengthy and proceeds in four steps.
Step 1: Probabilistic arguments. First, we define the events
Then by the law of total expectation, we know that
(29) |
Now if occurs, then (by ) and (by ); the latter implies , so combined with the former, . Thus, implies that is sampled from a set of at most three agents containing , so . Substituting into (29), and again using total expectation, we thus obtain
Analogously, when holds, , which by similar logic gives . Therefore, combining the previous two inequalities, we have shown . Consequently, it suffices to show that .
Step 2: Event decomposition. For , we decompose , where
(30) | |||
(31) |
As a simple consequence of these definitions, we note that
(32) | ||||
(33) |
Hence, to prove , it suffices to show . For the remainder of the proof, we thus assume holds and argue holds.
Step 3: Some consequences. We begin by deriving several consequences of . First, note each contacts at phase (by ), who recommends (by the malicious strategy). Since (by ), this implies , so is most played in phase (by Lemma 1). In summary, we have shown
(34) |
Second, as a consequence of the above and Lemma 1, we can also write
(35) |
Third, we know contacts at phase (by ), who responds with a currently active arm (by the malicious strategy), so since (by ), we have
(36) |
As a consequence of (36), we see that when contacts at phase (which occurs by ), recommends some arm strictly worse than . On the other hand, by (35) and Lemma 1, we know the most played arm for in phase has index at most . Taken together, the recommendation is not most played, so
(37) |
Step 4: Completing the proof. Using the above, we prove in turn that , , and hold. For , we use proof by contradiction: if fails, we can find and such that . Let be the minimal such (for this ). Since (by ) and is minimal, we must have , i.e., was blocked for the recommendation it provided at . If , this contradicts the malicious strategy, since and the strategy avoids blocking for such and . A similar contradiction arises if and (since in this case), so we must have and . But in this case, contradicts (34).
Next, we show holds. The logic is similar to the end of the Lemma 2 proof. If instead fails, we can find and such that . Let be the minimal such and an agent with for some . Since (by ), , and is minimal, we know that was recommended to at phase . By the malicious strategy, this implies that the recommending agent (say, ) was honest. Therefore, was active for at phase , so since is minimal, . Hence, by the assumed graph structure, contacted at phase , who recommended . If , this contact cannot occur, since instead contacts at (by ) and does not contact at (by (37)). Hence, we must have , so , contradicting (36).
Finally, we prove . Suppose instead that , i.e., is blocked at . Then since , we must have for some . Let be the maximal such . Then ; otherwise, if , would have been un-blocked by phase . Therefore, the blocking rule implies
(38) |
By , , and (35), we also know
so . Combined with (38), we must have for some , which contradicts Lemma 1 (since ). ∎
Finally, we can prove the theorem. Define . Then by definition, for any . Hence, because in the problem instance of the theorem, we obtain
Thus, by Claim 23 from Appendix F.1 and since by the choice , we can write
(39) |
Let be chosen later. Then implies (since ). Thus, we can write
By definition of and , along with Lemmas 2 and 3, we also know
By (7) from Section 4.1, we know . Combined with the previous three bounds, and letting denote the constant , we thus obtain
(40) |
We now consider three different cases, each with a different choice of .
- •
-
•
If , let . Then , so . Furthermore, we know , which implies
Next, observe that for this case of , so . Thus, we can choose this in (40) and combine with the above bounds to lower bound regret as .
-
•
If , choose . Then , so (40) implies .
Hence, in all three cases, we have shown for some absolute constant . This establishes the theorem.
We return to state and prove the aforementioned Claim 1. We note the analysis is rather coarse; our only goal here is to establish a scaling (not optimize constants).
Claim 1.
Under the assumptions of Theorem 1, we have , where .
Proof.
If , the bound holds by nonnegativity. If , then since and by assumption in Theorem 1, we know , which implies . Thus, only the case remains. By Claim 23 from Appendix F.1 and ,
Thus, it suffices to show that . Suppose instead that this inequality fails. Then since the left side is an integer, we have by assumption. Therefore, we can find such that (else, we violate the assumed inequality). By this choice of and the assumed inequality, we can then write
We can lower bound the right side by (else, applying Claim 20 from Appendix F.1 with , , and yields , a contradiction), which is further bounded by . Combined with the fact that rewards are deterministic, , and in Theorem 1, we obtain
(41) |
Next, let be any other arm which is active for at time . Then clearly
(42) | |||
(43) |
By (41), (43), the fact that , and the UCB policy, we conclude . Since , this further implies . But we also know that . Combining these inequalities gives , a contradiction. ∎
Appendix D Proofs from Section 5
D.1. Proof of Theorem 2
Fix an honest agent . Let , where we recall from the proof sketch that
(44) | |||
(45) |
Let be chosen later. Denote by and the suboptimal sticky and non-sticky arms, respectively, for agent . Then by Claim 23 from Appendix F.1, we can decompose regret as , where
(46) | |||
(47) |
and where whenever by convention. Thus, we have rewritten regret as the sum of four terms: , which accounts for regret before the best arm spreads; , the regret due to sticky arms after the best arm spreads; , regret from non-sticky arms after the best arm spreads but before phase ; and , regret from non-sticky arms after this phase. The subsequent lemmas bound these terms in turn.
Lemma 4.
Under the assumptions of Theorem 2, for any and , we have
Proof.
Lemma 5.
Under the assumptions of Theorem 2, for any and , we have
(48) |
Proof.
Lemma 6.
Under the assumptions of Theorem 2, for any , , and , we have
Proof.
The proof is nontrivial; we defer it to the end of this sub-appendix. ∎
Lemma 7.
Under the assumptions of Theorem 2, for any , , and , we have
Proof sketch.
The proof follows the same logic as that of (Vial et al., 2021, Lemma 4), so we omit it. The only differences are (1) we replace (the number of malicious agents connected to for the complete graph) with , and (2) we use Claim 19 from Appendix F.1 to bound the summation in (Vial et al., 2021, Lemma 4). We refer the reader to the Theorem 2 proof sketch for intuition. ∎
Additionally, we note the sum can be naively bounded as follows.
Lemma 8.
Under the assumptions of Theorem 2, for any and , we have
Proof.
The proof follows the exact same logic as that of Lemma 5, so it is omitted. ∎
We can now prove the theorem. First, we use the regret decomposition , Lemmas 4-7, and the fact that to write
(52) | ||||
(53) |
Now choose . Then
(54) | ||||
(55) |
where the second inequality is due to Lemma 4. Furthermore, by Claim 18 from Appendix F.1, we know that . Combined with , , and the choice of , we can thus write
Therefore, we can bound (52) as follows:
(56) | ||||
(57) |
where the second inequality holds by and . Combining the above yields
(58) | ||||
(59) |
Alternatively, we can simply use Lemmas 4, 5, and 8 to write
(60) |
Therefore, combining the previous two expressions and again invoking Lemma 4 to bound the additive terms in (60) by those in (58), we obtain the desired bound.
Thus, it only remains to prove Lemma 6. We begin by using some standard bandit arguments recounted in Appendix F.2 to bound in terms of a particular tail of .
Claim 2.
Under the assumptions of Theorem 2, for any , , and , we have
(61) | ||||
(62) |
Proof.
If , we can naively bound , which completes the proof. Thus, we assume (which will allow us to divide by later). For any , we first write
(63) | |||
(64) | |||
(65) |
By Corollary 6 from Appendix F.2, (63) is bounded by . By Claim 22 from Appendix F.2 and , (64) is bounded by . For (65), Claim 22 similarly gives
(66) |
which clearly implies the following bound for (65):
(67) |
For the remaining probability term, by Markov’s inequality, we have
(68) |
Hence, combining the previous two inequalities, and since , we can bound (65) by
(69) |
Finally, combining these bounds, then multiplying by , summing over , and using and , we obtain the desired result. ∎
To bound (62), we consider two cases defined in terms of the following inequalities:
(70) | |||
(71) |
Roughly speaking, when all of these inequalities hold, then is large enough to ensure that the event in (62) is unlikely. The next claim makes this precise.
Claim 3.
Proof.
If , the left side is zero and the bound is immediate, so we assume . First note that if , then since by definition and by (70), . By definition with , this further implies . Thus, when and , we have , which by definition of implies . Therefore, we can write
(73) |
Now by definition, implies that . Also by definition, implies that for some and , but . Thus,
(74) | |||
(75) |
Now fix and as in the double summation. Again using , the blocking rule implies that if , , and , then . Since , this means . Hence, there must exist some such that and . Thus, taking another union bound,
(76) | |||
(77) |
Next, note implies that for any and , we have , so by definition of . Therefore, for any ,
(78) |
Now let , , and . Then by definition and (71), respectively. Therefore, we can use Corollary 5 from Appendix F.2 to obtain
(79) |
Combining the above five inequalities, then using Claim 19 from Appendix F.1 and (71), we obtain
so multiplying both sides by completes the proof. ∎
On the other hand, when one of (70)-(71) fails, we can show that is bounded, and thus we bound the term in (62) by a constant and the probability term by , as shown in the following claim.
Claim 4.
Proof.
Finally, Lemma 6 follows by combining the previous three claims.
D.2. Proof of Corollary 1
As discussed in the proof sketch, we will couple with the noiseless process. We define this process as follows: let be i.i.d. random variables for each , and
(81) |
For the coupling, we first define
(82) |
Next, for each and , let . Note this set is nonempty, and since is a deterministic function of , which is independent of , is for each . Hence, we can set
without changing the distribution of . This results in a coupling where the noiseless process dominates the noisy one, in the following sense.
Claim 5.
For the coupling described above, for any .
Proof.
We use induction on . For , we simply have . Now assume ; we aim to show that if , then . We consider two cases, the first of which is straightforward: if , then by the inductive hypothesis, so since by definition and increases monotonically, we obtain , as desired.
For the second case, we assume and . Set and recall by definition. From the coupling above, we know and . Since in the present case, we have as well. Hence, because by the inductive hypothesis, by definition, and is increasing, we obtain . We have thus shown and , so by Definition 1. Finally, using and again appealing to monotonicity, we conclude that . ∎
We can now relate the rumor spreading times of the two processes. In particular, let (as in Definition 1) and . We then have the following.
Claim 6.
For any and , we have .
Proof.
Let and for each . Then clearly
(83) |
For the first term in (83), by definition of and Claim 5, we have
(84) |
For the second term in (83), we first observe that for any ,
where the last inequality holds by assumption on and . Thus, by the union bound, we can write
(85) | ||||
(86) |
Hence, because , we can iterate this argument to obtain that for any ,
(87) | ||||
(88) |
In particular, . Combining with (83) and (84) completes the proof. ∎
To bound the tail of , we will use the following result.
Claim 7 (Lemma 19 from (Chawla et al., 2020)).
Under the assumptions of Corollary 1, there exists an absolute constant such that, for any , we have .
Using the previous two claims, we can prove a tail bound for .
Claim 8.
Under the assumptions of Corollary 1, there exists an absolute constant such that, for any , we have , where we define
Proof.
Since is -regular with by assumption, we have as well. Therefore, setting , we know , which implies
Consequently, if we define and , we can write
(89) | ||||
(90) |
Because , , and , we also know
(91) |
Hence, ; combined with , we can apply Claim 6 to obtain
(92) |
On the other hand, (91) implies , so by definition of ,
Therefore, by Claim 7, we know that
Finally, notice that , so , thus by (91),
Hence, substituting the previous two inequalities into (92) and using completes the proof. ∎
We now bound . First, we define
Notice that for any , we have (else, we can invoke Claim 20 from Appendix F.1 with , , and to obtain , a contradiction). We write
(93) |
Now for any and , we use , the definition of , and Claim 8 to write
Therefore, for any such , we obtain
Furthermore, by Claim 18 in Appendix F.1 and , we know that
Therefore, by the previous two inequalities and (93), and since , we have shown
where the second inequality follows from Claim 19 from Appendix F.1, , and . Because is a constant, the right side is . Therefore, we have shown
Hence, by definition of , we obtain . Combining this bound with Theorem 2 completes the proof of Corollary 1.
D.3. Proof of Corollary 2
Similar to the analysis in Appendix D.1, we can use the decomposition , along with Lemmas 4 and 5, to bound regret as follows:
(94) |
Next, for each , let be the indicator that was active after . Then as in the proof of Lemma 5, we can use Claim 22 and Corollary 6 from Appendix F.2 to write
(95) |
(The only difference from the proof of Lemma 5 is that, when applying Claim 22, we write
where we can multiply by because the left side is also zero when .) Combining (95) with the definitions of and and using , we thus obtain
We claim, and will return to prove, that when ,
(96) |
Assuming (96) holds, we can combine the previous two inequalities and substitute into (94) to obtain . Bounding as in Lemma 4 yields the sharper version of Theorem 2, and further bounding as in Appendix D.2 sharpens Corollary 2.
To prove (96), we first show . Suppose instead that . Then we can find distinct arms , and corresponding phases , such that was active at phase for each . Without loss of generality, we can assume each is minimal, i.e., . We consider two cases (which are exhaustive since ) and show that both yield contradictions.
-
•
: We consider two further sub-cases.
-
–
, i.e., the best arm is sticky. Then , so are all active at phase . But all of these arms are non-sticky and only two such arms are active per phase.
-
–
. Here are both active at phase , as is (by definition of ). But since and are suboptimal, we again have three non-sticky active arms.
-
•
: We can assume (after possibly relabeling) that . Thus, by minimality of , was not active at phase but became active at , so it was recommended by some neighbor at . But since , is honest, and since , the best arm was most played for in phase , so would not have recommended .
Thus, holds. Combined with the fact that by definition, at most terms are nonzero in the summations on the left side of (96). Since by the assumed ordering of the arm means, this completes the proof.
D.4. Coarse analysis of the noisy rumor process
For completeness, we provide a coarser though more general bound for than the one derived in Appendix D.2. To begin, let and denote probability and expectation conditioned on . For each , define the random phase . Note that and . We then have the following tail bound.
Claim 9.
For any , we have .
Proof.
We use induction on . For , ensures for any , so the bound is immediate. Next, assume the bound holds for . We first write
Thus, by the inductive hypothesis, it suffices to bound the first term by . We first write
Now suppose . By Assumption 1, we can find and such that . Then for to occur, it must be the case that, for each , the event did not occur. Therefore, we have
(97) |
By the law of total expectation, we can write
(98) | |||
(99) |
Since is and is , we have
Therefore, combining the previous two expressions and iterating, we obtain
Next, we have a simple technical claim.
Claim 10.
Let . Then for any , .
Proof.
We can now bound . The analysis is similar to Appendix D.2. We first write
By the previous two claims, we know
By Claim 18 from Appendix F.1, we also have
Therefore, combining the previous three expressions, we obtain
where the second inequality uses , , and Claim 19 from Appendix F.1. Hence, we have shown . Note this bound cannot be improved in general – for example, if is a line graph, it becomes , so since , we have , which is the correct scaling (up to log terms) in Definition 1.
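For intuition about this scaling, the following small Python simulation uses a simplified push-gossip stand-in for the noisy rumor process (the dynamics and the success probability q are our own simplifications, not Definition 1's exact mechanics): each phase, every informed agent contacts one uniformly random neighbor, which becomes informed with probability q.

```python
import random

def noisy_rumor_time(adj, source, q, rng):
    """Return the first phase at which all agents are informed under a simple
    push-gossip model: each phase, every informed agent contacts one uniformly
    random neighbor, which becomes informed with probability q."""
    informed = {source}
    phase = 0
    while len(informed) < len(adj):
        phase += 1
        newly = set()
        for agent in informed:
            target = rng.choice(adj[agent])
            if target not in informed and rng.random() < q:
                newly.add(target)
        informed |= newly
    return phase

# Line graph on n agents, as in the discussion above.
n = 20
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}
samples = [noisy_rumor_time(line, 0, 0.5, random.Random(s)) for s in range(200)]
print(sum(samples) / len(samples))  # empirically grows roughly linearly in n on the line graph
```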
Appendix E Details from Section 7
In this appendix, we formalize the analysis that was discussed in Section 7. In particular, the subsequent five sub-appendices provide details on the respective five subsections of Section 7.
E.1. Details from Section 7.1
Let . Note that since and by assumption,
so is well-defined and . Next, for any , define
(100) |
Since , , and , we are guaranteed that and for large , so the following is well-defined:
(101) |
Also note (since ). Next, recall from Section 7.1 that
(102) |
Hence, if we let denote the possible active sets for agent (i.e., for any phase ), we can write
(103) |
Consequently, by the union bound, we obtain
(104) |
The next two claims bound the two summands on the right side.
Claim 11.
Under the assumptions of Theorem 2, for any , , and , we have
(105) |
Proof.
If , the claim is immediate. Otherwise, implies for some . Thus, by the union bound,
(106) |
Fix . Then implies (else, by definition of , ). Since (by ), we conclude , so there exists such that
(107) |
Combining and using the union bound and with by definition, we obtain
(108) |
Now fix as in the summation. Observe that since and , we have
Therefore, for any such , we can apply a basic bandit tail bound (namely, Corollary 5 from Appendix F.2 with the parameters , , and ) to obtain
Substituting into (108) and using Claim 19 from Appendix F.1 (which applies since ), we obtain
Substituting into (106) and using completes the proof. ∎
Claim 12.
Under the assumption of Theorem 2, for any , , and ,
(109) |
Proof.
By definition, we have
(110) |
As in the proof of Claim 11, we know that
(111) |
where the final inequality holds since by assumption , which implies
(112) |
Thus, implies , so by the union bound,
(113) | ||||
(114) |
Now fix as in the double summation. Then similar to the proof of Claim 11,
(115) | ||||
(116) | ||||
(117) |
where the second bound follows from applying Claim 21 from Appendix F.2 with , , , , and ; note this claim applies since by assumption ,
Next, observe that for any , by definition of , , and ,
Similar to the proof of Claim 11, we can then use Claim 19 to obtain
Combining with (113) and (115) completes the proof, since
Finally, we provide the tail for .
Lemma 9.
Under the assumptions of Theorem 2, for any ,
(118) |
E.2. Details from Section 7.2
Recall , where and . Hence, for all large , we have
(121) |
Thus, the following is well-defined:
(122) |
Now recall from Section 7.2 that , and
The next lemma provides a tail bound for this random phase.
Lemma 10.
Under the assumptions of Theorem 2, for any ,
(123) |
Proof.
We first use the union bound to write
(124) |
Fix and . Suppose holds. Then ; else, we can find such that (i.e., for ), contradicting . Hence, we have two cases: , or for some and . In the former case, ; in the latter, . Thus,
(125) | ||||
(126) |
Now given that , is sampled uniformly from a set of at most elements which includes , so . Substituting above and iterating yields
(127) |
where the final inequality uses . Combining (124) and (127) and computing a geometric series, we obtain
(128) |
Finally, using , , for any and , and , we obtain the desired bound. ∎
E.3. Details from Section 7.3
We begin with some intermediate claims.
Claim 13.
If the assumptions of Theorem 2 hold, then for any and , we have
Proof.
The first inequality holds by definition of and assumption . The second holds since is the best arm in and in the algorithm. ∎
Claim 14.
If the assumptions of Theorem 2 hold, then for any and ,
(129) |
Proof.
If or , the bound is immediate, so we assume and for the remainder of the proof. Under this assumption, there must exist a phase at which the mean of the best active arm reaches a new strict minimum since , i.e., . Let denote the number of phases at which this occurs and the (ordered) phases themselves; formally,
(130) |
The remainder of the proof relies on the following three inequalities:
(131) |
The first inequality holds since by definition, so are distinct arms; since there are of these arms and in total, . For the second, we have when and when (if the latter fails, we contradict the definition of ). For the third, note by construction, so if , the bound holds with equality; else, , so if the bound fails,
(132) |
which is a contradiction, since is the minimal element of the set at right. Hence, (131) holds. Combined with Claim 13 (note , as required), we obtain
(133) | ||||
(134) |
where the last inequality uses . ∎
As a simple corollary of the previous two claims, we have the following.
Corollary 3.
If the assumptions of Theorem 2 hold, then for any and ,
(135) |
Next, inspecting the analysis in Section 7.3, we see that for large . Thus, the following is well-defined:
(137) |
As discussed in Section 7.3, we can now show that no new accidental blocking occurs at late phases, at least among pairs of honest agents that have recently communicated.
Claim 15.
Under the assumptions of Theorem 2, if , , and for some and , then .
Proof.
Suppose instead that . Then by the algorithm,
(138) |
Since , this implies . We then observe the following:
-
•
Since , the definition of implies .
-
•
Again using , Claim 13 implies .
-
•
Since , (138) implies , so .
-
•
Since , the algorithm implies , so .
-
•
Since , Corollary 3 implies .
Stringing together these inequalities, we obtain
(139) |
i.e., , which contradicts . ∎
Finally, we prove that honest agents eventually stop blocking each other.
Lemma 11.
Under the assumptions of Theorem 2, if , , and , then for any and , .
Proof.
Fix and ; we aim to show . This clearly holds if . Otherwise, (the latest phase up to and including that blocked ) is well-defined. We consider two cases of .
The first case is . Let denote the first phase after that blocked . Combined with the definition of , the algorithm implies
(140) |
Since , the definition of implies
(141) |
so as well. Combined with by definition, holds by (140).
The second case is . By assumption, . Hence, by definition of , for some . Note that and by assumption (and by monotonicity of in the latter case). Hence, we can apply Claim 15 (with and in the claim) to obtain . This is a contradiction. ∎
E.4. Details from Section 7.4
Claim 16.
Proof.
Since conditions on , we can prove the identity separately in the cases and . The identity is immediate in the former case. For the latter, we have
(143) | ||||
(144) | ||||
(145) | ||||
(146) |
Next, note that since as , the following is well-defined:
(147) |
As in Section 7.4, we let be the agents with the best arm active at phase .
Claim 17.
Under the assumptions of Theorem 2, if and , then .
Proof.
Suppose instead that for some and . Since , we know . Hence, because by definition of , we have . Combined with , we get , which contradicts . ∎
Finally, recall and , where is the noisy rumor process from Definition 1.
Lemma 12.
Under the assumptions of Theorem 2, if , , and , then
(148) |
Proof.
Let and
(149) |
Then it suffices to prove the following:
(150) |
For the first inequality in (150), we begin by proving
(151) |
To do so, we show . Assume and hold; by definition of , we aim to show . We use induction. The base of induction () holds by , , and Claim 17. Given the inductive hypothesis , we have by the algorithm and by definition, so we again use the assumption that holds and Claim 17 to obtain . Hence, (151) holds, so
(152) |
For the second inequality in (150), we claim, and will return to prove, the following:
(153) |
Assuming (153) holds, we obtain
(154) |
Hence, it only remains to prove (153). We show by induction on when holds, for each . For , recall by Assumption 3, so . Now assume for some . Let ; we aim to show that , i.e., that . By (149), we have two cases to consider:
-
•
: By the inductive hypothesis, as well, so by definition. Combined with and Claim 17 (recall and when holds, so the claim applies), this implies , so by the algorithm.
-
•
: First observe that since and , we can apply Lemma 11 on the event (with replaced by in that lemma) to obtain . On the other hand, recall that is in Definition 1 and is for the Section 7.4 sampling, so we can realize the former as . Hence, by assumption and definition , we obtain
(155) In summary, we have shown that and . Hence, by the Section 7.4 sampling, we conclude . Let denote this honest agent. Then by the inductive hypothesis, we know that , i.e., that . By Claim 17, this implies , so by the algorithm, .
Finally, for the equality in (150), note that is independent of the randomness before . By , the fact that and are both random variables and and are both sampled uniformly from , has the same distribution as . Also, note that is independent of the randomness before . Finally, by definition of , implies that . These observations successively imply
(156) |
E.5. Details from Section 7.5
Combining the lemmas of the above sub-appendices, we can bound the tail of the spreading time.
Theorem 3.
Proof.
For any , we can use Claim 32 to obtain
(158) | |||
(159) |
In particular, implies that we can use Lemma 9 with replaced by . Combined with (158) (namely, ), this yields
(160) |
Since by (158), we can use Lemma 10 (with replaced by ) and (159) to obtain
(161) |
Furthermore, using the bounds , , and from (158) (the last of which implies , since by ), we can use Lemma 12 to get
(162) |
Finally, by the union bound, we have
(163) | ||||
(164) |
so combining the previous four inequalities yields the desired result. ∎
Finally, as a corollary, we can bound .
Corollary 4.
Proof.
We first observe
(167) |
For the first term in (167), using Claim 18 from Appendix F.1, we compute
(168) |
For the second term in (167), define the constant
(169) |
Then by Theorem 3, we have
(170) | ||||
(171) |
For (170), we simply use nonnegativity to write
(172) |
For the second term in (171), we use Claim 18 and and by the assumptions of Theorem 2 to write
(173) | |||
(174) |
Finally, combining the above bounds completes the proof. ∎
Appendix F Other proofs
F.1. Basic inequalities
In this sub-appendix, we prove some simple inequalities used frequently in the analysis.
Claim 18.
For any , we have
and for any and , we have .
Proof.
For the first pair of inequalities, observe when , and for ,
(175) |
For the second pair of inequalities, we first observe
(176) |
where the equality holds for some by the mean value theorem and the second inequality is . By analogous reasoning, one can also show , so the second pair of inequalities holds. The third inequality holds with equality when , and for , the lower bound in (176) and imply , so since and are integers, . Finally, using , , and , we can write
Claim 19.
For any and , .
Proof.
Since and , we can write
Claim 20.
For any such that , .
Proof.
Multiplying and dividing the right side of the assumed inequality by , we obtain . We can then loosen this bound to get , or . Plugging into the term of the assumed inequality yields . Raising both sides to the power establishes the first bound. The second bound follows by using . ∎
Remark 14.
We typically apply Claim 20 with constant but not. It allows us to invert inequalities of the form to obtain .
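As a cruder illustration of this kind of inversion (our own example, not Claim 20's exact constants or technique): since ln x ≤ x/e for all x > 0, an inequality of the form x ≤ a + b ln x with a, b > 0 and b < e already implies

```latex
x \;\le\; a + \frac{b}{e}\,x
\quad\Longrightarrow\quad
x\Bigl(1 - \frac{b}{e}\Bigr) \;\le\; a
\quad\Longrightarrow\quad
x \;\le\; \frac{a}{1 - b/e}.
```

This crude version requires b < e, whereas Claim 20 provides the form actually used in the analysis.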
F.2. Bandit inequalities
Next, we state and prove some basic bandit inequalities. The proof techniques are mostly modified from existing work (e.g., (Auer et al., 2002)), but we provide the bounds in forms useful for our setting.
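For orientation, the two ingredients these proofs repeatedly combine are the UCB index and Hoeffding's inequality, recalled here in generic notation (the exact constants in Algorithm 1 may differ):

```latex
\mathrm{UCB}_{k}(t) \;=\; \hat{\mu}_{k,s} + \sqrt{\frac{2\ln t}{s}}
\quad \text{(empirical mean of arm $k$ after $s$ pulls, plus an exploration bonus)},
\qquad
\mathbb{P}\bigl(\hat{\mu}_{k,s} - \mu_k \ge \varepsilon\bigr) \;\le\; e^{-2 s \varepsilon^{2}}
\quad \text{(Hoeffding's inequality, i.i.d. rewards in $[0,1]$)}.
```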
Claim 21.
Suppose that , , , and satisfy
Let be such that , i.e., . Then for any , we have
Proof.
For , let be an i.i.d. sequence distributed as , and for , let
(177) |
denote the empirical mean and UCB index of if has pulled the -th arm times before . Then Algorithm 1 implies that if and , we must have
(178) |
Next, note that if , then by monotonicity, as well. Combined with the fact that by definition, we conclude that implies . Similarly, implies (since ). Combined with (178), we obtain that if the event in the statement of the claim occurs, it must be the case that
Therefore, by the union bound, we obtain
(179) |
Now fix and as in the double summation. We claim implies
Indeed, if instead both inequalities fail, then by choice of and the assumption of the claim,
(180) | ||||
(181) |
which is a contradiction. Thus, by the union bound, Hoeffding’s inequality, and , we obtain
so plugging into (179) completes the proof. ∎
Corollary 5.
Suppose that , , and satisfy . Let be such that , i.e., . Then for any , we have
Proof.
Using by definition and applying Claim 21 with and ,
Corollary 6.
For any , , and , we have
Proof.
Claim 22.
For any , such that , and ,
Proof.
Set . Then clearly
Now suppose the right side strictly exceeds . Then since the right side is an integer, we can find times such that and . Let denote the largest such . Because occurred at least times before , we know . But since , this implies , which is a contradiction. ∎
Finally, we recall a well-known regret decomposition.
Claim 23.
The regret defined in (5) satisfies .
Proof.
See, e.g., the proof of (Lattimore and Szepesvári, 2020, Lemma 4.5). ∎
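In generic bandit notation (which may differ cosmetically from (5)), the decomposition reads

```latex
\mathbb{E}\bigl[R_T\bigr] \;=\; \sum_{k=1}^{K} \Delta_k\,\mathbb{E}\bigl[T_k(T)\bigr],
\qquad \Delta_k \;=\; \mu_1 - \mu_k,
```

where arm 1 is optimal and T_k(T) denotes the number of times arm k is played up to time T.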
F.3. Calculations for the early regret
In this sub-appendix, we assume , , , , , , and are chosen as in Theorem 2. Recall , , etc. denote constants associated with Claim that only depend on , , , , and .
Claim 24.
There exists such that and .
Proof.
This follows from the choice in Theorem 2. ∎
Claim 25.
There exists such that .
Claim 26.
There exists such that for any ,
(184) |
Proof.
By Claim 18, we can find depending only on such that whenever . Hence, for any , we know , so
(185) |
where we also used . The claim follows if we set and . ∎
Claim 27.
There exists such that for any ,
(186) |
Proof.
By Claim 26, we can find constants such that for any ,
where we also used . Furthermore, since by assumption, we can find depending only on , , , and , such that for any , we have . Combined with the previous inequality and the choice in Theorem 2, we obtain
Hence, if we set and , the second inequality in (186) holds for . Finally, define . Then and , so . Thus, for any , we know , so by Claim 18, , i.e., the first inequality in (186) holds. ∎
Claim 28.
There exists such that for any , .
Proof.
Claim 29.
There exists such that for any , .
Proof.
We first upper bound . By Claim 24, we can find such that when . Let , where is from Claim 26, and . Then , so we can find such that
where the first inequality uses Claim 26; the second inequality uses , , and Claim 18; and the equality defines . On the other hand, if , where is from Claim 27, we can find such that
where the inequality uses Claim 27 and the equality defines . Finally, by assumption , we can find such that, for any , we have . Combined with the previous two inequalities, we obtain that for and any , . Therefore, by definition of (137), we conclude that , where . Therefore, if we set , we obtain that for any , (since ) and (since ), so stringing together the inequalities, we conclude . ∎
Claim 30.
There exists such that .
Proof.
Let and , where and are the constants from Claim 26. Also define . Then by definition of (147), it suffices to show . Fix such a and suppose instead that . Since , we know , so
(187) |
Hence, because , we have , so by Claims 26 and 18, respectively,
(188) |
Rearranging and using the assumption , this implies . Hence, applying Claim 20 with , , and , we obtain
But since , we have shown , which contradicts (187). ∎
Claim 31.
There exists such that, for any ,
(189) |
Proof.
Let and , and suppose instead that (189) fails for some . Then we can write
(190) |
Since and , we know that , so by Claim 24. Since and , we also have , or . Combining these two bounds with (190), we conclude
(191) |
Applying Claim 20 with , , and , we obtain . But since , this contradicts the assumed lower bound on . ∎
Claim 32.
Define and
Then for any such that , we have
(192) | |||
(193) |
F.4. Calculations for the later regret
In this sub-appendix, we assume , , , , , , and are chosen as in Theorem 2. Recall , , etc. denote constants associated with Claim that only depend on , , , , and .
Claim 33.
There exists such that, for any ,
(194) |
Proof.
Similar to Claim 24 from Appendix F.3, we can find constants such that for any , . If , then , so , and the right side of (194) will hold for . If instead , then , so if the left side of (194) holds, we can write
Hence, applying Claim 20 with , , and , we obtain
where the last inequality holds for any . ∎
Claim 34.
There exists such that, for any ,
Proof.
Claim 35.
There exists such that, for any ,
Proof.
Fix such that . Note that if , then , so since , we must have . This implies , so the claimed bound is immediate. Hence, we assume moving forward. We first observe
where the inequalities use , the assumed upper bound on , Claim 18, and (since , , and ), respectively. Applying Claim 20 with , , and , and noting that (since , , and ), we obtain for any . Therefore, by assumption , we obtain that for any ,
Claim 36.
There exists such that, for any ,
(195) |
Proof.
We first eliminate the corner case where . In this case, one of and must hold. If the former holds, then , and if the latter holds, then , so . In both cases, we can clearly find satisfying the right side of (195).
Next, we assume and . By monotonicity, the former implies for any . For any such , by definition and , we can then write
Therefore, since by assumption in Theorem 2, we obtain
For the summation at right, we use (which implies ) and by assumption in Theorem 2, along with Claim 19, to write
Using (by assumption), we also know
Combining the previous three inequalities, we then obtain
Therefore, if the left side of (195) holds, we are guaranteed that
or, after rearranging, then using and ,
where the second inequality holds for large and the third uses and . Taking logarithms and choosing appropriately in terms of , , and yields the right side of (195). ∎