
Sequential anomaly identification with observation control under generalized error metrics

Aristomenis Tsopelakos, Georgios Fellouris

This work was presented in part in ISIT ’20 [1]. The authors are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign.
Abstract

The problem of sequential anomaly detection and identification is considered, where multiple data sources are monitored simultaneously and the goal is to identify in real time those, if any, that exhibit “anomalous” statistical behavior. An upper bound is postulated on the number of data sources that can be sampled at each sampling instant, but the decision maker selects which ones to sample based on the already collected data. Thus, in this context, a policy consists not only of a stopping rule and a decision rule, which determine when sampling should be terminated and which sources to identify as anomalous upon stopping, but also of a sampling rule, which determines which sources to sample at each time instant subject to the sampling constraint. Two distinct formulations are considered, which require control of different, “generalized” error metrics. The first tolerates a certain user-specified number of errors, of any kind, whereas the second tolerates distinct, user-specified numbers of false positives and false negatives. For each of them, a universal asymptotic lower bound on the expected time for stopping is established as the error probabilities go to 0, and it is shown to be attained by a policy that combines the stopping and decision rules proposed in the full-sampling case with a probabilistic sampling rule that achieves a specific long-run sampling frequency for each source. Moreover, the first-order asymptotic approximation to the optimal expected time for stopping is compared in simulation studies with its counterpart in a finite regime, and the impact of the sampling constraint and of the tolerance to errors is assessed.

Index Terms:
Anomaly identification, generalized error metric, sampling design, asymptotic optimality.

I Introduction

In many scientific and engineering applications, measurements from various data sources are collected sequentially, with the aim of identifying in real time the sources, if any, with “anomalous” statistical behavior. In internet security systems [2], for example, the data streams may refer to the transition rate of a link, where a meager rate warns of a possible intrusion. In finance, the data streams may refer to prices in the stock market [3], where we need to detect an unusual rate of return of a stock price.

Such applications, among many others, motivate the formulation of sequential multiple testing problems where multiple data sources are monitored sequentially, a binary hypothesis testing problem is formulated for each of them, and the goal is to solve all these testing problems simultaneously, as quickly as possible, while controlling certain error metrics. In some works, e.g., [4, 5, 6], it is assumed that all sources can be monitored at each sampling instant until stopping. In others, e.g., [7, 8, 9, 10, 11, 12, 13, 14, 15], a sampling constraint is imposed, according to which it is possible to observe only a subset of the sources at each time instant, and the decision maker selects which ones to sample based on the already collected data. In the latter case, in addition to a stopping rule and a decision rule that determine when to stop sampling and which sources to identify as anomalous upon stopping, it is also required to specify a sampling rule, which determines the sources to be sampled at each time instant until stopping. This sampling constraint leads to a sequential multiple testing problem with adaptive sampling design, which lies in the field of “sequential testing with controlled sensing”, or “sequential design of experiments” [16], [17], [18], [19]. Hence, methods and results from these works are applicable as well.

In the above works, the problem formulation requires, implicitly or explicitly, control of the “classical” misclassification error rate, i.e., the probability of at least one error of any kind, or of the two “classical” familywise error rates, i.e., the probabilities of at least one false alarm and of at least one missed detection. However, such error metrics can be impractical, especially when there is a common stopping time at which the decision is made for all data sources, as in the above papers. Indeed, even with a relatively small number of data sources, a single “difficult” hypothesis may determine, and even inflate, the overall time required for the decisions to be made.

This inflexibility has motivated the adoption of more lenient error metrics, which are prevalent in the fixed-sample-size literature on multiple testing [20, 21, 22]. As shown in [22], the methodology and asymptotic optimality theory in [6], [9] for the sequential identification problem under classical familywise error rates remain valid with other error metrics, as long as the latter are bounded above and below, up to a multiplicative constant, by the corresponding classical familywise error rates. (The authors in [22] focus on the full-sampling case, but exactly the same arguments apply in the case of sampling constraints.) This is indeed the case for many error metrics, such as the false discovery rate (FDR) and the false non-discovery rate (FNR) [23]. However, it is not the case for others, such as the generalized misclassification error rate, which requires control of the probability of at least $k$ errors of any kind, and the generalized familywise type-I and type-II error rates [24, 25, 26], which require control of the probabilities of at least $k_1$ false alarms and of at least $k_2$ missed detections. As shown in [27] in the case of full sampling, that is, when all data sources are continuously monitored and there are no sampling constraints, the sequential identification problem with these generalized error metrics, when $k>1$ or $k_1+k_2>2$, requires a distinct methodology and poses additional mathematical challenges compared to the corresponding classical versions, where $k=1$ or $k_1=k_2=1$.

In the present work, we address the same sequential anomaly identification problem as in [27], but in the presence of sampling constraints. That is, unlike [27], we assume that it is not possible to observe all data sources at all times. Instead, we impose an upper bound on the average number of samples collected from all sources up to stopping. For each of the two resulting problem formulations, i.e., for each of the two generalized error metrics under consideration, (i) we establish a universal lower bound on the optimal expected time for stopping, to a first-order asymptotic approximation as the corresponding error probabilities go to zero, (ii) we show that this lower bound is attained by a policy that utilizes the stopping and decision rules in [27], as long as the long-run sampling frequency of each source is greater than or equal to a certain value, which is computed explicitly and depends both on the source and on the true subset of anomalous sources, (iii) we show that the latter can be achieved by a probabilistic sampling rule.

The methodology and the theory that we develop in the present work are based on an interplay of techniques, ideas, and methods from the sequential multiple testing problem with full sampling and generalized error control in [27], and the sequential multiple testing problem with sampling constraints and “classical” familywise error control in [9]. However, various interesting special features arise in the present work, from both a technical and a methodological point of view. First, certain technical challenges force us to strengthen some of our assumptions compared to [9]. Specifically, we require a stronger moment assumption on the log-likelihood ratio statistics than the finiteness of the Kullback-Leibler information numbers, and we have to postulate a stricter sampling constraint (which is still weaker than the usual constraint in the literature that fixes the number of sources that are sampled at each time instant). Second, while in [9] it is shown that there is no need for “forced exploration”, as in the general sequential testing problem with controlled sensing [16, 17], this turns out to be needed in the present work.

The remainder of the paper is organized as follows. In Section II, we formulate the two problems we consider in this work. In Section III, we state and solve two auxiliary max-min optimization problems, which play a key role in the formulation of our main results. In Section IV, we state the universal asymptotic lower bound for each of the two problems. In Section V, for each of the two problems, we introduce a family of policies that satisfies the error constraints, and we state a criterion on the sampling rule that guarantees the asymptotic optimality of such a policy. In Section VI, we present a class of probabilistic sampling rules for which the aforementioned criterion is satisfied. In Section VII, we present a simulation study that illustrates our theoretical results, and in Section VIII, we state our conclusions and discuss future research directions.

We end this section with some notation that we use throughout the paper. We use $:=$ to indicate the definition of a new quantity and $\equiv$ to indicate the equivalence of two notions. We set $\mathbb{N}:=\{1,2,\ldots\}$, $\mathbb{N}_{0}:=\{0\}\cup\mathbb{N}$, and $[n]:=\{1,\ldots,n\}$ for $n\in\mathbb{N}$. We denote by $|A|$ the size and by $2^{A}$ the powerset of a set $A$, and by $A\triangle B$ the symmetric difference of two sets $A,B$. The symbols $\lfloor a\rfloor$ and $\lceil a\rceil$ stand for the floor and the ceiling of a positive number $a$. The indicator function is denoted by $\mathbf{1}$. In a summation, if the lower limit is larger than the upper limit, then the summation is assumed to be equal to 0. For positive sequences $(x_{n})$ and $(y_{n})$, we write $x_{n}\sim y_{n}$ when $\lim_{n}(x_{n}/y_{n})=1$, $x_{n}\gtrsim y_{n}$ when $\liminf_{n}(x_{n}/y_{n})\geq 1$, and $x_{n}\lesssim y_{n}$ when $\limsup_{n}(x_{n}/y_{n})\leq 1$. Finally, iid stands for independent and identically distributed.

II Problem formulation

Let $(\mathbb{S},\mathcal{S})$ be a measurable space and let $(\Omega,\mathcal{F},\mathsf{P})$ be a probability space that hosts $M$ independent sequences of iid, $\mathbb{S}$-valued random elements, $\{X_{i}(n),\,n\in\mathbb{N}\}$, $i\in[M]$, which are generated by $M$ distinct data sources, and $M$ independent sequences of random variables, $\{Z_{i}(n):n\in\mathbb{N}\}$, $i\in[M]$, uniformly distributed in $(0,1)$, to be used for randomization purposes. For each $i\in[M]$, we assume that each $X_{i}(n)$ has a density with respect to some $\sigma$-finite measure $\nu_{i}$ that is equal to either $f_{1i}$ or $f_{0i}$, and we say that source $i$ is “anomalous” if its density is $f_{1i}$. We denote by $\mathsf{P}_{A}$ the underlying probability measure, and by $\mathsf{E}_{A}$ the corresponding expectation, when the subset of anomalous sources is $A\subseteq[M]$. We simply write $\mathsf{P}$ and $\mathsf{E}$ whenever the identity of the subset of anomalous sources is not relevant.

The problem we consider is the identification of the anomalous sources, if any, on the basis of observations that are acquired sequentially from the sources, when it is not possible to observe all of them at every sampling instant. Specifically, we have to specify (i) the random time, $T$, at which sampling is terminated, (ii) the subset of sources, $\Delta$, that are declared as anomalous upon stopping, and, for each $n\leq T$, (iii) the subset, $R(n)$, of sources that are sampled at time $n$. At any time instant, the decision whether to stop or not, as well as the subsets of sources that are identified as anomalous in the former case, or sampled next in the latter, must be determined based on the already collected data.

Therefore, we say that $R:=\{R(n):n\in\mathbb{N}\}$ is a sampling rule if $R(n)$ is $\mathcal{F}^{R}_{n-1}$-measurable for every $n\in\mathbb{N}$, where

\mathcal{F}^{R}_{n}:=\begin{cases}\sigma\left(\mathcal{F}^{R}_{n-1},\,Z(n),\,\{X_{i}(n)\,:\,i\in R(n)\}\right),\quad&n\in\mathbb{N},\\ \sigma(Z(0)),\quad&n=0,\end{cases} (1)

and $Z(n):=(Z_{1}(n),\ldots,Z_{M}(n))$. Moreover, we say that the triplet $(R,T,\Delta)$ is a policy if

  • (i)

    $R$ is a sampling rule,

  • (ii)

    $T$ is a stopping time with respect to the filtration $\{\mathcal{F}^{R}_{n}\,:\,n\in\mathbb{N}\}$,

  • (iii)

    $\Delta$ is $\mathcal{F}^{R}_{T}$-measurable, i.e.,

    \{T=n,\Delta=D\}\in\mathcal{F}^{R}_{n},\quad\forall\;n\in\mathbb{N}\quad\text{and}\quad D\subseteq[M],

in which case we refer to $T$ as a stopping rule and to $\Delta$ as a decision rule.

We denote by $\mathcal{C}$ the family of all policies, and we focus on policies that satisfy a sampling constraint and control the probabilities of certain types of error. To define these, we need to introduce some additional notation. Thus, for any sampling rule $R$ and time instant $n\in\mathbb{N}$, we denote by $R_{i}(n)$ the indicator of whether source $i$ is sampled at time $n$, i.e.,

R_{i}(n):=\mathbf{1}\{i\in R(n)\},\quad i\in[M],

by $\pi_{i}^{R}(n)$ the proportion of times source $i$ has been sampled in the first $n$ time instants, i.e.,

\pi_{i}^{R}(n):=\frac{1}{n}\sum_{m=1}^{n}R_{i}(m), (2)

and we note that the average number of observations from all sources in the first $n$ time instants is

\frac{1}{n}\sum_{m=1}^{n}|R(m)|=\frac{1}{n}\sum_{i=1}^{M}\sum_{m=1}^{n}R_{i}(m)=\sum_{i=1}^{M}\pi_{i}^{R}(n). (3)

II-A The sampling constraint

For any real number $K$ in $(0,M]$, we say that a policy $(R,T,\Delta)$ belongs to $\mathcal{C}(K)$ if the average number of observations from all sources until stopping is less than or equal to $K$, i.e.,

\frac{1}{T}\sum_{m=1}^{T}|R(m)|=\sum_{i=1}^{M}\pi_{i}^{R}(T)\leq K. (4)

This sampling constraint is clearly satisfied when at most $\lfloor K\rfloor$ sources are sampled at each time instant up to stopping, i.e., when

|R(m)|\leq\lfloor K\rfloor,\quad\forall\,m\leq T, (5)

whereas it implies the sampling constraint considered in [9],

\mathsf{E}\left[\sum_{m=1}^{T}|R(m)|\right]\leq K\,\mathsf{E}[T]. (6)

In other words, the sampling constraint (4) is looser than (5), but stricter than (6).

II-B The error constraints

We consider two types of error control, which lead to two distinct problem formulations. We characterize both as “generalized”, as they generalize the corresponding “classical” versions of the misclassification error rate and the familywise error rates.

II-B1 Control of generalized misclassification error rate

For any $K\in(0,M]$, $k\in[M]$, and $\alpha\in(0,1)$, we say that a policy $(R,T,\Delta)$ in $\mathcal{C}(K)$ belongs to $\mathcal{C}(\alpha;k,K)$ if the probability of at least $k$ errors of any kind is at most $\alpha$, i.e.,

\mathsf{P}_{A}(|A\triangle\Delta|\geq k)\leq\alpha,\quad\forall\,A\subseteq[M], (7)

and we denote by $\mathcal{J}_{A}(\alpha;k,K)$ the smallest possible expected time for stopping in $\mathcal{C}(\alpha;k,K)$ when the subset of anomalous sources is $A\subseteq[M]$, i.e.,

\mathcal{J}_{A}(\alpha;k,K):=\inf\limits_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)}\mathsf{E}_{A}[T]. (8)

The first problem we consider in this paper is to evaluate $\mathcal{J}_{A}(\alpha;k,K)$ to a first-order asymptotic approximation as $\alpha\to 0$ for any $A\subseteq[M]$, and to find a policy that achieves $\mathcal{J}_{A}(\alpha;k,K)$ in this asymptotic sense, simultaneously for every $A\subseteq[M]$.

II-B2 Control of generalized familywise error rates

For any $K\in(0,M]$, $k_{1},k_{2}\in[M]$, and $\alpha,\beta\in(0,1)$, such that $\alpha+\beta<1$ and $k_{1}+k_{2}\leq M$, we say that a policy $(R,T,\Delta)$ in $\mathcal{C}(K)$ belongs to $\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)$ if the probability of at least $k_{1}$ false positives does not exceed $\alpha$ and the probability of at least $k_{2}$ false negatives does not exceed $\beta$, i.e.,

\mathsf{P}_{A}(|\Delta\setminus A|\geq k_{1})\leq\alpha,\qquad\text{and}\qquad\mathsf{P}_{A}(|A\setminus\Delta|\geq k_{2})\leq\beta,\qquad\forall\,A\subseteq[M], (9)

and we denote by $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ the smallest expected time for stopping in $\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)$ when the subset of anomalous sources is $A\subseteq[M]$, i.e.,

\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K):=\inf\limits_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)}\mathsf{E}_{A}[T]. (10)

The second problem we consider in this paper is to evaluate $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ to a first-order asymptotic approximation as $\alpha,\beta\to 0$ for any $A\subseteq[M]$, and to find a family of policies that achieves $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ in this asymptotic sense, simultaneously for every $A\subseteq[M]$.

Remark II.1

As we mentioned in the Introduction, both problems have been solved in the full-sampling case, i.e., when all sources are observed at each time instant, in [27]. On the other hand, in the presence of sampling constraints, neither of them has been considered beyond the special cases of the “classical” misclassification error rate ($k=1$) and the “classical” familywise error rates ($k_{1}=k_{2}=1$) in [9].

II-C Distributional Assumptions

For each $i\in[M]$, the Kullback-Leibler (KL) divergences between $f_{1i}$ and $f_{0i}$ are assumed to be positive and finite, i.e., for each $i\in[M]$ we have:

I_{i}:=\int_{\mathbb{S}}\log\left(f_{1i}/f_{0i}\right)\,f_{1i}\,d\nu_{i}\,\in(0,\infty), (11)
J_{i}:=\int_{\mathbb{S}}\log\left(f_{0i}/f_{1i}\right)\,f_{0i}\,d\nu_{i}\,\in(0,\infty).

To establish asymptotic lower bounds for (8) and (10), we assume that

\begin{split}\sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|\log^{+}(|g_{i}|)\,f_{1i}\,d\nu_{i}&<\infty,\\ \sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|\log^{+}(|g_{i}|)\,f_{0i}\,d\nu_{i}&<\infty,\end{split} (12)

where $g_{i}:=\log\left(f_{1i}/f_{0i}\right)$. This assumption is needed neither in the full-sampling case considered in [27], nor in the case of classical error control ($k=1$ or $k_{1}=k_{2}=1$) under sampling constraints in [9]. Nevertheless, it is weaker than the typical assumption in the sequential controlled sensing literature (e.g., [16], [17]), according to which it is required that

\begin{split}\sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|^{\mathfrak{p}}\,f_{1i}\;d\nu_{i}&<\infty,\\ \sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|^{\mathfrak{p}}\,f_{0i}\;d\nu_{i}&<\infty,\end{split} (13)

holds for $\mathfrak{p}=2$. However, in order to show that the asymptotic lower bounds for (8) and (10) can be achieved, in certain cases we will need to require that (13) holds for some $\mathfrak{p}>4$.

III Two max-min optimization problems

In this section, we formulate and solve two auxiliary max-min optimization problems. These will be used in Section IV to express the asymptotic lower bounds for (8) and (10), and in Section V to design procedures that achieve these lower bounds.

To be specific, we denote by $\boldsymbol{L}:=\{L_{i}\,:\,i\in[|\boldsymbol{L}|]\}$ an ordered set of positive numbers, i.e.,

L_{0}:=0<L_{1}\leq\ldots\leq L_{|\boldsymbol{L}|}, (14)

for each $i\in[|\boldsymbol{L}|]$, we denote by $\widetilde{L}_{i}$ the harmonic mean of the $|\boldsymbol{L}|-i+1$ largest elements in $\boldsymbol{L}$, i.e.,

\widetilde{L}_{i}:=\frac{|\boldsymbol{L}|-i+1}{\sum_{u=i}^{|\boldsymbol{L}|}(1/L_{u})},\quad i\in[|\boldsymbol{L}|], (15)

and we also set $\widetilde{L}_{|\boldsymbol{L}|+1}:=\infty$. Then, assuming that $|\boldsymbol{L}|\leq M$, we introduce the following function,

\mathcal{V}(\boldsymbol{c};\kappa,\boldsymbol{L}):=\min_{U\subseteq[|\boldsymbol{L}|]:\,|U|=\kappa}\,\sum_{i\in U}c_{i}\,L_{i}, (16)

where

  • $\kappa$ is a positive integer in $[|\boldsymbol{L}|]$,

  • $\boldsymbol{c}:=(c_{1},\ldots,c_{|\boldsymbol{L}|},0,\ldots,0)$ is a vector in

    \mathcal{D}(K):=\left\{(c_{1},\ldots,c_{M})\in[0,1]^{M}:\sum_{i=1}^{M}c_{i}\leq K\right\}. (17)

The constants $M$ and $K$ are defined as in the previous section, i.e., $M$ is a positive integer, and $K$ is a real number in $(0,M]$.

III-A Optimization Problem I

Let $\boldsymbol{L}$ be an ordered set of size $|\boldsymbol{L}|\leq M$, and let $\kappa\in[|\boldsymbol{L}|]$. The first max-min optimization problem we consider is

V(\kappa,K,\boldsymbol{L}):=\max_{\boldsymbol{c}\in\mathcal{D}(K)}\mathcal{V}(\boldsymbol{c};\kappa,\boldsymbol{L}). (18)

In the following lemma, we provide an expression for the value $V(\kappa,K,\boldsymbol{L})$ of the max-min optimization problem (18), as well as for the maximizer of (18) with the minimum $\mathcal{L}^{1}$ norm.

Lemma III.1

The value of the max-min optimization problem (18) is equal to the expression

V(\kappa,K,\boldsymbol{L})=\begin{cases}(\kappa-u)\,\frac{K}{|\boldsymbol{L}|-u}\,\widetilde{L}_{u+1},\quad&\mbox{if}\quad v=0,\\ x\,L_{v-1}+\sum_{i=v}^{u}L_{i}+(\kappa-u)\,y\,\widetilde{L}_{u+1},\quad&\mbox{if}\quad v\geq 1,\end{cases} (19)

where $x,y$ are real numbers in $[0,1)$, and $u,v$ are integers in $[0,\kappa]$ such that $v\leq u$. The values of $x$, $y$, $v$, $u$ are determined by Algorithm 1, presented in Appendix A. The maximizer of (18) with the minimum $\mathcal{L}^{1}$ norm is given by

\boldsymbol{c}^{\prime}(\kappa,K,\boldsymbol{L}):=(c^{\prime}_{1},\ldots,c^{\prime}_{|\boldsymbol{L}|},0,\ldots,0), (20)

where

  • for all $i\in\{u+1,\ldots,|\boldsymbol{L}|\}$ we have

    c^{\prime}_{i}\,L_{i}=\begin{cases}y\,\widetilde{L}_{u+1},\;\,\qquad\text{if}\quad u<\kappa,\\ L_{\kappa},\qquad\qquad\,\text{if}\quad u=\kappa,\end{cases} (21)

  • if $v=0$ then

    c^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\,\in\,\{1,\ldots,u\}, (22)

  • if $v\geq 1$ then

    c^{\prime}_{i}=1,\qquad\mbox{for all}\quad i\,\in\,\{v,\ldots,u\}, (23)

  • if $v\geq 2$ then

    c^{\prime}_{v-1}=\begin{cases}x,\qquad\text{if}\quad x>0,\\ 0,\qquad\text{if}\quad x=0,\end{cases} (24)

  • if $v\geq 3$ then

    c^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\leq v-2. (25)
Proof:

Appendix A. ∎

Remark III.1

In the symmetric case where $L_{i}=L$ for all $i\in[|\boldsymbol{L}|]$, we have

V(\kappa,K,\boldsymbol{L})=\kappa\,(K/|\boldsymbol{L}|)\,L,

and $c^{\prime}_{i}=(K/|\boldsymbol{L}|)\wedge 1$ for all $i\in[|\boldsymbol{L}|]$.
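Since the inner minimum in (16) is a minimum of finitely many linear functions of $\boldsymbol{c}$, the max-min problem (18) is equivalent to a small linear program. The following is a minimal numeric sketch in Python (with a hypothetical helper name; it is not Algorithm 1 of Appendix A) that can be used to sanity-check the closed-form expression (19) and Remark III.1 for small $|\boldsymbol{L}|$.

```python
# Sketch: solve (18) as the LP  max t  s.t.  t <= sum_{i in U} c_i L_i for all
# |U| = kappa,  0 <= c_i <= 1,  sum_i c_i <= K.  Brute-force over subsets U,
# so only suitable for small |L|.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def V_numeric(kappa, K, L):
    m = len(L)
    cost = np.zeros(m + 1); cost[-1] = -1.0        # maximize t = minimize -t
    A_ub, b_ub = [], []
    for U in combinations(range(m), kappa):        # t - sum_{i in U} c_i L_i <= 0
        row = np.zeros(m + 1); row[-1] = 1.0
        for i in U:
            row[i] = -L[i]
        A_ub.append(row); b_ub.append(0.0)
    A_ub.append(np.r_[np.ones(m), 0.0]); b_ub.append(K)   # sum_i c_i <= K
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], -res.fun                     # (a maximizer c, value V)

# Symmetric check (Remark III.1): L_i = L gives V = kappa * (K/|L|) * L.
c, val = V_numeric(kappa=2, K=1.5, L=[1.0, 1.0, 1.0])
print(val)   # 1.0 = 2 * (1.5/3) * 1.0
```

Note that an LP solver returns some maximizer of (18), not necessarily the one with the minimum $\mathcal{L}^{1}$ norm singled out in Lemma III.1 (cf. Remark III.3).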

Remark III.2

Based on the size of $K$, we distinguish the following cases for the form of $V(\kappa,K,\boldsymbol{L})$.

  • If $K$ is relatively large, i.e.,

    K\geq\kappa+L_{\kappa}\sum_{i=\kappa+1}^{|\boldsymbol{L}|}1/L_{i}, (26)

    then

    V(\kappa,K,\boldsymbol{L})=\sum_{i=1}^{\kappa}L_{i}, (27)

    which is the largest possible value that $V(\kappa,K,\boldsymbol{L})$ can take over all possible values of $K\leq M$. In this case, $x=y=0$, $v=1$, $u=\kappa$.

  • If $K$ is relatively small, i.e.,

    K<L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}, (28)

    then

    V(\kappa,K,\boldsymbol{L})=(\kappa-u^{*})\;\frac{K}{|\boldsymbol{L}|-u^{*}}\;\widetilde{L}_{u^{*}+1},

    where $u^{*}$ is defined as

    u^{*}:=\max\left\{u\in\{0,\ldots,\kappa-1\}\,:\,\frac{\kappa-u}{|\boldsymbol{L}|-u}\;\widetilde{L}_{u+1}\geq L_{u}\right\}.

    In this case, $x=0$, $u=u^{*}$, $v=0$, and $y=K/(|\boldsymbol{L}|-u^{*})$.

  • If $K$ is between the values given in (26) and (28), then $V(\kappa,K,\boldsymbol{L})$ has the general form described in Lemma III.1.

Remark III.3

The only case in which there can exist a maximizer $(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|})$ of (18) different from the maximizer with the minimum $\mathcal{L}^{1}$ norm is when (26) holds with strict inequality. The only difference between the two maximizers is that for $(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|})$ there is some $i\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}$ such that

\tilde{c}_{i}L_{i}>L_{\kappa}. (29)

Although (29) can hold for some $i\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}$, this does not change the value of $V(\kappa,K,\boldsymbol{L})$ in (27).

III-B Optimization Problem II

Let $\boldsymbol{L}_{1}:=\{L_{1,i}:i\in[|\boldsymbol{L}_{1}|]\}$ and $\boldsymbol{L}_{2}:=\{L_{2,i}:i\in[|\boldsymbol{L}_{2}|]\}$ be two ordered sets such that $|\boldsymbol{L}_{1}|+|\boldsymbol{L}_{2}|\leq M$, let $\kappa_{1},\kappa_{2}$ be two positive integers such that $\kappa_{1}\in[|\boldsymbol{L}_{1}|]$ and $\kappa_{2}\in[|\boldsymbol{L}_{2}|]$, and let $r$ be an arbitrary positive number. The second max-min optimization problem we consider in this section is more complex than the first, and it has the form

W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r):=\max_{\boldsymbol{c}\in\mathcal{D}(K)}\min\left\{\mathcal{V}(\hat{\boldsymbol{c}};\kappa_{1},\boldsymbol{L}_{1}),\,r\;\mathcal{V}(\check{\boldsymbol{c}};\kappa_{2},\boldsymbol{L}_{2})\right\}, (30)

where the function $\mathcal{V}$ is defined in (16), and $\boldsymbol{c}:=(\hat{\boldsymbol{c}},\check{\boldsymbol{c}},\boldsymbol{0})$, the size of $\boldsymbol{0}$ being $M-|\boldsymbol{L}_{1}|-|\boldsymbol{L}_{2}|$, and

\hat{\boldsymbol{c}}:=(\hat{c}_{1},\ldots,\hat{c}_{|\boldsymbol{L}_{1}|}),\qquad\check{\boldsymbol{c}}:=(\check{c}_{1},\ldots,\check{c}_{|\boldsymbol{L}_{2}|}). (31)

As we show in the following lemma, the value $W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)$ of the max-min optimization problem (30) is equal to the value of the following optimization problem,

\max_{(K_{1},K_{2})}V(\kappa_{1},K_{1},\boldsymbol{L}_{1}), (32)

such that the following two constraints hold:

K_{1}+K_{2}\leq K, (33)
V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2}).
Definition III.1

We denote by $(K^{*}_{1},K^{*}_{2})$ the maximizer of the constrained optimization problem (32) with the minimum $\mathcal{L}^{1}$ norm, i.e., the minimum sum $K_{1}+K_{2}$, among all maximizers. Based on Lemma III.1, we denote by $x_{1},y_{1},u_{1},v_{1}$ the parameters such that

V(\kappa_{1},K^{*}_{1},\boldsymbol{L}_{1})=\begin{cases}(\kappa_{1}-u_{1})\,\frac{K^{*}_{1}}{|\boldsymbol{L}_{1}|-u_{1}}\,\widetilde{L}_{1,u_{1}+1},\quad&\mbox{if}\quad v_{1}=0,\\ x_{1}L_{1,v_{1}-1}+\sum_{i=v_{1}}^{u_{1}}L_{1,i}+(\kappa_{1}-u_{1})\,y_{1}\,\widetilde{L}_{1,u_{1}+1},\quad&\mbox{if}\quad v_{1}\geq 1,\end{cases}

and by $x_{2},y_{2},u_{2},v_{2}$ the parameters such that

V(\kappa_{2},K^{*}_{2},\boldsymbol{L}_{2})=\begin{cases}(\kappa_{2}-u_{2})\,\frac{K^{*}_{2}}{|\boldsymbol{L}_{2}|-u_{2}}\,\widetilde{L}_{2,u_{2}+1},\quad&\mbox{if}\quad v_{2}=0,\\ x_{2}L_{2,v_{2}-1}+\sum_{i=v_{2}}^{u_{2}}L_{2,i}+(\kappa_{2}-u_{2})\,y_{2}\,\widetilde{L}_{2,u_{2}+1},\quad&\mbox{if}\quad v_{2}\geq 1.\end{cases}

The parameters $x_{1},y_{1},u_{1},v_{1}$ and $x_{2},y_{2},u_{2},v_{2}$ can be computed by applying Algorithm 1 to each case, and they are used in the expression of the maximizer of (30) with the minimum $\mathcal{L}^{1}$ norm.

Lemma III.2

The value of the max-min optimization problem (30) is equal to

W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=V(\kappa_{1},K^{*}_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K^{*}_{2},\boldsymbol{L}_{2}). (34)

The maximizer of (30) with the minimum $\mathcal{L}^{1}$ norm is given by

\boldsymbol{c}^{\prime}(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r):=\left(\hat{c}^{\prime}_{1},\ldots,\hat{c}^{\prime}_{|\boldsymbol{L}_{1}|},\check{c}^{\prime}_{1},\ldots,\check{c}^{\prime}_{|\boldsymbol{L}_{2}|},0,\ldots,0\right), (35)

where

  • for all $i\in\{u_{1}+1,\ldots,|\boldsymbol{L}_{1}|\}$ we have

    \hat{c}^{\prime}_{i}\,L_{1,i}=\begin{cases}y_{1}\,\widetilde{L}_{1,u_{1}+1},\,\qquad\text{if}\quad u_{1}<\kappa_{1},\\ L_{1,\kappa_{1}},\qquad\qquad\,\text{if}\quad u_{1}=\kappa_{1},\end{cases} (36)

  • if $v_{1}=0$ then

    \hat{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\,\in\,\{1,\ldots,u_{1}\}, (37)

  • if $v_{1}\geq 1$ then

    \hat{c}^{\prime}_{i}=1,\qquad\mbox{for all}\quad i\,\in\,\{v_{1},\ldots,u_{1}\}, (38)

  • if $v_{1}\geq 2$ then

    \hat{c}^{\prime}_{v_{1}-1}=\begin{cases}x_{1},\;\,\quad\text{if}\quad x_{1}>0,\\ 0,\qquad\text{if}\quad x_{1}=0,\end{cases} (39)

  • if $v_{1}\geq 3$ then

    \hat{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\leq v_{1}-2, (40)

and

  • for all $i\in\{u_{2}+1,\ldots,|\boldsymbol{L}_{2}|\}$ we have

    \check{c}^{\prime}_{i}\,L_{2,i}=\begin{cases}y_{2}\,\widetilde{L}_{2,u_{2}+1},\,\qquad\text{if}\quad u_{2}<\kappa_{2},\\ L_{2,\kappa_{2}},\qquad\qquad\,\text{if}\quad u_{2}=\kappa_{2},\end{cases} (41)

  • if $v_{2}=0$ then

    \check{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\,\in\,\{1,\ldots,u_{2}\}, (42)

  • if $v_{2}\geq 1$ then

    \check{c}^{\prime}_{i}=1,\qquad\mbox{for all}\quad i\,\in\,\{v_{2},\ldots,u_{2}\}, (43)

  • if $v_{2}\geq 2$ then

    \check{c}^{\prime}_{v_{2}-1}=\begin{cases}x_{2},\;\,\quad\text{if}\quad x_{2}>0,\\ 0,\qquad\text{if}\quad x_{2}=0,\end{cases} (44)

  • if $v_{2}\geq 3$ then

    \check{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\leq v_{2}-2. (45)
Proof:

Appendix A. ∎
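In the same spirit as the sketch following Remark III.1, the max-min problem (30) also admits a direct linear-programming formulation, which can serve as a numeric cross-check of Lemma III.2 for small instances; the helper below is our own illustration, not part of the paper's algorithms.

```python
# Sketch: solve (30) as the LP  max t  s.t.
#   t <= sum_{i in U} chat_i L1_i        for all U subset [|L1|], |U| = kappa1,
#   t <= r * sum_{i in U} ccheck_i L2_i  for all U subset [|L2|], |U| = kappa2,
#   0 <= c <= 1,  sum(c) <= K,  where c = (chat, ccheck).
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def W_numeric(kappa1, kappa2, K, L1, L2, r):
    m1, m2 = len(L1), len(L2)
    m = m1 + m2
    cost = np.zeros(m + 1); cost[-1] = -1.0        # maximize t
    A_ub, b_ub = [], []
    for U in combinations(range(m1), kappa1):      # type-I side constraints
        row = np.zeros(m + 1); row[-1] = 1.0
        for i in U:
            row[i] = -L1[i]
        A_ub.append(row); b_ub.append(0.0)
    for U in combinations(range(m2), kappa2):      # type-II side constraints
        row = np.zeros(m + 1); row[-1] = 1.0
        for i in U:
            row[m1 + i] = -r * L2[i]
        A_ub.append(row); b_ub.append(0.0)
    A_ub.append(np.r_[np.ones(m), 0.0]); b_ub.append(K)   # sum of c <= K
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m1], res.x[m1:m], -res.fun       # (chat, ccheck, value W)
```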

IV Universal asymptotic lower bounds

In this section, we fix an arbitrary $A\subseteq[M]$ and $k,k_{1},k_{2},K$ as in Section II, and we establish a universal asymptotic lower bound for (8) and (10), as $\alpha\to 0$ and as $\alpha,\beta\to 0$, respectively, under the moment assumption (12).

IV-A The case of generalized misclassification error rate

To state the asymptotic lower bound for $\mathcal{J}_{A}(\alpha;k,K)$ as $\alpha\to 0$, we need to introduce some additional notation. Thus, we denote by

\boldsymbol{F}(A):=\{F_{i}(A)\,:\,i\in[M]\} (46)

the ordered set that consists of the Kullback-Leibler numbers in $\{I_{i},\,J_{j}\,:\,i\in A,\,j\notin A\}$. In particular, for each $i\in[M]$, $F_{i}(A)$ is the $i$-th smallest element in $\boldsymbol{F}(A)$, and can be interpreted as a measure of the difficulty of the $i$-th most difficult testing problem.

The overall difficulty of the testing problem is determined by the quantity $V(k,K,\boldsymbol{F}(A))$, as described in the following theorem.

Theorem IV.1

Suppose (12) holds. As $\alpha\to 0$, we have

\mathcal{J}_{A}(\alpha;k,K)\gtrsim\frac{|\log\alpha|}{V(k,K,\boldsymbol{F}(A))}, (47)

where $V$ is defined in (18).

Proof:

Appendix B. ∎

Remark IV.1

Consider the homogeneous and symmetric setup where the difficulty is the same across all testing problems, in the sense that

I_{i}=J_{j}=I,\quad\forall\;i,j\in[M].

Then, for every $A\subseteq[M]$ we have

F_{i}(A)=I,\quad\forall\;i\in[M].

Consequently, by Remark III.1 we see that

V(k,K,\boldsymbol{F}(A))=k\,(K/M)\,I.

IV-B The case of generalized familywise error rates

We next establish an asymptotic lower bound for $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ as $\alpha\to 0$ and/or $\beta\to 0$. For this, we need to introduce the following notation.

  1. (i)

    If $A\neq\emptyset$, for each $i\in[|A|]$ we denote

    • by $I_{i}(A)$ the $i$-th smallest element in $\{I_{j}\,:\,j\in A\}$,

    • by $\widetilde{I}_{i}(A)$ the harmonic mean of the $|A|-i+1$ largest elements in $\{I_{j}\,:\,j\in A\}$.

    Moreover, we denote by $\boldsymbol{I}(A)$ the ordered set that consists of the Kullback-Leibler numbers in $\{I_{j}\,:\,j\in A\}$, i.e.,

    \boldsymbol{I}(A):=\left\{I_{i}(A)\,:\,i\in[|A|]\right\},

    and for each $l\in\{0,\ldots,|A|-1\}$, we denote by $\boldsymbol{I_{l}}(A)$ the set that consists of the $|A|-l$ largest elements in $\{I_{j}\,:\,j\in A\}$, i.e.,

    \boldsymbol{I_{l}}(A):=\{I_{i}(A)\,:\,l<i\leq|A|\}.

  2. (ii)

    If $A\neq[M]$, for each $i\in[|A^{c}|]$ we denote

    • by $J_{i}(A)$ the $i$-th smallest element in $\{J_{j}\,:\,j\in A^{c}\}$,

    • by $\widetilde{J}_{i}(A)$ the harmonic mean of the $|A^{c}|-i+1$ largest elements in $\{J_{j}\,:\,j\in A^{c}\}$.

    Moreover, we denote by $\boldsymbol{J}(A)$ the ordered set that consists of the Kullback-Leibler numbers in $\{J_{j}\,:\,j\in A^{c}\}$, i.e.,

    \boldsymbol{J}(A):=\left\{J_{i}(A)\,:\,i\in[|A^{c}|]\right\},

    and for each $l\in\{0,\ldots,|A^{c}|-1\}$, we denote by $\boldsymbol{J_{l}}(A)$ the ordered set that consists of the $|A^{c}|-l$ largest elements in $\{J_{j}\,:\,j\in A^{c}\}$, i.e.,

    \boldsymbol{J_{l}}(A):=\{J_{i}(A)\,:\,l<i\leq|A^{c}|\}.

We first state the asymptotic lower bound for $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ when $A=\emptyset$ and when $A=[M]$, as $\beta\to 0$ and $\alpha\to 0$, respectively.

Theorem IV.2

Suppose (12) holds.

  • Let $A=\emptyset$. For any given $\alpha\in(0,1)$, as $\beta\to 0$ we have

    \mathcal{J}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\beta|}{V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A))}. (48)

  • Let $A=[M]$. For any given $\beta\in(0,1)$, as $\alpha\to 0$ we have

    \mathcal{J}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A))}, (49)

where $V$ is defined in (18).

Proof:

Appendix B. ∎

Remark IV.2

If $A=\emptyset$, the $k_{1}-1$ sources with the smallest KL numbers in $\boldsymbol{J}(A)$ are not considered in the evaluation of the difficulty of the testing problem. This is because we can intentionally misclassify these $k_{1}-1$ sources as anomalous, without exceeding the tolerance level of $k_{1}$ false alarms, in order to reduce the expected time for stopping. The corresponding remark applies to the case $A=[M]$.

We continue with the asymptotic lower bound when $0<|A|<M$, as $\alpha,\beta\to 0$ so that

|\log\alpha|\sim r|\log\beta|,\quad\mbox{for some }\;r\in(0,\infty). (50)

For this, we need to introduce the following definition.

Definition IV.1

We denote by $v_{A}(k_{1},k_{2},K,r)$ the maximum of the following quantities, where each quantity is included in the maximum provided that the corresponding condition is satisfied. We recall that $k_{1}\leq|A|$ or $k_{2}\leq|A^{c}|$, because otherwise we would have $k_{1}+k_{2}>|A|+|A^{c}|=M$, which contradicts the initial assumption $k_{1}+k_{2}\leq M$.

  • If $k_{2}\leq|A^{c}|$, we include the maximum

    \max\{W(k_{1}-l,k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l}}(A),r)\} (51)

    over all $l\in\{(k_{1}-|A|)^{+},\ldots,(k_{1}-1)\wedge(|A^{c}|-k_{2})\}$.

  • If $k_{1}\leq|A|$, we include the maximum

    \max\{W(k_{1},k_{2}-l,K,\boldsymbol{I_{l}}(A),\boldsymbol{J}(A),r)\} (52)

    over all $l\in\{(k_{2}-|A^{c}|)^{+},\ldots,(k_{2}-1)\wedge(|A|-k_{1})\}$.

  • If $k_{1}-1\geq|A^{c}|-k_{2}+1$, we include

    V(k_{1}-(|A^{c}|-k_{2}+1)^{+},K,\boldsymbol{I}(A)). (53)

  • If $k_{2}-1\geq|A|-k_{1}+1$, we include

    r\,V(k_{2}-(|A|-k_{1}+1)^{+},K,\boldsymbol{J}(A)). (54)

Here, $W$ is defined in (30), and $V$ in (18).
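For concreteness, the following sketch (our own illustration, reusing the hypothetical V_numeric and W_numeric helpers from the sketches in Section III) enumerates the quantities above and returns their maximum, i.e., $v_{A}(k_{1},k_{2},K,r)$, for ordered KL lists $\boldsymbol{I}(A)$ and $\boldsymbol{J}(A)$.

```python
# Sketch: evaluate v_A(k1, k2, K, r) of Definition IV.1 by enumeration.
# I and J are the increasingly ordered lists I(A) and J(A); since they are
# ordered, the tails I[l:] and J[l:] are exactly I_l(A) and J_l(A).
def v_A_numeric(k1, k2, K, I, J, r):
    a, ac = len(I), len(J)                          # a = |A|, ac = |A^c|
    vals = []
    if k2 <= ac:                                    # quantities (51)
        for l in range(max(k1 - a, 0), min(k1 - 1, ac - k2) + 1):
            vals.append(W_numeric(k1 - l, k2, K, I, J[l:], r)[2])
    if k1 <= a:                                     # quantities (52)
        for l in range(max(k2 - ac, 0), min(k2 - 1, a - k1) + 1):
            vals.append(W_numeric(k1, k2 - l, K, I[l:], J, r)[2])
    if k1 - 1 >= ac - k2 + 1:                       # quantity (53)
        vals.append(V_numeric(k1 - max(ac - k2 + 1, 0), K, I)[1])
    if k2 - 1 >= a - k1 + 1:                        # quantity (54)
        vals.append(r * V_numeric(k2 - max(a - k1 + 1, 0), K, J)[1])
    return max(vals)
```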

Remark IV.3

We note that, for any $A\subseteq[M]$, it holds that

(k_{1}-|A|)^{+}\leq(k_{1}-1)\wedge(|A^{c}|-k_{2}), (55)
(k_{2}-|A^{c}|)^{+}\leq(k_{2}-1)\wedge(|A|-k_{1}). (56)

Indeed, for (55) we observe that if $k_{1}\leq|A|$ then $(k_{1}-|A|)^{+}=0$ and the result is evident, whereas if $k_{1}>|A|$ it holds that

k_{1}-|A|\leq k_{1}-1, (57)
k_{1}-|A|\leq|A^{c}|-k_{2}, (58)

where (57) follows from the fact that $0<|A|<M$, and (58) holds because we have assumed $k_{1}+k_{2}\leq M$. Similarly, we can verify that (56) holds.

The difficulty of the testing problem is determined by the quantity $v_{A}(k_{1},k_{2},K,r)$, as described in the following theorem.

Theorem IV.3

Suppose (12) holds, and let $0<|A|<M$. As $\alpha,\beta\to 0$ so that (50) holds, we have

\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{v_{A}(k_{1},k_{2},K,r)}, (59)

where $v_{A}(k_{1},k_{2},K,r)$ is given by Definition IV.1.

Proof:

Appendix B. ∎

We denote by $l_{A}$ the value of the parameter $l$ that corresponds to the maximum of the quantities in Definition IV.1; it is used in the formulation of the following results.

Definition IV.2

The quantity $l_{A}$ is defined as follows.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), then $l_{A}$ is the number such that

    v_{A}(k_{1},k_{2},K,r)=W(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A),r). (60)

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (52), then $l_{A}$ is the number such that

    v_{A}(k_{1},k_{2},K,r)=W(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A),r). (61)

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (53), then $l_{A}=(|A^{c}|-k_{2}+1)^{+}$.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (54), then $l_{A}=(|A|-k_{1}+1)^{+}$.

Remark IV.4

If $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), the $l_{A}$ sources with the smallest KL numbers in $\boldsymbol{J}(A)$ are not considered in the evaluation of the difficulty of the testing problem. This is because we can intentionally misclassify these $l_{A}$ sources as anomalous, without exceeding the tolerance level of $k_{1}$ false alarms, in order to reduce the expected time for stopping. The corresponding remark applies if $v_{A}(k_{1},k_{2},K,r)$ is equal to (52).

If $k_{1}-1\geq|A^{c}|-k_{2}+1$, then we can intentionally misclassify as anomalous the $|A^{c}|-k_{2}+1$ sources with the smallest KL numbers in $\boldsymbol{J}(A)$, without exceeding the tolerance level of $k_{1}$ false alarms, in order to reduce the expected time for stopping. The remaining $k_{2}-1$ sources in $A^{c}$ are already fewer than the tolerance level of $k_{2}$ missed detections, which is why the difficulty of the testing problem is determined only by the KL numbers in $\boldsymbol{I}(A)$ in (53). The corresponding remark applies if $k_{2}-1\geq|A|-k_{1}+1$.

Corollary IV.1

Suppose (12) holds, and let $0<|A|<M$. As $\alpha,\beta\to 0$ so that (50) holds, we can distinguish the following cases.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1}-l_{A},K_{1}^{*},\boldsymbol{I}(A))}\sim\frac{|\log\beta|}{V(k_{2},K_{2}^{*},\boldsymbol{J_{l_{A}}}(A))}.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (52), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1},K_{1}^{*},\boldsymbol{I_{l_{A}}}(A))}\sim\frac{|\log\beta|}{V(k_{2}-l_{A},K_{2}^{*},\boldsymbol{J}(A))}.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (53), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1}-l_{A},K,\boldsymbol{I}(A))}.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (54), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\beta|}{V(k_{2}-l_{A},K,\boldsymbol{J}(A))}.

In the first two cases, $(K_{1}^{*},K_{2}^{*})$ is the maximizer of (32) for the respective case.

V A criterion for asymptotic optimality

In this section, for each of the two problems under consideration, we first introduce a stopping and a decision rule so that the corresponding error constraint is satisfied for any choice of sampling rule. For this, we adopt the approach in [27], where the full sampling case was considered. Subsequently, we establish the second main result of this paper, which is a criterion on the sampling rule for the resulting policy to achieve the corresponding universal asymptotic lower bound in the previous section.

In what follows, for any sampling rule $R$ and each source $i\in[M]$, we denote by $\Lambda^{R}_{i}(n)$ the local log-likelihood ratio (LLR) of source $i$ based on the observations from it in the first $n$ time instants, i.e.,

\Lambda^{R}_{i}(n):=\sum_{m=1}^{n}\log\left(\frac{f_{1i}(X_{i}(m))}{f_{0i}(X_{i}(m))}\right)\,R_{i}(m),\quad n\in\mathbb{N}. (62)

V-A The case of generalized misclassification error

For any sampling rule $R$, we denote by $T^{R}_{si}$ the first time that the sum of the $k$ smallest LLRs in absolute value is larger than some threshold $d>0$, and by $\Delta_{si}^{R}$ the subset of data sources with positive LLRs upon stopping, i.e.,

T^{R}_{si}:=\inf\left\{n\geq 1\,:\,\sum_{i=1}^{k}\bar{\Lambda}^{R}_{i}(n)\geq d\right\}, (63)
\Delta^{R}_{si}:=\left\{i\in[M]\,:\,\Lambda^{R}_{i}(T^{R}_{si})>0\right\}, (64)

where, for each $i\in[M]$ and $n\in\mathbb{N}$, $\bar{\Lambda}^{R}_{i}(n)$ denotes the $i$-th smallest LLR in absolute value at time $n$, i.e., the $i$-th smallest element in $\{|\Lambda^{R}_{j}(n)|\,:\,j\in[M]\}$. In the full-sampling case, where $R(n)=[M]$ for every $n\in\mathbb{N}$, $(T^{R}_{si},\Delta^{R}_{si})$ coincides with the sum-intersection rule, introduced in [27]. Similarly to [27, Theorem 3.1], it can be shown that if the threshold $d$ in (63) is selected as

d:=|\log\alpha|+\log{M\choose k}, (65)

then the policy $(R,T^{R}_{si},\Delta^{R}_{si})$ satisfies the error constraint (7) for any sampling rule $R$. Next, we show that this policy, with this choice of threshold, also achieves the asymptotic lower bound in Theorem IV.1 when, for each $i\in[M]$, the long-run sampling frequency of the source that corresponds to $F_{i}(A)$ is not smaller than the quantity $c^{\prime}_{i}(k,K,\boldsymbol{F}(A))$, defined according to (20) of Lemma III.1. To be specific, we need the following definition.

Definition V.1

For each $A\subseteq[M]$, we denote by $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ the permutation of $\boldsymbol{c}^{\prime}(k,K,\boldsymbol{F}(A))$, defined in (20), such that

c^{*}_{(i)}(A):=c^{\prime}_{i}(k,K,\boldsymbol{F}(A)),\quad i\in[M], (66)

where $(i)$ denotes the source with the $i$-th smallest number in the set $\boldsymbol{F}(A)$, defined in (46).

We are now ready to state the first main result of this section.

Theorem V.1

Consider a policy of the form $(R,T^{R}_{si},\Delta^{R}_{si})$, where the threshold $d$ in (63) is selected according to (65) and the sampling constraint (4) is satisfied. Fix $A\subseteq[M]$, and suppose that, for all $i\in[M]$, the sampling rule $R$ satisfies

\sum_{n=1}^{\infty}\mathsf{P}_{A}\left(\pi_{i}^{R}(n)<c^{*}_{i}(A)-\epsilon\right)<\infty,\qquad\forall\;\epsilon>0, (67)

where $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ is defined according to Definition V.1. Then, as $\alpha\to 0$, we have

\mathsf{E}_{A}\left[T^{R}_{si}\right]\sim\mathcal{J}_{A}(\alpha;k,K)\sim\frac{|\log\alpha|}{V(k,K,\boldsymbol{F}(A))}, (68)

where $V$ is defined in (18).

Proof:

Appendix C. ∎
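To make the policy concrete, here is a minimal simulation sketch of the sum-intersection stopping and decision rules (63)-(64) with the threshold (65). The Gaussian model and the full-sampling placeholder for $R$ are our own illustrative assumptions; designing an asymptotically optimal sampling rule under the sampling constraint is the subject of Section VI.

```python
# Sketch: run (63)-(64) under full sampling, for unit-variance Gaussian
# sources with mean 1 (anomalous) or 0 (non-anomalous), so that the LLR
# increment of source i at observation x is x - 1/2.
import numpy as np
from math import comb, log

def sum_intersection_rule(M, k, alpha, rng, A, max_n=10**6):
    d = -log(alpha) + log(comb(M, k))          # threshold (65)
    mu = np.array([1.0 if i in A else 0.0 for i in range(M)])
    Lam = np.zeros(M)                          # running LLRs Lambda_i(n)
    for n in range(1, max_n + 1):
        x = rng.normal(mu)                     # one sample from every source
        Lam += x - 0.5                         # full-sampling LLR update
        if np.sort(np.abs(Lam))[:k].sum() >= d:    # sum of k smallest |LLR|s
            return n, {i for i in range(M) if Lam[i] > 0}   # (T_si, Delta_si)
    raise RuntimeError("no decision within max_n steps")

T, Delta = sum_intersection_rule(M=5, k=2, alpha=0.01,
                                 rng=np.random.default_rng(0), A={0, 2})
print(T, Delta)
```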

V-B The case of generalized familywise error metric

For any sampling rule $R$, we denote by $p^{R}(n)$ the number of non-negative LLRs at time $n$, by $\hat{w}^{R}_{1}(n),\ldots,\hat{w}^{R}_{p^{R}(n)}(n)$ the indices of the increasingly ordered non-negative LLRs at time $n$, and by $\check{w}^{R}_{1}(n),\ldots,\check{w}^{R}_{M-p^{R}(n)}(n)$ the indices of the decreasingly ordered negative LLRs at time $n$, i.e.,

0\leq\Lambda^{R}_{\hat{w}^{R}_{1}(n)}(n)\leq\ldots\leq\Lambda^{R}_{\hat{w}^{R}_{p^{R}(n)}(n)}(n),\qquad 0>\Lambda^{R}_{\check{w}^{R}_{1}(n)}(n)\geq\ldots\geq\Lambda^{R}_{\check{w}^{R}_{M-p^{R}(n)}(n)}(n). (69)

We also set

\hat{\Lambda}^{R}_{i}(n):=\begin{cases}\Lambda^{R}_{\hat{w}^{R}_{i}(n)}(n),&\qquad i\leq p^{R}(n),\\ +\infty,&\qquad i>p^{R}(n),\end{cases} (70)
\check{\Lambda}^{R}_{i}(n):=\begin{cases}-\Lambda^{R}_{\check{w}^{R}_{i}(n)}(n),&\quad i\leq M-p^{R}(n),\\ +\infty,&\quad i>M-p^{R}(n).\end{cases}

For any integer $l$ such that $0\leq l<k_{1}$, we set

\hat{\tau}(l):=\inf\left\{n\geq 1:\sum_{i=1}^{k_{1}-l}\hat{\Lambda}^{R}_{i}(n)\geq b,\,\,\sum_{i=1+l}^{k_{2}+l}\check{\Lambda}^{R}_{i}(n)\geq a\right\}. (71)

Also, for any integer $l$ with $1\leq l<k_{2}$, we set

\check{\tau}(l):=\inf\left\{n\geq 1:\sum_{i=1+l}^{k_{1}+l}\hat{\Lambda}^{R}_{i}(n)\geq b,\,\,\sum_{i=1}^{k_{2}-l}\check{\Lambda}^{R}_{i}(n)\geq a\right\}. (72)

We denote by $T^{R}_{leap}$ the minimum of the stopping times in (71) and (72), i.e.,

T^{R}_{leap}:=\min\left\{\min\limits_{0\leq l<k_{1}}\hat{\tau}(l),\min\limits_{1\leq l<k_{2}}\check{\tau}(l)\right\}, (73)

and, depending on whether the minimum is attained by $\hat{\tau}(l)$ for some $l\in[0,k_{1})$, or by $\check{\tau}(l)$ for some $l\in[1,k_{2})$, we set

\Delta^{R}_{leap}:=\begin{cases}\{\hat{w}_{1}(\hat{\tau}(l)),..,\hat{w}_{p(\hat{\tau}(l))}(\hat{\tau}(l))\}\bigcup\{\check{w}_{1}(\hat{\tau}(l)),..,\check{w}_{l\wedge(M-p^{R}(\hat{\tau}(l)))}(\hat{\tau}(l))\},&\mbox{if}\quad T^{R}_{leap}=\hat{\tau}(l),\\ \{\hat{w}_{l+1}(\check{\tau}(l)),..,\hat{w}_{p(\check{\tau}(l))}(\check{\tau}(l))\},&\mbox{if}\quad T^{R}_{leap}=\check{\tau}(l).\end{cases} (74)

In the full-sampling case, where $R(n)=[M]$ for every $n\in\mathbb{N}$, $(T^{R}_{leap},\Delta^{R}_{leap})$ coincides with the leap rule, introduced in [27]. Similarly to [27, Theorem 4.1], it can be shown that if the thresholds $a$ and $b$ in (71)-(72) are selected as

a:=|\log\beta|+\log\left(2^{k_{2}}{M\choose k_{2}}\right),\qquad b:=|\log\alpha|+\log\left(2^{k_{1}}{M\choose k_{1}}\right), (75)

then the policy $(R,T^{R}_{leap},\Delta^{R}_{leap})$ satisfies the error constraint (9) for any sampling rule $R$. We next show that this policy, with this choice of thresholds, achieves the asymptotic lower bound in Theorem IV.3, as long as the long-run sampling frequency of each source is sufficiently large. To be specific, we introduce the following definition, for which we recall the quantity $l_{A}$ of Definition IV.2.

Definition V.2

For each $A\subseteq[M]$, we define the vector $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ as follows:

  1. (i)

    If $A=\emptyset$, then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A))$, defined in (20), such that

    c_{\{i\}}^{*}(A):=c^{\prime}_{i}(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A)),\quad i\in[M-k_{1}+1], (76)

    where $\{i\}$ denotes the source with the $i$-th smallest element in $\boldsymbol{J_{k_{1}-1}}(A)$.

  2. (ii)

    If $A=[M]$, then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A))$, defined in (20), such that

    c_{<i>}^{*}(A):=c_{i}^{\prime}(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A)),\quad i\in[M-k_{2}+1], (77)

    where $<i>$ denotes the source with the $i$-th smallest element in $\boldsymbol{I_{k_{2}-1}}(A)$.

  3. (iii)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (53), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1}-l_{A},K,\boldsymbol{I}(A))$, defined in (20), such that

    c_{<i>}^{*}(A):=c_{i}^{\prime}(k_{1}-l_{A},K,\boldsymbol{I}(A)),\quad i\in[|A|], (78)

    where $<i>$ denotes the source in $A$ with the $i$-th smallest element in $\boldsymbol{I}(A)$.

  4. (iv)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (54), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{2}-l_{A},K,\boldsymbol{J}(A))$, defined according to (20), such that

    c_{\{i\}}^{*}(A):=c^{\prime}_{i}(k_{2}-l_{A},K,\boldsymbol{J}(A)),\quad i\in[|A^{c}|], (79)

    where $\{i\}$ denotes the source in $A^{c}$ with the $i$-th smallest element in $\boldsymbol{J}(A)$.

  5. (v)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A))$, defined according to (35), such that

    c^{*}_{<i>}(A):=\hat{c}^{\prime}_{i}(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A)),\quad i\in[|A|], (80)
    c^{*}_{\{j\}}(A):=\check{c}^{\prime}_{j}(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A)),\quad j\in[|A^{c}|-l_{A}], (81)

    where $<i>$ is the source in $A$ with the $i$-th smallest element in $\boldsymbol{I}(A)$, and $\{j\}$ the source in $A^{c}$ with the $j$-th smallest element in $\boldsymbol{J_{l_{A}}}(A)$.

  6. (vi)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (52), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A))$, defined according to (35), such that

    c^{*}_{<i>}(A):=\hat{c}^{\prime}_{i}(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A)),\quad i\in[|A|-l_{A}], (82)
    c^{*}_{\{j\}}(A):=\check{c}^{\prime}_{j}(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A)),\quad j\in[|A^{c}|], (83)

    where $<i>$ denotes the source in $A$ with the $i$-th smallest element in $\boldsymbol{I_{l_{A}}}(A)$, and $\{j\}$ the source in $A^{c}$ with the $j$-th smallest element in $\boldsymbol{J}(A)$.

We are now ready to state the second main result of this section.

Theorem V.2

Consider a policy of the form $(R,T^{R}_{leap},\Delta^{R}_{leap})$, where the thresholds $a$ and $b$ in (71)-(72) are selected according to (75) and the sampling constraint (4) is satisfied. Fix $A\subseteq[M]$, and suppose that, for all $i\in[M]$, the sampling rule $R$ satisfies

\sum_{n=1}^{\infty}\mathsf{P}_{A}\left(\pi_{i}^{R}(n)<c^{*}_{i}(A)-\epsilon\right)<\infty,\quad\forall\;\epsilon>0, (84)

where the vector $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ is defined according to Definition V.2.

  • If $A=\emptyset$, then, for any $\alpha\in(0,1)$, as $\beta\to 0$ we have

    \mathsf{E}_{A}\left[T^{R}_{leap}\right]\sim\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log\beta|}{V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A))}. (85)

  • If $A=[M]$, then, for any $\beta\in(0,1)$, as $\alpha\to 0$ we have

    \mathsf{E}_{A}\left[T^{R}_{leap}\right]\sim\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log\alpha|}{V(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A))}. (86)

  • If $0<|A|<M$, then, as $\alpha,\beta\to 0$ so that (50) holds, we have

    \mathsf{E}_{A}\left[T^{R}_{leap}\right]\sim\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log\alpha|}{v_{A}(k_{1},k_{2},K,r)}, (87)

where $V$ is defined in (18), and $v_{A}$ in Definition IV.1.

Proof:

Appendix C. ∎
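The following sketch implements the stopping conditions behind (71)-(73) and the decision rule (74) at a given LLR vector; it is our own illustration, written for the full-sampling case with the time argument suppressed, and with the thresholds (75) passed in as a and b.

```python
# Sketch: check whether some tau-hat(l) or tau-check(l) fires at the current
# LLR vector Lam, and if so return the decision (74); otherwise return None.
import numpy as np

def leap_check(Lam, k1, k2, a, b):
    Lam = np.asarray(Lam, dtype=float)
    pos = np.where(Lam >= 0)[0]                # sources with non-negative LLRs
    neg = np.where(Lam < 0)[0]
    pos = pos[np.argsort(Lam[pos])]            # hat-w order: increasing LLR
    neg = neg[np.argsort(-Lam[neg])]           # check-w order: closest to 0 first
    hat = Lam[pos]                             # hat-Lambda_1 <= hat-Lambda_2 <= ...
    chk = -Lam[neg]                            # check-Lambda_1 <= check-Lambda_2 <= ...

    def s(vals, i, j):                         # sum_{m=i}^{j} of vals, with the
        return np.inf if j > len(vals) else vals[i - 1:j].sum()   # +inf padding (70)

    for l in range(0, k1):                     # tau-hat(l), 0 <= l < k1, cf. (71)
        if s(hat, 1, k1 - l) >= b and s(chk, 1 + l, k2 + l) >= a:
            return set(pos) | set(neg[:min(l, len(neg))])         # (74), first case
    for l in range(1, k2):                     # tau-check(l), 1 <= l < k2, cf. (72)
        if s(hat, 1 + l, k1 + l) >= b and s(chk, 1, k2 - l) >= a:
            return set(pos[l:])                                   # (74), second case
    return None
```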

VI Asymptotically optimal probabilistic sampling rules

In this section, we design sampling rules that satisfy the criteria for asymptotic optimality established in Section V simultaneously for every possible subset of anomalous sources. For this, we first introduce a notion of consistency, which applies to an arbitrary sampling rule. Then, we define a family of probabilistic sampling rules and, finally, we show how to design a probabilistic sampling rule so that condition (67) (resp. (84)) is satisfied for every $i\in[M]$ and $A\subseteq[M]$, and consequently so that the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds for every $A\subseteq[M]$.

VI-A Consistency

We say that a sampling rule $R$ is consistent if the subset of sources with non-negative LLRs at time $n$, i.e.,

\mathfrak{D}_{n}^{R}:=\{i\in[M]\,:\,\Lambda^{R}_{i}(n)\geq 0\}, (88)

converges quickly to the true subset of anomalous sources $A$.

Definition VI.1

Fix A[M]A\subseteq[M]. We say that a sampling rule RR is consistent under 𝖯A\mathsf{P}_{A} if

𝖤A[σAR]<,\mathsf{E}_{A}\left[\sigma^{R}_{A}\right]<\infty, (89)

where σAR\sigma_{A}^{R} is the random time starting from which the sources in AA are the only ones with non-negative LLR, i.e.,

σAR:=inf{n:𝔇mR=Afor allmn}.\sigma^{R}_{A}:=\inf\left\{n\in\mathbb{N}:\mathfrak{D}^{R}_{m}=A\quad\text{for all}\;\;m\geq n\right\}. (90)
Definition VI.2

We say that a sampling rule RR is consistent if it is consistent under 𝖯A\mathsf{P}_{A}, for every A[M]A\subseteq[M].

From [9, Theorem 3.1] we know that if there is a ρ>0\rho>0 such that, for each i[M]i\in[M], the sequence {𝖯A(πiR(n)<ρ):n}\{\mathsf{P}_{A}\left(\pi^{R}_{i}(n)<\rho\right)\,:\,n\in\mathbb{N}\} is exponentially decaying, then the sequence {𝖯A(σAR>n):n}\{\mathsf{P}_{A}\left(\sigma^{R}_{A}>n\right)\,:\,n\in\mathbb{N}\} is exponentially decaying, and, as a result, the sampling rule RR is consistent under 𝖯A\mathsf{P}_{A}. Next, we state a less restrictive criterion for consistency, according to which a sampling rule can be consistent even if the long-run sampling frequencies of all sources are equal to 0, as long as their decay to 0 is sufficiently slow.

Theorem VI.1

Suppose that condition (13) holds for some 𝔭>4\mathfrak{p}>4, and let δ(0,122𝔭)\delta\in\left(0,\frac{1}{2}-\frac{2}{\mathfrak{p}}\right), and C>0C>0. Fix A[M]A\subseteq[M]. If RR is an arbitrary sampling rule for which

n=1n𝖯A(πiR(n)<Cnδ)<,i[M],\sum_{n=1}^{\infty}n\,\mathsf{P}_{A}\left(\pi^{R}_{i}(n)<C\,n^{-\delta}\right)<\infty,\quad\forall\,i\,\in[M], (91)

then RR is consistent under 𝖯A\mathsf{P}_{A}.

Proof:

Appendix C. ∎

VI-B Probabilistic sampling rules

We say that a sampling rule RR is probabilistic if there exists a function

qR:2[M]××2[M][0,1]q^{R}:2^{[M]}\times\mathbb{N}\times 2^{[M]}\to[0,1]

such that, for every nn\in\mathbb{N}, D[M]D\subseteq[M], and B[M]B\subseteq[M], qR(B;n,D)q^{R}\left(B;n,D\right) is the probability that BB is the subset of sampled sources at time nn when DD is the subset of sources with non-negative LLRs at time n1n-1, i.e.,

qR(B;n,D):=𝖯(R(n)=B|n1R,𝔇n1R=D).\displaystyle q^{R}\left(B;n,D\right):=\mathsf{P}\left(R(n)=B\,|\,\mathcal{F}^{R}_{n-1},\mathfrak{D}^{R}_{n-1}=D\right). (92)

For such a sampling rule, for each source i[M]i\in[M] we denote by ciR(n,D)c^{R}_{i}\left(n,D\right) the probability with which source ii is sampled at time nn when DD is the subset of sources with non-negative LLRs at time n1n-1, i.e.,

ciR(n,D)\displaystyle c^{R}_{i}\left(n,D\right) :=𝖯(Ri(n)=1|n1R,𝔇n1R=D)\displaystyle:=\mathsf{P}\left(R_{i}(n)=1\,|\,\mathcal{F}^{R}_{n-1},\mathfrak{D}^{R}_{n-1}=D\right) (93)

thus,

ciR(n,D)=B[M]:iBqR(B;n,D).\displaystyle c^{R}_{i}\left(n,D\right)=\sum_{B\subseteq[M]:\,i\in B}q^{R}\left(B;n,D\right). (94)
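
For illustration, (94) is simply a marginalization of qRq^{R} over all subsets that contain source ii. The following minimal sketch in Python transcribes it; the dictionary representation of the joint distribution is our own device for illustration, not a construct from the paper.

```python
def marginals(q, M):
    """Per-source sampling probabilities (94) from a joint rule q, where q maps
    each frozenset B of {0,...,M-1} to the probability q(B; n, D) of sampling
    exactly the sources in B."""
    c = [0.0] * M
    for B, prob in q.items():
        for i in B:
            c[i] += prob
    return c

# Example with M = 3: sample {0,1} or {0,2} with equal probability.
q = {frozenset({0, 1}): 0.5, frozenset({0, 2}): 0.5}
print(marginals(q, 3))  # [1.0, 0.5, 0.5]
```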

In the following theorem, we state a condition under which a consistent probabilistic sampling rule satisfies condition (67) or (84) simultaneously for every A[M]A\subseteq[M].

Theorem VI.2

Let RR be a consistent probabilistic sampling rule, and fix A[M]A\subseteq[M].

  • (i)

    If for all i[M]i\in[M] we have

    lim infnciR(n,A)ci(A),\liminf_{n\to\infty}c_{i}^{R}(n,A)\geq c^{*}_{i}(A), (95)

    where (c1(A),,cM(A))(c^{*}_{1}(A),\ldots,c^{*}_{M}(A)) is given by Definition V.1 (resp. Definition V.2), then condition (67) (resp. (84)) is satisfied for every i[M]i\in[M].

  • (ii)

    If, also, the sampling constraint (4) is satisfied, then the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds.

Proof:

Part (i) is proven in Appendix C, and part (ii) follows by Theorem V.1 (resp. Theorem V.2). ∎

Condition (95) is clearly satisfied if source ii is sampled at each instant with probability ci(D)c^{*}_{i}(D) when the estimated anomalous subset at the previous time instant is DD, i.e.,

ciR(n,D)=ci(D),n,D[M],i[M].c_{i}^{R}(n,D)=c^{*}_{i}(D),\quad\forall\;n\in\mathbb{N},\;D\subseteq[M],\;i\in[M]. (96)

From [9, Theorem 4.1] it follows that this choice implies that the sampling rule is consistent when

ci(D)>0,D[M],i[M].c^{*}_{i}(D)>0,\quad\forall\;D\subseteq[M],\;i\in[M]. (97)

In case condition (97) does not hold, the selection (96) no longer guarantees the consistency of the sampling rule and needs to be modified. Indeed, for a source i[M]i\in[M] for which ci(D)=0c^{*}_{i}(D)=0, the sampling frequency of source ii should converge to 0 slowly enough when the true subset of anomalous sources is DD, so that Theorem VI.2 remains applicable. To be more specific, let {bn:n}\{b_{n}:n\in\mathbb{N}\} be a sequence of positive reals that converges to 0, which we will specify later, and for each nn\in\mathbb{N}, i[M]i\in[M], and D[M]D\subseteq[M] set

ciR(n,D)={bn,ifci(D)=0,ci(D)(l~D/(Ml~D))bn,ifci(D)>0,c_{i}^{R}(n,D)=\begin{cases}b_{n},\quad&\text{if}\quad c^{*}_{i}(D)=0,\\ c^{*}_{i}(D)-(\tilde{l}_{D}/(M-\tilde{l}_{D}))\,b_{n},\quad&\text{if}\quad c^{*}_{i}(D)>0,\end{cases} (98)

where l~D\tilde{l}_{D} is the number of zero entries in the vector 𝐜(D)\mathbf{c}^{*}(D).
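
The following sketch makes the modification (98) concrete and also shows one way to realize, at each time instant, a random subset whose per-source inclusion probabilities match a prescribed vector, via systematic sampling in the sense of Madow. The paper does not prescribe a particular joint distribution qRq^{R}, so this scheme, as well as the constants CpC_{p} and δ\delta used below, are merely illustrative.

```python
import numpy as np

def sampling_probs(c_star, n, Cp=0.05, delta=0.2):
    """Per-source sampling probabilities at time n, as in (98): sources with
    c*_i(D) = 0 receive the slowly vanishing floor b_n = Cp * n**(-delta), and
    the remaining entries are reduced so that the total sum(c_star) is preserved.
    Assumes c_star has at least one positive entry."""
    c = np.asarray(c_star, dtype=float)
    b_n = Cp * n ** (-delta)
    l = int(np.sum(c == 0.0))            # number of zero entries of c*(D)
    M = len(c)
    return np.where(c == 0.0, b_n, c - (l / (M - l)) * b_n)

def madow_sample(c, rng):
    """Systematic (Madow) sampling: returns a random subset in which source i
    is included with probability exactly c[i] (each c[i] <= 1); when sum(c) is
    an integer K, exactly K sources are selected."""
    cum = np.concatenate(([0.0], np.cumsum(c)))
    u = rng.random()
    grid = np.arange(u, cum[-1], 1.0)    # points u, u+1, u+2, ... below sum(c)
    return np.searchsorted(cum, grid, side="right") - 1

rng = np.random.default_rng(1)
c_star = np.array([0.5, 0.0, 1.0, 0.5, 0.0])     # hypothetical c*(D), sum = 2
B = madow_sample(sampling_probs(c_star, n=100), rng)   # exactly 2 sources sampled
```

Since the resulting bn=Cpnδb_{n}=C_{p}\,n^{-\delta} matches (99), this choice falls under Proposition VI.1(ii) below, provided CpC_{p} is taken small enough.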

As we show in the following proposition, a suitable selection of the sequence {bn:n}\{b_{n}:n\in\mathbb{N}\} guarantees the consistency of the sampling rule.

Proposition VI.1
  • (i)

    If (97) holds, and the probabilistic sampling rule RR satisfies (96), then condition (67) (resp. (84)) is satisfied for every i[M]i\in[M].

  • (ii)

    Suppose condition (13) holds for some 𝔭>4\mathfrak{p}>4. If RR is a probabilistic sampling rule that satisfies (98) for every i[M],D[M]i\in[M],D\subseteq[M], nn\in\mathbb{N}, where bn=Cpnδb_{n}=C_{p}\,n^{-\delta} for some δ(0,122𝔭)\delta\in\left(0,\frac{1}{2}-\frac{2}{\mathfrak{p}}\right) and some Cp>0C_{p}>0 small enough such that

    ciR(n,D)Cpnδc_{i}^{R}(n,D)\geq C_{p}\,n^{-\delta} (99)

    holds for all nn\in\mathbb{N}, D[M]D\subseteq[M], and i[M]i\in[M], then condition (67) (resp. (84)) is satisfied for every i[M]i\in[M].

Proof:

Since (96) and (98) both imply (95) for any A[M]A\subseteq[M], by Theorem VI.2 it follows that it suffices to show that RR is consistent. As discussed earlier, for (i) this follows from [9, Theorem 4.1]. For (ii), the proof of consistency is presented in Appendix C. ∎

The previous proposition provides concrete selections of cR(n,D)c^{R}(n,D) that guarantee the consistency of the sampling rule and conditions (67) or (84). To achieve the corresponding asymptotic optimality property, the sampling rule should satisfy the sampling constraint (4). This is clearly the case if at most K\lfloor K\rfloor sources are sampled at each instant, i.e.,

qR(B;n,D)=0for alln,D[M],and B[M] such that |B|>K.\displaystyle q^{R}\left(B;n,D\right)=0\quad\text{for all}\quad n\in\mathbb{N},\;D\subseteq[M],\quad\text{and }\;B\subseteq[M]\;\text{ such that }\;|B|>\lfloor K\rfloor. (100)

Combining this observation with the previous proposition we can now state the theorem that summarizes the main result of this section.

Theorem VI.3

Consider an integer KK.

  • (i)

    Suppose (97) holds. If RR is a probabilistic sampling rule that satisfies (96) and (100), then the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds for every A[M]A\subseteq[M].

  • (ii)

    If RR is a probabilistic sampling rule that satisfies (98) and (100) for every i[M],D[M]i\in[M],D\subseteq[M], nn\in\mathbb{N}, where bn=Cpnδb_{n}=C_{p}\,n^{-\delta} for some δ(0,122𝔭)\delta\in\left(0,\frac{1}{2}-\frac{2}{\mathfrak{p}}\right) and some Cp>0C_{p}>0 small enough such that (99) holds for all nn\in\mathbb{N}, D[M]D\subseteq[M], and i[M]i\in[M], then the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds for every A[M]A\subseteq[M].

Proof:

The claim follows by Proposition VI.1, and Theorems V.1, V.2, respectively. ∎

Remark VI.1

Theorem VI.3 remains valid even if KK is not an integer as long as

i=1Mci(D)K,D[M].\sum_{i=1}^{M}c^{*}_{i}(D)\leq\lfloor K\rfloor,\quad\forall\,D\subseteq[M].
Remark VI.2

Proposition VI.1 and Theorem VI.3 remain valid even if the ciR(n,D)c_{i}^{R}(n,D) are chosen greater than or equal to the values in (96) (resp. (98)), as long as the sampling constraint (100) is satisfied.

VII Simulation study

In this section, we present the results of various simulations by which we illustrate the asymptotic theory that was developed in the previous sections. In all simulations, there are M=10M=10 sources and, for each source i[M]i\in[M], the observations are normally distributed with variance 11 and mean 0 if source ii is not anomalous, whereas the mean is μi\mu_{i} if it is anomalous, i.e., f0i=𝒩(0,1)f_{0i}=\mathcal{N}(0,1) and f1i=𝒩(μi,1)f_{1i}=\mathcal{N}(\mu_{i},1), and thus Ii=Ji=(μi)2/2I_{i}=J_{i}=(\mu_{i})^{2}/2. For our simulations, we consider a non-homogeneous setup, where

μi={0.5,1i3,0.7,4i7,1,8i10.\mu_{i}=\begin{cases}0.5,\quad&1\leq i\leq 3,\\ 0.7,\quad&4\leq i\leq 7,\\ 1,\quad&8\leq i\leq 10.\end{cases}

We apply the probabilistic sampling rule (see Section VI), which observes the values of exactly K=5K=5 sources per sampling instant. When controlling the generalized misclassification error, we apply the sum-intersection rule (63)-(64), whereas when controlling the generalized familywise errors, we apply the leap rule (73)-(74). For the computation of each expected time for stopping, we perform 10410^{4} simulation runs; in all cases, the standard error of each expected time for stopping is 11, and the standard error of each ratio of expected times for stopping is 10210^{-2}.

In all simulations, we fix the true, but unknown, anomalous set to be A={1,,5}A=\{1,\ldots,5\}. The numbers {Fi(A):i[M]}\{F_{i}(A)\,:\,i\in[M]\} defined in Subsection IV-A for Ii=Ji=(μi)2/2I_{i}=J_{i}=(\mu_{i})^{2}/2, are equal to

Fi(A)\displaystyle F_{i}(A) ={0.125,1i3,0.245,4i7,0.5,8i10,\displaystyle=\begin{cases}0.125,\quad&1\leq i\leq 3,\\ 0.245,\quad&4\leq i\leq 7,\\ 0.5,\quad&8\leq i\leq 10,\end{cases}

and for {Ii(A):i[|A|]}\{I_{i}(A)\,:\,i\in[|A|]\}, {Ji(A):i[|Ac|]}\{J_{i}(A)\,:\,i\in[|A^{c}|]\} defined in Subsection IV-B, we have

Ii(A)\displaystyle I_{i}(A) =Fi(A),1i5,\displaystyle=F_{i}(A),\quad 1\leq i\leq 5,
Ji5(A)\displaystyle J_{i-5}(A) =Fi(A),6i10.\displaystyle=F_{i}(A),\quad 6\leq i\leq 10.
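
These numbers follow directly from the Gaussian model; a short computation confirming them:

```python
import numpy as np

mu = np.array([0.5] * 3 + [0.7] * 4 + [1.0] * 3)   # means under the anomalous hypothesis
# For f_1i = N(mu_i, 1) versus f_0i = N(0, 1), the KL numbers are I_i = J_i = mu_i**2 / 2,
# so F_i(A) = mu_i**2 / 2 for every i, regardless of whether i is in A = {1,...,5}.
F = mu ** 2 / 2
print(F)                  # [0.125 0.125 0.125 0.245 0.245 0.245 0.245 0.5 0.5 0.5]
I_A, J_A = F[:5], F[5:]   # I_i(A) for the sources in A, J_{i-5}(A) for those in A^c
```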

VII-A Controlling the generalized misclassification error

In the case of generalized misclassification error rate, by Theorem V.1 as α0\alpha\to 0

𝒥A(α;k,K)|logα|V(k,K,𝑭(A)),\mathcal{J}_{A}(\alpha;k,K)\sim\frac{|\log\alpha|}{V(k,K,\boldsymbol{F}(A))}, (101)

where for K=5K=5 we have

V(k,K,𝑭(A))={kKMF~1(A),if k{1,,5},(k3)KM3F~4(A),if k{6,,8},i=69Fi(A),if k=9,i=610Fi(A),if k=10.V(k,K,\boldsymbol{F}(A))=\begin{cases}k\,\frac{K}{M}\,\widetilde{F}_{1}(A),\quad&\mbox{if }\;k\in\{1,\ldots,5\},\\ (k-3)\,\frac{K}{M-3}\,\widetilde{F}_{4}(A),\quad&\mbox{if }\;k\in\{6,\ldots,8\},\\ \sum_{i=6}^{9}F_{i}(A),\quad&\mbox{if }\;k=9,\\ \sum_{i=6}^{10}F_{i}(A),\quad&\mbox{if }\;k=10.\end{cases} (102)

For the computation of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) we used Algorithm 1. We can easily verify that k(K/M)F~1(A)k(K/M)\widetilde{F}_{1}(A) is less than or equal to the respective value of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) for all k[M]k\in[M], and as a result

𝒥A(α;k,K)𝒥A(α;1,K)1k,k[M].\frac{\mathcal{J}_{A}(\alpha;k,K)}{\mathcal{J}_{A}(\alpha;1,K)}\lesssim\frac{1}{k},\quad\forall\,k\in[M]. (103)

In Figure 1, we plot the ratio of the expected time for stopping for k=5k=5 over that for k=1k=1, against |log10(α)||\log_{10}(\alpha)| for α{101,,1010}\alpha\in\{10^{-1},\ldots,10^{-10}\}. For each value of α\alpha, the thresholds are selected according to (65). As expected from the form of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) for k=5k=5 in (102), the ratio converges to 1/51/5, and this is also depicted in Figure 1, where by 𝖤[T;k=1]\mathsf{E}[T;k=1], 𝖤[T;k=5]\mathsf{E}[T;k=5] we denote the expected stopping time for k=1k=1, k=5k=5, respectively.


Figure 1: Ratio of expected stopping times 𝖤[T;k=5]/𝖤[T;k=1]\mathsf{E}[T;k=5]/\mathsf{E}[T;k=1] versus |log10(α)||\log_{10}(\alpha)|.

In order to verify the limit (103) as an approximation in the finite regime, we fix α=103\alpha=10^{-3} and select the thresholds of the sum-intersection rule in (63), via Monte Carlo simulation, so that the probability of at least kk errors, of any kind, is equal to 10310^{-3}. In Figure 2, we plot the ratio of the expected time for stopping for kk over that for k=1k=1, against k[M]k\in[M], where by 𝖤[T;k]\mathsf{E}[T;k] we denote the expected stopping time for the respective value of kk. We can see that the expected time for stopping for k=1k=1 is reduced by a factor approximately equal to 1/k1/k for 1k51\leq k\leq 5, and by a factor clearly smaller than 1/k1/k for 5k105\leq k\leq 10.

Figure 2: Ratio of expected stopping times 𝖤[T;k]/𝖤[T;k=1]\mathsf{E}[T;k]/\mathsf{E}[T;k=1] versus the tolerance level k.

VII-B Controlling the generalized familywise errors

In the case of generalized familywise error rates, for r=1r=1 and since 0<|A|<M0<|A|<M, by Theorem V.2 as α,β0\alpha,\,\beta\to 0

𝒥A(α,β;k1,k2,K)|log(α)|vA(k1,k2,K,r),\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}, (104)

where for K=5K=5, vA(k1,k2,K,r)v_{A}(k_{1},k_{2},K,r) has the following form

vA(k1,k2,K,r)={k1y1I~1(A)=k2y2J~1(A),k1=k2{1,2},k1y1I~2(A)=(k21)y2J~1(A),k1=k2=3,x1I3(A)+i=45Ii(A)=(k21)y2J~1(A),k1=k2=4,x1I2(A)+i=35Ii(A)=x2J4(A)+J5(A),k1=k2=5.v_{A}(k_{1},k_{2},K,r)=\begin{cases}k_{1}\,y_{1}\,\widetilde{I}_{1}(A)=k_{2}\,y_{2}\,\widetilde{J}_{1}(A),\quad&k_{1}=k_{2}\in\{1,2\},\\ k_{1}\,y_{1}\,\widetilde{I}_{2}(A)=(k_{2}-1)\,y_{2}\,\widetilde{J}_{1}(A),\quad&k_{1}=k_{2}=3,\\ x_{1}\,I_{3}(A)+\sum_{i=4}^{5}I_{i}(A)=(k_{2}-1)\,y_{2}\,\widetilde{J}_{1}(A),\quad&k_{1}=k_{2}=4,\\ x_{1}\,I_{2}(A)+\sum_{i=3}^{5}I_{i}(A)=x_{2}\,J_{4}(A)+J_{5}(A),\quad&k_{1}=k_{2}=5.\end{cases} (105)

where x1x_{1}, x2x_{2}, y1y_{1}, y2y_{2}, and lAl_{A} are given in Table I.

k1=k2k_{1}=k_{2} 1 2 3 4 5
lAl_{A} 0 0 1 1 0
x1x_{1} - - - 0.61 0.61
x2x_{2} - - - - 0.39
y1y_{1} 0.69 0.69 0.66 - -
y2y_{2} 0.30 0.30 0.46 0.53 -
TABLE I: The lAl_{A}, x1x_{1}, x2x_{2}, y1y_{1}, y2y_{2} for each k1=k2[M/2]k_{1}=k_{2}\in[M/2].

For the computation of vA(k1,k2,K,r)v_{A}(k_{1},k_{2},K,r) we first use Algorithm 5, and then for the computation of x1x_{1}, x2x_{2}, y1y_{1}, y2y_{2} we use Algorithm 1. We can easily verify that k1y1I~1(A)k_{1}\,y_{1}\,\widetilde{I}_{1}(A) is less than or equal to vA(k1,k2,K,r)v_{A}(k_{1},k_{2},K,r) for all k1=k2[M/2]k_{1}=k_{2}\in[M/2], and as a result

𝒥A(α,β;k1,k1,K)𝒥A(α,β;1,1,K)1k1.\frac{\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{1},K)}{\mathcal{J}_{A}(\alpha,\beta;1,1,K)}\lesssim\frac{1}{k_{1}}. (106)

In Figure 3, we plot the ratio of the expected stopping time for k1=k2=3k_{1}=k_{2}=3 over that for k1=k2=1k_{1}=k_{2}=1, against |log10(α)||\log_{10}(\alpha)| for α{101,,1010}\alpha\in\{10^{-1},\ldots,10^{-10}\}. For each value of α\alpha, and β=α\beta=\alpha, the thresholds are selected according to (75). As expected from the form of vAv_{A} in (105), this ratio converges to 0.3280.328, which is also depicted in Figure 3, where by 𝖤[T;k1=k2=3]\mathsf{E}[T;k_{1}=k_{2}=3], 𝖤[T;k1=k2=1]\mathsf{E}[T;k_{1}=k_{2}=1] we denote the expected stopping time for k1=k2=3k_{1}=k_{2}=3, k1=k2=1k_{1}=k_{2}=1, respectively.


Figure 3: Ratio of expected stopping times 𝖤[T;k1=k2=3]/𝖤[T;k1=k2=1]\mathsf{E}[T;k_{1}=k_{2}=3]/\mathsf{E}[T;k_{1}=k_{2}=1] versus |log10(α)||\log_{10}(\alpha)|.

In order to verify the limit (106) as an approximation in the finite regime, we fix α=β=103\alpha=\beta=10^{-3} and we select the thresholds of the leap rule (73), via Monte Carlo simulation, so that the probabilities of at least k1k_{1} false positives and at least k2k_{2} false negatives are both equal to 10310^{-3}. In Figure 4, we depict the ratio of the expected time for stopping for k1=k2k_{1}=k_{2} over that for k1=k2=1k_{1}=k_{2}=1, against k1=k2[M/2]k_{1}=k_{2}\in[M/2]. We observe that the expected stopping time for k1=k2=1k_{1}=k_{2}=1 is reduced by a factor of approximately 1/k11/k_{1} as k1k_{1} increases.


Figure 4: Ratio of expected stopping times 𝖤[T;k1]/𝖤[T;k1=1]\mathsf{E}[T;k_{1}]/\mathsf{E}[T;k_{1}=1] versus the tolerance level k1=k2k_{1}=k_{2}.

VIII Conclusion

In this work, we study the sequential anomaly identification problem in the presence of a sampling constraint under two different formulations that involve generalized error metrics. For each of them we establish a universal asymptotic lower bound, and we show that it is attained by a policy that combines the stopping and decision rules proposed in the full-sampling case in [27] with a probabilistic sampling rule that is designed to achieve specific long-run sampling frequencies. The optimal performance is characterized, and the impact of the sampling constraint and of the tolerance to errors is assessed, both to a first-order asymptotic approximation as the error probabilities go to 0. These theoretical asymptotic results are also illustrated via simulation studies.

Directions for further research involve (i) the incorporation of prior information on the number of anomalous sources, as in [9] in the case of classical familywise error control (k1=k2=1)(k_{1}=k_{2}=1), (ii) consideration of composite hypotheses for the testing problem in each source, as in [27] in the full-sampling case, and in [12] for the case where it is known a priori that there is only one anomalous source and only one source can be observed at a time, (iii) varying sampling or switching cost per source, as in [10] and [11], respectively. Other directions include the framework where the acquired observations are not conditionally independent of the past [19], as well as the consideration of a dependence structure within the observations from different sources [28].

Acknowledgments

This work was supported by the University of Illinois at Urbana–Champaign research support award RB21036.

References

  • [1] A. Tsopelakos and G. Fellouris, “Sequential anomaly detection with observation control under a generalized error metric,” in 2020 IEEE International Symposium on Information Theory (ISIT), 2020, pp. 1165–1170.
  • [2] R. J. Bolton and D. J. Hand, “Statistical Fraud Detection: A Review,” Statistical Science, vol. 17, no. 3, pp. 235 – 255, 2002. [Online]. Available: https://doi.org/10.1214/ss/1042727940
  • [3] E. Dimson, Stock market anomalies.   CUP Archive, 1988.
  • [4] S. K. De and M. Baron, “Sequential bonferroni methods for multiple hypothesis testing with strong control of family-wise error rates i and ii,” Sequential Analysis, vol. 31, no. 2, pp. 238–262, 2012.
  • [5] J. Bartroff and J. Song, “Sequential tests of multiple hypotheses controlling type i and ii familywise error rates,” Journal of statistical planning and inference, vol. 153, pp. 100–114, 2014.
  • [6] Y. Song and G. Fellouris, “Asymptotically optimal, sequential, multiple testing procedures with prior information on the number of signals,” Electronic Journal of Statistics, vol. 11, 03 2016.
  • [7] K. Cohen and Q. Zhao, “Active hypothesis testing for anomaly detection,” IEEE Transactions on Information Theory, vol. 61, no. 3, pp. 1432–1450, 2015.
  • [8] B. Huang, K. Cohen, and Q. Zhao, “Active anomaly detection in heterogeneous processes,” IEEE Transactions on information theory, vol. 65, no. 4, pp. 2284–2301, 2018.
  • [9] A. Tsopelakos and G. Fellouris, “Sequential anomaly detection under sampling constraints,” IEEE Transactions on Information Theory, pp. 1–1, 2022.
  • [10] A. Gurevich, K. Cohen, and Q. Zhao, “Sequential anomaly detection under a nonlinear system cost,” IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3689–3703, 2019.
  • [11] T. Lambez and K. Cohen, “Anomaly search with multiple plays under delay and switching costs,” IEEE Transactions on Signal Processing, vol. 70, pp. 174–189, 2021.
  • [12] B. Hemo, T. Gafni, K. Cohen, and Q. Zhao, “Searching for anomalies over composite hypotheses,” IEEE Transactions on Signal Processing, vol. 68, pp. 1181–1196, 2020.
  • [13] A. Deshmukh, V. V. Veeravalli, and S. Bhashyam, “Sequential controlled sensing for composite multihypothesis testing,” Sequential Analysis, vol. 40, no. 2, pp. 259–289, 2021.
  • [14] G. R. Prabhu, S. Bhashyam, A. Gopalan, and R. Sundaresan, “Sequential multi-hypothesis testing in multi-armed bandit problems: An approach for asymptotic optimality,” IEEE Transactions on Information Theory, 2022.
  • [15] A. Tsopelakos and G. Fellouris, “Asymptotically optimal sequential anomaly identification with ordering sampling rules,” arXiv preprint arXiv:2309.14528, 2023.
  • [16] H. Chernoff, “Sequential design of experiments,” Ann. Math. Statist., vol. 30, no. 3, pp. 755–770, 09 1959. [Online]. Available: http://dx.doi.org/10.1214/aoms/1177706205
  • [17] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
  • [18] S. A. Bessler, “Theory and applications of the sequential design of experiments, k-actions and infinitely many experiments, part i theory.” Department of Statistics, Stanford University, Technical Report 55, 1960.
  • [19] S. Nitinawarat and V. V. Veeravalli, “Controlled sensing for sequential multihypothesis testing with controlled markovian observations and non-uniform control cost,” Sequential Analysis, vol. 34, no. 1, pp. 1–24, 2015. [Online]. Available: https://doi.org/10.1080/07474946.2014.961864
  • [20] J. Bartroff, “Multiple hypothesis tests controlling generalized error rates for sequential data,” Statistica Sinica, vol. 28, no. 1, pp. 363–398, 2018. [Online]. Available: http://www.jstor.org/stable/26384246
  • [21] J. Bartroff and J. Song, “Sequential tests of multiple hypotheses controlling false discovery and nondiscovery rates,” Sequential Analysis, vol. 39, no. 1, pp. 65–91, 2020. [Online]. Available: https://doi.org/10.1080/07474946.2020.1726686
  • [22] X. He and J. Bartroff, “Asymptotically optimal sequential fdr and pfdr control with (or without) prior information on the number of signals,” Journal of Statistical Planning and Inference, vol. 210, pp. 87–99, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0378375820300604
  • [23] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal statistical society: series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995.
  • [24] E. L. Lehmann and J. P. Romano, “Generalizations of the familywise error rate,” The Annals of Statistics, vol. 33, no. 3, pp. 1138–1154, 2005. [Online]. Available: http://www.jstor.org/stable/3448684
  • [25] J. P. Romano and A. M. Shaikh, “Stepup procedures for control of generalizations of the familywise error rate,” The Annals of Statistics, vol. 34, no. 4, pp. 1850–1873, 2006. [Online]. Available: http://www.jstor.org/stable/25463487
  • [26] J. P. Romano and M. Wolf, “Control of generalized error rates in multiple testing,” The Annals of Statistics, vol. 35, no. 4, pp. 1378–1408, 2007. [Online]. Available: http://www.jstor.org/stable/25464544
  • [27] Y. Song and G. Fellouris, “Sequential multiple testing with generalized error control: An asymptotic optimality theory,” The Annals of Statistics, vol. 47, no. 3, pp. 1776 – 1803, 2019. [Online]. Available: https://doi.org/10.1214/18-AOS1737
  • [28] J. Heydari, A. Tajer, and H. V. Poor, “Quickest linear search over correlated sequences,” IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5786–5808, 2016.
  • [29] P. Hall and C. C. Heyde, Martingale Limit Theory and Its Application, ser. Probability and Mathematical Statistics. New York: Academic Press, 1980.
  • [30] O. Kallenberg, Foundations of Modern Probability, ser. Probability and Its Applications.   Springer New York, 2002. [Online]. Available: https://books.google.com/books?id=TBgFslMy8V4C
  • [31] E. Cesàro, “Sur la convergence des séries,” Nouvelles annales de mathématiques: journal des candidats aux écoles polytechnique et normale, vol. 7, pp. 49–59, 1888.

Appendix A

In Appendix A, we prove Lemma III.1 and Lemma III.2, which provide the solutions to the max-min problems (18) and (30), respectively. We also provide Algorithm 1 for the computation of x,y[0,1)x,y\in[0,1) and u,v[0,κ]u,v\in[0,\kappa], and Algorithm 5 for the computation of (K1,K2)(K^{*}_{1},K^{*}_{2}).

Proof:

Since 𝒟(K)\mathcal{D}(K) is compact, and {U[|𝑳|]:|U|=κ}\{U\subseteq[|\boldsymbol{L}|]\,:\;|U|=\kappa\} is finite, the max-min problem (18) has a solution. The max-min structure of (18) implies that a maximizer (c~1,,c~|𝑳|)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|}) must satisfy the following two conditions:

  1. (i)

    for each i{1,,κ}i\in\{1,\ldots,\kappa\}, and each j{κ+1,,|𝑳|}j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}, it holds

    c~iLic~jLj,\tilde{c}_{i}L_{i}\leq\tilde{c}_{j}L_{j}, (107)
  2. (ii)

    the {c~iLi:i{1,,κ}}\{\tilde{c}_{i}L_{i}\,:\,i\in\{1,\ldots,\kappa\}\} are in ascending order, i.e.,

    c~1L1c~iLic~κLκ.\tilde{c}_{1}L_{1}\leq\ldots\leq\tilde{c}_{i}L_{i}\leq\ldots\leq\tilde{c}_{\kappa}L_{\kappa}. (108)

By definition of 𝒱(𝒄;κ,𝑳)\mathcal{V}(\boldsymbol{c};\kappa,\boldsymbol{L}) in (16), the first condition (107) implies that

V(κ,K,𝑳)=i=1κc~iLi.V(\kappa,K,\boldsymbol{L})=\sum_{i=1}^{\kappa}\tilde{c}_{i}L_{i}. (109)

To prove why condition (107) must hold, we apply an argument by contradiction. Assume that there exist i{1,,κ}i\in\{1,\ldots,\kappa\} and j{κ+1,,|𝑳|}j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\} such that c~iLi>c~jLj\tilde{c}_{i}L_{i}>\tilde{c}_{j}L_{j}, and denote by jj^{*} the index corresponding to the smallest such c~jLj\tilde{c}_{j}L_{j}, in case the assumed inequality holds for more than one j{κ+1,,|𝑳|}j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}, and by ii^{*} the index corresponding to the largest such c~iLi\tilde{c}_{i}L_{i}, in case it holds for more than one i{1,,κ}i\in\{1,\ldots,\kappa\}. Clearly, c~iLi>c~jLj\tilde{c}_{i^{*}}L_{i^{*}}>\tilde{c}_{j^{*}}L_{j^{*}}, and c~jLj\tilde{c}_{j^{*}}L_{j^{*}} is included in the sum of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}), as it is one of the κ\kappa smallest {c~uLu:u[|𝑳|]}\{\tilde{c}_{u}L_{u}:u\in[|\boldsymbol{L}|]\}, whereas c~iLi\tilde{c}_{i^{*}}L_{i^{*}} is not. Since c~iLi>c~jLj\tilde{c}_{i^{*}}L_{i^{*}}>\tilde{c}_{j^{*}}L_{j^{*}} and, by definition, LiLjL_{i^{*}}\leq L_{j^{*}}, we have c~j<c~i\tilde{c}_{j^{*}}<\tilde{c}_{i^{*}}. Thus, simply by swapping the values of c~j\tilde{c}_{j^{*}} and c~i\tilde{c}_{i^{*}} we could increase the value of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) by c~iLjc~jLj\tilde{c}_{i^{*}}L_{j^{*}}-\tilde{c}_{j^{*}}L_{j^{*}}, which is a contradiction because (c~1,,c~|𝑳|)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|}) is a maximizer.

Assuming that (107) holds, which implies (109), in order to prove why condition (108) must hold we again apply an argument by contradiction. Assume that there are i<ji<j, both in {1,,κ}\{1,\ldots,\kappa\}, such that c~iLi>c~jLj\tilde{c}_{i}L_{i}>\tilde{c}_{j}L_{j}. In this case, it is clear that c~i>c~j\tilde{c}_{i}>\tilde{c}_{j}, because by definition LiLjL_{i}\leq L_{j}. If we swap the values of c~i\tilde{c}_{i} and c~j\tilde{c}_{j}, then we increase the value of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}), since

c~iLi+c~jLj<c~jLi+c~iLj(c~ic~j)Li<(c~ic~j)Lj,\tilde{c}_{i}L_{i}+\tilde{c}_{j}L_{j}<\tilde{c}_{j}L_{i}+\tilde{c}_{i}L_{j}\Leftrightarrow(\tilde{c}_{i}-\tilde{c}_{j})L_{i}<(\tilde{c}_{i}-\tilde{c}_{j})L_{j}, (110)

which is a contradiction because (c~1,,c~|𝑳|)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|}) is a maximizer.

The maximizer with the minimum 1\mathcal{L}^{1} norm is the one that satisfies (108), and satisfies (107) through the following equality,

c~κLκ=c~jLj,j{κ+1,,|𝑳|}.\tilde{c}_{\kappa}L_{\kappa}=\tilde{c}_{j}L_{j},\quad\forall\,j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}. (111)

If KK is large enough so that

Kκ+Lκj=κ+1|𝑳|1/Lj,K\geq\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j}, (112)

then the maximizer with the minimum 1\mathcal{L}^{1} norm is

c~1==c~κ=1,\displaystyle\tilde{c}_{1}=\ldots=\tilde{c}_{\kappa}=1, (113)
c~j=\displaystyle\tilde{c}_{j}= Lκ/Lj,j{κ+1,,|𝑳|},\displaystyle L_{\kappa}/L_{j},\quad\forall\;j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\},

which implies that

V(κ,K,𝑳)=i=1κLi,V(\kappa,K,\boldsymbol{L})=\sum_{i=1}^{\kappa}L_{i},

and thus V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) with x=y=0x=y=0, v=1v=1, u=κu=\kappa. Hence, it suffices to prove that V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) when

K<κ+Lκj=κ+1|𝑳|1/Lj.K<\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j}. (114)

When (114) holds, the maximizer (c~1,,c~|𝑳|)𝒟(K)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|})\in\mathcal{D}(K) satisfies

i=1|𝑳|c~i=K,\sum_{i=1}^{|\boldsymbol{L}|}\tilde{c}_{i}=K, (115)

by which we get that

Ki=1κ1c~i=i=κ|𝑳|c~i=c~κLκi=κ|𝑳|1/LiK-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}=\sum_{i=\kappa}^{|\boldsymbol{L}|}\tilde{c}_{i}=\tilde{c}_{\kappa}L_{\kappa}\sum_{i=\kappa}^{|\boldsymbol{L}|}1/L_{i}

where the second equality follows by (111). Therefore,

c~jLj=c~κLκ=Ki=1κ1c~i|𝑳|(κ1)L~κ,j{κ+1,,|𝑳|}.\tilde{c}_{j}L_{j}=\tilde{c}_{\kappa}L_{\kappa}=\frac{K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}}{|\boldsymbol{L}|-(\kappa-1)}\widetilde{L}_{\kappa},\quad\forall\,j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}. (116)

where L~κ\widetilde{L}_{\kappa}, defined in (15), is the harmonic mean of the |𝑳|(κ1)|\boldsymbol{L}|-(\kappa-1) largest elements in 𝑳\boldsymbol{L}, and serves as an average of them. It holds that

0Ki=1κ1c~i|𝑳|(κ1)10\leq\frac{K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}}{|\boldsymbol{L}|-(\kappa-1)}\leq 1 (117)

because

Ki=1κ1c~i=i=κ|𝑳|c~i|𝑳|(κ1),K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}=\sum_{i=\kappa}^{|\boldsymbol{L}|}\tilde{c}_{i}\leq|\boldsymbol{L}|-(\kappa-1),

since c~i[0,1]\tilde{c}_{i}\in[0,1] for all i[|𝑳|]i\in[|\boldsymbol{L}|]. By replacing c~κLκ\tilde{c}_{\kappa}L_{\kappa} in (109) we get

V(κ,K,𝑳)=c~1L1++c~κ1Lκ1+Ki=1κ1c~i|𝑳|(κ1)L~κ.V(\kappa,K,\boldsymbol{L})=\tilde{c}_{1}L_{1}+\ldots+\tilde{c}_{\kappa-1}L_{\kappa-1}+\frac{K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}}{|\boldsymbol{L}|-(\kappa-1)}\;\widetilde{L}_{\kappa}. (118)

Therefore, the values of c~1,,c~κ1\tilde{c}_{1},\ldots,\tilde{c}_{\kappa-1} are deduced by the solution of the following maximization problem

maximize{c1L1++cκ1Lκ1+z|𝑳|(κ1)L~κ}\mbox{maximize}\left\{c_{1}L_{1}+\ldots+c_{\kappa-1}L_{\kappa-1}+\frac{z}{|\boldsymbol{L}|-(\kappa-1)}\widetilde{L}_{\kappa}\right\} (119)

with respect to c1,,cκ1[0,1]c_{1},\ldots,c_{\kappa-1}\in[0,1] and zz subject to

  1. (i)
    0zLκi=κ|𝑳|1/Li,0\leq z\leq L_{\kappa}\sum_{i=\kappa}^{|\boldsymbol{L}|}1/L_{i},
  2. (ii)
    i=1κ1ci+z=K,\sum_{i=1}^{\kappa-1}c_{i}+z=K,
  3. (iii)
    c1L1cκ1Lκ1z|𝑳|(κ1)L~κ.c_{1}L_{1}\leq\ldots\leq c_{\kappa-1}L_{\kappa-1}\leq\frac{z}{|\boldsymbol{L}|-(\kappa-1)}\widetilde{L}_{\kappa}.

In the special case L~κ/(|𝑳|(κ1))Lκ1\widetilde{L}_{\kappa}/(|\boldsymbol{L}|-(\kappa-1))\geq L_{\kappa-1}, and since by definition Lκ1L1L_{\kappa-1}\geq\ldots\geq L_{1}, the maximization method is to distribute KK so that each of z,cκ1,,c1z,c_{\kappa-1},\ldots,c_{1} takes its largest possible value, one at a time, in the order in which they are listed. On the other hand, if L~κ/(|𝑳|(κ1))<Lκ1\widetilde{L}_{\kappa}/(|\boldsymbol{L}|-(\kappa-1))<L_{\kappa-1}, then cκ1c_{\kappa-1} would come first in the priority list. Due to constraint (iii), for any value of cκ1c_{\kappa-1} the value of zz must be large enough so that the inequality is satisfied. Thus, the maximization method of the former case does not work, as the value of zz depends on the value of cκ1c_{\kappa-1}. In order to resolve this issue, we consider

u:=max{u{0,,κ1}:κu|𝑳|uL~u+1Lu},u^{*}:=\max\left\{u\in\{0,\ldots,\kappa-1\}\,:\,\frac{\kappa-u}{|\boldsymbol{L}|-u}\;\widetilde{L}_{u+1}\geq L_{u}\right\},

thus V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) reduces to

V(κ,K,𝑳)=c~1L1++c~uLu+(κu)z|𝑳|uL~u+1,V(\kappa,K,\boldsymbol{L})=\tilde{c}_{1}L_{1}+\ldots+\tilde{c}_{u^{*}}L_{u^{*}}+(\kappa-u^{*})\frac{z}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}, (120)

and it holds

κu|𝑳|uL~u+1LuL1.\frac{\kappa-u^{*}}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}\geq L_{u^{*}}\geq\ldots\geq L_{1}.

The values of c~1,,c~|𝑳|\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|} are deduced by the solution of the following maximization problem,

maximize{c1L1++cuLu+(κu)z|𝑳|uL~u+1}\mbox{maximize}\left\{c_{1}L_{1}+\ldots+c_{u^{*}}L_{u^{*}}+(\kappa-u^{*})\frac{z}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}\right\} (121)

with respect to c1,,cu[0,1]c_{1},\ldots,c_{u^{*}}\in[0,1] and zz subject to

  1. (i)
    0zLu+1i=u+1|𝑳|1/Li,0\leq z\leq L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i},
  2. (ii)
    i=1uci+z=K,\sum_{i=1}^{u^{*}}c_{i}+z=K,
  3. (iii)
    c1L1cuLuz|𝑳|uL~u+1.c_{1}L_{1}\leq\ldots\leq c_{u^{*}}L_{u^{*}}\leq\frac{z}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}.

If

K<Lu+1i=u+1|𝑳|1/Li,K<L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}, (122)

then c1==cu=0c_{1}=\ldots=c_{u^{*}}=0,

V(κ,K,𝑳)=(κu)K|𝑳|uL~u+1,V(\kappa,K,\boldsymbol{L})=(\kappa-u^{*})\frac{K}{|\boldsymbol{L}|-u^{*}}\;\widetilde{L}_{u^{*}+1}, (123)

and thus V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) with x=0x=0, v=0v=0, u=uu=u^{*}, y=K/(|𝑳|u)y=K/(|\boldsymbol{L}|-u^{*}). Therefore, in view of (114), it suffices to prove that V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) when

Lu+1i=u+1|𝑳|1/LiK<κ+Lκj=κ+1|𝑳|1/Lj.L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}\leq K<\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j}. (124)

If KK satisfies (124), then cu+1=1c_{u^{*}+1}=1 and the objective of (121) becomes

c1L1++cuLu+Lu+1+(κ(u+1))z|𝑳|(u+1)L~u+2.c_{1}L_{1}+\ldots+c_{u^{*}}L_{u^{*}}+L_{u^{*}+1}+(\kappa-(u^{*}+1))\frac{z}{|\boldsymbol{L}|-(u^{*}+1)}\widetilde{L}_{u^{*}+2}.

If we set u:=u+1u:=u^{*}+1 and v:=u+1v:=u^{*}+1, the maximization problem (121) takes the more general form

maximize{c1L1++cv1Lv1+i=vuLi+(κu)z|𝑳|uL~u+1}\mbox{maximize}\left\{c_{1}L_{1}+\ldots+c_{v-1}L_{v-1}+\sum_{i=v}^{u}L_{i}+(\kappa-u)\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\right\} (125)

with respect to c1,,cv1[0,1]c_{1},\ldots,c_{v-1}\in[0,1] and zz subject to

  1. (i)
    Lui=u+1|𝑳|1/LizLu+1i=u+1|𝑳|1/Li,L_{u}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}\leq z\leq L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i},
  2. (ii)
    i=1v1ci+z=K(uv+1),\sum_{i=1}^{v-1}c_{i}+z=K-(u-v+1),
  3. (iii)
    c1L1cv1Lv1LvLuz|𝑳|uL~u+1.c_{1}L_{1}\leq\ldots\leq c_{v-1}L_{v-1}\leq L_{v}\leq\ldots\leq L_{u}\leq\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}.

For any u,v[1,κ]u,v\in[1,\kappa] with vuv\leq u, if

Lv1κu|𝑳|uL~u+1,L_{v-1}\geq\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}, (126)

then our priority is to make cv1c_{v-1} as large as possible. If cv1=1c_{v-1}=1, then we decrease the variable vv by 11. We observe that the value of cv1c_{v-1} does not affect zz, as the values Lv,,LuL_{v},\ldots,L_{u} lie in between. If (126) does not hold, our priority is to make zz as large as possible. If z=Lu+1i=u+1|𝑳|1/Liz=L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}, then we set zz equal to Lu+1i=u+2|𝑳|1/LiL_{u+1}\sum_{i=u+2}^{|\boldsymbol{L}|}1/L_{i}, and we increase uu by 11. Depending on the size of KK, this procedure is repeated a number of times, and it terminates with a value of the form

V(κ,K,𝑳)=xLv1+i=vuLi+(κu)z|𝑳|uL~u+1,V(\kappa,K,\boldsymbol{L})=x\,L_{v-1}+\sum_{i=v}^{u}L_{i}+(\kappa-u)\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}, (127)

which, for y:=z/(|𝑳|u)y:=z/(|\boldsymbol{L}|-u), is equal to the form (19) in Lemma III.1.

In order to prove that the maximizer with the minimum 1\mathcal{L}^{1} norm satisfies (21)-(25), we observe that (23)-(25) follow directly from the form (127) of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}), by matching each cic^{\prime}_{i} with the coefficient of the respective LiL_{i}. In order to prove (21), we distinguish the following two cases.

  1. (i)

    If u=κu=\kappa then c~κ=1\tilde{c}_{\kappa}=1 and the result follows by (111).

  2. (ii)

    If u<κu<\kappa then

    c~jLj=c~κLκ=z|𝑳|uL~u+1,j{κ+1,,|𝑳|},\tilde{c}_{j}L_{j}=\tilde{c}_{\kappa}L_{\kappa}=\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1},\quad\forall\,j\,\in\,\{\kappa+1,\ldots,|\boldsymbol{L}|\},

    which proves the claim since y:=z/(|𝑳|u)y:=z/(|\boldsymbol{L}|-u).

In Algorithm 1, we describe explicitly how x,y,u,vx,y,u,v are computed. The algorithm primarily focuses on the case that KK satisfies (124), as the other two cases are covered in (112) and (122). Algorithm 1 solves the constrained optimization problem (125) by implementing the steps described in the paragraph that follows (126).

  1. (1)

    Input: κ\kappa, KK, 𝑳\boldsymbol{L}.

  2. (2)

    Compute

    umax{u{0,..,κ1}:κu|𝑳|uL~u+1Lu}.u^{*}\leftarrow\max\left\{u\in\{0,..,\kappa-1\}\,:\,\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\geq L_{u}\right\}.
  3. (3)

    If Kκ+Lκj=κ+1|𝑳|1/LjK\geq\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j} then

    x0x\leftarrow 0, y0y\leftarrow 0, v1v\leftarrow 1, uκu\leftarrow\kappa,

    go to output

    end-if.

  4. (4)

    If K<Lu+1i=u+1|𝑳|1/LiK<L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i} then

    x0x\leftarrow 0, v0v\leftarrow 0, uuu\leftarrow u^{*}, yK/(|𝑳|u)y\leftarrow K/(|\boldsymbol{L}|-u^{*}),

    go to output

    end-if.

  5. (5)

    Initialize: x0x\leftarrow 0, uu+1u\leftarrow u^{*}+1, vu+1v\leftarrow u^{*}+1,

    zLui=u+1|𝑳|1/Liz\leftarrow L_{u}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i},

    KKLu+1i=u+1|𝑳|1/LiK\leftarrow K-L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}.

  6. (6)

    While (K>0)(K>0)

    1. (i)

      If (Lv1κu|𝑳|uL~u+1)\left(L_{v-1}\geq\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\right) or (u=κ)(u=\kappa) then

      1. If (K1)(K\geq 1) then

        vv1v\leftarrow v-1,

        KK1K\leftarrow K-1,

      2. else

        xKx\leftarrow K,

        K0K\leftarrow 0,

        end-if

      end-if

    2. (ii)

      If (Lv1<κu|𝑳|uL~u+1)\left(L_{v-1}<\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\right) or (v=1)(v=1) then

      wLu+1i=u+1|𝑳|1/Lizw\leftarrow L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}-z,

      zz+min{K,w}z\leftarrow z+\min\{K,w\},

      KKmin{K,w}K\leftarrow K-\min\{K,w\}.

      1. If (zLu+1i=u+1|𝑳|1/Li)\left(z\geq L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}\right) then

        zLu+1i=u+2|𝑳|1/Liz\leftarrow L_{u+1}\sum_{i=u+2}^{|\boldsymbol{L}|}1/L_{i},

        uu+1u\leftarrow u+1,

        end-if

      end-if

    end-while.

    yz/(|𝑳|u)y\leftarrow z/(|\boldsymbol{L}|-u).

  7. (7)

    Output: xx, uu, vv, yy.

Algorithm 1 Computation of x,y[0,1)x,y\in[0,1) and u,v[0,κ]u,v\in[0,\kappa].
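
As an independent numerical check of Algorithm 1 (not the algorithm itself), V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) can also be computed as a linear program, using the identity that the sum of the κ\kappa smallest entries of a nonnegative vector equals the maximum over t0t\geq 0 of κt\kappa t minus the sum of the positive parts of tt minus the entries. A sketch using scipy:

```python
import numpy as np
from scipy.optimize import linprog

def V(kappa, K, L):
    """max over c in D(K) of the sum of the kappa smallest c_i * L_i, as an LP.
    Variables: (c_1..c_n, t, s_1..s_n) with s_i >= t - c_i L_i and s_i >= 0, so
    that kappa*t - sum(s) equals, at the optimum, the sum of the kappa smallest
    c_i L_i."""
    L = np.asarray(L, dtype=float)
    n = len(L)
    obj = np.concatenate([np.zeros(n), [-float(kappa)], np.ones(n)])  # minimize -kappa*t + sum(s)
    rows = np.hstack([-np.diag(L), np.ones((n, 1)), -np.eye(n)])      # t - L_i c_i - s_i <= 0
    budget = np.concatenate([np.ones(n), [0.0], np.zeros(n)])         # sum(c) <= K
    A_ub = np.vstack([rows, budget])
    b_ub = np.append(np.zeros(n), K)
    bounds = [(0, 1)] * n + [(0, None)] * (n + 1)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return -res.fun

# The vector F(A) from the simulation study, sorted in ascending order:
F = [0.125, 0.125, 0.125, 0.245, 0.245, 0.245, 0.245, 0.5, 0.5, 0.5]
print([round(V(k, 5, F), 4) for k in range(1, 11)])   # compare with (102); V(1,5,F) ~ 0.1079
```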
Proof:

For the proof of Lemma III.2, it suffices to show that

W(κ1,κ2,K,𝑳1,𝑳2,r)=V(κ1,K1,𝑳1),W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=V(\kappa_{1},K_{1}^{*},\boldsymbol{L}_{1}),

where (K1,K2)(K^{*}_{1},K^{*}_{2}) is the maximizer of the problem (32) with the minimum 1\mathcal{L}^{1} norm; the form (34) of W(κ1,κ2,K,𝑳1,𝑳2,r)W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r), as well as the form (36)-(45) of the maximizer of (30) with the minimum 1\mathcal{L}^{1} norm, then follow by Lemma III.1.

Since 𝒟(K)\mathcal{D}(K) is compact and {U[|𝑳1|]:|U|=κ1}\{U\subseteq[|\boldsymbol{L}_{1}|]\,:\,|U|=\kappa_{1}\}, {U[|𝑳2|]:|U|=κ2}\{U\subseteq[|\boldsymbol{L}_{2}|]\,:\,|U|=\kappa_{2}\} are finite sets, the max-min problem (30) has a solution. We denote by (𝒄^,𝒄ˇ,𝟎)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0}) the maximizer of (30) with the minimum 1\mathcal{L}^{1} norm, where 𝒄^:=(c^1,,c^|𝑳1|)\hat{\boldsymbol{c}}^{\prime}:=(\hat{c}^{\prime}_{1},\ldots,\hat{c}^{\prime}_{|\boldsymbol{L}_{1}|}), 𝒄ˇ:=(cˇ1,,cˇ|𝑳2|)\check{\boldsymbol{c}}^{\prime}:=(\check{c}^{\prime}_{1},\ldots,\check{c}^{\prime}_{|\boldsymbol{L}_{2}|}), and we consider

K^:=i=1|𝑳1|c^i,Kˇ:=i=1|𝑳2|cˇi.\hat{K}:=\sum_{i=1}^{|\boldsymbol{L}_{1}|}\hat{c}^{\prime}_{i},\qquad\check{K}:=\sum_{i=1}^{|\boldsymbol{L}_{2}|}\check{c}^{\prime}_{i}. (128)

The max-min structure of (30) implies that for 𝒄^\hat{\boldsymbol{c}}^{\prime}, 𝒄ˇ\check{\boldsymbol{c}}^{\prime}, we have

W(κ1,κ2,K,𝑳1,𝑳2,r)=𝒱(𝒄^;κ1,𝑳1)=r𝒱(𝒄ˇ;κ2,𝑳2)W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=\mathcal{V}(\hat{\boldsymbol{c}}^{\prime};\kappa_{1},\boldsymbol{L}_{1})=r\,\mathcal{V}(\check{\boldsymbol{c}}^{\prime};\kappa_{2},\boldsymbol{L}_{2})

and

𝒱(𝒄^;κ1,𝑳1)\displaystyle\mathcal{V}(\hat{\boldsymbol{c}}^{\prime};\kappa_{1},\boldsymbol{L}_{1}) =max𝒄^𝒟(K^)𝒱(𝒄^;κ1,𝑳1)=V(κ1,K^,𝑳1),\displaystyle=\max_{\hat{\boldsymbol{c}}\in\mathcal{D}(\hat{K})}\mathcal{V}(\hat{\boldsymbol{c}};\kappa_{1},\boldsymbol{L}_{1})=V(\kappa_{1},\hat{K},\boldsymbol{L}_{1}),
𝒱(𝒄ˇ;κ2,𝑳2)\displaystyle\mathcal{V}(\check{\boldsymbol{c}}^{\prime};\kappa_{2},\boldsymbol{L}_{2}) =max𝒄ˇ𝒟(Kˇ)𝒱(𝒄ˇ;κ2,𝑳2)=V(κ2,Kˇ,𝑳2).\displaystyle=\max_{\check{\boldsymbol{c}}\in\mathcal{D}(\check{K})}\mathcal{V}(\check{\boldsymbol{c}};\kappa_{2},\boldsymbol{L}_{2})=V(\kappa_{2},\check{K},\boldsymbol{L}_{2}).

Hence,

W(κ1,κ2,K,𝑳1,𝑳2,r)=V(κ1,K^,𝑳1)=rV(κ2,Kˇ,𝑳2),W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=V(\kappa_{1},\hat{K},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\check{K},\boldsymbol{L}_{2}), (129)

and since (𝒄^,𝒄ˇ,𝟎)𝒟(K)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0})\in\mathcal{D}(K) we have K^+KˇK\hat{K}+\check{K}\leq K.

Therefore, it suffices to prove that (K^,Kˇ)(\hat{K},\check{K}) is the maximizer of (32) with the minimum 1\mathcal{L}^{1} norm. To prove this, we apply an argument by contradiction. Suppose that (K^,Kˇ)(\hat{K},\check{K}) is not a maximizer of (32); then there is (K1,K2)(K_{1},K_{2}) such that K1+K2KK_{1}+K_{2}\leq K, and

V(κ1,K1,𝑳1)=rV(κ2,K2,𝑳2)>V(κ1,K^,𝑳1)=rV(κ2,Kˇ,𝑳2).V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2})>V(\kappa_{1},\hat{K},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\check{K},\boldsymbol{L}_{2}).

Let us denote by 𝒄^′′:=(c^1′′,,c^|𝑳1|′′)\hat{\boldsymbol{c}}^{\prime\prime}:=(\hat{c}^{\prime\prime}_{1},\ldots,\hat{c}^{\prime\prime}_{|\boldsymbol{L}_{1}|}) the maximizer of V(κ1,K1,𝑳1)V(\kappa_{1},K_{1},\boldsymbol{L}_{1}), and by 𝒄ˇ′′:=(cˇ1′′,,cˇ|𝑳2|′′)\check{\boldsymbol{c}}^{\prime\prime}:=(\check{c}^{\prime\prime}_{1},\ldots,\check{c}^{\prime\prime}_{|\boldsymbol{L}_{2}|}) the maximizer of V(κ2,K2,𝑳2)V(\kappa_{2},K_{2},\boldsymbol{L}_{2}). Then,

min{𝒱(𝒄^′′;κ1,𝑳1),r𝒱(𝒄ˇ′′;κ2,𝑳2)}>min{𝒱(𝒄^;κ1,𝑳1),r𝒱(𝒄ˇ;κ2,𝑳2)}\min\left\{\mathcal{V}(\hat{\boldsymbol{c}}^{\prime\prime};\kappa_{1},\boldsymbol{L}_{1}),r\,\mathcal{V}(\check{\boldsymbol{c}}^{\prime\prime};\kappa_{2},\boldsymbol{L}_{2})\right\}>\min\left\{\mathcal{V}(\hat{\boldsymbol{c}}^{\prime};\kappa_{1},\boldsymbol{L}_{1}),r\,\mathcal{V}(\check{\boldsymbol{c}}^{\prime};\kappa_{2},\boldsymbol{L}_{2})\right\}

which is a contradiction because we assumed that (𝒄^,𝒄ˇ,𝟎)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0}) is a maximizer of (30). Also, if we assume that (K^,Kˇ)(\hat{K},\check{K}) is not the maximizer of (32) with the minimum 1\mathcal{L}^{1} norm, then by (128) we deduce that (𝒄^,𝒄ˇ,𝟎)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0}) is not the maximizer of (30) with the minimum 1\mathcal{L}^{1} norm, which is a contradiction. ∎

Next, we provide Algorithm 5 for the computation of the maximizer (K1,K2)(K^{*}_{1},K^{*}_{2}) of the constrained optimization problem (32) with the minimum 1\mathcal{L}^{1} norm.

  1. (1)

    Input: κ1\kappa_{1}, κ2\kappa_{2}, KK, 𝑳1\boldsymbol{L}_{1}, 𝑳2\boldsymbol{L}_{2}, rr.

  • (2)

    Compute

    K~1κ1+L1,κ1i=κ1+1|𝑳1|1/L1,i,\tilde{K}_{1}\leftarrow\kappa_{1}+L_{1,\kappa_{1}}\sum_{i=\kappa_{1}+1}^{|\boldsymbol{L}_{1}|}1/L_{1,i},

    K~2κ2+L2,κ2i=κ2+1|𝑳2|1/L2,i.\tilde{K}_{2}\leftarrow\kappa_{2}+L_{2,\kappa_{2}}\sum_{i=\kappa_{2}+1}^{|\boldsymbol{L}_{2}|}1/L_{2,i}.

  • (3)

    If i=1κ1L1,iri=1κ2L2,i\sum_{i=1}^{\kappa_{1}}L_{1,i}\leq r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}

    K2rK^{r}_{2}\leftarrow root of the equation rV(κ2,K2,𝑳2)=V(κ1,K~1,𝑳1)r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2})=V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1}) with respect to K2[0,K~2]K_{2}\in[0,\tilde{K}_{2}].

    K~min{K~1+K2r,K}\tilde{K}\leftarrow\min\{\tilde{K}_{1}+K^{r}_{2},K\}.

    K1K^{*}_{1}\leftarrow root of the equation V(κ1,K1,𝑳1)=rV(κ2,K~K1,𝑳2)V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}) with respect to K1[0,K~]K_{1}\in[0,\tilde{K}].

    K2K~K1.K^{*}_{2}\leftarrow\tilde{K}-K^{*}_{1}.

    end-if

  • (4)

    If i=1κ1L1,i>ri=1κ2L2,i\sum_{i=1}^{\kappa_{1}}L_{1,i}>r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}

    K1rK^{r}_{1}\leftarrow root of the equation V(κ1,K1,𝑳1)=rV(κ2,K~2,𝑳2)V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\tilde{K}_{2},\boldsymbol{L}_{2}) with respect to K1[0,K~1]K_{1}\in[0,\tilde{K}_{1}].

    K~min{K1r+K~2,K}\tilde{K}\leftarrow\min\{K^{r}_{1}+\tilde{K}_{2},K\}.

    K2K^{*}_{2}\leftarrow root of the equation V(κ1,K~K2,𝑳1)=rV(κ2,K2,𝑳2)V(\kappa_{1},\tilde{K}-K_{2},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2}) with respect to K2[0,K~]K_{2}\in[0,\tilde{K}].

    K1K~K2.K^{*}_{1}\leftarrow\tilde{K}-K^{*}_{2}.

    end-if

  • (5)

    Output: K1,K2K^{*}_{1},K^{*}_{2}.

  • Algorithm 5 Computation of (K1,K2)(K^{*}_{1},K^{*}_{2}).

    We note that K2rK^{r}_{2} (resp. K1rK^{r}_{1}) and K1K^{*}_{1} (resp. K2K^{*}_{2}) are unique and can be computed using the bisection method. Without loss of generality, we assume that

    i=1κ1L1,iri=1κ2L2,i,\sum_{i=1}^{\kappa_{1}}L_{1,i}\leq r\,\sum_{i=1}^{\kappa_{2}}L_{2,i},

    then for the computation of K2rK^{r}_{2}, we consider the function

    g(K2):=rV(κ2,K2,𝑳2)V(κ1,K~1,𝑳1),K2[0,K~2].g(K_{2}):=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2})-V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1}),\quad K_{2}\in[0,\tilde{K}_{2}].

    which is continuous with

    g(0)\displaystyle g(0) =V(κ1,K~1,𝑳1)<0,\displaystyle=-V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1})<0,
    g(K~2)\displaystyle g(\tilde{K}_{2}) =rV(κ2,K~2,𝑳2)V(κ1,K~1,𝑳1)\displaystyle=r\,V(\kappa_{2},\tilde{K}_{2},\boldsymbol{L}_{2})-V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1})
    =ri=1κ2L2,ii=1κ1L1,i0.\displaystyle=r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}-\sum_{i=1}^{\kappa_{1}}L_{1,i}\geq 0.

    If g(K~2)=0g(\tilde{K}_{2})=0 then K2r=K~2K^{r}_{2}=\tilde{K}_{2}, otherwise g(K~2)>0g(\tilde{K}_{2})>0 and the solution follows by the bisection method. Since V(κ2,K2,𝑳2)V(\kappa_{2},K_{2},\boldsymbol{L}_{2}) is increasing over [0,K~2][0,\tilde{K}_{2}], the function g(K2)g(K_{2}) is also increasing over [0,K~2][0,\tilde{K}_{2}] and thus K2rK^{r}_{2} is unique.

    For the computation of K1K^{*}_{1}, we consider the function

    h(K1)=V(κ1,K1,𝑳1)rV(κ2,K~K1,𝑳2),K1[0,K~]h(K_{1})=V(\kappa_{1},K_{1},\boldsymbol{L}_{1})-r\,V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}),\quad K_{1}\in[0,\tilde{K}] (130)

    which is continuous with

    h(0)\displaystyle h(0) =rV(κ2,K~,𝑳2)<0,\displaystyle=-r\,V(\kappa_{2},\tilde{K},\boldsymbol{L}_{2})<0,
    h(K~)\displaystyle h(\tilde{K}) =V(κ1,K~,𝑳1)>0.\displaystyle=V(\kappa_{1},\tilde{K},\boldsymbol{L}_{1})>0.

    Thus, the solution follows by the bisection method and it is less than or equal to K~1K~\tilde{K}_{1}\wedge\tilde{K}. Since V(κ1,K1,𝑳1)V(\kappa_{1},K_{1},\boldsymbol{L}_{1}) is increasing over [0,K~1][0,\tilde{K}_{1}], and V(κ2,K~K1,𝑳2)V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}) is non-increasing over [0,K~][0,\tilde{K}] the function h(K1)h(K_{1}) is increasing over [0,K~1K~][0,\tilde{K}_{1}\wedge\tilde{K}] and thus K1K^{*}_{1} is unique.

    By restricting the value of KK to K~:=(K~1+K2r)K\tilde{K}:=(\tilde{K}_{1}+K^{r}_{2})\wedge K, we do not reduce the maximum possible value that the objective of (32), i.e., V(κ1,K1,𝑳1)V(\kappa_{1},K_{1},\boldsymbol{L}_{1}), can take. At the same time, we restrict the number of possible maximizers to 11, by keeping the one with the minimum 1\mathcal{L}^{1} norm, which is given by the unique root of

    V(κ1,K1,𝑳1)rV(κ2,K~K1,𝑳2),K1[0,K~].V(\kappa_{1},K_{1},\boldsymbol{L}_{1})-r\,V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}),\quad K_{1}\in[0,\tilde{K}].
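
    Under the assumption i=1κ1L1,iri=1κ2L2,i\sum_{i=1}^{\kappa_{1}}L_{1,i}\leq r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}, steps (2)-(3) of the algorithm can be transcribed as follows, reusing the function V from the linear-programming sketch after Algorithm 1. The inputs L1, L2 are assumed sorted in ascending order, and the monotonicity of gg and hh argued above is what justifies the bisection; this is an illustrative transcription, not the paper's implementation.

```python
def bisect_root(f, lo, hi, tol=1e-9):
    """Bisection for an increasing function f with f(lo) <= 0 <= f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def K_star(kappa1, kappa2, K, L1, L2, r):
    """Compute (K1*, K2*), the minimum-L1-norm maximizer of (32)."""
    K1t = kappa1 + L1[kappa1 - 1] * sum(1.0 / l for l in L1[kappa1:])   # K~_1
    K2t = kappa2 + L2[kappa2 - 1] * sum(1.0 / l for l in L2[kappa2:])   # K~_2
    g = lambda K2: r * V(kappa2, K2, L2) - V(kappa1, K1t, L1)           # increasing in K2
    K2r = bisect_root(g, 0.0, K2t)                                      # root K_2^r of g
    Kt = min(K1t + K2r, K)                                              # K~
    h = lambda K1: V(kappa1, K1, L1) - r * V(kappa2, Kt - K1, L2)       # increasing in K1
    K1s = bisect_root(h, 0.0, Kt)                                       # root K_1^* of h
    return K1s, Kt - K1s
```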

    Appendix B

    In Appendix B, we prove Theorems IV.1, IV.2, IV.3, which establish universal asymptotic lower bounds on the expected stopping time when controlling the misclassification error rate (7) and the familywise error rates (9).

    B-A Misclassification error rate

    As a first step towards the proof of Theorem IV.1 we provide the following auxiliary lemma. We fix A[M]A\subseteq[M], and we denote by Z(A;k)Z(A;k) the family of subsets of [M][M] whose Hamming distance from AA is at least kk, i.e.,

    Z(A;k):={C[M]:|CA|k}.Z(A;k):=\{C\subseteq[M]:|C\triangle A|\geq k\}. (131)
    Lemma B.1

    For any BZ(A;k)B\notin Z(A;k), and (c1,,cM)𝒟(K)(c_{1},\ldots,c_{M})\in\mathcal{D}(K), we have the following inequality,

    minGZ(B;k)(iAGciIi+jGAcjJj)V(k,K,𝑭(A)).\min_{G\in Z(B;k)}\left(\sum_{i\in A\setminus G}c_{i}I_{i}+\sum_{j\in G\setminus A}c_{j}J_{j}\right)\leq V(k,K,\boldsymbol{F}(A)). (132)
    Proof:

    We fix (c1,,cM)𝒟(K)(c_{1},\ldots,c_{M})\in\mathcal{D}(K), and we denote by CC^{*} the set minimizer of

    minCZ(A;k)(iACciIi+jCAcjJj),\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right), (133)

    where in place of Z(B;k)Z(B;k) we have Z(A;k)Z(A;k). By [27, Lemma B.2], there exists a set GG^{*} such that for the sets A,AC,BA,\,A\triangle C^{*},\,B it holds

    AGACBG.A\triangle G^{*}\subseteq A\triangle C^{*}\subseteq B\triangle G^{*}. (134)

    By the right inclusion we have

    |BG||AC|k,|B\triangle G^{*}|\geq|A\triangle C^{*}|\geq k, (135)

    which means that GZ(B;k)G^{*}\in Z(B;k), whereas by the left inclusion we have

    AGAC,GACA.A\setminus G^{*}\subseteq A\setminus C^{*},\quad G^{*}\setminus A\subseteq C^{*}\setminus A. (136)

    As a result,

    minGZ(B;k)(iAGciIi+jGAcjJj)\displaystyle\min_{G\in Z(B;k)}\left(\sum_{i\in A\setminus G}c_{i}I_{i}+\sum_{j\in G\setminus A}c_{j}J_{j}\right)\leq iAGciIi+jGAcjJj\displaystyle\sum_{i\in A\setminus G^{*}}c_{i}I_{i}+\sum_{j\in G^{*}\setminus A}c_{j}J_{j} (137)
    \displaystyle\leq iACciIi+jCAcjJj\displaystyle\sum_{i\in A\setminus C^{*}}c_{i}I_{i}+\sum_{j\in C^{*}\setminus A}c_{j}J_{j}
    =minCZ(A;k)(iACciIi+jCAcjJj)\displaystyle=\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right)
    max(c1,,cM)𝒟(K)minCZ(A;k)(iACciIi+jCAcjJj),\displaystyle\leq\max_{(c_{1},\ldots,c_{M})\in\mathcal{D}(K)}\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right),

    where the first inequality follows by the fact that GZ(B;k)G^{*}\in Z(B;k), and the second one by (136). For any (c1,,cM)𝒟(K)(c_{1},\ldots,c_{M})\in\mathcal{D}(K), the set minimizer CZ(A;k)C^{*}\in Z(A;k) has Hamming distance from AA exactly equal to kk, since the addition of extra terms could only increase the sum, which implies

    max(c1,,cM)𝒟(K)\displaystyle\max_{(c_{1},\ldots,c_{M})\in\mathcal{D}(K)} minCZ(A;k)(iACciIi+jCAcjJj)\displaystyle\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right) (138)
    =max(c1,,cM)𝒟(K)minU[M]:|U|=kiUciFi(A)\displaystyle=\max_{(c_{1},\ldots,c_{M})\in\mathcal{D}(K)}\;\min_{U\subseteq[M]:\,|U|=k}\;\sum_{i\in U}c_{i}\,F_{i}(A)
    =V(k,K,𝑭(A)).\displaystyle=V(k,K,\boldsymbol{F}(A)).

    where the last equality follows by definition of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) according to (18). ∎

    For the proof of Theorem IV.1, we introduce the log-likelihood ratio of 𝖯A\mathsf{P}_{A} versus 𝖯C\mathsf{P}_{C}, for any sampling rule RR and any subsets A,C[M]A,\,C\subseteq[M], based on the observations from all sources in the first nn sampling instants, i.e.,

    ΛA,CR(n):=logd𝖯Ad𝖯C(nR),n.\displaystyle\Lambda^{R}_{A,C}(n):=\log\frac{d\mathsf{P}_{A}}{d\mathsf{P}_{{C}}}\left(\mathcal{F}^{R}_{n}\right),\quad n\in\mathbb{N}. (139)

    Since R(n)R(n) is n1R\mathcal{F}^{R}_{n-1}-measurable, Xi(n)X_{i}(n) is independent of n1R\mathcal{F}^{R}_{n-1}, and Z(n)Z(n) is independent of n1R\mathcal{F}^{R}_{n-1} and of {Xi(n):i[M]}\{X_{i}(n)\,:\,i\in[M]\}, we have

    ΛA,CR(n):=ΛA,CR(n1)+iACgi(Xi(n))Ri(n)jCAgj(Xj(n))Rj(n),\displaystyle\begin{split}\Lambda^{R}_{A,C}(n):=\Lambda^{R}_{A,C}(n-1)&+\sum_{i\in A\setminus C}g_{i}(X_{i}(n))\,R_{i}(n)\\ &-\sum_{j\in C\setminus A}g_{j}(X_{j}(n))\,R_{j}(n),\end{split} (140)

    where we recall that gi:=log(f1i/f0i)g_{i}:=\log\left(f_{1i}/f_{0i}\right), and we set ΛA,CR(0):=0\Lambda^{R}_{A,C}(0):=0. Comparing with (62), it is clear that

    ΛA,CR(n)=iACΛiR(n)jCAΛjR(n),n.\displaystyle\Lambda^{R}_{A,C}(n)=\sum_{i\in A\setminus C}\Lambda^{R}_{i}(n)-\sum_{j\in C\setminus A}\Lambda^{R}_{j}(n),\quad n\in\mathbb{N}. (141)

    We also set

    Λ~iR(n):=Λ~iR(n1)+(gi(Xi(n))𝖤A[gi(Xi(n))])Ri(n),Λ~iR(0):=0,\displaystyle\begin{split}\tilde{\Lambda}^{R}_{i}(n)&:=\tilde{\Lambda}^{R}_{i}(n-1)+\Bigl{(}g_{i}(X_{i}(n))-\mathsf{E}_{A}[g_{i}(X_{i}(n))]\Bigr{)}\,R_{i}(n),\\ \tilde{\Lambda}^{R}_{i}(0)&:=0,\end{split} (142)

    which implies that

    Λ~iR(n)\displaystyle\tilde{\Lambda}^{R}_{i}(n) ={ΛiR(n)IinπiR(n),iA,ΛiR(n)+JinπiR(n),iA.\displaystyle=\begin{cases}\Lambda^{R}_{i}(n)-I_{i}\,n\pi^{R}_{i}(n),\quad i\in A,\\ \Lambda^{R}_{i}(n)+J_{i}\,n\pi^{R}_{i}(n),\quad i\notin A.\end{cases} (143)

    Proof:

    We have to show that

    𝒥A(α;k,K)|log(α)|V(k,K,𝑭(A))(1+o(1)),\mathcal{J}_{A}(\alpha;k,K)\geq\frac{|\log(\alpha)|}{V(k,K,\boldsymbol{F}(A))}(1+o(1)), (144)

    where o(1)o(1) is a quantity that tends to zero as α0\alpha\to 0. We set

    f(α):=|log(α)|V(k,K,𝑭(A)),α(0,1).f(\alpha):=\frac{|\log(\alpha)|}{V(k,K,\boldsymbol{F}(A))},\quad\alpha\in(0,1). (145)

    By Markov’s inequality, for any stopping time TT and q(0,1)q\in(0,1) we have

    𝖤A[T]qf(α)𝖯A(Tqf(α)).\mathsf{E}_{A}[T]\geq q\,f(\alpha)\mathsf{P}_{A}(T\geq q\,f(\alpha)). (146)

    Thus, it suffices to show that for every q(0,1)q\in(0,1) we have

    lim infα0inf(R,T,Δ)𝒞(α;k,K)𝖯A(Tqf(α))1,\liminf_{\alpha\to 0}\inf_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)}\mathsf{P}_{A}(T\geq q\,f(\alpha))\geq 1, (147)

    as this will imply that

    lim infα0𝒥A(α;k,K)|log(α)|qV(k,K,𝑭(A)),\liminf\limits_{\alpha\to 0}\frac{\mathcal{J}_{A}(\alpha;k,K)}{|\log(\alpha)|}\geq\frac{q}{V(k,K,\boldsymbol{F}(A))}, (148)

    and the desired result will follow by letting q1q\to 1.

    In the rest of the proof we fix some arbitrary q(0,1)q\in(0,1). Then, for any α(0,1)\alpha\in\,(0,1), (R,T,Δ)𝒞(α;k,K)(R,T,\Delta)\in\mathcal{C}(\alpha;k,K), and BZ(A;k)B\notin Z(A;k), where Z(A;k)Z(A;k) is defined in (131), we have

    𝖯A(Δ=B)\displaystyle\mathsf{P}_{A}(\Delta=B)\leq 𝖯A(minGZ(B;k)ΛA,GR(T)<log(ηα),Δ=B)\displaystyle\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)<\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right) (149)
    +𝖯A(Tqf(α),minGZ(B;k)ΛA,GR(T)log(ηα),Δ=B)\displaystyle+\mathsf{P}_{A}\left(T\leq qf(\alpha),\,\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)\geq\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right)
    +𝖯A(Tqf(α),Δ=B),\displaystyle+\mathsf{P}_{A}\left(T\geq qf(\alpha),\Delta=B\right),

    where η\eta is an arbitrary constant in (0,1)(0,1). By summing up (149) with respect to all BZ(A;k)B\notin Z(A;k), we have

    𝖯A(ΔZ(A;k))\displaystyle\mathsf{P}_{A}(\Delta\notin Z(A;k))\leq BZ(A;k)𝖯A(minGZ(B;k)ΛA,GR(T)<log(ηα),Δ=B)\displaystyle\sum_{B\notin Z(A;k)}\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)<\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right) (150)
    +BZ(A;k)𝖯A(Tqf(α),minGZ(B;k)ΛA,GR(T)log(ηα),Δ=B)\displaystyle+\sum_{B\notin Z(A;k)}\mathsf{P}_{A}\left(T\leq qf(\alpha),\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)\geq\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right)
    +𝖯A(Tqf(α),ΔZ(A;k)).\displaystyle+\mathsf{P}_{A}\left(T\geq qf(\alpha),\Delta\notin Z(A;k)\right).

    In view of the fact that

    1α𝖯A(ΔZ(A;k)),1-\alpha\leq\mathsf{P}_{A}(\Delta\notin Z(A;k)), (151)

    we obtain

\begin{aligned}
\mathsf{P}_{A}\left(T \geq q f(\alpha)\right) &\geq \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta\notin Z(A;k)\right) \\
&\geq 1-\alpha - \sum_{B\notin Z(A;k)} \mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\qquad - \sum_{B\notin Z(A;k)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).
\end{aligned}   (152)

Thus, in order to show (147), it suffices to show that for any B ∉ Z(A;k)

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)} \mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0,   (153)

    and

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0.   (154)

In order to show (153), we fix α ∈ (0,1) and (R,T,Δ) ∈ 𝒞(α;k,K). By Boole’s inequality we have

\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq \sum_{G\in Z(B;k)} \mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).   (155)

For each G ∈ Z(B;k), we apply the change of measure 𝖯_A → 𝖯_G and, by Wald’s likelihood ratio identity, we obtain

\begin{aligned}
\mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) &= \mathsf{E}_{G}\left[\exp\{\Lambda^{R}_{A,G}(T)\};\ \Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right] \\
&\leq \frac{\eta}{\alpha}\,\mathsf{P}_{G}(\Delta=B) \leq \eta,
\end{aligned}   (156)

where the last inequality follows from the fact that, for any G ∈ Z(B;k), it holds that 𝖯_G(Δ = B) ≤ α. In view of (155), we obtain

\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq |Z(B;k)|\,\eta.   (157)

Since η ∈ (0,1) is arbitrary, letting η → 0 proves (153).
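The change-of-measure step above can also be checked numerically. The following Monte Carlo sketch replaces T by a deterministic horizon and uses a hypothetical one-source Gaussian model (𝖯_A: X ~ N(μ,1), 𝖯_G: X ~ N(0,1)); it estimates the same probability directly under 𝖯_A and via the tilted expectation under 𝖯_G, as in (156).

import numpy as np

# Monte Carlo illustration of Wald's likelihood ratio identity behind (156);
# the Gaussian model and the fixed horizon n are hypothetical simplifications.
rng = np.random.default_rng(1)
mu, n, reps, c = 0.5, 20, 200_000, 0.0

X_A = rng.normal(mu, 1.0, size=(reps, n))       # samples under P_A
Lam_A = (mu * X_A - mu**2 / 2).sum(axis=1)      # Lambda_{A,G}(n) under P_A
direct = (Lam_A < c).mean()                     # P_A(Lambda(n) < c)

X_G = rng.normal(0.0, 1.0, size=(reps, n))      # samples under P_G
Lam_G = (mu * X_G - mu**2 / 2).sum(axis=1)      # Lambda_{A,G}(n) under P_G
tilted = (np.exp(Lam_G) * (Lam_G < c)).mean()   # E_G[exp(Lambda); Lambda < c]

print(direct, tilted)   # the two estimates agree up to Monte Carlo error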

    In order to show (154), we observe that by decomposition (143) we have

\begin{aligned}
\frac{1}{T}\Lambda^{R}_{A,G}(T) &= \frac{1}{T}\left(\sum_{i\in A\setminus G}\tilde{\Lambda}^{R}_{i}(T) - \sum_{j\in G\setminus A}\tilde{\Lambda}^{R}_{j}(T)\right) + \sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T) + \sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T) \\
&\leq \frac{1}{T}\sum_{i\in[M]}|\tilde{\Lambda}^{R}_{i}(T)| + \sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T) + \sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T).
\end{aligned}

Taking the minimum over all G ∈ Z(B;k) on both sides of the above inequality, we obtain

\begin{aligned}
\frac{1}{T}\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) &\leq \frac{1}{T}\sum_{i\in[M]}|\tilde{\Lambda}^{R}_{i}(T)| + \min_{G\in Z(B;k)}\left(\sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T) + \sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T)\right) \\
&\leq \frac{1}{T}\sum_{i\in[M]}|\tilde{\Lambda}^{R}_{i}(T)| + V(k,K,\boldsymbol{F}(A)),
\end{aligned}   (158)

where the second inequality follows from Lemma B.1 and the fact that (R,T,Δ) belongs to 𝒞(K), which implies (π₁^R(T),…,π_M^R(T)) ∈ 𝒟(K). Therefore,

\mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log\alpha| + \log\eta\right),   (159)

    where

\xi(n) := \left(\sum_{i\in[M]}\left|\frac{\tilde{\Lambda}^{R}_{i}(n)}{n}\right| + V(k,K,\boldsymbol{F}(A))\right) n, \quad \forall\, n\in\mathbb{N}.   (160)

    By the moment assumption (12) and [29, Theorem 2.19], we have

\lim_{n\to\infty}\frac{\tilde{\Lambda}^{R}_{i}(n)}{n} = 0, \quad \mbox{a.s.} \quad \forall\, i\in[M],   (161)

    and as a result,

\lim_{n\to\infty}\frac{\xi(n)}{n} = V(k,K,\boldsymbol{F}(A)), \quad \mbox{a.s.}   (162)

    Hence, by [27, Lemma F.1] we have

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log(\alpha)| + \log(\eta)\right) = 0.   (163)

    From this and (159), we conclude that (154) holds. ∎

    B-B Familywise error rates

As a first step towards the proofs of Theorems IV.2 and IV.3, we provide the following auxiliary lemma. To this end, for fixed A ⊆ [M], we consider the set

H_{k_{1},k_{2}}(A) := \{B\subseteq[M] :\ |B\setminus A| < k_{1},\ |A\setminus B| < k_{2}\},   (164)

and for any B ∈ H_{k₁,k₂}(A), we also consider the sets

\begin{aligned}
U_{k_{1}}(B) &:= \{C\subseteq A :\ |B\setminus C| \geq k_{1}\}, \\
Y_{k_{2}}(B) &:= \{C\supseteq A :\ |C\setminus B| \geq k_{2}\}.
\end{aligned}   (165)
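For concreteness, the following brute-force sketch enumerates the sets (164)–(165) for a toy configuration; the values of M, A, k₁, k₂ below are hypothetical and serve only to make the definitions tangible.

from itertools import combinations

# Enumerate H_{k1,k2}(A), U_{k1}(B), Y_{k2}(B) for hypothetical small parameters.
M, A, k1, k2 = 4, {0, 1}, 1, 1
subsets = [set(c) for s in range(M + 1) for c in combinations(range(M), s)]

H = [B for B in subsets if len(B - A) < k1 and len(A - B) < k2]
print("H:", H)            # with k1 = k2 = 1, only B = A qualifies

B = {0, 1}                # a set B in H_{k1,k2}(A)
U = [C for C in subsets if C <= A and len(B - C) >= k1]
Y = [C for C in subsets if C >= A and len(C - B) >= k2]
print("U:", U)            # subsets of A that miss at least k1 sources of B
print("Y:", Y)            # supersets of A with at least k2 sources outside B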
    Lemma B.2

For any B ∈ H_{k₁,k₂}(A) and (c₁,…,c_M) ∈ 𝒟(K), we have the following inequalities:

(i) If A = ∅,

\min_{G\in Y_{k_{2}}(B)}\ \sum_{j\in G} c_{j}\,J_{j} \leq V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A)).   (166)

(ii) If A = [M],

\min_{G\in U_{k_{1}}(B)}\ \sum_{i\in[M]\setminus G} c_{i}\,I_{i} \leq V(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A)).   (167)

(iii) If 0 < |A| < M,

\min\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} c_{i}\,I_{i},\ r\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} c_{j}\,J_{j}\right\} \leq v_{A}(k_{1},k_{2},K,r),   (168)

where v_A(k₁,k₂,K,r) is defined in Theorem IV.3.

    Proof:

We prove (i) and (iii); case (ii) is symmetric to (i).

For A = ∅, without loss of generality we consider B ∈ H_{k₁,k₂}(∅) such that Y_{k₂}(B) ≠ ∅, or equivalently |B| < k₁ and |Bᶜ| ≥ k₂; otherwise (166) holds trivially. Since |Bᶜ| ≥ k₂, there exists Γ₂ ⊆ Bᶜ with |Γ₂| = k₂ that contains the sources corresponding to the k₂ smallest terms in {c_j J_j : j ∈ Bᶜ}. Clearly, Γ₂ ∈ Y_{k₂}(B), and

[M] = B\cup B^{c} \supseteq B\cup\Gamma_{2}.   (169)

Since |B| < k₁, even if B contained the k₁ − 1 smallest elements of {c_j J_j : j ∈ [M]}, the set Γ₂ would contain the next k₂ smallest elements, all of which lie in [M]∖B = Bᶜ. Thus, letting {j} denote the index of the source with the j-th smallest value in {c_i J_i : i ∈ [M]}, we have

\min_{G\in Y_{k_{2}}(B)}\ \sum_{j\in G} c_{j}\,J_{j} \leq \sum_{j\in\Gamma_{2}} c_{j}\,J_{j} \leq \sum_{j=k_{1}}^{k_{2}+(k_{1}-1)} c_{\{j\}}\,J_{\{j\}} \leq V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A)).   (170)
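As a quick sanity check of the order-statistics argument above, the following sketch verifies the first two inequalities of (170) by exhaustive search; the parameters M, k₁, k₂ and the random weights (standing for the products c_j J_j) are hypothetical illustration choices.

import numpy as np
from itertools import chain, combinations

# Brute-force check of the bound (170) for A = empty set over all |B| < k1.
rng = np.random.default_rng(2)
M, k1, k2 = 6, 2, 3
w = rng.random(M)                      # stands for c_j * J_j, j in [M]
order = np.sort(w)

for B in chain.from_iterable(combinations(range(M), s) for s in range(k1)):
    Bc = [j for j in range(M) if j not in B]     # here |B| < k1, so |Bc| >= k2
    lhs = np.sort(w[Bc])[:k2].sum()              # min over G in Y_{k2}(B)
    rhs = order[k1 - 1 : k1 - 1 + k2].sum()      # k1-th ... (k1+k2-1)-th smallest
    assert lhs <= rhs
print("order-statistics bound (170) verified for all B with |B| < k1")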

For 0 < |A| < M, without loss of generality we consider B ∈ H_{k₁,k₂}(A) such that U_{k₁}(B) ≠ ∅ and Y_{k₂}(B) ≠ ∅, or equivalently |B| ≥ k₁ and |Bᶜ| ≥ k₂; otherwise (168) holds trivially. We consider the quantities

l_{1} := |B\setminus A|, \qquad l_{2} := |A\setminus B|.   (171)
1. The fact that |B| ≥ k₁ implies that |A∩B| ≥ k₁ − |B∖A| = k₁ − l₁. Thus, there exists Γ₁ ⊆ A∩B with |Γ₁| = k₁ − l₁ that contains the sources corresponding to the k₁ − l₁ smallest elements in {c_i I_i : i ∈ A∩B}. We set B₁* := A∖Γ₁. Since Γ₁ ⊆ A∩B, it holds that A∖B₁* = Γ₁, and

B\setminus B^{*}_{1} = B\cap(A\setminus\Gamma_{1})^{c} = \Gamma_{1}\cup(B\setminus A).   (172)

Therefore,

|B\setminus B^{*}_{1}| = |\Gamma_{1}| + |B\setminus A| = k_{1}-l_{1}+l_{1} = k_{1},

which implies B₁* ∈ U_{k₁}(B), and

\min_{G\in U_{k_{1}}(B)}\ \sum_{i\in A\setminus G} c_{i}\,I_{i} \leq \sum_{i\in\Gamma_{1}} c_{i}\,I_{i}.   (173)

Even if A∖B contained the l₂ smallest elements of {c_i I_i : i ∈ A}, the set Γ₁ ⊆ A∩B would contain the next k₁ − l₁ smallest elements of the same set. Thus, letting ⟨i⟩ denote the index of the i-th smallest element in {c_i I_i : i ∈ A}, we always have

\sum_{i\in\Gamma_{1}} c_{i}\,I_{i} \leq \sum_{i=1+l_{2}}^{k_{1}-l_{1}+l_{2}} c_{\langle i\rangle}\,I_{\langle i\rangle}.   (174)
2. The fact that |Bᶜ| ≥ k₂ implies that |Aᶜ∩Bᶜ| ≥ k₂ − |A∩Bᶜ| = k₂ − l₂. Thus, there exists Γ₂ ⊆ Aᶜ∩Bᶜ with |Γ₂| = k₂ − l₂ that contains the sources corresponding to the k₂ − l₂ smallest elements in {c_j J_j : j ∈ Aᶜ∩Bᶜ}. We set B₂* := A∪Γ₂. Since Γ₂ ⊆ Aᶜ∩Bᶜ, it holds that B₂*∖A = Γ₂, and

B^{*}_{2}\setminus B = (A\cup\Gamma_{2})\cap B^{c} = \Gamma_{2}\cup(A\setminus B).   (175)

Therefore,

|B^{*}_{2}\setminus B| = |\Gamma_{2}| + |A\setminus B| = k_{2}-l_{2}+l_{2} = k_{2},

which implies B₂* ∈ Y_{k₂}(B), and

\min_{G\in Y_{k_{2}}(B)}\ \sum_{j\in G\setminus A} c_{j}\,J_{j} \leq \sum_{j\in\Gamma_{2}} c_{j}\,J_{j}.   (176)

Even if B∖A contained the l₁ smallest elements of {c_j J_j : j ∈ Aᶜ}, the set Γ₂ ⊆ Aᶜ∩Bᶜ would contain the next k₂ − l₂ smallest elements of the same set. Thus, letting {j} denote the index of the source with the j-th smallest value in {c_j J_j : j ∈ Aᶜ}, we always have

\sum_{j\in\Gamma_{2}} c_{j}\,J_{j} \leq \sum_{j=1+l_{1}}^{k_{2}-l_{2}+l_{1}} c_{\{j\}}\,J_{\{j\}}.   (177)

Let us assume, without loss of generality, that l₁ ≥ l₂. We set l := l₁ − l₂, and from (174) and (177) we further have

\begin{aligned}
\sum_{i=1+l_{2}}^{k_{1}-l_{1}+l_{2}} c_{\langle i\rangle}\,I_{\langle i\rangle} &\leq \sum_{i=1+l_{2}}^{k_{1}-l} c_{\langle i\rangle}\,I_{\langle i\rangle} \leq \sum_{i=1}^{k_{1}-l} c_{\langle i\rangle}\,I_{\langle i\rangle}, \\
\sum_{j=1+l_{1}}^{k_{2}-l_{2}+l_{1}} c_{\{j\}}\,J_{\{j\}} &\leq \sum_{j=1+l+l_{2}}^{k_{2}+l} c_{\{j\}}\,J_{\{j\}} \leq \sum_{j=1+l}^{k_{2}+l} c_{\{j\}}\,J_{\{j\}}.
\end{aligned}   (178)

    As a result,

\begin{aligned}
\min&\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} c_{i}\,I_{i},\ r\,\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} c_{j}\,J_{j}\right\} \\
&\leq \min\left\{\sum_{i\in\Gamma_{1}} c_{i}\,I_{i},\ r\,\sum_{j\in\Gamma_{2}} c_{j}\,J_{j}\right\} \\
&\leq \min\left\{\sum_{i=1}^{k_{1}-l} c_{\langle i\rangle}\,I_{\langle i\rangle},\ r\,\sum_{j=1+l}^{k_{2}+l} c_{\{j\}}\,J_{\{j\}}\right\} \\
&\leq W(k_{1}-l,k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l}}(A),r) \\
&\leq v_{A}(k_{1},k_{2},K,r),
\end{aligned}

which proves the claim. ∎

We proceed with the proof of Theorem IV.3, from which we also deduce Theorem IV.2, as explained in the last part of the proof.

    Proof:

    We have to show that

\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K) \geq \frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}\,(1+o(1)),   (179)

where v_A(k₁,k₂,K,r) is defined in Theorem IV.3 and o(1) is a quantity that tends to zero as α → 0. We define

f(\alpha) := \frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}, \quad \alpha\in(0,1).   (180)

By Markov’s inequality, for any stopping time T and q, α ∈ (0,1),

\mathsf{E}_{A}[T] \geq q\,f(\alpha)\,\mathsf{P}_{A}(T \geq q\,f(\alpha)).   (181)

Thus, it suffices to show that for every q ∈ (0,1) we have

\liminf_{\alpha\to 0}\ \inf_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}(T \geq q\,f(\alpha)) \geq 1,   (182)

as this will imply that

\liminf_{\alpha\to 0}\ \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)/|\log(\alpha)| \geq q/v_{A}(k_{1},k_{2},K,r),   (183)

and the desired result will follow by letting q → 1.

In the rest of the proof, we fix some arbitrary q ∈ (0,1). Moreover, we note that for any B ⊆ [M] we have either |B| ≥ k₁ or |Bᶜ| ≥ k₂, because otherwise M = |B| + |Bᶜ| < k₁ + k₂, which would contradict the assumption k₁ + k₂ ≤ M. In what follows, we let B ∈ H_{k₁,k₂}(A) and focus on the case |B| ≥ k₁ and |Bᶜ| ≥ k₂. The other two cases, |B| ≥ k₁, |Bᶜ| < k₂ and |B| < k₁, |Bᶜ| ≥ k₂, are simpler and are treated as described in the last part of this proof.

Then, for any α ∈ (0,1) and (R,T,Δ) ∈ 𝒞(α,β;k₁,k₂,K) we have

\begin{aligned}
\mathsf{P}_{A}(\Delta=B) \leq\; & \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right),
\end{aligned}   (184)

where Λ^R_{A,G}(T) is defined in (140), and η is an arbitrary constant in (0,1). The third term in (184) can be equivalently written as

\mathsf{P}_{A}\left(\min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho(\alpha,\eta)\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right),   (185)

where ρ(α,η) := log(η/α)/log(η/β).

For simplicity, in what follows we write ρ instead of ρ(α,η). We upper bound the third term in (184) by

\begin{aligned}
\mathsf{P}_{A}&\left(\min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\leq \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad + \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta=B\right).
\end{aligned}   (186)

By summing (184) over all B ∈ H_{k₁,k₂}(A), we have

\begin{aligned}
\mathsf{P}_{A}&\left(\Delta\in H_{k_{1},k_{2}}(A)\right) \\
&\leq \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad + \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
&\quad + \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad + \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta\in H_{k_{1},k_{2}}(A)\right).
\end{aligned}

    In view of the fact that

1-(\alpha+\beta) \leq \mathsf{P}_{A}\left(\Delta\in H_{k_{1},k_{2}}(A)\right),   (187)

    we obtain

\begin{aligned}
\mathsf{P}_{A}&\left(T \geq q f(\alpha)\right) \geq \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta\in H_{k_{1},k_{2}}(A)\right) \\
&\geq 1-(\alpha+\beta) \\
&\quad - \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad - \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
&\quad - \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).
\end{aligned}

Thus, in order to show (182) it suffices to show that, for all B ∈ H_{k₁,k₂}(A),

\begin{aligned}
&\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0, \\
&\lim_{\beta\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) = 0,
\end{aligned}   (188)

    and

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)}\ \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0,   (189)

where the supremum is taken over (R,T,Δ) ∈ 𝒞(α,β;k₁,k₂,K). In order to show (188), we apply Boole’s inequality, which yields

\begin{aligned}
\mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) &\leq \sum_{G\in U_{k_{1}}(B)} \mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right), \\
\mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) &\leq \sum_{G\in Y_{k_{2}}(B)} \mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right).
\end{aligned}   (190)

For each G ∈ U_{k₁}(B), we apply the change of measure 𝖯_A → 𝖯_G, and we have

\begin{aligned}
\mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) &= \mathsf{E}_{G}\left[\exp\{\Lambda^{R}_{A,G}(T)\};\ \Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right] \\
&\leq \frac{\eta}{\alpha}\,\mathsf{P}_{G}(\Delta=B) \leq \frac{\eta}{\alpha}\,\mathsf{P}_{G}\left(|\Delta\setminus G| \geq k_{1}\right) \leq \eta,
\end{aligned}   (191)

where the last two inequalities follow because, for any G ∈ U_{k₁}(B), the event {Δ = B} implies |Δ∖G| ≥ k₁, and the error constraint (9) gives 𝖯_G(|Δ∖G| ≥ k₁) ≤ α. Therefore, for every η ∈ (0,1),

\limsup_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq |U_{k_{1}}(B)|\,\eta.   (192)

    In the same way, we show that

\limsup_{\beta\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \leq |Y_{k_{2}}(B)|\,\eta.   (193)

Letting η → 0, we prove (188).

    In order to show (189), we observe that by decomposition (143) we have

\begin{aligned}
\frac{1}{T}\,\Lambda^{R}_{A,G}(T) &= \sum_{i\in A\setminus G}\left(\frac{\tilde{\Lambda}^{R}_{i}(T)}{T} + I_{i}\,\pi^{R}_{i}(T)\right), \quad \forall\, G\in U_{k_{1}}(B), \\
\frac{1}{T}\,\Lambda^{R}_{A,G}(T) &= \sum_{j\in G\setminus A}\left(-\frac{\tilde{\Lambda}^{R}_{j}(T)}{T} + J_{j}\,\pi^{R}_{j}(T)\right), \quad \forall\, G\in Y_{k_{2}}(B),
\end{aligned}   (194)

which further implies that

\begin{aligned}
\frac{1}{T}\min&\left\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\,\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\right\} \\
&\leq \max\{1,\rho\}\sum_{i\in[M]}\frac{|\tilde{\Lambda}^{R}_{i}(T)|}{T} + \min\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T),\ \rho\,\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T)\right\}.
\end{aligned}   (195)

We note that, for any fixed η > 0,

\rho(\alpha,\eta) = \frac{(\log(\eta)/\log(\alpha)-1)\log(\alpha)}{(\log(\eta)/\log(\beta)-1)\log(\beta)} \to r, \quad \mbox{as }\ \alpha\to 0,   (196)

where r is defined in (50).
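As a numerical sanity check of the limit (196), suppose, as is implicit in (196), that α and β tend to zero with log(α)/log(β) → r; the coupling β = α^{1/r} and the values of η and r below are hypothetical illustration choices.

import numpy as np

# rho(alpha, eta) -> r as alpha -> 0, when beta is coupled to alpha so that
# log(alpha)/log(beta) = r; here r = 2 and eta = 0.5 are hypothetical values.
eta, r = 0.5, 2.0
for alpha in [1e-2, 1e-4, 1e-8, 1e-16]:
    beta = alpha ** (1.0 / r)
    rho = np.log(eta / alpha) / np.log(eta / beta)
    print(f"alpha={alpha:.0e}  rho={rho:.4f}")   # approaches r = 2

Also, by Lemma B.2 we have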

\min\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T),\ r\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T)\right\} \leq v_{A}(k_{1},k_{2},K,r).   (197)

    Therefore,

\frac{1}{T}\min\left\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ r\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\right\} \leq \max\{1,r\}\sum_{i\in[M]}\frac{|\tilde{\Lambda}^{R}_{i}(T)|}{T} + v_{A}(k_{1},k_{2},K,r).   (198)

    As a result, the probability in (189) is bounded above by

\mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log(\alpha)| + \log(\eta)\right),   (199)

    where

\xi(n) := \left(\max\{1,r\}\sum_{i\in[M]}\frac{|\tilde{\Lambda}^{R}_{i}(n)|}{n} + v_{A}(k_{1},k_{2},K,r)\right) n, \quad n\in\mathbb{N},   (200)

    and it suffices to show that

\limsup_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log(\alpha)| + \log(\eta)\right) = 0.   (201)

    In order to show (201), it suffices to prove that

\lim_{n\to\infty}\frac{\xi(n)}{n} = v_{A}(k_{1},k_{2},K,r), \quad \mbox{a.s.},   (202)

and then the claim follows by [27, Lemma F.1]. Indeed, by [29, Theorem 2.19] and the moment assumption (12), we have

\lim_{n\to\infty}\frac{\tilde{\Lambda}^{R}_{i}(n)}{n} = 0, \quad \mbox{a.s.} \quad \forall\, i\in[M],   (203)

which, combined with (200), implies (202).

The other cases: For the cases |B| ≥ k₁, |Bᶜ| < k₂ and |B| < k₁, |Bᶜ| ≥ k₂, we note the following.

• In the case |B| ≥ k₁, |Bᶜ| < k₂, we have Y_{k₂}(B) = ∅, and for any B ∈ H_{k₁,k₂}(A),

\begin{aligned}
\mathsf{P}_{A}(\Delta=B) \leq\; & \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).
\end{aligned}   (204)

This also applies to the case A = [M].

• In the case |B| < k₁, |Bᶜ| ≥ k₂, we have U_{k₁}(B) = ∅, and for any B ∈ H_{k₁,k₂}(A),

\begin{aligned}
\mathsf{P}_{A}(\Delta=B) \leq\; & \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right).
\end{aligned}   (205)

This also applies to the case A = ∅.

In both cases, we deduce the claim following the same reasoning as in the main case (|B| ≥ k₁, |Bᶜ| ≥ k₂). ∎

    Appendix C

In Appendix C, we prove Theorems V.1 and V.2, Theorems VI.1 and VI.2, and Proposition VI.1. Throughout Appendix C, we fix A ⊆ [M].

    Proof:

    In view of the asymptotic lower bound in Theorem IV.1, it suffices to show that

\mathsf{E}_{A}[T^{R}] \lesssim \frac{|\log(\alpha)|}{V(k,K,\boldsymbol{F}(A))}, \quad \mbox{as }\ \alpha\to 0,   (206)

where T^R is the stopping rule defined in (63). For d > 0 selected according to (65), it suffices to show that for an arbitrarily small ϵ > 0 we have

\mathsf{E}_{A}[T^{R}] \lesssim d\Big/\left(\sum_{i=1}^{k}(c^{*}_{(i)}(A)-\epsilon)F_{i}(A) - \epsilon\right), \quad \mbox{as }\ d\to\infty,   (207)

where the c*_{(i)}(A) are defined in (66); by letting ϵ → 0 we then obtain (206). To this end, we fix ϵ > 0 small enough and we set

L_{\epsilon}(d) := \frac{d}{\sum_{i=1}^{k}(c^{*}_{(i)}(A)-\epsilon)F_{i}(A) - \epsilon},   (208)

for which it holds

\mathsf{E}_{A}[T^{R}] \leq L_{\epsilon}(d) + \sum_{n>L_{\epsilon}(d)} \mathsf{P}_{A}(T^{R} > n).   (209)

On the event {T^R > n}, there exists U ⊆ [M] with |U| = k such that

\sum_{i\in A\cap U}\Lambda^{R}_{i}(n) - \sum_{j\in A^{c}\cap U}\Lambda^{R}_{j}(n) \leq \sum_{i\in U}|\Lambda^{R}_{i}(n)| < d.   (210)

Moreover, for every n > L_ϵ(d) we have

d < n\left(\sum_{i=1}^{k}(c^{*}_{(i)}(A)-\epsilon)F_{i}(A) - \epsilon\right).

By Lemma III.1, for every set U ⊆ [M] with |U| = k, we have

V(k,K,\boldsymbol{F}(A)) = \sum_{i=1}^{k} c^{*}_{(i)}(A)\,F_{i}(A) \leq \sum_{i\in A\cap U} c^{*}_{i}(A)\,I_{i} + \sum_{j\in A^{c}\cap U} c^{*}_{j}(A)\,J_{j},

which further implies that there is an ϵ′ > 0, sufficiently smaller than ϵ, such that

d < n\left(\sum_{i\in A\cap U}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} + \sum_{j\in A^{c}\cap U}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right).

Therefore, for every n > L_ϵ(d) the event {T^R > n} is included in the event

\left\{\sum_{i\in A\cap U}\Lambda^{R}_{i}(n) - \sum_{j\in A^{c}\cap U}\Lambda^{R}_{j}(n) < n\left(\sum_{i\in A\cap U}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} + \sum_{j\in A^{c}\cap U}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right)\right\}.

In view of (209) and by application of Boole’s inequality over all U ⊆ [M] such that |U| = k, we deduce that

\mathsf{E}_{A}[T^{R}] \leq L_{\epsilon}(d) + \sum_{\{U\subseteq[M]:\,|U|=k\}}\ \sum_{n=1}^{\infty} S(n;U),   (211)

where

\begin{aligned}
S(n;U) :=\; & \sum_{i\in A\cap U} \mathsf{P}_{A}\left(\frac{\Lambda^{R}_{i}(n)}{n} < (c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}/k\right) \\
& + \sum_{j\in A^{c}\cap U} \mathsf{P}_{A}\left(-\frac{\Lambda^{R}_{j}(n)}{n} < (c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}/k\right).
\end{aligned}   (212)

By assumption, for any i ∈ A, j ∉ A, and any ϵ > 0, we have

\begin{aligned}
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{i}(n) < c_{i}^{*}(A)-\epsilon\right) &< \infty, \\
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{j}(n) < c_{j}^{*}(A)-\epsilon\right) &< \infty,
\end{aligned}   (213)

and thus, by [9, Lemma A.2 (i), (ii)] with ρ_i = c_i*(A) − ϵ′ and ρ_j = c_j*(A) − ϵ′, it follows that all the series in (211) converge. Hence, letting d → ∞, we obtain (207). ∎

    Proof:

    In view of the asymptotic lower bound in Theorem IV.3, it suffices to show that

\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \lesssim \frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}, \quad \mbox{as }\ \alpha\to 0,   (214)

where T^R_leap is the stopping rule defined in (73). Without loss of generality, we focus on the case 0 < |A| < M, in which v_A(k₁,k₂,K,r) equals (53) of Theorem IV.3; the other cases follow by the same approach. For a, b > 0 selected according to (75), it suffices to show that for any arbitrarily small ϵ > 0 we have

\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \lesssim b\Big/\left(\sum_{i=1}^{k_{1}-l_{A}}(c^{*}_{\langle i\rangle}(A)-\epsilon)I_{i}(A) - \epsilon\right), \quad \mbox{as }\ b\to\infty,   (215)

where c*_{⟨i⟩}(A) and c*_{{j}}(A) are defined as in (80)–(81); by letting ϵ → 0 we then obtain (214). To this end, we fix ϵ > 0 small enough and we set

L_{\epsilon}(a,b) := \max\left\{\frac{b}{\sum_{i=1}^{k_{1}-l_{A}}(c^{*}_{\langle i\rangle}(A)-\epsilon)I_{i}(A) - \epsilon},\ \frac{a}{\sum_{j=1}^{k_{2}}(c^{*}_{\{j\}}(A)-\epsilon)J_{l_{A}+j}(A) - \epsilon}\right\},   (216)

and we observe that

\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \leq L_{\epsilon}(a,b) + \sum_{n>L_{\epsilon}(a,b)} \mathsf{P}_{A}\left(T^{R}_{\mathrm{leap}} > n\right).   (217)

For any n ∈ ℕ, on the event {T^R_leap > n} there exist U₁ ⊆ A with |U₁| = k₁ − l_A, and U₂ ⊆ 𝑱_{l_A}(A) with |U₂| = k₂, such that

\sum_{i\in U_{1}} \hat{\Lambda}^{R}_{i}(n) < b, \quad \mbox{or} \quad \sum_{i\in U_{2}} \check{\Lambda}^{R}_{i}(n) < a.   (218)

Moreover, for every n > L_ϵ(a,b) we have

b < n\left(\sum_{i=1}^{k_{1}-l_{A}}(c^{*}_{\langle i\rangle}(A)-\epsilon)I_{i}(A) - \epsilon\right), \qquad a < n\left(\sum_{j=1}^{k_{2}}(c^{*}_{\{j\}}(A)-\epsilon)J_{l_{A}+j}(A) - \epsilon\right).   (219)

For any sets U₁, U₂ as described above, we have

\sum_{i=1}^{k_{1}-l_{A}} c^{*}_{\langle i\rangle}(A)\,I_{i}(A) \leq \sum_{i\in U_{1}} c^{*}_{i}(A)\,I_{i}, \qquad \sum_{j=1}^{k_{2}} c^{*}_{\{j\}}(A)\,J_{j+l_{A}}(A) \leq \sum_{j\in U_{2}} c^{*}_{j}(A)\,J_{j}.

Thus, there is an ϵ′ > 0, sufficiently smaller than ϵ, such that

b < n\left(\sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right), \qquad a < n\left(\sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right).   (220)

Therefore, for every n > L_ϵ(a,b) the event {T^R_leap > n} is included in the event

\left\{\sum_{i\in U_{1}}\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < \sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right\} \bigcup \left\{\sum_{j\in U_{2}}\frac{\check{\Lambda}^{R}_{j}(n)}{n} < \sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right\}.

In view of (217) and by application of Boole’s inequality over all U₁ ⊆ A with |U₁| = k₁ − l_A and all U₂ ⊆ 𝑱_{l_A}(A) with |U₂| = k₂, we deduce that

\begin{aligned}
\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \leq L_{\epsilon}(a,b) &+ \sum_{U_{1}\subseteq A:\,|U_{1}|=k_{1}-l_{A}}\ \sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\sum_{i\in U_{1}}\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < \sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right) \\
&+ \sum_{U_{2}\subseteq\boldsymbol{J_{l_{A}}}(A):\,|U_{2}|=k_{2}}\ \sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\sum_{j\in U_{2}}\frac{\check{\Lambda}^{R}_{j}(n)}{n} < \sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right).
\end{aligned}   (221)

    In order to prove the claim, it suffices to show the summability of the series in (221). By Boole’s inequality we have

\begin{aligned}
\mathsf{P}_{A}\left(\sum_{i\in U_{1}}\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < \sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right) &\leq \sum_{i\in U_{1}} \mathsf{P}_{A}\left(\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < (c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}/k_{1}\right), \\
\mathsf{P}_{A}\left(\sum_{j\in U_{2}}\frac{\check{\Lambda}^{R}_{j}(n)}{n} < \sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right) &\leq \sum_{j\in U_{2}} \mathsf{P}_{A}\left(\frac{\check{\Lambda}^{R}_{j}(n)}{n} < (c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}/k_{2}\right).
\end{aligned}   (222)

By assumption, for any i ∈ A, j ∉ A, and any ϵ > 0 we have

\begin{aligned}
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{i}(n) < c_{i}^{*}(A)-\epsilon\right) &< \infty, \\
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{j}(n) < c_{j}^{*}(A)-\epsilon\right) &< \infty,
\end{aligned}   (223)

and thus, by [9, Lemma A.2 (i), (ii)] with ρ_i = c_i*(A) − ϵ′ and ρ_j = c_j*(A) − ϵ′, it follows that all the series in (222) converge. Hence, letting b → ∞, we obtain (215). ∎

    Proof:

By the definition of 𝔇_n^R, we have

\mathsf{P}_{A}\left(\sigma_{A}^{R} > n\right) \leq \sum_{i\in A} \mathsf{P}_{A}\left(\exists\, m\geq n:\ \Lambda^{R}_{i}(m) < 0\right) + \sum_{j\notin A} \mathsf{P}_{A}\left(\exists\, m\geq n:\ \Lambda^{R}_{j}(m) \geq 0\right),   (224)

which by Boole’s inequality is further bounded by

\mathsf{P}_{A}\left(\sigma_{A}^{R} > n\right) \leq \sum_{i\in A}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{i}(m) < 0\right) + \sum_{j\notin A}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{j}(m) \geq 0\right).   (225)

Therefore, in order to prove that 𝖤_A[σ_A^R] < ∞, it suffices to show that

\begin{aligned}
\sum_{n=1}^{\infty}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{i}(m) < 0\right) = \sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0\right) &< \infty, \quad \forall\, i\in A, \\
\sum_{n=1}^{\infty}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{j}(m) \geq 0\right) = \sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\Lambda^{R}_{j}(n) \geq 0\right) &< \infty, \quad \forall\, j\in A^{c}.
\end{aligned}   (226)

We prove the case i ∈ A, as the case j ∈ Aᶜ follows in the same way. We note that

\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0\right) \leq \mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0,\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) + \mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right).   (227)

    In view of assumption (91), in order to prove (226), it suffices to show that

\sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0,\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) < \infty,   (228)

    and since

\begin{aligned}
\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0,\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) &= \mathsf{P}_{A}\left(\tilde{\Lambda}^{R}_{i}(n) < -I_{i}\, n\,\pi^{R}_{i}(n),\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) \\
&\leq \mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C\, I_{i}\, n^{1-\delta}\right),
\end{aligned}   (229)

    it suffices to show

\sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C I_{i} n^{1-\delta}\right) < \infty.   (230)

    By Markov’s inequality

\mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C I_{i} n^{1-\delta}\right) \leq \frac{\mathsf{E}_{A}\left[|\tilde{\Lambda}^{R}_{i}(n)|^{\mathfrak{p}}\right]}{C^{\mathfrak{p}} I^{\mathfrak{p}}_{i}\, n^{(1-\delta)\mathfrak{p}}}.   (231)

For each i ∈ [M], {Λ̃_i^R(n) : n ≥ 0} is an ℱ^R(n)-martingale; thus, by Rosenthal’s inequality [29, Theorem 2.12], there is a constant C₀ > 0 such that

\mathsf{E}_{A}\left[|\tilde{\Lambda}^{R}_{i}(n)|^{\mathfrak{p}}\right] \leq C_{0}\, n^{\mathfrak{p}/2},   (232)

    and as a result,

\mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C I_{i} n^{1-\delta}\right) \leq \frac{C_{0}}{C^{\mathfrak{p}} I^{\mathfrak{p}}_{i}}\,\frac{n^{\mathfrak{p}/2}}{n^{(1-\delta)\mathfrak{p}}}.   (233)

For ϵ > 0 small enough so that δ < 1/2 − (2+ϵ)/𝔭, it holds that

\frac{n^{\mathfrak{p}/2}}{n^{(1-\delta)\mathfrak{p}}} < \frac{1}{n^{2+\epsilon}},   (234)

and, since Σ_{n≥1} n · n^{−(2+ϵ)} = Σ_{n≥1} n^{−(1+ϵ)} < ∞, this proves the claim (230). ∎

    Proof:

According to Theorems V.1 and V.2, it suffices to show that for each i ∈ [M]

\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi_{i}^{R}(n) < c^{*}_{i}(A)-\epsilon\right) < \infty, \quad \forall\, \epsilon>0,   (235)

where (c₁*(A),…,c_M*(A)) is defined according to Definitions V.1 and V.2, respectively.

By the definition (92) of a probabilistic sampling rule, R(n) is conditionally independent of ℱ_{n−1}^R given 𝔇_{n−1}^R. Thus, by [30, Prop. 6.13], there is a measurable function h : ℕ × 2^[M] × [0,1] → 2^[M] such that

R(n) = h(n, \mathfrak{D}^{R}_{n-1}, Z_{0}(n-1)), \quad n\in\mathbb{N},   (236)

where {Z₀(n) : n ∈ ℕ₀} is a sequence of iid random variables, uniformly distributed on [0,1]. Consequently, for each i ∈ [M] there is a measurable function h_i : ℕ × 2^[M] × [0,1] → {0,1} such that

R_{i}(n) = h_{i}(n, \mathfrak{D}^{R}_{n-1}, Z_{0}(n-1)), \quad n\in\mathbb{N}.   (237)
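To make the representation (236)–(237) concrete, the following sketch implements a probabilistic sampling rule as a measurable function of a single uniform variable via an inverse-CDF lookup over candidate subsets; the subsets, their probabilities (playing the role of the c_i^R(n, D) in (246)), and the parameters are hypothetical.

import numpy as np

# A sketch of h in (236): one uniform variable Z0 drives the choice of the
# sampled subset R(n). Candidate subsets and weights are hypothetical.
def h(n, D, z, subsets, probs):
    cdf = np.cumsum(probs)                       # inverse-CDF lookup over subsets
    return subsets[int(np.searchsorted(cdf, z))]

subsets = [frozenset({0}), frozenset({1}), frozenset({2})]   # M = 3, K = 1
probs = [0.5, 0.3, 0.2]
rng = np.random.default_rng(3)
draws = [h(n, frozenset(), rng.random(), subsets, probs) for n in range(10_000)]
print(np.mean([0 in s for s in draws]))   # ~0.5, matching (246) in expectation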

We fix ϵ > 0 and i ∈ [M]. For every n ∈ ℕ, we have

\left\{\pi_{i}^{R}(n) < c^{*}_{i}(A)-\epsilon\right\} = \left\{\sum_{m=1}^{n}\left(R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))\right) + \sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon)\right\},   (238)

and as a result

\begin{aligned}
\mathsf{P}_{A}\left(\pi_{i}^{R}(n) < c^{*}_{i}(A)-\epsilon\right) \leq\; & \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))\right) < -n\,\epsilon/2\right) \\
& + \mathsf{P}_{A}\left(\sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon/2)\right).
\end{aligned}   (239)

    For the first term on the right hand side of (239) we have

\begin{aligned}
\mathsf{P}_{A}&\left(\sum_{m=1}^{n}\left(R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))\right) < -n\,\epsilon/2\right) \\
&\leq \mathsf{P}_{A}\left(\sum_{m=1}^{n}|R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))| > n\,\epsilon/2,\ \sigma^{R}_{A}\leq n\right) + \mathsf{P}_{A}\left(\sigma^{R}_{A} \geq n\right) \\
&\leq \mathsf{P}_{A}\left(\sigma^{R}_{A} \geq n\,\epsilon/2\right) + \mathsf{P}_{A}\left(\sigma^{R}_{A} \geq n\right),   (240)
\end{aligned}

where we used the fact that R_i(n) = h_i(n, A, Z₀(n−1)) for all n ≥ σ_A^R. Since R is consistent, both terms on the right-hand side of (240) are summable.

    For the second term on the right hand side of (239) we have

\left\{\sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon/2)\right\} = \left\{\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < \sum_{m=1}^{n}\left(c^{*}_{i}(A) - c_{i}^{R}(m,A)\right) - n\,\epsilon/2\right\}.   (241)

By the (general form of the) Stolz–Cesàro theorem [31],

\limsup_{n\to\infty} \frac{\sum_{m=1}^{n}\left(c^{*}_{i}(A) - c_{i}^{R}(m,A)\right)}{n} \leq \limsup_{n\to\infty}\left(c^{*}_{i}(A) - c_{i}^{R}(n,A)\right) = c^{*}_{i}(A) - \liminf_{n\to\infty} c_{i}^{R}(n,A) \leq 0,   (242)

where the last inequality follows from assumption (95). Therefore, there exists an n₀ > 0 such that for all n ≥ n₀ it holds that

\frac{1}{n}\sum_{m=1}^{n}\left(c^{*}_{i}(A) - c_{i}^{R}(m,A)\right) < \frac{\epsilon}{4}.   (243)

    Hence,

\begin{aligned}
\sum_{n=1}^{\infty} \mathsf{P}_{A}&\left(\sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon/2)\right) \\
&\leq n_{0} + \sum_{n=n_{0}}^{\infty} \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < -n\,\epsilon/4\right),
\end{aligned}   (244)

and, thus, it suffices to show that

\sum_{n=n_{0}}^{\infty} \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < -n\,\epsilon/4\right) < \infty.   (245)

    By (93) and (237), it holds

\mathsf{E}_{A}\left[h_{i}(n,A,Z_{0}(n-1))\,|\,\mathcal{F}^{R}_{n-1}\right] = c_{i}^{R}(n,A), \quad \forall\, n\in\mathbb{N},   (246)

which shows that {h_i(n, A, Z₀(n−1)) − c_i^R(n,A) : n ≥ 1} is a martingale difference sequence whose absolute value is uniformly bounded by 1. By the Azuma–Hoeffding inequality, we deduce that

\mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < -n\,\epsilon/4\right) \leq e^{-\gamma n},   (247)

where γ > 0 is a constant, which proves (245). ∎
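The Azuma–Hoeffding step can also be checked by simulation. In the following sketch, the Bernoulli differences d_m = R_m − p, with |d_m| ≤ 1, are a hypothetical instance of a bounded martingale difference sequence, and p, ϵ, n are hypothetical values; the bound used is the standard exp(−t²/(2n)) at deviation level t = nϵ/4, as in (247).

import numpy as np

# Empirical tail vs. the Azuma-Hoeffding bound for bounded differences.
rng = np.random.default_rng(4)
p, eps, n, reps = 0.4, 0.2, 500, 20_000
d = (rng.random((reps, n)) < p).astype(float) - p   # martingale differences
emp = (d.sum(axis=1) < -n * eps / 4).mean()         # empirical tail, t = n*eps/4
bound = np.exp(-(n * eps / 4) ** 2 / (2 * n))       # exp(-t^2/(2n)) = exp(-n*eps^2/32)
print(emp, bound)                                   # emp <= bound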

    Proof:

We show that the suggested sampling rule R is consistent for part (ii). According to Theorem VI.1, it suffices to show that, for each i ∈ [M],

\sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right) < \infty,   (248)

for some C > 0 and δ ∈ (0, 1/2 − 2/𝔭). We choose C < C_p. We fix i ∈ [M], and for every n ∈ ℕ we notice that

\begin{aligned}
\left\{\pi_{i}^{R}(n) < C\, n^{-\delta}\right\} &= \left\{\sum_{m=1}^{n} R_{i}(m) < C n^{1-\delta}\right\} \\
&= \left\{\sum_{m=1}^{n}\left(R_{i}(m) - c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1})\right) + \sum_{m=1}^{n} c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1}) < C n^{1-\delta}\right\}.
\end{aligned}   (249)

By (99), we have c_i^R(m, 𝔇_{m−1}^R) ≥ C_p/m^δ for all m ∈ {1,…,n}, and we deduce that

\sum_{m=1}^{n} c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1}) \geq \sum_{m=1}^{n}\frac{C_{p}}{m^{\delta}} \geq C_{p}\, n^{1-\delta},   (250)

    which further implies

\mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right) \leq \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(R_{i}(m) - c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1})\right) < -(C_{p}-C)\, n^{1-\delta}\right).   (251)

    In view of (93), it holds

\mathsf{E}_{A}\left[R_{i}(n)\,|\,\mathcal{F}^{R}_{n-1}\right] = c^{R}_{i}(n,\mathfrak{D}^{R}_{n-1}), \quad \forall\, n\in\mathbb{N},   (252)

which shows that {R_i(n) − c_i^R(n, 𝔇_{n−1}^R) : n ∈ ℕ} is a martingale difference sequence whose absolute value is uniformly bounded by 1. Thus, by the Azuma–Hoeffding inequality, we deduce that

\mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right) \leq e^{-\zeta n^{1-2\delta}},

where ζ > 0 is a constant. Therefore, we deduce (248) by noting that there is an n₀ > 0 such that, for all n ≥ n₀,

n^{1-2\delta} \geq \frac{3}{\zeta}\ln(n),

so that n\,\mathsf{P}_{A}(\pi^{R}_{i}(n) < C\, n^{-\delta}) \leq n^{-2} for all n ≥ n₀, which is summable. ∎