
Sequential anomaly identification with observation control under generalized error metrics

Aristomenis Tsopelakos, Georgios Fellouris

This work was presented in part in ISIT ’20 [1]. The authors are with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign.
Abstract

The problem of sequential anomaly detection and identification is considered, where multiple data sources are monitored simultaneously and the goal is to identify in real time those, if any, that exhibit “anomalous” statistical behavior. An upper bound is postulated on the number of data sources that can be sampled at each sampling instant, but the decision maker selects which ones to sample based on the already collected data. Thus, in this context, a policy consists not only of a stopping rule and a decision rule, which determine when sampling should be terminated and which sources to identify as anomalous upon stopping, but also of a sampling rule, which determines which sources to sample at each time instant subject to the sampling constraint. Two distinct formulations are considered, which require control of different, “generalized” error metrics. The first tolerates a certain user-specified number of errors, of any kind, whereas the second tolerates distinct, user-specified numbers of false positives and false negatives. For each of them, a universal asymptotic lower bound on the expected time for stopping is established as the error probabilities go to 0, and it is shown to be attained by a policy that combines the stopping and decision rules proposed in the full-sampling case with a probabilistic sampling rule that achieves a specific long-run sampling frequency for each source. Moreover, the first-order asymptotic approximation to the optimal expected time for stopping is compared in simulation studies with its counterpart in a finite regime, and the impact of the sampling constraint and of the tolerance to errors is assessed.

Index Terms:
Anomaly identification, generalized error metric, sampling design, asymptotic optimality.

I Introduction

In many scientific and engineering applications, measurements from various data sources are collected sequentially, with the aim of identifying in real time the sources, if any, with “anomalous” statistical behavior. In internet security systems [2], for example, the data streams may refer to the transition rate of a link, where a meager rate warns of a possible intrusion. In finance, the data streams may refer to prices in the stock market [3], where we need to detect an unusual rate of return of a stock price.

Such applications, among many others, motivate the formulation of sequential multiple testing problems where multiple data sources are monitored sequentially, a binary hypothesis testing problem is formulated for each of them, and the goal is to solve all these testing problems simultaneously, as quickly as possible, while controlling certain error metrics. In some works, e.g., [4, 5, 6], it is assumed that all sources can be monitored at each sampling instant until stopping. In others, e.g., [7, 8, 9, 10, 11, 12, 13, 14, 15], a sampling constraint is imposed, according to which it is possible to observe only a subset of the sources at each time instant, and the decision maker selects which ones to sample based on the already collected data. In the latter case, in addition to a stopping rule and a decision rule that determine when to stop sampling and which sources to identify as anomalous upon stopping, it is also required to specify a sampling rule, which determines the sources to be sampled at each time instant until stopping. This sampling constraint leads to a sequential multiple testing problem with adaptive sampling design, which lies in the field of “sequential testing with controlled sensing”, or “sequential design of experiments” [16], [17], [18], [19]. Hence, methods and results from these works are applicable as well.

In the above works, the problem formulation requires, implicitly or explicitly, control of the “classical” misclassification error rate, i.e., the probability of at least one error of any kind, or of the two “classical” familywise error rates, i.e., the probabilities of at least one false alarm and of at least one missed detection. However, such error metrics can be impractical, especially when there is a common stopping time at which the decision is made for all data sources, as in the above papers. Indeed, even with a relatively small number of data sources, a single “difficult” hypothesis may determine, and even inflate, the overall time required for the decisions to be made.

This inflexibility has motivated the adoption of more lenient error metrics, which are prevalent in the fixed-sample-size literature on multiple testing [20, 21, 22]. As shown in [22], the methodology and asymptotic optimality theory in [6], [9] for the sequential identification problem under classical familywise error rates remain valid with other error metrics, as long as the latter are bounded above and below, up to a multiplicative constant, by the corresponding classical familywise error rates. (The authors in [22] focus on the full-sampling case, but exactly the same arguments apply in the case of sampling constraints.) This is indeed the case for many error metrics, such as the false discovery rate (FDR) and the false non-discovery rate (FNR) [23]. However, it is not the case for others, such as the generalized misclassification error rate, which requires control of the probability of at least $k$ errors of any kind, and the generalized familywise type-I and type-II error rates [24, 25, 26], which require control of the probabilities of at least $k_1$ false alarms and of at least $k_2$ missed detections. As shown in [27] in the case of full sampling, that is, when all data sources are continuously monitored and there are no sampling constraints, the sequential identification problem with these generalized error metrics, when $k>1$ or $k_1+k_2>2$, requires a distinct methodology and poses additional mathematical challenges compared to the corresponding classical versions, where $k=1$ or $k_1=k_2=1$.

In the present work, we address the same sequential anomaly identification problem as in [27], but in the presence of sampling constraints. That is, unlike [27], we assume that it is not possible to observe all data sources at all times. Instead, we impose an upper bound on the average number of samples collected from all sources up to stopping. For each of the two resulting problem formulations, i.e., for each of the two generalized error metrics under consideration, (i) we establish a universal lower bound on the optimal expected time for stopping, to a first-order asymptotic approximation as the corresponding error probabilities go to zero, (ii) we show that this lower bound is attained by a policy that utilizes the stopping and decision rules in [27], as long as the long-run sampling frequency of each source is greater than or equal to a certain value, which is computed explicitly and depends both on the source and on the true subset of anomalous sources, (iii) we show that the latter can be achieved by a probabilistic sampling rule.

The methodology and the theory that we develop in the present work are based on an interplay of techniques, ideas, and methods from the sequential multiple testing problem with full sampling and generalized error control in [27], and the sequential multiple testing problem with sampling constraints and “classical” familywise error control in [9]. However, various interesting special features arise in the present work, from both a technical and a methodological point of view. First, certain technical challenges force us to strengthen some of our assumptions compared to [9]. Specifically, we require a stronger moment assumption on the log-likelihood ratio statistics than the finiteness of the Kullback-Leibler information numbers, and we have to postulate a stricter sampling constraint (which is still weaker than the usual constraint in the literature that fixes the number of sources that are sampled at each time instant). Second, while in [9] it is shown that there is no need for “forced exploration”, as in the general sequential testing problem with controlled sensing [16, 17], this turns out to be needed in the present work.

The remainder of the paper is organized as follows. In Section II, we formulate the two problems we consider in this work. In Section III, we state and solve two auxiliary max-min optimization problems, which play a key role in the formulation of our main results. In Section IV, we state the universal asymptotic lower bound for each of the two problems. In Section V, for each of the two problems, we introduce a family of policies that satisfies the error constraints, and we state a criterion on the sampling rule that guarantees the asymptotic optimality of such a policy. In Section VI, we present a class of probabilistic sampling rules for which the aforementioned criterion is satisfied. In Section VII, we present a simulation study that illustrates our theoretical results, and in Section VIII, we state our conclusions and discuss future research directions.

We end this section with some notation that we use throughout the paper. We use $:=$ to indicate the definition of a new quantity and $\equiv$ to indicate the equivalence of two notions. We set $\mathbb{N}:=\{1,2,\ldots\}$, $\mathbb{N}_{0}:=\{0\}\cup\mathbb{N}$, and $[n]:=\{1,\ldots,n\}$ for $n\in\mathbb{N}$. We denote by $|A|$ the size and by $2^{A}$ the powerset of a set $A$, and by $A\triangle B$ the symmetric difference of two sets $A,B$. The symbols $\lfloor a\rfloor$ and $\lceil a\rceil$ stand for the floor and the ceiling of a positive number $a$. The indicator function is denoted by $\mathbf{1}$. In a summation, if the lower limit is larger than the upper limit, then the summation is assumed to be equal to 0. For positive sequences $(x_{n})$ and $(y_{n})$, we write $x_{n}\sim y_{n}$ when $\lim_{n}(x_{n}/y_{n})=1$, $x_{n}\gtrsim y_{n}$ when $\liminf_{n}(x_{n}/y_{n})\geq 1$, and $x_{n}\lesssim y_{n}$ when $\limsup_{n}(x_{n}/y_{n})\leq 1$. Finally, iid stands for independent and identically distributed.

II Problem formulation

Let $(\mathbb{S},\mathcal{S})$ be a measurable space and let $(\Omega,\mathcal{F},\mathsf{P})$ be a probability space that hosts $M$ independent sequences of iid, $\mathbb{S}$-valued random elements, $\{X_{i}(n),\,n\in\mathbb{N}\}$, $i\in[M]$, which are generated by $M$ distinct data sources, and $M$ independent sequences of random variables, $\{Z_{i}(n):n\in\mathbb{N}\}$, $i\in[M]$, uniformly distributed in $(0,1)$, to be used for randomization purposes. For each $i\in[M]$, we assume that each $X_{i}(n)$ has a density with respect to some $\sigma$-finite measure $\nu_{i}$ that is equal to either $f_{1i}$ or $f_{0i}$, and we say that source $i$ is “anomalous” if its density is $f_{1i}$. We denote by $\mathsf{P}_{A}$ the underlying probability measure, and by $\mathsf{E}_{A}$ the corresponding expectation, when the subset of anomalous sources is $A\subseteq[M]$. We simply write $\mathsf{P}$ and $\mathsf{E}$ whenever the identity of the subset of anomalous sources is not relevant.

The problem we consider is the identification of the anomalous sources, if any, on the basis of observations that are acquired sequentially from the sources, when it is not possible to observe all of them at every sampling instant. Specifically, we have to specify (i) the random time, $T$, at which sampling is terminated, (ii) the subset of sources, $\Delta$, that are declared as anomalous upon stopping, and, for each $n\leq T$, (iii) the subset, $R(n)$, of sources that are sampled at time $n$. At any time instant, the decision whether to stop or not, as well as the subsets of sources that are identified as anomalous in the former case, or sampled next in the latter, must be determined based on the already collected data.

Therefore, we say that $R:=\{R(n):n\in\mathbb{N}\}$ is a sampling rule if $R(n)$ is $\mathcal{F}^{R}_{n-1}$-measurable for every $n\in\mathbb{N}$, where

\mathcal{F}^{R}_{n}:=\begin{cases}\sigma\left(\mathcal{F}^{R}_{n-1},\,Z(n),\,\{X_{i}(n)\,:\,i\in R(n)\}\right),\quad&n\in\mathbb{N},\\ \sigma(Z(0)),\quad&n=0,\end{cases} (1)

and $Z(n):=(Z_{1}(n),\ldots,Z_{M}(n))$. Moreover, we say that the triplet $(R,T,\Delta)$ is a policy if

  • (i)

    $R$ is a sampling rule,

  • (ii)

    $T$ is a stopping time with respect to the filtration $\{\mathcal{F}^{R}_{n}\,:\,n\in\mathbb{N}\}$,

  • (iii)

    $\Delta$ is $\mathcal{F}^{R}_{T}$-measurable, i.e.,

    \{T=n,\Delta=D\}\in\mathcal{F}^{R}_{n},\quad\forall\;n\in\mathbb{N}\quad\text{and}\quad D\subseteq[M],

in which case we refer to $T$ as a stopping rule and to $\Delta$ as a decision rule.

We denote by $\mathcal{C}$ the family of all policies, and we focus on policies that satisfy a sampling constraint and control the probabilities of certain types of error. To define these, we need to introduce some additional notation. Thus, for any sampling rule $R$ and time instant $n\in\mathbb{N}$, we denote by $R_{i}(n)$ the indicator of whether source $i$ is sampled at time $n$, i.e.,

R_{i}(n):=\mathbf{1}\{i\in R(n)\},\quad i\in[M],

by $\pi_{i}^{R}(n)$ the proportion of times source $i$ has been sampled in the first $n$ time instants, i.e.,

\pi_{i}^{R}(n):=\frac{1}{n}\sum_{m=1}^{n}R_{i}(m), (2)

and we note that the average number of observations from all sources in the first $n$ time instants is

\frac{1}{n}\sum_{m=1}^{n}|R(m)|=\frac{1}{n}\sum_{i=1}^{M}\sum_{m=1}^{n}R_{i}(m)=\sum_{i=1}^{M}\pi_{i}^{R}(n). (3)

II-A The sampling constraint

For any real number $K$ in $(0,M]$, we say that a policy $(R,T,\Delta)$ belongs to $\mathcal{C}(K)$ if the average number of observations from all sources until stopping is less than or equal to $K$, i.e.,

\frac{1}{T}\sum_{m=1}^{T}|R(m)|=\sum_{i=1}^{M}\pi_{i}^{R}(T)\leq K. (4)

This sampling constraint is clearly satisfied when at most $\lfloor K\rfloor$ sources are sampled at each time instant up to stopping, i.e., when

|R(m)|\leq\lfloor K\rfloor,\quad\forall\,m\leq T, (5)

whereas it implies the sampling constraint considered in [9],

\mathsf{E}\left[\sum_{m=1}^{T}|R(m)|\right]\leq K\,\mathsf{E}[T]. (6)

In other words, the sampling constraint (4) is looser than (5), but stricter than (6).

II-B The error constraints

We consider two types of error control, which lead to two distinct problem formulations. We characterize both as “generalized”, as they generalize the corresponding “classical” versions of the misclassification error rate and the familywise error rates.

II-B1 Control of generalized misclassification error rate

For any $K\in(0,M]$, $k\in[M]$, and $\alpha\in(0,1)$, we say that a policy $(R,T,\Delta)$ in $\mathcal{C}(K)$ belongs to $\mathcal{C}(\alpha;k,K)$ if the probability of at least $k$ errors of any kind is at most $\alpha$, i.e.,

\mathsf{P}_{A}(|A\triangle\Delta|\geq k)\leq\alpha,\quad\forall\,A\subseteq[M], (7)

and we denote by $\mathcal{J}_{A}(\alpha;k,K)$ the smallest possible expected time for stopping in $\mathcal{C}(\alpha;k,K)$ when the subset of anomalous sources is $A\subseteq[M]$, i.e.,

\mathcal{J}_{A}(\alpha;k,K):=\inf\limits_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)}\mathsf{E}_{A}[T]. (8)

The first problem we consider in this paper is to evaluate $\mathcal{J}_{A}(\alpha;k,K)$ to a first-order asymptotic approximation as $\alpha\to 0$ for any $A\subseteq[M]$, and to find a policy that achieves $\mathcal{J}_{A}(\alpha;k,K)$ in this asymptotic sense, simultaneously for every $A\subseteq[M]$.

II-B2 Control of generalized familywise error rates

For any $K\in(0,M]$, $k_{1},k_{2}\in[M]$, and $\alpha,\beta\in(0,1)$, such that $\alpha+\beta<1$ and $k_{1}+k_{2}\leq M$, we say that a policy $(R,T,\Delta)$ in $\mathcal{C}(K)$ belongs to $\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)$ if the probability of at least $k_{1}$ false positives does not exceed $\alpha$ and the probability of at least $k_{2}$ false negatives does not exceed $\beta$, i.e.,

\mathsf{P}_{A}(|\Delta\setminus A|\geq k_{1})\leq\alpha,\qquad\text{and}\qquad\mathsf{P}_{A}(|A\setminus\Delta|\geq k_{2})\leq\beta,\qquad\forall\,A\subseteq[M], (9)

and we denote by $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ the smallest expected time for stopping in $\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)$ when the subset of anomalous sources is $A\subseteq[M]$, i.e.,

\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K):=\inf\limits_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)}\mathsf{E}_{A}[T]. (10)

The second problem we consider in this paper is to evaluate $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ to a first-order asymptotic approximation as $\alpha,\beta\to 0$ for any $A\subseteq[M]$, and to find a family of policies that achieves $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ in this asymptotic sense, simultaneously for every $A\subseteq[M]$.

Remark II.1

As we mentioned in the Introduction, both problems have been solved in the full-sampling case, i.e., when all sources are observed at each time instant, in [27]. On the other hand, in the presence of sampling constraints, neither of them has been considered beyond the special cases of the “classical” misclassification error rate ($k=1$) and the “classical” familywise error rates ($k_{1}=k_{2}=1$) in [9].

II-C Distributional Assumptions

For each $i\in[M]$, the Kullback-Leibler (KL) divergences between $f_{1i}$ and $f_{0i}$ are assumed to be positive and finite, i.e., for each $i\in[M]$ we have:

I_{i}:=\int_{\mathbb{S}}\log\left(f_{1i}/f_{0i}\right)\,f_{1i}\,d\nu_{i}\,\in(0,\infty), (11)
J_{i}:=\int_{\mathbb{S}}\log\left(f_{0i}/f_{1i}\right)\,f_{0i}\,d\nu_{i}\,\in(0,\infty).

To establish asymptotic lower bounds for (8) and (10), we assume that

\begin{split}\sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|\log^{+}(|g_{i}|)\,f_{1i}\,d\nu_{i}&<\infty,\\ \sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|\log^{+}(|g_{i}|)\,f_{0i}\,d\nu_{i}&<\infty,\end{split} (12)

where $g_{i}:=\log\left(f_{1i}/f_{0i}\right)$. This assumption is needed neither in the full-sampling case considered in [27], nor in the case of classical error control ($k=1$ or $k_{1}=k_{2}=1$) under sampling constraints in [9]. Nevertheless, it is weaker than the typical assumption in the sequential controlled sensing literature (e.g., [16], [17]), according to which it is required that

\begin{split}\sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|^{\mathfrak{p}}\,f_{1i}\;d\nu_{i}&<\infty,\\ \sum_{i=1}^{M}\int_{\mathbb{S}}|g_{i}|^{\mathfrak{p}}\,f_{0i}\;d\nu_{i}&<\infty,\end{split} (13)

holds for $\mathfrak{p}=2$. However, in order to show that the asymptotic lower bounds for (8) and (10) can be achieved, in certain cases we will need to require that (13) holds for some $\mathfrak{p}>4$.

III Two max-min optimization problems

In this section, we formulate and solve two auxiliary max-min optimization problems. These will be used in Section IV to express the asymptotic lower bounds for (8) and (10), and in Section V to design procedures that achieve these lower bounds.

To be specific, we denote by $\boldsymbol{L}:=\{L_{i}\,:\,i\in[|\boldsymbol{L}|]\}$ an ordered set of positive numbers, i.e.,

L_{0}:=0<L_{1}\leq\ldots\leq L_{|\boldsymbol{L}|}, (14)

for each $i\in[|\boldsymbol{L}|]$, we denote by $\widetilde{L}_{i}$ the harmonic mean of the $|\boldsymbol{L}|-i+1$ largest elements in $\boldsymbol{L}$, i.e.,

\widetilde{L}_{i}:=\frac{|\boldsymbol{L}|-i+1}{\sum_{u=i}^{|\boldsymbol{L}|}(1/L_{u})},\quad i\in[|\boldsymbol{L}|], (15)

and we also set $\widetilde{L}_{|\boldsymbol{L}|+1}:=\infty$. Then, assuming that $|\boldsymbol{L}|\leq M$, we introduce the following function,

\mathcal{V}(\boldsymbol{c};\kappa,\boldsymbol{L}):=\min_{U\subseteq[|\boldsymbol{L}|]:\,|U|=\kappa}\,\sum_{i\in U}c_{i}\,L_{i}, (16)

where

  • $\kappa$ is a positive integer in $[|\boldsymbol{L}|]$,

  • $\boldsymbol{c}:=(c_{1},\ldots,c_{|\boldsymbol{L}|},0,\ldots,0)$ is a vector in

    \mathcal{D}(K):=\left\{(c_{1},\ldots,c_{M})\in[0,1]^{M}:\sum_{i=1}^{M}c_{i}\leq K\right\}. (17)

The constants $M$ and $K$ are defined as in the previous section, i.e., $M$ is a positive integer, and $K$ is a real number in $(0,M]$.

III-A Optimization Problem I

Let $\boldsymbol{L}$ be an ordered set of size $|\boldsymbol{L}|\leq M$, and let $\kappa\in[|\boldsymbol{L}|]$. The first max-min optimization problem we consider is

V(\kappa,K,\boldsymbol{L}):=\max_{\boldsymbol{c}\in\mathcal{D}(K)}\mathcal{V}(\boldsymbol{c};\kappa,\boldsymbol{L}). (18)

In the following lemma, we provide an expression for the value $V(\kappa,K,\boldsymbol{L})$ of the max-min optimization problem (18), as well as for the maximizer of (18) with the minimum $\mathcal{L}^{1}$ norm.

Lemma III.1

The value of the max-min optimization problem (18) is equal to the expression

V(\kappa,K,\boldsymbol{L})=\begin{cases}(\kappa-u)\,\frac{K}{|\boldsymbol{L}|-u}\,\widetilde{L}_{u+1},\quad&\mbox{if}\quad v=0,\\ x\,L_{v-1}+\sum_{i=v}^{u}L_{i}+(\kappa-u)\,y\,\widetilde{L}_{u+1},\quad&\mbox{if}\quad v\geq 1,\end{cases} (19)

where $x,y$ are real numbers in $[0,1)$, and $u,v$ are integers in $[0,\kappa]$ such that $v\leq u$. The values of $x$, $y$, $v$, $u$ are determined by Algorithm 1, presented in Appendix A. The maximizer of (18) with the minimum $\mathcal{L}^{1}$ norm is given by

\boldsymbol{c}^{\prime}(\kappa,K,\boldsymbol{L}):=(c^{\prime}_{1},\ldots,c^{\prime}_{|\boldsymbol{L}|},0,\ldots,0), (20)

where

  • for all $i\in\{u+1,\ldots,|\boldsymbol{L}|\}$ we have

    c^{\prime}_{i}\,L_{i}=\begin{cases}y\,\widetilde{L}_{u+1},\;\,\qquad\text{if}\quad u<\kappa,\\ L_{\kappa},\qquad\qquad\,\text{if}\quad u=\kappa,\end{cases} (21)

  • if $v=0$ then

    c^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\,\in\,\{1,\ldots,u\}, (22)

  • if $v\geq 1$ then

    c^{\prime}_{i}=1,\qquad\mbox{for all}\quad i\,\in\,\{v,\ldots,u\}, (23)

  • if $v\geq 2$ then

    c^{\prime}_{v-1}=\begin{cases}x,\qquad\text{if}\quad x>0,\\ 0,\qquad\text{if}\quad x=0,\end{cases} (24)

  • if $v\geq 3$ then

    c^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\leq v-2. (25)
Proof:

Appendix A. ∎

Remark III.1

In the symmetric case where $L_{i}=L$ for all $i\in[|\boldsymbol{L}|]$, we have

V(\kappa,K,\boldsymbol{L})=\kappa\,(K/|\boldsymbol{L}|)\,L,

and $c^{\prime}_{i}=(K/|\boldsymbol{L}|)\wedge 1$ for all $i\in[|\boldsymbol{L}|]$.
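Since the inner minimum in (16) is a minimum of finitely many linear functions of $\boldsymbol{c}$, the max-min problem (18) is equivalent to a small linear program. The following is a minimal numeric sketch in Python (with a hypothetical helper name; it is not Algorithm 1 of Appendix A) that can be used to sanity-check the closed-form expression (19) and Remark III.1 for small $|\boldsymbol{L}|$.

```python
# Sketch: solve (18) as the LP  max t  s.t.  t <= sum_{i in U} c_i L_i for all
# |U| = kappa,  0 <= c_i <= 1,  sum_i c_i <= K.  Brute-force over subsets U,
# so only suitable for small |L|.
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def V_numeric(kappa, K, L):
    m = len(L)
    cost = np.zeros(m + 1); cost[-1] = -1.0        # maximize t = minimize -t
    A_ub, b_ub = [], []
    for U in combinations(range(m), kappa):        # t - sum_{i in U} c_i L_i <= 0
        row = np.zeros(m + 1); row[-1] = 1.0
        for i in U:
            row[i] = -L[i]
        A_ub.append(row); b_ub.append(0.0)
    A_ub.append(np.r_[np.ones(m), 0.0]); b_ub.append(K)   # sum_i c_i <= K
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m], -res.fun                     # (a maximizer c, value V)

# Symmetric check (Remark III.1): L_i = L gives V = kappa * (K/|L|) * L.
c, val = V_numeric(kappa=2, K=1.5, L=[1.0, 1.0, 1.0])
print(val)   # 1.0 = 2 * (1.5/3) * 1.0
```

Note that an LP solver returns some maximizer of (18), not necessarily the one with the minimum $\mathcal{L}^{1}$ norm singled out in Lemma III.1 (cf. Remark III.3).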

Remark III.2

Based on the size of $K$, we distinguish the following cases for the form of $V(\kappa,K,\boldsymbol{L})$.

  • If $K$ is relatively large, i.e.,

    K\geq\kappa+L_{\kappa}\sum_{i=\kappa+1}^{|\boldsymbol{L}|}1/L_{i}, (26)

    then

    V(\kappa,K,\boldsymbol{L})=\sum_{i=1}^{\kappa}L_{i}, (27)

    which is the largest possible value that $V(\kappa,K,\boldsymbol{L})$ can take over all possible values of $K\leq M$. In this case, $x=y=0$, $v=1$, $u=\kappa$.

  • If $K$ is relatively small, i.e.,

    K<L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}, (28)

    then

    V(\kappa,K,\boldsymbol{L})=(\kappa-u^{*})\;\frac{K}{|\boldsymbol{L}|-u^{*}}\;\widetilde{L}_{u^{*}+1},

    where $u^{*}$ is defined as

    u^{*}:=\max\left\{u\in\{0,\ldots,\kappa-1\}\,:\,\frac{\kappa-u}{|\boldsymbol{L}|-u}\;\widetilde{L}_{u+1}\geq L_{u}\right\}.

    In this case, $x=0$, $u=u^{*}$, $v=0$, and $y=K/(|\boldsymbol{L}|-u^{*})$.

  • If $K$ is between the values given in (26) and (28), then $V(\kappa,K,\boldsymbol{L})$ has the general form described in Lemma III.1.

Remark III.3

The only case in which there can exist a maximizer $(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|})$ of (18) different from the maximizer with the minimum $\mathcal{L}^{1}$ norm is when (26) holds with strict inequality. The only difference between the two maximizers is that for $(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|})$ there is some $i\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}$ such that

\tilde{c}_{i}L_{i}>L_{\kappa}. (29)

Although (29) can hold for some $i\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}$, this does not change the value of $V(\kappa,K,\boldsymbol{L})$ in (27).

III-B Optimization Problem II

Let $\boldsymbol{L}_{1}:=\{L_{1,i}:i\in[|\boldsymbol{L}_{1}|]\}$ and $\boldsymbol{L}_{2}:=\{L_{2,i}:i\in[|\boldsymbol{L}_{2}|]\}$ be two ordered sets such that $|\boldsymbol{L}_{1}|+|\boldsymbol{L}_{2}|\leq M$, let $\kappa_{1},\kappa_{2}$ be two positive integers such that $\kappa_{1}\in[|\boldsymbol{L}_{1}|]$ and $\kappa_{2}\in[|\boldsymbol{L}_{2}|]$, and let $r$ be an arbitrary positive number. The second max-min optimization problem we consider in this section is more complex than the first, and it has the form

W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r):=\max_{\boldsymbol{c}\in\mathcal{D}(K)}\min\left\{\mathcal{V}(\hat{\boldsymbol{c}};\kappa_{1},\boldsymbol{L}_{1}),\,r\;\mathcal{V}(\check{\boldsymbol{c}};\kappa_{2},\boldsymbol{L}_{2})\right\}, (30)

where the function $\mathcal{V}$ is defined in (16), and $\boldsymbol{c}:=(\hat{\boldsymbol{c}},\check{\boldsymbol{c}},\boldsymbol{0})$, the size of $\boldsymbol{0}$ being $M-|\boldsymbol{L}_{1}|-|\boldsymbol{L}_{2}|$, and

\hat{\boldsymbol{c}}:=(\hat{c}_{1},\ldots,\hat{c}_{|\boldsymbol{L}_{1}|}),\qquad\check{\boldsymbol{c}}:=(\check{c}_{1},\ldots,\check{c}_{|\boldsymbol{L}_{2}|}). (31)

As we show in the following lemma, the value $W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)$ of the max-min optimization problem (30) is equal to the value of the following optimization problem,

\max_{(K_{1},K_{2})}V(\kappa_{1},K_{1},\boldsymbol{L}_{1}), (32)

such that the following two constraints hold:

K_{1}+K_{2}\leq K, (33)
V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2}).
Definition III.1

We denote by $(K^{*}_{1},K^{*}_{2})$ the maximizer of the constrained optimization problem (32) with the minimum $\mathcal{L}^{1}$ norm, i.e., the minimum sum $K_{1}+K_{2}$, among all maximizers. Based on Lemma III.1, we denote by $x_{1},y_{1},u_{1},v_{1}$ the parameters such that

V(\kappa_{1},K^{*}_{1},\boldsymbol{L}_{1})=\begin{cases}(\kappa_{1}-u_{1})\,\frac{K^{*}_{1}}{|\boldsymbol{L}_{1}|-u_{1}}\,\widetilde{L}_{1,u_{1}+1},\quad&\mbox{if}\quad v_{1}=0,\\ x_{1}L_{1,v_{1}-1}+\sum_{i=v_{1}}^{u_{1}}L_{1,i}+(\kappa_{1}-u_{1})\,y_{1}\,\widetilde{L}_{1,u_{1}+1},\quad&\mbox{if}\quad v_{1}\geq 1,\end{cases}

and by $x_{2},y_{2},u_{2},v_{2}$ the parameters such that

V(\kappa_{2},K^{*}_{2},\boldsymbol{L}_{2})=\begin{cases}(\kappa_{2}-u_{2})\,\frac{K^{*}_{2}}{|\boldsymbol{L}_{2}|-u_{2}}\,\widetilde{L}_{2,u_{2}+1},\quad&\mbox{if}\quad v_{2}=0,\\ x_{2}L_{2,v_{2}-1}+\sum_{i=v_{2}}^{u_{2}}L_{2,i}+(\kappa_{2}-u_{2})\,y_{2}\,\widetilde{L}_{2,u_{2}+1},\quad&\mbox{if}\quad v_{2}\geq 1.\end{cases}

The parameters $x_{1},y_{1},u_{1},v_{1}$ and $x_{2},y_{2},u_{2},v_{2}$ can be computed by applying Algorithm 1 to each case, and they are used in the expression of the maximizer of (30) with the minimum $\mathcal{L}^{1}$ norm.

Lemma III.2

The value of the max-min optimization problem (30) is equal to

W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=V(\kappa_{1},K^{*}_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K^{*}_{2},\boldsymbol{L}_{2}). (34)

The maximizer of (30) with the minimum $\mathcal{L}^{1}$ norm is given by

\boldsymbol{c}^{\prime}(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r):=\left(\hat{c}^{\prime}_{1},\ldots,\hat{c}^{\prime}_{|\boldsymbol{L}_{1}|},\check{c}^{\prime}_{1},\ldots,\check{c}^{\prime}_{|\boldsymbol{L}_{2}|},0,\ldots,0\right), (35)

where

  • for all $i\in\{u_{1}+1,\ldots,|\boldsymbol{L}_{1}|\}$ we have

    \hat{c}^{\prime}_{i}\,L_{1,i}=\begin{cases}y_{1}\,\widetilde{L}_{1,u_{1}+1},\,\qquad\text{if}\quad u_{1}<\kappa_{1},\\ L_{1,\kappa_{1}},\qquad\qquad\,\text{if}\quad u_{1}=\kappa_{1},\end{cases} (36)

  • if $v_{1}=0$ then

    \hat{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\,\in\,\{1,\ldots,u_{1}\}, (37)

  • if $v_{1}\geq 1$ then

    \hat{c}^{\prime}_{i}=1,\qquad\mbox{for all}\quad i\,\in\,\{v_{1},\ldots,u_{1}\}, (38)

  • if $v_{1}\geq 2$ then

    \hat{c}^{\prime}_{v_{1}-1}=\begin{cases}x_{1},\;\,\quad\text{if}\quad x_{1}>0,\\ 0,\qquad\text{if}\quad x_{1}=0,\end{cases} (39)

  • if $v_{1}\geq 3$ then

    \hat{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\leq v_{1}-2, (40)

and

  • for all $i\in\{u_{2}+1,\ldots,|\boldsymbol{L}_{2}|\}$ we have

    \check{c}^{\prime}_{i}\,L_{2,i}=\begin{cases}y_{2}\,\widetilde{L}_{2,u_{2}+1},\,\qquad\text{if}\quad u_{2}<\kappa_{2},\\ L_{2,\kappa_{2}},\qquad\qquad\,\text{if}\quad u_{2}=\kappa_{2},\end{cases} (41)

  • if $v_{2}=0$ then

    \check{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\,\in\,\{1,\ldots,u_{2}\}, (42)

  • if $v_{2}\geq 1$ then

    \check{c}^{\prime}_{i}=1,\qquad\mbox{for all}\quad i\,\in\,\{v_{2},\ldots,u_{2}\}, (43)

  • if $v_{2}\geq 2$ then

    \check{c}^{\prime}_{v_{2}-1}=\begin{cases}x_{2},\;\,\quad\text{if}\quad x_{2}>0,\\ 0,\qquad\text{if}\quad x_{2}=0,\end{cases} (44)

  • if $v_{2}\geq 3$ then

    \check{c}^{\prime}_{i}=0,\qquad\mbox{for all}\quad i\leq v_{2}-2. (45)
Proof:

Appendix A. ∎
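In the same spirit as the sketch following Remark III.1, the max-min problem (30) also admits a direct linear-programming formulation, which can serve as a numeric cross-check of Lemma III.2 for small instances; the helper below is our own illustration, not part of the paper's algorithms.

```python
# Sketch: solve (30) as the LP  max t  s.t.
#   t <= sum_{i in U} chat_i L1_i        for all U subset [|L1|], |U| = kappa1,
#   t <= r * sum_{i in U} ccheck_i L2_i  for all U subset [|L2|], |U| = kappa2,
#   0 <= c <= 1,  sum(c) <= K,  where c = (chat, ccheck).
from itertools import combinations
import numpy as np
from scipy.optimize import linprog

def W_numeric(kappa1, kappa2, K, L1, L2, r):
    m1, m2 = len(L1), len(L2)
    m = m1 + m2
    cost = np.zeros(m + 1); cost[-1] = -1.0        # maximize t
    A_ub, b_ub = [], []
    for U in combinations(range(m1), kappa1):      # type-I side constraints
        row = np.zeros(m + 1); row[-1] = 1.0
        for i in U:
            row[i] = -L1[i]
        A_ub.append(row); b_ub.append(0.0)
    for U in combinations(range(m2), kappa2):      # type-II side constraints
        row = np.zeros(m + 1); row[-1] = 1.0
        for i in U:
            row[m1 + i] = -r * L2[i]
        A_ub.append(row); b_ub.append(0.0)
    A_ub.append(np.r_[np.ones(m), 0.0]); b_ub.append(K)   # sum of c <= K
    res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, 1)] * m + [(None, None)])
    return res.x[:m1], res.x[m1:m], -res.fun       # (chat, ccheck, value W)
```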

IV Universal asymptotic lower bounds

In this section, we fix an arbitrary $A\subseteq[M]$ and $k,k_{1},k_{2},K$ as in Section II, and we establish a universal asymptotic lower bound for (8) and (10), as $\alpha\to 0$ and as $\alpha,\beta\to 0$, respectively, under the moment assumption (12).

IV-A The case of generalized misclassification error rate

To state the asymptotic lower bound for $\mathcal{J}_{A}(\alpha;k,K)$ as $\alpha\to 0$, we need to introduce some additional notation. Thus, we denote by

\boldsymbol{F}(A):=\{F_{i}(A)\,:\,i\in[M]\} (46)

the ordered set that consists of the Kullback-Leibler numbers in $\{I_{i},\,J_{j}\,:\,i\in A,\,j\notin A\}$. In particular, for each $i\in[M]$, $F_{i}(A)$ is the $i$-th smallest element in $\boldsymbol{F}(A)$, and can be interpreted as a measure of the difficulty of the $i$-th most difficult testing problem.

The overall difficulty of the testing problem is determined by the quantity $V(k,K,\boldsymbol{F}(A))$, as described in the following theorem.

Theorem IV.1

Suppose (12) holds. As $\alpha\to 0$, we have

\mathcal{J}_{A}(\alpha;k,K)\gtrsim\frac{|\log\alpha|}{V(k,K,\boldsymbol{F}(A))}, (47)

where $V$ is defined in (18).

Proof:

Appendix B. ∎

Remark IV.1

Consider the homogeneous and symmetric setup where the difficulty is the same across all testing problems, in the sense that

I_{i}=J_{j}=I,\quad\forall\;i,j\in[M].

Then, for every $A\subseteq[M]$ we have

F_{i}(A)=I,\quad\forall\;i\in[M].

Consequently, by Remark III.1 we see that

V(k,K,\boldsymbol{F}(A))=k\,(K/M)\,I.

IV-B The case of generalized familywise error rates

We next establish an asymptotic lower bound for $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ as $\alpha\to 0$ and/or $\beta\to 0$. For this, we need to introduce the following notation.

  1. (i)

    If $A\neq\emptyset$, for each $i\in[|A|]$ we denote

    • by $I_{i}(A)$ the $i$-th smallest element in $\{I_{j}\,:\,j\in A\}$,

    • by $\widetilde{I}_{i}(A)$ the harmonic mean of the $|A|-i+1$ largest elements in $\{I_{j}\,:\,j\in A\}$.

    Moreover, we denote by $\boldsymbol{I}(A)$ the ordered set that consists of the Kullback-Leibler numbers in $\{I_{j}\,:\,j\in A\}$, i.e.,

    \boldsymbol{I}(A):=\left\{I_{i}(A)\,:\,i\in[|A|]\right\},

    and for each $l\in\{0,\ldots,|A|-1\}$, we denote by $\boldsymbol{I_{l}}(A)$ the set that consists of the $|A|-l$ largest elements in $\{I_{j}\,:\,j\in A\}$, i.e.,

    \boldsymbol{I_{l}}(A):=\{I_{i}(A)\,:\,l<i\leq|A|\}.

  2. (ii)

    If $A\neq[M]$, for each $i\in[|A^{c}|]$ we denote

    • by $J_{i}(A)$ the $i$-th smallest element in $\{J_{j}\,:\,j\in A^{c}\}$,

    • by $\widetilde{J}_{i}(A)$ the harmonic mean of the $|A^{c}|-i+1$ largest elements in $\{J_{j}\,:\,j\in A^{c}\}$.

    Moreover, we denote by $\boldsymbol{J}(A)$ the ordered set that consists of the Kullback-Leibler numbers in $\{J_{j}\,:\,j\in A^{c}\}$, i.e.,

    \boldsymbol{J}(A):=\left\{J_{i}(A)\,:\,i\in[|A^{c}|]\right\},

    and for each $l\in\{0,\ldots,|A^{c}|-1\}$, we denote by $\boldsymbol{J_{l}}(A)$ the ordered set that consists of the $|A^{c}|-l$ largest elements in $\{J_{j}\,:\,j\in A^{c}\}$, i.e.,

    \boldsymbol{J_{l}}(A):=\{J_{i}(A)\,:\,l<i\leq|A^{c}|\}.

We first state the asymptotic lower bound for $\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)$ when $A=\emptyset$ and when $A=[M]$, as $\beta\to 0$ and $\alpha\to 0$, respectively.

Theorem IV.2

Suppose (12) holds.

  • Let $A=\emptyset$. For any given $\alpha\in(0,1)$, as $\beta\to 0$ we have

    \mathcal{J}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\beta|}{V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A))}. (48)

  • Let $A=[M]$. For any given $\beta\in(0,1)$, as $\alpha\to 0$ we have

    \mathcal{J}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A))}, (49)

where $V$ is defined in (18).

Proof:

Appendix B. ∎

Remark IV.2

If $A=\emptyset$, the $k_{1}-1$ sources with the smallest KL numbers in $\boldsymbol{J}(A)$ are not considered in the evaluation of the difficulty of the testing problem. This is because we can intentionally misclassify these $k_{1}-1$ sources as anomalous, without exceeding the tolerance level of $k_{1}$ false alarms, in order to reduce the expected time for stopping. The corresponding remark applies to the case $A=[M]$.

We continue with the asymptotic lower bound when $0<|A|<M$, as $\alpha,\beta\to 0$ so that

|\log\alpha|\sim r|\log\beta|,\quad\mbox{for some }\;r\in(0,\infty). (50)

For this, we need to introduce the following definition.

Definition IV.1

We denote by $v_{A}(k_{1},k_{2},K,r)$ the maximum of the following quantities, where each quantity is included in the maximum provided that the corresponding condition is satisfied. We recall that $k_{1}\leq|A|$ or $k_{2}\leq|A^{c}|$, because otherwise we would have $k_{1}+k_{2}>|A|+|A^{c}|=M$, which contradicts the initial assumption $k_{1}+k_{2}\leq M$.

  • If $k_{2}\leq|A^{c}|$, we include the maximum

    \max\{W(k_{1}-l,k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l}}(A),r)\} (51)

    over all $l\in\{(k_{1}-|A|)^{+},\ldots,(k_{1}-1)\wedge(|A^{c}|-k_{2})\}$.

  • If $k_{1}\leq|A|$, we include the maximum

    \max\{W(k_{1},k_{2}-l,K,\boldsymbol{I_{l}}(A),\boldsymbol{J}(A),r)\} (52)

    over all $l\in\{(k_{2}-|A^{c}|)^{+},\ldots,(k_{2}-1)\wedge(|A|-k_{1})\}$.

  • If $k_{1}-1\geq|A^{c}|-k_{2}+1$, we include

    V(k_{1}-(|A^{c}|-k_{2}+1)^{+},K,\boldsymbol{I}(A)). (53)

  • If $k_{2}-1\geq|A|-k_{1}+1$, we include

    r\,V(k_{2}-(|A|-k_{1}+1)^{+},K,\boldsymbol{J}(A)). (54)

Here, $W$ is defined in (30), and $V$ in (18).
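For concreteness, the following sketch (our own illustration, reusing the hypothetical V_numeric and W_numeric helpers from the sketches in Section III) enumerates the quantities above and returns their maximum, i.e., $v_{A}(k_{1},k_{2},K,r)$, for ordered KL lists $\boldsymbol{I}(A)$ and $\boldsymbol{J}(A)$.

```python
# Sketch: evaluate v_A(k1, k2, K, r) of Definition IV.1 by enumeration.
# I and J are the increasingly ordered lists I(A) and J(A); since they are
# ordered, the tails I[l:] and J[l:] are exactly I_l(A) and J_l(A).
def v_A_numeric(k1, k2, K, I, J, r):
    a, ac = len(I), len(J)                          # a = |A|, ac = |A^c|
    vals = []
    if k2 <= ac:                                    # quantities (51)
        for l in range(max(k1 - a, 0), min(k1 - 1, ac - k2) + 1):
            vals.append(W_numeric(k1 - l, k2, K, I, J[l:], r)[2])
    if k1 <= a:                                     # quantities (52)
        for l in range(max(k2 - ac, 0), min(k2 - 1, a - k1) + 1):
            vals.append(W_numeric(k1, k2 - l, K, I[l:], J, r)[2])
    if k1 - 1 >= ac - k2 + 1:                       # quantity (53)
        vals.append(V_numeric(k1 - max(ac - k2 + 1, 0), K, I)[1])
    if k2 - 1 >= a - k1 + 1:                        # quantity (54)
        vals.append(r * V_numeric(k2 - max(a - k1 + 1, 0), K, J)[1])
    return max(vals)
```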

Remark IV.3

We note that, for any $A\subseteq[M]$, it holds that

(k_{1}-|A|)^{+}\leq(k_{1}-1)\wedge(|A^{c}|-k_{2}), (55)
(k_{2}-|A^{c}|)^{+}\leq(k_{2}-1)\wedge(|A|-k_{1}). (56)

Indeed, for (55) we observe that if $k_{1}\leq|A|$ then $(k_{1}-|A|)^{+}=0$ and the result is evident, whereas if $k_{1}>|A|$ it holds that

k_{1}-|A|\leq k_{1}-1, (57)
k_{1}-|A|\leq|A^{c}|-k_{2}, (58)

where (57) follows from the fact that $0<|A|<M$, and (58) holds because we have assumed $k_{1}+k_{2}\leq M$. Similarly, we can verify that (56) holds.

The difficulty of the testing problem is determined by the quantity $v_{A}(k_{1},k_{2},K,r)$, as described in the following theorem.

Theorem IV.3

Suppose (12) holds, and let $0<|A|<M$. As $\alpha,\beta\to 0$ so that (50) holds, we have

\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{v_{A}(k_{1},k_{2},K,r)}, (59)

where $v_{A}(k_{1},k_{2},K,r)$ is given by Definition IV.1.

Proof:

Appendix B. ∎

We denote by $l_{A}$ the value of the parameter $l$ that corresponds to the maximum of the quantities in Definition IV.1; it is used in the formulation of the following results.

Definition IV.2

The quantity $l_{A}$ is defined as follows.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), then $l_{A}$ is the number such that

    v_{A}(k_{1},k_{2},K,r)=W(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A),r). (60)

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (52), then $l_{A}$ is the number such that

    v_{A}(k_{1},k_{2},K,r)=W(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A),r). (61)

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (53), then $l_{A}=(|A^{c}|-k_{2}+1)^{+}$.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (54), then $l_{A}=(|A|-k_{1}+1)^{+}$.

Remark IV.4

If $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), the $l_{A}$ sources with the smallest KL numbers in $\boldsymbol{J}(A)$ are not considered in the evaluation of the difficulty of the testing problem. This is because we can intentionally misclassify these $l_{A}$ sources as anomalous, without exceeding the tolerance level of $k_{1}$ false alarms, in order to reduce the expected time for stopping. The corresponding remark applies if $v_{A}(k_{1},k_{2},K,r)$ is equal to (52).

If $k_{1}-1\geq|A^{c}|-k_{2}+1$, then we can intentionally misclassify as anomalous the $|A^{c}|-k_{2}+1$ sources with the smallest KL numbers in $\boldsymbol{J}(A)$, without exceeding the tolerance level of $k_{1}$ false alarms, in order to reduce the expected time for stopping. The remaining $k_{2}-1$ sources in $A^{c}$ are already fewer than the tolerance level of $k_{2}$ missed detections, which is why the difficulty of the testing problem is determined only by the KL numbers in $\boldsymbol{I}(A)$ in (53). The corresponding remark applies if $k_{2}-1\geq|A|-k_{1}+1$.

Corollary IV.1

Suppose (12) holds, and let $0<|A|<M$. As $\alpha,\beta\to 0$ so that (50) holds, we can distinguish the following cases.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1}-l_{A},K_{1}^{*},\boldsymbol{I}(A))}\sim\frac{|\log\beta|}{V(k_{2},K_{2}^{*},\boldsymbol{J_{l_{A}}}(A))}.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (52), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1},K_{1}^{*},\boldsymbol{I_{l_{A}}}(A))}\sim\frac{|\log\beta|}{V(k_{2}-l_{A},K_{2}^{*},\boldsymbol{J}(A))}.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (53), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\alpha|}{V(k_{1}-l_{A},K,\boldsymbol{I}(A))}.

  • If $v_{A}(k_{1},k_{2},K,r)$ is equal to (54), then

    \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\gtrsim\frac{|\log\beta|}{V(k_{2}-l_{A},K,\boldsymbol{J}(A))}.

In the first two cases, $(K_{1}^{*},K_{2}^{*})$ is the maximizer of (32) for the respective case.

V A criterion for asymptotic optimality

In this section, for each of the two problems under consideration, we first introduce a stopping and a decision rule so that the corresponding error constraint is satisfied for any choice of sampling rule. For this, we adopt the approach in [27], where the full sampling case was considered. Subsequently, we establish the second main result of this paper, which is a criterion on the sampling rule for the resulting policy to achieve the corresponding universal asymptotic lower bound in the previous section.

In what follows, for any sampling rule $R$ and each source $i\in[M]$, we denote by $\Lambda^{R}_{i}(n)$ the local log-likelihood ratio (LLR) of source $i$ based on the observations from it in the first $n$ time instants, i.e.,

\Lambda^{R}_{i}(n):=\sum_{m=1}^{n}\log\left(\frac{f_{1i}(X_{i}(m))}{f_{0i}(X_{i}(m))}\right)\,R_{i}(m),\quad n\in\mathbb{N}. (62)

V-A The case of generalized misclassification error

For any sampling rule $R$, we denote by $T^{R}_{si}$ the first time that the sum of the $k$ smallest LLRs in absolute value is larger than some threshold $d>0$, and by $\Delta_{si}^{R}$ the subset of data sources with positive LLRs upon stopping, i.e.,

T^{R}_{si}:=\inf\left\{n\geq 1\,:\,\sum_{i=1}^{k}\bar{\Lambda}^{R}_{i}(n)\geq d\right\}, (63)
\Delta^{R}_{si}:=\left\{i\in[M]\,:\,\Lambda^{R}_{i}(T^{R}_{si})>0\right\}, (64)

where, for each $i\in[M]$ and $n\in\mathbb{N}$, $\bar{\Lambda}^{R}_{i}(n)$ denotes the $i$-th smallest LLR in absolute value at time $n$, i.e., the $i$-th smallest element in $\{|\Lambda^{R}_{j}(n)|\,:\,j\in[M]\}$. In the full-sampling case, where $R(n)=[M]$ for every $n\in\mathbb{N}$, $(T^{R}_{si},\Delta^{R}_{si})$ coincides with the sum-intersection rule, introduced in [27]. Similarly to [27, Theorem 3.1], it can be shown that if the threshold $d$ in (63) is selected as

d:=|\log\alpha|+\log{M\choose k}, (65)

then the policy $(R,T^{R}_{si},\Delta^{R}_{si})$ satisfies the error constraint (7) for any sampling rule $R$. Next, we show that this policy, with this choice of threshold, also achieves the asymptotic lower bound in Theorem IV.1 when, for each $i\in[M]$, the long-run sampling frequency of the source that corresponds to $F_{i}(A)$ is not smaller than the quantity $c^{\prime}_{i}(k,K,\boldsymbol{F}(A))$, defined according to (20) of Lemma III.1. To be specific, we need the following definition.

Definition V.1

For each $A\subseteq[M]$, we denote by $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ the permutation of $\boldsymbol{c}^{\prime}(k,K,\boldsymbol{F}(A))$, defined in (20), such that

c^{*}_{(i)}(A):=c^{\prime}_{i}(k,K,\boldsymbol{F}(A)),\quad i\in[M], (66)

where $(i)$ denotes the source with the $i$-th smallest number in the set $\boldsymbol{F}(A)$, defined in (46).

We are now ready to state the first main result of this section.

Theorem V.1

Consider a policy of the form $(R,T^{R}_{si},\Delta^{R}_{si})$, where the threshold $d$ in (63) is selected according to (65) and the sampling constraint (4) is satisfied. Fix $A\subseteq[M]$, and suppose that, for all $i\in[M]$, the sampling rule $R$ satisfies

\sum_{n=1}^{\infty}\mathsf{P}_{A}\left(\pi_{i}^{R}(n)<c^{*}_{i}(A)-\epsilon\right)<\infty,\qquad\forall\;\epsilon>0, (67)

where $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ is defined according to Definition V.1. Then, as $\alpha\to 0$, we have

\mathsf{E}_{A}\left[T^{R}_{si}\right]\sim\mathcal{J}_{A}(\alpha;k,K)\sim\frac{|\log\alpha|}{V(k,K,\boldsymbol{F}(A))}, (68)

where $V$ is defined in (18).

Proof:

Appendix C. ∎
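To make the policy concrete, here is a minimal simulation sketch of the sum-intersection stopping and decision rules (63)-(64) with the threshold (65). The Gaussian model and the full-sampling placeholder for $R$ are our own illustrative assumptions; designing an asymptotically optimal sampling rule under the sampling constraint is the subject of Section VI.

```python
# Sketch: run (63)-(64) under full sampling, for unit-variance Gaussian
# sources with mean 1 (anomalous) or 0 (non-anomalous), so that the LLR
# increment of source i at observation x is x - 1/2.
import numpy as np
from math import comb, log

def sum_intersection_rule(M, k, alpha, rng, A, max_n=10**6):
    d = -log(alpha) + log(comb(M, k))          # threshold (65)
    mu = np.array([1.0 if i in A else 0.0 for i in range(M)])
    Lam = np.zeros(M)                          # running LLRs Lambda_i(n)
    for n in range(1, max_n + 1):
        x = rng.normal(mu)                     # one sample from every source
        Lam += x - 0.5                         # full-sampling LLR update
        if np.sort(np.abs(Lam))[:k].sum() >= d:    # sum of k smallest |LLR|s
            return n, {i for i in range(M) if Lam[i] > 0}   # (T_si, Delta_si)
    raise RuntimeError("no decision within max_n steps")

T, Delta = sum_intersection_rule(M=5, k=2, alpha=0.01,
                                 rng=np.random.default_rng(0), A={0, 2})
print(T, Delta)
```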

V-B The case of generalized familywise error metric

For any sampling rule $R$, we denote by $p^{R}(n)$ the number of non-negative LLRs at time $n$, by $\hat{w}^{R}_{1}(n),\ldots,\hat{w}^{R}_{p^{R}(n)}(n)$ the indices of the increasingly ordered non-negative LLRs at time $n$, and by $\check{w}^{R}_{1}(n),\ldots,\check{w}^{R}_{M-p^{R}(n)}(n)$ the indices of the decreasingly ordered negative LLRs at time $n$, i.e.,

0\leq\Lambda^{R}_{\hat{w}^{R}_{1}(n)}(n)\leq\ldots\leq\Lambda^{R}_{\hat{w}^{R}_{p^{R}(n)}(n)}(n),\qquad 0>\Lambda^{R}_{\check{w}^{R}_{1}(n)}(n)\geq\ldots\geq\Lambda^{R}_{\check{w}^{R}_{M-p^{R}(n)}(n)}(n). (69)

We also set

\hat{\Lambda}^{R}_{i}(n):=\begin{cases}\Lambda^{R}_{\hat{w}^{R}_{i}(n)}(n),&\qquad i\leq p^{R}(n),\\ +\infty,&\qquad i>p^{R}(n),\end{cases} (70)
\check{\Lambda}^{R}_{i}(n):=\begin{cases}-\Lambda^{R}_{\check{w}^{R}_{i}(n)}(n),&\quad i\leq M-p^{R}(n),\\ +\infty,&\quad i>M-p^{R}(n).\end{cases}

For any integer $l$ such that $0\leq l<k_{1}$, we set

\hat{\tau}(l):=\inf\left\{n\geq 1:\sum_{i=1}^{k_{1}-l}\hat{\Lambda}^{R}_{i}(n)\geq b,\,\,\sum_{i=1+l}^{k_{2}+l}\check{\Lambda}^{R}_{i}(n)\geq a\right\}. (71)

Also, for any integer $l$ with $1\leq l<k_{2}$, we set

\check{\tau}(l):=\inf\left\{n\geq 1:\sum_{i=1+l}^{k_{1}+l}\hat{\Lambda}^{R}_{i}(n)\geq b,\,\,\sum_{i=1}^{k_{2}-l}\check{\Lambda}^{R}_{i}(n)\geq a\right\}. (72)

We denote by $T^{R}_{leap}$ the minimum of the stopping times in (71) and (72), i.e.,

T^{R}_{leap}:=\min\left\{\min\limits_{0\leq l<k_{1}}\hat{\tau}(l),\min\limits_{1\leq l<k_{2}}\check{\tau}(l)\right\}, (73)

and, depending on whether the minimum is attained by $\hat{\tau}(l)$ for some $l\in[0,k_{1})$, or by $\check{\tau}(l)$ for some $l\in[1,k_{2})$, we set

\Delta^{R}_{leap}:=\begin{cases}\{\hat{w}_{1}(\hat{\tau}(l)),..,\hat{w}_{p(\hat{\tau}(l))}(\hat{\tau}(l))\}\bigcup\{\check{w}_{1}(\hat{\tau}(l)),..,\check{w}_{l\wedge(M-p^{R}(\hat{\tau}(l)))}(\hat{\tau}(l))\},&\mbox{if}\quad T^{R}_{leap}=\hat{\tau}(l),\\ \{\hat{w}_{l+1}(\check{\tau}(l)),..,\hat{w}_{p(\check{\tau}(l))}(\check{\tau}(l))\},&\mbox{if}\quad T^{R}_{leap}=\check{\tau}(l).\end{cases} (74)

In the full-sampling case, where $R(n)=[M]$ for every $n\in\mathbb{N}$, $(T^{R}_{leap},\Delta^{R}_{leap})$ coincides with the leap rule, introduced in [27]. Similarly to [27, Theorem 4.1], it can be shown that if the thresholds $a$ and $b$ in (71)-(72) are selected as

a:=|\log\beta|+\log\left(2^{k_{2}}{M\choose k_{2}}\right),\qquad b:=|\log\alpha|+\log\left(2^{k_{1}}{M\choose k_{1}}\right), (75)

then the policy $(R,T^{R}_{leap},\Delta^{R}_{leap})$ satisfies the error constraint (9) for any sampling rule $R$. We next show that this policy, with this choice of thresholds, achieves the asymptotic lower bound in Theorem IV.3, as long as the long-run sampling frequency of each source is sufficiently large. To be specific, we introduce the following definition, for which we recall the quantity $l_{A}$ of Definition IV.2.

Definition V.2

For each $A\subseteq[M]$, we define the vector $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ as follows:

  1. (i)

    If $A=\emptyset$, then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A))$, defined in (20), such that

    c_{\{i\}}^{*}(A):=c^{\prime}_{i}(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A)),\quad i\in[M-k_{1}+1], (76)

    where $\{i\}$ denotes the source with the $i$-th smallest element in $\boldsymbol{J_{k_{1}-1}}(A)$.

  2. (ii)

    If $A=[M]$, then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A))$, defined in (20), such that

    c_{<i>}^{*}(A):=c_{i}^{\prime}(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A)),\quad i\in[M-k_{2}+1], (77)

    where $<i>$ denotes the source with the $i$-th smallest element in $\boldsymbol{I_{k_{2}-1}}(A)$.

  3. (iii)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (53), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1}-l_{A},K,\boldsymbol{I}(A))$, defined in (20), such that

    c_{<i>}^{*}(A):=c_{i}^{\prime}(k_{1}-l_{A},K,\boldsymbol{I}(A)),\quad i\in[|A|], (78)

    where $<i>$ denotes the source in $A$ with the $i$-th smallest element in $\boldsymbol{I}(A)$.

  4. (iv)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (54), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{2}-l_{A},K,\boldsymbol{J}(A))$, defined according to (20), such that

    c_{\{i\}}^{*}(A):=c^{\prime}_{i}(k_{2}-l_{A},K,\boldsymbol{J}(A)),\quad i\in[|A^{c}|], (79)

    where $\{i\}$ denotes the source in $A^{c}$ with the $i$-th smallest element in $\boldsymbol{J}(A)$.

  5. (v)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (51), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A))$, defined according to (35), such that

    c^{*}_{<i>}(A):=\hat{c}^{\prime}_{i}(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A)),\quad i\in[|A|], (80)
    c^{*}_{\{j\}}(A):=\check{c}^{\prime}_{j}(k_{1}-l_{A},k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l_{A}}}(A)),\quad j\in[|A^{c}|-l_{A}], (81)

    where $<i>$ is the source in $A$ with the $i$-th smallest element in $\boldsymbol{I}(A)$, and $\{j\}$ the source in $A^{c}$ with the $j$-th smallest element in $\boldsymbol{J_{l_{A}}}(A)$.

  6. (vi)

    If $0<|A|<M$ and $v_{A}(k_{1},k_{2},K,r)$ is equal to (52), then $\boldsymbol{c}^{*}(A)$ is a permutation of the vector $\boldsymbol{c}^{\prime}(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A))$, defined according to (35), such that

    c^{*}_{<i>}(A):=\hat{c}^{\prime}_{i}(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A)),\quad i\in[|A|-l_{A}], (82)
    c^{*}_{\{j\}}(A):=\check{c}^{\prime}_{j}(k_{1},k_{2}-l_{A},K,\boldsymbol{I_{l_{A}}}(A),\boldsymbol{J}(A)),\quad j\in[|A^{c}|], (83)

    where $<i>$ denotes the source in $A$ with the $i$-th smallest element in $\boldsymbol{I_{l_{A}}}(A)$, and $\{j\}$ the source in $A^{c}$ with the $j$-th smallest element in $\boldsymbol{J}(A)$.

We are now ready to state the second main result of this section.

Theorem V.2

Consider a policy of the form $(R,T^{R}_{leap},\Delta^{R}_{leap})$, where the thresholds $a$ and $b$ in (71)-(72) are selected according to (75) and the sampling constraint (4) is satisfied. Fix $A\subseteq[M]$, and suppose that, for all $i\in[M]$, the sampling rule $R$ satisfies

\sum_{n=1}^{\infty}\mathsf{P}_{A}\left(\pi_{i}^{R}(n)<c^{*}_{i}(A)-\epsilon\right)<\infty,\quad\forall\;\epsilon>0, (84)

where the vector $\boldsymbol{c}^{*}(A):=(c_{1}^{*}(A),\ldots,c_{M}^{*}(A))$ is defined according to Definition V.2.

  • If $A=\emptyset$, then, for any $\alpha\in(0,1)$, as $\beta\to 0$ we have

    \mathsf{E}_{A}\left[T^{R}_{leap}\right]\sim\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log\beta|}{V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A))}. (85)

  • If $A=[M]$, then, for any $\beta\in(0,1)$, as $\alpha\to 0$ we have

    \mathsf{E}_{A}\left[T^{R}_{leap}\right]\sim\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log\alpha|}{V(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A))}. (86)

  • If $0<|A|<M$, then, as $\alpha,\beta\to 0$ so that (50) holds, we have

    \mathsf{E}_{A}\left[T^{R}_{leap}\right]\sim\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log\alpha|}{v_{A}(k_{1},k_{2},K,r)}, (87)

where $V$ is defined in (18), and $v_{A}$ in Definition IV.1.

Proof:

Appendix C. ∎
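The following sketch implements the stopping conditions behind (71)-(73) and the decision rule (74) at a given LLR vector; it is our own illustration, written for the full-sampling case with the time argument suppressed, and with the thresholds (75) passed in as a and b.

```python
# Sketch: check whether some tau-hat(l) or tau-check(l) fires at the current
# LLR vector Lam, and if so return the decision (74); otherwise return None.
import numpy as np

def leap_check(Lam, k1, k2, a, b):
    Lam = np.asarray(Lam, dtype=float)
    pos = np.where(Lam >= 0)[0]                # sources with non-negative LLRs
    neg = np.where(Lam < 0)[0]
    pos = pos[np.argsort(Lam[pos])]            # hat-w order: increasing LLR
    neg = neg[np.argsort(-Lam[neg])]           # check-w order: closest to 0 first
    hat = Lam[pos]                             # hat-Lambda_1 <= hat-Lambda_2 <= ...
    chk = -Lam[neg]                            # check-Lambda_1 <= check-Lambda_2 <= ...

    def s(vals, i, j):                         # sum_{m=i}^{j} of vals, with the
        return np.inf if j > len(vals) else vals[i - 1:j].sum()   # +inf padding (70)

    for l in range(0, k1):                     # tau-hat(l), 0 <= l < k1, cf. (71)
        if s(hat, 1, k1 - l) >= b and s(chk, 1 + l, k2 + l) >= a:
            return set(pos) | set(neg[:min(l, len(neg))])         # (74), first case
    for l in range(1, k2):                     # tau-check(l), 1 <= l < k2, cf. (72)
        if s(hat, 1 + l, k1 + l) >= b and s(chk, 1, k2 - l) >= a:
            return set(pos[l:])                                   # (74), second case
    return None
```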

VI Asymptotically optimal probabilistic sampling rules

In this section, we design sampling rules that satisfy the criteria for asymptotic optimality established in Section V simultaneously for every possible subset of anomalous sources. For this, we first introduce a notion of consistency, which applies to an arbitrary sampling rule. Then, we define a family of probabilistic sampling rules and, finally, we show how to design a probabilistic sampling rule so that condition (67) (resp. (84)) is satisfied for every $i\in[M]$ and $A\subseteq[M]$, and consequently so that the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds for every $A\subseteq[M]$.

VI-A Consistency

We say that a sampling rule $R$ is consistent if the subset of sources with non-negative LLRs at time $n$, i.e.,

\mathfrak{D}_{n}^{R}:=\{i\in[M]\,:\,\Lambda^{R}_{i}(n)\geq 0\}, (88)

converges quickly to the true subset of anomalous sources $A$.

Definition VI.1

Fix A[M]A\subseteq[M]. We say that a sampling rule RR is consistent under 𝖯A\mathsf{P}_{A} if

𝖤A[σAR]<,\mathsf{E}_{A}\left[\sigma^{R}_{A}\right]<\infty, (89)

where σAR\sigma_{A}^{R} is the random time starting from which the sources in AA are the only ones with non-negative LLR, i.e.,

σAR:=inf{n:𝔇mR=Afor allmn}.\sigma^{R}_{A}:=\inf\left\{n\in\mathbb{N}:\mathfrak{D}^{R}_{m}=A\quad\text{for all}\;\;m\geq n\right\}. (90)
Definition VI.2

We say that a sampling rule RR is consistent if it is consistent under 𝖯A\mathsf{P}_{A}, for every A[M]A\subseteq[M].

From [9, Theorem 3.1] we know that if there is a ρ>0\rho>0 such that, for each i[M]i\in[M], the sequence {𝖯A(πiR(n)<ρ):n}\{\mathsf{P}_{A}\left(\pi^{R}_{i}(n)<\rho\right)\,:\,n\in\mathbb{N}\} is exponentially decaying, then the sequence {𝖯A(σAR>n):n}\{\mathsf{P}_{A}\left(\sigma^{R}_{A}>n\right)\,:\,n\in\mathbb{N}\} is exponentially decaying, and, as a result, the sampling rule RR is consistent under 𝖯A\mathsf{P}_{A}. Next, we state a less restrictive criterion for consistency, according to which a sampling rule can be consistent even if the long-run sampling frequencies of all sources are equal to 0, as long as their decay to 0 is sufficiently slow.

Theorem VI.1

Suppose that condition (13) holds for some 𝔭>4\mathfrak{p}>4, and let δ(0,122𝔭)\delta\in\left(0,\frac{1}{2}-\frac{2}{\mathfrak{p}}\right), and C>0C>0. Fix A[M]A\subseteq[M]. If RR is an arbitrary sampling rule for which

n=1n𝖯A(πiR(n)<Cnδ)<,i[M],\sum_{n=1}^{\infty}n\,\mathsf{P}_{A}\left(\pi^{R}_{i}(n)<C\,n^{-\delta}\right)<\infty,\quad\forall\,i\,\in[M], (91)

then RR is consistent under 𝖯A\mathsf{P}_{A}.

Proof:

Appendix C. ∎

VI-B Probabilistic sampling rules

We say that a sampling rule RR is probabilistic if there exists a function

qR:2[M]××2[M][0,1]q^{R}:2^{[M]}\times\mathbb{N}\times 2^{[M]}\to[0,1]

such that, for every nn\in\mathbb{N}, D[M]D\subseteq[M], and B[M]B\subseteq[M], qR(B;n,D)q^{R}\left(B;n,D\right) is the probability that BB is the subset of sampled sources at time nn when DD is the subset of sources with non-negative LLRs at time n1n-1, i.e.,

qR(B;n,D):=𝖯(R(n)=B|n1R,𝔇n1R=D).\displaystyle q^{R}\left(B;n,D\right):=\mathsf{P}\left(R(n)=B\,|\,\mathcal{F}^{R}_{n-1},\mathfrak{D}^{R}_{n-1}=D\right). (92)

For such a sampling rule, for each source i[M]i\in[M] we denote by ciR(n,D)c^{R}_{i}\left(n,D\right) the probability with which source ii is sampled at time nn when DD is the subset of sources with non-negative LLRs at time n1n-1, i.e.,

ciR(n,D)\displaystyle c^{R}_{i}\left(n,D\right) :=𝖯(Ri(n)=1|n1R,𝔇n1R=D)\displaystyle:=\mathsf{P}\left(R_{i}(n)=1\,|\,\mathcal{F}^{R}_{n-1},\mathfrak{D}^{R}_{n-1}=D\right) (93)

thus,

ciR(n,D)=B[M]:iBqR(B;n,D).\displaystyle c^{R}_{i}\left(n,D\right)=\sum_{B\subseteq[M]:\,i\in B}q^{R}\left(B;n,D\right). (94)
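
For illustration, (94) is simply a marginalization of qRq^{R} over all subsets that contain source ii. The following minimal sketch in Python transcribes it; the dictionary representation of the joint distribution is our own device for illustration, not a construct from the paper.

```python
def marginals(q, M):
    """Per-source sampling probabilities (94) from a joint rule q, where q maps
    each frozenset B of {0,...,M-1} to the probability q(B; n, D) of sampling
    exactly the sources in B."""
    c = [0.0] * M
    for B, prob in q.items():
        for i in B:
            c[i] += prob
    return c

# Example with M = 3: sample {0,1} or {0,2} with equal probability.
q = {frozenset({0, 1}): 0.5, frozenset({0, 2}): 0.5}
print(marginals(q, 3))  # [1.0, 0.5, 0.5]
```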

In the following theorem, we state a condition under which a consistent probabilistic sampling rule satisfies condition (67) or (84) simultaneously for every A[M]A\subseteq[M].

Theorem VI.2

Let RR be a consistent probabilistic sampling rule, and fix A[M]A\subseteq[M].

  • (i)

    If for all i[M]i\in[M] we have

    lim infnciR(n,A)ci(A),\liminf_{n\to\infty}c_{i}^{R}(n,A)\geq c^{*}_{i}(A), (95)

    where (c1(A),,cM(A))(c^{*}_{1}(A),\ldots,c^{*}_{M}(A)) is given by Definition V.1 (resp. Definition V.2), then condition (67) (resp. (84)) is satisfied for every i[M]i\in[M].

  • (ii)

    If, also, the sampling constraint (4) is satisfied, then the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds.

Proof:

Part (i) is proven in Appendix C, and part (ii) follows by Theorem V.1 (resp. Theorem V.2). ∎

Condition (95) is clearly satisfied if source ii is sampled at each instant with probability ci(D)c^{*}_{i}(D) when the estimated anomalous subset at the previous time instant is DD, i.e.,

ciR(n,D)=ci(D),n,D[M],i[M].c_{i}^{R}(n,D)=c^{*}_{i}(D),\quad\forall\;n\in\mathbb{N},\;D\subseteq[M],\;i\in[M]. (96)

From [9, Theorem 4.1] it follows that this choice implies that the sampling rule is consistent when

ci(D)>0,D[M],i[M].c^{*}_{i}(D)>0,\quad\forall\;D\subseteq[M],\;i\in[M]. (97)

In case condition (97) does not hold, the selection (96) no longer guarantees the consistency of the sampling rule and needs to be modified. Indeed, for a source i[M]i\in[M] for which ci(D)=0c^{*}_{i}(D)=0, the sampling frequency of source ii should converge to 0 slowly enough when the true subset of anomalous sources is DD, so that Theorem VI.2 remains applicable. To be more specific, let {bn:n}\{b_{n}:n\in\mathbb{N}\} be a sequence of positive reals that converges to 0, which we will specify later, and for each nn\in\mathbb{N}, i[M]i\in[M], and D[M]D\subseteq[M] set

ciR(n,D)={bn,ifci(D)=0,ci(D)(l~D/(Ml~D))bn,ifci(D)>0,c_{i}^{R}(n,D)=\begin{cases}b_{n},\quad&\text{if}\quad c^{*}_{i}(D)=0,\\ c^{*}_{i}(D)-(\tilde{l}_{D}/(M-\tilde{l}_{D}))\,b_{n},\quad&\text{if}\quad c^{*}_{i}(D)>0,\end{cases} (98)

where l~D\tilde{l}_{D} is the number of zero entries in the vector 𝐜(D)\mathbf{c}^{*}(D).
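
The following sketch makes the modification (98) concrete and also shows one way to realize, at each time instant, a random subset whose per-source inclusion probabilities match a prescribed vector, via systematic sampling in the sense of Madow. The paper does not prescribe a particular joint distribution qRq^{R}, so this scheme, as well as the constants CpC_{p} and δ\delta used below, are merely illustrative.

```python
import numpy as np

def sampling_probs(c_star, n, Cp=0.05, delta=0.2):
    """Per-source sampling probabilities at time n, as in (98): sources with
    c*_i(D) = 0 receive the slowly vanishing floor b_n = Cp * n**(-delta), and
    the remaining entries are reduced so that the total sum(c_star) is preserved.
    Assumes c_star has at least one positive entry."""
    c = np.asarray(c_star, dtype=float)
    b_n = Cp * n ** (-delta)
    l = int(np.sum(c == 0.0))            # number of zero entries of c*(D)
    M = len(c)
    return np.where(c == 0.0, b_n, c - (l / (M - l)) * b_n)

def madow_sample(c, rng):
    """Systematic (Madow) sampling: returns a random subset in which source i
    is included with probability exactly c[i] (each c[i] <= 1); when sum(c) is
    an integer K, exactly K sources are selected."""
    cum = np.concatenate(([0.0], np.cumsum(c)))
    u = rng.random()
    grid = np.arange(u, cum[-1], 1.0)    # points u, u+1, u+2, ... below sum(c)
    return np.searchsorted(cum, grid, side="right") - 1

rng = np.random.default_rng(1)
c_star = np.array([0.5, 0.0, 1.0, 0.5, 0.0])     # hypothetical c*(D), sum = 2
B = madow_sample(sampling_probs(c_star, n=100), rng)   # exactly 2 sources sampled
```

Since the resulting bn=Cpnδb_{n}=C_{p}\,n^{-\delta} matches (99), this choice falls under Proposition VI.1(ii) below, provided CpC_{p} is taken small enough.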

As we show in the following proposition, a suitable selection of the sequence {bn:n}\{b_{n}:n\in\mathbb{N}\} guarantees the consistency of the sampling rule.

Proposition VI.1
  • (i)

    If (97) holds, and the probabilistic sampling rule RR satisfies (96), then condition (67) (resp. (84)) is satisfied for every i[M]i\in[M].

  • (ii)

    Suppose condition (13) holds for some 𝔭>4\mathfrak{p}>4. If RR is a probabilistic sampling rule that satisfies (98) for every i[M],D[M]i\in[M],D\subseteq[M], nn\in\mathbb{N}, where bn=Cpnδb_{n}=C_{p}\,n^{-\delta} for some δ(0,122𝔭)\delta\in\left(0,\frac{1}{2}-\frac{2}{\mathfrak{p}}\right) and some Cp>0C_{p}>0 small enough such that

    ciR(n,D)Cpnδc_{i}^{R}(n,D)\geq C_{p}\,n^{-\delta} (99)

    holds for all nn\in\mathbb{N}, D[M]D\subseteq[M], and i[M]i\in[M], then condition (67) (resp. (84)) is satisfied for every i[M]i\in[M].

Proof:

Since (96) and (98) both imply (95) for any A[M]A\subseteq[M], by Theorem VI.2 it follows that it suffices to show that RR is consistent. As discussed earlier, for (i) this follows from [9, Theorem 4.1]. For (ii), the proof of consistency is presented in Appendix C. ∎

The previous proposition provides concrete selections of cR(n,D)c^{R}(n,D) that guarantee the consistency of the sampling rule and conditions (67) or (84). To achieve the corresponding asymptotic optimality property, the sampling rule should satisfy the sampling constraint (4). This is clearly the case if at most K\lfloor K\rfloor sources are sampled at each instant, i.e.,

qR(B;n,D)=0for alln,D[M],and B[M] such that |B|>K.\displaystyle q^{R}\left(B;n,D\right)=0\quad\text{for all}\quad n\in\mathbb{N},\;D\subseteq[M],\quad\text{and }\;B\subseteq[M]\;\text{ such that }\;|B|>\lfloor K\rfloor. (100)

Combining this observation with the previous proposition we can now state the theorem that summarizes the main result of this section.

Theorem VI.3

Consider an integer KK.

  • (i)

    Suppose (97) holds. If RR is a probabilistic sampling rule that satisfies (96) and (100), then the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds for every A[M]A\subseteq[M].

  • (ii)

    If RR is a probabilistic sampling rule that satisfies (98) and (100) for every i[M],D[M]i\in[M],D\subseteq[M], nn\in\mathbb{N}, where bn=Cpnδb_{n}=C_{p}\,n^{-\delta} for some δ(0,122𝔭)\delta\in\left(0,\frac{1}{2}-\frac{2}{\mathfrak{p}}\right) and some Cp>0C_{p}>0 small enough such that (99) holds for all nn\in\mathbb{N}, D[M]D\subseteq[M], and i[M]i\in[M], then the first-order asymptotic optimality property (68) (resp. (85)-(87)) holds for every A[M]A\subseteq[M].

Proof:

The claim follows by Proposition VI.1, and Theorems V.1, V.2, respectively. ∎

Remark VI.1

Theorem VI.3 remains valid even if KK is not an integer as long as

i=1Mci(D)K,D[M].\sum_{i=1}^{M}c^{*}_{i}(D)\leq\lfloor K\rfloor,\quad\forall\,D\subseteq[M].
Remark VI.2

Proposition VI.1 and Theorem VI.3 remain valid even if the ciR(n,D)c_{i}^{R}(n,D) are chosen greater than or equal to the values in (96) (resp. (98)), as long as the sampling constraint (100) is satisfied.

VII Simulation study

In this section, we present the results of various simulations by which we illustrate the asymptotic theory that was developed in the previous sections. In all simulations, there are M=10M=10 sources and, for each source i[M]i\in[M], the observations are normally distributed with variance 11 and mean 0 if source ii is not anomalous, whereas the mean is μi\mu_{i} if it is anomalous, i.e., f0i=𝒩(0,1)f_{0i}=\mathcal{N}(0,1) and f1i=𝒩(μi,1)f_{1i}=\mathcal{N}(\mu_{i},1), and thus Ii=Ji=(μi)2/2I_{i}=J_{i}=(\mu_{i})^{2}/2. For our simulations, we consider a non-homogeneous setup, where

μi={0.5,1i3,0.7,4i7,1,8i10.\mu_{i}=\begin{cases}0.5,\quad&1\leq i\leq 3,\\ 0.7,\quad&4\leq i\leq 7,\\ 1,\quad&8\leq i\leq 10.\end{cases}

We apply the probabilistic sampling rule (see Section VI), which observes the values of exactly K=5K=5 sources per sampling instant. When controlling the generalized misclassification error, we apply the sum-intersection rule (63)-(64), whereas when controlling the generalized familywise errors, we apply the leap rule (73)-(74). For the computation of each expected time for stopping, we perform 10410^{4} simulation runs; in all cases, the standard error of each expected time for stopping is 11, and the standard error of each ratio of expected times for stopping is 10210^{-2}.

In all simulations, we fix the true, but unknown, anomalous set to be A={1,,5}A=\{1,\ldots,5\}. The numbers {Fi(A):i[M]}\{F_{i}(A)\,:\,i\in[M]\} defined in Subsection IV-A for Ii=Ji=(μi)2/2I_{i}=J_{i}=(\mu_{i})^{2}/2, are equal to

Fi(A)\displaystyle F_{i}(A) ={0.125,1i3,0.245,4i7,0.5,8i10,\displaystyle=\begin{cases}0.125,\quad&1\leq i\leq 3,\\ 0.245,\quad&4\leq i\leq 7,\\ 0.5,\quad&8\leq i\leq 10,\end{cases}

and for {Ii(A):i[|A|]}\{I_{i}(A)\,:\,i\in[|A|]\}, {Ji(A):i[|Ac|]}\{J_{i}(A)\,:\,i\in[|A^{c}|]\} defined in Subsection IV-B, we have

Ii(A)\displaystyle I_{i}(A) =Fi(A),1i5,\displaystyle=F_{i}(A),\quad 1\leq i\leq 5,
Ji5(A)\displaystyle J_{i-5}(A) =Fi(A),6i10.\displaystyle=F_{i}(A),\quad 6\leq i\leq 10.
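
These numbers follow directly from the Gaussian model; a short computation confirming them:

```python
import numpy as np

mu = np.array([0.5] * 3 + [0.7] * 4 + [1.0] * 3)   # means under the anomalous hypothesis
# For f_1i = N(mu_i, 1) versus f_0i = N(0, 1), the KL numbers are I_i = J_i = mu_i**2 / 2,
# so F_i(A) = mu_i**2 / 2 for every i, regardless of whether i is in A = {1,...,5}.
F = mu ** 2 / 2
print(F)                  # [0.125 0.125 0.125 0.245 0.245 0.245 0.245 0.5 0.5 0.5]
I_A, J_A = F[:5], F[5:]   # I_i(A) for the sources in A, J_{i-5}(A) for those in A^c
```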

VII-A Controlling the generalized misclassification error

In the case of generalized misclassification error rate, by Theorem V.1 as α0\alpha\to 0

𝒥A(α;k,K)|logα|V(k,K,𝑭(A)),\mathcal{J}_{A}(\alpha;k,K)\sim\frac{|\log\alpha|}{V(k,K,\boldsymbol{F}(A))}, (101)

where for K=5K=5 we have

V(k,K,𝑭(A))={kKMF~1(A),if k{1,,5},(k3)KM3F~4(A),if k{6,,8},i=69Fi(A),if k=9,i=610Fi(A),if k=10.V(k,K,\boldsymbol{F}(A))=\begin{cases}k\,\frac{K}{M}\,\widetilde{F}_{1}(A),\quad&\mbox{if }\;k\in\{1,\ldots,5\},\\ (k-3)\,\frac{K}{M-3}\,\widetilde{F}_{4}(A),\quad&\mbox{if }\;k\in\{6,\ldots,8\},\\ \sum_{i=6}^{9}F_{i}(A),\quad&\mbox{if }\;k=9,\\ \sum_{i=6}^{10}F_{i}(A),\quad&\mbox{if }\;k=10.\end{cases} (102)

For the computation of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) we used Algorithm 1. We can easily verify that k(K/M)F~1(A)k(K/M)\widetilde{F}_{1}(A) is less than or equal to the respective value of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) for all k[M]k\in[M], and as a result

𝒥A(α;k,K)𝒥A(α;1,K)1k,k[M].\frac{\mathcal{J}_{A}(\alpha;k,K)}{\mathcal{J}_{A}(\alpha;1,K)}\lesssim\frac{1}{k},\quad\forall\,k\in[M]. (103)

In Figure 1, we plot the ratio of the expected time for stopping for k=5k=5 over that for k=1k=1, against |log10(α)||\log_{10}(\alpha)| for α{101,,1010}\alpha\in\{10^{-1},\ldots,10^{-10}\}. For each value of α\alpha, the thresholds are selected according to (65). As expected from the form of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) for k=5k=5 in (102), the ratio converges to 1/51/5, and this is also depicted in Figure 1, where by 𝖤[T;k=1]\mathsf{E}[T;k=1], 𝖤[T;k=5]\mathsf{E}[T;k=5] we denote the expected stopping time for k=1k=1, k=5k=5, respectively.


Figure 1: Ratio of expected stopping times 𝖤[T;k=5]/𝖤[T;k=1]\mathsf{E}[T;k=5]/\mathsf{E}[T;k=1] versus |log10(α)||\log_{10}(\alpha)|.

In order to verify the limit (103) as an approximation in the finite regime, we fix α=103\alpha=10^{-3} and select the thresholds of the sum-intersection rule in (63), via Monte Carlo simulation, so that the probability of at least kk errors, of any kind, is equal to 10310^{-3}. In Figure 2, we plot the ratio of the expected time for stopping for kk over that for k=1k=1, against k[M]k\in[M], where by 𝖤[T;k]\mathsf{E}[T;k] we denote the expected stopping time for the respective value of kk. We can see that the expected time for stopping for k=1k=1 is reduced by a factor approximately equal to 1/k1/k for 1k51\leq k\leq 5, and by a factor clearly smaller than 1/k1/k for 5k105\leq k\leq 10.

Figure 2: Ratio of expected stopping times 𝖤[T;k]/𝖤[T;k=1]\mathsf{E}[T;k]/\mathsf{E}[T;k=1] versus the tolerance level k.

VII-B Controlling the generalized familywise errors

In the case of generalized familywise error rates, for r=1r=1 and since 0<|A|<M0<|A|<M, by Theorem V.2 as α,β0\alpha,\,\beta\to 0

𝒥A(α,β;k1,k2,K)|log(α)|vA(k1,k2,K,r),\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)\sim\frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}, (104)

where for K=5K=5, vA(k1,k2,K,r)v_{A}(k_{1},k_{2},K,r) has the following form

vA(k1,k2,K,r)={k1y1I~1(A)=k2y2J~1(A),k1=k2{1,2},k1y1I~2(A)=(k21)y2J~1(A),k1=k2=3,x1I3(A)+i=45Ii(A)=(k21)y2J~1(A),k1=k2=4,x1I2(A)+i=35Ii(A)=x2J4(A)+J5(A),k1=k2=5.v_{A}(k_{1},k_{2},K,r)=\begin{cases}k_{1}\,y_{1}\,\widetilde{I}_{1}(A)=k_{2}\,y_{2}\,\widetilde{J}_{1}(A),\quad&k_{1}=k_{2}\in\{1,2\},\\ k_{1}\,y_{1}\,\widetilde{I}_{2}(A)=(k_{2}-1)\,y_{2}\,\widetilde{J}_{1}(A),\quad&k_{1}=k_{2}=3,\\ x_{1}\,I_{3}(A)+\sum_{i=4}^{5}I_{i}(A)=(k_{2}-1)\,y_{2}\,\widetilde{J}_{1}(A),\quad&k_{1}=k_{2}=4,\\ x_{1}\,I_{2}(A)+\sum_{i=3}^{5}I_{i}(A)=x_{2}\,J_{4}(A)+J_{5}(A),\quad&k_{1}=k_{2}=5.\end{cases} (105)

where x1x_{1}, x2x_{2}, y1y_{1}, y2y_{2}, and lAl_{A} are given in Table I.

k1=k2k_{1}=k_{2} 1 2 3 4 5
lAl_{A} 0 0 1 1 0
x1x_{1} - - - 0.61 0.61
x2x_{2} - - - - 0.39
y1y_{1} 0.69 0.69 0.66 - -
y2y_{2} 0.30 0.30 0.46 0.53 -
TABLE I: The lAl_{A}, x1x_{1}, x2x_{2}, y1y_{1}, y2y_{2} for each k1=k2[M/2]k_{1}=k_{2}\in[M/2].

For the computation of vA(k1,k2,K,r)v_{A}(k_{1},k_{2},K,r) we first use Algorithm 5, and then for the computation of x1x_{1}, x2x_{2}, y1y_{1}, y2y_{2} we use Algorithm 1. We can easily verify that k1y1I~1(A)k_{1}\,y_{1}\,\widetilde{I}_{1}(A) is less than or equal to vA(k1,k2,K,r)v_{A}(k_{1},k_{2},K,r) for all k1=k2[M/2]k_{1}=k_{2}\in[M/2], and as a result

𝒥A(α,β;k1,k1,K)𝒥A(α,β;1,1,K)1k1.\frac{\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{1},K)}{\mathcal{J}_{A}(\alpha,\beta;1,1,K)}\lesssim\frac{1}{k_{1}}. (106)

In Figure 3, we plot the ratio of the expected stopping time for k1=k2=3k_{1}=k_{2}=3 over that for k1=k2=1k_{1}=k_{2}=1, against |log10(α)||\log_{10}(\alpha)| for α{101,,1010}\alpha\in\{10^{-1},\ldots,10^{-10}\}. For each value of α\alpha, and β=α\beta=\alpha, the thresholds are selected according to (75). As expected from the form of vAv_{A} in (105), this ratio converges to 0.3280.328, which is also depicted in Figure 3, where by 𝖤[T;k1=k2=3]\mathsf{E}[T;k_{1}=k_{2}=3], 𝖤[T;k1=k2=1]\mathsf{E}[T;k_{1}=k_{2}=1] we denote the expected stopping time for k1=k2=3k_{1}=k_{2}=3, k1=k2=1k_{1}=k_{2}=1, respectively.


Figure 3: Ratio of expected stopping times 𝖤[T;k1=k2=3]/𝖤[T;k1=k2=1]\mathsf{E}[T;k_{1}=k_{2}=3]/\mathsf{E}[T;k_{1}=k_{2}=1] versus |log10(α)||\log_{10}(\alpha)|.

In order to verify the limit (106) as an approximation in the finite regime, we fix α=β=103\alpha=\beta=10^{-3} and we select the thresholds of the leap rule (73), via Monte Carlo simulation, so that the probabilities of at least k1k_{1} false positives and at least k2k_{2} false negatives are both equal to 10310^{-3}. In Figure 4, we depict the ratio of the expected time for stopping for k1=k2k_{1}=k_{2} over that for k1=k2=1k_{1}=k_{2}=1, against k1=k2[M/2]k_{1}=k_{2}\in[M/2]. We observe that the expected stopping time for k1=k2=1k_{1}=k_{2}=1 is reduced by a factor of approximately 1/k11/k_{1} as k1k_{1} increases.


Figure 4: Ratio of expected stopping times 𝖤[T;k1]/𝖤[T;k1=1]\mathsf{E}[T;k_{1}]/\mathsf{E}[T;k_{1}=1] versus the tolerance level k1=k2k_{1}=k_{2}.

VIII Conclusion

In this work, we study the sequential anomaly identification problem in the presence of a sampling constraint under two different formulations that involve generalized error metrics. For each of them we establish a universal asymptotic lower bound, and we show that it is attained by a policy that combines the stopping and decision rules proposed in the full-sampling case in [27] with a probabilistic sampling rule that is designed to achieve specific long-run sampling frequencies. The optimal performance is characterized, and the impact of the sampling constraint and of the tolerance to errors is assessed, both to a first-order asymptotic approximation as the error probabilities go to 0. These theoretical asymptotic results are also illustrated via simulation studies.

Directions for further research involve (i) the incorporation of prior information on the number of anomalous sources, as in [9] in the case of classical familywise error control (k1=k2=1)(k_{1}=k_{2}=1), (ii) consideration of composite hypotheses for the testing problem in each source, as in [27] in the full-sampling case, and in [12] for the case where it is known a priori that there is only one anomalous source and only one source can be observed at a time, (iii) varying sampling or switching cost per source, as in [10] and [11], respectively. Other directions include the framework where the acquired observations are not conditionally independent of the past [19], as well as the consideration of a dependence structure within the observations from different sources [28].

Acknowledgments

This work was supported by the University of Illinois at Urbana–Champaign research support award RB21036.

References

  • [1] A. Tsopelakos and G. Fellouris, “Sequential anomaly detection with observation control under a generalized error metric,” in 2020 IEEE International Symposium on Information Theory (ISIT), 2020, pp. 1165–1170.
  • [2] R. J. Bolton and D. J. Hand, “Statistical Fraud Detection: A Review,” Statistical Science, vol. 17, no. 3, pp. 235 – 255, 2002. [Online]. Available: https://doi.org/10.1214/ss/1042727940
  • [3] E. Dimson, Stock market anomalies.   CUP Archive, 1988.
  • [4] S. K. De and M. Baron, “Sequential bonferroni methods for multiple hypothesis testing with strong control of family-wise error rates i and ii,” Sequential Analysis, vol. 31, no. 2, pp. 238–262, 2012.
  • [5] J. Bartroff and J. Song, “Sequential tests of multiple hypotheses controlling type i and ii familywise error rates,” Journal of statistical planning and inference, vol. 153, pp. 100–114, 2014.
  • [6] Y. Song and G. Fellouris, “Asymptotically optimal, sequential, multiple testing procedures with prior information on the number of signals,” Electronic Journal of Statistics, vol. 11, 03 2016.
  • [7] K. Cohen and Q. Zhao, “Active hypothesis testing for anomaly detection,” IEEE Transactions on Information Theory, vol. 61, no. 3, pp. 1432–1450, 2015.
  • [8] B. Huang, K. Cohen, and Q. Zhao, “Active anomaly detection in heterogeneous processes,” IEEE Transactions on information theory, vol. 65, no. 4, pp. 2284–2301, 2018.
  • [9] A. Tsopelakos and G. Fellouris, “Sequential anomaly detection under sampling constraints,” IEEE Transactions on Information Theory, pp. 1–1, 2022.
  • [10] A. Gurevich, K. Cohen, and Q. Zhao, “Sequential anomaly detection under a nonlinear system cost,” IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3689–3703, 2019.
  • [11] T. Lambez and K. Cohen, “Anomaly search with multiple plays under delay and switching costs,” IEEE Transactions on Signal Processing, vol. 70, pp. 174–189, 2021.
  • [12] B. Hemo, T. Gafni, K. Cohen, and Q. Zhao, “Searching for anomalies over composite hypotheses,” IEEE Transactions on Signal Processing, vol. 68, pp. 1181–1196, 2020.
  • [13] A. Deshmukh, V. V. Veeravalli, and S. Bhashyam, “Sequential controlled sensing for composite multihypothesis testing,” Sequential Analysis, vol. 40, no. 2, pp. 259–289, 2021.
  • [14] G. R. Prabhu, S. Bhashyam, A. Gopalan, and R. Sundaresan, “Sequential multi-hypothesis testing in multi-armed bandit problems: An approach for asymptotic optimality,” IEEE Transactions on Information Theory, 2022.
  • [15] A. Tsopelakos and G. Fellouris, “Asymptotically optimal sequential anomaly identification with ordering sampling rules,” arXiv preprint arXiv:2309.14528, 2023.
  • [16] H. Chernoff, “Sequential design of experiments,” Ann. Math. Statist., vol. 30, no. 3, pp. 755–770, 09 1959. [Online]. Available: http://dx.doi.org/10.1214/aoms/1177706205
  • [17] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
  • [18] S. A. Bessler, “Theory and applications of the sequential design of experiments, k-actions and infinitely many experiments, part i theory.” Department of Statistics, Stanford University, Technical Report 55, 1960.
  • [19] S. Nitinawarat and V. V. Veeravalli, “Controlled sensing for sequential multihypothesis testing with controlled markovian observations and non-uniform control cost,” Sequential Analysis, vol. 34, no. 1, pp. 1–24, 2015. [Online]. Available: https://doi.org/10.1080/07474946.2014.961864
  • [20] J. Bartroff, “Multiple hypothesis tests controlling generalized error rates for sequential data,” Statistica Sinica, vol. 28, no. 1, pp. 363–398, 2018. [Online]. Available: http://www.jstor.org/stable/26384246
  • [21] J. Bartroff and J. Song, “Sequential tests of multiple hypotheses controlling false discovery and nondiscovery rates,” Sequential Analysis, vol. 39, no. 1, pp. 65–91, 2020. [Online]. Available: https://doi.org/10.1080/07474946.2020.1726686
  • [22] X. He and J. Bartroff, “Asymptotically optimal sequential fdr and pfdr control with (or without) prior information on the number of signals,” Journal of Statistical Planning and Inference, vol. 210, pp. 87–99, 2021. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0378375820300604
  • [23] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal statistical society: series B (Methodological), vol. 57, no. 1, pp. 289–300, 1995.
  • [24] E. L. Lehmann and J. P. Romano, “Generalizations of the familywise error rate,” The Annals of Statistics, vol. 33, no. 3, pp. 1138–1154, 2005. [Online]. Available: http://www.jstor.org/stable/3448684
  • [25] J. P. Romano and A. M. Shaikh, “Stepup procedures for control of generalizations of the familywise error rate,” The Annals of Statistics, vol. 34, no. 4, pp. 1850–1873, 2006. [Online]. Available: http://www.jstor.org/stable/25463487
  • [26] J. P. Romano and M. Wolf, “Control of generalized error rates in multiple testing,” The Annals of Statistics, vol. 35, no. 4, pp. 1378–1408, 2007. [Online]. Available: http://www.jstor.org/stable/25464544
  • [27] Y. Song and G. Fellouris, “Sequential multiple testing with generalized error control: An asymptotic optimality theory,” The Annals of Statistics, vol. 47, no. 3, pp. 1776 – 1803, 2019. [Online]. Available: https://doi.org/10.1214/18-AOS1737
  • [28] J. Heydari, A. Tajer, and H. V. Poor, “Quickest linear search over correlated sequences,” IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5786–5808, 2016.
  • [29] P. Hall and C. C. Heyde, Martingale Limit Theory and Its Application, ser. Probability and Mathematical Statistics. New York: Academic Press, 1980.
  • [30] O. Kallenberg, Foundations of Modern Probability, ser. Probability and Its Applications.   Springer New York, 2002. [Online]. Available: https://books.google.com/books?id=TBgFslMy8V4C
  • [31] E. Cesàro, “Sur la convergence des séries,” Nouvelles annales de mathématiques: journal des candidats aux écoles polytechnique et normale, vol. 7, pp. 49–59, 1888.

Appendix A

In Appendix A, we prove Lemma III.1 and Lemma III.2, which provide the solutions to the max-min problems (18) and (30), respectively. We also provide Algorithm 1 for the computation of x,y[0,1)x,y\in[0,1) and u,v[0,κ]u,v\in[0,\kappa], and Algorithm 5 for the computation of (K1,K2)(K^{*}_{1},K^{*}_{2}).

Proof:

Since 𝒟(K)\mathcal{D}(K) is compact, and {U[|𝑳|]:|U|=κ}\{U\subseteq[|\boldsymbol{L}|]\,:\;|U|=\kappa\} is finite, the max-min problem (18) has a solution. The max-min structure of (18) implies that a maximizer (c~1,,c~|𝑳|)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|}) must satisfy the following two conditions:

  1. (i)

    for each i{1,,κ}i\in\{1,\ldots,\kappa\}, and each j{κ+1,,|𝑳|}j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}, it holds

    c~iLic~jLj,\tilde{c}_{i}L_{i}\leq\tilde{c}_{j}L_{j}, (107)
  2. (ii)

    the {c~iLi:i{1,,κ}}\{\tilde{c}_{i}L_{i}\,:\,i\in\{1,\ldots,\kappa\}\} are in ascending order, i.e.,

    c~1L1c~iLic~κLκ.\tilde{c}_{1}L_{1}\leq\ldots\leq\tilde{c}_{i}L_{i}\leq\ldots\leq\tilde{c}_{\kappa}L_{\kappa}. (108)

By definition of 𝒱(𝒄;κ,𝑳)\mathcal{V}(\boldsymbol{c};\kappa,\boldsymbol{L}) in (16), the first condition (107) implies that

V(κ,K,𝑳)=i=1κc~iLi.V(\kappa,K,\boldsymbol{L})=\sum_{i=1}^{\kappa}\tilde{c}_{i}L_{i}. (109)

To prove why condition (107) must hold, we apply an argument by contradiction. Assume that there exist i{1,,κ}i\in\{1,\ldots,\kappa\} and j{κ+1,,|𝑳|}j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\} such that c~iLi>c~jLj\tilde{c}_{i}L_{i}>\tilde{c}_{j}L_{j}, and denote by jj^{*} the index corresponding to the smallest such c~jLj\tilde{c}_{j}L_{j}, in case the assumed inequality holds for more than one j{κ+1,,|𝑳|}j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}, and by ii^{*} the index corresponding to the largest such c~iLi\tilde{c}_{i}L_{i}, in case it holds for more than one i{1,,κ}i\in\{1,\ldots,\kappa\}. Clearly, c~iLi>c~jLj\tilde{c}_{i^{*}}L_{i^{*}}>\tilde{c}_{j^{*}}L_{j^{*}}, and c~jLj\tilde{c}_{j^{*}}L_{j^{*}} is included in the sum of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}), as it is one of the κ\kappa smallest {c~uLu:u[|𝑳|]}\{\tilde{c}_{u}L_{u}:u\in[|\boldsymbol{L}|]\}, whereas c~iLi\tilde{c}_{i^{*}}L_{i^{*}} is not. Since c~iLi>c~jLj\tilde{c}_{i^{*}}L_{i^{*}}>\tilde{c}_{j^{*}}L_{j^{*}} and, by definition, LiLjL_{i^{*}}\leq L_{j^{*}}, we have c~j<c~i\tilde{c}_{j^{*}}<\tilde{c}_{i^{*}}. Thus, simply by swapping the values of c~j\tilde{c}_{j^{*}} and c~i\tilde{c}_{i^{*}} we could increase the value of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) by c~iLjc~jLj\tilde{c}_{i^{*}}L_{j^{*}}-\tilde{c}_{j^{*}}L_{j^{*}}, which is a contradiction because (c~1,,c~|𝑳|)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|}) is a maximizer.

Assuming that (107) holds, which implies (109), in order to prove why condition (108) must hold we again apply an argument by contradiction. Assume that there are i<ji<j, both in {1,,κ}\{1,\ldots,\kappa\}, such that c~iLi>c~jLj\tilde{c}_{i}L_{i}>\tilde{c}_{j}L_{j}. In this case, it is clear that c~i>c~j\tilde{c}_{i}>\tilde{c}_{j}, because by definition LiLjL_{i}\leq L_{j}. If we swap the values of c~i\tilde{c}_{i} and c~j\tilde{c}_{j}, then we increase the value of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}), since

c~iLi+c~jLj<c~jLi+c~iLj(c~ic~j)Li<(c~ic~j)Lj,\tilde{c}_{i}L_{i}+\tilde{c}_{j}L_{j}<\tilde{c}_{j}L_{i}+\tilde{c}_{i}L_{j}\Leftrightarrow(\tilde{c}_{i}-\tilde{c}_{j})L_{i}<(\tilde{c}_{i}-\tilde{c}_{j})L_{j}, (110)

which is a contradiction because (c~1,,c~|𝑳|)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|}) is a maximizer.

The maximizer with the minimum 1\mathcal{L}^{1} norm is the one that satisfies (108), and satisfies (107) through the following equality,

c~κLκ=c~jLj,j{κ+1,,|𝑳|}.\tilde{c}_{\kappa}L_{\kappa}=\tilde{c}_{j}L_{j},\quad\forall\,j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}. (111)

If KK is large enough so that

Kκ+Lκj=κ+1|𝑳|1/Lj,K\geq\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j}, (112)

then the maximizer with the minimum 1\mathcal{L}^{1} norm is

c~1==c~κ=1,\displaystyle\tilde{c}_{1}=\ldots=\tilde{c}_{\kappa}=1, (113)
c~j=\displaystyle\tilde{c}_{j}= Lκ/Lj,j{κ+1,,|𝑳|},\displaystyle L_{\kappa}/L_{j},\quad\forall\;j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\},

which implies that

V(κ,K,𝑳)=i=1κLi,V(\kappa,K,\boldsymbol{L})=\sum_{i=1}^{\kappa}L_{i},

and thus V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) with x=y=0x=y=0, v=1v=1, u=κu=\kappa. Hence, it suffices to prove that V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) when

K<κ+Lκj=κ+1|𝑳|1/Lj.K<\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j}. (114)

When (114) holds, the maximizer (c~1,,c~|𝑳|)𝒟(K)(\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|})\in\mathcal{D}(K) satisfies

i=1|𝑳|c~i=K,\sum_{i=1}^{|\boldsymbol{L}|}\tilde{c}_{i}=K, (115)

by which we get that

Ki=1κ1c~i=i=κ|𝑳|c~i=c~κLκi=κ|𝑳|1/LiK-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}=\sum_{i=\kappa}^{|\boldsymbol{L}|}\tilde{c}_{i}=\tilde{c}_{\kappa}L_{\kappa}\sum_{i=\kappa}^{|\boldsymbol{L}|}1/L_{i}

where the second equality follows by (111). Therefore,

c~jLj=c~κLκ=Ki=1κ1c~i|𝑳|(κ1)L~κ,j{κ+1,,|𝑳|}.\tilde{c}_{j}L_{j}=\tilde{c}_{\kappa}L_{\kappa}=\frac{K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}}{|\boldsymbol{L}|-(\kappa-1)}\widetilde{L}_{\kappa},\quad\forall\,j\in\{\kappa+1,\ldots,|\boldsymbol{L}|\}. (116)

where L~κ\widetilde{L}_{\kappa}, defined in (15), is the harmonic mean of the |𝑳|(κ1)|\boldsymbol{L}|-(\kappa-1) largest elements in 𝑳\boldsymbol{L}, and serves as an average of them. It holds that

0Ki=1κ1c~i|𝑳|(κ1)10\leq\frac{K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}}{|\boldsymbol{L}|-(\kappa-1)}\leq 1 (117)

because

Ki=1κ1c~i=i=κ|𝑳|c~i|𝑳|(κ1),K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}=\sum_{i=\kappa}^{|\boldsymbol{L}|}\tilde{c}_{i}\leq|\boldsymbol{L}|-(\kappa-1),

since c~i[0,1]\tilde{c}_{i}\in[0,1] for all i[|𝑳|]i\in[|\boldsymbol{L}|]. By replacing c~κLκ\tilde{c}_{\kappa}L_{\kappa} in (109) we get

V(κ,K,𝑳)=c~1L1++c~κ1Lκ1+Ki=1κ1c~i|𝑳|(κ1)L~κ.V(\kappa,K,\boldsymbol{L})=\tilde{c}_{1}L_{1}+\ldots+\tilde{c}_{\kappa-1}L_{\kappa-1}+\frac{K-\sum_{i=1}^{\kappa-1}\tilde{c}_{i}}{|\boldsymbol{L}|-(\kappa-1)}\;\widetilde{L}_{\kappa}. (118)

Therefore, the values of c~1,,c~κ1\tilde{c}_{1},\ldots,\tilde{c}_{\kappa-1} are deduced by the solution of the following maximization problem

maximize{c1L1++cκ1Lκ1+z|𝑳|(κ1)L~κ}\mbox{maximize}\left\{c_{1}L_{1}+\ldots+c_{\kappa-1}L_{\kappa-1}+\frac{z}{|\boldsymbol{L}|-(\kappa-1)}\widetilde{L}_{\kappa}\right\} (119)

with respect to c1,,cκ1[0,1]c_{1},\ldots,c_{\kappa-1}\in[0,1] and zz subject to

  1. (i)
    0zLκi=κ|𝑳|1/Li,0\leq z\leq L_{\kappa}\sum_{i=\kappa}^{|\boldsymbol{L}|}1/L_{i},
  2. (ii)
    i=1κ1ci+z=K,\sum_{i=1}^{\kappa-1}c_{i}+z=K,
  3. (iii)
    c1L1cκ1Lκ1z|𝑳|(κ1)L~κ.c_{1}L_{1}\leq\ldots\leq c_{\kappa-1}L_{\kappa-1}\leq\frac{z}{|\boldsymbol{L}|-(\kappa-1)}\widetilde{L}_{\kappa}.

In the special case L~κ/(|𝑳|(κ1))Lκ1\widetilde{L}_{\kappa}/(|\boldsymbol{L}|-(\kappa-1))\geq L_{\kappa-1}, and since by definition Lκ1L1L_{\kappa-1}\geq\ldots\geq L_{1}, the maximization method is to distribute KK so that each of z,cκ1,,c1z,c_{\kappa-1},\ldots,c_{1} takes its largest possible value, one at a time, in the order in which they are listed. On the other hand, if L~κ/(|𝑳|(κ1))<Lκ1\widetilde{L}_{\kappa}/(|\boldsymbol{L}|-(\kappa-1))<L_{\kappa-1}, then cκ1c_{\kappa-1} would come first in the priority list. Due to constraint (iii), for any value of cκ1c_{\kappa-1} the value of zz must be large enough so that the inequality is satisfied. Thus, the maximization method of the former case does not work, as the value of zz depends on the value of cκ1c_{\kappa-1}. In order to resolve this issue, we consider

u:=max{u{0,,κ1}:κu|𝑳|uL~u+1Lu},u^{*}:=\max\left\{u\in\{0,\ldots,\kappa-1\}\,:\,\frac{\kappa-u}{|\boldsymbol{L}|-u}\;\widetilde{L}_{u+1}\geq L_{u}\right\},

thus V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) reduces to

V(κ,K,𝑳)=c~1L1++c~uLu+(κu)z|𝑳|uL~u+1,V(\kappa,K,\boldsymbol{L})=\tilde{c}_{1}L_{1}+\ldots+\tilde{c}_{u^{*}}L_{u^{*}}+(\kappa-u^{*})\frac{z}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}, (120)

and it holds

κu|𝑳|uL~u+1LuL1.\frac{\kappa-u^{*}}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}\geq L_{u^{*}}\geq\ldots\geq L_{1}.

The values of c~1,,c~|𝑳|\tilde{c}_{1},\ldots,\tilde{c}_{|\boldsymbol{L}|} are deduced by the solution of the following maximization problem,

maximize{c1L1++cuLu+(κu)z|𝑳|uL~u+1}\mbox{maximize}\left\{c_{1}L_{1}+\ldots+c_{u^{*}}L_{u^{*}}+(\kappa-u^{*})\frac{z}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}\right\} (121)

with respect to c1,,cu[0,1]c_{1},\ldots,c_{u^{*}}\in[0,1] and zz subject to

  1. (i)
    0zLu+1i=u+1|𝑳|1/Li,0\leq z\leq L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i},
  2. (ii)
    i=1uci+z=K,\sum_{i=1}^{u^{*}}c_{i}+z=K,
  3. (iii)
    c1L1cuLuz|𝑳|uL~u+1.c_{1}L_{1}\leq\ldots\leq c_{u^{*}}L_{u^{*}}\leq\frac{z}{|\boldsymbol{L}|-u^{*}}\widetilde{L}_{u^{*}+1}.

If

K<Lu+1i=u+1|𝑳|1/Li,K<L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}, (122)

then c1==cu=0c_{1}=\ldots=c_{u^{*}}=0,

V(κ,K,𝑳)=(κu)K|𝑳|uL~u+1,V(\kappa,K,\boldsymbol{L})=(\kappa-u^{*})\frac{K}{|\boldsymbol{L}|-u^{*}}\;\widetilde{L}_{u^{*}+1}, (123)

and thus V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) with x=0x=0, v=0v=0, u=uu=u^{*}, y=K/(|𝑳|u)y=K/(|\boldsymbol{L}|-u^{*}). Therefore, in view of (114), it suffices to prove that V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) is equal to the form (19) when

Lu+1i=u+1|𝑳|1/LiK<κ+Lκj=κ+1|𝑳|1/Lj.L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}\leq K<\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j}. (124)

If KK satisfies (124), then cu+1=1c_{u^{*}+1}=1 and the objective of (121) becomes

c1L1++cuLu+Lu+1+(κ(u+1))z|𝑳|(u+1)L~u+2.c_{1}L_{1}+\ldots+c_{u^{*}}L_{u^{*}}+L_{u^{*}+1}+(\kappa-(u^{*}+1))\frac{z}{|\boldsymbol{L}|-(u^{*}+1)}\widetilde{L}_{u^{*}+2}.

If we set u:=u+1u:=u^{*}+1 and v:=u+1v:=u^{*}+1, the maximization problem (121) takes the more general form

maximize{c1L1++cv1Lv1+i=vuLi+(κu)z|𝑳|uL~u+1}\mbox{maximize}\left\{c_{1}L_{1}+\ldots+c_{v-1}L_{v-1}+\sum_{i=v}^{u}L_{i}+(\kappa-u)\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\right\} (125)

with respect to c1,,cv1[0,1]c_{1},\ldots,c_{v-1}\in[0,1] and zz subject to

  1. (i)
    Lui=u+1|𝑳|1/LizLu+1i=u+1|𝑳|1/Li,L_{u}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}\leq z\leq L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i},
  2. (ii)
    i=1v1ci+z=K(uv+1),\sum_{i=1}^{v-1}c_{i}+z=K-(u-v+1),
  3. (iii)
    c1L1cv1Lv1LvLuz|𝑳|uL~u+1.c_{1}L_{1}\leq\ldots\leq c_{v-1}L_{v-1}\leq L_{v}\leq\ldots\leq L_{u}\leq\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}.

For any u,v[1,κ]u,v\in[1,\kappa] with vuv\leq u, if

Lv1κu|𝑳|uL~u+1,L_{v-1}\geq\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}, (126)

then our priority is to make cv1c_{v-1} as large as possible. If cv1=1c_{v-1}=1, then we decrease the variable vv by 11. We observe that the value of cv1c_{v-1} does not affect zz, as the values Lv,,LuL_{v},\ldots,L_{u} lie in between. If (126) does not hold, our priority is to make zz as large as possible. If z=Lu+1i=u+1|𝑳|1/Liz=L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}, then we set zz equal to Lu+1i=u+2|𝑳|1/LiL_{u+1}\sum_{i=u+2}^{|\boldsymbol{L}|}1/L_{i}, and we increase uu by 11. Depending on the size of KK, this procedure is repeated a number of times, and it terminates with a value of the form

V(κ,K,𝑳)=xLv1+i=vuLi+(κu)z|𝑳|uL~u+1,V(\kappa,K,\boldsymbol{L})=x\,L_{v-1}+\sum_{i=v}^{u}L_{i}+(\kappa-u)\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}, (127)

which, for y:=z/(|𝑳|u)y:=z/(|\boldsymbol{L}|-u), is equal to the form (19) in Lemma III.1.

In order to prove that the maximizer with the minimum 1\mathcal{L}^{1} norm satisfies (21)-(25), we observe that (23)-(25) follow directly from the form (127) of V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}), by matching each cic^{\prime}_{i} with the coefficient of the respective LiL_{i}. In order to prove (21), we distinguish the following two cases.

  1. (i)

    If u=κu=\kappa then c~κ=1\tilde{c}_{\kappa}=1 and the result follows by (111).

  2. (ii)

    If u<κu<\kappa then

    c~jLj=c~κLκ=z|𝑳|uL~u+1,j{κ+1,,|𝑳|},\tilde{c}_{j}L_{j}=\tilde{c}_{\kappa}L_{\kappa}=\frac{z}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1},\quad\forall\,j\,\in\,\{\kappa+1,\ldots,|\boldsymbol{L}|\},

    which proves the claim since y:=z/(|𝑳|u)y:=z/(|\boldsymbol{L}|-u).

In Algorithm 1, we describe explicitly how x,y,u,vx,y,u,v are computed. The algorithm primarily focuses on the case that KK satisfies (124), as the other two cases are covered in (112) and (122). Algorithm 1 solves the constrained optimization problem (125) by implementing the steps described in the paragraph that follows (126).

  1. (1)

    Input: κ\kappa, KK, 𝑳\boldsymbol{L}.

  2. (2)

    Compute

    umax{u{0,..,κ1}:κu|𝑳|uL~u+1Lu}.u^{*}\leftarrow\max\left\{u\in\{0,..,\kappa-1\}\,:\,\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\geq L_{u}\right\}.
  3. (3)

    If Kκ+Lκj=κ+1|𝑳|1/LjK\geq\kappa+L_{\kappa}\sum_{j=\kappa+1}^{|\boldsymbol{L}|}1/L_{j} then

    x0x\leftarrow 0, y0y\leftarrow 0, v1v\leftarrow 1, uκu\leftarrow\kappa,

    go to output

    end-if.

  4. (4)

    If K<Lu+1i=u+1|𝑳|1/LiK<L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i} then

    x0x\leftarrow 0, v0v\leftarrow 0, uuu\leftarrow u^{*}, yK/(|𝑳|u)y\leftarrow K/(|\boldsymbol{L}|-u^{*}),

    go to output

    end-if.

  5. (5)

    Initialize: x0x\leftarrow 0, uu+1u\leftarrow u^{*}+1, vu+1v\leftarrow u^{*}+1,

    zLui=u+1|𝑳|1/Liz\leftarrow L_{u}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i},

    KKLu+1i=u+1|𝑳|1/LiK\leftarrow K-L_{u^{*}+1}\sum_{i=u^{*}+1}^{|\boldsymbol{L}|}1/L_{i}.

  6. (6)

    While (K>0)(K>0)

    1. (i)

      If (Lv1κu|𝑳|uL~u+1)\left(L_{v-1}\geq\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\right) or (u=κ)(u=\kappa) then

      1. If (K1)(K\geq 1) then

        vv1v\leftarrow v-1,

        KK1K\leftarrow K-1,

      2. else

        xKx\leftarrow K,

        K0K\leftarrow 0,

        end-if

      end-if

    2. (ii)

      If (Lv1<κu|𝑳|uL~u+1)\left(L_{v-1}<\frac{\kappa-u}{|\boldsymbol{L}|-u}\widetilde{L}_{u+1}\right) or (v=1)(v=1) then

      wLu+1i=u+1|𝑳|1/Lizw\leftarrow L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}-z,

      zz+min{K,w}z\leftarrow z+\min\{K,w\},

      KKmin{K,w}K\leftarrow K-\min\{K,w\}.

      1. If (zLu+1i=u+1|𝑳|1/Li)\left(z\geq L_{u+1}\sum_{i=u+1}^{|\boldsymbol{L}|}1/L_{i}\right) then

        zLu+1i=u+2|𝑳|1/Liz\leftarrow L_{u+1}\sum_{i=u+2}^{|\boldsymbol{L}|}1/L_{i},

        uu+1u\leftarrow u+1,

        end-if

      end-if

    end-while.

    yz/(|𝑳|u)y\leftarrow z/(|\boldsymbol{L}|-u).

  7. (7)

    Output: xx, uu, vv, yy.

Algorithm 1 Computation of x,y[0,1)x,y\in[0,1) and u,v[0,κ]u,v\in[0,\kappa].
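
As an independent numerical check of Algorithm 1 (not the algorithm itself), V(κ,K,𝑳)V(\kappa,K,\boldsymbol{L}) can also be computed as a linear program, using the identity that the sum of the κ\kappa smallest entries of a nonnegative vector equals the maximum over t0t\geq 0 of κt\kappa t minus the sum of the positive parts of tt minus the entries. A sketch using scipy:

```python
import numpy as np
from scipy.optimize import linprog

def V(kappa, K, L):
    """max over c in D(K) of the sum of the kappa smallest c_i * L_i, as an LP.
    Variables: (c_1..c_n, t, s_1..s_n) with s_i >= t - c_i L_i and s_i >= 0, so
    that kappa*t - sum(s) equals, at the optimum, the sum of the kappa smallest
    c_i L_i."""
    L = np.asarray(L, dtype=float)
    n = len(L)
    obj = np.concatenate([np.zeros(n), [-float(kappa)], np.ones(n)])  # minimize -kappa*t + sum(s)
    rows = np.hstack([-np.diag(L), np.ones((n, 1)), -np.eye(n)])      # t - L_i c_i - s_i <= 0
    budget = np.concatenate([np.ones(n), [0.0], np.zeros(n)])         # sum(c) <= K
    A_ub = np.vstack([rows, budget])
    b_ub = np.append(np.zeros(n), K)
    bounds = [(0, 1)] * n + [(0, None)] * (n + 1)
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return -res.fun

# The vector F(A) from the simulation study, sorted in ascending order:
F = [0.125, 0.125, 0.125, 0.245, 0.245, 0.245, 0.245, 0.5, 0.5, 0.5]
print([round(V(k, 5, F), 4) for k in range(1, 11)])   # compare with (102); V(1,5,F) ~ 0.1079
```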
Proof:

For the proof of Lemma III.2, it suffices to show that

W(κ1,κ2,K,𝑳1,𝑳2,r)=V(κ1,K1,𝑳1),W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=V(\kappa_{1},K_{1}^{*},\boldsymbol{L}_{1}),

where (K1,K2)(K^{*}_{1},K^{*}_{2}) is the maximizer of the problem (32) with the minimum 1\mathcal{L}^{1} norm; the form (34) of W(κ1,κ2,K,𝑳1,𝑳2,r)W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r), as well as the form (36)-(45) of the maximizer of (30) with the minimum 1\mathcal{L}^{1} norm, then follow by Lemma III.1.

Since 𝒟(K)\mathcal{D}(K) is compact and {U[|𝑳1|]:|U|=κ1}\{U\subseteq[|\boldsymbol{L}_{1}|]\,:\,|U|=\kappa_{1}\}, {U[|𝑳2|]:|U|=κ2}\{U\subseteq[|\boldsymbol{L}_{2}|]\,:\,|U|=\kappa_{2}\} are finite sets, the max-min problem (30) has a solution. We denote by (𝒄^,𝒄ˇ,𝟎)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0}) the maximizer of (30) with the minimum 1\mathcal{L}^{1} norm, where 𝒄^:=(c^1,,c^|𝑳1|)\hat{\boldsymbol{c}}^{\prime}:=(\hat{c}^{\prime}_{1},\ldots,\hat{c}^{\prime}_{|\boldsymbol{L}_{1}|}), 𝒄ˇ:=(cˇ1,,cˇ|𝑳2|)\check{\boldsymbol{c}}^{\prime}:=(\check{c}^{\prime}_{1},\ldots,\check{c}^{\prime}_{|\boldsymbol{L}_{2}|}), and we consider

K^:=i=1|𝑳1|c^i,Kˇ:=i=1|𝑳2|cˇi.\hat{K}:=\sum_{i=1}^{|\boldsymbol{L}_{1}|}\hat{c}^{\prime}_{i},\qquad\check{K}:=\sum_{i=1}^{|\boldsymbol{L}_{2}|}\check{c}^{\prime}_{i}. (128)

The max-min structure of (30) implies that for 𝒄^\hat{\boldsymbol{c}}^{\prime}, 𝒄ˇ\check{\boldsymbol{c}}^{\prime}, we have

W(κ1,κ2,K,𝑳1,𝑳2,r)=𝒱(𝒄^;κ1,𝑳1)=r𝒱(𝒄ˇ;κ2,𝑳2)W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=\mathcal{V}(\hat{\boldsymbol{c}}^{\prime};\kappa_{1},\boldsymbol{L}_{1})=r\,\mathcal{V}(\check{\boldsymbol{c}}^{\prime};\kappa_{2},\boldsymbol{L}_{2})

and

𝒱(𝒄^;κ1,𝑳1)\displaystyle\mathcal{V}(\hat{\boldsymbol{c}}^{\prime};\kappa_{1},\boldsymbol{L}_{1}) =max𝒄^𝒟(K^)𝒱(𝒄^;κ1,𝑳1)=V(κ1,K^,𝑳1),\displaystyle=\max_{\hat{\boldsymbol{c}}\in\mathcal{D}(\hat{K})}\mathcal{V}(\hat{\boldsymbol{c}};\kappa_{1},\boldsymbol{L}_{1})=V(\kappa_{1},\hat{K},\boldsymbol{L}_{1}),
𝒱(𝒄ˇ;κ2,𝑳2)\displaystyle\mathcal{V}(\check{\boldsymbol{c}}^{\prime};\kappa_{2},\boldsymbol{L}_{2}) =max𝒄ˇ𝒟(Kˇ)𝒱(𝒄ˇ;κ2,𝑳2)=V(κ2,Kˇ,𝑳2).\displaystyle=\max_{\check{\boldsymbol{c}}\in\mathcal{D}(\check{K})}\mathcal{V}(\check{\boldsymbol{c}};\kappa_{2},\boldsymbol{L}_{2})=V(\kappa_{2},\check{K},\boldsymbol{L}_{2}).

Hence,

W(κ1,κ2,K,𝑳1,𝑳2,r)=V(κ1,K^,𝑳1)=rV(κ2,Kˇ,𝑳2),W(\kappa_{1},\kappa_{2},K,\boldsymbol{L}_{1},\boldsymbol{L}_{2},r)=V(\kappa_{1},\hat{K},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\check{K},\boldsymbol{L}_{2}), (129)

and since (𝒄^,𝒄ˇ,𝟎)𝒟(K)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0})\in\mathcal{D}(K) we have K^+KˇK\hat{K}+\check{K}\leq K.

Therefore, it suffices to prove that (K^,Kˇ)(\hat{K},\check{K}) is the maximizer of (32) with the minimum 1\mathcal{L}^{1} norm. To prove this, we apply an argument by contradiction. Suppose that (K^,Kˇ)(\hat{K},\check{K}) is not a maximizer of (32); then there is (K1,K2)(K_{1},K_{2}) such that K1+K2KK_{1}+K_{2}\leq K, and

V(κ1,K1,𝑳1)=rV(κ2,K2,𝑳2)>V(κ1,K^,𝑳1)=rV(κ2,Kˇ,𝑳2).V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2})>V(\kappa_{1},\hat{K},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\check{K},\boldsymbol{L}_{2}).

Let us denote by 𝒄^′′:=(c^1′′,,c^|𝑳1|′′)\hat{\boldsymbol{c}}^{\prime\prime}:=(\hat{c}^{\prime\prime}_{1},\ldots,\hat{c}^{\prime\prime}_{|\boldsymbol{L}_{1}|}) the maximizer of V(κ1,K1,𝑳1)V(\kappa_{1},K_{1},\boldsymbol{L}_{1}), and by 𝒄ˇ′′:=(cˇ1′′,,cˇ|𝑳2|′′)\check{\boldsymbol{c}}^{\prime\prime}:=(\check{c}^{\prime\prime}_{1},\ldots,\check{c}^{\prime\prime}_{|\boldsymbol{L}_{2}|}) the maximizer of V(κ2,K2,𝑳2)V(\kappa_{2},K_{2},\boldsymbol{L}_{2}). Then,

min{𝒱(𝒄^′′;κ1,𝑳1),r𝒱(𝒄ˇ′′;κ2,𝑳2)}>min{𝒱(𝒄^;κ1,𝑳1),r𝒱(𝒄ˇ;κ2,𝑳2)}\min\left\{\mathcal{V}(\hat{\boldsymbol{c}}^{\prime\prime};\kappa_{1},\boldsymbol{L}_{1}),r\,\mathcal{V}(\check{\boldsymbol{c}}^{\prime\prime};\kappa_{2},\boldsymbol{L}_{2})\right\}>\min\left\{\mathcal{V}(\hat{\boldsymbol{c}}^{\prime};\kappa_{1},\boldsymbol{L}_{1}),r\,\mathcal{V}(\check{\boldsymbol{c}}^{\prime};\kappa_{2},\boldsymbol{L}_{2})\right\}

which is a contradiction because we assumed that (𝒄^,𝒄ˇ,𝟎)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0}) is a maximizer of (30). Also, if we assume that (K^,Kˇ)(\hat{K},\check{K}) is not the maximizer of (32) with the minimum 1\mathcal{L}^{1} norm, then by (128) we deduce that (𝒄^,𝒄ˇ,𝟎)(\hat{\boldsymbol{c}}^{\prime},\check{\boldsymbol{c}}^{\prime},\boldsymbol{0}) is not the maximizer of (30) with the minimum 1\mathcal{L}^{1} norm, which is a contradiction. ∎

Next, we provide Algorithm 5 for the computation of the maximizer (K1,K2)(K^{*}_{1},K^{*}_{2}) of the constrained optimization problem (32) with the minimum 1\mathcal{L}^{1} norm.

  1. (1)

    Input: κ1\kappa_{1}, κ2\kappa_{2}, KK, 𝑳1\boldsymbol{L}_{1}, 𝑳2\boldsymbol{L}_{2}, rr.

  • (2)

    Compute

    K~1κ1+L1,κ1i=κ1+1|𝑳1|1/L1,i,\tilde{K}_{1}\leftarrow\kappa_{1}+L_{1,\kappa_{1}}\sum_{i=\kappa_{1}+1}^{|\boldsymbol{L}_{1}|}1/L_{1,i},

    K~2κ2+L2,κ2i=κ2+1|𝑳2|1/L2,i.\tilde{K}_{2}\leftarrow\kappa_{2}+L_{2,\kappa_{2}}\sum_{i=\kappa_{2}+1}^{|\boldsymbol{L}_{2}|}1/L_{2,i}.

  • (3)

    If i=1κ1L1,iri=1κ2L2,i\sum_{i=1}^{\kappa_{1}}L_{1,i}\leq r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}

    K2rK^{r}_{2}\leftarrow root of the equation rV(κ2,K2,𝑳2)=V(κ1,K~1,𝑳1)r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2})=V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1}) with respect to K2[0,K~2]K_{2}\in[0,\tilde{K}_{2}].

    K~min{K~1+K2r,K}\tilde{K}\leftarrow\min\{\tilde{K}_{1}+K^{r}_{2},K\}.

    K1K^{*}_{1}\leftarrow root of the equation V(κ1,K1,𝑳1)=rV(κ2,K~K1,𝑳2)V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}) with respect to K1[0,K~]K_{1}\in[0,\tilde{K}].

    K2K~K1.K^{*}_{2}\leftarrow\tilde{K}-K^{*}_{1}.

    end-if

  • (4)

    If i=1κ1L1,i>ri=1κ2L2,i\sum_{i=1}^{\kappa_{1}}L_{1,i}>r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}

    K1rK^{r}_{1}\leftarrow root of the equation V(κ1,K1,𝑳1)=rV(κ2,K~2,𝑳2)V(\kappa_{1},K_{1},\boldsymbol{L}_{1})=r\,V(\kappa_{2},\tilde{K}_{2},\boldsymbol{L}_{2}) with respect to K1[0,K~1]K_{1}\in[0,\tilde{K}_{1}].

    K~min{K1r+K~2,K}\tilde{K}\leftarrow\min\{K^{r}_{1}+\tilde{K}_{2},K\}.

    K2K^{*}_{2}\leftarrow root of the equation V(κ1,K~K2,𝑳1)=rV(κ2,K2,𝑳2)V(\kappa_{1},\tilde{K}-K_{2},\boldsymbol{L}_{1})=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2}) with respect to K2[0,K~]K_{2}\in[0,\tilde{K}].

    K1K~K2.K^{*}_{1}\leftarrow\tilde{K}-K^{*}_{2}.

    end-if

  • (5)

    Output: K1,K2K^{*}_{1},K^{*}_{2}.

  • Algorithm 5 Computation of (K1,K2)(K^{*}_{1},K^{*}_{2}).

    We note that K2rK^{r}_{2} (resp. K1rK^{r}_{1}) and K1K^{*}_{1} (resp. K2K^{*}_{2}) are unique and can be computed using the bisection method. Without loss of generality, we assume that

    i=1κ1L1,iri=1κ2L2,i,\sum_{i=1}^{\kappa_{1}}L_{1,i}\leq r\,\sum_{i=1}^{\kappa_{2}}L_{2,i},

    then for the computation of K2rK^{r}_{2}, we consider the function

    g(K2):=rV(κ2,K2,𝑳2)V(κ1,K~1,𝑳1),K2[0,K~2].g(K_{2}):=r\,V(\kappa_{2},K_{2},\boldsymbol{L}_{2})-V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1}),\quad K_{2}\in[0,\tilde{K}_{2}].

    which is continuous with

    g(0)\displaystyle g(0) =V(κ1,K~1,𝑳1)<0,\displaystyle=-V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1})<0,
    g(K~2)\displaystyle g(\tilde{K}_{2}) =rV(κ2,K~2,𝑳2)V(κ1,K~1,𝑳1)\displaystyle=r\,V(\kappa_{2},\tilde{K}_{2},\boldsymbol{L}_{2})-V(\kappa_{1},\tilde{K}_{1},\boldsymbol{L}_{1})
    =ri=1κ2L2,ii=1κ1L1,i0.\displaystyle=r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}-\sum_{i=1}^{\kappa_{1}}L_{1,i}\geq 0.

    If g(K~2)=0g(\tilde{K}_{2})=0 then K2r=K~2K^{r}_{2}=\tilde{K}_{2}, otherwise g(K~2)>0g(\tilde{K}_{2})>0 and the solution follows by the bisection method. Since V(κ2,K2,𝑳2)V(\kappa_{2},K_{2},\boldsymbol{L}_{2}) is increasing over [0,K~2][0,\tilde{K}_{2}], the function g(K2)g(K_{2}) is also increasing over [0,K~2][0,\tilde{K}_{2}] and thus K2rK^{r}_{2} is unique.

    For the computation of K1K^{*}_{1}, we consider the function

    h(K1)=V(κ1,K1,𝑳1)rV(κ2,K~K1,𝑳2),K1[0,K~]h(K_{1})=V(\kappa_{1},K_{1},\boldsymbol{L}_{1})-r\,V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}),\quad K_{1}\in[0,\tilde{K}] (130)

    which is continuous with

    h(0)\displaystyle h(0) =rV(κ2,K~,𝑳2)<0,\displaystyle=-r\,V(\kappa_{2},\tilde{K},\boldsymbol{L}_{2})<0,
    h(K~)\displaystyle h(\tilde{K}) =V(κ1,K~,𝑳1)>0.\displaystyle=V(\kappa_{1},\tilde{K},\boldsymbol{L}_{1})>0.

    Thus, the solution follows by the bisection method and it is less than or equal to K~1K~\tilde{K}_{1}\wedge\tilde{K}. Since V(κ1,K1,𝑳1)V(\kappa_{1},K_{1},\boldsymbol{L}_{1}) is increasing over [0,K~1][0,\tilde{K}_{1}], and V(κ2,K~K1,𝑳2)V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}) is non-increasing over [0,K~][0,\tilde{K}] the function h(K1)h(K_{1}) is increasing over [0,K~1K~][0,\tilde{K}_{1}\wedge\tilde{K}] and thus K1K^{*}_{1} is unique.

    By restricting the value of KK to K~:=(K~1+K2r)K\tilde{K}:=(\tilde{K}_{1}+K^{r}_{2})\wedge K, we do not reduce the maximum possible value that the objective of (32), i.e., V(κ1,K1,𝑳1)V(\kappa_{1},K_{1},\boldsymbol{L}_{1}), can take. At the same time, we restrict the number of possible maximizers to 11, by keeping the one with the minimum 1\mathcal{L}^{1} norm, which is given by the unique root of

    V(κ1,K1,𝑳1)rV(κ2,K~K1,𝑳2),K1[0,K~].V(\kappa_{1},K_{1},\boldsymbol{L}_{1})-r\,V(\kappa_{2},\tilde{K}-K_{1},\boldsymbol{L}_{2}),\quad K_{1}\in[0,\tilde{K}].
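
    Under the assumption i=1κ1L1,iri=1κ2L2,i\sum_{i=1}^{\kappa_{1}}L_{1,i}\leq r\,\sum_{i=1}^{\kappa_{2}}L_{2,i}, steps (2)-(3) of the algorithm can be transcribed as follows, reusing the function V from the linear-programming sketch after Algorithm 1. The inputs L1, L2 are assumed sorted in ascending order, and the monotonicity of gg and hh argued above is what justifies the bisection; this is an illustrative transcription, not the paper's implementation.

```python
def bisect_root(f, lo, hi, tol=1e-9):
    """Bisection for an increasing function f with f(lo) <= 0 <= f(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def K_star(kappa1, kappa2, K, L1, L2, r):
    """Compute (K1*, K2*), the minimum-L1-norm maximizer of (32)."""
    K1t = kappa1 + L1[kappa1 - 1] * sum(1.0 / l for l in L1[kappa1:])   # K~_1
    K2t = kappa2 + L2[kappa2 - 1] * sum(1.0 / l for l in L2[kappa2:])   # K~_2
    g = lambda K2: r * V(kappa2, K2, L2) - V(kappa1, K1t, L1)           # increasing in K2
    K2r = bisect_root(g, 0.0, K2t)                                      # root K_2^r of g
    Kt = min(K1t + K2r, K)                                              # K~
    h = lambda K1: V(kappa1, K1, L1) - r * V(kappa2, Kt - K1, L2)       # increasing in K1
    K1s = bisect_root(h, 0.0, Kt)                                       # root K_1^* of h
    return K1s, Kt - K1s
```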

    Appendix B

    In Appendix B, we prove Theorems IV.1, IV.2, IV.3, which establish universal asymptotic lower bounds on the expected stopping time when controlling the misclassification error rate (7) and the familywise error rates (9).

    B-A Misclassification error rate

    As a first step towards the proof of Theorem IV.1 we provide the following auxiliary lemma. We fix A[M]A\subseteq[M], and we denote by Z(A;k)Z(A;k) the family of subsets of [M][M] whose Hamming distance from AA is at least kk, i.e.,

    Z(A;k):={C[M]:|CA|k}.Z(A;k):=\{C\subseteq[M]:|C\triangle A|\geq k\}. (131)
    Lemma B.1

    For any BZ(A;k)B\notin Z(A;k), and (c1,,cM)𝒟(K)(c_{1},\ldots,c_{M})\in\mathcal{D}(K), we have the following inequality,

    minGZ(B;k)(iAGciIi+jGAcjJj)V(k,K,𝑭(A)).\min_{G\in Z(B;k)}\left(\sum_{i\in A\setminus G}c_{i}I_{i}+\sum_{j\in G\setminus A}c_{j}J_{j}\right)\leq V(k,K,\boldsymbol{F}(A)). (132)
    Proof:

    We fix (c1,,cM)𝒟(K)(c_{1},\ldots,c_{M})\in\mathcal{D}(K), and we denote by CC^{*} the set minimizer of

    minCZ(A;k)(iACciIi+jCAcjJj),\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right), (133)

    where in place of Z(B;k)Z(B;k) we have Z(A;k)Z(A;k). By [27, Lemma B.2], there exists a set GG^{*} such that for the sets A,AC,BA,\,A\triangle C^{*},\,B it holds

    AGACBG.A\triangle G^{*}\subseteq A\triangle C^{*}\subseteq B\triangle G^{*}. (134)

    By the right inclusion we have

    |BG||AC|k,|B\triangle G^{*}|\geq|A\triangle C^{*}|\geq k, (135)

    which means that GZ(B;k)G^{*}\in Z(B;k), whereas by the left inclusion we have

    AGAC,GACA.A\setminus G^{*}\subseteq A\setminus C^{*},\quad G^{*}\setminus A\subseteq C^{*}\setminus A. (136)

    As a result,

    minGZ(B;k)(iAGciIi+jGAcjJj)\displaystyle\min_{G\in Z(B;k)}\left(\sum_{i\in A\setminus G}c_{i}I_{i}+\sum_{j\in G\setminus A}c_{j}J_{j}\right)\leq iAGciIi+jGAcjJj\displaystyle\sum_{i\in A\setminus G^{*}}c_{i}I_{i}+\sum_{j\in G^{*}\setminus A}c_{j}J_{j} (137)
    \displaystyle\leq iACciIi+jCAcjJj\displaystyle\sum_{i\in A\setminus C^{*}}c_{i}I_{i}+\sum_{j\in C^{*}\setminus A}c_{j}J_{j}
    =minCZ(A;k)(iACciIi+jCAcjJj)\displaystyle=\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right)
    max(c1,,cM)𝒟(K)minCZ(A;k)(iACciIi+jCAcjJj),\displaystyle\leq\max_{(c_{1},\ldots,c_{M})\in\mathcal{D}(K)}\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right),

    where the first inequality follows by the fact that GZ(B;k)G^{*}\in Z(B;k), and the second one by (136). For any (c1,,cM)𝒟(K)(c_{1},\ldots,c_{M})\in\mathcal{D}(K), the set minimizer CZ(A;k)C^{*}\in Z(A;k) has Hamming distance from AA exactly equal to kk, since the addition of extra terms could only increase the sum, which implies

    max(c1,,cM)𝒟(K)\displaystyle\max_{(c_{1},\ldots,c_{M})\in\mathcal{D}(K)} minCZ(A;k)(iACciIi+jCAcjJj)\displaystyle\min_{C\in Z(A;k)}\left(\sum_{i\in A\setminus C}c_{i}I_{i}+\sum_{j\in C\setminus A}c_{j}J_{j}\right) (138)
    =max(c1,,cM)𝒟(K)minU[M]:|U|=kiUciFi(A)\displaystyle=\max_{(c_{1},\ldots,c_{M})\in\mathcal{D}(K)}\;\min_{U\subseteq[M]:\,|U|=k}\;\sum_{i\in U}c_{i}\,F_{i}(A)
    =V(k,K,𝑭(A)).\displaystyle=V(k,K,\boldsymbol{F}(A)).

    where the last equality follows by definition of V(k,K,𝑭(A))V(k,K,\boldsymbol{F}(A)) according to (18). ∎

    For the proof of Theorem IV.1, we introduce the log-likelihood ratio of 𝖯A\mathsf{P}_{A} versus 𝖯C\mathsf{P}_{C}, for any sampling rule RR and any subsets A,C[M]A,\,C\subseteq[M], based on the observations from all sources in the first nn sampling instants, i.e.,

    ΛA,CR(n):=logd𝖯Ad𝖯C(nR),n.\displaystyle\Lambda^{R}_{A,C}(n):=\log\frac{d\mathsf{P}_{A}}{d\mathsf{P}_{{C}}}\left(\mathcal{F}^{R}_{n}\right),\quad n\in\mathbb{N}. (139)

    Since R(n)R(n) is n1R\mathcal{F}^{R}_{n-1}-measurable, Xi(n)X_{i}(n) is independent of n1R\mathcal{F}^{R}_{n-1}, and Z(n)Z(n) is independent of n1R\mathcal{F}^{R}_{n-1} and of {Xi(n):i[M]}\{X_{i}(n)\,:\,i\in[M]\}, we have

    ΛA,CR(n):=ΛA,CR(n1)+iACgi(Xi(n))Ri(n)jCAgj(Xj(n))Rj(n),\displaystyle\begin{split}\Lambda^{R}_{A,C}(n):=\Lambda^{R}_{A,C}(n-1)&+\sum_{i\in A\setminus C}g_{i}(X_{i}(n))\,R_{i}(n)\\ &-\sum_{j\in C\setminus A}g_{j}(X_{j}(n))\,R_{j}(n),\end{split} (140)

    where we recall that gi:=log(f1i/f0i)g_{i}:=\log\left(f_{1i}/f_{0i}\right), and we set ΛA,CR(0):=0\Lambda^{R}_{A,C}(0):=0. Comparing with (62), it is clear that

    ΛA,CR(n)=iACΛiR(n)jCAΛjR(n),n.\displaystyle\Lambda^{R}_{A,C}(n)=\sum_{i\in A\setminus C}\Lambda^{R}_{i}(n)-\sum_{j\in C\setminus A}\Lambda^{R}_{j}(n),\quad n\in\mathbb{N}. (141)

    We also set

    Λ~iR(n):=Λ~iR(n1)+(gi(Xi(n))𝖤A[gi(Xi(n))])Ri(n),Λ~iR(0):=0,\displaystyle\begin{split}\tilde{\Lambda}^{R}_{i}(n)&:=\tilde{\Lambda}^{R}_{i}(n-1)+\Bigl{(}g_{i}(X_{i}(n))-\mathsf{E}_{A}[g_{i}(X_{i}(n))]\Bigr{)}\,R_{i}(n),\\ \tilde{\Lambda}^{R}_{i}(0)&:=0,\end{split} (142)

    which implies that

    Λ~iR(n)\displaystyle\tilde{\Lambda}^{R}_{i}(n) ={ΛiR(n)IinπiR(n),iA,ΛiR(n)+JinπiR(n),iA.\displaystyle=\begin{cases}\Lambda^{R}_{i}(n)-I_{i}\,n\pi^{R}_{i}(n),\quad i\in A,\\ \Lambda^{R}_{i}(n)+J_{i}\,n\pi^{R}_{i}(n),\quad i\notin A.\end{cases} (143)

    Proof:

    We have to show that

    𝒥A(α;k,K)|log(α)|V(k,K,𝑭(A))(1+o(1)),\mathcal{J}_{A}(\alpha;k,K)\geq\frac{|\log(\alpha)|}{V(k,K,\boldsymbol{F}(A))}(1+o(1)), (144)

    where o(1)o(1) is a quantity that tends to zero as α0\alpha\to 0. We set

    f(α):=|log(α)|V(k,K,𝑭(A)),α(0,1).f(\alpha):=\frac{|\log(\alpha)|}{V(k,K,\boldsymbol{F}(A))},\quad\alpha\in(0,1). (145)

    By Markov’s inequality, for any stopping time TT and q(0,1)q\in(0,1) we have

    𝖤A[T]qf(α)𝖯A(Tqf(α)).\mathsf{E}_{A}[T]\geq q\,f(\alpha)\mathsf{P}_{A}(T\geq q\,f(\alpha)). (146)

    Thus, it suffices to show that for every q(0,1)q\in(0,1) we have

    lim infα0inf(R,T,Δ)𝒞(α;k,K)𝖯A(Tqf(α))1,\liminf_{\alpha\to 0}\inf_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)}\mathsf{P}_{A}(T\geq q\,f(\alpha))\geq 1, (147)

    as this will imply that

    lim infα0𝒥A(α;k,K)|log(α)|qV(k,K,𝑭(A)),\liminf\limits_{\alpha\to 0}\frac{\mathcal{J}_{A}(\alpha;k,K)}{|\log(\alpha)|}\geq\frac{q}{V(k,K,\boldsymbol{F}(A))}, (148)

    and the desired result will follow by letting q1q\to 1.

    In the rest of the proof we fix some arbitrary q(0,1)q\in(0,1). Then, for any α(0,1)\alpha\in\,(0,1), (R,T,Δ)𝒞(α;k,K)(R,T,\Delta)\in\mathcal{C}(\alpha;k,K), and BZ(A;k)B\notin Z(A;k), where Z(A;k)Z(A;k) is defined in (131), we have

    𝖯A(Δ=B)\displaystyle\mathsf{P}_{A}(\Delta=B)\leq 𝖯A(minGZ(B;k)ΛA,GR(T)<log(ηα),Δ=B)\displaystyle\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)<\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right) (149)
    +𝖯A(Tqf(α),minGZ(B;k)ΛA,GR(T)log(ηα),Δ=B)\displaystyle+\mathsf{P}_{A}\left(T\leq qf(\alpha),\,\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)\geq\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right)
    +𝖯A(Tqf(α),Δ=B),\displaystyle+\mathsf{P}_{A}\left(T\geq qf(\alpha),\Delta=B\right),

    where η\eta is an arbitrary constant in (0,1)(0,1). By summing up (149) with respect to all BZ(A;k)B\notin Z(A;k), we have

    𝖯A(ΔZ(A;k))\displaystyle\mathsf{P}_{A}(\Delta\notin Z(A;k))\leq BZ(A;k)𝖯A(minGZ(B;k)ΛA,GR(T)<log(ηα),Δ=B)\displaystyle\sum_{B\notin Z(A;k)}\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)<\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right) (150)
    +BZ(A;k)𝖯A(Tqf(α),minGZ(B;k)ΛA,GR(T)log(ηα),Δ=B)\displaystyle+\sum_{B\notin Z(A;k)}\mathsf{P}_{A}\left(T\leq qf(\alpha),\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T)\geq\log\left(\frac{\eta}{\alpha}\right),\Delta=B\right)
    +𝖯A(Tqf(α),ΔZ(A;k)).\displaystyle+\mathsf{P}_{A}\left(T\geq qf(\alpha),\Delta\notin Z(A;k)\right).

    In view of the fact that

    1α𝖯A(ΔZ(A;k)),1-\alpha\leq\mathsf{P}_{A}(\Delta\notin Z(A;k)), (151)

    we obtain

\begin{aligned}
\mathsf{P}_{A}\left(T \geq q f(\alpha)\right) &\geq \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta\notin Z(A;k)\right) \\
&\geq 1-\alpha - \sum_{B\notin Z(A;k)} \mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\qquad - \sum_{B\notin Z(A;k)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).
\end{aligned}   (152)

Thus, in order to show (147), it suffices to show that for any B ∉ Z(A;k)

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)} \mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0,   (153)

    and

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0.   (154)

In order to show (153), we fix α ∈ (0,1) and (R,T,Δ) ∈ 𝒞(α;k,K). By Boole’s inequality we have

\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq \sum_{G\in Z(B;k)} \mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).   (155)

For each G ∈ Z(B;k), we apply the change of measure 𝖯_A → 𝖯_G and, by Wald’s likelihood ratio identity, we obtain

\begin{aligned}
\mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) &= \mathsf{E}_{G}\left[\exp\{\Lambda^{R}_{A,G}(T)\};\ \Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right] \\
&\leq \frac{\eta}{\alpha}\,\mathsf{P}_{G}(\Delta=B) \leq \eta,
\end{aligned}   (156)

where the last inequality follows from the fact that, for any G ∈ Z(B;k), it holds that 𝖯_G(Δ = B) ≤ α. In view of (155), we obtain

\mathsf{P}_{A}\left(\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq |Z(B;k)|\,\eta.   (157)

Since η ∈ (0,1) is arbitrary, letting η → 0 proves (153).
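The change-of-measure step above can also be checked numerically. The following Monte Carlo sketch replaces T by a deterministic horizon and uses a hypothetical one-source Gaussian model (𝖯_A: X ~ N(μ,1), 𝖯_G: X ~ N(0,1)); it estimates the same probability directly under 𝖯_A and via the tilted expectation under 𝖯_G, as in (156).

import numpy as np

# Monte Carlo illustration of Wald's likelihood ratio identity behind (156);
# the Gaussian model and the fixed horizon n are hypothetical simplifications.
rng = np.random.default_rng(1)
mu, n, reps, c = 0.5, 20, 200_000, 0.0

X_A = rng.normal(mu, 1.0, size=(reps, n))       # samples under P_A
Lam_A = (mu * X_A - mu**2 / 2).sum(axis=1)      # Lambda_{A,G}(n) under P_A
direct = (Lam_A < c).mean()                     # P_A(Lambda(n) < c)

X_G = rng.normal(0.0, 1.0, size=(reps, n))      # samples under P_G
Lam_G = (mu * X_G - mu**2 / 2).sum(axis=1)      # Lambda_{A,G}(n) under P_G
tilted = (np.exp(Lam_G) * (Lam_G < c)).mean()   # E_G[exp(Lambda); Lambda < c]

print(direct, tilted)   # the two estimates agree up to Monte Carlo error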

    In order to show (154), we observe that by decomposition (143) we have

\begin{aligned}
\frac{1}{T}\Lambda^{R}_{A,G}(T) &= \frac{1}{T}\left(\sum_{i\in A\setminus G}\tilde{\Lambda}^{R}_{i}(T) - \sum_{j\in G\setminus A}\tilde{\Lambda}^{R}_{j}(T)\right) + \sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T) + \sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T) \\
&\leq \frac{1}{T}\sum_{i\in[M]}|\tilde{\Lambda}^{R}_{i}(T)| + \sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T) + \sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T).
\end{aligned}

Taking the minimum over all G ∈ Z(B;k) on both sides of the above inequality, we obtain

\begin{aligned}
\frac{1}{T}\min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) &\leq \frac{1}{T}\sum_{i\in[M]}|\tilde{\Lambda}^{R}_{i}(T)| + \min_{G\in Z(B;k)}\left(\sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T) + \sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T)\right) \\
&\leq \frac{1}{T}\sum_{i\in[M]}|\tilde{\Lambda}^{R}_{i}(T)| + V(k,K,\boldsymbol{F}(A)),
\end{aligned}   (158)

where the second inequality follows from Lemma B.1 and the fact that (R,T,Δ) belongs to 𝒞(K), which implies (π₁^R(T),…,π_M^R(T)) ∈ 𝒟(K). Therefore,

\mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min_{G\in Z(B;k)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log\alpha| + \log\eta\right),   (159)

    where

\xi(n) := \left(\sum_{i\in[M]}\left|\frac{\tilde{\Lambda}^{R}_{i}(n)}{n}\right| + V(k,K,\boldsymbol{F}(A))\right) n, \quad \forall\, n\in\mathbb{N}.   (160)

    By the moment assumption (12) and [29, Theorem 2.19], we have

\lim_{n\to\infty}\frac{\tilde{\Lambda}^{R}_{i}(n)}{n} = 0, \quad \mbox{a.s.} \quad \forall\, i\in[M],   (161)

    and as a result,

\lim_{n\to\infty}\frac{\xi(n)}{n} = V(k,K,\boldsymbol{F}(A)), \quad \mbox{a.s.}   (162)

    Hence, by [27, Lemma F.1] we have

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha;k,K)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log(\alpha)| + \log(\eta)\right) = 0.   (163)

    From this and (159), we conclude that (154) holds. ∎

    B-B Familywise error rates

As a first step towards the proofs of Theorems IV.2 and IV.3, we provide the following auxiliary lemma. To this end, for fixed A ⊆ [M], we consider the set

H_{k_{1},k_{2}}(A) := \{B\subseteq[M] :\ |B\setminus A| < k_{1},\ |A\setminus B| < k_{2}\},   (164)

and for any B ∈ H_{k₁,k₂}(A), we also consider the sets

\begin{aligned}
U_{k_{1}}(B) &:= \{C\subseteq A :\ |B\setminus C| \geq k_{1}\}, \\
Y_{k_{2}}(B) &:= \{C\supseteq A :\ |C\setminus B| \geq k_{2}\}.
\end{aligned}   (165)
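For concreteness, the following brute-force sketch enumerates the sets (164)–(165) for a toy configuration; the values of M, A, k₁, k₂ below are hypothetical and serve only to make the definitions tangible.

from itertools import combinations

# Enumerate H_{k1,k2}(A), U_{k1}(B), Y_{k2}(B) for hypothetical small parameters.
M, A, k1, k2 = 4, {0, 1}, 1, 1
subsets = [set(c) for s in range(M + 1) for c in combinations(range(M), s)]

H = [B for B in subsets if len(B - A) < k1 and len(A - B) < k2]
print("H:", H)            # with k1 = k2 = 1, only B = A qualifies

B = {0, 1}                # a set B in H_{k1,k2}(A)
U = [C for C in subsets if C <= A and len(B - C) >= k1]
Y = [C for C in subsets if C >= A and len(C - B) >= k2]
print("U:", U)            # subsets of A that miss at least k1 sources of B
print("Y:", Y)            # supersets of A with at least k2 sources outside B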
    Lemma B.2

For any B ∈ H_{k₁,k₂}(A) and (c₁,…,c_M) ∈ 𝒟(K), we have the following inequalities:

(i) If A = ∅,

\min_{G\in Y_{k_{2}}(B)}\ \sum_{j\in G} c_{j}\,J_{j} \leq V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A)).   (166)

(ii) If A = [M],

\min_{G\in U_{k_{1}}(B)}\ \sum_{i\in[M]\setminus G} c_{i}\,I_{i} \leq V(k_{1},K,\boldsymbol{I_{k_{2}-1}}(A)).   (167)

(iii) If 0 < |A| < M,

\min\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} c_{i}\,I_{i},\ r\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} c_{j}\,J_{j}\right\} \leq v_{A}(k_{1},k_{2},K,r),   (168)

where v_A(k₁,k₂,K,r) is defined in Theorem IV.3.

    Proof:

We prove (i) and (iii); case (ii) is symmetric to (i).

For A = ∅, without loss of generality we consider B ∈ H_{k₁,k₂}(∅) such that Y_{k₂}(B) ≠ ∅, or equivalently |B| < k₁ and |Bᶜ| ≥ k₂; otherwise (166) holds trivially. Since |Bᶜ| ≥ k₂, there exists Γ₂ ⊆ Bᶜ with |Γ₂| = k₂ that contains the sources corresponding to the k₂ smallest terms in {c_j J_j : j ∈ Bᶜ}. Clearly, Γ₂ ∈ Y_{k₂}(B), and

[M] = B\cup B^{c} \supseteq B\cup\Gamma_{2}.   (169)

Since |B| < k₁, even if B contained the k₁ − 1 smallest elements of {c_j J_j : j ∈ [M]}, the set Γ₂ would contain the next k₂ smallest elements, all of which lie in [M]∖B = Bᶜ. Thus, letting {j} denote the index of the source with the j-th smallest value in {c_i J_i : i ∈ [M]}, we have

\min_{G\in Y_{k_{2}}(B)}\ \sum_{j\in G} c_{j}\,J_{j} \leq \sum_{j\in\Gamma_{2}} c_{j}\,J_{j} \leq \sum_{j=k_{1}}^{k_{2}+(k_{1}-1)} c_{\{j\}}\,J_{\{j\}} \leq V(k_{2},K,\boldsymbol{J_{k_{1}-1}}(A)).   (170)
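As a quick sanity check of the order-statistics argument above, the following sketch verifies the first two inequalities of (170) by exhaustive search; the parameters M, k₁, k₂ and the random weights (standing for the products c_j J_j) are hypothetical illustration choices.

import numpy as np
from itertools import chain, combinations

# Brute-force check of the bound (170) for A = empty set over all |B| < k1.
rng = np.random.default_rng(2)
M, k1, k2 = 6, 2, 3
w = rng.random(M)                      # stands for c_j * J_j, j in [M]
order = np.sort(w)

for B in chain.from_iterable(combinations(range(M), s) for s in range(k1)):
    Bc = [j for j in range(M) if j not in B]     # here |B| < k1, so |Bc| >= k2
    lhs = np.sort(w[Bc])[:k2].sum()              # min over G in Y_{k2}(B)
    rhs = order[k1 - 1 : k1 - 1 + k2].sum()      # k1-th ... (k1+k2-1)-th smallest
    assert lhs <= rhs
print("order-statistics bound (170) verified for all B with |B| < k1")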

For 0 < |A| < M, without loss of generality we consider B ∈ H_{k₁,k₂}(A) such that U_{k₁}(B) ≠ ∅ and Y_{k₂}(B) ≠ ∅, or equivalently |B| ≥ k₁ and |Bᶜ| ≥ k₂; otherwise (168) holds trivially. We consider the quantities

l_{1} := |B\setminus A|, \qquad l_{2} := |A\setminus B|.   (171)
1. The fact that |B| ≥ k₁ implies that |A∩B| ≥ k₁ − |B∖A| = k₁ − l₁. Thus, there exists Γ₁ ⊆ A∩B with |Γ₁| = k₁ − l₁ that contains the sources corresponding to the k₁ − l₁ smallest elements in {c_i I_i : i ∈ A∩B}. We set B₁* := A∖Γ₁. Since Γ₁ ⊆ A∩B, it holds that A∖B₁* = Γ₁, and

B\setminus B^{*}_{1} = B\cap(A\setminus\Gamma_{1})^{c} = \Gamma_{1}\cup(B\setminus A).   (172)

Therefore,

|B\setminus B^{*}_{1}| = |\Gamma_{1}| + |B\setminus A| = k_{1}-l_{1}+l_{1} = k_{1},

which implies B₁* ∈ U_{k₁}(B), and

\min_{G\in U_{k_{1}}(B)}\ \sum_{i\in A\setminus G} c_{i}\,I_{i} \leq \sum_{i\in\Gamma_{1}} c_{i}\,I_{i}.   (173)

Even if A∖B contained the l₂ smallest elements of {c_i I_i : i ∈ A}, the set Γ₁ ⊆ A∩B would contain the next k₁ − l₁ smallest elements of the same set. Thus, letting ⟨i⟩ denote the index of the i-th smallest element in {c_i I_i : i ∈ A}, we always have

\sum_{i\in\Gamma_{1}} c_{i}\,I_{i} \leq \sum_{i=1+l_{2}}^{k_{1}-l_{1}+l_{2}} c_{\langle i\rangle}\,I_{\langle i\rangle}.   (174)
2. The fact that |Bᶜ| ≥ k₂ implies that |Aᶜ∩Bᶜ| ≥ k₂ − |A∩Bᶜ| = k₂ − l₂. Thus, there exists Γ₂ ⊆ Aᶜ∩Bᶜ with |Γ₂| = k₂ − l₂ that contains the sources corresponding to the k₂ − l₂ smallest elements in {c_j J_j : j ∈ Aᶜ∩Bᶜ}. We set B₂* := A∪Γ₂. Since Γ₂ ⊆ Aᶜ∩Bᶜ, it holds that B₂*∖A = Γ₂, and

B^{*}_{2}\setminus B = (A\cup\Gamma_{2})\cap B^{c} = \Gamma_{2}\cup(A\setminus B).   (175)

Therefore,

|B^{*}_{2}\setminus B| = |\Gamma_{2}| + |A\setminus B| = k_{2}-l_{2}+l_{2} = k_{2},

which implies B₂* ∈ Y_{k₂}(B), and

\min_{G\in Y_{k_{2}}(B)}\ \sum_{j\in G\setminus A} c_{j}\,J_{j} \leq \sum_{j\in\Gamma_{2}} c_{j}\,J_{j}.   (176)

Even if B∖A contained the l₁ smallest elements of {c_j J_j : j ∈ Aᶜ}, the set Γ₂ ⊆ Aᶜ∩Bᶜ would contain the next k₂ − l₂ smallest elements of the same set. Thus, letting {j} denote the index of the source with the j-th smallest value in {c_j J_j : j ∈ Aᶜ}, we always have

\sum_{j\in\Gamma_{2}} c_{j}\,J_{j} \leq \sum_{j=1+l_{1}}^{k_{2}-l_{2}+l_{1}} c_{\{j\}}\,J_{\{j\}}.   (177)

Let us assume, without loss of generality, that l₁ ≥ l₂. We set l := l₁ − l₂, and from (174) and (177) we further have

\begin{aligned}
\sum_{i=1+l_{2}}^{k_{1}-l_{1}+l_{2}} c_{\langle i\rangle}\,I_{\langle i\rangle} &\leq \sum_{i=1+l_{2}}^{k_{1}-l} c_{\langle i\rangle}\,I_{\langle i\rangle} \leq \sum_{i=1}^{k_{1}-l} c_{\langle i\rangle}\,I_{\langle i\rangle}, \\
\sum_{j=1+l_{1}}^{k_{2}-l_{2}+l_{1}} c_{\{j\}}\,J_{\{j\}} &\leq \sum_{j=1+l+l_{2}}^{k_{2}+l} c_{\{j\}}\,J_{\{j\}} \leq \sum_{j=1+l}^{k_{2}+l} c_{\{j\}}\,J_{\{j\}}.
\end{aligned}   (178)

    As a result,

\begin{aligned}
\min&\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} c_{i}\,I_{i},\ r\,\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} c_{j}\,J_{j}\right\} \\
&\leq \min\left\{\sum_{i\in\Gamma_{1}} c_{i}\,I_{i},\ r\,\sum_{j\in\Gamma_{2}} c_{j}\,J_{j}\right\} \\
&\leq \min\left\{\sum_{i=1}^{k_{1}-l} c_{\langle i\rangle}\,I_{\langle i\rangle},\ r\,\sum_{j=1+l}^{k_{2}+l} c_{\{j\}}\,J_{\{j\}}\right\} \\
&\leq W(k_{1}-l,k_{2},K,\boldsymbol{I}(A),\boldsymbol{J_{l}}(A),r) \\
&\leq v_{A}(k_{1},k_{2},K,r),
\end{aligned}

which proves the claim. ∎

We proceed with the proof of Theorem IV.3, from which we also deduce Theorem IV.2, as explained in the last part of the proof.

    Proof:

    We have to show that

\mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K) \geq \frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}\,(1+o(1)),   (179)

where v_A(k₁,k₂,K,r) is defined in Theorem IV.3 and o(1) is a quantity that tends to zero as α → 0. We define

f(\alpha) := \frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}, \quad \alpha\in(0,1).   (180)

By Markov’s inequality, for any stopping time T and q, α ∈ (0,1),

\mathsf{E}_{A}[T] \geq q\,f(\alpha)\,\mathsf{P}_{A}(T \geq q\,f(\alpha)).   (181)

Thus, it suffices to show that for every q ∈ (0,1) we have

\liminf_{\alpha\to 0}\ \inf_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}(T \geq q\,f(\alpha)) \geq 1,   (182)

as this will imply that

\liminf_{\alpha\to 0}\ \mathcal{J}_{A}(\alpha,\beta;k_{1},k_{2},K)/|\log(\alpha)| \geq q/v_{A}(k_{1},k_{2},K,r),   (183)

and the desired result will follow by letting q → 1.

In the rest of the proof, we fix some arbitrary q ∈ (0,1). Moreover, we note that for any B ⊆ [M] we have either |B| ≥ k₁ or |Bᶜ| ≥ k₂, because otherwise M = |B| + |Bᶜ| < k₁ + k₂, which would contradict the assumption k₁ + k₂ ≤ M. In what follows, we let B ∈ H_{k₁,k₂}(A) and focus on the case |B| ≥ k₁ and |Bᶜ| ≥ k₂. The other two cases, |B| ≥ k₁, |Bᶜ| < k₂ and |B| < k₁, |Bᶜ| ≥ k₂, are simpler and are treated as described in the last part of this proof.

Then, for any α ∈ (0,1) and (R,T,Δ) ∈ 𝒞(α,β;k₁,k₂,K) we have

\begin{aligned}
\mathsf{P}_{A}(\Delta=B) \leq\; & \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right),
\end{aligned}   (184)

where Λ^R_{A,G}(T) is defined in (140), and η is an arbitrary constant in (0,1). The third term in (184) can be equivalently written as

\mathsf{P}_{A}\left(\min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho(\alpha,\eta)\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right),   (185)

where ρ(α,η) := log(η/α)/log(η/β).

For simplicity, in what follows we write ρ instead of ρ(α,η). We upper bound the third term in (184) by

\begin{aligned}
\mathsf{P}_{A}&\left(\min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\leq \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad + \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta=B\right).
\end{aligned}   (186)

By summing (184) over all B ∈ H_{k₁,k₂}(A), we have

\begin{aligned}
\mathsf{P}_{A}&\left(\Delta\in H_{k_{1},k_{2}}(A)\right) \\
&\leq \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad + \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
&\quad + \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad + \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta\in H_{k_{1},k_{2}}(A)\right).
\end{aligned}

    In view of the fact that

1-(\alpha+\beta) \leq \mathsf{P}_{A}\left(\Delta\in H_{k_{1},k_{2}}(A)\right),   (187)

    we obtain

\begin{aligned}
\mathsf{P}_{A}&\left(T \geq q f(\alpha)\right) \geq \mathsf{P}_{A}\left(T \geq q f(\alpha),\ \Delta\in H_{k_{1},k_{2}}(A)\right) \\
&\geq 1-(\alpha+\beta) \\
&\quad - \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
&\quad - \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
&\quad - \sum_{B\in H_{k_{1},k_{2}}(A)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).
\end{aligned}

Thus, in order to show (182) it suffices to show that, for all B ∈ H_{k₁,k₂}(A),

\begin{aligned}
&\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0, \\
&\lim_{\beta\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) = 0,
\end{aligned}   (188)

    and

\lim_{\alpha\to 0}\ \sup_{(R,T,\Delta)}\ \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \min\Big\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\Big\} \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) = 0,   (189)

where the supremum is taken over (R,T,Δ) ∈ 𝒞(α,β;k₁,k₂,K). In order to show (188), we apply Boole’s inequality, which yields

\begin{aligned}
\mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) &\leq \sum_{G\in U_{k_{1}}(B)} \mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right), \\
\mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) &\leq \sum_{G\in Y_{k_{2}}(B)} \mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right).
\end{aligned}   (190)

For each G ∈ U_{k₁}(B), we apply the change of measure 𝖯_A → 𝖯_G, and we have

\begin{aligned}
\mathsf{P}_{A}\left(\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) &= \mathsf{E}_{G}\left[\exp\{\Lambda^{R}_{A,G}(T)\};\ \Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right] \\
&\leq \frac{\eta}{\alpha}\,\mathsf{P}_{G}(\Delta=B) \leq \frac{\eta}{\alpha}\,\mathsf{P}_{G}\left(|\Delta\setminus G| \geq k_{1}\right) \leq \eta,
\end{aligned}   (191)

where the last two inequalities follow because, for any G ∈ U_{k₁}(B), the event {Δ = B} implies |Δ∖G| ≥ k₁, and the error constraint (9) gives 𝖯_G(|Δ∖G| ≥ k₁) ≤ α. Therefore, for every η ∈ (0,1),

\limsup_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \leq |U_{k_{1}}(B)|\,\eta.   (192)

    In the same way, we show that

\limsup_{\beta\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \leq |Y_{k_{2}}(B)|\,\eta.   (193)

Letting η → 0, we prove (188).

    In order to show (189), we observe that by decomposition (143) we have

\begin{aligned}
\frac{1}{T}\,\Lambda^{R}_{A,G}(T) &= \sum_{i\in A\setminus G}\left(\frac{\tilde{\Lambda}^{R}_{i}(T)}{T} + I_{i}\,\pi^{R}_{i}(T)\right), \quad \forall\, G\in U_{k_{1}}(B), \\
\frac{1}{T}\,\Lambda^{R}_{A,G}(T) &= \sum_{j\in G\setminus A}\left(-\frac{\tilde{\Lambda}^{R}_{j}(T)}{T} + J_{j}\,\pi^{R}_{j}(T)\right), \quad \forall\, G\in Y_{k_{2}}(B),
\end{aligned}   (194)

which further implies that

\begin{aligned}
\frac{1}{T}\min&\left\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ \rho\,\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\right\} \\
&\leq \max\{1,\rho\}\sum_{i\in[M]}\frac{|\tilde{\Lambda}^{R}_{i}(T)|}{T} + \min\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T),\ \rho\,\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T)\right\}.
\end{aligned}   (195)

We note that, for any fixed η > 0,

\rho(\alpha,\eta) = \frac{(\log(\eta)/\log(\alpha)-1)\log(\alpha)}{(\log(\eta)/\log(\beta)-1)\log(\beta)} \to r, \quad \mbox{as }\ \alpha\to 0,   (196)

where r is defined in (50).
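As a numerical sanity check of the limit (196), suppose, as is implicit in (196), that α and β tend to zero with log(α)/log(β) → r; the coupling β = α^{1/r} and the values of η and r below are hypothetical illustration choices.

import numpy as np

# rho(alpha, eta) -> r as alpha -> 0, when beta is coupled to alpha so that
# log(alpha)/log(beta) = r; here r = 2 and eta = 0.5 are hypothetical values.
eta, r = 0.5, 2.0
for alpha in [1e-2, 1e-4, 1e-8, 1e-16]:
    beta = alpha ** (1.0 / r)
    rho = np.log(eta / alpha) / np.log(eta / beta)
    print(f"alpha={alpha:.0e}  rho={rho:.4f}")   # approaches r = 2

Also, by Lemma B.2 we have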

\min\left\{\min_{G\in U_{k_{1}}(B)}\sum_{i\in A\setminus G} I_{i}\,\pi^{R}_{i}(T),\ r\min_{G\in Y_{k_{2}}(B)}\sum_{j\in G\setminus A} J_{j}\,\pi^{R}_{j}(T)\right\} \leq v_{A}(k_{1},k_{2},K,r).   (197)

    Therefore,

\frac{1}{T}\min\left\{\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T),\ r\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T)\right\} \leq \max\{1,r\}\sum_{i\in[M]}\frac{|\tilde{\Lambda}^{R}_{i}(T)|}{T} + v_{A}(k_{1},k_{2},K,r).   (198)

    As a result, the probability in (189) is bounded above by

\mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log(\alpha)| + \log(\eta)\right),   (199)

    where

\xi(n) := \left(\max\{1,r\}\sum_{i\in[M]}\frac{|\tilde{\Lambda}^{R}_{i}(n)|}{n} + v_{A}(k_{1},k_{2},K,r)\right) n, \quad n\in\mathbb{N},   (200)

    and it suffices to show that

\limsup_{\alpha\to 0}\ \sup_{(R,T,\Delta)\in\mathcal{C}(\alpha,\beta;k_{1},k_{2},K)} \mathsf{P}_{A}\left(T \leq q f(\alpha),\ \xi(T) \geq |\log(\alpha)| + \log(\eta)\right) = 0.   (201)

    In order to show (201), it suffices to prove that

\lim_{n\to\infty}\frac{\xi(n)}{n} = v_{A}(k_{1},k_{2},K,r), \quad \mbox{a.s.},   (202)

and then the claim follows by [27, Lemma F.1]. Indeed, by [29, Theorem 2.19] and the moment assumption (12), we have

\lim_{n\to\infty}\frac{\tilde{\Lambda}^{R}_{i}(n)}{n} = 0, \quad \mbox{a.s.} \quad \forall\, i\in[M],   (203)

which, combined with (200), implies (202).

The other cases: For the cases |B| ≥ k₁, |Bᶜ| < k₂ and |B| < k₁, |Bᶜ| ≥ k₂, we note the following.

• In the case |B| ≥ k₁, |Bᶜ| < k₂, we have Y_{k₂}(B) = ∅, and for any B ∈ H_{k₁,k₂}(A),

\begin{aligned}
\mathsf{P}_{A}(\Delta=B) \leq\; & \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in U_{k_{1}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\alpha}\right),\ \Delta=B\right).
\end{aligned}   (204)

This also applies to the case A = [M].

• In the case |B| < k₁, |Bᶜ| ≥ k₂, we have U_{k₁}(B) = ∅, and for any B ∈ H_{k₁,k₂}(A),

\begin{aligned}
\mathsf{P}_{A}(\Delta=B) \leq\; & \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) < \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right) \\
& + \mathsf{P}_{A}\left(\min_{G\in Y_{k_{2}}(B)}\Lambda^{R}_{A,G}(T) \geq \log\left(\frac{\eta}{\beta}\right),\ \Delta=B\right).
\end{aligned}   (205)

This also applies to the case A = ∅.

In both cases, we deduce the claim following the same reasoning as in the main case (|B| ≥ k₁, |Bᶜ| ≥ k₂). ∎

    Appendix C

In Appendix C, we prove Theorems V.1 and V.2, Theorems VI.1 and VI.2, and Proposition VI.1. Throughout Appendix C, we fix A ⊆ [M].

    Proof:

    In view of the asymptotic lower bound in Theorem IV.1, it suffices to show that

\mathsf{E}_{A}[T^{R}] \lesssim \frac{|\log(\alpha)|}{V(k,K,\boldsymbol{F}(A))}, \quad \mbox{as }\ \alpha\to 0,   (206)

where T^R is the stopping rule defined in (63). For d > 0 selected according to (65), it suffices to show that for an arbitrarily small ϵ > 0 we have

\mathsf{E}_{A}[T^{R}] \lesssim d\Big/\left(\sum_{i=1}^{k}(c^{*}_{(i)}(A)-\epsilon)F_{i}(A) - \epsilon\right), \quad \mbox{as }\ d\to\infty,   (207)

where the c*_{(i)}(A) are defined in (66); by letting ϵ → 0 we then obtain (206). To this end, we fix ϵ > 0 small enough and we set

L_{\epsilon}(d) := \frac{d}{\sum_{i=1}^{k}(c^{*}_{(i)}(A)-\epsilon)F_{i}(A) - \epsilon},   (208)

for which it holds

\mathsf{E}_{A}[T^{R}] \leq L_{\epsilon}(d) + \sum_{n>L_{\epsilon}(d)} \mathsf{P}_{A}(T^{R} > n).   (209)

On the event {T^R > n}, there exists U ⊆ [M] with |U| = k such that

\sum_{i\in A\cap U}\Lambda^{R}_{i}(n) - \sum_{j\in A^{c}\cap U}\Lambda^{R}_{j}(n) \leq \sum_{i\in U}|\Lambda^{R}_{i}(n)| < d.   (210)

Moreover, for every n > L_ϵ(d) we have

d < n\left(\sum_{i=1}^{k}(c^{*}_{(i)}(A)-\epsilon)F_{i}(A) - \epsilon\right).

By Lemma III.1, for every set U ⊆ [M] with |U| = k, we have

V(k,K,\boldsymbol{F}(A)) = \sum_{i=1}^{k} c^{*}_{(i)}(A)\,F_{i}(A) \leq \sum_{i\in A\cap U} c^{*}_{i}(A)\,I_{i} + \sum_{j\in A^{c}\cap U} c^{*}_{j}(A)\,J_{j},

which further implies that there is an ϵ′ > 0, sufficiently smaller than ϵ, such that

d < n\left(\sum_{i\in A\cap U}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} + \sum_{j\in A^{c}\cap U}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right).

Therefore, for every n > L_ϵ(d) the event {T^R > n} is included in the event

\left\{\sum_{i\in A\cap U}\Lambda^{R}_{i}(n) - \sum_{j\in A^{c}\cap U}\Lambda^{R}_{j}(n) < n\left(\sum_{i\in A\cap U}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} + \sum_{j\in A^{c}\cap U}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right)\right\}.

In view of (209) and by application of Boole’s inequality over all U ⊆ [M] such that |U| = k, we deduce that

\mathsf{E}_{A}[T^{R}] \leq L_{\epsilon}(d) + \sum_{\{U\subseteq[M]:\,|U|=k\}}\ \sum_{n=1}^{\infty} S(n;U),   (211)

where

\begin{aligned}
S(n;U) :=\; & \sum_{i\in A\cap U} \mathsf{P}_{A}\left(\frac{\Lambda^{R}_{i}(n)}{n} < (c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}/k\right) \\
& + \sum_{j\in A^{c}\cap U} \mathsf{P}_{A}\left(-\frac{\Lambda^{R}_{j}(n)}{n} < (c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}/k\right).
\end{aligned}   (212)

By assumption, for any i ∈ A, j ∉ A, and any ϵ > 0, we have

\begin{aligned}
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{i}(n) < c_{i}^{*}(A)-\epsilon\right) &< \infty, \\
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{j}(n) < c_{j}^{*}(A)-\epsilon\right) &< \infty,
\end{aligned}   (213)

and thus, by [9, Lemma A.2 (i), (ii)] with ρ_i = c_i*(A) − ϵ′ and ρ_j = c_j*(A) − ϵ′, it follows that all the series in (211) converge. Hence, letting d → ∞, we obtain (207). ∎

    Proof:

    In view of the asymptotic lower bound in Theorem IV.3, it suffices to show that

\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \lesssim \frac{|\log(\alpha)|}{v_{A}(k_{1},k_{2},K,r)}, \quad \mbox{as }\ \alpha\to 0,   (214)

where T^R_leap is the stopping rule defined in (73). Without loss of generality, we focus on the case 0 < |A| < M, in which v_A(k₁,k₂,K,r) equals (53) of Theorem IV.3; the other cases follow by the same approach. For a, b > 0 selected according to (75), it suffices to show that for any arbitrarily small ϵ > 0 we have

\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \lesssim b\Big/\left(\sum_{i=1}^{k_{1}-l_{A}}(c^{*}_{\langle i\rangle}(A)-\epsilon)I_{i}(A) - \epsilon\right), \quad \mbox{as }\ b\to\infty,   (215)

where c*_{⟨i⟩}(A) and c*_{{j}}(A) are defined as in (80)–(81); by letting ϵ → 0 we then obtain (214). To this end, we fix ϵ > 0 small enough and we set

L_{\epsilon}(a,b) := \max\left\{\frac{b}{\sum_{i=1}^{k_{1}-l_{A}}(c^{*}_{\langle i\rangle}(A)-\epsilon)I_{i}(A) - \epsilon},\ \frac{a}{\sum_{j=1}^{k_{2}}(c^{*}_{\{j\}}(A)-\epsilon)J_{l_{A}+j}(A) - \epsilon}\right\},   (216)

and we observe that

\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \leq L_{\epsilon}(a,b) + \sum_{n>L_{\epsilon}(a,b)} \mathsf{P}_{A}\left(T^{R}_{\mathrm{leap}} > n\right).   (217)

For any n ∈ ℕ, on the event {T^R_leap > n} there exist U₁ ⊆ A with |U₁| = k₁ − l_A, and U₂ ⊆ 𝑱_{l_A}(A) with |U₂| = k₂, such that

\sum_{i\in U_{1}} \hat{\Lambda}^{R}_{i}(n) < b, \quad \mbox{or} \quad \sum_{i\in U_{2}} \check{\Lambda}^{R}_{i}(n) < a.   (218)

Moreover, for every n > L_ϵ(a,b) we have

b < n\left(\sum_{i=1}^{k_{1}-l_{A}}(c^{*}_{\langle i\rangle}(A)-\epsilon)I_{i}(A) - \epsilon\right), \qquad a < n\left(\sum_{j=1}^{k_{2}}(c^{*}_{\{j\}}(A)-\epsilon)J_{l_{A}+j}(A) - \epsilon\right).   (219)

For any sets U₁, U₂ as described above, we have

\sum_{i=1}^{k_{1}-l_{A}} c^{*}_{\langle i\rangle}(A)\,I_{i}(A) \leq \sum_{i\in U_{1}} c^{*}_{i}(A)\,I_{i}, \qquad \sum_{j=1}^{k_{2}} c^{*}_{\{j\}}(A)\,J_{j+l_{A}}(A) \leq \sum_{j\in U_{2}} c^{*}_{j}(A)\,J_{j}.

Thus, there is an ϵ′ > 0, sufficiently smaller than ϵ, such that

b < n\left(\sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right), \qquad a < n\left(\sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right).   (220)

Therefore, for every n > L_ϵ(a,b) the event {T^R_leap > n} is included in the event

\left\{\sum_{i\in U_{1}}\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < \sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right\} \bigcup \left\{\sum_{j\in U_{2}}\frac{\check{\Lambda}^{R}_{j}(n)}{n} < \sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right\}.

In view of (217) and by application of Boole’s inequality over all U₁ ⊆ A with |U₁| = k₁ − l_A and all U₂ ⊆ 𝑱_{l_A}(A) with |U₂| = k₂, we deduce that

\begin{aligned}
\mathsf{E}_{A}\left[T^{R}_{\mathrm{leap}}\right] \leq L_{\epsilon}(a,b) &+ \sum_{U_{1}\subseteq A:\,|U_{1}|=k_{1}-l_{A}}\ \sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\sum_{i\in U_{1}}\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < \sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right) \\
&+ \sum_{U_{2}\subseteq\boldsymbol{J_{l_{A}}}(A):\,|U_{2}|=k_{2}}\ \sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\sum_{j\in U_{2}}\frac{\check{\Lambda}^{R}_{j}(n)}{n} < \sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right).
\end{aligned}   (221)

    In order to prove the claim, it suffices to show the summability of the series in (221). By Boole’s inequality we have

\begin{aligned}
\mathsf{P}_{A}\left(\sum_{i\in U_{1}}\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < \sum_{i\in U_{1}}(c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}\right) &\leq \sum_{i\in U_{1}} \mathsf{P}_{A}\left(\frac{\hat{\Lambda}^{R}_{i}(n)}{n} < (c^{*}_{i}(A)-\epsilon^{\prime})I_{i} - \epsilon^{\prime}/k_{1}\right), \\
\mathsf{P}_{A}\left(\sum_{j\in U_{2}}\frac{\check{\Lambda}^{R}_{j}(n)}{n} < \sum_{j\in U_{2}}(c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}\right) &\leq \sum_{j\in U_{2}} \mathsf{P}_{A}\left(\frac{\check{\Lambda}^{R}_{j}(n)}{n} < (c^{*}_{j}(A)-\epsilon^{\prime})J_{j} - \epsilon^{\prime}/k_{2}\right).
\end{aligned}   (222)

By assumption, for any i ∈ A, j ∉ A, and any ϵ > 0 we have

\begin{aligned}
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{i}(n) < c_{i}^{*}(A)-\epsilon\right) &< \infty, \\
\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi^{R}_{j}(n) < c_{j}^{*}(A)-\epsilon\right) &< \infty,
\end{aligned}   (223)

and thus, by [9, Lemma A.2 (i), (ii)] with ρ_i = c_i*(A) − ϵ′ and ρ_j = c_j*(A) − ϵ′, it follows that all the series in (222) converge. Hence, letting b → ∞, we obtain (215). ∎

    Proof:

By the definition of 𝔇_n^R, we have

\mathsf{P}_{A}\left(\sigma_{A}^{R} > n\right) \leq \sum_{i\in A} \mathsf{P}_{A}\left(\exists\, m\geq n:\ \Lambda^{R}_{i}(m) < 0\right) + \sum_{j\notin A} \mathsf{P}_{A}\left(\exists\, m\geq n:\ \Lambda^{R}_{j}(m) \geq 0\right),   (224)

which by Boole’s inequality is further bounded by

\mathsf{P}_{A}\left(\sigma_{A}^{R} > n\right) \leq \sum_{i\in A}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{i}(m) < 0\right) + \sum_{j\notin A}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{j}(m) \geq 0\right).   (225)

Therefore, in order to prove that 𝖤_A[σ_A^R] < ∞, it suffices to show that

\begin{aligned}
\sum_{n=1}^{\infty}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{i}(m) < 0\right) = \sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0\right) &< \infty, \quad \forall\, i\in A, \\
\sum_{n=1}^{\infty}\sum_{m=n}^{\infty} \mathsf{P}_{A}\left(\Lambda^{R}_{j}(m) \geq 0\right) = \sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\Lambda^{R}_{j}(n) \geq 0\right) &< \infty, \quad \forall\, j\in A^{c}.
\end{aligned}   (226)

We prove the case i ∈ A, as the case j ∈ Aᶜ follows in the same way. We note that

\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0\right) \leq \mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0,\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) + \mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right).   (227)

    In view of assumption (91), in order to prove (226), it suffices to show that

\sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0,\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) < \infty,   (228)

    and since

\begin{aligned}
\mathsf{P}_{A}\left(\Lambda^{R}_{i}(n) < 0,\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) &= \mathsf{P}_{A}\left(\tilde{\Lambda}^{R}_{i}(n) < -I_{i}\, n\,\pi^{R}_{i}(n),\ \pi^{R}_{i}(n) \geq C\, n^{-\delta}\right) \\
&\leq \mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C\, I_{i}\, n^{1-\delta}\right),
\end{aligned}   (229)

    it suffices to show

\sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C I_{i} n^{1-\delta}\right) < \infty.   (230)

    By Markov’s inequality

\mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C I_{i} n^{1-\delta}\right) \leq \frac{\mathsf{E}_{A}\left[|\tilde{\Lambda}^{R}_{i}(n)|^{\mathfrak{p}}\right]}{C^{\mathfrak{p}} I^{\mathfrak{p}}_{i}\, n^{(1-\delta)\mathfrak{p}}}.   (231)

For each i ∈ [M], {Λ̃_i^R(n) : n ≥ 0} is an ℱ^R(n)-martingale; thus, by Rosenthal’s inequality [29, Theorem 2.12], there is a constant C₀ > 0 such that

\mathsf{E}_{A}\left[|\tilde{\Lambda}^{R}_{i}(n)|^{\mathfrak{p}}\right] \leq C_{0}\, n^{\mathfrak{p}/2},   (232)

    and as a result,

\mathsf{P}_{A}\left(|\tilde{\Lambda}^{R}_{i}(n)| > C I_{i} n^{1-\delta}\right) \leq \frac{C_{0}}{C^{\mathfrak{p}} I^{\mathfrak{p}}_{i}}\,\frac{n^{\mathfrak{p}/2}}{n^{(1-\delta)\mathfrak{p}}}.   (233)

For ϵ > 0 small enough so that δ < 1/2 − (2+ϵ)/𝔭, it holds that

\frac{n^{\mathfrak{p}/2}}{n^{(1-\delta)\mathfrak{p}}} < \frac{1}{n^{2+\epsilon}},   (234)

and, since Σ_{n≥1} n · n^{−(2+ϵ)} = Σ_{n≥1} n^{−(1+ϵ)} < ∞, this proves the claim (230). ∎

    Proof:

According to Theorems V.1 and V.2, it suffices to show that for each i ∈ [M]

\sum_{n=1}^{\infty} \mathsf{P}_{A}\left(\pi_{i}^{R}(n) < c^{*}_{i}(A)-\epsilon\right) < \infty, \quad \forall\, \epsilon>0,   (235)

where (c₁*(A),…,c_M*(A)) is defined according to Definitions V.1 and V.2, respectively.

By the definition (92) of a probabilistic sampling rule, R(n) is conditionally independent of ℱ_{n−1}^R given 𝔇_{n−1}^R. Thus, by [30, Prop. 6.13], there is a measurable function h : ℕ × 2^[M] × [0,1] → 2^[M] such that

R(n) = h(n, \mathfrak{D}^{R}_{n-1}, Z_{0}(n-1)), \quad n\in\mathbb{N},   (236)

where {Z₀(n) : n ∈ ℕ₀} is a sequence of iid random variables, uniformly distributed on [0,1]. Consequently, for each i ∈ [M] there is a measurable function h_i : ℕ × 2^[M] × [0,1] → {0,1} such that

R_{i}(n) = h_{i}(n, \mathfrak{D}^{R}_{n-1}, Z_{0}(n-1)), \quad n\in\mathbb{N}.   (237)
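To make the representation (236)–(237) concrete, the following sketch implements a probabilistic sampling rule as a measurable function of a single uniform variable via an inverse-CDF lookup over candidate subsets; the subsets, their probabilities (playing the role of the c_i^R(n, D) in (246)), and the parameters are hypothetical.

import numpy as np

# A sketch of h in (236): one uniform variable Z0 drives the choice of the
# sampled subset R(n). Candidate subsets and weights are hypothetical.
def h(n, D, z, subsets, probs):
    cdf = np.cumsum(probs)                       # inverse-CDF lookup over subsets
    return subsets[int(np.searchsorted(cdf, z))]

subsets = [frozenset({0}), frozenset({1}), frozenset({2})]   # M = 3, K = 1
probs = [0.5, 0.3, 0.2]
rng = np.random.default_rng(3)
draws = [h(n, frozenset(), rng.random(), subsets, probs) for n in range(10_000)]
print(np.mean([0 in s for s in draws]))   # ~0.5, matching (246) in expectation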

We fix ϵ > 0 and i ∈ [M]. For every n ∈ ℕ, we have

\left\{\pi_{i}^{R}(n) < c^{*}_{i}(A)-\epsilon\right\} = \left\{\sum_{m=1}^{n}\left(R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))\right) + \sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon)\right\},   (238)

and as a result

\begin{aligned}
\mathsf{P}_{A}\left(\pi_{i}^{R}(n) < c^{*}_{i}(A)-\epsilon\right) \leq\; & \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))\right) < -n\,\epsilon/2\right) \\
& + \mathsf{P}_{A}\left(\sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon/2)\right).
\end{aligned}   (239)

    For the first term on the right hand side of (239) we have

\begin{aligned}
\mathsf{P}_{A}&\left(\sum_{m=1}^{n}\left(R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))\right) < -n\,\epsilon/2\right) \\
&\leq \mathsf{P}_{A}\left(\sum_{m=1}^{n}|R_{i}(m) - h_{i}(m,A,Z_{0}(m-1))| > n\,\epsilon/2,\ \sigma^{R}_{A}\leq n\right) + \mathsf{P}_{A}\left(\sigma^{R}_{A} \geq n\right) \\
&\leq \mathsf{P}_{A}\left(\sigma^{R}_{A} \geq n\,\epsilon/2\right) + \mathsf{P}_{A}\left(\sigma^{R}_{A} \geq n\right),   (240)
\end{aligned}

where we used the fact that R_i(n) = h_i(n, A, Z₀(n−1)) for all n ≥ σ_A^R. Since R is consistent, both terms on the right-hand side of (240) are summable.

    For the second term on the right hand side of (239) we have

\left\{\sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon/2)\right\} = \left\{\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < \sum_{m=1}^{n}\left(c^{*}_{i}(A) - c_{i}^{R}(m,A)\right) - n\,\epsilon/2\right\}.   (241)

By the (general form of the) Stolz–Cesàro theorem [31],

\limsup_{n\to\infty} \frac{\sum_{m=1}^{n}\left(c^{*}_{i}(A) - c_{i}^{R}(m,A)\right)}{n} \leq \limsup_{n\to\infty}\left(c^{*}_{i}(A) - c_{i}^{R}(n,A)\right) = c^{*}_{i}(A) - \liminf_{n\to\infty} c_{i}^{R}(n,A) \leq 0,   (242)

where the last inequality follows from assumption (95). Therefore, there exists an n₀ > 0 such that for all n ≥ n₀ it holds that

\frac{1}{n}\sum_{m=1}^{n}\left(c^{*}_{i}(A) - c_{i}^{R}(m,A)\right) < \frac{\epsilon}{4}.   (243)

    Hence,

\begin{aligned}
\sum_{n=1}^{\infty} \mathsf{P}_{A}&\left(\sum_{m=1}^{n} h_{i}(m,A,Z_{0}(m-1)) < n\,(c^{*}_{i}(A)-\epsilon/2)\right) \\
&\leq n_{0} + \sum_{n=n_{0}}^{\infty} \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < -n\,\epsilon/4\right),
\end{aligned}   (244)

and, thus, it suffices to show that

\sum_{n=n_{0}}^{\infty} \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < -n\,\epsilon/4\right) < \infty.   (245)

    By (93) and (237), it holds

\mathsf{E}_{A}\left[h_{i}(n,A,Z_{0}(n-1))\,|\,\mathcal{F}^{R}_{n-1}\right] = c_{i}^{R}(n,A), \quad \forall\, n\in\mathbb{N},   (246)

which shows that {h_i(n, A, Z₀(n−1)) − c_i^R(n,A) : n ≥ 1} is a martingale difference sequence whose absolute value is uniformly bounded by 1. By the Azuma–Hoeffding inequality, we deduce that

\mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(h_{i}(m,A,Z_{0}(m-1)) - c_{i}^{R}(m,A)\right) < -n\,\epsilon/4\right) \leq e^{-\gamma n},   (247)

where γ > 0 is a constant, which proves (245). ∎
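The Azuma–Hoeffding step can also be checked by simulation. In the following sketch, the Bernoulli differences d_m = R_m − p, with |d_m| ≤ 1, are a hypothetical instance of a bounded martingale difference sequence, and p, ϵ, n are hypothetical values; the bound used is the standard exp(−t²/(2n)) at deviation level t = nϵ/4, as in (247).

import numpy as np

# Empirical tail vs. the Azuma-Hoeffding bound for bounded differences.
rng = np.random.default_rng(4)
p, eps, n, reps = 0.4, 0.2, 500, 20_000
d = (rng.random((reps, n)) < p).astype(float) - p   # martingale differences
emp = (d.sum(axis=1) < -n * eps / 4).mean()         # empirical tail, t = n*eps/4
bound = np.exp(-(n * eps / 4) ** 2 / (2 * n))       # exp(-t^2/(2n)) = exp(-n*eps^2/32)
print(emp, bound)                                   # emp <= bound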

    Proof:

We show that the suggested sampling rule R is consistent for part (ii). According to Theorem VI.1, it suffices to show that, for each i ∈ [M],

\sum_{n=1}^{\infty} n\,\mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right) < \infty,   (248)

for some C > 0 and δ ∈ (0, 1/2 − 2/𝔭). We choose C < C_p. We fix i ∈ [M], and for every n ∈ ℕ we notice that

\begin{aligned}
\left\{\pi_{i}^{R}(n) < C\, n^{-\delta}\right\} &= \left\{\sum_{m=1}^{n} R_{i}(m) < C n^{1-\delta}\right\} \\
&= \left\{\sum_{m=1}^{n}\left(R_{i}(m) - c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1})\right) + \sum_{m=1}^{n} c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1}) < C n^{1-\delta}\right\}.
\end{aligned}   (249)

By (99), we have c_i^R(m, 𝔇_{m−1}^R) ≥ C_p/m^δ for all m ∈ {1,…,n}, and we deduce that

\sum_{m=1}^{n} c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1}) \geq \sum_{m=1}^{n}\frac{C_{p}}{m^{\delta}} \geq C_{p}\, n^{1-\delta},   (250)

    which further implies

\mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right) \leq \mathsf{P}_{A}\left(\sum_{m=1}^{n}\left(R_{i}(m) - c^{R}_{i}(m,\mathfrak{D}^{R}_{m-1})\right) < -(C_{p}-C)\, n^{1-\delta}\right).   (251)

    In view of (93), it holds

\mathsf{E}_{A}\left[R_{i}(n)\,|\,\mathcal{F}^{R}_{n-1}\right] = c^{R}_{i}(n,\mathfrak{D}^{R}_{n-1}), \quad \forall\, n\in\mathbb{N},   (252)

which shows that {R_i(n) − c_i^R(n, 𝔇_{n−1}^R) : n ∈ ℕ} is a martingale difference sequence whose absolute value is uniformly bounded by 1. Thus, by the Azuma–Hoeffding inequality, we deduce that

\mathsf{P}_{A}\left(\pi^{R}_{i}(n) < C\, n^{-\delta}\right) \leq e^{-\zeta n^{1-2\delta}},

where ζ > 0 is a constant. Therefore, we deduce (248) by noting that there is an n₀ > 0 such that, for all n ≥ n₀,

n^{1-2\delta} \geq \frac{3}{\zeta}\ln(n),

so that n\,\mathsf{P}_{A}(\pi^{R}_{i}(n) < C\, n^{-\delta}) \leq n^{-2} for all n ≥ n₀, which is summable. ∎