
Asymptotically Optimal Search for a Change Point Anomaly under a Composite Hypothesis Model

Liad Lea Didi, Tomer Gafni, Kobi Cohen (Senior Member, IEEE). Liad Lea Didi, Tomer Gafni, and Kobi Cohen are with the School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel. Email: [email protected], [email protected], [email protected]. A short version of this paper, which introduces the algorithm and preliminary simulation results, was presented at the annual Allerton Conference on Communication, Control, and Computing (Allerton) 2024 [1]. In this journal version we include: (i) a detailed development and description of the algorithm; (ii) a deep theoretical analysis of the algorithm with detailed proofs; (iii) more extensive simulation results; and (iv) a detailed discussion of the results and a comprehensive comparison with the existing literature. This work was supported by the Israel Science Foundation under Grant 2640/20.
Abstract

We address the problem of searching for a change point in an anomalous process among a finite set of $M$ processes. Specifically, we address a composite hypothesis model in which each process generates measurements following a common distribution with an unknown parameter (vector). This parameter belongs to either a normal or abnormal space depending on the current state of the process. Before the change point, all processes, including the anomalous one, are in a normal state; after the change point, the anomalous process transitions to an abnormal state. Our goal is to design a sequential search strategy that minimizes the Bayes risk by balancing sample complexity and detection accuracy.

We propose a deterministic search algorithm with the following notable properties. First, we analytically demonstrate that when the distributions of both normal and abnormal processes are unknown, the algorithm is asymptotically optimal in minimizing the Bayes risk as the error probability approaches zero. Second, when the parameter under the null hypothesis is known, the algorithm achieves asymptotic optimality with an improved detection time that exploits knowledge of the true normal state. Simulation results are presented to validate the theoretical findings.

Index Terms:
Anomaly detection, change point detection, controlled sensing, active hypothesis testing.

I Introduction

We consider the problem of detecting a change point that occurs in an anomalous process among $M$ processes. At each time step, the decision maker observes a subset of $K$ processes ($1\leq K\leq M$). Using terminology from target search, we refer to these processes as cells, with the anomalous process after the change point being the target, located in one of the $M$ cells. We adopt a composite hypothesis model, where the observations follow a distribution dependent on an unknown parameter (vector). When probing a specific cell, random continuous values are measured, drawn from a common distribution $f$. This distribution $f$ has an unknown parameter that belongs to either $\Theta^{(0)}$ or $\Theta^{(1)}$, depending on whether the target is absent or present, respectively. Before the change point, all cells are in a benign state, i.e., with parameters in $\Theta^{(0)}$. After the change point, the distribution of the observations for one of the cells may change, with its parameter shifting to $\Theta^{(1)}$. We assume the change point is an unknown deterministic parameter. The policy for selecting which subset of cells to probe at each time step must be designed jointly with the change detection algorithm. Therefore, the objective is to design a sequential search strategy that minimizes the Bayes risk, accounting for sample complexity and detection accuracy, by determining both the subset of cells to observe and when to terminate the search and declare the anomalous cell.

I-A Main Results

The problem of searching for a change point anomaly, as considered in this paper, is closely related to both the sequential design of experiments and change point detection. Within the sequential design of experiments, search problems involving a finite set of processes require critical decisions about which processes to sample sequentially at each time step. This process continues until testing concludes, and the anomaly’s location is reliably identified. This class of problems has been referred to in recent years as controlled sensing for hypothesis testing [2] or active hypothesis testing [3]. These frameworks trace their origins to Chernoff’s seminal work on sequential design of experiments [4]. Chernoff primarily addressed binary hypothesis testing, introducing a randomized test strategy, the Chernoff test, which is asymptotically optimal as the error probability approaches zero. Variations and extensions of this framework are discussed in Section I-B. However, these approaches typically assume a fixed distribution setting, where the anomalous state is present from the outset.

In contrast, the problem addressed in this paper involves a target process that transitions from a benign to an anomalous state at an unknown change point. This aspect aligns the problem with change point detection, where the objective is to identify a change in the distribution as quickly as possible after it occurs, while maintaining a low false alarm rate to avoid premature declarations. The CUSUM algorithm, introduced in [5], serves as a foundational method in change point detection for single processes, achieving optimality under certain performance criteria. Variants of this problem are also discussed in Section I-B. Despite these connections, the specific problem of searching for a change point anomaly across multiple processes under a composite hypothesis model considered here has not been analyzed before. Below, we summarize the main contributions of this work.

First, in terms of algorithm development, we propose a novel algorithm, Searching for Change Point Anomaly (SCPA), to address the problem at hand. The algorithm is deterministic, admits a low-complexity implementation, and operates through three distinct phases: exploration, exploitation, and sequential testing. In the exploration phase, all cells are probed in a round-robin fashion to estimate the unknown parameter. During the exploitation phase, probing focuses on the cell where the change point is most likely to have occurred. Finally, in the sequential testing phase, this cell is sampled until sufficient confidence is reached to declare it as anomalous. A notable challenge arises from early observations in the pre-change regime, which could delay detection. This is mitigated by strategically resetting the statistic measure. Detailed descriptions of the algorithm are provided in Section III.

In terms of theoretical performance analysis, we establish the asymptotic optimality of SCPA as the error probability approaches zero. Traditional change point analysis often restricts the change point $\tau_{c}$ by assuming prior distributions on $\tau_{c}$ or adopting weaker performance metrics, such as fixed, non-vanishing error rates. Here, we introduce a novel approach to tackle the problem by setting $\tau_{c}$ in a manner that ensures robust guarantees for vanishing error probabilities. Specifically, we impose an asymptotic constraint on $\tau_{c}$, enabling it to be arbitrarily large while ensuring it occurs within a timeframe that does not excessively delay detection. This approach enables strong guarantees on the vanishing of error probabilities, aligning with Chernoff's classical sequential experimental design [4], and addresses gaps in existing approaches where globally optimal solutions are unavailable under general settings [6, 7]. Current modifications often use techniques like increasing curved boundaries [8] or constraining local false alarm probabilities [7]. In contrast, our analysis provides explicit conditions under which SCPA achieves asymptotic optimality. This setting is particularly applicable to real-world scenarios, such as cyber anomaly detection, where algorithms operate during intervals where anomalies are likely to manifest (e.g., before corrective actions like patch installations are implemented). More generally, it applies to systems operating within time intervals where the focus is on ensuring vanishing error probabilities within periods after the change point. This robust framework contrasts with approaches accepting fixed error rates across the entire time horizon.

We demonstrate the asymptotic optimality of SCPA in two scenarios: (i) when the distributions of normal and anomalous processes are unknown, and (ii) when the parameter value under the null hypothesis is known and identical across all non-anomalous processes. In both cases, our algorithm matches the performance of the active hypothesis testing setting in [9], which is a special case of our problem where the anomaly is present from the start without temporal change. Finally, we present simulation results that validate our theoretical findings and demonstrate the practical effectiveness of the SCPA algorithm.

I-B Related Work

As highlighted earlier, our problem is closely related to both sequential experimental design and change point detection. Sequential experimental design, originally introduced by Chernoff [4], has been extensively studied and expanded upon in works such as [3, 2, 10, 11, 12, 13, 14, 15, 16, 9, 17, 18, 19, 20, 21, 22, 23, 24]. The specific challenge of anomaly detection across multiple processes has been addressed in [25, 20, 14, 16, 21, 22, 23, 10, 12, 9, 13, 15]. The detection of abnormal processes over densities with an unknown parameter was explored in [10], focusing on independent process states across cells. In [25], anomaly detection among $M$ processes was studied, where $K$ processes could be observed at each time step. Observations followed one of two known distributions, depending on whether the process was normal or abnormal. A deterministic search policy was proven asymptotically optimal in minimizing detection time under an error probability constraint. An extension of [25] was considered in [9], where observations followed a common distribution with an unknown parameter, varying based on the process state. A sequential search strategy achieving asymptotic optimality was developed. For Poisson point processes with unknown rates, [13] established an asymptotically optimal policy for single-location probing, requiring a randomized selection rule and linear exploration time. Non-parametric detection over finite observation spaces was studied in [26], showing asymptotic optimality with logarithmic exploration time when the null distribution is known. Extensions to M-ary and composite hypothesis testing for single processes were investigated in [27, 28]. In contrast to these anomaly detection frameworks with fixed distribution settings, our setting addresses a dynamic scenario involving a change point where the anomalous process transitions between states. Additional works on sequential detection across independent processes include [29, 30, 31, 10, 32, 33].

This work also relates to the field of change point detection. Page [34] introduced CUSUM charts, which leverage cumulative data for enhanced detection efficiency. Shiryaev [35] developed a Bayesian framework minimizing average detection delay with known geometric priors. Lorden [36] proposed a minimax framework optimizing worst-case detection delay, with CUSUM achieving asymptotic optimality. Subsequent improvements to Lorden’s criterion were presented in [37, 38]. Pollak [39] introduced a single-maximization criterion, showing the asymptotic optimality of Shiryaev-Roberts and CUSUM algorithms [28]. Non-Bayesian quickest detection was explored in [40], demonstrating CUSUM’s optimality. In [41], the problem of detecting distribution changes in random vector sequences was considered, addressing unknown post-change parameters influenced by control actions, requiring both a stopping rule and a sequential control policy. [42] addressed streaming data changing from a known to an unknown parametric distribution, combining CUSUM with post-change parameter estimation. Shiryaev [43] examined change detection in a single process under a Bayesian loss framework, while Lorden [44] and Pollak [45] proposed minimax criteria for i.i.d. observations with known pre- and post-change distributions but unknown deterministic change points. Asymptotic performance of Shiryaev’s procedures was studied in [46]. Extensions and variations appear in [6, 47, 48, 49]. Multi-stream settings were addressed in [50], where streams share a known distribution pre-change, and one stream diverges to a different known distribution post-change. [51] generalized to exponential distributions with a common pre-change parameter, where one stream shifts to an unknown parameter. However, these studies focused on declaring a change without pinpointing the affected stream.

II System Model and Problem Formulation

We begin this section by presenting the system model, followed by formulating the objective.

II-A System Model

We consider the problem of detecting a change point in an anomalous process within a finite set of $M$ processes, where $M$ is large yet finite. Initially, before the change point, all processes, including the anomalous one, are in a normal state. After the change point, the anomalous process transitions to an abnormal state. Specifically, the system consists of $M$ discrete time stochastic processes $\{X_{n}^{m}\}_{n\geq 1}$, $1\leq m\leq M$, with $X_{n}^{m}$ taking real values at times $n=1,2,\ldots$. The distribution of the anomalous process changes at an unknown deterministic change point $\tau_{c}$. If process $m$ is the anomalous one, we denote this as hypothesis $H_{m}$ being true. Drawing an analogy from target search, we often refer to the anomalous process as the target, which can be located in any of the $M$ cells.

We focus on a composite hypothesis setting, where the observation distribution depends on an unknown parameter (vector). Let $\theta_{m}$ denote the unknown parameter specifying the observation distribution for cell $m$. When cell $m$ is observed at time $n$, an observation $y_{m}(n)$ is drawn independently from a common density $f(y|\theta_{m})$, where $\theta_{m}\in\Theta$ and $\Theta\subseteq\mathbb{R}$ represents the parameter space for all cells. We define two disjoint open subsets of $\Theta$: $\Theta^{(0)}$ and $\Theta^{(1)}$, representing the parameter spaces for the non-anomalous distribution and the anomalous distribution, respectively. Specifically, under the non-anomalous distribution, $\theta_{m}\in\Theta^{(0)}$, while under the anomalous distribution, $\theta_{m}\in\Theta^{(1)}$. Under hypothesis $H_{m}$, the observations $\{y_{j}(n)\}_{n\geq 1}$, $\forall j\neq m$, are drawn independently from $f(y|\theta_{j})$ with $\theta_{j}\in\Theta^{(0)}$. For cell $m$, the observations $y_{m}(1),y_{m}(2),\ldots,y_{m}(\tau_{c}-1)$ are drawn independently from $f(y|\theta_{m})$ with $\theta_{m}\in\Theta^{(0)}$, and the observations $y_{m}(\tau_{c}),y_{m}(\tau_{c}+1),\ldots$ are drawn independently from $f(y|\theta_{m})$ with $\theta_{m}\in\Theta^{(1)}$. The prior probability that $H_{m}$ is true is denoted by $\pi_{m}$, where $\sum_{m=1}^{M}\pi_{m}=1$. To avoid trivial solutions, we assume $0<\pi_{m}<1$ for all $m$. At each time step, only $K$ cells ($1\leq K\leq M$) can be observed. Let $\mathbf{P}_{m}$ denote the probability measure under hypothesis $H_{m}$, and let $\mathbf{E}_{m}$ represent the expectation operator with respect to the measure $\mathbf{P}_{m}$.
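To make the observation model concrete, the following is a minimal sketch (ours, not from the paper) of how $y_{m}(n)$ is generated under hypothesis $H_{m}$ with change point $\tau_{c}$, using an exponential density as the common $f(y|\theta)$, consistent with the simulations in Section V; the function name `sample_cell` and all parameter names are our own.

```python
import numpy as np

def sample_cell(m, target, n, tau_c, theta0, theta1, rng):
    """Draw y_m(n): exponential with rate theta (density theta*exp(-theta*y)).

    Cell `target` switches its parameter from theta0 (in Theta^(0)) to
    theta1 (in Theta^(1)) at the change point tau_c; all other cells keep
    theta0 for all n.
    """
    theta = theta1 if (m == target and n >= tau_c) else theta0
    return rng.exponential(1.0 / theta)  # numpy uses scale = 1/rate

rng = np.random.default_rng(0)
# Example: M = 5 cells, target is cell 2, change point at tau_c = 70.
y = [[sample_cell(m, 2, n, 70, 0.5, 3.0, rng) for m in range(5)]
     for n in range(100)]
```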

The stopping time $\tau$ is defined as the time when the decision-maker concludes the search by declaring the target's location. Let $\delta\in\{1,2,\ldots,M\}$ be a decision rule, where $\delta=m$ indicates that the decision-maker declares $H_{m}$ to be true. Define $\phi(n)\in\{1,2,\ldots,M\}^{K}$ as the selection rule specifying the $K$ cells chosen for observation at time $n$. The time-series vector of selection rules is denoted by $\boldsymbol{\phi}=(\phi(n),\,n=1,2,\ldots)$. Let $\textbf{y}_{\phi(n)}(n)$ represent the vector of observations obtained from the selected cells $\phi(n)$ at time $n$, and let $\textbf{y}(n)=\{\phi(t),\textbf{y}_{\phi(t)}(t)\}_{t=1}^{n}$ represent the collection of all cell selections and corresponding observations up to time $n$. A deterministic selection rule $\phi(n)$ at time $n$ is a mapping from $\textbf{y}(n-1)$ to $\{1,2,\ldots,M\}^{K}$. Alternatively, a randomized selection rule $\phi(n)$ maps $\textbf{y}(n-1)$ to probability mass functions over $\{1,2,\ldots,M\}^{K}$.

An admissible strategy $\Gamma$ for the sequential change point anomaly detection problem is characterized by the tuple $\Gamma=(\tau,\delta,\boldsymbol{\phi})$. Note that we seek a search strategy that does not rely on knowledge of $\tau_{c}$ and assumes no prior knowledge about it.

II-B Objective

We begin by defining the error probabilities associated with a sequential decision strategy $\Gamma$. In classical change point detection involving a single process, the decision rule declares a change has occurred once sufficient evidence is gathered. In this setting, only false alarm events can occur, which correspond to declaring a change before it actually happens. After the change point, all subsequent decisions are correct. The objective in this case is to strike a balance between minimizing the detection delay, i.e., the time elapsed between the actual change point and the declaration time, and minimizing the false alarm probability.

In contrast, for anomaly change point detection involving multiple processes, as considered in this paper, the objective is not only to detect whether a change has occurred but also to identify the anomalous process in which the change occurred. As a result, errors arise from two types of events, depending on whether the change point has occurred at or before the stopping time $\tau$: false alarms and missed detections. The false alarm probability is the probability of declaring an anomaly in one of the $M$ cells before the change point occurs (when all processes are still in their normal states). Conversely, the missed detection probability is the probability of incorrectly identifying a non-anomalous cell as the target after the change point has occurred (when one process is in an anomalous state, but the decision-maker fails to detect it and instead declares another process as anomalous).

Formally, under strategy $\Gamma$ and hypothesis $H_{m}$, the false alarm probability is given by:

$\textbf{P}_{m}(\tau<\tau_{c}\,|\,\Gamma\,;\,\tau_{c})$,  (1)

and the missed detection probability is given by:

$\textbf{P}_{m}(\delta\neq m,\,\tau\geq\tau_{c}\,|\,\Gamma\,;\,\tau_{c})$.  (2)

Hence, the error probability under hypothesis $H_{m}$ is given by:

$\alpha_{m}(\Gamma\,;\,\tau_{c})=\textbf{P}_{m}(\delta\neq m,\,\tau\geq\tau_{c}\,|\,\Gamma\,;\,\tau_{c})+\textbf{P}_{m}(\tau<\tau_{c}\,|\,\Gamma\,;\,\tau_{c})$.  (3)

Finally, the overall error probability under strategy $\Gamma$ is given by:

$P_{e}(\Gamma\,;\,\tau_{c})=\sum_{m=1}^{M}\pi_{m}\alpha_{m}(\Gamma\,;\,\tau_{c})$.  (4)

Note that in the special case of $M=1$, i.e., a single anomalous process, missed detection events do not occur. Consequently, the error probability reduces to the false alarm probability (1).

Let $\textbf{E}\left((\tau-\tau_{c})^{+}\,|\,\Gamma;\tau_{c}\right)=\sum_{m=1}^{M}\pi_{m}\textbf{E}_{m}\left((\tau-\tau_{c})^{+}\,|\,\Gamma;\tau_{c}\right)$ be the average detection delay under $\Gamma$, where $x^{+}=\max\{x,0\}$. We adopt the Bayes risk framework, as utilized in both active hypothesis testing and change point detection formulations [4, 25, 28, 52, 9, 29], by assigning a cost of $c$ for each observation made after the change point and a loss of $1$ for a wrong declaration. The risk under strategy $\Gamma$ when hypothesis $H_{m}$ is true and the change point is $\tau_{c}$ is thus given by:

$R_{m}(\Gamma\,;\,\tau_{c})\triangleq\alpha_{m}(\Gamma\,;\,\tau_{c})+c\,\textbf{E}_{m}\left((\tau-\tau_{c})^{+}\,|\,\Gamma;\tau_{c}\right)$.  (5)

The average risk is given by:

$R(\Gamma\,;\,\tau_{c})=\sum_{m=1}^{M}\pi_{m}R_{m}(\Gamma\,;\,\tau_{c})=P_{e}(\Gamma\,;\,\tau_{c})+c\,\textbf{E}\left((\tau-\tau_{c})^{+}\,|\,\Gamma\,;\tau_{c}\right)$.  (6)

The objective is to find a strategy $\Gamma$ that minimizes the Bayes risk $R(\Gamma\,;\,\tau_{c})$ over all possible realizations of $\tau_{c}$ without knowledge of its value:

$\underset{\Gamma}{\inf}\;R(\Gamma\,;\,\tau_{c})$.  (7)

III The Search Algorithm for Change Point Anomaly (SCPA)

In this section we present the Search algorithm for Change Point Anomaly (SCPA) for solving (7). To streamline the presentation, we describe the algorithm for $K=1$ (i.e., sampling one cell at each time slot), which is common in dynamic search problems (e.g., [53, 26, 9] and subsequent studies). Later, in Section III-B, we explain its extension to $K>1$. We also assume that the parameter space $\Theta$ is finite, consistent with the assumption made in [4] and subsequent studies. Later, in Section IV, we will provide asymptotic analysis of the algorithm performance.

Below we provide the notation used in the algorithm description and throughout the paper. For every $\theta,\varphi\in\Theta$ we define

$\ell_{m}^{(\theta,\varphi)}(t)\triangleq\log\frac{f(y_{m}(t)|\theta)}{f(y_{m}(t)|\varphi)}$,  (8)

as the log-likelihood ratio (LLR) for cell $m$, testing $\theta$ against $\varphi$ at time $t$. The term

$D(\theta||\varphi)\triangleq\textbf{E}_{f(y|\theta)}\left[\log\frac{f(y|\theta)}{f(y|\varphi)}\right]$  (9)

denotes the Kullback-Leibler (KL) divergence between the two distributions, $f(y|\theta)$ and $f(y|\varphi)$.
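Under the exponential model used in Section V, both (8) and (9) admit closed forms; the sketch below (our illustration, with our function names) computes them.

```python
import numpy as np

def llr(y, theta, phi):
    """LLR (8) for one observation of an exponential density f(y|theta):
    log f(y|theta)/f(y|phi) = log(theta/phi) - (theta - phi) * y."""
    return np.log(theta / phi) - (theta - phi) * y

def kl_exp(theta, phi):
    """KL divergence (9) between exponential densities with rates theta, phi:
    D(theta||phi) = log(theta/phi) + phi/theta - 1."""
    return np.log(theta / phi) + phi / theta - 1.0
```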

III-A Description of the Algorithm

The SCPA algorithm is structured into exploration and exploitation epochs, as described below.

  1.

    Phase 1: Sequential Round-Robin Exploration: Observe the cells sequentially in a round-robin manner, collecting $y_{m}(n)$ (the observation from cell $m$ at time $n$), and estimate the cell parameter using the last $N$ observations, $\tilde{\textbf{y}}_{m}\triangleq\{y_{m}(r_{1}),\ldots,y_{m}(r_{N})\}$, through an unconstrained maximum likelihood estimator (MLE)

    $\hat{\theta}_{m}(n)\triangleq\arg\max_{\theta\in\Theta}f(\tilde{\textbf{y}}_{m}|\theta)$.  (10)

    Here, $N$ is a predefined parameter, and its selection will be discussed later in the algorithm description.
    If $|\{m:\hat{\theta}_{m}(n)\not\in\Theta^{(0)}\}|=1$ (i.e., there is exactly one cell suspected of having experienced a change point), proceed to Phase 2. Otherwise, return to Phase 1.

  2.

    Phase 2: Exploitation with Targeted Parameter Estimation: Let $T$ denote the most recent time at which Phase 1 was exited, and let $\hat{m}(n)=\{m:\hat{\theta}_{m}(n)\not\in\Theta^{(0)}\}$ represent the index of the single cell whose MLE falls outside $\Theta^{(0)}$. Observe cell $\hat{m}(n)$. Based on the observations $\overline{\textbf{y}}_{\hat{m}(n)}(n)\triangleq\{y_{\hat{m}(n)}(T+1),\ldots,y_{\hat{m}(n)}(n)\}$, estimate the cell parameter using the unconstrained MLE:

    $\hat{\theta}_{\hat{m}(n)}(n)\triangleq\arg\max_{\theta\in\Theta}f(\overline{\textbf{y}}_{\hat{m}(n)}(n)|\theta)$.  (11)

    If $\hat{\theta}_{\hat{m}(n)}(n)\in\Theta^{(0)}$, return to Phase 1. Otherwise, proceed to Phase 3.

  3.

    Phase 3: Sequential Testing: Let

    $\hat{\theta}_{m}^{(0)}(n)\triangleq\arg\max_{\theta\in\Theta^{(0)}}f(\overline{\textbf{y}}_{m}(n)|\theta)$  (12)

    be the MLE for cell $m$ constrained to the normal parameter space, and

    $S_{m}(n)\triangleq\sum_{t=T+2}^{n}\ell_{m}^{(\hat{\theta}_{m}(t-1),\hat{\theta}_{m}^{(0)}(n))}(t)=\sum_{t=T+2}^{n}\log\frac{f(y_{m}(t)|\hat{\theta}_{m}(t-1))}{f(y_{m}(t)|\hat{\theta}_{m}^{(0)}(n))}$  (13)

    be the sum of the adaptive LLR (SALLR) of cell $m$ at time $n$, used to confirm hypothesis $H_{m}$.

    If $S_{\hat{m}(n)}(n)\geq-\log(c)$, stop the test and declare $\delta=\hat{m}(n)$ as the anomalous cell; otherwise, go to Phase 2.
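To summarize the three phases, the following is a minimal sketch of SCPA for $K=1$ and $N=1$ over a finite grid $\Theta$, specialized to the exponential model of Section V and reusing `sample_cell` and `llr` from the earlier sketches. It is an illustration under those assumptions, not the authors' reference implementation.

```python
import numpy as np

def scpa(sample, M, Theta, Theta0, c, horizon=100_000):
    """sample(m, n) -> y_m(n). Theta: finite grid; Theta0: its normal subset."""
    Theta = np.asarray(sorted(Theta), dtype=float)
    normal = np.array([th in Theta0 for th in Theta])

    def mle(ys, constrain_normal=False):
        # exponential log-likelihood over the grid: len(ys)*log(th) - th*sum(ys)
        ll = len(ys) * np.log(Theta) - Theta * np.sum(ys)
        if constrain_normal:
            ll = np.where(normal, ll, -np.inf)
        return Theta[int(np.argmax(ll))]

    n = 0
    while n < horizon:
        # Phase 1: round-robin exploration with N = 1 (last observation only)
        last = [sample(m, n + m) for m in range(M)]
        n += M
        suspects = [m for m in range(M) if mle([last[m]]) not in Theta0]
        if len(suspects) != 1:
            continue                        # stay in Phase 1
        m_hat, ys, ests = suspects[0], [], []
        while n < horizon:
            # Phase 2: probe the suspected cell, re-estimate from y(T+1..n)
            ys.append(sample(m_hat, n)); n += 1
            est = mle(ys)
            if est in Theta0:
                break                       # return to Phase 1
            ests.append(est)
            # Phase 3: SALLR (13); the numerator uses the estimate at time
            # t-1, the denominator the constrained MLE recomputed at time n
            est0 = mle(ys, constrain_normal=True)
            S = sum(llr(ys[i], ests[i - 1], est0) for i in range(1, len(ys)))
            if S >= -np.log(c):
                return m_hat, n             # stop and declare delta = m_hat
    return None, n
```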

III-B Insights and Discussion on the SCPA Algorithm

III-B1 Considerations and Implementation Details

Note that the SCPA algorithm is deterministic and transitions between phases based on the current state as determined by the MLE. In the exploration phase, we estimate the cell parameter using only the most recent $N$ observations. The value of $N$ is a predefined parameter, which can be chosen freely. Since we only use a finite number of observations, after the change point, we exclude benign observations from the abnormal cell that occurred before the change point. This approach is particularly beneficial when the change point happens after a long time, as it reduces the need to accumulate too many abnormal observations to counteract the bias from earlier normal ones, thus ensuring that our performance remains unaffected by the timing of the change point. Selecting $N$ too small may result in instability, causing the algorithm to repeatedly enter and exit Phase 1, while a larger $N$ may lead to prolonged stays in Phase 1 after the change point. For simplicity, we chose $N=1$ in our theoretical and numerical evaluations, though any fixed natural constant can be used theoretically.

In Phases 2 and 3, we avoid using all the normal observations of the anomalous cell to prevent performance degradation, while ensuring enough observations from the target cell are accumulated to confirm the correct hypothesis. Note that in Phase 3, the summation begins at $T+2$ rather than $T+1$, ensuring it relies on observations from $T+1$ to $n$, excluding $T$. This adjustment accounts for the parameter estimation in the numerator and arises from technical considerations necessary to establish the theoretical asymptotic performance presented in the appendix.

III-B2 Tackling Multi-Process Probing and Detection of Multiple Anomalies

Next, we discuss the generalization of the SCPA algorithm to accommodate the case of $K\geq 1$. The extension works as follows: In Phase 1, at each time step, $K$ cells are sequentially probed in a round-robin fashion in groups of $K$. The transition to Phase 2 occurs under the same condition as before, when exactly one cell is suspected of having an anomaly. In Phase 2, the remaining cells are ranked by their SALLR values, selecting the next $K-1$ cells with the highest likelihood of being anomalous. The suspected cell and these $K-1$ cells are then probed. If the suspected cell's parameter estimate falls within the normal range, or if any of the $K-1$ cells exhibit parameter estimates outside the normal range (based on observations collected since entering this phase), we return to Phase 1. Otherwise, we proceed to Phase 3. In Phase 3, the algorithm compares the SALLR of the suspected cell with the second-highest SALLR. If the difference is greater than or equal to $-\log c$, the suspected cell is declared abnormal, and the test stops. Otherwise, the process returns to Phase 2.

Next, we address the extension of SCPA to handle multiple (say $L$) anomalies. The implementation of Phase 1 remains unchanged, but the algorithm proceeds to Phase 2 when exactly $L$ cells are suspected of being anomalous. In Phase 2, these $L$ cells are sequentially observed. If any of the $L$ cells no longer appear anomalous based on their parameter estimates, the algorithm returns to Phase 1. Otherwise, after each observation in Phase 2, the SALLR values of the cells are evaluated. When the maximum SALLR exceeds the threshold, the corresponding cell is declared anomalous. The algorithm then continues observing the remaining $L-1$ suspected cells in Phase 2 until all $L$ anomalies are detected.

III-B3 Using Generalized LLR

In Phase 3, the sum of the adaptive LLR from equation (13) is used to evaluate the suspected cell. Alternatively, the sum of the Generalized LLR (GLLR) can be applied in a similar manner:

$S_{m}(n)\triangleq\sum_{t=T+2}^{n}\ell_{m}^{(\hat{\theta}_{m}(n),\hat{\theta}_{m}^{(0)}(n))}(t)=\sum_{t=T+2}^{n}\log\frac{f(y_{m}(t)|\hat{\theta}_{m}(n))}{f(y_{m}(t)|\hat{\theta}_{m}^{(0)}(n))}$.  (14)

The ALLR is advantageous for bounding the error probability effectively using martingale techniques. However, it depends on parameter estimates derived from a limited number of initial observations, which remain fixed and cannot be updated even as additional data is collected. On the other hand, the GLLR (14) is known to achieve better empirical performance, as it updates parameter estimates with the latest observations. Despite this practical advantage, the GLLR lacks the theoretical performance guarantees provided by the ALLR.
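In the notation of the SCPA sketch above, switching from (13) to (14) changes only which numerator estimate is used; a sketch (our naming):

```python
# GLLR variant (14): the numerator uses the current unconstrained MLE,
# recomputed from all of y(T+1..n), instead of the per-step estimates.
est = mle(ys)                          # unconstrained MLE at the current n
est0 = mle(ys, constrain_normal=True)  # constrained (normal-space) MLE
S_gllr = sum(llr(y_t, est, est0) for y_t in ys[1:])
```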

IV Performance Analysis

In this section, we provide an asymptotic analysis of the algorithm's performance. The analysis assumes the setting of $K=1$ and a single anomaly, which are common assumptions in dynamic search problem analysis (e.g., [53, 26, 9] and subsequent studies). We begin in Section IV-A by examining the case where no additional information about the process states is available, that is, both the pre-change and post-change distribution parameters are unknown. In Section IV-B, we consider the case where the parameter under the null hypothesis is known and identical across all non-anomalous processes.

In traditional change point analysis, deriving optimality properties often involves restricting the occurrence of the change point, $\tau_{c}$. This is typically done by assuming a prior distribution on $\tau_{c}$ or adopting weaker performance measures, such as allowing a fixed error rate that does not vanish asymptotically. In contrast, this work introduces an assumption on $\tau_{c}$ that enables strong guarantees on the vanishing of error probabilities, aligning with the classical framework of sequential experimental design by Chernoff [4]. Specifically, we impose an asymptotic constraint on $\tau_{c}$, allowing it to be arbitrarily large while ensuring its occurrence is not excessively delayed relative to the detection time.

This assumption is particularly relevant in practical anomaly detection systems. For example, in cyber anomaly detection, algorithms often operate during suspicious intervals where anomalies, such as malicious or undesired behavior, are expected to manifest before corrective actions (e.g., patch installations) take effect. More generally, this assumption applies to systems operating within time intervals where the focus is on ensuring vanishing error probabilities after the change has occurred (i.e., when the assumption on the asymptotic bound on $\tau_{c}$ holds), rather than accepting fixed error rates over the entire time horizon. This approach provides a robust framework for maintaining strong performance guarantees under realistic operational conditions. The asymptotic constraint on $\tau_{c}$ assumed throughout the analysis in this section is formally defined as follows:

Assumption 1: Let $c$ be the cost per observation defined in (5). Then, the change point $\tau_{c}$ is of order $O\left((-\log c)^{1-\delta}\right)$ for some $0<\delta<1$.

This assumption implies that $\tau_{c}$ grows more slowly than $-\log c$; specifically, $\tau_{c}=o(-\log c)$. Later, we demonstrate that the detection time is of order $O(-\log c)$, ensuring that $\tau_{c}$ does not occur too late relative to the detection time as $c$ approaches zero.
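As a concrete illustration (our numbers, not the paper's): taking $\delta=1/2$ in Assumption 1 permits, for instance,

$\tau_{c}=O\left((-\log c)^{1/2}\right)$, $\quad\tau=O(-\log c)$, $\quad\frac{\tau_{c}}{\tau}=O\left((-\log c)^{-1/2}\right)\rightarrow 0$ as $c\rightarrow 0$,

so the change point may grow without bound as $c\rightarrow 0$, yet it becomes negligible relative to the detection time.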

IV-A Change Point Anomaly Search without Side Information

We start by analyzing the SCPA algorithm in the setting where the parameters of both the anomalous and non-anomalous distributions are unknown. The following theorem establishes its asymptotic optimality:

Theorem 1.

Let $R^{*}$ and $R(\Gamma\,;\,\tau_{c})$ be the Bayes risks under the SCPA algorithm and any other policy $\Gamma$, respectively. Then, the Bayes risk satisfies (where the notation $f\sim g$ as $c\rightarrow 0$ refers to $\lim_{c\rightarrow 0}f/g=1$):

$R^{*}\;\sim\;\frac{-c\log c}{D(\theta^{(1)})}\;\sim\;\inf_{\Gamma}\,R(\Gamma\,;\,\tau_{c})\quad\text{as}\;\;c\rightarrow 0$,

where $D(\theta^{(1)})\triangleq\min_{\varphi\in\Theta^{(0)}}D(\theta^{(1)}||\varphi)$.

Proof.

The proof can be found in Appendix VII-B. The asymptotic optimality of the SCPA algorithm relies on several key results demonstrated in the Appendix. First, we establish that the algorithm achieves an error probability of order $O\left(c\,(-\log c)^{1-\delta}\right)$. Second, we show that the post-change exploration time is of order $O(1)$. The bounded post-change exploration time of the SCPA algorithm is particularly noteworthy, as it contrasts sharply with the logarithmic exploration time commonly observed in many active search strategies (see, for example, [26, 10]). Additionally, recall that the asymptotic constraint on $\tau_{c}$ results in a pre-change time of order $o(-\log c)$. Finally, we demonstrate that the post-change detection delay asymptotically satisfies $\sim\frac{-\log c}{D(\theta^{(1)})}$ as $c\rightarrow 0$, so that its contribution to the risk, weighted by the per-observation cost $c$, is $\sim\frac{-c\log c}{D(\theta^{(1)})}$.

Taken together, these results yield the asymptotic Bayes risk stated earlier. Since $\frac{-c\log c}{D(\theta^{(1)})}$ serves as a lower bound on the Bayes risk of any algorithm, these findings collectively establish the asymptotic optimality of SCPA. For a detailed proof, see Appendix VII-B. ∎

IV-B Change Point Anomaly Search Under a Known Model of Normality

Here, we assume that the parameter under the null hypothesis, $\theta=\theta^{(0)}\in\Theta^{(0)}$, is known and equal for the non-anomalous distribution across all cells. This setup models many anomaly detection scenarios, where the decision maker can infer the pre-change distribution parameters, which represent the typical distribution before the change, while there is uncertainty regarding the distribution after the change point (i.e., under abnormal conditions). To utilize this information, we substitute the estimation of the parameter $\theta^{(0)}$ in the SALLR statistic (13) with its known true value:

$S_{m}(n)\triangleq\sum_{t=T+2}^{n}\log\frac{f(y_{m}(t)|\hat{\theta}_{m}(t-1))}{f(y_{m}(t)|\theta^{(0)})}$.  (16)
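In the SCPA sketch of Section III, this amounts to replacing the constrained MLE by the known pre-change value; a one-line change (our naming):

```python
# SALLR (16) with theta^(0) known: the denominator estimate est0 is
# replaced by the true pre-change parameter theta0.
S = sum(llr(ys[i], ests[i - 1], theta0) for i in range(1, len(ys)))
```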

The following theorem establishes the asymptotic optimality of the SCPA algorithm in this setting.

Theorem 2.

Let $R^{*}$ and $R(\Gamma\,;\,\tau_{c})$ be the Bayes risks under the SCPA algorithm and any other policy $\Gamma$, respectively. Then, the Bayes risk satisfies:

$R^{*}\;\sim\;\frac{-c\log c}{D(\theta^{(1)}||\theta^{(0)})}\;\sim\;\inf_{\Gamma}\,R(\Gamma\,;\,\tau_{c})\quad\text{as}\;\;c\rightarrow 0$.  (17)
Proof.

The proof is given in Appendix VII-A. ∎

We note that having knowledge of the parameter for the non-anomalous distribution enhances the algorithm's performance. This improvement is evident from the fact that $D(\theta^{(1)}||\theta^{(0)})>D(\theta^{(1)})$, which implies that the Bayes risk stated in Theorem 2 is smaller than the risk presented in Theorem 1.
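As a numerical illustration under the exponential model (our numbers): for rates $\theta^{(1)}=3$ and $\Theta^{(0)}=\{0.5,1\}$, using $D(\theta||\varphi)=\log(\theta/\varphi)+\varphi/\theta-1$,

$D(3\,||\,1)=\ln 3+\tfrac{1}{3}-1\approx 0.43$, $\qquad D(3\,||\,0.5)=\ln 6+\tfrac{1}{6}-1\approx 0.96$,

so $D(\theta^{(1)})=\min_{\varphi\in\Theta^{(0)}}D(\theta^{(1)}||\varphi)\approx 0.43$, while knowing $\theta^{(0)}=0.5$ yields the larger exponent $D(\theta^{(1)}||\theta^{(0)})\approx 0.96$, and hence a smaller asymptotic risk.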

V Numerical Examples

We validate the theoretical findings of SCPA through numerical examples. In our simulations, we modeled a single anomaly occurring in one of the $M$ cells after the change point, with the following setup: The a priori probability of the target being present in cell $m$ was set to $\pi_{m}=1/M$ for all $1\leq m\leq M$. When cell $m$ was observed at time $n$, an observation $y_{m}(n)$ was independently drawn from either the distribution $\exp(\theta^{(0)})$ or $\exp(\theta^{(1)})$, depending on whether the target was absent or present, respectively.

We examine the cases for both $\tau_{c}=0$ and $\tau_{c}=70$, with the results shown in Fig. 1 and Fig. 2, respectively, for the average detection delay, and in Fig. 3 and Fig. 4, respectively, for the log error probability. The results indicate similar outcomes for both realizations of the change point, as it does not influence the detection time. This observation is consistent with our theoretical findings. Additionally, we observe a linear relationship between the detection time and $-\log c$, which further aligns with the theoretical findings.

Figure 1: The average detection delay as a function of $-\log c$, with $M=5$, $\Theta^{(0)}=\{0.1N,\,1\leq N\leq 10\}$, $\Theta^{(1)}=\{N,\,2\leq N\leq 10\}$, and $\tau_{c}=0$.

Figure 2: The average detection delay as a function of $-\log c$, with $M=5$, $\Theta^{(0)}=\{0.1N,\,1\leq N\leq 10\}$, $\Theta^{(1)}=\{N,\,2\leq N\leq 10\}$, and $\tau_{c}=70$.

Figure 3: The error probability as a function of the average detection delay, with $M=5$, $\Theta^{(0)}=\{1+0.1N,\,0\leq N\leq 10\}$, $\Theta^{(1)}=\{0.5+0.1N,\,0\leq N\leq 4\}\cup\{2.1+0.1N,\,0\leq N\leq 4\}$, and $\tau_{c}=0$.

Figure 4: The error probability as a function of the average detection delay, with $M=5$, $\Theta^{(0)}=\{1+0.1N,\,1\leq N\leq 10\}$, $\Theta^{(1)}=\{0.5+0.1N,\,0\leq N\leq 4\}\cup\{2.1+0.1N,\,0\leq N\leq 4\}$, and $\tau_{c}=70$.
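A minimal Monte Carlo driver along the lines of this experiment, reusing `sample_cell` and `scpa` from the earlier sketches (illustrative only; parameters and trial counts are ours and are not tuned to reproduce the figures):

```python
import numpy as np

def run_trials(M, Theta0, Theta1, c, tau_c, trials=200, seed=1):
    """Estimate the average detection delay and the error probability (3)-(4)."""
    Theta = sorted(set(Theta0) | set(Theta1))
    rng = np.random.default_rng(seed)
    delays, errors = [], 0
    for _ in range(trials):
        target = int(rng.integers(M))            # uniform prior pi_m = 1/M
        theta0 = float(rng.choice(sorted(Theta0)))
        theta1 = float(rng.choice(sorted(Theta1)))
        sample = lambda m, n: sample_cell(m, target, n, tau_c,
                                          theta0, theta1, rng)
        delta, tau = scpa(sample, M, Theta, set(Theta0), c)
        # error: wrong declaration after the change, or a false alarm before it
        errors += int(delta != target or tau < tau_c)
        delays.append(max(tau - tau_c, 0))
    return np.mean(delays), errors / trials

# e.g., a setup in the spirit of Fig. 2:
# run_trials(M=5, Theta0={0.1 * N for N in range(1, 11)},
#            Theta1={float(N) for N in range(2, 11)}, c=1e-3, tau_c=70)
```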

We proceed to compare the proposed SCPA algorithm with the CUSUM algorithm, both used for change point detection, with the SALLR based on the closest abnormal and normal parameters. At each step, if the SALLR is negative, the next cell is probed. Otherwise, we determine whether to stop and declare the cell anomalous (if the SALLR exceeds $-\log c$) or to probe the cell again and repeat the process. The SCPA algorithm was evaluated in two scenarios: without side information and with side information, as detailed in Sections IV-A and IV-B, respectively. The results are presented in Fig. 5 and Fig. 6. In both simulations with side information, the SCPA algorithm demonstrated a clear advantage, achieving a smaller error probability and a shorter detection time from the start of the measurements. Without side information, CUSUM achieved faster detection times for higher error probability values. However, asymptotically, as the error probability decreased, the SCPA algorithm without side information exhibited superior performance.
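For reference, a sketch of the baseline as we read the description above (exponential model; the parameter pair `theta0_c`, `theta1_c`, standing for the closest normal and abnormal parameters, is our naming):

```python
import numpy as np

def cusum_search(sample, M, theta0_c, theta1_c, c, horizon=100_000):
    """CUSUM-style baseline: probe one cell at a time with the LLR built
    from the closest abnormal (theta1_c) and normal (theta0_c) parameters."""
    n, m, S = 0, 0, 0.0
    while n < horizon:
        y = sample(m, n); n += 1
        S += np.log(theta1_c / theta0_c) - (theta1_c - theta0_c) * y
        if S < 0:
            m, S = (m + 1) % M, 0.0   # negative statistic: probe the next cell
        elif S >= -np.log(c):
            return m, n               # stop and declare cell m anomalous
    return None, n
```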

Figure 5: The error probability as a function of the average detection delay under the SCPA algorithm with side information, the SCPA algorithm without side information, and the CUSUM algorithm, with $M=4$, $\Theta^{(0)}=\{0.1N,\,1\leq N\leq 9\}$, $\Theta^{(1)}=\{N,\,1\leq N\leq 30\}$, and $\tau_{c}=20$.

Figure 6: The error probability as a function of the average detection delay under the SCPA algorithm with side information, the SCPA algorithm without side information, and the CUSUM algorithm, with $M=7$, $\Theta^{(0)}=\{10+10N,\,1\leq N\leq 9\}$, $\Theta^{(1)}=\{0.1+0.5N,\,0\leq N\leq 37\}$, and $\tau_{c}=0$.

VI Conclusion

We addressed the anomaly detection problem in a set of $M$ processes, where only a subset of cells can be probed at each step. Observations depend on unknown cell parameters, classified as normal or abnormal. Before the change point, all cells are normal; after the change point, one cell transitions to the abnormal state. Our goal was to develop a sequential strategy minimizing detection time under an error probability constraint. We analyzed two scenarios: without prior information and with known normal parameters. We proved asymptotic optimality in both cases and showed improved detection times with additional information. Simulations validated the theoretical guarantees and efficiency of the proposed SCPA algorithm.

VII Appendix

For clarity, we begin by proving Theorem 2, followed by highlighting the key steps required to extend the results to the context of Theorem 1.

VII-A Proof of Theorem 2

To prove the theorem, we introduce the following definitions:

Definition 1.

Let $\tau_{EST}$ be the minimal integer such that $\tau_{EST}\geq\tau_{c}$ and for all $n\geq\tau_{EST}$, the parameter estimates satisfy $\hat{\theta}_{m}(n)=\theta^{(1)}$ and $\hat{\theta}_{j}(n)=\theta^{(0)}$ for all $j\neq m$ under hypothesis $H_{m}$.

Definition 2.

$n_{EST}\triangleq\tau_{EST}-\tau_{c}$ denotes the total amount of time between $\tau_{EST}$ and the change point, $\tau_{c}$.

Remark 1.

Later in the proof, we will show that the expected time spent during the exploration phase is $O(1)$. Note that for all $n>\tau_{EST}$, we probe only cell $m$, which occurs during the exploitation phase. Therefore, the time spent in the exploration phase after the change point is bounded by $n_{EST}$. As a result, to show that the expected exploration phase time is $O(1)$, it is sufficient to prove that the expectation of $n_{EST}$ is $O(1)$. The detailed proof is provided rigorously later.

Remark 2.

In the following lemma, when we say that SCPA is implemented indefinitely, we mean that the algorithm operates without stopping, disregarding the stopping rule.

Lemma 1.

Assume that SCPA is implemented indefinitely. Then, there exist $C>0$ and $\gamma>0$ such that

$\textbf{P}_{m}(n_{EST}>n)\leq Ce^{-\gamma\sqrt{n}}$.  (18)
Proof.

At any given moment, we are either in Phase 1 (exploration) or Phase 2 (exploitation). Therefore, if $n_{EST}>n$, it implies that after the change point, we remained in either Phase 1 or Phase 2 for at least $n/2$ time indices. Let $T_{1}$ denote the time spent in Phase 1 after $\tau_{c}$. Then, we have:

$\mathbf{P}_{m}(n_{EST}>n)\leq\mathbf{P}_{m}\left(n_{EST}>n\,,\,T_{1}\geq\frac{n}{2}\right)+\mathbf{P}_{m}\left(n_{EST}>n\,,\,T_{1}<\frac{n}{2}\right)$.  (19)

We now bound the first term on the RHS of (19). A complete round of Phase 1 requires $M$ time indices. Therefore, there are at least $\frac{n}{2M}$ rounds of Phase 1 after the anomaly occurs. Each round of Phase 1 can lead to one of the following outcomes: (i) return to Phase 1; (ii) move to Phase 2 with $\hat{m}(t)=m$; (iii) move to Phase 2 with $\hat{m}(t)\neq m$. Based on the above, at least $\frac{n}{6M}$ rounds end with one of these outcomes. Thus, it suffices to demonstrate that the probability of each outcome is bounded.

(i) For at least $\frac{n}{6M}$ rounds that ended by returning to Phase 1: If a round of Phase 1 ends with a return to Phase 1, it implies that the round ended with either $\hat{\theta}_{j}\in\Theta^{(0)}$ for all $j$, or with two cells satisfying $\hat{\theta}_{j}\not\in\Theta^{(0)}$ (so that we have some $j\neq m$ with $\hat{\theta}_{j}\not\in\Theta^{(0)}$). Following the same arguments as before, we deduce that after the change point, we have at least $\frac{n}{12M}$ rounds of Phase 1 that end with $\hat{\theta}_{j}\in\Theta^{(0)}$ for all $j$, or at least $\frac{n}{12M}$ rounds of Phase 1 that end with some $j\neq m$ such that $\hat{\theta}_{j}\not\in\Theta^{(0)}$. We will show that each of these probabilities is bounded.

If there are at least $\frac{n}{12M}$ rounds ending with $\hat{\theta}_{j}(t)\in\Theta^{(0)}$ for all $j$, it implies that there are at least $\frac{n}{12M}$ observations of cell $m$ resulting in $\hat{\theta}_{m}(t)\in\Theta^{(0)}$. Let $N_{m}(n)$ be the total number of these observations and $t_{1},\ldots,t_{N_{m}(n)}$ represent their corresponding time indices. Therefore, by the definition of the MLE in (10) we know that $\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})<0$ whenever $\hat{\theta}_{m}(t_{i})\neq\theta^{(1)}$, for all $1\leq i\leq N_{m}(n)$, and by summation we have,

$\sum_{i=1}^{N_{m}(n)}\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})<0$.  (20)

Thus, applying the Chernoff bound and leveraging the independence of $\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})$, we obtain:

$\mathbf{P}_{m}\left(\sum_{i=1}^{N_{m}(n)}\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})<0\right)\leq\prod_{i=1}^{N_{m}(n)}\textbf{E}_{m}\left[e^{-s\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})}\right]$  (21)

for all $s>0$.

Note that a moment-generating function (MGF) is equal to 1 at $s=0$. Since $\textbf{E}_{m}\left[-\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})\right]=-D(\theta^{(1)}||\hat{\theta}_{m}(t_{i}))<0$ is strictly negative, the derivative of the MGF of $-\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})$ with respect to $s$ is strictly negative at $s=0$. Therefore, there exist $s_{i}>0$ and $\gamma_{i}>0$ such that $\textbf{E}_{m}\left[e^{-s_{i}\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})}\right]<e^{-\gamma_{i}}$. Consequently, we can find $s>0$ and $\gamma>0$ such that $\textbf{E}_{m}\left[e^{-s\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})}\right]<e^{-\gamma}$ for all $\hat{\theta}_{m}(t_{i})\in\Theta^{(0)}$. Hence,

$\prod_{i=1}^{N_{m}(n)}\textbf{E}_{m}\left[e^{-s\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t_{i}))}(t_{i})}\right]\leq e^{-\gamma N_{m}(n)}\leq e^{-\gamma^{\prime}n}\leq e^{-\gamma^{\prime}\sqrt{n}}$,  (22)

where the last inequality holds because $N_{m}(n)\geq\frac{n}{12M}$, and for an appropriately chosen $\gamma^{\prime}$, we obtain the desired result.

If at least $\frac{n}{12M}$ rounds of Phase 1 end with some $j\neq m$ satisfying $\hat{\theta}_{j}(t)\not\in\Theta^{(0)}$: Note that this $j$ may vary across rounds, but there are at least $\frac{n}{12M^{2}}$ rounds where the same $j$ occurs. Consequently, we have $N_{j}(n)\geq\frac{n}{12M^{2}}$ observations of $j$ that resulted in $\hat{\theta}_{j}(t)\neq\theta^{(0)}$. Let $t_{1},\dots,t_{N_{j}(n)}$ denote the time indices of these observations, and $\hat{\theta}_{j}(t_{i})$ represent their corresponding estimates during Phase 1. By the MLE definition (10), we have $\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})<0$ whenever $\hat{\theta}_{j}(t_{i})\neq\theta^{(0)}$, for all $1\leq i\leq N_{j}(n)$, which implies:

$\sum_{i=1}^{N_{j}(n)}\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})<0$.  (23)

By applying the Chernoff bound and leveraging the independence of $\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})$, we obtain:

$\mathbf{P}_{m}\left(\sum_{i=1}^{N_{j}(n)}\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})<0\right)\leq\prod_{i=1}^{N_{j}(n)}\textbf{E}_{m}\left[e^{-s\,\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})}\right]$  (24)

for all $s>0$. Since $\textbf{E}_{m}\left[-\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})\right]=-D(\theta^{(0)}||\hat{\theta}_{j}(t_{i}))<0$ for all $1\leq i\leq N_{j}(n)$, by applying similar arguments as in the previous case, we can find $s>0$ and $\gamma>0$ such that $\textbf{E}_{m}\left[e^{-s\,\ell_{j}^{(\theta^{(0)},\hat{\theta}_{j}(t_{i}))}(t_{i})}\right]<e^{-\gamma}$ for all $\hat{\theta}_{j}(t_{i})\not\in\Theta^{(0)}$. Hence, (24) can be bounded by:

$e^{-\gamma N_{j}(n)}\leq e^{-\gamma^{\prime}n}\leq e^{-\gamma^{\prime}\sqrt{n}}$,  (25)

for some suitable $\gamma^{\prime}$.

(ii) For at least $\frac{n}{6M}$ rounds that ended by moving to Phase 2 with $\hat{m}(t)=m$: Since $\tau_{EST}>n$, we are guaranteed to return to Phase 1. This means that the observations of cell $m$ in Phase 2 result in the MLE, as defined in (11), being within the normal parameter space, $\Theta^{(0)}$. Denote by $\mathcal{N}_{2}(n)$ the number of times we enter Phase 2 under these conditions up to time $n$. Therefore, we have $\mathcal{N}_{2}(n)\geq\frac{n}{6M}$ under this assumption. For $1\leq i\leq\mathcal{N}_{2}(n)$, let $T_{i}$ be the time at which we exit the exploration phase (as defined in the exploitation phase), and $N_{i}$ be the time at which we exit Phase 2 and return to Phase 1. Thus, for all $1\leq i\leq\mathcal{N}_{2}(n)$ we have:

$\hat{\theta}_{m}(N_{i})=\arg\max_{\theta\in\Theta}f(\overline{\textbf{y}}_{m}(N_{i})|\theta)\in\Theta^{(0)}$,  (26)

where $\hat{\theta}_{m}(N_{i})\neq\theta^{(1)}$ and $\overline{\textbf{y}}_{m}(N_{i})=\{y_{m}(T_{i}+1),\ldots,y_{m}(N_{i})\}$. Therefore, (26) implies:

$\sum_{t=T_{i}+1}^{N_{i}}\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(t)<0$.  (27)

By summing over all $i$, we obtain:

$\sum_{i=1}^{\mathcal{N}_{2}(n)}\left(\sum_{t=T_{i}+1}^{N_{i}}\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(t)\right)<0$.  (28)

By applying the Chernoff bound and the i.i.d. property of $\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}$, we obtain:

$\mathbf{P}_{m}\left(\sum_{i=1}^{\mathcal{N}_{2}(n)}\sum_{t=T_{i}+1}^{N_{i}}\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(t)<0\right)\leq\prod_{i=1}^{\mathcal{N}_{2}(n)}\left[\textbf{E}_{m}\left(e^{-s\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(T_{i}+1)}\right)\right]^{N_{i}-T_{i}}$.  (29)

Since $\textbf{E}_{m}\left[-\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(T_{i}+1)\right]=-D(\theta^{(1)}||\hat{\theta}_{m}(N_{i}))<0$ for all $1\leq i\leq\mathcal{N}_{2}(n)$, there exist $s_{i}>0$ and $\gamma_{i}>0$ such that $\textbf{E}_{m}\left[e^{-s_{i}\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(T_{i}+1)}\right]<e^{-\gamma_{i}}$. Thus, we can find $s>0$ and $\gamma>0$ such that $\textbf{E}_{m}\left[e^{-s\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(T_{i}+1)}\right]<e^{-\gamma}$ for all $\hat{\theta}_{m}(N_{i})$. As a result, we have:

$\prod_{i=1}^{\mathcal{N}_{2}(n)}\left[\textbf{E}_{m}\left(e^{-s\,\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(N_{i}))}(T_{i}+1)}\right)\right]^{N_{i}-T_{i}}\leq\prod_{i=1}^{\mathcal{N}_{2}(n)}e^{-\gamma(N_{i}-T_{i})}\leq e^{-\gamma\mathcal{N}_{2}(n)}\leq e^{-\gamma^{\prime}n}\leq e^{-\gamma^{\prime}\sqrt{n}}$,  (30)

where the second inequality follows from $N_{i}-T_{i}\geq 1$.

(iii) For at least $\frac{n}{6M}$ rounds that ended by moving to Phase 2 with $\hat{m}(t)\neq m$:

If we moved to Phase 2 with $\hat{m}(n)\neq m$, this implies that at least $\frac{n}{6M}$ observations of cell $m$ resulted in $\hat{\theta}_{m}(t)\in\Theta^{(0)}$. Therefore, the probability is bounded using the same arguments as in the first case of (i) in step 1.

Next, we bound the second term on the RHS of (19). We can enter Phase 2 in one of two ways: (i) $\hat{m}(t)=m$, or (ii) $\hat{m}(t)\neq m$. Spending less than $\frac{n}{2}$ time indices in Phase 1 implies that we spent more than $\frac{n}{2}$ time indices in Phase 2. In this case, we either spent more than $\frac{n}{4}$ time indices in Phase 2 with $\hat{m}(t)=m$, or we spent more than $\frac{n}{4}$ time indices in Phase 2 with $\hat{m}(t)\neq m$. It is sufficient to show that each of these cases is bounded.

(i) For more than $\frac{n}{4}$ time indices in Phase 2 with $\hat{m}(t)=m$: As in (ii) in step 1, denote by $\mathcal{N}_{2}(n)$ the number of times we entered Phase 2 with cell $m$ up to time $n$. Since $\tau_{EST}>n$, we know that we will return to Phase 1. Let $T_{i}$ and $N_{i}$ represent the time we entered Phase 2 and the time we returned to Phase 1, respectively, for all $1\leq i\leq\mathcal{N}_{2}(n)$. Notice that $\sum_{i=1}^{\mathcal{N}_{2}(n)}(N_{i}-T_{i})$ is the total time spent in Phase 2 with cell $m$ up to time $n$, which is assumed to be greater than $\frac{n}{4}$. Using similar arguments as in (ii) in step 1, there exists $\gamma>0$ such that the probability of this case is bounded by:

$\prod_{i=1}^{\mathcal{N}_{2}(n)}e^{-\gamma(N_{i}-T_{i})}=e^{-\gamma\sum_{i=1}^{\mathcal{N}_{2}(n)}(N_{i}-T_{i})}<e^{-\gamma\frac{n}{4}}\leq e^{-\gamma^{\prime}n}\leq e^{-\gamma^{\prime}\sqrt{n}}$.  (31)

(ii) For more than $\frac{n}{4}$ time indices in Phase 2 with $\hat{m}(t)\neq m$:

Under this assumption, one of the following occurs: (a) we have more than $\frac{\sqrt{n}}{2}$ rounds of Phase 1 that end with moving to Phase 2 where $\hat{m}(t)\neq m$; (b) we enter Phase 2 with $\hat{m}(t)\neq m$ and remain in Phase 2 for more than $\frac{\sqrt{n}}{2}$ time indices. If neither (a) nor (b) occurs, we conclude that we entered Phase 2 with the wrong cell at most $\frac{\sqrt{n}}{2}$ times, and in each case, we stayed for at most $\frac{\sqrt{n}}{2}$ time indices. Therefore, the total time spent in Phase 2 with the wrong cell is at most $\frac{\sqrt{n}}{2}\cdot\frac{\sqrt{n}}{2}=\frac{n}{4}$, which contradicts the assumption in (ii). Hence, to prove the lemma, it remains to show that both (a) and (b) are bounded.

For Case (a), we have more than $\frac{\sqrt{n}}{2}$ observations of cell $m$ that result in $\hat{\theta}_{m}(t)\in\Theta^{(0)}$. Therefore, following the same reasoning as in the first case of (i) in step 1, let $N_{m}(n)$ denote the number of these observations up to time $n$, with the condition that $N_{m}(n)>\frac{\sqrt{n}}{2}$. Using the Chernoff bound and the properties of $\ell_{m}^{(\theta^{(1)},\hat{\theta}_{m}(t))}$ (independence and negative expectation of $-\ell$), we can find a constant $\gamma>0$ such that the probability of this case is bounded by:

$e^{-\gamma N_{m}(n)}<e^{-\gamma\frac{\sqrt{n}}{2}}=e^{-\gamma^{\prime}\sqrt{n}}$.  (32)

For Case (b), if we spent more than $\frac{\sqrt{n}}{2}$ time units in Phase 2 with $\hat{m}\neq m$, this implies that after $\frac{\sqrt{n}}{2}$ time units in Phase 2, we did not return to Phase 1. Let $T$ denote the time at which we entered Phase 2. Thus, at time $t=T+\frac{\sqrt{n}}{2}$, the MLE, as defined in (11), satisfies $\hat{\theta}_{\hat{m}}(t)\not\in\Theta^{(0)}$. Therefore,

$\sum_{r=T+1}^{t}\ell_{\hat{m}}^{(\theta^{(0)},\hat{\theta}_{\hat{m}}(t))}(r)<0$.  (33)

By applying the Chernoff bound and using the i.i.d. property of $\ell_{\hat{m}}^{(\theta^{(0)},\hat{\theta}_{\hat{m}}(t))}(r)$, we have:

\mathbf{P}_{m}\left(\sum_{r=T+1}^{t}\ell_{\hat{m}}^{(\theta^{(0)},\hat{\theta}_{\hat{m}}(t))}(r)<0\right)\leq\left[\mathbf{E}_{m}\left(e^{-s\cdot\ell_{\hat{m}}^{(\theta^{(0)},\hat{\theta}_{\hat{m}}(t))}(T+1)}\right)\right]^{\frac{\sqrt{n}}{2}}. \qquad (34)

Using the fact that \mathbf{E}_{m}\left(\ell_{\hat{m}}^{(\theta^{(0)},\hat{\theta}_{\hat{m}}(t))}(T+1)\right)=D(\theta^{(0)}||\hat{\theta}_{\hat{m}}(t))>0, we can find s>0 and \gamma>0 such that (34) is bounded by

eγn2=eγn,\displaystyle e^{-\gamma\cdot\frac{\sqrt{n}}{2}}=e^{-\gamma^{\prime}\sqrt{n}}, (35)

for γ=γ/2\gamma^{\prime}=\gamma/2, which completes the proof of the lemma. ∎
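The Chernoff-bound step invoked repeatedly above can be checked numerically. The following is a minimal sketch of ours, assuming unit-variance Gaussian observations (an illustrative choice; the model only assumes a common family f): it estimates \mathbf{P}(\sum_r \ell(r)<0) for a log-likelihood ratio with positive mean and exhibits the exponential decay in the number of summands.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustration only: f(y|theta) = N(theta, 1). Under y ~ N(theta0, 1), the LLR
# ell = log f(y|theta0)/f(y|theta_hat) has mean D(theta0||theta_hat) =
# (theta0 - theta_hat)^2 / 2 > 0, so P(sum ell < 0) decays exponentially.
theta0, theta_hat, trials = 0.0, 1.0, 100_000
for n in (10, 20, 40, 80):
    y = rng.normal(theta0, 1.0, size=(trials, n))
    ell = 0.5 * ((y - theta_hat) ** 2 - (y - theta0) ** 2)
    print(n, np.mean(ell.sum(axis=1) < 0.0))  # empirical P(sum < 0) ~ e^{-gamma n}
```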

Definition 3.

Let \tau_{U} be the smallest integer n>\tau_{EST} such that S_{m}(n)\geq-\log(c):

τUinf{n>τEST:Sm(n)log(c)},\tau_{U}\triangleq\inf\{n>\tau_{EST}:S_{m}(n)\geq-\log(c)\}, (36)

and let nUτUτESTn_{U}\triangleq\tau_{U}-\tau_{EST} denote the total amount of time between τEST\tau_{EST} and τU\tau_{U}.
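Operationally, \tau_U in (36) is a first-crossing time of the statistic S_m(n) over the threshold -\log(c). Below is a minimal sketch of ours; it assumes S_m(n) has already been computed and stored as an array S.

```python
import numpy as np

def tau_U(S, tau_est, c):
    """First index n > tau_est with S[n] >= -log(c), per (36); None if no crossing."""
    threshold = -np.log(c)
    for n in range(tau_est + 1, len(S)):
        if S[n] >= threshold:
            return n
    return None

# Toy run: a statistic drifting upward by 0.5 per step crosses -log(0.01) ~ 4.6.
S = np.cumsum(np.full(100, 0.5))     # stand-in for S_m(n); real runs use the ALLR sums
print(tau_U(S, tau_est=10, c=0.01))  # prints 11, the first index after tau_est
```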

Lemma 2.

Assume that SCPA is implemented indefinitely. Then, for every fixed ϵ>0\epsilon>0 there exist C>0C>0 and γ>0\gamma>0 such that

𝐏m(nU>n)Ceγnn>(1+ϵ)logc/D(θ(1)||θ(0)).\begin{array}[]{l}\displaystyle\mathbf{P}_{m}\left(n_{U}>n\right)\leq Ce^{-\gamma\sqrt{n}}\vspace{0.1cm}\\ \hskip 56.9055pt\displaystyle\forall n>-(1+\epsilon)\log c/D\left(\theta^{(1)}||\theta^{(0)}\right).\end{array} (37)
Proof.

Let

\ell_{m}(t)\triangleq\ell_{m}^{(\hat{\theta}_{m}(t-1),\theta^{(0)})}(t)=\log\frac{f(y_{m}(t)|\hat{\theta}_{m}(t-1))}{f(y_{m}(t)|\theta^{(0)})} \qquad (38)

and

~m(t)m(t)D(θ(1)||θ(0))\displaystyle\tilde{\ell}_{m}(t)\triangleq\ell_{m}(t)-D(\theta^{(1)}||\theta^{(0)}) (39)

denote the ALLR of cell m at time t and its normalized version, respectively. Note that for all t>\tau_{EST}, \tilde{\ell}_{m}(t) is a zero-mean random variable under hypothesis H_{m}, which is why we refer to it as the normalized ALLR.
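For intuition, the two statistics in (38)-(39) are straightforward to compute. A minimal sketch of ours, assuming unit-variance Gaussians f(y|\theta)=\mathcal{N}(\theta,1) (the paper only assumes a common parametric family):

```python
# Sketch of (38)-(39) for unit-variance Gaussians (illustrative assumption).
def allr(y_t, theta_hat_prev, theta0):
    """ell_m(t) = log f(y_m(t)|theta_hat_m(t-1)) / f(y_m(t)|theta0)."""
    return 0.5 * ((y_t - theta0) ** 2 - (y_t - theta_hat_prev) ** 2)

def normalized_allr(y_t, theta_hat_prev, theta1, theta0):
    """ell_tilde_m(t) = ell_m(t) - D(theta1||theta0), with D = (theta1-theta0)^2/2
    for this family; zero mean once y_m(t) ~ f(.|theta1) and theta_hat ~= theta1."""
    return allr(y_t, theta_hat_prev, theta0) - 0.5 * (theta1 - theta0) ** 2
```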

For n>τESTn>\tau_{EST}, by the definition of τEST\tau_{EST} and in the context of the exploitation phase notation, we have TτESTT\leq\tau_{EST}. Let ϵ1=D(θ(1)||θ(0))ϵ/(1+ϵ)>0\epsilon_{1}=D(\theta^{(1)}||\theta^{(0)})\epsilon/(1+\epsilon)>0. Then,

\sum_{t=T+2}^{\tau_{EST}+n}\ell_{m}(t)+\log c=\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)+\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\ell_{m}(t)+\log c
=\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)+\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}_{m}(t)+nD(\theta^{(1)}||\theta^{(0)})+\log c
\geq\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)+\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}_{m}(t)+n\epsilon_{1}, \qquad (40)

for all n>-(1+\epsilon)\log c/D(\theta^{(1)}||\theta^{(0)}).

As a result, \sum_{t=T+2}^{\tau_{EST}+n}\ell_{m}(t)\leq-\log c implies that

\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)+\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}_{m}(t)\leq-n\epsilon_{1}.

Therefore, for every ϵ>0\epsilon>0 there exists ϵ1>0\epsilon_{1}>0 such that,

\mathbf{P}_{m}\left(\sum_{t=T+2}^{\tau_{EST}+n}\ell_{m}(t)\leq-\log c\right)
\leq\mathbf{P}_{m}\left(\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)+\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}_{m}(t)\leq-n\epsilon_{1}\right)
\leq\mathbf{P}_{m}\left(\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\right)
+\mathbf{P}_{m}\left(\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}_{m}(t)\leq-n\epsilon_{1}/2\right). \qquad (41)

Hence, it suffices to show that each term on the RHS of (41) is bounded.

Next, we bound the first term on the RHS of (41). Fix q>0. Then,

𝐏m(t=T+2τESTm(t)nϵ1/2)\displaystyle\displaystyle\mathbf{P}_{m}\left(\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\right)
𝐏m(t=T+2τESTm(t)nϵ1/2,nEST>qn)\displaystyle\leq\mathbf{P}_{m}\left(\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\;n_{EST}>qn\right)
+𝐏m(t=T+2τESTm(t)nϵ1/2,nESTqn).\displaystyle+\mathbf{P}_{m}\left(\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\;n_{EST}\leq qn\right). (42)

Again, it suffices to prove that each term on the RHS of (42) is bounded. The first term is bounded by Lemma 1. For the second term, we have two cases: (i) the last exit from Phase 1 occurred after the change point, i.e., T\geq\tau_{c}; (ii) the last exit from Phase 1 occurred before the change point, i.e., T<\tau_{c}. Under Case (i), \tau_{EST}-T\leq n_{EST}\leq qn. Since q>0 can be chosen arbitrarily small and \ell_{m}(t) has a finite expectation, applying the Chernoff bound shows that the probability of Case (i) decreases exponentially with n. Hence, there exist C>0 and \gamma>0 such that the probability of Case (i) is bounded by Ce^{-\gamma\sqrt{n}}. Now fix r>0. Then, the probability of Case (ii) is bounded by:

𝐏m(\displaystyle\mathbf{P}_{m}\bigg{(} t=T+2τESTm(t)nϵ1/2,nESTqn,\displaystyle\displaystyle\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\;n_{EST}\leq qn\;,
τcTrn,T<τc)\displaystyle\tau_{c}-T\leq rn\;,\;\;T<\tau_{c}\bigg{)}
+𝐏m(\displaystyle+\;\;\mathbf{P}_{m}\bigg{(} t=T+2τESTm(t)nϵ1/2,nESTqn,\displaystyle\displaystyle\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\;n_{EST}\leq qn\;,
τcT>rn,T<τc)\displaystyle\tau_{c}-T>rn\;,\;\;T<\tau_{c}\bigg{)}
𝐏m(\displaystyle\leq\mathbf{P}_{m}\bigg{(} t=T+2τESTm(t)nϵ1/2,τESTT(q+r)n)\displaystyle\displaystyle\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\;\tau_{EST}-T\leq(q+r)n\bigg{)}
+𝐏m(\displaystyle+\;\;\mathbf{P}_{m}\bigg{(} t=T+2τESTm(t)nϵ1/2,τcT>rn).\displaystyle\displaystyle\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\tau_{c}-T>rn\bigg{)}. (43)

The first term on the RHS of (43) can be bounded using similar arguments as in Case (i), since q>0 and r>0 can be chosen arbitrarily small and \ell_{m}(t) has a finite expectation. For the second term, we observe that after the time index \tau_{c} there was no return to Phase 1, i.e., \hat{\theta}_{m}(\tau_{c})\not\in\Theta^{(0)}. Consequently, the observations y_{m}(T+1),\dots,y_{m}(\tau_{c}) resulted in \hat{\theta}_{m}(\tau_{c})\neq\theta^{(0)}. By the MLE definition in (10), we have:

\sum_{t=T+1}^{\tau_{c}}\ell_{m}^{(\theta^{(0)},\hat{\theta}_{m}(\tau_{c}))}(t)<0. \qquad (44)

Hence, by applying the Chernoff bound and using the i.i.d. property of \ell_{m}^{(\theta^{(0)},\hat{\theta}_{m}(\tau_{c}))}(t), we obtain:

\mathbf{P}_{m}\bigg(\sum_{t=T+2}^{\tau_{EST}}\ell_{m}(t)\leq-n\epsilon_{1}/2\;,\;\tau_{c}-T>rn\bigg)\leq\mathbf{E}_{m}\left[e^{-s\cdot\ell_{m}^{(\theta^{(0)},\hat{\theta}_{m}(\tau_{c}))}(T+1)}\right]^{\tau_{c}-T}, \qquad (45)

for all s>0. Since \mathbf{E}_{m}\left(\ell_{m}^{(\theta^{(0)},\hat{\theta}_{m}(\tau_{c}))}(T+1)\right)=D(\theta^{(0)}||\hat{\theta}_{m}(\tau_{c}))>0, we can find \gamma>0 such that (45) is bounded by:

eγ(τcT)eγrneγneγn,\displaystyle e^{-\gamma\cdot(\tau_{c}-T)}\leq e^{-\gamma\cdot rn}\leq e^{-\gamma^{\prime}n}\leq e^{-\gamma^{\prime}\sqrt{n}}, (46)

for γ=γr>0\gamma^{\prime}=\gamma\cdot r>0.

Next, we bound the second term on the RHS of (41). Note that \tilde{\ell}_{m}(t) has zero mean for all t>\tau_{EST}. Therefore, by applying the Chernoff bound and using the i.i.d. property of \tilde{\ell}_{m}(t), we have:

\mathbf{P}_{m}\left(\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}_{m}(t)\leq-n\epsilon_{1}/2\right)\leq\mathbf{E}_{m}\left[e^{-s\cdot\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}(\tilde{\ell}_{m}(t)+\epsilon_{1}/2)}\right]=\mathbf{E}_{m}\left[e^{-s\cdot(\tilde{\ell}_{m}(t)+\epsilon_{1}/2)}\right]^{n}, \qquad (47)

for all s>0. Hence, there exists \gamma>0 such that (47) is bounded by e^{-\gamma n}, which completes the proof of the lemma. ∎
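The last step can also be verified numerically: for a zero-mean summand, the moment generating function of -(\tilde{\ell}_{m}(t)+\epsilon_{1}/2) dips below 1 for small s>0. A sketch of ours under a Gaussian assumption on \tilde{\ell}_{m}(t):

```python
import numpy as np

# If ell_tilde ~ N(0, v) (illustrative assumption), then
# g(s) = E[exp(-s (ell_tilde + eps1/2))] = exp(s^2 v/2 - s eps1/2),
# which is < 1 for 0 < s < eps1/v, giving the e^{-gamma n} bound in (47).
v, eps1 = 1.0, 0.4
for s in (0.1, 0.2, 0.3, 0.5):
    print(s, np.exp(0.5 * s**2 * v - 0.5 * s * eps1))  # < 1 inside (0, eps1/v)
```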

Lemma 3.

The error probability under SCPA is O(c(logc)1δ)O\left(c\cdot(-\log c)^{1-\delta}\right).

Proof.

Recall that the error probability is given by:

Pe=m=1Mπmαm.P_{e}=\sum_{m=1}^{M}\pi_{m}\alpha_{m}. (48)

Let αm,jMD=𝐏m(δ=j,τ>τc)\alpha_{m,j}^{MD}=\mathbf{P}_{m}(\delta=j,\,\tau>\tau_{c}) for all jmj\neq m, and αm,jFA=𝐏m(δ=j,ττc)\alpha_{m,j}^{FA}=\mathbf{P}_{m}(\delta=j,\,\tau\leq\tau_{c}) for all jj, denote the miss-detect and false-alarm probabilities, respectively. Thus, αm=jmMαm,jMD+j=1Mαm,jFA\alpha_{m}\vphantom{\bigg{[}}=\sum_{j\neq m}^{M}\alpha_{m,j}^{MD}+\sum_{j=1}^{M}\alpha_{m,j}^{FA}.

Therefore, it suffices to show that αm,jMD,αm,jFA=O(c(logc)1δ)\alpha_{m,j}^{MD}\,,\,\alpha_{m,j}^{FA}=O\left(c\cdot(-\log c)^{1-\delta}\right). We start by analyzing the false-alarm error. Note that

\alpha_{m,j}^{FA}=\mathbf{P}_{m}(\delta=j,\,\tau\leq\tau_{c})
=\mathbf{P}_{m}\left(S_{j}(\tau)\geq-\log c\;\;\text{for some}\;\tau\leq\tau_{c}\right)
\leq\sum_{T=1}^{\tau_{c}}\mathbf{P}_{m}\bigg(\sum_{t=T+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta^{(0)})}\geq-\log c\;\;\text{for some}\;T<\tau\bigg)
\leq\sum_{T=1}^{\tau_{c}}\mathbf{P}_{m}\left(Z_{T}(\tau-T)\geq\frac{1}{c}\;\;\text{for some}\;T<\tau\right), \qquad (49)

where we define for fixed T1T\geq 1,

ZT(τT)eSj(τ)=t=T+2τf(yj(t)|θ^j(t1))f(yj(t)|θ(0)).Z_{T}(\tau-T)\triangleq e^{{S_{j}(\tau)}}=\prod_{t=T+2}^{\tau}\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta^{(0)})}\;. (50)

Also, notice that ZT(τT)Z_{T}(\tau-T) is a non-negative martingale:

Em[ZT(τT)|{yj(t)}t=T+2τ1]\displaystyle\displaystyle{\textbf{E}}_{m}\left[Z_{T}\left(\tau-T\right)|\left\{y_{j}(t)\right\}_{t=T+2}^{\tau-1}\right]
=ZT(τ1T)Em[f(yj(τ)|θ^j(τ1))f(yj(τ)|θ(0))]\displaystyle\quad\displaystyle=Z_{T}(\tau-1-T){\textbf{E}}_{m}\left[\frac{f(y_{j}(\tau)|\hat{\theta}_{j}{(\tau-1)})}{f(y_{j}(\tau)|\theta^{(0)})}\right]
=ZT(τ1T).\displaystyle\quad\displaystyle=Z_{T}\left(\tau-1-T\right)\!. (51)

Hence, by Lemma 1 in [54] for a non-negative martingale, we have,

\alpha_{m,j}^{FA}\leq\sum_{T=1}^{\tau_{c}}c\cdot\mathbf{E}_{m}\left[Z_{T}(1)\right]=c\cdot\tau_{c}\leq c\cdot(-\log c)^{1-\delta}, \qquad (52)

where the equality follows from the fact that \mathbf{E}_{m}\left[Z_{T}(1)\right]=1, since for \tau\leq\tau_{c}, \theta_{j}(\tau)=\theta^{(0)}.
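Both the martingale property (51) and the maximal inequality of [54] used in (52) are easy to check by simulation. A minimal Monte Carlo sketch of ours, with a unit-variance Gaussian null and a fixed plug-in parameter (the algorithm uses the adaptive MLE \hat{\theta}_{j}(t-1); the martingale property holds for any predictable plug-in):

```python
import numpy as np

rng = np.random.default_rng(1)

# Under the null y ~ N(0,1), Z_n = prod_t f(y_t|theta_hat)/f(y_t|0) satisfies
# E[Z_n] = 1, and Lemma 1 in [54] gives P(max_n Z_n >= 1/c) <= c.
theta_hat, n, trials, c = 0.2, 50, 100_000, 0.01
y = rng.normal(0.0, 1.0, size=(trials, n))
log_ratio = theta_hat * y - 0.5 * theta_hat**2   # log N(theta_hat,1)/N(0,1) per sample
Z = np.exp(np.cumsum(log_ratio, axis=1))
print(Z[:, -1].mean())                            # ~1: the martingale property
print((Z.max(axis=1) >= 1 / c).mean(), "<=", c)   # crossing rate <= c (loose here)
```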

Next, we analyze the miss-detect error. Note that

\alpha_{m,j}^{MD}=\mathbf{P}_{m}(\delta=j,\,\tau>\tau_{c})
=\mathbf{P}_{m}\left(S_{j}(\tau)\geq-\log c\;\;\text{for some}\;\tau>\tau_{c}\right)
=\sum_{r=\tau_{c}+1}^{\infty}\mathbf{P}_{m}\bigg(\sum_{t=r+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta^{(0)})}\geq-\log c\;\;\text{for some}\;r<\tau\;,\;T=r\bigg)
=\sum_{r=\tau_{c}+1}^{\infty}\mathbf{P}_{m}\bigg(\sum_{t=r+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta^{(0)})}\geq-\log c\;\;\text{for some}\;r<\tau\;\Big|\;T=r\bigg)\cdot\mathbf{P}_{m}\left(T=r\right)
\leq\sum_{r=\tau_{c}+1}^{\infty}\mathbf{P}_{m}\left(Z_{r}(\tau-r)\geq\frac{1}{c}\;\;\text{for some}\;r<\tau\right)\cdot\mathbf{P}_{m}\left(T=r\right)
\leq\sum_{r=\tau_{c}+1}^{\infty}c\cdot\mathbf{P}_{m}\left(T=r\right). \qquad (53)

The last inequality arises from the fact that the event {T=r}\{T=r\} depends solely on samples up to t=rt=r, making it independent of samples from t=r+1t=r+1 onward. Additionally, the inequality follows from Lemma 1 in [54] for non-negative martingales, applied to Zr(τr)Z_{r}(\tau-r), along with the property that Em[Zr(1)]=1{{\textbf{E}}}_{m}\left[Z_{r}\left(1\right)\right]=1 for all jmj\neq m and fixed rr. Finally, note that TτESTT\leq\tau_{EST}. Therefore, we can apply Lemma 1 and obtain:

αm,jMD\displaystyle\alpha_{m,j}^{MD} r=τc+1cPm(T=r)r=τccPm(T>r)\displaystyle\leq\sum_{r=\tau_{c}+1}^{\infty}c\cdot{P}_{m}\left(T=r\right)\leq\sum_{r=\tau_{c}}^{\infty}c\cdot{P}_{m}\left(T>r\right)
r=τccPm(τEST>r)n=0cPm(nEST>n)\displaystyle\leq\sum_{r=\tau_{c}}^{\infty}c\cdot{P}_{m}\left(\tau_{EST}>r\right)\leq\sum_{n=0}^{\infty}c\cdot{P}_{m}\left(n_{EST}>n\right)
=O(c)=O(c(logc)1δ),\displaystyle=O(c)=O\left(c\cdot(-\log c)^{1-\delta}\right), (54)

which completes the proof of the lemma. ∎

Next, we complete the proof of the theorem. From Lemma 1, we have \mathbf{E}_{m}[n_{EST}]=O(1), and from Lemma 2, we have \mathbf{E}_{m}[n_{U}]\leq-(1+o(1))\frac{\log(c)}{D(\theta^{(1)}||\theta^{(0)})}.

The detection time τ\tau under the SCPA algorithm is upper bounded by τU\tau_{U}. Therefore, ττcτUτc=nU+nEST\tau-\tau_{c}\leq\tau_{U}-\tau_{c}=n_{U}+n_{EST}. Combining all together, we obtain:

𝐄m[ττc](1+o(1))log(c)D(θ(1)||θ(0)).\displaystyle\mathbf{E}_{m}[\tau-\tau_{c}]\leq-(1+o(1))\frac{\log(c)}{D(\theta^{(1)}||\theta^{(0)})}. (55)

To upper bound the Bayes risk under the SCPA algorithm, we apply the bound on the error by Lemma 3, and (55) to have

Rm(Γ)(1+o(1))clog(c)D(θ(1)||θ(0)).\displaystyle R_{m}(\Gamma)\leq-(1+o(1))\frac{c\log(c)}{D(\theta^{(1)}||\theta^{(0)})}. (56)

Lastly, we apply the lower bound on the Bayes risk under simple hypotheses for any algorithm [25]: Rm(Γ)(1+o(1))clog(c)D(θ(1)||θ(0))R_{m}(\Gamma)\geq-(1+o(1))\frac{c\log(c)}{D(\theta^{(1)}||\theta^{(0)})}. Combining the upper and lower bounds concludes the proof.
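As a worked illustration of the matched bounds (our example; the theorem does not require Gaussianity), suppose f(\cdot|\theta) is Gaussian with mean \theta and variance \sigma^{2}. Then D(\theta^{(1)}||\theta^{(0)})=(\theta^{(1)}-\theta^{(0)})^{2}/(2\sigma^{2}), and the asymptotic Bayes risk becomes

R_{m}(\Gamma)=-(1+o(1))\frac{2\sigma^{2}\,c\log(c)}{(\theta^{(1)}-\theta^{(0)})^{2}},

so the risk decreases inversely with the squared separation between the abnormal and normal means.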

VII-B Proof of Theorem 1

In this section, we extend the proof for the case when no additional information on the parameters is given, and for all mm, θm\theta_{m} is unknown. Without loss of generality, we prove the theorem when hypothesis mm is true.

To bound the detection time, we define τU\tau_{U} and nUn_{U} as in Definition 3, where τUinf{n>τEST:Sm(n)log(c)}\tau_{U}\triangleq\inf\{n>\tau_{EST}:S_{m}(n)\geq-\log(c)\} and nUτUτESTn_{U}\triangleq\tau_{U}-\tau_{EST}. Additionally, we introduce the following definition:

Definition 4.

For any \varphi\in\Theta^{(0)}, let \tau_{U}(\varphi) be the smallest integer n>\tau_{EST} satisfying \sum_{t=T+2}^{n}\ell^{(\theta^{(1)},\varphi)}_{m}(t)\geq-\log(c). Specifically,

τU(φ)inf{n>τEST:t=T+2nm(θ(1),φ)(t)log(c)}.\displaystyle\tau_{U}(\varphi)\triangleq\inf\{n>\tau_{EST}:\sum_{t=T+2}^{n}\ell^{(\theta^{(1)},\varphi)}_{m}(t)\geq-\log(c)\}. (57)

Also, let n_{U}(\varphi)\triangleq\tau_{U}(\varphi)-\tau_{EST} denote the total amount of time between \tau_{EST} and \tau_{U}(\varphi).

Note that for all n>τESTn>\tau_{EST}, Sm(n)=t=T+2nm(θ(1),θ(0))(t)S_{m}(n)=\sum_{t=T+2}^{n}\ell^{(\theta^{(1)},\theta^{(0)})}_{m}(t) and nU=maxφΘ(0){nU(φ)}n_{U}=\max_{\varphi\in\Theta^{(0)}}\{n_{U}(\varphi)\}. The following lemma bounds nU(φ)n_{U}(\varphi) for all φΘ(0)\varphi\in\Theta^{(0)}.
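When \Theta^{(0)} is finite, the minimization over \varphi that defines the test statistic in this setting (made explicit in (61) below) is a direct computation. A minimal sketch of ours, assuming unit-variance Gaussians and illustrative values:

```python
import numpy as np

def S_stat(y, theta_hat_seq, Theta0):
    """min over phi in Theta0 of sum_t log f(y_t|theta_hat(t-1))/f(y_t|phi),
    for unit-variance Gaussians (the shared normalizing constants cancel)."""
    log_num = -0.5 * (y - theta_hat_seq) ** 2
    return min(np.sum(log_num + 0.5 * (y - phi) ** 2) for phi in Theta0)

y = np.array([1.1, 0.9, 1.3])           # observations after the change (toy values)
theta_hat = np.array([1.0, 1.0, 1.0])   # plug-in estimates theta_hat(t-1)
print(S_stat(y, theta_hat, Theta0=(0.0, 0.2)))
```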

Lemma 4.

Assume that SCPA is implemented indefinitely. Then, for every φΘ(0)\varphi\in\Theta^{(0)} and for every fixed ϵ>0\epsilon>0 there exist C>0C>0 and γ>0\gamma>0 such that

\mathbf{P}_{m}\left(n_{U}(\varphi)>n\right)\leq Ce^{-\gamma\sqrt{n}}\qquad\forall n>-(1+\epsilon)\log c/D(\theta^{(1)}). \qquad (58)
Proof.

As in the first case, we define:

~m(θ(1),φ)(t)m(θ(1),φ)(t)D(θ(1)||φ).\displaystyle\tilde{\ell}^{(\theta^{(1)},\varphi)}_{m}(t)\triangleq\ell^{(\theta^{(1)},\varphi)}_{m}(t)-D(\theta^{(1)}||\varphi). (59)

Again, under hypothesis H_{m}, \tilde{\ell}^{(\theta^{(1)},\varphi)}_{m}(t) has zero mean for all t>\tau_{EST}. Let \epsilon_{1}=D(\theta^{(1)}||\varphi)\epsilon/(1+\epsilon)>0. Then, as in (40), we have:

\sum_{t=T+2}^{\tau_{EST}+n}\ell^{(\theta^{(1)},\varphi)}_{m}(t)+\log c\geq\sum_{t=T+2}^{\tau_{EST}}\ell^{(\theta^{(1)},\varphi)}_{m}(t)+\sum_{t=\tau_{EST}+1}^{\tau_{EST}+n}\tilde{\ell}^{(\theta^{(1)},\varphi)}_{m}(t)+n\epsilon_{1}, \qquad (60)

for all n>-(1+\epsilon)\log c/D(\theta^{(1)}||\varphi). Since D(\theta^{(1)}||\varphi)\geq D(\theta^{(1)}) for all \varphi\in\Theta^{(0)}, we can prove (58) using the same steps as in the proof of Lemma 2. ∎

Next, we show that the error probability is O(c(logc)1δ)O\left(c\cdot(-\log c)^{1-\delta}\right). Similar to the previous case, it suffices to prove that αm,jMD,αm,jFA=O(c(logc)1δ)\alpha_{m,j}^{MD}\,,\,\alpha_{m,j}^{FA}=O\left(c\cdot(-\log c)^{1-\delta}\right) for all jmj\neq m and jj, respectively. Note that for jmj\neq m, θjΘ(0)\theta_{j}\in\Theta^{(0)} and for τ<τc\tau<\tau_{c} , θjΘ(0)\theta_{j}\in\Theta^{(0)} for all jj. Therefore,

S_{j}(\tau)=\sum_{t=T+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\hat{\theta}^{(0)}_{j}(\tau))}
=\min_{\varphi\in\Theta^{(0)}}\sum_{t=T+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\varphi)}
\leq\sum_{t=T+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta_{j})}. \qquad (61)

Hence, with the same notation as in the previous case, defining for fixed T, Z_{T}(\tau-T)\triangleq\prod_{t=T+2}^{\tau}\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta_{j})}\geq e^{S_{j}(\tau)}, we have:

\alpha_{m,j}^{FA}=\mathbf{P}_{m}\left(\delta=j\,,\,\tau\leq\tau_{c}\right)
=\mathbf{P}_{m}\left(S_{j}(\tau)\geq-\log c\;\;\text{for some}\;\tau\leq\tau_{c}\right)
\leq\sum_{T=1}^{\tau_{c}}\mathbf{P}_{m}\bigg(\sum_{t=T+2}^{\tau}\log\frac{f(y_{j}(t)|\hat{\theta}_{j}(t-1))}{f(y_{j}(t)|\theta_{j})}\geq-\log c\;\;\text{for some}\;T<\tau\bigg)
\leq\sum_{T=1}^{\tau_{c}}\mathbf{P}_{m}\left(Z_{T}(\tau-T)\geq\frac{1}{c}\;\;\text{for some}\;T<\tau\right)
\leq\sum_{T=1}^{\tau_{c}}c\,\mathbf{E}_{m}[Z_{T}(1)]=c\cdot\tau_{c}\leq c\cdot\left(-\log c\right)^{1-\delta}, \qquad (62)

where again we used Lemma 1 in [54] for the non-negative martingale Z_{T}(\tau-T), and the equality follows since \mathbf{E}_{m}[Z_{T}(1)]=1. The same holds for \alpha_{m,j}^{MD}, which proves the error bound.

To complete the proof, note that n_{U}>n implies that there exists \varphi\in\Theta^{(0)} with n_{U}(\varphi)>n. Hence, by the union bound,

\mathbf{P}_{m}\left(n_{U}>n\right)\leq\sum_{\varphi\in\Theta^{(0)}}\mathbf{P}_{m}\left(n_{U}(\varphi)>n\right). \qquad (63)

Recall that Θ(0)\Theta^{(0)} is a finite space. Therefore, using Lemma 4, for every ϵ>0\epsilon>0 we can find C>0C>0 and γ>0\gamma>0 such that for all φΘ(0)\varphi\in\Theta^{(0)} we have:

\mathbf{P}_{m}\left(n_{U}(\varphi)>n\right)\leq Ce^{-\gamma\sqrt{n}}\;,\;\forall n>-(1+\epsilon)\log c/D(\theta^{(1)}). \qquad (64)

Using (64), we obtain that (63) is bounded by:

C~eγn,n>(1+ϵ)logc/D(θ(1)),\displaystyle\tilde{C}e^{-\gamma\sqrt{n}}\;,\;\forall n>-(1+\epsilon)\log c/D(\theta^{(1)}), (65)

for suitable \tilde{C}>0. Since \tau-\tau_{c}\leq n_{EST}+n_{U}, using the bounds on n_{EST} and n_{U}, we have:

𝐄m[ττc](1+o(1))log(c)D(θ(1)).\displaystyle\mathbf{E}_{m}[\tau-\tau_{c}]\leq-(1+o(1))\frac{\log(c)}{D(\theta^{(1)})}. (66)

Finally, combining the error bound and (66) yields:

R_{m}(\Gamma)\leq-(1+o(1))\frac{c\log(c)}{D(\theta^{(1)})}, \qquad (67)

which, together with the lower bound on the Bayes risk, completes the proof.

References

  • [1] L. L. Didi, T. Gafni, and K. Cohen, “Active change point anomaly detection over composite hypotheses,” in 2024 60th Annual Allerton Conference on Communication, Control, and Computing, pp. 1–6, 2024.
  • [2] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
  • [3] M. Naghshvar and T. Javidi, “Active sequential hypothesis testing,” The Annals of Statistics, vol. 41, no. 6, pp. 2703–2738, 2013.
  • [4] H. Chernoff, “Sequential design of experiments,” The Annals of Mathematical Statistics, vol. 30, no. 3, pp. 755–770, 1959.
  • [5] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, pp. 100–115, 1954.
  • [6] V. V. Veeravalli and T. Banerjee, “Quickest change detection,” in Academic press library in signal processing, vol. 3, pp. 209–255, Elsevier, 2014.
  • [7] A. G. Tartakovsky, “Asymptotic performance of a multichart CUSUM test under false alarm probability constraint,” in Proceedings of the 44th IEEE Conference on Decision and Control, pp. 320–325, IEEE, 2005.
  • [8] A. A. Borovkov, “Asymptotically optimal solutions in the change-point problem,” Theory of Probability & Its Applications, vol. 43, no. 4, pp. 539–561, 1999.
  • [9] B. Hemo, T. Gafni, K. Cohen, and Q. Zhao, “Searching for anomalies over composite hypotheses,” IEEE Trans. Sig. Proc., vol. 68, pp. 1181–1196, 2020.
  • [10] K. Cohen and Q. Zhao, “Asymptotically optimal anomaly detection via sequential testing,” IEEE Transactions on Signal Processing, vol. 63, no. 11, pp. 2929–2941, 2015.
  • [11] Y. Kaspi, O. Shayevitz, and T. Javidi, “Searching with measurement dependent noise,” IEEE Transactions on Information Theory, vol. 64, no. 4, pp. 2690–2705, 2017.
  • [12] Y. Song and G. Fellouris, “Asymptotically optimal, sequential, multiple testing procedures with prior information on the number of signals,” Electronic Journal of Statistics, vol. 11, no. 1, pp. 338–363, 2017.
  • [13] N. K. Vaidhiyan and R. Sundaresan, “Learning to detect an oddball target,” IEEE Transactions on Information Theory, vol. 64, no. 2, pp. 831–852, 2017.
  • [14] B. Huang, K. Cohen, and Q. Zhao, “Active anomaly detection in heterogeneous processes,” IEEE Transactions on Information Theory, vol. 65, no. 4, pp. 2284–2301, 2018.
  • [15] C. Zhong, M. C. Gursoy, and S. Velipasalar, “Deep actor-critic reinforcement learning for anomaly detection,” in IEEE Global Communications Conference (GLOBECOM), pp. 1–6, 2019.
  • [16] A. Gurevich, K. Cohen, and Q. Zhao, “Sequential anomaly detection under a nonlinear system cost,” IEEE Transactions on Signal Processing, vol. 67, no. 14, pp. 3689–3703, 2019.
  • [17] C. Wang, K. Cohen, and Q. Zhao, “Information-directed random walk for rare event detection in hierarchical processes,” IEEE Transactions on Information Theory, 2020.
  • [18] T. Gafni, K. Cohen, and Q. Zhao, “Searching for unknown anomalies in hierarchical data streams,” IEEE Signal Processing Letters, vol. 28, pp. 1774–1778, 2021.
  • [19] A. Tajer, J. Heydari, and H. V. Poor, “Active sampling for the quickest detection of Markov networks,” IEEE Transactions on Information Theory, vol. 68, no. 4, pp. 2479–2508, 2021.
  • [20] T. Lambez and K. Cohen, “Anomaly search with multiple plays under delay and switching costs,” IEEE Transactions on Signal Processing, vol. 70, pp. 174–189, 2021.
  • [21] A. Tsopelakos and G. Fellouris, “Sequential anomaly detection under sampling constraints,” IEEE Transactions on Information Theory, 2022.
  • [22] A. Tsopelakos and G. Fellouris, “Asymptotically optimal sequential anomaly identification with ordering sampling rules,” arXiv preprint arXiv:2309.14528, 2023.
  • [23] T. Gafni, B. Wolff, G. Revach, N. Shlezinger, and K. Cohen, “Anomaly search over discrete composite hypotheses in hierarchical statistical models,” IEEE Transactions on Signal Processing, vol. 71, pp. 202–217, 2023.
  • [24] H. Szostak and K. Cohen, “Deep multi-agent reinforcement learning for decentralized active hypothesis testing,” IEEE Access, 2024.
  • [25] K. Cohen and Q. Zhao, “Active hypothesis testing for anomaly detection,” IEEE Trans. Inf. Theory, vol. 61, no. 3, pp. 1432–1450, 2015.
  • [26] S. Nitinawarat and V. V. Veeravalli, “Universal scheme for optimal search and stop,” in 2015 Information Theory and Applications Workshop (ITA), pp. 322–328, IEEE, 2015.
  • [27] G. Schwarz, “Asymptotic shapes of Bayes sequential testing regions,” The Annals of Mathematical Statistics, pp. 224–236, 1962.
  • [28] T. L. Lai, “Nearly optimal sequential tests of composite hypotheses,” The Annals of Statistics, pp. 856–886, 1988.
  • [29] L. Lai, H. V. Poor, Y. Xin, and G. Georgiadis, “Quickest search over multiple sequences,” IEEE Transactions on Information Theory, vol. 57, no. 8, pp. 5375–5386, 2011.
  • [30] A. Tajer and H. V. Poor, “Quick search for rare events,” IEEE Transactions on Information Theory, vol. 59, no. 7, pp. 4462–4481, 2013.
  • [31] K. Cohen, Q. Zhao, and A. Swami, “Optimal index policies for quickest localization of anomaly in cyber networks,” in 2013 IEEE Global Conference on Signal and Information Processing, pp. 221–224, 2013.
  • [32] K. Cohen and Q. Zhao, “Anomaly detection over independent processes: Switching with memory,” in 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 33–37, 2014.
  • [33] J. Heydari, A. Tajer, and H. Vincent Poor, “Quickest linear search over correlated sequences,” IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5786–5808, 2016.
  • [34] E. S. Page, “Continuous inspection schemes,” Biometrika, vol. 41, no. 1/2, pp. 100–115, 1954.
  • [35] A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory of Probability & Its Applications, vol. 8, no. 1, pp. 22–46, 1963.
  • [36] G. Lorden, “Procedures for reacting to a change in distribution,” The Annals of Mathematical Statistics, pp. 1897–1908, 1971.
  • [37] G. V. Moustakides, “Optimal stopping times for detecting changes in distributions,” The Annals of Statistics, vol. 14, no. 4, pp. 1379–1387, 1986.
  • [38] Y. Ritov, “Decision theoretic optimality of the CUSUM procedure,” The Annals of Statistics, pp. 1464–1469, 1990.
  • [39] M. Pollak, “Optimal detection of a change in distribution,” The Annals of Statistics, pp. 206–227, 1985.
  • [40] E. Bayraktar and L. Lai, “Byzantine fault tolerant distributed quickest change detection,” SIAM Journal on Control and Optimization, vol. 53, no. 2, pp. 575–591, 2015.
  • [41] V. V. Veeravalli, G. Fellouris, and G. V. Moustakides, “Quickest change detection with controlled sensing,” IEEE Journal on Selected Areas in Information Theory, vol. 5, pp. 1–11, 2024.
  • [42] L. Xie, G. V. Moustakides, and Y. Xie, “Window-limited CUSUM for sequential change detection,” IEEE Transactions on Information Theory, vol. 69, no. 9, pp. 5990–6005, 2023.
  • [43] A. N. Shiryaev, “On optimum methods in quickest detection problems,” Theory of Probability & Its Applications, vol. 8, no. 1, pp. 22–46, 1963.
  • [44] G. Lorden, “Procedures for reacting to a change in distribution,” The Annals of Mathematical Statistics, vol. 42, no. 6, pp. 1897–1908, 1971.
  • [45] M. Pollak, “Optimal detection of a change in distribution,” The Annals of Statistics, vol. 13, no. 1, pp. 206–227, 1985.
  • [46] A. G. Tartakovsky and V. V. Veeravalli, “General asymptotic Bayesian theory of quickest change detection,” Theory of Probability & Its Applications, vol. 49, no. 3, pp. 458–497, 2005.
  • [47] L. Xie, S. Zou, Y. Xie, and V. V. Veeravalli, “Sequential (quickest) change detection: Classical results and new directions,” IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 2, pp. 494–514, 2021.
  • [48] T. Banerjee, H. Firouzi, and A. O. Hero, “Quickest detection for changes in maximal kNN coherence of random matrices,” IEEE Transactions on Signal Processing, vol. 66, no. 17, pp. 4490–4503, 2018.
  • [49] A. Puzanov and K. Cohen, “Deep reinforcement one-shot learning for change point detection,” in 2018 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1047–1051, 2018.
  • [50] Q. Xu, Y. Mei, and G. V. Moustakides, “Optimum multi-stream sequential change-point detection with sampling control,” IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7627–7636, 2021.
  • [51] Q. Xu and Y. Mei, “Asymptotic optimality theory for active quickest detection with unknown postchange parameters,” Sequential Analysis, vol. 42, no. 2, pp. 150–181, 2023.
  • [52] A. Wald, Sequential Analysis. New York, NY, USA: Wiley, 1947.
  • [53] K. Cohen, Q. Zhao, and A. Swami, “Optimal index policies for anomaly localization in resource-constrained cyber systems,” IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4224–4236, 2014.
  • [54] H. Robbins and D. Siegmund, “A class of stopping rules for testing parametric hypotheses,” in Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, CA, 1970/1971), vol. 4, pp. 37–41, 1972.