
Likelihood Scores for Sparse Signal and Change-Point Detection

Shouri Hu, Jingyan Huang, Hao Chen and Hock Peng Chan
School of Mathematical Sciences, University of Electronic Science and Technology of China; Department of Statistics and Data Science, National University of Singapore; Department of Statistics, University of California at Davis
Abstract

We consider the identification of change-points on large-scale data streams. The objective is to find the most efficient way of combining information across data streams so that detection is possible under the smallest detectable change magnitude. The challenge comes from the sparsity of change-points when only a small fraction of the data streams undergo change at any point in time. The most successful approach to the sparsity issue so far has been the application of hard thresholding, such that only local scores from data streams exhibiting significant changes are considered and added. However, identifying an optimal threshold is difficult; in particular, it is unlikely that the same threshold is optimal at different levels of sparsity. We propose here a sparse likelihood score for identifying a sparse signal. The score is a likelihood ratio for testing the null hypothesis of no change against an alternative hypothesis in which the change-points or signals are barely detectable. By the Neyman-Pearson Lemma this score has maximum detection power at the given alternative. The outcome is a scoring of data streams that detects successfully at the boundary of the detectable region of signals and change-points. The likelihood score can be seen as a soft thresholding approach to sparse signal and change-point detection in which local scores indicating small changes are down-weighted much more than local scores indicating large changes. We show second-order optimality of the sparse likelihood score, in the sense of achieving successful detection at the minimum detectable order of change magnitude as well as at the minimum detection asymptotic constant with respect to this order of change.

1 Introduction

Consider a large number N of data streams containing change-points. We consider the situation in which all data up to a given time is available for analysis, so each data stream is an observed sequence of length T. At each change-point, one or more of the sequences undergo a distribution change. The objective is to identify these change-points and the sequences undergoing distribution change. Of interest here is the identification of these change-points when there is sparsity, that is, when the number of sequences undergoing change is small compared to N. More specifically, we want to know the minimum magnitude of change for which the distribution change can be detected under sparsity, and secondly we want an algorithm that is able to detect, with high probability, change-points under the minimum detectable change. See Niu, Hao and Zhang [23] and Wang and Samworth [30] for applications to engineering, genomics and finance.

A typical strategy to deal with sparsity is to subject local scores to thresholding or penalization before summing them up across sequences. Algorithms employing this strategy include the Sparsified Binary Segmentation (SBS) (Cho and Fryzlewicz [8]), the double CUSUM (DC) (Cho [7]), the Informative Sparse Projection (INSPECT) (Wang and Samworth [30]) and the scan algorithm of Enikeeva and Harchaoui [12]. The strategy was also employed by Mei [22], Xie and Siegmund [34] and Wang and Mei [31] in sequential change-point detection on multiple sequences, and by Zhang, Siegmund, Ji and Li [37] to detect distribution deviations from known baselines on multiple sequences. Thresholding and penalization suppress noise by removing small and moderate scores, mostly from the majority of sequences without change, thus enhancing the signals from the sparse sequences with changes. It is however unlikely that we are able to specify a threshold or penalization parameter that is optimal at all levels of sparsity.

The higher-criticism (HC) test statistic, proposed by Tukey [29] to check for a significantly large number of small p-values, uses multiple thresholds for sparse mixture detection. The number of p-values below a threshold is transformed to a higher-criticism score and this score is maximized over all thresholds. The Berk and Jones [3] test statistic uses multiple thresholds as well, but it applies a different p-value transformation. The HC test statistic was shown by Donoho and Jin [9] to be optimal in the detection of a sparse normal mixture. Cai, Jeng and Jin [4] extended it to detect intervals in multiple sequences where the means of a sparse fraction of the sequences deviate from a known baseline, and showed that the HC test statistic is optimal. Chan and Walther [5] considered sequence length much larger than the number of sequences, with detection boundaries that are more complex. They showed that the HC test statistic achieves detection at these boundaries and is optimal in more general settings. They also showed that the Berk-Jones test statistic achieves the same optimality. The HC and Berk-Jones test statistics have the advantage of not requiring a threshold to be specified in advance. However, at each threshold we are unable to differentiate between a p-value just below the threshold and a p-value that is below the threshold by a large margin. This loss of information results in sub-optimality when identifying the exact location where change has occurred.

Our approach here is to convert the p-values into likelihood scores for testing sparse sequences. It can be considered a soft form of thresholding in which p-values that are close to zero are penalized less than p-values that are barely significant. Unlike the HC and Berk-Jones test statistics, only one transformation of the p-values is needed. It retains the advantage of the HC and Berk-Jones test statistics in not requiring a threshold to be specified in advance, but unlike those statistics, a smaller p-value always has a larger likelihood score.

Since the likelihood scores are transformations of p-values, the proposed method can be applied to any type of distribution changes and it can handle data types that vary across sequences. Our theory however requires a specific distribution family for neat asymptotics and we consider here in particular either normal or Poisson data. We show optimality up to the correct asymptotic constants. For sparse normal change-points these constants are two-dimensional extensions of those in Ingster [17] and Donoho and Jin [9] for sparse normal mixture detection. These constants have been discussed in the context of sparse normal change-point detection assuming a known baseline in Chan and Walther [5] and Chan [6]. For sparse Poisson change-points the constants are new and different from sparse normal constants.

The optimality of multiple sequence identification of change-points up to the correct constant is new. Previous works on optimality for normal data are up to the correct order of magnitude though they go beyond the i.i.d. model, for example Pilliat, Carpentier and Verzelen [26] considered sparse change-point detection in time-series with normal errors. Liu, Guo and Samworth [21] showed optimality up to the best order for normal errors. However they considered a different problem that assumes at most one change-point.

The algorithm we propose here has two steps in the identification of change-points. The first, detection or screening step applies the Screening and Ranking (SaRa) idea of Niu and Zhang [24]. The second, estimation step for more precise location of change-points uses the CUSUM-like procedure of Wild Binary Segmentation (WBS), cf. Fryzlewicz [14]. This two-step approach saves computation time because the fast screening step evaluates a large number of segments, whereas the computationally intensive estimation step is only applied when a change-point has been detected during screening. In contrast, for WBS the estimation step is applied on a large number of randomly generated segments. Unlike Niu and Zhang [24], we do not apply the BIC criterion of Zhang and Siegmund [36] to determine the number of change-points. Instead, critical values are specified in advance and binary segmentation, cf. Olshen et al. [25], is applied to detect the change-points sequentially.

An alternative to binary segmentation is estimating the full set of change-points at one go by applying global optimization and making use of dynamic programming to manage the computational complexity. This was employed by the HMM algorithms of Yao [35] and Lai and Xing [20], the multi-scale SMUCE algorithm of Frick, Munk and Sieling [13] and the Bayesian Likelihood algorithm of Du, Kao and Kou [10]. These methods are however designed for single sequence segmentation. Niu, Hao and Zhang [23] provides an excellent background of the historical developments.

The outline of this paper is as follows. In Section 2 we introduce the sparse likelihood (SL) scores and show that they are optimal in the detection of sparse normal mixtures. In Section 3 we extend SL scores to detect change-points in multiple sequences. In Section 4 we show that SL scores are optimal for change-point detection when the observations are normal or Poisson. In Section 5 we perform simulation studies on the SL scores. In the appendices we prove the optimality of SL scores.

1.1 Notations

We write a_{n}\sim b_{n} to denote \lim_{n\rightarrow\infty}(a_{n}/b_{n})=1. We write a_{n}=o(b_{n}) to denote \lim_{n\rightarrow\infty}(a_{n}/b_{n})=0. We write a_{n}\lesssim b_{n} to denote a_{n}\leq Cb_{n} for all n for some C>0, and a_{n}\asymp b_{n} to denote a_{n}\lesssim b_{n} and b_{n}\lesssim a_{n}. We write X_{n}=O_{p}(a_{n}) to denote P(X_{n}\leq Ca_{n})\rightarrow 1 for some C>0. Let \lfloor\cdot\rfloor (\lceil\cdot\rceil) denote the greatest (least) integer function. Let \phi and \Phi denote the density and distribution function respectively of the standard normal. Let {\bf 1} denote the indicator function. Let \emptyset denote the empty set and let \#A denote the number of elements in a set A.

2 Sparse mixture detection

We start with the simpler problem of detecting a sparse mixture, with the objective of motivating the sparse likelihood score.

Let {\bf p}=(p^{1},\ldots,p^{N}) be independent p-values of N null hypotheses and let p^{(1)}\leq\cdots\leq p^{(N)} be the sorted p-values. Tukey (1976) proposed the higher-criticism test statistic

\mbox{HC}({\bf p})=\max_{n:Np^{(n)}\leq n}\tfrac{n-Np^{(n)}}{\sqrt{Np^{(n)}(1-p^{(n)})}}, (1)

with {\rm HC}({\bf p})=0 if Np^{(n)}>n for all n, for the overall test that all null hypotheses are true.

Donoho and Jin [9] showed that the HC test statistic is optimal for detecting a sparse fraction of false null hypotheses. Consider test scores Z^{n}\sim{\rm N}(0,1) when the nth null hypothesis is true and Z^{n}\sim{\rm N}(\mu_{N},1) for some \mu_{N}>0 when the nth null hypothesis is false. Define

\rho_{Z}(\beta)=\left\{\begin{array}{ll}\beta-\tfrac{1}{2}&\mbox{ if }\tfrac{1}{2}<\beta<\tfrac{3}{4},\cr(1-\sqrt{1-\beta})^{2}&\mbox{ if }\tfrac{3}{4}\leq\beta<1.\end{array}\right. (2)

Donoho and Jin [9] showed that on the sparse mixture (1-\epsilon){\rm N}(0,1)+\epsilon{\rm N}(\mu_{N},1) no algorithm is able to achieve, as N\rightarrow\infty,

P_{0}(\mbox{Type I error})+P_{\mu_{N}}(\mbox{Type II error})\rightarrow 0, (3)

for testing H_{0}: \epsilon=0 versus H_{1}: \epsilon=N^{-\beta}, if \mu_{N}=\sqrt{2\nu\log N} for \nu<\rho_{Z}(\beta). They also showed that the HC test statistic achieves (3) when \nu>\rho_{Z}(\beta) and is thus optimal. Type I error refers to the conclusion of H_{1} when H_{0} is true, whereas Type II error refers to the conclusion of H_{0} when H_{1} is true. Ingster (1997, 1998) established the detection lower bound showing that (3) cannot be achieved when \nu<\rho_{Z}(\beta).

Like the HC test statistic, the Berk and Jones [3] test statistic

\mbox{BJ}({\bf p})=\max_{n:Np^{(n)}\leq n}\Big{[}n\log\Big{(}\tfrac{n}{Np^{(n)}}\Big{)}+(N-n)\log\Big{(}\tfrac{N-n}{N(1-p^{(n)})}\Big{)}\Big{]}

achieves (3) when \nu>\rho_{Z}(\beta).
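To make these definitions concrete, here is a small Python sketch (our own illustration, not code from the paper) of the HC statistic in (1) and the Berk-Jones statistic above, computed from a vector of p-values:

```python
import numpy as np

def hc_statistic(p):
    """Higher-criticism statistic (1), maximized over n with N * p_(n) <= n."""
    p = np.clip(np.sort(np.asarray(p, dtype=float)), 1e-12, 1 - 1e-12)
    N = len(p)
    n = np.arange(1, N + 1)
    keep = N * p <= n                    # restrict to thresholds with N * p_(n) <= n
    if not keep.any():
        return 0.0
    scores = (n - N * p) / np.sqrt(N * p * (1 - p))
    return scores[keep].max()

def bj_statistic(p):
    """Berk-Jones statistic: binomial log-likelihood ratio at each sorted p-value."""
    p = np.clip(np.sort(np.asarray(p, dtype=float)), 1e-12, 1 - 1e-12)
    N = len(p)
    n = np.arange(1, N + 1)
    keep = N * p <= n
    if not keep.any():
        return 0.0
    term1 = n * np.log(n / (N * p))
    term2 = np.zeros(N)
    m = n < N                            # the (N-n)log(...) term vanishes at n = N
    term2[m] = (N - n[m]) * np.log((N - n[m]) / (N * (1 - p[m])))
    return (term1 + term2)[keep].max()
```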

We introduce the sparse likelihood scores in Section 2.1 and show in Section 2.2 that they achieve (3), when \nu>\rho_{Z}(\beta), in the detection of sparse mixtures.

2.1 Sparse likelihood

Let f_{1}(p)=\tfrac{1}{p(2-\log p)^{2}}-\tfrac{1}{2} and f_{2}(p)=\tfrac{1}{\sqrt{p}}-2. For both i=1 and 2, \int_{0}^{1}f_{i}(p)dp=0 and f_{i}(p) increases as p decreases.

Define the sparse likelihood score

\ell_{N}({\bf p}) = \sum_{n=1}^{N}\ell_{N}(p^{n}), (4)
where \ell_{N}(p) = \log\Big{(}1+\tfrac{\lambda_{1}\log N}{N}f_{1}(p)+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(p)\Big{)},

with \lambda_{1}\geq 0 and \lambda_{2}>0. For the purpose of sparse mixture detection f_{1} plays no role and we can consider \lambda_{1}=0. The importance of f_{1} comes from detecting very sparse change-points.
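As an illustration only (not the authors' implementation), the score in (4) can be evaluated directly from its definition; the default values of \lambda_{1} and \lambda_{2} below are placeholders:

```python
import numpy as np

def f1(p):
    return 1.0 / (p * (2.0 - np.log(p)) ** 2) - 0.5

def f2(p):
    return 1.0 / np.sqrt(p) - 2.0

def sparse_likelihood(p, lam1=1.0, lam2=1.0):
    """Sparse likelihood score (4) of a vector of p-values."""
    p = np.clip(np.asarray(p, dtype=float), 1e-300, 1.0)   # guard against underflow to 0
    N = len(p)
    return float(np.sum(np.log(1.0 + lam1 * np.log(N) / N * f1(p)
                                   + lam2 / np.sqrt(N * np.log(N)) * f2(p))))
```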

The sparse likelihood is the likelihood of

p^{n} \sim_{\rm i.i.d.} {\rm Uniform}(0,1),
\mbox{vs } p^{n} \sim_{\rm i.i.d.} f=1+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}.

The distribution function of f is

F(p)=p+\tfrac{2\lambda_{2}}{\sqrt{N\log N}}(\sqrt{p}-p).

Consider the true distribution of p^{n} to be some unknown G\neq{\rm Uniform}(0,1). The fraction of p-values less than some small p_{0} has mean G(p_{0}) and standard deviation approximately \sqrt{G(p_{0})/N}. Hence to achieve decent detection power for a test using p_{0} as critical p-value, we need

G(p_{0})\geq p_{0}+C\sqrt{p_{0}/N}\mbox{ for some }C>0. (5)

Substituting C=\tfrac{2\lambda_{2}}{\sqrt{\log N}} into the right-hand side of (5) gives a function close to F.

As \sqrt{\log N} varies slowly with N, we can view F as lying at the boundary for which detection is possible at each small p_{0}. This suggests that for any G that is greater than F at some p_{0}, we are able to differentiate it from Uniform(0,1) by using the likelihood test against F.

When p is of order smaller than N^{-1}, \tfrac{\log N}{N}f_{1}(p) dominates \tfrac{1}{\sqrt{N\log N}}f_{2}(p) and the selection of \lambda_{1}>0 is advantageous. This is relevant in the extension of sparse likelihood scores to detect change-points on long sequences, where a large number of likelihood comparisons is involved.

Figure 1: Graphs of \ell_{N}(p(z)) (black, solid) and (z-2)^{2}_{+}/2 (red, dashed), with p(z)=2\Phi(-z), for 0\leq z\leq 5 (top) and 0\leq z\leq 2 (bottom). The parameters of \ell_{N} are N=500, \lambda_{1}=1 and \lambda_{2}=1.84 (\doteq\sqrt{\tfrac{\log T}{\log\log T}} for T=500).

The sparse likelihood score can be viewed as a form of soft thresholding. To visualize this we compare in Figure 1 the plot of \ell_{N}(p(z)) for p(z)=2\Phi(-|z|), N=500, \lambda_{1}=1 and \lambda_{2}=1.84, against that of (z-2)^{2}_{+}/2. For 0\leq z\leq 5 the two functions are close to each other; however within 0\leq z\leq 2, \ell_{N}(p(z)) is not constant but has a gentle upward curve. The sparse likelihood score is negative for z\leq 1.18 and \ell_{N}(p(Z)) for Z standard normal has a mean of -0.004. This negative mean helps in controlling the sum of scores when N is large and p^{n}\stackrel{\rm i.i.d.}{\sim}{\rm Uniform}(0,1).

2.2 Optimal detection

We show here that the sparse likelihood score is optimal in the detection of sparse mixtures for a broad range of sparsity. Let E_{0} and P_{0} denote expectation and probability respectively with respect to p^{n}\stackrel{\rm i.i.d.}{\sim}{\rm Uniform}(0,1). Since

E_{0}\exp(\ell_{N}({\bf p})) = \prod_{n=1}^{N}E_{0}[1+\tfrac{\lambda_{1}\log N}{N}f_{1}(p^{n})+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(p^{n})]=1,

it follows from Markov’s inequality that

P_{0}(\ell_{N}({\bf p})\geq c_{N})\leq e^{-c_{N}}. (6)

This exponential bound makes the sparse likelihood score easy to work with when there are a large number of likelihood comparisons, as critical values satisfying a required level of Type I error control can have a simple expression not depending on N. We show in Theorem 1 that by selecting

c_{N}\rightarrow\infty\mbox{ with }c_{N}=o(N^{\delta})\mbox{ for all }\delta>0, (7)

the Type I and II error probabilities both go to zero at the detection boundary.

Theorem 1.

Assume (7). Consider the test of H_{0}: Z^{n}\stackrel{\rm i.i.d.}{\sim}{\rm N}(0,1) versus H_{1}: Z^{n}\stackrel{\rm i.i.d.}{\sim}(1-\epsilon){\rm N}(0,1)+\epsilon{\rm N}(\mu_{N},1), for 1\leq n\leq N, with \epsilon=N^{-\beta} for some \tfrac{1}{2}<\beta<1. Let p^{n}=\Phi(-Z^{n}). If \mu_{N}=\sqrt{2\nu\log N} for \nu>\rho_{Z}(\beta), then

P_{0}(\ell_{N}({\bf p})\geq c_{N})+P_{\mu_{N}}(\ell_{N}({\bf p})<c_{N})\rightarrow 0.

3 Change-point detection

Let X^{n}_{t} denote the tth observation of the nth sequence, for 1\leq t\leq T and 1\leq n\leq N. Consider first the model

X^{n}_{t}\sim_{\rm indep.}{\rm N}(\mu_{t}^{n},1). (8)

We are interested in the detection and estimation of

\boldsymbol{\tau}:=\{t:\mu^{n}_{t}\neq\mu^{n}_{t+1}\mbox{ for some }n\}.

For s<t, let \bar{X}_{st}^{n}=(t-s)^{-1}\sum_{u=s+1}^{t}X_{u}^{n}. To check for a change of mean on the nth sequence at location t, select s<t<u and let the p-value be

p_{stu}^{n}=2\Phi(-|Z_{stu}^{n}|),\mbox{ where }Z_{stu}^{n}=\tfrac{\bar{X}_{tu}^{n}-\bar{X}_{st}^{n}}{\sqrt{(u-t)^{-1}+(t-s)^{-1}}}.

In the sparse likelihood algorithm we combine these p-values using \ell_{N}({\bf p}_{stu}), where {\bf p}_{stu}=(p_{stu}^{1},\ldots,p_{stu}^{N}). When the data follow some other distribution, the corresponding likelihood ratio statistic and p-value can be computed accordingly.
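A minimal sketch (ours, under the normal model (8)) of the local p-values and their combination; `sparse_likelihood` refers to the sketch in Section 2.1:

```python
import numpy as np
from scipy.stats import norm

def pvalues_stu(X, s, t, u):
    """Two-sided p-values p_stu^n for a mean change at t within (s, u]; X has shape (N, T),
    with column j holding observation j + 1, so X[:, s:t] covers observations s+1, ..., t."""
    left = X[:, s:t].mean(axis=1)
    right = X[:, t:u].mean(axis=1)
    z = (right - left) / np.sqrt(1.0 / (u - t) + 1.0 / (t - s))
    return 2 * norm.sf(np.abs(z))

def score_stu(X, s, t, u, lam1=1.0, lam2=1.0):
    """Combine the N local p-values with the sparse likelihood score (4)."""
    return sparse_likelihood(pvalues_stu(X, s, t, u), lam1, lam2)
```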

Sparse likelihood scores detect well when only a small fraction of the sequences undergo a change of mean. For T large, computing the sparse likelihood score for all (s,t,u) is expensive. Instead we apply the approximating set idea of Walther (2010) to first space out the (s,t,u) that are evaluated, and apply the CUSUM-type scores used in WBS to estimate the change-point location accurately only when the first step indicates a change-point.

In addition to computational savings, through this two-step approach we are able to incorporate multi-scale penalization terms similar to those used in Dümbgen and Spokoiny (2001) and the SMUCE algorithm of Frick, Munk and Sieling (2014), to ensure optimality not only at all levels of sparse change-points, but also at all orders of change magnitudes.

Let 1\leq h_{1}<h_{2}<\cdots and 1\leq d_{1}<d_{2}<\cdots be integer-valued sequences with h_{i}\geq d_{i} for all i. Let K_{i}(g)=\lfloor\tfrac{g-1}{d_{i}}\rfloor. Define

{\cal A}_{i}(g) = \{(s(ik),t(ik),u(ik)):1\leq k\leq K_{i}(g)\},
s(ik) = \max(0,kd_{i}-h_{i}),
t(ik) = kd_{i},
u(ik) = \min(kd_{i}+h_{i},g).

The elements of {\cal A}_{i}(g) are the indices where sparse likelihood scores for windows of length h_{i} are computed. Initially we have the full dataset {\bf X}_{1:T}=(X^{n}_{t}:1\leq t\leq T,1\leq n\leq N), and after one or more change-points have been estimated, it is split into sub-datasets {\bf X}_{b:e}=(X^{n}_{t}:b\leq t\leq e,1\leq n\leq N), with length g=e-b+1. We check for change-points in {\bf X}_{b:e} using windows specified by {\cal A}_{i}(g).
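A short sketch (our rendering of the definitions above) of the approximating set:

```python
def approximating_set(g, h_i, d_i):
    """Indices (s, t, u) of A_i(g): windows of half-length h_i, with centers spaced d_i apart."""
    K = (g - 1) // d_i
    return [(max(0, k * d_i - h_i), k * d_i, min(k * d_i + h_i, g))
            for k in range(1, K + 1)]
```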

Define the penalized sparse likelihood score

\ell^{\rm pen}_{N}({\bf p}_{stu})=\ell_{N}({\bf p}_{stu})-\log\Big{(}\tfrac{T}{4}\Big{(}\tfrac{1}{t-s}+\tfrac{1}{u-t}\Big{)}\Big{)}. (9)

For g\geq 1, let i_{g}=\max\{i:h_{i}+d_{i}\leq g\}. The detection of change-points within {\bf X}_{b:e}, with window lengths at least h_{i_{0}}, is as follows.

Algorithm 1 SL-estimate
  input (c, i_{0}, b, e)
    {\bf X}\leftarrow{\bf X}_{b:e}
    g\leftarrow e-b+1
    for i=i_{0},\ldots,i_{g}
      if \max_{1\leq k\leq K_{i}(g)}\ell^{\rm pen}_{N}({\bf p}_{s(ik),t(ik),u(ik)})\geq c then
        j\leftarrow\mbox{argmax}_{k:1\leq k\leq K_{i}(g)}\ell^{\rm pen}_{N}({\bf p}_{s(ik),t(ik),u(ik)})
        \widehat{\tau}\leftarrow[\mbox{argmax}_{t:s(ij)<t<u(ij)}\ell^{\rm pen}_{N}({\bf p}_{s(ij),t,u(ij)})]+b-1
        output (\widehat{\tau},i)
        stop
      end if
    end for
    output (0,0)

There are two steps in SL-estimate in the estimation of a change-point, when the largest penalized score exceeds the critical value c. The first is the identification of an interval (s(ij),u(ij)), associated with the largest penalized score, within which a change-point lies. The second is the estimation of the change-point within this interval. In the approximating set {\cal A}_{i}(g), neighboring windows are located d_{i} apart, hence we are unable to estimate the change-points accurately in the first step. Accurate estimation is carried out, with more intensive computations within (s(ij),u(ij)), in the second step. Since the second step is performed only after an interval has been identified as containing a change-point, this two-step procedure saves computations in regions where scores are generally small and the likelihood of change-points is low.

After a change-point has been identified, we split the dataset into two and execute the same algorithm on each split dataset. To avoid repetitive computations, we start from the window length h_{i_{0}} used in the evaluation of the change-point splitting the dataset, instead of starting from the smallest window length h_{1}, on the split datasets. The use of a representative set of window lengths h_{i} for computational savings in change-point detection has been proposed in Willsky and Jones [33]. The recursive segmentation algorithm for the computation of the estimated change-point set \widehat{\boldsymbol{\tau}} is given below, with initialization at (c,1,1,T,\emptyset).

Figure 2: Graphs of Type I error probability against critical value for the sparse likelihood detection algorithm, for independent unit-variance normal observations. We consider parameters d_{i}, h_{i}, \lambda_{1} and \lambda_{2} as applied in the numerical studies in Section 5, with T=2000 (top) and T=20{,}000 (bottom), and N=50 (black), N=100 (red), N=200 (green), N=500 (blue), N=1000 (orange).
Algorithm 2 SL-detect
  input (c, i_{0}, b, e, \widehat{\boldsymbol{\tau}})
    (\widehat{\tau},i)\leftarrow SL-estimate(c,i_{0},b,e)
    if \widehat{\tau}>0 then
      \widehat{\boldsymbol{\tau}}\leftarrow\widehat{\boldsymbol{\tau}}\cup\{\widehat{\tau}\}
      \widehat{\boldsymbol{\tau}}\leftarrow SL-detect(c,i,b,\widehat{\tau},\widehat{\boldsymbol{\tau}})
      \widehat{\boldsymbol{\tau}}\leftarrow SL-detect(c,i,\widehat{\tau},e,\widehat{\boldsymbol{\tau}})
    end if
    output \widehat{\boldsymbol{\tau}}
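The following Python sketch is our own compact rendering of Algorithms 1 and 2; the helpers `score_stu` and `approximating_set` are from the earlier sketches, the window grids `h`, `d` are assumed to be 0-indexed lists, and the guard against re-detecting a split point is our addition:

```python
import numpy as np

def sl_estimate(X, c, i0, b, e, h, d, T, lam1=1.0, lam2=1.0):
    """Algorithm 1 (SL-estimate): screen spaced windows of increasing length;
    on detection, refine the change-point location within the flagged window."""
    Xsub = X[:, b - 1:e]                 # columns of X hold observations 1..T
    g = e - b + 1

    def pen_score(s, t, u):
        # penalized sparse likelihood score (9) on the sub-dataset
        pen = np.log(T / 4.0 * (1.0 / (t - s) + 1.0 / (u - t)))
        return score_stu(Xsub, s, t, u, lam1, lam2) - pen

    for i in range(i0, len(h)):          # i runs up to i_g, i.e. while h_i + d_i <= g
        if h[i] + d[i] > g:
            break
        windows = approximating_set(g, h[i], d[i])
        if not windows:
            continue
        scores = [pen_score(s, t, u) for (s, t, u) in windows]
        k = int(np.argmax(scores))
        if scores[k] >= c:
            s, _, u = windows[k]
            tau = max(range(s + 1, u), key=lambda t: pen_score(s, t, u)) + b - 1
            return tau, i
    return 0, 0

def sl_detect(X, c, i0, b, e, h, d, T, found=None):
    """Algorithm 2 (SL-detect): recursive binary segmentation driven by sl_estimate."""
    found = set() if found is None else found
    tau, i = sl_estimate(X, c, i0, b, e, h, d, T)
    if tau > 0 and tau not in found:     # guard (ours) against re-detecting a split point
        found.add(tau)
        sl_detect(X, c, i, b, tau, h, d, T, found)
        sl_detect(X, c, i, tau, e, h, d, T, found)
    return found

# Example call on the full dataset:
# change_points = sl_detect(X, c=5.0, i0=0, b=1, e=T, h=h, d=d, T=T)
```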

In Figure 2 we show that the critical values of the sparse likelihood algorithm, for a specified Type I error probability, are stable over N. Contributing factors include \ell_{N}(p) having a mean that is close to zero and \ell_{N}({\bf p}) having exponential tail probabilities not depending on N, see (6), when p and p^{n} are uniformly distributed.

4 Optimal detection

Let \boldsymbol{\mu}=(\mu_{t}^{n}:1\leq t\leq T,1\leq n\leq N) and let J=\#\boldsymbol{\tau} be the number of change-points. We show that the sparse likelihood algorithm is optimal for normal observations in Section 4.1 and for Poisson observations in Section 4.2. Consider T\rightarrow\infty as N\rightarrow\infty such that

\log T\sim N^{\zeta}\mbox{ for some }0<\zeta<1. (10)

In Theorems 2 and 4 we specify the detection boundary for asymptotically zero Type I and II error probabilities. Analogous detection boundaries for a single sequence are given in Arias-Castro, Donoho and Huo [1, 2].

In Theorems 3 and 5 we show that Type I and II error probabilities of the sparse likelihood algorithm go to zero at the detection boundary.

Recall that i_{T}=\max\{i:h_{i}+d_{i}\leq T\}. Consider the sparse likelihood algorithm with d_{i} and h_{i} satisfying

\tfrac{h_{i+1}}{h_{i}} \rightarrow 1\mbox{ and }d_{i}=o(h_{i})\mbox{ as }i\rightarrow\infty, (11)
\log\Big{(}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\Big{)} = o(\log T)\mbox{ as }T\rightarrow\infty, (12)

and critical values c_{T} satisfying

c_{T}=o(\log T)\mbox{ and }c_{T}-\log\Big{(}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\Big{)}\rightarrow\infty\mbox{ as }T\rightarrow\infty. (13)

For the sparse likelihood algorithm select parameters \lambda_{1}>0 and

\lambda_{2}=\sqrt{\tfrac{\log T}{\log\log T}}. (14)

We satisfy (11) when h_{i}\sim\exp(\tfrac{i}{\log i}) and d_{i}\sim\tfrac{h_{i}}{i} as i\rightarrow\infty. Moreover (12) holds because

\log\Big{(}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\Big{)}\sim 2\log i_{T}\sim 2\log\log T.

Condition (11) ensures that the set of (h_{i},d_{i}) is sufficiently dense to detect change-points optimally. Condition (12) is required for (13) to hold. The first half of condition (13) ensures that the Type II error probability goes to 0; the second half ensures that the Type I error probability goes to 0.
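For illustration, a window-length grid in the spirit of this schedule (using the geometric growth h_{i+1}=\lceil 1.1h_{i}\rceil and d_{i}=\lfloor h_{i}/i\rfloor adopted in Section 5) can be generated as follows; this is a sketch, not the authors' code:

```python
def window_grid(T):
    """Window lengths h_i and spacings d_i with h_1 = 1, h_{i+1} = ceil(1.1 h_i),
    d_i = floor(h_i / i), kept while h_i + d_i <= T."""
    h, d = [], []
    hi, i = 1, 1
    while True:
        di = max(1, hi // i)
        if hi + di > T:
            break
        h.append(hi)
        d.append(di)
        hi = max(hi + 1, (11 * hi + 9) // 10)   # ceil(1.1 * h_i) in exact integer arithmetic
        i += 1
    return h, d
```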

4.1 Normal model

Let

m_{j\Delta}=\#\{n:|\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta\}

be the number of sequences with a change of mean of at least \Delta at the jth change-point. Let

\Omega_{0} = \{\boldsymbol{\mu}:J=0\},
\Omega_{1}(\Delta,V,h) = \{\boldsymbol{\mu}:\mbox{ there exists }j\mbox{ such that }\min(\tau_{j}-\tau_{j-1},\tau_{j+1}-\tau_{j})\geq h\mbox{ and }m_{j\Delta}\geq V\},

with the convention \tau_{0}=0 and \tau_{J+1}=T. We consider here the test of H_{0}: \boldsymbol{\mu}\in\Omega_{0} versus H_{1}: \boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h). Define

\rho_{Z}(\beta,\zeta)=\left\{\begin{array}{ll}\beta-\tfrac{1-\zeta}{2}&\mbox{ if }\tfrac{1-\zeta}{2}<\beta\leq\tfrac{3(1-\zeta)}{4},\cr(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}&\mbox{ if }\tfrac{3(1-\zeta)}{4}<\beta<1-\zeta.\end{array}\right. (15)

These constants are extensions of \rho_{Z}(\beta) in (2) to capture the effect of multiple testing in change-point detection.

Theorem 2.

Assume (10) and let 0<\epsilon<1. For normal observations, no algorithm is able to achieve, as N\rightarrow\infty,

\sup_{\boldsymbol{\mu}\in\Omega_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error})\rightarrow 0, (16)

under any of the following conditions.

(a) Consider \Delta>0 fixed.

i. When V=o(\tfrac{\log T}{\log N}) and h=4(1-\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1-\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

(b) Consider \Delta=T^{-\eta} for some 0<\eta<\tfrac{1}{2}.

i. When V=o(\tfrac{\log T}{\log N}) and h=4(1-2\eta)(1-\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1-\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

Theorem 3.

Assume (10) and let \epsilon>0. For normal observations the sparse likelihood algorithm, with parameters satisfying (11)–(14), achieves (16) under any of the following conditions.

(a) Consider \Delta>0 fixed.

i. When V=o(\tfrac{\log T}{\log N}) and h=4(1+\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1+\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

(b) Consider \Delta=T^{-\eta} for some 0<\eta<\tfrac{1}{2}.

i. When V=o(\tfrac{N^{\zeta}}{\log N}) and h=4(1-2\eta)(1+\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1+\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

4.2 Poisson model

We show here the optimality of the sparse likelihood detection algorithm for

X^{n}_{t}\sim_{\rm indep.}\mbox{Poisson}(\mu^{n}_{t}). (17)

For optimal detection on a single Poisson sequence, see Rivera and Walther [28].

Let Y_{st}^{n}=\sum_{v=s+1}^{t}X_{v}^{n}. Consider s<t<u. Under the null hypothesis of no change-points in the interval (s,u), conditioned on Y_{su}^{n}=y^{n}_{su}, Y^{n}_{st} is binomial distributed with y^{n}_{su} trials and success probability \tfrac{t-s}{u-s}. Let p^{n}_{stu} be the two-sided p-value of this conditional binomial test, with continuity adjustments so that it is distributed as Uniform(0,1) under the null hypothesis. More specifically, when Y_{st}^{n}=y_{st}^{n} and Y_{su}^{n}=y_{su}^{n}, simulate

\psi_{stu}^{n}\sim\mbox{Uniform}(P(Y<y_{st}^{n}),P(Y\leq y_{st}^{n})), (18)

where P is probability with respect to Y\sim\mbox{Binomial}(y_{su}^{n},\tfrac{t-s}{u-s}), and define p_{stu}^{n}=2\min(\psi_{stu}^{n},1-\psi_{stu}^{n}).
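A sketch (ours) of the randomized conditional-binomial p-value in (18), using `scipy.stats.binom`:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng()

def poisson_pvalue(y_st, y_su, s, t, u):
    """Randomized two-sided p-value of the conditional binomial test for a rate change at t."""
    prob = (t - s) / (u - s)
    lo = binom.cdf(y_st - 1, y_su, prob)    # P(Y < y_st)
    hi = binom.cdf(y_st, y_su, prob)        # P(Y <= y_st)
    psi = rng.uniform(lo, hi)               # continuity adjustment as in (18)
    return 2 * min(psi, 1 - psi)
```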

Let

m_{j\Delta}=\#\{n:|\log(\mu_{\tau_{j}+1}^{n}/\mu_{\tau_{j}}^{n})|\geq\Delta\},

and for a given \mu_{0}>0, let

\Lambda = \{\boldsymbol{\mu}:\mu_{t}^{n}\geq\mu_{0}\mbox{ for all }n\mbox{ and }t\},
\Lambda_{0} = \{\boldsymbol{\mu}\in\Lambda:J=0\},
\Lambda_{1}(\Delta,V,h) = \{\boldsymbol{\mu}\in\Lambda:\mbox{ there exists }j\mbox{ such that }\min(\tau_{j+1}-\tau_{j},\tau_{j}-\tau_{j-1})\geq h\mbox{ and }m_{j\Delta}\geq V\}.

We consider here the test of H_{0}: \boldsymbol{\mu}\in\Lambda_{0} versus H_{1}: \boldsymbol{\mu}\in\Lambda_{1}(\Delta,V,h).

For a given r>1, let

I_{r}=r\log(\tfrac{2r}{r+1})+\log(\tfrac{2}{r+1}). (19)

Let g_{r}(\omega)=(\tfrac{1+r^{\omega}}{2})^{\frac{1}{\omega}} and let

\rho_{r}(\beta,\zeta)=\max_{\frac{1-\zeta}{\beta}<\omega\leq 2}\Big{(}\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g_{r}(\omega)-1-r}\Big{)}\mbox{ for }\tfrac{1-\zeta}{2}<\beta<1-\zeta. (20)

In Theorem 4 we show that (20) gives the asymptotic constant in the detection boundary for Poisson random variables. In Theorem 5 we show that the sparse likelihood algorithm achieves detection at this boundary for a broad range of sparsity.
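As a numerical illustration (ours), the constants (19) and (20) can be evaluated as follows; the grid over \omega is an arbitrary discretization:

```python
import numpy as np

def I_r(r):
    """Constant I_r of (19)."""
    return r * np.log(2 * r / (r + 1)) + np.log(2 / (r + 1))

def rho_r(beta, zeta, r, grid=10000):
    """Constant rho_r(beta, zeta) of (20), maximized over a grid of omega in ((1-zeta)/beta, 2].
    Requires (1-zeta)/2 < beta < 1-zeta, so that the range of omega is non-empty."""
    lo = (1 - zeta) / beta
    omega = np.linspace(lo, 2.0, grid + 1)[1:]   # exclude the open lower endpoint
    g = ((1 + r ** omega) / 2) ** (1 / omega)
    return float(np.max((beta - (1 - zeta) / omega) / (2 * g - 1 - r)))
```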

Theorem 4.

Assume (10). Let r=e^{\Delta} for some \Delta>0 and 0<\epsilon<1. For Poisson observations no algorithm is able to achieve, as N\rightarrow\infty,

\sup_{\boldsymbol{\mu}\in\Lambda_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Lambda_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error})\rightarrow 0 (21)

under either of the following conditions.

(a) When V=o(\tfrac{\log T}{\log N}) and h=(1-\epsilon)I_{r}^{-1}(\tfrac{\log T}{\mu_{0}V}).

(b) When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=(1-\epsilon)\rho_{r}(\beta,\zeta)(\tfrac{\log N}{\mu_{0}}).


Theorem 5.

Assume (10). Let \epsilon>0, \Delta>0 and 1<r<e^{\Delta}. For Poisson observations the sparse likelihood algorithm, with parameters satisfying (11)–(14), achieves (21) under either of the following conditions.

(a) When V=o(\tfrac{\log T}{\log N}) and h=(1+\epsilon)I_{r}^{-1}(\tfrac{\log T}{\mu_{0}V}).

(b) When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=(1+\epsilon)\rho_{r}(\beta,\zeta)(\tfrac{\log N}{\mu_{0}}).

5 Simulation studies

We follow here the simulation set-up in Sections 5.1 and 5.3 of Wang and Samworth [30]. Assume that the random variables are normal with variances that are unknown but equal within each sequence. These variances are estimated using the median absolute differences of adjacent observations, and after normalization the random variables are treated like unit-variance normal.

In the first study there is exactly one change-point \tau_{1}. Consider \mu_{t}^{n}=0 for t\leq\tau_{1} and all n. For t>\tau_{1}, let

\mu_{t}^{n}=\left\{\begin{array}{ll}0.8\Big{/}\sqrt{n\textstyle\sum_{m=1}^{V}m^{-1}}&\mbox{ if }n\leq V,\cr 0&\mbox{ if }n>V.\end{array}\right.

The objective is to estimate \tau_{1} assuming we know there is exactly one change-point. We estimate \tau_{1} here by

\widehat{\tau}_{1}={\rm argmax}_{0<t<T}\,\ell^{\rm pen}_{N}({\bf p}_{0tT}),

where \ell^{\rm pen}_{N} is the penalized sparse likelihood score with \lambda_{1}=1 and \lambda_{2}=\sqrt{\tfrac{\log T}{\log\log T}}.
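In terms of the earlier sketches (hypothetical helper names, not the authors' code), this estimator is simply:

```python
import numpy as np

def estimate_single_changepoint(X, lam1=1.0, lam2=None):
    """argmax over 0 < t < T of the penalized sparse likelihood score on the full interval."""
    N, T = X.shape
    if lam2 is None:
        lam2 = np.sqrt(np.log(T) / np.log(np.log(T)))
    def pen_score(t):
        pen = np.log(T / 4.0 * (1.0 / t + 1.0 / (T - t)))
        return score_stu(X, 0, t, T, lam1, lam2) - pen
    return max(range(1, T), key=pen_score)
```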

We simulate the probabilities that |\widehat{\tau}_{1}-\tau_{1}|\leq k for k=3 and 10, and compare against the INSPECT algorithm and the scan algorithm of Enikeeva and Harchaoui [12]. These two algorithms have the best numerical performances in Wang and Samworth [30]. The comparisons in Table 1 show that the sparse likelihood algorithm performs well.

Table 1: The fraction of simulation runs (out of 1000) for which \widehat{\tau}_{1} is within distance k of \tau_{1}, for k=3 and 10. The same datasets are used to compare sparse likelihood (SL), INSPECT and the scan test, with \tau_{1}=200 for T=500 and \tau_{1}=800 for T=2000.
                              SL              INSPECT            scan
    T       N        V     k=3    k=10     k=3    k=10     k=3    k=10
   500     500       3    0.511  0.801   0.478  0.785   0.520  0.804
   500     500       5    0.466  0.740   0.427  0.718   0.463  0.722
   500     500      10    0.393  0.645   0.370  0.637   0.362  0.599
   500     500      22    0.319  0.553   0.282  0.547   0.256  0.465
   500     500      50    0.244  0.462   0.211  0.453   0.197  0.378
   500     500     500    0.177  0.339   0.148  0.335   0.112  0.240
   500    2000       3    0.481  0.748   0.410  0.667   0.480  0.730
   500    2000       5    0.423  0.673   0.344  0.584   0.394  0.633
   500    2000      10    0.320  0.546   0.246  0.480   0.261  0.456
   500    2000      20    0.237  0.431   0.198  0.403   0.188  0.332
   500    2000      45    0.186  0.344   0.136  0.311   0.130  0.242
   500    2000     200    0.114  0.227   0.095  0.235   0.074  0.153
   500    2000    2000    0.068  0.160   0.078  0.189   0.042  0.096
  2000     500       3    0.603  0.859   0.587  0.855   0.589  0.854
  2000     500       5    0.604  0.865   0.595  0.855   0.558  0.832
  2000     500      10    0.565  0.827   0.569  0.833   0.487  0.764
  2000     500      22    0.522  0.789   0.522  0.795   0.438  0.714
  2000     500      50    0.472  0.748   0.468  0.745   0.384  0.652
  2000     500     500    0.378  0.643   0.336  0.609   0.273  0.524
  2000    2000       3    0.607  0.866   0.608  0.861   0.599  0.858
  2000    2000       5    0.594  0.864   0.586  0.857   0.557  0.829
  2000    2000      10    0.553  0.847   0.558  0.847   0.476  0.780
  2000    2000      20    0.494  0.807   0.498  0.789   0.435  0.726
  2000    2000      45    0.447  0.747   0.451  0.746   0.377  0.657
  2000    2000     200    0.362  0.649   0.342  0.604   0.297  0.554
  2000    2000    2000    0.274  0.532   0.241  0.471   0.225  0.457

In the second study there are three change-points within N=200 sequences of length T=2000, at \tau_{1}=500, \tau_{2}=1000 and \tau_{3}=1500. At each change-point exactly 40 sequences undergo mean changes. Six scenarios are considered, corresponding to

\mu_{\tau_{j}+1}^{k(j-1)+n}-\mu_{\tau_{j}}^{k(j-1)+n}=r\Big{/}\sqrt{n\textstyle\sum_{m=1}^{40}m^{-1}},

for 1\leq j\leq 3 and 1\leq n\leq 40, with r=0.4,0.6 and k=0,20,40. For k=0, the mean changes are within the same 40 sequences at all three change-points, whereas for k=40 the mean changes at the three change-points are on distinct sequences. For k=20, there is partial overlap of the sequences having mean changes at adjacent change-points. The number of estimated change-points over 100 simulated datasets in each scenario is recorded, as well as the adjusted Rand index (ARI), see Rand [27] and Hubert and Arabie [16], to measure the quality of the change-point estimation.

In the application of the sparse likelihood algorithm, we select h_{1}=1 and h_{i+1}=\lceil 1.1h_{i}\rceil for i\geq 1, and d_{i}=\lfloor h_{i}/i\rfloor, for a total of i_{T}=61 window lengths. We select critical value c_{T}=5 and parameters \lambda_{1}=1, \lambda_{2}=\sqrt{\tfrac{\log T}{\log\log T}}\doteq 1.94.

Wang and Samworth [30] showed that INSPECT achieves an average ARI of 0.90 when r=0.6 and either 0.73 (for k=20) or 0.74 (for k=0 and 40) when r=0.4, comparable to sparse likelihood, see Table 2.

In addition to INSPECT, Wang and Samworth [30] considered DC, SBS and scan, as well as the CUSUM aggregation algorithms of Jirak [19] and Horváth and Hušková [15], with average ARI in the range 0.77–0.87 when r=0.6 and 0.68–0.72 when r=0.4.

Table 2: Number of change-points estimated by the sparse likelihood algorithm and the average ARI over 100 simulated datasets.
                  # change-points
   r      k       2     3     4     5      ARI
  0.6     0      11    80     8     1     0.91
  0.4     0      61    35     4     0     0.74
  0.6    20      12    80     8     0     0.91
  0.4    20      66    31     2     1     0.74
  0.6    40      10    78    12     0     0.91
  0.4    40      68    26     6     0     0.75

Appendix A Proof of Theorem 1

Since c_{N}\rightarrow\infty, by Markov’s inequality P_{0}(\ell_{N}({\bf p})\geq c_{N})\leq e^{-c_{N}}\rightarrow 0. The proof of P_{\mu_{N}}(\ell_{N}({\bf p})<c_{N})\rightarrow 0 applies Lemmas 1 and 2 below. Lemma 1 says that the sum of sparse likelihood scores under q^{n}\sim{\rm Uniform}(0,1) is bounded below by a value close to zero, with large probability. Lemma 2 provides a lower bound on the increase in score when the p-value is divided by at least 2. Their proofs are at the end of Appendix A.

Lemma 1.

Let {\bf q}=(q^{1},\ldots,q^{N}), with q^{n}\sim_{\rm i.i.d.}{\rm Uniform}(0,1). For fixed \lambda_{1}\geq 0 and \delta>0,

\sup_{\delta\leq\lambda_{2}\leq\sqrt{N}}P(\ell_{N}({\bf q})\leq-\lambda_{2}^{2}\sqrt{\log N})\rightarrow 0.

Lemma 2.

For \lambda_{1}>0 fixed, \delta\leq\lambda_{2}\leq\sqrt{N} for some \delta>0, and \xi_{N}=o(N^{-\eta}) for some \eta>0 such that \xi_{N}\geq\tfrac{\lambda_{2}^{2}}{2N},

\inf_{(p,q):\,p\leq\xi_{N},\,q\geq\lambda_{2}^{2}/N,\,p\leq q/2}[\ell_{N}(p)-\ell_{N}(q)]\geq\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}

for large N.

Proof of Theorem 1. Let \tfrac{1}{2}<\beta<1, \lambda_{1}\geq 0 and \lambda_{2}>0 be fixed. Let \nu be such that

(1-\sqrt{1-\beta})^{2}<\nu<1 \mbox{ if }\tfrac{3}{4}\leq\beta<1,
\beta-\tfrac{1}{2}<\nu<4(\beta-\tfrac{1}{2}) \mbox{ if }\tfrac{1}{2}<\beta<\tfrac{3}{4},

and let

\mu_{N} = \sqrt{2\nu\log N},
Q^{n} \sim \mbox{Bernoulli}(N^{-\beta}),
Z^{n}|Q^{n} \sim {\rm N}(\mu_{N}Q^{n},1),
p^{n} = \Phi(-Z^{n}),
q^{n} = \Phi(-Z^{n}+\mu_{N}Q^{n}).

The additional assumptions of \nu<1 for \tfrac{3}{4}\leq\beta<1 and \nu<4(\beta-\tfrac{1}{2}) for \tfrac{1}{2}<\beta<\tfrac{3}{4} are not restrictive because \ell_{N}({\bf p}) increases stochastically with \mu_{N}.

Case 1: \tfrac{3}{4}\leq\beta<1. Let

\Gamma=\{n:Q^{n}=1,Z^{n}\geq\sqrt{2\log N},q^{n}\geq\tfrac{\lambda^{2}_{2}}{N}\}.

For N large, p^{n}\leq\tfrac{q^{n}}{2} for n\in\Gamma. Moreover \ell_{N}(p^{n})\geq\ell_{N}(q^{n}) for all n. Hence by Lemma 1 and Lemma 2 with \xi_{N}=N^{-1}, with probability tending to 1,

\ell_{N}({\bf p}) \geq \ell_{N}({\bf q})+\sum_{n\in\Gamma}[\ell_{N}(p^{n})-\ell_{N}(q^{n})] \geq -\lambda_{2}^{2}\sqrt{\log N}+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{\log N}}.

Since \#\Gamma is binomial with mean

E_{\mu_{N}}(\#\Gamma) = N^{1-\beta}[\Phi(-\sqrt{2\log N}+\sqrt{2\nu\log N})-\tfrac{\lambda_{2}^{2}}{N}] \gtrsim \tfrac{N^{1-\beta-(1-\sqrt{\nu})^{2}}}{\sqrt{\log N}},

with 1-\beta-(1-\sqrt{\nu})^{2}>0 for (1-\sqrt{1-\beta})^{2}<\nu<1, and since c_{N} is subpolynomial in N, we conclude P_{\mu_{N}}(\ell_{N}({\bf p})\geq c_{N})\rightarrow 1.

Case 2: \tfrac{1}{2}<\beta<\tfrac{3}{4}. Let

\Gamma=\{n:Q^{n}=1,Z^{n}\geq 2\sqrt{(2\beta-1)\log N},q^{n}\geq\tfrac{\lambda_{2}^{2}}{N}\}.

For N large, p^{n}\leq\tfrac{q^{n}}{2} for n\in\Gamma. Hence by Lemma 1 and Lemma 2 with \xi_{N}=N^{2-4\beta}, with probability tending to 1,

\ell_{N}({\bf p}) \geq \ell_{N}({\bf q})+\sum_{n\in\Gamma}[\ell_{N}(p^{n})-\ell_{N}(q^{n})] \geq -\lambda_{2}^{2}\sqrt{\log N}+(\#\Gamma)\tfrac{\lambda_{2}}{4N^{\frac{3}{2}-2\beta}\sqrt{\log N}}.

Since \#\Gamma is binomial with mean

E_{\mu_{N}}(\#\Gamma) = N^{1-\beta}[\Phi(-2\sqrt{(2\beta-1)\log N}+\sqrt{2\nu\log N})-\tfrac{\lambda_{2}^{2}}{N}] \gtrsim \tfrac{N^{1-\beta-(\sqrt{4\beta-2}-\sqrt{\nu})^{2}}}{\sqrt{\log N}},

and

1-\beta-(\sqrt{4\beta-2}-\sqrt{\nu})^{2}>\tfrac{3}{2}-2\beta\mbox{ for }\beta-\tfrac{1}{2}<\nu<4(\beta-\tfrac{1}{2}),

we conclude P_{\mu_{N}}(\ell_{N}({\bf p})\geq c_{N})\rightarrow 1. □

Proof of Lemma 1. Let

x_{N}(p)=\tfrac{\lambda_{1}\log N}{N}f_{1}(p)+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(p),

where f_{1}(p)=\tfrac{1}{p(2-\log p)^{2}}-\tfrac{1}{2}, f_{2}(p)=\tfrac{1}{\sqrt{p}}-2, \lambda_{1}\geq 0 and \delta\leq\lambda_{2}\leq N^{\frac{1}{2}} for some \delta>0. Let r_{N}=\tfrac{1}{N\log N}. Since x_{N}(r_{N})\geq 0 and x_{N}(1)\geq-\tfrac{1}{2} for N large, and \log(1+x)\geq x-x^{2} for x\geq-\tfrac{1}{2},

\ell_{N}({\bf q})=\sum_{n=1}^{N}\log(1+x_{N}(q^{n}))\geq\sum_{n=1}^{N}h_{N}(q^{n})-\sum_{n=1}^{N}h_{N}^{2}(q^{n}), (24)

where h_{N}(q)=x_{N}(q){\bf 1}_{\{q\geq r_{N}\}}.

By Chebyshev’s inequality and the bounds displayed below,

P(\ell_{N}({\bf q})\leq-\lambda_{2}^{2}\sqrt{\log N}) \leq P\Big{(}\sum_{n=1}^{N}h_{N}(q^{n})\leq-\tfrac{\lambda_{2}^{2}\sqrt{\log N}}{2}\Big{)}+P\Big{(}\sum_{n=1}^{N}h_{N}^{2}(q^{n})\geq\tfrac{\lambda_{2}^{2}\sqrt{\log N}}{2}\Big{)} \leq \tfrac{N{\rm Var}(h_{N}(q^{n}))}{(NEh_{N}(q^{n})+\frac{\lambda_{2}^{2}\sqrt{\log N}}{2})^{2}}+\tfrac{N{\rm Var}(h_{N}^{2}(q^{n}))}{(\frac{\lambda_{2}^{2}\sqrt{\log N}}{2}-NEh_{N}^{2}(q^{n}))^{2}}\rightarrow 0.

Since Ex_{N}(q^{n})=0,

Eh_{N}(q^{n}) = -E[x_{N}(q^{n}){\bf 1}_{\{q^{n}<r_{N}\}}] = -\tfrac{\lambda_{1}\log N}{N}(\tfrac{1}{2-\log r_{N}}-\tfrac{r_{N}}{2})-\tfrac{\lambda_{2}}{\sqrt{N\log N}}(2\sqrt{r_{N}}-2r_{N}) \geq -\tfrac{\lambda_{1}}{N}-\tfrac{2\lambda_{2}}{N\log N}.

Let s_{N}=\tfrac{(\log N)^{2}}{N}. Then

{\rm Var}(h_{N}(q^{n})) \leq Eh_{N}^{2}(q^{n}) \leq \tfrac{2\lambda_{1}^{2}(\log N)^{2}}{N^{2}}\int_{r_{N}}^{1}\tfrac{dq}{q^{2}(2-\log q)^{4}}+\tfrac{2\lambda_{2}^{2}}{N\log N}\int_{r_{N}}^{1}\tfrac{dq}{q} \leq \tfrac{2\lambda_{1}^{2}(\log N)^{2}}{N^{2}}\Big{(}\int_{s_{N}}^{1}\tfrac{dq}{q^{2}}+\tfrac{1}{(2-\log s_{N})^{4}}\int_{r_{N}}^{s_{N}}\tfrac{dq}{q^{2}}\Big{)}+\tfrac{2\lambda_{2}^{2}\log(\frac{1}{r_{N}})}{N\log N} \lesssim \tfrac{\lambda_{1}^{2}+\lambda_{2}^{2}}{N}.

{\rm Var}(h_{N}^{2}(q^{n})) \leq Eh_{N}^{4}(q^{n}) \leq \tfrac{8\lambda_{1}^{4}(\log N)^{4}}{N^{4}}\Big{(}\int_{s_{N}}^{1}\tfrac{dq}{q^{4}}+\tfrac{1}{(2-\log s_{N})^{8}}\int_{r_{N}}^{s_{N}}\tfrac{dq}{q^{4}}\Big{)}+\tfrac{8\lambda_{2}^{4}}{(N\log N)^{4}}\int_{r_{N}}^{1}\tfrac{dq}{q^{2}} \lesssim \tfrac{\lambda_{1}^{4}+\lambda_{2}^{4}}{N}.

□

Proof of Lemma 2. For \tfrac{\lambda_{2}^{2}}{2N}\leq r\leq 2\xi_{N}, |\log r|\asymp\log N and therefore

\tfrac{\frac{\lambda_{1}\log N}{N}f_{1}(r)}{\frac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(r)}\asymp\tfrac{1}{\lambda_{2}\sqrt{Nr\log N}}\rightarrow 0.

Moreover,

\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(r)\sim\tfrac{\lambda_{2}}{\sqrt{Nr\log N}}\rightarrow 0.

Hence by \log(1+x)\sim x as x\rightarrow 0,

\ell_{N}(r)\sim\tfrac{\lambda_{2}}{\sqrt{Nr\log N}}. (28)

Case 1: \tfrac{\lambda_{2}^{2}}{2N}\leq p\leq\xi_{N}. By (28) and q\geq 2p,

\ell_{N}(p)-\ell_{N}(q) \geq \ell_{N}(p)-\ell_{N}(2p) \sim (1-\tfrac{1}{\sqrt{2}})\tfrac{\lambda_{2}}{\sqrt{Np\log N}} > \tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}.

Case 2: p<\tfrac{\lambda_{2}^{2}}{2N}. By (28), q\geq\tfrac{\lambda_{2}^{2}}{N} and \xi_{N}\geq\tfrac{\lambda_{2}^{2}}{2N},

\ell_{N}(p)-\ell_{N}(q) \geq \ell_{N}(\tfrac{\lambda_{2}^{2}}{2N})-\ell_{N}(\tfrac{\lambda_{2}^{2}}{N}) \sim (1-\tfrac{1}{\sqrt{2}})\tfrac{\lambda_{2}}{\sqrt{N(\frac{\lambda_{2}^{2}}{2N})\log N}} > \tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}.

Appendix B Proof of Theorem 2

Proof of Theorem 2(a)i. Let h=\lfloor\tfrac{4(1-\epsilon)\log T}{\Delta^{2}V}\rfloor for some 0<\epsilon<1. Let P_{0} denote probability with respect to \mu^{n}_{t}=0 for all n and t. Let t_{k}=(2k-1)h and let P_{k}, 1\leq k\leq K:=\lfloor\tfrac{T}{2h}\rfloor, denote probability under which, for n\leq V,

\mu^{n}_{t_{k}-h+1} = \cdots=\mu^{n}_{t_{k}}=-\tfrac{\Delta}{2}, (29)
\mu^{n}_{t_{k}+1} = \cdots=\mu^{n}_{t_{k}+h}=\tfrac{\Delta}{2},
\mu^{n}_{t} = 0\mbox{ for }t\leq t_{k}-h\mbox{ and }t>t_{k}+h,

and \mu^{n}_{1}=\cdots=\mu^{n}_{T}=0 for n>V. Let E_{k} denote expectation with respect to P_{k}.

Let P_{*}=\tfrac{1}{K}\sum_{k=1}^{K}P_{k} and let L=\tfrac{1}{K}\sum_{k=1}^{K}L_{k}, where L_{k}=\tfrac{dP_{k}}{dP_{0}}({\bf X}) with {\bf X}=(X^{n}_{t}:1\leq n\leq N,1\leq t\leq T). Hence

\log L_{1}=\tfrac{h\Delta}{2}\sum_{n=1}^{V}(\bar{X}^{n}_{h,2h}-\bar{X}^{n}_{0h})-\tfrac{hV\Delta^{2}}{4}. (30)

Let A_{i}=\{L\leq 3\}\cap\{\mbox{conclude }H_{i}\}. Since P_{*}(A_{1})=E_{0}(L{\bf 1}_{A_{1}})\leq 3P_{0}(A_{1}),

\sup_{\boldsymbol{\mu}\in\Omega_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error}) \geq P_{0}(\mbox{conclude }H_{1})+P_{*}(\mbox{conclude }H_{0}) \geq P_{0}(A_{1})+P_{*}(A_{0})\geq\tfrac{1}{3}P_{*}(L\leq 3)=\tfrac{1}{3}P_{1}(L\leq 3),

with the last equality due to L having the same distribution under all P_{k} and P_{*}.

Since E_{1}L_{k}=1 for k\geq 2, it follows that P_{1}(\tfrac{1}{K}\sum_{k=2}^{K}L_{k}\leq 2)\geq\tfrac{1}{2}. Hence by (B), to show that \sup_{\boldsymbol{\mu}\in\Omega_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error})\rightarrow 0 is not possible, it suffices to show that

P_{1}(L_{1}\leq K)\rightarrow 1\mbox{ as }T\rightarrow\infty. (32)

By (30), \log L_{1}\sim{\rm N}(\tfrac{hV\Delta^{2}}{4},\tfrac{hV\Delta^{2}}{2}) under P_{1}, and indeed

P_{1}(L_{1}\leq K)=\Phi\Big{(}\tfrac{\log K-\frac{1}{4}hV\Delta^{2}}{\sqrt{\frac{1}{2}hV\Delta^{2}}}\Big{)}\rightarrow 1. (33)

□

Proof of Theorem 2(a)ii. Proceed as in the proof of Theorem 2(a)i., but with h=4(1ϵ)ρZ(β,ζ)logNΔ2h=\lfloor\tfrac{4(1-\epsilon)\rho_{Z}(\beta,\zeta)\log N}{\Delta^{2}}\rfloor, and PkP_{k} probability under which, independently for 1nN1\leq n\leq N, Qn=1Q^{n}=1 with probability 2Nβ2N^{-\beta} and Qn=0Q^{n}=0 otherwise. When Qn=1Q^{n}=1, (29) holds. When Qn=0Q^{n}=0, μ1n==μTn=0\mu^{n}_{1}=\cdots=\mu^{n}_{T}=0.

By the law of large numbers, P1(𝝁Ω1(h,Δ,V))=P1(n=1NQnV)1P_{1}(\boldsymbol{\mu}\in\Omega_{1}(h,\Delta,V))=P_{1}(\sum_{n=1}^{N}Q^{n}\geq V)\rightarrow 1. Hence by (B) it suffices to show (32) with

L1\displaystyle L_{1} =\displaystyle= n=1N[1+2Nβ(eZnΔh2hΔ241)],\displaystyle\prod_{n=1}^{N}[1+2N^{-\beta}(e^{Z^{n}\Delta\sqrt{\frac{h}{2}}-\frac{h\Delta^{2}}{4}}-1)], (34)
Zn\displaystyle Z^{n} =\displaystyle= h2(X¯h,2hnX¯0hn)N(QnΔh2,1).\displaystyle\sqrt{\tfrac{h}{2}}(\bar{X}^{n}_{h,2h}-\bar{X}^{n}_{0h})\sim{\rm N}\Big{(}Q^{n}\Delta\sqrt{\tfrac{h}{2}},1\Big{)}. (35)

Case 1: 1ζ2<β<3(1ζ)4\tfrac{1-\zeta}{2}<\beta<\tfrac{3(1-\zeta)}{4}. Recall that ρZ(β,ζ)=β1ζ2\rho_{Z}(\beta,\zeta)=\beta-\tfrac{1-\zeta}{2}. By (34) and (35),

E1L1\displaystyle E_{1}L_{1} =\displaystyle= (1+4N2β[exp(hΔ22)1])N\displaystyle(1+4N^{-2\beta}[\exp(\tfrac{h\Delta^{2}}{2})-1])^{N}
\displaystyle\leq exp(4N12β+2(1ϵ)ρZ(β,ζ))\displaystyle\exp(4N^{1-2\beta+2(1-\epsilon)\rho_{Z}(\beta,\zeta)})
=\displaystyle= exp(4Nζ2ϵρZ(β,ζ)).\displaystyle\exp(4N^{\zeta-2\epsilon\rho_{Z}(\beta,\zeta)}).

Since logK=log(T2h)Nζ\log K=\log(\lfloor\tfrac{T}{2h}\rfloor)\sim N^{\zeta}, it follows that P1(L1K)1K1E1L11P_{1}(L_{1}\leq K)\geq 1-K^{-1}E_{1}L_{1}\rightarrow 1 and (32) holds.

Case 2: 3(1ζ)4β<1ζ\tfrac{3(1-\zeta)}{4}\leq\beta<1-\zeta. Recall that ρZ(β,ζ)=(1ζ1ζβ)2\rho_{Z}(\beta,\zeta)=(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}. Express logL1=i=03Ri\log L_{1}=\sum_{i=0}^{3}R_{i}, where

Ri\displaystyle R_{i} =\displaystyle= nΓilog(1+2Nβ[exp(ZnΔh2Δ2h4)1]),\displaystyle\sum_{n\in\Gamma_{i}}\log\Big{(}1+2N^{-\beta}\Big{[}\exp\Big{(}Z^{n}\Delta\sqrt{\tfrac{h}{2}}-\tfrac{\Delta^{2}h}{4}\Big{)}-1\Big{]}\Big{)},
Γ0\displaystyle\Gamma_{0} =\displaystyle= {n:Qn=0},\displaystyle\{n:Q^{n}=0\},
Γ1\displaystyle\Gamma_{1} =\displaystyle= {n:Qn=1,Zn2(1ζ)logN},\displaystyle\{n:Q^{n}=1,Z^{n}\leq\sqrt{2(1-\zeta)\log N}\},
Γ2\displaystyle\Gamma_{2} =\displaystyle= {n:Qn=1,2(1ζ)logN<Zn22logN},\displaystyle\{n:Q^{n}=1,\sqrt{2(1-\zeta)\log N}<Z^{n}\leq 2\sqrt{2\log N}\},
Γ3\displaystyle\Gamma_{3} =\displaystyle= {n:Qn=1,Zn>22logN}.\displaystyle\{n:Q^{n}=1,Z^{n}>2\sqrt{2\log N}\}.

We show (32) by showing that

P1(Ri14logK)0 for 0i3.P_{1}(R_{i}\geq\tfrac{1}{4}\log K)\rightarrow 0\mbox{ for }0\leq i\leq 3. (36)

𝐢=𝟑{\bf i=3}: Since Δh22logN\Delta\sqrt{\tfrac{h}{2}}\leq\sqrt{2\log N},

P1(R3>0)2N1βΦ(2logN)0.P_{1}(R_{3}>0)\leq 2N^{1-\beta}\Phi(-\sqrt{2\log N})\rightarrow 0.

𝐢=𝟐{\bf i=2}: Since

Δh2\displaystyle\Delta\sqrt{\tfrac{h}{2}}
\displaystyle\leq 2(1ζ)logN2(1ζβ)logN2δlogN\displaystyle\sqrt{2(1-\zeta)\log N}-\sqrt{2(1-\zeta-\beta)\log N}-\sqrt{2\delta\log N}

for some δ>0\delta>0, it follows that

Φ(Δh22(1ζ)logN)=o(Nζ+β1δ).\Phi\Big{(}\Delta\sqrt{\tfrac{h}{2}}-\sqrt{2(1-\zeta)\log N}\Big{)}=o(N^{\zeta+\beta-1-\delta}).

Hence

E1R2\displaystyle E_{1}R_{2}
\displaystyle\leq E1(#Γ2)log(1+2N4β)\displaystyle E_{1}(\#\Gamma_{2})\log(1+2N^{4-\beta})
\displaystyle\lesssim (N1βlogN)Φ(Δh22(1ζ)logN)\displaystyle(N^{1-\beta}\log N)\Phi\Big{(}\Delta\sqrt{\tfrac{h}{2}}-\sqrt{2(1-\zeta)\log N}\Big{)}
=\displaystyle= o(NζδlogN),\displaystyle o(N^{\zeta-\delta}\log N),

and (36) follows from logKNζ\log K\sim N^{\zeta}.

𝐢=𝟏{\bf i=1}: Since log(1+x)x\log(1+x)\leq x,

E1R14N12βehΔ2/4\displaystyle E_{1}R_{1}\leq 4N^{1-2\beta}e^{-h\Delta^{2}/4} (38)
×2(1ζ)logN12πe(zΔh2)2/2+zΔh2dz\displaystyle\qquad\times\int_{-\infty}^{\sqrt{2(1-\zeta)\log N}}\tfrac{1}{\sqrt{2\pi}}e^{-(z-\Delta\sqrt{\frac{h}{2}})^{2}/2+z\Delta\sqrt{\frac{h}{2}}}dz
=4N12βΦ(2(1ζ)logN2Δh2)ehΔ2/2\displaystyle\qquad=4N^{1-2\beta}\Phi\Big{(}\sqrt{2(1-\zeta)\log N}-2\Delta\sqrt{\tfrac{h}{2}}\Big{)}e^{h\Delta^{2}/2}
4N12β(1ζ2(1ϵ)ρZ(β,ζ))2+2(1ϵ)ρZ(β,ζ)\displaystyle\qquad\leq 4N^{1-2\beta-(\sqrt{1-\zeta}-2\sqrt{(1-\epsilon)\rho_{Z}(\beta,\zeta)})^{2}+2(1-\epsilon)\rho_{Z}(\beta,\zeta)}
=4Nζδ for some δ>0.\displaystyle\qquad=4N^{\zeta-\delta}\mbox{ for some }\delta>0.

The last step above is shown below. Since

R_{1}\geq(\#\Gamma_{1})\log(1-2N^{-\beta})\stackrel{p}{\sim}-2N^{1-2\beta}=o(N^{\zeta}),

and \log K\sim N^{\zeta}, (36) follows from (38) and Markov's inequality.

{\bf i=0}: Since E_{1}e^{R_{0}}=1,

P_{1}(R_{0}\geq\tfrac{1}{4}\log K)\leq K^{-\frac{1}{4}}\rightarrow 0.

∎

Proof of (38). It suffices to show that

1-2\beta-(\sqrt{1-\zeta}-2\sqrt{(1-\epsilon)\rho_{Z}(\beta,\zeta)})^{2}+2(1-\epsilon)\rho_{Z}(\beta,\zeta)<\zeta. \qquad (39)

Let m(\rho)=-(\sqrt{1-\zeta}-2\sqrt{\rho})^{2}+2\rho. Inequality (39) follows from

m(\rho_{Z}(\beta,\zeta)) = -(\sqrt{1-\zeta}-2\sqrt{\rho_{Z}(\beta,\zeta)})^{2}+2\rho_{Z}(\beta,\zeta)
= -(2\sqrt{1-\zeta-\beta}-\sqrt{1-\zeta})^{2}+2(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}
= 1-\zeta-2(1-\zeta-\beta) = \zeta-1+2\beta,

and

\tfrac{d}{d\rho}m(\rho) = 2\rho^{-\frac{1}{2}}(\sqrt{1-\zeta}-2\sqrt{\rho})+2 = 2\rho^{-\frac{1}{2}}\sqrt{1-\zeta}-2>0 \mbox{ for } \rho<1-\zeta.

∎
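The algebra above is easy to check numerically. The sketch below (an illustrative helper, not part of the argument) evaluates the two-regime exponent \rho_{Z}(\beta,\zeta) recalled in Cases 1 and 2 and confirms that m(\rho_{Z}(\beta,\zeta))=\zeta-1+2\beta in the Case 2 regime; the parameter values are illustrative.

    import numpy as np

    def rho_Z(beta, zeta):
        # Two-regime exponent, valid for (1-zeta)/2 < beta < 1-zeta.
        if beta < 3 * (1 - zeta) / 4:
            return beta - (1 - zeta) / 2
        return (np.sqrt(1 - zeta) - np.sqrt(1 - zeta - beta)) ** 2

    def m(rho, zeta):
        return -(np.sqrt(1 - zeta) - 2 * np.sqrt(rho)) ** 2 + 2 * rho

    zeta, beta = 0.3, 0.6                 # Case 2: 3(1-zeta)/4 <= beta < 1-zeta
    rho = rho_Z(beta, zeta)
    print(np.isclose(m(rho, zeta), zeta - 1 + 2 * beta))   # True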

Proof of Theorem 2(b). Consider \Delta=T^{-\eta}. Proceed as in the proof of Theorem 2(a), with h=\lfloor\tfrac{4(1-2\eta)(1-\epsilon)}{V}T^{2\eta}\log T\rfloor for i. and h=\lfloor 4(1-\epsilon)\rho_{Z}(\beta,\zeta)T^{2\eta}\log N\rfloor for ii. For Theorem 2(b)i.,

\log K=\log(\lfloor\tfrac{T}{2h}\rfloor)\sim(1-2\eta)\log T, \qquad (40)

and (32) holds because

P_{1}(L\leq K)=\Phi\Big(\tfrac{\log K-\frac{1}{4}hV\Delta^{2}}{\sqrt{\frac{1}{2}hV\Delta^{2}}}\Big)\rightarrow 1.

For Theorem 2(b)ii. the arguments in the proof of Theorem 2(a)ii. apply with

\log K=\log(\lfloor\tfrac{T}{2h}\rfloor)\sim(1-2\eta)N^{\zeta}.

∎

Appendix C Proof of Theorem 3

For (s,t,u)\in{\cal A}_{i}(T), the penalty of the SL scores is

\log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\geq\log(\tfrac{T}{2h_{i}}).

Moreover \#{\cal A}_{i}(T)\leq\tfrac{T}{d_{i}}. Hence by (6) and c_{T}-\log(\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}})\rightarrow\infty, for \boldsymbol{\mu}\in\Omega_{0},

P_{\boldsymbol{\mu}}(\mbox{Type I error}) \leq \sum_{i=1}^{i_{T}}\sum_{(s,t,u)\in{\cal A}_{i}(T)}P_{\boldsymbol{\mu}}(\ell_{N}({\bf p}_{stu})\geq c_{T}+\log(\tfrac{T}{2h_{i}}))
\leq \sum_{i=1}^{i_{T}}\tfrac{T}{d_{i}}\exp(-c_{T}-\log(\tfrac{T}{2h_{i}}))
= 2e^{-c_{T}}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\rightarrow 0.
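The identity used in the last step of the display above, \sum_{i}(T/d_{i})e^{-c_{T}-\log(T/(2h_{i}))}=2e^{-c_{T}}\sum_{i}h_{i}/d_{i}, is elementary algebra; the short check below uses illustrative (hypothetical) values of T, c_{T}, h_{i} and d_{i}.

    import numpy as np

    T, c_T = 10_000, np.log(np.log(10_000))     # illustrative values only
    h = np.array([50.0, 100.0, 200.0, 400.0])   # window lengths h_i
    d = np.array([5.0, 10.0, 20.0, 40.0])       # grid spacings d_i

    lhs = np.sum((T / d) * np.exp(-c_T - np.log(T / (2 * h))))
    rhs = 2 * np.exp(-c_T) * np.sum(h / d)
    print(np.isclose(lhs, rhs))                 # True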

Consider \boldsymbol{\mu}\in\Omega_{1}(\Delta,h,V) and let \tau_{j} be the change-point satisfying the conditions in the definition of \Omega_{1}(\Delta,h,V). Let Q^{n}=1 if |\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta and Q^{n}=0 otherwise. We assume without loss of generality that 0<\epsilon<1.

To aid in checking the proof of Theorem 3, we first outline the key ideas. Let j be such that

\min(\tau_{j}-\tau_{j-1},\tau_{j+1}-\tau_{j})\geq h \mbox{ and } m_{j\Delta}\geq V.

Consider \Delta>0 fixed and V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta. Since h\rightarrow\infty, it follows from (11) that for N large we are able to find (s,t,u)=(s(ik),t(ik),u(ik)) close to (\tau_{j}-h,\tau_{j},\tau_{j}+h) such that

E_{\boldsymbol{\mu}}Z_{stu}^{n}\geq[1+o(1)]\tfrac{h\Delta^{2}}{2} \mbox{ for } n \mbox{ satisfying } |\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta. \qquad (42)

Recall that p_{stu}^{n}=2\Phi(-|Z_{stu}^{n}|) and let q_{stu}^{n}=\Phi(-|Z_{stu}^{n}|+E_{\boldsymbol{\mu}}Z_{stu}^{n})+\Phi(-|Z_{stu}^{n}|-E_{\boldsymbol{\mu}}Z_{stu}^{n}). Let

\Gamma=\{n:|Z_{stu}^{n}|\geq\sqrt{2\omega\log N},\ q_{stu}^{n}\geq N^{\zeta-1},\ |\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta\}, \qquad (43)

with \omega=1-\zeta when \tfrac{3(1-\zeta)}{4}<\beta<1-\zeta and \omega=4(\beta-\tfrac{1-\zeta}{2}) when \tfrac{1-\zeta}{2}<\beta\leq\tfrac{3(1-\zeta)}{4}. It follows from Lemmas 1 and 2 that with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}
\geq -\lambda_{2}^{2}\sqrt{\log N}+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}

for \xi_{N}=N^{-\omega}.

Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T\sim N^{\zeta}, c_{T}=o(\log T) and \lambda_{2}\sim\tfrac{N^{\zeta/2}}{\sqrt{\zeta\log N}}, to show P_{\boldsymbol{\mu}}(\ell^{\rm pen}_{N}({\bf p}_{stu})\geq c_{T})\rightarrow 1, it suffices to show that there exists \delta>0 such that

E_{\boldsymbol{\mu}}(\#\Gamma)\gtrsim\left\{\begin{array}{ll}N^{\zeta+\delta} & \mbox{ if } \tfrac{3(1-\zeta)}{4}<\beta<1-\zeta,\\ N^{\frac{3}{2}-2\beta-\frac{\zeta}{2}+\delta} & \mbox{ if } \tfrac{1-\zeta}{2}<\beta\leq\tfrac{3(1-\zeta)}{4}.\end{array}\right. \qquad (44)

∎

Proof of Theorem 3(a)i. Consider V=o(\tfrac{\log T}{\log N}). Since h=4(1+\epsilon)(\tfrac{\log T}{\Delta^{2}V})\rightarrow\infty, \tfrac{h_{i+1}}{h_{i}}\rightarrow 1 and d_{i}=o(h_{i}), for large T there exists

h_{i}\geq 4(1+\epsilon)^{\frac{1}{2}}(\tfrac{\log T}{\Delta^{2}V})

such that for all \boldsymbol{\mu}\in\Omega_{1}(h,\Delta,V), there exists k satisfying

\tau_{j-1}<s(ik)<u(ik)<\tau_{j+1} \mbox{ and } |t(ik)-\tau_{j}|\leq\tfrac{d_{i}}{2}. \qquad (45)

Hence when Q^{n}=1,

|E_{\boldsymbol{\mu}}Z^{n}_{stu}|\geq\Delta(1-\tfrac{d_{i}}{2h_{i}})\sqrt{\tfrac{h_{i}}{2}}\geq\sqrt{2(1+\epsilon)^{\frac{1}{3}}V^{-1}\log T}, \qquad (46)

where (s,t,u)=(s(ik),t(ik),u(ik)).

Let \Gamma=\{n:Q^{n}=1,\ |Z^{n}_{stu}|\geq\sqrt{2(1+\epsilon)^{\frac{1}{4}}(\tfrac{\log T}{V})}\}. Let p^{n}_{stu}=2\Phi(-|Z_{stu}^{n}|) and q_{stu}^{n}=\Phi(-|Z_{stu}^{n}|+E_{\boldsymbol{\mu}}Z_{stu}^{n})+\Phi(-|Z_{stu}^{n}|-E_{\boldsymbol{\mu}}Z_{stu}^{n}). Since q^{n}\stackrel{\rm i.i.d.}{\sim}\mbox{Uniform}(0,1) and \ell_{N}(1)\geq-1 for N large, by Lemmas 1 and 2, with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\Big[\ell_{N}\Big(2\Phi\Big(-\sqrt{2(1+\epsilon)^{\frac{1}{4}}\tfrac{\log T}{V}}\Big)\Big)-1\Big]
\geq -\lambda_{2}^{2}\sqrt{\log N}+V[(1+\epsilon)^{\frac{1}{5}}\tfrac{\log T}{V}-\log N]
\geq (1+\epsilon)^{\frac{1}{6}}\log T.

Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T and c_{T}=o(\log T), it follows that P_{\boldsymbol{\mu}}(\ell^{\rm pen}_{N}({\bf p}_{stu})\geq c_{T})\rightarrow 1. ∎

Proof of Theorem 3(a)ii. Case 1: V\sim N^{1-\beta} for \tfrac{3(1-\zeta)}{4}\leq\beta<1-\zeta. Since h\Delta^{2}=4(1+\epsilon)(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}\log N and d_{i}=o(h_{i}), for large N there exists i satisfying h_{i}\geq(1+\epsilon)^{-\frac{1}{2}}h such that whenever Q^{n}=1,

|E_{\boldsymbol{\mu}}Z^{n}| \geq \Delta(1-\tfrac{d_{i}}{2h_{i}})\sqrt{\tfrac{h_{i}}{2}}\geq\sqrt{2\nu\log N}, \qquad \nu=(1+\epsilon)^{\frac{1}{3}}(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}, \qquad (48)

with (s,t,u)=(s(ik),t(ik),u(ik)) for k satisfying (45).

For \Gamma defined in (43),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[\Phi\Big(-\sqrt{2(1-\zeta)\log N}+\sqrt{2\nu\log N}\Big)-N^{\zeta-1}\Big]
\gtrsim N^{1-\beta-(\sqrt{1-\zeta}-\sqrt{\nu})^{2}}(\log N)^{-\frac{1}{2}},

and (44) follows from

\sqrt{1-\zeta}>\sqrt{\nu}>\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta}.

Case 2: V\sim N^{1-\beta} for \tfrac{1-\zeta}{2}<\beta<\tfrac{3(1-\zeta)}{4}. Since h\Delta^{2}=4(1+\epsilon)(\beta-\tfrac{1-\zeta}{2})\log N, for large N there exists h_{i}\geq(1+\epsilon)^{-\frac{1}{2}}h such that whenever Q^{n}=1,

|E_{\boldsymbol{\mu}}Z^{n}_{stu}| \geq \Delta(1-\tfrac{d_{i}}{2h_{i}})\sqrt{\tfrac{h_{i}}{2}}\geq\sqrt{2\nu\log N}, \qquad \nu=(1+\epsilon)^{\frac{1}{3}}(\beta-\tfrac{1-\zeta}{2}), \qquad (49)

with (s,t,u)=(s(ik),t(ik),u(ik)) for k satisfying (45).

For \Gamma defined in (43),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[\Phi\Big(-2\sqrt{(2\beta-1+\zeta)\log N}+\sqrt{2\nu\log N}\Big)-N^{\zeta-1}\Big]
\gtrsim N^{1-\beta-(2\sqrt{\beta-\frac{1-\zeta}{2}}-\sqrt{\nu})^{2}}(\log N)^{-\frac{1}{2}},

and (44) follows from

2\sqrt{\beta-\tfrac{1-\zeta}{2}}>\sqrt{\nu}>\sqrt{\beta-\tfrac{1-\zeta}{2}}.

∎
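As a numerical sanity check on the exponent comparisons that deliver (44) in the two cases above, the sketch below compares the achieved exponent of E_{\boldsymbol{\mu}}(\#\Gamma) with the target exponent in (44); the values of \zeta, \epsilon and \beta are illustrative.

    import numpy as np

    zeta, eps = 0.3, 0.2                            # illustrative values

    def case1_ok(beta):                             # 3(1-zeta)/4 <= beta < 1-zeta
        nu = (1 + eps) ** (1 / 3) * (np.sqrt(1 - zeta) - np.sqrt(1 - zeta - beta)) ** 2
        achieved = 1 - beta - (np.sqrt(1 - zeta) - np.sqrt(nu)) ** 2
        return achieved > zeta                      # target exponent in (44)

    def case2_ok(beta):                             # (1-zeta)/2 < beta < 3(1-zeta)/4
        nu = (1 + eps) ** (1 / 3) * (beta - (1 - zeta) / 2)
        achieved = 1 - beta - (2 * np.sqrt(beta - (1 - zeta) / 2) - np.sqrt(nu)) ** 2
        return achieved > 1.5 - 2 * beta - zeta / 2  # target exponent in (44)

    print(case1_ok(0.6), case2_ok(0.45))            # True True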

Proof of Theorem 3(b). Consider first V=o(\tfrac{\log T}{\log N}). Let h_{i}\geq(1-2\eta)(1+\epsilon)^{\frac{1}{2}}(\tfrac{\log T}{\Delta^{2}V}) be such that for all \boldsymbol{\mu}\in\Omega_{1}(h,\Delta,V), (45) holds for some k. Let

\Gamma=\{n:Q^{n}=1,\ |Z^{n}_{stu}|\geq\sqrt{2(1-2\eta)(1+\epsilon)^{\frac{1}{4}}\tfrac{\log T}{V}}\}

and define p_{stu}^{n} and q_{stu}^{n} as in the proof of Theorem 3(a)i.

By the arguments in (C), with probability tending to 1,

\ell_{N}({\bf p}_{stu})\geq(1-2\eta)(1+\epsilon)^{\frac{1}{6}}\log T.

Since \Delta\sim T^{-\eta}, it follows that h_{i}\gtrsim T^{2\eta}\log N and the penalty \log(\tfrac{T}{4}(\tfrac{1}{u-t}+\tfrac{1}{t-s}))\leq(1-2\eta)\log T for T large. Hence by c_{T}=o(\log T) we conclude P_{\boldsymbol{\mu}}(\ell_{N}^{\rm pen}({\bf p}_{stu})\geq c_{T})\rightarrow 1.

For V\sim N^{1-\beta} with \tfrac{1-\zeta}{2}<\beta<1-\zeta, the same arguments in the proof of Theorem 3(a)ii. apply. ∎

Appendix D Proof of Theorem 4

Proof of Theorem 4(a). Let h=\lfloor\tfrac{(1-\epsilon)\log T}{\mu_{0}VI_{r}}\rfloor for some 0<\epsilon<1. Let P_{0} denote probability with respect to \mu_{t}^{n}=(\tfrac{1+r}{2})\mu_{0} for all n and t. Let t_{k}=(2k-1)h. Let P_{k}, 1\leq k\leq K:=\lfloor\tfrac{T}{2h}\rfloor, denote probability under which for n\leq V,

\mu^{n}_{t}=\left\{\begin{array}{ll}\mu_{0} & \mbox{ for } t_{k}-h<t\leq t_{k},\\ r\mu_{0} & \mbox{ for } t_{k}<t\leq t_{k}+h,\\ (\tfrac{1+r}{2})\mu_{0} & \mbox{ for } t\leq t_{k}-h \mbox{ and } t>t_{k}+h,\end{array}\right. \qquad (50)

and \mu^{n}_{1}=\cdots=\mu^{n}_{T}=(\tfrac{1+r}{2})\mu_{0} for n>V. Let E_{k} and {\rm Var}_{k} denote expectation and variance respectively with respect to P_{k}. Let

U^{n} = S_{0h}^{n}\log(\tfrac{2}{1+r})+S_{h,2h}^{n}\log(\tfrac{2r}{r+1}), \qquad (51)
L_{1} = \tfrac{dP_{1}}{dP_{0}}({\bf X})=\prod_{n=1}^{V}\exp(U^{n}). \qquad (52)

By (B)–(32), it suffices to show that

P_{1}(L_{1}\leq K)\rightarrow 1 \mbox{ as } T\rightarrow\infty. \qquad (53)

Since E_{1}(\log L_{1})=h\mu_{0}VI_{r} and {\rm Var}_{1}(\log L_{1})=h\mu_{0}VC_{r}, where C_{r}=r[\log(\tfrac{2r}{r+1})]^{2}+[\log(\tfrac{2}{r+1})]^{2}, by Chebyshev's inequality,

P_{1}(L_{1}\leq K)\geq 1-\tfrac{hV\mu_{0}C_{r}}{(\log K-hV\mu_{0}I_{r})^{2}}\rightarrow 1,

and (53) holds. ∎
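The mean of \log L_{1} used in the Chebyshev step can be verified by simulation. The sketch below draws the Poisson window counts under P_{1} for the V affected streams in (50) and compares the empirical mean of \log L_{1} with h\mu_{0}VI_{r}; all parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    mu0, r, h, V, reps = 1.0, 2.0, 50, 20, 20_000    # illustrative values

    S0h = rng.poisson(h * mu0, size=(reps, V))       # counts on (t_1-h, t_1] under P_1
    Sh2h = rng.poisson(r * h * mu0, size=(reps, V))  # counts on (t_1, t_1+h] under P_1
    U = S0h * np.log(2 / (1 + r)) + Sh2h * np.log(2 * r / (r + 1))
    logL1 = U.sum(axis=1)

    I_r = r * np.log(2 * r / (r + 1)) + np.log(2 / (r + 1))
    print(logL1.mean(), h * mu0 * V * I_r)           # the two values nearly agree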

We preface the proof of Theorem 4(b) with Lemma 3, which provides an alternative representation of \rho_{r}(\beta,\zeta). Let

D(\omega) = \tfrac{1}{1+r^{\omega}}\log(\tfrac{2}{1+r^{\omega}})+\tfrac{r^{\omega}}{1+r^{\omega}}\log(\tfrac{2r^{\omega}}{1+r^{\omega}}), \qquad g(\omega) = (\tfrac{1+r^{\omega}}{2})^{\frac{1}{\omega}}. \qquad (54)

Let \xi(\omega)=\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r}. Recall from (20) that

\rho_{r}(\beta,\zeta)=\max_{\frac{1-\zeta}{\beta}<\omega\leq 2}\xi(\omega) \mbox{ for } \tfrac{1-\zeta}{2}<\beta<1-\zeta. \qquad (55)
Lemma 3.

For \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}], \xi achieves its maximum at \omega=2 and

\rho_{r}(\beta,\zeta)=\tfrac{\beta-\frac{1}{2}(1-\zeta)}{2g(2)-1-r}. \qquad (56)

For \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1, \xi achieves its maximum at some \omega<2 and

\rho_{r}(\beta,\zeta)=\tfrac{1-\zeta}{2g(\omega)D(\omega)}. \qquad (57)

Proof. Since

\tfrac{d}{d\omega}\log\xi(\omega) = \tfrac{\omega^{-2}(1-\zeta)}{\beta-\omega^{-1}(1-\zeta)}-\tfrac{2\frac{d}{d\omega}g(\omega)}{2g(\omega)-1-r},
\tfrac{d}{d\omega}g(\omega) = \tfrac{d}{d\omega}\exp[\tfrac{1}{\omega}\log(\tfrac{1+r^{\omega}}{2})] = [\tfrac{r^{\omega}\log r}{\omega(1+r^{\omega})}-\tfrac{1}{\omega^{2}}\log(\tfrac{1+r^{\omega}}{2})]g(\omega) = \tfrac{D(\omega)g(\omega)}{\omega^{2}},

it follows that \tfrac{d}{d\omega}\log\xi(\omega)=0 when

\omega^{-2}(1-\zeta)[2g(\omega)-1-r]=2[\beta-\omega^{-1}(1-\zeta)]\tfrac{D(\omega)g(\omega)}{\omega^{2}}, \qquad (58)

that is when

\tfrac{\beta}{1-\zeta}=\omega^{-1}+\tfrac{2g(\omega)-1-r}{2g(\omega)D(\omega)}. \qquad (59)

For \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}], the solution of \omega to (59) is at least 2 and the maximum in (55) is attained at \omega=2. We conclude (56). For \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1, the solution of \omega to (59) lies in the interval (\tfrac{1-\zeta}{\beta},2). We conclude (57) from (55) and a rearrangement of (58). ∎
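Lemma 3 is easy to check numerically. The sketch below (an illustrative helper) maximizes \xi(\omega) on a grid over ((1-\zeta)/\beta,2] and compares the maximum with the closed forms (56)–(57); the values of r, \zeta and \beta are illustrative.

    import numpy as np

    def g(w, r):
        return ((1 + r ** w) / 2) ** (1 / w)

    def D(w, r):
        rw = r ** w
        return (np.log(2 / (1 + rw)) + rw * np.log(2 * rw / (1 + rw))) / (1 + rw)

    def xi(w, beta, zeta, r):
        return (beta - (1 - zeta) / w) / (2 * g(w, r) - 1 - r)

    r, zeta, beta = 2.0, 0.3, 0.65                      # illustrative values
    w = np.linspace((1 - zeta) / beta + 1e-6, 2.0, 200_000)
    vals = xi(w, beta, zeta, r)
    w_star, rho = w[np.argmax(vals)], vals.max()

    boundary = 0.5 * (1 + (2 * g(2, r) - 1 - r) / (g(2, r) * D(2, r)))
    if beta / (1 - zeta) <= boundary:
        print(np.isclose(rho, (beta - (1 - zeta) / 2) / (2 * g(2, r) - 1 - r)))            # (56)
    else:
        print(np.isclose(rho, (1 - zeta) / (2 * g(w_star, r) * D(w_star, r)), rtol=1e-3))  # (57)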

Proof of Theorem 4(b). For \tfrac{1-\zeta}{2}<\beta<1-\zeta, let \omega be the maximizer in

\rho_{r}(\beta,\zeta)=\max_{\frac{1-\zeta}{\beta}<\omega\leq 2}\Big(\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r}\Big). \qquad (60)

Let h=\lfloor\tfrac{(1-\epsilon)\rho_{r}(\beta,\zeta)\log N}{\mu_{0}}\rfloor for some \epsilon>0. Let P_{0} denote probability with respect to \mu^{n}_{t}=g(\omega)\mu_{0} for all n and t. Let t_{k}=(2k-1)h. Let P_{k}, 1\leq k\leq K:=\lfloor\tfrac{T}{2h}\rfloor, denote probability under which, independently for 1\leq n\leq N, Q^{n}=1 with probability 2N^{-\beta}, and Q^{n}=0 otherwise. When Q^{n}=1,

\mu^{n}_{t}=\left\{\begin{array}{ll}\mu_{0} & \mbox{ for } t_{k}-h<t\leq t_{k},\\ r\mu_{0} & \mbox{ for } t_{k}<t\leq t_{k}+h,\\ g(\omega)\mu_{0} & \mbox{ for } t\leq t_{k}-h \mbox{ and } t>t_{k}+h.\end{array}\right. \qquad (61)

When Q^{n}=0, \mu_{1}^{n}=\cdots=\mu_{T}^{n}=g(\omega)\mu_{0}. Let E_{1} denote expectation with respect to P_{1}. Let P_{Q}=P_{1}(\cdot|Q^{1}=1) and let E_{Q} denote expectation with respect to P_{Q}.

By (B)–(32), it suffices to show (53) for

L_{1} = \tfrac{dP_{1}}{dP_{0}}({\bf X})=\prod_{n=1}^{N}(1+2N^{-\beta}[\exp(U^{n})-1]), \qquad (62)
U^{n} = S^{n}_{0h}\log(\tfrac{1}{g(\omega)})+S^{n}_{h,2h}\log(\tfrac{r}{g(\omega)})-h\mu_{0}[1+r-2g(\omega)].

For notational simplicity, let S_{0h}=S_{0h}^{1} and S_{h,2h}=S_{h,2h}^{1}.

For X\sim{\rm Poisson}(\lambda) and constant C>0,

E(C^{X})=\sum_{x=0}^{\infty}e^{-\lambda}\tfrac{(C\lambda)^{x}}{x!}=e^{\lambda(C-1)}. \qquad (63)

This identity is applied in (D), (D) and (D).
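Identity (63) is the Poisson probability generating function; a short Monte Carlo check with illustrative \lambda and C:

    import numpy as np

    rng = np.random.default_rng(1)
    lam, C = 3.0, 1.7
    X = rng.poisson(lam, size=1_000_000)
    print(np.mean(C ** X), np.exp(lam * (C - 1)))   # both close to 8.17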

Case 1: \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}], \omega=2. By Lemma 2, (61)–(63) and [g(2)]^{2}=\tfrac{1+r^{2}}{2},

E_{Q}\exp(U^{1}) = E_{Q}[(\tfrac{1}{g(2)})^{S_{0h}}(\tfrac{r}{g(2)})^{S_{h,2h}}]e^{-h\mu_{0}[1+r-2g(2)]}
= \exp(h\mu_{0}[\tfrac{1}{g(2)}-1+\tfrac{r^{2}}{g(2)}-r]-h\mu_{0}[1+r-2g(2)])
= \exp(2h\mu_{0}[2g(2)-1-r])
= \exp(\tfrac{h\mu_{0}(2\beta-1+\zeta)}{\rho_{r}(\beta,\zeta)})\leq N^{(1-\epsilon)(2\beta-1+\zeta)}.

Hence

E_{1}L_{1} = (1+4N^{-2\beta}[E_{Q}\exp(U^{1})-1])^{N} \leq \exp(4N^{\zeta-\delta})=o(K),

where \delta=\epsilon(2\beta-1+\zeta), and (53) holds.

Case 2: \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1. Express

\log L_{1} = R_{0}+R_{1}, \qquad (65)
\mbox{where } R_{i} = \sum_{n\in\Gamma_{i}}\log(1+2N^{-\beta}[\exp(U^{n})-1]),
\Gamma_{0} = \{n:Q^{n}=0\}\cup\{n:Q^{n}=1,\ \exp(U^{n})\leq N^{\beta}\},
\Gamma_{1} = \{n:Q^{n}=1,\ \exp(U^{n})>N^{\beta}\}.

We conclude (53) from

P_{1}(R_{i}\leq\tfrac{1}{2}\log K)\rightarrow 1 \mbox{ for } i=0 \mbox{ and } 1. \qquad (66)

{\bf i=0}: Let a=\omega-1 with \omega the maximizer in (60). Since g(\omega)=\tfrac{1+r^{a+1}}{2g^{a}(\omega)}, by (60), (62) and (63),

E_{Q}[\exp(U^{1}){\bf 1}_{\{1\in\Gamma_{0}\}}] \leq N^{\beta(1-a)}E_{Q}\exp(aU^{1})
= N^{\beta(1-a)}\exp(h\mu_{0}[\tfrac{1}{g^{a}(\omega)}-1+\tfrac{r^{a+1}}{g^{a}(\omega)}-r]-ah\mu_{0}[1+r-2g(\omega)])
= N^{\beta(1-a)}\exp(\omega h\mu_{0}[2g(\omega)-1-r])
= N^{\beta(1-a)}\exp(\tfrac{h\mu_{0}(\beta\omega-1+\zeta)}{\rho_{r}(\beta,\zeta)})\leq N^{2\beta-1+\zeta-\delta},

where \delta=\epsilon(\beta\omega-1+\zeta). Since E_{0}\exp(U^{n})=1, it follows from (D) that

E_{1}\exp(R_{0}) \leq (1+4N^{-2\beta}E_{Q}[\exp(U^{1}){\bf 1}_{\{1\in\Gamma_{0}\}}])^{N} \leq \exp(4N^{\zeta-\delta}),

and (66) holds.

{\bf i=1}: Express U^{1}=v_{1}S_{0h}+v_{2}S_{h,2h}-z, where v_{1}=\log(\tfrac{1}{g(\omega)}), v_{2}=\log(\tfrac{r}{g(\omega)}) and z=h\mu_{0}[1+r-2g(\omega)]. Since g(\omega)=\tfrac{1+r^{a+1}}{2g^{a}(\omega)}, by Markov's inequality and (63),

E_{1}(\#\Gamma_{1}) = 2N^{1-\beta}P_{Q}(e^{aU^{1}}>N^{a\beta})
\leq 2N^{1-\beta-a\beta}e^{-az}E_{Q}(e^{v_{1}aS_{0h}}e^{v_{2}aS_{h,2h}})
= 2N^{1-\omega\beta}\exp(-az+h\mu_{0}[e^{v_{1}a}-1+re^{v_{2}a}-r])
= 2N^{1-\omega\beta}\exp(\omega h\mu_{0}[2g(\omega)-1-r])
= 2N^{1-\omega\beta}\exp(\tfrac{h\mu_{0}(\beta\omega-1+\zeta)}{\rho_{r}(\beta,\zeta)})\leq N^{\zeta-\delta},

where \delta=\epsilon(\beta\omega-1+\zeta). Since

R_{1}\leq(\#\Gamma_{1})\max_{n\in\Gamma_{1}}U^{n} \mbox{ and } P_{1}(\max_{n}U^{n}\geq N^{\frac{\delta}{2}})\rightarrow 0,

we conclude (66) from (D) and Markov's inequality. ∎

Appendix E Proof of Theorem 5

It follows from (C) that \sup_{\boldsymbol{\mu}\in\Lambda_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})\rightarrow 0.

Consider \boldsymbol{\mu}\in\Lambda_{1}(h,\Delta,V) and let \tau_{j} be a change-point such that

\min(\tau_{j+1}-\tau_{j},\tau_{j}-\tau_{j-1})\geq h \mbox{ and } m_{j\Delta}\geq V,

where m_{j\Delta}=\#\{n:|\log(\mu_{\tau_{j}+1}^{n}/\mu_{\tau_{j}}^{n})|\geq\Delta\}.

Let Q^{n}=1 if |\log(\mu_{\tau_{j}+1}^{n}/\mu_{\tau_{j}}^{n})|\geq\Delta and Q^{n}=0 otherwise.

Proof of Theorem 5(a). Consider V=o(\tfrac{\log T}{\log N}) and recall from (19) that I_{r}=r\log(\tfrac{2r}{r+1})+\log(\tfrac{2}{r+1}). Let r_{1} and \mu_{1} be such that e^{\Delta}>r_{1}>r and \mu_{0}/(1+\epsilon)^{\frac{1}{3}}<\mu_{1}<\mu_{0}. Since hVI_{r}\mu_{0}=(1+\epsilon)\log T, \tfrac{h_{i+1}}{h_{i}}\rightarrow 1 and d_{i}=o(h_{i}), for T large there exists

h_{i}\geq(1+\epsilon)^{\frac{1}{2}}I_{r}^{-1}(\tfrac{\log T}{\mu_{1}V}), \qquad (69)

such that for all \boldsymbol{\mu}\in\Lambda_{1}(h,\Delta,V), there exists k such that

\tau_{j-1}<s(ik)<u(ik)<\tau_{j+1}, \qquad |t(ik)-\tau_{j}|\leq\tfrac{d_{i}}{2}.

Moreover when Q^{n}=1,

|\log(E_{\boldsymbol{\mu}}Y^{n}_{tu}/E_{\boldsymbol{\mu}}Y^{n}_{st})|\geq\log r_{1}, \qquad (70)

where (s,t,u)=(s(ik),t(ik),u(ik)). Let

\Gamma=\{n:Q^{n}=1,\ Y^{n}_{su}\geq(1+r)h_{i}\mu_{1},\ |\log(Y_{tu}^{n}/Y_{st}^{n})|\geq\log r\}.

By (69), for n\in\Gamma,

p^{n}_{stu}\leq 2\exp(-\mu_{1}h_{i}I_{r})\leq 2\exp(-(1+\epsilon)^{\frac{1}{2}}\tfrac{\log T}{V}). \qquad (71)

Since \tfrac{\log T}{V\log N}\rightarrow\infty, for N large,

\ell_{N}(p_{stu}^{n})\geq(1+\epsilon)^{\frac{1}{3}}(\tfrac{\log T}{V}).

Hence as \ell_{N}(q)\geq-1 for N large, by Lemma 1, with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)[(1+\epsilon)^{\frac{1}{3}}(\tfrac{\log T}{V})-1]
\geq -\lambda_{2}^{2}\sqrt{\log N}+(1+\epsilon)^{\frac{1}{4}}\log T.

Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T, \lambda_{2}^{2}\sqrt{\log N}=o(\log T) and c_{T}=o(\log T), we can conclude that P_{\boldsymbol{\mu}}(\ell^{\rm pen}_{N}({\bf p}_{stu})\geq c_{T})\rightarrow 1. ∎

Proof of Theorem 5(b). Consider V\sim N^{1-\beta} for \tfrac{1-\zeta}{2}<\beta<1-\zeta. For N large, there exists

\log N\gtrsim h_{i}\geq(1+\epsilon)^{\frac{1}{2}}\rho_{r}(\beta,\zeta)(\tfrac{\log N}{\mu_{0}}) \qquad (72)

such that for all \boldsymbol{\mu}\in\Lambda_{1}(h,\Delta,V), there exists k such that

\tau_{j-1}<s(ik)<u(ik)<\tau_{j+1}, \qquad |t(ik)-\tau_{j}|\leq\tfrac{d_{i}}{2},

and conditioned on Q^{n}=1, either

E_{\boldsymbol{\mu}}Y_{tu}^{n}\geq rE_{\boldsymbol{\mu}}Y_{st}^{n} \mbox{ or } E_{\boldsymbol{\mu}}Y_{st}^{n}\geq rE_{\boldsymbol{\mu}}Y_{tu}^{n}, \qquad (73)

where (s,t,u)=(s(ik),t(ik),u(ik)).

By Stirling’s approximation x!2πx(xe)xx!\sim\sqrt{2\pi x}(\tfrac{x}{e})^{x}, for XX\simPoisson(η\eta), as xx\rightarrow\infty,

P(X=x)=eηηxx!12πxexp[η+xxlog(xη)].P(X=x)=e^{-\eta}\tfrac{\eta^{x}}{x!}\sim\tfrac{1}{\sqrt{2\pi x}}\exp[-\eta+x-x\log(\tfrac{x}{\eta})]. (74)

We apply this approximation in (E) and (E).
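For instance, the following sketch compares the exact Poisson probability with the right-hand side of (74) at an illustrative pair (\eta,x):

    import math

    eta, x = 80.0, 100
    exact = math.exp(-eta) * eta ** x / math.factorial(x)
    approx = math.exp(-eta + x - x * math.log(x / eta)) / math.sqrt(2 * math.pi * x)
    print(exact, approx)   # agree to within about 0.1%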

Case 1: \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}] and \rho_{r}(\beta,\zeta)=\tfrac{\beta-\frac{1}{2}(1-\zeta)}{2g(2)-1-r}. Let

\Gamma = \{n:Q^{n}=1,\ Y_{su}^{n}\geq\sqrt{2(1+r^{2})}h_{i}\mu_{0}-1,\ |\log(Y_{tu}^{n}/Y_{st}^{n})|\geq 2\log r,\ q_{stu}^{n}\geq N^{\zeta-1}\}.

Consider Y_{1}\sim{\rm Poisson}(h_{i}\mu_{0}) and Y_{2}\sim{\rm Poisson}(rh_{i}\mu_{0}). By (74) and h_{i}\lesssim\log N,

P(Y_{1}=\lfloor(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}h_{i}\mu_{0}\rfloor) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-1+(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}-(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}\log((\tfrac{2}{1+r^{2}})^{\frac{1}{2}})]),
P(Y_{2}=\lceil(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}r^{2}h_{i}\mu_{0}\rceil) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-r+r^{2}(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}-r^{2}(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}\log(r(\tfrac{2}{1+r^{2}})^{\frac{1}{2}})]).

Recall that g(2)=(\tfrac{1+r^{2}}{2})^{\frac{1}{2}} and D(2)=\tfrac{1}{1+r^{2}}\log(\tfrac{2}{1+r^{2}})+\tfrac{r^{2}}{1+r^{2}}\log(\tfrac{2r^{2}}{1+r^{2}}) [see (54)]. By (E),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[P(Y_{1}=\lfloor(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}h_{i}\mu_{0}\rfloor)P(Y_{2}=\lceil(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}r^{2}h_{i}\mu_{0}\rceil)-N^{\zeta-1}\Big]
\gtrsim \tfrac{N^{1-\beta}}{\log N}\exp(h_{i}\mu_{0}[2g(2)-1-r-g(2)D(2)]).

By (E), for n\in\Gamma,

p^{n}_{stu} \leq 2\exp(-Y_{su}^{n}D(2))\leq\xi_{N}, \qquad \mbox{where } \xi_{N}=C_{2}\exp(-2h_{i}\mu_{0}g(2)D(2)) \mbox{ for } C_{2}=2e^{D(2)}. \qquad (78)

Let q^{n}_{stu}=F_{n}(p^{n}_{stu}) where F_{n} is the distribution function of p^{n}_{stu}. It follows from Lemmas 1 and 2 that with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}
\geq -\lambda_{2}^{2}\sqrt{\log N}+\tfrac{\lambda_{2}N^{\frac{1}{2}-\beta}}{(\log N)^{\frac{3}{2}}}\exp(h_{i}\mu_{0}[2g(2)-1-r]).

Since \lambda_{2}\asymp\tfrac{N^{\frac{\zeta}{2}}}{\sqrt{\log N}} and by (72)

h_{i}\mu_{0}\geq(1+\epsilon)^{\frac{1}{2}}\rho_{r}(\beta,\zeta)\log N=(1+\epsilon)^{\frac{1}{2}}\Big(\tfrac{\beta-\frac{1}{2}(1-\zeta)}{2g(2)-1-r}\Big)\log N,

it follows from (E) that \ell_{N}({\bf p}_{stu})\gtrsim\tfrac{N^{\zeta+\delta}}{(\log N)^{2}} for \delta=[(1+\epsilon)^{\frac{1}{2}}-1][\beta-\tfrac{1}{2}(1-\zeta)]. Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T\sim N^{\zeta} and c_{T}=o(N^{\zeta}), we conclude P_{\boldsymbol{\mu}}(\ell_{N}^{\rm pen}({\bf p}_{stu})\geq c_{T})\rightarrow 1.

Case 2: \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1 and \rho_{r}(\beta,\zeta)=\tfrac{1-\zeta}{2g(\omega)D(\omega)}=\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r} with \omega achieving the maximum in (55). Let

\Gamma = \{n:Q^{n}=1,\ Y^{n}_{su}\geq 2g(\omega)h_{i}\mu_{0}-1,\ |\log(Y^{n}_{tu}/Y^{n}_{st})|\geq\omega\log r,\ q_{stu}^{n}\geq N^{\zeta-1}\}.

Consider Y_{1}\sim{\rm Poisson}(h_{i}\mu_{0}) and Y_{2}\sim{\rm Poisson}(rh_{i}\mu_{0}). By (74) and h_{i}\lesssim\log N,

P(Y_{1}=\lfloor\tfrac{2g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rfloor) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-1+\tfrac{2g(\omega)}{r^{\omega}+1}-\tfrac{2g(\omega)}{r^{\omega}+1}\log(\tfrac{2g(\omega)}{r^{\omega}+1})]),
P(Y_{2}=\lceil\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rceil) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-r+\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}-\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}\log(\tfrac{2r^{\omega-1}g(\omega)}{r^{\omega}+1})]).

Recall that g(\omega)=(\tfrac{1+r^{\omega}}{2})^{\frac{1}{\omega}} and D(\omega)=\tfrac{1}{1+r^{\omega}}\log(\tfrac{2}{1+r^{\omega}})+\tfrac{r^{\omega}}{1+r^{\omega}}\log(\tfrac{2r^{\omega}}{1+r^{\omega}}) [see (54)]. By (E),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[P(Y_{1}=\lfloor\tfrac{2g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rfloor)P(Y_{2}\geq\lceil\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rceil)-N^{\zeta-1}\Big]
\gtrsim \tfrac{N^{1-\beta}}{\log N}\exp(h_{i}\mu_{0}[2g(\omega)-1-r-2(\tfrac{\omega-1}{\omega})g(\omega)D(\omega)]).

By (E), for n\in\Gamma,

p^{n} \leq 2\exp(-Y^{n}_{s(ik),u(ik)}D(\omega))\leq\xi_{N}, \qquad \mbox{where } \xi_{N}=C_{\omega}\exp(-2h_{i}\mu_{0}g(\omega)D(\omega)) \mbox{ for } C_{\omega}=2e^{D(\omega)}. \qquad (83)

Let q^{n}=F_{n}(p^{n}) where F_{n} is the distribution function of p^{n}. It follows from Lemmas 1 and 2 that with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}
\gtrsim -\lambda_{2}^{2}\sqrt{\log N}+\tfrac{\lambda_{2}N^{\frac{1}{2}-\beta}}{(\log N)^{\frac{3}{2}}}\exp(h_{i}\mu_{0}[2g(\omega)-1-r-(\tfrac{\omega-2}{\omega})g(\omega)D(\omega)]).

Since \lambda_{2}\asymp\tfrac{N^{\frac{\zeta}{2}}}{\sqrt{\log N}} and by (72),

h_{i}\mu_{0} \geq (1+\epsilon)^{\frac{1}{2}}\rho_{r}(\beta,\zeta)\log N = (1+\epsilon)^{\frac{1}{2}}\Big(\tfrac{1-\zeta}{2g(\omega)D(\omega)}\Big)\log N = (1+\epsilon)^{\frac{1}{2}}\Big(\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r}\Big)\log N,

it follows from (E) that \ell_{N}({\bf p}_{stu})\gtrsim\tfrac{N^{\zeta+\delta}}{(\log N)^{2}} for \delta=[(1+\epsilon)^{\frac{1}{2}}-1][\beta-\tfrac{1}{2}(1-\zeta)]. Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T\sim N^{\zeta} and c_{T}=o(\log T), we conclude P_{\boldsymbol{\mu}}(\ell_{N}^{\rm pen}({\bf p}_{stu})\geq c_{T})\rightarrow 1. ∎

References

  • [1] Arias-Castro, E., Donoho, D. and Huo X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inf. Theory 51 2402–2425.
  • [2] Arias-Castro, E., Donoho, D. and Huo X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform noise. Ann. Statist. 34 326–349.
  • [3] Berk, R.H. and Jones, D.H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47–50.
  • [4] Cai, T., Jeng, X.J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. Roy. Statist. Soc. B 73 629–662.
  • [5] Chan, H.P. and Walther, G. (2015). Optimal detection of multi-sample aligned sparse signals. Ann. Statist. 43 1865–1895
  • [6] Chan, H.P. (2017). Optimal sequential detection in multi-stream data. Ann. Statist. 45 2736–2763.
  • [7] Cho, H. (2016). Change-point detection in panel data via double CUSUM statistic. Electron. J. Statist. 10 2000–2038.
  • [8] Cho, H. and Fryzlewicz, P. (2015). Multiple change-point detection for high-dimensional time series via sparsified binary segmentation. J. Roy. Statist. Soc. B 77 475–507.
  • [9] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
  • [10] Du, C., Kao, C.L. and Kou, S.C. (2016). Stepwise signal extraction via marginal likelihood. J. Amer. Statist. Assoc. 111 314–330.
  • [11] Dümbgen, L. and Spokoiny, V.G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124–152.
  • [12] Enikeeva, F. and Harchaoui, Z. (2019). High-dimensional change-point detection under sparse alternatives. Ann. Statist. 47 2051–2079.
  • [13] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change point inference. J. Roy. Statist. Soc. B 76 495–580.
  • [14] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Statist. 42 2243–2281.
  • [15] Horváth L. and Hušková M. (2014). Change-point detection in panel data. J. Time Ser. Anal. 23 631–648.
  • [16] Hubert, L. and Arabie, P. (1983). Comparing partitions. J. Classification 2 193–218.
  • [17] Ingster, Y.I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
  • [18] Ingster, Y.I. (1998). Minimax detection of a signal for \ell^{n} balls. Math. Methods Statist. 7 401–428.
  • [19] Jirak, M. (2015). Uniform change point tests in high dimension. Ann. Statist. 43 2451–2483
  • [20] Lai, T.L. and Xing, H. (2011). A simple Bayesian approach to multiple change-points. Statist. Sinica 21 539–569.
  • [21] Liu, H., Gao, C. and Samworth, R. (2021). Minimax rate in sparse high-dimensional change-point detection. Ann. Statist. 49 1081–1112.
  • [22] Mei, Y. (2010). Efficient scalable schemes for monitoring large number of data streams. Biometrika 97 419–433.
  • [23] Niu, Y.S., Hao, N. and Zhang, H. (2016). Multiple change-point detection: a selective overview. Statist. Sci. 31 611–623.
  • [24] Niu, Y.S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variation. Ann. Appl. Statist. 6 1306–1326.
  • [25] Olshen, A.B., Venkatraman, E.S., Lucito, R. and Wigler, M. (2003). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatist. 5 557–572.
  • [26] Pilliat, E., Carpentier, A. and Verzelen, N. (2020). Optimal multiple change-point detection for high-dimensional data. arXiv preprint arXiv:2011.07818.
  • [27] Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 66 846–850.
  • [28] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Statist. 40 752–769.
  • [29] Tukey, J.W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ.
  • [30] Wang, T. and Samworth, R. (2018). High dimensional change point estimation via sparse projection. J. Roy. Statist. Soc. B 80 57–83.
  • [31] Wang, Y. and Mei, Y. (2015). Large-scale multi-stream quickest change via shrinkage post-change estimation. IEEE Trans. Information Theory 61 6926–6938.
  • [32] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.
  • [33] Willsky, A. and Jones, H. (1976). A generalized likelihood ratio approach to the detection and estimation of jumps in linear system. IEEE Trans. Autom. Cont. 21 108–112.
  • [34] Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692.
  • [35] Yao, Y.C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447.
  • [36] Zhang, N.R. and Siegmund, D. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization. Biometrics 63 22–52.
  • [37] Zhang, N.R., Siegmund, D., Ji, H. and Li, J.Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.