
Likelihood Scores for Sparse Signal and Change-Point Detection

Shouri Hu, Jingyan Huang, Hao Chen and Hock Peng Chan
School of Mathematical Sciences, University of Electronic Science and Technology of China; Department of Statistics and Data Science, National University of Singapore; Department of Statistics, University of California at Davis
Abstract

We consider the identification of change-points on large-scale data streams. The objective is to find the most efficient way of combining information across data streams so that detection is possible under the smallest detectable change magnitude. The challenge comes from the sparsity of change-points when only a small fraction of the data streams undergo change at any point in time. The most successful approach to the sparsity issue so far has been the application of hard thresholding, such that only local scores from data streams exhibiting significant changes are considered and added. However, identifying an optimal threshold is difficult; in particular, it is unlikely that the same threshold is optimal at different levels of sparsity. We propose here a sparse likelihood score for identifying a sparse signal. The score is a likelihood ratio for testing the null hypothesis of no change against an alternative hypothesis in which the change-points or signals are barely detectable. By the Neyman-Pearson Lemma this score has maximum detection power at the given alternative. The outcome is a scoring of data streams that detects successfully at the boundary of the detectable region of signals and change-points. The likelihood score can be seen as a soft thresholding approach to sparse signal and change-point detection in which local scores indicating small changes are down-weighted much more than local scores indicating large changes. We show second-order optimality of the sparse likelihood score, in the sense of achieving successful detection at the minimum detectable order of change magnitude as well as at the minimum detection asymptotic constant with respect to this order of change.

1 Introduction

Consider a large number N of data streams containing change-points. We consider the situation in which all data up to a given time is available for analysis, so each data stream is an observed sequence of length T. At each change-point, one or more of the sequences undergo a distribution change. The objective is to identify these change-points and the sequences undergoing distribution change. Of interest here is the identification of these change-points when there is sparsity, that is, when the number of sequences undergoing change is small compared to N. More specifically, we want to know the minimum magnitude of change for which the distribution change can be detected under sparsity, and secondly we want an algorithm that is able to detect, with high probability, change-points under the minimum detectable change. See Niu, Hao and Zhang [23] and Wang and Samworth [30] for applications to engineering, genomics and finance.

A typical strategy to deal with sparsity is to subject local scores to thresholding or penalization before summing them up across sequences. Algorithms employing this strategy include the Sparsified Binary Segmentation (SBS) (Cho and Fryzlewicz [8]), the double CUSUM (DC) (Cho [7]), the Informative Sparse Projection (INSPECT) (Wang and Samworth [30]) and the scan algorithm of Enikeeva and Harchaoui [12]. The strategy was also employed by Mei [22], Xie and Siegmund [34] and Wang and Mei [31] in sequential change-point detection on multiple sequences, and by Zhang, Siegmund, Ji and Li [37] to detect distribution deviations from known baselines on multiple sequences. Thresholding and penalization suppress noise by removing small and moderate scores, mostly from the majority of sequences without change, thus enhancing the signals from the sparse sequences with changes. It is however unlikely that we are able to specify a threshold or penalization parameter that is optimal at all levels of sparsity.

The higher-criticism (HC) test statistic, proposed by Tukey [29] to check for a significantly large number of small p-values, uses multiple thresholds for sparse mixture detection. The number of p-values below a threshold is transformed to a higher-criticism score and this score is maximized over all thresholds. The Berk and Jones [3] test statistic uses multiple thresholds as well, but it applies a different p-value transformation. The HC test statistic was shown by Donoho and Jin [9] to be optimal in the detection of a sparse normal mixture. Cai, Jeng and Jin [4] extended it to detect intervals in multiple sequences where the means of a sparse fraction of the sequences deviate from a known baseline, and showed that the HC test statistic is optimal. Chan and Walther [5] considered sequence length much larger than the number of sequences, with detection boundaries that are more complex. They showed that the HC test statistic achieves detection at these boundaries and is optimal in more general settings. They also showed that the Berk-Jones test statistic achieves the same optimality. The HC and Berk-Jones test statistics have the advantage of not requiring a threshold to be specified in advance. However, at each threshold we are unable to differentiate between a p-value just below the threshold and a p-value that is below the threshold by a large margin. This loss of information results in sub-optimality when identifying the exact location where change has occurred.

Our approach here is to convert the p-values into likelihood scores for testing sparse sequences. It can be considered a soft form of thresholding in which p-values that are close to zero are penalized less than p-values that are barely significant. Unlike the HC and Berk-Jones test statistics, only one transformation of the p-values is needed. It retains the advantage of the HC and Berk-Jones test statistics in not requiring a threshold to be specified in advance, but unlike those statistics, a smaller p-value always has a larger likelihood score.

Since the likelihood scores are transformations of p-values, the proposed method can be applied to any type of distribution changes and it can handle data types that vary across sequences. Our theory however requires a specific distribution family for neat asymptotics and we consider here in particular either normal or Poisson data. We show optimality up to the correct asymptotic constants. For sparse normal change-points these constants are two-dimensional extensions of those in Ingster [17] and Donoho and Jin [9] for sparse normal mixture detection. These constants have been discussed in the context of sparse normal change-point detection assuming a known baseline in Chan and Walther [5] and Chan [6]. For sparse Poisson change-points the constants are new and different from sparse normal constants.

The optimality of multiple sequence identification of change-points up to the correct constant is new. Previous works on optimality for normal data are up to the correct order of magnitude though they go beyond the i.i.d. model, for example Pilliat, Carpentier and Verzelen [26] considered sparse change-point detection in time-series with normal errors. Liu, Guo and Samworth [21] showed optimality up to the best order for normal errors. However they considered a different problem that assumes at most one change-point.

The algorithm we propose here has two steps in the identification of change-points. The first, detection or screening step applies the Screening and Ranking (SaRa) idea of Niu and Zhang [24]. The second, estimation step for more precise location of change-points uses the CUSUM-like procedure of Wild Binary Segmentation (WBS), cf. Fryzlewicz [14]. This two-step approach saves computation time because the fast screening step evaluates a large number of segments, whereas the computationally intensive estimation step is only applied when a change-point has been detected during screening. In contrast, for WBS the estimation step is applied on a large number of randomly generated segments. Unlike Niu and Zhang [24], we do not apply the BIC criterion of Zhang and Siegmund [36] to determine the number of change-points. Instead, critical values are specified in advance and binary segmentation, cf. Olshen et al. [25], is applied to detect the change-points sequentially.

An alternative to binary segmentation is estimating the full set of change-points at one go by applying global optimization and making use of dynamic programming to manage the computational complexity. This was employed by the HMM algorithms of Yao [35] and Lai and Xing [20], the multi-scale SMUCE algorithm of Frick, Munk and Sieling [13] and the Bayesian Likelihood algorithm of Du, Kao and Kou [10]. These methods are however designed for single sequence segmentation. Niu, Hao and Zhang [23] provides an excellent background of the historical developments.

The outline of this paper is as follows. In Section 2 we introduce the sparse likelihood (SL) scores and show that they are optimal in the detection of sparse normal mixtures. In Section 3 we extend SL scores to detect change-points in multiple sequences. In Section 4 we show that SL scores are optimal for change-point detection when the observations are normal or Poisson. In Section 5 we perform simulation studies on the SL scores. In the appendices we prove the optimality of SL scores.

1.1 Notations

We write a_{n}\sim b_{n} to denote \lim_{n\rightarrow\infty}(a_{n}/b_{n})=1. We write a_{n}=o(b_{n}) to denote \lim_{n\rightarrow\infty}(a_{n}/b_{n})=0. We write a_{n}\lesssim b_{n} to denote a_{n}\leq Cb_{n} for all n for some C>0, and a_{n}\asymp b_{n} to denote a_{n}\lesssim b_{n} and b_{n}\lesssim a_{n}. We write X_{n}=O_{p}(a_{n}) to denote P(X_{n}\leq Ca_{n})\rightarrow 1 for some C>0. Let \lfloor\cdot\rfloor (\lceil\cdot\rceil) denote the greatest (least) integer function. Let \phi and \Phi denote the density and distribution function respectively of the standard normal. Let {\bf 1} denote the indicator function. Let \emptyset denote the empty set and let \#A denote the number of elements in a set A.

2 Sparse mixture detection

We start with the simpler problem of detecting a sparse mixture, with the objective of motivating the sparse likelihood score.

Let {\bf p}=(p^{1},\ldots,p^{N}) be independent p-values of N null hypotheses and let p^{(1)}\leq\cdots\leq p^{(N)} be the sorted p-values. Tukey (1976) proposed the higher-criticism test statistic

\mbox{HC}({\bf p})=\max_{n:Np^{(n)}\leq n}\tfrac{n-Np^{(n)}}{\sqrt{Np^{(n)}(1-p^{(n)})}}, (1)

with {\rm HC}({\bf p})=0 if Np^{(n)}>n for all n, for the overall test that all null hypotheses are true.

Donoho and Jin [9] showed that the HC test statistic is optimal for detecting a sparse fraction of false null hypotheses. Consider test scores Z^{n}\sim{\rm N}(0,1) when the nth null hypothesis is true and Z^{n}\sim{\rm N}(\mu_{N},1) for some \mu_{N}>0 when the nth null hypothesis is false. Define

\rho_{Z}(\beta)=\left\{\begin{array}{ll}\beta-\tfrac{1}{2}&\mbox{ if }\tfrac{1}{2}<\beta<\tfrac{3}{4},\cr(1-\sqrt{1-\beta})^{2}&\mbox{ if }\tfrac{3}{4}\leq\beta<1.\end{array}\right. (2)

Donoho and Jin [9] showed that on the sparse mixture (1-\epsilon){\rm N}(0,1)+\epsilon{\rm N}(\mu_{N},1) no algorithm is able to achieve, as N\rightarrow\infty,

P_{0}(\mbox{Type I error})+P_{\mu_{N}}(\mbox{Type II error})\rightarrow 0, (3)

for testing H_{0}: \epsilon=0 versus H_{1}: \epsilon=N^{-\beta}, if \mu_{N}=\sqrt{2\nu\log N} for \nu<\rho_{Z}(\beta). They also showed that the HC test statistic achieves (3) when \nu>\rho_{Z}(\beta) and is thus optimal. Type I error refers to the conclusion of H_{1} when H_{0} is true, whereas Type II error refers to the conclusion of H_{0} when H_{1} is true. Ingster (1997, 1998) established the detection lower bound showing that (3) cannot be achieved when \nu<\rho_{Z}(\beta).

Like the HC test statistic, the Berk and Jones [3] test statistic

\mbox{BJ}({\bf p})=\max_{n:Np^{(n)}\leq n}\Big{[}n\log\Big{(}\tfrac{n}{Np^{(n)}}\Big{)}+(N-n)\log\Big{(}\tfrac{N-n}{N(1-p^{(n)})}\Big{)}\Big{]}

achieves (3) when \nu>\rho_{Z}(\beta).
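To make these definitions concrete, here is a small Python sketch (our own illustration, not code from the paper) of the HC statistic in (1) and the Berk-Jones statistic above, computed from a vector of p-values:

```python
import numpy as np

def hc_statistic(p):
    """Higher-criticism statistic (1), maximized over n with N * p_(n) <= n."""
    p = np.clip(np.sort(np.asarray(p, dtype=float)), 1e-12, 1 - 1e-12)
    N = len(p)
    n = np.arange(1, N + 1)
    keep = N * p <= n                    # restrict to thresholds with N * p_(n) <= n
    if not keep.any():
        return 0.0
    scores = (n - N * p) / np.sqrt(N * p * (1 - p))
    return scores[keep].max()

def bj_statistic(p):
    """Berk-Jones statistic: binomial log-likelihood ratio at each sorted p-value."""
    p = np.clip(np.sort(np.asarray(p, dtype=float)), 1e-12, 1 - 1e-12)
    N = len(p)
    n = np.arange(1, N + 1)
    keep = N * p <= n
    if not keep.any():
        return 0.0
    term1 = n * np.log(n / (N * p))
    term2 = np.zeros(N)
    m = n < N                            # the (N-n)log(...) term vanishes at n = N
    term2[m] = (N - n[m]) * np.log((N - n[m]) / (N * (1 - p[m])))
    return (term1 + term2)[keep].max()
```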

We introduce the sparse likelihood scores in Section 2.1 and show in Section 2.2 that they achieve (3), when \nu>\rho_{Z}(\beta), in the detection of sparse mixtures.

2.1 Sparse likelihood

Let f_{1}(p)=\tfrac{1}{p(2-\log p)^{2}}-\tfrac{1}{2} and f_{2}(p)=\tfrac{1}{\sqrt{p}}-2. For both i=1 and 2, \int_{0}^{1}f_{i}(p)dp=0 and f_{i}(p) increases as p decreases.

Define the sparse likelihood score

\ell_{N}({\bf p}) = \sum_{n=1}^{N}\ell_{N}(p^{n}), (4)
where \ell_{N}(p) = \log\Big{(}1+\tfrac{\lambda_{1}\log N}{N}f_{1}(p)+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(p)\Big{)},

with \lambda_{1}\geq 0 and \lambda_{2}>0. For the purpose of sparse mixture detection f_{1} plays no role and we can consider \lambda_{1}=0. The importance of f_{1} comes from detecting very sparse change-points.
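As an illustration only (not the authors' implementation), the score in (4) can be evaluated directly from its definition; the default values of \lambda_{1} and \lambda_{2} below are placeholders:

```python
import numpy as np

def f1(p):
    return 1.0 / (p * (2.0 - np.log(p)) ** 2) - 0.5

def f2(p):
    return 1.0 / np.sqrt(p) - 2.0

def sparse_likelihood(p, lam1=1.0, lam2=1.0):
    """Sparse likelihood score (4) of a vector of p-values."""
    p = np.clip(np.asarray(p, dtype=float), 1e-300, 1.0)   # guard against underflow to 0
    N = len(p)
    return float(np.sum(np.log(1.0 + lam1 * np.log(N) / N * f1(p)
                                   + lam2 / np.sqrt(N * np.log(N)) * f2(p))))
```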

The sparse likelihood is the likelihood of

p^{n} \sim_{\rm i.i.d.} {\rm Uniform}(0,1),
\mbox{vs } p^{n} \sim_{\rm i.i.d.} f=1+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}.

The distribution function of f is

F(p)=p+\tfrac{2\lambda_{2}}{\sqrt{N\log N}}(\sqrt{p}-p).

Consider the true distribution of p^{n} to be some unknown G\neq{\rm Uniform}(0,1). The fraction of p-values less than some small p_{0} has mean G(p_{0}) and standard deviation approximately \sqrt{G(p_{0})/N}. Hence to achieve decent detection power for a test using p_{0} as critical p-value, we need

G(p_{0})\geq p_{0}+C\sqrt{p_{0}/N}\mbox{ for some }C>0. (5)

Substituting C=\tfrac{2\lambda_{2}}{\sqrt{\log N}} into the right-hand side of (5) gives a function close to F.

As \sqrt{\log N} varies slowly with N, we can view F as lying at the boundary for which detection is possible at each small p_{0}. This suggests that for any G that is greater than F at some p_{0}, we are able to differentiate it from Uniform(0,1) by using the likelihood test against F.

When p is of order smaller than N^{-1}, \tfrac{\log N}{N}f_{1}(p) dominates \tfrac{1}{\sqrt{N\log N}}f_{2}(p) and the selection of \lambda_{1}>0 is advantageous. This is relevant in the extension of sparse likelihood scores to detect change-points on long sequences, where a large number of likelihood comparisons is involved.

Figure 1: Graphs of \ell_{N}(p(z)) (black, solid) and (z-2)^{2}_{+}/2 (red, dashed), with p(z)=2\Phi(-z), for 0\leq z\leq 5 (top) and 0\leq z\leq 2 (bottom). The parameters of \ell_{N} are N=500, \lambda_{1}=1 and \lambda_{2}=1.84 (\doteq\sqrt{\tfrac{\log T}{\log\log T}} for T=500).

The sparse likelihood score can be viewed as a form of soft thresholding. To visualize this we compare in Figure 1 the plot of \ell_{N}(p(z)) for p(z)=2\Phi(-|z|), N=500, \lambda_{1}=1 and \lambda_{2}=1.84, against that of (z-2)^{2}_{+}/2. For 0\leq z\leq 5 the two functions are close to each other; however within 0\leq z\leq 2, \ell_{N}(p(z)) is not constant but has a gentle upward curve. The sparse likelihood score is negative for z\leq 1.18 and \ell_{N}(p(Z)) for Z standard normal has a mean of -0.004. This negative mean helps in controlling the sum of scores when N is large and p^{n}\stackrel{\rm i.i.d.}{\sim}{\rm Uniform}(0,1).

2.2 Optimal detection

We show here that the sparse likelihood score is optimal in the detection of sparse mixtures for a broad range of sparsity. Let E_{0} and P_{0} denote expectation and probability respectively with respect to p^{n}\stackrel{\rm i.i.d.}{\sim}{\rm Uniform}(0,1). Since

E_{0}\exp(\ell_{N}({\bf p})) = \prod_{n=1}^{N}E_{0}[1+\tfrac{\lambda_{1}\log N}{N}f_{1}(p^{n})+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(p^{n})]=1,

it follows from Markov’s inequality that

P_{0}(\ell_{N}({\bf p})\geq c_{N})\leq e^{-c_{N}}. (6)

This exponential bound makes the sparse likelihood score easy to work with when there are a large number of likelihood comparisons, as critical values satisfying a required level of Type I error control can have a simple expression not depending on N. We show in Theorem 1 that by selecting

c_{N}\rightarrow\infty\mbox{ with }c_{N}=o(N^{\delta})\mbox{ for all }\delta>0, (7)

the Type I and II error probabilities both go to zero at the detection boundary.

Theorem 1.

Assume (7). Consider the test of H_{0}: Z^{n}\stackrel{\rm i.i.d.}{\sim}{\rm N}(0,1) versus H_{1}: Z^{n}\stackrel{\rm i.i.d.}{\sim}(1-\epsilon){\rm N}(0,1)+\epsilon{\rm N}(\mu_{N},1), for 1\leq n\leq N, with \epsilon=N^{-\beta} for some \tfrac{1}{2}<\beta<1. Let p^{n}=\Phi(-Z^{n}). If \mu_{N}=\sqrt{2\nu\log N} for \nu>\rho_{Z}(\beta), then

P_{0}(\ell_{N}({\bf p})\geq c_{N})+P_{\mu_{N}}(\ell_{N}({\bf p})<c_{N})\rightarrow 0.

3 Change-point detection

Let X^{n}_{t} denote the tth observation of the nth sequence, for 1\leq t\leq T and 1\leq n\leq N. Consider first the model

X^{n}_{t}\sim_{\rm indep.}{\rm N}(\mu_{t}^{n},1). (8)

We are interested in the detection and estimation of

\boldsymbol{\tau}:=\{t:\mu^{n}_{t}\neq\mu^{n}_{t+1}\mbox{ for some }n\}.

For s<t, let \bar{X}_{st}^{n}=(t-s)^{-1}\sum_{u=s+1}^{t}X_{u}^{n}. To check for a change of mean on the nth sequence at location t, select s<t<u and let the p-value be

p_{stu}^{n}=2\Phi(-|Z_{stu}^{n}|),\mbox{ where }Z_{stu}^{n}=\tfrac{\bar{X}_{tu}^{n}-\bar{X}_{st}^{n}}{\sqrt{(u-t)^{-1}+(t-s)^{-1}}}.

In the sparse likelihood algorithm we combine these p-values using \ell_{N}({\bf p}_{stu}), where {\bf p}_{stu}=(p_{stu}^{1},\ldots,p_{stu}^{N}). When the data follow some other distribution, the corresponding likelihood ratio statistic and p-value can be computed accordingly.
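A minimal sketch (ours, under the normal model (8)) of the local p-values and their combination; `sparse_likelihood` refers to the sketch in Section 2.1:

```python
import numpy as np
from scipy.stats import norm

def pvalues_stu(X, s, t, u):
    """Two-sided p-values p_stu^n for a mean change at t within (s, u]; X has shape (N, T),
    with column j holding observation j + 1, so X[:, s:t] covers observations s+1, ..., t."""
    left = X[:, s:t].mean(axis=1)
    right = X[:, t:u].mean(axis=1)
    z = (right - left) / np.sqrt(1.0 / (u - t) + 1.0 / (t - s))
    return 2 * norm.sf(np.abs(z))

def score_stu(X, s, t, u, lam1=1.0, lam2=1.0):
    """Combine the N local p-values with the sparse likelihood score (4)."""
    return sparse_likelihood(pvalues_stu(X, s, t, u), lam1, lam2)
```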

Sparse likelihood scores detect well when only a small fraction of the sequences undergo a change of mean. For T large, computing the sparse likelihood score for all (s,t,u) is expensive. Instead we apply the approximating set idea of Walther (2010) to first space out the (s,t,u) that are evaluated, and apply the CUSUM-type scores used in WBS to estimate the change-point location accurately only when the first step indicates a change-point.

In addition to computational savings, through this two-step approach we are able to incorporate multi-scale penalization terms similar to those used in Dümbgen and Spokoiny (2001) and the SMUCE algorithm of Frick, Munk and Sieling (2014), to ensure optimality not only at all levels of sparse change-points, but also at all orders of change magnitudes.

Let 1\leq h_{1}<h_{2}<\cdots and 1\leq d_{1}<d_{2}<\cdots be integer-valued sequences with h_{i}\geq d_{i} for all i. Let K_{i}(g)=\lfloor\tfrac{g-1}{d_{i}}\rfloor. Define

{\cal A}_{i}(g) = \{(s(ik),t(ik),u(ik)):1\leq k\leq K_{i}(g)\},
s(ik) = \max(0,kd_{i}-h_{i}),
t(ik) = kd_{i},
u(ik) = \min(kd_{i}+h_{i},g).

The elements of {\cal A}_{i}(g) are the indices where sparse likelihood scores for windows of length h_{i} are computed. Initially we have the full dataset {\bf X}_{1:T}=(X^{n}_{t}:1\leq t\leq T,1\leq n\leq N), and after one or more change-points have been estimated, it is split into sub-datasets {\bf X}_{b:e}=(X^{n}_{t}:b\leq t\leq e,1\leq n\leq N), with length g=e-b+1. We check for change-points in {\bf X}_{b:e} using windows specified by {\cal A}_{i}(g).
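A short sketch (our rendering of the definitions above) of the approximating set:

```python
def approximating_set(g, h_i, d_i):
    """Indices (s, t, u) of A_i(g): windows of half-length h_i, with centers spaced d_i apart."""
    K = (g - 1) // d_i
    return [(max(0, k * d_i - h_i), k * d_i, min(k * d_i + h_i, g))
            for k in range(1, K + 1)]
```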

Define the penalized sparse likelihood score

\ell^{\rm pen}_{N}({\bf p}_{stu})=\ell_{N}({\bf p}_{stu})-\log\Big{(}\tfrac{T}{4}\Big{(}\tfrac{1}{t-s}+\tfrac{1}{u-t}\Big{)}\Big{)}. (9)

For g\geq 1, let i_{g}=\max\{i:h_{i}+d_{i}\leq g\}. The detection of change-points within {\bf X}_{b:e}, with window lengths at least h_{i_{0}}, is as follows.

Algorithm 1 SL-estimate
  input (c, i_{0}, b, e)
    {\bf X}\leftarrow{\bf X}_{b:e}
    g\leftarrow e-b+1
    for i=i_{0},\ldots,i_{g}
      if \max_{1\leq k\leq K_{i}(g)}\ell^{\rm pen}_{N}({\bf p}_{s(ik),t(ik),u(ik)})\geq c then
        j\leftarrow\mbox{argmax}_{k:1\leq k\leq K_{i}(g)}\ell^{\rm pen}_{N}({\bf p}_{s(ik),t(ik),u(ik)})
        \widehat{\tau}\leftarrow[\mbox{argmax}_{t:s(ij)<t<u(ij)}\ell^{\rm pen}_{N}({\bf p}_{s(ij),t,u(ij)})]+b-1
        output (\widehat{\tau},i)
        stop
      end if
    end for
    output (0,0)

There are two steps in SL-estimate in the estimation of a change-point, when the largest penalized score exceeds the critical value c. The first is the identification of an interval (s(ij),u(ij)), associated with the largest penalized score, within which a change-point lies. The second is the estimation of the change-point within this interval. In the approximating set {\cal A}_{i}(g), neighboring windows are located d_{i} apart, hence we are unable to estimate the change-points accurately in the first step. Accurate estimation is carried out, with more intensive computations within (s(ij),u(ij)), in the second step. Since the second step is performed only after an interval has been identified as containing a change-point, this two-step procedure saves computations in regions where scores are generally small and the likelihood of change-points is low.

After a change-point has been identified, we split the dataset into two and execute the same algorithm on each split dataset. To avoid repetitive computations, we start from the window length h_{i_{0}} used in the evaluation of the change-point splitting the dataset, instead of starting from the smallest window length h_{1}, on the split datasets. The use of a representative set of window lengths h_{i} for computational savings in change-point detection has been proposed in Willsky and Jones [33]. The recursive segmentation algorithm for the computation of the estimated change-point set \widehat{\boldsymbol{\tau}} is given below, with initialization at (c,1,1,T,\emptyset).

Figure 2: Graphs of Type I error probability against critical value for the sparse likelihood detection algorithm, for independent unit-variance normal observations. We consider parameters d_{i}, h_{i}, \lambda_{1} and \lambda_{2} as applied in the numerical studies in Section 5, with T=2000 (top) and T=20{,}000 (bottom), and N=50 (black), N=100 (red), N=200 (green), N=500 (blue), N=1000 (orange).
Algorithm 2 SL-detect
  input (c, i_{0}, b, e, \widehat{\boldsymbol{\tau}})
    (\widehat{\tau},i)\leftarrow SL-estimate(c,i_{0},b,e)
    if \widehat{\tau}>0 then
      \widehat{\boldsymbol{\tau}}\leftarrow\widehat{\boldsymbol{\tau}}\cup\{\widehat{\tau}\}
      \widehat{\boldsymbol{\tau}}\leftarrow SL-detect(c,i,b,\widehat{\tau},\widehat{\boldsymbol{\tau}})
      \widehat{\boldsymbol{\tau}}\leftarrow SL-detect(c,i,\widehat{\tau},e,\widehat{\boldsymbol{\tau}})
    end if
    output \widehat{\boldsymbol{\tau}}
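The following Python sketch is our own compact rendering of Algorithms 1 and 2; the helpers `score_stu` and `approximating_set` are from the earlier sketches, the window grids `h`, `d` are assumed to be 0-indexed lists, and the guard against re-detecting a split point is our addition:

```python
import numpy as np

def sl_estimate(X, c, i0, b, e, h, d, T, lam1=1.0, lam2=1.0):
    """Algorithm 1 (SL-estimate): screen spaced windows of increasing length;
    on detection, refine the change-point location within the flagged window."""
    Xsub = X[:, b - 1:e]                 # columns of X hold observations 1..T
    g = e - b + 1

    def pen_score(s, t, u):
        # penalized sparse likelihood score (9) on the sub-dataset
        pen = np.log(T / 4.0 * (1.0 / (t - s) + 1.0 / (u - t)))
        return score_stu(Xsub, s, t, u, lam1, lam2) - pen

    for i in range(i0, len(h)):          # i runs up to i_g, i.e. while h_i + d_i <= g
        if h[i] + d[i] > g:
            break
        windows = approximating_set(g, h[i], d[i])
        if not windows:
            continue
        scores = [pen_score(s, t, u) for (s, t, u) in windows]
        k = int(np.argmax(scores))
        if scores[k] >= c:
            s, _, u = windows[k]
            tau = max(range(s + 1, u), key=lambda t: pen_score(s, t, u)) + b - 1
            return tau, i
    return 0, 0

def sl_detect(X, c, i0, b, e, h, d, T, found=None):
    """Algorithm 2 (SL-detect): recursive binary segmentation driven by sl_estimate."""
    found = set() if found is None else found
    tau, i = sl_estimate(X, c, i0, b, e, h, d, T)
    if tau > 0 and tau not in found:     # guard (ours) against re-detecting a split point
        found.add(tau)
        sl_detect(X, c, i, b, tau, h, d, T, found)
        sl_detect(X, c, i, tau, e, h, d, T, found)
    return found

# Example call on the full dataset:
# change_points = sl_detect(X, c=5.0, i0=0, b=1, e=T, h=h, d=d, T=T)
```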

In Figure 2 we show that the critical values of the sparse likelihood algorithm, for a specified Type I error probability, are stable over N. Contributing factors include \ell_{N}(p) having a mean that is close to zero and \ell_{N}({\bf p}) having exponential tail probabilities not depending on N, see (6), when p and p^{n} are uniformly distributed.

4 Optimal detection

Let \boldsymbol{\mu}=(\mu_{t}^{n}:1\leq t\leq T,1\leq n\leq N) and let J=\#\boldsymbol{\tau} be the number of change-points. We show that the sparse likelihood algorithm is optimal for normal observations in Section 4.1 and for Poisson observations in Section 4.2. Consider T\rightarrow\infty as N\rightarrow\infty such that

\log T\sim N^{\zeta}\mbox{ for some }0<\zeta<1. (10)

In Theorems 2 and 4 we specify the detection boundary for asymptotically zero Type I and II error probabilities. Analogous detection boundaries for a single sequence are given in Arias-Castro, Donoho and Huo [1, 2].

In Theorems 3 and 5 we show that Type I and II error probabilities of the sparse likelihood algorithm go to zero at the detection boundary.

Recall that i_{T}=\max\{i:h_{i}+d_{i}\leq T\}. Consider the sparse likelihood algorithm with d_{i} and h_{i} satisfying

\tfrac{h_{i+1}}{h_{i}} \rightarrow 1\mbox{ and }d_{i}=o(h_{i})\mbox{ as }i\rightarrow\infty, (11)
\log\Big{(}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\Big{)} = o(\log T)\mbox{ as }T\rightarrow\infty, (12)

and critical values c_{T} satisfying

c_{T}=o(\log T)\mbox{ and }c_{T}-\log\Big{(}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\Big{)}\rightarrow\infty\mbox{ as }T\rightarrow\infty. (13)

For the sparse likelihood algorithm select parameters \lambda_{1}>0 and

\lambda_{2}=\sqrt{\tfrac{\log T}{\log\log T}}. (14)

We satisfy (11) when h_{i}\sim\exp(\tfrac{i}{\log i}) and d_{i}\sim\tfrac{h_{i}}{i} as i\rightarrow\infty. Moreover (12) holds because

\log\Big{(}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\Big{)}\sim 2\log i_{T}\sim 2\log\log T.

Condition (11) ensures that the set of (h_{i},d_{i}) is sufficiently dense to detect change-points optimally. Condition (12) is required for (13) to hold. The first half of condition (13) ensures that the Type II error probability goes to 0; the second half ensures that the Type I error probability goes to 0.
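For illustration, a window-length grid in the spirit of this schedule (using the geometric growth h_{i+1}=\lceil 1.1h_{i}\rceil and d_{i}=\lfloor h_{i}/i\rfloor adopted in Section 5) can be generated as follows; this is a sketch, not the authors' code:

```python
def window_grid(T):
    """Window lengths h_i and spacings d_i with h_1 = 1, h_{i+1} = ceil(1.1 h_i),
    d_i = floor(h_i / i), kept while h_i + d_i <= T."""
    h, d = [], []
    hi, i = 1, 1
    while True:
        di = max(1, hi // i)
        if hi + di > T:
            break
        h.append(hi)
        d.append(di)
        hi = max(hi + 1, (11 * hi + 9) // 10)   # ceil(1.1 * h_i) in exact integer arithmetic
        i += 1
    return h, d
```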

4.1 Normal model

Let

m_{j\Delta}=\#\{n:|\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta\}

be the number of sequences with a change of mean of at least \Delta at the jth change-point. Let

\Omega_{0} = \{\boldsymbol{\mu}:J=0\},
\Omega_{1}(\Delta,V,h) = \{\boldsymbol{\mu}:\mbox{ there exists }j\mbox{ such that }\min(\tau_{j}-\tau_{j-1},\tau_{j+1}-\tau_{j})\geq h\mbox{ and }m_{j\Delta}\geq V\},

with the convention \tau_{0}=0 and \tau_{J+1}=T. We consider here the test of H_{0}: \boldsymbol{\mu}\in\Omega_{0} versus H_{1}: \boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h). Define

\rho_{Z}(\beta,\zeta)=\left\{\begin{array}{ll}\beta-\tfrac{1-\zeta}{2}&\mbox{ if }\tfrac{1-\zeta}{2}<\beta\leq\tfrac{3(1-\zeta)}{4},\cr(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}&\mbox{ if }\tfrac{3(1-\zeta)}{4}<\beta<1-\zeta.\end{array}\right. (15)

These constants are extensions of \rho_{Z}(\beta) in (2) to capture the effect of multiple testing in change-point detection.

Theorem 2.

Assume (10) and let 0<\epsilon<1. For normal observations, no algorithm is able to achieve, as N\rightarrow\infty,

\sup_{\boldsymbol{\mu}\in\Omega_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error})\rightarrow 0, (16)

under any of the following conditions.

(a) Consider \Delta>0 fixed.

i. When V=o(\tfrac{\log T}{\log N}) and h=4(1-\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1-\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

(b) Consider \Delta=T^{-\eta} for some 0<\eta<\tfrac{1}{2}.

i. When V=o(\tfrac{\log T}{\log N}) and h=4(1-2\eta)(1-\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1-\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

Theorem 3.

Assume (10) and let \epsilon>0. For normal observations the sparse likelihood algorithm, with parameters satisfying (11)–(14), achieves (16) under any of the following conditions.

(a) Consider \Delta>0 fixed.

i. When V=o(\tfrac{\log T}{\log N}) and h=4(1+\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1+\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

(b) Consider \Delta=T^{-\eta} for some 0<\eta<\tfrac{1}{2}.

i. When V=o(\tfrac{N^{\zeta}}{\log N}) and h=4(1-2\eta)(1+\epsilon)(\tfrac{\log T}{\Delta^{2}V}).

ii. When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=4(1+\epsilon)\rho_{Z}(\beta,\zeta)(\tfrac{\log N}{\Delta^{2}}).

4.2 Poisson model

We show here the optimality of the sparse likelihood detection algorithm for

X^{n}_{t}\sim_{\rm indep.}\mbox{Poisson}(\mu^{n}_{t}). (17)

For optimal detection on a single Poisson sequence, see Rivera and Walther [28].

Let Y_{st}^{n}=\sum_{v=s+1}^{t}X_{v}^{n}. Consider s<t<u. Under the null hypothesis of no change-points in the interval (s,u), conditioned on Y_{su}^{n}=y^{n}_{su}, Y^{n}_{st} is binomial distributed with y^{n}_{su} trials and success probability \tfrac{t-s}{u-s}. Let p^{n}_{stu} be the two-sided p-value of this conditional binomial test, with continuity adjustments so that it is distributed as Uniform(0,1) under the null hypothesis. More specifically, when Y_{st}^{n}=y_{st}^{n} and Y_{su}^{n}=y_{su}^{n}, simulate

\psi_{stu}^{n}\sim\mbox{Uniform}(P(Y<y_{st}^{n}),P(Y\leq y_{st}^{n})), (18)

where P is probability with respect to Y\sim\mbox{Binomial}(y_{su}^{n},\tfrac{t-s}{u-s}), and define p_{stu}^{n}=2\min(\psi_{stu}^{n},1-\psi_{stu}^{n}).
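A sketch (ours) of the randomized conditional-binomial p-value in (18), using `scipy.stats.binom`:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng()

def poisson_pvalue(y_st, y_su, s, t, u):
    """Randomized two-sided p-value of the conditional binomial test for a rate change at t."""
    prob = (t - s) / (u - s)
    lo = binom.cdf(y_st - 1, y_su, prob)    # P(Y < y_st)
    hi = binom.cdf(y_st, y_su, prob)        # P(Y <= y_st)
    psi = rng.uniform(lo, hi)               # continuity adjustment as in (18)
    return 2 * min(psi, 1 - psi)
```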

Let

m_{j\Delta}=\#\{n:|\log(\mu_{\tau_{j}+1}^{n}/\mu_{\tau_{j}}^{n})|\geq\Delta\},

and for a given \mu_{0}>0, let

\Lambda = \{\boldsymbol{\mu}:\mu_{t}^{n}\geq\mu_{0}\mbox{ for all }n\mbox{ and }t\},
\Lambda_{0} = \{\boldsymbol{\mu}\in\Lambda:J=0\},
\Lambda_{1}(\Delta,V,h) = \{\boldsymbol{\mu}\in\Lambda:\mbox{ there exists }j\mbox{ such that }\min(\tau_{j+1}-\tau_{j},\tau_{j}-\tau_{j-1})\geq h\mbox{ and }m_{j\Delta}\geq V\}.

We consider here the test of H_{0}: \boldsymbol{\mu}\in\Lambda_{0} versus H_{1}: \boldsymbol{\mu}\in\Lambda_{1}(\Delta,V,h).

For a given r>1, let

I_{r}=r\log(\tfrac{2r}{r+1})+\log(\tfrac{2}{r+1}). (19)

Let g_{r}(\omega)=(\tfrac{1+r^{\omega}}{2})^{\frac{1}{\omega}} and let

\rho_{r}(\beta,\zeta)=\max_{\frac{1-\zeta}{\beta}<\omega\leq 2}\Big{(}\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g_{r}(\omega)-1-r}\Big{)}\mbox{ for }\tfrac{1-\zeta}{2}<\beta<1-\zeta. (20)

In Theorem 4 we show that (20) gives the asymptotic constant in the detection boundary for Poisson random variables. In Theorem 5 we show that the sparse likelihood algorithm achieves detection at this boundary for a broad range of sparsity.
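As a numerical illustration (ours), the constants (19) and (20) can be evaluated as follows; the grid over \omega is an arbitrary discretization:

```python
import numpy as np

def I_r(r):
    """Constant I_r of (19)."""
    return r * np.log(2 * r / (r + 1)) + np.log(2 / (r + 1))

def rho_r(beta, zeta, r, grid=10000):
    """Constant rho_r(beta, zeta) of (20), maximized over a grid of omega in ((1-zeta)/beta, 2].
    Requires (1-zeta)/2 < beta < 1-zeta, so that the range of omega is non-empty."""
    lo = (1 - zeta) / beta
    omega = np.linspace(lo, 2.0, grid + 1)[1:]   # exclude the open lower endpoint
    g = ((1 + r ** omega) / 2) ** (1 / omega)
    return float(np.max((beta - (1 - zeta) / omega) / (2 * g - 1 - r)))
```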

Theorem 4.

Assume (10). Let r=e^{\Delta} for some \Delta>0 and 0<\epsilon<1. For Poisson observations no algorithm is able to achieve, as N\rightarrow\infty,

\sup_{\boldsymbol{\mu}\in\Lambda_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Lambda_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error})\rightarrow 0 (21)

under either of the following conditions.

(a) When V=o(\tfrac{\log T}{\log N}) and h=(1-\epsilon)I_{r}^{-1}(\tfrac{\log T}{\mu_{0}V}).

(b) When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=(1-\epsilon)\rho_{r}(\beta,\zeta)(\tfrac{\log N}{\mu_{0}}).


Theorem 5.

Assume (10). Let \epsilon>0, \Delta>0 and 1<r<e^{\Delta}. For Poisson observations the sparse likelihood algorithm, with parameters satisfying (11)–(14), achieves (21) under either of the following conditions.

(a) When V=o(\tfrac{\log T}{\log N}) and h=(1+\epsilon)I_{r}^{-1}(\tfrac{\log T}{\mu_{0}V}).

(b) When V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta and h=(1+\epsilon)\rho_{r}(\beta,\zeta)(\tfrac{\log N}{\mu_{0}}).

5 Simulation studies

We follow here the simulation set-up in Sections 5.1 and 5.3 of Wang and Samworth [30]. Assume that the random variables are normal with variances that are unknown but equal within each sequence. These variances are estimated using the median absolute differences of adjacent observations, and after normalization the random variables are treated like unit-variance normal.

In the first study there is exactly one change-point \tau_{1}. Consider \mu_{t}^{n}=0 for t\leq\tau_{1} and all n. For t>\tau_{1}, let

\mu_{t}^{n}=\left\{\begin{array}{ll}0.8\Big{/}\sqrt{n\textstyle\sum_{m=1}^{V}m^{-1}}&\mbox{ if }n\leq V,\cr 0&\mbox{ if }n>V.\end{array}\right.

The objective is to estimate \tau_{1} assuming we know there is exactly one change-point. We estimate \tau_{1} here by

\widehat{\tau}_{1}={\rm argmax}_{0<t<T}\,\ell^{\rm pen}_{N}({\bf p}_{0tT}),

where \ell^{\rm pen}_{N} is the penalized sparse likelihood score with \lambda_{1}=1 and \lambda_{2}=\sqrt{\tfrac{\log T}{\log\log T}}.
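In terms of the earlier sketches (hypothetical helper names, not the authors' code), this estimator is simply:

```python
import numpy as np

def estimate_single_changepoint(X, lam1=1.0, lam2=None):
    """argmax over 0 < t < T of the penalized sparse likelihood score on the full interval."""
    N, T = X.shape
    if lam2 is None:
        lam2 = np.sqrt(np.log(T) / np.log(np.log(T)))
    def pen_score(t):
        pen = np.log(T / 4.0 * (1.0 / t + 1.0 / (T - t)))
        return score_stu(X, 0, t, T, lam1, lam2) - pen
    return max(range(1, T), key=pen_score)
```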

We simulate the probabilities that |\widehat{\tau}_{1}-\tau_{1}|\leq k for k=3 and 10, and compare against the INSPECT algorithm and the scan algorithm of Enikeeva and Harchaoui [12]. These two algorithms have the best numerical performances in Wang and Samworth [30]. The comparisons in Table 1 show that the sparse likelihood algorithm performs well.

Table 1: The fraction of simulation runs (out of 1000) for which \widehat{\tau}_{1} is within distance k of \tau_{1}, for k=3 and 10. The same datasets are used to compare sparse likelihood (SL), INSPECT and the scan test, with \tau_{1}=200 for T=500 and \tau_{1}=800 for T=2000.
                              SL              INSPECT            scan
    T       N        V     k=3    k=10     k=3    k=10     k=3    k=10
   500     500       3    0.511  0.801   0.478  0.785   0.520  0.804
   500     500       5    0.466  0.740   0.427  0.718   0.463  0.722
   500     500      10    0.393  0.645   0.370  0.637   0.362  0.599
   500     500      22    0.319  0.553   0.282  0.547   0.256  0.465
   500     500      50    0.244  0.462   0.211  0.453   0.197  0.378
   500     500     500    0.177  0.339   0.148  0.335   0.112  0.240
   500    2000       3    0.481  0.748   0.410  0.667   0.480  0.730
   500    2000       5    0.423  0.673   0.344  0.584   0.394  0.633
   500    2000      10    0.320  0.546   0.246  0.480   0.261  0.456
   500    2000      20    0.237  0.431   0.198  0.403   0.188  0.332
   500    2000      45    0.186  0.344   0.136  0.311   0.130  0.242
   500    2000     200    0.114  0.227   0.095  0.235   0.074  0.153
   500    2000    2000    0.068  0.160   0.078  0.189   0.042  0.096
  2000     500       3    0.603  0.859   0.587  0.855   0.589  0.854
  2000     500       5    0.604  0.865   0.595  0.855   0.558  0.832
  2000     500      10    0.565  0.827   0.569  0.833   0.487  0.764
  2000     500      22    0.522  0.789   0.522  0.795   0.438  0.714
  2000     500      50    0.472  0.748   0.468  0.745   0.384  0.652
  2000     500     500    0.378  0.643   0.336  0.609   0.273  0.524
  2000    2000       3    0.607  0.866   0.608  0.861   0.599  0.858
  2000    2000       5    0.594  0.864   0.586  0.857   0.557  0.829
  2000    2000      10    0.553  0.847   0.558  0.847   0.476  0.780
  2000    2000      20    0.494  0.807   0.498  0.789   0.435  0.726
  2000    2000      45    0.447  0.747   0.451  0.746   0.377  0.657
  2000    2000     200    0.362  0.649   0.342  0.604   0.297  0.554
  2000    2000    2000    0.274  0.532   0.241  0.471   0.225  0.457

In the second study there are three change-points within N=200 sequences of length T=2000, at \tau_{1}=500, \tau_{2}=1000 and \tau_{3}=1500. At each change-point exactly 40 sequences undergo mean changes. Six scenarios are considered, corresponding to

\mu_{\tau_{j}+1}^{k(j-1)+n}-\mu_{\tau_{j}}^{k(j-1)+n}=r\Big{/}\sqrt{n\textstyle\sum_{m=1}^{40}m^{-1}},

for 1\leq j\leq 3 and 1\leq n\leq 40, with r=0.4,0.6 and k=0,20,40. For k=0, the mean changes are within the same 40 sequences at all three change-points, whereas for k=40 the mean changes at the three change-points are on distinct sequences. For k=20, there is partial overlap of the sequences having mean changes at adjacent change-points. The number of estimated change-points over 100 simulated datasets in each scenario is recorded, as well as the adjusted Rand index (ARI), see Rand [27] and Hubert and Arabie [16], to measure the quality of the change-point estimation.

In the application of the sparse likelihood algorithm, we select h_{1}=1 and h_{i+1}=\lceil 1.1h_{i}\rceil for i\geq 1, and d_{i}=\lfloor h_{i}/i\rfloor, for a total of i_{T}=61 window lengths. We select critical value c_{T}=5 and parameters \lambda_{1}=1, \lambda_{2}=\sqrt{\tfrac{\log T}{\log\log T}}\doteq 1.94.

Wang and Samworth [30] showed that INSPECT achieves an average ARI of 0.90 when r=0.6 and either 0.73 (for k=20) or 0.74 (for k=0 and 40) when r=0.4, comparable to sparse likelihood, see Table 2.

In addition to INSPECT, Wang and Samworth [30] considered DC, SBS and scan, as well as the CUSUM aggregation algorithms of Jirak [19] and Horváth and Hušková [15], with average ARI in the range 0.77–0.87 when r=0.6 and 0.68–0.72 when r=0.4.

Table 2: Number of change-points estimated by the sparse likelihood algorithm and the average ARI over 100 simulated datasets.
                  # change-points
   r      k       2     3     4     5      ARI
  0.6     0      11    80     8     1     0.91
  0.4     0      61    35     4     0     0.74
  0.6    20      12    80     8     0     0.91
  0.4    20      66    31     2     1     0.74
  0.6    40      10    78    12     0     0.91
  0.4    40      68    26     6     0     0.75

Appendix A Proof of Theorem 1

Since c_{N}\rightarrow\infty, by Markov’s inequality P_{0}(\ell_{N}({\bf p})\geq c_{N})\leq e^{-c_{N}}\rightarrow 0. The proof of P_{\mu_{N}}(\ell_{N}({\bf p})<c_{N})\rightarrow 0 applies Lemmas 1 and 2 below. Lemma 1 says that the sum of sparse likelihood scores under q^{n}\sim{\rm Uniform}(0,1) is bounded below by a value close to zero, with large probability. Lemma 2 provides a lower bound on the increase in score when the p-value is divided by at least 2. Their proofs are at the end of Appendix A.

Lemma 1.

Let {\bf q}=(q^{1},\ldots,q^{N}), with q^{n}\sim_{\rm i.i.d.}{\rm Uniform}(0,1). For fixed \lambda_{1}\geq 0 and \delta>0,

\sup_{\delta\leq\lambda_{2}\leq\sqrt{N}}P(\ell_{N}({\bf q})\leq-\lambda_{2}^{2}\sqrt{\log N})\rightarrow 0.

Lemma 2.

For \lambda_{1}>0 fixed, \delta\leq\lambda_{2}\leq\sqrt{N} for some \delta>0, and \xi_{N}=o(N^{-\eta}) for some \eta>0 such that \xi_{N}\geq\tfrac{\lambda_{2}^{2}}{2N},

\inf_{(p,q):\,p\leq\xi_{N},\,q\geq\lambda_{2}^{2}/N,\,p\leq q/2}[\ell_{N}(p)-\ell_{N}(q)]\geq\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}

for large N.

Proof of Theorem 1. Let \tfrac{1}{2}<\beta<1, \lambda_{1}\geq 0 and \lambda_{2}>0 be fixed. Let \nu be such that

(1-\sqrt{1-\beta})^{2}<\nu<1 \mbox{ if }\tfrac{3}{4}\leq\beta<1,
\beta-\tfrac{1}{2}<\nu<4(\beta-\tfrac{1}{2}) \mbox{ if }\tfrac{1}{2}<\beta<\tfrac{3}{4},

and let

\mu_{N} = \sqrt{2\nu\log N},
Q^{n} \sim \mbox{Bernoulli}(N^{-\beta}),
Z^{n}|Q^{n} \sim {\rm N}(\mu_{N}Q^{n},1),
p^{n} = \Phi(-Z^{n}),
q^{n} = \Phi(-Z^{n}+\mu_{N}Q^{n}).

The additional assumptions of \nu<1 for \tfrac{3}{4}\leq\beta<1 and \nu<4(\beta-\tfrac{1}{2}) for \tfrac{1}{2}<\beta<\tfrac{3}{4} are not restrictive because \ell_{N}({\bf p}) increases stochastically with \mu_{N}.

Case 1: \tfrac{3}{4}\leq\beta<1. Let

\Gamma=\{n:Q^{n}=1,Z^{n}\geq\sqrt{2\log N},q^{n}\geq\tfrac{\lambda^{2}_{2}}{N}\}.

For N large, p^{n}\leq\tfrac{q^{n}}{2} for n\in\Gamma. Moreover \ell_{N}(p^{n})\geq\ell_{N}(q^{n}) for all n. Hence by Lemma 1 and Lemma 2 with \xi_{N}=N^{-1}, with probability tending to 1,

\ell_{N}({\bf p}) \geq \ell_{N}({\bf q})+\sum_{n\in\Gamma}[\ell_{N}(p^{n})-\ell_{N}(q^{n})] \geq -\lambda_{2}^{2}\sqrt{\log N}+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{\log N}}.

Since \#\Gamma is binomial with mean

E_{\mu_{N}}(\#\Gamma) = N^{1-\beta}[\Phi(-\sqrt{2\log N}+\sqrt{2\nu\log N})-\tfrac{\lambda_{2}^{2}}{N}] \gtrsim \tfrac{N^{1-\beta-(1-\sqrt{\nu})^{2}}}{\sqrt{\log N}},

with 1-\beta-(1-\sqrt{\nu})^{2}>0 for (1-\sqrt{1-\beta})^{2}<\nu<1, and since c_{N} is subpolynomial in N, we conclude P_{\mu_{N}}(\ell_{N}({\bf p})\geq c_{N})\rightarrow 1.

Case 2: \tfrac{1}{2}<\beta<\tfrac{3}{4}. Let

\Gamma=\{n:Q^{n}=1,Z^{n}\geq 2\sqrt{(2\beta-1)\log N},q^{n}\geq\tfrac{\lambda_{2}^{2}}{N}\}.

For N large, p^{n}\leq\tfrac{q^{n}}{2} for n\in\Gamma. Hence by Lemma 1 and Lemma 2 with \xi_{N}=N^{2-4\beta}, with probability tending to 1,

\ell_{N}({\bf p}) \geq \ell_{N}({\bf q})+\sum_{n\in\Gamma}[\ell_{N}(p^{n})-\ell_{N}(q^{n})] \geq -\lambda_{2}^{2}\sqrt{\log N}+(\#\Gamma)\tfrac{\lambda_{2}}{4N^{\frac{3}{2}-2\beta}\sqrt{\log N}}.

Since \#\Gamma is binomial with mean

E_{\mu_{N}}(\#\Gamma) = N^{1-\beta}[\Phi(-2\sqrt{(2\beta-1)\log N}+\sqrt{2\nu\log N})-\tfrac{\lambda_{2}^{2}}{N}] \gtrsim \tfrac{N^{1-\beta-(\sqrt{4\beta-2}-\sqrt{\nu})^{2}}}{\sqrt{\log N}},

and

1-\beta-(\sqrt{4\beta-2}-\sqrt{\nu})^{2}>\tfrac{3}{2}-2\beta\mbox{ for }\beta-\tfrac{1}{2}<\nu<4(\beta-\tfrac{1}{2}),

we conclude P_{\mu_{N}}(\ell_{N}({\bf p})\geq c_{N})\rightarrow 1. □

Proof of Lemma 1. Let

x_{N}(p)=\tfrac{\lambda_{1}\log N}{N}f_{1}(p)+\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(p),

where f_{1}(p)=\tfrac{1}{p(2-\log p)^{2}}-\tfrac{1}{2}, f_{2}(p)=\tfrac{1}{\sqrt{p}}-2, \lambda_{1}\geq 0 and \delta\leq\lambda_{2}\leq N^{\frac{1}{2}} for some \delta>0. Let r_{N}=\tfrac{1}{N\log N}. Since x_{N}(r_{N})\geq 0 and x_{N}(1)\geq-\tfrac{1}{2} for N large, and \log(1+x)\geq x-x^{2} for x\geq-\tfrac{1}{2},

\ell_{N}({\bf q})=\sum_{n=1}^{N}\log(1+x_{N}(q^{n}))\geq\sum_{n=1}^{N}h_{N}(q^{n})-\sum_{n=1}^{N}h_{N}^{2}(q^{n}), (24)

where h_{N}(q)=x_{N}(q){\bf 1}_{\{q\geq r_{N}\}}.

By Chebyshev’s inequality and the bounds displayed below,

P(\ell_{N}({\bf q})\leq-\lambda_{2}^{2}\sqrt{\log N}) \leq P\Big{(}\sum_{n=1}^{N}h_{N}(q^{n})\leq-\tfrac{\lambda_{2}^{2}\sqrt{\log N}}{2}\Big{)}+P\Big{(}\sum_{n=1}^{N}h_{N}^{2}(q^{n})\geq\tfrac{\lambda_{2}^{2}\sqrt{\log N}}{2}\Big{)} \leq \tfrac{N{\rm Var}(h_{N}(q^{n}))}{(NEh_{N}(q^{n})+\frac{\lambda_{2}^{2}\sqrt{\log N}}{2})^{2}}+\tfrac{N{\rm Var}(h_{N}^{2}(q^{n}))}{(\frac{\lambda_{2}^{2}\sqrt{\log N}}{2}-NEh_{N}^{2}(q^{n}))^{2}}\rightarrow 0.

Since Ex_{N}(q^{n})=0,

Eh_{N}(q^{n}) = -E[x_{N}(q^{n}){\bf 1}_{\{q^{n}<r_{N}\}}] = -\tfrac{\lambda_{1}\log N}{N}(\tfrac{1}{2-\log r_{N}}-\tfrac{r_{N}}{2})-\tfrac{\lambda_{2}}{\sqrt{N\log N}}(2\sqrt{r_{N}}-2r_{N}) \geq -\tfrac{\lambda_{1}}{N}-\tfrac{2\lambda_{2}}{N\log N}.

Let s_{N}=\tfrac{(\log N)^{2}}{N}. Then

{\rm Var}(h_{N}(q^{n})) \leq Eh_{N}^{2}(q^{n}) \leq \tfrac{2\lambda_{1}^{2}(\log N)^{2}}{N^{2}}\int_{r_{N}}^{1}\tfrac{dq}{q^{2}(2-\log q)^{4}}+\tfrac{2\lambda_{2}^{2}}{N\log N}\int_{r_{N}}^{1}\tfrac{dq}{q} \leq \tfrac{2\lambda_{1}^{2}(\log N)^{2}}{N^{2}}\Big{(}\int_{s_{N}}^{1}\tfrac{dq}{q^{2}}+\tfrac{1}{(2-\log s_{N})^{4}}\int_{r_{N}}^{s_{N}}\tfrac{dq}{q^{2}}\Big{)}+\tfrac{2\lambda_{2}^{2}\log(\frac{1}{r_{N}})}{N\log N} \lesssim \tfrac{\lambda_{1}^{2}+\lambda_{2}^{2}}{N}.

{\rm Var}(h_{N}^{2}(q^{n})) \leq Eh_{N}^{4}(q^{n}) \leq \tfrac{8\lambda_{1}^{4}(\log N)^{4}}{N^{4}}\Big{(}\int_{s_{N}}^{1}\tfrac{dq}{q^{4}}+\tfrac{1}{(2-\log s_{N})^{8}}\int_{r_{N}}^{s_{N}}\tfrac{dq}{q^{4}}\Big{)}+\tfrac{8\lambda_{2}^{4}}{(N\log N)^{4}}\int_{r_{N}}^{1}\tfrac{dq}{q^{2}} \lesssim \tfrac{\lambda_{1}^{4}+\lambda_{2}^{4}}{N}.

□

Proof of Lemma 2. For \tfrac{\lambda_{2}^{2}}{2N}\leq r\leq 2\xi_{N}, |\log r|\asymp\log N and therefore

\tfrac{\frac{\lambda_{1}\log N}{N}f_{1}(r)}{\frac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(r)}\asymp\tfrac{1}{\lambda_{2}\sqrt{Nr\log N}}\rightarrow 0.

Moreover,

\tfrac{\lambda_{2}}{\sqrt{N\log N}}f_{2}(r)\sim\tfrac{\lambda_{2}}{\sqrt{Nr\log N}}\rightarrow 0.

Hence by \log(1+x)\sim x as x\rightarrow 0,

\ell_{N}(r)\sim\tfrac{\lambda_{2}}{\sqrt{Nr\log N}}. (28)

Case 1: \tfrac{\lambda_{2}^{2}}{2N}\leq p\leq\xi_{N}. By (28) and q\geq 2p,

\ell_{N}(p)-\ell_{N}(q) \geq \ell_{N}(p)-\ell_{N}(2p) \sim (1-\tfrac{1}{\sqrt{2}})\tfrac{\lambda_{2}}{\sqrt{Np\log N}} > \tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}.

Case 2: p<\tfrac{\lambda_{2}^{2}}{2N}. By (28), q\geq\tfrac{\lambda_{2}^{2}}{N} and \xi_{N}\geq\tfrac{\lambda_{2}^{2}}{2N},

\ell_{N}(p)-\ell_{N}(q) \geq \ell_{N}(\tfrac{\lambda_{2}^{2}}{2N})-\ell_{N}(\tfrac{\lambda_{2}^{2}}{N}) \sim (1-\tfrac{1}{\sqrt{2}})\tfrac{\lambda_{2}}{\sqrt{N(\frac{\lambda_{2}^{2}}{2N})\log N}} > \tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}.

Appendix B Proof of Theorem 2

Proof of Theorem 2(a)i. Let h=\lfloor\tfrac{4(1-\epsilon)\log T}{\Delta^{2}V}\rfloor for some 0<\epsilon<1. Let P_{0} denote probability with respect to \mu^{n}_{t}=0 for all n and t. Let t_{k}=(2k-1)h and let P_{k}, 1\leq k\leq K:=\lfloor\tfrac{T}{2h}\rfloor, denote probability under which, for n\leq V,

\mu^{n}_{t_{k}-h+1} = \cdots=\mu^{n}_{t_{k}}=-\tfrac{\Delta}{2}, (29)
\mu^{n}_{t_{k}+1} = \cdots=\mu^{n}_{t_{k}+h}=\tfrac{\Delta}{2},
\mu^{n}_{t} = 0\mbox{ for }t\leq t_{k}-h\mbox{ and }t>t_{k}+h,

and \mu^{n}_{1}=\cdots=\mu^{n}_{T}=0 for n>V. Let E_{k} denote expectation with respect to P_{k}.

Let P_{*}=\tfrac{1}{K}\sum_{k=1}^{K}P_{k} and let L=\tfrac{1}{K}\sum_{k=1}^{K}L_{k}, where L_{k}=\tfrac{dP_{k}}{dP_{0}}({\bf X}) with {\bf X}=(X^{n}_{t}:1\leq n\leq N,1\leq t\leq T). Hence

\log L_{1}=\tfrac{h\Delta}{2}\sum_{n=1}^{V}(\bar{X}^{n}_{h,2h}-\bar{X}^{n}_{0h})-\tfrac{hV\Delta^{2}}{4}. (30)

Let A_{i}=\{L\leq 3\}\cap\{\mbox{conclude }H_{i}\}. Since P_{*}(A_{1})=E_{0}(L{\bf 1}_{A_{1}})\leq 3P_{0}(A_{1}),

\sup_{\boldsymbol{\mu}\in\Omega_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error}) \geq P_{0}(\mbox{conclude }H_{1})+P_{*}(\mbox{conclude }H_{0}) \geq P_{0}(A_{1})+P_{*}(A_{0})\geq\tfrac{1}{3}P_{*}(L\leq 3)=\tfrac{1}{3}P_{1}(L\leq 3),

with the last equality due to L having the same distribution under all P_{k} and P_{*}.

Since E_{1}L_{k}=1 for k\geq 2, it follows that P_{1}(\tfrac{1}{K}\sum_{k=2}^{K}L_{k}\leq 2)\geq\tfrac{1}{2}. Hence by (B), to show that \sup_{\boldsymbol{\mu}\in\Omega_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})+\sup_{\boldsymbol{\mu}\in\Omega_{1}(\Delta,V,h)}P_{\boldsymbol{\mu}}(\mbox{Type II error})\rightarrow 0 is not possible, it suffices to show that

P_{1}(L_{1}\leq K)\rightarrow 1\mbox{ as }T\rightarrow\infty. (32)

By (30), \log L_{1}\sim{\rm N}(\tfrac{hV\Delta^{2}}{4},\tfrac{hV\Delta^{2}}{2}) under P_{1}, and indeed

P_{1}(L_{1}\leq K)=\Phi\Big{(}\tfrac{\log K-\frac{1}{4}hV\Delta^{2}}{\sqrt{\frac{1}{2}hV\Delta^{2}}}\Big{)}\rightarrow 1. (33)

□

Proof of Theorem 2(a)ii. Proceed as in the proof of Theorem 2(a)i., but with h=4(1ϵ)ρZ(β,ζ)logNΔ2h=\lfloor\tfrac{4(1-\epsilon)\rho_{Z}(\beta,\zeta)\log N}{\Delta^{2}}\rfloor, and PkP_{k} probability under which, independently for 1nN1\leq n\leq N, Qn=1Q^{n}=1 with probability 2Nβ2N^{-\beta} and Qn=0Q^{n}=0 otherwise. When Qn=1Q^{n}=1, (29) holds. When Qn=0Q^{n}=0, μ1n==μTn=0\mu^{n}_{1}=\cdots=\mu^{n}_{T}=0.

By the law of large numbers, P1(𝝁Ω1(h,Δ,V))=P1(n=1NQnV)1P_{1}(\boldsymbol{\mu}\in\Omega_{1}(h,\Delta,V))=P_{1}(\sum_{n=1}^{N}Q^{n}\geq V)\rightarrow 1. Hence by (B) it suffices to show (32) with

L1\displaystyle L_{1} =\displaystyle= n=1N[1+2Nβ(eZnΔh2hΔ241)],\displaystyle\prod_{n=1}^{N}[1+2N^{-\beta}(e^{Z^{n}\Delta\sqrt{\frac{h}{2}}-\frac{h\Delta^{2}}{4}}-1)], (34)
Zn\displaystyle Z^{n} =\displaystyle= h2(X¯h,2hnX¯0hn)N(QnΔh2,1).\displaystyle\sqrt{\tfrac{h}{2}}(\bar{X}^{n}_{h,2h}-\bar{X}^{n}_{0h})\sim{\rm N}\Big{(}Q^{n}\Delta\sqrt{\tfrac{h}{2}},1\Big{)}. (35)

Case 1: 1ζ2<β<3(1ζ)4\tfrac{1-\zeta}{2}<\beta<\tfrac{3(1-\zeta)}{4}. Recall that ρZ(β,ζ)=β1ζ2\rho_{Z}(\beta,\zeta)=\beta-\tfrac{1-\zeta}{2}. By (34) and (35),

E1L1\displaystyle E_{1}L_{1} =\displaystyle= (1+4N2β[exp(hΔ22)1])N\displaystyle(1+4N^{-2\beta}[\exp(\tfrac{h\Delta^{2}}{2})-1])^{N}
\displaystyle\leq exp(4N12β+2(1ϵ)ρZ(β,ζ))\displaystyle\exp(4N^{1-2\beta+2(1-\epsilon)\rho_{Z}(\beta,\zeta)})
=\displaystyle= exp(4Nζ2ϵρZ(β,ζ)).\displaystyle\exp(4N^{\zeta-2\epsilon\rho_{Z}(\beta,\zeta)}).

Since logK=log(T2h)Nζ\log K=\log(\lfloor\tfrac{T}{2h}\rfloor)\sim N^{\zeta}, it follows that P1(L1K)1K1E1L11P_{1}(L_{1}\leq K)\geq 1-K^{-1}E_{1}L_{1}\rightarrow 1 and (32) holds.

Case 2: 3(1ζ)4β<1ζ\tfrac{3(1-\zeta)}{4}\leq\beta<1-\zeta. Recall that ρZ(β,ζ)=(1ζ1ζβ)2\rho_{Z}(\beta,\zeta)=(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}. Express logL1=i=03Ri\log L_{1}=\sum_{i=0}^{3}R_{i}, where

Ri\displaystyle R_{i} =\displaystyle= nΓilog(1+2Nβ[exp(ZnΔh2Δ2h4)1]),\displaystyle\sum_{n\in\Gamma_{i}}\log\Big{(}1+2N^{-\beta}\Big{[}\exp\Big{(}Z^{n}\Delta\sqrt{\tfrac{h}{2}}-\tfrac{\Delta^{2}h}{4}\Big{)}-1\Big{]}\Big{)},
Γ0\displaystyle\Gamma_{0} =\displaystyle= {n:Qn=0},\displaystyle\{n:Q^{n}=0\},
Γ1\displaystyle\Gamma_{1} =\displaystyle= {n:Qn=1,Zn2(1ζ)logN},\displaystyle\{n:Q^{n}=1,Z^{n}\leq\sqrt{2(1-\zeta)\log N}\},
Γ2\displaystyle\Gamma_{2} =\displaystyle= {n:Qn=1,2(1ζ)logN<Zn22logN},\displaystyle\{n:Q^{n}=1,\sqrt{2(1-\zeta)\log N}<Z^{n}\leq 2\sqrt{2\log N}\},
Γ3\displaystyle\Gamma_{3} =\displaystyle= {n:Qn=1,Zn>22logN}.\displaystyle\{n:Q^{n}=1,Z^{n}>2\sqrt{2\log N}\}.

We show (32) by showing that

P1(Ri14logK)0 for 0i3.P_{1}(R_{i}\geq\tfrac{1}{4}\log K)\rightarrow 0\mbox{ for }0\leq i\leq 3. (36)

𝐢=𝟑{\bf i=3}: Since Δh22logN\Delta\sqrt{\tfrac{h}{2}}\leq\sqrt{2\log N},

P1(R3>0)2N1βΦ(2logN)0.P_{1}(R_{3}>0)\leq 2N^{1-\beta}\Phi(-\sqrt{2\log N})\rightarrow 0.

𝐢=𝟐{\bf i=2}: Since

Δh2\displaystyle\Delta\sqrt{\tfrac{h}{2}}
\displaystyle\leq 2(1ζ)logN2(1ζβ)logN2δlogN\displaystyle\sqrt{2(1-\zeta)\log N}-\sqrt{2(1-\zeta-\beta)\log N}-\sqrt{2\delta\log N}

for some δ>0\delta>0, it follows that

Φ(Δh22(1ζ)logN)=o(Nζ+β1δ).\Phi\Big{(}\Delta\sqrt{\tfrac{h}{2}}-\sqrt{2(1-\zeta)\log N}\Big{)}=o(N^{\zeta+\beta-1-\delta}).

Hence

E1R2\displaystyle E_{1}R_{2}
\displaystyle\leq E1(#Γ2)log(1+2N4β)\displaystyle E_{1}(\#\Gamma_{2})\log(1+2N^{4-\beta})
\displaystyle\lesssim (N1βlogN)Φ(Δh22(1ζ)logN)\displaystyle(N^{1-\beta}\log N)\Phi\Big{(}\Delta\sqrt{\tfrac{h}{2}}-\sqrt{2(1-\zeta)\log N}\Big{)}
=\displaystyle= o(NζδlogN),\displaystyle o(N^{\zeta-\delta}\log N),

and (36) follows from logKNζ\log K\sim N^{\zeta}.

𝐢=𝟏{\bf i=1}: Since log(1+x)x\log(1+x)\leq x,

E1R14N12βehΔ2/4\displaystyle E_{1}R_{1}\leq 4N^{1-2\beta}e^{-h\Delta^{2}/4} (38)
×2(1ζ)logN12πe(zΔh2)2/2+zΔh2dz\displaystyle\qquad\times\int_{-\infty}^{\sqrt{2(1-\zeta)\log N}}\tfrac{1}{\sqrt{2\pi}}e^{-(z-\Delta\sqrt{\frac{h}{2}})^{2}/2+z\Delta\sqrt{\frac{h}{2}}}dz
=4N12βΦ(2(1ζ)logN2Δh2)ehΔ2/2\displaystyle\qquad=4N^{1-2\beta}\Phi\Big{(}\sqrt{2(1-\zeta)\log N}-2\Delta\sqrt{\tfrac{h}{2}}\Big{)}e^{h\Delta^{2}/2}
4N12β(1ζ2(1ϵ)ρZ(β,ζ))2+2(1ϵ)ρZ(β,ζ)\displaystyle\qquad\leq 4N^{1-2\beta-(\sqrt{1-\zeta}-2\sqrt{(1-\epsilon)\rho_{Z}(\beta,\zeta)})^{2}+2(1-\epsilon)\rho_{Z}(\beta,\zeta)}
=4Nζδ for some δ>0.\displaystyle\qquad=4N^{\zeta-\delta}\mbox{ for some }\delta>0.

The last step above is shown below. Since

R_{1}\geq(\#\Gamma_{1})\log(1-2N^{-\beta})\stackrel{p}{\sim}-2N^{1-2\beta}=o(N^{\zeta}),

and \log K\sim N^{\zeta}, (36) follows from (38) and Markov's inequality.

{\bf i=0}: Since E_{1}e^{R_{0}}=1,

P_{1}(R_{0}\geq\tfrac{1}{4}\log K)\leq K^{-\frac{1}{4}}\rightarrow 0.

∎

Proof of (38). It suffices to show that

1-2\beta-(\sqrt{1-\zeta}-2\sqrt{(1-\epsilon)\rho_{Z}(\beta,\zeta)})^{2}+2(1-\epsilon)\rho_{Z}(\beta,\zeta)<\zeta. \qquad (39)

Let m(\rho)=-(\sqrt{1-\zeta}-2\sqrt{\rho})^{2}+2\rho. Inequality (39) follows from

m(\rho_{Z}(\beta,\zeta)) = -(\sqrt{1-\zeta}-2\sqrt{\rho_{Z}(\beta,\zeta)})^{2}+2\rho_{Z}(\beta,\zeta)
= -(2\sqrt{1-\zeta-\beta}-\sqrt{1-\zeta})^{2}+2(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}
= 1-\zeta-2(1-\zeta-\beta) = \zeta-1+2\beta,

and

\tfrac{d}{d\rho}m(\rho) = 2\rho^{-\frac{1}{2}}(\sqrt{1-\zeta}-2\sqrt{\rho})+2 = 2\rho^{-\frac{1}{2}}\sqrt{1-\zeta}-2>0 \mbox{ for } \rho<1-\zeta.

∎
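The algebra above is easy to check numerically. The sketch below (an illustrative helper, not part of the argument) evaluates the two-regime exponent \rho_{Z}(\beta,\zeta) recalled in Cases 1 and 2 and confirms that m(\rho_{Z}(\beta,\zeta))=\zeta-1+2\beta in the Case 2 regime; the parameter values are illustrative.

    import numpy as np

    def rho_Z(beta, zeta):
        # Two-regime exponent, valid for (1-zeta)/2 < beta < 1-zeta.
        if beta < 3 * (1 - zeta) / 4:
            return beta - (1 - zeta) / 2
        return (np.sqrt(1 - zeta) - np.sqrt(1 - zeta - beta)) ** 2

    def m(rho, zeta):
        return -(np.sqrt(1 - zeta) - 2 * np.sqrt(rho)) ** 2 + 2 * rho

    zeta, beta = 0.3, 0.6                 # Case 2: 3(1-zeta)/4 <= beta < 1-zeta
    rho = rho_Z(beta, zeta)
    print(np.isclose(m(rho, zeta), zeta - 1 + 2 * beta))   # True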

Proof of Theorem 2(b). Consider \Delta=T^{-\eta}. Proceed as in the proof of Theorem 2(a), with h=\lfloor\tfrac{4(1-2\eta)(1-\epsilon)}{V}T^{2\eta}\log T\rfloor for i. and h=\lfloor 4(1-\epsilon)\rho_{Z}(\beta,\zeta)T^{2\eta}\log N\rfloor for ii. For Theorem 2(b)i.,

\log K=\log(\lfloor\tfrac{T}{2h}\rfloor)\sim(1-2\eta)\log T, \qquad (40)

and (32) holds because

P_{1}(L\leq K)=\Phi\Big(\tfrac{\log K-\frac{1}{4}hV\Delta^{2}}{\sqrt{\frac{1}{2}hV\Delta^{2}}}\Big)\rightarrow 1.

For Theorem 2(b)ii. the arguments in the proof of Theorem 2(a)ii. apply with

\log K=\log(\lfloor\tfrac{T}{2h}\rfloor)\sim(1-2\eta)N^{\zeta}.

∎

Appendix C Proof of Theorem 3

For (s,t,u)\in{\cal A}_{i}(T), the penalty of the SL scores is

\log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\geq\log(\tfrac{T}{2h_{i}}).

Moreover \#{\cal A}_{i}(T)\leq\tfrac{T}{d_{i}}. Hence by (6) and c_{T}-\log(\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}})\rightarrow\infty, for \boldsymbol{\mu}\in\Omega_{0},

P_{\boldsymbol{\mu}}(\mbox{Type I error}) \leq \sum_{i=1}^{i_{T}}\sum_{(s,t,u)\in{\cal A}_{i}(T)}P_{\boldsymbol{\mu}}(\ell_{N}({\bf p}_{stu})\geq c_{T}+\log(\tfrac{T}{2h_{i}}))
\leq \sum_{i=1}^{i_{T}}\tfrac{T}{d_{i}}\exp(-c_{T}-\log(\tfrac{T}{2h_{i}}))
= 2e^{-c_{T}}\sum_{i=1}^{i_{T}}\tfrac{h_{i}}{d_{i}}\rightarrow 0.
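The identity used in the last step of the display above, \sum_{i}(T/d_{i})e^{-c_{T}-\log(T/(2h_{i}))}=2e^{-c_{T}}\sum_{i}h_{i}/d_{i}, is elementary algebra; the short check below uses illustrative (hypothetical) values of T, c_{T}, h_{i} and d_{i}.

    import numpy as np

    T, c_T = 10_000, np.log(np.log(10_000))     # illustrative values only
    h = np.array([50.0, 100.0, 200.0, 400.0])   # window lengths h_i
    d = np.array([5.0, 10.0, 20.0, 40.0])       # grid spacings d_i

    lhs = np.sum((T / d) * np.exp(-c_T - np.log(T / (2 * h))))
    rhs = 2 * np.exp(-c_T) * np.sum(h / d)
    print(np.isclose(lhs, rhs))                 # True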

Consider \boldsymbol{\mu}\in\Omega_{1}(\Delta,h,V) and let \tau_{j} be the change-point satisfying the conditions in the definition of \Omega_{1}(\Delta,h,V). Let Q^{n}=1 if |\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta and Q^{n}=0 otherwise. We assume without loss of generality that 0<\epsilon<1.

To aid in checking the proof of Theorem 3, we first outline the key ideas. Let j be such that

\min(\tau_{j}-\tau_{j-1},\tau_{j+1}-\tau_{j})\geq h \mbox{ and } m_{j\Delta}\geq V.

Consider \Delta>0 fixed and V\sim N^{1-\beta} for some \tfrac{1-\zeta}{2}<\beta<1-\zeta. Since h\rightarrow\infty, it follows from (11) that for N large we are able to find (s,t,u)=(s(ik),t(ik),u(ik)) close to (\tau_{j}-h,\tau_{j},\tau_{j}+h) such that

E_{\boldsymbol{\mu}}Z_{stu}^{n}\geq[1+o(1)]\tfrac{h\Delta^{2}}{2} \mbox{ for } n \mbox{ satisfying } |\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta. \qquad (42)

Recall that p_{stu}^{n}=2\Phi(-|Z_{stu}^{n}|) and let q_{stu}^{n}=\Phi(-|Z_{stu}^{n}|+E_{\boldsymbol{\mu}}Z_{stu}^{n})+\Phi(-|Z_{stu}^{n}|-E_{\boldsymbol{\mu}}Z_{stu}^{n}). Let

\Gamma=\{n:|Z_{stu}^{n}|\geq\sqrt{2\omega\log N},\ q_{stu}^{n}\geq N^{\zeta-1},\ |\mu_{\tau_{j}+1}^{n}-\mu_{\tau_{j}}^{n}|\geq\Delta\}, \qquad (43)

with \omega=1-\zeta when \tfrac{3(1-\zeta)}{4}<\beta<1-\zeta and \omega=4(\beta-\tfrac{1-\zeta}{2}) when \tfrac{1-\zeta}{2}<\beta\leq\tfrac{3(1-\zeta)}{4}. It follows from Lemmas 1 and 2 that with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}
\geq -\lambda_{2}^{2}\sqrt{\log N}+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}

for \xi_{N}=N^{-\omega}.

Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T\sim N^{\zeta}, c_{T}=o(\log T) and \lambda_{2}\sim\tfrac{N^{\zeta/2}}{\sqrt{\zeta\log N}}, to show P_{\boldsymbol{\mu}}(\ell^{\rm pen}_{N}({\bf p}_{stu})\geq c_{T})\rightarrow 1, it suffices to show that there exists \delta>0 such that

E_{\boldsymbol{\mu}}(\#\Gamma)\gtrsim\left\{\begin{array}{ll}N^{\zeta+\delta} & \mbox{ if } \tfrac{3(1-\zeta)}{4}<\beta<1-\zeta,\\ N^{\frac{3}{2}-2\beta-\frac{\zeta}{2}+\delta} & \mbox{ if } \tfrac{1-\zeta}{2}<\beta\leq\tfrac{3(1-\zeta)}{4}.\end{array}\right. \qquad (44)

∎

Proof of Theorem 3(a)i. Consider V=o(\tfrac{\log T}{\log N}). Since h=4(1+\epsilon)(\tfrac{\log T}{\Delta^{2}V})\rightarrow\infty, \tfrac{h_{i+1}}{h_{i}}\rightarrow 1 and d_{i}=o(h_{i}), for large T there exists

h_{i}\geq 4(1+\epsilon)^{\frac{1}{2}}(\tfrac{\log T}{\Delta^{2}V})

such that for all \boldsymbol{\mu}\in\Omega_{1}(h,\Delta,V), there exists k satisfying

\tau_{j-1}<s(ik)<u(ik)<\tau_{j+1} \mbox{ and } |t(ik)-\tau_{j}|\leq\tfrac{d_{i}}{2}. \qquad (45)

Hence when Q^{n}=1,

|E_{\boldsymbol{\mu}}Z^{n}_{stu}|\geq\Delta(1-\tfrac{d_{i}}{2h_{i}})\sqrt{\tfrac{h_{i}}{2}}\geq\sqrt{2(1+\epsilon)^{\frac{1}{3}}V^{-1}\log T}, \qquad (46)

where (s,t,u)=(s(ik),t(ik),u(ik)).

Let \Gamma=\{n:Q^{n}=1,\ |Z^{n}_{stu}|\geq\sqrt{2(1+\epsilon)^{\frac{1}{4}}(\tfrac{\log T}{V})}\}. Let p^{n}_{stu}=2\Phi(-|Z_{stu}^{n}|) and q_{stu}^{n}=\Phi(-|Z_{stu}^{n}|+E_{\boldsymbol{\mu}}Z_{stu}^{n})+\Phi(-|Z_{stu}^{n}|-E_{\boldsymbol{\mu}}Z_{stu}^{n}). Since q^{n}\stackrel{\rm i.i.d.}{\sim}\mbox{Uniform}(0,1) and \ell_{N}(1)\geq-1 for N large, by Lemmas 1 and 2, with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\Big[\ell_{N}\Big(2\Phi\Big(-\sqrt{2(1+\epsilon)^{\frac{1}{4}}\tfrac{\log T}{V}}\Big)\Big)-1\Big]
\geq -\lambda_{2}^{2}\sqrt{\log N}+V[(1+\epsilon)^{\frac{1}{5}}\tfrac{\log T}{V}-\log N]
\geq (1+\epsilon)^{\frac{1}{6}}\log T.

Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T and c_{T}=o(\log T), it follows that P_{\boldsymbol{\mu}}(\ell^{\rm pen}_{N}({\bf p}_{stu})\geq c_{T})\rightarrow 1. ∎

Proof of Theorem 3(a)ii. Case 1: V\sim N^{1-\beta} for \tfrac{3(1-\zeta)}{4}\leq\beta<1-\zeta. Since h\Delta^{2}=4(1+\epsilon)(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}\log N and d_{i}=o(h_{i}), for large N there exists i satisfying h_{i}\geq(1+\epsilon)^{-\frac{1}{2}}h such that whenever Q^{n}=1,

|E_{\boldsymbol{\mu}}Z^{n}| \geq \Delta(1-\tfrac{d_{i}}{2h_{i}})\sqrt{\tfrac{h_{i}}{2}}\geq\sqrt{2\nu\log N}, \qquad \nu=(1+\epsilon)^{\frac{1}{3}}(\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta})^{2}, \qquad (48)

with (s,t,u)=(s(ik),t(ik),u(ik)) for k satisfying (45).

For \Gamma defined in (43),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[\Phi\Big(-\sqrt{2(1-\zeta)\log N}+\sqrt{2\nu\log N}\Big)-N^{\zeta-1}\Big]
\gtrsim N^{1-\beta-(\sqrt{1-\zeta}-\sqrt{\nu})^{2}}(\log N)^{-\frac{1}{2}},

and (44) follows from

\sqrt{1-\zeta}>\sqrt{\nu}>\sqrt{1-\zeta}-\sqrt{1-\zeta-\beta}.

Case 2: V\sim N^{1-\beta} for \tfrac{1-\zeta}{2}<\beta<\tfrac{3(1-\zeta)}{4}. Since h\Delta^{2}=4(1+\epsilon)(\beta-\tfrac{1-\zeta}{2})\log N, for large N there exists h_{i}\geq(1+\epsilon)^{-\frac{1}{2}}h such that whenever Q^{n}=1,

|E_{\boldsymbol{\mu}}Z^{n}_{stu}| \geq \Delta(1-\tfrac{d_{i}}{2h_{i}})\sqrt{\tfrac{h_{i}}{2}}\geq\sqrt{2\nu\log N}, \qquad \nu=(1+\epsilon)^{\frac{1}{3}}(\beta-\tfrac{1-\zeta}{2}), \qquad (49)

with (s,t,u)=(s(ik),t(ik),u(ik)) for k satisfying (45).

For \Gamma defined in (43),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[\Phi\Big(-2\sqrt{(2\beta-1+\zeta)\log N}+\sqrt{2\nu\log N}\Big)-N^{\zeta-1}\Big]
\gtrsim N^{1-\beta-(2\sqrt{\beta-\frac{1-\zeta}{2}}-\sqrt{\nu})^{2}}(\log N)^{-\frac{1}{2}},

and (44) follows from

2\sqrt{\beta-\tfrac{1-\zeta}{2}}>\sqrt{\nu}>\sqrt{\beta-\tfrac{1-\zeta}{2}}.

∎
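As a numerical sanity check on the exponent comparisons that deliver (44) in the two cases above, the sketch below compares the achieved exponent of E_{\boldsymbol{\mu}}(\#\Gamma) with the target exponent in (44); the values of \zeta, \epsilon and \beta are illustrative.

    import numpy as np

    zeta, eps = 0.3, 0.2                            # illustrative values

    def case1_ok(beta):                             # 3(1-zeta)/4 <= beta < 1-zeta
        nu = (1 + eps) ** (1 / 3) * (np.sqrt(1 - zeta) - np.sqrt(1 - zeta - beta)) ** 2
        achieved = 1 - beta - (np.sqrt(1 - zeta) - np.sqrt(nu)) ** 2
        return achieved > zeta                      # target exponent in (44)

    def case2_ok(beta):                             # (1-zeta)/2 < beta < 3(1-zeta)/4
        nu = (1 + eps) ** (1 / 3) * (beta - (1 - zeta) / 2)
        achieved = 1 - beta - (2 * np.sqrt(beta - (1 - zeta) / 2) - np.sqrt(nu)) ** 2
        return achieved > 1.5 - 2 * beta - zeta / 2  # target exponent in (44)

    print(case1_ok(0.6), case2_ok(0.45))            # True True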

Proof of Theorem 3(b). Consider first V=o(\tfrac{\log T}{\log N}). Let h_{i}\geq(1-2\eta)(1+\epsilon)^{\frac{1}{2}}(\tfrac{\log T}{\Delta^{2}V}) be such that for all \boldsymbol{\mu}\in\Omega_{1}(h,\Delta,V), (45) holds for some k. Let

\Gamma=\{n:Q^{n}=1,\ |Z^{n}_{stu}|\geq\sqrt{2(1-2\eta)(1+\epsilon)^{\frac{1}{4}}\tfrac{\log T}{V}}\}

and define p_{stu}^{n} and q_{stu}^{n} as in the proof of Theorem 3(a)i.

By the arguments in (C), with probability tending to 1,

\ell_{N}({\bf p}_{stu})\geq(1-2\eta)(1+\epsilon)^{\frac{1}{6}}\log T.

Since \Delta\sim T^{-\eta}, it follows that h_{i}\gtrsim T^{2\eta}\log N and the penalty \log(\tfrac{T}{4}(\tfrac{1}{u-t}+\tfrac{1}{t-s}))\leq(1-2\eta)\log T for T large. Hence by c_{T}=o(\log T) we conclude P_{\boldsymbol{\mu}}(\ell_{N}^{\rm pen}({\bf p}_{stu})\geq c_{T})\rightarrow 1.

For V\sim N^{1-\beta} with \tfrac{1-\zeta}{2}<\beta<1-\zeta, the same arguments in the proof of Theorem 3(a)ii. apply. ∎

Appendix D Proof of Theorem 4

Proof of Theorem 4(a). Let h=\lfloor\tfrac{(1-\epsilon)\log T}{\mu_{0}VI_{r}}\rfloor for some 0<\epsilon<1. Let P_{0} denote probability with respect to \mu_{t}^{n}=(\tfrac{1+r}{2})\mu_{0} for all n and t. Let t_{k}=(2k-1)h. Let P_{k}, 1\leq k\leq K:=\lfloor\tfrac{T}{2h}\rfloor, denote probability under which for n\leq V,

\mu^{n}_{t}=\left\{\begin{array}{ll}\mu_{0} & \mbox{ for } t_{k}-h<t\leq t_{k},\\ r\mu_{0} & \mbox{ for } t_{k}<t\leq t_{k}+h,\\ (\tfrac{1+r}{2})\mu_{0} & \mbox{ for } t\leq t_{k}-h \mbox{ and } t>t_{k}+h,\end{array}\right. \qquad (50)

and \mu^{n}_{1}=\cdots=\mu^{n}_{T}=(\tfrac{1+r}{2})\mu_{0} for n>V. Let E_{k} and {\rm Var}_{k} denote expectation and variance respectively with respect to P_{k}. Let

U^{n} = S_{0h}^{n}\log(\tfrac{2}{1+r})+S_{h,2h}^{n}\log(\tfrac{2r}{r+1}), \qquad (51)
L_{1} = \tfrac{dP_{1}}{dP_{0}}({\bf X})=\prod_{n=1}^{V}\exp(U^{n}). \qquad (52)

By (B)–(32), it suffices to show that

P_{1}(L_{1}\leq K)\rightarrow 1 \mbox{ as } T\rightarrow\infty. \qquad (53)

Since E_{1}(\log L_{1})=h\mu_{0}VI_{r} and {\rm Var}_{1}(\log L_{1})=h\mu_{0}VC_{r}, where C_{r}=r[\log(\tfrac{2r}{r+1})]^{2}+[\log(\tfrac{2}{r+1})]^{2}, by Chebyshev's inequality,

P_{1}(L_{1}\leq K)\geq 1-\tfrac{hV\mu_{0}C_{r}}{(\log K-hV\mu_{0}I_{r})^{2}}\rightarrow 1,

and (53) holds. ∎
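The mean of \log L_{1} used in the Chebyshev step can be verified by simulation. The sketch below draws the Poisson window counts under P_{1} for the V affected streams in (50) and compares the empirical mean of \log L_{1} with h\mu_{0}VI_{r}; all parameter values are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    mu0, r, h, V, reps = 1.0, 2.0, 50, 20, 20_000    # illustrative values

    S0h = rng.poisson(h * mu0, size=(reps, V))       # counts on (t_1-h, t_1] under P_1
    Sh2h = rng.poisson(r * h * mu0, size=(reps, V))  # counts on (t_1, t_1+h] under P_1
    U = S0h * np.log(2 / (1 + r)) + Sh2h * np.log(2 * r / (r + 1))
    logL1 = U.sum(axis=1)

    I_r = r * np.log(2 * r / (r + 1)) + np.log(2 / (r + 1))
    print(logL1.mean(), h * mu0 * V * I_r)           # the two values nearly agree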

We preface the proof of Theorem 4(b) with Lemma 3, which provides an alternative representation of \rho_{r}(\beta,\zeta). Let

D(\omega) = \tfrac{1}{1+r^{\omega}}\log(\tfrac{2}{1+r^{\omega}})+\tfrac{r^{\omega}}{1+r^{\omega}}\log(\tfrac{2r^{\omega}}{1+r^{\omega}}), \qquad g(\omega) = (\tfrac{1+r^{\omega}}{2})^{\frac{1}{\omega}}. \qquad (54)

Let \xi(\omega)=\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r}. Recall from (20) that

\rho_{r}(\beta,\zeta)=\max_{\frac{1-\zeta}{\beta}<\omega\leq 2}\xi(\omega) \mbox{ for } \tfrac{1-\zeta}{2}<\beta<1-\zeta. \qquad (55)
Lemma 3.

For \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}], \xi achieves its maximum at \omega=2 and

\rho_{r}(\beta,\zeta)=\tfrac{\beta-\frac{1}{2}(1-\zeta)}{2g(2)-1-r}. \qquad (56)

For \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1, \xi achieves its maximum at some \omega<2 and

\rho_{r}(\beta,\zeta)=\tfrac{1-\zeta}{2g(\omega)D(\omega)}. \qquad (57)

Proof. Since

\tfrac{d}{d\omega}\log\xi(\omega) = \tfrac{\omega^{-2}(1-\zeta)}{\beta-\omega^{-1}(1-\zeta)}-\tfrac{2\frac{d}{d\omega}g(\omega)}{2g(\omega)-1-r},
\tfrac{d}{d\omega}g(\omega) = \tfrac{d}{d\omega}\exp[\tfrac{1}{\omega}\log(\tfrac{1+r^{\omega}}{2})] = [\tfrac{r^{\omega}\log r}{\omega(1+r^{\omega})}-\tfrac{1}{\omega^{2}}\log(\tfrac{1+r^{\omega}}{2})]g(\omega) = \tfrac{D(\omega)g(\omega)}{\omega^{2}},

it follows that \tfrac{d}{d\omega}\log\xi(\omega)=0 when

\omega^{-2}(1-\zeta)[2g(\omega)-1-r]=2[\beta-\omega^{-1}(1-\zeta)]\tfrac{D(\omega)g(\omega)}{\omega^{2}}, \qquad (58)

that is when

\tfrac{\beta}{1-\zeta}=\omega^{-1}+\tfrac{2g(\omega)-1-r}{2g(\omega)D(\omega)}. \qquad (59)

For \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}], the solution of \omega to (59) is at least 2 and the maximum in (55) is attained at \omega=2. We conclude (56). For \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1, the solution of \omega to (59) lies in the interval (\tfrac{1-\zeta}{\beta},2). We conclude (57) from (55) and a rearrangement of (58). ∎
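Lemma 3 is easy to check numerically. The sketch below (an illustrative helper) maximizes \xi(\omega) on a grid over ((1-\zeta)/\beta,2] and compares the maximum with the closed forms (56)–(57); the values of r, \zeta and \beta are illustrative.

    import numpy as np

    def g(w, r):
        return ((1 + r ** w) / 2) ** (1 / w)

    def D(w, r):
        rw = r ** w
        return (np.log(2 / (1 + rw)) + rw * np.log(2 * rw / (1 + rw))) / (1 + rw)

    def xi(w, beta, zeta, r):
        return (beta - (1 - zeta) / w) / (2 * g(w, r) - 1 - r)

    r, zeta, beta = 2.0, 0.3, 0.65                      # illustrative values
    w = np.linspace((1 - zeta) / beta + 1e-6, 2.0, 200_000)
    vals = xi(w, beta, zeta, r)
    w_star, rho = w[np.argmax(vals)], vals.max()

    boundary = 0.5 * (1 + (2 * g(2, r) - 1 - r) / (g(2, r) * D(2, r)))
    if beta / (1 - zeta) <= boundary:
        print(np.isclose(rho, (beta - (1 - zeta) / 2) / (2 * g(2, r) - 1 - r)))            # (56)
    else:
        print(np.isclose(rho, (1 - zeta) / (2 * g(w_star, r) * D(w_star, r)), rtol=1e-3))  # (57)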

Proof of Theorem 4(b). For \tfrac{1-\zeta}{2}<\beta<1-\zeta, let \omega be the maximizer in

\rho_{r}(\beta,\zeta)=\max_{\frac{1-\zeta}{\beta}<\omega\leq 2}\Big(\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r}\Big). \qquad (60)

Let h=\lfloor\tfrac{(1-\epsilon)\rho_{r}(\beta,\zeta)\log N}{\mu_{0}}\rfloor for some \epsilon>0. Let P_{0} denote probability with respect to \mu^{n}_{t}=g(\omega)\mu_{0} for all n and t. Let t_{k}=(2k-1)h. Let P_{k}, 1\leq k\leq K:=\lfloor\tfrac{T}{2h}\rfloor, denote probability under which, independently for 1\leq n\leq N, Q^{n}=1 with probability 2N^{-\beta}, and Q^{n}=0 otherwise. When Q^{n}=1,

\mu^{n}_{t}=\left\{\begin{array}{ll}\mu_{0} & \mbox{ for } t_{k}-h<t\leq t_{k},\\ r\mu_{0} & \mbox{ for } t_{k}<t\leq t_{k}+h,\\ g(\omega)\mu_{0} & \mbox{ for } t\leq t_{k}-h \mbox{ and } t>t_{k}+h.\end{array}\right. \qquad (61)

When Q^{n}=0, \mu_{1}^{n}=\cdots=\mu_{T}^{n}=g(\omega)\mu_{0}. Let E_{1} denote expectation with respect to P_{1}. Let P_{Q}=P_{1}(\cdot|Q^{1}=1) and let E_{Q} denote expectation with respect to P_{Q}.

By (B)–(32), it suffices to show (53) for

L_{1} = \tfrac{dP_{1}}{dP_{0}}({\bf X})=\prod_{n=1}^{N}(1+2N^{-\beta}[\exp(U^{n})-1]), \qquad (62)
U^{n} = S^{n}_{0h}\log(\tfrac{1}{g(\omega)})+S^{n}_{h,2h}\log(\tfrac{r}{g(\omega)})-h\mu_{0}[1+r-2g(\omega)].

For notational simplicity, let S_{0h}=S_{0h}^{1} and S_{h,2h}=S_{h,2h}^{1}.

For X\sim{\rm Poisson}(\lambda) and constant C>0,

E(C^{X})=\sum_{x=0}^{\infty}e^{-\lambda}\tfrac{(C\lambda)^{x}}{x!}=e^{\lambda(C-1)}. \qquad (63)

This identity is applied in (D), (D) and (D).
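Identity (63) is the Poisson probability generating function; a short Monte Carlo check with illustrative \lambda and C:

    import numpy as np

    rng = np.random.default_rng(1)
    lam, C = 3.0, 1.7
    X = rng.poisson(lam, size=1_000_000)
    print(np.mean(C ** X), np.exp(lam * (C - 1)))   # both close to 8.17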

Case 1: \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}], \omega=2. By Lemma 2, (61)–(63) and [g(2)]^{2}=\tfrac{1+r^{2}}{2},

E_{Q}\exp(U^{1}) = E_{Q}[(\tfrac{1}{g(2)})^{S_{0h}}(\tfrac{r}{g(2)})^{S_{h,2h}}]e^{-h\mu_{0}[1+r-2g(2)]}
= \exp(h\mu_{0}[\tfrac{1}{g(2)}-1+\tfrac{r^{2}}{g(2)}-r]-h\mu_{0}[1+r-2g(2)])
= \exp(2h\mu_{0}[2g(2)-1-r])
= \exp(\tfrac{h\mu_{0}(2\beta-1+\zeta)}{\rho_{r}(\beta,\zeta)})\leq N^{(1-\epsilon)(2\beta-1+\zeta)}.

Hence

E_{1}L_{1} = (1+4N^{-2\beta}[E_{Q}\exp(U^{1})-1])^{N} \leq \exp(4N^{\zeta-\delta})=o(K),

where \delta=\epsilon(2\beta-1+\zeta), and (53) holds.

Case 2: \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1. Express

\log L_{1} = R_{0}+R_{1}, \qquad (65)
\mbox{where } R_{i} = \sum_{n\in\Gamma_{i}}\log(1+2N^{-\beta}[\exp(U^{n})-1]),
\Gamma_{0} = \{n:Q^{n}=0\}\cup\{n:Q^{n}=1,\ \exp(U^{n})\leq N^{\beta}\},
\Gamma_{1} = \{n:Q^{n}=1,\ \exp(U^{n})>N^{\beta}\}.

We conclude (53) from

P_{1}(R_{i}\leq\tfrac{1}{2}\log K)\rightarrow 1 \mbox{ for } i=0 \mbox{ and } 1. \qquad (66)

{\bf i=0}: Let a=\omega-1 with \omega the maximizer in (60). Since g(\omega)=\tfrac{1+r^{a+1}}{2g^{a}(\omega)}, by (60), (62) and (63),

E_{Q}[\exp(U^{1}){\bf 1}_{\{1\in\Gamma_{0}\}}] \leq N^{\beta(1-a)}E_{Q}\exp(aU^{1})
= N^{\beta(1-a)}\exp(h\mu_{0}[\tfrac{1}{g^{a}(\omega)}-1+\tfrac{r^{a+1}}{g^{a}(\omega)}-r]-ah\mu_{0}[1+r-2g(\omega)])
= N^{\beta(1-a)}\exp(\omega h\mu_{0}[2g(\omega)-1-r])
= N^{\beta(1-a)}\exp(\tfrac{h\mu_{0}(\beta\omega-1+\zeta)}{\rho_{r}(\beta,\zeta)})\leq N^{2\beta-1+\zeta-\delta},

where \delta=\epsilon(\beta\omega-1+\zeta). Since E_{0}\exp(U^{n})=1, it follows from (D) that

E_{1}\exp(R_{0}) \leq (1+4N^{-2\beta}E_{Q}[\exp(U^{1}){\bf 1}_{\{1\in\Gamma_{0}\}}])^{N} \leq \exp(4N^{\zeta-\delta}),

and (66) holds.

{\bf i=1}: Express U^{1}=v_{1}S_{0h}+v_{2}S_{h,2h}-z, where v_{1}=\log(\tfrac{1}{g(\omega)}), v_{2}=\log(\tfrac{r}{g(\omega)}) and z=h\mu_{0}[1+r-2g(\omega)]. Since g(\omega)=\tfrac{1+r^{a+1}}{2g^{a}(\omega)}, by Markov's inequality and (63),

E_{1}(\#\Gamma_{1}) = 2N^{1-\beta}P_{Q}(e^{aU^{1}}>N^{a\beta})
\leq 2N^{1-\beta-a\beta}e^{-az}E_{Q}(e^{v_{1}aS_{0h}}e^{v_{2}aS_{h,2h}})
= 2N^{1-\omega\beta}\exp(-az+h\mu_{0}[e^{v_{1}a}-1+re^{v_{2}a}-r])
= 2N^{1-\omega\beta}\exp(\omega h\mu_{0}[2g(\omega)-1-r])
= 2N^{1-\omega\beta}\exp(\tfrac{h\mu_{0}(\beta\omega-1+\zeta)}{\rho_{r}(\beta,\zeta)})\leq N^{\zeta-\delta},

where \delta=\epsilon(\beta\omega-1+\zeta). Since

R_{1}\leq(\#\Gamma_{1})\max_{n\in\Gamma_{1}}U^{n} \mbox{ and } P_{1}(\max_{n}U^{n}\geq N^{\frac{\delta}{2}})\rightarrow 0,

we conclude (66) from (D) and Markov's inequality. ∎

Appendix E Proof of Theorem 5

It follows from (C) that \sup_{\boldsymbol{\mu}\in\Lambda_{0}}P_{\boldsymbol{\mu}}(\mbox{Type I error})\rightarrow 0.

Consider \boldsymbol{\mu}\in\Lambda_{1}(h,\Delta,V) and let \tau_{j} be a change-point such that

\min(\tau_{j+1}-\tau_{j},\tau_{j}-\tau_{j-1})\geq h \mbox{ and } m_{j\Delta}\geq V,

where m_{j\Delta}=\#\{n:|\log(\mu_{\tau_{j}+1}^{n}/\mu_{\tau_{j}}^{n})|\geq\Delta\}.

Let Q^{n}=1 if |\log(\mu_{\tau_{j}+1}^{n}/\mu_{\tau_{j}}^{n})|\geq\Delta and Q^{n}=0 otherwise.

Proof of Theorem 5(a). Consider V=o(\tfrac{\log T}{\log N}) and recall from (19) that I_{r}=r\log(\tfrac{2r}{r+1})+\log(\tfrac{2}{r+1}). Let r_{1} and \mu_{1} be such that e^{\Delta}>r_{1}>r and \mu_{0}/(1+\epsilon)^{\frac{1}{3}}<\mu_{1}<\mu_{0}. Since hVI_{r}\mu_{0}=(1+\epsilon)\log T, \tfrac{h_{i+1}}{h_{i}}\rightarrow 1 and d_{i}=o(h_{i}), for T large there exists

h_{i}\geq(1+\epsilon)^{\frac{1}{2}}I_{r}^{-1}(\tfrac{\log T}{\mu_{1}V}), \qquad (69)

such that for all \boldsymbol{\mu}\in\Lambda_{1}(h,\Delta,V), there exists k such that

\tau_{j-1}<s(ik)<u(ik)<\tau_{j+1}, \qquad |t(ik)-\tau_{j}|\leq\tfrac{d_{i}}{2}.

Moreover when Q^{n}=1,

|\log(E_{\boldsymbol{\mu}}Y^{n}_{tu}/E_{\boldsymbol{\mu}}Y^{n}_{st})|\geq\log r_{1}, \qquad (70)

where (s,t,u)=(s(ik),t(ik),u(ik)). Let

\Gamma=\{n:Q^{n}=1,\ Y^{n}_{su}\geq(1+r)h_{i}\mu_{1},\ |\log(Y_{tu}^{n}/Y_{st}^{n})|\geq\log r\}.

By (69), for n\in\Gamma,

p^{n}_{stu}\leq 2\exp(-\mu_{1}h_{i}I_{r})\leq 2\exp(-(1+\epsilon)^{\frac{1}{2}}\tfrac{\log T}{V}). \qquad (71)

Since \tfrac{\log T}{V\log N}\rightarrow\infty, for N large,

\ell_{N}(p_{stu}^{n})\geq(1+\epsilon)^{\frac{1}{3}}(\tfrac{\log T}{V}).

Hence as \ell_{N}(q)\geq-1 for N large, by Lemma 1, with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)[(1+\epsilon)^{\frac{1}{3}}(\tfrac{\log T}{V})-1]
\geq -\lambda_{2}^{2}\sqrt{\log N}+(1+\epsilon)^{\frac{1}{4}}\log T.

Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T, \lambda_{2}^{2}\sqrt{\log N}=o(\log T) and c_{T}=o(\log T), we can conclude that P_{\boldsymbol{\mu}}(\ell^{\rm pen}_{N}({\bf p}_{stu})\geq c_{T})\rightarrow 1. ∎

Proof of Theorem 5(b). Consider V\sim N^{1-\beta} for \tfrac{1-\zeta}{2}<\beta<1-\zeta. For N large, there exists

\log N\gtrsim h_{i}\geq(1+\epsilon)^{\frac{1}{2}}\rho_{r}(\beta,\zeta)(\tfrac{\log N}{\mu_{0}}) \qquad (72)

such that for all \boldsymbol{\mu}\in\Lambda_{1}(h,\Delta,V), there exists k such that

\tau_{j-1}<s(ik)<u(ik)<\tau_{j+1}, \qquad |t(ik)-\tau_{j}|\leq\tfrac{d_{i}}{2},

and conditioned on Q^{n}=1, either

E_{\boldsymbol{\mu}}Y_{tu}^{n}\geq rE_{\boldsymbol{\mu}}Y_{st}^{n} \mbox{ or } E_{\boldsymbol{\mu}}Y_{st}^{n}\geq rE_{\boldsymbol{\mu}}Y_{tu}^{n}, \qquad (73)

where (s,t,u)=(s(ik),t(ik),u(ik)).

By Stirling’s approximation x!2πx(xe)xx!\sim\sqrt{2\pi x}(\tfrac{x}{e})^{x}, for XX\simPoisson(η\eta), as xx\rightarrow\infty,

P(X=x)=eηηxx!12πxexp[η+xxlog(xη)].P(X=x)=e^{-\eta}\tfrac{\eta^{x}}{x!}\sim\tfrac{1}{\sqrt{2\pi x}}\exp[-\eta+x-x\log(\tfrac{x}{\eta})]. (74)

We apply this approximation in (E) and (E).
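For instance, the following sketch compares the exact Poisson probability with the right-hand side of (74) at an illustrative pair (\eta,x):

    import math

    eta, x = 80.0, 100
    exact = math.exp(-eta) * eta ** x / math.factorial(x)
    approx = math.exp(-eta + x - x * math.log(x / eta)) / math.sqrt(2 * math.pi * x)
    print(exact, approx)   # agree to within about 0.1%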

Case 1: \tfrac{1}{2}<\tfrac{\beta}{1-\zeta}\leq\tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}] and \rho_{r}(\beta,\zeta)=\tfrac{\beta-\frac{1}{2}(1-\zeta)}{2g(2)-1-r}. Let

\Gamma = \{n:Q^{n}=1,\ Y_{su}^{n}\geq\sqrt{2(1+r^{2})}h_{i}\mu_{0}-1,\ |\log(Y_{tu}^{n}/Y_{st}^{n})|\geq 2\log r,\ q_{stu}^{n}\geq N^{\zeta-1}\}.

Consider Y_{1}\sim{\rm Poisson}(h_{i}\mu_{0}) and Y_{2}\sim{\rm Poisson}(rh_{i}\mu_{0}). By (74) and h_{i}\lesssim\log N,

P(Y_{1}=\lfloor(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}h_{i}\mu_{0}\rfloor) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-1+(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}-(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}\log((\tfrac{2}{1+r^{2}})^{\frac{1}{2}})]),
P(Y_{2}=\lceil(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}r^{2}h_{i}\mu_{0}\rceil) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-r+r^{2}(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}-r^{2}(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}\log(r(\tfrac{2}{1+r^{2}})^{\frac{1}{2}})]).

Recall that g(2)=(\tfrac{1+r^{2}}{2})^{\frac{1}{2}} and D(2)=\tfrac{1}{1+r^{2}}\log(\tfrac{2}{1+r^{2}})+\tfrac{r^{2}}{1+r^{2}}\log(\tfrac{2r^{2}}{1+r^{2}}) [see (54)]. By (E),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[P(Y_{1}=\lfloor(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}h_{i}\mu_{0}\rfloor)P(Y_{2}=\lceil(\tfrac{2}{1+r^{2}})^{\frac{1}{2}}r^{2}h_{i}\mu_{0}\rceil)-N^{\zeta-1}\Big]
\gtrsim \tfrac{N^{1-\beta}}{\log N}\exp(h_{i}\mu_{0}[2g(2)-1-r-g(2)D(2)]).

By (E), for n\in\Gamma,

p^{n}_{stu} \leq 2\exp(-Y_{su}^{n}D(2))\leq\xi_{N}, \qquad \mbox{where } \xi_{N}=C_{2}\exp(-2h_{i}\mu_{0}g(2)D(2)) \mbox{ for } C_{2}=2e^{D(2)}. \qquad (78)

Let q^{n}_{stu}=F_{n}(p^{n}_{stu}) where F_{n} is the distribution function of p^{n}_{stu}. It follows from Lemmas 1 and 2 that with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}
\geq -\lambda_{2}^{2}\sqrt{\log N}+\tfrac{\lambda_{2}N^{\frac{1}{2}-\beta}}{(\log N)^{\frac{3}{2}}}\exp(h_{i}\mu_{0}[2g(2)-1-r]).

Since \lambda_{2}\asymp\tfrac{N^{\frac{\zeta}{2}}}{\sqrt{\log N}} and by (72)

h_{i}\mu_{0}\geq(1+\epsilon)^{\frac{1}{2}}\rho_{r}(\beta,\zeta)\log N=(1+\epsilon)^{\frac{1}{2}}\Big(\tfrac{\beta-\frac{1}{2}(1-\zeta)}{2g(2)-1-r}\Big)\log N,

it follows from (E) that \ell_{N}({\bf p}_{stu})\gtrsim\tfrac{N^{\zeta+\delta}}{(\log N)^{2}} for \delta=[(1+\epsilon)^{\frac{1}{2}}-1][\beta-\tfrac{1}{2}(1-\zeta)]. Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T\sim N^{\zeta} and c_{T}=o(N^{\zeta}), we conclude P_{\boldsymbol{\mu}}(\ell_{N}^{\rm pen}({\bf p}_{stu})\geq c_{T})\rightarrow 1.

Case 2: \tfrac{1}{2}[1+\tfrac{2g(2)-1-r}{g(2)D(2)}]<\tfrac{\beta}{1-\zeta}<1 and \rho_{r}(\beta,\zeta)=\tfrac{1-\zeta}{2g(\omega)D(\omega)}=\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r} with \omega achieving the maximum in (55). Let

\Gamma = \{n:Q^{n}=1,\ Y^{n}_{su}\geq 2g(\omega)h_{i}\mu_{0}-1,\ |\log(Y^{n}_{tu}/Y^{n}_{st})|\geq\omega\log r,\ q_{stu}^{n}\geq N^{\zeta-1}\}.

Consider Y_{1}\sim{\rm Poisson}(h_{i}\mu_{0}) and Y_{2}\sim{\rm Poisson}(rh_{i}\mu_{0}). By (74) and h_{i}\lesssim\log N,

P(Y_{1}=\lfloor\tfrac{2g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rfloor) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-1+\tfrac{2g(\omega)}{r^{\omega}+1}-\tfrac{2g(\omega)}{r^{\omega}+1}\log(\tfrac{2g(\omega)}{r^{\omega}+1})]),
P(Y_{2}=\lceil\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rceil) \gtrsim \tfrac{1}{\sqrt{\log N}}\exp(h_{i}\mu_{0}[-r+\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}-\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}\log(\tfrac{2r^{\omega-1}g(\omega)}{r^{\omega}+1})]).

Recall that g(\omega)=(\tfrac{1+r^{\omega}}{2})^{\frac{1}{\omega}} and D(\omega)=\tfrac{1}{1+r^{\omega}}\log(\tfrac{2}{1+r^{\omega}})+\tfrac{r^{\omega}}{1+r^{\omega}}\log(\tfrac{2r^{\omega}}{1+r^{\omega}}) [see (54)]. By (E),

E_{\boldsymbol{\mu}}(\#\Gamma) \geq V\Big[P(Y_{1}=\lfloor\tfrac{2g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rfloor)P(Y_{2}\geq\lceil\tfrac{2r^{\omega}g(\omega)}{r^{\omega}+1}h_{i}\mu_{0}\rceil)-N^{\zeta-1}\Big]
\gtrsim \tfrac{N^{1-\beta}}{\log N}\exp(h_{i}\mu_{0}[2g(\omega)-1-r-2(\tfrac{\omega-1}{\omega})g(\omega)D(\omega)]).

By (E), for n\in\Gamma,

p^{n} \leq 2\exp(-Y^{n}_{s(ik),u(ik)}D(\omega))\leq\xi_{N}, \qquad \mbox{where } \xi_{N}=C_{\omega}\exp(-2h_{i}\mu_{0}g(\omega)D(\omega)) \mbox{ for } C_{\omega}=2e^{D(\omega)}. \qquad (83)

Let q^{n}=F_{n}(p^{n}) where F_{n} is the distribution function of p^{n}. It follows from Lemmas 1 and 2 that with probability tending to 1,

\ell_{N}({\bf p}_{stu}) \geq \ell_{N}({\bf q}_{stu})+(\#\Gamma)\tfrac{\lambda_{2}}{4\sqrt{N\xi_{N}\log N}}
\gtrsim -\lambda_{2}^{2}\sqrt{\log N}+\tfrac{\lambda_{2}N^{\frac{1}{2}-\beta}}{(\log N)^{\frac{3}{2}}}\exp(h_{i}\mu_{0}[2g(\omega)-1-r-(\tfrac{\omega-2}{\omega})g(\omega)D(\omega)]).

Since \lambda_{2}\asymp\tfrac{N^{\frac{\zeta}{2}}}{\sqrt{\log N}} and by (72),

h_{i}\mu_{0} \geq (1+\epsilon)^{\frac{1}{2}}\rho_{r}(\beta,\zeta)\log N = (1+\epsilon)^{\frac{1}{2}}\Big(\tfrac{1-\zeta}{2g(\omega)D(\omega)}\Big)\log N = (1+\epsilon)^{\frac{1}{2}}\Big(\tfrac{\beta-\omega^{-1}(1-\zeta)}{2g(\omega)-1-r}\Big)\log N,

it follows from (E) that \ell_{N}({\bf p}_{stu})\gtrsim\tfrac{N^{\zeta+\delta}}{(\log N)^{2}} for \delta=[(1+\epsilon)^{\frac{1}{2}}-1][\beta-\tfrac{1}{2}(1-\zeta)]. Since the penalty \log(\tfrac{T}{4}(\tfrac{1}{t-s}+\tfrac{1}{u-t}))\leq\log T\sim N^{\zeta} and c_{T}=o(\log T), we conclude P_{\boldsymbol{\mu}}(\ell_{N}^{\rm pen}({\bf p}_{stu})\geq c_{T})\rightarrow 1. ∎

References

  • [1] Arias-Castro, E., Donoho, D. and Huo X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inf. Theory 51 2402–2425.
  • [2] Arias-Castro, E., Donoho, D. and Huo X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform noise. Ann. Statist. 34 326–349.
  • [3] Berk, R.H. and Jones, D.H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47–50.
  • [4] Cai, T., Jeng, X.J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. Roy. Statist. Soc. B 73 629–662.
  • [5] Chan, H.P. and Walther, G. (2015). Optimal detection of multi-sample aligned sparse signals. Ann. Statist. 43 1865–1895
  • [6] Chan, H.P. (2017). Optimal sequential detection in multi-stream data. Ann. Statist. 45 2736–2763.
  • [7] Cho, H. (2016). Change-point detection in panel data via double CUSUM statistic. Electron. J. Statist. 10 2000–2038.
  • [8] Cho, H. and Fryzlewicz, P. (2015). Multiple change-point detection for high-dimensional time series via sparsified binary segmentation. J. Roy. Statist. Soc. B 77 475–507.
  • [9] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
  • [10] Du, C., Kao, C.L. and Kou, S.C. (2016). Stepwise signal extraction via marginal likelihood. J. Amer. Statist. Assoc. 111 314–330.
  • [11] Dümbgen, L. and Spokoiny, V.G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124–152.
  • [12] Enikeeva, F. and Harchaoui, Z. (2019). High-dimensional change-point detection under sparse alternatives. Ann. Statist. 47 2051–2079.
  • [13] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change point inference. J. Roy. Statist. Soc. B 76 495–580.
  • [14] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Statist. 42 2243–2281.
  • [15] Horváth L. and Hušková M. (2014). Change-point detection in panel data. J. Time Ser. Anal. 23 631–648.
  • [16] Hubert, L. and Arabie, P. (1983). Comparing partitions. J. Classification 2 193–218.
  • [17] Ingster, Y.I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
  • [18] Ingster, Y.I. (1998). Minimax detection of a signal for \ell^{n} balls. Math. Methods Statist. 7 401–428.
  • [19] Jirak, M. (2015). Uniform change point tests in high dimension. Ann. Statist. 43 2451–2483
  • [20] Lai, T.L. and Xing, H. (2011). A simple Bayesian approach to multiple change-points. Statist. Sinica 21 539–569.
  • [21] Liu, H., Gao, C. and Samworth, R. (2021). Minimax rate in sparse high-dimensional change-point detection. Ann. Statist. 49 1081–1112.
  • [22] Mei, Y. (2010). Efficient scalable schemes for monitoring large number of data streams. Biometrika 97 419–433.
  • [23] Niu, Y.S., Hao, N. and Zhang, H. (2016). Multiple change-point detection: a selective overview. Statist. Sci. 31 611–623.
  • [24] Niu, Y.S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variation. Ann. Appl. Statist. 6 1306–1326.
  • [25] Olshen, A.B., Venkatraman, E.S., Lucito, R. and Wigler, M. (2003). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatist. 5 557–572.
  • [26] Pilliat, E., Carpentier, A. and Verzelen, N. (2020). Optimal multiple change-point detection for high-dimensional data. arXiv preprint arXiv:2011.07818.
  • [27] Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 66 846–850.
  • [28] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Statist. 40 752–769.
  • [29] Tukey, J.W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ.
  • [30] Wang, T. and Samworth, R. (2018). High dimensional change point estimation via sparse projection. J. Roy. Statist. Soc. B 80 57–83.
  • [31] Wang, Y. and Mei, Y. (2015). Large-scale multi-stream quickest change via shrinkage post-change estimation. IEEE Trans. Information Theory 61 6926–6938.
  • [32] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.
  • [33] Willsky, A. and Jones, H. (1976). A generalized likelihood ratio approach to the detection and estimation of jumps in linear system. IEEE Trans. Autom. Cont. 21 108–112.
  • [34] Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692.
  • [35] Yao, Y.C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447.
  • [36] Zhang, N.R. and Siegmund, D. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization. Biometrics 63 22–52.
  • [37] Zhang, N.R., Siegmund, D., Ji, H. and Li, J.Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.