Likelihood Scores for Sparse Signal and Change-Point Detection
Shouri Hu,
Jingyan Huang,
Hao Chen
and Hock Peng Chan
School of Mathematical Sciences, University of Electronic Science and Technology of China
Department of Statistics and Data Science, National University of Singapore
Department of Statistics, University of California at Davis
Abstract
We consider here the identification of change-points on large-scale data streams.
The objective is to find the most efficient way of combining information across data streams
so that detection is possible under the smallest detectable change magnitude.
The challenge comes from the sparsity of change-points when only a small fraction of data streams undergo change
at any point in time.
The most successful approach to the sparsity issue so far has been the application of hard thresholding such that only
local scores from data streams exhibiting significant changes are considered and added.
However, identifying an optimal threshold is difficult.
In particular, it is unlikely that the same threshold is optimal for different levels of sparsity.
We propose here a sparse likelihood score for identifying a sparse signal.
The score is a likelihood ratio for testing between the null hypothesis of no change against an alternative hypothesis
in which the change-points or signals are barely detectable.
By the Neyman-Pearson Lemma this score has maximum detection power at the given alternative.
The outcome is a scoring of data streams that detects successfully at the boundary of the
detectable region of signals and change-points.
The likelihood score can be seen as a soft thresholding approach to sparse signal and change-point detection in which
local scores that indicate small changes are down-weighted much more than local scores indicating large changes.
We show second-order optimality of the sparse likelihood score, in the sense of achieving successful
detection at the minimum detectable order of change magnitude as well as at the minimum asymptotic detection constant
with respect to this order of change.
1 Introduction
Consider a large number of data streams containing change-points.
We consider the situation in which all data up to a given time is available for analysis, so
each data stream is an observed sequence of length .
At each change-point one or more of the sequences undergo distribution change.
The objective is to identify these change-points and the sequences undergoing distribution change.
Of interest here is the identification of these change-points when there is sparsity,
that is when the number of sequences undergoing change is small compared to .
More specifically we want to know the minimum magnitude of change for which the distribution change can be detected
under sparsity.
Secondly, we want an algorithm that is able to detect,
with high probability, change-points at the minimum detectable
change magnitude.
See Niu, Hao and Zhang [23] and Wang and Samworth [30] for applications to
engineering, genomics and finance.
A typical strategy to deal with sparsity is to subject local scores to thresholding or penalization before
summing them up across sequences.
Algorithms employing this strategy include the Sparsified Binary Segmentation (SBS) (Cho and Fryzlewicz [8]),
the double CUSUM (DC) (Cho [7]),
the Informative Sparse Projection (INSPECT) (Wang and Samworth [30]) and the scan algorithm of Enikeeva and Harchaoui
[12].
The strategy was also employed by Mei [22], Xie and Siegmund [34]
and Wang and Mei [31] in sequential change-point detection on multiple sequences,
and by Zhang, Siegmund, Ji and Li [37] to detect distribution deviations from known baselines on multiple sequences.
Thresholding and penalization suppress noise by removing small and moderate scores,
mostly from the majority of sequences without change,
thus enhancing the signals from the sparse sequences with changes.
It is however unlikely that we are able to specify a threshold or penalization parameter that is optimal at all levels of sparsity.
The higher-criticism (HC) test statistic,
proposed by Tukey [29] to check for a significantly large number of small p-values,
uses multiple thresholds for sparse mixture detection.
The number of p-values below a threshold is transformed to a
higher-criticism score and this score is maximized over all thresholds.
The Berk and Jones [3] test statistic uses multiple thresholds as well but it applies a different
p-value transformation.
The HC test statistic was shown by Donoho and Jin [9] to be optimal in the detection of a sparse normal mixture.
Cai, Jeng and Jin [4] extended it to detect intervals in multiple sequences where the means of a sparse fraction of the sequences deviate from a known baseline and showed that the HC test statistic is optimal.
Chan and Walther [5] considered sequence length much larger than number of sequences
with detection boundaries that are more complex.
They showed that the HC test statistic achieves detection at these boundaries and is optimal in more general settings.
They also showed that the Berk-Jones test statistic achieves the same optimality.
The HC and Berk-Jones test statistics have the advantage of not requiring a threshold to be specified in advance.
However, at each threshold we are unable to differentiate between a p-value just below the threshold and a p-value that is below
the threshold by a large margin.
This loss of information results in sub-optimality when identifying the exact location where change has occurred.
Our approach here is to convert the p-values into likelihood scores for testing sparse sequences.
It can be considered to be a soft form of thresholding in which p-values that are close to zero are penalized less than
p-values that are barely significant.
Unlike the HC and Berk-Jones test statistics, only one transformation of the p-values is needed.
It retains the advantage of the HC and Berk-Jones test statistics in not specifying a threshold in advance,
but unlike the HC and Berk-Jones test statistics a smaller p-value always has a larger likelihood score.
Since the likelihood scores are transformations of p-values, the proposed method can be applied to
any type of distribution changes and it can handle data types that vary across sequences.
Our theory however requires a specific distribution family for neat asymptotics and we consider here in particular
either normal or Poisson data.
We show optimality up to the correct asymptotic constants.
For sparse normal change-points these constants are two-dimensional extensions of those in Ingster [17]
and Donoho and Jin [9] for sparse normal mixture detection.
These constants have been discussed in the context of sparse normal change-point detection assuming a known baseline
in Chan and Walther [5] and Chan [6].
For sparse Poisson change-points the constants are new and different from sparse normal constants.
The optimality of multiple sequence identification of change-points up to the correct constant is new.
Previous works on optimality for normal data are up to the correct order of magnitude though they go beyond
the i.i.d. model,
for example Pilliat, Carpentier and Verzelen [26]
considered sparse change-point detection in time-series with normal errors.
Liu, Gao and Samworth [21] showed optimality up to the best order for normal errors.
However they considered a different problem that assumes at most one change-point.
The algorithm we propose here has two steps in the identification of change-points.
The first detection screening step applies the Screening and Ranking (SaRa) idea of Niu and Zhang [24].
The second estimation step for more precise location of change-points
uses the CUSUM-like procedure of Wild Binary Segmentation (WBS), cf. Fryzlewicz [14].
This two-step approach saves computation time because the fast screening step evaluates a large number of segments whereas
the computationally intensive estimation step is only applied when a change-point has been detected during screening.
In contrast for WBS the estimation step is applied on a large number of randomly generated segments.
Unlike in Niu and Zhang [24] we do not apply the BIC criterion of Zhang and Siegmund [36] to determine the number of change-points.
Instead critical values are specified in advance and binary segmentation, cf. Olshen et al. [25],
is applied to detect the change-points sequentially.
An alternative to binary segmentation is estimating the full set of change-points at one go by applying global optimization
and making use of dynamic programming to manage the computational complexity.
This was employed by the HMM algorithms of Yao [35] and Lai and Xing [20],
the multi-scale SMUCE algorithm of Frick, Munk and Sieling [13]
and the Bayesian Likelihood algorithm of Du, Kao and Kou [10].
These methods are however designed for single sequence segmentation.
Niu, Hao and Zhang [23] provides an excellent background of the historical developments.
The outline of this paper is as follows.
In Section 2 we introduce the sparse likelihood (SL) scores and show that they are optimal in the detection of sparse normal mixtures.
In Section 3 we extend SL scores to detect change-points in multiple sequences.
In Section 4 we show that SL scores are optimal for change-point detection when the observations are normal or Poisson.
In Section 5 we perform simulation studies on the SL scores.
In the appendices we prove the optimality of SL scores.
1.1 Notations
We write to denote .
We write to denote .
We write to denote for all for some and to denote and
.
We write to denote for some .
Let ⌊·⌋ (⌈·⌉) denote the greatest (least) integer function.
Let φ and Φ denote the density and distribution function, respectively, of the standard normal.
Let 𝟏{·} denote the indicator function.
Let ∅ denote the empty set and let #A denote the number of elements in a set A.
2 Sparse mixture detection
We start with the simpler problem of detecting a sparse mixture,
with the objective of motivating the sparse likelihood score.
Let p_1, …, p_n be independent p-values of n null hypotheses
and let p_(1) ≤ ⋯ ≤ p_(n) be the sorted p-values.
Tukey [29] proposed the higher-criticism test statistic
HC_n = max_{1 ≤ i ≤ n/2} √n (i/n − p_(i)) / √(p_(i)(1 − p_(i))),   (1)
with HC_n = 0 if p_(i) ≥ i/n for all i,
for the overall test that all null hypotheses are true.
Donoho and Jin [9] showed that the HC test statistic is optimal for detecting a sparse fraction of false null hypotheses.
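For concreteness, a minimal sketch of the higher-criticism computation is given below (Python); restricting the maximization to the smallest half of the p-values is one common convention following Donoho and Jin [9] and is an assumption here.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """Higher-criticism statistic: maximize the standardized exceedance of the
    empirical p-value distribution over the smallest alpha0*n p-values."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    k = max(1, int(alpha0 * n))   # one common choice of threshold range
    return hc[:k].max()

# Under the null (uniform p-values) the statistic grows very slowly with n.
rng = np.random.default_rng(0)
print(higher_criticism(rng.uniform(size=10000)))
```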
Consider test scores X_i ~ N(0,1) when the ith null hypothesis is true and X_i ~ N(√(2r log n), 1) for some r > 0 when the ith null hypothesis is false.
Define
ρ(β) = β − 1/2 for 1/2 < β ≤ 3/4, and ρ(β) = (1 − √(1 − β))² for 3/4 < β < 1.   (2)
Donoho and Jin [9] showed that on the sparse mixture in which a fraction ε_n = n^{−β} of the null hypotheses are false,
no algorithm is able to achieve,
as n → ∞,
(Type I error probability) + (Type II error probability) → 0,   (3)
for testing H0: all null hypotheses are true versus H1: the sparse mixture holds,
if r < ρ(β) for 1/2 < β < 1.
They also showed that the HC test statistic achieves (3) when r > ρ(β)
and is thus optimal.
Type I error refers to the conclusion of H1 when H0 is true, whereas
Type II error refers to the conclusion of H0 when H1 is true.
Ingster [17, 18] established the detection lower bound,
showing that (3) cannot be achieved when r < ρ(β).
Like the HC test statistic,
the Berk and Jones [3] test statistic achieves (3) when r > ρ(β) and is thus optimal.
We introduce the sparse likelihood scores in Section 2.1 and show that they achieve (3) in the detection of sparse mixtures,
when r > ρ(β),
in Section 2.2.
2.1 Sparse likelihood
Let and .
For both and 2,
and increases as decreases.
Define the sparse likelihood score
(4)
with and .
For the purpose of sparse mixture detection plays no role and we can consider .
The importance of comes from detecting very sparse change-points.
The sparse likelihood is the likelihood of
The distribution function of is
Consider the true distribution of to be some unknown .
The fraction of p-values less than some small has mean and standard deviation approximately .
Hence to achieve decent detection power for a test using as critical p-value,
we need
(5)
Setting to the right-hand side of (5) gives us a function close to .
As varies slowly with ,
we can view as lying at the boundary for which detection is possible at each small .
This suggests that for any that is greater than at some ,
we are able to differentiate it from Uniform (0,1) by using the likelihood test against .
When is of order smaller than ,
dominates
and the selection of is advantageous.
This is relevant in the extension of sparse likelihood scores to detect change-points on long sequences,
where a large number of likelihood comparisons is involved.
Figure 1: Graphs of the sparse likelihood score (black, solid) and the comparison function described in the text (red), for two parameter settings (top and bottom panels).
The sparse likelihood score can be viewed as a form of soft thresholding.
To visualize this we compare in Figure 1
the plot of for ,
,
and ,
against that of .
For ,
the two functions are close to each other; however, within ,
the sparse likelihood score is not constant but has a gentle upward curve.
The sparse likelihood score is negative for and for standard normal has a mean of .
This negative mean helps in controlling the sum of scores when is large and .
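To make the soft-thresholding behaviour concrete, the sketch below scores p-values with a generic sparse-mixture likelihood ratio; the alternative density g(p) = 1/(2√p) and the mixing weight are illustrative assumptions and not the exact form of (4). A smaller p-value always receives a larger score, and p-values near 1 receive slightly negative scores.

```python
import numpy as np

def sparse_lr_score(p, eps=0.01):
    """Log-likelihood ratio of the mixture (1 - eps)*Uniform(0,1) + eps*g against
    Uniform(0,1), with illustrative alternative density g(p) = 1/(2*sqrt(p)).
    Acts as a soft threshold: large for small p, slightly negative near p = 1."""
    p = np.asarray(p, dtype=float)
    return np.log(1.0 - eps + eps / (2.0 * np.sqrt(p)))

print(sparse_lr_score([1e-8, 1e-4, 0.01, 0.1, 0.5, 0.99]))
# Summing such scores across data streams gives an overall detection statistic.
```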
2.2 Optimal detection
We show here that the sparse likelihood score is optimal in the detection of sparse mixtures for a broad range of sparsity.
Let and denote expectation and probability respectively with respect to .
Since
it follows from Markov’s inequality that
(6)
This exponential bound makes the sparse likelihood score easy to work with when there is a large number of likelihood comparisons,
as critical values satisfying a required level of Type I error control can have a simple expression not depending on .
We show in Theorem 1 that by selecting
(7)
the Type I and II error probabilities both go to zero at the detection boundary.
Theorem 1.
Assume (7).
Consider the test of : versus : ,
for ,
with for some .
Let .
If for ,
then
3 Change-point detection
Let denote the th observation of the th sequence for and .
Consider first the model
(8)
We are interested in the detection and estimation of
For ,
let .
To check for a change of mean on the th sequence at location ,
select and let p-value
In the sparse likelihood algorithm we combine these p-values using ,
where .
When the data follow some other distributions, the corresponding likelihood ratio statistic and p-value can be computed accordingly.
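As one concrete instance (an assumption for illustration; the statistic actually used follows the CUSUM-type form referenced above), a per-sequence p-value over a window of half-length h around a candidate location t can be obtained from the standardized difference of means of unit-variance normal observations, and the p-values are then combined through the sparse likelihood transformation:

```python
import numpy as np
from scipy.stats import norm

def window_pvalue(x, t, h):
    """Two-sided p-value for a change of mean at location t of sequence x,
    from the standardized difference of means of the h observations on each
    side of t (unit-variance normal observations assumed)."""
    left, right = x[t - h:t], x[t:t + h]
    z = (right.mean() - left.mean()) * np.sqrt(h / 2.0)
    return 2.0 * norm.sf(abs(z))

def combined_score(X, t, h, score_fn):
    """Sum over sequences of a score transformation (e.g. the sparse likelihood
    score) applied to the per-sequence window p-values."""
    return sum(score_fn(window_pvalue(x, t, h)) for x in X)
```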
The sparse likelihood score detects well when only a small fraction of the sequences undergo change of mean.
For large
computing the sparse likelihood score for all is expensive.
Instead we apply the approximating set idea of Walther [32] to first space out the locations that are evaluated,
and then apply the CUSUM-type scores used in WBS to estimate the change-point location accurately only
when the first step indicates a change-point.
In addition to computational savings,
through this two-step approach we are able to incorporate multi-scale penalization terms
similar to those used in Dümbgen and Spokoiny [11] and the SMUCE algorithm of Frick, Munk and Sieling [13],
to ensure optimality not only at all levels of sparse change-points,
but also at all orders of change magnitudes.
Let and be integer-valued sequences with for all .
Let .
Define
The elements of are the indices where sparse likelihood scores for windows of length are computed.
Initially we have the full dataset
and after one or more change-points have been estimated,
it is split into sub-datasets ,
with length .
We check for change-points in using windows specified by .
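The following sketch builds one possible approximating set of this kind, mapping each window length to the spaced-out candidate locations at which the scores are computed; the geometric growth factor and the half-window spacing are illustrative assumptions rather than the sequences defined above.

```python
def window_grid(T, h_min=10, growth=1.5):
    """One possible approximating set: geometrically growing window half-lengths,
    with candidate locations for half-length h spaced h//2 apart."""
    grids = {}
    h = h_min
    while h <= T // 2:
        step = max(1, h // 2)
        grids[h] = list(range(h, T - h + 1, step))
        h = int(growth * h) + 1
    return grids
```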
Let the penalized sparse likelihood scores
(9)
For ,
let .
The detection of change-points within ,
with window lengths at least ,
is as follows.
Algorithm 1 SL-estimate
input(...)
for ...
    if ... then
        output ...
        stop
    end if
end for
output (0, 0)
There are two steps in SL-estimate in the estimation of a change-point,
when the largest penalized score exceeds the critical value .
The first is the identification of an interval ,
associated with the largest penalized score,
within which a change-point lies.
The second is the estimation of the change-point within this interval.
In the approximating set ,
neighboring windows are located apart,
hence we are unable to estimate the change-points accurately in the first step.
Accurate estimation is carried out,
with more intensive computations within ,
in the second step.
Since the second step is performed only after an interval has been identified as containing a change-point,
performing this two-step procedure saves computations in regions where scores are generally small and the likelihood of change-points is low.
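A minimal sketch of this two-step logic is given below; the penalized score, the critical value, the refinement statistic and the returned quantities are assumptions here (in particular, the second step in the paper uses the CUSUM-type WBS statistic within the detected interval, whereas the sketch simply re-evaluates the combined score at every location of that interval). The function score_at can be, for instance, the combined_score sketched earlier, and grids the mapping from window length to candidate locations.

```python
import numpy as np

def sl_estimate(X, grids, crit, score_at, penalty):
    """Screen-then-estimate sketch of SL-estimate.  Screening scans the spaced
    grid, smallest window lengths first; when the largest penalized score
    exceeds crit, the change-point is re-estimated within the detected
    interval.  Returns (0, 0) when nothing is detected."""
    for h in sorted(grids):
        best_t, best_score = 0, -np.inf
        for t in grids[h]:
            s = score_at(X, t, h) - penalty(h)
            if s > best_score:
                best_t, best_score = t, s
        if best_score > crit:
            lo, hi = best_t - h, best_t + h   # interval containing a change-point
            t_hat = max(range(lo + 1, hi), key=lambda t: score_at(X, t, h))
            return t_hat, h
    return 0, 0
```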
After a change-point has been identified,
we split the dataset into two and execute the same algorithm on each split dataset.
To avoid repetitive computations,
we start from the window length used in the evaluation of the change-point splitting the dataset,
instead of starting from the smallest window length ,
on the split datasets.
The use of a representative set of window lengths for computational savings in change-point detection was
proposed in Willsky and Jones [33].
The recursive segmentation algorithm for the computation of the estimated change-point set is given below,
with initialization at .
Figure 2: Graphs of Type I error probability against critical value for the sparse likelihood detection algorithm, for independent unit-variance normal observations, under the parameter settings applied in the numerical studies in Section 5, with two sequence lengths (bottom panel: 20,000) and five choices of the number of sequences (black, red, green, blue and orange curves).
Algorithm 2 SL-detect
input(...)
SL-estimate(...)
if ... then
    SL-detect(...)
    SL-detect(...)
end if
output
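Continuing the sketch above (same assumed helper functions), the recursion of SL-detect can be written as follows; note the restart from the window length at which the split was detected, as described earlier.

```python
def sl_detect(X, lo, hi, h_min, grids, crit, score_at, penalty, found=None):
    """Recursive segmentation sketch of SL-detect on the segment (lo, hi]."""
    if found is None:
        found = []
    # Restrict the approximating set to windows of length >= h_min fitting in (lo, hi].
    sub = {h: [t for t in ts if lo + h <= t <= hi - h]
           for h, ts in grids.items() if h >= h_min}
    sub = {h: ts for h, ts in sub.items() if ts}
    t_hat, h_hat = sl_estimate(X, sub, crit, score_at, penalty)
    if t_hat > 0:
        found.append(t_hat)
        sl_detect(X, lo, t_hat, h_hat, grids, crit, score_at, penalty, found)
        sl_detect(X, t_hat, hi, h_hat, grids, crit, score_at, penalty, found)
    return sorted(found)
```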
In Figure 2 we show that the critical values of the sparse likelihood algorithm,
for a specified Type I error probability,
are stable over .
Contributing factors include having a mean that is close to zero and having exponential tail probabilities not depending on ,
see (6),
when and are uniformly distributed.
4 Optimal detection
Let and let be the number of change-points.
We show that the sparse likelihood algorithm is optimal for normal observations in Section IV.A
and for Poisson observations in Section IV.B.
Consider as such that
(10)
In Theorems 2 and 4
we specify the detection boundary for asymptotically zero Type I and II error probabilities.
Analogous detection boundaries for a single sequence are given in Arias-Castro, Donoho and Huo [1, 2].
In Theorems 3 and 5 we show that Type I and II error probabilities of the sparse likelihood algorithm go to zero at the detection boundary.
Recall that .
Consider the sparse likelihood algorithm with and satisfying
(11)
(12)
and critical values satisfying
(13)
For the sparse likelihood algorithm select parameters and
(14)
We satisfy (11) when and as .
Moreover (12) holds because
Condition (11) ensures that the set of is sufficiently dense to detect change-points optimally.
Condition (12) is required for (13) to hold.
The first half of condition (13) ensures Type II error probability goes to 0.
The second half ensures Type I error probability goes to 0.
4.1 Normal model
Let
be the number of sequences with change of mean of at least at the th change-point.
Let
with the convention and .
We consider here the test of : versus : .
Define
(15)
These constants are extensions of ρ(β) in (2) to capture the effect of multiple testing in change-point detection.
Theorem 2.
Assume (10) and let .
For normal observations,
no algorithm is able to achieve,
as ,
(16)
under any of the following conditions.
(a) Consider fixed.
i. When and .
ii. When for some and
.
(b) Consider for some .
i. When and .
ii. When for some and
.
Theorem 3.
Assume (10) and let .
For normal observations the sparse likelihood algorithm,
with parameters satisfying (11)–(14),
achieves (16) under any of the following conditions.
(a) Consider fixed.
i. When and .
ii. When for some and
.
(b) Consider for some .
i. When and .
ii. When for some and
.
4.2 Poisson model
We show here the optimality of the sparse likelihood detection algorithm for
(17)
For optimal detection on a single Poisson sequence,
see Rivera and Walther [28].
Let .
Consider .
Under the null hypothesis of no change-points in the interval ,
conditioned on ,
is binomial distributed with trials and success probability .
Let be the two-sided p-value of this conditional binomial test,
with continuity adjustments so that it is distributed as Uniform(0,1) under the null hypothesis.
More specifically when and simulate
(18)
where is probability with respect to ,
and define .
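A standard construction of such a randomized (continuity-adjusted) two-sided binomial p-value, which is exactly Uniform(0,1) under the null, is sketched below; the precise adjustment in (18) is not reproduced here and may differ in detail. The null success probability q is the fraction of the interval lying to the left of the candidate change-point.

```python
import numpy as np
from scipy.stats import binom

def randomized_binom_pvalue(x, n, q, rng):
    """Two-sided p-value of the conditional binomial test of x successes in n
    trials with success probability q, randomized so that it is exactly
    Uniform(0,1) under the null hypothesis."""
    u = rng.uniform()
    upper = binom.sf(x, n, q) + u * binom.pmf(x, n, q)   # randomized upper-tail probability
    lower = 1.0 - upper                                  # randomized lower-tail probability
    return 2.0 * min(upper, lower)

rng = np.random.default_rng(1)
print(randomized_binom_pvalue(x=30, n=40, q=0.5, rng=rng))
```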
Let
and for a given ,
let
We consider here the test of : vs : .
For a given ,
let
(19)
Let and let
(20)
In Theorem 4 we show that (20) is the asymptotic constant in the detection boundary of Poisson random variables.
In Theorem 5 we show that the sparse likelihood algorithm achieves detection at this boundary for a broad
range of sparsity.
Theorem 4.
Assume (10).
Let for some and .
For Poisson observations no algorithm is able to achieve,
as ,
(21)
under either of the following conditions.
(a) When and .
(b) When for some and
.
Theorem 5.
Assume (10).
Let , and .
For Poisson observations the sparse likelihood algorithm,
with parameters satisfying (11)–(14),
achieves (21) under either of the following conditions.
(a) When and .
(b) When for some and
.
5 Simulation studies
We follow here the simulation set-up in Sections 5.1 and 5.3 of Wang and Samworth [30].
Assume that the random variables are normal with variances that are unknown but equal within each sequence.
These variances are estimated using median absolute differences of adjacent observations, and after normalization
the random variables are treated as unit-variance normal.
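A common implementation of this variance estimate is sketched below; the normal-consistency constant √2·Φ⁻¹(3/4) is a standard choice and an assumption here.

```python
import numpy as np
from scipy.stats import norm

def mad_sd(x):
    """Noise standard deviation of a piecewise-constant-mean sequence, estimated
    from the median absolute difference of adjacent observations; the constant
    sqrt(2)*Phi^{-1}(3/4) makes it consistent for normal noise."""
    return np.median(np.abs(np.diff(x))) / (np.sqrt(2.0) * norm.ppf(0.75))

# Each sequence is divided by its estimated standard deviation and then
# treated as a unit-variance normal sequence.
```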
In the first study there is exactly one change-point .
Consider for and all .
For ,
let
The objective is to estimate assuming we know there is exactly one change-point.
We estimate here by
where is the penalized sparse score with and .
We simulate the probabilities that for and 10,
and compare against the INSPECT algorithm and the scan algorithm of Enikeeva and Harchaoui [12].
These two algorithms have the best numerical performances in Wang and Samworth [30].
The comparisons in Table 1 show that the sparse likelihood algorithm performs well.
Table 1: The fraction of simulation runs (out of 1000) for which is within distance from for and 10.
The same datasets are used to compare sparse likelihood (SL), INSPECT and the scan test,
with for and for .
             SL                INSPECT              scan
            3      10         3      10          3      10
    3    0.511  0.801      0.478  0.785       0.520  0.804
    5    0.466  0.740      0.427  0.718       0.463  0.722
   10    0.393  0.645      0.370  0.637       0.362  0.599
   22    0.319  0.553      0.282  0.547       0.256  0.465
   50    0.244  0.462      0.211  0.453       0.197  0.378
  500    0.177  0.339      0.148  0.335       0.112  0.240

    3    0.481  0.748      0.410  0.667       0.480  0.730
    5    0.423  0.673      0.344  0.584       0.394  0.633
   10    0.320  0.546      0.246  0.480       0.261  0.456
   20    0.237  0.431      0.198  0.403       0.188  0.332
   45    0.186  0.344      0.136  0.311       0.130  0.242
  200    0.114  0.227      0.095  0.235       0.074  0.153
 2000    0.068  0.160      0.078  0.189       0.042  0.096

    3    0.603  0.859      0.587  0.855       0.589  0.854
    5    0.604  0.865      0.595  0.855       0.558  0.832
   10    0.565  0.827      0.569  0.833       0.487  0.764
   22    0.522  0.789      0.522  0.795       0.438  0.714
   50    0.472  0.748      0.468  0.745       0.384  0.652
  500    0.378  0.643      0.336  0.609       0.273  0.524

    3    0.607  0.866      0.608  0.861       0.599  0.858
    5    0.594  0.864      0.586  0.857       0.557  0.829
   10    0.553  0.847      0.558  0.847       0.476  0.780
   20    0.494  0.807      0.498  0.789       0.435  0.726
   45    0.447  0.747      0.451  0.746       0.377  0.657
  200    0.362  0.649      0.342  0.604       0.297  0.554
 2000    0.274  0.532      0.241  0.471       0.225  0.457
In the second study there are three change-points within sequences of length ,
at , and .
At each change-point exactly sequences undergo mean changes.
Six scenarios are considered,
corresponding to
and ,
for and .
For ,
the mean changes are within the same 40 sequences at all three change-points,
whereas for the mean changes at all three change-points are on distinct sequences.
For ,
there is partial overlap of the sequences having mean changes at adjacent change-points.
The number of estimated change-points over 100 simulated datasets on each sequence is recorded,
as well as the adjusted Rand index (ARI),
see Rand [27] and Hubert and Arabie [16],
to measure the quality of the change-point estimation.
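The ARI between an estimated and a true set of change-points can be computed by labelling each time point with the index of the segment containing it, for example as below (the change-point locations shown are illustrative, not those of the study).

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score

def segment_labels(changepoints, T):
    """Label each of the T time points by the segment it falls into,
    given sorted change-point locations."""
    labels = np.zeros(T, dtype=int)
    for cp in sorted(changepoints):
        labels[cp:] += 1
    return labels

T = 2000                                   # illustrative sequence length
true_cps, est_cps = [500, 1000, 1500], [498, 1003, 1497]
print(adjusted_rand_score(segment_labels(true_cps, T), segment_labels(est_cps, T)))
```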
In the application of the sparse likelihood algorithm,
we select and for ,
and ,
for a total of window lengths.
We select critical value and parameters ,
.
Wang and Samworth [30] showed that INSPECT achieves average ARI of 0.90 when and either 0.73 (for ) or 0.74
(for and 40) when ,
comparable to sparse likelihood,
see Table 2.
In addition to INSPECT,
Wang and Samworth [30] considered DC, SBS and scan,
as well as the CUSUM aggregration algorithms of Jirak [19] and Horváth and Hušková [15],
with average ARI in the range 0.77–0.87 when and 0.68–0.72 when .
Table 2: Number of change-points estimated by the sparse likelihood algorithm and the average ARI over 100 simulated datasets.
Since ,
by Markov’s inequality .
The proof of applies Lemmas 1 and 2 below.
Lemma 1 says that the sum of sparse likelihood scores under is bounded below
by a value close to zero,
with large probability.
Lemma 2 provides a lower bound to the increase in score when the p-value is divided by at least 2.
Their proofs are at the end of Appendix A.
Lemma 1.
Let ,
with Uniform(0,1).
For fixed and ,
Lemma 2.
For fixed,
for some and for some
such that ,
for large .
Proof of Theorem 1.
Let , and be fixed.
Let be such that
and let
The additional assumptions of for and for are not restrictive because increases stochastically with .
Case 1: .
Let
For large,
for .
Moreover for all .
Hence by Lemma 1 and Lemma 2 with ,
with probability tending to 1,
Since is binomial with mean
with for ,
and since is subpolynomial in ,
we conclude .
Case 2: .
Let
For large,
for .
Hence by Lemma 1 and Lemma 2 with ,
with probability tending to 1,
Proof of Theorem 2(a)ii.
Proceed as in the proof of Theorem 2(a)i.,
but with ,
and probability under which,
independently for ,
with probability and otherwise.
When ,
(29) holds.
When ,
.
By the law of large numbers,
.
Hence by (B) it suffices to show (32) with
Consider and let be the change-point satisfying the conditions in the definition of
.
Let if and otherwise.
We assume without loss of generality that .
To aid in the checking of the proof of Theorem 3,
we provide here the key ideas.
Let be such that
Consider fixed and for some .
Since ,
it follows from (11) that for large
we are able to find close to such that
(42)
Recall that and let
.
Let
(43)
with when and when
.
It follows from Lemmas 1 and 2 that with probability tending to 1,
for .
Since the penalty ,
and ,
to show ,
it suffices to show that there exists such that
(44)
Proof of Theorem 3(a)i.
Consider .
Since ,
and ,
for large there exists
such that for all ,
there exists satisfying
(45)
Hence when ,
(46)
where .
Let .
Let and .
Since (0,1) and for large,
by Lemmas 1 and 2,
with probability tending to 1,
Since the penalty and ,
it follows that .
Proof of Theorem 3(a)ii.
Case 1: for .
Since and ,
for large there exists satisfying such that whenever ,
For ,
the solution of to (59) is at least 2 and the maximum in (55) is attained at .
We conclude (56).
For ,
the solution of to (59) lies in the interval .
We conclude (57) from (55) and a rearrangement of (58).
Proof of Theorem 4(b).
For ,
let be the maximizer in
(60)
Let for some .
Let denote probability with respect to for all and .
Let .
Let ,
,
denote probability under which,
independently for ,
with probability ,
and otherwise.
When ,
(61)
When ,
.
Let denote expectation with respect to .
Let and let denote expectation with respect to .
it follows from (E) that for
.
Since the penalty and ,
we conclude .
References
[1] Arias-Castro, E., Donoho, D. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inf. Theory 51 2402–2425.
[2] Arias-Castro, E., Donoho, D. and Huo, X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform noise. Ann. Statist. 34 326–349.
[3] Berk, R.H. and Jones, D.H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47–50.
[4] Cai, T., Jeng, X.J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. Roy. Statist. Soc. B 73 629–662.
[5] Chan, H.P. and Walther, G. (2015). Optimal detection of multi-sample aligned sparse signals. Ann. Statist. 43 1865–1895.
[6] Chan, H.P. (2017). Optimal sequential detection in multi-stream data. Ann. Statist. 45 2736–2763.
[7] Cho, H. (2016). Change-point detection in panel data via double CUSUM statistic. Electron. J. Statist. 10 2000–2038.
[8] Cho, H. and Fryzlewicz, P. (2015). Multiple change-point detection for high-dimensional time series via sparsified binary segmentation. J. Roy. Statist. Soc. B 77 475–507.
[9] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
[10] Du, C., Kao, C.L. and Kou, S.C. (2016). Stepwise signal extraction via marginal likelihood. J. Amer. Statist. Assoc. 111 314–330.
[11] Dümbgen, L. and Spokoiny, V.G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124–152.
[12] Enikeeva, F. and Harchaoui, Z. (2019). High-dimensional change-point detection under sparse alternatives. Ann. Statist. 47 2051–2079.
[13] Frick, K., Munk, A. and Sieling, H. (2014). Multiscale change point inference. J. Roy. Statist. Soc. B 76 495–580.
[14] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Statist. 42 2243–2281.
[15] Horváth, L. and Hušková, M. (2014). Change-point detection in panel data. J. Time Ser. Anal. 23 631–648.
[16] Hubert, L. and Arabie, P. (1983). Comparing partitions. J. Classification 2 193–218.
[17] Ingster, Y.I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
[18] Ingster, Y.I. (1998). Minimax detection of a signal for balls. Math. Methods Statist. 7 401–428.
[19] Jirak, M. (2015). Uniform change point tests in high dimension. Ann. Statist. 43 2451–2483.
[20] Lai, T.L. and Xing, H. (2011). A simple Bayesian approach to multiple change-points. Statist. Sinica 21 539–569.
[21] Liu, H., Gao, C. and Samworth, R. (2021). Minimax rates in sparse high-dimensional change-point detection. Ann. Statist. 49 1081–1112.
[22] Mei, Y. (2010). Efficient scalable schemes for monitoring a large number of data streams. Biometrika 97 419–433.
[23] Niu, Y.S., Hao, N. and Zhang, H. (2016). Multiple change-point detection: a selective overview. Statist. Sci. 31 611–623.
[24] Niu, Y.S. and Zhang, H. (2012). The screening and ranking algorithm to detect DNA copy number variation. Ann. Appl. Statist. 6 1306–1326.
[25] Olshen, A.B., Venkatraman, E.S., Lucito, R. and Wigler, M. (2003). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatist. 5 557–572.
[26] Pilliat, E., Carpentier, A. and Verzelen, N. (2020). Optimal multiple change-point detection for high-dimensional data. arXiv preprint arXiv:2011.07818.
[27] Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. J. Amer. Statist. Assoc. 66 846–850.
[28] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Statist. 40 752–769.
[30] Wang, T. and Samworth, R. (2018). High dimensional change point estimation via sparse projection. J. Roy. Statist. Soc. B 80 57–83.
[31] Wang, Y. and Mei, Y. (2015). Large-scale multi-stream quickest change detection via shrinkage post-change estimation. IEEE Trans. Inf. Theory 61 6926–6938.
[32] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.
[33] Willsky, A. and Jones, H. (1976). A generalized likelihood ratio approach to the detection and estimation of jumps in linear systems. IEEE Trans. Autom. Control 21 108–112.
[34] Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692.
[35] Yao, Y.C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447.
[36] Zhang, N.R. and Siegmund, D. (2007). A modified Bayes information criterion with applications to the analysis of comparative genomic hybridization. Biometrics 63 22–52.
[37] Zhang, N.R., Siegmund, D., Ji, H. and Li, J.Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.