
Change point detection and image segmentation for time series of astrophysical images

Cong Xu Department of Statistics, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA Hans Moritz Günther MIT, Kavli Institute for Astrophysics and Space Research, 77 Massachusetts Avenue, Cambridge, MA 02139, USA Vinay L. Kashyap Center for Astrophysics | Harvard & Smithsonian, 60 Garden Street, Cambridge, MA 02138, USA Thomas C. M. Lee Department of Statistics, University of California, Davis, One Shields Avenue, Davis, CA 95616, USA Andreas Zezas Department of Physics, University of Crete, 710 03 Heraklion, Crete, Greece
Abstract

Many astrophysical phenomena are time-varying, in the sense that their intensity, energy spectrum, and/or the spatial distribution of the emission suddenly change. This paper develops a method for modeling a time series of images. Under the assumption that the arrival times of the photons follow a Poisson process, the data are binned into 4D grids of voxels (time, energy band, and x-y coordinates) and viewed as a time series of non-homogeneous Poisson images. The method assumes that, at each time point, the corresponding multi-band image stack is an unknown 3D piecewise constant function observed with Poisson noise, and that all image stacks between any two adjacent change points (in the time domain) share the same unknown piecewise constant function. The proposed method estimates the number and locations of all the change points (in the time domain), as well as the unknown piecewise constant function within each pair of consecutive change points, by applying the minimum description length (MDL) principle. A practical algorithm is also developed to solve the corresponding complicated optimization problem. Simulation experiments show that the proposed method enjoys very promising empirical properties, and applications to two real datasets, an XMM-Newton observation of a flaring star and an observation of an emerging solar coronal loop, illustrate its usage and the scientific insight gained from it.

Poisson distribution (1898), Maximum likelihood estimation (1901), Detection (1911), Spatial point processes (1915), Time series analysis (1916), Astronomy data analysis (1858)
facilities: XMM-Newton (EPIC), SDO (AIA)
software: CIAO (Fruscione et al., 2006), astropy (Astropy Collaboration et al., 2013, 2018), IDL (https://www.l3harrisgeospatial.com/Software-Technology/IDL), SolarSoft (Freeland & Handy, 2012), Automark (Wong et al., 2016), and 4D_Automark (doi:10.5281/zenodo.4451396), a Python package implementing the proposed method, available at https://github.com/kevinxucong/4D_Automark.

1 Introduction

Many phenomena in the high-energy universe are time-variable, from coronal flares on the smallest stars to accretion events in the most massive black holes. Often this variability can be seen “by eye,” but at other times we need robust statistical methods to distinguish random noise from significant variability. Knowing where a change has occurred is critical for subsequent scientific analyses, e.g., spectral fitting and light curve modeling. Such analyses must focus on those intervals in data space that are properly tied to the changes in the physical processes that generate the observed photons. Therefore, it is important to identify sources as well as to locate their spatial boundaries. Our goal is to detect change points in the time direction; that is, the times at which sudden changes happened in the underlying astrophysical process.

Change point detection in time series is well studied, and several algorithms employing different philosophies have been developed. For example, Aue & Horváth (2013) employed hypothesis testing to study structural break detection, using both non-parametric approaches like the cumulative sum (CUSUM) and parametric methods like the likelihood ratio statistic to deal with different kinds of structural breaks. Another likelihood-based approach commonly used to analyze astronomical time series is Bayesian Blocks (Scargle et al., 2013), which detects change points by fitting piecewise constant models to the intervals between them. A good example of the model driven approach is the Auto-PARM procedure developed by Davis et al. (2006). By modeling the piecewise-stationary time series, the procedure is able to simultaneously estimate the number of change points, their locations, and the parametric model for each piece. Here the minimum description length (MDL) principle of Rissanen (1989, 2007) is applied in the model selection procedure. Davis & Yau (2013) proved the strong consistency of the MDL-based change point detection procedure. Another example is Automark by Wong et al. (2016), an MDL-based methodology that detects changes in the observed emission from astronomical sources in 2-D time-wavelength space.

Dey et al. (2010) loosely classified image segmentation techniques into two categories: (i) image driven approaches and (ii) model driven approaches. Image driven segmentation techniques are mainly based on the discrete pixel values of the image. For example, the graph-based algorithm of Felzenszwalb & Huttenlocher (2004) treats pixels as vertices, with edge weights based on the similarity between the features of pixels; the evidence for a boundary between two regions can then be measured on the graph. Such methods work in many complicated cases because they do not assume an underlying model for the image. Model driven approaches, in contrast, rely on information about the structure of the image and are based on the assumption that pixels in the same region have similar characteristics. The Blobworld framework of Carson et al. (1999) assumes that the features of pixels come from an underlying multivariate Gaussian mixture model; neighboring pixels whose features are drawn from the same Gaussian component are grouped into the same region.

Here we follow the model driven approach. In order to develop a change point detection method for image time series data, we begin by specifying an underlying statistical model for the images between any two consecutive change points. In doing so we also study the statistical properties of the change point detection method. We assume the underlying Poisson rate for each of the images follows a piecewise constant function. Therefore, the region growing algorithm developed by Adams & Bischof (1994) for grey-value images can be naturally applied.

Given the previous successes of applying the MDL principle (Rissanen, 1989, 2007) to other time series change point detection and image segmentation problems (e.g., Davis et al., 2006; Lee, 2000; Wong et al., 2016), here we also use MDL to tackle our problem of joint change point detection and image segmentation for time series of astronomical images. Briefly, MDL defines the best model as the one that produces the best lossless compression of the data. There are different versions of MDL, and the one we use is the so-called two-part code; a gentle introduction can be found in Lee (2001). Compared with other versions of MDL such as normalized maximum likelihood, one advantage of the two-part version is that it tends to be more computationally tractable for complex problems such as the one this paper considers. It has also been shown to enjoy excellent theoretical and empirical properties in other model selection tasks (e.g., Aue & Lee, 2011; Lee, 2000; Davis et al., 2006; Davis & Yau, 2013). Based on MDL, we develop a practical algorithm that can be applied to simultaneously estimate the number and locations of the change points, as well as to perform image segmentation on each of the images.

2 Methodology

Our method is applied to 4-D data cubes where 2-D spatial slices in several energy passbands are stacked in time. Such data cubes are commonly available, though in high-energy astrophysics, data are usually obtained in the form of a list of photons. The list contains the two-dimensional spatial coordinates where the photons were recorded on the detector, the times they were recorded, and their energies or wavelengths. To facilitate our analysis, we bin these data into a 4-D rectangular grid of boxes. After the binning of the original data, we obtain a 4D table of photon counts indexed by the two-dimensional coordinates $(x,y)$, time index $t$, and energy band $w$. The dataset is thus a series of multi-band images with counts of photons as the values of the pixels. Since the emission times of photons can be considered a non-homogeneous Poisson process, and the grid cells do not overlap with each other, the counts in each pixel are independent, and the image slices are also independent.
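
As a concrete illustration of this binning step, the sketch below uses numpy to turn arrays of photon positions, arrival times, and energies into the $N_{\rm I}\times N_{\rm T}\times N_{\rm W}$ counts tensor $\{y_{i,t,w}\}$; the array and function names are ours and the example data are simulated, so this is not code from the released 4D_Automark package.

```python
import numpy as np

def bin_event_list(x, y, t, energy, x_edges, y_edges, t_edges, e_edges):
    """Bin a photon event list into a counts tensor indexed by
    (pixel, time bin, energy band), following the 4-D gridding
    described in the text."""
    counts4d, _ = np.histogramdd(
        np.column_stack([x, y, t, energy]),
        bins=[x_edges, y_edges, t_edges, e_edges])
    # Flatten the two spatial axes into a single pixel index i,
    # giving the N_I x N_T x N_W tensor {y_{i,t,w}}.
    n_x, n_y, n_t, n_w = counts4d.shape
    return counts4d.reshape(n_x * n_y, n_t, n_w)

# Example with simulated photons:
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 64, 10000), rng.uniform(0, 64, 10000)
t = rng.uniform(0, 60, 10000)
energy = rng.uniform(200, 10000, 10000)
counts = bin_event_list(x, y, t, energy,
                        np.arange(65), np.arange(65),
                        np.arange(61), [200, 1000, 3000, 10000])
print(counts.shape)   # (4096, 60, 3)
```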

We first partition these images into a set of non-overlapping region segments using a seeded region growing (SRG) method, and then merge adjacent segments to minimize the MDL (see Section 2.1). The counts in each segment are modeled as Poisson counts (see Section 2.2; the implementation details of the algorithm are described in Section 2.3). We minimize the MDL criterion across the images by iteratively removing change points along the time axis and applying the SRG segmentation to the images in each of the time intervals. Key pixels that are influential in how the segmentations and change points are determined are then identified by searching for changes in the fitted intensities (see Section 2.4). Such regions are the focus of follow-up analyses.

We list the variables, parameters, and notation used here in Table 1.

Table 1: Statistical Notations
Notation Definition
$N_{\rm I}$ number of pixels in each 2-D spatial image
$N_{\rm T}$ number of time bins
$N_{\rm W}$ number of energy bands
$\Delta T_{t}$ duration of the $t^{\rm th}$ time bin
$y_{i,t,w}$ photon counts within the $i^{\rm th}$ spatial pixel, the $t^{\rm th}$ time interval, and the $w^{\rm th}$ energy range
$\lambda_{i,t,w}$ Poisson rate for the $i^{\rm th}$ spatial pixel, the $t^{\rm th}$ time interval, and the $w^{\rm th}$ energy range
$K$ number of change points
$\tau_{k}$ location of the $k^{\rm th}$ change point
$m^{(k)}$ number of region segments for the $k^{\rm th}$ interval between two consecutive change points
$a_{h}^{(k)}$ area (number of pixels) of the $h^{\rm th}$ region segment of the $k^{\rm th}$ interval between two consecutive change points
$b_{h}^{(k)}$ “perimeter” (number of pixel edges between this and neighboring regions) of the $h^{\rm th}$ region of the $k^{\rm th}$ interval
$\mu_{h,w}^{(k)}$ Poisson rate for the $h^{\rm th}$ region segment and the $w^{\rm th}$ energy range of the $k^{\rm th}$ interval
$\hat{\mu}_{h,w}^{(k)}$ fitted Poisson rate for the $h^{\rm th}$ region segment and the $w^{\rm th}$ energy range of the $k^{\rm th}$ interval

2.1 Region Growing and Merging

As a first step in the analysis, a suitable segmentation method must be applied to the images to delineate regions of interest (ROIs). For this, we use the seeded region growing (SRG) method of Adams & Bischof (1994) to obtain a segmentation of the image. We choose SRG over other image segmentation algorithms for its speed and reliability (Fan & Lee, 2015). It can also be straightforwardly adapted to the Poisson setting.

At the beginning of SRG, we select a set of seeds, manually or automatically, from the image. Each seed can be a single pixel or a set of connected pixels, and each seed constitutes an initial region. Each region then grows outward until the whole image is covered. (Section 2.3.2 offers some suggestions on the selection of initial seeds.)

At each step, the unlabelled pixels which are neighbors to at least one of the current regions comprise the set of candidates for growing the region. One of these candidates is selected to merge into the region, based on the Poisson likelihood that measures the similarity between a candidate pixel and the corresponding region. We repeat this process until all the pixels are labeled, thus producing an initial segmentation by SRG.

At the end of the SRG process, we are left with an oversegmentation, i.e., with the image split into a larger than optimal number of segments. We then merge these segments based on the largest reduction or smallest increase in the MDL criterion (see below). From this sequence of segmentations, we select the one that gives the smallest value of the MDL criterion as the final ROIs.
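
The growing step can be sketched as follows for a single-band counts image. This simplified version is our own illustration, not the 4D_Automark implementation: it grows each region by repeatedly absorbing the unlabeled neighboring pixel whose counts are most likely under the adjacent region's current mean rate, it does not refresh the priorities of already-queued candidates, and it omits the multi-band, multi-image likelihood and the subsequent MDL-based merging.

```python
import heapq
import numpy as np

def srg_poisson(image, seeds):
    """Simplified seeded region growing on a 2-D counts image.

    `image` is an (ny, nx) array of photon counts; `seeds` is a list of
    (row, col) pixels, one per initial region.  At each step the
    unlabeled neighbor with the highest Poisson log-likelihood under
    the mean of the adjacent region is merged into that region.
    """
    ny, nx = image.shape
    labels = -np.ones((ny, nx), dtype=int)   # -1 marks unlabeled pixels
    sums = np.zeros(len(seeds))              # total counts per region
    areas = np.zeros(len(seeds))             # number of pixels per region
    heap = []

    def loglik(count, rate):
        rate = max(rate, 1e-12)
        return count * np.log(rate) - rate

    def push_neighbors(r, c, lab):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < ny and 0 <= cc < nx and labels[rr, cc] < 0:
                rate = sums[lab] / areas[lab]
                # heapq is a min-heap, so push the negative log-likelihood
                heapq.heappush(heap, (-loglik(image[rr, cc], rate), rr, cc, lab))

    for lab, (r, c) in enumerate(seeds):
        labels[r, c] = lab
        sums[lab] += image[r, c]
        areas[lab] += 1
    for lab, (r, c) in enumerate(seeds):
        push_neighbors(r, c, lab)

    while heap:
        _, r, c, lab = heapq.heappop(heap)
        if labels[r, c] >= 0:
            continue                 # stale entry: pixel already claimed
        labels[r, c] = lab
        sums[lab] += image[r, c]
        areas[lab] += 1
        push_neighbors(r, c, lab)
    return labels
```

In the full method the candidate similarity is computed from all images and bands within a candidate time interval, and the resulting oversegmentation is then merged under the MDL criterion of Section 2.2.4.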

2.2 Modeling a Poisson Image Series

2.2.1 Input Data Type

We require that the data are binned into photon counts in an $N_{\rm I}\times N_{\rm T}\times N_{\rm W}$ tensor $\{y_{i,t,w}\}$, $i=1,...,N_{\rm I}$, $t=1,...,N_{\rm T}$, $w=1,...,N_{\rm W}$, where $y_{i,t,w}$ is the photon count within the $i^{\rm th}$ spatial rectangular region, the $t^{\rm th}$ time interval $[T_{t-1},T_{t})$, and the $w^{\rm th}$ energy range $[W_{w-1},W_{w})$. After binning, the data can be viewed as a time series of images in different energy bands. The values of each pixel are the photon counts in the different bands in the corresponding spatial region.

Notice that compared with Automark (Wong et al., 2016), we incorporate 2-D spatial information into the model, thus extending the analysis from two (wavelength/energy and time) to four dimensions (wavelength/energy, time, and projected sky location). We also relax the restriction that the bin sizes along any of the axes are held fixed. Thus, sharp changes are more easily detected.

As the data in high-energy astrophysics are photon counts, we use a Poisson process to model the data,

y_{i,t,w} \stackrel{i.i.d.}{\sim} \text{Poisson}(\lambda_{i,t,w}\,\Delta T_{t}), (1)

where $\Delta T_{t}=(T_{t}-T_{t-1})$.

Our goal is to infer the model intensities $\lambda_{i,t,w}$ from the observed counts data $\{y_{i,t,w}\}$. We are especially interested in detecting significant changes of $\lambda_{i,t,w}$ over time. If there are changes, we also want to estimate the number and locations of the change points.

To simplify the presentation, we first develop a time-homogeneous model, i.e., one where there are no change points and $\lambda_{i,t,w}$ is unchanging with $t$ (Section 2.2.2). We then consider more complex cases, where change points are added to the model so that $\lambda_{i,t,w}$ is allowed to change over time (Section 2.2.3).

2.2.2 Piecewise Constant Model

First consider a temporally homogeneous Poisson model without any change points. Then each image can be treated as an independent Poisson realization of the same, unknown, true image.

We model the image as a 3-dimensional piecewise constant function. That is, the 2-dimensional space of $x$-$y$ coordinates is partitioned into $m$ non-overlapping regions such that all the pixels in a given region have the same Poisson intensity. Different energy bands share the same spatial partitioning. Rigorously, the Poisson parameter $\lambda_{i,t,w}$ can be written as a summation of region-specific Poisson rates $\mu_{h,w}$ times the corresponding indicator functions of regions ($I_{\{i\in R_{h}\}}$ is 1 for pixel $i$ in region $R_{h}$ and 0 otherwise) in the following format:

\lambda_{i,t,w}=\sum_{h=1}^{m}\mu_{h,w}I_{\{i\in R_{h}\}}. (2)

Here $i\in R_{h}$ means “the $i^{\rm th}$ pixel is in the $h^{\rm th}$ region” and $I$ is the indicator function. $R_{h}$ is the index set of the pixels within the $h^{\rm th}$ region, with $R_{h}\subseteq\{1,...,N_{\rm I}\}$. Also, $\mu_{h,w}$ is the Poisson rate for the $w^{\rm th}$ band of the $h^{\rm th}$ region. The partition of the image is specified by $\bm{R}=\{R_{h}|h=1,...,m\}$.

2.2.3 Adding Change Points to the Model

Now we allow the underlying Poisson parameter $\lambda_{i,t,w}$ to change over time $t$. We model $\lambda_{i,t,w}$ as a piecewise constant function of $t$.

Suppose these $N_{\rm T}$ images can be partitioned into $K+1$ homogeneous intervals by $K$ change points

\bm{\tau}=\{\tau_{0}=0,~\tau_{1},\tau_{2},...,\tau_{K},~\tau_{K+1}=N_{\rm T}\}.

For the $t^{\rm th}$ image, suppose that it belongs to the $k^{\rm th}$ time interval; i.e., $t\in(\tau_{k-1},\tau_{k}]$. For each given $t$, let $\lambda$ be a two-dimensional piecewise constant function with $m^{(k)}$ constant regions. Then $\lambda$ can be represented by

\lambda_{i,t,w}=\sum_{k=1}^{K+1}I_{\{t\in(\tau_{k-1},\tau_{k}]\}}\sum_{h=1}^{m^{(k)}}\mu_{h,w}^{(k)}I_{\{i\in R_{h}^{(k)}\}}, (3)

where $m^{(k)}$ is the number of regions within the $k^{\rm th}$ interval, and $i\in R_{h}^{(k)}$ means “the $i^{\rm th}$ pixel is in the $h^{\rm th}$ region of the $k^{\rm th}$ interval”. Let $\mathcal{M}=\{m^{(k)}|k=1,2,...,K+1\}$. The partition of the images within interval $k$ is specified by $\bm{R}^{(k)}=\{R_{h}^{(k)}|h=1,...,m^{(k)}\}$, and the overall partition is $\mathcal{R}=\{\bm{R}^{(k)}|k=1,2,...,K+1\}$. The Poisson rate $\mu_{h,w}^{(k)}$ is the value for the $w^{\rm th}$ band in the $h^{\rm th}$ region of the $k^{\rm th}$ interval. Let $\bm{\mu}^{(k)}=\{\mu_{h,w}^{(k)}|h=1,...,m^{(k)},w=1,...,N_{\rm W}\}$ and $\bm{\mu}=\{\bm{\mu}^{(k)}|k=1,...,K+1\}$.
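
To make the bookkeeping in equation (3) concrete, here is a small sketch (with illustrative names of our own) that assembles $\lambda_{i,t,w}$ from the change points, the per-interval region label maps, and the per-interval rates.

```python
import numpy as np

def lambda_from_model(tau, region_labels, mu, n_pixels, n_times, n_bands):
    """Build lambda_{i,t,w} from a piecewise-constant model (equation 3).

    tau           : change points [tau_1, ..., tau_K] with 0 < tau_k < N_T
    region_labels : list of K+1 integer arrays of length n_pixels; entry h
                    assigns pixel i to region h of interval k
    mu            : list of K+1 arrays of shape (m^{(k)}, n_bands)
    """
    lam = np.empty((n_pixels, n_times, n_bands))
    bounds = [0] + list(tau) + [n_times]
    for k in range(len(bounds) - 1):
        # every image in interval (tau_{k-1}, tau_k] shares the same 2-D map
        lam[:, bounds[k]:bounds[k + 1], :] = mu[k][region_labels[k]][:, None, :]
    return lam
```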

2.2.4 Model Selection Using MDL

Given the observed images $\{y_{i,t,w}\}$, we aim to obtain an estimate of $\lambda_{i,t,w}$. In other words, we want an estimate of the image partitions and the Poisson rates of the regions for each band. It is straightforward to estimate the Poisson intensities given the region partitioning, but the partitioning is a much more complicated model selection problem.

We will apply MDL to select the best fitting model. Loosely speaking, the idea behind MDL for model selection is to first obtain a MDL criterion for each possible model, and then define the best fitting model as the minimizer of this criterion. MDL defines the best model as the one that produces the best compression of the data. The criterion can be treated as the code length, or amount of hardware memory required to store the data.

First we present the MDL criterion for the homogeneous Poisson model, then follow it by the MDL criterion for the general case (i.e., with change points).

Following similar arguments as in Lee (2000) (see their Appendix B), the MDL criterion for segmenting $N_{\rm T}$ homogeneous images is

\text{MDL}(m,\bm{R},\hat{\bm{\mu}})=m\log(N_{\rm I})+\frac{\log(3)}{2}\sum_{h=1}^{m}b_{h}+\frac{N_{\rm W}}{2}\sum_{h=1}^{m}\log(N_{\rm T}a_{h})-\sum_{w=1}^{N_{\rm W}}\sum_{t=1}^{N_{\rm T}}\sum_{h=1}^{m}\sum_{i\in R_{h}}y_{i,t,w}\log(\hat{\mu}_{h,w}), (4)

where $a_{h}$ and $b_{h}$ are, respectively, the “area” (number of pixels) and “perimeter” (number of pixel edges) of region $R_{h}$, and

\hat{\mu}_{h,w}=\frac{1}{a_{h}\sum_{t=1}^{N_{\rm T}}\Delta T_{t}}\sum_{t=1}^{N_{\rm T}}\sum_{i\in R_{h}}y_{i,t,w} (5)

is the maximum likelihood estimate of the Poisson rate in the corresponding region. Note that the indices of $\hat{\bm{\mu}}=\{\hat{\mu}_{h,w}\}$ run over the region segments $h=1,...,m$ and the passbands $w=1,...,N_{\rm W}$.
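
The per-interval criterion translates directly into code. The sketch below is our own transcription of equations (4)-(6), not the packaged implementation; it assumes the counts for one candidate interval and a precomputed segmentation, with region perimeters supplied by the caller.

```python
import numpy as np

def interval_mdl(counts, labels, perimeters, dt_total):
    """MDL criterion for one homogeneous interval (cf. equations 4-6).

    counts     : (N_I, T_k, N_W) photon counts of the images in the interval
    labels     : length-N_I integer region assignment, values in {0, ..., m-1}
    perimeters : length-m array of region perimeters b_h (pixel edges)
    dt_total   : sum of the time-bin widths in the interval
                 (equal to T_k = tau_k - tau_{k-1} when Delta T_t = 1)
    """
    labels = np.asarray(labels)
    n_pix, t_k, n_w = counts.shape
    m = labels.max() + 1
    areas = np.bincount(labels, minlength=m)                  # a_h
    # counts summed over time and over the pixels of each region, per band
    region_counts = np.zeros((m, n_w))
    np.add.at(region_counts, labels, counts.sum(axis=1))
    mu_hat = region_counts / (dt_total * areas[:, None])      # equation (5)
    loglik = np.sum(region_counts * np.log(np.where(mu_hat > 0, mu_hat, 1.0)))
    return (m * np.log(n_pix)
            + 0.5 * np.log(3) * perimeters.sum()
            + 0.5 * n_w * np.sum(np.log(t_k * areas))
            - loglik)
```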

For the Poisson model with change points, once the number of change points $K$ and the locations $\bm{\tau}=\{\tau_{1},...,\tau_{K}\}$ are specified, for each $k\in\{1,2,...,K+1\}$, $m^{(k)}$ and $\bm{R}^{(k)}$ can be estimated independently. Using the previous argument, the MDL criterion for images within the same homogeneous interval is

\text{MDL}(\tau_{k-1},\tau_{k},m^{(k)},\bm{R}^{(k)},\hat{\bm{\mu}}^{(k)})=m^{(k)}\log(N_{\rm I})+\frac{\log(3)}{2}\sum_{h=1}^{m^{(k)}}b_{h}^{(k)}+\frac{N_{\rm W}}{2}\sum_{h=1}^{m^{(k)}}\log((\tau_{k}-\tau_{k-1})a_{h}^{(k)})-\sum_{w=1}^{N_{\rm W}}\sum_{t=\tau_{k-1}+1}^{\tau_{k}}\sum_{h=1}^{m^{(k)}}\sum_{i\in R_{h}^{(k)}}y_{i,t,w}\log(\hat{\mu}_{h,w}^{(k)}). (6)

Then the overall MDL criterion for the model with change points is

\text{MDL}_{\text{overall}}(K,\bm{\tau},\mathcal{M},\mathcal{R},\hat{\bm{\mu}})=K\log(N_{\rm T})+\sum_{k=1}^{K+1}\text{MDL}(\tau_{k-1},\tau_{k},m^{(k)},\bm{R}^{(k)},\hat{\bm{\mu}}^{(k)}). (7)

To sum up, using the MDL principle, the best-fit model is defined as the minimizer of the criterion (7). The next subsection presents a practical algorithm for carrying out this minimization.

2.2.5 Statistical Consistency

An important step in demonstrating the efficacy of our method is to establish its statistical consistency. That is, if it can be shown that, as the size of the data increases, the differences between the estimated model parameters and the true values decrease to zero, then the method can be said to be free of asymptotic bias, can be applied in the general case, and is elevated above a heuristic. We prove in Appendix A that the MDL-based model selection used to choose the region partitioning, as well as the corresponding Poisson intensity parameters, is indeed strongly statistically consistent under mild assumptions that maintain the temporal variability structure of $\lambda_{i,t,w}$.

2.3 Practical Minimization

2.3.1 An Iterative Algorithm

Given its complicated structure, global minimization of $\text{MDL}_{\text{overall}}(K,\bm{\tau},\mathcal{M},\mathcal{R},\hat{\bm{\mu}})$ (Equation 7) is virtually infeasible when the number of images $N_{\rm T}$ and the number of pixels $N_{\rm I}$ are not small, because the time complexity of the exhaustive search is of order $2^{N_{\rm T}N_{\rm I}}$.

Figure 1: Schematic illustration of the minimization algorithm.

We iterate the following two steps to (approximately) minimize the MDL criterion (7).

  1. Given a set of change points, apply the image segmentation method to all the images belonging to the first homogeneous time interval and obtain the MDL best-fitting image for this interval. Repeat this for all remaining intervals. Calculate the MDL criterion (7).

  2. Modify the set of change points by, for example, adding or removing one change point. In terms of which modification should be made, we use the greedy strategy and select the one that achieves the largest reduction of the overall MDL value in (7).

For Step 1 we begin with a large set of change points (i.e., an over-fitted model). In Step 2 we remove one change point (i.e., merge two neighboring intervals) to maximize the reduction of the MDL value. The procedure stops and declares the optimization is done if no further MDL reduction can be achieved by removing change points. This is similar to backward elimination for statistical model selection problems. See Figure 1 for a flowchart of the whole procedure.
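
A sketch of this backward-elimination loop is given below. Here `fit_interval_mdl` is an assumed callable standing in for the SRG-plus-merging fit of Section 2.1; in practice one would cache interval fits rather than recompute them at every trial removal, as is done in this simplified version.

```python
import numpy as np

def backward_elimination(counts, initial_cps, fit_interval_mdl):
    """Greedy removal of change points (Section 2.3.1).

    counts           : (N_I, N_T, N_W) tensor of photon counts
    initial_cps      : sorted list of candidate change points (an over-fit set)
    fit_interval_mdl : callable(counts_slice) -> MDL value of the best
                       segmentation found for one homogeneous interval

    Returns the retained change points and the overall MDL value.
    """
    n_t = counts.shape[1]

    def overall_mdl(cps):
        bounds = [0] + list(cps) + [n_t]
        per_interval = sum(fit_interval_mdl(counts[:, a:b, :])
                           for a, b in zip(bounds[:-1], bounds[1:]))
        return len(cps) * np.log(n_t) + per_interval   # equation (7)

    cps = list(initial_cps)
    best = overall_mdl(cps)
    while cps:
        # try removing each change point; keep the removal that lowers MDL most
        trials = [(overall_mdl(cps[:j] + cps[j + 1:]), j) for j in range(len(cps))]
        val, j = min(trials)
        if val >= best:
            break                     # no removal reduces the MDL; stop
        best, cps = val, cps[:j] + cps[j + 1:]
    return cps, best
```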

2.3.2 Practical Considerations

Here we list some practical issues that are crucial to the success of the above minimization algorithm.

Initial seed allocation for SRG: The selection of the initial seeds in SRG plays an important role in the performance of the algorithm. To obtain a good initial oversegmentation, there must be at least one seed within each true region. Currently we use all the local maxima as well as a subset of the square lattice as the initial seeds (a minimal sketch is given below). Based on simulation results (see Section 3.2), when the number of initial seeds is inadequate, the SRG will underfit the images, which in turn leads to an overfitting of the change points and an increased false positive rate. On the other hand, it can be time-consuming when the number of initial seeds is very large, especially for high resolution images. We developed an algorithm that allocates initial seeds automatically based on locating local maxima. However, an optimal selection of initial seeds almost certainly requires expert intervention, as it depends on the type of data being analyzed. To reduce the chance of obtaining a poor oversegmentation with SRG, in practice one could try different sets of initial seeds and select the oversegmentation that gives the smallest MDL value.
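
A minimal seed-allocation sketch along these lines is shown here; it is our own illustration, and the smoothing kernel and lattice spacing are arbitrary choices rather than the settings used in the paper.

```python
import numpy as np
from scipy import ndimage

def initial_seeds(image, lattice_step=8):
    """Illustrative seed allocation: local maxima of a lightly smoothed
    image plus a sparse square lattice."""
    smoothed = ndimage.uniform_filter(image.astype(float), size=3)
    is_peak = smoothed == ndimage.maximum_filter(smoothed, size=3)
    rows, cols = np.nonzero(is_peak)
    seeds = set(zip(rows.tolist(), cols.tolist()))
    ny, nx = image.shape
    seeds.update((r, c) for r in range(0, ny, lattice_step)
                        for c in range(0, nx, lattice_step))
    return sorted(seeds)
```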

Counts per bin: The photon counts in the image pixels cannot be too small, otherwise the algorithm may fail to produce meaningful output; see Section 3.2. In some sense, small photon counts correspond to a low signal level, which means that the proposed method requires a minimum level of signal to operate. Therefore, care must be exercised when deciding the bin size. As a rule of thumb, it should be enough to have around 100 counts in each pixel belonging to an astronomical object, while pixels from the background can have very low or even zero counts.

Initial change point selection: Although the stepwise greedy algorithm is capable of saving a significant amount of computation time, it could still be time consuming if the initial set of change points is too large, as it might need many iterations to reach a local minimum. It is recommended to select the initial change points based on prior knowledge, if available, in order to accelerate the algorithm.

Computation Time: In each iteration, the main time-consuming part is applying the SRG. When the total number of pixels and the number of seeds are large, the number of candidate pixels at each growing step is also large, so comparing all the candidates and updating the candidate list account for most of the computational burden. As an example, we found that it takes about 40 minutes to apply SRG and merging once on $64\times 64$ images with about 200 seeds on a Linux machine with an octa-core 2.90 GHz Intel Xeon processor.

2.4 Highlighting the Key Pixels

After the change points are located, it is necessary to locate the pixels or regions that contribute to the estimation of the change points. The manner by which such key pixels are identified depends on the scientific context. Below we present two methods that are applicable to the real-world examples we discuss in Section 4.

We focus here on images in a single passband, i.e., grey-valued images. For multi-band images, one can first transform a multi-band image into a single-band image by, for example, summing the pixel values in the different bands, or by using only the first principal component image of the multi-band image. Alternatively, one can apply the method to each band individually and merge the results from the bands.
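
A minimal sketch of these two reductions (illustrative code of our own, not part of the paper's pipeline):

```python
import numpy as np

def to_single_band(images, method="sum"):
    """Collapse an (N_I, N_W) multi-band image to one band, either by
    summing the bands or by projecting onto the first principal component."""
    if method == "sum":
        return images.sum(axis=1)
    # first principal component via SVD of the band-centered pixel matrix
    centered = images - images.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]
```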

2.4.1 Based on Pixel Differences

The first method is to highlight key pixels based on the distribution of pixel differences before and after change points. The rationale is that a pixel with different fitted values before and after a change point is a strong indicator that it is a key pixel.

Suppose the fitted values for pixel $i$ in time intervals $k$ and $(k+1)$ are $\hat{\lambda}_{i}^{(k)}$ and $\hat{\lambda}_{i}^{(k+1)}$, respectively. Given the Poisson nature of the data, we first apply a square-root transformation to normalize the fitted values. Define the difference $d_{i}$ for pixel $i$ as

d_{i}=\sqrt{\hat{\lambda}_{i}^{(k+1)}}-\sqrt{\hat{\lambda}_{i}^{(k)}}\,. (8)

A pixel is labelled as a key pixel if its $d_{i}$ is far away from the mean of all the differences. To be specific, pixel $i$ is labelled as a key pixel if

\left|\frac{d_{i}-\hat{\mu}}{\hat{\sigma}}\right|>\Phi^{-1}\left(1-\frac{1}{2}p\right), (9)

where $\hat{\mu}=\frac{1}{N_{\rm I}}\sum_{i=1}^{N_{\rm I}}d_{i}$ and $\hat{\sigma}=\frac{1}{\Phi^{-1}(3/4)}\text{MAD}$. Here $\text{MAD}=\text{median}(|d_{i}-\tilde{d}|)$ is the median absolute deviation with $\tilde{d}=\text{median}(d_{i})$, and is used to obtain a robust estimate (i.e., a measure that minimizes the effect of outliers) of the standard deviation of the $d_{i}$'s (see e.g., Rousseeuw & Croux, 1993). $\Phi^{-1}(\cdot)$ is the quantile function of the standard normal distribution, and $p$ is the pre-specified significance level.¹ Notice that by checking the sign of $d_{i}-\hat{\mu}$, we can deduce whether pixel $i$ has increased or decreased after this change point.

¹ $\Phi^{-1}$ is related to the standard normal error function, with exemplar values $\Phi^{-1}(\{0.75,0.841,0.977,0.9986,1-\frac{10^{-5}}{2},1-\frac{10^{-10}}{2},1-\frac{10^{-15}}{2}\})=\{0.6745,1,2,3,4.417,6.467,8.014\}$. We typically choose $p=10^{-15}$, corresponding to a threshold of $\Phi^{-1}(1-\frac{10^{-15}}{2})=8.014$.
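
This procedure translates directly into code; here is a sketch (our own, assuming single-band fitted intensity maps for the intervals on either side of a change point):

```python
import numpy as np
from scipy.stats import norm

def key_pixels_from_differences(lam_before, lam_after, p=1e-15):
    """Flag key pixels around a change point (Section 2.4.1).

    lam_before, lam_after : fitted single-band intensities lambda_i for the
                            intervals before and after the change point
    p                     : significance level used in equation (9)
    Returns a boolean mask of key pixels and the sign of the change.
    """
    d = np.sqrt(lam_after) - np.sqrt(lam_before)          # equation (8)
    mu_hat = d.mean()
    mad = np.median(np.abs(d - np.median(d)))
    sigma_hat = mad / norm.ppf(0.75)                      # robust scale estimate
    z = (d - mu_hat) / sigma_hat
    is_key = np.abs(z) > norm.ppf(1 - p / 2)              # equation (9)
    return is_key, np.sign(d - mu_hat)
```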

2.4.2 Based on Region Differences

Another method to locate key pixels is to compare pairs of regions. For any region in time interval $k$, there must exist at least one region in time interval $(k+1)$ such that these two regions have overlapping pixels. We then test whether the difference between the means of the pixels from these two regions is significant or not.

As before, we apply the square-root transformation to the pixels within each of the regions. Then we calculate the sample means $\hat{\mu}_{1}$ and $\hat{\mu}_{2}$ and sample variances $\hat{\sigma}_{1}^{2}$ and $\hat{\sigma}_{2}^{2}$ of these two groups of square-rooted values. Then we can, for example, test whether the difference between $\hat{\mu}_{1}$ and $\hat{\mu}_{2}$ is large enough with

\left|\frac{\hat{\mu}_{2}-\hat{\mu}_{1}}{\sqrt{\hat{\sigma}_{1}^{2}+\hat{\sigma}_{2}^{2}}}\right|>\Phi^{-1}\left(1-\frac{1}{2}p\right). (10)
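
A sketch of this test, following equation (10) as written (note that the two sample variances are summed without dividing by the sample sizes), is:

```python
import numpy as np
from scipy.stats import norm

def regions_differ(pixels_before, pixels_after, p=1e-2):
    """Two-sample test on square-root transformed pixel values from a pair
    of overlapping regions in consecutive intervals (equation 10)."""
    a, b = np.sqrt(pixels_before), np.sqrt(pixels_after)
    stat = (b.mean() - a.mean()) / np.sqrt(a.var(ddof=1) + b.var(ddof=1))
    return np.abs(stat) > norm.ppf(1 - p / 2)
```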

See Section 4 for the applications of these two methods on some real data sets.

Lastly, we note that the selection of $p$, the significance level, deserves more careful consideration. In reality one may need to carry out comparisons for many change points and energy bands, so this becomes a multiple-testing problem where the number of tests is large. Therefore, one should adjust the value of $p$ in order to control false positives.

3 Simulations

Two groups of simulations were conducted to evaluate the empirical performance of the proposed method. A specially designed $\lambda_{i,t,w}$ was used for each of the experiments. For each experiment, we tested 13 signal levels, defined as the average number of photon counts per pixel. For each signal level, 100 datasets were generated according to (1), with $\Delta T=1$. The number of spectral bands was $N_{\rm W}=3$, so $w=1,2,3$, and the number of time points was $N_{\rm T}=60$.
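
Each simulated dataset is a draw from equation (1); a minimal sketch is given below (the specific rate values are illustrative, not the rate functions of Figure 2).

```python
import numpy as np

def simulate_counts(lam, delta_t=1.0, seed=None):
    """Draw one simulated dataset y_{i,t,w} ~ Poisson(lambda_{i,t,w} * Delta T_t),
    as in equation (1)."""
    rng = np.random.default_rng(seed)
    return rng.poisson(lam * delta_t)

# e.g. a single-pixel, three-band, sixty-epoch rate with one change point
lam = np.ones((1, 60, 3)) * 5.0
lam[:, 30:, :] *= 2.0          # rate doubles after the 30th time bin
y = simulate_counts(lam, seed=1)
print(y.shape)                 # (1, 60, 3)
```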

3.1 Group 1: Single Pixel

The first group of experiments was designed to evaluate the ability of the proposed method to detect change points when there are no spatial variations. To be more specific, the size of the images is $1\times 1$; i.e., only one pixel. In other words, $N_{\rm I}=1$ and the $i$ in $\lambda_{i,t,w}$ is a dummy index. Three $\lambda_{i,t,w}$'s of increasing complexity were used:

  1. $\lambda_{i,t,w}$ is constant; i.e., no change point (as depicted in Figure 2 (a)).

  2. $\lambda_{i,t,w}$ shows intensity changes but all three bands are identical at any given $t$ (see Figure 2 (b)).

  3. $\lambda_{i,t,w}$ shows spectral changes (see Figure 2 (c)).

The first $\lambda_{i,t,w}$ was used to study the level of false positives, while the remaining two were used to study false negatives.

Figure 2: The Poisson rate functions $\lambda_{i,t,w}$ used in the simulation experiments. (a): $\lambda_{i,t,w}$ used in the single pixel experiment without change point (Section 3.1.1). The x-axis denotes the time points, while the y-axis shows the values of $\lambda_{i,t,w}$ for different bands $w$. (b): the $\lambda_{i,t,w}$ relative to the no-change case (top left), used in the single pixel simulation with changing intensity (Section 3.1.2). (c): the $\lambda_{i,t,w}$ for different passbands (marked in blue, orange and green) relative to the no-change case (top left), used in the single pixel simulation with changing intensity and spectra (Section 3.1.3). (d): the spatial structure used for the second group of experiments. The size of the image is $m=n=8$.

3.1.1 No Change Point

As there is no change point in λi,t,w\lambda_{i,t,w}, this experiment is ideal for studying the relationship between the false positive rate and the signal level; recall the latter is defined as the average number of photon counts per pixel.

The results of this experiment (together with the next two experiments) are summarized as the blue curves in Figure 3. The figure captures how well the simulation recovers the location of the change points (top left), the number of change points (top right), the excess number of change points (false positives; bottom left), and the deficit in change points (false negatives; bottom right). The top left plot reports the fraction of simulated datasets for which the set of fitted change points $\hat{\bm{\tau}}$ is identical to the set of true change points $\bm{\tau}$. The top right plot presents the fraction of simulated datasets for which the fitted number of change points $\hat{K}$ equals the true number of change points $K$. The bottom left plot shows the average false positive rate, which is defined as the average number of falsely detected change points per possible location. The bottom right plot presents the fraction of simulated datasets for which $\hat{\bm{\tau}}$ contains $\bm{\tau}$, i.e., $\bm{\tau}\subseteq\hat{\bm{\tau}}$. One can see that the false positive rate is quite stable across different signal levels. Notice that the last curve is always 1 because $\bm{\tau}$ is empty for this experiment.

Figure 3: Simulation results for Group 1 experiments (single pixel). For all four plots, the x-axes denote the logarithm of the average counts of photons over all time points and all bands. Top left: fraction of the fitted change points $\hat{\bm{\tau}}$ that are identical to the true change points $\bm{\tau}$. Top right: fraction of the fitted number of change points $\hat{K}$ that equals the true number of change points $K$. Bottom left: false positive rate. Bottom right: fraction of fitted change points containing the true change points; i.e., $\bm{\tau}\subseteq\hat{\bm{\tau}}$. Note that the legend in the bottom right plot holds for all four plots.

3.1.2 Varying Intensity

In this experiment we introduced variation in $\lambda_{i,t,w}$ by multiplying (a) and (b) in Figure 2 together. The results are reported as the green curve in Figure 3. One can see that when the signal level is small ($\log_{10}(\text{average signal level})<1.0$), increasing the signal level leads to more false positives: this range of signal levels is too small to provide enough information for the detection of the true change points.

When $\log_{10}(\text{average signal level})$ is between $1.0$ and $2.0$, the proposed method starts to be able to detect the true change points as the signal level increases. Also, the false positive rate begins to drop. One possible explanation is that for any two consecutive homogeneous time intervals, the location of the change point might not be clear when the signal level is relatively small.

As the signal level continues to increase, the proposed method becomes more successful in detecting the true change point locations. For example, when the average number of counts in each bin is greater than 100 (i.e., $\log_{10}(\text{average signal level})>2.0$), the signal is strong enough so that all the true change points can be detected successfully. We note that there are always some false positives due to the Poisson randomness, and it seems that the false positive rate stabilizes as the signal level increases.

3.1.3 Varying Spectrum

Here we allow different bands $w$ to change differently at the change points. The rate $\lambda_{i,t,w}$ was obtained by multiplying (a) and (c) in Figure 2 together. The results, which are about the same as in the previous experiment (varying intensity), are reported as the red curve in Figure 3. When the signal level is small ($\log_{10}(\text{average signal level})<1.0$), the proposed method fails to detect the true change points. When $\log_{10}(\text{average signal level})$ is between $1.0$ and $2.0$, as the signal level increases, the false positive rate begins to decrease while the true positive rate increases. When $\log_{10}(\text{average signal level})>2.0$, all the true change points can be detected successfully, while the false positive rate stays at the same level as the signal level increases.

3.2 Group 2: Spatial Structure

Instead of having a constant spatial signal (i.e., a single pixel), in this second group of experiments a spatially varying structure is introduced to study the empirical performance of the proposed method. As before, three Poisson rate functions $\lambda_{i,t,w}$ are considered. The size of the image is set to $N_{\rm I}=8\times 8$. To illustrate the importance of initial seed placement, we tested two allocation strategies: (i) we deliberately placed an inadequate number of initial seeds and (ii) we used every pixel as an initial seed.

3.2.1 No Change Point

There was no change point in this experiment and the spatial variation of $\lambda_{i,t,w}$ is given in Figure 2 (d). The results are reported as the blue curves in Figure 4. One can see that if the number of initial seeds is inadequate, the false positive rate increases as the signal level increases above $\log_{10}(\text{average signal level})>2.5$. However, this does not happen when there are a large number of initial seeds; see the blue dotted curves in Figure 4. In fact, for this and the following two experiments, our method did not detect any false positive change points when every pixel was used as an initial seed. This suggests that when the images are under-segmented, the method tends to place more false change points to compensate for data variability not explainable by the image segmentation.

Figure 4: Simulation results for Group 2 experiments (with spatial structure). For all four plots, the x-axes denote the logarithm of the average count of photons over all time points and all bands. Solid curves denote the results when the number of initial seeds was inadequate, while dotted curves show the results when every pixel was assigned as an initial seed. Top left: fraction of the fitted change points $\hat{\bm{\tau}}$ that are identical to the true change points $\bm{\tau}$. Top right: fraction of the fitted number of change points $\hat{K}$ that equals the true number of change points $K$. Bottom left: false positive rates. Bottom right: fraction of fitted change points containing the true change points; i.e., $\bm{\tau}\subseteq\hat{\bm{\tau}}$. The legend in the bottom right plot holds for all four plots.

3.2.2 Varying Intensity

In this experiment $\lambda_{i,t,w}$ was obtained by multiplying (b) and (d) of Figure 2 together, so there are three change points over time. The results are similar to the no change point case, and are summarized as the green curves in Figure 4.

3.2.3 Varying Spectrum

In this last experiment the energy bands were allowed to be different, and $\lambda_{i,t,w}$ was obtained by multiplying (c) and (d) of Figure 2 together. The results, reported as the red curves in Figure 4, are similar to those of the previous two experiments.

3.3 Empirical Conclusions

The following empirical conclusions can be drawn from the above experimental results.

  • The method works well in all cases when the signal level is sufficiently large. As a rule of thumb, for binning of the original data, it would be ideal to have 100 counts or more for each bin covering an astronomical source.

  • It is important to place enough initial seeds when applying SRG; otherwise the false positive rate will increase with the signal level. See the second paragraph of Section 2.3.2 for some practical guidelines for initial seed selection.

4 Applications to Real Data

To illustrate its usage in astrophysics, we apply the proposed method to two real datasets, which are more complicated than those in the previous section. Specifically, we selected these datasets because they show obvious time-evolving variations, allowing us to demonstrate the performance of our method.

4.1 XMM-Newton Observations of Proxima Centauri

Figure 5: Light curves of Proxima Cen in different bands. Each curve denotes the number of photons within the corresponding band at a given time point index. Vertical black lines denote the locations of the detected change points.

Proxima Centauri is the nearest star to the Sun and as such is well suited for studies of coronal activity. Like our Sun, Proxima Centauri operates an internal dynamo, which generates a stellar magnetic field. In the standard model for stellar dynamos, the magnetic field lines wind up through differential rotation. When some of the magnetic field lines reconnect, the energy is released in a stellar flare. Such flares typically show a sudden rise in X-ray emission and a more gradual decay over several hours. In flares, flux and temperature are correlated such that a higher X-ray flux corresponds to a higher temperature and thus a higher energy of the average detected photons (see Güdel, 2004, for a review of X-ray emission in stellar coronae and further references).

Despite its proximity, Proxima Centauri and its corona are unresolved in X-ray observations; it is just the point-spread function (PSF) of the telescope that distributes the incoming flux over many pixels on the detector.

4.1.1 Data

We use a dataset from XMM-Newton (Obs.ID 0049350101), in which Proxima Centauri was observed for 67 ks on 2001-08-12. Because of the high flux, the MOS cameras on XMM-Newton are highly piled-up, so we restrict our analysis to the data from the PN camera. We obtained the data from the XMM-Newton science archive hosted by the European Space Agency (ESA) at https://www.cosmos.esa.int/web/xmm-newton/xsa. The data we received were processed by ODS version 12.0.0. Our analysis is based on the filtered PN event data from the automated reduction pipeline (PPS). Güdel et al. (2002) presented a detailed analysis and interpretation of this dataset.

In our analysis, we only used a subset of photons with spatial coordinates within $[25500,27500]\times[26500,28500]$, binned into images of size $64\times 64$. We used temporal bins of width 1100.4 seconds to generate 60 images. We binned the data into three energy bands, $(200,1000]$, $(1000,3000]$, and $(3000,10000]$ eV.
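
A minimal sketch of this binning, assuming a filtered PN event list with the standard X, Y, TIME, and PI columns (the file name below is a placeholder, and PI is taken to be in eV for EPIC-pn):

```python
import numpy as np
from astropy.io import fits

with fits.open("pn_events.fits") as hdul:     # placeholder file name
    ev = hdul["EVENTS"].data

x_edges = np.linspace(25500, 27500, 65)       # 64 x 64 spatial pixels
y_edges = np.linspace(26500, 28500, 65)
t0 = ev["TIME"].min()
t_edges = t0 + 1100.4 * np.arange(61)         # 60 time bins of 1100.4 s
e_edges = [200, 1000, 3000, 10000]            # three energy bands (eV)

counts4d, _ = np.histogramdd(
    np.column_stack([ev["X"], ev["Y"], ev["TIME"], ev["PI"]]),
    bins=[x_edges, y_edges, t_edges, e_edges])
counts = counts4d.reshape(64 * 64, 60, 3)     # the tensor {y_{i,t,w}}
```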

4.1.2 Results

Figure 6: Results for Proxima Centauri. (a): the data image at time point 42 for the first band, (200, 1000] eV. (b): the corresponding fitted value $\lambda_{i,t,w}$. (c): regions that show an increase (blue) and decrease (red) in intensity prior to this time point. Compared with the previous time interval, there was a significant increase in the source at this time point. (d): as in panel (c), but for the epoch after this time point. After this time point, the brightness of the source decreased. Notice that these two bottom panels share the colorbar, where the value 1 denotes increasing and -1 denotes decreasing intensities.

Figure 5 presents the light curves for different bands as well as the locations of the detected change points. As there is only a single source of photons with negligible background signal level, the detected change points coincide with the changes of the light curves. Many change points are detected for the abrupt increase and then decrease in brightness for all the bands at the time points between 42 and 50. The time interval between 15 and 36 is detected as a homogeneous time interval, and the variation in light curves within this interval is viewed as common Poisson variations. A few change points are detected for the time interval before time point 15 and the interval after 50. A piecewise constant model is used to fit these gradual changes in intensities.

The fitted images can be found in Figure 6. The source of most photons, a point source, is modeled by different piecewise constant models at all these time intervals. The center of the point source and the wings of the PSF region nearby were fitted by models with shapes like concentric circles.

Because we know what the answer should be for this dataset, we also use it to test our method for finding the regions of significant change. As the observations in the different bands change simultaneously, we combine all three bands to highlight the key pixels. That is, we highlight the regions for each band and take the intersection of these regions. Examples of the results based on the method in Section 2.4.2 can be found in Figure 6. The method indeed picks out the point source. With significance level $p=10^{-2}$, the abrupt increase in brightness of the source at time points 41 and 42, as well as the sudden decrease at time point 43, can be detected successfully. By modifying the significance level, different sensitivities can be achieved.

4.2 Isolated Evolving Solar Coronal Loop

Figure 7: An isolated loop structure shown lighting up in 3 SDO/AIA passbands. Each row corresponds to the intensities in AIA filter images, averaged over the time duration found by our method, going from interval 1 (top) to interval 4 (bottom). The columns, going from left to right, show the 94, 335, and 131 Å filter band images. The filter name, time duration, and the image sequence indices are marked at the top of each image and the intensity scale is marked at the bottom. The grid at the bottom of each image denotes the pixelation, with each image having a size of $64\times 64$ pixels. Notice that the isolated loop becomes bright enough for detection in the 3rd interval.
Figure 8: Intensities $\lambda_{i,t,w}$ as fit to the data from Figure 7 in spatial segments. The images are arranged in the same manner, and demonstrate that the loop structure is locatable and identifiable. The number of region segments found is also marked.

Images of the solar corona constitute a legitimate Big Data problem. Several observatories have been collecting images in extreme ultra-violet (EUV) filters and in X-ray passbands for several decades, and attempts to analyze them with automated routines to pick out interesting changes have been largely unsuccessful. Catalogs like the HEK (Heliophysics Events Knowledgebase; Hurlburt et al., 2012; Martens et al., 2012) can detect and mark features of particular varieties, though these compilations remain beset by incompleteness (see, e.g., Aggarwal et al., 2018; Hughes et al., 2019; Barnes et al., 2017). In this context, our method provides a way to identify and locate regions in images where something interesting has transpired, without limiting the search to a particular feature set. As a proof of concept, we apply the method to a simple case of an isolated coronal loop filling with plasma, as observed with the Solar Dynamics Observatory's Atmospheric Imaging Assembly (SDO/AIA) filters (Pesnell et al., 2012; Lemen et al., 2012). Considerable enhancements must still be made in order to lower the computational cost before the method can be applied to full size images at faster than the observed cadence; however, we demonstrate here that a well-defined region of interest can be selected without manual intervention for a dataset that consists of images in several filters.

4.2.1 Data

In particular, here we consider AIA observations carried out on 2014-Dec-11 between 19:12 UT and 19:23 UT, and focus on a $64\times 64$ pixel region located $(+1'',-271'')$ from disk center, in which a small, isolated, well-defined loop appeared at approximately 19:19 UT. This region was selected solely as a test case to demonstrate our method; the appearance of the loop is clear and unambiguous, with no other event occurring nearby to confuse the issue; see Figure 7. We apply our method to these data, downloaded using the SDO AIA Cutout Service (https://www.lmsal.com/get_aia_data/), and demonstrate that the loop (and it alone) is detected and identified; see Figures 8, 9, and 10. AIA data are available in 6 filter bands, centered at 211, 94, 335, 193, 131, and 171 Å. Here, we have limited our analysis to 3 bands, 94, 335, and 131 Å, in which the isolated loop is easily discernible to the eye (a full analysis including all the filters does not change the results). Each filter consists of a sequence of 54 images, and while they are not obtained simultaneously, the difference in time between the bands is negligible on the timescale over which the loop evolves.

4.2.2 Results

Figure 9: Demonstrating the isolation of key pixels of interest. Each set of three shows the fitted intensity in one passband in the 3rd interval (left), followed by a bitmap of pixels (middle) showing where intensity increases (blue) and decreases (red), followed by the fitted intensity image in the same filter in the 4th time interval (right). The upper row shows the transition in the 94 Å filter, and the lower row shows the transition in the 131 Å filter. Notice that the loop continues to brighten at 94 Å, even as it starts to fade at 131 Å.
Figure 10: Light curves of the key pixels where changes are found, for the three filters used in the analysis: 94 Å (top), 335 Å (middle), and 131 Å (bottom). The average of the observed intensities, weighted by the number of times each pixel is flagged as a key pixel, is shown as dots, along with the similarly weighted sample standard deviation as vertical bars. The shaded regions represent the envelope of the sample standard deviation seen outside the flagged pixels. The vertical lines denote the change points found by our algorithm.

The fitted images for the 3-band case can be found in Figure 8. Notice that there is a loop-shaped object that is of interest. Based on the fitted result, this object starts to appear at time point c.36 and becomes brighter after that for the first band. In the second band, this object appears at time point c.26 and stays bright throughout the duration considered. However in the third band, the object becomes bright at time point c.36 and vanishes soon after time point c.38. The proposed method is able to catch these changes in different bands and to detect the corresponding change points.

After detecting these change points, we find the key pixels that contribute to them using the method of Section 2.4.1. This method is appropriate for highlighting the regions that change rapidly after a change point, because the different bands may not change in the same direction for this dataset. Here we apply the method to a single band, 94 Å, as an example, with $p=10^{-15}$. See Figure 9 for an illustration. We find that the method highlights the loop-shaped object that starts to appear at time point c.36, and also detects the region that becomes much brighter after time point c.38. We also compute the light curves of the intensities in a region comprised of the set of pixels formed from the union of all key pixels found at all change points in all the filters; see Figure 10. Notice that the event of interest is fully incorporated within the key pixels, with no spillover into the background, and the change points are post facto found to be reasonably located in time: they fall where a researcher placing them manually would put them. The first segment is characterized by steady emission in all three bands, the second segment shows the isolated loop beginning to form, the third segment catches the time when it reaches a peak, and the last segment tracks the slow decline in intensity.

5 Summary

We have developed an approach to model photon emissions by astronomical sources. Also, we propose a practical algorithm to detect the change points as well as to segment the astronomical images, based on the MDL principle for model selection. We test this method on a series of simulation experiments and apply it to two real astrophysical datasets. We are able to recover the time-evolving variations.

Based on the results of simulation experiments, it is recommended that the average number of photon counts within each bin should be from 100 to 1000 for pixels belonging to an astrophysical object, so that the proposed method is able to find change points and limit false positives.

For future work, it will be helpful to quantify the evidence of the existence of a change point by deriving a test statistic based on Monte Carlo simulations or other methods. Another possible extension is to relax the piecewise constant assumption and allow piecewise linear/quadratic modeling so that the method is able to capture more complicated and realistic patterns.

Support for CX and TCML was partially provided by the National Science Foundation under grants DMS-1811405, DMS-1811661 and DMS-1916125. Support for HMG was provided by the National Aeronautics and Space Administration through the Smithsonian Astrophysical Observatory contract SV3-73016 to MIT for Support of the Chandra X-Ray Center, which is operated by the Smithsonian Astrophysical Observatory for and on behalf of the National Aeronautics and Space Administration under contract NAS8-03060. VLK acknowledges support from NASA Contract NAS8-03060 to the Chandra X-ray Center. VLK thanks Durgesh Tripathi for the solar dataset used in this analysis.

Appendix A Statistical Consistency

Here we prove that the MDL scheme is statistically consistent (see Section 2.2.5), thereby ensuring that the estimates of the region segmentations and the Poisson intensities are reliable measures of the data. In the following, we assume that the size of the time bins is $\Delta T_{t}=1$ for all $1\leq t\leq N_{\rm T}$ as $N_{\rm T}$ increases to infinity. That is to say, first, we consider the setting in which the underlying nonhomogeneous Poisson processes extend at the same rate while the size of the bins stays fixed, which leads to an increasing number of independent observations for any given part of the Poisson process; keeping the bin size fixed rules out the case in which the Poisson parameters keep varying as $N_{\rm T}$ increases. Second, by setting $\Delta T_{t}=1$, the Poisson parameter is numerically equal to the Poisson rate, which eases the arguments. The proof can be extended if we relax this assumption.

Given the above assumption, the photon counts have the following Poisson model,

y_{i,t,w} \stackrel{i.i.d.}{\sim} \text{Poisson}(\lambda_{i,t,w}). (A1)

Given change points $\bm{\tau}=(\tau_{1},\tau_{2},...,\tau_{K})$, we set $\tau_{0}=0$ and $\tau_{K+1}=N_{\rm T}$, and let $\nu_{k}=\tau_{k}/N_{\rm T}$, $k=0,1,...,K+1$, be the normalized change points. The consistency results are based on the $\nu_{k}$'s being fixed as $N_{\rm T}$ increases.

The observed images within the same interval follow the same corresponding piecewise constant model given the change points $\bm{\tau}$. Let $x_{i,t-\tau_{k-1},w}^{(k)}=y_{i,t,w}$ for $\tau_{k-1}+1\leq t\leq\tau_{k}$, $\bm{X}_{t-\tau_{k-1}}^{(k)}=\bm{Y}_{t}=\{y_{i,t,w}|i=1,...,N_{\rm I},w=1,...,N_{\rm W}\}$ for $\tau_{k-1}+1\leq t\leq\tau_{k}$, and $\bm{X}^{(k)}=\{\bm{X}_{t}^{(k)}|1\leq t\leq T_{k}\}$, where $T_{k}=\tau_{k}-\tau_{k-1}$.

Let the $\lambda_{i,t,w}$'s within the $k^{\rm th}$ interval follow the same corresponding two-dimensional piecewise constant model with $m^{(k)}$ constant regions. The partition of the images within interval $k$ is specified by a region assignment function $R^{(k)}(\cdot)\colon\{1,...,N_{\rm I}\}\to\{1,...,m^{(k)}\}$. The Poisson parameter $\mu_{h,w}^{(k)}$ is the value for the $w^{\rm th}$ band in the $h^{\rm th}$ region of the $k^{\rm th}$ interval. Let $\mu^{(k)}=\{\mu_{h,w}^{(k)}|h=1,...,m^{(k)},w=1,...,N_{\rm W}\}$. That is to say,

\lambda_{i,t,w}=\sum_{k=1}^{K+1}I_{\{t\in(\tau_{k-1},\tau_{k}]\}}\,\mu_{R^{(k)}(i),w}^{(k)}. (A2)

In summary, for each pixel $i\in R_{h}^{(k)}$,

x_{i,t,w}^{(k)}\stackrel{i.i.d.}{\sim}\text{Poisson}(\mu_{h,w}^{(k)}). (A3)
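As a concrete illustration of (A2), the sketch below builds the intensity cube from change points, per-interval region maps, and per-region rates; the names (tau, region, mu) are illustrative and not part of the package's API.

```python
# Sketch of the piecewise-constant intensity (A2): lam[i, t, w] depends only on
# the time interval containing t and the region of pixel i in that interval.
import numpy as np

def build_lambda(tau, region, mu, N_T):
    """tau: change points with tau[0] = 0 and tau[-1] = N_T;
    region[k]: (N_I,) array of region labels for interval k;
    mu[k]: (m_k, N_W) array of rates mu_{h,w}^{(k)}."""
    N_I, N_W = region[0].shape[0], mu[0].shape[1]
    lam = np.empty((N_I, N_T, N_W))
    for k in range(len(tau) - 1):                          # interval (tau_k, tau_{k+1}]
        lam[:, tau[k]:tau[k + 1], :] = mu[k][region[k]][:, None, :]
    return lam
```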

For each $1\leq t\leq T_{k}$, the log-likelihood function for the region assignment $R^{(k)}$ and Poisson parameters $\mu^{(k)}$ is

\tilde{l}_{k}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})=\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{(k)}}\sum_{i:R^{(k)}(i)=h}[x_{i,t,w}^{(k)}\log(\mu_{h,w}^{(k)})-\mu_{h,w}^{(k)}-\log(x_{i,t,w}^{(k)}!)]. (A4)

Since some of the terms in the log-likelihood do not depend on the parameters to be estimated, we drop them and work with

l_{k}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})=\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{(k)}}\sum_{i:R^{(k)}(i)=h}[x_{i,t,w}^{(k)}\log(\mu_{h,w}^{(k)})-\mu_{h,w}^{(k)}]. (A5)
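A direct, vectorized evaluation of (A5) for a single time point might look as follows (a sketch with illustrative argument names, not the package's implementation):

```python
# Per-time-point Poisson log-likelihood (A5), with the log(x!) term dropped.
import numpy as np

def loglik_t(x_t, region_k, mu_k):
    """x_t: (N_I, N_W) counts at one time bin of interval k;
    region_k: (N_I,) region labels in 0..m_k-1; mu_k: (m_k, N_W) rates."""
    rates = mu_k[region_k]               # rate for each pixel/band, shape (N_I, N_W)
    return float(np.sum(x_t * np.log(rates) - rates))
```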

Define $\psi_{k}=(R^{(k)},\mu^{(k)})$ to be the parameter set for the $k^{\rm th}$ interval, and let $\bm{\mathcal{M}}$ be the class of models over which $\psi_{k}$ ranges. Then the log-likelihood for the $k^{\rm th}$ interval can be written as

L_{T}^{(k)}(\psi_{k};\bm{X}^{(k)})=\sum_{t=1}^{T_{k}}l_{k}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)}). (A6)

Let $\bm{\nu}=(\nu_{1},\ldots,\nu_{K})$ be the vector of normalized change point locations, and let $\bm{\psi}=(\psi_{1},\ldots,\psi_{K+1})$ be the parameter vector. The triplet $(K,\bm{\nu},\bm{\psi})$ then specifies a model for this sequence of images. The MDL criterion is derived to be

\text{MDL}(K,\bm{\nu},\bm{\psi})= K\log(N_{\rm T})+\sum_{k=1}^{K+1}\Big[m^{(k)}\log(N_{\rm I})+\frac{\log(3)}{2}\sum_{h=1}^{m^{(k)}}b_{h}^{(k)}+\frac{N_{\rm W}}{2}\sum_{h=1}^{m^{(k)}}\log\big(([T\nu_{k}]-[T\nu_{k-1}]+1)\,a_{h}^{(k)}\big)\Big] (A7)
-\sum_{k=1}^{K+1}L_{T}^{(k)}(\psi_{k};\bm{X}^{(k)}).

Here the “area” (number of pixels) and “perimeter” (number of pixel edges) of region $R_{h}^{(k)}$ are denoted by $a_{h}^{(k)}$ and $b_{h}^{(k)}$, respectively.
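To make the criterion concrete, the sketch below assembles the code length (A7) for one candidate model from its penalty terms and fitted log-likelihoods; the argument names are illustrative, and the per-interval lengths are assumed to be the $[T\nu_{k}]-[T\nu_{k-1}]+1$ factors appearing in (A7):

```python
# Sketch of the MDL code length (A7) for a candidate model.
import numpy as np

def mdl(K, N_T, N_I, N_W, interval_lengths, areas, perims, logliks):
    """interval_lengths[k]: length factor of interval k; areas[k], perims[k]:
    lists of a_h^(k) and b_h^(k); logliks[k]: fitted L_T^(k)(psi_k; X^(k))."""
    penalty = K * np.log(N_T)
    for k in range(K + 1):                                  # intervals k = 1, ..., K+1
        m_k = len(areas[k])
        penalty += m_k * np.log(N_I)                        # number-of-regions term
        penalty += 0.5 * np.log(3) * np.sum(perims[k])      # boundary coding term
        penalty += 0.5 * N_W * np.sum(np.log(interval_lengths[k] * np.asarray(areas[k], float)))
    return penalty - float(np.sum(logliks))                 # smaller is better
```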

To make sure that the change points are identifiable, we assume that there exists an $\epsilon_{\nu}>0$ such that $\min_{1\leq k\leq K+1}|\nu_{k}-\nu_{k-1}|>\epsilon_{\nu}$. Therefore, the number of change points is bounded by $K\leq[1/\epsilon_{\nu}]+1$, and $\bm{\nu}$ is constrained to lie in the set $A_{\epsilon_{\nu}}^{K}$, where

A_{\epsilon_{\nu}}^{K}=\{\bm{\nu}\in(0,1)^{K}\,|\,0<\nu_{1}<\cdots<\nu_{K}<1,\ \nu_{k}-\nu_{k-1}>\epsilon_{\nu},\ \forall 1\leq k\leq K+1\}. (A8)

The MDL estimate of the model is then given by

(\hat{K}_{T},\hat{\bm{\nu}}_{T},\hat{\bm{\psi}}_{T})=\arg\min_{K\leq[1/\epsilon_{\nu}]+1,\,\bm{\nu}\in A_{\epsilon_{\nu}}^{K},\,\bm{\psi}\in\bm{\mathcal{M}}}\frac{1}{N_{\rm T}}\text{MDL}(K,\bm{\nu},\bm{\psi}). (A9)

Here $\text{MDL}(K,\bm{\nu},\bm{\psi})$ is defined in (A7), $\hat{\bm{\nu}}_{T}=(\hat{\nu}_{1},\ldots,\hat{\nu}_{\hat{K}})$ and $\hat{\bm{\psi}}_{T}=(\hat{\psi}_{1},\ldots,\hat{\psi}_{\hat{K}+1})$, where $\hat{\psi}_{k}=(\hat{R}^{(k)},\hat{\mu}^{(k)})$, and $\hat{\mu}^{(k)}$ is defined as

\hat{\mu}^{(k)}=\arg\max_{\mu^{(k)}\in\Theta_{k}(\hat{R}^{(k)})}L_{T}^{(k)}((\hat{R}^{(k)},\mu^{(k)});\hat{\bm{X}}^{(k)}) (A10)

where $\hat{\bm{X}}^{(k)}=\{\bm{Y}_{t}\,|\,[T\hat{\nu}_{k-1}]<t\leq[T\hat{\nu}_{k}]\}$ denotes the estimated $k^{\rm th}$ interval of the sequence of images.
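For Poisson data the maximizer in (A10) has a closed form: within each region of a fitted interval, the rate estimate is simply the average count per pixel per time bin, band by band. A sketch (illustrative names, not the package's code):

```python
# Region-wise Poisson MLE used in (A10): mean count over the region's pixels
# and the interval's time bins, separately for each energy band.
import numpy as np

def mle_mu(x_k, region_k, m_k):
    """x_k: (N_I, T_k, N_W) counts in the fitted interval k;
    region_k: (N_I,) region labels in 0..m_k-1."""
    mu_hat = np.empty((m_k, x_k.shape[2]))
    for h in range(m_k):
        mu_hat[h] = x_k[region_k == h].mean(axis=(0, 1))   # average over pixels and time
    return mu_hat
```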

We further define the log-likelihood formed by a portion of the observations in the $k^{\rm th}$ interval by

L_{T}^{(k)}(\psi_{k},\nu_{d},\nu_{u};\bm{X}^{(k)})=\sum_{t=[T_{k}\nu_{d}]+1}^{[T_{k}\nu_{u}]}l_{k}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)}), (A11)

where $0\leq\nu_{d}<\nu_{u}\leq 1$ and $\nu_{u}-\nu_{d}>\epsilon_{\nu}$.

We denote

\sup_{\nu_{d},\nu_{u}}\coloneqq\sup_{0\leq\nu_{d}<\nu_{u}\leq 1,\ \nu_{u}-\nu_{d}>\epsilon_{\nu}} (A12)

to simplify the notation.

In this setting, an extension needs to be made so that $\nu_{d}$ and $\nu_{u}$ can lie slightly outside $[0,1]$. This means that the $k^{\rm th}$ estimated interval could cover a part of the observations that belong to the $(k-1)^{\rm th}$ and $(k+1)^{\rm th}$ true intervals. Following formula (3.4) in Davis & Yau (2013), for a real-valued function $f_{T}(\nu_{d},\nu_{u})$ on $\mathcal{R}^{2}$,

\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}f_{T}(\nu_{d},\nu_{u})\xrightarrow{a.s.}0 (A13)

is used to denote

\sup_{-h_{T}<\nu_{d}<\nu_{u}<1+r_{T},\ \nu_{u}-\nu_{d}>\epsilon_{\nu}}f_{T}(\nu_{d},\nu_{u})\xrightarrow{a.s.}0 (A14)

for any pre-specified positive-valued sequences $h_{T}$ and $r_{T}$ that converge to 0 as $N_{\rm T}\rightarrow\infty$.

The following assumptions on the true Poisson parameters $\mu_{h,w}^{o(k)}$, $1\leq h\leq m^{(k)}$, $1\leq w\leq N_{\rm W}$, $1\leq k\leq K^{o}+1$, are necessary.

Assumption 1.
0<C_{d}\coloneqq\min_{k,h,w}\mu_{h,w}^{o(k)}\leq C_{u}\coloneqq\max_{k,h,w}\mu_{h,w}^{o(k)}<\infty. (A15)
Assumption 2.

For two true neighboring regions $R_{p}^{(k)}$ and $R_{q}^{(k)}$ in the $k^{\rm th}$ interval,

\delta_{1}\coloneqq\min_{k,p,q,w}|\mu_{p,w}^{o(k)}-\mu_{q,w}^{o(k)}|>0. (A16)
Assumption 3.

For any two neighboring intervals $(k-1)$ and $k$,

\delta_{2}\coloneqq\min_{k}\max_{i,w}|\mu_{R^{(k)}(i),w}^{o(k)}-\mu_{R^{(k-1)}(i),w}^{o(k-1)}|>0. (A17)
Proposition A.1 (v).

For $k=1,\ldots,K+1$ and any fixed $R^{(k)}$, there exists an $\epsilon>0$ such that

\begin{split}
\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}E|l_{k}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})|^{v+\epsilon}<\infty,\\
\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}E|l_{k}^{\prime}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})|^{v+\epsilon}<\infty,\\
\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}E|l_{k}^{\prime\prime}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})|<\infty.
\end{split} (A18)

This proposition holds for $v=1,2,4$ because of the compactness of the parameter space (Assumption 1) and the boundedness of $E[(x_{i,t,w}^{(k)})^{v+\epsilon}]$.

Proposition A.2.

For $k=1,\ldots,K+1$ and any fixed $R^{(k)}$,

\begin{split}
\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{(k)}((R^{(k)},\mu^{(k)});\bm{X}^{(k)})-L_{k}((R^{(k)},\mu^{(k)}))\Big|\xrightarrow{a.s.}0,\\
\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{\prime(k)}((R^{(k)},\mu^{(k)});\bm{X}^{(k)})-L_{k}^{\prime}((R^{(k)},\mu^{(k)}))\Big|\xrightarrow{a.s.}0,\\
\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{\prime\prime(k)}((R^{(k)},\mu^{(k)});\bm{X}^{(k)})-L_{k}^{\prime\prime}((R^{(k)},\mu^{(k)}))\Big|\xrightarrow{a.s.}0,
\end{split} (A19)

where

\begin{split}
L_{k}((R^{(k)},\mu^{(k)}))\coloneqq E(l_{k}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})),\\
L_{k}^{\prime}((R^{(k)},\mu^{(k)}))\coloneqq E(l_{k}^{\prime}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})),\\
L_{k}^{\prime\prime}((R^{(k)},\mu^{(k)}))\coloneqq E(l_{k}^{\prime\prime}((R^{(k)},\mu^{(k)});\bm{X}_{t}^{(k)})).
\end{split} (A20)

In practice, the likelihood is defined using the estimated change point locations. Therefore, even when the estimated change points are close to the true ones, the two ends of the $k^{\rm th}$ fitted interval might contain observations from the $(k-1)^{\rm th}$ and $(k+1)^{\rm th}$ true intervals, and it is necessary to control this effect at the two ends of the fitted interval.

Proposition A.3 (w).

For $k=1,\ldots,K+1$, any fixed $\psi$, and any sequence of integers $\{g(N_{\rm T})\}_{N_{\rm T}\geq 1}$ satisfying $g(N_{\rm T})>cN_{\rm T}^{w}$ for some $c>0$ when $N_{\rm T}$ is large enough,

\begin{split}
\frac{1}{g(N_{\rm T})}\sum_{t=N_{\rm T}-g(N_{\rm T})+1}^{N_{\rm T}}l_{k}(\psi;\bm{X}_{t}^{(k)})\xrightarrow{a.s.}E(l_{k}(\psi;\bm{X}_{t}^{(k)})),\\
\frac{1}{g(N_{\rm T})}\sum_{t=N_{\rm T}-g(N_{\rm T})+1}^{N_{\rm T}}l_{k}^{\prime}(\psi;\bm{X}_{t}^{(k)})\xrightarrow{a.s.}E(l_{k}^{\prime}(\psi;\bm{X}_{t}^{(k)})).
\end{split} (A21)

By Lemma 1 in Davis & Yau (2013), Proposition A.3 holds when Proposition A.1(2) holds and Assumption 4* in Davis & Yau (2013) is satisfied. Assumption 4* is satisfied here because an independent process, as in the current setting, is necessarily mixing.

It is also necessary to discuss the identifiability of models in $\bm{\mathcal{M}}$. First, we call $R^{b}$ an oversegmentation of $R^{s}$ if $R^{b}(i)=R^{b}(j)$ implies $R^{s}(i)=R^{s}(j)$.

Proposition A.4.

For the $k^{\rm th}$ interval, the true model $\psi_{k}^{o}\in\bm{\mathcal{M}}$ satisfies $\psi_{k}^{o}=\arg\max_{\psi\in\bm{\mathcal{M}}}E(l_{k}(\psi;\bm{X}_{t}^{(k)}))$. Moreover, $\psi_{k}^{o}$ is uniquely identifiable: if there exists a $\mu^{*}$ such that $l_{k}((R^{o},\mu^{o});\bm{X}_{t}^{(k)})=l_{k}((R^{o},\mu^{*});\bm{X}_{t}^{(k)})$ almost everywhere for $\bm{X}_{t}^{(k)}$, then $\mu^{o}=\mu^{*}$. Furthermore, if there exists another model $\psi_{k}^{b}=(R^{b},\mu^{b})$ such that $l_{k}(\psi_{k}^{b};\bm{X}_{t}^{(k)})=l_{k}(\psi_{k}^{o};\bm{X}_{t}^{(k)})$ almost everywhere, then $R^{b}$ must be an oversegmentation of $R^{o}$, and $\mu^{b}$ satisfies $\mu_{R^{b}(i),w}^{b}=\mu_{R^{o}(i),w}^{o}$ for all $i,w$.

Proof.

Suppose, on the contrary, that there exists a model $\psi^{*}=(R^{*},\mu^{*})$ satisfying $\psi^{*}=\arg\max_{\psi\in\bm{\mathcal{M}}}E(l_{k}(\psi;\bm{X}_{t}^{(k)}))$ that is neither the true model nor an oversegmentation of the true model. Then there exist two neighboring pixels $i_{0}$ and $j_{0}$ such that $R^{o(k)}(i_{0})\neq R^{o(k)}(j_{0})$ and $R^{*}(i_{0})=R^{*}(j_{0})$. Therefore, by Assumption 2, we have

|\mu_{R^{o(k)}(i_{0}),w}^{o(k)}-\mu_{R^{o(k)}(j_{0}),w}^{o(k)}|\geq\delta_{1}>0. (A22)

Define

\bar{\mu}_{h,w}(R)\coloneqq\frac{1}{a_{h}(R)}\sum_{i:R(i)=h}\mu_{R^{o}(i),w}^{o(k)}, (A23)

where $a_{h}(R)$ denotes the number of pixels in region $h$ given segmentation $R$. In particular,

\bar{\mu}_{h,w}(R^{o(k)})=\mu_{h,w}^{o(k)}. (A24)
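The maximization steps in (A25) and (A26) below rely on the elementary fact that, for any fixed $c>0$, the function $\mu\mapsto c\log\mu-\mu$ attains its maximum at $\mu=c$:

\frac{d}{d\mu}\big(c\log\mu-\mu\big)=\frac{c}{\mu}-1=0\ \Longleftrightarrow\ \mu=c,\qquad\frac{d^{2}}{d\mu^{2}}\big(c\log\mu-\mu\big)=-\frac{c}{\mu^{2}}<0.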

Then for all possible $\mu^{*}\in\Theta_{k}(R^{*})$, we have

\begin{split}E(l_{k}((R^{*},\mu^{*});\bm{X}_{t}^{(k)}))&=E\Big(\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{*}}\sum_{i:R^{*}(i)=h}[x_{i,t,w}^{(k)}\log(\mu_{h,w}^{*})-\mu_{h,w}^{*}]\Big)\\
&=\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{*}}\sum_{i:R^{*}(i)=h}[\mu_{R^{o(k)}(i),w}^{o(k)}\log(\mu_{h,w}^{*})-\mu_{h,w}^{*}]\\
&=\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{*}}a_{h}(R^{*})[\bar{\mu}_{h,w}(R^{*})\log(\mu_{h,w}^{*})-\mu_{h,w}^{*}]\\
&\leq\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{*}}a_{h}(R^{*})\max_{\mu_{h,w}}[\bar{\mu}_{h,w}(R^{*})\log(\mu_{h,w})-\mu_{h,w}]\\
&=\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{*}}a_{h}(R^{*})[\bar{\mu}_{h,w}(R^{*})\log(\bar{\mu}_{h,w}(R^{*}))-\bar{\mu}_{h,w}(R^{*})]\\
&=E(l_{k}((R^{*},\bar{\mu}(R^{*}));\bm{X}_{t}^{(k)})).\end{split} (A25)

Also, we have

\begin{split}E(l_{k}((R^{*},\bar{\mu}(R^{*}));\bm{X}_{t}^{(k)}))=&\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{*}}a_{h}(R^{*})\max_{\mu_{h,w}}[\bar{\mu}_{h,w}(R^{*})\log(\mu_{h,w})-\mu_{h,w}]\\
\leq&\sum_{w=1}^{N_{\rm W}}\sum_{i=1,i\notin\{i_{0},j_{0}\}}^{N_{\rm I}}\max_{\lambda_{i,w}}[\mu_{R^{o(k)}(i),w}^{o(k)}\log(\lambda_{i,w})-\lambda_{i,w}]\\
&+\sum_{w=1}^{N_{\rm W}}[\mu_{R^{o(k)}(i_{0}),w}^{o(k)}\log(\mu_{R^{*}(i_{0}),w}^{*})-\mu_{R^{*}(i_{0}),w}^{*}+\mu_{R^{o(k)}(j_{0}),w}^{o(k)}\log(\mu_{R^{*}(j_{0}),w}^{*})-\mu_{R^{*}(j_{0}),w}^{*}]\\
<&\sum_{w=1}^{N_{\rm W}}\sum_{i=1,i\notin\{i_{0},j_{0}\}}^{N_{\rm I}}\max_{\lambda_{i,w}}[\mu_{R^{o(k)}(i),w}^{o(k)}\log(\lambda_{i,w})-\lambda_{i,w}]\\
&+\sum_{w=1}^{N_{\rm W}}\max_{\lambda_{i_{0},w}}[\mu_{R^{o(k)}(i_{0}),w}^{o(k)}\log(\lambda_{i_{0},w})-\lambda_{i_{0},w}]\\
&+\sum_{w=1}^{N_{\rm W}}\max_{\lambda_{j_{0},w}}[\mu_{R^{o(k)}(j_{0}),w}^{o(k)}\log(\lambda_{j_{0},w})-\lambda_{j_{0},w}]\\
=&\sum_{w=1}^{N_{\rm W}}\sum_{i=1}^{N_{\rm I}}[\mu_{R^{o(k)}(i),w}^{o(k)}\log(\mu_{R^{o(k)}(i),w}^{o(k)})-\mu_{R^{o(k)}(i),w}^{o(k)}]\\
=&\sum_{w=1}^{N_{\rm W}}\sum_{h=1}^{m^{o(k)}}a_{h}^{o(k)}[\mu_{h,w}^{o(k)}\log(\mu_{h,w}^{o(k)})-\mu_{h,w}^{o(k)}]\\
=&E(l_{k}((R^{o(k)},\mu^{o(k)});\bm{X}_{t}^{(k)})).\end{split} (A26)

Here the strict inequality must hold because of (A22).

Finally, combining (A25) and (A26), we have

E(l_{k}((R^{*},\mu^{*});\bm{X}_{t}^{(k)}))<E(l_{k}((R^{o(k)},\mu^{o(k)});\bm{X}_{t}^{(k)})), (A27)

which is a contradiction. This finishes the proof. ∎

Lemma 1.

For any fixed $R^{(k)}$,

\begin{split}
\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{(k)}((R^{(k)},\mu^{(k)}),\nu_{d},\nu_{u};\bm{X}^{(k)})-(\nu_{u}-\nu_{d})L_{k}((R^{(k)},\mu^{(k)}))\Big|\xrightarrow{a.s.}0,\\
\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{\prime(k)}((R^{(k)},\mu^{(k)}),\nu_{d},\nu_{u};\bm{X}^{(k)})-(\nu_{u}-\nu_{d})L_{k}^{\prime}((R^{(k)},\mu^{(k)}))\Big|\xrightarrow{a.s.}0,\\
\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{\prime\prime(k)}((R^{(k)},\mu^{(k)}),\nu_{d},\nu_{u};\bm{X}^{(k)})-(\nu_{u}-\nu_{d})L_{k}^{\prime\prime}((R^{(k)},\mu^{(k)}))\Big|\xrightarrow{a.s.}0.
\end{split} (A28)

See Propositions 1 and 2 in Davis & Yau (2013) for the proof.

Lemma 2.

Suppose the true parameters for interval $k$ are $\psi^{o(k)}=(R^{o(k)},\mu^{o(k)})$, and suppose a region segmentation $R^{(k)}$ is specified for estimation. Let

\begin{split}
\hat{\mu}_{T}&=\hat{\mu}_{T}^{(k)}(\nu_{d},\nu_{u})\coloneqq\arg\max_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}L_{T}^{(k)}((R^{(k)},\mu^{(k)}),\nu_{d},\nu_{u};\bm{X}^{(k)}),\\
\mu^{*(k)}&\coloneqq\arg\max_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}L_{k}((R^{(k)},\mu^{(k)})).
\end{split} (A29)

Then

\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{(k)}((R^{(k)},\hat{\mu}_{T}),\nu_{d},\nu_{u};\bm{X}^{(k)})-(\nu_{u}-\nu_{d})L_{k}((R^{(k)},\mu^{*(k)}))\Big|\xrightarrow{a.s.}0, (A30)

where the supremum is defined in (A13). If $R^{(k)}=R^{o(k)}$, we further have

\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}|\hat{\mu}_{T}^{(k)}(\nu_{d},\nu_{u})-\mu^{o(k)}|\xrightarrow{a.s.}0. (A31)

If $R^{(k)}$ is an oversegmentation of $R^{o(k)}$, then we have

\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}|\hat{\mu}_{T,R^{(k)}(i),w}(\nu_{d},\nu_{u})-\mu^{o(k)}_{R^{o(k)}(i),w}|\xrightarrow{a.s.}0\quad\forall i,w. (A32)
Proof.
\begin{split}
&(\nu_{u}-\nu_{d})\big(L_{k}((R^{(k)},\mu^{*(k)}))-L_{k}((R^{(k)},\hat{\mu}_{T}))\big)\\
\leq&\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}\Big|(\nu_{u}-\nu_{d})L_{k}((R^{(k)},\mu^{*(k)}))-\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{(k)}((R^{(k)},\mu^{*(k)}),\nu_{d},\nu_{u};\bm{X}^{(k)})\\
&+\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{(k)}((R^{(k)},\hat{\mu}_{T}),\nu_{d},\nu_{u};\bm{X}^{(k)})-(\nu_{u}-\nu_{d})L_{k}((R^{(k)},\hat{\mu}_{T}))\Big|\\
\leq&2\sup_{\underline{\nu_{d}},\overline{\nu_{u}}}\sup_{\mu^{(k)}\in\Theta_{k}(R^{(k)})}\Big|\frac{1}{N_{\rm T}(\nu_{k}-\nu_{k-1})}L_{T}^{(k)}((R^{(k)},\mu^{(k)}),\nu_{d},\nu_{u};\bm{X}^{(k)})-(\nu_{u}-\nu_{d})L_{k}((R^{(k)},\mu^{(k)}))\Big|\\
\xrightarrow{a.s.}&\ 0.
\end{split} (A33)

The first inequality is obtained from the definition of the maximum likelihood estimator, and the last convergence comes from Lemma 1. Since $\mu^{*(k)}$ maximizes $L_{k}((R^{(k)},\mu^{(k)}))$ and $\nu_{u}-\nu_{d}>0$, we have

|L_{k}((R^{(k)},\mu^{*(k)}))-L_{k}((R^{(k)},\hat{\mu}_{T}))|\xrightarrow{a.s.}0. (A34)

Combining (A33), (A34), and Proposition A.1(1), (A30) holds. If $R^{(k)}=R^{o(k)}$, then by Proposition A.4, $L_{k}((R^{(k)},\mu^{(k)}))$ has a unique maximizer at $\mu^{o(k)}$, so (A31) holds. If $R^{(k)}$ is an oversegmentation of $R^{o(k)}$, then (A32) holds by Proposition A.4. ∎

Now we give a preliminary convergence result for the case when the number of change points is known.

Theorem 1.

(Theorem 1 in Davis & Yau (2013)) Let $\{\bm{Y}_{t}\,|\,t=1,\ldots,N_{\rm T}\}$ be the observed images specified by $(K^{o},\bm{\nu}^{o},\bm{\psi}^{o})$, and suppose the number of change points $K^{o}$ is known. The change points and parameters are estimated by

(\hat{\bm{\nu}}_{T},\hat{\bm{\psi}}_{T})=\arg\min_{\bm{\nu}\in A_{\epsilon_{\nu}}^{K^{o}},\,\bm{\psi}\in\bm{\mathcal{M}}}\frac{1}{N_{\rm T}}\text{MDL}(K^{o},\bm{\nu},\bm{\psi}). (A35)

Then $\hat{\bm{\nu}}_{T}\xrightarrow{a.s.}\bm{\nu}^{o}$ and, for each interval, the estimated $\hat{R}^{(k)}$ must be an oversegmentation of the true region segmentation.

We skip the proof of this theorem because it is quite similar to the proof of Theorem 1 in Davis & Yau (2013). Notice that we need to use Assumption 3 in the proof.

Corollary 1.

(Corollary 1 in Davis & Yau (2013)) Under the conditions of Theorem 1, if the number of change points is unknown and is estimated from the data, then

  1. The number of change points cannot be underestimated. That is to say, $\hat{K}\geq K^{o}$ almost surely when $N_{\rm T}$ is large enough.

  2. When $\hat{K}>K^{o}$, $\bm{\nu}^{o}$ must be a subset of the limit of $\hat{\bm{\nu}}_{T}$ for large enough $N_{\rm T}$.

  3. In each fitted interval, the estimated region segmentation must be equal to or be an oversegmentation of the corresponding true region segmentation.

See Corollary 1 in Davis & Yau (2013) for more details.

Theorem 2.

(Theorem 2 in Davis & Yau (2013)) Let $\bm{\nu}^{o}=(\nu_{1}^{o},\nu_{2}^{o},\ldots,\nu_{K^{o}}^{o})$ be the true change points, and let $(\hat{K},\hat{\bm{\nu}}_{T},\hat{\bm{\psi}}_{T})$ be the MDL-based estimate. Then for every $k=1,2,\ldots,K^{o}$ there exists a $\hat{\nu}_{t_{k}}\in\hat{\bm{\nu}}_{T}$ with $1\leq t_{k}\leq\hat{K}$ such that

|\hat{\nu}_{t_{k}}-\nu_{k}^{o}|=o(N_{\rm T}^{-\frac{1}{2}})\ a.s. (A36)

See the proof of Theorem 2 in Davis & Yau (2013).

Lemma 3.

If the true region segmentation $R^{o(k)}$ is specified for the $k^{\rm th}$ interval, then

\hat{\mu}_{T}^{(k)}(\hat{\nu}_{k-1},\hat{\nu}_{k})-\mu^{o(k)}=O\bigg(\sqrt{\frac{\log\log(N_{\rm T})}{N_{\rm T}}}\bigg)\ a.s. (A37)

When the specified region segmentation $R^{(k)}$ is an oversegmentation of $R^{o(k)}$, we have

\hat{\mu}_{T,R^{(k)}(i),w}^{(k)}(\hat{\nu}_{k-1},\hat{\nu}_{k})-\mu_{R^{o(k)}(i),w}^{o(k)}=O\bigg(\sqrt{\frac{\log\log(N_{\rm T})}{N_{\rm T}}}\bigg)\ a.s.\quad\forall i,w. (A38)

See Lemma 2 in Davis & Yau (2013) for more details.

Then we come to the main result.

Theorem 3.

Let $\{\bm{Y}_{t}\,|\,t=1,\ldots,N_{\rm T}\}$ be the observed images specified by $(K^{o},\bm{\nu}^{o},\bm{\psi}^{o})$. The estimator $(\hat{K}_{T},\hat{\bm{\nu}}_{T},\hat{\bm{\psi}}_{T})$ is defined by (A9). Then we have

\begin{split}
\hat{K}_{T}\xrightarrow{a.s.}K^{o},\qquad
\hat{\bm{\nu}}_{T}\xrightarrow{a.s.}\bm{\nu}^{o},\qquad
\hat{\bm{\psi}}_{T}\xrightarrow{a.s.}\bm{\psi}^{o}.
\end{split} (A39)

See Theorem 3 in Davis & Yau (2013) for more details.

References

  • Adams & Bischof (1994) Adams, R., & Bischof, L. 1994, IEEE Transactions on Pattern Analysis and Machine Intelligence, 641
  • Aggarwal et al. (2018) Aggarwal, A., Schanche, N., Reeves, K. K., Kempton, D., & Angryk, R. 2018, ApJS, 236, 15, doi: 10.3847/1538-4365/aab77f
  • Astropy Collaboration et al. (2013) Astropy Collaboration, Robitaille, T. P., Tollerud, E. J., et al. 2013, A&A, 558, A33, doi: 10.1051/0004-6361/201322068
  • Astropy Collaboration et al. (2018) Astropy Collaboration, Price-Whelan, A. M., Sipőcz, B. M., et al. 2018, AJ, 156, 123, doi: 10.3847/1538-3881/aabc4f
  • Aue & Horváth (2013) Aue, A., & Horváth, L. 2013, Journal of Time Series Analysis, 34, 1, doi: 10.1111/j.1467-9892.2012.00819.x
  • Aue & Lee (2011) Aue, A., & Lee, T. C. M. 2011, The Annals of Statistics, 39, 2912
  • Barnes et al. (2017) Barnes, G., Schanche, N., Leka, K. D., Aggarwal, A., & Reeves, K. 2017, in Astroinformatics, ed. M. Brescia, S. G. Djorgovski, E. D. Feigelson, G. Longo, & S. Cavuoti, Vol. 325, 201–204, doi: 10.1017/S1743921316012758
  • Carson et al. (1999) Carson, C., Belongie, S., Greenspan, H., & Malik, J. 1999, IEEE Transactions on Pattern Analysis and Machine Intelligence, 167
  • Davis et al. (2006) Davis, R. A., Lee, T. C. M., & Rodriguez-Yam, G. A. 2006, Journal of the American Statistical Association, 101, 223
  • Davis & Yau (2013) Davis, R. A., & Yau, C. Y. 2013, Electronic Journal of Statistics, 7, 381
  • Dey et al. (2010) Dey, V., Zhang, Y., & Zhong, M. 2010, The International Society for Photogrammetry and Remote Sensing Symposium (ISPRS) TC VII Symposium-100 Years ISPRS, Vienna, Austria, XXXVIII, 31
  • Fan & Lee (2015) Fan, M., & Lee, T. C. 2015, IET Image Processing, 9, 478, doi: 10.1049/iet-ipr.2014.0490
  • Felzenszwalb & Huttenlocher (2004) Felzenszwalb, P. F., & Huttenlocher, D. P. 2004, International Journal of Computer Vision, 59(2), 167
  • Freeland & Handy (2012) Freeland, S. L., & Handy, B. N. 2012, SolarSoft: Programming and data analysis environment for solar physics. http://ascl.net/1208.013
  • Fruscione et al. (2006) Fruscione, A., McDowell, J. C., Allen, G. E., et al. 2006, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 6270, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, ed. D. R. Silva & R. E. Doxsey, 62701V, doi: 10.1117/12.671760
  • Güdel (2004) Güdel, M. 2004, The Astronomy and Astrophysics Review, 12, 71, doi: 10.1007/s00159-004-0023-2
  • Güdel et al. (2002) Güdel, M., Audard, M., Skinner, S. L., & Horvath, M. I. 2002, The Astrophysical Journal, 580, L73, doi: 10.1086/345404
  • Hughes et al. (2019) Hughes, J. M., Hsu, V. W., Seaton, D. B., et al. 2019, Journal of Space Weather and Space Climate, 9, A38, doi: 10.1051/swsc/2019036
  • Hurlburt et al. (2012) Hurlburt, N., Cheung, M., Schrijver, C., et al. 2012, Sol. Phys., 275, 67, doi: 10.1007/s11207-010-9624-2
  • Lee (2000) Lee, T. C. M. 2000, Journal of the American Statistical Association, 259
  • Lee (2001) —. 2001, International Statistical Review, 69, 169
  • Lemen et al. (2012) Lemen, J. R., Title, A. M., Akin, D. J., et al. 2012, Sol. Phys., 275, 17, doi: 10.1007/s11207-011-9776-8
  • Martens et al. (2012) Martens, P. C. H., Attrill, G. D. R., Davey, A. R., et al. 2012, Sol. Phys., 275, 79, doi: 10.1007/s11207-010-9697-y
  • Pesnell et al. (2012) Pesnell, W. D., Thompson, B. J., & Chamberlin, P. C. 2012, Sol. Phys., 275, 3, doi: 10.1007/s11207-011-9841-3
  • Rissanen (1989) Rissanen, J. 1989, Stochastic Complexity in Statistical Inquiry Theory (USA: World Scientific Publishing Co., Inc.)
  • Rissanen (2007) —. 2007, Information and Complexity in Statistical Modeling, doi: 10.1007/978-0-387-68812-1
  • Rousseeuw & Croux (1993) Rousseeuw, P. J., & Croux, C. 1993, Journal of the American Statistical Association, 88, 1273, doi: 10.1080/01621459.1993.10476408
  • Scargle et al. (2013) Scargle, J. D., Norris, J. P., Jackson, B., & Chiang, J. 2013, The Astrophysical Journal, 764, 167, doi: 10.1088/0004-637x/764/2/167
  • Wong et al. (2016) Wong, R. K. W., Kashyap, V. L., Lee, T. C. M., & van Dyk, D. A. 2016, Annals of Applied Statistics, 10, 1107, doi: 10.1214/16-AOAS933