PLSO: A generative framework for decomposing nonstationary time-series into piecewise stationary oscillatory components
Abstract
To capture the slowly time-varying spectral content of real-world time-series, a common paradigm is to partition the data into approximately stationary intervals and perform inference in the time-frequency domain. However, this approach lacks a corresponding nonstationary time-domain generative model for the entire data and thus, time-domain inference occurs in each interval separately. This results in distortion/discontinuity around interval boundaries and can consequently lead to erroneous inferences based on any quantities derived from the posterior, such as the phase. To address these shortcomings, we propose the Piecewise Locally Stationary Oscillation (PLSO) model for decomposing time-series data with slowly time-varying spectra into several oscillatory, piecewise-stationary processes. PLSO, as a nonstationary time-domain generative model, enables inference on the entire time-series without boundary effects and simultaneously provides a characterization of its time-varying spectral properties. We also propose a novel two-stage inference algorithm that combines Kalman theory and an accelerated proximal gradient algorithm. We demonstrate these points through experiments on simulated data and real neural data from the rat and the human brain.
1 Introduction
With the collection of long time-series now common in areas such as neuroscience and geophysics, it is important to develop an inference framework for data where the stationarity assumption is too restrictive. We restrict our attention to data 1) with spectral properties that change slowly over time and 2) for which decomposition into several oscillatory components is warranted for interpretation, often the case in electroencephalogram (EEG) or electrophysiology recordings. One can use bandpass filtering [Oppenheim et al., 2009] or methods such as the empirical mode decomposition [Huang et al., 1998, Daubechies et al., 2011] for these purposes. However, due to the absence of a generative model, these methods lack a framework for performing inference. Another popular approach is to perform inference in the time-frequency (TF) domain on the short-time Fourier transform (STFT) of the data, assuming stationarity within small intervals. This has led to a rich literature on inference in the TF domain, such as Wilson et al. [2008]. A drawback is that most of these methods focus on estimates of the power spectral density (PSD) and lose important phase information. To recover the time-domain estimates, additional algorithms are required [Griffin and J. Lim, 1984].
This motivates us to explore time-domain generative models that allow time-domain inference and decomposition into oscillatory components. We can find examples based on the stationarity assumption in the signal processing and Gaussian process (GP) communities. A superposition of stochastic harmonic oscillators, where each oscillator corresponds to a frequency band, is used in the processing of speech [Cemgil and Godsill, 2005] and neuroscience data [Matsuda and Komaki, 2017, Beck et al., 2018]. In the GP literature [Rasmussen and Williams, 2005], the spectral mixture (SM) kernel [Wilson and Adams, 2013, Wilkinson et al., 2019] models the data as samples from a GP whose kernel consists of a superposition of localized and frequency-modulated kernels.
These time-domain models can be applied to nonstationary data by partitioning the data into stationary intervals and performing time-domain inference within each interval. However, we are then faced with a different kind of challenge. As the inference is localized within each interval, the time-domain estimates in different intervals are independent conditioned on the data and do not reflect the dependence across intervals. This also causes discontinuity/distortion of the time-domain estimates near the interval boundaries, and consequently distorts any quantities derived from these estimates.
To address these shortcomings, we propose a generative framework for data with slowly time-varying spectra, termed the Piecewise Locally Stationary Oscillation (PLSO) framework (code is available at https://github.com/andrewsong90/plso.git). The main contributions are:
Generative model for piecewise stationary, oscillatory components PLSO models time-series as the superposition of piecewise stationary, oscillatory components. This allows time-domain inference on each component and estimation of the time-varying spectra.
Continuity across stationary intervals The state-space model that underlies PLSO strikes a balance between ensuring time-domain continuity across piecewise stationary intervals and stationarity within each interval. Moreover, by imposing stochastic continuity at the interval level, PLSO accurately learns the underlying smooth time-varying spectra.
2 Background
2.1 Notation
We use $f$ and $n$ to denote frequency and the discrete-time sample index, respectively. We use $\omega$ for normalized frequency. The latent process centered at $\omega_j$ is denoted as $z_j$, with $z_{j,n}$ denoting the $n$-th sample of $z_j$ and $\operatorname{Re}(z_{j,n})$, $\operatorname{Im}(z_{j,n})$ its real and imaginary parts. We also represent the collection $\{z_{j,n}\}_{j=1}^{J}$ as a vector, $z_n = [z_{1,n}, \ldots, z_{J,n}]^{\mathrm{T}}$. The elements of $z_n$ are denoted as $z_{j,n}$. The state covariance matrix for $z_n$ is defined as $P_n$. To express an enumeration of variables, we use $z^{(n_1:n_2)} = \{z_{n_1}, \ldots, z_{n_2}\}$ and drop the first/last index for simplicity, e.g. $z^{(:n)}$ instead of $z^{(1:n)}$.
We use $y_n$ and $z_n$ for the discrete-time counterparts of the continuous observation and latent process, $y(t)$ and $z(t)$. With the sampling frequency $f_s$, we have $y_n = y(n/f_s)$, $z_n = z(n/f_s)$, and $\omega_j = 2\pi f_j / f_s$.
2.2 Piecewise local stationarity
The concept of piecewise local stationarity (PLS) for nonstationary time-series with slowly time-varying spectra [Adak, 1998] plays an important role in PLSO. A stationary process has a constant mean and a covariance function which depends only on the difference between two time points.
For our purposes, it suffices to understand the following about PLS: 1) it includes locally stationary [Priestley, 1965, Dahlhaus, 1997] and amplitude-modulated stationary processes, and 2) a PLS process can be approximated by a piecewise stationary (PS) process (Theorem 1 of Adak [1998])
$$y_{\mathrm{PS}}(t) = \sum_{m=1}^{M} y_m(t)\, \mathbb{1}\{u_{m-1} \le t < u_m\}, \tag{1}$$
where $y_m$ is a continuous stationary process and the boundaries are $0 = u_0 < u_1 < \cdots < u_M = T$. Note that Eq. 1 does not guarantee continuity across different PS intervals.
3 The PLSO model and its mathematical properties
Building on Theorem 1 of Adak [1998], PLSO models nonstationary data as a superposition of $J$ PS processes $\{z_j\}_{j=1}^{J}$, with $z_j$ corresponding to an oscillatory process centered at frequency $\omega_j$. PLSO also guarantees stochastic continuity across PS intervals. We show that piecewise stationarity and continuity across PS intervals are two competing objectives and that PLSO strikes a balance between them, as discussed in Section 3.2.
Fig. 1 shows an example of the PLSO framework applied to simulated data. In the time domain, the oscillation inferred using the regularized STFT (blue) [Kim et al., 2018], which imposes stochastic continuity on the STFT coefficients, suffers from discontinuity/distortion near window boundaries, whereas the oscillation inferred by PLSO (red) does not. In the frequency domain, each PLSO component corresponds to a localized spectrum $S_j(\omega)$, the sum of which is the PSD $S(\omega)$ that is fit to the data STFT (or periodogram). We start by introducing the PLSO model for a single window.
3.1 PLSO for stationary data
As a building block for PLSO, we use the discrete stochastic harmonic oscillator for a stationary time-series [Qi et al., 2002, Cemgil and Godsill, 2005, Matsuda and Komaki, 2017]. The data are assumed to be a superposition of $J$ independent zero-mean components
$$y_n = \sum_{j=1}^{J} \operatorname{Re}(z_{j,n}) + \varepsilon_n, \qquad z_{j,n} = \rho_j\, R(\omega_j)\, z_{j,n-1} + u_{j,n}, \tag{2}$$
where $R(\omega_j)$, $u_{j,n} \sim \mathcal{N}(0, \sigma_j^2 I_2)$, and $\varepsilon_n \sim \mathcal{N}(0, \sigma_\varepsilon^2)$ correspond to the rotation matrix, the state noise, and the observation noise, respectively, and $z_{j,n} = [\operatorname{Re}(z_{j,n}), \operatorname{Im}(z_{j,n})]^{\mathrm{T}}$ is treated as a vector in $\mathbb{R}^2$. The imaginary component $\operatorname{Im}(z_{j,n})$, which does not directly contribute to $y_n$, can be seen as an auxiliary variable that allows $z_j$ to be written in recursive form using $R(\omega_j)$ [Koopman and Lee, 2009]. We assume the noise processes are mutually independent.
We reparameterize $\rho_j$ and $\sigma_j^2$ using a lengthscale $l_j$ and a power parameter, such that $\rho_j = \exp(-1/(f_s l_j))$ and $\sigma_j^2$ scales the steady-state power of $z_j$. Theoretically, this establishes a connection to 1) the stochastic differential equation [Brown et al., 2004, Solin and Särkkä, 2014], detailed in Appendix A, and 2) a superposition of frequency-modulated and localized GP kernels, similar to the SM kernel [Wilson and Adams, 2013]. Practically, this ensures that $0 < \rho_j < 1$, and thus stability of the process.
Given Eq. 2, we can readily express the frequency spectra of PLSO in each interval through the autocovariance function. The autocovariance of $\operatorname{Re}(z_{j,n})$ is given as $k_j(\tau) = \frac{\sigma_j^2}{1-\rho_j^2}\, \rho_j^{|\tau|} \cos(\omega_j \tau)$. It can also be thought of as an exponential kernel, frequency-modulated by $\cos(\omega_j \tau)$. The spectra for $z_j$, denoted as $S_j(\omega)$, is obtained by taking the Fourier transform (FT) of $k_j(\tau)$,
$$S_j(\omega) = \frac{\sigma_j^2}{2}\left[\frac{1}{1 - 2\rho_j \cos(\omega - \omega_j) + \rho_j^2} + \frac{1}{1 - 2\rho_j \cos(\omega + \omega_j) + \rho_j^2}\right],$$
with the detailed derivation in Appendix B.
Given $\{S_j(\omega)\}_{j=1}^{J}$, we can show that the PSD of the entire process is simply $S(\omega) = \sum_{j=1}^{J} S_j(\omega) + \sigma_\varepsilon^2$. First, since $z_j$ is independent across $j$, the autocovariance of $y$ simplifies to a sum of the component autocovariances. Next, using the linearity of the FT, we can conclude that $S(\omega)$ is a superposition of the individual spectra.
3.2 PLSO for nonstationary data
If $y$ is nonstationary, we can still apply the stationary PLSO of Eq. 2 for time-domain inference. However, this implies constant spectra for the entire data ($\omega_j$, $l_j$, and $\sigma_j^2$ do not depend on time), which is not suitable for nonstationary time-series for which we want to track spectral dynamics. This point is further illustrated in Section 6.
We therefore segment the $N$ samples of $y$ into $M$ non-overlapping PS intervals of length $W$, indexed by $m = 1, \ldots, M$, such that $N = MW$. We then apply the stationary PLSO to each interval, with additional Markovian dynamics imposed on $\{\log \sigma_{j,m}^2\}_{m=1}^{M}$,
$$z_{j,n} = \rho_j\, R(\omega_j)\, z_{j,n-1} + u_{j,n}, \quad u_{j,n} \sim \mathcal{N}(0, \sigma_{j,m}^2 I_2) \;\text{ for } n \in ((m-1)W, mW],$$
$$\log \sigma_{j,m}^2 = \log \sigma_{j,m-1}^2 + v_{j,m}, \quad v_{j,m} \sim \mathcal{N}\big(0, (2\lambda)^{-1}\big), \tag{3}$$
where $y_n = \sum_j \operatorname{Re}(z_{j,n}) + \varepsilon_n$ as in Eq. 2 and $\lambda > 0$ controls the continuity of the interval-level variances. We define $P_n$ as the covariance of $z_n$. The graphical model is shown in Fig. 2.
We can understand PLSO as providing a parameterized spectrogram, defined by $\Theta = \{\omega_j, l_j\}_{j=1}^{J}$ and $\{\sigma_{j,m}^2\}$, for the time-domain generative model. The lengthscale $l_j$ controls the bandwidth of the process, with larger $l_j$ corresponding to narrower bandwidth. The variance $\sigma_{j,m}^2$ controls the power of $z_j$ and changes across different intervals, resulting in time-varying spectra $S_j(\omega; \sigma_{j,m}^2)$ and PSD. The center frequency $\omega_j$, at which $S_j$ is maximized, controls the modulation frequency.
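Continuing the sketch above, the interval-level dynamics of Eq. 3 amount to a Gaussian random walk on the log state-noise power; the interval count, interval length, and $\lambda$ below are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, W = 50, 100                   # intervals and samples per interval (N = M*W)
lam = 10.0                       # continuity strength; walk variance is 1/(2*lam)
rho, omega, sig_eps = 0.996, 2 * np.pi * 10 / 250.0, 0.1
R = np.array([[np.cos(omega), -np.sin(omega)],
              [np.sin(omega),  np.cos(omega)]])

log_sig2 = np.zeros(M)           # w_m = log sigma_m^2 follows a random walk
for m in range(1, M):
    log_sig2[m] = log_sig2[m - 1] + rng.normal(0.0, np.sqrt(1 / (2 * lam)))

z = np.zeros((M * W, 2))
for n in range(1, M * W):
    m = (n - 1) // W             # interval governing the transition into sample n
    z[n] = rho * R @ z[n - 1] + rng.normal(0.0, np.sqrt(np.exp(log_sig2[m])), size=2)
y = z[:, 0] + rng.normal(0.0, sig_eps, size=M * W)
```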
As discussed previously, the segmentation approach for nonstationary time-series produces distortion/discontinuity artifacts around interval boundaries; PLSO, as described by Eq. 3, resolves these issues gracefully. We now analyze two mathematical properties, stochastic continuity and piecewise stationarity, to gain more insight into how PLSO accomplishes this.
3.2.1 Stochastic continuity
We discuss two types of stochastic continuity: 1) across the interval boundaries and 2) on $\{\sigma_{j,m}^2\}_{m=1}^{M}$.
Continuity across the interval boundaries In PLSO, the state-space model (Eq. 3) provides stochastic continuity across different PS intervals. The following proposition rigorously explains stochastic continuity for PLSO.
Proposition 1.
For a given $j$, as $f_s \to \infty$, the samples on either side of an interval boundary, $z_{j,mW}$ and $z_{j,mW+1}$, converge to each other in mean square,
$$\lim_{f_s \to \infty} \mathbb{E}\big[\|z_{j,mW+1} - z_{j,mW}\|^2\big] = 0,$$
where we use $\rho_j = \exp(-1/(f_s l_j))$.
Proof.
We use the connection between PLSO, which is a discrete-time model, and its continuous-time counterpart. Details are in Appendix C. ∎
This matches our intuition that as $f_s \to \infty$, adjacent samples from the same process should converge to each other. For PS approaches without sample-level continuity, even with an interval-level constraint [Rosen et al., 2009, Kim et al., 2018, Song et al., 2018, Das and Babadi, 2018, Soulat et al., 2019], convergence is not guaranteed.
We can interpret the continuity in the context of the posterior for $z$. For PS approaches without continuity, we have
$$p\big(z^{(1:N)} \mid y^{(1:N)}\big) = \prod_{m=1}^{M} p\big(z^{((m-1)W+1:mW)} \mid y^{((m-1)W+1:mW)}\big), \tag{4}$$
where $\Theta$ and $\{\sigma_{j,m}^2\}$ are omitted for notational ease. This is due to $p(z^{((m-1)W+1:mW)} \mid y^{(1:N)}) = p(z^{((m-1)W+1:mW)} \mid y^{((m-1)W+1:mW)})$, a result of the absence of continuity across the intervals. Consequently, the inferred time-domain estimates are conditionally independent across the intervals. In PLSO, on the contrary, the time-domain estimates depend on the entire $y^{(1:N)}$, not just on a subset.
Continuity on $\{\sigma_{j,m}^2\}_m$ For a given $j$, we impose stochastic continuity on $\{\log \sigma_{j,m}^2\}_m$. Effectively, this pools together the estimates of $\sigma_{j,m}^2$ to 1) prevent overfitting to the noisy data spectra and 2) estimate smooth dynamics of $\sigma_{j,m}^2$. The use of $\log \sigma_{j,m}^2$ ensures that $\sigma_{j,m}^2$ is non-negative.
The choice of $\lambda$ dictates the smoothness of $\{\sigma_{j,m}^2\}_m$, with the two extremes corresponding to familiar dynamics. If $\lambda = 0$, we treat each window independently. If $\lambda \to \infty$, we treat the data as stationary, as the constraint forces $\sigma_{j,m}^2 = \sigma_{j,m-1}^2$, $\forall m$. Practically, the smooth regularization prevents artifacts in the spectral analysis, arising from sudden motion or missing data, as demonstrated in Section 6.
3.2.2 Piecewise stationarity
For the $m$-th window to be piecewise stationary, the initial state covariance matrix of the window should be the steady-state covariance matrix for the window, denoted as $\Sigma_{j,m}^{\infty} = \frac{\sigma_{j,m}^2}{1-\rho_j^2} I_2$.
The challenge is transitioning from $\Sigma_{j,m}^{\infty}$ to $\Sigma_{j,m+1}^{\infty}$. Specifically, to ensure $P_{j,mW+1} = \Sigma_{j,m+1}^{\infty}$, given that $P_{j,mW} = \Sigma_{j,m}^{\infty}$, the variance of the process noise between the two samples, $\tilde{\sigma}^2$, has to equal $\frac{\sigma_{j,m+1}^2 - \rho_j^2 \sigma_{j,m}^2}{1 - \rho_j^2}$. However, this is infeasible. If $\sigma_{j,m+1}^2 < \rho_j^2 \sigma_{j,m}^2$, the variance is negative. Even if it were positive, the limit as $f_s \to \infty$ does not equal zero. As a result, Proposition 1 no longer holds and the trajectory is discontinuous.
In summary, there exists a trade-off between maintaining piecewise stationarity and continuity across intervals. PLSO maintains continuity across the intervals while ensuring that the state covariance quickly transitions to the steady-state covariance. We quantify the speed of transition in the following proposition.
Proposition 2.
Assume $c_{j,m} > 0$ such that $\|P_{j,(m-1)W+1} - \Sigma_{j,m}^{\infty}\| \le c_{j,m}$. In Eq. 3, the difference between $P_{j,(m-1)W+n}$ and $\Sigma_{j,m}^{\infty}$ decays exponentially fast as a function of $n$,
$$\big\|P_{j,(m-1)W+n} - \Sigma_{j,m}^{\infty}\big\| \le \rho_j^{2(n-1)}\, c_{j,m}.$$
Proof.
In Appendix D, we prove this result by induction.
∎
This implies that, except for a transition portion at the beginning of each window, we can assume stationarity. In practice, we additionally impose an upper bound on $l_j$ during estimation and also use a reasonably large $W$. Empirically, we observe that the transition period has little impact.
4 Inference
Given the generative model in Eq. 3, our goal is to perform inference on the posterior distribution
$$p\big(z^{(1:N)}, \{\sigma_{j,m}^2\} \mid y^{(1:N)}\big) = p\big(z^{(1:N)} \mid \{\sigma_{j,m}^2\}, y^{(1:N)}\big)\, p\big(\{\sigma_{j,m}^2\} \mid y^{(1:N)}\big). \tag{5}$$
We can learn the parameters $\Theta = \{\omega_j, l_j\}_{j=1}^{J}$ or fix them to specific values informed by domain knowledge, such as the center frequency or bandwidth of the processes. The posterior distribution factorizes into two terms as in Eq. 5, the window-level posterior $p(\{\sigma_{j,m}^2\} \mid y^{(1:N)})$ and the sample-level posterior $p(z^{(1:N)} \mid \{\sigma_{j,m}^2\}, y^{(1:N)})$. Accordingly, we break the inference into two stages.
Stage 1 We minimize the window-level negative log-posterior with respect to $\Theta$ and $\{\sigma_{j,m}^2\}$. Specifically, we obtain the maximum likelihood (ML) estimate $\Theta_{\mathrm{ML}}$ and the maximum a posteriori (MAP) estimate $\{\sigma_{j,m}^2\}_{\mathrm{MAP}}$. We drop the subscripts ML and MAP for notational simplicity.
Stage 2 Given $\Theta$ and $\{\sigma_{j,m}^2\}$, we perform inference on the sample-level posterior. This includes computing the mean and credible intervals, which quantify the uncertainty of the estimates, or any statistical quantity derived from the posterior distribution.
4.1 Optimization of $\Theta$ and $\{\sigma_{j,m}^2\}$
We factorize the posterior, $p(\{\sigma_{j,m}^2\}, \Theta \mid y^{(1:N)}) \propto p(y^{(1:N)} \mid \{\sigma_{j,m}^2\}, \Theta)\, p(\{\sigma_{j,m}^2\})$. As exact inference is intractable, we instead minimize the negative log-posterior. This is an empirical Bayes approach [Casella, 1985], since we estimate $\Theta$ using the marginal likelihood. The smooth hyperprior provides the MAP estimate for $\{\sigma_{j,m}^2\}$.
We use the Whittle likelihood [Whittle, 1953], defined for stationary time-series in the frequency domain, for the log-likelihood,
$$\log p\big(y^{(1:N)} \mid \{\sigma_{j,m}^2\}, \Theta\big) \approx -\frac{1}{2} \sum_{m=1}^{M} \sum_{k=1}^{W} \left[\log S(\omega_k; \sigma_m^2, \Theta) + \frac{|Y_m(\omega_k)|^2}{S(\omega_k; \sigma_m^2, \Theta)}\right], \tag{6}$$
where the log-likelihood is the sum of the Whittle likelihood computed for each interval, with discrete frequency $\omega_k = 2\pi k / W$ and data STFT $Y_m(\omega_k)$. The Whittle likelihood, which is nonconvex, enables frequency-domain parameter estimation as a computationally more efficient alternative to time-domain estimation [Turner and Sahani, 2014]. The concave log-prior, which arises from the continuity on $\{\log \sigma_{j,m}^2\}$, is given as
$$\log p\big(\{\sigma_{j,m}^2\}\big) = -\lambda \sum_{j=1}^{J} \sum_{m=2}^{M} \big(\log \sigma_{j,m}^2 - \log \sigma_{j,m-1}^2\big)^2 + \mathrm{const.} \tag{7}$$
This yields the following nonconvex problem
$$\min_{\Theta,\, \{\sigma_{j,m}^2\}} \; -\log p\big(y^{(1:N)} \mid \{\sigma_{j,m}^2\}, \Theta\big) + \lambda \sum_{j=1}^{J} \sum_{m=2}^{M} \big(\log \sigma_{j,m}^2 - \log \sigma_{j,m-1}^2\big)^2. \tag{8}$$
We optimize Eq. 8 by block coordinate descent [Wright, 2015] on $\Theta$ and $\{\sigma_{j,m}^2\}$. For $\omega_j$, $l_j$, and $\sigma_\varepsilon^2$, we minimize the negative log-likelihood, since the prior does not affect them. We perform conjugate gradient descent on $\omega_j$ and $l_j$. We discuss initialization and how to estimate $\sigma_\varepsilon^2$ in Appendix E.
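For concreteness, here is a hedged sketch of the per-interval Whittle likelihood (Eq. 6), assembled from the analytic component spectra of Section 3.1; the function names and the periodogram normalization are our own choices, not the paper's.

```python
import numpy as np

def component_spectrum(w, omega_j, rho_j, sig2_j):
    """Analytic spectrum of one oscillatory component at normalized frequencies w."""
    return 0.5 * sig2_j * (1 / (1 - 2 * rho_j * np.cos(w - omega_j) + rho_j**2)
                           + 1 / (1 - 2 * rho_j * np.cos(w + omega_j) + rho_j**2))

def whittle_loglik(y_m, omegas, rhos, sig2s, sig_eps2):
    """Whittle log-likelihood of one length-W interval y_m (up to constants)."""
    W = len(y_m)
    w_k = 2 * np.pi * np.arange(W) / W              # discrete frequencies omega_k
    I_k = np.abs(np.fft.fft(y_m))**2 / W            # periodogram |Y_m(omega_k)|^2
    S_k = sig_eps2 + sum(component_spectrum(w_k, o, r, s)
                         for o, r, s in zip(omegas, rhos, sig2s))
    return -0.5 * np.sum(np.log(S_k) + I_k / S_k)
```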
4.1.1 Optimization of $\{\sigma_{j,m}^2\}$
We introduce an algorithm to compute a locally optimal solution of $\{\sigma_{j,m}^2\}$ for the nonconvex optimization problem in Eq. 8, by leveraging the regularized temporal structure of $\{\log \sigma_{j,m}^2\}_m$. It extends the inexact accelerated proximal gradient (APG) method [Li and Lin, 2015] by solving the proximal step with a Kalman filter/smoother [Kalman, 1960]. This follows since computing the proximal operator for the prior term is equivalent to MAP estimation for $J$ independent 1-dimensional linear Gaussian state-space models
$$\operatorname{prox}_{\alpha_t g}(v) = \arg\min_{w} \; \frac{1}{2\alpha_t} \sum_{j,m} \big(w_{j,m} - v_{j,m}\big)^2 + \lambda \sum_{j,m} \big(w_{j,m} - w_{j,m-1}\big)^2, \tag{9}$$
where $\alpha_t$ is a step-size for the $t$-th iteration, $w_{j,m} = \log \sigma_{j,m}^2$, and $v = w^{(t)} - \alpha_t \nabla f(w^{(t)})$ with $f$ the negative Whittle log-likelihood and $g$ the prior penalty.
The optimization problem in Eq. 9 is equivalent to estimating the mean of the posterior for $w_j$ in a linear Gaussian state-space model with observations $v_{j,m}$, observation noise variance $\alpha_t$, and state variance $1/(2\lambda)$. Therefore, the solution can be computed efficiently with $J$ 1-dimensional Kalman filters/smoothers, with a computational complexity of $\mathcal{O}(JM)$.
Note that Eq. 9 holds for all non-negative $\lambda$. If $\lambda = 0$, the proximal operator is an identity operator, as Eq. 9 then returns $w = v$; the method reduces to gradient descent with a step-size rule. If $\lambda \to \infty$, we have $w_{j,m} = w_{j,m-1}$, $\forall m$, which leads to the stationary solution. The algorithm is guaranteed to converge when $\alpha_t \le 1/L$, where $L$ is the Lipschitz constant for $\nabla f$. In practice, we select $\alpha_t$ according to the Barzilai-Borwein step-size rule [Barzilai and Borwein, 1988]. In Appendices F and G, we present the full algorithm for optimizing $\{\sigma_{j,m}^2\}$ and a derivation for $L$.
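A minimal sketch of the proximal step (Eq. 9) for a single component, solved exactly by a forward Kalman filter and backward RTS smoother on a 1-D random walk with observation noise $\alpha_t$ and state variance $1/(2\lambda)$; the flat prior on the first entry is our modeling assumption.

```python
import numpy as np

def prox_kalman(v, alpha, lam):
    """MAP estimate of w in Eq. 9 for one component (1-D Kalman smoothing)."""
    M = len(v)
    q, r = 1.0 / (2.0 * lam), alpha           # state and observation variances
    mu_f, P_f = np.zeros(M), np.zeros(M)      # filtered means and variances
    mu, P = v[0], r                           # flat prior on w_1 (assumed)
    mu_f[0], P_f[0] = mu, P
    for m in range(1, M):                     # forward Kalman filter
        P_pred = P + q                        # random-walk prediction
        K = P_pred / (P_pred + r)             # Kalman gain
        mu = mu + K * (v[m] - mu)             # predicted mean = previous mean
        P = (1 - K) * P_pred
        mu_f[m], P_f[m] = mu, P
    w = mu_f.copy()
    for m in range(M - 2, -1, -1):            # backward RTS smoother
        G = P_f[m] / (P_f[m] + q)             # smoother gain
        w[m] = mu_f[m] + G * (w[m + 1] - mu_f[m])
    return w
```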
4.2 Inference for $z$
We perform inference on the posterior distribution $p(z^{(1:N)} \mid \{\sigma_{j,m}^2\}, y^{(1:N)})$. Since this is a Gaussian distribution, the mean trajectories and the credible intervals can be computed analytically. Moreover, since Eq. 3 is a linear Gaussian state-space model, we can use the Kalman filter/smoother for efficient computation, with computational complexity $\mathcal{O}(J^3 N)$, as further discussed in Appendix H. Since we use the point estimate $\{\sigma_{j,m}^2\}_{\mathrm{MAP}}$, the credible interval for $z$ will be narrower compared to the full Bayesian setting, which accounts for all values of $\{\sigma_{j,m}^2\}$.
4.2.1 Monte Carlo Inference
We can also obtain posterior samples and perform Monte Carlo (MC) inference on any posterior-derived quantity. To generate the MC trajectory samples, we use the forward-filter backward-sampling (FFBS) algorithm [Carter and Kohn, 1994]. Assuming $K$ MC samples, the computational complexity for FFBS is $\mathcal{O}(K J^3 N)$, since for each sample the algorithm uses the Kalman filter/smoother. This is different from generating samples with the interval-specific posterior in Eq. 4; in the latter case, the FFBS algorithm is run $M$ times, and the resulting samples have to be concatenated to form an entire trajectory. With PLSO, the trajectory sample is conditioned on the entire observation and is continuous across the intervals.
One quantity of interest is the phase. We obtain the phase as $\phi_{j,n} = \tan^{-1}\big(\operatorname{Im}(z_{j,n}) / \operatorname{Re}(z_{j,n})\big)$. Since $\tan^{-1}$ is a non-linear operation, we compute the mean and credible interval with MC samples through the FFBS algorithm. Given the posterior-sampled trajectories $\{z^{(1:N)}_{[k]}\}_{k=1}^{K}$, where $[k]$ denotes the MC sample index, we estimate the mean phase as the empirical average across samples, and use empirical quantiles for the associated credible interval.
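A hedged sketch of this Monte Carlo step, assuming `z_samples` holds $K$ FFBS trajectories of one component with shape (K, N, 2) (real and imaginary parts in the last axis); averaging angles on the unit circle is our choice to avoid wrap-around at $\pm\pi$, and the paper's exact estimator may differ.

```python
import numpy as np

def phase_with_ci(z_samples, lo=2.5, hi=97.5):
    """Monte Carlo phase mean and credible interval from posterior samples."""
    phases = np.arctan2(z_samples[..., 1], z_samples[..., 0])   # (K, N) angles
    mean_phase = np.angle(np.exp(1j * phases).mean(axis=0))     # circular mean
    ci = np.percentile(phases, [lo, hi], axis=0)                # empirical quantiles
    return mean_phase, ci
```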
4.3 Choice of $J$ and $\lambda$
We choose the number of components $J$ that minimizes the Akaike Information Criterion (AIC) [Akaike, 1981], defined as
$$\mathrm{AIC}(J) = -2 \log p\big(y^{(1:N)} \mid \{\hat{\sigma}_{j,m}^2\}, \hat{\Theta}\big) + 2 K_J, \tag{10}$$
where $K_J$ corresponds to the number of parameters. The regularization parameter $\lambda$ is determined through a two-fold cross-validation, where each fold is generated by aggregating even/odd sample indices [Ba et al., 2014].
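A sketch of the even/odd two-fold split used to select $\lambda$; `fit` and `heldout_loglik` are hypothetical stand-ins for PLSO training and held-out evaluation.

```python
import numpy as np

def choose_lambda(y, lambdas, fit, heldout_loglik):
    """Two-fold CV over candidate lambdas, folds built from sample parity."""
    folds = [(y[0::2], y[1::2]), (y[1::2], y[0::2])]   # (train, test) pairs
    scores = [sum(heldout_loglik(fit(tr, lam), te) for tr, te in folds)
              for lam in lambdas]
    return lambdas[int(np.argmax(scores))]             # best held-out likelihood
```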
4.4 Choice of window length
The choice of window length presents a tradeoff between 1) spectral resolution and 2) the temporal resolution of the spectral dynamics [Oppenheim et al., 2009]. For a shorter window, the estimated spectral dynamics have finer temporal resolution, coarser spectral resolution, and higher variance. For a longer window, these trends are reversed. This suggests that the choice is application-dependent. For electrophysiology data, a window on the order of seconds is used, as scientific interpretations are made on the basis of fine spectral resolution (<1 Hz). For audio signal processing [Gold et al., 2011], short windows (tens of milliseconds) are used, since audio data is highly nonstationary and thus requires fine temporal resolution for processing. A survey of window lengths used in different applications can be found in the supporting information of Kim et al. [2018].
5 Related works
We examine how PLSO relates to other nonstationary frameworks.
STFT/Regularized STFT In the STFT, a harmonic basis is used, whereas PLSO uses quasi-periodic components, which allows capturing broader spectral content. Recent works regularize the STFT coefficients with stochastic continuity across the windows to infer smooth spectral dynamics [Ba et al., 2014, Kim et al., 2018]. However, this regularization still leads to discontinuities at window boundaries.
Piecewise stationary GP GP regression and parameter estimation are performed within each interval [Gramacy and Lee, 2008, Solin et al., 2018]. Consequently, the recovered trajectories are discontinuous. Also, the inversion of the covariance matrix leads to expensive inference. For example, the time-complexity of mean trajectory estimation is $\mathcal{O}(N W^2)$, whereas the time-complexity for PLSO is $\mathcal{O}(J^3 N)$. Considering that the typical sampling frequency for electrophysiology data is on the order of $10^3$ Hz and windows are several seconds long, which leads to $W \gg J$, PLSO is computationally more efficient. In Appendix I, we confirm this through an experiment.
Time-varying Autoregressive model (TVAR) The TVAR model [Kitagawa and Gersch, 1985] is given as
$$y_n = \sum_{p=1}^{P} a_{p,n}\, y_{n-p} + \varepsilon_n,$$
with time-varying coefficients $\{a_{p,n}\}$ that evolve smoothly over time. Consequently, it does not suffer from discontinuity issues. TVAR can also be approximately decomposed into oscillatory components via eigen-decomposition of the transition matrix [West et al., 1999]. However, since the eigen-decomposition changes at every sample, this leads to an ambiguous interpretation of the oscillations in the data, as we discuss in Section 6.
RNN frameworks Despite the popularity of recurrent neural networks (RNNs) for time-series applications [Goodfellow et al., 2016], we believe PLSO is more appropriate for time-frequency analysis for two reasons.
1. RNNs operate in the time domain with the goal of prediction/denoising, and consequently less emphasis is placed on local stationarity or estimation of second-order statistics. Performing time-frequency analysis requires segmenting the RNN outputs and applying the STFT, which yields noisy spectral estimates.
2. RNNs are not generative frameworks. Although a variational framework can be combined with an RNN [Chung et al., 2015, Krishnan et al., 2017], the use of a variational lower-bound objective could lead to suboptimal results. PLSO, on the other hand, is a generative framework that maximizes the true log-posterior.
6 Experiments
We apply PLSO to three settings: 1) a simulated dataset, 2) local-field potential (LFP) data from the rat hippocampus, and 3) EEG data from a human subject under anesthesia.
We use PLSO with $\lambda = 0$, $\lambda \to \infty$, and $\lambda = \lambda_{\mathrm{CV}}$ determined by cross-validation. We use interval lengths chosen by domain experts. As baselines, we use 1) the regularized STFT (STFT-reg.) and 2) the piecewise stationary GP (GP-PS). For GP-PS, we use the same $\Theta$ and $\{\sigma_{j,m}^2\}$ as PLSO with $\lambda = 0$, so that the estimated PSDs of GP-PS and PLSO are identical. This lets us explain differences in the time-domain estimates by the fact that PLSO enforces continuity across intervals.
6.1 Simulated dataset
We simulate from the following model for $n = 1, \ldots, N$:
$$y_n = \big(1 + \sin(\omega_{\mathrm{slow}}\, n)\big)\, \operatorname{Re}(z_{1,n}) + \frac{n}{N}\, \operatorname{Re}(z_{2,n}) + \varepsilon_n,$$
where $z_1$ and $z_2$ are as in Eq. 2 and $\omega_{\mathrm{slow}}$ is a slow modulation frequency. This process comprises two amplitude-modulated oscillations, namely one modulated by a slow-frequency sinusoid and the other by a linearly-increasing signal [Ba et al., 2014]. We simulate 20 realizations and train on each realization, assuming 2-second PS intervals. Additional details are provided in Appendix J.
Results We use two metrics: 1) the mean squared error (MSE) between the mean estimate and the ground truth and 2) a jump metric measuring the level of discontinuity at the interval boundaries, computed from the difference between the estimates on either side of each boundary. The averaged results are shown in Table 1. If a method's jump metric greatly exceeds that of the ground truth, this implies the existence of large discontinuities at the boundaries.
| | Jump | MSE | IS div. |
|---|---|---|---|
| Truth | 0.95/12.11 | 0/0 | 0 |
| PLSO ($\lambda = 0$) | 0.26/10.15 | 2.90/3.92 | 4.08 |
| PLSO ($\lambda \to \infty$) | 0.22/10.32 | 3.26/4.53 | 13.78 |
| PLSO ($\lambda_{\mathrm{CV}}$) | 0.25/10.21 | 2.88/3.91 | 3.93 |
| STFT-reg. | 49.59/81.00 | 6.89/10.68 | N/A |
| GP-PS | 16.99/23.28 | 3.00/4.04 | 4.08 |
Fig. 3 shows the true data in the time domain and the spectrogram results. Fig. 3(c) shows that although the regularized STFT detects activity around the two center frequencies, it fails to delineate the time-varying spectral pattern. Fig. 3(d) shows that PLSO with the stationarity assumption ($\lambda \to \infty$) is too restrictive. Figs. 3(e) and (f) show that both PLSO with the independent-window assumption ($\lambda = 0$) and PLSO with cross-validated $\lambda_{\mathrm{CV}}$ capture the dynamic pattern, with the latter more effective in recovering the smooth dynamics across different PS intervals.
For GP-PS and STFT-reg., the jump metric exceeds that of the ground truth, indicating discontinuities at the boundaries; an example is shown in Fig. 1. PLSO produces a jump metric similar to the ground truth, indicating the absence of discontinuities; we attribute the slightly lower value to Kalman smoothing. For the TF domain, we use the Itakura-Saito (IS) divergence [Itakura and Saito, 1970] as a distance measure between the ground-truth spectra and the PLSO estimates. That the highest divergence is attained by $\lambda \to \infty$ indicates the inaccuracy of the stationarity assumption.
6.2 LFP data from the rat hippocampus
We use LFP data collected from the rat hippocampus during open field tasks [Mizuseki et al., 2009].² The theta oscillation band is believed to play a role in coordinating the firing of neurons in the entorhinal-hippocampal system and is important for understanding local circuit computation.
² We use channel 1 of mouse ec013.528 for the LFP. The population spikes were simultaneously recorded.
We fit PLSO with $J = 5$, which minimizes the AIC as shown in Table 2, with a 2-second PS interval. The estimated center frequency of one component is 7.62 Hz, in the theta band. To obtain the phase for non-PLSO methods, we perform the Hilbert transform on the theta-band reconstructed signal. With no ground truth available, we bandpass-filter (BPF) the data in the theta band for reference.
| $J$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| AIC | 2882 | 2593 | 2566 | 2522 | 2503 | 2505 |
Spike-phase coupling Fig. 4(a) shows the theta-phase distribution of population neuron spikes in the hippocampus. The PLSO-estimated distribution (red) confirms the original results analyzed with the bandpass-filtered signal (black) [Mizuseki et al., 2009]: the hippocampal spikes show a strong preference for a specific phase of the theta band in this dataset. Since PLSO provides posterior sample-trajectories for the entire time-series, we can compute as many realizations of the phase distribution as the number of MC samples. The resulting credible interval excludes the uniform distribution (horizontal gray), suggesting the statistical significance of the strong phase preference.
Denoised spectrogram Fig. 4(b-c) shows the estimated spectrogram. We observe that PLSO denoises the spectrogram, while retaining the sustained theta-band power and weaker bursts at higher frequencies.
Time-domain discontinuity Fig. 4(d-e) shows a segment of the estimated signal and phase near a boundary for the theta component. While the estimates from STFT-reg. (blue) and PLSO (red) follow the BPF result closely, the STFT-reg. estimates exhibit discontinuity/distortion near the boundary. In Fig. 4(e), this produces a large phase jump at the boundary. We also computed the average phase increment, in degrees/sample, near the boundaries. Given the rate at which the theta band progresses per sample, BPF (2.23), as expected, and PLSO ($\lambda = 0$: 2.40, $\lambda_{\mathrm{CV}}$: 2.66) are not affected by the boundary effect. This is not the case for STFT-reg. (26.83) and GP-PS (25.91).
Comparison with TVAR Fig. 5(a-b) shows a segment of the TVAR inference results.³ Specifically, Fig. 5(a) and (b) show the time-varying center frequency and the corresponding reconstruction for the lowest-frequency component. Note that the eigenvalues, which determine the center frequencies, and the eigenvectors, which are used for the oscillatory decomposition, are derived from the estimated TVAR transition matrix. Consequently, we cannot explicitly control the center frequencies, as shown in Fig. 5(a), the bandwidth of each component, or the number of components. This is further complicated by the fact that the transition matrix changes at every sample. Fig. 5(b) shows that this ambiguity results in the lowest-frequency component of TVAR explaining both the slow and theta components. With PLSO, on the contrary, we can explicitly specify or learn the parameters. Fig. 5(c-d) demonstrates that PLSO delineates the slow/theta components without any discontinuity.
³ The details for TVAR are in Appendix K.
6.3 EEG data from the human brain under propofol anesthesia
We apply PLSO to EEG data from a subject anesthetized with propofol, to assess whether PLSO can leverage regularization to recover the smooth spectral dynamics widely observed during propofol-induced unconsciousness [Purdon et al., 2013].⁴ The drug infusion starts partway through the recording, and the subject loses consciousness shortly thereafter. We assume a 4-second PS interval.
⁴ The EEG recording is part of de-identified data collected from patients at Massachusetts General Hospital (MGH) as part of an MGH Human Research Committee-approved protocol.
Smooth spectrogram Fig. 6(a-b) shows a segment of the PLSO-estimated spectrogram with $\lambda = 0$ and $\lambda_{\mathrm{CV}}$. They identify strong slow (0.1-1 Hz) and alpha (8-12 Hz) oscillations, both well-known signatures of propofol-induced unconsciousness. We also observe that the alpha-band power diminishes for a period partway through the recording, suggesting that the subject regained consciousness before becoming unconscious again. PLSO with $\lambda = 0$ exhibits PSD fluctuations across windows, since the $\{\sigma_{j,m}^2\}$ are estimated independently. The stationary PLSO ($\lambda \to \infty$) is too restrictive and fails to capture the spectral dynamics (Fig. 6(c)). In contrast, PLSO with $\lambda_{\mathrm{CV}}$ exhibits smooth dynamics by pooling together estimates from neighboring windows. The regularization also helps remove movement-related artifacts, shown as vertical lines in Fig. 6(a), and spurious high-frequency power. In summary, PLSO with regularization enables smooth spectral-dynamics estimation and spurious noise removal.
7 Conclusion
We presented the Piecewise Locally Stationary Oscillation (PLSO) framework to model nonstationary time-series data with slowly time-varying spectra as the superposition of piecewise stationary (PS) oscillatory components. PLSO strikes a balance between stochastic continuity of the data across PS intervals and stationarity within each interval. For inference, we introduced an algorithm that combines Kalman theory and a nonconvex accelerated proximal gradient method. Applications to simulated and real data show that PLSO preserves time-domain continuity and captures time-varying spectra. Future directions include 1) the automatic identification of PS intervals and 2) the expansion to higher-order autoregressive models and diverse priors on the parameters that enforce continuity across intervals.
Acknowledgements.
We thank Abiy Tasissa for helpful discussions on the algorithm and the derivations. This research is supported by NIH grant P01 GM118629 and support from the JPB Foundation. AHS acknowledges a Samsung scholarship.
References
- Adak [1998] S. Adak. Time-dependent spectral analysis of nonstationary time series. Journal of the American Statistical Association, 93(444):1488–1501, 1998.
- Akaike [1981] H. Akaike. Likelihood of a model and information criteria. Journal of Econometrics, 16(1):3–14, 1981.
- Ba et al. [2014] D. Ba, B. Babadi, P. L. Purdon, and E. N. Brown. Robust spectrotemporal decomposition by iteratively reweighted least squares. Proceedings of the National Academy of Sciences, 111(50):E5336–E5345, 2014.
- Barzilai and Borwein [1988] J. Barzilai and J. M. Borwein. Two-Point Step Size Gradient Methods. IMA Journal of Numerical Analysis, 8(1):141–148, 01 1988.
- Beck et al. [2018] A. M. Beck, E. P. Stephen, and P. L. Purdon. State space oscillator models for neural data analysis. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 4740–4743, 2018.
- Brown et al. [2004] E. N. Brown, V. Solo, Y. Choe, and Z. Zhang. Measuring period of human biological clock: infill asymptotic analysis of harmonic regression parameter estimates. Methods in enzymology, 383:382–405, 2004.
- Carter and Kohn [1994] C. K. Carter and R. Kohn. On gibbs sampling for state space models. Biometrika, 81(3):541–553, 1994.
- Casella [1985] G. Casella. An introduction to empirical bayes data analysis. The American Statistician, 39(2):83–87, 1985.
- Cemgil and Godsill [2005] A. T. Cemgil and S. J. Godsill. Probabilistic phase vocoder and its application to interpolation of missing values in audio signals. In 2005 13th European Signal Processing Conference, pages 1–4, 2005.
- Chung et al. [2015] J. Chung, K. Kastner, L. Dinh, K. Goel, A. C. Courville, and Y. Bengio. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems, volume 28. Curran Associates, Inc., 2015.
- Dahlhaus [1997] R. Dahlhaus. Fitting time series models to nonstationary processes. Ann. Statist., 25(1):1–37, 02 1997.
- Das and Babadi [2018] P. Das and B. Babadi. Dynamic bayesian multitaper spectral analysis. IEEE Transactions on Signal Processing, 66(6):1394–1409, 2018.
- Daubechies et al. [2011] I. Daubechies, J. Lu, and H. Wu. Synchrosqueezed wavelet transforms: An empirical mode decomposition-like tool. Applied and Computational Harmonic Analysis, 30(2):243 – 261, 2011.
- Gold et al. [2011] B. Gold, N. Morgan, and D. Ellis. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley-Interscience, USA, 2nd edition, 2011. ISBN 0470195363.
- Goodfellow et al. [2016] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
- Gramacy and Lee [2008] R. B. Gramacy and H. K. H. Lee. Bayesian treed gaussian process models with an application to computer modeling. Journal of the American Statistical Association, 103(483):1119–1130, 2008.
- Griffin and J. Lim [1984] D. Griffin and J. Lim. Signal estimation from modified short-time fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(2):236–243, 1984.
- Huang et al. [1998] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N. Yen, C. C. Tung, and H. H. Liu. The empirical mode decomposition and the hilbert spectrum for nonlinear and non-stationary time series analysis. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 454(1971):903–995, 1998.
- Itakura and Saito [1970] F. Itakura and S. Saito. A statistical method for estimation of speech spectral density and formant frequencies. 1970.
- Kalman [1960] R.E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 1960.
- Kim et al. [2018] S. Kim, M. K. Behr, D. Ba, and E. N. Brown. State-space multitaper time-frequency analysis. Proceedings of the National Academy of Sciences, 115(1):E5–E14, 2018.
- Kitagawa and Gersch [1985] G. Kitagawa and W. Gersch. A smoothness priors time-varying ar coefficient modeling of nonstationary covariance time series. IEEE Transactions on Automatic Control, 30(1):48–56, 1985.
- Koopman and Lee [2009] S. J. Koopman and K. M. Lee. Seasonality with trend and cycle interactions in unobserved components models. Journal of the Royal Statistical Society: Series C (Applied Statistics), 58(4):427–448, 2009.
- Krishnan et al. [2017] R. G. Krishnan, U. Shalit, and D. Sontag. Structured inference networks for nonlinear state space models. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, page 2101–2109. AAAI Press, 2017.
- Li and Lin [2015] H. Li and Z. Lin. Accelerated proximal gradient methods for nonconvex programming. In Advances in Neural Information Processing Systems 28, pages 379–387. Curran Associates, Inc., 2015.
- Lindsten and Schön [2013] F. Lindsten and T. B. Schön. Backward Simulation Methods for Monte Carlo Statistical Inference. 2013.
- Matsuda and Komaki [2017] T. Matsuda and F. Komaki. Time series decomposition into oscillation components and phase estimation. Neural Computation, 29(2):332–367, 2017.
- Mizuseki et al. [2009] K. Mizuseki, A. Sirota, E. Pastalkova, and G. Buzsaki. Theta oscillations provide temporal windows for local circuit computation in the entorhinal-hippocampal loop. Neuron, 64:267–280, 2009.
- Oppenheim et al. [2009] A. V. Oppenheim, R. W. Schafer, and J. R. Buck. Discrete-time Signal Processing (3rd Ed.). Prentice-Hall, Inc., 2009.
- Priestley [1965] M. B. Priestley. Evolutionary spectra and non-stationary processes. Journal of the Royal Statistical Society. Series B (Methodological), 27(2):204–237, 1965.
- Purdon et al. [2013] P. L. Purdon, E. T. Pierce, E. A. Mukamel, M. J. Prerau, J. L. Walsh, K. F. K. Wong, A. F. Salazar-Gomez, P. G. Harrell, A. L. Sampson, A. Cimenser, S. Ching, N. J. Kopell, C. Tavares-Stoeckel, K. Habeeb, R. Merhar, and E. N. Brown. Electroencephalogram signatures of loss and recovery of consciousness from propofol. Proceedings of the National Academy of Sciences, 110(12):E1142–E1151, 2013.
- Qi et al. [2002] Y. Qi, T. P. Minka, and R. W. Picard. Bayesian spectrum estimation of unevenly sampled nonstationary data. In 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pages II–1473–II–1476, 2002.
- Rasmussen and Williams [2005] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.
- Rosen et al. [2009] O. Rosen, D. S. Stoffer, and S. Wood. Local spectral analysis via a bayesian mixture of smoothing splines. Journal of the American Statistical Association, 104(485):249–262, 2009.
- Solin and Särkkä [2014] A. Solin and S. Särkkä. Explicit Link Between Periodic Covariance Functions and State Space Models. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, volume 33 of Proceedings of Machine Learning Research, pages 904–912. PMLR, 2014.
- Solin et al. [2018] A. Solin, J. Hensman, and R. E Turner. Infinite-horizon gaussian processes. In Advances in Neural Information Processing Systems 31, pages 3486–3495. 2018.
- Song et al. [2018] A. H. Song, S. Chakravarty, and E. N. Brown. A smoother state space multitaper spectrogram. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 33–36, July 2018.
- Soulat et al. [2019] H. Soulat, E. P. Stephen, A. M. Beck, and P. L. Purdon. State space methods for phase amplitude coupling analysis. bioRxiv, 2019.
- Turner and Sahani [2014] R. E. Turner and M. Sahani. Time-frequency analysis as probabilistic inference. IEEE Transactions on Signal Processing, 62(23):6171–6183, 2014.
- West et al. [1999] M. West, R. Prado, and A. D. Krystal. Evaluation and Comparison of EEG Traces: Latent Structure in Nonstationary Time Series. Journal of the American Statistical Association, 94(448):1083–1095, 1999.
- Whittle [1953] P. Whittle. Estimation and information in stationary time series. Ark. Mat., 2(5):423–434, 08 1953.
- Wilkinson et al. [2019] W. J. Wilkinson, M. Riis Andersen, J. D. Reiss, D. Stowell, and A. Solin. Unifying probabilistic models for time-frequency analysis. In 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3352–3356, May 2019.
- Wilson and Adams [2013] A. G. Wilson and R. P. Adams. Gaussian process kernels for pattern discovery and extrapolation. In Proceedings of the 30th International Conference on International Conference on Machine Learning - Volume 28, ICML’13, page III–1067–III–1075, 2013.
- Wilson et al. [2008] K. W. Wilson, B. Raj, P. Smaragdis, and A. Divakaran. Speech denoising using nonnegative matrix factorization with priors. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4029–4032, 2008.
- Wright [2015] S. J. Wright. Coordinate descent algorithms. Mathematical Programming, 151:3–34, 2015.
Appendix
References to the sections in the section headings are made with respect to the sections in the main text. Below is the table of contents for the Appendix.
A. Continuous model interpretation of PLSO (Section 3)
B. PSD for complex AR(1) process
C. Proof for Proposition 1 (Section 3.2.1)
D. Proof for Proposition 2 (Section 3.2.2)
E. Initialization for PLSO (Section 4)
F. Optimization for $\{\sigma_{j,m}^2\}$ via proximal gradient update (Section 4.1.1)
G. Lipschitz constant for $\nabla f$ (Section 4.1.1)
H. Inference with $z$ (Section 4.2)
I. Computational efficiency of PLSO vs. GP-PS
J. Simulation experiment (Section 5.1)
K. Details of TVAR model (Section 5.2)
L. Anesthesia EEG dataset (Section 5.3)
A. Continuous model interpretation of PLSO
B. PSD for complex AR(1) process
We compute the steady-state covariance, denoted as $\Sigma_j^{\infty}$. Since we assume $u_{j,n} \sim \mathcal{N}(0, \sigma_j^2 I_2)$, it is easy to show that $\Sigma_j^{\infty}$ is a diagonal matrix. Denoting $\Sigma_j^{\infty} = \zeta_j I_2$, we use the discrete Lyapunov equation
$$\Sigma_j^{\infty} = \rho_j^2\, R(\omega_j)\, \Sigma_j^{\infty}\, R(\omega_j)^{\mathrm{T}} + \sigma_j^2 I_2, \tag{13}$$
which implies that under the assumption $0 < \rho_j < 1$, we are guaranteed $\zeta_j = \sigma_j^2 / (1 - \rho_j^2)$. To compute the PSD of the stationary process, we now need to compute the autocovariance. Since only $\operatorname{Re}(z_{j,n})$ contributes to $y_n$, we compute the autocovariance of $\operatorname{Re}(z_{j,n})$ as
$$k_j(\tau) = \mathbb{E}\big[\operatorname{Re}(z_{j,n})\, \operatorname{Re}(z_{j,n+\tau})\big] = \frac{\sigma_j^2}{1 - \rho_j^2}\, \rho_j^{|\tau|} \cos(\omega_j \tau), \tag{14}$$
where $\operatorname{Re}(\cdot)$ denotes the operator that extracts the real part of the complex argument and we used the fact that $R(\omega_j)^{\tau} = R(\omega_j \tau)$. The spectra for the $j$-th component can be written as
$$S_j(\omega) = \sum_{\tau = -\infty}^{\infty} k_j(\tau)\, e^{-i \omega \tau}. \tag{15}$$
Unpacking the infinite sum for one of the terms,
$$\sum_{\tau = 0}^{\infty} \rho_j^{\tau}\, e^{i (\omega_j - \omega) \tau} = \frac{1}{1 - \rho_j\, e^{i(\omega_j - \omega)}}. \tag{16}$$
Using the relation $\cos(\omega_j \tau) = \big(e^{i \omega_j \tau} + e^{-i \omega_j \tau}\big)/2$ and unpacking the infinite sum for the other term, we have
$$S_j(\omega) = \frac{\sigma_j^2}{2} \left[\frac{1}{1 - 2 \rho_j \cos(\omega - \omega_j) + \rho_j^2} + \frac{1}{1 - 2 \rho_j \cos(\omega + \omega_j) + \rho_j^2}\right].$$
Since the Fourier transform is a linear operator, we can conclude that $S(\omega) = \sum_{j=1}^{J} S_j(\omega) + \sigma_\varepsilon^2$.
C. Proof for Proposition 1 (Section 3.2.1)
Proposition 1. For a given $j$, as $f_s \to \infty$, the samples on either side of an interval boundary, $z_{j,mW}$ and $z_{j,mW+1}$, converge to each other in mean square,
$$\lim_{f_s \to \infty} \mathbb{E}\big[\|z_{j,mW+1} - z_{j,mW}\|^2\big] = 0,$$
where we use $\rho_j = \exp(-1/(f_s l_j))$.
D. Proof for Proposition 2 (Section 3.2.2)
Proposition 2. Assume $c_{j,m} > 0$ such that $\|P_{j,(m-1)W+1} - \Sigma_{j,m}^{\infty}\| \le c_{j,m}$. In Eq. 3, the difference between $P_{j,(m-1)W+n}$ and $\Sigma_{j,m}^{\infty}$ decays exponentially fast as a function of $n$: for $1 \le n \le W$,
$$\big\|P_{j,(m-1)W+n} - \Sigma_{j,m}^{\infty}\big\| \le \rho_j^{2(n-1)}\, c_{j,m}.$$
Proof.
We first obtain the steady-state covariance $\Sigma_{j,m}^{\infty}$, similar to Appendix B. Since we assume $u_{j,n} \sim \mathcal{N}(0, \sigma_{j,m}^2 I_2)$ within interval $m$, we can show that $\Sigma_{j,m}^{\infty}$ is a diagonal matrix, noting that $R(\omega_j)$ is orthogonal. Denoting $\Sigma_{j,m}^{\infty} = \zeta_{j,m} I_2$, we now use the discrete Lyapunov equation
$$\Sigma_{j,m}^{\infty} = \rho_j^2\, R(\omega_j)\, \Sigma_{j,m}^{\infty}\, R(\omega_j)^{\mathrm{T}} + \sigma_{j,m}^2 I_2, \quad \text{i.e., } \zeta_{j,m} = \frac{\sigma_{j,m}^2}{1 - \rho_j^2}. \tag{17}$$
We now prove the proposition by induction. For fixed $j$ and $m$, the case $n = 1$ holds by assumption. Assuming the claim holds for $n$, the covariance recursion $P_{j,(m-1)W+n+1} = \rho_j^2 R(\omega_j) P_{j,(m-1)W+n} R(\omega_j)^{\mathrm{T}} + \sigma_{j,m}^2 I_2$, combined with Eq. 17, yields
$$\big\|P_{j,(m-1)W+n+1} - \Sigma_{j,m}^{\infty}\big\| = \rho_j^2\, \big\|R(\omega_j)\big(P_{j,(m-1)W+n} - \Sigma_{j,m}^{\infty}\big) R(\omega_j)^{\mathrm{T}}\big\| \le \rho_j^{2n}\, c_{j,m},$$
since $R(\omega_j)$ is orthogonal. By the principle of induction, the claim holds for all $1 \le n \le W$. ∎
E. Initialization & Estimation for PLSO (Section 4)
E.1 Initialization
As noted in the main text, we use the AIC to determine the optimal number of components $J$. For a given number of components $J$, we first construct the spectrogram of the data using the STFT and identify the frequency bands with prominent power, i.e., frequency bands whose average power exceeds a pre-determined threshold. The center frequencies of these bands serve as the initial center frequencies $\{\omega_j\}$, which are either fixed throughout the algorithm or further refined through the estimation algorithm in the main text. If $J$ exceeds the number of frequency bands identified from the spectrogram, 1) we first place components in the prominent frequency bands and 2) we then place the remaining components uniformly spread out in $[0, f_{\mathrm{cut}}]$, where $f_{\mathrm{cut}}$ is a cutoff frequency to be further determined in the next section. As for the lengthscales $\{l_j\}$, we set each to a certain fraction of the corresponding period. We then fit $\Theta$ and $\{\sigma_{j,m}^2\}$ with $\lambda = 0$, through the procedure explained in Stage 1. We finally use these estimates as initial values for other values of $\lambda$.
E.2 Estimation for $\sigma_\varepsilon^2$
There are two possible ways to estimate the observation noise variance $\sigma_\varepsilon^2$. The first approach is to perform maximum likelihood estimation of Eq. 6 with respect to $\sigma_\varepsilon^2$. The second approach, which we found to work better in practice and use throughout the manuscript, is to estimate it directly from the Fourier transform of the data. Given a cutoff frequency $f_{\mathrm{cut}}$, informed by domain knowledge, we take the average power of the Fourier transform of $y$ in $[f_{\mathrm{cut}}, f_s/2]$. For instance, it is widely known that the spectral content below 40 Hz in the anesthesia EEG dataset is physiologically relevant, and we use $f_{\mathrm{cut}} = 40$ Hz.
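A minimal sketch of this second approach, assuming the noise floor is estimated as the average periodogram power above the cutoff; the function name and normalization are ours.

```python
import numpy as np

def estimate_obs_noise(y, fs, f_cut):
    """Average periodogram power above f_cut as an estimate of sigma_eps^2."""
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fs)
    power = np.abs(np.fft.rfft(y))**2 / len(y)   # periodogram of the data
    return power[freqs >= f_cut].mean()
```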
F. Optimization for $\{\sigma_{j,m}^2\}$ via proximal gradient update (Section 4.1.1)
We discuss the algorithm to obtain a locally optimal solution of $\{\sigma_{j,m}^2\}$ for the MAP optimization problem in Eq. 8. For notational decluttering in this section, we define $f(w)$ to be the negative Whittle log-likelihood and $g(w) = \lambda \sum_{j,m} (w_{j,m} - w_{j,m-1})^2$, with $w_{j,m} = \log \sigma_{j,m}^2$.
We rewrite Eq. 8 as
$$\min_{w} \; f(w) + g(w). \tag{18}$$
The algorithm is described in Algorithm 1. It follows the steps outlined in the inexact accelerated proximal gradient algorithm [Li and Lin, 2015]. For faster convergence, we use larger step sizes with the Barzilai-Borwein (BB) step-size initialization rule [Barzilai and Borwein, 1988]. For the rest of this section, we drop the dependence on $j$. The main novelty of our algorithm is the proximal gradient update
$$w^{(t+1)} = \operatorname{prox}_{\alpha_t g}\big(w^{(t)} - \alpha_t \nabla f(w^{(t)})\big), \tag{19}$$
where the same update holds for the auxiliary sequence of the APG method. The auxiliary variables ensure convergence of the iterates. We use $w_m$ to denote the $m$-th entry of $w$. As mentioned in the main text, this update can be computed efficiently by using the Kalman filter/smoother.
G. Lipschitz constant for $\nabla f$ (Section 4.1.1)
In this section, we show that under some assumptions the log-likelihood has a Lipschitz continuous gradient with Lipschitz constant $L$. As in the previous section, we use $f$ and $w$, and drop the dependence on $j$ where convenient.
Let us start by restating the definition of Lipschitz continuous gradient.
Definition A continuously differentiable function $f$ has a Lipschitz continuous gradient if
$$\|\nabla f(x) - \nabla f(y)\| \le L\, \|x - y\|, \quad \forall x, y \in \mathcal{X}, \tag{20}$$
where $\mathcal{X}$ is a compact subset of the domain of $f$ and $L$ is the Lipschitz constant.
Our goal is to find the constant $L$ for the Whittle likelihood
$$f(w) = \frac{1}{2} \sum_{m,k} \left[\log S(\omega_k; w_m) + \frac{|Y_m(\omega_k)|^2}{S(\omega_k; w_m)}\right], \tag{21}$$
where $S(\omega_k; w_m)$ depends on $w_m$ through $\sigma_{j,m}^2 = e^{w_{j,m}}$, and we write $f = f_1 + f_2$ for the log-determinant and data-fit terms, respectively.
We make the following assumptions:
1. The periodogram $|Y_m(\omega_k)|^2$ is bounded. For real-world signals, we can reasonably assume that the energy of the signal is bounded.
2. $w_{j,m}$ is bounded, i.e., $|w_{j,m}| \le B$ for some $B < \infty$. This implies that $\sigma_{j,m}^2$ is bounded away from zero and infinity.
In addition, we have the following facts:
1. $S(\omega_k; w_m)$ and each of its summands are nonnegative.
2. $S(\omega_k; w_m)$ and its partial derivatives with respect to $w_{j,m}$ are bounded. This follows from the aforementioned assumptions.
3. For a given $j$, the ratio of the $j$-th summand of $S(\omega_k; w_m)$ to $S(\omega_k; w_m)$ itself is bounded by 1, since all summands are nonnegative; the maximum of the $j$-th summand over frequency is achieved at $\omega_k = \omega_j$.
Finally, by the assumptions above, $S(\omega_k; w_m)$ is bounded below by a positive constant $S_{\min}$.
We want to compute the Lipschitz constants $L_1$ for $\nabla f_1$ and $L_2$ for $\nabla f_2$, i.e.,
$$\|\nabla f_i(w) - \nabla f_i(w')\| \le L_i\, \|w - w'\|, \quad i \in \{1, 2\}, \tag{24}$$
where we dropped the dependence on $m$ for notational simplicity. Consequently, the triangle inequality yields
$$\|\nabla f(w) - \nabla f(w')\| \le (L_1 + L_2)\, \|w - w'\|. \tag{25}$$
Lipschitz constant for $\nabla f_1$
Let us examine $f_1$ first. The derivative with respect to $w_j$ is given as
$$\frac{\partial f_1}{\partial w_j} = \frac{1}{2} \sum_{k} \frac{e^{w_j}\, s_j(\omega_k)}{S(\omega_k; w)}, \tag{26}$$
where $s_j(\omega_k)$ denotes the unit-power spectrum of the $j$-th component. Without loss of generality, we assume $w_j \ge w_j'$. Each ratio $e^{w_j} s_j(\omega_k) / S(\omega_k; w)$ lies in $[0, 1]$ and is continuously differentiable on the compact set defined by the assumptions, so applying the mean value theorem (MVT) coordinate-wise bounds the change in the derivative by a constant multiple of $\|w - w'\|$. Collecting the bounds over $k$ yields a finite Lipschitz constant $L_1$ for $\nabla f_1$.
Lipschitz constant for $\nabla f_2$
Computing $L_2$ proceeds in a similar manner to computing $L_1$. The derivative with respect to $w_j$ is given as
$$\frac{\partial f_2}{\partial w_j} = -\frac{1}{2} \sum_{k} \frac{|Y(\omega_k)|^2\, e^{w_j}\, s_j(\omega_k)}{S(\omega_k; w)^2}. \tag{32}$$
Without loss of generality, assume $w_j \ge w_j'$. Since the periodogram is bounded by assumption and $S(\omega_k; w) \ge S_{\min} > 0$, the same MVT argument applies and yields a finite Lipschitz constant $L_2$ for $\nabla f_2$.
Collecting the Lipschitz constants $L_1$ and $L_2$, we finally have
$$L \le L_1 + L_2. \tag{37}$$
H. Inference with $z$ (Section 4.2)
We present the details for performing inference on $z$, given the estimates $\Theta$ and $\{\sigma_{j,m}^2\}$ from window-level inference. First, we present the Kalman filter/smoother algorithm to compute the posterior mean trajectory and the credible interval. Next, we present the forward-filter backward-sampling (FFBS) algorithm [Carter and Kohn, 1994, Lindsten and Schön, 2013] to generate Monte Carlo (MC) sample trajectories.
First, we define additional notation.
1) $\mu_{n|n'}$: posterior mean of $z_n$ given $y^{(1:n')}$. We are primarily concerned with the following three types: 1) $\mu_{n|n-1}$, the one-step prediction estimate, 2) $\mu_{n|n}$, the Kalman filter estimate, and 3) $\mu_{n|N}$, the Kalman smoother estimate.
2) A collection of the means $\{\mu_{n|n'}\}$ in a single vector.
3) $P_{n|n'}$: posterior covariance of $z_n$. Just as with the means, we are interested in three types, i.e., $P_{n|n-1}$, $P_{n|n}$, and $P_{n|N}$.
4) A block-diagonal matrix of the posterior covariance matrices.
5) $\Phi = \operatorname{blkdiag}\big(\rho_1 R(\omega_1), \ldots, \rho_J R(\omega_J)\big)$: a block-diagonal transition matrix.
6) $b$: the observation gain, which extracts $\sum_j \operatorname{Re}(z_{j,n})$ from $z_n$.
Kalman filter/smoother
The Kalman filter equations are given as
$$\mu_{n|n-1} = \Phi\, \mu_{n-1|n-1}, \qquad P_{n|n-1} = \Phi\, P_{n-1|n-1}\, \Phi^{\mathrm{T}} + Q_n,$$
$$k_n = \frac{P_{n|n-1}\, b}{b^{\mathrm{T}} P_{n|n-1}\, b + \sigma_\varepsilon^2}, \qquad \mu_{n|n} = \mu_{n|n-1} + k_n\big(y_n - b^{\mathrm{T}} \mu_{n|n-1}\big), \qquad P_{n|n} = \big(I - k_n b^{\mathrm{T}}\big) P_{n|n-1}, \tag{38}$$
where $Q_n$ is the state-noise covariance for sample $n$. Subsequently, the Kalman smoother equations are given as
$$G_n = P_{n|n}\, \Phi^{\mathrm{T}}\, P_{n+1|n}^{-1}, \qquad \mu_{n|N} = \mu_{n|n} + G_n\big(\mu_{n+1|N} - \mu_{n+1|n}\big), \qquad P_{n|N} = P_{n|n} + G_n\big(P_{n+1|N} - P_{n+1|n}\big) G_n^{\mathrm{T}}. \tag{39}$$
To obtain the mean reconstructed trajectory for the $j$-th oscillatory component, we take the real part of the $j$-th component of the smoothed mean, $e_j^{\mathrm{T}} \mu_{n|N}$, where $e_j$ is a unit vector whose only non-zero value, equal to 1, is at the entry corresponding to $\operatorname{Re}(z_{j,n})$.
The 95% credible interval for $\operatorname{Re}(z_{j,n})$, denoted as $\mathrm{CI}^{+}_{j,n}$/$\mathrm{CI}^{-}_{j,n}$ for the upper/lower end, respectively, is given as
$$\mathrm{CI}^{\pm}_{j,n} = e_j^{\mathrm{T}} \mu_{n|N} \pm 1.96\, \sqrt{e_j^{\mathrm{T}}\, P_{n|N}\, e_j}. \tag{40}$$
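A hedged sketch of Eqs. 38-39 for the stacked PLSO state of dimension $2J$, assuming a fixed state-noise covariance `Q` per call; returning the forward quantities as well makes the backward-sampling sketch below possible.

```python
import numpy as np

def kalman_smoother(y, Phi, Q, b, sig_eps2):
    """Kalman filter (Eq. 38) and RTS smoother (Eq. 39) for scalar observations."""
    N, d = len(y), Phi.shape[0]
    mu_f, P_f = np.zeros((N, d)), np.zeros((N, d, d))
    mu_p, P_p = np.zeros((N, d)), np.zeros((N, d, d))
    mu, P = np.zeros(d), Q.copy()                 # initial state (assumed)
    for n in range(N):                            # forward filter
        if n > 0:
            mu, P = Phi @ mu_f[n - 1], Phi @ P_f[n - 1] @ Phi.T + Q
        mu_p[n], P_p[n] = mu, P
        k = P @ b / (b @ P @ b + sig_eps2)        # Kalman gain
        mu_f[n] = mu + k * (y[n] - b @ mu)
        P_f[n] = P - np.outer(k, b @ P)
    mu_s, P_s = mu_f.copy(), P_f.copy()
    for n in range(N - 2, -1, -1):                # backward smoother
        G = P_f[n] @ Phi.T @ np.linalg.inv(P_p[n + 1])
        mu_s[n] = mu_f[n] + G @ (mu_s[n + 1] - mu_p[n + 1])
        P_s[n] = P_f[n] + G @ (P_s[n + 1] - P_p[n + 1]) @ G.T
    return mu_s, P_s, mu_f, P_f, mu_p, P_p
```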
FFBS algorithm for $z$
To generate the MC trajectory samples from the posterior distribution $p(z^{(1:N)} \mid \{\sigma_{j,m}^2\}, y^{(1:N)})$, we use the FFBS algorithm. The steps are summarized in Algorithm 2, which uses the Kalman estimates derived in the previous section. We denote $[k]$ as the MC sample index.
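A hedged sketch of the backward-sampling pass, consuming the forward quantities returned by the `kalman_smoother` sketch above; the conditional mean and covariance follow the standard FFBS recursions.

```python
import numpy as np

def ffbs_sample(mu_f, P_f, mu_p, P_p, Phi, rng):
    """Draw one posterior trajectory z^(1:N) by backward sampling."""
    N, d = mu_f.shape
    z = np.zeros((N, d))
    z[-1] = rng.multivariate_normal(mu_f[-1], P_f[-1])   # sample the last state
    for n in range(N - 2, -1, -1):
        G = P_f[n] @ Phi.T @ np.linalg.inv(P_p[n + 1])
        mean = mu_f[n] + G @ (z[n + 1] - mu_p[n + 1])
        cov = P_f[n] - G @ P_p[n + 1] @ G.T
        z[n] = rng.multivariate_normal(mean, cov)
    return z
```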
I. Computational efficiency of PLSO vs. GP-PS
We show the runtime of PLSO and the piecewise stationary GP (GP-PS) for inference of the mean trajectory of the hippocampus data (2-second windows) for varying data lengths (50, 100, and 200 seconds).
As noted in Section 5, the computational complexity of PLSO is $\mathcal{O}(J^3 N)$, whereas the computational complexity of GP-PS is $\mathcal{O}(N W^2)$. Since $W$, the number of samples per window, is fixed, we expect both PLSO and GP-PS to scale linearly in the number of samples $N$. Table 3 indeed confirms that this is the case. However, we observe that PLSO is much more efficient than GP-PS.
| Data length (s) | PLSO | GP-PS |
|---|---|---|
| 50 | 1.7 | 346.8 |
| 100 | 3.1 | 700.6 |
| 200 | 6.5 | 1334.0 |
J. Simulation experiment (Section 5.1)
We simulate from the same model as in the main text, where $z_1$ and $z_2$ are drawn from the PLSO stationary generative model of Eq. 2. This process comprises two amplitude-modulated oscillations, namely one modulated by a slow-frequency sinusoid and the other by a linearly-increasing signal [Ba et al., 2014]. We assume a 2-second PS interval. For PLSO, we use 5 block coordinate descent iterations for optimizing $\Theta$ and $\{\sigma_{j,m}^2\}$.
K. Details of the TVAR model
As explained in Section 5, the TVAR model is defined as
$$y_n = \sum_{p=1}^{P} a_{p,n}\, y_{n-p} + \varepsilon_n,$$
which can alternatively be written in state-space form with a companion-structured transition matrix $A_n$.
It is the transition matrix $A_n$ that determines the oscillatory component profile at time $n$, such as the number of components and the center frequencies. Specifically, the coefficients $\{a_{p,n}\}$ are first fit to the data and then an eigen-decomposition is performed on each of the estimated $A_n$. More in-depth technical details can be found in West et al. [1999].
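A hedged sketch of the eigen-decomposition step: at a given time point, the AR coefficients define a companion transition matrix whose complex eigenvalue pairs yield the component center frequencies; the helper below is hypothetical, not the released TVAR code.

```python
import numpy as np

def ar_component_frequencies(a, fs):
    """Center frequencies (Hz) implied by AR coefficients a = (a_1, ..., a_P)."""
    P = len(a)
    A = np.zeros((P, P))
    A[0, :] = a                      # companion form: first row holds coefficients
    A[1:, :-1] = np.eye(P - 1)       # sub-diagonal shifts the lag vector
    eig = np.linalg.eigvals(A)
    freqs = np.abs(np.angle(eig)) * fs / (2 * np.pi)
    return np.sort(np.unique(np.round(freqs, 6)))   # complex pairs share a frequency
```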
We use publicly available code for the TVAR implementation.⁵ We use a TVAR order high enough to capture the theta-band signal; models with lower orders did not, and in those cases the lowest-frequency component fell in the gamma band. Even with higher orders and various hyperparameter combinations, we observed that the slow and theta frequency bands were still explained by a single oscillatory component. For the discount factor, we used a value close to 1 to ensure that the TVAR coefficients and, consequently, the decomposed oscillatory components do not fluctuate much.
⁵ https://www2.stat.duke.edu/~mw/mwsoftware/TVAR/index.html
L. Anesthesia EEG dataset (Section 5.3)
We show spectral analysis results for the EEG data of a subject anesthetized with propofol (a different subject from the main text). We assume a 4-second PS interval and use 5 block coordinate descent iterations for optimizing $\Theta$ and $\{\sigma_{j,m}^2\}$.
Fig. 7 shows the STFT and the PLSO-estimated spectrograms. As noted in the main text, PLSO with the stationarity assumption is too restrictive and fails to capture the time-varying spectral pattern. Both PLSO with $\lambda = 0$ and $\lambda_{\mathrm{CV}}$ are more effective in capturing such patterns, with the latter able to remove the artifacts and better recover the smoother dynamics.