

Information-theoretical analysis of statistical measures for multiscale dynamics

Naoki Asuke [email protected], Tomoki Yamagami, Takatomo Mihana, André Röhm, Ryoichi Horisaki, and Makoto Naruse
Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
Abstract

Multiscale entropy (MSE) has been widely used to examine nonlinear systems involving multiple time scales, such as biological and economic systems. Conversely, Allan variance has been used to evaluate the stability of oscillators, such as clocks and lasers, from short to long time scales. Although these two statistical measures were developed independently for different purposes in different fields, both aim to examine the multiscale temporal structures of the physical phenomena under study. We show that, from an information-theoretical perspective, they share some foundations and exhibit similar tendencies. We experimentally confirmed that similar properties of the MSE and Allan variance can be observed in low-frequency fluctuations (LFF) in chaotic lasers and in physiological heartbeat data. Furthermore, we derived the condition under which this consistency between the MSE and Allan variance holds, which is related to certain conditional probabilities. Heuristically, physical systems in nature, including the aforementioned LFF and heartbeat data, mostly satisfy this condition, and hence the MSE and Allan variance demonstrate similar properties. As a counterexample, we demonstrate an artificially constructed random sequence for which the MSE and Allan variance exhibit different trends.


I Introduction

Multiscale entropy (MSE) Costa, Goldberger, and Peng (2002) has been used widely to evaluate nonlinear systems that involve multiple time scales in biology,Catarino et al. (2011); Costa, Goldberger, and Peng (2005) economics,Martina et al. (2011) transportation,Wang et al. (2013) and other fields. Meanwhile, Allan variance Allan (1966) has been used to evaluate the stability of oscillators,Allan (1966); Filler (1988); Skrinsky, Skrinska, and Zelinger (2014) such as atomic clocks or lasers, over many time scales. Although these two statistical measures were developed independently in different domains, the MSE and its variantsHumeau-Heurtier (2015) and Allan variance have a similar objective: to characterize dynamical systems containing multiple time scales.

However, the relevance of and differences between the MSE and Allan variance have yet to be examined in the literature. In this study, we discuss the similarities and differences between the MSE and Allan variance from an information-theoretical perspective and from the viewpoint of computational cost, and reveal the mechanisms behind them. We experimentally validated the similar tendencies of the MSE and Allan variance in low-frequency fluctuations (LFF) in chaotic lasers, numerically simulated by the Lang–Kobayashi equations,Lang and Kobayashi (1980) and in physiological heartbeat data available in the public domain.Goldberger et al. (2000); Irurzun et al. (2021); Baim et al. (1986); Petrutiu, Sahakian, and Swiryn (2007) Furthermore, we present the underlying mechanism behind the consistency between the MSE and Allan variance based on conditional probabilities. We artificially constructed a random sequence that violates this condition, leading to a case exhibiting inconsistent results between the MSE and Allan variance.

As discussed in detail below, the LFF and heartbeat contain both slow and fast dynamics; they represent typical physical phenomena with multiple time scales. Furthermore, recent studies on physics-based computing and communications, such as reservoir computing,Larger et al. (2012) laser-chaos-based secure communication,Rogister et al. (2001); Mengue and Essimbi (2012) and laser networks for solving reinforcement learning problems,Mihana et al. (2020) work across multiple time scales; hence, understanding fundamental attributes in multiple time scales through statistical measures, such as the MSE and Allan variance, is essential for furthering our comprehension.

The remainder of this paper is organized as follows. We review the Allan variance and MSE in Sections II.1 and II.2, respectively. Section III examines the MSE and Allan variance for several time series, namely noise, RR interval,Wagner and Marriott (2008) and laser chaos. A similar tendency of the MSE and Allan variance is shown for each time series. Section IV discusses the mechanism behind the similarity between the two measures. Additionally, the cause of the differences in behavior is discussed. Section V concludes the paper.

II Theory

II.1 Allan variance

Coarse-graining of a time series $s(t)~(t\in\{1,2,\cdots,N\})$ with a scale factor $\tau_A$ refers to the operation that obtains another time series $\bar{s}^{(\tau_A)}(l)~(l\in\{1,2,\cdots,\lfloor N/\tau_A\rfloor\})$ as follows:

\bar{s}^{(\tau_A)}(l) = \dfrac{1}{\tau_A}\sum_{t=(l-1)\tau_A+1}^{l\tau_A} s(t). \qquad (1)

Here, $\lfloor\,\cdot\,\rfloor$ denotes the floor function. The time series $\bar{s}^{(\tau_A)}(l)$ is called a coarse-grained time series. This operation produces each new point by averaging $\tau_A$ consecutive points of the original series. Using coarse-graining, the Allan varianceAllan (1966) of a time series $s(t)$ with scale factor $\tau_A$ is defined as

\sigma_s^2(\tau_A) = \dfrac{1}{2\lfloor N/\tau_A\rfloor}\sum_{l=2}^{\lfloor N/\tau_A\rfloor}\left(\Delta\bar{s}^{(\tau_A)}(l)\right)^2, \qquad (2)

where

\Delta\bar{s}^{(\tau_A)}(l) = \bar{s}^{(\tau_A)}(l) - \bar{s}^{(\tau_A)}(l-1). \qquad (3)

The Allan variance is thus the variance of the difference between successive points of the coarse-grained time series under study, evaluated at the time scale given by the coarse-graining time $\tau_A$.
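To make Eqs. (1)–(3) concrete, the following is a minimal NumPy sketch of coarse-graining and the Allan variance; the function names, and the use of the sample mean over $M-1$ differences (versus the $1/M$ normalization of Eq. (2), a negligible difference for long series), are our own illustrative choices.

```python
import numpy as np

def coarse_grain(s, tau_a):
    """Coarse-graining, Eq. (1): non-overlapping averages of tau_a points."""
    m = len(s) // tau_a                      # floor(N / tau_a) blocks
    return s[:m * tau_a].reshape(m, tau_a).mean(axis=1)

def allan_variance(s, tau_a):
    """Allan variance, Eqs. (2)-(3): half the mean squared difference of
    successive coarse-grained points."""
    sbar = coarse_grain(np.asarray(s, dtype=float), tau_a)
    return 0.5 * np.mean(np.diff(sbar) ** 2)
```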

Meanwhile, it is known that the Allan variance can be expressed using the power spectral density $S(f)$ of the signal $s(t)$ as follows, under the assumption that $s(t)$ is stationary and ergodicBarnes et al. (1971):

\lim_{N\to\infty}\sigma^2_s(\tau_A) = 2\int_0^\infty S(f)\dfrac{\sin^4(\pi f\tau_A)}{(\pi f\tau_A)^2}\,df. \qquad (4)

Using Eq. (4), for example, the Allan variance of white noise is as follows:

2\int_0^\infty h_0\dfrac{\sin^4(\pi f\tau_A)}{(\pi f\tau_A)^2}\,df = \dfrac{h_0}{2\tau_A}. \qquad (5)

Similarly, for $1/f$ noise, the Allan variance is

2\int_0^\infty \dfrac{h_{-1}}{f}\dfrac{\sin^4(\pi f\tau_A)}{(\pi f\tau_A)^2}\,df = 2h_{-1}\ln 2. \qquad (6)

Here $h_0$ and $h_{-1}$ represent the intensities of the white and $1/f$ components, respectively. We can then estimate $h_0$ and $h_{-1}$ from the Allan variance at various $\tau_A$. In this way, the Allan variance has been used to evaluate the variability characteristics of time-series data.
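As a quick numerical sanity check of Eq. (5), the Allan variance of unit-variance white Gaussian noise should decay as $\sigma^2/\tau_A$, consistent with the $h_0/(2\tau_A)$ scaling; a short self-contained sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal(10**6)               # unit-variance white Gaussian noise

for tau_a in (1, 10, 100, 1000):
    m = len(s) // tau_a
    sbar = s[:m * tau_a].reshape(m, tau_a).mean(axis=1)
    av = 0.5 * np.mean(np.diff(sbar) ** 2)
    # For i.i.d. noise with variance sigma^2 = 1 the prediction is 1 / tau_a,
    # matching the h_0 / (2 tau_a) form of Eq. (5).
    print(tau_a, av, 1.0 / tau_a)
```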

II.2 Multiscale entropy (MSE)

Before discussing multiscale entropy,Costa, Goldberger, and Peng (2002) it is necessary to introduce the original sample entropy (SaEn).Richman and Moorman (2000) The theoretical background and examples of SaEn are described in detail in the review by Delgado-Bonal and Marshak.Delgado-Bonal and Marshak (2019) We will use the definition of SaEn by Richman and Moorman.

In short, SaEn quantifies the regularity of a time series. SaEn has several parameters: the embedding dimension $m$, the tolerance $r$, and the length of the time series $N$. The SaEn of a time series $s(t)$ is defined with the correlation integral $C^m(r)$ as follows:

\mathrm{SaEn}(m,r,N) = -\log\dfrac{C^{m+1}(r)}{C^m(r)}. \qquad (7)

Several steps must be taken to define and compute $C^m(r)$. First, the embedded vector series $\bm{v}^m(t)$ is constructed as follows:

\bm{v}^m(t) = [s(t), s(t+1), \cdots, s(t+m-1)]^T. \qquad (8)

Please note that the length of the vector series defined by Eq. (8) is $N-m+1$, because the $m$-th component of $\bm{v}^m(N-m+1)$ is $s(N)$. Second, for each embedded vector, $C_i^m(r)$ and $C_i^{m+1}(r)$ are defined as follows:

C_i^m(r) = \dfrac{1}{N-m-1}\sum_{\substack{j=1 \\ j\neq i}}^{N-m} u\left(r - d_{\rm cheb}(\bm{v}^m(i), \bm{v}^m(j))\right), \qquad (9)

C_i^{m+1}(r) = \dfrac{1}{N-m-1}\sum_{\substack{j=1 \\ j\neq i}}^{N-m} u\left(r - d_{\rm cheb}(\bm{v}^{m+1}(i), \bm{v}^{m+1}(j))\right). \qquad (10)

Here, $u$ denotes the unit step function, defined as

u(x) = \begin{cases} 0 & (x < 0) \\ 1 & (x \geq 0) \end{cases}. \qquad (11)

$d_{\rm cheb}$ is the Chebyshev distance, and $d_{\rm cheb}(\bm{v}^m(i), \bm{v}^m(j))$ is defined as follows:

d_{\rm cheb}(\bm{v}^m(i), \bm{v}^m(j)) = \max_{k\in\{0,1,\cdots,m-1\}} |s(i+k) - s(j+k)|. \qquad (12)

Please note that the maximum $j$ is not $N-m+1$ but $N-m$ in Eq. (9), because we also calculate $C_i^{m+1}(r)$ to compute SaEn, and the total number of embedded vectors in $m+1$ dimensions is $N-m$. Intuitively, $C^m_i(r)$ is the probability that a randomly chosen $\bm{v}^m(j)~(j\neq i)$ satisfies

d_{\rm cheb}(\bm{v}^m(i), \bm{v}^m(j)) < r. \qquad (13)

It is noteworthy that, by the definition of the Chebyshev distance, the following proposition holds:

d_{\rm cheb}(\bm{v}^m(i), \bm{v}^m(j)) < r \;\;\mbox{iff}\;\; \forall k\in\{0,1,\cdots,m-1\},\ |s(i+k) - s(j+k)| < r. \qquad (14)

This proposition plays an important role in Sec. IV.1.

From the definition, $C^m_i(r)$ can be regarded as the probability that a randomly chosen $\bm{v}^m(j)$ is a neighbor of the embedded vector $\bm{v}^m(i)$. In other words, it is the likelihood of patterns $[s(j), s(j+1), \cdots, s(j+m-1)]^T$ that are considered repetitions of the pattern $[s(i), s(i+1), \cdots, s(i+m-1)]^T$ under the tolerance $r$. With $C_i^m(r)$, the correlation integral $C^m(r)$ is defined as

C^m(r) = \dfrac{1}{N-m}\sum_{i=1}^{N-m} C_i^m(r). \qquad (15)

Intuitively, $C^m(r)$ is the average of $C_i^m(r)$; therefore, we can regard $C^m(r)$ as the probability that randomly chosen vectors $\bm{v}^m(i)$ and $\bm{v}^m(j)~(i\neq j)$ are close. A large $C^m(r)$ indicates that the time series $s(t)$ contains many repeating structures of length $m$. SaEn then quantifies whether, when $m$ consecutive points in the time series are considered repeated, the $(m+1)$-th point is also considered repeated.

In other words, it quantifies whether the $(m+1)$-th point is predictable by looking at the $m$ preceding points. SaEn can be applied to real-world data without assuming any model governing the time series.
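The following is a minimal, self-contained NumPy sketch of Eqs. (7)–(12). It computes all pairwise Chebyshev distances directly, which is adequate for the short illustrative series considered here but quadratic in memory (practical implementations use neighbor-search structures); the function name and the vectorization are our own choices.

```python
import numpy as np

def sample_entropy(s, m=2, r=0.2):
    """Sample entropy SaEn(m, r, N), Eqs. (7)-(12): negative log of the ratio
    of (m+1)-point to m-point template matches under the absolute tolerance r
    (Chebyshev distance, self-matches excluded). Intended for series of up to
    a few thousand points; memory grows quadratically with length."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    # Only N - m embedded vectors of each length are used, so that every
    # m-dimensional template has an (m+1)-dimensional counterpart (Eq. (9)).
    vm = np.array([s[i:i + m] for i in range(n - m)])
    vm1 = np.array([s[i:i + m + 1] for i in range(n - m)])

    def match_count(v):
        # Pairwise Chebyshev distances; count unordered pairs strictly within r.
        d = np.max(np.abs(v[:, None, :] - v[None, :, :]), axis=2)
        return (np.sum(d < r) - len(v)) / 2   # drop self-matches on the diagonal

    b = match_count(vm)                       # repeated patterns of length m
    a = match_count(vm1)                      # repeated patterns of length m + 1
    return -np.log(a / b)                     # normalizations cancel in the ratio
```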

The MSE is defined as the SaEn of a coarse-grained time series and can be plotted as a function of $\tau_A$, where the same $r$ is used for every $\tau_A$. The MSE was proposed to quantify complexity at various time scales. $1/f$ noise has a long-time correlation; therefore, it is considered more complex than white noise. However, SaEn assigns the maximum value to white noise. In contrast, the MSE of white noise monotonically decreases with increasing $\tau_A$, whereas the MSE of $1/f$ noise is constant,Costa, Goldberger, and Peng (2002) i.e., $1/f$ noise has more complexity than white noise over a longer time scale.
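Combining the two previous sketches gives the MSE curve directly; note that the tolerance $r$ is fixed from the original (scale-1) series and reused at every scale, as in Costa et al. The helper reuses coarse_grain() and sample_entropy() defined above; the function name is our own.

```python
import numpy as np

def multiscale_entropy(s, taus, m=2, r_factor=0.15):
    """MSE: SaEn of the coarse-grained series at each scale factor tau_a.
    r = r_factor * SD(s) is computed once from the original series and
    reused at every scale, following Costa et al."""
    s = np.asarray(s, dtype=float)
    r = r_factor * s.std()
    return np.array([sample_entropy(coarse_grain(s, tau), m=m, r=r)
                     for tau in taus])
```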

As introduced in Secs. II.1 and II.2, the definitions of the MSE and Allan variance are apparently completely different. In this study, we demonstrate that the MSE and Allan variance exhibit similar $\tau_A$ dependencies, and we reveal the underlying mechanism from an information-theoretical perspective. Note that for $1/f$ noise, it has already been shown theoretically that both the Allan variance and the MSE are constants, independent of $\tau_A$.

III Comparison of the MSE and Allan variance

To compare the behavior of the MSE and the correlation integral $C^m(r)$ with that of the Allan variance, they were calculated for the three types of signals described below. The embedding dimension $m$ was fixed at 2, and the tolerance $r$ was fixed at $0.15\times \mathrm{SD}(s)$,Costa, Goldberger, and Peng (2002) where $\mathrm{SD}(s)$ denotes the standard deviation of the signal $s(t)$.

III.1 Noise

Thirty temporal waveforms of white Gaussian noise (WGN) and of $1/f$ noise containing $10^7$ points each were generated numerically, and the MSE and Allan variance were calculated up to $\tau_A = 10^3$ points. We fixed the length of the coarse-grained time series to $10^4$ points for the MSE calculation owing to limitations of our computing environment, whereas the entire time series was used for the Allan variance calculation. The results are presented in Fig. 1.
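For reference, approximate $1/f$ noise can be generated by spectrally shaping white Gaussian noise; this is one common recipe, not necessarily the generator used for Fig. 1.

```python
import numpy as np

def one_over_f_noise(n, rng=None):
    """Approximate 1/f noise: scale each Fourier amplitude of white Gaussian
    noise by 1/sqrt(f), so the power spectral density falls off as 1/f."""
    rng = rng or np.random.default_rng()
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)
    spec[1:] /= np.sqrt(f[1:])               # leave the DC bin untouched
    x = np.fft.irfft(spec, n)
    return x / x.std()                       # normalize to unit variance
```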

The Allan variance and the MSE or $-C^m(r)$ show similar trends with respect to $\tau_A$, after appropriate scaling. Please note that the scales of the y-axes in the $C^m(r)$ plots and the Allan variance plots are identical (logarithmic), while the y-axes in the MSE plots are linear. Because the MSE is defined by the logarithm of $C^m(r)$, it is natural to plot it linearly when $C^m(r)$ is plotted logarithmically.

Figures 1(a) and (c) show the MSE and Allan variance of WGN, respectively, where both decrease monotonically as a function of $\tau_A$. This trend agrees well up to $\tau_A = 10^2$ points. For $\tau_A > 10^2$ points, the Allan variance still shows the $1/\tau_A$ dependency indicated by Eq. (5), whereas the MSE saturates at 0. In this region, $C^m(r)$ is almost 1, i.e., almost all of the embedded vectors are neighbors of each other. This saturation effect is one of the differences between the MSE and Allan variance.

Figures 1(b) and (d) show the MSE and Allan variance of $1/f$ noise, respectively. Theoretically, both are predicted to be independent of $\tau_A$, whereas the simulation results show some $\tau_A$ dependency. This may be a result of the difficulty of properly generating $1/f$ noise on scales from $\tau_A = 1$ to $10^3$; however, it is noteworthy that the MSE, $-C^m(r)$, and the Allan variance exhibit similar curves as a function of $\tau_A$.

Figure 1: Comparison of the MSE and Allan variance for WGN and $1/f$ noise. (a,b) The MSE and $C^m(r)$ of WGN and $1/f$ noise, respectively. The black, red, and blue curves represent the MSE, $C^3(r)$, and $C^2(r)$, respectively. Each curve is the average of 30 independent trials, and the filled area around each curve spans the first and third quartiles. The variance of the computed results is so slight that the filled areas are difficult to observe. (c,d) The Allan variance of WGN and $1/f$ noise, respectively. The Allan variance plots and $C^m(r)$ plots are logarithmic on both axes so that the $1/\tau_A$ dependency of WGN is clearly visible, while the y-axis is linear for the MSE plots, because the MSE is defined with the logarithm of $C^m(r)$.

III.2 Physiological signal

Costa et al.Costa, Goldberger, and Peng (2002) introduced the MSE to demonstrate that physiological signals observed in young, healthy individuals exhibit versatile signal levels across a variety of time scales, whereas those of unhealthy subjects do not. In this section, we compare the MSE and Allan variance of RR intervalWagner and Marriott (2008) series, which are sequences of intervals between adjacent R waves in electrocardiograms, for healthy subjects and for subjects with congestive heart failure (CHF) or atrial fibrillation (AF), following Costa et al. Each dataset was obtained from PhysioNet.Goldberger et al. (2000); Irurzun et al. (2021); Baim et al. (1986); Petrutiu, Sahakian, and Swiryn (2007)

The lengths of the time series varied, but the entirety of each time series was used in the calculations, because truncating to the shortest one would reduce the accuracy of the estimation. For comparison of the Allan variance, the standard deviation of each time series was normalized to 1. This normalization is consistent with the MSE calculation, because $r$ was fixed at $0.15\times \mathrm{SD}(s)$. The MSE and Allan variance for each dataset are shown in Fig. 2. The Allan variance, the MSE, and $-C^m(r)$ show trends that are similar except for the locations of the peaks.

Figures 2(a) and (e) respectively show the MSE and Allan variance overlaid for all three cases. Figure 2(a) shows that the RR interval for healthy subjects is the most complex on long time scales, which is consistent with the results of the previous study.Costa, Goldberger, and Peng (2002) As shown in Fig. 2(e), the ordering of the different classes of subjects for the Allan variance shows a trend similar to that for the MSE. However, the exact scale factor $\tau_A$ at which the curves intersect differs from that of the MSE. It therefore seems plausible that analyses of RR interval time series could in general be performed using the Allan variance instead of the MSE.

Figure 2: Comparison of the MSE and Allan variance for the RR interval time series. (a,e) Overlaid plots of the MSE and Allan variance of the RR interval time series from healthy subjects and subjects with CHF or AF, respectively. (b,c,d) The MSE and $C^m(r)$ of the RR interval time series for healthy subjects, CHF, and AF, respectively. The black, red, and blue curves represent the MSE, $C^3(r)$, and $C^2(r)$, respectively. Each curve is the average of independent results from 147 healthy subjects, 29 subjects with CHF, and 84 subjects with AF, and the filled area around the MSE curve spans the first and third quartiles. The first and third quartiles of $C^m(r)$ are omitted for legibility. (f,g,h) Allan variance of the corresponding time series. The filled area around each curve spans the first and third quartiles. Each dataset was obtained from PhysioNet.Goldberger et al. (2000); Irurzun et al. (2021); Baim et al. (1986); Petrutiu, Sahakian, and Swiryn (2007)

III.3 Laser chaos

The MSE and Allan variance of the noise time series exhibit monotonic behavior, whereas those of the physiological signals show more complex behavior; however, the underlying dynamics of the physiological signals are not completely known, which makes further analysis difficult.

As an example of a time series that includes multiple time scales and whose underlying dynamics are known, we discuss a phenomenon called LFF,Uchida (2012) exhibited by a semiconductor laser with optical feedback. The Lang–Kobayashi equations,Lang and Kobayashi (1980) a set of model equations, were used to generate the time series. An example of a time series is shown in Fig. 3, where fast chaotic oscillations on the order of GHz coexist with irregular intensity dropouts and recoveries on the order of MHz. The results are presented in Fig. 4. As the simulation step and the time scale of the dynamics are important here, $\tau_A$ was converted to time.
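For readers who wish to reproduce qualitatively similar dynamics, the following is a minimal Euler-integration sketch of the Lang–Kobayashi equations. All parameter values are typical textbook numbers chosen for illustration, not the settings used for Figs. 3 and 4, and whether LFF appears depends sensitively on them (LFF is typically found for pumping near threshold with moderate feedback); a finer step or a higher-order integrator, plus a spontaneous-emission noise term, would be advisable in practice.

```python
import numpy as np

# Lang-Kobayashi model: complex field E(t) and carrier density N(t) with
# time-delayed optical feedback. Illustrative (assumed) parameter values.
GN    = 8.4e-13                  # gain coefficient [m^3 s^-1]
N0    = 1.4e24                   # carrier density at transparency [m^-3]
tau_p = 1.927e-12                # photon lifetime [s]
tau_s = 2.04e-9                  # carrier lifetime [s]
alpha = 3.0                      # linewidth enhancement factor
kappa = 4.0e10                   # feedback rate [s^-1]
tau_f = 3.0e-9                   # external-cavity delay time [s]
omega = 2 * np.pi * 2.998e8 / 1.537e-6   # optical angular frequency [rad/s]
N_th  = N0 + 1.0 / (GN * tau_p)          # threshold carrier density
J     = 1.02 * N_th / tau_s              # pumping just above threshold

dt, steps = 1e-12, 1_000_000             # 1 ps step, 1 microsecond of dynamics
delay = int(round(tau_f / dt))
fb_phase = np.exp(-1j * omega * tau_f)   # constant feedback phase factor

E = np.zeros(steps, dtype=complex)
E[: delay + 1] = 1e3                     # small seed field as the history
N = np.full(steps, N_th)

for t in range(delay, steps - 1):
    gain = GN * (N[t] - N0) - 1.0 / tau_p
    dE = 0.5 * (1 + 1j * alpha) * gain * E[t] + kappa * E[t - delay] * fb_phase
    dN = J - N[t] / tau_s - GN * (N[t] - N0) * abs(E[t]) ** 2
    E[t + 1] = E[t] + dE * dt
    N[t + 1] = N[t] + dN * dt

intensity = np.abs(E) ** 2               # the kind of time series shown in Fig. 3
```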

Figure 4 shows that the MSE, $C^m(r)$, and the Allan variance all capture the dynamics of the fast oscillations on the order of GHz, corresponding to the fast peak in the MSE and Allan variance, and the dropouts on the order of MHz, corresponding to the slow peak.

Figure 3: Example of an LFF time series. The intensity is normalized to that without optical feedback. (a) Original time series. (b) Time series obtained by applying an ideal low-pass filter with a cutoff frequency of 100 MHz to the time series in (a). Sudden dropouts with gradual recoveries are observed.
Figure 4: Comparison of the MSE and Allan variance for an LFF time series. (a) The MSE and $C^m(r)$. The black, red, and blue curves represent the MSE, $C^3(r)$, and $C^2(r)$, respectively. (b) Allan variance. Note that the MSE, $-C^m(r)$, and the Allan variance exhibit similar $\tau_A$ dependencies.

IV Information-theoretical connections between the MSE and Allan variance

IV.1 $C^m(r)$ and Allan variance

In Sec. III, we observed that $-C^m(r)$ exhibits behavior similar to that of the Allan variance. In this section, we discuss the underlying mechanisms.

IV.1.1 $C^m(r)$ decomposition

First, we decompose $C^m(r)$ according to the difference between the indices of the embedded vectors. Let $M = \lfloor N/\tau_A\rfloor$ be the length of the coarse-grained time series, and let $D^m(i,j)$ denote $d_{\rm cheb}(\bm{v}^m(i), \bm{v}^m(j))$. Considering the expectation and the symmetry of $i$ and $j$, and assuming that $s(t)$ or its first-order difference $s(t)-s(t-1)$ is strongly stationary,

\langle C^m(r)\rangle = \dfrac{\sum_{i=1}^{M-m}\sum_{\substack{j=1 \\ j\neq i}}^{M-m} \langle u\left(r - D^m(i,j)\right)\rangle}{(M-m-1)(M-m)} \qquad (16)

= \dfrac{\sum_{i=1}^{M-m}\sum_{\substack{j=1 \\ j\neq i}}^{M-m} \Pr\left(D^m(i,j) < r\right)}{(M-m-1)(M-m)} \qquad (17)

= \dfrac{\sum_{i=1}^{M-m}\sum_{j=i+1}^{M-m} 2\Pr\left(D^m(i,j) < r\right)}{(M-m-1)(M-m)} \qquad (18)

= \dfrac{\sum_{k=1}^{M-m-1}(M-m-k)\Pr(D^m(1,1+k) < r)}{\binom{M-m}{2}}. \qquad (19)

First, we clarify the meaning of the expectation. In the following, we regard the time series $s(t),~t\in\{1,2,\cdots,N\}$ under study as a sequence of random variables following a specific probability distribution function (PDF). In this sense, $C^m(r)$ and $u(r - D^m(i,j))$ are also random variables, and thus each of their possible values has some probability of occurrence. The expectation is the average of all potential values weighted by their probabilities.

Here we explain the transformation from Eq. (16) to Eq. (19) line by line. The only change made from Eq. (16) to (17) is that the expectation term $\langle u\left(r - D^m(i,j)\right)\rangle$ is changed to the probability term $\Pr\left(D^m(i,j) < r\right)$. We assume that the random variable sequence $s(t),~t\in\{1,2,\cdots,N\}$ follows a PDF, and we calculate $u\left(r - D^m(i,j)\right)$. As previously mentioned, $u\left(r - D^m(i,j)\right)$ is also a random variable that is either 0 or 1, with the following probabilities:

u\left(r - D^m(i,j)\right) = \begin{cases} 0 & \text{with probability } 1 - \Pr\left(D^m(i,j) < r\right) \\ 1 & \text{with probability } \Pr\left(D^m(i,j) < r\right) \end{cases}. \qquad (20)

Consequently,

\langle u\left(r - D^m(i,j)\right)\rangle = \Pr\left(D^m(i,j) < r\right) \qquad (21)

holds.

From Eq. (17) to (18), the factor 2 is added to the numerator, and the range of summation over $j$ is restricted to $j > i$. Here, we use the fact that $D^m(i,j) = D^m(j,i)$. From Eq. (18) to (19), we must assume that $\Pr\left(D^m(i,j) < r\right)$ depends only on the difference between the indices, $k = j - i$. For example, $\Pr\left(D^m(i+1,j+1) < r\right)$ is the same as $\Pr\left(D^m(i,j) < r\right)$. This assumption is satisfied when the original signal $s(t)$ or its first-order difference $s(t)-s(t-1)$ is strongly stationary, which is a reasonable assumption when studying a time series from a dynamical system. The factor $(M-m-k)$ in Eq. (19) comes from the fact that, for each $k$, the number of pairs $(i,j)$ satisfying $k = j - i$ is $(M-m-k)$. For example, when $k=1$, the pairs are $(1,2), (2,3), \cdots, (M-m-1, M-m)$, as the total number of embedded vectors is $M-m$.

The probability in Eq. (19) can be decomposed further by considering conditional probabilities. Using proposition (14),

\Pr(D^m(1,1+k) < r)
= \Pr(d_{\rm cheb}(\bm{v}^m(1), \bm{v}^m(1+k)) < r)
= \Pr(|s(1)-s(1+k)| < r \,\land\, |s(2)-s(2+k)| < r \,\land\cdots\land\, |s(m)-s(m+k)| < r)
= \Pr(D^1(1,1+k) < r \,\land\, D^1(2,2+k) < r \,\land\cdots\land\, D^1(m,m+k) < r) \qquad (22)

holds. The probability of a product event, such as Eq. (22), can be represented as a product of conditional probabilities. For example, the probability that propositions $A$, $B$, and $C$ hold simultaneously is

\Pr(A\land B\land C) = \Pr(B\land C\mid A)\Pr(A) = \Pr(C\mid A\land B)\Pr(B\mid A)\Pr(A). \qquad (23)

In the same way,

\Pr(D^m(1,1+k) < r) = \Pr(D^1(1,1+k) < r)\prod_{i=2}^{m}\Pr\left(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r\right) \qquad (24)

holds.
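Under the stationarity assumption, the two kinds of probabilities entering Eq. (24) can be estimated from a coarse-grained series simply by counting; the following sketch (our own illustrative code, for the $m=2$, $i=2$ case shown later in Fig. 6) does exactly this.

```python
import numpy as np

def neighbor_probabilities(sbar, k, r):
    """Empirical estimates, assuming stationarity, of
    Pr(D^1(1, 1+k) < r)  and  Pr(D^1(2, 2+k) < r | D^1(1, 1+k) < r)
    from a coarse-grained series sbar (the i = 2 factor of Eq. (24))."""
    d = np.abs(sbar[k:] - sbar[:-k])         # |sbar(t+k) - sbar(t)| for all t
    first = d[:-1] < r                       # event D^1(t, t+k) < r
    second = d[1:] < r                       # event D^1(t+1, t+1+k) < r
    p_uncond = first.mean()
    p_cond = second[first].mean() if first.any() else float("nan")
    return p_uncond, p_cond
```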

In the following, we show that the above conditional probability behaves oppositely to the Allan variance regardless of $k$, while the unconditional probability does not. This point is the key to understanding why $-C^m(r)$, the MSE, and the Allan variance show similar $\tau_A$ dependencies.

Figure 6(a) shows a logarithmic-scale color map of $\Pr(D^1(1,1+k) < r)$, which is the first term on the right-hand side of Eq. (24), whereas Fig. 6(b) shows that of $\Pr(D^1(2,2+k) < r \mid D^1(1,1+k) < r)$, corresponding to the conditional probability in the second term on the right-hand side of Eq. (24) for $i=2$.

Each probability was calculated for the LFF time series. For $k=1$, corresponding to the lowest row in Fig. 6(a) and the solid curve in Fig. 6(c), the unconditional probability $\Pr(D^1(1,2) < r)$ behaves oppositely to the Allan variance (see Fig. 4(b)). However, for larger $k$, the behavior differs from that of the $k=1$ case, as shown by the dashed plot in Fig. 6(c).

This can be understood as follows. First, the following equation holds:

D^1(1,1+k)
= |\bar{s}^{(\tau_A)}(k+1) - \bar{s}^{(\tau_A)}(1)|
= |\bar{s}^{(\tau_A)}(k+1) - \bar{s}^{(\tau_A)}(k) + \bar{s}^{(\tau_A)}(k) - \cdots + \bar{s}^{(\tau_A)}(2) - \bar{s}^{(\tau_A)}(1)|
= \left|\sum_{j=2}^{k+1}\Delta\bar{s}^{(\tau_A)}(j)\right|. \qquad (25)

For $k=1$, whether $D^1(1,2) = |\Delta\bar{s}^{(\tau_A)}(2)|$ is smaller than $r$ is closely related to the Allan variance. If the Allan variance for a certain $\tau_A$ is smaller than that for another $\tau_A$, the distribution of $\Delta\bar{s}^{(\tau_A)}(l)$ is biased toward the center, because the mean value of $\Delta\bar{s}^{(\tau_A)}(l)$ is zero by definition. Consequently, the probability that $D^1(1,2) = |\Delta\bar{s}^{(\tau_A)}(2)| < r$ is high for that $\tau_A$. For larger $k$, the distribution of $\sum_{j=2}^{k+1}\Delta\bar{s}^{(\tau_A)}(j)$ is affected by the time correlation that the coarse-grained series $\bar{s}^{(\tau_A)}(l)$ inherently has, and the Allan variance $\operatorname{Var}(\Delta\bar{s}^{(\tau_A)}(l))$ cannot predict this distribution well, so $\Pr(D^1(1,1+k) < r)$ does not behave in a manner that is strongly correlated with the Allan variance.

Conversely, the conditional probability $\Pr(D^1(2,2+k) < r \mid D^1(1,1+k) < r)$ depicted in Fig. 6(b) shows a trend similar to that of the Allan variance regardless of $k$, which we can also observe in Fig. 6(d). That is, $\Pr(D^1(2,2+k) < r \mid D^1(1,1+k) < r)$ is the main connection that explains the similarity between the MSE and Allan variance. To examine this further, in the following sections we discuss the variance of the corresponding quantity, rather than the probabilities.

IV.1.2 Neighborhood-Likelihood-to-Variance-Relationship (NLVR)

In this section, we introduce an assumption called the Neighborhood-Likelihood-to-Variance-Relationship (NLVR) to connect the probability discussion to the variance discussion. First, define $\Delta\Pr(\tau_A)$ as the difference between $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$ for $\tau_A$ and for $\tau_A - 1$. In the same way, $\Delta\operatorname{Var}(\tau_A)$ is the difference between $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$ for $\tau_A$ and for $\tau_A - 1$. Please note that the conditional probability $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$ is the same as that in Eq. (24), and the absolute value of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$, referred to in $\Delta\operatorname{Var}(\tau_A)$, is identical to $D^1(i,i+k)$ in the definition of $\Delta\Pr(\tau_A)$.

Then, to connect the probability with the variance, we assume the Neighborhood-Likelihood-to-Variance-Relationship (NLVR), as follows:

\Delta\Pr(\tau_A)\,\Delta\operatorname{Var}(\tau_A) < 0. \qquad (26)

Inequality (26) states that the signs of $\Delta\Pr(\tau_A)$ and $\Delta\operatorname{Var}(\tau_A)$ are always opposite.

We now explain the NLVR in more detail. Both $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$ and $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$ are statistics of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ under the same condition $D^{i-1}(1,1+k) < r$. Considering the PDF of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$, the probability $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$ represents the area under the PDF in the range $[-r, r]$. In addition, when there is no condition, the mean of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ is zero by definition. Here we also assume that the mean value of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ is close to zero under the condition $D^{i-1}(1,1+k) < r$. When $\Delta\Pr(\tau_A) > 0$, the distribution of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ is more centrally biased; accordingly, $\Delta\operatorname{Var}(\tau_A)$ is likely to be negative.
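The NLVR can be probed empirically by computing the finite differences $\Delta\Pr(\tau_A)$ and $\Delta\operatorname{Var}(\tau_A)$ over scale and checking the sign of their product; below is a self-contained sketch for the $i=2$ case, where the choices of WGN, $k$, and $r$ are purely illustrative.

```python
import numpy as np

def cond_prob_and_var(s, tau_a, k, r):
    """At scale tau_a, estimate the i = 2 conditional probability
    Pr(D^1(2, 2+k) < r | D^1(1, 1+k) < r) and the matching conditional
    variance Var(sbar(2+k) - sbar(2) | D^1(1, 1+k) < r)."""
    m = len(s) // tau_a
    sbar = s[:m * tau_a].reshape(m, tau_a).mean(axis=1)
    diff = sbar[k:] - sbar[:-k]              # sbar(t+k) - sbar(t)
    sel = diff[1:][np.abs(diff[:-1]) < r]    # next-step differences, conditioned
    return (np.abs(sel) < r).mean(), sel.var()

rng = np.random.default_rng(1)
s = rng.standard_normal(10**6)
r, k = 0.15 * s.std(), 5
p, v = np.array([cond_prob_and_var(s, tau, k, r) for tau in range(1, 51)]).T
# NLVR, Eq. (26): each scale step should give Delta Pr and Delta Var of
# opposite signs, i.e., a negative product.
ok = np.sum(np.diff(p) * np.diff(v) < 0)
print(ok, "of", len(p) - 1, "scale steps satisfy the NLVR")
```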

Figure 5 schematically illustrates the concept of the NLVR. The black curves in Figs. 5(a) and (b) represent the PDF of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ under the condition $D^{i-1}(1,1+k) < r$ for $\tau_A - 1$ and $\tau_A$, respectively. The red-filled areas and black bars show $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$ and $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$, respectively. $\Delta\Pr(\tau_A)$ and $\Delta\operatorname{Var}(\tau_A)$ are the differences in the red-filled areas and in the black bars between Figs. 5(b) and (a), respectively. Please note that the plots shown here, based on Gaussian distributions, are for explanatory purposes only and are not obtained from a time series. The validity of the NLVR is discussed in Sec. IV.3.

Figure 5: A visual representation of the NLVR. (a,b) The distribution of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ under the condition $D^{i-1}(1,1+k) < r$ for $\tau_A - 1$ and $\tau_A$, respectively. The red-filled area and the black bars represent $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$ and $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$, respectively.

IV.1.3 Variance decomposition

Here we further discuss the conditional variance $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$, instead of the conditional probability $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$, following the NLVR assumption, as mentioned in Sec. IV.1.1. By decomposing $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$ as

\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)
= \bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i+k-1) + \bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i) + \bar{s}^{(\tau_A)}(i-1) - \bar{s}^{(\tau_A)}(i-1)
= \Delta\bar{s}^{(\tau_A)}(i+k) - \Delta\bar{s}^{(\tau_A)}(i) + \left(\bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1)\right), \qquad (27)

the conditional variance $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$ can be decomposed as follows:

\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)
= \operatorname{Var}(\Delta\bar{s}^{(\tau_A)}(i+k) \mid \cdots)
+ \operatorname{Var}(\Delta\bar{s}^{(\tau_A)}(i) \mid \cdots)
- 2\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \Delta\bar{s}^{(\tau_A)}(i) \mid \cdots)
+ \operatorname{Var}(\bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1) \mid \cdots)
+ 2\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1) \mid \cdots)
- 2\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i), \bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1) \mid \cdots). \qquad (28)

Here, the condition $D^{i-1}(1,1+k) < r$ is abbreviated by the symbol $\cdots$ on the right-hand side of Eq. (28).

[Overview of the variance decomposition]

There are six terms on the right-hand side of Eq. (28). A logarithmic-scale color map of each term is shown in Figs. 7(a)–(f) to determine which terms have the greatest influence on the conditional variance. Figure 7(g) shows the sum of all the terms, which is the original conditional variance. Meanwhile, Fig. 7(h) shows cross-sectional profiles at $k=50$ for the first term (Fig. 7(a)), the third term (Fig. 7(c)), and the total conditional variance (Fig. 7(g)).

As discussed shortly below, the first through third terms in Eq. (28) are dominant, as shown in Figs. 7(a)–(c), and they show a $\tau_A$ dependency similar to that of the Allan variance regardless of $k$, as indicated in Fig. 7(h).

Please note that we plot the absolute values of the covariance terms, because the covariance can be negative. However, the $\tau_A$ dependency of $\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \Delta\bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$, shown in Figs. 7(c) and (h), is still similar to that of the Allan variance. Conversely, the remaining panels (Figs. 7(d)–(f)) exhibit small values, regardless of $k$ and $\tau_A$.

Therefore, we observe that the conditional variance $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$, shown in Figs. 7(g) and (h), which tracks the Allan variance, and the conditional probability $\Pr(|\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)| < r \mid D^{i-1}(1,1+k) < r)$, shown in Fig. 6(b) for the case of $i=2$, exhibit opposite $\tau_A$ dependencies, in accordance with the NLVR.

[The first and the second terms]

The first and second terms are the conditional variances of $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$, i.e., conditional Allan variances. Although the condition somewhat affects the distributions of $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$, the $\tau_A$ dependency of their variances is almost the same as that of the Allan variance.

[The third term]

The third term, however, is strongly influenced by the condition. In the absence of the condition, if $k$ is sufficiently large, there would be little correlation between $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$; thus, the covariance would be small. Figure 8(a) shows the unconditional covariance of $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$, where the color scale is the same as in Fig. 7. Compared with the conditional covariance plotted in Fig. 7(c), this covariance without the condition is smaller in most places.

By contrast, when the condition is satisfied, the $i-1$ consecutive points up to $\bar{s}^{(\tau_A)}(i+k-1)$ and up to $\bar{s}^{(\tau_A)}(i-1)$ take close values. In that case, $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$, i.e., the differences between these points and their immediately following points, often have the same sign, and the distribution of their magnitudes follows the Allan variance.

Figure 8(b) shows an example of a pair $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$ satisfying the condition for $i=3$. Because we can regard $\bm{v}^2(7)$ and $\bm{v}^2(17)$, denoted by the blue dots, as identical under the tolerance $r$, the red arrows, representing $\Delta\bar{s}^{(\tau_A)}(9)$ and $\Delta\bar{s}^{(\tau_A)}(19)$, are likely to be similar. By the definition of covariance, if $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$ are similar, $\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \Delta\bar{s}^{(\tau_A)}(i))$ is close to $\operatorname{Var}(\Delta\bar{s}^{(\tau_A)}(i+k))$, namely the Allan variance. From these considerations, the conditional covariance behaves similarly to the Allan variance.

Please note that the form of the condition in Fig. 8(b), namely $D^2(7,17) < r$, is slightly different from $D^{i-1}(1,1+k) < r$. However, as we assume stationarity, shifting the indices is allowed. For example, in this case, we examined $\Delta\bar{s}^{(\tau_A)}(19)$ and $\Delta\bar{s}^{(\tau_A)}(9)$ under the condition $D^2(7,17) < r$, which corresponds to shifting the indices by six. To be more specific, we investigated $\Delta\bar{s}^{(\tau_A)}(19) = \Delta\bar{s}^{(\tau_A)}(3+10+6)$ and $\Delta\bar{s}^{(\tau_A)}(3+6)$ under the condition $D^2(7,17) = D^2(1+6, 1+10+6) < r$, where $i=3$ and $k=10$.

[The fourth to the sixth terms]

The fourth to sixth terms do not contribute significantly to the conditional variance, as mentioned earlier. This is because $\bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1)$, which enters the variance or covariance in the fourth to sixth terms, is small when the condition $D^{i-1}(1,1+k) < r$ is satisfied. Once again, the condition implies a sort of similarity on the time scale of $i-1$ points, and when these points are similar, the variance of the terms containing their difference will be small.

[Summary of the decomposition]

Adding the above six terms together, we see that the conditional variance eventually behaves similarly to the Allan variance, as shown in Figs. 7(g) and (h). Because the conditional variances and conditional probabilities are assumed by the NLVR to move in opposite directions as the scale factor increases, the conditional probabilities, and in turn $C^m(r)$, which is a weighted average of the products of the unconditional probability $\Pr(D^1(1,1+k) < r)$ and the conditional probabilities $\Pr(D^1(i,i+k) < r \mid D^{i-1}(1,1+k) < r)$, behave in the direction opposite to the Allan variance as the scale factor increases.

Although we computed the conditional variances and covariances from an LFF time series as an example, the discussion of the behavior of the variances and covariances remains valid for other time series to some extent. In the discussion, we assumed strong stationarity of the signal $s(t)$ or of its first-order difference in order to estimate probabilities and variances from the obtained time series, because $\Pr(D^m(i,j) < r)$ in Eq. (17) cannot be estimated from a single time series otherwise. Conversely, the NLVR assumption between probability and variance can also be applied to $\Pr(|\bar{s}^{(\tau_A)}(i) - \bar{s}^{(\tau_A)}(j)| < r)$ and $\operatorname{Var}(\bar{s}^{(\tau_A)}(i) - \bar{s}^{(\tau_A)}(j))$ and could be valid to a certain extent. Thus, the similar behavior exhibited by the MSE and Allan variance should hold for a more general class of signals.

Figure 6: (a,b) Logarithmic-scale color maps of (a) $\Pr(D^1(1,1+k) < r)$ and (b) $\Pr(D^1(2,2+k) < r \mid D^1(1,1+k) < r)$ for the LFF time series up to $k=100$. (c,d) One-dimensional plots of the two-dimensional data displayed in (a) and (b) along the two arrows ($k=1$ and $k=100$). Solid and dashed plots correspond to $k=1$ and $k=100$, respectively.
Figure 7: Element-wise logarithmic-scale color maps of the terms of the conditional variance $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$ given by Eq. (28), computed for an LFF time series up to $k=100$: (a) $\operatorname{Var}(\Delta\bar{s}^{(\tau_A)}(i+k) \mid D^{i-1}(1,1+k) < r)$; (b) $\operatorname{Var}(\Delta\bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$; (c) $2|\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \Delta\bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)|$; (d) $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1) \mid D^{i-1}(1,1+k) < r)$; (e) $2|\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1) \mid D^{i-1}(1,1+k) < r)|$; and (f) $2|\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i), \bar{s}^{(\tau_A)}(i+k-1) - \bar{s}^{(\tau_A)}(i-1) \mid D^{i-1}(1,1+k) < r)|$. (g) The term before decomposition, $\operatorname{Var}(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$. (h) Cross-sectional profiles at $k=50$ from (a) (red curve), (c) (blue curve), and (g) (black curve), indicated by the arrows. Please note that for panels (c), (e), and (f) we plot the absolute value of the covariance, because the covariance can be negative and the color scale is logarithmic. However, $2\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \Delta\bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$, shown in (c), is mostly positive. In (g), although the points with negative values are neglected, almost the entire curve is depicted.
Figure 8: (a) Color map of the unconditional version of the third term of Eq. (28), $2|\operatorname{Cov}(\Delta\bar{s}^{(\tau_A)}(i+k), \Delta\bar{s}^{(\tau_A)}(i))|$. The color scale is the same as in Fig. 7. (b) $\Delta\bar{s}^{(\tau_A)}(i+k)$ and $\Delta\bar{s}^{(\tau_A)}(i)$ under the condition $D^2(7,7+10) < r$, where $i=3$ and $k=10$. The blue dots represent the two pairs of consecutive points that we can regard as $\bm{v}^2(7)$ and $\bm{v}^2(17)$ satisfying the condition $D^2(7,7+10) < r$. The red dots denote $\bar{s}^{(\tau_A)}(9)$ and $\bar{s}^{(\tau_A)}(19)$, and the red arrows show $\Delta\bar{s}^{(\tau_A)}(9)$ and $\Delta\bar{s}^{(\tau_A)}(19)$.

IV.2 The MSE and Allan variance

In Sec. IV.1, the connection between $C^m(r)$ and the Allan variance was discussed based on the connection between the conditional probability and the variance. We can also apply this discussion to the connection between the MSE and the Allan variance.

To simplify the discussion, we assume a sufficiently long (strongly stationary) time series, so that we can consider $C^m(r)$ to be the same as its expectation $\langle C^m(r)\rangle$. This is equivalent to assuming ergodicity.

Under this condition,

MSE = -\log\dfrac{C^{m+1}(r)}{C^m(r)}
= -\log\dfrac{\sum_{k=1}^{M-m-1}(M-m-k)\Pr(D^{m+1}(1,1+k) < r)}{\sum_{k=1}^{M-m-1}(M-m-k)\Pr(D^m(1,1+k) < r)}. \qquad (29)

$\Pr(D^{m+1}(1,1+k) < r)$ in the numerator of Eq. (29) can be decomposed similarly, as shown in Sec. IV.1:

\Pr(D^{m+1}(1,1+k) < r) = \Pr(D^1(m+1, m+1+k) < r \mid D^m(1,1+k) < r)\,\Pr(D^m(1,1+k) < r). \qquad (30)

Let $a_k = (M-m-k)\Pr(D^m(1,1+k) < r)$ and $b_k = \Pr(D^1(m+1, m+1+k) < r \mid D^m(1,1+k) < r)$. Considering the equality $(M-m-k)\Pr(D^{m+1}(1,1+k) < r) = a_k b_k$, Eq. (29) can be rewritten as follows:

MSE = -\log\dfrac{C^{m+1}(r)}{C^m(r)} = -\log\dfrac{\sum_{k=1}^{M-m-1} a_k b_k}{\sum_{k=1}^{M-m-1} a_k}. \qquad (31)

Equation (31) states that the MSE is the negative logarithm of a weighted average of $b_k$. Because $b_k$ is a conditional probability, the behavior of the MSE, like that of $C^m(r)$, can be explained by the conditional probabilities.

More precisely, we must be careful about the expectation. $C^m(r)$ can be represented by probabilities only when considering the expectation $\langle C^m(r)\rangle$. The expectation of the MSE is

\langle MSE\rangle = \left\langle -\log\dfrac{C^{m+1}(r)}{C^m(r)}\right\rangle \qquad (32)

\geq -\log\left\langle\dfrac{C^{m+1}(r)}{C^m(r)}\right\rangle. \qquad (33)

From Eq. (32) to (33), we use the fact that the negative logarithm is convex, along with Jensen's inequality. Equation (33) is, in general, not the same as

-\log\dfrac{\langle C^{m+1}(r)\rangle}{\langle C^m(r)\rangle} = -\log\dfrac{\sum_{k=1}^{M-m-1} a_k b_k}{\sum_{k=1}^{M-m-1} a_k}. \qquad (34)

However, it is noteworthy that Costa et al.Costa, Goldberger, and Peng (2005) showed that the MSE values calculated theoretically using probabilities agree well with the sample estimates for the WGN and $1/f$ noise cases.

IV.3 Connection between the probability and variance

In Secs. IV.1 and IV.2, we discussed the connection between $C^m(r)$ or the MSE and the Allan variance based on the NLVR assumption. In this section, we discuss the extent to which the NLVR is valid, and what happens if it is violated.

[Considering independent and identically distributed (i.i.d.) processes]

For simplicity, we consider the case of an independent and identically distributed (i.i.d.) process. In this case, the conditional probability distribution of $(\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i) \mid D^{i-1}(1,1+k) < r)$ is identical to the probability distribution of $\bar{s}^{(\tau_A)}(i+k) - \bar{s}^{(\tau_A)}(i)$, which is independent of the condition $D^{i-1}(1,1+k) < r$. The distribution does not depend on $k$ (see Appendix A). Therefore, it is sufficient to consider the distribution of $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$. In addition, the variances of $\bar{s}^{(\tau_A)}(i)$ and $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ are $\operatorname{Var}(s(1))/\tau_A$ and $2\operatorname{Var}(s(1))/\tau_A$, respectively.

Therefore, the Allan variance of an i.i.d. system is inversely proportional to $\tau_A$. For the WGN case, owing to the reproductive property of the Gaussian distribution, the PDFs of $\bar{s}^{(\tau_A)}(i)$ and $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ are also Gaussian, with variances $\operatorname{Var}(s(1))/\tau_A$ and $2\operatorname{Var}(s(1))/\tau_A$, respectively. In addition, because the mean value of $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ is zero, the probability that $|\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r$ can be represented as follows:

\Pr(|\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r) = \dfrac{1}{\sqrt{2\pi\cdot 2\sigma^2/\tau_A}}\int_{-r}^{r}\exp\left(-\dfrac{x^2}{2\cdot 2\sigma^2/\tau_A}\right)dx, \qquad (35)

where $\sigma^2 = \operatorname{Var}(s(1))$; this probability increases monotonically with increasing $\tau_A$. Therefore, the NLVR holds for all $\tau_A$.
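Equation (35) reduces to an error function, $\Pr = \operatorname{erf}\!\left(r\sqrt{\tau_A}/(2\sigma)\right)$, which makes the monotonicity easy to check numerically; a short sketch (the values of $r$ and $\sigma$ are illustrative defaults):

```python
import math

def pr_within_r(tau_a, r=0.15, sigma=1.0):
    """Closed form of Eq. (35): for X ~ N(0, 2*sigma^2/tau_a),
    Pr(|X| < r) = erf(r * sqrt(tau_a) / (2 * sigma))."""
    return math.erf(r * math.sqrt(tau_a) / (2.0 * sigma))

# The probability rises monotonically with tau_a while the variance
# 2*sigma^2/tau_a falls, so the NLVR holds at every scale for WGN.
for tau_a in (1, 4, 16, 64, 256):
    print(tau_a, round(pr_within_r(tau_a), 4))
```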

The variance of $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ continuously decreases as $\tau_A$ increases in the i.i.d. case; that is, $\Delta\operatorname{Var}(\tau_A)$ is negative for all $\tau_A$. The cases in which the NLVR does not hold are therefore those in which the area under the PDF of $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ in the range $[-r, r]$ also decreases with increasing $\tau_A$, i.e., $\Delta\Pr(\tau_A)$ is also negative. This situation seems unlikely to occur when the i.i.d. process is unimodal: a reduction in variance means a shrinking of the distribution toward the center, which implies an increase in the probability mass around the center.

[When NLVR is invalid]

However, situations that violate the NLVR are likely to occur when, for example, the original system exhibits a multimodal distribution. It should be noted that $\bar{s}^{(\tau_A)}(i)$ of a multimodal distribution can have more peaks than the distribution of $s(t)$, owing to the effect of averaging. Thus, $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ may also have many peaks.

The number of peaks increases as $\tau_A$ increases; beyond a certain point, however, the number of peaks decreases because the peaks fuse with each other. According to the central limit theorem, the distributions of $\bar{s}^{(\tau_A)}(i)$ and $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ finally assume shapes close to a Gaussian distribution in the i.i.d. case. While the number of peaks is increasing, the probability mass of the PDF of $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$ near the center is redistributed among the peaks; thus, $\Delta\Pr(\tau_A)$ is negative until this redistribution and the merging of peaks become antagonistic.

[Violation of NLVR by bimodal distributions]

Figure 9 presents an example of this discussion. Figure 9(a) shows a histogram of the original bimodal i.i.d. system, composed of two Gaussians centered around $\pm 1$. Figures 9(c), (e), and (g) show the histograms of the coarse-grained time series for $\tau_A = 2, 5, 100$, respectively. Recall that coarse-graining means averaging over neighboring points. Because the process is i.i.d., two consecutive points have a $50\,\%$ probability of coming from opposite sides of the origin. Thus, for $\tau_A = 2$, shown in Fig. 9(c), a third peak appears around the origin. More precisely, the leftmost peak in Fig. 9(c) corresponds to the case in which two consecutive $s(t)$ points are negative, i.e., both from the left peak in Fig. 9(a), which has a probability of $50\,\%\times 50\,\% = 25\,\%$. Similarly, the rightmost peak in Fig. 9(c) corresponds to two consecutive positive $s(t)$ points, and the center peak corresponds to either positive-then-negative or negative-then-positive points. The number of peaks increases up to a certain $\tau_A$, until they start to merge with each other. For a sufficiently large $\tau_A$, the distribution approaches a Gaussian distribution, as shown in Fig. 9(g).

Figures 9(b), (d), (f), and (h) show the histograms of the difference between adjacent coarse-grained points, $\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)$, for $\tau_A = 1, 2, 5, 100$, respectively. The red-filled area represents the range $[-r, r]$, and the black bar denotes the standard deviation of the distribution. In fact, these distributions can be regarded as convolutions of the corresponding distribution of $\bar{s}^{(\tau_A)}(i)$. As for $\bar{s}^{(\tau_A)}(i)$, the number of peaks increases up to a certain $\tau_A$ and then decreases. In such cases, $\Delta\Pr(\tau_A)$ is negative up to a certain $\tau_A$, which can be observed from the reduction of the red area in Figs. 9(b)–(f), whereas $\Delta\operatorname{Var}(\tau_A)$ is always negative. Thus, the NLVR is violated.
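This counterexample is straightforward to reproduce. In the sketch below, the equal mixture weights and the $\pm 1$ centers follow Fig. 9, while the component width of 0.1 is our own illustrative assumption (where the probability minimum lands, e.g., near $\tau_A = 8$, depends on this width and on $r$).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10**6
# Bimodal i.i.d. series: an equal mixture of two Gaussians centered at +/-1.
centers = rng.choice([-1.0, 1.0], size=n)
s = centers + 0.1 * rng.standard_normal(n)   # component width 0.1 (assumed)

r = 0.15 * s.std()
for tau_a in (1, 2, 5, 8, 100):
    m = len(s) // tau_a
    sbar = s[:m * tau_a].reshape(m, tau_a).mean(axis=1)
    d = np.diff(sbar)
    # Pr(|sbar(i+1) - sbar(i)| < r) first falls and then rises with tau_a,
    # while the Allan variance 0.5 * mean(d^2) falls monotonically as 1/tau_a,
    # so the NLVR is violated at small scales.
    print(tau_a, (np.abs(d) < r).mean(), 0.5 * np.mean(d ** 2))
```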

Figure 9: Example histograms for a bimodal i.i.d. system. (a,c,e,g) Histograms of the coarse-grained time series for $\tau_A = 1, 2, 5, 100$, respectively. (b,d,f,h) Histograms of the difference between adjacent coarse-grained points for $\tau_A = 1, 2, 5, 100$, respectively. Red areas represent the range $[-r, r]$. Black bars denote the standard deviation.

[Disagreement of MSE and Allan variance]

Plots of $\Pr(D^1(i,i+1) = |\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r)$, the MSE, $C^m(r)$, and the Allan variance of this bimodal i.i.d. system are presented in Fig. 10. The dashed black line in Fig. 10(a) shows $\Pr(|\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r)$. In contrast to the continuously decreasing Allan variance, $\Pr(|\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r)$ decreases until $\tau_A = 8$ and then increases. $C^m(r)$ (red and blue curves) shows the same trend, and the MSE (black curve) inherits the inverted trend. In fact, $\langle C^m(r)\rangle = \Pr(|\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r)^m$ and $MSE \simeq -\log\Pr(|\bar{s}^{(\tau_A)}(i+1) - \bar{s}^{(\tau_A)}(i)| < r)$ (see Appendix A). In contrast, the Allan variance curve plotted in Fig. 10(b) shows the $1/\tau_A$ dependency discussed above (note that the plot is logarithmically scaled). Because the example under study here is an i.i.d. process, the Allan variance result may be considered the more reliable one, as it does not detect any specific time scale of relevance, whereas the MSE indicates a higher level of complexity at $\tau_A = 8$ than at other scales.

Figure 10: Comparison of the MSE and Allan variance for the bimodal system shown in Fig. 9(a). (a) The MSE, $C^{m}(r)$, and $\Pr(|\bar{s}^{(\tau_{A})}(i+1)-\bar{s}^{(\tau_{A})}(i)|<r)$ of a time series obtained from the system. The black, red, and blue curves represent the MSE, $C^{3}(r)$, and $C^{2}(r)$, respectively. The dashed black curve represents $\Pr(D^{1}(i,i+1)=|\bar{s}^{(\tau_{A})}(i+1)-\bar{s}^{(\tau_{A})}(i)|<r)$. (b) The Allan variance of the corresponding time series. Herein, the MSE and the Allan variance clearly exhibit different $\tau_{A}$ dependencies.

From these discussions, we can see that the NLVR is not always valid. For example, the condition may not hold when the original time series $s(t)$ has a multimodal distribution. If the NLVR is violated, meaning that $\Delta\Pr(\tau_{A})\Delta\operatorname{Var}(\tau_{A})>0$ for a large number of $k$, then $C^{m}(r)$, expressed as a weighted average of the probability, is expected to change in the same direction as the Allan variance. As a result, the Allan variance and the MSE change with opposite signs. In i.i.d. cases, $\Delta\Pr(\tau_{A})$ and $\Delta\operatorname{Var}(\tau_{A})$ are independent of $k$, so if the NLVR does not hold for a specific $\tau_{A}$ and $k$, it is violated for all $k$.

Therefore, in the range of $\tau_{A}$ where the NLVR holds, the MSE and the Allan variance show a similar $\tau_{A}$ dependence; in the range where the NLVR is violated, they show opposite trends with respect to $\tau_{A}$. If the time series is time-correlated, as in dynamical systems, the NLVR may be violated by even more complex mechanisms.

IV.4 Empirical validation of the NLVR

Finally, we examine the extent to which the NLVR is satisfied based on empirical data. Specifically, the changes in the conditional probability ($\Delta\Pr(\tau_{A})$) and the conditional variance ($\Delta\operatorname{Var}(\tau_{A})$) are calculated for the time series studied in Sections III and IV.3 up to $k=100$.

First, Fig. 11(a) shows a histogram of the intensity observed in the LFF time series. The intensity is normalized to that without optical feedback, i.e., the solitary-laser intensity.Ohtsubo (2017) The histogram in Fig. 11(a) has a strong peak near the origin and a smaller peak near the normalized intensity of 3. That is, the probability distribution is somewhat multimodal, which may cause a violation of the NLVR.

Figures 11(b), (c), and (d) show scatter plots of $(\Delta\Pr(\tau_{A}),\Delta\operatorname{Var}(\tau_{A}))$ for the LFF, WGN, and bimodal time series, respectively, as explained below. If the NLVR holds, a positive $\Delta\Pr(\tau_{A})$ implies a negative $\Delta\operatorname{Var}(\tau_{A})$, and a negative $\Delta\Pr(\tau_{A})$ implies a positive $\Delta\operatorname{Var}(\tau_{A})$. That is to say, the points in the scatter diagram should lie in the second or fourth quadrant.

In Fig. 11(b), almost all the sampled points are concentrated in the second and fourth quadrants, i.e., in regions where the signs of $\Delta\Pr(\tau_{A})$ and $\Delta\operatorname{Var}(\tau_{A})$ are opposite. That is to say, even though the probability distribution exhibits multimodality, this LFF system mostly does not violate the NLVR. From these observations, we speculate that the peaks of the distribution would need to be more widely separated to produce a disagreement between the Allan variance and the MSE.

Similarly, Fig. 11(c) shows $\Delta\Pr(\tau_{A})$ and $\Delta\operatorname{Var}(\tau_{A})$ for the WGN time series. As discussed above, $\Delta\Pr(\tau_{A})$ should always be positive, whereas $\Delta\operatorname{Var}(\tau_{A})$ is always negative; i.e., every point should be in the second quadrant. However, the empirical distribution calculated from the finite time series is not precisely Gaussian, so some points fall in the third quadrant.

Finally, Fig. 11(d) shows the same scatter plot for the bimodal case. In contrast to the LFF and WGN cases, many points lie in the third quadrant, in violation of the NLVR. This was expected, as we constructed this case specifically as a counterexample.

Figure 11: $\Delta\Pr(\tau_{A})$ and $\Delta\operatorname{Var}(\tau_{A})$ for $m=2$ and $i=2$, up to $k=100$. (a) Histogram of the LFF time series. (b,c,d) Scatter plots of $\Delta\Pr(\tau_{A})$ and $\Delta\operatorname{Var}(\tau_{A})$ for (b) the LFF time series, (c) the WGN time series, and (d) the bimodal time series. When the NLVR holds, the points in the scatter diagram should lie in the second or fourth quadrant.

IV.5 Computation of the MSE and Allan variance

As introduced in Sec. II.2, the definition of the MSE involves probabilities, whereas that of the Allan variance is based on the variability of the data under study. Section IV.3 revealed the common underlying mechanism from an information-theoretic viewpoint despite the seemingly different definitions of these statistical measures of multiscale dynamics. In this section, we compare the MSE and the Allan variance from a computational viewpoint.

For the MSE, calculating $C^{m}(r)$ is a computationally demanding task. Computing Eq. (9) for all $i$ requires $O(N^{2})$ operations, as all pairs of embedded vectors are compared. Some implementations also require $O(N^{2})$ memory to store all the calculated Chebyshev distances. Thus, the MSE evaluates the details of the probability distribution at a high computational cost.
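The quadratic scaling can be seen explicitly in a pairwise scan. The following is a minimal sample-entropy-style sketch of estimating $C^{m}(r)$; normalization and self-match conventions differ between implementations, so this should be read as illustrative of the $O(N^{2})$ structure rather than as the exact definition in Eq. (9).

```python
import numpy as np

def correlation_sum(x, m, r):
    """Fraction of pairs of m-point embedded vectors whose Chebyshev
    distance is below r; requires O(N^2) pairwise comparisons."""
    emb = np.lib.stride_tricks.sliding_window_view(x, m)  # shape (N-m+1, m)
    n = len(emb)
    count = 0
    for i in range(n - 1):
        # Chebyshev distance from vector i to every later vector.
        d = np.max(np.abs(emb[i + 1:] - emb[i]), axis=1)
        count += np.count_nonzero(d < r)
    return count / (n * (n - 1) / 2)

def sample_entropy(x, m, r):
    """SampEn-style estimate: -log(C^{m+1}(r) / C^m(r))."""
    return -np.log(correlation_sum(x, m + 1, r) / correlation_sum(x, m, r))
```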

In contrast, the Allan variance calculation requires only $O(N)$ computations and memory. The Allan variance does not consider the probability distribution; it depends only on the differences $\Delta\bar{s}^{(\tau_{A})}(l)$ of successive coarse-grained points $\bar{s}^{(\tau_{A})}(l)$. As discussed in Sec. IV.3, the MSE for an i.i.d. process depends on the distribution, whereas the Allan variance does not.
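A correspondingly minimal sketch of the (non-overlapping) Allan variance shows the contrast: a single linear pass over the coarse-grained averages suffices.

```python
import numpy as np

def allan_variance(x, tau):
    """Allan variance at averaging length tau: half the mean squared
    difference of successive non-overlapping tau-point averages.
    O(N) time and memory."""
    n = len(x) // tau
    bins = x[: n * tau].reshape(n, tau).mean(axis=1)  # coarse-grained series
    return 0.5 * np.mean(np.diff(bins) ** 2)
```

For white Gaussian noise, this estimate decays roughly as $1/\tau_{A}$, consistent with the behavior in Fig. 10(b).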

Although the Allan variance is computationally cheaper, it is not obvious that it has any disadvantage compared with the MSE for extracting multiscale features. The Allan variance is a statistical measure that is similar, but not identical, to the MSE, quantifying slightly different aspects of the time series.

In the literature, the MSE has been applied to a versatile range of tasks, such as bearing fault detectionWang, Yao, and Cai (2020) and sleep stage scoring.Liang et al. (2012) However, the MSE suffers from severe computational difficulties, as discussed above. Given the similar properties of the MSE and Allan variance, as well as the computationally lightweight nature of the Allan variance, extending the Allan variance to real-time applications, such as bearing fault detection or the prediction of epilepsy from electroencephalography (EEG), would be an interesting future topic.

V Conclusion

In this study, we examined the similarities between two multiscale statistical measures, the MSE and the Allan variance, and discussed the underlying mechanisms through an information-theoretic analysis. It is noteworthy that although the definitions of the MSE and Allan variance appear significantly different, they show similar behavior for a wide range of time-series data. We experimentally confirmed the similar properties of the MSE and Allan variance in LFF in chaotic lasers and physiological heartbeat data, as well as in white Gaussian and $1/f$ noise. The connection can be understood by decomposing the conditional probabilities in the MSE and extracting the dominant contributions. We derived a condition, expressed in terms of conditional probabilities, that must be satisfied for the MSE and Allan variance to exhibit similar tendencies. Then, we artificially constructed a random sequence that violates this condition, leading to inconsistent MSE and Allan variance behavior. We also quantitatively demonstrated that the aforementioned LFF and heartbeat data, which are physically plausible systems, mostly satisfy the condition.

Finally, we discussed future research topics. Using the Allan variance instead of the MSE may lead to more computationally lightweight applications that are suitable for real-time tasks. In addition, there is a possibility of integrating further developments that have been devised for the MSEHumeau-Heurtier (2015) and Allan variance.Kroupa (2012)

Furthermore, more research on the theoretical foundations of coarse-graining and the MSE is needed. MSE research to date has focused mainly on its application as a statistical tool, and there has been little work on its theoretical foundations. The relationship between the dynamics of a coarse-grained time series and those of the original time series remains unclear. Coarse-graining can be regarded as a combination of a moving-average filter and downsampling; according to a previous study,Sauer, Yorke, and Casdagli (1991) a linear filter applied to the original time series during Takens' embedding preserves its topological properties. From this theorem, it may be possible to discuss the theoretical basis of coarse-graining from the viewpoint of dynamical invariants, including entropies. Meanwhile, the theoretical foundations of the MSE should be more complicated, as the MSE shares the tolerance $r$ across all scales. As pointed out by Humeau-Heurtier,Humeau-Heurtier (2015) more and more embedded vectors may be regarded as neighbors of each other as the scale factor $\tau_{A}$ increases, owing to the reduction of the variance of the coarse-grained time series. Notably, Costa et al.,Costa, Goldberger, and Peng (2005) the original proposers of the MSE, pointed out that the variance changes induced by coarse-graining are related to the temporal structures of the original time series. We may need a framework that allows us to connect the dynamical invariants at each time scale.

Acknowledgments

This work was supported in part by the CREST Project (JPMJCR17N2) funded by the Japan Science and Technology Agency and Grants-in-Aid for Scientific Research (JP20H00233), and Transformative Research Areas (A) (JP22H05197) funded by the Japan Society for the Promotion of Science (JSPS). AR is supported by JSPS as an International Research Fellow.

Appendix A Independent and identically distributed case

In this appendix, we discuss the properties of the conditional probability $\Pr(D^{1}(i,i+k)<r\mid D^{i-1}(1,1+k)<r)$ and the conditional variance $\operatorname{Var}(\bar{s}^{(\tau_{A})}(i+k)-\bar{s}^{(\tau_{A})}(i)\mid D^{i-1}(1,1+k)<r)$ for i.i.d. systems.

Let $s(t)$ be the i.i.d. time series under study. We can regard $s(t)$ as a sequence of random variables. Obviously, the coarse-grained time series defined as

$$\bar{s}^{(\tau_{A})}(l)=\frac{1}{\tau_{A}}\sum_{t=(l-1)\tau_{A}+1}^{l\tau_{A}}s(t) \qquad (36)$$

is also an i.i.d. sequence of random variables. To simplify the notation, let the random variable $Y_{l}$ denote $\bar{s}^{(\tau_{A})}(l)$. We define the PDF of $Y_{l}=\bar{s}^{(\tau_{A})}(l)$ as $g^{(\tau_{A})}(y_{l})$; note that the function itself is independent of $l$.

We now compute the distribution of $\bar{s}^{(\tau_{A})}(i+k)-\bar{s}^{(\tau_{A})}(i)$ under the condition $D^{i-1}(1,1+k)<r$. For readability, we define the random variable $Z_{k}$ as follows:

$$Z_{k}=\bar{s}^{(\tau_{A})}(i+k)-\bar{s}^{(\tau_{A})}(i). \qquad (37)$$

Let $h_{k}^{(\tau_{A})}(z_{k})$ be the PDF of $Z_{k}$ under the condition $D^{i-1}(1,1+k)<r$. It can be represented in terms of $g^{(\tau_{A})}(y_{l})$ as follows:

$$
\begin{aligned}
h_{k}^{(\tau_{A})}(z_{k})
&=\frac{d}{dz_{k}}\Pr(Z_{k}<z_{k}\mid D^{i-1}(1,1+k)<r)\\
&=\frac{d}{dz_{k}}\,\frac{\Pr(Z_{k}<z_{k}\land D^{i-1}(1,1+k)<r)}{\Pr(D^{i-1}(1,1+k)<r)}\\
&=\frac{d}{dz_{k}}\,\frac{\int_{T_{k,1}^{(\tau_{A})}\land T_{k,2}^{(\tau_{A})}}\prod_{j=1}^{i+k}g^{(\tau_{A})}(y_{j})\,dy_{j}}{\int_{T_{k,2}^{(\tau_{A})}}\prod_{j=1}^{i+k}g^{(\tau_{A})}(y_{j})\,dy_{j}}, &&(38)
\end{aligned}
$$

where

$$T_{k,1}^{(\tau_{A})}=\{(y_{1},y_{2},\cdots,y_{i+k})\mid y_{i+k}-y_{i}<z_{k}\}, \qquad (39)$$
$$T_{k,2}^{(\tau_{A})}=\{(y_{1},y_{2},\cdots,y_{i+k})\mid D^{i-1}(1,1+k)<r\}. \qquad (40)$$

Here we used the fact that the joint PDF of i.i.d. random variables equals the product of the PDFs of the individual random variables. Because the conditions in Eqs. (39) and (40) refer to different variables, the integral in the numerator of Eq. (38) can be separated into an integral over $y_{i}$ and $y_{i+k}$ and an integral over the remaining variables, as follows:

$$
\int_{T_{k,1}^{(\tau_{A})}\land T_{k,2}^{(\tau_{A})}}\prod_{j=1}^{i+k}g^{(\tau_{A})}(y_{j})\,dy_{j}
=\int_{\tilde{T}_{k,1}^{(\tau_{A})}}g^{(\tau_{A})}(y_{i})\,g^{(\tau_{A})}(y_{i+k})\,dy_{i}\,dy_{i+k}\int_{\tilde{T}_{k,2}^{(\tau_{A})}}\prod_{\substack{j=1\\ j\neq i}}^{i+k-1}g^{(\tau_{A})}(y_{j})\,dy_{j}, \qquad (41)
$$

where

$$\tilde{T}_{k,1}^{(\tau_{A})}=\{(y_{i},y_{i+k})\mid y_{i+k}-y_{i}<z_{k}\}, \qquad (42)$$
$$\tilde{T}_{k,2}^{(\tau_{A})}=\{(y_{1},\cdots,y_{i-1},y_{i+1},\cdots,y_{i+k-1})\mid D^{i-1}(1,1+k)<r\}. \qquad (43)$$

Here, unlike Eqs. (39) and (40), Eqs. (42) and (43) share no variables. Similarly, the denominator can be decomposed as follows:

$$
\begin{aligned}
\int_{T_{k,2}^{(\tau_{A})}}\prod_{j=1}^{i+k}g^{(\tau_{A})}(y_{j})\,dy_{j}
&=\int_{\mathbb{R}^{2}}g^{(\tau_{A})}(y_{i})\,g^{(\tau_{A})}(y_{i+k})\,dy_{i}\,dy_{i+k}\int_{\tilde{T}_{k,2}^{(\tau_{A})}}\prod_{\substack{j=1\\ j\neq i}}^{i+k-1}g^{(\tau_{A})}(y_{j})\,dy_{j}\\
&=\int_{\tilde{T}_{k,2}^{(\tau_{A})}}\prod_{\substack{j=1\\ j\neq i}}^{i+k-1}g^{(\tau_{A})}(y_{j})\,dy_{j}. &&(44)
\end{aligned}
$$

Here we used the fact that

$$\int_{\mathbb{R}^{2}}g^{(\tau_{A})}(y_{i})\,g^{(\tau_{A})}(y_{i+k})\,dy_{i}\,dy_{i+k}=1. \qquad (45)$$

As a result, Eq. (38) can be reduced to

$$h_{k}^{(\tau_{A})}(z_{k})=\frac{d}{dz_{k}}\int_{\tilde{T}_{k,1}^{(\tau_{A})}}g^{(\tau_{A})}(y_{i})\,g^{(\tau_{A})}(y_{i+k})\,dy_{i}\,dy_{i+k}. \qquad (46)$$

Equation (46) is the PDF of $Z_{k}$ without any conditions. In conclusion, the condition $D^{i-1}(1,1+k)<r$ does not matter in i.i.d. cases. Moreover, the above calculation depends on neither $k$ nor $i$, except when $k=1$: in that case $i+k-1=i$, so $y_{i}$ appears in both $T_{k,1}^{(\tau_{A})}$ and $T_{k,2}^{(\tau_{A})}$, and we cannot separate the integral in the same manner. Nevertheless, the same conclusion can be derived for $k=1$. To show this, we introduce a variable transformation, as follows:

$$u_{j}=\begin{cases}y_{j+1}-y_{j}&(j=1,2,\cdots,i)\\ y_{i+1}&(j=i+1).\end{cases} \qquad (47)$$

Using $u_{j}$, the integral in the numerator of Eq. (38) can be written as follows:

$$\int_{-\infty}^{\infty}du_{i+1}\int_{-\infty}^{z_{k}}du_{i}\int_{-r}^{r}\prod_{j=1}^{i+1}g^{(\tau_{A})}\!\left(2u_{i+1}-\sum_{l=j}^{i+1}u_{l}\right)\prod_{j=1}^{i-1}du_{j}. \qquad (48)$$

Similarly, the integral in the denominator of Eq. (38) is

$$\int_{-\infty}^{\infty}du_{i+1}\int_{-\infty}^{\infty}du_{i}\int_{-r}^{r}\prod_{j=1}^{i+1}g^{(\tau_{A})}\!\left(2u_{i+1}-\sum_{l=j}^{i+1}u_{l}\right)\prod_{j=1}^{i-1}du_{j}. \qquad (49)$$

Because the only difference between Eqs. (48) and (49) is the range of integration for $u_{i}$, we can cancel the integrals over $u_{1},u_{2},\cdots,u_{i-1}$; the remaining integrals in the numerator and denominator are

$$\int_{-\infty}^{\infty}du_{i+1}\int_{-\infty}^{z_{k}}du_{i}\;g^{(\tau_{A})}(u_{i+1})\,g^{(\tau_{A})}(u_{i+1}-u_{i}) \qquad (50)$$

and

$$\int_{-\infty}^{\infty}du_{i+1}\int_{-\infty}^{\infty}du_{i}\;g^{(\tau_{A})}(u_{i+1})\,g^{(\tau_{A})}(u_{i+1}-u_{i})=1, \qquad (51)$$

respectively. Consequently, the resulting PDF is

$$
\begin{aligned}
h_{1}^{(\tau_{A})}(z_{1})
&=\frac{d}{dz_{1}}\int_{-\infty}^{\infty}du_{i+1}\int_{-\infty}^{z_{1}}du_{i}\;g^{(\tau_{A})}(u_{i+1})\,g^{(\tau_{A})}(u_{i+1}-u_{i})\\
&=\frac{d}{dz_{1}}\int_{\tilde{T}_{1,1}^{(\tau_{A})}}g^{(\tau_{A})}(y_{i})\,g^{(\tau_{A})}(y_{i+1})\,dy_{i}\,dy_{i+1}. &&(52)
\end{aligned}
$$

This is equivalent to the PDF of $Z_{1}$ without any conditions. Summarizing the results thus far, $h_{k}^{(\tau_{A})}$ is the same as the PDF of $Z_{k}$ without any conditions for all $k$. In addition, the calculation does not depend on $i$. Consequently, it is sufficient to consider the distribution of $\bar{s}^{(\tau_{A})}(2)-\bar{s}^{(\tau_{A})}(1)$ in i.i.d. cases.

From the above results, $\Pr(D^{m}(1,1+k)<r)$ can be expressed as follows:

$$
\begin{aligned}
\Pr(D^{m}(1,1+k)<r)
&=\Pr(D^{1}(1,1+k)<r)\prod_{i=2}^{m}\Pr\left(D^{1}(i,i+k)<r\mid D^{i-1}(1,1+k)<r\right) &&(53)\\
&=\Pr(D^{1}(1,1+k)<r)\prod_{i=2}^{m}\Pr\left(D^{1}(i,i+k)<r\right) &&(54)\\
&=\Pr(D^{1}(1,1+k)<r)^{m} &&(55)\\
&=\Pr(D^{1}(1,2)<r)^{m}. &&(56)
\end{aligned}
$$

Here, Eq. (53) is obtained in the same manner as Eq. (24). From Eq. (53) to Eq. (54), we dropped the conditioning, as justified above. From Eq. (54) to Eqs. (55) and (56), we used the fact that the distribution of $\bar{s}^{(\tau_{A})}(i+k)-\bar{s}^{(\tau_{A})}(i)$ is independent of $i$ and $k$. Consequently, $\langle C^{m}(r)\rangle=\Pr(D^{1}(1,2)<r)^{m}$, and $\mathrm{MSE}\simeq-\log\Pr(D^{1}(1,2)<r)$ holds.
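The factorization in Eq. (56) is easy to verify numerically. The following minimal Monte Carlo sketch compares a direct estimate of $\Pr(D^{m}(1,1+k)<r)$ with $\Pr(D^{1}(1,2)<r)^{m}$; a white Gaussian series stands in for an arbitrary i.i.d. process (its coarse-grained version remains i.i.d. for every $\tau_{A}$), and the values of $m$, $r$, and $k$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200_000)   # any i.i.d. series will do here
m, r, k = 3, 0.3, 5            # assumed illustrative parameters

# Direct estimate of Pr(D^m(i, i+k) < r): Chebyshev distance
# between m-point embedded vectors separated by lag k.
emb = np.lib.stride_tricks.sliding_window_view(x, m)
p_m = np.mean(np.max(np.abs(emb[k:] - emb[:-k]), axis=1) < r)

# Single-coordinate probability Pr(D^1(1,2) < r).
p_1 = np.mean(np.abs(x[1:] - x[:-1]) < r)

print(p_m, p_1 ** m)  # the two values should agree up to sampling error
```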

It is noteworthy that, relative to the variance of $s(1)$, the variances of $\bar{s}^{(\tau_{A})}(1)$ and $\bar{s}^{(\tau_{A})}(2)-\bar{s}^{(\tau_{A})}(1)$ are $1/\tau_{A}$ and $2/\tau_{A}$, respectively, as

$$
\begin{aligned}
\operatorname{Var}(\bar{s}^{(\tau_{A})}(1))
&=\operatorname{Var}\left(\frac{1}{\tau_{A}}\sum_{t=1}^{\tau_{A}}s(t)\right)\\
&=\frac{1}{\tau_{A}^{2}}\sum_{t=1}^{\tau_{A}}\operatorname{Var}(s(t))\\
&=\frac{1}{\tau_{A}}\operatorname{Var}(s(1)), &&(57)
\end{aligned}
$$

and

$$
\begin{aligned}
\operatorname{Var}(\bar{s}^{(\tau_{A})}(2)-\bar{s}^{(\tau_{A})}(1))
&=\operatorname{Var}(\bar{s}^{(\tau_{A})}(2))+\operatorname{Var}(\bar{s}^{(\tau_{A})}(1))\\
&=\frac{2}{\tau_{A}}\operatorname{Var}(s(1)) &&(58)
\end{aligned}
$$

hold because $s(t)$ and $\bar{s}^{(\tau_{A})}(l)$ are i.i.d.

References

  • Costa, Goldberger, and Peng (2002) M. Costa, A. L. Goldberger,  and C.-K. Peng, “Multiscale entropy analysis of complex physiologic time series,” Phys. Rev. Lett. 89, 068102 (2002).
  • Catarino et al. (2011) A. Catarino, O. Churches, S. Baron-Cohen, A. Andrade, and H. Ring, “Atypical EEG complexity in autism spectrum conditions: A multiscale entropy analysis,” Clinical Neurophysiology 122, 2375–2383 (2011).
  • Costa, Goldberger, and Peng (2005) M. Costa, A. L. Goldberger,  and C.-K. Peng, “Multiscale entropy analysis of biological signals,” Phys. Rev. E 71, 021906 (2005).
  • Martina et al. (2011) E. Martina, E. Rodriguez, R. Escarela-Perez,  and J. Alvarez-Ramirez, “Multiscale entropy analysis of crude oil price dynamics,” Energy Economics 33, 936–947 (2011).
  • Wang et al. (2013) J. Wang, P. Shang, X. Zhao, and J. Xia, “Multiscale entropy analysis of traffic time series,” International Journal of Modern Physics C 24, 1350006 (2013), https://doi.org/10.1142/S012918311350006X.
  • Allan (1966) D. Allan, “Statistics of atomic frequency standards,” Proceedings of the IEEE 54, 221–230 (1966).
  • Filler (1988) R. Filler, “The acceleration sensitivity of quartz crystal oscillators: a review,” IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 35, 297–305 (1988).
  • Skrinsky, Skrinska, and Zelinger (2014) J. Skrinsky, M. Skrinska,  and Z. Zelinger, “Allan variance - stability of diode-laser spectrometer for monitoring of gas concentrations,” in 2014 International Conference on Mathematics and Computers in Sciences and in Industry (2014) pp. 159–164.
  • Humeau-Heurtier (2015) A. Humeau-Heurtier, “The multiscale entropy algorithm and its variants: A review,” Entropy 17, 3110–3123 (2015).
  • Lang and Kobayashi (1980) R. Lang and K. Kobayashi, “External optical feedback effects on semiconductor injection laser properties,” IEEE Journal of Quantum Electronics 16, 347–355 (1980).
  • Goldberger et al. (2000) A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals,” Circulation 101, e215–e220 (2000), http://circ.ahajournals.org/content/101/23/e215.full, doi: 10.1161/01.CIR.101.23.e215.
  • Irurzun et al. (2021) I. M. Irurzun, L. Garavaglia, M. M. Defeo, and J. T. Mailland, “RR interval time series from healthy subjects (version 1.0.0),” PhysioNet (2021), https://doi.org/10.13026/51yd-d219.
  • Baim et al. (1986) D. S. Baim, W. S. Colucci, E. S. Monrad, H. S. Smith, R. F. Wright, A. Lanoue, D. F. Gauthier, B. J. Ransil, W. Grossman,  and E. Braunwald, “Survival of patients with severe congestive heart failure treated with oral milrinone,” Journal of the American College of Cardiology 7, 661–670 (1986).
  • Petrutiu, Sahakian, and Swiryn (2007) S. Petrutiu, A. V. Sahakian, and S. Swiryn, “Abrupt changes in fibrillatory wave characteristics at the termination of paroxysmal atrial fibrillation in humans,” EP Europace 9, 466–470 (2007), https://academic.oup.com/europace/article-pdf/9/7/466/1470935/eum096.pdf.
  • Larger et al. (2012) L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso,  and I. Fischer, “Photonic information processing beyond turing: an optoelectronic implementation of reservoir computing,” Opt. Express 20, 3241–3249 (2012).
  • Rogister et al. (2001) F. Rogister, A. Locquet, D. Pieroux, M. Sciamanna, O. Deparis, P. Mégret,  and M. Blondel, “Secure communication scheme using chaotic laser diodes subject to incoherent optical feedback and incoherent optical injection,” Opt. Lett. 26, 1486–1488 (2001).
  • Mengue and Essimbi (2012) A. D. Mengue and B. Z. Essimbi, “Secure communication using chaotic synchronization in mutually coupled semiconductor lasers,” Nonlinear Dynamics 70, 1241–1253 (2012).
  • Mihana et al. (2020) T. Mihana, K. Fujii, K. Kanno, M. Naruse,  and A. Uchida, “Laser network decision making by lag synchronization of chaos in a ring configuration,” Opt. Express 28, 40112–40130 (2020).
  • Wagner and Marriott (2008) G. S. Wagner and H. J. L. Marriott, Marriott’s Practical Electrocardiography, 11th ed. (Wolters Kluwer Health/Lippincott Williams & Wilkins, 2008).
  • Barnes et al. (1971) J. A. Barnes, A. R. Chi, L. S. Cutler, D. J. Healey, D. B. Leeson, T. E. McGunigal, J. A. Mullen, W. L. Smith, R. L. Sydnor, R. F. C. Vessot,  and G. M. R. Winkler, “Characterization of frequency stability,” IEEE Transactions on Instrumentation and Measurement IM-20, 105–120 (1971).
  • Richman and Moorman (2000) J. S. Richman and J. R. Moorman, “Physiological time-series analysis using approximate entropy and sample entropy,” American Journal of Physiology-Heart and Circulatory Physiology 278, H2039–H2049 (2000), pMID: 10843903.
  • Delgado-Bonal and Marshak (2019) A. Delgado-Bonal and A. Marshak, “Approximate entropy and sample entropy: A comprehensive tutorial,” Entropy 21 (2019), 10.3390/e21060541.
  • Uchida (2012) A. Uchida, Optical Communication with Chaotic Lasers: Applications of Nonlinear Dynamics and Synchronization (Wiley-VCH, 2012).
  • Ohtsubo (2017) J. Ohtsubo, Semiconductor Lasers: Stability, Instability and Chaos Fourth Edition (Springer, 2017).
  • Wang, Yao, and Cai (2020) Z. Wang, L. Yao,  and Y. Cai, “Rolling bearing fault diagnosis using generalized refined composite multiscale sample entropy and optimized support vector machine,” Measurement 156, 107574 (2020).
  • Liang et al. (2012) S.-F. Liang, C.-E. Kuo, Y.-H. Hu, Y.-H. Pan, and Y.-H. Wang, “Automatic stage scoring of single-channel sleep EEG by using multiscale entropy and autoregressive models,” IEEE Transactions on Instrumentation and Measurement 61, 1649–1657 (2012).
  • Kroupa (2012) V. Kroupa, Frequency stability (Wiley-IEEE PRESS, 2012).
  • Sauer, Yorke, and Casdagli (1991) T. Sauer, J. A. Yorke,  and M. Casdagli, “Embedology,” Journal of Statistical Physics 65, 579–616 (1991).