
A combinatorial view of stochastic processes: White noise

A. Diaz-Ruelas

Max Planck Institute for the Physics of Complex Systems, Nöthnitzer Str. 38, 01187 Dresden.
Max Planck Institute for the Mathematics in the Sciences, Inselstr. 22, 04103 Leipzig.
Abstract

White noise is a fundamental and fairly well understood stochastic process that forms the conceptual basis for many other processes, as well as for the modeling of time series. Here we offer a fresh perspective on white noise that, grounded on combinatorial considerations, yields new insights for both modelling and theoretical purposes. To this aim, we adopt the ordinal pattern analysis approach, which allows us to abstract a time series as a sequence of patterns and their associated permutations, and introduce a simple functional over permutations that partitions them into classes encoding their level of asymmetry. We compute the exact probability mass function (p.m.f.) of this functional over the symmetric group of degree $n$, thus providing the description for the case of an infinite white noise realization. This p.m.f. can be conveniently approximated by a continuous probability density from an exponential family, the Gaussian, hence providing natural sufficient statistics that render a convenient and simple statistical analysis through ordinal patterns. Such analysis is exemplified on experimental data for the spatial increments from tracks of gold nanoparticles in 3D diffusion.

preprint: APS/123-QED

I Introduction

The incorporation of stochastic ingredients in models describing phenomena across all disciplines is now standard scientific practice. White noise is one of the most important such ingredients. Although tools for identifying white and other types of noise exist [1, 2], there is a permanent demand for reliable and robust statistical methods for analyzing data, whether to distinguish noise and filter it from signals in experiments, or, in hypothesis testing, to assess the plausibility that the outcome of an experiment is the result of randomness rather than a significant, controllable effect. Due to its ubiquity in experiments and its mathematical simplicity, white noise is very often the most convenient stochastic component for adding realism to a dynamic model, commonly regarded as the noise polluting observations. It can be continuous or discrete both in time and in distribution, so it applies to many scenarios. It is a stationary, independent and identically distributed process: all relatively simple properties for a stochastic process. Here we present a combinatorial perspective for studying white noise inspired by the concept of ordinal patterns. An ordinal pattern of length $n$ is the diagrammatic representation of the inequality fulfilled by a subsequence of $n$ points $x_1,\ldots,x_n$ in a time series $\{x_t\}_{t\in I}$. We discuss ordinal patterns in detail in Sec. II. This concept was used in 2002 by Bandt and Pompe [3] to build a measure of complexity for time series named Permutation Entropy (PE). PE has proven its value not only in applications, when used to analyze time series from a great variety of phenomena [4, 5], but it is also of theoretical relevance, since it is equivalent to the Kolmogorov-Sinai entropy for a large class of piece-wise continuous maps of the interval [6, 7].
The procedure for computing PE consists of first choosing the size $n$ of the window to be used in the analysis, corresponding to the embedding dimension in a Takens embedding [2]. We then slide the window through the series (equivalent to constructing the lagged or delay vectors [2]) to find the frequencies $\pi_j$, $j=1,\ldots,n!$, with which the ordinal patterns occur, and compute the associated Shannon entropy $H_\pi=-\sum_j \pi_j\log\pi_j$. For white noise in the long time series limit all patterns are equally likely, so their frequencies follow a discrete uniform probability mass function (p.m.f.) with support on the integers $1,\ldots,n!$, which is equivalent to the uniform probability measure on the symmetric group of degree $n$, $S_n$. Despite its relevance and wide range of applications, there are few rigorous studies on the properties of PE for its use in statistical inference. To the best of the author's knowledge, this is addressed only in works such as [8], where the authors investigate the expectation value and variance of PE for finite time series of white noise; the same authors later address the effect of ordinal pattern selection on the variance of PE [9]. In the customary PE approach every permutation is, in a sense, considered a class of its own, since the count of every single permutation matters. As a consequence, the empirical distributions obtained from finite-length observations are very sensitive to relatively minor changes in the proportions of each observed pattern [8]. This lack of robustness is a liability when trying to distinguish noise from structure.
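As a minimal sketch of the PE procedure just described (an illustration, not the implementation used in the paper; the function name and signature are chosen here), the window is slid over the series, each window is mapped to its ordinal pattern via the sorting permutation, and the Shannon entropy of the pattern frequencies is returned:

```python
import math
from collections import Counter

def permutation_entropy(series, n, lag=1):
    """Permutation entropy of a scalar series: slide a window of n
    points (spaced by `lag`), map each window to its ordinal pattern
    (the argsort permutation), and take the Shannon entropy of the
    observed pattern frequencies."""
    counts = Counter()
    N = len(series) - (n - 1) * lag
    for k in range(N):
        window = [series[k + j * lag] for j in range(n)]
        # ordinal pattern: positions ordered by increasing amplitude
        pattern = tuple(sorted(range(n), key=window.__getitem__))
        counts[pattern] += 1
    return -sum((c / N) * math.log(c / N) for c in counts.values())
```

For a long white noise series all $n!$ patterns become equally likely and $H_\pi$ approaches $\log n!$, while a monotone series produces a single pattern and $H_\pi=0$.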
Another problem is the factorial growth of the number of classes, which enlarges the discrete support correspondingly and makes the analysis both impractical and meaningless for embedding dimensions beyond low or moderate $n$ ($\sim 10$), since the length of a time series required to display roughly one representative member of each class is of the order of $n!$ (already around 3 million observations for $n=10$).

In this work, we address these limitations by introducing a new statistic for permutations in Sec. III. This statistic is a functional over the symmetric group $S_n$ that can be interpreted as a measure of asymmetry for ordinal patterns. The functional divides the symmetric group into classes corresponding to a coarse notion of levels of overall increasing or decreasing behaviour of a pattern. In turn, this transforms the original discrete uniform probability measure of the patterns over $S_n$ into a new probability measure that concentrates around its expected value, as we show in Sec. IV. This has practical and conceptual consequences, such as the ability to perform a suitably modified version of the ordinal pattern analysis for very large embedding dimensions, since the number of classes of patterns to be tracked is reduced from $\mathcal{O}(n!)$ to merely $\mathcal{O}(n^2)$. The probability mass functions corresponding to our functional can be approximated by a Gaussian distribution, which is itself an exponential family. This guarantees the existence of natural sufficient statistics for the estimation of our statistic, as we explain in Sec. IV. We open Sec. V by illustrating our framework on white noise from different source distributions and discussing its potential for distinguishing the deterministic signature of a chaotic orbit of a discrete map by inspection of the empirical p.m.f. obtained from our modified pattern analysis. Then we take experimental 3D tracks of gold nanoparticles and design a test for identifying statistical independence among the spatial increments on each coordinate in a plane of observation. We finish by discussing our results and making additional remarks on the advantages and drawbacks of the presented framework.

II Ordinal Patterns
and their symmetries

A permutation $\tau$ is a bijection $\tau: S\rightarrow S$. If we take the set $S$ to be the sequence of integers $S=\{1,2,3,\ldots,n\}$, then $\tau(i)=\tau_i$ maps this sequence into itself: $\tau_i=k$, $i,k=1,2,3,\ldots,n$. Due to its bijective nature, we also call the arrangement $\tau=\tau_1\tau_2\cdots\tau_n$ a permutation of length $n$, or alternatively a word produced by $\tau$ on $n$ symbols. The set $S_n=\{\tau:\tau(i)=k,\ i,k=1,2,3,\ldots,n\}$ together with the operation of functional composition is the symmetric group on $n$ symbols, of cardinality $|S_n|=n!$ [10]. We will denote the set of symbols by $[n]:=\{1,2,\ldots,n\}$, and an interval of symbols by $[i,j]\subseteq[n]$.

From a simplified perspective, the elements $\tau\in S_n$ can all be regarded as equivalent to each other a priori. The corresponding discrete measure over $S_n$ would therefore be $U(1,n!)$, assigning each permutation a weight of $1/n!$. Under this measure, permutations can be enumerated in lexicographic order, which ranks the words according to their size as integers. The lexicographic order in $S_3$ is shown in Table 1 following the index $i$, along with other statistics for permutations, such as the inversion number, the major index and the runs (number of ascending sequences) [10], [11]. In the last column, we include the corresponding values of the functional introduced in Sec. III.

$i$  $\tau$  sgn  inv  maj  runs  $\alpha(\tau)$
1 123 +1 0 0 1 2
2 132 -1 1 2 2 1
3 213 -1 1 1 2 1
4 231 +1 2 2 2 -1
5 312 +1 2 1 2 -1
6 321 -1 3 3 3 -2
Table 1: Sign, inversion number, major index, runs and the functional $\alpha(\tau)$ introduced in Sec. III for the permutations $\tau\in S_3$, listed in lexicographic order.
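The statistics of Table 1 are straightforward to compute; the following sketch (function names chosen here for illustration) reproduces the table for $S_3$, with the sign obtained as $(-1)^{\mathrm{inv}(\tau)}$:

```python
from itertools import permutations

def inversions(p):
    # number of pairs of positions i < j with p[i] > p[j]
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))

def major_index(p):
    # sum of the (1-based) descent positions i where p[i] > p[i+1]
    return sum(i + 1 for i in range(len(p) - 1) if p[i] > p[i + 1])

def runs(p):
    # number of maximal ascending runs = number of descents + 1
    return 1 + sum(p[i] > p[i + 1] for i in range(len(p) - 1))

def alpha(p):
    # functional of Sec. III: right-half sum minus left-half sum
    # (the middle symbol is skipped for odd length)
    m = len(p) // 2
    return sum(p[len(p) - m:]) - sum(p[:m])

# itertools.permutations emits S_3 in lexicographic order, as in Table 1
for i, p in enumerate(permutations((1, 2, 3)), start=1):
    sgn = (-1) ** inversions(p)
    print(i, "".join(map(str, p)), sgn, inversions(p), major_index(p), runs(p), alpha(p))
```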

Consider the lattice $L_n$ formed by the cartesian product $S_n\times S_n$. An ordinal pattern [3], which we denote here by $d_\tau=\langle\tau(1),\tau(2),\ldots,\tau(n)\rangle$ or simply $d$, is a directed graph in $L_n$ that connects ordered pairs that are consecutive in their first index, $(i,\tau(i))\rightarrow(i+1,\tau(i+1))$, $i=1,2,\ldots,n$; hence $d: S\times S\rightarrow S_n$. The angle brackets $\langle\cdot\rangle$ explicitly indicate that the elements of the $n$-tuple $(\tau(1),\tau(2),\ldots,\tau(n))$ are connected consecutively, forming a graph.

For instance, the graph $(1,1)\rightarrow(2,2)$, or $\langle 1,2\rangle$, in the $L_2$ lattice is an ordinal pattern: since $\tau(1)=1,\ \tau(2)=2$, the diagram $d_{\mathrm{id}}=\langle 1,2\rangle$ corresponds to the identity permutation in $S_2$, $\tau_{\mathrm{id}}=12$. The graph $(1,1)\rightarrow(2,1)=\langle 1,1\rangle$ is not a valid diagram, since the word $11$ is not a permutation, i.e. $11\notin S_2=\{12,21\}$. The set of all diagrams is denoted by $D_n=\{d: d=\langle\tau(1),\tau(2),\ldots,\tau(n)\rangle\}$. Tables 2 and 3 display the permutations and their corresponding graph sets $D_3$ and $D_4$, respectively.

$\tau$  $\alpha(\tau)$    $\tau$  $\alpha(\tau)$    $\tau$  $\alpha(\tau)$
123  2    213  1    312  -1
132  1    231  -1    321  -2
Table 2: Permutations $\tau$, $\alpha(\tau)$, and the associated ordinal patterns of length $n=3$ (pattern diagrams not reproduced here).
Table 3: Permutations $\tau$, the functional $\alpha(\tau)$ (defined in Sec. III), and the associated ordinal patterns of length $n=4$ (pattern diagrams not reproduced here). Reflecting the patterns in the first and second columns across a horizontal axis yields the diagrams in the second-to-last and last columns, respectively, with the same absolute values of the characteristic for the reflected diagrams/permutations but negative sign. This is a basic property of $\alpha(\tau)$ that is also reflected statistically.
$\tau$  $\alpha(\tau)$    $\tau$  $\alpha(\tau)$    $\tau$  $\alpha(\tau)$    $\tau$  $\alpha(\tau)$
1234  4    2134  4    3124  2    4123  0
1243  4    2143  4    3142  2    4132  0
1324  2    2314  0    3214  0    4213  -2
1342  2    2341  0    3241  0    4231  -2
1423  0    2413  -2    3412  -4    4312  -4
1432  0    2431  -2    3421  -4    4321  -4

II.1 Reflections of diagrams

It is instructive to explore the symmetries of the ordinal patterns, since they are reflected in the statistical properties of the permutations, as we explain in Sec. III when inspecting the properties of the distribution of the statistic introduced there.

Vertical reflections

A vertical reflection $v(d)$ consists of flipping a diagram along a vertical axis and relabeling the nodes (ordered pairs) accordingly. It is a bijection $v: D_n\rightarrow D_n$ that acts on ordered pairs as

$v(i,\tau_i)=(v(i),\tau_i)$,   (1)

i.e. it leaves the second coordinate invariant. The action of $v$ over $[n]$ is explicitly

$v(1)=n,\ \ v(2)=n-1,\ \ldots,\ \ v(n-1)=2,\ \ v(n)=1$,   (2)

meaning that $v$ simply mirrors the symbols along a vertical axis of reflection to the right of the permutation, as illustrated in Fig. 1. For even permutation length $n$, the effect of $v(\tau)$ as a reflection with respect to an external vertical axis is the same as an internal reflection of the symbols of the word, $v(\tau_1\tau_2\cdots\tau_{n-1}\tau_n)=\tau_n\tau_{n-1}\cdots\tau_2\tau_1$, along an internal axis splitting the permutation into equal parts of $\frac{n}{2}$ symbols. For odd $n$, the permutation is split into equal parts of $m\equiv\lfloor\frac{n}{2}\rfloor$ symbols, but the symbol $\tau_{m+1}$ is left invariant since it becomes the internal axis of reflection itself. As a simple illustration, take a diagram of length $n=6$, as shown schematically in Fig. 1(a). Ranking the points from smallest to largest gives the ranking permutation $\tau=\tau_1\tau_2\cdots\tau_n$, corresponding to the permuted indices of the vector of ranks $\mathrm{r}(x_{t_1},x_{t_2},\ldots,x_{t_n})=(x_{\tau_1},x_{\tau_2},\ldots,x_{\tau_n})$ relative to the original positions $t_1,t_2,\ldots,t_n$. Hence, in the example of Fig. 1(a), $v(\langle 1,2,5:4,6,3\rangle)=\langle 3,6,4:5,2,1\rangle$, where we put a colon instead of a comma in the middle only to highlight the internal axis of symmetry of the diagram.

Figure 1: Effect of vertical and horizontal reflections on diagrams $d_1=\langle 1,2,5,4,6,3\rangle$ and $d_2=\langle 3,6,4,5,2,1\rangle$, respectively (see Sec. III).

Horizontal reflections

A horizontal reflection $h(d)$ consists of flipping a diagram along a horizontal axis. Its action on ordered pairs is

$h(i,\tau(i))=(i,h(\tau_i))$,   (3)

meaning that $h$ acts directly on the permutation symbols. Its explicit action corresponds to the involution

$\langle\tau_1,\tau_2,\ldots,\tau_n\rangle\rightarrow\langle n+1-\tau_1,n+1-\tau_2,\ldots,n+1-\tau_n\rangle$.   (4)
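Both reflections act directly on the word of a pattern, so they can be sketched in a few lines (tuple representation and function names assumed here for illustration):

```python
def v(pattern):
    """Vertical reflection (Eq. 2): mirror the positions, i.e. reverse
    the word tau_1...tau_n into tau_n...tau_1."""
    return tuple(reversed(pattern))

def h(pattern):
    """Horizontal reflection (Eq. 4): replace each symbol k by n+1-k."""
    n = len(pattern)
    return tuple(n + 1 - s for s in pattern)
```

Both maps are involutions, `v(v(d)) == d` and `h(h(d)) == d`, and the example of Fig. 1 is recovered as `v((1, 2, 5, 4, 6, 3)) == (3, 6, 4, 5, 2, 1)`.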

III A functional over permutations

All of the permutation statistics displayed in Table 1 reflect different symmetries in $S_n$. The sign of the permutation is arguably the most basic statistic, telling explicitly whether the number of transpositions of symbols needed to bring a permutation $\tau$ to the identity $\tau_{\mathrm{id}}=123\cdots n$ is even ($\operatorname{sgn}(\tau)=+1$) or odd ($\operatorname{sgn}(\tau)=-1$). The inversion number counts the pairs of positions $i<j$ for which the symbol at $i$ is larger than the one at $j$. Summing up the positions $i$ at which descents $\tau(i)>\tau(i+1)$ occur yields the major index $\mathrm{maj}(\tau)=\sum_{\tau(i)>\tau(i+1)} i$.

Below we introduce a new statistic of permutations that accounts for an imbalance in weight when the symbols $[n]$ composing each permutation $\tau\in S_n$ are considered as a collection of weights. The objective of defining and characterizing this new statistic is twofold. On the one hand, there is the intrinsic interest in new insights into the study of the symmetric group and their implications for other areas such as dynamical systems and stochastic processes. On the other hand, there is a practical interest in the analysis of time series, specifically in connection with ordinal pattern analysis.

The construction of the set of ordinal patterns from a realization of any process $X_t$ consists of mapping portions of the corresponding time series $\{x_t\}_{t\in I}$ (with $I=[L]=\{1,2,\ldots,L\}$ an index set) to the diagrams of Sec. II by sliding a window of size $n$ over $\{x_t\}_{t\in I}$. We can inspect larger portions of the series without increasing $n$ by skipping $l-1$ points after every choice on each window; the number $l$ is correspondingly known as the lag. For instance, a lag of $l=1$ means that we do not skip any point and slide the window without gaps. The windows of $n$ points are called delay or lagged vectors, and the process of constructing these vectors is known as an embedding [2]. Correspondingly, the natural number $n$ indicating the window size is called the embedding dimension. This terminology comes from the embedding theorems that form the basis of the state space reconstruction methods from scalar time series, known in general as time delay embeddings [2]. Hence, for the ordinal pattern analysis, we first embed the time series for a chosen dimension $n$ and lag $l$. This yields a total of $N=L-(n-1)l$ lagged vectors. We then rank the amplitudes $\{x_1,x_2,\ldots,x_n\}$ of every lagged vector according to their magnitude, obtaining the ranked vectors $\{x_{\tau(1)},x_{\tau(2)},\ldots,x_{\tau(n)}\}$. The sequences of indices in these ranked vectors are the ranking permutations $\tau=\tau_1\tau_2\cdots\tau_n$, where $\tau_i=\tau(i)$. Finally, these permutations have associated ordinal patterns, as we saw in Sec. II. In this way the local information on the relative ordering is preserved, and consequently also the relevant information on the correlation structure of the series.
However, a simple but careful inspection of this mapping makes evident that the relative ordering is not the only information preserved. In fact, if we interpret the ranks as abstract weights assigned to the amplitudes of the process, then it is clear that information on the overall variation within portions of these windows of $n$ data points is also preserved. Questions such as the weight accumulated in sub-intervals within each bin of size $n$ arise naturally from this observation. An equally natural choice of sub-interval for analysis within the $n$-window is half the bin, so as to build a measure that estimates the tendency of the process to have locally increasing, decreasing or close to constant behavior. This is in close analogy to the concept of derivative, but discarding the information on the specific amplitudes that the process takes. In order to study this asymmetry in the total weight concentrated on each half of a length-$n$ window, we provide the following

Definition 1 (Functional $\alpha(\tau)$). For every permutation $\tau\in S_n$, $n$ a non-negative integer, define the functional

\alpha(\tau)=\begin{cases}\sum_{i=m+2}^{n}\tau(i)-\sum_{i=1}^{m}\tau(i), & n\ \mathrm{odd}\\ \sum_{i=m+1}^{n}\tau(i)-\sum_{i=1}^{m}\tau(i), & n\ \mathrm{even},\end{cases}   (5)

where $m\equiv\lfloor\frac{n}{2}\rfloor$.
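Definition 1 translates directly into code; a minimal sketch (0-based indexing shifts the summation bounds of Eq. (5) accordingly):

```python
def alpha(tau):
    """Functional of Definition 1, Eq. (5). `tau` is a permutation of
    1..n given as a sequence; for odd n the middle symbol tau[m]
    (0-based), i.e. the (m+1)-th symbol, is excluded from both sums."""
    n = len(tau)
    m = n // 2
    if n % 2:  # odd n: skip the middle symbol
        return sum(tau[m + 1:]) - sum(tau[:m])
    return sum(tau[m:]) - sum(tau[:m])
```

For even $n$ the identity permutation attains the maximum, $\alpha(\tau_{\mathrm{id}})=m^2=\alpha_M$, and its reversal attains $-\alpha_M$, in line with the bounds derived below.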

The functional thus defined is invariant under a shift of the sequence $[n]$ by an integer. It is also invariant under monotonic transformations of the process $X_t$, since these do not affect the relative ordering. The case of odd $n$ gives mixed statistical behavior, since the middle permutation symbol $\tau_{m+1}$ is ignored in the computation of $\alpha(\tau)$: for a given permutation length $n$, when the ignored symbol happens to be $\tau_{m+1}=n$, there are $(n-1)!$ permutations that display the same statistics as in the full problem for permutation length $n-1$. By the invariance of $\alpha(\tau)$ under a shift of $\{1,2,\ldots,n\}$ by an integer, when the middle symbol is $\tau_{m+1}=1$ we have the same situation as before. Other ignored symbols produce different and more complicated effects. Therefore, we restrict ourselves here to the case of even $n$, which reduces Eq. (5) to

\alpha(\tau)=\sum_{k=1}^{m}\left[\tau(k+m)-\tau(m-k+1)\right],\ \ m=\frac{n}{2}.   (6)

Hence, $\alpha(\tau)$ splits $\tau$ into the equally sized intervals $L=[\tau(1),\tau(m)]$ and $R=[\tau(m+1),\tau(n)]$ with partial sums $s_l=\sum_{i=1}^{m}\tau(i)$ and $s_r=\sum_{i=m+1}^{n}\tau(i)$, respectively. We get $\alpha(\tau)$ by summing up the symbols on each half and subtracting: $\alpha=s_r-s_l$. Notice that $s_{l,\mathrm{min}}=\frac{m(m+1)}{2}$ and $s_{l,\mathrm{max}}=\frac{n(n+1)}{2}-\frac{m(m+1)}{2}=\frac{3m^2+m}{2}$. This means that $\alpha_{\mathrm{max}}\equiv\alpha_M=m^2$, as is the case, for instance, for the identity permutation: $\alpha(\tau_{\mathrm{id}})=\alpha_M$. By symmetry, using the reflections in Eqs. (2) and (4), the minimum value of $\alpha$ is $\alpha(h(\tau_{\mathrm{id}}))=\alpha(v(\tau_{\mathrm{id}}))=-\alpha_M$. Hence $\alpha(\tau)$ is a bounded variable for finite $n$.

The definition of $\alpha(\tau)$ allows for an obvious source of degeneracy or multiplicity, since the order of the summands in each half-sum $s_l,s_r$ is irrelevant. Let us denote the number of permutations that share the same value of $\alpha(\tau)$ by $\phi(\alpha)$. For every distinct value of $\alpha(\tau)$, there are (by shuffling the terms in each sum $s_l,s_r$) at least $m!\cdot m!=(m!)^2\equiv\eta$ reorderings that share the same value, hence $\phi(\alpha)\geq\eta$. For instance, for $\tau_{\mathrm{id}}$ we get $\alpha(\tau_{\mathrm{id}})=\alpha_M$ as above, which has the least degeneracy, $\phi(\alpha_M)=\eta$. The next possible value of $\alpha(\tau)$ can be obtained from $\tau_{\mathrm{id}}=123\cdots n$ by switching $m$, the largest symbol in $L$, with the smallest symbol in $R$, which is $m+1$. This increases the left partial sum by one unit, $s_l\rightarrow s_l+1$, while decreasing the right partial sum by one unit, $s_r\rightarrow s_r-1$, with the net effect of decreasing $\alpha(\tau_{\mathrm{id}})$ by two units, $\alpha(\tau_{\mathrm{id}})\rightarrow\alpha(\tau_{\mathrm{id}})-2$; hence the next value is $\alpha'=\alpha_M-2$. Proceeding in this way, we get the set of all possible $\alpha$-values over $S_n$ (for even $n$)

A_n=\{-\alpha_M,-\alpha_M+2,-\alpha_M+4,\ldots,\alpha_M-4,\alpha_M-2,\alpha_M\}   (7)

whose size is $|A_n|=m^2+1$. Notice that if $m^2\equiv 1\ (\mathrm{mod}\ 2)$, then $\min_\tau\{|\alpha(\tau)|:\tau\in S_n\}=1$, while $\min_\tau\{|\alpha(\tau)|:\tau\in S_n\}=0$ for $m^2\equiv 0\ (\mathrm{mod}\ 2)$, a difference that becomes irrelevant as $n\rightarrow\infty$. It is not difficult to see that the degeneracies come in multiples of $\eta$, i.e. $\phi(\alpha=\alpha_i)=a_i\eta$, with $\alpha_i=\pm\alpha_M\mp 2i$, $i=0,1,2,\ldots,m^2$. The coefficients $a_i(n)$ for $i\in[1,\lfloor\frac{m^2}{2}\rfloor+1]$ form the sequences

a_i(2)  = 1, 1
a_i(4)  = 1, 1, 2
a_i(6)  = 1, 1, 2, 3, 3
a_i(8)  = 1, 1, 2, 3, 5, 5, 7, 7, 8
a_i(10) = 1, 1, 2, 3, 5, 7, 9, 11, 14, 16, 18, 19, 20
a_i(12) = 1, 1, 2, 3, 5, 7, 11, 13, 18, 22, 28, 32, 39, 42, 48, 51, 55, 55, 58
a_i(14) = 1, 1, 2, 3, 5, 7, 11, 15, 20, 26, 34, 42, 53, 63, 75, 87, 100, 112, 125, 136, 146, 155, 162, 166, 169   (8)
⋮

that are mirrored for $i\in[\lfloor\frac{m^2}{2}\rfloor+2,\ m^2+1]$. The sequences in Eq. (8) were found by direct numerical computation, but in the following we obtain them analytically from the observations made in the construction of the set $A_n$ and the definition of $\alpha(\tau)$ in Eq. (6).
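The numerical computation just mentioned amounts to tallying $\phi(\alpha)$ over $S_n$ and dividing out $\eta=(m!)^2$; a brute-force sketch for small even $n$ (function name chosen here):

```python
import math
from itertools import permutations
from collections import Counter

def a_coefficients(n):
    """Coefficients a_i = phi(alpha_i) / (m!)^2 for even n, listed for
    alpha_i = m^2 - 2i, i = 0, ..., m^2, by enumerating S_n."""
    m = n // 2
    eta = math.factorial(m) ** 2
    # tally phi(alpha) using Eq. (6): right-half sum minus left-half sum
    phi = Counter(sum(tau[m:]) - sum(tau[:m])
                  for tau in permutations(range(1, n + 1)))
    return [phi[m * m - 2 * i] // eta for i in range(m * m + 1)]

print(a_coefficients(6))
```

For $n=6$ this returns `[1, 1, 2, 3, 3, 3, 3, 2, 1, 1]`, i.e. the $a_i(6)$ row of Eq. (8) followed by its mirror image.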

It is clear that $\sum_i\phi(\alpha_i)=\eta\sum_i a_i=n!$, therefore,

\sum_{i=0}^{m^2} a_i=\binom{n}{m},\ \ m=\frac{n}{2}.   (9)

Thus, the central binomial coefficient $\binom{n}{m}$ counts the number of ways of computing $\alpha(\tau)$ in a nontrivial way. This is because $\binom{n}{m}$ also counts the number of different ways to assign a $+$ sign to $m=n/2$ out of $n$ symbols and a $-$ sign to the remaining $n-m=n/2$ symbols, which is precisely the problem of computing $\alpha(\tau)$ without counting the shuffling of the terms in the sums. Yet, Eq. (9) tells us the total number of different ways to compute $\alpha(\tau)$, not the values of the individual coefficients $a_i$. To find the nontrivial multiplicities $a_i$, we notice the equivalence between our problem and the combinatorial problem of sums over subsets. The multiplicity $a_i$ corresponds to the number of permutations $\tau\in S_n$ with $\alpha(\tau)=\alpha_i$, counted up to the trivial reorderings within each half. This is equivalent to the problem of finding the number of subsets of $m=n/2$ elements out of $n$ total elements such that the sum of the $m$-element subset is fixed; this, in turn, fixes the sum of the $n-m$ remaining elements. These sums are precisely the $s_l,s_r$ discussed before. Since fixing either of the sums fixes the value of $\alpha(\tau)$, we have thus rephrased the computation of our functional as a combinatorial problem with an elegant solution, stated in the form of the following

Theorem (Bóna 2012 [10]). Let $n$ and $k$ be fixed non-negative integers with $k\leq n$. Let $b_i$ denote the number of $k$-element subsets of $[n]$ whose elements have sum $\binom{k+1}{2}+i$, that is, $i$ larger than the minimum. Then we have

{\mathbf{n}\brack\mathbf{k}}:=\sum_{i=0}^{k(n-k)} b_i q^i.   (10)

In other words, ${\mathbf{n}\brack\mathbf{k}}$ is the ordinary generating function of the $k$-element subsets of $[n]$ according to the sum of their elements.

The object in Eq. (10) belongs to a special kind of polynomials known as Gaussian polynomials or Gaussian binomial coefficients [10], regarded as a generalization of the binomial coefficients and defined as

{\mathbf{n}\brack\mathbf{k}}=\frac{[\mathbf{n}]!}{[\mathbf{n-k}]!\,[\mathbf{k}]!}=\frac{(1-q^n)(1-q^{n-1})\cdots(1-q^{n-k+1})}{(1-q)(1-q^2)\cdots(1-q^k)},   (11)

which are polynomials of the form in Eq. (10), of degree $k(n-k)$ and with symmetric coefficients $b_i=b_{k(n-k)-i}$. The symbol $[\mathbf{k}]$ is defined as

[\mathbf{k}]=\frac{1-q^k}{1-q}=1+q+q^2+\cdots+q^{k-1},\ \ q\neq 1,   (12)

also a polynomial, which reduces to $[\mathbf{k}]=k$ in the limit $q\rightarrow 1$. In the same limit, we also recover $\binom{n}{k}$ from Eq. (11). The polynomial $[\mathbf{k}]$ in $q$ is called a $q$-analog of $k$, a natural choice for generalizing a representation of nonnegative integers $k$ parametrized by $q$. The generalization of the factorial as a $q$-analog is the $q$-factorial

[\mathbf{k}]!=[\mathbf{1}][\mathbf{2}][\mathbf{3}]\cdots[\mathbf{k}]=(1+q)(1+q+q^2)\cdots(1+q+q^2+\cdots+q^{k-1}).   (13)
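The coefficient lists of these Gaussian binomials can be generated without symbolic algebra through the standard q-Pascal recursion [n, k]_q = [n-1, k-1]_q + q^k [n-1, k]_q [10]; a sketch (function name chosen here):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def q_binomial(n, k):
    """Coefficient tuple (b_0, ..., b_{k(n-k)}) of the Gaussian
    binomial [n, k]_q, via the q-Pascal recursion
    [n, k]_q = [n-1, k-1]_q + q^k [n-1, k]_q."""
    if k == 0 or k == n:
        return (1,)
    a = q_binomial(n - 1, k - 1)
    b = (0,) * k + q_binomial(n - 1, k)  # prefix of zeros = multiply by q^k
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return tuple(x + y for x, y in zip(a, b))
```

Setting $q=1$ recovers the ordinary binomial coefficient, `sum(q_binomial(n, k)) == C(n, k)`, and `q_binomial(4, 2)` gives `(1, 1, 2, 1, 1)`, i.e. the Gaussian polynomial $1+q+2q^2+q^3+q^4$.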

Our main result, the computation of the numbers $a_i(n)$ in Eq. (8) that give the nontrivial contribution to $\phi(\alpha_i)=a_i\eta$, reduces to a corollary of the Theorem in Eq. (10) for the particular choice $k=m=n/2$. In fact, for $k=m=n/2$ we recover a generating function for our sequence $a_i$:

G_n(q)=\sum_{i=0}^{m^2} a_i q^i,\ \ a_i=a_{m^2-i}>0,\ \ q\neq 1,   (14)

with $G_n(1)=\binom{n}{m}$. Therefore

a_i=[q^i]{\mathbf{n}\brack\mathbf{m}}=[q^i]G_n(q),   (15)

where the notation $[q^i]P(q)$ denotes the coefficient of the $i$-th power of $q$ in the polynomial $P(q)$. We can compute statistical properties of a sequence from its generating function with relative ease [12]. If we sample a permutation $\tau\in S_n$ at random and assign the random variable $\chi$ to the associated value of $\alpha(\tau)$, the probability of observing the value $\alpha_i$ is given by the ratio

P_{\chi_n}(\alpha_i)=\frac{a_i}{\sum_{j=0}^{m^2} a_j}=\frac{[q^i]{\mathbf{n}\brack\mathbf{m}}}{\binom{n}{m}},   (16)

hence $P_{\chi_n}(\alpha)$ is a probability mass function with support on $A_n$ (Eq. (7)), and Eq. (9) is the normalization condition $\sum_i P_{\chi_n}(\alpha_i)=1$. The associated probability generating function (p.g.f.) is

G_{\chi_n}(q) = \sum_{0\leq i\leq m^2} p_i q^i   (17)
            = \frac{G_n(q)}{G_n(1)}   (18)
            = \frac{{\mathbf{n}\brack\mathbf{m}}}{\binom{n}{m}},   (19)
where $p_i=P_{\chi_n}(\alpha_i)$. From the p.g.f. in Eq. (19) we can, at least in principle, compute any desired moment of $\chi$ by differentiation of $G_{\chi_n}(q)$ with respect to $q$. However, the derivatives $G'_{\chi_n}(q)$ are more cumbersome than illuminating and we will not present them here.
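As a consistency check that avoids differentiating the p.g.f., the low-order moments of $\chi$ can be obtained by direct enumeration over $S_n$ for small even $n$ (a brute-force sketch, with names chosen here):

```python
from itertools import permutations

def chi_moments(n):
    """Exact mean and variance of chi over S_n (even n), by listing
    alpha(tau) of Eq. (6) for every permutation and averaging."""
    m = n // 2
    values = [sum(tau[m:]) - sum(tau[:m])
              for tau in permutations(range(1, n + 1))]
    mean = sum(values) / len(values)
    var = sum(v * v for v in values) / len(values) - mean ** 2
    return mean, var
```

By the reflection symmetry of Sec. II.1 the mean is exactly zero for every $n$; for $n=4$, for instance, the variance comes out as $20/3$.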

By applying the functional $\alpha(\tau)$ defined in Eq. (6) to the set of permutations obtained from embedding a stochastic process $X_t$ as explained previously, we get an associated stochastic process in terms of $\alpha$. The functional $\alpha(\tau)$ can be applied to any process, but here we limit its use to the analysis of white noise processes.

A white noise process $X_t$ is a continuous- or discrete-time stochastic process with the following properties:

E[X_t]=0\ \ \mathrm{and}\ \ \mathrm{Var}[X_t]=\sigma^2\ \ \mathrm{for\ all}\ t\in\Omega,   (20)
E[X_t X_s]=\sigma^2\delta(t-s)\ \ \mathrm{for\ all}\ s,t\in\Omega,   (21)

where $0<\sigma^2<\infty$ and $\Omega$ is the set in which $s,t$ take values, for instance $\Omega=\mathbb{R}$ for a continuous-time process. Of these properties, the zero-mean and finite-variance conditions are irrelevant for the application of $\alpha(\tau)$, since they merely carry information about the scale of the process, to which our functional is blind.

The values of $\alpha$ over an embedding of a process are well defined as long as the source process $X_t$ has continuous support (later we will see that this condition can sometimes be relaxed), so that there is zero probability of observing repeated amplitudes of the process $X_t$. This means that the ranking permutations are well defined. Of course, real observations have finite resolution, but even then, repetition of amplitudes from a stochastic process with continuous support is expected to be highly unlikely, at least in finite time. Considering these facts, we arrive at the following

Definition 2 Induced χ\chi-process and α\alpha-series. For every choice of non-negative integers nn, mm and ll (with n=2mn=2m) and every white noise process XtX_{t}, the functional in Eq. (6) induces the discrete-time process χk\chi_{k} with discrete support AnA_{n} (7). Correspondingly, we denote a realization or time series of the process χk\chi_{k} as

{αk}k[N],\{\alpha_{k}\}_{k\in[N]}, (22)

where [N]={1,2,,N}[N]=\{1,2,\ldots,N\}, LL is the length of an observed realization of XtX_{t}, and N=L(n1)lN=L-(n-1)l is the number of delay vectors in an embedding with fixed values nn and ll.
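As a concrete illustration of Definition 2, the embedding and the induced α\alpha-series can be sketched in a few lines of code. This is a minimal sketch, not a reference implementation; it assumes, consistently with the worked example later in this section (⟨4,1,3,2,5,6⟩ ↦ 5) and with the variance formula of Sec. IV, that α(τ)\alpha(\tau) is the sum of the right half of the ranking permutation minus the sum of its left half; all function names are illustrative.

```python
def ranking_permutation(window):
    """Rank the entries of the window: the smallest value gets rank 1."""
    order = sorted(range(len(window)), key=lambda i: window[i])
    ranks = [0] * len(window)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def alpha(tau):
    """Asymmetry functional: right-half sum minus left-half sum."""
    m = len(tau) // 2
    return sum(tau[m:]) - sum(tau[:m])

def alpha_series(x, n, l=1):
    """Embed the series x with window length n and lag l, then apply alpha.
    Returns N = L - (n - 1) l values, one per delay vector."""
    N = len(x) - (n - 1) * l
    return [alpha(ranking_permutation(x[k:k + (n - 1) * l + 1:l]))
            for k in range(N)]
```

For n = 6 the permutation ⟨4,1,3,2,5,6⟩ maps to (2+5+6) − (4+1+3) = 5, matching the worked example below.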

In general, for arbitrary values of nn and ll, the process χk\chi_{k} and, consequently, {αk}k[N]\{\alpha_{k}\}_{k\in[N]} are correlated due to the overlap of the embedding vectors, which is in turn reflected in a sequence of permutations that are not independent. Nevertheless, the process can become effectively uncorrelated for some values of ll, or at large values of both nn and ll. In order to visualize how the p.m.f. Pχn(αi)P_{\chi_{n}}(\alpha_{i}) changes as nn increases, let us plot Eq. (16) for different nn values. To facilitate comparisons, we make use of the standardized variable Zα=(χμχ)/σχZ_{\alpha}=(\chi-\mu_{\chi})/\sigma_{\chi}. Since μχ=0\mu_{\chi}=0, we have the simple quotient

Zα=χσχ.Z_{\alpha}=\frac{\chi}{\sigma_{\chi}}. (23)

We introduce the notation ZαZ_{\alpha} to refer to the standardized χ\chi random variable. We will denote its realizations by αz\alpha_{z} and the corresponding p.m.f. by PZα(αz)P_{Z_{\alpha}}(\alpha_{z}), or simply P(αz)P(\alpha_{z}), with moments μZαE[Zα]\mu_{Z_{\alpha}}\equiv E[Z_{\alpha}], σZα2Var[Zα]\sigma^{2}_{Z_{\alpha}}\equiv\mathrm{Var}[Z_{\alpha}], and associated α\alpha-process {αz,k}k[N]\{\alpha_{z,k}\}_{k\in[N]} (see (22)). We plot P(αz)P(\alpha_{z}) for n=4,8,16,32n=4,8,16,32 in Fig. 2.

Figure 2: Standardized probability mass function P(αz)P(\alpha_{z}) for n=4,8,16,32n=4,8,16,32 represented by blue bars in panels (a),(b),(c) and (d) respectively, computed from Eqs. (15) and (19) using a recursive relation between Gaussian binomial coefficients. The continuous blue curve on each panel is a superimposed standard Gaussian (μ=μχ=0\mu=\mu_{\chi}=0, σ=1\sigma=1).

Let us come back briefly to the reflections of diagrams introduced in Sec. II to add understanding to Pχ(α)P_{\chi}(\alpha). As illustrated in Fig. 1, reflecting the diagrams changes the corresponding values of α\alpha only up to sign. This means that for every dDnd\in D_{n} the compositions of hh and vv fulfill

α[(hv)(d)]=α[(vh)(d)],\alpha[(h\circ v)(d)]=\alpha[(v\circ h)(d)], (24)

and we will denote them simply by hv()hv(\cdot) and vh()vh(\cdot) in the following. Since in general hv(d)vh(d)hv(d)\neq vh(d), this is the source of the degeneracy ϕ(α)\phi(\alpha) that is not accounted for by simple permutation of the terms of the partial sums sl,srs_{l},s_{r}. An illustration of Eq. (24) is seen in Fig. 1. We get diagrams that are different in a nontrivial permutational way but with the same value of α(τ)\alpha(\tau) via the composition hvhv, as illustrated there by going from d1=1,2,5,4,6,3d_{1}=\langle 1,2,5,4,6,3\rangle to d2=4,1,3,2,5,6d_{2}=\langle 4,1,3,2,5,6\rangle:

α(d1)=α[v(d1)]=α[hv(d1)]=α[d2]=5.\alpha(d_{1})=-\alpha[v(d_{1})]=\alpha[hv(d_{1})]=\alpha[d_{2}]=5.

The symmetry of the coefficients aia_{i} is thus equivalent to the reflection symmetry in Eq. (24).

IV Continuous approximations and sufficient statistics

By direct computation of the variance σχn2=0im2αi2pi\sigma_{\chi_{n}}^{2}=\sum_{0\leq i\leq m^{2}}\alpha_{i}^{2}p_{i} for successive nn values we arrive at the formula

σχ2=m2(2m+1)3\sigma_{\chi}^{2}=\frac{m^{2}(2m+1)}{3} (25)

which in the limit mm\rightarrow\infty behaves as

σχ2=23m3\sigma_{\chi}^{2}=\frac{2}{3}m^{3} (26)
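Formula (25) can be verified by brute-force enumeration over the full symmetric group for small nn. The sketch below assumes the same interpretation of α\alpha as in the worked example of Sec. III (right-half minus left-half sum of the permutation); it is a numerical check, not the derivation.

```python
from itertools import permutations

def alpha(tau):
    # right-half sum minus left-half sum of the permutation
    m = len(tau) // 2
    return sum(tau[m:]) - sum(tau[:m])

def exact_variance(n):
    """Variance of alpha over all of S_n (the mean vanishes by symmetry)."""
    vals = [alpha(p) for p in permutations(range(1, n + 1))]
    return sum(a * a for a in vals) / len(vals)
```

For n = 6 (m = 3), the enumeration over all 720 permutations gives 21, in agreement with m²(2m+1)/3 = 9·7/3 = 21.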

As previously noticed, the probability mass function Pχ(α)P_{\chi}(\alpha) converges to a Gaussian as nn increases. However, as can be noticed from Fig. 2, for low values of nn the shape of the distribution differs from a Gaussian, especially around its center. We discuss this case in Appendix A.
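The rate of this convergence can be probed directly for moderate nn. The sketch below (same assumed form of α\alpha: right-half minus left-half sum) enumerates S8S_{8} and compares the exact probability of the central class α=0\alpha=0 with the Gaussian density of variance m2(2m+1)/3m^{2}(2m+1)/3, multiplied by the support spacing of 2.

```python
from itertools import permutations
from math import pi, sqrt

def alpha(tau):
    m = len(tau) // 2
    return sum(tau[m:]) - sum(tau[:m])

n, m = 8, 4
counts = {}
for p in permutations(range(1, n + 1)):
    a = alpha(p)
    counts[a] = counts.get(a, 0) + 1

p_exact = counts[0] / sum(counts.values())   # exact P(alpha = 0) over S_8
sigma = sqrt(m * m * (2 * m + 1) / 3)        # Eq. (25): sigma^2 = 48
p_gauss = 2 / (sigma * sqrt(2 * pi))         # Gaussian density at 0, spacing 2
```

Already at n = 8 the two numbers agree to better than 1%: p_exact = 8/70 ≈ 0.1143 versus p_gauss ≈ 0.1152; the symmetry counts[a] = counts[−a] reflects the reflection property of Eq. (24).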

One-parameter exponential (Darmois-Koopman-Pitman) families such as the normal distribution arise naturally from optimizing the Shannon entropy of P(χ)P(\chi) under the normalization, first-moment (μχ=0\mu_{\chi}=0) and second-moment (σχ2=23m3\sigma_{\chi}^{2}=\frac{2}{3}m^{3}) constraints [13]

L[p]\displaystyle L[p] =\displaystyle= ipilogpiβ0(ipi1)\displaystyle-\sum_{i}p_{i}\log p_{i}-\beta_{0}(\sum_{i}p_{i}-1)
β1(iαipiμχ)β2(iαi2piσχ2)\displaystyle-\beta_{1}(\sum_{i}\alpha_{i}p_{i}-\mu_{\chi})-\beta_{2}(\sum_{i}\alpha_{i}^{2}p_{i}-\sigma_{\chi}^{2})

Optimization yields pi=exp(1β0β1αiβ2αi2)p_{i}=\exp(-1-\beta_{0}-\beta_{1}\alpha_{i}-\beta_{2}\alpha_{i}^{2}). Using the normalization condition, together with μχ=0\mu_{\chi}=0, we get

pi\displaystyle p_{i} =\displaystyle= eβαi2jeβαj2\displaystyle\frac{e^{-\beta\alpha_{i}^{2}}}{\sum_{j}e^{-\beta\alpha_{j}^{2}}}
=\displaystyle= aijaj\displaystyle\frac{a_{i}}{\sum_{j}a_{j}}

with β=β2\beta=\beta_{2}. We find β\beta by making a direct identification with the Gaussian p.d.f.

f(x;μ,σ)=1σ2πe(xμ)22σ2f(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} (29)

with σ=σχn\sigma=\sigma_{\chi_{n}} in formula (26), μ=μχn=0\mu=\mu_{\chi_{n}}=0 and jPχn(αj)=1\sum_{j}P_{\chi_{n}}(\alpha_{j})=1 in order to get the correct normalization factor. This yields β=34m3\beta=\frac{3}{4}m^{-3} and thus an explicit continuous approximation of our p.m.f. for large nn as

Pχn(αi)=3πm32exp(3αi24m3)P_{\chi_{n}}(\alpha_{i})=\sqrt{\frac{3}{\pi}}m^{-\frac{3}{2}}\exp\left(-\frac{3\alpha_{i}^{2}}{4m^{3}}\right) (30)

where αi2=(4i24m2i+m4)\alpha_{i}^{2}=(4i^{2}-4m^{2}i+m^{4}) and m=n/2m=n/2. A convenient form for finding the natural sufficient statistics of our distribution is obtained by dropping the index ii, i.e., αi=α(τ)\alpha_{i}=\alpha(\tau), so that χ\chi is now seen as a continuous variable. Allowing the mean back into the expression, we get

fχ(α;μχ,σχ)=1σχ2πexp((αμχ)22σχ2),f_{\chi}(\alpha;\mu_{\chi},\sigma_{\chi})=\frac{1}{\sigma_{\chi}\sqrt{2\pi}}\exp\left(-\frac{(\alpha-\mu_{\chi})^{2}}{2\sigma_{\chi}^{2}}\right), (31)

thus fχ(α;μχ,σχ)f_{\chi}(\alpha;\mu_{\chi},\sigma_{\chi}) is a probability density function that approximates Pχn(αi)P_{\chi_{n}}(\alpha_{i}) for large nn and allows for the possibility of a change in location and scale. More importantly, Eq. (31) indicates directly that the natural sufficient statistics for estimating the mean and variance from a sample χ1,χ2,,χN\chi_{1},\chi_{2},\ldots,\chi_{N} are given by the sample mean and variance

α¯\displaystyle\bar{\alpha} =\displaystyle= 1NτTLα(τ)\displaystyle\frac{1}{N}\sum_{\tau\in T_{L}}\alpha(\tau) (32)
sα2\displaystyle s_{\alpha}^{2}	=\displaystyle=	1N1τTL(α(τ)α¯)2\displaystyle\frac{1}{N-1}\sum_{\tau\in T_{L}}(\alpha(\tau)-\bar{\alpha})^{2} (33)

where TLT_{L} is the set of permutations corresponding to the ordinal patterns observed in a time series of length LL and |TL|=N=L(n1)l|T_{L}|=N=L-(n-1)l is the number of patterns observed in that realization. The sample mean α¯\bar{\alpha} is of special interest for statistical analysis, as we will show in the next section.
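The estimators (32) and (33) are straightforward to evaluate on a simulated white noise realization. The following sketch (again assuming the right-minus-left half-sum form of α\alpha; the variable names are illustrative) embeds a uniform white noise series with n = 8, l = 1 and compares the sample statistics with the theoretical values μχ=0\mu_{\chi}=0 and σχ2=m2(2m+1)/3=48\sigma_{\chi}^{2}=m^{2}(2m+1)/3=48.

```python
import random

def alpha_series(x, n, l=1):
    """Ordinal embedding followed by the alpha functional."""
    def alpha_of_window(w):
        order = sorted(range(len(w)), key=lambda i: w[i])
        ranks = [0] * len(w)
        for r, idx in enumerate(order, start=1):
            ranks[idx] = r
        m = len(w) // 2
        return sum(ranks[m:]) - sum(ranks[:m])
    N = len(x) - (n - 1) * l
    return [alpha_of_window(x[k:k + (n - 1) * l + 1:l]) for k in range(N)]

random.seed(7)
series = [random.random() for _ in range(50_000)]    # uniform white noise
alphas = alpha_series(series, n=8)
abar = sum(alphas) / len(alphas)                     # Eq. (32)
s2 = sum((a - abar) ** 2 for a in alphas) / (len(alphas) - 1)   # Eq. (33)
# theory for m = 4: mean 0, variance m^2 (2m+1) / 3 = 48
```

Despite the overlap-induced correlations between consecutive windows, the marginal distribution of each α\alpha value is still Pχ8P_{\chi_{8}}, so both sample statistics land close to their theoretical targets.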

Figure 3: White noise α\alpha-analysis for white noise trajectories of length L=105L=10^{5}, with source distributions (a) Standard Normal, (b) Cauchy(μ=0\mu=0), (c) Exponential(θ=1\theta=1), (d) Poisson(λ=10000\lambda=10000), (e) Continuous Uniform U(0,1)U(0,1) and (f) a deterministic chaotic trajectory from the logistic map in Eq. (34) with r=4r=4. The random trajectory is shown on the top left part of each panel, with the first 164 points displayed. Below these random realizations the first 132 points of the associated α\alpha-process are shown. To the right of each subpanel we find the corresponding normalized histograms for n=32n=32, l=1l=1. Superimposed Gaussian densities with μ=α¯z,σ2=sα,z2\mu=\bar{\alpha}_{z},\ \sigma^{2}=s^{2}_{\alpha,z} are drawn with a blue dashed line. The sample mean α¯z\bar{\alpha}_{z} is indicated by a black vertical line.

V Time series analysis

As an illustration, let us apply our framework as explained at the beginning of Sec. III to white noise time series of length L=105L=10^{5} generated from different distributions: Standard Normal (unbounded support), Cauchy (unbounded support, heavy tails), Exponential (asymmetric distribution, unbounded support), Poisson (discrete unbounded support), Continuous Uniform (bounded support), and the special case of deterministic chaotic trajectories generated from the logistic map

f(x)=rx(1x),x[0,1],r[0,4]f(x)=rx(1-x),\ \ x\in[0,1],\ \ r\in[0,4] (34)

at control parameter value r=4r=4. At this value of rr, the logistic map is ergodic and its invariant density has the closed form ρ(x)=1/πx(1x)\rho(x)=1/\pi\sqrt{x(1-x)}, which is the Beta(1/2,1/2)\mathrm{Beta}(1/2,1/2) distribution. Furthermore, its orbits display exponential decay of correlations [14] and thus are effectively random in the long run. Therefore we can regard long time series obtained from Eq. (34) at r=4r=4 as white noise generated from a Beta distribution as a source, i.e., XBeta(1/2,1/2)X\sim\mathrm{Beta}(1/2,1/2). As discussed in Sec. III, the validity of the theory developed for our functional requires the process XtX_{t} to have a source distribution with continuous support, since a total order is needed for obtaining well-defined ranking permutations, but this is not the case for the Poisson distribution. Nevertheless, here we relax this condition to see that, in a practical situation where the discrete support of XtX_{t} is large enough to make the probability of repeated neighbouring points in time very low, we can expect our method to apply to a good approximation.
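The claim that the r = 4 logistic map acts as a Beta(1/2,1/2)\mathrm{Beta}(1/2,1/2) noise source is easy to check numerically: this distribution has mean 1/2 and variance 1/8, so the time averages of a long orbit should approach these values. A minimal sketch (the initial condition is chosen arbitrarily):

```python
def logistic_orbit(x0, r, length):
    """Iterate f(x) = r x (1 - x)."""
    xs = [x0]
    for _ in range(length - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

orbit = logistic_orbit(0.84291157, 4.0, 50_000)
mean = sum(orbit) / len(orbit)
var = sum((x - mean) ** 2 for x in orbit) / len(orbit)
# Beta(1/2, 1/2) has mean 1/2 and variance 1/8
```

By ergodicity, both time averages converge to the moments of the invariant density.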

For the sake of comparison, we use the standardized variable ZαZ_{\alpha}, whose associated process is {αz,k}k[N]\{\alpha_{z,k}\}_{k\in[N]}. Now let us choose an embedding dimension (sliding window length) of n=32n=32 and a lag of l=1l=1. With this choice we ensure that the shape of Pχ(α)P_{\chi}(\alpha) is well approximated by a Gaussian (see Fig. 2-(d)).

In Fig. 3, we can confirm empirically that the only requirements for obtaining a Gaussian distribution for α(τ)\alpha(\tau) are statistical independence of the observations and continuity of the support of XtX_{t}. These conditions can be relaxed to include processes with sufficiently rapid (i.e. exponential) decay of correlations, as in the case of the deterministic chaotic trajectory, or discrete processes with sufficiently large support.

The usefulness of our analysis is not limited to large values of the embedding dimension. In Fig. 4 we show the distributions P(Zα)P(Z_{\alpha}) for embedding dimensions n=2,4,8n=2,4,8, obtained from a chaotic trajectory of length L=104L=10^{4} from the logistic map with r=4r=4 and initial condition x0=0.84291157x_{0}=0.84291157\ldots . The discrepancies between the distributions for the chaotic trajectory and realizations of uniform white noise come from the inability of the logistic map to display some types of patterns, known as forbidden patterns [15]. For instance, in [15] it is shown that the pattern 3,2,1\langle 3,2,1\rangle, and more generally patterns of the form ,3+k,,2+k,,1+k,\langle*,3+k,*,2+k,*,1+k,*\rangle called outgrowth patterns (where * indicates any other symbol in the pattern), cannot be displayed by orbits of the logistic map with r=4r=4. Although one of the drawbacks of our method is that we are limited to even values of nn, we can still see the effect of the forbidden patterns in the asymmetry of the distributions in Fig. 4, which have less mass on the negative side of the support. This is because the forbidden pattern has a negative value of our functional, α(3,2,1)=2\alpha(\langle 3,2,1\rangle)=-2, and thus patterns that are negative in the α\alpha sense are more likely to belong to the set of outgrowth patterns of 3,2,1\langle 3,2,1\rangle. Correspondingly, an excess of positive patterns is observed, yielding an asymmetric distribution. This makes our analysis potentially useful for detecting signatures of determinism in an observed process by direct comparison with white noise. As can be seen in Fig. 4, this effect is lost for a larger value of the lag due to the increased scale of observation and consequent loss of correlations.
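The forbidden pattern 3,2,1\langle 3,2,1\rangle can be checked directly: at r = 4, if xt>xt+1x_{t}>x_{t+1} then xt>3/4x_{t}>3/4, which forces xt+1<3/4x_{t+1}<3/4 and hence xt+2>xt+1x_{t+2}>x_{t+1}, so a decreasing triple never occurs; in uniform noise, by contrast, it appears in roughly one sixth of the length-3 windows. A quick numerical check (illustrative sketch):

```python
import random

def logistic_orbit(x0, r, length):
    xs = [x0]
    for _ in range(length - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

def decreasing_triples(xs):
    """Occurrences of the ordinal pattern <3,2,1>: x_t > x_{t+1} > x_{t+2}."""
    return sum(1 for a, b, c in zip(xs, xs[1:], xs[2:]) if a > b > c)

random.seed(0)
orbit = logistic_orbit(0.84291157, 4.0, 10_000)
noise = [random.random() for _ in range(10_000)]

n_orbit = decreasing_triples(orbit)   # forbidden pattern: 0 occurrences
n_noise = decreasing_triples(noise)   # roughly (10_000 - 2) / 6
```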

Figure 4: Distributions P(αz)P(\alpha_{z}) for n=2,4,8n=2,4,8 and different ll values, obtained for a chaotic trajectory of the logistic map with r=4r=4 (blue bars), and white noise (red bars). The effect of the lag can be observed in panels (a),(c) and (e), where l=1l=1 and thus the dynamics of the map is reflected in more detail, including the effect of the forbidden patterns (see text). By contrast, when the lag is large enough, l=4l=4 in this case, we see in panels (b),(d) and (f) how the effect of the correlations due to the determinism of the map, as well as the effect of the forbidden patterns, is lost and both distributions are almost indistinguishable.

V.1 3D Diffusion of Gold Nanoparticles

Now, let us analyze experimental data gathered using a recent and powerful technique for direct observation of the 3D dynamics of nanoparticles (NPs), known as liquid-cell scanning transmission electron microscopy (LCSTEM). Although it provides a very sharp resolution, this technique has been reported to yield observations of NP dynamics that are 3 to 8 orders of magnitude slower than the theoretical predictions [16]. This discrepancy can be attributed to the damping effect of the strong beam of electrons, the viscosity of the media, and interactions of the particles with the boundaries of the experimental cell [16]. In [16], Welling et al. address the problem of the observed slowed-down diffusion by tuning the electron beam to a low dose rate and using high-viscosity media, such as glycerol, for the NP diffusion. With those modifications, they track the 3D diffusion of charge-neutral 77 nm gold nanoparticles (Au-NPs) in glycerol as well as charged 350 nm titania particles in glycerol carbonate. The independence of the spatial increments is one of the defining properties of Brownian motion. In the following we show how to use our transformed ordinal pattern framework for testing this independence in the experimental particle tracks from the set of Au-NPs in Ref. [16]. There are more than 200200 NP tracks observed in the xx-yy plane in this data set, whose original experimental labels are kept here for identification. We analyzed all the trajectories whose length is L100L\geq 100 points (with a maximum of L=359L=359), for a total of M=37M=37 tracks (see Fig. 5).

The procedure is as follows. For each of the MM tracks 𝐱=(x1,x2,,xL)\mathbf{x}=(x_{1},x_{2},\ldots,x_{L}), 𝐲=(y1,y2,,yL)\mathbf{y}=(y_{1},y_{2},\ldots,y_{L}):

  • 1)

Obtain the vectors of increments Δ𝐱=(x2x1,x3x2,,xLxL1)\Delta\mathbf{x}=(x_{2}-x_{1},x_{3}-x_{2},\ldots,x_{L}-x_{L-1}) (analogously for Δ𝐲\Delta\mathbf{y}).

  • 2)

Choose an embedding dimension nn and lag ll in order to apply the α\alpha-analysis to Δ𝐱,Δ𝐲\Delta\mathbf{x},\Delta\mathbf{y}. This yields a pair of vectors, denoted for simplicity by αΔx\alpha\circ\Delta_{x} and αΔy\alpha\circ\Delta_{y}. The notation αu\alpha\circ u is to be interpreted as first making an embedding of the series u={u1,u2,uL}u=\{u_{1},u_{2},\ldots u_{L}\} with the chosen n,ln,l values and then applying the α(τ)\alpha(\tau) functional to the corresponding collection of ranking permutations.

  • 3)

    We get the standardized versions as

    αzΔj=αΔjσχ\alpha_{z}\circ\Delta_{j}=\frac{\alpha\circ\Delta_{j}}{\sigma_{\chi}}

where j=x,yj=x,y and σχ=m((2m+1)/3)1/2\sigma_{\chi}=m((2m+1)/3)^{1/2} (Eq. (25)). This amounts to obtaining the induced αz\alpha_{z} time series for both coordinates, i.e., {αz,k(j)}k[N],j=x,y\{\alpha_{z,k}^{(j)}\}_{k\in[N]},\ \ j=x,y.

  • 4)

We now account for the correlations among the αz\alpha_{z} values that are introduced by construction, by computing the effective length of the vectors αzΔj\alpha_{z}\circ\Delta_{j} through the correction [1]

Neff=N1+2iRα(i)N_{\mathrm{eff}}=\frac{N}{1+2\sum_{i}R_{\alpha}(i)} (35)

    where Rα(i)R_{\alpha}(i) is the autocorrelation function (ACF) of the {αz,k(j)}k[N]\{\alpha_{z,k}^{(j)}\}_{k\in[N]} series from the previous step, at time lag ii.

  • 5)

Finally, since the variance of ZαZ_{\alpha} is known (σZα=1\sigma_{Z_{\alpha}}=1), we can perform a one-sample ZZ-test on the {αz,k(j)}k[N],j=x,y\{\alpha_{z,k}^{(j)}\}_{k\in[N]},\ \ j=x,y series, under the assumption that the increments Δ𝐱,Δ𝐲\Delta\mathbf{x},\Delta\mathbf{y} are independent, and using NeffN_{\mathrm{eff}} computed in the previous step as the sample size. Therefore, our null hypothesis is simple: the mean value of the standardized variable ZαZ_{\alpha} is zero, μZα=0\mu_{Z_{\alpha}}=0. In other words, we want to test

    H0:μαz=0againstH1:μαz0H_{0}:\mu_{\alpha_{z}}=0\ \ \mathrm{against}\ \ H_{1}:\mu_{\alpha_{z}}\neq 0 (36)

    with a selected level of confidence.
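The five steps above can be condensed into a short routine. The sketch below is illustrative rather than the authors' implementation: it assumes the right-minus-left half-sum form of α\alpha, uses the sample ACF up to lag (n1)l(n-1)l in the correction (35), and demonstrates the test on synthetic independent increments (a hypothetical stand-in for the experimental Δ𝐱\Delta\mathbf{x}, Δ𝐲\Delta\mathbf{y}).

```python
import math
import random

def alpha_series(x, n, l=1):
    """Ordinal embedding followed by the alpha functional
    (right-half sum minus left-half sum of the ranking permutation)."""
    def alpha_of_window(w):
        order = sorted(range(len(w)), key=lambda i: w[i])
        ranks = [0] * len(w)
        for r, idx in enumerate(order, start=1):
            ranks[idx] = r
        m = len(w) // 2
        return sum(ranks[m:]) - sum(ranks[:m])
    N = len(x) - (n - 1) * l
    return [alpha_of_window(x[k:k + (n - 1) * l + 1:l]) for k in range(N)]

def z_test_independence(increments, n=32, l=1):
    """Steps 2-5: standardize, correct N for autocorrelation, Z-test."""
    m = n // 2
    sigma_chi = m * math.sqrt((2 * m + 1) / 3)            # Eq. (25)
    az = [a / sigma_chi for a in alpha_series(increments, n, l)]
    N = len(az)
    mean = sum(az) / N
    var = sum((a - mean) ** 2 for a in az) / N
    acf_sum = 0.0                                          # sample ACF terms
    for lag in range(1, (n - 1) * l + 1):
        cov = sum((az[t] - mean) * (az[t + lag] - mean)
                  for t in range(N - lag)) / N
        acf_sum += cov / var
    n_eff = N / (1 + 2 * acf_sum)                          # Eq. (35)
    z = mean * math.sqrt(n_eff)                            # sigma_Z = 1
    p_value = math.erfc(abs(z) / math.sqrt(2))             # two-sided p-value
    return z, p_value, n_eff

random.seed(1)
iid = [random.gauss(0, 1) for _ in range(2000)]            # independent increments
z, p, n_eff = z_test_independence(iid)

trend = [0.01 * t + random.gauss(0, 0.1) for t in range(2000)]  # trending increments
z2, p2, n_eff2 = z_test_independence(trend)
```

For independent increments the test should reject only at the nominal ϵ1\epsilon_{1} rate, while the trending increment series, whose windows are almost always increasing, yields a vanishing p-value.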

The robustness of this test is guaranteed by the fact that the quantities α¯\bar{\alpha} and sα2s^{2}_{\alpha} (Eqs. (32) and (33)) obtained in Sec. IV are sufficient statistics for χ\chi. Therefore, we can rely on the statistic α¯z\bar{\alpha}_{z} as the unbiased estimator of the standardized mean μZα\mu_{Z_{\alpha}}. For each spatial dimension of the diffusion, we have the estimators α¯z(j)=E[αzΔj],j=x,y\bar{\alpha}_{z}^{(j)}=E[\alpha_{z}\circ\Delta_{j}],\ j=x,y. We follow common practice and choose a confidence level of 95%95\%, corresponding to a type-I error (false-positive) rate of 0.050.05, denoted here by ϵ1\epsilon_{1}. This error rate is customarily denoted by “α\alpha”, which conflicts with the notation for the main object of this paper; we hope this departure from the standard notation does not affect the reading. Correspondingly, we denote the type-II error (false-negative) rate by ϵ2\epsilon_{2}, customarily denoted β\beta, and require 80%80\% power, i.e., ϵ2=0.2\epsilon_{2}=0.2.

Figure 5: Tracks of the 3D diffusion of 7777 nm gold nanoparticles in the xx-yy plane. The 37 particles shown have a length L>100L>100. The three tracks (labels 11,94 and 144) highlighted in blue did not pass the test of independence for the xx-coordinate increments (see text). Analogously, the null hypothesis of the increments being independent in the yy-coordinate had to be rejected for the highlighted purple track. The tracks for which the null hypothesis could not be rejected at the selected level of confidence are shown in gray.

We choose an embedding dimension of n=32n=32, since in that case we can consider P(αz)P(\alpha_{z}) to be well approximated by a Gaussian (see Fig. 2-(d)). For the lag we choose l=1l=1, because it will give the maximum series length N=L(n1)lN=L-(n-1)l. The correlations introduced by this choice of lag are accounted for by Eq. (35), but, before proceeding with the next steps, let us estimate the minimum value of NeffN_{\mathrm{eff}} that complies with the chosen ϵ1\epsilon_{1} and ϵ2\epsilon_{2}. This estimate for NeffN_{\mathrm{eff}} is given from the usual two-sided ZZ-test by [17]

Neff(z1ϵ12+z1ϵ2)2(σZαc)2,N_{\mathrm{eff}}\geq\left(z_{1-\frac{\epsilon_{1}}{2}}+z_{1-\epsilon_{2}}\right)^{2}\left(\frac{\sigma_{Z_{\alpha}}}{c}\right)^{2}, (37)

where σZα=Var[Zα]\sigma_{Z_{\alpha}}=\mathrm{Var}[Z_{\alpha}] (see Eq. (23)), z1ϵ1/2z_{1-\epsilon_{1}/2} and z1ϵ2z_{1-\epsilon_{2}} are the quantiles of the standard normal distribution at 1ϵ1/21-\epsilon_{1}/2 and 1ϵ21-\epsilon_{2} respectively, and cc is the desired threshold of detection for deviations from the mean, regularly written as proportional to the standard deviation, so cσc\sim\sigma.

For ϵ1=0.05\epsilon_{1}=0.05, ϵ2=0.2\epsilon_{2}=0.2, and c=1.96σc=1.96\sigma, we get Neff30N_{\mathrm{eff}}\geq 30 from Eq. (37). All of the trajectories considered have a corresponding Neff>30N_{\mathrm{eff}}>30, therefore we can reliably apply our test. A summary of the results of the test is provided in Table 4, where we show the tracks for which the null hypothesis is rejected (pp-value lower than 0.050.05).

Figure 6: Violin-box plot for the sample means {α¯z(j)},j=x,y\{\bar{\alpha}_{z}^{(j)}\},\ j=x,y of the 37 analyzed tracks. The whiskers mark the 1.5×\timesIQR (inter-quartile range). Outliers are detected only in the xx-coordinate α¯z\bar{\alpha}_{z} set, corresponding to the trajectories with labels 11,94,14411,94,144 and indicated by diamonds.

In contrast with the analysis displayed in Fig. 6, where only 3 xx-trajectories are detected as outliers (labels 11,94,14411,94,144), there are four trajectories that get rejected by the single-trajectory hypothesis test (see Table 4), with labels 11,94,14411,94,144 (xx-tracks) and 159159 (yy-track).

Nevertheless, the agreement between the single trajectory ZZ-tests and the independent analysis through the box plot is reasonably good, suggesting that the correction in Eq. (35) represents a sensible approximation that accounts for the autocorrelation in the αz,k(j)\alpha_{z,k}^{(j)} processes.

NP label | x p-val | y p-val | α¯z(x)\bar{\alpha}_{z}^{(x)} | α¯z(y)\bar{\alpha}_{z}^{(y)} | Neff,xN_{\mathrm{eff},x} | Neff,yN_{\mathrm{eff},y}
11 | 0.000 | 0.963 | -0.3902 | 0.0051 | 81 | 81
94 | 0.024 | 0.089 | 0.2849 | -0.2128 | 63 | 64
144 | 0.032 | 0.667 | -0.3620 | 0.0717 | 35 | 36
159 | 0.276 | 0.031 | -0.1700 | 0.3364 | 41 | 41
Table 4: Nanoparticle tracks rejected by the single-trajectory analysis and their α\alpha-statistics.

VI Discussion

We have illustrated the main advantages of avoiding working with the direct statistics of patterns by, instead, first dividing the symmetric group into classes through the functional α(τ)\alpha(\tau) defined by Eq. (6), so that the problem is reduced to analyzing these classes. However, this procedure does not come without drawbacks, and next we discuss these, as well as other strengths, in more detail.

An interesting conceptual consequence of the presented view of white noise is that a sense of typicality emerges in terms of the functional α(τ)\alpha(\tau), due to its concentration around zero. Therefore, the stationarity of white noise acquires a combinatorial character arising from statistical constraints.

In Sec. V, we have successfully shown a use case of our framework to test for independence in the spatial increments of particles diffusing in 3 dimensions, whose motion was recorded in a 2-dimensional plane. Although computing the ensemble-averaged mean squared displacement (MSD) is the customary check for diffusive behavior, it does not provide the single-trajectory detail and statistical power of our method. Indeed, our method can be implemented for a single particle if that is the only information available, and still be reliable provided a minimum effective trajectory length is reached (as seen in Sec. V.1), which is a sensible requirement.

The time series available were of relatively short length. Yet, notably, the test is able to handle these short time series. None of the trajectories was rejected by our test for the two dimensions at the same time. That can be interpreted as a good indication that the observation technique used in [16] performed well enough to preserve the 3D Brownian diffusion overall, keeping the introduction of correlations in the motion through the observation scheme to a minimum. After discussion with one of the authors of [16], we can explain the higher rejection rate for the xx-coordinate tracks by the observation procedure: the LCSTEM probe progressively scans line by line along the xx-dimension, thereby potentially introducing weak correlations in the particles’ motion in that direction.

The former is supported by the theory exposed above and by the reasonably good agreement between our analysis and the quartile analysis in Fig. 6. Furthermore, the quartile analysis did not identify any track whose yy-component was rejected by our test, suggesting that our method is more sensitive for a given confidence level.

For all the examples of white noise processes considered in Sec. V, the customary ordinal pattern analysis for the PE computation would be practically impossible for the choice of n=32n=32 used here, since one would have to keep track of 32!32! patterns. The analysis and graphical representation of the final distribution would also be impossible without a further coarsening of the support, something that would leave the analysis stripped of the very detail that characterizes it in the first place. Instead, in the present framework the size of the support of our distribution is |An|=m2+1|A_{n}|=m^{2}+1 (see Sec. II). This is an important simplification, while still keeping relevant information about the patterns both in terms of correlations and even rough information about the variation of the amplitudes in the form of weights. A remarkable aspect of our framework when comparing white noise of different sources is the robustness of the empirical distributions. The empirical distribution over SnS_{n} in the customary PE approach approximates a discrete uniform with support {1,2,,n!}\{1,2,\ldots,n!\}, implying that for moderate to large nn it would display strong variations when estimated from a finite sample. In the considered case with n=32,l=1n=32,\ l=1 and series length L=105L=10^{5}, the number of observations N=L(n1)l105N=L-(n-1)l\simeq 10^{5} falls extremely short of providing at least one representative of each of the 32!2.6×103532!\simeq 2.6\times 10^{35} possible patterns. Therefore the obtained empirical density would consist mostly of void regions and uneven peaks. To prevent this effect, for time series of length L=105L=10^{5} one should limit oneself to n=8n=8 (8!4×1048!\simeq 4\times 10^{4}), and even then the empirical density could show gaps despite the rather large time series. In contrast, in our approach there are just m2+1=257m^{2}+1=257 classes to keep track of, as in the illustrative examples considered in Fig. 3.
The choice n=32n=32 was made mainly to guarantee a good approximation of P(Zα)P(Z_{\alpha}) by a Gaussian p.d.f. Nevertheless, we can think of alternative analyses for lower nn values, exploiting for hypothesis testing the fact that we know the specific form of P(Zα)P(Z_{\alpha}).
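The bookkeeping gap between the two approaches is easy to quantify; a short check (illustrative only):

```python
import math

n = 32
m = n // 2
num_patterns = math.factorial(n)   # ordinal patterns to track: 32!
num_classes = m * m + 1            # alpha-classes: |A_n| = m^2 + 1 = 257
```

With a sample of N ≈ 10⁵ observations, the pattern space is undersampled by about thirty orders of magnitude, while each of the 257 classes receives hundreds of observations on average.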

Now, let us address the autocorrelations introduced by the embedding procedure. We already mentioned in Sec. III that the sequence of α\alpha values obtained from a time series is correlated due to the overlap among the embedding vectors, which in turn translates into overlapping ranking permutations τSn\tau\in S_{n} and, finally, correlated α\alpha values. This is especially so for low values of the lag ll. As illustrated in Sec. V, for large enough values of ll, the effect of the correlations in the original process can be significantly diminished, but so can the overlap between the patterns (see Fig. 4). Nevertheless, even for low ll, the autocorrelations in the series {αk}k[N]\{\alpha_{k}\}_{k\in[N]} do not affect the Gaussian character of P(α)P(\alpha), since this characteristic comes from the fact that, as nn\rightarrow\infty, the partial sums sls_{l} discussed in Sec. III are composed of integers from the uniform distribution that become effectively independently sampled as nn grows, so the Central Limit Theorem applies. This is the same effect as the effective loss of statistical dependence when drawing without replacement from a very large pool. Thus, the correlations introduced by construction in the method do not come at a cost so high that it ruins the statistical power of the analysis, especially when nn, ll, or both are large.

The absence or overrepresentation of patterns that is expected to occur due to finite sampling is lessened by the fact that a pattern in the same or a similar class will very likely take the place of a missing one. Conversely, overrepresented patterns in a sample induce overrepresentation of patterns in neighboring classes for low lag values, especially l=1l=1, when the windows have the greatest overlap. This has the overall effect of perturbing the shape of the Gaussian around the overrepresented pattern, and similarly for absent patterns. An example of this can be seen for the chaotic trajectory of the logistic map in Fig. 3-(f), where there is a region in the center that gets a slightly higher probability density than it would for a white noise process. This is explained by the missing forbidden patterns discussed in Sec. V. The overrepresentation of the patterns with values of α\alpha around zero is apparently more likely to happen for processes with bounded support, as is the case of Uniform white noise and the deterministic chaotic trajectory (see Fig. 3(e)-(f)). This could be explained by the boundedness of the noise and the finiteness of the trajectory, making it less likely that increasing (decreasing) sequences appear, corresponding to positive (negative) values of α\alpha far from the average around zero. Thus, values α0\alpha\simeq 0 get overrepresented.

A major drawback for the applicability of our framework is that an even embedding dimension must be used. Nevertheless, a workaround could be to perform the analysis for the even values adjacent to the desired odd nn value, if the actual odd structure of the patterns is not relevant. Furthermore, the odd-nn analysis is in principle also possible: from Definition 1 (Eq. (5)) we can obtain the exact distribution of the corresponding α\alpha functional, since the general expression (11) describes the statistics of the degeneracy of α\alpha for any choice of nn and mm. These degeneracies can be computed numerically from expression (11) by means of a recursion for the Gaussian binomial coefficients [10]. The distributions of α(τ)\alpha(\tau) for odd nn thus obtained are skewed, but still bell-shaped. We plan a general analysis of this case, together with possible practical implications, for future work.

To finish, we stress that an important message conveyed by this contribution is that the full detail of the customary ordinal pattern analysis (prior to PE computation) is not needed and in fact hinders its statistical applications, and, on the other hand, that computing the Shannon entropy directly from the ordinal pattern statistics washes away statistically valuable information. The approach presented here represents a middle ground with several extra benefits and relatively minor drawbacks.

Acknowledgements.
Funding for this work was provided by the Alexander von Humboldt Foundation in the framework of the Sofja Kovalevskaja Award endowed by the German Federal Ministry of Education and Research, through Matteo Smerlak. The author acknowledges Tom Welling, Marijn A. van Huis, Professor van Blaaderen and their collaborators for sharing their experimental data. David Davalos, Holger Kantz and Mario Díaz-Torres are acknowledged for useful discussions and comments.

References

  • Gelman et al. [2013] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, Bayesian Data Analysis, 3rd ed. (CRC Press, 2013).
  • Kantz and Schreiber [2003] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, 2nd ed. (Cambridge University Press, 2003).
  • Bandt and Pompe [2002] C. Bandt and B. Pompe, Permutation entropy: A natural complexity measure for time series, Phys. Rev. Lett. 88, 174102 (2002).
  • Zanin et al. [2012] M. Zanin, L. Zunino, O. A. Rosso, and D. Papo, Permutation entropy and its main biomedical and econophysics applications: A review, Entropy 14, 1553 (2012).
  • Zanin and Olivares [2021] M. Zanin and F. Olivares, Ordinal patterns-based methodologies for distinguishing chaos from noise in discrete time series, Communications Physics 4, 10.1038/s42005-021-00696-z (2021).
  • Bandt et al. [2002] C. Bandt, G. Keller, and B. Pompe, Entropy of interval maps via permutations, Nonlinearity 15, 1595 (2002).
  • Misiurewicz [2003] M. Misiurewicz, Permutations and topological entropy for interval maps, Nonlinearity 16, 971 (2003).
  • Little and Kane [2016] D. J. Little and D. M. Kane, Phys. Rev. E 94, 022118 (2016).
  • Little and Kane [2017] D. J. Little and D. M. Kane, Variance of permutation entropy and the influence of ordinal pattern selection, Phys. Rev. E 95, 052126 (2017).
  • Bóna [2012] M. Bóna, Combinatorics of Permutations (CRC Press, 2012).
  • Andrews and Eriksson [2004] G. E. Andrews and K. Eriksson, Integer Partitions, 1st ed. (Cambridge University Press, 2004).
  • Wilf [1994] H. S. Wilf, Generatingfunctionology (Academic Press, 1994).
  • Cover and Thomas [2005] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. (Wiley-Interscience, 2005).
  • Keller and Nowicki [1992] G. Keller and T. Nowicki, Spectral theory, zeta functions and the distribution of periodic points for Collet-Eckmann maps, Communications in Mathematical Physics 149, 31 (1992).
  • Amigó et al. [2007] J. M. Amigó, S. Zambrano, and M. A. F. Sanjuán, True and false forbidden patterns in deterministic and random dynamics, EPL 79, 50001 (2007).
  • Welling et al. [2020] T. A. Welling, S. Sadighikia, K. Watanabe, A. Grau-Carbonell, M. Bransen, D. Nagao, A. van Blaaderen, and M. A. van Huis, Observation of Undamped 3D Brownian Motion of Nanoparticles Using Liquid-Cell Scanning Transmission Electron Microscopy, Particle and Particle Systems Characterization 37, 10.1002/ppsc.202000003 (2020).
  • DeGroot and Schervish [2012] M. H. DeGroot and M. J. Schervish, Probability and Statistics, 4th ed. (Pearson Education Inc., 2012).
  • Shao [2007] J. Shao, Mathematical Statistics, 2nd ed. (Springer-Verlag, 2007).
  • Marchal and Arbel [2017] O. Marchal and J. Arbel, On the sub-Gaussianity of the Beta and Dirichlet distributions, Electronic Communications in Probability 22, 1 (2017), arXiv:1705.00048.
  • Hoeffding [1963] W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association 58, 13 (1963).
  • Chernoff [1952] H. Chernoff, A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations, The Annals of Mathematical Statistics 23, 493 (1952).

Appendix A Continuous approximation for finite permutation length

It is clear from Fig. 2 that the p.m.f. $P(\alpha_z)$ is not well approximated by a Gaussian for low values of $n$. We therefore propose a Beta distribution as a better continuous approximation for this case. Recall its probability density function (p.d.f.)

f_{W}(w;a,b)=\frac{w^{a-1}(1-w)^{b-1}}{B(a,b)},\quad w\in[0,1] \qquad (38)

where $B(a,b)=\int_{0}^{1}t^{a-1}(1-t)^{b-1}\,dt$.
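Numerically, $B(a,b)$ is best evaluated through the identity $B(a,b)=\Gamma(a)\Gamma(b)/\Gamma(a+b)$ in log space. A minimal Python sketch of the density in Eq. (38) (function name ours):

```python
from math import exp, lgamma

def beta_pdf(w, a, b):
    """Beta p.d.f. of Eq. (38); B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    is computed via log-gamma to avoid overflow for large a, b."""
    log_B = lgamma(a) + lgamma(b) - lgamma(a + b)
    return w ** (a - 1) * (1 - w) ** (b - 1) * exp(-log_B)

print(beta_pdf(0.5, 1.0, 1.0))  # 1.0 (Beta(1, 1) is the uniform density)
```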

To match the random variable $\chi$ with the Beta random variable $W$, we first need to transform the support $A_{n}$ of $P_{\chi}(\alpha)$ in Eq. (7) into a subset of the unit interval, $\tilde{A}_{n}\subseteq[0,1]$, by a shift and rescaling

W_{\alpha}=\frac{\chi+m^{2}}{2m^{2}},\quad W_{\alpha}\sim f(\alpha_{w};a,b) \qquad (39)

where $f(\alpha_{w};a,b)$ is a Beta p.d.f. (Eq. (38)). As in the case of $Z_{\alpha}$, here we adopt the notation $W_{\alpha}$ for the transformed random variable, $\alpha_{w}$ for its realizations, and $P(\alpha_{w})$ for the p.m.f.

From Eq. (39), formula (25) and $\mu_{\chi}=E[\chi]=0$, we get for the moments of $W_{\alpha}$

E[W_{\alpha}]=\mu_{W_{\alpha}}=\frac{1}{2} \qquad (40)
\mathrm{Var}[W_{\alpha}]=\sigma^{2}_{W_{\alpha}}=\frac{\sigma^{2}_{\chi}}{4m^{4}}=\frac{2m+1}{12m^{2}}. \qquad (41)

This low-$n$ approximation of $P_{W_{\alpha}}(\alpha_{w})$ by a Beta distribution $f(\alpha_{w};a,b)$ is consistent with the fact that in the limit $a,b\rightarrow\infty$ the Beta p.d.f. in Eq. (38) converges to a Gaussian, as seen in Sec. IV.

The random variable $W_{\alpha}\in[0,1]$ can be interpreted as a probability itself. A value $\alpha_{w}(\tau)$ would correspond to the probability of a pattern $d_{\tau}$ having a degree of asymmetry ranging from 0 to 1. This means that $\mathrm{Pr}(0\leq\alpha_{w}<\frac{1}{2})$ is the probability of sampling an overall decreasing pattern, $\mathrm{Pr}(\alpha_{w}=\frac{1}{2})$ the probability of sampling an overall constant pattern, and $\mathrm{Pr}(\frac{1}{2}<\alpha_{w}\leq 1)$ the probability of finding an overall increasing pattern.

The Chebyshev-Pearson method of moments [18] provides the following expressions for estimating the parameters of the $B(a,b)$ distribution from the mean $\bar{x}_{w}=N^{-1}\sum_{j}^{N}\alpha_{w,j}$ and variance $\hat{\sigma}_{w}^{2}=(N-1)^{-1}\sum_{j}^{N}(\alpha_{w,j}-\bar{x}_{w})^{2}$ of a random sample $W_{\alpha,1},W_{\alpha,2},\ldots,W_{\alpha,N}$

\hat{a}=\bar{x}_{w}\left(\frac{\bar{x}_{w}(1-\bar{x}_{w})}{\hat{\sigma}_{w}^{2}}-1\right) \qquad (42)
\hat{b}=(1-\bar{x}_{w})\left(\frac{\bar{x}_{w}(1-\bar{x}_{w})}{\hat{\sigma}_{w}^{2}}-1\right) \qquad (43)
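These estimators are easy to check numerically. The sketch below (function name ours) plugs the exact moments of $W_\alpha$, mean $1/2$ and variance $(2m+1)/(12m^2)$, into the moment estimators and recovers the closed-form parameter $a=b=3m^{2}/(2(2m+1))-1/2$ derived below:

```python
def beta_method_of_moments(mean, var):
    """Chebyshev-Pearson moment estimators (a_hat, b_hat) for Beta(a, b),
    given a sample mean and variance on [0, 1] (Eqs. (42)-(43))."""
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

# With the exact moments of W_alpha the estimates collapse to
# a = b = 3 m^2 / (2(2m+1)) - 1/2:
m = 5
a_hat, b_hat = beta_method_of_moments(0.5, (2 * m + 1) / (12 * m ** 2))
print(abs(a_hat - (3 * m ** 2 / (2 * (2 * m + 1)) - 0.5)) < 1e-12)  # True
print(abs(a_hat - b_hat) < 1e-12)                                   # True
```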

Since in our computations the sample comprises the population itself (the whole symmetric group), the parameters characterized so far are the exact population mean and variance, i.e., $\bar{x}=\mu_{W_{\alpha}}=1/2$, $\hat{\sigma}=\sigma_{W_{\alpha}}$, $a=\hat{a}$, $b=\hat{b}$. Then, replacing Eqs. (40) and (41) in formulas (42)-(43), we obtain the exact parameters

a=\frac{3m^{2}}{2(2m+1)}-\frac{1}{2},\quad \mathrm{and}\quad b=a \qquad (44)

which becomes $a\sim\frac{3}{4}m$ for $m\gg 1$, in agreement with the symmetry of $P(\alpha)$ and the linear increase of the parameters $a,b$ observed in the numerical computations. For the case $m=1$ we get $a=b=0$. In this limit, the Beta p.d.f. in Eq. (38) becomes a Bernoulli distribution with $p=1/2$, which is precisely $P_{W_{\alpha}}(\alpha_{w})$ for $n=2$ or, after a location shift to zero, is also equivalent to $P_{n}(\alpha)$ and $P(\alpha_{z})$, which are symmetric Bernoulli distributions.

Appendix B Concentration of measure by α(τ)\alpha(\tau)

Before drawing a conclusion on the continuous asymptotic shape of $P_{\chi}(\alpha)$, let us characterize its moments.

First, we find a bound for the scaling of $\sigma_{\chi}$ in order to test formula (25), through the following sub-Gaussianity argument.

A random variable $X$ with finite expectation $\mu$ is called sub-Gaussian if there exists a positive number $\sigma$ such that

E[\exp(\lambda(X-\mu))]\leq\exp\left(\frac{\lambda^{2}\sigma^{2}}{2}\right),\quad \forall\lambda\in\mathbb{R}. \qquad (45)

The constant $\sigma$ is called the proxy variance or sub-Gaussian norm [19]. From the classical Hoeffding theorem [20] together with the definition in Eq. (45), we can conclude that every integrable random variable supported on a compact set is sub-Gaussian [19]. Therefore, considering that for fixed $n$ our functional is bounded, $-m^{2}\leq\alpha\leq m^{2}$, we conclude that $\chi$ is sub-Gaussian, and so is $Y$.

The left-hand side of inequality (45) is nothing more than the moment generating function $M_{X^{\prime}}(\lambda)=E[\exp(\lambda X^{\prime})]$ of the random variable $X^{\prime}=X-\mu$. If we apply Markov's inequality to $M_{X^{\prime}}(\lambda)$ we obtain

\mathrm{Pr}(X^{\prime}\geq t)=\mathrm{Pr}(\exp(\lambda X^{\prime})\geq\exp(t\lambda))\leq\frac{E[\exp(\lambda X^{\prime})]}{\exp(t\lambda)} \qquad (46)

By minimizing the right-hand side of the inequality in Eq. (46) with respect to the parameter $\lambda$, we get the optimal bound (Chernoff bound [17, 21]) for sub-Gaussian random variables [19]

\mathrm{Pr}(X-\mu>t)\leq\exp\left(-\frac{t^{2}}{2\sigma^{2}}\right),\quad \forall t>0, \qquad (47)

which is an equivalent but more usable statement of the sub-Gaussianity property, Eq. (45). Although inequalities (46) and (47) are most useful when treating sums of i.i.d. variables, due to factorizability, they are also valid for certain instances of dependency in the samples, such as sampling without replacement [20]. Computing $\alpha(\tau)$ is a problem of sampling $m=n/2$ symbols without replacement from the set $[n]$, so we can apply these techniques to our problem. Considering that $\mu_{\chi}=0$ and the fact that we always know the tail probability of $P(\chi)$,

\mathrm{Pr}(|\chi|\geq m^{2})=p_{0}+p_{1}+p_{m^{2}-1}+p_{m^{2}}=\frac{4}{\binom{n}{m}} \qquad (48)

as stated in the previous section. Using Eq. (47) with $t=m^{2}$ together with Eq. (48), taking logarithms on both sides and using Stirling's approximation for the logarithm of the central binomial coefficient, $\log\binom{n}{m}\simeq m\log(4)-\frac{1}{2}\log(\pi m)$, and recalling $n=2m$, we get the bound

\sigma_{\chi}^{2}\leq\frac{m^{3}}{\left(1+\frac{1}{m}\right)\log(4)-\frac{1}{2m}\log(\pi m)} \qquad (49)

which reduces rapidly to

\sigma_{\chi}^{2}\leq\frac{m^{3}}{\log(4)} \qquad (50)

as $m\rightarrow\infty$, which is consistent with formula (26) for $m\geq 7$.
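As a quick numerical sanity check (a sketch; the exact variance $\sigma_{\chi}^{2}=m^{2}(2m+1)/3$ used below is obtained by undoing the rescaling of Eq. (41), i.e., $\sigma_{\chi}^{2}=4m^{4}\sigma^{2}_{W_{\alpha}}$), one can confirm that the asymptotic bound (50) indeed dominates the exact variance from $m=7$ onward:

```python
from math import log

def var_chi_exact(m):
    # sigma_chi^2 = 4 m^4 * (2m+1)/(12 m^2) = m^2 (2m+1) / 3, from Eq. (41)
    return m * m * (2 * m + 1) / 3

def var_chi_bound(m):
    # Asymptotic sub-Gaussian bound of Eq. (50)
    return m ** 3 / log(4)

# The bound holds for every m >= 7 but not yet at m = 6:
print(all(var_chi_exact(m) <= var_chi_bound(m) for m in range(7, 1000)))  # True
print(var_chi_exact(6) <= var_chi_bound(6))                               # False
```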