
Stationary 1-dependent Counting Processes:
from Runs to Bivariate Generating Functions

Jim Pitman, Zhiyi You
(May 17, 2021)

Abstract

We give a formula for the bivariate generating function of a stationary 1-dependent counting process in terms of its run probability generating function, with a probabilistic proof. The formula reduces to the well known bivariate generating function of the Eulerian distribution in the case of descents of a sequence of independent and identically distributed random variables. The formula is compared with alternative expressions from the theory of determinantal point processes and the combinatorics of sequences.

Keywords

One-dependent process, bivariate generating function, run probability generating function, Eulerian numbers, dependence structures, determinantal point process.

1 Introduction

1.1 Counting one-dependent processes

A discrete time stochastic process $(X_{1},X_{2},\cdots)$ is said to be 1-dependent if

(X_{1},\ldots,X_{m-1})\text{ is independent of }(X_{m+1},\ldots,X_{m+n})    (1.1)

for all positive integers $m$ and $n$. In contrast to a Markov chain, this independence requires no knowledge of the current position. This dependence structure has been widely investigated in probability theory [2, 25, 24], and as a tool in statistics [30] and queuing systems [23, 36]. Examples of 1-dependent processes are provided by 2-block factors [29] generated by a function of two successive terms in an independent sequence. But not all 1-dependent processes can be constructed this way: Aaronson et al. [1] explicitly gave a two-parameter family of 1-dependent processes which cannot be expressed as 2-block factors. Other examples of this kind arise in the theory of random colorings of integers developed by Holroyd and Liggett [25, 24].

Here, we restrict attention to 1-dependent processes $(X_{1},X_{2},\cdots)$ which are also stationary, i.e. for all positive integers $n$,

(X_{1},X_{2},\ldots,X_{n})\stackrel{d}{=}(X_{2},X_{3},\ldots,X_{n+1}).    (1.2)

In the case of an independent and identically distributed (i.i.d.) sequence, the distribution of the count $S_{n}(A)=\sum_{k=1}^{n}1(X_{k}\in A)$ is binomial with parameters $n$ and ${\mathbb{P}}(X_{n}\in A)$, for any measurable subset $A$ of the state space of the sequence. We describe here a bivariate generating function which determines the distribution of this counting variable $S_{n}(A)$ for any stationary 1-dependent process $(X_{1},X_{2},\cdots)$.

1.2 Bivariate generating functions

Following the work of de Moivre on the distribution of the number of spots in a number of die rolls, the encoding of a sequence by its generating function was exploited by Euler [13] and many subsequent authors for combinatorial enumeration [21, 16]. To describe the distribution of an integer-valued random variable, Laplace [31] introduced the probability generating function [11]. For a sequence of non-negative-integer-valued random variables $S_{n}$, let $Q_{S_{n}}$ denote the probability generating function of $S_{n}$:

Q_{S_{n}}(z):={\mathbb{E}}z^{S_{n}}=\sum_{k=0}^{\infty}{\mathbb{P}}(S_{n}=k)z^{k},    (1.3)

and let $Q(z,v)$ be the bivariate generating function of the distributions of $S_{n}$:

Q(z,v):=\sum_{n\geq 0}\sum_{k\geq 0}{\mathbb{P}}(S_{n}=k)z^{k}v^{n}=\sum_{n\geq 0}Q_{S_{n}}(z)v^{n},    (1.4)

for all $z,v$ such that the summation converges, including $|z|\leq 1$ and $|v|<1$. This bivariate generating function determines the distribution of $S_{n}$ for every $n$ by extraction of the coefficient of $z^{k}v^{n}$ from $Q(z,v)$:

{\mathbb{P}}(S_{n}=k)=[z^{k}v^{n}]Q(z,v),\quad n,k=0,1,2,\cdots    (1.5)

In our setup for counting processes, $S_{n}:=X_{1}+\cdots+X_{n}$, where $(X_{n},n\geq 1)$ is an indicator sequence, so each count $S_{n}$ takes values in $\{0,1,2,\cdots,n\}$, and the series (1.4) is absolutely convergent for $|zv|<1$. See e.g. [16, Chapter III] for further background on bivariate generating functions.
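
As a concrete illustration of the coefficient extraction (1.5), the following sketch (not from the paper; the choice $p=1/3$ is arbitrary) recovers the binomial probabilities of the i.i.d. case from the bivariate generating function $Q(z,v)=\sum_{n\geq 0}(1-p+pz)^{n}v^{n}=1/(1-(1-p+pz)v)$.

```python
# A minimal sketch (not from the paper; p = 1/3 is an arbitrary choice):
# extracting P(S_n = k) = [z^k v^n] Q(z, v) in the i.i.d. Bernoulli(p) case,
# where summing the binomial pgfs gives Q(z, v) = 1 / (1 - (1 - p + p z) v).
import sympy as sp

z, v = sp.symbols('z v')
p = sp.Rational(1, 3)
Q = 1 / (1 - (1 - p + p*z)*v)

n, k = 5, 2
series = sp.expand(sp.series(Q, v, 0, n + 1).removeO())
prob = series.coeff(v, n).coeff(z, k)          # [z^k v^n] Q(z, v)
assert prob == sp.binomial(n, k) * p**k * (1 - p)**(n - k)
print(prob)                                    # 80/243
```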

1.3 Run probability generating functions

For an indicator sequence $(X_{n},n\geq 1)$, define its 0-run probabilities

q_{0}:=1\text{ and }q_{n}:={\mathbb{P}}(S_{n}=0)={\mathbb{P}}(X_{1}=\cdots=X_{n}=0),\quad n=1,2,\ldots    (1.6)

and the associated 0-run probability generating function

Q(v):=\sum_{n=0}^{\infty}q_{n}v^{n}=\sum_{n=0}^{\infty}{\mathbb{P}}(S_{n}=0)v^{n}=Q(0,v).    (1.7)

The 1-run probability sequence can be similarly defined, treated as the 0-run probability sequence for the dual indicator sequence $(\hat{X}_{n}:=1-X_{n},n\geq 1)$, with counts $\hat{S}_{n}=n-S_{n}$:

p_{0}:=1\text{ and }p_{n}:={\mathbb{P}}(S_{n}=n)={\mathbb{P}}(X_{1}=\cdots=X_{n}=1),\quad n=1,2,\ldots    (1.8)

The associated 1-run probability generating function is then, for $0\leq v<1$,

P(v):=\sum_{n=0}^{\infty}p_{n}v^{n}=\sum_{n=0}^{\infty}{\mathbb{P}}(S_{n}=n)v^{n}=\hat{Q}(0,v)=\lim_{z\to 0}Q(z^{-1},zv),    (1.9)

where $\hat{Q}(z,v)$ is the bivariate generating function of $\hat{S}_{n}$, and the last equality is by dominated convergence as $z\to 0$, using the evaluation for $|zv|<1$:

\hat{Q}(z,v)=\sum_{n\geq 0}\sum_{k=0}^{n}{\mathbb{P}}(S_{n}=n-k)z^{k}v^{n}    (1.10)
=\sum_{n\geq 0}\sum_{k=0}^{n}{\mathbb{P}}(S_{n}=k)z^{-k}(zv)^{n}=Q(z^{-1},zv).    (1.11)

In our case of a stationary 1-dependent sequence of indicator variables $(X_{n},n\geq 1)$, it is known [35, Chapter 7.4] [1, Theorem 1] that the distribution of $S_{n}=X_{1}+\cdots+X_{n}$ is uniquely determined by its sequence of 1-run probabilities, or just as well by its sequence of 0-run probabilities, through a determinantal formula for the probability function of the random vector $(X_{1},\ldots,X_{n})$. Our main result, presented in Section 2, gives a formula for the bivariate generating function $Q(z,v)$ of distributions of $S_{n}$ in this case, which is simpler than might be expected from this determinantal formula. The rest of this paper is organized as follows.

  • Section 3 shows how the Eulerian bivariate generating function is obtained from our result in the case of descents.

  • Section 4 displays the bivariate generating functions of some other stationary 1-dependent processes.

  • Section 5 verifies our result from the perspective of determinantal point processes.

  • Section 6 makes a connection with a combinatorial result of Goulden and Jackson [21] and provides Corollary 6.1, which is suitable for counting a particular pattern in 2-block factors.

  • Section 7 compares our formula for the bivariate generating function in the stationary 1-dependent case to similar formulae for exchangeable or renewal processes, which are either known or easily derived from known results.

2 Main result

Theorem 2.1

For a stationary 1-dependent indicator sequence $(X_{n},n\geq 1)$, the bivariate generating function $Q(z,v)$ of distributions of its partial sums $S_{n}$ is determined either by the 0-run probability generating function $Q(v)$, or by the 1-run probability generating function $P(v)$, via the formulae

Q(z,v)=\frac{Q((1-z)v)}{1-zvQ((1-z)v)}=\frac{P(-(1-z)v)}{1-vP(-(1-z)v)}.    (2.1)

The particular case $z=0$ of (2.1) reduces to the following known result:

Corollary 2.2 (Involution [2, Proposition 7.4])

In the setting of the previous theorem, for any stationary 1-dependent indicator sequence, the 0-run generating function $Q(v)$ and the 1-run generating function $P(v)$ determine each other via the involution of formal power series

Q(v)=\frac{P(-v)}{1-vP(-v)};\qquad P(v)=\frac{Q(-v)}{1-vQ(-v)}.    (2.2)
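
Before giving the proof, here is a quick symbolic check (our own sketch, not from the paper) that the map $f(v)\mapsto f(-v)/(1-vf(-v))$ in (2.2) really is an involution, verified as an identity of rational functions for a generic truncated run generating function.

```python
# A minimal sketch (our own check, not from the paper): the map
# T[f](v) = f(-v) / (1 - v f(-v)) of (2.2) is an involution, verified here as
# an identity of rational functions for a generic truncated run pgf.
import sympy as sp

v, q1, q2, q3 = sp.symbols('v q1 q2 q3')
f = 1 + q1*v + q2*v**2 + q3*v**3      # generic truncated run pgf with f(0) = 1

T = lambda g: g.subs(v, -v) / (1 - v*g.subs(v, -v))
assert sp.simplify(T(T(f)) - f) == 0
print("Involution check passed")
```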

Proof (of Theorem 2.1 and Corollary 2.2)

We will first prove the left equality in (2.1), rearranged as

Q(z,v)=Q((1-z)v)+zv\,Q((1-z)v)Q(z,v),    (2.3)

by establishing the corresponding identity of coefficients of powers of $v$, that is,

Q_{S_{n}}(z)=[v^{n}]Q((1-z)v)+z\sum_{k=0}^{n-1}[v^{k}]Q((1-z)v)\,[v^{n-1-k}]Q(z,v).    (2.4)

Recall that $q_{j}:=[v^{j}]Q(v)$, whence

[v^{j}]Q((1-z)v)=(1-z)^{j}[v^{j}]Q(v)=(1-z)^{j}q_{j},\quad j=0,1,\ldots    (2.5)

So (2.4) for each $n=1,2,\ldots$, with $j=k-1$, reduces to

Q_{S_{n}}(z)=(1-z)^{n}q_{n}+\sum_{k=1}^{n}\left((1-z)^{k-1}z\right)q_{k-1}Q_{S_{n-k}}(z),    (2.6)

which has the following interpretation. For $0\leq z\leq 1$, let $(Y_{n},n\geq 1)$ be a sequence of i.i.d. Bernoulli($z$) random variables, also independent of $(X_{n},n\geq 1)$. Employing van Dantzig's method of marks [37], treat $Y_{n}$ as a mark on $X_{n}$: say the $n$-th item $X_{n}$ is $z$-marked if $Y_{n}=1$, and non-$z$-marked if $Y_{n}=0$. By construction, $Q_{S_{n}}(z)$ is the probability that every success among the first $n$ trials is $z$-marked. In particular, if $z=0$, no trial is $z$-marked, so the only way every success in the first $n$ trials can be $z$-marked is if there are no successes; hence $Q_{S_{n}}(0)=q_{n}$ is the probability of no successes in the first $n$ trials. The identity (2.6) decomposes the event that every success in the first $n$ trials is $z$-marked according to the value of $T_{z}:=\min\{n:Y_{n}=1\}$, the index of the first $z$-mark. So

  • $T_{z}$ has the geometric($z$) distribution ${\mathbb{P}}(T_{z}=k)=(1-z)^{k-1}z$ for $k=1,2,\ldots$;

  • on the event of probability $(1-z)^{n}$ that the first $z$-mark occurs at some $T_{z}>n$, no trial among the first $n$ is allowed to be a success, which has probability $q_{n}$;

  • on the event of probability $(1-z)^{k-1}z$ that the first $z$-mark occurs at $T_{z}=k$ for some $1\leq k\leq n$, no trial among the first $k-1$ is allowed to be a success, which has probability $q_{k-1}$, and every success after the $k$-th trial (excluding the $k$-th) must be $z$-marked, which has probability $Q_{S_{n-k}}(z)$; these two events are independent because $(X_{n},n\geq 1)$ is 1-dependent.

This proves the left equality of (2.1). To prove the right, it is easiest to prove Corollary 2.2. Recall (1.9),

P(v)=\lim_{z\to 0}Q(z^{-1},zv)=\lim_{z\to 0}\frac{Q((z-1)v)}{1-vQ((z-1)v)}=\frac{Q(-v)}{1-vQ(-v)},    (2.7)

which yields the $P$ identity in (2.2). The $Q$ identity then follows by replacing $v$ with $-v$ in the last equation and solving for $Q(v)$.

Lastly, the right equation of (2.1) is obtained from the left one and the involution

Q(z,v)=\frac{P(-(1-z)v)/(1-(1-z)vP(-(1-z)v))}{1-zvP(-(1-z)v)/(1-(1-z)vP(-(1-z)v))}    (2.8)
=\frac{P(-(1-z)v)}{1-vP(-(1-z)v)}.    (2.9)

 

3 Application to descents

In this section, we present the example of Eulerian numbers. We were led to the general formula for the bivariate generating function of counts of a 1-dependent indicator sequence by the algebraically simple form of the bivariate generating function of the Eulerian numbers, whose probabilistic meaning is not immediately obvious, but is nicely explained by the above proof of Theorem 2.1.

It is well known that a large class of stationary 1-dependent indicator sequences (though not all, see [1, 3]) may be constructed from an independent and identically distributed background sequence $(Y_{1},Y_{2},\ldots)$, as two-block factors

X_{n}:=1((Y_{n},Y_{n+1})\in B),    (3.1)

for some product-measurable subset $B$ of the space of pairs of $Y$-values, say $[0,1]^{2}$ for $Y_{i}\sim$ Uniform$(0,1)$.

An important example is provided by the sequence of descents $X_{n}:=1(Y_{n}>Y_{n+1})$ for real-valued $Y_{i}$. In particular, for $Y_{i}\sim$ Uniform$(0,1)$ (or any continuous distribution) and $S_{n}:=D_{n+1}$ counting the number of descents $Y_{i}>Y_{i+1}$ with $1\leq i\leq n$:

{\mathbb{P}}(S_{n}=0)={\mathbb{P}}(S_{n}=n)={\mathbb{P}}(Y_{1}>\cdots>Y_{n+1})=\frac{1}{(n+1)!}.    (3.2)

So the run generating functions $Q(v)$ and $P(v)$ in this case are easily evaluated as

Q(v)=P(v)=\sum_{n\geq 0}\frac{v^{n}}{(n+1)!}=\frac{e^{v}-1}{v}.    (3.3)

3.1 Eulerian numbers

The Eulerian numbers $\genfrac{\langle}{\rangle}{0.0pt}{}{n}{k}$ are commonly defined as the numbers of permutations of $[n]:=\{1,2,\cdots,n\}$ with exactly $k$ descents, i.e. $k$ adjacent pairs with the first element larger than the second [22]. So the count $\hat{S}_{n-1}$ of descents in a uniform random permutation of $[n]$ has the Eulerian distribution

{\mathbb{P}}(\hat{S}_{n-1}=k)=\frac{1}{n!}\genfrac{\langle}{\rangle}{0.0pt}{}{n}{k}.    (3.4)

Observe that a uniform random permutation of $[n]$ can be obtained by taking the ranks of the i.i.d. background sequence $(Y_{1},Y_{2},\cdots,Y_{n})$. Here, we say the rank of $Y_{i}$ is $k$ if and only if $Y_{i}$ is the $k$-th smallest among $Y_{1},Y_{2},\cdots,Y_{n}$. Then, for $Y_{i}\sim$ Uniform$(0,1)$, the ranks are almost surely a uniform permutation of $[n]$. Therefore, $\hat{S}_{n}$ has the same distribution as $S_{n}$ in (3.2). Now, applying Theorem 2.1 to the descents $X_{n}:=1(Y_{n}>Y_{n+1})$ yields the bivariate generating function

Q(z,v)=\sum_{n=0}^{\infty}\sum_{k=0}^{n}{\mathbb{P}}(S_{n}=k)z^{k}v^{n}=\frac{e^{(1-z)v}-1}{v(1-ze^{(1-z)v})}=\frac{e^{v}-e^{zv}}{v(e^{zv}-ze^{v})},    (3.5)

which is the classical bivariate generating function of the Eulerian numbers [4, 7, 22, 33, 28].
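
This identity can be verified directly; the following sketch (our own check, not from the paper) expands (3.5) in powers of $v$ and compares $[z^{k}v^{n}]Q(z,v)$ with the fraction of permutations of $[n+1]$ having exactly $k$ descents.

```python
# A minimal sketch (our own check, not from the paper): expand the Eulerian
# bivariate generating function (3.5) and compare [z^k v^n] Q(z,v) with the
# fraction of permutations of [n+1] having exactly k descents.
import itertools
import sympy as sp

z, v = sp.symbols('z v')
# v * Q(z, v) is analytic at v = 0, which makes Taylor coefficients easy.
vQ = (sp.exp((1 - z)*v) - 1) / (1 - z*sp.exp((1 - z)*v))

N = 4
for n in range(N + 1):
    # [v^n] Q(z, v) = (n+1)!^{-1} d^{n+1}/dv^{n+1} [v Q(z, v)] evaluated at v = 0
    coeff_n = sp.expand(sp.cancel(sp.diff(vQ, v, n + 1).subs(v, 0) / sp.factorial(n + 1)))
    for k in range(n + 1):
        descents_k = sum(
            1 for perm in itertools.permutations(range(n + 1))
            if sum(perm[i] > perm[i + 1] for i in range(n)) == k
        )
        assert coeff_n.coeff(z, k) == sp.Rational(descents_k, sp.factorial(n + 1))
print("Eulerian check passed")
```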

4 More examples

To simplify displays, we work in this section with the shifted run generating functions

\tilde{Q}(v)=1+vQ(v),\qquad\tilde{P}(v)=1+vP(v),    (4.1)

and the shifted bivariate generating function

\tilde{Q}(z,v)=1+vQ(z,v).    (4.2)

For reasons which do not seem obvious, the algebraic form of the generating functions associated with a 1-dependent indicator sequence is typically simpler when they are shifted like this. The shifted generating functions of some selected models are shown in the table below, with detailed explanations later.

There are some benefits to using the shifted generating functions. Firstly, they are simpler, especially in the Eulerian case; secondly, for 2-block factors, the shifted generating functions are actually 'standard' in combinatorics, since $n$ is then the length of the background sequence; thirdly, the formulae in Theorem 2.1 and Corollary 2.2 are also slightly simplified: the involution formula (2.2) becomes (see, e.g. [32, 2] for earlier occurrences, and [18, 5] where this formula was first discovered in the study of 2-block factors)

\tilde{Q}(v)=\frac{1}{\tilde{P}(-v)};\qquad\tilde{P}(v)=\frac{1}{\tilde{Q}(-v)},    (4.3)

while our main theorem (2.1) is re-written as

\tilde{Q}(z,v)=\frac{(1-z)\tilde{Q}((1-z)v)}{1-z\tilde{Q}((1-z)v)}=\frac{1-z}{\tilde{P}(-(1-z)v)-z}.    (4.4)
Table 1
Model | Shifted 0-run pgf $\tilde{Q}(v)$ | Shifted 1-run pgf $\tilde{P}(v)$ | Shifted bgf $\tilde{Q}(z,v)$
Eulerian | $e^{v}$ | $e^{v}$ | $\frac{1-z}{e^{-(1-z)v}-z}$
I.i.d. | $\frac{1+pv}{1-(1-p)v}$ | $\frac{1+v-pv}{1-pv}$ | $\frac{1+pv-pzv}{1-v+pv-pzv}$
One-pair | $\frac{1+pv}{1-v+pv-pv^{2}+p^{2}v^{2}}$ | $\frac{1+v-pv-pv^{2}+p^{2}v^{2}}{1-pv}$ | $\frac{1+pv-pvz}{1-v+pv-pv^{2}+p^{2}v^{2}-pvz+pv^{2}z-p^{2}v^{2}z}$
Carries | $\left(1-\frac{v}{b}\right)^{-b}$ | $\left(1+\frac{v}{b}\right)^{b}$ | $\frac{1-z}{\left(1-\frac{(1-z)v}{b}\right)^{b}-z}$
Flipping | $\sqrt{\frac{q}{p}}\tan\left[v\sqrt{pq}-\arctan\left(\frac{q}{p}\right)\right]$ | $\sqrt{\frac{p}{q}}\Big/\tan\left[-v\sqrt{pq}-\arctan\left(\frac{q}{p}\right)\right]$ | $\frac{(1-z)\tan\left[(1-z)v\sqrt{pq}-\arctan(q/p)\right]}{\sqrt{p/q}-z\tan\left[(1-z)v\sqrt{pq}-\arctan(q/p)\right]}$
Non-2BF | $\frac{1}{1-v+\alpha v^{2}-\beta v^{3}}$ | $1+v+\alpha v^{2}+\beta v^{3}$ | $\frac{1}{1-v+\alpha(1-z)v^{2}-\beta(1-z)^{2}v^{3}}$
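
As a sanity check on the table, the following sketch (our own check, not from the paper) verifies the involution (4.3) and the second formula in (4.4) symbolically for the Eulerian and Carries rows.

```python
# A minimal sketch (our own check, not from the paper): verify the involution
# (4.3) and the shifted formula (4.4) symbolically for the Eulerian and the
# Carries rows of Table 1.
import sympy as sp

z, v, b = sp.symbols('z v b', positive=True)

def check_row(Q_shift, P_shift, Q_shift_zv):
    # Involution (4.3): Q~(v) * P~(-v) = 1.
    assert sp.simplify(Q_shift * P_shift.subs(v, -v) - 1) == 0
    # Main formula (4.4): Q~(z, v) = (1 - z) / (P~(-(1 - z) v) - z).
    rhs = (1 - z) / (P_shift.subs(v, -(1 - z)*v) - z)
    assert sp.simplify(Q_shift_zv - rhs) == 0

# Eulerian row
check_row(sp.exp(v), sp.exp(v), (1 - z)/(sp.exp(-(1 - z)*v) - z))
# Carries row (base b)
check_row((1 - v/b)**(-b), (1 + v/b)**b, (1 - z)/((1 - (1 - z)*v/b)**b - z))
print("Table 1 checks passed")
```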

We already discussed the Eulerian model in Section 3. The remaining rows of Table 1 will be discussed briefly here, row by row. Usually, we only say how one of the run probability generating functions is obtained, since the other one and the bivariate generating function can then be found easily through (4.1), (4.3) and (4.4).

Independent and identically distributed trials (I.i.d.)

The classical example of i.i.d. Bernoulli($p$) trials is of course an example of a 1-dependent sequence.

Indicator of two consecutive ones (One-pair)

Consider the simplest 2-block factor $X_{n}:=1(Y_{n}=Y_{n+1}=1)$, where $(Y_{n},n\geq 1)$ is a sequence of i.i.d. Bernoulli($p$) trials. Its 1-run probability generating function is easy to compute.

Considering the coefficient of $z^{k}v^{n}$ in its bivariate generating function gives the recursion

q_{n,k}=(1-p)q_{n-1,k}+pq_{n-1,k-1}+p(1-p)(q_{n-2,k}-q_{n-2,k-1}),\qquad n\geq 2,\ k\geq 0,    (4.5)

where $q_{n,k}:=[z^{k}v^{n}]Q(z,v)=[z^{k}v^{n+1}]\tilde{Q}(z,v)={\mathbb{P}}(S_{n}=k)$, with initial values $q_{0,0}=1$, $q_{1,0}=1-p^{2}$, $q_{1,1}=p^{2}$ and the convention $q_{n,k}=0$ for $k>n$ or $k<0$.

Setting $p=1/2$ recovers the Fibonacci sequence from its 0-run probabilities:

F_{n+2}=2^{n+1}q_{n,0}=2^{n}q_{n-1,0}+2^{n-1}q_{n-2,0}=F_{n+1}+F_{n},    (4.6)

where $F_{0}=F_{1}=1$ are the first two terms of the Fibonacci sequence we use here. This can also be interpreted as the chance of not getting two consecutive heads in a sequence of coin tosses, which has been recognized by many others, see [34, 12, 15, 26].
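
A short brute-force sketch (our own check, not from the paper) confirms the recursion (4.5) and the Fibonacci identity (4.6); the value $p=1/3$ used for the recursion check is an arbitrary choice.

```python
# A minimal sketch (our own check, not from the paper): brute-force the one-pair
# model X_n = 1(Y_n = Y_{n+1} = 1) over all binary background sequences, then
# verify the recursion (4.5) and the Fibonacci identity (4.6) for p = 1/2.
from fractions import Fraction
from itertools import product

def one_pair_dist(n, p):
    """Return [P(S_n = 0), ..., P(S_n = n)] for the one-pair 2-block factor."""
    dist = [Fraction(0)] * (n + 1)
    for y in product((0, 1), repeat=n + 1):       # background sequence Y_1..Y_{n+1}
        prob = Fraction(1)
        for yi in y:
            prob *= p if yi == 1 else 1 - p
        k = sum(y[i] == y[i + 1] == 1 for i in range(n))
        dist[k] += prob
    return dist

p = Fraction(1, 3)                                 # arbitrary illustrative p
q = {n: one_pair_dist(n, p) for n in range(7)}
get = lambda n, k: q[n][k] if 0 <= k <= n else Fraction(0)
for n in range(2, 7):                              # recursion (4.5)
    for k in range(n + 1):
        rhs = ((1 - p)*get(n - 1, k) + p*get(n - 1, k - 1)
               + p*(1 - p)*(get(n - 2, k) - get(n - 2, k - 1)))
        assert get(n, k) == rhs

fib = [1, 1]                                       # F_0 = F_1 = 1
while len(fib) < 9:
    fib.append(fib[-1] + fib[-2])
for n in range(7):                                 # Fibonacci identity (4.6)
    assert 2**(n + 1) * one_pair_dist(n, Fraction(1, 2))[0] == fib[n + 2]
print("One-pair checks passed")
```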

Carries when adding a list of digits (Carries)

Adding a list of digits using carries is discussed in [2] as a stationary 1-dependent process. This example also falls into the category of 2-block factors, with i.i.d. Uniform$(\{0,1,\cdots,b-1\})$ variables as its background sequence and $B=\{(x,y):b>x>y\geq 0\}$. Its 0-run probabilities and generating function are explicitly given in [2].

Edge flipping on the integers (Flipping)

Chung and Graham [6] introduced the following discrete time model of a random pattern in $\{0,1\}^{V}$ indexed by the vertex set $V$ of a finite simple graph $(V,E)$: pick an edge uniformly at random from $E$; the pattern is then updated by replacing its values on the two vertices joined by the picked edge with $11$ with probability $p$ and with $00$ otherwise, where the choices of edges and the updates of values on vertices are assumed to be independent of each other, for some $0<p<1$.

They offered an analysis of discrete-time edge flipping on an $n$-cycle, which can be generalized in terms of a stationary continuous-time (1-dependent indicator) process indexed by the integers.

In short, we may sample its stationary distribution in the following manner:

  • first, generate a sequence $(U_{n},n\in{\mathbb{Z}})$ of i.i.d. Uniform$(0,1)$ variables, where $U_{n}$ represents the time of the last update on the edge $\{n,n+1\}$. That is to say, if $U_{n}>U_{n-1}$, then the last update on edge $\{n,n+1\}$ happened later than the one on $\{n-1,n\}$;

  • secondly, generate a sequence $(W_{n},n\in{\mathbb{Z}})$ of i.i.d. Bernoulli$(p)$ variables, independent of $(U_{n},n\in{\mathbb{Z}})$, where $W_{n}$ indicates whether the last update on the edge $\{n,n+1\}$ was $11$ or $00$;

  • lastly,

    X_{n}:=1(U_{n}>U_{n-1})W_{n}+1(U_{n}<U_{n-1})W_{n-1},\quad n\in{\mathbb{Z}}.    (4.7)

This is apparently a stationary 2-block factor. Its shifted 0-run probability generating function is given in [6, Theorem 6].
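
The construction (4.7) is easy to simulate; the following sketch (our own illustration, not from the paper, with $p=0.3$ and the sample sizes chosen arbitrarily) draws from the stationary distribution and checks empirically that the marginal is Bernoulli($p$) and that $X_{1}$ and $X_{3}$ look independent, as 1-dependence requires.

```python
# A minimal sketch (our own illustration, not from the paper): sample the
# stationary edge-flipping indicators via the construction (4.7) and check
# empirically that the marginal is Bernoulli(p) and that X_1, X_3 look
# independent, as 1-dependence requires.
import random

def sample_flipping(n, p, rng):
    U = [rng.random() for _ in range(n + 1)]                  # last-update times of edges {i, i+1}
    W = [1 if rng.random() < p else 0 for _ in range(n + 1)]  # last update was 11 (1) or 00 (0)
    # X_i takes the W-value of whichever neighbouring edge was updated more recently.
    return [W[i] if U[i] > U[i - 1] else W[i - 1] for i in range(1, n + 1)]

rng = random.Random(0)
p, trials = 0.3, 200_000
m1 = m3 = m13 = 0
for _ in range(trials):
    X = sample_flipping(3, p, rng)
    m1 += X[0]
    m3 += X[2]
    m13 += X[0] * X[2]
print(f"P(X_1=1) ~ {m1/trials:.3f} (target {p})")
print(f"P(X_1=1, X_3=1) ~ {m13/trials:.3f} vs product {m1*m3/trials**2:.3f}")
```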

A non-2-block-factor example (Non-2BF)

Aaronson, Gilat, Keane and de Valk [1] first discovered this family of non-2-block-factor stationary 1-dependent indicator processes. In short, they forbid the appearance of three consecutive ones, hence the 1-run probability generating functions are as simple as quadratic functions. Note that not all value pairs $(\alpha,\beta)$ make this process not a 2-block factor: only some work, while some others do not yield stationary 1-dependent processes at all. See [1, Fig. 2].

5 Determinantal representation

Any indicator process can be treated as a point process by regarding the indicated events as points. It was shown in [2, Theorem 7.1] that any 1-dependent point process on a segment of ${\mathbb{Z}}$ is a determinantal process, as discussed further in [8, 9, 2].

Given a finite set ${\mathcal{X}}$, a point process on ${\mathcal{X}}$ is a probability measure ${\mathbb{P}}$ on $2^{\mathcal{X}}$. Its correlation function is the function of subsets $A\subseteq{\mathcal{X}}$ defined by $\rho(A):={\mathbb{P}}(S:S\supseteq A)$. A point process is said to be determinantal with kernel $K(x,y)$ if

\rho(A)=\det(K(x,y))_{x,y\in A},    (5.1)

where the right hand side is the determinant of the $|A|\times|A|$ matrix with $K(x,y)$ as its $(x,y)$-th entry for $x,y\in A$. Now, we may state the following theorem from [2].

Theorem 5.1 (Theorem 7.1 [2])

Every 1-dependent point process on a segment of ${\mathbb{Z}}$ is determinantal, with kernel

K(x,y)=\sum_{r=1}^{y-x+1}(-1)^{r-1}\sum_{x=l_{0}<l_{1}<\cdots<l_{r}=y+1}\prod_{k=1}^{r}\rho([l_{k-1},l_{k})\cap{\mathbb{Z}}),\quad x\leq y,    (5.2)

$K(x,y)=-1$ for $x=y+1$, and $K(x,y)=0$ for $x\geq y+2$.

In the stationary case,

\rho([a,b)\cap{\mathbb{Z}})={\mathbb{P}}(S_{b-a}=b-a)=p_{b-a},    (5.3)

where $S_{n}$ and $p_{n}$ inherit the setup of Section 1. It is then easy to see that the kernel is also stationary, i.e.

K(x,y)=:k(y-x)=K(x+c,y+c),\qquad\forall c\in{\mathbb{Z}}.    (5.4)

To better describe the kernel, consider the kernel generating function

G_{k}(v):=\sum_{n\in{\mathbb{Z}}}k(n)v^{n}=-v^{-1}+p_{1}+(p_{2}-p_{1}^{2})v+\cdots    (5.5)

Borodin, Diaconis and Fulman [2, Corollary 7.3] give the following relationship between the kernel generating function $G_{k}$ and the 1-run probability generating function $P$:

G_{k}(v)P(v)=-\frac{1}{v}.    (5.6)

One last interesting (but not hard) result [2, Theorem 4.1] is that we may write the probability of any string pattern as a determinant. Consider ${\mathcal{X}}=[n]:=\{1,2,\cdots,n\}$; then the string with exactly $k$ zeros at positions $0<w_{1}<w_{2}<\cdots<w_{k}\leq n$, which corresponds to the subset $A:=\{w_{1},w_{2},\cdots,w_{k}\}^{c}$, has probability

{\mathbb{P}}(A)=\det(p_{w_{j+1}-w_{i}-1})_{0\leq i,j\leq k},    (5.7)

where $p_{n}=0$ for all $n<-1$, $p_{0}=p_{-1}=1$, and $w_{0}=0$, $w_{k+1}=n+1$.

As shown in [2], this formula (5.7) is obtained by an application of the inclusion-exclusion formula to the correlation function. Hence, we may recover our bivariate generating function (2.1) as follows:

Corollary 5.2

The multivariate probability generating function of the indicators $I_{i}$ of location $i$, $i\in{\mathcal{X}}=[n]$, is

G_{n}({\bf z}):=G_{I_{1},I_{2},\cdots,I_{n}}(z_{1},z_{2},\cdots,z_{n})=\det(g_{i,j})_{0\leq i,j\leq n},    (5.8)

where $g_{i,j}=p_{j-i}$ for all $i-j\neq 1$, and $g_{j+1,j}=p_{-1}-z_{j+1}=1-z_{j+1}$ for $j=0,\ldots,n-1$.

Corollary 5.3

The ordinary probability generating function of the number $S_{n}$ of ones in ${\mathcal{X}}=[n]$ is

G_{S_{n}}(z)=\det(\hat{p}_{j-i})_{0\leq i,j\leq n},    (5.9)

where $\hat{p}_{k}=p_{k}$ for all $k\neq-1$ and $\hat{p}_{-1}=p_{-1}-z=1-z$.

The proof of Corollary 5.2 is quite straightforward: just multiply (5.7) by $\prod_{i\in A}z_{i}$ and then sum over all subsets $A\subseteq[n]$. Corollary 5.3 is then obvious by considering the ordinary generating function of $S_{n}=\sum_{i\in[n]}I_{i}$, i.e. setting all $z_{i}$ in (5.8) equal to $z$. We are also aware of the following known determinantal generating function formula for any 1-dependent process (not necessarily stationary):

G_{S_{n}}(z)=\det(I+(z-1)K),    (5.10)

where $I$ is the identity matrix and $K$ is the determinantal correlation kernel given by Theorem 5.1. This is not easy to simplify even in the stationary case. But in hindsight, one may check that it is essentially equivalent to (5.9). Lastly, one may also prove Theorem 2.1 by Laplace expansion of the last column of the determinant on the right of (5.9).
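
As an illustration of Corollary 5.3, the following sketch (our own check, not from the paper) evaluates the determinant (5.9) for the descents model of Section 3, where $p_{k}=1/(k+1)!$, and compares it with the probability generating function of the number of descents obtained by brute force.

```python
# A minimal sketch (our own check, not from the paper): evaluate the
# determinant of Corollary 5.3 for the descents model, where p_k = 1/(k+1)!,
# and compare it with the Eulerian distribution of Section 3.
import itertools
import sympy as sp

z = sp.symbols('z')

def pgf_det(n):
    def entry(i, j):
        if j - i == -1:
            return 1 - z                                      # \hat{p}_{-1} = 1 - z
        if j - i < -1:
            return sp.Integer(0)                              # p_k = 0 for k < -1
        return sp.Rational(1, sp.factorial(j - i + 1))        # p_k = 1/(k+1)!
    return sp.expand(sp.Matrix(n + 1, n + 1, entry).det())

def pgf_bruteforce(n):
    total = sp.Integer(0)
    for perm in itertools.permutations(range(n + 1)):
        d = sum(perm[i] > perm[i + 1] for i in range(n))
        total += z**d
    return sp.expand(total / sp.factorial(n + 1))

for n in range(1, 6):
    assert sp.expand(pgf_det(n) - pgf_bruteforce(n)) == 0
print("Determinant check passed")
```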

6 Enumeration of sequences

We have derived our main Theorem 2.1 from a probabilistic point of view, without assuming the sequences to be 2-block factors. But its enumerative Corollary 6.1 on integer-valued 2-block factors can be deduced from a combinatorial result in Goulden and Jackson [21, Section 4].

Recall from (3.1) that we defined 2-block factors on $V$ based on the background sequence $(Y_{1},Y_{2},\ldots)$ as indicators of adjacent pairs $(Y_{i},Y_{i+1})$ falling in some subset $B\subseteq V^{2}$, which is obviously a 1-dependent indicator process. For example, in Section 3, we discussed the case where $V=[0,1]$ and $B=\{(x,y):0\leq y\leq x\leq 1\}$.

Say a string on ${\mathbb{Z}}_{+}$ is a $B$-string if each adjacent pair belongs to $B$. Suppose further that the background sequence $(Y_{1},Y_{2},\ldots)$ is i.i.d. uniform on $[m]=\{1,2,\cdots,m\}$. Then the 1-run probabilities $p_{k}$ can be used to count the numbers $m_{k}$ of $B$-strings of length $k$, i.e.

m_{k}=m^{k}p_{k-1},\quad k\geq 1,    (6.1)

and $m_{0}=1$ by convention. Define the $B$-string generating function as

G(v)=\sum_{k=0}^{\infty}m_{k}v^{k}=1+\sum_{k=1}^{\infty}p_{k-1}m^{k}v^{k}=1+mvP(mv),    (6.2)

where $P(v)$ is the 1-run probability generating function. Now the following corollary follows from Theorem 2.1 by elementary algebra.

Corollary 6.1

For $V=[m]$, the number of sequences of length $n\geq 1$ on $V$ with exactly $k\in[0,n-1]$ occurrences of pairs of adjacent components in $B$ is

[z^{k}v^{n}]\,\frac{z-1}{z-G((z-1)v)}.    (6.3)
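
A small brute-force verification of Corollary 6.1 (our own check, not from the paper; the alphabet size $m=3$ and the pair set $B$ are arbitrary illustrative choices) is sketched below, using the geometric expansion $(z-1)/(z-G((z-1)v))=\big(1-\sum_{k\geq 1}m_{k}(z-1)^{k-1}v^{k}\big)^{-1}$.

```python
# A minimal sketch (our own check, not from the paper): verify Corollary 6.1
# by brute force for the alphabet V = {0, 1, 2} and an arbitrary illustrative
# choice of the pair set B.
import itertools
import sympy as sp

z, v = sp.symbols('z v')
m, N = 3, 5
B = {(0, 0), (1, 2), (2, 1)}

def count_B_strings(k):
    """Number m_k of B-strings of length k (every adjacent pair lies in B)."""
    if k == 0:
        return 1
    return sum(all(pair in B for pair in zip(s, s[1:]))
               for s in itertools.product(range(m), repeat=k))

# Expand (z-1)/(z - G((z-1)v)) of (6.3) as a geometric series 1/(1 - h) with
# h = sum_{k>=1} m_k (z-1)^{k-1} v^k; only terms with k <= N and at most N
# factors of h contribute to v-degrees <= N.
h = sum(count_B_strings(k) * (z - 1)**(k - 1) * v**k for k in range(1, N + 1))
expansion = sp.expand(sum(h**j for j in range(N + 1)))

for n in range(1, N + 1):
    for k in range(n):
        brute = sum(
            sum(pair in B for pair in zip(s, s[1:])) == k
            for s in itertools.product(range(m), repeat=n)
        )
        assert expansion.coeff(v, n).coeff(z, k) == brute
print("Corollary 6.1 check passed")
```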

To see the connection between Corollary 6.1 and [21], we need the following definition from [21]. Let the length-$k$ $B$-string enumerator be

\gamma_{k}({\bf v})=\gamma_{k}(v_{1},v_{2},\cdots):=\sum_{|\sigma|=k}\prod_{i}v_{i}^{\tau_{i}(\sigma)},    (6.4)

where the sum is over all $B$-strings $\sigma$ of length $k$, and $\tau_{i}(\sigma)$ is the number of times that the component $i\in{\mathbb{Z}}_{+}$ appears in $\sigma$, with $v_{i}$ being the counting variable associated with $i$.

Using this enumerator, the following result for enumerating sequences with a given number of occurrences of $B$ is shown in [21]:

Theorem 6.2 ([21, Corollary 4.2.12])

The number of sequences of length $n=\sum_{i}\tau_{i}$ with exactly $\tau_{i}$ $i$'s and $k\,(\leq n-1)$ occurrences of pairs of adjacent components in $B$ is

[z^{k}\prod_{i}v_{i}^{\tau_{i}}]\left\{1-\sum_{j=1}^{\infty}(z-1)^{j-1}\gamma_{j}({\bf v})\right\}^{-1}.    (6.5)

By setting $v_{1}=v_{2}=\cdots=v$, this becomes equivalent to (6.3), since $\gamma_{j}(v{\bf 1})=m_{j}v^{j}$. A more general version of Theorem 6.2, known as the Goulden-Jackson cluster theorem, allows enumeration of adjacent components of any length (not just pairs); see [20, 19].

In the case of descents, treated in Section 3, the above discussion does not apply directly, since the background sequence is continuously distributed. However, this can be circumvented by considering the ranks within the partial sequence instead of the absolute values; hence it can be treated combinatorially as in [21, Section 2.4.21] by considering only permutations of $\{1,2,\cdots,n\}$ instead of all possible sequences.

An application of Corollary 6.1 can be found in [17].

Theorem 6.3 ($a=4$: Flórez [17, Theorem 9])

The number of sequences on $V=\{0,1,\cdots,a-1\}$ of length $n+m$ with exactly $m$ occurrences of the adjacent pair $01$ is

f(n,m)=[x^{m}y^{n}]F(x,y)=[x^{m}y^{n}]\frac{1}{1-(a+x)y+y^{2}}.    (6.6)

To check this from Corollary 6.1, note first that the 1-run probability generating function is $P(v)=1+\frac{1}{a^{2}}v$, hence the $B$-string generating function is $G(v)=1+av+v^{2}$. (6.6) is then proved by the substitutions $k\to m$, $n-k\to n$, $v\to y$, $zv\to x$.

7 Comparison with other dependence structures

Table 2
Dependence structure | $Q(z,v)$ in terms of $Q(v)$ | $Q(z,v)$ in terms of $P(v)$
Stationary 1-dependent | $\frac{Q((1-z)v)}{1-zvQ((1-z)v)}$ | $\frac{P(-(1-z)v)}{1-vP(-(1-z)v)}$
Exchangeable | $\frac{Q\left(\frac{(1-z)v}{1-zv}\right)}{1-zv}$ | $\frac{P\left(-\frac{(1-z)v}{1-v}\right)}{1-v}$
Renewal | $\frac{Q(v)}{1-z(1+(v-1)Q(v))}$ | N/A
Stationary renewal | $\frac{-z+(z-v+(1-z)vQ^{\prime}(0))Q(v)}{z(v-1)+(1-z)v(Q^{\prime}(0)-1)+z(v-1)^{2}Q(v)}$ | N/A

Apart from the stationary 1-dependent processes we have discussed in this article, there are several other dependence structures whose bivariate generating functions can be written in terms of run probability generating functions. It is interesting to compare these formulas, but no current theory seems to unify them. Some examples may even belong to more than one dependence structure; see Table 2 above and the explanations below.

Exchangeable

A sequence $(X_{n},n\geq 1)$ is exchangeable iff for each $n\geq 1$ and each permutation $\pi$ of $[n]$,

(X_{\pi(1)},X_{\pi(2)},\cdots,X_{\pi(n)})\stackrel{d}{=}(X_{1},X_{2},\cdots,X_{n}).    (7.1)

Properties of exchangeable sequences were first studied by de Finetti [10]. The i.i.d. example in Section 4 is also exchangeable.
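
Since the i.i.d. example is both stationary 1-dependent and exchangeable, the first two rows of Table 2 must agree on it; the following sketch (our own check, not from the paper) confirms this symbolically.

```python
# A minimal sketch (our own check, not from the paper): for i.i.d. Bernoulli(p)
# indicators, the stationary 1-dependent formula (2.1) and the exchangeable
# formula of Table 2 give the same bivariate generating function.
import sympy as sp

z, v, p = sp.symbols('z v p')
Q = 1 / (1 - (1 - p)*v)                  # 0-run pgf of i.i.d. Bernoulli(p) trials

one_dep = Q.subs(v, (1 - z)*v) / (1 - z*v*Q.subs(v, (1 - z)*v))
exchangeable = Q.subs(v, (1 - z)*v/(1 - z*v)) / (1 - z*v)
target = 1 / (1 - (1 - p + p*z)*v)       # known binomial bivariate gf

assert sp.simplify(one_dep - target) == 0
assert sp.simplify(exchangeable - target) == 0
print("i.i.d. cross-check passed")
```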

Renewal and stationary renewal

Consider the partial sums $S_{n}=X_{1}+X_{2}+\cdots+X_{n}$ of the sequence $(X_{n},n\geq 1)$, and its inter-arrival times $T_{1}=\min\{n:S_{n}=1\}$ and $T_{k}=\min\{n:S_{n}=k\}-\min\{n:S_{n}=k-1\}$ for $k\geq 2$. We say $(X_{n},n\geq 1)$ is delayed renewal if the renewals $T_{2},T_{3},\cdots$ are i.i.d. and the delay $T_{1}$ is independent of $T_{2},T_{3},\cdots$. In particular, we say the indicator sequence is

  1. renewal conditional on $X_{0}=1$, or renewal, if the delay $T_{1}$ has the same distribution as the renewals $T_{2},T_{3},\cdots$;

  2. stationary renewal, if it is both stationary and delayed renewal.

For these sequences it is not possible to recover the bivariate generating function from the 1-run probability generating function $P(v)$, since in the delayed renewal case

P(v)=1+p_{1}(v+r_{1}v^{2}+r_{1}^{2}v^{3}+\cdots),

where $p_{1}={\mathbb{P}}(T_{1}=1)$ and $r_{1}:={\mathbb{P}}(T_{2}=1)$. This contains only the information about immediate renewals, which does not determine the distribution of $T_{1}$ or the 0-run probabilities.

The bivariate generating function formula is an indirect corollary of [27, Theorem 3.5.4]. See [14] for an introduction to renewal theory.

Three examples which appeared earlier in this article are both stationary 1-dependent and stationary renewal: the indicator of two consecutive ones in Section 4, the case $b=2$ of carries when adding a list of digits in Section 4, and the generalized Flórez example in Section 6. All three examples can be treated as indicators of a time-homogeneous Markov chain visiting a particular state.

References

  • [1] Jon Aaronson, David Gilat, Michael Keane, and Vincent de Valk, An algebraic construction of a class of one-dependent processes, Ann. Probab. 17 (1989), no. 1, 128–143. MR 972778
  • [2] Alexei Borodin, Persi Diaconis, and Jason Fulman, On adding a list of numbers (and other one-dependent determinantal processes), Bull. Amer. Math. Soc. (N.S.) 47 (2010), no. 4, 639–670. MR 2721041
  • [3] Robert M. Burton, Marc Goulet, and Ronald Meester, On 1-dependent processes and k-block factors, Ann. Probab. 21 (1993), no. 4, 2157–2168. MR 1245304
  • [4] L. Carlitz and Richard Scoville, Generalized Eulerian numbers: combinatorial applications, J. Reine Angew. Math. 265 (1974), 110–137. MR 429575
  • [5] L. Carlitz, Richard Scoville, and Theresa Vaughan, Enumeration of pairs of sequences by rises, falls and levels, Manuscripta Math. 19 (1976), no. 3, 211–243. MR 432472
  • [6] Fan Chung and Ron Graham, Edge flipping in graphs, Adv. in Appl. Math. 48 (2012), no. 1, 37–63. MR 2845506
  • [7] Louis Comtet, Advanced combinatorics, enlarged ed., D. Reidel Publishing Co., Dordrecht, 1974, The art of finite and infinite expansions. MR 0460128
  • [8] D. J. Daley and D. Vere-Jones, An introduction to the theory of point processes. Vol. I, second ed., Probability and its Applications (New York), Springer-Verlag, New York, 2003, Elementary theory and methods. MR 1950431
  • [9]  , An introduction to the theory of point processes. Vol. II, second ed., Probability and its Applications (New York), Springer, New York, 2008, General theory and structure. MR 2371524
  • [10] Bruno de Finetti, Theory of probability, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd., Chichester, 2017, A critical introductory treatment, Translated from the 1970 Italian original by Antonio Machí and Adrian Smith and with a preface by Smith, Combined edition of Vol. 1, 1974 and Vol. 2, 1975. MR 3643261
  • [11] Michael A. B. Deakin, The development of the Laplace transform, 1737–1937. I. Euler to Spitzer, 1737–1880, Arch. Hist. Exact Sci. 25 (1981), no. 4, 343–390. MR 649489
  • [12] Richard A. Epstein, The theory of gambling and statistical logic, revised ed., Academic Press [Harcourt Brace Jovanovich, Publishers], New York-London, 1977. MR 0446535
  • [13] Leonhard Euler and Christian Goldbach, Leonhard Euler und Christian Goldbach. Briefwechsel 1729–1764, Abh. Deutsch. Akad. Wiss. Berlin Kl. Philos., Gesch., Staats-, Rechts-, Wirtschaftswiss 1965 (1965), no. 1, ix+420. MR 0201260
  • [14] William Feller, An introduction to probability theory and its applications. Vol. I, John Wiley and Sons, Inc., New York; Chapman and Hall, Ltd., London, 1957, 2nd ed. MR 0088081
  • [15] Mark Finkelstein and Robert Whitley, Fibonacci numbers in coin tossing sequences, Fibonacci Quart. 16 (1978), no. 6, 539–541.
  • [16] Philippe Flajolet and Robert Sedgewick, Analytic combinatorics, Cambridge University Press, Cambridge, 2009. MR 2483235
  • [17] Rigoberto Flórez, Leandro Junes, and José L. Ramírez, Further results on paths in an n-dimensional cubic lattice, J. Integer Seq. 21 (2018), no. 1, Art. 18.1.2, 27. MR 3771670
  • [18] Ralph Fröberg, Determination of a class of Poincaré series, Math. Scand. 37 (1975), no. 1, 29–39. MR 404254
  • [19] Ira M. Gessel, An application of the Goulden-Jackson cluster theorem, 2020.
  • [20] I. P. Goulden and D. M. Jackson, An inversion theorem for cluster decompositions of sequences with distinguished subsequences, J. London Math. Soc. (2) 20 (1979), no. 3, 567–576. MR 561149
  • [21] Ian P Goulden and David M Jackson, Combinatorial enumeration, Courier Corporation, 2004.
  • [22] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Concrete mathematics, Addison-Wesley Publishing Company, Advanced Book Program, Reading, MA, 1989, A foundation for computer science. MR 1001562
  • [23] Shane G Henderson and Peter W Glynn, Regenerative steady-state simulation of discrete-event systems, ACM Transactions on Modeling and Computer Simulation (TOMACS) 11 (2001), no. 4, 313–345.
  • [24] Alexander E. Holroyd, One-dependent coloring by finitary factors, vol. 53, 2017, pp. 753–765. MR 3634273
  • [25] Alexander E. Holroyd and Thomas M. Liggett, Finitely dependent coloring, Forum Math. Pi 4 (2016), e9, 43. MR 3570073
  • [26] Ross Honsberger, A second look at the Fibonacci and Lucas numbers, Mathematical gems III. Washington, DC: Math Assoc Amer (1985).
  • [27] Jeffrey J. Hunter, Mathematical techniques of applied probability. Vol. 2, Operations Research and Industrial Engineering, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers], New York, 1983, Discrete time models: techniques and applications. MR 719019
  • [28] Hsien-Kuei Hwang, Hua-Huai Chern, and Guan-Huei Duh, An asymptotic distribution theory for Eulerian recurrences with applications, Adv. in Appl. Math. 112 (2020), 101960, 125. MR 4023911
  • [29] I. A. Ibragimov and Yu. V. Linnik, Independent and stationary sequences of random variables, Wolters-Noordhoff Publishing, Groningen, 1971, With a supplementary chapter by I. A. Ibragimov and V. V. Petrov, Translation from the Russian edited by J. F. C. Kingman. MR 0322926
  • [30] U. Kamps, Chebyshev polynomials and least squares estimation based on one-dependent random variables, Linear Algebra Appl. 112 (1989), 217–230. MR 976339
  • [31] Pierre Simon Laplace, Théorie analytique des probabilités, Courcier, 1820.
  • [32] I. G. Macdonald, Symmetric functions and Hall polynomials, second ed., Oxford Mathematical Monographs, The Clarendon Press, Oxford University Press, New York, 1995, With contributions by A. Zelevinsky, Oxford Science Publications. MR 1354144
  • [33] Percy A. MacMahon, Combinatory analysis. Vol. I, II (bound in one volume), Dover Phoenix Editions, Dover Publications, Inc., Mineola, NY, 2004, Reprint of An introduction to combinatory analysis (1920) and Combinatory analysis. Vol. I, II (1915, 1916). MR 2417935
  • [34] S. G. Mohanty, An r-coin tossing game and the associated partition of generalized Fibonacci numbers, Sankhyā Ser. A 29 (1967), 207–214. MR 217830
  • [35] Alexander Polishchuk and Leonid Positselski, Quadratic algebras, University Lecture Series, vol. 37, American Mathematical Society, Providence, RI, 2005. MR 2177131
  • [36] Karl Sigman, One-dependent regenerative processes and queues in continuous time, Math. Oper. Res. 15 (1990), no. 1, 175–189. MR 1038240
  • [37] D. van Dantzig, Sur la méthode des fonctions génératrices, Le Calcul des Probabilités et ses Applications. Colloques International du Centre National de la Recherche Scientifique, no. 13, Centre National de la Recherche Scientifique, Paris, 1949, pp. 29–45. MR 0032964