
Height of walks with resets, the Moran model, and the discrete Gumbel distribution

Rafik Aguech¹ (ORCID 0000-0002-4483-9356), Asma Althagafi² (ORCID 0000-0001-5499-0810) and Cyril Banderier³ (ORCID 0000-0003-0755-3022)

¹ Department of Statistics and Operations Research, King Saud Univ., Saudi Arabia, and Department of Mathematics, University of Monastir, Tunisia; https://faculty.ksu.edu.sa/en/raguech
² Department of Statistics and Operations Research, King Saud Univ., Saudi Arabia; https://www.researchgate.net/profile/Asma-Althagafi
³ Laboratoire d’Informatique de Paris Nord, Univ. Sorbonne Paris Nord, France; http://lipn.fr/ banderier

Abstract.

In this article, we consider several models of random walks in one or several dimensions, additionally allowing, at any unit of time, a reset (or “catastrophe”) of the walk with probability $q$ . We establish the distribution of the final altitude. We prove algebraicity of the generating functions of walks of bounded height $h$ (showing in passing the equivalence between Lagrange interpolation and the kernel method). To get these generating functions, our approach offers an algorithm of cost $O(1)$ , instead of the cost $O(h^{3})$ of a Markov chain approach. The simplest nontrivial model corresponds to famous dynamics in population genetics: the Moran model.

We prove that the height of these Moran walks asymptotically follows a discrete Gumbel distribution. For $q=1/2$ , this generalizes a model of carry propagation over binary numbers considered e.g. by von Neumann and Knuth. For generic $q$ , using a Mellin transform approach, we show that the asymptotic height exhibits fluctuations for which we get an explicit description (and, in passing, new bounds for the digamma function). We end by showing how to solve multidimensional generalizations of these walks (where any subset of particles is attributed a different probability of dying) and we give an application to the soliton wave model.

Key words and phrases:

Random walks, renewal process, Moran model, analytic combinatorics, discrete Gumbel distribution, Mellin transform, kernel method, digamma function

1. Introduction

The height of random walks is a fundamental parameter which occurs in many domains: in computer science (evolution of a stack, tree traversals, or cache algorithms [39]), in reliability or failure theory (maximal age of a component and inference statistics on the longevity before replacement [24]), in queueing theory (maximal length of the queue, with e.g. applications to traffic jam analysis [37]), in mathematical finance (e.g. in risk theory [28]), in bioinformatics (pattern matching and sequence alignment [2]), etc.

In combinatorics, random walks are studied via the corresponding notion of lattice paths, which play a central role, not only for intrinsic properties of such paths, but also as they are in bijection with many fundamental structures (trees, words, maps, …). We refer to the nice magnum opus of Flajolet and Sedgewick on analytic combinatorics [22] for many enumerative and asymptotic examples.

While the behavior of an extremal parameter such as the height is well understood for walks corresponding to Brownian motion theory, it becomes more subtle when a notion of reset/renewal/resetting/catastrophe [8, 14, 33, 9, 29, 42, 40] is introduced in the model: indeed, typical behaviors in this model are often established by conditioning on events of probability zero in the model without reset, leading to possibly counterintuitive results.

In this article, we give several enumerative and asymptotic results on different statistics (final altitude, waiting time, height) of walks with resets, focusing on the so-called Moran walks (walks related to biological/population models considered by Moran in 1958; see Section 5 for more on this).

Plan of the article.

In Section 2, we consider a generic model of walks with resets (allowing any finite set of steps and a reset step). We describe the behavior of their final altitude (at finite time, and asymptotically). We obtain an algebraic closed form for the bivariate generating function (length/final altitude) for walks of bounded height $h$ . Our approach uses a variant of the so-called kernel method, which has the advantage of avoiding any case-by-case computation based on Markov chains/transfer matrices of size $h\times h$ . In passing, we show the intimate link between Lagrange interpolation and the kernel method.

In Section 3, we consider Moran walks, a model described in Figure 1, for which we generalize an enumerative formula due to Pippenger [45]. We show that their height asymptotically follows a distribution which involves non-trivial fluctuations. We prove that this distribution is a discrete Gumbel distribution, and we clarify its links with the continuous Gumbel distribution. We give an application to the waiting time for reaching any given altitude.

In Section 4, we begin with a brief presentation of the Mellin transform method, and then use it to derive a precise analysis of the asymptotic average and variance of the height. The second asymptotic term involves some $O(1)$ fluctuations given by a Fourier series (which we prove to be infinitely differentiable, and for which we also derive generic bounds of independent interest). This extends earlier analyses by von Neumann, Knuth, Flajolet and Sedgewick [13, 38, 22] (and fixes some error terms therein).

In Section 5, we tackle some multidimensional generalizations of Moran walks, with applications to a model in population genetics and to a wave propagation model (a soliton model), as considered by Itoh, Mahmoud, and Takahashi in [35, 34].

In Section 6, we conclude with a few possible extensions for future work.

Figure 1. A Moran walk is a random walk which makes a jump $+1$ with probability $p$ , and a reset (a jump to 0) with probability $1-p$ . Above, one sees such a walk of length $n=30$ . Its final altitude is $Y_{n}=1$ , the height is $H_{n}=5$ (reached twice, in red), having 7 resets (the 7 blue dots). In this article, we tackle the enumeration and asymptotics of such paths (and of generalizations involving more general step sets and higher dimension). We also prove that this simple model of walks leads to some noteworthy nontrivial asymptotic behavior of their height $H_{n}$ .

2. Walks with resets: final altitude and height

We consider walks with steps in $\mathcal{S}$ (where $\mathcal{S}$ is a nonempty finite subset of $\mathbb{Z}$ ), which can additionally have a reset at any altitude. That is, we have the following process on $\mathbb{Z}$ :

Y_{0}=0,\qquad Y_{n+1}=\begin{cases}Y_{n}+k,&\text{with probability }p_{k}\quad\text{(for each $k\in\mathbb{Z}$, with $p_{k}:=0$ if $k\not\in\mathcal{S}$)},\\ 0,&\text{with probability }q\quad\text{(with $q+\sum_{k\in\mathcal{S}}p_{k}=1$)}.\end{cases}

(So if $Y_{n}=0$ we have $Y_{n+1}=0$ with probability $p_{0}+q$ .)
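To make the dynamics concrete, here is a minimal simulation sketch of the process defined above. The function name `simulate_walk_with_resets`, its parameters, and the dictionary encoding of the step probabilities are illustrative choices of this sketch, not notation from the article.

```python
import random

def simulate_walk_with_resets(steps, q, n, rng=None):
    """Simulate n steps of the walk; `steps` maps each jump k to its probability p_k.

    Hypothetical helper: it assumes q + sum(steps.values()) == 1.
    Returns the list [Y_0, Y_1, ..., Y_n].
    """
    if rng is None:
        rng = random.Random()
    # The sentinel None encodes the reset event of probability q.
    outcomes = list(steps) + [None]
    weights = [steps[k] for k in steps] + [q]
    path = [0]
    for _ in range(n):
        move = rng.choices(outcomes, weights=weights)[0]
        path.append(0 if move is None else path[-1] + move)
    return path

# A Moran walk (single step +1) with p = 0.7 and q = 0.3.
path = simulate_walk_with_resets({1: 0.7}, q=0.3, n=30, rng=random.Random(2024))
final_altitude, height = path[-1], max(path)
```

The final altitude $Y_{n}$ is the last entry of the path, and the height $H_{n}$ is its maximum.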

Thus, $Y_{n}$ is the altitude of the process after $n$ steps and $H_{n}:=\max(Y_{0},\dots,Y_{n})$ is its height. It is convenient to encode the steps and their probabilities by the Laurent polynomial

P(u):=\sum_{k=c}^{d}p_{k}u^{k}\text{\qquad(with $c:=\min{\mathcal{S}}$ and $d:=\max{\mathcal{S}}$)}.

(1)

We assume $0<q<1$ to avoid degenerate cases. We do not require that $c<0$ or $d>0$ . Of course, if $c\geq 0$ , the walk will live by design in $\mathbb{N}$ (it is e.g. the case for Moran walks of Figure 1). In Section 2.1, we determine the distribution of the final altitude (as illustrated in Figure 2 for different families of steps) and we investigate the height in Section 2.2.

2.1. Final altitude $Y_{n}$

Let us start with a simple result which paves the way for the more subtle generating function manipulations for the height that we tackle later in Section 2.2.

We use the classical convenient notations:

•

$[z^{n}]G(z)$ stands for the coefficient of $z^{n}$ in the power series $G(z)$ ,
•

$\partial_{u}^{j}F(z,1)$ is the $j$ -th derivative of $F(z,u)$ with respect to $u$ , evaluated at $u=1$ .

Theorem 2.1 (Final altitude at finite time).

The final altitude of walks with resets follows a discrete law with probability generating function

F(z,u)=\sum_{n\geq 0}\mathbb{E}[u^{Y_{n}}]z^{n}=\frac{1+qz/(1-z)}{1-zP(u)},

(2)

where $P(u)$ is the Laurent polynomial encoding the allowed steps (a finite subset of $\mathbb{Z}$ ). Equivalently, for $k\in\mathbb{Z}$ , we have

\mathbb{P}(Y_{n}=k)=[u^{k}]P(u)^{n}+q[u^{k}]\sum_{j=0}^{n-1}P(u)^{j}.

(3)

Let $\delta:=P^{\prime}(1)$ be the drift\footnote{We recall that $P(1)=1-q$ , so another convention could have been to call drift the quantity $P^{\prime}(1)/(1-q)$ , i.e., we would then condition on having no reset (instead of considering walks without reset, weighted by the initial model (1)). This alternative convention does not simplify the subsequent formulas.} of the walk without reset, and $V:=P^{\prime\prime}(1)$ its second factorial moment. The mean and the variance of the final altitude of the walk with resets are given by

\mathbb{E}[Y_{n}]=\delta/q+(1-q)^{n-1}(\delta-\delta/q),

\mathbb{V}{\rm ar}[Y_{n}]=\frac{\left(V+\delta\right)q+\delta^{2}}{q^{2}}+(1-q)^{n}\left(2\,\frac{\delta^{2}n}{(q-1)q}-\frac{V+\delta}{q}\right)-(1-q)^{2n}\frac{\delta^{2}}{q^{2}}.

For Moran walks (i.e., $P(u)=pu$ and $p=1-q$ ), the mean and the variance simplify to

\mathbb{E}[Y_{n}]=\frac{p}{q}\Big{(}1-p^{n}\Big{)}\text{\quad and \quad}\mathbb{V}{\rm ar}[Y_{n}]=\frac{p}{q^{2}}\Big{(}1-p^{n}\big{(}p^{n+1}+(1+2n)q\big{)}\Big{)}.

Proof 2.2.

The probability generating function can be written as

F(z,u)=\sum_{n\geq 0}\left(\sum_{k\in\mathbb{Z}}\mathbb{P}(Y_{n}=k)u^{k}\right)z^{n}=\sum_{n\geq 0}f_{n}(u)z^{n},

where the $f_{n}(u)$ ’s are Laurent polynomials encoding the location of the walk at time $n$ ; thus we have $f_{n+1}(u)=P(u)f_{n}(u)+qf_{n}(1)$ , with $f_{0}(u)=1$ . Multiplying both sides of this recurrence by $z^{n+1}$ , and summing over $n$ , one gets

F(z,u)(1-zP(u))=1+qzF(z,1).

As $F(z,1)=1/(1-z)$ , one obtains Formula (2). Note that the generating function can also be obtained by using a regular expression encoding these walks (by factorizing the walk into factors ending with a reset): $({\mathcal{S}}^{*}q)^{*}(\mathcal{S})^{*}$ , which translates to

F(z,u)=\frac{1}{1-qz\frac{1}{1-zP(1)}}\frac{1}{1-zP(u)},

where the occurrences of $P(1)$ and $P(u)$ reflect that only the altitudes after the last reset contribute to the final altitude of the full walk. Using $P(1)=1-q$ , we get Formula (2).

The mean of $Y_{n}$ is then obtained via $\mu_{n}:=\mathbb{E}[Y_{n}]=[z^{n}]\partial_{u}F(z,1)$ , while its variance is obtained via a second-order derivative: $\mathbb{V}{\rm ar}[Y_{n}]=[z^{n}]\partial_{u}^{2}F(z,1)+\mu_{n}-\mu_{n}^{2}$ .
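The recurrence $f_{n+1}(u)=P(u)f_{n}(u)+qf_{n}(1)$ used in this proof can be iterated directly, which gives an independent check of the mean and variance stated in Theorem 2.1. The following sketch uses exact rational arithmetic; the function names and the dictionary encoding of $P(u)$ are assumptions of this illustration.

```python
from fractions import Fraction

def altitude_distribution(steps, q, n):
    """Return {k: P(Y_n = k)} by iterating f_{n+1}(u) = P(u) f_n(u) + q f_n(1)."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = {0: q * sum(dist.values())}        # reset: all the mass is sent to altitude 0
        for k, mass in dist.items():
            for jump, p in steps.items():        # regular step of size `jump`
                new[k + jump] = new.get(k + jump, 0) + p * mass
        dist = new
    return dist

def mean_and_variance_formulas(steps, q, n):
    """Closed forms of Theorem 2.1 (valid as written for n >= 1)."""
    delta = sum(k * p for k, p in steps.items())          # drift P'(1)
    V = sum(k * (k - 1) * p for k, p in steps.items())    # second factorial moment P''(1)
    a = 1 - q
    mean = delta / q + a ** (n - 1) * (delta - delta / q)
    var = (((V + delta) * q + delta ** 2) / q ** 2
           + a ** n * (2 * delta ** 2 * n / ((q - 1) * q) - (V + delta) / q)
           - a ** (2 * n) * delta ** 2 / q ** 2)
    return mean, var

# Example with P(u) = u/3 + u^2/2, hence q = 1/6.
steps, q, n = {1: Fraction(1, 3), 2: Fraction(1, 2)}, Fraction(1, 6), 10
dist = altitude_distribution(steps, q, n)
mean = sum(k * pr for k, pr in dist.items())
variance = sum(k * k * pr for k, pr in dist.items()) - mean ** 2
assert (mean, variance) == mean_and_variance_formulas(steps, q, n)
```

Since the computation is exact, the final assertion compares the two sides as rational numbers rather than up to rounding.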

We can now establish the corresponding limit distribution.

Theorem 2.3 (Final altitude: asymptotics).

Consider walks with $0\not\in\mathcal{S}$ , $\gcd\mathcal{S}=1$ , and ${d=\max\mathcal{S}>0}$ (these three constraints bring no loss of generality\footnote{Indeed, if the walk has a periodic support (i.e., if $\gcd(\mathcal{S})=g$ with $g>1$ ), we rescale the step set $\mathcal{S}$ by dividing each step by $g$ . Now, if $\max\mathcal{S}<0$ , then we multiply each step by $-1$ . Last, if $0\in\mathcal{S}$ , we consider instead the equivalent model $\mathcal{S}:=\mathcal{S}\setminus\{0\}$ and $q:=q+p_{0}$ .}). Therefore the support of the walk is either $\mathbb{Z}$ (with all altitudes being reachable), or $\mathbb{N}$ (with a finite set of altitudes impossible to reach, known as the unreachable set in the coin-exchange problem of Frobenius). The final altitude of these walks with resets behaves asymptotically according to these two cases.

a)

For walks with $\min{\mathcal{S}}\geq 0$ , we have for $k\in\mathbb{N}$ (not in the Frobenius unreachable set):

$q\cdot(\min_{i\in\mathcal{S}}p_{i})^{k}\leq\lim_{n}\mathbb{P}(Y_{n}=k)\leq q\cdot(\max_{i\in\mathcal{S}}p_{i})^{k/d}.$ (4)

In particular, for Moran walks, we have $\mathbb{P}(Y_{n}=k)=qp^{k}$ for $0\leq k<n$ and $\mathbb{P}(Y_{n}=n)=p^{n}$ so $\lim Y_{n}=\operatorname{Geom}(q)-1$ .

b)

For walks with $\min\mathcal{S}<0$ and $\max\mathcal{S}>0$ , we have for $k\in\mathbb{Z}$ :

\mathbb{P}(Y_{n}=k)=qW_{k}(1-q)+(1-q)\frac{1}{\tau^{k+1}}\frac{1}{\sqrt{2\pi nP^{\prime\prime}(\tau)}}+O\left(\frac{1}{n}\right).

Moreover, both in Case a) and in Case b), $\mathbb{P}(Y_{n}=k)$ has a geometric decay for large $k$ .

Proof 2.4.

In Case a), we have $\min\mathcal{S}\geq 1$ ; the definition of $P(u)$ in (1) then entails $[u^{k}]P(u)^{j}=0$ for $j>k$ . The limit of Equation (3) thus gives

\lim_{n\rightarrow+\infty}\mathbb{P}(Y_{n}=k)=q[u^{k}]\sum_{j=0}^{k}P(u)^{j}.

(5)

In particular, when it is not $0$ , this quantity is lower bounded by $q\cdot(\min_{i\in\mathcal{S}}p_{i})^{k}$ and upper bounded by $q\cdot(\max_{i\in\mathcal{S}}p_{i})^{k/d}$ , and therefore decreases geometrically.

In Case b), the proof is more complicated and will recycle ingredients of the asymptotics of walks without reset. To this aim, first set $\widetilde{P}(u):=P(u)/P(1)$ , i.e., the step set probabilities are renormalized to have global mass $\widetilde{P}(1)=1$ . Let $W_{k}(z)$ be the probability generating function of walks without reset, i.e., $W_{k}(z)=[u^{k}]\frac{1}{1-z\widetilde{P}(u)}=\sum_{n\geq 0}w_{n,k}z^{n}$ . We then rewrite Equation (3) as

\begin{aligned}\mathbb{P}(Y_{n}=k)&=P(1)^{n}[u^{k}]\widetilde{P}(u)^{n}+q[u^{k}]\sum_{j=0}^{n-1}P(1)^{j}\widetilde{P}(u)^{j}\\ &=(1-q)P(1)^{n}w_{n,k}+q\sum_{j=0}^{n}P(1)^{j}w_{j,k}\\ &=(1-q)P(1)^{n}w_{n,k}+qP(1)^{n}[z^{n}]\frac{1}{1-z/P(1)}W_{k}(z).\end{aligned}

(6)

If $\min{\mathcal{S}}<0$ and $\max{\mathcal{S}}>0$ , then there is a unique real $\tau>0$ such that $\widetilde{P}^{\prime}(\tau)=0$ . It is proven in [5] that $\rho=1/\widetilde{P}(\tau)$ is the radius of convergence of $W_{k}(z)$ and that $w_{n,k}\sim\tau^{-k}C\widetilde{P}(\tau)^{n}/\sqrt{2\pi n}$ , where $C:=\frac{1}{\tau}\sqrt{\widetilde{P}(\tau)/\widetilde{P}^{\prime\prime}(\tau)}$ .

Note that, as we have a probability generating function, we have $\rho=\widetilde{P}(\tau)=1$ . The asymptotics of (6) then follows by singularity analysis, as $1/(1-z/P(1))$ is singular at $z=P(1)=1-q$ , that is, before $W_{k}(z)$ which is singular at $z=1$ :

\mathbb{P}(Y_{n}=k)=qW_{k}(1-q)+(1-q)\tau^{-k}C\frac{P(\tau)^{n}}{\sqrt{2\pi n}}+O\left(\frac{1}{n}\right).

(7)

Note that Formulas (10) and (11) in [5, Theorem 1] give a closed form for $W_{k}(z)$ . It implies in particular

0<W_{k}(1-q)<(1-q)(c+d)C_{1}/C_{2}^{|k|+1},

(8)

where $C_{1}>0$ and $C_{2}>1$ are constants independent of $k$ ; thus $W_{k}(1-q)$ decays geometrically for $k\rightarrow\pm\infty$ . This concludes our analysis of Case b) and gives the theorem.

These limiting behaviors are thus in sharp contrast with the asymptotic behavior of the final altitude of walks on $\mathbb{Z}$ with no resets, which is $\delta n\pm O(\sqrt{n})$ , with fluctuations given by a continuous distribution (Rayleigh or Gaussian; see [5]).

2.2. The height $H_{n}$

In order to study the height of these walks with resets, one considers the subset of walks whose height is at most $h$ . We want to obtain an explicit formula for their generating function

F^{\leq h}(z,u):=\sum_{n=0}^{+\infty}\mathbb{E}\Big{(}u^{Y_{n}}{{\rm 1\!I}}_{\{Y_{1}\leq h,Y_{2}\leq h,\dots,Y_{n}\leq h\}}\Big{)}z^{n}.

If these walks are generated by a step set $\mathcal{S}$ having only positive jumps, a natural but naive approach to enumerate them would be to create a deterministic finite automaton (a finite discrete Markov chain) with $h$ states encoding the possible altitudes of the process. It leads to a system of linear equations which would allow us to get the corresponding rational generating function. However, this approach to obtain the generating function (given $h$ and the transition probabilities) suffers from three drawbacks:

•

it would be of complexity $O(h^{3})$ (computing determinants of $h\times h$ matrices),
•

it would be a case-by-case approach (new computations are needed for each $h$ ),
•

it would fail if the step set $\mathcal{S}$ has some negative steps (then the support of the walk is $[-\infty,+h]$ , and thus one would need an automaton with an infinite number of states).

So, we prefer here to use a more efficient approach, which relies on a powerful method (namely, the kernel method [7]): the complexity to obtain a closed-form formula for $F^{\leq h}(z,u)$ then drops\footnote{The PhD thesis of Louis Dumont [17] compares the cost of different methods to compute the coefficients of such generating functions (which can be related to diagonals of rational functions); the full analysis has to take into account the space and time complexities, and some precomputation steps, of cost of course higher than $O(1)$ , but in all cases it is more efficient than a Markov chain approach (see however Bacher [3] for a clever use of a transfer matrix point of view).} from $O(h^{3})$ to $O(1)$ for any finite step set $\mathcal{S}\subset\mathbb{Z}$ ! This leads to the following theorem.

Theorem 2.5.

Let $F^{\leq h}(z,u)$ be the probability generating function of walks on $\mathbb{Z}$ of height $\leq h$ with resets, where the length and the final altitude of the walks are respectively encoded by the exponents of $z$ and $u$ . Let $P(u)$ encode the allowed jumps as in (1). One has

F^{\leq h}(z,u)=\sum_{n=0}^{+\infty}\mathbb{E}\Big{(}u^{Y_{n}}{{\rm 1\!I}}_{\{Y_{1}\leq h,Y_{2}\leq h,\dots,Y_{n}\leq h\}}\Big{)}z^{n}=\frac{W^{\leq h}(z,u)}{1-zqW^{\leq h}(z,1)},

(9)

where

W^{\leq h}(z,u):=\frac{\displaystyle{1-\sum_{i=1}^{d}\left(\frac{u}{u_{i}}\right)^{h+1}\prod_{1\leq j\leq d,j\neq i}\frac{u_{j}-u}{u_{j}-u_{i}}}}{\displaystyle{1-zP(u)}}

(10)

is the generating function of walks of height $\leq h$ without reset, and where $u_{1},\dots,u_{d}$ are the roots of $1-zP(u)=0$ such that $\lim_{z\rightarrow 0}|u_{i}(z)|=+\infty$ .

Remark 2.6 (A rational simplification).

These generating functions are algebraic, as they rationally depend on the roots $u_{i}(z)$ , which are themselves algebraic functions. Now, when the step set $\mathcal{S}$ has only positive steps, $W^{\leq h}$ is a polynomial and $F^{\leq h}$ simplifies to a rational function (despite the fact that their closed forms (10) and (9) involve algebraic functions!). This simplification can be seen either by the automaton point of view and the Kleene theorem, or by using the Vieta formulas on Newton sums (as, when one has only positive jumps, the $u_{i}$ ’s are then all the roots of the kernel $1-zP(u)$ ). For example, for $P(u)=u/3+u^{2}/2$ and $h=3$ , we have

u_{1}(z)=\frac{-z+\sqrt{z^{2}+18z}}{3z}\text{ \qquad and \qquad}u_{2}(z)=\frac{-z-\sqrt{z^{2}+18z}}{3z}

(11)

(the Vieta formulas are here: $u_{1}(z)+u_{2}(z)=-2/3$ and $u_{1}(z)u_{2}(z)=-2/z$ ); then, the quotient (9) involving these algebraic functions $u_{1}$ and $u_{2}$ simplifies, leading to

\begin{aligned}W^{\leq 3}(z,u)&=\frac{1}{1-zP(u)}\left(1-\left(\frac{u}{u_{1}(z)}\right)^{4}\frac{u_{2}(z)-u}{u_{2}(z)-u_{1}(z)}-\left(\frac{u}{u_{2}(z)}\right)^{4}\frac{u_{1}(z)-u}{u_{1}(z)-u_{2}(z)}\right)\\ &=1+z\left(\frac{u^{2}}{2}+\frac{u}{3}\right)+z^{2}\left(\frac{u^{3}}{3}+\frac{u^{2}}{9}\right)+\frac{z^{3}u^{3}}{27},\\ F^{\leq 3}(z,u)&=\frac{1+z\left(\frac{u^{2}}{2}+\frac{u}{3}\right)+z^{2}\left(\frac{u^{3}}{3}+\frac{u^{2}}{9}\right)+\frac{z^{3}u^{3}}{27}}{1-zq\left(1+\frac{5z}{6}+\frac{4z^{2}}{9}+\frac{z^{3}}{27}\right)}.\end{aligned}
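The example of Remark 2.6 can be checked in two independent ways: by enumerating the bounded walks exactly, and by evaluating the closed form (10) numerically with the roots (11). The sketch below does both for $P(u)=u/3+u^{2}/2$ and $h=3$ ; the helper names and the evaluation point $(z,u)=(0.3,0.7)$ are arbitrary choices of this illustration.

```python
import math
from fractions import Fraction

def bounded_walk_polynomials(steps, h, n_max):
    """Coefficients of f_n^{<=h}(u) for walks without reset, as {altitude: probability}, n = 0..n_max."""
    layers = [{0: Fraction(1)}]
    for _ in range(n_max):
        new = {}
        for k, mass in layers[-1].items():
            for jump, p in steps.items():
                if k + jump <= h:                 # discard walks exceeding height h
                    new[k + jump] = new.get(k + jump, 0) + p * mass
        layers.append(new)
    return layers

steps = {1: Fraction(1, 3), 2: Fraction(1, 2)}
layers = bounded_walk_polynomials(steps, h=3, n_max=4)

def W_closed_form(z, u, h=3):
    """Formula (10) for P(u) = u/3 + u^2/2, using the two roots of (11) (real for z > 0)."""
    s = math.sqrt(z * z + 18 * z)
    roots = [(-z + s) / (3 * z), (-z - s) / (3 * z)]
    numerator = 1.0
    for i, ui in enumerate(roots):
        uj = roots[1 - i]                         # the other root (d = 2 here)
        numerator -= (u / ui) ** (h + 1) * (uj - u) / (uj - ui)
    return numerator / (1 - z * (u / 3 + u * u / 2))

def W_polynomial(z, u):
    """The same generating function, summed from the exact enumeration."""
    return sum(float(mass) * u ** k * z ** n
               for n, layer in enumerate(layers) for k, mass in layer.items())

assert abs(W_closed_form(0.3, 0.7) - W_polynomial(0.3, 0.7)) < 1e-9
```

The enumeration stops by itself: with steps $+1$ and $+2$ only, no walk of length 4 stays at height at most 3, so `layers[4]` is empty.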

Proof 2.7 (Proof of Theorem 2.5).

The probability generating function can be written as

F^{\leq h}(z,u)=\sum_{n\geq 0}f_{n}^{\leq h}(u)z^{n}=\sum_{k=0}^{h}F^{\leq h}_{k}(z)u^{k},

where $f_{n}^{\leq h}(u)$ encodes the possible values of $Y_{n}$ (constrained to be bounded by $h$ over the full process), and where

F^{\leq h}_{k}(z)=\sum_{n=0}^{+\infty}f_{n,k}^{\leq h}z^{n}=\sum_{n=0}^{+\infty}\mathbb{P}\Big{(}Y_{1}\leq h,\,Y_{2}\leq h,\dots,Y_{n-1}\leq h,\,Y_{n}=k\leq h\Big{)}z^{n}

is the probability generating function of bounded walks ending at altitude $k$ .

The dynamics of the process then entails the recurrence

f_{n+1}^{\leq h}(u)=P(u)f_{n}^{\leq h}(u)-\{u^{>h}\}P(u)f_{n,h}^{\leq h}u^{h}+qf_{n}^{\leq h}(1),

where $\{u^{>h}\}$ extracts monomials having a degree in $u$ strictly larger than $h$ . This mimics that at time $n+1$ , either, with probability $p_{k}$ , we increase by $k$ the altitude of where we were at time $n$ (that is, we multiply by $u^{k}$ , and this is allowed as long as the walk stays at some altitude $\leq h$ , thus we removed here the cases corresponding to the walks which would reach an altitude $>h$ at time $n+1$ ); or, with probability $q$ , we have a reset to altitude 0 (i.e., all the mass of the walks at any altitude $k$ , corresponding to the coefficient of $u^{k}$ , is sent back to $u^{0}$ ; this is thus captured by the substitution $u=1$ ).

This directly translates to the functional equation

F^{\leq h}(z,u)=1+zP(u)F^{\leq h}(z,u)-\sum_{k=0}^{d-1}F_{h-k}^{\leq h}(z)u^{h-k}\left(z\sum_{j=k+1}^{d}p_{j}u^{j}\right)+zqF^{\leq h}(z,1).

Setting $q=0$ , we get the functional equation for the generating function $W^{\leq h}$ of walks of height $\leq h$ without reset:

W^{\leq h}(z,u)=1+zP(u)W^{\leq h}(z,u)-\sum_{k=0}^{d-1}W_{h-k}^{\leq h}(z)u^{h-k}\left(z\sum_{j=k+1}^{d}p_{j}u^{j}\right).

(12)

Of course, the factorization of walks with resets into $({\mathcal{S}}^{*}q)^{*}(\mathcal{S})^{*}$ entails $F^{\leq h}(z,u)=\operatorname{Seq}(W^{\leq h}(z,1)q)W^{\leq h}(z,u)$ , which is Formula (9). So if we find a closed form for $W^{\leq h}$ , we are happy as this also solves the initial problem for $F^{\leq h}$ . Now, on the right-hand side of (12), the sum for $k$ from $0$ to $d-1$ is a polynomial in $u$ , which we conveniently rewrite as

W^{\leq h}(z,u)(1-zP(u))=1-u^{h}\sum_{k=1}^{d}G_{k}(z)u^{k}.

(13)

It is possible to solve such an equation via the kernel method: the kernel is the factor $1-zP(u)$ in (13), and if one considers the equation on the variety defined by $1-zP(u)=0$ , this brings additional equations which will allow us to get a closed form for $W^{\leq h}(z,u)$ . First, observe that this kernel is a (Laurent) polynomial in $u$ of “positive” degree $d$ . Then, from an analysis of its Newton polygon, one gets that it has $d$ roots $u_{1}(z),\dots,u_{d}(z)$ such that $u_{i}(z)\approx z^{-1/d}$ for $z\sim 0^{+}$ (the other roots being convergent at $z\sim 0^{+}$ ; see [5] for more on this issue). Thus, setting $u=u_{i}(z)$ (for $i=1,\dots,d$ ) in the functional equation (13) gives $d$ new equations. Some care is required in this step: we have to check that one does not create series involving an infinite number of monomials with negative exponents\footnote{Let $R$ be the ring of series $\sum_{n\in\mathbb{Z}}a_{n}z^{n}$ . The Cauchy product of two series in $R$ is well defined only with some additional convergence conditions, and, even if we restrict ourselves to series for which the product is well defined, we have to take care of the fact that they do not form an integral domain: indeed, we have many divisors of zero (e.g. for $S(z):=\sum_{n\in\mathbb{Z}}z^{n}$ , we have $zS=S$ and thus $(z-1)S=0$ ). Most algebraic manipulations in this ring, if they are temporarily handling quantities which are not in the subring of power series (or Laurent/Puiseux/Fourier series), would lead to invalid identities in ${\mathbb{C}}[[z]]$ .}.

In fact, in our case, the substitution $u=u_{i}$ is legitimate as $W^{\leq h}(z,u_{i})$ becomes a well-defined Puiseux series in $z$ : this follows from the fact that the coefficients $f_{n}^{\leq h}(u)$ are (Laurent) polynomials with “positive” degree bounded by $h$ (and “negative” degree lower bounded by $-cn$ ), so $f_{n}^{\leq h}(u_{i}(z))$ is a Puiseux series with exponents from $-h/d$ to $+\infty$ . Then, multiplying by $z^{n}$ and summing over $n$ , only a finite number of summands contribute to each monomial of $W^{\leq h}(z,u_{i})$ , which is thus well defined. Via these substitutions $u=u_{i}$ , we obtain a linear system of $d$ equations (which only contains the $G_{k}$ ’s as unknowns). Then, by Cramer’s rule, we get $G_{k}=\det(V_{k})/\det(V)$ , where

V=\begin{pmatrix}u_{1}^{h+1}&u_{1}^{h+2}&\dots&{u_{1}}^{h+d}\\ u_{2}^{h+1}&u_{2}^{h+2}&\dots&{u_{2}}^{h+d}\\ \vdots&\vdots&\vdots&&\vdots\\ u_{d}^{h+1}&u_{d}^{h+2}&\dots&{u_{d}}^{h+d}\end{pmatrix}\text{\quad and \quad}V_{k}=\begin{pmatrix}u_{1}^{h+1}&\dots&u_{1}^{h+k-1}&1&u_{1}^{h+k+1}&\dots&{u_{1}}^{h+d}\\ u_{2}^{h+1}&\dots&u_{2}^{h+k-1}&1&u_{2}^{h+k+1}&\dots&{u_{2}}^{h+d}\\ \vdots&\vdots&\vdots&&\vdots\\ u_{d}^{h+1}&\dots&u_{d}^{h+k-1}&1&u_{d}^{h+k+1}&\dots&{u_{d}}^{h+d}\end{pmatrix},

that is, $V_{k}$ is the matrix $V$ with its $k$ -th column entries replaced by $1$ . Thus, as $V$ is a Vandermonde matrix, its determinant is

\det(V)=\left(\prod_{i=1}^{d}u_{i}^{h+1}\right)\prod_{1\leq i<j\leq d}(u_{j}-u_{i}).

(14)

Now, to compute $\det(V_{k})$ , one first proves that

\displaystyle\Delta=\det\begin{pmatrix}u_{1}^{1}&\dots&u_{1}^{k-1}&1&u_{1}^{k+1}&\dots&{u_{1}}^{d}\\ u_{2}^{1}&\dots&u_{2}^{k-1}&1&u_{2}^{k+1}&\dots&{u_{2}}^{d}\\ \vdots&&\vdots&\vdots&\vdots&&\vdots\\ u_{d}^{1}&\dots&u_{d}^{k-1}&1&u_{d}^{k+1}&\dots&{u_{d}}^{d}\\ \end{pmatrix}=(-1)^{k-1}e_{d-k}(u_{1},\dots,u_{d})\prod_{1\leq i<j\leq d}(u_{j}-u_{i}),

(15)

where we used the classical notation for the elementary symmetric polynomials:

e_{k}(x_{1},\dots,x_{d}):=[t^{k}]\prod_{i=1}^{d}(1+tx_{i}),

(16)

e.g., $e_{3}(x_{1},\dots,x_{5})=x_{1}x_{2}x_{3}+x_{1}x_{2}x_{4}+x_{1}x_{2}x_{5}+x_{1}x_{3}x_{4}+x_{1}x_{3}x_{5}+x_{1}x_{4}x_{5}+x_{2}x_{3}x_{4}+x_{2}x_{3}x_{5}+x_{2}x_{4}x_{5}+x_{3}x_{4}x_{5}$ . Formula (15) follows from two facts:

•

If $u_{i}=u_{j}$ , then two rows of the matrix are equal and thus the determinant is 0; this explains the Vandermonde product $\Pi:=\prod_{1\leq i<j\leq d}(u_{j}-u_{i})$ on the right-hand side of Formula (15).
•

Now writing the determinant as a sum over the $d!$ permutations of the entries gives a sum of monomials, each of total degree $(1+2+\dots+d)-k$ in the $u_{i}$ ’s. $\Pi$ being of total degree $\binom{d}{2}=d(d-1)/2$ , it implies that $\Delta/\Pi$ is a polynomial which is symmetric and homogeneous of total degree $d-k$ . Up to a constant factor, this polynomial has to be $e_{d-k}$ , which captures exactly the missing $u_{i}$ ’s in each of the $d!$ summands. The constant is the sign $(-1)^{k-1}$ : moving the column of $1$ ’s to the first position takes $k-1$ column transpositions and yields a Vandermonde-type determinant with exponents $0,1,\dots,k-1,k+1,\dots,d$ (for instance, for $d=k=2$ , one has $\Delta=u_{1}-u_{2}=-\Pi$ ).

Then, performing a Laplace expansion of $\det(V_{k})$ on its $k$ -th column and using Formula (15), one gets (after simplification in the Cramer formula):

G_{k}(z)=\sum_{\ell=1}^{d}u_{\ell}^{-h-1}(-1)^{k+d}e_{d-k}(u_{1},\dots,u_{d})_{|u_{\ell}=0}\prod_{\begin{subarray}{c}1\leq j\leq d\\ j\neq\ell\end{subarray}}\frac{1}{u_{\ell}-u_{j}}.

(17)

Now, using $\sum_{k=0}^{d}(-1)^{d-k}e_{d-k}(u_{1},\dots,u_{d})u^{k}=\prod_{i=1}^{d}(u-u_{i})$ (which is equivalent to the definition (16)), and regrouping the powers $u_{k}^{-h-1}$ , we get

\sum_{k=1}^{d}G_{k}(z)u^{k-1}=\sum_{k=1}^{d}u_{k}^{-h-1}\prod_{1\leq j\leq d,j\neq k}\frac{u_{j}-u}{u_{j}-u_{k}}.

(18)

Combining Equations (18) and (13), we get Formula (10) for $W^{\leq h}(z,u)$ , and thus the closed form for $F^{\leq h}(z,u)$ .

Remark 2.8 (Link with Lagrange interpolation).

As we know the evaluation of the right-hand side of (13) in each of the $u_{k}$ , Formula (18) is also equivalent to the Lagrange interpolation formula (which we thus reproved en passant). Moreover, this Lagrange interpolation approach offers a nice advantage: it circumvents the fact that the factorization argument used to get the closed forms for the generating functions in [12, 5] works only if the walks start at altitude 0.

Now, if we go back to Moran walks (i.e., for $P(u)=pu$ ; see Figure 1), the generating function simplifies to the following noteworthy shape.

Corollary 2.9.

The probability generating function of Moran walks of height $\leq h$ is

\displaystyle F^{\leq h}(z,u)=\frac{(1-pz)(1-(pzu)^{h+1})}{(1-puz)(1-z+(pz)^{h+1}zq)},

(19)

where, in the power series, the length and the final altitude of the walks are respectively encoded by the exponents of $z$ and $u$ . Accordingly,

\begin{align}\mathbb{P}(H_{n}\leq h)&=[z^{n}]F^{\leq h}(z,1)=[z^{n}]\frac{1-(pz)^{h+1}}{1-z+(pz)^{h+1}zq}\tag{20}\\ &=\sum_{k=0}^{\left\lfloor\frac{n}{h+1}\right\rfloor}(-qp^{h+1})^{k}\left(\binom{n-k(h+1)}{k}-p^{h+1}\binom{n-(k+1)(h+1)}{k}\right),\tag{21}\end{align}

with the convention that $\binom{m}{k}=0$ if $m<0$ .
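As a sanity check of the binomial sum (21), the following sketch compares it, in exact arithmetic, with a direct propagation of the altitude distribution on $\{0,\dots,h\}$ . The function names are hypothetical, and the wrapper `binom` merely implements the convention $\binom{m}{k}=0$ for $m<0$ stated above.

```python
from fractions import Fraction
from math import comb

def binom(m, k):
    """Binomial coefficient with the convention binom(m, k) = 0 for m < 0."""
    return comb(m, k) if m >= 0 else 0

def prob_height_at_most(n, h, p):
    """P(H_n <= h) for Moran walks via the binomial sum (21)."""
    q = 1 - p
    total = 0
    for k in range(n // (h + 1) + 1):
        total += (-q * p ** (h + 1)) ** k * (
            binom(n - k * (h + 1), k)
            - p ** (h + 1) * binom(n - (k + 1) * (h + 1), k))
    return total

def prob_height_at_most_dp(n, h, p):
    """Same probability, by propagating the mass of walks staying at altitude <= h."""
    q = 1 - p
    dist = [Fraction(1)] + [Fraction(0)] * h
    for _ in range(n):
        # Reset sends everything to 0; a step +1 from altitude h is discarded.
        dist = [q * sum(dist)] + [p * mass for mass in dist[:-1]]
    return sum(dist)

p = Fraction(1, 3)
assert all(prob_height_at_most(n, h, p) == prob_height_at_most_dp(n, h, p)
           for n in range(16) for h in range(6))
```

The second function is the naive approach with $h+1$ states; the first one needs only about $n/(h+1)$ terms.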

Proof 2.10.

The closed form (21) is obtained via the power series expansion $1/(1-T)=\sum_{j\geq 0}T^{j}$ , by applying the binomial theorem to each term $T^{j}$ , with $T=z-(pz)^{h+1}zq$ .

The binomial sum (21) generalizes a formula obtained (for $p=1/2$ ) by Pippenger in [45]. Therein, it is derived by an inclusion-exclusion principle (guided by the combinatorics of the carry propagation in binary words); for his problem, the generating function, and thus the corresponding binomial sum, are a little bit simpler than (20) and (21), and are then used to perform some real analysis for the asymptotics of the expected length.

In our case, equipped with this explicit expression for the probability generating function of Moran walks of bounded height, we can now tackle the question of the asymptotic distribution of this extremal parameter.

3. Asymptotic height of Moran walks

In this section, we establish a local limit law for the distribution of the height of Moran walks. One noteworthy consequence of the explicit formula for the generating function obtained in the previous section is that it allows very efficient computations and simulations of the process at time $n$ , for large $n$ , as stressed by the following remark.

Remark 3.1 (Fast computation scheme for any given $n$ and $h$ ).

One does not need to run the process for $n$ steps to have the exact distribution of $H_{n}$ . Indeed, using the rational generating function from Corollary 2.9, for any $p$ , $h$ , and $n$ , it is possible to get the exact value of $\mathbb{P}\left(H_{n}=h\right)=[z^{n}]\left(F^{\leq h}(z,1)-F^{\leq h-1}(z,1)\right)$ in time $O(\ln(n))$ via binary exponentiation.
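One possible way to realize such a computation is sketched below; it is an illustration under stated assumptions, not necessarily the scheme the remark has in mind. The denominator of (20) shows that $a_{n}:=\mathbb{P}(H_{n}\leq h)$ satisfies $a_{n}=a_{n-1}-qp^{h+1}a_{n-h-2}$ for $n\geq h+2$ , so $a_{n}$ can be obtained by computing $x^{n}$ modulo the characteristic polynomial $x^{h+2}-x^{h+1}+qp^{h+1}$ with binary exponentiation. The function name is hypothetical.

```python
from fractions import Fraction

def prob_height_at_most_fast(n, h, p):
    """P(H_n <= h) = [z^n] (1-(pz)^(h+1)) / (1 - z + q p^(h+1) z^(h+2)), with O(log n) polynomial products."""
    q = 1 - p
    m = h + 2                                   # order of the linear recurrence
    c = q * p ** (h + 1)
    # Initial terms a_0, ..., a_{h+1}: equal to 1 up to n = h, then 1 - p^(h+1).
    init = [1] * (h + 1) + [1 - p ** (h + 1)]
    if n < m:
        return init[n]

    def mulmod(A, B):
        """Product of two polynomials modulo x^m - x^(m-1) + c."""
        prod = [0] * (2 * m - 1)
        for i, a in enumerate(A):
            if a:
                for j, b in enumerate(B):
                    prod[i + j] += a * b
        for i in range(2 * m - 2, m - 1, -1):   # reduce with x^m = x^(m-1) - c
            t = prod[i]
            prod[i - 1] += t
            prod[i - m] -= t * c
        return prod[:m]

    result = [1] + [0] * (m - 1)                # the polynomial 1
    base = [0, 1] + [0] * (m - 2)               # the polynomial x
    e = n
    while e:
        if e & 1:
            result = mulmod(result, base)
        base = mulmod(base, base)
        e >>= 1
    # x^n = sum_i result[i] x^i modulo the characteristic polynomial,
    # hence a_n = sum_i result[i] a_i.
    return sum(r * a for r, a in zip(result, init))

p = Fraction(1, 2)
prob_H_equals_10 = prob_height_at_most_fast(1000, 10, p) - prob_height_at_most_fast(1000, 9, p)
```

Each modular product costs $O(h^{2})$ arithmetic operations in this naive form, so the number of operations is logarithmic in $n$ for fixed $h$ ; with exact rationals, the size of the numbers grows with $n$ , which this operation count ignores.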

This allows us to plot the distribution of $H_{n}$ for quite large values of $n$ (as an example, see Figure 3). Note that for our other generating functions, which are algebraic, there exists a fast algorithm of cost $\sqrt{n}\ln(n)$ to compute their $n$ -th coefficient (this algorithm works more generally for all D-finite functions). This algorithm, due to the Chudnovsky brothers, is e.g. implemented in the Maple computer algebra system via the package Gfun; see [49].

3.1. Localization of the dominant singularity

As $F^{\leq h}(z,1)$ (as given by Equation (19)) is a rational function, all its singularities are poles. The asymptotic behavior of the coefficients of $F^{\leq h}(z,1)$ is governed by the closest pole(s) to zero (also called “dominant singularities” of $F^{\leq h}$ ). A natural candidate for being such a dominant singularity of $F^{\leq h}(z,1)$ would be $z=1/p$ , but it is in fact a removable singularity, as one has (e.g. via L’Hôpital’s rule) $F^{\leq h}(1/p,1)=\frac{p(h+1)}{2p-1-qh}$ . Thus, we can focus on the other roots of the denominator $D(z)$ of $F^{\leq h}(z,1)$ .
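The removable singularity at $z=1/p$ can be observed numerically: evaluating the rational function slightly away from $1/p$ gives a finite value close to the one stated above. The sketch below uses floating-point arithmetic; the function name and the parameters $p=1/2$ , $h=5$ are arbitrary choices of this illustration.

```python
def F_height(z, h, p):
    """F^{<=h}(z, 1) for Moran walks, i.e. Formula (19) at u = 1."""
    q = 1 - p
    return (1 - (p * z) ** (h + 1)) / (1 - z + q * p ** (h + 1) * z ** (h + 2))

p, h = 0.5, 5
value_at_singularity = p * (h + 1) / (2 * p - 1 - (1 - p) * h)   # value given by L'Hopital's rule
nearby_value = F_height(1 / p + 1e-6, h, p)
assert abs(nearby_value - value_at_singularity) < 1e-3
```

The value is negative here, which is not contradictory: $z=1/p$ lies beyond the radius of convergence of the power series, so the function is evaluated through its rational expression only.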

Lemma 3.2 (Localization of the singularities of $F^{\leq h}$ ).

For $p\in(0,1)$ , the $h+2$ roots $z_{1}(h),\dots,z_{h+2}(h)$ of $D(z)=1-z+qp^{h+1}z^{h+2}$ are such that we have for $h$ large enough:

(i)

$z_{1}(h)$ is the unique root strictly between 1 and $1/p$ ;
(ii)

$z_{2}(h)=1/p$ is the unique root of modulus $1/p$ ;
(iii)

the remaining $h$ roots $z_{3}(h),\dots,z_{h+2}(h)$ are all of modulus $>1/p$ , and arbitrarily close (in modulus) to $1/p$ (for $h\rightarrow+\infty$ );
(iv)

all the roots are simple.

Accordingly, $z_{1}(h)$ is the dominant singularity of $F^{\leq h}(z,1)$ .

Proof 3.3.

Let $z_{*}(h)$ be the unique positive zero of $D^{\prime}(z)=-1+(h+2)qp^{h+1}z^{h+1}$ given by

z_{*}(h)=\frac{1}{p}\left(\frac{1}{q(h+2)}\right)^{\frac{1}{h+1}}.

As $z_{*}(h)$ tends to $\frac{1}{p}$ from the left, we have $1<z_{*}(h)<1/p$ for $h$ large enough. Moreover, $D(z)$ is decreasing on the interval $[0,z_{*}(h)]$ and increasing on the interval $[z_{*}(h),+\infty)$. As $D(1/p)=0$, one thus has $D(z_{*}(h))<0$. And since $D(1)>0$, the intermediate value theorem implies the existence of (at least) one zero of $D$ between $1$ and $z_{*}(h)$. Combined with the monotonicity of $D$ on each of these two intervals, this entails the uniqueness of this zero; let us call it $z_{1}(h)$. Then, Pringsheim’s theorem (see e.g. [22]) asserts that $F^{\leq h}(z,1)$ has a real positive dominant singularity, which is thus $z_{1}(h)$, the first real positive zero of $D$. As $F^{\leq h}(z,1)$ is a probability generating function, all its singularities are of modulus ${\geq 1}$. So we have ${1<z_{1}(h)<z_{*}(h)<1/p}$, which proves (i).

We now prove (ii). The fact that $z_{2}(h)=1/p$ is a root follows from $1-1/p+q/p=0$ . Is there any other root of the same modulus? If $z=\exp(i\theta)/p$ (with $\theta\in[0,2\pi]$ ) would be a root of $D(z)$ , then this would imply $p=\exp(i\theta)-q\exp(i(h+2)\theta)$ . By the reverse triangle inequality $\Big{|}|x|-|y|\Big{|}\leq|x-y|$ (with equality only if $xy=0$ or $x/y\in\mathbb{R}^{+}$ ), this would entail $\theta=0$ .

To prove (iii), we use the following version of Rouché’s theorem: if $|D-g|<|g|$ on the boundary of a disk $\mathcal{D}$ , then $D$ and $g$ have the same number of roots inside $\mathcal{D}$ . We can apply this theorem to $D$ with $g(z):=1-z$ , for the disk ${\mathcal{D}}(0,\frac{1-\epsilon}{p})$ : on its boundary, one indeed has $|D(z)-g(z)|=\frac{q}{p}|pz|^{h+2}\leq\frac{q}{p}|1-\epsilon|^{h+2}<\frac{q}{p}|1-\epsilon|^{2/q}<\frac{q-\epsilon}{p}\leq|g(z)|$ , where the first strict inequality holds for $h\geq 2/q$ and the next strict inequality holds for any small enough $\epsilon$ (independently of $h$ ), as we have then $\frac{\ln(1-\epsilon/q)}{\ln(1-\epsilon)}<2/q$ . As the constraint on $h$ is independent of $\epsilon$ , letting $\epsilon\rightarrow 0$ , we infer that $D$ has only one root strictly inside ${\mathcal{D}}(0,\frac{1}{p})$ .

Now we can also apply this theorem to $D$ with $g(z):=1+z^{h+2}$ : on the boundary of the disk ${\mathcal{D}}(0,\frac{1+\epsilon}{p})$ , one indeed has, for $h$ large enough (depending on $\epsilon$ ),

|D(z)-g(z)|\leq\left(\frac{1+\varepsilon}{p}\right)^{h+2}\left(1-qp^{h+1}\right)+\frac{1+\varepsilon}{p}<\left(\frac{1+\varepsilon}{p}\right)^{h+2}-1\leq|g(z)|,

where the last $-1$ is just a crude bound of the term $-\frac{q}{p}(1+\varepsilon)^{h+2}+\frac{1+\varepsilon}{p}$, which converges to $-\infty$ for $h\rightarrow+\infty$. So $D$, like $g$, has $h+2$ roots inside this disk.

To prove (iv), note that the equation $D(z)=D^{\prime}(z)=0$ forces $z=1+\frac{1}{h+1}$, but $D^{\prime}(1+\frac{1}{h+1})\rightarrow-1$ for $h\rightarrow+\infty$; therefore all the zeros are simple for $h$ large enough.

See Figure 5 for an illustration of the location of the roots.
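The localization of the dominant root can also be observed numerically. The following Python sketch (an illustration with hypothetical function names and arbitrary parameter values, not part of the proof) finds $z_{1}(h)$ by bisection on $(1,z_{*}(h))$, where $D$ changes sign, and checks that $1/p$ is a root of $D$.

```python
def D(z, h, p):
    """Denominator D(z) = 1 - z + q p^(h+1) z^(h+2) of F^{<=h}(z,1)."""
    return 1.0 - z + (1.0 - p) * p ** (h + 1) * z ** (h + 2)

def z_star(h, p):
    """Unique positive zero of D'(z) = -1 + (h+2) q p^(h+1) z^(h+1)."""
    return (1.0 / p) * (1.0 / ((1.0 - p) * (h + 2))) ** (1.0 / (h + 1))

def dominant_root(h, p, iterations=200):
    """Bisection for the zero z_1(h) of D in (1, z_star).
    Assumes h is large enough so that D(1) > 0 > D(z_star)."""
    lo, hi = 1.0, z_star(h, p)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if D(mid, h, p) > 0:
            lo = mid      # D is decreasing on [0, z_star]: the zero is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

h, p = 10, 0.5            # illustrative values
z1 = dominant_root(h, p)
print(z1, z_star(h, p), 1.0 / p)
# distance between z_1 and its first-order approximation 1 + q p^(h+1)
print(z1 - (1.0 + (1.0 - p) * p ** (h + 1)))
```

For these values, the output illustrates the chain of inequalities $1<z_{1}(h)<z_{*}(h)<1/p$, with $z_{1}(h)$ very close to $1+qp^{h+1}$.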

3.2. Limit distribution of the height: the discrete Gumbel distribution

The height distribution exhibits some a priori surprising asymptotic aspects, having a flavor of number theory/Diophantine approximation. Such phenomena, however, appear for a few other probabilistic processes where some statistics could have different asymptotic behaviors depending on some resonance between $\ln p$ and $\ln q$ (see e.g. Janson [36] or Flajolet, Vallée, and Roux [21] for some examples related to tries or binary search trees). In our case, it appears that a resonance between $\ln p$ and $\ln n$ plays a role.

Theorem 3.4 (Distribution of the height of Moran walks).

We have

\mathbb{P}\left(H_{n}\leq h\right)=\exp\left(-qnp^{h+1}\right)\left(1+O\left(\frac{(\ln n)^{3}}{n}\right)\right),

(22)

where the error term is uniform for $h\in[0,n]$ . Accordingly, $\mathbb{P}(H_{n}=h)$ is unimodal, with a peak at $h=h^{*}(n)$ , the closest integer to $c^{*}(n)\frac{\ln(n)}{\ln(1/p)}$ , where $c^{*}(n):=1-\frac{\ln(\ln(1/p)/q^{2})}{\ln(n)}$ , and we have

\mathbb{P}(H_{n}=h^{*}(n))\sim p^{p/q}-p^{1/q}.

(23)

Moreover, the mass is sharply concentrated around $\frac{\ln n}{\ln(1/p)}$ , as better seen by the following result, with a uniform error term in $k$ :

\mathbb{P}\left(H_{n}\leq\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor+k\right)=\exp\left(-q\alpha(n)p^{k+1}\right)\left(1+O\left(\frac{(\ln n)^{3}}{n}\right)\right),

with $\alpha(n):=p^{-\{\frac{\ln n}{-\ln p}\}}$ (where $\{x\}$ stands for the fractional part of $x$, and where $\lfloor x\rfloor$ stands for the floor function of $x$). [See Figure 3 for an illustration of the distribution of $H_{n}$ and Figure 4 for the behavior of the function $\alpha(n)$.]
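To get a feeling for the quality of the approximation (22), one can compare it with exact values. The Python sketch below is only an illustration: the function names and the values $n=2000$, $p=1/2$ are arbitrary choices, and the exact distribution is computed in floating point from the recurrence $a_{m}=a_{m-1}-qp^{h+1}a_{m-h-2}$ which follows from $F^{\leq h}(z,1)=\frac{1-(pz)^{h+1}}{1-z+qp^{h+1}z^{h+2}}$.

```python
import math

def cdf_height(n, h, p):
    """P(H_n <= h), read off as the n-th Taylor coefficient of
    (1 - (pz)^(h+1)) / (1 - z + q p^(h+1) z^(h+2))."""
    q = 1.0 - p
    c = q * p ** (h + 1)
    a = [0.0] * (n + 1)
    for m in range(n + 1):
        val = 1.0 if m == 0 else a[m - 1]
        if m == h + 1:
            val -= p ** (h + 1)
        if m >= h + 2:
            val -= c * a[m - h - 2]
        a[m] = val
    return a[n]

def gumbel_approx(n, h, p):
    """Main term exp(-q n p^(h+1)) of the approximation (22)."""
    return math.exp(-(1.0 - p) * n * p ** (h + 1))

n, p = 2000, 0.5   # illustrative values
max_gap = max(abs(cdf_height(n, h, p) - gumbel_approx(n, h, p))
              for h in range(40))
print(max_gap)     # largest absolute gap over h = 0..39
```

The same two functions give the probabilities $\mathbb{P}(H_{n}=h)$ by taking differences of consecutive values of $h$.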

Proof 3.5.

In the sequel, as the context is explicit, we simply denote by $z_{1},\dots,z_{h+2}$ the zeros $z_{1}(h),\dots,z_{h+2}(h)$ of $D(z)=1-z+qp^{h+1}z^{h+2}$ . From Lemma 3.2, for $h$ large enough, all these zeros $z_{i}$ are simple; the partial fraction decomposition of $1/D$ is then

\frac{1}{D(z)}=\sum_{i=1}^{h+2}\frac{1}{D^{\prime}(z_{i})\left(z-z_{i}\right)}

and as $D^{\prime}(z_{i})=-1+(h+2)(z_{i}-1)/{z_{i}}$ , one thus gets

	$\displaystyle F^{\leq h}(z,1)$	$\displaystyle=\frac{1-(pz)^{h+1}}{D(z)}=\sum_{i=1}^{h+2}\frac{1-(pz)^{h+1}}{D^{\prime}(z_{i})\left(z-z_{i}\right)}$
		$\displaystyle=\sum_{i=1}^{h+2}\left(\frac{1}{z_{i}-\left(z_{i}-1\right)(h+2)}\left(\sum_{n=0}^{+\infty}z_{i}^{-n}z^{n}\right)-\frac{p^{h+1}}{z_{i}-\left(z_{i}-1\right)(h+2)}\sum_{n=h+1}^{+\infty}z_{i}^{-n+h+1}z^{n}\right)$
		$\displaystyle=\sum_{i=1}^{h+2}\left(\frac{1}{z_{i}-\left(z_{i}-1\right)(h+2)}\left(\sum_{n=0}^{h}z_{i}^{-n}z^{n}\right)+\frac{1-(pz_{i})^{h+1}}{z_{i}-\left(z_{i}-1\right)(h+2)}\sum_{n=h+1}^{+\infty}z_{i}^{-n}z^{n}\right).$

It is combinatorially obvious that $\mathbb{P}\left(H_{n}\leq h\right)=1$ for all $n\leq h$ . So we now focus on $n>h$ , for which we have, as $(pz_{i})^{h+1}=\frac{z_{i}-1}{qz_{i}}$ and $1-\frac{z_{i}-1}{qz_{i}}=\frac{1-pz_{i}}{qz_{i}}$ :

$\displaystyle\mathbb{P}\left(H_{n}\leq h\right)=[z^{n}]F^{\leq h}(z,1)$	$\displaystyle=\sum_{i=1}^{h+2}\frac{1-(pz_{i})^{h+1}}{z_{i}-\left(z_{i}-1\right)(h+2)}z_{i}^{-n}$
	$\displaystyle=\sum_{i=1}^{h+2}\frac{1-pz_{i}}{q\left(1+\left(1-z_{i}\right)(h+1)\right)}z_{i}^{-n-1}$
	$\displaystyle=Z_{1}(n,h)+O\left(hMp^{n+1}\right),$	(24)

where $M=\max_{i=3,\dots,h+2}\left|\frac{1-pz_{i}}{q\left(1+\left(1-z_{i}\right)(h+1)\right)}\right|=O(1)$ (note that the summand involving $z_{2}=1/p$ cancels), and where $Z_{1}(n,h):=\frac{1-pz_{1}}{q\left[1+\left(1-z_{1}\right)(h+1)\right]}z_{1}^{-n-1}$ is the contribution coming from the pole $z_{1}$ .

Set $z_{1}:=1+\varepsilon_{h}$ . Then $D(z_{1})=1-(1+\varepsilon_{h})+qp^{h+1}\left(1+\varepsilon_{h}\right)^{h+2}=0$ , thus this implies $\varepsilon_{h}=qp^{h+1}\left(1+\varepsilon_{h}\right)^{h+2}$ ; therefore we have $z_{1}=1+\varepsilon_{h}=1+qp^{h+1}+O(hp^{2h})$ . Now, for $h=h(n)$ tending to $+\infty$ , this entails that the contribution $Z_{1}(n,h)$ of the pole $z_{1}$ (as given by Equation (24)) satisfies

$\displaystyle\!\!\!\!\!\!\!\!\!\!\!\!\!\!Z_{1}(n,h)$	$\displaystyle=\frac{1-p^{h+2}+O\left(hp^{2h}\right)}{1-(h+1)qp^{h+1}+O(h^{2}p^{2h})}(1+\varepsilon_{h})^{-n-1}$	(25)
	$\displaystyle=\Big{(}1+q(h+1)p^{h+1}-p^{h+2}+O(h^{2}p^{2h})\Big{)}\exp\left((n+1)\ln\left(\frac{1}{1+\varepsilon_{h}}\right)\right)$	(26)
	$\displaystyle=\left(1+q(h+1)p^{h+1}-p^{h+2}+O(h^{2}p^{2h})\right)\exp\left(-(n+1)\varepsilon_{h}+\Theta((n+1)\varepsilon_{h}^{2})\right)\!.$	(27)

Observe that

\text{ if \qquad}h=c\frac{\ln(n)}{\ln(1/p)}+c^{\prime}\frac{\ln(\ln(n))}{\ln(1/p)}\text{ \qquad then \qquad}p^{h}=\frac{1}{n^{c}\ln(n)^{c^{\prime}}}.

(28)

(Here and in the sequel we always consider $c>1/2$ and $c^{\prime}\geq 0$ . In fact, $c^{\prime}>0$ is not needed right now, but this will be required for the asymptotics of the mean of $H_{n}$ in Section 4.)

For such values of $h$ , the asymptotics of the first factor in Equation (27) is

\displaystyle 1+q(h+1)p^{h+1}-p^{h+2}+O(h^{2}p^{2h})

\displaystyle=1+O\left(\frac{1}{n^{c}\ln(n)^{c^{\prime}-1}}\right),

(29)

and the asymptotics of the second factor in Equation (27) is

	$\displaystyle\exp\left(-(n+1)\varepsilon_{h}+\Theta((n+1)\varepsilon_{h}^{2})\right)=\exp\left(-nqp^{h+1}+O(nhp^{2h})-\varepsilon_{h}+\Theta(n^{1-2c}/\ln(n)^{2c^{\prime}})\right)$		(30)
	$\displaystyle\qquad=\exp\left(-nqp^{h+1}\right)\left(1+O(n^{1-2c}\ln(n)^{1-2c^{\prime}})-O(n^{-c}\ln(n)^{-c^{\prime}})+\Theta(n^{1-2c}/\ln(n)^{2c^{\prime}})\right).$		(31)

In this expansion, one now has to check which error term dominates. It is the big-oh term with $n^{-c}$ if $c>1$ and the big-oh with $n^{1-2c}$ if $c\leq 1$ . Multiplying with the asymptotic expansion from Equation (29) and using the approximation (24), we get the following result (in which we simplified the $\ln$ part of the error term in a non-optimal way which will be enough for our purpose):

\mathbb{P}\left(H_{n}\leq h\right)=\exp\left(-nqp^{h+1}\right)\left(1+O\left(\frac{\ln n}{n^{\min(c,2c-1)}}\right)\right).

(32)

Moreover, this approximation holds for all $h\in[0,n]$ : first, for $h\ll\frac{1}{2}\ln(n)/\ln(1/p)$ this follows from the fact that $\mathbb{P}\left(H_{n}\leq h\right)$ is increasing with respect to $h$ , and then for $h\gg c\ln(n)$ this follows from the bound (50) hereafter.

In conclusion, for $h=\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor+k$ , for any $k$ such that $h\in\left[c_{1}\frac{\ln(n)}{\ln(1/p)},c_{2}\frac{\ln(n)}{\ln(1/p)}\right]$ (with $1/2<c_{1}<c_{2}$ ), we have uniformly in $k$ (when $n\to+\infty$ ):

	$\displaystyle\mathbb{P}\left(H_{n}\leq h\right)$	$\displaystyle=\exp\left(-nqp^{\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor+k+1}\right)\left(1+O\left(\frac{(\ln n)^{3}}{n}\right)\right)$
		$\displaystyle=\exp\left(-qp^{-\{\frac{\ln n}{-\ln p}\}+k+1}\right)\left(1+O\left(\frac{(\ln n)^{3}}{n}\right)\right),$

and we get Theorem 3.4 by setting $\alpha(n):=p^{-\{\frac{\ln n}{-\ln p}\}}$ .

If $p=q=1/2$, we have $\alpha(n)=2^{\{\operatorname{lg}(n)\}}$ (where the symbol $\operatorname{lg}$ stands for the binary logarithm, $\operatorname{lg}(x)=\log_{2}(x)$). This subcase of particular interest corresponds to a problem initially considered in 1946 by Burks, Goldstine, and von Neumann [13]: the study of carry propagation in computer binary arithmetic; it constitutes one of the first analyses of the cost of an algorithm! They gave crude bounds which were substantially improved by Knuth in 1978 [38]. This problem can also be seen as the study of runs in binary words, and, as such, is analyzed by Flajolet and Sedgewick [22, Theorem V.1]. Therein, the analysis unfortunately contains a few typos which affect some of the error terms. Our proofs incidentally fix this issue.

These extremal parameters (runs, longest carry) are archetypal examples of problems leading to a Gumbel distribution (or a discrete version of it). This distribution indeed often appears in combinatorics as the distribution of parameters encoding a maximal value: e.g., maximum of i.i.d. geometric distributions [51], longest repetition of a pattern in lattice paths [46], runs in integer compositions [23], carry propagation in signed digit representations [30], largest part in some integer compositions, longest chain of nodes with a given arity in trees, maximum degree in some families of trees [47], the maximum protection number in simply generated trees [31]. For some of these examples, it was proven only for some specific families of structures, but there is no doubt that it holds generically. A general framework leading to such double exponential laws is given by Gourdon [26, Theorem 4] for the largest component in supercritical composition schemes (see also Bender and Gao [10]). We refer to Figure 6 for an illustration of some of these parameters.

The Gumbel distribution is also called the “double exponential distribution”, or the “type-I generalized extreme value distribution”, and can also be expressed as a subcase of the Fisher–Tippett distribution. Let us give a formal definition.

Definition 3.6 (Gumbel distribution).

A continuous random variable $X$ with support $(-\infty,+\infty)$ follows a Gumbel distribution (of parameters $\mu$ and $\beta$), denoted by $\operatorname{Gumbel}(\mu,\beta)$, if

\mathbb{P}(X\leq x)=\exp\left(-\exp\left(-\frac{x-\mu}{\beta}\right)\right).

Its mean satisfies $\mathbb{E}[X]=\mu+\gamma\beta$ (where $\gamma=0.5772\dots$ is Euler’s constant) and its variance satisfies $\mathbb{V}{\rm ar}[X]=\frac{\pi^{2}}{6}\beta^{2}$ . It is unimodal with a peak at $x=\mu$ and its median is at $x=\mu-\beta\ln(\ln(2))$ .

Definition 3.7 (Discrete Gumbel distribution).

A discrete random variable $Y$ follows a discrete Gumbel distribution of parameters $\mu$ and $\beta$, which we denote $\operatorname{Gumbel}(\mu,\beta)$ (with a slight abuse of notation, we use the same notation $\operatorname{Gumbel}(\mu,\beta)$ for both the continuous distribution and the discrete distribution, adding the right adjective if needed to remove any ambiguity), if

\mathbb{P}(Y\leq h)=\exp\left(-\exp\left(-\frac{h-\mu}{\beta}\right)\right),\text{\qquad for all $h\in\mathbb{Z}$}.

(33)

In particular, one can always write $Y=\lceil X\rceil$, where $X$ follows a continuous $\operatorname{Gumbel}(\mu,\beta)$; note on the other hand that $\lfloor X\rfloor$ follows a discrete $\operatorname{Gumbel}(\mu-1,\beta)$.

To obtain a nice formula for the mean and variance of a discrete Gumbel distribution remains an open problem: for example, for $Y\stackrel{{\scriptstyle d}}{{=}}\operatorname{Gumbel}(0,1)$ , we have

\mathbb{E}[Y]=\sum_{h=-\infty}^{\infty}h\left(\exp(-\exp(-h))-\exp(-\exp(-h+1))\right)=1.077240905953631072609\dots

(and it takes 5 seconds to get thousands of digits, as the terms decrease at least exponentially fast, and even doubly exponentially fast for $h\to-\infty$), but will anybody find a closed form for this mysterious constant? Some insight on the variance of the discrete distribution $Y$ can be obtained from the continuous distribution $X$ via the following trivial but useful bounds which hold more generally as soon as $|X-Y|<1$:

\left|\mathbb{E}[Y]-\mathbb{E}[X]\right|<1\text{\qquad and \qquad}\left|\mathbb{V}{\rm ar}[Y]-\mathbb{V}{\rm ar}[X]\right|<2+4|\mathbb{E}[X]|.

(34)
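The mean of the discrete $\operatorname{Gumbel}(0,1)$ distribution and the first bound in (34) are easy to check numerically. The following Python sketch uses hypothetical function names and arbitrary truncation bounds for the summation range (safe for the default parameters, where the neglected tails are far below machine precision).

```python
import math

EULER_GAMMA = 0.5772156649015329  # mean of the continuous Gumbel(0,1)

def discrete_gumbel_pmf(h, mu=0.0, beta=1.0):
    """P(Y = h) = F(h) - F(h-1), with F(x) = exp(-exp(-(x - mu)/beta)).
    expm1 avoids cancellation when both values of F are close to 1."""
    a = math.exp(-(h - mu) / beta)       # F(h)   = exp(-a)
    b = math.exp(-(h - 1 - mu) / beta)   # F(h-1) = exp(-b)
    return math.expm1(-a) - math.expm1(-b)

def discrete_gumbel_mean(mu=0.0, beta=1.0, lo=-50, hi=200):
    """Truncated version of the series sum_h h P(Y = h)."""
    return sum(h * discrete_gumbel_pmf(h, mu, beta) for h in range(lo, hi + 1))

mean_Y = discrete_gumbel_mean()
print(mean_Y)                      # close to 1.0772...
print(abs(mean_Y - EULER_GAMMA))   # smaller than 1, as predicted by (34)
```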

We can now restate our previous theorem in terms of this discrete Gumbel distribution.

Corollary 3.8 (Gumbel limit law).

The sequence of random variables $\lceil H_{n}-\frac{\ln(pqn)}{\ln(1/p)}\rceil$ converges for $n\rightarrow+\infty$ (in distribution and in moments) to the discrete $\operatorname{Gumbel}(0,\beta)$ distribution with $\beta=\frac{1}{\ln(1/p)}$ . Accordingly, it implies that

\mathbb{E}[H_{n}]\sim\frac{\ln(pqn)}{\ln(1/p)}+\gamma\beta+\text{an error smaller than $1$},

\mathbb{V}{\rm ar}[H_{n}]\sim\frac{\pi^{2}}{6\ln(p)^{2}}+\text{ an error smaller than $2+4\gamma\beta$}.

Proof 3.9.

Consider the sequence of random variables $Y_{n}:=\lceil H_{n}-\mu_{n}\rceil$. Then, the change of variable $h\mapsto h+\mu_{n}$ in Equation (22), with $\mu_{n}=\frac{\ln(pqn)}{\ln(1/p)}$, allows us to match $Y:=\lim_{n}Y_{n}$ (where the limit is in distribution) with the discrete Gumbel distribution defined in (33), for $\mu=0$ and $\beta=\frac{1}{\ln(1/p)}$. Due to the uniform error term in (22) on the support $[0,n]$ of $H_{n}$, we have a convergence in moments of $Y_{n}$ to $Y$. Then, the asymptotics of the moments follow by applying the bounds (34) on the link between the mean/variance of the discrete and continuous Gumbel distributions.

These moment asymptotics already constitute a notable result (falling as a good ripe fruit!), but a very interesting phenomenon is hidden in these imprecise error terms: some bodacious fluctuations, which we fully describe in Section 4.

3.3. Waiting time

Let us end this section with an application to a natural statistic: the waiting time $\tau_{h}$ , i.e., the number of steps spent by the random walk when it reaches a given altitude $h$ for the first time. There is an intimate relationship between height and waiting time (stated more formally in Equation (37) hereafter); it is thus natural that they have enumerative and asymptotic formulas of a similar nature, as better shown by the following corollary.

Corollary 3.10.

The waiting time $\tau_{h}$ for reaching height $h$ satisfies

\displaystyle\mathbb{P}(\tau_{h}=n)

\displaystyle=[z^{n}]\frac{(1-pz)p^{h}z^{h}}{1-z+qp^{h}z^{h+1}}.

(35)

The distribution function of $\tau_{h}$ satisfies

\mathbb{P}(\tau_{h}\leq n)=1-\exp\left(-qnp^{h}\right)+O\left(\frac{(\ln n)^{3}}{n}\right).

(36)

Proof 3.11.

Consider a walk reaching altitude $h$ for the first time at time $n$. Cut it after each reset. This gives a sequence of factors of length $k\leq h$ (each made of $k-1$ up steps followed by a reset), followed by a last factor with $h$ up steps. This translates into the combinatorial formula

\displaystyle\mathbb{P}(\tau_{h}=n)

\displaystyle=[z^{n}]\frac{p^{h}z^{h}}{1-\sum_{k=1}^{h}p^{k-1}qz^{k}},

which simplifies to Formula (35). Now, for the distribution function, instead of redoing a full analysis based on a partial fraction decomposition of this generating function, it is more convenient to use the relation

\displaystyle\mathbb{P}(\tau_{h}=n)=\mathbb{P}(H_{n}=h\text{ and }H_{n-1}<h),

(37)

thus this waiting time also satisfies

\displaystyle\mathbb{P}(\tau_{h}\leq n)=\mathbb{P}(H_{n}\geq h)=1-\mathbb{P}(H_{n}\leq h-1).

(38)

Then, using Theorem 3.4, we also have

	$\displaystyle\mathbb{P}(H_{n}\leq h-1)$	$\displaystyle=\mathbb{P}\left(H_{n}\leq\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor+h-1-\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor\right)$
		$\displaystyle=\exp\left(-q\alpha(n)p^{h-\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor}\right)+O\left(\frac{(\ln n)^{3}}{n}\right)$
		$\displaystyle=\exp\left(-qnp^{h}\right)+O\left(\frac{(\ln n)^{3}}{n}\right),$

where the last equality uses $p^{-\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor}=n/\alpha(n)$. Via Formula (38) linking the waiting time $\tau_{h}$ and the height $H_{n}$, this entails (36).
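The relation (38) between the waiting time and the height can be tested on small cases. In the following Python sketch (an illustration with hypothetical function names, in exact rational arithmetic), the law of $\tau_{h}$ is obtained by a dynamic programming over the current altitude, while $\mathbb{P}(H_{n}\leq h-1)$ is obtained independently from the recurrence given by the rational generating function $F^{\leq h-1}(z,1)$.

```python
from fractions import Fraction

def first_passage_cdf(n, h, p):
    """P(tau_h <= n) for h >= 1, following walks which have not reached h yet."""
    q = 1 - p
    alive = [Fraction(0)] * h       # alive[j]: altitude j, level h not reached
    alive[0] = Fraction(1)
    reached = Fraction(0)
    for _ in range(n):
        reached += p * alive[h - 1]      # up step from altitude h-1 hits h
        new = [Fraction(0)] * h
        new[0] = q * sum(alive)          # reset
        for j in range(h - 1):
            new[j + 1] = p * alive[j]    # up step staying below h
        alive = new
    return reached

def height_cdf(n, h, p):
    """P(H_n <= h) from D(z) F(z) = 1 - (pz)^(h+1),
    with D(z) = 1 - z + q p^(h+1) z^(h+2)."""
    q = 1 - p
    c = q * p ** (h + 1)
    a = []
    for m in range(n + 1):
        val = Fraction(1) if m == 0 else a[m - 1]
        if m == h + 1:
            val -= p ** (h + 1)
        if m >= h + 2:
            val -= c * a[m - h - 2]
        a.append(val)
    return a[n]

p = Fraction(2, 5)  # illustrative value
ok = all(first_passage_cdf(n, h, p) == 1 - height_cdf(n, h - 1, p)
         for n in range(12) for h in range(1, 6))
print(ok)
```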

We now turn to a finer analysis of the mean and variance of $H_{n}$ .

4. Mean and variance of the height

4.1. Fundamental properties of the Mellin transform

In order to get a fine estimation of the average height, we use a Mellin transform, which, as we shall see, is the key tool to handle the corresponding asymptotics. We now present the needed definitions and formulas. We refer e.g. to Flajolet, Gourdon, and Dumas [19] or to the book Analytic Combinatorics [22, Appendix B.7] for more on the Mellin transform and numerous applications to asymptotics of harmonic sums, digital sums, and divide-and-conquer recurrences.

Definition 4.1 (Mellin transform).

Let $f(t)$ be a continuous function defined on the positive real axis $0<t<+\infty$ . The Mellin transform $f^{*}$ of $f$ is the function defined by

f^{*}(s):=\int_{0}^{+\infty}f(t)t^{s-1}dt.

This integral exists only for $s$ such that the function $f(t)t^{s-1}$ is integrable on $\left(0,\;+\infty\right)$ . Thus, if there exist two real numbers $a$ and $b$ , such that $a>b$ and

f(t)=\begin{cases}O(t^{a}),&\mbox{ if }t\to 0\\ O(t^{b}),&\mbox{ if }t\to+\infty\end{cases},

(39)

then the function $f^{*}$ is well defined for any complex number $s$ with real part such that $-a<\Re(s)<-b$ ; this domain is called the fundamental strip of $f^{*}$ . Moreover, for all $c$ in this domain, if $f^{*}(s)$ converges uniformly to 0 for $s=c\pm i\infty$ , then the function $f$ can be expressed for $t\in(0,+\infty)$ as the following inverse Mellin transform:

f(t)=\frac{1}{2i\pi}\int_{c-i\infty}^{c+i\infty}f^{*}(s)t^{-s}ds.

(40)

As an example, let us consider the gamma function, which illustrates well the role of the fundamental strip (and this example will also play a role in the next pages).

Example 4.2 (The gamma function as a Mellin transform).

The gamma function satisfies

	$\displaystyle\Gamma(s)$	$\displaystyle=\int_{0}^{+\infty}\exp(-t)t^{s-1}dt\text{\qquad(for $0<\Re(s)<+\infty$)},$		(41)
	$\displaystyle-\Gamma(s)$	$\displaystyle=\int_{0}^{+\infty}\left(1-\exp(-t)\right)t^{s-1}dt\text{\qquad(for $-1<\Re(s)<0$)}.$		(42)

An important consequence of Formula (40) is that, if $f^{*}$ is a meromorphic function on $\mathbb{C}$, and if $\lim_{c\rightarrow+\infty}\int_{c-i\infty}^{c+i\infty}f^{*}(s)t^{-s}ds=0$, then one can push the integration contour of Formula (40) to the right (taking $\lim_{c\rightarrow+\infty}$) and one then collects in passing the contributions from the residue at each pole $s_{k}$ to the right of the fundamental strip. Now, for $t>0$ and $a\!\in\!{\mathbb{C}}$, multiplying $t^{-s}=t^{-a}\sum_{\ell\geq 0}\ln(t)^{\ell}(a-s)^{\ell}/\ell!$ (taken at $a=s_{k}$) by the Laurent series of $f^{*}(s)$ at $s\!=\!s_{k}$, we see that $\operatorname{Res}[f^{*}(s)t^{-s},s_{k}]$ (where the notation $\operatorname{Res}[g(s),s_{k}]$ stands for the residue of $g(s)$ at $s=s_{k}$) can be expressed as a sum of $\operatorname{order}(s_{k})$ terms, and one gets

	$\displaystyle f(t)=$	$\displaystyle-\sum_{\begin{subarray}{c}\text{$s_{k}$ pole of $f^{*}(s)t^{-s}$}\\ \text{$\Re(s_{k})\geq-b$}\end{subarray}}\operatorname{Res}[f^{*}(s)t^{-s},s_{k}]$		(43)
	$\displaystyle=$	$\displaystyle\sum_{\begin{subarray}{c}\text{$s_{k}$ pole of $f^{*}$}\\ \text{$\Re(s_{k})\geq-b$}\end{subarray}}\sum_{j=1}^{\operatorname{order}(s_{k})}\operatorname{Res}[(s-s_{k})^{j-1}f^{*}(s),s_{k}]\ t^{-s_{k}}\frac{(-1)^{j}}{(j-1)!}\,(\ln t)^{j-1}.$		(44)

4.2. Average height of Moran walks

We now state the main result of this section.

Theorem 4.3 (Average height).

The average height of Moran walks of length $n$ is given by

\displaystyle\mathbb{E}[H_{n}]

\displaystyle=\frac{\ln n}{\ln(1/p)}-\frac{\gamma}{\ln p}-\frac{1}{2}-\frac{\ln q}{\ln p}+\frac{Q(\ln(qn))}{\ln p}+O\left(\frac{(\ln n)^{4}}{n}\right),

(45)

where $\gamma=.57721\dots$ is Euler’s constant, and where $Q$ is an oscillating function (a Fourier series of period $\ln(1/p)$ ) given by

\displaystyle Q(x)

\displaystyle:=\sum_{k\in\mathbb{Z}\setminus\{0\}}\Gamma(s_{k})\exp(-s_{k}x)\text{\quad where $s_{k}:=\frac{2ik\pi}{\ln p}$}.

(46)

Remark 4.4 (Fourier series representation).

The fact that $Q$ is a Fourier series of period $\ln(1/p)$ and is real for $x\in\mathbb{R}$ is better seen via the alternative equivalent expression

\displaystyle Q(x)

\displaystyle=2\sum_{k\geq 1}\left(\Re(\Gamma(s_{k}))\cos\left(\frac{2k\pi x}{\ln(p)}\right)+\Im(\Gamma(s_{k}))\sin\left(\frac{2k\pi x}{\ln(p)}\right)\right),

where $\Re$ and $\Im$ stand for the real and imaginary parts. This is illustrated in Figure 7.
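The size of these fluctuations can be estimated without any complex gamma function routine. The Python sketch below (hypothetical function name) relies on the classical identity $|\Gamma(iy)|^{2}=\frac{\pi}{y\sinh(\pi y)}$ for real $y\neq 0$, a consequence of the reflection formula, to evaluate the amplitude $2|\Gamma(s_{1})|$ of the first harmonic of $Q$; the further harmonics are much smaller, as $|\Gamma(s_{k})|$ decreases exponentially fast with $|k|$.

```python
import math

def first_harmonic_amplitude(p):
    """Amplitude 2 |Gamma(s_1)| of the first harmonic of Q, where
    s_1 = 2 i pi / ln(p), using |Gamma(iy)|^2 = pi / (y sinh(pi y)).
    Note: math.sinh overflows when p is very close to 1 (roughly p > 0.97)."""
    y = 2.0 * math.pi / math.log(1.0 / p)
    return 2.0 * math.sqrt(math.pi / (y * math.sinh(math.pi * y)))

for p in (0.1, 0.5, 0.9):   # illustrative values
    print(p, first_harmonic_amplitude(p))
```

For $p=1/2$, this amplitude is of order $10^{-6}$.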

Remark 4.5 (Fourier series differentiability).

Such asymptotics involving fluctuations dictated by a Fourier series are typical of results obtained via Mellin transforms. They often appear in the asymptotic cost of divide-and-conquer algorithms, or of expressions involving digital sums, harmonic sums, or finite differences (see the work of de Bruijn, Knuth, and Rice [15, 38], or Flajolet, Gourdon, and Dumas [19]). It is sometimes also possible to get them via some real analysis (like Pippenger did [45]), or like in the seminal work of Delange [16] on the sum of digits. Note that the Delange series is nowhere differentiable, while our Fourier series is infinitely differentiable, as proven in Theorem 4.15.

Proof 4.6 (Proof of Theorem 4.3).

The proof exploits the fact that the mean $\mathbb{E}[H_{n}]$ asymptotically behaves like $\sum_{h=0}^{+\infty}\big{(}1-\exp(-nqp^{h+1})\big{)}$ ; this is proven by rewriting $\mathbb{E}[H_{n}]$ as follows:

\mathbb{E}[H_{n}]=\sum_{h=0}^{n}\left(1-\mathbb{P}\left(H_{n}\leq h\right)\right)=\Sigma_{0}+\Sigma_{1}+\Sigma_{2}+\Sigma_{3}-\Sigma_{4}+\Sigma_{\infty},

(47)

with

	$\displaystyle\Sigma_{0}$	$\displaystyle:=\sum_{0\leq h<h_{1}}\left(\exp(-nqp^{h+1})-\mathbb{P}\left(H_{n}\leq h\right)\right),$
	$\displaystyle\Sigma_{1}$	$\displaystyle:=\sum_{h_{1}\leq h<h_{2}}\left(\exp(-nqp^{h+1})-\mathbb{P}\left(H_{n}\leq h\right)\right),$
	$\displaystyle\Sigma_{2}$	$\displaystyle:=\sum_{h_{2}\leq h<h_{3}}\left(\exp(-nqp^{h+1})-\mathbb{P}\left(H_{n}\leq h\right)\right),$
	$\displaystyle\Sigma_{3}$	$\displaystyle:=\sum_{h_{3}\leq h\leq n}\big{(}1-\mathbb{P}\left(H_{n}\leq h\right)\big{)},$
	$\displaystyle\Sigma_{4}$	$\displaystyle:=\sum_{h=h_{3}}^{+\infty}\big{(}1-\exp(-nqp^{h+1})\big{)},$
	$\displaystyle\Sigma_{\infty}$	$\displaystyle:=\sum_{h=0}^{+\infty}\big{(}1-\exp(-nqp^{h+1})\big{)}.$

The key is to prove that, for some $h_{1}$ , $h_{2}$ , and $h_{3}$ adequately chosen, the sums $\Sigma_{0},\Sigma_{1},\Sigma_{2}$ , $\Sigma_{3}$ , and $\Sigma_{4}$ are asymptotically negligible, while the main contribution to $\mathbb{E}[H_{n}]$ comes from the last sum (namely, $\Sigma_{\infty}$ ), which we will evaluate via a Mellin transform approach.

The reader not enjoying delta-epsilon proofs could have the feeling that “cutting epsilons into 5 parts” like above is a little bit discouraging, but this is the price to pay to get the $O((\ln n)^{4}/n)$ error term in Formula (45). In fact, in Equation (47) for $\mathbb{E}[H_{n}]$, it is possible to cut the sum into only 4 parts, but this would then lead to a weaker final $O(1/\sqrt{n})$ error term.

So let’s be brave and begin with $\Sigma_{0}$. Here, for the range $0\leq h<h_{1}$, with $h_{1}:=\frac{3}{4}\frac{\ln(n)}{\ln(1/p)}$, we get

	$\displaystyle|\Sigma_{0}|$	$\displaystyle\leq h_{1}\times\left(\max_{0\leq h<h_{1}}\exp(-nqp^{h+1})+\max_{0\leq h<h_{1}}\mathbb{P}\left(H_{n}\leq h\right)\right)$
		$\displaystyle\leq h_{1}\times\left(\exp(-nqp^{h_{1}+1})+\mathbb{P}\left(H_{n}\leq h_{1}\right)\right)$
		$\displaystyle=h_{1}\times\left(2\exp(-qpn^{1/4})+O\left(\frac{(\ln n)^{3}}{n}\right)\right)$
		$\displaystyle=O\left(\frac{(\ln n)^{4}}{n}\right),$

where, for the second line we used that the sequences are increasing with respect to $h$ , and for the third line we used Formula (28) for $p^{h}$ and the approximation of Theorem 3.4. Note that this bound for $|\Sigma_{0}|$ also implies the uniform bound

\mathbb{P}(H_{n}\leq h)=O\left(\frac{(\ln n)^{4}}{n}\right)\qquad\text{(for $h<h_{1}$)}.

(48)

Now, for $\Sigma_{1}$ , in the range $h_{1}\leq h<h_{2}$ , with $h_{2}:=\frac{\ln(n)}{\ln(1/p)}+\frac{\ln(\ln(n))}{\ln(1/p)}$ , we rewrite $h$ as $h:=(1-t)h_{1}+th_{2}$ . Such values of $h$ correspond to using $c=(t+3)/4$ and $c^{\prime}=t$ in the Formula (28) for $p^{h}$ .

Via the exponential bound on $H_{n}$ from Formula (32), we get

	$\displaystyle|\Sigma_{1}|$	$\displaystyle\leq(h_{2}-h_{1})\times\left(\max_{h_{1}\leq h<h_{2}}\exp(-nqp^{h+1})+\max_{h_{1}\leq h<h_{2}}\mathbb{P}\left(H_{n}\leq h\right)\right)$
		$\displaystyle\leq h_{2}\times\left(\exp(-nqp^{h_{2}+1})+\mathbb{P}\left(H_{n}\leq h_{2}\right)\right)=O((\ln n)^{4}/n).$

Then, for $\Sigma_{2}$, in the range $h_{2}\leq h<h_{3}$, with $h_{3}:=\frac{4\ln(n)}{\ln(1/p)}$, we rewrite $h$ as $h:=(1-t)h_{2}+th_{3}$. Such values of $h$ correspond to using $c=1+3t$ and $c^{\prime}=1-t$ in the Formula (28) for $p^{h}$. Via Formula (32), we get $|\Sigma_{2}|=O((\ln n)^{3}/n)$.

For the next sum, using the power series expansion of the exponential in Equation (27) (and keeping in mind that our choice of $h_{3}$ implies $p^{h_{3}}=1/n^{4}$ ), we get

	$\displaystyle\Sigma_{3}=\sum_{h=h_{3}}^{n}\left(1-\mathbb{P}\left(H_{n}\leq h\right)\right)$	$\displaystyle\leq(n+1-h_{3})\left(1-\mathbb{P}\left(H_{n}\leq h_{3}\right)\right)$		(49)
		$\displaystyle\leq n(1-\exp(-(n+1)qp^{h_{3}+1}))(1+o(1))=O\left(\frac{1}{n^{2}}\right).\qquad$		(50)

Finally, for the sum $\Sigma_{4}$ , we use the power series expansions of $\exp(x)$ and of $1/(1-p)$ and we get:

\Sigma_{4}=\sum_{h\geq h_{3}}(1-\exp(-nqp^{h+1}))=\frac{nqp^{h_{3}+1}}{1-p}-\sum_{h\geq h_{3}}\sum_{k\geq 2}\frac{(-nqp^{h+1})^{k}}{k!}<np^{h_{3}+1}=O\left(\frac{1}{n^{3}}\right).

We have thus shown that $\Sigma_{0}$, $\Sigma_{1}$, $\Sigma_{2}$, $\Sigma_{3}$, and $\Sigma_{4}$ are all $O((\ln n)^{4}/n)$. It remains to evaluate $\Sigma_{\infty}=\sum_{h\geq 0}(1-e^{-nqp^{h+1}})$. Such a sum is typical of expressions which can be evaluated by Mellin transform techniques. To this aim, let $\phi(t)=\sum_{h\geq 0}(1-e^{-tqp^{h+1}})$ and set $f(t):=1-e^{-tpq}$ and $\mu_{h}:=p^{h}$; then

\phi(t)=\sum_{h\geq 0}f(\mu_{h}t).

Let $\phi^{*}$ and $f^{*}$ be, respectively, the Mellin transform of the functions $\phi$ and $f$ . Using Identity (42) given in Example 4.2, we have $f^{*}(s)=-(pq)^{-s}\Gamma(s)$ on its fundamental strip $-1<\Re(s)<0$ and, as $\phi$ is a harmonic sum, its Mellin transform is

\phi^{*}(s)=f^{*}(s)\sum_{h\geq 0}\mu_{h}^{-s}=\frac{q^{-s}\Gamma(s)}{1-p^{s}}.

(51)
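These two Mellin transforms can be checked numerically at a real point of the fundamental strip. The following Python sketch is an illustration under stated assumptions: the function name, the test point $s=-1/2$, the value $p=1/2$, the truncation of the harmonic sum to 200 terms, and the integration range are all arbitrary choices. After the change of variable $t=e^{u}$, the integrand decays exponentially on both sides, so a plain trapezoidal rule is very accurate.

```python
import math

def mellin_real(func, s, u_min=-60.0, u_max=80.0, step=0.1):
    """Trapezoidal approximation of int_0^inf func(t) t^(s-1) dt,
    computed as int func(e^u) e^(s u) du over [u_min, u_max]."""
    steps = int(round((u_max - u_min) / step))
    total = 0.0
    for i in range(steps + 1):
        u = u_min + i * step
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * func(math.exp(u)) * math.exp(s * u)
    return total * step

p, s = 0.5, -0.5          # illustrative values, with -1 < s < 0
q = 1.0 - p

def f(t):
    return -math.expm1(-t * p * q)                      # 1 - exp(-t p q)

def phi(t):
    return sum(-math.expm1(-t * q * p ** (h + 1)) for h in range(200))

f_star_numeric = mellin_real(f, s)
f_star_formula = -(p * q) ** (-s) * math.gamma(s)
phi_star_numeric = mellin_real(phi, s)
phi_star_formula = q ** (-s) * math.gamma(s) / (1.0 - p ** s)
print(f_star_numeric, f_star_formula)
print(phi_star_numeric, phi_star_formula)
```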

This function extends meromorphically to the full complex plane, with isolated poles at the nonpositive integers (due to poles of $\Gamma(s)$ there), and with another set of isolated poles (the roots of $p^{s}=1$). These two sets of poles have $s=0$ in common. This implies that for $\Re(s)>-1$ the poles of $\phi^{*}$ are

\displaystyle\begin{cases}s_{k}=\frac{2ik\pi}{\ln p}\text{ for $k\in\mathbb{Z},k\neq 0$ \qquad(all are poles of order 1)},\\ s_{0}=0\qquad\text{(the only pole of order 2)}.\end{cases}

(52)

Using Formula (44) for the inverse Mellin transform, we obtain

	$\displaystyle\phi(t)$	$\displaystyle=\operatorname{Res}[s\phi^{*},0]\ \ln t-\operatorname{Res}[\phi^{*},0]-\sum_{k\in\mathbb{Z}\setminus\{0\}}\operatorname{Res}[\phi^{*},s_{k}]\ t^{-s_{k}}$
		$\displaystyle=\frac{\ln t}{-\ln p}-\left(\frac{\gamma}{\ln p}+\frac{1}{2}+\frac{\ln q}{\ln p}\right)+\frac{1}{\ln p}\sum_{k\in\mathbb{Z}\setminus\{0\}}\Gamma(s_{k})q^{-s_{k}}t^{-s_{k}}.$

We finally get the claim of the theorem by noting that $\mathbb{E}[H_{n}]=\phi(n)+O\left(\frac{(\ln n)^{4}}{n}\right)$ .
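As a sanity check of this computation, one can compare $\Sigma_{\infty}=\phi(n)$ with the non-oscillating terms of Formula (45). In the following Python sketch (hypothetical function names; the values $n=10^{6}$, $p=1/2$ and the truncation to 400 terms are arbitrary choices), the remaining gap corresponds to the tiny oscillating contribution $Q(\ln(qn))/\ln p$.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sigma_infinity(n, p, terms=400):
    """Truncated sum of 1 - exp(-n q p^(h+1)) over h >= 0."""
    q = 1.0 - p
    return sum(-math.expm1(-n * q * p ** (h + 1)) for h in range(terms))

def mean_main_terms(n, p):
    """ln(n)/ln(1/p) - gamma/ln(p) - 1/2 - ln(q)/ln(p)."""
    q = 1.0 - p
    lp = math.log(p)
    return -math.log(n) / lp - EULER_GAMMA / lp - 0.5 - math.log(q) / lp

n, p = 10**6, 0.5
gap = sigma_infinity(n, p) - mean_main_terms(n, p)
print(sigma_infinity(n, p), mean_main_terms(n, p), gap)
```

For $p=1/2$, the main terms reduce to $\operatorname{lg}(n)+\gamma/\ln 2-3/2$.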

4.3. Variance of the height of Moran walks

We now prove that the height of Moran walks, despite a mean of order $O(\ln n)$ and a second moment of order $O((\ln n)^{2})$ , has a variance which involves surprising cancellations at these two orders, leading to an oscillating function of order $O(1)$ (in $n$ ), as implied by the following much more precise asymptotics.

Theorem 4.7.

The variance of the height of Moran walks satisfies

\mathbb{V}{\rm ar}[H_{n}]=\frac{1}{\ln(p)^{2}}\left(-Q^{2}(\ln(qn))+2\gamma Q(\ln(qn))+2R(\ln(qn))+\frac{\pi^{2}}{6}\right)+\frac{1}{12}+O\left(\frac{(\ln n)^{5}}{n}\right),

where $Q$ and $R$ are Fourier series of small amplitudes given by Formulas (46) and (56).

Proof 4.8.

To obtain the variance of $H_{n}$ we first consider the second moment

\displaystyle\mathbb{E}[H_{n}^{2}]

\displaystyle=\sum_{h\geq 0}\mathbb{P}(H_{n}=h)h^{2}=\sum_{h\geq 0}\mathbb{P}(H_{n}^{2}>h),

(53)

where we know from Theorem 3.4 that the summand can be approximated by

\mathbb{P}(H_{n}^{2}>h)=1-\mathbb{P}\left(H_{n}\leq\sqrt{h}\right)=1-\exp\left(-nqp^{\left\lfloor\sqrt{h}\right\rfloor+1}\right)+O\left(\frac{(\ln n)^{3}}{n}\right).

Then, partitioning the last sum in (53) into the same intervals as in Formula (47), we get that $\mathbb{E}[H_{n}^{2}]=\phi_{\rm var}(n)+O\left(\frac{(\ln n)^{4}}{n}\right)$ , where $\phi_{\rm var}$ is the function defined by

\phi_{\rm var}(x)=\sum_{h\geq 0}\left(1-\exp\left(-xqp^{\left\lfloor\sqrt{h}\right\rfloor+1}\right)\right).

From the behavior of $\phi_{\rm var}(x)$ at $x=0$ and $x=+\infty$ , using the property given in (39), we get that the Mellin transform of $\phi_{\rm var}$ is defined on the fundamental strip $(-1,\,0)$ . Using the harmonic sum summation (51), one gets for $s$ in this strip:

\phi_{\rm var}^{*}(s)=f^{*}(s)\sum_{h\geq 0}\left(p^{\left\lfloor\sqrt{h}\right\rfloor}\right)^{-s}=-\Gamma(s)(pq)^{-s}\sum_{h\geq 0}\left(p^{\left\lfloor\sqrt{h}\right\rfloor}\right)^{-s}.

Here, as we have

\displaystyle\sum_{h\geq 0}\left(p^{\left\lfloor\sqrt{h}\right\rfloor}\right)^{-s}=\sum_{n\geq 0}\,\sum_{h=n^{2}}^{(n+1)^{2}-1}\left(p^{-s}\right)^{n}=\sum_{n\geq 0}\,\left(2n+1\right)\left(p^{-s}\right)^{n}=\frac{1+p^{-s}}{\left(1-p^{-s}\right)^{2}},

we finally get

\phi_{\rm var}^{*}(s)=\frac{-\Gamma(s)q^{-s}(1+p^{s})}{(p^{s}-1)^{2}}.

(54)
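The two algebraic steps leading to (54) can be checked numerically at a real point $s$ of the fundamental strip. The sketch below is only an illustration under our own naming; it truncates the sum over $h$ at a perfect square, which is an assumption that is harmless as long as $p^{-s}$ is not too close to 1.

```python
import math

def lhs(p, s, hmax=40_000):
    """Truncated sum over 0 <= h < hmax of (p**floor(sqrt(h)))**(-s)."""
    return math.fsum(p ** (-s * math.isqrt(h)) for h in range(hmax))

def rhs(p, s):
    """Closed form (1 + p**(-s)) / (1 - p**(-s))**2, valid when p**(-s) < 1."""
    x = p ** (-s)
    return (1 + x) / (1 - x) ** 2

def phi_var_star(p, q, s):
    """Right-hand side of (54) at a real s in (-1, 0)."""
    return -math.gamma(s) * q ** (-s) * (1 + p ** s) / (p ** s - 1) ** 2

if __name__ == "__main__":
    p, q, s = 0.5, 0.5, -0.5
    print(lhs(p, s), rhs(p, s))
    print(-math.gamma(s) * (p * q) ** (-s) * rhs(p, s), phi_var_star(p, q, s))
```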

What are the poles of $\phi_{\rm var}^{*}(s)$ ? These are $s=0$ (a pole of order 3) and $s=s_{k}=\frac{2ik\pi}{\ln p}$ (for $k\in\mathbb{Z},k\neq 0$ , which are poles of order 2). Using Formula (44) for the inverse Mellin transform, one thus obtains

$\displaystyle\phi_{\rm var}(t)=$	$\displaystyle\frac{\ln(t)^{2}}{\ln(p)^{2}}+\ln(t)\frac{\ln(p)+2\ln(q)+2\gamma-2Q(\ln(qt))}{\ln(p)^{2}}$
	$\displaystyle-\frac{\ln(p)+2\ln(q)}{\ln(p)^{2}}Q(\ln(qt))+\frac{2}{\ln(p)^{2}}R(\ln(qt))$
	$\displaystyle+{\frac{1}{3}}+{\frac{\gamma+\ln(q)}{\ln(p)}}+\frac{\pi^{2}/6+\gamma^{2}}{\ln(p)^{2}}+\frac{2\gamma\,\ln(q)+\ln(q)^{2}}{\ln(p)^{2}},$	(55)

with the same $Q(x)$ as in (46), and where $R(x)$ is another Fourier series given by

\displaystyle R(x)

\displaystyle=\sum_{k\in\mathbb{Z}\setminus\{0\}}\Gamma^{\prime}(s_{k})\exp(-s_{k}x).

(56)

(Similarly to $Q(x)$ , this Fourier series $R(x)$ is always real, as can be seen by replacing $\Gamma$ by $\Gamma^{\prime}$ in Remark 4.4.)

Now that we obtained the asymptotic behavior of $\mathbb{E}[H_{n}^{2}]$ , we conclude and obtain Theorem 4.7 via $\mathbb{V}{\rm ar}[H_{n}]=\mathbb{E}[H_{n}^{2}]-\mathbb{E}[H_{n}]^{2}$ , where $\mathbb{E}[H_{n}]$ was computed in Theorem 4.3.
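The cancellations at orders $(\ln n)^{2}$ and $\ln n$ can be observed numerically. The following sketch evaluates the two harmonic sums directly and compares their combination with the constant $\frac{\pi^{2}}{6\ln(p)^{2}}+\frac{1}{12}$ . It assumes $\phi(x)=\sum_{h\geq 0}(1-\exp(-xqp^{h+1}))$ for the mean, and it groups the $2j+1$ indices $h$ such that $\lfloor\sqrt{h}\rfloor=j$ in $\phi_{\rm var}$ ; all names are illustrative.

```python
import math

def phi(x, p, q, terms=400):
    """Assumed harmonic sum for the mean: sum_{h>=0} (1 - exp(-x*q*p**(h+1)))."""
    return math.fsum(-math.expm1(-x * q * p ** (h + 1)) for h in range(terms))

def phi_var(x, p, q, terms=400):
    """phi_var(x), with the 2j+1 indices h such that floor(sqrt(h)) = j grouped together."""
    return math.fsum((2 * j + 1) * -math.expm1(-x * q * p ** (j + 1)) for j in range(terms))

def variance_constant(p):
    """Non-oscillating part of the variance: pi^2 / (6 ln(p)^2) + 1/12."""
    return math.pi ** 2 / (6 * math.log(p) ** 2) + 1 / 12

if __name__ == "__main__":
    p = q = 0.5
    for n in (1024, 12345):
        approx = phi_var(n, p, q) - phi(n, p, q) ** 2
        print(n, approx, variance_constant(p))
```

Although both sums grow with $n$ , their combination should stay within the small oscillations around the constant.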

4.4. Height of excursions

Excursions are walks in $\mathbb{N}^{2}$ ending at altitude $0$ (where, as previously, time is encoded by the $x$ -axis, and altitude by the $y$ -axis). As in previous sections, let $Y_{n}$ and $H_{n}$ be the final altitude and height of a walk, and let the random variable ${\widetilde{H}}_{n}$ be the height of a walk of length $n$ conditioned to be an excursion, that is, ${\widetilde{H}}_{n}=H_{n}|\{Y_{n}=0\}$ . For Moran walks, we get the following behavior.

Theorem 4.9 (Distribution and moments of the height of Moran excursions).

The distribution of the height of excursions satisfies (for a uniform error term in $k$ )

\displaystyle\mathbb{P}\left({\widetilde{H}}_{n}\leq\left\lfloor\frac{\ln n}{\ln(1/p)}\right\rfloor+k\right)=\exp\left(-q\alpha(n-1)p^{k+1}\right)+O\left(\frac{(\ln n)^{3}}{n}\right),

(57)

with $\alpha(n):=p^{-\{\frac{\ln n}{\ln(1/p)}\}}$ (where $\{x\}$ stands for the fractional part of $x$ , and where $\lfloor x\rfloor$ stands for the floor function of $x$ ).

Introducing temporarily the quantity $\ell_{n}:=\ln(q(n-1))$ , and with the same Fourier series $Q$ and $R$ as in Theorems 4.3 and 4.7, the average and the variance are given by

\displaystyle\mathbb{E}[{\widetilde{H}}_{n}]

\displaystyle=\frac{\ln n}{\ln(1/p)}-\frac{\gamma}{\ln p}-\frac{1}{2}-\frac{\ln q}{\ln p}+\frac{Q(\ell_{n})}{\ln p}+O\left(\frac{(\ln n)^{4}}{n}\right),

(58)

\mathbb{V}{\rm ar}[{\widetilde{H}}_{n}]=\frac{1}{\ln(p)^{2}}\left(Q^{2}(\ell_{n})+2\gamma Q(\ell_{n})+2R(\ell_{n})+\frac{\pi^{2}}{6}\right)+\frac{1}{12}+O\left(\frac{(\ln n)^{5}}{n}\right).

(59)

Proof 4.10.

As a Moran excursion necessarily ends by a reset, we have

\mathbb{P}(\widetilde{H}_{n}\leq h)=\mathbb{P}\left(H_{n}\leq h|\{Y_{n}=0\}\right)=q\mathbb{P}(H_{n-1}\leq h)/\mathbb{P}(Y_{n}=0).

(60)

Thus, we have $\mathbb{P}(\widetilde{H}_{n}\leq h)=\mathbb{P}(H_{n-1}\leq h)$ , $\mathbb{E}[{\widetilde{H}}_{n}]=\mathbb{E}[H_{n-1}]$ , and $\mathbb{V}{\rm ar}[{\widetilde{H}}_{n}]=\mathbb{V}{\rm ar}[H_{n-1}]$ ; we can therefore directly recycle the results of Theorems 3.4, 4.3, and 4.7 to get the asymptotic distribution/mean/variance.

In this recycling, some care has to be taken while performing the substitution $n\rightarrow n-1$ in the asymptotic formulas for the walks: indeed, this could impact intermediate asymptotic terms (smaller than the main asymptotic term, but larger than the error term); however, in our case, all is safe as we have

\frac{(\ln(n\pm 1))^{m}}{(n\pm 1)^{m^{\prime}}}=\frac{(\ln n)^{m}}{n^{m^{\prime}}}+O\left(\frac{(\ln n)^{m}}{n^{m^{\prime}+1}}\right).

This result is a simple consequence of the combinatorially obvious identity (60), so this direct link between the asymptotics of walks and excursions holds in wider generality for any model of walks with resets for which the step set $\mathcal{S}$ contains only positive steps.
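The identity (60) can be tested by exact enumeration. The sketch below assumes that a Moran walk performs, at each unit of time, a step $+1$ with probability $p$ or a reset to altitude $0$ with probability $q=1-p$ ; the function names are ours. It also allows a rough comparison with the approximation $\exp(-nqp^{h+1})$ recalled in the proof of Theorem 4.7.

```python
import math
from fractions import Fraction

def bounded_walk(n, h, p):
    """dist[a] = P(a walk of length n ends at altitude a and has height <= h).

    Assumed model: step +1 with probability p, reset to 0 with probability 1 - p.
    """
    q = 1 - p
    dist = [Fraction(1)] + [Fraction(0)] * h
    for _ in range(n):
        # a reset is possible from any altitude; an up-step must stay at altitude <= h
        dist = [q * sum(dist)] + [p * dist[a] for a in range(h)]
    return dist

def height_cdf(n, h, p):
    """Exact P(H_n <= h)."""
    return sum(bounded_walk(n, h, p))

def excursion_height_cdf(n, h, p):
    """Exact P(H_n <= h | Y_n = 0) for n >= 1, using P(Y_n = 0) = q."""
    return bounded_walk(n, h, p)[0] / (1 - p)

if __name__ == "__main__":
    p = Fraction(1, 2)
    print(excursion_height_cdf(20, 3, p) == height_cdf(19, 3, p))
    print(float(height_cdf(200, 8, p)), math.exp(-200 * 0.5 * 0.5 ** 9))
```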

4.5. Fourier series: bounds and infinite differentiability

In his seminal work [38], Knuth mentions at the end of his Section 3 that if one assumes that $\ln(qn)$ is equidistributed mod 1, then the sum $Q(\ln(qn))$ is of “average 0”. Let us slightly amend Knuth’s assertion. Indeed, Weyl’s criterion asserts that a sequence $a_{n}$ is equidistributed mod 1 if and only if, for any positive integer $\ell$ , we have

\lim_{N\rightarrow+\infty}\frac{1}{N}\sum_{n=1}^{N}\exp(2i\pi\ell a_{n})=0.

Considering this sum with $\ell=1$ and $a_{n}=\ln(qn)$ , and applying the Euler–Maclaurin formula to it, one gets that it does not converge to 0, and therefore $\ln(qn)$ is not equidistributed mod 1.
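This non-convergence is easy to observe numerically. In the sketch below (with illustrative names), the Weyl average is computed directly and its modulus is compared with $1/\sqrt{1+4\pi^{2}\ell^{2}}$ , the modulus of the leading term $N^{i\tau}/(1+i\tau)$ (with $\tau=2\pi\ell$ ) obtained when the sum is replaced by the integral $\int_{0}^{N}u^{i\tau}du$ ; this leading term is our own shortcut for the Euler–Maclaurin argument, not a statement taken from the text.

```python
import cmath
import math

def weyl_average(N, q=0.5, ell=1):
    """(1/N) * sum_{n=1}^{N} exp(2*i*pi*ell*ln(q*n))."""
    total = sum(cmath.exp(2j * math.pi * ell * math.log(q * n)) for n in range(1, N + 1))
    return total / N

def leading_modulus(ell=1):
    """Modulus of N^{i*tau} / (1 + i*tau) for tau = 2*pi*ell (independent of N and q)."""
    return 1 / math.sqrt(1 + (2 * math.pi * ell) ** 2)

if __name__ == "__main__":
    for N in (1_000, 10_000, 100_000):
        # the modulus does not tend to 0: it stabilizes near leading_modulus()
        print(N, abs(weyl_average(N)), leading_modulus())
```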

However, it is indeed true that the oscillating $Q(x)$ and $R(x)$ are of mean value zero over their period (i.e., $\int_{0}^{\ln(1/p)}Q(x)dx=0$ ; see Figure 7), and that $Q(\ln(qn))$ and $R(\ln(qn))$ are “almost” of mean value zero and that they possess small fluctuations. Let us give an explicit bound on their amplitude. To this aim, we first need to bound the digamma function\footnote{This is a rather misleading name: indeed, the digamma function is traditionally denoted by the letter psi (i.e., $\psi$ ), while it should logically be denoted by the Greek letter digamma (i.e., $\digamma$ , a letter which looks like a big $\Gamma$ stacked on a small $\Gamma$ , which later gave birth to the more familiar letter $F$ in the Latin alphabet). This paradox is due to the fact that Stirling, who introduced this function, did initially use the notation digamma $\digamma$ , but later authors switched the notation to $\psi$ , while the initial name remained.}, defined by

\psi(z):=\Gamma^{\prime}(z)/\Gamma(z).

(61)

The function $\psi$ can be seen as an analytic continuation of harmonic numbers and satisfies $\psi(t+1)=\psi(t)+1/t$ . While several bounds for $\psi(z)$ exist in the literature (see e.g. [52]), most of them are dedicated to $z\in\mathbb{R}$ (for example we have $\psi(t)<\ln(t)-1/(2t)$ for $t>0$ ), so we now establish a lemma for $z\in i\mathbb{R}$ (which we believe to be new, and which has its own interest beyond our application hereafter to bounds of Fourier series).

Lemma 4.11 (A bound for the digamma function on the imaginary axis).

For $t>0$ , we have

\left|\psi(it)\right|\leq\frac{1}{2}\ln\left(1+t^{2}\right)+\left(\frac{\pi}{2}+1-\gamma\right)+\frac{1}{t},

(62)

which also implies the bound

\left|\psi(it)\right|\leq\left(\frac{\pi}{2}+1-\gamma+\frac{\ln 2}{2}\right)+\left(\ln(t){\mathbbm{1}}_{\{t\geq 1\}}+\frac{1}{t}\right).

Proof 4.12.

Using Euler’s representation of the gamma function as an infinite product, i.e.,

\Gamma(z)=\frac{1}{z}\prod_{k\geq 1}(1+1/k)^{z}/(1+z/k)=\frac{\exp(-\gamma z)}{z}\prod_{k\geq 1}\frac{\exp(z/k)}{1+z/k},

we get that its logarithmic derivative, $\psi(z)=\Gamma^{\prime}(z)/\Gamma(z)$ , satisfies, for $z\in\mathbb{C},z\notin-{\mathbb{N}}$ :

\psi(z)=-\frac{1}{z}-\gamma+\sum_{k=1}^{+\infty}\frac{z}{k(k+z)}.

(63)

We refer to [18, Section 1.1] for more details on these formulas. Now, setting $z=it$ (with $t>0$ ), and regrouping the imaginary and real parts gives

\psi(it)=i\left(\frac{1}{t}+\sum_{n=1}^{+\infty}\frac{t}{n^{2}+t^{2}}\right)+\left(\sum_{n=1}^{+\infty}\frac{t^{2}}{n\left(n^{2}+t^{2}\right)}-\gamma\right),

and thus, by the triangle inequality

\left|\psi(it)\right|\leq\left(\frac{1}{t}+\sum_{n=1}^{+\infty}\frac{t}{n^{2}+t^{2}}\right)+\left(\sum_{n=1}^{+\infty}\frac{t^{2}}{n\left(n^{2}+t^{2}\right)}-\gamma\right).

(64)

Here, note that for all $n\leq u<n+1$ , we have $n^{2}+t^{2}\leq u^{2}+t^{2}<(n+1)^{2}+t^{2}$ , and thus

\frac{t}{(n+1)^{2}+t^{2}}\leq\int_{n}^{n+1}\frac{t}{u^{2}+t^{2}}du\leq\frac{t}{n^{2}+t^{2}}.

Summing for $n$ from $0$ to $+\infty$ , we obtain

\displaystyle\sum_{n=1}^{+\infty}\frac{t}{n^{2}+t^{2}}\leq\sum_{n=0}^{+\infty}\int_{n}^{n+1}\frac{t}{u^{2}+t^{2}}du=\int_{0}^{+\infty}\frac{t}{u^{2}+t^{2}}du=\frac{\pi}{2}.

So the first infinite sum in (64) is bounded by $\pi/2$ . For the second infinite sum, it is convenient to split it into the contribution from the summand for $n=1$ , which is bounded by

\sup_{t\geq 0}\left(\frac{t^{2}}{1+t^{2}}\right)=1,

plus the remaining part (i.e., the sum of the terms for $n\geq 2$ ):

\displaystyle\sum_{n=2}^{+\infty}\frac{t^{2}}{n\left(n^{2}+t^{2}\right)}\leq\int_{t^{-1}}^{+\infty}\frac{1}{u(u^{2}+1)}du\ =\ \frac{1}{2}\ln(1+t^{2}).

Plugging these two bounds in (64) proves our lemma.
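The bound (62) can be tested numerically from the series (63). The following sketch sums the real and imaginary parts of $\psi(it)$ and adds an integral estimate of the truncated tails; this tail correction is our own numerical device (an assumption, not part of the proof), and the names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

def psi_imag_axis(t, N=2000):
    """Approximation of psi(i*t) for t > 0, from the real and imaginary parts of (63)."""
    re = math.fsum(t * t / (n * (n * n + t * t)) for n in range(1, N + 1)) - EULER_GAMMA
    im = 1 / t + math.fsum(t / (n * n + t * t) for n in range(1, N + 1))
    # tail estimates: integrals of the summands over (N + 1/2, +infinity)
    a = N + 0.5
    re += 0.5 * math.log1p((t / a) ** 2)
    im += math.pi / 2 - math.atan(a / t)
    return complex(re, im)

def bound_62(t):
    """Right-hand side of (62)."""
    return 0.5 * math.log(1 + t * t) + (math.pi / 2 + 1 - EULER_GAMMA) + 1 / t

if __name__ == "__main__":
    for t in (0.05, 0.5, 1.0, 3.0, 10.0, 50.0):
        print(t, abs(psi_imag_axis(t)), bound_62(t))
```

As a cross-check, the imaginary part can be compared with the classical closed form $\operatorname{Im}\psi(it)=\frac{1}{2t}+\frac{\pi}{2}\coth(\pi t)$ .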

Equipped with the previous lemma, we can now give our bounds for $Q(x)$ and $R(x)$ .

Proposition 4.13 (Uniform bounds for the oscillations).

The oscillating functions $Q(x)$ and $R(x)$ are uniformly bounded by

	$\displaystyle\sup_{x\in\mathbb{R}^{+}}|Q(x)|$	$\displaystyle\leq\frac{\ln(p)}{\pi}\operatorname{lnexp}\left(p,\frac{4}{5}\pi^{2}\right),$		(65)
	$\displaystyle\sup_{x\in\mathbb{R}^{+}}|R(x)|$	$\displaystyle\leq\frac{\ln(p)}{\pi}\left[\operatorname{lnexp}\left(p,\frac{4}{5}\pi^{2}\right)+\left(\frac{\pi}{2}\!+\!1\!-\!\gamma\!-\!\frac{\ln(p)}{2\pi}\right)\operatorname{lnexp}\left(p,\frac{114}{155}\pi^{2}\right)\right],\qquad$		(66)

where

\operatorname{lnexp}(p,\beta):=\ln\left(1-\exp\left(\frac{\beta}{\ln(p)}\right)\right).

(67)

For $p=1/2$ , we have more precisely

\sup_{x\in\mathbb{R}^{+}}|Q(x)|=1.090430\dots\times 10^{-6}\text{\qquad and \qquad}\sup_{x\in\mathbb{R}^{+}}|R(x)|=2.987768\dots\times 10^{-6}.

Proof 4.14.

Applying the triangle inequality on the definition of $Q(x)$ in (46), we get

|Q(x)|\leq\sum_{k\in\mathbb{Z}\setminus\{0\}}|\Gamma(s_{k})|\times|\exp(-s_{k}x)|\leq 2\sum_{k\geq 1}\left|\Gamma(s_{k})\right|

(a quantity independent of $x$ , as $|\exp(-s_{k}x)|=1$ ). Then, using the complement formula for the gamma function, we have $\Gamma(-z)\Gamma(z)=\frac{\pi}{z\sin(\pi(z+1))}$ (for $z\not\in\mathbb{Z}$ ). Using this relation for $z=it$ (with $t\in\mathbb{R}$ ) together with the relation $\overline{\Gamma(z)}=\Gamma(\bar{z})$ , we infer that

|\Gamma(it)|^{2}=\Gamma(it)\Gamma(-it)=\frac{\pi}{t\sinh(\pi t)}.

(68)

Thus, for $t=\frac{2\pi}{-\ln p}$ , this gives

\displaystyle\sup_{x\in\mathbb{R}^{+}}|Q(x)|

\displaystyle\leq 2\sum_{k\geq 1}\sqrt{\frac{\pi}{kt\sinh(\pi kt)}}=\sqrt{\frac{\ln(1/p)}{2}}\sum_{k\geq 1}\sqrt{\frac{1}{k\sinh(\pi kt)}}.

(69)

As, for $x\geq 0$ , we have $\sinh(x)\geq(1/4)x\exp(4x/5)$ , we get

$\displaystyle\sup_{x\in\mathbb{R}^{+}}|Q(x)|$	$\displaystyle\leq\sqrt{\frac{\ln(1/p)}{2}}\sum_{k\geq 1}\sqrt{\frac{1}{(1/4)\pi k^{2}t\exp(4\pi kt/5)}}$
	$\displaystyle={\ln(1/p)}\sum_{k\geq 1}\frac{1}{\pi k\exp\left(\frac{2}{5}\pi kt\right)}$	(70)
	$\displaystyle=\frac{\ln(p)}{\pi}\ln\left(1-\exp\left(\frac{4\pi^{2}}{5\ln(p)}\right)\right).$	(71)

Note that the more relaxed bound (71) is quite close to the stricter bound (69): e.g. for $p=1/2$ the bound (69) gives the upper bound $1.090430\dots\times 10^{-6}$ (and one can numerically check that these first digits also constitute a lower bound), while the bound (71) gives the upper bound $2.49\times 10^{-6}$ .
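These two numerical values can be reproduced with a few lines of Python. The sketch below (with illustrative names) evaluates the series $2\sum_{k\geq 1}|\Gamma(s_{k})|$ by means of the closed form (68), and the relaxed closed-form bound (71).

```python
import math

def exact_series_bound(p, kmax=50):
    """2 * sum_{k>=1} |Gamma(s_k)|, using |Gamma(i*u)|^2 = pi / (u * sinh(pi*u)) from (68)."""
    t = 2 * math.pi / (-math.log(p))
    total = 0.0
    for k in range(1, kmax + 1):
        u = k * t
        if math.pi * u > 700:  # sinh would overflow in double precision; these terms are negligible
            break
        total += 2 * math.sqrt(math.pi / (u * math.sinh(math.pi * u)))
    return total

def relaxed_bound_71(p):
    """Closed-form bound (71): (ln p / pi) * ln(1 - exp(4*pi^2 / (5*ln p)))."""
    lp = math.log(p)
    return lp / math.pi * math.log1p(-math.exp(4 * math.pi ** 2 / (5 * lp)))

if __name__ == "__main__":
    print(exact_series_bound(0.5), relaxed_bound_71(0.5))
```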

Now, for bounding $R(x)$ , we use the identity $\Gamma^{\prime}(z)=\psi(z)\Gamma(z)$ , with the bound (62) from Lemma 4.11 for $|\psi(it)|$ , and the bound (70) for $|\Gamma(it)|$ :

	$\displaystyle|R(x)|$	$\displaystyle\leq 2\sum_{k\geq 1}\left|\Gamma^{\prime}(s_{k})\right|=2\sum_{k\geq 1}\left|\psi(s_{k})\right|\left|\Gamma(s_{k})\right|$
		$\displaystyle\leq\sum_{k\geq 1}\left(\frac{1}{2}\ln\left(1\!+\!\left(\frac{2\pi k}{\ln(p)}\right)^{2}\right)\!+\!\frac{\pi}{2}\!+\!1\!-\!\gamma\!-\!\frac{\ln p}{2\pi k}\right)\frac{\ln(1/p)}{\pi k\exp\left(-\frac{4}{5}\pi^{2}k/\ln(p)\right)}.\qquad$		(72)

Now, it is easy to check that we have $\frac{1}{2}\ln\left(1+x^{2}\right)\leq\exp\left(\frac{1}{31}\pi x\right)$ for all $x>0$ . Then, noting $t=-2\pi/\ln(p)$ , we get

	$\displaystyle\sum_{k\geq 1}\frac{1}{2}\ln\left(1+(kt)^{2}\right)\frac{\ln(1/p)}{\pi k\exp\left(\frac{2}{5}\pi kt\right)}$	$\displaystyle\leq\sum_{k\geq 1}\frac{\ln(1/p)}{\pi k\exp\left(\frac{57}{155}\pi kt\right)}$
		$\displaystyle=\frac{\ln(p)}{\pi}\ln\left(1-\exp\left(\frac{114\pi^{2}}{155\ln(p)}\right)\right).$

Together with the contribution of the remaining summands in (72), this gives the bound (66) for $|R(x)|$ .
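The two elementary inequalities used in this proof, namely $\sinh(x)\geq(1/4)x\exp(4x/5)$ and $\frac{1}{2}\ln(1+x^{2})\leq\exp(\frac{1}{31}\pi x)$ , as well as the exponent $\frac{2}{5}-\frac{1}{31}=\frac{57}{155}$ , can be checked on a grid. The sketch below is a numerical check only (not a proof), and the grid is an arbitrary choice of ours.

```python
import math
from fractions import Fraction

def sinh_lower_bound_holds(x):
    """Checks sinh(x) >= (1/4) * x * exp(4x/5) at the point x."""
    return math.sinh(x) >= 0.25 * x * math.exp(0.8 * x)

def log_upper_bound_holds(x):
    """Checks (1/2) * ln(1 + x^2) <= exp(pi*x/31) at the point x."""
    return 0.5 * math.log1p(x * x) <= math.exp(math.pi * x / 31)

# arbitrary grid x = 0.01, 0.02, ..., 200 (an assumption of this sketch)
GRID = [i / 100 for i in range(1, 20001)]

if __name__ == "__main__":
    print(all(sinh_lower_bound_holds(x) for x in GRID))
    print(all(log_upper_bound_holds(x) for x in GRID))
    print(Fraction(2, 5) - Fraction(1, 31))
```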

From this, we can establish the infinite differentiability of our fluctuations.

Theorem 4.15 (Fourier series infinite differentiability).

The Fourier series

Q(x)=\sum_{k\in\mathbb{Z}\setminus\{0\}}\Gamma(s_{k})\exp(-s_{k}x)\text{\qquad and \qquad}R(x)=\sum_{k\in\mathbb{Z}\setminus\{0\}}\Gamma^{\prime}(s_{k})\exp(-s_{k}x)

(where $s_{k}=\frac{2ik\pi}{\ln p}$ ) are infinitely differentiable on ${\mathbb{R}}$ .

Proof 4.16.

A Fourier series $f(x)=\sum_{k\in{\mathbb{Z}}}c_{k}\exp(-ikx)$ satisfies the Weierstrass $M$ -test if there exists a sequence $M_{k}$ such that $|c_{k}\exp(-ikx)|+|c_{-k}\exp(ikx)|<M_{k}$ (for all $x\in\mathbb{R}$ ) and $\sum_{k\geq 0}M_{k}$ converges. If $f(x)$ and $g(x):=-i\sum_{k\in{\mathbb{Z}}}kc_{k}\exp(-ikx)$ both satisfy the Weierstrass $M$ -test, then they converge absolutely and uniformly in $\mathbb{R}$ , and $f^{\prime}=g$ .

Thus, by successive application of this $M$ -test, if the coefficients decay polynomially, i.e., we have $|c_{-k}|+|c_{k}|=O(|k|^{-d-1})$ , then $f(x)$ is in ${\mathcal{C}}^{d}$ (that is, $d$ times differentiable) and $f(x)$ is in ${\mathcal{C}}^{\infty}$ (that is, infinitely differentiable) if its coefficients decay faster than any polynomial rate. By Equation (68), the coefficients $\Gamma(s_{k})$ decay like $\approx\exp(k\pi^{2}/\ln(p))$ (an exponential decay, as $\ln(p)<0$ ), so $Q(x)$ is in ${\mathcal{C}}^{\infty}$ . By Equation (72), the coefficients $\Gamma^{\prime}(s_{k})$ also decay like an exponential, so $R(x)$ is in ${\mathcal{C}}^{\infty}$ .

It is interesting to compare this smoothness result with the situation observed by Delange [16] in his seminal work on the sum of digits of $n$ in base $1/p$ (when $1/p$ is an integer). Therein, he proved an asymptotic behavior involving fluctuations dictated by a Fourier series, which can also be obtained by a Mellin transform approach, quite similarly to the road followed in our article. It appears that his Fourier series (already mentioned in Remark 4.5) has coefficients $\zeta(s_{k})/((1+s_{k})s_{k})\approx k^{-1.5}$ ; it is thus not surprising that the Delange series is nowhere differentiable, in sharp contrast with the smoothness of our Fourier series (see Figure 8).

This concludes our analysis of the height and the corresponding fluctuations.

5. Some results for the Moran model in dimension $m>1$

5.1. Joint distribution of ages for the Moran model with $m>1$

Moran processes are models of population evolution (or mutation transmission) where the population is of constant size (some individuals could die but are then immediately replaced by a new individual). Depending on the applications, several variants were considered in the literature starting with the seminal work of Moran himself [43, 44], up to more recent extensions (for example to spatially structured populations [41]).

Motivated by the model with resets of Itoh, Mahmoud, and Takahashi [35, 34], we now define the Moran model with $m$ individuals. It is a process parametrized by some probabilities $p$ and $p_{i}$ ’s such that $p+\sum_{i=0}^{m}p_{i}=1$ , and which starts at time 0 with $m$ individuals of age $0$ . Then, at each new unit of time,

•

either, with probability $p$ , all survive (their age increases by 1),
•

or, with probability $p_{i}$ (for $1\leq i\leq m$ ), the $i$ -th individual dies (it is then replaced by a new $i$ -th individual of age 0), while the age of the $m-1$ surviving individuals increases by 1,
•

or, with probability $p_{0}$ , all die and are replaced by $m$ new individuals of age 0.

Now, we define the sequence of multivariate polynomials $f_{n}(x_{1},\dots,x_{m})$ (for $n\in\mathbb{N}$ ) by the fact that the coefficient of $x_{1}^{k_{1}}\cdots x_{m}^{k_{m}}$ in $f_{n}(x_{1},\dots,x_{m})$ is the probability that, at time $n$ , the $i$ -th individual has age $k_{i}$ (for $i=1,\dots,m$ ). Accordingly, $F(t,x_{1},\dots,x_{m}):=\sum_{n\geq 0}f_{n}(x_{1},\dots,x_{m})t^{n}$ is the probability generating function associated to the above Moran model, where the time is encoded by the exponent of $t$ .

Theorem 5.1.

The probability generating function of the Moran model is a rational function, and it admits the closed form

F(t,x_{1},\dots,x_{m})=\frac{\sum_{k=0}^{2^{m}-1}(-1)^{k}P_{k}t^{k}}{\Delta},

(73)

where the $P_{k}$ ’s are polynomials (given in the proof) in the $x_{i}$ ’s, $p$ , $p_{i}$ ’s, and where $\Delta$ is the following polynomial of degree $2^{m}$ in $t$ :

\Delta=\prod_{I\subseteq\{1,\dots,m\}}\left(1-t\left(p+p_{0}[\![{I=\{1,\dots,m\}}]\!]+\sum_{i\in I}p_{i}\right)\prod_{i\not\in I}x_{i}\right).

(74)

Proof 5.2.

The Moran model evolution is encoded by the following functional equation for the probability generating function $F$ :

	$\displaystyle F(t,x_{1},\dots,x_{m})=1$	$\displaystyle+tpx_{1}\cdots x_{m}F(t,x_{1},\dots,x_{m})+tp_{0}F(t,1,\dots,1)$
		$\displaystyle+t\left(\sum_{i=1}^{m}p_{i}\frac{x_{1}\cdots x_{m}}{x_{i}}F(t,x_{1},\dots,x_{m})_{|x_{i}=1}\right),$		(75)

where $F_{|x_{i}=1}$ means $F$ evaluated at $x_{i}=1$ .

To solve this single functional equation (which has $m+2$ unknowns\footnote{We temporarily count $F(t,1,\dots,1)$ as unknown, even if it is obviously equal to $1/(1-t)$ , as $F$ is a probability generating function.}), the trick is to transform it into a linear system of equations with… $2^{m}$ unknowns! Indeed, by substituting $x_{i}=1$ (in all the possible ways) in the functional equation (75), we get a system of $2^{m}$ equations.

Then, we encode this system by a matrix $M$ , where we cleverly (sic!) choose the order in which unknowns are associated to the lines/columns of $M$ . Let us define this order; to this aim consider the Cartesian product ${\mathcal{X}}:=\{1,x_{1}\}\times\cdots\times\{1,x_{m}\}$ . For any pair of $m$ -tuples $\mathbf{X}$ and $\mathbf{Y}$ from ${\mathcal{X}}$ , one writes $\mathbf{X}\prec\mathbf{Y}$ if the number of 1’s in $\mathbf{X}$ is less than the number of 1’s in $\mathbf{Y}$ , or, when they have the same number of 1’s, if $\mathbf{X}$ is smaller than $\mathbf{Y}$ in the lexicographical order induced by $x_{1}\prec\dots\prec x_{m}\prec 1$ . For example, we have $(x_{1},x_{2})\prec(x_{1},1)\prec(1,x_{2})\prec(1,1)$ . Listing all the elements of $\mathcal{X}$ in increasing order, we get a list of $2^{m}$ tuples $X_{1},\dots,X_{2^{m}}$ . The matrix $M$ encoding the aforementioned system of equations is constructed such that the $i$ -th line of the matrix $M$ corresponds to the unknown $F(t,X_{i})$ and the $j$ -th column corresponds to the unknown $F(t,X_{j})$ .

With this order, the matrix $M$ is an upper triangular matrix (as each of the substitutions of some $x_{i}$ ’s by some 1’s in Equation (75) leads from some tuple $\mathbf{X}$ to $m+2$ larger tuples $\mathbf{Y}$ ), and thus the determinant of $M$ is the product of its diagonal terms:

\det M=\ \prod_{I\subseteq\{1,\dots,m\}}\left(1-t\left(p+p_{0}[\![{I=\{1,\dots,m\}}]\!]+\sum_{i\in I}p_{i}\right)\prod_{i\not\in I}x_{i}\right),

(76)

where we use Iverson’s bracket notation\footnote{This notation, $[\![\text{assertion}]\!]$ , is 1 if the assertion is true, and 0 if not. It was introduced in the semantics of the language APL by its founder, Kenneth Iverson. It was later popularized in mathematics by Graham, Knuth, and Patashnik [27].}.

As this determinant $\Delta:=\det M$ is not zero, this entails by Cramer’s rule that $F(t,x_{1},\dots,x_{m})$ can be written as a rational function with denominator $\Delta$ (note that, for some specific real values of $p$ and the $p_{i}$ ’s, it is not excluded that the numerator could have a shared factor with $\Delta$ ). Of course, computing the determinant of each comatrix, and using the relation $p_{0}=1-(p+p_{1}+\dots+p_{m})$ , we get symmetric polynomial expressions for the $P_{k}$ ’s occurring in (73), e.g.:

	$\displaystyle P_{0}$	$\displaystyle=1,$
	$\displaystyle P_{1}$	$\displaystyle=p\left(\prod_{i=1}^{m}(1+x_{i})-\prod_{i=1}^{m}x_{i}\right)+\sum_{i=1}^{m}x_{i}\sum_{\begin{subarray}{c}j=1,\dots,m\\ j\neq i\end{subarray}}p_{j},$
	$\displaystyle\vdots$
	$\displaystyle P_{2^{m}-1}$	$\displaystyle=\left(\prod_{i=1}^{m}x_{i}^{m}\right)\prod_{I\subsetneq\{1,\dots,m\}}\left(p+\sum_{i\in I}p_{i}\right).$
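The rational form (73) with the denominator (74) can be tested by brute force for small $m$ . The following sketch (all names are ours, and the numerical values of the parameters are arbitrary) computes the values of $f_{n}$ at a rational point from the dynamics of the model, multiplies the truncated series by $\Delta$ , and allows one to check that the coefficients of $t^{k}$ vanish for $k\geq 2^{m}$ .

```python
from fractions import Fraction
from itertools import combinations

def moran_step(dist, p, p0, ps):
    """One unit of time of the Moran model on a dict {tuple of ages: probability}."""
    m, new = len(ps), {}
    def add(state, w):
        new[state] = new.get(state, 0) + w
    for ages, w in dist.items():
        add(tuple(a + 1 for a in ages), p * w)   # all survive
        add((0,) * m, p0 * w)                    # all die
        for i in range(m):                       # only the i-th individual dies
            add(tuple(0 if j == i else a + 1 for j, a in enumerate(ages)), ps[i] * w)
    return new

def f_values(nmax, x, p, p0, ps):
    """Values f_0(x), ..., f_nmax(x) of the age polynomials at the point x."""
    dist, vals = {(0,) * len(ps): Fraction(1)}, []
    for _ in range(nmax + 1):
        total = Fraction(0)
        for ages, w in dist.items():
            for xi, a in zip(x, ages):
                w = w * xi ** a
            total += w
        vals.append(total)
        dist = moran_step(dist, p, p0, ps)
    return vals

def delta_coeffs(x, p, p0, ps):
    """Coefficients in t of the denominator (74), evaluated at the point x."""
    m, coeffs = len(ps), [Fraction(1)]
    for r in range(m + 1):
        for I in combinations(range(m), r):
            c = p + sum(ps[i] for i in I) + (p0 if r == m else 0)
            for i in range(m):
                if i not in I:
                    c *= x[i]
            # multiply the current polynomial by (1 - c*t)
            coeffs = [a - c * b for a, b in zip(coeffs + [Fraction(0)], [Fraction(0)] + coeffs)]
    return coeffs

def numerator_coeffs(nmax, x, p, p0, ps):
    """Coefficients of t^0, ..., t^nmax in the product F * Delta."""
    f, d = f_values(nmax, x, p, p0, ps), delta_coeffs(x, p, p0, ps)
    return [sum(d[j] * f[n - j] for j in range(min(n, len(d) - 1) + 1)) for n in range(nmax + 1)]

if __name__ == "__main__":
    x = (Fraction(2, 3), Fraction(5, 7))
    ps = (Fraction(1, 4), Fraction(3, 8))
    print(numerator_coeffs(9, x, Fraction(1, 4), Fraction(1, 8), ps))
```

For $m=1$ , the same computation returns the numerator $1-pt$ .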

Note that the case $p_{0}=0$ , $p_{i}=1/m$ for $i=1,\dots,m$ (with $m\geq 2$ ) was analyzed by Itoh and Mahmoud [34]: they proved that the age of each individual converges to a shifted geometric distribution, namely $\operatorname{Geom}(1/m)-1$ . They also show that the number of individuals of age $k$ at time $n$ converges to a Bernoulli distribution, namely $\operatorname{Ber}(((m-1)/m)^{k})$ . Our Theorem 5.1 constitutes a joint law version of these results, at discrete times, for generic $p_{i}$ ’s. For example, introducing $G(t,v):=\sum_{j=1}^{m}\binom{m}{j}v^{j}[x_{1}^{k}\dots x_{j}^{k}]F(t,x_{1},\dots,x_{m})$ , the coefficient $[t^{n}]\partial_{v}G(t,1)$ gives the average number of individuals of age $k$ at time $n$ . (Note that the sum with the binomial coefficients $\binom{m}{j}$ has to be replaced by a sum over the subsets of $\{1,\dots,m\}$ if the $p_{i}$ ’s and the initial conditions for the $x_{i}$ ’s are not symmetric.)

5.2. A multidimensional generalization of the Moran model

Interestingly, the same strategy of proof allows us to solve a wide generalization of the Moran model, where

•

with probability $p_{I}$ , all the individuals from the subset $I$ of $\{1,\dots,m\}$ die (they are then replaced by new individuals of age 0), while the age of each surviving individual increases by 1.
•

the process starts with $m$ individuals of any (possibly distinct) ages, encoded by a monomial $f_{0}(x_{1},\dots,x_{m})$ .

This translates to the following single functional equation, involving $2^{m}$ unknowns:

F(t,x_{1},\dots,x_{m})=f_{0}(x_{1},\dots,x_{m})+t\sum_{I\subseteq\{1,\dots,m\}}p_{I}F(t,{\mathbf{X}}_{I})\ \prod_{i\not\in I}x_{i},

(77)

where ${\mathbf{X}}_{I}=(x_{1},\dots,x_{m})_{|\text{$x_{i}=1$ for all\ $i\in I$}}$ .

Obviously, by taking $f_{0}=1$ , $p_{\varnothing}=p$ , $p_{\{1,\dots,m\}}=p_{0}$ , $p_{\{i\}}=p_{i}$ , and all other $p_{I}=0$ , the generalized model simplifies to the classical Moran model of Theorem 5.1. Another natural set of probabilities is $p_{I}=q^{k}(1-q)^{m-k}$ , where $k$ is the number of elements in $I$ . It encodes the model where, at each unit of time, each individual dies with probability $q$ .
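The dynamics of the generalized model are easy to simulate exactly. The sketch below (with names of our own) encodes the probabilities $p_{I}$ in a dictionary indexed by subsets, starting from $f_{0}=1$ , and illustrates the choice $p_{I}=q^{k}(1-q)^{m-k}$ : since each individual then dies independently with probability $q$ , the joint law of the ages at time $n$ should be the product of truncated geometric laws.

```python
from fractions import Fraction
from itertools import combinations

def subsets(m):
    """All subsets of {0, ..., m-1} (0-based labels for the individuals)."""
    return [frozenset(c) for r in range(m + 1) for c in combinations(range(m), r)]

def generalized_step(dist, probs):
    """One step: with probability probs[I], the individuals of I die, the others age by 1."""
    new = {}
    for ages, w in dist.items():
        for I, pI in probs.items():
            state = tuple(0 if i in I else a + 1 for i, a in enumerate(ages))
            new[state] = new.get(state, 0) + pI * w
    return new

def independent_deaths(m, q):
    """The choice p_I = q^|I| * (1-q)^(m-|I|)."""
    return {I: q ** len(I) * (1 - q) ** (m - len(I)) for I in subsets(m)}

def distribution(n, m, probs):
    """Exact joint law of the ages at time n, starting from m individuals of age 0."""
    dist = {(0,) * m: Fraction(1)}
    for _ in range(n):
        dist = generalized_step(dist, probs)
    return dist

def marginal(k, n, q):
    """Age law of one individual at time n under independent deaths."""
    return q * (1 - q) ** k if k < n else (1 - q) ** n

if __name__ == "__main__":
    q, n, m = Fraction(1, 3), 4, 3
    dist = distribution(n, m, independent_deaths(m, q))
    print(dist[(0, 1, 4)], marginal(0, n, q) * marginal(1, n, q) * marginal(4, n, q))
```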

More generally, for any set of $p_{I}$ ’s, one gets the following result.

Theorem 5.3.

The probability generating function of the generalized Moran model is a rational function:

F(t,x_{1},\dots,x_{m})=\frac{\sum_{k=0}^{2^{m}-1}(-1)^{k}Q_{k}t^{k}}{\Delta},

(78)

where the $Q_{k}$ ’s are polynomials in the $x_{i}$ ’s and $p_{I}$ ’s for $I\subset\{1,\dots,m\}$ , and where $\Delta$ is the following polynomial of degree $2^{m}$ in $t$ :

\Delta=\prod_{I\subseteq\{1,\dots,m\}}\left(1-t\left(p_{\varnothing}+p_{\{1,\dots,m\}}[\![{I=\{1,\dots,m\}}]\!]+\sum_{i\in I}p_{\{i\}}\right)\prod_{i\not\in I}x_{i}\right).

(79)

Note that, for this generalized model, the denominator $\Delta$ is the same as in Theorem 5.1, and the $Q_{k}$ ’s are a lifting of the $P_{k}$ ’s from Theorem 5.1, involving more terms and variables (namely, all the $p_{I}$ ’s). For these two models, these polynomials $P_{k}$ and $Q_{k}$ are variants of symmetric functions. We comment more on this fact now.

Remark 5.4 (Links with bi-indexed families of symmetric functions).

Many problems related to lattice paths lead to generating functions expressible in terms of symmetric functions; this results from the kernel method, which involves a Vandermonde-like determinant, and thus leads to variants of Schur functions [4, 11, 6]. For the generalized Moran model we also get symmetric expressions, as the problem is by design symmetric, but in a more subtle way: one does not get formulas nicely expressible in terms of classical symmetric functions. This is due to the fact that we have to play with two distinct sets of variables (the $p_{i}$ ’s and the $x_{i}$ ’s), the occurrences of which are not fully independent. It appears that these subtle dependencies are well encoded by the MacMahon elementary symmetric functions, defined by $e_{j,k}:=[t_{x}^{j}t_{p}^{k}]\prod_{i=1}^{m}(1+t_{x}x_{i}+t_{p}p_{i})$ . For example, for $m=3$ , we have $e_{2,1}=x_{1}x_{2}p_{3}+x_{2}x_{3}p_{1}+x_{3}x_{1}p_{2}$ . They allow us to provide more compact formulas for our generating functions, like $P_{1}=e_{1,1}+p\sum_{j=1}^{m}e_{j,0}$ . We plan to study these aspects in a forthcoming work. Note that these MacMahon symmetric functions also appear in problems a priori unrelated to our multidimensional Moran walks, see e.g. the articles of Gessel [25] and Rosas [48].

5.3. Application to the soliton wave model

The soliton wave model (as considered by Itoh, Mahmoud, and Takahashi [35]) is a stochastic system of particles encoding a unidirectional wave. The number of particles is constant during the full process: we have $m$ particles on $\mathbb{Z}$ which can only move to the left as follows. At time $n=0$ , the initial configuration consists of $m$ particles, at $x$ -coordinates $1,\dots,m$ . Then, at each unit of time $n=1,\,2,\dots$ , uniformly at random, one of the $m$ particles jumps just to the left of the first particle (the wave front), thus leaving an empty space at its starting position:

(Illustration: a configuration before the jump $\longrightarrow$ the configuration after the jump.)

Note that at time $n$ the location of the leftmost particle has thus $x$ -coordinate $1-n$ . See Figure 9 for an illustration of 6 iterations of this process, where, for drawing convenience, we shift the origin of the $x$ -axis after each step, so that the first particle is always at $x$ -coordinate 1.

Then, applying Theorem 5.3 to this model, we get the following proposition.

Proposition 5.5.

The joint distribution $F(t,x_{1},\dots,x_{m})$ of the time/positions of the particles in the soliton wave model is given by Formula (78), by taking as initial condition $f_{0}=x_{1}^{1}x_{2}^{2}\dots x_{m}^{m}$ , and, as probabilities of transition, $p_{\{i\}}=1/m$ and all other $p_{I}=0$ ; what is more, the denominator of $F(t,x_{1},\dots,x_{m})$ thus simplifies to

\Delta=\prod_{I\subseteq\{1,\dots,m\}}\left(1-t\frac{|I|}{m}\prod_{i\not\in I}x_{i}\right),

where $|I|$ stands for the number of elements of the set $I$ .

Figure 9 also shows that this model has one degree of freedom less than its number of particles, that is, the soliton wave model with $m$ particles can be modeled as $m-1$ interactive urns $U_{1},\dots,U_{m-1}$ : the urn $U_{k}$ contains the number of white cells between the $k$ -th and $(k+1)$ -th blue particle. Accordingly, this interactive urn process starts with $U_{k}(0)=0$ for all $k$ , and then, at each unit of time, we have one of the following $m$ events (each with probability $1/m$ ):

•

$U_{1}(n+1)=U_{1}(n)+1$ and other urns are unchanged.
•

for $k=2,\dots,m-1$ : $U_{1}(n+1)=0$ , $U_{j}(n+1):=U_{j-1}(n)$ (for $j=2,\dots,k-1$ ), $U_{k}(n+1):=U_{k-1}(n)+U_{k}(n)+1$ , and remaining urns are unchanged.
•

$U_{1}(n+1)=0$ and, for $k\geq 2$ , $U_{k}(n+1):=U_{k-1}(n)$ .

The length of the soliton is then given by $L_{n}=m+U_{1}(n)+\dots+U_{m-1}(n)$ ; it can equivalently be viewed as the maximum of the $x$ -coordinates (at time $n$ ) of each particle.
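The equivalence between the particle description and the urn description can be tested by running both in parallel on the same random choices. In the sketch below (with illustrative names), the particle of rank $k$ (rank 1 being the wave front) jumps to the left of the front, and the three urn rules above are applied for the same $k$ ; the seed and the number of steps are arbitrary.

```python
import random

def jump_particles(pos, k):
    """The particle of rank k (1 = leftmost) jumps just to the left of the wave front."""
    pos = sorted(pos)
    front = pos[0] - 1
    del pos[k - 1]
    return [front] + pos

def jump_urns(U, k):
    """The same event on the urns U = [U_1, ..., U_{m-1}], following the three rules above."""
    m = len(U) + 1
    if k == 1:
        return [U[0] + 1] + U[1:]
    if k == m:
        return [0] + U[:-1]
    # 2 <= k <= m-1: shift the first k-2 urns and merge the urns k-1 and k
    return [0] + U[:k - 2] + [U[k - 2] + U[k - 1] + 1] + U[k:]

def gaps(pos):
    """Numbers of empty cells between consecutive particles."""
    pos = sorted(pos)
    return [b - a - 1 for a, b in zip(pos, pos[1:])]

def simulate(m, steps, seed=1):
    """Runs both descriptions in parallel and returns the final positions and urns."""
    rng = random.Random(seed)
    pos, U = list(range(1, m + 1)), [0] * (m - 1)
    for _ in range(steps):
        k = rng.randint(1, m)  # uniform choice of the jumping particle
        pos, U = jump_particles(pos, k), jump_urns(U, k)
    return pos, U

if __name__ == "__main__":
    pos, U = simulate(5, 200)
    print(gaps(pos) == U, max(pos) - min(pos) + 1, 5 + sum(U))
```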

6. Conclusion and future works

In this article, we considered several statistics (final altitude, waiting time, height) associated to walks with resets, for any given finite step set. For the case of the simplest non-trivial model (namely, for Moran walks), we prove that the asymptotic height exhibits some subtle behavior related to the discrete Gumbel distribution. In a forthcoming article, we plan to consider the asymptotic analysis of the height for more general walks.

In our formulas for walks of length $n$ , taking $q^{\prime}:=q/n$ (and more generally $q^{\prime}=q(n)$ ) as the probability of reset leads to models which can counterbalance the infinite negative drift of the initial model, and thus present a different type of asymptotic behavior. Studying these models and their phase transitions in more detail would be interesting.

In Section 5, we considered several multidimensional extensions of such walks, with applications to the soliton wave model, or to models in genetics. More multidimensional variants of Moran models allowing both positive and negative jumps (and with or without resets) can be handled using the approach presented in this article (see [1]). One interesting example is the one where each dimension evolves like a Motzkin path; this model was e.g. considered in the haploid Moran model [32], where the authors use a Markov chain approach, using duality/reversibility to establish links with Ornstein–Uhlenbeck processes. Note that even if one adds resets to such Motzkin-like models, one keeps nice links with continued fractions associated to birth and death processes; see [20]. The analysis becomes much more complicated as soon as jumps of amplitude $\geq 2$ are allowed; in such cases, our approach based on the kernel method strikes again.

Another natural extension is to consider walks in the quarter plane with resets (a natural model of two queues evolving in parallel); even for walks with jumps of amplitude $1$ , the exact enumeration and the asymptotic behavior of the (maximal) height remain open. Other more ad hoc extensions consider some age-dependent probabilities $p_{i}$ ’s, then leading to partial differential equations for the corresponding generating functions. Some specific cases lead to closed-form solutions.

All these variants of Moran models are parametrized by the $p_{i}$ ’s. One can then turn to the tuning of several statistical tests: having some experimental data, it is natural to look for maximum likelihood estimators of the $p_{i}$ ’s, and to study if they are unbiased, sufficient, and consistent (for more on these notions, see e.g. [50]). In conclusion, the Moran model offers a large variety of interesting models, with many aspects to explore!

Acknowledgement: The work of the first author is supported by the Researchers Supporting Project RSPD2023R987 of King Saud University. We thank Hosam Mahmoud for introducing us to the Moran walks, and asking us about the distribution of their height. We are indebted to Rosa Orellana and Mike Zabrocki for pointing out to us that the bi-indexed symmetric functions which we introduced in Remark 5.4 already occurred in the literature under the name MacMahon symmetric functions. Last but not least, we also deeply thank the two referees for their kind detailed reports, which helped to improve several parts of this article.

References

[1] Asma Althagafi. Maximum Age Attained by a Moran Population and Some Applications. PhD thesis, King Saud University, 2023.
[2] Stephen F. Altschul and Samuel Karlin. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Natl. Acad. Sci. USA, 87:2264–2268, 1990.
[3] Axel Bacher. Generalized Dyck paths of bounded height. Sém. Lothar. Combin., 2023 (to appear).
[4] Cyril Banderier. Combinatoire analytique des chemins et des cartes. PhD thesis, Univ. Paris VI, 2001.
[5] Cyril Banderier and Philippe Flajolet. Basic analytic combinatorics of directed lattice paths. Theor. Comput. Sci., 281(1-2):37–80, 2002.
[6] Cyril Banderier, Marie-Louise Lackner, and Michael Wallner. Latticepathology and symmetric functions. In Proceedings of AofA’2020, volume 159, pages 21:1–21:16. Leibniz International Proceedings in Informatics (LIPIcs), 2020.
[7] Cyril Banderier and Pierre Nicodème. Bounded discrete walks. Discrete Math. Theor. Comput. Sci., AM:35–48, 2010.
[8] Cyril Banderier and Michael Wallner. Lattice paths with catastrophes. Discrete Math. Theor. Comput. Sci., 19(1):32, 2017. Id/No 23.
[9] Iddo Ben-Ari, Alexander Roitershtein, and Rinaldo B. Schinazi. A random walk with catastrophes. Electron. J. Probab., 24:21, 2019. Id/No 28.
[10] Edward A. Bender and Zhicheng Gao. Part sizes of smooth supercritical compositional structures. Comb. Probab. Comput., 23(5):686–716, 2014.
[11] Mireille Bousquet-Mélou. Discrete excursions. Sém. Lothar. Combin., 57:23 pp., 2008.
[12] Mireille Bousquet-Mélou and Marko Petkovšek. Linear recurrences with constant coefficients: the multivariate case. Discrete Math., 225(1-3):51–75, 2000.
[13] Arthur W. Burks, Herman H. Goldstine, and John von Neumann. Preliminary discussion of the logical design of an electronic computing instrument. Report to U.S. Army Ordnance Department, 1946. Reprinted in “Collected Works of John von Neumann”, vol. 5, pp. 34–79, The Macmillan Company, 1963.
[14] Iva Chang, Alan Krinik, and Randall J. Swift. Birth-multiple catastrophe processes. Journal of Statistical Planning and Inference, 137(5):1544–1559, 2007.
[15] Nicolaas Govert de Bruijn, Donald E. Knuth, and Stephen O. Rice. The average height of planted plane trees. In R. C. Read, editor, Graph Theory and Computing, pages 15–22. Academic Press, 1972.
[16] Hubert Delange. Sur la fonction sommatoire de la fonction “somme des chiffres”. Enseign. Math. (2), 21:31–47, 1975.
[17] Louis Dumont. Algorithmes rapides pour le calcul symbolique de certaines intégrales de contour à paramètre. PhD thesis, Univ. Paris Saclay, 2016.
[18] Arthur Erdélyi, W. Magnus, F. Oberhettinger, and Francesco G. Tricomi. Higher transcendental functions. Vol. I. McGraw-Hill Book Co., 1953. Bateman Manuscript Project.
[19] Philippe Flajolet, Xavier Gourdon, and Philippe Dumas. Mellin transforms and asymptotics: Harmonic sums. Theor. Comput. Sci., 144(1-2):3–58, 1995.
[20] Philippe Flajolet and Fabrice Guillemin. The formal theory of birth-and-death processes, lattice path combinatorics and continued fractions. Advances in Applied Probability, 32(3):750–778, 2000.
[21] Philippe Flajolet, Mathieu Roux, and Brigitte Vallée. Digital trees and memoryless sources: from arithmetics to analysis. Discrete Math. Theor. Comput. Sci., AM:233–260, 2010.
[22] Philippe Flajolet and Robert Sedgewick. Analytic Combinatorics. Cambridge University Press, 2009.
[23] Ayla Gafni. Longest run of equal parts in a random integer composition. Discrete Math., 338(2):236–247, 2015.
[24] Ilya B. Gertsbakh. Statistical Reliability Theory. Marcel Dekker, Inc., 1989.
[25] Ira M. Gessel. Enumerative applications of symmetric functions. Sém. Lothar. Combin., B17a:17 pp., 1987.
[26] Xavier Gourdon. Largest component in random combinatorial structures. Discrete Math., 180(1-3):185–209, 1998.
[27] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. A Foundation for Computer Science. Addison-Wesley, 1994. Second edition (1st ed: 1988).
[28] Jan Grandell. Aspects of Risk Theory. Springer, 1991.
[29] Rosemary J. Harris and Hugo Touchette. Phase transitions in large deviations of reset processes. J. Phys. A, 50(10):10LT01, 13, 2017.
[30] Clemens Heuberger and Helmut Prodinger. Carry propagation in signed digit representations. Eur. J. Comb., 24(3):293–320, 2003.
[31] Clemens Heuberger, Sarah Selkirk, and Stephan Wagner. The distribution of the maximum protection number in random trees. arXiv, 2023.
[32] Thierry Huillet and Martin Möhle. Duality and asymptotics for a class of nonneutral discrete Moran models. J. Appl. Probab., 46(3):866–893, 2009.
[33] Blake Hunter, Alan Krinik, Chau Nguyen, Jennifer M. Switkes, and Hubertus F. Von Bremen. Gambler’s ruin with catastrophes and windfalls. Journal of Statistical Theory and Practice, 2008.
[34] Yoshiaki Itoh and Hosam M. Mahmoud. Age statistics in the Moran population model. Stat. Probab. Lett., 74(1):21–30, 2005.
[35] Yoshiaki Itoh, Hosam M. Mahmoud, and Daisuke Takahashi. A stochastic model for solitons. Random Struct. Algorithms, 24(1):51–64, 2004.
[36] Svante Janson. Renewal theory in the analysis of tries and strings. Theor. Comput. Sci., 416:33–54, 2012.
[37] Boris S. Kerner. Introduction to Modern Traffic Flow Theory and Control. The Long Road to Three-Phase Traffic Theory. Springer, 2009.
[38] Donald E. Knuth. The average time for carry propagation. Nederl. Akad. Wet., Proc., Ser. A, 81:238–242, 1978.
[39] Donald E. Knuth. The Art of Computer Programming. Vol. 3: Sorting and searching. Addison-Wesley, 1997 (2nd edition, 1st edition in 1973).
[40] Lukasz Kusmierz, Satya N. Majumdar, Sanjib Sabhapandit, and Grégory Schehr. First order transition for the optimal search time of Lévy flights with resetting. Phys. Rev. Lett., 113:220602, 2014.
[41] Erez Lieberman, Christoph Hauert, and Martin A. Nowak. Evolutionary dynamics on graphs. Nature, 433(7023):312–316, 2005.
[42] Satya N. Majumdar, Sanjib Sabhapandit, and Grégory Schehr. Random walk with random resetting to the maximum position. Phys. Rev. E, 92(5):052126, 13, 2015.
[43] Patrick A. P. Moran. Random processes in genetics. Proc. Camb. Philos. Soc., 54:60–71, 1958.
[44] Patrick A. P. Moran. The Statistical Processes of Evolutionary Theory. Oxford University Press, 1962.
[45] Nicholas Pippenger. Analysis of carry propagation in addition: an elementary approach. J. Algorithms, 42(2):317–333, 2002.
[46] Helmut Prodinger and Stephan Wagner. Minimal and maximal plateau lengths in Motzkin paths. Discrete Math. Theor. Comput. Sci., AH:353–362, 2007.
[47] Helmut Prodinger and Stephan Wagner. Bootstrapping and double-exponential limit laws. Discrete Math. Theor. Comput. Sci., 17(1):123–144, 2015.
[48] Mercedes H. Rosas. MacMahon symmetric functions, the partition lattice, and Young subgroups. J. Comb. Theory, A, 96(2):326–340, 2001.
[49] Bruno Salvy and Paul Zimmermann. GFUN: A Maple package for the manipulation of generating and holonomic functions in one variable. ACM Trans. Math. Softw., 20(2):163–177, 1994.
[50] Aris Spanos. Probability Theory and Statistical Inference. Empirical Modeling with Observational Data. Cambridge University Press, 2019.
[51] Wojciech Szpankowski and Vernon Rego. Yet another application of a binomial recurrence. Order statistics. Computing, 43(4):401–410, 1990.
[52] Zhen-Hang Yang, Yu-Ming Chu, and Xiao-Hui Zhang. Sharp bounds for psi function. Appl. Math. Comput., 268:1055–1063, 2015.