
A Mass Transport Proof of the Ergodic Theorem

Calvin Wooyoung Chin
Abstract.

It is known that a gambler who repeats a game with positive expected value has a positive probability of never going broke. We use the mass transport method to prove the generalization of this fact in which the gains from the bets form a stationary, rather than an i.i.d., sequence. Birkhoff's ergodic theorem follows from this by a standard argument.

Let $X_1, X_2, \ldots$ be (real-valued) random variables, and write

\[ S_n := X_1 + \cdots + X_n \qquad \text{for all $n \in \mathbf{N}$.} \]

In this note, we write $\mathbf{N} := \{1, 2, 3, \ldots\}$.

It is known [Chi22] that a gambler who repeats a game with positive expected value has a positive chance of never going broke. More precisely, if $X_1, X_2, \ldots$ are i.i.d. and $\mathbf{E} X_1 > 0$, then

\[ \mathbf{P}(S_n > 0 \text{ for all } n \in \mathbf{N}) > 0. \]

In this note, we show that this holds whenever $X_1, X_2, \ldots$ are stationary and $\mathbf{E} X_1 > 0$ (Lemma 1) using the "mass transport" method. Although no knowledge of the method will be needed, one might want to take a look at the short paper [Häg99] to get a flavor of mass transport.
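As a sanity check in the i.i.d. case, here is a quick Monte Carlo sketch. The walk with steps $+1$ w.p. $0.7$ and $-1$ w.p. $0.3$ is our own illustrative example, not taken from [Chi22]; for it, gambler's-ruin theory gives $\mathbf{P}(S_n > 0 \text{ for all } n) = 0.7 \cdot (1 - 0.3/0.7) = 0.4$, and we estimate the (slightly larger) probability of staying positive up to a finite horizon.

```python
import random

def estimate_survival(p=0.7, horizon=2000, trials=5000, seed=0):
    """Fraction of simulated walks with S_n > 0 for all n <= horizon."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        s = 0
        for _ in range(horizon):
            s += 1 if rng.random() < p else -1
            if s <= 0:          # the gambler goes broke
                break
        else:
            survived += 1
    return survived / trials

est = estimate_survival()
assert 0.35 < est < 0.45        # exact never-go-broke probability is 0.4
```

The truncation at a finite horizon biases the estimate slightly upward, but for a walk with positive drift the bias is negligible at this horizon.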

Birkhoff’s ergodic theorem (Theorem 3) then follows by a standard argument. We include the derivation of the ergodic theorem for completeness.

Lemma 1.

If the sequence $X_1, X_2, \ldots$ is stationary and $\mathbf{E} X_1 > 0$, then

\[ \mathbf{P}(S_n > 0 \text{ for all } n \in \mathbf{N}) > 0. \]

Proof.

By Kolmogorov's extension theorem, we may assume without loss of generality the existence of a doubly-infinite stationary sequence

\[ \ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots. \]

Define $S_n$ for every $n \in \mathbf{Z}$ so that $S_0 = 0$ and $S_n - S_{n-1} = X_n$ for all $n \in \mathbf{Z}$.

Let $n \in \mathbf{Z}$. We call $m \in \mathbf{Z}$ a record after $n$ if

\[ m > n \qquad\text{and}\qquad S_m = \min\{S_{n+1}, S_{n+2}, \ldots, S_m\}. \]

Notice that $n+1$ is always a record after $n$. We now introduce the "mass" $M(n, m)$ that $n$ sends to each $m \in \mathbf{Z}$ as follows. If $X_{n+1} \leq 0$, then $M(n, m) := 0$ for all $m \in \mathbf{Z}$. If $X_{n+1} > 0$, then let $n+1 = n_0 < n_1 < n_2 < \cdots$ (this might be finite in length) be the enumeration of the records after $n$, and let

(1) \[ M(n, n_j) := \max\{S_{n_{j-1}}, S_n\} - \max\{S_{n_j}, S_n\} \qquad \text{for each $j \in \mathbf{N}$.} \]

For all $m \in \mathbf{Z} \setminus \{n_1, n_2, \ldots\}$, let $M(n, m) := 0$. See Figure 1 for a visualization of the mass transport.

Figure 1. The records after $n$ (filled dots) and the mass that $n$ sends to them. In this particular case, $n$ sends out $a + b + c$ amount of mass, which equals $X_{n+1}$.
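The construction above is easy to make concrete. The following sketch (our own illustration, not part of the paper) computes the records after $n$ and the masses of (1) on a finite path, and checks the telescoping identity behind Figure 1: when the path eventually dips below $S_n$, the total mass sent by $n$ equals $X_{n+1}$.

```python
def records_after(S, n):
    """Indices m > n with S[m] == min(S[n+1], ..., S[m])."""
    recs, running_min = [], float("inf")
    for m in range(n + 1, len(S)):
        running_min = min(running_min, S[m])
        if S[m] == running_min:
            recs.append(m)
    return recs

def masses(S, n):
    """The masses M(n, m) of (1); returns {m: mass} for m = n_1, n_2, ...."""
    if S[n + 1] - S[n] <= 0:          # X_{n+1} <= 0: n sends no mass
        return {}
    recs = records_after(S, n)        # n_0 = n + 1 is always the first record
    return {recs[j]: max(S[recs[j - 1]], S[n]) - max(S[recs[j]], S[n])
            for j in range(1, len(recs))}

# A path with X_1 = 0.7 > 0 that later dips below S_0 = 0.
X = [0.7, 0.2, -0.4, 0.3, -1.1, 0.5, -0.6]
S = [0.0]
for x in X:
    S.append(S[-1] + x)

total = sum(masses(S, 0).values())
assert abs(total - X[0]) < 1e-9      # total mass sent by 0 equals X_1
```

Telescoping makes this automatic: the sum of the masses collapses to $\max\{S_{n_0}, S_n\}$ minus the limiting term, which vanishes once a record drops to $S_n$ or below.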

Since $\ldots, X_{-1}, X_0, X_1, \ldots$ is stationary and $M(n, m) \geq 0$ for all $n, m \in \mathbf{Z}$, we have

(2) \[ \mathbf{E}\biggl[\sum_{n=1}^{\infty} M(0, n)\biggr] = \sum_{n=1}^{\infty} \mathbf{E}[M(0, n)] = \sum_{n=1}^{\infty} \mathbf{E}[M(-n, 0)] = \mathbf{E}\biggl[\sum_{n=1}^{\infty} M(-n, 0)\biggr]. \]

This simple equality is at the heart of the mass transport method. Suppose that $\mathbf{P}(S_n \leq 0 \text{ for some } n \in \mathbf{N}) = 1$. We will derive a contradiction by evaluating each side of (2).

Assume that $X_1 > 0$, and let $\tau := \inf\{n \in \mathbf{N} : S_n \leq 0\}$. Notice that $\tau < \infty$ a.s. Let $1 = n_0 < n_1 < \cdots < n_t = \tau$ be the records after $0$ up to $\tau$. Since every record after $\tau$ receives zero mass from $0$ (both maxima in (1) vanish for it), we have

\[ \sum_{n=1}^{\infty} M(0, n) = \sum_{j=1}^{t} M(0, n_j) = \sum_{j=1}^{t-1} (S_{n_{j-1}} - S_{n_j}) + (S_{n_{t-1}} - S_0) = X_1. \]

Thus, the left side of (2) equals $\mathbf{E}[X_1; X_1 > 0]$.

Let us examine the sum on the right side of (2). See Figure 2 for a visualization. If $X_0 > 0$, then $0$ is not a record after any number below $-1$ (and $0$, being the record $n_0$ after $-1$, receives no mass from $-1$), and thus the sum is $0$. Assume $X_0 \leq 0$, and let $-1 = m_0 > m_1 > m_2 > \cdots$ (which might be finite in length) be the enumeration of the numbers $m < 0$ such that

\[ S_m < \min\{S_{m+1}, \ldots, S_{-1}\}. \]

Let $m < 0$ and let us compute $M(m, 0)$. First assume that $m = m_j$ for some $j \in \mathbf{N}$, and consider the cases (a) $S_{m_{j-1}} < 0$ and (b) $S_{m_{j-1}} \geq 0$. If (a) is the case, then $0$ is not a record after $m_j$, and thus $M(m, 0) = 0$. Assume that (b) is the case. By the definition of $m_0, m_1, \ldots$, we have $S_n \geq S_{m_{j-1}}$ for all $n = m_j + 1, \ldots, m_{j-1}$. This implies that $m_{j-1}$ is a record after $m_j$. Since

\[ 0 \leq S_{m_{j-1}} < \min\{S_{m_{j-1}+1}, \ldots, S_{-1}\}, \]

we see that $m_{j-1}$ and $0$ are consecutive records after $m_j$. Thus,

\[ M(m, 0) = S_{m_{j-1}} - \max\{0, S_{m_j}\}. \]

Combining (a) and (b) yields

\[ M(m, 0) = \max\{S_{m_{j-1}}, 0\} - \max\{S_{m_j}, 0\}. \]

If $m \neq m_j$ for all $j \geq 0$, and $k \geq 0$ is the largest number such that $m < m_k$, then $S_m \geq S_{m_k}$. Since $m_k$ is a record after $m$, the definition of $M$ tells us that $M(m, 0) = 0$; the two maxima in (1) are both $S_m$ even if $0$ is a record after $m$. We now know what $M(m, 0)$ is for all $m < 0$, and this yields

\[ \begin{split} \mathbf{E}\biggl[\sum_{n=1}^{\infty} M(-n, 0)\biggr] &= \mathbf{E}\biggl[\sum_{j \geq 1} \bigl(\max\{S_{m_{j-1}}, 0\} - \max\{S_{m_j}, 0\}\bigr); X_0 \leq 0\biggr] \\ &\leq \mathbf{E}[-X_0; X_0 \leq 0]. \end{split} \]
Figure 2. The mass $0$ receives when $X_0 \leq 0$. In this particular case, $0$ receives $a + b + c = -X_0$, but it might receive less if $\inf_{n<0} S_n > 0$.
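Dually, one can make the bound of Figure 2 concrete. The sketch below (our own illustration, not part of the paper) computes $M(n, 0)$ directly from (1) on a finite history with $S_0 = 0$ and $X_0 \leq 0$; since this particular history dips below $0$, the mass received by $0$ is exactly $-X_0$.

```python
def M(S, n, m):
    """M(n, m) per (1), for a finite path S (list indices play the role of times)."""
    if S[n + 1] - S[n] <= 0:              # X_{n+1} <= 0: n sends no mass
        return 0.0
    recs, rmin = [], float("inf")         # records after n: n_0, n_1, ...
    for k in range(n + 1, len(S)):
        rmin = min(rmin, S[k])
        if S[k] == rmin:
            recs.append(k)
    for j in range(1, len(recs)):         # mass goes only to n_1, n_2, ...
        if recs[j] == m:
            return max(S[recs[j - 1]], S[n]) - max(S[m], S[n])
    return 0.0

# S_{-6}, ..., S_{-1}, S_0 = 0 (list index i stands for time i - 6);
# the history dips below 0 at time -6.
S = [-0.2, 0.7, 0.3, 1.0, 0.5, 0.8, 0.0]
X0 = S[6] - S[5]                          # X_0 = -0.8 <= 0
received = sum(M(S, n, 6) for n in range(6))
assert abs(received - (-X0)) < 1e-9       # here 0 receives exactly -X_0
```

Replacing $S_{-6} = -0.2$ with a positive value makes $\inf_{n<0} S_n > 0$, and the received mass drops strictly below $-X_0$, matching the caption of Figure 2.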

Equation (2) now gives

\[ \mathbf{E}[X_1; X_1 > 0] \leq \mathbf{E}[-X_0; X_0 \leq 0], \]

which implies $\mathbf{E} X_1 \leq 0$ because $X_0$ and $X_1$ have the same distribution; this is a contradiction. ∎

Remark 2.

Our argument actually proves the maximal ergodic theorem [Bil12, Theorem 24.2], which says that if $X_1, X_2, \ldots$ is a stationary sequence with $\mathbf{E}|X_1| < \infty$, then

\[ \mathbf{E}[X_1; S_n \leq 0 \text{ for some } n \in \mathbf{N}] \leq 0. \]

Indeed, the left side of (2) is bounded below by

\[ \mathbf{E}[X_1; X_1 > 0 \text{ and } S_n \leq 0 \text{ for some } n \in \mathbf{N}], \]

while the right side of (2) is bounded above by $\mathbf{E}[-X_0; X_0 \leq 0]$. Thus,

\[ \mathbf{E}[X_1; X_1 > 0 \text{ and } S_n \leq 0 \text{ for some } n \in \mathbf{N}] \leq \mathbf{E}[-X_1; X_1 \leq 0], \]

and therefore, since $\{X_1 \leq 0\} \subseteq \{S_1 \leq 0\}$, we get $\mathbf{E}[X_1; S_n \leq 0 \text{ for some } n \in \mathbf{N}] \leq 0$.

There is a short proof of the maximal ergodic theorem which, however, lacks intuition; see [Bil12, Theorem 24.2], for example. Our proof is an attempt to remedy this problem by utilizing the intuitive principle of mass transport.

We now prove Birkhoff's ergodic theorem. Let $(\Omega, \mathcal{F}, \mathbf{P})$ be the underlying probability space, and let $T \colon \Omega \to \Omega$ be a measurable map that is measure-preserving in the sense that $\mathbf{P}(T^{-1}A) = \mathbf{P}(A)$ for all $A \in \mathcal{F}$. An event $A$ is invariant under $T$ if $T^{-1}A = A$, and we denote the $\sigma$-field of all invariant events by $\mathcal{I}$.

Theorem 3 (Birkhoff’s ergodic theorem).

Let $X_1$ be a random variable with finite mean, and write $X_n := X_1 \circ T^{n-1}$ for $n = 2, 3, \ldots$. If $S_n := X_1 + \cdots + X_n$ for all $n \in \mathbf{N}$, then

\[ S_n/n \to \mathbf{E}[X_1 \mid \mathcal{I}] \qquad \text{a.s.} \]

Proof.

Since $\mathbf{E}[X_1 \mid \mathcal{I}] \circ T = \mathbf{E}[X_1 \mid \mathcal{I}]$, we may assume that $\mathbf{E}[X_1 \mid \mathcal{I}] = 0$. Let $\epsilon > 0$ and $A_\epsilon := \{\liminf_{n \to \infty} S_n/n < -\epsilon\}$. On the event $A_\epsilon$, we have $S_n + n\epsilon < 0$ for some $n \in \mathbf{N}$. Thus,

(3) \[ \mathbf{P}((S_n + n\epsilon)1_{A_\epsilon} > 0 \text{ for all } n \in \mathbf{N}) = 0. \]

Since $A_\epsilon \in \mathcal{I}$, we have

\[ ((X_1 + \epsilon)1_{A_\epsilon}) \circ T^{n-1} = (X_n + \epsilon)1_{A_\epsilon} \qquad \text{for all $n \in \mathbf{N}$.} \]

As $((X_n + \epsilon)1_{A_\epsilon})_{n \in \mathbf{N}}$ is a stationary sequence, (3) and Lemma 1 imply

\[ \mathbf{E}[(X_1 + \epsilon)1_{A_\epsilon}] \leq 0. \]

Since $\mathbf{E}[X_1 \mid \mathcal{I}] = 0$, we have

\[ \mathbf{E}[(X_1 + \epsilon)1_{A_\epsilon}] = \epsilon \mathbf{P}(A_\epsilon), \]

and thus $\mathbf{P}(A_\epsilon) = 0$. As $\epsilon$ is arbitrary, we have $\liminf_{n \to \infty} S_n/n \geq 0$ a.s. Applying this result to $-X_1$ in place of $X_1$ gives $\limsup_{n \to \infty} S_n/n \leq 0$. Therefore, we have $S_n/n \to 0$ a.s. ∎
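To see what the conditional expectation $\mathbf{E}[X_1 \mid \mathcal{I}]$ buys over the plain mean, consider a mixture of two i.i.d. Gaussian sequences, a standard stationary but non-ergodic example (the component means $\pm 1$ below are our own choice, not from the paper): conditionally on the mixing coin, $S_n/n$ converges to the mean of the chosen component, not to the overall mean $\mathbf{E} X_1 = 0$.

```python
import random

def running_average(mean, n, seed):
    """S_n / n for n i.i.d. Gaussian(mean, 1) steps."""
    rng = random.Random(seed)
    return sum(rng.gauss(mean, 1.0) for _ in range(n)) / n

# Conditionally on the coin, the sequence is i.i.d. with mean +1 or -1,
# so E[X_1 | I] = +1 or -1 while the unconditional mean is 0.
avg_plus = running_average(+1.0, 200_000, seed=1)
avg_minus = running_average(-1.0, 200_000, seed=2)
assert abs(avg_plus - 1.0) < 0.02 and abs(avg_minus + 1.0) < 0.02
```

When the process is ergodic, $\mathcal{I}$ is trivial and the limit reduces to the constant $\mathbf{E} X_1$, recovering the strong law of large numbers in the i.i.d. case.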

References

  • [Bil12] Patrick Billingsley. Probability and Measure. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken, NJ, anniversary edition, 2012.
  • [Chi22] Calvin Wooyoung Chin. A gambler that bets forever and the strong law of large numbers. Amer. Math. Monthly, 129(2):183–185, 2022.
  • [Häg99] Olle Häggström. Invariant percolation on trees and the mass-transport method. 1999.