
On the rate of convergence of the martingale central limit theorem in Wasserstein distances

Xiaoqin Guo The work of XG is supported by the Simons Foundation through Collaboration Grant for Mathematicians #852943. Department of Mathematical Sciences, University of Cincinnati, 2815 Commons Way, Cincinnati, OH 45221, USA
Abstract

For martingales with a wide range of integrability, we will quantify the rate of convergence of the central limit theorem via Wasserstein distances of order $r$, $1\leq r\leq 3$. Our bounds are in terms of Lyapunov's coefficients and the $\mathscr{L}^{r/2}$ fluctuation of the total conditional variance. We will show that our Wasserstein-1 bound is optimal up to a multiplicative constant.

1 Introduction

In this paper, we consider the rate of convergence of the martingale central limit theorem (CLT) under Wasserstein distances. Let $Y_{\cdot}=\{Y_{m}\}_{m=1}^{n}$ be a square-integrable martingale difference sequence (mds) with respect to $\sigma$-fields $\{\mathcal{F}_{m}\}_{m=0}^{n}$. Here $\mathcal{F}_{0}$ denotes the trivial $\sigma$-field. For $1\leq m\leq n$, let $s_{m}^{2}=\sum_{i=1}^{m}E[Y_{i}^{2}]$ and

\sigma_{m}^{2}(Y)=E[Y_{m}^{2}|\mathcal{F}_{m-1}],\qquad V_{n}^{2}=\frac{1}{s_{n}^{2}}\sum_{i=1}^{n}\sigma_{i}^{2}(Y),\qquad X_{m}=\frac{1}{s_{n}}\sum_{i=1}^{m}Y_{i}. (1)

For an mds $\{Y_{i}\}_{i=1}^{\infty}$, when $\lim_{n\to\infty}V_{n}^{2}=1$ in probability and Lindeberg's condition is satisfied, it is well known that

X_{n}\Rightarrow\mathcal{N}\quad\text{ as }n\to\infty,

where $\mathcal{N}\sim\mathcal{N}(0,1)$ denotes a standard normal random variable.

To quantify such convergence in distribution, one of the most important metrics is the Wasserstein distance, which has deep roots in optimal transport theory [42]. Recall that, for two probability measures $\mu,\nu$ on $\mathbb{R}$, their Wasserstein distance (also called minimal distance) $\mathcal{W}_{r}$ of order $r$, $r\geq 1$, is defined by

\mathcal{W}_{r}(\mu,\nu)=\inf\{\lVert U-V\rVert_{r}:U\sim\mu,\,V\sim\nu\},

where $\lVert U\rVert_{r}=E[|U|^{r}]^{1/r}$. With an abuse of notation, we may use random variables as synonyms for their distributions. E.g., for $U\sim\mu$, $V\sim\nu$, we may write $\mathcal{W}_{r}(\mu,\nu)$ as $\mathcal{W}_{r}(U,V)$.

When $r=1$, recall that $\mathcal{W}_{1}$ admits the following alternative representations:

\mathcal{W}_{1}(U,V)=\sup_{h\in\mathrm{Lip}_{1}}\left\{E[h(U)]-E[h(V)]\right\}=\int_{\mathbb{R}}\lvert F_{U}(x)-F_{V}(x)\rvert\,\mathrm{d}x

where $\mathrm{Lip}_{1}$ denotes the set of all 1-Lipschitz functions, and $F_{U}(x):=P(U\leq x)$.
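As a quick numerical illustration (our own sketch using NumPy and SciPy, not part of the paper): on the real line the optimal coupling is the quantile coupling, so for two equal-size empirical samples $\mathcal{W}_{1}$ is the mean absolute difference of sorted values, and it agrees with the distribution-function representation above.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
u = rng.normal(size=1000)
v = rng.normal(loc=0.3, size=1000)

# Quantile coupling: for equal-size samples, W_1 is the mean absolute
# difference of the sorted values.
w1_quantile = np.mean(np.abs(np.sort(u) - np.sort(v)))

# SciPy computes the integral of |F_U(x) - F_V(x)| between the two
# empirical distribution functions.
w1_cdf = wasserstein_distance(u, v)

assert abs(w1_quantile - w1_cdf) < 1e-9
```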

Throughout the paper, we use $c,C$ to denote positive constants which may change from line to line. Unless otherwise stated, $C_{p},c_{p}$ denote constants depending only on the parameter $p$. We write $A\lesssim B$ if $A\leq CB$, and $A\asymp B$ if $A\lesssim B$ and $A\gtrsim B$. We also use the notations $A\lesssim_{p}B$, $A\asymp_{p}B$ to indicate that the multiplicative constant depends on the parameter $p$.

1.1 Earlier results in the literature

There is an immense literature on the convergence rate of the CLT for independent random variables. Such results are often phrased in terms of Lyapunov's coefficient (i.e., the term in the $\mathscr{L}^{p}$ Lyapunov condition). To be specific, for $p\geq 1$, set

L_{p}=L_{p}(Y):=\frac{1}{s_{n}}\left(\sum_{i=1}^{n}E[|Y_{i}|^{p}]\right)^{1/p}. (2)

Note that typically $L_{p}$ is of size $O(n^{1/p-1/2})$. (The meaning of our $L_{p}$ notation is slightly different from that of [2, 27, 25, 38, 4] in the literature, whose $L_{p}$ means the $L_{p}^{p}$ of our paper.)
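To see this typical size concretely (a minimal sketch of ours, not from the paper; the helper name `lyapunov_coefficient` is hypothetical): for i.i.d. Rademacher differences, $E|Y_{i}|^{p}=1$ and $s_{n}^{2}=n$, so (2) gives $L_{p}=n^{1/p-1/2}$ exactly.

```python
import numpy as np

def lyapunov_coefficient(moments_p, s_n, p):
    # Equation (2): L_p = (sum_i E|Y_i|^p)^(1/p) / s_n
    return sum(moments_p) ** (1 / p) / s_n

# i.i.d. Rademacher differences: E|Y_i|^p = 1 and s_n = sqrt(n).
n, p = 10_000, 3.0
L_p = lyapunov_coefficient([1.0] * n, np.sqrt(n), p)
assert abs(L_p - n ** (1 / p - 1 / 2)) < 1e-12
```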

When $\{Y_{i}\}_{i=1}^{n}$ is a centered independent sequence, a nonuniform estimate of Bikjalis [2] implies $\mathcal{W}_{1}(X_{n},\mathcal{N})\leq cL_{p}^{p}$, $p\in(2,3]$, extending the $\mathcal{W}_{1}$ bounds of Esseen [16], Zolotarev [45], and Ibragimov [29]. For $r>2$, Sakhanenko [40] proved that $\mathcal{W}_{r}(X_{n},\mathcal{N})\leq cL_{r}$, which is optimal for independent $\mathscr{L}^{r}$-integrable variables. For $1\leq r\leq 2$, it was established by Rio [37, 38] that

\mathcal{W}_{r}(X_{n},\mathcal{N})\leq c_{r}L_{r+2}^{(r+2)/r}. (3)

In particular, when the $\{Y_{i}\}_{i=1}^{n}$ have roughly the same $\mathscr{L}^{r+2}$ moments, the $\mathcal{W}_{r}$ bound in (3) achieves the optimal rate $O(n^{-1/2})$. Bobkov [4] confirmed Rio's conjecture that (3) holds for all $r\geq 1$. Cf. [38, 4] and references therein. For further developments, see e.g. [44, 20, 8, 22, 14, 6, 33] for work in the multivariate setting and e.g. [21, 33] for results on random variables with local dependence.

It is natural to ask whether the martingale CLT can be quantified by similar Wasserstein metric bounds in terms of $L_{p}$. However, despite its theoretical importance, there are very few such results for (general) martingales compared to the independent case, let alone answers to questions on optimal rates.

When $\{Y_{i}\}_{i=1}^{n}$ is an mds, the non-uniform bound on the distribution functions by Joos [31] (which generalizes Haeusler and Joos [26]) implies that

\mathcal{W}_{1}(X_{n},\mathcal{N})\leq c_{p,q}\left(L_{p}^{p/(1+p)}+\lVert V_{n}^{2}-1\rVert_{q}^{q/(2q+1)}\right)\quad\text{ for }2<p<\infty,\ q\geq 1. (4)

In the case $M=\max_{i=1}^{n}\lVert Y_{i}\rVert_{\infty}<\infty$, the nonuniform bound of Joos [30] implies

\mathcal{W}_{1}(X_{n},\mathcal{N})\leq c_{q}\left(\frac{M}{s_{n}}\log\big(e+\frac{s_{n}^{2}}{M^{2}}\big)+\lVert V_{n}^{2}-1\rVert_{q}^{q/(2q+1)}\right)\quad\text{ for }q\geq 1. (5)

Still for the $\mathscr{L}^{\infty}$ case, Dung, Son, and Tien [12] improved the last term in (5) to $C_{q}\lVert V_{n}^{2}-1\rVert_{q}^{1/2}$ for any $q>1/2$. Under the condition

P(V_{n}^{2}=1)=1, (6)

Röllin's result [39, (2.1)] implies that (see also [18, Lemma 2.1])

\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim L_{3}. (7)

Fan and Ma [18] dropped the condition (6) and obtained

\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim_{q}L_{3}+\lVert V_{n}^{2}-1\rVert_{q}^{1/2}+\frac{1}{s_{n}}\max_{i=1}^{n}\lVert Y_{i}\rVert_{2q},\quad\forall q\geq 1. (8)

Recently, assuming (6), Fan and Su [19, Corollary 2.5] implicitly extended (7) to

\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim L_{p}\quad\text{ for }2<p\leq 3. (9)

Dedecker, Merlevède, and Rio [11] proved $\mathcal{W}_{r}$ bounds, $r\in(0,3]$, that involve the quantities $\sum_{i=\ell}^{n}E[Y_{i}^{2}|\mathcal{F}_{\ell-1}]-E[Y_{i}^{2}]$ (instead of the $V_{n}^{2}-1$ in our bounds), $2\leq\ell\leq n$, which can be suitably bounded in many situations.

Readers may refer to [10, 9, 39, 19, 11] for Wasserstein bounds for the martingale CLT under other special conditions (e.g., the sequence being stationary, the $\sigma_{i}$'s being close to deterministic, the $\sigma_{i}$'s being uniformly bounded from below, or certain variants of the $\mathscr{L}^{\infty}$ case, etc.).

Remark 1.

Unlike for the Wasserstein metric, Kolmogorov distance bounds for the martingale CLT have been thoroughly investigated since the 1970s. One of the earliest results is due to Heyde and Brown [28], which states that, for $2<p\leq 4$,

d_{K}(X_{n},\mathcal{N}):=\sup_{x\in\mathbb{R}}\lvert F_{X_{n}}(x)-F_{\mathcal{N}}(x)\rvert\lesssim_{p}L_{p}^{p/(p+1)}+\lVert V_{n}^{2}-1\rVert_{p/2}^{p/(2p+2)}. (10)

When the mds is $\mathscr{L}^{\infty}$, i.e., $\max_{i=1}^{n}\lVert Y_{i}\rVert_{\infty}=M<\infty$, Bolthausen [5] showed that

d_{K}(X_{n},\mathcal{N})\lesssim_{M}\frac{n\log n}{s_{n}^{3}}+\min\left\{\lVert V_{n}^{2}-1\rVert_{\infty}^{1/2},\,\lVert V_{n}^{2}-1\rVert_{1}^{1/3}\right\} (11)

and that the first term $\tfrac{\log n}{n^{1/2}}$ is optimal for the case $s_{n}^{2}=n$. Haeusler [24, 25] generalized (10) to $2<p<\infty$ and showed that the first term $L_{p}^{p/(p+1)}$ is exact. Joos [31] proved that the second term in (10) can be replaced by $\lVert V_{n}^{2}-1\rVert_{q}^{q/(2q+1)}$, $q\geq 1$. Mourrat [34] improved the second term in (11) to $(\lVert V_{n}^{2}-1\rVert_{q}+s_{n}^{-2})^{q/(2q+1)}$, $q\geq 1$. Cf. also [26, 31, 23, 35, 15, 17, 19] and references therein for work in this direction.

1.2 Motivation and our contributions

Our paper concerns the $\mathcal{W}_{r}$ convergence rates for the martingale CLT, $1\leq r\leq 3$.

Let us comment on some weaknesses of the aforementioned $\mathcal{W}_{1}$ bounds.

When the martingale differences are at least $\mathscr{L}^{p}$-integrable, $2<p\leq 3$, the best $\mathcal{W}_{1}$ rates given by (7), (9), (8) are typically $n^{1/p-1/2}\geq n^{-1/6}$, leaving a big gap from the rate $O(n^{-1/2}\log n)$ in the $\mathscr{L}^{\infty}$ case (5), not to mention that the condition (6) imposed in (9) is too restrictive for general martingales. Compared to (9), the exponent of $L_{p}$ in (4) is clearly not optimal, at least for $2<p\leq 3$. But (4) does not assume (6), and it offers a typical $\mathcal{W}_{1}$ rate $O(n^{(2-p)/(2p+2)})$, which is better than $n^{-1/6}$ for $p>7/2$. However, the constant $c_{p}$ in (4) is expected to grow linearly as $p\to\infty$, rendering (4) a useless bound when $p$ is bigger than $n^{1/2}$.

Notice that all of the results discussed above are $\mathcal{W}_{1}$ bounds. There are hardly any results on $\mathcal{W}_{r}$ bounds, $r>1$, in terms of $L_{p}$ for (general) martingales.

Can we obtain $\mathcal{W}_{r}$ bounds, $r\geq 1$, in terms of the Lyapunov coefficient (2) for the martingale CLT that generalize all of the previous results (4), (5), (7), (8), (9)? Further, what are the optimal rates, and how can their optimality be justified?

Is it possible to get Wasserstein distance bounds for the CLT whose constant coefficients do not blow up as the integrability of the martingale increases? Such estimates would be important when we have a limited sample size, and they would allow us to exploit the integrability of the martingale to obtain better rates.

In theory and in applications, there are numerous stochastic processes that do not fit into any of the $\mathscr{L}^{p}$ categories, $2<p<\infty$; cf. [41, 32, 43]. Can we quantify the CLT for martingales with a much wider spectrum of integrability than $\mathscr{L}^{p}$, $p\ll n^{1/2}$?

Motivated by these questions, we will prove the following results.

  (1)

    We will obtain $\mathcal{W}_{r}$, $1\leq r\leq 3$, convergence rates for martingales which are Orlicz-integrable. For instance, if

    A:=\frac{1}{n}\sum_{i=1}^{n}E[\psi(|Y_{i}|)]<\infty (12)

    for an appropriate convex function $\psi$ that grows at most polynomially fast, then

    \mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim_{\psi}\frac{A\vee 1}{s_{n}}\psi^{-1}(n)+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}, (13)

    where $\psi^{-1}$ denotes the inverse function of $\psi$.

    Our result greatly generalizes the known $\mathcal{W}_{1}$ bounds (4), (7), (8), (9) for $\mathscr{L}^{p}$ martingales, $2<p\leq 3$. Moreover, we will prove that both terms $\tfrac{1}{s_{n}}\psi^{-1}(n)$ and $\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}$ in this $\mathcal{W}_{1}$ bound are optimal.

  (2)

    We will derive Wasserstein bounds whose constant coefficients do not depend on the integrability of the variables. Taking the $\mathcal{W}_{1}$ distance as an example: if (12) holds for $\psi$ in a wide class of convex functions, we show that

    \mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\frac{A\vee 1}{s_{n}}\psi^{-1}(n)\log n+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}. (14)

    This explains the presence of the $\log n$ term in (5), and it implies that if the mds is $\mathscr{L}^{\infty}$ bounded, i.e., $M=\max_{i=1}^{n}\lVert Y_{i}\rVert_{\infty}<\infty$, then

    \mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\frac{M}{s_{n}}\log\big(e+\frac{s_{n}^{2}}{M^{2}}\big)+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}.

    The novelty of this result lies not only in the fact that it encompasses an even larger spectrum (all the way up to $\mathscr{L}^{\infty}$) of integrability than our first bound (13), but also in that it yields a better bound than (13) when the order of the "integrability" $\psi$ is larger than logarithmic in the sample size (i.e., beyond $O(\log n)$).

  (3)

    Similar bounds for the $\mathcal{W}_{r}$ distances, in terms of the Lyapunov coefficient and $\lVert V_{n}^{2}-1\rVert_{r/2}$, $1<r\leq 3$, will be established as well.

1.3 Structure of the paper

The organization of our paper is as follows.

Subsection 1.4 contains definitions of N-functions and the corresponding Orlicz norm for random sequences. In Section 2, we present our main Wasserstein distance bounds (Theorems 5, 7, and 8) and the optimality of the $\mathcal{W}_{1}$ rate (Proposition 10). In Section 3, using Taylor expansion and Lindeberg's telescopic sum argument, we derive $\mathcal{W}_{r}$ bounds in terms of the conditional moments for martingales with $V_{n}^{2}=1$ a.s. As consequences, $\mathcal{W}_{1},\mathcal{W}_{2}$ bounds (Corollary 15) are obtained for martingales that satisfy certain special conditions.

Section 4 is devoted to the proof of our main results. Our proof consists of the following components. First, we truncate the martingale as in Haeusler [25] and elongate it as in Dvoretzky [13] to turn it into a sequence with bounded increments and $V_{n}^{2}=1$. We bound the error of this modification in terms of the Lyapunov coefficient and $(V_{n}^{2}-1)$. (See Section 4.1.) Then, as a crucial technical step, we use Young's inequality within conditional expectations to "decouple" the Lyapunov coefficient and the $\sigma_{i}^{2}(Y)$'s from the conditional moments, turning the $\mathcal{W}_{r}$ bound into an optimization problem over three parameters. This argument is robust enough to deal with martingales with very flexible integrability. Another key observation is that there should be different bounds in the two scenarios where the martingale is "at most $\mathscr{L}^{p}$" and "more integrable than $\mathscr{L}^{p}$". By possibly sacrificing a small factor (e.g. $\log n$), we can make the constants of the bound independent of the integrability $\psi$, which leads to bounds of type (14).

Finally, in Section 5 we construct examples to show that our $\mathcal{W}_{1}$ bound is optimal. We leave some open questions in Section 6.

1.4 Preliminaries: N-functions and an Orlicz-norm for sequences

To generalize the notion of $\mathscr{L}^{p}$ integrability, we recall the definitions of N-functions and the corresponding Orlicz norm in this subsection.

Definition 2.

A convex function $\psi:[0,\infty)\to[0,\infty)$ is called an N-function if it satisfies $\lim_{x\to 0}\psi(x)/x=0$, $\lim_{x\to\infty}\psi(x)/x=\infty$, and $\psi(x)>0$ for $x>0$. For two functions $\psi_{1},\psi_{2}\in[0,\infty)^{[0,\infty)}$, we write

\psi_{1}\preccurlyeq\psi_{2}\quad\text{ or }\quad\psi_{2}\succcurlyeq\psi_{1}

if $\frac{\psi_{2}(x)}{\psi_{1}(x)}$ is non-decreasing on $(0,\infty)$.

Note that every N-function $\psi$ satisfies $\psi\succcurlyeq x$. See [1].

Denote the Fenchel-Legendre transform of $\psi$ by $\psi_{*}\in[0,\infty)^{[0,\infty)}$, i.e.,

\psi_{*}(x)=\sup\{xy-\psi(y):y\in\mathbb{R}\}\quad\text{ for }x\geq 0.

Then $\psi_{*}$ is still an N-function. Young's inequality states that

xy\leq\psi(x)+\psi_{*}(y)\quad\text{ for all }x,y\geq 0. (15)

Another useful relation between the pair $\psi,\psi_{*}$ is

x\leq\psi^{-1}(x)\psi_{*}^{-1}(x)\leq 2x,\quad\forall x\in[0,\infty). (16)

See [1] for a proof and for more properties of N-functions.
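For a concrete conjugate pair (our own sketch, not from the paper): $\psi(x)=x^{p}/p$ has $\psi_{*}(y)=y^{q}/q$ with $1/p+1/q=1$, and both (15) and (16) can be checked numerically.

```python
import numpy as np

# Conjugate pair psi(x) = x^p / p, psi_*(y) = y^q / q with 1/p + 1/q = 1.
p = 3.0
q = p / (p - 1)
psi = lambda x: x ** p / p
psi_star = lambda y: y ** q / q
psi_inv = lambda x: (p * x) ** (1 / p)       # inverse of psi
psi_star_inv = lambda x: (q * x) ** (1 / q)  # inverse of psi_*

grid = np.linspace(0.01, 10.0, 100)
for x in grid:
    # Relation (16): x <= psi^{-1}(x) * psi_*^{-1}(x) <= 2x.
    prod = psi_inv(x) * psi_star_inv(x)
    assert x <= prod + 1e-9 and prod <= 2 * x + 1e-9
    for y in grid[::10]:
        # Young's inequality (15).
        assert x * y <= psi(x) + psi_star(y) + 1e-9
```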

Definition 3.

For any N-function $\psi$ and $n\in\mathbb{N}$, the $\mathscr{L}^{\psi}$-Orlicz norm of a random sequence $Y=\{Y_{m}\}_{m=1}^{n}$ of length $n$ is defined by

\lVert Y\rVert_{\psi}:=\inf\big\{c>0:\frac{1}{n}\sum_{i=1}^{n}E\left[\psi(|Y_{i}|/c)\right]\leq 1\big\}. (17)

In particular, when $n=1$, i.e., $Y=Y_{1}$ is a single random variable, we still write

\lVert Y_{1}\rVert_{\psi}=\inf\{c>0:E[\psi(|Y_{1}|/c)]\leq 1\}.

When $\psi(x)=x^{p}$, $p\geq 1$, we simply write $\lVert Y\rVert_{\psi}$ as $\lVert Y\rVert_{p}$. Notice that

\lVert Y\rVert_{p}=\left(\frac{1}{n}\sum_{i=1}^{n}E[|Y_{i}|^{p}]\right)^{1/p}, (18)

and

\lim_{p\to\infty}\lVert Y\rVert_{p}=\max_{i=1}^{n}\lVert Y_{i}\rVert_{\infty}=:\lVert Y\rVert_{\infty}. (19)
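The infimum in (17) can be located by bisection, since the average is non-increasing in $c$. A minimal sketch of ours (the helper `orlicz_norm` is our own; each $Y_{i}$ is taken deterministic so the expectations drop out) cross-checks against the closed form (18):

```python
import numpy as np

def orlicz_norm(values, psi, lo=1e-9, hi=1e9, iters=200):
    # Bisection for (17): avg(c) is non-increasing in c, so the norm is
    # the smallest c with avg(c) <= 1.
    def avg(c):
        return np.mean([psi(abs(v) / c) for v in values])
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if avg(mid) > 1:
            lo = mid
        else:
            hi = mid
    return hi

y = [0.5, 1.0, 2.0, 3.0]
p = 4.0
closed_form = np.mean([abs(v) ** p for v in y]) ** (1 / p)  # formula (18)
assert abs(orlicz_norm(y, lambda x: x ** p) - closed_form) < 1e-6
```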
Remark 4.
  (a)

    Using the property $\psi(\lambda x)\geq\lambda\psi(x)$, $\lambda\geq 1$, of N-functions, it is easily seen that the integrability condition (12) implies $\lVert Y\rVert_{\psi}\leq A\vee 1$.

  (b)

    Since this paper concerns square-integrable martingales, we are only interested in sup-quadratic N-functions $\psi\succcurlyeq x^{2}$. For instance, $x^{2}\log(x+1)$, $x^{p}$ (with $p\geq 2$), $e^{x}-x-1$, $\exp(x^{\beta})-1$, and $\exp\left(\ln(x+1)^{\beta}\right)-1$ (with $\beta>1$) are among such N-functions.

  (c)

    The meaning of our notation $\lVert Y\rVert_{p}$ differs from some work in the literature, e.g. [5, 19], where it means $\max_{i=1}^{n}\lVert Y_{i}\rVert_{p}$. Note that $\lVert Y\rVert_{\psi}\leq\max_{i=1}^{n}\lVert Y_{i}\rVert_{\psi}$.

  (d)

    Removing the $\tfrac{1}{n}$ in (18) only changes $\lVert Y\rVert_{p}$ by a multiplicative factor. However, for a general $\lVert\cdot\rVert_{\psi}$, the presence of the $\tfrac{1}{n}$ in (17) is crucial for the definition: removing it would drastically change the meaning of the norm.
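As a sanity check on sup-quadraticity (our own sketch, not from the paper): $\psi\succcurlyeq x^{2}$ means $\psi(x)/x^{2}$ is non-decreasing, which we verify on a grid for a few of the examples in (b) (taking $p=3$ and $\beta=2$ for concreteness).

```python
import numpy as np

examples = [
    lambda x: x ** 2 * np.log(x + 1),  # x^2 log(x+1)
    lambda x: x ** 3,                  # x^p with p = 3
    lambda x: np.exp(x) - x - 1,       # e^x - x - 1
    lambda x: np.exp(x ** 2) - 1,      # exp(x^beta) - 1 with beta = 2
]
x = np.linspace(0.05, 5.0, 500)
for psi in examples:
    ratio = psi(x) / x ** 2            # should be non-decreasing
    assert np.all(np.diff(ratio) >= -1e-9)
```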

2 Main results

Recall $\lVert Y\rVert_{\infty}$ in (19). For the mds $\{Y_{i}\}_{i=1}^{n}$ and an N-function $\psi$, we generalize the notation $L_{p}$ in (2) to

L_{\psi}=\frac{1}{s_{n}}\lVert Y\rVert_{\psi}\psi^{-1}(n),\qquad L_{\infty}=\frac{1}{s_{n}}\lVert Y\rVert_{\infty}, (20)

where $\psi^{-1}$ denotes the inverse function of $\psi$.

Our first main results are two $\mathcal{W}_{1}$ bounds that generalize (4), (5), (7), (8), (9).

Theorem 5 ($\mathcal{W}_{1}$ bounds).

Let $\{Y_{m}\}_{m=1}^{n}$ be a martingale difference sequence. Recall the notations $X_{n},V_{n}^{2},L_{\psi},L_{\infty},\preccurlyeq,\succcurlyeq$ in (1), (20), and Definition 2.

  (i)

    For any N-function $\psi\succcurlyeq x^{2}$,

    \mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim L_{\psi}\log\left(e+L_{\psi}^{-2}\right)+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}. (21)
  (ii)

    For any N-function $\psi$ with $x^{2}\preccurlyeq\psi\preccurlyeq x^{p}$, $p>2$,

    \mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim pL_{\psi}+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}. (22)

As consequences, for any $p>2$,

\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\min\left\{p,\,\log(e+L_{p}^{-2})\right\}L_{p}+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}.

In particular, for the $\mathscr{L}^{\infty}$ case, by (19),

\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim L_{\infty}\log\left(e+L_{\infty}^{-2}\right)+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}.
Remark 6.
  (a)

    For any sup-quadratic N-function $\psi\succcurlyeq x^{2}$, by Lemma A.1,

    s_{n}=\lVert Y\rVert_{2}n^{1/2}\lesssim_{\psi(1)}\lVert Y\rVert_{\psi}n^{1/2}.

    Hence, by the definition of $L_{\psi}$ in (20),

    L_{\psi}\gtrsim_{\psi(1)}\psi^{-1}(n)n^{-1/2}\gtrsim_{\psi(1)}\psi^{-1}(1)n^{-1/2}.

    Thus, the $\log(e+L_{\psi}^{-2})$ in Theorem 5(i) is dominated by $\log n$.

  (b)

    Both bounds (21) and (22) in Theorem 5 have their strengths and weaknesses.

    Apparently, the second bound (22) gives better rates for an mds which is at most polynomially integrable. E.g., for the "barely more than square integrable" case $\psi=x^{2}\log(x+1)$, (22) yields $\frac{1}{s_{n}}(\frac{n}{\log n})^{1/2}(\log\log n)\lVert Y\rVert_{\psi}$ as the first term in the bound. However, (22) becomes a trivial bound for $p\gg n^{1/2}$.

    Although the first bound (21) is seemingly a $\log n$ factor worse than the latter, it is applicable to martingales with more general integrability. For instance, if the martingale is exponentially integrable, taking $\psi=e^{x}-x-1$, Theorem 5(i) yields $\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\lVert Y\rVert_{\psi}(\log n)^{2}/s_{n}+\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}$, where the first term is better than the $O(n^{1/p}/s_{n})$ rates offered by (22) for any $p>0$.

  (c)

    In some sense, the term $\lVert V_{n}^{2}-1\rVert$ quantifies the extent of decorrelation of the process. For independent sequences, $(V_{n}^{2}-1)$ is 0. In general, $(V_{n}^{2}-1)$ could converge to 0 arbitrarily fast, depending on how decorrelated the process is.

  (d)

    The term $(V_{n}^{2}-1)$ is essential for bounds of the type

    \mathcal{W}_{r}(X_{n},\mathcal{N})\leq C_{1}L_{\psi}^{a}+C_{2}\lVert V_{n}^{2}-1\rVert_{c}^{b}, (23)

    i.e., we cannot allow $C_{2}$ to be 0. For example, let $B$ be such that $P(B=1\pm\tfrac{1}{2})=\tfrac{1}{2}$ and let $(\xi_{i})_{i=1}^{n}$ be i.i.d. $\mathcal{N}(0,1)$ variables. Define $Y_{i}=(B/n)^{1/2}\xi_{i}$. Then $\lim_{n\to\infty}\mathcal{W}_{r}(\sum_{i=1}^{n}Y_{i},\mathcal{N})=\mathcal{W}_{r}(\sqrt{B}\xi_{1},\mathcal{N})\neq 0$. Whereas, for $p>2$, $L_{p}=O(n^{1/p-1/2})\to 0$ as $n\to\infty$. Thus $C_{2}\neq 0$.
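This can be seen numerically (our own sketch with SciPy, not part of the paper): $\sqrt{B}\,\xi_{1}$ is an equal mixture of $\mathcal{N}(0,\tfrac{1}{2})$ and $\mathcal{N}(0,\tfrac{3}{2})$, and the representation $\mathcal{W}_{1}=\int|F_{U}-F_{V}|\,\mathrm{d}x$ gives a limiting distance bounded away from 0.

```python
import numpy as np
from scipy.stats import norm

# sqrt(B) * xi_1 is a half-half mixture of N(0, 1/2) and N(0, 3/2).
x = np.linspace(-10.0, 10.0, 200_001)
F_mix = 0.5 * norm.cdf(x, scale=np.sqrt(0.5)) + 0.5 * norm.cdf(x, scale=np.sqrt(1.5))
gap = np.abs(F_mix - norm.cdf(x))

# Trapezoid rule for W_1 = int |F_mix(x) - Phi(x)| dx.
w1 = np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(x))
assert w1 > 0.01  # the limiting W_1 distance does not vanish
```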

Our next main results concern the $\mathcal{W}_{2}$ and $\mathcal{W}_{3}$ convergence rates.

Theorem 7 ($\mathcal{W}_{2}$ bounds).

Let $\{Y_{m}\}_{m=1}^{n}$ be a martingale difference sequence.

  (i)

    For any N-function $\psi\preccurlyeq x^{3}$ such that $x\mapsto\psi(\sqrt{x})$ is convex,

    \mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim L_{\psi}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2}.
  (ii)

    For any N-function $\psi\succcurlyeq x^{3}$,

    \mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim_{\psi(1)}L_{\psi}+\frac{\lVert Y\rVert_{\psi}}{s_{n}}\left[ng^{-1}(\psi^{-1}(n)^{2})\right]^{1/4}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2},

    where $g$ denotes the inverse function of $x\mapsto\tfrac{\psi(x)}{x}$.

As consequences, for $p\in(2,\infty)$,

\mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim\begin{cases}L_{p}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2}&\text{ when }p\in(2,3],\\ n^{1/4+1/(2p^{2}-2p)}s_{n}^{-1}\lVert Y\rVert_{p}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2}&\text{ when }p>3.\end{cases}

In particular, letting $p\to\infty$,

\mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim\frac{n^{1/4}}{s_{n}}\lVert Y\rVert_{\infty}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2}. (24)
Theorem 8 ($\mathcal{W}_{3}$ bound).

Let $\{Y_{m}\}_{m=1}^{n}$ be a martingale difference sequence. Then

\mathcal{W}_{3}(X_{n},\mathcal{N})\lesssim L_{3}+\lVert V_{n}^{2}-1\rVert_{3/2}^{1/2}.
Remark 9.

For ease of presentation, we only present results in terms of the $\mathcal{W}_{r}$ metrics, $r\in\{1,2,3\}$. Readers can use the interpolation inequality

\mathcal{W}_{r}(X_{n},\mathcal{N})^{r}\leq\mathcal{W}_{j}(X_{n},\mathcal{N})^{j(k-r)/(k-j)}\mathcal{W}_{k}(X_{n},\mathcal{N})^{k(r-j)/(k-j)},\quad j<r<k,

to easily infer other $\mathcal{W}_{r}$ bounds, $r\in(1,3)$.
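A numerical check of this interpolation (our own sketch, not from the paper): on $\mathbb{R}$ the quantile coupling is simultaneously optimal for every order, so all empirical $\mathcal{W}_{r}$'s can be computed from sorted samples, and the estimate $\mathcal{W}_{r}^{r}\leq\mathcal{W}_{j}^{j(k-r)/(k-j)}\mathcal{W}_{k}^{k(r-j)/(k-j)}$ is Hölder's inequality applied to that common coupling.

```python
import numpy as np

def w_r(u, v, r):
    # Empirical W_r via the quantile coupling (optimal on the real line).
    d = np.abs(np.sort(u) - np.sort(v))
    return np.mean(d ** r) ** (1.0 / r)

rng = np.random.default_rng(1)
u = rng.standard_normal(5000)
v = rng.exponential(size=5000) - 1.0

j, r, k = 1.0, 2.0, 3.0
lhs = w_r(u, v, r) ** r
rhs = w_r(u, v, j) ** (j * (k - r) / (k - j)) * w_r(u, v, k) ** (k * (r - j) / (k - j))
assert lhs <= rhs + 1e-12
```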

The following proposition justifies that the term $L_{\psi}$ and the exponent $1/2$ of $\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}$ in the $\mathcal{W}_{1}$ bound (22) are optimal.

Proposition 10 (Optimality of the $\mathcal{W}_{1}$ rates).

  (1)

    For any $p>2$, there exists a constant $N$ depending on $p$ such that for any $n>N$, we can find a martingale $\{X_{i}\}_{i=1}^{n}$ which satisfies

    • $V_{n}^{2}=1$ a.s.;

    • $L_{p}\lesssim_{p}(\log n)^{-1}$;

    • $\mathcal{W}_{1}(X_{n},\mathcal{N})\asymp_{p}L_{p}$.

  (2)

    For any $n\geq 2$, there exists a martingale $\{X_{i}\}_{i=1}^{n}$ that satisfies

    • $\lVert V_{n}^{2}-1\rVert_{1/2}\lesssim(\log n)^{-2}$;

    • $\mathcal{W}_{1}(X_{n},\mathcal{N})\asymp\lVert V_{n}^{2}-1\rVert_{1/2}^{1/2}$.

3 Martingales with $V_{n}^{2}=1$

In this section we will derive $\mathcal{W}_{r}$ bounds, $r\in\{1,2,3\}$, for martingales with $V_{n}^{2}=1$. As in [39, 18, 11, 19], we will use Lindeberg's argument and Taylor expansion to bound the Wasserstein distance by a sum involving the third-order conditional moments. Our derivation of the $\mathcal{W}_{r}$ bounds for $r>1$ also relies on an observation of Rio [38] that relates the Wasserstein distances to Zolotarev's ideal metric (25).

Note that [39, 18, 19] use Stein's method to express the $\mathcal{W}_{1}$ bound in terms of derivatives of Stein's equation. But as observed in [11], Gaussian smoothing already provides enough regularity for Taylor expansions. We follow [11] in this respect.

3.1 Wasserstein distance bounds via Lindeberg’s argument

For any $n$-times differentiable function $f$ and $\gamma\in(0,1]$, denote its $(n,\gamma)$-Hölder constant by

[f]_{n,\gamma}:=\sup_{x,y:\,x\neq y}\frac{f^{(n)}(x)-f^{(n)}(y)}{|x-y|^{\gamma}}.

We say that $f\in C^{n,\gamma}$ if $[f]_{n,\gamma}<\infty$.

Definition 11.

Suppose $r>1$, and $\mu,\nu$ are probability measures on $\mathbb{R}$. Let $\ell\in\mathbb{N}$ and $\gamma\in(0,1]$ be the unique numbers such that $r=\ell+\gamma$. We let $\Lambda_{r}=\{f\in C^{\ell,\gamma}:[f]_{\ell,\gamma}\leq 1\}$. The Zolotarev distance $\mathcal{Z}_{r}$ is defined by

\mathcal{Z}_{r}(\mu,\nu)=\sup\left\{\int_{\mathbb{R}}f\,\mathrm{d}\mu-\int_{\mathbb{R}}f\,\mathrm{d}\nu:f\in\Lambda_{r}\right\}.

Recall that, by the Kantorovich-Rubinstein theorem, $\mathcal{W}_{r}(\mu,\nu)=\mathcal{Z}_{r}(\mu,\nu)$ for $r=1$. When $r>1$, it is shown by Rio [38, Theorem 3.1] that

\mathcal{W}_{r}(\mu,\nu)\lesssim_{r}\mathcal{Z}_{r}(\mu,\nu)^{1/r}. (25)

For $\sigma>0$ and any function $f\in\Lambda_{r}$, let

f_{\sigma}(x):=E[f(x+\sigma\mathcal{N})], (26)

where $\mathcal{N}$ denotes a standard normal random variable. Direct integration by parts yields the following regularity estimate for the Gaussian smoothing $f_{\sigma}$, which is a special case of [10, Lemma 6.1].

Lemma 12.

[10, Lemma 6.1] For $\sigma>0$ and $f\in\Lambda_{r}$, $r\in\mathbb{N}$, let $f_{\sigma}$ be as in (26). Then

\lVert f_{\sigma}^{(k)}\rVert_{\infty}\leq C_{k+1-r}\sigma^{r-k},\quad\forall k\geq r,

where $C_{n}=\int_{\mathbb{R}}\lvert u\phi^{(n)}(u)\rvert\,\mathrm{d}u$ and $\phi(x)=\tfrac{1}{\sqrt{2\pi}}e^{-x^{2}/2}$ is the standard normal density.

For the reader’s convenience, we include a proof of Lemma 12 in the Appendix.
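The lemma can be sanity-checked on a function whose smoothing is explicit (our own sketch with SciPy, not part of the paper): for the 1-Lipschitz $f(x)=|x|\in\Lambda_{1}$, one has $f_{\sigma}(x)=x(2\Phi(x/\sigma)-1)+2\sigma\phi(x/\sigma)$, hence $\lVert f_{\sigma}''\rVert_{\infty}=\tfrac{2}{\sigma\sqrt{2\pi}}$, which should be at most $C_{2}\sigma^{-1}$ (the case $r=1$, $k=2$).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 0.37
# For f(x) = |x|: f_sigma''(x) = (2/sigma) * phi(x/sigma), maximized at x = 0.
sup_f2 = 2.0 / (sigma * np.sqrt(2.0 * np.pi))

# C_2 = int |u * phi''(u)| du, with phi''(u) = (u^2 - 1) * phi(u).
C2, _ = quad(lambda u: abs(u * (u * u - 1.0) * norm.pdf(u)), -np.inf, np.inf)

assert sup_f2 <= C2 / sigma  # Lemma 12 with r = 1, k = 2
```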

Recall the notations $\sigma_{m}(Y),X_{m}$ in (1). For any random variable $U$, we write the expectation conditioned on $\mathcal{F}_{i-1}$ as $E_{i}$, i.e.,

E_{i}[U]:=E[U|\mathcal{F}_{i-1}]. (27)
Proposition 13.

Let $Y=\{Y_{m}\}_{m=1}^{n}$, $n\geq 2$, be a martingale difference sequence. Assume that $\sum_{i=1}^{n}\sigma_{i}^{2}(Y)=s_{n}^{2}$ almost surely. For any $\beta>0$, set

\lambda_{m}^{2}=\lambda_{m}^{2}(Y,\beta):=\sum_{i=m+1}^{n}\sigma_{i}^{2}(Y)+\beta^{2}s_{n}^{2}.

Then, for any $\psi\in[0,\infty)^{[0,\infty)}$ with $\psi\preccurlyeq x^{3}$ such that $x\mapsto\psi(\sqrt{x})$ is convex,

\mathcal{Z}_{r}(X_{n},\mathcal{N})\lesssim\beta^{r}+\frac{1}{s_{n}^{r}}E\sum_{i=1}^{n}\min\left\{\frac{E_{i}[|Y_{i}|^{3}]}{\lambda_{i}^{3-r}},\,\frac{\lambda_{i}^{r}E_{i}\psi(|Y_{i}|)}{\psi(\lambda_{i})}\right\},\quad r\in\{1,2\}, (28)

\mathcal{Z}_{3}(X_{n},\mathcal{N})\lesssim\frac{1}{s_{n}^{3}}\sum_{i=1}^{n}E[|Y_{i}|^{3}]. (29)
Proof of Proposition 13.

Without loss of generality, assume $s_{n}=1$.

Let $\xi,\mathcal{N}$ be random variables with distributions $\mathcal{N}(0,\beta^{2})$ and $\mathcal{N}(0,1)$, respectively, such that the triple $(\xi,\mathcal{N},\{X_{i}\}_{i=1}^{n})$ is independent. For $h\in\Lambda_{r}$, $r\in\{1,2\}$, and any random variable $U$ which is independent of $\xi$,

\lvert E[h(U+\xi)-h(U)]\rvert=\Bigl\lvert E\Bigl[h(U+\xi)-\sum_{i=0}^{r-1}h^{(i)}(U)\xi^{i}\Bigr]\Bigr\rvert\leq\tfrac{1}{r!}E[|\xi|^{r}].

The triangle inequality and (25) then yield, for $r\in\{1,2\}$,

\lvert\mathcal{Z}_{r}(X_{n},\mathcal{N})-\mathcal{Z}_{r}(X_{n}+\xi,\mathcal{N}+\xi)\rvert\lesssim\beta^{r}. (30)

In what follows we will use Lindeberg's argument to derive bounds for $\mathcal{Z}_{r}(X_{n}+\xi,\mathcal{N}+\xi)$, $r\in\{1,2,3\}$. Recall the notations in (1) and (27).

Note that $\lambda_{m}$ is $\mathcal{F}_{m-1}$-measurable. Recall the function $h_{\sigma}$ in (26) and the operator $E_{i}$ in (27). For $h\in\Lambda_{r}$, we consider the following telescopic sum:

E[h(X_{n}+\xi)-h(\mathcal{N}+\xi)]=\sum_{i=1}^{n}E[h(X_{i}+\lambda_{i}\mathcal{N})-h(X_{i-1}+\lambda_{i-1}\mathcal{N})]=E\sum_{i=1}^{n}D_{i},

where $D_{i}:=E_{i}[h_{\lambda_{i}}(X_{i})-h_{\lambda_{i}}(X_{i-1}+\sigma_{i}\mathcal{N})]$.

Note that

D_{i}=E_{i}\int_{\sigma_{i}\mathcal{N}}^{Y_{i}}\int_{0}^{t}h^{\prime\prime}_{\lambda_{i}}(X_{i-1}+s)-h^{\prime\prime}_{\lambda_{i}}(X_{i-1})\,\mathrm{d}s\,\mathrm{d}t. (31)

Hence, for $h\in\Lambda_{r}$, $r\in\{1,2,3\}$, using the fact that $\sigma_{i}^{3}\leq E_{i}[\lvert Y_{i}\rvert^{3}]$, we have

|D_{i}|\lesssim E_{i}[|Y_{i}|^{3}]\lVert h_{\lambda_{i}}^{(3)}\rVert_{\infty}\overset{\text{Lemma 12}}{\lesssim}E_{i}[|Y_{i}|^{3}]\lambda_{i}^{r-3}. (32)

When $r=3$, notice that $\beta$ is irrelevant. So, letting $\beta\downarrow 0$, we immediately get (29). It remains to consider $r\in\{1,2\}$.

Using the fact that, for $f\in C^{1,1}$,

\Bigl\lvert\int_{0}^{y}\int_{0}^{t}f^{\prime\prime}(x+s)-f^{\prime\prime}(x)\,\mathrm{d}s\,\mathrm{d}t\Bigr\rvert\leq\min\left\{\tfrac{1}{6}|y|^{3}\lVert f^{(3)}\rVert_{\infty},\,y^{2}\lVert f^{\prime\prime}\rVert_{\infty}\right\},

we get, for any event AiA_{i} and any hΛrh\in\Lambda_{r}, r{1,2}r\in\{1,2\},

I:\displaystyle\mathrm{I}: =|Ei0Yi0th′′λi(Xi1+s)h′′λi(Xi1)dsdt|\displaystyle=\Bigr{\lvert}E_{i}\int_{0}^{Y_{i}}\int_{0}^{t}h^{\prime\prime}_{\lambda_{i}}(X_{i-1}+s)-h^{\prime\prime}_{\lambda_{i}}(X_{i-1})\mathrm{d}s\mathrm{d}t\Bigl{\rvert}
Ei[16|Yi|3hλi(3)𝟙Aic+Yi2hλi′′𝟙Ai]\displaystyle\leq E_{i}\left[\tfrac{1}{6}|Y_{i}|^{3}\lVert h_{\lambda_{i}}^{(3)}\rVert_{\infty}\mathbbm{1}_{A_{i}^{c}}+Y_{i}^{2}\lVert h_{\lambda_{i}}^{\prime\prime}\rVert_{\infty}\mathbbm{1}_{A_{i}}\right]
Lemma12Ei[|Yi|3λir3𝟙Aic+Yi2λir2𝟙Ai],\displaystyle\stackrel{{\scriptstyle Lemma~{}\ref{lem:smoothing}}}{{\lesssim}}E_{i}\left[|Y_{i}|^{3}\lambda_{i}^{r-3}\mathbbm{1}_{A_{i}^{c}}+Y_{i}^{2}\lambda_{i}^{r-2}\mathbbm{1}_{A_{i}}\right],

and similarly

II:\displaystyle\mathrm{II}: =|Ei0σi𝒩0th′′λi(Xi1+s)h′′λi(Xi1)dsdt|\displaystyle=\Bigr{\lvert}E_{i}\int_{0}^{\sigma_{i}\mathcal{N}}\int_{0}^{t}h^{\prime\prime}_{\lambda_{i}}(X_{i-1}+s)-h^{\prime\prime}_{\lambda_{i}}(X_{i-1})\mathrm{d}s\mathrm{d}t\Bigl{\rvert}
Ei[σi3λir3𝟙Aic+σi2λir2𝟙Ai].\lesssim E_{i}\left[\sigma_{i}^{3}\lambda_{i}^{r-3}\mathbbm{1}_{A_{i}^{c}}+\sigma_{i}^{2}\lambda_{i}^{r-2}\mathbbm{1}_{A_{i}}\right].

Further, since ψx3\psi\preccurlyeq x^{3}, we have s3/t3ψ(s)/ψ(t)s^{3}/t^{3}\leq\psi(s)/\psi(t) for 0<st0<s\leq t. Hence

|Yi|3λir3𝟙|Yi|<λiλirψ(|Yi|)ψ(λi)𝟙|Yi|<λi.|Y_{i}|^{3}\lambda_{i}^{r-3}\mathbbm{1}_{|Y_{i}|<\lambda_{i}}\leq\lambda_{i}^{r}\frac{\psi(|Y_{i}|)}{\psi(\lambda_{i})}\mathbbm{1}_{|Y_{i}|<\lambda_{i}}.

Since ψx2\psi\succcurlyeq x^{2}, we have s2/t2ψ(s)/ψ(t)s^{2}/t^{2}\leq\psi(s)/\psi(t) for 0<ts0<t\leq s. Thus

Yi2λir2𝟙|Yi|λiλirψ(|Yi|)ψ(λi)𝟙|Yi|λi,Y_{i}^{2}\lambda_{i}^{r-2}\mathbbm{1}_{|Y_{i}|\geq\lambda_{i}}\leq\lambda_{i}^{r}\frac{\psi(|Y_{i}|)}{\psi(\lambda_{i})}\mathbbm{1}_{|Y_{i}|\geq\lambda_{i}},

and so, for r{1,2}r\in\{1,2\},

IEi[|Yi|3λir3𝟙|Yi|<λi+Yi2λir2𝟙|Yi|λi]λirEi[ψ(|Yi|)]ψ(λi).\displaystyle\mathrm{I}\lesssim E_{i}\left[|Y_{i}|^{3}\lambda_{i}^{r-3}\mathbbm{1}_{|Y_{i}|<\lambda_{i}}+Y_{i}^{2}\lambda_{i}^{r-2}\mathbbm{1}_{|Y_{i}|\geq\lambda_{i}}\right]\lesssim\frac{\lambda_{i}^{r}E_{i}[\psi(|Y_{i}|)]}{\psi(\lambda_{i})}.

Similarly, for hΛrh\in\Lambda_{r}, r{1,2}r\in\{1,2\},

IIEi[σi3λir3𝟙σi<λi+σi2λir2𝟙σiλi]λirψ(σi)ψ(λi).\mathrm{II}\lesssim E_{i}\left[\sigma_{i}^{3}\lambda_{i}^{r-3}\mathbbm{1}_{\sigma_{i}<\lambda_{i}}+\sigma_{i}^{2}\lambda_{i}^{r-2}\mathbbm{1}_{\sigma_{i}\geq\lambda_{i}}\right]\lesssim\frac{\lambda_{i}^{r}\psi(\sigma_{i})}{\psi(\lambda_{i})}.

Since xψ(x)x\mapsto\psi(\sqrt{x}) is convex, by Jensen’s inequality we know that

ψ(σi)Ei[ψ(|Yi|)].\psi(\sigma_{i})\leq E_{i}[\psi(|Y_{i}|)]. (33)
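Inequality (33) rests only on the convexity of xψ(x)x\mapsto\psi(\sqrt{x}), so it is easy to probe numerically. Below is a minimal Monte Carlo sketch, assuming the hypothetical choice ψ(x)=x5/2\psi(x)=x^{5/2} (for which ψ(x)=x5/4\psi(\sqrt{x})=x^{5/4} is convex) and a plain centered sample in place of the conditional law of YiY_{i}:

```python
import random
import statistics

# Hedged numerical check of (33): psi(sigma) <= E[psi(|Y|)],
# with psi(x) = x**2.5, so that psi(sqrt(x)) = x**1.25 is convex.
# The conditioning on F_{i-1} is omitted: we use a plain centered sample.
random.seed(0)
psi = lambda x: x ** 2.5

ys = [random.gauss(0.0, 1.3) for _ in range(200_000)]
sigma = statistics.fmean(y * y for y in ys) ** 0.5      # sample std (mean zero)
lhs = psi(sigma)                                        # psi(sigma)
rhs = statistics.fmean(psi(abs(y)) for y in ys)         # E[psi(|Y|)]
assert lhs <= rhs
print(lhs, "<=", rhs)
```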

Thus, we obtain, for hΛrh\in\Lambda_{r}, r{1,2}r\in\{1,2\},

|Di|(31)I+IIλirEi[ψ(|Yi|)]ψ(λi).|D_{i}|\stackrel{{\scriptstyle\eqref{eq:di-bound}}}{{\leq}}\mathrm{I}+\mathrm{II}\lesssim\frac{\lambda_{i}^{r}E_{i}[\psi(|Y_{i}|)]}{\psi(\lambda_{i})}.

This inequality, together with (32), (30) and (25), yields the Proposition. ∎

Remark 14.

If there exists 1jn1\leq j\leq n such that, given j1\mathcal{F}_{j-1}, the distribution of YjY_{j} is the normal 𝒩(0,σj2)\mathcal{N}(0,\sigma_{j}^{2}), then the jj-th summand in (28) and (29) can be removed. That is, the summations in (28) and (29) can both be replaced by

i=1,ijn.\sum_{i=1,i\neq j}^{n}.

Indeed, in this case, XjX_{j} and Xj1+σj𝒩X_{j-1}+\sigma_{j}\mathcal{N} are identically distributed conditioning on j1\mathcal{F}_{j-1}. Thus Dj=0D_{j}=0.

3.2 The rates of some special cases

From Proposition 13 we can obtain Wasserstein distance bounds for some special cases (with Vn2=1V_{n}^{2}=1). See Corollary 15 below. These cases usually yield better rates than the typical cases. They are not only of interest in their own right, but can also serve as important references when we construct counterexamples.

The 𝒲1\mathcal{W}_{1} bounds within Corollary 15 can be considered generalizations of some of the results in [39, Corollaries 2.2,2.3] and [19, Corollaries 2.5,2.6,2.7] to the ψ\mathscr{L}^{\psi} integrable cases.

Corollary 15.

Assume that Vn2=1V_{n}^{2}=1 almost surely. Let ψ\psi be an N-function such that ψx3\psi\preccurlyeq x^{3} and xψ(x)x\mapsto\psi(\sqrt{x}) is convex. Then the following statements hold.

  1. (i)

    𝒲r(Xn,𝒩)Lψ\mathcal{W}_{r}(X_{n},\mathcal{N})\lesssim L_{\psi}  for r{1,2}r\in\{1,2\}.

  2. (ii)

    If there exists σ>0\sigma>0 such that

    P(σiσ for all 1in)=1,P(\sigma_{i}\geq\sigma\,\text{ for all }1\leq i\leq n)=1,

    then, writing Mψ=maxi=1nE[ψ(|Yi|)]M_{\psi}=\max_{i=1}^{n}E[\psi(|Y_{i}|)] (and write MψM_{\psi} as M3M_{3} when ψ=x3\psi=x^{3}),

    • 𝒲1(Xn,𝒩){M3σ2snlogn when ψ=x3,(3p)1Mψsn2σ2ψ(sn) when ψxp,p(2,3);\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\left\{\begin{array}[]{lr}\frac{M_{3}}{\sigma^{2}s_{n}}\log n&\text{ when }\psi=x^{3},\\ (3-p)^{-1}\frac{M_{\psi}s_{n}^{2}}{\sigma^{2}\psi(s_{n})}&\text{ when }\psi\preccurlyeq x^{p},p\in(2,3);\end{array}\right..

    • 𝒲2(Xn,𝒩)Mψ1/2snσψ(sn)1/2\mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim\frac{M_{\psi}^{1/2}s_{n}}{\sigma\psi(s_{n})^{1/2}}.

  3. (iii)

    If there exists a constant θ>0\theta>0 such that almost surely,

    E[ψ(|Yi|)|i1]θσi2i=1,,n,E[\psi(|Y_{i}|)|\mathcal{F}_{i-1}]\leq\theta\sigma_{i}^{2}\quad\forall i=1,\ldots,n,

    then

    • 𝒲1(Xn,𝒩)θ{1snlog(e+sn) when ψ=x3,sn2ψ(sn) when ψxp,p(2,3);\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim_{\theta}\left\{\begin{array}[]{lr}\frac{1}{s_{n}}\log(e+s_{n})&\text{ when }\psi=x^{3},\\ \frac{s_{n}^{2}}{\psi(s_{n})}&\text{ when }\psi\preccurlyeq x^{p},p\in(2,3);\end{array}\right.

    • 𝒲2(Xn,𝒩)θsnψ(sn)1/2\mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim_{\theta}\frac{s_{n}}{\psi(s_{n})^{1/2}}.

    In particular, when θ=maxi=1nYi<\theta=\max_{i=1}^{n}\lVert Y_{i}\rVert_{\infty}<\infty, we have

    𝒲1(Xn,𝒩)θsnlog(e+sn) and 𝒲2(Xn,𝒩)(θsn)1/2.\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\frac{\theta}{s_{n}}\log(e+s_{n})\,\text{ and }\,\mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim(\tfrac{\theta}{s_{n}})^{1/2}.
Proof.

(i) For the ease of notation we write θ:=Yψ\theta:=\lVert Y\rVert_{\psi}. Note that xψ(x/θ)x\mapsto\psi(x/\theta) is still an N-function that satisfies conditions of Proposition 13. Recall the conditional expectation EiE_{i} in (27). By Proposition 13,

𝒵r(Xn,𝒩)\displaystyle\mathcal{Z}_{r}(X_{n},\mathcal{N}) βr+1snrEi=1nλirEi[ψ(|Yi|/θ)]ψ(λi/θ)\displaystyle\lesssim\beta^{r}+\frac{1}{s_{n}^{r}}E\sum_{i=1}^{n}\frac{\lambda_{i}^{r}E_{i}[\psi(|Y_{i}|/\theta)]}{\psi(\lambda_{i}/\theta)}
βr+1snrEi=1n(βsn)rEi[ψ(|Yi|/θ)]ψ(βsn/θ)\displaystyle\lesssim\beta^{r}+\frac{1}{s_{n}^{r}}E\sum_{i=1}^{n}\frac{(\beta s_{n})^{r}E_{i}[\psi(|Y_{i}|/\theta)]}{\psi(\beta s_{n}/\theta)}
βr+nβrψ(βsn/θ) for any β>0,r{1,2},\displaystyle\lesssim\beta^{r}+\frac{n\beta^{r}}{\psi(\beta s_{n}/\theta)}\quad\text{ for any }\beta>0,r\in\{1,2\},

where in the second inequality we used the fact that λiβsn\lambda_{i}\geq\beta s_{n} for all ii and that xxrψ(x)x\mapsto\frac{x^{r}}{\psi(x)} is decreasing. Taking β=θsnψ1(n)\beta=\frac{\theta}{s_{n}}\psi^{-1}(n), the 𝒵1\mathcal{Z}_{1}, 𝒵2\mathcal{Z}_{2} bounds are proved.

(ii) Recall λi\lambda_{i} in Proposition 13. When σiσ\sigma_{i}\geq\sigma for all ii we have

λi2(ni)σ2+β2sn2,1in.\lambda_{i}^{2}\geq(n-i)\sigma^{2}+\beta^{2}s_{n}^{2},\quad\forall 1\leq i\leq n.

Thus, by Proposition 13, we get, for any β2σ/sn\beta\geq 2\sigma/s_{n}, xrψx3x^{r}\preccurlyeq\psi\preccurlyeq x^{3}, r{1,2}r\in\{1,2\},

𝒵r(Xn,𝒩)\displaystyle\mathcal{Z}_{r}(X_{n},\mathcal{N}) βr+Mψsnri=1n[(ni)σ2+β2sn2]r/2ψ((ni)σ2+β2sn2)\displaystyle\lesssim\beta^{r}+\frac{M_{\psi}}{s_{n}^{r}}\sum_{i=1}^{n}\tfrac{[(n-i)\sigma^{2}+\beta^{2}s_{n}^{2}]^{r/2}}{\psi(\sqrt{(n-i)\sigma^{2}+\beta^{2}s_{n}^{2}})}
βr+Mψsnrσ2σnσ+βsntr+1ψ(t)dt.\displaystyle\lesssim\beta^{r}+\frac{M_{\psi}}{s_{n}^{r}\sigma^{2}}\int_{\sigma}^{\sqrt{n}\sigma+\beta s_{n}}\frac{t^{r+1}}{\psi(t)}\mathrm{d}t. (34)

The first 𝒵1\mathcal{Z}_{1} bound follows when r=1r=1, ψ(x)=x3\psi(x)=x^{3}, β=2M3/(snσ2)\beta=2M_{3}/(s_{n}\sigma^{2}).

Now consider the case xrψxpx^{r}\preccurlyeq\psi\preccurlyeq x^{p}, p(2,r+2)(2,3]p\in(2,r+2)\cap(2,3], r{1,2}r\in\{1,2\}. Since sn2nσ2s_{n}^{2}\geq n\sigma^{2} and xxp/ψ(x)x\mapsto x^{p}/\psi(x) is increasing, we get, for any β[2σsn,1]\beta\in[\tfrac{2\sigma}{s_{n}},1],

σnσ+βsntpψ(t)tr+1pdt\displaystyle\int_{\sigma}^{\sqrt{n}\sigma+\beta s_{n}}\frac{t^{p}}{\psi(t)}t^{r+1-p}\mathrm{d}t (2sn)pψ(2sn)02sntr+1pdt\displaystyle\leq\frac{(2s_{n})^{p}}{\psi(2s_{n})}\int_{0}^{2s_{n}}t^{r+1-p}\mathrm{d}t
1r+2p(2sn)r+2ψ(2sn)\displaystyle\leq\frac{1}{r+2-p}\frac{(2s_{n})^{r+2}}{\psi(2s_{n})}
1r+2p2rsnr+2ψ(sn).\displaystyle\leq\frac{1}{r+2-p}\frac{2^{r}s_{n}^{r+2}}{\psi(s_{n})}.

Taking β\beta such that βr=CMψσ2sn2ψ(sn)\beta^{r}=C\frac{M_{\psi}}{\sigma^{2}}\frac{s_{n}^{2}}{\psi(s_{n})} for appropriate constant C>0C>0 and recalling (3.2), we get (the case xrψxpx^{r}\preccurlyeq\psi\preccurlyeq x^{p}, p(2,r+2)(2,3]p\in(2,r+2)\cap(2,3])

𝒵r(Xn,𝒩)1r+2pβr.\mathcal{Z}_{r}(X_{n},\mathcal{N})\lesssim\frac{1}{r+2-p}\beta^{r}.

The rest of the 𝒲r\mathcal{W}_{r} bounds in (ii) follow. Note that this is a trivial inequality (since 𝒲r(Xn,𝒩)1\mathcal{W}_{r}(X_{n},\mathcal{N})\lesssim 1) if β>1\beta>1. If β<1\beta<1, it remains to justify that such a choice of β\beta satisfies β2σ/sn\beta\geq 2\sigma/s_{n}. Indeed, since ψx2\psi\succcurlyeq x^{2} and σiσ\sigma_{i}\geq\sigma, we have, for 1in1\leq i\leq n,

Mψψ(σ/2)(σ/2)2E[Yi2𝟙|Yi|σ/2]ψ(σ/2)M_{\psi}\geq\frac{\psi(\sigma/2)}{(\sigma/2)^{2}}E[Y_{i}^{2}\mathbbm{1}_{|Y_{i}|\geq\sigma/2}]\gtrsim\psi(\sigma/2)

and so, for r{1,2}r\in\{1,2\},

βsnβrsnψ(σ/2)sn3(σ/2)2ψ(sn)σ,\beta s_{n}\geq\beta^{r}s_{n}\gtrsim\frac{\psi(\sigma/2)s_{n}^{3}}{(\sigma/2)^{2}\psi(s_{n})}\gtrsim\sigma,

where in the last inequality we used the fact that ψx3\psi\lesssim x^{3}.

(iii) First, notice that there exists γ>0\gamma>0 such that σiγ\sigma_{i}\leq\gamma, 1in\forall 1\leq i\leq n. Indeed,

34σi2\displaystyle\frac{3}{4}\sigma_{i}^{2} Ei[Yi2𝟙|Yi|>σi/2]Ei[ψ(|Yi|)]/g(σi2)θσi2/g(σi2)\displaystyle\leq E_{i}[Y_{i}^{2}\mathbbm{1}_{|Y_{i}|>\sigma_{i}/2}]\leq E_{i}[\psi(|Y_{i}|)]/g(\tfrac{\sigma_{i}}{2})\leq\theta\sigma_{i}^{2}/g(\tfrac{\sigma_{i}}{2})

where g(x):=ψ(x)/x2g(x):=\psi(x)/x^{2} is an increasing function on (0,)(0,\infty); the first inequality holds since Ei[Yi2𝟙|Yi|σi/2]σi2/4E_{i}[Y_{i}^{2}\mathbbm{1}_{|Y_{i}|\leq\sigma_{i}/2}]\leq\sigma_{i}^{2}/4. Thus

σi2g1(4θ3)=:γ.\sigma_{i}\leq 2g^{-1}(\tfrac{4\theta}{3})=:\gamma.

Next, by Proposition 13 and Ei[ψ(|Yi|)]θσi2E_{i}[\psi(|Y_{i}|)]\leq\theta\sigma_{i}^{2}, we get

𝒵r(Xn,𝒩)βr+θsnrEi=1nλirσi2ψ(λi).\mathcal{Z}_{r}(X_{n},\mathcal{N})\lesssim\beta^{r}+\frac{\theta}{s_{n}^{r}}E\sum_{i=1}^{n}\frac{\lambda_{i}^{r}\sigma_{i}^{2}}{\psi(\lambda_{i})}.

Define a sequence of stopping times {τt}0tsn2\{\tau_{t}\}_{0\leq t\leq s_{n}^{2}} by

τ(t)=τt:=inf{m1:i=1mσi2t}.\tau(t)=\tau_{t}:=\inf\{m\geq 1:\sum_{i=1}^{m}\sigma_{i}^{2}\geq t\}. (35)

Clearly, τt=m\tau_{t}=m if and only if t(i=1m1σi2,i=1mσi2]t\in(\sum_{i=1}^{m-1}\sigma_{i}^{2},\sum_{i=1}^{m}\sigma_{i}^{2}]. Moreover,
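The map tτtt\mapsto\tau_{t} is the generalized inverse of the cumulative conditional variance, which converts sums over ii into integrals in tt. A small numerical sketch (with arbitrary illustrative variances and an illustrative β\beta) checks this characterization of τt\tau_{t} and the resulting sum-to-integral identity:

```python
import math

# Hypothetical conditional variances; s_n^2 is their total (V_n^2 = 1 case).
sig2 = [0.5, 1.2, 0.3, 2.0]
cum = [sum(sig2[:m]) for m in range(len(sig2) + 1)]   # cum[m] = sum_{i<=m} sigma_i^2
sn2 = cum[-1]

def tau(t):
    """tau_t = inf{m >= 1 : sum_{i<=m} sigma_i^2 >= t}; cf. (35)."""
    return next(m for m in range(1, len(sig2) + 1) if cum[m] >= t)

# tau_t = m exactly when t lies in (cum[m-1], cum[m]].
for m in range(1, len(sig2) + 1):
    for t in (cum[m - 1] + 1e-9, cum[m]):
        assert tau(t) == m

# Since t -> lambda_{tau_t} is constant on each interval (cum[m-1], cum[m]],
# sum_i sigma_i^2 f(lambda_i) equals the integral of f(lambda_{tau_t}) over [0, s_n^2].
beta2 = 0.25                                           # illustrative beta^2
lam = lambda m: math.sqrt(sn2 * (1 + beta2) - cum[m])  # lambda_m
f = lambda x: x / (1 + x ** 3)                         # any bounded test function
lhs = sum(sig2[m - 1] * f(lam(m)) for m in range(1, len(sig2) + 1))
N = 50_000                                             # midpoint Riemann sum
rhs = sum(f(lam(tau((k + 0.5) / N * sn2))) for k in range(N)) * sn2 / N
assert abs(lhs - rhs) < 1e-3
```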

λτt2=sn2(1+β2)i=1τtσi2sn2(1+β2)tστt2sn2(1+β2)tγ2.\lambda_{\tau_{t}}^{2}=s_{n}^{2}(1+\beta^{2})-\sum_{i=1}^{\tau_{t}}\sigma_{i}^{2}\geq s_{n}^{2}(1+\beta^{2})-t-\sigma_{\tau_{t}}^{2}\geq s_{n}^{2}(1+\beta^{2})-t-\gamma^{2}.

Thus we get, for any β[γsn,1]\beta\in[\frac{\gamma}{s_{n}},1], r{1,2}r\in\{1,2\},

i=1nλirσi2ψ(λi)=0sn2λτtrψ(λτt)dtsn2β2γ2(1+β2)sn2tr/2ψ(t1/2)dt2sn2β2γ22sntr+1ψ(t)dt,\displaystyle\sum_{i=1}^{n}\frac{\lambda_{i}^{r}\sigma_{i}^{2}}{\psi(\lambda_{i})}=\int_{0}^{s_{n}^{2}}\frac{\lambda_{\tau_{t}}^{r}}{\psi(\lambda_{\tau_{t}})}\mathrm{d}t\leq\int_{s_{n}^{2}\beta^{2}-\gamma^{2}}^{(1+\beta^{2})s_{n}^{2}}\frac{t^{r/2}}{\psi(t^{1/2})}\mathrm{d}t\leq 2\int_{\sqrt{s_{n}^{2}\beta^{2}-\gamma^{2}}}^{\sqrt{2}s_{n}}\frac{t^{r+1}}{\psi(t)}\mathrm{d}t,

where in the first inequality we used the fact that xxr/ψ(x)x\mapsto x^{r}/\psi(x) is decreasing (recall ψx2\psi\succcurlyeq x^{2}). When ψ=x3\psi=x^{3}, taking β=2γ/sn\beta=2\gamma/s_{n}, the desired 𝒲1\mathcal{W}_{1} bound follows. For the case ψxp\psi\preccurlyeq x^{p}, 2<p<32<p<3, the bound of this integral can be handled the same way as in (ii). Then the 𝒲1\mathcal{W}_{1} bounds in (iii) are proved.

It remains to prove the 𝒵2\mathcal{Z}_{2} bound. By the inequality above, for any β[γsn,1]\beta\in[\frac{\gamma}{s_{n}},1],

𝒵2(Xn,𝒩)β2+θsn202snt3ψ(t)dtβ2+θsn2ψ(sn),\displaystyle\mathcal{Z}_{2}(X_{n},\mathcal{N})\lesssim\beta^{2}+\frac{\theta}{s_{n}^{2}}\int_{0}^{\sqrt{2}s_{n}}\frac{t^{3}}{\psi(t)}\mathrm{d}t\lesssim\beta^{2}+\frac{\theta s_{n}^{2}}{\psi(s_{n})},

where in the last inequality we used the fact that xx3/ψ(x)x\mapsto x^{3}/\psi(x) is increasing. Taking β2=sn2ψ(γ)ψ(sn)γ2\beta^{2}=\frac{s_{n}^{2}\psi(\gamma)}{\psi(s_{n})\gamma^{2}}, the 𝒵2\mathcal{Z}_{2} bound in (iii) follows. ∎
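As a numerical illustration of Corollary 15(iii), consider i.i.d. Rademacher increments, for which Vn2=1V_{n}^{2}=1, sn2=ns_{n}^{2}=n and θ=1\theta=1; the corollary then predicts 𝒲1(Xn,𝒩)\mathcal{W}_{1}(X_{n},\mathcal{N}) of order log(e+n)/n\log(e+\sqrt{n})/\sqrt{n}. The sketch below estimates 𝒲1\mathcal{W}_{1} by the quantile coupling between an empirical sample and 𝒩(0,1)\mathcal{N}(0,1); sample sizes and the seed are arbitrary:

```python
import random
from statistics import NormalDist

def w1_to_normal(sample):
    """Empirical Wasserstein-1 distance to N(0,1) via the quantile coupling."""
    xs = sorted(sample)
    nd = NormalDist()
    m = len(xs)
    return sum(abs(x - nd.inv_cdf((k + 0.5) / m)) for k, x in enumerate(xs)) / m

def rademacher_walk(n, rng):
    """X_n = (Y_1 + ... + Y_n)/sqrt(n) with i.i.d. Y_i = +-1 (so s_n^2 = n)."""
    return sum(rng.choice((-1.0, 1.0)) for _ in range(n)) / n ** 0.5

rng = random.Random(1)
d25 = w1_to_normal([rademacher_walk(25, rng) for _ in range(20_000)])
d400 = w1_to_normal([rademacher_walk(400, rng) for _ in range(20_000)])
assert d400 < d25          # the distance shrinks as s_n = sqrt(n) grows
print(d25, d400)
```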

4 Proofs of the main 𝒲r\mathcal{W}_{r} bounds

This section is devoted to the proofs of Theorems 5, 7, and 8.

Note that the Taylor expansion results (Proposition 13), which are in terms of the second and third conditional moments, are tailored exactly to martingales (with Vn2=1V_{n}^{2}=1) whose integrability lies between 2\mathscr{L}^{2} and 3\mathscr{L}^{3}. Not surprisingly, then, the 𝒲1,𝒲2\mathcal{W}_{1},\mathcal{W}_{2} bounds of Corollary 15 for p\mathscr{L}^{p} martingales, p(2,3]{}p\in(2,3]\cup\{\infty\}, follow quite easily from Proposition 13. However, to obtain Wasserstein bounds for martingales with more general integrability, we need new insights.

In Subsection 4.1, we will modify the mds into a bounded sequence with deterministic total conditional variance Vn2V_{n}^{2}. To this end, we first truncate the martingale into a bounded sequence with Vn21V_{n}^{2}\leq 1, and then lengthen it to have Vn2=1V_{n}^{2}=1. Such tricks of elongation and truncation of martingales were already used by Bolthausen [5] (the idea goes back to Dvoretzky [13]) and Haeusler [25] in the study of the Kolmogorov distance of the martingale CLT. The main result of Subsection 4.1 is a control of the error due to this modification in terms of LψL_{\psi} and (Vn21)(V_{n}^{2}-1). See Proposition 17. As an application, we prove the 𝒲3\mathcal{W}_{3} bound in Theorem 8.

In Subsections 4.2 and 4.3, we prove the 𝒲1,𝒲2\mathcal{W}_{1},\mathcal{W}_{2} estimates in Theorems 5 and 7 by bounding the corresponding metrics for the modified martingale. A crucial step of our method is to use Young’s inequality inside the conditional expectations to “decouple” the Lyapunov coefficient LψL_{\psi} and the conditional variances σi2\sigma_{i}^{2} from the summation within Proposition 13. This will turn the 𝒲r\mathcal{W}_{r} bounds into an optimization problem over three parameters: the truncation parameter (α\alpha), the smoothing parameter (β\beta), and an Orlicz parameter (ρ\rho). To this end, tools from the theory of N-functions will be employed to compare N-functions to polynomials.
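The decoupling step can be stated explicitly: it is Young’s inequality (15) for an N-function ψ\psi and its conjugate ψ\psi_{*},

```latex
ab \;\le\; \psi(a) + \psi_{*}(b), \qquad a, b \ge 0,
```

applied inside the conditional expectation with a=|Y^i|/(2θψ)a=|\hat{Y}_{i}|/(2\theta_{\psi}) and b=Y^i2/(ρλik)b=\hat{Y}_{i}^{2}/(\rho\lambda_{i}^{k}), as in the proof of Proposition 19; the Orlicz parameter ρ\rho is then optimized at the end.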

4.1 A modified martingale, and Proof of Theorem 8 (𝒲3\mathcal{W}_{3} bounds)

The goal of this subsection is to modify the original mds {Yi}i=1n\{Y_{i}\}_{i=1}^{n} to a new mds Y^={Y^i}i=1n+1\hat{Y}=\{\hat{Y}_{i}\}_{i=1}^{n+1} which is uniformly bounded except for its last term. Throughout this subsection, we let α(0,]\alpha\in(0,\infty] be any fixed constant.

Define a martingale difference sequence Z=Z(α)={Zi}i=1nZ=Z^{(\alpha)}=\{Z_{i}\}_{i=1}^{n} as

Zi=Zi(α):=Yi𝟙|Yi|α/2E[Yi𝟙|Yi|α/2|i1],1in.Z_{i}=Z_{i}^{(\alpha)}:=Y_{i}\mathbbm{1}_{|Y_{i}|\leq\alpha/2}-E[Y_{i}\mathbbm{1}_{|Y_{i}|\leq\alpha/2}|\mathcal{F}_{i-1}],\quad 1\leq i\leq n. (36)

Note that P(|Zi|α)=1P(|Z_{i}|\leq\alpha)=1 for all 1in1\leq i\leq n and, with σi2(Z):=E[Zi2|i1]\sigma_{i}^{2}(Z):=E[Z_{i}^{2}|\mathcal{F}_{i-1}],

i=1nσi2(Z)\displaystyle\sum_{i=1}^{n}\sigma_{i}^{2}(Z) i=1nE[Yi2𝟙|Yi|α/2]i=1nσi2(Y).\displaystyle\leq\sum_{i=1}^{n}E[Y_{i}^{2}\mathbbm{1}_{|Y_{i}|\leq\alpha/2}]\leq\sum_{i=1}^{n}\sigma_{i}^{2}(Y). (37)

Define a stopping time (with the convention inf=\inf\emptyset=\infty)

T=inf{1mn:i=1mσi2(Z)>sn2}(n+1).T=\inf\big{\{}1\leq m\leq n:\sum_{i=1}^{m}\sigma_{i}^{2}(Z)>s_{n}^{2}\big{\}}\wedge(n+1). (38)

Since i=1mσi2(Z)\sum_{i=1}^{m}\sigma_{i}^{2}(Z) is m1\mathcal{F}_{m-1}-measurable, (T1)(T-1) is also a stopping time.

Definition 16.

Let α(0,]\alpha\in(0,\infty] and let ξ𝒩(0,1)\xi\sim\mathcal{N}(0,1) be a standard normal which is independent of {Yi}i=1n\{Y_{i}\}_{i=1}^{n}. Define the modified martingale difference sequence Y^=Y^(α)={Y^i}i=1\hat{Y}=\hat{Y}^{(\alpha)}=\{\hat{Y}_{i}\}_{i=1}^{\infty} as follows: i1\forall i\geq 1,

Y^i=Y^i(α):=Zi𝟙j=1iσj2(Z)sn2+ξ(sn2j=1T1σj2(Z))1/2𝟙i=n+1.\hat{Y}_{i}=\hat{Y}_{i}^{(\alpha)}:=Z_{i}\mathbbm{1}_{\sum_{j=1}^{i}\sigma_{j}^{2}(Z)\leq s_{n}^{2}}+\xi\left(s_{n}^{2}-\sum_{j=1}^{T-1}\sigma_{j}^{2}(Z)\right)^{1/2}\mathbbm{1}_{i=n+1}. (39)

Enlarging the σ\sigma-fields {i}\{\mathcal{F}_{i}\} if necessary, {Y^m}m0\{\hat{Y}_{m}\}_{m\geq 0} is still a mds. Let

X^m=1sni=1mY^i,m1, with X^0=0.\hat{X}_{m}=\frac{1}{s_{n}}\sum_{i=1}^{m}\hat{Y}_{i},\quad m\geq 1,\text{ with }\hat{X}_{0}=0.

Note that Y^j=0\hat{Y}_{j}=0 for all j[T,n](n+1,)j\in[T,n]\cup(n+1,\infty) and recall that Tn+1T\leq n+1. We write, for m1m\geq 1, σm2(Y^):=E[Y^m2|m1]\sigma_{m}^{2}(\hat{Y}):=E[\hat{Y}_{m}^{2}|\mathcal{F}_{m-1}].

The main feature of the mds Y^\hat{Y} is that |Y^i|α|\hat{Y}_{i}|\leq\alpha a.s. for 1in1\leq i\leq n, and

i=1n+1σi2(Y^)=sn2 almost surely.\sum_{i=1}^{n+1}\sigma_{i}^{2}(\hat{Y})=s_{n}^{2}\quad\text{ almost surely.} (40)
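A minimal sketch of the construction (36)–(40), assuming a hypothetical mds with independent increments, so that every conditional variance is deterministic and (40) can be checked exactly:

```python
# Hedged sketch of the truncation-elongation device (36)-(40) for a
# hypothetical mds with *independent* increments Y_i taking values
# +-3 w.p. 0.05 each and +-1 w.p. 0.45 each, truncated at alpha/2 = 2.
vals, probs = (-3.0, -1.0, 1.0, 3.0), (0.05, 0.45, 0.45, 0.05)
n, alpha = 10, 4.0

ev = lambda f: sum(p * f(v) for v, p in zip(vals, probs))
assert abs(ev(lambda v: v)) < 1e-12          # centered: a valid mds increment

var_Y = ev(lambda v: v * v)                   # sigma_i^2(Y) = 1.8
sn2 = n * var_Y                               # s_n^2

# (36): Z_i = Y_i 1_{|Y_i| <= alpha/2} - E[...]; the mean term vanishes by symmetry.
mean_trunc = ev(lambda v: v if abs(v) <= alpha / 2 else 0.0)
var_Z = ev(lambda v: (v * (abs(v) <= alpha / 2) - mean_trunc) ** 2)
assert var_Z <= var_Y                         # cf. (37)

# (38): here sum_{i<=m} sigma_i^2(Z) = 0.9 m never exceeds s_n^2, so T = n+1,
# and (39) appends a Gaussian term with variance s_n^2 - 0.9 n.
var_last = sn2 - n * var_Z
total = n * var_Z + var_last
assert abs(total - sn2) < 1e-12               # (40): total conditional variance = s_n^2
print(var_Y, var_Z, var_last, sn2)
```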
Proposition 17.

For α(0,]\alpha\in(0,\infty] and a mds {Yi}i=1n\{Y_{i}\}_{i=1}^{n}, let {Y^i}\{\hat{Y}_{i}\} and Z={Zi}i=1nZ=\{Z_{i}\}_{i=1}^{n} be as in Definition 16. Let r1r\geq 1 and let ψ[0,)[0,)\psi\in[0,\infty)^{[0,\infty)} be an N-function with ψxr2\psi\succcurlyeq x^{r\vee 2}. Recall the definition of Lψ=Lψ(Y)L_{\psi}=L_{\psi}(Y) in (20). Set θψ:=Yψ\theta_{\psi}:=\lVert Y\rVert_{\psi}.

  1. (a)

    For α<\alpha<\infty, r1r\geq 1,

    |𝒲r(Xn,𝒩)𝒲r(X^n+1,𝒩)|rLψ+α+n1/2αψ(αsn/2θψ)1/(r2)+Vn21r/21/2.\displaystyle\lvert\mathcal{W}_{r}(X_{n},\mathcal{N})-\mathcal{W}_{r}(\hat{X}_{n+1},\mathcal{N})\rvert\lesssim_{r}L_{\psi}+\alpha+\frac{n^{1/2}\alpha}{\psi(\alpha s_{n}/2\theta_{\psi})^{1/(r\vee 2)}}+\lVert V_{n}^{2}-1\rVert_{r/2}^{1/2}.
  2. (b)

    If ψ\psi satisfies that xψ(x)x\mapsto\psi(\sqrt{x}) is convex, then for r1r\geq 1,

    |𝒲r(Xn,𝒩)𝒲r(X^n+1,𝒩)|rLψ+n1/2αψ(αsn/2θψ)1/(r2)𝟙α<+Vn21r/21/2.\displaystyle\lvert\mathcal{W}_{r}(X_{n},\mathcal{N})-\mathcal{W}_{r}(\hat{X}_{n+1},\mathcal{N})\rvert\lesssim_{r}L_{\psi}+\frac{n^{1/2}\alpha}{\psi(\alpha s_{n}/2\theta_{\psi})^{1/(r\vee 2)}}\mathbbm{1}_{\alpha<\infty}+\lVert V_{n}^{2}-1\rVert_{r/2}^{1/2}.

Estimate (b) is slightly better than (a) but with a slightly stronger condition. Statement (b) is more useful for the case α=\alpha=\infty.

Lemma 18.

Let Z={Zi}i=1nZ=\{Z_{i}\}_{i=1}^{n} be as in (36). Recall the definition of the Orlicz norm ψ\lVert\cdot\rVert_{\psi} for a sequence in Definition 3. Then, for any N-function ψ\psi,

Zψ2Yψ.\lVert Z\rVert_{\psi}\leq 2\lVert Y\rVert_{\psi}.
Proof.

Without loss of generality, assume Yψ=1\lVert Y\rVert_{\psi}=1. It suffices to show that

Ei=1nψ(|Zi|/2)n.E\sum_{i=1}^{n}\psi(|Z_{i}|/2)\leq n. (41)

Indeed, recall Ei[]E_{i}[\cdot] in (27). By the definition of ZiZ_{i} in (36), for 1in1\leq i\leq n,

Ei[ψ(|Zi|/2)]\displaystyle E_{i}[\psi(|Z_{i}|/2)] Ei[ψ((|Yi|𝟙|Yi|α/2+Ei[|Yi|𝟙|Yi|α/2])/2)]\displaystyle\leq E_{i}\left[\psi\left((|Y_{i}|\mathbbm{1}_{|Y_{i}|\leq\alpha/2}+E_{i}[|Y_{i}|\mathbbm{1}_{|Y_{i}|\leq\alpha/2}])/2\right)\right]
12Ei[ψ(|Yi|𝟙|Yi|α/2)+ψ(Ei[|Yi|𝟙|Yi|α/2])]\displaystyle\leq\frac{1}{2}E_{i}\left[\psi(|Y_{i}|\mathbbm{1}_{|Y_{i}|\leq\alpha/2})+\psi(E_{i}[|Y_{i}|\mathbbm{1}_{|Y_{i}|\leq\alpha/2}])\right]
Ei[ψ(|Yi|𝟙|Yi|α/2)],\displaystyle\leq E_{i}\left[\psi(|Y_{i}|\mathbbm{1}_{|Y_{i}|\leq\alpha/2})\right],

where we used Jensen’s inequality in both the second and the third inequalities. Summing both sides over all 1in1\leq i\leq n, inequality (41) follows. ∎
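The per-index bound in this proof can be probed numerically; a minimal Monte Carlo sketch, assuming the hypothetical choice ψ(x)=x3\psi(x)=x^{3} and a Gaussian sample:

```python
import random

# Monte Carlo check of the per-term bound in the proof of Lemma 18:
# E[psi(|Z|/2)] <= E[psi(|Y|)] with Z = Y 1_{|Y|<=a} - E[Y 1_{|Y|<=a}],
# for a convex N-function (hypothetically psi(x) = x**3 here).
random.seed(2)
psi = lambda x: x ** 3
a = 0.8                                        # plays the role of alpha/2

ys = [random.gauss(0.0, 1.0) for _ in range(200_000)]
mean_trunc = sum(y for y in ys if abs(y) <= a) / len(ys)
zs = [(y if abs(y) <= a else 0.0) - mean_trunc for y in ys]
lhs = sum(psi(abs(z) / 2) for z in zs) / len(zs)
rhs = sum(psi(abs(y)) for y in ys) / len(ys)
assert lhs <= rhs
print(lhs, "<=", rhs)
```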

Proof of Proposition 17.

Throughout, notice that r1r\geq 1 and ψxr2\psi\succcurlyeq x^{r\vee 2}.

Without loss of generality, assume sn=1s_{n}=1. Set

Kr:=Vn21r/21/2.\displaystyle K_{r}:=\lVert V_{n}^{2}-1\rVert_{r/2}^{1/2}. (42)

To prove Proposition 17, it suffices to show that

XnX^n+1rrLψ+maxinσi(Z)r+Kr+n1/2αψ(α/2θψ)1/(r2)𝟙α<.\lVert X_{n}-\hat{X}_{n+1}\rVert_{r}\lesssim_{r}L_{\psi}+\lVert\max_{i}^{n}\sigma_{i}(Z)\rVert_{r}+K_{r}+\frac{n^{1/2}\alpha}{\psi(\alpha/2\theta_{\psi})^{1/(r\vee 2)}}\mathbbm{1}_{\alpha<\infty}. (43)

Indeed, trivially we have maxinσi(Z)rα\lVert\max_{i}^{n}\sigma_{i}(Z)\rVert_{r}\leq\alpha. If ψ(x)\psi(\sqrt{x}) is convex, we claim that

maxinσi(Z)rLψ.\lVert\max_{i}^{n}\sigma_{i}(Z)\rVert_{r}\lesssim L_{\psi}. (44)

To this end, we apply Lemma A.2 to get

maxinσi(Z)r21/r{σi(Z)}i=1nψψ1(n).\lVert\max_{i}^{n}\sigma_{i}(Z)\rVert_{r}\leq 2^{1/r}\lVert\{\sigma_{i}(Z)\}_{i=1}^{n}\rVert_{\psi}\psi^{-1}(n).

Notice that, when ψ(x)\psi(\sqrt{x}) is convex, using Jensen’s inequality as in (33) we have {σi(Z)}i=1nψZψ\lVert\{\sigma_{i}(Z)\}_{i=1}^{n}\rVert_{\psi}\leq\lVert Z\rVert_{\psi}, which is bounded by 2Yψ2\lVert Y\rVert_{\psi} by Lemma 18. Claim (44) follows.

The rest of the proof is devoted to obtaining (43).

Recall TT in (38). Since {Y^i}\{\hat{Y}_{i}\} and {Zi}\{Z_{i}\} coincide up to i=T1i=T-1, we have

XnX^n+1\displaystyle X_{n}-\hat{X}_{n+1} =i=TnZi+i=1n(YiZi)Y^n+1.\displaystyle=\sum_{i=T}^{n}Z_{i}+\sum_{i=1}^{n}(Y_{i}-Z_{i})-\hat{Y}_{n+1}. (45)

Step 1. We will bound the first term i=TnZi\sum_{i=T}^{n}Z_{i} in (45) by

i=TnZirrKr+Lψ,r1.\lVert\sum_{i=T}^{n}Z_{i}\rVert_{r}\lesssim_{r}K_{r}+L_{\psi},\quad r\geq 1. (46)

Since T1T-1 is a stopping time, {Zi}i=Tn\{Z_{i}\}_{i=T}^{n} forms a martingale difference sequence. Hence, by an inequality of Burkholder (Theorem A.3) and Lemma A.2, for r1r\geq 1,

i=TnZiri=T+1nZir+maxi=1n|Zi|rri=T+1nσi2(Z)r/21/2+Lψ.\displaystyle\lVert\sum_{i=T}^{n}Z_{i}\rVert_{r}\leq\lVert\sum_{i=T+1}^{n}Z_{i}\rVert_{r}+\lVert\max_{i=1}^{n}|Z_{i}|\rVert_{r}\lesssim_{r}\lVert\sum_{i=T+1}^{n}\sigma_{i}^{2}(Z)\rVert_{r/2}^{1/2}+L_{\psi}. (47)

Further, by the definition of the stopping time TT,

i=T+1nσi2(Z)=(i=1nσi2(Z)i=1Tσi2(Z))0(i=1nσi2(Z)1)0.\displaystyle\sum_{i=T+1}^{n}\sigma_{i}^{2}(Z)=\left(\sum_{i=1}^{n}\sigma_{i}^{2}(Z)-\sum_{i=1}^{T}\sigma_{i}^{2}(Z)\right)\vee 0\leq\left(\sum_{i=1}^{n}\sigma_{i}^{2}(Z)-1\right)\vee 0.

This inequality, together with (47), yields (46).

Step 2. Consider the second term i=1n(YiZi)\sum_{i=1}^{n}(Y_{i}-Z_{i}) in (45). We will show that

i=1n(YiZi)rrαn1/2ψ(α/2θψ)1/(r2)𝟙α<.\lVert\sum_{i=1}^{n}(Y_{i}-Z_{i})\rVert_{r}\lesssim_{r}\frac{\alpha n^{1/2}}{\psi(\alpha/2\theta_{\psi})^{1/(r\vee 2)}}\mathbbm{1}_{\alpha<\infty}. (48)

Indeed, this inequality is trivial for α=\alpha=\infty. When α<\alpha<\infty, note that {YiZi}i=1n\{Y_{i}-Z_{i}\}_{i=1}^{n} is a martingale difference sequence. For r1r\geq 1, by Burkholder’s inequality,

i=1n(YiZi)rri=1n(YiZi)2r/21/2\displaystyle\lVert\sum_{i=1}^{n}(Y_{i}-Z_{i})\rVert_{r}\lesssim_{r}\lVert\sum_{i=1}^{n}(Y_{i}-Z_{i})^{2}\rVert_{r/2}^{1/2}

Further, by Hölder’s inequality, Jensen’s inequality, and the fact YiZi=Yi𝟙|Yi|>α/2E[Yi𝟙|Yi|>α/2|i1]Y_{i}-Z_{i}=Y_{i}\mathbbm{1}_{|Y_{i}|>\alpha/2}-E[Y_{i}\mathbbm{1}_{|Y_{i}|>\alpha/2}|\mathcal{F}_{i-1}], we get, for r2r\geq 2,

i=1n(YiZi)2r/2n1ni=1n|YiZi|r12/r4n1ni=1n|Yi|r𝟙|Yi|>α/212/r.\displaystyle\lVert\sum_{i=1}^{n}(Y_{i}-Z_{i})^{2}\rVert_{r/2}\leq n\bigg{\lVert}\frac{1}{n}\sum_{i=1}^{n}|Y_{i}-Z_{i}|^{r}\bigg{\rVert}_{1}^{2/r}\leq 4n\bigg{\lVert}\frac{1}{n}\sum_{i=1}^{n}|Y_{i}|^{r}\mathbbm{1}_{|Y_{i}|>\alpha/2}\bigg{\rVert}_{1}^{2/r}.

Since |Yi|r𝟙|Yi|>α/2ψ(|Yi|/θψ)(α/2)r/ψ(α/2θψ)|Y_{i}|^{r}\mathbbm{1}_{|Y_{i}|>\alpha/2}\leq\psi(|Y_{i}|/\theta_{\psi})(\alpha/2)^{r}/\psi(\alpha/2\theta_{\psi}), we have

1ni=1n|Yi|r𝟙|Yi|>α/21(α/2)rnψ(α/2θψ)i=1nE[ψ(|Yi|/θψ)](α/2)rψ(α/2θψ)\bigg{\lVert}\frac{1}{n}\sum_{i=1}^{n}|Y_{i}|^{r}\mathbbm{1}_{|Y_{i}|>\alpha/2}\bigg{\rVert}_{1}\leq\frac{(\alpha/2)^{r}}{n\psi(\alpha/2\theta_{\psi})}\sum_{i=1}^{n}E[\psi(|Y_{i}|/\theta_{\psi})]\leq\frac{(\alpha/2)^{r}}{\psi(\alpha/2\theta_{\psi})} (49)

where the last inequality used the definition of Yψ\lVert Y\rVert_{\psi}. Hence

i=1n(YiZi)2r/2α2nψ(α/2θψ)2/(r2).\lVert\sum_{i=1}^{n}(Y_{i}-Z_{i})^{2}\rVert_{r/2}\leq\frac{\alpha^{2}n}{\psi(\alpha/2\theta_{\psi})^{2/(r\vee 2)}}.

Inequality (48) is proved.

Step 3. Consider the last term in (45). By definition (39),

Y^n+1rr(1i=1T1σi2(Z))1/2r.\lVert\hat{Y}_{n+1}\rVert_{r}\lesssim_{r}\bigg{\lVert}\big{(}1-\sum_{i=1}^{T-1}\sigma_{i}^{2}(Z)\big{)}^{1/2}\bigg{\rVert}_{r}.

When TnT\leq n, we have

01i=1T1σi2(Z)1i=1Tσi2(Z)+maxi=1nσi2(Z)<maxi=1nσi2(Z).0\leq 1-\sum_{i=1}^{T-1}\sigma_{i}^{2}(Z)\leq 1-\sum_{i=1}^{T}\sigma_{i}^{2}(Z)+\max_{i=1}^{n}\sigma_{i}^{2}(Z)<\max_{i=1}^{n}\sigma_{i}^{2}(Z).

If T=n+1T=n+1, then

|1i=1T1σi2(Z)||1i=1nσi2(Y)|+|i=1nσi2(Y)i=1nσi2(Z)|.\Bigr{\lvert}1-\sum_{i=1}^{T-1}\sigma_{i}^{2}(Z)\Bigl{\rvert}\leq\Bigr{\lvert}1-\sum_{i=1}^{n}\sigma_{i}^{2}(Y)\Bigl{\rvert}+\Bigr{\lvert}\sum_{i=1}^{n}\sigma_{i}^{2}(Y)-\sum_{i=1}^{n}\sigma_{i}^{2}(Z)\Bigl{\rvert}.

Hence, we have

Y^n+1rrmaxinσi(Z)r+Kr+i=1nσi2(Y)σi2(Z)r/21/2.\lVert\hat{Y}_{n+1}\rVert_{r}\lesssim_{r}\lVert\max_{i}^{n}\sigma_{i}(Z)\rVert_{r}+K_{r}+\bigg{\lVert}\sum_{i=1}^{n}\sigma_{i}^{2}(Y)-\sigma_{i}^{2}(Z)\bigg{\rVert}_{r/2}^{1/2}. (50)

Step 4. Let us bound the third term in (50). We will show that

i=1nσi2(Y)σi2(Z)r/2nα2ψ(α/2θψ)2/(r2)𝟙α<.\bigg{\lVert}\sum_{i=1}^{n}\sigma_{i}^{2}(Y)-\sigma_{i}^{2}(Z)\bigg{\rVert}_{r/2}\lesssim\frac{n\alpha^{2}}{\psi(\alpha/2\theta_{\psi})^{2/(r\vee 2)}}\mathbbm{1}_{\alpha<\infty}. (51)

This is trivial when α=\alpha=\infty. For α<\alpha<\infty, recall the notation EiE_{i} in (27). Then

σi2(Y)σi2(Z)=Ei[Yi2𝟙|Yi|>α/2]+Ei[Yi𝟙|Yi|>α/2]21in.\displaystyle\sigma_{i}^{2}(Y)-\sigma_{i}^{2}(Z)=E_{i}[Y_{i}^{2}\mathbbm{1}_{|Y_{i}|>\alpha/2}]+E_{i}[Y_{i}\mathbbm{1}_{|Y_{i}|>\alpha/2}]^{2}\quad\forall 1\leq i\leq n.

Hence, by Hölder’s inequality and Jensen’s inequality, for r2r\geq 2,

i=1nσi2(Y)σi2(Z)r/2\displaystyle\bigg{\lVert}\sum_{i=1}^{n}\sigma_{i}^{2}(Y)-\sigma_{i}^{2}(Z)\bigg{\rVert}_{r/2} 2i=1nEi[Yi2𝟙|Yi|>α/2]r/2\displaystyle\leq 2\bigg{\lVert}\sum_{i=1}^{n}E_{i}[Y_{i}^{2}\mathbbm{1}_{|Y_{i}|>\alpha/2}]\bigg{\rVert}_{r/2}
2n1ni=1n|Yi|r𝟙|Yi|>α/212/r(49)nα2ψ(α/2θψ)2/r.\displaystyle\leq 2n\bigg{\lVert}\frac{1}{n}\sum_{i=1}^{n}|Y_{i}|^{r}\mathbbm{1}_{|Y_{i}|>\alpha/2}\bigg{\rVert}_{1}^{2/r}\stackrel{{\scriptstyle\eqref{eq:noname}}}{{\lesssim}}\frac{n\alpha^{2}}{\psi(\alpha/2\theta_{\psi})^{2/r}}.

Display (51) is proved.

Step 5. By (50), (51), we have arrived at

Y^n+1rmaxinσi(Z)r+Kr+αn1/2ψ(α/2θψ)1/(r2)𝟙α<.\lVert\hat{Y}_{n+1}\rVert_{r}\lesssim\lVert\max_{i}^{n}\sigma_{i}(Z)\rVert_{r}+K_{r}+\frac{\alpha n^{1/2}}{\psi(\alpha/2\theta_{\psi})^{1/(r\vee 2)}}\mathbbm{1}_{\alpha<\infty}.

This inequality, together with (45), (46), (48), yields (43). ∎

Proof of Theorem 8.

Consider α=\alpha=\infty and recall the definition of Y^=Y^()\hat{Y}=\hat{Y}^{(\infty)} in Definition 16. Note that Zi()=YiZ_{i}^{(\infty)}=Y_{i} for 1in1\leq i\leq n.

Since Y^n+1\hat{Y}_{n+1} is normally distributed given n\mathcal{F}_{n}, by (29) in Proposition 13 and Remark 14, we have

𝒵3(X^n+1,𝒩)1sn3i=1nE[|Y^i|3]1sn3i=1nE[|Zi|3]=1sn3i=1nE[|Yi|3].\mathcal{Z}_{3}(\hat{X}_{n+1},\mathcal{N})\lesssim\frac{1}{s_{n}^{3}}\sum_{i=1}^{n}E[|\hat{Y}_{i}|^{3}]\leq\frac{1}{s_{n}^{3}}\sum_{i=1}^{n}E[|Z_{i}|^{3}]=\frac{1}{s_{n}^{3}}\sum_{i=1}^{n}E[|Y_{i}|^{3}].

By (25), this implies 𝒲3(X^n+1,𝒩)L3\mathcal{W}_{3}(\hat{X}_{n+1},\mathcal{N})\lesssim L_{3}. Further, applying Proposition 17(b) to the case α=\alpha=\infty, we get

𝒲3(Xn,𝒩)𝒲3(X^n+1,𝒩)+L3+Vn213/21/2.\mathcal{W}_{3}(X_{n},\mathcal{N})\lesssim\mathcal{W}_{3}(\hat{X}_{n+1},\mathcal{N})+L_{3}+\lVert V_{n}^{2}-1\rVert_{3/2}^{1/2}.

Theorem 8 is proved. ∎

4.2 Proof of Theorem 5 (𝒲1\mathcal{W}_{1} bounds)

In what follows we let α(0,),ρ>0\alpha\in(0,\infty),\rho>0 be constants to be determined later. Let the mds Y^=Y^(α)\hat{Y}=\hat{Y}^{(\alpha)} be as in Definition 16, and set

β=2α.\beta=\sqrt{2}\alpha. (52)

Throughout this subsection, we simply write

σi2\displaystyle\sigma_{i}^{2} =σi2(Y^)=E[Y^i2|i1],i1\displaystyle=\sigma_{i}^{2}(\hat{Y})=E[\hat{Y}_{i}^{2}|\mathcal{F}_{i-1}],\quad i\geq 1
λm2\displaystyle\lambda_{m}^{2} =i=m+1n+1σi2+β2sn2,0mn+1.\displaystyle=\sum_{i=m+1}^{n+1}\sigma_{i}^{2}+\beta^{2}s_{n}^{2},\quad 0\leq m\leq n+1. (53)

Note that λ02=(1+β2)sn2\lambda_{0}^{2}=(1+\beta^{2})s_{n}^{2}, and λm2=λ02i=1mσi2\lambda_{m}^{2}=\lambda_{0}^{2}-\sum_{i=1}^{m}\sigma_{i}^{2} is m1\mathcal{F}_{m-1}-measurable for m1m\geq 1.

Recall that Y^n+1\hat{Y}_{n+1} is normally distributed given n\mathcal{F}_{n}. By (40), Proposition 13, and Remark 14, for r{1,2}r\in\{1,2\}, α>0,β=2α\alpha>0,\beta=\sqrt{2}\alpha,

𝒲r(X^n+1,𝒩)αr+snrEi=1n|Y^i|3λi3r.\mathcal{W}_{r}(\hat{X}_{n+1},\mathcal{N})\lesssim\alpha^{r}+s_{n}^{-r}E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}^{3-r}}. (54)

The following estimate of the summation within (54) will be crucially employed in the derivation of our 𝒲r\mathcal{W}_{r} bounds, r=1,2r=1,2.

Proposition 19.

Let ψ[0,)[0,)\psi\in[0,\infty)^{[0,\infty)} be an N-function with ψx2\psi\succcurlyeq x^{2}, and set θψ=Yψ\theta_{\psi}=\lVert Y\rVert_{\psi}. Recall α,Y^,λi\alpha,\hat{Y},\lambda_{i} above, and β=2α\beta=\sqrt{2}\alpha. Then, for any B,k>0B,k>0,

Ei=1n|Y^i|3λik𝟙λi2Bθψρn+θψρα2α2(1+α2)Bψ(α2ρtk/2)dt,ρ>0.E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}^{k}}\mathbbm{1}_{\lambda_{i}^{2}\leq B}\lesssim\theta_{\psi}\rho n+\theta_{\psi}\rho\alpha^{-2}\int_{\alpha^{2}}^{(1+\alpha^{2})\wedge B}\psi_{*}(\tfrac{\alpha^{2}}{\rho t^{k/2}})\mathrm{d}t,\quad\forall\rho>0.
Proof.

Without loss of generality, assume sn=1s_{n}=1.

Recall that λi\lambda_{i} is i1\mathcal{F}_{i-1}-measurable. Recall that Ei[]=E[|i1]E_{i}[\cdot]=E[\cdot|\mathcal{F}_{i-1}] as in (27).

By Young’s inequality (15), for any ρ>0\rho>0,

Ei=1nEi[|Y^i|3]2θψρλik𝟙λi2BEi=1nEi[ψ(|Y^i|2θψ)+ψ(Y^i2ρλik)𝟙λi2B].E\sum_{i=1}^{n}\frac{E_{i}[|\hat{Y}_{i}|^{3}]}{2\theta_{\psi}\rho\lambda_{i}^{k}}\mathbbm{1}_{\lambda_{i}^{2}\leq B}\leq E\sum_{i=1}^{n}E_{i}\big{[}\psi(\tfrac{|\hat{Y}_{i}|}{2\theta_{\psi}})+\psi_{*}(\tfrac{\hat{Y}_{i}^{2}}{\rho\lambda_{i}^{k}})\mathbbm{1}_{\lambda_{i}^{2}\leq B}\big{]}.

Note that ψ\psi_{*} is an N-function and so ψx\psi_{*}\succcurlyeq x. Using the fact that ψx\psi_{*}\succcurlyeq x and that |Y^i|α|\hat{Y}_{i}|\leq\alpha for 1in1\leq i\leq n, we have, almost surely,

ψ(Y^i2ρλik)Yi^2α2ψ(α2ρλik),1in.\psi_{*}(\tfrac{\hat{Y}_{i}^{2}}{\rho\lambda_{i}^{k}})\leq\tfrac{\hat{Y_{i}}^{2}}{\alpha^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho\lambda_{i}^{k}}),\quad\forall 1\leq i\leq n.

Thus we further have (note that Y^i=0\hat{Y}_{i}=0 for TinT\leq i\leq n)

Ei=1nEi[|Y^i|3]2θψρλik𝟙λi2B\displaystyle E\sum_{i=1}^{n}\frac{E_{i}[|\hat{Y}_{i}|^{3}]}{2\theta_{\psi}\rho\lambda_{i}^{k}}\mathbbm{1}_{\lambda_{i}^{2}\leq B} Ei=1nψ(|Zi|2θψ)+Ei=1nEi[Y^i2]α2ψ(α2ρλik)𝟙λi2B\displaystyle\leq E\sum_{i=1}^{n}\psi(\tfrac{|Z_{i}|}{2\theta_{\psi}})+E\sum_{i=1}^{n}\tfrac{E_{i}[\hat{Y}_{i}^{2}]}{\alpha^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho\lambda_{i}^{k}})\mathbbm{1}_{\lambda_{i}^{2}\leq B}
Lemma18n+Ei=1T1σi2α2ψ(α2ρλik)𝟙λi2Bρ>0.\displaystyle\stackrel{{\scriptstyle Lemma~{}\ref{lem:orfnorm-z}}}{{\leq}}n+E\sum_{i=1}^{T-1}\tfrac{\sigma_{i}^{2}}{\alpha^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho\lambda_{i}^{k}})\mathbbm{1}_{\lambda_{i}^{2}\leq B}\quad\forall\rho>0.

Next, we will show the following integral bound:

Ei=1T1σi2ψ(α2ρλik)𝟙λi2Bα2(1+α2)Bψ(α2ρtk/2)dt.\displaystyle E\sum_{i=1}^{T-1}\sigma_{i}^{2}\psi_{*}(\tfrac{\alpha^{2}}{\rho\lambda_{i}^{k}})\mathbbm{1}_{\lambda_{i}^{2}\leq B}\leq\int_{\alpha^{2}}^{(1+\alpha^{2})\wedge B}\psi_{*}(\tfrac{\alpha^{2}}{\rho t^{k/2}})\mathrm{d}t. (55)

Recall the sequence of stopping times {τt}0t1\{\tau_{t}\}_{0\leq t\leq 1} defined in (35). Then τt=m\tau_{t}=m if and only if t(i=1m1σi2,i=1mσi2]t\in(\sum_{i=1}^{m-1}\sigma_{i}^{2},\sum_{i=1}^{m}\sigma_{i}^{2}]. Moreover, recalling that β2=2α2\beta^{2}=2\alpha^{2},

λτt2=1+β2i=1τtσi21+β2tα2=1+α2t.\lambda_{\tau_{t}}^{2}=1+\beta^{2}-\sum_{i=1}^{\tau_{t}}\sigma_{i}^{2}\geq 1+\beta^{2}-t-\alpha^{2}=1+\alpha^{2}-t.

Thus, for any α>0\alpha>0, writing vm2:=i=1mσi2v_{m}^{2}:=\sum_{i=1}^{m}\sigma_{i}^{2},

E[i=1T1σi2ψ(α2ρλik)𝟙λi2B]=Ei=1T1vi12vi2ψ(α2ρλτtk)𝟙λτt2Bdt\displaystyle E\left[\sum_{i=1}^{T-1}\sigma_{i}^{2}\psi_{*}(\tfrac{\alpha^{2}}{\rho\lambda_{i}^{k}})\mathbbm{1}_{\lambda_{i}^{2}\leq B}\right]=E\sum_{i=1}^{T-1}\int_{v_{i-1}^{2}}^{v_{i}^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho\lambda_{\tau_{t}}^{k}})\mathbbm{1}_{\lambda_{\tau_{t}}^{2}\leq B}\mathrm{d}t
E01ψ(α2ρ(1+α2t)k/2)𝟙1+α2tBdt\displaystyle\leq E\int_{0}^{1}\psi_{*}(\tfrac{\alpha^{2}}{\rho(1+\alpha^{2}-t)^{k/2}})\mathbbm{1}_{1+\alpha^{2}-t\leq B}\mathrm{d}t
α2(1+α2)Bψ(α2ρtk/2)dt.\displaystyle\leq\int_{\alpha^{2}}^{(1+\alpha^{2})\wedge B}\psi_{*}(\tfrac{\alpha^{2}}{\rho t^{k/2}})\mathrm{d}t.

Inequality (55) is proved. The Proposition follows. ∎

To prove Theorem 7 and Theorem 8, we will need the following lemma to compare N-functions to power functions.

Lemma 20.

Let p>1p>1 and let q:=p/(p1)q:=p/(p-1) denote its Hölder conjugate. Let ψ\psi be an N-function. Set Cψ=maxx,y>0ψ1(x)ψ1(x)/xψ1(y)ψ1(y)/yC_{\psi}=\max_{x,y>0}\frac{\psi^{-1}(x)\psi_{*}^{-1}(x)/x}{\psi^{-1}(y)\psi_{*}^{-1}(y)/y}.

  (1) If ψxp\psi\preccurlyeq x^{p}, then

    ψ(s)sqCψqψ(t)tq for any s>t>0.\frac{\psi_{*}(s)}{s^{q}}\geq C_{\psi}^{-q}\frac{\psi_{*}(t)}{t^{q}}\quad\text{ for any }s>t>0.
  (2) If ψxp\psi\succcurlyeq x^{p}, then

    ψ(s)sqCψqψ(t)tq for any s>t>0.\frac{\psi_{*}(s)}{s^{q}}\leq C_{\psi}^{q}\frac{\psi_{*}(t)}{t^{q}}\quad\text{ for any }s>t>0.

Note that Cψ=1C_{\psi}=1 if ψ=xp\psi=x^{p}, and (16) guarantees that 1Cψ21\leq C_{\psi}\leq 2 in general.
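Since the constant CψC_{\psi} and inequality (16) drive several later estimates, it may help to see them verified on a concrete N-function. The following sketch (an added illustration, not part of the paper) takes ψ(x)=x3\psi(x)=x^{3}, computes its Young conjugate ψ(y)=supx>0(xyψ(x))\psi_{*}(y)=\sup_{x>0}(xy-\psi(x)) by ternary search, and checks xψ1(x)ψ1(x)2xx\leq\psi^{-1}(x)\psi_{*}^{-1}(x)\leq 2x, the inequality behind 1Cψ21\leq C_{\psi}\leq 2; the search depth and bisection brackets are implementation choices.

```python
def psi(x):
    return x ** 3  # the N-function psi(x) = x^p with p = 3

def psi_star(y):
    # Young conjugate psi_*(y) = sup_{x>0} (x*y - psi(x)), located by
    # ternary search on the concave map x -> x*y - psi(x).
    lo, hi = 0.0, max(1.0, y)  # the maximizer sqrt(y/3) lies in [0, max(1, y)]
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if m1 * y - psi(m1) < m2 * y - psi(m2):
            lo = m1
        else:
            hi = m2
    x = (lo + hi) / 2
    return x * y - psi(x)

def inverse(f, target):
    # Inverse of an increasing function f with f(0) = 0, by bisection.
    lo, hi = 0.0, 1.0
    while f(hi) < target:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < target else (lo, mid)
    return (lo + hi) / 2

# Check x <= psi^{-1}(x) psi_*^{-1}(x) <= 2x, the inequality behind C_psi <= 2.
for x in [0.5, 1.0, 7.0, 100.0]:
    prod = inverse(psi, x) * inverse(psi_star, x)
    assert x < prod < 2 * x, (x, prod)
```

For ψ(x)=x3\psi(x)=x^{3} one can check by calculus that ψ1(x)ψ1(x)=322/3x1.89x\psi^{-1}(x)\psi_{*}^{-1}(x)=3\cdot 2^{-2/3}x\approx 1.89x, so both inequalities are strict here.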

Proof of Theorem 5.

Without loss of generality, assume sn=1s_{n}=1. Set θψ=Yψ\theta_{\psi}=\lVert Y\rVert_{\psi}.

Part 1 (Proof of Theorem 5(i) for general ψ\psi).

Since ψ(x)/x\psi_{*}(x)/x is increasing, we have tψ(α2ρt)α2ψ(α2ρα2)t\psi_{*}(\tfrac{\alpha^{2}}{\rho t})\leq\alpha^{2}\psi_{*}(\tfrac{\alpha^{2}}{\rho\alpha^{2}}) for tα2t\geq\alpha^{2}. Hence

α21+α2ψ(α2ρt)dtα21+α2α2tψ(1ρ)dt=α2ψ(1ρ)log(1+1α2).\displaystyle\int_{\alpha^{2}}^{1+\alpha^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho t})\mathrm{d}t\leq\int_{\alpha^{2}}^{1+\alpha^{2}}\frac{\alpha^{2}}{t}\psi_{*}(\tfrac{1}{\rho})\mathrm{d}t=\alpha^{2}\psi_{*}(\tfrac{1}{\rho})\log(1+\tfrac{1}{\alpha^{2}}).

This inequality, together with Proposition 19, yields

Ei=1n|Y^i|3λi12\displaystyle E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i-1}^{2}} θψρn+θψρψ(1ρ)log(1+1α2).\displaystyle\lesssim\theta_{\psi}\rho n+\theta_{\psi}\rho\psi_{*}(\tfrac{1}{\rho})\log(1+\tfrac{1}{\alpha^{2}}).

Taking ρ=1/ψ1(n)\rho=1\big{/}\psi_{*}^{-1}(n) in the above inequality, we obtain

Ei=1T1|Y^i|3λi12θψρnlog(e+1α2)θψψ1(n)log(e+1α2)\displaystyle E\sum_{i=1}^{T-1}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i-1}^{2}}\lesssim\theta_{\psi}\rho n\log(e+\tfrac{1}{\alpha^{2}})\lesssim\theta_{\psi}\psi^{-1}(n)\log(e+\tfrac{1}{\alpha^{2}}) (56)

where in the last inequality we used the fact

ρ=1/ψ1(n)(16)1nψ1(n).\rho=1\big{/}\psi_{*}^{-1}(n)\stackrel{{\scriptstyle\eqref{eq:orlicz-ineq}}}{{\leq}}\tfrac{1}{n}\psi^{-1}(n). (57)

Further, putting

α=2θψψ1(n)=2Lψ,\alpha=2\theta_{\psi}\psi^{-1}(n)=2L_{\psi}, (58)

then inequality (56) and (54) imply

𝒲1(X^n+1,𝒩)αlog(e+1Lψ2).\mathcal{W}_{1}(\hat{X}_{n+1},\mathcal{N})\lesssim\alpha\log(e+\tfrac{1}{L_{\psi}^{2}}).

This inequality, together with Proposition 17(a), yields (recall K1K_{1} in (42))

𝒲1(Xn,𝒩)αlog(e+1Lψ2)+n1/2αψ(α/2θψ)1/2+K1(58)αlog(e+1Lψ2)+K1.\displaystyle\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\alpha\log(e+\tfrac{1}{L_{\psi}^{2}})+\frac{n^{1/2}\alpha}{\psi(\alpha/2\theta_{\psi})^{1/2}}+K_{1}\stackrel{{\scriptstyle\eqref{eq:alpha-value}}}{{\lesssim}}\alpha\log(e+\tfrac{1}{L_{\psi}^{2}})+K_{1}.

Theorem 5(i) is proved.

Part 2 (Proof of Theorem 5(ii) for ψxp\psi\preccurlyeq x^{p}, p>2p>2).

Similar to Part 1, we first derive a bound for the integral in (55). The key observation is that, thanks to Lemma 20, we can compare ψ\psi_{*} to (xp)(x^{p})_{*} instead of to 1/x1/x.

Let q:=p/(p1)q:=p/(p-1) denote the Hölder conjugate of p>2p>2. By Lemma 20, ψ(α2ρt)Cψq(α2t)qψ(α2ρα2)\psi_{*}(\tfrac{\alpha^{2}}{\rho t})\leq C_{\psi}^{q}(\tfrac{\alpha^{2}}{t})^{q}\psi_{*}(\tfrac{\alpha^{2}}{\rho\alpha^{2}}) for t>α2t>\alpha^{2}. Hence

α21+α2ψ(α2ρt)dtCψqq1α2ψ(1ρ).\int_{\alpha^{2}}^{1+\alpha^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho t})\mathrm{d}t\leq\frac{C_{\psi}^{q}}{q-1}\alpha^{2}\psi_{*}(\tfrac{1}{\rho}). (59)

This inequality, together with Proposition 19, yields (note that Cψq22C_{\psi}^{q}\leq 2^{2}, since Cψ2C_{\psi}\leq 2 and q2q\leq 2)

Ei=1n|Y^i|3λi12\displaystyle E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i-1}^{2}} θψρn+1q1θψρψ(1ρ).\displaystyle\lesssim\theta_{\psi}\rho n+\tfrac{1}{q-1}\theta_{\psi}\rho\psi_{*}(\tfrac{1}{\rho}).

Putting ρ=1/ψ1(n)\rho=1\big{/}\psi_{*}^{-1}(n), this inequality becomes

Ei=1n|Y^i|3λi12pθψρn(57)pθψψ1(n).\displaystyle E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i-1}^{2}}\lesssim p\theta_{\psi}\rho n\stackrel{{\scriptstyle\eqref{eq:rho-upper}}}{{\lesssim}}p\theta_{\psi}\psi^{-1}(n). (60)

Hence, taking α=2θψψ1(n)=2Lψ\alpha=2\theta_{\psi}\psi^{-1}(n)=2L_{\psi}, inequalities (60) and (54) imply

𝒲1(X^n+1,𝒩)pα.\mathcal{W}_{1}(\hat{X}_{n+1},\mathcal{N})\lesssim p\alpha.

This inequality, together with Proposition 17(a), yields Theorem 5(ii). ∎

4.3 Proof of Theorem 7 (𝒲2\mathcal{W}_{2} bounds)

In some sense, our proof of the 𝒲2\mathcal{W}_{2} bound in Theorem 7 is an interpolation between the proofs of Theorem 5 and Theorem 8. We observe that, to bound Ei=1n|Yi|3/λiE\sum_{i=1}^{n}|Y_{i}|^{3}/\lambda_{i}, a better estimate than Proposition 19 can be achieved by simply bounding |Yi|3/λi|Y_{i}|^{3}/\lambda_{i} by |Yi|3/C|Y_{i}|^{3}/C when λi\lambda_{i} is larger than some “threshold” CC.

Proof of Theorem 7(i).

Without loss of generality, assume sn=1s_{n}=1. Set θψ=Yψ\theta_{\psi}=\lVert Y\rVert_{\psi}.

Define the mds Y^=Y^()\hat{Y}=\hat{Y}^{(\infty)} and the martingale {X^i}\{\hat{X}_{i}\} as in Definition 16. Note that Y^i=Yi\hat{Y}_{i}=Y_{i} for all i<Ti<T.

Let β>0\beta>0 be a constant to be determined later, and recall the notations σi,λi\sigma_{i},\lambda_{i} in (53). Since ψx2\psi\succcurlyeq x^{2} and λiβ\lambda_{i}\geq\beta for 1in+11\leq i\leq n+1, we have λi2/ψ(λi/θψ)β2/ψ(β/θψ)\lambda_{i}^{2}/\psi(\lambda_{i}/\theta_{\psi})\leq\beta^{2}/\psi(\beta/\theta_{\psi}) for all ii. Applying Proposition 13 and Remark 14, we get

𝒵2(X^n+1,𝒩)\displaystyle\mathcal{Z}_{2}(\hat{X}_{n+1},\mathcal{N}) β2+Ei=1nλi2ψ(|Y^i|/θψ)ψ(λi/θψ)\displaystyle\lesssim\beta^{2}+E\sum_{i=1}^{n}\frac{\lambda_{i}^{2}\psi(|\hat{Y}_{i}|/\theta_{\psi})}{\psi(\lambda_{i}/\theta_{\psi})}
β2+Ei=1nβ2ψ(|Yi|/θψ)ψ(β/θψ)\displaystyle\lesssim\beta^{2}+E\sum_{i=1}^{n}\frac{\beta^{2}\psi(|Y_{i}|/\theta_{\psi})}{\psi(\beta/\theta_{\psi})}
β2+nβ2ψ(β/θψ).\displaystyle\lesssim\beta^{2}+\frac{n\beta^{2}}{\psi(\beta/\theta_{\psi})}.

Taking β=Lψ=θψψ1(n)\beta=L_{\psi}=\theta_{\psi}\psi^{-1}(n), we get

𝒵2(X^n+1,𝒩)Lψ2.\mathcal{Z}_{2}(\hat{X}_{n+1},\mathcal{N})\lesssim L_{\psi}^{2}.

Finally, by Proposition 17(b) and (25), with α=\alpha=\infty,

𝒲2(Xn,𝒩)𝒵2(X^n+1,𝒩)1/2+Lψ+i=1nσi2111/2.\mathcal{W}_{2}(X_{n},\mathcal{N})\lesssim\mathcal{Z}_{2}(\hat{X}_{n+1},\mathcal{N})^{1/2}+L_{\psi}+\lVert\sum_{i=1}^{n}\sigma_{i}^{2}-1\rVert_{1}^{1/2}.

Theorem 7(i) follows. ∎

Proof of Theorem 7(ii).

Without loss of generality, assume sn=1s_{n}=1.

Let α>0\alpha>0 be a constant to be determined. We define the mds Y^=Y^(α)\hat{Y}=\hat{Y}^{(\alpha)} and the martingale {X^i}\{\hat{X}_{i}\} as in Definition 16. Set θψ=Yψ\theta_{\psi}=\lVert Y\rVert_{\psi}.

Step 1. Applying Proposition 13 and Remark 14 to the case β=2α\beta=\sqrt{2}\alpha, we get

𝒵2(X^n+1,𝒩)α2+Ei=1n|Y^i|3λi.\mathcal{Z}_{2}(\hat{X}_{n+1},\mathcal{N})\lesssim\alpha^{2}+E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}. (61)

Let k>0k>0 be a constant to be determined later, and define the event

Ai={λikα}.A_{i}=\{\lambda_{i}\leq k\alpha\}.

By Proposition 19,

Ei=1n|Y^i|3λi𝟙λikαθψρn+θψρα20k2α2ψ(α2ρt1/2)dt.E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}\mathbbm{1}_{\lambda_{i}\leq k\alpha}\lesssim\theta_{\psi}\rho n+\theta_{\psi}\rho\alpha^{-2}\int_{0}^{k^{2}\alpha^{2}}\psi_{*}(\tfrac{\alpha^{2}}{\rho t^{1/2}})\mathrm{d}t. (62)

Since ψx3\psi\succcurlyeq x^{3}, by Lemma 20, t3/2ψ(α2ρt)(kα)3/2ψ(α2ρkα)t^{3/2}\psi_{*}(\tfrac{\alpha^{2}}{\rho t})\lesssim(k\alpha)^{3/2}\psi_{*}(\tfrac{\alpha^{2}}{\rho k\alpha}) for t(0,kα)t\in(0,k\alpha). Hence, changing variables tt2t\mapsto t^{2} in the integral in (62), we get

0kαtψ(α2ρt)dt(kα)3/2ψ(α2ρkα)0kαt13/2dt(kα)2ψ(αρk).\displaystyle\int_{0}^{k\alpha}t\psi_{*}(\tfrac{\alpha^{2}}{\rho t})\mathrm{d}t\lesssim(k\alpha)^{3/2}\psi_{*}(\tfrac{\alpha^{2}}{\rho k\alpha})\int_{0}^{k\alpha}t^{1-3/2}\mathrm{d}t\lesssim(k\alpha)^{2}\psi_{*}(\tfrac{\alpha}{\rho k}).

This inequality, together with (62), yields

Ei=1n|Y^i|3λi𝟙Aiθψρn+θψρk2ψ(αρk).E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}\mathbbm{1}_{A_{i}}\lesssim\theta_{\psi}\rho n+\theta_{\psi}\rho k^{2}\psi_{*}(\tfrac{\alpha}{\rho k}).

Taking ρ\rho such that n=4k2ψ(αρk)n=4k^{2}\psi_{*}(\tfrac{\alpha}{\rho k}), i.e.,

ρ=α/kψ1(n4k2)(16)4αknψ1(n4k2),\rho=\frac{\alpha/k}{\psi_{*}^{-1}(\frac{n}{4k^{2}})}\stackrel{{\scriptstyle\eqref{eq:orlicz-ineq}}}{{\leq}}\frac{4\alpha k}{n}\psi^{-1}(\frac{n}{4k^{2}}),

we obtain

Ei=1T1|Y^i|3λi𝟙Aiθψρnθψαkψ1(n4k2).E\sum_{i=1}^{T-1}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}\mathbbm{1}_{A_{i}}\lesssim\theta_{\psi}\rho n\lesssim\theta_{\psi}\alpha k\psi^{-1}(\frac{n}{4k^{2}}). (63)

Step 2. By Lemma 18, {Y^i}i=1n33Y33\lVert\{\hat{Y}_{i}\}_{i=1}^{n}\rVert_{3}^{3}\lesssim\lVert Y\rVert_{3}^{3}. Recalling that Aic={λi>kα}A_{i}^{c}=\{\lambda_{i}>k\alpha\} and ψx3\psi\succcurlyeq x^{3},

Ei=1n|Y^i|3λi𝟙AicEi=1n|Y^i|3kαnY33kαψ(1)nθψ3kα,\displaystyle E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}\mathbbm{1}_{A_{i}^{c}}\leq E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{k\alpha}\lesssim\frac{n\lVert Y\rVert_{3}^{3}}{k\alpha}\lesssim_{\psi(1)}\frac{n\theta_{\psi}^{3}}{k\alpha},

where Lemma A.1 is used in the last inequality. This inequality, together with (63), yields

Ei=1n|Y^i|3λiψ(1)θψαkψ1(n4k2)+nθψ3kα.E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}\lesssim_{\psi(1)}\theta_{\psi}\alpha k\psi^{-1}(\frac{n}{4k^{2}})+\frac{n\theta_{\psi}^{3}}{k\alpha}.

Let g(x):=1xψ(x)g(x):=\frac{1}{x}\psi(x). Choosing kk such that θψαkψ1(n4k2)=nθψ3kα\theta_{\psi}\alpha k\psi^{-1}(\frac{n}{4k^{2}})=\frac{n\theta_{\psi}^{3}}{k\alpha}, i.e.,

1αk=g1(α24θψ2)/(nθψ2),\frac{1}{\alpha k}=\sqrt{g^{-1}(\tfrac{\alpha^{2}}{4\theta_{\psi}^{2}})/(n\theta_{\psi}^{2})},

we arrive at the bound

Ei=1n|Y^i|3λiψ(1)θψ2ng1(α24θψ2).E\sum_{i=1}^{n}\frac{|\hat{Y}_{i}|^{3}}{\lambda_{i}}\lesssim_{\psi(1)}\theta_{\psi}^{2}\sqrt{ng^{-1}(\tfrac{\alpha^{2}}{4\theta_{\psi}^{2}})}.

Step 3. Further, by (61), we get

𝒵2(X^n+1,𝒩)ψ(1)α2+θψ2ng1(α24θψ2).\mathcal{Z}_{2}(\hat{X}_{n+1},\mathcal{N})\lesssim_{\psi(1)}\alpha^{2}+\theta_{\psi}^{2}\sqrt{ng^{-1}(\tfrac{\alpha^{2}}{4\theta_{\psi}^{2}})}\,.

Thus, choosing α=2Lψ=2θψψ1(n)\alpha=2L_{\psi}=2\theta_{\psi}\psi^{-1}(n), by Proposition 17(a) and (25),

𝒲2(Xn,𝒩)\displaystyle\mathcal{W}_{2}(X_{n},\mathcal{N}) 𝒵2(X^n+1,𝒩)1/2+Lψ+Vn2111/2\displaystyle\lesssim\mathcal{Z}_{2}(\hat{X}_{n+1},\mathcal{N})^{1/2}+L_{\psi}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2}
ψ(1)Lψ+θψ[ng1(ψ1(n)2)]1/4+Vn2111/2.\displaystyle\lesssim_{\psi(1)}L_{\psi}+\theta_{\psi}\left[ng^{-1}(\psi^{-1}(n)^{2})\right]^{1/4}+\lVert V_{n}^{2}-1\rVert_{1}^{1/2}.

Theorem 7(ii) is proved. ∎

5 Optimality of the 𝒲1\mathcal{W}_{1} rates: Proof of Proposition 10

For any p>2p>2, we will construct a mds {Yi}i=1n\{Y_{i}\}_{i=1}^{n} such that Vn2=1V_{n}^{2}=1 and 𝒲1(Xn,𝒩)pLp(Y)\mathcal{W}_{1}(X_{n},\mathcal{N})\gtrsim_{p}L_{p}(Y). Note that in our examples, Lp(Y)0L_{p}(Y)\to 0 as nn\to\infty, justifying the optimality of the term LψL_{\psi} in Theorem 5.

We choose

α=1logn.\alpha=\frac{1}{\log n}.

(In fact, for any p>2p>2, any choice α=αnn1/p1/2\alpha=\alpha_{n}\gg n^{1/p-1/2} with αn0\alpha_{n}\to 0 as nn\to\infty would work.)

Example 21.

For n3n\geq 3, let Y1,,Yn2,Yn,ξ,ηY_{1},\ldots,Y_{n-2},Y_{n},\xi,\eta be independent random variables such that

  • Yn,ξ𝒩(0,αn2)Y_{n},\xi\sim\mathcal{N}(0,\alpha_{n}^{2}),

  • Y1,,Yn2Y_{1},\ldots,Y_{n-2} are i.i.d. 𝒩(0,12αn2n2)\mathcal{N}(0,\tfrac{1-2\alpha_{n}^{2}}{n-2}) normal random variables,

  • η\eta has distribution

    P(η=12)=45,P(η=2)=15,P(\eta=\tfrac{1}{2})=\tfrac{4}{5},\quad P(\eta=-2)=\tfrac{1}{5},

  so that E[η]=0E[\eta]=0 and E[η2]=1E[\eta^{2}]=1.

We let Xm:=Y1++YmX_{m}:=Y_{1}+\ldots+Y_{m} and define a set AA\subset\mathbb{R} as

A={x:cosx12}=k[(2k13)π,(2k+13)π].A=\left\{x\in\mathbb{R}:\cos x\geq\tfrac{1}{2}\right\}=\bigcup_{k\in\mathbb{Z}}\big{[}(2k-\tfrac{1}{3})\pi,(2k+\tfrac{1}{3})\pi\big{]}.

Define

Yn1=ξ𝟙Xn2αA+αη𝟙Xn2αA.Y_{n-1}=\xi\mathbbm{1}_{X_{n-2}\notin\alpha A}+\alpha\eta\mathbbm{1}_{X_{n-2}\in\alpha A}.

Clearly, {Xm}m=1n\{X_{m}\}_{m=1}^{n} is a martingale, and

σn12=σn2=α2,σi2=12α2n2 for i=1,,n2.\sigma_{n-1}^{2}=\sigma_{n}^{2}=\alpha^{2},\quad\sigma_{i}^{2}=\tfrac{1-2\alpha^{2}}{n-2}\,\text{ for }i=1,\ldots,n-2.

Of course, Vn2=i=1nσi2=1V_{n}^{2}=\sum_{i=1}^{n}\sigma_{i}^{2}=1 almost surely, and

Lp=(i=1nE[|Yi|p])1/p((n2)cnp/2+cαp)1/ppα.L_{p}=\left(\sum_{i=1}^{n}E[|Y_{i}|^{p}]\right)^{1/p}\leq\left((n-2)\frac{c}{n^{p/2}}+c\alpha^{p}\right)^{1/p}\lesssim_{p}\alpha.
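The bookkeeping in Example 21 can be verified mechanically. The sketch below (an added illustration; it assumes the standard Gaussian absolute-moment formula E|𝒩(0,s2)|p=sp2p/2Γ(p+12)/πE|\mathcal{N}(0,s^{2})|^{p}=s^{p}2^{p/2}\Gamma(\tfrac{p+1}{2})/\sqrt{\pi}, and the concrete values of nn and pp are arbitrary choices) checks that the conditional variances sum to 11 exactly, that η\eta is centered with unit variance, and that LpL_{p} is of order α\alpha.

```python
import math

def abs_gauss_moment(p, s):
    # E|N(0, s^2)|^p = s^p * 2^(p/2) * Gamma((p+1)/2) / sqrt(pi)
    return s ** p * 2 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

n, p = 10_000, 4.0
alpha = 1.0 / math.log(n)

# Variance bookkeeping: sigma_{n-1}^2 = sigma_n^2 = alpha^2 and
# sigma_i^2 = (1 - 2 alpha^2)/(n - 2) for i <= n - 2, so V_n^2 = 1 exactly.
var_total = 2 * alpha ** 2 + (n - 2) * (1 - 2 * alpha ** 2) / (n - 2)
assert abs(var_total - 1.0) < 1e-12

# eta is centered with unit variance, so alpha * eta has variance alpha^2.
assert abs(0.8 * 0.5 - 0.2 * 2.0) < 1e-12         # E[eta] = 0
assert abs(0.8 * 0.25 + 0.2 * 4.0 - 1.0) < 1e-12  # E[eta^2] = 1

# L_p^p = sum_i E|Y_i|^p: (n - 2) Gaussian terms of scale ~ n^{-1/2}, plus
# Y_{n-1} (bounded by its xi branch plus |alpha * eta| <= 2 * alpha) and Y_n.
s = math.sqrt((1 - 2 * alpha ** 2) / (n - 2))
Lp = ((n - 2) * abs_gauss_moment(p, s)
      + abs_gauss_moment(p, alpha) + (2 * alpha) ** p  # Y_{n-1}
      + abs_gauss_moment(p, alpha)) ** (1 / p)         # Y_n
assert alpha <= Lp <= 3 * alpha                        # L_p is of order alpha
```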
Proof of Proposition 10(1).

Let {Yi}i=1n\{Y_{i}\}_{i=1}^{n} be the mds in Example 21. We consider the function

h(x)=αsin(xα).h(x)=\alpha\sin(\tfrac{x}{\alpha}).

Clearly, [h]0,11[h]_{0,1}\leq 1. Recall the definition of the function hα(x)h_{\alpha}(x) in (26). Then

hα(x)=αsin(1α(xαu))ϕ(u)du=h(x)cosuϕ(u)du=1eh(x),h_{\alpha}(x)=\alpha\int_{\mathbb{R}}\sin\big{(}\tfrac{1}{\alpha}(x-\alpha u)\big{)}\phi(u)\mathrm{d}u=h(x)\int_{\mathbb{R}}\cos u\phi(u)\mathrm{d}u=\frac{1}{\sqrt{e}}h(x), (64)

where ϕ(x)=12πex2/2\phi(x)=\tfrac{1}{\sqrt{2\pi}}e^{-x^{2}/2} denotes the standard normal density. Moreover,

E[h(Xn)h(𝒩)]=E[hα(Xn1)hα(Xn2+α𝒩)].\displaystyle E[h(X_{n})-h(\mathcal{N})]=E[h_{\alpha}(X_{n-1})-h_{\alpha}(X_{n-2}+\alpha\mathcal{N})].

By the definition of Yn1Y_{n-1}, we have

E[hα(Xn1)]=E[hα(Xn2+α𝒩)𝟙Xn2αA]+E[hα(Xn2+αη)𝟙Xn2αA].\displaystyle E[h_{\alpha}(X_{n-1})]=E[h_{\alpha}(X_{n-2}+\alpha\mathcal{N})\mathbbm{1}_{X_{n-2}\notin\alpha A}]+E[h_{\alpha}(X_{n-2}+\alpha\eta)\mathbbm{1}_{X_{n-2}\in\alpha A}].

Thus

E[h(Xn)h(𝒩)]\displaystyle E[h(X_{n})-h(\mathcal{N})] (65)
=E[(hα(Xn2+αη)hα(Xn2+α𝒩))𝟙Xn2αA]\displaystyle=E\left[\left(h_{\alpha}(X_{n-2}+\alpha\eta)-h_{\alpha}(X_{n-2}+\alpha\mathcal{N})\right)\mathbbm{1}_{X_{n-2}\in\alpha A}\right]
=(64)αeE[(sinXn2α(cosηcos𝒩)+cosXn2α(sinηsin𝒩))𝟙Xn2αA]\displaystyle\stackrel{{\scriptstyle\eqref{eq:calcul-h}}}{{=}}\tfrac{\alpha}{\sqrt{e}}E\left[\left(\sin\tfrac{X_{n-2}}{\alpha}(\cos\eta-\cos\mathcal{N})+\cos\tfrac{X_{n-2}}{\alpha}(\sin\eta-\sin\mathcal{N})\right)\mathbbm{1}_{X_{n-2}\in\alpha A}\right]
=αe(E[sinXn2α𝟙Xn2αA]E[cosηcos𝒩]+E[cosXn2α𝟙Xn2αA]E[sinη]).\displaystyle=\tfrac{\alpha}{\sqrt{e}}\left(E\left[\sin\tfrac{X_{n-2}}{\alpha}\mathbbm{1}_{X_{n-2}\in\alpha A}\right]E[\cos\eta-\cos\mathcal{N}]+E\left[\cos\tfrac{X_{n-2}}{\alpha}\mathbbm{1}_{X_{n-2}\in\alpha A}\right]E[\sin\eta]\right).

Since the set AA is symmetric, i.e., A=AA=-A, and Xn2𝒩(0,12α2)X_{n-2}\sim\mathcal{N}(0,1-2\alpha^{2}), we get E[sinXn2α𝟙Xn2αA]=0E\left[\sin\tfrac{X_{n-2}}{\alpha}\mathbbm{1}_{X_{n-2}\in\alpha A}\right]=0. Hence

E[h(Xn)h(𝒩)]\displaystyle E[h(X_{n})-h(\mathcal{N})] =αeE[cosXn2α𝟙Xn2αA]E[sinη]\displaystyle=\tfrac{\alpha}{\sqrt{e}}E\left[\cos\tfrac{X_{n-2}}{\alpha}\mathbbm{1}_{X_{n-2}\in\alpha A}\right]E[\sin\eta]
α2eP(Xn2αA)E[sinη].\displaystyle\geq\tfrac{\alpha}{2\sqrt{e}}P(X_{n-2}\in\alpha A)E[\sin\eta].

Note that E[sinη]=45sin1215sin2>0.2E[\sin\eta]=\tfrac{4}{5}\sin\tfrac{1}{2}-\tfrac{1}{5}\sin 2>0.2, and, writing c1:=(12α2)1/2c_{1}:=(1-2\alpha^{2})^{-1/2},

P(Xn2αA)\displaystyle P(X_{n-2}\in\alpha A) =P(𝒩c1αA)\displaystyle=P(\mathcal{N}\in c_{1}\alpha A)
=kc1(6k1)απ/3c1(6k+1)απ/3ϕ(u)du\displaystyle=\sum_{k\in\mathbb{Z}}\int_{c_{1}(6k-1)\alpha\pi/3}^{c_{1}(6k+1)\alpha\pi/3}\phi(u)\mathrm{d}u
12(0c1απ/3ϕ(u)du+k=02c1απ3ϕ(c1(6k+1)απ3))\displaystyle\geq\frac{1}{2}\left(\int_{0}^{c_{1}\alpha\pi/3}\phi(u)\mathrm{d}u+\sum_{k=0}^{\infty}\tfrac{2c_{1}\alpha\pi}{3}\phi(\tfrac{c_{1}(6k+1)\alpha\pi}{3})\right)
160ϕ(u)du=112.\displaystyle\geq\frac{1}{6}\int_{0}^{\infty}\phi(u)\mathrm{d}u=\frac{1}{12}. (66)

Therefore, E[h(Xn)h(𝒩)]α120eE[h(X_{n})-h(\mathcal{N})]\geq\frac{\alpha}{120\sqrt{e}} and so

𝒲1(Xn,𝒩)α120epLp,\mathcal{W}_{1}(X_{n},\mathcal{N})\geq\frac{\alpha}{120\sqrt{e}}\gtrsim_{p}L_{p},

justifying the optimality of the power of LψL_{\psi} in Theorem 5. ∎
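The numerical ingredients of the proof above can be double-checked directly. The sketch below (an added illustration; the quadrature grid and the truncation of the series over kk are arbitrary choices) verifies E[sinη]>0.2E[\sin\eta]>0.2, the constant E[cos𝒩]=e1/2E[\cos\mathcal{N}]=e^{-1/2} behind (64), and the lower bound P(𝒩c1αA)1/12P(\mathcal{N}\in c_{1}\alpha A)\geq 1/12 of (66), using the normal CDF expressed through the error function.

```python
import math

def Phi(x):   # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi(u):   # standard normal density
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

# E[sin eta] = (4/5) sin(1/2) - (1/5) sin 2 > 0.2, as used in the proof.
E_sin_eta = 0.8 * math.sin(0.5) - 0.2 * math.sin(2.0)
assert E_sin_eta > 0.2

# E[cos N] = e^{-1/2}, the constant behind (64), by midpoint quadrature.
grid = [-10 + 0.001 * (j + 0.5) for j in range(20_000)]
E_cos = 0.001 * sum(math.cos(u) * phi(u) for u in grid)
assert abs(E_cos - math.exp(-0.5)) < 1e-5

# P(N in c1*alpha*A) >= 1/12, where A = union of [(2k - 1/3)pi, (2k + 1/3)pi];
# the series over k is truncated at |k| <= 200, far beyond the Gaussian tail.
n = 10_000
alpha = 1 / math.log(n)
c1 = 1 / math.sqrt(1 - 2 * alpha ** 2)
prob = sum(Phi(c1 * (6 * k + 1) * alpha * math.pi / 3)
           - Phi(c1 * (6 * k - 1) * alpha * math.pi / 3)
           for k in range(-200, 201))
assert prob >= 1 / 12
```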

We can modify Example 21 above to justify the optimality of the exponent of Vn211/2\lVert V_{n}^{2}-1\rVert_{1/2} in Theorem 5.

Example 22.

For n3n\geq 3, let Y1,,Yn2Y_{1},\ldots,Y_{n-2} be i.i.d. with Y1𝒩(0,1α2n2)Y_{1}\sim\mathcal{N}(0,\frac{1-\alpha^{2}}{n-2}). Let

Yn1=αη𝟙Xn2αA,Y_{n-1}=\alpha\eta\mathbbm{1}_{X_{n-2}\in\alpha A},

where Xn2=i=1n2YiX_{n-2}=\sum_{i=1}^{n-2}Y_{i}, α=αn=1/logn\alpha=\alpha_{n}=1/\log n, η\eta and the set AA\subset\mathbb{R} are as in Example 21. Let Yn𝒩(0,α2P(𝒩κA))Y_{n}\sim\mathcal{N}\left(0,\alpha^{2}P(\mathcal{N}\notin\kappa A)\right), where κ=κn=α/1α2\kappa=\kappa_{n}=\alpha/\sqrt{1-\alpha^{2}}.

Proof of Proposition 10(2).

Let the mds {Yi}i=1n\{Y_{i}\}_{i=1}^{n} be as in Example 22.

Clearly, L3=(i=1nE[|Yi|3])1/3αL_{3}=\left(\sum_{i=1}^{n}E[|Y_{i}|^{3}]\right)^{1/3}\lesssim\alpha, and

Vn211/2\displaystyle\lVert V_{n}^{2}-1\rVert_{1/2} =α2E[|𝟙Xn2αAP(Xn2αA)|1/2]2\displaystyle=\alpha^{2}E\left[|\mathbbm{1}_{X_{n-2}\in\alpha A}-P(X_{n-2}\in\alpha A)|^{1/2}\right]^{2}
α2P(𝒩κA)P(𝒩κA).\displaystyle\asymp\alpha^{2}P(\mathcal{N}\notin\kappa A)P(\mathcal{N}\in\kappa A).

Hence, by Theorem 5(ii), we get

𝒲1(Xn,𝒩)α.\mathcal{W}_{1}(X_{n},\mathcal{N})\lesssim\alpha. (67)

Next we will show that there exists N>0N>0 such that for n>Nn>N,

Vn211/2α2.\lVert V_{n}^{2}-1\rVert_{1/2}\asymp\alpha^{2}. (68)

Note that the same computation as in (66) yields P(𝒩κA)1/12P(\mathcal{N}\in\kappa A)\geq 1/12. Hence we only need to show that P(𝒩κA)1P(\mathcal{N}\notin\kappa A)\gtrsim 1. Indeed,

P(𝒩κA)\displaystyle P(\mathcal{N}\in\kappa A) =ic1(6i1)κπ/3c1(6i+1)κπ/3ϕ(u)du\displaystyle=\sum_{i\in\mathbb{Z}}\int_{c_{1}(6i-1)\kappa\pi/3}^{c_{1}(6i+1)\kappa\pi/3}\phi(u)\mathrm{d}u
20c1κπ/3ϕ(u)du+2i=12c1κπ3ϕ(c1(6i1)κπ3)\displaystyle\leq 2\int_{0}^{c_{1}\kappa\pi/3}\phi(u)\mathrm{d}u+2\sum_{i=1}^{\infty}\frac{2c_{1}\kappa\pi}{3}\phi(\frac{c_{1}(6i-1)\kappa\pi}{3})
20c1κπ/3ϕ(u)du+23c1κπ/3ϕ(u)du\displaystyle\leq 2\int_{0}^{c_{1}\kappa\pi/3}\phi(u)\mathrm{d}u+\frac{2}{3}\int_{-c_{1}\kappa\pi/3}^{\infty}\phi(u)\mathrm{d}u
830c1κπ/3ϕ(u)du+1312\displaystyle\leq\frac{8}{3}\int_{0}^{c_{1}\kappa\pi/3}\phi(u)\mathrm{d}u+\frac{1}{3}\leq\frac{1}{2}

for all nn sufficiently large. Inequality (68) follows.

It remains to show that

𝒲1(Xn,𝒩)α.\mathcal{W}_{1}(X_{n},\mathcal{N})\gtrsim\alpha.

To this end, we consider the function h(x)=αsin(xα)h(x)=\alpha\sin(\tfrac{x}{\alpha}). Define λ>0\lambda>0 by

λ2=Var(Yn)=α2P(𝒩κA)[12α2,1112α2] for all sufficiently large n.\lambda^{2}={\rm Var}(Y_{n})=\alpha^{2}P(\mathcal{N}\notin\kappa A)\in[\tfrac{1}{2}\alpha^{2},\tfrac{11}{12}\alpha^{2}]\ \text{ for all sufficiently large }n.

Recall the definition of hλ(x)h_{\lambda}(x) in (26). By the same calculation as in (64),

hλ(x)=h(x)cos(λαu)ϕ(u)du=h(x)exp(λ22α2).h_{\lambda}(x)=h(x)\int_{\mathbb{R}}\cos(\tfrac{\lambda}{\alpha}u)\phi(u)\mathrm{d}u=h(x)\exp(-\frac{\lambda^{2}}{2\alpha^{2}}).

By symmetry, E[h(𝒩)]=0E[h(\mathcal{N})]=0 and E[hλ(Xn2)𝟙Xn2αA]=0E[h_{\lambda}(X_{n-2})\mathbbm{1}_{X_{n-2}\notin\alpha A}]=0. Hence

E[h(Xn)h(𝒩)]\displaystyle E[h(X_{n})-h(\mathcal{N})]
=E[hλ(Xn2)𝟙Xn2αA]+E[hλ(Xn2+αη)𝟙Xn2αA]\displaystyle=E[h_{\lambda}(X_{n-2})\mathbbm{1}_{X_{n-2}\notin\alpha A}]+E[h_{\lambda}(X_{n-2}+\alpha\eta)\mathbbm{1}_{X_{n-2}\in\alpha A}]
=αexp(λ22α2)E[cos(Xn2α)sinη𝟙Xn2αA]\displaystyle=\alpha\exp(-\frac{\lambda^{2}}{2\alpha^{2}})E[\cos(\tfrac{X_{n-2}}{\alpha})\sin\eta\mathbbm{1}_{X_{n-2}\in\alpha A}]
0.1αexp(λ22α2)P(Xn2αA)cα.\displaystyle\geq 0.1\alpha\exp(-\frac{\lambda^{2}}{2\alpha^{2}})P(X_{n-2}\in\alpha A)\geq c\alpha.

This proves the lower bound 𝒲1(Xn,𝒩)α\mathcal{W}_{1}(X_{n},\mathcal{N})\gtrsim\alpha. Our proof of the proposition is complete. ∎

6 Some open questions

  1. For the \mathscr{L}^{\infty} case (i.e., the mds is uniformly \mathscr{L}^{\infty}-bounded), is the typical 𝒲1\mathcal{W}_{1} rate O(n1/2logn)O(n^{-1/2}\log n) in Theorem 5 optimal? Recall that Bolthausen [5] showed that this rate is typically optimal for the Kolmogorov distance.

  2. For the \mathscr{L}^{\infty} case, are the typical 𝒲r\mathcal{W}_{r} rates O(n1/(2r))O(n^{-1/(2r)}), r{2,3}r\in\{2,3\}, in Theorems 7 and 8 optimal? For general r>3r>3 and bounded mds, can we get similar bounds O(n1/(2r))O(n^{-1/(2r)})?

  3. Can we say anything about the optimality of the 𝒲2\mathcal{W}_{2} rates in Theorem 7 when the mds is p\mathscr{L}^{p} integrable, for any p>2p>2?

  4. Is there a better (unified) formula than the “piecewise” 𝒲2\mathcal{W}_{2} bound in Theorem 7 for p\mathscr{L}^{p} martingales, p>2p>2?

  5. As a feature of the 𝒲1,𝒲2\mathcal{W}_{1},\mathcal{W}_{2} bounds in Theorems 5 and 7, better integrability implies faster (typical) convergence rates for the martingale CLT. This is no longer the case for the 𝒲3\mathcal{W}_{3} estimate in Theorem 8. Is it possible to get better 𝒲3\mathcal{W}_{3} bounds for general martingales with integrability better than 3\mathscr{L}^{3}?

  6. How can one obtain Wasserstein rates for the CLT of multi-dimensional martingales? Can we say anything about the dependence of the rates on the dimension?

  7. How can one obtain Wasserstein-rr convergence rates, r>3r>3, for the martingale CLT in terms of LψL_{\psi}?

Appendix A Appendix

A.1 Comparison between Orlicz norms

Recall the definition of the (mean-)Orlicz norm ψ\lVert\cdot\rVert_{\psi} for a sequence in Definition 3.

Lemma A.1.

Let p1p\geq 1 and let Y={Yi}i=1nY=\{Y_{i}\}_{i=1}^{n} be a sequence of random variables with Yp<\lVert Y\rVert_{p}<\infty. If an N-function ψ\psi satisfies ψxp\psi\succcurlyeq x^{p}, then

Yp(1+1ψ(1))1/pYψ.\lVert Y\rVert_{p}\leq\left(1+\frac{1}{\psi(1)}\right)^{1/p}\lVert Y\rVert_{\psi}.
Proof.

Without loss of generality assume Yψ=1\lVert Y\rVert_{\psi}=1. Since ψ(x)/xp\psi(x)/x^{p} is increasing, we have ypψ(y)/ψ(1)y^{p}\leq\psi(y)/\psi(1) for y1y\geq 1. Hence

E[|Yi|p]1+E[|Yi|p𝟙|Yi|>1]1+1ψ(1)E[ψ(|Yi|)].\displaystyle E[|Y_{i}|^{p}]\leq 1+E[|Y_{i}|^{p}\mathbbm{1}_{|Y_{i}|>1}]\leq 1+\frac{1}{\psi(1)}E\left[\psi(|Y_{i}|)\right].

Thus, by the definition of ψ\lVert\cdot\rVert_{\psi},

Ypp=1ni=1nE[|Yi|p]1+1ψ(1)ni=1nE[ψ(|Yi|)]1+1ψ(1).\displaystyle\lVert Y\rVert_{p}^{p}=\frac{1}{n}\sum_{i=1}^{n}E[|Y_{i}|^{p}]\leq 1+\frac{1}{\psi(1)n}\sum_{i=1}^{n}E[\psi(|Y_{i}|)]\leq 1+\frac{1}{\psi(1)}.

The lemma follows. ∎
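When each YiY_{i} is a point mass, the expectations in the mean-Orlicz norm are exact, so Lemma A.1 can be checked by a short computation. The sketch below (an added illustration with arbitrary sample values) computes Yψ\lVert Y\rVert_{\psi} by bisection for ψ(x)=x3x2\psi(x)=x^{3}\succcurlyeq x^{2} and verifies the stated comparison with the mean-2\mathscr{L}^{2} norm.

```python
def orlicz_norm(values, psi, iters=200):
    # ||Y||_psi = inf{theta > 0 : (1/n) sum_i E[psi(|Y_i|/theta)] <= 1},
    # computed by bisection; each Y_i is a point mass at values[i].
    mean_psi = lambda th: sum(psi(abs(y) / th) for y in values) / len(values)
    lo, hi = 1e-12, 1.0
    while mean_psi(hi) > 1:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if mean_psi(mid) > 1 else (lo, mid)
    return hi

psi = lambda x: x ** 3        # N-function with psi >= x^p for p = 2
p = 2
values = [0.3, 1.7, 2.5, 0.1, 4.0, 0.9]
lp = (sum(abs(y) ** p for y in values) / len(values)) ** (1 / p)
bound = (1 + 1 / psi(1.0)) ** (1 / p) * orlicz_norm(values, psi)
assert lp <= bound            # the conclusion of Lemma A.1
```

For ψ(x)=x3\psi(x)=x^{3} the bisection simply recovers the mean-3\mathscr{L}^{3} norm, which makes the check easy to confirm by hand.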

A.2 Regularity of Gaussian smoothing: Proof of Lemma 12

The proof is exactly as in [10, Lemma 6.1].

Proof.

Since fσ(x)=f(xσu)ϕ(u)duf_{\sigma}(x)=\int_{\mathbb{R}}f(x-\sigma u)\phi(u)\mathrm{d}u, repeated integration by parts yields

fσ(k)(x)=1σkf()(xσu)ϕ(k)(u)du=1σk[f()(xσu)f()(x)]ϕ(k)(u)du,0<k,\displaystyle f_{\sigma}^{(k)}(x)=\frac{1}{\sigma^{k-\ell}}\int_{\mathbb{R}}f^{(\ell)}(x-\sigma u)\phi^{(k-\ell)}(u)\mathrm{d}u=\frac{1}{\sigma^{k-\ell}}\int_{\mathbb{R}}[f^{(\ell)}(x-\sigma u)-f^{(\ell)}(x)]\phi^{(k-\ell)}(u)\mathrm{d}u,\quad\forall 0\leq\ell<k,

where the second equality uses ϕ(k)(u)du=0\int_{\mathbb{R}}\phi^{(k-\ell)}(u)\mathrm{d}u=0 for <k\ell<k. The lemma follows by taking =r1\ell=r-1 and using the fact that [f]r1,1=1[f]_{r-1,1}=1. ∎

A.3 Moment bound of the maximum

Lemma A.2.

Let ψxr\psi\succcurlyeq x^{r} be an N-function, r>0r>0. For any sequence of random variables Y={Yi}i=1nY=\{Y_{i}\}_{i=1}^{n}, we have

maxi=1n|Yi|r21/rYψψ1(n).\lVert\max_{i=1}^{n}|Y_{i}|\rVert_{r}\leq 2^{1/r}\lVert Y\rVert_{\psi}\psi^{-1}(n).
Proof.

Without loss of generality, assume Yψ=1\lVert Y\rVert_{\psi}=1. For any α>0\alpha>0,

|Yi|r𝟙|Yi|ααrψ(α)ψ(|Yi|)𝟙|Yi|α,1in|Y_{i}|^{r}\mathbbm{1}_{|Y_{i}|\geq\alpha}\leq\frac{\alpha^{r}}{\psi(\alpha)}\psi(|Y_{i}|)\mathbbm{1}_{|Y_{i}|\geq\alpha},\quad 1\leq i\leq n

where we used the fact that xψ(x)xrx\mapsto\frac{\psi(x)}{x^{r}} is increasing. Hence

E[maxi=1n|Yi|r]αr+E[maxi=1n(|Yi|r𝟙|Yi|α)]\displaystyle E[\max_{i=1}^{n}|Y_{i}|^{r}]\leq\alpha^{r}+E[\max_{i=1}^{n}(|Y_{i}|^{r}\mathbbm{1}_{|Y_{i}|\geq\alpha})]
αr+E[i=1nαrψ(α)ψ(|Yi|)]αr+nαrψ(α).\leq\alpha^{r}+E\left[\sum_{i=1}^{n}\frac{\alpha^{r}}{\psi(\alpha)}\psi(|Y_{i}|)\right]\leq\alpha^{r}+\frac{n\alpha^{r}}{\psi(\alpha)}.

Taking α=ψ1(n)\alpha=\psi^{-1}(n), we get maxi=1n|Yi|rr2αr\lVert\max_{i=1}^{n}|Y_{i}|\rVert_{r}^{r}\leq 2\alpha^{r}. ∎
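For independent YiY_{i} supported on small finite sets, both sides of Lemma A.2 are exactly computable, since the maximum can be enumerated. The sketch below (an added illustration with arbitrary supports, taking ψ(x)=x3xr\psi(x)=x^{3}\succcurlyeq x^{r} with r=2r=2) confirms the bound maxi|Yi|r21/rYψψ1(n)\lVert\max_{i}|Y_{i}|\rVert_{r}\leq 2^{1/r}\lVert Y\rVert_{\psi}\psi^{-1}(n).

```python
from itertools import product

# Independent Y_i, each uniform on a small finite support, so both sides of
# Lemma A.2 can be computed exactly (the 3^n outcomes can be enumerated).
supports = [[0.5, 1.0, 3.0], [0.2, 2.0, 2.5], [1.0, 1.5, 4.0]]
n, r = len(supports), 2
psi = lambda x: x ** 3                  # N-function with psi >= x^r

# E[max_i |Y_i|^r], exactly, by enumerating all outcomes.
E_max_r = sum(max(combo) ** r for combo in product(*supports)) / 3 ** n

# ||Y||_psi = inf{theta : (1/n) sum_i E[psi(|Y_i|/theta)] <= 1}, by bisection.
mean_psi = lambda th: sum(psi(v / th) for s in supports for v in s) / (3 * n)
lo, hi = 1e-9, 8.0
for _ in range(200):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if mean_psi(mid) > 1 else (lo, mid)

# Lemma A.2, with psi^{-1}(n) = n^{1/3} for psi(x) = x^3:
assert E_max_r ** (1 / r) <= 2 ** (1 / r) * hi * n ** (1 / 3)
```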

A.4 Proof of Lemma 20

Proof.

We only give the proof of (1). The proof of (2) is similar.

The case ψ=xp\psi=x^{p} is trivial, so we only consider the general case ψxp\psi\preccurlyeq x^{p}. It suffices to show that, for any s>t>0s>t>0,

s1/qψ1(s)1Cψt1/qψ1(t),\frac{s^{1/q}}{\psi_{*}^{-1}(s)}\geq\frac{1}{C_{\psi}}\frac{t^{1/q}}{\psi_{*}^{-1}(t)},

which, writing F(x):=ψ1(x)ψ1(x)F(x):=\psi^{-1}(x)\psi_{*}^{-1}(x), is equivalent to

ψ1(s)/s1/pψ1(t)/t1/p1CψF(s)/sF(t)/ts>t>0.\frac{\psi^{-1}(s)/s^{1/p}}{\psi^{-1}(t)/t^{1/p}}\geq\frac{1}{C_{\psi}}\frac{F(s)/s}{F(t)/t}\quad\forall s>t>0.

Note that ψxp\psi\preccurlyeq x^{p} implies that the left-hand side is at least 1, while the definition of CψC_{\psi} guarantees that the right-hand side is at most 1. Lemma 20(1) follows. ∎

A.5 A Burkholder inequality

Theorem A.3.

Let {Yi}i=1n\{Y_{i}\}_{i=1}^{n} be a mds and let Sm=Y1++YmS_{m}=Y_{1}+\ldots+Y_{m}, 1mn1\leq m\leq n. Recall the notation σi2(Y)\sigma_{i}^{2}(Y) in (1). Then, for p>0p>0,

E[maxm=1n|Sm|p]pE[(i=1nσi2(Y))p/2]+E[maxi=1n|Yi|p].E[\max_{m=1}^{n}|S_{m}|^{p}]\lesssim_{p}E[\big{(}\sum_{i=1}^{n}\sigma_{i}^{2}(Y)\big{)}^{p/2}]+E[\max_{i=1}^{n}|Y_{i}|^{p}].

This version is taken from [27, Theorem 2.11]. For a more general inequality, see [7, Theorem 21.1].
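For p=2p=2 and a ±1\pm 1 random walk, the two terms on the right-hand side are explicit (i=1nσi2(Y)=n\sum_{i=1}^{n}\sigma_{i}^{2}(Y)=n and maxi|Yi|=1\max_{i}|Y_{i}|=1), and Doob's L2L^{2} maximal inequality gives the concrete constant E[maxmSm2]4E[Sn2]=4nE[\max_{m}S_{m}^{2}]\leq 4E[S_{n}^{2}]=4n. The sketch below (an added illustration, not the proof of Theorem A.3) verifies this by exact enumeration of all paths; the walk length is an arbitrary choice.

```python
from itertools import product

# Exact check of the p = 2 maximal inequality for a +-1 random walk of
# length n: Doob's L^2 inequality gives E[max_m S_m^2] <= 4 E[S_n^2] = 4n.
n = 10
lhs = 0.0
for signs in product([-1, 1], repeat=n):
    s, m = 0, 0
    for x in signs:
        s += x
        m = max(m, s * s)
    lhs += m
lhs /= 2 ** n             # E[max_m S_m^2], exactly
assert n <= lhs <= 4 * n  # E[S_n^2] = n <= E[max_m S_m^2] <= 4n
```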

References

  • [1] R. A. Adams, J.J.F. Fournier, Sobolev Spaces. Second edition. Pure Appl. Math. (Amst.), 140. Elsevier/Academic Press, Amsterdam, 2003.
  • [2] A. Bikjalis, Estimates of the remainder term in the central limit theorem. Litovsk. Mat. Sb. 6 (1966), 323-346.
  • [3] S. G. Bobkov, Entropic approach to E. Rio’s central limit theorem for W2W_{2} transport distance. Stat. Probab. Lett. 83(7), 1644-1648 (2013).
  • [4] S. G. Bobkov, Berry-Esseen bounds and Edgeworth expansions in the central limit theorem for transport distances. Probab. Theory Relat. Fields (2018) 170:229-262.
  • [5] E. Bolthausen, Exact convergence rates in some martingale central limit theorems. Ann. Probab. 10 (1982), 672-688.
  • [6] T. Bonis, Stein’s method for normal approximation in Wasserstein distances with application to the multivariate central limit theorem. Probab. Theory Related Fields 178 (2020), no. 3-4, 827–860.
  • [7] D. L. Burkholder, B. J. Davis, R. F. Gundy, Integral inequalities for convex functions of operators on martingales. University of California Press, Berkeley, CA, 1972, pp. 223-240.
  • [8] T. Courtade, M. Fathi, A. Pananjady, Existence of Stein kernels under a spectral gap, and discrepancy bounds. Ann. Inst. Henri Poincaré Probab. Stat. 55 (2019), no. 2, 777-790.
  • [9] J. Dedecker, F. Merlevède, Rates of convergence in the central limit theorem for linear statistics of martingale differences. Stochastic Process. Appl. 121 (2011) 1013-1043.
  • [10] J. Dedecker, F. Merlevède, E. Rio, Rates of convergence for minimal distances in the central limit theorem under projective criteria. Electron. J. Probab. 14 (2009), no. 35, 978-1011.
  • [11] J. Dedecker, F. Merlevède, E. Rio, Rates of convergence in the central limit theorem for martingales in the non stationary setting. Ann. Inst. Henri Poincaré Probab. Stat. 58 (2022), no. 2, 945-966.
  • [12] L.V. Dung, T.C. Son, N.D. Tien, L1L_{1} bounds for some martingale central limit theorems. Lith. Math. J. 54, 48-60 (2014).
  • [13] A. Dvoretzky, Asymptotic normality for sums of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2 513-535. Univ. California Press, 1972.
  • [14] R. Eldan, D. Mikulincer, A. Zhai, The CLT in high dimensions: quantitative bounds via martingale embedding. Ann. Probab. 48 (2020), no. 5, 2494-2524.
  • [15] M. El Machkouri, L. Ouchti, Exact convergence rates in the central limit theorem for a class of martingales. Bernoulli 13 (4) (2007) 981-999.
  • [16] C.-G. Esseen, On mean central limit theorem. Kungl. Tekn. Högsk. Handl. Stockholm,121(1958)
  • [17] X. Fan, Exact rates of convergence in some martingale central limit theorems. J. Math. Anal. Appl. 469 (2019), no. 2, 1028-1044.
  • [18] X. Fan, X. Ma, On the Wasserstein distance for a martingale central limit theorem. Statist. Probab. Lett. 167 (2020), 108892, 6 pp.
  • [19] X. Fan, Z. Su, Rates of convergence in the distances of Kolmogorov and Wasserstein for standardized martingales. (2023), arXiv:2309.08189v1
  • [20] X. Fang, Q.-M. Shao, L. Xu, Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Related Fields 174(2019), no.3-4, 945-979.
  • [21] X. Fang, Wasserstein-2 bounds in normal approximation under local dependence. Electron. J. Probab. 24 (2019), no. 35, 1-14.
  • [22] M. Fathi, Stein kernels and moment maps. Ann. Probab. 47 (2019), no. 4, 2172-2185.
  • [23] I. Grama, E. Haeusler, Large deviations for martingales via Cramér’s method. Stochastic Process. Appl. 85 (2000) 279-293.
  • [24] E. Haeusler, A note on the rate of convergence in the martingale central limit theorem. Ann. Probab. 12 (1984) 635-639.
  • [25] E. Haeusler, On the rate of convergence in the central limit theorem for martingales with discrete and continuous time. Ann. Probab. 16 (1) (1988) 275-299.
  • [26] E. Haeusler, K. Joos, A nonuniform bound on the rate of convergence in the martingale central limit theorem. Ann. Probab. 16 (1988), no. 4, 1699-1720.
  • [27] P. Hall, C.C. Heyde, Martingale Limit Theory and Its Applications. Academic, New York, 1980.
  • [28] C.C. Heyde, B.M. Brown, On the departure from normality of a certain class of martingales. Ann. Math. Stat. 41 (1970) 2161-2165.
  • [29] I. Ibragimov, On the accuracy of Gaussian approximation to the distribution functions of sums of independent random variables. Theory Probab. Appl. 11 (1966) 559-579.
  • [30] K. Joos, Nonuniform convergence rates in the central limit theorem for martingales. J. Multivariate Anal. 36 (1991), no. 2, 297-313.
  • [31] K. Joos, Nonuniform convergence rates in the central limit theorem for martingales. Studia Sci. Math. Hungar. 28 (1-2) (1993) 145-158.
  • [32] R. Kulik, P. Soulier, Heavy-tailed time series. Springer Ser. Oper. Res. Financ. Eng. Springer, New York, 2020.
  • [33] T. Liu, M. Austern, Wasserstein-p bounds in the central limit theorem under local dependence. Electron. J. Probab. 28 (2023), no. 117, 1-47.
  • [34] J.C. Mourrat, On the rate of convergence in the martingale central limit theorem. Bernoulli 19 (2) (2013) 633-645.
  • [35] L. Ouchti, On the rate of convergence in the central limit theorem for martingale difference sequences. Ann. Inst. Henri Poincaré Probab. Stat. 41 (1) (2005) 35-43.
  • [36] V.V. Petrov, Limit theorems of probability theory. Oxford Stud. Probab., 4. The Clarendon Press, Oxford University Press, New York, 1995.
  • [37] E. Rio, Distances minimales et distances idéales. C. R. Acad. Sci. Paris 326 (1998) 1127-1130.
  • [38] E. Rio, Upper bounds for minimal distances in the central limit theorem. Ann. Inst. Henri Poincaré Probab. Stat. 45(3), 802–817 (2009)
  • [39] A. Röllin, On quantitative bounds in the mean martingale central limit theorem. Statist. Probab. Lett. 138 (2018), 171-176.
  • [40] A. I. Sakhanenko, Estimates in an invariance principle. (Russian) Limit theorems of probability theory, 27–44, 175, Trudy Inst. Mat., 5, “Nauka” Sibirsk. Otdel., Novosibirsk (1985).
  • [41] S. van de Geer, J. Lederer, The Bernstein-Orlicz norm and deviation inequalities. Probab. Theory Related Fields 157 (2013), no. 1-2, 225-250.
  • [42] C. Villani, Optimal Transport: Old and New. Grundlehren Math. Wiss., 338, Springer, Berlin (2009)
  • [43] M. Vladimirova, S. Girard, H. Nguyen, J. Arbel, Sub-Weibull distributions: generalizing sub-Gaussian and sub-exponential properties to heavier tailed distributions. Stat 9 (2020), e318.
  • [44] A. Zhai, A high-dimensional CLT in W2W_{2} distance with near optimal convergence rate. Probab.Theory Related Fields 170, 1-25 (2018).
  • [45] V. M. Zolotarev, On asymptotically best constants in refinements of mean central limit theorems. Theory Probab. Appl. 9 (1964) 268-276.

E-mail address, Xiaoqin Guo: [email protected]