
This research was partially supported by NSTDA young researcher fund 64000343.
Email: [email protected]

Concentration inequalities using approximate zero bias couplings with applications to Hoeffding’s statistic under the Ewens distribution

Nathakhun Wiroonsri

Mathematics and Statistics with Applications Research Group
Department of Mathematics, King Mongkut’s University of Technology Thonburi
Abstract

We prove concentration inequalities of the form $P(Y\geq t)\leq\exp(-B(t))$ for a random variable $Y$ with mean zero and variance $\sigma^{2}$, using a coupling technique from Stein’s method known as approximate zero bias couplings. Applications to Hoeffding’s statistic, where the random permutation has the Ewens distribution with parameter $\theta>0$, are also presented. A few simulation experiments are then provided to visualize the tail probability of Hoeffding’s statistic and our bounds. Based on the simulation results, our bounds work well, especially when $\theta\leq 1$.


AMS 2010 subject classifications: Primary 60E15, 60C05; Secondary 62G10

Key words: Tail probabilities, approximate zero bias coupling, Stein’s method, simulation, accept-reject algorithm, Ewens sampling formula

1 Introduction

Stein’s method was first introduced by Charles Stein in his seminal paper (Stein (1972)) and is best known for yielding non-asymptotic bounds on various distances for distributional approximations; see the text Chen et al. (2011) and the introductory notes Ross (2011). Thus far, many applications in several areas, such as combinatorics, statistics, statistical physics and the applied sciences, have been developed using this method. Meanwhile, some researchers have used Stein’s method in non-distributional contexts. Chatterjee (2007) introduced a version of Stein’s method for deriving concentration of measure inequalities.

The name ‘concentration of measure’ was first used in Talagrand (1995), where the author aimed to derive deviation bounds of the form $P(|X-\mathbb{E}X|\geq x)$ in situations where exact evaluation is impossible or hard to obtain. A bound of this form is considered ‘good’ if it decays sufficiently rapidly in $x$; for instance, most works in the literature seek bounds that decay at an exponential rate.

We now restrict our attention to concentration inequalities obtained via Stein’s method. Laipaporn and Neammanee (2006) and Rattanawong (2013) obtained non-uniform concentration inequalities for randomized orthogonal array sampling designs and for a random permutation sum, respectively; however, these are not of the form we focus on. Chatterjee (2007) used the exchangeable pairs technique to provide concentration of measure for functions of dependent random variables, including Hoeffding’s statistic when the random permutation has the uniform distribution, and the net magnetization in the Curie-Weiss model. Once the connection between Stein’s method and concentration inequalities was uncovered, many results were obtained using various techniques related to Stein’s method. Concentration bounds using bounded size bias couplings were introduced in Ghosh and Goldstein (2011b) and Ghosh and Goldstein (2011a), with examples including the number of relatively ordered subsequences of a random permutation, sliding window statistics, and the lightbulb process. Results of the same type, with the help of bounded zero bias couplings, appeared in Goldstein and Işlak (2014), with applications to Hoeffding’s statistic where the random permutation has either the uniform distribution or one which is constant over permutations with the same cycle type and having no fixed points. The bounded size bias technique was strengthened in Arratia and Baxendale (2015), and the boundedness assumption was relaxed by Cook et al. (2018), with applications to Erdős-Rényi graphs and permutation models.

In this work, we generalize the results in Goldstein and Işlak (2014), with zero bias couplings replaced by approximate zero bias couplings, which first appeared in Wiroonsri (2017). This coupling is constructed by adding a remainder term to the construction of the zero bias coupling generated by a $\lambda$-Stein pair (see Goldstein and Reinert (1997) or Section 4.4 of Chen et al. (2011)). Here we briefly recall, following Wiroonsri (2017), the construction of approximate zero bias couplings generated by approximate $\lambda,R$-Stein pairs. Recall that we call a pair of random variables $(Y',Y'')$ an approximate $\lambda,R$-Stein pair if it is exchangeable and satisfies

$\mathbb{E}Y'=0,\quad\mathrm{Var}(Y')=\sigma^{2}$ (1)

with $\sigma^{2}\in(0,\infty)$, and

$\mathbb{E}(Y''|Y')=(1-\lambda)Y'+R,$ (2)

for some $0<\lambda<1$ and $R=R(Y')$.

We now state the following lemma, proved in Wiroonsri (2017), which generalizes the result in Goldstein and Reinert (1997) with a $\lambda$-Stein pair replaced by an approximate $\lambda,R$-Stein pair. We call a variable $Y^{*}$ that satisfies (3) an approximate $Y'$-zero bias variable.

Lemma 1.1

Let $(Y',Y'')$ be an approximate $\lambda,R$-Stein pair with distribution $F(y',y'')$. Then when $(Y^{\dagger},Y^{\ddagger})$ has distribution

$dF^{\dagger}(y',y'')=\frac{(y''-y')^{2}}{\mathbb{E}(Y''-Y')^{2}}\,dF(y',y''),$

and $U$ has the uniform distribution on $[0,1]$, independent of $Y^{\dagger},Y^{\ddagger}$, the variable $Y^{*}=UY^{\dagger}+(1-U)Y^{\ddagger}$ satisfies

$\mathbb{E}[Y'f(Y')]=\sigma^{2}\mathbb{E}f'(Y^{*})-\frac{\mathbb{E}Y'R}{\lambda}\mathbb{E}f'(Y^{*})+\frac{\mathbb{E}Rf(Y')}{\lambda}$ (3)

for all absolutely continuous functions $f$.

The core idea of Stein’s method is based on the fact that a mean zero random variable $W$ with variance $\sigma^{2}$ is normal if and only if $\mathbb{E}[Wf(W)]=\sigma^{2}\mathbb{E}[f'(W)]$ for all absolutely continuous functions $f$. Therefore, if $Y'$ and $Y^{*}$ can be closely coupled using Lemma 1.1, satisfying (3) with some small remainder term $R$, then we expect that the behavior of $Y'$, including the decay of its tail probabilities, may be similar to that of the normal. Wiroonsri (2017) obtained $L^{1}$ and $L^{\infty}$ bounds between $Y'/\sigma$ and the standard normal in terms of the difference $|Y'-Y^{*}|$ and the remainder term $R$. Here we focus on concentration bounds in the same setting.

We also apply our result to Hoeffding’s statistic (Hoeffding (1951)) where the random permutation has the Ewens distribution. That is, we study the distribution of

$Y=\sum_{i=1}^{n}a_{i,\pi(i)}$ (4)

where $A\in\mathbb{R}^{n\times n}$ is a given real matrix with components $\{a_{i,j}\}_{i,j=1}^{n}$ and $\pi\in\mathcal{S}_{n}$ has the Ewens distribution, where $\mathcal{S}_{n}$ denotes the symmetric group. We note that the distribution of $Y$ is of interest as it has a connection to nonparametric tests in statistics. Wiroonsri (2017) provided $L^{1}$ and $L^{\infty}$ bounds of order $\sigma^{-1}$ between $Y$ and the normal by constructing an approximate $Y$-zero bias coupling and deriving the bounds based on that coupling. In the present work, we follow the construction there and apply the following main results to obtain concentration inequalities.

We now state the main theorem and then add a remark comparing the results here to those in Goldstein and Işlak (2014).

Theorem 1.2

Let $Y$ be a mean zero random variable with variance $\sigma^{2}>0$ and moment generating function $m(s)=\mathbb{E}[e^{sY}]$, and let $Y^{*}$ have the approximate $Y$-zero bias distribution satisfying (3), with $R=R(Y)$ and $0<\lambda<1$. Let $B_{1}=\mathbb{E}|R|/\lambda$ if $|R|$ and $e^{sY}$ are negatively correlated for $0<s<\alpha$, where $\alpha$ is specified separately for each bound below, and $B_{1}=\operatorname{ess\,sup}|\mathbb{E}[R|Y]|/\lambda$ otherwise, and let $B_{2}=|\mathbb{E}(YR)|/\lambda$. Assume that $B_{1}$ and $B_{2}$ are finite.

(a) If $Y^{*}-Y\leq c$ for some $c>0$ and $m(s)$ exists for all $s\in[0,\alpha)$ with $\alpha=1/c$, then for all $t>0$

$P(Y\geq t)\leq\exp\left(-\frac{t(t-2B_{1})}{2(\sigma^{2}+B_{2}+ct)}\right).$ (5)

The same upper bound holds for $P(Y\leq-t)$ if $Y-Y^{*}\leq c$, when $m(s)$ exists for all $s\in(-\alpha,0]$. If $|Y^{*}-Y|\leq c$ for some $c>0$ and $m(s)$ exists for all $s\in[0,\alpha)$ with $\alpha=2/c$, then for all $t>0$

$P(Y\geq t)\leq\exp\left(-\frac{t(t-2B_{1})}{10(\sigma^{2}+B_{2})/3+ct}\right),$ (6)

with the same upper bound for $P(Y\leq-t)$ if $Y-Y^{*}\leq c$ and $m(s)$ exists in $(-\alpha,0]$.

(b) If $Y^{*}-Y\leq c$ for some $c>0$ and $m(s)$ exists at $\alpha=(\log(t-B_{1})-\log\log(t-B_{1}))/c$, then for $t>B_{1}$,

$P(Y\geq t)\leq\exp\left(-\frac{t-B_{1}}{c}\left(\log(t-B_{1})-\log\log(t-B_{1})-\frac{\sigma^{2}+B_{2}}{c}\right)\right)$ (7)

$\leq\exp\left(-\frac{t-B_{1}}{2c}\left(\log(t-B_{1})-\frac{2(\sigma^{2}+B_{2})}{c}\right)\right).$

If $Y-Y^{*}\leq c$, then the same bound holds for the left tail $P(Y\leq-t)$ when $m(-\alpha)<\infty$.

Part (b) shows that the respective asymptotic orders, $\exp(-t/(2c))$ and $\exp(-t/c)$ as $t\rightarrow\infty$, of the bounds (5) and (6) can be improved to $\exp(-t\log t/(2c))$.

Remark 1.3
1. If $R=0$, then $B_{1}$ and $B_{2}$ are zero, and therefore the bounds (5), (6) and (7) in Theorem 1.2 coincide with (2), (3) and (4) of Goldstein and Işlak (2014), respectively. See Remark 2.1 of Goldstein and Işlak (2014) and the few paragraphs thereafter for a comparison with previous results in the literature.

2. We note that the bounds in (5) and (6) are effective only for $t>2B_{1}$, and the bound (7) only for $t>B_{1}$, where $B_{1}$ is given in the statement of the theorem. Therefore the three bounds in Theorem 1.2 are useful only if we can construct an approximate $\lambda,R$-Stein pair in such a way that $B_{1}$ or $2B_{1}$ lies within the range of the support of $Y$; otherwise, the bounds are trivial. (A small numerical helper for evaluating the bounds is sketched after this remark.)
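To make these ranges of effectiveness concrete, the following is a minimal Python sketch (our illustration, not part of the original development) that evaluates the right-hand sides of (5), (6) and (7) for given values of $\sigma^{2}$, $B_{1}$, $B_{2}$ and $c$, capping each bound at the trivial value one.

```python
import math

def tail_bounds(t, sigma2, B1, B2, c):
    """Right-hand sides of (5), (6) and (7) at t > 0, capped at 1."""
    b5 = math.exp(-t * (t - 2 * B1) / (2 * (sigma2 + B2 + c * t)))
    b6 = math.exp(-t * (t - 2 * B1) / (10 * (sigma2 + B2) / 3 + c * t))
    if t > B1 + 1:  # ensures log(t - B1) > 0, so log log(t - B1) is defined
        x = t - B1
        b7 = math.exp(-(x / c) * (math.log(x) - math.log(math.log(x))
                                  - (sigma2 + B2) / c))
    else:
        b7 = 1.0  # (7) is trivial (or undefined) in this range
    return min(b5, 1.0), min(b6, 1.0), min(b7, 1.0)
```

For instance, tail_bounds(t, 2082.80, 0.74, 2.13, 20 * 3.59) evaluates the three bounds using the quantities estimated in Experiment 1 of Section 3, where $c=20M$ as given in Theorem 2.1.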

Stein’s method has recently been used with the Ewens distribution in Gan and Ross (2021), where the authors obtained an upper bound on the total variation distance between the random partition generated by the Ewens sampling formula and the Poisson-Dirichlet distribution. The Ewens distribution has also attracted attention in other fields of mathematics for some time (see, for instance, Johnson et al. (1997), Arratia et al. (2003), Crane (2016) and da Silva et al. (2020)).

The remainder of this work is organized as follows. We present an application to Hoeffding’s statistic, where the random permutation has the Ewens distribution, in Section 2. A few simulation experiments are performed in Section 3. Finally, we devote the last section to the proof of our main theorem.

2 Hoeffding’s statistic under the Ewens measure

As introduced in the first section, we study the distribution of

$Y=\sum_{i=1}^{n}a_{i,\pi(i)}$ (8)

where $A\in\mathbb{R}^{n\times n}$ is a given real matrix with components $\{a_{i,j}\}_{i,j=1}^{n}$ and $\pi\in\mathcal{S}_{n}$ has the Ewens distribution. The original work (Hoeffding (1951)) studied the asymptotic normality of $Y$ when $\pi\in\mathcal{S}_{n}$ has the uniform distribution. We note that, in this section, we use the symbols $\pi$ and $Y$ interchangeably with $\pi'$ and $Y'$, respectively.

We first briefly describe the Ewens distribution and state some important properties. The Ewens distribution ${\cal E}_{\theta}$ on the symmetric group ${\cal S}_{n}$ with parameter $\theta>0$ was first introduced in Ewens (1972) and used in population genetics to describe the probabilities associated with the number of times that different alleles are observed in a sample; see also Arratia et al. (2003) for a description in a mathematical context. Let $\mathbb{N}_{k}=[k,\infty)\cap\mathbb{Z}$. In the following, for $x\in\mathbb{R}$ and $n\in\mathbb{N}_{1}$, we use the notations

$x^{(n)}=x(x+1)\cdots(x+n-1)$ and $x_{(n)}=x(x-1)\cdots(x-n+1)$.

For a permutation $\pi\in\mathcal{S}_{n}$, the Ewens measure is given by

$P_{\theta}(\pi)=\frac{\theta^{\#(\pi)}}{\theta^{(n)}},$ (9)

where $\#(\pi)$ denotes the number of cycles of $\pi$. We note that ${\cal E}_{\theta}$ specializes to the uniform distribution over all permutations when $\theta=1$.

A permutation $\pi_{n}\in\mathcal{S}_{n}$ with distribution ${\cal E}_{\theta}$ can be constructed by the so-called Chinese restaurant process (see e.g. Aldous (1985) and Pitman (1996)), as follows. For $n=1$, $\pi_{1}$ is the unique permutation in $\mathcal{S}_{1}$, mapping $1$ to $1$. For $n\geq 2$, we construct $\pi_{n}$ from $\pi_{n-1}$ by either adding $n$ as a fixed point, with probability $\theta/(\theta+n-1)$, or by inserting $n$ uniformly into one of the $n-1$ locations inside a cycle of $\pi_{n-1}$, each with probability $1/(\theta+n-1)$. Following this construction, for $i,k\in[n]$ where $[n]=\{1,2,\ldots,n\}$, we have

$P_{\theta}(\pi(i)=k)=\begin{cases}\frac{1}{\theta+n-1}&\text{if }k\neq i,\\ \frac{\theta}{\theta+n-1}&\text{if }k=i.\end{cases}$ (10)
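This recursive construction translates directly into a sampler. The following is a minimal Python sketch (ours, for illustration), using one-line notation with 0-indexed elements.

```python
import random

def ewens_permutation(n, theta, rng=random):
    """Sample pi ~ Ewens(theta) on S_n via the Chinese restaurant process.

    Returns pi in one-line notation (0-indexed): pi[i] is the image of i.
    """
    pi = [0]  # the unique permutation of a single element
    for m in range(1, n):  # m elements are present; insert element m
        if rng.random() < theta / (theta + m):
            pi.append(m)          # add m as a fixed point
        else:
            j = rng.randrange(m)  # insert m into a cycle, just after j
            pi.append(pi[j])      # m inherits j's old image
            pi[j] = m             # and j now maps to m
    return pi
```

Setting $\theta=1$ recovers a uniformly random permutation, consistent with the remark following (9).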

A distribution on $\mathcal{S}_{n}$ is said to be constant on cycle type if the probability of any permutation $\pi\in{\cal S}_{n}$ depends only on its cycle type $(c_{1},\ldots,c_{n})$, where $c_{q}(\pi)$ is the number of $q$-cycles of $\pi$ and we write $c_{q}$ for $c_{q}(\pi)$ for simplicity. Concentration inequalities for $Y$, where $\pi\in\mathcal{S}_{n}$ has either the uniform distribution or one which is constant over permutations with the same cycle type and having no fixed points, were obtained in Goldstein and Işlak (2014) using zero bias couplings. It follows directly from (9) that the Ewens distribution is constant on cycle type and allows fixed points. Due to this difference, the zero biasing technique does not work here, and thus we instead use approximate zero biasing, as discussed in the first section. We first follow the construction of approximate $Y$-zero bias couplings presented in Wiroonsri (2017) and then apply Theorem 1.2 to obtain concentration bounds. It is interesting to study the robustness and sensitivity of the tail probabilities when the underlying assumptions change.

We begin by stating the definitions, notations and assumptions used in Wiroonsri (2017). Letting

$a_{\bullet,\bullet}=\frac{1}{n(\theta+n-1)}\left(\theta\sum_{i=1}^{n}a_{i,i}+\sum_{i\neq j}a_{i,j}\right),$ (11)

by (10), we have

$\mathbb{E}Y=na_{\bullet,\bullet}.$ (12)

Letting

$\widehat{a}_{i,j}=a_{i,j}-a_{\bullet,\bullet},$

and using (12), we have

$\mathbb{E}\left[\sum_{i=1}^{n}\widehat{a}_{i,\pi(i)}\right]=\mathbb{E}\left[\sum_{i=1}^{n}(a_{i,\pi(i)}-a_{\bullet,\bullet})\right]=\mathbb{E}\left[\sum_{i=1}^{n}a_{i,\pi(i)}-na_{\bullet,\bullet}\right]=0.$

As a consequence, replacing $a_{i,j}$ by $\widehat{a}_{i,j}$, we may assume without loss of generality that

$\mathbb{E}Y=0$ and $a_{\bullet,\bullet}=0,$ (13)

and for simplicity we consider only the symmetric case, that is, $a_{i,j}=a_{j,i}$ for all $i,j$. Also, to rule out trivial cases, we assume in what follows that $\sigma^{2}>0$. The explicit form of $\sigma^{2}$ was calculated in Lemma 4.8 of Wiroonsri (2017) and, as discussed in Remark 4.9 there, $\sigma^{2}$ is of order $n$ when the elements of $A$ are well chosen, in the sense that they do not depend on $n$ and most of them are not the same number. In this case, there exists $N\in\mathbb{N}_{1}$ such that $\sigma^{2}>0$ for $n>N$.

The following theorem presents concentration inequalities for $Y$.

Theorem 2.1

Let $n\geq 6$ and let $A=\{a_{i,j}\}_{i,j=1}^{n}$ be an array of real numbers satisfying

$a_{i,j}=a_{j,i}.$

Let $\pi\in\mathcal{S}_{n}$ be a random permutation with distribution ${\cal E}_{\theta}$, with $\theta>0$. Then, with $Y$ the sum in (8) and

$M=\max_{i,j}|a_{i,j}-a_{\bullet,\bullet}|,\quad a_{\bullet,\bullet}$ as in (11),

the bounds in Theorem 1.2 hold with $Y$ replaced by $Y-\mathbb{E}Y$ and $c=20M$.

Moreover,

$B_{1}\leq(6n+4.8\theta)M$ and $B_{2}\leq\left(3\kappa_{\theta,n,1}+1.2\theta+\frac{1.2(\kappa_{\theta,n,1}(\theta+1)+\kappa_{\theta,n,2})}{n}\right)M\sigma,$

where

$\kappa_{\theta,n,1}=\sqrt{\frac{\theta^{2}n_{(2)}}{(\theta+n-1)_{(2)}}+\frac{\theta n}{\theta+n-1}}$ (14)

and

$\kappa_{\theta,n,2}=\sqrt{\frac{\theta^{4}n_{(4)}}{(\theta+n-1)_{(4)}}+\frac{4\theta^{3}n_{(3)}}{(\theta+n-1)_{(3)}}+\frac{2\theta^{2}n_{(2)}}{(\theta+n-1)_{(2)}}}.$ (15)

In particular, if $A$ is chosen so that $|R|$ given in (16) and $e^{sY}$ are negatively correlated for $0<s<\alpha$, then

$B_{1}\leq\frac{\theta M(3.6n+2.4\theta-3)}{\theta+n-1}+\frac{\theta^{2}nM}{2(\theta+n-1)_{(2)}}.$

As mentioned earlier in Remark 1.3, the bounds in Theorem 2.1 are effective for $t\geq 2B_{1}$, where $B_{1}$ depends on the remainder term of the approximate $\lambda,R$-Stein pair, and are trivial outside this range. Hence if $2B_{1}$ lies outside the support of $Y-\mathbb{E}Y$, the bounds are not useful. The best bound on $B_{1}$ that we can obtain in general is $(6n+4.8\theta)M$, which grows with $n$; nevertheless, it improves to $\theta M(3.6n+2.4\theta-3)/(\theta+n-1)+\theta^{2}nM/(2(\theta+n-1)_{(2)})$, which is of constant order, in the case that $|R|$ and $e^{sY}$ are negatively correlated for $0<s<\alpha$.

To prove Theorem 2.1, we apply Theorem 1.2 from the introduction. That is, we construct an approximate $Y$-zero bias coupling $Y^{*}$ from an approximate $\lambda,R$-Stein pair $(Y,Y'')$ using Lemma 1.1, in such a way that $|Y^{*}-Y|$ and $\mathbb{E}[R|Y]$ are bounded and $|\mathbb{E}(YR)|<\infty$. For this purpose, we state the following two lemmas. The first, Lemma 2.2, proved in Wiroonsri (2017), constructs an approximate $4/n,R$-Stein pair $(Y',Y'')$ for $n\geq 6$, where $Y'$ is given by (8) with $\pi'$ replacing $\pi$. Bounds related to the remainder term $R$ are then obtained in Lemma 2.3. Below, for $i,j\in[n]$, we write $i\sim j$ if $i$ and $j$ are in the same cycle, let $|i|$ be the length of the cycle containing $i$, and let $\tau_{i,j}$, $i,j\in[n]$, be the permutation that transposes $i$ and $j$.

Lemma 2.2 (Wiroonsri (2017))

For $n\geq 6$, let $\{a_{i,j}\}_{i,j=1}^{n}$ be an array of real numbers satisfying $a_{i,j}=a_{j,i}$ and $a_{\bullet,\bullet}=0$, where $a_{\bullet,\bullet}$ is as in (11). Let $\pi\in\mathcal{S}_{n}$ be a random permutation with the Ewens measure ${\cal E}_{\theta}$, $\theta>0$, and let $Y'$ be given by (8). Further, let $I,J$ be chosen independently of $\pi$, uniformly from all pairs of distinct elements of $\{1,\ldots,n\}$. Then, letting $\pi''=\tau_{I,J}\pi\tau_{I,J}$ and $Y''$ be given by (8) with $\pi''$ replacing $\pi$, the pair $(Y',Y'')$ is an approximate $4/n,R$-Stein pair with

$R(Y')=\frac{1}{n(n-1)}\mathbb{E}[T|Y']$ (16)

where

$T=2(n+c_{1}-2(\theta+1))\sum_{|i|=1}a_{i,i}-2(c_{1}-2\theta)\sum_{|i|\geq 2}a_{i,i}-4\sum_{|i|=1,|j|=1,j\neq i}a_{i,j}-4\sum_{|i|=1,|j|\geq 2}a_{i,j}.$ (17)
Lemma 2.3

Let $(Y',Y'')$ be an approximate $4/n,R$-Stein pair constructed as in Lemma 2.2 with $R$ as in (16). Then

$|\mathbb{E}[R|Y]|\leq\frac{10n+8\theta}{n-1}M$ a.s., (18)

$\mathbb{E}|R|\leq\frac{\theta M(12n+8\theta-10)}{(n-1)(\theta+n-1)}+\frac{2\theta^{2}M}{(\theta+n-1)_{(2)}},$ (19)

and

$|\mathbb{E}Y'R|\leq\left(10\kappa_{\theta,n,1}+4\theta+\frac{4(\kappa_{\theta,n,1}(\theta+1)+\kappa_{\theta,n,2})}{n}\right)\frac{M\sigma}{n-1}$ (20)

where $M=\max_{i,j}|a_{i,j}-a_{\bullet,\bullet}|$ with $a_{\bullet,\bullet}$ as in (11), and $\kappa_{\theta,n,1}$ and $\kappa_{\theta,n,2}$ are given in (14) and (15), respectively.

Proof: The bounds (19) on $\mathbb{E}|R|$ and (20) on $|\mathbb{E}Y'R|$ were shown in Wiroonsri (2017). To prove (18), since $|\mathbb{E}[R|Y]|=|\mathbb{E}[T|Y]|/(n(n-1))$ is bounded almost surely by the essential supremum of $|T|/(n(n-1))$, it suffices to bound $|T|$ in (17). For any fixed $\pi\in\mathcal{S}_{n}$, using that $|\{i\in[n]:|i|=1\}|=c_{1}$ and $|\{i\in[n]:|i|\geq 2\}|=n-c_{1}$, we have

$|T|\leq 4\theta nM$ a.s. if $c_{1}=0$,

and, if $c_{1}>0$,

$|T|\leq(2n+2c_{1}+4\theta+4)c_{1}M+(2c_{1}+4\theta)(n-c_{1})M+4c_{1}(c_{1}-1)M+4c_{1}(n-c_{1})M$ a.s.

Using that $c_{1}\leq n$ and that $c_{1}(n-c_{1})$ is maximized at $c_{1}=n/2$, it follows that

$|T|\leq(10n^{2}+8\theta n)M$ a.s.

\Box

Now we have all ingredients to prove Theorem 2.1.

Proof of Theorem 2.1: First we construct an approximate $4/n,R$-Stein pair $(Y',Y'')$ as in Lemma 2.2 with the remainder $R$ given in (16). Then, constructing an approximate $Y$-zero bias variable $Y^{*}$ from the approximate Stein pair $(Y',Y'')$ as in the proof of Theorem 4.1 of Wiroonsri (2017), we have $|Y^{*}-Y|\leq 20M$. Invoking Theorem 1.2 with $\lambda=4/n$, using the bounds (18), (19) and (20) of Lemma 2.3, completes the proof. $\Box$


3 Simulation

In this section, we perform four simulation experiments. In each, $A$ is generated by first drawing each entry of $X$ from $N(1,0.2)$ or $N(-1,0.2)$ with equal probability, and then letting $B=X+X^{T}$ and $A=B-b_{\bullet,\bullet}$, where $b_{\bullet,\bullet}$ is defined similarly to (11). We intentionally select $X$ such that $\mathrm{Cov}(|R|,e^{sY})<0$ for all $0<s<\alpha$, with $\alpha$ as defined in Theorem 1.2, so that the negative correlation assumption there is satisfied. To generate Ewens permutations with $\theta\neq 1$, we apply the accept-reject algorithm (see Robert and Casella (2013)). Recall that the discrete version of the accept-reject algorithm is stated as follows.

Theorem 3.1 (Von Neumann (1951))

Let $Y$ and $V$ be discrete random variables with probability mass functions $f_{Y}(y)$ and $f_{V}(v)$, respectively, where $f_{Y}$ and $f_{V}$ have common support. Define

$C=\sup_{y}\frac{f_{Y}(y)}{f_{V}(y)}<\infty.$

To generate a random variable $Y\sim f_{Y}$:

(a) Generate $U\sim\mathrm{Uniform}(0,1)$ and $V\sim f_{V}$, independently.

(b) If $U\leq\frac{f_{Y}(V)}{Cf_{V}(V)}$, set $Y=V$; otherwise, return to step (a).

In our case, $Y$ and $V$ are $\mathcal{E}_{\theta}$ with $\theta\neq 1$ and uniform, respectively, with common support the set of all permutations $\pi\in\mathcal{S}_{n}$. Recall that the probability mass functions of $Y$ and $V$ are

$f_{Y}(\pi)=\frac{\theta^{\#(\pi)}}{\theta^{(n)}}$ and $f_{V}(\pi)=\frac{1}{n!},$

where $\#(\pi)$ denotes the number of cycles of $\pi$. It is easy to see that

$C=\begin{cases}\frac{n!\,\theta}{\theta^{(n)}}&\text{if }\theta<1,\\ \frac{n!\,\theta^{n}}{\theta^{(n)}}&\text{if }\theta>1.\end{cases}$
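A minimal Python sketch of this sampler follows (our illustration, with a helper for counting cycles). Note that the acceptance ratio $f_{Y}(V)/(Cf_{V}(V))$ simplifies to $\theta^{\#(V)-1}$ when $\theta<1$ and to $\theta^{\#(V)-n}$ when $\theta>1$, so neither $n!$ nor $\theta^{(n)}$ needs to be computed.

```python
import random

def num_cycles(pi):
    """Number of cycles of a permutation in one-line notation (0-indexed)."""
    seen = [False] * len(pi)
    count = 0
    for i in range(len(pi)):
        if not seen[i]:
            count += 1
            j = i
            while not seen[j]:  # walk the cycle containing i
                seen[j] = True
                j = pi[j]
    return count

def ewens_accept_reject(n, theta, rng=random):
    """Sample pi ~ Ewens(theta), theta != 1, by accept-reject from the
    uniform distribution on S_n."""
    while True:
        v = list(range(n))
        rng.shuffle(v)  # uniform proposal V
        k = num_cycles(v)
        ratio = theta ** (k - 1) if theta < 1 else theta ** (k - n)
        if rng.random() <= ratio:  # accept with probability f_Y/(C f_V)
            return v
```

The expected number of proposals per accepted permutation is $C$; for $n=10$ and $\theta=2$ this gives $C=10!\,2^{10}/2^{(10)}\approx 93.1$, in line with the average of 93.41 iterations reported in Experiment 4 below.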

3.1 Experiment 1

Let $n=1000$ and $\theta=1$. Simulating $Y$ and $R$ 1,000,000 times, we obtain $\sigma^{2}=2082.80$, $B_{1}=0.74$, $B_{2}=2.13$, $M=3.59$, support $=[-2795.86,2803.19]$ and $\mathrm{Cov}(Y,|R|)=-0.0005$ from the sample. Figures 1 and 2 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5) and (6), respectively. Note that we intentionally do not show (7), since $\mathrm{Cov}(e^{sY},|R|)>0$ at $\alpha=(\log(t-B_{1})-\log\log(t-B_{1}))/c$ for large $t$. Since $\theta=1$ is equivalent to the uniform distribution on $\pi\in\mathcal{S}_{n}$, we also plot the bound from Goldstein and Işlak (2014). That bound is clearly smaller, as our coupling construction yields a larger $|Y-Y^{*}|$ and our bound is derived so as to handle the remainder term, and is therefore not as tight; however, the bound from Goldstein and Işlak (2014) does not apply when $\theta\neq 1$.

Figure 1: $\mathrm{Cov}(e^{sY},|R|)$ for $0\leq s\leq 2/c$.
Figure 2: Plot of $P(Y>t)$ when $n=1000$, $\theta=1$.

3.2 Experiment 2

Since generating Ewens permutations with $\theta\neq 1$ requires an acceptance-rejection step, we are not able to generate a very large sample with large $n$. Let $n=1000$ and $\theta=0.8$; with $\theta<1$, permutations with a large number of cycles are less likely. Simulating $Y$ and $R$ 10,000 times, we obtain $\sigma^{2}=507.20$, $B_{1}=0.29$, $B_{2}=0.28$, $M=1.60$, support $=[-1400.45,1398.01]$ and $\mathrm{Cov}(Y,|R|)=-0.001$ from the sample. Figures 3 and 4 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5) and (6), respectively. Note that the bound (7) is too large in this case because of the term $\frac{2(\sigma^{2}+B_{2})}{c}$.

Figure 3: $\mathrm{Cov}(e^{sY},|R|)$ for $0\leq s\leq 2/c$.
Figure 4: Plot of $P(Y>t)$ when $n=1000$, $\theta=0.8$.

3.3 Experiment 3

As $\theta$ increases, it becomes more expensive to generate a permutation. We now let $n=100$ and $\theta=1.05$; with $\theta>1$, permutations with a large number of cycles are more likely. Simulating $Y$ and $R$ 10,000 times, we obtain $\sigma^{2}=212.75$, $B_{1}=0.88$, $B_{2}=2.42$, $M=3.15$, support $=[-255.59,260.86]$ and $\mathrm{Cov}(Y,|R|)=-0.015$ from the sample. Figures 5 and 6 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5), (6) and (7), respectively.

Figure 5: $\mathrm{Cov}(e^{sY},|R|)$ for $s=(\log(t-B_{1})-\log\log(t-B_{1}))/c$.
Figure 6: Plot of $P(Y>t)$ when $n=100$, $\theta=1.05$.

3.4 Experiment 4

For this experiment, we illustrate the case where $\theta$ is much greater than 1, which results in permutations with a much larger number of cycles. However, the simulation is very expensive: it takes 93.41 iterations on average to obtain a single permutation even with $n=10$ and $\theta=2$. Based on our simulation with sample size 10,000, the mean number of cycles is 4.03. Simulating $Y$ and $R$ 10,000 times, we obtain $\sigma^{2}=23.14$, $B_{1}=1.27$, $B_{2}=4.60$, $M=2.70$, support $=[-20.79,20.95]$ and $\mathrm{Cov}(Y,|R|)=-0.69$ from the sample. Figures 7 and 8 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5), (6) and (7), respectively.

Figure 7: $\mathrm{Cov}(e^{sY},|R|)$ for $s=(\log(t-B_{1})-\log\log(t-B_{1}))/c$.
Figure 8: Plot of $P(Y>t)$ when $n=10$, $\theta=2$.

3.5 Conclusion from the simulation results

We notice from the four experiments that our bounds appear more useful for smaller $\theta$, especially when $\theta\leq 1$. For $\theta>1$, the bounds are not sufficiently small even at the upper limit of the support. However, our experiments with $\theta>1$ were only performed with small $n$, so it would be interesting to see whether our bounds work better with larger $n$.

Also, based on our experiments, the bound (6) dominates the others in all cases. Since the bound (7) actually has the fastest decay rate, at $\exp(-t\log t)$, it should become better when we consider $Y$ with a larger support.

4 Proof of the main result

In this section, we follow the same technique as in Goldstein and Işlak (2014), but handle a few extra terms that involve $R$.


Proof of Theorem 1.2: Let $m(s)=\mathbb{E}[e^{sY}]$ and $m^{*}(s)=\mathbb{E}[e^{sY^{*}}]$. Using that $Y^{*}-Y\leq c$, for $s\geq 0$ we have

$m^{*}(s)=\mathbb{E}[e^{s(Y^{*}-Y)}e^{sY}]\leq\mathbb{E}[e^{cs}e^{sY}]=e^{cs}m(s).$ (21)

(a) If $m(s)$ exists in an open interval containing $s$, we may interchange expectation and differentiation at $s$ to obtain

$m'(s)=\mathbb{E}[Ye^{sY}]=\sigma^{2}\mathbb{E}[se^{sY^{*}}]-\frac{\mathbb{E}YR}{\lambda}\mathbb{E}[se^{sY^{*}}]+\frac{\mathbb{E}[Re^{sY}]}{\lambda}=\sigma^{2}sm^{*}(s)-\frac{\mathbb{E}YR}{\lambda}sm^{*}(s)+\frac{\mathbb{E}[Re^{sY}]}{\lambda},$ (22)

where we have used the approximate zero bias relation in the second equality.

For $\alpha\in(0,1/c)$ and $0\leq s\leq\alpha$, using $e^{x}\leq 1/(1-x)$ for $x\in[0,1)$, we have

$m^{*}(s)\leq e^{cs}m(s)\leq\frac{1}{1-sc}m(s).$ (23)

Using (22) and applying (23), we have

$m'(s)\leq\frac{\sigma^{2}s}{1-sc}m(s)+\frac{|\mathbb{E}(YR)|}{\lambda}\frac{s}{1-sc}m(s)+\frac{|\mathbb{E}(Re^{sY})|}{\lambda}=\frac{\sigma^{2}s}{1-sc}m(s)+\frac{|\mathbb{E}(YR)|}{\lambda}\frac{s}{1-sc}m(s)+\frac{|\mathbb{E}[e^{sY}\mathbb{E}[R|Y]]|}{\lambda}\leq\frac{(\sigma^{2}+|\mathbb{E}(YR)|/\lambda)s}{1-sc}m(s)+\frac{\mathbb{E}[e^{sY}|\mathbb{E}[R|Y]|]}{\lambda}.$ (24)

Recalling from the statement of the theorem that $B_{1}=\mathbb{E}|R|/\lambda$ if $|R|$ and $e^{sY}$ are negatively correlated for $0<s<\alpha$, that $B_{1}=\operatorname{ess\,sup}|\mathbb{E}[R|Y]|/\lambda$ otherwise, and that $B_{2}=|\mathbb{E}(YR)|/\lambda$, it follows from the last expression that

$m'(s)\leq\frac{(\sigma^{2}+B_{2})s}{1-sc}m(s)+B_{1}m(s).$

Dividing both sides by $m(s)$, integrating over $[0,\alpha]$ and using that $m(0)=1$ yield

$\log(m(\alpha))=\int_{0}^{\alpha}\frac{m'(s)}{m(s)}\,ds\leq\frac{\sigma^{2}+B_{2}}{1-\alpha c}\int_{0}^{\alpha}s\,ds+B_{1}\int_{0}^{\alpha}ds=\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}+B_{1}\alpha,$

and therefore

$m(\alpha)\leq\exp\left(\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}+B_{1}\alpha\right).$

We note that the claim for $t=0$ is clear, as the right hand side of (5) is one. Applying Markov’s inequality for $t>0$, we obtain

$P(Y\geq t)=P(e^{\alpha Y}\geq e^{\alpha t})\leq e^{-\alpha t}m(\alpha)\leq\exp\left(-\alpha(t-B_{1})+\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}\right).$

Letting $\alpha=t/(\sigma^{2}+B_{2}+ct)$, which lies in $(0,1/c)$, (5) holds. The result for the left tail when $Y-Y^{*}\leq c$ follows immediately from the same proof by letting $X=-Y$ and using the fact that $X^{*}=-Y^{*}$ has the approximate $X$-zero bias distribution with $R$ replaced by $-R$. We also note that the results for the left tail in the other cases below follow by the same argument, so it suffices to prove only the results for the right tail.
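For completeness, this choice of $\alpha$ can be checked directly (a short verification added here): since $\alpha(\sigma^{2}+B_{2}+ct)=t$,

$1-\alpha c=\frac{\sigma^{2}+B_{2}}{\sigma^{2}+B_{2}+ct}\quad\text{and}\quad\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}=\frac{\alpha^{2}(\sigma^{2}+B_{2}+ct)}{2}=\frac{\alpha t}{2},$

so the exponent becomes $-\alpha(t-B_{1})+\alpha t/2=-\alpha(t-2B_{1})/2=-t(t-2B_{1})/(2(\sigma^{2}+B_{2}+ct))$, which is the exponent in (5).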

Next we prove (6). Following the same argument as in Goldstein and Işlak (2014) and using that

$e^{y}-e^{x}\leq\frac{|y-x|(e^{y}+e^{x})}{2}$ for all $x$ and $y$,

for $\alpha\in(0,2/c)$ and $0\leq s\leq\alpha$, we have

$e^{sY^{*}}-e^{sY}\leq\frac{|s(Y^{*}-Y)|(e^{sY^{*}}+e^{sY})}{2}.$

Taking expectations and simplifying, we obtain

$m^{*}(s)\leq\left(\frac{1+cs/2}{1-cs/2}\right)m(s).$

Using (24) and proceeding as in the first bound yield

$m(\alpha)\leq\exp\left(\frac{5(\sigma^{2}+B_{2})\alpha^{2}}{6(1-c\alpha/2)}+B_{1}\alpha\right)$ for all $\alpha\in(0,2/c)$.

We note that the claim for $t=0$ is clear, as the right hand side of (6) is one. Again applying Markov’s inequality for $t>0$, we have

$P(Y\geq t)\leq\exp\left(-\alpha(t-B_{1})+\frac{5(\sigma^{2}+B_{2})\alpha^{2}}{6(1-c\alpha/2)}\right).$

Letting $\alpha=2t/(10(\sigma^{2}+B_{2})/3+ct)$, which lies in $(0,2/c)$, (6) follows.
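Again, this choice of $\alpha$ can be verified directly (a short check added here): since $\alpha(10(\sigma^{2}+B_{2})/3+ct)=2t$,

$1-\frac{c\alpha}{2}=\frac{10(\sigma^{2}+B_{2})/3}{10(\sigma^{2}+B_{2})/3+ct}\quad\text{and}\quad\frac{5(\sigma^{2}+B_{2})\alpha^{2}}{6(1-c\alpha/2)}=\frac{\alpha^{2}}{4}\left(\frac{10(\sigma^{2}+B_{2})}{3}+ct\right)=\frac{\alpha t}{2},$

so the exponent equals $-\alpha(t-B_{1})+\alpha t/2=-t(t-2B_{1})/(10(\sigma^{2}+B_{2})/3+ct)$, matching (6).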

(b) For any $s\in[0,\alpha)$ such that $m(\alpha)$ exists, using (22), (21), and the definitions of $B_{1}$ and $B_{2}$ recalled above, we have

$m'(s)=\sigma^{2}sm^{*}(s)-\frac{\mathbb{E}YR}{\lambda}sm^{*}(s)+\frac{\mathbb{E}[Re^{sY}]}{\lambda}\leq\sigma^{2}sm^{*}(s)-\frac{\mathbb{E}YR}{\lambda}sm^{*}(s)+\frac{\mathbb{E}[e^{sY}|\mathbb{E}[R|Y]|]}{\lambda}\leq(\sigma^{2}+B_{2})se^{cs}m(s)+B_{1}m(s)$

and thus

$(\log m(s))'\leq(\sigma^{2}+B_{2})se^{cs}+B_{1}.$

Integrating over $s\in[0,\alpha]$, we obtain

$\log m(\alpha)\leq\frac{\sigma^{2}+B_{2}}{c^{2}}(e^{c\alpha}(c\alpha-1)+1)+B_{1}\alpha,$

and hence

$m(\alpha)\leq\exp\left(\frac{\sigma^{2}+B_{2}}{c^{2}}(e^{c\alpha}(c\alpha-1)+1)+B_{1}\alpha\right).$
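In the integration above we used the elementary identity (recorded here for completeness)

$\int_{0}^{\alpha}se^{cs}\,ds=\left[\frac{e^{cs}(cs-1)}{c^{2}}\right]_{0}^{\alpha}=\frac{e^{c\alpha}(c\alpha-1)+1}{c^{2}},$

which follows since $\frac{d}{ds}\,e^{cs}(cs-1)/c^{2}=se^{cs}$.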

Again, by Markov’s inequality,

$P(Y\geq t)\leq\exp\left(-\alpha(t-B_{1})+\frac{\sigma^{2}+B_{2}}{c^{2}}(e^{c\alpha}(c\alpha-1)+1)\right).$

Now, for $t>e+B_{1}$, choosing $\alpha=(\log(t-B_{1})-\log\log(t-B_{1}))/c$, we have

$P(Y\geq t)\leq\exp\left(-\frac{t-B_{1}}{c}(\log(t-B_{1})-\log\log(t-B_{1}))+\frac{\sigma^{2}+B_{2}}{c^{2}}\left(\frac{t-B_{1}}{\log(t-B_{1})}(\log(t-B_{1})-\log\log(t-B_{1})-1)+1\right)\right)\leq\exp\left(-\frac{t-B_{1}}{c}\left(\log(t-B_{1})-\log\log(t-B_{1})-\frac{\sigma^{2}+B_{2}}{c}\right)\right).$

Using that $(\log x)/2\geq\log\log x$ for all $x>1$ yields the second inequality in (7).

\Box


References

  • Aldous (1985) Aldous, D. J. (1985). Exchangeability and related topics. In École d’Été de Probabilités de Saint-Flour XIII—1983, pp.  1–198. Lecture Notes in Math., 1117, Springer, Berlin.
  • Arratia et al. (2003) Arratia, R., A. D. Barbour, and S. Tavaré (2003). Logarithmic combinatorial structures: a probabilistic approach. European Mathematical Society.
  • Arratia and Baxendale (2015) Arratia, R. and P. Baxendale (2015). Bounded size bias coupling: a gamma function bound, and universal Dickman-function behavior. Probability Theory and Related Fields 162(3), 411–429.
  • Chatterjee (2007) Chatterjee, S. (2007). Stein’s method for concentration inequalities. Probability Theory and Related Fields 138(1-2), 305–321.
  • Chen et al. (2011) Chen, L. H., L. Goldstein, and Q.-M. Shao (2011). Normal approximation by Stein’s method. Springer, New York.
  • Cook et al. (2018) Cook, N., L. Goldstein, and T. Johnson (2018). Size biased couplings and the spectral gap for random regular graphs. The Annals of Probability 46(1), 72–125.
  • Crane (2016) Crane, H. (2016). The ubiquitous Ewens sampling formula. Statistical Science 31(1), 1–19.
  • da Silva et al. (2020) da Silva, P. H., A. Jamshidpey, and S. Tavaré (2020). Random derangements and the Ewens sampling formula. arXiv preprint arXiv:2006.04840.
  • Ewens (1972) Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoretical Population Biology 3(1), 87–112.
  • Gan and Ross (2021) Gan, H. L. and N. Ross (2021). Stein’s method for the Poisson-Dirichlet distribution and the Ewens sampling formula, with applications to Wright-Fisher models. The Annals of Applied Probability 31(2), 625–667.
  • Ghosh and Goldstein (2011a) Ghosh, S. and L. Goldstein (2011a). Applications of size biased couplings for concentration of measures. Electronic Communications in Probability 16, 70–83.
  • Ghosh and Goldstein (2011b) Ghosh, S. and L. Goldstein (2011b). Concentration of measures via size-biased couplings. Probability Theory and Related Fields 149(1), 271–278.
  • Goldstein and Işlak (2014) Goldstein, L. and Ü. Işlak (2014). Concentration inequalities via zero bias couplings. Statistics & Probability Letters 86, 17–23.
  • Goldstein and Reinert (1997) Goldstein, L. and G. Reinert (1997). Stein’s method and the zero bias transformation with application to simple random sampling. The Annals of Applied Probability 7(4), 935–952.
  • Hoeffding (1951) Hoeffding, W. (1951). A combinatorial central limit theorem. Annals of Mathematical Statistics 22(4), 558–566.
  • Johnson et al. (1997) Johnson, N. L., S. Kotz, and N. Balakrishnan (1997). Discrete multivariate distributions. Wiley, New York.
  • Laipaporn and Neammanee (2006) Laipaporn, K. and K. Neammanee (2006). A non-uniform concentration inequality for randomized orthogonal array sampling designs. Thai Journal of Mathematics 4(1), 11–34.
  • Pitman (1996) Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. Lecture Notes-Monograph Series, 245–267.
  • Rattanawong (2013) Rattanawong, P. (2013). A non-uniform concentration inequality for a random permutation sum. Communications in Statistics-Theory and Methods 42(10), 1879–1888.
  • Robert and Casella (2013) Robert, C. and G. Casella (2013). Monte Carlo statistical methods. Springer Verlag, New York.
  • Ross (2011) Ross, N. (2011). Fundamentals of Stein’s method. Probability Surveys 8, 210–293.
  • Stein (1972) Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Prob. Univ. of California Press 2, 583–602.
  • Talagrand (1995) Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques 81(1), 73–205.
  • Von Neumann (1951) Von Neumann, J. (1951). Various techniques used in connection with random digits. Notes by G. E. Forsythe. National Bureau of Standards Applied Math Series 12, 36–38. Reprinted in Collected Works, Vol. V.
  • Wiroonsri (2017) Wiroonsri, N. (2017). Stein’s method using approximate zero bias couplings with applications to combinatorial central limit theorems under the Ewens distribution. ALEA, Lat. Am. J. Probab. Math. Stat. 14, 917–946.