
This research was partially supported by NSTDA young researcher fund 64000343.
Email: [email protected]

Concentration inequalities using approximate zero bias couplings with applications to Hoeffding’s statistic under the Ewens distribution

Nathakhun Wiroonsri

Mathematics and Statistics with Applications Research Group
Department of Mathematics, King Mongkut’s University of Technology Thonburi
Abstract

We prove concentration inequalities of the form $P(Y\geq t)\leq\exp(-B(t))$ for a random variable $Y$ with mean zero and variance $\sigma^{2}$, using a coupling technique from Stein’s method known as approximate zero bias couplings. Applications to Hoeffding’s statistic, where the random permutation has the Ewens distribution with parameter $\theta>0$, are also presented. A few simulation experiments are then provided to visualize the tail probability of Hoeffding’s statistic and our bounds. Based on the simulation results, our bounds work well, especially when $\theta\leq 1$.


AMS 2010 subject classifications: Primary 60E15, 60C05; Secondary 62G10

Key words: Tail probabilities, approximate zero bias coupling, Stein’s method, simulation, accept-reject algorithm, Ewens sampling formula

1 Introduction

Stein’s method was first introduced by Charles Stein in his seminal paper (Stein (1972)) and is best known for yielding non-asymptotic bounds on various distances for distributional approximations; see the text Chen et al. (2011) and the introductory notes Ross (2011). Thus far, many applications in several areas, such as combinatorics, statistics, statistical physics and the applied sciences, have been developed using this method. Meanwhile, some researchers have used Stein’s method in non-distributional contexts. Chatterjee (2007) introduced a version of Stein’s method for deriving concentration of measure inequalities.

The name ‘concentration of measure’ was first used in Talagrand (1995), where the author aimed to derive deviation bounds of the form $P(|X-\mathbb{E}X|\geq x)$ in situations where exact evaluation is impossible or hard to obtain. A bound of this form is considered ‘good’ if it decays sufficiently rapidly in $x$; for instance, most works in the literature seek bounds that decay at an exponential rate.

We now restrict our attention to concentration inequalities obtained via Stein’s method. Laipaporn and Neammanee (2006) and Rattanawong (2013) obtained non-uniform concentration inequalities for randomized orthogonal array sampling designs and for a random permutation sum, respectively; however, these are not of the form we focus on. Chatterjee (2007) used the exchangeable pairs technique to provide concentration of measure for functions of dependent random variables, including Hoeffding’s statistic when the random permutation has the uniform distribution, and the net magnetization in the Curie-Weiss model. Once the connection between Stein’s method and concentration inequalities was uncovered, many results were obtained using various techniques related to Stein’s method. Concentration bounds using bounded size bias couplings were introduced in Ghosh and Goldstein (2011b) and Ghosh and Goldstein (2011a), with examples including the number of relatively ordered subsequences of a random permutation, sliding window statistics, and the lightbulb process. Results of the same type, with the help of bounded zero bias couplings, appeared in Goldstein and Işlak (2014), with applications to Hoeffding’s statistic where the random permutation has either the uniform distribution or one which is constant over permutations with the same cycle type and having no fixed points. The bounded size bias technique was strengthened in Arratia and Baxendale (2015), and the boundedness assumption was relaxed by Cook et al. (2018), with applications to Erdős-Rényi graphs and permutation models.

In this work, we generalize the results in Goldstein and Işlak (2014), with zero bias couplings replaced by approximate zero bias couplings, which first appeared in Wiroonsri (2017). This coupling is constructed by adding a remainder term to the construction of the zero bias coupling generated by a $\lambda$-Stein pair (see Goldstein and Reinert (1997) or Section 4.4 of Chen et al. (2011)). Here we briefly recall, following Wiroonsri (2017), the construction of approximate zero bias couplings generated by approximate $\lambda,R$-Stein pairs. Recall that we call a pair of random variables $(Y',Y'')$ an approximate $\lambda,R$-Stein pair if it is exchangeable and satisfies

$\mathbb{E}Y'=0,\quad\mathrm{Var}(Y')=\sigma^{2}$ (1)

with $\sigma^{2}\in(0,\infty)$, and

$\mathbb{E}(Y''|Y')=(1-\lambda)Y'+R,$ (2)

for some $0<\lambda<1$ and $R=R(Y')$.

We now state the following lemma, proved in Wiroonsri (2017), which generalizes the result in Goldstein and Reinert (1997) with a $\lambda$-Stein pair replaced by an approximate $\lambda,R$-Stein pair. We call a variable $Y^{*}$ that satisfies (3) an approximate $Y'$-zero bias variable.

Lemma 1.1

Let $(Y',Y'')$ be an approximate $\lambda,R$-Stein pair with distribution $F(y',y'')$. Then when $(Y^{\dagger},Y^{\ddagger})$ has distribution

$dF^{\dagger}(y',y'')=\frac{(y''-y')^{2}}{\mathbb{E}(Y''-Y')^{2}}\,dF(y',y''),$

and $U$ has the uniform distribution on $[0,1]$, independent of $Y^{\dagger},Y^{\ddagger}$, the variable $Y^{*}=UY^{\dagger}+(1-U)Y^{\ddagger}$ satisfies

$\mathbb{E}[Y'f(Y')]=\sigma^{2}\mathbb{E}f'(Y^{*})-\frac{\mathbb{E}Y'R}{\lambda}\mathbb{E}f'(Y^{*})+\frac{\mathbb{E}Rf(Y')}{\lambda}$ (3)

for all absolutely continuous functions $f$.

The core idea of Stein’s method is based on the fact that a mean zero random variable $W$ with variance $\sigma^{2}$ is normal if and only if $\mathbb{E}[Wf(W)]=\sigma^{2}\mathbb{E}[f'(W)]$ for all absolutely continuous functions $f$. Therefore, if $Y'$ and $Y^{*}$ can be closely coupled using Lemma 1.1, satisfying (3) with some small remainder term $R$, then we expect that the behavior of $Y'$, including the decay of its tail probabilities, may be similar to that of the normal. Wiroonsri (2017) obtained $L^{1}$ and $L^{\infty}$ bounds between $Y'/\sigma$ and the standard normal in terms of the difference $|Y'-Y^{*}|$ and the remainder term $R$. Here we focus on concentration bounds in the same setting.

We also apply our result to Hoeffding’s statistic (Hoeffding (1951)) where the random permutation has the Ewens distribution. That is, we study the distribution of

$Y=\sum_{i=1}^{n}a_{i,\pi(i)}$ (4)

where $A\in\mathbb{R}^{n\times n}$ is a given real matrix with components $\{a_{i,j}\}_{i,j=1}^{n}$ and $\pi\in\mathcal{S}_{n}$ has the Ewens distribution, where $\mathcal{S}_{n}$ denotes the symmetric group. We note that the distribution of $Y$ is of interest as it has a connection to nonparametric tests in statistics. Wiroonsri (2017) provided $L^{1}$ and $L^{\infty}$ bounds of order $\sigma^{-1}$ between $Y$ and the normal by constructing an approximate $Y$-zero bias coupling and deriving the bounds based on that coupling. In the present work, we follow the construction there and apply the following main results to obtain concentration inequalities.

We now state the main theorem and then add a remark comparing the results here to those in Goldstein and Işlak (2014).

Theorem 1.2

Let $Y$ be a mean zero random variable with variance $\sigma^{2}>0$ and moment generating function $m(s)=\mathbb{E}[e^{sY}]$, and let $Y^{*}$ have the approximate $Y$-zero bias distribution satisfying (3), with $R=R(Y)$ and $0<\lambda<1$. Let $B_{1}=\mathbb{E}|R|/\lambda$ if $|R|$ and $e^{sY}$ are negatively correlated for $0<s<\alpha$, where $\alpha$ is specified separately for each bound below, and $B_{1}=\operatorname{ess\,sup}|\mathbb{E}[R|Y]|/\lambda$ otherwise, and let $B_{2}=|\mathbb{E}(YR)|/\lambda$. Assume that $B_{1}$ and $B_{2}$ are finite.

(a) If $Y^{*}-Y\leq c$ for some $c>0$ and $m(s)$ exists for all $s\in[0,\alpha)$ with $\alpha=1/c$, then for all $t>0$

$P(Y\geq t)\leq\exp\left(-\frac{t(t-2B_{1})}{2(\sigma^{2}+B_{2}+ct)}\right).$ (5)

The same upper bound holds for $P(Y\leq-t)$ if $Y-Y^{*}\leq c$, when $m(s)$ exists for all $s\in(-\alpha,0]$. If $|Y^{*}-Y|\leq c$ for some $c>0$ and $m(s)$ exists for all $s\in[0,\alpha)$ with $\alpha=2/c$, then for all $t>0$

$P(Y\geq t)\leq\exp\left(-\frac{t(t-2B_{1})}{10(\sigma^{2}+B_{2})/3+ct}\right),$ (6)

with the same upper bound for $P(Y\leq-t)$ if $Y-Y^{*}\leq c$ and $m(s)$ exists in $(-\alpha,0]$.

(b) If $Y^{*}-Y\leq c$ for some $c>0$ and $m(s)$ exists at $\alpha=(\log(t-B_{1})-\log\log(t-B_{1}))/c$, then for $t>B_{1}$,

$P(Y\geq t)\leq\exp\left(-\frac{t-B_{1}}{c}\left(\log(t-B_{1})-\log\log(t-B_{1})-\frac{\sigma^{2}+B_{2}}{c}\right)\right)$ (7)

$\leq\exp\left(-\frac{t-B_{1}}{2c}\left(\log(t-B_{1})-\frac{2(\sigma^{2}+B_{2})}{c}\right)\right).$

If $Y-Y^{*}\leq c$, then the same bound holds for the left tail $P(Y\leq-t)$ when $m(-\alpha)<\infty$.

Part (b) shows that the respective asymptotic orders, $\exp(-t/(2c))$ and $\exp(-t/c)$ as $t\rightarrow\infty$, of the bounds (5) and (6) can be improved to $\exp(-t\log t/(2c))$.

Remark 1.3
1. If $R=0$, then $B_{1}$ and $B_{2}$ are zero, and therefore the bounds (5), (6) and (7) in Theorem 1.2 coincide with (2), (3) and (4) of Goldstein and Işlak (2014), respectively. See Remark 2.1 of Goldstein and Işlak (2014) and the few paragraphs thereafter for a comparison with previous results in the literature.

2. We note that the bounds in (5) and (6) are effective only for $t>2B_{1}$, and the bound (7) only for $t>B_{1}$, where $B_{1}$ is given in the statement of the theorem. Therefore the three bounds in Theorem 1.2 are useful only if we can construct an approximate $\lambda,R$-Stein pair in such a way that $B_{1}$ or $2B_{1}$ lies within the range of the support of $Y$; otherwise, the bounds are trivial. (A small numerical helper for evaluating the bounds is sketched after this remark.)
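To make these ranges of effectiveness concrete, the following is a minimal Python sketch (our illustration, not part of the original development) that evaluates the right-hand sides of (5), (6) and (7) for given values of $\sigma^{2}$, $B_{1}$, $B_{2}$ and $c$, capping each bound at the trivial value one.

```python
import math

def tail_bounds(t, sigma2, B1, B2, c):
    """Right-hand sides of (5), (6) and (7) at t > 0, capped at 1."""
    b5 = math.exp(-t * (t - 2 * B1) / (2 * (sigma2 + B2 + c * t)))
    b6 = math.exp(-t * (t - 2 * B1) / (10 * (sigma2 + B2) / 3 + c * t))
    if t > B1 + 1:  # ensures log(t - B1) > 0, so log log(t - B1) is defined
        x = t - B1
        b7 = math.exp(-(x / c) * (math.log(x) - math.log(math.log(x))
                                  - (sigma2 + B2) / c))
    else:
        b7 = 1.0  # (7) is trivial (or undefined) in this range
    return min(b5, 1.0), min(b6, 1.0), min(b7, 1.0)
```

For instance, tail_bounds(t, 2082.80, 0.74, 2.13, 20 * 3.59) evaluates the three bounds using the quantities estimated in Experiment 1 of Section 3, where $c=20M$ as given in Theorem 2.1.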

Stein’s method has recently been used with the Ewens distribution in Gan and Ross (2021), where the authors obtained an upper bound on the total variation distance between the random partition generated by the Ewens sampling formula and the Poisson-Dirichlet distribution. The Ewens distribution has also attracted attention in other fields of mathematics for some time (see, for instance, Johnson et al. (1997), Arratia et al. (2003), Crane (2016) and da Silva et al. (2020)).

The remainder of this work is organized as follows. We present an application to Hoeffding’s statistic, where the random permutation has the Ewens distribution, in Section 2. A few simulation experiments are performed in Section 3. Finally, we devote the last section to the proof of our main theorem.

2 Hoeffding’s statistic under the Ewens measure

As introduced in the first section, we study the distribution of

$Y=\sum_{i=1}^{n}a_{i,\pi(i)}$ (8)

where $A\in\mathbb{R}^{n\times n}$ is a given real matrix with components $\{a_{i,j}\}_{i,j=1}^{n}$ and $\pi\in\mathcal{S}_{n}$ has the Ewens distribution. The original work (Hoeffding (1951)) studied the asymptotic normality of $Y$ when $\pi\in\mathcal{S}_{n}$ has the uniform distribution. We note that, in this section, we use the symbols $\pi$ and $Y$ interchangeably with $\pi'$ and $Y'$, respectively.

We first briefly describe the Ewens distribution and state some important properties. The Ewens distribution ${\cal E}_{\theta}$ on the symmetric group ${\cal S}_{n}$ with parameter $\theta>0$ was first introduced in Ewens (1972) and used in population genetics to describe the probabilities associated with the number of times that different alleles are observed in a sample; see also Arratia et al. (2003) for a description in a mathematical context. Let $\mathbb{N}_{k}=[k,\infty)\cap\mathbb{Z}$. In the following, for $x\in\mathbb{R}$ and $n\in\mathbb{N}_{1}$, we use the notations

$x^{(n)}=x(x+1)\cdots(x+n-1)$ and $x_{(n)}=x(x-1)\cdots(x-n+1)$.

For a permutation $\pi\in\mathcal{S}_{n}$, the Ewens measure is given by

$P_{\theta}(\pi)=\frac{\theta^{\#(\pi)}}{\theta^{(n)}},$ (9)

where $\#(\pi)$ denotes the number of cycles of $\pi$. We note that ${\cal E}_{\theta}$ specializes to the uniform distribution over all permutations when $\theta=1$.

A permutation $\pi_{n}\in\mathcal{S}_{n}$ with distribution ${\cal E}_{\theta}$ can be constructed by the so-called Chinese restaurant process (see e.g. Aldous (1985) and Pitman (1996)), as follows. For $n=1$, $\pi_{1}$ is the unique permutation in $\mathcal{S}_{1}$, mapping $1$ to $1$. For $n\geq 2$, we construct $\pi_{n}$ from $\pi_{n-1}$ by either adding $n$ as a fixed point, with probability $\theta/(\theta+n-1)$, or by inserting $n$ uniformly into one of the $n-1$ locations inside a cycle of $\pi_{n-1}$, each with probability $1/(\theta+n-1)$. Following this construction, for $i,k\in[n]$ where $[n]=\{1,2,\ldots,n\}$, we have

$P_{\theta}(\pi(i)=k)=\begin{cases}\frac{1}{\theta+n-1}&\text{if }k\neq i,\\ \frac{\theta}{\theta+n-1}&\text{if }k=i.\end{cases}$ (10)
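This recursive construction translates directly into a sampler. The following is a minimal Python sketch (ours, for illustration), using one-line notation with 0-indexed elements.

```python
import random

def ewens_permutation(n, theta, rng=random):
    """Sample pi ~ Ewens(theta) on S_n via the Chinese restaurant process.

    Returns pi in one-line notation (0-indexed): pi[i] is the image of i.
    """
    pi = [0]  # the unique permutation of a single element
    for m in range(1, n):  # m elements are present; insert element m
        if rng.random() < theta / (theta + m):
            pi.append(m)          # add m as a fixed point
        else:
            j = rng.randrange(m)  # insert m into a cycle, just after j
            pi.append(pi[j])      # m inherits j's old image
            pi[j] = m             # and j now maps to m
    return pi
```

Setting $\theta=1$ recovers a uniformly random permutation, consistent with the remark following (9).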

A distribution on $\mathcal{S}_{n}$ is said to be constant on cycle type if the probability of any permutation $\pi\in{\cal S}_{n}$ depends only on its cycle type $(c_{1},\ldots,c_{n})$, where $c_{q}(\pi)$ is the number of $q$-cycles of $\pi$ and we write $c_{q}$ for $c_{q}(\pi)$ for simplicity. Concentration inequalities for $Y$, where $\pi\in\mathcal{S}_{n}$ has either the uniform distribution or one which is constant over permutations with the same cycle type and having no fixed points, were obtained in Goldstein and Işlak (2014) using zero bias couplings. It follows directly from (9) that the Ewens distribution is constant on cycle type and allows fixed points. Due to this difference, the zero biasing technique does not work here, and thus we instead use approximate zero biasing, as discussed in the first section. We first follow the construction of approximate $Y$-zero bias couplings presented in Wiroonsri (2017) and then apply Theorem 1.2 to obtain concentration bounds. It is interesting to study the robustness and sensitivity of the tail probabilities when the underlying assumptions change.

We begin by stating the definitions, notations and assumptions used in Wiroonsri (2017). Letting

$a_{\bullet,\bullet}=\frac{1}{n(\theta+n-1)}\left(\theta\sum_{i=1}^{n}a_{i,i}+\sum_{i\neq j}a_{i,j}\right),$ (11)

by (10), we have

$\mathbb{E}Y=na_{\bullet,\bullet}.$ (12)

Letting

$\widehat{a}_{i,j}=a_{i,j}-a_{\bullet,\bullet},$

and using (12), we have

$\mathbb{E}\left[\sum_{i=1}^{n}\widehat{a}_{i,\pi(i)}\right]=\mathbb{E}\left[\sum_{i=1}^{n}(a_{i,\pi(i)}-a_{\bullet,\bullet})\right]=\mathbb{E}\left[\sum_{i=1}^{n}a_{i,\pi(i)}-na_{\bullet,\bullet}\right]=0.$

As a consequence, replacing $a_{i,j}$ by $\widehat{a}_{i,j}$, we may assume without loss of generality that

$\mathbb{E}Y=0$ and $a_{\bullet,\bullet}=0,$ (13)

and for simplicity we consider only the symmetric case, that is, $a_{i,j}=a_{j,i}$ for all $i,j$. Also, to rule out trivial cases, we assume in what follows that $\sigma^{2}>0$. The explicit form of $\sigma^{2}$ was calculated in Lemma 4.8 of Wiroonsri (2017) and, as discussed in Remark 4.9 there, $\sigma^{2}$ is of order $n$ when the elements of $A$ are well chosen, in the sense that they do not depend on $n$ and most of them are not the same number. In this case, there exists $N\in\mathbb{N}_{1}$ such that $\sigma^{2}>0$ for $n>N$.

The following theorem presents concentration inequalities for $Y$.

Theorem 2.1

Let $n\geq 6$ and let $A=\{a_{i,j}\}_{i,j=1}^{n}$ be an array of real numbers satisfying

$a_{i,j}=a_{j,i}.$

Let $\pi\in\mathcal{S}_{n}$ be a random permutation with distribution ${\cal E}_{\theta}$, with $\theta>0$. Then, with $Y$ the sum in (8) and

$M=\max_{i,j}|a_{i,j}-a_{\bullet,\bullet}|,\quad a_{\bullet,\bullet}$ as in (11),

the bounds in Theorem 1.2 hold with $Y$ replaced by $Y-\mathbb{E}Y$ and $c=20M$.

Moreover,

$B_{1}\leq(6n+4.8\theta)M$ and $B_{2}\leq\left(3\kappa_{\theta,n,1}+1.2\theta+\frac{1.2(\kappa_{\theta,n,1}(\theta+1)+\kappa_{\theta,n,2})}{n}\right)M\sigma,$

where

$\kappa_{\theta,n,1}=\sqrt{\frac{\theta^{2}n_{(2)}}{(\theta+n-1)_{(2)}}+\frac{\theta n}{\theta+n-1}}$ (14)

and

$\kappa_{\theta,n,2}=\sqrt{\frac{\theta^{4}n_{(4)}}{(\theta+n-1)_{(4)}}+\frac{4\theta^{3}n_{(3)}}{(\theta+n-1)_{(3)}}+\frac{2\theta^{2}n_{(2)}}{(\theta+n-1)_{(2)}}}.$ (15)

In particular, if $A$ is chosen so that $|R|$ given in (16) and $e^{sY}$ are negatively correlated for $0<s<\alpha$, then

$B_{1}\leq\frac{\theta M(3.6n+2.4\theta-3)}{\theta+n-1}+\frac{\theta^{2}nM}{2(\theta+n-1)_{(2)}}.$

As mentioned earlier in Remark 1.3, the bounds in Theorem 2.1 are effective for $t\geq 2B_{1}$, where $B_{1}$ depends on the remainder term of the approximate $\lambda,R$-Stein pair, and are trivial outside this range. Hence if $2B_{1}$ lies outside the support of $Y-\mathbb{E}Y$, the bounds are not useful. The best bound on $B_{1}$ that we can obtain in general is $(6n+4.8\theta)M$, which grows with $n$; nevertheless, it improves to $\theta M(3.6n+2.4\theta-3)/(\theta+n-1)+\theta^{2}nM/(2(\theta+n-1)_{(2)})$, which is of constant order, in the case that $|R|$ and $e^{sY}$ are negatively correlated for $0<s<\alpha$.

To prove Theorem 2.1, we apply Theorem 1.2 from the introduction. That is, we construct an approximate $Y$-zero bias coupling $Y^{*}$ from an approximate $\lambda,R$-Stein pair $(Y,Y'')$ using Lemma 1.1, in such a way that $|Y^{*}-Y|$ and $\mathbb{E}[R|Y]$ are bounded and $|\mathbb{E}(YR)|<\infty$. For this purpose, we state the following two lemmas. The first, Lemma 2.2, proved in Wiroonsri (2017), constructs an approximate $4/n,R$-Stein pair $(Y',Y'')$ for $n\geq 6$, where $Y'$ is given by (8) with $\pi'$ replacing $\pi$. Bounds related to the remainder term $R$ are then obtained in Lemma 2.3. Below, for $i,j\in[n]$, we write $i\sim j$ if $i$ and $j$ are in the same cycle, let $|i|$ be the length of the cycle containing $i$, and let $\tau_{i,j}$, $i,j\in[n]$, be the permutation that transposes $i$ and $j$.

Lemma 2.2 (Wiroonsri (2017))

For $n\geq 6$, let $\{a_{i,j}\}_{i,j=1}^{n}$ be an array of real numbers satisfying $a_{i,j}=a_{j,i}$ and $a_{\bullet,\bullet}=0$, where $a_{\bullet,\bullet}$ is as in (11). Let $\pi\in\mathcal{S}_{n}$ be a random permutation with the Ewens measure ${\cal E}_{\theta}$, $\theta>0$, and let $Y'$ be given by (8). Further, let $I,J$ be chosen independently of $\pi$, uniformly from all pairs of distinct elements of $\{1,\ldots,n\}$. Then, letting $\pi''=\tau_{I,J}\pi\tau_{I,J}$ and $Y''$ be given by (8) with $\pi''$ replacing $\pi$, the pair $(Y',Y'')$ is an approximate $4/n,R$-Stein pair with

$R(Y')=\frac{1}{n(n-1)}\mathbb{E}[T|Y']$ (16)

where

$T=2(n+c_{1}-2(\theta+1))\sum_{|i|=1}a_{i,i}-2(c_{1}-2\theta)\sum_{|i|\geq 2}a_{i,i}-4\sum_{|i|=1,|j|=1,j\neq i}a_{i,j}-4\sum_{|i|=1,|j|\geq 2}a_{i,j}.$ (17)
Lemma 2.3

Let $(Y',Y'')$ be an approximate $4/n,R$-Stein pair constructed as in Lemma 2.2 with $R$ as in (16). Then

$|\mathbb{E}[R|Y]|\leq\frac{10n+8\theta}{n-1}M$ a.s., (18)

$\mathbb{E}|R|\leq\frac{\theta M(12n+8\theta-10)}{(n-1)(\theta+n-1)}+\frac{2\theta^{2}M}{(\theta+n-1)_{(2)}},$ (19)

and

$|\mathbb{E}Y'R|\leq\left(10\kappa_{\theta,n,1}+4\theta+\frac{4(\kappa_{\theta,n,1}(\theta+1)+\kappa_{\theta,n,2})}{n}\right)\frac{M\sigma}{n-1}$ (20)

where $M=\max_{i,j}|a_{i,j}-a_{\bullet,\bullet}|$ with $a_{\bullet,\bullet}$ as in (11), and $\kappa_{\theta,n,1}$ and $\kappa_{\theta,n,2}$ are given in (14) and (15), respectively.

Proof: The bounds (19) on $\mathbb{E}|R|$ and (20) on $|\mathbb{E}Y'R|$ were shown in Wiroonsri (2017). To prove (18), since $|\mathbb{E}[R|Y]|=|\mathbb{E}[T|Y]|/(n(n-1))$ is bounded almost surely by the essential supremum of $|T|/(n(n-1))$, it suffices to bound $|T|$ in (17). For any fixed $\pi\in\mathcal{S}_{n}$, using that $|\{i\in[n]:|i|=1\}|=c_{1}$ and $|\{i\in[n]:|i|\geq 2\}|=n-c_{1}$, we have

$|T|\leq 4\theta nM$ a.s. if $c_{1}=0$,

and, if $c_{1}>0$,

$|T|\leq(2n+2c_{1}+4\theta+4)c_{1}M+(2c_{1}+4\theta)(n-c_{1})M+4c_{1}(c_{1}-1)M+4c_{1}(n-c_{1})M$ a.s.

Using that $c_{1}\leq n$ and that $c_{1}(n-c_{1})$ is maximized at $c_{1}=n/2$, it follows that

$|T|\leq(10n^{2}+8\theta n)M$ a.s.

\Box

Now we have all ingredients to prove Theorem 2.1.

Proof of Theorem 2.1: First we construct an approximate $4/n,R$-Stein pair $(Y',Y'')$ as in Lemma 2.2 with the remainder $R$ given in (16). Then, constructing an approximate $Y$-zero bias variable $Y^{*}$ from the approximate Stein pair $(Y',Y'')$ as in the proof of Theorem 4.1 of Wiroonsri (2017), we have $|Y^{*}-Y|\leq 20M$. Invoking Theorem 1.2 with $\lambda=4/n$, using the bounds (18), (19) and (20) of Lemma 2.3, completes the proof. $\Box$


3 Simulation

In this section, we perform four simulation experiments. In each, $A$ is generated by first drawing each entry of $X$ from $N(1,0.2)$ or $N(-1,0.2)$ with equal probability, and then letting $B=X+X^{T}$ and $A=B-b_{\bullet,\bullet}$, where $b_{\bullet,\bullet}$ is defined similarly to (11). We intentionally select $X$ such that $\mathrm{Cov}(|R|,e^{sY})<0$ for all $0<s<\alpha$, with $\alpha$ as defined in Theorem 1.2, so that the negative correlation assumption there is satisfied. To generate Ewens permutations with $\theta\neq 1$, we apply the accept-reject algorithm (see Robert and Casella (2013)). Recall that the discrete version of the accept-reject algorithm is stated as follows.

Theorem 3.1 (Von Neumann (1951))

Let $Y$ and $V$ be discrete random variables with probability mass functions $f_{Y}(y)$ and $f_{V}(v)$, respectively, where $f_{Y}$ and $f_{V}$ have common support. Define

$C=\sup_{y}\frac{f_{Y}(y)}{f_{V}(y)}<\infty.$

To generate a random variable $Y\sim f_{Y}$:

(a) Generate $U\sim\mathrm{Uniform}(0,1)$ and $V\sim f_{V}$, independently.

(b) If $U\leq\frac{f_{Y}(V)}{Cf_{V}(V)}$, set $Y=V$; otherwise, return to step (a).

In our case, $Y$ and $V$ are $\mathcal{E}_{\theta}$ with $\theta\neq 1$ and uniform, respectively, with common support the set of all permutations $\pi\in\mathcal{S}_{n}$. Recall that the probability mass functions of $Y$ and $V$ are

$f_{Y}(\pi)=\frac{\theta^{\#(\pi)}}{\theta^{(n)}}$ and $f_{V}(\pi)=\frac{1}{n!},$

where $\#(\pi)$ denotes the number of cycles of $\pi$. It is easy to see that

$C=\begin{cases}\frac{n!\,\theta}{\theta^{(n)}}&\text{if }\theta<1,\\ \frac{n!\,\theta^{n}}{\theta^{(n)}}&\text{if }\theta>1.\end{cases}$
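A minimal Python sketch of this sampler follows (our illustration, with a helper for counting cycles). Note that the acceptance ratio $f_{Y}(V)/(Cf_{V}(V))$ simplifies to $\theta^{\#(V)-1}$ when $\theta<1$ and to $\theta^{\#(V)-n}$ when $\theta>1$, so neither $n!$ nor $\theta^{(n)}$ needs to be computed.

```python
import random

def num_cycles(pi):
    """Number of cycles of a permutation in one-line notation (0-indexed)."""
    seen = [False] * len(pi)
    count = 0
    for i in range(len(pi)):
        if not seen[i]:
            count += 1
            j = i
            while not seen[j]:  # walk the cycle containing i
                seen[j] = True
                j = pi[j]
    return count

def ewens_accept_reject(n, theta, rng=random):
    """Sample pi ~ Ewens(theta), theta != 1, by accept-reject from the
    uniform distribution on S_n."""
    while True:
        v = list(range(n))
        rng.shuffle(v)  # uniform proposal V
        k = num_cycles(v)
        ratio = theta ** (k - 1) if theta < 1 else theta ** (k - n)
        if rng.random() <= ratio:  # accept with probability f_Y/(C f_V)
            return v
```

The expected number of proposals per accepted permutation is $C$; for $n=10$ and $\theta=2$ this gives $C=10!\,2^{10}/2^{(10)}\approx 93.1$, in line with the average of 93.41 iterations reported in Experiment 4 below.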

3.1 Experiment 1

Let $n=1000$ and $\theta=1$. Simulating $Y$ and $R$ 1,000,000 times, we obtain $\sigma^{2}=2082.80$, $B_{1}=0.74$, $B_{2}=2.13$, $M=3.59$, support $=[-2795.86,2803.19]$ and $\mathrm{Cov}(Y,|R|)=-0.0005$ from the sample. Figures 1 and 2 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5) and (6), respectively. Note that we intentionally do not show (7), since $\mathrm{Cov}(e^{sY},|R|)>0$ at $\alpha=(\log(t-B_{1})-\log\log(t-B_{1}))/c$ for large $t$. Since $\theta=1$ is equivalent to the uniform distribution on $\pi\in\mathcal{S}_{n}$, we also plot the bound from Goldstein and Işlak (2014). That bound is clearly smaller, as our coupling construction yields a larger $|Y-Y^{*}|$ and our bound is derived so as to handle the remainder term, and is therefore not as tight; however, the bound from Goldstein and Işlak (2014) does not apply when $\theta\neq 1$.

Figure 1: $\mathrm{Cov}(e^{sY},|R|)$ for $0\leq s\leq 2/c$.
Figure 2: Plot of $P(Y>t)$ when $n=1000$, $\theta=1$.

3.2 Experiment 2

Since generating Ewens permutations with $\theta\neq 1$ requires an acceptance-rejection step, we are not able to generate a very large sample with large $n$. Let $n=1000$ and $\theta=0.8$; with $\theta<1$, permutations with a large number of cycles are less likely. Simulating $Y$ and $R$ 10,000 times, we obtain $\sigma^{2}=507.20$, $B_{1}=0.29$, $B_{2}=0.28$, $M=1.60$, support $=[-1400.45,1398.01]$ and $\mathrm{Cov}(Y,|R|)=-0.001$ from the sample. Figures 3 and 4 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5) and (6), respectively. Note that the bound (7) is too large in this case because of the term $\frac{2(\sigma^{2}+B_{2})}{c}$.

Figure 3: $\mathrm{Cov}(e^{sY},|R|)$ for $0\leq s\leq 2/c$.
Figure 4: Plot of $P(Y>t)$ when $n=1000$, $\theta=0.8$.

3.3 Experiment 3

As $\theta$ increases, it becomes more expensive to generate a permutation. We now let $n=100$ and $\theta=1.05$; with $\theta>1$, permutations with a large number of cycles are more likely. Simulating $Y$ and $R$ 10,000 times, we obtain $\sigma^{2}=212.75$, $B_{1}=0.88$, $B_{2}=2.42$, $M=3.15$, support $=[-255.59,260.86]$ and $\mathrm{Cov}(Y,|R|)=-0.015$ from the sample. Figures 5 and 6 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5), (6) and (7), respectively.

Figure 5: $\mathrm{Cov}(e^{sY},|R|)$ for $s=(\log(t-B_{1})-\log\log(t-B_{1}))/c$.
Figure 6: Plot of $P(Y>t)$ when $n=100$, $\theta=1.05$.

3.4 Experiment 4

For this experiment, we illustrate the case where $\theta$ is much greater than 1, which results in permutations with a much larger number of cycles. However, the simulation is very expensive: it takes 93.41 iterations on average to obtain a single permutation even with $n=10$ and $\theta=2$. Based on our simulation with sample size 10,000, the mean number of cycles is 4.03. Simulating $Y$ and $R$ 10,000 times, we obtain $\sigma^{2}=23.14$, $B_{1}=1.27$, $B_{2}=4.60$, $M=2.70$, support $=[-20.79,20.95]$ and $\mathrm{Cov}(Y,|R|)=-0.69$ from the sample. Figures 7 and 8 show $\mathrm{Cov}(e^{sY},|R|)$ for $0<s<\alpha$ and the plot of $P(Y>t)$ compared with the bounds from (5), (6) and (7), respectively.

Figure 7: $\mathrm{Cov}(e^{sY},|R|)$ for $s=(\log(t-B_{1})-\log\log(t-B_{1}))/c$.
Figure 8: Plot of $P(Y>t)$ when $n=10$, $\theta=2$.

3.5 Conclusion from the simulation results

We notice from the four experiments that our bounds appear more useful for smaller $\theta$, especially when $\theta\leq 1$. For $\theta>1$, the bounds are not sufficiently small even at the upper limit of the support. However, our experiments with $\theta>1$ were only performed with small $n$, so it would be interesting to see whether our bounds work better with larger $n$.

Also, based on our experiments, the bound (6) dominates the others in all cases. Since the bound (7) actually has the fastest decay rate, at $\exp(-t\log t)$, it should become better when we consider $Y$ with a larger support.

4 Proof of the main result

In this section, we follow the same technique as in Goldstein and Işlak (2014), but handle a few extra terms that involve $R$.


Proof of Theorem 1.2: Let $m(s)=\mathbb{E}[e^{sY}]$ and $m^{*}(s)=\mathbb{E}[e^{sY^{*}}]$. Using that $Y^{*}-Y\leq c$, for $s\geq 0$ we have

$m^{*}(s)=\mathbb{E}[e^{s(Y^{*}-Y)}e^{sY}]\leq\mathbb{E}[e^{cs}e^{sY}]=e^{cs}m(s).$ (21)

(a) If $m(s)$ exists in an open interval containing $s$, we may interchange expectation and differentiation at $s$ to obtain

$m'(s)=\mathbb{E}[Ye^{sY}]=\sigma^{2}\mathbb{E}[se^{sY^{*}}]-\frac{\mathbb{E}YR}{\lambda}\mathbb{E}[se^{sY^{*}}]+\frac{\mathbb{E}[Re^{sY}]}{\lambda}=\sigma^{2}sm^{*}(s)-\frac{\mathbb{E}YR}{\lambda}sm^{*}(s)+\frac{\mathbb{E}[Re^{sY}]}{\lambda},$ (22)

where we have used the approximate zero bias relation in the second equality.

For $\alpha\in(0,1/c)$ and $0\leq s\leq\alpha$, using $e^{x}\leq 1/(1-x)$ for $x\in[0,1)$, we have

$m^{*}(s)\leq e^{cs}m(s)\leq\frac{1}{1-sc}m(s).$ (23)

Using (22) and applying (23), we have

$m'(s)\leq\frac{\sigma^{2}s}{1-sc}m(s)+\frac{|\mathbb{E}(YR)|}{\lambda}\frac{s}{1-sc}m(s)+\frac{|\mathbb{E}(Re^{sY})|}{\lambda}=\frac{\sigma^{2}s}{1-sc}m(s)+\frac{|\mathbb{E}(YR)|}{\lambda}\frac{s}{1-sc}m(s)+\frac{|\mathbb{E}[e^{sY}\mathbb{E}[R|Y]]|}{\lambda}\leq\frac{(\sigma^{2}+|\mathbb{E}(YR)|/\lambda)s}{1-sc}m(s)+\frac{\mathbb{E}[e^{sY}|\mathbb{E}[R|Y]|]}{\lambda}.$ (24)

Recalling from the statement of the theorem that $B_{1}=\mathbb{E}|R|/\lambda$ if $|R|$ and $e^{sY}$ are negatively correlated for $0<s<\alpha$, that $B_{1}=\operatorname{ess\,sup}|\mathbb{E}[R|Y]|/\lambda$ otherwise, and that $B_{2}=|\mathbb{E}(YR)|/\lambda$, it follows from the last expression that

$m'(s)\leq\frac{(\sigma^{2}+B_{2})s}{1-sc}m(s)+B_{1}m(s).$

Dividing both sides by $m(s)$, integrating over $[0,\alpha]$ and using that $m(0)=1$ yield

$\log(m(\alpha))=\int_{0}^{\alpha}\frac{m'(s)}{m(s)}\,ds\leq\frac{\sigma^{2}+B_{2}}{1-\alpha c}\int_{0}^{\alpha}s\,ds+B_{1}\int_{0}^{\alpha}ds=\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}+B_{1}\alpha,$

and therefore

$m(\alpha)\leq\exp\left(\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}+B_{1}\alpha\right).$

We note that the claim for $t=0$ is clear, as the right hand side of (5) is one. Applying Markov’s inequality for $t>0$, we obtain

$P(Y\geq t)=P(e^{\alpha Y}\geq e^{\alpha t})\leq e^{-\alpha t}m(\alpha)\leq\exp\left(-\alpha(t-B_{1})+\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}\right).$

Letting $\alpha=t/(\sigma^{2}+B_{2}+ct)$, which lies in $(0,1/c)$, (5) holds. The result for the left tail when $Y-Y^{*}\leq c$ follows immediately from the same proof by letting $X=-Y$ and using the fact that $X^{*}=-Y^{*}$ has the approximate $X$-zero bias distribution with $R$ replaced by $-R$. We also note that the results for the left tail in the other cases below follow by the same argument, so it suffices to prove only the results for the right tail.
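For completeness, this choice of $\alpha$ can be checked directly (a short verification added here): since $\alpha(\sigma^{2}+B_{2}+ct)=t$,

$1-\alpha c=\frac{\sigma^{2}+B_{2}}{\sigma^{2}+B_{2}+ct}\quad\text{and}\quad\frac{(\sigma^{2}+B_{2})\alpha^{2}}{2(1-\alpha c)}=\frac{\alpha^{2}(\sigma^{2}+B_{2}+ct)}{2}=\frac{\alpha t}{2},$

so the exponent becomes $-\alpha(t-B_{1})+\alpha t/2=-\alpha(t-2B_{1})/2=-t(t-2B_{1})/(2(\sigma^{2}+B_{2}+ct))$, which is the exponent in (5).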

Next we prove (6). Following the same argument as in Goldstein and Işlak (2014) and using that

$e^{y}-e^{x}\leq\frac{|y-x|(e^{y}+e^{x})}{2}$ for all $x$ and $y$,

for $\alpha\in(0,2/c)$ and $0\leq s\leq\alpha$, we have

$e^{sY^{*}}-e^{sY}\leq\frac{|s(Y^{*}-Y)|(e^{sY^{*}}+e^{sY})}{2}.$

Taking expectations and simplifying, we obtain

$m^{*}(s)\leq\left(\frac{1+cs/2}{1-cs/2}\right)m(s).$

Using (24) and proceeding as in the first bound yield

$m(\alpha)\leq\exp\left(\frac{5(\sigma^{2}+B_{2})\alpha^{2}}{6(1-c\alpha/2)}+B_{1}\alpha\right)$ for all $\alpha\in(0,2/c)$.

We note that the claim for $t=0$ is clear, as the right hand side of (6) is one. Again applying Markov’s inequality for $t>0$, we have

$P(Y\geq t)\leq\exp\left(-\alpha(t-B_{1})+\frac{5(\sigma^{2}+B_{2})\alpha^{2}}{6(1-c\alpha/2)}\right).$

Letting $\alpha=2t/(10(\sigma^{2}+B_{2})/3+ct)$, which lies in $(0,2/c)$, (6) follows.
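Again, this choice of $\alpha$ can be verified directly (a short check added here): since $\alpha(10(\sigma^{2}+B_{2})/3+ct)=2t$,

$1-\frac{c\alpha}{2}=\frac{10(\sigma^{2}+B_{2})/3}{10(\sigma^{2}+B_{2})/3+ct}\quad\text{and}\quad\frac{5(\sigma^{2}+B_{2})\alpha^{2}}{6(1-c\alpha/2)}=\frac{\alpha^{2}}{4}\left(\frac{10(\sigma^{2}+B_{2})}{3}+ct\right)=\frac{\alpha t}{2},$

so the exponent equals $-\alpha(t-B_{1})+\alpha t/2=-t(t-2B_{1})/(10(\sigma^{2}+B_{2})/3+ct)$, matching (6).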

(b) For any $s\in[0,\alpha)$ such that $m(\alpha)$ exists, using (22), (21), and the definitions of $B_{1}$ and $B_{2}$ recalled above, we have

$m'(s)=\sigma^{2}sm^{*}(s)-\frac{\mathbb{E}YR}{\lambda}sm^{*}(s)+\frac{\mathbb{E}[Re^{sY}]}{\lambda}\leq\sigma^{2}sm^{*}(s)-\frac{\mathbb{E}YR}{\lambda}sm^{*}(s)+\frac{\mathbb{E}[e^{sY}|\mathbb{E}[R|Y]|]}{\lambda}\leq(\sigma^{2}+B_{2})se^{cs}m(s)+B_{1}m(s)$

and thus

$(\log m(s))'\leq(\sigma^{2}+B_{2})se^{cs}+B_{1}.$

Integrating over $s\in[0,\alpha]$, we obtain

$\log m(\alpha)\leq\frac{\sigma^{2}+B_{2}}{c^{2}}(e^{c\alpha}(c\alpha-1)+1)+B_{1}\alpha,$

and hence

$m(\alpha)\leq\exp\left(\frac{\sigma^{2}+B_{2}}{c^{2}}(e^{c\alpha}(c\alpha-1)+1)+B_{1}\alpha\right).$
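In the integration above we used the elementary identity (recorded here for completeness)

$\int_{0}^{\alpha}se^{cs}\,ds=\left[\frac{e^{cs}(cs-1)}{c^{2}}\right]_{0}^{\alpha}=\frac{e^{c\alpha}(c\alpha-1)+1}{c^{2}},$

which follows since $\frac{d}{ds}\,e^{cs}(cs-1)/c^{2}=se^{cs}$.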

Again, by Markov’s inequality,

$P(Y\geq t)\leq\exp\left(-\alpha(t-B_{1})+\frac{\sigma^{2}+B_{2}}{c^{2}}(e^{c\alpha}(c\alpha-1)+1)\right).$

Now, for $t>e+B_{1}$, choosing $\alpha=(\log(t-B_{1})-\log\log(t-B_{1}))/c$, we have

$P(Y\geq t)\leq\exp\left(-\frac{t-B_{1}}{c}(\log(t-B_{1})-\log\log(t-B_{1}))+\frac{\sigma^{2}+B_{2}}{c^{2}}\left(\frac{t-B_{1}}{\log(t-B_{1})}(\log(t-B_{1})-\log\log(t-B_{1})-1)+1\right)\right)\leq\exp\left(-\frac{t-B_{1}}{c}\left(\log(t-B_{1})-\log\log(t-B_{1})-\frac{\sigma^{2}+B_{2}}{c}\right)\right).$

Using that $(\log x)/2\geq\log\log x$ for all $x>1$ yields the second inequality in (7).

\Box


References

  • Aldous (1985) Aldous, D. J. (1985). Exchangeability and related topics. In École d’Été de Probabilités de Saint-Flour XIII—1983, pp.  1–198. Lecture Notes in Math., 1117, Springer, Berlin.
  • Arratia et al. (2003) Arratia, R., A. D. Barbour, and S. Tavaré (2003). Logarithmic combinatorial structures: a probabilistic approach. European Mathematical Society.
  • Arratia and Baxendale (2015) Arratia, R. and P. Baxendale (2015). Bounded size bias coupling: a gamma function bound, and universal Dickman-function behavior. Probability Theory and Related Fields 162(3), 411–429.
  • Chatterjee (2007) Chatterjee, S. (2007). Stein’s method for concentration inequalities. Probability Theory and Related Fields 138(1-2), 305–321.
  • Chen et al. (2011) Chen, L. H., L. Goldstein, and Q.-M. Shao (2011). Normal approximation by Stein’s method. Springer, New York.
  • Cook et al. (2018) Cook, N., L. Goldstein, and T. Johnson (2018). Size biased couplings and the spectral gap for random regular graphs. The Annals of Probability 46(1), 72–125.
  • Crane (2016) Crane, H. (2016). The ubiquitous Ewens sampling formula. Statistical Science 31(1), 1–19.
  • da Silva et al. (2020) da Silva, P. H., A. Jamshidpey, and S. Tavaré (2020). Random derangements and the Ewens sampling formula. arXiv preprint arXiv:2006.04840.
  • Ewens (1972) Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoretical Population Biology 3(1), 87–112.
  • Gan and Ross (2021) Gan, H. L. and N. Ross (2021). Stein’s method for the Poisson-Dirichlet distribution and the Ewens sampling formula, with applications to Wright-Fisher models. The Annals of Applied Probability 31(2), 625–667.
  • Ghosh and Goldstein (2011a) Ghosh, S. and L. Goldstein (2011a). Applications of size biased couplings for concentration of measures. Electronic Communications in Probability 16, 70–83.
  • Ghosh and Goldstein (2011b) Ghosh, S. and L. Goldstein (2011b). Concentration of measures via size-biased couplings. Probability Theory and Related Fields 149(1), 271–278.
  • Goldstein and Işlak (2014) Goldstein, L. and Ü. Işlak (2014). Concentration inequalities via zero bias couplings. Statistics & Probability Letters 86, 17–23.
  • Goldstein and Reinert (1997) Goldstein, L. and G. Reinert (1997). Stein’s method and the zero bias transformation with application to simple random sampling. The Annals of Applied Probability 7(4), 935–952.
  • Hoeffding (1951) Hoeffding, W. (1951). A combinatorial central limit theorem. Annals of Mathematical Statistics 22(4), 558–566.
  • Johnson et al. (1997) Johnson, N. L., S. Kotz, and N. Balakrishnan (1997). Discrete multivariate distributions. Wiley, New York.
  • Laipaporn and Neammanee (2006) Laipaporn, K. and K. Neammanee (2006). A non-uniform concentration inequality for randomized orthogonal array sampling designs. Thai Journal of Mathematics 4(1), 11–34.
  • Pitman (1996) Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. Lecture Notes-Monograph Series, 245–267.
  • Rattanawong (2013) Rattanawong, P. (2013). A non-uniform concentration inequality for a random permutation sum. Communications in Statistics-Theory and Methods 42(10), 1879–1888.
  • Robert and Casella (2013) Robert, C. and G. Casella (2013). Monte Carlo statistical methods. Springer Verlag, New York.
  • Ross (2011) Ross, N. (2011). Fundamentals of Stein’s method. Probability Surveys 8, 210–293.
  • Stein (1972) Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Prob. Univ. of California Press 2, 583–602.
  • Talagrand (1995) Talagrand, M. (1995). Concentration of measure and isoperimetric inequalities in product spaces. Publications Mathématiques de l’Institut des Hautes Etudes Scientifiques 81(1), 73–205.
  • Von Neumann (1951) Von Neumann, J. (1951). Various techniques used in connection with random digits. Notes by G. E. Forsythe. National Bureau of Standards Applied Math Series 12, 36–38. Reprinted in Collected Works, Vol. V.
  • Wiroonsri (2017) Wiroonsri, N. (2017). Stein’s method using approximate zero bias couplings with applications to combinatorial central limit theorems under the Ewens distribution. ALEA, Lat. Am. J. Probab. Math. Stat. 14, 917–946.