
On the Le Cam distance between multivariate hypergeometric
and multivariate normal experiments

Frédéric Ouimet. McGill University, Montreal, QC H3A 0B9, Canada; California Institute of Technology, Pasadena, CA 91125, USA. [email protected]
Abstract

In this short note, we develop a local approximation for the log-ratio of the multivariate hypergeometric probability mass function over the corresponding multinomial probability mass function. In conjunction with the bounds from Carter [4] and Ouimet [14] on the total variation between the law of a multinomial vector jittered by a uniform on $(-1/2,1/2)^{d}$ and the law of the corresponding multivariate normal distribution, the local expansion for the log-ratio is then used to obtain a total variation bound between the law of a multivariate hypergeometric random vector jittered by a uniform on $(-1/2,1/2)^{d}$ and the law of the corresponding multivariate normal distribution. As a corollary, we find an upper bound on the Le Cam distance between multivariate hypergeometric and multivariate normal experiments.

keywords:
multivariate hypergeometric distribution, sampling without replacement, multinomial distribution, normal approximation, Gaussian approximation, local approximation, local limit theorem, asymptotic statistics, multivariate normal distribution, Le Cam distance, total variation, deficiency, comparison of experiments
MSC:
[2020] Primary: 62E20, 62B15; Secondary: 60F99, 60E05, 62H10, 62H12
journal: Results in Mathematics

1 Introduction

Let $d\in\mathbb{N}$. The $d$-dimensional (unit) simplex and its interior are defined by

\mathcal{S}_{d} \vcentcolon= \big\{\boldsymbol{x}\in[0,1]^{d} : \|\boldsymbol{x}\|_{1}\leq 1\big\}, \qquad \mathrm{Int}(\mathcal{S}_{d}) \vcentcolon= \big\{\boldsymbol{x}\in(0,1)^{d} : \|\boldsymbol{x}\|_{1}<1\big\}, \qquad (1.1)

where $\|\boldsymbol{x}\|_{1}\vcentcolon=\sum_{i=1}^{d}|x_{i}|$ denotes the $\ell^{1}$ norm on $\mathbb{R}^{d}$. Given a set of probability weights $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}(\mathcal{S}_{d})$, the probability mass function of the multivariate hypergeometric distribution, $\mathrm{Hypergeometric}(N,n,\boldsymbol{p})$, is defined, following Johnson et al. [6, Chapter 39], as

P_{N,n,\boldsymbol{p}}(\boldsymbol{k}) \vcentcolon= \frac{\prod_{i=1}^{d+1}\binom{Np_{i}}{k_{i}}}{\binom{N}{n}}, \quad \boldsymbol{k}\in\mathbb{K}_{d}, \qquad (1.2)

where $p_{d+1}\vcentcolon=1-\|\boldsymbol{p}\|_{1}>0$, $k_{d+1}\vcentcolon=n-\|\boldsymbol{k}\|_{1}$, $n,N\in\mathbb{N}$ with $n\leq N$, and

\mathbb{K}_{d} \vcentcolon= \big\{\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d} : k_{i}\in[0,Np_{i}]\ \text{for all } 1\leq i\leq d+1\big\}. \qquad (1.3)

This distribution represents the first $d$ components of the vector of categorical sample counts obtained when a random sample of $n$ objects, drawn from a finite population of $N$ objects, is sorted into $d+1$ categories, where $p_{i}$, $1\leq i\leq d+1$, is the probability that any given object falls in the $i$-th category.

Our first main goal in this paper is to develop a local approximation for the log-ratio of the multivariate hypergeometric probability mass function (1.2) over the $\mathrm{Multinomial}(n,\boldsymbol{p})$ probability mass function, namely

Q_{n,\boldsymbol{p}}(\boldsymbol{k}) \vcentcolon= \frac{n!}{\prod_{i=1}^{d+1}k_{i}!}\prod_{i=1}^{d+1}p_{i}^{k_{i}}, \quad \boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}. \qquad (1.4)

This latter distribution has the same interpretation as (1.2) above, except that the population from which the $n$ objects are drawn is infinite ($N=\infty$). Another way of distinguishing $P_{N,n,\boldsymbol{p}}$ and $Q_{n,\boldsymbol{p}}$ for a finite population of $N$ objects is to say that we sample the $n$ objects without replacement and with replacement, respectively. In both cases, the categorical probabilities $(\boldsymbol{p},p_{d+1})$ are the same. For good general references on normal approximations, we refer the reader to Bhattacharya & Ranga Rao [2] and Kolassa [7].
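To make the two sampling models concrete, here is a minimal numerical sketch (assuming a recent SciPy; the values of $N$, $n$, $\boldsymbol{p}$ and $\boldsymbol{k}$ below are hypothetical choices for illustration, not taken from the paper) that evaluates the probability mass functions (1.2) and (1.4) on the full vector of $d+1$ counts.

```python
import numpy as np
from scipy.stats import multivariate_hypergeom, multinomial

# Hypothetical example with d = 2 (three categories).
N, n = 60, 12                      # population size and sample size
p = np.array([0.25, 0.35, 0.40])   # (p_1, ..., p_{d+1}), chosen so that N * p_i is an integer
m = np.rint(N * p).astype(int)     # category sizes N * p_i in the population
k = np.array([3, 4, 5])            # observed counts (k_1, ..., k_{d+1}), summing to n

P = multivariate_hypergeom(m, n).pmf(k)   # sampling without replacement, as in (1.2)
Q = multinomial(n, p).pmf(k)              # sampling with replacement, as in (1.4)
print(P, Q, np.log(P / Q))                # log-ratio studied in Theorem 1 below
```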

Our second main goal is to prove an upper bound on the total variation between the probability measure on $\mathbb{R}^{d}$ induced by a random vector distributed according to $P_{N,n,\boldsymbol{p}}$ and jittered by a uniform on $(-1/2,1/2)^{d}$, and the probability measure on $\mathbb{R}^{d}$ induced by a multivariate normal random vector with the same mean and covariances as a random vector distributed according to $Q_{n,\boldsymbol{p}}$, namely $n\boldsymbol{p}$ and $n(\mathrm{diag}(\boldsymbol{p})-\boldsymbol{p}\boldsymbol{p}^{\top})$. The proof makes use of the bound from Ouimet [14, Lemma 3.1] (which improved Lemma 2 in [4]) on the total variation between the probability measure on $\mathbb{R}^{d}$ induced by a multinomial vector distributed according to $Q_{n,\boldsymbol{p}}$ and jittered by a uniform on $(-1/2,1/2)^{d}$, and the probability measure on $\mathbb{R}^{d}$ induced by a multivariate normal random vector with the same mean and covariances. As pointed out by Mattner & Schulz [11, p. 732], the univariate case here would be much simpler, since Morgenstern [12, pp. 62-63] showed that the hypergeometric probability mass function can be written as a ratio of three binomial probability mass functions, and local limit theorems are well known for the binomial distribution; see, e.g., Prokhorov [15] and Govindarajulu [5].

The deficiency between one statistical experiment and another measures the loss of information incurred when inferences in the second setting are carried out using information from the first setting. This loss of information goes in both directions, but the deficiency is not necessarily symmetric. The maximum of the two deficiencies is called the Le Cam distance (or $\Delta$-distance in [8]). The usefulness of this notion comes from the fact that seemingly completely different statistical experiments can result in asymptotically equivalent inferences when Markov kernels are used to carry information from one setting to the other. For instance, it was famously shown by Nussbaum [13] that the density estimation problem and the Gaussian white noise problem are asymptotically equivalent, in the sense that the Le Cam distance between the two experiments goes to 0 as the number of observations goes to infinity. The main idea was that the information we get from sampling observations from an unknown density function and counting the observations that fall in the various boxes of a fine partition of the density's support can be encoded using the increments of a properly scaled Brownian motion with drift $t\mapsto\int_{0}^{t}\sqrt{f(s)}\,{\rm d}s$, and vice versa. An alternative (simpler) proof of this asymptotic equivalence was given by Brown et al. [3], who combined a Haar wavelet cascade scheme with coupling inequalities relating the binomial and univariate normal distributions at each step (a similar argument was developed previously by Carter [4] to derive a multinomial/multivariate normal coupling inequality). Not only did Brown et al. [3] streamline the proof of the asymptotic equivalence originally shown by Nussbaum [13], but their results hold for a larger class of densities, and the asymptotic equivalence was also extended to Poisson processes. Our third main result in the present paper extends the multinomial/multivariate normal comparison from [4] (revisited and improved by Ouimet [14], who removed the inductive part of the argument) to the multivariate hypergeometric/multivariate normal comparison (recall from (1.4) that the multinomial distribution is just the limiting case $N=\infty$ of the multivariate hypergeometric distribution). For an excellent and concise review of Le Cam's theory for the comparison of statistical models, we refer the reader to Mariucci [10].

The three results we have just described are presented in Section 2, and the related proofs are gathered in Section 3. We now give some motivation for these results. First, we believe that the first two results (the local expansion of the log-ratio and the total variation bound) could help in developing asymptotic Berry-Esseen type bounds for the symmetric multivariate hypergeometric distribution and the symmetric multinomial distribution, similar to the exact optimal bounds proved recently by Mattner & Schulz [11] in the univariate setting. Second, there might be a way to use the Le Cam distance upper bound between multivariate hypergeometric and multivariate normal experiments to extend the results on the asymptotic equivalence between the density estimation problem and the Gaussian white noise problem shown by Nussbaum [13] and Brown et al. [3].

Remark 1.

Throughout the paper, the notation $u=\mathcal{O}(v)$ means that $\limsup_{N\to\infty}|u/v|<C$, where $C\in(0,\infty)$ is a universal constant. Whenever $C$ might depend on a parameter, we add a subscript (for example, $u=\mathcal{O}_{d}(v)$).

2 Results

Our first main result is an asymptotic expansion for the log-ratio of the multivariate hypergeometric probability mass function (1.2) over the corresponding multinomial probability mass function (1.4).

Theorem 1 (Local limit theorem for the log-ratio).

Assume that $n,N\in\mathbb{N}$ with $n\leq N$ and $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}(\mathcal{S}_{d})$ hold, and pick any $\gamma\in(0,1)$. Then, uniformly for $\boldsymbol{k}\in\mathbb{K}_{d}$ such that $\max_{1\leq i\leq d+1}(k_{i}/p_{i})\leq\gamma N$ and $n\leq\gamma N$, we have, as $N\to\infty$,

\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right) = \frac{1}{N}\left[\bigg(\frac{n^{2}}{2}-\frac{n}{2}\bigg)-\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\bigg(\frac{k_{i}^{2}}{2}-\frac{k_{i}}{2}\bigg)\right]+\mathcal{O}_{\gamma}\left(\frac{1}{N^{2}}\left[n^{3}+\sum_{i=1}^{d+1}\frac{k_{i}^{3}}{p_{i}^{2}}\right]\right), \qquad (2.1)

and

\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right) = \frac{1}{N}\left[\bigg(\frac{n^{2}}{2}-\frac{n}{2}\bigg)-\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\bigg(\frac{k_{i}^{2}}{2}-\frac{k_{i}}{2}\bigg)\right] \qquad (2.2)
\quad + \frac{1}{N^{2}}\left[\bigg(\frac{n^{3}}{6}-\frac{n^{2}}{4}+\frac{n}{12}\bigg)-\sum_{i=1}^{d+1}\frac{1}{p_{i}^{2}}\cdot\bigg(\frac{k_{i}^{3}}{6}-\frac{k_{i}^{2}}{4}+\frac{k_{i}}{12}\bigg)\right]
\quad + \mathcal{O}_{\gamma}\left(\frac{1}{N^{3}}\left[n^{4}+\sum_{i=1}^{d+1}\frac{k_{i}^{4}}{p_{i}^{3}}\right]\right).
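As a quick numerical sanity check of (2.2) (not part of the paper; all parameter values below are hypothetical), the exact log-ratio can be computed with log-gamma functions and compared to the two explicit terms of the expansion.

```python
import numpy as np
from scipy.special import gammaln

N, n = 10_000, 40
p = np.array([0.25, 0.35, 0.40])   # (p_1, ..., p_{d+1}) with N * p_i integers
k = np.array([9, 14, 17])          # counts summing to n
m = np.rint(N * p).astype(int)     # category sizes N * p_i

def log_binom(a, b):
    # log of the binomial coefficient C(a, b), computed via log-gamma
    return gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1)

# Exact log-ratio of (1.2) over (1.4)
logP = log_binom(m, k).sum() - log_binom(N, n)
logQ = gammaln(n + 1) - gammaln(k + 1).sum() + (k * np.log(p)).sum()

# First two terms on the right-hand side of (2.2)
t1 = (n**2 / 2 - n / 2 - np.sum((k**2 / 2 - k / 2) / p)) / N
t2 = (n**3 / 6 - n**2 / 4 + n / 12
      - np.sum((k**3 / 6 - k**2 / 4 + k / 12) / p**2)) / N**2

print(logP - logQ, t1 + t2)        # should agree up to the O_gamma(N^{-3} ...) error
```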

The local limit theorem above, together with the total variation bound in [4, 14] between jittered multinomials and the corresponding multivariate normals, allows us to derive an upper bound on the total variation between the probability measure on $\mathbb{R}^{d}$ induced by a multivariate hypergeometric random vector jittered by a uniform random vector on $(-1/2,1/2)^{d}$ and the probability measure on $\mathbb{R}^{d}$ induced by a multivariate normal random vector with the same mean and covariances as the multinomial distribution in (1.4).

Theorem 2 (Total variation upper bound).

Assume that $n,N\in\mathbb{N}$ with $n\leq(3/4)\,N$ and $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}(\mathcal{S}_{d})$ hold. Let $\boldsymbol{K}\sim\mathrm{Hypergeometric}(N,n,\boldsymbol{p})$, $\boldsymbol{L}\sim\mathrm{Multinomial}(n,\boldsymbol{p})$, and $\boldsymbol{U},\boldsymbol{V}\sim\mathrm{Uniform}(-1/2,1/2)^{d}$, where $\boldsymbol{K}$, $\boldsymbol{L}$, $\boldsymbol{U}$ and $\boldsymbol{V}$ are assumed to be jointly independent. Define $\boldsymbol{X}\vcentcolon=\boldsymbol{K}+\boldsymbol{U}$ and $\boldsymbol{Y}\vcentcolon=\boldsymbol{L}+\boldsymbol{V}$, and let $\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}$ and $\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}$ be the laws of $\boldsymbol{X}$ and $\boldsymbol{Y}$, respectively. Also, let $\mathbb{Q}_{n,\boldsymbol{p}}$ be the law of the $\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}})$ distribution, where $\Sigma_{\boldsymbol{p}}\vcentcolon=\mathrm{diag}(\boldsymbol{p})-\boldsymbol{p}\boldsymbol{p}^{\top}$. Then, as $N\to\infty$,

\|\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\| \leq \sqrt{2\sum_{i=1}^{d+1}\left(\frac{1}{\nu_{i}}\right)^{n\nu_{i}p_{i}}\left(\frac{1-p_{i}}{1-\nu_{i}p_{i}}\right)^{n(1-\nu_{i}p_{i})}+\mathcal{O}\left(\frac{n^{2}}{N}\right)} \qquad (2.3)
\quad + \mathcal{O}\left(\frac{d}{\sqrt{n}}\sqrt{\frac{\max\{p_{1},\dots,p_{d},p_{d+1}\}}{\min\{p_{1},\dots,p_{d},p_{d+1}\}}}\right),

where $\nu_{i}\vcentcolon=\lceil p_{i}^{-1}-1\rceil$ for $1\leq i\leq d+1$, and $\|\cdot\|$ denotes the total variation norm.
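The exponential term inside the square root of (2.3) is explicit and can be evaluated directly; the following minimal sketch (hypothetical $n$ and $\boldsymbol{p}$, not from the paper) shows that it is typically much smaller than the $d/\sqrt{n}$ term for moderate $n$.

```python
import numpy as np

n = 40
p = np.array([0.25, 0.35, 0.40])           # (p_1, ..., p_{d+1})
nu = np.ceil(1.0 / p - 1.0)                # nu_i = ceil(p_i^{-1} - 1)

# Explicit sum appearing under the square root in (2.3)
term = np.sum((1.0 / nu) ** (n * nu * p)
              * ((1.0 - p) / (1.0 - nu * p)) ** (n * (1.0 - nu * p)))
print(np.sqrt(2 * term))                   # contribution of the large deviation term
print((len(p) - 1) / np.sqrt(n))           # order of the d / sqrt(n) term, for comparison
```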

Since the Le Cam distance is a pseudometric and the Markov kernel that jitters a random vector by a uniform on $(-1/2,1/2)^{d}$ is easily inverted (round off each component of the vector to the nearest integer), we find, as a consequence of the total variation bound in Theorem 2, an upper bound on the Le Cam distance between multivariate hypergeometric and multivariate normal experiments.
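For concreteness, here is a minimal sketch (hypothetical values, not part of the paper's argument) of the jittering kernel and of its inverse, the rounding kernel, used to pass between the discrete and continuous experiments.

```python
import numpy as np
rng = np.random.default_rng(0)

def T1_jitter(k):
    # add Uniform(-1/2, 1/2)^d noise to an integer count vector
    return k + rng.uniform(-0.5, 0.5, size=k.shape)

def T2_round(z):
    # invert the jittering by rounding each component to the nearest integer
    return np.rint(z).astype(int)

k = np.array([3, 4])           # hypothetical first d = 2 components of K
x = T1_jitter(k)               # X = K + U has a density on R^d
print(x, T2_round(x))          # rounding recovers the original counts
```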

Theorem 3 (Le Cam distance upper bound).

Assume that $n,N\in\mathbb{N}$ with $n\leq(3/4)\,N$ holds. For any given $R\geq 1$, let

\Theta_{R} \vcentcolon= \left\{\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}(\mathcal{S}_{d}) : \frac{\max\{p_{1},\dots,p_{d},p_{d+1}\}}{\min\{p_{1},\dots,p_{d},p_{d+1}\}}\leq R\right\}. \qquad (2.4)

Define the experiments

\mathscr{P} \vcentcolon= \{\mathbb{P}_{N,n,\boldsymbol{p}}\}_{\boldsymbol{p}\in\Theta_{R}}, \quad \mathbb{P}_{N,n,\boldsymbol{p}}~\text{is the measure induced by }\mathrm{Hypergeometric}(N,n,\boldsymbol{p}),
\mathscr{Q} \vcentcolon= \{\mathbb{Q}_{n,\boldsymbol{p}}\}_{\boldsymbol{p}\in\Theta_{R}}, \quad \mathbb{Q}_{n,\boldsymbol{p}}~\text{is the measure induced by }\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}}).

Then, for $N\geq n^{3}/d^{\,2}$, we have the following upper bound on the Le Cam distance $\Delta(\mathscr{P},\mathscr{Q})$ between $\mathscr{P}$ and $\mathscr{Q}$,

\Delta(\mathscr{P},\mathscr{Q}) \vcentcolon= \max\{\delta(\mathscr{P},\mathscr{Q}),\delta(\mathscr{Q},\mathscr{P})\} \leq C_{R}\,\frac{d}{\sqrt{n}}, \qquad (2.5)

where $C_{R}$ is a positive constant that depends only on $R$,

\delta(\mathscr{P},\mathscr{Q}) \vcentcolon= \inf_{T_{1}}\sup_{\boldsymbol{p}\in\Theta_{R}}\bigg\|\int_{\mathbb{K}_{d}}T_{1}(\boldsymbol{k},\cdot\,)\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\mathbb{Q}_{n,\boldsymbol{p}}\bigg\|, \qquad (2.6)
\delta(\mathscr{Q},\mathscr{P}) \vcentcolon= \inf_{T_{2}}\sup_{\boldsymbol{p}\in\Theta_{R}}\bigg\|\mathbb{P}_{N,n,\boldsymbol{p}}-\int_{\mathbb{R}^{d}}T_{2}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg\|,

and the infima are taken over all Markov kernels $T_{1}:\mathbb{K}_{d}\times\mathscr{B}(\mathbb{R}^{d})\to[0,1]$ and $T_{2}:\mathbb{R}^{d}\times\mathscr{B}(\mathbb{K}_{d})\to[0,1]$.

Now, consider the following multivariate normal experiments with independent components

\overline{\mathscr{Q}} \vcentcolon= \{\overline{\mathbb{Q}}_{n,\boldsymbol{p}}\}_{\boldsymbol{p}\in\Theta_{R}}, \quad \overline{\mathbb{Q}}_{n,\boldsymbol{p}}~\text{is the measure induced by }\mathrm{Normal}_{d}(n\boldsymbol{p},n\,\mathrm{diag}(\boldsymbol{p})),
\mathscr{Q}^{\star} \vcentcolon= \{\mathbb{Q}_{n,\boldsymbol{p}}^{\star}\}_{\boldsymbol{p}\in\Theta_{R}}, \quad \mathbb{Q}_{n,\boldsymbol{p}}^{\star}~\text{is the measure induced by }\mathrm{Normal}_{d}(\sqrt{n\boldsymbol{p}},\mathrm{diag}(\boldsymbol{1}/4)),

where $\boldsymbol{1}\vcentcolon=(1,1,\dots,1)^{\top}$. Using a variance-stabilizing transformation, Carter [4, Section 7] showed that

\Delta(\mathscr{Q},\overline{\mathscr{Q}}) \leq C_{R}\,\sqrt{\frac{d}{n}} \qquad\text{and}\qquad \Delta(\overline{\mathscr{Q}},\mathscr{Q}^{\star}) \leq C_{R}\,\frac{d}{\sqrt{n}}, \qquad (2.7)

with proper adjustments to the definition of the deficiencies in (2.6).
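For intuition on the second comparison in (2.7): the variance-stabilizing square-root transformation maps a $\mathrm{Normal}(np_{i},np_{i})$ component of $\overline{\mathbb{Q}}_{n,\boldsymbol{p}}$ approximately to a $\mathrm{Normal}(\sqrt{np_{i}},1/4)$ component of $\mathbb{Q}_{n,\boldsymbol{p}}^{\star}$. A minimal Monte Carlo sketch (hypothetical $n$ and $p_{i}$, not from the paper) illustrates this.

```python
import numpy as np
rng = np.random.default_rng(1)

n, p_i = 400, 0.3
Z = rng.normal(loc=n * p_i, scale=np.sqrt(n * p_i), size=100_000)
W = np.sqrt(Z)                        # n * p_i is large, so Z < 0 essentially never occurs

print(W.mean(), np.sqrt(n * p_i))     # mean of sqrt(Z) is close to sqrt(n * p_i)
print(W.var(), 0.25)                  # variance of sqrt(Z) is close to 1/4
```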

Corollary 1.

With the same notation as in Theorem 3, we have, for $N\geq n^{3}/d^{\,2}$,

\Delta(\mathscr{P},\overline{\mathscr{Q}}) \leq C_{R}\,\frac{d}{\sqrt{n}} \qquad\text{and}\qquad \Delta(\mathscr{P},\mathscr{Q}^{\star}) \leq C_{R}\,\frac{d}{\sqrt{n}}, \qquad (2.8)

where $C_{R}$ is a positive constant that depends only on $R$.

3 Proofs

Proof of Theorem 1.

Throughout the proof, the parameter $n\in\mathbb{N}$ satisfies $n\leq\gamma N$, and the asymptotic expressions are valid as $N\to\infty$. Let $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}(\mathcal{S}_{d})$ and $\boldsymbol{k}\in\mathbb{K}_{d}$. Using Stirling's formula,

\log m! = \frac{1}{2}\log(2\pi) + (m+\tfrac{1}{2})\log m - m + \frac{1}{12m} + \mathcal{O}(m^{-3}), \quad m\to\infty, \qquad (3.1)

see, e.g., Abramowitz & Stegun [1, p. 257], and taking logarithms in (1.2) and (1.4), we obtain

\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right) = \sum_{i=1}^{d+1}\log(Np_{i})! - \sum_{i=1}^{d+1}\log(Np_{i}-k_{i})! + \log(N-n)! - \log N! - \sum_{i=1}^{d+1}k_{i}\log p_{i}
= \sum_{i=1}^{d+1}(Np_{i}+\tfrac{1}{2})\log(Np_{i}) - \sum_{i=1}^{d+1}(Np_{i}-k_{i}+\tfrac{1}{2})\log(Np_{i}-k_{i})
\quad + (N-n+\tfrac{1}{2})\log(N-n) - (N+\tfrac{1}{2})\log N - \sum_{i=1}^{d+1}k_{i}\log p_{i}
\quad + \frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\left\{1-\bigg(1-\frac{k_{i}}{Np_{i}}\bigg)^{-1}\right\} + \bigg(1-\frac{n}{N}\bigg)^{-1} - 1\right]
\quad + \mathcal{O}\left(\frac{1}{N^{3}}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}^{3}}\left\{1+\bigg(1-\frac{k_{i}}{Np_{i}}\bigg)^{-3}\right\} + \bigg(1-\frac{n}{N}\bigg)^{-3} + 1\right]\right)
= -\sum_{i=1}^{d+1}Np_{i}\,\bigg(1-\frac{k_{i}}{Np_{i}}\bigg)\log\bigg(1-\frac{k_{i}}{Np_{i}}\bigg) - \frac{1}{2}\sum_{i=1}^{d+1}\log\bigg(1-\frac{k_{i}}{Np_{i}}\bigg)
\quad + N\left(1-\frac{n}{N}\right)\log\left(1-\frac{n}{N}\right) + \frac{1}{2}\log\left(1-\frac{n}{N}\right)
\quad + \frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\left\{1-\bigg(1-\frac{k_{i}}{Np_{i}}\bigg)^{-1}\right\} + \bigg(1-\frac{n}{N}\bigg)^{-1} - 1\right] \qquad (3.2)
\quad + \mathcal{O}_{\gamma}\left(\sum_{i=1}^{d+1}\frac{1}{(Np_{i})^{3}}\right). \qquad (3.3)

By applying the following Taylor expansions, valid for $|x|\leq\gamma<1$,

(1-x)\log(1-x) = -x + \frac{x^{2}}{2} + \frac{x^{3}}{6} + \mathcal{O}\big((1-\gamma)^{-3}|x|^{4}\big), \qquad (3.4)
\log(1-x) = -x - \frac{x^{2}}{2} + \mathcal{O}\big((1-\gamma)^{-3}|x|^{3}\big),
(1-x)^{-1} = 1 + x + \mathcal{O}\big((1-\gamma)^{-3}|x|^{2}\big),

in (3.2), we have

\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right) = \sum_{i=1}^{d+1}Np_{i}\,\left\{\bigg(\frac{k_{i}}{Np_{i}}\bigg)-\frac{1}{2}\bigg(\frac{k_{i}}{Np_{i}}\bigg)^{2}-\frac{1}{6}\bigg(\frac{k_{i}}{Np_{i}}\bigg)^{3}+\mathcal{O}_{\gamma}\left(\bigg|\frac{k_{i}}{Np_{i}}\bigg|^{4}\right)\right\} \qquad (3.5)
\quad + \frac{1}{2}\sum_{i=1}^{d+1}\left\{\bigg(\frac{k_{i}}{Np_{i}}\bigg)+\frac{1}{2}\bigg(\frac{k_{i}}{Np_{i}}\bigg)^{2}+\mathcal{O}_{\gamma}\left(\bigg|\frac{k_{i}}{Np_{i}}\bigg|^{3}\right)\right\}
\quad + N\left\{-\bigg(\frac{n}{N}\bigg)+\frac{1}{2}\bigg(\frac{n}{N}\bigg)^{2}+\frac{1}{6}\bigg(\frac{n}{N}\bigg)^{3}+\mathcal{O}_{\gamma}\left(\bigg|\frac{n}{N}\bigg|^{4}\right)\right\}
\quad - \frac{1}{2}\left\{\bigg(\frac{n}{N}\bigg)+\frac{1}{2}\bigg(\frac{n}{N}\bigg)^{2}+\mathcal{O}_{\gamma}\left(\bigg|\frac{n}{N}\bigg|^{3}\right)\right\}
\quad + \frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\left\{-\frac{k_{i}}{Np_{i}}+\mathcal{O}_{\gamma}\left(\bigg|\frac{k_{i}}{Np_{i}}\bigg|^{2}\right)\right\}+\left\{\frac{n}{N}+\mathcal{O}_{\gamma}\left(\bigg|\frac{n}{N}\bigg|^{2}\right)\right\}\right]
\quad + \mathcal{O}_{\gamma}\left(\sum_{i=1}^{d+1}\frac{1}{(Np_{i})^{3}}\right).

After rearranging some terms and noticing that $\sum_{i=1}^{d+1}k_{i}=n$, we get

\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right) = \frac{1}{N}\left[\bigg(\frac{n^{2}}{2}-\frac{n}{2}\bigg)-\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\bigg(\frac{k_{i}^{2}}{2}-\frac{k_{i}}{2}\bigg)\right] \qquad (3.6)
\quad + \frac{1}{N^{2}}\left[\bigg(\frac{n^{3}}{6}-\frac{n^{2}}{4}+\frac{n}{12}\bigg)-\sum_{i=1}^{d+1}\frac{1}{p_{i}^{2}}\cdot\bigg(\frac{k_{i}^{3}}{6}-\frac{k_{i}^{2}}{4}+\frac{k_{i}}{12}\bigg)\right]
\quad + \mathcal{O}_{\gamma}\left(\frac{1}{N^{3}}\left[n^{4}+\sum_{i=1}^{d+1}\frac{k_{i}^{4}}{p_{i}^{3}}\right]\right).

This proves (2.2). Equation (2.1) follows from the same arguments, simply by keeping fewer terms for the Taylor expansions in (3.4). The details are omitted for conciseness. ∎

Proof of Theorem 2.

Define

A_{N,n,\boldsymbol{p}}(\gamma) \vcentcolon= \left\{\boldsymbol{k}\in\mathbb{K}_{d} : \max_{1\leq i\leq d+1}\frac{k_{i}}{p_{i}}\leq\gamma N\right\} + \left(-\frac{1}{2},\frac{1}{2}\right)^{d}, \quad \gamma>0. \qquad (3.7)

By the comparison of the total variation norm with the Hellinger distance on page 726 of Carter [4], we already know that

\|\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}-\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}\| \leq \sqrt{2\,\mathbb{P}(\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}^{c}(1/2)) + \mathbb{E}\left[\log\bigg(\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\bigg)\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]}. \qquad (3.8)

By applying a union bound together with the large deviation bound for the (univariate) hypergeometric distribution in Luh & Pippenger [9, Equation (4)], we get, for $N$ large enough,

\mathbb{P}(\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}^{c}(1/2)) \leq \sum_{i=1}^{d+1}\mathbb{P}\Bigl(K_{i}>\tfrac{N}{2n}\cdot np_{i}-1\Bigr) \leq \sum_{i=1}^{d+1}\mathbb{P}\Bigl(K_{i}>\nu_{i}\cdot np_{i}\Bigr) \qquad (3.9)
\leq \sum_{i=1}^{d+1}\left(\frac{1}{\nu_{i}}\right)^{n\nu_{i}p_{i}}\left(\frac{1-p_{i}}{1-\nu_{i}p_{i}}\right)^{n(1-\nu_{i}p_{i})},

where $\nu_{i}\vcentcolon=\lceil p_{i}^{-1}-1\rceil$ for all $1\leq i\leq d+1$. To estimate the expectation in (3.8), note that if $P_{N,n,\boldsymbol{p}}(\boldsymbol{x})$ and $Q_{n,\boldsymbol{p}}(\boldsymbol{x})$ denote the density functions associated with $\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}$ and $\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}$ (i.e., $P_{N,n,\boldsymbol{p}}(\boldsymbol{x})$ is equal to $P_{N,n,\boldsymbol{p}}(\boldsymbol{k})$ whenever $\boldsymbol{k}\in\mathbb{K}_{d}$ is closest to $\boldsymbol{x}$, and analogously for $Q_{n,\boldsymbol{p}}(\boldsymbol{x})$), then, for $N$ large enough, we have

\left|\mathbb{E}\left[\log\Bigg(\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\Bigg)\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]\right| = \left|\mathbb{E}\left[\log\bigg(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{X})}{Q_{n,\boldsymbol{p}}(\boldsymbol{X})}\bigg)\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]\right| \qquad (3.10)
\leq \mathbb{E}\left[\bigg|\log\bigg(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{K})}{Q_{n,\boldsymbol{p}}(\boldsymbol{K})}\bigg)\bigg|\,\mathds{1}_{\{\boldsymbol{K}\in A_{N,n,\boldsymbol{p}}(3/4)\}}\right].

By Theorem 1 with $\gamma=3/4$, we find

\mathbb{E}\left[\log\Bigg(\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\Bigg)\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right] = \mathcal{O}\left(\frac{n^{2}}{N}+\sum_{i=1}^{d+1}\frac{\mathbb{E}[K_{i}^{2}]}{Np_{i}}\right) \qquad (3.11)
= \mathcal{O}\left(\frac{n^{2}}{N}+\sum_{i=1}^{d+1}\frac{n^{2}p_{i}^{2}}{Np_{i}}\right) = \mathcal{O}\left(\frac{n^{2}}{N}\right).

Together with the large deviation bound in (3.9), we deduce from (3.8) that

\|\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}-\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}\| \leq \sqrt{2\sum_{i=1}^{d+1}\left(\frac{1}{\nu_{i}}\right)^{n\nu_{i}p_{i}}\left(\frac{1-p_{i}}{1-\nu_{i}p_{i}}\right)^{n(1-\nu_{i}p_{i})}+\mathcal{O}\left(\frac{n^{2}}{N}\right)}. \qquad (3.12)

Also, by Lemma 3.1 in [14] (a slightly weaker bound can be found in Lemma 2 of [4]), we already know that

\|\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\| = \mathcal{O}\left(\frac{d}{\sqrt{n}}\sqrt{\frac{\max\{p_{1},\dots,p_{d},p_{d+1}\}}{\min\{p_{1},\dots,p_{d},p_{d+1}\}}}\right). \qquad (3.13)

Putting (3.12) and (3.13) together yields the conclusion. ∎

Proof of Theorem 3.

By Theorem 2 with our assumption $N\geq n^{3}/d^{\,2}$, we get the desired bound on $\delta(\mathscr{P},\mathscr{Q})$ by choosing the Markov kernel $T_{1}^{\star}$ that adds $\boldsymbol{U}\sim\mathrm{Uniform}(-1/2,1/2)^{d}$ to $\boldsymbol{K}\sim\mathrm{Hypergeometric}(N,n,\boldsymbol{p})$, namely

T_{1}^{\star}(\boldsymbol{k},B) \vcentcolon= \int_{(-\frac{1}{2},\frac{1}{2})^{d}}\mathds{1}_{B}(\boldsymbol{k}+\boldsymbol{u})\,{\rm d}\boldsymbol{u}, \quad \boldsymbol{k}\in\mathbb{K}_{d}, ~ B\in\mathscr{B}(\mathbb{R}^{d}). \qquad (3.14)

To get the bound on $\delta(\mathscr{Q},\mathscr{P})$, it suffices to consider a Markov kernel $T_{2}^{\star}$ that inverts the effect of $T_{1}^{\star}$, i.e., that rounds off each component of $\boldsymbol{Z}\sim\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}})$ to the nearest integer. Then, as explained by Carter [4, Section 5], we get

\delta(\mathscr{Q},\mathscr{P}) \leq \bigg\|\mathbb{P}_{N,n,\boldsymbol{p}}-\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg\|
= \bigg\|\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\int_{\mathbb{K}_{d}}T_{1}^{\star}(\boldsymbol{k},{\rm d}\boldsymbol{z})\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg\|
\leq \bigg\|\int_{\mathbb{K}_{d}}T_{1}^{\star}(\boldsymbol{k},\cdot\,)\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\mathbb{Q}_{n,\boldsymbol{p}}\bigg\|, \qquad (3.15)

and we obtain the same bound by Theorem 2. ∎

Proof of Corollary 1.

This follows directly from Theorem 3, Equation (2.7), and the fact that $\Delta(\cdot,\cdot)$ is a pseudometric (i.e., the triangle inequality is valid). ∎

Acknowledgements

The author thanks the referee for his/her comments.

Funding: The author is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).

Declarations

Conflict of interest: The author declares no conflict of interest.

References

  • Abramowitz & Stegun, [1964] Abramowitz, M., & Stegun, I. A. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series, vol. 55. For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. MR0167642.
  • Bhattacharya & Ranga Rao, [1976] Bhattacharya, R. N., & Ranga Rao, R. 1976. Normal Approximation and Asymptotic Expansions. John Wiley & Sons, New York-London-Sydney. MR0436272.
  • Brown et al., [2004] Brown, L. D., Carter, A. V., Low, M. G., & Zhang, C.-H. 2004. Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift. Ann. Statist., 32(5), 2074–2097. MR2102503.
  • Carter, [2002] Carter, A. V. 2002. Deficiency distance between multinomial and multivariate normal experiments. Dedicated to the memory of Lucien Le Cam. Ann. Statist., 30(3), 708–730. MR1922539.
  • Govindarajulu, [1965] Govindarajulu, Z. 1965. Normal approximations to the classical discrete distributions. Sankhyā Ser. A, 27, 143–172. MR207011.
  • Johnson et al., [1997] Johnson, N. L., Kotz, S., & Balakrishnan, N. 1997. Discrete Multivariate Distributions. Wiley Series in Probability and Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York. MR1429617.
  • Kolassa, [1994] Kolassa, J. E. 1994. Series Approximation Methods in Statistics. Lecture Notes in Statistics, vol. 88. Springer-Verlag, New York. MR1295242.
  • Le Cam & Yang, [2000] Le Cam, L., & Yang, G. L. 2000. Asymptotics in Statistics. Second edn. Springer Series in Statistics. Springer-Verlag, New York. MR1784901.
  • Luh & Pippenger, [2014] Luh, K., & Pippenger, N. 2014. Large-deviation bounds for sampling without replacement. Amer. Math. Monthly, 121(5), 449–454. MR3193733.
  • Mariucci, [2016] Mariucci, E. 2016. Le Cam theory on the comparison of statistical models. Grad. J. Math., 1(2), 81–91. MR3850766.
  • Mattner & Schulz, [2018] Mattner, L., & Schulz, J. 2018. On normal approximations to symmetric hypergeometric laws. Trans. Amer. Math. Soc., 370(1), 727–748. MR3717995.
  • Morgenstern, [1968] Morgenstern, D. 1968. Einführung in die Wahrscheinlichkeitsrechnung und mathematische Statistik [In German]. Die Grundlehren der mathematischen Wissenschaften, Band 124. Springer-Verlag, Berlin-New York. MR0254884.
  • Nussbaum, [1996] Nussbaum, M. 1996. Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist., 24(6), 2399–2430. MR1425959.
  • Ouimet, [2021] Ouimet, F. 2021. A precise local limit theorem for the multinomial distribution and some applications. J. Statist. Plann. Inference, 215, 218–233. MR4249129.
  • Prokhorov, [1953] Prokhorov, Y. V. 1953. Asymptotic behavior of the binomial distribution. Uspekhi Mat. Nauk, 8(3(55)), 135–142. MR56861.