On the Le Cam distance between multivariate hypergeometric
and multivariate normal experiments

Frédéric Ouimet McGill University, Montreal, QC H3A 0B9, Canada. California Institute of Technology, Pasadena, CA 91125, USA. [email protected]

Abstract

In this short note, we develop a local approximation for the log-ratio of the multivariate hypergeometric probability mass function over the corresponding multinomial probability mass function. In conjunction with the bounds from Carter, [4] and Ouimet, [14] on the total variation between the law of a multinomial vector jittered by a uniform on $(-1/2,1/2)^{d}$ and the law of the corresponding multivariate normal distribution, the local expansion for the log-ratio is then used to obtain a total variation bound between the law of a multivariate hypergeometric random vector jittered by a uniform on $(-1/2,1/2)^{d}$ and the law of the corresponding multivariate normal distribution. As a corollary, we find an upper bound on the Le Cam distance between multivariate hypergeometric and multivariate normal experiments.

keywords:

multivariate hypergeometric distribution , sampling without replacement , multinomial distribution , normal approximation , Gaussian approximation , local approximation , local limit theorem , asymptotic statistics , multivariate normal distribution , Le Cam distance , total variation , deficiency , comparison of experiments

MSC:

[2020]Primary: 62E20, 62B15 Secondary: 60F99, 60E05, 62H10, 62H12

^†^†journal: Results in Mathematics

1 Introduction

Let $d\in\mathbb{N}$ . The $d$ -dimensional (unit) simplex and its interior are defined by

\mathcal{S}_{d}\vcentcolon=\big{\{}\boldsymbol{x}\in[0,1]^{d}:\|\boldsymbol{x}\|_{1}\leq 1\big{\}},\qquad\mathrm{Int}\hskip 0.56905pt(\mathcal{S}_{d})\vcentcolon=\big{\{}\boldsymbol{x}\in(0,1)^{d}:\|\boldsymbol{x}\|_{1}<1\big{\}},

(1.1)

where $\|\boldsymbol{x}\|_{1}\vcentcolon=\sum_{i=1}^{d}|x_{i}|$ denotes the $\ell^{1}$ norm on $\mathbb{R}^{d}$ . Given a set of probability weights $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}\hskip 0.56905pt(\mathcal{S}_{d})$ , the probability mass function of the multivariate hypergeometric distribution, $\mathrm{Hypergeometric}\hskip 0.56905pt(N,n,\boldsymbol{p})$ , is defined, by Johnson et al., [6, Chapter 39], as

P_{N,n,\boldsymbol{p}}(\boldsymbol{k})\vcentcolon=\frac{\prod_{i=1}^{d+1}\binom{Np_{i}}{k_{i}}}{\binom{N}{n}},\quad\boldsymbol{k}\in\mathbb{K}_{d},

(1.2)

where $p_{d+1}\vcentcolon=1-\|\boldsymbol{p}\|_{1}>0$ , $k_{d+1}\vcentcolon=n-\|\boldsymbol{k}\|_{1}$ , $n,N\in\mathbb{N}$ with $n\leq N$ , and

\mathbb{K}_{d}\vcentcolon=\big{\{}\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}:k_{i}\in[0,Np_{i}]\,\,\text{for all }1\leq i\leq d+1\big{\}}.

(1.3)

This distribution represents the first $d$ components of the vector of categorical sample counts when randomly sorting a random sample of $n$ objects from a finite population of $N$ objects into $d+1$ categories, where $p_{i},~{}1\leq i\leq d+1$ , is the probability of any given object to be sorted in the $i$ -th category.

Our first main goal in this paper is to develop a local approximation for the log-ratio of the multivariate hypergeometric probability mass function (1.2) over the $\mathrm{Multinomial}\hskip 0.56905pt(n,\boldsymbol{p})$ probability mass function, namely

Q_{n,\boldsymbol{p}}(\boldsymbol{k})\vcentcolon=\frac{n!}{\prod_{i=1}^{d+1}k_{i}!}\prod_{i=1}^{d+1}p_{i}^{k_{i}},\quad\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}.

(1.4)

This latter distribution represents the exact same as (1.2) above, except that the population from which the $n$ objects are drawn is infinite ( $N=\infty$ ). Another way of distinguishing $P_{N,n,\boldsymbol{p}}$ and $Q_{n,\boldsymbol{p}}$ in a finite population of $N$ objects is to say that we sample the $n$ objects without replacement and with replacement, respectively. In both cases, the categorical probabilities $(\boldsymbol{p},p_{d+1})$ are the same. For good general references on normal approximations, we refer the reader to Bhattacharya & Ranga Rao, [2] and Kolassa, [7].

Our second main goal is to prove an upper bound on the total variation between the probability measure on $\mathbb{R}^{d}$ induced by a random vector distributed according to $P_{N,n,\boldsymbol{p}}$ and jittered by a uniform on $(-1/2,1/2)^{d}$ , and the probability measure on $\mathbb{R}^{d}$ induced by a multivariate normal random vector with the same mean and covariances as a random vector distributed according to $Q_{n,\boldsymbol{p}}$ , namely $n\boldsymbol{p}$ and $n(\mathrm{diag}(\boldsymbol{p})-\boldsymbol{p}\boldsymbol{p}^{\top})$ . The proof makes use of the total variation bound from Ouimet, [14, Lemma 3.1] (which improved Lemma 2 in [4]) on the total variation between the probability measure on $\mathbb{R}^{d}$ induced by a multinomial vector distributed according to $Q_{n,\boldsymbol{p}}$ and jittered by a uniform on $(-1/2,1/2)^{d}$ , and the probability measure on $\mathbb{R}^{d}$ induced by a multivariate normal random vector with the same mean and covariances. As pointed out by Mattner & Schulz, [11, p.732], the univariate case here would be much simpler since Morgenstern, [12, p.62-63] showed that the hypergeometric probability mass function can be written as a ratio of three binomial probability mass functions, and local limit theorems are well-known for the binomial distribution, see, e.g., Prokhorov, [15] and Govindarajulu, [5].

The deficiency between a given statistical experiment and another measures the loss of information from carrying inferences to the second setting using information from the first setting. This loss of information goes in both directions, but the deficiency is not necessarily symmetric. The maximum of the two deficiencies is called the Le Cam distance (or $\Delta$ -distance in [8]). The usefulness of this notion comes from the fact that seemingly completely different statistical experiments can result in asymptotically equivalent inferences using Markov kernels to carry information from one setting to another. For instance, it was famously shown by Nussbaum, [13] that the density estimation problem and the Gaussian white noise problem are asymptotically equivalent in the sense that the Le Cam distance between the two experiments goes to $0$ as the number of observations goes to infinity. The main idea was that the information we get from sampling observations from an unknown density function and counting the observations that fall in the various boxes of a fine partition of the density’s support can be encoded using the increments of a properly scaled Brownian motion with drift $t\mapsto\int_{0}^{t}\sqrt{f(s)}\hskip 1.42262pt{\rm d}s$ , and vice versa. An alternative (simpler) proof of this asymptotic equivalence was shown by Brown et al., [3] who combined a Haar wavelet cascade scheme with coupling inequalities relating the binomial and univariate normal distributions at each step (a similar argument was developed previously by Carter, [4] to derive a multinomial/multivariate coupling inequality). Not only Brown et al., [3] streamlined the proof of the asymptotic equivalence originally shown by Nussbaum, [13], but their results hold for a larger class of densities and the asymptotic equivalence was also extended to Poisson processes. Our third main result in the present paper extends the multinomial/multivariate normal comparison from [4] (revisited and improved by Ouimet, [14], who removed the inductive part of the argument) to the multivariate hypergeometric/multivariate normal comparison (recall from (1.4) that the multinomial distribution is just the limiting case $N=\infty$ of the multivariate hypergeometric distribution). For an excellent and concise review on Le Cam’s theory for the comparison of statistical models, we refer the reader to Mariucci, [10].

The three results we have just described are presented in Section 2, and the related proofs are gathered in Section 3. Here are now some motivations for these results. First, we believe that the first two results (the local expansion of the log-ratio and the total variation bound) could help in developing asymptotic Berry-Esseen type bounds for the symmetric multivariate hypergeometric distribution and the symmetric multinomial distribution, similar to the exact optimal bounds proved recently by Mattner & Schulz, [11] in the univariate setting. Second, there might be a way to use the Le Cam distance upper bound between multivariate hypergeometric and multivariate normal experiments to extend the results on the asymptotic equivalence between the density estimation problem and the Gaussian white noise problem shown by Nussbaum, [13] and Brown et al., [3].

Remark 1.

Throughout the paper, the notation $u=\mathcal{O}(v)$ means that $\limsup_{N\to\infty}|u/v|<C$ , where $C\in(0,\infty)$ is a universal constant. Whenever $C$ might depend on a parameter, we add a subscript (for example, $u=\mathcal{O}_{d}(v)$ ).

2 Results

Our first main result is an asymptotic expansion for the log-ratio of the multivariate hypergeometric probability mass function (1.2) over the corresponding multinomial probability mass function (1.4).

Theorem 1 (Local limit theorem for the log-ratio).

Assume that $n,N\in\mathbb{N}$ with $n\leq N$ and $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}\hskip 0.56905pt(\mathcal{S}_{d})$ hold, and pick any $\gamma\in(0,1)$ . Then, uniformly for $\boldsymbol{k}\in\mathbb{K}_{d}$ such that $\max_{1\leq i\leq d+1}(k_{i}/p_{i})\leq\gamma N$ and $n\leq\gamma N$ , we have, as $N\to\infty$ ,

\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right)=\frac{1}{N}\left[\bigg{(}\frac{n^{2}}{2}-\frac{n}{2}\bigg{)}-\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\bigg{(}\frac{k_{i}^{2}}{2}-\frac{k_{i}}{2}\bigg{)}\right]+\mathcal{O}_{\gamma}\left(\frac{1}{N^{2}}\left[n^{3}+\sum_{i=1}^{d+1}\frac{k_{i}^{3}}{p_{i}^{2}}\right]\right),

(2.1)

and

$\displaystyle\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right)$	$\displaystyle=\frac{1}{N}\left[\bigg{(}\frac{n^{2}}{2}-\frac{n}{2}\bigg{)}-\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\bigg{(}\frac{k_{i}^{2}}{2}-\frac{k_{i}}{2}\bigg{)}\right]$	(2.2)
	$\displaystyle\quad+\frac{1}{N^{2}}\left[\bigg{(}\frac{n^{3}}{6}-\frac{n^{2}}{4}+\frac{n}{12}\bigg{)}-\sum_{i=1}^{d+1}\frac{1}{p_{i}^{2}}\cdot\bigg{(}\frac{k_{i}^{3}}{6}-\frac{k_{i}^{2}}{4}+\frac{k_{i}}{12}\bigg{)}\right]$
	$\displaystyle\quad+\mathcal{O}_{\gamma}\left(\frac{1}{N^{3}}\left[n^{4}+\sum_{i=1}^{d+1}\frac{k_{i}^{4}}{p_{i}^{3}}\right]\right).$

The local limit theorem above together with the total variation bound in [4, 14] between jittered multinomials and the corresponding multivariate normals allow us to derive an upper bound on the total variation between the probability measure on $\mathbb{R}^{d}$ induced by a multivariate hypergeometric random vector jittered by a uniform random vector on $(-1/2,1/2)^{d}$ and the probability measure on $\mathbb{R}^{d}$ induced by a multivariate normal random vector with the same mean and covariances as the multinomial distribution in (1.4).

Theorem 2 (Total variation upper bound).

Assume that $n,N\in\mathbb{N}$ with $n\leq(3/4)\,N$ and $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}\hskip 0.56905pt(\mathcal{S}_{d})$ hold. Let $\boldsymbol{K}\sim\mathrm{Hypergeometric}\hskip 0.56905pt(N,n,\boldsymbol{p})$ , $\boldsymbol{L}\sim\mathrm{Multinomial}\hskip 0.56905pt(n,\boldsymbol{p})$ , and $\boldsymbol{U},\boldsymbol{V}\sim\mathrm{Uniform}\hskip 0.56905pt(-1/2,1/2)^{d}$ , where $\boldsymbol{K}$ , $\boldsymbol{L}$ , $\boldsymbol{U}$ and $\boldsymbol{V}$ are assumed to be jointly independent. Define $\boldsymbol{X}\vcentcolon=\boldsymbol{K}+\boldsymbol{U}$ and $\boldsymbol{Y}\vcentcolon=\boldsymbol{L}+\boldsymbol{V}$ , and let $\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}$ and $\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}$ be the laws of $\boldsymbol{X}$ and $\boldsymbol{Y}$ , respectively. Also, let $\mathbb{Q}_{n,\boldsymbol{p}}$ be the law of the $\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}})$ distribution, where $\Sigma_{\boldsymbol{p}}\vcentcolon=\mathrm{diag}(\boldsymbol{p})-\boldsymbol{p}\boldsymbol{p}^{\top}$ . Then, as $N\to\infty$ ,

	$\displaystyle\\|\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\\|$	$\displaystyle\leq\sqrt{2\sum_{i=1}^{d+1}\left(\frac{1}{\nu_{i}}\right)^{n\nu_{i}p_{i}}\left(\frac{1-p_{i}}{1-\nu_{i}p_{i}}\right)^{n(1-\nu_{i}p_{i})}+\mathcal{O}\left(\frac{n^{2}}{N}\right)}$		(2.3)
		$\displaystyle\quad+\mathcal{O}\left(\frac{d}{\sqrt{n}}\sqrt{\frac{\max\{p_{1},\dots,p_{d},p_{d+1}\}}{\min\{p_{1},\dots,p_{d},p_{d+1}\}}}\right),$		(2.3)

where $\nu_{i}\vcentcolon=\lceil p_{i}^{-1}-1\rceil$ , for $1\leq i\leq d+1$ , and $\|\cdot\|$ denotes the total variation norm.

Since the Le Cam distance is a pseudometric and the Markov kernel that jitters a random vector by a uniform on $(-1/2,1/2)^{d}$ is easily inverted (round off each component of the vector to the nearest integer), then we find, as a consequence of the total variation bound in Theorem 2, an upper bound on the Le Cam distance between multivariate hypergeometric and multivariate normal experiments.

Theorem 3 (Le Cam distance upper bound).

Assume that $n,N\in\mathbb{N}$ with $n\leq(3/4)\,N$ holds. For any given $R\geq 1$ , let

\Theta_{R}\vcentcolon=\left\{\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}\hskip 0.56905pt(\mathcal{S}_{d}):\,\frac{\max\{p_{1},\dots,p_{d},p_{d+1}\}}{\min\{p_{1},\dots,p_{d},p_{d+1}\}}\leq R\right\}.

(2.4)

Define the experiments

	$\displaystyle\mathscr{P}$	$\displaystyle\vcentcolon=$	$\displaystyle~{}\{\mathbb{P}_{N,n,\boldsymbol{p}}\}_{\boldsymbol{p}\in\Theta_{R}},\quad$	$\displaystyle\mathbb{P}_{N,n,\boldsymbol{p}}~{}\text{is the measure induced by }\mathrm{Hypergeometric}\hskip 0.56905pt(N,n,\boldsymbol{p}),$
	$\displaystyle\mathscr{Q}$	$\displaystyle\vcentcolon=$	$\displaystyle~{}\{\mathbb{Q}_{n,\boldsymbol{p}}\}_{\boldsymbol{p}\in\Theta_{R}},\quad$	$\displaystyle\mathbb{Q}_{n,\boldsymbol{p}}~{}\text{is the measure induced by }\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}}).$

Then, for $N\geq n^{3}/d^{\hskip 0.56905pt2}$ , we have the following upper bound on the Le Cam distance $\Delta(\mathscr{P},\mathscr{Q})$ between $\mathscr{P}$ and $\mathscr{Q}$ ,

\Delta(\mathscr{P},\mathscr{Q})\vcentcolon=\max\{\delta(\mathscr{P},\mathscr{Q}),\delta(\mathscr{Q},\mathscr{P})\}\leq C_{R}\,\frac{d}{\sqrt{n}},

(2.5)

where $C_{R}$ is a positive constant that depends only on $R$ ,

	$\displaystyle\delta(\mathscr{P},\mathscr{Q})$	$\displaystyle\vcentcolon=\inf_{T_{1}}\sup_{\boldsymbol{p}\in\Theta_{R}}\bigg{\\|}\int_{\mathbb{K}_{d}}T_{1}(\boldsymbol{k},\cdot\,)\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\mathbb{Q}_{n,\boldsymbol{p}}\bigg{\\|},$		(2.6)
	$\displaystyle\delta(\mathscr{Q},\mathscr{P})$	$\displaystyle\vcentcolon=\inf_{T_{2}}\sup_{\boldsymbol{p}\in\Theta_{R}}\bigg{\\|}\mathbb{P}_{N,n,\boldsymbol{p}}-\int_{\mathbb{R}^{d}}T_{2}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg{\\|},$		(2.6)

and the infima are taken over all Markov kernels $T_{1}:\mathbb{K}_{d}\times\mathscr{B}(\mathbb{R}^{d})\to[0,1]$ and $T_{2}:\mathbb{R}^{d}\times\mathscr{B}(\mathbb{K}_{d})\to[0,1]$ .

Now, consider the following multivariate normal experiments with independent components

	$\displaystyle\overline{\mathscr{Q}}$	$\displaystyle\vcentcolon=$	$\displaystyle~{}\{\overline{\mathbb{Q}}_{n,\boldsymbol{p}}\}_{\boldsymbol{p}\in\Theta_{R}},\quad$	$\displaystyle\overline{\mathbb{Q}}_{n,\boldsymbol{p}}~{}\text{is the measure induced by }\mathrm{Normal}_{d}(n\boldsymbol{p},n\mathrm{diag}(\boldsymbol{p})),$
	$\displaystyle\mathscr{Q}^{\star}$	$\displaystyle\vcentcolon=$	$\displaystyle~{}\{\mathbb{Q}_{n,\boldsymbol{p}}^{\star}\}_{\boldsymbol{p}\in\Theta_{R}},\quad$	$\displaystyle\mathbb{Q}_{n,\boldsymbol{p}}^{\star}~{}\text{is the measure induced by }\mathrm{Normal}_{d}(\sqrt{n\boldsymbol{p}},\mathrm{diag}(\boldsymbol{1}/4)),$

where $\boldsymbol{1}\vcentcolon=(1,1,\dots,1)^{\top}$ , then Carter, [4, Section 7] showed, using a variance stabilizing transformation, that

\Delta(\mathscr{Q},\overline{\mathscr{Q}})\leq C_{R}\,\sqrt{\frac{d}{n}}\qquad\text{and}\qquad\Delta(\overline{\mathscr{Q}},\mathscr{Q}^{\star})\leq C_{R}\,\frac{d}{\sqrt{n}},

(2.7)

with proper adjustments to the definition of the deficiencies in (2.6).

Corollary 1.

With the same notation as in Theorem 3, we have, for $N\geq n^{3}/d^{\hskip 0.56905pt2}$ ,

\Delta(\mathscr{P},\overline{\mathscr{Q}})\leq C_{R}\,\frac{d}{\sqrt{n}}\qquad\text{and}\qquad\Delta(\mathscr{P},\mathscr{Q}^{\star})\leq C_{R}\,\frac{d}{\sqrt{n}},

(2.8)

where $C_{R}$ is a positive constant that depends only on $R$ .

3 Proofs

Proof of Theorem 1.

Throughout the proof, the parameter $n\in\mathbb{N}$ satisfies $n\leq\gamma N$ and the asymptotic expressions are valid as $N\to\infty$ . Let $\boldsymbol{p}\in N^{-1}\,\mathbb{N}_{0}^{d}\cap\mathrm{Int}\hskip 0.56905pt(\mathcal{S}_{d})$ and $\boldsymbol{k}\in\mathbb{K}_{d}$ . Using Stirling’s formula,

\log m!=\frac{1}{2}\log(2\pi)+(m+\tfrac{1}{2})\log m-m+\frac{1}{12m}+\mathcal{O}(m^{-3}),\quad m\to\infty,

(3.1)

see, e.g., Abramowitz & Stegun, [1, p.257], and taking logarithms in (1.2) and (1.4), we obtain

$\displaystyle\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right)$	$\displaystyle=\sum_{i=1}^{d+1}\log(Np_{i})!-\sum_{i=1}^{d+1}\log(Np_{i}-k_{i})!+\log(N-n)!-\log N!-\sum_{i=1}^{d+1}k_{i}\log p_{i}$
	$\displaystyle=\sum_{i=1}^{d+1}(Np_{i}+\tfrac{1}{2})\log(Np_{i})-\sum_{i=1}^{d+1}(Np_{i}-k_{i}+\tfrac{1}{2})\log(Np_{i}-k_{i})$
	$\displaystyle\qquad+(N-n+\tfrac{1}{2})\log(N-n)-(N+\tfrac{1}{2})\log N-\sum_{i=1}^{d+1}k_{i}\log p_{i}$
	$\displaystyle\qquad+\frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\left\{1-\bigg{(}1-\frac{k_{i}}{Np_{i}}\bigg{)}^{-1}\right\}+\bigg{(}1-\frac{n}{N}\bigg{)}^{-1}-1\right]$
	$\displaystyle\qquad+\mathcal{O}\left(\frac{1}{N^{3}}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}^{3}}\left\{1+\bigg{(}1-\frac{k_{i}}{Np_{i}}\bigg{)}^{-3}\right\}+\bigg{(}1-\frac{n}{N}\bigg{)}^{-3}+1\right]\right)$
	$\displaystyle=-\sum_{i=1}^{d+1}Np_{i}\,\bigg{(}1-\frac{k_{i}}{Np_{i}}\bigg{)}\log\bigg{(}1-\frac{k_{i}}{Np_{i}}\bigg{)}-\frac{1}{2}\sum_{i=1}^{d+1}\log\bigg{(}1-\frac{k_{i}}{Np_{i}}\bigg{)}$
	$\displaystyle\qquad+N\left(1-\frac{n}{N}\right)\log\left(1-\frac{n}{N}\right)+\frac{1}{2}\log\left(1-\frac{n}{N}\right)$
	$\displaystyle\qquad+\frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\left\{1-\bigg{(}1-\frac{k_{i}}{Np_{i}}\bigg{)}^{-1}\right\}+\bigg{(}1-\frac{n}{N}\bigg{)}^{-1}-1\right]$	(3.2)
	$\displaystyle\qquad+\mathcal{O}_{\gamma}\left(\sum_{i=1}^{d+1}\frac{1}{(Np_{i})^{3}}\right).$	(3.3)

By applying the following Taylor expansions, valid for $|x|\leq\gamma<1$ ,

$\displaystyle(1-x)\log(1-x)$	$\displaystyle=-x+\frac{x^{2}}{2}+\frac{x^{3}}{6}+\mathcal{O}\big{(}(1-\gamma)^{-3}\|x\|^{4}\big{)},$	(3.4)
$\displaystyle\log(1-x)$	$\displaystyle=-x-\frac{x^{2}}{2}+\mathcal{O}\big{(}(1-\gamma)^{-3}\|x\|^{3}\big{)},$
$\displaystyle(1-x)^{-1}$	$\displaystyle=1+x+\mathcal{O}\big{(}(1-\gamma)^{-3}\|x\|^{2}\big{)},$

in (3), we have

$\displaystyle\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right)$	$\displaystyle=\sum_{i=1}^{d+1}Np_{i}\,\left\{\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}-\frac{1}{2}\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}^{2}-\frac{1}{6}\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}^{3}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{k_{i}}{Np_{i}}\bigg{\|}^{4}\right)\right\}$	(3.5)
	$\displaystyle\quad+\frac{1}{2}\sum_{i=1}^{d+1}\left\{\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}+\frac{1}{2}\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}^{2}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{k_{i}}{Np_{i}}\bigg{\|}^{3}\right)\right\}$
	$\displaystyle\quad+N\left\{-\bigg{(}\frac{n}{N}\bigg{)}+\frac{1}{2}\bigg{(}\frac{n}{N}\bigg{)}^{2}+\frac{1}{6}\bigg{(}\frac{n}{N}\bigg{)}^{3}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{n}{N}\bigg{\|}^{4}\right)\right\}$
	$\displaystyle\quad-\frac{1}{2}\left\{\bigg{(}\frac{n}{N}\bigg{)}+\frac{1}{2}\bigg{(}\frac{n}{N}\bigg{)}^{2}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{n}{N}\bigg{\|}^{3}\right)\right\}$
	$\displaystyle\quad+\frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\left\{-\frac{k_{i}}{Np_{i}}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{k_{i}}{Np_{i}}\bigg{\|}^{2}\right)\right\}+\left\{\frac{n}{N}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{n}{N}\bigg{\|}^{2}\right)\right\}\right]$
	$\displaystyle\quad+\mathcal{O}_{\gamma}\left(\sum_{i=1}^{d+1}\frac{1}{(Np_{i})^{3}}\right).$

After rearranging some terms and noticing that $\sum_{i=1}^{d+1}k_{i}=n$ , we get

$\displaystyle\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right)$	$\displaystyle=\frac{1}{N}\left[\bigg{(}\frac{n^{2}}{2}-\frac{n}{2}\bigg{)}-\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\bigg{(}\frac{k_{i}^{2}}{2}-\frac{k_{i}}{2}\bigg{)}\right]$	(3.6)
	$\displaystyle\quad+\frac{1}{N^{2}}\left[\bigg{(}\frac{n^{3}}{6}-\frac{n^{2}}{4}+\frac{n}{12}\bigg{)}-\sum_{i=1}^{d+1}\frac{1}{p_{i}^{2}}\cdot\bigg{(}\frac{k_{i}^{3}}{6}-\frac{k_{i}^{2}}{4}+\frac{k_{i}}{12}\bigg{)}\right]$
	$\displaystyle\quad+\mathcal{O}_{\gamma}\left(\frac{1}{N^{3}}\left[n^{4}+\sum_{i=1}^{d+1}\frac{k_{i}^{4}}{p_{i}^{3}}\right]\right).$

This proves (2.2). Equation (2.1) follows from the same arguments, simply by keeping fewer terms for the Taylor expansions in (3.4). The details are omitted for conciseness. ∎

Proof of Theorem 2.

Define

A_{N,n,\boldsymbol{p}}(\gamma)\vcentcolon=\left\{\boldsymbol{k}\in\mathbb{K}_{d}:\max_{1\leq i\leq d+1}\frac{k_{i}}{p_{i}}\leq\gamma N\right\}+\left(-\frac{1}{2},\frac{1}{2}\right)^{d},\quad\gamma>0.

(3.7)

By the comparison of the total variation norm with the Hellinger distance on page 726 of Carter, [4], we already know that

\|\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}-\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}\|\leq\sqrt{2\,\mathbb{P}(\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}^{c}(1/2))+\mathbb{E}\left[\log\bigg{(}\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\bigg{)}\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]}.

(3.8)

By applying a union bound together with the large deviation bound for the (univariate) hypergeometric distribution in Luh & Pippenger, [9, Equation (4)], we get, for $N$ large enough,

	$\displaystyle\mathbb{P}(\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}^{c}(1/2))$	$\displaystyle\leq\sum_{i=1}^{d+1}\mathbb{P}\Bigl{(}K_{i}>\tfrac{N}{2n}\cdot np_{i}-1\Bigr{)}\leq\sum_{i=1}^{d+1}\mathbb{P}\Bigl{(}K_{i}>\nu_{i}\cdot np_{i}\Bigr{)}$		(3.9)
		$\displaystyle\leq\sum_{i=1}^{d+1}\left(\frac{1}{\nu_{i}}\right)^{n\nu_{i}p_{i}}\left(\frac{1-p_{i}}{1-\nu_{i}p_{i}}\right)^{n(1-\nu_{i}p_{i})},$		(3.9)

where $\nu_{i}\vcentcolon=\lceil p_{i}^{-1}-1\rceil$ , for all $1\leq i\leq d+1$ . To estimate the expectation in (3.8), note that if $P_{N,n,\boldsymbol{p}}(\boldsymbol{x})$ and $Q_{n,\boldsymbol{p}}(\boldsymbol{x})$ denote the density functions associated with $\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}$ and $\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}$ (i.e., $P_{N,n,\boldsymbol{p}}(\boldsymbol{x})$ is equal to $P_{N,n,\boldsymbol{p}}(\boldsymbol{k})$ whenever $\boldsymbol{k}\in\mathbb{K}_{d}$ is closest to $\boldsymbol{x}$ , and analogously for $Q_{n,\boldsymbol{p}}(\boldsymbol{x})$ ), then, for $N$ large enough, we have

	$\displaystyle\left\|\mathbb{E}\left[\log\Bigg{(}\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\Bigg{)}\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]\right\|$	$\displaystyle=\left\|\mathbb{E}\left[\log\bigg{(}\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{X})}{Q_{n,\boldsymbol{p}}(\boldsymbol{X})}\bigg{)}\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]\right\|$		(3.10)
		$\displaystyle\leq\mathbb{E}\left[\bigg{\|}\log\bigg{(}\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{K})}{Q_{n,\boldsymbol{p}}(\boldsymbol{K})}\bigg{)}\bigg{\|}\,\mathds{1}_{\{\boldsymbol{K}\in A_{N,n,\boldsymbol{p}}(3/4)\}}\right].$		(3.10)

By Theorem 1 with $\gamma=3/4$ , we find

	$\displaystyle\mathbb{E}\left[\log\Bigg{(}\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\Bigg{)}\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]$	$\displaystyle=\mathcal{O}\left(\frac{n^{2}}{N}+\sum_{i=1}^{d+1}\frac{\mathbb{E}[K_{i}^{2}]}{Np_{i}}\right)$		(3.11)
		$\displaystyle=\mathcal{O}\left(\frac{n^{2}}{N}+\sum_{i=1}^{d+1}\frac{n^{2}p_{i}^{2}}{Np_{i}}\right)=\mathcal{O}\left(\frac{n^{2}}{N}\right).$		(3.11)

Together with the large deviation bound in (3.9), we deduce from (3.8) that

\|\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}-\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}\|\leq\sqrt{2\sum_{i=1}^{d+1}\left(\frac{1}{\nu_{i}}\right)^{n\nu_{i}p_{i}}\left(\frac{1-p_{i}}{1-\nu_{i}p_{i}}\right)^{n(1-\nu_{i}p_{i})}+\mathcal{O}\left(\frac{n^{2}}{N}\right)}.

(3.12)

Also, by Lemma 3.1 in [14] (a slightly weaker bound can be found in Lemma 2 of [4]), we already know that

\|\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\|=\mathcal{O}\left(\frac{d}{\sqrt{n}}\sqrt{\frac{\max\{p_{1},\dots,p_{d},p_{d+1}\}}{\min\{p_{1},\dots,p_{d},p_{d+1}\}}}\right).

(3.13)

Putting (3.12) and (3.13) together yields the conclusion. ∎

Proof of Theorem 3.

By Theorem 2 with our assumption $N\geq n^{3}/d^{\hskip 0.56905pt2}$ , we get the desired bound on $\delta(\mathscr{P},\mathscr{Q})$ by choosing the Markov kernel $T_{1}^{\star}$ that adds $\boldsymbol{U}\sim\mathrm{Uniform}\hskip 0.56905pt(-1/2,1/2)^{d}$ to $\boldsymbol{K}\sim\mathrm{Hypergeometric}\hskip 0.56905pt(N,n,\boldsymbol{p})$ , namely

\displaystyle T_{1}^{\star}(\boldsymbol{k},B)\vcentcolon=\int_{(-\frac{1}{2},\frac{1}{2})^{d}}\mathds{1}_{B}(\boldsymbol{k}+\boldsymbol{u}){\rm d}\boldsymbol{u},\quad\boldsymbol{k}\in\mathbb{K}_{d},~{}B\in\mathscr{B}(\mathbb{R}^{d}).

(3.14)

To get the bound on $\delta(\mathscr{Q},\mathscr{P})$ , it suffices to consider a Markov kernel $T_{2}^{\star}$ that inverts the effect of $T_{1}^{\star}$ , i.e., rounding off every components of $\boldsymbol{Z}\sim\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}})$ to the nearest integer. Then, as explained by Carter, [4, Section 5], we get

$\displaystyle\delta(\mathscr{Q},\mathscr{P})$	$\displaystyle\leq\bigg{\\|}\mathbb{P}_{N,n,\boldsymbol{p}}-\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg{\\|}$
	$\displaystyle=\bigg{\\|}\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\int_{\mathbb{K}_{d}}T_{1}^{\star}(\boldsymbol{k},{\rm d}\boldsymbol{z})\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg{\\|}$
	$\displaystyle\leq\bigg{\\|}\int_{\mathbb{K}_{d}}T_{1}^{\star}(\boldsymbol{k},\cdot\,)\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\mathbb{Q}_{n,\boldsymbol{p}}\bigg{\\|},$	(3.15)

and we obtain the same bound by Theorem 2. ∎

Proof of Corollary 1.

This follows directly from Theorem 3, Equation (2.7) and the fact that $\Delta(\cdot,\cdot)$ is a pseudometric (i.e., the triangle inequality is valid). ∎

Acknowledgements

The author thanks the referee for his/her comments.

Funding: The author is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).

Declarations

Conflict of interest: The author declares no conflict of interest.

References

Abramowitz & Stegun, [1964] Abramowitz, M., & Stegun, I. A. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series, vol. 55. For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. MR0167642.
Bhattacharya & Ranga Rao, [1976] Bhattacharya, R. N., & Ranga Rao, R. 1976. Normal Approximation and Asymptotic Expansions. John Wiley & Sons, New York-London-Sydney. MR0436272.
Brown et al., [2004] Brown, L. D., Carter, A. V., Low, M. G., & Zhang, C.-H. 2004. Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift. Ann. Statist., 32(5), 2074–2097. MR2102503.
Carter, [2002] Carter, A. V. 2002. Deficiency distance between multinomial and multivariate normal experiments. Dedicated to the memory of Lucien Le Cam. Ann. Statist., 30(3), 708–730. MR1922539.
Govindarajulu, [1965] Govindarajulu, Z. 1965. Normal approximations to the classical discrete distributions. Sankhyā Ser. A, 27, 143–172. MR207011.
Johnson et al., [1997] Johnson, N. L., Kotz, S., & Balakrishnan, N. 1997. Discrete Multivariate Distributions. Wiley Series in Probability and Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York. MR1429617.
Kolassa, [1994] Kolassa, J. E. 1994. Series Approximation Methods in Statistics. Lecture Notes in Statistics, vol. 88. Springer-Verlag, New York. MR1295242.
Le Cam & Yang, [2000] Le Cam, L., & Yang, G. L. 2000. Asymptotics in Statistics. Second edn. Springer Series in Statistics. Springer-Verlag, New York. MR1784901.
Luh & Pippenger, [2014] Luh, K., & Pippenger, N. 2014. Large-deviation bounds for sampling without replacement. Amer. Math. Monthly, 121(5), 449–454. MR3193733.
Mariucci, [2016] Mariucci, E. 2016. Le Cam theory on the comparison of statistical models. Grad. J. Math., 1(2), 81–91. MR3850766.
Mattner & Schulz, [2018] Mattner, L., & Schulz, J. 2018. On normal approximations to symmetric hypergeometric laws. Trans. Amer. Math. Soc., 370(1), 727–748. MR3717995.
Morgenstern, [1968] Morgenstern, D. 1968. Einführung in die Wahrscheinlichkeitsrechnung und mathematische Statistik [In German]. Die Grundlehren der mathematischen Wissenschaften, Band 124. Springer-Verlag, Berlin-New York. MR0254884.
Nussbaum, [1996] Nussbaum, M. 1996. Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist., 24(6), 2399–2430. MR1425959.
Ouimet, [2021] Ouimet, F. 2021. A precise local limit theorem for the multinomial distribution and some applications. J. Statist. Plann. Inference, 215, 218–233. MR4249129.
Prokhorov, [1953] Prokhorov, Y. V. 1953. Asymptotic behavior of the binomial distribution. Uspekhi Mat. Nauk, 8(3(55)), 135–142. MR56861.

$\displaystyle\log\left(\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{k})}{Q_{n,\boldsymbol{p}}(\boldsymbol{k})}\right)$	$\displaystyle=\sum_{i=1}^{d+1}Np_{i}\,\left\{\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}-\frac{1}{2}\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}^{2}-\frac{1}{6}\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}^{3}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{k_{i}}{Np_{i}}\bigg{\|}^{4}\right)\right\}$	(3.5)
	$\displaystyle\quad+\frac{1}{2}\sum_{i=1}^{d+1}\left\{\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}+\frac{1}{2}\bigg{(}\frac{k_{i}}{Np_{i}}\bigg{)}^{2}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{k_{i}}{Np_{i}}\bigg{\|}^{3}\right)\right\}$
	$\displaystyle\quad+N\left\{-\bigg{(}\frac{n}{N}\bigg{)}+\frac{1}{2}\bigg{(}\frac{n}{N}\bigg{)}^{2}+\frac{1}{6}\bigg{(}\frac{n}{N}\bigg{)}^{3}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{n}{N}\bigg{\|}^{4}\right)\right\}$
	$\displaystyle\quad-\frac{1}{2}\left\{\bigg{(}\frac{n}{N}\bigg{)}+\frac{1}{2}\bigg{(}\frac{n}{N}\bigg{)}^{2}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{n}{N}\bigg{\|}^{3}\right)\right\}$
	$\displaystyle\quad+\frac{1}{12N}\left[\sum_{i=1}^{d+1}\frac{1}{p_{i}}\cdot\left\{-\frac{k_{i}}{Np_{i}}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{k_{i}}{Np_{i}}\bigg{\|}^{2}\right)\right\}+\left\{\frac{n}{N}+\mathcal{O}_{\gamma}\left(\bigg{\|}\frac{n}{N}\bigg{\|}^{2}\right)\right\}\right]$
	$\displaystyle\quad+\mathcal{O}_{\gamma}\left(\sum_{i=1}^{d+1}\frac{1}{(Np_{i})^{3}}\right).$

	$\displaystyle\left\|\mathbb{E}\left[\log\Bigg{(}\frac{{\rm d}\widetilde{\mathbb{P}}_{N,n,\boldsymbol{p}}}{{\rm d}\widetilde{\mathbb{Q}}_{n,\boldsymbol{p}}}(\boldsymbol{X})\Bigg{)}\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]\right\|$	$\displaystyle=\left\|\mathbb{E}\left[\log\bigg{(}\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{X})}{Q_{n,\boldsymbol{p}}(\boldsymbol{X})}\bigg{)}\,\mathds{1}_{\{\boldsymbol{X}\in A_{N,n,\boldsymbol{p}}(1/2)\}}\right]\right\|$		(3.10)
		$\displaystyle\leq\mathbb{E}\left[\bigg{\|}\log\bigg{(}\frac{P_{N,n,\boldsymbol{p}}(\boldsymbol{K})}{Q_{n,\boldsymbol{p}}(\boldsymbol{K})}\bigg{)}\bigg{\|}\,\mathds{1}_{\{\boldsymbol{K}\in A_{N,n,\boldsymbol{p}}(3/4)\}}\right].$		(3.10)

$\displaystyle\delta(\mathscr{Q},\mathscr{P})$	$\displaystyle\leq\bigg{\\|}\mathbb{P}_{N,n,\boldsymbol{p}}-\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg{\\|}$
	$\displaystyle=\bigg{\\|}\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\int_{\mathbb{K}_{d}}T_{1}^{\star}(\boldsymbol{k},{\rm d}\boldsymbol{z})\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\int_{\mathbb{R}^{d}}T_{2}^{\star}(\boldsymbol{z},\cdot\,)\,\mathbb{Q}_{n,\boldsymbol{p}}({\rm d}\boldsymbol{z})\bigg{\\|}$
	$\displaystyle\leq\bigg{\\|}\int_{\mathbb{K}_{d}}T_{1}^{\star}(\boldsymbol{k},\cdot\,)\,\mathbb{P}_{N,n,\boldsymbol{p}}({\rm d}\boldsymbol{k})-\mathbb{Q}_{n,\boldsymbol{p}}\bigg{\\|},$	(3.15)

On the Le Cam distance between multivariate hypergeometric and multivariate normal experiments