On the Le Cam distance between multivariate hypergeometric
and multivariate normal experiments
Abstract
In this short note, we develop a local approximation for the log-ratio of the multivariate hypergeometric probability mass function over the corresponding multinomial probability mass function. In conjunction with the bounds from Carter [4] and Ouimet [14] on the total variation between the law of a multinomial vector jittered by a uniform on $(-1/2,1/2)^d$ and the law of the corresponding multivariate normal distribution, the local expansion for the log-ratio is then used to obtain a total variation bound between the law of a multivariate hypergeometric random vector jittered by a uniform on $(-1/2,1/2)^d$ and the law of the corresponding multivariate normal distribution. As a corollary, we find an upper bound on the Le Cam distance between multivariate hypergeometric and multivariate normal experiments.
keywords:
multivariate hypergeometric distribution, sampling without replacement, multinomial distribution, normal approximation, Gaussian approximation, local approximation, local limit theorem, asymptotic statistics, multivariate normal distribution, Le Cam distance, total variation, deficiency, comparison of experiments
MSC: [2020] Primary: 62E20, 62B15; Secondary: 60F99, 60E05, 62H10, 62H12
1 Introduction
Let $d \in \mathbb{N}$. The $d$-dimensional (unit) simplex and its interior are defined by
$$\mathcal{S} := \big\{x \in [0,1]^d : \|x\|_1 \le 1\big\}, \qquad \mathrm{Int}(\mathcal{S}) := \big\{x \in (0,1)^d : \|x\|_1 < 1\big\}, \qquad (1.1)$$
where $\|x\|_1 := \sum_{i=1}^d |x_i|$ denotes the $\ell^1$ norm on $\mathbb{R}^d$. Given a set of probability weights $p \in \mathrm{Int}(\mathcal{S})$, the probability mass function of the multivariate hypergeometric distribution, $\mathrm{MultiHypergeometric}(N,n,p)$, is defined, by Johnson et al. [6, Chapter 39], as
$$P_{N,n,p}(k) := \frac{\prod_{i=1}^{d+1} \binom{N p_i}{k_i}}{\binom{N}{n}}, \qquad k \in \mathcal{K}_{N,n,p}, \qquad (1.2)$$
where $k_{d+1} := n - \|k\|_1$, $p_{d+1} := 1 - \|p\|_1$, with $N \in \mathbb{N}$ such that $N p_i \in \mathbb{N}$ for all $i \in \{1,\dots,d+1\}$, and
$$\mathcal{K}_{N,n,p} := \big\{k \in \mathbb{N}_0^d : \|k\|_1 \le n, \ k_i \le N p_i \ \text{for all} \ i \in \{1,\dots,d\}, \ n - \|k\|_1 \le N p_{d+1}\big\}. \qquad (1.3)$$
This distribution represents the first $d$ components of the vector of categorical sample counts when randomly sorting a random sample of $n$ objects from a finite population of $N$ objects into $d+1$ categories, where $p_i$, $i \in \{1,\dots,d+1\}$, is the probability that any given object is sorted into the $i$-th category.
Our first main goal in this paper is to develop a local approximation for the log-ratio of the multivariate hypergeometric probability mass function (1.2) over the $\mathrm{Multinomial}(n,p)$ probability mass function, namely
$$Q_{n,p}(k) := \frac{n!}{\prod_{i=1}^{d+1} k_i!} \prod_{i=1}^{d+1} p_i^{k_i}, \qquad k \in \mathbb{N}_0^d \cap n\mathcal{S}. \qquad (1.4)$$
This latter distribution represents exactly the same sampling scheme as (1.2) above, except that the population from which the $n$ objects are drawn is infinite ($N = \infty$). Another way of distinguishing $\mathrm{MultiHypergeometric}(N,n,p)$ and $\mathrm{Multinomial}(n,p)$ for a finite population of $N$ objects is to say that we sample the $n$ objects without replacement and with replacement, respectively. In both cases, the categorical probabilities are the same. For good general references on normal approximations, we refer the reader to Bhattacharya & Ranga Rao [2] and Kolassa [7].
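To make the two sampling schemes concrete, here is a minimal numerical sketch (an addition, not part of the original text) that evaluates (1.2) and (1.4) directly from their definitions; the helper names and the particular values of $N$, $n$, $p$, and $k$ are illustrative choices only.

```python
from math import comb, factorial, prod

def hypergeom_pmf(k, N, n, p):
    """Pmf (1.2): draw n objects without replacement from a population of N
    objects split into d + 1 categories of sizes N * p_i (assumed integers)."""
    k_full = list(k) + [n - sum(k)]            # k_{d+1} := n - ||k||_1
    sizes = [round(N * pi) for pi in p]        # N * p_i objects in category i
    return prod(comb(m, ki) for m, ki in zip(sizes, k_full)) / comb(N, n)

def multinomial_pmf(k, n, p):
    """Pmf (1.4): the same sampling scheme, but with replacement (N = infinity)."""
    k_full = list(k) + [n - sum(k)]
    coef = factorial(n) / prod(factorial(ki) for ki in k_full)
    return coef * prod(pi ** ki for pi, ki in zip(p, k_full))

p, n, k = (0.2, 0.3, 0.5), 20, (4, 6)          # d = 2 kept categories, third implicit
for N in (100, 1000, 10_000):
    print(N, hypergeom_pmf(k, N, n, p), multinomial_pmf(k, n, p))
# The ratio of the two pmfs tends to 1 as N grows; Theorem 1 quantifies the
# rate of this convergence locally through the log-ratio.
```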
Our second main goal is to prove an upper bound on the total variation between the probability measure on $\mathbb{R}^d$ induced by a random vector distributed according to $\mathrm{MultiHypergeometric}(N,n,p)$ and jittered by a uniform on $(-1/2,1/2)^d$, and the probability measure on $\mathbb{R}^d$ induced by a multivariate normal random vector with the same mean and covariances as a random vector distributed according to $\mathrm{Multinomial}(n,p)$, namely $n p$ and $n (\mathrm{diag}(p) - p p^{\top})$. The proof makes use of the total variation bound from Ouimet [14, Lemma 3.1] (which improved Lemma 2 in [4]) on the total variation between the probability measure on $\mathbb{R}^d$ induced by a multinomial vector distributed according to $\mathrm{Multinomial}(n,p)$ and jittered by a uniform on $(-1/2,1/2)^d$, and the probability measure on $\mathbb{R}^d$ induced by a multivariate normal random vector with the same mean and covariances. As pointed out by Mattner & Schulz [11, p. 732], the univariate case here would be much simpler, since Morgenstern [12, pp. 62-63] showed that the hypergeometric probability mass function can be written as a ratio of three binomial probability mass functions, and local limit theorems are well known for the binomial distribution; see, e.g., Prokhorov [15] and Govindarajulu [5].
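As a rough illustration of the objects being compared (again an addition, not the paper's method), the following sketch assumes NumPy's multivariate hypergeometric sampler and simply contrasts the first two empirical moments of the jittered count vector and of the matching normal vector; the specific parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, reps = 10_000, 200, 100_000
p = np.array([0.2, 0.3, 0.5])                  # d + 1 = 3 categories, d = 2 kept
colors = (N * p).astype(int)                   # N * p_i objects of each category

# Jittered hypergeometric vector: first d counts plus Uniform(-1/2, 1/2)^d noise.
K = rng.multivariate_hypergeometric(colors, n, size=reps)[:, :2]
Y = K + rng.uniform(-0.5, 0.5, size=K.shape)

# Multivariate normal with the multinomial mean n*p and covariance
# n*(diag(p) - p p^T), restricted to the d kept coordinates.
pd_ = p[:2]
mean = n * pd_
cov = n * (np.diag(pd_) - np.outer(pd_, pd_))
Z = rng.multivariate_normal(mean, cov, size=reps)

# The means agree; the covariances agree up to the finite-population factor
# (N - n)/(N - 1) and the jitter's variance of 1/12 per coordinate.
print(Y.mean(axis=0), Z.mean(axis=0))
print(np.cov(Y, rowvar=False), np.cov(Z, rowvar=False), sep="\n")
```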
The deficiency between a given statistical experiment and another measures the loss of information incurred by carrying inferences over to the second setting using information from the first setting. This loss of information goes in both directions, but the deficiency is not necessarily symmetric. The maximum of the two deficiencies is called the Le Cam distance (or $\Delta$-distance in [8]). The usefulness of this notion comes from the fact that seemingly completely different statistical experiments can result in asymptotically equivalent inferences using Markov kernels to carry information from one setting to another. For instance, it was famously shown by Nussbaum [13] that the density estimation problem and the Gaussian white noise problem are asymptotically equivalent, in the sense that the Le Cam distance between the two experiments goes to $0$ as the number of observations goes to infinity. The main idea was that the information we get from sampling observations from an unknown density function and counting the observations that fall in the various boxes of a fine partition of the density's support can be encoded using the increments of a properly scaled Brownian motion with drift, and vice versa. An alternative (simpler) proof of this asymptotic equivalence was given by Brown et al. [3], who combined a Haar wavelet cascade scheme with coupling inequalities relating the binomial and univariate normal distributions at each step (a similar argument was developed previously by Carter [4] to derive a multinomial/multivariate normal coupling inequality). Not only did Brown et al. [3] streamline the proof of the asymptotic equivalence originally shown by Nussbaum [13], but their results hold for a larger class of densities, and the asymptotic equivalence was also extended to Poisson processes. Our third main result in the present paper extends the multinomial/multivariate normal comparison from [4] (revisited and improved by Ouimet [14], who removed the inductive part of the argument) to the multivariate hypergeometric/multivariate normal comparison (recall from (1.4) that the multinomial distribution is just the limiting case $N = \infty$ of the multivariate hypergeometric distribution). For an excellent and concise review of Le Cam's theory for the comparison of statistical models, we refer the reader to Mariucci [10].
The three results we have just described are presented in Section 2, and the related proofs are gathered in Section 3. We now give some motivation for these results. First, we believe that the first two results (the local expansion of the log-ratio and the total variation bound) could help in developing asymptotic Berry-Esseen type bounds for the symmetric multivariate hypergeometric distribution and the symmetric multinomial distribution, similar to the exact optimal bounds proved recently by Mattner & Schulz [11] in the univariate setting. Second, there might be a way to use the Le Cam distance upper bound between multivariate hypergeometric and multivariate normal experiments to extend the results on the asymptotic equivalence between the density estimation problem and the Gaussian white noise problem shown by Nussbaum [13] and Brown et al. [3].
Remark 1.
Throughout the paper, the notation $u = \mathcal{O}(v)$ means that $|u| \le C v$, where $C > 0$ is a universal constant. Whenever $C$ might depend on a parameter, we add a subscript (for example, $u = \mathcal{O}_d(v)$).
2 Results
Our first main result is an asymptotic expansion for the log-ratio of the multivariate hypergeometric probability mass function (1.2) over the corresponding multinomial probability mass function (1.4).
Theorem 1 (Local limit theorem for the log-ratio).
Assume that with and hold, and pick any . Then, uniformly for such that and , we have, as ,
(2.1)
and
(2.2)
The local limit theorem above, together with the total variation bound in [4, 14] between jittered multinomials and the corresponding multivariate normals, allows us to derive an upper bound on the total variation between the probability measure on $\mathbb{R}^d$ induced by a multivariate hypergeometric random vector jittered by a uniform random vector on $(-1/2,1/2)^d$ and the probability measure on $\mathbb{R}^d$ induced by a multivariate normal random vector with the same mean and covariances as the multinomial distribution in (1.4).
Theorem 2 (Total variation upper bound).
Assume that with and hold. Let $K \sim \mathrm{MultiHypergeometric}(N,n,p)$, $X \sim \mathrm{Multinomial}(n,p)$, and $U, V \sim \mathrm{Uniform}\big((-1/2,1/2)^d\big)$, where $K$, $X$, $U$, and $V$ are assumed to be jointly independent. Define $Y := K + U$ and $Z := X + V$, and let $\widetilde{\mathbb{P}}_{N,n,p}$ and $\widetilde{\mathbb{Q}}_{n,p}$ be the laws of $Y$ and $Z$, respectively. Also, let $\mathbb{G}_{n,p}$ be the law of the $\mathrm{Normal}_d(n p, n \Sigma_p)$ distribution, where $\Sigma_p := \mathrm{diag}(p) - p p^{\top}$. Then, as ,
(2.3)
where , for , and denotes the total variation norm.
Since the Le Cam distance is a pseudometric and the Markov kernel that jitters a random vector by a uniform on $(-1/2,1/2)^d$ is easily inverted (round off each component of the vector to the nearest integer), we find, as a consequence of the total variation bound in Theorem 2, an upper bound on the Le Cam distance between multivariate hypergeometric and multivariate normal experiments.
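A short sanity check of this inversion (illustrative only; the category sizes and sample counts are arbitrary, and the sampler is the same NumPy routine assumed above):

```python
import numpy as np

rng = np.random.default_rng(1)
K = rng.multivariate_hypergeometric([200, 300, 500], 60, size=1000)[:, :2]
U = rng.uniform(-0.5, 0.5, size=K.shape)

# Rounding each component of the jittered vector to the nearest integer recovers
# the original counts, since |U_i| < 1/2 (up to the negligible boundary case).
assert np.array_equal(np.rint(K + U).astype(int), K)
```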
Theorem 3 (Le Cam distance upper bound).
Assume that with holds. For any given , let
(2.4)
Define the experiments
Then, for , we have the following upper bound on the Le Cam distance between and ,
(2.5)
where is a positive constant that depends only on ,
(2.6)
and the infima are taken over all Markov kernels and .
Now, consider the following multivariate normal experiments with independent components
where , then Carter [4, Section 7] showed, using a variance-stabilizing transformation, that
(2.7)
with proper adjustments to the definition of the deficiencies in (2.6).
Corollary 1.
With the same notation as in Theorem 3, we have, for ,
(2.8)
where is a positive constant that depends only on .
3 Proofs
Proof of Theorem 1.
Throughout the proof, the parameter satisfies and the asymptotic expressions are valid as . Let and . Using Stirling’s formula,
$$\log m! = \tfrac{1}{2}\log(2\pi) + \big(m + \tfrac{1}{2}\big)\log m - m + \tfrac{1}{12 m} + \mathcal{O}\big(m^{-3}\big), \qquad m \to \infty, \qquad (3.1)$$
see, e.g., Abramowitz & Stegun [1, p. 257], and taking logarithms in (1.2) and (1.4), we obtain
(3.2)
(3.3)
By applying the following Taylor expansions, valid for ,
(3.4)
in (3), we have
(3.5)
After rearranging some terms and noticing that , we get
(3.6)
This proves (2.2). Equation (2.1) follows from the same arguments, simply by keeping fewer terms for the Taylor expansions in (3.4). The details are omitted for conciseness. ∎
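For readers who wish to verify the Stirling step numerically, here is a short sketch (an addition, not part of the original argument) comparing the remainder of the expansion (3.1) with its $1/(12m)$ correction term.

```python
from math import lgamma, log, pi

# Remainder of Stirling's formula (3.1) after the leading terms, compared with
# the 1/(12m) correction; the gap between the two shrinks at the rate O(m^{-3}).
for m in (10, 100, 1000):
    remainder = lgamma(m + 1) - (0.5 * log(2 * pi) + (m + 0.5) * log(m) - m)
    print(m, remainder, 1 / (12 * m))
```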
Proof of Theorem 2.
Define
(3.7)
By the comparison of the total variation norm with the Hellinger distance on page 726 of Carter [4], we already know that
(3.8)
By applying a union bound together with the large deviation bound for the (univariate) hypergeometric distribution in Luh & Pippenger [9, Equation (4)], we get, for large enough,
(3.9)
where , for all . To estimate the expectation in (3.8), note that if and denote the density functions associated with and (i.e., is equal to whenever is closest to , and analogously for ), then, for large enough, we have
(3.10)
By Theorem 1 with , we find
(3.11)
Together with the large deviation bound in (3.9), we deduce from (3.8) that
(3.12)
Also, by Lemma 3.1 in [14] (a slightly weaker bound can be found in Lemma 2 of [4]), we already know that
(3.13)
The bound (2.3) then follows by combining (3.12) and (3.13) with the triangle inequality for the total variation norm. ∎
Proof of Theorem 3.
By Theorem 2 together with our assumption, we get the desired bound on the first deficiency in (2.6) by choosing the Markov kernel that adds a $\mathrm{Uniform}\big((-1/2,1/2)^d\big)$ jitter to the multivariate hypergeometric count vector, namely
(3.14)
To get the bound on the other deficiency, it suffices to consider a Markov kernel that inverts the effect of the jittering, i.e., one that rounds off every component of the jittered vector to the nearest integer. Then, as explained by Carter [4, Section 5], we get
(3.15)
and we obtain the same bound by Theorem 2. ∎
Acknowledgements
The author thanks the referee for his/her comments.
Funding: The author is supported by postdoctoral fellowships from the NSERC (PDF) and the FRQNT (B3X supplement and B3XR).
Declarations
Conflict of interest: The author declares no conflict of interest.
References
- Abramowitz & Stegun, [1964] Abramowitz, M., & Stegun, I. A. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards Applied Mathematics Series, vol. 55. For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. MR0167642.
- Bhattacharya & Ranga Rao, [1976] Bhattacharya, R. N., & Ranga Rao, R. 1976. Normal Approximation and Asymptotic Expansions. John Wiley & Sons, New York-London-Sydney. MR0436272.
- Brown et al., [2004] Brown, L. D., Carter, A. V., Low, M. G., & Zhang, C.-H. 2004. Equivalence theory for density estimation, Poisson processes and Gaussian white noise with drift. Ann. Statist., 32(5), 2074–2097. MR2102503.
- Carter, [2002] Carter, A. V. 2002. Deficiency distance between multinomial and multivariate normal experiments. Dedicated to the memory of Lucien Le Cam. Ann. Statist., 30(3), 708–730. MR1922539.
- Govindarajulu, [1965] Govindarajulu, Z. 1965. Normal approximations to the classical discrete distributions. Sankhyā Ser. A, 27, 143–172. MR207011.
- Johnson et al., [1997] Johnson, N. L., Kotz, S., & Balakrishnan, N. 1997. Discrete Multivariate Distributions. Wiley Series in Probability and Statistics: Applied Probability and Statistics. John Wiley & Sons, Inc., New York. MR1429617.
- Kolassa, [1994] Kolassa, J. E. 1994. Series Approximation Methods in Statistics. Lecture Notes in Statistics, vol. 88. Springer-Verlag, New York. MR1295242.
- Le Cam & Yang, [2000] Le Cam, L., & Yang, G. L. 2000. Asymptotics in Statistics. Second edn. Springer Series in Statistics. Springer-Verlag, New York. MR1784901.
- Luh & Pippenger, [2014] Luh, K., & Pippenger, N. 2014. Large-deviation bounds for sampling without replacement. Amer. Math. Monthly, 121(5), 449–454. MR3193733.
- Mariucci, [2016] Mariucci, E. 2016. Le Cam theory on the comparison of statistical models. Grad. J. Math., 1(2), 81–91. MR3850766.
- Mattner & Schulz, [2018] Mattner, L., & Schulz, J. 2018. On normal approximations to symmetric hypergeometric laws. Trans. Amer. Math. Soc., 370(1), 727–748. MR3717995.
- Morgenstern, [1968] Morgenstern, D. 1968. Einführung in die Wahrscheinlichkeitsrechnung und mathematische Statistik [In German]. Die Grundlehren der mathematischen Wissenschaften, Band 124. Springer-Verlag, Berlin-New York. MR0254884.
- Nussbaum, [1996] Nussbaum, M. 1996. Asymptotic equivalence of density estimation and Gaussian white noise. Ann. Statist., 24(6), 2399–2430. MR1425959.
- Ouimet, [2021] Ouimet, F. 2021. A precise local limit theorem for the multinomial distribution and some applications. J. Statist. Plann. Inference, 215, 218–233. MR4249129.
- Prokhorov, [1953] Prokhorov, Y. V. 1953. Asymptotic behavior of the binomial distribution. Uspekhi Mat. Nauk, 8(3(55)), 135–142. MR56861.