
Non-asymptotic approximations for Pearson’s chi-square statistic and its application to confidence intervals for strictly convex functions of the probability weights of discrete distributions

Eric Bax Frédéric Ouimet Verizon Media, Playa Vista (California) USA Department of Mathematics and Statistics, McGill University, Montréal (Québec) Canada H3A 0B9
Abstract

In this paper, we develop a non-asymptotic local normal approximation for multinomial probabilities. First, we use it to find non-asymptotic total variation bounds between the measures induced by uniformly jittered multinomials and the multivariate normals with the same means and covariances. From the total variation bounds, we also derive a comparison of the cumulative distribution functions and quantile coupling inequalities between Pearson’s chi-square statistic (written as the normalized quadratic form of a multinomial vector) and its multivariate normal analogue. We apply our results to find confidence intervals for the negative entropy of discrete distributions. Our method can be applied more generally to find confidence intervals for strictly convex functions of the weights of discrete distributions.

keywords:
categorical data , convex optimization , entropy , Gaussian approximation , information theory , local limit theorem , multinomial distribution , multivariate normal distribution , quantile coupling
MSC:
[2020]Primary: 62E17, 62F25, 62F30; Secondary: 60E15, 60F99, 62E20, 62H10, 62H12

1 Introduction

A fundamental question in machine learning and statistics is this: given a set of i.i.d. observations, what can we infer about the distribution that generated them? In general, we cannot recover the exact distribution; this is an intractable inverse problem. However, if we know that the generating distribution belongs to a parametric family, then it is often possible to find a confidence set for the parameters at a given significance level. If the confidence set has a shape that makes it amenable to optimization (being convex, for example), then we can use it to infer confidence bounds on functions of the parameters by minimizing/maximizing those functions over the confidence set. In this paper, we are interested in the particular case where the parametric family consists of the discrete distributions supported on a fixed set of known values v_{1},\dots,v_{d+1}. The parameters are the probability weights p_{0,i} associated with each value v_{i}. In other words, given a sequence of i.i.d. observations X_{1},X_{2},\dots,X_{n}, we assume that

\mathbb{P}(X_{i}=v_{k})=p_{0,k},\quad 1\leq k\leq d+1, (1.1)

where the categorical values v_{i} are fixed and known, but the categorical probabilities p_{0,i} are unknown parameters. We want confidence bounds on the quantity

λ0=fv1,,vd+1(p0,1,,p0,d),\lambda_{0}=f_{v_{1},\dots,v_{d+1}}(p_{0,1},\dots,p_{0,d}), (1.2)

where the objective function (p1,,pd)fv1,,vd+1(p1,,pd)(p_{1},\dots,p_{d})\mapsto f_{v_{1},\dots,v_{d+1}}(p_{1},\dots,p_{d}) can depend on the categorical values v1,,vd+1v_{1},\dots,v_{d+1} and is assumed to be strictly convex. If we sample observations from the distribution (1.1), then note that the probability weights p0,1,,p0,d+1p_{0,1},\dots,p_{0,d+1} are also parameters of the multinomial distribution associated with the sample count vector 𝑵=(N1,N2,,Nd+1)\boldsymbol{N}=(N_{1},N_{2},\dots,N_{d+1}) where

N_{k}=\sum_{i=1}^{n}\mathds{1}_{\{X_{i}=v_{k}\}},\quad 1\leq k\leq d+1. (1.3)

Therefore, in order to build confidence bounds for λ0\lambda_{0}, an idea is to find a confidence set for the probability parameters of the sample count vector’s multinomial distribution and then minimize/maximize the objective function over the probability parameters in that set. This problem is described in detail and solved in Section 3.

Motivated by the above machine learning problem, we derive in this paper closely related theoretical results that may be of independent interest. They include a non-asymptotic local normal approximation for multinomial probabilities, a non-asymptotic total variation bound between the measures induced by uniformly jittered multinomials and the corresponding multivariate normals with the same means and covariances, and a non-asymptotic comparison of the cumulative distribution functions (c.d.f.s) between Pearson’s chi-square statistic (written as the normalized quadratic form of a multinomial vector) and its multivariate normal analogue, which then leads to quantile coupling bounds as a corollary. (For a good introduction to quantile coupling bounds, see, e.g., Mason & Zhou (2012).)

To the best of our knowledge, the first two results are new. The local approximation result complements the asymptotic version obtained in Lemma 2.1 of Siotani & Fujikoshi (1984), via a Stieltjes integral representation of probabilities due to Esseen (1945), and later rederived independently in Theorem 2.1 of Ouimet (2021b) in a log-ratio form via elementary Taylor expansions and Stirling’s formula. The total variation bound complements the asymptotic version obtained in the proof of Theorem 1 in Carter (2002) and its improvement in Lemma 3.1 of Ouimet (2021b), where the inductive part of Carter’s proof was removed by using the multinomial/normal local approximation directly instead of a multi-scale comparison between binomials and normals. Unfortunately, the comparison of the c.d.f.s (and thus the quantile coupling result), being of order (\log n)^{3/2}n^{-1/2}, turns out to be slightly sub-optimal, although the derivation is easy to follow and provides a completely different route from the usual one (so there should still be some interest in the argument). Earlier asymptotic results in that direction can be found in Siotani & Fujikoshi (1984) and Read (1984). Some lapses were later found in the asymptotic expansions of those two papers and corrected in Ulyanov & Zubov (2009); see also Prokhorov & Ulyanov (2013). Slightly better bounds, of order n^{-1/2}, can be obtained via a multivariate Berry-Esseen theorem; see the Kolmogorov distance bounds of Mann (1997a, b) mentioned on p. 734 of Gaunt et al. (2017) and the references therein. Gaunt et al. (2017) themselves derive, using Stein’s method, distributional bounds of order n^{-1} over smooth test functions between Pearson’s chi-square statistic and its asymptotic chi-square distribution; see also Gaunt (2021) and Gaunt & Reinert (2022) for similar results with respect to the Friedman and power divergence statistics. Since indicator functions are not smooth, the rate n^{-1/2} would be optimal in our context.

Here is the outline of the paper. The definitions and theoretical results are presented in Section 2 and proved in Appendix A. The solution to the problem described in the introduction is given in Section 3, and the related proofs are also deferred to Appendix A. Technical moment estimates are gathered in Appendix B.

2 Definitions and theoretical results

For dd\in\mathbb{N}, define the (unit) dd-dimensional simplex and its interior by

𝒮d:={𝒔[0,1]d:𝒔11},𝒮do:={𝒔(0,1)d:𝒔1<1},\mathcal{S}_{d}\vcentcolon=\big{\{}\boldsymbol{s}\in[0,1]^{d}:\|\boldsymbol{s}\|_{1}\leq 1\big{\}},\qquad\mathcal{S}_{d}^{o}\vcentcolon=\big{\{}\boldsymbol{s}\in(0,1)^{d}:\|\boldsymbol{s}\|_{1}<1\big{\}}, (2.1)

where 𝒔1:=i=1d|si|\|\boldsymbol{s}\|_{1}\vcentcolon=\sum_{i=1}^{d}|s_{i}| denotes the 1\ell^{1} norm on d\mathbb{R}^{d}. Given a positive integer nn and a set of probability weights 𝒑𝒮d\boldsymbol{p}\in\mathcal{S}_{d}, the Multinomial(n,𝒑)\mathrm{Multinomial}\hskip 0.56905pt(n,\boldsymbol{p}) probability mass function is defined by

Pn,𝒑(𝒌)=n!i=1d+1ki!i=1d+1piki,𝒌0dn𝒮d,P_{n,\boldsymbol{p}}(\boldsymbol{k})=\frac{n!}{\prod_{i=1}^{d+1}k_{i}!}\,\prod_{i=1}^{d+1}p_{i}^{k_{i}},\quad\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}, (2.2)

where p_{d+1} := 1-\|\boldsymbol{p}\|_{1}\geq 0 and k_{d+1} := n-\|\boldsymbol{k}\|_{1}. With this notation, the multinomial distribution has (d+1) categories, with respective probabilities p_{i}, 1\leq i\leq d+1. The covariance matrix of the first d components of a multinomial vector is well known to be n\Sigma_{\boldsymbol{p}}, where \Sigma_{\boldsymbol{p}} := \mathrm{diag}(\boldsymbol{p})-\boldsymbol{p}\boldsymbol{p}^{\top}; see, e.g., (Severini, 2005, p. 377). From Theorem 1 in Tanabe & Sagae (1992), we also know that \det(\Sigma_{\boldsymbol{p}})=\prod_{i=1}^{d+1}p_{i}. Hence, the density function of the centered multivariate Gaussian random vector with the same covariance matrix \Sigma_{\boldsymbol{p}} is defined as follows:

ϕΣ𝒑(𝒚):=exp(12𝒚Σ𝒑1𝒚)(2π)di=1d+1pi,𝒚d.\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{y})\vcentcolon=\frac{\exp\big{(}-\frac{1}{2}\boldsymbol{y}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\,\boldsymbol{y}\big{)}}{\sqrt{(2\pi)^{d}\,\prod_{i=1}^{d+1}p_{i}}},\quad\boldsymbol{y}\in\mathbb{R}^{d}. (2.3)
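As a side note, the determinant identity and the covariance structure just described are easy to check numerically; here is a short R illustration (with an arbitrary choice of \boldsymbol{p}, not taken from the paper) that also verifies the inverse formula used later in (2.7):

```r
# Check det(Sigma_p) = prod_{i=1}^{d+1} p_i and the inverse formula (2.7),
# Sigma_p^{-1} = diag(1/p) + (1/p_{d+1}) * 1_d 1_d^T, for an arbitrary p.
p_full <- c(0.2, 0.3, 0.15, 0.35)             # d + 1 = 4 category probabilities
d <- length(p_full) - 1
p <- p_full[1:d]
Sigma <- diag(p) - tcrossprod(p)              # Sigma_p = diag(p) - p p^T
all.equal(det(Sigma), prod(p_full))           # TRUE
all.equal(solve(Sigma), diag(1 / p) + matrix(1 / p_full[d + 1], d, d))  # TRUE
```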

Throughout this section, let 𝒑𝒮do\boldsymbol{p}\in\mathcal{S}_{d}^{o} be given, and let

𝑲Multinomial(n,𝒑),𝒀Normald(n𝒑,nΣ𝒑).\boldsymbol{K}\sim\mathrm{Multinomial}\hskip 0.85358pt(n,\boldsymbol{p}),\qquad\boldsymbol{Y}\sim\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}}). (2.4)

Ultimately, our main goal (Theorem 2.7) will be to compare the c.d.f. of Pearson’s chi-square statistic, 𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}, with its Gaussian analogue 𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑χd2\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\sim\chi_{d}^{2}, namely compare

(𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑)with(𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑),for >0,\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}\leq\ell)\quad\text{with}\quad\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell),\quad\text{for }\ell>0, (2.5)

where 𝜹𝒚,𝒑:=(δy1,p1,δy2,p2,,δyd,pd)\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}\vcentcolon=(\delta_{y_{1},p_{1}},\delta_{y_{2},p_{2}},\dots,\delta_{y_{d},p_{d}})^{\top},

δy,pi:=ynpin,y,1id+1.\delta_{y,p_{i}}\vcentcolon=\frac{y-np_{i}}{\sqrt{n}},\quad y\in\mathbb{R},~{}1\leq i\leq d+1. (2.6)

To see that the statistic 𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}} is just a rewriting of Pearson’s chi-square statistic, use the fact that

Σ𝒑1=diag(𝒑1)+pd+11𝟏d𝟏d\Sigma_{\boldsymbol{p}}^{-1}=\mathrm{diag}(\boldsymbol{p}^{-1})+p_{d+1}^{-1}\boldsymbol{1}_{d}^{\phantom{\top}}\boldsymbol{1}_{d}^{\top} (2.7)

(see, e.g., (Tanabe & Sagae, 1992, Eqn. 21)) together with K_{d+1}=n-\sum_{i=1}^{d}K_{i} to write

𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑\displaystyle\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}} =1n(𝑲n𝒑){diag(𝒑1)+pd+11𝟏d𝟏d}(𝑲n𝒑)\displaystyle=\frac{1}{n}(\boldsymbol{K}-n\boldsymbol{p})^{\top}\left\{\mathrm{diag}(\boldsymbol{p}^{-1})+p_{d+1}^{-1}\boldsymbol{1}_{d}^{\phantom{\top}}\boldsymbol{1}_{d}^{\top}\right\}(\boldsymbol{K}-n\boldsymbol{p})
=i=1d(Kinpi)2npi+i=1dj=1d(Kinpi)(Kjnpj)npd+1\displaystyle=\sum_{i=1}^{d}\frac{(K_{i}-np_{i})^{2}}{np_{i}}+\sum_{i=1}^{d}\sum_{j=1}^{d}\frac{(K_{i}-np_{i})(K_{j}-np_{j})}{np_{d+1}}
=i=1d(Kinpi)2npi+(Kd+1npd+1)2npd+1=i=1d+1(Kinpi)2npi.\displaystyle=\sum_{i=1}^{d}\frac{(K_{i}-np_{i})^{2}}{np_{i}}+\frac{(K_{d+1}-np_{d+1})^{2}}{np_{d+1}}=\sum_{i=1}^{d+1}\frac{(K_{i}-np_{i})^{2}}{np_{i}}. (2.8)
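The identity (2.8), together with the inverse formula (2.7), can be confirmed numerically; a small R illustration (with arbitrary n and \boldsymbol{p}, not taken from the paper) is given below:

```r
# Sanity check of (2.7)-(2.8): the normalized quadratic form equals Pearson's
# chi-square statistic.
set.seed(1)
n <- 100; p_full <- c(0.2, 0.3, 0.5); d <- length(p_full) - 1
K_full <- as.vector(rmultinom(1, size = n, prob = p_full))     # d + 1 counts
p <- p_full[1:d]; K <- K_full[1:d]
Sigma_inv <- diag(1 / p, d) + matrix(1 / p_full[d + 1], d, d)  # Eqn. (2.7)
delta <- (K - n * p) / sqrt(n)
quad  <- as.numeric(t(delta) %*% Sigma_inv %*% delta)
pearson <- sum((K_full - n * p_full)^2 / (n * p_full))
all.equal(quad, pearson)                                       # TRUE
```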
Remark 2.1.

Notice that

δkd+1,pd+1=i=1dδki,pi,\delta_{k_{d+1},p_{d+1}}=-\sum_{i=1}^{d}\delta_{k_{i},p_{i}}, (2.9)

because pd+1=1𝐩1p_{d+1}=1-\|\boldsymbol{p}\|_{1} and kd+1=n𝐤1k_{d+1}=n-\|\boldsymbol{k}\|_{1}.

Remark 2.2.

Let >0\ell>0 be given. Since we know that 𝛅𝐘,𝐩Σ𝐩1𝛅𝐘,𝐩\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}} is χd2\chi_{d}^{2} distributed, then

(𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑)=γ¯(d/2,/2),\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell)=\overline{\gamma}(d/2,\ell/2), (2.10)

where the two functions

γ¯(x,a):=0atx1etdtΓ(x),Γ(x)=0tx1etdt,x,a>0,\overline{\gamma}(x,a)\vcentcolon=\frac{\int_{0}^{a}t^{x-1}e^{-t}{\rm d}t}{\Gamma(x)},\qquad\Gamma(x)=\int_{0}^{\infty}t^{x-1}e^{-t}{\rm d}t,\quad x,a>0, (2.11)

denote the regularized lower incomplete gamma function and Euler’s gamma function, respectively.
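In R, for instance, the identity of Remark 2.2 corresponds to the equality of pchisq and pgamma (the latter with unit rate being exactly the regularized lower incomplete gamma function); the values of d and \ell below are arbitrary:

```r
# Remark 2.2 in code: the chi-square(d) c.d.f. coincides with the regularized
# lower incomplete gamma function gamma_bar(d/2, l/2).
d <- 4; l <- 3.7
pchisq(l, df = d)               # P(chi^2_d <= l)
pgamma(l / 2, shape = d / 2)    # gamma_bar(d/2, l/2), same value
```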

Remark 2.3.

Throughout the paper, the notation u=𝒪(v)u=\mathcal{O}(v) means that lim supn|u/v|<C\limsup_{n\to\infty}|u/v|<C, where C(0,)C\in(0,\infty) is a universal constant. Whenever CC might depend on a parameter, we add a subscript: for example, u=𝒪d(v)u=\mathcal{O}_{d}(v). Similarly, u=o(v)u=\mathrm{o}(v) means that limn|u/v|=0\lim_{n\to\infty}|u/v|=0, and subscripts indicate which parameters the convergence rate can depend on.

The first step (Proposition 2.4) to achieve our goal (2.5) is to prove a non-asymptotic version of the local limit theorem found in (Siotani & Fujikoshi, 1984, Lemma 2.1) and (Ouimet, 2021b, Theorem 2.1). Specifically, we develop a non-asymptotic expansion for the log-ratio of the multinomial probability mass function (2.2) over the corresponding multivariate normal density (2.3). Our approximation is uniform for \boldsymbol{k} in the bulk of the \mathrm{Multinomial}(n,\boldsymbol{p}) distribution and uniform for \boldsymbol{p} in a compact set away from the boundary of the simplex \mathcal{S}_{d}, namely for

𝒌n,𝒑:={𝒌0dn𝒮do:max1id+1|δki,pinpi|τlognn},\displaystyle\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}\vcentcolon=\left\{\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}^{o}:\max_{1\leq i\leq d+1}\left|\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right|\leq\tau\sqrt{\frac{\log n}{n}}\right\}, (2.12)
𝒑𝒫τ:={𝒔𝒮do:max1id+1si1τ},\displaystyle\boldsymbol{p}\in\mathscr{P}_{\tau}\vcentcolon=\left\{\boldsymbol{s}\in\mathcal{S}_{d}^{o}:\max_{1\leq i\leq d+1}s_{i}^{-1}\leq\tau\right\}, (2.13)

where τd+1\tau\geq d+1 is a fixed parameter.

Proposition 2.4 (Non-asymptotic local expansion).

Let τd+1\tau\geq d+1 be given. Then, uniformly for 𝐤n,𝐩\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}, 𝐩𝒫τ\boldsymbol{p}\in\mathscr{P}_{\tau} and nτ4n\geq\tau^{4}, we have

log{Pn,𝒑(𝒌)nd/2ϕΣ𝒑(𝜹𝒌,𝒑)}=n1/2i=1d+1(16pi2δki,pi312piδki,pi)+Rn,𝒑(𝒌),\displaystyle\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{k})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}\right\}=n^{-1/2}\sum_{i=1}^{d+1}\left(\frac{1}{6p_{i}^{2}}\,\delta_{k_{i},p_{i}}^{3}-\frac{1}{2p_{i}}\,\delta_{k_{i},p_{i}}\right)+R_{n,\boldsymbol{p}}(\boldsymbol{k}), (2.14)

where the error Rn,𝐩(𝐤)R_{n,\boldsymbol{p}}(\boldsymbol{k}) satisfies

|Rn,𝒑(𝒌)|\displaystyle|R_{n,\boldsymbol{p}}(\boldsymbol{k})| n1i=1d+1(21pi3|δki,pi|4+10pi2|δki,pi|2+2τ3).\displaystyle\leq n^{-1}\sum_{i=1}^{d+1}\left(\frac{21}{p_{i}^{3}}\,|\delta_{k_{i},p_{i}}|^{4}+\frac{10}{p_{i}^{2}}\,|\delta_{k_{i},p_{i}}|^{2}+\frac{2\tau}{3}\right). (2.15)
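As an informal illustration of Proposition 2.4 (not part of the argument), the exact log-ratio can be compared with the leading term of the expansion at a point \boldsymbol{k} in the bulk; a short R sketch with arbitrary n, \boldsymbol{p} and \boldsymbol{k}:

```r
# Numerical illustration of Proposition 2.4 (a sketch, not the paper's code):
# compare the exact log-ratio on the left of (2.14) with its leading term.
n <- 500; p_full <- c(0.2, 0.3, 0.5); d <- length(p_full) - 1
k_full <- c(105, 148, 247)                       # a point in the bulk, sum = n
p <- p_full[1:d]
delta_full <- (k_full - n * p_full) / sqrt(n)
Sigma_inv  <- diag(1 / p, d) + matrix(1 / p_full[d + 1], d, d)   # Eqn. (2.7)
quad <- as.numeric(t(delta_full[1:d]) %*% Sigma_inv %*% delta_full[1:d])
phi  <- exp(-0.5 * quad) / sqrt((2 * pi)^d * prod(p_full))       # Eqn. (2.3)
lhs  <- log(dmultinom(k_full, size = n, prob = p_full) / (n^(-d / 2) * phi))
rhs  <- sum(delta_full^3 / (6 * p_full^2) - delta_full / (2 * p_full)) / sqrt(n)
c(log_ratio = lhs, leading_term = rhs)           # close; the gap is R_{n,p}(k)
```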

The following corollary may be of independent interest. It extends the local limit theorem from Proposition 2.4 to one-to-one transformations of multinomial vectors.

Corollary 2.5.

Let τd+1\tau\geq d+1 be given. Let 𝐡()\boldsymbol{h}(\cdot) be a one-to-one mapping from an open subset 𝒟\mathcal{D} of 𝒮d\mathcal{S}_{d} onto a subset \mathcal{R} of d\mathbb{R}^{d}. Assume further that 𝐡\boldsymbol{h} has continuous partial derivatives on 𝒟\mathcal{D} and its Jacobian determinant det{dd𝐱𝐡(𝐱)}\det\{\frac{{\rm d}}{{\rm d}\boldsymbol{x}}\boldsymbol{h}(\boldsymbol{x})\} is non-zero for all 𝐱𝒟\boldsymbol{x}\in\mathcal{D}. Then, uniformly for 𝐲\boldsymbol{y}\in\mathcal{R} such that 𝐡1(𝐲)n,𝐩\boldsymbol{h}^{-1}(\boldsymbol{y})\in\mathscr{B}_{n,\boldsymbol{p}}, 𝐩𝒫τ\boldsymbol{p}\in\mathscr{P}_{\tau} and nτ4n\geq\tau^{4}, we have

log[Pn,𝒑{𝒉1(𝒚)}|det{dd𝒙𝒉(𝒙)|𝒙=𝒉1(𝒚)}|nd/2ϕΣ𝒑(𝜹𝒉1(𝒚),𝒑)]\displaystyle\log\left[\frac{P_{n,\boldsymbol{p}}\{\boldsymbol{h}^{-1}(\boldsymbol{y})\}\left|\det\left\{\frac{{\rm d}}{{\rm d}\boldsymbol{x}}\boldsymbol{h}(\boldsymbol{x})|_{\boldsymbol{x}=\boldsymbol{h}^{-1}(\boldsymbol{y})}\right\}\right|}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{h}^{-1}(\boldsymbol{y}),\boldsymbol{p}})}\right] (2.16)
=n1/2i=1d+1[16pi2δ{𝒉1(𝒚)}i,pi312piδ{𝒉1(𝒚)}i,pi]+Rn,𝒑{𝒉1(𝒚)},\displaystyle\quad=n^{-1/2}\sum_{i=1}^{d+1}\left[\frac{1}{6p_{i}^{2}}\,\delta_{\{\boldsymbol{h}^{-1}(\boldsymbol{y})\}_{i},p_{i}}^{3}-\frac{1}{2p_{i}}\,\delta_{\{\boldsymbol{h}^{-1}(\boldsymbol{y})\}_{i},p_{i}}\right]+R_{n,\boldsymbol{p}}\{\boldsymbol{h}^{-1}(\boldsymbol{y})\},

where the error Rn,𝐩{𝐡1(𝐲)}R_{n,\boldsymbol{p}}\{\boldsymbol{h}^{-1}(\boldsymbol{y})\} satisfies

|Rn,𝒑{𝒉1(𝒚)}|\displaystyle|R_{n,\boldsymbol{p}}\{\boldsymbol{h}^{-1}(\boldsymbol{y})\}| n1i=1d+1[21pi3|δ{𝒉1(𝒚)}i,pi|4+10pi2|δ{𝒉1(𝒚)}i,pi|2+2τ3].\displaystyle\leq n^{-1}\sum_{i=1}^{d+1}\left[\frac{21}{p_{i}^{3}}\,|\delta_{\{\boldsymbol{h}^{-1}(\boldsymbol{y})\}_{i},p_{i}}|^{4}+\frac{10}{p_{i}^{2}}\,|\delta_{\{\boldsymbol{h}^{-1}(\boldsymbol{y})\}_{i},p_{i}}|^{2}+\frac{2\tau}{3}\right]. (2.17)

Using the local expansion from Proposition 2.4, the second step (Proposition 2.6) to achieve our goal (2.5) consists in finding an upper bound on the total variation between the probability measure associated with the multivariate normal in (2.3) and the probability measure of a multinomial vector jittered by a uniform random variable on (-1/2,1/2)^{d}. The proposition is a non-asymptotic version of (Ouimet, 2021b, Lemma 3.1).

Proposition 2.6 (Total variation).

Let τd+1\tau\geq d+1 be given, and let 𝐊Multinomial(n,𝐩)\boldsymbol{K}\sim\mathrm{Multinomial}\hskip 0.56905pt(n,\boldsymbol{p}) and 𝐔Uniform(1/2,1/2)d\boldsymbol{U}\sim\mathrm{Uniform}\hskip 0.56905pt(-1/2,1/2)^{d}, where 𝐊\boldsymbol{K} and 𝐔\boldsymbol{U} are assumed independent. Define 𝐗:=𝐊+𝐔\boldsymbol{X}\vcentcolon=\boldsymbol{K}+\boldsymbol{U} and let ~n,𝐩\widetilde{\mathbb{P}}_{n,\boldsymbol{p}} be the law of 𝐗\boldsymbol{X}. Let n,𝐩\mathbb{Q}_{n,\boldsymbol{p}} be the law of the multivariate normal distribution Normald(n𝐩,nΣ𝐩)\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}}), where recall Σ𝐩:=diag(𝐩)𝐩𝐩\Sigma_{\boldsymbol{p}}\vcentcolon=\mathrm{diag}(\boldsymbol{p})-\boldsymbol{p}\boldsymbol{p}^{\top}. Then, for all nτ4n\geq\tau^{4}, we have

sup𝒑𝒫τ~n,𝒑n,𝒑8.03τ3/2(d+1)1/2n1/2,\sup_{\boldsymbol{p}\in\mathscr{P}_{\tau}}\|\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\|\leq\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}, (2.18)

where \|\cdot\| denotes the total variation norm.
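The total variation distance bounded in (2.18) can also be estimated by simulation; the following R sketch (with arbitrary n and \boldsymbol{p}) assumes the convention \|\widetilde{\mathbb{P}}-\mathbb{Q}\|=\sup_{A}|\widetilde{\mathbb{P}}(A)-\mathbb{Q}(A)|=\tfrac{1}{2}\int|\mathrm{d}\widetilde{\mathbb{P}}-\mathrm{d}\mathbb{Q}|; the paper's normalization may differ by a factor of 2.

```r
# Monte Carlo sketch of the total variation distance in Proposition 2.6,
# using TV = 0.5 * E_Q | dP_tilde/dQ - 1 |.
set.seed(2)
n <- 200; p_full <- c(0.2, 0.3, 0.5); d <- length(p_full) - 1
p <- p_full[1:d]
Sigma <- diag(p) - tcrossprod(p); Sigma_inv <- solve(Sigma)
dens_tilde <- function(y) {                 # density of K + U at y (jittered law)
  k <- round(y); k_last <- n - sum(k)
  if (any(k < 0) || k_last < 0) return(0)
  dmultinom(c(k, k_last), size = n, prob = p_full)
}
dens_q <- function(y) {                     # density of Normal_d(n p, n Sigma_p)
  delta <- (y - n * p) / sqrt(n)
  exp(-0.5 * sum(delta * (Sigma_inv %*% delta))) / sqrt((2 * pi * n)^d * det(Sigma))
}
Z <- matrix(rnorm(d * 2e4), nrow = d)                 # 2e4 standard normal draws
Y <- t(n * p + t(chol(n * Sigma)) %*% Z)              # draws from Q_{n,p}
ratios <- apply(Y, 1, function(y) abs(dens_tilde(y) / dens_q(y) - 1))
0.5 * mean(ratios)        # TV estimate; Proposition 2.6 says it is O(n^{-1/2})
```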

The final step (Theorem 2.7) to achieve our goal (2.5) consists in controlling the difference between (𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑)\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}\leq\ell) and (𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑)\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell) using the total variation bound from Proposition 2.6.

Theorem 2.7 (C.d.f. comparison).

Let τd+1\tau\geq d+1 and >0\ell>0 be given. Then, for all nτ4n\geq\tau^{4}, we have

sup𝒑𝒫τ|(𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑)(𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑)|1.26τ3(d+1)(logn)3/2n1/2.\sup_{\boldsymbol{p}\in\mathscr{P}_{\tau}}\big{|}\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}\leq\ell)-\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell)\big{|}\leq\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}. (2.19)
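A quick Monte Carlo simulation (an illustration only, with arbitrary n, \boldsymbol{p} and \ell) shows the two c.d.f.s compared in (2.19) agreeing closely:

```r
# Simulation illustrating Theorem 2.7: the c.d.f. of Pearson's statistic is
# close to the chi-square(d) c.d.f. (a sketch, not the paper's code).
set.seed(3)
n <- 300; p_full <- c(0.2, 0.3, 0.5); d <- length(p_full) - 1; l <- 4
K <- rmultinom(5e4, size = n, prob = p_full)                  # (d+1) x 5e4 counts
pearson <- colSums((K - n * p_full)^2 / (n * p_full))         # statistic per sample
c(empirical = mean(pearson <= l), chisq = pchisq(l, df = d))  # nearly equal
```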

As a consequence of Remark 2.2, Theorem 2.7 and the mean value theorem, we deduce the following stochastic upper bound on 𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}.

Corollary 2.8 (Quantile coupling).

Let FF be the cumulative distribution function (c.d.f.) of 𝛅𝐊,𝐩Σ𝐩1𝛅𝐊,𝐩\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}, and let F(q):=inf{x:F(x)q}F^{\star}(q)\vcentcolon=\inf\{x\in\mathbb{R}:F(x)\geq q\} denote the generalized inverse distribution function (or quantile function) associated with FF. Also, let GG be the c.d.f. of 𝛅𝐘,𝐩Σ𝐩1𝛅𝐘,𝐩\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}, and define the random variable Ξn,𝐩:=F{G(𝛅𝐘,𝐩Σ𝐩1𝛅𝐘,𝐩)}\Xi_{n,\boldsymbol{p}}\vcentcolon=F^{\star}\{G(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}})\}. For >0\ell>0, let

c():=1.26τ3(d+1)(logn)3/2G()n1/2,G()=(/2)d/21e/22Γ(d/2).c(\ell)\vcentcolon=\ell-\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{G^{\prime}(\ell)\,n^{1/2}},\quad G^{\prime}(\ell)=\frac{(\ell/2)^{d/2-1}e^{-\ell/2}}{2\,\Gamma(d/2)}. (2.20)

Then, with the event

A:={ωΩ:2(d/21)𝜹𝒀(ω),𝒑Σ𝒑1𝜹𝒀(ω),𝒑sup[2(d/21),)c()},A\vcentcolon=\left\{\omega\in\Omega:2(d/2-1)\leq\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\phantom{\top}}\leq\sup_{\ell\in[2(d/2-1),\infty)}c(\ell)\right\}, (2.21)

we have, for nn(d,τ)n\geq n(d,\tau) large enough,

Ξn,𝒑(ω)𝜹𝒀(ω),𝒑Σ𝒑1𝜹𝒀(ω),𝒑+1.26τ3(d+1)(logn)3/2G{Ln,𝒑,d,τ(ω)}n1/2,ωA,\Xi_{n,\boldsymbol{p}}(\omega)\leq\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\phantom{\top}}+\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{G^{\prime}\{L_{n,\boldsymbol{p},d,\tau}(\omega)\}\,n^{1/2}},\quad\omega\in A, (2.22)

where Ln,𝐩,d,τ(ω)L_{n,\boldsymbol{p},d,\tau}(\omega) solves 𝛅𝐘(ω),𝐩Σ𝐩1𝛅𝐘(ω),𝐩=c{Ln,𝐩,d,τ(ω)}\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\phantom{\top}}=c\{L_{n,\boldsymbol{p},d,\tau}(\omega)\}.
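For reference, G'(\ell) in (2.20) is simply the \chi_{d}^{2} density, so c(\ell) is easy to evaluate; a small R check with illustrative (and, for the correction term to be small, necessarily very large) values of d, \tau, n and \ell:

```r
# G'(l) in (2.20) equals the chi-square(d) density, so c(l) is one line of code.
d <- 3; tau <- d + 1; n <- 1e12; l <- 5
G_prime <- (l / 2)^(d / 2 - 1) * exp(-l / 2) / (2 * gamma(d / 2))
all.equal(G_prime, dchisq(l, df = d))                            # TRUE
l - 1.26 * tau^3 * (d + 1) * log(n)^(3 / 2) / (G_prime * sqrt(n))
# c(l); close to l only for very large n, since the constants are conservative
```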

3 Application to confidence intervals

3.1 Description of the problem

In this section, we apply the method outlined in Section 1, using the normal approximation results obtained in Section 2 to carry out the optimization step, which is complicated by the multi-dimensionality of the problem and by the roughness of the exact confidence set for the multinomial probabilities.

Formally, consider a discrete random variable XX having the following probability mass function

(X=vi)=p0,i,1id+1,\mathbb{P}(X=v_{i})=p_{0,i},\quad 1\leq i\leq d+1, (3.1)

where dd\in\mathbb{N} is a positive integer, the values (𝒗,vd+1)d+1(\boldsymbol{v},v_{d+1})\in\mathbb{R}^{d+1} are known, but the probability weights p0,ip_{0,i} are unknown. We consider a (fixed and given) strictly convex objective function 𝒑f𝒗,vd+1(𝒑)\boldsymbol{p}\mapsto f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}), which may depend on the values of 𝒗\boldsymbol{v} and vd+1v_{d+1}, and our goal is to estimate and find a confidence interval for

λ0:=f𝒗,vd+1(𝒑0),where 𝒑0:=(p0,1,,p0,d).\lambda_{0}\vcentcolon=f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}_{0}),\quad\text{where }\boldsymbol{p}_{0}\vcentcolon=(p_{0,1},\dots,p_{0,d}). (3.2)

Given a significance level α(0,1)\alpha\in(0,1), a sequence of i.i.d. observations X1,X2,XnX_{1},X_{2},\dots X_{n} following the discrete distribution in (3.1), and the associated sample count random variables

Nk:=i=1n𝟙{Xi=vk},1kd+1,N_{k}\vcentcolon=\sum_{i=1}^{n}\mathds{1}_{\{X_{i}=v_{k}\}},\quad 1\leq k\leq d+1, (3.3)

we will show how to construct a lower bound λ=λ(𝒗,vd+1,N1,N2,,Nd)\lambda_{\star}=\lambda_{\star}(\boldsymbol{v},v_{d+1},N_{1},N_{2},\dots,N_{d}) and an upper bound λ=λ(𝒗,vd+1,N1,N2,,Nd)\lambda^{\star}=\lambda^{\star}(\boldsymbol{v},v_{d+1},N_{1},N_{2},\dots,N_{d}) that satisfy

(λλ0λ)1α+εn,for all nn,\mathbb{P}(\lambda_{\star}\leq\lambda_{0}\leq\lambda^{\star})\geq 1-\alpha+\varepsilon_{n},\quad\text{for all }n\geq n_{\star}, (3.4)

where nn_{\star} is an explicit threshold, 𝑵:=(N1,N2,,Nd)Multinomial(n,𝒑0)\boldsymbol{N}\vcentcolon=(N_{1},N_{2},\dots,N_{d})\sim\mathrm{Multinomial}\hskip 0.85358pt(n,\boldsymbol{p}_{0}), and εn\varepsilon_{n} is a small explicit quantity that goes to 0 as nn\to\infty.

When there are only two category values (d=1d=1), this problem is easy. Indeed, using a binary search, a computer can find an exact (1α)100%(1-\alpha)\cdot 100\%-confidence interval for p0p_{0}, say [p0¯,p0¯][\underline{p_{0}},\overline{p_{0}}], using the fact that N1Binomial(n,p0)N_{1}\sim\mathrm{Binomial}(n,p_{0}). Then we find λ\lambda_{\star} (resp., λ\lambda^{\star}) simply by minimizing (resp., maximizing) the objective function pfv1,v2(p)p\mapsto f_{v_{1},v_{2}}(p) over p[p0¯,p0¯]p\in[\underline{p_{0}},\overline{p_{0}}], i.e.,

λ=minp[p0¯,p0¯]fv1,v2(p),λ=maxp[p0¯,p0¯]fv1,v2(p).\lambda_{\star}=\min_{p\in[\underline{p_{0}},\overline{p_{0}}]}f_{v_{1},v_{2}}(p),\qquad\lambda^{\star}=\max_{p\in[\underline{p_{0}},\overline{p_{0}}]}f_{v_{1},v_{2}}(p). (3.5)

(For details on binomial inversion, see, e.g., Langford (2005). Binomial tails can be computed accurately using Loader’s method (Loader, 2002), as implemented in pbinom() in R, approximated using Stirling’s formula with several terms for very large n, or computed exactly using arithmetic on extended numbers such as BigInteger and BigDecimal in Java.)
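Below is a minimal R sketch of this d=1 recipe, assuming a Clopper-Pearson style inversion (equal \alpha/2 tail splits) obtained by binary search on pbinom, and the negative entropy of a two-point distribution as the strictly convex objective; the exact inversion used in the paper may differ in the details.

```r
# Sketch of the d = 1 case: exact binomial confidence interval by binary
# search, then min/max of a strictly convex f over that interval.
alpha <- 0.05; n <- 500; n1 <- 180              # illustrative observed count
invert <- function(keep_going) {                # binary search on [0, 1]
  lo <- 0; hi <- 1
  for (iter in 1:60) {
    mid <- (lo + hi) / 2
    if (keep_going(mid)) lo <- mid else hi <- mid
  }
  (lo + hi) / 2
}
p_lo <- invert(function(p) pbinom(n1 - 1, n, p) > 1 - alpha / 2)  # lower limit
p_hi <- invert(function(p) pbinom(n1, n, p) > alpha / 2)          # upper limit
f <- function(p) p * log(p) + (1 - p) * log(1 - p)   # negative entropy, d = 1
lambda_lower <- optimize(f, c(p_lo, p_hi))$objective # min of convex f
lambda_upper <- max(f(p_lo), f(p_hi))                # max of convex f: endpoint
c(p_lo = p_lo, p_hi = p_hi, lower = lambda_lower, upper = lambda_upper)
```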

In higher dimensions, however, the problem becomes much more difficult. Given a significance level \alpha\in(0,1) and an (observed) vector of sample counts

ni,1id+1,where nd+1:=n𝒏1and𝒑^n:=𝒏/n,n_{i}\in\mathbb{N},~{}1\leq i\leq d+1,\qquad\text{where }n_{d+1}\vcentcolon=n-\|\boldsymbol{n}\|_{1}~{}\text{and}~{}\hat{\boldsymbol{p}}_{n}\vcentcolon=\boldsymbol{n}/n, (3.6)

we have to optimize the objective function 𝒑f𝒗,vd+1(𝒑)\boldsymbol{p}\mapsto f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}) over vectors of probabilities in the following confidence set for 𝒑0\boldsymbol{p}_{0}:

𝒞(𝒑^n,α):={𝒑𝒮do:𝒌0dn𝒮dPn,𝒑(𝒌)Pn,𝒑(𝒏)Pn,𝒑(𝒌)1α}.\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha)\vcentcolon=\Bigg{\{}\boldsymbol{p}\in\mathcal{S}_{d}^{o}:\sum_{\begin{subarray}{c}\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}\\ P_{n,\boldsymbol{p}}(\boldsymbol{k})\geq P_{n,\boldsymbol{p}}(\boldsymbol{n})\end{subarray}}P_{n,\boldsymbol{p}}(\boldsymbol{k})\leq 1-\alpha\Bigg{\}}. (3.7)
Remark 3.1.

Assuming that the (observed) sample counts nin_{i} are all positive (i.e., each of the d+1d+1 categories has at least one observation), note that the sum in (3.7) is equal to 11 for any 𝐩\boldsymbol{p} on the boundary of the simplex since Pn,𝐩(𝐧)=0P_{n,\boldsymbol{p}}(\boldsymbol{n})=0 in that case. This is why the confidence set 𝒞(𝐩^n,α)\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha) is restricted to the interior of the simplex in (3.7). This remark will play an important role in showing that the optima which define the confidence bounds in Theorem 3.3 are unique, see (A.67) and below.

The shape of the set \mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha) is unclear and its boundary need not be smooth, which makes the optimization of the objective function much more complicated than in the binomial case (d=1). To overcome this obstacle, our strategy will be to find a smooth superset that contains the exact confidence set for \boldsymbol{p}_{0} and then compute \lambda_{\star} and \lambda^{\star} by optimizing over the smooth superset, as in (3.5). The small quantity \varepsilon_{n} in (3.4) is the “price” we have to pay in confidence to transfer the optimization from the rough (but exact) confidence set to the smooth superset. The smooth superset for \boldsymbol{p}_{0}=(p_{0,1},p_{0,2},\dots,p_{0,d}) will be constructed using the fact that the law of the sample count vector \boldsymbol{N}, i.e., the \mathrm{Multinomial}(n,\boldsymbol{p}_{0}) distribution, is close in total variation, once jittered by a \mathrm{Uniform}(-1/2,1/2)^{d} random vector, to the multivariate normal distribution with the same mean n\boldsymbol{p}_{0} and covariance n\Sigma_{\boldsymbol{p}_{0}}. This was proved in Proposition 2.6. Other elements of the proof of Theorem 2.7 will be used for the comparison of the confidence sets.
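For small n and d, membership in the exact confidence set (3.7) can be tested by brute-force enumeration, which makes the roughness of the set concrete; a sketch in R (illustrative counts, written for d=2 only):

```r
# Brute-force membership test for the exact confidence set (3.7).
# Only feasible for small n and d; not an efficient method.
in_exact_set <- function(p, n_obs, alpha) {
  n <- sum(n_obs); d <- length(n_obs) - 1
  p_full <- c(p, 1 - sum(p))
  P_obs <- dmultinom(n_obs, size = n, prob = p_full)
  grid <- expand.grid(k1 = 0:n, k2 = 0:n)                  # d = 2 only
  grid <- grid[grid$k1 + grid$k2 <= n, ]
  probs <- apply(grid, 1, function(k)
    dmultinom(c(k, n - sum(k)), size = n, prob = p_full))
  sum(probs[probs >= P_obs]) <= 1 - alpha
}
n_obs <- c(9, 14, 27)                                      # observed counts, n = 50
in_exact_set(c(0.2, 0.3), n_obs, alpha = 0.05)  # TRUE: compatible with hat{p}_n
in_exact_set(c(0.6, 0.2), n_obs, alpha = 0.05)  # FALSE: far from hat{p}_n
```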

3.2 Solution to the problem

As mentioned previously, it is not obvious (a priori) how we can optimize the objective function 𝒑f𝒗,vd+1(𝒑)\boldsymbol{p}\mapsto f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}) over the confidence set 𝒞(𝒑^n,α)\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha) and find an explicit solution given the multi-dimensional, discrete and rough nature of the confidence set. However, a natural idea to simplify the problem is to replace the confidence set 𝒞(𝒑^n,α)\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha) by its Gaussian analogue, namely

𝒞~(𝒑^n,α)\displaystyle\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha) :={𝒑𝒮do:ϕΣ𝒑(𝜹𝒚,𝒑)ϕΣ𝒑(𝜹𝒏,𝒑)ϕΣ𝒑(𝜹𝒚,𝒑)nd/2d𝒚1α}\displaystyle\vcentcolon=\left\{\boldsymbol{p}\in\mathcal{S}_{d}^{o}:\int_{\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}})\geq\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}})}\frac{\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}})}{n^{d/2}}{\rm d}\boldsymbol{y}\leq 1-\alpha\right\} (3.8)
={𝒑𝒮do:(𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑)1α}\displaystyle=\left\{\boldsymbol{p}\in\mathcal{S}_{d}^{o}:\mathbb{P}\left(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}}\right)\leq 1-\alpha\right\}
={𝒑𝒮do:γ¯(d/2,𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑/2)1α},\displaystyle=\left\{\boldsymbol{p}\in\mathcal{S}_{d}^{o}:\overline{\gamma}(d/2,\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}}/2)\leq 1-\alpha\right\},

where recall from (2.4) that \boldsymbol{Y}\sim\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}}), so that \boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}\sim\mathrm{Normal}_{d}(\boldsymbol{0}_{d},\Sigma_{\boldsymbol{p}}); \phi_{\Sigma_{\boldsymbol{p}}} was defined in (2.3), \boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}} was defined in (2.6), and \overline{\gamma}(\cdot,\cdot) was defined in (2.11).
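Testing membership in the Gaussian set (3.8) is straightforward, since the last line reduces to comparing the chi-square c.d.f. of the quadratic form with 1-\alpha; a short R sketch (same illustrative counts as above):

```r
# Membership test for the Gaussian confidence set (3.8): p belongs to the set
# iff pchisq(quadratic form, d) <= 1 - alpha.
in_gaussian_set <- function(p, n_obs, alpha) {
  n <- sum(n_obs); d <- length(n_obs) - 1
  p_full <- c(p, 1 - sum(p))
  Sigma_inv <- diag(1 / p, d) + matrix(1 / p_full[d + 1], d, d)   # Eqn. (2.7)
  delta <- (n_obs[1:d] - n * p) / sqrt(n)
  quad  <- as.numeric(t(delta) %*% Sigma_inv %*% delta)
  pchisq(quad, df = d) <= 1 - alpha       # gamma_bar(d/2, quad/2) <= 1 - alpha
}
n_obs <- c(9, 14, 27)
in_gaussian_set(c(0.2, 0.3), n_obs, alpha = 0.05)   # TRUE
in_gaussian_set(c(0.6, 0.2), n_obs, alpha = 0.05)   # FALSE
```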

First, we need to find a Gaussian superset 𝒞~(𝒑^n,αεn)\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n}) as in (3.8) that contains the exact confidence set 𝒞(𝒑^n,α)\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha), where εn\varepsilon_{n} is a positive quantity that goes to zero as nn\to\infty. The proposition below does just that. The proof relies on the total variation bound from Proposition 2.6 and on other elements of the proof of Theorem 2.7.

Proposition 3.2.

Let α(0,1)\alpha\in(0,1), τd+1\tau\geq d+1, and let 𝐧n𝒮do\boldsymbol{n}\in n\mathcal{S}_{d}^{o} be given such that 𝐩^n𝒫τ\hat{\boldsymbol{p}}_{n}\in\mathscr{P}_{\tau}. Recall the definition of the confidence sets 𝒞(𝐩^n,α)\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha) and 𝒞~(𝐩^n,α)\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha) in (3.7) and (3.8). For all nτ4n\geq\tau^{4}, we have

𝒞(𝒑^n,α)𝒞~(𝒑^n,αεn),withεn:=194.96τ3(d+1)(logn)3/2n1/2.\mathcal{C}(\hat{\boldsymbol{p}}_{n},\alpha)\subseteq\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n}),\quad\text{with}\quad\varepsilon_{n}\vcentcolon=\frac{194.96\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}. (3.9)

Finally, by Proposition 3.2, we can find the lower bound λ\lambda_{\star} and the upper bound λ\lambda^{\star} by optimizing the objective function f𝒗,vd+1(𝒑)f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}) over the Gaussian superset 𝒞~(𝒑^n,αεn)\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n}). This solves the problem mentioned in the introduction and described in Section 3.1. The proof shows that the Gaussian superset is strictly convex and it also establishes the existence and uniqueness of the minimizers/maximizers.

Theorem 3.3 (Existence and uniqueness of the solution to the problem).

Assume the objective function 𝐩f𝐯,vd+1(𝐩)\boldsymbol{p}\mapsto f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}) to be strictly convex. Let α(0,1)\alpha\in(0,1), τd+1\tau\geq d+1, and let 𝐧n𝒮do\boldsymbol{n}\in n\mathcal{S}_{d}^{o} be given such that 𝐩^n𝒫τ\hat{\boldsymbol{p}}_{n}\in\mathscr{P}_{\tau}. We have

(λλ0λ)1α+εn,for all nτ4,\mathbb{P}(\lambda_{\star}\leq\lambda_{0}\leq\lambda^{\star})\geq 1-\alpha+\varepsilon_{n},\quad\text{for all }n\geq\tau^{4}, (3.10)

with the choices

λ=min𝒑𝒞~(𝒑^n,αεn)f𝒗,vd+1(𝒑),λ=max𝒑𝒞~(𝒑^n,αεn)f𝒗,vd+1(𝒑).\lambda_{\star}=\min_{\boldsymbol{p}\in\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n})}f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}),\qquad\lambda^{\star}=\max_{\boldsymbol{p}\in\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n})}f_{\boldsymbol{v},v_{d+1}}(\boldsymbol{p}). (3.11)

Furthermore, the set 𝒞~(𝐩^n,αεn)\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n}) is strictly convex, so that the minimizers/maximizers in (3.11) are well defined (i.e., unique).

The guarantee on the confidence level in (3.10) cannot be close to sharp unless the number of observations n is extremely large, because the correction \varepsilon_{n} is so conservative. Therefore, for practical purposes, we simply set \varepsilon_{n}=0. This is illustrated below for two examples of objective functions:

f𝒗,vd+1[1](𝒑)=i=1d+1pilogpi,(negative entropy function)\displaystyle f_{\boldsymbol{v},v_{d+1}}^{[1]}(\boldsymbol{p})=\sum_{i=1}^{d+1}p_{i}\log p_{i},\qquad(\text{negative entropy function}) (3.12)
f𝒗,vd+1[2](𝒑)=𝒑A𝒑,(quadratic form)\displaystyle f_{\boldsymbol{v},v_{d+1}}^{[2]}(\boldsymbol{p})=\boldsymbol{p}^{\top}A\boldsymbol{p},\qquad(\text{quadratic form}) (3.13)

where AA is a d×dd\times d positive definite matrix that may depend on the known categorical values 𝒗\boldsymbol{v} and vd+1v_{d+1}.
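As a rough illustration of how \lambda_{\star} and \lambda^{\star} in (3.11) can be computed for the negative entropy objective (3.12) with \varepsilon_{n}=0, here is a penalized-optimization sketch in base R. It is not the authors' code (which is linked below); the penalty weight, the optimizer settings and the illustrative counts are arbitrary choices.

```r
# Penalized optimization over the Gaussian set (3.8) with epsilon_n = 0.
f_negent <- function(p_full) sum(p_full * log(p_full))    # objective (3.12)
pearson  <- function(p_full, n_obs) {                     # quadratic form in (3.8)
  n <- sum(n_obs); sum((n_obs - n * p_full)^2 / (n * p_full))
}
conf_bound <- function(n_obs, alpha, sign = +1) {         # sign = +1: min, -1: max
  n <- sum(n_obs); d <- length(n_obs) - 1
  q <- qchisq(1 - alpha, df = d)                          # gamma_bar <= 1 - alpha
  obj <- function(p) {
    p_full <- c(p, 1 - sum(p))
    if (any(p_full <= 1e-9)) return(1e6)                  # stay inside the simplex
    sign * f_negent(p_full) + 1e4 * max(0, pearson(p_full, n_obs) - q)^2
  }
  fit <- optim(n_obs[1:d] / n, obj, control = list(maxit = 5e3, reltol = 1e-12))
  f_negent(c(fit$par, 1 - sum(fit$par)))
}
n_obs <- c(21, 32, 47)                                    # n = 100, d = 2
c(lambda_lower = conf_bound(n_obs, 0.05, +1),
  lambda_upper = conf_bound(n_obs, 0.05, -1),
  plug_in      = f_negent(n_obs / sum(n_obs)))
```

For the minimization this is reliable, since a strictly convex function has a unique minimizer over a convex set; for the maximization, which is attained on the boundary of \widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha), a local penalized search like this is only a heuristic, and a more careful boundary search may be needed.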

In Figures 3.1 and 3.2, the confidence bounds are shown to converge to the value of the objective functions at p0p_{0} (i.e., λ0\lambda_{0}) and the empirical significance levels are shown to stay below the nominal level α=0.05\alpha=0.05. The R code that generated the figures is available at https://www.dropbox.com/s/tfwnnba642740xo/multinomial_method.R?dl=0.

Figure 3.1: Example of confidence bounds and empirical levels as a function of n, when the parameters are set to d=2, \tau=d+1, \alpha=0.05, \boldsymbol{p}_{0}=(0.2,0.3,0.5)^{\top}, \boldsymbol{v}=(1,2,3)^{\top}, and the objective function is f_{\boldsymbol{v},v_{d+1}}^{[1]}(\boldsymbol{p})=\sum_{i=1}^{d+1}p_{i}\log p_{i}. Panel (a): confidence bounds on \lambda_{0} as a function of n; panel (b): empirical level as a function of n.

Figure 3.2: Example of confidence bounds and empirical levels as a function of n, when the parameters are set to d=3, \tau=d+1, \alpha=0.05, \boldsymbol{p}_{0}=(0.2,0.3,0.15,0.35)^{\top}, \boldsymbol{v}=(1,2,3,4)^{\top}, and the objective function is f_{\boldsymbol{v},v_{d+1}}^{[2]}(\boldsymbol{p})=\boldsymbol{p}^{\top}A\boldsymbol{p} with A=\begin{pmatrix}\boldsymbol{a}_{\cdot 1}&\boldsymbol{a}_{\cdot 2}&\boldsymbol{a}_{\cdot 3}\end{pmatrix}, \boldsymbol{a}_{\cdot 1}=(v_{1}+1,0.5,0.25)^{\top}, \boldsymbol{a}_{\cdot 2}=(0.5,v_{2}+1,0.75)^{\top}, \boldsymbol{a}_{\cdot 3}=(0.25,0.75,v_{3}+1)^{\top}. Panel (a): confidence bounds on \lambda_{0} as a function of n; panel (b): empirical level as a function of n.

Appendix A Proofs

Proof of Proposition 2.4.

Writing out the log-ratio on the left-hand side of (2.14) explicitly, using the definitions (2.2) and (2.3), we have

log{Pn,𝒑(𝒌)nd/2ϕΣ𝒑(𝜹𝒌,𝒑)}=log{n!(2πn)d/2i=1d+1ki!}+i=1d+1(ki+12)logpi+12𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑.\displaystyle\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{k})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}\right\}=\log\left\{\frac{n!\,(2\pi n)^{d/2}}{\prod_{i=1}^{d+1}k_{i}!}\right\}+\sum_{i=1}^{d+1}\left(k_{i}+\frac{1}{2}\right)\log p_{i}+\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}. (A.1)

From Lindelöf’s estimate of the remainder terms in the expansion of the factorials, found for example on page 67 of Remmert (1998), we know that

n!=2πexp{(n+12)lognn+λn},where |λn|112n1.n!=\sqrt{2\pi}\exp\left\{(n+\tfrac{1}{2})\log n-n+\lambda_{n}\right\},\quad\text{where }|\lambda_{n}|\leq\frac{1}{12}n^{-1}. (A.2)

By applying (A.2) in (A.1) and reorganizing the terms, we get

log{Pn,𝒑(𝒌)nd/2ϕΣ𝒑(𝜹𝒌,𝒑)}\displaystyle\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{k})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}\right\} =i=1d+1(ki+12)log(kinpi)+12𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑\displaystyle=-\sum_{i=1}^{d+1}\left(k_{i}+\frac{1}{2}\right)\log\left(\frac{k_{i}}{np_{i}}\right)+\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}} (A.3)
+1n{λnn1i=1d+1λkiki11pi(kinpi)1}.\displaystyle\quad+\frac{1}{n}\left\{\frac{\lambda_{n}}{n^{-1}}-\sum_{i=1}^{d+1}\frac{\lambda_{k_{i}}}{k_{i}^{-1}}\cdot\frac{1}{p_{i}}\cdot\left(\frac{k_{i}}{np_{i}}\right)^{-1}\right\}.

Since ki/(npi)=1+δki,pi/(npi)k_{i}/(np_{i})=1+\delta_{k_{i},p_{i}}/(\sqrt{n}\,p_{i}), the above is

log{Pn,𝒑(𝒌)nd/2ϕΣ𝒑(𝜹𝒌,𝒑)}\displaystyle\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{k})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}\right\} =i=1d+1npi(1+δki,pinpi)log(1+δki,pinpi)\displaystyle=-\sum_{i=1}^{d+1}np_{i}\left(1+\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right)\log\left(1+\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right) (A.4)
i=1d+112log(1+δki,pinpi)+12𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑\displaystyle\quad-\sum_{i=1}^{d+1}\frac{1}{2}\log\left(1+\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right)+\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}
+1n{λnn1i=1d+1λkiki11pi(1+δki,pinpi)1}.\displaystyle\quad+\frac{1}{n}\left\{\frac{\lambda_{n}}{n^{-1}}-\sum_{i=1}^{d+1}\frac{\lambda_{k_{i}}}{k_{i}^{-1}}\cdot\frac{1}{p_{i}}\cdot\left(1+\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right)^{-1}\right\}.

Now, for |y|τ(logn)/n(logn)/n1/20.84|y|\leq\tau\sqrt{(\log n)/n}\leq\sqrt{(\log n)/n^{1/2}}\leq 0.84 (recall our assumption nτ416n\geq\tau^{4}\geq 16), Lagrange error bounds for the following Taylor expansions imply

|(1+y)log(1+y)(y+y22y36)|\displaystyle\left|(1+y)\log(1+y)-\left(y+\frac{y^{2}}{2}-\frac{y^{3}}{6}\right)\right| |2(10.84)3||y44!|21|y|4,\displaystyle\leq\left|\frac{2}{(1-0.84)^{3}}\right|\cdot\left|\frac{y^{4}}{4!}\right|\leq 21\,|y|^{4}, (A.5)
|log(1+y)y|\displaystyle\big{|}\log(1+y)-y\big{|} |1(10.84)2||y22!|20|y|2,\displaystyle\leq\left|\frac{-1}{(1-0.84)^{2}}\right|\cdot\left|\frac{y^{2}}{2!}\right|\leq 20\,|y|^{2},
|(1+y)1|\displaystyle\big{|}(1+y)^{-1}\big{|} |110.84|7.\displaystyle\leq\left|\frac{1}{1-0.84}\right|\leq 7.

By applying these estimates in (A.4) with y=δki,pi/(npi)y=\delta_{k_{i},p_{i}}/(\sqrt{n}\,p_{i}), together with the error bound |λn|1/(12n)|\lambda_{n}|\leq 1/(12n) from (A.2), we obtain

log{Pn,𝒑(𝒌)nd/2ϕΣ𝒑(𝜹𝒌,𝒑)}\displaystyle\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{k})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}\right\} =i=1d+1npi{δki,pinpi+12(δki,pinpi)216(δki,pinpi)3}\displaystyle=-\sum_{i=1}^{d+1}np_{i}\left\{\cancel{\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}}+\bcancel{\frac{1}{2}\left(\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right)^{2}}-\frac{1}{6}\left(\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right)^{3}\right\} (A.6)
i=1d+112δki,pinpi+12𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑+Rn,𝒑(𝒌),\displaystyle\quad-\sum_{i=1}^{d+1}\frac{1}{2}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}+\bcancel{\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}}+R_{n,\boldsymbol{p}}(\boldsymbol{k}),

where the remainder term R_{n,\boldsymbol{p}}(\boldsymbol{k}) satisfies

|Rn,𝒑(𝒌)|\displaystyle|R_{n,\boldsymbol{p}}(\boldsymbol{k})| i=1d+1npi21|δki,pinpi|4+i=1d+11220|δki,pinpi|2+1n(112+712i=1d+11pi).\displaystyle\leq\sum_{i=1}^{d+1}np_{i}\cdot 21\,\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}^{4}+\sum_{i=1}^{d+1}\frac{1}{2}\cdot 20\,\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}^{2}+\frac{1}{n}\left(\frac{1}{12}+\frac{7}{12}\sum_{i=1}^{d+1}\frac{1}{p_{i}}\right). (A.7)

The cancellations in (A.6) come from δkd+1,pd+1=i=1dδki,pi\delta_{k_{d+1},p_{d+1}}=-\sum_{i=1}^{d}\delta_{k_{i},p_{i}} (recall Remark 2.1) and from (Σ𝒑1)ij=pi1𝟙{i=j}+pd+11(\Sigma_{\boldsymbol{p}}^{-1})_{ij}=p_{i}^{-1}\mathds{1}_{\{i=j\}}+p_{d+1}^{-1} for 1i,jd1\leq i,j\leq d by (2.7). This ends the proof. ∎
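As a quick numerical sanity check (not part of the proof), both the Lindelöf/Stirling error bound (A.2) and the Lagrange constants in (A.5) can be verified on a grid in R:

```r
# Side checks of (A.2) and (A.5), verified numerically on a grid.
n <- 5:200
lambda_n <- lfactorial(n) - ((n + 0.5) * log(n) - n + 0.5 * log(2 * pi))
all(abs(lambda_n) <= 1 / (12 * n))                                # TRUE
y <- seq(-0.84, 0.84, by = 1e-4); y <- y[abs(y) > 1e-8]
max(abs((1 + y) * log(1 + y) - (y + y^2 / 2 - y^3 / 6)) / y^4)    # below 21
max(abs(log(1 + y) - y) / y^2)                                    # below 20
max(abs(1 / (1 + y)))                                             # below 7
```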

Proof of Proposition 2.6.

By the comparison of the total variation norm with the Hellinger distance on page 726 of Carter (2002), we know that

~n,𝒑n,𝒑2(𝑿n,𝒑c)+𝔼[log{d~n,𝒑dn,𝒑(𝑿)} 1{𝑿n,𝒑}].\|\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\|\leq\sqrt{2\,\mathbb{P}(\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})+\mathbb{E}\left[\log\left\{\frac{{\rm d}\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}}{{\rm d}\mathbb{Q}_{n,\boldsymbol{p}}}(\boldsymbol{X})\right\}\,\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right]}. (A.8)

By applying a union bound, followed by the fact that piτ1p_{i}\,\tau\geq 1 for all 𝒑𝒫τ\boldsymbol{p}\in\mathscr{P}_{\tau} and then Hoeffding’s inequality, we get

(𝑲n,𝒑c)\displaystyle\mathbb{P}(\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 1.42262ptc}) i=1d+1(|δKi,pi|>piτlogn)i=1d+1(|δKi,pi|>logn)\displaystyle\leq\sum_{i=1}^{d+1}\mathbb{P}\left(|\delta_{K_{i},p_{i}}|>p_{i}\,\tau\sqrt{\log n}\right)\leq\sum_{i=1}^{d+1}\mathbb{P}\left(|\delta_{K_{i},p_{i}}|>\sqrt{\log n}\right)
i=1d+12e2(logn)22(d+1)n2,\displaystyle\leq\sum_{i=1}^{d+1}2\,e^{-2(\sqrt{\log n})^{2}}\leq\frac{2\,(d+1)}{n^{2}}, (A.9)

and similarly, for nτ416n\geq\tau^{4}\geq 16,

(𝑿n,𝒑c)\displaystyle\mathbb{P}(\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 1.42262ptc}) i=1d+1(|δXi,pi|>logn)i=1d+1(|δKi,pi|>logn12n)\displaystyle\leq\sum_{i=1}^{d+1}\mathbb{P}\left(|\delta_{X_{i},p_{i}}|>\sqrt{\log n}\right)\leq\sum_{i=1}^{d+1}\mathbb{P}\left(|\delta_{K_{i},p_{i}}|>\sqrt{\log n}-\tfrac{1}{2\sqrt{n}}\right)
i=1d+12e2(0.9249logn)22(d+1)n1.71.\displaystyle\leq\sum_{i=1}^{d+1}2\,e^{-2(0.9249\sqrt{\log n})^{2}}\leq\frac{2\,(d+1)}{n^{1.71}}. (A.10)

To control the expectation in (A.8), note that if Pn,𝒑(𝒙)P_{n,\boldsymbol{p}}(\boldsymbol{x}) denotes the density function associated with ~n,𝒑\widetilde{\mathbb{P}}_{n,\boldsymbol{p}} (i.e., it is equal to Pn,𝒑(𝒌)P_{n,\boldsymbol{p}}(\boldsymbol{k}) whenever 𝒌0dn𝒮do\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}^{o} is closest to 𝒙0dn𝒮do+[1/2,1/2]d\boldsymbol{x}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}^{o}+[-1/2,1/2]^{d}), then

𝔼[log{d~n,𝒑dn,𝒑(𝑿)} 1{𝑿n,𝒑}]\displaystyle\mathbb{E}\left[\log\left\{\frac{{\rm d}\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}}{{\rm d}\mathbb{Q}_{n,\boldsymbol{p}}}(\boldsymbol{X})\right\}\,\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right]
=𝔼[log{Pn,𝒑(𝑿)nd/2ϕΣ𝒑(𝜹𝑿,𝒑)} 1{𝑿n,𝒑}]\displaystyle\quad=\mathbb{E}\left[\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{X})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{X},\boldsymbol{p}})}\right\}\,\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right]
=𝔼[log{Pn,𝒑(𝑲)nd/2ϕΣ𝒑(𝜹𝑲,𝒑)} 1{𝑲n,𝒑}]\displaystyle\quad=\mathbb{E}\left[\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{K})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}\right\}\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right]
+𝔼[log{nd/2ϕΣ𝒑(𝜹𝑲,𝒑)nd/2ϕΣ𝒑(𝜹𝑿,𝒑)} 1{𝑲n,𝒑}]\displaystyle\quad\quad+\mathbb{E}\left[\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{X},\boldsymbol{p}})}\right\}\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right]
+𝔼[log{Pn,𝒑(𝑲)nd/2ϕΣ𝒑(𝜹𝑿,𝒑)}(𝟙{𝑿n,𝒑}𝟙{𝑲n,𝒑})]\displaystyle\quad\quad+\mathbb{E}\left[\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{K})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{X},\boldsymbol{p}})}\right\}\,(\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}-\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}})\right]
=:(I)+(II)+(III).\displaystyle\quad=\vcentcolon(\mathrm{I})+(\mathrm{II})+(\mathrm{III}). (A.11)

By the non-asymptotic expansion in Proposition 2.4, we have

|(I)|\displaystyle|(\mathrm{I})| n1/2i=1d+1|𝔼{(16pi2δki,pi312piδki,pi) 1{𝑲n,𝒑}}|\displaystyle\leq n^{-1/2}\sum_{i=1}^{d+1}\left|\mathbb{E}\left\{\left(\frac{1}{6p_{i}^{2}}\,\delta_{k_{i},p_{i}}^{3}-\frac{1}{2p_{i}}\,\delta_{k_{i},p_{i}}\right)\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right\}\right| (A.12)
+n1i=1d+1{21pi3𝔼(|δki,pi|4)+10pi2𝔼(|δki,pi|2)+2τ3}.\displaystyle\quad+n^{-1}\sum_{i=1}^{d+1}\left\{\frac{21}{p_{i}^{3}}\,\mathbb{E}\left(|\delta_{k_{i},p_{i}}|^{4}\right)+\frac{10}{p_{i}^{2}}\,\mathbb{E}\left(|\delta_{k_{i},p_{i}}|^{2}\right)+\frac{2\tau}{3}\right\}.

By Lemma B.1, and our assumptions τd+12\tau\geq d+1\geq 2 and nτ416n\geq\tau^{4}\geq 16, we can bound the expectations in (A.12) as follows:

|(I)|\displaystyle|(\mathrm{I})| d+1n1/2[τ2618{(𝑲n,𝒑c)}1/4+τ212{(𝑲n,𝒑c)}1/2]+τ(d+1)6n\displaystyle\leq\frac{d+1}{n^{1/2}}\left[\frac{\tau^{2}}{6}\cdot\frac{1}{\sqrt{8}}\left\{\mathbb{P}(\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})\right\}^{1/4}+\frac{\tau}{2}\cdot\frac{1}{2}\left\{\mathbb{P}(\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})\right\}^{1/2}\right]+\frac{\tau(d+1)}{6n}
+d+1n{(21τ3+21τ21n)+10τ+2τ3}\displaystyle\qquad+\frac{d+1}{n}\left\{\left(21\tau\cdot 3+21\tau^{2}\cdot\frac{1}{n}\right)+10\tau+\frac{2\tau}{3}\right\}
τ2(d+1)n1/2{1682(d+1)n24+14τ2(d+1)n2}+38.23τ2(d+1)n\displaystyle\leq\frac{\tau^{2}(d+1)}{n^{1/2}}\left\{\frac{1}{6\sqrt{8}}\sqrt[4]{\frac{2\,(d+1)}{n^{2}}}+\frac{1}{4\tau}\sqrt{\frac{2\,(d+1)}{n^{2}}}\right\}+\frac{38.23\,\tau^{2}\,(d+1)}{n}
32.29τ2(d+1)5/4n.\displaystyle\leq\frac{32.29\,\tau^{2}\,(d+1)^{5/4}}{n}. (A.13)

For the term (II) in (A.11), note that

log{nd/2ϕΣ𝒑(𝜹𝑲,𝒑)nd/2ϕΣ𝒑(𝜹𝑿,𝒑)}\displaystyle\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{X},\boldsymbol{p}})}\right\}
=12n(𝑿n𝒑)Σ𝒑1(𝑿n𝒑)12n(𝑲n𝒑)Σ𝒑1(𝑲n𝒑)\displaystyle\quad=\frac{1}{2n}(\boldsymbol{X}-n\boldsymbol{p})^{\top}\Sigma_{\boldsymbol{p}}^{-1}(\boldsymbol{X}-n\boldsymbol{p})-\frac{1}{2n}(\boldsymbol{K}-n\boldsymbol{p})^{\top}\Sigma_{\boldsymbol{p}}^{-1}(\boldsymbol{K}-n\boldsymbol{p})
=12n(𝑿𝑲)Σ𝒑1(𝑿𝑲)\displaystyle\quad=\frac{1}{2n}(\boldsymbol{X}-\boldsymbol{K})^{\top}\Sigma_{\boldsymbol{p}}^{-1}(\boldsymbol{X}-\boldsymbol{K})
+12n{(𝑿𝑲)Σ𝒑1(𝑲n𝒑)+(𝑲n𝒑)Σ𝒑1(𝑿𝑲)}.\displaystyle\quad\quad+\frac{1}{2n}\left\{(\boldsymbol{X}-\boldsymbol{K})^{\top}\Sigma_{\boldsymbol{p}}^{-1}(\boldsymbol{K}-n\boldsymbol{p})+(\boldsymbol{K}-n\boldsymbol{p})^{\top}\Sigma_{\boldsymbol{p}}^{-1}(\boldsymbol{X}-\boldsymbol{K})\right\}. (A.14)

Using the expression (Σ𝒑1)ij=pi1𝟙{i=j}+pd+11(\Sigma_{\boldsymbol{p}}^{-1})_{ij}=p_{i}^{-1}\mathds{1}_{\{i=j\}}+p_{d+1}^{-1} in (2.7), the independence of the random variables (XiKi)i=1d(X_{i}-K_{i})_{i=1}^{d}, and our assumptions τd+12\tau\geq d+1\geq 2 and nτ416n\geq\tau^{4}\geq 16, we obtain

|(II)|\displaystyle|(\mathrm{II})| 12ni=1d(pi1+pd+11)12+1ni=1d(1pi+dpd+1)2𝔼(|δki,pi|2)(𝑲n,𝒑c)\displaystyle\leq\frac{1}{2n}\sum_{i=1}^{d}\frac{(p_{i}^{-1}+p_{d+1}^{-1})}{12}+\frac{1}{\sqrt{n}}\sum_{i=1}^{d}\frac{\left(\frac{1}{p_{i}}+\frac{d}{p_{d+1}}\right)}{2}\,\sqrt{\mathbb{E}\left(|\delta_{k_{i},p_{i}}|^{2}\right)}\sqrt{\mathbb{P}(\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})}
12nτ(d+1)6+1nτ(d+1)22142(d+1)n2\displaystyle\leq\frac{1}{2n}\cdot\frac{\tau(d+1)}{6}+\frac{1}{\sqrt{n}}\cdot\frac{\tau(d+1)^{2}}{2}\cdot\sqrt{\frac{1}{4}}\cdot\sqrt{\frac{2\,(d+1)}{n^{2}}}
τ(d+1)3n.\displaystyle\leq\frac{\tau(d+1)}{3n}. (A.15)

To bound the term (III) in (A.11), we first derive a rough bound for the log-ratio in (2.14) on the set \{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}. Note that the three coefficients (21,20,7) we found in (A.5), and hence those in (A.7), can be replaced by (84,50,10) on the event \{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\,c}\}\cap\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\} by rerunning the proof of Proposition 2.4, since the bound on |y| just above (A.5) in that case becomes

max1id+1|δki,pinpi|τlognn+τ2n1.076τlognn0.9,when nτ416.\max_{1\leq i\leq d+1}\left|\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\right|\leq\tau\sqrt{\frac{\log n}{n}}+\frac{\tau}{2n}\leq 1.076\tau\sqrt{\frac{\log n}{n}}\leq 0.9,\quad\text{when }n\geq\tau^{4}\geq 16. (A.16)

Taking the above into account, and using our assumptions τd+12\tau\geq d+1\geq 2 and nτ416n\geq\tau^{4}\geq 16, we have

|log{Pn,𝒑(𝑲)nd/2ϕΣ𝒑(𝜹𝑲,𝒑)} 1{𝑲n,𝒑}{𝑿n,𝒑}|\displaystyle\left|\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{K})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}\right\}\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right|
i=1d+1(npi6|δki,pinpi|3+12|δki,pinpi|)\displaystyle\qquad\leq\sum_{i=1}^{d+1}\left(\frac{np_{i}}{6}\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}^{3}+\frac{1}{2}\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}\right)
+i=1d+1npi84|δki,pinpi|4+i=1d+11250|δki,pinpi|2+11τ(d+1)12n\displaystyle\qquad\quad+\sum_{i=1}^{d+1}np_{i}\cdot 84\,\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}^{4}+\sum_{i=1}^{d+1}\frac{1}{2}\cdot 50\,\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}^{2}+\frac{11\tau(d+1)}{12n}
n6(1.076τlognn)3+d+12(1.076τlognn)\displaystyle\qquad\leq\frac{n}{6}\left(1.076\,\tau\sqrt{\frac{\log n}{n}}\right)^{3}+\frac{d+1}{2}\left(1.076\,\tau\sqrt{\frac{\log n}{n}}\right)
+84n(1.076τlognn)4+25(1.076τlognn)2+11τ(d+1)12n\displaystyle\qquad\quad+84n\left(1.076\,\tau\sqrt{\frac{\log n}{n}}\right)^{4}+25\left(1.076\,\tau\sqrt{\frac{\log n}{n}}\right)^{2}+\frac{11\tau(d+1)}{12n}
96.25τ3(logn)3/2n1/2.\displaystyle\qquad\leq\frac{96.25\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}. (A.17)

We also have the following rough bound from (A.14),

|log{nd/2ϕΣ𝒑(𝜹𝑲,𝒑)nd/2ϕΣ𝒑(𝜹𝑿,𝒑)} 1{𝑲n,𝒑}{𝑿n,𝒑}|\displaystyle\left|\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{X},\boldsymbol{p}})}\right\}\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right|
12ni=1d(1pi+dpd+1)4+1ni=1d(1pi+dpd+1)2pi|δki,pinpi| 1{𝑲n,𝒑}{𝑿n,𝒑}\displaystyle\quad\leq\frac{1}{2n}\sum_{i=1}^{d}\frac{\left(\frac{1}{p_{i}}+\frac{d}{p_{d+1}}\right)}{4}+\frac{1}{\sqrt{n}}\sum_{i=1}^{d}\frac{\left(\frac{1}{p_{i}}+\frac{d}{p_{d+1}}\right)}{2}\,p_{i}\bigg{|}\frac{\delta_{k_{i},p_{i}}}{\sqrt{n}\,p_{i}}\bigg{|}\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}
12nτ(d+1)24+1nτ(d+1)21.076τlognn0.62τ3(logn)1/2n,\displaystyle\quad\leq\frac{1}{2n}\cdot\frac{\tau(d+1)^{2}}{4}+\frac{1}{\sqrt{n}}\cdot\frac{\tau(d+1)}{2}\cdot 1.076\,\tau\sqrt{\frac{\log n}{n}}\leq\frac{0.62\,\tau^{3}(\log n)^{1/2}}{n}, (A.18)

again using our assumptions \tau\geq d+1\geq 2 and n\geq\tau^{4}\geq 16. Putting the rough bounds (A.17) and (A.18) together yields

|(III)|\displaystyle|(\mathrm{III})| |𝔼([log{Pn,𝒑(𝑲)nd/2ϕΣ𝒑(𝜹𝑲,𝒑)}+log{nd/2ϕΣ𝒑(𝜹𝑲,𝒑)nd/2ϕΣ𝒑(𝜹𝑿,𝒑)}] 1{𝑲n,𝒑}{𝑿n,𝒑})|\displaystyle\leq\left|\mathbb{E}\left(\left[\log\left\{\frac{P_{n,\boldsymbol{p}}(\boldsymbol{K})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}\right\}+\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{X},\boldsymbol{p}})}\right\}\right]\,\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right)\right|
{96.25τ3(logn)3/2n1/2+0.62τ3(logn)1/2n}𝔼[|𝟙{𝑲n,𝒑}{𝑿n,𝒑}|]\displaystyle\leq\left\{\frac{96.25\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}+\frac{0.62\,\tau^{3}(\log n)^{1/2}}{n}\right\}\,\mathbb{E}\left[|\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}|\right]
96.31τ3(logn)3/2n1/2𝔼[|𝟙{𝑲n,𝒑}{𝑿n,𝒑}|].\displaystyle\leq\frac{96.31\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}\,\mathbb{E}\left[|\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}\bigtriangleup\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}|\right]. (A.19)

Putting (A.13), (A.15) and (A.19) in (A.11), together with the bound

𝔼(|𝟙{𝑿n,𝒑}𝟙{𝑲n,𝒑}|)\displaystyle\mathbb{E}\left(|\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}-\mathds{1}_{\{\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}}|\right) =𝔼(|𝟙{𝑿n,𝒑,𝑲n,𝒑c}𝟙{𝑿n,𝒑c,𝑲n,𝒑}|)\displaystyle=\mathbb{E}\left(|\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}},\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc}\}}-\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc},\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}\}}|\right)
(𝑲n,𝒑c)+(𝑿n,𝒑c)\displaystyle\leq\mathbb{P}(\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})+\mathbb{P}(\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})
2(d+1)n2+2(d+1)n1.712.90(d+1)n1.71,\displaystyle\leq\frac{2\,(d+1)}{n^{2}}+\frac{2\,(d+1)}{n^{1.71}}\leq\frac{2.90\,(d+1)}{n^{1.71}}, (A.20)

(recall (A.9) and (A.10)) yields

|𝔼[log{d~n,𝒑dn,𝒑(𝑿)} 1{𝑿n,𝒑}]|\displaystyle\left|\mathbb{E}\left[\log\left\{\frac{{\rm d}\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}}{{\rm d}\mathbb{Q}_{n,\boldsymbol{p}}}(\boldsymbol{X})\right\}\,\mathds{1}_{\{\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}\}}\right]\right| |(I)|+|(II)|+|(III)|\displaystyle\leq|(\mathrm{I})|+|(\mathrm{II})|+|(\mathrm{III})|
32.29τ2(d+1)5/4n+τ(d+1)3n\displaystyle\leq\frac{32.29\,\tau^{2}\,(d+1)^{5/4}}{n}+\frac{\tau(d+1)}{3n}
+96.31τ3(logn)3/2n1/22.90(d+1)n1.71\displaystyle\quad+\frac{96.31\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}\cdot\frac{2.90\,(d+1)}{n^{1.71}}
64.31τ3(d+1)n.\displaystyle\leq\frac{64.31\,\tau^{3}\,(d+1)}{n}. (A.21)

(Again, we used the fact that \tau\geq d+1\geq 2 and n\geq\tau^{4}\geq 16 to obtain the last inequality.) Now, putting (A.10) and (A.21) together in (A.8) gives the conclusion. ∎

Proof of Theorem 2.7.

The case d=1d=1 can easily be treated separately using the Berry-Esseen theorem (in fact the bound in (2.19) would be of the form Cn1/2Cn^{-1/2} in that case). Alternatively, we can prove our statement as we do below, for all d1d\geq 1 and any >0\ell>0, only assuming further that >1/2\ell>1/2 when d=1d=1.

By the triangle inequality, we have

|(𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑)(𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑)|\displaystyle\left|\mathbb{P}\left(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right)-\mathbb{P}\left(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right)\right| (A.22)
n,𝒑(n,𝒑c)+|n,𝒑({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑})~n,𝒑({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d)|\displaystyle\quad\leq\mathbb{P}_{n,\boldsymbol{p}}(\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 1.42262ptc})+\left|\begin{array}[]{l}\mathbb{P}_{n,\boldsymbol{p}}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}\right)\\[1.42262pt] -\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}+\big{[}-\tfrac{1}{2},\tfrac{1}{2}\big{]}^{d}\right)\end{array}\right|
+|~n,𝒑n,𝒑|({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d)\displaystyle\quad\qquad+|\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}|\left(\big{\{}\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\big{\}}+\big{[}-\tfrac{1}{2},\tfrac{1}{2}\big{]}^{d}\right)
+n,𝒑[({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d){𝒚(n,𝒑+[12,12]d):𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑}]\displaystyle\quad\qquad+\mathbb{Q}_{n,\boldsymbol{p}}\left[\begin{array}[]{l}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}+\big{[}-\tfrac{1}{2},\tfrac{1}{2}\big{]}^{d}\right)\\ \bigtriangleup\,\left\{\boldsymbol{y}\in\big{(}\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\big{)}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}\end{array}\right]
+n,𝒑[{𝒚(n,𝒑+[12,12]d):𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑}{𝒚d:𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑}]\displaystyle\quad\qquad+\mathbb{Q}_{n,\boldsymbol{p}}\left[\begin{array}[]{l}\left\{\boldsymbol{y}\in\big{(}\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\big{)}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}\\[2.84526pt] \bigtriangleup\,\left\{\boldsymbol{y}\in\mathbb{R}^{d}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}\end{array}\right]
=:(A)+(B)+(C)+(D)+(E).\displaystyle\quad=\vcentcolon(A)_{\ell}+(B)_{\ell}+(C)_{\ell}+(D)_{\ell}+(E)_{\ell}.

First, as in (A), we have

(A)=(𝑲n,𝒑c)2(d+1)n2.(A)_{\ell}=\mathbb{P}(\boldsymbol{K}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 1.42262ptc})\leq\frac{2\,(d+1)}{n^{2}}. (A.23)

Second, note that by the definition of ~n,𝒑\widetilde{\mathbb{P}}_{n,\boldsymbol{p}},

(B)=0.(B)_{\ell}=0. (A.24)

Third, to bound (C)(C)_{\ell}, we can use the total variation bound in Proposition 2.6. We have

(C)8.03τ3/2(d+1)1/2n1/2.(C)_{\ell}\leq\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}. (A.25)

To bound (D)(D)_{\ell}, note that {𝒚d:nd/2ϕΣ𝒑(𝜹𝒚)>nd/2ϕΣ𝒑(𝜹𝒏)}\left\{\boldsymbol{y}\in\mathbb{R}^{d}:n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y}})>n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n}})\right\} is a convex and connected set. Hence, if we ignore the restriction to n,𝒑+[1/2,1/2]d\mathscr{B}_{n,\boldsymbol{p}}+[-1/2,1/2]^{d}, the symmetric difference

({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d)\displaystyle\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\}+\big{[}-\tfrac{1}{2},\tfrac{1}{2}\big{]}^{d}\right) (A.26)
{𝒚(n,𝒑+[12,12]d):𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑},\displaystyle\bigtriangleup\,\left\{\boldsymbol{y}\in\big{(}\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\big{)}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right\},

is contained in a dd-dimensional shell of thickness at most 11 in any given direction that is parallel to one of the dd axes. Figure A.3 illustrates the situation when d=2d=2.

Figure A.3: The symmetric difference (A.26) is given by the difference between the gray region and the region that the ellipse covers. The ellipse itself represents the contour line of the multivariate normal Normald(n𝒑,nΣ𝒑)\mathrm{Normal}_{d}(n\boldsymbol{p},n\Sigma_{\boldsymbol{p}}) at height e/2/(2πn)ddet(Σ𝒑)e^{-\ell/2}/\sqrt{(2\pi n)^{d}\det(\Sigma_{\boldsymbol{p}})}.

When 𝒌n,𝒑\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}, we have

|12𝜹𝒌,𝒑Σ𝒑1(1,,1)n|=τ2lognni=1d(pi1+dpd+11)τ2(d+1)2(logn)1/22n1/2,\displaystyle\left|\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\frac{(1,\dots,1)}{\sqrt{n}}\right|=\frac{\tau}{2}\sqrt{\frac{\log n}{n}}\sum_{i=1}^{d}(p_{i}^{-1}+d\,p_{d+1}^{-1})\leq\frac{\tau^{2}(d+1)^{2}(\log n)^{1/2}}{2n^{1/2}}, (A.27)
|12(1,,1)nΣ𝒑1𝜹𝒌,𝒑|=τ2lognni=1d(pi1+dpd+11)τ2(d+1)2(logn)1/22n1/2,\displaystyle\left|\frac{1}{2}\frac{(1,\dots,1)^{\top}}{\sqrt{n}}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}\right|=\frac{\tau}{2}\sqrt{\frac{\log n}{n}}\sum_{i=1}^{d}(p_{i}^{-1}+d\,p_{d+1}^{-1})\leq\frac{\tau^{2}(d+1)^{2}(\log n)^{1/2}}{2n^{1/2}}, (A.28)
|12(1,,1)nΣ𝒑1(1,,1)n|=12ni=1d(pi1+dpd+11)τ(d+1)22n,\displaystyle\left|\frac{1}{2}\frac{(1,\dots,1)}{\sqrt{n}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\frac{(1,\dots,1)}{\sqrt{n}}\right|=\frac{1}{2n}\sum_{i=1}^{d}(p_{i}^{-1}+d\,p_{d+1}^{-1})\leq\frac{\tau(d+1)^{2}}{2n}, (A.29)

so that

|12(𝜹𝒌,𝒑±(1,,1)n)Σ𝒑1(𝜹𝒌,𝒑±(1,,1)n)12𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑|bn,\displaystyle\left|\frac{1}{2}\Big{(}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}\pm\frac{(1,\dots,1)}{\sqrt{n}}\Big{)}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\Big{(}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}\pm\frac{(1,\dots,1)}{\sqrt{n}}\Big{)}-\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}\right|\leq b_{n}, (A.30)

(using τd+12\tau\geq d+1\geq 2 and nτ416n\geq\tau^{4}\geq 16) where

bn:=1.05τ2(d+1)2(logn)1/2n1/2.b_{n}\vcentcolon=\frac{1.05\,\tau^{2}(d+1)^{2}(\log n)^{1/2}}{n^{1/2}}. (A.31)
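The computations in (A.27)–(A.29) use the closed form \Sigma_{\boldsymbol{p}}^{-1}=\mathrm{diag}(p_{1}^{-1},\dots,p_{d}^{-1})+p_{d+1}^{-1}\boldsymbol{1}_{d}\boldsymbol{1}_{d}^{\top} (a standard identity; cf. Tanabe & Sagae (1992)). The following minimal R sketch, with d and \boldsymbol{p} chosen arbitrarily for illustration, verifies this identity and the sum \sum_{i=1}^{d}(p_{i}^{-1}+d\,p_{d+1}^{-1}) numerically:

# Sanity check (not part of the proof): the inverse of Sigma_p used in
# (A.27)-(A.29) is diag(1/p_1, ..., 1/p_d) + (1/p_{d+1}) * ones(d, d).
# The values of d and p below are arbitrary and only for illustration.
d <- 3
p <- c(0.1, 0.2, 0.3)              # p_1, ..., p_d
p_last <- 1 - sum(p)               # p_{d+1}
Sigma_p <- diag(p) - p %*% t(p)    # covariance matrix Sigma_p
inv_closed <- diag(1 / p) + (1 / p_last) * matrix(1, d, d)
max(abs(solve(Sigma_p) - inv_closed))   # numerically ~ 0

# The sum appearing on the right-hand side of (A.27)-(A.29):
sum(1 / p + d / p_last)            # equals t(1_d) %*% inv_closed %*% 1_d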

Using Equation (A.30), Remark 2.2 and the mean value theorem (only when d2d\geq 2), we deduce

(D)\displaystyle(D)_{\ell} (𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑+bn){𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑max(0,bn)}\displaystyle\leq\mathbb{P}\left(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell+b_{n}\right)-\mathbb{P}\left\{\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\max(0,\ell-b_{n})\right\}
=γ¯{d/2,(+bn)/2}γ¯{d/2,max(0,bn)/2}\displaystyle=\overline{\gamma}\left\{d/2,(\ell+b_{n})/2\right\}-\overline{\gamma}\left\{d/2,\max(0,\ell-b_{n})/2\right\} (A.32)
maxt[0,):|2t|bntd/21etΓ(d/2)bn\displaystyle\leq\max_{t\in[0,\infty):|2t-\ell|\leq b_{n}}\frac{t^{d/2-1}e^{-t}}{\Gamma(d/2)}\cdot b_{n} (A.33)

Now, notice that the maximum above is attained at t=\max(0,\ell/2-b_{n}/2) for d\in\{1,2\}. More generally, the function t\mapsto t^{d/2-1}e^{-t} is maximized on (0,\infty) at t=d/2-1 for all d\geq 3, and

\Gamma(d/2)=(d/2-1)\,\Gamma(d/2-1)\geq\sqrt{2\pi}\,(d/2-1)^{d/2-1/2}\,e^{-(d/2-1)},\quad\text{for all }d\geq 3, (A.34)

by a standard Stirling lower bound (see, e.g., Lemma 1 of Minc & Sathre (1964/65), or Theorem 2.2 of Batır (2017) for a more precise bound). Using this in (A.33) shows that, for any given \ell>0 (we assumed that \ell>1/2 when d=1),

(D)_{\ell}\leq\left\{\begin{array}[]{ll}\frac{\{\max(0,\ell/2-b_{n}/2)\}^{-1/2}}{\sqrt{\pi}\,e^{\max(0,\ell/2-b_{n}/2)}}\cdot b_{n},&\mbox{if }d=1\\ e^{-\max(0,\ell/2-b_{n}/2)}\cdot b_{n},&\mbox{if }d=2\\[2.84526pt] \frac{1}{\sqrt{2\pi}\,(d/2-1)^{1/2}}\cdot b_{n},&\mbox{if }d\geq 3\end{array}\right\}\leq b_{n}=\frac{1.05\,\tau^{2}(d+1)^{2}(\log n)^{1/2}}{n^{1/2}}. (A.35)
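For d\geq 3, the bracketed factor above is the maximal value of the \mathrm{Gamma}(d/2,1) density, which the Stirling bound (A.34) controls by (2\pi)^{-1/2}(d/2-1)^{-1/2}\leq 1. A short R check (sketch only; the range of d is arbitrary):

# Check (illustration only): for d >= 3, the Gamma(d/2, 1) density is maximized
# at its mode t = d/2 - 1, and the maximal value stays below
# 1 / (sqrt(2 * pi) * sqrt(d/2 - 1)), which itself never exceeds 1.
d <- 3:30
mode_val <- d / 2 - 1
max_density <- dgamma(mode_val, shape = d / 2, rate = 1)
bound <- 1 / (sqrt(2 * pi) * sqrt(d / 2 - 1))
all(max_density <= bound)   # TRUE
all(bound <= 1)             # TRUE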

To bound (E)(E)_{\ell}, we combine (A) and Proposition 2.6 to obtain

(E)\displaystyle(E)_{\ell} n,𝒑{(n,𝒑+[12,12]d)c}\displaystyle\leq\mathbb{Q}_{n,\boldsymbol{p}}\big{\{}\big{(}\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\big{)}^{c}\big{\}}
(𝑿n,𝒑c)+|~n,𝒑n,𝒑|{(n,𝒑+[12,12]d)c}\displaystyle\leq\mathbb{P}(\boldsymbol{X}\in\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 1.42262ptc})+|\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}|\big{\{}\big{(}\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\big{)}^{c}\big{\}}
2(d+1)n1.71+8.03τ3/2(d+1)1/2n1/2\displaystyle\leq\frac{2\,(d+1)}{n^{1.71}}+\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}
8.07τ3/2(d+1)1/2n1/2.\displaystyle\leq\frac{8.07\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}. (A.36)

Finally, by applying the bounds we found for (A)_{\ell}, (B)_{\ell}, (C)_{\ell}, (D)_{\ell} and (E)_{\ell} in (A.23), (A.24), (A.25), (A.35) and (A.36), respectively, together in (A.22), we get

|(𝜹𝑲,𝒑Σ𝒑1𝜹𝑲,𝒑)(𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑)|\displaystyle\left|\mathbb{P}\left(\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{K},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right)-\mathbb{P}\left(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell\right)\right|
\displaystyle\quad\leq(A)_{\ell}+(B)_{\ell}+(C)_{\ell}+(D)_{\ell}+(E)_{\ell}
2(d+1)n2+0+8.03τ3/2(d+1)1/2n1/2+1.05τ2(d+1)2(logn)1/2n1/2+8.07τ3/2(d+1)1/2n1/2\displaystyle\quad\leq\frac{2\,(d+1)}{n^{2}}+0+\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}+\frac{1.05\,\tau^{2}(d+1)^{2}(\log n)^{1/2}}{n^{1/2}}+\frac{8.07\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}
1.26τ3(d+1)(logn)3/2n1/2.\displaystyle\quad\leq\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}. (A.37)

This ends the proof. ∎
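As a Monte Carlo illustration of the CDF comparison just established (a sketch only, not an implementation of the bound; the values of n, \boldsymbol{p} and \ell below are arbitrary):

# Compare the CDF of Pearson's statistic, delta_K' Sigma_p^{-1} delta_K, with
# the chi-square CDF with d degrees of freedom (the multivariate normal analogue).
set.seed(1)
n <- 500
p <- c(0.2, 0.3, 0.1, 0.4)       # d + 1 = 4 categories, so d = 3
d <- length(p) - 1
K <- rmultinom(1e5, size = n, prob = p)       # columns are multinomial draws
pearson <- colSums((K - n * p)^2 / (n * p))   # the quadratic form of Theorem 2.7
ell <- 5
mean(pearson <= ell)     # left-hand side, estimated by simulation
pchisq(ell, df = d)      # right-hand side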

Proof of Corollary 2.8.

By Theorem 2.7, we know that, for τd+1\tau\geq d+1 and nτ4n\geq\tau^{4},

G()F()+1.26τ3(d+1)(logn)3/2n1/2.G(\ell)\leq F(\ell)+\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}. (A.38)

By the mean value theorem, we have

1.26τ3(d+1)(logn)3/2n1/2G()G()G()G(c()),\displaystyle\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}\cdot\frac{G^{\prime}(\ell^{\star})}{G^{\prime}(\ell)}\leq G(\ell)-G(c(\ell)), (A.39)

for an appropriate point (c(),)\ell^{\star}\in(c(\ell),\ell), where recall

c():=1.26τ3(d+1)(logn)3/2G()n1/2.c(\ell)\vcentcolon=\ell-\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{G^{\prime}(\ell)\,n^{1/2}}. (A.40)

It is easily verified that t\mapsto G^{\prime}(t) is decreasing for t\geq 2(d/2-1) (use Remark 2.2 together with the fact that 2(d/2-1) is the mode of the Gamma distribution with shape parameter d/2 and scale parameter 2), so that, by choosing n=n(d,\tau) large enough and subsequently \ell>c(\ell)\geq 2(d/2-1), we have

G(c())+1.26τ3(d+1)(logn)3/2n1/2G().G(c(\ell))+\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}\leq G(\ell). (A.41)

From (A.41) and (A.38), we deduce

G(c())F()=F(c()+1.26τ3(d+1)(logn)3/2G()n1/2).G(c(\ell))\leq F(\ell)=F\left(c(\ell)+\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{G^{\prime}(\ell)\,n^{1/2}}\right). (A.42)

Therefore, by applying FF^{\star} on both sides, we get, for nn(d,τ)n\geq n(d,\tau) large enough,

Ξn,𝒑(ω)𝜹𝒀(ω),𝒑Σ𝒑1𝜹𝒀(ω),𝒑+1.26τ3(d+1)(logn)3/2G(Ln,𝒑,d,τ(ω))n1/2,ωA,\Xi_{n,\boldsymbol{p}}(\omega)\leq\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\phantom{\top}}+\frac{1.26\,\tau^{3}(d+1)(\log n)^{3/2}}{G^{\prime}(L_{n,\boldsymbol{p},d,\tau}(\omega))\,n^{1/2}},\quad\omega\in A, (A.43)

where Ln,𝒑,d,τ(ω)L_{n,\boldsymbol{p},d,\tau}(\omega) solves 𝜹𝒀(ω),𝒑Σ𝒑1𝜹𝒀(ω),𝒑=c(Ln,𝒑,d,τ(ω))\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y}(\omega),\boldsymbol{p}}^{\phantom{\top}}=c(L_{n,\boldsymbol{p},d,\tau}(\omega)) and AA is the event that we defined in (2.21). This ends the proof. ∎
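The monotonicity of G^{\prime} used around (A.41) can also be checked numerically: by Remark 2.2, G^{\prime} is the \chi^{2}_{d} density, which is nonincreasing to the right of its mode 2(d/2-1)=d-2. A small R sketch (the value of d is arbitrary):

# Numerical check (illustration only): the chi-square density with d degrees of
# freedom is nonincreasing to the right of its mode d - 2.
d <- 5
t_grid <- seq(d - 2, 60, length.out = 1000)
all(diff(dchisq(t_grid, df = d)) <= 0)   # TRUE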

Proof of Proposition 3.2.

Throughout this proof, let :=𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑\ell^{\star}\vcentcolon=\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}}. As in the proof of Theorem 2.7, the case d=1d=1 can easily be treated separately using the Berry-Esseen theorem (in fact εn\varepsilon_{n} would be of the form Cn1/2Cn^{-1/2} in that case). Alternatively, we can prove our statement as we do below, for all d1d\geq 1 and >0\ell^{\star}>0, only assuming further that >1/2\ell^{\star}>1/2 when d=1d=1.

By the triangle inequality, we have

|𝒌0dn𝒮dPn,𝒑(𝒌)Pn,𝒑(𝒏)Pn,𝒑(𝒌)ϕΣ𝒑(𝜹𝒚,𝒑)ϕΣ𝒑(𝜹𝒏,𝒑)ϕΣ𝒑(𝜹𝒚,𝒑)nd/2d𝒚|\displaystyle\Bigg{|}\sum_{\begin{subarray}{c}\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}\\ P_{n,\boldsymbol{p}}(\boldsymbol{k})\geq P_{n,\boldsymbol{p}}(\boldsymbol{n})\end{subarray}}P_{n,\boldsymbol{p}}(\boldsymbol{k})-\int_{\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}})\geq\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}})}\frac{\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}})}{n^{d/2}}{\rm d}\boldsymbol{y}\Bigg{|} (A.44)
n,𝒑(n,𝒑c)+|n,𝒑({𝒌n,𝒑:Pn,𝒑(𝒌)Pn,𝒑(𝒏)})~n,𝒑({𝒌n,𝒑:Pn,𝒑(𝒌)Pn,𝒑(𝒏)}+[12,12]d)|\displaystyle\quad\leq\mathbb{P}_{n,\boldsymbol{p}}(\mathscr{B}_{n,\boldsymbol{p}}^{\hskip 0.56905ptc})+\left|\begin{array}[]{l}\mathbb{P}_{n,\boldsymbol{p}}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:P_{n,\boldsymbol{p}}(\boldsymbol{k})\geq P_{n,\boldsymbol{p}}(\boldsymbol{n})\right\}\right)\\ -\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:P_{n,\boldsymbol{p}}(\boldsymbol{k})\geq P_{n,\boldsymbol{p}}(\boldsymbol{n})\right\}+\left[-\frac{1}{2},\frac{1}{2}\right]^{d}\right)\end{array}\right|
+~n,𝒑[({𝒌n,𝒑:Pn,𝒑(𝒌)Pn,𝒑(𝒏)}+[12,12]d)({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d)]\displaystyle\quad\quad+\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}\left[\begin{array}[]{l}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:P_{n,\boldsymbol{p}}(\boldsymbol{k})\geq P_{n,\boldsymbol{p}}(\boldsymbol{n})\right\}+\left[-\frac{1}{2},\frac{1}{2}\right]^{d}\right)\\ \bigtriangleup\,\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right)\end{array}\right]
+|~n,𝒑n,𝒑|({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d)\displaystyle\quad\quad+|\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}|\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right)
+n,𝒑[({𝒌n,𝒑:𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑}+[12,12]d){𝒚(n,𝒑+[12,12]d):𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑}]\displaystyle\quad\quad+\mathbb{Q}_{n,\boldsymbol{p}}\left[\begin{array}[]{l}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right)\\ \bigtriangleup\,\left\{\boldsymbol{y}\in\left(\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\right):\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}\right\}\end{array}\right]
+n,𝒑[{𝒚(n,𝒑+[12,12]d):𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑}{𝒚d:𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑}]\displaystyle\quad\quad+\mathbb{Q}_{n,\boldsymbol{p}}\left[\begin{array}[]{l}\left\{\boldsymbol{y}\in\left(\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\right):\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}\right\}\\ \bigtriangleup\,\left\{\boldsymbol{y}\in\mathbb{R}^{d}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}\right\}\end{array}\right]
=:(A)+(B)+()+(C)+(D)+(E).\displaystyle\quad=\vcentcolon(A)_{\ell^{\star}}+(B)_{\ell^{\star}}+(\bigstar)_{\ell^{\star}}+(C)_{\ell^{\star}}+(D)_{\ell^{\star}}+(E)_{\ell^{\star}}.

The terms (A)(A)_{\ell^{\star}}, (B)(B)_{\ell^{\star}}, (C)(C)_{\ell^{\star}}, (D)(D)_{\ell^{\star}} and (E)(E)_{\ell^{\star}} are already bounded in the proof of Theorem 2.7. It remains to bound ()(\bigstar)_{\ell^{\star}}.

If 𝒌,𝒏n,𝒑\boldsymbol{k},\boldsymbol{n}\in\mathscr{B}_{n,\boldsymbol{p}}, then an argument like (A) shows that Pn,𝒑(𝒌)Pn,𝒑(𝒏)P_{n,\boldsymbol{p}}(\boldsymbol{k})\leq P_{n,\boldsymbol{p}}(\boldsymbol{n}) and 1nd/2ϕΣ𝒑(𝜹𝒌,𝒑)>1nd/2ϕΣ𝒑(𝜹𝒏,𝒑)\tfrac{1}{n^{d/2}}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})>\tfrac{1}{n^{d/2}}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}) imply

0log{nd/2ϕΣ𝒑(𝜹𝒌,𝒑)nd/2ϕΣ𝒑(𝜹𝒏,𝒑)}2an,with an:=96.25τ3(logn)3/2n1/2,0\leq\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}})}\right\}\leq 2a_{n},\quad\text{with }a_{n}\vcentcolon=\frac{96.25\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}, (A.45)

and Pn,𝒑(𝒌)<Pn,𝒑(𝒏)P_{n,\boldsymbol{p}}(\boldsymbol{k})<P_{n,\boldsymbol{p}}(\boldsymbol{n}) and 1nd/2ϕΣ𝒑(𝜹𝒌,𝒑)1nd/2ϕΣ𝒑(𝜹𝒏,𝒑)\tfrac{1}{n^{d/2}}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})\geq\tfrac{1}{n^{d/2}}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}) imply

2anlog{nd/2ϕΣ𝒑(𝜹𝒌,𝒑)nd/2ϕΣ𝒑(𝜹𝒏,𝒑)}0.-2a_{n}\leq\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}})}\right\}\leq 0. (A.46)

Therefore,

()\displaystyle(\bigstar)_{\ell^{\star}} ~n,𝒑({𝒌n,𝒑:|log{nd/2ϕΣ𝒑(𝜹𝒌,𝒑)nd/2ϕΣ𝒑(𝜹𝒏,𝒑)}|2an}+[12,12]d)\displaystyle\leq\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\left|\log\left\{\frac{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}})}{n^{-d/2}\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}})}\right\}\right|\leq 2a_{n}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right)
~n,𝒑({𝒌n,𝒑:|𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑|4an}+[12,12]d).\displaystyle\leq\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\big{|}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}-\ell^{\star}\big{|}\leq 4a_{n}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right). (A.47)

By the triangle inequality, we can then split the above probability into three distinct terms:

|~n,𝒑n,𝒑|({𝒌n,𝒑:|𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑|4an}+[12,12]d)\displaystyle\leq\big{|}\widetilde{\mathbb{P}}_{n,\boldsymbol{p}}-\mathbb{Q}_{n,\boldsymbol{p}}\big{|}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\big{|}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}-\ell^{\star}\big{|}\leq 4a_{n}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right)
+n,𝒑[({𝒌n,𝒑:|𝜹𝒌,𝒑Σ𝒑1𝜹𝒌,𝒑|4an}+[12,12]d){𝒚(n,𝒑+[12,12]d):|𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑|4an}]\displaystyle\quad+\mathbb{Q}_{n,\boldsymbol{p}}\left[\begin{array}[]{l}\left(\left\{\boldsymbol{k}\in\mathscr{B}_{n,\boldsymbol{p}}:\big{|}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{k},\boldsymbol{p}}^{\phantom{\top}}-\ell^{\star}\big{|}\leq 4a_{n}\right\}+\left[-\tfrac{1}{2},\tfrac{1}{2}\right]^{d}\right)\\[4.2679pt] \bigtriangleup\,\left\{\boldsymbol{y}\in\left(\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\right):\big{|}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}-\ell^{\star}\big{|}\leq 4a_{n}\right\}\end{array}\right] (A.50)
+n,𝒑{𝒚(n,𝒑+[12,12]d):|𝜹𝒚,𝒑Σ𝒑1𝜹𝒚,𝒑|4an}\displaystyle\quad+\mathbb{Q}_{n,\boldsymbol{p}}\left\{\boldsymbol{y}\in\left(\mathscr{B}_{n,\boldsymbol{p}}+[-\tfrac{1}{2},\tfrac{1}{2}]^{d}\right):\big{|}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}-\ell^{\star}\big{|}\leq 4a_{n}\right\}
=:(1)+(2)+(3).\displaystyle=\vcentcolon(\bigstar_{1})_{\ell^{\star}}+(\bigstar_{2})_{\ell^{\star}}+(\bigstar_{3})_{\ell^{\star}}. (A.51)

For the first term in (A.51), we can use the total variation bound in Proposition 2.6:

(1)8.03τ3/2(d+1)1/2n1/2.(\bigstar_{1})_{\ell^{\star}}\leq\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}. (A.52)

Bounding the second term in (A.51) is the same as for (D)_{\ell^{\star}}, except that there are two ellipsoids instead of one, namely \{\boldsymbol{y}\in\mathbb{R}^{d}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}=\ell^{\star}+4a_{n}\} and \{\boldsymbol{y}\in\mathbb{R}^{d}:\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}}=\ell^{\star}-4a_{n}\}. Therefore,

(2)2bn=2.10τ2(d+1)2(logn)1/2n1/2.(\bigstar_{2})_{\ell^{\star}}\leq 2\cdot b_{n}=\frac{2.10\,\tau^{2}(d+1)^{2}(\log n)^{1/2}}{n^{1/2}}. (A.53)

For the third term in (A.51), we can use Equation (A.30), Remark 2.2 and the mean value theorem (only when d\geq 2) to obtain

(3)\displaystyle(\bigstar_{3})_{\ell^{\star}} (𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑+4an){𝜹𝒀,𝒑Σ𝒑1𝜹𝒀,𝒑max(0,4an)}\displaystyle\leq\mathbb{P}(\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\ell^{\star}+4a_{n})-\mathbb{P}\{\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{Y},\boldsymbol{p}}^{\phantom{\top}}\leq\max(0,\ell^{\star}-4a_{n})\}
=γ¯{d/2,(+4an)/2}γ¯{d/2,max(0,4an)/2}\displaystyle=\overline{\gamma}\{d/2,(\ell^{\star}+4a_{n})/2\}-\overline{\gamma}\{d/2,\max(0,\ell^{\star}-4a_{n})/2\} (A.54)
maxt[0,):|2t|4antd/21etΓ(d/2)4an\displaystyle\leq\max_{t\in[0,\infty):|2t-\ell^{\star}|\leq 4a_{n}}\frac{t^{d/2-1}e^{-t}}{\Gamma(d/2)}\cdot 4a_{n} (A.55)

Now, notice that the maximum above is attained at t=\max(0,\ell^{\star}/2-2a_{n}) for d\in\{1,2\}. More generally, the function t\mapsto t^{d/2-1}e^{-t} is maximized on (0,\infty) at t=d/2-1 for all d\geq 3, and

\Gamma(d/2)=(d/2-1)\,\Gamma(d/2-1)\geq\sqrt{2\pi}\,(d/2-1)^{d/2-1/2}\,e^{-(d/2-1)},\quad\text{for all }d\geq 3, (A.56)

by a standard Stirling lower bound (see, e.g., Lemma 1 of Minc & Sathre (1964/65), or Theorem 2.2 of Batır (2017) for a more precise bound). Using this in (A.55) shows that, for any given \ell^{\star}>0 (we assumed that \ell^{\star}>1/2 when d=1),

(\bigstar_{3})_{\ell^{\star}}\leq\left\{\begin{array}[]{ll}\frac{\{\max(0,\ell^{\star}/2-2a_{n})\}^{-1/2}}{\sqrt{\pi}\,e^{\max(0,\ell^{\star}/2-2a_{n})}}\cdot 4a_{n},&\mbox{if }d=1\\ e^{-\max(0,\ell^{\star}/2-2a_{n})}\cdot 4a_{n},&\mbox{if }d=2\\[2.84526pt] \frac{1}{\sqrt{2\pi}\,(d/2-1)^{1/2}}\cdot 4a_{n},&\mbox{if }d\geq 3\end{array}\right\}\leq 4a_{n}=\frac{385\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}. (A.60)

By putting (A.52), (A.53) and (A.60) together in (A.51), we get

()\displaystyle(\bigstar)_{\ell^{\star}} (1)+(2)+(3)\displaystyle\leq(\bigstar_{1})_{\ell^{\star}}+(\bigstar_{2})_{\ell^{\star}}+(\bigstar_{3})_{\ell^{\star}}
8.03τ3/2(d+1)1/2n1/2+2.10τ2(d+1)2(logn)1/2n1/2+385τ3(logn)3/2n1/2\displaystyle\leq\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}+\frac{2.10\,\tau^{2}(d+1)^{2}(\log n)^{1/2}}{n^{1/2}}+\frac{385\,\tau^{3}(\log n)^{3/2}}{n^{1/2}}
193.70τ3(d+1)(logn)3/2n1/2.\displaystyle\leq\frac{193.70\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}. (A.61)

Finally, by putting the bounds for (A)_{\ell^{\star}}, (B)_{\ell^{\star}}, (\bigstar)_{\ell^{\star}}, (C)_{\ell^{\star}}, (D)_{\ell^{\star}}, (E)_{\ell^{\star}} found in (A.23), (A.24), (A.61), (A.25), (A.35) and (A.36), respectively, together in (A.44), we get the bound

|𝒌0dn𝒮dPn,𝒑(𝒌)Pn,𝒑(𝒏)Pn,𝒑(𝒌)ϕΣ𝒑(𝜹𝒚,𝒑)ϕΣ𝒑(𝜹𝒏,𝒑)ϕΣ𝒑(𝜹𝒚,𝒑)nd/2d𝒚|\displaystyle\left|\sum_{\begin{subarray}{c}\boldsymbol{k}\in\mathbb{N}_{0}^{d}\cap n\mathcal{S}_{d}\\ P_{n,\boldsymbol{p}}(\boldsymbol{k})\geq P_{n,\boldsymbol{p}}(\boldsymbol{n})\end{subarray}}P_{n,\boldsymbol{p}}(\boldsymbol{k})-\int_{\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}})\geq\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}})}\frac{\phi_{\Sigma_{\boldsymbol{p}}}(\boldsymbol{\delta}_{\boldsymbol{y},\boldsymbol{p}})}{n^{d/2}}{\rm d}\boldsymbol{y}\right| (A.62)
(A)+(B)+()+(C)+(D)+(E)\displaystyle\quad\leq(A)_{\ell^{\star}}+(B)_{\ell^{\star}}+(\bigstar)_{\ell^{\star}}+(C)_{\ell^{\star}}+(D)_{\ell^{\star}}+(E)_{\ell^{\star}}
2(d+1)n2+0+193.70τ3(d+1)(logn)3/2n1/2+8.03τ3/2(d+1)1/2n1/2\displaystyle\quad\leq\frac{2\,(d+1)}{n^{2}}+0+\frac{193.70\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}+\frac{8.03\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}
+1.05τ2(d+1)2(logn)1/2n1/2+8.07τ3/2(d+1)1/2n1/2\displaystyle\qquad+\frac{1.05\,\tau^{2}(d+1)^{2}(\log n)^{1/2}}{n^{1/2}}+\frac{8.07\,\tau^{3/2}(d+1)^{1/2}}{n^{1/2}}
194.96τ3(d+1)(logn)3/2n1/2.\displaystyle\quad\leq\frac{194.96\,\tau^{3}(d+1)(\log n)^{3/2}}{n^{1/2}}. (A.63)

This ends the proof. ∎
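As an illustration of Proposition 3.2 (a sketch only, not part of the proof), one can compare the exact multinomial sum on the left-hand side of (A.44) with its chi-square approximation on a small example; the values of n, \boldsymbol{p} and the hypothetical observed vector n_obs below are arbitrary:

# Exact sum of multinomial probabilities over the outcomes at least as likely as
# the observed one, versus the chi-square (multivariate normal) approximation.
n <- 30
p <- c(0.2, 0.3, 0.5)                  # d + 1 = 3 categories, so d = 2
d <- length(p) - 1
n_obs <- c(4, 12, 14)                  # hypothetical observed count vector

# Enumerate all count vectors (k1, k2, k3) with k1 + k2 + k3 = n.
grid <- expand.grid(k1 = 0:n, k2 = 0:n)
grid <- grid[grid$k1 + grid$k2 <= n, ]
grid$k3 <- n - grid$k1 - grid$k2
probs <- apply(grid, 1, function(k) dmultinom(k, size = n, prob = p))

p_obs <- dmultinom(n_obs, size = n, prob = p)
sum(probs[probs >= p_obs])             # exact sum in (A.44)

ell_star <- sum((n_obs - n * p)^2 / (n * p))
pchisq(ell_star, df = d)               # chi-square approximation of that sum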

Proof of Theorem 3.3.

Equation (3.10) is a direct consequence of Proposition 3.2. It remains to show that the set \widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha-\varepsilon_{n}) is strictly convex.

Using the calculation in (2), we can write

12𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑\displaystyle\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}} =n2i=1d(p^ipi)2pi+n2(p^d+1pd+1)2pd+1\displaystyle=\frac{n}{2}\sum_{i=1}^{d}\frac{(\hat{p}_{i}-p_{i})^{2}}{p_{i}}+\frac{n}{2}\frac{(\hat{p}_{d+1}-p_{d+1})^{2}}{p_{d+1}}
=n2i=1d(p^i2pi2p^i+pi)+n2(p^d+12pd+12p^d+1+pd+1),\displaystyle=\frac{n}{2}\sum_{i=1}^{d}\left(\frac{\hat{p}_{i}^{2}}{p_{i}}-2\hat{p}_{i}+p_{i}\right)+\frac{n}{2}\left(\frac{\hat{p}_{d+1}^{2}}{p_{d+1}}-2\hat{p}_{d+1}+p_{d+1}\right), (A.64)

so that the Hessian matrix of 𝒑12𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑\boldsymbol{p}\mapsto\tfrac{1}{2}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}} is positive definite on (0,)d(0,\infty)^{d}. Indeed,

d2d𝒑d𝒑12𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑=n[diag{(p^i2pi3)i=1d}+p^d+12pd+13𝟏d𝟏d]>0.\frac{{\rm d}^{2}}{{\rm d}\boldsymbol{p}\,{\rm d}\boldsymbol{p}^{\top}}\,\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}}=n\left[\mathrm{diag}\left\{\left(\frac{\hat{p}_{i}^{2}}{p_{i}^{3}}\right)_{i=1}^{d}\right\}+\frac{\hat{p}_{d+1}^{2}}{p_{d+1}^{3}}\boldsymbol{1}_{d}^{\phantom{\top}}\boldsymbol{1}_{d}^{\top}\right]>0. (A.65)
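A quick numerical check of (A.65) (illustration only; the vectors \hat{\boldsymbol{p}} and \boldsymbol{p} below are arbitrary points of the simplex):

# The Hessian in (A.65) has only strictly positive eigenvalues.
n <- 100
p_hat <- c(0.25, 0.25, 0.5)        # hat p_1, ..., hat p_{d+1}
p     <- c(0.2, 0.3, 0.5)          # p_1, ..., p_{d+1}
d <- length(p) - 1
H <- n * (diag(p_hat[1:d]^2 / p[1:d]^3) +
          (p_hat[d + 1]^2 / p[d + 1]^3) * matrix(1, d, d))
eigen(H, symmetric = TRUE)$values  # all strictly positive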

Therefore, 𝒑12𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑\boldsymbol{p}\mapsto\tfrac{1}{2}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}} is strictly convex, and sublevel sets of the form

{𝒑(0,)d:12𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑L}\left\{\boldsymbol{p}\in(0,\infty)^{d}:\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}}\leq L\right\} (A.66)

are strictly convex for any L>0. Since y\mapsto\gamma(d/2,y) is clearly increasing, it is invertible; write its inverse as \phi_{d}(\cdot). The domain over which we optimize the objective function in (3.11) is the intersection of two strictly convex sets, namely

𝒞~(𝒑^n,α+εn)=𝒮do{𝒑(0,)d:12𝜹𝒏,𝒑Σ𝒑1𝜹𝒏,𝒑ϕd(1α+εn)},\widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha+\varepsilon_{n})=\mathcal{S}_{d}^{o}\cap\left\{\boldsymbol{p}\in(0,\infty)^{d}:\frac{1}{2}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\top}\Sigma_{\boldsymbol{p}}^{-1}\boldsymbol{\delta}_{\boldsymbol{n},\boldsymbol{p}}^{\phantom{\top}}\leq\phi_{d}(1-\alpha+\varepsilon_{n})\right\}, (A.67)

which proves that \widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha+\varepsilon_{n}) is strictly convex. As pointed out in Remark 3.1, the fact that the first set in the intersection is \mathcal{S}_{d}^{o} plays a crucial role for \widetilde{\mathcal{C}}(\hat{\boldsymbol{p}}_{n},\alpha+\varepsilon_{n}) to be strictly convex, and thus for the uniqueness of the minimizer/maximizer of the objective function. ∎
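To illustrate how the strictly convex confidence set is used, here is a minimal R sketch of the resulting confidence bounds for a strictly convex functional (the negative entropy, as in the paper's application) in the simplest case d=1; the grid search, the level \alpha=0.05, the sample size and the observed counts are ours for illustration, and the correction \varepsilon_{n} is ignored:

# Sketch (not the authors' implementation): confidence bounds for the negative
# entropy of a two-category distribution by optimizing over the chi-square
# confidence set, ignoring the epsilon_n correction.  All numbers are arbitrary.
alpha <- 0.05
n <- 200
counts <- c(140, 60)                        # hypothetical observed counts
p_hat <- counts / n
neg_entropy <- function(p) sum(p * log(p))  # strictly convex objective

p1_grid <- seq(1e-4, 1 - 1e-4, by = 1e-4)
stat <- n * ((p_hat[1] - p1_grid)^2 / p1_grid +
             (p_hat[2] - (1 - p1_grid))^2 / (1 - p1_grid))
inside <- stat <= qchisq(1 - alpha, df = 1)       # the sublevel set above
values <- sapply(p1_grid[inside], function(p1) neg_entropy(c(p1, 1 - p1)))
range(values)    # lower and upper confidence bounds for the negative entropy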

Appendix B Moment estimates

In this section, we derive some moment estimates for the multinomial distribution. The lemma below is used in the proof of Proposition 2.6.

Lemma B.1.

Let nn\in\mathbb{N} and 𝐩𝒮do\boldsymbol{p}\in\mathcal{S}_{d}^{o} be given. If 𝛏=(ξ1,ξ2,,ξd)Multinomial(n,𝐩)\boldsymbol{\xi}=(\xi_{1},\xi_{2},\dots,\xi_{d})\sim\mathrm{Multinomial}\hskip 0.56905pt(n,\boldsymbol{p}) according to (2.2), then, for all 1id+11\leq i\leq d+1,

𝔼(|δξi,pi|2)=pi(1pi),\displaystyle\mathbb{E}\left(|\delta_{\xi_{i},p_{i}}|^{2}\right)=p_{i}(1-p_{i}), (B.1)
𝔼(|δξi,pi|4)=3pi2(1pi)2+n1pi(17pi+12pi26pi3),\displaystyle\mathbb{E}\left(|\delta_{\xi_{i},p_{i}}|^{4}\right)=3p_{i}^{2}(1-p_{i})^{2}+n^{-1}p_{i}\,(1-7p_{i}+12p_{i}^{2}-6p_{i}^{3}), (B.2)

where ξd+1:=n𝛏1\xi_{d+1}\vcentcolon=n-\|\boldsymbol{\xi}\|_{1}. Moreover, for any Borel set B(d)B\in\mathscr{B}(\mathbb{R}^{d}) and all n16n\geq 16, we have

|𝔼(δξi,pi 1{𝝃B})|12{(𝝃Bc)}1/2,\displaystyle\Big{|}\mathbb{E}\left(\delta_{\xi_{i},p_{i}}\,\mathds{1}_{\{\boldsymbol{\xi}\in B\}}\right)\Big{|}\leq\frac{1}{2}\left\{\mathbb{P}(\boldsymbol{\xi}\in B^{c})\right\}^{1/2}, (B.3)
|𝔼(δξi,pi3 1{𝝃B})n1/2pi(2pi23pi+1)|18{(𝝃Bc)}1/4,\displaystyle\left|\mathbb{E}\left(\delta_{\xi_{i},p_{i}}^{3}\,\mathds{1}_{\{\boldsymbol{\xi}\in B\}}\right)-n^{-1/2}p_{i}\,(2p_{i}^{2}-3p_{i}+1)\right|\leq\frac{1}{\sqrt{8}}\left\{\mathbb{P}(\boldsymbol{\xi}\in B^{c})\right\}^{1/4}, (B.4)
Proof.

The result (B.1) is well known and the result (B.2) is shown, for example, in Equation (A.29) of Ouimet (2021b) or Equation (82) of Ouimet (2021a). The result (B.3) is a direct consequence of (B.1) together with the Cauchy-Schwarz inequality and the fact that \max_{p\in[0,1]}p(1-p)=1/4. Finally, to obtain (B.4), we use the fact that \mathbb{E}(\delta_{\xi_{i},p_{i}}^{3})=n^{-1/2}p_{i}\,(2p_{i}^{2}-3p_{i}+1) (see Lemma A.1 in Ouimet (2021b) or Equation (79) in Ouimet (2021a)), followed by an application of Hölder's inequality and the bound \mathbb{E}(|\delta_{\xi_{i},p_{i}}|^{4})\leq 1/4 (which follows from (B.2) and the assumption n\geq 16):

|𝔼(δξi,pi3 1{𝝃B})n1/2pi(2pi23pi+1)|\displaystyle\left|\mathbb{E}\left(\delta_{\xi_{i},p_{i}}^{3}\,\mathds{1}_{\{\boldsymbol{\xi}\in B\}}\right)-n^{-1/2}p_{i}\,(2p_{i}^{2}-3p_{i}+1)\right|
=|𝔼(δξi,pi3 1{𝝃Bc})|\displaystyle\quad=\left|\mathbb{E}\left(\delta_{\xi_{i},p_{i}}^{3}\,\mathds{1}_{\{\boldsymbol{\xi}\in B^{c}\}}\right)\right|
{𝔼(|δξi,pi|4)}1/4{𝔼(|δξi,pi|4)}1/4{𝔼(|δξi,pi|4)}1/4{(𝝃Bc)}1/4\displaystyle\quad\leq\left\{\mathbb{E}\left(|\delta_{\xi_{i},p_{i}}|^{4}\right)\right\}^{1/4}\left\{\mathbb{E}\left(|\delta_{\xi_{i},p_{i}}|^{4}\right)\right\}^{1/4}\left\{\mathbb{E}\left(|\delta_{\xi_{i},p_{i}}|^{4}\right)\right\}^{1/4}\left\{\mathbb{P}(\boldsymbol{\xi}\in B^{c})\right\}^{1/4}
(1/4)1/4(1/4)1/4(1/4)1/4{(𝝃Bc)}1/4\displaystyle\quad\leq\left(1/4\right)^{1/4}\left(1/4\right)^{1/4}\left(1/4\right)^{1/4}\left\{\mathbb{P}(\boldsymbol{\xi}\in B^{c})\right\}^{1/4}
=18{(𝝃Bc)}1/4.\displaystyle\quad=\frac{1}{\sqrt{8}}\left\{\mathbb{P}(\boldsymbol{\xi}\in B^{c})\right\}^{1/4}. (B.5)

This ends the proof. ∎
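A quick Monte Carlo check of the identities (B.1)–(B.2) (sketch only; n and \boldsymbol{p} below are arbitrary):

# Compare simulated moments of delta_{xi_1, p_1} = (xi_1 - n * p_1) / sqrt(n)
# with the exact expressions in (B.1) and (B.2).
set.seed(2)
n <- 50
p <- c(0.1, 0.3, 0.6)
xi <- rmultinom(2e5, size = n, prob = p)
delta1 <- (xi[1, ] - n * p[1]) / sqrt(n)
c(mc = mean(delta1^2), exact = p[1] * (1 - p[1]))                    # (B.1)
c(mc = mean(delta1^4),
  exact = 3 * p[1]^2 * (1 - p[1])^2 +
    (1 / n) * p[1] * (1 - 7 * p[1] + 12 * p[1]^2 - 6 * p[1]^3))      # (B.2)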

R code

The R code that generated Figures 3.1 and 3.2 is available at
https://www.dropbox.com/s/tfwnnba642740xo/multinomial_method.R?dl=0.

Funding

F. Ouimet was supported by a CRM-Simons postdoctoral fellowship from the Centre de recherches mathématiques (Montréal, Canada) and the Simons Foundation.

References

  • Batır, N. 2017. Bounds for the gamma function. Results Math., 72(1-2), 865–874. MR3684463.
  • Carter, A. V. 2002. Deficiency distance between multinomial and multivariate normal experiments. Dedicated to the memory of Lucien Le Cam. Ann. Statist., 30(3), 708–730. MR1922539.
  • Esseen, C.-G. 1945. Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math., 77, 1–125. MR14626.
  • Gaunt, R. E. 2021. Bounds for the chi-square approximation of the power divergence family of statistics. To appear in Journal of Applied Probability, 1–26. arXiv:2107.00535.
  • Gaunt, R. E., & Reinert, G. 2022. Bounds for the chi-square approximation of Friedman’s statistic by Stein’s method. Preprint, 1–38. arXiv:2111.00949.
  • Gaunt, R. E., Pickett, A. M., & Reinert, G. 2017. Chi-square approximation by Stein’s method with application to Pearson’s statistic. Ann. Appl. Probab., 27(2), 720–756. MR3655852.
  • Langford, J. 2005. Tutorial on practical prediction theory for classification. J. Mach. Learn. Res., 6, 273–306. MR2249822.
  • Loader, C. 2002. Fast and accurate computation of binomial probabilities. Technical report, 11 pp. https://www.r-project.org/doc/reports/CLoader-dbinom-2002.pdf.
  • Mann, B. 1997a. Convergence rate for \chi^{2} of a multinomial. Unpublished manuscript.
  • Mann, B. 1997b. Stein’s method for \chi^{2} of a multinomial. Unpublished manuscript.
  • Mason, D. M., & Zhou, H. H. 2012. Quantile coupling inequalities and their applications. Probab. Surv., 9, 439–479. MR3007210.
  • Minc, H., & Sathre, L. 1964/65. Some inequalities involving (r!)^{1/r}. Proc. Edinburgh Math. Soc. (2), 14, 41–46. MR162751.
  • Ouimet, F. 2021a. General formulas for the central and non-central moments of the multinomial distribution. Stats, 4(1), 18–27. doi:10.3390/stats4010002.
  • Ouimet, F. 2021b. A precise local limit theorem for the multinomial distribution and some applications. J. Statist. Plann. Inference, 215, 218–233. MR4249129.
  • Prokhorov, Y. V., & Ulyanov, V. V. 2013. Some approximation problems in statistics and probability. Pages 235–249 of: Limit theorems in probability, statistics and number theory. Springer Proc. Math. Stat., vol. 42. Springer, Heidelberg. MR3079145.
  • Read, T. R. C. 1984. Closer asymptotic approximations for the distributions of the power divergence goodness-of-fit statistics. Ann. Inst. Statist. Math., 36(1), 59–69. MR752006.
  • Remmert, R. 1998. Classical Topics in Complex Function Theory. Graduate Texts in Mathematics, vol. 172. Springer-Verlag, New York. MR1483074.
  • Severini, T. A. 2005. Elements of Distribution Theory. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 17. Cambridge University Press, Cambridge. MR2168237.
  • Siotani, M., & Fujikoshi, Y. 1984. Asymptotic approximations for the distributions of multinomial goodness-of-fit statistics. Hiroshima Math. J., 14(1), 115–124. MR750392.
  • Tanabe, K., & Sagae, M. 1992. An exact Cholesky decomposition and the generalized inverse of the variance-covariance matrix of the multinomial distribution, with applications. J. Roy. Statist. Soc. Ser. B, 54(1), 211–219. MR1157720.
  • Ulyanov, V. V., & Zubov, V. N. 2009. Refinement on the convergence of one family of goodness-of-fit statistics to chi-squared distribution. Hiroshima Math. J., 39(1), 133–161. MR2499200.