Non-asymptotic approximations for Pearson’s chi-square statistic and its application to confidence intervals for strictly convex functions of the probability weights of discrete distributions
Abstract
In this paper, we develop a non-asymptotic local normal approximation for multinomial probabilities. First, we use it to find non-asymptotic total variation bounds between the measures induced by uniformly jittered multinomials and the multivariate normals with the same means and covariances. From the total variation bounds, we also derive a comparison of the cumulative distribution functions and quantile coupling inequalities between Pearson’s chi-square statistic (written as the normalized quadratic form of a multinomial vector) and its multivariate normal analogue. We apply our results to find confidence intervals for the negative entropy of discrete distributions. Our method can be applied more generally to find confidence intervals for strictly convex functions of the weights of discrete distributions.
keywords: categorical data, convex optimization, entropy, Gaussian approximation, information theory, local limit theorem, multinomial distribution, multivariate normal distribution, quantile coupling
MSC [2020]: Primary: 62E17, 62F25, 62F30; Secondary: 60E15, 60F99, 62E20, 62H10, 62H12
1 Introduction
A fundamental question in machine learning and statistics is this: Given a set of i.i.d. observations, what can we infer about the distribution that generated the observations? In general, we cannot infer the exact distribution – it is an intractable inverse problem. However, in some cases, if we know that the generating distribution belongs to a parametric family of distributions, then it is often possible to find a confidence set on the parameters for a given significance level. If the confidence set has a shape that makes it amenable to optimization (being convex, for example), then we can use it to infer confidence bounds on functions of the parameters by minimizing/maximizing the functions over the confidence set. In this paper, we are interested in the particular case where the parametric family is a given (fixed) discrete distribution with known values . The parameters are the probability weights associated with each value . In other words, given a sequence of i.i.d. observations , we assume that
(1.1) |
where the categorical values are fixed and known, but the categorical probabilities are unknown parameters. We want confidence bounds on the quantity
(1.2) |
where the objective function can depend on the categorical values and is assumed to be strictly convex. If we sample observations from the distribution (1.1), then note that the probability weights are also parameters of the multinomial distribution associated with the sample count vector where
(1.3) |
Therefore, in order to build confidence bounds for , an idea is to find a confidence set for the probability parameters of the sample count vector’s multinomial distribution and then minimize/maximize the objective function over the probability parameters in that set. This problem is described in detail and solved in Section 3.
Motivated by the above machine learning problem, we derive in this paper closely related theoretical results that may be of independent interest. They include a non-asymptotic local normal approximation for multinomial probabilities, a non-asymptotic total variation bound between the measures induced by uniformly jittered multinomials and the corresponding multivariate normals with the same means and covariances, and a non-asymptotic comparison of the cumulative distribution functions (c.d.f.s) between Pearson’s chi-square statistic (written as the normalized quadratic form of a multinomial vector) and its multivariate normal analogue, which then leads to quantile coupling bounds as a corollary. (For a good introduction to quantile coupling bounds, see, e.g., Mason & Zhou, (2012).)
To the best of our knowledge, the first two results are new. The local approximation result complements the asymptotic version obtained in Lemma 2.1 of Siotani & Fujikoshi, (1984), via a Stieltjes integral representation of probabilities due to Esseen, (1945), and later rederived independently in Theorem 2.1 of Ouimet, (2021b) in a log-ratio form via elementary Taylor expansions and Stirling’s formula. The total variation bound complements the asymptotic version obtained in the proof of Theorem 1 in Carter, (2002) and its improvement in Lemma 3.1 of Ouimet, (2021b), where the inductive part of Carter’s proof was removed by using the multinomial/normal local approximation directly instead of a multi-scale comparison between binomials/normals. Unfortunately, the comparison of the c.d.f.s (and thus the quantile coupling result), being of order , turns out to be (slightly) sub-optimal, although the derivation is easy to follow and provides a completely different route from the usual one (so there should still be some interest in the argument). Earlier asymptotic results in that direction can be found in Siotani & Fujikoshi, (1984) and Read, (1984). Some lapses were later found in the asymptotic expansions of those two papers and corrected in Ulyanov & Zubov, (2009) (see also Prokhorov & Ulyanov, (2013)). Slightly better bounds, of order , can be obtained via a multivariate Berry-Esseen theorem; see the Kolmogorov distance bounds of Mann, (1997a, b) mentioned on p. 734 of Gaunt et al., (2017) and references therein. Gaunt et al., (2017) themselves derive, using Stein’s method, distributional bounds of order over smooth test functions between Pearson’s chi-square statistic and its asymptotic chi-square distribution; see also Gaunt, (2021) and Gaunt & Reinert, (2022) for similar results with respect to the Friedman and power divergence statistics. Since indicator functions are not smooth, the rate would be optimal in our context.
Here is the outline of the paper. The definitions and theoretical results are presented in Section 2 and proved in Appendix A. The solution to the problem described in the introduction is given in Section 3, and the related proofs are also deferred to Appendix A. Technical moment estimates are gathered in Appendix B.
2 Definitions and theoretical results
For , define the (unit) -dimensional simplex and its interior by
(2.1) |
where denotes the norm on . Given a positive integer and a set of probability weights , the probability mass function is defined by
(2.2) |
where and . With this notation, the multinomial distribution has categories, with respective probabilities . The covariance matrix for the first components of a multinomial vector is well-known to be , where , see, e.g., (Severini, 2005, p. 377). From Theorem 1 in Tanabe & Sagae, (1992), we also know that . Hence, the density function of the multivariate centered Gaussian random vector that has the same covariance matrix is defined as follows:
(2.3) |
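For illustration, here is a minimal R sketch (not part of the paper's accompanying code; all numerical values below are chosen arbitrarily) that checks numerically that the covariance of the first d − 1 counts of a multinomial vector matches the stated formula.

```r
# Illustrative sketch: the covariance of the first d-1 counts of a Multinomial(n, p)
# vector equals n * (diag(q) - q q^T), where q holds the first d-1 probabilities.
set.seed(1)
n <- 200
p <- c(0.2, 0.3, 0.1, 0.4)
d <- length(p)
q <- p[1:(d - 1)]
Sigma <- diag(q) - tcrossprod(q)              # single-trial covariance of the first d-1 cells
X <- rmultinom(50000, size = n, prob = p)     # d x 50000 matrix of sample counts
emp <- cov(t(X[1:(d - 1), , drop = FALSE]))   # empirical covariance of the first d-1 counts
max(abs(emp - n * Sigma))                     # small, up to Monte Carlo error
```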
Throughout this section, let be given, and let
(2.4) |
Ultimately, our main goal (Theorem 2.7) will be to compare the c.d.f. of Pearson’s chi-square statistic, , with its Gaussian analogue , namely compare
(2.5) |
where ,
(2.6) |
To see that the statistic is just a rewriting of Pearson’s chi-square statistic, use the fact that
(2.7) |
(see, e.g., (Tanabe & Sagae, 1992, Eqn. 21)) together with to write
(2.8) |
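The identity can also be checked numerically. The following R sketch (illustrative values; not the paper's code) computes both the classical form of Pearson's chi-square statistic over all cells and the normalized quadratic form of the first d − 1 centred counts, and confirms that they agree.

```r
# Illustrative sketch: classical Pearson statistic vs. the normalized quadratic form
# of the first d-1 centred counts, with covariance matrix n * (diag(q) - q q^T).
set.seed(2)
n <- 200
p <- c(0.2, 0.3, 0.1, 0.4)
d <- length(p)
q <- p[1:(d - 1)]
Sigma <- diag(q) - tcrossprod(q)
N <- as.vector(rmultinom(1, size = n, prob = p))
pearson <- sum((N - n * p)^2 / (n * p))                 # classical form, all d cells
v <- N[1:(d - 1)] - n * q
quad <- as.numeric(t(v) %*% solve(n * Sigma) %*% v)     # normalized quadratic form
all.equal(pearson, quad)                                # TRUE
```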
Remark 2.1.
Notice that
(2.9) |
because and .
Remark 2.2.
Let be given. Since we know that is distributed, then
(2.10) |
where the two functions
(2.11) |
denote the regularized lower incomplete gamma function and Euler’s gamma function, respectively.
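For reference, this identity is directly available in R, where pgamma() with unit rate is the regularized lower incomplete gamma function; the values of x and k below are illustrative.

```r
# Illustrative sketch: the chi-square(k) c.d.f. equals the regularized lower
# incomplete gamma function evaluated at (k/2, x/2).
x <- 7.3
k <- 4
pchisq(x, df = k)
pgamma(x / 2, shape = k / 2)   # gamma(k/2, x/2) / Gamma(k/2), same value
```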
Remark 2.3.
Throughout the paper, the notation means that , where is a universal constant. Whenever might depend on a parameter, we add a subscript: for example, . Similarly, means that , and subscripts indicate which parameters the convergence rate can depend on.
The first step (Proposition 2.4) to achieve our goal (2.5) is to prove a non-asymptotic version of the local limit theorem found in (Siotani & Fujikoshi, 1984, Lemma 2.1) and (Ouimet, 2021b, Theorem 2.1). Specifically, we develop a non-asymptotic expansion for the log-ratio of the multinomial probability mass function (2.2) over the corresponding multivariate normal density (2.3). Our approximation is uniform for in the bulk of the distribution and uniform for in a compact set away from the boundary of the simplex , namely for
(2.12) | |||
(2.13) |
where is a fixed parameter.
Proposition 2.4 (Non-asymptotic local expansion).
Let be given. Then, uniformly for , and , we have
(2.14) |
where the error satisfies
(2.15) |
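As a sanity check, the local expansion can be probed numerically. The R sketch below (illustrative values; not the paper's code) compares the log of a multinomial probability at a lattice point near the mean with the log of the corresponding centred multivariate normal density, and the log-ratio is indeed small.

```r
# Illustrative sketch: log P(N = x) for N ~ Multinomial(n, p) vs. the log-density of
# the centred normal with covariance n * (diag(q) - q q^T), at a point x near n * p.
n <- 5000
p <- c(0.25, 0.25, 0.2, 0.3)
d <- length(p)
q <- p[1:(d - 1)]
Sigma <- diag(q) - tcrossprod(q)
x <- round(n * p)
x[d] <- n - sum(x[1:(d - 1)])                                  # make the counts sum to n
log_pmf <- dmultinom(x, size = n, prob = p, log = TRUE)
v <- x[1:(d - 1)] - n * q
log_phi <- -0.5 * ((d - 1) * log(2 * pi * n) + log(det(Sigma)) +
                     as.numeric(t(v) %*% solve(n * Sigma) %*% v))
log_pmf - log_phi                                              # small log-ratio
```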
The following corollary may be of independent interest. It extends the local limit theorem from Proposition 2.4 to one-to-one transformations of multinomial vectors.
Corollary 2.5.
Let be given. Let be a one-to-one mapping from an open subset of onto a subset of . Assume further that has continuous partial derivatives on and its Jacobian determinant is non-zero for all . Then, uniformly for such that , and , we have
(2.16) | ||||
where the error satisfies
(2.17) |
Using the local expansion from Proposition 2.4, the second step (Proposition 2.6) to achieve our goal (2.5) consists in finding an upper bound on the total variation between the probability measure associated with the multivariate normal in (2.3) and the probability measure of a multinomial vector jittered by a uniform random variable on . The proposition is a non-asymptotic version of (Ouimet, 2021b, Lemma 3.1).
Proposition 2.6 (Total variation).
Let be given, and let and , where and are assumed independent. Define and let be the law of . Let be the law of the multivariate normal distribution , where recall . Then, for all , we have
(2.18) |
where denotes the total variation norm.
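For intuition, the total variation distance can be computed numerically in the simplest case d = 2, where the jittered count is a binomial count plus an independent uniform perturbation. The R sketch below uses illustrative values and a simple Riemann sum; it is not the paper's code.

```r
# Illustrative sketch (d = 2): the jittered count N_1 + U, with U ~ Uniform(-1/2, 1/2],
# has the piecewise constant density y -> dbinom(round(y), n, p); its total variation
# distance to the normal with the same mean and variance is computed on a fine grid.
n <- 400
p <- 0.3
g <- function(y) dbinom(round(y), size = n, prob = p)                 # density of N_1 + U
f <- function(y) dnorm(y, mean = n * p, sd = sqrt(n * p * (1 - p)))   # matching normal
y <- seq(-0.5, n + 0.5, by = 0.001)      # normal mass outside this range is negligible
tv <- 0.5 * sum(abs(g(y) - f(y))) * 0.001
tv   # decays on the order of 1/sqrt(n), consistent with the non-asymptotic bound
```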
The final step (Theorem 2.7) to achieve our goal (2.5) consists in controlling the difference between and using the total variation bound from Proposition 2.6.
Theorem 2.7 (C.d.f. comparison).
Let and be given. Then, for all , we have
(2.19) |
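The c.d.f. comparison can be visualized by simulation. The R sketch below (illustrative values; not the paper's code) compares the empirical c.d.f. of Pearson's statistic with the chi-square c.d.f. of its Gaussian analogue from Remark 2.2.

```r
# Illustrative sketch: empirical c.d.f. of Pearson's statistic from simulated
# multinomial samples vs. the chi-square(d - 1) c.d.f. of the Gaussian analogue.
set.seed(3)
n <- 300
p <- c(0.2, 0.3, 0.1, 0.4)
d <- length(p)
X <- rmultinom(20000, size = n, prob = p)
stat <- colSums((X - n * p)^2 / (n * p))            # Pearson statistic for each sample
x_grid <- seq(0.5, 12, by = 0.5)
max(abs(ecdf(stat)(x_grid) - pchisq(x_grid, df = d - 1)))   # uniformly small difference
```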
As a consequence of Remark 2.2, Theorem 2.7 and the mean value theorem, we deduce the following stochastic upper bound on .
Corollary 2.8 (Quantile coupling).
Let be the cumulative distribution function (c.d.f.) of , and let denote the generalized inverse distribution function (or quantile function) associated with . Also, let be the c.d.f. of , and define the random variable . For , let
(2.20) |
Then, with the event
(2.21) |
we have, for large enough,
(2.22) |
where solves .
3 Application to confidence intervals
3.1 Description of the problem
In this section, we apply the method mentioned in Section 1 and use the normal approximation results we obtained in Section 2 to clarify the optimization step, given the multi-dimensionality of the problem and the roughness of the confidence set for the multinomial.
Formally, consider a discrete random variable having the following probability mass function
(3.1) |
where is a positive integer, the values are known, but the probability weights are unknown. We consider a (fixed and given) strictly convex objective function , which may depend on the values of and , and our goal is to estimate and find a confidence interval for
(3.2) |
Given a significance level , a sequence of i.i.d. observations following the discrete distribution in (3.1), and the associated sample count random variables
(3.3) |
we will show how to construct a lower bound and an upper bound that satisfy
(3.4) |
where is an explicit threshold, , and is a small explicit quantity that goes to as .
When there are only two category values (), this problem is easy. Indeed, using a binary search, a computer can find an exact -confidence interval for , say , using the fact that . Then we find (resp., ) simply by minimizing (resp., maximizing) the objective function over , i.e.,
(3.5) |
(For details on binomial inversion, see, e.g., Langford, (2005). Binomial tails can be computed accurately using Loader’s method (Loader, 2002), as used for dbinom() in R, or using Stirling’s approximation with several terms for very large , or exactly using arbitrary-precision arithmetic such as the BigInteger and BigDecimal classes in Java.)
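The following R sketch illustrates the binomial case: the exact (Clopper–Pearson type) confidence interval is obtained by inverting the binomial tails with uniroot(), and the confidence bounds for a strictly convex objective, here the negative entropy, are then obtained by optimizing over that interval. The counts and levels are illustrative, and the edge cases k = 0 and k = n would need separate handling.

```r
# Illustrative sketch of the d = 2 case: exact interval for p by inverting binomial
# tails, then bounds for the convex objective f(p) = p*log(p) + (1-p)*log(1-p).
alpha <- 0.05
n <- 200
k <- 137                                                     # observed number of successes
p_lo <- uniroot(function(p) pbinom(k - 1, n, p, lower.tail = FALSE) - alpha / 2,
                interval = c(1e-12, 1 - 1e-12))$root
p_hi <- uniroot(function(p) pbinom(k, n, p) - alpha / 2,
                interval = c(1e-12, 1 - 1e-12))$root
f <- function(p) p * log(p) + (1 - p) * log(1 - p)
L <- if (p_lo <= 0.5 && 0.5 <= p_hi) f(0.5) else min(f(p_lo), f(p_hi))  # min of a convex f
U <- max(f(p_lo), f(p_hi))                                              # max at an endpoint
c(p_lo, p_hi, L, U)
```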
In higher dimensions however, the problem becomes much more difficult. Given a significance level and an (observed) vector of sample counts
(3.6) |
we have to optimize the objective function over vectors of probabilities in the following confidence set for :
(3.7) |
Remark 3.1.
Assuming that the (observed) sample counts are all positive (i.e., each of the categories has at least one observation), note that the sum in (3.7) is equal to for any on the boundary of the simplex since in that case. This is why the confidence set is restricted to the interior of the simplex in (3.7). This remark will play an important role in showing that the optima which define the confidence bounds in Theorem 3.3 are unique, see (A.67) and below.
The shape of the set is unclear and the boundaries need not be smooth, which makes the optimization of the objective function much more complicated than in the binomial case (). To overcome this obstacle, our strategy will be to find a smooth superset that contains the exact confidence set for and then compute and by optimizing over the smooth superset, as in (3.5). The small quantity in (3.4) is the “price” we have to pay in confidence to transfer the optimization from the rough (but exact) confidence set to the smooth superset. The smooth superset for will be constructed using the fact that the law of the sample count vector , i.e., the distribution, is close in total variation, when jittered by a , to the multivariate normal distribution with the same mean and covariance . This point was proved in Proposition 2.6. Other elements of the proof of Theorem 2.7 will be used for the comparison of the confidence sets.
3.2 Solution to the problem
As mentioned previously, it is not obvious (a priori) how we can optimize the objective function over the confidence set and find an explicit solution given the multi-dimensional, discrete and rough nature of the confidence set. However, a natural idea to simplify the problem is to replace the confidence set by its Gaussian analogue, namely
(3.8) | ||||
where recall , was defined in (2.3), was defined in (2.6), and was defined in (2.2).
First, we need to find a Gaussian superset as in (3.8) that contains the exact confidence set , where is a positive quantity that goes to zero as . The proposition below does just that. The proof relies on the total variation bound from Proposition 2.6 and on other elements of the proof of Theorem 2.7.
Proposition 3.2.
Finally, by Proposition 3.2, we can find the lower bound and the upper bound by optimizing the objective function over the Gaussian superset . This solves the problem mentioned in the introduction and described in Section 3.1. The proof shows that the Gaussian superset is strictly convex and it also establishes the existence and uniqueness of the minimizers/maximizers.
Theorem 3.3 (Existence and uniqueness of the solution to the problem).
Assume the objective function to be strictly convex. Let , , and let be given such that . We have
(3.10) |
with the choices
(3.11) |
Furthermore, the set is strictly convex, so that the minimizers/maximizers in (3.11) are well defined (i.e., unique).
The bound on the confidence level in (3.10) cannot be close to exact unless the number of observations is extremely high. Therefore, for practical purposes, we simply let . This is illustrated below for two examples:
(3.12) | |||
(3.13) |
where is a positive definite matrix that may depend on the known categorical values and .
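To make the procedure concrete, the following brute-force R sketch computes approximate confidence bounds for the negative-entropy objective with d = 3, using the practical threshold in which the correction term is dropped (as suggested above). A grid search over the simplex replaces the convex optimization of Theorem 3.3, so this is only an illustration with arbitrary numerical values, not the paper's R code.

```r
# Illustrative brute-force sketch of the confidence bounds for the negative entropy.
set.seed(4)
alpha <- 0.05
p_true <- c(0.5, 0.3, 0.2)
d <- 3
n <- 500
N <- as.vector(rmultinom(1, size = n, prob = p_true))        # observed sample counts
thr <- qchisq(1 - alpha, df = d - 1)                         # chi-square quantile threshold
f <- function(p) sum(p * log(p))                             # negative entropy (strictly convex)
step <- 0.005
lower <- Inf
upper <- -Inf
for (p1 in seq(step, 1 - 2 * step, by = step)) {
  for (p2 in seq(step, 1 - p1 - step, by = step)) {
    p <- c(p1, p2, 1 - p1 - p2)
    if (sum((N - n * p)^2 / (n * p)) <= thr) {               # membership in the Gaussian set
      val <- f(p)
      lower <- min(lower, val)
      upper <- max(upper, val)
    }
  }
}
c(lower, upper)        # approximate [L, U]; f(p_true) should typically lie between them
```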
In Figures 3.1 and 3.2, the confidence bounds are shown to converge to the value of the objective functions at (i.e., ) and the empirical significance levels are shown to stay below the nominal level . The R code that generated the figures is available at https://www.dropbox.com/s/tfwnnba642740xo/multinomial_method.R?dl=0.
[Figures 3.1 and 3.2: convergence of the confidence bounds to the true value of the objective functions, and empirical significance levels staying below the nominal level, for the two examples.]
Appendix A Proofs
Proof of Proposition 2.4.
By taking the logarithm on the left-hand side of (2.2), we have
(A.1) |
From Lindelöf’s estimate of the remainder terms in the expansion of the factorials, found for example on page 67 of Remmert, (1998), we know that
(A.2) |
By applying (A.2) in (A.1) and reorganizing the terms, we get
(A.3) | ||||
Since , the above is
(A.4) | ||||
Now, for (recall our assumption ), Lagrange error bounds for the following Taylor expansions imply
(A.5) | ||||
By applying these estimates in (A.4) with , together with the error bound from (A.2), we obtain
(A.6) | ||||
where the rest satisfies
(A.7) |
The cancellations in (A.6) come from (recall Remark 2.1) and from for by (2.7). This ends the proof. ∎
Proof of Proposition 2.6.
By the comparison of the total variation norm with the Hellinger distance on page 726 of Carter, (2002), we already know that
(A.8) |
By applying a union bound, followed by the fact that for all and then Hoeffding’s inequality, we get
(A.9) |
and similarly, for ,
(A.10) |
To control the expectation in (A.8), note that if denotes the density function associated with (i.e., it is equal to whenever is closest to ), then
(A.11) |
By the non-asymptotic expansion in Proposition 2.4, we have
(A.12) | ||||
By Lemma B.1, and our assumptions and , we can bound the expectations in (A.12) as follows:
(A.13) |
For the term in (A), note that
(A.14) |
Using the expression in (2.7), the independence of the random variables , and our assumptions and , we obtain
(A.15) |
To bound the term in (A), we first derive a rough bound for the log-ratio in (2.14) on the set . Note that the three coefficients we found in (A.5), and hence those in (A.7), can be replaced by on the event by rerunning the proof of Proposition 2.4, since the bound on just above (A.5) in that case becomes
(A.16) |
Taking the above into account, and using our assumptions and , we have
(A.17) |
We also have the following rough bound from (A),
(A.18) |
again using our assumptions and . Putting the rough bounds (A) and (A) together yields
(A.19) |
Proof of Theorem 2.7.
The case can easily be treated separately using the Berry-Esseen theorem (in fact the bound in (2.19) would be of the form in that case). Alternatively, we can prove our statement as we do below, for all and any , only assuming further that when .
By the triangle inequality, we have
(A.22) | ||||
First, as in (A), we have
(A.23) |
Second, note that by the definition of ,
(A.24) |
Third, to bound , we can use the total variation bound in Proposition 2.6. We have
(A.25) |
To bound , note that is a convex and connected set. Hence, if we ignore the restriction to , the symmetric difference
(A.26) | ||||
is contained in a -dimensional shell of thickness at most in any given direction that is parallel to one of the axes. Figure A.3 illustrates the situation when .
[Figure A.3: illustration of the thin shell containing the symmetric difference.]
When , we have
(A.27) | |||
(A.28) | |||
(A.29) |
so that
(A.30) |
(using and ) where
(A.31) |
Using Equation (A.30), Remark 2.2 and the mean value theorem (only when ), we deduce
(A.32) | ||||
(A.33) |
Now, notice that the maximum above is attained at for . More generally, the function maximizes on at for all , and
(A.34) |
by a standard Stirling lower bound (see, e.g., Lemma 1 of Minc & Sathre, (1964/65), or Theorem 2.2 of Batır, (2017) for a more precise bound). Using this in (A) shows that, for any given (we assumed that when ),
(A.35) |
Proof of Corollary 2.8.
By Theorem 2.7, we know that, for and ,
(A.38) |
By the mean value theorem, we have
(A.39) |
for an appropriate point , where recall
(A.40) |
It is easily verified that is decreasing for (use Remark 2.2 together with the fact that is the mode of the Gamma distribution with shape parameter and scale parameter ), so that, by choosing large enough and subsequently , we have
(A.41) |
From (A.41) and (A.38), we deduce
(A.42) |
Therefore, by applying on both sides, we get, for large enough,
(A.43) |
where solves and is the event that we defined in (2.21). This ends the proof. ∎
Proof of Proposition 3.2.
Throughout this proof, let . As in the proof of Theorem 2.7, the case can easily be treated separately using the Berry-Esseen theorem (in fact would be of the form in that case). Alternatively, we can prove our statement as we do below, for all and , only assuming further that when .
By the triangle inequality, we have
(A.44) | ||||
The terms , , , and are already bounded in the proof of Theorem 2.7. It remains to bound .
If , then an argument like (A) shows that and imply
(A.45) |
and and imply
(A.46) |
Therefore,
(A.47) |
By the triangle inequality, we can then split the above probability into three distinct terms:
(A.50) | ||||
(A.51) |
For the first term in (A), we can use the total variation bound in Proposition 2.6:
(A.52) |
Bounding the second term in (A) is the same as except that there are two ellipses instead of one, namely and . Therefore,
(A.53) |
For the third term in (A), we can use Equation (A.30), Remark 2.2 and the mean value theorem (only when ) to obtain
(A.54) | ||||
(A.55) |
Now, notice that the maximum above is attained at for . More generally, the function maximizes on at for all , and
(A.56) |
by a standard Stirling lower bound (see, e.g., Lemma 1 of Minc & Sathre, (1964/65), or Theorem 2.2 of Batır, (2017) for a more precise bound). Using this in (A) shows that, for any given (we assumed that when ),
(A.60) |
Proof of Theorem 3.3.
Equation (3.10) is a direct consequence of Proposition 3.2. It remains to show that the set is strictly convex.
Using the calculation in (2), we can write
(A.64) |
so that the Hessian matrix of is positive definite on . Indeed,
(A.65) |
Therefore, is strictly convex, and sublevel sets of the form
(A.66) |
are strictly convex for any . Since is invertible ( is clearly increasing), write the inverse as . The domain over which we optimize the objective function in (3.11) is the intersection of two strictly convex sets, namely
(A.67) |
which proves that is strictly convex. As pointed out in Remark 3.1, the fact that the first term in the intersection is plays a crucial role for to be strictly convex and thus for the uniqueness of the minimizer/maximizer of the objective function. ∎
Appendix B Moment estimates
In this section, we derive some moment estimates for the multinomial distribution. The lemma below is used in the proof of Proposition 2.6.
Lemma B.1.
Let and be given. If according to (2.2), then, for all ,
(B.1) | |||
(B.2) |
where . Moreover, for any Borel set and all , we have
(B.3) | |||
(B.4) |
Proof.
The result (B.1) is well-known and the result (B.2) is shown for example in Equation (A.29) of Ouimet, (2021b) or Equation (82) in Ouimet, (2021a). The result (B.3) is a direct consequence of (B.1) together with the Cauchy-Schwarz inequality and the fact that . Finally, to obtain (B.4), we use the fact that (see Lemma A.1 in Ouimet, (2021b) or Equation (79) in Ouimet, (2021a)) followed by an application of Hölder’s inequality and the bound (which follows from (B.2) and the assumption ):
(B.5) |
This ends the proof. ∎
R code
The R code that generated Figures 3.1 and 3.2 is available at
https://www.dropbox.com/s/tfwnnba642740xo/multinomial_method.R?dl=0.
Funding
F. Ouimet was supported by a CRM-Simons postdoctoral fellowship from the Centre de recherches mathématiques (Montréal, Canada) and the Simons Foundation.
References
- Batır, (2017) Batır, N. 2017. Bounds for the gamma function. Results Math., 72(1-2), 865–874. MR3684463.
- Carter, (2002) Carter, A. V. 2002. Deficiency distance between multinomial and multivariate normal experiments. Dedicated to the memory of Lucien Le Cam. Ann. Statist., 30(3), 708–730. MR1922539.
- Esseen, (1945) Esseen, C.-G. 1945. Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math., 77, 1–125. MR14626.
- Gaunt, (2021) Gaunt, R. E. 2021. Bounds for the chi-square approximation of the power divergence family of statistics. To appear in Journal of Applied Probability, 1–26. arXiv:2107.00535.
- Gaunt & Reinert, (2022) Gaunt, R. E., & Reinert, G. 2022. Bounds for the chi-square approximation of Friedman’s statistic by Stein’s method. Preprint, 1–38. arXiv:2111.00949.
- Gaunt et al., (2017) Gaunt, R. E., Pickett, A. M., & Reinert, G. 2017. Chi-square approximation by Stein’s method with application to Pearson’s statistic. Ann. Appl. Probab., 27(2), 720–756. MR3655852.
- Langford, (2005) Langford, J. 2005. Tutorial on practical prediction theory for classification. J. Mach. Learn. Res., 6, 273–306. MR2249822.
- Loader, (2002) Loader, C. 2002. Fast and accurate computation of binomial probabilities. Technical report, 11 pp. https://www.r-project.org/doc/reports/CLoader-dbinom-2002.pdf.
- Mann, (1997a) Mann, B. 1997a. Convergence rate for of a multinomial. Unpublished manuscript.
- Mann, (1997b) Mann, B. 1997b. Stein’s method for of a multinomial. Unpublished manuscript.
- Mason & Zhou, (2012) Mason, D. M., & Zhou, H. H. 2012. Quantile coupling inequalities and their applications. Probab. Surv., 9, 439–479. MR3007210.
- Minc & Sathre, (1964/65) Minc, H., & Sathre, L. 1964/65. Some inequalities involving . Proc. Edinburgh Math. Soc. (2), 14, 41–46. MR162751.
- Ouimet, (2021a) Ouimet, F. 2021a. General formulas for the central and non-central moments of the multinomial distribution. Stats, 4(1), 18–27. doi:10.3390/stats4010002.
- Ouimet, (2021b) Ouimet, F. 2021b. A precise local limit theorem for the multinomial distribution and some applications. J. Statist. Plann. Inference, 215, 218–233. MR4249129.
- Prokhorov & Ulyanov, (2013) Prokhorov, Y. V., & Ulyanov, V. V. 2013. Some approximation problems in statistics and probability. Pages 235–249 of: Limit theorems in probability, statistics and number theory. Springer Proc. Math. Stat., vol. 42. Springer, Heidelberg. MR3079145.
- Read, (1984) Read, T. R. C. 1984. Closer asymptotic approximations for the distributions of the power divergence goodness-of-fit statistics. Ann. Inst. Statist. Math., 36(1), 59–69. MR752006.
- Remmert, (1998) Remmert, R. 1998. Classical Topics in Complex Function Theory. Graduate Texts in Mathematics, vol. 172. Springer-Verlag, New York. MR1483074.
- Severini, (2005) Severini, T. A. 2005. Elements of Distribution Theory. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 17. Cambridge University Press, Cambridge. MR2168237.
- Siotani & Fujikoshi, (1984) Siotani, M., & Fujikoshi, Y. 1984. Asymptotic approximations for the distributions of multinomial goodness-of-fit statistics. Hiroshima Math. J., 14(1), 115–124. MR750392.
- Tanabe & Sagae, (1992) Tanabe, K., & Sagae, M. 1992. An exact Cholesky decomposition and the generalized inverse of the variance-covariance matrix of the multinomial distribution, with applications. J. Roy. Statist. Soc. Ser. B, 54(1), 211–219. MR1157720.
- Ulyanov & Zubov, (2009) Ulyanov, V. V., & Zubov, V. N. 2009. Refinement on the convergence of one family of goodness-of-fit statistics to chi-squared distribution. Hiroshima Math. J., 39(1), 133–161. MR2499200.