
Channel Coding for Gaussian Channels with
Mean and Variance Constraints

Adeel Mahmood and Aaron B. Wagner
School of Electrical and Computer Engineering, Cornell University
Abstract

We consider channel coding for Gaussian channels with the recently introduced mean and variance cost constraints. Through matching converse and achievability bounds, we characterize the optimal first- and second-order performance. The main technical contribution of this paper is an achievability scheme which uses random codewords drawn from a mixture of three uniform distributions on $(n-1)$-spheres of radii $R_1, R_2$ and $R_3$, where $R_i = O(\sqrt{n})$ and $|R_i - R_j| = O(1)$. To analyze such a mixture distribution, we prove a lemma giving a uniform $O(\log n)$ bound, which holds with high probability, on the log ratio of the output distributions $Q_i^{cc}$ and $Q_j^{cc}$, where $Q_i^{cc}$ is induced by a random channel input uniformly distributed on an $(n-1)$-sphere of radius $R_i$. To facilitate the application of the usual central limit theorem, we also give a uniform $O(\log n)$ bound, which holds with high probability, on the log ratio of the output distributions $Q_i^{cc}$ and $Q_i^*$, where $Q_i^*$ is induced by a random channel input with i.i.d. components.

Index Terms:
Channel coding, Gaussian channels, second-order coding rate, random coding, mixture distribution

I Introduction

The two common forms of cost (or power) constraints in channel coding have been the maximal cost constraint, specified by $c(\mathbf{X}) \leq \Gamma$ almost surely, and the expected cost constraint, specified by $\mathbb{E}\left[c(\mathbf{X})\right] \leq \Gamma$, where $\mathbf{X} \in \mathbb{R}^n$ is a random channel input vector and $c(\cdot)$ is an additively separable cost function defined as

\[ c(\mathbf{X}) := \frac{1}{n}\sum_{i=1}^n c(X_i). \]

Recent works [1] and [2] introduced the mean and variance (m.v.) cost constraint, specified by

\[
\begin{aligned}
\mathbb{E}\left[c(\mathbf{X})\right] &\leq \Gamma\\
\text{Var}\left(c(\mathbf{X})\right) &\leq \frac{V}{n},
\end{aligned}
\]

for discrete memoryless channels (DMCs). With a variance parameter $V$, the mean and variance (m.v.) cost constraint generalizes the two existing frameworks in the sense that $V \to 0$ recovers the first- and second-order coding performance of the maximal cost constraint [1] and $V \to \infty$ recovers the expected cost constraint. Beyond generalization, the m.v. cost constraint for $0 < V < \infty$ is shown to have practical advantages over both prior cost models. Unlike the maximal cost constraint, it allows for improved second-order coding performance with feedback [1, Theorem 3], [2, Theorem 2]; even without feedback, the coding performance under the m.v. cost constraint is superior [2, Theorem 1]. In particular, for DMCs with a unique capacity-cost-achieving distribution, second-order coding performance improvement via feedback is possible if and only if $V > 0$. Unlike the expected cost constraint, the m.v. cost constraint enforces a controlled, ergodic use of transmission power [2, (5)]. This is essential for several practical reasons, such as operating circuitry in the linear regime, minimizing power consumption, and reducing interference with other terminals. The m.v. cost constraint allows the power to fluctuate above the threshold in a manner consistent with a noise process, thus making it a more realistic and natural cost model in practice than the restrictive maximal cost constraint.
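As a quick numerical illustration (with hypothetical parameters, not taken from the paper), an i.i.d. Gaussian input at power $\Gamma$ exhibits exactly this noise-like fluctuation: $c(\mathbf{X})$ has mean $\Gamma$ and variance $2\Gamma^2/n$, so it satisfies the m.v. constraint for any $V \geq 2\Gamma^2$.

```python
import numpy as np

# Hypothetical parameters for illustration; not from the paper.
rng = np.random.default_rng(0)
n, Gamma, trials = 200, 2.0, 20_000

# i.i.d. N(0, Gamma) input: E[c(X)] = Gamma, Var(c(X)) = 2*Gamma^2/n,
# so the m.v. constraint holds with any V >= 2*Gamma^2.
X = rng.normal(0.0, np.sqrt(Gamma), size=(trials, n))
cost = (X**2).mean(axis=1)              # c(X) for each sampled codeword

mean_cost, var_cost = cost.mean(), cost.var()
assert abs(mean_cost - Gamma) < 0.05
assert abs(var_cost - 2 * Gamma**2 / n) < 0.01
```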

The aforementioned performance improvements under the m.v. cost constraint were shown for DMCs only, although the ergodic behavior enforced by the m.v. cost constraint holds in general. In this paper, we investigate additive white Gaussian noise (AWGN) channels under the mean and variance cost constraint with the cost function $c(x) = x^2$. Specifically, we characterize the optimal first- and second-order coding rates (SOCR) under the m.v. cost constraint for AWGN channels subject to a non-vanishing average probability of error $\epsilon > 0$. Simultaneously, we also characterize the optimal average error probability as a function of the second-order rate with the baseline first-order rate fixed at the capacity-cost function [3, (9.17)]. The latter is motivated by the fact, which itself follows from our results, that the strong converse [3, p. 208] holds under the m.v. cost constraint. This is an interesting finding since the strong converse does not hold for AWGN channels subject to an expectation-only constraint [4, Theorem 77]. However, if one considers the maximal probability of error instead of the average probability of error, then the strong converse holds for both the m.v. cost constraint and the expectation-only constraint [5, (18)]. More results on AWGN channels can be found in [6] and [7]. In this paper, we focus only on the average error probability and the non-feedback framework.

Our achievability proof relies on a random coding scheme where the channel input has a distribution supported on at most three concentric spheres of radii $R_1$, $R_2$ and $R_3$, where $R_i = O(\sqrt{n})$ and $|R_i - R_j| = O(1)$. Each uniform distribution on a sphere induces a distribution $Q_i^{cc}$ on the channel output, and a mixture of such uniform input distributions induces a mixture of $Q_1^{cc}, Q_2^{cc}$ and $Q_3^{cc}$ on the output. For $\mathbf{Y} \sim Q_i^{cc}$, we have $\mathbb{E}[\|\mathbf{Y}\|^2] = O(n)$, and $\|\mathbf{Y}\|^2$ concentrates in probability over a set encompassing $\pm\sqrt{n}$ deviations around its mean. Furthermore, the probability density function $Q_i^{cc}$ is given in terms of the modified Bessel function of the first kind. To assist in the analysis of the mixture distribution, we use a uniform asymptotic expansion of the Bessel function followed by standard series expansions to bound the log ratio of $Q_i^{cc}$ and $Q_j^{cc}$. Remarkably, the zeroth- and first-order terms, which are $O(n)$ and $O(\sqrt{n})$ respectively, cancel out, giving us an $O(\log n)$ bound. This $O(\log n)$ bound holds uniformly over a set on which $\|\mathbf{Y}\|^2$ concentrates. Moreover, to facilitate the application of the central limit theorem, we give a similar $O(\log n)$ bound on the log ratio of $Q_i^{cc}$ and $Q^*$, where $Q^*$ is the output distribution induced by an i.i.d. channel input. The discussion of this paragraph is formalized in Lemmas 5, 6 and 7 in Section III. The achievability theorem and its proof are also given in Section III.

For the proof of the matching converse result, the main technical component involves obtaining convergence of the distribution of the normalized sum of information densities to a standard Gaussian CDF. Let $\mathbf{Y} = (Y_1, \ldots, Y_n)$ denote the random channel output when the input to the AWGN channel, denoted by $W$, is some fixed vector $\mathbf{x} = (x_1, \ldots, x_n)$. The sum of the information densities is given by

\[ \sum_{i=1}^n \log\frac{W(Y_i|x_i)}{Q^*(Y_i)}. \tag{1} \]

A well-known observation is that the distribution of (1) depends on $\mathbf{x}$ only through its average power $c(\mathbf{x}) = \|\mathbf{x}\|^2/n$ (see Lemma 2). Unlike under the maximal cost constraint, $c(\mathbf{x})$ is not uniformly bounded in the mean and variance cost constraint framework. Hence, we prove that the normalized sum converges in distribution to the standard Gaussian uniformly over typical vectors $\mathbf{x} \in \mathbb{R}^n$. For atypical vectors, we use standard concentration arguments. Section IV is devoted to the converse theorem and its proof.

II Preliminaries

We write $\mathbf{x} = (x_1, \ldots, x_n)$ to denote a vector and $\mathbf{X} = (X_1, \ldots, X_n)$ to denote a random vector in $\mathbb{R}^n$. For any $\mu \in \mathbb{R}$ and $\sigma^2 > 0$, let $\mathcal{N}(\mu, \sigma^2)$ denote the Gaussian distribution with mean $\mu$ and variance $\sigma^2$. Let $\Phi$ denote the standard Gaussian CDF. Let $\chi^2_n(\lambda)$ denote the noncentral chi-squared distribution with $n$ degrees of freedom and noncentrality parameter $\lambda$. If two random variables $X$ and $Y$ have the same distribution, we write $X \stackrel{d}{=} Y$. The modified Bessel function of the first kind of order $\nu$ is denoted by $I_\nu(x)$. We will write $\log$ to denote the logarithm to base $e$ and $\exp(x)$ to denote $e$ to the power of $x$. Define $S^{n-1}_R$ to be the $(n-1)$-sphere of radius $R$ centered at the origin, i.e., $S^{n-1}_R := \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| = R\}$.

Let $\mathcal{P}(\mathbb{R}^n)$ denote the set of all probability distributions over $\mathbb{R}^n$. If $P \in \mathcal{P}(\mathbb{R}^n)$ is an $n$-fold product distribution induced by some $P' \in \mathcal{P}(\mathbb{R})$, then we write

\[ P(\mathbf{x}) = P'(\mathbf{x}) = \prod_{i=1}^n P'(x_i) \]

with some abuse of notation.

The additive white Gaussian noise (AWGN) channel models the relationship between the channel input $\mathbf{X}$ and output $\mathbf{Y}$ over $n$ channel uses as $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, where $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, N \cdot I_n)$ represents independent and identically distributed (i.i.d.) Gaussian noise with variance $N > 0$. The noise vector $\mathbf{Z}$ is independent of the input $\mathbf{X}$. For a single time step, we use $W$ to denote the conditional probability distribution associated with the channel:

\[ W(\cdot|x) := \mathcal{N}(x, N). \]

Let the cost function $c : \mathbb{R} \to [0, \infty)$ be given by $c(x) = x^2$. For a channel input sequence $\mathbf{x} \in \mathbb{R}^n$,

\[ c(\mathbf{x}) = \frac{1}{n}\sum_{i=1}^n c(x_i) = \frac{\|\mathbf{x}\|^2}{n}. \]

For $\Gamma > 0$, the capacity-cost function is defined as

\[ C(\Gamma) = \max_{P :\, \mathbb{E}_P[c(X)] \leq \Gamma} I(P, W), \tag{2} \]

where

\[ \mathbb{E}_P\left[c(X)\right] = \int_{\mathbb{R}} x^2\, dP(x). \]
Definition 1

We use $P^*$ to denote the capacity-cost-achieving distribution in (2) and $Q^* = P^*W$ to denote the induced output distribution. We define

\[
\begin{aligned}
\nu_x &:= \text{Var}\left(\log\frac{W(Y|x)}{Q^*(Y)}\right), \quad \text{where } Y \sim W(\cdot|x),\\
V(\Gamma) &:= \int_{\mathbb{R}} \nu_x\, P^*(x)\, dx.
\end{aligned}
\]
Lemma 1

We have $P^* = \mathcal{N}(0, \Gamma)$ and $Q^* = \mathcal{N}(0, \Gamma + N)$. Thus, the capacity-cost function is given by

\[ C(\Gamma) = \frac{1}{2}\log\left(1 + \frac{\Gamma}{N}\right). \]

Furthermore, for any $x \in \mathbb{R}$ and $Y \sim W(\cdot|x)$,

\begin{align}
\mathbb{E}\left[\log\frac{W(Y|x)}{Q^*(Y)}\right] &= C(\Gamma) - C'(\Gamma)\left(\Gamma - c(x)\right) \tag{3}\\
C'(\Gamma) &= \frac{1}{2(\Gamma+N)} \notag\\
\nu_x &= \frac{\Gamma^2 + 2x^2 N}{2\left(N+\Gamma\right)^2} \notag\\
V(\Gamma) &= \frac{\Gamma^2 + 2\Gamma N}{2\left(N+\Gamma\right)^2}. \tag{4}
\end{align}

Proof: The fact that $P^* = \mathcal{N}(0, \Gamma)$ can be found in [3, (9.17)]. The remaining content of Lemma 1 follows from elementary calculus.
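The closed forms in Lemma 1 are easy to sanity-check numerically. The sketch below (illustrative values $\Gamma = 2$, $N = 1$, not from the paper) verifies that averaging $\nu_x$ over $P^* = \mathcal{N}(0, \Gamma)$ recovers $V(\Gamma)$ from (4):

```python
import numpy as np

Gamma, N = 2.0, 1.0                      # illustrative values

C = 0.5 * np.log(1 + Gamma / N)          # C(Gamma)
Cprime = 1 / (2 * (Gamma + N))           # C'(Gamma)
V_G = (Gamma**2 + 2*Gamma*N) / (2 * (Gamma + N)**2)   # V(Gamma), eq. (4)

def nu(x):
    # nu_x from Lemma 1
    return (Gamma**2 + 2 * x**2 * N) / (2 * (N + Gamma)**2)

# Average nu_x over P* = N(0, Gamma); since E[x^2] = Gamma, this is V(Gamma).
rng = np.random.default_rng(1)
xs = rng.normal(0.0, np.sqrt(Gamma), size=1_000_000)
assert abs(nu(xs).mean() - V_G) < 2e-3
assert abs(C - 0.5 * np.log(3)) < 1e-12
```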

Lemma 2 (Spherical Symmetry)

Let $\mathbf{x} \in \mathbb{R}^n$ be the channel input. Let $Y_i = x_i + Z_i$. Define

\[ T_i := \log\frac{W(Y_i|x_i)}{Q^*(Y_i)} - \mathbb{E}\left[\log\frac{W(Y_i|x_i)}{Q^*(Y_i)}\right]. \]

Then

\[ \sum_{i=1}^n T_i \stackrel{d}{=} -\frac{\Gamma}{2(\Gamma+N)}\Lambda + \frac{Nnc(\mathbf{x})}{2\Gamma(\Gamma+N)} + \frac{n\Gamma}{2(\Gamma+N)}, \]

where $\Lambda \sim \chi^2_n\left(\frac{Nnc(\mathbf{x})}{\Gamma^2}\right)$. Hence, the distribution of $\sum_{i=1}^n T_i$ depends on $\mathbf{x}$ only through its cost $c(\mathbf{x})$, and we can write

\[ \sum_{i=1}^n T_i \stackrel{d}{=} \sum_{i=1}^n \tilde{T}_i \tag{5} \]

where the $\tilde{T}_i$'s are i.i.d. and

\[ \tilde{T}_i \stackrel{d}{=} \log\frac{W(Y|\sqrt{c(\mathbf{x})})}{Q^*(Y)} - \mathbb{E}\left[\log\frac{W(Y|\sqrt{c(\mathbf{x})})}{Q^*(Y)}\right], \tag{6} \]

where $Y \sim \mathcal{N}(\sqrt{c(\mathbf{x})}, N)$ in (6).

Proof: The observation of spherical symmetry with respect to the channel input is standard in most works on AWGN channels (see, e.g., [8]). For convenience and completeness, we have given a proof of this lemma in Appendix A.

Corollary 1

With the same setup as in Lemma 2, we have

\[ \text{Var}\left(\sum_{i=1}^n T_i\right) = \frac{n\Gamma^2 + 2Nnc(\mathbf{x})}{2(\Gamma+N)^2}. \]

Proof: Use the equality in distribution given in (5) and Lemma 1.
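Corollary 1 can also be checked by simulation. The following sketch (hypothetical values $\Gamma = 2$, $N = 1$, $n = 50$) compares the empirical variance of $\sum_i T_i$ for a fixed input vector against the closed form:

```python
import numpy as np

# Monte Carlo check of Corollary 1 (illustrative: Gamma=2, N=1, n=50).
rng = np.random.default_rng(2)
Gamma, N, n, trials = 2.0, 1.0, 50, 50_000

x = rng.normal(0.0, 1.3, size=n)         # an arbitrary fixed input vector
c_x = (x**2).mean()                      # its cost c(x)

Z = rng.normal(0.0, np.sqrt(N), size=(trials, n))
Y = x + Z
# per-coordinate information density log W(Y_i|x_i)/Q*(Y_i)
info = (-(Y - x)**2 / (2*N) + Y**2 / (2*(Gamma + N))
        + 0.5 * np.log((Gamma + N) / N))
S = info.sum(axis=1)                     # centering does not change variance

var_pred = (n*Gamma**2 + 2*N*n*c_x) / (2 * (Gamma + N)**2)
assert abs(S.var() / var_pred - 1) < 0.05
```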

With a blocklength $n$ and a fixed rate $R > 0$, let $\mathcal{M}_R = \{1, \ldots, \lceil \exp(nR) \rceil\}$ denote the message set. Let $M \in \mathcal{M}_R$ denote the random message drawn uniformly from the message set.

Definition 2

An $(n, R)$ code for an AWGN channel consists of an encoder $f$ which, for each message $m \in \mathcal{M}_R$, chooses an input $\mathbf{X} = f(m) \in \mathbb{R}^n$, and a decoder $g$ which maps the output $\mathbf{Y}$ to $\hat{m} \in \mathcal{M}_R$. The code $(f, g)$ is random if $f$ or $g$ is random.

Definition 3

An $(n, R, \Gamma, V)$ code for an AWGN channel is an $(n, R)$ code such that $\mathbb{E}\left[c(\mathbf{X})\right] \leq \Gamma$ and $\text{Var}\left(c(\mathbf{X})\right) \leq V/n$, where the message $M \sim \text{Unif}(\mathcal{M}_R)$ has a uniform distribution over the message set $\mathcal{M}_R$.

Given $\epsilon \in (0, 1)$, define

\[ M^*(n, \epsilon, \Gamma, V) := \max\left\{\lceil \exp(nR) \rceil : \bar{P}_{\text{e}}(n, R, \Gamma, V) \leq \epsilon\right\}, \]

where $\bar{P}_{\text{e}}(n, R, \Gamma, V)$ denotes the minimum average error probability attainable by any random $(n, R, \Gamma, V)$ code.

Definition 4

Define the function $\mathcal{K} : \mathbb{R} \times [0, \infty) \to (0, 1)$ as

\[ \mathcal{K}\left(r, V\right) = \inf_{\substack{\Pi :\, \mathbb{E}[\Pi] = r\\ \text{Var}(\Pi) \leq V}} \mathbb{E}\left[\Phi(\Pi)\right]. \tag{7} \]
Lemma 3

The $\mathcal{K}$ function has the following properties:

  1. The infimum in (7) is a minimum, and there exists a minimizer which is a discrete probability distribution with at most 3 point masses;
  2. $\mathcal{K}(r, V)$ is (jointly) continuous in $(r, V)$;
  3. for any fixed $V$, $\mathcal{K}(r, V)$ is strictly increasing in $r$;
  4. for any $V$ and $0 < \epsilon < 1$, there always exists a unique $r^*$ satisfying $\mathcal{K}(r^*, V) = \epsilon$;
  5. for all $r \in \mathbb{R}$ and $V > 0$, the minimizing distribution in (7) has at least two point masses.

Proof: See [9, Lemmas 3 and 4] for the first four properties and [2, Theorem 1] for the fifth property above.
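Property 1 suggests a simple way to approximate $\mathcal{K}$ numerically: search over discrete distributions with few point masses. The sketch below (an illustrative two-point grid search, so it only upper-bounds the infimum; not the paper's method) also exhibits property 5: for $V > 0$, a nondegenerate $\Pi$ beats the point mass at $r$.

```python
import math

def Phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def K_upper(r, V, m=241):
    """Upper bound on K(r, V) via two-point distributions with a < r < b."""
    grid = [-6 + 12 * k / (m - 1) for k in range(m)]
    best = Phi(r)                        # point mass at r (variance 0)
    for a in grid:
        for b in grid:
            if not (a < r < b):
                continue
            p = (b - r) / (b - a)        # weight on a so that E[Pi] = r
            var = p * (a - r)**2 + (1 - p) * (b - r)**2
            if var <= V:
                best = min(best, p * Phi(a) + (1 - p) * Phi(b))
    return best

assert K_upper(0.0, 0.0) == Phi(0.0)     # V = 0 forces the point mass at r
assert K_upper(0.0, 4.0) < Phi(0.0)      # V > 0 strictly helps (property 5)
```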

III Achievability Result

The design of random channel codes $(f, g)$ can be directly related to the design of the distribution $P \in \mathcal{P}(\mathbb{R}^n)$ of the codewords. Specifically, for each message $m \in \mathcal{M}_R$, the channel input $\mathbf{X}$ is chosen randomly according to $P$. Given an observed $\mathbf{y}$ at the decoder $g$, the decoder selects the message $m$ with the lowest index that achieves the maximum over $m$ of

\[ \prod_{i=1}^n W\left(y_i \,\big|\, f(m)_i\right). \]
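For the AWGN channel, maximizing this product over messages is equivalent to minimum-Euclidean-distance decoding, with ties broken toward the lowest index. A minimal sketch with a hypothetical random codebook (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, M, N = 100, 16, 0.01                  # hypothetical sizes; low noise

codebook = rng.normal(0.0, 1.0, size=(M, n))   # f(m) for m = 0..M-1

def decode(y):
    # max_m prod_i W(y_i|f(m)_i)  <=>  min_m ||y - f(m)||^2 for Gaussian W;
    # np.argmin returns the lowest index among ties.
    d2 = ((codebook - y)**2).sum(axis=1)
    return int(np.argmin(d2))

m = 7
y = codebook[m] + rng.normal(0.0, np.sqrt(N), size=n)
assert decode(y) == m
```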

Any distribution $P \in \mathcal{P}(\mathbb{R}^n)$ can be used to construct an $(n, R)$ channel code using the aforementioned construction. Moreover, any distribution $P$ satisfying

\[
\begin{aligned}
\mathbb{E}\left[c(\mathbf{X})\right] &\leq \Gamma\\
\text{Var}\left(c(\mathbf{X})\right) &\leq \frac{V}{n}
\end{aligned} \tag{8}
\]

can be used to construct a random $(n, R, \Gamma, V)$ code.

Given a random $(n, R, \Gamma, V)$ code based on an input distribution $P$, the following lemma gives an upper bound on the average error probability of the code in terms of the distribution of $(\mathbf{X}, \mathbf{Y})$ induced by $P$ and the channel transition probability $W$.

Lemma 4

Consider an AWGN channel $W$ with noise variance $N > 0$ and cost constraint $(\Gamma, V) \in (0, \infty) \times [0, \infty)$. For any distribution $P \in \mathcal{P}(\mathbb{R}^n)$ satisfying (8), and any $n$, $\theta$ and $R$,

\[ \bar{P}_{\text{e}}(n, R, \Gamma, V) \leq (P \circ W)\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq R + \theta\right) + e^{-n\theta}, \tag{9} \]

where $(\mathbf{X}, \mathbf{Y})$ have the joint distribution specified by

\[ (P \circ W)(\mathbf{x}, \mathbf{y}) = P(\mathbf{x})\prod_{k=1}^n W(y_k|x_k), \]

and $PW$ denotes the marginal distribution of $\mathbf{Y}$. Furthermore, if for some $\alpha$ and $\epsilon$,

\[ \limsup_{n\to\infty}\, (P \circ W)\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq C(\Gamma) + \frac{\alpha}{\sqrt{n}}\right) < \epsilon, \tag{10} \]

then the distribution $P$ gives rise to an achievable second-order coding rate (SOCR) of $\alpha$, i.e.,

\[ \liminf_{n\to\infty}\frac{\log M^*(n, \epsilon, \Gamma, V) - nC(\Gamma)}{\sqrt{n}} \geq \alpha. \tag{11} \]

Proof: The proof can be adapted from the proof of [10, Lemma 14] by (i) replacing controllers with distributions $P$ satisfying (8) and (ii) replacing sums with integrals.

Lemma 4 is the starting point for proving our achievability result. Accordingly, a central quantity of interest in the proof is

\[ \log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})}. \tag{12} \]

As discussed in the introduction, Lemmas 5, 6 and 7 are helpful in the analysis of (12), where $P$ is a mixture distribution. Specifically, the lemmas are helpful in approximating the output distribution $PW$ in terms of a simpler distribution.

Lemma 5

Consider a random vector $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, where $\mathbf{X}$ and $\mathbf{Z}$ are independent, $\mathbf{X}$ is uniformly distributed on an $(n-1)$-sphere of radius $R$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, NI_n)$. Let $Q^{cc}$ denote the PDF of $\mathbf{Y}$. Then

\[ Q^{cc}(\mathbf{y}) = \frac{\Gamma\left(\frac{n}{2}\right)}{2(\pi N)^{n/2}} \cdot \exp\left(-\frac{R^2 + \|\mathbf{y}\|^2}{2N}\right)\left(\frac{N}{R\|\mathbf{y}\|}\right)^{\frac{n}{2}-1} I_{\frac{n}{2}-1}\left(\frac{R\|\mathbf{y}\|}{N}\right). \]

Proof: The proof is given in Appendix B.
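For moderate $n$, the density in Lemma 5 is convenient to evaluate in the log domain via the exponentially scaled Bessel function ($\log I_\nu(z) = \log \mathrm{ive}(\nu, z) + z$). The sketch below (illustrative $n = 64$, $\Gamma = 2$, $N = 1$, not values from the paper) checks the formula by integrating the induced radial density of $\|\mathbf{Y}\|$ to 1:

```python
import numpy as np
from scipy.special import gammaln, ive
from scipy.integrate import quad

n, Gamma, N = 64, 2.0, 1.0               # illustrative parameters
R = np.sqrt(n * Gamma)
nu = n / 2 - 1

def log_Qcc(t):
    """log Q^cc(y) for any y with ||y|| = t (Q^cc is spherically symmetric)."""
    z = R * t / N
    return (gammaln(n/2) - np.log(2) - (n/2) * np.log(np.pi * N)
            - (R**2 + t**2) / (2*N) + nu * np.log(N / (R*t))
            + np.log(ive(nu, z)) + z)    # log I_nu(z) = log ive(nu, z) + z

def log_area(t):
    """log surface area of the sphere of radius t in R^n."""
    return np.log(2) + (n/2) * np.log(np.pi) + (n-1) * np.log(t) - gammaln(n/2)

# ||Y|| concentrates near sqrt(n*(Gamma+N)) ~ 13.9; [5, 30] carries ~all mass.
total, _ = quad(lambda t: np.exp(log_Qcc(t) + log_area(t)), 5, 30)
assert abs(total - 1) < 1e-4
```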

To state the next two lemmas in a succinct way, we need to introduce some notation.

Definition 5 (Multi-parameter and multi-variable big-$O$ notation)

Let $f, f_1, \ldots, f_m$ be functions of $\epsilon, \Delta$ and $n$. We write

\[ f(\epsilon, \Delta, n) = O\left(f_1(\epsilon, \Delta, n), \ldots, f_m(\epsilon, \Delta, n)\right) \]

if there exist positive constants $\epsilon_0, \Delta_0, n_0$ and $C_1, \ldots, C_m$ such that for all $n \geq n_0$, $|\epsilon| \leq \epsilon_0$ and $|\Delta| \leq \Delta_0$,

\[ |f(\epsilon, \Delta, n)| \leq \sum_{i=1}^m C_i |f_i(\epsilon, \Delta, n)|. \]
Lemma 6

Consider a random vector $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, where $\mathbf{X}$ and $\mathbf{Z}$ are independent, $\mathbf{X}$ is uniformly distributed on an $(n-1)$-sphere of radius $\sqrt{n\Gamma}$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, NI_n)$. Let $Q^{cc}$ denote the PDF of $\mathbf{Y}$. Consider another random vector $\mathbf{Y}' = \mathbf{X}' + \mathbf{Z}'$, where $\mathbf{X}'$ and $\mathbf{Z}'$ are independent, $\mathbf{X}'$ is uniformly distributed on an $(n-1)$-sphere of radius $\sqrt{n\Gamma'}$ and $\mathbf{Z}' \sim \mathcal{N}(\mathbf{0}, NI_n)$. Let $Q^{cc}_0$ denote the PDF of $\mathbf{Y}'$. Let $\Gamma' = \Gamma + \epsilon$. Then

\[ \sup_{\mathbf{y}\in\mathcal{P}_n^*}\left|\log\frac{Q^{cc}_0(\mathbf{y})}{Q^{cc}(\mathbf{y})}\right| = O\left(\log n,\, n\epsilon^2,\, n\Delta^2,\, n\epsilon\Delta\right), \]

where $\mathcal{P}_n^* = \left\{\mathbf{y} \in \mathbb{R}^n : \Gamma + N - \Delta \leq \frac{\|\mathbf{y}\|^2}{n} \leq \Gamma + N + \Delta\right\}$ and $\Delta \geq 0$.

Remark 1

The parameters $\epsilon$ and $\Delta$ in Lemma 6 may depend on $n$.

Proof: The proof of Lemma 6 is given in Appendix C.

Lemma 7

Consider a random vector $\mathbf{Y} = \mathbf{X} + \mathbf{Z}$, where $\mathbf{X}$ and $\mathbf{Z}$ are independent, $\mathbf{X}$ is uniformly distributed on an $(n-1)$-sphere of radius $\sqrt{n\Gamma}$ and $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, NI_n)$. Let $Q^{cc}$ denote the PDF of $\mathbf{Y}$ and let $Q^* = \mathcal{N}(\mathbf{0}, (\Gamma+N)I_n)$. Then

\[ \sup_{\mathbf{y}\in\mathcal{P}_n^*}\left|\log\frac{Q^{cc}(\mathbf{y})}{Q^*(\mathbf{y})}\right| = O\left(\log n,\, n\Delta^2\right), \]

where $\mathcal{P}_n^* = \left\{\mathbf{y} \in \mathbb{R}^n : \Gamma + N - \Delta \leq \frac{\|\mathbf{y}\|^2}{n} \leq \Gamma + N + \Delta\right\}$ and $\Delta \geq 0$.

Remark 2

The parameter $\Delta$ in Lemma 7 may depend on $n$. In fact, in the proof of Theorem 1, we invoke Lemma 7 with $\Delta = \sqrt{\frac{\log n}{n}}$.

Proof: The proof of Lemma 7 is given in Appendix D.

Remark 3

The set $\mathcal{P}_n^*$ in both Lemmas 6 and 7 is a "high probability" set in the following sense. For $\Delta = \omega(n^{-1/2})$, we have $\mathbb{P}\left(\mathbf{Y} \in \mathcal{P}_n^*\right) \to 1$ as $n \to \infty$ if $\mathbb{E}\left[\|\mathbf{Y}\|^2/n\right] = \Gamma + N$ and $\text{Var}(\|\mathbf{Y}\|^2) = O(n)$. The latter two conditions hold when $\mathbf{Y} \sim Q^*$ or $\mathbf{Y} \sim Q^{cc}$. This property is used multiple times in the proof of Theorem 1.
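This concentration is easy to see in simulation. The sketch below (illustrative $\Gamma = 2$, $N = 1$, with $\Delta = \sqrt{\log n / n}$, which is $\omega(n^{-1/2})$) estimates $\mathbb{P}(\mathbf{Y} \in \mathcal{P}_n^*)$ under $Q^{cc}$ and confirms it grows with $n$:

```python
import numpy as np

rng = np.random.default_rng(3)
Gamma, N, trials = 2.0, 1.0, 4_000       # illustrative parameters

def prob_in_set(n):
    # X uniform on the sphere of radius sqrt(n*Gamma): normalize Gaussians.
    X = rng.normal(size=(trials, n))
    X *= np.sqrt(n * Gamma) / np.linalg.norm(X, axis=1, keepdims=True)
    Y = X + rng.normal(0.0, np.sqrt(N), size=(trials, n))
    delta = np.sqrt(np.log(n) / n)
    return np.mean(np.abs((Y**2).sum(axis=1) / n - (Gamma + N)) <= delta)

p_small, p_large = prob_in_set(50), prob_in_set(1000)
assert p_large > p_small + 0.05          # probability grows with n
```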

Remark 4

See also [8, Theorem 42] for a result related to Lemma 7.

Theorem 1

Fix an arbitrary $\epsilon \in (0, 1)$. Consider an AWGN channel with noise variance $N > 0$. Under the mean and variance cost constraint specified by the pair $(\Gamma, V) \in (0, \infty) \times [0, \infty)$, we have

\[ \liminf_{n\to\infty}\, \frac{\log M^*(n, \epsilon, \Gamma, V) - nC(\Gamma)}{\sqrt{n}} \geq \max\left\{r \in \mathbb{R} : \mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}}, \frac{C'(\Gamma)^2 V}{V(\Gamma)}\right) \leq \epsilon\right\}. \]

Alternatively, for any second-order coding rate $r \in \mathbb{R}$,

\[ \limsup_{n\to\infty}\bar{P}_{\text{e}}\left(n, C(\Gamma) + \frac{r}{\sqrt{n}}, \Gamma, V\right) \leq \mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}}, \frac{C'(\Gamma)^2 V}{V(\Gamma)}\right). \]
Remark 5

When $V = 0$, Theorem 1 recovers the optimal second-order coding rate corresponding to the maximal cost constraint $\|\mathbf{X}\| \leq \sqrt{n\Gamma}$, as given in [11, Theorem 5]. In this special case, the achievability scheme in Theorem 1 simplifies to taking the channel input to be uniformly distributed on a single $(n-1)$-sphere of radius $\sqrt{n\Gamma}$. Thus, for $V = 0$, the proof of Theorem 1 gives an alternative and more direct proof technique than that of [11, Theorem 5], which relies on randomly selecting the channel input from sequences of a fixed $n$-type $P^{(n)}$ over a finite alphabet of size $n^{1/4}$ and an implicit assumption on the convergence rate of $I(P^{(n)}, W)$ to $C(\Gamma)$ [11, p. 4963].

Remark 6

When $V > 0$, the achievability scheme involves the random channel input having a mixture distribution of two or three uniform distributions on $(n-1)$-spheres.

Let $r(\epsilon, \Gamma, V)$ denote the achievable SOCR for the mean and variance cost constraint in Theorem 1, which is also the optimal SOCR, as shown later in Theorem 2. As remarked earlier, $r(\epsilon, \Gamma, 0) = \sqrt{V(\Gamma)}\Phi^{-1}(\epsilon)$ is the optimal SOCR for the maximal cost constraint. Fig. 1 plots the SOCR against the average error probability for a Gaussian channel with $\Gamma = 2$ and $N = 1$, showing improved SOCR for several values of $V$. In fact, [2, Theorem 1] shows that $r(\epsilon, \Gamma, V) > \sqrt{V(\Gamma)}\Phi^{-1}(\epsilon)$ for all $V > 0$.

Figure 1: The SOCR is compared between the almost-sure cost constraint and the mean and variance cost constraints for different values of $V$. The plots for the mean and variance cost constraints are technically lower bounds to the SOCR since they are obtained through a non-exhaustive search of the feasible region in the maximization and minimization in the formulation of $r(\epsilon, \Gamma, V)$.
Proof:

In view of Lemma 3, consider any distribution $P_\Pi$ which achieves the minimum in

\[ \mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}}, \frac{C'(\Gamma)^2 V}{V(\Gamma)}\right). \]

Let $\Pi \sim P_\Pi$, where we write

\[ P_\Pi(\pi) = \begin{cases} p_1 & \pi = \pi_1\\ p_2 & \pi = \pi_2\\ p_3 & \pi = \pi_3. \end{cases} \]

Recall that $\mathbb{E}[\Pi] = \frac{r}{\sqrt{V(\Gamma)}}$ and $\text{Var}(\Pi) \leq \frac{C'(\Gamma)^2 V}{V(\Gamma)}$. For each $j \in \{1, 2, 3\}$, let

\[ \Gamma_j = \Gamma - \frac{\sqrt{V(\Gamma)}}{C'(\Gamma)\sqrt{n}}\pi_j + \frac{r}{C'(\Gamma)\sqrt{n}}. \tag{13} \]

We assume $n$ sufficiently large that $\Gamma_j > 0$ for all $j \in \{1, 2, 3\}$. Let $P_j^*$ be the capacity-cost-achieving input distribution for $C(\Gamma_j)$ and $Q_j^*$ be the corresponding optimal output distribution. Thus, $P_j^* = \mathcal{N}(0, \Gamma_j)$ and $Q_j^* = \mathcal{N}(0, \Gamma_j + N)$. Let $Q_j^{cc}$ be the output distribution induced by the input distribution $\text{Unif}(S^{n-1}_{R_j})$ with $R_j = \sqrt{n\Gamma_j}$.

Achievability Scheme: Let the random channel input $\mathbf{X}$ be such that with probability $p_j$, $\mathbf{X} \sim \text{Unif}(S^{n-1}_{R_j})$. Denoting the distribution of $\mathbf{X}$ by $P$, we can write

\[ P = \sum_{j=1}^3 p_j \cdot \text{Unif}(S^{n-1}_{R_j}). \]

The output distribution of $\mathbf{Y}$ induced by $P \circ W$ is

\[ PW(\mathbf{y}) = \sum_{j=1}^3 p_j Q_j^{cc}(\mathbf{y}). \]

We define

\[ \mathcal{E}_n := (P \circ W)\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}}\right) \]

and show that $\limsup_{n\to\infty}\mathcal{E}_n < \epsilon$ which, by Lemma 4, would show that the random coding scheme achieves a second-order coding rate of $r$.
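Before the analysis, one can verify directly that this mixture meets the m.v. cost constraint: on sphere $j$ the cost is exactly $\Gamma_j$, so $\mathbb{E}[c(\mathbf{X})] = \sum_j p_j \Gamma_j = \Gamma$ (using $\mathbb{E}[\Pi] = r/\sqrt{V(\Gamma)}$) and $\text{Var}(c(\mathbf{X})) = \frac{V(\Gamma)}{C'(\Gamma)^2 n}\text{Var}(\Pi) \leq V/n$. A numerical sketch with a hypothetical two-point $P_\Pi$ (illustrative parameters, not from the paper):

```python
import numpy as np

# Hypothetical parameters; a symmetric two-point Pi meeting the constraints.
Gamma, N, n, r, V = 2.0, 1.0, 10_000, 0.5, 1.0
Cp = 1 / (2 * (Gamma + N))                            # C'(Gamma)
VG = (Gamma**2 + 2*Gamma*N) / (2 * (Gamma + N)**2)    # V(Gamma)

mean_Pi = r / np.sqrt(VG)                 # required E[Pi]
s = np.sqrt(Cp**2 * V / VG)               # spread so Var(Pi) hits the cap
pi_vals = np.array([mean_Pi - s, mean_Pi + s])
p = np.array([0.5, 0.5])

# Gamma_j from (13); the cost on sphere j is exactly Gamma_j.
Gamma_j = Gamma - np.sqrt(VG) / (Cp * np.sqrt(n)) * pi_vals + r / (Cp * np.sqrt(n))
mean_cost = (p * Gamma_j).sum()
var_cost = (p * (Gamma_j - mean_cost)**2).sum()

assert abs(mean_cost - Gamma) < 1e-12     # E[c(X)] = Gamma
assert var_cost <= V / n + 1e-12          # Var(c(X)) <= V/n
```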

Analysis: We first write

\[ \mathcal{E}_n = \sum_{j=1}^3 p_j\, \mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_j})}\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}}\right). \tag{14} \]

To proceed further, we upper bound

\begin{align}
&\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_j})}\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}}\right) \tag{15}\\
&= \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\frac{1}{n}\log\frac{W(\mathbf{y}|\mathbf{X})}{PW(\mathbf{y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}} \,\Big|\, \mathbf{Y} = \mathbf{y}\right) \notag\\
&\leq \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\frac{1}{n}\log\frac{W(\mathbf{y}|\mathbf{X})}{Q_i^{cc}(\mathbf{y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}} \,\Big|\, \mathbf{Y} = \mathbf{y}\right), \notag
\end{align}

where $i \in \{1, 2, 3\}$ depends on $\mathbf{y}$ and is such that $Q_i^{cc}(\mathbf{y})$ assigns the highest probability to $\mathbf{y}$. We continue the derivation as

\begin{align}
&\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_j})}\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}}\right) \notag\\
&\leq \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\log\frac{W(\mathbf{y}|\mathbf{X})}{Q_i^{cc}(\mathbf{y})} \leq nC(\Gamma) + r\sqrt{n} \,\Big|\, \mathbf{Y} = \mathbf{y}\right) \notag\\
&= \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\log\frac{W(\mathbf{y}|\mathbf{X})}{Q_j^{cc}(\mathbf{y})} \leq nC(\Gamma) + r\sqrt{n} + \log\frac{Q_i^{cc}(\mathbf{y})}{Q_j^{cc}(\mathbf{y})} \,\Big|\, \mathbf{Y} = \mathbf{y}\right) \notag\\
&\leq \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\log\frac{W(\mathbf{y}|\mathbf{X})}{Q_j^{cc}(\mathbf{y})} \leq nC(\Gamma) + r\sqrt{n} + \kappa_1\log n \,\Big|\, \mathbf{Y} = \mathbf{y}\right) + \delta_n^{(j)}. \tag{16}
\end{align}

In the last inequality above, we used Lemma 6. Specifically, in Lemma 6, let

  • $\Gamma = \Gamma_j$,
  • $\Gamma' = \Gamma_i$,
  • $\epsilon = \Gamma_i - \Gamma_j$, so that $\epsilon = O\left(\frac{1}{\sqrt{n}}\right)$, and
  • $\Delta = \sqrt{\frac{\log n}{n}}$.

Consequently, $\kappa_1$ is a constant from the result of Lemma 6 and

\[ \delta_n^{(j)} = Q_j^{cc}\left(\left|\frac{\|\mathbf{Y}\|^2}{n} - \Gamma_j - N\right| > \Delta\right). \]

It is easy to check that for $\mathbf{Y} \sim Q_j^{cc}$, $\mathbb{E}\left[\|\mathbf{Y}\|^2\right] = n\Gamma_j + nN$ and $\text{Var}(\|\mathbf{Y}\|^2) = 4nN\Gamma_j + 2nN^2$. Since $\Delta = \sqrt{\log n / n}$ exceeds the $O(n^{-1/2})$ scale of the fluctuations of $\|\mathbf{Y}\|^2/n$, Chebyshev's inequality gives $\delta_n^{(j)} \to 0$ as $n \to \infty$.

Continuing the derivation from (16), we have

\begin{align}
&\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_j})}\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})} \leq C(\Gamma) + \frac{r}{\sqrt{n}}\right) \notag\\
&\leq \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\log\frac{W(\mathbf{y}|\mathbf{X})}{Q_j^*(\mathbf{y})} \leq nC(\Gamma) + r\sqrt{n} + \kappa_1\log n + \log\frac{Q_j^{cc}(\mathbf{y})}{Q_j^*(\mathbf{y})} \,\Big|\, \mathbf{Y} = \mathbf{y}\right) + \delta_n^{(j)} \notag\\
&\leq \int_{\mathbf{y}\in\mathbb{R}^n} d\mathbf{y}\, Q_j^{cc}(\mathbf{y})\, \mathbb{P}\left(\log\frac{W(\mathbf{y}|\mathbf{X})}{Q_j^*(\mathbf{y})} \leq nC(\Gamma) + r\sqrt{n} + \kappa\log n \,\Big|\, \mathbf{Y} = \mathbf{y}\right) + 2\delta_n^{(j)}. \tag{17}
\end{align}

In the last inequality above, we used Lemma 7. Specifically, in Lemma 7, let Γ=Γj\Gamma=\Gamma_{j} and Δ=lognn\Delta=\sqrt{\frac{\log n}{n}}. Consequently, κ\kappa is a suitable bounding constant for both Lemma 6 and Lemma 7.

Continuing the derivation from (17)(\ref{2x}), we have

𝐗Unif(SRjn1)(1nlogW(𝐘|𝐗)PW(𝐘)C(Γ)+rn)\displaystyle\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_{j}})}\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})}\leq C(\Gamma)+\frac{r}{\sqrt{n}}\right)
𝐗Unif(SRjn1)(logW(𝐘|𝐗)Qj(𝐘)nC(Γ)+rn+κlogn)+2δn(j)\displaystyle\leq\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_{j}})}\left(\log\frac{W(\mathbf{Y}|\mathbf{X})}{Q_{j}^{*}(\mathbf{Y})}\leq nC(\Gamma)+r\sqrt{n}+\kappa\log n\right)+2\delta_{n}^{(j)}
=𝐗Unif(SRjn1)(m=1nlogW(Ym|Xm)Qj(Ym)nC(Γj)n(C(Γ)C(Γj))+rn+κlogn)+2δn(j)\displaystyle=\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_{j}})}\left(\sum_{m=1}^{n}\log\frac{W(Y_{m}|X_{m})}{Q_{j}^{*}(Y_{m})}-nC(\Gamma_{j})\leq n\left(C(\Gamma)-C(\Gamma_{j})\right)+r\sqrt{n}+\kappa\log n\right)+2\delta_{n}^{(j)}
=(a)𝐗Unif(SRjn1)(m=1n[logW(Ym|Xm)Qj(Ym)𝔼[logW(Ym|Xm)Qj(Ym)]]n(C(Γ)C(Γj))+rn+κlogn)+2δn(j)\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\mathbb{P}_{\mathbf{X}\sim\text{Unif}(S^{n-1}_{R_{j}})}\left(\sum_{m=1}^{n}\left[\log\frac{W(Y_{m}|X_{m})}{Q_{j}^{*}(Y_{m})}-\mathbb{E}\left[\log\frac{W(Y_{m}|X_{m})}{Q_{j}^{*}(Y_{m})}\right]\right]\leq n\left(C(\Gamma)-C(\Gamma_{j})\right)+r\sqrt{n}+\kappa\log n\right)+2\delta_{n}^{(j)}
=(b)(m=1nT~mn(C(Γ)C(Γj))+rn+κlogn)+2δn(j)\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}\mathbb{P}\left(\sum_{m=1}^{n}\tilde{T}_{m}\leq n\left(C(\Gamma)-C(\Gamma_{j})\right)+r\sqrt{n}+\kappa\log n\right)+2\delta_{n}^{(j)}
=(c)(1nV(Γj)m=1nT~mnC(Γ)C(Γj)V(Γj)+rV(Γj)+κlognnV(Γj))+2δn(j)\displaystyle\stackrel{{\scriptstyle(c)}}{{=}}\mathbb{P}\left(\frac{1}{\sqrt{nV(\Gamma_{j})}}\sum_{m=1}^{n}\tilde{T}_{m}\leq\sqrt{n}\frac{C(\Gamma)-C(\Gamma_{j})}{\sqrt{V(\Gamma_{j})}}+\frac{r}{\sqrt{V(\Gamma_{j})}}+\frac{\kappa\log n}{\sqrt{nV(\Gamma_{j})}}\right)+2\delta_{n}^{(j)}
(d)Φ(nC(Γ)C(Γj)V(Γj)+rV(Γj)+κlognnV(Γj))+δn\displaystyle\stackrel{{\scriptstyle(d)}}{{\leq}}\Phi\left(\sqrt{n}\frac{C(\Gamma)-C(\Gamma_{j})}{\sqrt{V(\Gamma_{j})}}+\frac{r}{\sqrt{V(\Gamma_{j})}}+\frac{\kappa\log n}{\sqrt{nV(\Gamma_{j})}}\right)+\delta_{n}
(e)Φ(n2V(Γj)(ΓΓjN+Γ)+rV(Γj)+2κlognnV(Γj))+δn\displaystyle\stackrel{{\scriptstyle(e)}}{{\leq}}\Phi\left(\frac{\sqrt{n}}{2\sqrt{V(\Gamma_{j})}}\left(\frac{\Gamma-\Gamma_{j}}{N+\Gamma}\right)+\frac{r}{\sqrt{V(\Gamma_{j})}}+\frac{2\kappa\log n}{\sqrt{nV(\Gamma_{j})}}\right)+\delta_{n} (18)

In equality (a)(a) above, we used Lemma 1. Specifically, use Equation (3)(\ref{equsentimes}) in Lemma 1 with Γ=Γj\Gamma=\Gamma_{j}.

In equality (b)(b) above, we used Lemma 2, where the T~m\tilde{T}_{m}’s are i.i.d. and

T~m=logW(Y|Γj)Qj(Y)𝔼[logW(Y|Γj)Qj(Y)],\displaystyle\tilde{T}_{m}=\log\frac{W(Y|\sqrt{\Gamma_{j}})}{Q^{*}_{j}(Y)}-\mathbb{E}\left[\log\frac{W(Y|\sqrt{\Gamma_{j}})}{Q^{*}_{j}(Y)}\right], (19)

where Y𝒩(Γj,N)Y\sim\mathcal{N}(\sqrt{\Gamma_{j}},N).

In equality (c)(c) above, we normalize the sum to have unit variance, which follows from Corollary 1 and Equation (4)(\ref{VGammadef}).

In inequality (d)(d) above, we used the Berry-Esseen Theorem [12], which gives convergence of the CDF of the normalized sum of the i.i.d. random variables T~m\tilde{T}_{m} to the standard normal CDF; the sequence δn0\delta_{n}\to 0 absorbs both the Berry-Esseen convergence rate and the vanishing term 2δn(j)2\delta_{n}^{(j)}. In inequality (e)(e) above, we used a Taylor series approximation of C()C(\cdot) around Γj\Gamma_{j}, noting that ΓΓj=O(1/n)\Gamma-\Gamma_{j}=O(1/\sqrt{n}).
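As an illustrative check of step (d)(d) (not part of the proof), one can simulate the normalized sum of the i.i.d. variables T~m\tilde{T}_{m} and compare its empirical CDF with Φ\Phi. The Python sketch below assumes the quadratic cost c(x)=x2c(x)=x^{2}, so that QjQ_{j}^{*} is the 𝒩(0,Γj+N)\mathcal{N}(0,\Gamma_{j}+N) density and Var(T~m)=(Γj2+2NΓj)/(2(N+Γj)2)\text{Var}(\tilde{T}_{m})=(\Gamma_{j}^{2}+2N\Gamma_{j})/(2(N+\Gamma_{j})^{2}); parameter values are illustrative.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
Gamma_j, N, n, trials = 1.0, 1.0, 400, 10_000
V = (Gamma_j**2 + 2*N*Gamma_j) / (2 * (N + Gamma_j)**2)   # Var of a single T~_m

# T~_m = log W(Y|sqrt(Gamma_j))/Q_j*(Y) - E[.], with Y ~ N(sqrt(Gamma_j), N);
# the additive constant of the log-density ratio cancels after centering.
Y = sqrt(Gamma_j) + sqrt(N) * rng.standard_normal((trials, n))
T = Y**2 / (2 * (N + Gamma_j)) - (Y - sqrt(Gamma_j))**2 / (2 * N)
T -= T.mean()                         # empirical stand-in for the expectation
S = T.sum(axis=1) / np.sqrt(n * V)    # normalized sum, approximately N(0, 1)

Phi = np.vectorize(lambda t: 0.5 * (1 + erf(t / sqrt(2))))
grid = np.linspace(-3, 3, 61)
ecdf = (S[:, None] <= grid).mean(axis=0)
print(np.max(np.abs(ecdf - Phi(grid))))  # small gap, consistent with Berry-Esseen
```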

Substituting (18)(\ref{subsintosumof3}) into (14)(\ref{epsnu}), we obtain

n\displaystyle\mathcal{E}_{n} j=13pjΦ(n2V(Γj)(ΓΓjN+Γ)+rV(Γj)+2κlognnV(Γj))+δn\displaystyle\leq\sum_{j=1}^{3}p_{j}\Phi\left(\frac{\sqrt{n}}{2\sqrt{V(\Gamma_{j})}}\left(\frac{\Gamma-\Gamma_{j}}{N+\Gamma}\right)+\frac{r}{\sqrt{V(\Gamma_{j})}}+\frac{2\kappa\log n}{\sqrt{nV(\Gamma_{j})}}\right)+\delta_{n}

for some redefined sequence δn0\delta_{n}\to 0 as nn\to\infty. Using Equation (13)(\ref{38n}) and the formula for C(Γ)C^{\prime}(\Gamma) from Lemma 1, we can simplify the upper bound as

n\displaystyle\mathcal{E}_{n} j=13pjΦ(V(Γ)V(Γj)πjrV(Γj)+rV(Γj)+2κlognnV(Γj))+δn.\displaystyle\leq\sum_{j=1}^{3}p_{j}\Phi\left(\frac{\sqrt{V(\Gamma)}}{\sqrt{V(\Gamma_{j})}}\pi_{j}-\frac{r}{\sqrt{V(\Gamma_{j})}}+\frac{r}{\sqrt{V(\Gamma_{j})}}+\frac{2\kappa\log n}{\sqrt{nV(\Gamma_{j})}}\right)+\delta_{n}.

Therefore, since ΓjΓ\Gamma_{j}\to\Gamma as nn\to\infty, we have

lim supnn\displaystyle\limsup_{n\to\infty}\mathcal{E}_{n} j=13pjΦ(πj)\displaystyle\leq\sum_{j=1}^{3}p_{j}\Phi\left(\pi_{j}\right)
=𝒦(rV(Γ),C(Γ)2VV(Γ)).\displaystyle=\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right). (20)

To complete the proof, we choose r<rr<r^{*} where

r=max{r:𝒦(rV(Γ),C(Γ)2VV(Γ))ϵ}.\displaystyle r^{*}=\max\left\{r:\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right)\leq\epsilon\right\}. (21)

Since, by Lemma 3, 𝒦(,)\mathcal{K}(\cdot,\cdot) is strictly increasing in its first argument, we have lim supnn<ϵ\limsup_{n\to\infty}\mathcal{E}_{n}<\epsilon for every r<rr<r^{*}, which establishes that any r<rr<r^{*} is an achievable second-order coding rate. Finally, letting rrr\to r^{*} establishes an achievable second-order coding rate of rr^{*}, matching the converse in Theorem 2.
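Since, by Lemma 3, 𝒦\mathcal{K} is continuous and strictly increasing in its first argument, the threshold rr^{*} in (21) can be computed numerically by bisection once a routine for 𝒦\mathcal{K} is available. The sketch below is illustrative only: bisect_rstar is a hypothetical helper (not from the paper), and for a concrete, checkable instance it uses the degenerate case V=0V=0, where the bound reduces to Φ(r/V(Γ))\Phi(r/\sqrt{V(\Gamma)}) (as noted in the converse proof below), so r=V(Γ)Φ1(ϵ)r^{*}=\sqrt{V(\Gamma)}\Phi^{-1}(\epsilon).

```python
import math

def bisect_rstar(K, eps, lo=-50.0, hi=50.0, iters=200):
    """Largest r with K(r) <= eps, for K continuous and strictly increasing."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if K(mid) <= eps:
            lo = mid
        else:
            hi = mid
    return lo

# Degenerate case V = 0: the bound reduces to Phi(r / sqrt(V(Gamma))), so
# r* should equal sqrt(V(Gamma)) * Phi^{-1}(eps).
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
V_Gamma, eps = 0.25, 0.1
rstar = bisect_rstar(lambda r: Phi(r / math.sqrt(V_Gamma)), eps)
print(rstar)  # ≈ sqrt(0.25) * Phi^{-1}(0.1) ≈ -0.6408
```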

The achievability result can also be stated in terms of an upper bound on the minimum average probability of error of (n,R,Γ,V)(n,R,\Gamma,V) codes for a rate R=C(Γ)+rnR=C(\Gamma)+\frac{r}{\sqrt{n}}. From Lemma 4 with θ=1/nϑ\theta=1/n^{\vartheta}, where 1/2<ϑ<11/2<\vartheta<1, we have

P¯e(n,C(Γ)+rn,Γ,V)(PW)(1nlogW(𝐘|𝐗)PW(𝐘)C(Γ)+rn+1nϑ)+en1ϑ.\displaystyle\bar{P}_{\text{e}}\left(n,C(\Gamma)+\frac{r}{\sqrt{n}},\Gamma,V\right)\leq(P\circ W)\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})}\leq C(\Gamma)+\frac{r}{\sqrt{n}}+\frac{1}{n^{\vartheta}}\right)+e^{-n^{1-\vartheta}}. (22)

For any r>rr^{\prime}>r, we have rn+1nϑ<rn\frac{r}{\sqrt{n}}+\frac{1}{n^{\vartheta}}<\frac{r^{\prime}}{\sqrt{n}} eventually, so

lim supnP¯e(n,C(Γ)+rn,Γ,V)\displaystyle\limsup_{n\to\infty}\bar{P}_{\text{e}}\left(n,C(\Gamma)+\frac{r}{\sqrt{n}},\Gamma,V\right)
lim supn(PW)(1nlogW(𝐘|𝐗)PW(𝐘)C(Γ)+rn)\displaystyle\leq\limsup_{n\to\infty}(P\circ W)\left(\frac{1}{n}\log\frac{W(\mathbf{Y}|\mathbf{X})}{PW(\mathbf{Y})}\leq C(\Gamma)+\frac{r^{\prime}}{\sqrt{n}}\right)
𝒦(rV(Γ),C(Γ)2VV(Γ)),\displaystyle\leq\mathcal{K}\left(\frac{r^{\prime}}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right), (23)

where the last inequality follows from (20)(\ref{extendtoerror}). Letting rrr^{\prime}\to r in (23)(\ref{notplayggg}) and invoking continuity of the function 𝒦\mathcal{K} establishes the result.

IV Converse Result

Lemma 8

Consider an AWGN channel WW with noise variance N>0N>0 and cost constraint (Γ,V)(0,)×(0,)(\Gamma,V)\in(0,\infty)\times(0,\infty). Then for every n,ρ>0n,\rho>0 and ϵ(0,1)\epsilon\in(0,1),

logM(n,ϵ,Γ,V)\displaystyle\log M^{*}(n,\epsilon,\Gamma,V) logρlog[(1ϵsupP¯𝒫Γ,V(n)infq𝒫(n)(P¯W)(W(𝐘|𝐗)q(𝐘)>ρ))+],\displaystyle\leq\log\rho-\log\left[\left(1-\epsilon-\sup_{\overline{P}\in\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n})}\,\inf_{q\in\mathcal{P}(\mathbb{R}^{n})}(\overline{P}\circ W)\left(\frac{W(\mathbf{Y}|\mathbf{X})}{q(\mathbf{Y})}>\rho\right)\right)^{+}\right], (24)

where 𝒫Γ,V(n)𝒫(n)\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n})\subset\mathcal{P}(\mathbb{R}^{n}) is the set of distributions P¯\overline{P} such that 𝔼[i=1nc(Xi)]nΓ\mathbb{E}\left[\sum_{i=1}^{n}c(X_{i})\right]\leq n\Gamma and Var(i=1nc(Xi))nV\text{Var}\left(\sum_{i=1}^{n}c(X_{i})\right)\leq nV for 𝐗P¯\mathbf{X}\sim\overline{P}.

Proof: The proof is similar to that of [9, Lemma 2] and is omitted.

Theorem 2

Fix an arbitrary ϵ(0,1)\epsilon\in(0,1). Consider an AWGN channel with noise variance N>0N>0. Under the mean and variance cost constraints specified by the pair (Γ,V)(0,)×[0,)(\Gamma,V)\in(0,\infty)\times[0,\infty), we have

lim supnlogM(n,ϵ,Γ,V)nC(Γ)nmax{r:𝒦(rV(Γ),C(Γ)2VV(Γ))ϵ}.\displaystyle\limsup_{n\to\infty}\,\frac{\log M^{*}(n,\epsilon,\Gamma,V)-nC(\Gamma)}{\sqrt{n}}\leq\max\left\{r\in\mathbb{R}:\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right)\leq\epsilon\right\}. (25)

Alternatively, for any second-order coding rate rr\in\mathbb{R},

lim infnP¯e(n,C(Γ)+rn,Γ,V)\displaystyle\liminf_{n\to\infty}\bar{P}_{\text{e}}\left(n,C(\Gamma)+\frac{r}{\sqrt{n}},\Gamma,V\right) 𝒦(rV(Γ),C(Γ)2VV(Γ)).\displaystyle\geq\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right). (26)
Proof:

For V=0V=0, we are required to prove that the upper bound in (25)(\ref{upper_bound_V=0}) is V(Γ)Φ1(ϵ)\sqrt{V(\Gamma)}\Phi^{-1}(\epsilon) and that the lower bound in (26)(\ref{lower_bound_V=0}) is Φ(r/V(Γ))\Phi\left(r/\sqrt{V(\Gamma)}\right). Both follow from the known converse results for the maximal cost constraint, since the m.v. cost constraint with V=0V=0 is more stringent.

We assume V>0V>0 for the remainder of the proof. We start with Lemma 8 and first upper bound

supP¯𝒫Γ,V(n)infq𝒫(n)(P¯W)(W(𝐘|𝐗)q(𝐘)>ρ).\displaystyle\sup_{\overline{P}\in\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n})}\,\inf_{q\in\mathcal{P}(\mathbb{R}^{n})}(\overline{P}\circ W)\left(\frac{W(\mathbf{Y}|\mathbf{X})}{q(\mathbf{Y})}>\rho\right). (27)

Let ρ=exp(nC(Γ)+nr)\rho=\exp\left(nC(\Gamma)+\sqrt{n}r\right) in (27)(\ref{bx}), where rr is a number to be specified later. Let 𝐗P¯\mathbf{X}\sim\overline{P} for any arbitrary P¯𝒫Γ,V(n)\overline{P}\in\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n}) so that 𝔼[i=1nc(Xi)]nΓ\mathbb{E}\left[\sum_{i=1}^{n}c(X_{i})\right]\leq n\Gamma and Var(i=1nc(Xi))nV\text{Var}\left(\sum_{i=1}^{n}c(X_{i})\right)\leq nV.

Choosing q=Qq=Q^{*} in (27)(\ref{bx}), we have

(P¯W)(logW(𝐘|𝐗)Q(𝐘)>nC(Γ)+nr)\displaystyle(\overline{P}\circ W)\left(\log\frac{W(\mathbf{Y}|\mathbf{X})}{Q^{*}(\mathbf{Y})}>nC(\Gamma)+\sqrt{n}r\right)
=n𝑑P¯(𝐱)W(i=1nlogW(Yi|xi)Q(Yi)>nC(Γ)+nr),\displaystyle=\int_{\mathbb{R}^{n}}d\overline{P}(\mathbf{x})W\left(\sum_{i=1}^{n}\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}>nC(\Gamma)+\sqrt{n}r\right), (28)

where in (28)(\ref{fopv}), Y1,Y2,,YnY_{1},Y_{2},\ldots,Y_{n} are independent random variables and each Yi𝒩(xi,N)Y_{i}\sim\mathcal{N}(x_{i},N). We define the empirical distribution of a vector 𝐱\mathbf{x} as

P𝐱=1ni=1nδxi,\displaystyle P_{\mathbf{x}}=\frac{1}{n}\sum_{i=1}^{n}\delta_{x_{i}},

where δx\delta_{x} is the Dirac delta measure at xx. For any given channel input 𝐱\mathbf{x} and independent YiY_{i}’s where Yi𝒩(xi,N)Y_{i}\sim\mathcal{N}(x_{i},N), we have from Corollary 1 that

Var(1ni=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]])\displaystyle\text{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]\right)
=νtP𝐱(t)𝑑t\displaystyle=\int\nu_{t}P_{\mathbf{x}}(t)dt
=Γ2+2Nc(𝐱)2(N+Γ)2.\displaystyle=\frac{\Gamma^{2}+2Nc(\mathbf{x})}{2(N+\Gamma)^{2}}.
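As a numerical sanity check of this variance formula (not part of the proof), assume the quadratic cost c(x)=x2c(x)=x^{2} and the capacity-achieving output density Q=𝒩(0,N+Γ)Q^{*}=\mathcal{N}(0,N+\Gamma). A Monte Carlo estimate for a single input letter xx then agrees with (Γ2+2Nx2)/(2(N+Γ)2)(\Gamma^{2}+2Nx^{2})/(2(N+\Gamma)^{2}); the values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, Gamma, x = 1.0, 2.0, 1.3          # noise variance, cost parameter, input letter
Y = x + np.sqrt(N) * rng.standard_normal(2_000_000)

# log W(Y|x)/Q*(Y) with W(.|x) = N(x, N) and Q* = N(0, N + Gamma)
logdens = (0.5 * np.log((N + Gamma) / N)
           + Y**2 / (2 * (N + Gamma)) - (Y - x)**2 / (2 * N))

empirical = logdens.var()
predicted = (Gamma**2 + 2 * N * x**2) / (2 * (N + Gamma)**2)
print(empirical, predicted)          # the two agree to within Monte Carlo error
```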

For any β0\beta\geq 0, define

δβ:=supΓβ<tΓ+β|Γ2+2Nt2(N+Γ)2V(Γ)|\displaystyle\delta_{\beta}:=\sup_{\Gamma-\beta<t\leq\Gamma+\beta}\Bigg{|}\frac{\Gamma^{2}+2Nt}{2(N+\Gamma)^{2}}-V(\Gamma)\Bigg{|}

and note that δβ0\delta_{\beta}\to 0 as β0\beta\to 0. Define the function

ψβ(t):={V(Γ)+δβ if tΓV(Γ)δβ if t>Γ.\displaystyle\psi_{\beta}\left(t\right):=\begin{cases}V(\Gamma)+\delta_{\beta}&\text{ if }t\leq\Gamma\\ V(\Gamma)-\delta_{\beta}&\text{ if }t>\Gamma.\end{cases}
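Since the map t(Γ2+2Nt)/(2(N+Γ)2)t\mapsto(\Gamma^{2}+2Nt)/(2(N+\Gamma)^{2}) is affine and increasing, the supremum defining δβ\delta_{\beta} is attained at an endpoint, giving the closed form δβ=Nβ/(N+Γ)2\delta_{\beta}=N\beta/(N+\Gamma)^{2}, which indeed vanishes as β0\beta\to 0. A small Python sketch (illustrative values) confirms this and implements ψβ\psi_{\beta}.

```python
import numpy as np

N, Gamma, beta = 1.0, 2.0, 0.1
V_Gamma = (Gamma**2 + 2 * N * Gamma) / (2 * (N + Gamma)**2)

# delta_beta: sup over t in (Gamma - beta, Gamma + beta] of the deviation.
# The expression is affine and increasing in t, so the sup sits at an endpoint.
t = np.linspace(Gamma - beta, Gamma + beta, 10001)
delta_beta = np.max(np.abs((Gamma**2 + 2*N*t) / (2*(N + Gamma)**2) - V_Gamma))
print(delta_beta, N * beta / (N + Gamma)**2)  # both equal N*beta/(N+Gamma)^2

def psi_beta(t):
    return V_Gamma + delta_beta if t <= Gamma else V_Gamma - delta_beta
```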

Fix any 0<Δ<1/20<\Delta<1/2 and η>0\eta>0. Then select β>0\beta>0 such that δβ<V(Γ)\delta_{\beta}<V(\Gamma),

supt|1V(Γ)1ψβ(t)|ΔC(Γ)V\displaystyle\sup_{t\in\mathbb{R}}\Bigg{|}\frac{1}{\sqrt{V(\Gamma)}}-\frac{1}{\sqrt{\psi_{\beta}(t)}}\Bigg{|}\leq\frac{\Delta}{C^{\prime}(\Gamma)\sqrt{V}} (29)

and

supt|1V(Γ)1ψβ(t)|Δ/2C(Γ)2V.\displaystyle\sup_{t\in\mathbb{R}}\Bigg{|}\frac{1}{V(\Gamma)}-\frac{1}{\psi_{\beta}(t)}\Bigg{|}\leq\frac{\Delta/2}{C^{\prime}(\Gamma)^{2}V}. (30)

Also define

φ(r):={V(Γ)δβ if r0V(Γ)+δβ if r>0.\displaystyle\varphi(r):=\begin{cases}V(\Gamma)-\delta_{\beta}&\text{ if }r\leq 0\\ V(\Gamma)+\delta_{\beta}&\text{ if }r>0.\end{cases} (31)

With Δ\Delta, η\eta and β\beta fixed as above (η\eta is used later), we divide the set of sequences 𝐱n\mathbf{x}\in\mathbb{R}^{n} into three subsets:

𝒫n,1\displaystyle\mathcal{P}_{n,1} :={𝐱:c(𝐱)Γβ},\displaystyle:=\left\{\mathbf{x}:c(\mathbf{x})\leq\Gamma-\beta\right\},
𝒫n,2\displaystyle\mathcal{P}_{n,2} :={𝐱:Γβ<c(𝐱)Γ+β},\displaystyle:=\left\{\mathbf{x}:\Gamma-\beta<c(\mathbf{x})\leq\Gamma+\beta\right\},
𝒫n,3\displaystyle\mathcal{P}_{n,3} :={𝐱:c(𝐱)>Γ+β}.\displaystyle:=\left\{\mathbf{x}:c(\mathbf{x})>\Gamma+\beta\right\}.

For 𝐱𝒫n,1\mathbf{x}\in\mathcal{P}_{n,1}, we have

W(i=1nlogW(Yi|xi)Q(Yi)>nC(Γ)+nr)\displaystyle W\left(\sum_{i=1}^{n}\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}>nC(\Gamma)+\sqrt{n}r\right)
=(a)W(i=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]]>nC(Γ)(Γc(𝐱))+nr)\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}W\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]>nC^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)+\sqrt{n}r\right)
W(i=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]]>nC(Γ)β+nr)\displaystyle\leq W\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]>nC^{\prime}(\Gamma)\beta+\sqrt{n}r\right)
(b)W(i=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]]>nC(Γ)β2)\displaystyle\stackrel{{\scriptstyle(b)}}{{\leq}}W\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]>\frac{nC^{\prime}(\Gamma)\beta}{2}\right)
(c)4nC(Γ)2β2(Γ2+2NΓ2(N+Γ)2)\displaystyle\stackrel{{\scriptstyle(c)}}{{\leq}}\frac{4}{nC^{\prime}(\Gamma)^{2}\beta^{2}}\left(\frac{\Gamma^{2}+2N\Gamma}{2(N+\Gamma)^{2}}\right) (32)

for sufficiently large nn. In equality (a)(a), we used Lemma 1. Inequality (b)(b) holds because rr is a constant. Inequality (c)(c) follows by applying Chebyshev’s inequality and Corollary 1, the latter of which gives us that

Var(i=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]])\displaystyle\text{Var}\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]\right)
=nΓ2+2nNc(𝐱)2(N+Γ)2\displaystyle=\frac{n\Gamma^{2}+2nNc(\mathbf{x})}{2\left(N+\Gamma\right)^{2}}
n(Γ2+2NΓ2(N+Γ)2).\displaystyle\leq n\left(\frac{\Gamma^{2}+2N\Gamma}{2(N+\Gamma)^{2}}\right).

For 𝐱𝒫n,2\mathbf{x}\in\mathcal{P}_{n,2}, we have

W(i=1nlogW(Yi|xi)Q(Yi)>nC(Γ)+nr)\displaystyle W\left(\sum_{i=1}^{n}\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}>nC(\Gamma)+\sqrt{n}r\right)
=(a)W(i=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]]>nC(Γ)(Γc(𝐱))+nr)\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}W\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]>nC^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)+\sqrt{n}r\right)
=(b)W(1nνtP𝐱(t)𝑑ti=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]]>\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}W\left(\frac{1}{\sqrt{\int n\,\nu_{t}P_{\mathbf{x}}(t)dt}}\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]>\mbox{}\right.
nC(Γ)(Γc(𝐱))νtP𝐱(t)𝑑t+rνtP𝐱(t)𝑑t)\displaystyle\left.\quad\quad\quad\quad\quad\quad\quad\quad\quad\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}+\frac{r}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}\right)
=(c)W(1nνtP𝐱(t)𝑑ti=1nT~i>nC(Γ)(Γc(𝐱))νtP𝐱(t)𝑑t+rνtP𝐱(t)𝑑t)\displaystyle\stackrel{{\scriptstyle(c)}}{{=}}W\left(\frac{1}{\sqrt{\int n\,\nu_{t}P_{\mathbf{x}}(t)dt}}\sum_{i=1}^{n}\tilde{T}_{i}>\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}+\frac{r}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}\right)
(d)1Φ(nC(Γ)(Γc(𝐱))νtP𝐱(t)𝑑t+rνtP𝐱(t)𝑑t)+κ𝐱n\displaystyle\stackrel{{\scriptstyle(d)}}{{\leq}}1-\Phi\left(\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}+\frac{r}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}\right)+\frac{\kappa_{\mathbf{x}}}{\sqrt{n}}
(e)1Φ(nC(Γ)(Γc(𝐱))νtP𝐱(t)𝑑t+rνtP𝐱(t)𝑑t)+κn\displaystyle\stackrel{{\scriptstyle(e)}}{{\leq}}1-\Phi\left(\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}+\frac{r}{\sqrt{\int\nu_{t}P_{\mathbf{x}}(t)dt}}\right)+\frac{\kappa}{\sqrt{n}}
1Φ(nC(Γ)(Γc(𝐱))ψβ(c(𝐱))+rφ(r))+κn\displaystyle\leq 1-\Phi\left(\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{x})\right)}{\sqrt{\psi_{\beta}(c(\mathbf{x}))}}+\frac{r}{\sqrt{\varphi(r)}}\right)+\frac{\kappa}{\sqrt{n}} (33)

for sufficiently large nn. In equality (a)(a), we used Lemma 1. In equality (b)(b), we normalize the sum to have unit variance, where we write

nνtP𝐱(t)𝑑t\displaystyle\int_{-\infty}^{\infty}n\,\nu_{t}P_{\mathbf{x}}(t)dt
=Var(i=1n[logW(Yi|xi)Q(Yi)𝔼[logW(Yi|xi)Q(Yi)]])\displaystyle=\text{Var}\left(\sum_{i=1}^{n}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}-\mathbb{E}\left[\log\frac{W(Y_{i}|x_{i})}{Q^{*}(Y_{i})}\right]\right]\right)
=nΓ2+2nNc(𝐱)2(N+Γ)2.\displaystyle=\frac{n\Gamma^{2}+2nNc(\mathbf{x})}{2\left(N+\Gamma\right)^{2}}.

In equality (c)(c), we use Lemma 2. In inequality (d)(d), we apply the Berry-Esseen Theorem [12], where κ𝐱\kappa_{\mathbf{x}} is a constant depending on the second- and third-order moments of {T~i}i=1n\{\tilde{T}_{i}\}_{i=1}^{n}. In inequality (e)(e), we use the fact that the distribution of each T~i\tilde{T}_{i} depends on 𝐱\mathbf{x} only through its cost c(𝐱)c(\mathbf{x}). Since c(𝐱)c(\mathbf{x}) is uniformly bounded over the set 𝒫n,2\mathcal{P}_{n,2}, the constant κ𝐱\kappa_{\mathbf{x}} can be uniformly upper bounded over all 𝐱𝒫n,2\mathbf{x}\in\mathcal{P}_{n,2} by a constant κ\kappa that does not depend on 𝐱\mathbf{x}.

For 𝒫n,3\mathcal{P}_{n,3}, the following lemma shows that the probability of this set under P¯\overline{P} is small.

Lemma 9

We have

P¯(𝒫n,3)Vnβ2.\displaystyle\overline{P}\left(\mathcal{P}_{n,3}\right)\leq\frac{V}{n\beta^{2}}. (34)
Proof:

Let μi=𝔼[c(Xi)]\mu_{i}=\mathbb{E}[c(X_{i})]. For any β>0\beta>0,

P¯(c(𝐗)>Γ+β)\displaystyle\overline{P}\left(c(\mathbf{X})>\Gamma+\beta\right)
=P¯(i=1n(c(Xi)μi)+i=1nμinΓ+nβ)\displaystyle=\overline{P}\left(\sum_{i=1}^{n}\left(c(X_{i})-\mu_{i}\right)+\sum_{i=1}^{n}\mu_{i}\geq n\Gamma+n\beta\right)
P¯(i=1n(c(Xi)μi)nβ)\displaystyle\leq\overline{P}\left(\sum_{i=1}^{n}\left(c(X_{i})-\mu_{i}\right)\geq n\beta\right)
P¯((i=1n(c(Xi)μi))2n2β2)\displaystyle\leq\overline{P}\left(\left(\sum_{i=1}^{n}\left(c(X_{i})-\mu_{i}\right)\right)^{2}\geq n^{2}\beta^{2}\right)
1n2β2𝔼[(i=1n(c(Xi)μi))2]\displaystyle\leq\frac{1}{n^{2}\beta^{2}}\mathbb{E}\left[\left(\sum_{i=1}^{n}\left(c(X_{i})-\mu_{i}\right)\right)^{2}\right]
Vnβ2.\displaystyle\leq\frac{V}{n\beta^{2}}.
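As an illustration of Lemma 9 (not part of the proof), take the quadratic cost c(x)=x2c(x)=x^{2} and XiX_{i} i.i.d. 𝒩(0,Γ)\mathcal{N}(0,\Gamma); then 𝔼[c(𝐗)]=Γ\mathbb{E}[c(\mathbf{X})]=\Gamma and Var(i=1nc(Xi))=2nΓ2\text{Var}(\sum_{i=1}^{n}c(X_{i}))=2n\Gamma^{2}, so the lemma applies with V=2Γ2V=2\Gamma^{2}. The empirical tail probability sits well below the Chebyshev bound V/(nβ2)V/(n\beta^{2}); the parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, Gamma, beta, trials = 200, 1.0, 0.5, 100_000

# Sum of squares of n i.i.d. N(0, Gamma) variables is Gamma * chi^2_n,
# so c(X) = (1/n) sum_i X_i^2 can be sampled directly.
V = 2 * Gamma**2
cost = Gamma * rng.chisquare(n, size=trials) / n

empirical = np.mean(cost > Gamma + beta)
chebyshev = V / (n * beta**2)
print(empirical, chebyshev)   # empirical tail is far below the Chebyshev bound
```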

Using the results in (32)(\ref{pn1}), (33)(\ref{pn2}) and (34)(\ref{pn3}), we can upper bound (28)(\ref{fopv}) as

(P¯W)(logW(𝐘|𝐗)Q(𝐘)>nC(Γ)+nr)\displaystyle(\overline{P}\circ W)\left(\log\frac{W(\mathbf{Y}|\mathbf{X})}{Q^{*}(\mathbf{Y})}>nC(\Gamma)+\sqrt{n}r\right)
1+4nC(Γ)2β2(Γ2+2NΓ2(N+Γ)2)+Vnβ2+κn\displaystyle\leq 1+\frac{4}{nC^{\prime}(\Gamma)^{2}\beta^{2}}\left(\frac{\Gamma^{2}+2N\Gamma}{2(N+\Gamma)^{2}}\right)+\frac{V}{n\beta^{2}}+\frac{\kappa}{\sqrt{n}}-\mbox{}
𝔼[Φ(nC(Γ)(Γc(𝐗))ψβ(c(𝐗))+rφ(r))].\displaystyle\quad\quad\quad\mathbb{E}\left[\Phi\left(\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{X})\right)}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}+\frac{r}{\sqrt{\varphi(r)}}\right)\right]. (35)

To further upper bound the above expression, we need a lower bound on

inf𝐗𝔼[Φ(nC(Γ)(Γc(𝐗))ψβ(c(𝐗))+rφ(r))],\displaystyle\inf_{\mathbf{X}}\,\mathbb{E}\left[\Phi\left(\frac{\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{X})\right)}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}+\frac{r}{\sqrt{\varphi(r)}}\right)\right],

where the infimum is over all random vectors 𝐗\mathbf{X} such that 𝔼[c(𝐗)]Γ\mathbb{E}\left[c(\mathbf{X})\right]\leq\Gamma and Var(c(𝐗))V/n\text{Var}\left(c(\mathbf{X})\right)\leq V/n. Without loss of generality, we can assume 𝔼[c(𝐗)]=Γ\mathbb{E}[c(\mathbf{X})]=\Gamma since the function Φ\Phi is monotonically increasing and the function

cΓcψβ(c)c\mapsto\frac{\Gamma-c}{\sqrt{\psi_{\beta}(c)}}

is monotonically nonincreasing.

Note that

|𝔼[nC(Γ)ψβ(c(𝐗))(Γc(𝐗))]𝔼[nC(Γ)V(Γ)(Γc(𝐗))]|\displaystyle\Bigg{|}\mathbb{E}\left[\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}\left(\Gamma-c(\mathbf{X})\right)\right]-\mathbb{E}\left[\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{V(\Gamma)}}\left(\Gamma-c(\mathbf{X})\right)\right]\Bigg{|}
=|𝔼[nC(Γ)(Γc(𝐗))[1ψβ(c(𝐗))1V(Γ)]]|\displaystyle=\Bigg{|}\mathbb{E}\left[\sqrt{n}C^{\prime}(\Gamma)\left(\Gamma-c(\mathbf{X})\right)\left[\frac{1}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}-\frac{1}{\sqrt{V(\Gamma)}}\right]\right]\Bigg{|}
nC(Γ)𝔼[|Γc(𝐗)||1ψβ(c(𝐗))1V(Γ)|]\displaystyle\leq\sqrt{n}C^{\prime}(\Gamma)\mathbb{E}\left[\big{|}\Gamma-c(\mathbf{X})\big{|}\cdot\Bigg{|}\frac{1}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}-\frac{1}{\sqrt{V(\Gamma)}}\Bigg{|}\right]
(a)nC(Γ)ΔC(Γ)V𝔼[|Γc(𝐗)|]\displaystyle\stackrel{{\scriptstyle(a)}}{{\leq}}\sqrt{n}C^{\prime}(\Gamma)\frac{\Delta}{C^{\prime}(\Gamma)\sqrt{V}}\mathbb{E}\left[\big{|}\Gamma-c(\mathbf{X})\big{|}\right]
(b)Δ.\displaystyle\stackrel{{\scriptstyle(b)}}{{\leq}}\Delta. (36)

In inequality (a)(a), we used (29)(\ref{aaronerr1}). In inequality (b)(b), we used the bound 𝔼[|Γc(𝐗)|]Var(c(𝐗))V/n\mathbb{E}\left[\left|\Gamma-c(\mathbf{X})\right|\right]\leq\sqrt{\text{Var}(c(\mathbf{X}))}\leq\sqrt{V/n}, which follows from Jensen's inequality since 𝔼[c(𝐗)]=Γ\mathbb{E}[c(\mathbf{X})]=\Gamma. Hence, from (36)(\ref{bzuyv}), we have

𝔼[nC(Γ)ψβ(c(𝐗))(Γc(𝐗))]\displaystyle\mathbb{E}\left[\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}\left(\Gamma-c(\mathbf{X})\right)\right]
𝔼[nC(Γ)V(Γ)(Γc(𝐗))]Δ\displaystyle\geq\mathbb{E}\left[\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{V(\Gamma)}}\left(\Gamma-c(\mathbf{X})\right)\right]-\Delta
=Δ.\displaystyle=-\Delta. (37)

Now define X=Γc(𝐗)X^{\prime}=\Gamma-c(\mathbf{X}) so that 𝔼[X]=0\mathbb{E}[X^{\prime}]=0 and Var(X)V/n\text{Var}(X^{\prime})\leq V/n. Also define

Ψβ(X):={V(Γ)+δβ if X0V(Γ)δβ if X<0.\displaystyle\Psi_{\beta}(X^{\prime}):=\begin{cases}V(\Gamma)+\delta_{\beta}&\text{ if }X^{\prime}\geq 0\\ V(\Gamma)-\delta_{\beta}&\text{ if }X^{\prime}<0.\end{cases}

Then note that

|Var(Γc(𝐗)ψβ(c(𝐗)))Var(Γc(𝐗)V(Γ))|\displaystyle\Bigg{|}\text{Var}\left(\frac{\Gamma-c(\mathbf{X})}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}\right)-\text{Var}\left(\frac{\Gamma-c(\mathbf{X})}{\sqrt{V(\Gamma)}}\right)\Bigg{|}
=|Var(XΨβ(X))Var(XV(Γ))|\displaystyle=\Bigg{|}\text{Var}\left(\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right)-\text{Var}\left(\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right)\Bigg{|}
=|𝔼[X2Ψβ(X)X2V(Γ)]+(𝔼[XV(Γ)])2(𝔼[XΨβ(X)])2|\displaystyle=\Bigg{|}\mathbb{E}\left[\frac{X^{\prime 2}}{\Psi_{\beta}(X^{\prime})}-\frac{X^{\prime 2}}{V(\Gamma)}\right]+\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]\right)^{2}-\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)^{2}\Bigg{|}
=|𝔼[(XΨβ(X)+XV(Γ))(XΨβ(X)XV(Γ))]+\displaystyle=\Bigg{|}\mathbb{E}\left[\left(\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}+\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right)\left(\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}-\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right)\right]+\mbox{}
(𝔼[XV(Γ)]+𝔼[XΨβ(X)])(𝔼[XV(Γ)]𝔼[XΨβ(X)])|\displaystyle\quad\quad\quad\quad\quad\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]+\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]-\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)\Bigg{|}
|𝔼[(XΨβ(X)+XV(Γ))(XΨβ(X)XV(Γ))]|+\displaystyle\leq\Bigg{|}\mathbb{E}\left[\left(\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}+\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right)\left(\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}-\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right)\right]\Bigg{|}+\mbox{}
|(𝔼[XV(Γ)]+𝔼[XΨβ(X)])(𝔼[XV(Γ)]𝔼[XΨβ(X)])|\displaystyle\quad\quad\quad\quad\quad\Bigg{|}\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]+\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]-\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)\Bigg{|}
(a)𝔼[X2]Δ/2C(Γ)2V+|(𝔼[XV(Γ)]+𝔼[XΨβ(X)])(𝔼[XV(Γ)]𝔼[XΨβ(X)])|\displaystyle\stackrel{{\scriptstyle(a)}}{{\leq}}\mathbb{E}[X^{\prime 2}]\frac{\Delta/2}{C^{\prime}(\Gamma)^{2}V}+\Bigg{|}\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]+\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)\left(\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}\right]-\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\right)\Bigg{|}
𝔼[X2]Δ/2C(Γ)2V+|𝔼[XV(Γ)+XΨβ(X)]||𝔼[XV(Γ)XΨβ(X)]|\displaystyle\leq\mathbb{E}[X^{\prime 2}]\frac{\Delta/2}{C^{\prime}(\Gamma)^{2}V}+\Bigg{|}\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}+\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\Bigg{|}\cdot\Bigg{|}\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{V(\Gamma)}}-\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\Bigg{|}
𝔼[X2]Δ/2C(Γ)2V+|𝔼[XΨβ(X)]|𝔼[|X||1V(Γ)1Ψβ(X)|]\displaystyle\leq\mathbb{E}[X^{\prime 2}]\frac{\Delta/2}{C^{\prime}(\Gamma)^{2}V}+\Bigg{|}\mathbb{E}\left[\frac{X^{\prime}}{\sqrt{\Psi_{\beta}(X^{\prime})}}\right]\Bigg{|}\cdot\mathbb{E}\left[|X^{\prime}|\cdot\Bigg{|}\frac{1}{\sqrt{V(\Gamma)}}-\frac{1}{\sqrt{\Psi_{\beta}(X^{\prime})}}\Bigg{|}\right]
(b)𝔼[X2]Δ/2C(Γ)2V+ΔnC(Γ)ΔC(Γ)V𝔼[|X|]\displaystyle\stackrel{{\scriptstyle(b)}}{{\leq}}\mathbb{E}[X^{\prime 2}]\frac{\Delta/2}{C^{\prime}(\Gamma)^{2}V}+\frac{\Delta}{\sqrt{n}C^{\prime}(\Gamma)}\cdot\frac{\Delta}{C^{\prime}(\Gamma)\sqrt{V}}\mathbb{E}\left[|X^{\prime}|\right]
(c)VnΔ/2C(Γ)2V+ΔnC(Γ)ΔC(Γ)VVn.\displaystyle\stackrel{{\scriptstyle(c)}}{{\leq}}\frac{V}{n}\frac{\Delta/2}{C^{\prime}(\Gamma)^{2}V}+\frac{\Delta}{\sqrt{n}C^{\prime}(\Gamma)}\cdot\frac{\Delta}{C^{\prime}(\Gamma)\sqrt{V}}\sqrt{\frac{V}{n}}. (38)

In inequality (a)(a), we used (30)(\ref{aaronerr2}). In inequality (b)(b), we used (29)(\ref{aaronerr1}) and (36)(\ref{bzuyv}), noting that 𝔼[X]=0\mathbb{E}[X^{\prime}]=0. In inequality (c)(c), we used the inequality 𝔼[|X|]𝔼[X2]=Var(X)V/n\mathbb{E}[|X^{\prime}|]\leq\sqrt{\mathbb{E}[X^{\prime 2}]}=\sqrt{\text{Var}(X^{\prime})}\leq\sqrt{V/n}. Therefore, from (38)(\ref{finbhikarle}), we have

Var(nC(Γ)ψβ(c(𝐗))(Γc(𝐗)))\displaystyle\text{Var}\left(\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}\left(\Gamma-c(\mathbf{X})\right)\right)
Var(nC(Γ)V(Γ)(Γc(𝐗)))+Δ\displaystyle\leq\text{Var}\left(\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{V(\Gamma)}}\left(\Gamma-c(\mathbf{X})\right)\right)+\Delta
C(Γ)2VV(Γ)+Δ.\displaystyle\leq\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta. (39)

Define the random variable Π\Pi^{\prime} as

Π:=nC(Γ)ψβ(c(𝐗))(Γc(𝐗))\displaystyle\Pi^{\prime}:=\frac{\sqrt{n}C^{\prime}(\Gamma)}{\sqrt{\psi_{\beta}(c(\mathbf{X}))}}\left(\Gamma-c(\mathbf{X})\right)

so that, from (37)(\ref{Piprimemean}) and (39)(\ref{Piprimevar}), 𝔼[Π]Δ\mathbb{E}[\Pi^{\prime}]\geq-\Delta and Var(Π)C(Γ)2VV(Γ)+Δ\text{Var}(\Pi^{\prime})\leq\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta. Then, from (35)(\ref{pq}), for sufficiently large nn,

(P¯W)(logW(𝐘|𝐗)Q(𝐘)>nC(Γ)+nr)\displaystyle(\overline{P}\circ W)\left(\log\frac{W(\mathbf{Y}|\mathbf{X})}{Q^{*}(\mathbf{Y})}>nC(\Gamma)+\sqrt{n}r\right)
1+2κninfΠ𝔼[Φ(Π+rφ(r))]\displaystyle\leq 1+2\frac{\kappa}{\sqrt{n}}-\inf_{\Pi^{\prime}}\mathbb{E}\left[\Phi\left(\Pi^{\prime}+\frac{r}{\sqrt{\varphi(r)}}\right)\right]
=(a)1+2κninfΠ𝔼[Φ(Π)]\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}1+2\frac{\kappa}{\sqrt{n}}-\inf_{\Pi}\mathbb{E}\left[\Phi(\Pi)\right]
=(b)1+2κn𝒦(Δ+rφ(r),C(Γ)2VV(Γ)+Δ).\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}1+2\frac{\kappa}{\sqrt{n}}-\mathcal{K}\left(-\Delta+\frac{r}{\sqrt{\varphi(r)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right). (40)

The infimum in equality (a)(a) above is over all random variables Π\Pi such that 𝔼[Π]Δ+rφ(r)\mathbb{E}[\Pi]\geq-\Delta+\frac{r}{\sqrt{\varphi(r)}} and Var(Π)C(Γ)2VV(Γ)+Δ.\text{Var}(\Pi)\leq\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta. Equality (b)(b) follows by the definition and properties of the 𝒦\mathcal{K} function given in Definition 4 and Lemma 3.

Using (40)(\ref{iskosocrmain}) to upper bound (27)(\ref{bx}) and using that upper bound in the result of Lemma 8 (with ρ=exp(nC(Γ)+nr)\rho=\exp\left(nC(\Gamma)+\sqrt{n}r\right)), we obtain

logM(n,ϵ,Γ,V)nC(Γ)+nrlog[(ϵ2κn+𝒦(Δ+rφ(r),C(Γ)2VV(Γ)+Δ))+]\displaystyle\log M^{*}(n,\epsilon,\Gamma,V)\leq nC(\Gamma)+\sqrt{n}r-\log\left[\left(-\epsilon-2\frac{\kappa}{\sqrt{n}}+\mathcal{K}\left(-\Delta+\frac{r}{\sqrt{\varphi(r)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\right)^{+}\right]
logM(n,ϵ,Γ,V)nC(Γ)nr1nlog[(ϵ2κn+𝒦(Δ+rφ(r),C(Γ)2VV(Γ)+Δ))+].\displaystyle\frac{\log M^{*}(n,\epsilon,\Gamma,V)-nC(\Gamma)}{\sqrt{n}}\leq r-\frac{1}{\sqrt{n}}\log\left[\left(-\epsilon-2\frac{\kappa}{\sqrt{n}}+\mathcal{K}\left(-\Delta+\frac{r}{\sqrt{\varphi(r)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\right)^{+}\right]. (41)

For any given average error probability ϵ(0,1)\epsilon\in(0,1), we choose rr in (41)(\ref{rchoosekar}) as

r\displaystyle r ={r1+η if ϵ(0,ϵ]r2+η if ϵ(ϵ,1),\displaystyle=\begin{cases}r_{1}^{*}+\eta&\text{ if }\epsilon\in\left(0,\epsilon^{*}\right]\\ r_{2}^{*}+\eta&\text{ if }\epsilon\in\left(\epsilon^{*},1\right),\end{cases} (42)

where

r1\displaystyle r^{*}_{1} :=max{r:𝒦(Δ+rV(Γ)δβ,C(Γ)2VV(Γ)+Δ)ϵ},\displaystyle:=\max\left\{r^{\prime}:\mathcal{K}\left(-\Delta+\frac{r^{\prime}}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\leq\epsilon\right\}, (43)
r2\displaystyle r^{*}_{2} :=max{r:𝒦(Δ+rV(Γ)+δβ,C(Γ)2VV(Γ)+Δ)ϵ},\displaystyle:=\max\left\{r^{\prime}:\mathcal{K}\left(-\Delta+\frac{r^{\prime}}{\sqrt{V(\Gamma)+\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\leq\epsilon\right\}, (44)
ϵ\displaystyle\epsilon^{*} :=𝒦(ΔηV(Γ)δβ,C(Γ)2VV(Γ)+Δ).\displaystyle:=\mathcal{K}\left(-\Delta-\frac{\eta}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right).

Note that in (42)(\ref{rfuncepsilon}), r0r\leq 0 for ϵϵ\epsilon\leq\epsilon^{*} and r>0r>0 for ϵ>ϵ\epsilon>\epsilon^{*}.

Therefore, for ϵϵ\epsilon\leq\epsilon^{*}, we have

logM(n,ϵ,Γ,V)nC(Γ)nr1+η1nlog[(ϵ2κn+𝒦(Δ+r1+ηV(Γ)δβ,C(Γ)2VV(Γ)+Δ))+].\displaystyle\frac{\log M^{*}(n,\epsilon,\Gamma,V)-nC(\Gamma)}{\sqrt{n}}\leq r_{1}^{*}+\eta-\frac{1}{\sqrt{n}}\log\left[\left(-\epsilon-\frac{2\kappa}{\sqrt{n}}+\mathcal{K}\left(-\Delta+\frac{r_{1}^{*}+\eta}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\right)^{+}\right]. (45)

Since 𝒦(,)\mathcal{K}(\cdot\,,\cdot) is strictly increasing in the first argument, we have

𝒦(Δ+r1+ηV(Γ)δβ,C(Γ)2VV(Γ)+Δ)>ϵ\displaystyle\mathcal{K}\left(-\Delta+\frac{r_{1}^{*}+\eta}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)>\epsilon

from the definition of r1r_{1}^{*} in (43)(\ref{r1star}) and the fact that η>0\eta>0. Hence, taking the limit supremum as nn\to\infty in (45)(\ref{rchoosekarvv}), we obtain

lim supnlogM(n,ϵ,Γ,V)nC(Γ)nr1+η\displaystyle\limsup_{n\to\infty}\,\frac{\log M^{*}(n,\epsilon,\Gamma,V)-nC(\Gamma)}{\sqrt{n}}\leq r_{1}^{*}+\eta (46)

for ϵϵ\epsilon\leq\epsilon^{*}. For ϵ>ϵ\epsilon>\epsilon^{*}, a similar derivation gives us

lim supnlogM(n,ϵ,Γ,V)nC(Γ)nr2+η.\displaystyle\limsup_{n\to\infty}\,\frac{\log M^{*}(n,\epsilon,\Gamma,V)-nC(\Gamma)}{\sqrt{n}}\leq r_{2}^{*}+\eta. (47)

We now let Δ,η,β\Delta,\eta,\beta and δβ\delta_{\beta} go to zero in both (46)(\ref{pq2}) and (47)(\ref{pq3}). Then using the fact from Lemma 3 that 𝒦(,)\mathcal{K}(\cdot,\cdot) is continuous, we obtain

lim supnlogM(n,ϵ,Γ,V)nC(Γ)nmax{r:𝒦(rV(Γ),C(Γ)2VV(Γ))ϵ}\displaystyle\limsup_{n\to\infty}\,\frac{\log M^{*}(n,\epsilon,\Gamma,V)-nC(\Gamma)}{\sqrt{n}}\leq\max\left\{r:\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right)\leq\epsilon\right\}

for all ϵ(0,1)\epsilon\in(0,1).
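The thresholds r1r_{1}^{*} and r2r_{2}^{*} in (43)–(44) are solutions of a one-dimensional monotone feasibility problem. As an illustrative sketch only, the maximization max{r′ : 𝒦(·) ≤ ε} can be computed by bisection; since the exact form of 𝒦(·,·) is not reproduced here, the code below uses a hypothetical stand-in (a rescaled standard normal CDF), assumed merely because it shares the continuity and strict monotonicity in the first argument that the argument relies on.

```python
import math

def Phi(x):
    # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def K_standin(a, b):
    # Hypothetical stand-in for the paper's K(., .): any function that is
    # continuous and strictly increasing in its first argument works here.
    return Phi(a / math.sqrt(1.0 + b))

def r_star(eps, Delta, V_shift, b, lo=-50.0, hi=50.0, iters=100):
    # max{r' : K(-Delta + r'/sqrt(V_shift), b) <= eps} by bisection,
    # valid because K_standin is strictly increasing in its first argument
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if K_standin(-Delta + mid / math.sqrt(V_shift), b) <= eps:
            lo = mid
        else:
            hi = mid
    return lo

# for the stand-in, eps = 1/2 and Delta = 0 give r* = 0 exactly
assert abs(r_star(0.5, 0.0, 1.0, 0.0)) < 1e-6
```

With the paper's true 𝒦 substituted for the stand-in, the same bisection applies verbatim, since only monotonicity and continuity are used.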

The converse result can also be stated in terms of a lower bound on the minimum average probability of error of (n,R,Γ,V)(n,R,\Gamma,V) codes for a rate R=C(Γ)+rnR=C(\Gamma)+\frac{r}{\sqrt{n}}. Starting again from Lemma 8, we have that for (n,R,Γ,V)(n,R,\Gamma,V) codes with minimum average error probability ϵ\epsilon,

logexp(nR)\displaystyle\log\lceil\exp(nR)\rceil logρlog[(1ϵsupP¯𝒫Γ,V(n)infq𝒫(n)(P¯W)(W(𝐘|𝐗)q(𝐘)>ρ))+].\displaystyle\leq\log\rho-\log\left[\left(1-\epsilon-\sup_{\overline{P}\in\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n})}\,\inf_{q\in\mathcal{P}(\mathbb{R}^{n})}(\overline{P}\circ W)\left(\frac{W(\mathbf{Y}|\mathbf{X})}{q(\mathbf{Y})}>\rho\right)\right)^{+}\right]. (48)

Assume first that r0r\leq 0 and let ρ=exp(nC(Γ)+rn)\rho=\exp\left(nC(\Gamma)+r^{\prime}\sqrt{n}\right) for some arbitrary r<rr^{\prime}<r. It directly follows from (40)(\ref{iskosocrmain}) and the definition of φ()\varphi(\cdot) in (31)(\ref{varphidefinitionr}) that

supP¯𝒫Γ,V(n)infq𝒫(n)(P¯W)(W(𝐘|𝐗)q(𝐘)>ρ)\displaystyle\sup_{\overline{P}\in\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n})}\,\inf_{q\in\mathcal{P}(\mathbb{R}^{n})}(\overline{P}\circ W)\left(\frac{W(\mathbf{Y}|\mathbf{X})}{q(\mathbf{Y})}>\rho\right)
1+2κn𝒦(Δ+rV(Γ)δβ,C(Γ)2VV(Γ)+Δ).\displaystyle\leq 1+2\frac{\kappa}{\sqrt{n}}-\mathcal{K}\left(-\Delta+\frac{r^{\prime}}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right). (49)

From (48)(\ref{qandp22}) and (49)(\ref{nogames6}), we have

logexp(nR)\displaystyle\log\lceil\exp(nR)\rceil logρlog[(ϵ2κn+𝒦(Δ+rV(Γ)δβ,C(Γ)2VV(Γ)+Δ))+]\displaystyle\leq\log\rho-\log\left[\left(-\epsilon-2\frac{\kappa}{\sqrt{n}}+\mathcal{K}\left(-\Delta+\frac{r^{\prime}}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\right)^{+}\right] (50)

which, since logexp(nR)nR=nC(Γ)+rn\log\lceil\exp(nR)\rceil\geq nR=nC(\Gamma)+r\sqrt{n} and logρ=nC(Γ)+rn\log\rho=nC(\Gamma)+r^{\prime}\sqrt{n}, rearranges to

ϵ2κn+𝒦(Δ+rV(Γ)δβ,C(Γ)2VV(Γ)+Δ)e(rr)n.\displaystyle-\epsilon-2\frac{\kappa}{\sqrt{n}}+\mathcal{K}\left(-\Delta+\frac{r^{\prime}}{\sqrt{V(\Gamma)-\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right)\leq e^{-(r-r^{\prime})\sqrt{n}}.

Taking the limit as nn\to\infty and letting rrr^{\prime}\to r, Δ0\Delta\to 0, β0\beta\to 0 and δβ0\delta_{\beta}\to 0, we obtain

ϵ𝒦(rV(Γ),C(Γ)2VV(Γ))\displaystyle\epsilon\geq\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right) (51)

for any r0r\leq 0. For r>0r>0, let ρ=exp(nC(Γ)+rn)\rho=\exp\left(nC(\Gamma)+r^{\prime}\sqrt{n}\right) for some arbitrary 0<r<r0<r^{\prime}<r. Then from (40)(\ref{iskosocrmain}) and the definition of φ()\varphi(\cdot) in (31)(\ref{varphidefinitionr}), we have that

supP¯𝒫Γ,V(n)infq𝒫(n)(P¯W)(W(𝐘|𝐗)q(𝐘)>ρ)\displaystyle\sup_{\overline{P}\in\mathcal{P}_{\Gamma,V}(\mathbb{R}^{n})}\,\inf_{q\in\mathcal{P}(\mathbb{R}^{n})}(\overline{P}\circ W)\left(\frac{W(\mathbf{Y}|\mathbf{X})}{q(\mathbf{Y})}>\rho\right)
1+2κn𝒦(Δ+rV(Γ)+δβ,C(Γ)2VV(Γ)+Δ).\displaystyle\leq 1+2\frac{\kappa}{\sqrt{n}}-\mathcal{K}\left(-\Delta+\frac{r^{\prime}}{\sqrt{V(\Gamma)+\delta_{\beta}}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}+\Delta\right). (52)

Then a similar derivation to that used from (50)(\ref{z1s}) to (51)(\ref{z1s2}) gives us

ϵ𝒦(rV(Γ),C(Γ)2VV(Γ))\displaystyle\epsilon\geq\mathcal{K}\left(\frac{r}{\sqrt{V(\Gamma)}},\frac{C^{\prime}(\Gamma)^{2}V}{V(\Gamma)}\right)

for all r>0r>0.

Appendix A Proof of Lemma 2

Let YW(|x)=𝒩(x,N)Y\sim W(\cdot|x)=\mathcal{N}(x,N). We have

W(y|x)\displaystyle W(y|x) =12πNexp((yx)22N)\displaystyle=\frac{1}{\sqrt{2\pi N}}\exp\left(-\frac{(y-x)^{2}}{2N}\right)
Q(y)\displaystyle Q^{*}(y) =12π(Γ+N)exp(y22(Γ+N))\displaystyle=\frac{1}{\sqrt{2\pi(\Gamma+N)}}\exp\left(-\frac{y^{2}}{2(\Gamma+N)}\right)
W(Y|x)Q(Y)\displaystyle\frac{W(Y|x)}{Q^{*}(Y)} =Γ+NNexp((Yx)22N+Y22(Γ+N))\displaystyle=\frac{\sqrt{\Gamma+N}}{\sqrt{N}}\exp\left(-\frac{(Y-x)^{2}}{2N}+\frac{Y^{2}}{2(\Gamma+N)}\right)
logW(Y|x)Q(Y)\displaystyle\log\frac{W(Y|x)}{Q^{*}(Y)} =12log(1+ΓN)(Yx)22N+Y22(Γ+N)\displaystyle=\frac{1}{2}\log\left(1+\frac{\Gamma}{N}\right)-\frac{(Y-x)^{2}}{2N}+\frac{Y^{2}}{2(\Gamma+N)}
𝔼[logW(Y|x)Q(Y)]\displaystyle\mathbb{E}\left[\log\frac{W(Y|x)}{Q^{*}(Y)}\right] =12log(1+ΓN)Γx22(Γ+N).\displaystyle=\frac{1}{2}\log\left(1+\frac{\Gamma}{N}\right)-\frac{\Gamma-x^{2}}{2(\Gamma+N)}.

Hence,

Ti\displaystyle T_{i} =Yi22(Γ+N)(Yixi)22N+Γxi22(Γ+N).\displaystyle=\frac{Y_{i}^{2}}{2(\Gamma+N)}-\frac{(Y_{i}-x_{i})^{2}}{2N}+\frac{\Gamma-x_{i}^{2}}{2(\Gamma+N)}. (53)

Using the relation Yi=xi+ZiY_{i}=x_{i}+Z_{i}, we can write

i=1nTi\displaystyle\sum_{i=1}^{n}T_{i} =i=1n(xi+Zi)22(Γ+N)i=1nZi22N+nΓnc(𝐗)2(Γ+N)\displaystyle=\sum_{i=1}^{n}\frac{(x_{i}+Z_{i})^{2}}{2(\Gamma+N)}-\sum_{i=1}^{n}\frac{Z_{i}^{2}}{2N}+\frac{n\Gamma-nc(\mathbf{X})}{2(\Gamma+N)}
=nc(𝐗)2(Γ+N)+i=1nZi22(Γ+N)+i=1nxiZiΓ+Ni=1nZi22N+nΓnc(𝐗)2(Γ+N)\displaystyle=\frac{nc(\mathbf{X})}{2(\Gamma+N)}+\sum_{i=1}^{n}\frac{Z_{i}^{2}}{2(\Gamma+N)}+\sum_{i=1}^{n}\frac{x_{i}Z_{i}}{\Gamma+N}-\sum_{i=1}^{n}\frac{Z_{i}^{2}}{2N}+\frac{n\Gamma-nc(\mathbf{X})}{2(\Gamma+N)}
=i=1n[Zi22(Γ+N)+xiZiΓ+NZi22N]+nΓ2(Γ+N).\displaystyle=\sum_{i=1}^{n}\left[\frac{Z_{i}^{2}}{2(\Gamma+N)}+\frac{x_{i}Z_{i}}{\Gamma+N}-\frac{Z_{i}^{2}}{2N}\right]+\frac{n\Gamma}{2(\Gamma+N)}.

By completing the square and writing Zi=NSiZ_{i}=\sqrt{N}S_{i}, where the SiS_{i} are i.i.d. 𝒩(0,1)\mathcal{N}(0,1), we can write

i=1nTi\displaystyle\sum_{i=1}^{n}T_{i} =Γ2(Γ+N)i=1n(SixiNΓ)2+Nnc(𝐗)2Γ(Γ+N)+nΓ2(Γ+N).\displaystyle=-\frac{\Gamma}{2(\Gamma+N)}\sum_{i=1}^{n}\left(S_{i}-x_{i}\frac{\sqrt{N}}{\Gamma}\right)^{2}+\frac{Nnc(\mathbf{X})}{2\Gamma(\Gamma+N)}+\frac{n\Gamma}{2(\Gamma+N)}.

Since

i=1n(SixiNΓ)2\sum_{i=1}^{n}\left(S_{i}-x_{i}\frac{\sqrt{N}}{\Gamma}\right)^{2}

has a noncentral chi-squared distribution with nn degrees of freedom and noncentrality parameter λ\lambda given by

λ=Nnc(𝐗)Γ2,\displaystyle\lambda=\frac{Nnc(\mathbf{X})}{\Gamma^{2}},

the assertion of the lemma follows.
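As a numerical sanity check (outside the proof), the completed-square identity above and the noncentrality parameter λ can be verified directly; the sketch below assumes the quadratic cost c(x) = x² implicit in the algebra.

```python
import numpy as np

rng = np.random.default_rng(0)
n, Gamma, N = 50, 2.0, 1.5
x = rng.normal(size=n)                    # an arbitrary channel input
Z = rng.normal(scale=np.sqrt(N), size=n)  # noise with variance N
Y = x + Z
c = np.mean(x**2)                         # cost c(x) = average power

# sum of T_i directly from (53)
T_sum = np.sum(Y**2 / (2*(Gamma+N)) - (Y-x)**2 / (2*N)
               + (Gamma - x**2) / (2*(Gamma+N)))

# completed-square form with Z_i = sqrt(N) S_i
S = Z / np.sqrt(N)
quad = np.sum((S - x*np.sqrt(N)/Gamma)**2)
T_alt = (-Gamma/(2*(Gamma+N)) * quad
         + N*n*c/(2*Gamma*(Gamma+N)) + n*Gamma/(2*(Gamma+N)))
assert np.isclose(T_sum, T_alt)

# noncentrality parameter of the quadratic form matches lambda
lam = np.sum((x*np.sqrt(N)/Gamma)**2)
assert np.isclose(lam, N*n*c/Gamma**2)
```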

Appendix B Proof of Lemma 5

Define 𝐘~=𝐗~+𝐙~\tilde{\mathbf{Y}}=\tilde{\mathbf{X}}+\tilde{\mathbf{Z}}, where 𝐗~\tilde{\mathbf{X}} and 𝐙~\tilde{\mathbf{Z}} are independent, 𝐗~\tilde{\mathbf{X}} is uniformly distributed on an (n1)(n-1)-sphere of radius RN\frac{R}{\sqrt{N}} and 𝐙~𝒩(𝟎,In)\tilde{\mathbf{Z}}\sim\mathcal{N}(\mathbf{0},I_{n}). Let Q1ccQ^{cc}_{1} denote the PDF of 𝐘~\tilde{\mathbf{Y}}. From [13, Proposition 1], we have

Q1cc(𝐲~)=Γ(n2)2πn/2exp(R2+N𝐲~22N)(NR𝐲~)n21In21(R𝐲~N).\displaystyle Q^{cc}_{1}(\tilde{\mathbf{y}})=\frac{\Gamma\left(\frac{n}{2}\right)}{2\pi^{n/2}}\cdot\exp\left(-\frac{R^{2}+N||\tilde{\mathbf{y}}||^{2}}{2N}\right)\left(\frac{\sqrt{N}}{R||\tilde{\mathbf{y}}||}\right)^{\frac{n}{2}-1}I_{\frac{n}{2}-1}\left(\frac{R||\tilde{\mathbf{y}}||}{\sqrt{N}}\right).

Since 𝐘=dN𝐘~\mathbf{Y}\stackrel{{\scriptstyle d}}{{=}}\sqrt{N}\tilde{\mathbf{Y}}, we can apply the change-of-variables formula

Qcc(𝐲)\displaystyle Q^{cc}(\mathbf{y}) =Q1cc(1N𝐲)det(1NIn)\displaystyle=Q_{1}^{cc}\left(\frac{1}{\sqrt{N}}\mathbf{y}\right)\cdot\text{det}\left(\frac{1}{\sqrt{N}}I_{n}\right)

to obtain the result.
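As a quick consistency check of the resulting density (outside the proof): for n = 1 the sphere of radius R degenerates to the two points {+R, −R}, so Q^cc must reduce to an equal mixture of N(R, N) and N(−R, N); here I_{−1/2}(x) = sqrt(2/(πx)) cosh(x) in closed form.

```python
import math

R, N = 2.0, 1.5

def Qcc_n1(y):
    # the n = 1 case of the closed-form density, with Gamma(1/2) = sqrt(pi)
    x = R * abs(y) / N
    bessel = math.sqrt(2.0 / (math.pi * x)) * math.cosh(x)   # I_{-1/2}(x)
    return (math.sqrt(math.pi) / (2.0 * math.sqrt(math.pi * N))
            * math.exp(-(R**2 + y**2) / (2.0 * N))
            * math.sqrt(R * abs(y) / N)                      # (N/(R|y|))^{-1/2}
            * bessel)

def gauss(y, mu, var):
    return math.exp(-(y - mu)**2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

# Q^cc equals the two-point Gaussian mixture, as it must for n = 1
for y in (0.3, 1.0, 2.5, -1.7):
    mix = 0.5 * (gauss(y, R, N) + gauss(y, -R, N))
    assert abs(Qcc_n1(y) - mix) < 1e-12
```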

Appendix C Proof of Lemma 6

From Lemma 5, we have

Qcc(𝐲)\displaystyle Q^{cc}(\mathbf{y}) =Γ(n2)2(πN)n/2exp(nΓ+𝐲22N)(NnΓ𝐲)n21In21(nΓ𝐲N)\displaystyle=\frac{\Gamma\left(\frac{n}{2}\right)}{2(\pi N)^{n/2}}\cdot\exp\left(-\frac{n\Gamma+||\mathbf{y}||^{2}}{2N}\right)\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)^{\frac{n}{2}-1}I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)

and

Q0cc(𝐲)\displaystyle Q^{cc}_{0}(\mathbf{y}) =Γ(n2)2(πN)n/2exp(nΓ+𝐲22N)(NnΓ𝐲)n21In21(nΓ𝐲N).\displaystyle=\frac{\Gamma\left(\frac{n}{2}\right)}{2(\pi N)^{n/2}}\cdot\exp\left(-\frac{n\Gamma^{\prime}+||\mathbf{y}||^{2}}{2N}\right)\left(\frac{N}{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}\right)^{\frac{n}{2}-1}I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}{N}\right).

Hence,

Q0cc(𝐲)Qcc(𝐲)\displaystyle\frac{Q^{cc}_{0}(\mathbf{y})}{Q^{cc}(\mathbf{y})} =exp(nΓ2N)(NnΓ𝐲)n21In21(nΓ𝐲N)exp(nΓ2N)(NnΓ𝐲)n21In21(nΓ𝐲N)\displaystyle=\frac{\exp\left(-\frac{n\Gamma^{\prime}}{2N}\right)\left(\frac{N}{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}\right)^{\frac{n}{2}-1}I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}{N}\right)}{\exp\left(-\frac{n\Gamma}{2N}\right)\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)^{\frac{n}{2}-1}I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)}
=exp(n2N(ΓΓ))(nΓ𝐲nΓ𝐲)n21In21(nΓ𝐲N)In21(nΓ𝐲N)\displaystyle=\exp\left(\frac{n}{2N}(\Gamma-\Gamma^{\prime})\right)\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}\right)^{\frac{n}{2}-1}\frac{I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}{N}\right)}{I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)}
=exp(n2N(ΓΓ))(ΓΓ)n412In21(nΓ𝐲N)In21(nΓ𝐲N).\displaystyle=\exp\left(\frac{n}{2N}(\Gamma-\Gamma^{\prime})\right)\left(\frac{\Gamma}{\Gamma^{\prime}}\right)^{\frac{n}{4}-\frac{1}{2}}\frac{I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}{N}\right)}{I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)}.

Hence,

logQ0cc(𝐲)Qcc(𝐲)\displaystyle\log\frac{Q^{cc}_{0}(\mathbf{y})}{Q^{cc}(\mathbf{y})}
=n2N(ΓΓ)+(n412)log(ΓΓ)+logIn21(nΓ𝐲N)logIn21(nΓ𝐲N)\displaystyle=\frac{n}{2N}(\Gamma-\Gamma^{\prime})+\left(\frac{n}{4}-\frac{1}{2}\right)\log\left(\frac{\Gamma}{\Gamma^{\prime}}\right)+\log I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}{N}\right)-\log I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)
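The simplification above can be checked numerically (outside the proof) at n = 3, where I_{n/2−1} = I_{1/2} has the closed form sqrt(2/(πx)) sinh(x): the direct difference of the two Lemma 5 log-densities must match the displayed expression.

```python
import math

def logI_half(x):
    # I_{1/2}(x) = sqrt(2/(pi x)) sinh(x), the n = 3 case of I_{n/2-1}
    return 0.5 * math.log(2.0 / (math.pi * x)) + math.log(math.sinh(x))

n, N, Gam, GamP = 3, 1.0, 2.0, 2.5
ynorm = math.sqrt(n * (Gam + N))          # a typical output norm

def logQcc(radius2):
    # log of the Lemma 5 density for input radius sqrt(n * radius2)
    a = math.sqrt(n * radius2) * ynorm / N
    return (math.lgamma(n / 2) - math.log(2) - (n / 2) * math.log(math.pi * N)
            - (n * radius2 + ynorm**2) / (2 * N)
            + (n / 2 - 1) * math.log(N / (math.sqrt(n * radius2) * ynorm))
            + logI_half(a))

direct = logQcc(GamP) - logQcc(Gam)
simplified = ((n / (2 * N)) * (Gam - GamP)
              + (n / 4 - 0.5) * math.log(Gam / GamP)
              + logI_half(math.sqrt(n * GamP) * ynorm / N)
              - logI_half(math.sqrt(n * Gam) * ynorm / N))
assert abs(direct - simplified) < 1e-9
```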

To approximate the Bessel function, we first rewrite it as

In21(nΓ𝐲N)\displaystyle I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right) =Iν(νz),\displaystyle=I_{\nu}\left(\nu z\right),

where ν=n21\nu=\frac{n}{2}-1 and z=2nΓ𝐲N(n2)z=\frac{2\sqrt{n\Gamma}||\mathbf{y}||}{N(n-2)}. Since 𝐲𝒫n\mathbf{y}\in\mathcal{P}_{n}^{*} by assumption, we have that zz lies in a compact interval [a,b](0,)[a,b]\subset(0,\infty) for sufficiently large nn, where 0<a<b<0<a<b<\infty. Hence, we can use a uniform asymptotic expansion of the modified Bessel function (see [14, 10.41.3] whose interpretation is given in [14, 2.1(iv)]): as ν\nu\to\infty, we have

Iν(νz)=eνη(2πν)1/2(1+z2)14(1+O(1ν)),\displaystyle I_{\nu}(\nu z)=\frac{e^{\nu\eta}}{(2\pi\nu)^{1/2}(1+z^{2})^{\frac{1}{4}}}\left(1+O\left(\frac{1}{\nu}\right)\right), (54)

where

η=1+z2+log(z1+(1+z2)1/2).\eta=\sqrt{1+z^{2}}+\log\left(\frac{z}{1+(1+z^{2})^{1/2}}\right).
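As a numerical illustration (not needed for the proof), the expansion (54) can be checked against the ascending series I_ν(x) = Σ_{k≥0} (x/2)^{ν+2k}/(k! Γ(ν+k+1)), summed in log space for stability.

```python
import math

def logI(nu, x, kmax=200):
    # log I_nu(x) via the ascending series, summed stably in log space
    terms = [(nu + 2 * k) * math.log(x / 2.0) - math.lgamma(k + 1)
             - math.lgamma(nu + k + 1) for k in range(kmax)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

nu, z = 50.0, 1.0
eta = math.sqrt(1 + z**2) + math.log(z / (1 + math.sqrt(1 + z**2)))
log_approx = (nu * eta - 0.5 * math.log(2 * math.pi * nu)
              - 0.25 * math.log(1 + z**2))
# the relative error of (54) is O(1/nu), so the log difference is tiny
assert abs(logI(nu, nu * z) - log_approx) < 5e-3
```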

Since zz lies in the compact interval [a,b](0,)[a,b]\subset(0,\infty), the O(1/ν)O(1/\nu) term in (54)(\ref{besselapprox22}) is uniformly bounded over 𝒫n\mathcal{P}_{n}^{*}. Using the approximation in (54)(\ref{besselapprox22}), we have

logIn21(nΓ𝐲N)\displaystyle\log I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right) =n21+4𝐲2nΓN2(n2)2+n2log(2𝐲nΓN(n2))n2log(1+1+4𝐲2nΓN2(n2)2)+O(logn),\displaystyle=\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}+\frac{n}{2}\log\left(\frac{2||\mathbf{y}||\sqrt{n\Gamma}}{N(n-2)}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+O(\log n), (55)

where the O(logn)O(\log n) term is uniformly bounded over 𝒫n\mathcal{P}_{n}^{*}. Using (55)(\ref{z322}), we have

logIn21(nΓ𝐲N)logIn21(nΓ𝐲N)\displaystyle\log I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma^{\prime}}||\mathbf{y}||}{N}\right)-\log I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)
=n21+4𝐲2nΓN2(n2)2+n2log(2𝐲nΓN(n2))n2log(1+1+4𝐲2nΓN2(n2)2)\displaystyle=\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}+\frac{n}{2}\log\left(\frac{2||\mathbf{y}||\sqrt{n\Gamma^{\prime}}}{N(n-2)}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}\right)-\mbox{}
n21+4𝐲2nΓN2(n2)2n2log(2𝐲nΓN(n2))+n2log(1+1+4𝐲2nΓN2(n2)2)+O(logn)\displaystyle\quad\quad\quad\quad\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}-\frac{n}{2}\log\left(\frac{2||\mathbf{y}||\sqrt{n\Gamma}}{N(n-2)}\right)+\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+O(\log n)
=n21+4𝐲2nΓN2(n2)2+n4log(ΓΓ)n2log(1+1+4𝐲2nΓN2(n2)2)\displaystyle=\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}+\frac{n}{4}\log\left(\frac{\Gamma^{\prime}}{\Gamma}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}\right)-\mbox{}
n21+4𝐲2nΓN2(n2)2+n2log(1+1+4𝐲2nΓN2(n2)2)+O(logn).\displaystyle\quad\quad\quad\quad\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}+\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+O(\log n).

Hence,

logQ0cc(𝐲)Qcc(𝐲)\displaystyle\log\frac{Q^{cc}_{0}(\mathbf{y})}{Q^{cc}(\mathbf{y})}
=n2N(ΓΓ)+(n412)log(ΓΓ)+\displaystyle=\frac{n}{2N}(\Gamma-\Gamma^{\prime})+\left(\frac{n}{4}-\frac{1}{2}\right)\log\left(\frac{\Gamma}{\Gamma^{\prime}}\right)+\mbox{}
n21+4𝐲2nΓN2(n2)2+n4log(ΓΓ)n2log(1+1+4𝐲2nΓN2(n2)2)\displaystyle\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}+\frac{n}{4}\log\left(\frac{\Gamma^{\prime}}{\Gamma}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}\right)-\mbox{}
n21+4𝐲2nΓN2(n2)2+n2log(1+1+4𝐲2nΓN2(n2)2)+O(logn)\displaystyle\quad\quad\quad\quad\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}+\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+O(\log n)
=n2N(ΓΓ)+12log(ΓΓ)+n21+4𝐲2nΓN2(n2)2n2log(1+1+4𝐲2nΓN2(n2)2)\displaystyle=\frac{n}{2N}(\Gamma-\Gamma^{\prime})+\frac{1}{2}\log\left(\frac{\Gamma^{\prime}}{\Gamma}\right)+\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma^{\prime}}{N^{2}(n-2)^{2}}}\right)-\mbox{}
n21+4𝐲2nΓN2(n2)2+n2log(1+1+4𝐲2nΓN2(n2)2)+O(logn).\displaystyle\quad\quad\quad\quad\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}+\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+O(\log n).

To simplify the notation, we write δ=𝐲2n(Γ+N)\delta=\frac{||\mathbf{y}||^{2}}{n}-(\Gamma+N). Recall that Γ=Γ+ϵ\Gamma^{\prime}=\Gamma+\epsilon and |δ|Δ|\delta|\leq\Delta. Then

logQ0cc(𝐲)Qcc(𝐲)\displaystyle\log\frac{Q^{cc}_{0}(\mathbf{y})}{Q^{cc}(\mathbf{y})}
=n2Nϵ+12log(1+ϵΓ)+12gn(δ,ϵ)12hn(δ,ϵ)12gn(δ,0)+12hn(δ,0)+O(logn),\displaystyle=-\frac{n}{2N}\epsilon+\frac{1}{2}\log\left(1+\frac{\epsilon}{\Gamma}\right)+\frac{1}{2}g_{n}(\delta,\epsilon)-\frac{1}{2}h_{n}(\delta,\epsilon)-\frac{1}{2}g_{n}(\delta,0)+\frac{1}{2}h_{n}(\delta,0)+O(\log n), (56)

where we define

gn(δ,ϵ)\displaystyle g_{n}(\delta,\epsilon) :=n2+4Γ(Γ+N)N2n4(n2)2+4ϵ(Γ+N)N2n4(n2)2+4ΓN2n4(n2)2δ+4ϵN2n4(n2)2δ\displaystyle:=\sqrt{n^{2}+\frac{4\Gamma(\Gamma+N)}{N^{2}}\frac{n^{4}}{(n-2)^{2}}+\frac{4\epsilon(\Gamma+N)}{N^{2}}\frac{n^{4}}{(n-2)^{2}}+\frac{4\Gamma}{N^{2}}\frac{n^{4}}{(n-2)^{2}}\delta+\frac{4\epsilon}{N^{2}}\frac{n^{4}}{(n-2)^{2}}\delta}
hn(δ,ϵ)\displaystyle h_{n}(\delta,\epsilon) :=nlog(1+1+4n2Γ(Γ+N+δ)N2(n2)2+4n2ϵ(Γ+N+δ)N2(n2)2).\displaystyle:=n\log\left(1+\sqrt{1+\frac{4n^{2}\Gamma(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}+\frac{4n^{2}\epsilon(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}}\right).

Using the Taylor series approximation

n2(n2)2=1+4n+O(1n2),\displaystyle\frac{n^{2}}{(n-2)^{2}}=1+\frac{4}{n}+O\left(\frac{1}{n^{2}}\right),

we have

(gn(δ,ϵ))2\displaystyle\left(g_{n}(\delta,\epsilon)\right)^{2}
=n2+4Γ(Γ+N)N2(n2+4n+O(1))+4(Γ+N)N2(n2+4n+O(1))ϵ+\displaystyle=n^{2}+\frac{4\Gamma(\Gamma+N)}{N^{2}}\left(n^{2}+4n+O(1)\right)+\frac{4(\Gamma+N)}{N^{2}}\left(n^{2}+4n+O(1)\right)\epsilon+\mbox{}
4ΓN2(n2+4n+O(1))δ+4N2(n2+4n+O(1))ϵδ\displaystyle\quad\quad\quad\quad\quad\quad\quad\frac{4\Gamma}{N^{2}}\left(n^{2}+4n+O(1)\right)\delta+\frac{4}{N^{2}}\left(n^{2}+4n+O(1)\right)\epsilon\delta
=n2(1+4Γ(Γ+N)N2+4(Γ+N)N2ϵ+4ΓN2δ+4N2ϵδ)+\displaystyle=n^{2}\left(1+\frac{4\Gamma(\Gamma+N)}{N^{2}}+\frac{4(\Gamma+N)}{N^{2}}\epsilon+\frac{4\Gamma}{N^{2}}\delta+\frac{4}{N^{2}}\epsilon\delta\right)+\mbox{}
n(16Γ(Γ+N)N2+16(Γ+N)N2ϵ+16ΓN2δ+16N2ϵδ)+O(1).\displaystyle\quad\quad\quad\quad\quad\quad\quad n\left(\frac{16\Gamma(\Gamma+N)}{N^{2}}+\frac{16(\Gamma+N)}{N^{2}}\epsilon+\frac{16\Gamma}{N^{2}}\delta+\frac{16}{N^{2}}\epsilon\delta\right)+O(1).

Let

A\displaystyle A =1+4Γ(Γ+N)N2\displaystyle=1+\frac{4\Gamma(\Gamma+N)}{N^{2}}
B\displaystyle B =4(Γ+N)N2\displaystyle=\frac{4(\Gamma+N)}{N^{2}}
C\displaystyle C =4ΓN2\displaystyle=\frac{4\Gamma}{N^{2}}
D\displaystyle D =4N2\displaystyle=\frac{4}{N^{2}}
E\displaystyle E =16Γ(Γ+N)N2\displaystyle=\frac{16\Gamma(\Gamma+N)}{N^{2}}

so that

gn(δ,ϵ)\displaystyle g_{n}(\delta,\epsilon)
=n2(A+Bϵ+Cδ+Dϵδ)+n(E+4Bϵ+4Cδ+4Dϵδ)+O(1)\displaystyle=\sqrt{n^{2}\left(A+B\epsilon+C\delta+D\epsilon\delta\right)+n\left(E+4B\epsilon+4C\delta+4D\epsilon\delta\right)+O(1)}
=nA+Bϵ+Cδ+Dϵδ+E+4Bϵ+4Cδ+4Dϵδn+O(n2)\displaystyle=n\sqrt{A+B\epsilon+C\delta+D\epsilon\delta+\frac{E+4B\epsilon+4C\delta+4D\epsilon\delta}{n}+O(n^{-2})}

Further set K=A+Bϵ+Cδ+DϵδK=A+B\epsilon+C\delta+D\epsilon\delta and L=E+4Bϵ+4Cδ+4DϵδL=E+4B\epsilon+4C\delta+4D\epsilon\delta. Then

gn(δ,ϵ)\displaystyle g_{n}(\delta,\epsilon) =nK+Ln+O(n2)\displaystyle=n\sqrt{K+\frac{L}{n}+O(n^{-2})}
=nK1+LKn+O(n2)\displaystyle=n\sqrt{K}\sqrt{1+\frac{L}{Kn}+O(n^{-2})}
=nK(1+L2Kn+O(n2))\displaystyle=n\sqrt{K}\left(1+\frac{L}{2Kn}+O(n^{-2})\right)
=nK+L2K+O(n1).\displaystyle=n\sqrt{K}+\frac{L}{2\sqrt{K}}+O(n^{-1}).

Now

K\displaystyle\sqrt{K} =A+Bϵ+Cδ+Dϵδ\displaystyle=\sqrt{A+B\epsilon+C\delta+D\epsilon\delta}
=A1+BAϵ+CAδ+DAϵδ\displaystyle=\sqrt{A}\sqrt{1+\frac{B}{A}\epsilon+\frac{C}{A}\delta+\frac{D}{A}\epsilon\delta}
=A(1+B2Aϵ+C2Aδ+O(ϵ2,Δ2,ϵΔ))\displaystyle=\sqrt{A}\left(1+\frac{B}{2A}\epsilon+\frac{C}{2A}\delta+O(\epsilon^{2},\Delta^{2},\epsilon\Delta)\right)
=A+B2Aϵ+C2Aδ+O(ϵ2,Δ2,ϵΔ).\displaystyle=\sqrt{A}+\frac{B}{2\sqrt{A}}\epsilon+\frac{C}{2\sqrt{A}}\delta+O(\epsilon^{2},\Delta^{2},\epsilon\Delta).

Thus

nK\displaystyle n\sqrt{K} =nA+B2Anϵ+C2Anδ+O(nϵ2,nΔ2,nϵΔ).\displaystyle=n\sqrt{A}+\frac{B}{2\sqrt{A}}n\epsilon+\frac{C}{2\sqrt{A}}n\delta+O(n\epsilon^{2},n\Delta^{2},n\epsilon\Delta).

Combining all the terms, we have

gn(δ,ϵ)\displaystyle g_{n}(\delta,\epsilon) =n1+4Γ(Γ+N)N2+4(Γ+N)2N21+4Γ(Γ+N)N2nϵ+4Γ2N21+4Γ(Γ+N)N2nδ+O(1)+O(nϵ2,nΔ2,nϵΔ).\displaystyle=n\sqrt{1+\frac{4\Gamma(\Gamma+N)}{N^{2}}}+\frac{4(\Gamma+N)}{2N^{2}\sqrt{1+\frac{4\Gamma(\Gamma+N)}{N^{2}}}}n\epsilon+\frac{4\Gamma}{2N^{2}\sqrt{1+\frac{4\Gamma(\Gamma+N)}{N^{2}}}}n\delta+O(1)+O(n\epsilon^{2},n\Delta^{2},n\epsilon\Delta).

Note that

A=1+4Γ(Γ+N)N2\displaystyle\sqrt{A}=\sqrt{1+\frac{4\Gamma(\Gamma+N)}{N^{2}}} =2Γ+NN.\displaystyle=\frac{2\Gamma+N}{N}.

Hence,

gn(δ,ϵ)\displaystyle g_{n}(\delta,\epsilon) =n2Γ+NN+2(Γ+N)N(2Γ+N)nϵ+2ΓN(2Γ+N)nδ+O(1)+O(nϵ2,nΔ2,nϵΔ).\displaystyle=n\frac{2\Gamma+N}{N}+\frac{2(\Gamma+N)}{N(2\Gamma+N)}n\epsilon+\frac{2\Gamma}{N(2\Gamma+N)}n\delta+O(1)+O(n\epsilon^{2},n\Delta^{2},n\epsilon\Delta). (57)
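The expansion (57) can be probed numerically (outside the proof). At ε = δ = 0 and Γ = N = 1, the O(1) residual should approach the constant L/(2√K) = E/(2√A) = 16/3 identified above.

```python
import math

Gam, N = 1.0, 1.0

def g_exact(n, delta, eps):
    # g_n(delta, eps) written via ||y||^2 = n(Gamma + N + delta), Gamma' = Gamma + eps
    ynorm2 = n * (Gam + N + delta)
    return n * math.sqrt(1.0 + 4.0 * ynorm2 * n * (Gam + eps) / (N**2 * (n - 2)**2))

def g_approx(n, delta, eps):
    # the first-order expansion (57), dropping the O(1) and error terms
    return (n * (2*Gam + N) / N
            + 2*(Gam + N) / (N*(2*Gam + N)) * n * eps
            + 2*Gam / (N*(2*Gam + N)) * n * delta)

res = [g_exact(n, 0.0, 0.0) - g_approx(n, 0.0, 0.0) for n in (10**3, 10**5)]
assert abs(res[1] - 16.0/3.0) < 0.01   # residual -> E/(2*sqrt(A)) = 16/3 here
assert abs(res[0] - res[1]) < 0.1      # residual is O(1), not growing with n
```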

Now we turn to

hn(δ,ϵ)=nlog(1+1+4n2Γ(Γ+N+δ)N2(n2)2+4n2ϵ(Γ+N+δ)N2(n2)2).\displaystyle h_{n}(\delta,\epsilon)=n\log\left(1+\sqrt{1+\frac{4n^{2}\Gamma(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}+\frac{4n^{2}\epsilon(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}}\right).

We first simplify the expression inside the square root as follows:

1+4n2Γ(Γ+N+δ)N2(n2)2+4n2ϵ(Γ+N+δ)N2(n2)2\displaystyle 1+\frac{4n^{2}\Gamma(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}+\frac{4n^{2}\epsilon(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}
=1+4Γ(Γ+N+δ)N2(1+4n+O(1n2))+4ϵ(Γ+N+δ)N2(1+4n+O(1n2))\displaystyle=1+\frac{4\Gamma(\Gamma+N+\delta)}{N^{2}}\left(1+\frac{4}{n}+O\left(\frac{1}{n^{2}}\right)\right)+\frac{4\epsilon(\Gamma+N+\delta)}{N^{2}}\left(1+\frac{4}{n}+O\left(\frac{1}{n^{2}}\right)\right)
=1+4Γ(Γ+N+δ)N2+4ϵ(Γ+N+δ)N2+O(1n)\displaystyle=1+\frac{4\Gamma(\Gamma+N+\delta)}{N^{2}}+\frac{4\epsilon(\Gamma+N+\delta)}{N^{2}}+O\left(\frac{1}{n}\right)
=A+Cδ+Bϵ+Dϵδ+O(n1)\displaystyle=A+C\delta+B\epsilon+D\epsilon\delta+O(n^{-1})
=A(1+CAδ+BAϵ+DAϵδ+O(n1)).\displaystyle=A\left(1+\frac{C}{A}\delta+\frac{B}{A}\epsilon+\frac{D}{A}\epsilon\delta+O(n^{-1})\right).

Therefore,

A1+CAδ+BAϵ+DAϵδ+O(n1)\displaystyle\sqrt{A}\sqrt{1+\frac{C}{A}\delta+\frac{B}{A}\epsilon+\frac{D}{A}\epsilon\delta+O(n^{-1})}
=A(1+C2Aδ+B2Aϵ+O(n1,Δ2,ϵ2,ϵΔ))\displaystyle=\sqrt{A}\left(1+\frac{C}{2A}\delta+\frac{B}{2A}\epsilon+O(n^{-1},\Delta^{2},\epsilon^{2},\epsilon\Delta)\right)
=A+C2Aδ+B2Aϵ+O(n1,Δ2,ϵ2,ϵΔ).\displaystyle=\sqrt{A}+\frac{C}{2\sqrt{A}}\delta+\frac{B}{2\sqrt{A}}\epsilon+O(n^{-1},\Delta^{2},\epsilon^{2},\epsilon\Delta).

Therefore,

log(1+A+C2Aδ+B2Aϵ+O(n1,Δ2,ϵ2,ϵΔ))\displaystyle\log\left(1+\sqrt{A}+\frac{C}{2\sqrt{A}}\delta+\frac{B}{2\sqrt{A}}\epsilon+O(n^{-1},\Delta^{2},\epsilon^{2},\epsilon\Delta)\right)
=log(1+A)+11+A(C2Aδ+B2Aϵ)+O(n1,Δ2,ϵ2,ϵΔ)\displaystyle=\log\left(1+\sqrt{A}\right)+\frac{1}{1+\sqrt{A}}\left(\frac{C}{2\sqrt{A}}\delta+\frac{B}{2\sqrt{A}}\epsilon\right)+O(n^{-1},\Delta^{2},\epsilon^{2},\epsilon\Delta)
=log(2Γ+NN)+N2(Γ+N)4Γ2N(2Γ+N)δ+N2(Γ+N)4(Γ+N)2N(2Γ+N)ϵ+O(n1,Δ2,ϵ2,ϵΔ)\displaystyle=\log\left(2\frac{\Gamma+N}{N}\right)+\frac{N}{2(\Gamma+N)}\frac{4\Gamma}{2N(2\Gamma+N)}\delta+\frac{N}{2(\Gamma+N)}\frac{4(\Gamma+N)}{2N(2\Gamma+N)}\epsilon+O(n^{-1},\Delta^{2},\epsilon^{2},\epsilon\Delta)
=log(2Γ+NN)+Γ(2Γ+N)(Γ+N)δ+12Γ+Nϵ+O(n1,Δ2,ϵ2,ϵΔ).\displaystyle=\log\left(2\frac{\Gamma+N}{N}\right)+\frac{\Gamma}{(2\Gamma+N)(\Gamma+N)}\delta+\frac{1}{2\Gamma+N}\epsilon+O(n^{-1},\Delta^{2},\epsilon^{2},\epsilon\Delta).

Finally,

hn(δ,ϵ)\displaystyle h_{n}(\delta,\epsilon) =nlog(2Γ+NN)+Γ(2Γ+N)(Γ+N)nδ+12Γ+Nnϵ+O(nΔ2,nϵ2,nϵΔ)+O(1).\displaystyle=n\log\left(2\frac{\Gamma+N}{N}\right)+\frac{\Gamma}{(2\Gamma+N)(\Gamma+N)}n\delta+\frac{1}{2\Gamma+N}n\epsilon+O(n\Delta^{2},n\epsilon^{2},n\epsilon\Delta)+O(1). (58)
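Similarly, (58) can be probed numerically (outside the proof); at ε = δ = 0 and Γ = N = 1 the residual is observed to settle near the constant 4/3, consistent with the O(1) term.

```python
import math

Gam, N = 1.0, 1.0

def h_exact(n, delta, eps):
    # h_n(delta, eps) exactly as defined, with Gamma' = Gamma + eps
    return n * math.log(1.0 + math.sqrt(
        1.0 + 4.0 * n**2 * (Gam + eps) * (Gam + N + delta) / (N**2 * (n - 2)**2)))

def h_approx(n, delta, eps):
    # the first-order expansion (58), dropping the O(1) and error terms
    return (n * math.log(2*(Gam + N)/N)
            + Gam / ((2*Gam + N)*(Gam + N)) * n * delta
            + n * eps / (2*Gam + N))

res = [h_exact(n, 0.0, 0.0) - h_approx(n, 0.0, 0.0) for n in (10**3, 10**5)]
assert abs(res[1] - 4.0/3.0) < 0.01    # the O(1) residual, observed numerically
assert abs(res[0] - res[1]) < 0.1
```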

Substituting (57)(\ref{gdeltaepsilon}) and (58)(\ref{hdeltaepsilon}) in (56)(\ref{plugbackin}), we obtain

logQ0cc(𝐲)Qcc(𝐲)\displaystyle\log\frac{Q^{cc}_{0}(\mathbf{y})}{Q^{cc}(\mathbf{y})}
=n2Nϵ+12log(1+ϵΓ)+12gn(δ,ϵ)12hn(δ,ϵ)12gn(δ,0)+12hn(δ,0)+O(logn)\displaystyle=-\frac{n}{2N}\epsilon+\frac{1}{2}\log\left(1+\frac{\epsilon}{\Gamma}\right)+\frac{1}{2}g_{n}(\delta,\epsilon)-\frac{1}{2}h_{n}(\delta,\epsilon)-\frac{1}{2}g_{n}(\delta,0)+\frac{1}{2}h_{n}(\delta,0)+O(\log n)
=n2Nϵ+12log(1+ϵΓ)+12(n2Γ+NN+2(Γ+N)N(2Γ+N)nϵ+2ΓN(2Γ+N)nδ+O(1)+O(nϵ2,nΔ2,nϵΔ))\displaystyle=-\frac{n}{2N}\epsilon+\frac{1}{2}\log\left(1+\frac{\epsilon}{\Gamma}\right)+\frac{1}{2}\left(n\frac{2\Gamma+N}{N}+\frac{2(\Gamma+N)}{N(2\Gamma+N)}n\epsilon+\frac{2\Gamma}{N(2\Gamma+N)}n\delta+O(1)+O(n\epsilon^{2},n\Delta^{2},n\epsilon\Delta)\right)-\mbox{}
12(nlog(2Γ+NN)+Γ(2Γ+N)(Γ+N)nδ+12Γ+Nnϵ+O(nΔ2,nϵ2,nϵΔ)+O(1))+O(logn)\displaystyle\quad\quad\quad\frac{1}{2}\left(n\log\left(2\frac{\Gamma+N}{N}\right)+\frac{\Gamma}{(2\Gamma+N)(\Gamma+N)}n\delta+\frac{1}{2\Gamma+N}n\epsilon+O(n\Delta^{2},n\epsilon^{2},n\epsilon\Delta)+O(1)\right)+O(\log n)-\mbox{}
12(n2Γ+NN+2ΓN(2Γ+N)nδ+O(1)+O(nΔ2))+\displaystyle\quad\quad\quad\quad\quad\quad\quad\frac{1}{2}\left(n\frac{2\Gamma+N}{N}+\frac{2\Gamma}{N(2\Gamma+N)}n\delta+O(1)+O(n\Delta^{2})\right)+\mbox{}
12(nlog(2Γ+NN)+Γ(2Γ+N)(Γ+N)nδ+O(nΔ2)+O(1))\displaystyle\quad\quad\quad\quad\quad\quad\quad\quad\quad\frac{1}{2}\left(n\log\left(2\frac{\Gamma+N}{N}\right)+\frac{\Gamma}{(2\Gamma+N)(\Gamma+N)}n\delta+O(n\Delta^{2})+O(1)\right)
=n2Nϵ+122(Γ+N)N(2Γ+N)nϵ+O(1)+O(nϵ2,nΔ2,nϵΔ)1212Γ+Nnϵ+O(logn)\displaystyle=-\frac{n}{2N}\epsilon+\frac{1}{2}\frac{2(\Gamma+N)}{N(2\Gamma+N)}n\epsilon+O(1)+O(n\epsilon^{2},n\Delta^{2},n\epsilon\Delta)-\frac{1}{2}\frac{1}{2\Gamma+N}n\epsilon+O(\log n)
=O(logn)+O(nϵ2,nΔ2,nϵΔ).\displaystyle=O(\log n)+O(n\epsilon^{2},n\Delta^{2},n\epsilon\Delta).
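The conclusion can be illustrated numerically (outside the proof): with ε = Γ′ − Γ = 1/√n and δ = 0, the exact log-ratio, evaluated through the ascending Bessel series rather than the asymptotic expansion, is indeed small, consistent with O(log n) + O(nε²), since the leading nε terms cancel as shown above.

```python
import math

def logI(nu, x, kmax=1200):
    # log I_nu(x) via the ascending series, summed stably in log space
    terms = [(nu + 2*k) * math.log(x / 2.0) - math.lgamma(k + 1)
             - math.lgamma(nu + k + 1) for k in range(kmax)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

Gam, N, n = 1.0, 1.0, 200
eps = 1.0 / math.sqrt(n)            # Gamma' - Gamma = O(1/sqrt(n))
GamP = Gam + eps
ynorm = math.sqrt(n * (Gam + N))    # delta = 0

# exact log(Q0cc / Qcc) from the simplified expression
log_ratio = ((n / (2.0 * N)) * (Gam - GamP)
             + (n / 4.0 - 0.5) * math.log(Gam / GamP)
             + logI(n / 2 - 1, math.sqrt(n * GamP) * ynorm / N)
             - logI(n / 2 - 1, math.sqrt(n * Gam) * ynorm / N))

# the individual terms are of size ~n*eps ~ 10, yet the total is tiny
assert abs(log_ratio) < 1.5
```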

Appendix D Proof of Lemma 7

From Lemma 5, we have

Qcc(𝐲)\displaystyle Q^{cc}(\mathbf{y}) =Γ(n2)2(πN)n/2exp(nΓ+𝐲22N)(NnΓ𝐲)n21In21(nΓ𝐲N).\displaystyle=\frac{\Gamma\left(\frac{n}{2}\right)}{2(\pi N)^{n/2}}\cdot\exp\left(-\frac{n\Gamma+||\mathbf{y}||^{2}}{2N}\right)\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)^{\frac{n}{2}-1}I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right).

From the standard formula for the multivariate Gaussian, we have

Q(𝐲)\displaystyle Q^{*}(\mathbf{y}) =1(2π(Γ+N))n/2exp(12(Γ+N)𝐲2).\displaystyle=\frac{1}{(2\pi(\Gamma+N))^{n/2}}\exp\left(-\frac{1}{2(\Gamma+N)}||\mathbf{y}||^{2}\right).

Then

logQcc(𝐲)Q(𝐲)\displaystyle\log\frac{Q^{cc}(\mathbf{y})}{Q^{*}(\mathbf{y})}
=log(Γ(n2))nΓ+𝐲22N+n2log(NnΓ𝐲)log(NnΓ𝐲)+log(In21(nΓ𝐲N))\displaystyle=\log\left(\Gamma\left(\frac{n}{2}\right)\right)-\frac{n\Gamma+||\mathbf{y}||^{2}}{2N}+\frac{n}{2}\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)-\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)+\log\left(I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)\right)
log(2)n2log(πN)+n2log(2π(Γ+N))+12(Γ+N)𝐲2\displaystyle\quad\quad\quad\quad\quad-\log(2)-\frac{n}{2}\log(\pi N)+\frac{n}{2}\log(2\pi(\Gamma+N))+\frac{1}{2(\Gamma+N)}||\mathbf{y}||^{2}
=log(Γ(n2))𝐲2Γ2N(Γ+N)nΓ2N+n2log(NnΓ𝐲)log(NnΓ𝐲)+log(In21(nΓ𝐲N))\displaystyle=\log\left(\Gamma\left(\frac{n}{2}\right)\right)-\frac{||\mathbf{y}||^{2}\Gamma}{2N(\Gamma+N)}-\frac{n\Gamma}{2N}+\frac{n}{2}\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)-\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)+\log\left(I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)\right)
log(2)+n2log(2Γ+2NN)\displaystyle\quad\quad\quad\quad\quad-\log(2)+\frac{n}{2}\log\left(\frac{2\Gamma+2N}{N}\right)
=(a)n2log(n2)n212log(n2)+O(1)𝐲2Γ2N(Γ+N)nΓ2N+n2log(NnΓ𝐲)log(NnΓ𝐲)+\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\frac{n}{2}\log\left(\frac{n}{2}\right)-\frac{n}{2}-\frac{1}{2}\log\left(\frac{n}{2}\right)+O(1)-\frac{||\mathbf{y}||^{2}\Gamma}{2N(\Gamma+N)}-\frac{n\Gamma}{2N}+\frac{n}{2}\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)-\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)+\mbox{}
log(In21(nΓ𝐲N))log(2)+n2log(2Γ+2NN)\displaystyle\quad\quad\quad\quad\quad\log\left(I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)\right)-\log(2)+\frac{n}{2}\log\left(\frac{2\Gamma+2N}{N}\right)
=n2log(n)n212log(n2)𝐲2Γ2N(Γ+N)nΓ2N+n2log(NnΓ𝐲)log(NnΓ𝐲)+\displaystyle=\frac{n}{2}\log\left(n\right)-\frac{n}{2}-\frac{1}{2}\log\left(\frac{n}{2}\right)-\frac{||\mathbf{y}||^{2}\Gamma}{2N(\Gamma+N)}-\frac{n\Gamma}{2N}+\frac{n}{2}\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)-\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)+\mbox{}
log(In21(nΓ𝐲N))+n2log(Γ+NN)+O(1)\displaystyle\quad\quad\quad\quad\quad\log\left(I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right)\right)+\frac{n}{2}\log\left(\frac{\Gamma+N}{N}\right)+O(1) (59)

In equality (a)(a), we used an asymptotic expansion of the log gamma function (see, e.g., [14, 5.11.1]). To approximate the Bessel function, we first rewrite it as

In21(nΓ𝐲N)\displaystyle I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right) =Iν(νz),\displaystyle=I_{\nu}\left(\nu z\right),

where ν=n21\nu=\frac{n}{2}-1 and z=2nΓ𝐲N(n2)z=\frac{2\sqrt{n\Gamma}||\mathbf{y}||}{N(n-2)}. Since 𝐲𝒫n\mathbf{y}\in\mathcal{P}_{n}^{*} by assumption, we have that zz lies in a compact interval [a,b](0,)[a,b]\subset(0,\infty) for sufficiently large nn, where 0<a<b<0<a<b<\infty. Hence, we can use a uniform asymptotic expansion of the modified Bessel function (see [14, 10.41.3] whose interpretation is given in [14, 2.1(iv)]): as ν\nu\to\infty, we have

Iν(νz)=eνη(2πν)1/2(1+z2)14(1+O(1ν)),\displaystyle I_{\nu}(\nu z)=\frac{e^{\nu\eta}}{(2\pi\nu)^{1/2}(1+z^{2})^{\frac{1}{4}}}\left(1+O\left(\frac{1}{\nu}\right)\right), (60)

where

η=1+z2+log(z1+(1+z2)1/2).\eta=\sqrt{1+z^{2}}+\log\left(\frac{z}{1+(1+z^{2})^{1/2}}\right).

Since zz lies in the compact interval [a,b](0,)[a,b]\subset(0,\infty), the O(1/ν)O(1/\nu) term in (60)(\ref{besselapprox}) is uniformly bounded over 𝒫n\mathcal{P}_{n}^{*}. Using the approximation in (60)(\ref{besselapprox}), we have

logIn21(nΓ𝐲N)\displaystyle\log I_{\frac{n}{2}-1}\left(\frac{\sqrt{n\Gamma}||\mathbf{y}||}{N}\right) =n21+4𝐲2nΓN2(n2)2+n2log(2𝐲nΓN(n2))n2log(1+1+4𝐲2nΓN2(n2)2)+O(logn),\displaystyle=\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}+\frac{n}{2}\log\left(\frac{2||\mathbf{y}||\sqrt{n\Gamma}}{N(n-2)}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+O(\log n), (61)

where the O(logn)O(\log n) term is uniformly bounded over 𝒫n\mathcal{P}_{n}^{*}. Substituting (61)(\ref{z3}) in (59)(\ref{b20}), we obtain

logQcc(𝐲)Q(𝐲)\displaystyle\log\frac{Q^{cc}(\mathbf{y})}{Q^{*}(\mathbf{y})}
=n2log(n)n2𝐲2Γ2N(Γ+N)nΓ2N+n2log(NnΓ𝐲)log(NnΓ𝐲)+n21+4𝐲2nΓN2(n2)2+\displaystyle=\frac{n}{2}\log\left(n\right)-\frac{n}{2}-\frac{||\mathbf{y}||^{2}\Gamma}{2N(\Gamma+N)}-\frac{n\Gamma}{2N}+\frac{n}{2}\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)-\log\left(\frac{N}{\sqrt{n\Gamma}||\mathbf{y}||}\right)+\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}+\mbox{}
n2log(2𝐲nΓN(n2))n2log(1+1+4𝐲2nΓN2(n2)2)+n2log(Γ+NN)+O(logn)\displaystyle\quad\quad\quad\quad\quad\frac{n}{2}\log\left(\frac{2||\mathbf{y}||\sqrt{n\Gamma}}{N(n-2)}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)+\frac{n}{2}\log\left(\frac{\Gamma+N}{N}\right)+O(\log n)
=n2𝐲2Γ2N(Γ+N)nΓ2N+n21+4𝐲2nΓN2(n2)2n2log(1+1+4𝐲2nΓN2(n2)2)\displaystyle=-\frac{n}{2}-\frac{||\mathbf{y}||^{2}\Gamma}{2N(\Gamma+N)}-\frac{n\Gamma}{2N}+\frac{n}{2}\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4||\mathbf{y}||^{2}n\Gamma}{N^{2}(n-2)^{2}}}\right)\mbox{}
+n2log(2Γ+2NN)+log(nΓ𝐲)+O(logn)\displaystyle\quad\quad\quad\quad\quad+\frac{n}{2}\log\left(\frac{2\Gamma+2N}{N}\right)+\log\left(\sqrt{n\Gamma}||\mathbf{y}||\right)+O(\log n)
=[12gn(δ,0)ΓNnn2Γ2N(Γ+N)nδ]+[n2log(2Γ+2NN)12hn(δ,0)]+log(nΓ𝐲)+O(logn),\displaystyle=\left[\frac{1}{2}g_{n}(\delta,0)-\frac{\Gamma}{N}n-\frac{n}{2}-\frac{\Gamma}{2N(\Gamma+N)}n\delta\right]+\left[\frac{n}{2}\log\left(\frac{2\Gamma+2N}{N}\right)-\frac{1}{2}h_{n}(\delta,0)\right]+\log\left(\sqrt{n\Gamma}||\mathbf{y}||\right)+O(\log n), (62)

where in the last equality above, we have

δ\displaystyle\delta =𝐲2n(Γ+N)\displaystyle=\frac{||\mathbf{y}||^{2}}{n}-(\Gamma+N) (63)
gn(δ,0)\displaystyle g_{n}(\delta,0) =n2+4Γ(Γ+N)N2n4(n2)2+4ΓN2n4(n2)2δ\displaystyle=\sqrt{n^{2}+\frac{4\Gamma(\Gamma+N)}{N^{2}}\frac{n^{4}}{(n-2)^{2}}+\frac{4\Gamma}{N^{2}}\frac{n^{4}}{(n-2)^{2}}\delta}
hn(δ,0)\displaystyle h_{n}(\delta,0) =nlog(1+1+4n2Γ(Γ+N+δ)N2(n2)2)\displaystyle=n\log\left(1+\sqrt{1+\frac{4n^{2}\Gamma(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}}\right)

to facilitate further analysis. Recall that |δ|Δ|\delta|\leq\Delta.

From (57)(\ref{gdeltaepsilon}), we have

gn(δ,0)\displaystyle g_{n}(\delta,0) =n2Γ+NN+2ΓN(2Γ+N)nδ+O(nΔ2)+O(1).\displaystyle=n\frac{2\Gamma+N}{N}+\frac{2\Gamma}{N(2\Gamma+N)}n\delta+O(n\Delta^{2})+O\left(1\right).

Thus, we can simplify the first term in square brackets in (62)(\ref{42v}) as

12gn(δ,0)ΓNnn2Γ2N(Γ+N)nδ\displaystyle\frac{1}{2}g_{n}(\delta,0)-\frac{\Gamma}{N}n-\frac{n}{2}-\frac{\Gamma}{2N(\Gamma+N)}n\delta
=ΓN(2Γ+N)nδ+O(nΔ2)+O(1)Γ2N(Γ+N)nδ\displaystyle=\frac{\Gamma}{N(2\Gamma+N)}n\delta+O(n\Delta^{2})+O\left(1\right)-\frac{\Gamma}{2N(\Gamma+N)}n\delta
=Γ2(2Γ+N)(Γ+N)nδ+O(nΔ2)+O(1).\displaystyle=\frac{\Gamma}{2(2\Gamma+N)(\Gamma+N)}n\delta+O(n\Delta^{2})+O\left(1\right). (64)

Similarly, from (58)(\ref{hdeltaepsilon}), we have

hn(δ,0)\displaystyle h_{n}(\delta,0) =nlog(2Γ+2NN)+Γ(2Γ+N)(Γ+N)nδ+O(1)+O(nΔ2).\displaystyle=n\log\left(\frac{2\Gamma+2N}{N}\right)+\frac{\Gamma}{(2\Gamma+N)(\Gamma+N)}n\delta+O(1)+O\left(n\Delta^{2}\right).

Thus, we can simplify the second term in square brackets in (62)(\ref{42v}) as

n2log(2Γ+2NN)n2log(1+1+4n2Γ(Γ+N+δ)N2(n2)2)\displaystyle\frac{n}{2}\log\left(\frac{2\Gamma+2N}{N}\right)-\frac{n}{2}\log\left(1+\sqrt{1+\frac{4n^{2}\Gamma(\Gamma+N+\delta)}{N^{2}(n-2)^{2}}}\right)
=Γ2(2Γ+N)(Γ+N)nδ+O(1)+O(nΔ2).\displaystyle=-\frac{\Gamma}{2(2\Gamma+N)(\Gamma+N)}n\delta+O(1)+O\left(n\Delta^{2}\right). (65)
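The simplification (65) admits the same kind of numerical check; this sketch reuses the illustrative values Γ=2\Gamma=2, N=1N=1, n=106n=10^{6}, δ=104\delta=10^{-4} (again not taken from the paper) and evaluates the second bracketed term of (62) exactly via the definition of hn(δ,0)h_{n}(\delta,0).

```python
import math

def h_n(n, delta, Gamma, N):
    # h_n(delta, 0) exactly as defined following (62)
    return n*math.log(1 + math.sqrt(
        1 + 4*n**2*Gamma*(Gamma + N + delta) / (N**2*(n - 2)**2)))

Gamma, N, n, delta = 2.0, 1.0, 10**6, 1e-4  # illustrative only

# exact second bracketed term of (62)
lhs = n/2*math.log((2*Gamma + 2*N)/N) - 0.5*h_n(n, delta, Gamma, N)
# leading term of (65)
rhs = -Gamma/(2*(2*Gamma + N)*(Gamma + N))*n*delta

gap = abs(lhs - rhs)  # should be O(n*delta**2) + O(1)
```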

Adding (64)(\ref{firstterm_sq}) and (65)(\ref{secondterm_sq}) gives O(nΔ2)+O(1)O(n\Delta^{2})+O(1). Hence, going back to (62)(\ref{42v}), we have

logQcc(𝐲)Q(𝐲)\displaystyle\log\frac{Q^{cc}(\mathbf{y})}{Q^{*}(\mathbf{y})}
=log(nΓ𝐲)+O(logn)+O(nΔ2)\displaystyle=\log\left(\sqrt{n\Gamma}||\mathbf{y}||\right)+O(\log n)+O(n\Delta^{2})
=O(logn)+O(nΔ2),\displaystyle=O(\log n)+O(n\Delta^{2}),

where the last equality follows from the assumption that 𝐲𝒫n\mathbf{y}\in\mathcal{P}_{n}^{*}.
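The cancellation of the nδn\delta terms between (64) and (65) can also be verified directly: evaluating the sum of the two bracketed terms of (62) exactly, it should stay bounded as nn grows (for fixed small δ\delta) rather than scaling with nδn\delta. The sketch below checks this at two blocklengths; the values Γ=2\Gamma=2, N=1N=1, δ=104\delta=10^{-4} are illustrative choices, not from the paper.

```python
import math

def bracket_sum(n, delta, Gamma, N):
    """Exact sum of the two bracketed terms of (62)."""
    r = n**4 / (n - 2)**2
    g = math.sqrt(n**2 + 4*Gamma*(Gamma + N)/N**2 * r
                  + 4*Gamma/N**2 * r * delta)
    h = n*math.log(1 + math.sqrt(
        1 + 4*n**2*Gamma*(Gamma + N + delta) / (N**2*(n - 2)**2)))
    t1 = 0.5*g - Gamma/N*n - n/2 - Gamma/(2*N*(Gamma + N))*n*delta
    t2 = n/2*math.log((2*Gamma + 2*N)/N) - 0.5*h
    return t1 + t2

Gamma, N, delta = 2.0, 1.0, 1e-4  # illustrative only
s1 = bracket_sum(10**6, delta, Gamma, N)
s2 = bracket_sum(10**8, delta, Gamma, N)
# both values should be O(n*delta**2) + O(1): bounded, not growing with n
```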

Acknowledgment

This research was supported by the US National Science Foundation under grant CCF-1956192.

References

  • [1] A. Mahmood and A. B. Wagner, “Channel coding with mean and variance cost constraints,” in 2024 IEEE International Symposium on Information Theory (ISIT), 2024, pp. 510–515.
  • [2] ——, “Improved channel coding performance through cost variability,” 2024. [Online]. Available: https://arxiv.org/abs/2407.05260
  • [3] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed.   Hoboken, N.J.: Wiley-Interscience, 2006.
  • [4] Y. Polyanskiy, “Channel coding: Non-asymptotic fundamental limits,” Ph.D. dissertation, Dept. Elect. Eng., Princeton Univ., Princeton, NJ, USA, 2010.
  • [5] W. Yang, G. Caire, G. Durisi, and Y. Polyanskiy, “Optimum power control at finite blocklength,” IEEE Transactions on Information Theory, vol. 61, no. 9, pp. 4598–4615, 2015.
  • [6] S. L. Fong and V. Y. F. Tan, “Asymptotic expansions for the awgn channel with feedback under a peak power constraint,” in 2015 IEEE International Symposium on Information Theory (ISIT), 2015, pp. 311–315.
  • [7] ——, “A tight upper bound on the second-order coding rate of the parallel Gaussian channel with feedback,” IEEE Transactions on Information Theory, vol. 63, no. 10, pp. 6474–6486, 2017.
  • [8] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Transactions on Information Theory, vol. 56, no. 5, pp. 2307–2359, 2010.
  • [9] A. Mahmood and A. B. Wagner, “Channel coding with mean and variance cost constraints,” 2024. [Online]. Available: https://arxiv.org/abs/2401.16417
  • [10] A. B. Wagner, N. V. Shende, and Y. Altuğ, “A new method for employing feedback to improve coding performance,” IEEE Transactions on Information Theory, vol. 66, no. 11, pp. 6660–6681, 2020.
  • [11] M. Hayashi, “Information spectrum approach to second-order coding rate in channel coding,” IEEE Transactions on Information Theory, vol. 55, no. 11, pp. 4947–4966, 2009.
  • [12] P. van Beek, “An application of Fourier methods to the problem of sharpening the Berry-Esseen inequality,” Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, vol. 23, no. 3, pp. 187–196, 1972.
  • [13] A. Dytso, M. Al, H. V. Poor, and S. Shamai (Shitz), “On the capacity of the peak power constrained vector Gaussian channel: An estimation theoretic perspective,” IEEE Transactions on Information Theory, vol. 65, no. 6, pp. 3907–3921, 2019.
  • [14] F. W. J. Olver, D. W. Lozier, R. F. Boisvert, and C. W. Clark, Eds., NIST Handbook of Mathematical Functions.   New York: Cambridge University Press, 2010.