
From Soft-Minoration to Information-Constrained Optimal Transport and Spiked Tensor Models

Jingbo Liu Department of Statistics, University of Illinois, Urbana-Champaign
jingbol@illinois.edu
Abstract

Let $P_Z$ be a given distribution on $\mathbb{R}^n$. For any $y\in\mathbb{R}^n$, we may interpret $\rho(y):=\ln\mathbb{E}[e^{\left<y,Z\right>}]$ as a soft-max of $\left<y,Z\right>$. We explore lower bounds on $\mathbb{E}[\rho(Y)]$ in terms of the minimum mutual information $I(Z;\bar{Z})$ over $P_{Z\bar{Z}}$ which is a coupling of $P_Z$ and itself such that $Z-\bar{Z}$ is bounded in a certain sense. This may be viewed as a soft version of Sudakov's minoration, which lower bounds the expected supremum of a stochastic process in terms of the packing number. Our method is based on convex geometry (thrifty approximation of convex bodies), and works for general non-Gaussian $Y$. When $Y$ is Gaussian and $\bar{Z}$ converges to $Z$, this recovers a recent inequality of Bai-Wu-Ozgur on information-constrained optimal transport, previously established using Gaussian-specific techniques. We also use soft-minoration to obtain asymptotically (in tensor order) tight bounds on the free energy in the Sherrington-Kirkpatrick model with spins uniformly distributed on a type class, implying asymptotically tight bounds for the type II error exponent in spiked tensor detection.

I Introduction

Given $P_Z$ on $\mathbb{R}^n$, define $\rho(y):=\ln\mathbb{E}[e^{\left<y,Z\right>}]$ for $y\in\mathbb{R}^n$, where $Z\sim P_Z$ and $\left<\cdot,\cdot\right>$ denotes the inner product. We may interpret $\rho(y)$ as a soft-max of $\left<y,Z\right>$. Indeed, if $P_Z$ is the uniform distribution on a compact set $\mathcal{A}$, then $\rho(y)\leq\max_{z\in\mathcal{A}}\left<y,z\right>$. Moreover, the inequality typically becomes tight when $y$ is large. If $P_Y$ is the standard Gaussian distribution, Sudakov's minoration [23][11] gives

$$\mathbb{E}[\max_{z\in\mathcal{A}}\left<Y,z\right>]\geq c\sup_{l>0}\sqrt{\ln{\sf P}_{l}(\mathcal{A})}\qquad(1)$$

where $c>0$ is a universal constant and ${\sf P}_{l}(\mathcal{A})$ denotes the $l$-packing number of $\mathcal{A}$ under the Euclidean distance. Generalizations of Sudakov's minoration to other log-concave measures $P_Y$ have also been considered [11][10][16]. In this paper we explore inequalities of the following form, which may be called "soft minoration":

$$\mathbb{E}[\rho(Y)]\geq\textrm{function of }\inf I(Z;\bar{Z})\qquad(2)$$

where the infimum is over couplings $P_{Z\bar{Z}}$ under which $Z-\bar{Z}$ is "small" and both $Z$ and $\bar{Z}$ have the same law $P_Z$.
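As a quick numerical illustration of the soft-max interpretation (a self-contained sketch; the point set and scales below are arbitrary choices, not from the paper), one can check that $\frac{1}{t}\rho(ty)$ increases to $\max_{z\in\mathcal{A}}\left<y,z\right>$ as $t$ grows when $P_Z$ is uniform on a finite set $\mathcal{A}$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))  # finite support of P_Z: 50 points in R^5
y = rng.standard_normal(5)

def rho(y):
    """rho(y) = ln E[exp(<y, Z>)] for Z uniform on the rows of A,
    computed in a numerically stable (log-sum-exp) way."""
    v = A @ y
    m = v.max()
    return m + np.log(np.mean(np.exp(v - m)))

for t in (1.0, 10.0, 100.0):
    print(t, rho(t * y) / t)   # soft-max at scale t, increasing in t
print(np.max(A @ y))           # the hard max: the t -> infinity limit
```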

One motivation for (2) is network information theory. Cover's problem asks for the minimum relay rate needed to achieve the maximum capacity of a relay channel [7] (see also [26]). Measure concentration and reverse hypercontractivity techniques yield nontrivial bounds but are not sufficient for solving Cover's problem [27][14][15]. The solution in the Gaussian setting is infinity, as shown by [26] using a rearrangement inequality on spheres (see also a solution for binary symmetric channels using a similar idea [2]). Bai, Wu, and Ozgur [1] provided a simplified proof for the Gaussian setting by proving a bound on information-constrained optimal transport: if $P_Y$ is the standard normal distribution on $\mathbb{R}^n$, $P_Z$ has well-defined differential entropy, and $R>0$, then

$$\frac{n}{\tau_{0}(R/n)}\exp\left(\frac{h(Z)-{\sf h}_{1}}{n}\right)\leq\sup_{\substack{P_{YZ}\in\Pi(P_{Y},P_{Z})\\ I(Y;Z)\leq R}}\mathbb{E}[\left<Y,Z\right>]\qquad(3)$$

where ${\sf h}_{1}:=\frac{n}{2}\ln(2\pi e)$ (all information quantities are in nats throughout the paper), $\tau_{0}(\theta):=\frac{1}{\sqrt{1-e^{-2\theta}}}$, and $\Pi(P_{Y},P_{Z})$ denotes the set of couplings between two distributions. Inequality (3) generalizes Talagrand's $T_{2}$ inequality by replacing optimal transport with entropy-regularized optimal transport [1]. Previous proofs of (3) relied on Gaussian-specific arguments. In contrast, [9] used the traditional auxiliary random variable approach, yielding the same capacity region outer bound for the Gaussian setting as [26][1], and also showed that compress-and-forward solves Cover's problem for discrete memoryless channels under a full-rankness condition. Concurrently, [12][13] used convex geometry to lower bound $\mathbb{E}[\rho(Y)]$ with packing numbers of $\mathcal{A}$ (for $P_Z$ uniform on $\mathcal{A}$), showing the optimality of compress-and-forward for all discrete memoryless channels under the conditions originally stated in [7] (without the full-rankness assumption).
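As a sanity check on (3) (a numerical sketch under the additional assumption that $P_Z$ is also standard Gaussian, so $h(Z)={\sf h}_{1}$): a jointly Gaussian coupling with per-coordinate correlation $c$ has $I(Y;Z)=-\frac{n}{2}\ln(1-c^{2})$ and $\mathbb{E}[\left<Y,Z\right>]=nc$, and the coupling saturating the information constraint matches the left side of (3) exactly, suggesting tightness in this case:

```python
import math

def tau0(theta):
    """tau_0 from (3)."""
    return 1.0 / math.sqrt(1.0 - math.exp(-2.0 * theta))

def check(R_over_n):
    bound = 1.0 / tau0(R_over_n)                    # per-coordinate left side of (3) when h(Z) = h_1
    c = math.sqrt(1.0 - math.exp(-2.0 * R_over_n))  # correlation achieving I(Y;Z) = R
    return bound, c                                 # the two values coincide

print(check(0.5))  # (0.795..., 0.795...)
```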

In [13] the argument was restricted to $P_Z$ of the form of a uniform distribution on a set, which is sufficient for the solution of Cover's problem because, when applied to the relay channel, $P_Z$ is the restriction of the channel output distribution to the intersection of a type class and a relay decoding set. In this paper, we consider general $P_Z$, and the packing number of a set is replaced by the mutual information in (2), which is accomplished by combining the approach of [13] with a tensorization argument. Another difference from [13] is that [13] used a reduction of general $P_Y$ to the case of the Rademacher distribution; in contrast, this paper uses Barvinok's thrifty approximation of convex bodies [3], which entails explicit information-theoretic bounds for general $P_Y$.

We show as a consequence of (2) that for any $l>0$,

$$\inf_{\substack{P_{Z\bar{Z}}\in\Pi(P_{Z},P_{Z})\\ \mathbb{E}[\|Z-\bar{Z}\|^{2}]\leq\frac{1}{4}\tau_{0}^{2}(\frac{R}{n})l^{2}n}}I(Z;\bar{Z})\leq n\ln\left(1+\frac{2}{nl}\sup_{\substack{P_{YZ}\in\Pi(P_{Y},P_{Z})\\ I(Y;Z)\leq R}}\mathbb{E}[\left<Y,Z\right>]\right)\qquad(4)$$

which implies (3) as $l\to 0$. As noted in [8][1], information-constrained optimal transport is useful in machine learning due to the availability of fast algorithms. In many such applications $P_Z$ is the empirical distribution of samples, in which case $h(Z)=-\infty$ and (3) is useless. In contrast, a bound with $I(Z;\bar{Z})$ may still be nontrivial. Moreover, our approach easily extends to general symmetric distributions $P_Y$, where $\tau_{0}(\cdot)$ is replaced by another universal function $\tau(\cdot)$ and ${\sf h}_{1}$ or the infimum in (4) is given a general definition.

In statistical physics, a central quantity is the (expected) free energy $\mathbb{E}[\ln\int e^{H(\sigma)}d\mu(\sigma)]$, where $\sigma$ is called the spin or configuration, and the expectation is with respect to the randomness (also known as the disorder) of the Hamiltonian $H(\cdot)$ [25][17]. Clearly (2) provides a lower bound on the free energy, once we embed $H$ and $\sigma$ in a suitable Euclidean space so that $H(\sigma)$ is an inner product. The free energy in the Sherrington-Kirkpatrick (SK) model characterizes the information-theoretic threshold for spiked tensor detection [6]. The statistical physics literature has focused on the cases of Rademacher and spherical spins, and existing exact formulae for the free energy generally rely on these structures and are usually hard to evaluate [24][22][6][18]. We prove a simple dimension-free bound in the format of (2) and show its tightness when the tensor order is large and the prior is uniform on a type class, which in turn provides asymptotically tight type II error exponent bounds in spiked tensor detection.

II Preliminaries

The simplest proof of Sudakov's minoration (1) in the case of Gaussian $P_Y$ is through Gaussian comparison (see e.g. [5][11]). However, this approach is Gaussian-specific, and a longstanding goal in this research area is to extend the results to general log-concave measures (see [16]). For our proofs in Sections III-IV, it suffices to use the following result of Pajor [19], which can be viewed as a generalization of (1) to the case of general $P_Y$ and packing numbers under the Minkowski functional distance. The proof is a simple application of the Alexandrov-Fenchel inequality, and a review of related convex geometry concepts can be found in [13].

Lemma 1.

[19] Suppose that $\mathcal{C}$ is a symmetric convex body in $\mathbb{R}^N$, and let $P_Y$ be the associated cone volume measure. Let $\mathcal{A}\subseteq\mathbb{R}^N$ be compact, and define $a:=\mathbb{E}[\sup_{z\in\mathcal{A}}\left<z,Y\right>]$. Let $\mathcal{C}^{\circ}$ be the polar of $\mathcal{C}$. For any $l>0$, define ${\sf P}_{l}(\mathcal{A})$ as the $l$-packing number of $\mathcal{A}$ under the Minkowski functional $\|\cdot\|_{\mathcal{C}^{\circ}}$ (which is a norm in the case of a symmetric convex body $\mathcal{C}$). Then

$${\sf P}_{l}(\mathcal{A})\leq\left(1+2a/l\right)^{N}.\qquad(5)$$
Remark 1.

A difference between (1) and (5) is that the latter is dimension-dependent. It is possible, however, to use a Johnson-Lindenstrauss embedding argument to reduce $N$ in (5) to the order of $\ln{\sf P}_{l}(\mathcal{A})$ and recover (1); see [16].

Another key ingredient for the proofs in Section III is to relate $\mathbb{E}[\rho(Y)]$ in (2) to $\mathbb{E}[\max_{z\in\mathcal{A}}\left<Y,z\right>]$, which is achieved by approximating the support of $P_Y$ by a sparser set and then applying Markov's inequality and the union bound. More specifically, we use the following "thrifty approximation of convex bodies" by Barvinok [3]. We state here a simplified asymptotic version.

Lemma 2.

[3] For any $\tau>1$, let $\kappa>0$ and $\theta>0$ be the solutions to

$$\frac{1+\kappa}{2\kappa}h\left(\frac{\kappa}{1+\kappa}\right)=\ln(\tau+\sqrt{\tau^{2}-1});\qquad(6)$$
$$(1+\kappa)h\left(\frac{\kappa}{1+\kappa}\right)=\theta,\qquad(7)$$

where $h(\cdot)$ denotes the binary entropy function. Then for any symmetric convex body $\mathcal{C}\subseteq\mathbb{R}^N$, there exists a symmetric polytope $P$ satisfying $P\subseteq\mathcal{C}\subseteq\tau P$ with at most $e^{\theta N+o(N)}$ vertices.

Remark 2.

From (6)-(7), if $\tau\to 1$ then $\kappa=\frac{1+o(1)}{4\sqrt{2(\tau-1)}}\ln\frac{1}{\tau-1}$ and $\theta=\frac{1+o(1)}{2}\ln\frac{1}{\tau-1}$. If $\tau\to\infty$ then $\kappa=\frac{1+o(1)}{\tau^{2}}$ and $\theta=\frac{2+o(1)}{\tau^{2}}\ln\tau$.
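For concreteness, here is a minimal numerical sketch (not from [3]; the bisection bracket is an implementation choice suitable for moderate $\tau$) that solves (6) for $\kappa$ and then evaluates $\theta$ via (7):

```python
import math

def hb(p):
    """Binary entropy in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def kappa_theta(tau):
    """Solve (6) for kappa by bisection, then get theta from (7).
    g(kappa) = (1+kappa)/(2 kappa) * hb(kappa/(1+kappa)) is decreasing in kappa,
    since it equals hb(p)/(2p) with p = kappa/(1+kappa)."""
    target = math.log(tau + math.sqrt(tau * tau - 1.0))
    g = lambda k: (1.0 + k) / (2.0 * k) * hb(k / (1.0 + k))
    lo, hi = 1e-12, 1e12          # heuristic bracket
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # geometric midpoint for the wide bracket
        if g(mid) > target:
            lo = mid
        else:
            hi = mid
    kappa = math.sqrt(lo * hi)
    theta = (1.0 + kappa) * hb(kappa / (1.0 + kappa))
    return kappa, theta

print(kappa_theta(2.0))  # e.g. theta ~ 0.57
```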

III General PYP_{Y}

In this section we derive a bound in the form of (2) which, among other things, generalizes (3) to arbitrary $P_Y$ satisfying $P_Y=P_{-Y}$ (see Corollary 4). To be precise, $\tau_{0}(R/n)$ in (3) will be replaced by a worse constant for general $P_Y$; we will explain in the next section how the constant is improved to $\tau_{0}(R/n)$ for Gaussian $P_Y$.

Let $\theta(\tau)$ be the function defined implicitly by (6)-(7).

Theorem 1.

Suppose that $P_Y$ and $P_Z$ are distributions on $\mathbb{R}^n$, $\mathbb{E}[Y]=0$, and $Y$ and $-Y$ have the same distribution, where $Y\sim P_Y$. Then for any $l>0$ and $\tau>1$,

$$\inf_{P_{\bar{Z}Z}}I(\bar{Z};Z)\leq n\ln\left(1+\frac{2}{ln}\left(\mathbb{E}[\rho(Y)]+n\theta(\tau)\right)\right)\qquad(8)$$

where the infimum is over all $P_{\bar{Z}Z}\in\Pi(P_{Z},P_{Z})$ satisfying $\mathbb{E}[\left<\bar{Z}-Z,Y\right>]\leq\frac{\tau ln}{2}$ for all $P_{\bar{Z}ZY}\in\Pi(P_{\bar{Z}Z},P_{Y})$. Moreover, (8) also holds if $\mathbb{E}[\rho(Y)]$ is replaced by $\sup_{P_{YZ}\in\Pi(P_{Y},P_{Z})}\left\{-I(Y;Z)+\mathbb{E}[\left<Y,Z\right>]\right\}$.

Proof.

It suffices to prove the case where $P_Y$ and $P_Z$ are supported on finite sets and all the probability masses are rational numbers. The general case can then be established by an approximation argument, using the fact that the mutual information can be arbitrarily well approximated with finite partitions of the space [21]. For any positive integer $N$ divisible by the denominators of these rational numbers, let $P_{\tilde{Y}^{N}}$ (resp. $P_{\tilde{Z}^{N}}$) be the equiprobable distribution on $\mathcal{C}$ (resp. $\mathcal{A}$), defined as the $P_Y$-type class (resp. $P_Z$-type class). For any $y^{N}\in\mathbb{R}^{nN}$, define

$$\tilde{\rho}(y^{N}):=\ln\mathbb{E}[e^{\left<y^{N},\tilde{Z}^{N}\right>}]\qquad(9)$$

where $\tilde{Z}^{N}\sim P_{\tilde{Z}^{N}}$. Then by the method of types and large deviation analysis we have

$$\mathbb{E}[\tilde{\rho}(\tilde{Y}^{N})]=N\sup_{P_{YZ}\in\Pi(P_{Y},P_{Z})}\left\{-I(Y;Z)+\mathbb{E}[\left<Y,Z\right>]\right\}+o(N)\qquad(10)$$
$$\leq N\mathbb{E}[\rho(Y)]+o(N)\qquad(11)$$

where $(Y,Z)\sim P_{YZ}$ in (10), and (11) follows since, by the Donsker-Varadhan variational formula, $\rho(y)=\sup_{Q_{Z}}\left\{\mathbb{E}_{Q_{Z}}[\left<y,Z\right>]-D(Q_{Z}\|P_{Z})\right\}$ for any $y$, and therefore

$$\mathbb{E}[\rho(Y)]\geq\sup_{P_{YZ}\in\Pi(P_{Y},P_{Z})}\left\{-I(Y;Z)+\mathbb{E}[\left<Y,Z\right>]\right\}.\qquad(12)$$
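In more detail, (12) can be seen by evaluating the Donsker-Varadhan supremum at the particular choice $Q_{Z}=P_{Z|Y=y}$ induced by any coupling $P_{YZ}\in\Pi(P_{Y},P_{Z})$ and averaging over $Y$:

$$\mathbb{E}[\rho(Y)]\geq\mathbb{E}_{Y}\left[\mathbb{E}_{P_{Z|Y}}[\left<Y,Z\right>]-D(P_{Z|Y}\|P_{Z})\right]=\mathbb{E}[\left<Y,Z\right>]-I(Y;Z).$$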

By Lemma 2, we can choose $\mathcal{S}$ as a subset of the convex hull of $\mathcal{C}$ such that

$$\ln|\mathcal{S}|=nN\theta(\tau)+o(N);\qquad(13)$$
$$\mathcal{S}^{\circ}\subseteq\tau\mathcal{C}^{\circ}.\qquad(14)$$

Let $\hat{Y}^{N}$ be equiprobable on $\mathcal{S}$. Define $\mathcal{B}$, a subset of $\mathbb{R}^{nN}$, as

$$\bigcap_{y^{N}\in\mathcal{S}}\{z^{N}\in\mathcal{A}\colon\left<y^{N},z^{N}\right>\leq\mathbb{E}[\left<y^{N},\tilde{Z}^{N}\right>]+\tilde{\rho}(y^{N})+\ln(2|\mathcal{S}|)\}.\qquad(15)$$

Then by Markov’s inequality we have

$$P_{\tilde{Z}^{N}}[\mathcal{B}]\geq\frac{1}{2},\qquad(16)$$

and moreover,

$$\mathbb{E}[\sup_{z^{N}\in\mathcal{B}}\left<\hat{Y}^{N},z^{N}\right>]\leq\mathbb{E}[\tilde{\rho}(\hat{Y}^{N})]+\ln(2|\mathcal{S}|)\qquad(17)$$
$$\leq\mathbb{E}[\tilde{\rho}(\tilde{Y}^{N})]+\ln(2|\mathcal{S}|)\qquad(18)$$

where we used $\mathbb{E}[\hat{Y}^{N}]=0$, and the fact that $\tilde{\rho}(\cdot)$ is constant on $\mathcal{C}$ by permutation invariance of the type class. Now let ${\sf P}_{\tau nNl}(\mathcal{B})$ be the $\tau nNl$-packing number of $\mathcal{B}$ under $\|\cdot\|_{\mathcal{C}^{\circ}}$, which is upper bounded by the $nNl$-packing number under $\|\cdot\|_{\mathcal{S}^{\circ}}$ by (14). Therefore by Pajor's Lemma 1,

$$\ln{\sf P}_{\tau nNl}(\mathcal{B})\leq nN\ln\left(1+\frac{2}{nNl}\mathbb{E}\left[\sup_{z^{N}\in\mathcal{B}}\left<\hat{Y}^{N},z^{N}\right>\right]\right).\qquad(19)$$

For any $z^{N}\in\mathcal{A}$, the set $(z^{N}+\frac{\tau nNl}{2}\mathcal{C}^{\circ})\cap\mathcal{A}$ equals

$$\left\{\bar{z}^{N}\in\mathcal{A}\colon\left<\bar{z}^{N}-z^{N},y^{N}\right>\leq\frac{\tau nNl}{2}\textrm{ for all }y^{N}\textrm{ in the }P_{Y}\textrm{-type class}\right\},\qquad(20)$$

whose log-cardinality is, by large deviation analysis,

$$N\sup_{P_{\bar{Z}Z}}H(\bar{Z}|Z)+o(N),\qquad(21)$$

where the supremum is over the same set as the infimum in (8). Note that the packing number can be lower bounded by $|\mathcal{B}|$ divided by the cardinality of the set in (20); using (16) and (21) we have

$$\ln{\sf P}_{\tau nNl}(\mathcal{B})\geq N\inf_{P_{\bar{Z}Z}}I(\bar{Z};Z)+o(N).\qquad(22)$$

The theorem follows from (11), (18), (19), (22) by taking $N$ large. ∎

Next, we consider a limiting case of Theorem 1 as $l\to 0$.

Definition 1.

Fix a distribution $P_Y$ on $\mathbb{R}^n$. For any $L>0$ define

$${\sf h}_{L}=\sup_{P_{X}}h(X),\qquad(23)$$

where the supremum is over all $P_X$ satisfying $\sup_{P_{XY}\in\Pi(P_{X},P_{Y})}\mathbb{E}[\left<X,Y\right>]\leq nL$.

Remark 3.

Since $h(LX)=h(X)+n\ln L$, we see that ${\sf h}_{L}={\sf h}_{1}+n\ln L$. Moreover, if $Y$ is standard Gaussian then the supremum is achieved by $X=LY$ by Talagrand's inequality (the special case of (3) with $R\to\infty$), and therefore ${\sf h}_{L}=\frac{n}{2}\ln(2\pi eL^{2})$.

Corollary 3.

Suppose that $P_Y$ and $P_Z$ are as in Theorem 1, and additionally, $P_Z$ has well-defined differential entropy. Let ${\sf h}_{1}$ be as in Definition 1. Then

$$h(Z)\leq{\sf h}_{1}+n\inf_{\tau>1}\ln\left(\frac{\tau}{n}\mathbb{E}[\rho(Y)]+\tau\theta(\tau)\right).\qquad(24)$$
Proof.

Using Remark 3 we have

$$I(\bar{Z};Z)=h(Z)-h(Z-\bar{Z}|\bar{Z})\qquad(25)$$
$$\geq h(Z)-h(Z-\bar{Z})\qquad(26)$$
$$\geq h(Z)-{\sf h}_{\frac{\tau l}{2}}\qquad(27)$$
$$=h(Z)-{\sf h}_{1}-n\ln\frac{\tau l}{2},\qquad(28)$$

where (27) uses the constraint $\sup_{P_{(Z-\bar{Z})Y}}\mathbb{E}[\left<Z-\bar{Z},Y\right>]\leq\frac{\tau ln}{2}=n\cdot\frac{\tau l}{2}$ from Theorem 1. The corollary follows by taking $l\to 0$ in (8). ∎

Remark 4.

From (10)-(11) we see that the bounds in Theorem 1 and Corollary 3 can be sharpened by replacing $\mathbb{E}[\rho(Y)]$ with the right side of (12). It might appear that this is a strict improvement, but it is in fact an equivalent version, since the converse implication is also true: the validity of these bounds with $\mathbb{E}[\rho(Y)]$ for all $P_Z$ and $n$ implies the sharpened versions with the right side of (12), by a tensorization argument similar to (10)-(11).

An equivalent form of (3) is the following:

Corollary 4.

Let $P_Y$ and $P_Z$ be as in Corollary 3. Denote by $\tau(\cdot)$ the inverse function of $\theta(\cdot)$. For any $R$, we have

$$\sup_{P_{YZ}}\mathbb{E}[\left<Y,Z\right>]\geq\frac{n}{\tau(R/n)}\exp\left(\frac{h(Z)-{\sf h}_{1}}{n}\right)\qquad(29)$$

where the supremum is over $P_{YZ}\in\Pi(P_{Y},P_{Z})$ satisfying $I(Y;Z)\leq R$.

Proof.

Using Corollary 3 and Remark 4 we obtain

$$\exp\left(\tfrac{h(Z)-{\sf h}_{1}}{n}\right)\leq\inf_{\theta>0}\left(\tfrac{\tau(\theta)}{n}\sup_{P_{YZ}\in\Pi(P_{Y},P_{Z})}\left\{-I(Y;Z)+\mathbb{E}[\left<Y,Z\right>]\right\}+\tau(\theta)\theta\right).\qquad(30)$$

Let $\lambda>0$ be such that the $P_{YZ}$ achieving

$$\sup_{P_{YZ}\in\Pi(P_{Y},P_{Z})}\left\{-I(Y;Z)+\lambda\mathbb{E}[\left<Y,Z\right>]\right\}\qquad(31)$$

satisfies $I(Y;Z)=R$. Make the substitution $Z\leftarrow\lambda Z$ in (30), and let $P_{YZ}$ be an optimal coupling for (31). We have

$$\lambda\exp\left(\tfrac{h(Z)-{\sf h}_{1}}{n}\right)=\exp\left(\tfrac{h(\lambda Z)-{\sf h}_{1}}{n}\right)\qquad(32)$$
$$\leq\inf_{\theta>0}\left(\tfrac{\tau(\theta)}{n}\left\{-I(Y;Z)+\lambda\mathbb{E}[\left<Y,Z\right>]\right\}+\tau(\theta)\theta\right)\qquad(33)$$
$$\leq\frac{\tau(\frac{1}{n}I(Y;Z))}{n}\cdot\lambda\mathbb{E}[\left<Y,Z\right>]\qquad(34)$$

which establishes the claim. ∎

IV The Gaussian Case

For Gaussian $P_Y$, we can improve the estimates of Section III by replacing Lemma 2 with the following sharp estimate, which follows from sphere covering (e.g. [4]).

Lemma 5.

Fix $\phi\in(0,\frac{\pi}{2})$. For any positive integer $N$ there exists a set $\mathcal{S}_{N}$ on the unit ball $B_{1}^{N}$ satisfying $|\mathcal{S}_{N}|=\frac{1}{\sin^{N}(\phi+o(1))}$ and $\frac{1}{\cos\phi}\operatorname{conv}(\mathcal{S}_{N})\supseteq B_{1}^{N}$.

Now define the functions $\theta_{0}(\tau)$ and $\tau_{0}(\theta)$ by the equations $\tau=\frac{1}{\cos\phi}$ and $\theta=\ln\frac{1}{\sin\phi}$; explicitly, since $\sin\phi=\sqrt{1-\tau^{-2}}$,

$$\theta_{0}(\tau)=\ln\frac{1}{\sqrt{1-\tau^{-2}}};\qquad(35)$$
$$\tau_{0}(\theta)=\frac{1}{\sqrt{1-e^{-2\theta}}}.\qquad(36)$$
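The Gaussian improvement can be seen numerically (a sketch reusing the hypothetical kappa_theta helper from the sketch in Section II): the Gaussian exponent $\theta_{0}(\tau)$ is strictly smaller than the general exponent $\theta(\tau)$ from (6)-(7).

```python
import math

def theta0(tau):
    """Gaussian exponent from (35)."""
    return -0.5 * math.log(1.0 - tau ** -2)

# kappa_theta() is the bisection sketch from Section II
for tau in (1.1, 2.0, 5.0):
    print(tau, kappa_theta(tau)[1], theta0(tau))  # theta_0(tau) < theta(tau)
```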
Theorem 2.

If $P_Y$ is the standard Gaussian distribution on $\mathbb{R}^n$, then the bounds in Theorem 1, Corollary 3, and Corollary 4 can be improved by replacing $\theta(\tau)$ and $\tau(\theta)$ with $\theta_{0}(\tau)$ and $\tau_{0}(\theta)$. Moreover, the left side of (8) can be improved to $\inf_{P_{\bar{Z}Z}\colon\mathbb{E}[\|Z-\bar{Z}\|^{2}]\leq\tau^{2}l^{2}n/4}I(\bar{Z};Z)$.

Proof.

The proof is similar to the general non-Gaussian case, and we shall only mention a few differences in the argument. It suffices to consider $P_Z$ supported on a finite set with all probability masses equal to rational numbers. Let $N>0$ be divisible by the denominators of these rational numbers. Define $\mathcal{A}$, $P_{\tilde{Z}^{N}}$, and $\tilde{\rho}$ as in the proof of Theorem 1. Then we have

$$h(\tilde{Z}^{N})=Nh(Z)+o(N),\qquad(37)$$
$$\mathbb{E}[\tilde{\rho}(Y^{N})]=\mathbb{E}\left[\ln\int_{\mathcal{A}}e^{\left<Y^{N},z^{N}\right>}d{P_{Z}}^{\otimes N}(z^{N})\right]+o(N)\leq\mathbb{E}\left[\ln\int e^{\left<Y^{N},z^{N}\right>}d{P_{Z}}^{\otimes N}(z^{N})\right]+o(N)=N\mathbb{E}[\rho(Y)]+o(N).\qquad(38)$$

Define $r_{N}:=\mathbb{E}[\|Y^{N}\|]$ and let $\tilde{Y}^{N}:=\frac{r_{N}}{\|Y^{N}\|}Y^{N}$. Note that $\mathbb{E}[Y^{N}|\tilde{Y}^{N}]=\tilde{Y}^{N}$, and therefore by Jensen's inequality,

$$\mathbb{E}[\tilde{\rho}(\tilde{Y}^{N})]\leq\mathbb{E}[\tilde{\rho}(Y^{N})].\qquad(39)$$

Choose $\mathcal{S}$ as before but using Lemma 5 instead, replacing $\theta(\tau)$ in (13) with $\theta_{0}(\tau)$. Define $\hat{Y}^{N}$ as the random variable distributed on $\mathcal{S}$ according to the cone volume measure, and let $U$ be a random rotation in $\mathbb{R}^{nN}$, independent of $\hat{Y}^{N}$ and uniformly distributed on the orthogonal group. Then $\mathbb{E}[\tilde{\rho}(\tilde{Y}^{N})]=\mathbb{E}[\tilde{\rho}(U\hat{Y}^{N})]$. There exists some (deterministic) rotation $u$ such that $\mathbb{E}[\tilde{\rho}(U\hat{Y}^{N})]\geq\mathbb{E}[\tilde{\rho}(u\hat{Y}^{N})]$, which we can assume without loss of generality to be the identity, so that

$$\mathbb{E}[\tilde{\rho}(\tilde{Y}^{N})]\geq\mathbb{E}[\tilde{\rho}(\hat{Y}^{N})].\qquad(40)$$

The rest of the proof is similar to that of Theorem 1, where $\mathcal{C}$ is now the centered sphere of radius $r_{N}=\sqrt{nN}(1+o(1))$. The improved estimate on the left side of (8) is seen by refining (20) for $y^{N}$ in a ball. ∎

Remark 5.

The bounds claimed in Theorem 2 are asymptotically tight (as $n\to\infty$) when $P_Z$ is uniform on a ball.

V Spiked Tensor Model

In this section we explore bounds of the form (2) when $Z=X^{\otimes d}$ has a special rank-1 tensor structure, and the implications for the spiked tensor detection problem in statistics [20][6]. The set of order-$d$ tensors on $\mathbb{R}^n$ is again a vector space, and can be given an inner product compatible with the Frobenius norm. The dimension of the space of order-$d$ tensors is $n^{d}$, and that of symmetric order-$d$ tensors is $\binom{n+d-1}{d}$; both are too large for a dimension-dependent minoration such as Lemma 1 to be applied directly with tight results. As mentioned in Remark 1, a dimension reduction argument may be applied. In this section, we shall just focus on the case of Gaussian $P_Y$, where we can apply a Gaussian comparison argument reducing to the random energy model (REM). The result we will use is (see [17, p. 150])

$$\lim_{M\to\infty}\frac{1}{M}\mathbb{E}\left[\ln\sum_{j=1}^{2^{M}}e^{-\beta E_{j}}\right]=\begin{cases}\frac{\beta^{2}}{4}+\ln 2&(\beta<2\sqrt{\ln 2})\\ \beta\sqrt{\ln 2}&(\beta\geq 2\sqrt{\ln 2})\end{cases}\qquad(43)$$

where $E_{j}\sim\mathcal{N}(0,M/2)$, $j=1,\dots,2^{M}$, are independent.
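The formula (43) is easy to probe numerically; the following Monte Carlo sketch (sample sizes are arbitrary choices) compares the empirical free energy at finite $M$ with the limit:

```python
import numpy as np

def rem_mc(M, beta, trials=20, seed=0):
    """Monte Carlo estimate of (1/M) E[ln sum_{j<=2^M} exp(-beta*E_j)],
    with E_j i.i.d. N(0, M/2), evaluated via a stable log-sum-exp."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(trials):
        E = rng.normal(0.0, np.sqrt(M / 2.0), size=2 ** M)
        v = -beta * E
        m = v.max()
        out.append((m + np.log(np.sum(np.exp(v - m)))) / M)
    return float(np.mean(out))

def rem_limit(beta):
    """Right side of (43)."""
    if beta < 2.0 * np.sqrt(np.log(2.0)):
        return beta ** 2 / 4.0 + np.log(2.0)
    return beta * np.sqrt(np.log(2.0))

for beta in (1.0, 2.0):  # straddles the threshold 2*sqrt(ln 2) ~ 1.665
    print(beta, rem_mc(18, beta), rem_limit(beta))
```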

We will lower bound the soft-max (free energy) when $X\in\mathbb{R}^n$ follows the equiprobable distribution on a type class; once this setting is understood, the free energy for other permutation invariant $P_X$ (such as i.i.d. coordinates) can be estimated using the standard method of types and large deviation analysis.

Theorem 3.

Let $P_{\sf X}$ be a distribution on $\mathbb{R}$ with finite support, and let $\sqrt{n}X=({\sf X}_{1},\dots,{\sf X}_{n})$ be equiprobably distributed on the $P_{\sf X}$-type class (with rounding if necessary). Define $Z=\sqrt{\frac{n}{2}}\lambda X^{\otimes d}$ and

$$\rho(y):=\ln\mathbb{E}[e^{\left<y,Z\right>}]\qquad(44)$$

for any order-$d$ tensor $y\in\mathbb{R}^{n^{d}}$, and let

$$I_{\epsilon}:=\inf_{\substack{P_{\sf X\bar{X}}\in\Pi(P_{\sf X},P_{\sf X})\\ \mathbb{E}^{d}[{\sf X\bar{X}}]\geq\mathbb{E}^{d}[|{\sf X}|^{2}]-\epsilon^{2}/2}}I({\sf X};\bar{\sf X}).\qquad(45)$$

Then

$$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\rho(G)]\geq\sup_{\epsilon>0}\begin{cases}\frac{\lambda^{2}\epsilon^{2}}{8}&(\lambda^{2}\epsilon^{2}<8I_{\epsilon})\\ \lambda\epsilon\sqrt{\frac{1}{2}I_{\epsilon}}-I_{\epsilon}&(\lambda^{2}\epsilon^{2}\geq 8I_{\epsilon})\end{cases}\qquad(48)$$

where $G\in\mathbb{R}^{n^{d}}$ is an order-$d$ tensor with i.i.d. standard Gaussian entries.

Proof.

Set $r=\sqrt{\frac{n}{2}}\lambda\epsilon$, where $\epsilon>0$ does not depend on $n$. Define

$$k_{\epsilon,n}:=\mathbb{E}^{-1}[P_{Z}(Z+B(r))],\qquad(49)$$

where $B(r)$ denotes the centered ball of radius $r$ in the space of tensors under the Frobenius norm. Let $\sqrt{n}x$ and $\sqrt{n}\bar{x}$ be two sequences in the $P_{\sf X}$-type class, and define $z$ and $\bar{z}$ accordingly. Let $t$ be the joint type of $(x,\bar{x})$. Then

$$\|z-\bar{z}\|^{2}=\tfrac{n\lambda^{2}}{2}\|x^{\otimes d}-\bar{x}^{\otimes d}\|^{2}=\tfrac{n\lambda^{2}}{2}\left(\|x\|^{2d}+\|\bar{x}\|^{2d}-2\left<x,\bar{x}\right>^{d}\right)=n\lambda^{2}(\mathbb{E}_{P_{\sf X}}^{d}[{\sf X}^{2}]-\mathbb{E}_{t}^{d}[{\sf X\bar{X}}]),\qquad(50)$$

where $\mathbb{E}_{t}$ denotes the expectation under the type $t$, viewed as a distribution on $\mathbb{R}^{2}$. Then by large deviation analysis,

$$\lim_{n\to\infty}\frac{1}{n}\ln k_{\epsilon,n}=I_{\epsilon}.\qquad(51)$$

We now generate a random measure $\hat{\nu}$ supported on $\mathcal{A}$, the support of $P_Z$. Select uniformly at random a point in $\mathcal{A}$, then select uniformly at random the next point among all points of $\mathcal{A}$ at least $r$ away from the previously selected points, and so on, until no more points can be selected. Let $\hat{\nu}$ be the equiprobable distribution on the selected points; $\hat{\nu}$ is random because of the randomness of the point selection process. By symmetry of the type class, we see that $\mathbb{E}[\hat{\nu}]=P_{Z}$, so by Jensen's inequality,

$$\mathbb{E}[\rho(G)]\geq\mathbb{E}\left[\ln\int e^{\left<G,z\right>}d\hat{\nu}(z)\right].\qquad(52)$$

Since the support size of $\hat{\nu}$ is at least $k_{\epsilon,n}$, using (43) and Slepian's comparison [5], we can lower bound the right side of (52) by the free energy of the REM with parameters $M,\beta$ given by

$$M=\log_{2}k_{\epsilon,n};\qquad(53)$$
$$M\beta^{2}=r^{2},\qquad(54)$$

and the theorem follows by taking $n\to\infty$. Indeed, since $\hat{\nu}$ is a probability measure, $\ln\int e^{\left<G,z\right>}d\hat{\nu}=\ln\sum_{j}e^{\left<G,z_{j}\right>}-\ln k_{\epsilon,n}$, so (43) yields $\frac{M\beta^{2}}{4}+M\ln 2-\ln k_{\epsilon,n}=\frac{r^{2}}{4}=\frac{n\lambda^{2}\epsilon^{2}}{8}$ when $\beta<2\sqrt{\ln 2}$ (equivalently, by (51), $\lambda^{2}\epsilon^{2}<8I_{\epsilon}$ in the limit), and $r\sqrt{\ln k_{\epsilon,n}}-\ln k_{\epsilon,n}=n\lambda\epsilon\sqrt{\tfrac{1}{2}I_{\epsilon}}-nI_{\epsilon}+o(n)$ otherwise. ∎

While the statistical physics literature mostly focuses on $X$ equiprobable on a Boolean cube, general $X$ is relevant for statistical applications such as spiked tensor detection [20][6]. Let the noise $W\in\mathbb{R}^{n^{d}}$ be a tensor with i.i.d. $\mathcal{N}(0,\frac{2}{n})$ entries. Consider a hypothesis testing problem with observation $T$:

  • $H_{0}$: $T=W$;

  • $H_{1}$: $T=\lambda X^{\otimes d}+W$,

where $\lambda>0$ is the signal-to-noise ratio. Denote by $P_{H_{0}}$ and $P_{H_{1}}$ the distributions of $T$ under $H_{0}$ and $H_{1}$, respectively. From the Gaussian density formula it is easy to see that

$$D(P_{H_{0}}\|P_{H_{1}})=\frac{n\lambda^{2}}{4}-\mathbb{E}\left[\ln\mathbb{E}[e^{\frac{n\lambda}{2}\left<W,X^{\otimes d}\right>}|W]\right].\qquad(55)$$
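For completeness, here is the computation behind (55), under the normalization $\|X^{\otimes d}\|=1$ on the type class (an assumption consistent with the unit-variance condition in Corollary 6 below): with densities $p_{0}(t)\propto e^{-\frac{n}{4}\|t\|^{2}}$ and $p_{1}(t)=\mathbb{E}_{X}[p_{0}(t-\lambda X^{\otimes d})]$,

$$\ln\frac{p_{0}(t)}{p_{1}(t)}=-\ln\mathbb{E}_{X}\left[e^{\frac{n}{4}(\|t\|^{2}-\|t-\lambda X^{\otimes d}\|^{2})}\right]=\frac{n\lambda^{2}}{4}-\ln\mathbb{E}_{X}\left[e^{\frac{n\lambda}{2}\left<t,X^{\otimes d}\right>}\right],$$

and taking $t=W\sim P_{H_{0}}$ and expectations yields (55). Moreover, writing $W=\sqrt{2/n}\,G$ with $G$ standard Gaussian, we have $\frac{n\lambda}{2}\left<W,X^{\otimes d}\right>=\left<G,Z\right>$ with $Z=\sqrt{\frac{n}{2}}\lambda X^{\otimes d}$, so the subtracted term in (55) is exactly $\mathbb{E}[\rho(G)]$ from Theorem 3.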

Using concentration, it can be shown that the critical $\lambda$ for detecting a rank-1 spike with nontrivial probability coincides with the largest $\lambda$ for which $D(P_{H_{0}}\|P_{H_{1}})=o(n)$. Previously, [20] computed this critical $\lambda$ by bounding $D(P_{H_{0}}\|P_{H_{1}})$ with the Rényi divergence $D_{2}(P_{H_{1}}\|P_{H_{0}})$. However, when $\lambda$ is above the critical value, $D_{2}(P_{H_{1}}\|P_{H_{0}})$ grows super-linearly in $n$ (see [20, Section 2.4]) and hence does not give a useful bound on $D(P_{H_{0}}\|P_{H_{1}})$, and hence on the free energy. In contrast, we show that Theorem 3 is asymptotically tight for large $d$:

Corollary 6.

In Theorem 3, suppose that $P_{\sf X}$ has unit variance. Then

$$\lim_{d\to\infty}\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\rho(G)]=\begin{cases}\frac{\lambda^{2}}{4}&(\lambda<2\sqrt{H})\\ \lambda\sqrt{H}-H&(\lambda\geq 2\sqrt{H})\end{cases}\qquad(58)$$

where $H$ is the entropy of $P_{\sf X}$. In particular, if $\lambda\geq 2\sqrt{H}$ and the type I error in spiked tensor detection is bounded away from 0 and 1, then the optimal type II error exponent converges to $(\lambda/2-\sqrt{H})^{2}$ as $d\to\infty$.

Proof.

For any $\epsilon\in(0,\sqrt{2})$, $I_{\epsilon}$ in Theorem 3 converges to $H$ as $d\to\infty$. Taking $\epsilon\uparrow\sqrt{2}$ proves the $\geq$ part. To see the $\leq$ part, we follow [20] and consider the maximum likelihood statistic

$$m:=\max_{v}\left<T,v^{\otimes d}\right>\qquad(59)$$

where the maximum is over $v$ such that $\sqrt{n}v$ is $P_{\sf X}$-typical. Let $\mathcal{E}_{n}$ be the event that $m\leq m_{n}$, where $m_{n}$ is defined as the number such that $P_{H_{0}}(\mathcal{E}_{n})=\frac{1}{2}$. By the union bound calculation in [20, Proposition 4.1], we have $\lim_{n\to\infty}m_{n}\leq 2\sqrt{H}$. Then

$$P_{H_{1}}(\mathcal{E}_{n})\leq P_{H_{1}}(\left<T,X^{\otimes d}\right>\leq m_{n})\qquad(60)$$
$$=P_{H_{1}}(\lambda+\left<W,X^{\otimes d}\right>\leq m_{n}).\qquad(61)$$

Note that $\left<W,X^{\otimes d}\right>$ follows $\mathcal{N}(0,\frac{2}{n})$. If $\lambda\geq 2\sqrt{H}$, we have

$$\lim_{n\to\infty}\frac{1}{n}\ln P_{H_{1}}(\mathcal{E}_{n})\leq-(\tfrac{\lambda}{2}-\sqrt{H})^{2}.\qquad(62)$$

By the data processing inequality,

limn1nD(PH0PH1)\displaystyle\lim_{n\to\infty}\frac{1}{n}D(P_{H_{0}}\|P_{H_{1}}) limn1nd(PH0(n)PH1(n))\displaystyle\geq\lim_{n\to\infty}\frac{1}{n}d(P_{H_{0}}(\mathcal{E}_{n})\|P_{H_{1}}(\mathcal{E}_{n})) (63)
(λ2H)2\displaystyle\geq(\tfrac{\lambda}{2}-\sqrt{H})^{2} (64)

where $d(\cdot\|\cdot)$ denotes the binary divergence function. Combining with (55), we get $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}[\rho(G)]\leq\frac{\lambda^{2}}{4}-(\frac{\lambda}{2}-\sqrt{H})^{2}=\lambda\sqrt{H}-H$, which proves the $\leq$ part of (58) in the case $\lambda\geq 2\sqrt{H}$. The $\leq$ part in the case $\lambda<2\sqrt{H}$ is trivial from $D(P_{H_{0}}\|P_{H_{1}})\geq 0$. ∎
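As a numerical illustration of Corollary 6 (a sketch; the Rademacher prior with $H=\ln 2$ is just one example of a unit-variance prior):

```python
import math

H = math.log(2.0)  # entropy of a Rademacher prior

def free_energy_limit(lam, H=H):
    """Right side of (58)."""
    if lam < 2.0 * math.sqrt(H):
        return lam ** 2 / 4.0
    return lam * math.sqrt(H) - H

def type_ii_exponent(lam, H=H):
    """(lam/2 - sqrt(H))^2, the limiting optimal type II error
    exponent, valid for lam >= 2 sqrt(H)."""
    return (lam / 2.0 - math.sqrt(H)) ** 2

print(2.0 * math.sqrt(H))                            # critical lambda ~ 1.6651
print(free_energy_limit(2.0), type_ii_exponent(2.0)) # values at lambda = 2
```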

Remark 6.

Results related to Corollary 6 have appeared in the literature: as mentioned, [20] performed 2-Rényi divergence calculations to show that the critical $\lambda$ converges to $2\sqrt{H}$ as $d\to\infty$. The 2-Rényi divergence is equivalent to the expected partition function of a 2-replica system. For $X$ equiprobable on the hypercube, a classical replica symmetry calculation (see e.g. [17]) shows that the free energy of the $d$-spin model converges to that of the REM as $d\to\infty$.

VI Acknowledgement

The author would like to thank Qiang Wu for discussions on the SK model and tensor detection.

References

  • [1] Yikun Bai, Xiugang Wu, and Ayfer Ozgur. Information constrained optimal transport: From Talagrand, to Marton, to Cover. arXiv:2008.10249.
  • [2] L. P. Barnes, X. Wu, and A. Ozgur. A solution to Cover's problem for the binary symmetric relay channel: Geometry of sets on the Hamming sphere. In 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 844–851, Oct 2017.
  • [3] Alexander Barvinok. Thrifty approximations of convex bodies by polytopes. International Mathematics Research Notices, 2014(16):4341–4356, 2014.
  • [4] Károly Böröczky and Gergely Wintsche. Covering the sphere by equal spherical balls. In Discrete and computational geometry, pages 235–251. Springer, 2003.
  • [5] Sourav Chatterjee. An error bound in the Sudakov-Fernique inequality. arXiv preprint math/0510424, 2005.
  • [6] Wei-Kuo Chen. Phase transition in the spiked random tensor with Rademacher prior. The Annals of Statistics, 47(5):2734–2756, 2019.
  • [7] T. M. Cover. The capacity of the relay channel. Open Problems in Communication and Computation, edited by T. M. Cover and B. Gopinath, New York: Springer-Verlag, pages 72–73, 1987.
  • [8] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013.
  • [9] Abbas El Gamal, Amin Gohari, and Chandra Nair. A strengthened cutset upper bound on the capacity of the relay channel and applications. IEEE Transactions on Information Theory, 2022.
  • [10] Rafał Latała. Sudakov-type minoration for log-concave vectors. Studia Mathematica, 223(3):251–274, 2014.
  • [11] Michel Ledoux and Michel Talagrand. Probability in Banach Spaces: isoperimetry and processes. Springer Science & Business Media, Berlin Heidelberg, 1991.
  • [12] Jingbo Liu. Soft minoration: Solution to Cover's problem in the original discrete memoryless setting. In 2021 IEEE International Symposium on Information Theory (ISIT), pages 1648–1652. IEEE, 2021.
  • [13] Jingbo Liu. Minoration via mixed volumes and Cover's problem for general channels. Probability Theory and Related Fields, 183(1):315–357, 2022.
  • [14] Jingbo Liu and Ayfer Ozgur. New converses for the relay channel via reverse hypercontractivity. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 2878–2882, 2019.
  • [15] Jingbo Liu and Ayfer Özgür. Capacity upper bounds for the relay channel via reverse hypercontractivity. IEEE Transactions on Information Theory, 66(9):5448–5455, 2020.
  • [16] Shahar Mendelson, Emanuel Milman, and Grigoris Paouris. Generalized dual Sudakov minoration via dimension-reduction program. Studia Mathematica, 244:159–202, 2019.
  • [17] Marc Mezard and Andrea Montanari. Information, physics, and computation. Oxford University Press, 2009.
  • [18] Jean-Christophe Mourrat. The Parisi formula is a Hamilton-Jacobi equation in Wasserstein space. Canadian Journal of Mathematics, 74(3):607–629, 2022.
  • [19] Alain Pajor. Sous-espaces $\ell_{1}^{n}$ des espaces de Banach. PhD thesis, Université Pierre et Marie Curie, https://perso.math.u-pem.fr/pajor.alain/recherche/docs/these.pdf, 8 Nov 1984.
  • [20] Amelia Perry, Alexander S Wein, Afonso S Bandeira, and Ankur Moitra. Optimality and sub-optimality of PCA I: Spiked random matrix models. The Annals of Statistics, 46(5):2416–2451, 2018.
  • [21] M.S. Pinsker. Information and information stability of random variables and processes. San Francisco : Holden-Day, translated and edited by Amiel Feinstein edition, 1964.
  • [22] Eliran Subag. The geometry of the Gibbs measure of pure spherical spin glasses. Inventiones Mathematicae, 210(1):135–209, 2017.
  • [23] Vladimir N Sudakov. Gaussian measures, Cauchy measures and $\varepsilon$-entropy. In Soviet Math. Dokl., volume 10, pages 310–313, 1969.
  • [24] Michel Talagrand. Free energy of the spherical mean field model. Probability theory and related fields, 134(3):339–382, 2006.
  • [25] Michel Talagrand. Mean field models for spin glasses: Volume I: Basic examples, volume 54. Springer Science & Business Media, 2010.
  • [26] X. Wu, L. P. Barnes, and A. Ozgur. The capacity of the relay channel: Solution to Cover’s problem in the Gaussian case. IEEE Transactions on Information Theory, 65(1):255–275, Jan. 2019.
  • [27] Xiugang Wu, Ayfer Özgür, and Liang-Liang Xie. Improving on the cut-set bound via geometric analysis of typical sets. IEEE Transactions on Information Theory, 63(4):2254–2277, 2017.