
Entropy and functional forms of the dimensional Brunn–Minkowski inequality in Gauss space

Gautam Aishwarya Department of Mathematics, Michigan State University, East Lansing 48824, USA. [email protected]  and  Dongbin Li Faculty of Science - Mathematics and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada. [email protected]
Abstract.

Given even strongly log-concave random vectors X0X_{0} and X1X_{1} in n\mathbb{R}^{n}, we show that a natural joint distribution (X0,X1)(X_{0},X_{1}) satisfies,

e1nD((1t)X0+tX1Z)(1t)e1nD(X0Z)+te1nD(X1Z),e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Z)}\geq(1-t)e^{-\frac{1}{n}D(X_{0}\|Z)}+te^{-\frac{1}{n}D(X_{1}\|Z)}, (1)

where ZZ is distributed according to the standard Gaussian measure γ\gamma on n\mathbb{R}^{n}, t[0,1]t\in[0,1], and D(Z)D(\cdot\|Z) is the Gaussian relative entropy. This extends and provides a different viewpoint on the corresponding geometric inequality proved by Eskenazis and Moschidis [15], namely that

γ((1t)K0+tK1)1n(1t)γ(K0)1n+tγ(K1)1n,\gamma\left((1-t)K_{0}+tK_{1}\right)^{\frac{1}{n}}\geq(1-t)\gamma(K_{0})^{\frac{1}{n}}+t\gamma(K_{1})^{\frac{1}{n}}, (2)

when K0,K1nK_{0},K_{1}\subseteq\mathbb{R}^{n} are origin-symmetric convex bodies. As an application, using Donsker–Varadhan duality, we obtain Gaussian Borell–Brascamp–Lieb inequalities applicable to even log-concave functions, which serve as functional forms of the Eskenazis–Moschidis inequality.

MSC classification: 37C10, 94A17, 52A40, 52A20, 49Q22.
GA is supported by NSF-DMS 2154402. DL acknowledges the support of the Natural Sciences and Engineering Research Council of Canada and the Department of Mathematical and Statistical Sciences at the University of Alberta.

1. Introduction and Main Results

Let γ\gamma denote the standard Gaussian probability measure on n\mathbb{R}^{n},  dγ(x)e12|x|2 dx,\textnormal{ d}\gamma(x)\propto e^{-\frac{1}{2}|x|^{2}}\textnormal{ d}x, where |||\cdot| denotes the Euclidean norm. It was shown by Eskenazis and Moschidis [15] that, if K0K_{0} and K1K_{1} are origin-symmetric convex bodies, then

γ((1t)K0+tK1)1n(1t)γ(K0)1n+tγ(K1)1n.\gamma\left((1-t)K_{0}+tK_{1}\right)^{\frac{1}{n}}\geq(1-t)\gamma(K_{0})^{\frac{1}{n}}+t\gamma(K_{1})^{\frac{1}{n}}. (3)

Here (1t)K0+tK1={(1t)x0+tx1:x0K0,x1K1}(1-t)K_{0}+tK_{1}=\{(1-t)x_{0}+tx_{1}:x_{0}\in K_{0},x_{1}\in K_{1}\} denotes the collection of tt-midpoints of all segments from K0K_{0} to K1K_{1}. Observe that the inequality (3) cannot hold for all compact sets K0,K1K_{0},K_{1}. This can easily be seen by fixing a set K0K_{0} of positive Gaussian measure, taking K1={x}K_{1}=\{x\}, and sending xx\to\infty. Without extra conditions on K0K_{0} and K1K_{1}, the Gaussian measure only satisfies

γ((1t)K0+tK1)γ(K0)1tγ(K1)t,\gamma\left((1-t)K_{0}+tK_{1}\right)\geq\gamma(K_{0})^{1-t}\gamma(K_{1})^{t}, (4)

by virtue of being log-concave. Recall that a measure ν\nu is said to be log-concave if it has a density of the form  dν dx=eV\frac{\textnormal{ d}\nu}{\textnormal{ d}x}=e^{-V}, VV convex, with respect to the Lebesgue measure.
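Concretely, if K_{1}=\{x\}, then (1-t)K_{0}+tK_{1} is a translate of (1-t)K_{0} escaping to infinity, so that

\gamma\big((1-t)K_{0}+tx\big)^{\frac{1}{n}}\longrightarrow 0\ \text{ as }|x|\to\infty,\qquad\text{while}\qquad(1-t)\gamma(K_{0})^{\frac{1}{n}}+t\,\gamma(\{x\})^{\frac{1}{n}}=(1-t)\gamma(K_{0})^{\frac{1}{n}}>0,

violating (3) once |x| is large.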

The inequality (3) was conjectured by Gardner and Zvavitch [17], originally for convex bodies containing the origin. But soon after, Nayar and Tkocz [27] found a counterexample and suggested the assumption of symmetry about the origin. It must be mentioned that the work of Eskenazis and Moschidis completed the proof of (3) by verifying a sufficient condition introduced by Kolesnikov and Livshyts [22], which is itself based on machinery developed by Kolesnikov and E. Milman [23, 24].

More generally, the following conjecture has garnered a lot of attention in the last few years.

Conjecture 1.1.

Let ν\nu be an even log-concave measure on n\mathbb{R}^{n}. Then, for origin-symmetric convex bodies K0,K1nK_{0},K_{1}\subseteq\mathbb{R}^{n}, we have

ν((1t)K0+tK1)1n(1t)ν(K0)1n+tν(K1)1n.\nu\left((1-t)K_{0}+tK_{1}\right)^{\frac{1}{n}}\geq(1-t)\nu(K_{0})^{\frac{1}{n}}+t\nu(K_{1})^{\frac{1}{n}}. (5)

One reason why Conjecture 1.1 is of substantial interest is that it follows from the celebrated log-Brunn–Minkowski conjecture of Böröczky, Lutwak, Yang, and Zhang [6]. This implication was shown by Livshyts, Marsiglietti, Nayar, and Zvavitch [26]. Exciting recent developments include works by Livshyts [25], and by Cordero-Erausquin and Rotem [12].

Recently, Aishwarya and Rotem [1] took a completely different route to prove dimensional inequalities such as in (5) using entropy.

Definition 1.2 (Relative entropy).

Let ν\nu be a σ\sigma-additive Borel measure on n\mathbb{R}^{n}. For a probability measure μ\mu, we define the relative entropy of μ\mu with respect to ν\nu by

D(μν)={( dμ dν)log( dμ dν) dν, if μ has density w.r.t. ν,+, otherwise. D(\mu\|\nu)=\begin{cases}\int\left(\frac{\textnormal{ d}\mu}{\textnormal{ d}\nu}\right)\log\left(\frac{\textnormal{ d}\mu}{\textnormal{ d}\nu}\right)\textnormal{ d}\nu,&\textnormal{ if }\mu\textnormal{ has density w.r.t. }\nu,\\ +\infty,&\textnormal{ otherwise. }\\ \end{cases} (6)
Notation.

The relative entropy D(μν)D(\mu\|\nu) is also written as D(XY)D(X\|Y) when ν\nu is a probability measure, and X,YX,Y are n\mathbb{R}^{n}-valued random vectors with distributions μ,ν\mu,\nu, respectively. Note that the joint distribution (X,Y)(X,Y) is not specified because it is immaterial for this definition. See also Definition 1.3.

The technique in [1] is based on the variational principle [1, Lemma 2.7]:

ν(K)=supμ𝒫(K)eD(μν),\nu(K)=\sup_{\mu\in\mathcal{P}(K)}e^{-D(\mu\|\nu)}, (7)

which holds for every compact set KK and is attained by the normalised restriction νK\nu_{K} of ν\nu to KK, that is, νK(E)=ν(EK)ν(K)\nu_{K}(E)=\frac{\nu(E\cap K)}{\nu(K)} for every Borel set EE. As in formula (7), we will consistently write 𝒫(K)\mathcal{P}(K) for the collection of all probability measures on a given KK.
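Indeed, since \frac{\textnormal{ d}\nu_{K}}{\textnormal{ d}\nu}=\frac{\mathbf{1}_{K}}{\nu(K)}, a direct computation gives

D(\nu_{K}\|\nu)=\int_{K}\frac{1}{\nu(K)}\log\frac{1}{\nu(K)}\,\textnormal{ d}\nu=-\log\nu(K),\qquad\text{so}\qquad e^{-D(\nu_{K}\|\nu)}=\nu(K),

while for any \mu\in\mathcal{P}(K), Jensen's inequality yields D(\mu\|\nu)\geq-\log\nu(K); together these two observations prove (7).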

Suppose ν\nu is a probability measure, and YY is a random vector with distribution ν\nu. In light of the above variational formula, to prove the inequality (5), it suffices to show the existence of a joint distribution (X0,X1)(X_{0},X_{1}) with the marginals X0,X1X_{0},X_{1} having distributions νK0,νK1\nu_{K_{0}},\nu_{K_{1}}, respectively, such that the following entropy inequality holds:

e1nD((1t)X0+tX1Y)(1t)e1nD(X0Y)+te1nD(X1Y).e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Y)}\geq(1-t)e^{-\frac{1}{n}D(X_{0}\|Y)}+te^{-\frac{1}{n}D(X_{1}\|Y)}. (8)

This is because the distribution of (1t)X0+tX1(1-t)X_{0}+tX_{1} lies in 𝒫((1t)K0+tK1)\mathcal{P}\left((1-t)K_{0}+tK_{1}\right).
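Explicitly, chaining (7), (8), and the computation D(\nu_{K_{i}}\|\nu)=-\log\nu(K_{i}) for the optimisers in (7), one obtains

\nu\big((1-t)K_{0}+tK_{1}\big)^{\frac{1}{n}}\;\geq\;e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Y)}\;\geq\;(1-t)e^{-\frac{1}{n}D(X_{0}\|Y)}+te^{-\frac{1}{n}D(X_{1}\|Y)}\;=\;(1-t)\nu(K_{0})^{\frac{1}{n}}+t\,\nu(K_{1})^{\frac{1}{n}},

which is exactly (5).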

The definition and remarks below clarify our use of some standard terminology regarding joint distributions of random vectors.

Definition 1.3.
  1. (1)

    Let X0,X1X_{0},X_{1} be n\mathbb{R}^{n}-valued random vectors. By a joint distribution with marginals X0,X1X_{0},X_{1} we mean an n×n\mathbb{R}^{n}\times\mathbb{R}^{n}-valued random vector X¯\bar{X} such that {X¯E×n}={X0E}\mathbb{P}\{\bar{X}\in E\times\mathbb{R}^{n}\}=\mathbb{P}\{X_{0}\in E\} and {X¯n×E}={X1E}\mathbb{P}\{\bar{X}\in\mathbb{R}^{n}\times E^{\prime}\}=\mathbb{P}\{X_{1}\in E^{\prime}\} for Borel sets E,EE,E^{\prime}. Here \mathbb{P} denotes the measure on the underlying probability space over which our random vectors are defined. Such an X¯\bar{X} is often written simply as (X0,X1)(X_{0},X_{1}).

  2. (2)

    Likewise, a coupling of μ0,μ1𝒫(n)\mu_{0},\mu_{1}\in\mathcal{P}(\mathbb{R}^{n}) is a π𝒫(n×n)\pi\in\mathcal{P}(\mathbb{R}^{n}\times\mathbb{R}^{n}) such that π(E×n)=μ0(E),π(n×E)=μ1(E)\pi(E\times\mathbb{R}^{n})=\mu_{0}(E),\pi(\mathbb{R}^{n}\times E^{\prime})=\mu_{1}(E^{\prime}) for Borel sets E,EE,E^{\prime}.

Remarks.
  • If XiX_{i} has distribution μi\mu_{i}, that is {XiE}=μi(E)\mathbb{P}\{X_{i}\in E\}=\mu_{i}(E) for Borel sets EE and i=0,1i=0,1, then the distribution of every joint distribution (X0,X1)(X_{0},X_{1}) is a coupling π\pi and vice versa. However, we will sometimes also call (X0,X1)(X_{0},X_{1}) a coupling.

  • If (X0,X1)(X_{0},X_{1}) has distribution π\pi, then the distribution of the corresponding (1t)X0+tX1(1-t)X_{0}+tX_{1} is given by the pushforward measure [(x,y)(1t)x+ty]#π𝒫(n)\left[(x,y)\mapsto(1-t)x+ty\right]_{\#}\pi\in\mathcal{P}(\mathbb{R}^{n}).

The coupling (X0,X1)(X_{0},X_{1}) used in [1] to obtain several results is the so-called optimal coupling for the Monge–Kantorovich problem with quadratic cost, namely the one that minimises 𝔼|X0X1|2\mathbb{E}|X_{0}-X_{1}|^{2}. For example, [1, Theorem 1.3] implies that, when Y=ZY=Z has standard Gaussian distribution, the inequality (8) holds for the optimal coupling with a worse exponent (12n\frac{1}{2n} instead of 1n\frac{1}{n}) but for a larger class (K0,K1K_{0},K_{1} are only assumed to be star-shaped with respect to the origin, not necessarily symmetric or convex). This was the first time that a dimensional Brunn–Minkowski inequality was obtained for the Gaussian measure without convexity assumptions on the admissible sets (which is not possible with the earlier approach). However, while trying to obtain an inequality of the form (8) that would strengthen the result of Eskenazis and Moschidis, the authors in [1] faced a very interesting problem.

Question 1.4.

[1] Suppose X0,X1X_{0},X_{1} are n\mathbb{R}^{n}-valued random vectors with even strongly log-concave distributions, and assume that (X0,X1)(X_{0},X_{1}) is the optimal coupling. Is it true that each Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1}, t(0,1)t\in(0,1), satisfies the Poincaré inequality for odd functions with constant 11?

Recall that a random vector XX is said to satisfy a Poincaré inequality with constant 11 over a class of functions \mathcal{F}, if for every function ff\in\mathcal{F}, we have Var(f(X))𝔼|f(X)|2\textnormal{Var}(f(X))\leq\mathbb{E}|\nabla f(X)|^{2}. Further, a strongly log-concave random vector is one with distribution μ\mu such that  dμ dγ\frac{\textnormal{ d}\mu}{\textnormal{ d}\gamma} is a log-concave function (in this case, μ\mu is said to be a strongly log-concave measure). The relevance of this property in our context stems from the fact that γK\gamma_{K} is strongly log-concave whenever KK is a convex body. [1, Theorem 4.5] shows that the desired inequality (8) for Y=ZY=Z holds for the optimal coupling, and X0,X1X_{0},X_{1} even strongly log-concave, if the answer to Question 1.4 is positive.

The first main result of the present work is that there exists a coupling of even strongly log-concave random vectors such that (8) holds when Y=ZY=Z.

Theorem 1.5.

Let X0,X1X_{0},X_{1} be n\mathbb{R}^{n}-valued random vectors with even strongly log-concave distributions. Then, there is a coupling (X0,X1)(X_{0},X_{1}) of X0X_{0} and X1X_{1} such that

e1nD((1t)X0+tX1Z)(1t)e1nD(X0Z)+te1nD(X1Z).e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Z)}\geq(1-t)e^{-\frac{1}{n}D(X_{0}\|Z)}+te^{-\frac{1}{n}D(X_{1}\|Z)}. (9)

Moreover, for this coupling, we have equality if and only if X0X_{0} and X1X_{1} have the same distribution.

Remark 1.

The proof establishes the stronger inequality,

e1nD(XtZ)σ(1t)(θ)e1nD(X0Z)+σ(t)(θ)e1nD(X1Z),e^{-\frac{1}{n}D(X_{t}\|Z)}\geq\sigma^{(1-t)}\left(\theta\right)e^{-\frac{1}{n}D(X_{0}\|Z)}+\sigma^{(t)}\left(\theta\right)e^{-\frac{1}{n}D(X_{1}\|Z)}, (10)

where θ=(𝔼|X0X1|2)12\theta=\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}}, and

σ(t)(θ)=sin(2ntθ)sin(2nθ),\sigma^{(t)}(\theta)=\frac{\sin\left(\sqrt{\frac{2}{n}}t\theta\right)}{\sin\left(\sqrt{\frac{2}{n}}\theta\right)}, (11)

for this θ\theta, and t[0,1]t\in[0,1]. As discussed in the proof of Theorem 1.5, the θ\theta of interest is always strictly less than n/2π\sqrt{n/2}\pi. The equality characterisation in Theorem 1.5 follows from the equality characterisation for σ(t)(θ)=t\sigma^{(t)}(\theta)=t.
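In particular, (10) implies the inequality of Theorem 1.5: since \sin is concave on [0,\pi] and vanishes at 0, we have \sin(ts)\geq t\sin(s) for s\in[0,\pi] and t\in[0,1], so that

\sigma^{(t)}(\theta)=\frac{\sin\left(\sqrt{\tfrac{2}{n}}\,t\theta\right)}{\sin\left(\sqrt{\tfrac{2}{n}}\,\theta\right)}\;\geq\;t\qquad\text{for }0\leq\theta<\sqrt{\tfrac{n}{2}}\,\pi,

with equality for some t\in(0,1) only when \theta=0, that is, only when X_{0}=X_{1} almost surely under the coupling.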

The coupling we use is not the optimal coupling, but nonetheless arises from optimal transport. Let U,VU,V be n\mathbb{R}^{n}-valued random vectors satisfying 𝔼|U|2,𝔼|V|2<\mathbb{E}|U|^{2},\mathbb{E}|V|^{2}<\infty, such that their distributions have density with respect to the Lebesgue measure. Then, a theorem of Brenier (see [29, Theorem 2.12 (ii)]) guarantees that a unique coupling minimises 𝔼|UV|2\mathbb{E}|U-V|^{2}, and furthermore, it is given by (U,T(U))(U,T(U)) where T=ϕT=\nabla\phi is the gradient of a convex function ϕ\phi. Note that the map TT, called the Brenier map from UU to VV, pushes forward the distribution of UU to the distribution of VV. In the present work, we consider the Brenier map T0T_{0} from ZZ to X0X_{0}, the Brenier map T1T_{1} from ZZ to X1X_{1}, and work with the joint distribution (X0,X1)=(T0(Z),T1(Z))(X_{0},X_{1})=(T_{0}(Z),T_{1}(Z)).

The contraction theorem of Caffarelli [10] tells us that the Brenier map from the standard Gaussian to any strongly log-concave random vector is 11-Lipschitz. This automatically gives us that the Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1} we consider in this paper is a 11-Lipschitz image of ZZ under Tt=(1t)T0+tT1T_{t}=(1-t)T_{0}+tT_{1}. Given that ZZ satisfies a Poincaré inequality with constant 11, a standard change of variables argument immediately shows that XtX_{t} satisfies the Poincaré inequality with constant 11 for all functions. However, interestingly, we do not use this fact directly. Instead, we use the 11-Lipschitz property of TtT_{t} and the Poincaré constant of ZZ separately. It remains an open question whether the optimal coupling also satisfies the conclusion of Theorem 1.5.
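For completeness, the change-of-variables argument reads as follows: for smooth f, writing X_{t}=T_{t}(Z) with T_{t} 1-Lipschitz and \nabla T_{t} symmetric,

\textnormal{Var}\big(f(X_{t})\big)=\textnormal{Var}\big((f\circ T_{t})(Z)\big)\leq\mathbb{E}\big|\nabla(f\circ T_{t})(Z)\big|^{2}=\mathbb{E}\big|\nabla T_{t}(Z)\,\nabla f(T_{t}(Z))\big|^{2}\leq\mathbb{E}|\nabla f(X_{t})|^{2},

by the Gaussian Poincaré inequality and the operator-norm bound \|\nabla T_{t}\|_{\textnormal{op}}\leq 1.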

An important feature of the interpolation XtX_{t}, if considered under the optimal coupling, is that the trajectories {Tt(x)}t(0,1)\{T_{t}(x)\}_{t\in(0,1)} do not cross (in an almost-everywhere sense), and hence the distribution μt\mu_{t} of XtX_{t} can be described as the flow of μ0\mu_{0} under a time-dependent velocity field. Yet another useful property under the optimal coupling is that the velocity field generated is a gradient field. Both these properties are used in [1].

In our case, for XtX_{t} that we consider, we are not guaranteed the existence of a driving velocity field, nor do we see a reason for this velocity field to be a gradient field even if it exists. The former technical difficulty is overcome by a “trajectories do not cross” result when X0,X1X_{0},X_{1} are “nice” (Proposition 2.2), and approximation. The latter issue most prominently appears in the proof of Theorem 1.5, where an inequality such as 𝔼tr[v(Xt)2]𝔼|v(Xt)|2\mathbb{E}\textnormal{tr}[\nabla v(X_{t})^{2}]\geq\mathbb{E}|v(X_{t})|^{2} is needed for a particular odd vector field vv. This is always true when vv is a gradient field and the even random vector XtX_{t} has Poincaré constant 11, but not in general. To resolve this problem, we explicitly use the structure of the given vector field vv (which depends on TtT_{t}) and the Gaussian Poincaré inequality. This makes it unclear if our proof would go through if T0T_{0} and T1T_{1} were contractions (via the reverse Ornstein–Uhlenbeck process) introduced by Kim and E. Milman [20], and not Brenier maps. Readers familiar with the work of Alesker, Dar, and V. Milman [2] may find it intriguing to compare the fact that the coupling used in this paper admits a Theorem 1.5 (while for other aforementioned couplings such a result is yet unestablished), with Gromov’s observation that ϕ[n]+ψ[n]=(ϕ+ψ)[n]\nabla\phi[\mathbb{R}^{n}]+\nabla\psi[\mathbb{R}^{n}]=(\nabla\phi+\nabla\psi)[\mathbb{R}^{n}] when ϕ,ψ\phi,\psi are C2C^{2} convex functions with strictly positive Hessian [19, 1.3.A.] (see also [2, Proposition 2.2]).

As an immediate corollary to Theorem 1.5, using the variational principle (7), we obtain a new proof of Eskenazis and Moschidis’ result.

Corollary 1.6.

The dimensional Brunn–Minkowski inequality for the Gaussian measure (3) holds if K0K_{0} and K1K_{1} are origin-symmetric convex bodies.

Remark 2.

In view of Remark 1, it is an interesting question if one can meaningfully bound 𝔼|X0X1|2\mathbb{E}|X_{0}-X_{1}|^{2} from below, when X0,X1X_{0},X_{1} have distributions γK0,γK1\gamma_{K_{0}},\gamma_{K_{1}}, respectively, for symmetric convex bodies K0,K1K_{0},K_{1}. This could potentially lead to a Gaussian dimensional Brunn–Minkowski inequality for symmetric convex bodies which also incorporates the curvature aspects of the Gaussian measure. As far as we know, this has not been done.

A fundamental problem in this area concerns obtaining functional forms of geometric inequalities. This means, given a geometric inequality, one wants to find a functional inequality which recovers the given geometric inequality when applied to functions canonically associated with the involved sets (for example, to indicator functions). For a prototypical example, consider the Brunn–Minkowski inequality in its geometric-mean form that all log-concave measures ν\nu are known to satisfy:

ν((1t)K0+tK1)ν(K0)1tν(K1)t,\nu((1-t)K_{0}+tK_{1})\geq\nu(K_{0})^{1-t}\nu(K_{1})^{t}, (12)

whenever K0K_{0} and K1K_{1} are compact sets in n\mathbb{R}^{n}. The functional form of (12) is the Prékopa–Leindler inequality which concludes

h dν(f dν)1t(g dν)t,\int h\textnormal{ d}\nu\geq\left(\int f\textnormal{ d}\nu\right)^{1-t}\left(\int g\textnormal{ d}\nu\right)^{t}, (13)

whenever f,g,hf,g,h are non-negative functions satisfying

h((1t)x+ty)f(x)1tg(y)t,h((1-t)x+ty)\geq f(x)^{1-t}g(y)^{t}, (14)

for all x,yx,y. Of course, if ff and gg are indicator functions of K0K_{0} and K1K_{1}, respectively, then the indicator of (1t)K0+tK1(1-t)K_{0}+tK_{1} is an admissible choice for hh, thus producing (12). While several proofs of the Prékopa–Leindler inequality exist (for example, see [16]), an elegant proof can be obtained from the entropy form of (12): every pair of n\mathbb{R}^{n}-valued random vectors X0,X1X_{0},X_{1} with density (with respect to the Lebesgue measure) has a joint distribution (X0,X1)(X_{0},X_{1}) such that,

D((1t)X0+tX1Y)(1t)D(X0Y)+tD(X1Y),D((1-t)X_{0}+tX_{1}\|Y)\leq(1-t)D(X_{0}\|Y)+tD(X_{1}\|Y), (15)

where YY is a random vector with distribution ν\nu. The fact that the optimal coupling (X0,X1)(X_{0},X_{1}) satisfies (15) (see [3, Theorem 9.4.11]) is well known, and often recorded as the “displacement convexity of entropy on the metric measure space (n,||,ν)(\mathbb{R}^{n},|\cdot|,\nu)”. To go from (15) to (13) one can use the Donsker–Varadhan duality formula [13, Section 2] describing the Legendre transform of relative entropy. It says, for ν\nu-integrable functions ϕ\phi, we have

logeϕ dν=supμν[ϕ dμD(μν)],\log\int e^{\phi}\textnormal{ d}\nu=\sup_{\mu\ll\nu}\left[\int\phi\textnormal{ d}\mu-D(\mu\|\nu)\right], (16)

where the supremum on the right is over all probability measures μ\mu absolutely continuous with respect to ν\nu, and equality is attained in (16) for  dμeϕ dν\textnormal{ d}\mu\propto e^{\phi}\textnormal{ d}\nu.
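The attainment is a one-line check: if  d\mu_{\star}\propto e^{\phi}\textnormal{ d}\nu, then \log\frac{\textnormal{ d}\mu_{\star}}{\textnormal{ d}\nu}=\phi-\log\int e^{\phi}\textnormal{ d}\nu, so

\int\phi\,\textnormal{ d}\mu_{\star}-D(\mu_{\star}\|\nu)=\int\phi\,\textnormal{ d}\mu_{\star}-\int\left(\phi-\log\int e^{\phi}\,\textnormal{ d}\nu\right)\textnormal{ d}\mu_{\star}=\log\int e^{\phi}\,\textnormal{ d}\nu,

while the inequality \int\phi\,\textnormal{ d}\mu-D(\mu\|\nu)\leq\log\int e^{\phi}\,\textnormal{ d}\nu for arbitrary \mu\ll\nu follows from Jensen's inequality.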

We do not spell out the details of the implication (15) \Rightarrow (13) because the reader may infer the general idea from our proof of Theorem 1.7 which is rather short. Nonetheless, it is apt to remark here that this technique stands out because it entirely operates at the level of integrals and does not appeal to local estimates on the integrands (other than the one granted by assumption), thereby making it possible to extract functional inequalities even if a convexity property of entropy is only available on a restricted class of measures. The same cannot be said about some other transport-based proofs of the Prékopa–Leindler inequality (or its generalisations). Besides, this method works in measure spaces without any smooth structure. For example, the reader may find beautiful applications to discrete structures in works of Gozlan, Roberto, Samson, and Tetali [18], and Slomka [28].

We will use this duality to obtain a functional form of the dimensional Brunn–Minkowski inequality (3), which is our second main result.

Notation.

We write

Mpt(x,y){((1t)xp+typ)1p, for xy>0,0 otherwise,M_{p}^{t}(x,y)\coloneqq\begin{cases}\left((1-t)x^{p}+ty^{p}\right)^{\frac{1}{p}},&\hbox{ for }xy>0,\\ 0&\hbox{ otherwise,}\end{cases} (17)

for t[0,1]t\in[0,1] and p[,]p\in[-\infty,\infty].

Theorem 1.7.

Let p0p\geq 0, and suppose f,g,hf,g,h are non-negative functions on n\mathbb{R}^{n} with f,gf,g even log-concave and γ\gamma-integrable, such that

h((1t)x0+tx1)Mpt(f(x0),g(x1)).h((1-t)x_{0}+tx_{1})\geq M_{p}^{t}\left(f(x_{0}),g(x_{1})\right). (18)

Then, we have

h dγMp1+npt(f dγ,g dγ).\int h\textnormal{ d}\gamma\geq M_{\frac{p}{1+np}}^{t}\left(\int f\textnormal{ d}\gamma,\int g\textnormal{ d}\gamma\right). (19)

Indeed, when ff and gg are indicators of symmetric convex bodies, and p=p=\infty, one recovers (3). Inequalities such as in the theorem above are sometimes called Borell–Brascamp–Lieb inequalities after the works by Borell [5], and Brascamp–Lieb [7].
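To spell this out, take f=\mathbf{1}_{K_{0}}, g=\mathbf{1}_{K_{1}}, h=\mathbf{1}_{(1-t)K_{0}+tK_{1}}, and p=\infty. The hypothesis (18) holds because M_{\infty}^{t}(f(x_{0}),g(x_{1}))=1 forces x_{0}\in K_{0} and x_{1}\in K_{1}, whence (1-t)x_{0}+tx_{1}\in(1-t)K_{0}+tK_{1}; moreover, the exponent \frac{p}{1+np} equals \frac{1}{n} at p=\infty. Thus (19) reads

\gamma\big((1-t)K_{0}+tK_{1}\big)\;\geq\;M_{\frac{1}{n}}^{t}\big(\gamma(K_{0}),\gamma(K_{1})\big)=\Big((1-t)\gamma(K_{0})^{\frac{1}{n}}+t\,\gamma(K_{1})^{\frac{1}{n}}\Big)^{n},

which is (3).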

To the best of our knowledge, the argument for obtaining Theorem 1.7 from Theorem 1.5, though simple, is new. Previously, it was not clear how to apply duality (16) to inequalities such as (9) that are not linear in relative entropy. Exactly the same idea as we use in the proof of Theorem 1.7 gives further dimension-dependent functional inequalities for functions that are not necessarily log-concave, as discussed below.

Recall that a convex function V:nV:\mathbb{R}^{n}\to\mathbb{R} is said to be β\beta-homogeneous if V(λx)=λβV(x)V(\lambda x)=\lambda^{\beta}V(x), for every xnx\in\mathbb{R}^{n} and λ>0\lambda>0. Consider the probability measure νeV dx\nu\propto e^{-V}\textnormal{ d}x, such that VV is β\beta-homogeneous for some β(1,)\beta\in(1,\infty). Say ν\nu is represented by a random vector YY. Then, [1, Theorem 1.4] states that, for random vectors X0X_{0} and X1X_{1} having radially decreasing density with respect to ν\nu, there exists a coupling (X0,X1)(X_{0},X_{1}) (in fact, the optimal coupling works) such that

eβ1βnD((1t)X0+tX1Y)(1t)eβ1βnD(X0Y)+teβ1βnD(X1Y).e^{-\frac{\beta-1}{\beta n}D((1-t)X_{0}+tX_{1}\|Y)}\geq(1-t)e^{-\frac{\beta-1}{\beta n}D(X_{0}\|Y)}+te^{-\frac{\beta-1}{\beta n}D(X_{1}\|Y)}. (20)

From this, we have the following result.

Theorem 1.8.

Consider the probability measure  dνeV dx\textnormal{ d}\nu\propto e^{-V}\textnormal{ d}x, such that VV is β\beta-homogeneous for some β(1,)\beta\in(1,\infty). Let p0p\geq 0, and suppose f,g,hf,g,h are non-negative functions on n\mathbb{R}^{n} with f,gf,g radially decreasing and ν\nu-integrable, such that

h((1t)x0+tx1)Mpt(f(x0),g(x1)).h((1-t)x_{0}+tx_{1})\geq M_{p}^{t}\left(f(x_{0}),g(x_{1})\right). (21)

Then, we have

h dνM(β1)p(β1)+βnpt(f dν,g dν).\int h\textnormal{ d}\nu\geq M_{\frac{(\beta-1)p}{(\beta-1)+\beta np}}^{t}\left(\int f\textnormal{ d}\nu,\int g\textnormal{ d}\nu\right). (22)

The standard Gaussian measure falls under the regime β=2\beta=2. Thus, Theorem 1.8 can be applied to a larger class of functions compared to Theorem 1.7, but at the same time draws a weaker conclusion.
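For the record, the loss is only in the exponent:

\frac{(\beta-1)p}{(\beta-1)+\beta np}\bigg|_{\beta=2}=\frac{p}{1+2np}\;\leq\;\frac{p}{1+np},

and M_{s}^{t}\leq M_{r}^{t} whenever s\leq r, by the monotonicity of power means; so, on their common domain of applicability, the conclusion (19) is stronger than (22) with \beta=2.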

1.1. Related works

An independent preprint by Alexandros Eskenazis and Dario Cordero-Erausquin, containing weighted Borell–Brascamp–Lieb inequalities corresponding to the results in [15] and [12], is expected to appear soon. Their work uses the so-called L2L^{2}-method, which is based on a different perspective. We learnt from Alexandros Eskenazis, for example, that their equivalent of Theorem 1.7 applies to a wider class of reference measures (as in [12]) but requires greater restrictions on the admissible functions. To the best of our knowledge, the techniques in the aforementioned work may not directly extend to the treatment of even log-concave functions as in Theorem 1.7. We thank Alexandros Eskenazis for kindly sharing with us his joint results, and for his comments (in particular, but not limited to, his suggestion to discuss equality cases in Theorem 1.5). We eagerly await reading their paper!

We would also like to mention the related ongoing work of Andreas Malliaris, James Melbourne, and Cyril Roberto. James Melbourne discussed, with the first-named author, an elegant technique from their work to obtain Borell–Brascamp–Lieb inequalities for p(1/n,1]p\in(-1/n,1] directly from inequalities such as (3). This discussion took place at the Hausdorff Research Institute for Mathematics, before the present authors understood the precise way to pass from the exponentiated-entropy inequality of Theorem 1.5 to a result such as Theorem 1.7. According to GA’s recollection, the argument of Malliaris–Melbourne–Roberto is based on elementary measure theory instead of entropy. This makes their proofs very different, mathematically and spiritually, from the one presented in our paper. More recently, after writing this preprint, we learnt from James Melbourne about an approximation argument to extend their work to include p=1/np=-1/n from p(1/n,1]p\in(-1/n,1]. It should be noted that the p=1/np=-1/n case is very powerful: it implies Borell–Brascamp–Lieb inequalities for all p>1/np>-1/n. We have not verified the details of the original arguments in the work of Malliaris–Melbourne–Roberto, but we look forward to their work with great enthusiasm. Once the details are verified to be correct, it would not only significantly generalise the statements of our Theorems 1.7 and 1.8, but also reveal many previously unexplored consequences of Conjecture 1.1.

1.2. Further acknowledgements

We are grateful to Liran Rotem for his continued and generous sharing of insights on the topic of dimensional Brunn–Minkowski inequalities. Many thanks to Alexandros Eskenazis and James Melbourne for kindly sharing their results, and to Alexander Volberg for enriching discussions on a related problem. We also sincerely acknowledge the helpful discussions with Galyna Livshyts and Emma Pollard on potential functional forms of the Eskenazis–Moschidis inequality.

1.3. Organisation of the paper

The proof of Theorem 1.5 is based on an Eulerian description of mass transport. The required background is presented in Section 2. Proofs of Theorem 1.5 and Theorem 1.7 appear in Section 3. Theorem 1.8 follows along the same lines as Theorem 1.7, hence we omit its proof.

2. Preliminaries

First of all, we note that  d2 dt2e1nD(XtZ)0\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}e^{-\frac{1}{n}D(X_{t}\|Z)}\leq 0 is equivalent to  d2 dt2D(XtZ)1n( d dtD(XtZ))2\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(X_{t}\|Z)\geq\frac{1}{n}\left(\frac{\textnormal{ d}}{\textnormal{ d}t}D(X_{t}\|Z)\right)^{2}, whenever the relevant quantities have the required regularity. Thus, we would like to compute  d2 dt2D(XtZ)\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(X_{t}\|Z) and  d dtD(XtZ)\frac{\textnormal{ d}}{\textnormal{ d}t}D(X_{t}\|Z). Such local computations are often best performed in the language of velocity fields.
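This equivalence is a direct computation: abbreviating D(t)=D(X_{t}\|Z),

\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}\,e^{-\frac{1}{n}D(t)}=\frac{1}{n}\,e^{-\frac{1}{n}D(t)}\left(\frac{1}{n}D^{\prime}(t)^{2}-D^{\prime\prime}(t)\right),

and the prefactor \frac{1}{n}e^{-\frac{1}{n}D(t)} is positive, so the two signs agree.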

Suppose a curve {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]} of probability measures on n\mathbb{R}^{n} is given. A time-dependent velocity field vtv_{t} is said to be compatible with {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]} if

tμt+div(vtμt)=0\partial_{t}\mu_{t}+\textnormal{div}(v_{t}\mu_{t})=0 (23)

is satisfied in the weak sense, where div denotes divergence. The latter equation means that

 d dtf dμt=f,vt dμt,\frac{\textnormal{ d}}{\textnormal{ d}t}\int f\textnormal{ d}\mu_{t}=\int\langle\nabla f,v_{t}\rangle\textnormal{ d}\mu_{t}, (24)

for all compactly supported smooth functions ff. Once Equation (23) is known to hold, Equation (24) holds in wider generality; for example, it holds for all bounded Lipschitz functions (see [3, Chapter 8]).

Ignoring all regularity issues, we compute the first two derivatives of D(μtν)D(\mu_{t}\|\nu) when a compatible velocity field is given. We write the result for a general log-concave measure ν\nu since it may be of independent interest.

Proposition 2.1.

Let  dν=eW dx\textnormal{ d}\nu=e^{-W}\textnormal{ d}x, for smooth convex WW. Consider a curve of probability measures {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]} with a compatible velocity field vtv_{t}. If vtv_{t} is sufficiently smooth, then

 d dtD(μtν)=divW(vt) dμt,\begin{split}\frac{\textnormal{ d}}{\textnormal{ d}t}D(\mu_{t}\|\nu)=-\int\textnormal{div}^{W}(v_{t})\textnormal{ d}\mu_{t},\\ \end{split} (25)

where divW(v)=div(v)W,v\textnormal{div}^{W}(v)=\textnormal{div}(v)-\langle\nabla W,v\rangle, for vector fields vv. Moreover, if the trajectories of vtv_{t} take each particle along a straight line with constant speed, then

 d2 dt2D(μtν)=𝒢W(vt) dμt,\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(\mu_{t}\|\nu)=\int\mathcal{G}^{W}(v_{t})\textnormal{ d}\mu_{t},\\ (26)

where 𝒢W(v)=tr(v)2+2Wv,v\mathcal{G}^{W}(v)=\textnormal{tr}(\nabla v)^{2}+\langle\nabla^{2}W\cdot v,v\rangle, for vector fields vv.

Proof.

Let ρt\rho_{t} denote the density of μt\mu_{t} with respect to ν\nu. Then,

 d dtD(μtν)= d dtlogρt dμt=tlogρt dμt+logρt,vt dμt= d dtρt dν+logρt,vtρt dν=ρt,vt dν=ρt,vteW dx=ρtdiv(eWvt) dx=ρtdivW(vt) dν=divW(vt) dμt.\begin{split}\frac{\textnormal{ d}}{\textnormal{ d}t}D(\mu_{t}\|\nu)&=\frac{\textnormal{ d}}{\textnormal{ d}t}\int\log\rho_{t}\textnormal{ d}\mu_{t}=\int\partial_{t}\log\rho_{t}\textnormal{ d}\mu_{t}+\int\langle\nabla\log\rho_{t},v_{t}\rangle\textnormal{ d}\mu_{t}\\ &=\frac{\textnormal{ d}}{\textnormal{ d}t}\int\rho_{t}\textnormal{ d}\nu+\int\langle\nabla\log\rho_{t},v_{t}\rangle\rho_{t}\textnormal{ d}\nu=\int\langle\nabla\rho_{t},v_{t}\rangle\textnormal{ d}\nu\\ &=\int\langle\nabla\rho_{t},v_{t}\rangle e^{-W}\textnormal{ d}x=-\int\rho_{t}\textnormal{div}(e^{-W}v_{t})\textnormal{ d}x\\ &=-\int\rho_{t}\,\textnormal{div}^{W}(v_{t})\textnormal{ d}\nu=-\int\textnormal{div}^{W}(v_{t})\textnormal{ d}\mu_{t}.\end{split} (27)

In the above computation, the second equality uses the chain rule and the continuity equation (23), and the sixth equality is an application of integration by parts.

Further, note that tvt+vtvt=0\partial_{t}v_{t}+\nabla_{v_{t}}v_{t}=0 if the trajectories of vtv_{t} take particles along a straight line with constant speed. This allows the following computation to proceed.

 d2 dt2D(μtν)= d dtdivW(vt) dμt=divW(tvt) dμtdivW(vt),vt dμt=divW(vtvt) dμtdivW(vt),vt dμt=𝒢W(vt) dμt,\begin{split}\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(\mu_{t}\|\nu)&=-\frac{\textnormal{ d}}{\textnormal{ d}t}\int\textnormal{div}^{W}(v_{t})\textnormal{ d}\mu_{t}=-\int\textnormal{div}^{W}\left(\partial_{t}v_{t}\right)\textnormal{ d}\mu_{t}-\int\langle\nabla\textnormal{div}^{W}(v_{t}),v_{t}\rangle\textnormal{ d}\mu_{t}\\ &=\int\textnormal{div}^{W}\left(\nabla_{v_{t}}v_{t}\right)\textnormal{ d}\mu_{t}-\int\langle\nabla\textnormal{div}^{W}(v_{t}),v_{t}\rangle\textnormal{ d}\mu_{t}=\int\mathcal{G}^{W}(v_{t})\textnormal{ d}\mu_{t},\end{split} (28)

where in the second equality we use the chain rule and the continuity equation (23), while the last equality uses the pointwise formula

𝒢W(v)=divW(vv)divW(v),v,\mathcal{G}^{W}(v)=\textnormal{div}^{W}\left(\nabla_{v}v\right)-\langle\nabla\textnormal{div}^{W}(v),v\rangle, (29)

which holds for smooth vv. Formula (29) is an easily obtained weighted version of the Bochner formula for vector fields (see, for example, [30, Equation 14.26]). ∎
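For the reader's convenience, here is a verification of (29) in coordinates (summing over repeated indices):

\begin{split}\textnormal{div}(\nabla_{v}v)-\langle\nabla\,\textnormal{div}(v),v\rangle&=\partial_{i}(v_{j}\partial_{j}v_{i})-v_{j}\partial_{j}\partial_{i}v_{i}=\partial_{i}v_{j}\,\partial_{j}v_{i}=\textnormal{tr}(\nabla v)^{2},\\ -\langle\nabla W,\nabla_{v}v\rangle+\langle\nabla\langle\nabla W,v\rangle,v\rangle&=-\partial_{i}W\,v_{j}\partial_{j}v_{i}+v_{j}\partial_{j}(\partial_{i}W\,v_{i})=v_{i}(\partial_{i}\partial_{j}W)v_{j}=\langle\nabla^{2}W\cdot v,v\rangle;\end{split}

adding the two lines gives \mathcal{G}^{W}(v)=\textnormal{div}^{W}(\nabla_{v}v)-\langle\nabla\textnormal{div}^{W}(v),v\rangle.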

To utilise the above formulas for the derivatives of entropy, we need to establish the existence of compatible velocity fields in the cases of interest. We let In×nI_{n\times n} denote the n×nn\times n identity matrix.

Proposition 2.2.

Fix a probability measure  dν=eW dx\textnormal{ d}\nu=e^{-W}\textnormal{ d}x, and maps T0=ϕ0,T1=ϕ1:nnT_{0}=\nabla\phi_{0},T_{1}=\nabla\phi_{1}:\mathbb{R}^{n}\to\mathbb{R}^{n}, where ϕ0,ϕ1\phi_{0},\phi_{1} are convex functions. Set μt=Tt#ν\mu_{t}={T_{t}}_{\#}\nu, where Tt=(1t)T0+tT1T_{t}=(1-t)T_{0}+tT_{1}. Suppose 2ϕ0\nabla^{2}\phi_{0} and 2ϕ1\nabla^{2}\phi_{1} are both lower-bounded (in the positive semi-definite order) by λIn×n\lambda I_{n\times n} for some λ>0\lambda>0. Then, the equation

vt(Tt(x))= d dtTt(x)v_{t}(T_{t}(x))=\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x) (30)

defines a velocity field compatible with the curve {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]}.

Proof.

Evidently, the only obstruction to defining a velocity field is that two trajectories Tt(x)T_{t}(x) and Tt(y)T_{t}(y) cross each other at some time t(0,1)t\in(0,1), that is, that there is a tt_{\star} such that Tt(x)=Tt(y)T_{t_{\star}}(x)=T_{t_{\star}}(y) for xyx\neq y. However,

Tt(x)Tt(y),xy=(1t)(T0(x)T0(y))+t(T1(x)T1(y)),xy=(1t)T0(x)T0(y),xy+tT1(x)T1(y),xy(1t)λ|xy|2+tλ|xy|2=λ|xy|2,\begin{split}\langle T_{t}(x)-T_{t}(y),x-y\rangle&=\langle(1-t)\left(T_{0}(x)-T_{0}(y)\right)+t\left(T_{1}(x)-T_{1}(y)\right),x-y\rangle\\ &=(1-t)\langle T_{0}(x)-T_{0}(y),x-y\rangle+t\langle T_{1}(x)-T_{1}(y),x-y\rangle\\ &\geq(1-t)\lambda|x-y|^{2}+t\lambda|x-y|^{2}=\lambda|x-y|^{2},\end{split} (31)

because ϕ0λ2|x|2\phi_{0}-\frac{\lambda}{2}|x|^{2} and ϕ1λ2|x|2\phi_{1}-\frac{\lambda}{2}|x|^{2} are convex and consequently have monotone gradients. Thus, the possibility of this obstruction is ruled out. Now we verify the compatibility. For a compactly supported smooth function ff,

 d dtf dμt= d dtf(Tt(x)) dν(x)=f(Tt(x)), d dtTt(x) dν(x)=f(Tt(x)),vt(Tt(x)) dν=f,vt dμt.\begin{split}\frac{\textnormal{ d}}{\textnormal{ d}t}\int f\textnormal{ d}\mu_{t}&=\frac{\textnormal{ d}}{\textnormal{ d}t}\int f(T_{t}(x))\textnormal{ d}\nu(x)=\int\langle\nabla f(T_{t}(x)),\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x)\rangle\textnormal{ d}\nu(x)\\ &=\int\langle\nabla f(T_{t}(x)),v_{t}(T_{t}(x))\rangle\textnormal{ d}\nu=\int\langle\nabla f,v_{t}\rangle\textnormal{ d}\mu_{t}.\end{split} (32)

3. Proof of the main results

In this section, we solely work with the Gaussian measure as the reference measure. Thus, ν\nu from the previous section is taken to be γ\gamma. In this case, we will denote divW\textnormal{div}^{W} by div~\widetilde{\textnormal{div}} and 𝒢W\mathcal{G}^{W} by 𝒢~\widetilde{\mathcal{G}}.

Proof of Theorem 1.5.

Suppose X0,X1X_{0},X_{1} are even strongly log-concave random vectors in n\mathbb{R}^{n} with distributions μ0,μ1\mu_{0},\mu_{1}, respectively. Let T0T_{0} and T1T_{1} be Brenier maps from the standard Gaussian ZZ to X0X_{0} and X1X_{1}, respectively. With the joint distribution (T0(Z),T1(Z))(T_{0}(Z),T_{1}(Z)), and Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1}, we want to prove  d2 dt2e1nD(XtZ)0\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}e^{-\frac{1}{n}D(X_{t}\|Z)}\leq 0.

Suppose  dμ0eU0 dx\textnormal{ d}\mu_{0}\propto e^{-U_{0}}\textnormal{ d}x and  dμ1eU1 dx\textnormal{ d}\mu_{1}\propto e^{-U_{1}}\textnormal{ d}x. First, we assume that there is a κ<\kappa<\infty such that 2U0,2U1κIn×n\nabla^{2}U_{0},\nabla^{2}U_{1}\leq\kappa I_{n\times n}. By strong log-concavity, we already have 2U0,2U1In×n\nabla^{2}U_{0},\nabla^{2}U_{1}\geq I_{n\times n}. Thus, by Caffarelli’s contraction theorem (or a form thereof, see the statement in [11, Theorem 1]), we get that T0,T1T_{0},T_{1} are both 11-Lipschitz, while T01,T11T^{-1}_{0},T^{-1}_{1} are κ\sqrt{\kappa}-Lipschitz. If we write T0=ϕ0,T1=ϕ1T_{0}=\nabla\phi_{0},T_{1}=\nabla\phi_{1} as gradients of convex functions, then these bounds translate to 1κIn×n2ϕ0,2ϕ1In×n\frac{1}{\sqrt{\kappa}}I_{n\times n}\leq\nabla^{2}\phi_{0},\nabla^{2}\phi_{1}\leq I_{n\times n}. We infer from Proposition 2.2 that, if μt=Tt#γ\mu_{t}={T_{t}}_{\#}\gamma, Tt=(1t)T0+tT1T_{t}=(1-t)T_{0}+tT_{1} (thus μt\mu_{t} is the distribution of XtX_{t}), then a velocity field vtv_{t} compatible with μt\mu_{t} is well-defined by vt(Tt(x))= d dtTt(x)v_{t}(T_{t}(x))=\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x). Further, the smoothness of the velocity field vtv_{t} required to apply Proposition 2.1 can be obtained by Caffarelli’s regularity theory [8, 9]. From here we mimic the argument in [1, Theorem 4.5] with some modifications, but applied to vector fields, where we also use an analogue of a crucial auxiliary construction from [15]. We will prove the stronger inequality

𝒢~(vt) dμt2|vt|2 dμt+1n(div~(vt) dμt)2.\int\widetilde{\mathcal{G}}(v_{t})\textnormal{ d}\mu_{t}\geq 2\int|v_{t}|^{2}\textnormal{ d}\mu_{t}+\frac{1}{n}\left(\int\widetilde{\textnormal{div}}(v_{t})\textnormal{ d}\mu_{t}\right)^{2}. (33)

Let ut(x)=vt(x)lnxu_{t}(x)=v_{t}(x)-\frac{l}{n}x, where l=div~(vt) dμtl=\int\widetilde{\textnormal{div}}(v_{t})\textnormal{ d}\mu_{t}. Then,

tr(vt)2=tr(ut+lnIn×n)2=tr((ut)2+2lnut+l2n2In×n)=tr(ut)2+2lndiv(ut)+l2n=tr(ut)2+2lndiv(vt)l2n=tr(ut)2+2ln(div~(vt)+x,vt)l2n=tr(ut)2+2lnx,vt+(2lndiv~(vt)l2n).\begin{split}\textnormal{tr}(\nabla v_{t})^{2}&=\textnormal{tr}\left(\nabla u_{t}+\frac{l}{n}I_{n\times n}\right)^{2}=\textnormal{tr}\left((\nabla u_{t})^{2}+\frac{2l}{n}\nabla u_{t}+\frac{l^{2}}{n^{2}}I_{n\times n}\right)\\ &=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\textnormal{div}(u_{t})+\frac{l^{2}}{n}=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\textnormal{div}(v_{t})-\frac{l^{2}}{n}\\ &=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\left(\widetilde{\textnormal{div}}(v_{t})+\langle x,v_{t}\rangle\right)-\frac{l^{2}}{n}\\ &=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\langle x,v_{t}\rangle+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right).\end{split} (34)

To continue the proof in the mould of [1, Theorem 4.5], we would like to show that tr(ut)2 dμti|ut(i)|2 dμt\int\textnormal{tr}(\nabla u_{t})^{2}\textnormal{ d}\mu_{t}\geq\sum_{i}\int|u_{t}^{(i)}|^{2}\textnormal{ d}\mu_{t}, where we write ut=(ut(1),,ut(n))u_{t}=(u_{t}^{(1)},\ldots,u_{t}^{(n)}) in its components. This step is slightly more involved than in [1] (see Remark 3) and we are forced to take the following route.

Using the chain rule, one has

ut(Tt(x))Tt(x)=[ut(Tt(x))]=2ϕ1(x)2ϕ0(x)lnTt(x).\nabla u_{t}(T_{t}(x))\nabla T_{t}(x)=\nabla[u_{t}(T_{t}(x))]=\nabla^{2}\phi_{1}(x)-\nabla^{2}\phi_{0}(x)-\frac{l}{n}\nabla T_{t}(x).

Let A=2ϕ1(x)2ϕ0(x)lnTt(x)A=\nabla^{2}\phi_{1}(x)-\nabla^{2}\phi_{0}(x)-\frac{l}{n}\nabla T_{t}(x) and B=(Tt(x))1B=(\nabla T_{t}(x))^{-1}; note that the matrices AA and BB are symmetric and, furthermore, BIn×nB\geq I_{n\times n}. Therefore, we have

\begin{split}\textnormal{tr}[\nabla u_{t}(T_{t}(x))]^{2}&=\textnormal{tr}(AB)^{2}=\textnormal{tr}(ABAB)=\textnormal{tr}(B^{1/2}ABAB^{1/2})\\ &\geq\textnormal{tr}(B^{1/2}A^{2}B^{1/2})=\textnormal{tr}(A^{2}B)=\textnormal{tr}(ABA)\geq\textnormal{tr}(A^{2})\\ &=\textnormal{tr}(\nabla[u_{t}(T_{t}(x))])^{2},\end{split} (35)

where both the inequalities follow from the monotonicity of trace under the positive semidefinite ordering.
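Both steps above are instances of the elementary fact that \textnormal{tr}(CMC^{\top})\geq\textnormal{tr}(CC^{\top}) whenever M\geq I_{n\times n}, since \textnormal{tr}(C(M-I_{n\times n})C^{\top})\geq 0. Explicitly, with C=B^{1/2}A and C=A respectively,

\textnormal{tr}(ABAB)=\textnormal{tr}\big((B^{1/2}A)B(B^{1/2}A)^{\top}\big)\geq\textnormal{tr}\big((B^{1/2}A)(B^{1/2}A)^{\top}\big)=\textnormal{tr}(A^{2}B),\qquad\textnormal{tr}(ABA^{\top})\geq\textnormal{tr}(AA^{\top})=\textnormal{tr}(A^{2}),

using the symmetry of AA and BB.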

The trace inequality above gives us the following.

\begin{split}\int\textnormal{tr}(\nabla u_{t})^{2}\textnormal{ d}\mu_{t}&=\int\textnormal{tr}[\nabla u_{t}(T_{t}(x))]^{2}\textnormal{ d}\gamma\geq\int\textnormal{tr}(\nabla[u_{t}(T_{t}(x))])^{2}\textnormal{ d}\gamma\\ &=\sum_{i}\int|\nabla[u_{t}^{(i)}(T_{t}(x))]|^{2}\textnormal{ d}\gamma,\end{split} (36)

where the last equality uses the fact that ut(Tt(x))u_{t}(T_{t}(x)) is a gradient field, which can be seen from the expression ut(Tt(x))=(1tln)ϕ1(x)(1+(1t)ln)ϕ0(x)u_{t}(T_{t}(x))=(1-\frac{tl}{n})\nabla\phi_{1}(x)-(1+\frac{(1-t)l}{n})\nabla\phi_{0}(x). Furthermore, ut(Tt(x))u_{t}(T_{t}(x)) is an odd function of xx, since both ϕ0(x)\nabla\phi_{0}(x) and ϕ1(x)\nabla\phi_{1}(x) are odd.

Applying the Gaussian Poincaré inequality to each component of ut(Tt(x))u_{t}(T_{t}(x)), we obtain

\begin{split}\sum_{i}\int|\nabla[u_{t}^{(i)}(T_{t}(x))]|^{2}\textnormal{ d}\gamma&\geq\sum_{i}\int|u_{t}^{(i)}(T_{t}(x))|^{2}\textnormal{ d}\gamma=\sum_{i}\int\left(u_{t}^{(i)}(x)\right)^{2}\textnormal{ d}\mu_{t}\\ &=\sum_{i}\int\left(v_{t}^{(i)}(x)-\frac{l}{n}x^{(i)}\right)^{2}\textnormal{ d}\mu_{t}\\ &=\int\left(|v_{t}|^{2}-\frac{2l}{n}\langle x,v_{t}\rangle+\frac{l^{2}}{n^{2}}|x|^{2}\right)\textnormal{ d}\mu_{t}.\end{split} (37)

Putting this into the expression for tr(vt)2\textnormal{tr}(\nabla v_{t})^{2} from before,

𝒢~(vt) dμt=(tr(vt)2+|vt|2) dμt=(tr(ut)2+2lnx,vt+(2lndiv~(vt)l2n)+|vt|2) dμt(|vt|22lnx,vt+l2n2|x|2+2lnx,vt+(2lndiv~(vt)l2n)+|vt|2) dμt=(2|vt|2+l2n2|x|2+(2lndiv~(vt)l2n)) dμt2|vt|2 dμt+(2lndiv~(vt)l2n) dμt=2|vt|2 dμt+1nl2,\begin{split}\int\widetilde{\mathcal{G}}(v_{t})\textnormal{ d}\mu_{t}&=\int\left(\textnormal{tr}(\nabla v_{t})^{2}+|v_{t}|^{2}\right)\textnormal{ d}\mu_{t}\\ &=\int\left(\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\langle x,v_{t}\rangle+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)+|v_{t}|^{2}\right)\textnormal{ d}\mu_{t}\\ &\geq\int\left(|v_{t}|^{2}-\frac{2l}{n}\langle x,v_{t}\rangle+\frac{l^{2}}{n^{2}}|x|^{2}+\frac{2l}{n}\langle x,v_{t}\rangle+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)+|v_{t}|^{2}\right)\textnormal{ d}\mu_{t}\\ &=\int\left(2|v_{t}|^{2}+\frac{l^{2}}{n^{2}}|x|^{2}+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)\right)\textnormal{ d}\mu_{t}\\ &\geq 2\int|v_{t}|^{2}\textnormal{ d}\mu_{t}+\int\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)\textnormal{ d}\mu_{t}=2\int|v_{t}|^{2}\textnormal{ d}\mu_{t}+\frac{1}{n}l^{2},\end{split} (38)

as desired.

Note that,

|vt|2 dμt=|vt(Tt(x))|2 dγ(x)=| d dtTt(x)|2 dγ(x)=|T0(x)T1(x)|2 dγ=𝔼|X0X1|2.\begin{split}\int|v_{t}|^{2}\textnormal{ d}\mu_{t}&=\int|v_{t}(T_{t}(x))|^{2}\textnormal{ d}\gamma(x)=\int|\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x)|^{2}\textnormal{ d}\gamma(x)\\ &=\int|T_{0}(x)-T_{1}(x)|^{2}\textnormal{ d}\gamma=\mathbb{E}|X_{0}-X_{1}|^{2}.\end{split} (39)

Thus, we have shown that

 d2 dt2D(XtZ)2𝔼|X0X1|2+1n( d dtD(XtZ))2,\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(X_{t}\|Z)\geq 2\mathbb{E}|X_{0}-X_{1}|^{2}+\frac{1}{n}\left(\frac{\textnormal{ d}}{\textnormal{ d}t}D(X_{t}\|Z)\right)^{2}, (40)

under the assumed regularity on X0,X1X_{0},X_{1} and the chosen coupling (X0,X1)(X_{0},X_{1}). We claim that,

e1nD(XtZ)σ(1t)((𝔼|X0X1|2)12)e1nD(X0Z)+σ(t)((𝔼|X0X1|2)12)e1nD(X1Z),e^{-\frac{1}{n}D(X_{t}\|Z)}\geq\sigma^{(1-t)}\left(\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}}\right)e^{-\frac{1}{n}D(X_{0}\|Z)}+\sigma^{(t)}\left(\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}}\right)e^{-\frac{1}{n}D(X_{1}\|Z)}, (41)

where

σ(t)(θ)={sin(2ntθ)sin(2nθ),if θ<n2π,, otherwise,\sigma^{(t)}(\theta)=\begin{cases}\frac{\sin\left(\sqrt{\frac{2}{n}}t\theta\right)}{\sin\left(\sqrt{\frac{2}{n}}\theta\right)},&\textnormal{if }\theta<\sqrt{\frac{n}{2}}\pi,\\ \infty,&\textnormal{ otherwise,}\\ \end{cases} (42)

for t[0,1]t\in[0,1]. This claim follows from the local inequality (40) and a comparison principle, as applied in [1, Lemma 5.3] or [14, Lemma 2.2]. Additionally, by the triangle inequality for the Wasserstein metric and [1, Remark 7], the quantity (𝔼|X0X1|2)12\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}} is always strictly less than n/2π\sqrt{n/2}\pi.

Since σ(t)t\sigma^{(t)}\geq t, we have proved Theorem 1.5 (and the claim in Remark 1) under the assumption that 2U0,2U1κIn×n<\nabla^{2}U_{0},\nabla^{2}U_{1}\leq\kappa I_{n\times n}<\infty. This assumption can be removed via the following approximation argument.

Let ϵ(0,1/2)\epsilon\in(0,1/2), and define the Ornstein–Uhlenbeck evolutes Xiϵ:=1ϵXi+ϵZX^{\epsilon}_{i}:=\sqrt{1-\epsilon}X_{i}+\sqrt{\epsilon}Z^{\prime} for i=0,1i=0,1, where ZZ^{\prime} is a standard Gaussian random vector independent of X0X_{0} and X1X_{1}. Denote by μiϵ\mu^{\epsilon}_{i} the distribution of XiϵX^{\epsilon}_{i}, and write ρiϵ= dμiϵ dx=eUiϵ(x)\rho^{\epsilon}_{i}=\frac{\textnormal{ d}\mu^{\epsilon}_{i}}{\textnormal{ d}x}=e^{-U^{\epsilon}_{i}(x)} for i=0,1i=0,1. Obviously, the XiϵX^{\epsilon}_{i}’s are even random vectors in n\mathbb{R}^{n}. Moreover, since the Ornstein–Uhlenbeck process (see, for example, [4]) preserves strong log-concavity of measures, they are also strongly log-concave. This means 2UiϵIn×n\nabla^{2}U^{\epsilon}_{i}\geq I_{n\times n} for all ϵ(0,1/2)\epsilon\in(0,1/2). Furthermore, a direct calculation reveals that 2Uiϵ1ϵIn×n\nabla^{2}U^{\epsilon}_{i}\leq\frac{1}{\epsilon}I_{n\times n} (see, for example, [21]).

As ϵ0\epsilon\downarrow 0, we have D(XiϵZ)D(XiZ)D(X^{\epsilon}_{i}\|Z)\rightarrow D(X_{i}\|Z) for i=0,1i=0,1. This can be seen from the expression

D(XiϵZ)=ρiϵlogρiϵ dx+12𝔼|Xiϵ|22+n2log(2π),D(X^{\epsilon}_{i}\|Z)=\int\rho^{\epsilon}_{i}\log\rho^{\epsilon}_{i}\textnormal{ d}x+\frac{1}{2}\mathbb{E}|X^{\epsilon}_{i}|_{2}^{2}+\frac{n}{2}\log(2\pi),

and [31, Remark 10].

Now choose a decreasing sequence ϵk\epsilon_{k} converging to 0. As kk\rightarrow\infty, μiϵk\mu^{\epsilon_{k}}_{i} converges weakly to μi\mu_{i} for i=0,1i=0,1; therefore, by Prokhorov’s theorem, both sequences {μ0ϵk}\{\mu^{\epsilon_{k}}_{0}\} and {μ1ϵk}\{\mu^{\epsilon_{k}}_{1}\} are tight in 𝒫(n)\mathcal{P}(\mathbb{R}^{n}). For each kk, let πϵk\pi^{\epsilon_{k}} be the coupling (that is, the joint distribution of (X0ϵk,X1ϵk)(X_{0}^{\epsilon_{k}},X_{1}^{\epsilon_{k}})) used in the previous part of the proof. One can show that {πϵk}\{\pi^{\epsilon_{k}}\} is also a tight sequence in 𝒫(n×n)\mathcal{P}(\mathbb{R}^{n}\times\mathbb{R}^{n}), whence it admits a weakly convergent subsequence; without loss of generality, we assume that (πϵk)(\pi^{\epsilon_{k}}) converges to a coupling π\pi of μ0\mu_{0} and μ1\mu_{1}. Since the relative entropy D(μν)D(\mu\|\nu) is lower semi-continuous on 𝒫(n)×𝒫(n)\mathcal{P}(\mathbb{R}^{n})\times\mathcal{P}(\mathbb{R}^{n}), where 𝒫(n)\mathcal{P}(\mathbb{R}^{n}) is equipped with the weak topology, we have that

lim infkD([(x,y)(1t)x+ty]#πϵkγ)D([(x,y)(1t)x+ty]#πγ).\liminf_{k\rightarrow\infty}D([(x,y)\mapsto(1-t)x+ty]_{\#}\pi^{\epsilon_{k}}\|\gamma)\geq D([(x,y)\mapsto(1-t)x+ty]_{\#}\pi\|\gamma).

With the last observation, and that we already have

e^{-\frac{1}{n}D(X^{\epsilon_{k}}_{t}\|Z)}\geq\sigma^{(1-t)}\left(\theta_{\epsilon_{k}}\right)e^{-\frac{1}{n}D(X^{\epsilon_{k}}_{0}\|Z)}+\sigma^{(t)}\left(\theta_{\epsilon_{k}}\right)e^{-\frac{1}{n}D(X^{\epsilon_{k}}_{1}\|Z)}, (43)

for Xtϵk=(1t)X0ϵk+tX1ϵkX^{\epsilon_{k}}_{t}=(1-t)X^{\epsilon_{k}}_{0}+tX^{\epsilon_{k}}_{1} and θϵk=(𝔼|X0ϵkX1ϵk|2)12\theta_{\epsilon_{k}}=\left(\mathbb{E}|X^{\epsilon_{k}}_{0}-X^{\epsilon_{k}}_{1}|^{2}\right)^{\frac{1}{2}}, we can send kk\to\infty to complete the proof of both Theorem 1.5 and the claim in Remark 1. ∎

Remark 3.

The vector fields that appear in [1] are gradient fields, which simplifies things. For example, if utu_{t} in the proof above were a gradient field, one could simply apply the Poincaré inequality (with respect to μt\mu_{t}) to the components of utu_{t} to get tr(ut)2 dμt|ut|2 dμt\int\textnormal{tr}(\nabla u_{t})^{2}\textnormal{ d}\mu_{t}\geq\int|u_{t}|^{2}\textnormal{ d}\mu_{t}.
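In symbols: if u_{t}=\nabla\psi_{t}, then \nabla u_{t}=\nabla^{2}\psi_{t} is symmetric, so \textnormal{tr}(\nabla u_{t})^{2}=\sum_{i}|\nabla u_{t}^{(i)}|^{2}; moreover, u_{t} is odd while \mu_{t} is even, so each component u_{t}^{(i)} has mean zero under \mu_{t}. A Poincaré inequality for \mu_{t} with constant 1 (over odd functions, as in Question 1.4) would then give

\int\textnormal{tr}(\nabla u_{t})^{2}\,\textnormal{ d}\mu_{t}=\sum_{i}\int\big|\nabla u_{t}^{(i)}\big|^{2}\,\textnormal{ d}\mu_{t}\geq\sum_{i}\int\big(u_{t}^{(i)}\big)^{2}\,\textnormal{ d}\mu_{t}=\int|u_{t}|^{2}\,\textnormal{ d}\mu_{t}.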

Proof of Theorem 1.7.

Set F=logf,G=logg,F=\log f,G=\log g, and H=loghH=\log h. We will use Hölder’s inequality in the form

Mpt(a,b)Mqt(x,y)Mrt(ax,by),M_{p}^{t}(a,b)M_{q}^{t}(x,y)\geq M_{r}^{t}(ax,by), (44)

where q=1nq=\frac{1}{n} and 1p+1q=1r\frac{1}{p}+\frac{1}{q}=\frac{1}{r}, applied to

x=eD(μ0γ),y=eD(μ1γ),a=eF dμ0,b=eG dμ1,x=e^{-D(\mu_{0}\|\gamma)},y=e^{-D(\mu_{1}\|\gamma)},a=e^{\int F\textnormal{ d}\mu_{0}},b=e^{\int G\textnormal{ d}\mu_{1}}, (45)

where  dμ0f dγ\textnormal{ d}\mu_{0}\propto f\textnormal{ d}\gamma, and  dμ1g dγ\textnormal{ d}\mu_{1}\propto g\textnormal{ d}\gamma are probability measures. Thus, we get

((1t)epF dμ0+tepG dμ1)1p((1t)e1nD(μ0γ)+te1nD(μ1γ))1q((1t)er(F dμ0D(μ0γ))+ter(G dμ1D(μ1γ)))1r=((1t)erlogeF dγ+terlogeG dγ)1r=((1t)(eF dγ)r+t(eG dγ)r)1r.\begin{split}&\left((1-t)e^{p\int F\textnormal{ d}\mu_{0}}+te^{p\int G\textnormal{ d}\mu_{1}}\right)^{\frac{1}{p}}\left((1-t)e^{-\frac{1}{n}D(\mu_{0}\|\gamma)}+te^{-\frac{1}{n}D(\mu_{1}\|\gamma)}\right)^{\frac{1}{q}}\\ &\geq\left((1-t)e^{r\left(\int F\textnormal{ d}\mu_{0}-D(\mu_{0}\|\gamma)\right)}+te^{r\left(\int G\textnormal{ d}\mu_{1}-D(\mu_{1}\|\gamma)\right)}\right)^{\frac{1}{r}}\\ &=\left((1-t)e^{r\log\int e^{F}\textnormal{ d}\gamma}+te^{r\log\int e^{G}\textnormal{ d}\gamma}\right)^{\frac{1}{r}}\\ &=\left((1-t)\left(\int e^{F}\textnormal{ d}\gamma\right)^{r}+t\left(\int e^{G}\textnormal{ d}\gamma\right)^{r}\right)^{\frac{1}{r}}.\end{split} (46)
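The first inequality in the display above is exactly (44). For completeness, (44) follows from Hölder's inequality: for r>0 with \frac{1}{p}+\frac{1}{q}=\frac{1}{r}, the exponents \frac{p}{r} and \frac{q}{r} are conjugate, so

(1-t)(ax)^{r}+t(by)^{r}\leq\big((1-t)a^{p}+tb^{p}\big)^{\frac{r}{p}}\big((1-t)x^{q}+ty^{q}\big)^{\frac{r}{q}},

that is, M_{r}^{t}(ax,by)\leq M_{p}^{t}(a,b)\,M_{q}^{t}(x,y).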

Taking the expectation of

H((1t)X0+tX1)log((1t)epF(X0)+tepG(X1))1p,H((1-t)X_{0}+tX_{1})\geq\log\left((1-t)e^{pF(X_{0})}+te^{pG(X_{1})}\right)^{\frac{1}{p}}, (47)

with respect to any joint distribution of (X0,X1)(X_{0},X_{1}) and combining it with the joint convexity of the function

Ψt(u,v)=log((1t)eu+tev)\Psi_{t}(u,v)=\log\left((1-t)e^{u}+te^{v}\right) (48)

for every fixed tt [14, Lemma 2.11], we see that

eH dμt((1t)epF dμ0+tepG dμ1)1p.e^{\int H\textnormal{ d}\mu_{t}}\geq\left((1-t)e^{p\int F\textnormal{ d}\mu_{0}}+te^{p\int G\textnormal{ d}\mu_{1}}\right)^{\frac{1}{p}}. (49)

Moreover, we already have

e1nD(μtγ)(1t)e1nD(μ0γ)+te1nD(μ1γ),e^{-\frac{1}{n}D(\mu_{t}\|\gamma)}\geq(1-t)e^{-\frac{1}{n}D(\mu_{0}\|\gamma)}+te^{-\frac{1}{n}D(\mu_{1}\|\gamma)}, (50)

when μt\mu_{t} is the distribution of Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1} for the joint distribution (X0,X1)(X_{0},X_{1}) from Theorem 1.5. Putting equations (49) and (50) together and invoking the “inequality part” of Donsker–Varadhan duality, we get

eH dγeH dμteD(μtγ)((1t)epF dμ0+tepG dμ1)1p((1t)e1nD(μ0γ)+te1nD(μ1γ))1q,\begin{split}&\int e^{H}\textnormal{ d}\gamma\geq e^{\int H\textnormal{ d}\mu_{t}}e^{-D(\mu_{t}\|\gamma)}\\ &\geq\left((1-t)e^{p\int F\textnormal{ d}\mu_{0}}+te^{p\int G\textnormal{ d}\mu_{1}}\right)^{\frac{1}{p}}\left((1-t)e^{-\frac{1}{n}D(\mu_{0}\|\gamma)}+te^{-\frac{1}{n}D(\mu_{1}\|\gamma)}\right)^{\frac{1}{q}},\end{split} (51)

which completes the proof. ∎

References

  • [1] Gautam Aishwarya and Liran Rotem, New Brunn–Minkowski and functional inequalities via convexity of entropy, preprint, 2024.
  • [2] S. Alesker, S. Dar, and V. Milman, A remarkable measure preserving diffeomorphism between two convex bodies in 𝐑n{\bf R}^{n}, Geom. Dedicata 74 (1999), no. 2, 201–212. MR 1674116
  • [3] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Gradient flows in metric spaces and in the space of probability measures, second ed., Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2008. MR 2401600
  • [4] Dominique Bakry, Ivan Gentil, Michel Ledoux, et al., Analysis and geometry of Markov diffusion operators, vol. 103, Springer, 2014.
  • [5] C. Borell, Convex functions in d-space, Uppsala Univ. Dept. of Math. Report (1973).
  • [6] Károly J. Böröczky, Erwin Lutwak, Deane Yang, and Gaoyong Zhang, The log-Brunn-Minkowski inequality, Adv. Math. 231 (2012), no. 3-4, 1974–1997. MR 2964630
  • [7] Herm Jan Brascamp and Elliott H. Lieb, On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation, J. Functional Analysis 22 (1976), no. 4, 366–389. MR 450480
  • [8] Luis A. Caffarelli, A localization property of viscosity solutions to the Monge–Ampere equation and their strict convexity, Annals of mathematics 131 (1990), no. 1, 129–134.
  • [9] by same author, The regularity of mappings with a convex potential, Journal of the American Mathematical Society 5 (1992), no. 1, 99–104.
  • [10] by same author, Monotonicity properties of optimal transportation and the FKG and related inequalities, Communications in Mathematical Physics 214 (2000), no. 3, 547–563.
  • [11] Sinho Chewi and Aram-Alexandre Pooladian, An entropic generalization of Caffarelli’s contraction theorem via covariance inequalities, C. R. Math. Acad. Sci. Paris 361 (2023), 1471–1482. MR 4683324
  • [12] Dario Cordero-Erausquin and Liran Rotem, Improved log-concavity for rotationally invariant measures of symmetric convex sets, Ann. Probab. 51 (2023), no. 3, 987–1003. MR 4583060
  • [13] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. IV, Comm. Pure Appl. Math. 36 (1983), no. 2, 183–212. MR 690656
  • [14] Matthias Erbar, Kazumasa Kuwada, and Karl-Theodor Sturm, On the equivalence of the entropic curvature-dimension condition and Bochner’s inequality on metric measure spaces, Inventiones mathematicae 201 (2015), no. 3, 993–1071.
  • [15] Alexandros Eskenazis and Georgios Moschidis, The dimensional Brunn–Minkowski inequality in Gauss space, Journal of Functional Analysis 280 (2021), no. 6, 108914.
  • [16] Richard Gardner, The Brunn–Minkowski inequality, Bulletin of the American mathematical society 39 (2002), no. 3, 355–405.
  • [17] Richard Gardner and Artem Zvavitch, Gaussian Brunn–Minkowski inequalities, Transactions of the American Mathematical Society 362 (2010), no. 10, 5333–5353.
  • [18] Nathael Gozlan, Cyril Roberto, Paul-Marie Samson, and Prasad Tetali, Transport proofs of some discrete variants of the Prékopa-Leindler inequality, Ann. Sc. Norm. Super. Pisa Cl. Sci. (5) 22 (2021), no. 3, 1207–1232. MR 4334317
  • [19] M. Gromov, Convex sets and Kähler manifolds, Advances in differential geometry and topology, World Sci. Publ., Teaneck, NJ, 1990, pp. 1–38. MR 1095529
  • [20] Young-Heon Kim and Emanuel Milman, A generalization of Caffarelli’s contraction theorem via (reverse) heat flow, Math. Ann. 354 (2012), no. 3, 827–862. MR 2983070
  • [21] Bo’az Klartag and Eli Putterman, Spectral monotonicity under Gaussian convolution, Ann. Fac. Sci. Toulouse Math. (6) 32 (2023), no. 5, 939–967. MR 4748461
  • [22] Alexander V. Kolesnikov and Galyna V. Livshyts, On the Gardner–Zvavitch conjecture: Symmetry in inequalities of Brunn–Minkowski type, Advances in Mathematics 384 (2021), 107689.
  • [23] Alexander V. Kolesnikov and Emanuel Milman, Brascamp–Lieb-type inequalities on weighted Riemannian manifolds with boundary, The Journal of Geometric Analysis 27 (2017), 1680–1702.
  • [24] by same author, Poincaré and Brunn–Minkowski inequalities on the boundary of weighted Riemannian manifolds, American Journal of Mathematics 140 (2018), no. 5, 1147–1185.
  • [25] Galyna Livshyts, A universal bound in the dimensional Brunn–Minkowski inequality for log-concave measures, Transactions of the American Mathematical Society (2023).
  • [26] Galyna Livshyts, Arnaud Marsiglietti, Piotr Nayar, and Artem Zvavitch, On the Brunn–Minkowski inequality for general measures with applications to new isoperimetric-type inequalities, Transactions of the American Mathematical Society 369 (2017), no. 12, 8725–8742.
  • [27] Piotr Nayar and Tomasz Tkocz, A note on a Brunn–Minkowski inequality for the Gaussian measure, Proceedings of the American Mathematical Society 141 (2013), no. 11, 4027–4030.
  • [28] Boaz A. Slomka, A remark on discrete Brunn-Minkowski type inequalities via transportation of measure, Israel J. Math. 261 (2024), no. 2, 791–807. MR 4775738
  • [29] Cédric Villani, Topics in optimal transportation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI, 2003.
  • [30] Cédric Villani, Optimal transport, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338, Springer-Verlag, Berlin, 2009, Old and new. MR 2459454
  • [31] Liyao Wang and Mokshay Madiman, Beyond the entropy power inequality, via rearrangements, IEEE Trans. Inform. Theory 60 (2014), no. 9, 5116–5137. MR 3252379