
Entropy and functional forms of the dimensional Brunn–Minkowski inequality in Gauss space

Gautam Aishwarya Department of Mathematics, Michigan State University, East Lansing 48824, USA. [email protected]  and  Dongbin Li Faculty of Science - Mathematics and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2R3, Canada. [email protected]
Abstract.

Given even strongly log-concave random vectors X0X_{0} and X1X_{1} in n\mathbb{R}^{n}, we show that a natural joint distribution (X0,X1)(X_{0},X_{1}) satisfies,

e1nD((1t)X0+tX1Z)(1t)e1nD(X0Z)+te1nD(X1Z),e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Z)}\geq(1-t)e^{-\frac{1}{n}D(X_{0}\|Z)}+te^{-\frac{1}{n}D(X_{1}\|Z)}, (1)

where ZZ is distributed according to the standard Gaussian measure γ\gamma on n\mathbb{R}^{n}, t[0,1]t\in[0,1], and D(Z)D(\cdot\|Z) is the Gaussian relative entropy. This extends and provides a different viewpoint on the corresponding geometric inequality proved by Eskenazis and Moschidis [15], namely that

γ((1t)K0+tK1)1n(1t)γ(K0)1n+tγ(K1)1n,\gamma\left((1-t)K_{0}+tK_{1}\right)^{\frac{1}{n}}\geq(1-t)\gamma(K_{0})^{\frac{1}{n}}+t\gamma(K_{1})^{\frac{1}{n}}, (2)

when K0,K1nK_{0},K_{1}\subseteq\mathbb{R}^{n} are origin-symmetric convex bodies. As an application, using Donsker–Varadhan duality, we obtain Gaussian Borell–Brascamp–Lieb inequalities applicable to even log-concave functions, which serve as functional forms of the Eskenazis–Moschidis inequality.

MSC classification: 37C10, 94A17, 52A40, 52A20, 49Q22.
GA is supported by NSF-DMS 2154402. DL acknowledges the support of the Natural Sciences and Engineering Research Council of Canada and the Department of Mathematical and Statistical Sciences at the University of Alberta.

1. Introduction and Main Results

Let γ\gamma denote the standard Gaussian probability measure on n\mathbb{R}^{n},  dγ(x)e12|x|2 dx,\textnormal{ d}\gamma(x)\propto e^{-\frac{1}{2}|x|^{2}}\textnormal{ d}x, where |||\cdot| denotes the Euclidean norm. It was shown by Eskenazis and Moschidis [15] that, if K0K_{0} and K1K_{1} are origin-symmetric convex bodies, then

γ((1t)K0+tK1)1n(1t)γ(K0)1n+tγ(K1)1n.\gamma\left((1-t)K_{0}+tK_{1}\right)^{\frac{1}{n}}\geq(1-t)\gamma(K_{0})^{\frac{1}{n}}+t\gamma(K_{1})^{\frac{1}{n}}. (3)

Here (1t)K0+tK1={(1t)x0+tx1:x0K0,x1K1}(1-t)K_{0}+tK_{1}=\{(1-t)x_{0}+tx_{1}:x_{0}\in K_{0},x_{1}\in K_{1}\} denotes the collection of tt-midpoints of all segments from K0K_{0} to K1K_{1}. Observe that the inequality (3) cannot hold for all compact sets K0,K1K_{0},K_{1}. This can easily be seen by fixing a set K0K_{0} of positive Gaussian measure, taking K1={x}K_{1}=\{x\}, and sending xx\to\infty. Without extra conditions on K0K_{0} and K1K_{1}, the Gaussian measure only satisfies

γ((1t)K0+tK1)γ(K0)1tγ(K1)t,\gamma\left((1-t)K_{0}+tK_{1}\right)\geq\gamma(K_{0})^{1-t}\gamma(K_{1})^{t}, (4)

by virtue of being log-concave. Recall that a measure ν\nu is said to be log-concave if it has a density of the form  dν dx=eV\frac{\textnormal{ d}\nu}{\textnormal{ d}x}=e^{-V}, VV convex, with respect to the Lebesgue measure.
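Concretely, if K_{1}=\{x\}, then (1-t)K_{0}+tK_{1} is a translate of (1-t)K_{0} escaping to infinity, so that

\gamma\big((1-t)K_{0}+tx\big)^{\frac{1}{n}}\longrightarrow 0\ \text{ as }|x|\to\infty,\qquad\text{while}\qquad(1-t)\gamma(K_{0})^{\frac{1}{n}}+t\,\gamma(\{x\})^{\frac{1}{n}}=(1-t)\gamma(K_{0})^{\frac{1}{n}}>0,

violating (3) once |x| is large.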

The inequality (3) was conjectured by Gardner and Zvavitch [17], originally for convex bodies containing the origin. But soon after, Nayar and Tkocz [27] found a counterexample and suggested the assumption of symmetry about the origin. It must be mentioned that the work of Eskenazis and Moschidis completed the proof of (3) by verifying a sufficient condition introduced by Kolesnikov and Livshyts [22], which is itself based on machinery developed by Kolesnikov and E. Milman [23, 24].

More generally, the following conjecture has garnered a lot of attention in the last few years.

Conjecture 1.1.

Let ν\nu be an even log-concave measure on n\mathbb{R}^{n}. Then, for origin-symmetric convex bodies K0,K1nK_{0},K_{1}\subseteq\mathbb{R}^{n}, we have

ν((1t)K0+tK1)1n(1t)ν(K0)1n+tν(K1)1n.\nu\left((1-t)K_{0}+tK_{1}\right)^{\frac{1}{n}}\geq(1-t)\nu(K_{0})^{\frac{1}{n}}+t\nu(K_{1})^{\frac{1}{n}}. (5)

One reason why Conjecture 1.1 is of substantial interest is that it follows from the celebrated log-Brunn–Minkowski conjecture of Böröczky, Lutwak, Yang, and Zhang [6]. This implication was shown by Livshyts, Marsiglietti, Nayar, and Zvavitch [26]. Exciting recent developments include works by Livshyts [25], and by Cordero-Erausquin and Rotem [12].

Recently, Aishwarya and Rotem [1] took a completely different route to prove dimensional inequalities such as in (5) using entropy.

Definition 1.2 (Relative entropy).

Let ν\nu be a σ\sigma-additive Borel measure on n\mathbb{R}^{n}. For a probability measure μ\mu, we define the relative entropy of μ\mu with respect to ν\nu by

D(μν)={( dμ dν)log( dμ dν) dν, if μ has density w.r.t. ν,+, otherwise. D(\mu\|\nu)=\begin{cases}\int\left(\frac{\textnormal{ d}\mu}{\textnormal{ d}\nu}\right)\log\left(\frac{\textnormal{ d}\mu}{\textnormal{ d}\nu}\right)\textnormal{ d}\nu,&\textnormal{ if }\mu\textnormal{ has density w.r.t. }\nu,\\ +\infty,&\textnormal{ otherwise. }\\ \end{cases} (6)
Notation.

The relative entropy D(μν)D(\mu\|\nu) is also written as D(XY)D(X\|Y) when ν\nu is a probability measure, and X,YX,Y are n\mathbb{R}^{n}-valued random vectors with distributions μ,ν\mu,\nu, respectively. Note that the joint distribution (X,Y)(X,Y) is not specified because it is immaterial for this definition. See also Definition 1.3.

The technique in [1] is based on the variational principle [1, Lemma 2.7]:

ν(K)=supμ𝒫(K)eD(μν),\nu(K)=\sup_{\mu\in\mathcal{P}(K)}e^{-D(\mu\|\nu)}, (7)

which holds for every compact set KK and is attained by the normalised restriction νK\nu_{K} of ν\nu to KK, that is, νK(E)=ν(EK)ν(K)\nu_{K}(E)=\frac{\nu(E\cap K)}{\nu(K)} for every Borel set EE. As in formula (7), we will consistently write 𝒫(K)\mathcal{P}(K) for the collection of all probability measures on a given KK.
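Indeed, since \frac{\textnormal{ d}\nu_{K}}{\textnormal{ d}\nu}=\frac{\mathbf{1}_{K}}{\nu(K)}, a direct computation gives

D(\nu_{K}\|\nu)=\int_{K}\frac{1}{\nu(K)}\log\frac{1}{\nu(K)}\,\textnormal{ d}\nu=-\log\nu(K),\qquad\text{so}\qquad e^{-D(\nu_{K}\|\nu)}=\nu(K),

while for any \mu\in\mathcal{P}(K), Jensen's inequality yields D(\mu\|\nu)\geq-\log\nu(K); together these two observations prove (7).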

Suppose ν\nu is a probability measure, and YY is a random vector with distribution ν\nu. In light of the above variational formula, to prove the inequality (5), it suffices to show the existence of a joint distribution (X0,X1)(X_{0},X_{1}) with the marginals X0,X1X_{0},X_{1} having distributions νK0,νK1\nu_{K_{0}},\nu_{K_{1}}, respectively, such that the following entropy inequality holds:

e1nD((1t)X0+tX1Y)(1t)e1nD(X0Y)+te1nD(X1Y).e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Y)}\geq(1-t)e^{-\frac{1}{n}D(X_{0}\|Y)}+te^{-\frac{1}{n}D(X_{1}\|Y)}. (8)

This is because the distribution of (1t)X0+tX1(1-t)X_{0}+tX_{1} lies in 𝒫((1t)K0+tK1)\mathcal{P}\left((1-t)K_{0}+tK_{1}\right).
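Explicitly, chaining (7), (8), and the computation D(\nu_{K_{i}}\|\nu)=-\log\nu(K_{i}) for the optimisers in (7), one obtains

\nu\big((1-t)K_{0}+tK_{1}\big)^{\frac{1}{n}}\;\geq\;e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Y)}\;\geq\;(1-t)e^{-\frac{1}{n}D(X_{0}\|Y)}+te^{-\frac{1}{n}D(X_{1}\|Y)}\;=\;(1-t)\nu(K_{0})^{\frac{1}{n}}+t\,\nu(K_{1})^{\frac{1}{n}},

which is exactly (5).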

The definition and remarks below clarify our use of some standard terminology regarding joint distributions of random vectors.

Definition 1.3.
  1. (1)

    Let X0,X1X_{0},X_{1} be n\mathbb{R}^{n}-valued random vectors. By a joint distribution with marginals X0,X1X_{0},X_{1} we mean an n×n\mathbb{R}^{n}\times\mathbb{R}^{n}-valued random vector X¯\bar{X} such that {X¯E×n}={X0E}\mathbb{P}\{\bar{X}\in E\times\mathbb{R}^{n}\}=\mathbb{P}\{X_{0}\in E\} and {X¯n×E}={X1E}\mathbb{P}\{\bar{X}\in\mathbb{R}^{n}\times E^{\prime}\}=\mathbb{P}\{X_{1}\in E^{\prime}\} for Borel sets E,EE,E^{\prime}. Here \mathbb{P} denotes the measure on the underlying probability space over which our random vectors are defined. Such an X¯\bar{X} is often written simply as (X0,X1)(X_{0},X_{1}).

  2. (2)

    Likewise, a coupling of μ0,μ1𝒫(n)\mu_{0},\mu_{1}\in\mathcal{P}(\mathbb{R}^{n}) is a π𝒫(n×n)\pi\in\mathcal{P}(\mathbb{R}^{n}\times\mathbb{R}^{n}) such that π(E×n)=μ0(E),π(n×E)=μ1(E)\pi(E\times\mathbb{R}^{n})=\mu_{0}(E),\pi(\mathbb{R}^{n}\times E^{\prime})=\mu_{1}(E^{\prime}) for Borel sets E,EE,E^{\prime}.

Remarks.
  • If XiX_{i} has distribution μi\mu_{i}, that is {XiE}=μi(E)\mathbb{P}\{X_{i}\in E\}=\mu_{i}(E) for Borel sets EE and i=0,1i=0,1, then the distribution of every joint distribution (X0,X1)(X_{0},X_{1}) is a coupling π\pi and vice versa. However, we will sometimes also call (X0,X1)(X_{0},X_{1}) a coupling.

  • If (X0,X1)(X_{0},X_{1}) has distribution π\pi, then the distribution of the corresponding (1t)X0+tX1(1-t)X_{0}+tX_{1} is given by the pushforward measure [(x,y)(1t)x+ty]#π𝒫(n)\left[(x,y)\mapsto(1-t)x+ty\right]_{\#}\pi\in\mathcal{P}(\mathbb{R}^{n}).

The coupling (X0,X1)(X_{0},X_{1}) used in [1] to obtain several results is the so-called optimal coupling for the Monge–Kantorovich problem with quadratic cost, namely the one that minimises 𝔼|X0X1|2\mathbb{E}|X_{0}-X_{1}|^{2}. For example, [1, Theorem 1.3] implies that, when Y=ZY=Z has standard Gaussian distribution, the inequality (8) holds for the optimal coupling with a worse exponent (12n\frac{1}{2n} instead of 1n\frac{1}{n}) but for a larger class (K0,K1K_{0},K_{1} are only assumed to be star-shaped with respect to the origin, not necessarily symmetric or convex). This was the first time that a dimensional Brunn–Minkowski inequality was obtained for the Gaussian measure without convexity assumptions on the admissible sets (which is not possible with the earlier approach). However, while trying to obtain an inequality of the form (8) that would strengthen the result of Eskenazis and Moschidis, the authors in [1] faced a very interesting problem.

Question 1.4.

[1] Suppose X0,X1X_{0},X_{1} are n\mathbb{R}^{n}-valued random vectors with even strongly log-concave distributions, and assume that (X0,X1)(X_{0},X_{1}) is the optimal coupling. Is it true that each Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1}, t(0,1)t\in(0,1), satisfies the Poincaré inequality for odd functions with constant 11?

Recall that a random vector XX is said to satisfy a Poincaré inequality with constant 11 over a class of functions \mathcal{F}, if for every function ff\in\mathcal{F}, we have Var(f(X))𝔼|f(X)|2\textnormal{Var}(f(X))\leq\mathbb{E}|\nabla f(X)|^{2}. Further, a strongly log-concave random vector is one with distribution μ\mu such that  dμ dγ\frac{\textnormal{ d}\mu}{\textnormal{ d}\gamma} is a log-concave function (in this case, μ\mu is said to be a strongly log-concave measure). The relevance of this property in our context stems from the fact that γK\gamma_{K} is strongly log-concave whenever KK is a convex body. [1, Theorem 4.5] shows that the desired inequality (8) for Y=ZY=Z holds for the optimal coupling, and X0,X1X_{0},X_{1} even strongly log-concave, if the answer to Question 1.4 is positive.

The first main result of the present work is that there exists a coupling of even strongly log-concave random vectors such that (8) holds when Y=ZY=Z.

Theorem 1.5.

Let X0,X1X_{0},X_{1} be n\mathbb{R}^{n}-valued random vectors with even strongly log-concave distributions. Then, there is a coupling (X0,X1)(X_{0},X_{1}) of X0X_{0} and X1X_{1} such that

e1nD((1t)X0+tX1Z)(1t)e1nD(X0Z)+te1nD(X1Z).e^{-\frac{1}{n}D((1-t)X_{0}+tX_{1}\|Z)}\geq(1-t)e^{-\frac{1}{n}D(X_{0}\|Z)}+te^{-\frac{1}{n}D(X_{1}\|Z)}. (9)

Moreover, for this coupling, we have equality if and only if X0X_{0} and X1X_{1} have the same distribution.

Remark 1.

The proof establishes the stronger inequality,

e1nD(XtZ)σ(1t)(θ)e1nD(X0Z)+σ(t)(θ)e1nD(X1Z),e^{-\frac{1}{n}D(X_{t}\|Z)}\geq\sigma^{(1-t)}\left(\theta\right)e^{-\frac{1}{n}D(X_{0}\|Z)}+\sigma^{(t)}\left(\theta\right)e^{-\frac{1}{n}D(X_{1}\|Z)}, (10)

where θ=(𝔼|X0X1|2)12\theta=\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}}, and

σ(t)(θ)=sin(2ntθ)sin(2nθ),\sigma^{(t)}(\theta)=\frac{\sin\left(\sqrt{\frac{2}{n}}t\theta\right)}{\sin\left(\sqrt{\frac{2}{n}}\theta\right)}, (11)

for this θ\theta, and t[0,1]t\in[0,1]. As discussed in the proof of Theorem 1.5, the θ\theta of interest is always strictly less than n/2π\sqrt{n/2}\pi. The equality characterisation in Theorem 1.5 follows from the equality characterisation for σ(t)(θ)=t\sigma^{(t)}(\theta)=t.
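In particular, (10) implies the inequality of Theorem 1.5: since \sin is concave on [0,\pi] and vanishes at 0, we have \sin(ts)\geq t\sin(s) for s\in[0,\pi] and t\in[0,1], so that

\sigma^{(t)}(\theta)=\frac{\sin\left(\sqrt{\tfrac{2}{n}}\,t\theta\right)}{\sin\left(\sqrt{\tfrac{2}{n}}\,\theta\right)}\;\geq\;t\qquad\text{for }0\leq\theta<\sqrt{\tfrac{n}{2}}\,\pi,

with equality for some t\in(0,1) only when \theta=0, that is, only when X_{0}=X_{1} almost surely under the coupling.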

The coupling we use is not the optimal coupling, but nonetheless arises from optimal transport. Let U,VU,V be n\mathbb{R}^{n}-valued random vectors satisfying 𝔼|U|2,𝔼|V|2<\mathbb{E}|U|^{2},\mathbb{E}|V|^{2}<\infty, such that their distributions have density with respect to the Lebesgue measure. Then, a theorem of Brenier (see [29, Theorem 2.12 (ii)]) guarantees that a unique coupling minimises 𝔼|UV|2\mathbb{E}|U-V|^{2}, and furthermore, it is given by (U,T(U))(U,T(U)) where T=ϕT=\nabla\phi is the gradient of a convex function ϕ\phi. Note that the map TT, called the Brenier map from UU to VV, pushes forward the distribution of UU to the distribution of VV. In the present work, we consider the Brenier map T0T_{0} from ZZ to X0X_{0}, the Brenier map T1T_{1} from ZZ to X1X_{1}, and work with the joint distribution (X0,X1)=(T0(Z),T1(Z))(X_{0},X_{1})=(T_{0}(Z),T_{1}(Z)).

The contraction theorem of Caffarelli [10] tells us that the Brenier map from the standard Gaussian to any strongly log-concave random vector is 11-Lipschitz. This automatically gives us that the Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1} we consider in this paper is a 11-Lipschitz image of ZZ under Tt=(1t)T0+tT1T_{t}=(1-t)T_{0}+tT_{1}. Given that ZZ satisfies a Poincaré inequality with constant 11, a standard change of variables argument immediately shows that XtX_{t} satisfies the Poincaré inequality with constant 11 for all functions. However, interestingly, we do not use this fact directly. Instead, we use the 11-Lipschitz property of TtT_{t} and the Poincaré constant of ZZ separately. It remains an open question whether the optimal coupling also satisfies the conclusion of Theorem 1.5.
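For completeness, the change-of-variables argument reads as follows: for smooth f, writing X_{t}=T_{t}(Z) with T_{t} 1-Lipschitz and \nabla T_{t} symmetric,

\textnormal{Var}\big(f(X_{t})\big)=\textnormal{Var}\big((f\circ T_{t})(Z)\big)\leq\mathbb{E}\big|\nabla(f\circ T_{t})(Z)\big|^{2}=\mathbb{E}\big|\nabla T_{t}(Z)\,\nabla f(T_{t}(Z))\big|^{2}\leq\mathbb{E}|\nabla f(X_{t})|^{2},

by the Gaussian Poincaré inequality and the operator-norm bound \|\nabla T_{t}\|_{\textnormal{op}}\leq 1.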

An important feature of the interpolation XtX_{t}, if considered under the optimal coupling, is that the trajectories {Tt(x)}t(0,1)\{T_{t}(x)\}_{t\in(0,1)} do not cross (in an almost-everywhere sense), and hence the distribution μt\mu_{t} of XtX_{t} can be described as the flow of μ0\mu_{0} under a time-dependent velocity field. Yet another useful property under the optimal coupling is that the velocity field generated is a gradient field. Both these properties are used in [1].

In our case, for XtX_{t} that we consider, we are not guaranteed the existence of a driving velocity field, nor do we see a reason for this velocity field to be a gradient field even if it exists. The former technical difficulty is overcome by a “trajectories do not cross” result when X0,X1X_{0},X_{1} are “nice” (Proposition 2.2), and approximation. The latter issue most prominently appears in the proof of Theorem 1.5, where an inequality such as 𝔼tr[v(Xt)2]𝔼|v(Xt)|2\mathbb{E}\textnormal{tr}[\nabla v(X_{t})^{2}]\geq\mathbb{E}|v(X_{t})|^{2} is needed for a particular odd vector field vv. This is always true when vv is a gradient field and the even random vector XtX_{t} has Poincaré constant 11, but not in general. To resolve this problem, we explicitly use the structure of the given vector field vv (which depends on TtT_{t}) and the Gaussian Poincaré inequality. This makes it unclear if our proof would go through if T0T_{0} and T1T_{1} were contractions (via the reverse Ornstein–Uhlenbeck process) introduced by Kim and E. Milman [20], and not Brenier maps. Readers familiar with the work of Alesker, Dar, and V. Milman [2] may find it intriguing to compare the fact that the coupling used in this paper admits a Theorem 1.5 (while for other aforementioned couplings such a result is yet unestablished), with Gromov’s observation that ϕ[n]+ψ[n]=(ϕ+ψ)[n]\nabla\phi[\mathbb{R}^{n}]+\nabla\psi[\mathbb{R}^{n}]=(\nabla\phi+\nabla\psi)[\mathbb{R}^{n}] when ϕ,ψ\phi,\psi are C2C^{2} convex functions with strictly positive Hessian [19, 1.3.A.] (see also [2, Proposition 2.2]).

As an immediate corollary to Theorem 1.5, using the variational principle (7), we obtain a new proof of Eskenazis and Moschidis’ result.

Corollary 1.6.

The dimensional Brunn–Minkowski inequality for the Gaussian measure (3) holds if K0K_{0} and K1K_{1} are origin-symmetric convex bodies.

Remark 2.

In view of Remark 1, it is an interesting question if one can meaningfully bound 𝔼|X0X1|2\mathbb{E}|X_{0}-X_{1}|^{2} from below, when X0,X1X_{0},X_{1} have distributions γK0,γK1\gamma_{K_{0}},\gamma_{K_{1}}, respectively, for symmetric convex bodies K0,K1K_{0},K_{1}. This could potentially lead to a Gaussian dimensional Brunn–Minkowski inequality for symmetric convex bodies which also incorporates the curvature aspects of the Gaussian measure. As far as we know, this has not been done.

A fundamental problem in this area concerns obtaining functional forms of geometric inequalities. This means, given a geometric inequality, one wants to find a functional inequality which recovers the given geometric inequality when applied to functions canonically associated with the involved sets (for example, to indicator functions). For a prototypical example, consider the Brunn–Minkowski inequality in its geometric-mean form that all log-concave measures ν\nu are known to satisfy:

ν((1t)K0+tK1)ν(K0)1tν(K1)t,\nu((1-t)K_{0}+tK_{1})\geq\nu(K_{0})^{1-t}\nu(K_{1})^{t}, (12)

whenever K0K_{0} and K1K_{1} are compact sets in n\mathbb{R}^{n}. The functional form of (12) is the Prékopa–Leindler inequality which concludes

h dν(f dν)1t(g dν)t,\int h\textnormal{ d}\nu\geq\left(\int f\textnormal{ d}\nu\right)^{1-t}\left(\int g\textnormal{ d}\nu\right)^{t}, (13)

whenever f,g,hf,g,h are non-negative functions satisfying

h((1t)x+ty)f(x)1tg(y)t,h((1-t)x+ty)\geq f(x)^{1-t}g(y)^{t}, (14)

for all x,yx,y. Of course, if ff and gg are indicator functions of K0K_{0} and K1K_{1}, respectively, then the indicator of (1t)K0+tK1(1-t)K_{0}+tK_{1} is an admissible choice for hh, thus producing (12). While several proofs of the Prékopa–Leindler inequality exist (for example, see [16]), an elegant proof can be obtained from the entropy form of (12): every pair of n\mathbb{R}^{n}-valued random vectors X0,X1X_{0},X_{1} with density (with respect to the Lebesgue measure) has a joint distribution (X0,X1)(X_{0},X_{1}) such that,

D((1t)X0+tX1Y)(1t)D(X0Y)+tD(X1Y),D((1-t)X_{0}+tX_{1}\|Y)\leq(1-t)D(X_{0}\|Y)+tD(X_{1}\|Y), (15)

where YY is a random vector with distribution ν\nu. The fact that the optimal coupling (X0,X1)(X_{0},X_{1}) satisfies (15) (see [3, Theorem 9.4.11]) is well known, and often recorded as the “displacement convexity of entropy on the metric measure space (n,||,ν)(\mathbb{R}^{n},|\cdot|,\nu)”. To go from (15) to (13) one can use the Donsker–Varadhan duality formula [13, Section 2] describing the Legendre transform of relative entropy. It says, for ν\nu-integrable functions ϕ\phi, we have

logeϕ dν=supμν[ϕ dμD(μν)],\log\int e^{\phi}\textnormal{ d}\nu=\sup_{\mu\ll\nu}\left[\int\phi\textnormal{ d}\mu-D(\mu\|\nu)\right], (16)

where the supremum on the right is over all probability measures μ\mu absolutely continuous with respect to ν\nu, and equality is attained in (16) for  dμeϕ dν\textnormal{ d}\mu\propto e^{\phi}\textnormal{ d}\nu.
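The attainment is a one-line check: if  d\mu_{\star}\propto e^{\phi}\textnormal{ d}\nu, then \log\frac{\textnormal{ d}\mu_{\star}}{\textnormal{ d}\nu}=\phi-\log\int e^{\phi}\textnormal{ d}\nu, so

\int\phi\,\textnormal{ d}\mu_{\star}-D(\mu_{\star}\|\nu)=\int\phi\,\textnormal{ d}\mu_{\star}-\int\left(\phi-\log\int e^{\phi}\,\textnormal{ d}\nu\right)\textnormal{ d}\mu_{\star}=\log\int e^{\phi}\,\textnormal{ d}\nu,

while the inequality \int\phi\,\textnormal{ d}\mu-D(\mu\|\nu)\leq\log\int e^{\phi}\,\textnormal{ d}\nu for arbitrary \mu\ll\nu follows from Jensen's inequality.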

We do not spell out the details of the implication (15) \Rightarrow (13) because the reader may infer the general idea from our proof of Theorem 1.7 which is rather short. Nonetheless, it is apt to remark here that this technique stands out because it entirely operates at the level of integrals and does not appeal to local estimates on the integrands (other than the one granted by assumption), thereby making it possible to extract functional inequalities even if a convexity property of entropy is only available on a restricted class of measures. The same cannot be said about some other transport-based proofs of the Prékopa–Leindler inequality (or its generalisations). Besides, this method works in measure spaces without any smooth structure. For example, the reader may find beautiful applications to discrete structures in works of Gozlan, Roberto, Samson, and Tetali [18], and Slomka [28].

We will use this duality to obtain a functional form of the dimensional Brunn–Minkowski inequality (3), which is our second main result.

Notation.

We write

Mpt(x,y){((1t)xp+typ)1p, for xy>0,0 otherwise,M_{p}^{t}(x,y)\coloneqq\begin{cases}\left((1-t)x^{p}+ty^{p}\right)^{\frac{1}{p}},&\hbox{ for }xy>0,\\ 0&\hbox{ otherwise,}\end{cases} (17)

for t[0,1]t\in[0,1] and p[,]p\in[-\infty,\infty].

Theorem 1.7.

Let p0p\geq 0, and suppose f,g,hf,g,h are non-negative functions on n\mathbb{R}^{n} with f,gf,g even log-concave and γ\gamma-integrable, such that

h((1t)x0+tx1)Mpt(f(x0),g(x1)).h((1-t)x_{0}+tx_{1})\geq M_{p}^{t}\left(f(x_{0}),g(x_{1})\right). (18)

Then, we have

h dγMp1+npt(f dγ,g dγ).\int h\textnormal{ d}\gamma\geq M_{\frac{p}{1+np}}^{t}\left(\int f\textnormal{ d}\gamma,\int g\textnormal{ d}\gamma\right). (19)

Indeed, when ff and gg are indicators of symmetric convex bodies, and p=p=\infty, one recovers (3). Inequalities such as in the theorem above are sometimes called Borell–Brascamp–Lieb inequalities after the works by Borell [5], and Brascamp–Lieb [7].
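To spell this out, take f=\mathbf{1}_{K_{0}}, g=\mathbf{1}_{K_{1}}, h=\mathbf{1}_{(1-t)K_{0}+tK_{1}}, and p=\infty. The hypothesis (18) holds because M_{\infty}^{t}(f(x_{0}),g(x_{1}))=1 forces x_{0}\in K_{0} and x_{1}\in K_{1}, whence (1-t)x_{0}+tx_{1}\in(1-t)K_{0}+tK_{1}; moreover, the exponent \frac{p}{1+np} equals \frac{1}{n} at p=\infty. Thus (19) reads

\gamma\big((1-t)K_{0}+tK_{1}\big)\;\geq\;M_{\frac{1}{n}}^{t}\big(\gamma(K_{0}),\gamma(K_{1})\big)=\Big((1-t)\gamma(K_{0})^{\frac{1}{n}}+t\,\gamma(K_{1})^{\frac{1}{n}}\Big)^{n},

which is (3).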

To the best of our knowledge, the argument for obtaining Theorem 1.7 from Theorem 1.5, though simple, is new. Previously, it was not clear how to apply duality (16) to inequalities such as (9) that are not linear in relative entropy. Exactly the same idea as we use in the proof of Theorem 1.7 gives further dimension-dependent functional inequalities for functions that are not necessarily log-concave, as discussed below.

Recall that a convex function V:nV:\mathbb{R}^{n}\to\mathbb{R} is said to be β\beta-homogeneous if V(λx)=λβV(x)V(\lambda x)=\lambda^{\beta}V(x), for every xnx\in\mathbb{R}^{n} and λ>0\lambda>0. Consider the probability measure νeV dx\nu\propto e^{-V}\textnormal{ d}x, such that VV is β\beta-homogeneous for some β(1,)\beta\in(1,\infty). Say ν\nu is represented by a random vector YY. Then, [1, Theorem 1.4] states that, for random vectors X0X_{0} and X1X_{1} having radially decreasing density with respect to ν\nu, there exists a coupling (X0,X1)(X_{0},X_{1}) (in fact, the optimal coupling works) such that

eβ1βnD((1t)X0+tX1Y)(1t)eβ1βnD(X0Y)+teβ1βnD(X1Y).e^{-\frac{\beta-1}{\beta n}D((1-t)X_{0}+tX_{1}\|Y)}\geq(1-t)e^{-\frac{\beta-1}{\beta n}D(X_{0}\|Y)}+te^{-\frac{\beta-1}{\beta n}D(X_{1}\|Y)}. (20)

From this, we have the following result.

Theorem 1.8.

Consider the probability measure  dνeV dx\textnormal{ d}\nu\propto e^{-V}\textnormal{ d}x, such that VV is β\beta-homogeneous for some β(1,)\beta\in(1,\infty). Let p0p\geq 0, and suppose f,g,hf,g,h are non-negative functions on n\mathbb{R}^{n} with f,gf,g radially decreasing and ν\nu-integrable, such that

h((1t)x0+tx1)Mpt(f(x0),g(x1)).h((1-t)x_{0}+tx_{1})\geq M_{p}^{t}\left(f(x_{0}),g(x_{1})\right). (21)

Then, we have

h dνM(β1)p(β1)+βnpt(f dν,g dν).\int h\textnormal{ d}\nu\geq M_{\frac{(\beta-1)p}{(\beta-1)+\beta np}}^{t}\left(\int f\textnormal{ d}\nu,\int g\textnormal{ d}\nu\right). (22)

The standard Gaussian measure falls under the regime β=2\beta=2. Thus, Theorem 1.8 can be applied to a larger class of functions compared to Theorem 1.7, but at the same time draws a weaker conclusion.
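For the record, the loss is only in the exponent:

\frac{(\beta-1)p}{(\beta-1)+\beta np}\bigg|_{\beta=2}=\frac{p}{1+2np}\;\leq\;\frac{p}{1+np},

and M_{s}^{t}\leq M_{r}^{t} whenever s\leq r, by the monotonicity of power means; so, on their common domain of applicability, the conclusion (19) is stronger than (22) with \beta=2.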

1.1. Related works

An independent preprint by Alexandros Eskenazis and Dario Cordero-Erausquin, containing weighted Borell–Brascamp–Lieb inequalities corresponding to the results in [15] and [12], is expected to appear soon. Their work uses the so-called L2L^{2}-method, which is based on a different perspective. We learnt from Alexandros Eskenazis, for example, that their equivalent of Theorem 1.7 applies to a wider class of reference measures (as in [12]) but requires greater restrictions on the admissible functions. To the best of our knowledge, the techniques in the aforementioned work may not directly extend to the treatment of even log-concave functions as in Theorem 1.7. We thank Alexandros Eskenazis for kindly sharing with us his joint results, and for his comments (in particular, but not limited to, his suggestion to discuss equality cases in Theorem 1.5). We eagerly await reading their paper!

We would also like to mention the related ongoing work of Andreas Malliaris, James Melbourne, and Cyril Roberto. James Melbourne discussed, with the first-named author, an elegant technique from their work to obtain Borell–Brascamp–Lieb inequalities for p(1/n,1]p\in(-1/n,1] directly from inequalities such as (3). This discussion took place at the Hausdorff Research Institute for Mathematics, before the present authors understood the precise way to pass from the exponentiated-entropy inequality of Theorem 1.5 to a result such as Theorem 1.7. According to GA’s recollection, the argument of Malliaris–Melbourne–Roberto is based on elementary measure theory instead of entropy. This makes their proofs very different, mathematically and spiritually, from the one presented in our paper. More recently, after writing this preprint, we learnt from James Melbourne about an approximation argument to extend their work to include p=1/np=-1/n from p(1/n,1]p\in(-1/n,1]. It should be noted that the p=1/np=-1/n case is very powerful: it implies Borell–Brascamp–Lieb inequalities for all p>1/np>-1/n. We have not verified the details of the original arguments in the work of Malliaris–Melbourne–Roberto, but we look forward to their work with great enthusiasm. Once the details are verified to be correct, it would not only significantly generalise the statements of our Theorems 1.7 and 1.8, but also reveal many previously unexplored consequences of Conjecture 1.1.

1.2. Further acknowledgements

We are grateful to Liran Rotem for his continued and generous sharing of insights on the topic of dimensional Brunn–Minkowski inequalities. Many thanks to Alexandros Eskenazis and James Melbourne for kindly sharing their results, and to Alexander Volberg for enriching discussions on a related problem. We also sincerely acknowledge the helpful discussions with Galyna Livshyts and Emma Pollard on potential functional forms of the Eskenazis–Moschidis inequality.

1.3. Organisation of the paper

The proof of Theorem 1.5 is based on an Eulerian description of mass transport. The required background is presented in Section 2. Proofs of Theorem 1.5 and Theorem 1.7 appear in Section 3. Theorem 1.8 follows along the same lines as Theorem 1.7, hence we omit its proof.

2. Preliminaries

First of all, we note that  d2 dt2e1nD(XtZ)0\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}e^{-\frac{1}{n}D(X_{t}\|Z)}\leq 0 is equivalent to  d2 dt2D(XtZ)1n( d dtD(XtZ))2\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(X_{t}\|Z)\geq\frac{1}{n}\left(\frac{\textnormal{ d}}{\textnormal{ d}t}D(X_{t}\|Z)\right)^{2}, whenever the relevant quantities have the required regularity. Thus, we would like to compute  d2 dt2D(XtZ)\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(X_{t}\|Z) and  d dtD(XtZ)\frac{\textnormal{ d}}{\textnormal{ d}t}D(X_{t}\|Z). Such local computations are often best performed in the language of velocity fields.
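This equivalence is a direct computation: abbreviating D(t)=D(X_{t}\|Z),

\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}\,e^{-\frac{1}{n}D(t)}=\frac{1}{n}\,e^{-\frac{1}{n}D(t)}\left(\frac{1}{n}D^{\prime}(t)^{2}-D^{\prime\prime}(t)\right),

and the prefactor \frac{1}{n}e^{-\frac{1}{n}D(t)} is positive, so the two signs agree.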

Suppose a curve {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]} of probability measures on n\mathbb{R}^{n} is given. A time-dependent velocity field vtv_{t} is said to be compatible with {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]} if

tμt+div(vtμt)=0\partial_{t}\mu_{t}+\textnormal{div}(v_{t}\mu_{t})=0 (23)

is satisfied in the weak sense, where div denotes divergence. The latter equation means that

 d dtf dμt=f,vt dμt,\frac{\textnormal{ d}}{\textnormal{ d}t}\int f\textnormal{ d}\mu_{t}=\int\langle\nabla f,v_{t}\rangle\textnormal{ d}\mu_{t}, (24)

for all compactly supported smooth functions ff. Once Equation (23) is known to hold, Equation (24) holds in wider generality; for example, it holds for all bounded Lipschitz functions (see [3, Chapter 8]).

Ignoring all regularity issues, we compute the first two derivatives of D(μtν)D(\mu_{t}\|\nu) when a compatible velocity field is given. We write the result for a general log-concave measure ν\nu since it may be of independent interest.

Proposition 2.1.

Let  dν=eW dx\textnormal{ d}\nu=e^{-W}\textnormal{ d}x, for smooth convex WW. Consider a curve of probability measures {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]} with a compatible velocity field vtv_{t}. If vtv_{t} is sufficiently smooth, then

 d dtD(μtν)=divW(vt) dμt,\begin{split}\frac{\textnormal{ d}}{\textnormal{ d}t}D(\mu_{t}\|\nu)=-\int\textnormal{div}^{W}(v_{t})\textnormal{ d}\mu_{t},\\ \end{split} (25)

where divW(v)=div(v)W,v\textnormal{div}^{W}(v)=\textnormal{div}(v)-\langle\nabla W,v\rangle, for vector fields vv. Moreover, if the trajectories of vtv_{t} take each particle along a straight line with constant speed, then

 d2 dt2D(μtν)=𝒢W(vt) dμt,\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(\mu_{t}\|\nu)=\int\mathcal{G}^{W}(v_{t})\textnormal{ d}\mu_{t},\\ (26)

where 𝒢W(v)=tr(v)2+2Wv,v\mathcal{G}^{W}(v)=\textnormal{tr}(\nabla v)^{2}+\langle\nabla^{2}W\cdot v,v\rangle, for vector fields vv.

Proof.

Let ρt\rho_{t} denote the density of μt\mu_{t} with respect to ν\nu. Then,

 d dtD(μtν)= d dtlogρt dμt=tlogρt dμt+logρt,vt dμt= d dtρt dν+logρt,vtρt dν=ρt,vt dν=ρt,vteW dx=ρtdiv(eWvt) dx=ρtdivW(vt) dν=divW(vt) dμt.\begin{split}\frac{\textnormal{ d}}{\textnormal{ d}t}D(\mu_{t}\|\nu)&=\frac{\textnormal{ d}}{\textnormal{ d}t}\int\log\rho_{t}\textnormal{ d}\mu_{t}=\int\partial_{t}\log\rho_{t}\textnormal{ d}\mu_{t}+\int\langle\nabla\log\rho_{t},v_{t}\rangle\textnormal{ d}\mu_{t}\\ &=\frac{\textnormal{ d}}{\textnormal{ d}t}\int\rho_{t}\textnormal{ d}\nu+\int\langle\nabla\log\rho_{t},v_{t}\rangle\rho_{t}\textnormal{ d}\nu=\int\langle\nabla\rho_{t},v_{t}\rangle\textnormal{ d}\nu\\ &=\int\langle\nabla\rho_{t},v_{t}\rangle e^{-W}\textnormal{ d}x=-\int\rho_{t}\textnormal{div}(e^{-W}v_{t})\textnormal{ d}x\\ &=-\int\rho_{t}\,\textnormal{div}^{W}(v_{t})\textnormal{ d}\nu=-\int\textnormal{div}^{W}(v_{t})\textnormal{ d}\mu_{t}.\end{split} (27)

In the above computation, the second equality uses the chain rule and the continuity equation (23), and the sixth equality is an application of integration by parts.

Further, note that tvt+vtvt=0\partial_{t}v_{t}+\nabla_{v_{t}}v_{t}=0 if the trajectories of vtv_{t} take particles along a straight line with constant speed. This allows the following computation to proceed.

 d2 dt2D(μtν)= d dtdivW(vt) dμt=divW(tvt) dμtdivW(vt),vt dμt=divW(vtvt) dμtdivW(vt),vt dμt=𝒢W(vt) dμt,\begin{split}\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(\mu_{t}\|\nu)&=-\frac{\textnormal{ d}}{\textnormal{ d}t}\int\textnormal{div}^{W}(v_{t})\textnormal{ d}\mu_{t}=-\int\textnormal{div}^{W}\left(\partial_{t}v_{t}\right)\textnormal{ d}\mu_{t}-\int\langle\nabla\textnormal{div}^{W}(v_{t}),v_{t}\rangle\textnormal{ d}\mu_{t}\\ &=\int\textnormal{div}^{W}\left(\nabla_{v_{t}}v_{t}\right)\textnormal{ d}\mu_{t}-\int\langle\nabla\textnormal{div}^{W}(v_{t}),v_{t}\rangle\textnormal{ d}\mu_{t}=\int\mathcal{G}^{W}(v_{t})\textnormal{ d}\mu_{t},\end{split} (28)

where in the second equality we use the chain rule and the continuity equation (23), while the last equality uses the pointwise formula

𝒢W(v)=divW(vv)divW(v),v,\mathcal{G}^{W}(v)=\textnormal{div}^{W}\left(\nabla_{v}v\right)-\langle\nabla\textnormal{div}^{W}(v),v\rangle, (29)

which holds for smooth vv. Formula (29) is an easily obtained weighted version of the Bochner formula for vector fields (see, for example, [30, Equation 14.26]). ∎
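For the reader's convenience, here is a verification of (29) in coordinates (summing over repeated indices):

\begin{split}\textnormal{div}(\nabla_{v}v)-\langle\nabla\,\textnormal{div}(v),v\rangle&=\partial_{i}(v_{j}\partial_{j}v_{i})-v_{j}\partial_{j}\partial_{i}v_{i}=\partial_{i}v_{j}\,\partial_{j}v_{i}=\textnormal{tr}(\nabla v)^{2},\\ -\langle\nabla W,\nabla_{v}v\rangle+\langle\nabla\langle\nabla W,v\rangle,v\rangle&=-\partial_{i}W\,v_{j}\partial_{j}v_{i}+v_{j}\partial_{j}(\partial_{i}W\,v_{i})=v_{i}(\partial_{i}\partial_{j}W)v_{j}=\langle\nabla^{2}W\cdot v,v\rangle;\end{split}

adding the two lines gives \mathcal{G}^{W}(v)=\textnormal{div}^{W}(\nabla_{v}v)-\langle\nabla\textnormal{div}^{W}(v),v\rangle.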

To utilise the above formulas for the derivatives of entropy, we need to establish the existence of compatible velocity fields in the cases of interest. We let In×nI_{n\times n} denote the n×nn\times n identity matrix.

Proposition 2.2.

Fix a probability measure  dν=eW dx\textnormal{ d}\nu=e^{-W}\textnormal{ d}x, and maps T0=ϕ0,T1=ϕ1:nnT_{0}=\nabla\phi_{0},T_{1}=\nabla\phi_{1}:\mathbb{R}^{n}\to\mathbb{R}^{n}, where ϕ0,ϕ1\phi_{0},\phi_{1} are convex functions. Set μt=Tt#ν\mu_{t}={T_{t}}_{\#}\nu, where Tt=(1t)T0+tT1T_{t}=(1-t)T_{0}+tT_{1}. Suppose 2ϕ0\nabla^{2}\phi_{0} and 2ϕ1\nabla^{2}\phi_{1} are both lower-bounded (in the positive semi-definite order) by λIn×n\lambda I_{n\times n} for some λ>0\lambda>0. Then, the equation

vt(Tt(x))= d dtTt(x)v_{t}(T_{t}(x))=\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x) (30)

defines a velocity field compatible with the curve {μt}t[0,1]\{\mu_{t}\}_{t\in[0,1]}.

Proof.

Evidently, the only obstruction to defining a velocity field is that two trajectories Tt(x)T_{t}(x) and Tt(y)T_{t}(y) cross each other at some time t(0,1)t\in(0,1), that is, that there is a tt_{\star} such that Tt(x)=Tt(y)T_{t_{\star}}(x)=T_{t_{\star}}(y) for xyx\neq y. However,

Tt(x)Tt(y),xy=(1t)(T0(x)T0(y))+t(T1(x)T1(y)),xy=(1t)T0(x)T0(y),xy+tT1(x)T1(y),xy(1t)λ|xy|2+tλ|xy|2=λ|xy|2,\begin{split}\langle T_{t}(x)-T_{t}(y),x-y\rangle&=\langle(1-t)\left(T_{0}(x)-T_{0}(y)\right)+t\left(T_{1}(x)-T_{1}(y)\right),x-y\rangle\\ &=(1-t)\langle T_{0}(x)-T_{0}(y),x-y\rangle+t\langle T_{1}(x)-T_{1}(y),x-y\rangle\\ &\geq(1-t)\lambda|x-y|^{2}+t\lambda|x-y|^{2}=\lambda|x-y|^{2},\end{split} (31)

because ϕ0λ2|x|2\phi_{0}-\frac{\lambda}{2}|x|^{2} and ϕ1λ2|x|2\phi_{1}-\frac{\lambda}{2}|x|^{2} are convex and consequently have monotone gradients. Thus, the possibility of this obstruction is ruled out. Now we verify the compatibility. For a compactly supported smooth function ff,

 d dtf dμt= d dtf(Tt(x)) dν(x)=f(Tt(x)), d dtTt(x) dν(x)=f(Tt(x)),vt(Tt(x)) dν=f,vt dμt.\begin{split}\frac{\textnormal{ d}}{\textnormal{ d}t}\int f\textnormal{ d}\mu_{t}&=\frac{\textnormal{ d}}{\textnormal{ d}t}\int f(T_{t}(x))\textnormal{ d}\nu(x)=\int\langle\nabla f(T_{t}(x)),\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x)\rangle\textnormal{ d}\nu(x)\\ &=\int\langle\nabla f(T_{t}(x)),v_{t}(T_{t}(x))\rangle\textnormal{ d}\nu=\int\langle\nabla f,v_{t}\rangle\textnormal{ d}\mu_{t}.\end{split} (32)

3. Proof of the main results

In this section, we solely work with the Gaussian measure as the reference measure. Thus, ν\nu from the previous section is taken to be γ\gamma. In this case, we will denote divW\textnormal{div}^{W} by div~\widetilde{\textnormal{div}} and 𝒢W\mathcal{G}^{W} by 𝒢~\widetilde{\mathcal{G}}.

Proof of Theorem 1.5.

Suppose X0,X1X_{0},X_{1} are even strongly log-concave random vectors in n\mathbb{R}^{n} with distributions μ0,μ1\mu_{0},\mu_{1}, respectively. Let T0T_{0} and T1T_{1} be Brenier maps from the standard Gaussian ZZ to X0X_{0} and X1X_{1}, respectively. With the joint distribution (T0(Z),T1(Z))(T_{0}(Z),T_{1}(Z)), and Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1}, we want to prove  d2 dt2e1nD(XtZ)0\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}e^{-\frac{1}{n}D(X_{t}\|Z)}\leq 0.

Suppose  dμ0eU0 dx\textnormal{ d}\mu_{0}\propto e^{-U_{0}}\textnormal{ d}x and  dμ1eU1 dx\textnormal{ d}\mu_{1}\propto e^{-U_{1}}\textnormal{ d}x. First, we assume that there is a κ<\kappa<\infty such that 2U0,2U1κIn×n\nabla^{2}U_{0},\nabla^{2}U_{1}\leq\kappa I_{n\times n}. By strong log-concavity, we already have 2U0,2U1In×n\nabla^{2}U_{0},\nabla^{2}U_{1}\geq I_{n\times n}. Thus, by Caffarelli’s contraction theorem (or a form thereof, see the statement in [11, Theorem 1]), we get that T0,T1T_{0},T_{1} are both 11-Lipschitz, while T01,T11T^{-1}_{0},T^{-1}_{1} are κ\sqrt{\kappa}-Lipschitz. If we write T0=ϕ0,T1=ϕ1T_{0}=\nabla\phi_{0},T_{1}=\nabla\phi_{1} as gradients of convex functions, then these bounds translate to 1κIn×n2ϕ0,2ϕ1In×n\frac{1}{\sqrt{\kappa}}I_{n\times n}\leq\nabla^{2}\phi_{0},\nabla^{2}\phi_{1}\leq I_{n\times n}. We infer from Proposition 2.2 that, if μt=Tt#γ\mu_{t}={T_{t}}_{\#}\gamma, Tt=(1t)T0+tT1T_{t}=(1-t)T_{0}+tT_{1} (thus μt\mu_{t} is the distribution of XtX_{t}), then a velocity field vtv_{t} compatible with μt\mu_{t} is well-defined by vt(Tt(x))= d dtTt(x)v_{t}(T_{t}(x))=\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x). Further, the smoothness of the velocity field vtv_{t} required to apply Proposition 2.1 can be obtained by Caffarelli’s regularity theory [8, 9]. From here we mimic the argument in [1, Theorem 4.5] with some modifications, but applied to vector fields, where we also use an analogue of a crucial auxiliary construction from [15]. We will prove the stronger inequality

𝒢~(vt) dμt2|vt|2 dμt+1n(div~(vt) dμt)2.\int\widetilde{\mathcal{G}}(v_{t})\textnormal{ d}\mu_{t}\geq 2\int|v_{t}|^{2}\textnormal{ d}\mu_{t}+\frac{1}{n}\left(\int\widetilde{\textnormal{div}}(v_{t})\textnormal{ d}\mu_{t}\right)^{2}. (33)

Let ut(x)=vt(x)lnxu_{t}(x)=v_{t}(x)-\frac{l}{n}x, where l=div~(vt) dμtl=\int\widetilde{\textnormal{div}}(v_{t})\textnormal{ d}\mu_{t}. Then,

tr(vt)2=tr(ut+lnIn×n)2=tr((ut)2+2lnut+l2n2In×n)=tr(ut)2+2lndiv(ut)+l2n=tr(ut)2+2lndiv(vt)l2n=tr(ut)2+2ln(div~(vt)+x,vt)l2n=tr(ut)2+2lnx,vt+(2lndiv~(vt)l2n).\begin{split}\textnormal{tr}(\nabla v_{t})^{2}&=\textnormal{tr}\left(\nabla u_{t}+\frac{l}{n}I_{n\times n}\right)^{2}=\textnormal{tr}\left((\nabla u_{t})^{2}+\frac{2l}{n}\nabla u_{t}+\frac{l^{2}}{n^{2}}I_{n\times n}\right)\\ &=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\textnormal{div}(u_{t})+\frac{l^{2}}{n}=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\textnormal{div}(v_{t})-\frac{l^{2}}{n}\\ &=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\left(\widetilde{\textnormal{div}}(v_{t})+\langle x,v_{t}\rangle\right)-\frac{l^{2}}{n}\\ &=\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\langle x,v_{t}\rangle+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right).\end{split} (34)

To continue the proof in the mould of [1, Theorem 4.5], we would like to show that tr(ut)2 dμti|ut(i)|2 dμt\int\textnormal{tr}(\nabla u_{t})^{2}\textnormal{ d}\mu_{t}\geq\sum_{i}\int|u_{t}^{(i)}|^{2}\textnormal{ d}\mu_{t}, where we write ut=(ut(1),,ut(n))u_{t}=(u_{t}^{(1)},\ldots,u_{t}^{(n)}) in its components. This step is slightly more involved than in [1] (see Remark 3) and we are forced to take the following route.

Using the chain rule, one has

ut(Tt(x))Tt(x)=[ut(Tt(x))]=2ϕ1(x)2ϕ0(x)lnTt(x).\nabla u_{t}(T_{t}(x))\nabla T_{t}(x)=\nabla[u_{t}(T_{t}(x))]=\nabla^{2}\phi_{1}(x)-\nabla^{2}\phi_{0}(x)-\frac{l}{n}\nabla T_{t}(x).

Let A=2ϕ1(x)2ϕ0(x)lnTt(x)A=\nabla^{2}\phi_{1}(x)-\nabla^{2}\phi_{0}(x)-\frac{l}{n}\nabla T_{t}(x) and B=(Tt(x))1B=(\nabla T_{t}(x))^{-1}; note that the matrices AA and BB are symmetric and, furthermore, BIn×nB\geq I_{n\times n}. Therefore, we have

\begin{split}\textnormal{tr}[\nabla u_{t}(T_{t}(x))]^{2}&=\textnormal{tr}(AB)^{2}=\textnormal{tr}(ABAB)=\textnormal{tr}(B^{1/2}ABAB^{1/2})\\ &\geq\textnormal{tr}(B^{1/2}A^{2}B^{1/2})=\textnormal{tr}(A^{2}B)=\textnormal{tr}(ABA)\geq\textnormal{tr}(A^{2})\\ &=\textnormal{tr}(\nabla[u_{t}(T_{t}(x))])^{2},\end{split} (35)

where both the inequalities follow from the monotonicity of trace under the positive semidefinite ordering.
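Both steps above are instances of the elementary fact that \textnormal{tr}(CMC^{\top})\geq\textnormal{tr}(CC^{\top}) whenever M\geq I_{n\times n}, since \textnormal{tr}(C(M-I_{n\times n})C^{\top})\geq 0. Explicitly, with C=B^{1/2}A and C=A respectively,

\textnormal{tr}(ABAB)=\textnormal{tr}\big((B^{1/2}A)B(B^{1/2}A)^{\top}\big)\geq\textnormal{tr}\big((B^{1/2}A)(B^{1/2}A)^{\top}\big)=\textnormal{tr}(A^{2}B),\qquad\textnormal{tr}(ABA^{\top})\geq\textnormal{tr}(AA^{\top})=\textnormal{tr}(A^{2}),

using the symmetry of AA and BB.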

The trace inequality above gives us the following.

\begin{split}\int\textnormal{tr}(\nabla u_{t})^{2}\textnormal{ d}\mu_{t}&=\int\textnormal{tr}[\nabla u_{t}(T_{t}(x))]^{2}\textnormal{ d}\gamma\geq\int\textnormal{tr}(\nabla[u_{t}(T_{t}(x))])^{2}\textnormal{ d}\gamma\\ &=\sum_{i}\int|\nabla[u_{t}^{(i)}(T_{t}(x))]|^{2}\textnormal{ d}\gamma,\end{split} (36)

where the last equality uses the fact that ut(Tt(x))u_{t}(T_{t}(x)) is a gradient field, which can be seen from the expression ut(Tt(x))=(1tln)ϕ1(x)(1+(1t)ln)ϕ0(x)u_{t}(T_{t}(x))=(1-\frac{tl}{n})\nabla\phi_{1}(x)-(1+\frac{(1-t)l}{n})\nabla\phi_{0}(x). Furthermore, ut(Tt(x))u_{t}(T_{t}(x)) is an odd function of xx, since both ϕ0(x)\nabla\phi_{0}(x) and ϕ1(x)\nabla\phi_{1}(x) are odd.

Applying the Gaussian Poincaré inequality to each component of ut(Tt(x))u_{t}(T_{t}(x)), we obtain

\begin{split}\sum_{i}\int|\nabla[u_{t}^{(i)}(T_{t}(x))]|^{2}\textnormal{ d}\gamma&\geq\sum_{i}\int|u_{t}^{(i)}(T_{t}(x))|^{2}\textnormal{ d}\gamma=\sum_{i}\int\left(u_{t}^{(i)}(x)\right)^{2}\textnormal{ d}\mu_{t}\\ &=\sum_{i}\int\left(v_{t}^{(i)}(x)-\frac{l}{n}x^{(i)}\right)^{2}\textnormal{ d}\mu_{t}\\ &=\int\left(|v_{t}|^{2}-\frac{2l}{n}\langle x,v_{t}\rangle+\frac{l^{2}}{n^{2}}|x|^{2}\right)\textnormal{ d}\mu_{t}.\end{split} (37)

Putting this into the expression for tr(vt)2\textnormal{tr}(\nabla v_{t})^{2} from before,

𝒢~(vt) dμt=(tr(vt)2+|vt|2) dμt=(tr(ut)2+2lnx,vt+(2lndiv~(vt)l2n)+|vt|2) dμt(|vt|22lnx,vt+l2n2|x|2+2lnx,vt+(2lndiv~(vt)l2n)+|vt|2) dμt=(2|vt|2+l2n2|x|2+(2lndiv~(vt)l2n)) dμt2|vt|2 dμt+(2lndiv~(vt)l2n) dμt=2|vt|2 dμt+1nl2,\begin{split}\int\widetilde{\mathcal{G}}(v_{t})\textnormal{ d}\mu_{t}&=\int\left(\textnormal{tr}(\nabla v_{t})^{2}+|v_{t}|^{2}\right)\textnormal{ d}\mu_{t}\\ &=\int\left(\textnormal{tr}(\nabla u_{t})^{2}+\frac{2l}{n}\langle x,v_{t}\rangle+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)+|v_{t}|^{2}\right)\textnormal{ d}\mu_{t}\\ &\geq\int\left(|v_{t}|^{2}-\frac{2l}{n}\langle x,v_{t}\rangle+\frac{l^{2}}{n^{2}}|x|^{2}+\frac{2l}{n}\langle x,v_{t}\rangle+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)+|v_{t}|^{2}\right)\textnormal{ d}\mu_{t}\\ &=\int\left(2|v_{t}|^{2}+\frac{l^{2}}{n^{2}}|x|^{2}+\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)\right)\textnormal{ d}\mu_{t}\\ &\geq 2\int|v_{t}|^{2}\textnormal{ d}\mu_{t}+\int\left(\frac{2l}{n}\widetilde{\textnormal{div}}(v_{t})-\frac{l^{2}}{n}\right)\textnormal{ d}\mu_{t}=2\int|v_{t}|^{2}\textnormal{ d}\mu_{t}+\frac{1}{n}l^{2},\end{split} (38)

as desired.

Note that,

|vt|2 dμt=|vt(Tt(x))|2 dγ(x)=| d dtTt(x)|2 dγ(x)=|T0(x)T1(x)|2 dγ=𝔼|X0X1|2.\begin{split}\int|v_{t}|^{2}\textnormal{ d}\mu_{t}&=\int|v_{t}(T_{t}(x))|^{2}\textnormal{ d}\gamma(x)=\int|\frac{\textnormal{ d}}{\textnormal{ d}t}T_{t}(x)|^{2}\textnormal{ d}\gamma(x)\\ &=\int|T_{0}(x)-T_{1}(x)|^{2}\textnormal{ d}\gamma=\mathbb{E}|X_{0}-X_{1}|^{2}.\end{split} (39)

Thus, we have shown that

 d2 dt2D(XtZ)2𝔼|X0X1|2+1n( d dtD(XtZ))2,\frac{\textnormal{ d}^{2}}{\textnormal{ d}t^{2}}D(X_{t}\|Z)\geq 2\mathbb{E}|X_{0}-X_{1}|^{2}+\frac{1}{n}\left(\frac{\textnormal{ d}}{\textnormal{ d}t}D(X_{t}\|Z)\right)^{2}, (40)

under the assumed regularity on X0,X1X_{0},X_{1} and the chosen coupling (X0,X1)(X_{0},X_{1}). We claim that,

e1nD(XtZ)σ(1t)((𝔼|X0X1|2)12)e1nD(X0Z)+σ(t)((𝔼|X0X1|2)12)e1nD(X1Z),e^{-\frac{1}{n}D(X_{t}\|Z)}\geq\sigma^{(1-t)}\left(\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}}\right)e^{-\frac{1}{n}D(X_{0}\|Z)}+\sigma^{(t)}\left(\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}}\right)e^{-\frac{1}{n}D(X_{1}\|Z)}, (41)

where

σ(t)(θ)={sin(2ntθ)sin(2nθ),if θ<n2π,, otherwise,\sigma^{(t)}(\theta)=\begin{cases}\frac{\sin\left(\sqrt{\frac{2}{n}}t\theta\right)}{\sin\left(\sqrt{\frac{2}{n}}\theta\right)},&\textnormal{if }\theta<\sqrt{\frac{n}{2}}\pi,\\ \infty,&\textnormal{ otherwise,}\\ \end{cases} (42)

for t[0,1]t\in[0,1]. This claim follows from the local inequality (40) and a comparison principle, as applied in [1, Lemma 5.3] or [14, Lemma 2.2]. Additionally, by the triangle inequality for the Wasserstein metric and [1, Remark 7], the quantity (𝔼|X0X1|2)12\left(\mathbb{E}|X_{0}-X_{1}|^{2}\right)^{\frac{1}{2}} is always strictly less than n/2π\sqrt{n/2}\pi.

Since σ(t)t\sigma^{(t)}\geq t, we have proved Theorem 1.5 (and the claim in Remark 1) under the assumption that 2U0,2U1κIn×n<\nabla^{2}U_{0},\nabla^{2}U_{1}\leq\kappa I_{n\times n}<\infty. This assumption can be removed via the following approximation argument.

Let ϵ(0,1/2)\epsilon\in(0,1/2), and define the Ornstein–Uhlenbeck evolutes Xiϵ:=1ϵXi+ϵZX^{\epsilon}_{i}:=\sqrt{1-\epsilon}X_{i}+\sqrt{\epsilon}Z^{\prime} for i=0,1i=0,1, where ZZ^{\prime} is a standard Gaussian random vector independent of X0X_{0} and X1X_{1}. Denote by μiϵ\mu^{\epsilon}_{i} the distribution of XiϵX^{\epsilon}_{i}, and write ρiϵ= dμiϵ dx=eUiϵ(x)\rho^{\epsilon}_{i}=\frac{\textnormal{ d}\mu^{\epsilon}_{i}}{\textnormal{ d}x}=e^{-U^{\epsilon}_{i}(x)} for i=0,1i=0,1. Obviously, the XiϵX^{\epsilon}_{i}’s are even random vectors in n\mathbb{R}^{n}. Moreover, since the Ornstein–Uhlenbeck process (see, for example, [4]) preserves strong log-concavity of measures, they are also strongly log-concave. This means 2UiϵIn×n\nabla^{2}U^{\epsilon}_{i}\geq I_{n\times n} for all ϵ(0,1/2)\epsilon\in(0,1/2). Furthermore, a direct calculation reveals that 2Uiϵ1ϵIn×n\nabla^{2}U^{\epsilon}_{i}\leq\frac{1}{\epsilon}I_{n\times n} (see, for example, [21]).

As ϵ0\epsilon\downarrow 0, we have D(XiϵZ)D(XiZ)D(X^{\epsilon}_{i}\|Z)\rightarrow D(X_{i}\|Z) for i=0,1i=0,1. This can be seen from the expression

D(XiϵZ)=ρiϵlogρiϵ dx+12𝔼|Xiϵ|22+n2log(2π),D(X^{\epsilon}_{i}\|Z)=\int\rho^{\epsilon}_{i}\log\rho^{\epsilon}_{i}\textnormal{ d}x+\frac{1}{2}\mathbb{E}|X^{\epsilon}_{i}|_{2}^{2}+\frac{n}{2}\log(2\pi),

and [31, Remark 10].

Now choose a decreasing sequence ϵk\epsilon_{k} converging to 0. As kk\rightarrow\infty, μiϵk\mu^{\epsilon_{k}}_{i} converges weakly to μi\mu_{i} for i=0,1i=0,1; therefore, by Prokhorov’s theorem, both sequences {μ0ϵk}\{\mu^{\epsilon_{k}}_{0}\} and {μ1ϵk}\{\mu^{\epsilon_{k}}_{1}\} are tight in 𝒫(n)\mathcal{P}(\mathbb{R}^{n}). For each kk, let πϵk\pi^{\epsilon_{k}} be the coupling (that is, the joint distribution of (X0ϵk,X1ϵk)(X_{0}^{\epsilon_{k}},X_{1}^{\epsilon_{k}})) used in the previous part of the proof. One can show that {πϵk}\{\pi^{\epsilon_{k}}\} is also a tight sequence in 𝒫(n×n)\mathcal{P}(\mathbb{R}^{n}\times\mathbb{R}^{n}), whence it admits a weakly convergent subsequence; without loss of generality, we assume that (πϵk)(\pi^{\epsilon_{k}}) converges to a coupling π\pi of μ0\mu_{0} and μ1\mu_{1}. Since the relative entropy D(μν)D(\mu\|\nu) is lower semi-continuous on 𝒫(n)×𝒫(n)\mathcal{P}(\mathbb{R}^{n})\times\mathcal{P}(\mathbb{R}^{n}), where 𝒫(n)\mathcal{P}(\mathbb{R}^{n}) is equipped with the weak topology, we have that

lim infkD([(x,y)(1t)x+ty]#πϵkγ)D([(x,y)(1t)x+ty]#πγ).\liminf_{k\rightarrow\infty}D([(x,y)\mapsto(1-t)x+ty]_{\#}\pi^{\epsilon_{k}}\|\gamma)\geq D([(x,y)\mapsto(1-t)x+ty]_{\#}\pi\|\gamma).

With the last observation, and that we already have

e^{-\frac{1}{n}D(X^{\epsilon_{k}}_{t}\|Z)}\geq\sigma^{(1-t)}\left(\theta_{\epsilon_{k}}\right)e^{-\frac{1}{n}D(X^{\epsilon_{k}}_{0}\|Z)}+\sigma^{(t)}\left(\theta_{\epsilon_{k}}\right)e^{-\frac{1}{n}D(X^{\epsilon_{k}}_{1}\|Z)}, (43)

for Xtϵk=(1t)X0ϵk+tX1ϵkX^{\epsilon_{k}}_{t}=(1-t)X^{\epsilon_{k}}_{0}+tX^{\epsilon_{k}}_{1} and θϵk=(𝔼|X0ϵkX1ϵk|2)12\theta_{\epsilon_{k}}=\left(\mathbb{E}|X^{\epsilon_{k}}_{0}-X^{\epsilon_{k}}_{1}|^{2}\right)^{\frac{1}{2}}, we can send kk\to\infty to complete the proof of both Theorem 1.5 and the claim in Remark 1. ∎

Remark 3.

The vector fields that appear in [1] are gradient fields, which simplifies things. For example, if utu_{t} in the proof above were a gradient field, one could simply apply the Poincaré inequality (with respect to μt\mu_{t}) to the components of utu_{t} to get tr(ut)2 dμt|ut|2 dμt\int\textnormal{tr}(\nabla u_{t})^{2}\textnormal{ d}\mu_{t}\geq\int|u_{t}|^{2}\textnormal{ d}\mu_{t}.
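In symbols: if u_{t}=\nabla\psi_{t}, then \nabla u_{t}=\nabla^{2}\psi_{t} is symmetric, so \textnormal{tr}(\nabla u_{t})^{2}=\sum_{i}|\nabla u_{t}^{(i)}|^{2}; moreover, u_{t} is odd while \mu_{t} is even, so each component u_{t}^{(i)} has mean zero under \mu_{t}. A Poincaré inequality for \mu_{t} with constant 1 (over odd functions, as in Question 1.4) would then give

\int\textnormal{tr}(\nabla u_{t})^{2}\,\textnormal{ d}\mu_{t}=\sum_{i}\int\big|\nabla u_{t}^{(i)}\big|^{2}\,\textnormal{ d}\mu_{t}\geq\sum_{i}\int\big(u_{t}^{(i)}\big)^{2}\,\textnormal{ d}\mu_{t}=\int|u_{t}|^{2}\,\textnormal{ d}\mu_{t}.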

Proof of Theorem 1.7.

Set F=logf,G=logg,F=\log f,G=\log g, and H=loghH=\log h. We will use Hölder’s inequality in the form

Mpt(a,b)Mqt(x,y)Mrt(ax,by),M_{p}^{t}(a,b)M_{q}^{t}(x,y)\geq M_{r}^{t}(ax,by), (44)

where q=1nq=\frac{1}{n} and 1p+1q=1r\frac{1}{p}+\frac{1}{q}=\frac{1}{r}, applied to

x=eD(μ0γ),y=eD(μ1γ),a=eF dμ0,b=eG dμ1,x=e^{-D(\mu_{0}\|\gamma)},y=e^{-D(\mu_{1}\|\gamma)},a=e^{\int F\textnormal{ d}\mu_{0}},b=e^{\int G\textnormal{ d}\mu_{1}}, (45)

where  dμ0f dγ\textnormal{ d}\mu_{0}\propto f\textnormal{ d}\gamma, and  dμ1g dγ\textnormal{ d}\mu_{1}\propto g\textnormal{ d}\gamma are probability measures. Thus, we get

((1t)epF dμ0+tepG dμ1)1p((1t)e1nD(μ0γ)+te1nD(μ1γ))1q((1t)er(F dμ0D(μ0γ))+ter(G dμ1D(μ1γ)))1r=((1t)erlogeF dγ+terlogeG dγ)1r=((1t)(eF dγ)r+t(eG dγ)r)1r.\begin{split}&\left((1-t)e^{p\int F\textnormal{ d}\mu_{0}}+te^{p\int G\textnormal{ d}\mu_{1}}\right)^{\frac{1}{p}}\left((1-t)e^{-\frac{1}{n}D(\mu_{0}\|\gamma)}+te^{-\frac{1}{n}D(\mu_{1}\|\gamma)}\right)^{\frac{1}{q}}\\ &\geq\left((1-t)e^{r\left(\int F\textnormal{ d}\mu_{0}-D(\mu_{0}\|\gamma)\right)}+te^{r\left(\int G\textnormal{ d}\mu_{1}-D(\mu_{1}\|\gamma)\right)}\right)^{\frac{1}{r}}\\ &=\left((1-t)e^{r\log\int e^{F}\textnormal{ d}\gamma}+te^{r\log\int e^{G}\textnormal{ d}\gamma}\right)^{\frac{1}{r}}\\ &=\left((1-t)\left(\int e^{F}\textnormal{ d}\gamma\right)^{r}+t\left(\int e^{G}\textnormal{ d}\gamma\right)^{r}\right)^{\frac{1}{r}}.\end{split} (46)
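The first inequality in the display above is exactly (44). For completeness, (44) follows from Hölder's inequality: for r>0 with \frac{1}{p}+\frac{1}{q}=\frac{1}{r}, the exponents \frac{p}{r} and \frac{q}{r} are conjugate, so

(1-t)(ax)^{r}+t(by)^{r}\leq\big((1-t)a^{p}+tb^{p}\big)^{\frac{r}{p}}\big((1-t)x^{q}+ty^{q}\big)^{\frac{r}{q}},

that is, M_{r}^{t}(ax,by)\leq M_{p}^{t}(a,b)\,M_{q}^{t}(x,y).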

Taking the expectation of

H((1t)X0+tX1)log((1t)epF(X0)+tepG(X1))1p,H((1-t)X_{0}+tX_{1})\geq\log\left((1-t)e^{pF(X_{0})}+te^{pG(X_{1})}\right)^{\frac{1}{p}}, (47)

with respect to any joint distribution of (X0,X1)(X_{0},X_{1}) and combining it with the joint convexity of the function

Ψt(u,v)=log((1t)eu+tev)\Psi_{t}(u,v)=\log\left((1-t)e^{u}+te^{v}\right) (48)

for every fixed tt [14, Lemma 2.11], we see that

eH dμt((1t)epF dμ0+tepG dμ1)1p.e^{\int H\textnormal{ d}\mu_{t}}\geq\left((1-t)e^{p\int F\textnormal{ d}\mu_{0}}+te^{p\int G\textnormal{ d}\mu_{1}}\right)^{\frac{1}{p}}. (49)

Moreover, we already have

e1nD(μtγ)(1t)e1nD(μ0γ)+te1nD(μ1γ),e^{-\frac{1}{n}D(\mu_{t}\|\gamma)}\geq(1-t)e^{-\frac{1}{n}D(\mu_{0}\|\gamma)}+te^{-\frac{1}{n}D(\mu_{1}\|\gamma)}, (50)

when μt\mu_{t} is the distribution of Xt=(1t)X0+tX1X_{t}=(1-t)X_{0}+tX_{1} for the joint distribution (X0,X1)(X_{0},X_{1}) from Theorem 1.5. Putting equations (49) and (50) together and invoking the “inequality part” of Donsker–Varadhan duality, we get

eH dγeH dμteD(μtγ)((1t)epF dμ0+tepG dμ1)1p((1t)e1nD(μ0γ)+te1nD(μ1γ))1q,\begin{split}&\int e^{H}\textnormal{ d}\gamma\geq e^{\int H\textnormal{ d}\mu_{t}}e^{-D(\mu_{t}\|\gamma)}\\ &\geq\left((1-t)e^{p\int F\textnormal{ d}\mu_{0}}+te^{p\int G\textnormal{ d}\mu_{1}}\right)^{\frac{1}{p}}\left((1-t)e^{-\frac{1}{n}D(\mu_{0}\|\gamma)}+te^{-\frac{1}{n}D(\mu_{1}\|\gamma)}\right)^{\frac{1}{q}},\end{split} (51)

which completes the proof. ∎

References

  • [1] Gautam Aishwarya and Liran Rotem, New Brunn–Minkowski and functional inequalities via convexity of entropy, preprint, 2024.
  • [2] S. Alesker, S. Dar, and V. Milman, A remarkable measure preserving diffeomorphism between two convex bodies in 𝐑n{\bf R}^{n}, Geom. Dedicata 74 (1999), no. 2, 201–212. MR 1674116
  • [3] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré, Gradient flows in metric spaces and in the space of probability measures, second ed., Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, Basel, 2008. MR 2401600
  • [4] Dominique Bakry, Ivan Gentil, Michel Ledoux, et al., Analysis and geometry of Markov diffusion operators, vol. 103, Springer, 2014.
  • [5] C. Borell, Convex functions in d-space, Uppsala Univ. Dept. of Math. Report (1973).
  • [6] Károly J. Böröczky, Erwin Lutwak, Deane Yang, and Gaoyong Zhang, The log-Brunn-Minkowski inequality, Adv. Math. 231 (2012), no. 3-4, 1974–1997. MR 2964630
  • [7] Herm Jan Brascamp and Elliott H. Lieb, On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation, J. Functional Analysis 22 (1976), no. 4, 366–389. MR 450480
  • [8] Luis A. Caffarelli, A localization property of viscosity solutions to the Monge–Ampere equation and their strict convexity, Annals of mathematics 131 (1990), no. 1, 129–134.
  • [9] by same author, The regularity of mappings with a convex potential, Journal of the American Mathematical Society 5 (1992), no. 1, 99–104.
  • [10] by same author, Monotonicity properties of optimal transportation and the FKG and related inequalities, Communications in Mathematical Physics 214 (2000), no. 3, 547–563.
  • [11] Sinho Chewi and Aram-Alexandre Pooladian, An entropic generalization of Caffarelli’s contraction theorem via covariance inequalities, C. R. Math. Acad. Sci. Paris 361 (2023), 1471–1482. MR 4683324
  • [12] Dario Cordero-Erausquin and Liran Rotem, Improved log-concavity for rotationally invariant measures of symmetric convex sets, Ann. Probab. 51 (2023), no. 3, 987–1003. MR 4583060
  • [13] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. IV, Comm. Pure Appl. Math. 36 (1983), no. 2, 183–212. MR 690656
  • [14] Matthias Erbar, Kazumasa Kuwada, and Karl-Theodor Sturm, On the equivalence of the entropic curvature-dimension condition and Bochner’s inequality on metric measure spaces, Inventiones mathematicae 201 (2015), no. 3, 993–1071.
  • [15] Alexandros Eskenazis and Georgios Moschidis, The dimensional Brunn–Minkowski inequality in Gauss space, Journal of Functional Analysis 280 (2021), no. 6, 108914.
  • [16] Richard Gardner, The Brunn–Minkowski inequality, Bulletin of the American mathematical society 39 (2002), no. 3, 355–405.
  • [17] Richard Gardner and Artem Zvavitch, Gaussian Brunn–Minkowski inequalities, Transactions of the American Mathematical Society 362 (2010), no. 10, 5333–5353.
  • [18] Nathael Gozlan, Cyril Roberto, Paul-Marie Samson, and Prasad Tetali, Transport proofs of some discrete variants of the Prékopa-Leindler inequality, Ann. Sc. Norm. Super. Pisa Cl. Sci. (5) 22 (2021), no. 3, 1207–1232. MR 4334317
  • [19] M. Gromov, Convex sets and Kähler manifolds, Advances in differential geometry and topology, World Sci. Publ., Teaneck, NJ, 1990, pp. 1–38. MR 1095529
  • [20] Young-Heon Kim and Emanuel Milman, A generalization of Caffarelli’s contraction theorem via (reverse) heat flow, Math. Ann. 354 (2012), no. 3, 827–862. MR 2983070
  • [21] Bo’az Klartag and Eli Putterman, Spectral monotonicity under Gaussian convolution, Ann. Fac. Sci. Toulouse Math. (6) 32 (2023), no. 5, 939–967. MR 4748461
  • [22] Alexander V. Kolesnikov and Galyna V. Livshyts, On the Gardner–Zvavitch conjecture: Symmetry in inequalities of Brunn–Minkowski type, Advances in Mathematics 384 (2021), 107689.
  • [23] Alexander V. Kolesnikov and Emanuel Milman, Brascamp–Lieb-type inequalities on weighted Riemannian manifolds with boundary, The Journal of Geometric Analysis 27 (2017), 1680–1702.
  • [24] by same author, Poincaré and Brunn–Minkowski inequalities on the boundary of weighted Riemannian manifolds, American Journal of Mathematics 140 (2018), no. 5, 1147–1185.
  • [25] Galyna Livshyts, A universal bound in the dimensional Brunn–Minkowski inequality for log-concave measures, Transactions of the American Mathematical Society (2023).
  • [26] Galyna Livshyts, Arnaud Marsiglietti, Piotr Nayar, and Artem Zvavitch, On the Brunn–Minkowski inequality for general measures with applications to new isoperimetric-type inequalities, Transactions of the American Mathematical Society 369 (2017), no. 12, 8725–8742.
  • [27] Piotr Nayar and Tomasz Tkocz, A note on a Brunn–Minkowski inequality for the Gaussian measure, Proceedings of the American Mathematical Society 141 (2013), no. 11, 4027–4030.
  • [28] Boaz A. Slomka, A remark on discrete Brunn-Minkowski type inequalities via transportation of measure, Israel J. Math. 261 (2024), no. 2, 791–807. MR 4775738
  • [29] Cédric Villani, Topics in optimal transportation, Graduate Studies in Mathematics, vol. 58, American Mathematical Society, Providence, RI, 2003.
  • [30] Cédric Villani, Optimal transport, Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 338, Springer-Verlag, Berlin, 2009, Old and new. MR 2459454
  • [31] Liyao Wang and Mokshay Madiman, Beyond the entropy power inequality, via rearrangements, IEEE Trans. Inform. Theory 60 (2014), no. 9, 5116–5137. MR 3252379