Gradient Flow Structure of a Multidimensional Nonlinear Sixth Order Quantum-Diffusion Equation
Abstract.
A nonlinear parabolic equation of sixth order is analyzed. The equation arises as a reduction of a model from quantum statistical mechanics, and also as the gradient flow of a second-order information functional with respect to the $L^2$-Wasserstein metric. First, we prove global existence of weak solutions for initial conditions of finite entropy by means of the time-discrete minimizing movement scheme. Second, we calculate the linearization of the dynamics around the unique stationary solution, for which we can explicitly compute the entire spectrum. A key element in our approach is a particular relation between the entropy, the Fisher information and the second order functional that generates the gradient flow under consideration.
Key words and phrases:
Higher-order diffusion equations, quantum diffusion model, Wasserstein gradient flow, flow interchange estimate, long-time behavior, linearization
2010 Mathematics Subject Classification:
Primary: 35K30, Secondary: 35B45, 35B40
1. Introduction
1.1. The equation
The following nonlinear parabolic evolution equation of sixth order is considered:
(1) |
where is a given parameter. There are at least two different contexts in which (1) plays a role.
The first is the semi-classical approximation of the nonlocal quantum drift-diffusion model by Degond et al. [5]. In the formal asymptotic expansion of that equation in terms of the Planck constant $\hbar$, the right-hand side of (1) with $\lambda = 0$ appears as the term of order $\hbar^4$, after the linear diffusion operator at order $\hbar^0$ and the fourth-order operator related to the Bohm potential at order $\hbar^2$. More details on the derivation of the model and its formal expansion are given below in Section 2.1.
The other context, more relevant to the paper at hand, is that of gradient flows in the $L^2$-Wasserstein distance. Consider the following three functionals, defined on probability densities (for the moment, for simplicity, strictly positive ones) by
(2) |
which we shall refer to as the ($\lambda$-perturbed, if $\lambda > 0$) entropy, Fisher information, and energy, respectively. The celebrated result of [9] is that the gradient flow of the entropy in the $L^2$-Wasserstein metric is the linear Fokker-Planck equation,
(3) |
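For orientation, the three functionals and the Fokker-Planck equation can be written out explicitly. The block below is a sketch in standard notation, not a transcription of (2) and (3): the symbols $\mathcal{H}$, $\mathcal{F}$, $\mathcal{E}$, the normalizing constants (in particular the factor $\tfrac12$ in the energy), and the omission of the $\lambda$-dependent confinement terms in the functionals are assumptions.

```latex
% Assumed notation: u is a smooth, strictly positive probability density on R^d.
\begin{align*}
  \mathcal{H}(u) &= \int_{\mathbb{R}^d} u \log u \,\mathrm{d}x
    && \text{(entropy)}, \\
  \mathcal{F}(u) &= \int_{\mathbb{R}^d} u \,|\nabla \log u|^2 \,\mathrm{d}x
    = 4 \int_{\mathbb{R}^d} |\nabla \sqrt{u}|^2 \,\mathrm{d}x
    && \text{(Fisher information)}, \\
  \mathcal{E}(u) &= \frac12 \int_{\mathbb{R}^d} u \,\|\nabla^2 \log u\|^2 \,\mathrm{d}x
    && \text{(second-order energy)}.
\end{align*}
% Wasserstein gradient flow of the entropy plus the confinement (lambda/2)|x|^2:
\[
  \partial_t u = \Delta u + \lambda \,\nabla \cdot (x\, u).
\]
```

The two expressions for the Fisher information agree because $\nabla \sqrt{u} = \tfrac12 \sqrt{u}\, \nabla \log u$ for positive $u$.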
In [8] (see also [1, Example 11.1.10]) it has been shown that the gradient flow of the Fisher information is the so-called quantum drift-diffusion or DLSS equation,
(4) |
The starting point for our analysis is that (1) is, at least formally, the gradient flow of the energy; that is, the potential in (1) is the variational derivative of the energy. The reason for considering the energy as potential for a gradient flow is more profound than its formal similarity with the entropy and the Fisher information. There is an intimate relation between the three functionals that has already been the basis for deriving sharp self-similar asymptotics in [13], and that we shall elaborate on in Section 1.3 below. In a nutshell, the dissipation of the entropy along the heat flow is the Fisher information; the dissipation of the Fisher information along the heat flow is the energy, up to a multiple of the Fisher information itself; and this equals the dissipation of the entropy along the flow of (4). In this spirit, one may consider (4) and (1), respectively, as fourth and sixth order analogues of the second order linear Fokker-Planck equation (3).
Our analytical results are two-fold. First, we give a proof of existence of weak solutions to the initial value problem for (1) on the whole space for initial data with finite entropy and finite second moment. This result is proven with full rigor. Second, we study the long-time asymptotics of solutions using a linearization around the steady state. This part is formal in the sense that we calculate the linearization for sufficiently smooth perturbations of the steady state and discuss the spectral properties of an appropriate closure of the linear operator.
1.2. Existence of solutions
Global existence of non-negative weak solutions to (1) with on the -dimensional torus, i.e., with periodic boundary conditions, has been proven in [11] for , and in [3] for and . The main technical ingredient of these proofs is a particular regularization of the evolution equation, namely by . This regularization produces approximations of the true solution that are smooth and have an -dependent positive lower bound. Smoothness of both and then allows one to perform all the necessary a priori estimates on the approximation, and these pass to the limit . An extension of this method from the torus to the whole space, if possible at all, is at least not straightforward, since the intermediate estimates involve simultaneous Sobolev estimates on and , which would contradict each other on unbounded domains.
Here, we shall prove existence of solutions in on the whole space . Our technical device is very different from the one recalled above: we invoke the machinery of metric gradient flows. Our approximations are of lower regularity, and we do not have any information about positivity. To justify the a priori estimates, we need the method of flow interchange with the heat equation [8, 13]. In contrast to the constructions in [11, 3], we do not modify the equation, but perform a variational discretization in time using the minimizing movement scheme. More specifically: given an initial probability density of finite second moment and entropy, and a time step , define inductively a sequence by , and being a minimizer of
(5) |
We recall the $L^2$-Wasserstein distance below in Section 2.2, and we prove that the inductive procedure is well-defined. Our result about existence is that the sequences approximate a weak solution to (1).
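The inductive procedure described above is the usual minimizing movement (variational implicit Euler) step. As a sketch in assumed notation ($W_2$ for the $L^2$-Wasserstein distance, $\mathcal{E}$ for the energy, $\tau > 0$ for the time step, $u_\tau^n$ for the $n$-th iterate, and an interval convention for the interpolation that may differ from the one in (6)), it can be written as:

```latex
% Assumed notation; not a transcription of (5) or (6).
\[
  u_\tau^0 := u_0, \qquad
  u_\tau^n \in \operatorname*{argmin}_{u}
  \Big\{ \frac{1}{2\tau}\, W_2\big(u, u_\tau^{n-1}\big)^2 + \mathcal{E}(u) \Big\},
  \quad n = 1, 2, \ldots
\]
% Piecewise constant interpolation in time:
\[
  \bar u_\tau(t) := u_\tau^n \quad \text{for } t \in \big((n-1)\tau,\, n\tau\big].
\]
```

The penalization by $\frac{1}{2\tau} W_2^2$ plays the role of the difference quotient in an implicit Euler step: it keeps the new iterate close to the previous one at the scale of the time step.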
Theorem 1.1.
Assume . Let an initial datum be given that is a probability density with finite entropy, , and finite second moment. For each , define a sequence inductively as described above. Then the piecewise constant “interpolations” with
(6) |
converge along a suitable null sequence to a limit in . And that limit is a weak solution to (1) with in the following sense: is a locally Hölder continuous curve from in , the roots and are of regularity
(7) |
respectively, and for every test function ,
(8) |
where the nonlinearity is given by
(9) |
with , which coincides with where .
Remark 1.2.
In view of (7), one has , and so is “tested” against a local -function in (9). It is far from obvious that (8) with the nonlinearity (9) is indeed a weak formulation of (1). Equality of with a more “natural” weak formulation of the nonlinearity in (1)’s right hand side, like
for smooth and positive solutions can be verified by a direct but tedious computation involving various integration by parts. A more conceptual way to recognize as a weak formulation is explained at the beginning of Section 3.3.
Theorem 1.1 is proven by time-discrete approximation of the solution via the celebrated minimizing movement scheme. The main compactness estimate for passing to the time-continuous limit is provided by the dissipation of the (unperturbed) entropy , which formally amounts to
(10) |
with some positive . The formal calculations leading to (10) via integration by parts are identical to the ones used in [3]. The justification of these estimates in the whole-space case is technically more involved.
1.3. Long-time asymptotics
The interpretation of (1) as a Wasserstein gradient flow is essential for our second main result, which concerns the long-time asymptotics of . First assume , in which case there is a unique equilibrium for (1) (with mass equal to 1), given by
(11) |
Our approach to understanding the dynamics near is to formally calculate a linearization of (1) at and to determine its spectrum. We say “a linearization” since there are several competing concepts for linearization in nonlinear diffusion equations, providing different pieces of information about the long-time asymptotics. Following the ideas of [7], we study here the “linearization in Wasserstein”, which is given by the so-called displacement Hessian of at . Very informally, the displacement Hessian of a functional at a critical point is the representation of ’s second variational derivative with respect to the scalar product in . A definition and a more intuitive interpretation in terms of Lagrangian maps is given in Section 4.3.
Displacement Hessians are rarely used for the analysis of long-time asymptotics since the extraction of rigorous analytical information requires a lot of a priori knowledge about the regularity of the solution. Particularly when it comes to proving higher order asymptotics, alternative linearizations are often easier to handle; we refer to the discussions in [21, 6]. The most famous application of displacement Hessians concerns the result on self-similar asymptotics for the porous medium equation [17]; further applications, also to higher order diffusion equations, can be found e.g. in [14, 15, 20]. In the situation at hand, we use the Wasserstein linearization because of its compatibility with the special structure of (1) that we outline below.
Since the derivation of the result is probably more interesting than the result itself, we briefly indicate the main ingredient, which is the aforementioned intimate relation between the three functionals in (2): recall that the linear Fokker-Planck equation (3) is the Wasserstein gradient flow of . Consider a solution to that flow, i.e.,
(12) |
It is a well-known fact that is the dissipation of along (12), i.e.,
(13) |
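The well-known fact behind (13) can be verified by a short formal computation. The sketch below treats only the unperturbed case (heat equation, no confinement term) and assumes a smooth, positive, rapidly decaying solution $v$ of unit mass; the symbol $v$ and the restriction to this case are assumptions made for illustration.

```latex
% Assumed: \partial_s v = \Delta v, v > 0 smooth with sufficient decay, \int v \, dx = 1.
\begin{align*}
  \frac{\mathrm{d}}{\mathrm{d}s} \int v \log v \,\mathrm{d}x
    &= \int (\log v + 1)\, \Delta v \,\mathrm{d}x \\
    &= - \int \nabla \log v \cdot \nabla v \,\mathrm{d}x
       && \text{(integration by parts)} \\
    &= - \int v\, |\nabla \log v|^2 \,\mathrm{d}x .
\end{align*}
```

So the entropy decreases along the heat flow at a rate given by the Fisher information $\int v\,|\nabla \log v|^2\,\mathrm{d}x$.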
The connection between and — and, implicitly, also — is given by
(14) |
The relation (14) is derived in Section 4.2. It will play the same role in our studies on (1) as (13) has played for the analysis of the DLSS equation (4) in [14]. That is, we use (14) to express the displacement Hessian of in terms of the displacement Hessian of .
For the linear Fokker-Planck equation (12), which is the gradient flow of the relative entropy and has as a critical point as well, the displacement Hessian has been calculated in [7, Proposition 2]: it is the extension of the formal operator
(15) |
to . The corresponding spectrum of is well-known: it is purely discrete with eigenvalues that are precisely the positive integer multiples of . The corresponding eigenfunctions are Hermite polynomials. The derivation of might appear circular: first, one rewrites the linear Fokker-Planck equation (12) on the $L^2$-Wasserstein space, obtaining a highly non-linear gradient flow, and then calculates its linearization, which is, up to dualization, again the original equation (12). The key point is that the Wasserstein linearization is compatible with the relation (14), i.e., the linearization of the highly non-linear sixth order equation (1) is easily expressible in terms of , see Theorem 1.3 below. A similar idea has been used in [14], where the authors showed, by exploiting the relation (13) above, that the displacement Hessian for is formally given by , and discussed implications for the long-time asymptotics of the nonlinear fourth order DLSS equation (4).
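The statement about eigenvalues and Hermite eigenfunctions can be checked by hand in one space dimension. The sketch below assumes that the formal operator in (15) has the Ornstein-Uhlenbeck form $-\Delta + \lambda\, x \cdot \nabla$ with a parameter $\lambda > 0$; this form, the symbol $\varphi_k$, and the use of probabilists' Hermite polynomials $\mathrm{He}_k$ are assumptions, since the operator is not displayed above.

```latex
% Assumption: in d = 1 the formal operator reads L\varphi = -\varphi'' + \lambda x \varphi'.
% He_k is the k-th probabilists' Hermite polynomial, which satisfies
%   He_k''(y) - y He_k'(y) + k He_k(y) = 0.
\[
  \varphi_k(x) := \mathrm{He}_k\big(\sqrt{\lambda}\,x\big), \qquad y := \sqrt{\lambda}\,x,
\]
\begin{align*}
  -\varphi_k''(x) + \lambda x\, \varphi_k'(x)
    &= \lambda \big( -\mathrm{He}_k''(y) + y\, \mathrm{He}_k'(y) \big) \\
    &= \lambda k\, \mathrm{He}_k(y) = \lambda k\, \varphi_k(x).
\end{align*}
% Example with k = 2: He_2(y) = y^2 - 1, so \varphi_2(x) = \lambda x^2 - 1 and
%   -\varphi_2'' + \lambda x \varphi_2' = -2\lambda + 2\lambda^2 x^2 = 2\lambda\,\varphi_2 .
```

Under this assumption the eigenvalue of $\varphi_k$ is $\lambda k$, which matches the description of the spectrum as integer multiples of the parameter.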
Our second main result is:
Theorem 1.3.
Given a test function , let be the solution of the transport equation
Then:
Consequently, if possesses a displacement Hessian at , then it is an extension of the linear operator .
Theorem 1.3 could be proven by direct calculations, evaluating the second variational derivative of along the transport equation, and then integrating by parts until the desired form is attained. This would be tedious but in principle straight-forward once the desired terminal form is known. Here we present a more conceptual approach, based on the relation (14), which leads us to the form of the Hessian in the first place.
As said before, the implications of Theorem 1.3 on the long-time asymptotics of (1) are far from obvious. Naturally, the hope is that the dynamics of all sufficiently smooth solutions close to equilibrium is approximately the same as that of the linearized equation, and in particular, that the eigenvalues of determine the exponential decay of ’s “nonlinear modes” in the long-time limit. It is not hard to see that the formal differential operator possesses a unique self-adjoint closure in , and that the spectrum of the latter is purely discrete with eigenvalues for
The strong results from [13], where the fully nonlinear exponential stability of the Gaussian steady state for the DLSS equation (4) has been proven on grounds of (13), give rise to a conjecture, namely that the spectral gap in the linearization actually determines the global rate of convergence to equilibrium.
Conjecture 1.
A direct consequence of Conjecture 1 would be ’s asymptotic self-similarity in case . More precisely, in Section 4.6 we show that if Conjecture 1 were true, then any solution to (1) with approaches in with algebraic rate the self-similar solution
(17) |
at least if is already sufficiently close to self-similarity initially.
Outline
After explaining the origin and relevance of equation (1) in Section 2.1, we introduce commonly known background regarding the $L^2$-Wasserstein space and metric gradient flows in Section 2.2. Section 3 is then devoted to the existence proof, while Section 4 deals with formal results on the intermediate and long-time behaviour of the solutions obtained.
Notation
The Euclidean norm is denoted by , while is given by for and for . The domain of a functional defined on a set consists of all such that .
2. Derivation and Preliminaries
2.1. Derivation of the equation
We sketch the derivation of (1) from a quantum mechanical model. In [5], building on [4], the following non-linear and non-local quantum analogue of the classical Fokker-Planck equation (12) has been derived:
(18) |
Here is the macroscopic density of quantum particles whose dynamics aims at minimizing the ensemble’s relative von Neumann entropy . The precise definition of , sometimes referred to as ’s quantum logarithm, is intricate; for the sake of completeness, we mention one possible definition of as the uniquely determined potential such that
for all test functions . Here denotes the trace over , and is the exponential of a self-adjoint operator.
Under the specific hypotheses made in [5], the von Neumann entropy can be expressed in terms of the macroscopic density alone, and is given by
We remark that ’s variational derivative is , and thus equation (18) has the formal structure of a gradient flow in . In the semi-classical limit , the von Neumann entropy formally approaches the Boltzmann entropy , and the variational derivative formally approaches the pointwise logarithm . Consequently, (18) turns into the classical Fokker-Planck equation.
The existence analysis for the full equation (18) goes far beyond classical parabolic theory, and has been carried out just recently, and only in one space dimension [18]. Already the rigorous definition of as solution to an inverse problem has been challenging [16]. A way to approximate (18) by local equations is via the expansion of in powers of the small parameter . In [2, Appendix] the following asymptotic expansion up to has been computed (for ):
As mentioned above, a reduction of in (18) to the leading order term yields the linear Fokker-Planck (or rather: heat — since ) equation (12),
Replacing by leads to the Derrida-Lebowitz-Speer-Spohn (DLSS) equation
Finally, since is identical to the functional , the equation coincides with the sixth-order equation (1) under consideration here.
2.2. Background for Wasserstein gradient flows
In this section we briefly review some basics about the theory of optimal transport and $L^2$-Wasserstein gradient flows, but only as far as it is needed later. For a more profound introduction to these topics, we refer to the textbooks [1, 19, 22]. By we denote all probability measures with finite second moment,
We shall frequently identify absolutely continuous (with respect to the Lebesgue measure ) probability measures on with their densities . A sequence of probability measures converges narrowly to if
holds for all bounded, continuous functions . The -Wasserstein distance between two measures is defined via
(19) |
where denotes the set of all transport plans between and , that is all with respective marginals and . The Wasserstein distance metrizes narrow convergence on and is itself lower semi-continuous with respect to that convergence.
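For reference, the standard (Kantorovich) form of the distance reads as follows. This is a sketch in assumed notation ($\mu$, $\nu$ for the measures, $\Gamma(\mu,\nu)$ for the set of transport plans, $\delta_a$ for a Dirac mass); it is not a transcription of (19).

```latex
% Assumed notation: mu, nu are probability measures on R^d with finite second moment.
\[
  W_2(\mu,\nu)^2 = \inf_{\pi \in \Gamma(\mu,\nu)}
    \int_{\mathbb{R}^d \times \mathbb{R}^d} |x-y|^2 \,\mathrm{d}\pi(x,y).
\]
% Example: if mu = \delta_a is a Dirac mass, the only transport plan is the
% product measure \delta_a \otimes \nu, hence
\[
  W_2(\delta_a, \nu)^2 = \int_{\mathbb{R}^d} |y - a|^2 \,\mathrm{d}\nu(y),
  \qquad\text{in particular}\qquad
  W_2(\delta_a, \delta_b) = |a - b| .
\]
```

In particular, the second moment of $\nu$ is its squared distance to the Dirac mass at the origin.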
We are not going to define metric gradient flows on in general. Here we just need a particularly nice subclass.
Definition 2.1.
Let be a proper, lower semi-continuous functional. Further, let a semi-group of continuous maps on be given. We call the semi-group an -flow for if the following evolutionary variational inequality holds at any and with any :
(20) |
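One common way to write an evolutionary variational inequality of this type is sketched below. The symbols $\Psi$ (functional), $S_s$ (semi-group), $\kappa$ (convexity parameter) and the exact normalization are assumptions drawn from the general metric gradient flow literature, and may differ from the convention used in (20).

```latex
% Assumed form of the inequality; mu ranges over the space, nu over the domain of Psi.
\[
  \frac12 \frac{\mathrm{d}^+}{\mathrm{d}s} W_2\big(S_s \mu, \nu\big)^2
  + \frac{\kappa}{2}\, W_2\big(S_s \mu, \nu\big)^2
  \le \Psi(\nu) - \Psi(S_s \mu)
  \qquad \text{for all } s > 0 .
\]
```

Here $\frac{\mathrm{d}^+}{\mathrm{d}s}$ denotes the upper right derivative, so no differentiability of the squared distance along the flow is assumed.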
Example 1.
The following three examples of -flows will be important later on. They are all special cases of [1, Example 11.2.7].
(1) The linear heat flow, given as distributional solution to , is a -flow for the entropy .
(2) The linear mass transport, given as distributional solution to for a potential with globally bounded second derivatives, is an -flow for the potential energy , for every such that .
(3) The rescaling, given as distributional solution to , is an -flow for the potential energy .
Solutions to (1) will be constructed via discrete-in-time approximation by means of the minimizing movement scheme, i.e., a variational Euler method, see Section 3.2 below. Inductively, the approximation at time is obtained from , the one at , as minimizer in
(21) |
For passage to the continuous limit, a priori estimates independent of the time step are essential. The key is to give a rigorous meaning to a priori estimates related to dissipations of auxiliary functionals already on the time-discrete level. For this, we shall use the so-called flow interchange method [13, Theorem 3.2], where the minimizer is perturbed along the -flow of the auxiliary functional .
Lemma 2.2 (Flow Interchange).
Let be two proper, lower semicontinuous functionals with . Assume further that produces an -flow . Let be a global minimizer of the following Yosida-penalization of ,
(22) |
where is given. Then
(23) |
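The mechanism behind estimates of this type can be sketched in two formal steps. The notation below ($u^n$ for the minimizer, $u^{n-1}$ for the previous step, $\Phi$ for the minimized functional, $\Psi$ for the auxiliary functional with flow $S_s$ and parameter $\kappa$) and the normalization of the variational inequality are assumptions; the resulting inequality may be arranged differently from (23), and the evaluation at $s = 0$ is not justified here.

```latex
% Assumed: u^n minimizes u -> (1/(2 tau)) W_2(u, u^{n-1})^2 + Phi(u), and S_s satisfies
%   (1/2) d^+/ds W_2(S_s u, v)^2 + (kappa/2) W_2(S_s u, v)^2 <= Psi(v) - Psi(S_s u).
% Step 1: compare the minimizer u^n with its perturbation S_s u^n.
\[
  \frac{\Phi(u^n) - \Phi(S_s u^n)}{s}
  \le \frac{1}{2\tau}\,
  \frac{W_2(S_s u^n, u^{n-1})^2 - W_2(u^n, u^{n-1})^2}{s}.
\]
% Step 2: let s -> 0 and use the variational inequality with v = u^{n-1}.
\[
  \limsup_{s \downarrow 0} \frac{\Phi(u^n) - \Phi(S_s u^n)}{s}
  \le \frac{1}{\tau} \Big( \Psi(u^{n-1}) - \Psi(u^n)
      - \frac{\kappa}{2}\, W_2(u^n, u^{n-1})^2 \Big).
\]
```

The left-hand side of the second inequality is the dissipation of $\Phi$ along the auxiliary flow, and the right-hand side is a difference quotient of $\Psi$ across one time step, up to the $\kappa$-correction.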
Sketch of proof.
Remark 2.3.
The left-hand side in (23) is an approximation of ’s dissipation along ’s flow. At least on a formal level, one expects that it is also an approximation to ’s dissipation along ’s flow, i.e., a time-discrete analogue of . Indeed, in a corresponding smooth situation, with a map , that is a gradient flow with respect to two “time” parameters and ,
for smooth functions , we have
In the non-smooth situation at hand, the two dissipations might not be identical, but typically, one can be controlled in terms of the other.
3. Existence of Solutions
3.1. Properties of the energy functional
We begin by giving a proper definition of the energy functional.
Definition 3.1.
The energy functional is defined as follows: if is absolutely continuous with , then
(26) |
otherwise, .
Remark 3.2.
Several comments are in order.
- •
- •
- •
• The authors of [8] consider a slightly different version of : assuming satisfies the weaker condition , they set
(28) |
Since the zero set of is clearly -negligible, the integrand is -a.e. well-defined. For , one has ; this is a consequence of the fact that the derivatives of Sobolev functions are zero -a.e. on their level sets, i.e., the integrand in (26) vanishes a.e. on . The representation in (28) is the relevant one in the proof of lower semi-continuity, see Proposition 3.4 below. For our later needs, the representation (26) is better suited.
The energy defined above is part of a family of second-order functionals that have been studied in [8, Section 3] in connection with the DLSS equation (4). We recall and adapt two results here.
Proposition 3.3 (adapted from Corollary 3.2 in [8]).
There is a constant , only depending on the dimension , such that for all absolutely continuous :
(29) |
It is important to remark here that [8, Corollary 3.2] indeed applies to the case : the derivation of (29) only involves an integration by parts with the vector field , which is integrable on since and . We refer to Lemma C.1 and to the subsequent discussion in the Appendix.
Proposition 3.4 (adapted from Corollary 3.4 of [8]).
The functional is sequentially lower semi-continuous with respect to narrow convergence.
Proof.
In [8, Corollary 3.4], the lower semi-continuity of recalled in (28) above is shown. The only difference between and is that the former is defined by the integral value (possibly ) for all with , and the latter is unless .
The “stability” of ’s restricted domain follows directly from Proposition 3.3 above: if converges narrowly to and has , then is bounded in thanks to (29). Consequently, converges strongly to a limit in , which in turn implies that is absolutely continuous. And boundedness of in further implies that also . ∎
3.2. Minimizing movements
Let a time step size be fixed. For a given , define the Yosida-penalized energy functional by
Lemma 3.5.
For each , there exists a global minimizer of .
Proof.
This is a standard argument from the calculus of variations. Observe that the functional has the following properties:
• It is bounded from below (in fact: non-negative), and is not identically (since it has a finite value for any absolutely continuous with ). Hence, it has a finite infimum.
• It is coercive. Indeed, by non-negativity of and the properties of , one easily shows that with a that is expressible in terms of and . Hence, sublevels are tight and thus pre-compact with respect to narrow convergence.
• It is sequentially lower semi-continuous with respect to narrow convergence, see Proposition 3.4 above.
The existence of a minimizer now follows by standard arguments. ∎
As a consequence of Lemma 3.5, the minimizing movement scheme for is well-defined for every initial condition . That is, starting from , one can define a sequence inductively by choosing for as a minimizer of for For notational convenience, we also introduce the usual time-discrete “interpolation” by
for , and . The sequence and its interpolation satisfy a variety of energy estimates, that directly follow from the construction via minimization.
Lemma 3.6 (Basic discrete estimates).
For every , the energy is finite, and moreover,
(30) |
(31) |
And further, for all ,
(32) |
The derivation of these estimates is a standard procedure, see e.g. [1].
3.3. Discrete equation
In this section, we derive a time-discrete surrogate of (1) that is satisfied by the discrete approximation . Following the seminal idea from [9, 8], we produce a time-discrete, very weak formulation of (1) — eventually leading to (8) — by performing an inner variation of the minimizer of . That weak formulation is well-defined under the hypothesis that , which follows trivially by the construction.
More specifically, let a smooth and compactly supported function be given, and define the associated -gradient flow as solution to the ODE initial value problem
(33) |
For a given absolutely continuous measure , we consider the deformations . These satisfy the continuity equation along the vector field , i.e.,
(34) |
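The relation between the flow map, the push-forward and the continuity equation can be summarized as follows. This is a sketch under an assumed sign convention (transport along $+\nabla\eta$) and with assumed symbols $X_s$ for the flow map, $u$ for the initial density and $u_s$ for the deformed density; the convention in (33) and (34) may differ by a sign.

```latex
% Assumed sign convention: particles move along +\nabla\eta.
\[
  \partial_s X_s = \nabla \eta \circ X_s, \qquad X_0 = \mathrm{id},
  \qquad u_s := (X_s)_\# u .
\]
% Push-forward written for densities (change of variables formula):
\[
  u_s\big(X_s(x)\big)\, \det \mathrm{D}X_s(x) = u(x).
\]
% Continuity equation satisfied by s -> u_s in the distributional sense:
\[
  \partial_s u_s + \nabla \cdot \big( u_s \nabla \eta \big) = 0 .
\]
```

Since $\eta$ is smooth and compactly supported, $X_s$ is a diffeomorphism that equals the identity outside the support of $\eta$, so the determinant above is positive.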
In Lemma 3.7 below, the -derivative of at is given explicitly. In view of (34), we obtain formally — that is, in case that is positive and smooth — that
In other words, calculating the -derivative of at amounts to calculating a special form of the right-hand side in (1), “tested” against . This philosophy — which was the founding idea behind the flow interchange lemma, see (23) — is made rigorous in Lemma 3.8 further below.
Lemma 3.7 (First variation).
Proof.
First, let us assume that ; the minor modifications for are described at the end of the proof. Introduce as the density of , and accordingly and . For later reference, observe that
Next, define the volume distortion , which is a positive and smooth function, and observe that
By the change of variables formula, we have with ,
(36) |
We shall now express the spatial derivatives of and in terms of the respective derivatives of and . For the next calculations, which involve a repeated application of the chain rule, we use instead of gradients and Hessians the less intuitive but more consistent notations with total derivatives that produce row vectors.
Recall the effect of the push-forward on densities:
(37) |
Hence we have:
(38) |
For the first derivatives, we thus obtain
and for the second derivative of ,
The expression on the right hand side calls for some explanation: the part in the square brackets is a symmetric bilinear form; and when is applied to two vectors , then that bilinear form is evaluated on the vectors .
Since all the -dependence on the right-hand side is now in and , these expressions are obviously differentiable in , with derivatives
and
We are now in the position to calculate the -derivative of at by differentiating in (36) directly under the integral. Recall the abbreviation , which is a symmetric -matrix. For the derivative of the integrand we obtain:
where we have used that to cancel the two terms multiplying . Integration of this equality with respect to yields (35) with .
When , we calculate
Since by definition of the push-forward,
we directly obtain
which provides the additional term in (35). ∎
Lemma 3.8.
For any test function , there is a constant such that for each , the measures and satisfy the following time discrete version of (8):
(39) |
Proof.
Choose such that . According to Example 1, the solution to the continuity equation (34) follows an -flow for the potential energy emerging from . As a consequence of the flow interchange estimate (23),
Substitution of (35) for the -derivative of yields
Replacing by produces the same inequality (also with the same value of ) but with an overall minus on the left-hand side. Combination of these two estimates leads to (39). ∎
3.4. Additional a priori estimates
The next step is to derive a time-discrete version of the a priori estimate (10).
Proposition 3.9.
The sequence of time-discrete approximations constructed above satisfies at each
(40) |
where is a constant that is expressible in terms of the dimension alone.
The proof of Proposition 3.9 builds on the analogous result derived in [3] for a more hands-on time-discrete approximation of solutions to (1) with periodic boundary conditions. The formal calculations there are identical to the ones that lead to (40). The rigorous justification is more difficult: first because the time steps in [3] have a higher degree of spatial regularity than the Wasserstein approximants here; and second, because the periodic boundary conditions make any discussion of boundary terms related to integration by parts unnecessary.
We shall perform several approximations before we can apply the formal calculations from [3]. For notational convenience, we assume for the moment; the minor modifications for are summarized at the end of the proof of Proposition 3.9.
The first ingredient in our approximation procedure is the following regularization of the energy functional: for each , define
for with .
Lemma 3.10.
Assume that . Then also , and .
Proof.
By definition of , we have that and . According to the chain rule for concatenation of with the smooth functions and that are sublinear with globally bounded first and second derivatives, also and , and
In particular, and , and so . Further, the quotients and are bounded by one, and converge to one and to zero respectively, in measure -a.e. It follows by dominated convergence that
and therefore also . ∎
We are now going to study certain properties of along solutions of the heat flow. More specifically, let some probability density be given, and consider for each :
(41) |
Note that is the unique solution to the initial value problem
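In standard notation, the heat flow on the whole space is given by convolution with the Gaussian kernel. The sketch below uses assumed symbols ($G_s$ for the kernel, $U_s$ for the solution, $u$ for the initial density, $d$ for the dimension) and also records the form of Young's inequality that transfers integrability from $u$ to $U_s$.

```latex
% Assumed notation; the heat equation is normalized as \partial_s U = \Delta U.
\[
  G_s(x) = (4\pi s)^{-d/2} \exp\!\Big( -\frac{|x|^2}{4s} \Big), \qquad
  U_s = G_s * u, \qquad
  \partial_s U_s = \Delta U_s \ \ (s > 0), \qquad
  U_s \to u \ \text{as } s \downarrow 0 .
\]
% Young's inequality with \|G_s\|_{L^1} = 1:
\[
  \| G_s * f \|_{L^p} \le \| G_s \|_{L^1}\, \| f \|_{L^p} = \| f \|_{L^p},
  \qquad 1 \le p \le \infty .
\]
```

Because derivatives commute with convolution, the same bound applies to $\nabla U_s = G_s * \nabla u$ whenever $\nabla u$ is in the corresponding space.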
Our first observation is that has the expected limit for .
Lemma 3.11.
Assume that , define as in (41) above. Then .
Proof.
We present a proof that heavily uses the dimensionality restriction .
By definition of , we have that and . First, we show that
(42) |
For this, we use that since , one has , and so is globally bounded. Now, by the chain rule,
which shows and . Interpolation with yields (42).
By the representation (41) of as convolution with , and thanks to Young’s inequality, it follows from (42) that
(43) |
Recall that is smooth and positive. It follows by the chain rule that:
Combining (43) with the fact that, for any , the function is bounded by , and converges in measure to as , we conclude that
and so we have that
which is the claim. ∎
The next result is our core computation, namely of the derivative of at .
Lemma 3.12.
Given a probability density , define by (41). Then is differentiable at each , with
(44) |
Proof.
The heat kernel from (41) is smooth and positive at every and , and all spatial derivatives with arbitrary multi-index belong to any with . Consequently, the function is positive and smooth, each is a probability density, and, thanks to Young’s integral inequality, all spatial derivatives are in any .
Next, observe that is a smooth function as well, that satisfies
(45) |
in the classical sense. Moreover, despite the fact that itself is clearly not integrable on , its spatial derivatives belong to any ; the latter is most easily seen from the fact that each can be written as a linear combination of products of terms , with suitable multi-indices , where . This further means that also times any linear combination of products of spatial derivatives of are integrable. In particular, Gauss’ theorem as stated in Lemma C.1 in the Appendix is applicable to vector fields built from such functions.
We can now calculate the derivative of , which we write equivalently in the form
Thanks to the aforementioned smoothness of and the admissibility of integration by parts via Lemma C.1 from the Appendix, we obtain, recalling (45) and suppressing ’s sub-index from now on,
(46) |
From this point on, we proceed in analogy to the proof of [3, inequality (19)]. That means, we define (ad hoc) the smooth and integrable vector field
The proof of [3, inequality (19)] amounts to showing that
(47) |
holds pointwise, i.e., ’s domain of definition plays no role here. The key idea in proving (47) is that after division by , it becomes an inequality between polynomials in the first, second and third derivatives of , hence the proof of its validity is a (cumbersome) algebraic problem. For more details on the choice of and the general concepts behind the algebraic method for proving entropy dissipation inequalities, we refer the reader to [3, 10].
We are finally in the position to prove the main estimate (40).
Proof of Proposition 3.9.
This is another application of the flow interchange method, see Lemma 2.2. According to Example 1, the heat flow given by defines a -flow on for the (unperturbed) entropy . Therefore,
(48) |
Formally, the left-hand side above is minus the derivative of at , and the eventual goal is to control this in the spirit of (44), for . The main technical obstacle that prevents us from carrying out this differentiation and from concluding (40) directly is the possible irregularity of at . It is not even clear that as . Indeed, while lower semi-continuity is known from Proposition 3.4, upper semi-continuity might fail. The problem is that, due to the non-differentiability of at , one cannot conclude from that in . We tackle this problem by approximation of by , for which continuity at has been shown in Lemma 3.11.
It follows from Lemma 3.12 that is differentiable at every , with derivative given in (44). Thanks to Lemma 3.11, we can apply the fundamental theorem of calculus to obtain that
We pass to the limit , using Lemma 3.10 on the left-hand side and Fatou’s lemma together with the lower semi-continuity of norms on the right-hand side. With the latter we subsequently obtain for
3.5. A universal bound on the energy
In the spirit of [8, Theorem 1.4], we prove the following bound on the energy.
Proposition 3.13.
There is a constant , expressible in terms of the initial entropy value and the initial second moment , such that, for each ,
(49) |
This proposition is a consequence of the following a priori estimates.
Lemma 3.14.
There is a universal constant such that, for each ,
(50) | ||||
(51) |
Lemma 3.15.
There exists a constant such that for any ,
(52) |
And consequently, for each , there exists a independent of such that
(53) |
Proof of Proposition 3.13 from Lemmas 3.14 and 3.15.
Without loss of generality, it suffices to prove
(54) |
for all such that ; the estimate (49) for larger is then a trivial consequence of the monotonicity (30). Choosing accordingly, the monotonicity (30) of implies that
Substitute the elementary estimate
in the sum on the right-hand side and use (52) to obtain
Since we assume , the terms on the right-hand side are bounded thanks to (50) and (51). That is, , where depends only on and . Divide by and take the power to obtain (54) with . ∎
Proof of Lemma 3.14.
Summation of (40) from to yields
(55) |
We shall now derive a suitable lower bound on . More precisely, we derive an upper bound on ; notice that entropy and second moment are connected via
(56) |
which follows from a scaling argument — see e.g. [8, Section 2.2] for details. To estimate the second moment, we apply once again the flow interchange technique. The flow is given by exponential dilations, that is , or more explicitly in terms of the densities of :
Since, by the chain rule,
and since
it is easily seen that
Clearly, this scaling property carries over to the extension , and we obtain
Recall from Example 1 that is a -flow for the auxiliary functional . The flow interchange estimate (23) now yields
Neglecting the last term, we obtain from here the recursion formula
which clearly implies that
On the right-hand side, we estimate by means of (53) with ,
(57) |
By means of the inequality (56) between entropy and second moment, this provides an estimate from above on on the right-hand side of (55). Rearranging terms, one finally obtains (50), with . The estimate (51) is now easily obtained by estimating the first sum in (57) above with (50), and neglecting the second sum. ∎
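As a small numerical sanity check of the scaling behaviour exploited in this proof, the following sketch applies a dilation to a discrete probability measure and verifies how the second moment rescales. The map x -> exp(-s) x, the helper names `second_moment` and `dilate`, and the sample data are assumptions made for illustration only; the precise normalization of the dilation flow used in the text is not reproduced here.

```python
import math

def second_moment(points, weights):
    """Second moment sum_i w_i |x_i|^2 of a discrete probability measure."""
    return sum(w * sum(c * c for c in p) for p, w in zip(points, weights))

def dilate(points, s):
    """Push the atoms forward under the (assumed) dilation x -> exp(-s) * x."""
    factor = math.exp(-s)
    return [tuple(factor * c for c in p) for p in points]

# Hypothetical sample measure: three atoms in the plane, weights summing to 1.
points = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
weights = [0.5, 0.25, 0.25]

s = 0.3
m_before = second_moment(points, weights)
m_after = second_moment(dilate(points, s), weights)

# A dilation by exp(-s) rescales the second moment by exp(-2 s).
ratio = m_after / m_before
```

Under this assumed normalization, the second moment decays exponentially along the dilation, which is the kind of explicit behaviour that makes the flow interchange estimate computable.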
Proof of Lemma 3.15.
Corollary 3.16.
There is a constant , expressible in terms of the initial entropy and the initial second moment alone, such that, for each ,
(58) |
Proof.
This is a direct consequence of (50). ∎
3.6. Uniform almost continuity of the discrete trajectory
In this section we show that the piecewise constant interpolations are “uniformly almost continuous” as curves in . More specifically, we derive an approximate Hölder estimate with a -independent Hölder constant, see Proposition 3.20 below.
Lemma 3.17.
There is a constant , depending on just in terms of , such that for every :
(59) |
Proof.
The idea is to show that, for a constant expressible just in terms of ,
(60) |
It then follows thanks to non-negativity of that the minimizer of satisfies
which immediately implies (59). To prove (60), we show that, on the one hand,
(61) |
and that, on the other hand,
(62) |
for all , with a constant that again depends on only via . The choice then yields (60). For the proof of (61), we compare the Wasserstein distance with the transport cost generated by the plan that has Lebesgue density
It is easily seen that the two marginals are indeed and , respectively. According to (19), we have
The proof of (62) is a bit more elaborate. First, note that
and hence
Concerning the estimate of , observe that for a smooth and positive probability density ,
Now we plug in. By Jensen’s inequality,
It thus follows that
In an analogous manner,
implying that
Collecting these estimates, we finally obtain
From here, (62) follows immediately. ∎
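The argument for (61) rests on the general principle that the cost of any admissible transport plan bounds the squared Wasserstein distance from above. A minimal sketch of that principle for two discrete measures on the real line follows. The product plan, the function names and the sample atoms are illustrative assumptions and do not reproduce the specific plan used in the proof.

```python
def plan_cost(xs, ys, plan):
    """Quadratic transport cost sum_{i,j} plan[i][j] * (x_i - y_j)^2."""
    return sum(plan[i][j] * (xs[i] - ys[j]) ** 2
               for i in range(len(xs)) for j in range(len(ys)))

def marginals(plan):
    """Row and column sums of a plan, i.e. its two marginals."""
    rows = [sum(row) for row in plan]
    cols = [sum(col) for col in zip(*plan)]
    return rows, cols

def w2_squared_sorted(xs, ys):
    """Squared W2 distance between two uniform measures with equally many
    atoms on the real line, where the monotone (sorted) matching is optimal."""
    n = len(xs)
    return sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / n

# Hypothetical atoms of two uniform measures with three atoms each.
xs = [0.0, 1.0, 2.0]
ys = [0.5, 1.5, 4.0]
n = len(xs)

# The product plan has the correct marginals but is in general not optimal.
product_plan = [[1.0 / (n * n)] * n for _ in range(n)]
upper_bound = plan_cost(xs, ys, product_plan)
exact = w2_squared_sorted(xs, ys)
```

Any plan with the right marginals yields such an upper bound; the quality of the bound depends on how well the plan is adapted to the two measures.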
Lemma 3.18.
There is a constant such that, for all with ,
(63) |
Proof.
Let be the largest integer with . Using Lemma A.1 from the appendix and Proposition 3.13, we find that
Now we combine this with the triangle inequality for , the basic energy estimate (31), and Lemma A.2 from the appendix:
To conclude (63) from here, with , it suffices to recall that for arbitrary non-negative reals and . ∎
Lemma 3.19.
For all with ,
(64) |
Proof.
Recall that a modulus of continuity is a map with the property that for arbitrary .
Proposition 3.20.
Proof.
Without loss of generality, assume that .
Given , let and be the smallest integers with and , respectively. By definition of , we have that
Further note that . Let be the smallest integer with . If either or , then (65) follows directly from (63) or (64), respectively, using that
If instead , then we estimate further:
and apply (63) and (64), respectively, to the sum on the right-hand side, using that, trivially,
Finally, if , i.e., , then we estimate
apply the reasoning above to the first distance, and Lemma 3.17 to the second, using . ∎
3.7. Passage to the continuous equation
The collected estimates will be enough to pass to the limit in the time-discrete evolution equation (39). We choose for any an interpolated time discrete solution starting from , with respective Lebesgue densities .
Lemma 3.21.
There is a sequence and a Hölder continuous limit curve with and , such that
-
i)
narrowly at each ,
-
ii)
strongly in ,
-
iii)
strongly in ,
-
iv)
weakly in .
Proof.
In the following, let some time horizon be fixed. Thanks to the moment estimate (51) and the -uniform Hölder regularity (65), the curves satisfy the hypotheses of the generalized Arzelà-Ascoli theorem [1, Proposition 3.3.1], see Lemma D.1.
Hence, for a suitable vanishing sequence , the converge — narrowly at each — to a limit curve . And that limit inherits the Hölder continuity (65), i.e.,
(66) |
By the lower semi-continuity of , see Proposition 3.4, and the universal bound (49), one has
at each . Hence the limit measures are absolutely continuous, , with . Additionally, in view of the estimate (29), we have that
(67) |
and consequently,
(68) |
at each . Indeed, by Rellich’s theorem, a subsubsequence of an arbitrary subsequence of converges strongly to some limit in ; but then the implied pointwise a.e. convergence leads to . By independence of the limit from the chosen subsequence, we conclude convergence of the entire sequence. Next, since (67) provides a uniform bound on in , the dominated convergence theorem applies and yields
We combine this convergence with the uniform bound on in from (58) to obtain claim ii) above via interpolation. With ii) and the uniform bound on in from (58) at hand, we verify claim iii) via Theorem B.1 from the Appendix. The last claim iv) is another direct consequence of the bound (58) on . That follows from the arguments given in Lemma 3.11.
∎
Proof.
Let be a test function in time and space. Fix some ; without loss of generality, we assume that is so small that for all and . For each , use as test function in (39), then sum over all ; this is actually a finite sum since is compactly supported. With the help of the triangle inequality,
The right-hand side converges to zero for thanks to the estimate (31). This implies, after rewriting everything in terms of the interpolated functions,
where is chosen large enough so that , and we have introduced
Notice that
It is now easily checked that the convergences stated in Lemma 3.21 above are sufficient to pass to the respective limits inside the integrals, that is
This is equivalent to the weak formulation (8). ∎
This finishes the proof of Theorem 1.1.
4. Long Time Behaviour
4.1. An illustration by ODEs
We illustrate the role played by (14) by an analogous situation for smooth gradient flows on . We are given a family of strictly convex functions and fix a parameter . From here, we define derived functions by
(69) |
Thanks to strict convexity of , there exists precisely one minimum point of , and this is by construction also the unique minimum point of and of . We wish to study the linearized dynamics of the gradient flow near the stationary point . Since
and since , it follows that the linearization of ’s gradient vector field near is given by
Assuming that the eigenvalues of are known, the eigenvalues of are known as well: these are precisely for . In particular, the smallest of these is a lower bound on the exponential rate of convergence to in the linearized dynamics.
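The mechanism behind this observation can be checked numerically in a simplified setting. The sketch below does not use the definition (69); purely as a stand-in, it takes the derived function f = |grad h|^2 / 2 for a hypothetical strictly convex h on the plane with minimum at the origin. Since grad h vanishes there, the Hessian of f at the origin equals the square of the Hessian of h, so its eigenvalues are the squares of those of h. All names and the choice of h are assumptions.

```python
def grad_h(x):
    """Gradient of the sample function h(x, y) = x^2 + x*y + y^2 + (x^4 + y^4)/4."""
    return [2 * x[0] + x[1] + x[0] ** 3, x[0] + 2 * x[1] + x[1] ** 3]

def f(x):
    """Stand-in derived function f = |grad h|^2 / 2 (an assumption, not (69))."""
    g = grad_h(x)
    return 0.5 * (g[0] ** 2 + g[1] ** 2)

def hessian_fd(func, x, step=1e-3):
    """Hessian of func at x by central finite differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                y = list(x)
                y[i] += si * step
                y[j] += sj * step
                return func(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * step * step)
    return H

# Hessian of h at its minimum point, the origin; its eigenvalues are 1 and 3.
A = [[2.0, 1.0], [1.0, 2.0]]
A_squared = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]

# Numerically computed Hessian of the stand-in f at the origin.
H_f = hessian_fd(f, [0.0, 0.0])
```

In this example the eigenvalues 1 and 3 of the Hessian of h turn into the eigenvalues 1 and 9 of the Hessian of f, so the spectrum of the derived function is obtained from that of h without any further diagonalization.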
4.2. Derivation of the relation (14)
We shall now derive the relation (14), which is the basis for all further analysis below, and which plays the same role for (1) as (69) has played in the analysis of the toy problem above.
Lemma 4.1.
Let be a solution to the linear Fokker-Planck equation (12). Then, at each ,
(70) | ||||
(71) |
Proof.
Thanks to the regularizing properties of the linear Fokker-Planck equation, is an everywhere positive -function on , where it satisfies (12) in the classical sense, or equivalently
(72) |
Moreover, is a probability density at any , and decays sufficiently rapidly for to justify the integration by parts below.
Lemma 4.2.
Proof.
As in the previous proof, we rely on the regularity of the Fokker-Planck flow in the calculations below. Still, to enhance readability, we shall use simply instead of below.
To establish the connection of the right hand side in (70) to , we substitute
(76) |
and then integrate by parts in the term with linear dependence on :
Since is a probability density, the constant above amounts to . To conclude (74) from here, it remains to observe that
Next we need to show that the right-hand sides in (71) and (75), respectively, are the same. Substitution of (76) into (71) yields, after an integration by parts,
Now we integrate by parts to remove the divergence in both integrals:
Finally, observe that
This yields (75). ∎
4.3. The displacement Hessian
The geometric idea behind the linearization by means of the displacement Hessian is the representation of the dynamics on the space of probability measures in Lagrangian coordinates. For the moment, let us consider a general -Wasserstein gradient flow, written in the form of a nonlinear transport equation,
(78) |
By a Lagrangian representation of a solution with respect to some reference measure , we mean a time-dependent diffeomorphism satisfying
(79) |
Note that there is a freedom of gauge here: (79) determines only up to a concatenation from the right with any -dependent map that leaves invariant.
Since satisfies the transport equation (78), it is easily deduced that “follows the vector field ”. I.e., it satisfies the Lagrangian equation
(80) |
Here refers to the aforementioned freedom of gauge for : the left and right sides in (80) may differ by a vector field that is divergence-free with respect to , i.e., .
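The freedom of gauge can be made concrete in a discrete toy setting, sketched below under simplifying assumptions: the reference measure is uniform on four atoms, the measure-preserving map is a cyclic permutation of the atoms, and the transport map is an arbitrary function rather than a diffeomorphism. The names `T`, `sigma` and `pushforward` are hypothetical.

```python
from collections import Counter

# Uniform reference measure on four atoms (a discrete stand-in).
atoms = [0, 1, 2, 3]

def T(x):
    """A sample transport map."""
    return x * x + 1

def sigma(x):
    """Cyclic shift of the atoms; it leaves the uniform measure invariant."""
    return (x + 1) % 4

def T_sigma(x):
    """Concatenation of T with sigma from the right."""
    return T(sigma(x))

def pushforward(transport, support):
    """Image of the uniform measure on `support`, as a multiset of atoms."""
    return Counter(transport(x) for x in support)

# Both maps represent the same image measure, although they differ as maps.
same_measure = pushforward(T, atoms) == pushforward(T_sigma, atoms)
same_map = all(T(x) == T_sigma(x) for x in atoms)
```

Here `same_measure` holds while `same_map` does not, which is the discrete analogue of the statement that the representation determines the map only up to such a concatenation.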
Now assume that is a stationary solution of (78); then is a stationary solution of (80). The Wasserstein-linearization of (78) around is an appropriate linearization of (80) around id. Taking into account that , one obtains for any smooth, compactly supported vector field :
Consequently, the linearized Lagrangian dynamics is given by
(81) |
For the definition of the displacement Hessian and the Wasserstein linearization, one chooses a particular gauge in (80), and consequently also in (81), to remove the ambiguity. Thanks to the Brenier theorem from optimal transportation, one may assume with some time-dependent potential . Inserting this into (81) yields
(82) |
The operator on the right-hand side acting on is the negative of the displacement Hessian of . More rigorously, one defines:
Definition 4.3.
Assume that is a global minimizer of , and assume further that there exists a densely defined self-adjoint linear operator on such that, for any test function ,
where is the solution to the transport equation . Then is called displacement Hessian of at , and is denoted by .
4.4. Calculation of the displacement Hessian
We shall now calculate the displacement Hessian for the three functionals from (2). Note that they all have as global minimizer.
Proposition 4.4.
Define the linear operator on by
Then we have for all :
(83) |
Proof.
Let some be fixed. For the curve , we choose the solution of the transport problem
(84) |
Note that the transport vector field is independent of time and smooth with compact support, hence the is an everywhere positive smooth function on .
For the relative entropy, recalling the representation (73), and that for ,
Substituting (84) and integrating by parts, we obtain
This gives the first identity in (83).
For the perturbed Fisher information, we start from the representation
that follows from (70) and (74), and obtain
Hence, by (84) and two consecutive integrations by parts,
which confirms the second identity in (83).
Finally, for the computation of the Hessian of , we make use of (71) and (75) and so obtain
A quick inspection of the last two lines above reveals that there is exactly one term that does not vanish automatically because of , namely the one where both logarithmic terms inside the last integral get differentiated. In combination with the calculation for above, we conclude that
(85) |
We consider the last integral, substitute (84), and repeatedly integrate by parts:
Substituting this in (85) yields the final identity in (83). ∎
4.5. Special solutions and their linearization
To illustrate the applicability of the linearization of (1), we completely characterize the dynamics of (1) and its Wasserstein linearization
(86) |
in the invariant finite dimensional submanifold defined by affine deformations of Gaussians. More specifically, we consider the — non-linear and linearized — dynamics induced on the set of positive definite matrices and vectors by means of
(87) |
We begin with the linearized dynamics. Since we have
(88) |
we need to study solutions to (86) of quadratic type,
It is obvious that these form an invariant subspace under (86). For this ansatz, we obtain
and so it follows from (86) that
(89) |
Note that the equation for can be neglected, since the value of is irrelevant in .
Now for the full nonlinear dynamics. From the ansatz (87), we obtain that
It follows that, on the one hand,
(90) |
And on the other hand, using that , and also that , we obtain
and thus further
(91) |
By equating the right-hand sides of (90) and (91), we see that (1) is satisfied if and only if
Note that the last of these three differential equations is a trivial consequence of the first. Note further that the equations for and are easy to solve:
(92) |
the third root is well-defined since the expression in the brackets is a positive definite matrix at each .
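For readers who wish to evaluate such a third root numerically, here is a minimal sketch for symmetric positive definite 2x2 matrices, based on the spectral decomposition. The function names and the sample matrix are assumptions; the bracketed expression from (92) is not reproduced, and the restriction to two dimensions only serves to keep the example free of external libraries.

```python
import math

def cube_root_spd_2x2(a, b, c):
    """Cube root of the symmetric positive definite matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        raise ValueError("matrix is not positive definite")
    # Angle of the unit eigenvector belonging to lam_max.
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    co, si = math.cos(theta), math.sin(theta)
    # Take real cube roots of the (positive) eigenvalues and transform back.
    r_max, r_min = lam_max ** (1.0 / 3.0), lam_min ** (1.0 / 3.0)
    off = (r_max - r_min) * co * si
    return [[r_max * co * co + r_min * si * si, off],
            [off, r_max * si * si + r_min * co * co]]

def matmul(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Hypothetical positive definite sample matrix.
A = [[2.0, 1.0], [1.0, 3.0]]
R = cube_root_spd_2x2(2.0, 1.0, 3.0)

# Cubing the root should recover A up to rounding errors.
R_cubed = matmul(matmul(R, R), R)
```

Positive definiteness enters exactly where the eigenvalues are raised to the power 1/3: it guarantees a unique symmetric positive definite root.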
In view of (88), the quantities and are related by
It is thus obvious that the linear ODEs (89) capture the correct asymptotic behaviour of and as . It should be noted that the -fold eigenvalues and constitute the lowest eigenvalues of ’s spectrum; the next eigenvalue is . The interpretation in terms of higher-order asymptotics would be this: an arbitrary initial datum that is sufficiently close to in an appropriate sense can be modified by a suitable transformation of the form
such that the corresponding solution converges to at an exponential rate of . The rigorous proof of such a result is currently out of reach. For a related result on the porous medium equation, we refer to [6].
4.6. Asymptotic self-similarity
We discuss the following consequence of Conjecture 1.
Corollary 4.5.
Proof.
By the usual rescaling for homogeneous parabolic equations, we relate the solution for to a solution for . Specifically, we set
and then introduce implicitly via
(94) |
Note that . Now let be the solution to (1) with according to Theorem 1.1 for the initial condition . Performing a change of variables under the integral in (9) it is easily seen that satisfies the weak formulation (8) with .
Appendix A Two elementary inequalities for sums of powers
Lemma A.1.
For each ,
(95) |
Proof.
Lemma A.2.
For integers ,
(96) |
Proof.
Inequality (96) clearly follows from
for all . This is a consequence of the “below tangent formula” for the concave function ,
in combination with the observation that
∎
Appendix B A convergence theorem for powers
Theorem B.1.
For with and a sequence of nonnegative functions on such that
-
(i)
strongly in
-
(ii)
is bounded in
we have (up to subsequences) strongly in .
We rephrase a slight variant, respectively extension, of the proof of [11, Proposition 6.1].
Proof.
First assume that the sequence is uniformly bounded away from 0, i.e. for some , and that all have support on a ball with radius , . By the “missing term in Fatou’s Lemma” it is enough to show pointwise convergence and convergence of the norms for the strong convergence of in .
For this, define to be the measures with densities (with respect to the Lebesgue measure). As converges in , which of course implies weak convergence in , converges narrowly to on (since ). Moreover for we have
(97) |
and hence .
Further, (97) and assumption imply
hence converges in the sense of [1, Definition 5.4.3.] strongly in , which in addition to the narrow convergence of , implies the narrow convergence of the transport plans to in , as stated in [1, Theorem 5.4.4.]. Since the th and th moment of are uniformly bounded, by applying [1, Theorem 5.1.7.] to we find
for functions with at most -growth. Hence choosing yields the norm convergence:
By assumption we can extract pointwise convergent subsequences (not relabelled)
which, taking the strict positivity of the functions into account, yields
and hence finally the strong convergence of the subsequence.
For a nonnegative sequence we apply the above result to the truncated functions
for a sequence . Using the norm convergence of and the truncation property of , we have
(98) |
for and thus the result also holds for nonnegative functions.
To extend the argument to the whole , one argues by approximating by the sequence for where is a cut-off function with , where for and for . Passing to subsequences and using the former calculations, by diagonalization one finds a sequence converging on every ball for . With the norm boundedness of in obtained by using the representation given in (97) and interpolating between the uniformly bounded and one eventually concludes the argument for functions on the whole . ∎
Appendix C Integration by parts
For convenience of the reader, we recall the basic rule for integration by parts.
Lemma C.1.
Let be given, and assume that there exists a vector field such that in the sense of distributions. Then
(99) |
Proof.
Let be a cut-off function with , with for , and with for . Defining as usual for , we obtain
On the one hand, since , we have that in as by dominated convergence. On the other hand, since and , we have that in as , again by dominated convergence. This shows (99). ∎
An application of this variant of integration by parts is the following identity, taken from [8, Theorem 3.1]:
for all positive with . The respective vector field is given by , which is since and . The relation is easily seen for positive and smooth functions , and carries over to the aforementioned more general via local approximation.
Appendix D Arzelà-Ascoli theorem
To keep the presentation self-contained, we restate the generalized Arzelà-Ascoli theorem from [1, Proposition 3.3.1], which has been essential in the proof of Lemma 3.21.
Lemma D.1.
Let be a complete metric space and a Hausdorff topology on compatible with in the sense that for sequences
Further, let be a sequentially compact set w.r.t. and let be curves such that
for a (symmetric) function , such that
where is an (at most) countable subset of . Then there exists an increasing subsequence and a limit curve such that
In the proof of Lemma 3.21, this result is applied as follows. The complete metric space is , the auxiliary topology is the one induced by narrow convergence; we have recalled the compatibility conditions in Section 2.2. The compact subset is formed by the probability measures satisfying the uniform bound on the second moment from (51); sequential compactness of is a consequence of Prokhorov’s theorem. The modulus of continuity is given by the right-hand side in the Hölder estimate (65) at . Notice that the conclusion above guarantees convergence in , i.e., narrowly, not necessarily in .
References
- [1] Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser Verlag, Basel, second edition, 2008.
- [2] Mario Bukal, Ansgar Jüngel, and Daniel Matthes. Entropies for radially symmetric higher-order nonlinear diffusion equations. Commun. Math. Sci., 9(2):353–382, 2011.
- [3] Mario Bukal, Ansgar Jüngel, and Daniel Matthes. A multidimensional nonlinear sixth-order quantum diffusion equation. Ann. Inst. H. Poincaré Anal. Non Linéaire, 30(2):337–365, 2013.
- [4] P. Degond and C. Ringhofer. Quantum moment hydrodynamics and the entropy principle. J. Statist. Phys., 112(3-4):587–628, 2003.
- [5] Pierre Degond, Florian Méhats, and Christian Ringhofer. Quantum energy-transport and drift-diffusion models. J. Stat. Phys., 118(3-4):625–667, 2005.
- [6] Jochen Denzler, Herbert Koch, and Robert J. McCann. Higher-order time asymptotics of fast diffusion in Euclidean space: a dynamical systems approach. Mem. Amer. Math. Soc., 234(1101):vi+81, 2015.
- [7] Jochen Denzler and Robert J. McCann. Fast diffusion to self-similarity: complete spectrum, long-time asymptotics, and numerology. Arch. Ration. Mech. Anal., 175(3):301–342, 2005.
- [8] Ugo Gianazza, Giuseppe Savaré, and Giuseppe Toscani. The Wasserstein gradient flow of the Fisher information and the quantum drift-diffusion equation. Arch. Ration. Mech. Anal., 194(1):133–220, 2009.
- [9] Richard Jordan, David Kinderlehrer, and Felix Otto. The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal., 29(1):1–17, 1998.
- [10] Ansgar Jüngel and Daniel Matthes. An algorithmic construction of entropies in higher-order nonlinear PDEs. Nonlinearity, 19(3):633–659, 2006.
- [11] Ansgar Jüngel and Josipa-Pina Milišić. A sixth-order nonlinear parabolic equation for quantum systems. SIAM J. Math. Anal., 41(4):1472–1490, 2009.
- [12] Pierre-Louis Lions and Cédric Villani. Régularité optimale de racines carrées. C. R. Acad. Sci. Paris Sér. I Math., 321(12):1537–1541, 1995.
- [13] Daniel Matthes, Robert J. McCann, and Giuseppe Savaré. A family of nonlinear fourth order equations of gradient flow type. Comm. Partial Differential Equations, 34(10-12):1352–1397, 2009.
- [14] Robert J. McCann and Christian Seis. The spectrum of a family of fourth-order nonlinear diffusions near the global attractor. Comm. Partial Differential Equations, 40(2):191–218, 2015.
- [15] Robert J. McCann and Dejan Slepčev. Second-order asymptotics for the fast-diffusion equation. Int. Math. Res. Not., pages Art. ID 24947, 22, 2006.
- [16] Florian Méhats and Olivier Pinaud. An inverse problem in quantum statistical physics. J. Stat. Phys., 140(3):565–602, 2010.
- [17] Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm. Partial Differential Equations, 26(1-2):101–174, 2001.
- [18] Olivier Pinaud. The quantum drift-diffusion model: existence and exponential convergence to the equilibrium. Ann. Inst. H. Poincaré Anal. Non Linéaire, 36(3):811–836, 2019.
- [19] Filippo Santambrogio. Optimal transport for applied mathematicians, volume 87 of Progress in Nonlinear Differential Equations and their Applications. Birkhäuser/Springer, Cham, 2015. Calculus of variations, PDEs, and modeling.
- [20] Christian Seis. Long-time asymptotics for the porous medium equation: the spectrum of the linearized operator. J. Differential Equations, 256(3):1191–1223, 2014.
- [21] Christian Seis. The thin-film equation close to self-similarity. Anal. PDE, 11(5):1303–1342, 2018.
- [22] Cédric Villani. Topics in optimal transportation, volume 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2003.