
The Halász–Székely Barycenter

Jairo Bochi Facultad de Matemáticas, Pontificia Universidad Católica de Chile [email protected] Godofredo Iommi Facultad de Matemáticas, Pontificia Universidad Católica de Chile [email protected]  and  Mario Ponce Facultad de Matemáticas, Pontificia Universidad Católica de Chile [email protected]
Abstract.

We introduce a notion of barycenter of a probability measure related to the symmetric mean of a collection of nonnegative real numbers. Our definition is inspired by the work of Halász and Székely, who in 1976 proved a law of large numbers for symmetric means. We study analytic properties of this Halász–Székely barycenter. We establish fundamental inequalities that relate the symmetric mean of a list of nonnegative real numbers with the barycenter of the measure uniformly supported on these points. As a consequence, we establish an ergodic theorem stating that the symmetric means of a sequence of dynamical observations converge to the Halász–Székely barycenter of the corresponding distribution.

2020 Mathematics Subject Classification:
26E60; 26D15, 15A15, 37A30, 60F15
The authors were partially supported by CONICYT PIA ACT172001. J.B. was partially supported by Proyecto Fondecyt 1180371. G.I. was partially supported by Proyecto Fondecyt 1190194. M.P.  was partially supported by Proyecto Fondecyt 1180922.

1. Introduction

Means have fascinated man for a long time. Ancient Greeks knew the arithmetic, geometric, and harmonic means of two positive numbers (which they may have learned from the Babylonians); they also studied other types of means that can be defined using proportions: see [He, pp. 85–89]. Newton and Maclaurin encountered the symmetric means (more about them later). Huygens introduced the notion of expected value and Jacob Bernoulli proved the first rigorous version of the law of large numbers: see [Mai, pp. 51, 73]. Gauss and Lagrange exploited the connection between the arithmetico-geometric mean and elliptic functions: see [BB]. Kolmogorov and other authors considered means from an axiomatic point of view and determined when a mean is arithmetic under a change of coordinates (i.e. quasiarithmetic): see [HLP, p. 157–163], [AD, Chapter 17]. Means and inequalities between them are the main theme of the classical book [HLP] by Hardy, Littlewood, and Pólya, and the book [Bu] by Bullen is a comprehensive account of the subject. Going beyond the real line, there are notions of averaging that relate to the geometric structure of the ambient space: see e.g. [St, EM, Na, KLL].

In this paper, we are interested in one of the most classical types of means: the elementary symmetric polynomial means, or symmetric means for short. Let us recall their definition. Given integers $n\geq k\geq 1$, the $k$-th symmetric mean of a list of nonnegative numbers $x_{1},\dots,x_{n}$ is:

(1.1) \mathsf{sym}_{k}(x_{1},\dots,x_{n})\coloneqq\left(\frac{E^{(n)}_{k}(x_{1},\dots,x_{n})}{\binom{n}{k}}\right)^{\frac{1}{k}}\,,

where $E^{(n)}_{k}(x_{1},\dots,x_{n})\coloneqq\sum_{i_{1}<\cdots<i_{k}}x_{i_{1}}\cdots x_{i_{k}}$ is the elementary symmetric polynomial of degree $k$ in $n$ variables. Note that the extremal cases $k=1$ and $k=n$ correspond to the arithmetic and geometric means, respectively. The symmetric means are non-increasing as functions of $k$: this is Maclaurin’s inequality: see [HLP, p. 52] or [Bu, p. 327]. For much more information on symmetric means and their relatives, see [Bu, Chapter V].
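To make the definition concrete, here is a small numerical sketch (Python; the function name sym_mean is ours, and the $\binom{n}{k}$ products are enumerated by brute force, which is only feasible for small $n$):

```python
from itertools import combinations
from math import comb, prod

def sym_mean(xs, k):
    """k-th symmetric mean (1.1): the elementary symmetric polynomial E_k,
    normalized by binom(n, k) and raised to the power 1/k."""
    e_k = sum(prod(c) for c in combinations(xs, k))
    return (e_k / comb(len(xs), k)) ** (1.0 / k)

xs = [1.0, 4.0, 9.0]
print(sym_mean(xs, 1))  # arithmetic mean: 14/3
print(sym_mean(xs, 3))  # geometric mean: 36**(1/3)
```

For $k=1$ this is the arithmetic mean and for $k=n$ the geometric mean; by Maclaurin’s inequality the values are non-increasing in $k$.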

Let us now turn to Probability Theory. A law of large numbers in terms of symmetric means was obtained by Halász and Székely [HS], confirming a conjecture of Székely [Sz1]. Let $X_{1}$, $X_{2}$, … be a sequence of nonnegative independent identically distributed random variables, and from them we form another sequence of random variables:

(1.2) S_{n}\coloneqq\mathsf{sym}_{k}(X_{1},\dots,X_{n})\,.

The case $k=1$ corresponds to the setting of the usual law of large numbers. The case of constant $k>1$ is not significantly different from the classical setting. Things become more interesting if $k$ is allowed to depend on $n$, and it turns out to be advantageous to assume that $k/n$ converges to some number $c\in[0,1]$. In this case, Halász and Székely [HS] proved that if $X=X_{1}$ is strictly positive and satisfies some integrability conditions, then $S_{n}$ converges almost surely to a non-random constant. Furthermore, they gave a formula for this limit, which we call the Halász–Székely mean with parameter $c$ of the random variable $X$. The theorem of Halász and Székely was extended to the nonnegative situation by van Es [vE] (with appropriate extra hypotheses). The simplest example consists of a random variable $X$ that takes two nonnegative values $x$ and $y$, each with probability $1/2$, and $c=1/2$; in this case the Halász–Székely mean is $\left(\frac{\sqrt{x}+\sqrt{y}}{2}\right)^{2}$. But this example is misleadingly simple, and Halász–Székely means are in general unrelated to power means.

Once the parameter $c$ is fixed, the Halász–Székely mean of a nonnegative random variable $X$ depends only on its distribution, which we regard as a probability measure $\mu$ on the half-line $[0,+\infty)$. Now we shift our point of view and consider probability measures as the fundamental objects. Instead of speaking of the mean of a probability measure, we prefer the word barycenter, reserving the word mean for lists of numbers (with or without weights), functions, and random variables. This is more than a lexical change. The space of probability measures has a great deal of structure: it is a convex space and it can be endowed with several topologies. So we arrive at the notion of Halász–Székely barycenter (or HS barycenter) of a probability measure $\mu$ with parameter $c$, which we denote $[\mu]_{c}$. This is the subject of this paper. It turns out that HS barycenters can be defined directly, without resort to symmetric means or laws of large numbers (see Definition 2.3).

Symmetric means are intrinsically discrete objects and do not make sense as barycenters. In [Bu, Remark, p. 323], Bullen briefly proposes a definition of a weighted symmetric mean, only to conclude that “the properties of this weighted mean are not satisfactory” and therefore not worthy of further consideration. On the other hand, given a finite list $\underline{x}=(x_{1},\dots,x_{n})$ of nonnegative numbers, we can compare the symmetric means of $\underline{x}$ with the HS barycenter of the associated probability measure $\mu\coloneqq(\delta_{x_{1}}+\dots+\delta_{x_{n}})/n$. It turns out that these quantities obey certain precise inequalities (see Theorem 3.4). In particular, we have:

(1.3) \mathsf{sym}_{k}(\underline{x})\geq[\mu]_{k/n}\,.

Furthermore, if $\underline{x}^{(m)}$ denotes the $nm$-tuple obtained by concatenation of $m$ copies of $\underline{x}$, then

(1.4) [\mu]_{k/n}=\lim_{m\to\infty}\mathsf{sym}_{km}\big(\underline{x}^{(m)}\big)\,,

and we have precise bounds for the relative error of this approximation, depending only on the parameters and not on the numbers $x_{i}$ themselves.

Being natural limits of symmetric means, the HS barycenters deserve to be studied in their own right. One can even argue that they give the “right” notion of weighted symmetric means that Bullen was looking for. HS barycenters have rich theoretical properties. They are also cheap to compute, while computing symmetric means involves summing exponentially many terms.

Using our general inequalities and certain continuity properties of the HS barycenters, we are able to obtain in a straightforward manner an ergodic theorem that extends the laws of large numbers of Halász–Székely [HS] and van Es [vE].

A prominent feature of the symmetric mean (1.1) is that it vanishes whenever more than $n-k$ of the numbers $x_{i}$ vanish. Consequently, the HS barycenter $[\mu]_{c}$ of a probability measure $\mu$ on $[0,+\infty)$ vanishes when $\mu(\{0\})>1-c$. In other words, once the mass of the leftmost point $0$ exceeds the critical value $1-c$, it imposes itself on the whole distribution and suddenly forces the mean to agree with it. Fortunately, in the subcritical regime $\mu(\{0\})<1-c$, the HS barycenter turns out to be much better behaved. As will be seen in Section 2, in the critical case $\mu(\{0\})=1-c$ the HS barycenter can be either positive or zero, so the HS barycenter can actually vary discontinuously. Therefore our regularity results and the ergodic theorem must take this critical phenomenon into account.

This article is organized as follows. In Section 2, we define formally the HS barycenters and prove some of their basic properties. In Section 3, we state and prove the fundamental inequalities relating HS barycenters to symmetric means. In Section 4, we study the problem of continuity of the HS barycenters with respect to appropriate topologies on spaces of probability measures. In Section 5, we apply the results of the previous sections and derive a general ergodic theorem (law of large numbers) for symmetric and HS means. In Section 6, we turn back to fundamentals and discuss concavity properties of the HS barycenters and means. Finally, in Section 7 we introduce a different kind of barycenter which is a natural approximation of the HS barycenter, but has in a sense simpler theoretical properties.

2. Presenting the HS barycenter

Hardy, Littlewood, and Pólya’s axiomatization of (quasiarithmetic) means [HLP, § 6.19] is formulated in terms of distribution functions, using Stieltjes integrals. Since the first publication of their book in 1934, measures have become established as fundamental objects in mathematical analysis, probability theory, dynamical systems, etc. Spaces of measures have been investigated in depth (see e.g. the influential books [Pa, Vi]). The measure-theoretic point of view provides the convenient structure for the analytic study of means or, as we prefer to call them in this case, barycenters. The simplest example of a barycenter is of course the “arithmetic barycenter” of a probability measure $\mu$ on Euclidean space $\mathbb{R}^{d}$, defined (under the appropriate integrability condition) as $\int x\,d\mu(x)$. Another example is the “geometric barycenter” of a probability measure $\mu$ on the half-line $(0,+\infty)$, defined as $\exp\left(\int\log x\,d\mu(x)\right)$. In this section, we introduce the Halász–Székely barycenters and study some of their basic properties.

2.1. Definitions and basic properties

Throughout this paper we use the following notations:

(2.1) \mathbb{R}_{+}\coloneqq[0,+\infty)\,,\quad\mathbb{R}_{+}^{*}\coloneqq(0,+\infty)\,.

We routinely work with the extended line $[-\infty,+\infty]$, endowed with the order topology.

Definition 2.1.

The Halász–Székely kernel (or HS kernel) is the following function of three variables $x\in\mathbb{R}_{+}$, $y\in\mathbb{R}_{+}^{*}$, and $c\in[0,1]$:

(2.2) K(x,y,c)\coloneqq\begin{cases}\log y+c^{-1}\log\left(cy^{-1}x+1-c\right)&\text{if }c>0,\\ \log y+y^{-1}x-1&\text{if }c=0.\end{cases}
Proposition 2.2.

The HS kernel has the following properties (see also Fig. 1):

  1. (a)

    The function $K\colon[0,+\infty)\times(0,+\infty)\times[0,1]\to[-\infty,+\infty)$ is continuous, attaining the value $-\infty$ only at the points $(0,y,1)$.

  2. (b)

    $K(x,y,c)$ is increasing with respect to $x$.

  3. (c)

    $K(x,y,c)$ is decreasing with respect to $c$, and strictly decreasing when $x\neq y$.

  4. (d)

    $K(x,y,1)=\log x$ is independent of $y$.

  5. (e)

    $K(x,y,c)\geq\log x$, with equality if and only if $x=y$ or $c=1$.

  6. (f)

    For each $y>0$, the function $K(\cdot,y,0)$ is affine, and its graph is the tangent line to $\log x$ at $x=y$.

  7. (g)

    $K(\lambda x,\lambda y,c)=K(x,y,c)+\log\lambda$, for all $\lambda>0$.

Proof.

Most properties are immediate from Definition 2.1. To check monotonicity with respect to $c$, we compute the partial derivative when $c>0$:

(2.3) K_{c}(x,y,c)=\frac{1}{c^{2}}\left[\frac{c(y^{-1}x-1)}{cy^{-1}x+1-c}+\log\left(\frac{1}{cy^{-1}x+1-c}\right)\right]\,;

since $\log t\leq t-1$ (with equality only if $t=1$), we conclude that $K_{c}(x,y,c)\leq 0$ (with equality only if $x=y$). Since $K$ is continuous, we obtain property (c). Property (e) is a consequence of properties (c) and (d). ∎

Figure 1. Graphs of the functions $K(\cdot,y,c)$ for $y=2$ and $c\in\{0,1/3,2/3,1\}$.
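For experimentation, the kernel is straightforward to implement. The following Python sketch (function name ours) mirrors Definition 2.1 and can be used to spot-check the properties listed in Proposition 2.2:

```python
import math

def hs_kernel(x, y, c):
    """HS kernel K(x, y, c) of (2.2), for x >= 0, y > 0, 0 <= c <= 1.
    Returns -inf exactly at the points (0, y, 1), as in Proposition 2.2(a)."""
    if c > 0:
        t = c * x / y + 1 - c
        return math.log(y) + (math.log(t) / c if t > 0 else -math.inf)
    return math.log(y) + x / y - 1
```

For instance, hs_kernel(x, y, 1) equals $\log x$ for every $y$ (property (d)), and the kernel decreases in $c$ (property (c)).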

Let $\mathcal{P}(\mathbb{R}_{+})$ denote the set of all Borel probability measures $\mu$ on $\mathbb{R}_{+}$. The following is the central concept of this paper:

Definition 2.3.

Let $c\in[0,1]$ and $\mu\in\mathcal{P}(\mathbb{R}_{+})$. If $c=1$, then we require that the function $\log x$ be semi-integrable with respect to $\mu$. (A function $f$ is called semi-integrable if the positive part $f^{+}\coloneqq\max(f,0)$ is integrable or the negative part $f^{-}\coloneqq\max(-f,0)$ is integrable.) The Halász–Székely barycenter (or HS barycenter) with parameter $c$ of the probability measure $\mu$ is:

(2.4) [\mu]_{c}\coloneqq\exp\inf_{y>0}\int K(x,y,c)\,d\mu(x)\,,

where $K$ is the HS kernel (2.2).

First of all, let us see that the definition is meaningful:

  • If $c<1$, then for all $y>0$, the function $K(\cdot,y,c)$ is bounded from below by $K(0,y,c)>-\infty$, and therefore it has a well-defined integral (possibly $+\infty$); so $[\mu]_{c}$ is a well-defined element of the extended half-line $[0,+\infty]$.

  • If $c=1$, then by part (d) of Proposition 2.2, the defining formula (2.4) becomes:

    (2.5) [\mu]_{1}=\exp\int\log x\,d\mu(x)\,.

    The integral is a well-defined element of $[-\infty,+\infty]$, so $[\mu]_{1}$ is well-defined in $[0,+\infty]$.

Formula (2.5) means that the HS barycenter with parameter $c=1$ is the geometric barycenter; let us see that $c=0$ corresponds to the standard arithmetic barycenter:

Proposition 2.4.

For any $\mu\in\mathcal{P}(\mathbb{R}_{+})$, we have $[\mu]_{0}=\int x\,d\mu(x)$.

Proof.

Let $a\coloneqq\int x\,d\mu(x)$. If $a=\infty$, then for every $y>0$, the non-constant affine function $K(\cdot,y,0)$ has infinite integral, so definition (2.4) gives $[\mu]_{0}=\infty$. On the other hand, if $a<\infty$, then $[\mu]_{0}=\exp\inf_{y>0}(\log y+a/y-1)=a$. ∎

Let $\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$ denote the subset formed by those $\mu\in\mathcal{P}(\mathbb{R}_{+})$ such that:

(2.6) \int\log(1+x)\,d\mu(x)<\infty

or, equivalently, $\int\log^{+}x\,d\mu<\infty$ (we will sometimes write “$d\mu$” instead of “$d\mu(x)$”).

Proposition 2.5.

Let $c\in(0,1]$ and $\mu\in\mathcal{P}(\mathbb{R}_{+})$. Then $[\mu]_{c}<\infty$ if and only if $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$.

Proof.

The case $c=1$ being clear, assume that $c\in(0,1)$. Note that for all $y>0$, the expression

(2.7) \left|K(x,y,c)-c^{-1}\log(x+1)\right|

is a bounded function of $x$, so the integrability of $K(x,y,c)$ and $\log(x+1)$ are equivalent. ∎

Next, let us see that the standard properties one might expect for something called a “barycenter” are satisfied. For any $x\geq 0$, we denote by $\delta_{x}$ the probability measure such that $\delta_{x}(\{x\})=1$.

Proposition 2.6.

For all $c\in[0,1]$ and $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, the following properties hold:

  1. (a)

    Reflexivity: $[\delta_{x}]_{c}=x$, for every $x\geq 0$.

  2. (b)

    Monotonicity with respect to the measure: If $\mu_{1}$, $\mu_{2}\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$ have distribution functions $F_{1}$, $F_{2}$ such that $F_{1}\geq F_{2}$ (i.e., $\mu_{2}$ is “more to the right” than $\mu_{1}$; this defines a partial order, called usual stochastic ordering or first-order stochastic dominance), then $[\mu_{1}]_{c}\leq[\mu_{2}]_{c}$.

  3. (c)

    Internality: If $\mu(I)=1$ for an interval $I\subseteq\mathbb{R}_{+}$, then $[\mu]_{c}\in I$.

  4. (d)

    Homogeneity: If $\lambda\geq 0$, and $\lambda_{*}\mu$ denotes the pushforward of $\mu$ under the map $x\mapsto\lambda x$, then $[\lambda_{*}\mu]_{c}=\lambda[\mu]_{c}$.

  5. (e)

    Monotonicity with respect to the parameter: If $0\leq c^{\prime}\leq c\leq 1$, then $[\mu]_{c^{\prime}}\geq[\mu]_{c}$.

Proof.

The proofs use the properties of the HS kernel listed in Proposition 2.2. Reflexivity is obvious when $c=1$ or $x=0$, and in all other cases follows from property (e). Monotonicity with respect to the measure is a consequence of the fact that the HS kernel is increasing in $x$. The internality property of the HS barycenter follows from reflexivity and monotonicity. Homogeneity follows from property (g) of the HS kernel and the change of variables formula. Finally, monotonicity with respect to the parameter $c$ is a consequence of the corresponding property of the HS kernel. ∎

As will become clear later (see Example 2.13), the internality and the monotonicity properties (w.r.t. $\mu$ and w.r.t. $c$) are not strict.

2.2. Computation and critical phenomenon

In the remainder of this section, we discuss how to actually compute HS barycenters. In view of Proposition 2.5, we may focus on measures in $\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$. The mass of zero plays an important role. Given $c\in(0,1)$ and $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, we use the following terminology, where $\mu(0)=\mu(\{0\})$:

(2.8)
subcritical case: $\mu(0)<1-c$
critical case: $\mu(0)=1-c$
supercritical case: $\mu(0)>1-c$

The next result establishes a way to compute $[\mu]_{c}$ in the subcritical case; the remaining cases will be dealt with later in Proposition 2.11.

Proposition 2.7.

If $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, $c\in(0,1)$, and $\mu(0)<1-c$ (subcritical case), then the equation

(2.9) \int K_{y}(x,\eta,c)\,d\mu(x)=0

has a unique positive and finite solution $\eta=\eta(\mu,c)$, and the $\inf$ in formula (2.4) is attained uniquely at $y=\eta$; in particular,

(2.10) [\mu]_{c}=\exp\int K(x,\eta,c)\,d\mu(x)\,.
Proof.

Fix $\mu$ and $c$ as in the statement. We compute the partial derivative:

(2.11) K_{y}(x,y,c)=\frac{\Delta(x,y)}{y}\,,\quad\text{where}\quad\Delta(x,y)\coloneqq 1-\frac{x}{cx+(1-c)y}\,.

Since $\Delta$ is bounded, we are allowed to differentiate under the integral sign:

(2.12) \frac{d}{dy}\int K(x,y,c)\,d\mu=\int K_{y}(x,y,c)\,d\mu=\frac{\psi(y)}{y}\,,

where $\psi(y)\coloneqq\int\Delta(x,y)\,d\mu$. The partial derivative

(2.13) \Delta_{y}(x,y)=\frac{(1-c)x}{(cx+(1-c)y)^{2}}

is positive, except at $x=0$. Since $\mu\neq\delta_{0}$, the function $\psi$ is strictly increasing. Furthermore,

(2.14) \lim_{y\to+\infty}\Delta(x,y)=1\quad\text{and}\quad\lim_{y\to 0^{+}}\Delta(x,y)=\begin{cases}1-1/c&\text{if }x>0,\\ 1&\text{if }x=0,\end{cases}

and so

(2.15) \lim_{y\to+\infty}\psi(y)=1\quad\text{and}\quad\lim_{y\to 0^{+}}\psi(y)=1-\frac{1-\mu(0)}{c}<0\,,

using the assumption $\mu(0)<1-c$. Therefore there exists a unique $\eta>0$ that solves the equation $\psi(\eta)=0$, or equivalently equation (2.9). By (2.12), the function $y\mapsto\int K(x,y,c)\,d\mu$ decreases on $(0,\eta]$ and increases on $[\eta,+\infty)$, and so attains its infimum at $\eta$. Formula (2.10) follows from the definition of $[\mu]_{c}$. ∎

Let us note that, as a consequence of (2.11), equation (2.9) is equivalent to:

(2.16) \int\frac{x}{cx+(1-c)\eta}\,d\mu(x)=1\,.
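In the subcritical case, equation (2.16) lends itself to a simple bisection, since the left-hand side is monotone in $\eta$. The following Python sketch (names ours, illustration only) computes $[\mu]_{c}$ for the uniform measure on a finite list, by solving (2.16) and then applying formula (2.10):

```python
import math

def hs_barycenter(xs, c):
    """[mu]_c for mu uniform on xs, assuming 0 < c < 1 and the subcritical
    condition mu(0) < 1 - c; bisection on (2.16), then formula (2.10)."""
    n = len(xs)
    assert 0.0 < c < 1.0 and sum(x == 0 for x in xs) / n < 1 - c
    def psi(eta):  # strictly increasing, with psi(0+) < 0 <= psi(max xs)
        return 1.0 - sum(x / (c * x + (1 - c) * eta) for x in xs) / n
    lo, hi = 0.0, max(xs)
    for _ in range(200):  # bisect down to machine precision
        mid = 0.5 * (lo + hi)
        if psi(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    eta = 0.5 * (lo + hi)
    K = lambda x: math.log(eta) + math.log(c * x / eta + 1 - c) / c
    return math.exp(sum(K(x) for x in xs) / n)

print(hs_barycenter([1.0, 4.0], 0.5))  # ≈ 2.25, cf. Example 3.2
```

The bracketing interval $[0,\max x_{i}]$ is justified by the limits (2.15) and the internality of $\eta$ (Lemma 2.14).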
Remark 2.8.

If $\mu$ belongs to $\mathcal{P}(\mathbb{R}_{+})$ but not to $\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, and still $c\in(0,1)$, then equation (2.9) (or its equivalent version (2.16)) still has a unique positive and finite solution $\eta=\eta(\mu,c)$, and formula (2.10) still holds. On the other hand, if $c=0$ and $\int x\,d\mu(x)<\infty$, then all conclusions of Proposition 2.7 still hold, with a similar proof.

We introduce the following auxiliary function, plotted in Fig. 2:

(2.17) B(c)\coloneqq\begin{cases}0&\text{if }c=0,\\ c(1-c)^{\frac{1-c}{c}}&\text{if }0<c<1,\\ 1&\text{if }c=1.\end{cases}
Figure 2. Graph of the function $B$ defined by (2.17).

The following alternative formula for the HS barycenter matches the original one from [HS], and in some situations is more convenient:

Proposition 2.9.

If $0<c\leq 1$ and $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, then:

(2.18) [\mu]_{c}=B(c)\,\inf_{r>0}\left\{r^{-\frac{1-c}{c}}\exp\left[c^{-1}\int\log(x+r)\,d\mu(x)\right]\right\}\,.

Furthermore, if $\mu(0)<1-c$, then the $\inf$ is attained at the unique positive finite solution $\rho=\rho(\mu,c)$ of the equation

(2.19) \int\frac{x}{x+\rho}\,d\mu(x)=c\,.
Proof.

The formula is obviously correct if $c=1$. If $0<c<1$, we introduce the variable $r\coloneqq\frac{1-c}{c}\,y$ in formula (2.4) and manipulate. Similarly, (2.16) becomes (2.19). ∎
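The alternative formula (2.18) can also be evaluated by brute force. In the following Python sketch (names ours, illustration only), the infimum over $r$ is approximated on a logarithmic grid; this is enough to reproduce simple closed-form examples:

```python
import math

def B(c):
    """Auxiliary function (2.17), for 0 < c < 1."""
    return c * (1 - c) ** ((1 - c) / c)

def hs_mean_scan(xs, c, r_lo=1e-6, r_hi=1e6, steps=60000):
    """Right-hand side of (2.18) for the uniform measure on xs: a crude
    scan of r over a log-spaced grid, assuming 0 < c < 1."""
    n = len(xs)
    best = math.inf
    for i in range(steps + 1):
        r = r_lo * (r_hi / r_lo) ** (i / steps)
        v = -((1 - c) / c) * math.log(r) + sum(math.log(x + r) for x in xs) / (c * n)
        best = min(best, v)
    return B(c) * math.exp(best)

print(hs_mean_scan([1.0, 4.0], 0.5))  # ≈ 2.25; infimum near r = 2, the solution of (2.19)
```

For $\underline{x}=(1,4)$ and $c=1/2$, equation (2.19) gives $\rho=2$, and the infimum in (2.18) equals $9$, so the barycenter is $B(1/2)\cdot 9=2.25$.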

If $\mu$ is a Borel probability measure on $\mathbb{R}_{+}$ not entirely concentrated at zero (i.e., $\mu\neq\delta_{0}$), then we denote by $\tilde{\mu}$ the probability measure obtained by conditioning on the event $\mathbb{R}_{+}^{*}=(0,+\infty)$, that is,

(2.20) \tilde{\mu}(U)\coloneqq\frac{\mu(U\cap\mathbb{R}_{+}^{*})}{\mu(\mathbb{R}_{+}^{*})},\quad\text{for every Borel set }U\subseteq\mathbb{R}_{+}\,.

Obviously, if $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, then $\tilde{\mu}\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$ as well.

Proposition 2.10.

Let $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$ and $p\coloneqq 1-\mu(0)$. If $c\in(0,1]$ and $c\leq p$ (critical or subcritical cases), then

(2.21) [\mu]_{c}=\frac{B(c)}{B(c/p)}[\tilde{\mu}]_{c/p}\,.
Proof.

Note that $\mu\neq\delta_{0}$, so the conditioned measure $\tilde{\mu}$ is defined. Using formula (2.18), we have:

(2.22) [\mu]_{c}=B(c)\inf_{r>0}r^{1-\frac{1}{c}}\exp\left(\frac{1}{c}\int\log(x+r)\,d\mu\right)
(2.23) =B(c)\inf_{r>0}r^{1-\frac{1}{c}}\exp\left(\frac{1}{c}\left(\mu(0)\log r+\int_{\{x>0\}}\log(x+r)\,d\mu\right)\right)
(2.24) =B(c)\inf_{r>0}r^{1-\frac{1}{c}+\frac{1-p}{c}}\exp\left(\frac{1}{c}\int_{\{x>0\}}\log(x+r)\,d\mu\right)
(2.25) =B(c)\inf_{r>0}r^{1-\frac{p}{c}}\exp\left(\frac{1}{c/p}\int\log(x+r)\,d\tilde{\mu}\right)\,.

At this point, the assumption $c\leq p$ guarantees that the barycenter $[\tilde{\mu}]_{c/p}$ is well-defined, and using (2.18) again we obtain (2.21). ∎

Finally, we compute the HS barycenter in the critical and supercritical cases:

Proposition 2.11.

Let $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$ and $c\in(0,1]$.

  1. (a)

    Critical case: If $\mu(0)=1-c$, then $[\mu]_{c}=B(c)[\tilde{\mu}]_{1}$.

  2. (b)

    Supercritical case: If $\mu(0)>1-c$, then $[\mu]_{c}=0$.

In both cases above, the infimum in formula (2.4) is not attained.

Proof.

In the critical case, we use (2.21) with $p=c$ and conclude.

In the supercritical case, we can assume that $\mu\neq\delta_{0}$. Note that $p<c$, thus $\lim_{r\to 0^{+}}r^{1-\frac{p}{c}}=0$. Moreover, since $\log^{+}x\in L^{1}(\mu)$, we have

(2.26) \lim_{r\to 0^{+}}\int_{\{x>0\}}\log(x+r)\,d\mu=\int_{\{x>0\}}\log x\,d\mu<+\infty\,.

Therefore, using (2.25), we obtain $[\mu]_{c}=0$. ∎
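The mechanism behind part (b) is visible numerically: in the supercritical regime, the expression inside the inf of (2.18) collapses as $r\to 0^{+}$. A small Python sketch (the measure and all names are chosen by us for illustration):

```python
import math

# A supercritical example: mu = 0.7 delta_0 + 0.3 delta_1 with c = 1/2,
# so that mu(0) = 0.7 > 1 - c = 0.5.
C = 0.5
WEIGHTS = ((0.7, 0.0), (0.3, 1.0))  # pairs (mass, atom)

def objective(r):
    """The expression inside the inf of (2.18), without the factor B(c)."""
    integral = sum(w * math.log(x + r) for w, x in WEIGHTS)
    return r ** (-(1 - C) / C) * math.exp(integral / C)

for r in (1e-1, 1e-3, 1e-6):
    print(r, objective(r))
```

Here the objective simplifies to $r^{0.4}(1+r)^{0.6}$, which tends to $0$; hence the infimum is $0$, it is not attained, and $[\mu]_{c}=0$.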

Propositions 2.7 and 2.11 allow us to compute HS barycenters in all cases. For emphasis, let us list explicitly the situations where the barycenter vanishes:

Proposition 2.12.

Let $c\in[0,1]$ and $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$. Then $[\mu]_{c}=0$ if and only if one of the following mutually exclusive situations occurs:

  1. (a)

    $c=0$ and $\mu=\delta_{0}$.

  2. (b)

    $c>0$, $\mu(0)=1-c$, and $\int\log x\,d\tilde{\mu}(x)=-\infty$.

  3. (c)

    $c>0$ and $\mu(0)>1-c$.

Proof.

The case $c=0$ being obvious, assume that $c>0$, and so $B(c)>0$. In the critical case, part (a) of Proposition 2.11 tells us that $[\mu]_{c}=0$ if and only if $[\tilde{\mu}]_{1}=0$, which by (2.5) is equivalent to $\int\log x\,d\tilde{\mu}=-\infty$. In the supercritical case, part (b) of the Proposition ensures that $[\mu]_{c}=0$. ∎

Example 2.13.

Consider the family of probability measures:

(2.27) \mu_{p}\coloneqq(1-p)\delta_{0}+p\delta_{1}\,,\quad 0\leq p\leq 1.

If $0<c<1$, then

(2.28) [\mu_{p}]_{c}=\begin{cases}0&\quad\text{if }p<c\\ B(c)=c(1-c)^{\frac{1-c}{c}}&\quad\text{if }p=c\\ \frac{B(c)}{B(c/p)}=(1-c)^{\frac{1-c}{c}}p^{\frac{p}{c}}(p-c)^{\frac{c-p}{c}}&\quad\text{if }p>c\,.\end{cases}

These formulas were first obtained by Székely [Sz1]. It follows that the function

(2.29) (p,c)\in[0,1]\times[0,1]\mapsto[\mu_{p}]_{c}

(whose graph is shown on [vE, p. 680]) is discontinuous at the points with $p=c>0$, and only at those points. We will return to the issue of continuity in Section 4.
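The discontinuity along the diagonal can be seen directly from (2.28); the following Python sketch (function name ours) simply evaluates the closed form:

```python
def hs_of_mu_p(p, c):
    """Closed form (2.28) for [mu_p]_c, where mu_p = (1-p) delta_0 + p delta_1
    and 0 < c < 1."""
    if p < c:
        return 0.0
    if p == c:
        return c * (1 - c) ** ((1 - c) / c)  # B(c)
    return (1 - c) ** ((1 - c) / c) * p ** (p / c) * (p - c) ** ((c - p) / c)

c = 0.5
print(hs_of_mu_p(c - 1e-9, c))  # 0.0: just below the diagonal
print(hs_of_mu_p(c, c))         # B(1/2) = 0.25
print(hs_of_mu_p(c + 1e-9, c))  # close to 0.25: right-continuous in p
```

So $p\mapsto[\mu_{p}]_{c}$ jumps from $0$ to $B(c)$ at $p=c$, while it is continuous from the right there.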

The practical computation of HS barycenters usually requires numerical methods. In any case, it is useful to notice that the function $\eta$ from Proposition 2.7 satisfies the internality property:

Lemma 2.14.

Let $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}_{+})$, $c\in(0,1)$, and suppose that $\mu(0)<1-c$. If $\mu(I)=1$ for an interval $I\subseteq\mathbb{R}_{+}$, then $\eta(\mu,c)\in I$.

The proof is left to the reader.

3. Comparison with the symmetric means

3.1. HS means as repetitive symmetric means

The HS barycenter of a probability measure, introduced in the previous section, may now be specialized to the case of discrete equidistributed probabilities. So the HS mean of a tuple $\underline{x}=(x_{1},\dots,x_{n})$ of nonnegative numbers with parameter $c\in[0,1]$ is defined as:

(3.1) \mathsf{hsm}_{c}(\underline{x})\coloneqq\left[\frac{\delta_{x_{1}}+\dots+\delta_{x_{n}}}{n}\right]_{c}\,.

Using (2.18), we have more explicitly:

(3.2) \mathsf{hsm}_{c}(\underline{x})=B(c)\inf_{r>0}r^{-\frac{1-c}{c}}\prod_{j=1}^{n}(x_{j}+r)^{\frac{1}{cn}}\quad\text{(if $c>0$)},

where $B$ is the function (2.17). On the other hand, recall that for $k\in\{1,\dots,n\}$, the $k$-th symmetric mean of the $n$-tuple $\underline{x}$ is:

(3.3) \mathsf{sym}_{k}(\underline{x})\coloneqq\left(\frac{E^{(n)}_{k}(\underline{x})}{\binom{n}{k}}\right)^{\frac{1}{k}}\,,

where $E^{(n)}_{k}$ denotes the elementary symmetric polynomial of degree $k$ in $n$ variables.

Since they originate from a barycenter, the HS means are repetition invariant (or intrinsic, in the terminology of [KLL, Def. 3.3]) in the sense that, for any $m>0$,

(3.4) \mathsf{hsm}_{c}\left(\underline{x}^{(m)}\right)=\mathsf{hsm}_{c}(\underline{x})\,,

where $\underline{x}^{(m)}$ denotes the $nm$-tuple obtained by concatenation of $m$ copies of the $n$-tuple $\underline{x}$. No such property holds for the symmetric means, even allowing for adjustment of the parameter $k$. Nevertheless, if the number of repetitions tends to infinity, then the symmetric means tend to stabilize, and the limit is an HS mean; more precisely:

Theorem 3.1.

If $\underline{x}=(x_{1},\dots,x_{n})$, $x_{i}\geq 0$, and $1\leq k\leq n$, then:

(3.5) \lim_{m\to\infty}\mathsf{sym}_{km}\left(\underline{x}^{(m)}\right)=\mathsf{hsm}_{k/n}(\underline{x})\,.

Furthermore, the relative error goes to zero uniformly with respect to the $x_{i}$’s.

This Theorem will be proved in the next subsection.

It is worthwhile to note that the Navas barycenter [Na] is obtained as a “repetition limit” similar to (3.5).

Example 3.2.

Using Propositions 2.7 and 2.11, one computes:

(3.6) \mathsf{hsm}_{1/2}(x,y)=\left(\frac{\sqrt{x}+\sqrt{y}}{2}\right)^{2}\,;

Therefore:

(3.7) \lim_{m\to\infty}\mathsf{sym}_{m}\big(\underbrace{x,\dots,x}_{\text{$m$ times}},\,\underbrace{y,\dots,y}_{\text{$m$ times}}\big)=\left(\frac{\sqrt{x}+\sqrt{y}}{2}\right)^{2}\,.

The last equality was deduced in [CHMW, p. 31] from the asymptotics of Legendre polynomials.
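The convergence in (3.7) can be observed numerically. Since enumerating all $\binom{2m}{m}$ subsets is hopeless for large $m$, the sketch below (Python, names ours) computes the elementary symmetric polynomials by the standard linear recurrence over the coefficients of $\prod_{j}(1+x_{j}t)$:

```python
from math import comb

def sym_mean(xs, k):
    """k-th symmetric mean (3.3); e[j] accumulates E_j of the processed prefix,
    updated in O(n k) time by the recurrence e[j] += x * e[j-1]."""
    e = [1.0] + [0.0] * k
    for x in xs:
        for j in range(k, 0, -1):
            e[j] += x * e[j - 1]
    return (e[k] / comb(len(xs), k)) ** (1.0 / k)

x, y = 1.0, 9.0
target = ((x ** 0.5 + y ** 0.5) / 2) ** 2  # hsm_{1/2}(1, 9) = 4
for m in (1, 5, 25, 100):
    print(m, sym_mean([x] * m + [y] * m, m))
```

Consistent with Theorem 3.1, the printed values approach the limit $4$ from above as $m$ grows.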

Let us pose a problem:

Question 3.3.

Is the sequence $m\mapsto\mathsf{sym}_{km}\left(\underline{x}^{(m)}\right)$ always monotone decreasing?

There exists a partial result: when k=1k=1 and n=2n=2, [CHMW, Lemma 4.1] establishes eventual monotonicity.

3.2. Inequalities between symmetric means and HS means

The following is the first main result of this paper.

Theorem 3.4.

If $\underline{x}=(x_{1},\dots,x_{n})$, $x_{i}\geq 0$, and $1\leq k\leq n$, then

(3.8) \mathsf{hsm}_{k/n}(\underline{x})\leq\mathsf{sym}_{k}(\underline{x})\leq\frac{\binom{n}{k}^{-1/k}}{B\left(\frac{k}{n}\right)}\,\mathsf{hsm}_{k/n}(\underline{x})\,.

Let us postpone the proof to the next subsection. The factor on the RHS of (3.8) tends to $1$ as $k\to\infty$; indeed:

Lemma 3.5.

For all integers $n\geq k\geq 1$, we have

(3.9) 1\leq\frac{\binom{n}{k}^{-1/k}}{B\left(\frac{k}{n}\right)}<(9k)^{\frac{1}{2k}}\,,

with equality if and only if $k=n$.

Proof.

Let $c\coloneqq k/n$ and

(3.10) b\coloneqq\binom{n}{k}\left[B(c)\right]^{k}=\binom{n}{k}c^{k}\,(1-c)^{n-k}\,.

By the Binomial Theorem, $b\leq 1$, which yields the first part of (3.9). For the lower estimate, we use the following Stirling bounds (see [Fe, p. 54, (9.15)]), valid for all $n\geq 1$,

(3.11) \sqrt{2\pi}\,n^{n+\frac{1}{2}}e^{-n}<n!<\sqrt{2\pi}\,n^{n+\frac{1}{2}}e^{-n+\frac{1}{12}}\,.

Then, a calculation (cf. [Fe, p. 184, (3.11)]) gives:

(3.12) b>\frac{e^{-\frac{1}{6}}}{\sqrt{2\pi c(1-c)n}}>\frac{1}{\sqrt{9k}}\,,

from which the second part of (3.9) follows. ∎
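The bounds (3.9) are easy to probe numerically (Python, names ours):

```python
from math import comb, e

def B(c):
    """Auxiliary function (2.17)."""
    if c == 1:
        return 1.0
    return c * (1 - c) ** ((1 - c) / c)

def factor(n, k):
    """The factor binom(n, k)^(-1/k) / B(k/n) appearing in (3.8)-(3.9)."""
    return comb(n, k) ** (-1.0 / k) / B(k / n)

print(factor(10, 10))  # 1.0: equality when k = n
print(factor(10, 3), (9 * 3) ** (1 / 6))  # strictly between 1 and (9k)^(1/(2k))
```

For fixed $k$ and large $n$, the factor approaches $e(k!)^{1/k}/k$, in line with (3.14) below.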

Theorem 3.4 and Lemma 3.5 imply that HS means (with rational values of the parameter) can be obtained as repetition limits of symmetric means:

Proof of Theorem 3.1.

Applying Theorem 3.4 to the tuple $\underline{x}^{(m)}$, using observation (3.4) and Lemma 3.5, we have:

(3.13) \mathsf{hsm}_{k/n}(\underline{x})\leq\lim_{m\to\infty}\mathsf{sym}_{km}(\underline{x}^{(m)})\leq(9km)^{\frac{1}{2km}}\,\mathsf{hsm}_{k/n}(\underline{x})\,.\qed
Remark 3.6.

If $k$ is fixed, then

(3.14) \lim_{n\to\infty}\frac{\binom{n}{k}^{-1/k}}{B\left(\frac{k}{n}\right)}=\frac{e\,(k!)^{1/k}}{k}>1\,,

and therefore the bound from Theorem 3.7 may be less satisfactory. But in this case we may use the alternative bound coming from Maclaurin’s inequality:

(3.15) 𝗌𝗒𝗆k(x¯)𝗌𝗒𝗆1(x¯)=𝗁𝗌𝗆0(x¯).\mathsf{sym}_{k}(\underline{x})\leq\mathsf{sym}_{1}(\underline{x})=\mathsf{hsm}_{0}(\underline{x})\,.
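Both (3.15) and the full Maclaurin chain sym_1 ≥ sym_2 ≥ ⋯ ≥ sym_n are easy to test numerically. A short illustration (not part of the text), using the formula sym_k(x̄) = (E_k^{(n)}(x̄)/\binom{n}{k})^{1/k}:

```python
import math
from itertools import combinations

def sym(k, xs):
    # k-th symmetric mean: (E_k / binom(n, k))^(1/k)
    n = len(xs)
    Ek = sum(math.prod(c) for c in combinations(xs, k))
    return (Ek / math.comb(n, k)) ** (1 / k)

xs = [7.0, 1.0, 4.0, 2.5, 6.0]
means = [sym(k, xs) for k in range(1, len(xs) + 1)]
# Maclaurin: sym_1 >= sym_2 >= ... >= sym_n, interpolating between
# the arithmetic mean (k = 1) and the geometric mean (k = n)
assert all(a >= b - 1e-9 for a, b in zip(means, means[1:]))
```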

3.3. Proof of Theorem 3.4

The two inequalities in (3.8) will be proved independently of each other. They are essentially contained in the papers [BIP] and [HS], respectively, though neither was stated explicitly. In the following Theorems 3.8 and 3.7, we also characterize the cases of equality, and in particular show that each inequality is sharp in the sense that the corresponding factors cannot be improved.

Let us begin with the second inequality, which is more elementary. By symmetry, there is no loss of generality in assuming that the numbers xix_{i} are ordered.

Theorem 3.7.

If x¯=(x1,,xn)\underline{x}=(x_{1},\dots,x_{n}) with x1xn0x_{1}\geq\cdots\geq x_{n}\geq 0, and 1kn1\leq k\leq n, then:

(3.16) 𝗌𝗒𝗆k(x¯)(nk)1/kB(kn)𝗁𝗌𝗆k/n(x¯).\mathsf{sym}_{k}(\underline{x})\leq\frac{\binom{n}{k}^{-1/k}}{B\left(\frac{k}{n}\right)}\mathsf{hsm}_{k/n}(\underline{x})\,.

Furthermore, equality holds if and only if xk+1=0x_{k+1}=0 or k=nk=n.

Proof.

Our starting point is Vieta’s formula:

(3.17) zn+=1nE(n)(x1,,xn)zn=j=1n(xj+z).z^{n}+\sum_{\ell=1}^{n}E^{(n)}_{\ell}(x_{1},\dots,x_{n})z^{n-\ell}=\prod_{j=1}^{n}(x_{j}+z)\,.

Therefore, by Cauchy’s formula, for any r>0r>0:

(3.18) Ek(n)(x1,,xn)=12π𝐢|z|=r1znk+1j=1n(xj+z)dz.E^{(n)}_{k}(x_{1},\dots,x_{n})=\frac{1}{2\pi\mathbf{i}}\ointctrclockwise_{|z|=r}\frac{1}{z^{n-k+1}}\prod_{j=1}^{n}(x_{j}+z)\,dz\,.

That is,

(3.19) E^{(n)}_{k}(x_{1},\dots,x_{n})=\frac{1}{2\pi}\int_{-\pi}^{\pi}(re^{\mathbf{i}\theta})^{-n+k}\prod_{j=1}^{n}(x_{j}+re^{\mathbf{i}\theta})\,d\theta\,.

Taking absolute values,

(3.20) Ek(n)(x1,,xn)12πππrn+kj=1n|xj+re𝐢θ|dθrn+kj=1n(xj+r).E^{(n)}_{k}(x_{1},\dots,x_{n})\leq\frac{1}{2\pi}\int_{-\pi}^{\pi}r^{-n+k}\prod_{j=1}^{n}|x_{j}+re^{\mathbf{i}\theta}|\,d\theta\leq r^{-n+k}\prod_{j=1}^{n}(x_{j}+r)\,.

But these inequalities are valid for all r>0r>0, and therefore:

(3.21) Ek(n)(x1,,xn)infr>0rn+kj=1n(xj+r).E^{(n)}_{k}(x_{1},\dots,x_{n})\leq\inf_{r>0}r^{-n+k}\prod_{j=1}^{n}(x_{j}+r)\,.

So formulas (3.2) and (3.3) imply inequality (3.16).

Now let us investigate the possibility of equality. We consider three mutually exclusive cases, which correspond to the classification (2.8):

(3.22) \text{subcritical case: }x_{k+1}>0\,;\quad\text{critical case: }x_{k}>0=x_{k+1}\,;\quad\text{supercritical case: }x_{k}=0\,.

Using Proposition 2.11, in the critical case we have:

(3.23) \mathsf{sym}_{k}(\underline{x})=\left(\frac{x_{1}\cdots x_{k}}{\binom{n}{k}}\right)^{\frac{1}{k}}\quad\text{and}\quad\mathsf{hsm}_{k/n}(\underline{x})=B\left(\tfrac{k}{n}\right)(x_{1}\cdots x_{k})^{\frac{1}{k}}\,,

while in the supercritical case the two means vanish together. So, in both cases, inequality (3.16) becomes an equality. Now suppose we are in the subcritical case; then the inf\inf at the RHS of (3.21) is attained at some r>0r>0: see Proposition 2.9. On the other hand, for this (and actually any) value of rr, the second inequality in (3.20) must be strict, because the integrand is non-constant. We conclude that, in the subcritical case, inequality (3.21) is strict, and therefore (3.16) is strict. ∎
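The key bound (3.21) can be probed numerically. The sketch below (an illustration with an arbitrary sample, not part of the proof) checks that E_k never exceeds r^{k−n}∏(x_j+r) along a geometric grid of radii:

```python
import math
from itertools import combinations

xs = [1.0, 2.0, 3.0, 4.0, 5.5]
n, k = len(xs), 2
Ek = sum(math.prod(c) for c in combinations(xs, k))  # E_k^{(n)}(x_1, ..., x_n)

def bound(r):
    # the right-hand side of (3.21): r^(k-n) * prod(x_j + r)
    return r ** (k - n) * math.prod(x + r for x in xs)

# (3.21): E_k is at most the bound for every radius r > 0
radii = [10 ** (t / 10) for t in range(-20, 21)]
assert all(Ek <= bound(r) for r in radii)
```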

The first inequality in (3.8) is a particular case of an inequality between two types of matrix means introduced in [BIP], which we now explain. Let A=(a_{i,j})_{i,j\in\{1,\dots,n\}} be an n\times n matrix with nonnegative entries. Recall that the permanent of A is the “signless determinant”

(3.24) per(A)σi=1nai,σ(i),\operatorname{per}(A)\coloneqq\sum_{\sigma}\prod_{i=1}^{n}a_{i,\sigma(i)}\,,

where \sigma runs over the permutations of \{1,\dots,n\}. Then the permanental mean of A is defined as:

(3.25) 𝗉𝗆(A)(per(A)n!)1n.\mathsf{pm}(A)\coloneqq\left(\frac{\operatorname{per}(A)}{n!}\right)^{\frac{1}{n}}\,.

On the other hand, the scaling mean of the matrix AA is defined as:

(3.26) 𝗌𝗆(A)1n2infu,vuAv𝗀𝗆(u)𝗀𝗆(v),\mathsf{sm}(A)\coloneqq\frac{1}{n^{2}}\inf_{u,v}\frac{uAv}{\mathsf{gm}(u)\mathsf{gm}(v)}\,,

where u and v run over the sets of strictly positive row and column vectors, respectively, and \mathsf{gm}(\mathord{\cdot}) denotes the geometric mean of the entries of the vector. Equivalently,

(3.27) 𝗌𝗆(A)=1ninfv𝗀𝗆(Av)𝗀𝗆(v);\mathsf{sm}(A)=\frac{1}{n}\inf_{v}\frac{\mathsf{gm}(Av)}{\mathsf{gm}(v)}\,;

see [BIP, Rem. 2.6]. (Incidentally, formula (3.27) shows that, up to the factor 1/n, the scaling mean is a matrix antinorm in the sense defined by [GZ].) By [BIP, Thrm. 2.17],

(3.28) 𝗌𝗆(A)𝗉𝗆(A),\mathsf{sm}(A)\leq\mathsf{pm}(A)\,,

with equality if and only if AA has permanent 0 or rank 11. This inequality is far from trivial. Indeed, if the matrix AA is doubly stochastic (i.e. row and column sums are all 11), then an easy calculation (see [BIP, Prop. 2.4]) shows that 𝗌𝗆(A)=1n\mathsf{sm}(A)=\frac{1}{n}, so (3.28) becomes 𝗉𝗆(A)1n\mathsf{pm}(A)\geq\frac{1}{n}, or equivalently,

(3.29) per(A)n!nn(if A is doubly stochastic).\operatorname{per}(A)\geq\frac{n!}{n^{n}}\quad\text{(if $A$ is doubly stochastic).}

This lower bound on the permanent of doubly stochastic matrices was conjectured in 1926 by van der Waerden and, after a protracted series of partial results, proved around 1980 independently by Egorichev and Falikman: see [Zh, Chapter 5] for the exact references and a self-contained proof, and [Gu] for more recent developments. Our inequality (3.28), despite being a generalization of Egorichev–Falikman’s (3.29), is actually a relatively simple corollary of it: we refer the reader to [BIP, § 2] for more information. (The proof of [BIP, Thrm. 2.17] uses a theorem on the existence of a particular type of matrix factorization called the Sinkhorn decomposition. The present article only needs inequality (3.28) for matrices of the specific form (3.32), so the use of the existence theorem could be avoided, since it is possible to explicitly compute the corresponding Sinkhorn decomposition.)
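For concreteness, here is a brute-force sketch of the permanent and of the Egorichev–Falikman bound (3.29); it is illustrative only (the sum over permutations has factorial cost). The matrix with all entries 1/n attains equality:

```python
import math
from itertools import permutations

def per(A):
    # permanent: like the determinant, but with all signs positive, cf. (3.24)
    n = len(A)
    return sum(math.prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))

n = 3
J = [[1 / n] * n for _ in range(n)]          # doubly stochastic, all entries 1/n
D = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]                        # another doubly stochastic matrix

vdw = math.factorial(n) / n ** n             # the bound n!/n^n of (3.29)
assert abs(per(J) - vdw) < 1e-12             # equality case
assert per(D) >= vdw
```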

We are now in position to complete the proof of Theorem 3.4, i.e., to prove the first inequality in (3.8). The next result also characterizes the cases of equality.

Theorem 3.8.

If x¯=(x1,,xn)\underline{x}=(x_{1},\dots,x_{n}) with x1xn0x_{1}\geq\cdots\geq x_{n}\geq 0, and 1kn1\leq k\leq n, then:

(3.30) 𝗁𝗌𝗆k/n(x¯)𝗌𝗒𝗆k(x¯).\mathsf{hsm}_{k/n}(\underline{x})\leq\mathsf{sym}_{k}(\underline{x})\,.

Furthermore, equality holds if and only if:

(3.31) k=norx1==xnorxk=0.k=n\quad\text{or}\quad x_{1}=\cdots=x_{n}\quad\text{or}\quad x_{k}=0\,.
Proof.

Consider the nonnegative n×nn\times n matrix:

(3.32) A(x1x1 1 1xnxn 1 1)with nk columns of 1’s.A\coloneqq\begin{pmatrix}x_{1}&\cdots&x_{1}&\,1\,&\,\cdots\,&\,1\,\\[8.0pt] \vdots&&\vdots&\vdots&&\vdots\\[8.0pt] x_{n}&\cdots&x_{n}&\,1\,&\,\cdots\,&\,1\,\end{pmatrix}\quad\text{with $n-k$ columns of $1$'s.}

Note that:

(3.33) per(A)=k!(nk)!Ek(n)(x1,,xn)\operatorname{per}(A)=k!\,(n-k)!\,E^{(n)}_{k}(x_{1},\dots,x_{n})

and so

(3.34) 𝗉𝗆(A)\displaystyle\mathsf{pm}(A) =(Ek(n)(x1,,xn)(nk))1n\displaystyle=\left(\frac{E^{(n)}_{k}(x_{1},\dots,x_{n})}{\binom{n}{k}}\right)^{\frac{1}{n}}
(3.35) =[𝗌𝗒𝗆k(x¯)]k/n.\displaystyle=\left[\mathsf{sym}_{k}(\underline{x})\right]^{k/n}\,.

Now let us compute the scaling mean of A using formula (3.27). Assume that k<n. Given a column vector v=\left(\begin{smallmatrix}v_{1}\\ \vdots\\ v_{n}\end{smallmatrix}\right) with positive entries, we have:

(3.36) 𝗀𝗆(Av)=i=1n(sxi+r)1n,where si=1kvi and ri=k+1nvi.\mathsf{gm}(Av)=\prod_{i=1}^{n}(sx_{i}+r)^{\frac{1}{n}}\,,\quad\text{where }s\coloneqq\sum_{i=1}^{k}v_{i}\text{ and }r\coloneqq\sum_{i=k+1}^{n}v_{i}\,.

On the other hand, by the inequality of arithmetic and geometric means,

(3.37) 𝗀𝗆(v)(sk)kn(rnk)nkn,\mathsf{gm}(v)\leq\left(\frac{s}{k}\right)^{\frac{k}{n}}\left(\frac{r}{n-k}\right)^{\frac{n-k}{n}}\,,

with equality if v1==vk=skv_{1}=\cdots=v_{k}=\frac{s}{k}, vk+1==vn=rnkv_{k+1}=\cdots=v_{n}=\frac{r}{n-k}. So, in order to minimize the quotient 𝗀𝗆(Av)𝗀𝗆(v)\frac{\mathsf{gm}(Av)}{\mathsf{gm}(v)}, it is sufficient to consider column vectors vv satisfying these conditions. We can also normalize ss to 11, and (3.27) becomes:

(3.38) \mathsf{sm}(A)\displaystyle=\frac{1}{n}\,\inf_{r>0}\frac{\prod_{i=1}^{n}(x_{i}+r)^{\frac{1}{n}}}{\left(\frac{1}{k}\right)^{\frac{k}{n}}\left(\frac{r}{n-k}\right)^{\frac{n-k}{n}}}
(3.39) =[𝗁𝗌𝗆k/n(x¯)]k/n\displaystyle=\left[\mathsf{hsm}_{k/n}(\underline{x})\right]^{k/n}\,

by (3.2). This formula 𝗌𝗆(A)=[𝗁𝗌𝗆k/n(x¯)]k/n\mathsf{sm}(A)=\left[\mathsf{hsm}_{k/n}(\underline{x})\right]^{k/n} also holds for k=nk=n, taking the form 𝗌𝗆(A)=(x1xn)1n\mathsf{sm}(A)=(x_{1}\cdots x_{n})^{\frac{1}{n}}; this can be checked either by adapting the proof above, or more simply by using the homogeneity and reflexivity properties of the scaling mean (see [BIP]).

In conclusion, the matrix (3.32) has scaling and permanental means given by formulas (3.39) and (3.35), respectively, and the fundamental inequality (3.28) translates into \mathsf{hsm}_{k/n}(\underline{x})\leq\mathsf{sym}_{k}(\underline{x}), that is, inequality (3.30).

Furthermore, equality holds if and only if the matrix A defined by (3.32) satisfies \mathsf{sm}(A)=\mathsf{pm}(A), by formulas (3.39) and (3.35). As mentioned before, \mathsf{sm}(A)=\mathsf{pm}(A) if and only if A has rank 1 or permanent 0 (see [BIP, Thrm. 2.17]). Note that A has rank 1 if and only if k=n or x_{1}=\dots=x_{n}. On the other hand, by (3.35), A has permanent 0 if and only if \mathsf{sym}_{k}(\underline{x})=0, or equivalently x_{k}=0. So we have proved that the equality \mathsf{hsm}_{k/n}(\underline{x})=\mathsf{sym}_{k}(\underline{x}) is equivalent to condition (3.31). ∎
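As a numerical cross-check of (3.30) (an illustration, not part of the paper's argument), one can compute sym_k from its definition and hsm_{k/n} from the variational formula obtained by raising the scaling mean of the matrix (3.32) to the power n/k; the infimum over r is approximated on a geometric grid, which is adequate here because the sample keeps the inequality strict:

```python
import math
from itertools import combinations

def sym(k, xs):
    # symmetric mean, as in (3.35): (E_k / binom(n,k))^(1/k)
    n = len(xs)
    Ek = sum(math.prod(c) for c in combinations(xs, k))
    return (Ek / math.comb(n, k)) ** (1 / k)

def hsm(k, xs):
    # HS mean for c = k/n with k < n, via the n/k-th power of the scaling mean:
    # hsm = n^(-n/k) * k * (n-k)^((n-k)/k) * inf_r r^(-(n-k)/k) * prod(x_i+r)^(1/k);
    # the inf is approximated on a geometric grid of radii
    n = len(xs)
    const = n ** (-n / k) * k * (n - k) ** ((n - k) / k)
    radii = (10 ** (t / 50) for t in range(-200, 201))
    return const * min(r ** (-(n - k) / k) * math.prod(x + r for x in xs) ** (1 / k)
                       for r in radii)

# for n = 2, k = 1 the formula reduces to hsm_{1/2}(a, b) = ((sqrt(a)+sqrt(b))/2)^2
assert abs(hsm(1, [4.0, 1.0]) - 2.25) < 1e-3
# inequality (3.30): hsm_{k/n} <= sym_k (strict here, as the entries are distinct)
xs = [1.0, 2.5, 3.0, 4.5, 7.0, 8.0]
assert all(hsm(k, xs) < sym(k, xs) for k in range(1, len(xs)))
```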

We close this section with some comments on related results.

Remark 3.9.

In [HS], the asymptotics of the integral (3.18) are determined using the saddle point method (see e.g. [Si, Section 15.4]). However, for this method to work, the saddle must be steep, that is, the second derivative at the saddle must be large in absolute value. Major [Maj, p. 1987] discusses this situation: if the second derivative vanishes, then “a more sophisticated method has to be applied and only weaker results can be obtained in this case. We shall not discuss this question in the present paper”. On the other hand, in the general situation covered by our Theorem 3.4, the saddle can be flat. (It must be noted that the setting considered by Major is different, since he allows random variables to be negative.)

Remark 3.10.

Given an arbitrary n×nn\times n non-negative matrix AA, the permanental and scaling means satisfy the following inequalities (see [BIP, Theorem 2.17]),

(3.40) 𝗌𝗆(A)𝗉𝗆(A)n(n!)1/n𝗌𝗆(A).\mathsf{sm}(A)\leq\mathsf{pm}(A)\leq n(n!)^{-1/n}\mathsf{sm}(A).

The sequence (n(n!)^{-1/n}) is increasing and converges to e. In general, as n tends to infinity, the permanental mean does not necessarily converge to the scaling mean. However, there are some special classes of matrices for which this is indeed the case: for example, in the repetitive situation covered by the Generalized Friedland limit [BIP, Theorem 2.19]. Note that \mathsf{hsm}_{k/n}(\underline{x}) and \mathsf{sym}_{k}(\underline{x}) are the n/k-th powers of the scaling and permanental means of the matrix A, respectively. Therefore, (3.8) can be regarded as an improvement of (3.40) for this particular class of matrices.

Remark 3.11.

A natural extension of the symmetric means is given by the Muirhead means; see [HLP, § 2.18], [Bu, § V.6] for their definition and properties. Accordingly, it should be possible to define a family of barycenters extending the HS barycenters, building on [BIP, § 5.2]. An analogue of inequality (3.30) holds in this extended setting, again as a consequence of the key inequality (3.28) between matrix means. However, we do not know if inequality (3.16) can be extended in a comparable level of generality.

4. Continuity of the HS barycenter

In this section we study the continuity of the HS barycenter as a two-variable function, (μ,c)[μ]c(\mu,c)\mapsto[\mu]_{c}, defined in the space 𝒫()×[0,1]\mathcal{P}(\mathbb{R})\times[0,1]. The most natural topology on 𝒫()\mathcal{P}(\mathbb{R}) is the weak topology (defined below). The barycenter function is not continuous with respect to this topology, but, on the positive side, it is lower semicontinuous, except in a particular situation. In order to obtain better results, we need to focus on subsets of measures satisfying the natural integrability conditions (usually (2.6), but differently for the extremal parameters c=0c=0 and c=1c=1), and endow these subsets with stronger topologies that are well adapted to the integrability assumptions.

In a preliminary subsection, we collect some general facts on topologies on spaces of measures. In the remaining subsections we prove several results on the continuity of the HS barycenter; these results will be used in combination to prove our general ergodic theorem in Section 5.

4.1. Convergence of measures

If (X,\mathrm{d}) is a separable complete metric space, let C_{\mathrm{b}}(X) be the set of all continuous bounded real functions on X, and let \mathcal{P}(X) denote the set of all Borel probability measures on X. Recall (see e.g. [Pa]) that the weak topology is a metrizable topology on \mathcal{P}(X) according to which a sequence (\mu_{n}) converges to some \mu if and only if \int\phi\,d\mu_{n}\to\int\phi\,d\mu for every test function \phi\in C_{\mathrm{b}}(X); we say that (\mu_{n}) converges weakly to \mu, and denote this by \mu_{n}\rightharpoonup\mu. The space \mathcal{P}(X) is Polish, and it is compact if and only if X is compact. Despite the space C_{\mathrm{b}}(X) being huge (nonseparable w.r.t. its usual topology if X is noncompact), by [Pa, Theorem II.6.6] we can nevertheless find a countable subset \mathcal{C}\subseteq C_{\mathrm{b}}(X) such that, for all (\mu_{n}) and \mu in \mathcal{P}(X),

(4.1) μnμϕ𝒞,ϕ𝑑μnϕ𝑑μ.\mu_{n}\rightharpoonup\mu\quad\Longleftrightarrow\quad\forall\phi\in\mathcal{C},\ {\textstyle\int\phi\,d\mu_{n}\to\int\phi\,d\mu}\,.

The following result deals with sequences of integrals ϕn𝑑μn\int\phi_{n}\,d\mu_{n} where not only the measures but also the integrands vary, and bears a resemblance to Fatou’s Lemma:

Proposition 4.1.

Suppose that (μn)(\mu_{n}) is a sequence in 𝒫(X)\mathcal{P}(X) converging weakly to some measure μ\mu, and that (ϕn)(\phi_{n}) is a sequence of continuous functions on XX converging uniformly on compact subsets to some function ϕ\phi. Furthermore, assume that the functions ϕn\phi_{n} are bounded from below by a constant C-C independent of nn. Then, lim infnϕn𝑑μnϕ𝑑μ\liminf_{n\to\infty}\int\phi_{n}\,d\mu_{n}\geq\int\phi\,d\mu.

Note that, as in Fatou’s Lemma, the integrals in Proposition 4.1 can be infinite.

Proof.

Without loss of generality, assume that C=0C=0. Let λ\lambda\in\mathbb{R} be such that λ<ϕ𝑑μ\lambda<\int\phi\,d\mu. By the monotone convergence theorem, there exists mm\in\mathbb{N} such that min{ϕ,m}𝑑μ>λ\int\min\{\phi,m\}\,d\mu>\lambda. For a function ψ:X\psi:X\to\mathbb{R}, let ψ^(x):=min{ψ(x),m}\hat{\psi}(x):=\min\{\psi(x),m\}. Note that

(4.2) ϕn𝑑μnϕ^n𝑑μn=(ϕ^nϕ^)𝑑μn+ϕ^𝑑μn.\displaystyle\int\phi_{n}\,d\mu_{n}\geq\int\hat{\phi}_{n}\,d\mu_{n}=\int\left(\hat{\phi}_{n}-\hat{\phi}\right)\,d\mu_{n}+\int\hat{\phi}\,d\mu_{n}\,.

By Prokhorov’s theorem (see e.g. [Pa, Theorem 6.7]), the sequence (μn)(\mu_{n}) forms a tight set, that is, for every ε>0\varepsilon>0, there exists a compact set KXK\subseteq X such that μn(XK)ε/(2m)\mu_{n}(X\smallsetminus K)\leq\varepsilon/(2m) for all nn. Since (ϕn)(\phi_{n}) converges uniformly on compact subsets to ϕ\phi, we obtain:

(4.3) limn|K(ϕ^nϕ^)𝑑μn|limnsupxK|ϕ^n(x)ϕ^(x)|=0.\lim_{n\to\infty}\left|\int_{K}\left(\hat{\phi}_{n}-\hat{\phi}\right)\,d\mu_{n}\right|\leq\lim_{n\to\infty}\sup_{x\in K}\left|\hat{\phi}_{n}(x)-\hat{\phi}(x)\right|=0\,.

We also have:

(4.4) |XK(ϕ^nϕ^)𝑑μn|μn(XK)supxXK|ϕ^n(x)ϕ^(x)|<ε2m2m=ε.\left|\int_{X\smallsetminus K}\left(\hat{\phi}_{n}-\hat{\phi}\right)\,d\mu_{n}\right|\leq\mu_{n}(X\smallsetminus K)\sup_{x\in X\smallsetminus K}\left|\hat{\phi}_{n}(x)-\hat{\phi}(x)\right|<\frac{\varepsilon}{2m}2m=\varepsilon.

Since (μn)(\mu_{n}) converges weakly to μ\mu, we have limnϕ^𝑑μn=ϕ^𝑑μ>λ\lim_{n\to\infty}\int\hat{\phi}\,d\mu_{n}=\int\hat{\phi}\,d\mu>\lambda. Therefore, combining (4.2), (4.3) and (4.4), for sufficiently large values of nn we obtain ϕn𝑑μn>λ\int\phi_{n}\,d\mu_{n}>\lambda. The result now follows. ∎
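To see why Proposition 4.1 only gives an inequality, consider a hypothetical example (not from the text) where mass escapes to infinity: on X = ℝ, the measures μ_n = (1−1/n)δ_0 + (1/n)δ_n converge weakly to δ_0, and φ(x) = |x| is continuous and bounded below by 0, yet the integrals do not converge to ∫φ dδ_0 = 0:

```python
# mu_n = (1 - 1/n) * delta_0 + (1/n) * delta_n converges weakly to delta_0,
# and phi(x) = |x| is continuous and nonnegative
def integral_phi(n):
    return (1 - 1 / n) * abs(0) + (1 / n) * abs(n)

# the integrals stay at 1, while the limit measure gives 0:
# liminf (= 1) >= 0, and the inequality of Proposition 4.1 is strict
assert all(abs(integral_phi(n) - 1.0) < 1e-9 for n in range(1, 100))
```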

The next direct consequence is useful.

Corollary 4.2.

Suppose that (μn)(\mu_{n}) is a sequence in 𝒫(X)\mathcal{P}(X) converging weakly to some measure μ\mu, and that (ϕn)(\phi_{n}) is a sequence of continuous functions on XX converging uniformly on compact subsets to some function ϕ\phi. Furthermore, assume that the functions |ϕn||\phi_{n}| are bounded by a constant CC independent of nn. Then, ϕn𝑑μnϕ𝑑μ\int\phi_{n}\,d\mu_{n}\to\int\phi\,d\mu.

We will also need a slightly stronger notion of convergence. Let 𝒫1(X)𝒫(X)\mathcal{P}_{1}(X)\subseteq\mathcal{P}(X) denote the set of measures μ\mu with finite first moment, that is,

(4.5) \int\mathrm{d}(x,x_{0})\,d\mu(x)<\infty\,.

Here and in what follows, x_{0}\in X is a basepoint which we consider as fixed, the particular choice being entirely irrelevant. We metrize \mathcal{P}_{1}(X) with the Kantorovich metric (see e.g. [Vi, p. 207]):

(4.6) W_{1}(\mu,\nu)\coloneqq\sup\left\{\int\psi\,d(\mu-\nu)\;\mathord{;}\;\psi\colon X\to\mathbb{R}\text{ is $1$-Lipschitz}\right\}\,.

Of course, the Kantorovich metric depends on the original metric d\mathrm{d} on XX; in fact, it “remembers” it, since W1(δx,δy)=d(x,y)W_{1}(\delta_{x},\delta_{y})=\mathrm{d}(x,y). The metric space (𝒫1(X),W1)(\mathcal{P}_{1}(X),W_{1}) is called 11-Wasserstein space; it is separable and complete. Unless XX is compact, the topology on 𝒫1(X)\mathcal{P}_{1}(X) is stronger than the weak topology. In fact, we have the following characterizations of convergence:
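On (ℝ, |·|), W₁ admits the classical closed form ∫|F_μ − F_ν| dt, the L¹ distance between cumulative distribution functions, which gives a quick way to experiment. The sketch below (an illustration for finitely supported measures on [0,∞); the CDF formula is specific to the real line) confirms that W₁ "remembers" the distance between Dirac masses, and exhibits a sequence that converges weakly but not in 𝒫₁(ℝ):

```python
def w1(mu, nu, step=1e-3, hi=60.0):
    # L1 distance between CDFs, discretized on [0, hi): computes W1 on (R, |.|)
    # for measures supported on [0, hi); mu, nu are lists of (atom, weight)
    def cdf(m, t):
        return sum(w for x, w in m if x <= t)
    steps = int(hi / step)
    return sum(abs(cdf(mu, i * step) - cdf(nu, i * step)) for i in range(steps)) * step

# W1 between Dirac masses recovers the ground distance
assert abs(w1([(1.0, 1.0)], [(3.0, 1.0)]) - 2.0) < 0.01

# mu_n = (1-1/n) delta_0 + (1/n) delta_n converges weakly to delta_0,
# yet W1(mu_n, delta_0) = 1 for every n: convergence fails in P_1(R)
for n in (2, 5, 10):
    mu_n = [(0.0, 1 - 1 / n), (float(n), 1 / n)]
    assert abs(w1(mu_n, [(0.0, 1.0)]) - 1.0) < 0.01
```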

Theorem 4.3 ([Vi, Theorem 7.12]).

For all (μn)(\mu_{n}) and μ\mu in 𝒫1(X)\mathcal{P}_{1}(X), the following statements are equivalent:

  (a)

    W1(μn,μ)0W_{1}(\mu_{n},\mu)\to 0.

  (b)

    if ϕ:X\phi\colon X\to\mathbb{R} is a continuous function such that |ϕ|1+d(,x0)\frac{|\phi|}{1+\mathrm{d}(\mathord{\cdot},x_{0})} is bounded, then ϕ𝑑μnϕ𝑑μ\int\phi\,d\mu_{n}\to\int\phi\,d\mu.

  (c)

    μnμ\mu_{n}\rightharpoonup\mu and d(,x0)𝑑μnd(,x0)𝑑μ\int\mathrm{d}(\mathord{\cdot},x_{0})\,d\mu_{n}\to\int\mathrm{d}(\mathord{\cdot},x_{0})\,d\mu.

  (d)

    μnμ\mu_{n}\rightharpoonup\mu and the following “tightness condition” holds:

    (4.7) \lim_{R\to\infty}\limsup_{n\to\infty}\int_{X\smallsetminus B_{R}(x_{0})}[1+\mathrm{d}(\mathord{\cdot},x_{0})]\,d\mu_{n}=0\,,

    where BR(x0)B_{R}(x_{0}) denotes the open ball of center x0x_{0} and radius RR.

The next Lemma should be compared to Corollary 4.2:

Lemma 4.4.

Suppose that (μn)(\mu_{n}) is a sequence in 𝒫1(X)\mathcal{P}_{1}(X) converging to some measure μ\mu, and that (ϕn)(\phi_{n}) is a sequence of continuous functions on XX converging uniformly on bounded subsets to some function ϕ\phi. Furthermore, assume that the functions |ϕn|1+d(,x0)\frac{|\phi_{n}|}{1+\mathrm{d}(\mathord{\cdot},x_{0})} are bounded by a constant CC independent of nn. Then, ϕn𝑑μnϕ𝑑μ\int\phi_{n}\,d\mu_{n}\to\int\phi\,d\mu.

Proof.

Fix ε>0\varepsilon>0. By part (d) of Theorem 4.3, there exists R>0R>0 such that, for all sufficiently large nn,

(4.8) XBR(x0)[1+d(x,x0)]𝑑μn(x)ε.\int_{X\smallsetminus B_{R}(x_{0})}[1+\mathrm{d}(x,x_{0})]\,d\mu_{n}(x)\leq\varepsilon\,.

Then, we write:

(4.9) \int\phi_{n}\,d\mu_{n}-\int\phi\,d\mu=\int(\phi_{n}-\phi)\,d\mu_{n}+\int\phi\,d(\mu_{n}-\mu)=\underbrace{\int_{X\smallsetminus B_{R}(x_{0})}(\phi_{n}-\phi)\,d\mu_{n}}_{\textcircled{1}}+\underbrace{\int_{B_{R}(x_{0})}(\phi_{n}-\phi)\,d\mu_{n}}_{\textcircled{2}}+\underbrace{\int\phi\,d(\mu_{n}-\mu)}_{\textcircled{3}}\,.

By part (b) of Theorem 4.3, the term \textcircled{3} tends to 0 as n\to\infty. By the assumption of uniform convergence on bounded sets, |\textcircled{2}|\leq\sup_{B_{2R}(x_{0})}|\phi_{n}-\phi| tends to 0 as well. Finally,

(4.10) |\textcircled{1}|\leq\int_{X\smallsetminus B_{R}(x_{0})}\left(|\phi_{n}|+|\phi|\right)\,d\mu_{n}\leq 2C\int_{X\smallsetminus B_{R}(x_{0})}[1+\mathrm{d}(x,x_{0})]\,d\mu_{n}(x)\leq 2C\varepsilon\,,

for all sufficiently large nn. Since ε>0\varepsilon>0 is arbitrary, we conclude that ϕn𝑑μnϕ𝑑μ\int\phi_{n}\,d\mu_{n}\to\int\phi\,d\mu, as claimed. ∎

Let us now consider the specific case of the metric space (X,d)=(,dHS)(X,\mathrm{d})=(\mathbb{R},\mathrm{d}_{\mathrm{HS}}), where:

(4.11) dHS(x,y)|log(1+x)log(1+y)|.\mathrm{d}_{\mathrm{HS}}(x,y)\coloneqq\left|\log(1+x)-\log(1+y)\right|\,.

Then the finite-first-moment condition (4.5) becomes our usual integrability condition (2.6), so the 11-Wasserstein space 𝒫1(X)\mathcal{P}_{1}(X) becomes 𝒫HS(){\mathcal{P}_{\mathrm{HS}}(\mathbb{R})}.

4.2. Lower and upper semicontinuity

The HS barycenter is definitely not continuous with respect to the weak topology, since the complement of 𝒫HS(){\mathcal{P}_{\mathrm{HS}}(\mathbb{R})} is dense in 𝒫()\mathcal{P}(\mathbb{R}), and the barycenter is \infty there (by Proposition 2.5). Nevertheless, lower semicontinuity holds, except in the critical configuration:

Theorem 4.5.

For every (μ,c)𝒫()×[0,1)(\mu,c)\in\mathcal{P}(\mathbb{R})\times[0,1), we have:

(4.12) lim inf(μ~,c~)(μ,c)[μ~]c~[μ]cunlessμ(0)=1c and [μ]c>0.\liminf_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}\geq[\mu]_{c}\quad\text{unless}\quad\mu(0)=1-c\text{ and }[\mu]_{c}>0\,.

To be explicit, the inequality above means that for every λ<[μ]c\lambda<[\mu]_{c}, there exists a neighborhood 𝒰𝒫()×[0,1)\mathcal{U}\subseteq\mathcal{P}(\mathbb{R})\times[0,1) of (μ,c)(\mu,c) with respect to the product topology (weak ×\times standard) such that [μ~]c~>λ[\tilde{\mu}]_{\tilde{c}}>\lambda for all (μ~,c~)𝒰(\tilde{\mu},\tilde{c})\in\mathcal{U}.

Proof.

Let us first note that:

(4.13) μ(0)=1c and [μ]c>0lim inf(μ~,c~)(μ,c)[μ~]c~<[μ]c.\mu(0)=1-c\text{ and }[\mu]_{c}>0\quad\Rightarrow\quad\liminf_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}<[\mu]_{c}\,.

Indeed, if cnc+1/nc_{n}\coloneqq c+1/n, then (μ,cn)(μ,c)(\mu,c_{n})\to(\mu,c). By Proposition 2.11, we have that [μ]cn=0[\mu]_{c_{n}}=0. Thus,

(4.14) 0=limn[μ]cn=lim inf(μ~,c~)(μ,c)[μ~]c~<[μ]c.0=\lim_{n\to\infty}[\mu]_{c_{n}}=\liminf_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}<[\mu]_{c}\,.

We now consider the remaining situation: given (\mu,c)\in\mathcal{P}(\mathbb{R})\times[0,1) such that \mu(0)\neq 1-c or [\mu]_{c}=0, we want to show that \liminf_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}\geq[\mu]_{c}. There are some trivial cases:

  • If [μ]c=0[\mu]_{c}=0, then the conclusion is obvious.

  • If μ(0)>1c\mu(0)>1-c, then [μ]c=0[\mu]_{c}=0 by Proposition 2.11, and again the conclusion is clear.

In what follows, we assume that μ(0)<1c\mu(0)<1-c and [μ]c>0[\mu]_{c}>0. Fix a sequence (μn,cn)(\mu_{n},c_{n}) converging to (μ,c)(\mu,c). We need to prove that:

(4.15) lim infn[μn]cn[μ]c.\liminf_{n\to\infty}[\mu_{n}]_{c_{n}}\geq[\mu]_{c}\,.

We may also assume without loss of generality that [\mu_{n}]_{c_{n}}<\infty for each n. We divide the proof into two cases:

Case $c>0$: We can also assume that $c_{n}>0$ for every $n$. By Proposition 2.5, the hypothesis $[\mu_{n}]_{c_{n}}<\infty$ means that $\mu_{n}\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$. By the Portmanteau Theorem [Pa, Theorem 6.1(c)], $\limsup_{n\to\infty}\mu_{n}(0)\leq\mu(0)<1-c$. Thus, for sufficiently large values of $n$ we have $\mu_{n}(0)<1-c_{n}$. In this setting, the HS barycenter $[\mu_{n}]_{c_{n}}$ can be computed by Proposition 2.7. Recall from (2.16) that $\eta(\tilde{\mu},\tilde{c})$ denotes the unique positive solution $\eta$ of the equation $\int\frac{x}{\tilde{c}x+(1-\tilde{c})\eta}\,d\tilde{\mu}=1$. Note that $\eta(\mu,c)$ is well defined even in the case $[\mu]_{c}=\infty$ (see Remark 2.8). We claim that:

(4.16) $\lim_{n\to\infty}\eta(\mu_{n},c_{n})=\eta(\mu,c)\,.$
Proof of the claim.

Fix numbers $\alpha_{0}$, $\alpha_{1}$ with $0<\alpha_{0}<\eta(\mu,c)<\alpha_{1}$. Then a monotonicity property shown in the proof of Proposition 2.7 gives:

(4.17) $\int\frac{x}{cx+(1-c)\alpha_{0}}\,d\mu>1>\int\frac{x}{cx+(1-c)\alpha_{1}}\,d\mu\,.$

Note the uniform bounds:

(4.18) $0\leq\frac{x}{c_{n}x+(1-c_{n})\alpha_{i}}\leq\Big(\inf_{n}c_{n}\Big)^{-1}\,.$

So, using Corollary 4.2, we see that $\int\frac{x}{c_{n}x+(1-c_{n})\alpha_{i}}\,d\mu_{n}\to\int\frac{x}{cx+(1-c)\alpha_{i}}\,d\mu$. In particular, for all sufficiently large $n$,

(4.19) $\int\frac{x}{c_{n}x+(1-c_{n})\alpha_{0}}\,d\mu_{n}>1>\int\frac{x}{c_{n}x+(1-c_{n})\alpha_{1}}\,d\mu_{n}\,,$

and thus $\alpha_{0}<\eta(\mu_{n},c_{n})<\alpha_{1}$, proving the claim (4.16). ∎

For simplicity, write $y_{n}\coloneqq\eta(\mu_{n},c_{n})$. By Proposition 2.2.(b),

(4.20) $K(x,y_{n},c_{n})\geq K(0,y_{n},c_{n})=\log y_{n}+c_{n}^{-1}\log(1-c_{n})\geq-C$

for some finite $C$, since $\sup_{n}c_{n}<1$ and $\inf_{n}y_{n}>0$. This allows us to apply Proposition 4.1 and obtain:

(4.21) $\liminf_{n\to\infty}\int K(x,y_{n},c_{n})\,d\mu_{n}\geq\int K(x,y_{\infty},c)\,d\mu\,,$

where $y_{\infty}\coloneqq\lim_{n\to\infty}y_{n}=\eta(\mu,c)$. Using formula (2.10), we obtain (4.15). This completes the proof of Theorem 4.5 in the case $c>0$.

Case $c=0$: If $c_{n}=0$ for infinitely many $n$, then along that subsequence Proposition 4.1 gives $\liminf\int x\,d\mu_{n}\geq\int x\,d\mu$, that is, $\liminf[\mu_{n}]_{0}\geq[\mu]_{c}$. So we can assume that $c_{n}>0$ for every $n$, as in the previous case.

In order to prove (4.15) in the case $c=0$, let us fix an arbitrary positive $\lambda<[\mu]_{0}=\int x\,d\mu$, and let us show that $[\mu_{n}]_{c_{n}}>\lambda$ for every sufficiently large $n$. By the monotone convergence theorem, there exists $m\in\mathbb{N}$ such that $\int\min(x,m)\,d\mu(x)>\lambda$. Let $\hat{\mu}$ (resp. $\hat{\mu}_{n}$) be the push-forward of the measure $\mu$ (resp. $\mu_{n}$) under the map $x\mapsto\min(x,m)$. Then $[\hat{\mu}]_{0}>\lambda$ and, by Proposition 2.6.(b), $[\hat{\mu}_{n}]_{c_{n}}\leq[\mu_{n}]_{c_{n}}$. Furthermore, we have $\hat{\mu}_{n}\rightharpoonup\hat{\mu}$, since for every $f\in C_{\mathrm{b}}(\mathbb{R})$,

(4.22) $\int f\,d\hat{\mu}_{n}=\int f(\min(x,m))\,d\mu_{n}(x)\to\int f(\min(x,m))\,d\mu(x)=\int f\,d\hat{\mu}\,.$

So, to simplify the notation, we remove the hats and assume that the measures $\mu_{n}$, $\mu$ are all supported in the interval $[0,m]$.

The numbers $\eta(\mu_{n},c_{n})$ and $\eta(\mu,c)$ are well defined, as in the previous case, and furthermore they belong to the interval $[0,m]$, by Lemma 2.14. On the other hand, by Proposition 4.1,

(4.23) $\liminf_{n\to\infty}\int\frac{x}{c_{n}x+(1-c_{n})\lambda}\,d\mu_{n}\geq\int\frac{x}{\lambda}\,d\mu>1\,.$

It follows that $\eta(\mu_{n},c_{n})>\lambda$ for all sufficiently large $n$. We claim that $\eta(\mu_{n},c_{n})\to\eta(\mu,0)$, as before in (4.16). The proof is the same, except that the upper bound in (4.18) becomes infinite and must be replaced by the following estimate:

(4.24) $0\leq x\leq m\text{ and }\alpha_{i}\geq\lambda\quad\Rightarrow\quad 0\leq\frac{x}{c_{n}x+(1-c_{n})\alpha_{i}}\leq\frac{m}{c_{n}m+(1-c_{n})\alpha_{i}}\leq\frac{m}{\lambda}\,.$

So a repetition of the previous arguments yields (4.16), then (4.20) and (4.21), and finally (4.15). Therefore, Theorem 4.5 has been proved in both cases $c>0$ and $c=0$. ∎
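The quantity $\eta(\mu,c)$ that drives this proof is easy to approximate in practice: the left-hand side of equation (2.16) is strictly decreasing in $\eta$, so bisection applies. The following sketch (our own illustration, not part of the paper; the name `eta_uniform` is ours) does this for the uniform measure on a finite positive sample in the subcritical case.

```python
def eta_uniform(xs, c, iters=200):
    """Approximate eta(mu, c) for mu uniform on the sample xs, with 0 < c < 1.

    eta is the unique positive root of  (1/n) * sum x/(c*x + (1-c)*eta) = 1,
    valid in the subcritical case mu({0}) < 1 - c.
    """
    n = len(xs)

    def excess(e):
        # Left-hand side of equation (2.16) minus 1; strictly decreasing in e.
        return sum(x / (c * x + (1 - c) * e) for x in xs) / n - 1.0

    # The LHS decreases from mu({x > 0})/c > 1 (as eta -> 0+) to 0
    # (as eta -> infinity), and is <= 1 at eta = max(xs), so we can bisect.
    lo, hi = 1e-15, max(xs)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a point mass at $a$, equation (2.16) reads $a/(ca+(1-c)\eta)=1$, so $\eta=a$; for a non-degenerate sample, $\eta$ lies strictly between the smallest and largest points.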

Next, let us investigate the behaviour of the HS barycenter on the product space $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\times[0,1]$, where $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$ is endowed with the topology defined at the end of Section 4.1.

Theorem 4.6.

For every $(\mu,c)\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\times[0,1]$, we have:

(4.25) $\limsup_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}\leq[\mu]_{c}\quad$ unless $c=0$ and $[\mu]_{0}<\infty\,,$

(4.26) $\liminf_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}\geq[\mu]_{c}\quad$ unless $\mu(0)=1-c$ and $[\mu]_{c}>0\,.$
Proof of part (4.25) of Theorem 4.6.

Let us start by proving the following implication:

(4.27) $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R}),\ c=0,\text{ and }[\mu]_{0}<\infty\quad\Rightarrow\quad\limsup_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}>[\mu]_{c}\,.$

Consider the measures $\mu_{n}\coloneqq\frac{n-1}{n}\,\mu+\frac{1}{n}\,\delta_{n}$. Clearly, $\mu_{n}\rightharpoonup\mu$; moreover,

(4.28) $\int\log(1+x)\,d(\mu_{n}-\mu)(x)=\frac{\log(1+n)}{n}-\frac{1}{n}\int\log(1+x)\,d\mu(x)\to 0\,.$

So, using characterization (c) of Theorem 4.3, we conclude that $\mu_{n}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$. On the other hand, $[\mu_{n}]_{0}=\frac{n-1}{n}\,[\mu]_{0}+1\to[\mu]_{0}+1$. This proves (4.27).

Next, let us prove the converse implication. Fix $(\mu,c)$ such that $c\neq 0$ or $[\mu]_{0}=\infty$, and let us show that if $(\mu_{n},c_{n})$ is any sequence in $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\times[0,1]$ converging to $(\mu,c)$, then $\limsup_{n\to\infty}[\mu_{n}]_{c_{n}}\leq[\mu]_{c}$. This is obviously true if $[\mu]_{c}=\infty$, so let us assume that $[\mu]_{c}<\infty$. Then our assumption becomes $c>0$, so by removing finitely many terms from the sequence $(\mu_{n},c_{n})$, we may assume that $\inf_{n}c_{n}>0$. Fix some finite number $\lambda>[\mu]_{c}$. By Definition 2.3, there is some $y_{0}>0$ such that $\int K(x,y_{0},c)\,d\mu(x)<\log\lambda$. The sequence of continuous functions $\frac{|K(x,y_{0},c_{n})|}{1+\log(1+x)}$ is uniformly bounded, as a direct calculation shows. Furthermore, $K(\cdot,y_{0},c_{n})\to K(\cdot,y_{0},c)$ uniformly on compact subsets of $\mathbb{R}$. So Lemma 4.4 ensures that $\int K(x,y_{0},c_{n})\,d\mu_{n}(x)\to\int K(x,y_{0},c)\,d\mu(x)$. Now it follows from Definition 2.3 that $[\mu_{n}]_{c_{n}}<\lambda$ for all sufficiently large $n$. Since $\lambda>[\mu]_{c}$ is arbitrary, we conclude that $\limsup_{n\to\infty}[\mu_{n}]_{c_{n}}\leq[\mu]_{c}$, as we wanted to show. ∎

Proof of part (4.26) of Theorem 4.6.

First, we prove that, for all $(\mu,c)\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\times[0,1]$,

(4.29) $\mu(0)=1-c\text{ and }[\mu]_{c}>0\quad\Rightarrow\quad\liminf_{(\tilde{\mu},\tilde{c})\to(\mu,c)}[\tilde{\mu}]_{\tilde{c}}<[\mu]_{c}\,.$

Consider the measures $\mu_{n}\coloneqq\frac{n-1}{n}\,\mu+\frac{1}{n}\,\delta_{0}$. Clearly, $\mu_{n}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$. Since $\mu_{n}(0)=1-c+c/n>1-c$ (note that $[\mu]_{c}>0$ forces $c>0$), Proposition 2.11.(b) gives $[\mu_{n}]_{c}=0$. This proves (4.29). For $c\in[0,1)$, the converse is a direct consequence of Theorem 4.5, since the topology on $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$ is stronger. If $c=1$ and $[\mu]_{1}=0$, then the result is obvious. If $c=1$ and $[\mu]_{1}>0$, then, as the example above shows, lower semicontinuity indeed fails. ∎

4.3. Continuity for extremal values of the parameter

Theorem 4.6 shows that the HS barycenter map on $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\times[0,1]$ is not continuous at the pairs $(\mu,0)$ (except if $[\mu]_{0}=\infty$), nor at the pairs $(\mu,1)$ (except if $[\mu]_{1}=0$). Let us see that continuity can be “recovered” if we impose extra integrability conditions and work with stronger topologies.

If we use the standard distance $\mathrm{d}(x,y)\coloneqq|x-y|$ on the half-line $X=[0,+\infty)$, then the resulting $1$-Wasserstein space is denoted $\mathcal{P}_{1}(\mathbb{R})$. On the other hand, using the distance

(4.30) $\mathrm{d}_{\mathrm{g}}(x,y)\coloneqq\left|\log x-\log y\right|\quad$ on the open half-line $X=(0,+\infty)\,,$

the corresponding $1$-Wasserstein space will be denoted $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$. We consider the latter space as a subset of $\mathcal{P}(\mathbb{R})$, since any measure $\mu$ on $(0,+\infty)$ can be extended to $[0,+\infty)$ by setting $\mu(0)=0$. The topologies we have just defined on the spaces $\mathcal{P}_{1}(\mathbb{R})$ and $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$ are stronger than the topologies induced by $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$; in other words, the inclusion maps below are continuous:

(4.31) $\mathcal{P}_{1}(\mathbb{R})\hookrightarrow\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\hookrightarrow\mathcal{P}(\mathbb{R})\,,\qquad\mathcal{P}_{\mathrm{g}}(\mathbb{R})\hookrightarrow\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\,.$

Note that the “arithmetic barycenter” $[\cdot]_{0}$ is finite on $\mathcal{P}_{1}(\mathbb{R})$, while the “geometric barycenter” $[\cdot]_{1}$ is finite and non-zero on $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$.

Finally, let us establish continuity of the HS barycenter for the extremal values of the parameter with respect to these new topologies:

Proposition 4.7.

Consider a sequence $(\mu_{n},c_{n})$ in $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\times[0,1]$.

  (a) If $(\mu_{n},c_{n})\to(\mu,0)$ in $\mathcal{P}_{1}(\mathbb{R})\times[0,1)$, then $[\mu_{n}]_{c_{n}}\to[\mu]_{0}$.

  (b) If $(\mu_{n},c_{n})\to(\mu,1)$ in $\mathcal{P}_{\mathrm{g}}(\mathbb{R})\times(0,1]$, then $[\mu_{n}]_{c_{n}}\to[\mu]_{1}$.

Proof of part (a) of Proposition 4.7.

Note that if $c_{n}=0$ for every $n\in\mathbb{N}$, the result follows directly from the definition of the topology in $\mathcal{P}_{1}(\mathbb{R})$ (use characterization (c) of Theorem 4.3). So we may assume that $c_{n}>0$ for every $n\in\mathbb{N}$. It is a consequence of Theorem 4.5 that:

(4.32) $\liminf_{n\to\infty}[\mu_{n}]_{c_{n}}\geq[\mu]_{0}\,.$

The same proof as that of part (4.25) of Theorem 4.6 yields

(4.33) $\limsup_{n\to\infty}[\mu_{n}]_{c_{n}}\leq[\mu]_{0}\,.$

Indeed, in the topology of $\mathcal{P}_{1}(\mathbb{R})$ the HS kernels $K(x,y,c_{n})$ satisfy the assumptions of Lemma 4.4. For this, it suffices to notice that, for any fixed value $y_{0}>0$, the sequence of continuous functions $\frac{|K(x,y_{0},c_{n})|}{1+x}$ is uniformly bounded, and that $K(x,y_{0},c_{n})\to K(x,y_{0},0)$ uniformly on compact subsets of $\mathbb{R}$. ∎

Proof of part (b) of Proposition 4.7.

In the case that $c_{n}=1$ for every $n\in\mathbb{N}$, the result follows directly from the topology in $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$ (use characterization (c) of Theorem 4.3). So we may assume that $c_{n}<1$ for every $n$. It is a consequence of part (4.25) of Theorem 4.6 that:

(4.34) $\limsup_{n\to\infty}[\mu_{n}]_{c_{n}}\leq[\mu]_{1}\,.$

Recall that the HS barycenter is decreasing in the variable $c$; see Proposition 2.6.(e). In particular, $[\mu_{n}]_{c_{n}}\geq[\mu_{n}]_{1}$ for every $n\in\mathbb{N}$. Noticing that $\log x$ is a test function for convergence in the topology of $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$, we obtain:

(4.35) $\liminf_{n\to\infty}[\mu_{n}]_{c_{n}}\geq\liminf_{n\to\infty}[\mu_{n}]_{1}=\exp\left(\liminf_{n\to\infty}\int\log x\,d\mu_{n}\right)=\exp\int\log x\,d\mu=[\mu]_{1}\,.$

The result follows by combining (4.34) and (4.35). ∎

The following observation complements part (b) of Proposition 4.7, since it provides a sort of lower semicontinuity property at $c=1$ under a weaker integrability condition:

Lemma 4.8.

Let $c\in[0,1)$ and let $\mu\in\mathcal{P}(\mathbb{R})$ be such that $\log^{-}(x)\in L^{1}(\mu)$. Then:

(4.36) $[\mu]_{c}\geq\exp\int\left(\log^{+}(x)-\log^{-}(x)\right)d\mu(x)\,.$

Proof.

For every $x>0$ and $y>0$ we have $\frac{\partial K}{\partial y}(x,y,c)=y^{-1}\big(1-\frac{x}{cx+(1-c)y}\big)$, which is negative for $y<x$ and positive for $y>x$; hence $K(x,y,c)\geq K(x,x,c)=\log x=\log^{+}(x)-\log^{-}(x)$, an inequality that holds trivially when $x=0$ as well. Integrating in $x$ and then taking the infimum over $y>0$ in Definition 2.3, the Lemma follows. ∎

5. Ergodic theorems for symmetric and HS means

Symmetric means (3.3) are only defined for lists of nonnegative numbers. On the other hand, HS barycenters are defined for probability measures on $\mathbb{R}$, and are therefore much more flexible objects. In particular, there is an induced concept of HS mean of a list of nonnegative numbers, which we have already introduced in (3.1). We can also define the HS mean of a function:

Definition 5.1.

If $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space, $f\colon\Omega\to\mathbb{R}$ is a measurable nonnegative function, and $c\in[0,1]$, then the Halász–Székely mean (or HS mean) with parameter $c$ of the function $f$ with respect to the probability measure $\mathbb{P}$ is:

(5.1) $[f\mid\mathbb{P}]_{c}\coloneqq[f_{*}\mathbb{P}]_{c}\,,$

that is, the HS barycenter with parameter $c$ of the push-forward measure $f_{*}\mathbb{P}$ on $\mathbb{R}$. In the case $c=1$, we require that $\log f$ be semi-integrable.
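For a finite sample with uniform weights, the HS mean can be approximated directly from the variational formula $\log[\mu]_{c}=\inf_{y>0}\int K(x,y,c)\,d\mu$ (cf. (6.1)), with the kernel $K(x,y,c)=\log y+c^{-1}\log(1-c+cx/y)$ used throughout the paper. The sketch below is our own illustration (function names are not from the paper); it uses the facts that the infimum is attained at $y=\eta(\mu,c)$, which for a positive sample lies between the smallest and largest points, and that the limiting cases $c=0$ and $c=1$ give the arithmetic and geometric means.

```python
import math

def hs_kernel(x, y, c):
    """HS kernel K(x, y, c) = log y + (1/c) log(1 - c + c*x/y), for 0 < c < 1."""
    return math.log(y) + math.log(1.0 - c + c * x / y) / c

def hs_mean(xs, c, iters=300):
    """HS mean of a positive finite sample with uniform weights, c in [0, 1]."""
    n = len(xs)
    if c == 0.0:
        return sum(xs) / n                                  # arithmetic mean
    if c == 1.0:
        return math.exp(sum(math.log(x) for x in xs) / n)   # geometric mean

    def obj(y):
        # Average of the kernel: the quantity whose infimum over y gives log[mu]_c.
        return sum(hs_kernel(x, y, c) for x in xs) / n

    # Ternary search: obj is unimodal in y, with minimizer in [min xs, max xs].
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return math.exp(obj((lo + hi) / 2.0))
```

By Proposition 2.6.(e), the values interpolate monotonically between the geometric mean ($c=1$) and the arithmetic mean ($c=0$).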

For arithmetic means, the classical ergodic theorem of Birkhoff states the equality between limit time averages and spatial averages. From the probabilistic viewpoint, Birkhoff’s theorem is the strong law of large numbers. We prove an ergodic theorem that applies simultaneously to symmetric and HS means, and extends results of [HS, vE]:

Theorem 5.2.

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, let $T\colon\Omega\to\Omega$ be an ergodic measure-preserving transformation, and let $f\colon\Omega\to\mathbb{R}$ be a nonnegative measurable function. Then there exists a measurable set $R\subseteq\Omega$ with $\mathbb{P}(R)=1$ with the following properties. For any $\omega\in R$, for any $c\in[0,1]$ such that

(5.2) $c\neq 1-\mathbb{P}(f^{-1}(0))$

and

(5.3) $c<1\quad\text{unless }\log f\text{ is semi-integrable,}$

and for any sequence $(c_{n})$ in $[0,1]$ tending to $c$, we have:

(5.4) $\lim_{n\to\infty}\mathsf{hsm}_{c_{n}}\big(f(\omega),f(T\omega),\dots,f(T^{n-1}\omega)\big)=[f\mid\mathbb{P}]_{c}\,;$

furthermore, for any sequence $(k_{n})$ of integers such that $1\leq k_{n}\leq n$ and $k_{n}/n\to c$, we have:

(5.5) $\lim_{n\to\infty}\mathsf{sym}_{k_{n}}\big(f(\omega),f(T\omega),\dots,f(T^{n-1}\omega)\big)=[f\mid\mathbb{P}]_{c}\,.$
Remark 5.3.

Since we allow HS means to take the value $\infty$, we do not need integrability conditions as in [HS, vE], except for the unavoidable hypothesis (5.3). In the supercritical case $\mathbb{P}(f^{-1}(0))>1-c$, both limits (5.4) and (5.5) are almost surely attained in finite time. In the critical case $\mathbb{P}(f^{-1}(0))=1-c$, strong convergence does not necessarily hold, and the values $\mathsf{sym}_{k_{n}}\big(f(\omega),\dots,f(T^{n-1}\omega)\big)$ may oscillate. However, in the IID setting, van Es proved that the sequence of symmetric means converges in distribution, provided that the sequence $(\sqrt{n}(k_{n}/n-c))$ converges in $[-\infty,\infty]$: see [vE, Theorem A1 (b)].

As we will soon see, part (5.4) of Theorem 5.2 is obtained using the results about continuity of the HS barycenter with respect to various topologies proved in Section 4, and then part (5.5) follows from the inequalities of Theorem 3.4 and Remark 3.6.

To begin the proof, let us fix $(\Omega,\mathcal{F},\mathbb{P})$, $T$, and $f$ as in the statement, and let $\mu\coloneqq f_{*}\mathbb{P}\in\mathcal{P}(\mathbb{R})$ denote the push-forward measure. Given $\omega\in\Omega$, we consider the sequence of associated sample measures:

(5.6) $\mu_{n}^{\omega}\coloneqq\frac{\delta_{f(\omega)}+\delta_{f(T(\omega))}+\dots+\delta_{f(T^{n-1}(\omega))}}{n}\,.$

As the next result shows, these sample measures converge almost surely. (Cf. [Pa, Theorem II.7.1], which contains a similar result with essentially the same proof.)

Lemma 5.4.

There exists a measurable set $R\subseteq\Omega$ with $\mathbb{P}(R)=1$ such that for every $\omega\in R$, the corresponding sample measures converge weakly to $\mu$:

(5.7) $\mu_{n}^{\omega}\rightharpoonup\mu\,;$

furthermore, stronger convergences may hold according to the function $f$:

  (a) if $\log(1+f)\in L^{1}(\mathbb{P})$, then $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$;

  (b) if $f\in L^{1}(\mathbb{P})$, then $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{1}(\mathbb{R})$;

  (c) if $|\log f|\in L^{1}(\mathbb{P})$, then $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$.

Proof.

Let $\mathcal{C}\subset C_{\mathrm{b}}(\mathbb{R})$ be a countable set of bounded continuous functions which is sufficient to test weak convergence, i.e., with property (4.1). For each $\phi\in\mathcal{C}$, applying Birkhoff’s ergodic theorem to the function $\phi\circ f$ we obtain a measurable set of full probability such that, for all $\omega$ in it,

(5.8) $\lim_{n\to\infty}\int\phi\,d\mu^{\omega}_{n}=\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\phi(f(T^{i}\omega))=\int\phi\circ f\,d\mathbb{P}=\int\phi\,d\mu\,.$

Since $\mathcal{C}$ is countable, we can choose a single measurable set $R$ of full probability that works for all $\phi\in\mathcal{C}$. Then we obtain $\mu_{n}^{\omega}\rightharpoonup\mu$ for all $\omega\in R$. To obtain the stronger convergences, we apply Birkhoff’s theorem to the functions $\log(1+f)$, $f$, and $|\log f|$, provided they are integrable, and reduce the set $R$ accordingly. If, for example, $f$ is integrable, then for all $\omega\in R$ we have:

(5.9) $\lim_{n\to\infty}\int x\,d\mu^{\omega}_{n}(x)=\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}f(T^{i}\omega)=\int f\,d\mathbb{P}=\int x\,d\mu(x)\,.$

Applying part (c) of Theorem 4.3 with $x_{0}=0$ and $\mathrm{d}(x,x_{0})=x$, we conclude that $\mu_{n}^{\omega}$ converges to $\mu$ in the topology of $\mathcal{P}_{1}(\mathbb{R})$. The assertions about convergence in $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$ and $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$ are proved analogously, using instead the corresponding distances (4.11) and (4.30). ∎
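Lemma 5.4 is, in essence, Birkhoff's theorem applied simultaneously to countably many test functions. As a quick self-contained illustration (our own, not from the paper), take the circle rotation $T(\omega)=\omega+\alpha \bmod 1$ with irrational $\alpha$, which is ergodic for Lebesgue measure, and compare a Birkhoff average with the corresponding space average:

```python
import math

def birkhoff_average(phi, omega, alpha, n):
    """(1/n) * sum_{i < n} phi(T^i(omega)) for the rotation T(w) = w + alpha mod 1."""
    total, w = 0.0, omega
    for _ in range(n):
        total += phi(w)
        w = (w + alpha) % 1.0
    return total / n

alpha = (math.sqrt(5.0) - 1.0) / 2.0      # irrational rotation number
avg = birkhoff_average(lambda w: w, 0.1, alpha, 200000)
# Space average of phi(w) = w over [0, 1) with Lebesgue measure is 1/2,
# so avg should be close to 0.5.
```

Taking $\phi$ to range over a countable convergence-determining family, exactly as in the proof, one obtains the weak convergence (5.7) of the sample measures.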

Proof of Theorem 5.2.

Let $R$ be the set given by Lemma 5.4. By the semi-integrable version of Birkhoff’s theorem (see e.g. [Kr, p. 15]), we can reduce $R$ if necessary and assume that for all $\omega\in R$,

(5.10) $\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\log^{\pm}(f(T^{i}\omega))=\int\log^{\pm}(f(\omega))\,d\mathbb{P}(\omega)\,.$

Fix a point $\omega\in R$ and a number $c\in[0,1]$ satisfying conditions (5.2) and (5.3). Consider any sequence $(c_{n})$ in $[0,1]$ converging to $c$. Let us prove (5.4), or equivalently,

(5.11) $[\mu_{n}^{\omega}]_{c_{n}}\to[\mu]_{c}\,.$

There are several cases to be considered, and in all but the last case we will use Lemma 5.4:

  • First case: $0\leq c<1$ and $[\mu]_{c}=\infty$. Since $\mu_{n}^{\omega}\rightharpoonup\mu$, (5.11) is a consequence of Theorem 4.5.

  • Second case: $0<c<1$ and $[\mu]_{c}<\infty$. Then $\log(1+f)\in L^{1}(\mathbb{P})$. Therefore $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$, and Theorem 4.6 implies (5.11).

  • Third case: $c=0$ and $[\mu]_{0}<\infty$. Then $f\in L^{1}(\mathbb{P})$, and hence $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{1}(\mathbb{R})$. So (5.11) follows from Proposition 4.7.(a).

  • Fourth case: $c=1$ and $[\mu]_{1}=0$. Then $\log(1+f)\in L^{1}(\mathbb{P})$. Thus $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$, and Theorem 4.6 yields (5.11).

  • Fifth case: $c=1$ and $0<[\mu]_{1}<\infty$. Then $|\log f|\in L^{1}(\mathbb{P})$. Therefore $\mu_{n}^{\omega}\to\mu$ in the topology of $\mathcal{P}_{\mathrm{g}}(\mathbb{R})$, and (5.11) becomes a consequence of Proposition 4.7.(b).

  • Sixth case: $c=1$ and $[\mu]_{1}=\infty$. Then $\log^{-}(f)$ is integrable, but $\log^{+}(f)$ is not. By Lemma 4.8 (when $c_{n}<1$; when $c_{n}=1$ the following holds with equality, by the definition of $[\cdot]_{1}$):

    (5.12) $\log[\mu_{n}^{\omega}]_{c_{n}}\geq\int\left(\log^{+}(x)-\log^{-}(x)\right)d\mu_{n}^{\omega}(x)$

    (5.13) $=\frac{1}{n}\sum_{i=0}^{n-1}\log^{+}(f(T^{i}\omega))-\frac{1}{n}\sum_{i=0}^{n-1}\log^{-}(f(T^{i}\omega))\,,$

    which by (5.10) tends to $\infty$. This proves (5.11) in the last case.

Part (5.4) of the Theorem is proved; let us now use it to prove part (5.5). Consider a sequence $(k_{n})$ of integers such that $1\leq k_{n}\leq n$ and $c_{n}\coloneqq k_{n}/n$ tends to $c$. By Theorem 3.4,

(5.14) $[\mu_{n}^{\omega}]_{c_{n}}\leq\mathsf{sym}_{k_{n}}\big(f(\omega),f(T\omega),\dots,f(T^{n-1}\omega)\big)\leq\frac{\binom{n}{k_{n}}^{-1/k_{n}}}{B(c_{n})}\,[\mu_{n}^{\omega}]_{c_{n}}\,.$

If $[\mu]_{c}=\infty$ then, by (5.11), $[\mu_{n}^{\omega}]_{c_{n}}\to\infty$, and the first inequality forces the symmetric means to tend to $\infty$ as well. So let us assume that $[\mu]_{c}$ is finite. If $c>0$ then, by Lemma 3.5, the fraction on the RHS converges to $1$ as $n\to\infty$, and therefore we obtain the desired limit (5.5). If $c=0$, then we appeal to Maclaurin’s inequality in the form

(5.15) $\mathsf{sym}_{k_{n}}\big(f(\omega),f(T\omega),\dots,f(T^{n-1}\omega)\big)\leq\mathsf{sym}_{1}\big(f(\omega),f(T\omega),\dots,f(T^{n-1}\omega)\big)\,.$

So:

(5.16) $[\mu_{n}^{\omega}]_{c_{n}}\leq\mathsf{sym}_{k_{n}}\big(f(\omega),f(T\omega),\dots,f(T^{n-1}\omega)\big)\leq[\mu_{n}^{\omega}]_{0}\,.$

Since (5.11) also holds with $c_{n}\equiv 0$, we see that all three terms converge together to $[\mu]_{c}$, thus proving (5.5) also in the case $c=0$. ∎
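The symmetric means used here can be computed stably from the coefficients of $\prod_{i}(1+x_{i}t)$, assuming the standard normalization $\mathsf{sym}_{k}(\underline{x})=\big(e_{k}(\underline{x})/\binom{n}{k}\big)^{1/k}$ for (3.3), where $e_{k}$ is the $k$th elementary symmetric polynomial. A small sketch (ours, for illustration), together with a check of the Maclaurin monotonicity invoked in (5.15):

```python
from math import comb

def elementary_symmetric(xs):
    """Coefficients e_0, ..., e_n of prod_i (1 + x_i * t), via the usual recurrence."""
    e = [1.0] + [0.0] * len(xs)
    for j, x in enumerate(xs, start=1):
        for k in range(j, 0, -1):
            e[k] += x * e[k - 1]
    return e

def sym_mean(xs, k):
    """k-th symmetric mean: (e_k(xs) / C(n, k)) ** (1/k)."""
    n = len(xs)
    e = elementary_symmetric(xs)
    return (e[k] / comb(n, k)) ** (1.0 / k)

# Maclaurin's inequality: sym_1 >= sym_2 >= ... >= sym_n,
# interpolating from the arithmetic mean down to the geometric mean.
maclaurin_chain = [sym_mean([1.0, 2.0, 4.0], k) for k in (1, 2, 3)]
```

Note that $\mathsf{sym}_{1}$ is the arithmetic mean and $\mathsf{sym}_{n}$ is the geometric mean, matching the two extreme inequalities used in the proof above.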

Like Birkhoff’s ergodic theorem itself, Theorem 5.2 admits generalizations in numerous directions. For example, part (5.4) can be easily adapted to flows or semiflows (actions of the group $\mathbb{R}$ or of the semigroup $[0,+\infty)$). One can also consider actions of amenable groups, as in [Au, Na]. We shall not pursue these matters. In another direction, let us note that Central Limit Theorems for symmetric means of i.i.d. random variables have been proved by Székely [Sz2] and van Es [vE].

A weaker version of Theorem 5.2, in which the function $f$ is assumed to be bounded away from zero and infinity, was obtained in [BIP, Theorem 5.1] as a corollary of a fairly general pointwise ergodic theorem: the Law of Large Permanents [BIP, Theorem 4.1]. We now briefly discuss a generalization of that result obtained by Balogh and Nguyen [BN, Theorem 1.6]. Suppose that $T$ is an ergodic measure-preserving action of the semigroup $\mathbb{N}^{2}$ on the space $(X,\mu)$. Given an observable $g\colon X\to\mathbb{R}$ and a point $x\in X$, we define an infinite matrix whose $(i,j)$-entry is $g(T^{(i,j)}x)$. Consider square truncations of this matrix, and take the limit of the corresponding permanental means as the size of the square tends to infinity. Balogh and Nguyen prove that this limit exists $\mu$-almost everywhere; moreover, they identify it: it is a functional scaling mean, a far-reaching generalization of the matrix scaling mean (3.26): see [BIP, Section 3.1].

6. Concavity properties of the HS barycenter

In Section 4, we have studied properties of the HS barycenter that rely on topological structures. In this section, we discuss properties that are related to affine (i.e. convex) structures.

6.1. Basic concavity properties

Let us first consider the HS barycenter as a function of the measure.

Proposition 6.1.

For all $c\in[0,1]$, the function $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\mapsto[\mu]_{c}$ is log-concave.

Proof.

By definition,

(6.1) $\log[\mu]_{c}=\inf_{y>0}\int K(x,y,c)\,d\mu(x)\,.$

For each $c$ and $y$, the function $\mu\mapsto\int K(x,y,c)\,d\mu(x)$ is affine. Since an infimum of affine functions is concave, we conclude that $\log[\mu]_{c}$ is concave as a function of $\mu$. ∎

Next, let us consider the HS barycenter as a function of the parameter.

Proposition 6.2.

For all $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\smallsetminus\{\delta_{0}\}$, the function $c\in[0,1]\mapsto[\mu]_{c}^{c}$ is log-concave.

This Proposition can be regarded as a version of Newton’s inequality, which says that for every $\underline{x}=(x_{1},\dots,x_{n})$, the function

(6.2) $k\in\{1,\dots,n\}\mapsto[\mathsf{sym}_{k}(\underline{x})]^{k}$

is log-concave (see [HLP, Theorem 51, p. 52] or [Bu, Theorem 1(1), p. 324]).

Proof of Proposition 6.2.

Note the following trait of the HS kernel: for all $x\geq 0$ and $y>0$, the function

(6.3) $c\in[0,1]\mapsto cK(x,y,c)=c\log y+\log(cy^{-1}x+1-c)\in[-\infty,+\infty)$

is concave. Integrating over $x$ with respect to the given $\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\smallsetminus\{\delta_{0}\}$, and then taking the infimum over $y$, we conclude that the function $c\in[0,1]\mapsto c\log[\mu]_{c}$ is concave, as we wanted to show. ∎
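Newton's inequality for the normalized elementary symmetric functions $p_{k}\coloneqq e_{k}(\underline{x})/\binom{n}{k}$ reads $p_{k}^{2}\geq p_{k-1}p_{k+1}$, which is exactly the log-concavity of (6.2) under the standard normalization $\mathsf{sym}_{k}^{k}=p_{k}$ (an assumption of this illustration, which is ours and not part of the paper):

```python
from math import comb

def normalized_elem(xs):
    """p_k = e_k(xs) / C(n, k) for k = 0..n, so that p_k = sym_k(xs) ** k."""
    n = len(xs)
    e = [1.0] + [0.0] * n
    for j, x in enumerate(xs, start=1):   # build e_k via the product recurrence
        for k in range(j, 0, -1):
            e[k] += x * e[k - 1]
    return [e[k] / comb(n, k) for k in range(n + 1)]

xs = [0.5, 1.0, 3.0, 7.0]
p = normalized_elem(xs)
# Newton's inequality: p_k^2 >= p_{k-1} * p_{k+1}, i.e. k -> sym_k^k is log-concave.
newton_holds = all(p[k] ** 2 >= p[k - 1] * p[k + 1] - 1e-12
                   for k in range(1, len(xs)))
```

Proposition 6.2 is the continuous analogue: the role of the discrete parameter $k/n$ is played by $c$.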

Recall from Definition 5.1 that the HS mean $[f\mid\mathbb{P}]_{c}$ of a function $f$ with respect to a probability measure $\mathbb{P}$ is simply the HS barycenter of the push-forward $f_{*}\mathbb{P}$. Let us now investigate this mean as a function of $f$. The same argument as in the proof of Proposition 6.2 shows that $f\mapsto[f\mid\mathbb{P}]_{c}^{c}$ is log-concave. However, we are able to show more:

Proposition 6.3.

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space. Let $F$ be the set of nonnegative measurable functions $f$ such that $\log(1+f)\in L^{1}(\mathbb{P})$. For every $c\in(0,1]$, the function $f\in F\mapsto[f\mid\mathbb{P}]_{c}^{c}$ is concave.

This is a consequence of the fact that $[f\mid\mathbb{P}]_{c}^{c}$ is a functional scaling mean (see [BIP]), but for the convenience of the reader we provide a self-contained proof. We start with the following observation: if $G$ is the set of positive measurable functions $g$ such that $\log g\in L^{1}(\mathbb{P})$, then for all $g\in G$,

(6.4) $\exp\int\log g\,d\mathbb{P}=\inf_{h\in G}\frac{\int gh\,d\mathbb{P}}{\exp\int\log h\,d\mathbb{P}}\,,$

with the infimum being attained at $h=1/g$. Indeed, this is just a reformulation of the inequality between arithmetic and geometric means.
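Identity (6.4) is easy to test numerically on a finite probability space: the ratio equals the geometric mean of $g$ at $h=1/g$, and dominates it for any other $h$ (for instance, $h\equiv 1$ gives the arithmetic mean of $g$). A minimal sketch (ours, for illustration):

```python
import math
import random

random.seed(0)
n = 6
prob = [1.0 / n] * n                       # uniform probability on 6 points
g = [random.uniform(0.2, 5.0) for _ in range(n)]

def geo(v):
    """exp of the integral of log v, i.e. the weighted geometric mean."""
    return math.exp(sum(p * math.log(x) for p, x in zip(prob, v)))

def ratio(h):
    """(integral of g*h) / exp(integral of log h), the quantity minimized in (6.4)."""
    num = sum(p * x * y for p, x, y in zip(prob, g, h))
    return num / geo(h)

at_opt = ratio([1.0 / x for x in g])       # attains the infimum: equals geo(g)
other = ratio([1.0] * n)                   # h constant: equals the arithmetic mean of g
```

The design of (6.4) is what makes the proof below work: it rewrites a geometric mean as an infimum of functionals that are affine in $g$.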

Proof of Proposition 6.3.

Let us first consider the case $c\in(0,1)$. For every fixed value of $y>0$, the function

(6.5) $g(\omega)\coloneqq\exp(cK(f(\omega),y,c))=y^{c}\big(cy^{-1}f(\omega)+1-c\big)$

belongs to the set $G$ defined above. Using identity (6.4),

(6.6) $\exp\int cK(f(\omega),y,c)\,d\mathbb{P}(\omega)=\inf_{h\in G}\frac{\int y^{c}(cy^{-1}f(\omega)+1-c)h(\omega)\,d\mathbb{P}(\omega)}{\exp\int\log h\,d\mathbb{P}}\,.$

Consider this expression as a function of $f$; since it is an infimum of affine functions, it is concave. Taking the infimum over $y>0$, we conclude that $f\mapsto[f\mid\mathbb{P}]_{c}^{c}$ is concave, as claimed.

The proof of the remaining case $c=1$ is similar, but then we need to extend identity (6.4) to functions $g$ in $F$; we leave the details to the reader. ∎
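Proposition 6.3 can also be checked numerically on a finite probability space, by computing $[f\mid\mathbb{P}]_{c}^{c}$ from the variational formula $\log[\mu]_{c}=\inf_{y>0}\int K(x,y,c)\,d\mu$ with the kernel used throughout the paper. A rough sketch (our own, not the paper's method), comparing the two sides of the concavity inequality at the midpoint of two sample functions:

```python
import math

def hs_mean_pow_c(fs, c, iters=400):
    """[f | P]_c ** c for uniform P on a finite set, with 0 < c < 1.

    Uses log[f|P]_c = inf_{y>0} mean_i K(f_i, y, c), where
    K(x, y, c) = log y + (1/c) log(1 - c + c*x/y).
    """
    n = len(fs)

    def obj(y):
        return sum(math.log(y) + math.log(1 - c + c * x / y) / c for x in fs) / n

    lo, hi = min(fs), max(fs)
    for _ in range(iters):          # ternary search; obj is unimodal in y
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return math.exp(obj((lo + hi) / 2.0)) ** c

c = 0.4
f0 = [1.0, 3.0, 2.0, 8.0]
f1 = [4.0, 0.5, 6.0, 1.0]
mid = [(a + b) / 2.0 for a, b in zip(f0, f1)]
lhs = hs_mean_pow_c(mid, c)                                  # value at the midpoint
rhs = 0.5 * (hs_mean_pow_c(f0, c) + hs_mean_pow_c(f1, c))    # midpoint of values
# Proposition 6.3 predicts lhs >= rhs.
```

Concavity fails in general for $f\mapsto[f\mid\mathbb{P}]_{c}$ itself; raising to the power $c$ is essential.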

6.2. Finer results

For the remainder of this section, we assume that the parameter $c$ is in the range $0<c<1$. Let us consider the HS barycenter as a function of the measure again. We define the subcritical locus as the following (convex) subset of $\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$:

(6.7) $\mathcal{S}_{c}\coloneqq\big\{\mu\in\mathcal{P}_{\mathrm{HS}}(\mathbb{R})\;;\;\mu(0)<1-c\big\}\,.$

The function $[\cdot]_{c}$ restricted to the subcritical locus is well-behaved: it is analytic, in a sense that we will make precise below. By Proposition 6.1, this function is log-concave. Nevertheless, we will show that it is not strictly log-concave.

Let us begin with an abstract definition.

Definition 6.4.

A real-valued function $f$ defined on a convex subset $C$ of a real vector space is called quasi-affine if, for all $x$, $y\in C$,

(6.8) $f\left([x,y]\right)\subseteq[f(x),f(y)]\,,$

where $[x,y]\coloneqq\{(1-t)x+ty\;;\;0\leq t\leq 1\}$, and the right-hand side is the interval with extremes $f(x)$, $f(y)$, independently of their order.

The explanation for the terminology is that quasi-affine functions are exactly those that are simultaneously quasiconcave and quasiconvex (for the latter concepts see e.g. [ADSZ, Chapter 3]). Note that the level sets of a quasi-affine function are convex.

For $\mu$ in the subcritical locus $\mathcal{S}_{c}$, the HS barycenter $[\mu]_{c}$ can be computed using Proposition 2.7, and this computation relies on finding the solution $\eta=\eta(\mu,c)$ of equation (2.16). Since the integrand in (2.16) is monotonic with respect to $\eta$, the function $\eta(\cdot,c)\colon\mathcal{S}_{c}\to\mathbb{R}$ is quasi-affine. Concerning the barycenter itself, we have:

Proposition 6.5.

Let $\mu_{0}$, $\mu_{1}\in\mathcal{S}_{c}$. Then the restriction of the function $[\cdot]_{c}$ to the segment $[\mu_{0},\mu_{1}]$ is log-affine if $\eta(\mu_{0},c)=\eta(\mu_{1},c)$, and is strictly log-concave otherwise.

See Fig. 3.

Figure 3. The figure shows level curves of the functions $[\cdot]_{c}$ (in red) and $\eta(\cdot,c)$ (in blue) on a $2$-simplex $\Delta\subset\mathcal{P}_{\mathrm{HS}}(\mathbb{R})$ whose vertices are distinct delta measures $\delta_{x_{1}}$, $\delta_{x_{2}}$, $\delta_{x_{3}}$. Specifically, we took $c=2/3$, $x_{1}=1$, $x_{2}=2^{4}$, $x_{3}=2^{8}$, and the plotted levels for each function are $2^{0.5}$, $2$, …, $2^{7.5}$. The function $\eta(\cdot,c)$ is quasi-affine, so the blue level curves are straight segments. The HS barycenter $[\cdot]_{c}$ is log-affine along each level curve of $\eta(\cdot,c)$, and so each blue segment is cut by the red curves into subsegments of equal size (except for the extremes). On the other hand, the HS barycenter $[\cdot]_{c}$ restricted to any segment $S$ not contained in a level set of $\eta(\cdot,c)$ is strictly log-concave.
Proof.

As observed above, the function η(,c)\eta(\mathord{\cdot},c) on 𝒮c\mathcal{S}_{c} is quasi-affine, and in particular its level sets are convex. As a consequence of (2.10), along each level set of η(,c)\eta(\mathord{\cdot},c), the function []c[\mathord{\cdot}]_{c} is log-affine, and so not strictly log-concave there. This proves the first part of the Proposition.

To prove the second part, consider μ0\mu_{0}, μ1𝒮c\mu_{1}\in\mathcal{S}_{c} such that η(μ0,c)η(μ1,c)\eta(\mu_{0},c)\neq\eta(\mu_{1},c), and parametrize the segment [μ0,μ1][\mu_{0},\mu_{1}] by μ(t)(1t)μ0+tμ1\mu(t)\coloneqq(1-t)\mu_{0}+t\mu_{1}, t[0,1]t\in[0,1]. Then, Lemma 6.6 below ensures that the second derivative of the function tlog[μ(t)]ct\mapsto\log[\mu(t)]_{c} is nonpositive and vanishes at finitely many points (if any). So the function t[μ(t)]ct\mapsto[\mu(t)]_{c} is strictly log-concave. This proves Proposition 6.5, modulo the Lemma. ∎

Lemma 6.6.

Suppose II\subset\mathbb{R} is an interval and tIμ(t)𝒮ct\in I\mapsto\mu(t)\in\mathcal{S}_{c} is an affine mapping. Write μ=μ(t)\mu=\mu(t), η=η(μ(t),c)\eta=\eta(\mu(t),c). Then η\eta and [μ]c[\mu]_{c} are analytic functions of tt. Furthermore, letting dot denote derivative with respect to tt, the following formula holds:

(6.9) (log[μ]c)¨=(1c)η˙2ηxdμ(x)(cx+(1c)η)2.(\log[\mu]_{c})\ddot{\ }=-\frac{(1-c)\dot{\eta}^{2}}{\eta}\int\frac{x\,d\mu(x)}{\left(cx+(1-c)\eta\right)^{2}}\,.

The integral is strictly positive (since μδ0\mu\neq\delta_{0}), so formula (6.9) tells us that, at any point μ\mu in 𝒮c\mathcal{S}_{c}, the Hessian of the function log[]c\log[\mathord{\cdot}]_{c} is negative semidefinite (not a surprise, given Proposition 6.1), and has the same kernel as the derivative of the function η(,c)\eta(\mathord{\cdot},c) at the same point.

Proof of Lemma 6.6.

Let us omit the parameter cc in the formulas, so K(x,y)=K(x,y,c)K(x,y)=K(x,y,c). As in the proof of Proposition 2.7, we consider the following functions:

(6.10) Δ(x,y)yKy(x,y)andψ(t,y)Δ(x,y)𝑑μ(t)(x).\Delta(x,y)\coloneqq yK_{y}(x,y)\quad\text{and}\quad\psi(t,y)\coloneqq\int\Delta(x,y)\,d\mu^{(t)}(x)\,.

(We temporarily denote μ(t)\mu(t) by μ(t)\mu^{(t)}.) Then η(t)=η(μ(t),c)\eta(t)=\eta(\mu^{(t)},c) is defined implicitly by ψ(t,η(t))=0\psi(t,\eta(t))=0, for all tIt\in I. The mapping tIμ(t)t\in I\mapsto\mu^{(t)} can be extended uniquely to an affine mapping on \mathbb{R} whose values are signed measures. Inspecting the proof of Proposition 2.7, we see that η(t)\eta(t) is well-defined for all tt in an open interval JIJ\supset I.

The partial derivative Δy\Delta_{y} was computed before (2.13), and satisfies the bounds 0Δy(2cy)10\leq\Delta_{y}\leq(2cy)^{-1}. So we can differentiate under the integral sign and write ψy(t,y)=Δy(x,y)𝑑μ(t)(x)\psi_{y}(t,y)=\int\Delta_{y}(x,y)\,d\mu^{(t)}(x). This derivative is positive, since Δy(x,y)>0\Delta_{y}(x,y)>0 for all x>0x>0 and μ(t)δ0\mu^{(t)}\neq\delta_{0}. Therefore, since ψ\psi is an analytic function on the domain J×J\times\mathbb{R}, the inverse function theorem ensures that η\eta is analytic on JJ. In particular, [μ(t)]c=expK(x,η(t))𝑑μ(t)(x)[\mu^{(t)}]_{c}=\exp\int K(x,\eta(t))\,d\mu^{(t)}(x) is analytic as well, as claimed.

Now we want to prove (6.9); in that formula and in the following calculations, we omit the dependence on tt. First, we differentiate Llog[μ]c=K(x,η)dμL\coloneqq\log[\mu]_{c}=\int K(x,\eta)\,d\mu with respect to tt:

(6.11) L˙=η˙Ky(x,η)𝑑μ+K(x,η)𝑑μ˙\dot{L}=\dot{\eta}\int K_{y}(x,\eta)\,d\mu+\int K(x,\eta)\,d\dot{\mu}

(where μ˙ddtμ(t)\dot{\mu}\coloneqq\frac{d}{dt}\mu(t) is a signed measure), which by (2.9) simplifies to:

(6.12) L˙=K(x,η)𝑑μ˙.\dot{L}=\int K(x,\eta)\,d\dot{\mu}\,.

Differentiating again, and using that μ¨=0\ddot{\mu}=0 (since tμ(t)t\mapsto\mu(t) is affine), we obtain:

(6.13) L¨=η˙Ky(x,η)𝑑μ˙.\ddot{L}=\dot{\eta}\int K_{y}(x,\eta)\,d\dot{\mu}\,.

On the other hand, differentiating (2.9),

(6.14) η˙Kyy(x,η)𝑑μ+Ky(x,η)𝑑μ˙=0,\dot{\eta}\int K_{yy}(x,\eta)\,d\mu+\int K_{y}(x,\eta)\,d\dot{\mu}=0\,,

so (6.13) can be rewritten as:

(6.15) L¨=η˙2Kyy(x,η)𝑑μ.\ddot{L}=-\dot{\eta}^{2}\int K_{yy}(x,\eta)\,d\mu\,.

Let us transform this expression. Consider the function ΔyKy\Delta\coloneqq yK_{y}, which was introduced before in (2.11). Since Δy=yKyy+Ky\Delta_{y}=yK_{yy}+K_{y}, using (2.9) once again, we obtain Δy(x,η)𝑑μ=ηKyy(x,η)𝑑μ\int\Delta_{y}(x,\eta)\,d\mu=\eta\int K_{yy}(x,\eta)\,d\mu. So equation (6.15) becomes:

(6.16) L¨=η˙2ηΔy(x,η)𝑑μ.\ddot{L}=-\frac{\dot{\eta}^{2}}{\eta}\int\Delta_{y}(x,\eta)\,d\mu\,.

Substituting the expression of Δy\Delta_{y} given by (2.13), we obtain (6.9). ∎

Incidentally, note that it is not true that Kyy0K_{yy}\geq 0 everywhere (KK is not a convex function of yy), so formula (6.15) by itself is not as useful as the final formulas (6.16) and (6.9).
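Formula (6.9) lends itself to a direct numerical check. The following sketch is our illustration, not part of the original argument: the measures μ0\mu_{0}, μ1\mu_{1}, the parameter cc and the point tt are arbitrary choices. It solves the critical equation for η\eta by bisection along an affine path of discrete measures, and compares a central finite difference of the second derivative of log[μ(t)]c\log[\mu(t)]_{c} with the right-hand side of (6.9).

```python
import math

C = 0.5  # the parameter c; an arbitrary choice for this check

def mix(t):
    # affine path mu(t) = (1 - t) mu0 + t mu1 between two discrete measures,
    # each represented as a list of (atom, weight) pairs
    mu0 = [(1.0, 0.5), (4.0, 0.5)]   # uniform on {1, 4}
    mu1 = [(2.0, 0.5), (9.0, 0.5)]   # uniform on {2, 9}
    return [(x, (1 - t) * w) for x, w in mu0] + [(x, t * w) for x, w in mu1]

def eta(mu):
    # solve the critical equation  sum_i w_i eta / (c x_i + (1-c) eta) = 1
    # by bisection; its left-hand side is strictly increasing in eta
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(w * mid / (C * x + (1 - C) * mid) for x, w in mu) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def L(mu):
    # L = log [mu]_c = sum_i w_i K(x_i, eta),
    # with HS kernel K(x, y) = log y + c^{-1} log(c x / y + 1 - c)
    y = eta(mu)
    return sum(w * (math.log(y) + math.log(C * x / y + 1 - C) / C) for x, w in mu)

t, h = 0.3, 1e-3
# finite-difference second derivative of L along the path
lhs = (L(mix(t + h)) - 2 * L(mix(t)) + L(mix(t - h))) / h ** 2
eta0 = eta(mix(t))
etadot = (eta(mix(t + h)) - eta(mix(t - h))) / (2 * h)
# right-hand side of formula (6.9)
rhs = -(1 - C) * etadot ** 2 / eta0 * sum(
    w * x / (C * x + (1 - C) * eta0) ** 2 for x, w in mix(t))
```

The two quantities agree to within the finite-difference truncation error, and both are negative, in accordance with strict log-concavity along this segment.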

For c=0c=0 or c=1c=1, the barycenter []c[\mathord{\cdot}]_{c} is a quasi-affine function. On the other hand, an inspection of Fig. 3 shows that this is not true for c=2/3c=2/3, at least, since the level sets are slightly bent. Using Proposition 6.5, we will formally prove:

Proposition 6.7.

If c(0,1)c\in(0,1), then the function []c[\mathord{\cdot}]_{c} is not quasi-affine.

A proof is given in the next section.

7. A deviation barycenter related to the HS barycenter

There is a large class of means called deviation means, which includes the class of quasiarithmetic means. Let us recall the definition (see [Da, DP]). Let II\subset\mathbb{R} be an open interval. A deviation function is a function E:I×IE\colon I\times I\to\mathbb{R} such that for all xIx\in I, the function yE(x,y)y\mapsto E(x,y) is continuous, strictly decreasing, and vanishes at y=xy=x. Given nn-tuples x¯=(x1,,xn)\underline{x}=(x_{1},\dots,x_{n}) and w¯=(w1,,wn)\underline{w}=(w_{1},\dots,w_{n}) with xiIx_{i}\in I, wi0w_{i}\geq 0, and i=1nwi=1\sum_{i=1}^{n}w_{i}=1, the deviation mean of x¯\underline{x} with weights w¯\underline{w} (with respect to the deviation function EE) is defined as the unique solution yIy\in I of the equation:

(7.1) i=1nwiE(xi,y)=0.\sum_{i=1}^{n}w_{i}E(x_{i},y)=0\,.

In terms of the probability measure μi=1nwiδxi\mu\coloneqq\sum_{i=1}^{n}w_{i}\delta_{x_{i}}, this equation can be rewritten as:

(7.2) E(x,y)𝑑μ(x)=0.\int E(x,y)\,d\mu(x)=0\,.

So it is reasonable to define the deviation barycenter of an arbitrary probability μ𝒫(I)\mu\in\mathcal{P}(I) (with respect to the deviation function EE) as the solution yy of this equation. Of course, existence and uniqueness of such a solution may depend on measurability and integrability conditions, and we will not undertake this investigation here. Nevertheless, let us note that if C𝒫(I)C\subseteq\mathcal{P}(I) is a convex set of probability measures where the deviation barycenter is uniquely defined, then it is a quasi-affine function there. Indeed, for each αI\alpha\in I, the corresponding upper level set is:

(7.3) {μC;E(x,α)𝑑μ(x)0}\left\{\mu\in C\;\mathord{;}\;\int E(x,\alpha)\,d\mu(x)\geq 0\right\}

and so it is convex; similarly for lower level sets.
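To make the definition concrete, equation (7.1) can be solved by plain bisection, since its left-hand side is strictly decreasing in yy. The small sketch below is our addition; the deviation functions and the data are arbitrary choices, used here to recover two classical means.

```python
import math

def deviation_mean(xs, ws, E, lo=1e-9, hi=1e9, iters=100):
    # the unique y with sum_i w_i E(x_i, y) = 0; recall that y -> E(x, y)
    # is continuous and strictly decreasing, so bisection applies
    def F(y):
        return sum(w * E(x, y) for x, w in zip(xs, ws))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if F(mid) > 0:       # F(mid) > 0 means mid is below the mean
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xs, ws = [1.0, 4.0, 16.0], [0.5, 0.25, 0.25]

# E(x, y) = log x - log y gives the weighted geometric mean ...
geo = deviation_mean(xs, ws, lambda x, y: math.log(x) - math.log(y))
# ... and E(x, y) = x - y gives the weighted arithmetic mean
ari = deviation_mean(xs, ws, lambda x, y: x - y)
```

Here `geo` equals the weighted geometric mean 10.5 40.25 160.25=221^{0.5}\,4^{0.25}\,16^{0.25}=2\sqrt{2} and `ari` equals the weighted arithmetic mean 5.55.5.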

Remark 7.1.

Let us mention a related concept (see [EM, AL] and references therein). Let MM be a manifold endowed with an affine (e.g. Riemannian) connection for which the exponential maps expy:TyMM\exp_{y}\colon T_{y}M\to M are diffeomorphisms. Given a probability measure μ𝒫(M)\mu\in\mathcal{P}(M), a solution yMy\in M of the equation

(7.4) expy1(x)𝑑μ(x)=0\int\exp_{y}^{-1}(x)\,d\mu(x)=0

is called an exponential barycenter of μ\mu. (For criteria of existence and uniqueness, see [AL].) The similarity between equations (7.4) and (7.2) is evident. Furthermore, as with deviation barycenters, the level sets of the exponential barycenter are convex. (Since MM has no order structure, it does not make sense to say that the exponential barycenter is quasi-affine.)

We have mentioned that the HS barycenter with parameter c(0,1)c\in(0,1) is not quasi-affine on the subcritical locus. Therefore HS barycenters are not deviation barycenters, except for the extremal values of the parameter. Nevertheless, there exists a naturally related parametrized family of deviation barycenters, as we now explain.

Letting KK be the HS kernel (see Definition 2.1), we let:

(7.5) E(x,y,c)K(x,y,c)logy={c1log(cy1x+1c)if c>0,y1x1if c=0.E(x,y,c)\coloneqq K(x,y,c)-\log y=\begin{cases}c^{-1}\log\left(cy^{-1}x+1-c\right)&\text{if }c>0,\\ y^{-1}x-1&\text{if }c=0.\end{cases}

For any value of the parameter c[0,1]c\in[0,1], this is a deviation function, provided we restrict it to x>0x>0. The corresponding deviation barycenter will be called the derived from Halász–Székely barycenter (or DHS barycenter) with parameter cc. More precisely:

Definition 7.2.

Let c[0,1]c\in[0,1] and μ𝒫()\mu\in\mathcal{P}(\mathbb{R}). If c=1c=1, then we require that the function logx\log x is semi-integrable with respect to μ\mu. The DHS barycenter with parameter cc of the probability measure μ\mu, denoted μc\llbracket\mu\rrbracket_{c}, is defined as follows:

  (a) if μ=δ0\mu=\delta_{0}, or c=1c=1 and logxdμ(x)=\int\log x\,d\mu(x)=-\infty, then μc0\llbracket\mu\rrbracket_{c}\coloneqq 0;

  (b) if c=0c=0 and x𝑑μ(x)=\int x\,d\mu(x)=\infty, or c>0c>0 and log(1+x)𝑑μ(x)=\int\log(1+x)\,d\mu(x)=\infty, then μc+\llbracket\mu\rrbracket_{c}\coloneqq+\infty;

  (c) in all other cases, μc\llbracket\mu\rrbracket_{c} is defined as the unique positive and finite solution yy of the equation E(x,y,c)𝑑μ(x)=0\int E(x,y,c)\,d\mu(x)=0.

Of course, we need to show that the definition makes sense in case (c), i.e., that there exists a unique y>0y>0 such that E(x,y,c)𝑑μ(x)=0\int E(x,y,c)\,d\mu(x)=0. This is obvious if c=0c=0 or c=1c=1, so assume that c(0,1)c\in(0,1). Since log(1+x)L1(μ)\log(1+x)\in L^{1}(\mu), the function ϕ(y)E(x,y,c)𝑑μ(x)\phi(y)\coloneqq\int E(x,y,c)\,d\mu(x) is finite, and (by the dominated convergence theorem) continuous and strictly decreasing. Furthermore, ϕ(y)\phi(y) converges to c1log(1c)<0c^{-1}\log(1-c)<0 as y+y\to+\infty and (since μδ0\mu\neq\delta_{0}) to ++\infty as y0+y\to 0^{+}. So ϕ\phi has a unique zero on (0,+)(0,+\infty), as we wanted to prove.
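The argument above is constructive: since ϕ\phi is strictly decreasing, its zero can be located by bisection. A minimal sketch for discrete measures follows (our illustration; the test measure is an arbitrary choice).

```python
import math

def E(x, y, c):
    # the deviation function (7.5), for x >= 0, y > 0
    if c > 0:
        return math.log(c * x / y + 1 - c) / c
    return x / y - 1

def dhs(mu, c, lo=1e-12, hi=1e12, iters=200):
    # DHS barycenter, case (c) of Definition 7.2: the unique zero of
    # phi(y) = integral of E(x, y, c) dmu(x), found by bisection
    # (phi is strictly decreasing in y)
    def phi(y):
        return sum(w * E(x, y, c) for x, w in mu)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu = [(1.0, 0.5), (4.0, 0.5)]   # uniform measure on {1, 4}
a = dhs(mu, 0.0)   # arithmetic barycenter: 2.5
g = dhs(mu, 1.0)   # geometric barycenter: 2.0
m = dhs(mu, 0.5)   # lies strictly between the two
```

For this two-point measure and c=1/2c=1/2 the defining equation reduces to (1+y)(4+y)=4y2(1+y)(4+y)=4y^{2}, so the hand-computed value m=(5+73)/62.2573m=(5+\sqrt{73})/6\approx 2.2573 can serve as a check.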

The DHS barycenters have the same basic properties as the HS barycenters (Proposition 2.6); we leave the verification for the reader. (For example, monotonicity with respect to cc follows simply from the corresponding property of the deviation functions, so we do not need to use the finer comparison criteria from [Da, DP].) Furthermore, we have the following inequality:

Proposition 7.3.

[μ]cμc[\mu]_{c}\leq\llbracket\mu\rrbracket_{c}. The inequality is strict unless either μ\mu is a delta measure, c=0c=0, c=1c=1, or log(1+x)𝑑μ(x)=\int\log(1+x)\,d\mu(x)=\infty.

Proof.

Let c[0,1]c\in[0,1] and μ𝒫()\mu\in\mathcal{P}(\mathbb{R}). If μc\llbracket\mu\rrbracket_{c} is either 0 or \infty, then it is clear from Definition 7.2 and basic properties of the HS barycenter that [μ]c=μc[\mu]_{c}=\llbracket\mu\rrbracket_{c}. So assume that μcξ\llbracket\mu\rrbracket_{c}\eqqcolon\xi is neither 0 nor \infty. Then it satisfies the equation E(x,ξ,c)𝑑μ(x)=0\int E(x,\xi,c)\,d\mu(x)=0, or equivalently K(x,ξ,c)𝑑μ(x)=logξ\int K(x,\xi,c)\,d\mu(x)=\log\xi. Considering y=ξy=\xi in the definition (2.4), we obtain [μ]cξ[\mu]_{c}\leq\xi, as claimed.

Let us investigate the cases of equality. It is clear that 0\llbracket\mathord{\cdot}\rrbracket_{0} and 1\llbracket\mathord{\cdot}\rrbracket_{1} are the arithmetic and geometric barycenters, respectively, and so coincide with the corresponding HS barycenters. Also, if log(1+x)𝑑μ(x)=\int\log(1+x)\,d\mu(x)=\infty, then [μ]c=μc=[\mu]_{c}=\llbracket\mu\rrbracket_{c}=\infty. So consider c(0,1)c\in(0,1) and μ𝒫HS()\mu\in{\mathcal{P}_{\mathrm{HS}}(\mathbb{R})} such that [μ]c=μcξ[\mu]_{c}=\llbracket\mu\rrbracket_{c}\eqqcolon\xi. The infimum in formula (2.4) is attained at y=ξy=\xi, and thus (see Proposition 2.11) we are in the subcritical regime μ(0)<1c\mu(0)<1-c. Hence equation (2.16) holds with η=ξ\eta=\xi. Note that the equation can be rewritten as:

(7.6) ηcx+(1c)η𝑑μ(x)=1.\int\frac{\eta}{cx+(1-c)\eta}\,d\mu(x)=1\,.

On the other hand,

(7.7) log(ηcx+(1c)η)𝑑μ(x)=cE(x,η,c)𝑑μ(x)=0.\int\log\left(\frac{\eta}{cx+(1-c)\eta}\right)\,d\mu(x)=-c\int E(x,\eta,c)\,d\mu(x)=0\,.

So we have an equality in Jensen’s inequality, which is only possible if the integrands are almost everywhere constant, that is, μ\mu is a delta measure. ∎
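The inequality of Proposition 7.3, including its strictness for non-delta measures, is easy to observe numerically. In the sketch below (our addition; the measures and the parameter c=1/2c=1/2 are arbitrary choices), the HS barycenter of a subcritical discrete measure is computed through the critical equation (7.6), and then compared with the DHS barycenter.

```python
import math

C = 0.5

def K(x, y):
    # the HS kernel with parameter c = C, for x >= 0, y > 0
    return math.log(y) + math.log(C * x / y + 1 - C) / C

def bisect(f, lo=1e-12, hi=1e12, increasing=True, iters=200):
    # locate the zero of a strictly monotone function f on (lo, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < 0) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def hs(mu):
    # HS barycenter in the subcritical case: solve (7.6) for eta,
    # then [mu]_c = exp of the integral of K(x, eta) dmu(x)
    y = bisect(lambda y: sum(w * y / (C * x + (1 - C) * y) for x, w in mu) - 1)
    return math.exp(sum(w * K(x, y) for x, w in mu))

def dhs(mu):
    # DHS barycenter: zero of the decreasing function
    # y -> integral of (K(x, y) - log y) dmu(x)
    return bisect(lambda y: sum(w * (K(x, y) - math.log(y)) for x, w in mu),
                  increasing=False)

mu = [(1.0, 0.25), (3.0, 0.5), (10.0, 0.25)]
delta = [(3.0, 1.0)]   # a delta measure: the two barycenters coincide at 3
```

For the three-point measure the inequality is strict; for the delta measure both barycenters equal 33.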

In some respects, the DHS barycenters are better behaved than the HS barycenters: for example, there are no critical phenomena.

Example 7.4.

As in Example 2.13, consider the measures μp(1p)δ0+pδ1\mu_{p}\coloneqq(1-p)\delta_{0}+p\delta_{1}, where p[0,1]p\in[0,1]. A calculation gives:

(7.8) μpc=c((1c)1pp1+c)1.\llbracket\mu_{p}\rrbracket_{c}=c\left((1-c)^{-\frac{1-p}{p}}-1+c\right)^{-1}\,.

For c=1/2c=1/2, the graphs of the two barycenters are shown in Fig. 4.

Figure 4. Graphs of the functions p[μp]1/2p\mapsto[\mu_{p}]_{1/2} (red) and pμp1/2p\mapsto\llbracket\mu_{p}\rrbracket_{1/2} (blue), where μp(1p)δ0+pδ1\mu_{p}\coloneqq(1-p)\delta_{0}+p\delta_{1}.
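The closed form (7.8) is easily confirmed numerically. The sketch below (our addition) solves the defining equation of μpc\llbracket\mu_{p}\rrbracket_{c} by bisection and compares the result with (7.8); the atom at 00 contributes the constant c1log(1c)c^{-1}\log(1-c) to the integral.

```python
import math

C = 0.5  # any c in (0, 1) works here; c = 1/2 matches Fig. 4

def closed_form(p):
    # formula (7.8) for mu_p = (1 - p) delta_0 + p delta_1, with 0 < p <= 1
    return C / ((1 - C) ** (-(1 - p) / p) - 1 + C)

def dhs_mu_p(p, lo=1e-12, hi=1e12, iters=200):
    # solve (1 - p) E(0, y, c) + p E(1, y, c) = 0 by bisection,
    # where E(x, y, c) = c^{-1} log(c x / y + 1 - c)
    def phi(y):
        return ((1 - p) * math.log(1 - C) / C
                + p * math.log(C / y + 1 - C) / C)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For instance, with c=p=1/2c=p=1/2 formula (7.8) gives (1c)1=2(1-c)^{-1}=2 in the inner bracket, hence μ1/21/2=0.5/1.5=1/3\llbracket\mu_{1/2}\rrbracket_{1/2}=0.5/1.5=1/3.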

The definition of DHS barycenters is not so arbitrary as it may seem at first sight; indeed, they approximate HS barycenters:

Proposition 7.5.

The HS and DHS barycenters are tangent at δx0\delta_{x_{0}}, for any x0>0x_{0}>0. In other words, if t[0,t1]μ(t)𝒫HS()t\in[0,t_{1}]\mapsto\mu(t)\in{\mathcal{P}_{\mathrm{HS}}(\mathbb{R})} is an affine path with μ(0)=δx0\mu(0)=\delta_{x_{0}}, then:

(7.9) ddt[μ(t)]c|t=0=ddtμ(t)c|t=0.\left.\frac{d}{dt}[\mu(t)]_{c}\right|_{t=0}=\left.\frac{d}{dt}\llbracket\mu(t)\rrbracket_{c}\right|_{t=0}\,.
Proof.

It is sufficient to consider c(0,1)c\in(0,1). Let us use the notations from the proof of Lemma 6.6. We evaluate (6.12) at t=0t=0, using η(0)=x0\eta(0)=x_{0}, thus obtaining:

(7.10) L˙(0)=K(x,x0)𝑑μ˙.\dot{L}(0)=\int K(x,x_{0})\,d\dot{\mu}\,.

Next, consider ξ=ξ(t)μ(t)c\xi=\xi(t)\coloneqq\llbracket\mu(t)\rrbracket_{c}. By definition, K(x,ξ)𝑑μ=logξ\int K(x,\xi)\,d\mu=\log\xi. Differentiating this equation,

(7.11) ξ˙Ky(x,ξ)𝑑μ+K(x,ξ)𝑑μ˙=ξ˙ξ.\dot{\xi}\int K_{y}(x,\xi)\,d\mu+\int K(x,\xi)\,d\dot{\mu}=\frac{\dot{\xi}}{\xi}\,.

Evaluating at t=0t=0 and ξ(0)=x0\xi(0)=x_{0}, we obtain:

(7.12) ξ˙(0)Ky(x0,x0)+K(x,x0)𝑑μ˙=ξ˙(0)x0.\dot{\xi}(0)K_{y}(x_{0},x_{0})+\int K(x,x_{0})d\dot{\mu}=\frac{\dot{\xi}(0)}{x_{0}}\,.

But a calculation shows that Ky(x0,x0)=0K_{y}(x_{0},x_{0})=0 (this can also be seen as a consequence of part (e) of Proposition 2.2), so we obtain:

(7.13) ξ˙(0)=x0K(x,x0)𝑑μ˙=x0L˙(0)=ddt[μ(t)]c|t=0,\dot{\xi}(0)=x_{0}\int K(x,x_{0})\,d\dot{\mu}=x_{0}\dot{L}(0)=\left.\frac{d}{dt}[\mu(t)]_{c}\right|_{t=0}\,,

as we wanted to prove. ∎
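Proposition 7.5 can also be observed numerically: along an affine path issuing from a delta measure, the one-sided difference quotients of the two barycenters approach each other as the step hh tends to 00 (their gap is of order hh, while each quotient tends to the common derivative). A sketch follows, with arbitrarily chosen x0x_{0}, ν\nu and cc (our addition).

```python
import math

C = 0.5
X0 = 2.0                          # mu(0) = delta_{X0}
NU = [(1.0, 0.5), (4.0, 0.5)]     # mu(t) = (1 - t) delta_{X0} + t * NU

def mix(t):
    return [(X0, 1 - t)] + [(x, t * w) for x, w in NU]

def K(x, y):
    # the HS kernel with parameter c = C
    return math.log(y) + math.log(C * x / y + 1 - C) / C

def bisect(f, lo=1e-12, hi=1e12, increasing=True, iters=200):
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < 0) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def hs(mu):
    # HS barycenter via the critical equation (subcritical case)
    y = bisect(lambda y: sum(w * y / (C * x + (1 - C) * y) for x, w in mu) - 1)
    return math.exp(sum(w * K(x, y) for x, w in mu))

def dhs(mu):
    # DHS barycenter via its defining equation
    return bisect(lambda y: sum(w * (K(x, y) - math.log(y)) for x, w in mu),
                  increasing=False)

def slope_gap(h):
    # difference of the one-sided difference quotients at t = 0;
    # both barycenters equal X0 at t = 0
    return (dhs(mix(h)) - hs(mix(h))) / h

g1, g2 = slope_gap(1e-2), slope_gap(1e-3)
```

The gap is positive for every h>0h>0 (Proposition 7.3) and shrinks roughly linearly in hh, which is exactly the tangency statement.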

The approximation between the two barycenters is often surprisingly good, even for measures that are not very close to a delta measure:

Example 7.6.

If μ\mu is Lebesgue measure on [1,2][1,2] and c=1/2c=1/2, then:

(7.14) [μ]c\displaystyle[\mu]_{c} 1.485926,\displaystyle\simeq 1.485926\,,
(7.15) μc\displaystyle\llbracket\mu\rrbracket_{c} 1.485960,\displaystyle\simeq 1.485960\,,

a difference of about 0.002%.
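The numbers in Example 7.6 can be reproduced with elementary numerics. The sketch below (our addition) discretizes Lebesgue measure on [1,2][1,2] by a midpoint rule, computes the HS barycenter through the critical equation, and the DHS barycenter by bisection.

```python
import math

C = 0.5
N = 4000
# midpoint-rule discretization of Lebesgue measure on [1, 2]
MU = [(1.0 + (i + 0.5) / N, 1.0 / N) for i in range(N)]

def K(x, y):
    # the HS kernel with parameter c = C
    return math.log(y) + math.log(C * x / y + 1 - C) / C

def bisect(f, lo=1e-9, hi=1e9, increasing=True, iters=100):
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < 0) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# HS barycenter: solve the critical equation for eta, then exponentiate
eta = bisect(lambda y: sum(w * y / (C * x + (1 - C) * y) for x, w in MU) - 1)
hs = math.exp(sum(w * K(x, eta) for x, w in MU))

# DHS barycenter: zero of the decreasing function in Definition 7.2(c)
dhs = bisect(lambda y: sum(w * (K(x, y) - math.log(y)) for x, w in MU),
             increasing=False)
```

Both values match the ones displayed above to the printed precision, with hs<dhs\texttt{hs}<\texttt{dhs} as Proposition 7.3 demands.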

To conclude our paper, let us confirm that HS barycenters are not quasi-affine, except for the extremal cases c=0c=0 and c=1c=1:

Proof of Proposition 6.7.

Let c(0,1)c\in(0,1). Choose some μ0\mu_{0} in the subcritical locus (6.7) which is not a delta measure. Let y0[μ0]cy_{0}\coloneqq[\mu_{0}]_{c}\in\mathbb{R} and μ1δy0\mu_{1}\coloneqq\delta_{y_{0}}. Then η(μ1,c)=y0\eta(\mu_{1},c)=y_{0}. We claim that η(μ0,c)y0\eta(\mu_{0},c)\neq y_{0}. Indeed, if η(μ0,c)=y0\eta(\mu_{0},c)=y_{0}, then, by (2.4), logy0=K(x,y0,c)𝑑μ0(x)\log y_{0}=\int K(x,y_{0},c)\,d\mu_{0}(x), and so μ0c=y0\llbracket\mu_{0}\rrbracket_{c}=y_{0}, which by Proposition 7.3 implies that μ0\mu_{0} is a delta measure: contradiction. Now Proposition 6.5 guarantees that the function []c[\mathord{\cdot}]_{c} is strictly log-concave on the segment [μ0,μ1][\mu_{0},\mu_{1}]; in particular, it is not constant. Since the function attains the same value on the extremes of the segment, it cannot be quasi-affine. ∎

Acknowledgement. We thank Juarez Bochi for helping us with computer experiments at a preliminary stage of this project.

References

  • [AD] Aczél, János; Dhombres, Jean G. -- Functional equations in several variables. Encyclopedia of Mathematics and its Applications, 31. Cambridge University Press, Cambridge, 1989.
  • [AL] Arnaudon, Marc; Li, Xue-Mei -- Barycenters of measures transported by stochastic flows. Ann. Probab. 33 (2005), no. 4, 1509--1543.
  • [Au] Austin, Tim -- A CAT(0)-valued pointwise ergodic theorem. J. Topol. Anal. 3 (2011), no. 2, 145--152.
  • [ADSZ] Avriel, Mordecai; Diewert, Walter E.; Schaible, Siegfried; Zang, Israel -- Generalized concavity. Reprint of the 1988 original ed. Classics in Applied Mathematics, 63. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2010.
  • [BN] Balogh, József; Nguyen, Hoi -- A general law of large permanent. Discrete Contin. Dyn. Syst. 37 (2017), no. 10, 5285--5297.
  • [BIP] Bochi, Jairo; Iommi, Godofredo; Ponce, Mario -- The scaling mean and a law of large permanents. Adv. Math. 292 (2016), 374--409.
  • [BB] Borwein, Jonathan M.; Borwein, Peter B. -- Pi and the AGM. A study in analytic number theory and computational complexity. Reprint of the 1987 original. Canadian Mathematical Society Series of Monographs and Advanced Texts, 4. John Wiley & Sons, Inc., New York, 1998.
  • [Bu] Bullen, Peter S. -- Handbook of means and their inequalities. Mathematics and its Applications, 560. Kluwer Academic Publishers Group, Dordrecht, 2003.
  • [CHMW] Cellarosi, Francesco; Hensley, Doug; Miller, Steven J.; Wellens, Jake L. -- Continued fraction digit averages and Maclaurin’s inequalities. Exp. Math. 24 (2015), no. 1, 23--44.
  • [Da] Daróczy, Zoltán -- Über eine Klasse von Mittelwerten. Publ. Math. Debrecen 19 (1972), 211--217.
  • [DP] Daróczy, Zoltán; Páles, Zsolt -- On comparison of mean values. Publ. Math. Debrecen 29 (1982), 107--115.
  • [EM] Émery, Michel; Mokobodzki, Gabriel -- Sur le barycentre d’une probabilité dans une variété. Séminaire de Probabilités, XXV, 220--233, Lecture Notes in Math., 1485, Springer, Berlin, 1991.
  • [Fe] Feller, William -- An introduction to probability theory and its applications. Vol. I. 3rd ed. John Wiley & Sons, Inc., New York-London-Sydney, 1968.
  • [GZ] Guglielmi, Nicola; Zennaro, Marino -- An antinorm theory for sets of matrices: bounds and approximations to the lower spectral radius. Linear Algebra Appl. 607 (2020), 89--117.
  • [Gu] Gurvits, Leonid -- Van der Waerden/Schrijver-Valiant like conjectures and stable (aka hyperbolic) homogeneous polynomials: one theorem for all. With a corrigendum. Electron. J. Combin. 15 (2008), no. 1, Research Paper 66, 26 pp.
  • [HS] Halász, Gábor; Székely, Gábor J. -- On the elementary symmetric polynomials of independent random variables. Acta Math. Acad. Sci. Hungar. 28 (1976), no. 3-4, 397--400.
  • [HLP] Hardy, Godfrey H.; Littlewood, John E.; Pólya, George -- Inequalities. Reprint of the 1952 edition. Cambridge Mathematical Library. Cambridge University Press, Cambridge, 1988.
  • [He] Heath, Thomas -- A history of Greek mathematics. Vol. I. From Thales to Euclid. Corrected reprint of the 1921 original. Dover Publications, Inc., New York, 1981.
  • [KLL] Kim, Sejong; Lawson, Jimmie; Lim, Yongdo -- Barycentric maps for compactly supported measures. J. Math. Anal. Appl. 458 (2018), no. 2, 1009--1026.
  • [Kr] Krengel, Ulrich -- Ergodic theorems. With a supplement by Antoine Brunel. De Gruyter Studies in Mathematics, 6. Walter de Gruyter & Co., Berlin, 1985.
  • [Mai] Maistrov, L.E. -- Probability theory: a historical sketch. Translated and edited by Samuel Kotz. Probability and Mathematical Statistics, Vol. 23. Academic Press, New York-London, 1974.
  • [Maj] Major, Péter -- The limit behavior of elementary symmetric polynomials of i.i.d. random variables when their order tends to infinity. Ann. Probab. 27 (1999), no. 4, 1980--2010.
  • [Na] Navas, Andrés -- An L^{1} ergodic theorem with values in a non-positively curved space via a canonical barycenter map. Ergodic Theory Dynam. Systems 33 (2013), no. 2, 609--623.
  • [Pa] Parthasarathy, Kalyanapuram R. -- Probability measures on metric spaces. Probability and Mathematical Statistics, No. 3. Academic Press, Inc., New York-London, 1967.
  • [Si] Simon, Barry -- Advanced complex analysis. A Comprehensive Course in Analysis, Part 2B. American Mathematical Society, Providence, RI, 2015.
  • [St] Stone, Marshall H. -- Postulates for the barycentric calculus. Ann. Mat. Pura Appl. (4) 29 (1949), 25--30.
  • [Sz1] Székely, Gábor J. -- On the polynomials of independent random variables. Limit theorems of probability theory (Colloq., Keszthely, 1974), pp. 365--371. Colloq. Math. Soc. János Bolyai, Vol. 11, North-Holland, Amsterdam, 1975.
  • [Sz2] Székely, Gábor J. -- A limit theorem for elementary symmetric polynomials of independent random variables. Z. Wahrsch. Verw. Gebiete 59 (1982), no. 3, 355--359.
  • [vE] van Es, Bert -- On the weak limits of elementary symmetric polynomials. Ann. Probab. 14 (1986), no. 2, 677--695.
  • [Vi] Villani, Cédric -- Topics in optimal transportation. Graduate Studies in Mathematics, 58. American Mathematical Society, Providence, RI, 2003.
  • [Zh] Zhan, Xingzhi -- Matrix inequalities. Lecture Notes in Mathematics, 1790. Springer-Verlag, Berlin, 2002.