
Operations with concentration inequalities

Cosme Louart ([email protected])
School of Data Science, The Chinese University of Hong Kong (Shenzhen), Shenzhen, China
Abstract

Following the concentration of measure theory formalism, we consider the transformation $\Phi(Z)$ of a random variable $Z$ having a general concentration function $\alpha$. If the transformation $\Phi$ is $\lambda$-Lipschitz with $\lambda>0$ deterministic, the concentration function of $\Phi(Z)$ is immediately deduced to be $\alpha(\cdot/\lambda)$. If the variations of $\Phi$ are bounded by a random variable $\Lambda$ having a concentration function (around $0$) $\beta:\mathbb{R}_{+}\to\mathbb{R}$, this paper shows that $\Phi(Z)$ has a concentration function analogous to the so-called parallel product of $\alpha$ and $\beta$. With this result at hand, (i) we express the concentration of random vectors with independent heavy-tailed entries; (ii) given a transformation $\Phi$ with bounded $k^{\text{th}}$ differential, we express the so-called "multi-level" concentration of $\Phi(Z)$ as a function of $\alpha$ and of the operator norms of the successive differentials up to the $k^{\text{th}}$; (iii) we obtain a heavy-tailed version of the Hanson-Wright inequality.

MSC: 60-08, 60B20, 62J07

Keywords: Concentration of Measure, Hanson-Wright inequality, Heavy-tailed probabilities, Parallel sum

Notations

We denote $\mathbb{R}_{+}$ (resp. $\mathbb{R}_{-}$) the set of positive (resp. negative) or null real numbers, $\mathbb{R}^{*}\equiv\mathbb{R}\setminus\{0\}$ and $\mathbb{R}_{+}^{*}=\mathbb{R}_{+}\setminus\{0\}$. Given $A\subset\mathbb{R}$, we denote $\bar{A}$ the closure of $A$, $\mathring{A}$ the interior of $A$, and $\partial A$ the boundary of $A$ ($\partial A\equiv\bar{A}\setminus\mathring{A}$).

We extend the relation "$\leq$" to a relation between sets and introduce a new order "$\preceq$", defined as follows for any sets $A,B\subset\mathbb{R}$:

$$A\leq B\quad\Longleftrightarrow\quad A=\emptyset\ \text{ or }\ B=\emptyset\ \text{ or }\ \forall a\in A,b\in B:\ a\leq b$$
$$A\preceq B\quad\Longleftrightarrow\quad A=\emptyset\ \text{ or }\ B=\emptyset\ \text{ or }\ \forall a\in A,\exists b\in B:\ a\leq b,$$

and we denote $A\geq B$ (resp. $A\succeq B$) when $-A\leq-B$ (resp. $-A\preceq-B$). The above definitions extend easily to cases where $A$ or $B$ are scalars $a,b\in\mathbb{R}$, which are then identified with the singletons $\{a\}$ or $\{b\}$.

We consider below a set-valued mapping (also called "operator") $f:\mathbb{R}\to 2^{\mathbb{R}}$. The graph of $f$ is the set $\operatorname{Gra}(f)=\{(x,y)\in\mathbb{R}^{2},y\in f(x)\}$. The inverse of $f$ is the set-valued mapping $f^{-1}$ satisfying:

$$\operatorname{Gra}(f^{-1})=\{(y,x)\in\mathbb{R}^{2},(x,y)\in\operatorname{Gra}(f)\}.$$

Placing ourselves in the framework of monotone operators, we will say that $f$ is increasing if:

$$\forall(x,u),(y,v)\in\operatorname{Gra}(f):\quad(y-x)(v-u)\geq 0,$$

and $f$ is said to be a decreasing operator if $-f$ is increasing. Increasing and decreasing operators are both called monotone operators. The domain of $f$ is denoted $\operatorname{Dom}(f)=\{x\in\mathbb{R},f(x)\neq\emptyset\}$ and the range $\operatorname{Ran}(f)=\{y\in\mathbb{R},\exists x\in\mathbb{R}:y\in f(x)\}$. (Scalar-valued) functions are identified with operators whose images are all singletons. Given a set $A\subset\mathbb{R}$, we naturally define:

$$f(A)=\left\{y\in\mathbb{R},\exists x\in A,y\in f(x)\right\}.$$

Then, given two operators $f,g$, the composition of $f$ and $g$ is the operator defined for any $x\in\mathbb{R}$ as $f\circ g(x)=f(g(x))$. The sum (resp. the product) of two operators $f,g:\mathbb{R}\to 2^{\mathbb{R}}$ is simply denoted $f+g:x\mapsto f(x)+g(x)$ (resp. $f\cdot g:x\mapsto f(x)\cdot g(x)$); their domain is exactly $\operatorname{Dom}(f)\cap\operatorname{Dom}(g)$.

We denote $\operatorname{Id}:\mathbb{R}\to\mathbb{R}$ the function defined for all $x\in\mathbb{R}$ as $\operatorname{Id}(x)=x$. Given $u\in\mathbb{R}$, we denote $\operatorname{inc}_{u}$ the increment operator, whose domain is $\operatorname{Dom}(\operatorname{inc}_{u})=(-\infty,u]$, defined as:

$$\operatorname{inc}_{u}(t)=\{0\}\quad\text{if}\ t\in(-\infty,u)\qquad\text{and}\qquad\operatorname{inc}_{u}(u)=\mathbb{R}_{+}.$$

Given two euclidean vector spaces $E$ and $F$ and an integer $d\in\mathbb{N}$, we denote $\mathcal{D}^{d}(E,F)$ the set of $d$ times differentiable mappings from $E$ to $F$ and $\mathcal{L}^{d}(E,F)$ the set of $d$-linear mappings from $E^{d}$ to $F$. Given a $d$-linear mapping $h\in\mathcal{L}^{d}(E,F)$, we further denote $\|h\|$ the operator norm of $h$, defined as:

$$\|h\|=\sup\left\{\|h(x_{1},\ldots,x_{d})\|:\ x_{1},\ldots,x_{d}\in E\ \text{ and }\ \|x_{1}\|,\ldots,\|x_{d}\|\leq 1\right\}.$$

Given $k\in[d]$, $f\in\mathcal{D}^{d}(E,F)$, and $x\in E$, we denote $d^{k}f|_{x}\in\mathcal{L}^{k}(E,F)$ the $k^{\text{th}}$ differential of $f$ at the point $x$.

Introduction

A fundamental result of the concentration of measure theory states that for any Gaussian random vector $Z\sim\mathcal{N}(0,I_{n})$ and any mapping $f:\mathbb{R}^{n}\to\mathbb{R}$ that is $1$-Lipschitz for the euclidean norm on $\mathbb{R}^{n}$:

$$\forall t\geq 0:\quad\mathbb{P}\left(\left|f(Z)-\mathbb{E}[f(Z)]\right|>t\right)\leq 2e^{-t^{2}/2};$$

the mapping $\alpha:t\mapsto 2e^{-t^{2}/2}$ is here called a concentration function of $Z$. Given a transformation $F:\mathbb{R}^{n}\to\mathbb{R}^{q}$, $\lambda$-Lipschitz for the euclidean norms on $\mathbb{R}^{n}$ and $\mathbb{R}^{q}$, we have for any $Z^{\prime}\in\mathbb{R}^{n}$, an independent copy of $Z$:

$$\left\|F(Z)-F(Z^{\prime})\right\|\leq\lambda\left\|Z-Z^{\prime}\right\|$$

and it is easy to deduce, with a slight modification of $t$, that the concentration function of the random vector $F(Z)\in\mathbb{R}^{q}$ is $\alpha(\cdot/\lambda)$. The aim of our paper is to express the concentration of $F(Z)$ when $\lambda$ is not constant but a certain random variable $\Lambda(Z)$, for $\Lambda:\mathbb{R}^{n}\to\mathbb{R}$ measurable. The following result is given for general concentration functions of $Z$ and $\Lambda(Z)$.

Theorem 0.1.

Let us consider two metric spaces $(E,d)$, $(E^{\prime},d^{\prime})$, a random variable $Z\in E$ and a measurable mapping $\Lambda:E\to\mathbb{R}$ such that there exist two strictly decreasing mappings $\alpha,\beta:\mathbb{R}_{+}\to\mathbb{R}_{+}$ satisfying, for any $1$-Lipschitz mapping $f:E\to\mathbb{R}$, any independent copy $Z^{\prime}$ of $Z$ and all $t\geq 0$:

$$\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)\leq\alpha(t)\qquad\text{and}\qquad\mathbb{P}\left(\Lambda(Z)>t\right)\leq\beta(t).$$

Given a transformation $\Phi:E\to E^{\prime}$ such that, for all $z,z^{\prime}\in E$:

$$d^{\prime}(\Phi(z),\Phi(z^{\prime}))\leq\max(\Lambda(z),\Lambda(z^{\prime}))\,d(z,z^{\prime}),$$

then for any $1$-Lipschitz mapping $g:E^{\prime}\to\mathbb{R}$:

$$\forall t\geq 0:\quad\mathbb{P}\left(\left|g(\Phi(Z))-g(\Phi(Z^{\prime}))\right|>t\right)\leq 3(\alpha^{-1}\cdot\beta^{-1})^{-1}(t).\qquad(1)$$

This theorem allows some extension of the famous Hanson-Wright concentration inequality to a wider range of concentration functions (Theorems 6.1 and 6.7).
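As a quick illustration of the parallel product (a computation of ours, with no attempt at optimal constants), take $\alpha=\beta:t\mapsto 2e^{-t^{2}/2}$, the Gaussian concentration function. Then, for $y\in(0,2]$:

$$\alpha^{-1}(y)=\sqrt{2\log(2/y)},\qquad\alpha^{-1}(y)\cdot\beta^{-1}(y)=2\log(2/y),\qquad(\alpha^{-1}\cdot\beta^{-1})^{-1}(t)=2e^{-t/2},$$

so that (1) yields $\mathbb{P}(|g(\Phi(Z))-g(\Phi(Z^{\prime}))|>t)\leq 6e^{-t/2}$: the product of two sub-Gaussian behaviors produces a sub-exponential tail, which is precisely the large-$t$ regime of the Hanson-Wright inequality.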

A more general version of this theorem is given in Theorem 3.5, for $\alpha,\beta$ which are possibly constant on some parts of their domain, and for a random variable $\Lambda(Z)$ which can be written as a product of $n$ random variables, each of which has its own concentration function. Theorem 3.10 gives a similar result for random variables $Z$ that only satisfy the weaker, so-called "convex concentration" hypotheses, with $\Phi$ taking values in $\mathbb{R}$.

The mapping $(\alpha^{-1}\cdot\beta^{-1})^{-1}:\mathbb{R}_{+}\to\mathbb{R}_{+}$ rewrites as $(\tilde{\alpha}^{-1}+\tilde{\beta}^{-1})^{-1}\circ\log$ with the introduction of $\tilde{\alpha}\equiv\alpha\circ\exp$ and $\tilde{\beta}\equiv\beta\circ\exp$. The expression $(\tilde{\alpha}^{-1}+\tilde{\beta}^{-1})^{-1}$ can be recognized as the so-called parallel sum, originally introduced in electrical engineering to model parallel resistor networks, then generalized to matrices in [5] and to nonlinear operators in convex analysis in [4] (see [7, Chapter 24] for a presentation in the context of set-valued functions). The parallel sum is traditionally denoted $\square$, but since we will also be presenting a parallel product, we found it more convenient to denote it $\boxplus$ (and the parallel product $\boxtimes$). The operation on graphs can easily be represented as shown in Figure 1.

Figure 1: Classical sum (Left) and parallel sum (Right) of two operators. One can wonder from this second graph whether the parallel sum should not rather be called the perpendicular sum.

The control of non-Lipschitz functionals has been studied by different authors. To be brief, one can mention the famous work of Vu [25] on binary variables, then the introduction of specific operator norms on tensors to obtain the concentration of polynomials of Gaussian variables by Latała [16], later generalized to more general variables and functionals in [3], [11] and [12]. These studies gave rise to the notion of "multilevel concentration inequalities", where the concentration rate usually takes the form:

$$t\mapsto\exp\left(-\inf_{a\in A}\left(\frac{t}{\sigma_{a}}\right)^{a}\right),$$

for a finite set $A\subset\mathbb{R}_{+}$ and $(\sigma_{a})_{a\in A}\in\mathbb{R}_{+}^{A}$. This multi-level scheme appears naturally when one computes the parallel product of $k$ decreasing mappings

$$\alpha\circ\min_{a\in A^{(1)}}\left(\frac{t}{\sigma^{(1)}_{a}}\right)^{a},\ \ \ldots,\ \ \alpha\circ\min_{a\in A^{(k)}}\left(\frac{t}{\sigma^{(k)}_{a}}\right)^{a},$$

for a decreasing (possibly heavy-tailed) function $\alpha$ replacing $t\mapsto e^{-t^{2}}$. We present in Theorem 5.5 a first setting where this multi-level concentration appears.
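To see how the levels combine, here is a small computation of ours in the simplest case of single levels $A^{(1)}=A^{(2)}=\{1\}$, $k=2$ and $\alpha:t\mapsto e^{-t}$ (an illustration, not a statement of the theorems below). For $y\in(0,1]$:

$$\left(\alpha\circ\frac{\operatorname{Id}}{\sigma_{1}}\right)^{-1}(y)=\sigma_{1}\log\frac{1}{y},\qquad\left(\alpha\circ\frac{\operatorname{Id}}{\sigma_{1}}\right)^{-1}(y)\cdot\left(\alpha\circ\frac{\operatorname{Id}}{\sigma_{2}}\right)^{-1}(y)=\sigma_{1}\sigma_{2}\left(\log\frac{1}{y}\right)^{2},$$

whose inverse is $t\mapsto\exp(-(t/(\sigma_{1}\sigma_{2}))^{1/2})$: the parallel product of two exponential profiles ($a=1$) is a profile of exponent $a=1/2$, and combining several such profiles through the minima appearing in the displayed mappings produces exactly the multi-level form above.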

However, a more natural setting allows us to continue the long-running study of the concentration of chaos and other functionals with bounded $k^{\text{th}}$ differential, $k\in\mathbb{N}$. Two approaches have been developed for these objects.

The older one is based on Hermite polynomials and was introduced by Christer Borell [9]. This work is not available online but is cited in [17], which studies chaos of maximal order $d$ of Gaussian variables. Concentration inequalities closer to the ones we will set were then proved for Gaussian random variables in [6] and for random vectors with independent log-concave entries in [19], both with quantiles appearing in place of medians.

The more recent approach was initiated by Latała in [16] and relies on moment bounds to express a two-sided concentration inequality for Gaussian chaos. The tensor-product matrix norms involved in this type of concentration can be numerous, but the expectation is taken inside the norms, which can be a substantial improvement. This approach was followed in [3] for general functionals with bounded derivatives of higher order and for random vectors satisfying log-Sobolev inequalities; [12] then extended the results to so-called "$\alpha$-sub-exponential random variables".

The two approaches are known to be "relatively equivalent" in the case $d=2$ (see the discussion initiated at the end of Section 3 in [2]); the question remains open for larger orders. Our present work is in line with the first approach, and in this respect our main contribution is to extend previous results to (i) functionals with bounded $d^{\text{th}}$ derivative and (ii) any random vector $Z\in E$ of a metric space $(E,d)$ admitting a concentration function $\alpha:\mathbb{R}_{+}\to\mathbb{R}_{+}$, which could be defined as:

$$\forall t\geq 0:\qquad\alpha(t)\equiv\sup_{f:E\to\mathbb{R},\ 1\text{-Lipschitz}}\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)$$

(following the idea of Ledoux presented at the beginning of his book [18]).

Theorem 0.2.

Let us consider a random vector $Z\in\mathbb{R}^{n}$ such that for any $1$-Lipschitz $f:\mathbb{R}^{n}\to\mathbb{R}$:

$$\mathbb{P}\left(\left|f(Z)-m_{f}\right|>t\right)\leq\alpha(t),$$

for a certain median $m_{f}$ of $f(Z)$ and a certain decreasing mapping $\alpha:\mathbb{R}_{+}\to\mathbb{R}_{+}$.

Then, for any $d$-differentiable mapping $\Phi\in\mathcal{D}^{d}(\mathbb{R}^{n},\mathbb{R}^{p})$ and any $1$-Lipschitz $g:\mathbb{R}^{p}\to\mathbb{R}$, one can bound:

$$\mathbb{P}\left(\left|g(\Phi(Z))-m_{g}\right|>t\right)\leq 2^{d}\alpha\left(\frac{1}{e}\min_{k\in[d]}\left(\frac{\operatorname{Id}}{dm_{k}}\right)^{\frac{1}{k}}\right),\qquad(2)$$

where $m_{g}$ is a median of $g\circ\Phi(Z)$, for all $k\in[d-1]$, $m_{k}$ is a median of $\|d^{k}\Phi|_{Z}\|$, and $m_{d}=\|d^{d}\Phi\|_{\infty}$.
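To make the multi-level structure of (2) explicit, here is an instantiation of ours for $d=2$ and the Gaussian profile $\alpha:t\mapsto 2e^{-t^{2}/2}$ (using $\min(a,b)^{2}=\min(a^{2},b^{2})$ for $a,b\geq 0$):

$$\mathbb{P}\left(\left|g(\Phi(Z))-m_{g}\right|>t\right)\leq 8\exp\left(-\frac{1}{2e^{2}}\min\left(\frac{t^{2}}{4m_{1}^{2}},\frac{t}{2m_{2}}\right)\right),$$

a sub-Gaussian regime for small $t$, driven by the median $m_{1}$ of $\|d\Phi|_{Z}\|$, and a sub-exponential regime for large $t$, driven by the uniform bound $m_{2}$ on the second differential: the two-level shape of the Hanson-Wright inequality.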

Finally, let us mention here that an elementary result (Lemma 2.5) provides a sufficient condition to replace medians or independent copies with expectations in concentration inequalities similar to (2) when $\alpha$ is integrable. In "natural settings", it morally consists in assuming the integrability of $\alpha$. This lemma allows us to generalize the Hanson-Wright inequality to random vectors having a concentration function $\alpha$ satisfying $\int_{\mathbb{R}_{+}}t\alpha(t)dt<\infty$ (bounded second moment), see Theorem 6.7.

In order to treat rigorously the case of non-invertible mappings $\alpha$ and $\beta$, we decided to work with maximal operators of $\mathbb{R}$, i.e., set-valued mappings that always admit an inverse. To our knowledge, this connection between probabilities and set-valued mappings was only explored by Rockafellar (see for instance [22]) to study financial tools like superquantiles.

Section 1 provides basic tools for dealing with operators and expresses the sum (resp. the product) of two concentration inequalities with the parallel sum (resp. the parallel product). Further presentation of operators is provided in the appendix, with the definition of the minimum of operators and its connection with the parallel sum, to provide a more intuitive understanding of this operation. It is an easy exercise to check that these results on operators become trivial in the case of singleton-valued invertible mappings.

Section 2 briefly explains how to deal with different concentration pivots such as the median, independent copies or the expectation. Section 3 introduces important results of concentration in high dimension and expresses the concentration of a transformation with concentrated variations. In Section 4 we present a general example of heavy-tailed concentration in high dimension. In Section 5 we introduce relevant notations and parallel operation tools to justify the appearance of multilevel concentration, and we further prove Theorem 0.2. Finally, in Section 6, we apply our results to obtain a heavy-tailed version of the Hanson-Wright theorem.

1 Parallel operations and concentration in \mathbb{R}

Definition 1.1.

Given two operators $f,g:\mathbb{R}\to 2^{\mathbb{R}}$, we define the parallel sum and the parallel product of $f$ and $g$ as follows:

$$f\boxplus g=(f^{-1}+g^{-1})^{-1}\qquad\text{and}\qquad f\boxtimes g=(f^{-1}\cdot g^{-1})^{-1}.$$
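For singleton-valued linear operators, the parallel sum reduces to the parallel-resistance formula that motivated its name (a classical observation, recalled here for illustration): if $f=R_{1}\operatorname{Id}$ and $g=R_{2}\operatorname{Id}$ with $R_{1},R_{2}>0$, then $f^{-1}=\operatorname{Id}/R_{1}$, $g^{-1}=\operatorname{Id}/R_{2}$, and:

$$f\boxplus g=\left(\frac{\operatorname{Id}}{R_{1}}+\frac{\operatorname{Id}}{R_{2}}\right)^{-1}=\frac{R_{1}R_{2}}{R_{1}+R_{2}}\operatorname{Id},\qquad f\boxtimes g=\left(\frac{\operatorname{Id}}{R_{1}}\cdot\frac{\operatorname{Id}}{R_{2}}\right)^{-1}:x\mapsto\sqrt{R_{1}R_{2}\,x}.$$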

To set first elementary properties of the parallel sum and product, let us provide without proof basic lemmas on the inverses of operators.

Lemma 1.2 ([7], Proposition 20.10).

Given an operator $f:\mathbb{R}\to 2^{\mathbb{R}}$, $(f^{-1})^{-1}=f$, and $f^{-1}$ is monotone iff $f$ is monotone, with the same monotonicity.

One can easily determine the domain and range of the parallel sum and the parallel product from their definition.

Proposition 1.3.

Given two increasing (resp. decreasing) operators $f,g:\mathbb{R}\to 2^{\mathbb{R}}$, $f\boxplus g$ is also increasing (resp. decreasing) and:

  • $\operatorname{Dom}(f\boxplus g)=\operatorname{Ran}(f^{-1}+g^{-1})$ and $\operatorname{Dom}(f\boxtimes g)=\operatorname{Ran}(f^{-1}\cdot g^{-1})$,

  • $\operatorname{Ran}(f\boxplus g)=\operatorname{Ran}(f\boxtimes g)=\operatorname{Ran}(f)\cap\operatorname{Ran}(g)$.

To ensure the monotonicity of $f\boxtimes g$, one has to impose that the ranges of $f,g$ be subsets of $\mathbb{R}_{+}$ (or of $\mathbb{R}_{-}$), as will be done later.

As a simple consequence of Lemma 1.19, note that the parallel sum and product are both associative and commutative, and that the parallel product distributes over the parallel sum, as classical sums and products of functions would do.

Lemma 1.4.

Given three operators $f,g,h:\mathbb{R}\to 2^{\mathbb{R}}$, one has the identities:

  • $f\boxplus g=g\boxplus f$ and $f\boxtimes g=g\boxtimes f$,

  • $(f\boxplus g)\boxplus h=f\boxplus(g\boxplus h)$ and $(f\boxtimes g)\boxtimes h=f\boxtimes(g\boxtimes h)$,

  • $f\boxtimes(g\boxplus h)=(f\boxtimes g)\boxplus(f\boxtimes h)$.

Unlike the classical sum or product of functions, composition distributes over the sum $\boxplus$ and the product $\boxtimes$ on the left.

Proposition 1.5.

Given three operators $\alpha,f,g:\mathbb{R}\to 2^{\mathbb{R}}$:

$$\alpha\circ(f\boxplus g)=(\alpha\circ f)\boxplus(\alpha\circ g)\qquad\text{and}\qquad\alpha\circ(f\boxtimes g)=(\alpha\circ f)\boxtimes(\alpha\circ g).$$

It is a simple inference of the following lemma.

Lemma 1.6.

Given two operators $f,g:\mathbb{R}\to 2^{\mathbb{R}}$: $(f\circ g)^{-1}=g^{-1}\circ f^{-1}$.

Be careful: increasing (resp. decreasing) operators $f:\mathbb{R}\to 2^{\mathbb{R}}$ satisfy $x<y\implies f(x)\leq f(y)$, but of course $f(x)\not\leq f(x)$ as soon as $f(x)$ contains more than one element. In particular, if $\operatorname{Gra}(f)=\{0\}\times\mathbb{R}_{+}\cup\mathbb{R}_{+}\times\{0\}$, note that $\operatorname{Gra}(f\circ f)=\mathbb{R}_{+}^{2}$: one sees that $f$ is decreasing but $f\circ f$ is not monotone. The composition with invertible functions, though, preserves monotonicity.

It will be more convenient below to work with so-called "maximally monotone operators", principally because of Proposition 1.10 and Theorems 1.18 and 1.12 given below. Although it is not completely trivial, maximality for operators on $\mathbb{R}$ is quite elementary, principally because the class of convex sets coincides with the class of connected sets (both are the class of intervals of $\mathbb{R}$).

Definition 1.7.

A monotone operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ is maximally monotone if there exists no monotone operator $g:\mathbb{R}\to 2^{\mathbb{R}}$ such that $\operatorname{Gra}(f)$ is strictly included in $\operatorname{Gra}(g)$. In other words, $f$ is maximally increasing iff the following equivalence is satisfied:

$$(x,u)\in\operatorname{Gra}(f)\quad\Longleftrightarrow\quad\forall(y,v)\in\operatorname{Gra}(f):(y-x)(v-u)\geq 0.\qquad(3)$$

One has to replace "$(v-u)$" with "$(u-v)$" to get a characterization of maximally decreasing operators.

Theorem 1.8 (Minty, [7], Theorem 21.1).

An increasing (resp. decreasing) monotone operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ is maximally monotone iff $\operatorname{Ran}(\operatorname{Id}+f)=\mathbb{R}$ (resp. iff $\operatorname{Ran}(-\operatorname{Id}+f)=\mathbb{R}$).

The Minty theorem straightforwardly provides simple examples of maximally monotone operators.

Example 1.9.

  1. An operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ is maximally monotone iff $f^{-1}$ is maximally monotone.

  2. Any monotone operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ such that $\operatorname{Dom}(f)=\mathbb{R}$ or $\operatorname{Ran}(f)=\mathbb{R}$ is maximally monotone.

  3. Given any $u\in\mathbb{R}$, $\operatorname{inc}_{u}$ is maximally increasing.

  4. Be careful that it is not because $\alpha$ and $\beta$ are maximally monotone that $\alpha\circ\beta$ is maximally monotone.

  5. ([7], Corollary 24.4) Given $f,g:\mathbb{R}\to 2^{\mathbb{R}}$ maximally monotone with the same monotonicity:

    • if $\operatorname{Dom}(f)\cap\mathring{\operatorname{Dom}(g)}\neq\emptyset$ (or $\operatorname{Dom}(g)\cap\mathring{\operatorname{Dom}(f)}\neq\emptyset$; since the parallel sum is commutative, the two hypotheses are equivalent), then $f+g$ is maximally monotone;

    • if $\operatorname{Ran}(f)\cap\mathring{\operatorname{Ran}(g)}\neq\emptyset$, then the parallel sum $f\boxplus g$ is maximally monotone.

Proposition 1.10 ([7], Proposition 20.31, Corollary 21.12).

Given a maximally monotone operator $f:\mathbb{R}\to 2^{\mathbb{R}}$: $\forall x\in\operatorname{Dom}(f)$, $f(x)$ is a closed interval; besides, $\operatorname{Ran}(f)$ and $\operatorname{Dom}(f)$ are both intervals of $\mathbb{R}$ and, more generally, for any interval $I\subset\mathbb{R}$, $f(I)$ is also an interval.

Definition 1.11.

An operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ is locally bounded at $x\in\mathbb{R}$ iff there exists $\varepsilon>0$ such that $f([x-\varepsilon,x+\varepsilon])$ is bounded.

Theorem 1.12 (Rockafellar–Veselý, [7], Theorem 21.15).

A maximally monotone operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ is locally bounded at a point $x\in\mathbb{R}$ if and only if $x\notin\partial\operatorname{Dom}(f)$.

Since we are here working in $\mathbb{R}$, it is possible to improve the sufficient hypotheses, given in Example 1.9, Item 5, for the sum (or parallel sum) of two operators to be maximally monotone.

Proposition 1.13.

Given two maximally monotone operators $f,g:\mathbb{R}\to 2^{\mathbb{R}}$ with the same monotonicity:

  • $\operatorname{Dom}(f)\cap\operatorname{Dom}(g)\neq\emptyset\quad\implies\quad f+g$ is maximally monotone,

  • $\operatorname{Ran}(f)\cap\operatorname{Ran}(g)\neq\emptyset\quad\implies\quad f\boxplus g$ is maximally monotone.

Note that the operator $f:\mathbb{R}\to 2^{\mathbb{R}}$ such that $\operatorname{Dom}(f)=\emptyset$ is rigorously monotone but not maximally monotone (its graph is contained in the graph of any monotone operator).

Proof.

We know from Proposition 1.10 that $\operatorname{Dom}(f),\operatorname{Dom}(g)$ are two intervals of $\mathbb{R}$. If $\operatorname{Dom}(f)\cap\operatorname{Dom}(g)$ contains more than one element, then it contains an open set, therefore $\operatorname{Dom}(f)\cap\mathring{\operatorname{Dom}(g)}\neq\emptyset$, and one can employ Example 1.9, Item 5 to deduce that $f+g$ is maximally monotone.

If $\operatorname{Dom}(f)\cap\operatorname{Dom}(g)=\{x\}$ for some $x\in\mathbb{R}$, say with $f,g$ maximally increasing and $\operatorname{Dom}(f)\leq\operatorname{Dom}(g)$, then Theorem 1.12 allows us to conclude that $\sup f(x)=+\infty$ and $\inf g(x)=-\infty$ ($x$ lies on the boundary of both domains), and therefore $\operatorname{Gra}(f+g)=\{x\}\times\mathbb{R}$: $f+g$ is maximally increasing (and decreasing).

The result on the parallel sum is deduced from Example 1.9, Item 1. ∎

One could be tempted to transfer the maximality of the parallel sum to the parallel product thanks to the following identity, true for any $f,g:\mathbb{R}\to 2^{\mathbb{R}}$ such that $\operatorname{Dom}(f),\operatorname{Dom}(g)\subset\mathbb{R}_{+}^{*}$ (the identity $f=f\circ\exp\circ\log$ is only true for operators $f:\mathbb{R}\to 2^{\mathbb{R}}$ such that $\operatorname{Dom}(f)\subset\mathbb{R}_{+}^{*}$):

$$f\boxtimes g=\left((f\circ\exp\circ\log)^{-1}\cdot(g\circ\exp\circ\log)^{-1}\right)^{-1}=\left(\exp\circ(f\circ\exp)^{-1}\cdot\exp\circ(g\circ\exp)^{-1}\right)^{-1}=\left(f\circ\exp\ \boxplus\ g\circ\exp\right)\circ\log.\qquad(4)$$

However, the maximality properties raise some difficulties, because there are examples where $f$ is maximally monotone but $f\circ\log$ is not (for instance any $f\equiv\operatorname{inc}_{u}$ for $u>0$). Besides, the condition $\operatorname{Dom}(f),\operatorname{Dom}(g)\subset\mathbb{R}_{+}^{*}$ is arguably too restrictive for our application, where one would more naturally merely assume $\operatorname{Dom}(f),\operatorname{Dom}(g)\subset\mathbb{R}_{+}$.

Still, it is possible to show the maximality of the product and the parallel product from scratch if one restricts the ranges to stay in $\mathbb{R}_{+}$, mainly to preserve monotonicity.

Proposition 1.14.

Given two maximally monotone operators $f,g:\mathbb{R}\to 2^{\mathbb{R}}$ with the same monotonicity:

  • $\operatorname{Dom}(f)\cap\operatorname{Dom}(g)\neq\emptyset\ \text{ and }\ \operatorname{Ran}(f),\operatorname{Ran}(g)\subset\mathbb{R}_{+}\quad\implies\quad f\cdot g$ is maximally monotone,

  • $\operatorname{Ran}(f)\cap\operatorname{Ran}(g)\neq\emptyset\ \text{ and }\ \operatorname{Dom}(f),\operatorname{Dom}(g)\subset\mathbb{R}_{+}\quad\implies\quad f\boxtimes g$ is maximally monotone.

Proof.

Let us do the proof in the case where $f,g$ are increasing; we will use the characterization through $\operatorname{Ran}(f\cdot g+\operatorname{Id})$ given by Theorem 1.8. Proposition 1.10 ensures that $\operatorname{Gra}(f)$ and $\operatorname{Gra}(g)$ are both connected sets of $\mathbb{R}^{2}$. Then it is not hard to see that $\operatorname{Gra}(f\cdot g)=\{(x,yz),x\in\operatorname{Dom}(f)\cap\operatorname{Dom}(g),\ y\in f(x),z\in g(x)\}$ is also connected and non-empty (it is simply a consequence of the continuity of the product in $\mathbb{R}$). By continuity of the projection on the second variable, one can deduce that $\operatorname{Ran}(f\cdot g)$, and therefore $\operatorname{Ran}(f\cdot g+\operatorname{Id})$, are both intervals of $\mathbb{R}$.

Now, let us note from Theorem 1.12 that $\inf\operatorname{Dom}(f)=\inf\operatorname{Dom}(g)=-\infty$ (otherwise $\mathbb{R}_{-}\cap(\operatorname{Ran}(f)\cup\operatorname{Ran}(g))\neq\emptyset$) and therefore $\inf\operatorname{Ran}(f\cdot g+\operatorname{Id})=-\infty$. And if one denotes $m=\sup(\operatorname{Dom}(f)\cap\operatorname{Dom}(g))$ (with $\operatorname{Dom}(f)\cap\operatorname{Dom}(g)\neq\emptyset$ by hypothesis), either $m<+\infty$, and then $\sup\operatorname{Ran}(f\cdot g)=\sup\operatorname{Ran}(f\cdot g+\operatorname{Id})=+\infty$, or $m=+\infty$, and again $\sup\operatorname{Ran}(f\cdot g+\operatorname{Id})=+\infty$. We have thus shown, by convexity of $\operatorname{Ran}(f\cdot g+\operatorname{Id})$, that $\operatorname{Ran}(f\cdot g+\operatorname{Id})=\mathbb{R}$; one can conclude with Theorem 1.8 that $f\cdot g$ is maximal.

The previous arguments can be applied to the mappings $f\circ(-\operatorname{Id})$ and $g\circ(-\operatorname{Id})$ when $f$ and $g$ are maximally decreasing. ∎

All those results on maximally monotone operators being set, we have reached the central objective of this section: relating the cumulative distribution function of the sum (resp. of the product) of random variables to the parallel sum (resp. to the parallel product). Although rather trivial (at least when working with invertible functions rather than operators), it is at the basis of the main theorems of the next sections. Since we are trying to bound probabilities, we decided to slightly restrict the class of operators considered in this kind of result; this simplifies the settings and the proofs.

Definition 1.15 ((Positive) probabilistic operators).

The class of maximally decreasing operators $\alpha:\mathbb{R}\to 2^{\mathbb{R}}$ such that $\{1\}\subset\operatorname{Ran}(\alpha)\subset\mathbb{R}_{+}$ is called the class of probabilistic operators and is denoted $\mathcal{M}_{\mathbb{P}}$.

The set $\left\{\alpha\in\mathcal{M}_{\mathbb{P}},\operatorname{Dom}(\alpha)\subset\mathbb{R}_{+}\right\}$ is called the class of positive probabilistic operators and is denoted $\mathcal{M}_{\mathbb{P}_{+}}$.

Lemma 1.16.

Given two operators $\alpha,\beta:\mathbb{R}\to 2^{\mathbb{R}}$:

$$\alpha,\beta\in\mathcal{M}_{\mathbb{P}}\ \implies\ \alpha\boxplus\beta\in\mathcal{M}_{\mathbb{P}}\qquad\text{and}\qquad\alpha,\beta\in\mathcal{M}_{\mathbb{P}_{+}}\ \implies\ \alpha\boxplus\beta,\ \alpha\boxtimes\beta\in\mathcal{M}_{\mathbb{P}_{+}}.$$
Proof.

The first implication is a simple consequence of Proposition 1.3 and Example 1.9, Item 5. For the second implication, let us simply note from Proposition 1.3 that:

$$\operatorname{Dom}(f\boxplus g)=\operatorname{Ran}(f^{-1}+g^{-1})\subset\operatorname{Ran}(f^{-1})+\operatorname{Ran}(g^{-1})=\operatorname{Dom}(f)+\operatorname{Dom}(g)\subset\mathbb{R}_{+}.$$

The same holds for $\operatorname{Dom}(f\boxtimes g)$. ∎

Proposition 1.17.

Given $\alpha_{1},\ldots,\alpha_{n}\in\mathcal{M}_{\mathbb{P}}$ and $n$ random variables $X_{1},\ldots,X_{n}$ satisfying, for all $k\in[n]$ and all $t\in\operatorname{Dom}(\alpha_{k})$, $\mathbb{P}(X_{k}>t)\leq\alpha_{k}(t)$, we have the concentration (recall from Lemma 1.4 that the parallel sum and the parallel product are both associative, so no parentheses are needed):

$$\forall t\geq 0:\quad\mathbb{P}\left(\sum_{k=1}^{n}X_{k}>t\right)\leq n\,\alpha_{1}\boxplus\cdots\boxplus\alpha_{n}(t).$$

If one assumes in addition that $\forall k\in[n]$, $\alpha_{k}\in\mathcal{M}_{\mathbb{P}_{+}}$ and $X_{k}\geq 0$ almost surely, then we have the concentration:

$$\forall t\geq 0:\quad\mathbb{P}\left(\prod_{k=1}^{n}X_{k}>t\right)\leq n\,\alpha_{1}\boxtimes\cdots\boxtimes\alpha_{n}(t).$$

When $n=2$ and $\alpha_{1}=\alpha_{2}=\alpha$, note that $\alpha\boxplus\alpha=(2\alpha^{-1})^{-1}=\alpha(\frac{\cdot}{2})$. In particular, the example depicted in Figure 2 shows that the inequality $\mathbb{P}\left(X+Y>t\right)\leq 2\alpha(\frac{t}{2})$ can be attained for some random variables $X$ and $Y$ and some values of $t$.

Figure 2: If the law of $(X,Y)\in\mathbb{R}^{2}$ is uniformly distributed on the gray triangles, then $\forall t\in[t_{1},t_{2}]$: $\mathbb{P}(X+Y>t)=\frac{2}{3}=2\mathbb{P}(X>\frac{t}{2})=2\mathbb{P}(Y>\frac{t}{2})$. One can also unbalance the weights between $T_{1}$ and $T_{2},T_{3}$ to get probabilities different from $\frac{1}{3}$.

The proof of this proposition requires two additional results that mainly allow us to deal with the multiplicity of the images of operators. Maximally monotone operators are actually almost always singleton-valued, as seen in the next theorem.

Theorem 1.18 (Kenderov, [7], Theorem 21.22).

Given a maximally monotone mapping $f:\mathbb{R}\to 2^{\mathbb{R}}$, there exists a subset $C\subset\operatorname{Dom}(f)$, dense in $\operatorname{Dom}(f)$, such that $\forall x\in C$, $f(x)$ is a singleton.

When $f(x)$ contains more than one element, it is still possible to deal with the fact that $f^{-1}\circ f\neq\operatorname{Id}$ thanks to the following elementary lemma.

Lemma 1.19.

Given an operator $f:\mathbb{R}\to 2^{\mathbb{R}}$:

$$\forall x\in\operatorname{Dom}(f):\qquad x\in f^{-1}\circ f(x)$$

(in particular $x\preceq f^{-1}\circ f(x)$ and $x\succeq f^{-1}\circ f(x)$).

Proof of Proposition 1.17.

Let us introduce the operator:

$$\gamma\equiv\alpha_{1}\boxplus\cdots\boxplus\alpha_{n}=\left(\alpha_{1}^{-1}+\cdots+\alpha_{n}^{-1}\right)^{-1},$$

which is maximally decreasing thanks to Proposition 1.13 (by definition of probabilistic operators, $\{1\}\subset\operatorname{Ran}(\alpha_{1})\cap\cdots\cap\operatorname{Ran}(\alpha_{n})$). Lemma 1.19 allows us to set:

$$t\in\left(\alpha_{1}^{-1}+\cdots+\alpha_{n}^{-1}\right)\circ\left(\alpha_{1}^{-1}+\cdots+\alpha_{n}^{-1}\right)^{-1}(t)=\alpha_{1}^{-1}(\gamma(t))+\cdots+\alpha_{n}^{-1}(\gamma(t)).$$

Considering $t\in\operatorname{Dom}(\gamma)$, we know from Proposition 1.10 that $\gamma(t)$ is an interval, as well as each of $\alpha_{1}^{-1}(\gamma(t)),\ldots,\alpha_{n}^{-1}(\gamma(t))$.

Let us first assume that $\gamma(t)=\{y\}$ for a given $y\in\mathbb{R}$. Then for all $i\in[n]$ we note $t_{i}^{(1)},t_{i}^{(2)}\in\mathbb{R}$ such that $\alpha_{i}^{-1}(y)=[t_{i}^{(1)},t_{i}^{(2)}]$ (we know from Proposition 1.10 that $\alpha_{i}^{-1}(y)$ is a closed interval, and it is not empty since $t\in\alpha_{1}^{-1}(y)+\cdots+\alpha_{n}^{-1}(y)$). Of course $t\in[t_{1}^{(1)}+\cdots+t_{n}^{(1)},t_{1}^{(2)}+\cdots+t_{n}^{(2)}]$, and one can then bound:

$$\mathbb{P}\left(X_{1}+\cdots+X_{n}>t\right)\leq\mathbb{P}\left(X_{1}+\cdots+X_{n}>t_{1}^{(1)}+\cdots+t_{n}^{(1)}\right)\leq\mathbb{P}\left(X_{1}>t_{1}^{(1)}\right)+\cdots+\mathbb{P}\left(X_{n}>t_{n}^{(1)}\right)\leq\alpha_{1}(t_{1}^{(1)})+\cdots+\alpha_{n}(t_{n}^{(1)}).$$

Now, $\forall i\in[n]$, $\gamma(t)=\{y\}$ with $y\in\alpha_{i}(t_{i}^{(1)})$, therefore the last set inequality becomes, as expected, $\mathbb{P}\left(X_{1}+\cdots+X_{n}>t\right)\leq n\gamma(t)$.

When $\gamma(t)$ contains more than one element, we still know from Theorem 1.12 that there exists $u>t$ with $u\in\operatorname{Dom}(\gamma)$ (otherwise $\gamma(t)$ would be unbounded from below, which would contradict the fact that $\operatorname{Ran}(\gamma)\subset\operatorname{Ran}(\alpha_{1})\cap\cdots\cap\operatorname{Ran}(\alpha_{n})\subset\mathbb{R}_{+}$). One can then employ Theorem 1.18 to introduce a set $C\subset(t,u)\subset\operatorname{Dom}(\gamma)$, dense in $[t,u]$, such that $\forall v\in C$, $\gamma(v)$ is a singleton. We know from the first part of the proof that $\forall v\in C$, $\mathbb{P}(X_{1}+\cdots+X_{n}>v)\leq n\gamma(v)$; the right continuity of $v\mapsto\mathbb{P}(X_{1}+\cdots+X_{n}>v)$ (equivalently, its lower semicontinuity, since the mapping is decreasing) then allows us to conclude:

$$\mathbb{P}(X_{1}+\cdots+X_{n}>t)\leq\sup_{v\in C}\mathbb{P}(X_{1}+\cdots+X_{n}>v)\leq\sup_{v\in C}n\gamma(v)\leq n\gamma(t).$$

The proof is analogous for the parallel product thanks to the almost sure implication:

$$X_{1}\cdots X_{n}>t_{1}^{(1)}\cdots t_{n}^{(1)}\quad\Longrightarrow\quad X_{1}>t_{1}^{(1)}\ \text{ or }\ \ldots\ \text{ or }\ X_{n}>t_{n}^{(1)}.$$

∎

In the setting of Proposition 1.17, note that if $\alpha_{1},\ldots,\alpha_{n}$ are all invertible mappings (singleton-valued operators) defined on $\mathbb{R}$, one can deduce that $\forall t\in\operatorname{Dom}(\alpha_{1}\boxplus\cdots\boxplus\alpha_{n})$:

$$\alpha_{1}\boxplus\cdots\boxplus\alpha_{n}(t)=\left(\alpha_{1}^{-1}+\cdots+\alpha_{n}^{-1}\right)^{-1}(t)\leq\left(n\max(\alpha_{1}^{-1},\ldots,\alpha_{n}^{-1})\right)^{-1}(t)=\max\left(\alpha_{1}\left(\frac{t}{n}\right),\ldots,\alpha_{n}\left(\frac{t}{n}\right)\right).$$

This inference is fully justified for general operators in the appendix.

2 Pivot of a concentration

Proposition 1.17 could be called a "tail concentration result"; one might more naturally be interested in the concentration around a deterministic value, as depicted in the next results.

Proposition 2.1.

Given two real-valued random variables $X,Y\in\mathbb{R}$ and two deterministic variables $\tilde{X},\tilde{Y}\in\mathbb{R}$, such that $\mathbb{P}(|X-\tilde{X}|>t)\leq\alpha(t)$ and $\mathbb{P}(|Y-\tilde{Y}|>t)\leq\beta(t)$, with $\alpha,\beta\in\mathcal{M}_{\mathbb{P}_{+}}$, we have the concentrations:

  • $\mathbb{P}(|X+Y-\tilde{X}-\tilde{Y}|>t)\leq\alpha\boxplus\beta(t)$,

  • $\mathbb{P}(|XY-\tilde{X}\tilde{Y}|>t)\leq 3(\alpha\boxtimes\beta)\ \boxplus\ \left(\alpha\circ\frac{\operatorname{Id}}{|\tilde{Y}|}\right)\ \boxplus\ \left(\beta\circ\frac{\operatorname{Id}}{|\tilde{X}|}\right)(t)$.

Proof.

It is a mere application of Proposition 1.17. The first result is a simple consequence of the triangle inequality, and the second result is deduced from the bound:

$$|XY-\tilde{X}\tilde{Y}|\leq|X-\tilde{X}||Y-\tilde{Y}|+|X-\tilde{X}||\tilde{Y}|+|Y-\tilde{Y}||\tilde{X}|.$$

∎

Remark 2.2.

Note that in the setting of Proposition 2.1 above, if $\alpha=\beta$ is a decreasing mapping and $|\tilde{Y}|\leq|\tilde{X}|$, then Proposition A.17 provides:

$$(\alpha\boxtimes\beta)\ \boxplus\ \left(\alpha\circ\frac{\operatorname{Id}}{|\tilde{Y}|}\right)\ \boxplus\ \left(\beta\circ\frac{\operatorname{Id}}{|\tilde{X}|}\right)\leq 3\,\alpha\circ\sqrt{\operatorname{Id}}\ \boxplus\ \alpha\circ\frac{\operatorname{Id}}{|\tilde{X}|}\ \boxplus\ \alpha\circ\frac{\operatorname{Id}}{|\tilde{X}|}\leq 3\max\left(\alpha\circ\frac{\sqrt{\operatorname{Id}}}{3},\alpha\circ\frac{\operatorname{Id}}{3|\tilde{X}|}\right)\leq 3\alpha\circ\frac{\sqrt{\operatorname{Id}}}{3}+3\alpha\circ\frac{\operatorname{Id}}{3|\tilde{X}|}.$$

Therefore, one obtains:

$$\mathbb{P}(|XY-\tilde{X}\tilde{Y}|>t)\leq 3\alpha\left(\frac{\sqrt{t}}{3}\right)+3\alpha\left(\frac{t}{3|\tilde{X}|}\right),$$

which shares some similarity with the Hanson-Wright results presented in Theorems 6.1 and 6.7.

The concentration around an independent copy, although less intuitive, presents some advantages, like the one provided in the next lemma: the concentration rate of a $\sigma$-Lipschitz transformation $f(X)\in\mathbb{R}$ of a random variable $X\in\mathbb{R}$ is proportional to the Lipschitz parameter, since:

$$|f(X)-f(X^{\prime})|>t\quad\Longrightarrow\quad|X-X^{\prime}|>\frac{t}{\sigma}.$$
Lemma 2.3.

Given a random variable $X\in\mathbb{R}$, an independent copy $X^{\prime}$, $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$, a parameter $\sigma\in\mathbb{R}_{+}$ and a $\sigma$-Lipschitz mapping $f\in\mathbb{R}^{\mathbb{R}}$, one has the implication:

$$\mathbb{P}(|X-X^{\prime}|>t)\leq\alpha(t)\quad\Longrightarrow\quad\mathbb{P}(|f(X)-f(X^{\prime})|>t)\leq\alpha\left(\frac{t}{\sigma}\right).$$

Concentration around an independent copy is somehow equivalent to concentration around a median, as stated in the next lemma.

Lemma 2.4.

Given a random variable $X\in\mathbb{R}$, an independent copy $X^{\prime}$, a median $m_{X}$ and $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$:

$$\exists\tilde{X}\in\mathbb{R}\ |\ \mathbb{P}(|X-\tilde{X}|>t)\leq\alpha(t)\quad\Rightarrow\quad\mathbb{P}(|X-X^{\prime}|>t)\leq 2\alpha\left(\frac{t}{2}\right)\quad\Rightarrow\quad\mathbb{P}(|X-m_{X}|>t)\leq 4\alpha\left(\frac{t}{2}\right).$$
Proof.

The second implication can be taken from [18], Corollary 1.5. The first implication is just a consequence of the fact that:

$$\mathbb{P}(|X-X^{\prime}|>t)\leq\mathbb{P}(|X-\tilde{X}|+|\tilde{X}-X^{\prime}|>t)\leq 2\mathbb{P}\left(|X-\tilde{X}|>\frac{t}{2}\right).$$

∎

One can then wonder whether the same holds true with the expectation of $X$ replacing the median (at least when the expectation is defined). Given $\alpha\in\mathcal{M}_{\mathbb{P}}$, we define its integral with the introduction of the set:

$$L_{0}^{\succeq\alpha}\equiv\left\{f:\mathring{\operatorname{Dom}(\alpha)}\to\mathbb{R}\ \text{integrable}\ |\ \alpha\preceq f\right\},$$

where the mappings $f:\mathring{\operatorname{Dom}(\alpha)}\to\mathbb{R}$ are interpreted as singleton-valued operators to give sense to the inequality $\alpha\preceq f$. If $L_{0}^{\succeq\alpha}=\emptyset$, $\int\alpha=+\infty$; otherwise $\int\alpha=\inf_{f\in L_{0}^{\succeq\alpha}}\int f$.

Lemma 2.5.

Given a random variable $X\in\mathbb{R}$, a deterministic scalar $\tilde{X}\in\mathbb{R}$, and $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$ such that $\mathbb{P}\left(|X-\tilde{X}|>t\right)\leq\alpha(t)$, one can bound:

$$\mathbb{P}\left(|X-\mathbb{E}[X]|>t\right)\leq\frac{1}{\min\left(1,\alpha(\int\alpha)\right)}\alpha\left(\frac{t}{2}\right).$$

Lemma 2.5 is proven thanks to:

Lemma 2.6.

Given $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$, for any $t,\tau>0$:

$$\min\left(1,\alpha(t-\tau)\right)\leq\frac{1}{\min(1,\alpha(\tau))}\alpha\left(\frac{t}{2}\right).$$
Proof.

If $t<2\tau$, then $\frac{\alpha\left(t/2\right)}{\min(1,\alpha(\tau))}\succeq\frac{\alpha\left(t/2\right)}{\alpha(\tau)}\geq 1$; and if $t\geq 2\tau$, then $\frac{t}{2}\leq t-\tau$, thus $\alpha(t-\tau)\preceq\alpha(\frac{t}{2})$. ∎

Proof of Lemma 2.5.

Let us bound:

$$|\mathbb{E}[X]-\tilde{X}|\leq\mathbb{E}[|X-\tilde{X}|]=\int_{\mathbb{R}_{+}}\mathbb{P}\left(|X-\tilde{X}|>t\right)dt\leq\int\alpha.$$

One can then apply Lemma 2.6 to obtain:

$$\mathbb{P}\left(\left|X-\mathbb{E}[X]\right|>t\right)\leq\mathbb{P}\left(\left|X-\tilde{X}\right|>t-\left|\mathbb{E}[X]-\tilde{X}\right|\right)\leq\min\left(1,\alpha\left(t-\int\alpha\right)\right)\leq\frac{1}{\min\left(1,\alpha\left(\int\alpha\right)\right)}\alpha\left(\frac{t}{2}\right).$$

∎

3 Concentration in high dimension

The concentration of measure theory is only relevant in high dimension, where, as Talagrand noted in [24]: "A random variable that depends (in a "smooth" way) on the influence of many independent variables (but not too much on any of them) is essentially constant". We give in the next two theorems two fundamental results of the theory and refer the reader to [18] for a wider list of examples. Recent powerful characterizations of product measures in most general settings can be found in [14, 13]. The concentration of a random variable $Z$ of a metric space is expressed through the concentration of any $f(Z)\in\mathbb{R}$ for $f$ belonging to a certain class of regularity. The random variables $f(Z)$ are classically called "observations" of $Z$. Depending on the regularity class of the observations required to satisfy the concentration inequality, one obtains different classes of concentration, typically, from the stronger to the weaker notion: Lipschitz (see Theorem 3.1), convex (see Theorem 3.2) and linear (see Theorems 6.1 and 6.7) concentration.

Without particular specification, $\mathbb{R}^{n}$ is endowed with the euclidean norm.

Theorem 3.1 ([18]).

Given a Gaussian vector $Z\in\mathbb{R}^{n}$ with identity covariance, for any $1$-Lipschitz mapping $f:\mathbb{R}^{n}\to\mathbb{R}$ and any independent copy $Z^{\prime}$ of $Z$:

$$\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)\leq 2e^{-t^{2}/2}.$$

One can easily deduce, as was done for Lemma 2.3, that any random variable $\Phi(Z)$, where $\Phi$ is a $\lambda$-Lipschitz transformation from $\mathbb{R}^{n}$ to some metric space, satisfies the similar result with the decay $t\mapsto 2e^{-\frac{t^{2}}{2\lambda^{2}}}$.

A second result was proven by Talagrand and only sets the concentration of convex observations. It is a weaker result but a very useful one, since it allows the study of discrete distributions (which cannot be obtained through Lipschitz transformations of the Gaussian vectors mentioned in Theorem 3.1).

Theorem 3.2 ([23]).

Given a random vector $Z\in[0,1]^{n}$ with independent entries and a $1$-Lipschitz convex mapping $f:\mathbb{R}^{n}\to\mathbb{R}$:

$$\mathbb{P}\left(\left|f(Z)-\mathbb{E}[f(Z)]\right|>t\right)\leq 2e^{-t^{2}/4}.$$

The upper concentration could equivalently have been restricted to $1$-Lipschitz and concave $f$.

Remark 3.3.

This theorem can be generalized to any random vector $AZ+b$, for deterministic $A\in\mathcal{M}_{n}$ and $b\in\mathbb{R}^{n}$ such that $\|A\|\leq 1$ (the convexity of $f\circ\Phi$ when $f$ is convex cannot be ensured for a general transformation $\Phi$, but it holds for affine ones). One could sum up this remark by saying that the class of convexly concentrated random vectors is stable under bounded affine transformations.

For some specific transformations $\Phi$ that preserve some convexity properties, it is sometimes possible to show the linear concentration of $\Phi(Z)$ (for instance when $\Phi$ is built with some entry-wise or matrix products as in Theorems 6.1 and 6.7; one can refer to [20, Theorem 1] for more general results on polynomials of random matrices).

To end our short presentation, it is interesting to note that, in finite dimension, the concentration of a linear observation $u(Z)$ happens around $u(\mathbb{E}[Z])$.

Lemma 3.4.

Given a finite-dimensional vector space $E$, a random vector $X\in E$ and an integrable $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$ such that, for all $1$-Lipschitz mappings $f:E\to\mathbb{R}$ and any median $m_{f}$ of $f(X)$, $\mathbb{P}\left(\left|f(X)-m_{f}\right|>t\right)\leq\alpha(t)$, then $\mathbb{E}[X]\in E$ is well defined and one has the concentration, for any linear form $u\in\mathcal{L}^{1}(E,\mathbb{R})$ such that $\|u\|\leq 1$:

$$\mathbb{P}\left(\left|u(X-\mathbb{E}[X])\right|>t\right)\leq\frac{1}{\min\left(1,\alpha(\int_{\mathbb{R}_{+}}\alpha)\right)}\alpha\left(\frac{t}{2}\right).$$

(One could easily define the expectation in any reflexive space, or even in any vector space of functions taking values in a reflexive space. However, the definition requires $u\mapsto\mathbb{E}[u(X)]$ to be continuous on $E^{*}$, the dual of $E$; without further information on $\mathbb{E}[u(X)]$ (like a bound), the lemma can only be true in a finite-dimensional space, where all linear forms are continuous. If, instead of the assumption on $1$-Lipschitz observations, one adopts the assumption $\mathbb{P}\left(\left|u(X-\tilde{X})\right|>t\right)\leq\alpha(t)$ for $\|u\|\leq 1$, then $X$ is in a sense centered, and it is possible to deduce the result in a general reflexive space.)

This lemma is a direct consequence of Lemma 2.5; we therefore proceed directly to the main result of this section: the extension of the concentration provided in Theorems 3.1 and 3.2 to some non-Lipschitz observations.

Theorem 3.5.

Let us consider a metric space $(E,d)$, a random variable $Z\in E$ and $n$ measurable mappings $\Lambda_{1},\ldots,\Lambda_{n}\in\mathbb{R}_{+}^{E}$ such that there exist $n+1$ probabilistic operators $\alpha,\beta_{1},\ldots,\beta_{n}\in\mathcal{M}_{\mathbb{P}_{+}}$ satisfying, for all $1$-Lipschitz mappings $f:E\to\mathbb{R}$ and any independent copy $Z^{\prime}$ of $Z$:

$$\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)\leq\alpha(t)\qquad\text{and}\qquad\forall k\in[n]:\quad\mathbb{P}\left(\Lambda_{k}(Z)\geq t\right)\leq\beta_{k}(t).$$

Given a supplementary metric space $(E^{\prime},d^{\prime})$ and $\Phi:E\to E^{\prime}$, if we assume that $\forall z,z^{\prime}\in E$:

$$d^{\prime}(\Phi(z),\Phi(z^{\prime}))\leq\max(\Lambda_{1}(z),\Lambda_{1}(z^{\prime}))\cdots\max(\Lambda_{n}(z),\Lambda_{n}(z^{\prime}))\cdot d(z,z^{\prime}),\qquad(5)$$

then for any $1$-Lipschitz $f:E^{\prime}\to\mathbb{R}$:

$$\mathbb{P}\left(\left|f(\Phi(Z))-f(\Phi(Z^{\prime}))\right|>t\right)\leq(2n+1)\,\alpha\boxtimes\beta_{1}\boxtimes\cdots\boxtimes\beta_{n}(t).$$

In the Hanson-Wright setting, $E=\mathbb{R}^{n}$, $\Phi:X\mapsto X^{T}AX$ for some deterministic matrix $A\in\mathcal{M}_{n}$, and $\Lambda_{1}:z\mapsto 2\|Az\|$ (then $|\Phi(Z)-\Phi(Z^{\prime})|\leq\max(\Lambda_{1}(Z),\Lambda_{1}(Z^{\prime}))\|Z-Z^{\prime}\|$ when $A$ is symmetric).
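For completeness, here is a minimal derivation of this variation bound (our computation, written for symmetric $A$, so that $\|A^{T}z\|=\|Az\|$):

$$\Phi(z)-\Phi(z^{\prime})=z^{T}Az-z^{\prime T}Az^{\prime}=z^{T}A(z-z^{\prime})+(z-z^{\prime})^{T}Az^{\prime},$$

so that $|\Phi(z)-\Phi(z^{\prime})|\leq(\|Az\|+\|Az^{\prime}\|)\|z-z^{\prime}\|\leq 2\max(\|Az\|,\|Az^{\prime}\|)\|z-z^{\prime}\|=\max(\Lambda_{1}(z),\Lambda_{1}(z^{\prime}))\|z-z^{\prime}\|$.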

Theorem 3.5 is a simple consequence of the following result, which we take from [15].

Lemma 3.6.

In a metric space $(E,d)$, given a subset $A\subset E$ and a $\lambda$-Lipschitz mapping $f:A\to\mathbb{R}$, $\lambda>0$, the continuation $\tilde{f}:E\to\mathbb{R}$ defined as:

$$\forall x\in E:\quad\tilde{f}(x)=\sup_{y\in A}f(y)-\lambda d(x,y)\qquad(6)$$

is $\lambda$-Lipschitz and satisfies $\forall x\in A$: $\tilde{f}(x)=f(x)$.

Proof.

First note that since $f$ is $\lambda$-Lipschitz on $A$, given $x\in A$, for all $y\in A$:

$$f(x)\geq f(y)-\lambda d(x,y),$$

and in addition $f(x)=f(x)-\lambda d(x,x)$; therefore $f(x)=\tilde{f}(x)$. Now, given $x,z\in E$, the triangle inequality $d(x,y)\leq d(y,z)+d(z,x)$ provides:

$$\tilde{f}(x)\geq\sup_{y\in A}f(y)-\lambda\left(d(y,z)+d(z,x)\right)=\tilde{f}(z)-\lambda d(z,x),$$

which implies $|\tilde{f}(z)-\tilde{f}(x)|\leq\lambda d(x,z)$ by symmetry. ∎

Proof of Theorem 3.5.

For simplicity we do the proof for invertible mappings $\alpha,\beta_{1},\ldots,\beta_{n}$; we leave the proof for general probabilistic operators as an exercise for the reader, the different arguments having already been provided in the proof of Proposition 1.17. Let us introduce the notation $\gamma=\alpha\boxtimes\beta_{1}\boxtimes\cdots\boxtimes\beta_{n}$ and consider $t\in\operatorname{Dom}(\gamma)$. We know from Lemma 1.19 that:

$$t=\theta\,\alpha^{-1}\circ\gamma(t)\qquad\text{where:}\quad\theta\equiv\left(\beta_{1}\boxtimes\cdots\boxtimes\beta_{n}\right)^{-1}\circ\gamma\left(t\right).\qquad(7)$$

Introducing the set $\mathcal{A}\equiv\{z\in E\ |\ \forall k\in[n],\Lambda_{k}(z)<\beta_{k}^{-1}(\gamma(t))\}$, one gets:

$$\mathbb{P}\left(\left|f(\Phi(Z))-f(\Phi(Z^{\prime}))\right|>t\right)\leq\mathbb{P}\left(\left|f(\Phi(Z))-f(\Phi(Z^{\prime}))\right|>t,\ (Z,Z^{\prime})\in\mathcal{A}^{2}\right)+\mathbb{P}\left(\left|f(\Phi(Z))-f(\Phi(Z^{\prime}))\right|>t,\ (Z,Z^{\prime})\notin\mathcal{A}^{2}\right).\qquad(8)$$

We know from (5) that $f\circ\Phi$ is $\theta$-Lipschitz on $\mathcal{A}$. Let us then denote $\tilde{f}$ the $\theta$-Lipschitz continuation of $f\circ\Phi|_{\mathcal{A}}$ defined in Lemma 3.6; one can bound, thanks to Lemma 1.19 and (7):

$$\mathbb{P}\left(\left|f(\Phi(Z))-f(\Phi(Z^{\prime}))\right|>t,\ (Z,Z^{\prime})\in\mathcal{A}^{2}\right)\leq\mathbb{P}\left(\left|\tilde{f}(Z)-\tilde{f}(Z^{\prime})\right|>t\right)\leq\alpha\left(\frac{t}{\theta}\right)=\alpha\circ\alpha^{-1}\circ\gamma\left(t\right)=\gamma\left(t\right).$$

One can besides bound, from the hypotheses on the concentration of $\Lambda_{1}(Z),\ldots,\Lambda_{n}(Z)$:

$$\mathbb{P}\left(\left|f(\Phi(Z))-f(\Phi(Z^{\prime}))\right|>t,\ (Z,Z^{\prime})\notin\mathcal{A}^{2}\right)\leq\mathbb{P}\left(\max(\Lambda_{1}(Z),\Lambda_{1}(Z^{\prime}))\geq\beta_{1}^{-1}(\gamma(t))\right)+\cdots+\mathbb{P}\left(\max(\Lambda_{n}(Z),\Lambda_{n}(Z^{\prime}))\geq\beta_{n}^{-1}(\gamma(t))\right)\leq 2\beta_{1}\circ\beta_{1}^{-1}(\gamma(t))+\cdots+2\beta_{n}\circ\beta_{n}^{-1}(\gamma(t))\preceq 2n\gamma\left(t\right).$$

Combining the two last inequalities with (8), one finally obtains the result of the theorem. ∎

To adapt Theorem 3.5 to the case of convex concentration, one needs to find a $1$-Lipschitz and convex mapping that could play the role of the mapping defined in (6). When the original mapping is differentiable, a good suggestion was provided in [1]; we adapt their definition to merely convex settings in Lemma 3.9, thanks to the notion of subgradient that we recall below.

Definition 3.7 (Subgradient).

Given a euclidean vector space $E$, a convex mapping $f:E\to\mathbb{R}\cup\{-\infty,+\infty\}$ and a point $x\in E$, the subgradient of $f$ at $x$ is the set:

$$\partial f(x)=\left\{g\in E,\forall y\in E:f(y)\geq f(x)+\langle g,y-x\rangle\right\}.$$

The subgradient is well suited to the study of convex Lipschitz mappings thanks to the following property.

Lemma 3.8.

Given a euclidean vector space $E$, an open set $A\subset E$ and a convex and $\lambda$-Lipschitz mapping $f:A\to\mathbb{R}$: for all $x\in A$, $\partial f(x)\neq\emptyset$ and $g\in\partial f(x)\implies\|g\|\leq\lambda$.

Proof.

The non-emptiness is provided for instance in [8], Proposition 5.4.1. Now, note that given $x\in A$ and $g\in\partial f(x)$:

$$\lambda\geq\left|f(x)-f\left(x+\frac{g}{\|g\|}\right)\right|\geq\left\langle g,\frac{g}{\|g\|}\right\rangle=\|g\|.$$

∎

This lemma allows us to define rigorously our Lipschitz convex continuation.

Lemma 3.9.

Given a euclidean vector space $E$, a non-empty open set $A\subset E$ and a $\lambda$-Lipschitz, convex mapping $f:A\to\mathbb{R}$, the mapping:

$$\tilde{f}:E\longrightarrow\mathbb{R}\cup\{+\infty\},\qquad y\longmapsto\sup_{x\in A,\ g\in\partial f(x)}\langle g,y-x\rangle+f(x)\qquad(9)$$

is convex, $\lambda$-Lipschitz and satisfies:

$$\forall x\in A:\quad\tilde{f}(x)=f(x).$$

If $\sup_{x\in A}f(x)<+\infty$, then $\tilde{f}$ only takes finite values.

Proof.

First, the convexity is obvious as convexity is stable through supremum. Second, the triangular inequality satisfied by suprema allows us to set that y,zE\forall y,z\in E:

|f~(y)f~(z)|\displaystyle\left|\tilde{f}(y)-\tilde{f}(z)\right| =|supxAgf(x)g,yx+f(x)supxAgf(x)g,zx+f(x)|\displaystyle=|\sup_{\genfrac{}{}{0.0pt}{2}{x\in A}{g\in\partial f(x)}}\langle g,y-x\rangle+f(x)-\sup_{\genfrac{}{}{0.0pt}{2}{x^{\prime}\in A}{g\in\partial f(x^{\prime})}}\langle g,z-x^{\prime}\rangle+f(x^{\prime})|
|supxAgf(x)g,yz|λyz.\displaystyle\leq|\sup_{\genfrac{}{}{0.0pt}{2}{x\in A}{g\in\partial f(x)}}\langle g,y-z\rangle|\leq\lambda\|y-z\|.

Finally, for all yAy\in A and for all gf(y)g\in\partial f(y):

f(y)=f(y)+g,yysupxAgf(x)g,yx+f(x)f(y),\displaystyle f(y)=f(y)+\langle g,y-y\rangle\leq\sup_{\genfrac{}{}{0.0pt}{2}{x\in A}{g\in\partial f(x)}}\langle g,y-x\rangle+f(x)\leq f(y),

by definition of the subgradient. In other words f(y)f~(y)f(y)f(y)\leq\tilde{f}(y)\leq f(y) which implies f(y)=f~(y)f(y)=\tilde{f}(y).

Let us assume that there exists $y\in E$ such that $\tilde{f}(y)=+\infty$; then there exist two sequences of vectors $(x_{n})_{n\in\mathbb{N}}\in A^{\mathbb{N}}$ and $(g_{n})_{n\in\mathbb{N}}\in E^{\mathbb{N}}$ such that $\forall n\in\mathbb{N}$, $g_{n}\in\partial f(x_{n})$, and:

limngn,yxn+f(xn)=+(=f~(y)).\displaystyle\lim_{n\to\infty}\langle g_{n},y-x_{n}\rangle+f(x_{n})=+\infty\quad(=\tilde{f}(y)).

That means in particular that limnf(xn)=+\lim_{n\to\infty}f(x_{n})=+\infty, since gnλ\|g_{n}\|\leq\lambda and yxnd(A,y)\|y-x_{n}\|\leq d(A,y). ∎
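For intuition, the continuation (9) can be probed numerically. The following Python sketch is purely illustrative and not part of the formal development: it takes $f(x)=|x|$ on the open set $A=(-1,1)$ (where the subgradient away from $0$ is $\{\operatorname{sign}(x)\}$), approximates the supremum in (9) over a finite grid of pairs $(x,g)$, and recovers $\tilde{f}(y)=|y|$ on the whole line.

```python
import numpy as np

# Illustrative sketch of the continuation (9): f(x) = |x| restricted to the
# open set A = (-1, 1) is convex and 1-Lipschitz, with subgradient {sign(x)}
# away from 0.  We approximate the supremum in (9) over a finite grid.
xs = np.linspace(-0.99, 0.99, 400)     # even count: 0 is not a grid point
gs = np.sign(xs)                       # one subgradient g at each base point x

def f_tilde(y):
    # sup over the sampled pairs (x, g) of <g, y - x> + f(x)
    return np.max(gs * (y - xs) + np.abs(xs))

for y in [-3.0, -1.5, -0.5, 0.0, 0.5, 1.5, 3.0]:
    print(f"y = {y:+.1f}   f_tilde(y) = {f_tilde(y):.4f}   |y| = {abs(y):.4f}")
```

On $A$ the continuation coincides with $f$, and outside $A$ it extends $f$ while preserving convexity and the Lipschitz constant, as Lemma 3.9 predicts.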

Theorem 3.10.

Let us consider a euclidean vector space (E,)(E,\|\cdot\|), a random vector ZEZ\in E, nn continuous mappings Λ1,,Λn+E\Lambda_{1},\ldots,\Lambda_{n}\in\mathbb{R}_{+}^{E} such that there exist n+1n+1 probabilistic operators α,β1,,βn+\alpha,\beta_{1},\ldots,\beta_{n}\in\mathcal{M}_{\mathbb{P}_{+}}, satisfying for any 11-Lipschitz and convex mapping f:Ef:E\to\mathbb{R} and any independent copy of ZZ, ZZ^{\prime}:

\displaystyle\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)\leq\alpha(t)\qquad\text{and}\qquad\forall k\in[n]:\quad\mathbb{P}\left(\Lambda_{k}(Z)\geq t\right)\leq\beta_{k}(t).

Given a convex mapping Φ:E\Phi:E\to\mathbb{R}, if we assume that for all z,zEz,z^{\prime}\in E:

|Φ(z)Φ(z)|max(Λ1(z),Λ1(z))max(Λn(z),Λn(z))zz,\displaystyle\left|\Phi(z)-\Phi(z^{\prime})\right|\leq\max(\Lambda_{1}(z),\Lambda_{1}(z^{\prime}))\cdots\max(\Lambda_{n}(z),\Lambda_{n}(z^{\prime}))\cdot\|z-z^{\prime}\|,

then:

(|Φ(Z)Φ(Z)|>t)(2n+1)αβ1βn(t).\displaystyle\mathbb{P}\left(\left|\Phi(Z)-\Phi(Z^{\prime})\right|>t\right)\leq(2n+1)\alpha\boxtimes\beta_{1}\boxtimes\cdots\boxtimes\beta_{n}(t).
Proof.

The proof is of course quite similar to the one of Theorem 3.5, except that in this case one must check that the sets $\mathcal{A}\equiv\{z\in E\ |\ \forall k\in[n],\Lambda_{k}(z)<\beta_{k}^{-1}(\gamma(t))\}$, for a given $t\in\operatorname{Dom}(\gamma)$, are open, in order to employ Lemma 3.9 instead of Lemma 3.6. To show that $\mathcal{A}$ is open, first note that:

\displaystyle\inf\beta_{k}^{-1}(\gamma(t))=\inf\beta_{k}^{-1}(\max(\gamma(t)))\in\beta_{k}^{-1}(\gamma(t)),

thanks to Proposition 1.10, which states that $\gamma(t)$ and $\beta_{k}^{-1}(\max(\gamma(t)))$ are both closed. One can then conclude with the continuity hypothesis on $\Lambda_{1},\ldots,\Lambda_{n}$. The rest of the proof is strictly the same. ∎

4 Heavy tailed random vector concentration

The aim of this section is not to provide optimal heavy-tailed concentration results but merely to present a straightforward and efficient way to show the Lipschitz concentration of random vectors having independent heavy-tailed entries. The simple idea is to employ the Gaussian concentration given in Theorem 3.1 as a pivot to reach more general concentration decays.

To simplify the notations, let us introduce the operator 2+\mathcal{E}_{2}\in\mathcal{M}_{\mathbb{P}_{+}} defined, for all t+t\in\mathbb{R}_{+}, as:

\displaystyle\mathcal{E}_{2}(t)=2e^{-t^{2}/2}\quad\operatorname{if}\ t>0\qquad\operatorname{and}\qquad\mathcal{E}_{2}(0)=[2,+\infty). (10)
Proposition 4.1.

Let us consider a random vector $X=(X_{1},\ldots,X_{n})\in\mathbb{R}^{n}$ such that there exist a set of independent Gaussian random variables $Z_{1},\ldots,Z_{n}\sim\mathcal{N}(0,1)$ and $n$ piecewise differentiable mappings $\phi_{1},\ldots,\phi_{n}:\mathbb{R}\to\mathbb{R}$ satisfying:

\displaystyle\forall i\in[n]:\quad X_{i}=\phi_{i}(Z_{i})\quad a.s.

Given an increasing invertible mapping $h:\mathbb{R}_{+}\to\mathbb{R}_{+}$ such that $\forall i\in[n],\forall t\in\mathbb{R}_{+}:\phi_{i}^{\prime}(t),\phi_{i}^{\prime}(-t)\leq h(t)$, and such that for all $a>2\log(2n)$, $b>0$: $h(\sqrt{a+b})\leq h(\sqrt{a})h(\sqrt{b})$ (in other words, we require $t\mapsto\log\circ h(\sqrt{t})$ to be sub-additive beyond $2\log(2n)$; it is true in particular for $h:t\mapsto e^{t^{2}}$), then for any $f:\mathbb{R}^{n}\to\mathbb{R}$, $1$-Lipschitz, and any independent copy $X^{\prime}$ of $X$, we have the concentration:

(|f(X)f(X)|>t)32(Idh)1(th(2log(n))).\displaystyle\mathbb{P}\left(\left|f(X)-f(X^{\prime})\right|>t\right)\leq 3\mathcal{E}_{2}\circ\left(\operatorname{Id}\cdot h\right)^{-{1}}\circ\left(\frac{t}{h(\sqrt{2\log(n)})}\right). (11)

In particular, given $r>0$ and introducing the quantity $C_{h,r}\equiv 6\int_{0}^{\infty}h(t)^{r}t^{r+1}e^{-\frac{t^{2}}{2}}dt$, possibly infinite but independent of $n$, one can bound:

\displaystyle\mathbb{E}[|f(X)-f(X^{\prime})|^{r}]\leq C_{h,r}h(\sqrt{2\log(n)})^{r}. (12)
Remark 4.2.
  1. 1.

    One might be tempted to replace the mapping $(\operatorname{Id}\cdot h)^{-1}$ with $H^{-1}$ where $H:t\mapsto\int_{0}^{t}h(u)du$. This is generally not possible because one rather has $\forall t\geq 0$: $H(t)\leq\int_{0}^{t}h(t)du=th(t)$, which implies, thanks to Lemma A.6, $(\operatorname{Id}\cdot h)^{-1}\leq H^{-1}$ and therefore $\mathcal{E}_{2}\circ H^{-1}\leq\mathcal{E}_{2}\circ(\operatorname{Id}\cdot h)^{-1}$. However, for heavy-tailed distributions, $h$ is exponentially increasing, which implies that the tail behaviors of $\mathcal{E}_{2}\circ H^{-1}$ and $\mathcal{E}_{2}\circ(\operatorname{Id}\cdot h)^{-1}$ are very similar.

  2. 2.

    In the setting of Proposition 4.1, Lemma 2.4 allows us to replace the random variable $f(X^{\prime})$ with any of its medians, at the mere cost of a factor $2$ in inequalities (11) and (12).

Corollary 4.3.

In the setting of Proposition 4.1, if $C_{h,1}<\infty$, $\mathbb{E}[f(X)]$ is well defined and one can further bound:

𝔼[|f(X)𝔼[f(X)]|r]Ch,rh(2log(n))r,with:Ch,r={4Ch,1rifr<14rCh,rifr1\displaystyle\mathbb{E}[|f(X)-\mathbb{E}[f(X)]|^{r}]\leq C_{h,r}^{\prime}h(\sqrt{2\log(n)})^{r},\qquad\text{with:}\ C_{h,r}^{\prime}=\left\{\begin{aligned} 4C_{h,1}^{r}\ &\operatorname{if}\ r<1\\ 4^{r}C_{h,r}\ &\operatorname{if}\ r\geq 1\\ \end{aligned}\right.
Proof.

Introducing the notation $\eta_{n}\equiv h(\sqrt{2\log(n)})$, we know from Proposition 4.1 and Lemma 2.4 that $\mathbb{E}[|f(X)-m_{f}|^{r}]\leq 2C_{h,r}\eta_{n}^{r}$, where $m_{f}$ is a median of $f(X)$. Besides, let us bound:

|mf𝔼[f(X)]|=|𝔼[mff(X)]|𝔼[|mff(X)|]2Ch,1ηn\displaystyle\left|m_{f}-\mathbb{E}[f(X)]\right|=\left|\mathbb{E}\left[m_{f}-f(X)\right]\right|\leq\mathbb{E}[|m_{f}-f(X)|]\leq 2C_{h,1}\eta_{n}

Now, if r<1r<1, Jensen’s inequality provides:

Ch,r=0(h(t)t)rtet22𝑑t(0h(t)t2et22𝑑t)rCh,1r\displaystyle C_{h,r}=\int_{0}^{\infty}(h(t)t)^{r}te^{-\frac{t^{2}}{2}}dt\leq\left(\int_{0}^{\infty}h(t)t^{2}e^{-\frac{t^{2}}{2}}dt\right)^{r}\leq C_{h,1}^{r}

and given two random variables $Y,Z>0$, $(Y+Z)^{r}\leq Y^{r}+Z^{r}$. If $r\geq 1$, Jensen's inequality implies here $C_{h,1}\leq C_{h,r}^{\frac{1}{r}}$, and the triangle inequality provides $\mathbb{E}[|Y+Z|^{r}]^{\frac{1}{r}}\leq\mathbb{E}[|Y|^{r}]^{\frac{1}{r}}+\mathbb{E}[|Z|^{r}]^{\frac{1}{r}}$, which allows us to conclude in all cases that:

\displaystyle\mathbb{E}[|f(X)-\mathbb{E}[f(X)]|^{r}]=\mathbb{E}[|f(X)-m_{f}+m_{f}-\mathbb{E}[f(X)]|^{r}]\leq C^{\prime}_{h,r}\eta_{n}^{r}. ∎

Before proving Proposition 4.1, let us apply it to the case of random variables with tail $\frac{1}{(1+t)^{q}}$, $q>0$. The next example is completely artificial but gives a simple illustration of Proposition 4.1.

Example 4.4.

Consider the setting of Proposition 4.1 with n2n\geq 2, ϕ1==ϕnϕ\phi_{1}=\cdots=\phi_{n}\equiv\phi, and:

ϕ:tet2/2q1,\displaystyle\phi:t\mapsto e^{t^{2}/2q}-1,

for a given $q>0$. We first consider an entry $X_{k}$ of $X$, for $k\in[n]$, to understand which distributions are concerned by this setting. Having in mind the elementary identities $\phi^{-1}(u)=\sqrt{2q\log(u+1)}$ and $(\phi^{-1})^{\prime}(u)=\sqrt{\frac{q}{2\log(u+1)}}\frac{1}{u+1}$, one can perform the change of variable $v=\phi(u)$ in order to get:

𝔼[|Xk|r]\displaystyle\mathbb{E}[|X_{k}|^{r}] =12π0+ϕ(u)reu2/2𝑑u=12π0+qvreϕ1(v)2/2(v+1)2log(v+1)𝑑v\displaystyle=\frac{1}{\sqrt{2\pi}}\int_{0}^{+\infty}\phi(u)^{r}e^{-u^{2}/2}du=\frac{1}{\sqrt{2\pi}}\int_{0}^{+\infty}\frac{\sqrt{q}v^{r}e^{-\phi^{-1}(v)^{2}/2}}{(v+1)\sqrt{2\log(v+1)}}dv
=12π0+qvr2log(v+1)(v+1)q+1𝑑v.\displaystyle=\frac{1}{\sqrt{2\pi}}\int_{0}^{+\infty}\frac{\sqrt{q}v^{r}}{\sqrt{2\log(v+1)}(v+1)^{q+1}}dv.

With this last identity, we see that the moment of order $r$ of $X_{k}$ is finite if and only if $r<q$; this setting can therefore concern heavy-tailed distributions.

We now check the hypotheses of Proposition 4.1 on the mapping $h:t\mapsto\phi^{\prime}(t)=\frac{t}{q}e^{t^{2}/2q}$. Given $t\in\mathbb{R}_{+}$, $\log\circ\phi^{\prime}(\sqrt{t})=\log(\frac{\sqrt{t}}{q}e^{\frac{t}{2q}})=\frac{\log(t)}{2}-\log(q)+\frac{t}{2q}$. Then, for any $a,b\geq 2$:

\displaystyle\log(a+b)\leq\log(2\max(a,b))\leq\log(2)+\max(\log(a),\log(b))\leq\log(a)+\log(b).

The $\log$ (like the identity mapping $t\mapsto t$) is therefore sub-additive on $[2\log(2n),\infty)\subset[2,+\infty)$, and consequently the same holds for $t\mapsto\log\circ\phi^{\prime}\circ\sqrt{t}$.

Besides, ϕ(2log(n))=1qn1q2log(n)\phi^{\prime}(\sqrt{2\log(n)})=\frac{1}{q}n^{\frac{1}{q}}\sqrt{2\log(n)} and one can employ Proposition 4.1 to set the concentration, for any f:nf:\mathbb{R}^{n}\to\mathbb{R}, 11-Lipschitz:

\displaystyle\mathbb{P}\left(\left|f(X)-f(X^{\prime})\right|>t\right)\leq 3\mathcal{E}_{2}\circ(\operatorname{Id}\cdot\phi^{\prime})^{-1}\circ\left(\frac{qt}{n^{\frac{1}{q}}\sqrt{2\log(n)}}\right)

Given a parameter r>0r>0, let us compute:

Ch,r60ϕ(t)rtr+1et22𝑑t=6qr0t2r+1ert22qt22𝑑t\displaystyle C_{h,r}\equiv 6\int_{0}^{\infty}\phi^{\prime}(t)^{r}t^{r+1}e^{-\frac{t^{2}}{2}}dt=\frac{6}{q^{r}}\int_{0}^{\infty}t^{2r+1}e^{\frac{rt^{2}}{2q}-\frac{t^{2}}{2}}dt

We see that if q>1q>1, then Ch,1<+C_{h,1}<+\infty and for any r<qr<q and any 11-Lipschitz mapping f:nf:\mathbb{R}^{n}\to\mathbb{R}:

𝔼[|f(X)𝔼[f(X)]|r]Ch,r(1qn1q2log(n))r.\displaystyle\mathbb{E}[|f(X)-\mathbb{E}[f(X)]|^{r}]\leq C_{h,r}^{\prime}\left(\frac{1}{q}n^{\frac{1}{q}}\sqrt{2\log(n)}\right)^{r}.

With this last example, we saw that there exists a random vector having independent entries whose minimal order of infinite moment equals $q$, and such that any Lipschitz observation has a finite moment of order $r$ for any $r<q$.
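As a purely numerical illustration of this example (a Python sketch, not part of the formal development, with the arbitrary choice $q=3$), one can compute the exact tail of an entry $X_{k}=e^{Z_{k}^{2}/2q}-1$ and compare it with the polynomial decay $(1+t)^{-q}$: the two coincide up to a slowly varying $\sqrt{\log}$ factor, which is the signature of the heavy tail making moments of order $r\geq q$ infinite.

```python
import numpy as np
from scipy.stats import norm

# Illustrative tail computation for Example 4.4: X_k = exp(Z_k^2/(2q)) - 1
# with Z_k ~ N(0,1), so that
#   P(X_k > t) = P(Z_k^2 > 2q log(1+t)) = 2 norm.sf(sqrt(2q log(1+t))),
# which matches (1+t)^(-q) up to a slowly varying sqrt(log) factor; hence
# moments of order r < q are finite while those of order r >= q are not.
q = 3.0
for t in [1e1, 1e2, 1e3, 1e4]:
    exact = 2 * norm.sf(np.sqrt(2 * q * np.log(1 + t)))
    poly = (1 + t) ** (-q)
    print(f"t = {t:8.0f}   P(X_k > t) = {exact:.3e}   (1+t)^(-q) = {poly:.3e}")
```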

Further developments would be required to see how far one can extend this class of heavy-tailed random vectors. In particular, some links could be made with the Fuk-Nagaev inequality [10, 21], which only concerns linear observations $Z_{1}+\cdots+Z_{n}$. In some aspects, Fuk-Nagaev is stronger because it states that when the entries have a finite moment of order $q$, the first moments scale as powers of $n^{\frac{1}{q}}$ (it would be $n^{\frac{1}{q}}\sqrt{2n\log n}$ in our case, since the sum is a $\sqrt{n}$-Lipschitz mapping for the Euclidean norm on $\mathbb{R}^{n}$). However, to our knowledge, the best Fuk-Nagaev result is only given for $q\geq 1$ ([1]) and, more importantly, it only concerns linear mappings, which is an important limitation.

The proof of Proposition 4.1 relies on the following elementary lemma that justifies the sub-additive condition required on tloghtt\mapsto\log\circ h\circ\sqrt{t}.

Lemma 4.5.

Given an increasing mapping h:++h:\mathbb{R}_{+}\to\mathbb{R}_{+} such that for all a>2log(2n)a>2\log(2n), b>0b>0, h(a+b)h(a)h(b)h(\sqrt{a+b})\leq h(\sqrt{a})h(\sqrt{b}), one can bound:

min(1,n2h1)2h1Idηn,with:ηnh(2log(n))\displaystyle\min(1,n\mathcal{E}_{2}\circ h^{-1})\leq\mathcal{E}_{2}\circ h^{-1}\circ\frac{\operatorname{Id}}{\eta_{n}},\qquad\text{with:}\ \eta_{n}\equiv h(\sqrt{2\log(n)})
Proof.

Starting with the implication:

n2(h1(t))1\displaystyle n\mathcal{E}_{2}(h^{-1}(t))\leq 1 \displaystyle\Longleftrightarrow th(2log(2n))\displaystyle t\geq h(\sqrt{2\log(2n)}) \displaystyle\Longrightarrow tηn,\displaystyle t\geq\eta_{n},

we consider tηnt\geq\eta_{n} and try to bound from below:

2(h1(t/ηn))n2(h1(t))=1ne12h1(t)212h1(t/ηn)2=e12(h1(t)22lognh1(t/ηn)2).\displaystyle\frac{\mathcal{E}_{2}(h^{-1}(t/\eta_{n}))}{n\mathcal{E}_{2}(h^{-1}(t))}=\frac{1}{n}e^{\frac{1}{2}h^{-1}(t)^{2}-\frac{1}{2}h^{-1}(t/\eta_{n})^{2}}=e^{\frac{1}{2}\left(h^{-1}(t)^{2}-2\log n-h^{-1}(t/\eta_{n})^{2}\right)}. (13)

Now starting from the hypothesis on hh applied to a=2log(n)a=2\log(n) and b=h1(tηn)2b=h^{-1}(\frac{t}{\eta_{n}})^{2} one gets:

\displaystyle h\left(\sqrt{2\log(n)+h^{-1}\left(\frac{t}{\eta_{n}}\right)^{2}}\right)\leq h(\sqrt{2\log(n)})\frac{t}{\eta_{n}}.

Applying Id2h1\operatorname{Id}^{2}\circ h^{-1} on both sides (h1h^{-1} like hh is increasing) one obtains:

2log(n)+h1(tηn)2=h1(ηn)2+h1(tηn)2h1(t)2.\displaystyle 2\log(n)+h^{-1}\left(\frac{t}{\eta_{n}}\right)^{2}=h^{-1}(\eta_{n})^{2}+h^{-1}\left(\frac{t}{\eta_{n}}\right)^{2}\leq h^{-1}(t)^{2}.

One can then inject this last inequality in (13) to finally obtain:

n2(h1(t))1\displaystyle n\mathcal{E}_{2}(h^{-1}(t))\leq 1 \displaystyle\Longrightarrow n2(h1(t))2(h1(t/ηn)),\displaystyle n\mathcal{E}_{2}(h^{-1}(t))\leq\mathcal{E}_{2}(h^{-1}(t/\eta_{n})),

which ends our proof. ∎
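Lemma 4.5 admits a quick numerical sanity check; the Python sketch below (illustrative only, with the arbitrary choice $n=50$) uses the prototypical case $h:t\mapsto e^{t^{2}}$, for which the sub-additivity hypothesis holds with equality, and verifies the inequality on a grid, where it is in fact an equality as soon as the left-hand side drops below $1$.

```python
import numpy as np

# Illustrative check of Lemma 4.5 for h(t) = exp(t^2), so that
# h^{-1}(t) = sqrt(log t) and eta_n = h(sqrt(2 log n)) = n^2,
# with E_2(s) = 2 exp(-s^2 / 2) as in (10).  For this h the hypothesis
# h(sqrt(a+b)) <= h(sqrt(a)) h(sqrt(b)) holds with equality.
E2 = lambda s: 2 * np.exp(-s**2 / 2)
h_inv = lambda t: np.sqrt(np.log(t))

n = 50
eta_n = n**2
t = np.geomspace(eta_n, 1e12, 60)                 # regime t >= eta_n
lhs = np.minimum(1.0, n * E2(h_inv(t)))
rhs = E2(h_inv(np.maximum(t / eta_n, 1.0)))       # h_inv needs arguments >= 1
assert np.all(lhs <= rhs + 1e-12)
print("min(1, n E2(h^-1(t))) <= E2(h^-1(t / eta_n)) on the whole grid")
```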

Proof of Proposition 4.1.

Let us introduce:

ϕ:nnz(ϕ1(z1),,ϕn(zn))\displaystyle\phi:\begin{aligned} \mathbb{R}^{n}&\longrightarrow&&\hskip 31.2982pt\mathbb{R}^{n}\\ z&\longmapsto&&(\phi_{1}(z_{1}),\ldots,\phi_{n}(z_{n}))\end{aligned}

to obtain the identity X=ϕ(Z)X=\phi(Z) (where we naturally defined Z(Z1,,Zn)Z\equiv(Z_{1},\ldots,Z_{n})). We want to apply Theorem 3.5 to the random vector ZZ and the mapping Φϕ\Phi\equiv\phi. Given an independent copy ZZ^{\prime}{} of ZZ, let us bound (recall that hh is increasing):

\displaystyle\|\phi(Z)-\phi(Z^{\prime})\|=\sqrt{\sum_{k=1}^{n}(\phi_{k}(Z_{k})-\phi_{k}(Z_{k}^{\prime}))^{2}}
\displaystyle\hskip 14.22636pt\leq\sqrt{\sum_{k=1}^{n}\max(h(|Z_{k}|),h(|Z_{k}^{\prime}|))^{2}(Z_{k}-Z_{k}^{\prime})^{2}}\ \leq\max_{k\in[n]}\left(h(|Z_{k}|),h(|Z_{k}^{\prime}|)\right)\|Z-Z^{\prime}\|.

Let us bound for any t0t\geq 0:

\displaystyle\mathbb{P}\left(\max_{k\in[n]}h(|Z_{k}|)\geq t\right)\leq n\mathbb{P}\left(h(|Z_{k}|)\geq t\right)\leq n\mathbb{P}\left(|Z_{k}|\geq h^{-1}(t)\right)\leq n\mathcal{E}_{2}(h^{-1}(t)).

A probability being smaller than $1$, one can employ Lemma 4.5 to get (with the notation $\eta_{n}\equiv h(\sqrt{2\log(n)})$):

\displaystyle\mathbb{P}\left(\max_{k\in[n]}h(|Z_{k}|)\geq t\right)\leq\min\left(1,n\mathcal{E}_{2}\circ h^{-1}\right)(t)\leq\mathcal{E}_{2}\circ h^{-1}\left(\frac{t}{\eta_{n}}\right)

One is left to employ Theorem 3.5 to finally get:

(|f(X)mf|>t)322h1(tηn)=32(Idh)1(tηn),\displaystyle\mathbb{P}\left(\left|f(X)-m_{f}\right|>t\right)\leq 3\mathcal{E}_{2}\boxtimes\mathcal{E}_{2}\circ h^{-1}\left(\frac{t}{\eta_{n}}\right)=3\mathcal{E}_{2}\circ(\operatorname{Id}\cdot h)^{-1}\left(\frac{t}{\eta_{n}}\right),

thanks to Proposition 1.5.

For the second statement, note that if we introduce $g\equiv(\operatorname{Id}\cdot h)^{-1}\circ\frac{\operatorname{Id}}{\eta_{n}}$, one can bound:

𝔼[|f(X)mf|r]\displaystyle\mathbb{E}[|f(X)-m_{f}|^{r}] =+(|f(X)mf|r>t)𝑑t\displaystyle=\int_{\mathbb{R}_{+}}\mathbb{P}\left(|f(X)-m_{f}|^{r}>t\right)dt
6r+tr1eg(t)22𝑑t\displaystyle\leq 6r\int_{\mathbb{R}_{+}}t^{r-1}e^{-\frac{g(t)^{2}}{2}}dt
=6rg(0)g1(t)r1g1(t)et22dt\displaystyle=6r\int_{g(0)}^{\infty}g^{-1}(t)^{r-1}g^{-1}{}^{\prime}(t)e^{-\frac{t^{2}}{2}}dt
=60g1(t)rtet22𝑑t[g1(t)ret22]g(0)+\displaystyle=6\int_{0}^{\infty}g^{-1}(t)^{r}te^{-\frac{t^{2}}{2}}dt-\left[g^{-1}(t)^{r}e^{-\frac{t^{2}}{2}}\right]_{g(0)}^{+\infty}
6ηnr0h(t)rtr+1et22𝑑t\displaystyle\leq 6\eta_{n}^{r}\int_{0}^{\infty}h(t)^{r}t^{r+1}e^{-\frac{t^{2}}{2}}dt

with the changes of variable $t\to t^{r}$ and $g(t)\to t$ (note that $g^{-1}(0)=0$, thus $g(0)=0$). ∎

5 Multilevel concentration

Let us study the particular settings of Theorems 3.5 and 3.10 when $\beta_{1},\ldots,\beta_{n}$ all take the form $\alpha\circ\tilde{\beta}_{1},\ldots,\alpha\circ\tilde{\beta}_{n}$, for $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$ satisfying, for any $f:\mathbb{R}^{n}\to\mathbb{R}$, $1$-Lipschitz:

t>0:(|f(Z)f(Z)|>t)α(t),\displaystyle\forall t>0:\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)\leq\alpha(t), (14)

given a random vector $Z\in\mathbb{R}^{n}$. We say that $\beta_{1},\ldots,\beta_{n}$ are “right compositions” of $\alpha$. It is in particular the case for the Hanson-Wright inequality, which we start to depict below. Given a deterministic matrix $A\in\mathcal{M}_{n}$, the application of Theorem 3.5 requires expressing the concentration of $\Lambda(Z)$ for $\Lambda:z\mapsto\|Az\|$, a functional that allows us to bound the variations:

|ZTAZZATZ|(AZ+AZ)ZZ2max(Λ(Z),Λ(Z))ZZ.\displaystyle\left|Z^{T}AZ-Z^{\prime}{}^{T}AZ^{\prime}\right|\leq(\|AZ\|+\|AZ^{\prime}\|)\|Z-Z^{\prime}\|\leq 2\max(\Lambda(Z),\Lambda(Z^{\prime}))\|Z-Z^{\prime}\|.

It is not hard to see that, as a A\|A\|-Lipschitz functional of ZZ, Λ\Lambda satisfies (|Λ(Z)m|t)α(t/A)\mathbb{P}\left(\left|\Lambda(Z)-m\right|\geq t\right)\leq\alpha(t/\|A\|), where mm is a median of Λ\Lambda. The lemma below allows us to see that the concentration function of Λ\Lambda is actually a right composition of α\alpha:

t0:(|Λ(Z)|t)αβ~(t),\displaystyle\forall t\geq 0:\quad\mathbb{P}\left(\left|\Lambda(Z)\right|\geq t\right)\leq\alpha\circ\tilde{\beta}(t), with:β~incmIdA.\displaystyle\text{with}:\quad\tilde{\beta}\equiv\operatorname{inc}_{m}\boxplus\frac{\operatorname{Id}}{\|A\|}. (15)
Lemma 5.1.

Given a random variable Λ\Lambda\in\mathbb{R}, a probabilistic operator α+\alpha\in\mathcal{M}_{\mathbb{P}_{+}}, and two parameters δ\delta\in\mathbb{R}, η+\eta\in\mathbb{R}_{+}:

t0:(|Λδ|t)α(tη)\displaystyle\forall t\geq 0:\quad\mathbb{P}(|\Lambda-\delta|\geq t)\leq\alpha\left(\frac{t}{\eta}\right) \displaystyle\Longrightarrow t0:(|Λ|t)α(inc|δ|Idη)(t).\displaystyle\forall t\geq 0:\quad\mathbb{P}(|\Lambda|\geq t)\leq\alpha\circ\left(\operatorname{inc}_{\left|\delta\right|}\boxplus\frac{\operatorname{Id}}{\eta}\right)(t).

Let us introduce some useful notations before proving this lemma. Given an increasing operator f:2f:\mathbb{R}\mapsto 2^{\mathbb{R}} such that +Dom(f)\mathbb{R}_{+}\subset\operatorname{Dom}(f), we denote f+f_{+}, the operator having domain equal to +\mathbb{R}_{+} and satisfying:

f+(t)=f(t)ift>0\displaystyle f_{+}(t)=f(t)\ \ \operatorname{if}\ t>0 and\displaystyle\operatorname{and} f+(0)=f(0).\displaystyle f_{+}(0)=f(0)_{-}.

In particular, given δ\delta\in\mathbb{R}, note that:

incδ1=δ+.\displaystyle\operatorname{inc}_{\delta}^{-1}=\delta_{+}. (16)

Besides, the fact that666Naturally Dom(01)={0}\operatorname{Dom}(0^{-1})=\{0\} and 01(0)=0^{-1}(0)=\mathbb{R} f+=max(01,f)f_{+}=\max(0^{-1},f) and Theorem A.16 imply:

Lemma 5.2.

Given a maximally increasing operator $f:\mathbb{R}\mapsto 2^{\mathbb{R}}$, $f_{+}^{-1}=\min(0,f^{-1})$.

Proof of Lemma 5.1.

Note that Λ|Λδ|+|δ|\Lambda\leq\left|\Lambda-\delta\right|+|\delta|, thus777One could be tempted to take advantage of Proposition 1.17 and the concentration (|δ|>t)αinc|δ|(t)\mathbb{P}\left(\left|\delta\right|>t\right)\leq\alpha\circ\operatorname{inc}_{|\delta|}(t) (|δ||\delta| being a constant), but that would introduce an unnecessary coefficient 22 in the decay. Besides, Proposition 1.17 can actually not be applied since αinc|δ|\alpha\circ\operatorname{inc}_{|\delta|} is not maximally monotone., for all t0t\geq 0:

(|Λ|t)\displaystyle\mathbb{P}\left(|\Lambda|\geq t\right) (|Λδ|+|δ|t)\displaystyle\leq\mathbb{P}\left(\left|\Lambda-\delta\right|+|\delta|\geq t\right)
\displaystyle\hskip 14.22636pt\leq\alpha\left(\max\left(\frac{t-|\delta|}{\eta},0\right)\right)=\alpha\circ(\eta\operatorname{Id}+|\delta|_{+})^{-1}(t)=\alpha\circ\left(\frac{\operatorname{Id}}{\eta}\ \boxplus\operatorname{inc}_{|\delta|}\right)(t). ∎
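Lemma 5.1 admits a quick numerical sanity check (an illustrative Python sketch; the Gaussian choice of $\Lambda$ and the values of $\delta,\eta$ are arbitrary assumptions, not part of the statement): taking $\alpha(t)=2\mathbb{P}(\mathcal{N}(0,1)\geq t)$ and $\Lambda=\delta+\eta\,\mathcal{N}(0,1)$, for which the hypothesis holds with equality, one can compare the exact value of $\mathbb{P}(|\Lambda|\geq t)$ with the bound $\alpha(\max((t-|\delta|)/\eta,0))$.

```python
import numpy as np
from scipy.stats import norm

# Illustrative check of Lemma 5.1 with alpha(t) = 2 * norm.sf(t) and
# Lambda = delta + eta * N(0,1), for which P(|Lambda - delta| >= t)
# equals alpha(t / eta) exactly.
alpha = lambda t: 2 * norm.sf(t)
delta, eta = 1.5, 0.7
t = np.linspace(0.0, 10.0, 201)
exact = norm.sf((t - delta) / eta) + norm.sf((t + delta) / eta)   # P(|Lambda| >= t)
bound = alpha(np.maximum((t - abs(delta)) / eta, 0.0))
assert np.all(exact <= bound + 1e-12)
print("P(|Lambda| >= t) stays below alpha(max((t - |delta|)/eta, 0)) on the grid")
```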

Applying Theorem 3.5 to the concentrations (14) and (15) then leads us to compute, thanks to Proposition 1.5, the parallel product:

αα(incmIdA)\displaystyle\alpha\boxtimes\alpha\circ\left(\operatorname{inc}_{m}\boxplus\frac{\operatorname{Id}}{\|A\|}\right) =α(Id(incmIdA))\displaystyle=\alpha\circ\left(\operatorname{Id}\boxtimes\left(\operatorname{inc}_{m}\boxplus\frac{\operatorname{Id}}{\|A\|}\right)\right) =α((Idincm)(IdIdA))\displaystyle=\alpha\circ\left((\operatorname{Id}\boxtimes\operatorname{inc}_{m})\boxplus\left(\operatorname{Id}\boxtimes\frac{\operatorname{Id}}{\|A\|}\right)\right)

thanks to Lemma 1.4. Here, with the new notations we introduced and Lemma 5.2, let us bound:

  • Idincm=(Idm+)1=max(0,Idm)Idm\operatorname{Id}\boxtimes\operatorname{inc}_{m}=(\operatorname{Id}\cdot m_{+})^{-1}=\max\left(0,\frac{\operatorname{Id}}{m}\right)\succeq\frac{\operatorname{Id}}{m},

  • $\forall t\geq 0$, $\operatorname{Id}\boxtimes\frac{\operatorname{Id}}{\|A\|}(t)=(\|A\|\operatorname{Id}^{2})^{-1}(t)\succeq\sqrt{\frac{\operatorname{Id}_{+}}{\|A\|}}(t)$,

One can finally conclude, since α+\alpha\in\mathcal{M}_{\mathbb{P}_{+}} is decreasing and the parallel sum (like the parallel product) preserves the order:

αα(incmIdA)α(IdmId+A)\displaystyle\alpha\boxtimes\alpha\circ\left(\operatorname{inc}_{m}\boxplus\frac{\operatorname{Id}}{\|A\|}\right)\preceq\alpha\circ\left(\frac{\operatorname{Id}}{m}\boxplus\sqrt{\frac{\operatorname{Id}_{+}}{\|A\|}}\right) (17)

We see that the new concentration function is again some variation of a right composition of $\alpha$. Such concentration functions can be involved in new concentration inequalities; that is why we now describe the mechanism systematically. Let us introduce the “short notations”, for all $a\geq 0$:

(ta)10inca\displaystyle\left(\frac{t}{a}\right)^{\frac{1}{0}}\equiv\operatorname{inc}_{a} and\displaystyle\operatorname{and} (t0)1ainc0.\displaystyle\left(\frac{t}{0}\right)^{\frac{1}{a}}\equiv\operatorname{inc}_{0}.

With these notations at hand, one sees that the concentration functions appearing in (17) take the general form $\alpha\circ\boxplus_{a\in A}(\frac{\operatorname{Id}}{\sigma_{a}})^{1/a}$ for finite sets $A\subset\mathbb{R}_{+}$ and $(\sigma_{a})_{a\in A}\in\mathbb{R}_{+}^{A}$. It is then possible to identify some simple calculation rules, provided in the next lemma. The proof is a simple consequence of the distributive property of the parallel product provided by Lemma 1.4.

Lemma 5.3.

Given nn finite subsets A(1),,A(n)+A^{(1)},\ldots,A^{(n)}\subset\mathbb{R}_{+}, and nn families of parameters σ(1)A(1),,σ(n)A(n)\sigma^{(1)}\in\mathbb{R}^{A^{(1)}},\ldots,\sigma^{(n)}\in\mathbb{R}^{A^{(n)}} one has the identity:

a1A(1)(Idσa1(1))1a1anA(n)(Idσan(n))1an=a1A(1),,anA(n)(Idσa1(1)σan(n))1a1++an.\displaystyle\mathop{\vphantom{\bigoplus}\mathchoice{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}}\displaylimits_{a_{1}\in A^{(1)}}\left(\frac{\operatorname{Id}}{\sigma^{(1)}_{a_{1}}}\right)^{\frac{1}{a_{1}}}\boxtimes\cdots\boxtimes\mathop{\vphantom{\bigoplus}\mathchoice{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}}\displaylimits_{a_{n}\in A^{(n)}}\left(\frac{\operatorname{Id}}{\sigma^{(n)}_{a_{n}}}\right)^{\frac{1}{a_{n}}}=\mathop{\vphantom{\bigoplus}\mathchoice{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}}\displaylimits_{a_{1}\in A^{(1)},\ldots,a_{n}\in A^{(n)}}\left(\frac{\operatorname{Id}}{\sigma^{(1)}_{a_{1}}\cdots\sigma^{(n)}_{a_{n}}}\right)^{\frac{1}{a_{1}+\cdots+a_{n}}}.
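After inversion, each factor $\boxplus_{a}(\operatorname{Id}/\sigma_{a})^{1/a}$ becomes the map $t\mapsto\sum_{a}\sigma_{a}t^{a}$ and the parallel product becomes an ordinary product of such expressions, so Lemma 5.3 amounts to distributing a product of sums. The following Python sketch (with arbitrary illustrative exponents and parameters) checks this identity on a grid.

```python
import itertools
import numpy as np

# Illustrative check of Lemma 5.3: identifying each operator with its inverse,
# a factor boxplus_{a in A} (Id/sigma_a)^{1/a} becomes t -> sum_a sigma_a * t**a,
# and the parallel product becomes an ordinary product, so the lemma is plain
# distribution of a product of sums.  Exponents and parameters are arbitrary.
families = [{0.0: 2.0, 1.0: 0.5}, {1.0: 1.5, 2.0: 3.0}, {0.5: 4.0}]  # {a: sigma_a}
t = np.linspace(0.1, 10.0, 50)

lhs = np.ones_like(t)
for fam in families:
    lhs = lhs * sum(sigma * t**a for a, sigma in fam.items())

rhs = np.zeros_like(t)
for combo in itertools.product(*(fam.items() for fam in families)):
    exponent = sum(a for a, _ in combo)
    coefficient = np.prod([sigma for _, sigma in combo])
    rhs = rhs + coefficient * t**exponent

assert np.allclose(lhs, rhs)
print("product of the inverse maps == sum over exponent tuples (Lemma 5.3)")
```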
Remark 5.4.

One can set similarly thanks to Theorem A.16:

mina1A(1)(Idσa1(1))1a1minanA(n)(Idσan(n))1an=mina1A(1),,anA(n)(Idσa1(1)σan(n))1a1++an.\displaystyle\min_{a_{1}\in A^{(1)}}\left(\frac{\operatorname{Id}}{\sigma^{(1)}_{a_{1}}}\right)^{\frac{1}{a_{1}}}\boxtimes\cdots\boxtimes\min_{a_{n}\in A^{(n)}}\left(\frac{\operatorname{Id}}{\sigma^{(n)}_{a_{n}}}\right)^{\frac{1}{a_{n}}}=\min_{a_{1}\in A^{(1)},\ldots,a_{n}\in A^{(n)}}\left(\frac{\operatorname{Id}}{\sigma^{(1)}_{a_{1}}\cdots\sigma^{(n)}_{a_{n}}}\right)^{\frac{1}{a_{1}+\cdots+a_{n}}}.

Given A+A\subset\mathbb{R}_{+} and (σa)aA+A(\sigma_{a})_{a\in A}\in\mathbb{R}_{+}^{A}:

infaA(Idσa)1a+=(exp(infaAIdlog(σa)a)log)+,\displaystyle{\inf_{a\in A}\left(\frac{\operatorname{Id}}{\sigma_{a}}\right)^{\frac{1}{a}}}_{+}=\left(\exp\circ\left(\inf_{a\in A}\frac{\operatorname{Id}-\log(\sigma_{a})}{a}\right)\circ\log\right)_{+},

and $(\inf_{a\in A}\frac{\operatorname{Id}-\log(\sigma_{a})}{a})^{-1}=\sup_{a\in A}a\operatorname{Id}+\log(\sigma_{a})$: we recognize here the inverse of the convex conjugate of $(-\log\sigma_{a})_{a\in A}$. This remark leads to some interesting, yet more laborious, proofs of Theorem 5.5 below.

Theorem 5.5.

Let us consider a metric space $(E,d)$, a random variable $Z\in E$, $n$ measurable mappings $\Lambda_{1},\ldots,\Lambda_{n}\in\mathbb{R}_{+}^{E}$ such that there exist $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$, $n$ finite index sets containing $0$, $A^{(1)},\ldots,A^{(n)}\subset\mathbb{R}_{+}$, and $n$ families of positive parameters $\sigma^{(1)}\in\mathbb{R}_{+}^{A^{(1)}},\ldots,\sigma^{(n)}\in\mathbb{R}_{+}^{A^{(n)}}$ such that for all $f:E\to\mathbb{R}$, $1$-Lipschitz, and for any independent copy $Z^{\prime}$ of $Z$:

(|f(Z)f(Z)|>t)α(t)\displaystyle\mathbb{P}\left(\left|f(Z)-f(Z^{\prime})\right|>t\right)\leq\alpha(t) (18)

and:

k[n]:(|Λ(k)(Z)σ0(k)|t)αaA(k){0}(Idσa(k))1a(t)\displaystyle\forall k\in[n]:\quad\mathbb{P}\left(\left|\Lambda^{(k)}(Z)-\sigma^{(k)}_{0}\right|\geq t\right)\leq\alpha\circ\mathop{\vphantom{\bigoplus}\mathchoice{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}}\displaylimits_{a\in A^{(k)}\setminus\{0\}}\left(\frac{\operatorname{Id}}{\sigma^{(k)}_{a}}\right)^{\frac{1}{a}}(t)

Given a supplementary metric space (E,d)(E^{\prime},d^{\prime}), and a measurable mapping Φ:EE\Phi:E\to E^{\prime}, if we assume that for any z,zEz,z^{\prime}\in E:

d(Φ(z),Φ(z))max(Λ1(z),Λ1(z))max(Λn(z),Λn(z))d(z,z),a.s.\displaystyle d^{\prime}(\Phi(z),\Phi(z^{\prime}))\leq\max(\Lambda_{1}(z),\Lambda_{1}(z^{\prime}))\cdots\max(\Lambda_{n}(z),\Lambda_{n}(z^{\prime}))\cdot d(z,z^{\prime}),\quad a.s. (19)

then for any g:Eg:E^{\prime}\to\mathbb{R}, 11-Lipschitz:

(|g(Φ(Z))g(Φ(Z))|>t)αakA(k),k[n](Idσa1(1)σan(n))11+a1++an(t).\displaystyle\mathbb{P}\left(\left|g(\Phi(Z))-g(\Phi(Z^{\prime}))\right|>t\right)\leq\alpha\circ\mathop{\vphantom{\bigoplus}\mathchoice{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}}\displaylimits_{a_{k}\in A^{(k)},k\in[n]}\left(\frac{\operatorname{Id}}{\sigma^{(1)}_{a_{1}}\cdots\sigma^{(n)}_{a_{n}}}\right)^{\frac{1}{1+a_{1}+\cdots+a_{n}}}(t).

The result is also true in a convex concentration setting (EE euclidean vector space, E=E^{\prime}=\mathbb{R} and (18) is true for any ff 11-Lipschitz and convex).

Although practical realizations of this setting do not seem very frequent, it partly explains the frequent appearance of multilevel concentration (in particular in [11], whose setting is quite far from the literature around our Theorem 0.2, briefly presented in the introduction).

Let us first give some remarks on this theorem before providing its proof.

  • If σ0(1)=0\sigma_{0}^{(1)}=0, then, by convention, denoting a=1+a2++ana=1+a_{2}+\cdots+a_{n} one has:

    t>0:((t0σa2(2)σan(n))11+a2++an)1=((t0)1a)1=inc01=0+,\displaystyle\forall t>0:\quad\left(\left(\frac{t}{0\cdot\sigma_{a_{2}}^{(2)}\cdots\sigma_{a_{n}}^{(n)}}\right)^{\frac{1}{1+a_{2}+\cdots+a_{n}}}\right)^{-1}=\left(\left(\frac{t}{0}\right)^{\frac{1}{a}}\right)^{-1}=\operatorname{inc}_{0}^{-1}=0_{+},

    thus we see that the contribution of σ0(k)\sigma_{0}^{(k)} will be nonexistent in the computation of the parallel sum.

  • If there exists k[n]k\in[n] such that A(k)={0}A^{(k)}=\{0\}, then it means that Λ(k)\Lambda^{(k)} is a constant equal to σ0(k)\sigma_{0}^{(k)}, and it is indeed treated as such in the final formula.

Proof of Theorem 5.5.

For all $k\in[n]$, let us introduce the notation $\beta^{(k)}\equiv\boxplus_{a\in A^{(k)}}{(\frac{t}{\sigma^{(k)}_{a}})^{\frac{1}{a}}}|_{+}$. First, Lemma 5.1 allows us to set:

(|Λ(k)(Z)|t)αβ(k)(t).\displaystyle\mathbb{P}\left(\left|\Lambda^{(k)}(Z)\right|\geq t\right)\leq\alpha\circ\beta^{(k)}(t).

Second, Theorem 3.5 provides the concentration:

(|g(Φ(Z))g(Φ(Z))|>t)(2n+1)α(αβ(1))(αβ(n))(t).\displaystyle\mathbb{P}\left(\left|g(\Phi(Z))-g(\Phi(Z^{\prime}))\right|>t\right)\leq(2n+1)\alpha\boxtimes(\alpha\circ\beta^{(1)})\ \boxtimes\ \cdots\ \boxtimes\ (\alpha\circ\beta^{(n)})(t).

One can then conclude with Proposition 1.5 combined with Lemma 5.3. ∎

The last multilevel concentration result relies on the Taylor approximation of $d$-differentiable mappings and on the notion of modulus of continuity. To stay consistent with our framework, we introduce this definition for operators.

Definition 5.6.

A modulus of continuity ω:2\omega:\mathbb{R}\mapsto 2^{\mathbb{R}} is a maximally increasing operator satisfying ω(0)={0}\omega(0)=\{0\} and Ran(ω)=+\operatorname{Ran}(\omega)=\mathbb{R}_{+}.

Given two metric spaces $(E,d)$, $(E^{\prime},d^{\prime})$, a mapping $f:(E,d)\to(E^{\prime},d^{\prime})$ is said to be $\omega$-continuous if:

\displaystyle\forall x,y\in E:\quad d^{\prime}(f(x),f(y))\leq\omega(d(x,y)).

One then has the following characterization of the concentration of measure phenomenon with a modulus of continuity, already provided in [18]; it is a particular case of Lemma 5.9 given just below.

Proposition 5.7 ([18], Proposition 1.3.).

Given a random vector XEX\in E if, for any f:Ef:E\to\mathbb{R}, 11-Lipschitz and for any median mfm_{f} of f(X)f(X), (|f(X)mf|>t)α(t)\mathbb{P}\left(\left|f(X)-m_{f}\right|>t\right)\leq\alpha(t), then for any g:Eg:E\to\mathbb{R}, ω\omega-continuous, and any median mgm_{g} of g(X)g(X), one has:

(|g(X)mg|>t)αω1(t)\displaystyle\mathbb{P}\left(\left|g(X)-m_{g}\right|>t\right)\leq\alpha\circ\omega^{-1}(t)

(the converse is obvious).

It does not seem easy – if possible at all – to find analogues of Lemmas 3.6 and 3.9 to continue an $\omega$-continuous mapping ${f}|_{A}$ when $\omega$ is not concave (it is a well-known fact that a modulus of continuity on a convex body can be assumed to be concave or sub-additive, but the question is then to show that our restriction space $A$ is convex, which is generally not the case), as will be the case in the proof of the next theorem. Fortunately, this difficulty can easily be overcome since the condition that will be met to rely on the Taylor approximation is a localized notion of $\omega$-continuity that we define below.

Definition 5.8.

Given two metric spaces $(E,d)$ and $(E^{\prime},d^{\prime})$, a modulus of continuity $\omega:\mathbb{R}\mapsto 2^{\mathbb{R}}$, and $A\subset E$, we say that a mapping $f:E\to E^{\prime}$ is $\omega$-continuous from $A$ if for all $x\in A$ and all $y\in E$:

d(f(x),f(y))ω(d(x,y)).\displaystyle d^{\prime}(f(x),f(y))\leq\omega(d(x,y)).

One can then draw inspiration from the beginning of Section 1.3 in [18] to get the following lemma.

Lemma 5.9.

Let us consider a metric space (E,d)(E,d), a random variable XEX\in E, and a decreasing mapping α+\alpha\in\mathcal{M}_{\mathbb{P}_{+}} such that:

(|f(X)mf|>t)α(t),\displaystyle\mathbb{P}\left(\left|f(X)-m_{f}\right|>t\right)\leq\alpha(t), (20)

for $m_{f}$ a median of $f(X)$; then for any subset $A\subset E$, any modulus of continuity $\omega$, and any measurable mapping $g:E\to\mathbb{R}$, $\omega$-continuous from $A$:

\displaystyle\forall t\geq 0:\quad\mathbb{P}\left(\left|g(X)-m_{g}\right|>t,X\in A\right)\leq\alpha(\omega^{-1}(t)), (21)

for any mgm_{g}\in\mathbb{R}, a median of g(X)g(X).

In [18], most of the results are set in the measure theory framework; the next proof is mainly an adaptation of [18]'s inferences to probabilistic notations.

Proof.

Introducing the set S={gmg}ES=\{g\leq m_{g}\}\subset E, note that xA\forall x\in A:

\displaystyle g(x)>m_{g}+\omega(t)\quad\Longrightarrow\quad d(x,S)>t\quad\operatorname{and}\quad d(x,S^{c})=0
\displaystyle g(x)<m_{g}-\omega(t)\quad\Longrightarrow\quad d(x,S)=0\quad\operatorname{and}\quad d(x,S^{c})>t

since gg is ω\omega-continuous from AA and:

d(x,S)=0\displaystyle d(x,S)=0 \displaystyle\Longleftrightarrow d(x,Sc)0.\displaystyle d(x,S^{c})\neq 0.

We then rely on the mapping xd(x,S)d(x,Sc)x\mapsto d(x,S)-d(x,S^{c}) to remove the condition XAX\in A:

\displaystyle\mathbb{P}\left(|g(X)-m_{g}|>\omega(t),X\in A\right)=\mathbb{P}\left(g(X)>m_{g}+\omega(t)\ \operatorname{or}\ g(X)<m_{g}-\omega(t),X\in A\right)
\displaystyle\hskip 14.22636pt\leq\mathbb{P}\left(\left|d(X,S)-d(X,S^{c})\right|>t,X\in A\right)
\displaystyle\hskip 14.22636pt\leq\mathbb{P}\left(\left|d(X,S)-d(X,S^{c})\right|>t\right) (22)

Now, first note that $\tilde{g}:x\mapsto d(x,S)-d(x,S^{c})$ is $1$-Lipschitz on $E$. Given $x,y\in E$, if $x,y\in S$ or $x,y\in S^{c}$, the Lipschitz character of the distance (which satisfies the triangle inequality) allows us to deduce that $|\tilde{g}(x)-\tilde{g}(y)|\leq d(x,y)$. We then consider the case $x\in S$ and $y\in S^{c}$, and assume, without loss of generality, that $d(x,S^{c})\geq d(y,S)$; one can then bound:

|g~(x)g~(y)|=d(x,Sc)d(y,S).\displaystyle|\tilde{g}(x)-\tilde{g}(y)|=d(x,S^{c})-d(y,S).

Given $\varepsilon>0$, there exist $z\in S^{c}$ and $w\in S$ such that $d(z,w)\leq\frac{\varepsilon}{2}$ and $d(y,S)\geq d(y,w)-\frac{\varepsilon}{2}$; then:

|g~(x)g~(y)|d(x,z)d(y,w)+ε2d(x,w)+d(w,z)d(y,w)+ε2d(x,y)+ε,\displaystyle|\tilde{g}(x)-\tilde{g}(y)|\leq d(x,z)-d(y,w)+\frac{\varepsilon}{2}\leq d(x,w)+d(w,z)-d(y,w)+\frac{\varepsilon}{2}\leq d(x,y)+\varepsilon,

thanks to the triangle inequality. Letting $\varepsilon$ tend to zero, one can deduce the $1$-Lipschitz character of $\tilde{g}$.

Second, note that g~(X)\tilde{g}(X) admits 0 as a median:

(g~(X)0)=(XSc¯)12\displaystyle\mathbb{P}(\tilde{g}(X)\geq 0)=\mathbb{P}(X\in\bar{S^{c}})\geq\frac{1}{2} and\displaystyle\operatorname{and} (g~(X)0)=(XS¯)12.\displaystyle\mathbb{P}(\tilde{g}(X)\leq 0)=\mathbb{P}(X\in\bar{S})\geq\frac{1}{2}.

One can then deduce from the hypothesis of the lemma that:

t0:(|g(X)mg|ω(t),XA)(|g~(X)|>t)α(t),\displaystyle\forall t\geq 0:\quad\mathbb{P}\left(|g(X)-m_{g}|\geq\omega(t),X\in A\right)\leq\mathbb{P}\left(\left|\tilde{g}(X)\right|>t\right)\leq\alpha(t),

In the case where ω\omega is a (singleton-valued) invertible function, one can directly conclude:

(|g(X)mg|>t,XA)=(|g(X)mg|>ω(ω1(t)),XA)\displaystyle\mathbb{P}\left(\left|g(X)-m_{g}\right|>t,X\in A\right)=\mathbb{P}\left(\left|g(X)-m_{g}\right|>\omega(\omega^{-1}(t)),X\in A\right) αω1(t),\displaystyle\leq\alpha\circ\omega^{-1}\left(t\right),

For the general set-valued case, one can employ arguments similar to the ones given in the proof of Proposition 1.17. ∎

We are now almost ready to set our main result, Theorem 0.2, on the concentration of finitely differentiable mappings. To sharpen the concentration bound, a solution is to work with a sequence of polynomials $(P_{k})_{k\in[d]}\in\mathbb{R}[X]^{d}$ satisfying:

{P0=mdk[d]:Pk=l=1kPkll!Xl+mdk,\displaystyle\left\{\begin{aligned} &P_{0}=m_{d}\\ &\forall k\in[d]:P_{k}=\sum_{l=1}^{k}\frac{P_{k-l}}{l!}X^{l}+m_{d-k},\end{aligned}\right. (23)

where the parameters m0,,md+m_{0},\ldots,m_{d}\in\mathbb{R}^{+} were defined999Actually, m0m_{0} was not defined, but it could be any positive number. To stay in line with the definition of m1,,md1m_{1},\ldots,m_{d-1}, one could choose m0m_{0} as a median of d0Φ|z=Φ(z)\|{d^{0}\Phi}\raisebox{-2.15277pt}{$|$}_{z}\|=\|\Phi(z)\|. in the setting of Theorem 0.2, but Lemma 5.10 below is independent of this choice. Note that for all k{0,,d}k\in\{0,\ldots,d\}, PkP_{k} has degree kk.

Lemma 5.10.

Given the sequence of polynomials $(P_{k})_{0\leq k\leq d}$ defined in (23) (for a given sequence $(m_{k})_{0\leq k\leq d}\in\mathbb{R}_{+}^{d+1}$), if one introduces the coefficients $((a_{i}^{(k)})_{0\leq i\leq k})_{0\leq k\leq d}$ satisfying:

k[d]:Pk=i=0kai(k)mdk+iXi,\displaystyle\forall k\in[d]:\quad P_{k}=\sum_{i=0}^{k}a_{i}^{(k)}m_{d-k+i}X^{i}, (24)

then:

i[d],k,l{i,,d}:ai(k)=ai(l)ei\displaystyle\forall i\in[d],\forall k,l\in\{i,\ldots,d\}:\quad a_{i}^{(k)}=a_{i}^{(l)}\leq e^{i}
Proof.

Let us first find a relation between the coefficients $a_{i}^{(k)}$ from the expressions of $P_{0},\ldots,P_{k}$. Of course, one has $P_{0}=m_{d}$, thus $a_{0}^{(0)}=1$. Now, injecting (24) in (23), one obtains (with the changes of variable $h=l+i$ and $j=h-i$):

\displaystyle P_{k}-m_{d-k}=\sum_{l=1}^{k}\sum_{i=0}^{k-l}\frac{a_{i}^{(k-l)}}{l!}m_{d-k+l+i}X^{i+l}=\sum_{h=1}^{k}\sum_{i=0}^{h-1}\frac{a_{i}^{(k-h+i)}}{(h-i)!}m_{d-k+h}X^{h}=\sum_{h=1}^{k}\sum_{j=1}^{h}\frac{a_{h-j}^{(k-j)}}{j!}m_{d-k+h}X^{h}.

One then gets the following recurrence relation between the coefficients:

ai(k)=l=1iail(kl)l!.\displaystyle a_{i}^{(k)}=\sum_{l=1}^{i}\frac{a_{i-l}^{(k-l)}}{l!}.

The recursion formula reveals that the lower index is independent of the upper index; we then denote, $\forall i\in[d]$, $a_{i}\equiv a_{i}^{(k)}$ (for any $k\in\{i,\ldots,d\}$). Let us then show recursively that $a_{i}\leq\frac{(i+1)^{i}}{i!}$. Of course $a_{0}=1\leq\frac{1^{0}}{0!}$; then, given $j\in[d]$, if we assume that this inequality is true for $i=0,\ldots,j-1$, one can bound:

ajl=1j(jl+1)jl(jl)!l!1j!l=0j(jl)jl=(j+1)jj!.\displaystyle a_{j}\leq\sum_{l=1}^{j}\frac{(j-l+1)^{j-l}}{(j-l)!l!}\leq\frac{1}{j!}\sum_{l=0}^{j}\binom{j}{l}j^{l}=\frac{(j+1)^{j}}{j!}.

And the Stirling formula allows us to conclude:

\displaystyle a_{j}\leq\frac{(j+1)^{j}}{j!}\leq\left(\frac{j+1}{j}\right)^{j}\frac{e^{j}}{\sqrt{2\pi j}}\leq e^{j}. ∎
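The recurrence and the bounds of Lemma 5.10 can be checked exactly with rational arithmetic; the following Python sketch (illustrative only, the truncation $d=12$ being arbitrary) computes $a_{0},\ldots,a_{12}$ and verifies $a_{i}\leq(i+1)^{i}/i!\leq e^{i}$.

```python
from fractions import Fraction
from math import e, factorial

# Exact computation of the coefficients of Lemma 5.10: a_0 = 1 and
# a_i = sum_{l=1}^{i} a_{i-l} / l!, followed by a check of the bounds
# a_i <= (i+1)^i / i! <= e^i.
d = 12
a = [Fraction(1)]
for i in range(1, d + 1):
    a.append(sum(a[i - l] / factorial(l) for l in range(1, i + 1)))

for i, a_i in enumerate(a):
    assert a_i <= Fraction((i + 1) ** i, factorial(i))
    assert float(a_i) <= e**i
print("a_0..a_%d satisfy a_i <= (i+1)^i/i! <= e^i; e.g. a_2 =" % d, a[2])
```

In particular the sketch returns $a_{2}=\frac{3}{2}$, the value used later in the proof of Theorem 6.1.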

We will prove below a stronger result than Theorem 0.2 which is merely deduced from Lemma 5.10 and Proposition A.17.

Theorem 5.11.

Let us consider a random vector $Z\in\mathbb{R}^{n}$ such that for any $f:\mathbb{R}^{n}\to\mathbb{R}$, $1$-Lipschitz:

(|f(Z)mf|>t)α(t),\displaystyle\mathbb{P}\left(\left|f(Z)-m_{f}\right|>t\right)\leq\alpha(t),

for a certain median $m_{f}$ of $f(Z)$ and a certain decreasing mapping $\alpha:\mathbb{R}_{+}\to\mathbb{R}_{+}$.

Then, for any dd-differentiable mapping Φ𝒟d(n,p)\Phi\in\mathcal{D}^{d}(\mathbb{R}^{n},\mathbb{R}^{p}) and any g:pg:\mathbb{R}^{p}\to\mathbb{R}, 11-Lipschitz, one can bound:

(|g(Φ(Z))mg|>t)2d1α(k[d](Idakmk)1k),\displaystyle\mathbb{P}\left(\left|g(\Phi(Z))-m_{g}\right|>t\right)\leq 2^{d-1}\alpha\left(\mathop{\vphantom{\bigoplus}\mathchoice{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}{\vbox{\hbox{\leavevmode\resizebox{0.0pt}{}{$\boxplus$}}}}}\displaylimits_{k\in[d]}\left(\frac{\operatorname{Id}}{a_{k}m_{k}}\right)^{\frac{1}{k}}\right),

where $m_{g}$ is a median of $g\circ\Phi(Z)$, for all $k\in[d-1]$, $m_{k}$ is a median of $\|{d^{k}\Phi}|_{Z}\|$, $m_{d}=\|d^{d}\Phi\|_{\infty}$, and $a_{1},\ldots,a_{d}$ are the parameters introduced in Lemma 5.10.

Proof.

We will show recursively that, for all k{0,,d1}k\in\{0,\ldots,d-1\}, for all f:k(n,p)f:\mathcal{L}^{k}(\mathbb{R}^{n},\mathbb{R}^{p})\to\mathbb{R}, 11-Lipschitz:

(|f(dkΦ|Z)mf|>t)2d1kαβk(t)\displaystyle\mathbb{P}\left(\left|f({d^{k}\Phi}\raisebox{-2.15277pt}{$|$}_{Z})-m_{f}\right|>t\right)\leq 2^{d-1-k}\alpha\circ\beta_{k}(t) where:βk(Pdkmk)1\displaystyle\hskip 5.69046pt\text{where:}\quad\beta_{k}\equiv(P_{d-k}-m_{k})^{-1} (25)

and mfm_{f} is a median of f(dkΦ|Z)f({d^{k}\Phi}\raisebox{-2.15277pt}{$|$}_{Z}). Note from Lemma 5.1 that the iteration hypothesis implies in particular:

(dkΦ|Z>t)2d1kαPdk1(t).\displaystyle\mathbb{P}\left(\left\|{d^{k}\Phi}\raisebox{-2.15277pt}{$|$}_{Z}\right\|>t\right)\leq 2^{d-1-k}\alpha\circ P_{d-k}^{-1}(t). (26)

Given z,znz,z^{\prime}\in\mathbb{R}^{n}:

dd1Φ|zdd1Φ|zddΦzz,\displaystyle\left\|{d^{d-1}\Phi}\raisebox{-2.15277pt}{$|$}_{z}-{d^{d-1}\Phi}\raisebox{-2.15277pt}{$|$}_{z^{\prime}}\right\|\leq\|d^{d}\Phi\|_{\infty}\left\|z-z^{\prime}\right\|,

which means that dd1Φ|Z{d^{d-1}\Phi}\raisebox{-2.15277pt}{$|$}_{Z} is a ddΦ\|d^{d}\Phi\|_{\infty}-Lipschitz transformation of ZZ and therefore, for any f:d1(n,p)f:\mathcal{L}^{d-1}(\mathbb{R}^{n},\mathbb{R}^{p})\to\mathbb{R}, 11-Lipschitz:

(|f(dd1Φ|Z)mf|>t)α(tddΦ)=α(tmd),\displaystyle\mathbb{P}\left(\left|f({d^{d-1}\Phi}\raisebox{-2.15277pt}{$|$}_{Z})-m_{f}\right|>t\right)\leq\alpha\left(\frac{t}{\|d^{d}\Phi\|_{\infty}}\right)=\alpha\left(\frac{t}{m_{d}}\right),

and indeed βd1=(P1md1)1=(mdId)1\beta_{d-1}=(P_{1}-m_{d-1})^{-1}=(m_{d}\operatorname{Id})^{-1}.

Let us now assume that (25) is true from $d-1$ down to a certain $k+1\in[d-1]$. One can bound, thanks to the Taylor expansion around $z^{\prime}$:

dkΦ|zdkΦ|zl=1dk11(dkl)!ddlΦ|zzzdkl+1(dk)!ddΦzzdk.\displaystyle\left\|{d^{k}\Phi}\raisebox{-2.15277pt}{$|$}_{z}-{d^{k}\Phi}\raisebox{-2.15277pt}{$|$}_{z^{\prime}}\right\|\leq\sum_{l=1}^{d-k-1}\frac{1}{(d-k-l)!}\left\|{d^{d-l}\Phi}\raisebox{-2.15277pt}{$|$}_{z^{\prime}}\right\|\|z^{\prime}-z\|^{d-k-l}+\frac{1}{(d-k)!}\|d^{d}\Phi\|_{\infty}\|z^{\prime}-z\|^{d-k}. (27)

Now, let us introduce the subset:

𝒜t{zn,ddlΦ|zPl(βk(t)),l{0,,dk1}}n.\displaystyle\mathcal{A}_{t}\equiv\left\{z\in\mathbb{R}^{n},\left\|{d^{d-l}\Phi}\raisebox{-2.15277pt}{$|$}_{z}\right\|\leq P_{l}\left(\beta_{k}(t)\right),l\in\{0,\ldots,d-k-1\}\right\}\subset\mathbb{R}^{n}.

We know from (27) that dkΦd^{k}\Phi is ωt\omega_{t}-continuous from 𝒜t\mathcal{A}_{t} with:

u0:ωt(u)=l=0dk11(dkl)!Pl(βk(t))udkl=l=1dk1l!Pdkl(βk(t))ul\displaystyle\forall u\geq 0:\quad\omega_{t}(u)=\sum_{l=0}^{d-k-1}\frac{1}{(d-k-l)!}P_{l}\left(\beta_{k}(t)\right)u^{d-k-l}=\sum_{l=1}^{d-k}\frac{1}{l!}P_{d-k-l}\left(\beta_{k}(t)\right)u^{l}

Note then that choosing u=βk(t)u=\beta_{k}(t), one deduces from the definition of P1,,PdP_{1},\ldots,P_{d} (see (23)) that:

ωt(βk(t))=Pdk(βk(t))mk=t\displaystyle\omega_{t}(\beta_{k}(t))=P_{d-k}\left(\beta_{k}(t)\right)-m_{k}=t

Lemma 5.9 and the recursion hypothesis then allows us to bound:

(|f(dkΦ|Z)mf|>t,Z𝒜t)α(ωt1(t))αβk(t)\displaystyle\mathbb{P}\left(\left|f\left({d^{k}\Phi}\raisebox{-2.15277pt}{$|$}_{Z}\right)-m_{f}\right|>t,Z\in\mathcal{A}_{t}\right)\leq\alpha(\omega_{t}^{-1}(t))\leq\alpha\circ\beta_{k}(t) (28)

Besides, we can bound thanks to (26):

\displaystyle\mathbb{P}\left(Z\notin\mathcal{A}_{t}\right)\leq\sum_{l=1}^{d-k-1}\mathbb{P}\left(\left\|{d^{d-l}\Phi}|_{Z}\right\|>P_{l}\left(\beta_{k}(t)\right)\right)
\displaystyle\hskip 14.22636pt\leq\sum_{l=1}^{d-k-1}2^{l-1}\alpha\circ P_{l}^{-1}\circ P_{l}\circ\beta_{k}(t)\leq(2^{d-k-1}-1)\alpha\circ\beta_{k}(t) (29)

One retrieves the iteration hypothesis (25) by combining (28) with (29); the result is then obtained from the identity:

\displaystyle\beta_{0}=(P_{d}-m_{0})^{-1}=\left(\sum_{i=1}^{d}a_{i}m_{i}\operatorname{Id}^{i}\right)^{-1}. ∎

To obtain a version of Theorem 0.2 in a convex concentration setting, one would first need to establish an analogue of Lemma 5.9 under a convex concentration hypothesis (which is not straightforward: it would only hold for convex sets $A\subset E$, and $\tilde{g}$ would then be the difference of two convex mappings, which would impact the final constants); one would also have to assume that all the mappings $z\mapsto\|{d^{k}\Phi}|_{z}\|$, $k\in[d]$, are convex, which seems quite restrictive. To limit the content of the present article, we leave these developments to interested readers.

6 Consequences for Hanson-Wright concentration inequality

We formulate below the Hanson-Wright inequality as a linear concentration result on random matrices $X^{T}AX$ with the widest possible hypotheses on $\alpha$ (a result with the expectation is provided in Theorem 6.7). This result is completely equivalent to the concentration of quadratic forms $X^{T}AX$ for $X\in\mathbb{R}^{p}$ (see Remark 6.2), but having already presented the stronger notions of Lipschitz and convex concentration in previous sections, we find it interesting to provide some examples from the class of linearly concentrated vectors.

Theorem 6.1.

Given a decreasing mapping $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$ and a random matrix $X\in\mathcal{M}_{p,n}$, if one assumes that for all $1$-Lipschitz mappings $f:\mathcal{M}_{p,n}\to\mathbb{R}$ and for all medians $m_{f}\in\mathbb{R}$ of $f(X)$:

(|f(X)mf|t)α(t)\displaystyle\mathbb{P}\left(\left|f(X)-m_{f}\right|\geq t\right)\leq\alpha(t)

then for all deterministic Ap,BnA\in\mathcal{M}_{p},B\in\mathcal{M}_{n}, one has the concentration:

t0:(|Tr(BXTAX)mTr|>t)2α(IdmId3AB)(t),\displaystyle\forall t\geq 0:\qquad\mathbb{P}\left(\left|\operatorname{Tr}(BX^{T}AX)-m_{\operatorname{Tr}}\right|>t\right)\leq 2\alpha\circ\left(\frac{\operatorname{Id}}{m}\boxplus\sqrt{\frac{\operatorname{Id}}{3\|A\|\|B\|}}\right)(t),

for a given median of Tr(BXTAX)\operatorname{Tr}(BX^{T}AX), mTrm_{\operatorname{Tr}}\in\mathbb{R} and any mm\in\mathbb{R}, median of 2AXBF2\|AXB\|_{F}.

In a convex concentration setting, the same result is obtained with some numerical constants replacing the “2” and “3” in the last result.

Remark 6.2 (Vectorization of a matricial identity).

Given $M\in\mathcal{M}_{p,n}$, let us introduce the notation $\check{M}\in\mathbb{R}^{pn}$ satisfying $\check{M}_{(j-1)p+i}=M_{i,j}$ and an operation $\otimes$ with values in $\mathcal{M}_{pn,pn}$ defined for any $A\in\mathcal{M}_{p}$ and $B\in\mathcal{M}_{n}$ as:

\displaystyle(A\otimes B)_{(j-1)p+i,(l-1)p+k}=A_{i,k}B_{l,j}.

Through a mere regrouping of indexes in sums, one obtains:

\displaystyle\operatorname{Tr}(BM^{T}AM)=\check{M}^{T}(A\otimes B)\check{M} (30)

One sees with this vectorized identity that the study of Tr(BXTAX)\operatorname{Tr}(BX^{T}AX) boils down to a mere study of ZTCZZ^{T}CZ where ZpnZ\in\mathbb{R}^{pn} is a random vector and Cpn,pnC\in\mathcal{M}_{pn,pn} a deterministic matrix.
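In standard Kronecker-product notation, the operation of Remark 6.2 reads $A\otimes B=B^{T}\otimes_{K}A$, and (30) is the classical identity $\operatorname{vec}(AMB)=(B^{T}\otimes_{K}A)\operatorname{vec}(M)$ in disguise. The following Python sketch (illustrative only, using numpy's column-stacking vectorization) checks (30) on random matrices.

```python
import numpy as np

# Check of the vectorized identity (30): with the column-stacking convention
# M_check[(j-1)p + i] = M[i, j], the operation A (x) B of Remark 6.2 is, in
# standard Kronecker notation, kron(B^T, A), since
# kron(B^T, A)[(j-1)p+i, (l-1)p+k] = B[l, j] * A[i, k].
rng = np.random.default_rng(0)
p, n = 4, 3
M = rng.standard_normal((p, n))
A = rng.standard_normal((p, p))
B = rng.standard_normal((n, n))

m_check = M.flatten(order="F")     # column-stacking vectorization
A_otimes_B = np.kron(B.T, A)
lhs = np.trace(B @ M.T @ A @ M)
rhs = m_check @ A_otimes_B @ m_check
assert np.isclose(lhs, rhs)
print("Tr(B M^T A M) == vec(M)^T (A (x) B) vec(M):", lhs, "==", rhs)
```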

Proof of Theorem 6.1.

Let us assume, without loss of generality, that $A\otimes B$ in (30) is a symmetric matrix (the anti-symmetric contribution cancels since $\check{X}^{T}(A\otimes B)\check{X}=\check{X}^{T}(A\otimes B)^{T}\check{X}$). Theorem 3.5 or Theorem 5.11 (the strong version of Theorem 0.2) can both be applied here, but in order to get the best concentration constants possible, we rather check the conditions of the latter. Let us introduce $\Phi:M\mapsto\operatorname{Tr}(BM^{T}AM)$; then one has, for any $H\in\mathcal{M}_{p,n}$:

dΦ|XH=Tr(BHTAX)+Tr(BXTAH)\displaystyle{d\Phi}\raisebox{-2.15277pt}{$|$}_{X}\cdot H=\operatorname{Tr}(BH^{T}AX)+\operatorname{Tr}(BX^{T}AH) and\displaystyle\operatorname{and} d2Φ|X(H,H)=2Tr(BHTAH),\displaystyle{d^{2}\Phi}\raisebox{-2.15277pt}{$|$}_{X}\cdot(H,H)=2\operatorname{Tr}(BH^{T}AH),

Therefore:

dΦ|X=2BXTAF\displaystyle\left\|{d\Phi}\raisebox{-2.15277pt}{$|$}_{X}\right\|=2\|BX^{T}A\|_{F} and\displaystyle\operatorname{and} d2Φ|X=d2Φ|X=2BA.\displaystyle\left\|{d^{2}\Phi}\raisebox{-2.15277pt}{$|$}_{X}\right\|_{\infty}=\left\|{d^{2}\Phi}\raisebox{-2.15277pt}{$|$}_{X}\right\|=2\|B\|\|A\|.

Applying Theorem 5.11, one can deduce the expected result (note that a1=a0=1a_{1}=a_{0}=1 and a2=a11+a02!=32a_{2}=\frac{a_{1}}{1}+\frac{a_{0}}{2!}=\frac{3}{2}).

In the case of a convex concentration of $X$, one can still obtain a similar result by expressing $A\otimes B$ as the difference of two positive symmetric matrices, to be able to consider convex mappings, and by combining Theorem 3.10 and Lemma 2.4 to conclude. ∎

The rest of the section aims at rewriting Theorem 6.1 in the cases where $X^{T}AX$ admits an expectation, which is linked to some integrability properties of $\alpha$ (see Lemma 2.5). The first lemma helps us bound $\mathbb{E}[\|AXB\|_{F}]$, which will be close to the median “$m$” of Theorem 6.1.

Lemma 6.3.

Given a random matrix Xp,nX\in\mathcal{M}_{p,n} and two deterministic matrices ApA\in\mathcal{M}_{p} and BnB\in\mathcal{M}_{n}:

\displaystyle\mathbb{E}[\|AXB\|_{F}]\leq\|A\|_{F}\|B\|_{F}\sqrt{\left\|\mathbb{E}[\check{X}\check{X}^{T}]\right\|},

where $\check{X}\in\mathbb{R}^{pn}$ satisfies $\forall(i,j)\in[p]\times[n]$: $\check{X}_{(j-1)p+i}=X_{i,j}$, as defined in Remark 6.2.

Note that if $n=1$, $\check{X}=X$ and Lemma 6.3 basically states that:

\displaystyle\mathbb{E}[\|AX\|]\leq\|A\|_{F}\sqrt{\left\|\mathbb{E}[XX^{T}]\right\|}.
Proof.

Similarly to (30), one has the identity:

\displaystyle\operatorname{Tr}\left(AMBB^{T}M^{T}A^{T}\right)=\operatorname{Tr}\left((A\otimes B)\check{M}\check{M}^{T}(A\otimes B)^{T}\right).

Then one can bound thanks to Cauchy-Schwarz inequality and Jensen inequality:

\displaystyle\mathbb{E}[\|AXB\|_{F}]=\mathbb{E}\left[\sqrt{\operatorname{Tr}(AXBB^{T}X^{T}A^{T})}\right]\leq\sqrt{\mathbb{E}\left[\operatorname{Tr}(AXBB^{T}X^{T}A^{T})\right]}
\displaystyle\hskip 14.22636pt=\sqrt{\mathbb{E}\left[\operatorname{Tr}((A\otimes B)\check{X}\check{X}^{T}(A\otimes B)^{T})\right]}\ \leq\ \|A\|_{F}\|B\|_{F}\sqrt{\left\|\mathbb{E}[\check{X}\check{X}^{T}]\right\|},

since ABF=AFBF\|A\otimes B\|_{F}=\|A\|_{F}\|B\|_{F}. ∎
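As a quick Monte-Carlo illustration of Lemma 6.3 (a Python sketch with an arbitrary Gaussian model, not part of the lemma): when $X$ has i.i.d. $\mathcal{N}(0,1)$ entries, $\mathbb{E}[\check{X}\check{X}^{T}]=I_{pn}$ and the bound simply reads $\mathbb{E}[\|AXB\|_{F}]\leq\|A\|_{F}\|B\|_{F}$.

```python
import numpy as np

# Monte-Carlo illustration of Lemma 6.3: for X with i.i.d. N(0,1) entries,
# E[X_check X_check^T] = I_{pn}, so the bound reads E||AXB||_F <= ||A||_F ||B||_F.
rng = np.random.default_rng(1)
p, n, n_samples = 5, 4, 20000
A = rng.standard_normal((p, p))
B = rng.standard_normal((n, n))

norms = [np.linalg.norm(A @ rng.standard_normal((p, n)) @ B) for _ in range(n_samples)]
lhs = float(np.mean(norms))
rhs = np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")
print(f"E||AXB||_F ~= {lhs:.3f}  <=  ||A||_F ||B||_F = {rhs:.3f}")
```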

Let us now express the conditions for which 𝔼[XˇXˇT]\|\mathbb{E}[\check{X}\check{X}^{T}]\| can be bounded.

Lemma 6.4.

Given a random vector $X\in\mathbb{R}^{p}$ and $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$, let us denote:

σα+2tα(t)𝑑t.\displaystyle\sigma_{\alpha}\equiv\sqrt{\int_{\mathbb{R}_{+}}2t\alpha(t)dt}.

If we assume that for every linear form $u\in\mathcal{L}_{1}(\mathbb{R}^{p},\mathbb{R})$ such that $\|u\|\leq 1$:

(|uT(X𝔼[X])|>t)α(t)\displaystyle\mathbb{P}\left(\left|u^{T}(X-\mathbb{E}[X])\right|>t\right)\leq\alpha(t)

then one can bound:

\displaystyle\|\mathbb{E}[XX^{T}]\|\leq\left(\|\mathbb{E}[X]\|+\sigma_{\alpha}\right)^{2}.

This lemma partly relies on a simple result which is a direct consequence of the Cauchy-Schwarz inequality (it is quite similar to $\mathbb{E}[|Z|]\leq\sqrt{\mathbb{E}[Z^{2}]}$ for a random variable $Z\in\mathbb{R}$).

Lemma 6.5.

Given α+\alpha\in\mathcal{M}_{\mathbb{P}_{+}}:

+αα(0+)+2tα(t)𝑑t,\displaystyle\int_{\mathbb{R}_{+}}\alpha\leq\sqrt{\alpha(0^{+})\int_{\mathbb{R}_{+}}2t\alpha(t)dt},

where α(0+)=sup{α(t),t>0}\alpha(0^{+})=\sup\{\alpha(t),t>0\}.
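Lemma 6.5 is easily probed by numerical quadrature; the Python sketch below (illustrative only, with arbitrary choices of $\alpha$) evaluates both sides for a few decreasing tail functions.

```python
import numpy as np
from scipy.integrate import quad

# Numerical illustration of Lemma 6.5: int(alpha) <= sqrt(alpha(0+) * int 2t alpha(t) dt)
# for a few decreasing tail functions alpha (here alpha(0+) = alpha(0)).
for alpha, name in [(lambda t: np.exp(-t), "exp(-t)"),
                    (lambda t: 2 * np.exp(-t**2 / 2), "2 exp(-t^2/2)"),
                    (lambda t: (1 + t) ** -3.0, "(1+t)^-3")]:
    left = quad(alpha, 0, np.inf)[0]
    second_moment = quad(lambda t: 2 * t * alpha(t), 0, np.inf)[0]
    right = np.sqrt(alpha(0.0) * second_moment)
    print(f"alpha = {name:14s}  int(alpha) = {left:.4f}  bound = {right:.4f}")
```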

Remark 6.6.

The term $\alpha(0^{+})$ might look bothersome. It is actually not an issue since probabilities are bounded by $1$, and one can generally replace $\alpha$ with $\alpha_{\leq 1}\equiv\min(\alpha,1^{+})$ where $\operatorname{Gra}(1^{+})=\{0\}\times[1,+\infty)\cup\mathbb{R}_{+}^{*}\times\{1\}$. Looking at Lemma 6.4 and Theorem 6.7 that we will later set, one sees it is sufficient to show that:

  • σα1=2+tα1(t)𝑑tσα\sigma_{\alpha_{\leq 1}}=\sqrt{2\int_{\mathbb{R}_{+}}t\alpha_{\leq 1}(t)dt}\leq\sigma_{\alpha}, which is obvious,

  • $\alpha_{\leq 1}(\sigma_{\alpha_{\leq 1}})\geq\alpha(\sigma_{\alpha})$, which comes from the fact that, introducing $t_{0}\equiv\max(\alpha^{-1}(\{1\}))$ (non-empty by definition of $\mathcal{M}_{\mathbb{P}_{+}}$), one knows from Lemma 6.5 that $\sigma_{\alpha_{\leq 1}}\geq\int_{\mathbb{R}_{+}}\alpha_{\leq 1}\geq t_{0}\alpha_{\leq 1}(t_{0})\geq t_{0}$, and therefore $\alpha_{\leq 1}(\sigma_{\alpha_{\leq 1}})=\alpha(\sigma_{\alpha_{\leq 1}})\geq\alpha(\sigma_{\alpha})$.

Proof of Lemma 6.4.

Following Remark 6.6 we know that it is sufficient to show our result for α=min(1+,α)\alpha=\min(1^{+},\alpha). In particular, since then α(0+)1\alpha(0^{+})\leq 1, Lemma 6.5 provides the inequality +ασα\int_{\mathbb{R}_{+}}\alpha\leq\sigma_{\alpha}. Given upu\in\mathbb{R}^{p} such that u1\|u\|\leq 1, one can then merely bound thanks to Fubini Theorem:

𝔼[uTXXTu]\displaystyle\mathbb{E}[u^{T}XX^{T}u] =𝔼[(uTX)2]=0((uTX)2>t)𝑑t\displaystyle=\mathbb{E}[(u^{T}X)^{2}]=\int_{0}^{\infty}\mathbb{P}((u^{T}X)^{2}>t)dt
=0(|uTX|>t)𝑑t=02t(|uTX|>t)𝑑t\displaystyle=\int_{0}^{\infty}\mathbb{P}(|u^{T}X|>\sqrt{t})dt=\int_{0}^{\infty}2t\mathbb{P}(|u^{T}X|>t)dt
\displaystyle\hskip 14.22636pt\leq(u^{T}\mathbb{E}[X])^{2}+2\int_{|u^{T}\mathbb{E}[X]|}^{\infty}t\mathbb{P}(|u^{T}X-u^{T}\mathbb{E}[X]|>t-|u^{T}\mathbb{E}[X]|)dt
(uT𝔼[X])2+20(t+|uT𝔼[X]|)(|uTXuT𝔼[X]|>t)𝑑t\displaystyle\leq(u^{T}\mathbb{E}[X])^{2}+2\int_{0}^{\infty}(t+|u^{T}\mathbb{E}[X]|)\mathbb{P}(|u^{T}X-u^{T}\mathbb{E}[X]|>t)dt
𝔼[X]2+20tα(t)dt+2𝔼[X]0αdt(𝔼[X]+σα)2,\displaystyle\leq\|\mathbb{E}[X]\|^{2}+2\int_{0}^{\infty}t\alpha(t)dt+2\|\mathbb{E}[X]\|\int_{0}^{\infty}\alpha dt\ \ \leq(\|\mathbb{E}[X]\|+\sigma_{\alpha})^{2},

One can conclude that 𝔼[XXT](𝔼[X]+σα)2\|\mathbb{E}[XX^{T}]\|\leq(\|\mathbb{E}[X]\|+\sigma_{\alpha})^{2}. ∎

We have now all the elements to prove:

Theorem 6.7.

Given a decreasing mapping $\alpha\in\mathcal{M}_{\mathbb{P}_{+}}$ and a random matrix $X\in\mathcal{M}_{p,n}$, denoting again $\sigma_{\alpha}\equiv\sqrt{2\int_{\mathbb{R}_{+}}t\alpha(t)dt}$, we assume that $\left\|\mathbb{E}[X]\right\|_{F}\leq\frac{\sigma_{\alpha}}{\sqrt{\alpha(\sigma_{\alpha})}}$ and that for any $1$-Lipschitz and convex mapping $f:\mathcal{M}_{p,n}\to\mathbb{R}$ and any median $m_{f}\in\mathbb{R}$ of $f(X)$:

(|f(X)mf|>t)α(t)\displaystyle\mathbb{P}\left(\left|f(X)-m_{f}\right|>t\right)\leq\alpha(t)

then for any deterministic Ap,BnA\in\mathcal{M}_{p},\ B\in\mathcal{M}_{n}, one has the concentration:

(|Tr(B(XTAX𝔼[XTAX]))|>t)2α(σα)αmin(α(σα)t10AFBFσα,t6AB).\displaystyle\mathbb{P}\left(\left|\operatorname{Tr}\left(B(X^{T}AX-\mathbb{E}[X^{T}AX])\right)\right|>t\right)\leq\frac{2}{\alpha(\sigma_{\alpha})}\alpha\circ\min\left(\frac{\alpha(\sigma_{\alpha})t}{10\|A\|_{F}\|B\|_{F}\sigma_{\alpha}},\sqrt{\frac{t}{6\|A\|\|B\|}}\right).

The hypothesis $\left\|\mathbb{E}[X]\right\|_{F}\leq\frac{\sigma_{\alpha}}{\sqrt{\alpha(\sigma_{\alpha})}}$ might seem a bit elaborate, but one can always adapt $\alpha$ to satisfy this condition depending on the setting. Of course, this theorem is only relevant when $\sigma_{\alpha}<\infty$.

In the case where $n=1$ and $\alpha:t\mapsto 2e^{-(t/2\sigma)^{2}}$ for a given $\sigma=\sigma_{\alpha}\in\mathbb{R}_{+}$, one obtains the classical Hanson-Wright inequality (see [1]) with some numerical constants $C,c>0$ independent of $p$ and $\sigma$:

(|XTAX𝔼[XTAX]|>t)Cexp(cmin(t2AF2σ4,tAσ2)).\displaystyle\mathbb{P}\left(\left|X^{T}AX-\mathbb{E}[X^{T}AX]\right|>t\right)\leq C\exp\left(-c\min\left(\frac{t^{2}}{\|A\|_{F}^{2}\sigma^{4}},\frac{t}{\|A\|\sigma^{2}}\right)\right).
Proof.

Following Remark 6.6, we can assume without loss of generality that $\alpha|_{\mathbb{R}_{+}^{*}}\leq 1$. We already know from Theorem 6.1 and Lemma 3.4 that, for all $t\geq 0$ (to be a direct application of Lemma 3.4, one should actually start from the Lipschitz concentration of $X^{T}AX$; Theorem 6.1 only provides the concentration of $\operatorname{Tr}(BX^{T}AX)$ for $B\in\mathcal{M}_{n}$, but that is not an issue since, in Lemma 3.4, the only relevant assumption is the concentration of the observations $u(X^{T}AX)$, $u\in E^{\prime}$):

(|Tr(BXTAX)Tr(B𝔼[XTAX])|>t)2α(σα)αmin(t2m,t6AB)\displaystyle\mathbb{P}\left(\left|\operatorname{Tr}(BX^{T}AX)-\operatorname{Tr}\left(B\mathbb{E}[X^{T}AX]\right)\right|>t\right)\leq\frac{2}{\alpha\left(\sigma_{\alpha}\right)}\alpha\circ\min\left(\frac{t}{2m},\sqrt{\frac{t}{6\|A\|\|B\|}}\right) (31)

where we recall that mm is a median of 2AXBF2\|AXB\|_{F}. Besides:

\displaystyle\mathbb{P}\left(\left|2\|AXB\|_{F}-m\right|\geq t\right)\leq\alpha\left(\frac{t}{2\|A\|\|B\|}\right).

A simple application of Fubini and Lemma 6.5 provides:

\displaystyle m-\mathbb{E}[2\|AXB\|_{F}]\leq\mathbb{E}\left[\left|2\|AXB\|_{F}-m\right|\right]\leq 2\|A\|\|B\|\int_{\mathbb{R}_{+}}\alpha\leq 2\|A\|\|B\|\sigma_{\alpha}. (32)

Now, starting from the linear concentration inequality (a consequence of Lemma 3.4):

(|uT(X𝔼[X])|t)1α(σα)α(t2)\displaystyle\mathbb{P}\left(\left|u^{T}(X-\mathbb{E}[X])\right|\geq t\right)\leq\frac{1}{\alpha(\sigma_{\alpha})}\alpha\left(\frac{t}{2}\right)

after computing $\int_{0}^{+\infty}2t\frac{1}{\alpha(\sigma_{\alpha})}\alpha\left(\frac{t}{2}\right)dt=\frac{4\sigma_{\alpha}^{2}}{\alpha(\sigma_{\alpha})}$, we can deduce from Lemmas 6.3 and 6.4 that:

\displaystyle\mathbb{E}[\|AXB\|_{F}]\leq\|A\|_{F}\|B\|_{F}\left(\left\|\mathbb{E}[\check{X}]\right\|+\frac{2\sigma_{\alpha}}{\sqrt{\alpha(\sigma_{\alpha})}}\right)\leq 3\|A\|_{F}\|B\|_{F}\frac{\sigma_{\alpha}}{\alpha(\sigma_{\alpha})}.

Let us then conclude from (32) that:

m3AFBFσαα(σα)+2ABσα5AFBFσαα(σα)\displaystyle m\leq 3\|A\|_{F}\|B\|_{F}\frac{\sigma_{\alpha}}{\alpha(\sigma_{\alpha})}+2\|A\|\|B\|\sigma_{\alpha}\leq 5\|A\|_{F}\|B\|_{F}\frac{\sigma_{\alpha}}{\alpha(\sigma_{\alpha})}

and inject this bound in (31) to obtain the result of the theorem. ∎
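The sketch below illustrates the Gaussian case of the previous remark numerically (with $C=c=1$, so only the shape of the bound is meaningful; the symmetric matrix $A$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_samples = 50, 100_000

# Arbitrary symmetric matrix A and standard Gaussian vectors (sigma = 1).
A = rng.standard_normal((p, p))
A = (A + A.T) / 2.0
X = rng.standard_normal((n_samples, p))

# Centered quadratic forms: E[X^T A X] = Tr(A) for standard Gaussian X.
Q = np.einsum("ij,jk,ik->i", X, A, X) - np.trace(A)

fro, op = np.linalg.norm(A, "fro"), np.linalg.norm(A, 2)
for t in [20.0, 50.0, 100.0, 200.0]:
    # Exponent min(t^2 / ||A||_F^2, t / ||A||): Gaussian regime for small t,
    # exponential regime for large t (constants C, c set to 1 here).
    exponent = min(t ** 2 / fro ** 2, t / op)
    emp = np.mean(np.abs(Q) > t)
    print(f"t={t:6.1f}  empirical={emp:.2e}  exp(-exponent)={np.exp(-exponent):.2e}")
```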

Appendix A Minimum of operators

The aim of this section is to bound the parallel sum and the parallel product by a simpler expression involving the minimum of operators. Before defining the minimum, we naturally introduce a relation on the set of operators of the same monotonicity. This relation relies on inequalities between the images of the operators; it also requires some conditions on their domains. To express this last condition, given a subset $A\subset\mathbb{R}$, let us introduce the notations:

A={y,yA}\displaystyle A_{-}=\left\{y\in\mathbb{R},y\preceq A\right\} and\displaystyle\operatorname{and} A+={y,yA}\displaystyle A_{+}=\left\{y\in\mathbb{R},y\succeq A\right\}
Definition A.1 (Operator order relation).

Given two increasing operators $f,g$, we say that $f$ is lower than $g$, and we denote $f\preceq g$, if and only if (when $f$ and $g$ are maximally monotone, the domains are convex, see Proposition 1.10, and the first conditions rewrite $\operatorname{Dom}(g)_{+}\preceq\operatorname{Dom}(f)_{+}$ and $\operatorname{Dom}(f)_{-}\succeq\operatorname{Dom}(g)_{-}$):

Dom(f)+Dom(g)+,\displaystyle\operatorname{Dom}(f)_{+}\subset\operatorname{Dom}(g)_{+}, Dom(g)Dom(f)andx:f(x)g(x)\displaystyle\operatorname{Dom}(g)_{-}\subset\operatorname{Dom}(f)_{-}\quad\operatorname{and}\quad\forall x\in\mathbb{R}:\ f(x)\preceq g(x)

If, for all $x\in\mathbb{R}$, instead of $f(x)\preceq g(x)$ one has $g(x)\succeq f(x)$, then we say that $g$ is bigger than $f$ and denote $g\succeq f$. The definitions for decreasing operators are the same, but one needs to interchange the symbols “$+$” and “$-$”.

Note that the hypotheses on the domains imply inequalities between the domains (they are equivalences when the domains are non-empty intervals of $\mathbb{R}$, for instance for maximally monotone operators). Given two sets $A,B\subset\mathbb{R}$:

A+B+AB\displaystyle A_{+}\subset B_{+}\quad\implies\quad A\succeq B and\displaystyle\operatorname{and} BABA.\displaystyle B_{-}\subset A_{-}\quad\implies\quad B\preceq A.
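To help parse Definition A.1, the following toy sketch (our own illustration, not the paper's formalism) encodes finite graphs of increasing operators as dictionaries; for the finite domains used here, $A_{+}=[\min A,+\infty)$ and $A_{-}=(-\infty,\max A]$, so the domain conditions reduce to comparisons of endpoints:

```python
def preceq_sets(A, B):
    # A "preceq" B: trivially true if either set is empty; otherwise every
    # a in A must be dominated by some b in B, i.e. max(A) <= max(B).
    return not A or not B or max(A) <= max(B)

def preceq(f, g):
    # f, g: dicts x -> set of values, finite graphs of increasing operators.
    # For the finite domains used here, Dom(f)_+ subset Dom(g)_+ reduces to
    # min(Dom g) <= min(Dom f), and Dom(g)_- subset Dom(f)_- to
    # max(Dom g) <= max(Dom f).
    if min(g) > min(f) or max(g) > max(f):
        return False
    return all(preceq_sets(f.get(x, set()), g.get(x, set()))
               for x in set(f) | set(g))

# g dominates f pointwise; its domain extends further left, f's further right.
f = {0: {0.0}, 1: {1.0, 1.5}, 2: {2.0}, 3: {2.5}}
g = {-1: {0.0}, 0: {0.5}, 1: {2.0}, 2: {2.5}}
print(preceq(f, g))  # True
print(preceq(g, f))  # False (the domain conditions fail)
```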

Definition A.1 simplifies when dealing with maximally monotone operators:

Proposition A.2.

Given two maximally monotone operators with the same monotonicity f,g:2f,g:\mathbb{R}\mapsto 2^{\mathbb{R}} one has the equivalence:

fg\displaystyle f\preceq g \displaystyle\Longleftrightarrow gf\displaystyle g\succeq f
Proof.

Let us assume that $f,g$ are both maximally increasing, that $f\preceq g$, and consider $x\in\operatorname{Dom}(f)\cap\operatorname{Dom}(g)$. Considering $y_{g}\in g(x)$, if $y_{g}\leq f(x)$, one has, for all $(u,w_{f})\in\operatorname{Gra}(f)$: if $u\geq x$, then $\forall y_{f}\in f(x)$, $w_{f}\geq y_{f}\geq y_{g}$; if $u<x$, then the hypothesis $f\preceq g$, which implies in particular $u\in\operatorname{Dom}(f)_{+}\subset\operatorname{Dom}(g)_{+}$, allows us to consider some $w_{g}\in g(u)$ such that $y_{g}\geq w_{g}\geq w_{f}$. We have just shown:

(u,wf)Gra(f)(xu)(ygwf)0,\displaystyle(u,w_{f})\in\operatorname{Gra}(f)\qquad\Longrightarrow\qquad(x-u)(y_{g}-w_{f})\geq 0,

the maximality of ff then exactly implies (x,yg)Gra(f)(x,y_{g})\in\operatorname{Gra}(f). In other words, ygf(x)y_{g}\succeq f(x). ∎

The condition on the domains could actually be replaced (in the case of maximally monotone operators) by a similar condition on the ranges, as partly justified by the next lemma. Note that the result here is the same for increasing and decreasing operators (mainly because $\operatorname{Ran}(f)=\operatorname{Ran}(f\circ(-\operatorname{Id}))$).

Lemma A.3.

Given two monotone operators f,gf,g:

  • If gg is maximally monotone: fgRan(f)Ran(g)f\preceq g\quad\Longrightarrow\quad\operatorname{Ran}(f)_{-}\subset\operatorname{Ran}(g)_{-},

  • If ff is maximally monotone: gfRan(g)+Ran(f)+g\succeq f\quad\Longrightarrow\quad\operatorname{Ran}(g)_{+}\subset\operatorname{Ran}(f)_{+}.

Proof.

Let us do the proof for increasing operators. Considering $z\in\operatorname{Ran}(f)_{-}$, we know that there exists $(x,y)\in\operatorname{Gra}(f)$ such that $z\leq y$. If $x\in\operatorname{Dom}(g)$, then we can immediately conclude that $y\preceq g(x)$ and therefore that $z\in\operatorname{Ran}(g)_{-}$. If $g(x)=\emptyset$, then Proposition 1.10 implies that either $\operatorname{Dom}(g)<x$ or $\operatorname{Dom}(g)>x$. Only the first case is possible since, by hypothesis, $\operatorname{Dom}(f)_{+}\subset\operatorname{Dom}(g)_{+}$. One can then conclude with Theorem 1.12 that $\sup\operatorname{Ran}(g)=+\infty$, which implies $\operatorname{Ran}(g)_{-}=\mathbb{R}$ and again $z\in\operatorname{Ran}(g)_{-}$. The same kind of argument works for decreasing operators (this time with $\operatorname{Dom}(g)>x$). The second item is shown in a symmetric way. ∎

As will be our goal in Theorem A.16 below, note that an equality between maximally monotone operators can be deduced from a two-sided inequality.

Proposition A.4.

Given two maximally monotone operators with the same monotonicity f,g:2f,g:\mathbb{R}\mapsto 2^{\mathbb{R}} one has the equivalence:

fgandgf\displaystyle f\preceq g\quad\operatorname{and}\quad g\preceq f \displaystyle\Longleftrightarrow f=g\displaystyle f=g

It is a simple consequence of Proposition A.2, the following lemma, and the fact that the values of maximally monotone operators are all intervals of $\mathbb{R}$ (see Proposition 1.10).

Lemma A.5.

Given two closed intervals $A,B\subset\mathbb{R}$, if we assume that $A\preceq B$, $B\preceq A$, $A\succeq B$ and $B\succeq A$, then $A=B$.

Proof.

Let us introduce a1,a2,b1,b2a_{1},a_{2},b_{1},b_{2}\in\mathbb{R}, such that A=[a1,a2]A=[a_{1},a_{2}] and B=[b1,b2]B=[b_{1},b_{2}], then:

ABa2b2\displaystyle A\preceq B\quad\Longrightarrow\quad a_{2}\leq b_{2} BAb2a2\displaystyle B\preceq A\quad\Longrightarrow\quad b_{2}\leq a_{2}
ABb1a1\displaystyle A\succeq B\quad\Longrightarrow\quad b_{1}\leq a_{1} BAa1b1,\displaystyle B\succeq A\quad\Longrightarrow\quad a_{1}\leq b_{1},

and we can conclude that a1=b1a_{1}=b_{1} and a2=b2a_{2}=b_{2}. ∎

Let us end with a last property of the relation $\preceq$ between operators, which allows us to combine it with the inversion operation.

Lemma A.6.

Given two monotone operators f,g:2f,g:\mathbb{R}\mapsto 2^{\mathbb{R}}

iff,gare decreasing:fgf1g1iff,gare increasing:fgf1g1\displaystyle\begin{aligned} &\operatorname{if}\ f,g\ \text{are decreasing}:\quad f\preceq g\Rightarrow f^{-1}\preceq g^{-1}\\ &\operatorname{if}\ f,g\ \text{are increasing}:\quad f\preceq g\Rightarrow f^{-1}\succeq g^{-1}\end{aligned}
Proof.

Let us only prove the result for f,g+f,g\in\mathcal{M}_{\mathbb{P}_{+}}. Given xDom(f1)Dom(g1)x\in\operatorname{Dom}(f^{-1})\cap\operatorname{Dom}(g^{-1}) (if f1(x)=f^{-1}(x)=\emptyset or g1(x)=g^{-1}(x)=\emptyset, one always has f1(x)g1(x)f^{-1}(x)\leq g^{-1}(x)), for all yf1(x)y\in f^{-1}(x) we know from the hypothesis fgf\preceq g that there exists ug(y)u\in g(y) such that uxu\geq x. We know from Lemma 1.2 that g1g^{-1} is decreasing, then g1(u)g1(x)g^{-1}(u)\preceq g^{-1}(x), and since yg1(u)y\in g^{-1}(u), we have exactly shown that f1(x)g1(x)f^{-1}(x)\preceq g^{-1}(x). ∎

Definition A.7.

Given $n$ increasing operators $f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}$, we introduce the minimum of $f_{1},\ldots,f_{n}$ through the following definition of its graph:

\displaystyle\operatorname{Gra}\left(\min_{i\in[n]}f_{i}\right)=\left\{(x,y)\in\bigcup_{i\in[n]}\operatorname{Gra}(f_{i})\ \middle|\ \forall i\in[n],\ \forall u>x:\ y\leq f_{i}(u)\right\}

In the same way, we define the maximum of operators through:

\displaystyle\operatorname{Gra}\left(\max_{i\in[n]}f_{i}\right)=\left\{(x,y)\in\bigcup_{i\in[n]}\operatorname{Gra}(f_{i})\ \middle|\ \forall i\in[n],\ \forall u<x:\ y\geq f_{i}(u)\right\}

To define the minimum and the maximum of decreasing operators, one needs to interchange the conditions “$u>x$” and “$u<x$”.
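A direct transcription of Definition A.7 on finite graphs (a toy sketch; actual operators have uncountable graphs) may help parse the quantifiers:

```python
def graph_min(graphs):
    # graphs: finite graphs (sets of (x, y) pairs) of increasing operators.
    # A pair (x, y) survives iff y <= v for every (u, v) with u > x in any
    # of the graphs -- Definition A.7 restricted to finite samples.
    union = set().union(*graphs)
    return {(x, y) for (x, y) in union
            if all(y <= v for g in graphs for (u, v) in g if u > x)}

# Two increasing operators sampled on {0, 1, 2}.
f1 = {(0, 0.0), (1, 1.0), (2, 2.0)}
f2 = {(0, 3.0), (1, 3.5), (2, 4.0)}
print(sorted(graph_min([f1, f2])))
# [(0, 0.0), (1, 1.0), (2, 2.0), (2, 4.0)]: the lower branch survives,
# and (2, 4.0) only remains because no sampled u > 2 constrains it.
```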

Note that, as expected with these definitions, one has, for any monotone operators $f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}$ with the same monotonicity:

maxi[n]fi=mini[n]fi\displaystyle\max_{i\in[n]}f_{i}=-\min_{i\in[n]}-f_{i}
Proposition A.8.

Given nn maximally monotone operators f1,,fn:2f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}:

If f1,,fnf_{1},\ldots,f_{n} are increasing: Dom(mini[n]fi)=(i[n]Dom(fi))(i[n]Dom(fi)+)\displaystyle\operatorname{Dom}\left(\min_{i\in[n]}f_{i}\right)=\left(\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Dom}(f_{i})_{+}\right)
If f1,,fnf_{1},\ldots,f_{n} are decreasing: Dom(mini[n]fi)=(i[n]Dom(fi))(i[n]Dom(fi)),\displaystyle\operatorname{Dom}\left(\min_{i\in[n]}f_{i}\right)=\left(\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Dom}(f_{i})_{-}\right),

one needs to interchange the signs “$-$” and “$+$” to obtain the domain of the maximum.

The first identity can be rewritten:

Dom(mini[n]fi)=(i[n]Dom(fi))(i[n]Dom(fi)+).\displaystyle\operatorname{Dom}\left(\min_{i\in[n]}f_{i}\right)=\left(\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})_{-}\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Dom}(f_{i})_{+}\right). (33)
Proof.

Let us do the proof for the minimum and for increasing mappings. The inclusion $\operatorname{Dom}\left(\min_{i\in[n]}f_{i}\right)\subset\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})$ is obvious; now, considering $i\in[n]$, let us show that $\operatorname{Dom}(\min_{j\in[n]}f_{j})\subset\operatorname{Dom}(f_{i})_{+}$. Given $x\in\operatorname{Dom}(\min_{j\in[n]}f_{j})$, if $x\notin\operatorname{Dom}(f_{i})$, we know from Proposition 1.10 that $\operatorname{Dom}(f_{i})$ is convex and therefore satisfies either $\operatorname{Dom}(f_{i})<x$ or $\operatorname{Dom}(f_{i})>x$. If the latter case were true, then Theorem 1.12 would ensure that $\inf\operatorname{Ran}(f_{i})=-\infty$ ($f_{i}$ is increasing and not locally bounded on the left boundary of its domain), thus for any $y\in\mathbb{R}$, there would exist some $u>x$ such that $y\succeq f_{i}(u)$, which is impossible by definition of the minimum of operators. Therefore $\operatorname{Dom}(f_{i})<x$ and, as a consequence, $x\in\operatorname{Dom}(f_{i})_{+}$.

Let us now consider x(i[n]Dom(fi))(i[n]Dom(fi)+)x\in\left(\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Dom}(f_{i})_{+}\right). For all i[n]i\in[n]:

  • if $f_{i}(x)=\emptyset$, then Proposition 1.10 implies $\operatorname{Dom}(f_{i})<x$ (since $x\in\operatorname{Dom}(f_{i})_{+}$) and $\forall u>x$, $f_{i}(u)=\emptyset$,

  • if $f_{i}(x)\neq\emptyset$, one can consider $y_{i}\in f_{i}(x)$, which automatically satisfies, for any $u>x$, $y_{i}\leq f_{i}(u)$.

Let us then consider $y\equiv\min\{y_{i},\ i\in[n]:f_{i}(x)\neq\emptyset\}$. By construction, $(x,y)\in\cup_{i\in[n]}\operatorname{Gra}(f_{i})$ and $\forall u>x$, $y\leq f_{i}(u)$, which exactly means that $(x,y)\in\operatorname{Gra}(\min_{i\in[n]}f_{i})$. ∎

Lemma A.9.

Given nn maximally monotone operators f1,,fn:2f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}} (with same monotonicity), and i[n]i\in[n]:

fiminj[n]fj,\displaystyle f_{i}\succeq\min_{j\in[n]}f_{j}, fimaxj[n]fj,\displaystyle f_{i}\preceq\max_{j\in[n]}f_{j}, minj[n]fjfi,\displaystyle\min_{j\in[n]}f_{j}\preceq f_{i}, maxj[n]fjfi.\displaystyle\max_{j\in[n]}f_{j}\succeq f_{i}.

The proof of this lemma relies on the following elementary result that we provide without proof.

Lemma A.10.

Given two subsets A,BA,B\subset\mathbb{R}:

(AB)+=A+B+\displaystyle\left(A\cup B\right)_{+}=A_{+}\cup B_{+} and\displaystyle\operatorname{and} (AB)+=A+B+\displaystyle\left(A\cap B\right)_{+}=A_{+}\cap B_{+}

the same identities hold replacing “++” with “-”.

Proof of Lemma A.9.

Let us do the proof for the minimum and increasing mappings. We know from Proposition A.8, and more precisely from identity (33), that $\operatorname{Dom}(\min_{j\in[n]}f_{j})_{+}\subset\operatorname{Dom}(f_{i})_{+}$. Besides, Lemma A.10 allows us to write:

Dom(minj[n]fj)=(i[n]Dom(fi))(i[n]Dom(fi)+)=i[n]Dom(fi),\displaystyle\operatorname{Dom}(\min_{j\in[n]}f_{j})_{-}=\left(\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})_{-}\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Dom}(f_{i})_{+}\right)_{-}=\bigcup_{i\in[n]}\operatorname{Dom}(f_{i})_{-},

since for all i[n]i\in[n], (Dom(fi)+)=(\operatorname{Dom}(f_{i})_{+})_{-}=\mathbb{R}. As a consequence, Dom(fi)Dom(minj[n]fj)\operatorname{Dom}(f_{i})_{-}\subset\operatorname{Dom}(\min_{j\in[n]}f_{j})_{-}.

Let us then consider $x\in\operatorname{Dom}(\min_{j\in[n]}f_{j})\cap\operatorname{Dom}(f_{i})$ and $y\in f_{i}(x)$. If there exists $z\in\min_{j\in[n]}f_{j}(x)$ such that $z\geq y$, then, by definition of the minimum, for all $u>x$ and all $j\in[n]$, $y\leq z\leq f_{j}(u)$, which implies $(x,y)\in\operatorname{Gra}(\min_{j\in[n]}f_{j})$ (in other words, $y\preceq\min_{j\in[n]}f_{j}(x)$). The same proof clearly works for decreasing mappings, considering this time all $u<x$, and it works the same way for the maximum.

To show that $\min_{j\in[n]}f_{j}\preceq f_{i}$, one cannot use Proposition A.2, since we have not yet proven that $\min_{j\in[n]}f_{j}$ is maximally monotone. Given $x\in\operatorname{Dom}(f_{i})\cap\operatorname{Dom}(\min_{j\in[n]}f_{j})$ and $z\in\min_{j\in[n]}f_{j}(x)$, if there exists $y\in f_{i}(x)$ such that $z\geq y$, then, considering any $(u,w)\in\operatorname{Gra}(f_{i})$: if $u>x$, by definition of the minimum, we know that $z\leq f_{i}(u)$, and if $u<x$, then $w\leq y\leq z$. This directly implies, by maximality of $f_{i}$, that $(x,z)\in\operatorname{Gra}(f_{i})$, and therefore that $z\preceq f_{i}(x)$. ∎

To show the maximality properties of the maximum and the minimum, we will employ the simple characterization of $\operatorname{Ran}(f+\operatorname{Id})$ (resp. of $\operatorname{Ran}(f-\operatorname{Id})$) given by Theorem 1.8 for increasing (resp. decreasing) operators. Let us first study the range of the minimum and the maximum.

Proposition A.11.

Given nn maximally monotone operators f1,,fn:2f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}:

Ran(mini[n]fi)=(i[n]Ran(fi))(i[n]Ran(fi)),\displaystyle\operatorname{Ran}\left(\min_{i\in[n]}f_{i}\right)=\left(\bigcup_{i\in[n]}\operatorname{Ran}(f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Ran}(f_{i})_{-}\right),

and one can replace the sign “-” with “++” to obtain the range of the maximum.

Proof.

By definition of the minimum, one obviously has $\operatorname{Ran}(\min_{j\in[n]}f_{j})\subset\bigcup_{i\in[n]}\operatorname{Ran}(f_{i})$. Besides, we know from the first result of Lemma A.3 combined with Lemma A.9 that for all $i\in[n]$, $\operatorname{Ran}(\min_{j\in[n]}f_{j})\subset\operatorname{Ran}(\min_{j\in[n]}f_{j})_{-}\subset\operatorname{Ran}(f_{i})_{-}$, which finally implies $\operatorname{Ran}(\min_{j\in[n]}f_{j})\subset\left(\bigcup_{i\in[n]}\operatorname{Ran}(f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Ran}(f_{i})_{-}\right)$.

The other inclusion is more laborious since it is closely related to the maximality of minj[n]fj\min_{j\in[n]}f_{j} that will be proven in Proposition A.14. Let us assume that y(i[n]Ran(fi))(i[n]Ran(fi))y\in\left(\bigcup_{i\in[n]}\operatorname{Ran}(f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Ran}(f_{i})_{-}\right). The set:

\displaystyle m_{y}\equiv\left\{x\in\mathbb{R},\exists i\in[n],y\in f_{i}(x)\right\}

is non-empty since $y\in\cup_{i\in[n]}\operatorname{Ran}(f_{i})$, and closed thanks to Proposition 1.10. Let us now consider two cases:

  • If $\sup m_{y}=+\infty$, then one can consider $i\in[n]$ such that there exists a sequence $(x_{p})_{p\in\mathbb{N}}\in(\operatorname{Dom}f_{i})^{\mathbb{N}}$ satisfying:

    p:yfi(xp)\displaystyle\forall p\in\mathbb{N}:\quad y\in f_{i}(x_{p}) and\displaystyle\operatorname{and} limpxp=+.\displaystyle\lim_{{p\to\infty}}x_{p}=+\infty.

    The operator $f_{i}$ being maximally increasing, it is not hard to see that it is constant, equal to $\{y\}$, on $[x_{1},+\infty)$. For all $j\in[n]$, there exists $p_{j}$ such that $f_{j}(x_{p_{j}})\succeq y$; otherwise, we would have $y\notin(\operatorname{Ran}f_{j})_{-}$. Therefore, noting $p_{0}=\max_{j\in[n],j\neq i}p_{j}$, we have, for all $u>x_{p_{0}}$, $f_{j}(u)\geq f_{j}(x_{p_{0}})\geq y$. We have exactly shown that $(x_{p_{0}},y)\in\operatorname{Gra}(\min_{i\in[n]}f_{i})$.

  • If $\sup m_{y}<+\infty$, let us then consider $x=\max m_{y}$. Given $j\in[n]$, if $f_{j}(x)=\emptyset$, then Proposition 1.10 would imply either $\operatorname{Ran}(f_{j})>y$ or $\operatorname{Ran}(f_{j})<y$. Only the first case is possible (since $y\in\operatorname{Ran}(f_{j})_{-}$), and it implies in particular that for all $u>x$, $y\leq f_{j}(u)$. If $f_{j}(x)\neq\emptyset$, then $y\preceq f_{j}(x)$; otherwise, there would exist $u>x$ such that $y\in f_{j}(u)$ (since $y\in\operatorname{Ran}(f_{j})$ and $\operatorname{Ran}(f_{j})$ is convex), which would contradict the definition of $x$. Then for all $u>x$, $y\preceq f_{j}(x)\leq f_{j}(u)$ and, as before, $(x,y)\in\operatorname{Gra}(\min_{i\in[n]}f_{i})$. ∎

We next want to show that these two operations are stable on the set of maximally monotone operators. The monotonicity is a simple consequence of the definition.

Proposition A.12.

Given nn increasing (resp. decreasing) monotone operators f1,,fn:2f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}, mini[n]fi\min_{i\in[n]}f_{i} and maxi[n]fi\max_{i\in[n]}f_{i} are both increasing (resp. decreasing) monotone operators.

Lemma A.13.

Given $n$ increasing (resp. decreasing) monotone operators $f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}$:

\displaystyle\operatorname{Id}+\min_{i\in[n]}f_{i}=\min_{i\in[n]}\left(\operatorname{Id}+f_{i}\right)
Proposition A.14.

Given $n$ increasing (resp. decreasing) maximally monotone operators $f_{1},\ldots,f_{n}:\mathbb{R}\mapsto 2^{\mathbb{R}}$, $\min_{i\in[n]}f_{i}$ and $\max_{i\in[n]}f_{i}$ are both increasing (resp. decreasing) maximally monotone operators.

Proof.

Theorem 1.8 ensures that for all $i\in[n]$, $\operatorname{Ran}(\operatorname{Id}+f_{i})=\mathbb{R}$, and Proposition A.11 combined with Lemma A.13 provides:

Ran(Id+mini[n]fi)\displaystyle\operatorname{Ran}\left(\operatorname{Id}+\min_{i\in[n]}f_{i}\right) =Ran(mini[n](Id+fi))\displaystyle=\operatorname{Ran}\left(\min_{i\in[n]}(\operatorname{Id}+f_{i})\right)
=(i[n]Ran(Id+fi))(i[n]Ran(Id+fi))=,\displaystyle=\left(\bigcup_{i\in[n]}\operatorname{Ran}(\operatorname{Id}+f_{i})\right)\cap\left(\bigcap_{i\in[n]}\operatorname{Ran}(\operatorname{Id}+f_{i})_{-}\right)\ \ =\ \mathbb{R},

which allows us to conclude the proof thanks to Theorem 1.8 (we already know from Proposition A.12 that $\min_{i\in[n]}f_{i}$ is increasing monotone). ∎

One last result is needed to prove the main result of this section.

Proposition A.15.

Given n+1n+1 maximally monotone operators f1,,fn,g:2f_{1},\ldots,f_{n},g:\mathbb{R}\mapsto 2^{\mathbb{R}} with the same monotonicity:

\displaystyle\forall i\in[n],\ g\preceq f_{i}\quad\Longleftrightarrow\quad g\preceq\min_{i\in[n]}f_{i}\qquad\operatorname{and}\qquad\forall i\in[n],\ f_{i}\succeq g\quad\Longleftrightarrow\quad\min_{i\in[n]}f_{i}\succeq g
Proof.

We only prove the results for maximally increasing operators and for the minimum. The converse implications are obvious thanks to Lemma A.9 ($\forall i\in[n]$, $\min_{j\in[n]}f_{j}\preceq f_{i}$). Let us then assume that $\forall i\in[n]$, $g\preceq f_{i}$. Looking at the domains, this implies:

i[n]:\displaystyle\forall i\in[n]: Dom(g)+Dom(fi)+\displaystyle\operatorname{Dom}(g)_{+}\subset\operatorname{Dom}(f_{i})_{+} and\displaystyle\operatorname{and} Dom(fi)Dom(g),\displaystyle\operatorname{Dom}(f_{i})_{-}\subset\operatorname{Dom}(g)_{-},

This directly implies, thanks to Lemma A.10 and Proposition A.8, that:

Dom(g)+Dom(mini[n]fi)+\displaystyle\operatorname{Dom}(g)_{+}\subset\operatorname{Dom}(\min_{i\in[n]}f_{i})_{+} and\displaystyle\operatorname{and} Dom(mini[n]fi)Dom(g).\displaystyle\operatorname{Dom}(\min_{i\in[n]}f_{i})_{-}\subset\operatorname{Dom}(g)_{-}.

Now, considering any $x\in\operatorname{Dom}(g)\cap\operatorname{Dom}(\min_{i\in[n]}f_{i})$ and $y\in g(x)$, we know that for all $j\in[n]$, $y\preceq f_{j}(x)$. Then for any $u>x$ and $j\in[n]$, $y\preceq f_{j}(x)\leq f_{j}(u)$ (the inequalities still hold when the sets are empty). Therefore, if $(x,y)\in\cup_{j\in[n]}\operatorname{Gra}(f_{j})$, then $(x,y)\in\operatorname{Gra}(\min_{i\in[n]}f_{i})$; if $(x,y)\notin\cup_{j\in[n]}\operatorname{Gra}(f_{j})$, that means that $y\leq\cup_{i\in[n]}f_{i}(x)$, and therefore $y\preceq\min_{i\in[n]}f_{i}(x)$. ∎

Theorem A.16.

Given nn maximally monotone operators f1,,fn:2f_{1},\ldots,f_{n}:\mathbb{R}\to 2^{\mathbb{R}}, if f1,,fnf_{1},\ldots,f_{n} are decreasing:

(mini[n]fi)1=mini[n]fi1\displaystyle\left(\min_{i\in[n]}f_{i}\right)^{-1}=\min_{i\in[n]}f_{i}^{-1} and\displaystyle\operatorname{and} (maxi[n]fi)1=maxi[n]fi1,\displaystyle\left(\max_{i\in[n]}f_{i}\right)^{-1}=\max_{i\in[n]}f_{i}^{-1},

if f1,,fnf_{1},\ldots,f_{n} are increasing:

(mini[n]fi)1=maxi[n]fi1\displaystyle\left(\min_{i\in[n]}f_{i}\right)^{-1}=\max_{i\in[n]}f_{i}^{-1} and\displaystyle\operatorname{and} (maxi[n]fi)1=mini[n]fi1\displaystyle\left(\max_{i\in[n]}f_{i}\right)^{-1}=\min_{i\in[n]}f_{i}^{-1}
Proof.

Let us only show the first identity (for decreasing operators). We know from Lemma A.9 that for all $i\in[n]$, $f_{i}\succeq\min_{j\in[n]}f_{j}$; therefore, Lemma A.6 allows us to conclude that $f_{i}^{-1}\succeq\left(\min_{j\in[n]}f_{j}\right)^{-1}$, and we can then deduce from Proposition A.15 that:

mini[n]fi1(mini[n]fi)1.\displaystyle\min_{i\in[n]}f_{i}^{-1}\succeq\left(\min_{i\in[n]}f_{i}\right)^{-1}.

The same inequality holds with $f_{1}^{-1},\ldots,f_{n}^{-1}$ in place of $f_{1},\ldots,f_{n}$, which provides $\min_{i\in[n]}f_{i}\succeq\left(\min_{i\in[n]}f_{i}^{-1}\right)^{-1}$ and implies (again with Lemma A.6):

\displaystyle\left(\min_{i\in[n]}f_{i}\right)^{-1}\succeq\min_{i\in[n]}f_{i}^{-1}.

One can then conclude with Proposition A.4 (all the mappings are maximally monotone thanks to Example 1.9 and Proposition A.14). ∎
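This identity can be checked numerically for explicit decreasing bijections (an illustrative sketch with arbitrary choices of $f_{1},f_{2}$):

```python
import numpy as np

# Two decreasing bijections with explicit inverses (arbitrary choices).
f1, f1_inv = lambda t: np.exp(-t), lambda y: np.log(1.0 / y)
f2, f2_inv = lambda t: 2.0 * np.exp(-2.0 * t), lambda y: 0.5 * np.log(2.0 / y)

t = np.linspace(1e-6, 20.0, 200_001)
m = np.minimum(f1(t), f2(t))  # pointwise minimum, still decreasing

for y in [0.9, 0.5, 0.1, 0.01]:
    # Invert the minimum by interpolation (arrays flipped so that the
    # abscissas fed to np.interp are increasing).
    m_inv = np.interp(y, m[::-1], t[::-1])
    print(f"y={y:5.2f}  (min f)^-1 ~ {m_inv:.4f}"
          f"   min(f1^-1, f2^-1) = {min(f1_inv(y), f2_inv(y)):.4f}")
```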

It is now immediate to prove:

Proposition A.17.

Given nn probabilistic operators f1,,fnf_{1},\ldots,f_{n}\in\mathcal{M}_{\mathbb{P}}:

min(f1,,fn)(Idn)f1fnmax(f1,,fn)(Idn).\displaystyle\min(f_{1},\ldots,f_{n})\circ\left(\frac{\operatorname{Id}}{n}\right)\preceq f_{1}\boxplus\cdots\boxplus f_{n}\preceq\max(f_{1},\ldots,f_{n})\circ\left(\frac{\operatorname{Id}}{n}\right).

If we assume in addition that $f_{1},\ldots,f_{n}\in\mathcal{M}_{\mathbb{P}_{+}}$:

\displaystyle\min(f_{1},\ldots,f_{n})\circ\left(\operatorname{Id}^{\frac{1}{n}}\right)\preceq f_{1}\boxtimes\cdots\boxtimes f_{n}\preceq\max(f_{1},\ldots,f_{n})\circ\left(\operatorname{Id}^{\frac{1}{n}}\right).
Proof.

It is a simple consequence of Theorem A.16 (f1,,fnf_{1},\ldots,f_{n} are all maximally decreasing) combined with Lemma A.6 and the fact that:

nmin(f11,,fn1)f11++fn1nmax(f11,,fn1)\displaystyle n\min(f_{1}^{-1},\ldots,f_{n}^{-1})\leq f_{1}^{-1}+\cdots+f_{n}^{-1}\leq n\max(f_{1}^{-1},\ldots,f_{n}^{-1})

and that:

min(f11,,fn1)nf11fn1max(f11,,fn1)n\displaystyle\min(f_{1}^{-1},\ldots,f_{n}^{-1})^{n}\leq f_{1}^{-1}\cdots f_{n}^{-1}\leq\max(f_{1}^{-1},\ldots,f_{n}^{-1})^{n}

when f1,,fn+f_{1},\ldots,f_{n}\in\mathcal{M}_{\mathbb{P}_{+}}. ∎
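As a closing sanity check, the sketch below verifies the first sandwich for $n=2$ with two arbitrary decreasing profiles: the parallel sum $h=f_{1}\boxplus f_{2}$ is computed through $h^{-1}=f_{1}^{-1}+f_{2}^{-1}$ and compared with $\min(f_{1},f_{2})(t/2)$ and $\max(f_{1},f_{2})(t/2)$:

```python
import numpy as np

# Two decreasing profiles with explicit inverses (arbitrary choices):
# f1(t) = exp(-t), f2(t) = exp(-sqrt(t)).
f1, f1_inv = lambda t: np.exp(-t), lambda y: np.log(1.0 / y)
f2, f2_inv = lambda t: np.exp(-np.sqrt(t)), lambda y: np.log(1.0 / y) ** 2

# Parallel sum h = f1 [+] f2 through h^{-1}(y) = f1^{-1}(y) + f2^{-1}(y):
# tabulate h^{-1} on a grid of y and read off h(t) by interpolation.
y = np.linspace(1e-6, 1.0 - 1e-6, 100_000)
h_inv = f1_inv(y) + f2_inv(y)  # decreasing in y

for t in [0.5, 1.0, 4.0, 10.0]:
    h = np.interp(t, h_inv[::-1], y[::-1])
    lo = min(f1(t / 2.0), f2(t / 2.0))
    hi = max(f1(t / 2.0), f2(t / 2.0))
    print(f"t={t:5.1f}   {lo:.4f} <= h(t)={h:.4f} <= {hi:.4f}")
```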

References

  • [1] Radosław Adamczak. A note on the Hanson-Wright inequality for random vectors with dependencies. Electronic Communications in Probability, 20(72):1–13, 2015.
  • [2] Radosław Adamczak, Witold Bednorz, and Paweł Wolff. Moment estimates implied by modified log-Sobolev inequalities. ESAIM: Probability and Statistics, 21:467–494, 2017.
  • [3] Radosław Adamczak and Paweł Wolff. Concentration inequalities for non-Lipschitz functions with bounded derivatives of higher order. Probability Theory and Related Fields, 162:531–586, 2015.
  • [4] W Anderson, T Morley, and G Trapp. Fenchel duality of nonlinear networks. IEEE Transactions on Circuits and Systems, 25(9):762–765, 1978.
  • [5] William N Anderson Jr and Richard James Duffin. Series and parallel addition of matrices. Journal of Mathematical Analysis and Applications, 26(3):576–594, 1969.
  • [6] Miguel A Arcones and Evarist Giné. On decoupling, series expansions, and tail behavior of chaos processes. Journal of Theoretical Probability, 6(1):101–122, 1993.
  • [7] Heinz H. Bauschke and Patrick L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer, 2011.
  • [8] Dimitri Bertsekas. Convex optimization theory, volume 1. Athena Scientific, 2009.
  • [9] Christer Borell. On the Taylor series of a Wiener polynomial. In Seminar Notes on Multiple Stochastic Integration, Polynomial Chaos and Their Integration. Case Western Reserve University, Cleveland, 1984.
  • [10] Dao Ha Fuk. Certain probabilistic inequalities for martingales. Sibirskii Matematicheskii Zhurnal, 14(1):185–193, 1973.
  • [11] Friedrich Götze, Holger Sambale, and Arthur Sinulis. Concentration inequalities for bounded functionals via log-Sobolev-type inequalities. Journal of Theoretical Probability, 34:1623–1652, 2021.
  • [12] Friedrich Götze, Holger Sambale, and Arthur Sinulis. Concentration inequalities for polynomials in α\alpha-sub-exponential random variables. 2021.
  • [13] Nathael Gozlan, Cyril Roberto, Paul-Marie Samson, Yan Shu, and Prasad Tetali. Characterization of a class of weak transport-entropy inequalities on the line. 2018.
  • [14] Nathael Gozlan, Cyril Roberto, Paul-Marie Samson, and Prasad Tetali. Kantorovich duality for general transport costs and applications. Journal of Functional Analysis, 273(11):3327–3405, 2017.
  • [15] Alice Guionnet. Large random matrices, volume 36. Springer Science & Business Media, 2009.
  • [16] Rafał Latała. Estimates of moments and tails of Gaussian chaoses. 2006.
  • [17] Michel Ledoux. A note on large deviations for Wiener chaos. Séminaire de probabilités de Strasbourg, 22:249–259, 1988.
  • [18] Michel Ledoux. The Concentration of Measure Phenomenon, volume 89 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, Rhode Island, 2005.
  • [19] R Lochowski. Moment and tail estimates for multidimensional chaoses generated by symmetric random variables with logarithmically concave tails. Banach Center Publications, 72:161, 2006.
  • [20] Elizabeth S Meckes and Mark W Meckes. Concentration and convergence rates for spectral measures of random matrices. Probability Theory and Related Fields, 156:145–164, 2013.
  • [21] Sergey V Nagaev. Large deviations of sums of independent random variables. The Annals of Probability, pages 745–789, 1979.
  • [22] R Tyrrell Rockafellar and Johannes O Royset. Random variables, monotone relations, and convex analysis. Mathematical Programming, 148:297–331, 2014.
  • [23] Michel Talagrand. Concentration of measure and isoperimetric inequalities in product spaces. Publications mathématiques de l’IHÉS, 104:905–909, 1995.
  • [24] Michel Talagrand. A new look at independence. The Annals of probability, pages 1–34, 1996.
  • [25] Van H Vu. Concentration of non-Lipschitz functions and applications. Random Structures & Algorithms, 20(3):262–316, 2002.