This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

\catchline

Sharp Bounds for the Concentration of the Resolvent
in Convex Concentration Settings

Cosme Louart111 GIPSA-lab. [email protected] GIPSA-lab, 11 rue des Mathématiques, 38401 St Martin d’Hères
((26/12/2021))
Abstract

Considering random matrix Xp,nX\in\mathcal{M}_{p,n} with independent columns satisfying the convex concentration properties issued from a famous theorem of Talagrand, we express the linear concentration of the resolvent Q=(Ip1nXXT)1Q=(I_{p}-\frac{1}{n}XX^{T})^{-1} around a classical deterministic equivalent with a good observable diameter for the nuclear norm. The general proof relies on a decomposition of the resolvent as a series of powers of XX.

keywords:
random matrix theory ; concentration of measure ; convariance matrices ; convex concentration ; concentration of products and infinite series ; Hanson-Wright Theorem.
{history}
\ccode

Mathematics Subject Classification 2000: 15A52, 60B12, 62J10

Introduction

Initiated by Milman in the 70s [Mil71], the theory of concentration of the measure provided a wide range of concentration inequalities, mainly concerning continuous distribution (i.e. with no atoms), in particular thanks to the beautiful interpretation with the bound on the Ricci curvature [Gro79]. To give a simple fundamental example (more examples can be found in [Led05]), a random vector ZnZ\in\mathbb{R}^{n} having independent and Gaussian entries with unit variance satisfies for any 11-Lipschitz mapping f:nf:\mathbb{R}^{n}\to\mathbb{R}:

t>0:(|f(Z)𝔼[f(Z)]|t)2et2/2.\displaystyle\forall t>0:\ \ \mathbb{P}\left(\left|f(Z)-\mathbb{E}[f(Z)]\right|\geq t\right)\leq 2e^{-t^{2}/2}. (1)

This inequality is powerful for two reasons: first, it is independent on the dimension nn, second it concerns any 11-Lipschitz mapping ff. It is then interesting to formalize this behavior to introduce a class of “Lipschitz concentrated random vectors” satisfying the same concentration inequality as the Gaussian distribution (in particular, all the Lipschitz transformation of a Gaussian vector). This was done in several books and papers where this approach proved its efficiency in the study of random matrices [Tao12, Nou09, LC21b]

We want here to extend those results to a new class of concentrated vectors discovered by Talagrand in [Tal95]. Although the concentration result looks similar, its nature is quite different as it concerns bounded distributions for which classical tools of differential geometry do not operate. In a sense, it could be seen as a combinatorial result of concentration. Given a random vector Z[0,1]nZ\in[0,1]^{n} with independent entries, this result sets that for any 11-Lipschitz and convex mapping f:[0,1]f:[0,1]\to\mathbb{R}:

t>0:(|f(Z)𝔼[f(Z)]|t)2et2/4.\displaystyle\forall t>0:\ \ \mathbb{P}\left(\left|f(Z)-\mathbb{E}[f(Z)]\right|\geq t\right)\leq 2e^{-t^{2}/4}. (2)

We can mention here the recent results of [HT21] that extend this kind of inequalities for random vectors with independent and subgaussian entries. Adopting the terminology of [VW14, MS11, Ada11], we call those vectors convexly concentrated random vector (see Definition 1.2 below). The convexity required for the observations to concentrate makes the discussion on convexly concentrated random vector far more delicate. There is no more stability towards Lipschitz transformations and given a convexly concentrated random vector ZZ, just its affine transformations are sure to be concentrated. This issue raises naturally for one of the major objects of random matrix theory, namely the resolvent Qz=(zInX)1Q^{z}=(zI_{n}-X)^{-1} that can provide important eigen properties on XX. In the case of convex concentration, the concentration of the resolvent Qz=(zInX)1Q^{z}=(zI_{n}-X)^{-1} is no more a mere consequence of a bound on its differential on XpX\in\mathcal{M}_{p}. Still, as first shown by [GZ00], it is possible to obtain concentration properties on the sum of Lipschitz functionals of the eigen values. Here we pursue the study, looking at linear concentration properties of QzQ^{z} for which similar inequalities to (1) or (2) are only satisfied by 11-Lipschitz and linear functionals ff. The well known identity

1pλSp(X)f(λ)=12iπγf(z)Tr(Qz)𝑑z,\displaystyle\frac{1}{p}\sum_{\lambda\in\text{Sp}(X)}f(\lambda)=-\frac{1}{2i\pi}\oint_{\gamma}f(z)\operatorname{Tr}(Q^{z})dz, (3)

is true for any analytical mapping ff defined on the interior of a path γ\gamma\in\mathbb{C} containing the spectrum of XX (or any limit of such mappings), therefore, our results on the concentration of QzQ^{z} concern in particular the quantities studied in [GZ00].

Although it is weaker222Lipschitz concentrated random vectors are convexly and linearly concentrated, convexly concentrated random vectors are linearly concentrated., the class of linearly concentrated vectors behaves very well towards the dependence and the sum and allows us to obtain the concentration of the resolvent expressing it as a sum Qz=1zi=1(X/z)iQ^{z}=\frac{1}{z}\sum_{i=1}^{\infty}(X/z)^{i}. The linear concentration of the powers of XX was justified in333We provide an alternative proof in the appendix. [MS11] in the case of convexly concentrated random matrix XX. We call this weakening of the concentration property “the degeneracy of the convex concentration through multiplication”. The linear concentration of the resolvent is though sufficient for most practical applications that rely on an estimation of the Stieltjes transform m(z)=1pTr(Qz)m(z)=\frac{1}{p}\operatorname{Tr}(Q^{z}).

We present below our main contribution without the useful but non-standard formalism introduced in the rest of the article. It concerns the concentration and the estimation of444The concentration of (zIp1nX)1(zI_{p}-\frac{1}{\sqrt{n}}X)^{-1} for a positive symmetric matrix XpX\in\mathcal{M}_{p} is immediate from our approach. The estimation of its expectation is more laborious and goes beyond the scope of the paper although it can be obtained with the same tools.

Qz(zIp1nXXT)1\displaystyle Q^{z}\equiv\left(zI_{p}-\frac{1}{n}XX^{T}\right)^{-1}

for a random matrix Xp,nX\in\mathcal{M}_{p,n}. Following the formalism of the random matrix theory, the computable estimation of 𝔼[Qz]\mathbb{E}[Q^{z}] will be called a “deterministic equivalent”. Its definition relies on a well known result that states that given a family of symmetric matrices Σ=(Σ1,,Σn)pn\Sigma=(\Sigma_{1},\ldots,\Sigma_{n})\in\mathcal{M}_{p}^{n}, there exists a unique vector Λ~Σzn\tilde{\Lambda}_{\Sigma}^{z}\in\mathbb{C}^{n} satisfying:

i[n]:\displaystyle\forall i\in[n]: [Λ~Σz]i=1nTr(ΣiQ~ΣΛ~Σz)\displaystyle[\tilde{\Lambda}_{\Sigma}^{z}]_{i}=\frac{1}{n}\operatorname{Tr}\left(\Sigma_{i}\tilde{Q}_{\Sigma}^{\tilde{\Lambda}_{\Sigma}^{z}}\right) with Q~ΣΛ~Σz=(zIp1nj=1nΣj1[Λ~Σz]j)1.\displaystyle\text{with }\ \tilde{Q}_{\Sigma}^{\tilde{\Lambda}_{\Sigma}^{z}}=\left(zI_{p}-\frac{1}{n}\sum_{j=1}^{n}\frac{\Sigma_{j}}{1-[\tilde{\Lambda}_{\Sigma}^{z}]_{j}}\right)^{-1}.

With those notations at hand, let us state:

Theorem 0.1 (Concentration of the resolvent).

Considering two sequences (pn)n(p_{n})_{n\in\mathbb{N}}\in\mathbb{N}^{\mathbb{N}}, (σn)+(\sigma_{n})\in\mathbb{R}_{+}^{\mathbb{N}} and four constants c,C,K,γ>0c,C,K,\gamma>0, we suppose that we are given for any nn\in\mathbb{N} a random matrix: Xn=(x1(n),,xn(n))pn,nX_{n}=(x_{1}^{(n)},\ldots,x_{n}^{(n)})\in\mathcal{M}_{p_{n},n} such that

  • pnγnp_{n}\leq\gamma n

  • for all nn\in\mathbb{N}, xn(1),,xn(n)x_{n}^{(1)},\ldots,x_{n}^{(n)} are independent,

  • supn,j[n]𝔼[xn(j)]Kn\sup_{n\in\mathbb{N},j\in[n]}\left\|\mathbb{E}\left[x_{n}^{(j)}\right]\right\|\leq K\sqrt{n}

  • for any nn\in\mathbb{N} quasi-convex mapping g:n,pnmng:\mathcal{M}_{n,p_{n}}^{m_{n}}\to\mathbb{R}, 11-Lipschitz for the euclidean norm:

    (|g(Xn)𝔼[g(Xn)]|t)Cec(t/σn)2.\displaystyle\mathbb{P}\left(\left|g(X_{n})-\mathbb{E}\left[g(X_{n})\right]\right|\geq t\right)\leq Ce^{-c(t/\sigma_{n})^{2}}.

Then for any constant ε>0\varepsilon>0, there exist two constants c,C>0c^{\prime},C^{\prime}>0 such that for all nn\in\mathbb{N}, for any deterministic matrix AnA\in\mathbb{R}^{n} such that A1\|A\|\leq 1 and for any zz\in\mathbb{C}, such that d(z,Sp(1nXXT))εd(z,\text{Sp}(\frac{1}{n}XX^{T}))\geq\varepsilon:

(|Tr((QzQ~ΣΛ~Σz)A)|t)Cec(t/σn)2+Cecn,\displaystyle\mathbb{P}\left(\left|\operatorname{Tr}\left(\left(Q^{z}-\tilde{Q}_{\Sigma}^{\tilde{\Lambda}_{\Sigma}^{z}}\right)A\right)\right|\geq t\right)\leq C^{\prime}e^{-c^{\prime}(t/\sigma_{n})^{2}}+C^{\prime}e^{-c^{\prime}n},

where Σi=𝔼[xixiT]\Sigma_{i}=\mathbb{E}[x_{i}x_{i}^{T}]. Besides there exists a constant K>0K>0 such that for all nn\in\mathbb{N}:

𝔼[Qz]Q~ΣΛ~ΣzFKn.\displaystyle\left\|\mathbb{E}[Q^{z}]-\tilde{Q}_{\Sigma}^{\tilde{\Lambda}_{\Sigma}^{z}}\right\|_{F}\leq\frac{K}{\sqrt{n}}.

This theorem allows us to get good inferences on the eigen values distribution through the identity (3) and the estimation of the Stieltjes transform g(z)1pTr(Qz)g(z)\equiv-\frac{1}{p}\operatorname{Tr}(Q^{z}) satisfying the concentration inequality:

(|g(z)+1nTr(Q~Λ~z)|t)Cecp2t2,\displaystyle\mathbb{P}\left(\left|g(z)+\frac{1}{n}\operatorname{Tr}\left(\tilde{Q}^{\tilde{\Lambda}^{z}}\right)\right|\geq t\right)\leq Ce^{-cp^{2}t^{2}},

for two constants C,c>0C,c>0 (and for d(z,Sp(1nXXT))O(1)d(z,\text{Sp}(\frac{1}{n}XX^{T}))\geq O(1)).

When the distribution of the spectrum of 1nXXT\frac{1}{n}XX^{T} presents different bulks, this theorem also allows us to understand the eigen-spaces associated to those different bulks. Indeed, considering a path γ\gamma\in\mathbb{C} containing a bulk of eigen-values BSp(1nXXT)B\subset\text{Sp}(\frac{1}{n}XX^{T}), if we note EBE_{B} the associated random eigen-space and ΠB\Pi_{B} the orthogonal projector on EBE_{B}, then for any deterministic matrix ApA\in\mathcal{M}_{p}:

Tr(ΠBA)=12iπγTr(AQz)𝑑z\displaystyle\operatorname{Tr}(\Pi_{B}A)=-\frac{1}{2i\pi}\int_{\gamma}\operatorname{Tr}(AQ^{z})dz (4)

we can estimate this projection ΠB\Pi_{B} defining EBE_{B} thanks to the concentration inequality555for the concentration to be valid on all the values of the path γ\gamma, one must be careful to consider a path staying at a distance O(1)O(1) from the bulk, that is why we only consider here multiple bulk distributions:

t>0:(|1Rg(Π)Tr(ΠQz)1Rg(Π)Tr(ΠQ~Λ~z)|t)CecRg(Π)2t2,\displaystyle\forall t>0:\ \ \mathbb{P}\left(\left|\frac{1}{\text{Rg}(\Pi)}\operatorname{Tr}(\Pi Q^{z})-\frac{1}{\text{Rg}(\Pi)}\operatorname{Tr}\left(\Pi\tilde{Q}^{\tilde{\Lambda}^{z}}\right)\right|\geq t\right)\leq Ce^{-c\text{Rg}(\Pi)^{2}t^{2}},

for some constants C,c>0C,c>0 and for any projector Π\Pi defined on p\mathbb{R}^{p}.

The approach we present here does not only allows us to set the concentration of QzQ^{z}, but also the concentration of any polynomial of finite degree taking as variable combination of QzQ^{z}, XX and XTX^{T}. The general idea is to develop the polynomial as an infinite series of powers of XX in a way that the observable diameters of the different terms of the series sum to the smallest value possible. As it is described in the proof of Proposition 4.6, the summation becomes slightly elaborate when zz gets close to the spectrum.

After presenting the definition and the basic properties of the convex and linear concentration (Section 1), we express the concentration of the sum of linearly concentrated random vectors (Section 2). Then we express the concentration of the entry wise product and the matricial product of convexly concentrated random vectors and matrices (Section 3). Finally we deduce the concentration of the resolvent and provide a computable deterministic equivalent (Section 4).

1 Definition and first properties

The concentration inequality (2) is actually also valid for quasi-convex functionals defined folowingly.

Definition 1.1.

Given a normed vector space (E,)(E,\|\cdot\|), an application f:Ef:E\mapsto\mathbb{R} is said to be quasi-convex iif for any tt\in\mathbb{R}, the set {ft}{xE|f(x)t}\{f\leq t\}\equiv\{x\in E\ |\ f(x)\leq t\} is convex.

The theory of concentration of measure becomes relevant only when dimensions get big. In the cases under study in this paper, the dimension is either given by the number of entries, either by the number of columns nn of random matrices - the number of rows pp is then understood to depend on nn, we will sometimes note p=pnp=p_{n}. We follow then the approach with Levy families [Lé51] whose aim is to track the concentration speed through dimensionality. Therefore, we do not talk about a static concentration of a vector but about the concentration of a sequence of random vectors as seen in the definition below. In this paper, EnE_{n} will either be n\mathbb{R}^{n}, pn\mathbb{R}^{p_{n}} n\mathcal{M}_{n}, pn\mathcal{M}_{p_{n}} or pn,n\mathcal{M}_{p_{n},n}.

There will generally be three possibilities for the norms defining the Lipschitz character of the concentrated observations. Talagrand Theorem gives the concentration for the euclidean norm - i.e. the Frobenius norm for matrices - but we will see that some concentrations are expressed with the nuclear norm (the dual norm of the spectral norm). Given two integers l,ml,m\in\mathbb{N}, the euclidean norm on l\mathbb{R}^{l} is noted \|\cdot\|, the spectral, Frobenius and nuclear norm are respectively defined for any Ml,mM\in\mathcal{M}_{l,m} with the expressions:

M=supxmMx;\displaystyle\|M\|=\sup_{x\in\mathbb{R}^{m}}\|Mx\|; MF=Tr(MMT);\displaystyle\|M\|_{F}=\sqrt{\operatorname{Tr}(MM^{T})}; M=Tr(MMT).\displaystyle\|M\|_{*}=\operatorname{Tr}\left(\sqrt{MM^{T}}\right).
Definition 1.2.

Given a sequence of normed vector spaces (En,n)n0(E_{n},\|\cdot\|_{n})_{n\geq 0}, a sequence of random vectors (Zn)n0n0En(Z_{n})_{n\geq 0}\in\prod_{n\geq 0}E_{n}, a sequence of positive reals (σn)n0+(\sigma_{n})_{n\geq 0}\in\mathbb{R}_{+}^{\mathbb{N}}, we say that Z=(Zn)n1Z=(Z_{n})_{n\geq 1} is convexly concentrated with an observable diameter of order O(σn)O(\sigma_{n}) iff there exist two positive constants C,c>0C,c>0 such that n\forall n\in\mathbb{N} and for any 11-Lipschitz and quasi-convex function f:Enf:E_{n}\rightarrow\mathbb{R} (for the norms n\|\cdot\|_{n})666In this inequality, one could have replaced the term “𝔼[f(Zn)]\mathbb{E}[f(Z_{n})]” by “f(Zn)f(Z_{n}^{\prime})” (with ZnZ_{n}^{\prime}, an independent copy of ZnZ_{n}) or by “mfm_{f}” (with mfm_{f} a median of f(Zn)f(Z_{n})). All those three definitions are equivalent.,

t>0:\displaystyle\forall t>0: (|f(Zn)𝔼[f(Zn)]|t)Cec(t/σn)2,\displaystyle\mathbb{P}\left(\left|f(Z_{n})-\mathbb{E}[f(Z_{n})]\right|\geq t\right)\leq Ce^{-c(t/\sigma_{n})^{2}},

We write in that case777The index 22 in “2\mathcal{E}_{2}” is here a reference to the power of tt in the concentration bound Cec(t/σn)2Ce^{-c(t/\sigma_{n})^{2}}, we will see some example where this exponent is 11, in particular in the Hanson-Wright Theorem 3.5 where we will let appear a notation “1\mathcal{E}_{1}”. Znc2(σn)Z_{n}\propto_{c}\mathcal{E}_{2}(\sigma_{n}) (or more simply Zc2(σ)Z\propto_{c}\mathcal{E}_{2}(\sigma)).

The Theorem of Talagrand then writes:

Theorem 1.3 ([Tal95]).

A (sequence of) random vector Z[0,1]nZ\in[0,1]^{n} with independent entries satisfies Zc2Z\propto_{c}\mathcal{E}_{2}.

Convex concentration is preserved through affine transformations (as for the class of linearly concentrated vectors). Given two vector spaces, EE and FF, we note 𝒜(E,F)\mathcal{A}(E,F) the set of affine transformation from EE to FF, and given ϕ𝒜(E,F)\phi\in\mathcal{A}(E,F), we decompose ϕ=(ϕ)+ϕ(0)\phi=\mathcal{L}(\phi)+\phi(0), where (ϕ)\mathcal{L}(\phi) is the linear part of ϕ\phi and ϕ(0)\phi(0) is the translation part. When E=FE=F, 𝒜(E,F)\mathcal{A}(E,F) is simply noted 𝒜(E)\mathcal{A}(E).

Proposition 1.4.

Given two normed vector spaces (E,)(E,\|\cdot\|) and (F,)(F,\|\cdot\|^{\prime}), a random vector ZEZ\in E and an affine mapping ϕ𝒜(E,F)\phi\in\mathcal{A}(E,F) such that (ϕ)λ\|\mathcal{L}(\phi)\|\leq\lambda:

Zc2(σ)\displaystyle Z\propto_{c}\mathcal{E}_{2}(\sigma) \displaystyle\Longrightarrow ϕ(Z)c2(λσ).\displaystyle\phi(Z)\propto_{c}\mathcal{E}_{2}(\lambda\sigma).

We pursue our presentation with the introduction of the linear concentration. It is the “minimal” hypothesis necessary on a random vector XX to be able to bound quantities of the form 𝔼[X𝔼[X]]\mathbb{E}[\|X-\mathbb{E}[X]\|], as it has been explained in [LC19]. Here we will need its stability towards the sum when we will express QzQ^{z} as an infinite series.

Definition 1.5 (Linearly concentrated vectors).

Given a sequence of normed vector spaces (En,n)n0(E_{n},\|\cdot\|_{n})_{n\geq 0}, a sequence of random vectors (Zn)n0n0En(Z_{n})_{n\geq 0}\in\prod_{n\geq 0}E_{n}, a sequence of deterministic vectors (Z~n)n0n0En(\tilde{Z}_{n})_{n\geq 0}\in\prod_{n\geq 0}E_{n}, a sequence of positive reals (σn)n0+(\sigma_{n})_{n\geq 0}\in\mathbb{R}_{+}^{\mathbb{N}}, ZnZ_{n} is said to be linearly concentrated around the deterministic equivalent Z~n\tilde{Z}_{n} with an observable diameter of order O(σn)O(\sigma_{n}) iff there exist two constants c,C>0c,C>0 such that n\forall n\in\mathbb{N} and for any unit-normed linear form fEnf\in E_{n}^{\prime} (n\forall n\in\mathbb{N}, xE\forall x\in E: |f(x)|xn|f(x)|\leq\|x\|_{n}):

t>0:\displaystyle\forall t>0: (|f(Zn)f(Z~n)|t)Cec(t/σn)2.\displaystyle\mathbb{P}\left(\left|f(Z_{n})-f(\tilde{Z}_{n})\right|\geq t\right)\leq Ce^{-c(t/\sigma_{n})^{2}}.

When the property holds, we write ZZ~±2(σ)Z\in\tilde{Z}\pm\mathcal{E}_{2}(\sigma). If it is unnecessary to mention the deterministic equivalent, we will simply write Z2(σ)Z\in\mathcal{E}_{2}(\sigma); and if we just need to control the order of the norm of the deterministic equivalent, we can write ZO(θ)±2(σ)Z\in O(\theta)\pm\mathcal{E}_{2}(\sigma) when Z~nnO(θn)\|\tilde{Z}_{n}\|_{n}\leq O(\theta_{n}).

In the literature [BLM13], those vectors are commonly called sub-Gaussian random vectors.

The notions of linear concentration, convex concentration (and Lipschitz concentration) are equivalent for random variables and we have this important characterization with the moments:

Proposition 1.6 ([Led05], Proposition 1.8., [LC19], Lemma 1.22.).

Given a sequence of random variables ZnZ_{n}\in\mathbb{R} and a sequence of positive parameters σn>0\sigma_{n}>0, we have the equivalence:

Znc2(σn)\displaystyle Z_{n}\propto_{c}\mathcal{E}_{2}(\sigma_{n}) Zn𝔼[Zn]±2(σn)\displaystyle\Longleftrightarrow Z_{n}\in\mathbb{E}[Z_{n}]\pm\mathcal{E}_{2}(\sigma_{n})
C>0|n,m:𝔼[|Zn𝔼[Zn]|n]Cmm2σnm\displaystyle\Longleftrightarrow\exists C>0\ |\ \forall n,m\in\mathbb{N}:\mathbb{E}\left[\left|Z_{n}-\mathbb{E}[Z_{n}]\right|^{n}\right]\leq Cm^{\frac{m}{2}}\sigma_{n}^{m}
C>0|n,r>0:𝔼[|Zn𝔼[Zn]|r]Crr2σnr.\displaystyle\Longleftrightarrow\exists C>0\ |\ \forall n\in\mathbb{N},\forall r>0:\mathbb{E}\left[\left|Z_{n}-\mathbb{E}[Z_{n}]\right|^{r}\right]\leq Cr^{\frac{r}{2}}\sigma_{n}^{r}.

We end with a simple lemma that allows us to state that "every deterministic vector at a distance smaller than the observable diameter to a deterministic equivalent is also a deterministic equivalent".

Lemma 1.7 ([LC19], Lemma 2.6.).

Given a sequence of random vectors ZnEnZ_{n}\in E_{n} and two sequence of deterministic random vector Z~n,Z~nEn\tilde{Z}_{n},\tilde{Z}^{\prime}_{n}\in E_{n}, if Z~nZ~nO(σn)\|\tilde{Z}_{n}-\tilde{Z}^{\prime}_{n}\|\leq O(\sigma_{n}), then:

ZZ~±2(σ)\displaystyle Z\in\tilde{Z}\pm\mathcal{E}_{2}(\sigma) \displaystyle\Longleftrightarrow ZZ~±2(σ).\displaystyle Z\in\tilde{Z}^{\prime}\pm\mathcal{E}_{2}(\sigma).

2 Linear concentration through sums and integrals

Independence is known to be a key elements to most of concentration inequalities. However, linear concentration behaves particularly well for the concatenation of random vectors whose dependence can not be disentangled.

The next proposition sets that the observable diameter for the \ell^{\infty} norm remains unchanged through concatenation. Given a product E1imEiE\equiv\prod_{1\leq i\leq m}E_{i}, where (E1,),,(Em,)(E_{1},\|\cdot\|_{\infty}),\ldots,(E_{m},\|\cdot\|_{\infty}) are mm normed vector spaces we define the \ell^{\infty} norm on EE with the following identity:

(z1,,zm)E:(z1,,zm)=sup1imzii.\displaystyle(z_{1},\ldots,z_{m})\in E:\ \|(z_{1},\ldots,z_{m})\|_{\ell^{\infty}}=\sup_{1\leq i\leq m}\|z_{i}\|_{i}. (5)
Proposition 2.1.

Given two sequences mm\in\mathbb{N}^{\mathbb{N}} and σ+\sigma\in\mathbb{R}_{+}^{\mathbb{N}}, a constant qq, mm sequences of normed vector spaces (Ei,i)1im(E_{i},\|\cdot\|_{i})_{1\leq i\leq m}, mm sequences of deterministic vectors Z~1E1,,Z~mEm\tilde{Z}_{1}\in E_{1},\ldots,\tilde{Z}_{m}\in E_{m}, and mm sequences of random vectors Z1E1,,ZmEmZ_{1}\in E_{1},\ldots,Z_{m}\in E_{m} (possibly dependent) satisfying, for any i{1,,m}i\in\{1,\ldots,m\}, ZiZ~i±2(σ)Z_{i}\in\tilde{Z}_{i}\pm\mathcal{E}_{2}(\sigma), we have the concentration :

(Z1,,Zm)(Z~1,,Z~m)±2(σ),in (E,).\displaystyle(Z_{1},\ldots,Z_{m})\in(\tilde{Z}_{1},\ldots,\tilde{Z}_{m})\pm\mathcal{E}_{2}(\sigma),\ \ \text{in }(E,\|\cdot\|_{\ell^{\infty}}).

In other word, the linear observable diameter of (Z1,,Zm)(Z_{1},\ldots,Z_{m}) can not be bigger than the observable diameter of (Z,,Z)(Z,\ldots,Z), where ZZ is chosen as the worse possible random vector satisfying the hypotheses of Z1,,ZmZ_{1},\ldots,Z_{m}.

Remark 2.2.

Example 2.27. in [LC19] shows that this stability towards concatenation is not true for Lipschitz and convex concentration.

Proof 2.3.

Let us consider a linear function u:Eu:E\rightarrow\mathbb{R}, such that

usupz1|u(z)|1.\left\|u\right\|_{\infty}\equiv\sup_{\left\|z\right\|_{\infty}\leq 1}|u(z)|\leq 1.

Given i[m]i\in[m], let us note ui:Eiu_{i}:E_{i}\rightarrow\mathbb{R} the function defined as ui(z)=u((0,,0,z,0,,0))u_{i}(z)=u((0,\ldots,0,z,0,\ldots,0)) (where zz is in the ithi^{\text{th}} entry). For any zEz\in E, one can write :

u(z)=i=1mniui(zi),\displaystyle u(z)=\sum_{i=1}^{m}n_{i}u^{\prime}_{i}(z_{i}),

where niui=supzi1ui(z)n_{i}\equiv\|u_{i}\|=\sup_{\|z\|_{i}\leq 1}u_{i}(z) and ui=ui/niu^{\prime}_{i}=u_{i}/n_{i} (ui=1\|u^{\prime}_{i}\|=1). We have the inequality :

i=1mni=i=1mnisupzii1ui(zi)=supz1u(z)1.\sum_{i=1}^{m}n_{i}=\sum_{i=1}^{m}n_{i}\sup_{\|z_{i}\|_{i}\leq 1}u_{i}^{\prime}(z_{i})=\sup_{\|z\|_{\infty}\leq 1}u(z)\leq 1.

With this bound at hand, we plan to employ the characterization with the centered moments. Let us conclude thanks to Proposition 1.6 and the convexity of ttlt\mapsto t^{l}, for any l1l\geq 1:

𝔼[|u(Z)u(Z~)|l]\displaystyle\mathbb{E}\left[\left|u(Z)-u(\tilde{Z})\right|^{l}\right] 𝔼[(i=1mni|ui(Zi)ui(Z~i)|)l]\displaystyle\leq\mathbb{E}\left[\left(\sum_{i=1}^{m}n_{i}\left|u_{i}^{\prime}\left(Z_{i}\right)-u^{\prime}_{i}\left(\tilde{Z}_{i}\right)\right|\right)^{l}\right]
(i=1mni)l𝔼[i=1mnii=1mni|ui(Zi)ui(Z~i)|l]\displaystyle\leq\left(\sum_{i=1}^{m}n_{i}\right)^{l}\mathbb{E}\left[\sum_{i=1}^{m}\frac{n_{i}}{\sum_{i=1}^{m}n_{i}}\left|u_{i}^{\prime}\left(Z_{i}\right)-u^{\prime}_{i}\left(\tilde{Z}_{i}\right)\right|^{l}\right]
supl[m]𝔼[|ui(Zi)ui(Z~i)|l]Cll2σl.\displaystyle\leq\sup_{l\in[m]}\mathbb{E}\left[\left|u_{i}^{\prime}\left(Z_{i}\right)-u^{\prime}_{i}\left(\tilde{Z}_{i}\right)\right|^{l}\right]\ \ \leq\ Cl^{\frac{l}{2}}\sigma^{l}.

If we want to consider the concatenation of vectors with different observable diameter, it is more convenient to look at the concentration in a space (i=1mEi,r)(\prod_{i=1}^{m}E_{i},\ell^{r}), for any given r>0r>0, where, for any (z1,,zm)i=1mEi(z_{1},\ldots,z_{m})\in\prod_{i=1}^{m}E_{i}:

(z1,,zm)r=(i=1mziir)1/r.\displaystyle\left\|(z_{1},\ldots,z_{m})\right\|_{\ell^{r}}=\left(\sum_{i=1}^{m}\|z_{i}\|_{i}^{r}\right)^{1/r}.
Corollary 1

Given two constants q,r>0q,r>0, mm\in\mathbb{N}^{\mathbb{N}}, σ1,,σm(+)m\sigma_{1},\ldots,\sigma_{m}\in(\mathbb{R}_{+}^{\mathbb{N}})^{m}, mm sequences of (Ei,i)1im(E_{i},\|\cdot\|_{i})_{1\leq i\leq m}, mm sequences of deterministic vectors Z~1E1,,Z~mEm\tilde{Z}_{1}\in E_{1},\ldots,\tilde{Z}_{m}\in E_{m}, and mm sequences of random vectors Z1E1,,ZmEmZ_{1}\in E_{1},\ldots,Z_{m}\in E_{m} (possibly dependent) satisfying, for any i{1,,m}i\in\{1,\ldots,m\}, ZiZ~i±2(σi)Z_{i}\in\tilde{Z}_{i}\pm\mathcal{E}_{2}(\sigma_{i}), we have the concentration :

(Z1,,Zm)(Z~1,,Z~m)±2(σr),in (E,r),\displaystyle(Z_{1},\ldots,Z_{m})\in(\tilde{Z}_{1},\ldots,\tilde{Z}_{m})\pm\mathcal{E}_{2}(\|\sigma\|_{r}),\ \ \text{in }(E,\|\cdot\|_{\ell^{r}}),
Remark 2.4.

When E1==Em=EE_{1}=\cdots=E_{m}=E in the setting of Corollary 1, then for any vector a=(a1,,am)+ma=(a_{1},\ldots,a_{m})\in\mathbb{R}_{+}^{m}, we know that:

i=1maiZii=1maiZ~i±2(|a|Tσ),\displaystyle\sum_{i=1}^{m}a_{i}Z_{i}\in\sum_{i=1}^{m}a_{i}\tilde{Z}_{i}\pm\mathcal{E}_{2}(|a|^{T}\sigma),

where |a|=(|a1|,,|am|)+m|a|=(|a_{1}|,\ldots,|a_{m}|)\in\mathbb{R}_{+}^{m}

Proof 2.5.

We already know from Proposition 2.1 that:

(Z1σ1,,Zmσm)(Z~1σ1,,Z~mσm)±2,in (E,).\displaystyle\left(\frac{Z_{1}}{\sigma_{1}},\ldots,\frac{Z_{m}}{\sigma_{m}}\right)\in\left(\frac{\tilde{Z}_{1}}{\sigma_{1}},\ldots,\frac{\tilde{Z}_{m}}{\sigma_{m}}\right)\pm\mathcal{E}_{2},\ \ \text{in }(E,\|\cdot\|_{\ell^{\infty}}).

Let us then consider the linear mapping:

ϕ:(E,)(E,r)(z1,,zm)(σ1z1,,σmzm),\displaystyle\phi:\begin{aligned} (E,\|\cdot\|_{\ell^{\infty}})\ &&\longrightarrow&&(E,\|\cdot\|_{\ell^{r}})\hskip 17.07182pt\\ (z_{1},\ldots,z_{m})&&\longmapsto&&(\sigma_{1}z_{1},\ldots,\sigma_{m}z_{m}),\end{aligned}

the Lipschitz character of ϕ\phi is clearly σr=(i=1mσir)1/r\|\sigma\|_{r}=(\sum_{i=1}^{m}\sigma_{i}^{r})^{1/r}, and we can deduce the concentration of Z=ϕ(σ1Z1,,σmZm)Z=\phi(\sigma_{1}Z_{1},\ldots,\sigma_{m}Z_{m}).

Corollary 1 is very useful to set the concentration of infinite series of concentrated random variables. This is settled thanks to an elementary result of [LC19] that sets that the observable diameter of a limit of random vectors is equal to the limit of the observable vectors. Be careful that rigorously, there are two indexes, nn coming from Definition 1.5 that only describes the concentration of sequences of random vectors, and mm particular to this lemma that will tend to infinity. For clarity, we do not mention the index nn.

Lemma 2.6 ([LC19], Proposition 1.12.).

Given a sequence of random vectors (Zm)mE(Z_{m})_{m\in\mathbb{N}}\in E^{\mathbb{N}}, a sequence of positive reals (σm)m+(\sigma_{m})_{m\in\mathbb{N}}\in\mathbb{R}^{\mathbb{N}}_{+} and a sequence of deterministic vectors (Z~m)mE(\tilde{Z}_{m})_{m\in\mathbb{N}}\in E^{\mathbb{N}} such that:

ZmZ~m±2(σm),\displaystyle Z_{m}\in\tilde{Z}_{m}\pm\mathcal{E}_{2}(\sigma_{m}),

if we assume that (Zm)m(Z_{m})_{m\in\mathbb{N}} converges in law888For any nn\in\mathbb{N}, for any bounded continuous mapping f:m0Epf:\prod_{m\geq 0}E_{p}\to\mathbb{R}^{\mathbb{N}}: supn|𝔼[f(Zn,m)𝔼[f(Zn,)]|m0\displaystyle\sup_{n\in\mathbb{N}}\left|\mathbb{E}[f(Z_{n,m})-\mathbb{E}[f(Z_{n,\infty})]\right|\underset{m\to\infty}{\longrightarrow}0 when mm tends to infinity to a random vector (Z)E(Z_{\infty})\in E, that σmnσ\sigma_{m}\underset{n\to\infty}{\longrightarrow}\sigma_{\infty} and that Z~mnZ~\tilde{Z}_{m}\underset{n\to\infty}{\longrightarrow}\tilde{Z}_{\infty}, then:

ZZ~±2(σ).\displaystyle Z_{\infty}\in\tilde{Z}_{\infty}\pm\mathcal{E}_{2}(\sigma_{\infty}).

(The result also holds for Lipschitz and convex concentration)

Corollary 2

Given two constants q,r>0q,r>0, σ1,,σm(+)\sigma_{1},\ldots,\sigma_{m}\ldots\in(\mathbb{R}_{+}^{\mathbb{N}})^{\mathbb{N}}, a (sequences of) normed vector spaces (E,)(E,\|\cdot\|), Z~1,Z~m,E\tilde{Z}_{1}\ldots,\tilde{Z}_{m},\ldots\in E^{\mathbb{N}} deterministic, and Z1,Zm,EZ_{1}\ldots,Z_{m},\ldots\in E^{\mathbb{N}} random (possibly dependent) satisfying, for any nn\in\mathbb{N}, ZmZ~m±2(σm)Z_{m}\in\tilde{Z}_{m}\pm\mathcal{E}_{2}(\sigma_{m}). If we assume that ZnZmZ\equiv\sum_{n\in\mathbb{N}}Z_{m} is pointwise convergent999For any wΩw\in\Omega, mZm(w)\sum_{m\in\mathbb{N}}\|Z_{m}(w)\|\leq\infty and we define Z(w)mZm(w)Z(w)\equiv\sum_{m\in\mathbb{N}}Z_{m}(w), that mZ~m\sum_{m\in\mathbb{N}}\tilde{Z}_{m} is well defined and that nσi\sum_{n\in\mathbb{N}}\sigma_{i}\leq\infty, then we have the concentration :

mZmmZ~m±2(mσm),in (E,),\displaystyle\sum_{m\in\mathbb{N}}Z_{m}\in\sum_{m\in\mathbb{N}}\tilde{Z}_{m}\pm\mathcal{E}_{2}\left(\sum_{m\in\mathbb{N}}\sigma_{m}\right),\ \ \text{in }(E,\|\cdot\|),
Proof 2.7.

We already know from Corollary 1 that for all mm\in\mathbb{N}:

m=1MZmm=1MZ~m±2(mσm),in (E,).\displaystyle\sum_{m=1}^{M}Z_{m}\in\sum_{m=1}^{M}\tilde{Z}_{m}\pm\mathcal{E}_{2}\left(\sum_{m\in\mathbb{N}}\sigma_{m}\right),\ \ \text{in }(E,\|\cdot\|).

Thus in order to employ Lemma 2.6 let us note that for any bounded continuous mapping f:Ef:E\to\mathbb{R}, the dominated convergence theorem allows us to set that:

𝔼[f(m=1MZm)]M𝔼[f(m=1Zm)],\displaystyle\mathbb{E}\left[f\left(\sum_{m=1}^{M}Z_{m}\right)\right]\underset{M\to\infty}{\longrightarrow}\mathbb{E}\left[f\left(\sum_{m=1}^{\infty}Z_{m}\right)\right],

thus (m=1MZm)N(\sum_{m=1}^{M}Z_{m})_{N\in\mathbb{N}} converges in law to m=1Zm\sum_{m=1}^{\infty}Z_{m}, which allows us to set the result of the corollary.

The concentration of infinite series directly implies the concentration of resolvents and other related operators (like (InX/p)1Xk(I_{n}-X/\sqrt{p})^{-1}X^{k} for instance).

Corollary 3

Given a (sequence of) vector space (E,)(E,\|\cdot\|), let ϕ𝒜(E)\phi\in\mathcal{A}(E) be a (sequence of) random affine mapping such that there exists a constant ε>0\varepsilon>0 satisfying (ϕ)1ε\left\|\mathcal{L}(\phi)\right\|\leq 1-\varepsilon and a (sequences of) integers σ>0\sigma>0 satisfying for all (sequence of) integer kk:

(ϕ)k(ϕ(0))2(σ(1ε)k)in(E,)\displaystyle\mathcal{L}(\phi)^{k}(\phi(0))\in\mathcal{E}_{2}\left(\sigma(1-\varepsilon)^{k}\right)\ \ \text{in}\ \ (E,\|\cdot\|)

Then the random equation

Y=ϕ(Y)\displaystyle Y=\phi(Y)

admits a unique solution Y=(IdE(ϕ))1ϕ(0)Y=(Id_{E}-\mathcal{L}(\phi))^{-1}\phi(0) satisfying the linear concentration:

Y2(σ).\displaystyle Y\in\mathcal{E}_{2}(\sigma).

In practical examples, (ϕ)\|\mathcal{L}(\phi)\| is rarely bounded by 1ε1-\varepsilon for all drawings of ϕ\phi and to obtain the concentration of (ϕ)k\mathcal{L}(\phi)^{k} with an observable diameter of order σ(1ε)k\sigma(1-\varepsilon)^{k}, one needs to place oneself on an event 𝒜ϕ\mathcal{A}_{\phi} satisfying 𝒜ϕ{(ϕ)1ε}\mathcal{A}_{\phi}\subset\{\left\|\mathcal{L}(\phi)\right\|\leq 1-\varepsilon\}. Then, thanks to a simple adaptation of Lemma 4.2 below to the case of linear concentration, we have the concentration (Y|𝒜ϕ)2(σ)(Y\ |\ \mathcal{A}_{\phi})\in\mathcal{E}_{2}(\sigma). When 𝔼[(ϕ)]12ε\mathbb{E}[\|\mathcal{L}(\phi)\|]\leq 1-2\varepsilon for εO(1)\varepsilon\geq O(1) and ϕ\phi is sufficiently concentrated, it is generally possible to chose an event 𝒜ϕ\mathcal{A}_{\phi} of overwhelming probability.

As it will be seen in Subsection 4, this corollary finds its relevancy under convex concentration hypotheses, where the linear concentration seems to be the best concentration property to obtain on the resolvent Qz=(zIp1nXXT)1Q^{z}=(zI_{p}-\frac{1}{n}XX^{T})^{-1}.

Proof 2.8.

By contractivity of ϕ\phi, YY is well defined and expresses:

Y=(IdE(ϕ))1ϕ(0)=k=0(ϕ)kϕ(0).\displaystyle Y=(Id_{E}-\mathcal{L}(\phi))^{-1}\phi(0)=\sum_{k=0}^{\infty}\mathcal{L}(\phi)^{k}\phi(0).

One can then conclude with Corollary 2 that Y2(σ/ε)=2(σ)Y\in\mathcal{E}_{2}(\sigma/\varepsilon)=\mathcal{E}_{2}(\sigma).

In order to satisfy the hypothesis of Corollary 3, but also for independent interest, we are now going to express the concentration of the product of convexly concentrated random matrices.

3 Degeneracy of convex concentration through product

Given two convexly concentrated random vectors X,YEX,Y\in E satisfying X,Yc2(σ)X,Y\propto_{c}\mathcal{E}_{2}(\sigma), the convex concentration of the couple (X,Y)c2(σ)(X,Y)\propto_{c}\mathcal{E}_{2}(\sigma) is ensured if:

  1. 1.

    XX and YY are independent

  2. 2.

    (X,Y)=u(Z)(X,Y)=u(Z) with uu affine and Zc2(σ)Z\propto_{c}\mathcal{E}_{2}(\sigma).

We can then in particular state the concentration of X+YX+Y as it is a linear transformation of (X,Y)(X,Y). For the product it is not as simple as for the Lipschitz concentration, let us first consider the particular case of the entry-wise product in E=pE=\mathbb{R}^{p}. Since this result is not important for the rest of the paper, we left its proof in A.

Theorem 3.1.

Given a (sequences of) integer mm\in\mathbb{N}^{\mathbb{N}} and a (sequence of) positive number σ>0\sigma>0 such that mO(p)m\leq O(p), a (sequence of) mm random vectors X1,,XmpX_{1},\ldots,X_{m}\in\mathbb{R}^{p}, if we suppose that

X(X1,,Xm)c2(σ)\displaystyle X\equiv(X_{1},\ldots,X_{m})\propto_{c}\mathcal{E}_{2}(\sigma) in ((p)m,),\displaystyle\text{ in }\left((\mathbb{R}^{p})^{m},\|\cdot\|_{\ell^{\infty}}\right),

(with the notation \|\cdot\|_{\ell^{\infty}} defined in (5)) and that there exists a (sequence of) positive numbers κ>0\kappa>0 such that i[m]:Xiκ\forall i\in[m]:\|X_{i}\|_{\infty}\leq\kappa, then:

X1Xm2((2eκ)m1σ)in (p,).\displaystyle X_{1}\odot\cdots\odot X_{m}\in\mathcal{E}_{2}\left((2e\kappa)^{m-1}\sigma\right)\ \ \ \text{in }\ (\mathbb{R}^{p},\|\cdot\|).

And if X1==Xm=XX_{1}=\cdots=X_{m}=X, the constant 2e2e is no more needed and we get the concentration Xm2(κm1σ)X^{\odot m}\in\mathcal{E}_{2}\left(\kappa^{m-1}\sigma\right).

Remark 3.2.

If we replace the strong assumption i[m]:Xiκ\forall i\in[m]:\|X_{i}\|_{\infty}\leq\kappa, with the bound sup1im𝔼[Xi]O((logp)1/q)\sup_{1\leq i\leq m}\|\mathbb{E}[X_{i}]\|_{\infty}\leq O((\log p)^{1/q}) we can still deduce a similar result to [LC21a, Example 4.], stating the existence of a constant κO(1)\kappa\leq O(1) such that:

X1Xm2((κσ)m(log(p))(m1)/q)+q/m((κσ)m)in (p,).\displaystyle X_{1}\odot\cdots\odot X_{m}\in\mathcal{E}_{2}\left(\left(\kappa\sigma\right)^{m}(\log(p))^{(m-1)/q}\right)+\mathcal{E}_{q/m}\left(\left(\kappa\sigma\right)^{m}\right)\ \ \ \text{in }\ (\mathbb{R}^{p},\|\cdot\|).

The result of concentration of a product of matrices convexly concentrated was already proven in [MS11] but since their formulation is slightly different, we reprove in A the following result with the formulation required for the study of the resolvent.

Theorem 3.3 ([MS11], Theorem 1).

Let us consider three sequences mm\in\mathbb{N}^{\mathbb{N}} and σ,κ+\sigma,\kappa\in\mathbb{R}_{+}^{\mathbb{N}}, and a sequence of mm random random matrices X1n0,n1,,Xmnm1,nmX_{1}\in\mathcal{M}_{n_{0},n_{1}},\ldots,X_{m}\in\mathcal{M}_{n_{m-1},n_{m}}, satisfying101010The norm F\|\cdot\|_{F} is defined on n0,n1××nm1,nm\mathcal{M}_{n_{0},n_{1}}\times\cdots\times\mathcal{M}_{n_{m-1},n_{m}} by the identity: (M1,,Mp)F=M1F2++MmF2\|(M_{1},\ldots,M_{p})\|_{F}=\sqrt{\|M_{1}\|_{F}^{2}+\cdots+\|M_{m}\|_{F}^{2}} :

(X1,,Xm)c2(σ)(X_{1},\ldots,X_{m})\propto_{c}\mathcal{E}_{2}(\sigma) in (n0,n1××nm1,nm,F),\left(\mathcal{M}_{n_{0},n_{1}}\times\cdots\times\mathcal{M}_{n_{m-1},n_{m}},\|\cdot\|_{F}\right),

In the particular case where X1==XnXX_{1}=\cdots=X_{n}\equiv X, it is sufficient111111Be careful that Xc2(σ)X\propto_{c}\mathcal{E}_{2}(\sigma) does not imply that (X,,X)c2(σ)(X,\ldots,X)\propto_{c}\mathcal{E}_{2}(\sigma), it is only true when (n)m(\mathcal{M}_{n})^{m} is endowed with the norm F,\|\cdot\|_{F,\ell^{\infty}}, satisfying for any M=(M1,,Mm)(n)mM=(M_{1},\ldots,M_{m})\in(\mathcal{M}_{n})^{m}, MF,=sup1imMiF\|M\|_{F,\ell^{\infty}}=\sup_{1\leq i\leq m}\|M_{i}\|_{F} to assume that Xc2(σ)X\propto_{c}\mathcal{E}_{2}(\sigma) in (n,F)(\mathcal{M}_{n},\|\cdot\|_{F}). If there exist a sequence of positive values κ>0\kappa>0 such that i[m],Xiκ\forall i\in[m],\|X_{i}\|\leq\kappa, then the product is concentrated for the nuclear norm:

X1Xm2(κm1σn0++nm)\displaystyle X_{1}\cdots X_{m}\in\mathcal{E}_{2}\left(\kappa^{m-1}\sigma\sqrt{n_{0}+\cdots+n_{m}}\right) in (n0,nm,),\displaystyle\text{ in }\left(\mathcal{M}_{n_{0},n_{m}},\|\cdot\|_{*}\right),

where, for any Mn0,nmM\in\mathcal{M}_{n_{0},n_{m}}, M=Tr(MMT)\|M\|_{*}=\operatorname{Tr}(\sqrt{MM^{T}}) (it is the dual norm of the spectral norm).

Remark 3.4.

The hypothesis Xκ\|X\|\leq\kappa might look quite strong, however in classical settings where X2X\propto\mathcal{E}_{2} and 𝔼[X]O(n)\|\mathbb{E}[X]\|\leq O(\sqrt{n}) it has been shown that there exist three constants C,c,K>0C,c,K>0 such that (XKn)Cecn\mathbb{P}(\|X\|\geq K\sqrt{n})\leq Ce^{-cn}. Placing ourselves on the event 𝒜={XKn}\mathcal{A}=\{\|X\|\leq K\sqrt{n}\}, we can then show from Lemma 4.2 below that:

((X/n)m|𝒜)2(Km1/n)\displaystyle\left((X/\sqrt{n})^{m}\ |\ \mathcal{A}\right)\in\mathcal{E}_{2}\left(K^{m-1}/\sqrt{n}\right) and (Ac)Cecn,\displaystyle\mathbb{P}(A^{c})\leq Ce^{-cn},

(here σ=1/n\sigma=1/\sqrt{n} and κ=K\kappa=K). The same inferences hold for the concentration of (XXT/(n+p))m(XX^{T}/(n+p))^{m}.

We end this section on the concentration of the product of convexly concentrated random vectors with the Hanson-Wright Theorem that will find some use of the estimation of 𝔼[Qz]\mathbb{E}[Q^{z}]. This result was first proven in [Ada15], an alternative proof with our notations is provided in [LC21a, Proposition 8]121212This paper only studies the Lipschitz concentration case, however, since quadratic forms are convex, the arguments stays the same with convex concentration hypotheses..

Theorem 3.5 ([Ada15]).

Given two random matrices X,Yp,nX,Y\in\mathcal{M}_{p,n} such that (X,Y)c2(X,Y)\propto_{c}\mathcal{E}_{2} and 𝔼[X]F,𝔼[Y]FO(1)\|\mathbb{E}[X]\|_{F},\|\mathbb{E}[Y]\|_{F}\leq O(1), for any ApA\in\mathcal{M}_{p}:

YTAX2(AF)+1(A).\displaystyle Y^{T}AX\in\mathcal{E}_{2}(\|A\|_{F})+\mathcal{E}_{1}(\|A\|).

4 Concentration of the resolvent of the sample covariance matrix of convexly concentrated data

4.1 Assumptions on XX and “concentration zone” of the resolvent

Given nn data x1,,xnpx_{1},\ldots,x_{n}\in\mathbb{R}^{p}, to study the eigen behavior of the sample (non centered) covariance matrix 1nXXT\frac{1}{n}XX^{T}, where X=(x1,,xn)p,nX=(x_{1},\ldots,x_{n})\in\mathcal{M}_{p,n}, one classically studies the resolvent Qz=(zIp1nXXT)1Q^{z}=(zI_{p}-\frac{1}{n}XX^{T})^{-1} for the values of zz where it is defined. Let us note the pp eigen values of 1nXXT\frac{1}{n}XX^{T}: λi=σi(1nXXT)\lambda_{i}=\sigma_{i}(\frac{1}{n}XX^{T}), for i[p]i\in[p] ( then λ1λn\lambda_{1}\geq\cdots\geq\lambda_{n}), then the spectral distribution of 1nXXT\frac{1}{n}XX^{T}:

μ=1pi=1pδi\displaystyle\mu=\frac{1}{p}\sum_{i=1}^{p}\delta_{i}

has for Stieltjes transform g:z1pTr(Qz)g:z\mapsto\frac{1}{p}\operatorname{Tr}(Q^{z}).

The present study was already lead in previous papers in the case of Lipschitz concentration of XX [LC21b], or in the case of convex concentration of XX but with negative zz [LC19]. The goal of this section, is manly to present the consequences of Theorem 3.3 and adapt the recent results of [LC21b] on the case of convex concentration. We adopt here classical hypotheses and assume a convex concentration for X=(x1,,xn)X=(x_{1},\ldots,x_{n}). {assumption}[Convergence scheme] p=O(n)p=O(n). {assumption}[Independence] x1,,xnx_{1},\ldots,x_{n} are independent. {assumption}[Concentration] Xc2X\propto_{c}\mathcal{E}_{2}. {assumption}[Bounding condition131313As already done in [LC19] (but with real negative zz), one can obtain the same conclusion assuming that there are a finite number of classes for the distribution of the columns x1,,xnx_{1},\ldots,x_{n} and that supi[n]𝔼[xi]O(n)\sup_{i\in[n]}\|\mathbb{E}[x_{i}]\|\leq O(\sqrt{n})] supi[n]𝔼[xi]O(1)\sup_{i\in[n]}\|\mathbb{E}[x_{i}]\|\leq O(1). When nn gets big, μ\mu distributes along a finite number of bulks. To describe them, let us consider a positive parameter, ε>0\varepsilon>0, that could be chosen arbitrarily small (it will though be chosen independent with nn in most practical cases) and introduce as in [LC21b] the sets:

𝒮={λi}i[p]\displaystyle\mathcal{S}=\left\{\lambda_{i}\right\}_{i\in[p]} 𝒮¯={𝔼[λi]}i[p]\displaystyle\bar{\mathcal{S}}=\left\{\mathbb{E}[\lambda_{i}]\right\}_{i\in[p]} 𝒮¯ε={x,i[n],|xλi|ε}\displaystyle\bar{\mathcal{S}}^{\varepsilon}=\left\{x\in\mathbb{R},\exists i\in[n],|x-\lambda_{i}|\leq\varepsilon\right\}

One can show that νsup𝒮¯=𝔼[λ1]O(1)\nu\equiv\sup\bar{\mathcal{S}}=\mathbb{E}[\lambda_{1}]\leq O(1) and introducing the event:

𝒜ε{i[p],σi(1nXXT)𝒮¯ε/2},\displaystyle\mathcal{A}_{\varepsilon}\equiv\left\{\forall i\in[p],\sigma_{i}\left(\frac{1}{n}XX^{T}\right)\in\bar{\mathcal{S}}^{\varepsilon/2}\right\},

the concentration of σ(X)/n𝔼[σ(X)]±2(1/n)\sigma(X)/\sqrt{n}\in\mathbb{E}[\sigma(X)]\pm\mathcal{E}_{2}(1/\sqrt{n}), allows us to set:141414In [LC21b], the proof is conducted for Lipschitz concentration hypotheses on XX. However, since only the linear concentration of σ(X)\sigma(X) is needed, the justification are the same in a context of convex concentration thanks to Theorem A.7.

Lemma 4.1 ([LC21b], Lemma 3.).

There exist two constants C,c>0C,c>0 such that (𝒜c)Cecnε2\mathbb{P}\left(\mathcal{A}^{c}\right)\leq Ce^{-cn\varepsilon^{2}}.

The following lemma allows us to conduct the concentration study on the highly probable event 𝒜ε\mathcal{A}_{\varepsilon} (when εO(1)\varepsilon\geq O(1)).

Lemma 4.2.

Given a (sequence of) positive numbers σ>0\sigma>0, a (sequence of) random vector ZEZ\in E satisfying Z2(σ)Z\propto\mathcal{E}_{2}(\sigma), and a (sequence of) convex subsets AEA\subset E, if there exists a constant K>0K>0 such that (ZA)K\mathbb{P}(Z\in A)\geq K then:151515There exist two constants C,s>0C,s>0 such that for any (sequence of) 11-Lipschitz and quasi-convex mappings f:Af:A\to\mathbb{R}: t>0:(|f(Z)𝔼[f(Z)|ZA]|t|ZA)Ce(t/cσ)2,\displaystyle\forall t>0:\ \mathbb{P}\left(\left|f(Z)-\mathbb{E}[f(Z)\ |\ Z\in A]\right|\geq t\ |\ Z\in A\right)\leq Ce^{-(t/c\sigma)^{2}}, and similar concentration occur around any median of f(Z)f(Z) or any independent copy of ZZ (under {ZA}\{Z\in A\}).

(Z|ZA)c2(σ).\displaystyle(Z|Z\in A)\propto_{c}\mathcal{E}_{2}(\sigma).
Proof 4.3.

The proof is the same as the one provided in [LC21b, Lemma 2.] except that this time, one needs the additional argument that since S={fmf}S=\{f\leq m_{f}\} (for mfm_{f}, a median of ff) is convex, the mappings zd(z,S)z\mapsto d(z,S) and zd(z,S)z\mapsto-d(z,S) are both quasi-convex thanks to the triangular inequality.

We can deduce from Lemma 4.2 that for all εO(1)\varepsilon\geq O(1), (X|𝒜ε)c2(X\ |\ \mathcal{A}_{\varepsilon})\propto_{c}\mathcal{E}_{2}, and the random matrix (X|𝒜ε)(X\ |\ \mathcal{A}_{\varepsilon}) is far easier to control because (X|𝒜ε)ν+ε2\|(X\ |\ \mathcal{A}_{\varepsilon})\|\leq\nu+\frac{\varepsilon}{2} (we recall that ν𝔼[λ1]\nu\equiv\mathbb{E}[\lambda_{1}]).

4.2 Concentration of the resolvent

Placing ourselves under the event 𝒜ε\mathcal{A}_{\varepsilon}, let us first show that the resolvent Qz(zIp1nXXT)1Q^{z}\equiv(zI_{p}-\frac{1}{n}XX^{T})^{-1} is concentrated if zz has a big enough modulus. Be careful that the following concentration is expressed for the nuclear norm (for any deterministic matrix ApA\in\mathcal{M}_{p} such that AO(1)\|A\|\leq O(1), Tr(AQz)2\operatorname{Tr}(AQ^{z})\in\mathcal{E}_{2}). All the following results are provided under Assumptions 4.1-13. The next proposition is just provided as a first direct application of Theorem 3.3 and Corollary 2, a stronger result is provided in Proposition 4.6.

Proposition 4.4.

Given two parameters ε>0\varepsilon>0 and zz\in\mathbb{C} such that |z|ν+ε|z|\geq\nu+\varepsilon:

(Qz|𝒜ε)2(4ε(ν+ε))\displaystyle(Q^{z}\ |\ \mathcal{A}_{\varepsilon})\in\mathcal{E}_{2}\left(\frac{4}{\varepsilon}(\nu+\varepsilon)\right) in(p,).\displaystyle\text{in}\ \ (\mathcal{M}_{p},\|\cdot\|_{*}).
Proof 4.5.

We know from Lemma 4.2 that (X|𝒜ε)c2(X\ |\ \mathcal{A}_{\varepsilon})\propto_{c}\mathcal{E}_{2} and from Theorem 3.3 that (here κ=ν+ε2O(1)\kappa=\nu+\frac{\varepsilon}{2}\leq O(1), σ=1/n\sigma=1/\sqrt{n} and p=O(n)p=O(n)):

Under 𝒜ε\mathcal{A}_{\varepsilon}: (1nXXT)m2((ν+ε2)mm)\displaystyle\left(\frac{1}{n}XX^{T}\right)^{m}\in\mathcal{E}_{2}\left(\left(\nu+\frac{\varepsilon}{2}\right)^{m}\sqrt{m}\right) in (p,).\displaystyle\text{ in }\left(\mathcal{M}_{p},\|\cdot\|_{*}\right).

Let us then note that (ν+ε2)mm=O((ν+3ε4)m)\left(\nu+\frac{\varepsilon}{2}\right)^{m}\sqrt{m}=O\left(\left(\nu+\frac{3\varepsilon}{4}\right)^{m}\right) and for zz\in\mathbb{C} satisfying our hypotheses: (ν+3ε4)/|z|1ε4(ν+ε)(\nu+\frac{3\varepsilon}{4})/|z|\leq 1-\frac{\varepsilon}{4(\nu+\varepsilon)}. We can then deduce from Corollary 2 that under 𝒜ε\mathcal{A}_{\varepsilon}:

Qz=1z(Ip1znXXT)1=1zi=1(1znXXT)i2(4ε(ν+ε)).\displaystyle Q^{z}=\frac{1}{z}\left(I_{p}-\frac{1}{zn}XX^{T}\right)^{-1}=\frac{1}{z}\sum_{i=1}^{\infty}\left(\frac{1}{zn}XX^{T}\right)^{i}\in\mathcal{E}_{2}\left(\frac{4}{\varepsilon}(\nu+\varepsilon)\right).

Let us now try to study the concentration of QzQ^{z} when zz gets close to the spectrum, for that we now require ε>0\varepsilon>0 to be a constant (εO(1)\varepsilon\geq O(1)).

Proposition 4.6.

Given εO(1)\varepsilon\geq O(1), for all z𝒮¯εz\in\mathbb{C}\setminus\bar{\mathcal{S}}^{\varepsilon}:

(Qz|𝒜ε)2\displaystyle(Q^{z}\ |\ \mathcal{A}_{\varepsilon})\in\mathcal{E}_{2} in (p,),\displaystyle\text{in }\ \ (\mathcal{M}_{p},\|\cdot\|_{*}),

and we recall that there exist two constants C,c>0C,c>0 such that (𝒜εc)Cecn\mathbb{P}\left(\mathcal{A}_{\varepsilon}^{c}\right)\leq Ce^{-cn}.

Proof 4.7.

Proposition 4.4 already set the result for |z|ν+ερ|z|\geq\nu+\varepsilon\equiv\rho, therefore, let us now suppose that |z|ρ|z|\leq\rho.

With the notation |Qz|2((z)2+((z)1nXXT)2)1|Q^{z}|^{2}\equiv\left(\Im(z)^{2}+\left(\Re(z)-\frac{1}{n}XX^{T}\right)^{2}\right)^{-1}, let us decompose:

Qz=((z)1nXXT)|Qz|2(z)|Qz|2.\displaystyle Q^{z}=\left(\Re(z)-\frac{1}{n}XX^{T}\right)|Q^{z}|^{2}-\Im(z)|Q^{z}|^{2}. (6)

We can then deduce the linear concentration of |Qz|2|Q^{z}|^{2} with the same justifications as previously thanks to the Taylor decomposition:

|Qz|2=1ρ2m=0(1(z)2ρ2((z)1nXXT)2ρ2)m.\displaystyle|Q^{z}|^{2}=\frac{1}{\rho^{2}}\sum_{m=0}^{\infty}\left(1-\frac{\Im(z)^{2}}{\rho^{2}}-\frac{\left(\Re(z)-\frac{1}{n}XX^{T}\right)^{2}}{\rho^{2}}\right)^{m}.

Indeed, (z)Ip1nXXTd((z),S)\|\Re(z)I_{p}-\frac{1}{n}XX^{T}\|\leq d(\Re(z),S) and d(z,S)2=(z)2+d((z),S)2ρd(z,S)^{2}=\Im(z)^{2}+d(\Re(z),S)^{2}\leq\rho thus:

1(z)2ρ21ρ2((z)Ip1nXXT)2\displaystyle\left\|1-\frac{\Im(z)^{2}}{\rho^{2}}-\frac{1}{\rho^{2}}\left(\Re(z)I_{p}-\frac{1}{n}XX^{T}\right)^{2}\right\| 1d(z,S)2ρ21ε2ρ2<1.\displaystyle\leq 1-\frac{d(z,S)^{2}}{\rho^{2}}\leq 1-\frac{\varepsilon^{2}}{\rho^{2}}<1.

We therefore deduce from (6) that:

(Qz|𝒜ε)2(2ε2(|(z)|+|(z)|+ν+ε2))=2.\displaystyle(Q^{z}\ |\ \mathcal{A}_{\varepsilon})\in\mathcal{E}_{2}\left(\frac{2}{\varepsilon^{2}}\left(|\Im(z)|+|\Re(z)|+\nu+\frac{\varepsilon}{2}\right)\right)=\mathcal{E}_{2}.

For the sake of completeness, we left in the appendix an alternative laborious proof (but somehow more direct) already presented in [LC19].

4.3 Computable deterministic equivalent

We are going to look for a deterministic equivalent of QQ. We mainly follow the lines of [LC21b], we thus allow ourselves to present the justifications rather succinctly. Although Proposition 4.6 gives us a concentration of QzQ^{z} in nuclear norm, we will provide a deterministic equivalent for the Frobenius norm with a better observable diameter. For any zSεz\in\mathbb{C}\setminus S^{\varepsilon}, let us introduce Λ¯z=(Tr(Σi𝔼[Qz]))i[n]\bar{\Lambda}^{z}=(\operatorname{Tr}(\Sigma_{i}\mathbb{E}[Q^{z}]))_{i\in[n]} and recall that for any δn\delta\in\mathbb{C}^{n}, we note Q~δz=(zIp1ni=1nΣi1δi)z\tilde{Q}_{\delta}^{z}=(zI_{p}-\frac{1}{n}\sum_{i=1}^{n}\frac{\Sigma_{i}}{1-\delta_{i}})^{z}. We have the following first approximation to 𝔼[Qz]\mathbb{E}[Q^{z}]:

Proposition 4.8.

For any z𝒮¯εz\in\mathbb{C}\setminus\bar{\mathcal{S}}^{\varepsilon}:

Q~Λ¯zzO(1)\displaystyle\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|\leq O(1) and 𝔼[Qz]Q~Λ¯zzFO(1n).\displaystyle\left\|\mathbb{E}[Q^{z}]-\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|_{F}\leq O\left(\frac{1}{\sqrt{n}}\right).

To prove this proposition, we will play on the dependence of QzQ^{z} towards xix_{i} with the notation Xi(x1,,xi1,0,xi+1,,xn)p,nX_{-i}\equiv(x_{1},\ldots,x_{i-1},0,x_{i+1},\ldots,x_{n})\in\mathcal{M}_{p,n} and:

Qiz(zIp1nXiXiT)1.\displaystyle Q_{-i}^{z}\equiv\left(zI_{p}-\frac{1}{n}X_{-i}X_{-i}^{T}\right)^{-1}.

To link QzQ^{z} to QizQ^{z}_{-i} we will extensively use a direct application of the Schur identity:

Qzxi=Qizxi11nxiTQizxi.\displaystyle Q^{z}x_{i}=\frac{Q^{z}_{-i}x_{i}}{1-\frac{1}{n}x_{i}^{T}Q^{z}_{-i}x_{i}}. (7)
Proof 4.9.

All the estimations hold under 𝒜ε\mathcal{A}_{\varepsilon}, therefore the expectation should also be taken under 𝒜ε\mathcal{A}_{\varepsilon} to be fully rigorous. Note that if QiQ_{-i} and xix_{i} are independent on the whole universe, they are no more independent under 𝒜ε\mathcal{A}_{\varepsilon}. However, since the probability of 𝒜ε\mathcal{A}_{\varepsilon} is overwhelming, the correction terms are negligible, we thus allow ourselves to abusively expel from this proof the independence and approximation issues related to 𝒜ε\mathcal{A}_{\varepsilon}, a rigorous justification is provided in [LC21b].

Let us bound for any deterministic matrix ApA\in\mathcal{M}_{p} such that AF1\|A\|_{F}\leq 1:

|Tr(A(𝔼[Qz]Q~Λ¯zz))|\displaystyle\left|\operatorname{Tr}\left(A\left(\mathbb{E}[Q^{z}]-\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right)\right)\right| 1ni=1n|𝔼[Tr(A(Qz(Σi1+Λ¯izxixiT)Q~Λ¯zz))]|.\displaystyle\leq\frac{1}{n}\sum_{i=1}^{n}\left|\mathbb{E}\left[\operatorname{Tr}\left(A\left(Q^{z}\left(\frac{\Sigma_{i}}{1+\bar{\Lambda}^{z}_{i}}-x_{i}x_{i}^{T}\right)\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right)\right)\right]\right|.

We can then develop with (7):

|Tr(A(𝔼[Qz]Q~Λ¯zz))|\displaystyle\left|\operatorname{Tr}\left(A\left(\mathbb{E}[Q^{z}]-\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right)\right)\right|
1ni=1n|Tr(A(𝔼[QzQiz]ΣiQ~Λ¯zz))1Λ¯iz|\displaystyle\hskip 28.45274pt\leq\frac{1}{n}\sum_{i=1}^{n}\left|\frac{\operatorname{Tr}\left(A\left(\mathbb{E}\left[Q^{z}-Q_{-i}^{z}\right]\Sigma_{i}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right)\right)}{1-\bar{\Lambda}^{z}_{i}}\right|
+1ni=1n|𝔼[Tr(A(Qiz(Σi1Λ¯izxixiT1+1nxiTQizxi)Q~Λ¯zz))]|\displaystyle\hskip 28.45274pt\hskip 28.45274pt+\frac{1}{n}\sum_{i=1}^{n}\left|\mathbb{E}\left[\operatorname{Tr}\left(A\left(Q_{-i}^{z}\left(\frac{\Sigma_{i}}{1-\bar{\Lambda}^{z}_{i}}-\frac{x_{i}x_{i}^{T}}{1+\frac{1}{n}x_{i}^{T}Q_{-i}^{z}x_{i}}\right)\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right)\right)\right]\right|
1ni=1n|𝔼[xiTQ~Λ¯zzAQzxi1Λ¯iz(1nxitQizxiΛ¯iz)]|+O(Q~Λ¯zzn),\displaystyle\hskip 28.45274pt\leq\frac{1}{n}\sum_{i=1}^{n}\left|\mathbb{E}\left[\frac{x_{i}^{T}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}x_{i}}{1-\bar{\Lambda}^{z}_{i}}\left(\frac{1}{n}x_{i}^{t}Q_{-i}^{z}x_{i}-\bar{\Lambda}^{z}_{i}\right)\right]\right|+O\left(\frac{\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|}{\sqrt{n}}\right),

thanks to Lemma 4.13 and the independence between QizQ_{-i}^{z} and xix_{i}. We can then bound thanks to Hölder inequality and Lemma 4.15 below:

|𝔼[xiTQ~Λ¯zzAQzxi(1nxitQizxiΛ¯iz)]|\displaystyle\left|\mathbb{E}\left[x_{i}^{T}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}x_{i}\left(\frac{1}{n}x_{i}^{t}Q_{-i}^{z}x_{i}-\bar{\Lambda}^{z}_{i}\right)\right]\right|
=|𝔼[(xiTQ~Λ¯zzAQzxi𝔼[xiTQ~Λ¯zzAQzxi])(1nxitQizxi𝔼[1nxitQizxi])]|\displaystyle\hskip 14.22636pt=\left|\mathbb{E}\left[\left(x_{i}^{T}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}x_{i}-\mathbb{E}\left[x_{i}^{T}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}x_{i}\right]\right)\left(\frac{1}{n}x_{i}^{t}Q_{-i}^{z}x_{i}-\mathbb{E}\left[\frac{1}{n}x_{i}^{t}Q_{-i}^{z}x_{i}\right]\right)\right]\right|
𝔼[(xiTAQizxi11nxiTQixi𝔼[xiTAQizxi11nxiTQixi])2]O(1n)\displaystyle\hskip 14.22636pt\leq\sqrt{\mathbb{E}\left[\left(\frac{x_{i}^{T}AQ^{z}_{-i}x_{i}}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}-\mathbb{E}\left[\frac{x_{i}^{T}AQ^{z}_{-i}x_{i}}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}\right]\right)^{2}\right]}O\left(\frac{1}{\sqrt{n}}\right)
O(1n)(𝔼[(xiTQ~Λ¯zzAQizxi𝔼[xiTQ~Λ¯zzAQizxi]11nxiTQixi)2]\displaystyle\hskip 14.22636pt\leq O\left(\frac{1}{\sqrt{n}}\right)\left(\mathbb{E}\left[\left(\frac{x_{i}^{T}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}_{-i}x_{i}-\mathbb{E}[x_{i}^{T}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}_{-i}x_{i}]}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}\right)^{2}\right]\right.
+𝔼[(Tr(ΣiQ~Λ¯zzAQiz)(111nxiTQixi𝔼[111nxiTQixi]))2])1/2\displaystyle\hskip 42.67912pt\left.+\mathbb{E}\left[\left(\operatorname{Tr}\left(\Sigma_{i}\tilde{Q}^{z}_{\bar{\Lambda}^{z}}AQ^{z}_{-i}\right)\left(\frac{1}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}-\mathbb{E}\left[\frac{1}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}\right]\right)\right)^{2}\right]\right)^{1/2}
O(Q~Λ¯zzn).\displaystyle\hskip 14.22636pt\leq O\left(\frac{\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|}{\sqrt{n}}\right).

indeed since we know that |111nxiTQixi|O(1)|\frac{1}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}|\leq O(1) from Lemma 4.10, 111nxiTQixi\frac{1}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}} is a O(1)O(1)-Lipschitz transformation of 1nxiTQixi\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}, therefore, it follows the same concentration inequality (with a variance of order O(1/n)O(1/n)). Since this inequality is true for any ApA\in\mathcal{M}_{p}, we can bound:

Q~Λ¯zzQ~Λ¯zz𝔼[Qz]F+𝔼[Qz]O(Q~Λ¯zzn)+O(1),\displaystyle\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|\leq\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}-\mathbb{E}[Q^{z}]\right\|_{F}+\left\|\mathbb{E}[Q^{z}]\right\|\leq O\left(\frac{\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|}{\sqrt{n}}\right)+O(1),

which directly implies that Q~Λ¯zzO(1)\left\|\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|\leq O(1) and 𝔼[Qz]Q~Λ¯zzFO(1/n)\left\|\mathbb{E}[Q^{z}]-\tilde{Q}^{z}_{\bar{\Lambda}^{z}}\right\|_{F}\leq O(1/\sqrt{n}).

Lemma 4.10 ([LC21b], Lemmas 4., 8. ).

z𝒮¯ε\forall z\in\bar{\mathcal{S}}^{\varepsilon}, under 𝒜ε\mathcal{A}^{\varepsilon}:

Qz2ε\displaystyle\|Q^{z}\|\leq\frac{2}{\varepsilon} and supi[n]|111nxiTQixi|O(1)\displaystyle\sup_{i\in[n]}|\frac{1}{1-\frac{1}{n}x_{i}^{T}Q_{-i}x_{i}}|\leq O(1)
Lemma 4.11.

For any z𝒮¯εz\in\mathbb{C}\setminus\bar{\mathcal{S}}^{\varepsilon}, any i[n]i\in[n] and any upu\in\mathbb{R}^{p} such that u1\|u\|\leq 1:

(uTQizxi|𝒜ε),(uTQzxi|𝒜ε)O(1)±2.\displaystyle(u^{T}Q_{-i}^{z}x_{i}\ |\ \mathcal{A}_{\varepsilon}),(u^{T}Q^{z}x_{i}\ |\ \mathcal{A}_{\varepsilon})\in O(1)\pm\mathcal{E}_{2}.
Proof 4.12.

We do not care about the independence issues brought by 𝒜ε\mathcal{A}_{\varepsilon}. Let us simply bound for any t>0t>0 and under 𝒜ε\mathcal{A}_{\varepsilon}:

(|uTQizxi𝔼[uTQizxi]|t)\displaystyle\mathbb{P}\left(\left|u^{T}Q_{-i}^{z}x_{i}-\mathbb{E}\left[u^{T}Q_{-i}^{z}x_{i}\right]\right|\geq t\right)
(|uTQiz(xiμi)|t2)+(|uT(Qiz𝔼[Qiz])μi|t2)\displaystyle\hskip 28.45274pt\leq\mathbb{P}\left(\left|u^{T}Q_{-i}^{z}(x_{i}-\mu_{i})\right|\geq\frac{t}{2}\right)+\mathbb{P}\left(\left|u^{T}\left(Q_{-i}^{z}-\mathbb{E}\left[Q_{-i}^{z}\right]\right)\mu_{i}\right|\geq\frac{t}{2}\right)
𝔼[Cecnt2/Qi2]+Cecnt2 2Cecnt2,\displaystyle\hskip 28.45274pt\leq\mathbb{E}\left[Ce^{-cnt^{2}/\|Q_{-i}\|^{2}}\right]+Ce^{-cnt^{2}}\ \ \leq\ 2Ce^{-c^{\prime}nt^{2}},

for some constants C,c,c>0C,c,c^{\prime}>0. Besides, we can bound:

|𝔼[uTQizxi]|=|uT𝔼[Qiz]μi|O(1),\displaystyle\left|\mathbb{E}\left[u^{T}Q_{-i}^{z}x_{i}\right]\right|=\left|u^{T}\mathbb{E}[Q_{-i}^{z}]\mu_{i}\right|\leq O(1),

thanks to Lemma 4.10 and Assumption 13.

The concentration of uTQzxiu^{T}Q^{z}x_{i} is a consequence of the concentration QX2QX\in\mathcal{E}_{2} that can be shown thanks to Corollary 2 as in the proof of Proposition 4.6. We are then left to bounding 𝔼[uTQzxi]\mathbb{E}[u^{T}Q^{z}x_{i}]. For this purpose, let us write:

|𝔼[uTQzxi]|\displaystyle\left|\mathbb{E}[u^{T}Q^{z}x_{i}]\right| =|𝔼[uTQizxi]𝔼[(uTQizxi)(1nxiTQzxi)]|\displaystyle=\left|\mathbb{E}[u^{T}Q_{-i}^{z}x_{i}]-\mathbb{E}\left[(u^{T}Q_{-i}^{z}x_{i})\left(\frac{1}{n}x_{i}^{T}Q^{z}x_{i}\right)\right]\right|
O(1)+O(𝔼[(uTQizxi)2]𝔼[(1nxiTQzxi)2])O(1),\displaystyle\leq O(1)+O\left(\sqrt{\mathbb{E}\left[(u^{T}Q_{-i}^{z}x_{i})^{2}\right]\mathbb{E}\left[\left(\frac{1}{n}x_{i}^{T}Q^{z}x_{i}\right)^{2}\right]}\right)\leq O(1),

thanks to Cauchy-Schwarz inequality Lemma 4.10, and the bound on xix_{i}, valid under 𝒜ε\mathcal{A}_{\varepsilon}.

Lemma 4.13.

Under 𝒜ε\mathcal{A}_{\varepsilon}, for any z𝒮¯εz\in\mathbb{C}\setminus\bar{\mathcal{S}}^{\varepsilon} and any i[n]i\in[n]:

𝔼[QzQiz]O(1n).\displaystyle\|\mathbb{E}[Q^{z}-Q^{z}_{-i}]\|\leq O\left(\frac{1}{n}\right).
Proof 4.14.

For any upu\in\mathbb{R}^{p}, we can bound thanks to Lemma 4.11:

|uT𝔼[QzQiz]u|\displaystyle\left|u^{T}\mathbb{E}[Q^{z}-Q^{z}_{-i}]u\right| 1n|𝔼[uTQzxixiTQizu]|\displaystyle\leq\frac{1}{n}\left|\mathbb{E}\left[u^{T}Q^{z}x_{i}x_{i}^{T}Q^{z}_{-i}u\right]\right|
1n𝔼[(uTQzxi)2]𝔼[(xiTQizu)2]O(1n).\displaystyle\leq\frac{1}{n}\sqrt{\mathbb{E}\left[(u^{T}Q^{z}x_{i})^{2}\right]\mathbb{E}\left[(x_{i}^{T}Q^{z}_{-i}u)^{2}\right]}\ \ \leq O\left(\frac{1}{n}\right).
Lemma 4.15.

For any z𝒮¯εz\in\mathbb{C}\setminus\bar{\mathcal{S}}^{\varepsilon}deterministic matrix ApA\in\mathcal{M}_{p}:

(xiTAQizxi|𝒜ε)Tr(ΣiA𝔼[Qz])±2(AF)+1(A).\displaystyle(x_{i}^{T}AQ^{z}_{-i}x_{i}\ |\ \mathcal{A}_{\varepsilon})\in\operatorname{Tr}(\Sigma_{i}A\mathbb{E}[Q^{z}])\pm\mathcal{E}_{2}\left(\|A\|_{F}\right)+\mathcal{E}_{1}(\|A\|).
Proof 4.16.

Once again, without referring to 𝒜ε\mathcal{A}_{\varepsilon}, we assume that XO(1)\|X\|\leq O(1) and QzO(1)\|Q^{z}\|\leq O(1). Given i[n]i\in[n], since we know from Lemma 4.13 that 𝔼[QzQizO(1/n)\|\mathbb{E}[Q^{z}-Q_{-i}^{z}\|\leq O(1/\sqrt{n}), we want to bound:

|xiTAQizxiTr(ΣiA𝔼[Qi])|\displaystyle\left|x_{i}^{T}AQ^{z}_{-i}x_{i}-\operatorname{Tr}\left(\Sigma_{i}A\mathbb{E}\left[Q_{-i}\right]\right)\right| |xiTAQizxiTr(ΣiAQiz)|+|Tr(ΣiA(Qiz𝔼[Qiz]))|.\displaystyle\leq\left|x_{i}^{T}AQ^{z}_{-i}x_{i}-\operatorname{Tr}(\Sigma_{i}AQ^{z}_{-i})\right|+\left|\operatorname{Tr}\left(\Sigma_{i}A(Q^{z}_{-i}-\mathbb{E}[Q^{z}_{-i}])\right)\right|.

Now we know that, for XiX_{-i} fixed, we can bound thanks to Theorem 3.5:

(|xiTAQizxiTr(ΣiTAQiz)|t)\displaystyle\mathbb{P}\left(\left|x_{i}^{T}AQ^{z}_{-i}x_{i}-\operatorname{Tr}(\Sigma_{i}^{T}AQ^{z}_{-i})\right|\geq t\right) 𝔼[Cec(t/QizAF)2+Cect/QizA]\displaystyle\leq\mathbb{E}\left[Ce^{-c(t/\|Q_{-i}^{z}\|\|A\|_{F})^{2}}+Ce^{-ct/\|Q_{-i}^{z}\|\|A\|}\right]
Cect2/AF2+Cect/A,\displaystyle\leq Ce^{-c^{\prime}t^{2}/\|A\|_{F}^{2}}+Ce^{-c^{\prime}t/\|A\|},

for some constants C,c,c>0C,c,c^{\prime}>0, thanks to Lemma 4.10.

Besides, we know from Proposition 4.6 and Lemma 1.7 that Qiz𝔼[Qz]±2(1/n)Q_{-i}^{z}\in\mathbb{E}[Q^{z}]\pm\mathcal{E}_{2}(1/\sqrt{n}) in (p,)(\mathcal{M}_{p},\|\cdot\|_{*}), which allows us to bound:

(|Tr(ΣiAQiz)Tr(ΣiA𝔼[Qz])|t)Cect2/A2,\displaystyle\mathbb{P}\left(\left|\operatorname{Tr}(\Sigma_{i}AQ^{z}_{-i})-\operatorname{Tr}(\Sigma_{i}A\mathbb{E}[Q^{z}])\right|\geq t\right)\leq Ce^{-ct^{2}/\|A\|^{2}},

for some constants C,c>0C,c>0, since ΣiO(1)\|\Sigma_{i}\|\leq O(1). Putting the two concentration inequalities together, we obtain the result of the lemma.

Theorem 0.1 is then a consequence of the following proposition proven in [LC21b] (once Proposition 4.8 is proven, the convex concentration particularities do not intervene anymore). Recall that Λ~zn\tilde{\Lambda}^{z}\in\mathbb{C}^{n} is defined as the unique solution to the equation:

i[n]:Λ~iz=1nTr(ΣiQ~Λ~zz),\displaystyle\forall i\in[n]:\tilde{\Lambda}^{z}_{i}=\frac{1}{n}\operatorname{Tr}\left(\Sigma_{i}\tilde{Q}^{z}_{\tilde{\Lambda}^{z}}\right),

where Q~Λ~zz(zIp1ni=1nΣi1Λ~iz)\tilde{Q}^{z}_{\tilde{\Lambda}^{z}}\equiv\left(zI_{p}-\frac{1}{n}\sum_{i=1}^{n}\frac{\Sigma_{i}}{1-\tilde{\Lambda}_{i}^{z}}\right).

Proposition 4.17.

For all z𝒮¯εz\in\mathbb{C}\setminus\bar{\mathcal{S}}_{\varepsilon}:

𝔼[Q]Q~Λ~zzFO(1n).\displaystyle\left\|\mathbb{E}[Q]-\tilde{Q}^{z}_{\tilde{\Lambda}^{z}}\right\|_{F}\leq O\left(\frac{1}{\sqrt{n}}\right).

Appendix A Proofs of the concentration of products of convexly concentrated random vectors and of convexly concentrated random matrices

We will use several time the following elementary result:

Lemma A.1.

Given a convex mapping f:f:\mathbb{R}\to\mathbb{R}, and a vector a+pa\in\mathbb{R}_{+}^{p}, the mapping F:p(z1,,zp)i=1paif(zi)F:\mathbb{R}^{p}\ni(z_{1},\ldots,z_{p})\mapsto\sum_{i=1}^{p}a_{i}f(z_{i})\in\mathbb{R} is convex (so in particular quasi-convex).

To efficiently manage the concentration rate when multiplying a large number of random vectors, we will also need:

Lemma A.2.

Given mm commutative or non commutative variables a1,,ama_{1},\ldots,a_{m} of a given algebra, we have the identity:

σ𝔖maσ(1)aσ(m)=(1)mI[m](1)|I|(iIai)m,\displaystyle\sum_{\sigma\in\mathfrak{S}_{m}}a_{\sigma(1)}\cdots a_{\sigma(m)}=(-1)^{m}\sum_{I\subset[m]}(-1)^{|I|}\left(\sum_{i\in I}a_{i}\right)^{m},

where |I||I| is the cardinality of II.

Proof A.3.

The idea is to inverse the identity:

(a1++am)m=JI{i1,,im}=Jai1aim,\displaystyle(a_{1}+\cdots+a_{m})^{m}=\sum_{J\subset I}\sum_{\{i_{1},\ldots,i_{m}\}=J}a_{i_{1}}\cdots a_{i_{m}},

thanks to the Rota formula (see [Rol06]) that sets for any mappings f,gf,g defined on the set subsets of \mathbb{N} and having values in a commutative group (for the sum):

I,f(I)=JIg(J)\displaystyle\forall I\subset\mathbb{N},f(I)=\sum_{J\subset I}g(J) \displaystyle\Longleftrightarrow I,g(I)=JIμ𝒫()(J,I)f(J),\displaystyle\forall I\subset\mathbb{N},g(I)=\sum_{J\subset I}\mu_{\mathcal{P}(\mathbb{N})}(J,I)f(J),

where μ𝒫()(J,I)=(1)|IJ|\mu_{\mathcal{P}(\mathbb{N})}(J,I)=(-1)^{|I\setminus J|} is an analog of the Moëbus function for the order relation induced by the inclusions in 𝒫()\mathcal{P}(\mathbb{N}). In our case, for any J[m]J\subset[m], if we set:

f(J)=(iJai)m\displaystyle f(J)=\left(\sum_{i\in J}a_{i}\right)^{m} and g(J)={i1,,im}=Jai1aim,\displaystyle g(J)=\sum_{\{i_{1},\ldots,i_{m}\}=J}a_{i_{1}}\cdots a_{i_{m}},

we see that for any I[m]I\subset[m], f(I)=JIg(J)f(I)=\sum_{J\subset I}g(J), therefore taking the Rota formula in the case I=[m]I=[m], we obtain the result of the Lemma (in that case, μ𝒫()(J,I)=(1)m|J|\mu_{\mathcal{P}(\mathbb{N})}(J,I)=(-1)^{m-|J|} and {i1,,im}=Iai1aim=σ𝔖maσ(1)aσ(m)\sum_{\{i_{1},\ldots,i_{m}\}=I}a_{i_{1}}\cdots a_{i_{m}}=\sum_{\sigma\in\mathfrak{S}_{m}}a_{\sigma(1)}\cdots a_{\sigma(m)}).

Proof A.4 (Proof of Theorem 3.1).

Let us first assume that all the XiX_{i} are equal to a vector ZpZ\in\mathbb{R}^{p}. Considering a=(a1,,ap)pa=(a_{1},\ldots,a_{p})\in\mathbb{R}^{p}, we want to show the concentration of aTZm=i=1paizima^{T}Z^{\odot m}=\sum_{i=1}^{p}a_{i}z_{i}^{m} where z1,,zpz_{1},\ldots,z_{p} are the entries of ZZ.

The mapping pm:xxmp_{m}:x\mapsto x^{m} is not quasi-convex when mm is odd, therefore, in that case we decompose it into the difference of two convex mappings pm(z)=pm+(z)pm(z)p_{m}(z)=p_{m}^{+}(z)-p_{m}^{-}(z) where:

pm+:zmax(zm,0)\displaystyle p_{m}^{+}:z\mapsto\max(z^{m},0) and pm:zmin(zm,0),\displaystyle p^{-}_{m}:z\mapsto-\min(z^{m},0), (8)

(say that, if mm is even, then we set pm+=pmp_{m}^{+}=p_{m} and pm:z0p_{m}^{-}:z\mapsto 0). For the same reasons, we decompose ϕa+:zaTpm+(z)\phi^{+}_{a}:z\mapsto a^{T}p_{m}^{+}(z) and ϕa:zaTpm(z)\phi^{-}_{a}:z\mapsto a^{T}p_{m}^{-}(z) into:

ϕa+=ϕ|a|+ϕ|a|a+\displaystyle\phi^{+}_{a}=\phi^{+}_{|a|}-\phi^{+}_{|a|-a} and ϕa=ϕ|a|ϕ|a|a\displaystyle\phi^{-}_{a}=\phi^{-}_{|a|}-\phi^{-}_{|a|-a}

(for |a|=(|ai|)1ip|a|=(|a_{i}|)_{1\leq i\leq p}), so that:

aTZm=ϕ|a|+(Z)ϕ|a|a+(Z)ϕ|a|(Z)+ϕ|a|a(Z)\displaystyle a^{T}Z^{\odot m}=\phi^{+}_{|a|}(Z)-\phi^{+}_{|a|-a}(Z)-\phi^{-}_{|a|}(Z)+\phi^{-}_{|a|-a}(Z)

becomes a combination of quasi-convex functionals of ZZ. We now need to measure their Lipschitz parameter. Let us bound for any zpz\in\mathbb{R}^{p}:

|ϕ|a|+(z)|=i=1n|ai||zi|mazzm1,\displaystyle\left|\phi^{+}_{|a|}(z)\right|=\sum_{i=1}^{n}|a_{i}||z_{i}|^{m}\leq\|a\|\|z\|\|z\|_{\infty}^{m-1},

and the same holds for ϕ|a|a+\phi^{+}_{|a|-a}, ϕ|a|\phi^{-}_{|a|} and ϕ|a|a\phi^{-}_{|a|-a}. Note then that ϕ|a|+\phi^{+}_{|a|}, ϕ|a|a+\phi^{+}_{|a|-a}, ϕ|a|\phi^{-}_{|a|} and ϕ|a|a\phi^{-}_{|a|-a} are all aκm1\|a\|\kappa^{m-1}-Lipschitz to conclude on the concentration of XmX^{\odot m}.

Now, if we assume that the X1,,XmX_{1},\ldots,X_{m} are different, we employ Lemma A.2 in this commutative case to write (|𝔖m|=m!|\mathfrak{S}_{m}|=m!):

(X1Xm)=(1)mm!I[m](1)|I|(iIXi)m.\displaystyle(X_{1}\odot\cdots\odot X_{m})=\frac{(-1)^{m}}{m!}\sum_{I\subset[m]}(-1)^{|I|}\left(\sum_{i\in I}X_{i}\right)^{\odot m}. (9)

Therefore, the sum (p)Iz1,,zi|I|iIzip(\mathbb{R}^{p})^{I}\ni z_{1},\ldots,z_{i_{|I|}}\mapsto\sum_{i\in I}z_{i}\in\mathbb{R}^{p} being mm-Lipschitz for the norm \|\cdot\|_{\infty}, we know that I[m]\forall I\subset[m], iIXic2(mσ)\sum_{i\in I}X_{i}\propto_{c}\mathcal{E}_{2}(m\sigma), and iIXiκm\|\sum_{i\in I}X_{i}\|_{\infty}\leq\kappa m, therefore, (iIXi)m2(mmκm1σ)(\sum_{i\in I}X_{i})^{\odot m}\in\mathcal{E}_{2}(m^{m}\kappa^{m-1}\sigma). We can then exploit Proposition 2.1 to obtain

((iIXi)m)I[m]2(mmκm1σ)\displaystyle\left(\left(\sum_{i\in I}X_{i}\right)^{\odot m}\right)_{I\subset[m]}\in\mathcal{E}_{2}(m^{m}\kappa^{m-1}\sigma) in ((p)2m,),\displaystyle\text{in }\ \left((\mathbb{R}^{p})^{2^{m}},\|\cdot\|_{\ell^{\infty}}\right),

(note that #{I[m]}=2m\#\{I\subset[m]\}=2^{m}) Thus summing the 2m2^{m} concentration inequalities, we can conclude from Equation (9), and the Stirling formula mmm!=em2πm+O(1)\frac{m^{m}}{m!}=\frac{e^{m}}{\sqrt{2\pi m}}+O(1) that:

(X1Xm)2((2eκ)m1σ).\displaystyle(X_{1}\odot\cdots\odot X_{m})\in\mathcal{E}_{2}\left((2e\kappa)^{m-1}\sigma\right).

For the concentration of the matrix product, we introduce a new notion of concentration, namely the transversal convex concentration. Let us give some definitions.

Definition A.5.

Given a sequence of normed vector spaces (En,n)n0(E_{n},\|\cdot\|_{n})_{n\geq 0}, a sequence of groups (Gn)n0(G_{n})_{n\geq 0}, each GnG_{n} (for nn\in\mathbb{N}) acting on EnE_{n}, a sequence of random vectors (Zn)n0n0En(Z_{n})_{n\geq 0}\in\prod_{n\geq 0}E_{n}, a sequence of positive reals (σn)n0+(\sigma_{n})_{n\geq 0}\in\mathbb{R}_{+}^{\mathbb{N}}, we say that Z=(Zn)n0Z=(Z_{n})_{n\geq 0} is convexly concentrated transversally to the action of GG with an observable diameter of order σ\sigma and we note ZGT2(σ)Z\propto^{T}_{G}\mathcal{E}_{2}(\sigma) iff there exist two constants C,cO(1)C,c\leq O(1) such that n\forall n\in\mathbb{N} and for any 11-Lipschitz, quasi-convex and GG-invariant161616For any gGg\in G and xEx\in E, f(x)=f(gx)f(x)=f(g\cdot x) function f:Enf:E_{n}\to\mathbb{R}, t>0:\forall t>0:171717Once again, we point out that one could have replaced here 𝔼[f(Zn)]\mathbb{E}[f(Z_{n})] by f(Zn)f(Z_{n}^{\prime}) or mfm_{f}.

(|f(Zn)𝔼[f(Zn)]|t)Cec(t/σn)2\mathbb{P}\left(\left|f(Z_{n})-\mathbb{E}[f(Z_{n})]\right|\geq t\right)\leq Ce^{-c(t/\sigma_{n})^{2}}.

Remark A.6.

Given a normed vector space (E,)(E,\|\cdot\|), a group GG acting on EE and a random vector ZEZ\in E, we have the implication chain:

Z2(σ)\displaystyle Z\propto\mathcal{E}_{2}(\sigma) \displaystyle\Longrightarrow Zc2(σ)\displaystyle Z\propto_{c}\mathcal{E}_{2}(\sigma) \displaystyle\Longrightarrow ZGT2(σ).\displaystyle Z\propto_{G}^{T}\mathcal{E}_{2}(\sigma).

Considering the actions:

  • 𝔖n\mathfrak{S}_{n} on p\mathbb{R}^{p} where for σ𝔖n\sigma\in\mathfrak{S}_{n} and xpx\in\mathbb{R}^{p}, σx=(xσ(i))1ip\sigma\cdot x=(x_{\sigma(i)})_{1\leq i\leq p},

  • 𝒪p,n𝒪p×𝒪n\mathcal{O}_{p,n}\equiv\mathcal{O}_{p}\times\mathcal{O}_{n} on p,n\mathcal{M}_{p,n} where for (P,Q)𝒪p,n(P,Q)\in\mathcal{O}_{p,n} and Mp,nM\in\mathcal{M}_{p,n}, (P,Q)M=PMQ(P,Q)\cdot M=PMQ,

the convex concentration in p,n\mathcal{M}_{p,n} transversally to 𝒪p,n\mathcal{O}_{p,n} can be expressed as a concentration on p\mathbb{R}^{p} transversally to 𝔖n\mathfrak{S}_{n} thanks to the introduction the mapping σ\sigma providing to any matrix the ordered sequence of its singular values :

σ:p,n+dM(σ1(M),,σd(M)).\displaystyle\sigma\ :\ \begin{aligned} &\mathcal{M}_{p,n}&&\rightarrow&&\hskip 39.83368pt\mathbb{R}^{d}_{+}\\ &M&&\mapsto&&(\sigma_{1}(M),\ldots,\sigma_{d}(M)).\end{aligned} with d=min(p,n)\displaystyle\text{with }\ d=\min(p,n)

(there exists (P,Q)𝒪p,n(P,Q)\in\mathcal{O}_{p,n} such that M=PΣ(M)QM=P\Sigma(M)Q, where Σp,n\Sigma\in\mathcal{M}_{p,n} has σ1(M)σd(M)\sigma_{1}(M)\geq\cdots\geq\sigma_{d}(M) on the diagonal).

Theorem A.7 ([Led05], Corollary 8.23. [LC19], Theorem 2.44).

Given a random matrix Zp,nZ\in\mathcal{M}_{p,n}:

Z𝒪p,nT2(σ)\displaystyle Z\propto^{T}_{\mathcal{O}_{p,n}}\mathcal{E}_{2}(\sigma) \displaystyle\Longleftrightarrow σ(Z)𝔖dT2(σ)\displaystyle\sigma(Z)\propto^{T}_{\mathfrak{S}_{d}}\mathcal{E}_{2}(\sigma)

(where the concentrations inequalities are implicitly expressed for euclidean norms: F\|\cdot\|_{F} on p,n\mathcal{M}_{p,n} and \|\cdot\| on d\mathbb{R}^{d}).

Proof A.8 (Proof of Theorem 3.3).

Let us start to study the case where X1==XmXnX_{1}=\cdots=X_{m}\equiv X\in\mathcal{M}_{n} and X2X\propto\mathcal{E}_{2} in (n,F)(\mathcal{M}_{n},\|\cdot\|_{F}). We know from Theorem A.7 that:

σ(X)𝔖pT2,\displaystyle\sigma(X)\propto_{\mathfrak{S}_{p}}^{T}\mathcal{E}_{2},

and therefore, as a n\sqrt{n}-Lipschitz linear observation of σ(X)m2(κm1σ)\sigma(X)^{\odot m}\in\mathcal{E}_{2}\left(\kappa^{m-1}\sigma\right) (see Theorem 3.1), Tr(Xm)\operatorname{Tr}(X^{m}) follows the concentration:

Tr(Xm)=i=1pσi(X)m2(nκm1σ).\displaystyle\operatorname{Tr}(X^{m})=\sum_{i=1}^{p}\sigma_{i}(X)^{m}\in\mathcal{E}_{2}\left(\sqrt{n}\kappa^{m-1}\sigma\right).

Now, we consider the general setting where we are given mm matrices X1,,XmX_{1},\ldots,X_{m}, a deterministic matrix Ann,n0A\in\mathcal{M}_{n_{n},n_{0}} satisfying A1\|A\|\leq 1, and we want to show the concentration of tr(AX1,,Xm)tr(AX_{1},\cdots,X_{m}). First note that we stay in the hypotheses of the theorem if we replace X1X_{1} with AX1AX_{1}, we are thus left to show the concentration of Tr(X1Xm)\operatorname{Tr}(X_{1}\cdots X_{m}). We can not employ again Lemma A.2 without a strong hypothesis of commutativity on the matrices X1,,XnX_{1},\ldots,X_{n}. Indeed, one could not have gone further than a concentration on the whole term σ𝔖pTr(Xσ(1)Xσ(m))\sum_{\sigma\in\mathfrak{S}_{p}}\operatorname{Tr}(X_{\sigma(1)}\cdots X_{\sigma(m)}). However, we can still introduce the random matrix

Y=(0Xm1X1Xm0)\displaystyle Y=\left(\begin{array}[]{cccc}0&X_{m-1}&&\\ &\ddots&\ddots&\\ &&\ddots&X_{1}\\ X_{m}&&&0\end{array}\right) then Ym=(0X1mX32X210),\displaystyle Y^{m}=\left(\begin{array}[]{cccc}0&X^{m}_{1}&&\\ &\ddots&\ddots&\\ &&\ddots&X_{3}^{2}\\ X_{2}^{1}&&&0\end{array}\right),

where for i,j{2,,m1}i,j\in\{2,\ldots,m-1\}, XijXiXi+1XmX1XjX^{j}_{i}\equiv X_{i}X_{i+1}\cdots X_{m}X_{1}\cdots X_{j}. Since Yn0++nmY\in\mathcal{M}_{n_{0}+\cdots+n_{m}} satisfies Y2(σ)Y\propto\mathcal{E}_{2}(\sigma) and Yκ\|Y\|\leq\kappa, the first part of the proof provides the concentration Ym2(κm1σn0++nm)Y^{m}\in\mathcal{E}_{2}\left(\kappa^{m-1}\sigma\sqrt{n_{0}+\cdots+n_{m}}\right) in (n,)\left(\mathcal{M}_{n},\|\cdot\|_{*}\right) which directly implies the concentration of X1m=X1XmX_{1}^{m}=X_{1}\cdots X_{m}.

Appendix B Alternative proof of Proposition 4.6

We are going to show the concentration of the real part and the imaginary part of QzQ^{z}, where:

(Qz)=Qz((z)Ip1nXXT)Q¯z=((z)z)|Qz|2+Q¯z\displaystyle\Re(Q^{z})=Q^{z}\left(\Re(z)I_{p}-\frac{1}{n}XX^{T}\right)\bar{Q}^{z}=(\Re(z)-z)|Q^{z}|^{2}+\bar{Q}^{z}
(Qz)=(z)|Qz|2\displaystyle\Im(Q^{z})=\Im(z)|Q^{z}|^{2}

Since it is harder, we will only prove the linear concentration of |Qz|2=((z)2+((z)1nXXT)2)1|Q^{z}|^{2}=(\Im(z)^{2}+(\Re(z)-\frac{1}{n}XX^{T})^{2})^{-1}. For that we are going to decompose, for any matrix ApA\in\mathcal{M}_{p} with unit spectral norm, the random variable Tr(A|Qz|2)\operatorname{Tr}(A|Q^{z}|^{2}) as the sum of convex and O(1/n)O(1/\sqrt{n})-Lipschitz mappings of XX. Let us introduce the two mappings, ψ:pp\psi:\mathcal{M}_{p}\to\mathcal{M}_{p} and ϕ:p,np\phi:\mathcal{M}_{p,n}\to\mathcal{M}_{p} defined for any MpM\in\mathcal{M}_{p} and Bp,nB\in\mathcal{M}_{p,n} with:

ψ(M)=((z)2+M)1\displaystyle\psi(M)=(\Im(z)^{2}+M)^{-1} ϕ(B)=(z)22(z)nBBT+1n2BBTBBT.\displaystyle\phi(B)=\Re(z)^{2}-\frac{2\Re(z)}{n}BB^{T}+\frac{1}{n^{2}}BB^{T}BB^{T}.

We then have the identity Tr(AQz)=Tr(Aψϕ(X))\operatorname{Tr}(AQ^{z})=\operatorname{Tr}\left(A\psi\circ\phi\left(X\right)\right).

We then then look at the second derivative of ψϕ\psi\circ\phi to prove convex properties on Tr(Aψϕ)\operatorname{Tr}(A\psi\circ\phi). Given HpH\in\mathcal{M}_{p}, let us compute:

dψ MH=ϕ(M)Hϕ(M)\displaystyle\mathchoice{{d\psi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,M}}{{d\psi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,M}}{{d\psi\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,M}}{{d\psi\,\smash{\vrule height=2.77777pt,depth=1.16167pt}}_{\,M}}\cdot H=-\phi(M)H\phi(M) d2ψ M(H,H)=2ϕ(M)Hϕ(M)Hϕ(M),\displaystyle\mathchoice{{d^{2}\psi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,M}}{{d^{2}\psi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,M}}{{d^{2}\psi\,\smash{\vrule height=4.92pt,depth=1.16167pt}}_{\,M}}{{d^{2}\psi\,\smash{\vrule height=3.80888pt,depth=1.16167pt}}_{\,M}}\cdot(H,H)=2\phi(M)H\phi(M)H\phi(M),

and given Kp,nK\in\mathcal{M}_{p,n}:

dϕ BK=2(z)nL(B,K)+1n2P(K,B)\displaystyle\mathchoice{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,B}}{{d\phi\,\smash{\vrule height=2.77777pt,depth=1.16167pt}}_{\,B}}\cdot K=-\frac{2\Re(z)}{n}L(B,K)+\frac{1}{n^{2}}P(K,B)
d2ϕ B(K,K)=2(z)nKKT+2n2P2(K,B),\displaystyle\mathchoice{{d^{2}\phi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,B}}{{d^{2}\phi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,B}}{{d^{2}\phi\,\smash{\vrule height=4.92pt,depth=1.16167pt}}_{\,B}}{{d^{2}\phi\,\smash{\vrule height=3.80888pt,depth=1.16167pt}}_{\,B}}\cdot(K,K)=-\frac{2\Re(z)}{n}KK^{T}+\frac{2}{n^{2}}P_{2}(K,B),

where:

  • L(B,K)=BKT+KBTL(B,K)=BK^{T}+KB^{T}

  • P(B,K)=KBTBBT+BKTBBT+BBTKBT+BBTBKTP(B,K)=KB^{T}BB^{T}+BK^{T}BB^{T}+BB^{T}KB^{T}+BB^{T}BK^{T}

  • P2(B,K)=KKTBBT+KBTKBT+KBTBKT+BKTKBT+BKTBKT+BBTKKTP_{2}(B,K)=KK^{T}BB^{T}+KB^{T}KB^{T}+KB^{T}BK^{T}+BK^{T}KB^{T}+BK^{T}BK^{T}+BB^{T}KK^{T}.

First we deduce from the expression of the first derivative and thanks to Lemma 4.10 that, on X(𝒜)X(\mathcal{A}), Tr(Aψϕ)\operatorname{Tr}\left(A\psi\circ\phi\right) is a O(AF/n)=O(1)O(\|A\|_{F}/\sqrt{n})=O(1)-Lipschitz transformation of XX (for the Frobenius norm).

Second,choosing M=ϕ(B)M=\phi(B):

d2ψϕ B(K,K)\displaystyle\mathchoice{{d^{2}\psi\circ\phi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,B}}{{d^{2}\psi\circ\phi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,B}}{{d^{2}\psi\circ\phi\,\smash{\vrule height=4.92pt,depth=1.16167pt}}_{\,B}}{{d^{2}\psi\circ\phi\,\smash{\vrule height=3.80888pt,depth=1.16167pt}}_{\,B}}\cdot(K,K) =d2ψ mM(dϕ BK,dϕ BK)+dψ mM(d2ϕ B(K,K))\displaystyle=\mathchoice{{d^{2}\psi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,mM}}{{d^{2}\psi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,mM}}{{d^{2}\psi\,\smash{\vrule height=4.92pt,depth=1.16167pt}}_{\,mM}}{{d^{2}\psi\,\smash{\vrule height=3.80888pt,depth=1.16167pt}}_{\,mM}}\cdot\left(\mathchoice{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,B}}{{d\phi\,\smash{\vrule height=2.77777pt,depth=1.16167pt}}_{\,B}}\cdot K,\mathchoice{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,B}}{{d\phi\,\smash{\vrule height=2.77777pt,depth=1.16167pt}}_{\,B}}\cdot K\right)+\mathchoice{{d^{\psi}\,\smash{\vrule height=7.11113pt,depth=1.62633pt}}_{\,mM}}{{d^{\psi}\,\smash{\vrule height=7.11113pt,depth=1.62633pt}}_{\,mM}}{{d^{\psi}\,\smash{\vrule height=5.0pt,depth=1.16167pt}}_{\,mM}}{{d^{\psi}\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,mM}}\cdot\left(\mathchoice{{d^{2}\phi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,B}}{{d^{2}\phi\,\smash{\vrule height=6.99913pt,depth=1.65279pt}}_{\,B}}{{d^{2}\phi\,\smash{\vrule height=4.92pt,depth=1.16167pt}}_{\,B}}{{d^{2}\phi\,\smash{\vrule height=3.80888pt,depth=1.16167pt}}_{\,B}}\cdot(K,K)\right)
=2ϕ(M)(dϕ BK)ϕ(M)(dϕ BK)ϕ(M)\displaystyle=2\phi(M)\left(\mathchoice{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,B}}{{d\phi\,\smash{\vrule height=2.77777pt,depth=1.16167pt}}_{\,B}}\cdot K\right)\phi(M)\left(\mathchoice{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=5.55557pt,depth=1.65279pt}}_{\,B}}{{d\phi\,\smash{\vrule height=3.88889pt,depth=1.16167pt}}_{\,B}}{{d\phi\,\smash{\vrule height=2.77777pt,depth=1.16167pt}}_{\,B}}\cdot K\right)\phi(M)
+2(z)nϕ(M)KKTϕ(M)2n2ϕ(M)P2(K,B)ϕ(M).\displaystyle\hskip 14.22636pt+\frac{2\Re(z)}{n}\phi(M)KK^{T}\phi(M)-\frac{2}{n^{2}}\phi(M)P_{2}(K,B)\phi(M).

In this identity the only term raising an issue is 2n2ϕ(M)P2(K,B)ϕ(M)\frac{2}{n^{2}}\phi(M)P_{2}(K,B)\phi(M) because P2(K,B)P_{2}(K,B) is not nonnegative symmetric. We can however still bound:

12n2Tr(Aϕ(M)P2(K,B)ϕ(M))12n2Aϕ(M)2B2KF2O(1nTr(KKT)),\displaystyle\frac{12}{n^{2}}\operatorname{Tr}\left(A\phi(M)P_{2}(K,B)\phi(M)\right)\leq\frac{12}{n^{2}}\|A\|\|\phi(M)\|^{2}\|B\|^{2}\|K\|_{F}^{2}\leq O\left(\frac{1}{n}\operatorname{Tr}(KK^{T})\right),

for BX(𝒜Q)B\in X(\mathcal{A}_{Q}) (in particular BO(n)\|B\|\leq O(\sqrt{n}) and ϕ(M)O(1)\|\phi(M)\|\leq O(1)). Now, if we note h:p,nh:\mathcal{M}_{p,n}\to\mathbb{R} defined for any Bp,nB\in\mathcal{M}_{p,n} as h(B)=1nTr(BBT)h(B)=\frac{1}{n}\operatorname{Tr}(BB^{T}), we see that 1nTr(KKT)=d2h(B)(K,K)\frac{1}{n}\operatorname{Tr}(KK^{T})=d^{2}h(B)\cdot(K,K) is a quadratic functional on KK, hh is thus convex. It is beside O(1)O(1)-Lipschitz on X(𝒜Q)X(\mathcal{A}_{Q}) (for the Frobenius norm). Assuming in a first time that AA is nonnegative symmetric and choosing a constant CO(1)C\leq O(1) sufficiently big, we show that BTr(Aψϕ(B))+Ch(B)B\mapsto\operatorname{Tr}(A\psi\circ\phi(B))+Ch(B) is convex and O(1)O(1)-Lipschitz on X(𝒜Q)X(\mathcal{A}_{Q}) like ChCh. We have thus the concentration:

(Tr(A|Qz|2)|𝒜)2.\displaystyle\left(\operatorname{Tr}(A|Q^{z}|^{2})\ |\ \mathcal{A}\right)\in\mathcal{E}_{2}.

Now, given a general matrix ApA\in\mathcal{M}_{p}, we decompose A=A+A+A0A=A_{+}-A_{-}+A_{0} where A+A_{+} and AA_{-} are nonnegative symmetric and A0A_{0} is anti-symmetric, in that case Tr(A|Qz|2)=Tr(A+|Qz|2)Tr(A|Qz|2)\operatorname{Tr}(A|Q^{z}|^{2})=\operatorname{Tr}(A_{+}|Q^{z}|^{2})-\operatorname{Tr}(A_{-}|Q^{z}|^{2}), and we can conclude the same way. That eventually gives us the concentration of the proposition.

References

  • [Ada11] Radoslaw Adamczak. On the marchenko-pastur and circular laws for some classes of random matrices with dependent entries. Electronic Journal of Probability, 16:1065–1095, 2011.
  • [Ada15] Radosław Adamczak. A note on the hanson-wright inequality for random vectors with dependencies. Electronic Communications in Probability, 20(72):1–13, 2015.
  • [BLM13] Stéphane Boucheron, Gabor Lugosi, and Pascal Massart. Concentration inequalities. Oxford University Press, 2013.
  • [Gro79] Mikhail Gromov. Paul levy’s isoperimetric inequality. preprint de l’IHES, 1979.
  • [GZ00] Alice Guillonet and Olivier Zeitouni. Concentration of the spectral measure for large matrices. Electronic Communications in Probability, 2000.
  • [HT21] Han Huang and Konstantin Tikhomirov. On dimension-dependent concentration for convex lipschitz functions in product spaces. arXiv:2106.06121v1, 2021.
  • [LC19] Cosme Louart and Romain Couillet. Concentration of measure and large random matrices with an application to sample covariance matrices. arXiv:1805.08295, 2019.
  • [LC21a] Cosme Louart and Romain Couillet. Concentration of measure and generalized product of random vectors with an application to hanson-wright-like inequalities. arXiv preprint arXiv:2102.08020, 2021.
  • [LC21b] Cosme Louart and Romain Couillet. Spectral properties of sample covariance matrices arising from random matrices with independent non identically distributed columns. arXiv preprint, 2021.
  • [Led05] Michel Ledoux. The concentration of measure phenomenon. Number 89. American Mathematical Soc., 2005.
  • [Lé51] Paul Lévy. Problemes concrets d’analyse fonctionnelle. Gauthier-Villars, 1951.
  • [Mil71] Vitali Milman. Asymptotic properties of functions of several variables that are defined on homogeneous spaces. Doklady Akademii Nauk SSSR, 199:1247–1250, 1971.
  • [MS11] Mark W. Meckes and Stanislaw J. Szqrek. Concentration for noncommutative polynomials in random matrices. Proceedings of the American Mathematical Societ, 2011.
  • [Nou09] El Karoui Nourredine. Concentration of measure and spectra of random matrices: applications to correlation matrices, elliptical distributions and beyond. The Annals of Applied Probability, 19(6):2362–2405, 2009.
  • [Rol06] Robert Rolland. Fonctions de möbius - formule de rota. 2006.
  • [Tal95] Michel Talagrand. Concentration of Measure and Isoperimetric Inequalities in product spaces. Publications mathématiques de l’I.H.E.S., tome 81, 1995.
  • [Tao12] Terence Tao. Topics in random matrix theory, volume 132. American Mathematical Soc., 2012.
  • [VW14] Van Vu and Ke Wang. Random weighted projections, random quadratic forms and random eigenvectors. Random Structures and Algorithms, 2014.