
A Simple Note on the Basic Properties of
Subgaussian Random Variables

Yang Li ([email protected])
International Research Center for Neurointelligence
The University of Tokyo
7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

Abstract

This note provides a basic description of subgaussianity, by defining $(\sigma,\rho)$-subgaussian random variables $X$ ($\sigma>0$, $\rho>0$) as those satisfying $\mathbb{E}(\exp(\lambda X))\leq\rho\exp(\frac{1}{2}\sigma^{2}\lambda^{2})$ for any $\lambda\in\mathbb{R}$. The introduction of the parameter $\rho$ may be particularly useful for those seeking to refine bounds, or align results from different sources, in the analysis of stochastic processes and concentration inequalities.

1 Introduction

A subgaussian random variable refers to one whose probability distribution tail decays (at least) as fast as that of a Gaussian variable. A well-known bound for the standard normal cumulative distribution function $\Phi(x)$ is given by

\begin{equation}
\sqrt{\frac{2}{\pi}}\cdot\frac{1}{x+\sqrt{x^{2}+4}}\exp\left(-\frac{x^{2}}{2}\right)<1-\Phi(x)\leq\sqrt{\frac{2}{\pi}}\cdot\frac{1}{x+\sqrt{x^{2}+8/\pi}}\exp\left(-\frac{x^{2}}{2}\right)
\tag{1}
\end{equation}

for all $x\geq 0$ [1]; accordingly, the tail probability of a subgaussian variable is bounded above by $\rho\exp(-x^{2}/2\sigma^{2})$ with some constants $\rho,\,\sigma^{2}>0$. Actually, many distributions other than the Gaussian, such as the Gamma and Weibull distributions with large shape parameters, and distributions with bounded supports, can exhibit subgaussian properties. The concept of subgaussianity is useful in many applications in probability theory and statistics, notably in analyzing random processes and deriving concentration inequalities.
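As a quick sanity check of inequality (1), the following minimal sketch evaluates both sides against the exact Gaussian tail computed with the complementary error function. The function names are illustrative and not part of the note. Since $x+\sqrt{x^{2}+8/\pi}\geq\sqrt{8/\pi}$ for $x\geq 0$, the upper bound in (1) also gives $1-\Phi(x)\leq\frac{1}{2}\exp(-x^{2}/2)$, which the test below confirms on a grid.

```python
import math


def gaussian_upper_tail(x: float) -> float:
    """Return 1 - Phi(x) for the standard normal, computed via erfc for accuracy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_lower_bound(x: float) -> float:
    """Left-hand side of inequality (1)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x) / (x + math.sqrt(x * x + 4.0))


def tail_upper_bound(x: float) -> float:
    """Right-hand side of inequality (1)."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x) / (x + math.sqrt(x * x + 8.0 / math.pi))


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"x={x:3.1f}  lower={tail_lower_bound(x):.6e}  "
              f"tail={gaussian_upper_tail(x):.6e}  upper={tail_upper_bound(x):.6e}")
```

At $x=0$ the upper bound coincides with the exact value $1/2$, so any numerical comparison there needs a small tolerance.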

Subgaussian variables can be defined through various equivalent characterizations, such as upper-bounded tail probability, higher-order moments, moment-generating function, and Orlicz norms [2, 3, 4, 5, 6]. Each characterization involves a “variance proxy” (such as the previously mentioned $\sigma^{2}$) serving as a parameter to quantify statistical dispersion or the rate of tail decay, and the variance proxies differ only by an absolute constant factor across characterizations. Textbooks on subgaussianity commonly address this equivalence, though they typically do not focus on finding potentially better bounds. Existing texts often fix the values of $\rho$ or its counterparts at simple values like $1$ or $2$ while discussing the variance proxies; however, for those prioritizing lighter distribution tails, it may be possible to compromise on the value of $\rho$ in pursuit of a more favorable $\sigma^{2}$. On the other hand, different fixed values of $\rho$ may be chosen in applications, according to each author’s convention; for instance, both $\mathbb{E}\left[\exp(X^{2}/\sigma^{2})\right]\leq 2$ [2, 5, 6, 7] and $\mathbb{E}\left[\exp(X^{2}/\sigma^{2})\right]\leq\exp(1)$ [8, 9] are frequently used assumptions, which can cause inconvenience when comparing the corresponding results. It is also worth noting that some texts require subgaussian variables to be centered, depending on the specific characterization used to define subgaussianity [4, 7]. For example, when a random variable $X$ is defined as subgaussian with its moment-generating function satisfying $\mathbb{E}\left[\exp(\lambda X)\right]\leq\rho\exp(\sigma^{2}\lambda^{2}/2)$ ($\lambda\in\mathbb{R}$), fixing $\rho$ at $1$ naturally implies $\mathbb{E}\left[X\right]=0$ [7, 10].

This note aims to restate the basic properties of subgaussian variables by introducing the parameter ρ\rho alongside the variance proxy. These properties include equivalent characterizations of subgaussianity (without requiring the variables to be centered), its closure under simple operations, and an application to martingale differences. While the derivations are largely drawn from existing materials (with necessary modifications), I hope this flexible treatment of subgaussianity will benefit those seeking refined bounds or aligned results from different sources, in the analysis of large deviations and concentration inequalities.

2 Characterizations and Properties

2.1 Equivalent characterizations

Let us begin the main body with the adjusted definition of subgaussian variables, followed by Theorem 1, which states the equivalence of various characterizations.

Definition 1

A random variable $X\in\mathbb{R}$ is called $(\sigma,\rho)$-subgaussian if there exist constants $\sigma>0$ and $\rho\geq 1$ such that the moment-generating function of $X$ satisfies

\begin{equation}
\mathbb{E}\left[\exp(\lambda X)\right]\leq\rho\exp\Big(\frac{1}{2}\sigma^{2}\lambda^{2}\Big),\quad\text{for all }\lambda\in\mathbb{R}.
\tag{2}
\end{equation}

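To illustrate how $\rho$ and $\sigma$ trade off in Definition 1, consider a non-centered Gaussian $X\sim N(\mu,s^{2})$ (an example chosen here for illustration, not taken from the note). Its moment-generating function is $\exp(\mu\lambda+s^{2}\lambda^{2}/2)$, and maximizing $\mu\lambda-(\sigma^{2}-s^{2})\lambda^{2}/2$ over $\lambda$ shows that, for any $\sigma>s$, the smallest admissible constant is $\rho=\exp\big(\mu^{2}/(2(\sigma^{2}-s^{2}))\big)$. The sketch below, with hypothetical function names, computes this value and allows a numerical check of inequality (2).

```python
import math


def log_mgf_gaussian(lam: float, mu: float, s: float) -> float:
    """ln E[exp(lam * X)] for X ~ N(mu, s^2)."""
    return mu * lam + 0.5 * (s * lam) ** 2


def smallest_rho(mu: float, s: float, sigma: float) -> float:
    """Smallest rho for which N(mu, s^2) is (sigma, rho)-subgaussian."""
    if mu == 0.0 and sigma >= s:
        return 1.0  # centered case: the Gaussian MGF already satisfies (2) with rho = 1
    if sigma <= s:
        raise ValueError("a non-centered Gaussian needs sigma > s")
    return math.exp(mu * mu / (2.0 * (sigma * sigma - s * s)))


if __name__ == "__main__":
    # A larger variance proxy sigma^2 buys a smaller rho, and vice versa.
    for sigma in (1.1, 1.5, 2.0, 3.0):
        print(f"sigma={sigma:.1f}  rho={smallest_rho(1.0, 1.0, sigma):.4f}")
```

The output makes the trade-off visible: as $\sigma$ approaches $s$ the required $\rho$ blows up, while $\rho$ approaches $1$ as $\sigma$ grows.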
Theorem 1 (Equivalent characterizations of subgaussianity)

For a random variable $X\in\mathbb{R}$ and constants $\sigma_{1},\sigma_{2},\sigma_{3},\sigma_{4},\sigma_{5}>0$, $\rho_{1},\rho_{3},\rho_{4}\geq 1$, $\rho_{2}\geq 0$ and $\rho_{5}\geq 1/2$, the following properties are equivalent in the sense that there exist absolute constants $C_{ij}$ and mappings $\varphi_{ij}$ such that the implication from $(j)$ to $(i)$ holds for all $\sigma_{i}$ and $\rho_{i}$ whenever $\sigma_{i}\geq C_{ij}\sigma_{j}$ and $\rho_{i}\geq\varphi_{ij}(\rho_{j})$:

  • (1)

    The distribution tail of $\lvert X\rvert$ satisfies

    \[
    \mathbb{P}\left[\lvert X\rvert\geq\lambda\sigma_{1}\right]\leq\rho_{1}\exp(-\lambda^{2}/2),\quad\text{for all }\lambda\geq 0.
    \]

  • (2)

    The moments of even orders of $X$ satisfy

    \[
    \mathbb{E}\,\big[X^{2k}\big]\leq\rho_{2}\cdot\sigma_{2}^{2k}\cdot k!,\quad\text{for all positive integers }k.
    \]

  • (3)

    The moment-generating function of $X^{2}$ is finite at a specific point such that

    \[
    \mathbb{E}\left[\exp(X^{2}/\sigma_{3}^{2})\right]\leq\rho_{3}.
    \]

  • (4)

    $X$ is $(\sigma_{4},\rho_{4})$-subgaussian, namely, the moment-generating function of $X$ satisfies

    \[
    \mathbb{E}\left[\exp(\lambda X)\right]\leq\rho_{4}\exp\Big(\frac{1}{2}\sigma_{4}^{2}\lambda^{2}\Big),\quad\text{for all }\lambda\in\mathbb{R}.
    \]

  • (5)

    The distribution tails of $X$ satisfy

    \[
    \max\big\{\mathbb{P}\left[X\geq\lambda\sigma_{5}\right],\mathbb{P}\left[X\leq-\lambda\sigma_{5}\right]\big\}\leq\rho_{5}\exp(-\lambda^{2}/2),\quad\text{for all }\lambda\geq 0.
    \]

Remark 1

A set of applicable choices for $C_{ij}$ and $\varphi_{ij}(\rho)$ in Theorem 1 is listed in the table below, where $\lambda\in(0,1)$ is a parameter. Each entry gives the pair $C_{ij},\ \varphi_{ij}(\rho)$, with the row index $i$ denoting the conclusion and the column index $j$ the hypothesis. Note that these choices are not necessarily sharp.

\[
\begin{array}{c|ccccc}
\text{Concl.}\backslash\text{Hyp.} & (1) & (2) & (3) & (4) & (5)\\
\hline
(1) & & \sqrt{\frac{1}{2\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda} & \sqrt{\frac{1}{2}},\ \rho\ {}^{\dagger} & 1,\ 2\rho & 1,\ 2\rho\ {}^{\dagger}\\
(2) & \sqrt{2},\ \rho\ {}^{\dagger} & & 1,\ \rho-1\ {}^{\dagger} & \sqrt{2},\ 2\rho & \sqrt{2},\ 2\rho\\
(3) & \sqrt{\frac{2}{\lambda}},\ \frac{1}{1-\lambda}\rho^{\lambda}\ {}^{\dagger} & \sqrt{\frac{1}{\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda}\ {}^{\dagger} & & \sqrt{\frac{2}{\lambda}},\ \text{Eq. (3)}\ {}^{\dagger} & \sqrt{\frac{2}{\lambda}},\ \frac{1}{1-\lambda}(2\rho)^{\lambda}\\
(4) & \sqrt{\frac{1}{\lambda}},\ \frac{1}{1-\lambda}\rho^{\lambda} & \sqrt{\frac{1}{2\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda} & \sqrt{\frac{1}{2}},\ \rho\ {}^{\dagger} & & \sqrt{\frac{1}{\lambda}},\ \frac{1}{1-\lambda}(2\rho)^{\lambda}\\
(5) & \sqrt{\frac{1}{\lambda}},\ \frac{1}{1-\lambda}\rho^{\lambda} & \sqrt{\frac{1}{2\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda} & \sqrt{\frac{1}{2}},\ \rho & 1,\ \rho\ {}^{\dagger} &
\end{array}
\]

  • $\dagger$

    Derivations of the entries marked with $\dagger$ are provided in the proof. The others simply follow from the transitivity of implications, and their details are omitted.
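The transitivity argument behind the unmarked entries can be sketched in code: chaining $(j)\Rightarrow(m)$ with $(m)\Rightarrow(i)$ multiplies the constants $C$ and composes the mappings $\varphi$. The snippet below is a minimal illustration under the assumption that the same $\lambda$ is used along the whole chain; the dictionary keys `(i, j)` and all function names are conventions introduced here, not part of the note.

```python
import math
from typing import Callable, Dict, Tuple

# An implication (j) => (i) is stored as the pair (C_ij, phi_ij).
Implication = Tuple[float, Callable[[float], float]]


def compose(outer: Implication, inner: Implication) -> Implication:
    """Chain inner = (j)=>(m) with outer = (m)=>(i) to obtain (j)=>(i)."""
    c_out, phi_out = outer
    c_in, phi_in = inner
    return c_out * c_in, lambda rho: phi_out(phi_in(rho))


def dagger_entries(lam: float) -> Dict[Tuple[int, int], Implication]:
    """Some of the directly proved entries, keyed by (conclusion, hypothesis)."""
    return {
        (1, 3): (math.sqrt(0.5), lambda r: r),
        (1, 5): (1.0, lambda r: 2.0 * r),
        (2, 1): (math.sqrt(2.0), lambda r: r),
        (3, 1): (math.sqrt(2.0 / lam), lambda r: r ** lam / (1.0 - lam)),
        (3, 2): (math.sqrt(1.0 / lam), lambda r: 1.0 + lam * r / (1.0 - lam)),
        (4, 3): (math.sqrt(0.5), lambda r: r),
        (5, 4): (1.0, lambda r: r),
    }


if __name__ == "__main__":
    entries = dagger_entries(lam=0.5)
    # Entry (1) <- (2), obtained through (2) => (3) => (1).
    c12, phi12 = compose(entries[(1, 3)], entries[(3, 2)])
    print(c12, phi12(3.0))
```

For instance, composing $(2)\Rightarrow(3)$ with $(3)\Rightarrow(1)$ reproduces the tabulated entry $\sqrt{1/(2\lambda)},\ 1+\lambda\rho/(1-\lambda)$ for row (1), column (2).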

Proof 
(1) $\implies$ (2) (according to Ref. [2]): Given property (1), we have

\[
\begin{split}
\mathbb{E}\left[\lvert X\rvert^{n}\right]&=\int_{0}^{\infty}\mathbb{P}\left[\lvert X\rvert^{n}\geq x^{n}\right]\,\mathrm{d}x^{n}=n\int_{0}^{\infty}\mathbb{P}\left[\lvert X\rvert\geq\lambda\sigma_{1}\right]\,(\lambda\sigma_{1})^{n-1}\,\mathrm{d}(\lambda\sigma_{1})\\
&\leq n\rho_{1}\sigma_{1}^{n}\int_{0}^{\infty}\exp(-\lambda^{2}/2)\,\lambda^{n-1}\,\mathrm{d}\lambda=\frac{1}{2}n\rho_{1}\big(\sqrt{2}\sigma_{1}\big)^{n}\int_{0}^{\infty}\exp(-x)\,x^{\tfrac{n-2}{2}}\,\mathrm{d}x\\
&=\rho_{1}\cdot\big(\sqrt{2}\sigma_{1}\big)^{n}\cdot\Gamma\left(\frac{n}{2}+1\right)
\end{split}
\]

for any $n\geq 1$, where $\Gamma(\,\cdot\,)$ denotes the gamma function. When $n=2k$ with $k=1,2,3,\dots$, property (2) is obtained with $\sigma_{2}\geq\sqrt{2}\sigma_{1}$ and $\rho_{2}\geq\rho_{1}$; that is, $C_{21}=\sqrt{2}$ and $\varphi_{21}(\rho)=\rho$.
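As an illustration of this moment bound (an example added here, not from the note), take a standard normal $Z$. The upper bound in (1) gives $\mathbb{P}[\lvert Z\rvert\geq\lambda]=2(1-\Phi(\lambda))\leq\exp(-\lambda^{2}/2)$, so property (1) holds with $\sigma_{1}=1$ and $\rho_{1}=1$. The sketch below compares the exact absolute moments $\mathbb{E}\lvert Z\rvert^{n}=2^{n/2}\Gamma\big(\tfrac{n+1}{2}\big)/\sqrt{\pi}$ with the bound $\rho_{1}(\sqrt{2}\sigma_{1})^{n}\Gamma(\tfrac{n}{2}+1)$; the function names are hypothetical.

```python
import math


def gaussian_abs_moment(n: int) -> float:
    """E|Z|^n for Z ~ N(0, 1): 2^(n/2) * Gamma((n + 1) / 2) / sqrt(pi)."""
    return 2.0 ** (n / 2.0) * math.gamma((n + 1) / 2.0) / math.sqrt(math.pi)


def moment_bound(n: int, sigma1: float = 1.0, rho1: float = 1.0) -> float:
    """Bound derived from property (1): rho_1 * (sqrt(2) sigma_1)^n * Gamma(n/2 + 1)."""
    return rho1 * (math.sqrt(2.0) * sigma1) ** n * math.gamma(n / 2.0 + 1.0)


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 10):
        print(f"n={n:2d}  exact={gaussian_abs_moment(n):.4f}  bound={moment_bound(n):.4f}")
```

For even $n=2k$ this reduces to the elementary comparison $(2k-1)!!\leq 2^{k}k!$.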

(2) $\implies$ (3) (according to Ref. [2], with adjustments): Given property (2), we find that the moment-generating function of $X^{2}$ satisfies

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X^{2}/\sigma_{2}^{2})\right]&=\sum_{k=0}^{\infty}\frac{\lambda^{k}\,\mathbb{E}[X^{2k}]}{k!\,\sigma_{2}^{2k}}\\
&\leq 1+\rho_{2}\sum_{k=1}^{\infty}\lambda^{k}=1+\frac{\lambda\rho_{2}}{1-\lambda},
\end{split}
\]

for all $\lambda\in[0,1)$. We observe that the rightmost side of the above expression increases with $\lambda$; we can freely select any $\lambda\in(0,1)$ and accordingly set $C_{32}=\sqrt{1/\lambda}$, $\varphi_{32}(\rho)=1+\lambda\rho/(1-\lambda)$, with which property (3) holds.

(3) $\implies$ (4) (according to [3], with adjustments): From property (3) and the inequality $\lambda X\leq\lambda^{2}\sigma_{3}^{2}/4+X^{2}/\sigma_{3}^{2}$, we find

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&\leq\mathbb{E}\left[\exp(\lambda^{2}\sigma_{3}^{2}/4+X^{2}/\sigma_{3}^{2})\right]\\
&=\exp(\lambda^{2}\sigma_{3}^{2}/4)\cdot\mathbb{E}\left[\exp(X^{2}/\sigma_{3}^{2})\right]\\
&\leq\rho_{3}\exp\big(\lambda^{2}\sigma_{3}^{2}/4\big)
\end{split}
\]

for all $\lambda\in\mathbb{R}$. Therefore, property (4) holds for $\sigma_{4}\geq\sqrt{2}\sigma_{3}/2$ and $\rho_{4}\geq\rho_{3}$, i.e., $C_{43}=\sqrt{1/2}$ and $\varphi_{43}(\rho)=\rho$.

(4) $\implies$ (5) (generic Chernoff bound, see [2, 3]): Given property (4), Markov’s inequality implies

\[
\begin{split}
\mathbb{P}\left[X\geq\lambda\sigma_{5}\right]&=\mathbb{P}\left[\exp(xX)\geq\exp(x\lambda\sigma_{5})\right]\\
&\leq\mathbb{E}\left[\exp(xX)\right]\exp(-x\lambda\sigma_{5})\\
&\leq\rho_{4}\exp\left(\sigma_{4}^{2}x^{2}/2-\lambda\sigma_{5}x\right)
\end{split}
\]

for all $\lambda\geq 0$ and $x>0$. Note that $\mathbb{P}\left[X\geq\lambda\sigma_{5}\right]\leq\rho_{4}\exp(\sigma_{4}^{2}x^{2}/2-\lambda\sigma_{5}x)$ holds trivially for $x=0$. Minimizing the right-hand side with respect to $x$, we obtain

\[
\begin{split}
\mathbb{P}\left[X\geq\lambda\sigma_{5}\right]&\leq\rho_{4}\inf_{x\geq 0}\exp\left(\sigma_{4}^{2}x^{2}/2-\lambda\sigma_{5}x\right)\\
&=\rho_{4}\exp\left[-\lambda^{2}\sigma_{5}^{2}/(2\sigma_{4}^{2})\right],
\end{split}
\]

where the infimum is attained at $x=\lambda\sigma_{5}/\sigma_{4}^{2}$. Since property (4) is invariant when replacing $X$ with $-X$, we similarly find $\mathbb{P}[-X\geq\lambda\sigma_{5}]\leq\rho_{4}\exp[-\lambda^{2}\sigma_{5}^{2}/(2\sigma_{4}^{2})]$ for all $\lambda\geq 0$. Therefore, property (5) holds whenever $\sigma_{5}\geq\sigma_{4}$ and $\rho_{5}\geq\rho_{4}$, i.e., $C_{54}=1$ and $\varphi_{54}(\rho)=\rho$.
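The minimization step in the Chernoff argument can be checked numerically. The sketch below writes $t=\lambda\sigma_{5}$ for the threshold, evaluates the exponent $\sigma_{4}^{2}x^{2}/2-tx$ on a grid of $x\geq 0$, and compares the grid minimum with the closed form $-t^{2}/(2\sigma_{4}^{2})$. The names and the sample parameter values are assumptions made for illustration.

```python
import math


def chernoff_exponent(x: float, t: float, sigma4: float) -> float:
    """Exponent sigma_4^2 x^2 / 2 - t x from the Markov step, with t = lambda * sigma_5."""
    return 0.5 * (sigma4 * x) ** 2 - t * x


def subgaussian_tail_bound(t: float, sigma4: float, rho4: float) -> float:
    """Closed-form bound rho_4 * exp(-t^2 / (2 sigma_4^2)) on P[X >= t], for t >= 0."""
    return rho4 * math.exp(-0.5 * (t / sigma4) ** 2)


if __name__ == "__main__":
    t, sigma4, rho4 = 2.0, 1.5, 1.2
    grid_min = min(chernoff_exponent(0.001 * i, t, sigma4) for i in range(5001))
    print("grid minimum of exponent :", grid_min)
    print("closed-form exponent     :", -0.5 * (t / sigma4) ** 2)
    print("tail bound               :", subgaussian_tail_bound(t, sigma4, rho4))
```

The grid minimum approaches the closed-form value from above, consistent with the minimizer $x=t/\sigma_{4}^{2}$.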

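(5) $\implies$ (1): This is trivial, since $\mathbb{P}\left[\lvert X\rvert\geq\lambda\sigma_{5}\right]\leq\mathbb{P}\left[X\geq\lambda\sigma_{5}\right]+\mathbb{P}\left[X\leq-\lambda\sigma_{5}\right]\leq 2\rho_{5}\exp(-\lambda^{2}/2)$ for all $\lambda\geq 0$; hence $C_{15}=1$ and $\varphi_{15}(\rho)=2\rho$.

So far, we have demonstrated the equivalence of all characterizations; any $C_{ij}$ and $\varphi_{ij}$ not directly addressed can be determined using the transitivity of implications. Furthermore, improvements can be made to certain $\varphi_{ij}$ with the following additional implications.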

(3) $\implies$ (1) [2]: This follows immediately from Markov’s inequality, since

\[
\begin{split}
\mathbb{P}\left[\lvert X\rvert\geq\lambda\sigma_{3}/\sqrt{2}\right]&=\mathbb{P}\left[X^{2}/\sigma_{3}^{2}\geq\lambda^{2}/2\right]\\
&\leq\mathbb{E}\left[\exp(X^{2}/\sigma_{3}^{2})\right]\exp(-\lambda^{2}/2)\\
&\leq\rho_{3}\exp(-\lambda^{2}/2)
\end{split}
\]

for any $\lambda\geq 0$, given property (3). Therefore, property (1) holds for all $\sigma_{1}\geq\sqrt{2}\sigma_{3}/2$ and $\rho_{1}\geq\rho_{3}$, i.e., $C_{13}=\sqrt{1/2}$ and $\varphi_{13}(\rho)=\rho$. The value $C_{13}=\sqrt{1/2}$ obtained here equals that derived via the chain of implications, $C_{15}C_{54}C_{43}$, while $\varphi_{13}(\rho)=\rho$ obtained here is slightly better than the composition $\varphi_{15}\circ\varphi_{54}\circ\varphi_{43}(\rho)=2\rho$.

(3) $\implies$ (2) [6]: By expanding property (3) into a power series, we obtain

\[
\mathbb{E}\left[\exp(X^{2}/\sigma_{3}^{2})\right]=1+\sum_{k=1}^{\infty}\frac{\mathbb{E}\left[X^{2k}\right]}{k!\,\sigma_{3}^{2k}}\leq\rho_{3}.
\]

Since every term in the sum is non-negative, it follows that

\[
\mathbb{E}\,\big[X^{2k}\big]\leq(\rho_{3}-1)\cdot\sigma_{3}^{2k}\cdot k!
\]

for all positive integers $k$. Therefore, property (2) holds, with $C_{23}=1$ and $\varphi_{23}(\rho)=\rho-1$, where $\varphi_{23}(\rho)$ is slightly better than the composition $\varphi_{21}\circ\varphi_{13}(\rho)=\rho$.

(1) $\implies$ (3): This can be derived from the chain (1) $\implies$ (2) $\implies$ (3) or directly as shown in Refs. [6, 7]. However, it should be noted that property (1) may lead to a slightly better conclusion than property (2). Given property (1), we have

\[
\begin{split}
\mathbb{E}\left[\lvert X\rvert^{n}\right]&=n\int_{0}^{\infty}\mathbb{P}\left[\lvert X\rvert\geq\lambda\sigma_{1}\right]\,(\lambda\sigma_{1})^{n-1}\,\mathrm{d}(\lambda\sigma_{1})\\
&\leq n\sigma_{1}^{n}\int_{0}^{\infty}\min\big\{\rho_{1}\exp(-\lambda^{2}/2),1\big\}\,\lambda^{n-1}\,\mathrm{d}\lambda\\
&=\frac{1}{2}n\big(\sqrt{2}\sigma_{1}\big)^{n}\int_{0}^{\infty}\min\big\{\rho_{1}\exp(-x),1\big\}\,x^{\tfrac{n-2}{2}}\,\mathrm{d}x\\
&=\frac{1}{2}n\big(\sqrt{2}\sigma_{1}\big)^{n}\bigg(\int_{0}^{\ln\rho_{1}}x^{\tfrac{n-2}{2}}\,\mathrm{d}x+\rho_{1}\int_{\ln\rho_{1}}^{\infty}\exp(-x)\,x^{\tfrac{n-2}{2}}\,\mathrm{d}x\bigg)\\
&=\rho_{1}\cdot\big(\sqrt{2}\sigma_{1}\big)^{n}\cdot\Gamma\left(\frac{n}{2}+1,\,\ln\rho_{1}\right)
\end{split}
\]

for any $n\geq 1$, where $\Gamma(\cdot\,,\cdot)$ denotes the upper incomplete gamma function. When $n=2k$ with $k=1,2,3,\dots$, we have

\[
\mathbb{E}\big[X^{2k}\big]\leq\rho_{1}\cdot 2^{k}\sigma_{1}^{2k}\cdot\Gamma\left(k+1,\,\ln\rho_{1}\right)=\sum_{i=0}^{k}\frac{(\ln\rho_{1})^{i}}{i!}\cdot 2^{k}\sigma_{1}^{2k}\cdot k!,
\]

which is slightly better than the result $\mathbb{E}\big[X^{2k}\big]\leq\rho_{1}\cdot 2^{k}\sigma_{1}^{2k}\cdot k!$ in the derivation for implication (1) $\implies$ (2). Furthermore, we find

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X^{2}/2\sigma_{1}^{2})\right]&=\sum_{k=0}^{\infty}\frac{\lambda^{k}\,\mathbb{E}[X^{2k}]}{k!\,2^{k}\sigma_{1}^{2k}}\leq\sum_{k=0}^{\infty}\lambda^{k}\sum_{i=0}^{k}\frac{(\ln\rho_{1})^{i}}{i!}\\
&=\sum_{i=0}^{\infty}\frac{(\ln\rho_{1})^{i}}{i!}\sum_{k=i}^{\infty}\lambda^{k}=\frac{1}{1-\lambda}\sum_{i=0}^{\infty}\frac{(\lambda\ln\rho_{1})^{i}}{i!}=\frac{\rho_{1}^{\lambda}}{1-\lambda},
\end{split}
\]

for all $\lambda\in[0,1)$. Therefore, we can freely select any $\lambda\in(0,1)$, and property (3) holds with $C_{31}=\sqrt{2/\lambda}$ and $\varphi_{31}(\rho)=\rho^{\lambda}/(1-\lambda)$, where $\varphi_{31}(\rho)$ is slightly better than the composition $\varphi_{32}\circ\varphi_{21}(\rho)=1+\lambda\rho/(1-\lambda)$.
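The double-sum identity used in the last display can be verified numerically. The following sketch truncates the series $\sum_{k}\lambda^{k}\sum_{i\leq k}(\ln\rho_{1})^{i}/i!$ and compares it with the closed form $\rho_{1}^{\lambda}/(1-\lambda)$; it also confirms the claimed improvement over $1+\lambda\rho/(1-\lambda)$, which follows from $\rho^{\lambda}\leq 1-\lambda+\lambda\rho$. The function names and the truncation length are choices made here for illustration.

```python
import math
from typing import List


def partial_exp_sums(rho: float, terms: int) -> List[float]:
    """Return [f_0, ..., f_{terms-1}] with f_k = sum_{i<=k} (ln rho)^i / i!."""
    log_rho = math.log(rho)
    sums, term, total = [], 1.0, 0.0
    for i in range(terms):
        total += term
        sums.append(total)
        term *= log_rho / (i + 1)  # next term (ln rho)^(i+1) / (i+1)!
    return sums


def truncated_series(lam: float, rho: float, terms: int = 400) -> float:
    """Partial sum of sum_k lam^k f_k(rho)."""
    return sum(lam ** k * f for k, f in enumerate(partial_exp_sums(rho, terms)))


def closed_form(lam: float, rho: float) -> float:
    """Closed form rho^lam / (1 - lam) obtained in the proof."""
    return rho ** lam / (1.0 - lam)


if __name__ == "__main__":
    for lam, rho in ((0.5, 3.0), (0.9, 10.0)):
        print(lam, rho, truncated_series(lam, rho), closed_form(lam, rho))
```

The terms are built incrementally rather than through explicit factorials, which keeps the computation within floating-point range for long truncations.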

(4) $\implies$ (3) (according to [3]): Starting from property (4), we have

\[
\mathbb{E}\bigg[\exp\Big(xX-\frac{1}{2\lambda}\sigma_{4}^{2}\,x^{2}\Big)\bigg]\leq\rho_{4}\exp\bigg(\frac{1}{2}\Big(1-\frac{1}{\lambda}\Big)\sigma_{4}^{2}x^{2}\bigg)
\]

for all $x\in\mathbb{R}$ and $\lambda\in(0,1)$. By integrating both sides with respect to $x$ on $(-\infty,+\infty)$ and using Fubini’s theorem, we obtain

\[
\frac{\sqrt{2\lambda\pi}}{\sigma_{4}}\mathbb{E}\left[\exp\Big(\frac{\lambda X^{2}}{2\sigma_{4}^{2}}\Big)\right]\leq\frac{\rho_{4}}{\sigma_{4}}\cdot\frac{\sqrt{2\lambda\pi}}{\sqrt{1-\lambda}},
\]

which simplifies to

\[
\mathbb{E}\left[\exp\Big(\frac{\lambda X^{2}}{2\sigma_{4}^{2}}\Big)\right]\leq\frac{\rho_{4}}{\sqrt{1-\lambda}}
\]

for all $\lambda\in(0,1)$. Therefore, property (3) holds with $C_{34}=\sqrt{2/\lambda}$ and $\varphi_{34}(\rho)=\rho/\sqrt{1-\lambda}$. Meanwhile, the implication chain (4) $\implies$ (5) $\implies$ (1) $\implies$ (3) leads to $C_{31}C_{15}C_{54}=\sqrt{2/\lambda}$ and $\varphi_{31}\circ\varphi_{15}\circ\varphi_{54}(\rho)=(2\rho)^{\lambda}/(1-\lambda)$. We may finally choose

\begin{equation}
\varphi_{34}(\rho)=(1-\lambda)^{-1}\min\left\{\rho\sqrt{1-\lambda},\,(2\rho)^{\lambda}\right\}.
\tag{3}
\end{equation}
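A small sketch of Eq. (3) shows which of the two candidate bounds is active. The function name and sample values are illustrative assumptions. For small $\rho$ the direct bound $\rho/\sqrt{1-\lambda}$ wins, whereas for large $\rho$ the chained bound $(2\rho)^{\lambda}/(1-\lambda)$ becomes smaller because it grows only like $\rho^{\lambda}$.

```python
import math


def phi34(rho: float, lam: float) -> float:
    """Eq. (3): the smaller of the direct bound and the bound via (4)=>(5)=>(1)=>(3)."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    direct = rho / math.sqrt(1.0 - lam)          # from integrating the MGF bound
    chained = (2.0 * rho) ** lam / (1.0 - lam)   # from the chain of implications
    return min(direct, chained)


if __name__ == "__main__":
    for rho in (1.0, 2.0, 10.0, 100.0):
        print(f"rho={rho:6.1f}  phi34={phi34(rho, 0.5):.4f}")
```

The direct bound cannot be improved in general: for a standard normal $Z$ (with $\sigma_{4}=1$, $\rho_{4}=1$) one has $\mathbb{E}[\exp(\lambda Z^{2}/2)]=(1-\lambda)^{-1/2}$ exactly.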

 

Remark 2

If $X$ is either non-negative or non-positive, i.e., $\mathbb{P}\left[X\geq 0\right]=1$ or $\mathbb{P}\left[X\leq 0\right]=1$, then properties (1) and (5) in Theorem 1 will be the same. The corresponding table for $C_{ij}$ and $\varphi_{ij}(\rho)$ in Remark 1 is shown below, where $\lambda\in(0,1)$ is a parameter.

\[
\begin{array}{c|cccc}
\text{Concl.}\backslash\text{Hyp.} & (1) & (2) & (3) & (4)\\
\hline
(1) & & \sqrt{\frac{1}{2\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda} & \sqrt{\frac{1}{2}},\ \rho\ {}^{\dagger} & 1,\ \rho\ {}^{\dagger}\\
(2) & \sqrt{2},\ \rho\ {}^{\dagger} & & 1,\ \rho-1\ {}^{\dagger} & \sqrt{2},\ \rho\\
(3) & \sqrt{\frac{2}{\lambda}},\ \frac{1}{1-\lambda}\rho^{\lambda}\ {}^{\dagger} & \sqrt{\frac{1}{\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda}\ {}^{\dagger} & & \sqrt{\frac{2}{\lambda}},\ \min\left\{\frac{1}{\sqrt{1-\lambda}}\rho,\,\frac{1}{1-\lambda}\rho^{\lambda}\right\}\ {}^{\dagger}\\
(4) & \sqrt{\frac{1}{\lambda}},\ \frac{1}{1-\lambda}\rho^{\lambda} & \sqrt{\frac{1}{2\lambda}},\ 1+\frac{\lambda\rho}{1-\lambda} & \sqrt{\frac{1}{2}},\ \rho\ {}^{\dagger} &
\end{array}
\]

  • $\dagger$

    Entries derived with a proof. The others follow from the transitivity of implications.

2.2 Centered subgaussian variables

Definition 1 and the equivalent characterizations in Theorem 1 do not require subgaussian variables to be centered (by “subgaussian” we mean that any of the properties in Theorem 1 is satisfied). Only $(\sigma,1)$-subgaussian variables, as conventionally defined, are guaranteed to have a zero mean. Conversely, any centered subgaussian variable must be $(\sigma,1)$-subgaussian for some $\sigma$.

Theorem 2 (Centered subgaussian variables)

Let $X\in\mathbb{R}$ be a subgaussian variable, satisfying any of the properties in Theorem 1, and assume that $\mathbb{E}\left[X\right]=0$. Then $X$ must be $(\sigma,1)$-subgaussian, namely, its moment-generating function satisfies

\[
\mathbb{E}\left[\exp(\lambda X)\right]\leq\exp\Big(\frac{1}{2}\sigma^{2}\lambda^{2}\Big),
\]

for all $\lambda\in\mathbb{R}$, where $\sigma$ can be determined as follows:

  • (a)

    If property (3) in Theorem 1 is satisfied, then

    \[
    \frac{\sigma^{2}}{\sigma_{3}^{2}}=\left\{\begin{aligned}
    &\frac{1}{2}+\frac{9}{8}\ln\rho_{3},&&\textup{for }\ln\rho_{3}\in\big[0,4/9\big);\\
    &\frac{3}{2}\sqrt{\ln\rho_{3}},&&\textup{for }\ln\rho_{3}\in\big[4/9,16/9\big);\\
    &\frac{9}{8}\ln\rho_{3},&&\textup{for }\ln\rho_{3}\in\big[16/9,+\infty\big).
    \end{aligned}\right.
    \]

  • (b)

    If property (2) in Theorem 1 is satisfied, then

    \[
    \frac{\sigma^{2}}{\sigma_{2}^{2}}=\left\{\begin{aligned}
    &\frac{29}{45},&&\textup{for }\rho_{2}\in\big[0,29/90\big);\\
    &\frac{1}{2}\bigg[\sqrt{\Big(\rho_{2}-\frac{19}{90}\Big)^{2}+\frac{26}{15}\rho_{2}}+\bigg(\rho_{2}+\frac{19}{90}\bigg)\bigg],&&\textup{for }\rho_{2}\in\big[29/90,+\infty\big).
    \end{aligned}\right.
    \]

  • (c)

    If property (1) in Theorem 1 is satisfied, then

    \[
    \frac{\sigma^{2}}{\sigma_{1}^{2}}=\frac{1}{3}\big(8+7\ln\rho_{1}\big).
    \]

  • (d)

    If property (4) in Theorem 1 is satisfied, then

    \[
    \frac{\sigma^{2}}{\sigma_{4}^{2}}=\frac{9}{8}\cdot\frac{\ln\rho_{4}+\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}}{\ln\rho_{4}}\left[\ln\rho_{4}+\frac{1}{2}\ln\left(1+\ln\rho_{4}+\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}\right)\right].
    \]
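The four formulas of Theorem 2 can be transcribed directly into code. The sketch below does so with hypothetical function names and returns the ratio $\sigma^{2}/\sigma_{i}^{2}$ in each case. It assumes the parameter ranges stated in Theorem 1; in case (d) the printed expression is a $0/0$ form at $\rho_{4}=1$, so the sketch rejects that input rather than guessing a value.

```python
import math


def ratio_from_property_3(rho3: float) -> float:
    """Case (a): sigma^2 / sigma_3^2 for rho_3 >= 1."""
    if rho3 < 1.0:
        raise ValueError("rho_3 must be at least 1")
    log_rho = math.log(rho3)
    if log_rho < 4.0 / 9.0:
        return 0.5 + 9.0 / 8.0 * log_rho
    if log_rho < 16.0 / 9.0:
        return 1.5 * math.sqrt(log_rho)
    return 9.0 / 8.0 * log_rho


def ratio_from_property_2(rho2: float) -> float:
    """Case (b): sigma^2 / sigma_2^2 for rho_2 >= 0."""
    if rho2 < 0.0:
        raise ValueError("rho_2 must be non-negative")
    if rho2 < 29.0 / 90.0:
        return 29.0 / 45.0
    root = math.sqrt((rho2 - 19.0 / 90.0) ** 2 + 26.0 / 15.0 * rho2)
    return 0.5 * (root + rho2 + 19.0 / 90.0)


def ratio_from_property_1(rho1: float) -> float:
    """Case (c): sigma^2 / sigma_1^2 for rho_1 >= 1."""
    if rho1 < 1.0:
        raise ValueError("rho_1 must be at least 1")
    return (8.0 + 7.0 * math.log(rho1)) / 3.0


def ratio_from_property_4(rho4: float) -> float:
    """Case (d): sigma^2 / sigma_4^2; the printed formula needs ln(rho_4) > 0."""
    log_rho = math.log(rho4) if rho4 > 0.0 else -1.0
    if log_rho <= 0.0:
        raise ValueError("formula (d) requires rho_4 > 1")
    root = math.sqrt(log_rho ** 2 + 2.0 * log_rho)
    bracket = log_rho + 0.5 * math.log(1.0 + log_rho + root)
    return 9.0 / 8.0 * (log_rho + root) / log_rho * bracket


if __name__ == "__main__":
    print(ratio_from_property_3(math.e), ratio_from_property_2(1.0),
          ratio_from_property_1(2.0), ratio_from_property_4(2.0))
```

Both piecewise formulas are continuous at their breakpoints: case (a) gives $1$ at $\ln\rho_{3}=4/9$ and $2$ at $\ln\rho_{3}=16/9$, and case (b) gives $29/45$ at $\rho_{2}=29/90$.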

Proof 
Case (a) (adjusted from the proof for Lemma 2 in Ref. [8]): From the numerical relation $\exp(x)\leq x+\exp(9x^{2}/16)$ ($x\in\mathbb{R}$), we find

\begin{equation}
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&\leq\lambda\mathbb{E}\left[X\right]+\mathbb{E}\left[\exp(\tfrac{9}{16}\lambda^{2}X^{2})\right]\\
&\leq 0+\mathbb{E}\left[\exp(X^{2}/\sigma_{3}^{2})\right]^{\tfrac{9}{16}\lambda^{2}\sigma_{3}^{2}}\\
&\leq\exp\left(\tfrac{9}{16}\lambda^{2}\sigma_{3}^{2}\cdot\ln\rho_{3}\right)
\end{split}
\tag{4}
\end{equation}

for any $\lambda$ such that $9\lambda^{2}\sigma_{3}^{2}/16\leq 1$, i.e., $\lvert\lambda\rvert\sigma_{3}\leq 4/3$, where we used the assumption $\mathbb{E}\left[X\right]=0$, Jensen’s inequality, and property (3) in Theorem 1. On the other hand, from the inequality $\lambda X\leq x\lambda^{2}\sigma_{3}^{2}/4+x^{-1}X^{2}/\sigma_{3}^{2}$ for $x>0$, we find

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&\leq\mathbb{E}\left[\exp(x\lambda^{2}\sigma_{3}^{2}/4+x^{-1}X^{2}/\sigma_{3}^{2})\right]\\
&\leq\exp\left(x\lambda^{2}\sigma_{3}^{2}/4\right)\cdot\mathbb{E}\left[\exp(X^{2}/\sigma_{3}^{2})\right]^{1/x}\\
&\leq\exp\left(x\lambda^{2}\sigma_{3}^{2}/4+x^{-1}\ln\rho_{3}\right)
\end{split}
\]

for all $x\geq 1$, where we again used Jensen’s inequality and property (3). By minimizing the rightmost side with respect to $x$, we obtain

\begin{equation}
\mathbb{E}\left[\exp(\lambda X)\right]\leq\left\{\begin{aligned}
&\exp\Big(\sqrt{\ln\rho_{3}}\,\sigma_{3}\lvert\lambda\rvert\Big),&&\textup{for }\lvert\lambda\rvert\sigma_{3}\leq 2\sqrt{\ln\rho_{3}};\\
&\exp\Big(\tfrac{1}{4}\sigma_{3}^{2}\lambda^{2}+\ln\rho_{3}\Big),&&\textup{for }\lvert\lambda\rvert\sigma_{3}>2\sqrt{\ln\rho_{3}}.
\end{aligned}\right.
\tag{5}
\end{equation}

Combining inequalities (4) and (5), the upper bound of the moment-generating function of $X$ on $\lambda\in\mathbb{R}$ can be expressed as follows: When $2\sqrt{\ln\rho_{3}}\leq 4/3$ (i.e., $\ln\rho_{3}\leq 4/9$), we have

\[
\mathbb{E}\left[\exp(\lambda X)\right]\leq\left\{\begin{aligned}
&\exp\Big(\tfrac{9}{16}\ln\rho_{3}\cdot\sigma_{3}^{2}\lambda^{2}\Big),&&\textup{for }\lvert\lambda\rvert\sigma_{3}\leq 4/3;\\
&\exp\Big[\big(\tfrac{1}{4}+\tfrac{9}{16}\ln\rho_{3}\big)\sigma_{3}^{2}\,\lambda^{2}\Big],&&\textup{for }\lvert\lambda\rvert\sigma_{3}>4/3.
\end{aligned}\right.
\]

When $\ln\rho_{3}>4/9$, we have

\[
\mathbb{E}\left[\exp(\lambda X)\right]\leq\left\{\begin{aligned}
&\exp\Big(\tfrac{9}{16}\ln\rho_{3}\cdot\sigma_{3}^{2}\lambda^{2}\Big),&&\textup{for }\lvert\lambda\rvert\sigma_{3}\in\big[0,4/3\big];\\
&\exp\Big(\tfrac{3}{4}\sqrt{\ln\rho_{3}}\cdot\sigma_{3}^{2}\lambda^{2}\Big),&&\textup{for }\lvert\lambda\rvert\sigma_{3}\in\big(4/3,2\sqrt{\ln\rho_{3}}\,\big];\\
&\exp\Big(\tfrac{1}{2}\sigma_{3}^{2}\lambda^{2}\Big),&&\textup{for }\lvert\lambda\rvert\sigma_{3}\in\big(2\sqrt{\ln\rho_{3}},+\infty\big).
\end{aligned}\right.
\]

Note that

\[
\max\Big\{\tfrac{9}{16}\ln\rho_{3},\,\tfrac{3}{4}\sqrt{\ln\rho_{3}},\,\tfrac{1}{2}\Big\}=\left\{\begin{aligned}
&\tfrac{9}{16}\ln\rho_{3},&&\textup{for }\ln\rho_{3}>16/9;\\
&\tfrac{3}{4}\sqrt{\ln\rho_{3}},&&\textup{for }4/9\leq\ln\rho_{3}\leq 16/9.
\end{aligned}\right.
\]

By choosing the greatest value of the coefficient for $\sigma_{3}^{2}\lambda^{2}$ in each interval $\ln\rho_{3}\in[0,4/9)$, $\ln\rho_{3}\in[4/9,16/9)$, or $\ln\rho_{3}\in[16/9,+\infty)$, the claim of case (a) is proved.
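The two ingredients of case (a) lend themselves to a grid check. The sketch below, with $\sigma_{3}=1$ and hypothetical function names, first evaluates the gap in the numerical relation $\exp(x)\leq x+\exp(9x^{2}/16)$, and then confirms that the better of inequalities (4) and (5) never exceeds $\exp\big(\tfrac{1}{2}\sigma^{2}\lambda^{2}\big)$ with $\sigma^{2}$ as stated in the theorem. This is a numerical illustration only, not a substitute for the proof.

```python
import math


def relation_gap(x: float) -> float:
    """x + exp(9 x^2 / 16) - exp(x); non-negative if the numerical relation holds."""
    return x + math.exp(9.0 * x * x / 16.0) - math.exp(x)


def log_mgf_envelope(lam: float, ln_rho: float) -> float:
    """Best bound on ln E[exp(lam X)] available from (4) and (5), taking sigma_3 = 1."""
    a = abs(lam)
    # Inequality (5): piecewise bound valid for every lam.
    if a <= 2.0 * math.sqrt(ln_rho):
        bound = math.sqrt(ln_rho) * a
    else:
        bound = 0.25 * lam * lam + ln_rho
    # Inequality (4): available only when |lam| <= 4/3.
    if a <= 4.0 / 3.0:
        bound = min(bound, 9.0 / 16.0 * lam * lam * ln_rho)
    return bound


def case_a_coefficient(ln_rho: float) -> float:
    """sigma^2 / (2 sigma_3^2) according to case (a) of Theorem 2."""
    if ln_rho < 4.0 / 9.0:
        return 0.25 + 9.0 / 16.0 * ln_rho
    if ln_rho < 16.0 / 9.0:
        return 0.75 * math.sqrt(ln_rho)
    return 9.0 / 16.0 * ln_rho


if __name__ == "__main__":
    print(min(relation_gap(0.01 * i) for i in range(-500, 501)))
    print(log_mgf_envelope(1.5, 1.0), case_a_coefficient(1.0) * 1.5 ** 2)
```

The gap vanishes at $x=0$, where both sides equal $1$, so comparisons on a grid containing the origin need a small tolerance.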

Case (b) (adjusted from [3]): The moments of odd orders of a random variable $Y$ can be bounded according to the Cauchy–Schwarz inequality, as

\[
\mathbb{E}\left[Y^{2k+1}\right]\leq\sqrt{\mathbb{E}\left[Y^{2k}\right]\,\mathbb{E}\left[Y^{2k+2}\right]}\leq\frac{1}{2}\left(x\mathbb{E}\big[Y^{2k}\big]+x^{-1}\mathbb{E}\big[Y^{2k+2}\big]\right)
\]

for all $x>0$ and $k=0,1,2,\dots$. Applying this for $\lambda X$ and substituting it into the power series expansion of $\mathbb{E}\left[\exp(\lambda X)\right]$, we find

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&=1+\lambda\mathbb{E}\left[X\right]+\sum_{k=1}^{\infty}\left\{\frac{\mathbb{E}\left[(\lambda X)^{2k}\right]}{(2k)!}+\frac{\mathbb{E}\left[(\lambda X)^{2k+1}\right]}{(2k+1)!}\right\}\\
&\leq 1+\sum_{k=1}^{\infty}\left\{\frac{\lambda^{2k}\mathbb{E}[X^{2k}]}{(2k)!}+\frac{x_{k}\lambda^{2k}\mathbb{E}[X^{2k}]+x_{k}^{-1}\lambda^{2k+2}\mathbb{E}[X^{2k+2}]}{2\cdot(2k+1)!}\right\}\\
&\leq 1+\rho_{2}\sum_{k=1}^{\infty}\left\{\frac{\lambda^{2k}\sigma_{2}^{2k}\,k!}{(2k)!}+\frac{x_{k}\lambda^{2k}\sigma_{2}^{2k}\,k!+x_{k}^{-1}\lambda^{2k+2}\sigma_{2}^{2k+2}\,(k+1)!}{2\cdot(2k+1)!}\right\}\\
&=1+\rho_{2}\cdot\frac{\lambda^{2}\sigma_{2}^{2}}{2!}\left(1+\frac{x_{1}}{6}\right)+\rho_{2}\sum_{k=2}^{\infty}\frac{\lambda^{2k}\sigma_{2}^{2k}k!}{(2k)!}\left(\frac{k}{x_{k-1}}+1+\frac{x_{k}}{4k+2}\right)\\
&=1+\sum_{k=1}^{\infty}c_{k}\frac{\lambda^{2k}\sigma_{2}^{2k}}{k!},
\end{split}
\]

where we have used the assumption $\mathbb{E}\left[X\right]=0$ and property (2), and have defined coefficients

\[
c_{k}\coloneqq\left\{\begin{aligned}
&\frac{1}{2}\left(1+\frac{x_{1}}{6}\right)\rho_{2},&&\textup{for }k=1;\\
&\frac{(k!)^{2}}{(2k)!}\left(\frac{k}{x_{k-1}}+1+\frac{x_{k}}{4k+2}\right)\rho_{2},&&\textup{for }k=2,3,\dots,
\end{aligned}\right.
\]

where $x_{k}>0$ are to be determined. By choosing $x_{k}=2(k+1)$ for $k\geq 2$, we have

\[
c_{k}=\left\{\begin{aligned}
&\frac{1}{2}\left(1+\frac{x_{1}}{6}\right)\rho_{2},&&\textup{for }k=1;\\
&\frac{1}{6}\left(\frac{2}{x_{1}}+\frac{8}{5}\right)\rho_{2},&&\textup{for }k=2;\\
&\frac{(k!)^{2}}{(2k)!}\left(2+\frac{1}{4k+2}\right)\rho_{2},&&\textup{for }k=3,4,\dots,
\end{aligned}\right.
\]

and

\[
\left\{\begin{aligned}
c_{1}&=\frac{1}{2}\left(1+\frac{x_{1}}{6}\right)\rho_{2};\\
\frac{c_{2}}{c_{1}}&=\frac{1}{3}\cdot\frac{2/x_{1}+8/5}{1+x_{1}/6}=\frac{2}{3x_{1}}\frac{1+\frac{4}{5}x_{1}}{1+\frac{1}{6}x_{1}}\leq\frac{2}{3x_{1}}\left[\frac{19}{120}\Big(x_{1}-6\Big)+\frac{29}{10}\right],&&\textup{when }x_{1}\in(0,6];\\
\frac{c_{3}}{c_{2}}&=\frac{3}{10}\cdot\frac{2+1/14}{2/x_{1}+8/5}\leq\frac{9}{28}<\frac{29}{90},&&\textup{when }x_{1}\in(0,6];\\
\frac{c_{k+1}}{c_{k}}&=\frac{(k+1)}{4(k+3/2)}\cdot\frac{(8k+13)}{(8k+5)}<0.29,&&\textup{for }k=3,4,\dots.
\end{aligned}\right.
\]

Now we need to choose the value of $x_{1}$ appropriately and find a common upper bound for $c_{1}$, $c_{2}/c_{1}$, and all $c_{k+1}/c_{k}$ ($k\geq 2$). When $\rho_{2}\leq 29/90$, we can choose $x_{1}=6$, so that $c_{1}\leq 29/90$ and $c_{k+1}/c_{k}\leq 29/90$ for all $k\geq 1$. When $\rho_{2}>29/90$, we choose $x_{1}=x_{1}^{\ast}\in(0,6)$ such that

\begin{equation}
\frac{1}{2}\left(1+\frac{x_{1}^{\ast}}{6}\right)\rho_{2}=\frac{2}{3x_{1}^{\ast}}\left[\frac{19}{120}\Big(x_{1}^{\ast}-6\Big)+\frac{29}{10}\right]>\frac{29}{90},
\tag{6}
\end{equation}

which leads to an explicit solution

\[
x_{1}^{\ast}=\frac{3}{\rho_{2}}\bigg[\sqrt{\Big(\rho_{2}-\frac{19}{90}\Big)^{2}+\frac{26}{15}\rho_{2}}-\left(\rho_{2}-\frac{19}{90}\right)\bigg]
\]

and accordingly

\[
c_{1}^{\ast}=\frac{1}{4}\bigg[\sqrt{\Big(\rho_{2}-\frac{19}{90}\Big)^{2}+\frac{26}{15}\rho_{2}}+\left(\rho_{2}+\frac{19}{90}\right)\bigg].
\]

With such choices of $x_{k}$ ($k=1,2,3,\dots$), we finally have

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&\leq 1+\sum_{k=1}^{\infty}c_{k}\frac{\lambda^{2k}\sigma_{2}^{2k}}{k!}\leq 1+\sum_{k=1}^{\infty}\frac{\lambda^{2k}\sigma_{2}^{2k}}{k!}\max\left\{\frac{29}{90},c_{1}^{\ast}\right\}^{k}\\
&\leq\exp\left(\max\Big\{\frac{29}{90},c_{1}^{\ast}\Big\}\sigma_{2}^{2}\lambda^{2}\right),
\end{split}
\]

which proves the claim of case (b).
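The algebra behind $x_{1}^{\ast}$ and $c_{1}^{\ast}$ is easy to get wrong by hand, so a numerical cross-check may be helpful. The sketch below, using hypothetical function names, evaluates both sides of Eq. (6) at $x_{1}^{\ast}$, compares them with $c_{1}^{\ast}$, and evaluates the ratio $c_{k+1}/c_{k}$ for $k\geq 3$ against the stated bound $0.29$.

```python
import math
from typing import Tuple


def _root(rho2: float) -> float:
    """Common square root appearing in x_1^* and c_1^*."""
    return math.sqrt((rho2 - 19.0 / 90.0) ** 2 + 26.0 / 15.0 * rho2)


def x1_star(rho2: float) -> float:
    """Explicit solution of Eq. (6), intended for rho_2 > 29/90."""
    return 3.0 / rho2 * (_root(rho2) - (rho2 - 19.0 / 90.0))


def c1_star(rho2: float) -> float:
    """Value of c_1 at x_1 = x_1^*."""
    return 0.25 * (_root(rho2) + rho2 + 19.0 / 90.0)


def eq6_sides(rho2: float, x1: float) -> Tuple[float, float]:
    """Left- and right-hand sides of the equality in Eq. (6)."""
    lhs = 0.5 * (1.0 + x1 / 6.0) * rho2
    rhs = 2.0 / (3.0 * x1) * (19.0 / 120.0 * (x1 - 6.0) + 29.0 / 10.0)
    return lhs, rhs


def tail_ratio(k: int) -> float:
    """c_{k+1} / c_k for k >= 3 under the choice x_k = 2 (k + 1)."""
    return (k + 1) / (4.0 * (k + 1.5)) * (8.0 * k + 13.0) / (8.0 * k + 5.0)


if __name__ == "__main__":
    for rho2 in (0.5, 1.0, 2.0, 10.0):
        x = x1_star(rho2)
        print(rho2, x, eq6_sides(rho2, x), c1_star(rho2))
    print(max(tail_ratio(k) for k in range(3, 1000)))
```

At the boundary $\rho_{2}=29/90$ the formula gives $x_{1}^{\ast}=6$ and $c_{1}^{\ast}=29/90$, which matches the choice made for smaller $\rho_{2}$.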

Case (c): Property (1) in Theorem 1 indicates 𝔼[X2k]fk(ρ1)(2σ1)2kk!\mathbb{E}\,[X^{2k}]\leq f_{k}(\rho_{1})\cdot(\sqrt{2}\sigma_{1})^{2k}\cdot k!, where fk(ρ1)i=0k(lnρ1)i/i!f_{k}(\rho_{1})\coloneqq\sum_{i=0}^{k}(\ln\rho_{1})^{i}/i! for k=1,2,3,k=1,2,3,\dots, as shown in the proof for (1)\implies(3) therein. Similar to the proof above for case (b), it is easy to find

𝔼[exp(λX)]1+k=1{λ2k𝔼[X2k](2k)!+xkλ2k𝔼[X2k]+xk1λ2k+2𝔼[X2k+2]2(2k+1)!}1+k=1{(2λσ1)2kfk(ρ1)k!(2k)!+xk(2λσ1)2kfk(ρ1)k!+xk1(2λσ1)2k+2fk+1(ρ1)(k+1)!2(2k+1)!}=1+(2λσ1)22!(1+x16)f1(ρ1)+k=2(2λσ1)2kk!(2k)!(kxk1+1+xk4k+2)fk(ρ1)=1+k=1ck(ρ1)(2λ2σ12)kk!\begin{split}&\mathbb{E}\left[\exp(\lambda X)\right]\leq 1+\sum_{k=1}^{\infty}\left\{\frac{\lambda^{2k}\mathbb{E}[X^{2k}]}{(2k)!}+\frac{x_{k}\lambda^{2k}\mathbb{E}[X^{2k}]+x_{k}^{-1}\lambda^{2k+2}\mathbb{E}[X^{2k+2}]}{2\cdot(2k+1)!}\right\}\\[-2.0pt] \leq\,&1+\sum_{k=1}^{\infty}\left\{\frac{\big{(}\sqrt{2}\lambda\sigma_{1}\big{)}^{2k}f_{k}(\rho_{1})\,k!}{(2k)!}+\frac{x_{k}\big{(}\sqrt{2}\lambda\sigma_{1}\big{)}^{2k}f_{k}(\rho_{1})\,k!+x_{k}^{-1}\big{(}\sqrt{2}\lambda\sigma_{1}\big{)}^{2k+2}f_{k+1}(\rho_{1})\,(k+1)!}{2\cdot(2k+1)!}\right\}\\[-2.0pt] =\,&1+\frac{\big{(}\sqrt{2}\lambda\sigma_{1}\big{)}^{2}}{2!}\left(1+\frac{x_{1}}{6}\right)f_{1}(\rho_{1})+\sum_{k=2}^{\infty}\frac{\big{(}\sqrt{2}\lambda\sigma_{1}\big{)}^{2k}k!}{(2k)!}\left(\frac{k}{x_{k-1}}+1+\frac{x_{k}}{4k+2}\right)f_{k}(\rho_{1})\\[-2.0pt] =\,&1+\sum_{k=1}^{\infty}c_{k}(\rho_{1})\frac{\big{(}2\lambda^{2}\sigma_{1}^{2}\big{)}^{k}}{k!}\end{split}

for all $\lambda\in\mathbb{R}$, where, by choosing $x_{k}=2(k+1)$ for $k\geq 2$, the coefficients $c_{k}$ are given by

\[
c_{k}(\rho_{1})=\left\{\begin{aligned}
&\frac{1}{2}\left(1+\frac{x_{1}}{6}\right)f_{1}(\rho_{1}),~&&\textup{for }k=1;\\
&\frac{1}{6}\left(\frac{2}{x_{1}}+\frac{8}{5}\right)f_{2}(\rho_{1}),~&&\textup{for }k=2;\\
&\frac{(k!)^{2}}{(2k)!}\left(2+\frac{1}{4k+2}\right)f_{k}(\rho_{1}),~&&\textup{for }k=3,4,\dots.
\end{aligned}\right.
\]

Following from the inequality

\[\Big(1+\frac{x}{k+1}\Big)\sum_{i=0}^{k}\frac{x^{i}}{i!}\geq\sum_{i=0}^{k+1}\frac{x^{i}}{i!}\]

for $x\geq 0$ and $k\geq 0$, which gives $f_{k+1}(\rho_{1})/f_{k}(\rho_{1})\leq 1+\ln\rho_{1}/(k+1)$ for $\rho_{1}\geq 1$, we have

\[
\left\{\begin{aligned}
c_{1}&=\frac{1}{2}\left(1+\frac{x_{1}}{6}\right)\left(1+\ln\rho_{1}\right);\\
\frac{c_{2}}{c_{1}}&=\frac{1}{3}\cdot\frac{2/x_{1}+8/5}{1+x_{1}/6}\cdot\frac{f_{2}(\rho_{1})}{f_{1}(\rho_{1})}\leq\frac{2}{3x_{1}}\cdot\frac{1+\frac{5}{6}x_{1}}{1+\frac{1}{6}x_{1}}\cdot\frac{f_{2}(\rho_{1})}{f_{1}(\rho_{1})}\\
&\leq\frac{2}{3x_{1}}\left[\frac{3}{8}\Big(x_{1}-2\Big)+2\right]\left(1+\frac{1}{2}\ln\rho_{1}\right),~&&\textup{when }x_{1}\in(0,2];\\
\frac{c_{3}}{c_{2}}&=\frac{3}{10}\cdot\frac{2+1/14}{2/x_{1}+8/5}\cdot\frac{f_{3}(\rho_{1})}{f_{2}(\rho_{1})}\leq\frac{87}{364}\left(1+\frac{1}{3}\ln\rho_{1}\right),~&&\textup{when }x_{1}\in(0,2];\\
\frac{c_{k+1}}{c_{k}}&=\frac{(k+1)}{4(k+3/2)}\cdot\frac{(8k+13)}{(8k+5)}\cdot\frac{f_{k+1}(\rho_{1})}{f_{k}(\rho_{1})}<0.29\left(1+\frac{1}{4}\ln\rho_{1}\right),~&&\textup{for }k=3,4,\dots.
\end{aligned}\right.
\]

By choosing $x_{1}=x_{1}^{\ast}\in(0,2]$ such that

\[\frac{1}{2}\left(1+\frac{x_{1}^{\ast}}{6}\right)\left(1+\ln\rho_{1}\right)=\frac{2}{3x_{1}^{\ast}}\left[\frac{3}{8}\Big(x_{1}^{\ast}-2\Big)+2\right]\left(1+\frac{1}{2}\ln\rho_{1}\right)\geq\frac{2}{3}, \tag{7}\]

which leads to an explicit solution

\[x_{1}^{\ast}=\frac{\sqrt{196+348\ln\rho_{1}+161(\ln\rho_{1})^{2}}-(6+9\ln\rho_{1})}{4(1+\ln\rho_{1})},\]

we have $c_{1}(\rho_{1})\leq c_{1}^{\ast}(\rho_{1})$ and $c_{k+1}(\rho_{1})/c_{k}(\rho_{1})\leq c_{1}^{\ast}(\rho_{1})$ for all $k\geq 1$ and all $\rho_{1}\geq 1$, where

\[c_{1}^{\ast}(\rho_{1})=\frac{\sqrt{196+348\ln\rho_{1}+161(\ln\rho_{1})^{2}}+18+15\ln\rho_{1}}{48}\leq\frac{7\ln\rho_{1}+8}{12}.\]

Therefore, we finally have

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&\leq 1+\sum_{k=1}^{\infty}c_{k}(\rho_{1})\frac{\big(2\lambda^{2}\sigma_{1}^{2}\big)^{k}}{k!}\leq 1+\sum_{k=1}^{\infty}c_{1}^{\ast}(\rho_{1})^{k}\frac{\big(2\lambda^{2}\sigma_{1}^{2}\big)^{k}}{k!}\\
&\leq\exp\left(2c_{1}^{\ast}(\rho_{1})\sigma_{1}^{2}\lambda^{2}\right)\leq\exp\left(\frac{7\ln\rho_{1}+8}{6}\sigma_{1}^{2}\lambda^{2}\right),
\end{split}
\]

which proves the claim of case (c).
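The closed-form expressions used in case (c) can be sanity-checked numerically. The following is a minimal sketch, not part of the original proof: the function names and the sample values of $\ln\rho_{1}$ are illustrative assumptions. It evaluates both sides of Eq. (7) at $x_{1}^{\ast}$ and compares them with $c_{1}^{\ast}(\rho_{1})$ and the simpler bound $(7\ln\rho_{1}+8)/12$.

```python
import math

def x1_star(log_rho):
    """Closed-form x_1^* quoted after Eq. (7); log_rho = ln(rho_1) >= 0."""
    root = math.sqrt(196 + 348 * log_rho + 161 * log_rho**2)
    return (root - (6 + 9 * log_rho)) / (4 * (1 + log_rho))

def c1_star(log_rho):
    """Closed-form c_1^*(rho_1)."""
    root = math.sqrt(196 + 348 * log_rho + 161 * log_rho**2)
    return (root + 18 + 15 * log_rho) / 48

def eq7_sides(log_rho):
    """Both sides of the equality in Eq. (7), evaluated at x_1 = x_1^*."""
    x = x1_star(log_rho)
    lhs = 0.5 * (1 + x / 6) * (1 + log_rho)
    rhs = 2 / (3 * x) * (3 / 8 * (x - 2) + 2) * (1 + log_rho / 2)
    return lhs, rhs

# Illustrative values of ln(rho_1); any nonnegative value can be used.
for log_rho in (0.0, 0.1, 1.0, 5.0, 50.0):
    lhs, rhs = eq7_sides(log_rho)
    print(f"ln(rho_1)={log_rho:5.1f}  x1*={x1_star(log_rho):.4f}  "
          f"lhs={lhs:.6f}  rhs={rhs:.6f}  c1*={c1_star(log_rho):.6f}  "
          f"upper={(7 * log_rho + 8) / 12:.6f}")
```

In each row the two sides of Eq. (7) should agree with each other and with $c_{1}^{\ast}$, and the last column should not be smaller than $c_{1}^{\ast}$.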

Case (d): According to Theorem 1 and Remark 1, if property (4) is satisfied, then property (3) is satisfied, with $\sigma_{3}=\sigma_{4}\sqrt{2/x}$ and $\rho_{3}=\rho_{4}/\sqrt{1-x}$ for any $x\in(0,1)$. Furthermore, given $\mathbb{E}\left[X\right]=0$, the proof for case (a) indicates

\[\mathbb{E}\left[\exp(\lambda X)\right]\leq\exp\left(\frac{9}{16}\lambda^{2}\sigma_{3}^{2}\cdot\ln\rho_{3}\right)=\exp\left(\lambda^{2}\sigma_{4}^{2}\cdot\frac{9}{8x}\ln\frac{\rho_{4}}{\sqrt{1-x}}\right)\]

for all $\lambda$ such that $\lvert\lambda\rvert\sigma_{3}\leq 4/3$, i.e., $\lvert\lambda\rvert\sigma_{4}\leq\frac{4}{3}\sqrt{x/2}$. On the other hand, for all $\lambda$ such that $\lvert\lambda\rvert\sigma_{4}\geq\frac{4}{3}\sqrt{x/2}$, so that $9\sigma_{4}^{2}\lambda^{2}/(8x)\geq 1$, we have

\[\mathbb{E}\left[\exp(\lambda X)\right]\leq\exp\Big(\frac{1}{2}\sigma_{4}^{2}\lambda^{2}+\ln\rho_{4}\Big)\leq\exp\Big(\frac{1}{2}\sigma_{4}^{2}\lambda^{2}+\frac{9\sigma_{4}^{2}\lambda^{2}}{8x}\ln\rho_{4}\Big)=\exp\Big(\sigma_{4}^{2}\lambda^{2}\cdot\frac{4x+9\ln\rho_{4}}{8x}\Big).\]

Also, we have $9(\ln\rho_{4}-\ln\sqrt{1-x})\geq 9\ln\rho_{4}+4x$ for $x\in(0,1)$, since $-9\ln(1-x)\geq 9x\geq 8x$. Therefore, for all $\lambda\in\mathbb{R}$ we have

\[\mathbb{E}\left[\exp(\lambda X)\right]\leq\exp\left(\frac{9}{8}\lambda^{2}\sigma_{4}^{2}\min_{x\in(0,1)}\frac{1}{x}\ln\frac{\rho_{4}}{\sqrt{1-x}}\right),\]

where the minimizing $x$ tends to $0$ and $1$ as $\rho_{4}$ approaches $1$ and $+\infty$, respectively. Since the minimizing $x$ does not have an explicit expression, we choose the following surrogate

\[x^{\ast}=\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}-\ln\rho_{4}=\frac{2\ln\rho_{4}}{\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}+\ln\rho_{4}}, \tag{8}\]

which leads to $(1-x^{\ast})^{-1}=1+\ln\rho_{4}+\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}$, and thus

\[
\begin{split}
&\min_{x\in(0,1)}\frac{1}{x}\ln\frac{\rho_{4}}{\sqrt{1-x}}\leq\frac{1}{x^{\ast}}\ln\frac{\rho_{4}}{\sqrt{1-x^{\ast}}}\\
&\quad=\frac{\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}+\ln\rho_{4}}{2\ln\rho_{4}}\left[\ln\rho_{4}+\frac{1}{2}\ln\left(1+\ln\rho_{4}+\sqrt{(\ln\rho_{4})^{2}+2\ln\rho_{4}}\right)\right].
\end{split}
\]

Therefore, the claim in case (d) is proved.  
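To get a feeling for how much is lost by using the surrogate $x^{\ast}$ of Eq. (8) instead of the true minimizer, one can compare the two numerically. The sketch below is only an illustration: the helper names, the grid resolution, and the sample values of $\ln\rho_{4}$ are assumptions made here. As a side observation that is not claimed in the source, $x^{\ast}$ coincides with the positive root of $x^{2}/(2(1-x))=\ln\rho_{4}$, which is what the stationarity condition of the objective becomes when $\ln(1-x)$ is replaced by $-x$.

```python
import math

def objective(x, log_rho):
    """(1/x) * ln(rho_4 / sqrt(1 - x)) for 0 < x < 1, with log_rho = ln(rho_4) > 0."""
    return (log_rho - 0.5 * math.log(1 - x)) / x

def x_surrogate(log_rho):
    """Surrogate minimizer x^* from Eq. (8)."""
    return math.sqrt(log_rho**2 + 2 * log_rho) - log_rho

def grid_min(log_rho, n=20000):
    """Brute-force minimum of the objective over a uniform grid in (0, 1)."""
    return min(objective(k / n, log_rho) for k in range(1, n))

# Illustrative values of ln(rho_4).
for log_rho in (0.01, 1.0, 5.0):
    xs = x_surrogate(log_rho)
    print(f"ln(rho_4)={log_rho:5.2f}  x*={xs:.4f}  "
          f"objective(x*)={objective(xs, log_rho):.4f}  "
          f"grid minimum={grid_min(log_rho):.4f}")
```

For these sample values the surrogate stays within a few percent of the grid minimum, which is consistent with its use as an explicit replacement for the minimizer.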

Note that the $\sigma^{2}$ values for $(\sigma,1)$-subgaussian variables, explicitly provided in Theorem 2, are derived directly from each of the subgaussian properties and are not meant to be optimal. By translating between different equivalent properties, one can potentially find a better $\sigma$, as demonstrated in the following example.

Example 1

Assume that $X$ is centered and satisfies $\mathbb{E}\left[X^{2k}\right]\leq\rho_{2}\cdot\sigma_{2}^{2k}\cdot k!$ for $k=1,2,3,\dots$, where $\rho_{2}$ takes one of the values $5\times 10^{-3}$, $1$, or $10$. According to case (b) of Theorem 2, $X$ is $(0.803\,\sigma_{2},1)$-, $(1.18\,\sigma_{2},1)$-, or $(3.23\,\sigma_{2},1)$-subgaussian for each respective value of $\rho_{2}$.

Meanwhile, Theorem 1 and Remark 1 imply that $\mathbb{E}\left[\exp(9X^{2}/10\sigma_{2}^{2})\right]\leq 1+9\rho_{2}$. Consequently, case (a) of Theorem 2 indicates that $X$ is $(0.782\,\sigma_{2},1)$-, $(1.70\,\sigma_{2},1)$-, or $(2.38\,\sigma_{2},1)$-subgaussian, respectively. The second route thus yields a smaller $\sigma$ for $\rho_{2}=5\times 10^{-3}$ and $\rho_{2}=10$, but a larger one for $\rho_{2}=1$.
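The bound $1+9\rho_{2}$ used in Example 1 can also be recovered by a direct power-series computation. The following is a sketch under the moment assumption of the example only; the symbol $a\in(0,1)$ is introduced here for illustration, and the termwise expectation is justified by monotone convergence since all terms are nonnegative.

```latex
\begin{align*}
\mathbb{E}\left[\exp\left(\frac{aX^{2}}{\sigma_{2}^{2}}\right)\right]
&=1+\sum_{k=1}^{\infty}\frac{a^{k}\,\mathbb{E}\left[X^{2k}\right]}{\sigma_{2}^{2k}\,k!}
\leq 1+\rho_{2}\sum_{k=1}^{\infty}a^{k}
=1+\frac{a\,\rho_{2}}{1-a}.
\end{align*}
```

Setting $a=9/10$ gives $a/(1-a)=9$, hence the stated bound $1+9\rho_{2}$.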

While some texts define subgaussian variables by Eq. (2) with $\rho$ fixed at $1$, others [3, 6] generalize the concept by allowing nonzero expectations. They consider $X$ subgaussian if its centered version, $X-\mathbb{E}\left[X\right]$, satisfies Definition 1 with $\rho=1$; this implies that a subgaussian variable plus any constant remains subgaussian. This treatment is in fact “equivalent” to the definition of $(\sigma,\rho)$-subgaussian variables provided here, as will be clarified in Theorem 3 and Corollary 1.

Theorem 3

Let $X\in\mathbb{R}$ be a $(\sigma,\rho)$-subgaussian random variable and $c\in\mathbb{R}$ be a constant. Then $X+c$ is $(\sigma^{\prime},\rho^{\prime})$-subgaussian, with

\[\frac{\sigma^{\prime}}{\sigma}=\sqrt{1+x},\quad\frac{\rho^{\prime}}{\rho}=\exp\Big(\frac{c^{2}}{2x\sigma^{2}}\Big), \tag{9}\]

where $x$ is an arbitrary positive number.

Corollary 1

A random variable $X\in\mathbb{R}$ is $(\sigma,\rho)$-subgaussian for some constants $\sigma>0$ and $\rho\geq 1$, if and only if $X-\mathbb{E}\left[X\right]$ is $(\sigma^{\prime},1)$-subgaussian for some $\sigma^{\prime}>0$.

Proof  Given the assumptions in Theorem 3, we have

\[
\begin{split}
\mathbb{E}\left[\exp\big(\lambda(X+c)\big)\right]&=\exp\left(c\lambda\right)\mathbb{E}\left[\exp\left(\lambda X\right)\right]\\
&\leq\rho\exp\Big(\frac{1}{2}\sigma^{2}\lambda^{2}+c\lambda\Big)\\
&\leq\rho\exp\left(\frac{1}{2}\sigma^{2}\lambda^{2}+c\lambda+\frac{1}{2}x\sigma^{2}\Big(\lambda-\frac{c}{x\sigma^{2}}\Big)^{2}\right)\\
&=\rho\exp\left(\frac{1}{2}\big(1+x\big)\sigma^{2}\lambda^{2}+\frac{c^{2}}{2x\sigma^{2}}\right)
\end{split}
\]

for all $\lambda\in\mathbb{R}$ and $x>0$, hence proving Theorem 3. Applying Theorem 3 with $c=\pm\,\mathbb{E}\left[X\right]$ and Theorem 2 leads to the corollary.
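Theorem 3 can be illustrated with a case where the moment-generating function is known exactly. The sketch below assumes $X\sim N(0,\sigma^{2})$, which is $(\sigma,1)$-subgaussian with $\mathbb{E}[\exp(\lambda X)]=\exp(\sigma^{2}\lambda^{2}/2)$; the function names and the numerical values of $\sigma$, $c$, and $x$ are illustrative choices, not taken from the source.

```python
import math

def mgf_shifted_gaussian(lam, sigma, c):
    """Exact E[exp(lam * (X + c))] for X ~ N(0, sigma^2)."""
    return math.exp(c * lam + 0.5 * sigma**2 * lam**2)

def theorem3_rhs(lam, sigma, rho, c, x):
    """rho' * exp(sigma'^2 * lam^2 / 2) with (sigma', rho') taken from Eq. (9)."""
    sigma_prime = sigma * math.sqrt(1 + x)
    rho_prime = rho * math.exp(c**2 / (2 * x * sigma**2))
    return rho_prime * math.exp(0.5 * sigma_prime**2 * lam**2)

sigma, rho, c = 1.5, 1.0, 2.0            # illustrative parameters
lams = [k / 10 for k in range(-50, 51)]  # grid of lambda values in [-5, 5]

for x in (0.1, 1.0, 10.0):
    worst = max(mgf_shifted_gaussian(l, sigma, c) / theorem3_rhs(l, sigma, rho, c, x)
                for l in lams)
    lam_eq = c / (x * sigma**2)          # the square in the proof vanishes here
    ratio_eq = mgf_shifted_gaussian(lam_eq, sigma, c) / theorem3_rhs(lam_eq, sigma, rho, c, x)
    print(f"x={x:5.1f}  max ratio on grid={worst:.6f}  ratio at lam=c/(x sigma^2)={ratio_eq:.6f}")
```

The ratio never exceeds one, and it reaches one at $\lambda=c/(x\sigma^{2})$, so for Gaussian $X$ the bound of Theorem 3 is attained at that value of $\lambda$. A small $x$ keeps $\sigma^{\prime}$ close to $\sigma$ at the price of a large $\rho^{\prime}$, and vice versa.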

2.3 Closure under simple operations

Theorem 3 may be viewed as a specific instance of the closure of subgaussianity under summation, as a constant $c\in\mathbb{R}$ is trivially a $\big(\sigma,\exp(c^{2}/2\sigma^{2})\big)$-subgaussian variable for any $\sigma>0$, according to Definition 1. Some more general cases of the closure of subgaussianity are demonstrated by Theorems 4 and 5, where the discussion is based on properties (4) and (3) in Theorem 1, respectively. The closure of subgaussianity can also be expressed with respect to the other subgaussian properties, potentially by introducing additional absolute constants.

Theorem 4

Let $X_{i}\in\mathbb{R}$ $(i=1,2,3,\dots,n)$ be $n$ random variables that are $(\sigma_{i},\rho_{i})$-subgaussian, respectively. Then we have

  • (i)

    $X\coloneqq\sum_{i=1}^{n}X_{i}$ is $(\sigma,\rho)$-subgaussian, with

    \[\sigma=\sum_{i=1}^{n}\sigma_{i},\quad\ln\rho=\frac{\sum_{i=1}^{n}\sigma_{i}\ln\rho_{i}}{\sum_{i=1}^{n}\sigma_{i}};\]
  • (ii)

    $X\coloneqq\sum_{i=1}^{n}X_{i}$, where all $X_{i}$ are independent of one another, is $(\sigma,\rho)$-subgaussian, with

    \[\sigma=\sqrt{\textstyle\sum_{i=1}^{n}\sigma_{i}^{2}},\quad\rho=\prod_{i=1}^{n}\rho_{i};\]
  • (iii)

    $X\coloneqq\max_{i=1,2,\dots,n}\{X_{i}\}$ is $(\sigma,\rho)$-subgaussian, with

    \[\sigma=\max_{1\leq i\leq n}\{\sigma_{i}\},\quad\rho=\sum_{i=1}^{n}\rho_{i}.\]

Proof 
Case (i) (adjusted from Ref. [7]): Considering the case of $n=2$, we have

\[
\begin{split}
&\mathbb{E}\left[\exp(\lambda X)\right]=\mathbb{E}\left[\exp(\lambda X_{1})\,\exp(\lambda X_{2})\right]\\
&\leq\mathbb{E}\left[\exp\Big(\tfrac{\sigma_{1}+\sigma_{2}}{\sigma_{1}}\lambda X_{1}\Big)\right]^{\frac{\sigma_{1}}{\sigma_{1}+\sigma_{2}}}\mathbb{E}\left[\exp\Big(\tfrac{\sigma_{1}+\sigma_{2}}{\sigma_{2}}\lambda X_{2}\Big)\right]^{\frac{\sigma_{2}}{\sigma_{1}+\sigma_{2}}}\\
&\leq\left[\rho_{1}\exp\left(\frac{1}{2}\sigma_{1}^{2}\cdot\frac{\lambda^{2}(\sigma_{1}+\sigma_{2})^{2}}{\sigma_{1}^{2}}\right)\right]^{\frac{\sigma_{1}}{\sigma_{1}+\sigma_{2}}}\left[\rho_{2}\exp\left(\frac{1}{2}\sigma_{2}^{2}\cdot\frac{\lambda^{2}(\sigma_{1}+\sigma_{2})^{2}}{\sigma_{2}^{2}}\right)\right]^{\frac{\sigma_{2}}{\sigma_{1}+\sigma_{2}}}\\
&=\exp\left(\frac{1}{2}(\sigma_{1}+\sigma_{2})^{2}\lambda^{2}+\frac{\sigma_{1}\ln\rho_{1}+\sigma_{2}\ln\rho_{2}}{\sigma_{1}+\sigma_{2}}\right)
\end{split}
\]

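for all $\lambda\in\mathbb{R}$, where we applied Hölder’s inequality with conjugate exponents $(\sigma_{1}+\sigma_{2})/\sigma_{1}$ and $(\sigma_{1}+\sigma_{2})/\sigma_{2}$ for the first inequality. The conclusion generalizes to larger $n$ by induction.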

Case (ii): Given the assumption, we have

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]&=\mathbb{E}\Big[{\textstyle\prod_{i=1}^{n}}\exp(\lambda X_{i})\Big]={\textstyle\prod_{i=1}^{n}}\mathbb{E}\left[\exp(\lambda X_{i})\right]\\
&\leq{\displaystyle\prod_{i=1}^{n}}\;\rho_{i}\exp\Big(\frac{1}{2}\sigma_{i}^{2}\lambda^{2}\Big)=\Big({\textstyle\prod_{i=1}^{n}}\rho_{i}\Big)\cdot\exp\bigg(\frac{1}{2}\lambda^{2}\sum_{i=1}^{n}\sigma_{i}^{2}\bigg)
\end{split}
\]

for all $\lambda\in\mathbb{R}$, where we used the independence of all $X_{i}$.

Case (iii): Given the assumption, we have

\[
\begin{split}
\mathbb{E}\left[\exp(\lambda X)\right]=\mathbb{E}\Big[\exp\big(\lambda\max_{1\leq i\leq n}\{X_{i}\}\big)\Big]&<\mathbb{E}\Big[{\textstyle\sum_{i=1}^{n}}\exp(\lambda X_{i})\Big]\\
\leq{\textstyle\sum_{i=1}^{n}}\rho_{i}\exp\big(\tfrac{1}{2}\sigma_{i}^{2}\lambda^{2}\big)&\leq\Big({\textstyle\sum_{i=1}^{n}}\rho_{i}\Big)\exp\Big[\tfrac{1}{2}\lambda^{2}\max_{1\leq i\leq n}\{\sigma_{i}^{2}\}\Big]
\end{split}
\]

for any $\lambda\in\mathbb{R}$, hence proving the claim.
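The three cases of Theorem 4 can be checked exactly on a small discrete example. The sketch below assumes scaled Rademacher variables $X_{i}=a_{i}\varepsilon_{i}$ with $\varepsilon_{i}=\pm 1$ equally likely, which are $(a_{i},1)$-subgaussian because $\cosh(a\lambda)\leq\exp(a^{2}\lambda^{2}/2)$. The scales, the grid of $\lambda$ values, and all names are illustrative assumptions; for case (i) the fully dependent choice $\varepsilon_{1}=\dots=\varepsilon_{n}$ is used as one example of dependence.

```python
import itertools
import math

def mgf(outcomes, lam):
    """Exact E[exp(lam * X)] for X uniform over equally likely outcomes."""
    return sum(math.exp(lam * v) for v in outcomes) / len(outcomes)

def bound(lam, sigma, rho):
    """Right-hand side rho * exp(sigma^2 * lam^2 / 2) of the subgaussian definition."""
    return rho * math.exp(0.5 * sigma**2 * lam**2)

scales = [0.5, 1.0, 2.0]  # hypothetical a_i, so sigma_i = a_i and rho_i = 1
n = len(scales)
signs = list(itertools.product((-1, 1), repeat=n))

# Outcomes of the sum and the maximum under independent signs.
indep_sum = [sum(a * s for a, s in zip(scales, sg)) for sg in signs]
indep_max = [max(a * s for a, s in zip(scales, sg)) for sg in signs]
# Outcomes of the sum when all X_i share a single sign (full dependence).
common_sum = [sum(scales) * s for s in (-1, 1)]

lams = [k / 10 for k in range(-40, 41)]
sigma_ii = math.sqrt(sum(a * a for a in scales))
checks = {
    "(i)": all(mgf(common_sum, l) <= bound(l, sum(scales), 1.0) for l in lams),
    "(ii)": all(mgf(indep_sum, l) <= bound(l, sigma_ii, 1.0) for l in lams),
    "(iii)": all(mgf(indep_max, l) <= bound(l, max(scales), float(n)) for l in lams),
}
print(checks)
```

In the fully dependent example the sum equals $(\sum_{i}a_{i})\,\varepsilon$, so the parameter $\sigma=\sum_{i}\sigma_{i}$ of case (i) cannot be reduced to the $\sqrt{\sum_{i}\sigma_{i}^{2}}$ of case (ii) without independence.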

Theorem 5

Let $X_{i}\in\mathbb{R}$ $(i=1,2,3,\dots,n)$ be $n$ random variables satisfying $\mathbb{E}\left[\exp(X_{i}^{2}/\sigma_{i}^{2})\right]\leq\rho_{i}$, respectively. Then we have

  • (i)

    $X$ such that $\lvert X\rvert\leq\sum_{i=1}^{n}\lvert X_{i}\rvert$ satisfies $\mathbb{E}\left[\exp(X^{2}/\sigma^{2})\right]\leq\rho$, with

    \[\sigma=\sum_{i=1}^{n}\sigma_{i},\quad\ln\rho=\frac{\sum_{i=1}^{n}\sigma_{i}\ln\rho_{i}}{\sum_{i=1}^{n}\sigma_{i}};\]
  • (ii)

    $X$ such that $\lvert X\rvert\leq\sqrt{\sum_{i=1}^{n}X_{i}^{2}}$ satisfies $\mathbb{E}\left[\exp(X^{2}/\sigma^{2})\right]\leq\rho$, with

    \[\sigma=\sqrt{\textstyle\sum_{i=1}^{n}\sigma_{i}^{2}},\quad\ln\rho=\frac{\sum_{i=1}^{n}\sigma_{i}^{2}\ln\rho_{i}}{\sum_{i=1}^{n}\sigma_{i}^{2}};\]
  • (iii)

    $X\coloneqq\sum_{i=1}^{n}X_{i}$, where all $X_{i}$ are independent of one another, satisfies $\mathbb{E}\left[\exp(X^{2}/\sigma^{2})\right]\leq\rho$, with

    \[\sigma=\sqrt{\textstyle\sum_{i=1}^{n}\sigma_{i}^{2}},\quad\rho=\prod_{i=1}^{n}\rho_{i}.\]

Proof 
Case (i): Considering the case of $n=2$, we have

\[\frac{X^{2}}{(\sigma_{1}+\sigma_{2})^{2}}\leq\frac{(\lvert X_{1}\rvert+\lvert X_{2}\rvert)^{2}}{(\sigma_{1}+\sigma_{2})^{2}}\leq\frac{(1+\sigma_{2}\sigma_{1}^{-1})X_{1}^{2}+(1+\sigma_{1}\sigma_{2}^{-1})X_{2}^{2}}{(\sigma_{1}+\sigma_{2})^{2}}=\frac{\sigma_{1}^{-1}X_{1}^{2}+\sigma_{2}^{-1}X_{2}^{2}}{\sigma_{1}+\sigma_{2}},\]

and we further find

𝔼[exp(X2(σ1+σ2)2)]𝔼[exp(σ1σ1+σ2X12/σ12)exp(σ2σ1+σ2X22/σ22)]𝔼[exp(X12/σ12)]σ1σ1+σ2𝔼[exp(X22/σ22)]σ2σ1+σ2ρ1σ1/(σ1+σ2)ρ2σ2/(σ1+σ2),\begin{split}\mathbb{E}\left[\exp\Big{(}\tfrac{X^{2}}{(\sigma_{1}+\sigma_{2})^{2}}\Big{)}\right]&\leq\mathbb{E}\left[\exp\Big{(}\tfrac{\sigma_{1}}{\sigma_{1}+\sigma_{2}}\cdot{X_{1}^{2}}/{\sigma_{1}^{2}}\Big{)}\cdot\exp\Big{(}\tfrac{\sigma_{2}}{\sigma_{1}+\sigma_{2}}\cdot{X_{2}^{2}}/{\sigma_{2}^{2}}\Big{)}\right]\\ &\leq\mathbb{E}\Big{[}\exp\big{(}{X_{1}^{2}}/{\sigma_{1}^{2}}\big{)}\Big{]}^{\frac{\sigma_{1}}{\sigma_{1}+\sigma_{2}}}\cdot\mathbb{E}\Big{[}\exp\big{(}{X_{2}^{2}}/{\sigma_{2}^{2}}\big{)}\Big{]}^{\frac{\sigma_{2}}{\sigma_{1}+\sigma_{2}}}\\ &\leq\rho_{1}^{{\sigma_{1}}/(\sigma_{1}+\sigma_{2})}\rho_{2}^{{\sigma_{2}}/(\sigma_{1}+\sigma_{2})},\end{split}

according to Hölder’s inequality. The conclusion generalizes to larger $n$ by induction.

Case (ii): Considering the case of $n=2$, we have

𝔼[exp(X2σ12+σ22)]𝔼[exp(σ12σ12+σ22X12/σ12)exp(σ22σ12+σ22X22/σ22)]𝔼[exp(X12/σ12)]σ12/(σ12+σ22)𝔼[exp(X22/σ22)]σ22/(σ12+σ22)ρ1σ12/(σ12+σ22)ρ2σ22/(σ12+σ22),\begin{split}\mathbb{E}\left[\exp\Big{(}\tfrac{X^{2}}{\sigma_{1}^{2}+\sigma_{2}^{2}}\Big{)}\right]&\leq\mathbb{E}\left[\exp\Big{(}\tfrac{\sigma_{1}^{2}}{\sigma_{1}^{2}+\sigma_{2}^{2}}\cdot{X_{1}^{2}}/{\sigma_{1}^{2}}\Big{)}\cdot\exp\Big{(}\tfrac{\sigma_{2}^{2}}{\sigma_{1}^{2}+\sigma_{2}^{2}}\cdot{X_{2}^{2}}/{\sigma_{2}^{2}}\Big{)}\right]\\ &\leq\mathbb{E}\Big{[}\exp\big{(}{X_{1}^{2}}/{\sigma_{1}^{2}}\big{)}\Big{]}^{\sigma_{1}^{2}/(\sigma_{1}^{2}+\sigma_{2}^{2})}\cdot\mathbb{E}\Big{[}\exp\big{(}{X_{2}^{2}}/{\sigma_{2}^{2}}\big{)}\Big{]}^{\sigma_{2}^{2}/(\sigma_{1}^{2}+\sigma_{2}^{2})}\\ &\leq\rho_{1}^{{\sigma_{1}^{2}}/(\sigma_{1}^{2}+\sigma_{2}^{2})}\rho_{2}^{{\sigma_{2}^{2}}/(\sigma_{1}^{2}+\sigma_{2}^{2})},\end{split}

according to Hölder’s inequality. The conclusion generalizes to larger $n$ by induction.

Case (iii): Given the assumption, we have

𝔼[exp(X2/i=1nσi2)]=𝔼[exp((i=1nXi)2/i=1nσi2)]𝔼[exp(i=1n(Xi2/σi2))]=i=1n𝔼[exp(Xi2/σi2)]i=1nρi,\begin{split}\mathbb{E}\left[\exp\Big{(}X^{2}/{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}\Big{)}\right]&=\mathbb{E}\left[\exp\Big{(}\big{(}{\textstyle\sum_{i=1}^{n}}X_{i}\big{)}^{2}/{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}\Big{)}\right]\leq\mathbb{E}\left[\exp\Big{(}{\textstyle\sum_{i=1}^{n}}\big{(}X_{i}^{2}/\sigma_{i}^{2}\big{)}\Big{)}\right]\\ &=\textstyle{\prod_{i=1}^{n}\mathbb{E}\Big{[}\exp\big{(}X_{i}^{2}/\sigma_{i}^{2}\big{)}\Big{]}\leq\prod_{i=1}^{n}\rho_{i}},\end{split}

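where we used Sedrakyan’s inequality, $(\sum_{i=1}^{n}X_{i})^{2}/\sum_{i=1}^{n}\sigma_{i}^{2}\leq\sum_{i=1}^{n}X_{i}^{2}/\sigma_{i}^{2}$, for the first inequality and the independence of all $X_{i}$ for the factorization of the expectation that follows.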

2.4 Martingale difference with subgaussianity

A martingale is a sequence of random variables whose conditional expectation at each step, given the past history, equals the most recent value. Martingales are widely used in the study of stochastic processes, including fair gambling, asset price changes, algorithms for stochastic optimization, and more. In this note, we consider vector-valued martingales with subgaussian differences and apply the results from previous subsections to conduct a large deviation analysis of the martingales, as summarized in Theorem 6 below. This theorem, in which assumptions (I), (II), and (III) progressively loosen the subgaussianity condition on the vector martingale differences, compiles existing results from [8, Lemma 2], [9, Lemma 6], [5, Theorem 2.2.2], and [6, Theorem 7].

Theorem 6

Let $\{X_{i}\}_{i=1,2,3,\dots}$ be a stochastic process, $\{\mathcal{F}_{i}\}_{i=1,2,3,\dots}$ be the filtration of corresponding $\sigma$-fields up to time $i$, and let $\bm{\phi}_{i}=[\phi_{i1}~\phi_{i2}\;\dots\;\phi_{id}]^{\top}\in\mathbb{R}^{d}$ be given by deterministic measurable functions $\bm{\phi}_{i}=\bm{\phi}_{i}(X_{1},X_{2},\dots,X_{i})$ such that $\mathbb{E}\left[\bm{\phi}_{i}|\mathcal{F}_{i-1}\right]=\bm{0}$ for all $i$. Furthermore, we consider the following conditions:

  • (I)

    $\mathbb{E}\big[\exp(\phi_{ij}^{2}/\sigma_{ij}^{2})\big|\mathcal{F}_{i-1}\big]\leq\exp(1)$ where $\sum_{j=1}^{d}\sigma_{ij}^{2}\leq\sigma_{i}^{2}$, for all $i=1,2,3,\dots,n$ and $j=1,2,3,\dots,d$, with $\sigma_{ij},\sigma_{i}>0$;

  • (II)

    $\mathbb{E}\big[\exp(\left\lVert\bm{\phi}_{i}\right\rVert^{2}/\sigma_{i}^{2})\big|\mathcal{F}_{i-1}\big]\leq\exp(1)$ for all $i=1,2,3,\dots,n$, with $\sigma_{i}>0$;

  • (III)

    $\mathbb{E}\big[\exp\big((\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i})^{2}/\sigma_{i}^{2}\big)\big|\mathcal{F}_{i-1}\big]\leq\exp(1)$ for any unit vector $\bm{e}_{\textup{u}}\in\mathbb{R}^{d}$ and all $i=1,2,3,\dots,n$, with $\sigma_{i}>0$.

Then for any $\lambda\geq 0$ we have

\[
\mathbb{P}\bigg[\Big\lVert{\textstyle\sum_{i=1}^{n}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}}\bigg]\leq\left\{\begin{aligned}
&2\exp(-\lambda^{2}/4),&&\textup{if (I) holds;}\\
&(d+1)\exp(-\lambda^{2}/3),&&\textup{if (II) holds;}\\
&5^{d}\exp\left(-\lambda^{2}/12\right),&&\textup{if (III) holds,}
\end{aligned}\right. \tag{10}
\]

and have

\[\mathbb{P}\bigg[\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\geq\lambda\sqrt{{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\bigg]\leq\exp(-\lambda^{2}/3) \tag{11}\]

for any unit vector $\bm{e}_{\textup{u}}\in\mathbb{R}^{d}$ given any of the conditions (I), (II), and (III).
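In applications, the bounds of Eq. (10) are often inverted to find the deviation level $\lambda$ that guarantees a prescribed failure probability. The following sketch does this for the three conditions; the function names and the symbol `delta` for the target probability are assumptions introduced here for illustration.

```python
import math

def tail_bounds(lam, d):
    """Right-hand sides of Eq. (10) under conditions (I), (II), and (III)."""
    return {
        "I": 2 * math.exp(-lam**2 / 4),
        "II": (d + 1) * math.exp(-lam**2 / 3),
        "III": 5**d * math.exp(-lam**2 / 12),
    }

def lam_for_confidence(delta, d):
    """Value of lam at which each bound in Eq. (10) equals delta, for 0 < delta < 1."""
    return {
        "I": 2 * math.sqrt(math.log(2 / delta)),
        "II": math.sqrt(3 * math.log((d + 1) / delta)),
        "III": math.sqrt(12 * (d * math.log(5) + math.log(1 / delta))),
    }

d, delta = 3, 0.05  # illustrative dimension and target probability
for key, lam in lam_for_confidence(delta, d).items():
    print(f"condition ({key}): lam = {lam:.3f}, bound = {tail_bounds(lam, d)[key]:.3f}")
```

Under (I) the resulting $\lambda$ does not depend on $d$, under (II) it grows like $\sqrt{\ln d}$, and under (III) like $\sqrt{d}$, which reflects the progressively weaker assumptions.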

Proof 
Case (I): Given that $\mathbb{E}\left[\bm{\phi}_{i}|\mathcal{F}_{i-1}\right]=\bm{0}$ and $\mathbb{E}\big[\exp(\phi_{ij}^{2}/\sigma_{ij}^{2})|\mathcal{F}_{i-1}\big]\leq\exp(1)$, Theorem 2 (case (a)) indicates

\[\mathbb{E}\big[\exp(x\phi_{ij})\big|\mathcal{F}_{i-1}\big]\leq\exp\big(\tfrac{3}{4}\sigma_{ij}^{2}x^{2}\big)\]

for any $x\in\mathbb{R}$, $1\leq i\leq n$, and $1\leq j\leq d$. Then we have

\[
\begin{split}
\mathbb{E}\big[\exp(x{\textstyle\sum_{i=1}^{n}}\phi_{ij})\big]&=\mathbb{E}\Big[\exp(x{\textstyle\sum_{i=1}^{n-1}}\phi_{ij})\cdot\mathbb{E}\big[\exp(x\phi_{nj})\big|\mathcal{F}_{n-1}\big]\Big]\\
&\leq\exp\big(\tfrac{3}{4}\sigma_{nj}^{2}x^{2}\big)\cdot\mathbb{E}\big[\exp(x{\textstyle\sum_{i=1}^{n-1}}\phi_{ij})\big]\\
&\leq\cdots\leq\exp\Big[\tfrac{3}{4}x^{2}{\textstyle\sum_{i=1}^{n}}\sigma_{ij}^{2}\Big]
\end{split} \tag{12}
\]

for any $x\in\mathbb{R}$ and $1\leq j\leq d$. Following from the implication (4)$\implies$(3) in Theorem 1 and Remark 1, we find

\[\mathbb{E}\bigg[\exp\bigg(\frac{x(\sum_{i=1}^{n}\phi_{ij})^{2}}{3\sum_{i=1}^{n}\sigma_{ij}^{2}}\bigg)\bigg]\leq\frac{1}{\sqrt{1-x}}\]

for any $x\in(0,1)$. As $\left\lVert\sum_{i=1}^{n}\bm{\phi}_{i}\right\rVert^{2}=\sum_{j=1}^{d}(\sum_{i=1}^{n}\phi_{ij})^{2}$, invoking assumption (I) and applying Theorem 5 (case (ii)), we find

\[\mathbb{E}\bigg[\exp\bigg(\frac{x\left\lVert\sum_{i=1}^{n}\bm{\phi}_{i}\right\rVert^{2}}{3\sum_{i=1}^{n}\sigma_{i}^{2}}\bigg)\bigg]\leq\mathbb{E}\bigg[\exp\bigg(\frac{x\left\lVert\sum_{i=1}^{n}\bm{\phi}_{i}\right\rVert^{2}}{3\sum_{j=1}^{d}\sum_{i=1}^{n}\sigma_{ij}^{2}}\bigg)\bigg]\leq\frac{1}{\sqrt{1-x}}\]

for any $x\in(0,1)$. Finally, according to the implication (3)$\implies$(1) in Theorem 1, we find

\[\mathbb{P}\bigg[\Big\lVert{\textstyle\sum_{i=1}^{n}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{\tfrac{3}{2x}{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}}\bigg]\leq\frac{1}{\sqrt{1-x}}\exp(-\lambda^{2}/2),\]

or equivalently

\[\mathbb{P}\bigg[\Big\lVert{\textstyle\sum_{i=1}^{n}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}}\bigg]\leq\frac{1}{\sqrt{1-x}}\exp(-x\lambda^{2}/3), \tag{13}\]

for any $x\in(0,1)$ and $\lambda\geq 0$. Choosing $x=3/4$ leads to

\[\mathbb{P}\bigg[\Big\lVert{\textstyle\sum_{i=1}^{n}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}}\bigg]\leq 2\exp(-\lambda^{2}/4), \tag{14}\]

which proves the claim for case (I). Note that if $d=1$, we can directly obtain

\[\mathbb{P}\bigg[\Big\lVert{\textstyle\sum_{i=1}^{n}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum_{i=1}^{n}}\sigma_{i}^{2}}\bigg]\leq 2\exp(-\lambda^{2}/3) \tag{15}\]

from inequality (12) and the implication (4)$\implies$(1) in Theorem 1.
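The choice $x=3/4$ in Eq. (14) is one convenient point on the family of bounds in Eq. (13). As a sketch of the trade-off, and not a statement from the source, the right-hand side of Eq. (13) can be minimized over $x$ for each fixed $\lambda$: setting the derivative of $-\tfrac{1}{2}\ln(1-x)-x\lambda^{2}/3$ to zero gives $x=1-3/(2\lambda^{2})$, which lies in $(0,1)$ only when $\lambda^{2}>3/2$. The function names below are illustrative.

```python
import math

def tail_bound(lam, x):
    """Right-hand side of Eq. (13): (1 - x)^(-1/2) * exp(-x * lam^2 / 3)."""
    return math.exp(-x * lam**2 / 3) / math.sqrt(1 - x)

def best_x(lam):
    """Interior minimizer of tail_bound over x in (0, 1); needs lam^2 > 3/2."""
    if lam**2 <= 1.5:
        raise ValueError("for lam^2 <= 3/2 the bound is not below 1 for any x")
    return 1 - 3 / (2 * lam**2)

for lam in (1.5, 2.0, math.sqrt(6), 4.0):
    x_opt = best_x(lam)
    print(f"lam={lam:.3f}  x_opt={x_opt:.3f}  optimized={tail_bound(lam, x_opt):.5f}  "
          f"x=3/4 gives {tail_bound(lam, 0.75):.5f}")
```

The fixed choice $x=3/4$ coincides with the minimizer at $\lambda^{2}=6$; for other $\lambda$ the optimized bound is smaller, at the cost of a less compact expression.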

Case (II) (from Ref. [9], with corrections): Consider a random vector $\bm{\phi}\in\mathbb{R}^{d}$ satisfying $\mathbb{E}\left[\bm{\phi}\right]=\bm{0}$ and $\mathbb{E}\big[\exp(\left\lVert\bm{\phi}\right\rVert^{2}/\sigma^{2})\big]\leq\exp(1)$. First, we notice that the real symmetric matrix

\[\mathbf{\Phi}\coloneqq\begin{bmatrix}\,0&\;\bm{\phi}^{\top}\\ \,\bm{\phi}&\mathbf{0}\end{bmatrix}\in\mathbb{R}^{(d+1)\times(d+1)} \tag{16}\]

has a rank of at most $2$ and has eigenvalues $0$ (with multiplicity $d-1$) and $\pm\left\lVert\bm{\phi}\right\rVert$. Letting $\mathbf{A}\preceq\mathbf{B}$ denote that $\mathbf{A}-\mathbf{B}$ is negative semidefinite, we have

𝔼[exp(λ𝚽)]=𝔼[exp(λ𝐏𝐃𝐏1)]=𝔼[𝐏exp(λ𝐃)𝐏1]𝔼[λ𝐏𝐃𝐏1+𝐏exp(916λ2𝐃2)𝐏1]=λ𝔼[𝚽]+𝔼[𝐏exp(916λ2𝐃2)𝐏1]𝔼[𝐏exp(916λ2ϕ2)𝐈𝐏1]=𝔼[exp(916λ2ϕ2)]𝐈exp(916σ2λ2)𝐈\begin{split}\mathbb{E}\left[\exp(\lambda\mathbf{\Phi})\right]&=\mathbb{E}\left[\exp(\lambda\mathbf{P}\mathbf{D}\mathbf{P}^{-1})\right]=\mathbb{E}\left[\mathbf{P}\exp(\lambda\mathbf{D})\mathbf{P}^{-1}\right]\\ &\preceq\mathbb{E}\left[\lambda\mathbf{P}\mathbf{D}\mathbf{P}^{-1}+\mathbf{P}\exp\big{(}\tfrac{9}{16}\lambda^{2}\mathbf{D}^{2}\big{)}\mathbf{P}^{-1}\right]=\lambda\mathbb{E}\left[\mathbf{\Phi}\right]+\mathbb{E}\left[\mathbf{P}\exp\big{(}\tfrac{9}{16}\lambda^{2}\mathbf{D}^{2}\big{)}\mathbf{P}^{-1}\right]\\ &\preceq\mathbb{E}\big{[}\mathbf{P}\exp(\tfrac{9}{16}\lambda^{2}\left\lVert\bm{\phi}\right\rVert^{2})\mathbf{I}\mathbf{P}^{-1}\big{]}=\mathbb{E}\big{[}\exp(\tfrac{9}{16}\lambda^{2}\left\lVert\bm{\phi}\right\rVert^{2})\big{]}\mathbf{I}\\ &\preceq\exp\left(\tfrac{9}{16}\sigma^{2}\lambda^{2}\right)\mathbf{I}\end{split}

for all $\lambda$ such that $\lvert\lambda\rvert\sigma\leq 4/3$, where $\mathbf{\Phi}=\mathbf{P}\mathbf{D}\mathbf{P}^{-1}$ is the eigendecomposition of $\mathbf{\Phi}$ ($\mathbf{P}$ being an orthogonal matrix and $\mathbf{D}$ diagonal). We also have

𝔼[exp(λ𝚽)]=𝔼[𝐏exp(λ𝐃)𝐏1]𝔼[𝐏exp(λϕ)𝐈𝐏1]=𝔼[exp(λϕ)]𝐈𝔼[exp(xλ2σ2/4+x1ϕ2/σ2)]𝐈exp(xλ2σ2/4+1/x)𝐈\begin{split}\mathbb{E}\left[\exp(\lambda\mathbf{\Phi})\right]=\mathbb{E}\left[\mathbf{P}\exp(\lambda\mathbf{D})\mathbf{P}^{-1}\right]&\preceq\mathbb{E}\left[\mathbf{P}\exp(\left\lVert\lambda\bm{\phi}\right\rVert)\mathbf{I}\mathbf{P}^{-1}\right]=\mathbb{E}\left[\exp(\left\lVert\lambda\bm{\phi}\right\rVert)\right]\mathbf{I}\\ &\preceq\mathbb{E}\big{[}\exp(x\lambda^{2}\sigma^{2}/4+x^{-1}\left\lVert\bm{\phi}\right\rVert^{2}/\sigma^{2})\big{]}\mathbf{I}\\ &\preceq\exp(x\lambda^{2}\sigma^{2}/4+1/x)\mathbf{I}\end{split}

for all $x\geq 1$. Thus, the argument in the proof for case (a) of Theorem 2 also applies here, leading to

\[\mathbb{E}\left[\exp(\lambda\mathbf{\Phi})\right]\preceq\exp\big(\tfrac{3}{4}\sigma^{2}\lambda^{2}\big)\mathbf{I} \tag{17}\]

for all $\lambda\in\mathbb{R}$. Second, for real symmetric matrices $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{C}$, we have

\[\operatorname{tr}\exp(\mathbf{A}+\mathbf{B})\leq\operatorname{tr}\big(\exp(\mathbf{A})\exp(\mathbf{B})\big), \tag{18}\]

which is known as the Golden–Thompson inequality [11, 12], and it is easy to verify that

\[\operatorname{tr}(\mathbf{C}\mathbf{A})\leq\operatorname{tr}(\mathbf{C}\mathbf{B}) \tag{19}\]

if $\mathbf{A}\preceq\mathbf{B}$ and $\mathbf{C}\succeq\mathbf{0}$. By defining $\mathbf{\Phi}_{i}$ with $\bm{\phi}_{i}$ ($i=1,2,3,\dots,n$) according to Eq. (16) and collecting all preparatory results, we find

𝔼[trexp(λi=1n𝚽i)]=𝔼[𝔼[trexp(λi=1n1𝚽i+λ𝚽n)|n1]]𝔼[𝔼[tr(exp(λi=1n1𝚽i)exp(λ𝚽n))|n1]]=𝔼[tr(exp(λi=1n1𝚽i)𝔼[exp(λ𝚽n)|n1])]𝔼[tr(exp(λi=1n1𝚽i)exp(34σn2λ2)𝐈)]=exp(34σn2λ2)𝔼[trexp(λi=1n1𝚽i)]exp(34λ2i=1nσi2)tr𝐈=(d+1)exp(34λ2i=1nσi2)\begin{split}\mathbb{E}\Big{[}\operatorname{tr}\exp\big{(}\lambda{\textstyle\sum^{n}_{i=1}}\mathbf{\Phi}_{i}\big{)}\Big{]}&=\mathbb{E}\Big{[}\mathbb{E}\Big{[}\operatorname{tr}\exp(\lambda{\textstyle\sum^{n-1}_{i=1}}\mathbf{\Phi}_{i}+\lambda\mathbf{\Phi}_{n})\Big{|}\mathcal{F}_{n-1}\Big{]}\Big{]}\\ &\leq\mathbb{E}\Big{[}\mathbb{E}\Big{[}\operatorname{tr}\big{(}\exp(\lambda{\textstyle\sum^{n-1}_{i=1}}\mathbf{\Phi}_{i})\exp(\lambda\mathbf{\Phi}_{n})\big{)}\Big{|}\mathcal{F}_{n-1}\Big{]}\Big{]}\\ &=\mathbb{E}\Big{[}\operatorname{tr}\Big{(}\exp(\lambda{\textstyle\sum^{n-1}_{i=1}}\mathbf{\Phi}_{i})\cdot\mathbb{E}\big{[}\exp(\lambda\mathbf{\Phi}_{n})\big{|}\mathcal{F}_{n-1}\big{]}\Big{)}\Big{]}\\ &\leq\mathbb{E}\Big{[}\operatorname{tr}\Big{(}\exp(\lambda{\textstyle\sum^{n-1}_{i=1}}\mathbf{\Phi}_{i})\cdot\exp\big{(}\tfrac{3}{4}\sigma_{n}^{2}\lambda^{2}\big{)}\,\mathbf{I}\Big{)}\Big{]}\\ &=\exp\big{(}\tfrac{3}{4}\sigma_{n}^{2}\lambda^{2}\big{)}\mathbb{E}\Big{[}\operatorname{tr}\exp\big{(}\lambda{\textstyle\sum^{n-1}_{i=1}}\mathbf{\Phi}_{i}\big{)}\Big{]}\leq\cdots\\ &\leq\exp\big{(}\tfrac{3}{4}\lambda^{2}{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\big{)}\operatorname{tr}\mathbf{I}\\ &=(d+1)\exp\big{(}\tfrac{3}{4}\lambda^{2}{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\big{)}\end{split}

for any $\lambda\in\mathbb{R}$, where we applied inequality (18) for the first inequality, and used (17) and (19) for the second. Note that

\[\operatorname{tr}\exp\big(\lambda{\textstyle\sum^{n}_{i=1}}\mathbf{\Phi}_{i}\big)=\exp\big(\lambda\left\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\rVert\big)+\exp\big(-\lambda\left\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\rVert\big)+(d-1)\]

for any $\lambda\in\mathbb{R}$; therefore, we have

\[\mathbb{E}\Big[\exp\big(x\left\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\rVert\big)\Big]\leq\big(d+1\big)\exp\Big(\tfrac{3}{4}x^{2}{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\Big) \tag{20}\]

for any $x\in\mathbb{R}$. This finally leads to

\[\mathbb{P}\bigg[\Big\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\bigg]\leq(d+1)\exp(-\lambda^{2}/3) \tag{21}\]

for all $\lambda\geq 0$, according to the implication (4)$\implies$(1) in Theorem 1 and Remark 2.
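The spectral facts about the matrix $\mathbf{\Phi}$ of Eq. (16) that drive case (II) can be checked on a concrete vector. The sketch below uses plain Python lists to stay dependency-free; the vector `phi`, the value of `lam`, the truncation length of the power series, and all helper names are illustrative assumptions. It verifies that $[\lVert\bm{\phi}\rVert,\ \bm{\phi}^{\top}]^{\top}$ is an eigenvector with eigenvalue $\lVert\bm{\phi}\rVert$ and that $\operatorname{tr}\exp(\lambda\mathbf{\Phi})=2\cosh(\lambda\lVert\bm{\phi}\rVert)+(d-1)$.

```python
import math

def dilation(phi):
    """Build the (d+1) x (d+1) symmetric matrix of Eq. (16) as nested lists."""
    d = len(phi)
    Phi = [[0.0] * (d + 1) for _ in range(d + 1)]
    for j, p in enumerate(phi, start=1):
        Phi[0][j] = p
        Phi[j][0] = p
    return Phi

def matvec(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product for square nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace_exp(A, lam, terms=40):
    """tr exp(lam * A) from the truncated series sum_k lam^k tr(A^k) / k!."""
    n = len(A)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0
    total = 0.0
    for k in range(terms):
        total += lam**k * sum(power[i][i] for i in range(n)) / math.factorial(k)
        power = matmul(power, A)
    return total

phi = [0.3, -1.2, 0.5, 2.0]  # arbitrary illustrative vector, d = 4
d = len(phi)
norm = math.sqrt(sum(p * p for p in phi))
Phi = dilation(phi)

v_plus = [norm] + phi        # candidate eigenvector for eigenvalue +||phi||
image = matvec(Phi, v_plus)

lam = 0.7
series_value = trace_exp(Phi, lam)
closed_form = 2 * math.cosh(lam * norm) + (d - 1)
print(series_value, closed_form)
```

The two printed numbers should agree up to rounding, which is the identity used to pass from the trace bound to Eq. (20).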

Case (III): We notice that the Euclidean unit ball in $\mathbb{R}^{d}$ can be covered by $(2/\epsilon+1)^{d}$ Euclidean balls of radius $\epsilon$ centered within the unit ball, for any $\epsilon\in(0,1)$ (see Corollary 4.2.13 in [2]; Example 5.8 in [3]), and let $\bm{c}_{k}$ denote the centers of these balls in such a cover ($k=1,2,\dots,\operatorname{floor}\big((1+2/\epsilon)^{d}\big)$). Then for any given unit vector $\bm{e}_{\textup{u}}\in\mathbb{S}^{d-1}\subset\mathbb{R}^{d}$ there exists $k$ such that $\left\lVert\bm{e}_{\textup{u}}-\bm{c}_{k}\right\rVert\leq\epsilon$, and we have

i=1nϕi=max𝒆u=1{𝒆ui=1nϕi}max1k(1+2/ϵ)d{𝒄ki=1nϕi}+max𝒆ϵ{𝒆i=1nϕi}max1k(1+2/ϵ)d{𝒄ki=1nϕi}+ϵi=1nϕi,\begin{split}\left\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\rVert=\max_{\left\lVert\bm{e}_{\textup{u}}\right\rVert=1}\left\{\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\}&\leq\max_{1\leq k\leq(1+2/\epsilon)^{d}}\left\{\bm{c}_{k}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\}+\max_{\left\lVert\bm{e}\right\rVert\leq\epsilon}\left\{\bm{e}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\}\\ &\leq\max_{1\leq k\leq(1+2/\epsilon)^{d}}\left\{\bm{c}_{k}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\}+\epsilon\left\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\rVert,\end{split}

which indicates

\[0\leq\left\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\rVert\leq(1-\epsilon)^{-1}\max_{1\leq k\leq(1+2/\epsilon)^{d}}\left\{\bm{c}_{k}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\right\} \tag{22}\]

for all $\epsilon\in(0,1)$. Now, we examine the subgaussianity of $\bm{c}_{k}^{\top}\sum_{i=1}^{n}\bm{\phi}_{i}$. Given the assumptions $\mathbb{E}\left[\exp\big((\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i})^{2}/\sigma_{i}^{2}\big)\big|\mathcal{F}_{i-1}\right]\leq\exp(1)$ and $\mathbb{E}\left[\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i}\big|\mathcal{F}_{i-1}\right]=\bm{e}_{\textup{u}}^{\top}\mathbb{E}\left[\bm{\phi}_{i}\big|\mathcal{F}_{i-1}\right]=0$ for any unit vector $\bm{e}_{\textup{u}}$, Theorem 2 indicates that $\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i}$, conditional on $\mathcal{F}_{i-1}$, is $(\sqrt{3/2}\,\sigma_{i},1)$-subgaussian, i.e., its variance proxy is $3\sigma_{i}^{2}/2$, and

\[
\begin{split}
\mathbb{E}\big[\exp(\lambda\bm{c}_{k}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i})\big]
&=\mathbb{E}\Big[\exp(\lambda\bm{c}_{k}^{\top}{\textstyle\sum^{n-1}_{i=1}}\bm{\phi}_{i})\cdot\mathbb{E}\big[\exp(\lambda\bm{c}_{k}^{\top}\bm{\phi}_{n})\,\big|\,\mathcal{F}_{n-1}\big]\Big]\\
&\leq\exp\big(\tfrac{3}{4}\lVert\bm{c}_{k}\rVert^{2}\sigma_{n}^{2}\lambda^{2}\big)\,\mathbb{E}\big[\exp(\lambda\bm{c}_{k}^{\top}{\textstyle\sum^{n-1}_{i=1}}\bm{\phi}_{i})\big]\leq\dots\\
&\leq\exp\Big[\tfrac{3}{4}\lambda^{2}\lVert\bm{c}_{k}\rVert^{2}{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\Big]
\end{split}
\]

for all $\lambda\in\mathbb{R}$ and all $\bm{c}_{k}$. Now, according to case (iii) of Theorem 4,

\[
\begin{split}
\mathbb{E}\big[\exp(\lambda\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\rVert)\big]
&\leq\mathbb{E}\left[\exp\left(\frac{\lvert\lambda\rvert}{1-\epsilon}\max_{k}\Big\{\bm{c}_{k}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\Big\}\right)\right]\\
&\leq\left(1+\frac{2}{\epsilon}\right)^{d}\exp\left(\frac{\lambda^{2}}{(1-\epsilon)^{2}}\cdot\frac{3}{4}\max_{k}\big\{\lVert\bm{c}_{k}\rVert^{2}\big\}\cdot{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\right)\\
&\leq\left(1+\frac{2}{\epsilon}\right)^{d}\exp\left(\frac{1}{(1-\epsilon)^{2}}\cdot\frac{3}{4}\lambda^{2}\cdot{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\right)
\end{split}
\]

for all $\lambda\in\mathbb{R}$. Finally, we obtain

\[
\mathbb{P}\bigg[\Big\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\Big\rVert\geq\frac{\lambda}{1-\epsilon}\sqrt{\tfrac{3}{2}{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\,\bigg]\leq\Big(1+\frac{2}{\epsilon}\Big)^{d}\exp(-\lambda^{2}/2), \tag{23}
\]

or equivalently

\[
\mathbb{P}\bigg[\Big\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\,\bigg]\leq\Big(1+\frac{2}{\epsilon}\Big)^{d}\exp\left(-\frac{1}{3}(1-\epsilon)^{2}\lambda^{2}\right), \tag{24}
\]

for all $\lambda\geq 0$ and $\epsilon\in(0,1)$. By choosing $\epsilon=1/2$ we obtain

\[
\mathbb{P}\bigg[\Big\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\Big\rVert\geq\lambda\sqrt{{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\,\bigg]\leq 5^{d}\exp\left(-\lambda^{2}/12\right) \tag{25}
\]

for all $\lambda\geq 0$.
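As a numerical sanity check of inequality (22), the sketch below builds an explicit $\epsilon$-net in the special case $d=2$ and tests the inequality on random vectors. The net construction (equally spaced points on the unit circle) and the function names `circle_net`, `net_upper_bound`, and `check_inequality_22` are illustrative assumptions made here; they are not taken from the cited covering results, which hold in any dimension.

```python
import math
import random


def circle_net(eps):
    """Unit vectors in R^2 such that every unit vector is within eps of one of them.

    Assumption for illustration: centres are equally spaced on the unit circle.
    With angular spacing delta the worst-case chord length is 2*sin(delta/4),
    so delta <= 4*asin(eps/2) is sufficient.
    """
    m = math.ceil(math.pi / (2.0 * math.asin(eps / 2.0)))
    return [(math.cos(2.0 * math.pi * k / m), math.sin(2.0 * math.pi * k / m))
            for k in range(m)]


def net_upper_bound(v, net, eps):
    """Right-hand side of (22): max_k <c_k, v> / (1 - eps)."""
    return max(c[0] * v[0] + c[1] * v[1] for c in net) / (1.0 - eps)


def check_inequality_22(eps, trials=10_000, seed=0):
    """Return True if ||v|| <= net_upper_bound(v) for all sampled vectors v."""
    rng = random.Random(seed)
    net = circle_net(eps)
    for _ in range(trials):
        v = (rng.gauss(0.0, 3.0), rng.gauss(0.0, 3.0))
        # Small tolerance guards against floating-point rounding only.
        if math.hypot(v[0], v[1]) > net_upper_bound(v, net, eps) + 1e-12:
            return False
    return True
```

The vector `v` plays the role of $\sum^{n}_{i=1}\bm{\phi}_{i}$; the check concerns only the deterministic covering step, not the probabilistic part of the argument.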

The assumptions (I), (II), and (III) are progressively weaker, in the sense that (I) implies (II) according to (ii) in Theorem 5, and (II) implies (III) as $(\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i})^{2}\leq\lVert\bm{e}_{\textup{u}}\rVert^{2}\lVert\bm{\phi}_{i}\rVert^{2}=\lVert\bm{\phi}_{i}\rVert^{2}$. With $\mathbb{E}\big[\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i}\,\big|\,\mathcal{F}_{i-1}\big]=0$, we have $\mathbb{E}\big[\exp(\lambda\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{i})\,\big|\,\mathcal{F}_{i-1}\big]\leq\exp\big(3\sigma_{i}^{2}\lambda^{2}/4\big)$ for any unit vector $\bm{e}_{\textup{u}}\in\mathbb{R}^{d}$ and $i=1,2,3,\dots$, and thus

\[
\begin{split}
\mathbb{E}\big[\exp(\lambda\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i})\big]
&=\mathbb{E}\Big[\exp(\lambda\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n-1}_{i=1}}\bm{\phi}_{i})\cdot\mathbb{E}\big[\exp(\lambda\bm{e}_{\textup{u}}^{\top}\bm{\phi}_{n})\,\big|\,\mathcal{F}_{n-1}\big]\Big]\\
&\leq\exp\big(\tfrac{3}{4}\sigma_{n}^{2}\lambda^{2}\big)\,\mathbb{E}\big[\exp(\lambda\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n-1}_{i=1}}\bm{\phi}_{i})\big]\leq\cdots\\
&\leq\exp\Big(\tfrac{3}{4}\lambda^{2}{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}\Big)
\end{split}
\]

for all $\lambda\in\mathbb{R}$. Therefore, we have

\[
\mathbb{P}\bigg[\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\geq\lambda\sqrt{{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\,\bigg]\leq\exp(-\lambda^{2}/3)
\]

for any unit vector $\bm{e}_{\textup{u}}\in\mathbb{R}^{d}$ and $\lambda\geq 0$, according to the implication (4) $\implies$ (5) in Theorem 1 and Remark 1.
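The passage from the moment-generating-function bound to the tail bound above is presumably the usual Chernoff argument. The following worked derivation is a sketch under that assumption; the shorthands $S_{n}$, $V_{n}$, $t$, and the auxiliary variable $s$ are introduced here for illustration and do not appear in the original.

```latex
% Shorthand (not from the original): S_n, V_n, t, and the Chernoff parameter s.
\[
S_{n}:=\bm{e}_{\textup{u}}^{\top}{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i},\qquad
V_{n}:={\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2},\qquad
t:=\lambda\sqrt{V_{n}}>0 .
\]
% Markov's inequality applied to exp(s S_n), then the MGF bound with lambda replaced by s.
\[
\mathbb{P}[S_{n}\geq t]
\leq\inf_{s>0}e^{-st}\,\mathbb{E}\big[e^{sS_{n}}\big]
\leq\inf_{s>0}\exp\big(\tfrac{3}{4}s^{2}V_{n}-st\big)
=\exp\Big(-\frac{t^{2}}{3V_{n}}\Big)
=\exp(-\lambda^{2}/3),
\]
% The quadratic (3/4) s^2 V_n - s t is minimized at s = 2t / (3 V_n).
```

The case $\lambda=0$ is trivial, since the right-hand side then equals $1$.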

Remark 3

Smaller values of $\epsilon$ in inequality (24) can be chosen when the variance proxy is prioritized. For example, by choosing $\epsilon=1/3$, we have

\[
\mathbb{P}\bigg[\big\lVert{\textstyle\sum^{n}_{i=1}}\bm{\phi}_{i}\big\rVert\geq\lambda\sqrt{{\textstyle\sum^{n}_{i=1}}\sigma_{i}^{2}}\,\bigg]\leq 7^{d}\exp\left(-\tfrac{4}{27}\lambda^{2}\right)\leq 7^{d}\exp\left(-\lambda^{2}/7\right)
\]

for all $\lambda\geq 0$.
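The trade-off between the prefactor $(1+2/\epsilon)^{d}$ and the exponent in (24) can be explored numerically. The sketch below evaluates the right-hand side of (24), searches a grid for the $\epsilon$ that minimizes it, and inverts the bound to find the $\lambda$ at which it equals a target level $\delta$. The function names, the grid search, and the parameter `delta` are assumptions made for illustration and are not part of the original note.

```python
import math


def _log_bound(lam, d, eps):
    """Logarithm of the right-hand side of (24)."""
    return d * math.log1p(2.0 / eps) - (1.0 - eps) ** 2 * lam ** 2 / 3.0


def norm_tail_bound(lam, d, eps):
    """Right-hand side of (24), capped at 1 because it bounds a probability."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return math.exp(min(_log_bound(lam, d, eps), 0.0))


def best_eps(lam, d, grid=999):
    """Grid point in (0, 1) that minimizes the bound (24) for the given lam and d."""
    candidates = [(k + 1) / (grid + 1) for k in range(grid)]
    return min(candidates, key=lambda e: _log_bound(lam, d, e))


def radius_at_confidence(delta, d, eps):
    """Value of lam at which the right-hand side of (24) equals delta in (0, 1]."""
    return math.sqrt(3.0 * (d * math.log1p(2.0 / eps) - math.log(delta))) / (1.0 - eps)
```

For instance, `norm_tail_bound(lam, d, 0.5)` reproduces the right-hand side of (25) whenever that value is below 1, and `best_eps(lam, d)` moves toward smaller $\epsilon$ as $\lambda$ grows, in line with the remark that small $\epsilon$ favors the variance proxy.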

Remark 4

When the subgaussianity of $\lVert\sum^{n}_{i=1}\bm{\phi}_{i}\rVert$ is described by $\mathbb{E}\big[\exp\big(\lVert\sum^{n}_{i=1}\bm{\phi}_{i}\rVert^{2}/\sigma^{2}\big)\big]\leq\rho$ with $\rho$ a constant, we will see

\[
\frac{\sigma^{2}}{\sum^{n}_{i=1}\sigma_{i}^{2}}=\left\{\begin{aligned}
&\mathcal{O}(1),&&\textup{if (I) holds;}\\
&\mathcal{O}(\ln(d+1)),&&\textup{if (II) holds;}\\
&\mathcal{O}(d),&&\textup{if (III) holds,}
\end{aligned}\right.
\]

with details omitted here.

3 Summary

This note restates the most basic properties of subgaussian random variables using the definition of $(\sigma,\rho)$-subgaussianity. It includes the equivalence of various characterizations, the treatment of (centered) $(\sigma,1)$-subgaussians, the closure of subgaussianity under simple operations, and an application to the large deviation analysis of subgaussian vector martingale differences. While this note adheres to well-established results, it also provides flexibility in translating between and aligning these results, as well as in manipulating their details.


Acknowledgments

The author acknowledges the support from the Institute of AI and Beyond of the University of Tokyo, and JST Moonshot R&D Grant Number JPMJMS2021.


References

[1] Dragoslav S. Mitrinović. Analytic Inequalities, volume 165 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin · Heidelberg, first edition, 1970. doi:10.1007/978-3-642-99970-3.
[2] Roman Vershynin. High Dimensional Probability: An Introduction with Applications in Data Science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK, first edition, 2018. doi:10.1017/9781108231596.
[3] Martin J. Wainwright. High-Dimensional Statistics: A Non-Asymptotic Viewpoint, volume 48 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, UK, first edition, 2019. doi:10.1017/9781108627771.
[4] Philippe Rigollet. High Dimensional Statistics (Lecture Notes). MIT OpenCourseWare, Massachusetts Institute of Technology, 2015. URL https://ocw.mit.edu/courses/18-s997-high-dimensional-statistics-spring-2015/.
[5] Edouard Pauwels. Statistics, optimization and algorithms in high dimension (Lecture Notes). Université Toulouse 3 Paul Sabatier, 2020. URL https://www.math.univ-toulouse.fr/~epauwels/M2RI/index.html.
[6] George Deligiannidis. Modern Statistical Theory (Lecture Notes). Statistics and Machine Learning Centre for Doctoral Training, University of Oxford, 2021. URL https://www.stats.ox.ac.uk/~deligian/pdf/statml/notes_ver2.pdf. Accessed July 2024.
[7] Omar Rivasplata. Subgaussian random variables: An expository note. Technical report, University of Alberta, 2012. doi:10.13140/RG.2.2.36288.23040.
[8] Guanghui Lan, Arkadi Nemirovski, and Alexander Shapiro. Validation analysis of mirror descent stochastic approximation method. Mathematical Programming, 134:425–458, 2012. doi:10.1007/s10107-011-0442-6.
[9] Chi Jin, Praneeth Netrapalli, Rong Ge, Sham M. Kakade, and Michael I. Jordan. A short note on concentration inequalities for random vectors with subgaussian norm, 2019. arXiv:1902.03736v1 [math.PR].
[10] Karl Stromberg. Probability for Analysts. Chapman & Hall Probability Series. Chapman & Hall, New York, first edition, 1994. doi:10.1201/9780203742020.
[11] Sidney Golden. Lower bounds for the Helmholtz function. Phys. Rev., Series II, 137:B1127–B1128, 1965. doi:10.1103/PhysRev.137.B1127.
[12] Colin J. Thompson. Inequality with applications in statistical mechanics. Journal of Mathematical Physics, 6:1812–1813, 1965. doi:10.1063/1.1704727.