2.1 Equivalent characterizations
Let us begin the main body with the adjusted definition of subgaussian variables, followed
by Theorem 1, which states the equivalence of various characterizations.
Definition 1
A random variable $X$ is called $(\sigma^2, b)$-subgaussian if there exist constants $\sigma > 0$ and $b \ge 1$ such that the moment-generating function of $X$ satisfies
$$\mathbb{E}\big[e^{\lambda X}\big] \le b\exp\big(\sigma^2\lambda^2/2\big) \quad\text{for all } \lambda \in \mathbb{R}. \tag{2}$$
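For illustration (the following instances are standard examples, added here for concreteness): a Gaussian variable $X \sim \mathcal{N}(0, \sigma^2)$ is $(\sigma^2, 1)$-subgaussian, since $\mathbb{E}[e^{\lambda X}] = e^{\sigma^2\lambda^2/2}$ exactly; a Rademacher variable is $(1, 1)$-subgaussian, since $\mathbb{E}[e^{\lambda X}] = \cosh\lambda \le e^{\lambda^2/2}$; and any variable with $|X| \le 1$, centered or not, is $(1, \sqrt{e})$-subgaussian, since $\mathbb{E}[e^{\lambda X}] \le e^{|\lambda|} \le \sqrt{e}\,e^{\lambda^2/2}$. The prefactor $b$ is what accommodates variables with nonzero means.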
Theorem 1 (Equivalent characterizations of subgaussianity)
For a random variable $X$ and constants $c_i \ge 1$, $K_i > 0$ ($i = 1, \ldots, 5$), the following properties are equivalent in the sense that there exist absolute constants and nondecreasing mappings $c_{i\to j}$, $K_{i\to j}$ such that the implication from property $(i)$ to property $(j)$ holds for all $X$ and all $i, j$ whenever $c_j \ge c_{i\to j}(c_i)$ and $K_j \ge K_{i\to j}(K_i)$:
- (1)
The distribution tail of $X$ satisfies
$$\Pr(|X| \ge t) \le c_1\exp\big(-t^2/K_1^2\big) \quad\text{for all } t \ge 0;$$
- (2)
The moments of even orders of $X$ satisfy
$$\mathbb{E}\big[X^{2q}\big] \le c_2\,q!\,K_2^{2q} \quad\text{for all } q \in \mathbb{N}_+;$$
- (3)
The moment-generating function of $X^2$ is finite at a specific point such that
$$\mathbb{E}\big[\exp\big(X^2/K_3^2\big)\big] \le c_3;$$
- (4)
$X$ is $(K_4^2, c_4)$-subgaussian, namely, the moment-generating function of $X$ satisfies
$$\mathbb{E}\big[\exp(\lambda X)\big] \le c_4\exp\big(\lambda^2K_4^2/2\big) \quad\text{for all } \lambda \in \mathbb{R};$$
- (5)
The distribution tails of $X$ satisfy
$$\max\{\Pr(X \ge t),\ \Pr(X \le -t)\} \le c_5\exp\big(-t^2/K_5^2\big) \quad\text{for all } t \ge 0.$$
Proof
(1)⇒(2) (according to Ref. [2]): Given property (1), we have
$$\mathbb{E}\big[|X|^p\big] = p\int_0^\infty t^{p-1}\Pr(|X| \ge t)\,dt \le p\,c_1\int_0^\infty t^{p-1}e^{-t^2/K_1^2}\,dt = \frac{p}{2}\,c_1K_1^p\,\Gamma\Big(\frac{p}{2}\Big)$$
for any $p > 0$, where $\Gamma(\cdot)$ denotes the gamma function.
When $p = 2q$ with $q \in \mathbb{N}_+$, the right-hand side equals $c_1\,q!\,K_1^{2q}$, and property (2) is obtained with $c_2 = c_1$ and $K_2 = K_1$; that is, $c_{1\to2}(c) = c$ and $K_{1\to2}(K) = K$.
(2)⇒(3) (according to Ref. [2], with adjustments):
Given property (2), we find that the moment-generating function of $X^2$ satisfies
$$\mathbb{E}\big[e^{X^2/K^2}\big] = 1 + \sum_{q=1}^\infty\frac{\mathbb{E}[X^{2q}]}{q!\,K^{2q}} \le 1 + c_2\sum_{q=1}^\infty\Big(\frac{K_2^2}{K^2}\Big)^{\!q} = 1 + \frac{c_2K_2^2}{K^2 - K_2^2}$$
for all $K > K_2$.
We observe that the rightmost side of the above expression increases with $c_2$ and $K_2$ and decreases with $K$;
we can freely select any $\theta > 1$ and accordingly set
$K_3 = \theta K_2$, $c_3 = 1 + c_2/(\theta^2 - 1)$,
with which property (3) holds; that is, $K_{2\to3}(K) = \theta K$ and $c_{2\to3}(c) = 1 + c/(\theta^2 - 1)$.
(3)⇒(4) (according to [3], with adjustments):
From property (3) and the inequality $\lambda x \le \lambda^2K_3^2/2 + x^2/(2K_3^2)$,
we find
$$\mathbb{E}\big[e^{\lambda X}\big] \le e^{\lambda^2K_3^2/2}\,\mathbb{E}\big[e^{X^2/(2K_3^2)}\big] \le e^{\lambda^2K_3^2/2}\Big(\mathbb{E}\big[e^{X^2/K_3^2}\big]\Big)^{1/2} \le \sqrt{c_3}\,e^{\lambda^2K_3^2/2}$$
for all $\lambda \in \mathbb{R}$, where the middle step uses Jensen's inequality.
Therefore, property (4) holds for $c_4 = \sqrt{c_3}$
and $K_4 = K_3$, i.e., $c_{3\to4}(c) = \sqrt{c}$ and $K_{3\to4}(K) = K$.
(4)⇒(5) (generic Chernoff bound, see [2, 3]):
Given property (4), Markov's inequality implies
$$\Pr(X \ge t) \le e^{-\lambda t}\,\mathbb{E}\big[e^{\lambda X}\big] \le c_4\exp\big(\lambda^2K_4^2/2 - \lambda t\big)$$
for all $t \ge 0$ and $\lambda > 0$.
Note that $\Pr(X \ge t) \le c_4\exp(\lambda^2K_4^2/2 - \lambda t)$
holds trivially for $\lambda = 0$, as $c_4 \ge 1$. Minimizing the right-hand side with respect to $\lambda \ge 0$, we obtain
$$\Pr(X \ge t) \le c_4\exp\Big(-\frac{t^2}{2K_4^2}\Big),$$
where the infimum is attained at $\lambda = t/K_4^2$.
Since property (4) is invariant when replacing $X$ with $-X$, we similarly find
$\Pr(X \le -t) \le c_4\exp(-t^2/(2K_4^2))$ for all $t \ge 0$.
Therefore, property (5) holds whenever $c_5 \ge c_4$ and $K_5 \ge \sqrt{2}K_4$, i.e.,
$c_{4\to5}(c) = c$ and $K_{4\to5}(K) = \sqrt{2}K$.
(5)⇒(1):
This is trivial, with $c_{5\to1}(c) = 2c$ and $K_{5\to1}(K) = K$, since $\Pr(|X| \ge t) \le \Pr(X \ge t) + \Pr(X \le -t)$.
So far, we have demonstrated the equivalence of all characterizations; any $c_{i\to j}$ and $K_{i\to j}$
not directly addressed can be determined using the transitivity of implications.
Furthermore, improvements can be made to certain mappings with the following additional
implications.
(3)⇒(1) [2]:
This follows immediately from Markov's inequality, since
$$\Pr(|X| \ge t) = \Pr\big(e^{X^2/K_3^2} \ge e^{t^2/K_3^2}\big) \le c_3\exp\big(-t^2/K_3^2\big)$$
for any $t \ge 0$, given property (3).
Therefore, property (1) holds for all $c_1 \ge c_3$ and $K_1 \ge K_3$,
i.e., $c_{3\to1}(c) = c$ and $K_{3\to1}(K) = K$. The mapping $K_{3\to1}(K) = K$ obtained here
is slightly better than the composition $K_{5\to1}\circ K_{4\to5}\circ K_{3\to4}(K) = \sqrt{2}K$, while
$c_{3\to1}(c) = c$ obtained here is smaller than the composition
$c_{5\to1}\circ c_{4\to5}\circ c_{3\to4}(c) = 2\sqrt{c}$ whenever $c < 4$.
(3)⇒(2) [6]:
By expanding property (3) into a power series, we obtain
$$\mathbb{E}\big[e^{X^2/K_3^2}\big] = \sum_{q=0}^\infty\frac{\mathbb{E}[X^{2q}]}{q!\,K_3^{2q}} \le c_3.$$
Since every term in the sum is non-negative, it follows that
$$\mathbb{E}\big[X^{2q}\big] \le c_3\,q!\,K_3^{2q}$$
for all positive integers $q$. Therefore, property (2) holds, with $c_2 = c_3$ and
$K_2 = K_3$; these mappings coincide with those obtained via (3)⇒(1)⇒(2) and are slightly better than
the composition through the longer chain (3)⇒(4)⇒(5)⇒(1)⇒(2).
(1)⇒(3):
This can be derived from the chain (1)⇒(2)⇒(3) or directly as shown in
Refs. [6, 7]. However, it should be noted that property (1) may
lead to a slightly better conclusion than property (2). Given property (1), we have
$$\mathbb{E}\big[|X|^p\big] \le p\int_0^\infty t^{p-1}\min\big\{1,\ c_1e^{-t^2/K_1^2}\big\}\,dt = K_1^p\Big[(\ln c_1)^{p/2} + \frac{p}{2}\,c_1\,\Gamma\Big(\frac{p}{2},\,\ln c_1\Big)\Big]$$
for any $p > 0$, where $\Gamma(\cdot,\cdot)$ denotes the upper incomplete gamma function.
When $p = 2q$ with $q \in \mathbb{N}_+$, we have
$$\mathbb{E}\big[X^{2q}\big] \le K_1^{2q}\big[(\ln c_1)^q + c_1q\,\Gamma(q, \ln c_1)\big] = K_1^{2q}\,q!\sum_{k=0}^{q}\frac{(\ln c_1)^k}{k!},$$
which is slightly better than the result $\mathbb{E}[X^{2q}] \le c_1\,q!\,K_1^{2q}$
in the derivation
for implication (1)⇒(2), as the partial sum above is less than $e^{\ln c_1} = c_1$. Furthermore, we find
$$\mathbb{E}\big[e^{X^2/K^2}\big] = \sum_{q=0}^\infty\frac{\mathbb{E}[X^{2q}]}{q!\,K^{2q}} \le \frac{K^2}{K^2 - K_1^2}\,c_1^{K_1^2/K^2}$$
for all $K > K_1$.
Therefore, we can freely select any $\theta > 1$, and property (3) holds with
$K_3 = \theta K_1$ and $c_3 = \frac{\theta^2}{\theta^2 - 1}\,c_1^{1/\theta^2}$,
where $c_{1\to3}$ is slightly better than the composition
$c_{2\to3}\circ c_{1\to2}(c) = 1 + \frac{c}{\theta^2 - 1}$, by the arithmetic–geometric mean inequality $\theta^2c^{1/\theta^2} \le \theta^2 - 1 + c$.
(4)⇒(3) (according to [3]):
Starting from property (4), we have
$$\mathbb{E}\big[e^{\lambda X}\big]\,e^{-\lambda^2s/2} \le c_4\exp\big(-(s - K_4^2)\lambda^2/2\big)$$
for all $\lambda \in \mathbb{R}$ and $s > K_4^2$.
By integrating both sides with respect to $\lambda$ on $(-\infty, \infty)$ and using Fubini's
theorem, we obtain
$$\mathbb{E}\big[e^{X^2/(2s)}\big]\sqrt{2\pi/s} \le c_4\sqrt{2\pi/(s - K_4^2)},$$
which simplifies to
$$\mathbb{E}\big[e^{X^2/(2s)}\big] \le c_4\sqrt{\frac{s}{s - K_4^2}}$$
for all $s > K_4^2$.
Therefore, property (3) holds with $K_3 = \sqrt{2}\,\theta K_4$ and
$c_3 = c_4\theta/\sqrt{\theta^2 - 1}$ for any $\theta > 1$, upon taking $s = \theta^2K_4^2$.
Meanwhile, the implication chain (4)⇒(5)⇒(1)⇒(3) leads to
$K_3 = \sqrt{2}\,\theta K_4$ and $c_3 = \theta^2(2c_4)^{1/\theta^2}/(\theta^2 - 1)$.
We may finally choose
$$c_{4\to3}(c) = \min\Big\{\frac{\theta c}{\sqrt{\theta^2 - 1}},\ \frac{\theta^2(2c)^{1/\theta^2}}{\theta^2 - 1}\Big\}, \qquad K_{4\to3}(K) = \sqrt{2}\,\theta K, \qquad \theta > 1. \tag{3}$$
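As a sanity check on the mappings reconstructed above, the standard normal admits closed forms, $\mathbb{E}[e^{\lambda X}] = e^{\lambda^2/2}$ (property (4) with $K_4 = c_4 = 1$) and $\mathbb{E}[e^{aX^2}] = (1-2a)^{-1/2}$ for $a < 1/2$, for which the mapping $c_{4\to3}$ derived via Fubini's theorem is in fact an equality. The following minimal sketch (the sample size and the grid of $s$ values are arbitrary choices) compares a Monte Carlo estimate against both the closed form and the claimed $c_3$:

```python
import numpy as np

# X ~ N(0,1): property (4) holds with K4 = 1, c4 = 1, and
# E[exp(a X^2)] = 1/sqrt(1 - 2a) for a < 1/2 in closed form.
rng = np.random.default_rng(0)
x = rng.standard_normal(2_000_000)

K4, c4 = 1.0, 1.0
for s in [3.0, 4.0, 8.0]:                 # any s > K4^2 is admissible
    K3 = np.sqrt(2 * s)                   # mapping K3 = sqrt(2 s)
    c3 = c4 * np.sqrt(s / (s - K4**2))    # mapping c3 = c4 sqrt(s/(s - K4^2))
    mc = np.exp(x**2 / K3**2).mean()      # Monte Carlo E[exp(X^2/K3^2)]
    exact = 1.0 / np.sqrt(1.0 - 2.0 / K3**2)
    print(f"s={s}: MC {mc:.4f}, exact {exact:.4f}, bound c3 = {c3:.4f}")
```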
2.2 Centered subgaussian variables
Definition 1 and the equivalent characterizations in
Theorem 1 do not require subgaussian variables to be centered (by
"subgaussian" we mean that any of the properties in Theorem 1 is satisfied).
Only $(\sigma^2, 1)$-subgaussian variables, as conventionally defined, are guaranteed to have a zero
mean. Conversely, any centered subgaussian variable must be $(\sigma^2, 1)$-subgaussian for some
$\sigma^2 > 0$.
Theorem 2 (Centered subgaussian variables)
Let $X$ be a subgaussian variable, satisfying any of the properties in
Theorem 1, and assume that $\mathbb{E}[X] = 0$.
Then $X$ must be $(\sigma^2, 1)$-subgaussian, namely, its moment-generating function satisfies
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\big(\sigma^2\lambda^2/2\big)$$
for all $\lambda \in \mathbb{R}$, where $\sigma^2$ can be determined as follows:
- (a)
If property (3) in Theorem 1 is satisfied, then
$$\sigma^2 = 2K_3^2\max\big\{\ln c_3,\ \sqrt{\ln c_3},\ 1/2\big\};$$
- (b)
If property (2) in Theorem 1 is satisfied, then
$$\sigma^2 = (1 + 2c_2)\,K_2^2;$$
- (c)
If property (1) in Theorem 1 is satisfied, then
$$\sigma^2 = (1 + 2c_1)\,K_1^2;$$
- (d)
If property (4) in Theorem 1 is satisfied with $c_4 > 1$, then
$$\sigma^2 = 4K_4^2\Big(1 + \frac{1}{2\ln c_4}\Big)\Big(\ln c_4 + \frac{1}{2}\ln(1 + 2\ln c_4)\Big)$$
(for $c_4 = 1$, property (4) itself states that $X$ is $(K_4^2, 1)$-subgaussian).
Proof
Case (a) (adjusted from the proof for Lemma 2 in Ref. [8]):
From the numerical relation $e^x \le x + e^{x^2}$ ($x \in \mathbb{R}$), we find
$$\mathbb{E}\big[e^{\lambda X}\big] \le \lambda\,\mathbb{E}[X] + \mathbb{E}\big[e^{\lambda^2X^2}\big] \le \Big(\mathbb{E}\big[e^{X^2/K_3^2}\big]\Big)^{\lambda^2K_3^2} \le c_3^{\lambda^2K_3^2} \tag{4}$$
for any $\lambda$ such that $\lambda^2K_3^2 \le 1$, i.e.,
$|\lambda| \le 1/K_3$, where we used the assumption $\mathbb{E}[X] = 0$,
Jensen's inequality, and property (3) in Theorem 1. On the other hand, from the
inequality $\lambda x \le \lambda^2a/2 + x^2/(2a)$ for $a > 0$, we find
$$\mathbb{E}\big[e^{\lambda X}\big] \le e^{\lambda^2a/2}\,\mathbb{E}\big[e^{X^2/(2a)}\big] \le e^{\lambda^2a/2}\,c_3^{K_3^2/(2a)}$$
for all $a \ge K_3^2/2$, where we again used Jensen's inequality and property (3). By minimizing the
rightmost side with respect to $a$, we obtain
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\big(|\lambda|\,K_3\sqrt{\ln c_3}\big), \tag{5}$$
attained at $a = K_3\sqrt{\ln c_3}/|\lambda|$, which is admissible whenever $|\lambda| \le 2\sqrt{\ln c_3}/K_3$.
Combining inequalities (4) and (5), the
upper bound of the moment-generating function of $X$ on $\mathbb{R}$ can be expressed as follows:
When $|\lambda| \le 1/K_3$, we have
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\big(\lambda^2K_3^2\ln c_3\big).$$
When $|\lambda| > 1/K_3$, we have
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\big(\lambda^2K_3^2\sqrt{\ln c_3}\big) \quad\text{if } |\lambda| \le \frac{2\sqrt{\ln c_3}}{K_3}, \qquad \mathbb{E}\big[e^{\lambda X}\big] \le \exp\Big(\frac{\lambda^2K_3^2}{4} + \ln c_3\Big) \le \exp\Big(\frac{\lambda^2K_3^2}{2}\Big) \quad\text{otherwise}.$$
Note that
$$\exp\big(|\lambda|\,K_3\sqrt{\ln c_3}\big) \le \exp\big(\lambda^2K_3^2\sqrt{\ln c_3}\big) \quad\text{for } |\lambda| \ge 1/K_3,$$
and that the last regime takes $a = K_3^2/2$ in (5) together with $\ln c_3 < \lambda^2K_3^2/4$.
By choosing the greatest value of the coefficient for $\lambda^2$ in each interval
$[0, 1/K_3]$, $(1/K_3, 2\sqrt{\ln c_3}/K_3]$, or $(2\sqrt{\ln c_3}/K_3, \infty)$, the claim
of case (a) is proved, with $\sigma^2 = 2K_3^2\max\{\ln c_3, \sqrt{\ln c_3}, 1/2\}$.
Case (b) (adjusted from [3]):
The moments of odd orders of a random variable can be bounded according to the Cauchy–Schwarz
inequality, as
$$\mathbb{E}\big[|X|^{2q+1}\big] \le \Big(\mathbb{E}\big[X^{2q}\big]\Big)^{1/2}\Big(\mathbb{E}\big[X^{2q+2}\big]\Big)^{1/2} \le c_2\,q!\,\sqrt{q+1}\,K_2^{2q+1}$$
for all $q \in \mathbb{N}_+$, where the second inequality invokes property (2). Applying this for the odd-order terms and substituting it into the power
series expansion of $\mathbb{E}[e^{\lambda X}]$, we find
$$\mathbb{E}\big[e^{\lambda X}\big] = 1 + \sum_{q=1}^\infty\Big(\frac{\lambda^{2q}\,\mathbb{E}[X^{2q}]}{(2q)!} + \frac{\lambda^{2q+1}\,\mathbb{E}[X^{2q+1}]}{(2q+1)!}\Big) \le 1 + \sum_{q=1}^\infty\gamma_q\,(\lambda K_2)^{2q},$$
where we have used the assumption $\mathbb{E}[X] = 0$ and property (2), split the odd-order terms by $|\lambda K_2|^{2q+1} \le \big(\alpha(\lambda K_2)^{2q} + \alpha^{-1}(\lambda K_2)^{2q+2}\big)/2$, and have defined
coefficients
$$\gamma_q = c_2\Big[\frac{q!}{(2q)!} + \frac{(q-1)!\,\sqrt{q}}{2\alpha\,(2q-1)!} + \frac{\alpha\,q!\,\sqrt{q+1}}{2\,(2q+1)!}\Big],$$
where $\alpha > 0$ is to be determined. By choosing $\alpha = 2$, we have
$$q!\,\gamma_q \le c_2\,2^{-q}\Big(1 + \frac{\sqrt{q}}{2} + \frac{\sqrt{2}}{3}\Big)$$
and
$$\big(q!\,\gamma_q\big)^{1/q} \le c_2 + \frac{1}{2} \tag{6}$$
for all $q \in \mathbb{N}_+$, where the first bound uses $(q!)^2/(2q)! \le 2^{-q}$, $q!\,(q-1)!/(2q-1)! \le 2^{1-q}$, and $\sqrt{q+1}/(2q+1) \le \sqrt{2}/3$, and the second can be verified elementarily from the first.
Inequality (6) states precisely that $\gamma_q(\lambda K_2)^{2q} \le (\lambda^2\sigma^2/2)^q/q!$ for
$\sigma^2 = (1 + 2c_2)K_2^2$. With such choices, we finally have
$$\mathbb{E}\big[e^{\lambda X}\big] \le \sum_{q=0}^\infty\frac{1}{q!}\Big(\frac{\lambda^2\sigma^2}{2}\Big)^{\!q} = e^{\lambda^2\sigma^2/2},$$
which proves the claim of case (b).
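Since the constants in case (b) were obtained through elementary but error-prone estimates, inequality (6) can be checked numerically; the sketch below (an added verification, with arbitrary ranges for $c_2$ and $q$) evaluates $q!\,\gamma_q$ in log-space via lgamma:

```python
from math import lgamma, log, exp

def log_qfact_gamma_q(q: int, c2: float, alpha: float = 2.0) -> float:
    """log of q! * gamma_q for the coefficients defined in case (b)."""
    lq = lgamma(q + 1)  # log q!
    t1 = 2 * lq - lgamma(2 * q + 1)                                       # (q!)^2/(2q)!
    t2 = lq + lgamma(q) + 0.5 * log(q) - log(2 * alpha) - lgamma(2 * q)   # q!(q-1)!sqrt(q)/(2a(2q-1)!)
    t3 = 2 * lq + 0.5 * log(q + 1) + log(alpha / 2) - lgamma(2 * q + 2)   # a(q!)^2 sqrt(q+1)/(2(2q+1)!)
    m = max(t1, t2, t3)
    return log(c2) + m + log(exp(t1 - m) + exp(t2 - m) + exp(t3 - m))

for c2 in [0.1, 0.5, 1.0, 5.0, 25.0]:
    worst = max(exp(log_qfact_gamma_q(q, c2) / q) for q in range(1, 500))
    assert worst <= c2 + 0.5, (c2, worst)
    print(f"c2={c2}: max_q (q! gamma_q)^(1/q) = {worst:.4f} <= {c2 + 0.5}")
```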
Case (c): Property (1) in Theorem 1 indicates
$$\mathbb{E}\big[X^{2q}\big] \le K_1^{2q}\big[(\ln c_1)^q + c_1q\,\Gamma(q, \ln c_1)\big] \le c_1\,q!\,K_1^{2q},$$
where $\Gamma(\cdot,\cdot)$ denotes the upper incomplete gamma function, as shown in the proof for
(1)⇒(3) therein.
Similar to the proof above for case (b), it is easy to find
$$\mathbb{E}\big[e^{\lambda X}\big] \le 1 + \sum_{q=1}^\infty\gamma_q\,(\lambda K_1)^{2q}$$
for all $\lambda \in \mathbb{R}$, where, by choosing $\alpha = 2$ for the splitting of the odd-order terms, the coefficients
are given by
$$\gamma_q = c_1\Big[\frac{q!}{(2q)!} + \frac{(q-1)!\,\sqrt{q}}{4\,(2q-1)!} + \frac{q!\,\sqrt{q+1}}{(2q+1)!}\Big].$$
Following from the inequality
$$\big(q!\,\gamma_q\big)^{1/q} \le c_1 + \frac{1}{2}, \tag{7}$$
which holds for $\alpha = 2$ and all $q \in \mathbb{N}_+$ for the same reasons as inequality (6), we have
$$\gamma_q\,(\lambda K_1)^{2q} \le \frac{1}{q!}\Big(\frac{\lambda^2\sigma^2}{2}\Big)^{\!q}, \qquad \sigma^2 = (1 + 2c_1)K_1^2,$$
for all $\lambda \in \mathbb{R}$ and all $q \in \mathbb{N}_+$. Therefore, we finally have
$$\mathbb{E}\big[e^{\lambda X}\big] \le \sum_{q=0}^\infty\frac{1}{q!}\Big(\frac{\lambda^2\sigma^2}{2}\Big)^{\!q} = e^{\lambda^2\sigma^2/2},$$
which proves the claim of case (c).
Case (d):
According to Theorem 1 and Remark 1, if property (4) is
satisfied, then property (3) is satisfied, with $K_3 = \sqrt{2}\,\theta K_4$
and $c_3 = c_4\theta/\sqrt{\theta^2 - 1}$ for any $\theta > 1$.
Furthermore, given $\mathbb{E}[X] = 0$, the proof for case (a) indicates
$$\mathbb{E}\big[e^{\lambda X}\big] \le c_3^{\lambda^2K_3^2} = \exp\Big(2\theta^2\lambda^2K_4^2\ln\frac{c_4\theta}{\sqrt{\theta^2 - 1}}\Big)$$
for all $\lambda$ such that $|\lambda| \le 1/K_3$, i.e.,
$|\lambda| \le 1/(\sqrt{2}\,\theta K_4)$. On the other hand, for all $\lambda$ such
that $|\lambda| > 1/(\sqrt{2}\,\theta K_4)$, we clearly have
$$\mathbb{E}\big[e^{\lambda X}\big] \le c_4\,e^{\lambda^2K_4^2/2} \le \exp\big(\lambda^2K_4^2(2\theta^2\ln c_4 + 1/2)\big).$$
Also, we have $2\theta^2\ln\big(c_4\theta/\sqrt{\theta^2 - 1}\big) \ge 2\theta^2\ln c_4 + 1/2$ for $\theta > 1$,
since $-\theta^2\ln(1 - \theta^{-2}) \ge 1$ can be verified easily.
Therefore, for all $\lambda \in \mathbb{R}$ we have
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\Big(2\theta^2\lambda^2K_4^2\ln\frac{c_4\theta}{\sqrt{\theta^2 - 1}}\Big),$$
where the minimizing $\theta$ tends to $\infty$ and $1$ when $c_4$ approaches $1$ and $\infty$,
respectively. Since the minimizing $\theta$ does not have an explicit expression, we choose the
following surrogate
$$\theta^2 = 1 + \frac{1}{2\ln c_4}, \tag{8}$$
which leads to $\ln\big(\theta^2/(\theta^2 - 1)\big) = \ln(1 + 2\ln c_4)$, and thus
$$\mathbb{E}\big[e^{\lambda X}\big] \le \exp\big(\lambda^2\sigma^2/2\big), \qquad \sigma^2 = 4K_4^2\Big(1 + \frac{1}{2\ln c_4}\Big)\Big(\ln c_4 + \frac{1}{2}\ln(1 + 2\ln c_4)\Big).$$
Therefore, the claim in case (d) is proved.
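As a consistency check of the four values of $\sigma^2$ (each must be at least the true optimal value, which is $1$ for a standard normal), one can feed closed-form constants for $X \sim \mathcal{N}(0,1)$ into the formulas of Theorem 2; the specific inputs below ($K_3^2 = 4$, $c_4 = e$, and so on) are arbitrary admissible choices:

```python
from math import log, sqrt, e

# X ~ N(0,1): closed-form inputs for each case of Theorem 2.
# (a) property (3): K3^2 = 4 gives c3 = E exp(X^2/4) = 1/sqrt(1 - 1/2) = sqrt(2).
K3sq, c3 = 4.0, sqrt(2.0)
sigma_a = 2 * K3sq * max(log(c3), sqrt(log(c3)), 0.5)
# (b) property (2): E X^(2q) = (2q-1)!! <= q! 2^q, so c2 = 1, K2^2 = 2.
c2, K2sq = 1.0, 2.0
sigma_b = (1 + 2 * c2) * K2sq
# (c) property (1): P(|X| >= t) <= 2 exp(-t^2/2), so c1 = 2, K1^2 = 2.
c1, K1sq = 2.0, 2.0
sigma_c = (1 + 2 * c1) * K1sq
# (d) property (4) with a nontrivial prefactor, e.g. c4 = e, K4^2 = 1.
c4, K4sq = e, 1.0
L = log(c4)
sigma_d = 4 * K4sq * (1 + 1 / (2 * L)) * (L + 0.5 * log(1 + 2 * L))

for name, s2 in [("a", sigma_a), ("b", sigma_b), ("c", sigma_c), ("d", sigma_d)]:
    print(f"case ({name}): sigma^2 = {s2:.3f}")
    assert s2 >= 1.0  # never below the exact subgaussian parameter of N(0,1)
```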
Note that the values of $\sigma^2$ for $(\sigma^2, 1)$-subgaussian variables, explicitly provided in
Theorem 2, are derived directly from each of the subgaussian properties and are
not meant to be optimal. By translating between different equivalent properties, one can
potentially find a better $\sigma^2$, as demonstrated in the following example.
Example 1
Assume that $X$ is centered and satisfies
$\mathbb{E}[X^{2q}] \le c_2\,q!\,K_2^{2q}$ for all $q \in \mathbb{N}_+$, where
$c_2$ takes on the values $1$, $5$, or $25$.
According to case (b) of Theorem 2, $X$ is
$(3K_2^2, 1)$-, $(11K_2^2, 1)$-, or $(51K_2^2, 1)$-subgaussian for each
respective value of $c_2$.
Meanwhile, Theorem 1 and Remark 1 imply that
$\mathbb{E}[\exp(X^2/(2K_2^2))] \le 1 + c_2$. Consequently,
case (a) of Theorem 2 indicates that $X$ is
approximately $(3.33K_2^2, 1)$-, $(7.17K_2^2, 1)$-, or $(13.03K_2^2, 1)$-subgaussian,
respectively.
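The figures quoted in Example 1 follow directly from the two formulas; the short script below (added as a reproduction aid) recomputes them:

```python
from math import log, sqrt

K2sq = 1.0  # take K2 = 1 without loss of generality
for c2 in [1.0, 5.0, 25.0]:
    sigma_b = (1 + 2 * c2) * K2sq                          # case (b) directly
    c3, K3sq = 1 + c2, 2 * K2sq                            # property (3) via theta = sqrt(2)
    sigma_a = 2 * K3sq * max(log(c3), sqrt(log(c3)), 0.5)  # then case (a)
    print(f"c2={c2}: case (b) sigma^2 = {sigma_b:.2f}, via (3) and case (a): {sigma_a:.2f}")
```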
While some texts define subgaussian variables by Eq. (2) fixing
$b = 1$, some others [3, 6] generalize the concept by allowing
nonzero expectations. They consider $X$ subgaussian if its centered version,
$X - \mathbb{E}[X]$, satisfies Definition 1 with $b = 1$;
this implies that a subgaussian variable plus any constant remains subgaussian. This treatment is
actually "equivalent" to the definition of $(\sigma^2, b)$-subgaussian variables provided here, as
will be clarified in Theorem 3 and Corollary 1.
Theorem 3
Let $X$ be a $(\sigma^2, b)$-subgaussian random variable
and $k$ be a constant.
Then $X + k$ is $(\sigma'^2, b')$-subgaussian, with
$$\sigma'^2 = (1 + \varepsilon)\,\sigma^2, \qquad b' = b\exp\Big(\frac{k^2}{2\varepsilon\sigma^2}\Big), \tag{9}$$
where $\varepsilon$ is an arbitrary positive number.
Corollary 1
A random variable $X$ is $(\sigma^2, b)$-subgaussian for some constants
$\sigma^2 > 0$ and $b \ge 1$, if and only if $X - \mathbb{E}[X]$ is $(\sigma'^2, 1)$-subgaussian for
some $\sigma'^2 > 0$.
Proof
Given the assumptions in Theorem 3, we have
$$\mathbb{E}\big[e^{\lambda(X+k)}\big] = e^{\lambda k}\,\mathbb{E}\big[e^{\lambda X}\big] \le b\exp\Big(\lambda k + \frac{\sigma^2\lambda^2}{2}\Big) \le b\exp\Big(\frac{k^2}{2\varepsilon\sigma^2}\Big)\exp\Big(\frac{(1+\varepsilon)\sigma^2\lambda^2}{2}\Big)$$
for all $\lambda \in \mathbb{R}$ and $\varepsilon > 0$, where the last step uses $\lambda k \le \varepsilon\sigma^2\lambda^2/2 + k^2/(2\varepsilon\sigma^2)$, hence proving Theorem 3.
Applying Theorem 3 with $k = -\mathbb{E}[X]$
and Theorem 2 leads to the corollary.
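As a concrete instance of Theorem 3 (added for illustration): $X \sim \mathcal{N}(0, 1)$ is $(1, 1)$-subgaussian, and taking $\varepsilon = 1$ shows that $X + k$ is $(2, e^{k^2/2})$-subgaussian; indeed,
$$\mathbb{E}\big[e^{\lambda(X+k)}\big] = e^{\lambda k + \lambda^2/2} \le e^{k^2/2}\,e^{\lambda^2},$$
by $\lambda k \le \lambda^2/2 + k^2/2$.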
2.3 Closure under simple operations
Theorem 3 may be viewed as a specific instance of the closure of subgaussianity
under summation, as a constant $k$ is trivially a
$(\sigma^2, e^{k^2/(2\sigma^2)})$-subgaussian variable for any $\sigma^2 > 0$,
according to Definition 1.
Some more general cases of the closure of subgaussianity are demonstrated by
Theorems 4 and 5, where the discussion is based on properties (4)
and (3) in Theorem 1, respectively.
Obviously, the closure of subgaussianity can also be
expressed with respect to the other subgaussian properties, potentially by introducing additional
absolute constants.
Theorem 4
Let $X_1, \ldots, X_n$ be random variables that are
$(\sigma_1^2, b_1), \ldots, (\sigma_n^2, b_n)$-subgaussian, respectively. Then we have
- (i)
$X = \sum_{i=1}^n X_i$ is $(\sigma^2, b)$-subgaussian, with
$$\sigma^2 = \Big(\sum_i\sigma_i\Big)^{\!2}, \qquad b = \prod_i b_i^{\,\sigma_i/\sum_j\sigma_j};$$
- (ii)
$X = \sum_{i=1}^n X_i$,
where all $X_i$ are independent from one another, is $(\sigma^2, b)$-subgaussian, with
$$\sigma^2 = \sum_i\sigma_i^2, \qquad b = \prod_i b_i;$$
- (iii)
$X = \max_i X_i$ is $(\sigma^2, b)$-subgaussian, with
$$\sigma^2 = \max_i\sigma_i^2, \qquad b = \sum_i b_i.$$
Proof
Case (i) (adjusted from Ref. [7]):
Considering the case of $n = 2$, we have
$$\mathbb{E}\big[e^{\lambda(X_1+X_2)}\big] \le \Big(\mathbb{E}\big[e^{\lambda pX_1}\big]\Big)^{1/p}\Big(\mathbb{E}\big[e^{\lambda qX_2}\big]\Big)^{1/q} \le b_1^{1/p}b_2^{1/q}\exp\Big(\frac{\lambda^2(p\sigma_1^2 + q\sigma_2^2)}{2}\Big) = b_1^{\frac{\sigma_1}{\sigma_1+\sigma_2}}b_2^{\frac{\sigma_2}{\sigma_1+\sigma_2}}e^{\lambda^2(\sigma_1+\sigma_2)^2/2}$$
for all $\lambda \in \mathbb{R}$, where we applied Hölder's inequality with $p = (\sigma_1+\sigma_2)/\sigma_1$ and $q = (\sigma_1+\sigma_2)/\sigma_2$ for the first inequality.
The conclusion generalizes to larger $n$ by induction.
Case (ii):
Given the assumption, we have
$$\mathbb{E}\big[e^{\lambda\sum_iX_i}\big] = \prod_i\mathbb{E}\big[e^{\lambda X_i}\big] \le \Big(\prod_ib_i\Big)\exp\Big(\frac{\lambda^2\sum_i\sigma_i^2}{2}\Big)$$
for all $\lambda \in \mathbb{R}$, where we used the independence of all $X_i$.
Case (iii):
Given the assumption, we have
$$\mathbb{E}\big[e^{\lambda\max_iX_i}\big] \le \sum_i\mathbb{E}\big[e^{\lambda X_i}\big] \le \Big(\sum_ib_i\Big)\exp\Big(\frac{\lambda^2\max_i\sigma_i^2}{2}\Big)$$
for any $\lambda \in \mathbb{R}$, hence proving the claim.
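A standard consequence of case (iii), added here as an illustration: if $X_1, \ldots, X_n$ ($n \ge 2$) are each $(\sigma^2, 1)$-subgaussian, not necessarily independent, then case (iii) and Jensen's inequality give, for any $\lambda > 0$,
$$\mathbb{E}\Big[\max_iX_i\Big] \le \frac{1}{\lambda}\ln\mathbb{E}\Big[e^{\lambda\max_iX_i}\Big] \le \frac{\ln n}{\lambda} + \frac{\lambda\sigma^2}{2} = \sigma\sqrt{2\ln n},$$
where the last equality takes $\lambda = \sqrt{2\ln n}/\sigma$.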
Theorem 5
Let $X_1, \ldots, X_n$ be random variables satisfying
$\mathbb{E}[\exp(X_i^2/K_i^2)] \le c_i$ ($i = 1, \ldots, n$), respectively.
Then we have
- (i)
$X = \sum_{i=1}^n X_i$ satisfies
$\mathbb{E}[\exp(X^2/K^2)] \le c$, with
$$K = \sum_iK_i, \qquad c = \prod_ic_i^{\,K_i/\sum_jK_j};$$
- (ii)
$S = \sum_{i=1}^n X_i^2$ satisfies
$\mathbb{E}[\exp(S/K^2)] \le c$, with
$$K^2 = \sum_iK_i^2, \qquad c = \prod_ic_i^{\,K_i^2/\sum_jK_j^2};$$
- (iii)
$X = \sum_{i=1}^n X_i$, where all $X_i$ are independent from one another, satisfies
$\mathbb{E}[\exp(X^2/K^2)] \le c$, with
$$K^2 = \sum_iK_i^2, \qquad c = \prod_ic_i.$$
Proof
Case (i):
Considering the case of $n = 2$, we have
$$(X_1+X_2)^2 \le \Big(1 + \frac{K_2}{K_1}\Big)X_1^2 + \Big(1 + \frac{K_1}{K_2}\Big)X_2^2 = (K_1+K_2)\Big(\frac{X_1^2}{K_1} + \frac{X_2^2}{K_2}\Big),$$
and we further find
$$\mathbb{E}\Big[\exp\Big(\frac{(X_1+X_2)^2}{(K_1+K_2)^2}\Big)\Big] \le \mathbb{E}\Big[\exp\Big(\frac{K_1}{K_1+K_2}\cdot\frac{X_1^2}{K_1^2}\Big)\exp\Big(\frac{K_2}{K_1+K_2}\cdot\frac{X_2^2}{K_2^2}\Big)\Big] \le c_1^{\frac{K_1}{K_1+K_2}}c_2^{\frac{K_2}{K_1+K_2}}$$
according to Hölder's inequality. The conclusion generalizes to larger $n$ by induction.
Case (ii):
Considering the case of $n = 2$, we have
$$\mathbb{E}\Big[\exp\Big(\frac{X_1^2 + X_2^2}{K_1^2 + K_2^2}\Big)\Big] \le \Big(\mathbb{E}\big[e^{X_1^2/K_1^2}\big]\Big)^{\frac{K_1^2}{K_1^2+K_2^2}}\Big(\mathbb{E}\big[e^{X_2^2/K_2^2}\big]\Big)^{\frac{K_2^2}{K_1^2+K_2^2}} \le c_1^{\frac{K_1^2}{K_1^2+K_2^2}}c_2^{\frac{K_2^2}{K_1^2+K_2^2}}$$
according to Hölder's inequality. The conclusion generalizes to larger $n$ by induction.
Case (iii):
Given the assumption, we have
$$\mathbb{E}\Big[\exp\Big(\frac{(\sum_iX_i)^2}{\sum_jK_j^2}\Big)\Big] \le \mathbb{E}\Big[\prod_i\exp\Big(\frac{X_i^2}{K_i^2}\Big)\Big] = \prod_i\mathbb{E}\Big[\exp\Big(\frac{X_i^2}{K_i^2}\Big)\Big] \le \prod_ic_i,$$
where we used Sedrakyan's inequality $\big(\sum_iX_i\big)^2/\sum_jK_j^2 \le \sum_iX_i^2/K_i^2$ for the first sign of inequality and the independence of all
$X_i$ for the second.
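For independent Gaussians, case (iii) can be checked against closed forms (an added illustration; the variances and $K_i^2$ below are arbitrary admissible choices):

```python
import numpy as np

# X_i ~ N(0, s_i^2) independent: E exp(X_i^2/K_i^2) = (1 - 2 s_i^2/K_i^2)^(-1/2) = c_i,
# and sum X_i ~ N(0, sum s_i^2). Case (iii) claims
# E exp((sum X_i)^2 / sum K_j^2) <= prod c_i.
s2 = np.array([0.5, 1.0, 2.0])      # variances s_i^2
K2 = np.array([2.0, 4.0, 8.0])      # K_i^2 (each > 2 s_i^2)
c = (1 - 2 * s2 / K2) ** -0.5
lhs = (1 - 2 * s2.sum() / K2.sum()) ** -0.5   # closed form for the sum
print(f"E exp((sum X)^2/K^2) = {lhs:.4f} <= prod c_i = {c.prod():.4f}")
assert lhs <= c.prod()
```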
2.4 Martingale difference with subgaussianity
A martingale is a sequence of random variables whose expected value remains unchanged over time,
conditional on its past history. Martingales are widely used in the study of stochastic processes, including
fair gambling, asset price changes, algorithms for stochastic optimization, and more.
In this note, we consider vector-valued martingales with subgaussian differences and apply the
results from previous subsections to conduct a large deviation analysis of the martingales, as
summarized in the following Theorem 6. This theorem, where assumptions (I), (II),
and (III) progressively loosen the subgaussianity condition of the vector martingale differences,
compiles existing results from [8, Lemma 2], [9, Lemma 6],
[5, Theorem 2.2.2], and [6, Theorem 7].
Theorem 6
Let $\{\eta_t\}$ be a stochastic process, $\{\mathcal{F}_t\}$ be the
filtration of corresponding $\sigma$-fields up to time $t$,
and let $X_t \in \mathbb{R}^d$ be given by $X_t = f_t(\eta_1, \ldots, \eta_t)$ for
deterministic measurable functions $f_t$ such that
$\mathbb{E}[X_t \mid \mathcal{F}_{t-1}] = 0$ for all $t \le T$.
Furthermore, we consider the following conditions:
- (I)
$\mathbb{E}[\exp(X_{t,i}^2/K^2) \mid \mathcal{F}_{t-1}] \le c$, where $X_{t,i}$ denotes the $i$-th component of $X_t$,
for all $t \le T$ and $i \le d$,
with $\sigma^2 = 2dK^2\max\{\ln c, \sqrt{\ln c}, 1/2\}$;
- (II)
$\mathbb{E}[\exp(\|X_t\|^2/K^2) \mid \mathcal{F}_{t-1}] \le c$ for all $t \le T$, with $\sigma^2 = 2K^2\max\{\ln c, \sqrt{\ln c}, 1/2\}$;
- (III)
$\mathbb{E}[\exp(\lambda\langle v, X_t\rangle) \mid \mathcal{F}_{t-1}] \le c\exp(\lambda^2K^2/2)$ for any unit vector $v \in \mathbb{R}^d$
and all $\lambda \in \mathbb{R}$ and $t \le T$,
with $\sigma^2$ determined from $c_4 = c$ and $K_4 = K$ as in case (d) of Theorem 2 ($\sigma^2 = K^2$ when $c = 1$).
Then for any $a \ge 0$ we have
$$\Pr\Big(\Big\|\sum_{t=1}^TX_t\Big\| \ge a\Big) \le C\exp\Big(-\frac{a^2}{C'T\sigma^2}\Big), \tag{10}$$
where $(C, C') = (\sqrt{2}, 4)$, $(d+1, 2)$, and $(5^d, 8)$ under conditions (I), (II), and (III), respectively, and have
$$\Pr\Big(\Big\langle v, \sum_{t=1}^TX_t\Big\rangle \ge a\Big) \le \exp\Big(-\frac{a^2}{2T\sigma^2}\Big) \tag{11}$$
for any unit vector $v \in \mathbb{R}^d$
given any of the conditions (I), (II), and (III).
Proof
Case (I):
Given that $\mathbb{E}[X_{t,i} \mid \mathcal{F}_{t-1}] = 0$ and
$\mathbb{E}[\exp(X_{t,i}^2/K^2) \mid \mathcal{F}_{t-1}] \le c$,
Theorem 2 (case (a)) indicates
$$\mathbb{E}\big[e^{\lambda X_{t,i}} \mid \mathcal{F}_{t-1}\big] \le e^{\lambda^2\sigma_0^2/2}, \qquad \sigma_0^2 = 2K^2\max\{\ln c, \sqrt{\ln c}, 1/2\} = \sigma^2/d,$$
for any $\lambda \in \mathbb{R}$, $t \le T$, and $i \le d$.
Then, by iterated conditioning, we have
$$\mathbb{E}\Big[\exp\Big(\lambda\sum_{t=1}^TX_{t,i}\Big)\Big] \le e^{\lambda^2T\sigma_0^2/2} \tag{12}$$
for any $\lambda \in \mathbb{R}$ and $i \le d$. Following from the implication (4)⇒(3) in
Theorem 1 and Remark 1, we find
$$\mathbb{E}\Big[\exp\Big(\frac{(\sum_tX_{t,i})^2}{2s}\Big)\Big] \le \sqrt{\frac{s}{s - T\sigma_0^2}}$$
for any $s > T\sigma_0^2$.
As $\|\sum_tX_t\|^2 = \sum_{i=1}^d(\sum_tX_{t,i})^2$,
invoking assumption (I) and applying Theorem 5 (case (ii)), we find
$$\mathbb{E}\Big[\exp\Big(\frac{\|\sum_tX_t\|^2}{2ds}\Big)\Big] \le \sqrt{\frac{s}{s - T\sigma_0^2}}$$
for any $s > T\sigma_0^2$. Finally, according to the implication (3)⇒(1) in
Theorem 1, we find
$$\Pr\Big(\exp\Big(\frac{\|\sum_tX_t\|^2}{2ds}\Big) \ge e^{a^2/(2ds)}\Big) \le \sqrt{\frac{s}{s - T\sigma_0^2}}\,e^{-a^2/(2ds)},$$
or equivalently
$$\Pr\Big(\Big\|\sum_{t=1}^TX_t\Big\| \ge a\Big) \le \sqrt{\frac{s}{s - T\sigma_0^2}}\exp\Big(-\frac{a^2}{2ds}\Big) \tag{13}$$
for any $a \ge 0$ and $s > T\sigma_0^2$. Choosing $s = 2T\sigma_0^2$ leads to
$$\Pr\Big(\Big\|\sum_{t=1}^TX_t\Big\| \ge a\Big) \le \sqrt{2}\exp\Big(-\frac{a^2}{4T\sigma^2}\Big), \tag{14}$$
which proves the claim for case (I). Note that if $d = 1$, we can directly obtain
$$\Pr\Big(\Big|\sum_{t=1}^TX_t\Big| \ge a\Big) \le 2\exp\Big(-\frac{a^2}{2T\sigma^2}\Big) \tag{15}$$
from inequality (12) and the implication (4)⇒(1)
in Theorem 1.
Case (II) (from Ref. [9], with corrections):
Consider a random vector $U \in \mathbb{R}^d$ satisfying
$\mathbb{E}[U] = 0$ and
$\mathbb{E}[\exp(\|U\|^2/K^2)] \le c$.
First, we notice that the real symmetric matrix
$$M(U) = \begin{pmatrix} 0 & U^\top \\ U & O_{d\times d} \end{pmatrix} \in \mathbb{R}^{(d+1)\times(d+1)} \tag{16}$$
has a rank of at most $2$ and has eigenvalues 0 (with multiplicity $d - 1$) and $\pm\|U\|$.
Letting $A \preceq B$ denote that $A - B$ is negative semidefinite,
we have
$$\mathbb{E}\big[e^{\lambda M(U)}\big] \preceq \mathbb{E}\big[e^{\lambda^2M(U)^2}\big] \preceq \mathbb{E}\big[e^{\lambda^2\|U\|^2}\big]I_{d+1} \preceq c^{\lambda^2K^2}I_{d+1}$$
for all $\lambda$ such that $\lambda^2K^2 \le 1$, where $e^{\lambda M} = Qe^{\lambda\Lambda}Q^\top$ is defined through
the eigendecomposition $M = Q\Lambda Q^\top$ of $M$
($Q$ being an orthogonal matrix and $\Lambda$ diagonal), and the first step applies $e^x \le x + e^{x^2}$ on the spectrum together with $\mathbb{E}[M(U)] = 0$. We also have
$$\mathbb{E}\big[e^{\lambda M(U)}\big] \preceq e^{\lambda^2a/2}\,\mathbb{E}\big[e^{\|U\|^2/(2a)}\big]I_{d+1} \preceq e^{\lambda^2a/2}\,c^{K^2/(2a)}I_{d+1}$$
for all $\lambda \in \mathbb{R}$ and $a \ge K^2/2$. Thus, the argument in the proof for case (a) of Theorem 2 also
applies here, leading to
$$\mathbb{E}\big[e^{\lambda M(U)}\big] \preceq e^{\lambda^2\sigma^2/2}I_{d+1}, \qquad \sigma^2 = 2K^2\max\{\ln c, \sqrt{\ln c}, 1/2\}, \tag{17}$$
for all $\lambda \in \mathbb{R}$.
Second, for real symmetric matrices $A$ and $B$, we have
$$\operatorname{tr}\big(e^{A+B}\big) \le \operatorname{tr}\big(e^Ae^B\big), \tag{18}$$
which is known as the Golden–Thompson inequality [11, 12], and it is easy
to verify that
$$\operatorname{tr}(AB) \le \beta\operatorname{tr}(A) \tag{19}$$
if $A \succeq 0$ and $B \preceq \beta I$.
By defining $M_t = M(X_t)$ with $t = 1, \ldots, T$ according to
Eq. (16) and collecting all preparatory results, we find
$$\mathbb{E}\Big[\operatorname{tr}e^{\lambda\sum_{t=1}^TM_t}\Big] \le \mathbb{E}\Big[\operatorname{tr}\Big(e^{\lambda\sum_{t=1}^{T-1}M_t}\,\mathbb{E}\big[e^{\lambda M_T}\mid\mathcal{F}_{T-1}\big]\Big)\Big] \le e^{\lambda^2\sigma^2/2}\,\mathbb{E}\Big[\operatorname{tr}e^{\lambda\sum_{t=1}^{T-1}M_t}\Big] \le \cdots \le (d+1)\,e^{\lambda^2T\sigma^2/2}$$
for any $\lambda \in \mathbb{R}$,
where we applied expression (18) for the first sign of inequality, and
used the conditional version of (17) and inequality (19) for the second.
Note that
$$\operatorname{tr}e^{\lambda\sum_tM_t} = e^{\lambda\|\sum_tX_t\|} + e^{-\lambda\|\sum_tX_t\|} + (d - 1) \ge e^{\lambda\|\sum_tX_t\|}$$
for any $\lambda$, since $\sum_tM_t = M(\sum_tX_t)$; therefore, we have
$$\mathbb{E}\Big[\exp\Big(\lambda\Big\|\sum_{t=1}^TX_t\Big\|\Big)\Big] \le (d+1)\,e^{\lambda^2T\sigma^2/2} \tag{20}$$
for any $\lambda \in \mathbb{R}$. This finally leads to
$$\Pr\Big(\Big\|\sum_{t=1}^TX_t\Big\| \ge a\Big) \le (d+1)\exp\Big(-\frac{a^2}{2T\sigma^2}\Big) \tag{21}$$
for all $a \ge 0$, according to the implication (4)⇒(1)
in Theorem 1 and Remark 2.
Case (III):
We notice that the Euclidean unit ball in $\mathbb{R}^d$ can be covered by
$N \le (2/\epsilon + 1)^d$ Euclidean balls of radius $\epsilon$ centered within the unit ball, for any $\epsilon \in (0, 1)$
(check Corollary 4.2.13 in [2]; Example 5.8 in [3]), and let
$v_1, \ldots, v_N$ denote the centers of these balls in such a cover
($j = 1, \ldots, N$).
Then for any given unit vector $v \in \mathbb{R}^d$
there exists $j \le N$ such that $\|v - v_j\| \le \epsilon$, and we have
$$\Big\langle v, \sum_tX_t\Big\rangle = \Big\langle v_j, \sum_tX_t\Big\rangle + \Big\langle v - v_j, \sum_tX_t\Big\rangle \le \max_{j\le N}\Big\langle v_j, \sum_tX_t\Big\rangle + \epsilon\Big\|\sum_tX_t\Big\|,$$
which indicates, upon taking the supremum over unit vectors $v$,
$$\Big\|\sum_{t=1}^TX_t\Big\| \le \frac{1}{1-\epsilon}\max_{j\le N}\Big\langle v_j, \sum_{t=1}^TX_t\Big\rangle \tag{22}$$
for all $\epsilon \in (0, 1)$.
Now, we examine the subgaussianity of $\max_{j\le N}\langle v_j, \sum_tX_t\rangle$.
Given the assumptions $\mathbb{E}[\langle v_j, X_t\rangle \mid \mathcal{F}_{t-1}] = 0$
and $\mathbb{E}[\exp(\lambda\langle v_j, X_t\rangle) \mid \mathcal{F}_{t-1}] \le c\,e^{\lambda^2K^2/2}$
for any $j$, Theorem 2 indicates that
$\langle v_j, X_t\rangle$, conditional on $\mathcal{F}_{t-1}$, is
$(\sigma^2, 1)$-subgaussian, and, by iterated conditioning,
$$\mathbb{E}\Big[\exp\Big(\lambda\Big\langle v_j, \sum_{t=1}^TX_t\Big\rangle\Big)\Big] \le e^{\lambda^2T\sigma^2/2}$$
for all $\lambda \in \mathbb{R}$ and all $j \le N$.
Now, according to case (iii) of Theorem 4,
$$\mathbb{E}\Big[\exp\Big(\lambda\max_{j\le N}\Big\langle v_j, \sum_tX_t\Big\rangle\Big)\Big] \le N\,e^{\lambda^2T\sigma^2/2}$$
for all $\lambda \in \mathbb{R}$. Finally, we obtain
$$\Pr\Big(\max_{j\le N}\Big\langle v_j, \sum_tX_t\Big\rangle \ge (1-\epsilon)a\Big) \le N\exp\Big(-\frac{(1-\epsilon)^2a^2}{2T\sigma^2}\Big), \tag{23}$$
or equivalently, by (22),
$$\Pr\Big(\Big\|\sum_{t=1}^TX_t\Big\| \ge a\Big) \le \Big(\frac{2}{\epsilon} + 1\Big)^{\!d}\exp\Big(-\frac{(1-\epsilon)^2a^2}{2T\sigma^2}\Big) \tag{24}$$
for all $a \ge 0$ and $\epsilon \in (0, 1)$. By choosing $\epsilon = 1/2$ we obtain
$$\Pr\Big(\Big\|\sum_{t=1}^TX_t\Big\| \ge a\Big) \le 5^d\exp\Big(-\frac{a^2}{8T\sigma^2}\Big) \tag{25}$$
for all $a \ge 0$.
The assumptions (I), (II), and (III) are progressively weaker, in the sense that
(I) implies (II) according to (ii) in Theorem 5 (applied conditionally, with $K^2$ replaced by $dK^2$), and
(II) implies (III) as $|\langle v, X_t\rangle| \le \|X_t\|$ for any unit vector $v$, combined with the implication (3)⇒(4) in Theorem 1.
With $\mathbb{E}[\langle v, X_t\rangle \mid \mathcal{F}_{t-1}] = 0$, Theorem 2 gives
$\mathbb{E}[\exp(\lambda\langle v, X_t\rangle) \mid \mathcal{F}_{t-1}] \le e^{\lambda^2\sigma^2/2}$
for any unit vector $v$ and $t \le T$, and thus, by iterated conditioning,
$$\mathbb{E}\Big[\exp\Big(\lambda\Big\langle v, \sum_{t=1}^TX_t\Big\rangle\Big)\Big] \le e^{\lambda^2T\sigma^2/2}$$
for all $\lambda \in \mathbb{R}$. Therefore, we have
$$\Pr\Big(\Big\langle v, \sum_{t=1}^TX_t\Big\rangle \ge a\Big) \le \exp\Big(-\frac{a^2}{2T\sigma^2}\Big)$$
for any unit vector $v$ and $a \ge 0$,
according to the implication (4)⇒(5) in Theorem 1
and Remark 1.
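As a closing illustration of inequality (11), consider martingale differences $X_t \in \mathbb{R}^2$ with independent Rademacher coordinates, which satisfy condition (III) with $c = 1$ and $K = 1$, hence $\sigma^2 = 1$; the Monte Carlo sketch below (with arbitrary choices of $T$, the unit vector, and the sample size) is consistent with the bound:

```python
import numpy as np

# X_t in R^2 with independent Rademacher coordinates: for any unit v,
# E[exp(l <v, X_t>)] = cosh(l v1) cosh(l v2) <= exp(l^2/2), so condition (III)
# holds with c = 1, K = 1, and (11) reads P(<v, sum_t X_t> >= a) <= exp(-a^2/(2T)).
rng = np.random.default_rng(1)
T, n_paths = 64, 300_000
v = np.array([0.6, 0.8])              # an arbitrary unit vector

S = np.zeros(n_paths)                 # running values of <v, sum_t X_t>
for _ in range(T):
    xi = rng.choice([-1.0, 1.0], size=(n_paths, 2))
    S += xi @ v

for a in [8.0, 12.0, 16.0]:
    emp = (S >= a).mean()
    bound = np.exp(-a**2 / (2 * T))
    print(f"a={a}: empirical tail {emp:.2e} <= bound {bound:.2e}")
    assert emp <= bound
```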