Norms of structured random matrices
Abstract.
For , let be a random matrix, a real deterministic matrix, and the corresponding structured random matrix. We study the expected operator norm of considered as a random operator between and for . We prove optimal bounds up to logarithmic terms when the underlying random matrix has i.i.d. Gaussian entries, independent mean-zero bounded entries, or independent mean-zero () entries. In certain cases, we determine the precise order of the expected norm up to constants. Our results are expressed through a sum of operator norms of Hadamard products and .
Key words and phrases:
Gaussian random matrix, operator norm, structured random matrix, random variable.
2020 Mathematics Subject Classification:
Primary 60B20; Secondary 46B09; 52A23; 60G15; 60E15.
1. Introduction and main results
With his work on the statistical analysis of large samples [69], Wishart initiated the systematic study of large random matrices. Ever since, random matrices have continuously entered more and more areas of mathematics and applied sciences beyond probability theory and statistics, for instance, in numerical analysis through the work of Goldstine and von Neumann [65, 20] and in quantum physics through the works of Wigner [66, 67, 68] on his famous semicircle law, which resulted in significant effort to understand spectral statistics of random matrices from an asymptotic point of view. Today, random matrix theory has grown into a vital area of probability theory and statistics, and within the last two decades, random matrices have come to play a major role in many areas of (algorithmic) computational mathematics, for instance, in questions related to sparsification methods [1, 54] and sparse approximation [57, 58], dimension reduction [4, 12, 44], or combinatorial optimization [46, 53]. We refer the reader to [5, 6, 60] for more information.
In this paper, we are interested in the non-asymptotic theory of (large) random matrices. This theory has played a fundamental role in geometric functional analysis at least since the ’70s, the connection coming in various flavors. It is of particular importance in the geometry of Banach spaces and the theory of operator algebras [9, 10, 15, 18, 21, 30] and their applications to high-dimensional problems, for instance, in convex geometry [17, 22], compressed sensing [14, 16, 48, 63], information-based complexity [27, 28], or statistical learning theory [50, 64]. On the other hand, geometric functional analysis has had, and still has, an enduring influence on random matrix theory, as is witnessed, for instance, through applications of measure concentration techniques; we refer to [15, 42] and the references cited therein. The quantity we study and focus on here concerns the expected operator norm of random matrices considered as operators between finite-dimensional spaces; recall that denotes the space equipped with the (quasi-)norm , given by for and if . We address the following problem: for and , determine the right order (up to constants that may depend on the parameters and ) of
where, given a deterministic real matrix and a random matrix , we denote by
the structured random matrix; the symbol stands for the Hadamard product of matrices (i.e., entrywise multiplication). The bounds on the expected operator norm should be of optimal order and expressed in terms of the coefficients , . Understanding such expressions and related quantities is important, for instance, when studying the worst-case error of optimal algorithms which are based on random information in function approximation problems [28] (see also [33]) or the quality of random information for the recovery of vectors from an -ellipsoid, where (the radius of) optimal information is given by Gelfand numbers of a diagonal operator [29].
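To make the object of study concrete, the following small Python sketch estimates the expected operator norm of a structured Gaussian matrix by Monte Carlo simulation. It is illustrative only: it is restricted to the source space being an ℓ₁-space, where the operator norm of a matrix equals the largest ℓ_q-norm of a column, so each sample can be evaluated exactly. The dimensions and the coefficient matrix below are hypothetical choices, not tied to any example from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def op_norm_1_to_q(B, q):
    # ||B : l_1^n -> l_q^m|| equals the largest l_q-norm of a column of B,
    # since the extreme points of the l_1-ball are the signed unit vectors.
    return np.linalg.norm(B, ord=q, axis=0).max()

def expected_norm(A, q, n_samples=2000):
    # Monte Carlo estimate of E ||A o G : l_1^n -> l_q^m|| for i.i.d. standard
    # Gaussian G, where "o" denotes the entrywise (Hadamard) product.
    m, n = A.shape
    vals = [op_norm_1_to_q(A * rng.standard_normal((m, n)), q)
            for _ in range(n_samples)]
    return float(np.mean(vals))

# Hypothetical structured coefficient matrix.
m, n, q = 30, 50, 2
A = rng.uniform(size=(m, n))
print(expected_norm(A, q))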
In the case where the random entries of are i.i.d. standard Gaussians (then we write instead of ) and , we will show the following bound, which is sharp up to logarithmic terms:
(1.1) |
where , , and denotes the Hölder conjugate of defined by the relation . As will be explained later, we obtain sharp estimates in certain cases and derive results similar to (1.1) for other models of randomness.
1.1. History of the problem and known results
In what follows, is a real deterministic matrix and always stands for a random matrix with i.i.d. standard Gaussian entries (usually the matrices are of size unless explicitly stated otherwise). We use , , etc. for positive constants which may depend only on the parameters given in brackets and write for positive absolute constants. The symbols , , , etc. denote that the inequality holds up to multiplicative constants depending only on the parameters given in the subscripts; we write if and , and , , etc. if the constants may depend on the parameters given in the subscript.
In 1975, Bennett, Goodman, and Newman [9] proved that if is an random matrix with independent, mean-zero entries taking values in , and , then
(1.2) |
In fact, up to constants, this estimate is best possible: for any matrix with entries one readily sees that ; just use standard unit vectors and operator duality. Moreover, in this ‘unstructured’ case, where for all , it is easy to extend (1.2) to the whole range of (see [8, 13] or Remark 4.2 below). Also, if all entries are i.i.d. Rademacher random variables, then the bounds are two-sided, i.e., the expected operator norm is, up to constants, the same as the minimal norm for all , (see [8, Proposition 3.2] or [13, Satz 2]).
The case most studied in the literature is the one of the spectral norm, i.e., the operator norm. Seginer [51] proved in 2000 that if is an random matrix with i.i.d. mean-zero entries, then its operator norm is of the same order as the sum of expectations of the maximum Euclidean norm of rows and columns of , i.e.,
(1.3) |
Riemer and Schütt [49] proved that, up to a logarithmic factor , the same holds true for any random matrix with independent but not necessarily identically distributed mean-zero entries. Let us also mention that in the Gaussian setting one can use a non-commutative Khintchine bound (see, e.g., [59, Equation (4.9)]) to show that, up to a factor , the expected spectral norm is of the order of the largest Euclidean norm of its rows and columns.
In the very same setting that was considered by Riemer and Schütt, Latała [37] had obtained a few years earlier the dimension-free estimate
This bound is superior to the Riemer–Schütt bound in the case of matrices with all entries equal to and is optimal for Wigner matrices. In other cases, like the one of diagonal matrices, the Riemer–Schütt bound is better.
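As a quick numerical illustration of Seginer's comparison (1.3), the following sketch compares the expected spectral norm of an i.i.d. Rademacher matrix with the expected sum of the maximal Euclidean norms of its rows and columns. The dimension and sample size are hypothetical, and the check is only meant to make the two-sidedness plausible, not to prove anything.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_samples = 100, 200

spec, rowcol = [], []
for _ in range(n_samples):
    X = rng.choice([-1.0, 1.0], size=(n, n))   # i.i.d. Rademacher entries
    spec.append(np.linalg.norm(X, 2))          # spectral norm ||X||_{2->2}
    rows = np.linalg.norm(X, axis=1).max()     # largest Euclidean row norm
    cols = np.linalg.norm(X, axis=0).max()     # largest Euclidean column norm
    rowcol.append(rows + cols)

# Seginer's result says these quantities are comparable up to universal constants.
print(np.mean(spec), np.mean(rowcol), np.mean(spec) / np.mean(rowcol))
```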
In the case of structured Gaussian matrices, Latała, van Handel, and Youssef [40], building upon earlier work of Bandeira and van Handel [7] (which combined the moment method with combinatorial considerations) as well as results proved by van Handel in [61] (which used Slepian’s lemma), obtained the precise behavior without any logarithmic terms in the dimension, namely
(1.4)
Their proof is based on a clever block decomposition of the underlying matrix (see [40, Figure 3.1]). This result finally answered in the affirmative a conjecture made by Latała more than a decade before. We also refer the reader to the survey [62] discussing in quite some detail results prior to [40] and [61] — the latter work discusses the conjectures of Latała and van Handel and shows their equivalence.
Very recently, Latała and Świątkowski [39] investigated a similar problem when the underlying random matrix has Rademacher entries. They proved a lower bound which, up to a factor, can be reversed for randomized circulant matrices.
In [23], Guédon, Hinrichs, Litvak, and Prochno studied our main and motivating question on the order of the expected operator norm of structured random matrices considered as operators between and in the special case where and the random entries are Gaussian. In this situation, where we are not dealing with the spectral norm, the moment method cannot be employed. The approach in [23] was therefore different and based on a majorizing measure construction combining the works [24] and [25]. In [23, Theorem 1.1], the authors proved that if , then
(1.5) |
where for a standard Gaussian random variable . Moreover, for and , it was noted in [23, Remark 1.4] (see also [45, Twierdzenie 2]) that
(1.6) |
Later, an extension of (1.5) to the case of matrices with i.i.d. isotropic log-concave rows was obtained by Strzelecka in [55].
Trying to extend the upper bound for to the whole range one encounters two difficulties. First of all, the methods used in order to prove (1.5) fail if or , because the majorizing measure construction used in [23] is restricted to the case and the assumption is required in a Hölder bound. Moreover, when or the result cannot hold with the right-hand side of the same form as in (1.5) (see Remark 4.2 below for counterexamples to (1.5) in the cases and ; by Jensen’s inequality, the expected norm of a matrix with i.i.d. Rademacher entries is less than or equal to times the expected norm of the matrix with Gaussian entries, so (1.5) for or would imply the same (up to a constant) bound for matrices, which does not hold in this range of , as we explain in Remark 4.2). This explains the different form of the expressions and in (1.1), which in the range reduce to the maxima of norms on the right-hand side of (1.5) — see (1.9) below.
1.2. Lower bounds and conjectures
By arguments similar to the ones used in order to prove the lower bound in (1.4), one can check that in the range considered in [23, 45] (i.e., ) one has
(1.7)
Note that for ,
which explains the simplified form of (1.6).
We remark that the proof of (1.7) is based merely on the observation that the operator norm is greater than the maximum entry of the matrix and the appropriate maximum norms of its rows and columns, combined with a comparison of moments for Gaussian random vectors. Another, related way to proceed, valid for all , is to exchange the expectation and the suprema over the and balls in the definition of the operator norm. We present the details in Subsection 5.1. In particular, Proposition 5.1 and Corollary 5.2 imply (using also the trivial observation that ) that, for ,
(1.8) |
It is an easy observation (see Lemma 2.1 below) that for ,
(1.9) |
Thus, in the range considered in [23, 45], the lower bounds (1.7) and (1.8) coincide.
Although it would be natural to conjecture at this point that the bound (1.8) may be reversed up to a multiplicative constant depending only on , such a reverse bound turns out not to be true in the case (and in the dual one ) as we shall show in Subsection 5.3.
In order to conjecture the right asymptotic behavior of , one may take a look at the boundary values of and , i.e., or . Note that (1.6) provides an asymptotic behavior of on a part of this boundary (i.e., for and and in the dual case and ). We provide sharp results on the remaining parts of the boundary of (see dense lines on the boundary of Figure 1 below):
where
and with denoting the non-increasing rearrangement of for a given . (For the precise formulation see Propositions 1.8 and 1.10, and Corollary 1.11 below.)
Moreover, in Subsection 5.1 we generalize the lower bounds from the boundary into the whole range (see Figure 1 below), i.e., we prove
(1.10)
northeast lines:
horizontal lines:
vertical lines:
northwest lines:
Let us now discuss the relation between the terms appearing above. We postpone the proofs of all the following claims to Section 5.
In the case , we have
(1.11)
where the matrices and are obtained by permuting the columns and rows, respectively, of the matrix in such a way that and . Therefore, in the range the right-hand side of (1.10) changes continuously with and (for a fixed matrix ).
Obviously, and, in general, the former quantity may be of larger order than the latter one. In Subsection 5.3 we shall present a more subtle relation: for every we shall give an example showing that the right-hand side of (1.10) may be of larger order than . Note that by duality, i.e., the fact that
(1.12) |
the same holds in the case . This suggests that the behavior of is different in the regions with horizontal or vertical lines than in the region with northeast lines.
Moreover, we have
(1.13) |
(see Subsection 5.2). Note that this is not the case for , as one can easily see by considering, e.g., equal to the identity matrix. This suggests a different, simplified behavior of in the region with northwest lines compared with the other regions.
Given the discussion above, the lower bounds presented in (1.10), and the fact that they can be reversed for all , (and for all , ), it is natural to conjecture the following.
Conjecture 1.
For all , we conjecture that
(1.14) |
Remark 1.1.
One could pose another natural conjecture, based on the potential generalization of the first line of the bound (1.4), namely that the inequality
(1.15) |
holds for all . Indeed, the lower bound is true with constant , since for every deterministic matrix one has
However, as we prove in Subsection 5.4, this conjecture is wrong: although the right-hand sides of (1.14) and (1.15) are comparable in the range , for every pair of outside this range the right-hand side of (1.15) may be of smaller order than the left-hand side.
Let us now present a conjecture concerning the boundedness of linear operators given by infinite dimensional matrices. In what follows, we say that a matrix defines a bounded operator from to if for all the product is well defined, belongs to and the corresponding linear operator is bounded.
Conjecture 2.
Let be an infinite matrix with real coefficients and let . We conjecture that the matrix defines a bounded linear operator between and almost surely if and only if the matrix defines a bounded linear operator between and , the matrix defines a bounded linear operator between and , and
• in the case , ,
• in the case , , and , where , ,
• in the case , , and , where , ,
• (in the case we do not need to assume any additional conditions).
Proposition 1.2.
We postpone the proof of this proposition to Subsection 5.5.
In this article, in addition to the cases obtained in [40] and proved in [23, 45], we confirm Conjecture 1 when , and when , . In all the other cases, we are able to prove the upper bounds only up to logarithmic (in the dimensions ) multiplicative factors (see Corollary 1.4 below). In particular, Proposition 1.2 implies that Conjecture 2 holds for all , and for all , .
Note that in the structured case we work with, interpolating the results obtained for the boundary cases or gives a bound with polynomial (in the dimensions) multiplicative constants which are much worse than logarithmic constants from Corollary 1.4 below. However, as we shall see in Remark 4.2 below, interpolation techniques work well in the non-structured case.
1.3. Main results valid for
We start with general theorems valid for the whole range of , . Results which are based on methods working only for specific values of , , but yielding better logarithmic terms, are presented in the next subsection. A brief summary and comparison of all results can be found in the summary table.
Before stating our main results, we need to introduce additional notation. For a non-empty set , and , we define
By we denote the space equipped with the norm
whose unit ball is . Obviously, the space can be identified with a subspace of . If is a linear operator, the notation means that is restricted to the space and composed with a projection onto . Moreover, for , , where the supremum is taken over all with , and is the non-increasing rearrangement of .
Theorem 1.3 (Main theorem in a general version with sets , ).
Assume that , , , and has i.i.d. standard Gaussian entries. Then
where the suprema are taken over all sets , such that , .
The above theorem gives an estimate on the largest operator norm among all submatrices of of fixed size. Let us remark that apart from being of intrinsic interest, quantities of this type (for ) have appeared in connection with the study of the restricted isometry property of random matrices with independent rows [2] or in the analysis of entropic uncertainty principles for random quantum measurements [3, 47].
Let us now give an outline of the proof of Theorem 1.3. Note that
(1.16) |
In the first step of our proof, we find polytopes and approximating (with accuracy depending logarithmically on the dimension) the unit balls in and , respectively. The extreme points of the sets and have a special and simple structure: absolute values of their non-zero coordinates are all equal to a constant depending only on the size of the support of a given point. Since is close to and is close to , we may consider only in (1.16). Since non-zero coordinates of and , respectively, are all equal up to a sign we may use a symmetrization argument and the contraction principle to remove and in the sum on the right-hand side of (1.16). Thus, in the next step of the proof we only need to estimate the expected value of
where and represent the potential supports of points in and . To deal with this quantity, we first consider the suprema over the subsets of fixed sizes and use Slepian’s lemma to compare the supremum of the Gaussian process above with the supremum of another Gaussian process, which may be bounded easily. Then we make use of the term , which allows us to go back to suprema over the sets and . At the end, we use the Gaussian concentration inequality to unfix the sizes of sets and and complete the proof.
Applying Theorem 1.3 with , immediately yields the following result, which confirms Conjecture 1 up to some logarithmic terms.
Corollary 1.4 (Main theorem – to version).
Assume that and has i.i.d. standard Gaussian entries. Then,
Moreover, we easily recover the same bound in the case of independent bounded entries. We state and prove a general version with sets and akin to Theorem 1.3 in Subsection 3.2.
Corollary 1.5.
Assume that and has independent mean-zero entries taking values in . Then
We use the two results above to obtain their analogue in the case of entries for ; these random variables are defined by (1.17). This class contains, among others,
• log-concave random variables (which are ),
• heavy-tailed Weibull random variables (of shape parameter , i.e., for ); see the sampling sketch after this list,
• random variables satisfying the condition
These random variables are with . They were considered recently in [38].
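The heavy-tailed Weibull variables from the list above can be sampled directly by inverting the tail function. A minimal sketch follows; the shape parameter r and the sample size are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def symmetric_weibull(r, size):
    # If P(|X| >= t) = exp(-t^r) for t >= 0, then |X| has the same law as
    # (-log U)^(1/r) for U uniform on (0, 1]; an independent random sign
    # makes the variable symmetric.
    u = 1.0 - rng.random(size)                 # uniform on (0, 1], avoids log(0)
    signs = rng.choice([-1.0, 1.0], size=size)
    return signs * (-np.log(u)) ** (1.0 / r)

# For r < 2 these variables have heavier tails than Gaussians.
X = symmetric_weibull(r=0.5, size=10_000)
print(np.mean(np.abs(X)), np.max(np.abs(X)))
```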
A general version of the following Corollary 1.6 with sets and is stated and proved in Subsection 3.2.
Corollary 1.6.
Assume that , , , and has independent mean-zero entries satisfying
(1.17) |
Then
1.4. Results for particular ranges of ,
We continue with results for some specific ranges of , , where we are able to prove estimates with better logarithmic dependence (results which follow from them by duality (1.12) are stated in the summary table to keep the presentation short). We postpone their proofs to Section 4. We start with the case of Gaussian random variables. Recall that , where is a standard Gaussian random variable.
Proposition 1.7.
For all and , we have
(1.18)
If or , then we are able to get a result without logarithmic terms. Recall that for a sequence we denote by the non-increasing rearrangement of .
Proposition 1.8.
(i) For , we have
(ii) Moreover,
where , .
Note that (ii) shows in particular that a blow-up of the constant in the upper estimate (i) for is necessary, since the rightmost summands in (i) and (ii) are not comparable.
Remark 1.9.
It will be clear from the proof that the upper bound in part (i) of Proposition 1.8 remains valid for any random matrix (instead of ) with independent isotropic rows (i.e., rows with mean zero and covariance matrix equal to the identity) such that
(1.19) |
Note that the independence and isotropy of the rows imply that the columns of are isotropic as well (since the coordinates of every column are independent and have mean zero and variance ). Therefore, whenever , condition (1.19) is always satisfied (because the -integral norm is bounded above by the -integral norm, which is then equal to the right-hand side of (1.19), since the covariance matrix of each column is equal to the identity matrix).
The following proposition generalizes part (ii) of Proposition 1.8 to an arbitrary . We list it separately since we present a proof using different arguments. Recall that the case , was established before, see (1.6).
Proposition 1.10.
If , then
where for .
Proposition 1.10 immediately implies its dual version.
Corollary 1.11.
If , then
where for .
Remark 1.12.
Proposition 1.7 implies the following estimate for matrices with independent entries, in the same way as Corollary 1.4 implies Corollary 1.6 (see Subsection 3.2).
Corollary 1.13.
Assume that , , and has independent mean-zero entries satisfying
(1.20) |
Then, for , ,
By Hoeffding’s inequality (i.e., Lemma 2.13) we know that matrices with independent mean-zero entries taking values in satisfy (1.20) with and . In this special case of independent bounded random variables one can also adapt the methods of [9] to prove, in the smaller range , the following result with explicit numerical constants and an improved dependence on (note that the second logarithmic term is better than in Corollary 1.13, where the exponent equals ).
Proposition 1.14.
Assume that has independent mean-zero entries taking values in . Then, for ,
where .
Finally, we have the following general result for matrices with independent entries (cf. Corollary 1.6).
Theorem 1.15.
Let , , and assume that has independent mean-zero entries satisfying
Then, for all and ,
Having in mind the strategy of proof described after Theorem 1.3, let us elaborate on the idea of the proof of Theorem 1.15. We shall split the matrix into two parts and , which we treat separately. In our decomposition, all entries of are bounded by and the probability that is very small. Then we shall deal with using a crude bound (Lemma 4.3) and the fact that the probability that is small enough to compensate for it. In order to bound the expectation of the norm of , we require a cut-off version of Theorem 1.15 (Lemma 4.4). To obtain it, we shall replace in the expression for the operator norm with a suitable polytope (and leave as it is) and then apply a Gaussian-type concentration inequality to the function for .
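A schematic version of the splitting described above: a matrix with symmetric entries is decomposed into a part whose entries are bounded by a truncation level and a (typically sparse) exceedance part. The truncation level and the test matrix below are hypothetical placeholders, not the quantities used in the actual proof.

```python
import numpy as np

def truncation_split(X, level):
    # X1 keeps the entries of X that are at most `level` in absolute value
    # (others are set to 0), X2 keeps the large entries; X = X1 + X2.
    # For symmetric entries both parts remain symmetric, so no re-centering is needed.
    mask = np.abs(X) <= level
    X1 = np.where(mask, X, 0.0)
    X2 = np.where(mask, 0.0, X)
    return X1, X2

rng = np.random.default_rng(3)
X = rng.standard_t(df=5, size=(20, 30))     # heavy-ish tails, just for illustration
X1, X2 = truncation_split(X, level=2.0)
print(np.abs(X1).max(), (X2 != 0).mean())   # bounded part and fraction of large entries
```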
1.5. Tail bounds
All the bounds for provided in this work for random matrices also yield a tail bound for . (It is clear from the proof of Proposition 1.16 — see Subsection 3.2 — that the same applies to the estimates for , but we omit the details to keep the presentation clear.)
Proposition 1.16 (Tail bound).
Assume that , , , and . Fix a deterministic matrix and assume that
If for all random matrices with independent mean-zero entries satisfying
(1.21) |
we have
(1.22) |
then, for all random matrices with independent mean-zero entries satisfying (1.21), we also have
(1.23) |
and, for all ,
(1.24) |
1.6. Organization of the paper
In Section 2 we gather various preliminary results we shall use in the sequel. Section 3 contains the proofs of the main results valid for all , (i.e., Theorem 1.3 and its corollaries) and the tail bound from Proposition 1.16. In Section 4 we prove the results for specific choices/ranges of , . In Section 5 we prove lower bounds on expected operator norms, showing in particular that our estimates are optimal up to logarithmic factors. We also prove other results justifying the proposed form of Conjecture 1. The last subsection of Section 5 is devoted to infinite dimensional Gaussian operators.
2. Preliminaries
2.1. General facts
We start with some easy lemmas which will be used repeatedly throughout the paper.
Lemma 2.1.
For any real matrix and , we have
Furthermore, for a real matrix and , ,
Proof.
Since , we have , where denotes the convex hull of the set . Moreover, the extreme points of are the signed standard unit vectors, i.e., , and is a convex function (since ). Thus,
This immediately implies the result for the Hadamard product if .
If, on the other hand, , then by the subadditivity of the function ,
where in the last equality we used the first part of the Lemma. Since we clearly have
we thus obtain
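The extreme-point argument used in this proof can be checked numerically: the supremum of the convex map x ↦ ||Bx||_q over the ℓ₁-ball is attained at a signed standard unit vector, i.e., it equals the largest ℓ_q-norm of a column. The sketch below uses hypothetical dimensions and a random matrix B purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, q = 8, 12, 3.0
B = rng.standard_normal((m, n))

# Value at the extreme points of the l_1-ball (signed standard unit vectors):
# this is the largest l_q-norm of a column of B.
ext_val = np.linalg.norm(B, ord=q, axis=0).max()

# Random points of the l_1-ball never beat the extreme points,
# since x -> ||Bx||_q is convex.
best_random = 0.0
for _ in range(20_000):
    x = rng.standard_normal(n)
    x = x / np.abs(x).sum() * rng.uniform()   # a random point of the l_1-ball
    best_random = max(best_random, np.linalg.norm(B @ x, ord=q))

print(ext_val, best_random)   # best_random <= ext_val
```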
Definition 2.2.
A set is called unconditional, if for every and every we have .
We shall use the following version of [49, Lemma 2.1].
Lemma 2.3.
Assume that , , and define the convex set
Then .
Proof.
Fix a vector . We want to prove that , where
denotes the norm generated by , i.e., its Minkowski gauge. Since both and are permutationally invariant and unconditional (see Definition 2.2), we may and will assume that . If we put , then
Since for (indeed, , so ; on the other hand, , so ), the triangle and Hölder inequalities yield
where we also used the elementary estimates and . This completes the proof. ∎
Remark 2.4.
The term can be replaced by by writing in the above proof
Here we used the estimates for (which follows from the concavity of the function ) and the trivial one .
Remark 2.5.
The constant in Lemma 2.3 is sharp up to a constant depending on for every (when , and the constant depending on degenerates as ). More precisely, we shall prove that if , then . Note that if and only if
(2.1) |
where is norm dual to .
Let be the set of extreme points of , and let be the non-increasing rearrangement of . For every ,
Assume that and put . We get
whereas
so inequality (2.1) yields that .
We shall also need the following standard lemma (see, e.g., [41, Section 1.3]). We will use the versions with and .
Lemma 2.6.
Let be a nonnegative random variable. If there exist , , and such that
then
Proof.
Integration by parts yields
2.2. Contraction principles
Below we recall the well-known contraction principle due to Kahane and its extension by Talagrand (see, e.g., [64, Exercise 6.7.7] and [43, Theorem 4.4 and the proof of Theorem 4.12]).
Lemma 2.7 (Contraction principle).
Let be a normed space, , and . Assume that and . Then, if are independent Rademacher random variables, we have
Lemma 2.8 (Contraction principle).
Let be a bounded subset of . Assume that are -Lipschitz and for . Then, if are independent Rademacher random variables, we have
2.3. Gaussian random variables
The following result is fundamental to the theory of Gaussian processes and referred to as Slepian’s inequality or Slepian’s lemma [52]. We use the following (slightly adapted) version taken from [11, Theorem 13.3].
Lemma 2.9 (Slepian’s lemma).
Let and be two Gaussian random vectors satisfying for all . Assume that, for all , we have . Then
The next lemma is folklore. We include a short proof of an estimate with specific constants for the sake of completeness.
Lemma 2.10.
Assume that and let , , be standard Gaussian random variables (not necessarily independent). Then
Proof.
Since the moment generating function of a Gaussian random variable is given by , it follows from Jensen’s inequality that
By taking , we get the first assertion. We apply this inequality with random variables to get the second assertion, namely
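For orientation, here is the standard moment-generating-function computation behind bounds of this type, stated for the maximum of $N$ standard Gaussian variables $g_1,\dots,g_N$ (the exact constants in Lemma 2.10 may differ):
\[
\exp\Bigl(\lambda\,\mathbb{E}\max_{i\le N} g_i\Bigr)
\;\le\; \mathbb{E}\exp\Bigl(\lambda \max_{i\le N} g_i\Bigr)
\;\le\; \sum_{i=1}^{N} \mathbb{E} e^{\lambda g_i}
\;=\; N e^{\lambda^2/2}, \qquad \lambda>0,
\]
so that $\mathbb{E}\max_{i\le N} g_i \le \frac{\ln N}{\lambda} + \frac{\lambda}{2}$; choosing $\lambda=\sqrt{2\ln N}$ gives $\mathbb{E}\max_{i\le N} g_i \le \sqrt{2\ln N}$, and applying this bound to the $2N$ variables $\pm g_i$ yields $\mathbb{E}\max_{i\le N} |g_i| \le \sqrt{2\ln(2N)}$.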
The next two lemmas are taken from [61]. Recall that is the non-increasing rearrangement of .
Lemma 2.11 ([61, Lemma 2.3]).
Assume that and let be random variables (not necessarily independent) satisfying
Then
Lemma 2.12 ([61, Lemma 2.4]).
Assume that and let be independent random variables with for . Then
Lemma 2.13 (Hoeffding’s inequality, [32, Theorem 2]).
Assume that and let , , be independent mean-zero random variables such that a.s. Then, for all ,
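For the reader's convenience, the classical form of this inequality reads as follows (the normalization in the version of Lemma 2.13 used above may differ slightly): if $X_1,\dots,X_n$ are independent, mean-zero, and $|X_i|\le a_i$ almost surely, then for all $t\ge 0$,
\[
\mathbb{P}\Bigl(\Bigl|\sum_{i=1}^n X_i\Bigr| \ge t\Bigr)
\;\le\; 2\exp\Bigl(-\frac{t^2}{2\sum_{i=1}^n a_i^2}\Bigr).
\]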
2.4. Random variables with heavy tails
The following lemma is a special case of [34, Theorem 1].
Lemma 2.14 (Contraction principle).
Let and assume that and are two sequences of independent symmetric random variables satisfying for every and ,
Then, for every convex function and every ,
Lemma 2.15 ([31, Theorem 6.2]).
Assume that are independent symmetric Weibull random variables with shape parameter and scale parameter , i.e., for . Then, for every and ,
Remark 2.16 (Moments of Weibull random variables).
Note that if is a symmetric random variable such that , , then has (symmetric) exponential distribution with parameter , so by Stirling’s formula we obtain, for all ,
with .
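A sketch of the underlying computation, under the assumption that the tail is exactly $\mathbb{P}(|Y|\ge t)=e^{-t^{r}}$ for $t\ge 0$: for every $p\ge 1$,
\[
\mathbb{E}|Y|^p \;=\; \int_0^\infty \mathbb{P}\bigl(|Y|^p \ge s\bigr)\,ds
\;=\; \int_0^\infty e^{-s^{r/p}}\,ds
\;=\; \Gamma\Bigl(\frac{p}{r}+1\Bigr),
\]
so that the $p$-th integral norm of $Y$ equals $\Gamma(p/r+1)^{1/p}$, which by Stirling's formula is of order $(p/r)^{1/r}$ up to constants depending only on $r$.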
The three previous results easily imply the following estimate for integral norms of linear combinations of independent random variables.
Proposition 2.17.
Let , and assume that are independent symmetric random variables satisfying for all and . Then, for every and ,
Proof.
The first inequality is an immediate consequence of Lemma 2.14 (applied with , independent Weibull variables with shape parameter and scale parameter , and with the convex function ), Lemma 2.15, and Remark 2.16. The second inequality follows from
where in the last step we used the inequality between weighted arithmetic and geometric means. ∎
The next lemma is standard and provides us with several equivalent formulations of the property expressed through tail bounds, growth of moments, and the exponential moments, respectively. We provide a brief proof, since in the literature one usually finds versions for only.
Lemma 2.18.
Proof.
Property (i) implies (ii) by Lemma 2.14 (applied with , and an independent Weibull variable with parameter ) and Remark 2.16. Property (iii) implies (i) by Chebyshev’s inequality:
Assume now that (ii) holds and denote . Then, for every , we have and
while for , we have and, hence, property (ii) yields
Hence, by Stirling’s formula we have for ,
The next lemma states that a linear combination of independent random variables is a random variable.
Lemma 2.19.
Assume that , , and let be independent symmetric random variables satisfying for all . Then for every the random variable satisfies, for all ,
where , depend only on , , and .
Proof.
Lemma 2.20.
Assume that , , is a non-negative random variable such that for all , and is independent of . Then, for every ,
where .
Proof.
In the case we have and then almost surely and the assertion is trivial. Assume now that . By our assumptions . Let . Note that is equivalent to . Thus,
where we used and chose . ∎
Lemma 2.21.
Assume that , and that is a random variable satisfying for all . Let , , and be as in Lemma 2.20. Then there exist random variables and such that
Proof.
For we have , so , and thus . We use our assumptions, the inequality , and Lemma 2.20 to obtain for any ,
Consider the version of and the version of defined on the (common) probability space equipped with Lebesgue measure, constructed as the (generalised) inverses of cumulative distribution functions of and , respectively. Then , which implies the assertion. ∎
Lemma 2.22.
Let , and , and assume that , are random variables satisfying for all . Then
and
Proof.
By a union bound and the assumptions we get, for every ,
where we used in the last step. We integrate by parts, change the variables, and use the above bound to obtain the second part of the assertion, i.e.,
3. Proofs of the main results
After the preparation in the previous section, we shall now present the proofs of our main results.
3.1. General bound via Slepian’s lemma
In order to obtain Theorem 1.3 we first prove its weaker version, for and only. After that we shall use the polytope from Lemma 2.3 and Gaussian concentration to show how Proposition 3.1 implies the general bound. The proof of this proposition relies on symmetrization together with the contraction principle, which allow us to get rid of and , and on Slepian's lemma.
Proposition 3.1.
Assume that has i.i.d. standard Gaussian entries and , . Then
where the suprema are taken over all sets , such that , .
Proof.
Throughout the proof, and are fixed and the suprema are taken over all index sets satisfying , and , .
Let us denote by an independent copy of . Using the duality , centering the expression, noticing that is a Gaussian random variable with variance , and using Jensen’s inequality, we see that
(3.1) |
To estimate the expected value on the right-hand side, we use a symmetrization trick together with the contraction principle (Lemma 2.8). Let be a sequence of independent Rademacher random variables independent of all others. Since the random vectors
(where ) are independent and symmetric, has the same distribution as . Therefore,
(3.2) |
Applying (conditionally, with the values of ’s fixed) the contraction principle (i.e., Lemma 2.8) with the set
and the function (which is 1-Lipschitz and takes the value at the origin), we get
(3.3) |
By proceeding similarly as in (3.1), we obtain
(3.4) |
Observe that, using symmetrization and the contraction principle similarly as in (3.2) and (3.1), we can estimate the first summand on the right-hand side of (3.4) as follows,
(3.5) |
Altogether, the inequalities in (3.1) – (3.5) yield that
(3.6) |
We shall now estimate the first summand on the right-hand side of (3.6) using Slepian’s lemma (i.e., Lemma 2.9). Denote
where , are independent standard Gaussian variables. The random variables clearly have zero mean. Thus, we only need to calculate and compare and . In the calculations below it will be evident over which sets the index (resp. ) runs, so in order to shorten the notation and improve readability, we use the notational convention
By independence,
By independence and the inequality (valid for ),
Thus, we clearly have
(cf. Remark 3.2 below). Hence, by Slepian’s lemma (Lemma 2.9) and Lemma 2.10 on the expected maxima of standard Gaussian random variables,
Recalling the estimate (3.6), we arrive at
which completes the proof of Proposition 3.1. ∎
Remark 3.2.
In the above proof, we also have
Therefore, by Slepian’s lemma (Lemma 2.9) we may reverse the estimate from the proof as follows:
Proof of Theorem 1.3.
Recall that stands for the supremum taken over all sets , with , . Given such sets , , we introduce the sets
Then, by Lemma 2.3, and . Therefore,
(3.7) |
where we denoted
with the suprema here (and later on in this proof) being always taken over all sets and .
By Proposition 3.1, we only know that for all and ,
(3.8) |
but we shall use the Gaussian concentration and the union bound to obtain an estimate for
Note first that and , provided that , , , . Therefore,
and, similarly,
This together with the estimate in (3.1) gives
(3.9) |
Note that by the Cauchy–Schwarz inequality, the function
is -Lipschitz with
where in the last inequality we used the fact that and . In order to estimate the right-hand side of the latter inequality, we consider the following two cases:
Case 1. If , then and . Consequently,
(3.10) |
Case 2. If , then and . Thus,
(3.11) |
In both cases we have
so the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]) implies that for all , , and ,
so
This, together with the union bound, implies that for , we have
Hence, by Lemma 2.6 and the estimate in (3.1),
Recalling (3.1) yields the assertion. ∎
3.2. Coupling
In this subsection we use contraction principles and the coupling described in Lemma 2.21 to prove Corollaries 1.5 and 1.6, and Proposition 1.16. Below we state more general versions of the corollaries akin to Theorem 1.3 (the versions from the introduction follow by setting , ).
Theorem 3.3 (General version of Corollary 1.5).
Assume that , , , and has independent mean-zero entries taking values in . Then
where the suprema are taken over all sets , such that , .
Remark 3.4 (Symmetrization of entries of a random matrix).
Let be an independent copy of a random matrix with mean entries. Then for any norm , including the operator norm from to , we have by Jensen’s inequality
Therefore, in many cases we may simply assume that we deal with matrices with symmetric (not only mean ) entries. For example, in the setting of Theorem 3.3, the entries of are symmetric and take values in , so it suffices to prove the assertion of this theorem (with a two times smaller constant on the right-hand side) under the additional assumption that the entries of the given random matrix are symmetric.
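In symbols, the standard symmetrization estimate used here reads as follows (written in generic notation, not tied to the symbols above, for a random matrix $X$ with independent mean-zero entries and an independent copy $X'$):
\[
\mathbb{E}\|A\circ X\|
\;=\; \mathbb{E}\bigl\|A\circ X - \mathbb{E}[A\circ X']\bigr\|
\;\le\; \mathbb{E}\bigl\|A\circ (X-X')\bigr\|
\;\le\; 2\,\mathbb{E}\|A\circ X\|,
\]
where the first inequality is Jensen's inequality and the matrix $X-X'$ has independent symmetric entries.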
Proof of Theorem 3.3.
By Remark 3.4 we may and do assume that the entries of are symmetric — in this case we need to prove the assertion with a two times smaller constant.
Since the entries of are independent and symmetric, has the same distribution as , where is a random matrix with i.i.d. Rademacher entries, independent of all other random variables. Thus, the contraction principle (see Lemma 2.7) applied conditionally yields (below the suprema are taken over all sets , such that , , and over all , and the sums run over all and )
and the assertion follows from Theorem 1.3. ∎
Theorem 3.5 (General version of Corollary 1.6).
Assume that , , , , , and has independent mean-zero entries satisfying
Then
where the suprema are taken over all sets , such that , .
Proof.
Let be an independent copy of . Then
This means that the symmetric matrix satisfies the assumptions of Theorem 3.5. Hence, due to Remark 3.4, we may and do assume that the entries of are symmetric.
Take the unique positive parameter satisfying . For , , let be i.i.d. standard Gaussian variables, independent of all other variables, and let be i.i.d. non-negative Weibull random variables with shape parameter and scale parameter (i.e., for ), independent of all other variables. (In the case , we have and then almost surely.) Take
as in Lemma 2.21 (we pick a pair separately for every , and then take versions of the pairs such that the system of random pairs is independent).
Let be a random matrix with i.i.d. Rademacher entries, independent of all other random variables. Since the entries of are symmetric and independent, has the same distribution as . By Lemma 2.21 we know that
We use the contraction principle conditionally for , i.e., for ’s and ’s fixed. More precisely, we apply Lemma 2.7 to the space of all matrices with real coefficients, equipped with the norm
(where the first supremum is taken over all sets , such that , ; recall that the second supremum is taken over all sets as in the first supremum, and over all , and the sum runs over all and ); note that we identify with (and plays the role of from Lemma 2.7). We apply the contraction principle of Lemma 2.7 (conditionally, with the values of ’s and ’s fixed) with coefficients and points to get
(3.12) |
We may estimate the first term using Theorem 3.3 applied to the matrix as follows,
(3.13) |
Recall that and that almost surely. Next we again use the contraction principle (applied conditionally for , i.e. for fixed ’s and ’s) and get
(3.14) |
Moreover, Theorem 1.3 and Lemma 2.22 (applied with , , , and ), imply
(3.15) |
Combining the estimates (3.12)–(3.15) yields the assertion. ∎
Finally, we prove that these estimates of the operator norms translate into tail bounds.
Proof of Proposition 1.16.
Since (1.23) implies (1.24) (by Lemma 2.18), it suffices to prove inequality (1.23). By a symmetrization argument similar to the one from the first paragraph of the proof of Theorem 3.5, we may and will assume that has independent and symmetric entries satisfying (1.21). By assumption (1.21) and the inequality , we have, for every ,
so (as in the proof of Lemma 2.21) there exists a random matrix with i.i.d. entries with the symmetric Weibull distribution with shape parameter and scale parameter (i.e., for ) satisfying
(3.16) |
Let be a matrix of independent Rademacher random variables independent of all others, and let denote the operator norm from to . Let be a matrix with at the intersection of th row and th column and with other entries . The contraction principle (i.e., Lemma 2.7) applied conditionally, (3.16), and the triangle inequality yield for any ,
Therefore, it suffices to prove (1.23) for random matrices and instead of .
Since by assumption , both random matrices and satisfy (1.21), so for them inequality (1.22) holds. By the comparison of weak and strong moments [38, Theorem 1.1] (note that the random variables satisfy the assumption for all with by [38, Remark 1.5]), we have
(3.17) |
Because of inequality (1.22), the first summand on the right-hand side may be estimated by . Lemma 2.19 and the implication iii from Lemma 2.18 yield
Moreover, by (3.10) and (3.11) (used with and ) and our assumption that ,
so the second summand on the right-hand side of (3.17) is bounded above (up to a multiplicative constant depending only on , , and ) by . Thus, (1.23) indeed holds for the random matrix instead of . A similar reasoning shows that the same inequality holds also for the random matrix (one may also simply use the Khintchine–Kahane inequality and assumption (1.22)). ∎
4. Proofs of further results
4.1. Gaussian random variables
Proof of Proposition 1.7.
Fix and . Let be the set defined in Lemma 2.3 for which . Then
(4.1) |
where is the set of extreme points of . We shall now estimate the expected value of the right-hand side of (4.1).
To this end, we first consider a fixed . Then there exists a non-empty index set of cardinality such that for and for . We have
(4.2) |
Let us estimate the Lipschitz constant of the function
(4.3) |
It follows from the Cauchy–Schwarz inequality (used in ) that
(4.4) |
where we put
This shows that the function defined by (4.3) is -Lipschitz continuous. Therefore, by the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]), for any ,
(4.5) |
We shall transform this inequality into a form which is more convenient to work with. We want to estimate independently of and get rid of the dependence on and on the right-hand side. By (4.2) and the fact that , we obtain
We use the definition of , then interchange the sums, use the triangle inequality, and then the inequality between the arithmetic mean and the power mean of order (recall that and ) to obtain
(4.6) |
The two inequalities above, together with inequality (4.5) (applied with ), imply that
(4.7)
holds for any and all with support of cardinality .
We now turn to the special case .
Proof of Proposition 1.8.
Since the first part of this proof works for general , we do not restrict our attention to for now. First of all,
where is the -th row of the matrix . Centering this expression gives
(4.8)
We first take care of the second term on the right-hand side of (4.8). We have
(4.9) |
In order to deal with the first term on the right-hand side of (4.8), we use a symmetrization trick together with the contraction principle. The latter is the reason that we need to work with here. We start with the symmetrization. Denoting by independent copies of and by a sequence of Rademacher random variables independent of all others, we obtain by Jensen’s and the triangle inequalities that
(4.10) |
If , we may use the contraction principle (i.e., Lemma 2.8 applied with functions ) conditionally to obtain
(4.11) |
For , we have
(4.12) |
Moreover, we have
(4.13) |
Inequalities (4.10)–(4.13) give the estimate of the first term on the right-hand side of (4.8). This ends the proof of the upper bound for .
Now we deal with another special case, the one where .
Proof of Proposition 1.10.
Recall that we deal with the range . Using the structure of extreme points of we get
Denote . By well-known tail estimates of norms of Gaussian variables with values in Banach spaces (see, e.g., [36, Corollary 1] for a more general formulation) we get for all ,
(4.15)
(4.16) |
where are universal positive constants, and
Inequality (4.15) shows in particular that the random variables satisfy
for all , thus by Lemma 2.11 we get
which together with the observation (following from Lemma 2.1 and the fact that ) that
proves the upper estimate of the proposition.
Using comparison of moments of norms of Gaussian random vectors, we also get
(4.17) |
so to end the proof it is enough to show that
(4.18) |
This will follow by a straightforward adaptation of the argument from the proof of Lemma 2.12. We may and do assume that the sequence is non-increasing in . By (4.16) we have for any and ,
Thus, since for all , we have for any ,
Thus,
Taking maximum over gives (4.18) and ends the proof. ∎
4.2. Bounded random variables
Here we show how one can adapt the methods of [9] to prove Proposition 1.14, i.e., a version of Corollary 1.13 in the special case of bounded random variables with better logarithmic terms and with explicit numerical constants. Following [9], we start with a lemma.
Lemma 4.1.
Assume that is as in Proposition 1.14. Let and suppose that is such that almost surely. Then, for all and ,
(4.19) |
where .
Proof.
Without loss of generality we may and do assume that .
Since , for and we have . Thus, integration by parts, our assumption a.s., and Hoeffding’s inequality (i.e., Lemma 2.13) yield
Proof of Proposition 1.14.
We start with several reductions. Set
(The equalities follow from Lemma 2.1, since and .) Let be the set defined in Lemma 2.3, so that . Then
(4.20) |
where is the set of extreme points of .
Consider first a fixed . We have
(4.21) |
Denote
Then, by the boundedness of and by Hölder’s inequality, for every ,
We can now apply, for every , Lemma 4.1 (with and as above and with coefficients ). Since the random variables , , are independent, using Lemma 4.1 yields
where in the last step we used the definition of (and the fact that ). By Chebyshev’s inequality and (4.21), we have, for every ,
Combining this with the previous estimate yields, for every ,
Recall that . Thus, there exists an index set of cardinality , such that for and for . We use the definition of and the inequality between the arithmetic mean and the power mean of order (recall that and ) to get
Putting everything together, we obtain
(4.22) |
for all and all with support of cardinality .
Remark 4.2.
In the unstructured case, for which are independent, mean-zero, and take values in , it is easy to extend (1.2) to the whole range of (see [8, 13]). Indeed, for and ,
Thus, for and ,
Suppose now that and (i.e., ). Choose and so that and , i.e., and . Using the Riesz–Thorin interpolation theorem, the fact that (since the entries take values in ), and Jensen’s inequality, we arrive at
The estimates in the remaining ranges of follow by duality (1.12). Moreover, up to constants, all these estimates are optimal, as they can be reversed for matrices with entries (see [8, Proposition 3.2] or [13, Satz 2]).
4.3. random variables
In this section, we prove Theorem 1.15. To this end we shall split the matrix into two parts and such that all entries of are bounded by . Then, we shall deal with using the following crude bound and the fact that the probability that is very small. In order to bound the expectation of the norm of we need a cut-off version of Theorem 1.15 – see Lemma 4.4 below.
Lemma 4.3.
Let . Assume that satisfies the assumptions of Theorem 1.15. Then
Proof.
By a standard volumetric estimate (see, e.g., [64, Corollary 4.2.13]), we know that there exists (in the metric ) a -net in of size at most . In other words, for any there exists such that . Thus, for any ,
Hence,
(4.23) |
Likewise, if we denote by the -net in (in the metric ) of size at most , then
(4.24) |
Combining these two estimates, we see that
(4.25)
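For orientation, the standard two-net reduction behind estimates of this type is the following (written generically; the specific balls, metrics, and constants used in the proof above may differ):
\[
\|B\| \;\le\; \frac{1}{1-\varepsilon}\,\max_{x\in\mathcal N}\|Bx\|
\;\le\; \frac{1}{(1-\varepsilon)(1-\varepsilon')}\,\max_{x\in\mathcal N,\;y\in\mathcal M}\langle y, Bx\rangle ,
\]
where $\mathcal N$ is an $\varepsilon$-net of the unit ball of the source space and $\mathcal M$ an $\varepsilon'$-net of the unit ball of the dual of the target space (each in its own norm). For $\varepsilon=\varepsilon'=\tfrac12$ this gives $\|B\|\le 4\max_{x\in\mathcal N,\,y\in\mathcal M}\langle y,Bx\rangle$, and a volumetric argument provides such nets of cardinality at most $5^{d}$ in dimension $d$.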
Lemma 4.4.
Let and . Assume is a random matrix with independent symmetric entries taking values in and satisfying the condition
(4.27) |
Then, for and , we have
Proof.
Fix and . Let be the set defined in Lemma 2.3 so that . Then
(4.28) |
where is the set of extreme points of . We shall now estimate the expected value of the right-hand side of (4.28).
To this end, we consider a fixed . This means that there exists a non-empty index set of cardinality such that for and for . We know from (4.4) that the Lipschitz constant of the convex function
is less than or equal to
Thus, Talagrand’s concentration inequality for convex functions of random vectors with independent bounded coordinates (see [56, Theorem 6.6 and Equation (6.18)]), together with the inequality , implies
(4.29) |
Similarly to the proof in the Gaussian case (i.e., the proof of Proposition 1.7), we shall transform this into a more convenient form by getting rid of and estimating . Let us denote, for each ,
From our assumption (4.27) as well as Lemmas 2.19 and 2.18, we obtain that . Hence,
From (4.1), we see that
The above two inequalities together with estimate (4.29) (applied with ), imply that
(4.30) |
for every and any with support of cardinality .
Proof of Theorem 1.15.
By a symmetrization argument (as in the first paragraph of the proof of Theorem 3.5), we may and do assume that all the entries are symmetric. Set . Denote and let be the matrix with entries . We have
The random matrix satisfies the assumptions of Lemma 4.4. Thus, the first summand above can be estimated as follows:
For the second summand we write, using the Cauchy–Schwarz inequality and then Lemma 4.3 and Lemma 2.22 (with and ; recall that ),
Combining the above three inequalities ends the proof. ∎
5. Lower bounds and further discussion of conjectures
5.1. Lower bounds
Let us first provide lower bounds showing that the upper bounds obtained above are indeed sharp (up to logarithms).
Proposition 5.1.
Let be a random matrix with independent mean-zero entries satisfying for some . Then, for all ,
Using duality (1.12) we immediately obtain the following corollary.
Corollary 5.2.
Let be as in Proposition 5.1. Then, for all ,
Proof of Proposition 5.1.
Let denote the operator norm from to . For and , let us denote by the matrix with entry at the intersection of th row and th column and with all other entries . By the symmetrization trick described in Remark 3.4, it suffices to consider matrices with symmetric entries and prove the assertion with a twice better constant (note that, also by Remark 3.4, the lower bound for the absolute first moment of the symmetrized entries does not change and is still equal to ).
If has symmetric independent entries, it has the same distribution as , where , , , are i.i.d. Rademacher random variables, independent of all other random variables. Hence, by Jensen’s inequality and the contraction principle (Lemma 2.7 applied with and ), we get
(5.1) |
Thus, it suffices to estimate from below .
Since , it suffices to prove the following proposition in order to provide the lower bound in Conjecture 1.
Proposition 5.3.
For the Gaussian matrix , we have
(5.2) |
where and .
5.2. The proof of Inequalities (1.13) and (1.11)
Let us now show that in the case , the third term on the right-hand side in Conjecture 1 is not needed. To this end it suffices to prove (1.13) only in the case , since the case follows by duality (1.12).
Proposition 5.4.
Whenever and , we have
(5.3) |
where .
Proof.
Since the right-hand side of (5.3) does not depend on , and the left-hand side is non-decreasing with , we may consider only the case . By permuting the columns of we may and do assume without loss of generality that the sequence is non-increasing.
Fix . Let be the midpoint of the non-empty interval . Take with . Since , we have
so . Therefore, the inequality and the facts that for all , and that imply
Taking the maximum over all completes the proof. ∎
Now we turn to the proof of (1.11). Note that it suffices to prove only the first two-sided inequality in (1.11), since the second one follows from it by duality (1.12).
Proposition 5.5.
For all , we have
(5.4) |
where the matrix is obtained by permuting the columns of the matrix in such a way that .
Proof.
By permuting the columns of the matrix , we can assume that the sequence is non-increasing. We have
(5.5) |
The function is -Lipschitz with respect to the Euclidean norm on , so by Gaussian concentration (see, e.g., [41, Chapter 5.1]),
for all , . Thus, Lemma 2.11 and inequality (5.5) imply
(5.6) |
We have
which, together with (5.6), provides the asserted upper bound.
On the other hand, if denotes the non-increasing rearrangement of the sequence of all absolute values of entries of , then Lemma 2.12 implies
which provides the asserted lower bound. ∎
Note that the above proof shows in fact that
so
(5.7) |
where the matrix is obtained by permuting the rows of the matrix in such a way that .
5.3. Counterexample to a seemingly natural conjecture
In this subsection we provide an example showing that for any the bound
(5.8) |
cannot hold. By duality (1.12), it also cannot hold for any . This explains why Conjecture 1 cannot be simplified to a form like the right-hand side of (1.8).
Let , , and let be matrices with all entries equal to one. Consider a block matrix
of size , with blocks on the diagonal and with all other entries equal to .
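The counterexample matrix can be assembled as follows; this is a minimal sketch, and the block size and the number of blocks are hypothetical placeholders for the parameters chosen in the actual example.

```python
import numpy as np

def diagonal_ones_blocks(k, b):
    # A (k*b) x (k*b) block-diagonal matrix with k all-ones blocks of size b x b
    # on the diagonal and zeros elsewhere: the Kronecker product of I_k with
    # the all-ones block.
    return np.kron(np.eye(k), np.ones((b, b)))

A = diagonal_ones_blocks(k=4, b=3)
print(A.shape, int(A.sum()))   # (12, 12), containing 4 * 3 * 3 ones
```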
5.4. Discussion of another natural conjecture
In this subsection we prove all the assertions of Remark 1.1. We begin by showing that for every ,
(5.11) |
and, in the case ,
(5.12) |
where , , and . In other words, (5.11) shows that Conjecture 1 is equivalent to (1.15) as long as .
Proof of (5.11) and (5.12).
Fix and let for . For we have . Thus is Lipschitz continuous with constant equal to
Therefore, the Gaussian concentration inequality (see, e.g., [41, Chapter 5.1]) implies that for every and every ,
so by Lemma 2.11 we get
(5.13) |
where the matrix is obtained by permuting the rows of the matrix in such a way that .
Moreover, by Jensen’s inequality,
This together with the triangle inequality and (5.13) implies
and, by duality,
where , and the matrix is obtained by permuting the columns of the matrix in such a way that . This, together with Lemma 2.1 and (5.4), yields in the case ,
which implies the lower bound of (5.11). In the case we additionally use (5.7) and the simple observation that
to get (5.12).
Next, for every pair which does not satisfy the condition we shall give examples of , and matrices , for which
(5.14) |
when . This shows that the natural conjecture (1.15) is wrong outside the range . The case , when (1.15) is valid (cf. (1.4)), is in a sense a boundary case, for which (1.15) (i.e., a natural generalization of (1.4)) may hold.
Example 5.6 (for (5.14) in the case .).
5.5. Infinite dimensional Gaussian operators
In this subsection we prove Proposition 1.2 concerning infinite dimensional Gaussian operators. It allows us to see that Conjecture 1 implies Conjecture 2.
Proof of Proposition 1.2.
We adapt the proof of [40, Corollary 1.2] to prove Proposition 1.2 in the case ; the remaining cases may be proven similarly. Fix for which (1.14) holds and a deterministic infinite matrix . Using the monotone convergence theorem one can show that a matrix defines a bounded operator between and if and only if . Interpreting as infinity for matrices which do not define a bounded operator, we have
and similarly
and
Therefore, (1.14) implies the following: if and only if , , and . It thus suffices to prove the following claim: almost surely if and only if .
If , then , so .
Acknowledgments
R. Adamczak is partially supported by the National Science Center, Poland, via the Sonata Bis grant no. 2015/18/E/ST1/00214. R. Adamczak was partially supported by the WTZ Grant PL 06/2018 of the OeAD. J. Prochno and M. Strzelecka are — and M. Strzelecki was — supported by the Austrian Science Fund (FWF) Project P32405 Asymptotic Geometric Analysis and Applications. M. Strzelecka was partially supported by the National Science Center, Poland, via the Maestro grant no. 2015/18/A/ST1/00553.
References
- [1] D. Achlioptas and F. Mcsherry, Fast computation of low-rank matrix approximations, J. ACM 54 (2007), no. 2, 9–es.
- [2] R. Adamczak, R. Latała, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann, Chevet type inequality and norms of submatrices, Studia Math. 210 (2012), no. 1, 35–56. MR 2949869
- [3] R. Adamczak, R. Latała, Z. Puchała, and K. Życzkowski, Asymptotic entropic uncertainty relations, J. Math. Phys. 57 (2016), no. 3, 032204, 24. MR 3478525
- [4] N. Ailon and B. Chazelle, The fast Johnson-Lindenstrauss transform and approximate nearest neighbors, SIAM J. Comput. 39 (2009), no. 1, 302–322. MR 2506527
- [5] G. Akemann, J. Baik, and P. Di Francesco (eds.), The Oxford handbook of random matrix theory, Oxford University Press, Oxford, 2015.
- [6] G. W. Anderson, A. Guionnet, and O. Zeitouni, An introduction to random matrices, Cambridge Studies in Advanced Mathematics, vol. 118, Cambridge University Press, Cambridge, 2010. MR 2760897
- [7] A. S. Bandeira and R. van Handel, Sharp nonasymptotic bounds on the norm of random matrices with independent entries, Ann. Probab. 44 (2016), no. 4, 2479–2506. MR 3531673
- [8] G. Bennett, Schur multipliers, Duke Math. J. 44 (1977), no. 3, 603–639. MR 493490
- [9] G. Bennett, V. Goodman, and C. M. Newman, Norms of random matrices, Pacific J. Math. 59 (1975), no. 2, 359–365. MR 393085
- [10] Y. Benyamini and Y. Gordon, Random factorization of operators between Banach spaces, J. Analyse Math. 39 (1981), 45–74. MR 632456
- [11] S. Boucheron, G. Lugosi, and P. Massart, Concentration inequalities, Oxford University Press, Oxford, 2013, A nonasymptotic theory of independence, With a foreword by Michel Ledoux. MR 3185193
- [12] J. Bourgain, S. Dirksen, and J. Nelson, Toward a unified theory of sparse dimensionality reduction in Euclidean space, Geom. Funct. Anal. 25 (2015), no. 4, 1009–1088. MR 3385629
- [13] B. Carl, B. Maurey, and J. Puhl, Grenzordnungen von absolut--summierenden Operatoren, Math. Nachr. 82 (1978), 205–218. MR 498116
- [14] D. Chafaï, O. Guédon, G. Lecué, and A. Pajor, Interactions between compressed sensing random matrices and high dimensional geometry, Panoramas et Synthèses [Panoramas and Syntheses], vol. 37, Société Mathématique de France, Paris, 2012. MR 3113826
- [15] K. R. Davidson and S. J. Szarek, Local operator theory, random matrices and Banach spaces, Handbook of the geometry of Banach spaces, Vol. I, North-Holland, Amsterdam, 2001, pp. 317–366. MR 1863696
- [16] S. Foucart and H. Rauhut, A mathematical introduction to compressive sensing, Applied and Numerical Harmonic Analysis, Birkhäuser/Springer, New York, 2013. MR 3100033
- [17] O. Friedland and P. Youssef, Approximating matrices and convex bodies, Int. Math. Res. Not. IMRN (2019), no. 8, 2519–2537. MR 3942169
- [18] E. D. Gluskin, Norms of random matrices and diameters of finite-dimensional sets, Mat. Sb. (N.S.) 120(162) (1983), no. 2, 180–189, 286. MR 687610
- [19] E. D. Gluskin and S. Kwapień, Tail and moment estimates for sums of independent random variables with logarithmically concave tails, Studia Math. 114 (1995), no. 3, 303–309. MR 1338834
- [20] H. H. Goldstine and J. von Neumann, Numerical inverting of matrices of high order. II, Proc. Amer. Math. Soc. 2 (1951), 188–202. MR 41539
- [21] Y. Gordon, Some inequalities for Gaussian processes and applications, Israel J. Math. 50 (1985), no. 4, 265–289. MR 800188
- [22] Y. Gordon, A. E. Litvak, C. Schütt, and E. M. Werner, Geometry of spaces between polytopes and related zonotopes, Bull. Sci. Math. 126 (2002), no. 9, 733–762. MR 1941083
- [23] O. Guédon, A. Hinrichs, A. E. Litvak, and J. Prochno, On the expectation of operator norms of random matrices, Geometric aspects of functional analysis, Lecture Notes in Math., vol. 2169, Springer, Cham, 2017, pp. 151–162. MR 3645120
- [24] O. Guédon, S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann, Majorizing measures and proportional subsets of bounded orthonormal systems, Rev. Mat. Iberoam. 24 (2008), no. 3, 1075–1095. MR 2490210
- [25] O. Guédon and M. Rudelson, -moments of random vectors via majorizing measures, Adv. Math. 208 (2007), no. 2, 798–823. MR 2304336
- [26] U. Haagerup, The best constants in the Khintchine inequality, Studia Math. 70 (1981), no. 3, 231–283 (1982). MR 654838
- [27] A. Hinrichs, D. Krieg, E. Novak, J. Prochno, and M. Ullrich, On the power of random information, Multivariate Algorithms and Information-Based Complexity (F. J. Hickernell and P. Kritzer, eds.), De Gruyter, Berlin/Boston, 2020, pp. 43–64.
- [28] by same author, Random sections of ellipsoids and the power of random information, Trans. Amer. Math. Soc. 374 (2021), no. 12, 8691–8713. MR 4337926
- [29] A. Hinrichs, J. Prochno, and M. Sonnleitner, Random sections of -ellipsoids, optimal recovery and Gelfand numbers of diagonal operators, 2021.
- [30] A. Hinrichs, J. Prochno, and J. Vybíral, Gelfand numbers of embeddings of Schatten classes, Math. Ann. 380 (2021), no. 3-4, 1563–1593. MR 4297193
- [31] P. Hitczenko, S. J. Montgomery-Smith, and K. Oleszkiewicz, Moment inequalities for sums of certain independent symmetric random variables, Studia Math. 123 (1997), no. 1, 15–42. MR 1438303
- [32] W. Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58 (1963), 13–30. MR 144363
- [33] D. Krieg and M. Ullrich, Function values are enough for -approximation, Found. Comput. Math. 21 (2021), no. 4, 1141–1151. MR 4298242
- [34] S. Kwapień, Decoupling inequalities for polynomial chaos, Ann. Probab. 15 (1987), no. 3, 1062–1071. MR 893914
- [35] H. J. Landau and L. A. Shepp, On the supremum of a Gaussian process, Sankhyā Ser. A 32 (1970), 369–378. MR 286167
- [36] R. Latała, Tail and moment estimates for sums of independent random vectors with logarithmically concave tails, Studia Math. 118 (1996), no. 3, 301–304. MR 1388035
- [37] by same author, Some estimates of norms of random matrices, Proc. Amer. Math. Soc. 133 (2005), no. 5, 1273–1282. MR 2111932
- [38] R. Latała and M. Strzelecka, Comparison of weak and strong moments for vectors with independent coordinates, Mathematika 64 (2018), no. 1, 211–229. MR 3778221
- [39] R. Latała and W. Świątkowski, Norms of randomized circulant matrices, Electron. J. Probab. 27 (2022), Paper No. 80, 23. MR 4441144
- [40] R. Latała, R. van Handel, and P. Youssef, The dimension-free structure of nonhomogeneous random matrices, Invent. Math. 214 (2018), no. 3, 1031–1080. MR 3878726
- [41] M. Ledoux, The concentration of measure phenomenon, Mathematical Surveys and Monographs, vol. 89, American Mathematical Society, Providence, RI, 2001. MR 1849347
- [42] by same author, Deviation inequalities on largest eigenvalues, Geometric aspects of functional analysis, Lecture Notes in Math., vol. 1910, Springer, Berlin, 2007, pp. 167–219. MR 2349607
- [43] M. Ledoux and M. Talagrand, Probability in Banach spaces, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 23, Springer-Verlag, Berlin, 1991, Isoperimetry and processes. MR 1102015
- [44] N. Linial, E. London, and Y. Rabinovich, The geometry of graphs and some of its algorithmic applications, Combinatorica 15 (1995), no. 2, 215–245. MR 1337355
- [45] D. Matlak, Oszacowania norm macierzy losowych [Estimates of norms of random matrices], Master’s thesis, Uniwersytet Warszawski, 2017.
- [46] A. Naor, O. Regev, and T. Vidick, Efficient rounding for the noncommutative Grothendieck inequality, Theory Comput. 10 (2014), 257–295. MR 3267842
- [47] Z. Puchała, Ł. Rudnicki, and K. Życzkowski, Majorization entropic uncertainty relations, J. Phys. A 46 (2013), no. 27, 272002, 12. MR 3081910
- [48] H. Rauhut, Compressive sensing and structured random matrices, Theoretical foundations and numerical methods for sparse recovery, Radon Ser. Comput. Appl. Math., vol. 9, Walter de Gruyter, Berlin, 2010, pp. 1–92. MR 2731597
- [49] S. Riemer and C. Schütt, On the expectation of the norm of random matrices with non-identically distributed entries, Electron. J. Probab. 18 (2013), no. 29, 13. MR 3035757
- [50] M. Rudelson and R. Vershynin, Sampling from large matrices: an approach through geometric functional analysis, J. ACM 54 (2007), no. 4, Art. 21, 19. MR 2351844
- [51] Y. Seginer, The expected norm of random matrices, Combin. Probab. Comput. 9 (2000), no. 2, 149–166. MR 1762786
- [52] D. Slepian, The one-sided barrier problem for Gaussian noise, Bell System Tech. J. 41 (1962), 463–501. MR 133183
- [53] A. M.-C. So, Moment inequalities for sums of random matrices and their applications in optimization, Math. Program. 130 (2011), no. 1, Ser. A, 125–151. MR 2853163
- [54] D. A. Spielman and S.-H. Teng, Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems, Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing (New York, NY, USA), STOC ’04, Association for Computing Machinery, 2004, pp. 81–90.
- [55] M. Strzelecka, Estimates of norms of log-concave random matrices with dependent entries, Electron. J. Probab. 24 (2019), Paper No. 107, 15. MR 4017125
- [56] M. Talagrand, A new look at independence, Ann. Probab. 24 (1996), no. 1, 1–34. MR 1387624
- [57] J. A. Tropp, Norms of random submatrices and sparse approximation, C. R. Math. Acad. Sci. Paris 346 (2008), no. 23-24, 1271–1274. MR 2473306
- [58] by same author, On the conditioning of random subdictionaries, Appl. Comput. Harmon. Anal. 25 (2008), no. 1, 1–24. MR 2419702
- [59] by same author, User-friendly tail bounds for sums of random matrices, Found. Comput. Math. 12 (2012), no. 4, 389–434.
- [60] by same author, An introduction to matrix concentration inequalities, Found. Trends Mach. Learn. 8 (2015), no. 1-2, 1–230.
- [61] R. van Handel, On the spectral norm of Gaussian random matrices, Trans. Amer. Math. Soc. 369 (2017), no. 11, 8161–8178. MR 3695857
- [62] by same author, Structured random matrices, Convexity and concentration, IMA Vol. Math. Appl., vol. 161, Springer, New York, 2017, pp. 107–156. MR 3837269
- [63] R. Vershynin, Introduction to the non-asymptotic analysis of random matrices, Compressed sensing, Cambridge Univ. Press, Cambridge, 2012, pp. 210–268. MR 2963170
- [64] by same author, High-dimensional probability, Cambridge Series in Statistical and Probabilistic Mathematics, vol. 47, Cambridge University Press, Cambridge, 2018, An introduction with applications in data science, With a foreword by Sara van de Geer. MR 3837109
- [65] J. von Neumann and H. H. Goldstine, Numerical inverting of matrices of high order, Bull. Amer. Math. Soc. 53 (1947), no. 11, 1021–1099.
- [66] E. P. Wigner, Characteristic vectors of bordered matrices with infinite dimensions, Ann. of Math. (2) 62 (1955), 548–564. MR 77805
- [67] by same author, Characteristic vectors of bordered matrices with infinite dimensions. II, Ann. of Math. (2) 65 (1957), 203–207. MR 83848
- [68] by same author, On the distribution of the roots of certain symmetric matrices, Ann. of Math. (2) 67 (1958), 325–327. MR 95527
- [69] J. Wishart, The generalised product moment distribution in samples from a normal multivariate population, Biometrika 20A (1928), no. 1/2, 32–52.