Singularity of sparse Bernoulli matrices
Abstract
Let $M_n$ be an $n\times n$ random matrix with i.i.d. Bernoulli($p$) entries. We show that there is a universal constant $C\ge 1$ such that, whenever $p$ and $n$ satisfy $C\ln n/n\le p\le C^{-1}$,
$$\mathbb{P}\{\,M_n \text{ is singular}\,\}=(1+o_n(1))\,\mathbb{P}\{\,\text{a row or a column of } M_n \text{ equals zero}\,\}=(1+o_n(1))\,2n(1-p)^{n},$$
where $o_n(1)$ denotes a quantity which converges to zero as $n\to\infty$. We provide the corresponding upper and lower bounds on the smallest singular value of $M_n$ as well.
AMS 2010 Classification:
primary: 60B20, 15B52;
secondary: 46B06, 60C05.
Keywords:
Littlewood–Offord theory,
Bernoulli matrices, sparse matrices, smallest singular value, invertibility
1 Introduction
Invertibility of discrete random matrices attracts considerable attention in the literature. The classical problem in this direction — estimating the singularity probability of a square random matrix with i.i.d. entries — was first addressed by Komlós in the 1960s. Komlós [18] showed that this probability decays to zero as the dimension grows to infinity. A breakthrough result of Kahn–Komlós–Szemerédi [16] confirmed that the singularity probability of is exponentially small in the dimension. Further improvements on the singularity probability were obtained by Tao–Vu [44, 45] and Bourgain–Vu–Wood [7]. An old conjecture states that the singularity probability equals $(1/2+o(1))^n$. The conjecture was resolved in [48].
Other models of non-symmetric discrete random matrices considered in the literature include adjacency matrices of -regular digraphs, as well as the closely related model of sums of independent uniform permutation matrices [19, 9, 10, 22, 23, 24, 25, 26, 2]. In particular, the recent breakthrough works [15, 35, 36] confirmed that the adjacency matrix of a uniform random –regular digraph of a constant degree is singular with probability decaying to zero as the number of vertices of the graph grows to infinity. A closely related line of research deals with the rank of random matrices over finite fields. We refer to [33] for some recent results and further references.
The development of the Littlewood–Offord theory and a set of techniques of geometric functional analysis reworked in the random matrix context produced strong invertibility results for a broad class of distributions. Following works [47, 39] of Tao–Vu and Rudelson, the paper [41] of Rudelson and Vershynin established optimal small ball probability estimates for the smallest singular value in the class of square matrices with i.i.d. subgaussian entries, namely, it was shown that any matrix with i.i.d. subgaussian entries of zero mean and unit variance satisfies for all and some depending only on the subgaussian moment. The assumptions of identical distribution of entries and of bounded subgaussian moment were removed in subsequent works [37, 30, 31]. This line of research led to a positive solution of the Bernoulli matrix conjecture mentioned in the first paragraph. Let us state the result of [48] for future reference.
Theorem (Invertibility of dense Bernoulli matrices, [48]).
• For each , let be the random matrix with i.i.d. entries. Then for any there is depending only on such that the smallest singular value satisfies
In particular, , where the quantity tends to zero as grows to infinity.
• For each and there is depending on and such that for any and for a random matrix with i.i.d. Bernoulli() entries,
In particular, for a fixed , we have .
Sparse analogs of the Rudelson–Vershynin invertibility theorem [41] were obtained, in particular, in works [46, 14, 29, 3, 4, 5], with the strongest small ball probability estimates in the i.i.d. subgaussian setting available in [3, 4, 5]. Here, we state a result of Basak–Rudelson [3] for Bernoulli() random matrices.
Theorem (Invertibility of sparse Bernoulli matrices, [3]).
There are universal constants with the following property. Let and let satisfy . Further, let be the random matrix with i.i.d. Bernoulli() entries (that is, random variables with expectation ). Then
The singularity probabilities implied by the results [48, 3] may be regarded as suboptimal in a certain respect. Indeed, while [48] produced an asymptotically sharp base of the power in the singularity probability of , the estimate of [48] is off by a factor which may (and in fact does, as analysis of the proof shows) grow to infinity superpolynomially fast with the dimension. Further, the upper bound on the singularity probability of sparse Bernoulli matrices implied by [3] captures an exponential dependence on , but does not recover an asymptotically optimal base of the power.
A folklore conjecture for matrices asserts that , where the right hand side of the expression is the probability that two rows or two columns of the matrix are equal up to a sign (see, for example, [16]). This conjecture can be naturally extended to the model with Bernoulli() () entries as follows.
Conjecture 1.1 (Stronger singularity conjecture for Bernoulli matrices).
For each , let , and let be the matrix with i.i.d. Bernoulli() entries. Then
In particular, if then
Conceptually, the above conjecture asserts that the main causes for singularity are local in the sense that the linear dependencies typically appear within small subsets of rows or columns. In a special regime , the conjecture was positively resolved in [5] (note that if then the matrix has a zero row with probability at least ). However, the regime was not covered in [5].
The main purpose of our paper is to develop methods capable of capturing the singularity probability with sufficient precision to answer the above question. Interestingly, this appears to be more accessible in the sparse regime, when is bounded above by a small universal constant (we discuss this in the next section in more detail). It is not difficult to show that when , the events that a given row or a given column equals zero almost do not intersect, so that
Our main result can be formulated as follows.
Theorem 1.2.
There are universal constants with the following property. Let and let be an random matrix such that
The entries of are i.i.d. Bernoulli(), with satisfying . (A)
Then
where is a quantity which tends to zero as . Moreover, for every ,
In fact, our approach gives much better estimates on in the regime when is constant, see Theorem 7.1 below. At the same time, we note that obtaining small ball probability estimates for was not the main objective of this paper, and the argument was not fully optimized in that respect.
Geometrically, the main result of our work asserts that (under appropriate assumptions on ) the probability that a collection of independent random vectors in , with i.i.d. Bernoulli() components, is linearly dependent is equal (up to a factor) to the probability of the event that either is zero for some or are contained in the same coordinate hyperplane:
Thus, the linear dependencies between the vectors, when they appear, typically have the prescribed structure, falling into one of the two categories described above with the (conditional) probability .
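As a quick numerical illustration of this picture (not part of the paper's argument), the following hedged Python sketch compares the empirical singularity frequency of a sparse Bernoulli matrix with the zero row/column prediction; all parameter values below are chosen for illustration only.

```python
import numpy as np

# Hedged sanity check: empirical singularity probability of an n x n
# Bernoulli(p) matrix versus the zero row/column prediction 2*n*(1-p)**n.
rng = np.random.default_rng(0)

def singular_fraction(n, p, trials=2000):
    count = 0
    for _ in range(trials):
        m = (rng.random((n, n)) < p).astype(float)   # i.i.d. Bernoulli(p) entries
        # numerical rank over the reals; exact rational arithmetic would be
        # more faithful but is unnecessary for a rough comparison
        if np.linalg.matrix_rank(m) < n:
            count += 1
    return count / trials

n, p = 40, 0.15
print("empirical singularity frequency:", singular_fraction(n, p))
print("prediction 2n(1-p)^n:           ", 2 * n * (1 - p) ** n)
```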
The paper is organized as follows. In the next section, we give an overview of the proof of the main result. In Section 3, we gather some preliminary facts and important notions to be used later. In Section 4, we consider new anti-concentration inequalities for random vectors with prescribed number of non-zero components, and introduce a functional (u-degree of a vector) which enables us to classify vectors on the sphere according to anti-concentration properties of inner products with the random vectors. In the same section, we prove a key technical result — Theorem 2.2 — which states, roughly speaking, that with very high probability a random unit vector orthogonal to columns of is either close to being sparse or to being a constant multiple of , or the vector is very unstructured, i.e., has a very large u-degree.
In Section 5, we consider a special regime of constant probability of success . In this regime, estimating the probability of the event that has an almost null vector which is either close to sparse or almost constant is relatively simple. The reader who is interested only in the regime of constant can thus skip the more technical Section 6 and obtain the proof of the main result as a combination of the theorems in Sections 4 and 5. In Section 6, we consider the entire range for . Here, the treatment of almost constant and close to sparse null vectors is much more challenging and involves a careful analysis of multiple cases. Finally, in Section 7 we establish an invertibility-via-distance lemma and prove the main result of the paper. Some open questions are discussed in Section 8.
2 Overview of the proof
In this section, we provide a high-level overview of the proof; technical details will be discussed further in the text. The proof utilizes some known approaches to the matrix invertibility, which involve, in particular, a decomposition of the space into structured and unstructured parts, a form of invertibility via distance argument, small ball probability estimates based on the Esseen lemma, and various forms of the –net argument. The novel elements of the proof are anti-concentration inequalities for random vectors with a prescribed cardinality of the support, a structural theorem for normals to random hyperplanes spanned by vectors with i.i.d. Bernoulli() components, and a sharp analysis of the matrix invertibility over the set of structured vectors. We will start the description with our use of the partitioning trick, followed by a modified invertibility-via-distance lemma, and then consider the anti-concentration inequality and the theorem for normals (Subsection 2.1) as well as invertibility over the structured vectors (Subsection 2.2).
The use of decompositions of the space into structured and unstructured vectors has become rather standard in the literature. A common idea behind such partitions is to apply the Littlewood–Offord theory to analyse the unstructured vectors and to construct a form of the –net argument to treat the structured part. Various definitions of structured and unstructured have been used in works dealing with the matrix invertibility. One such decomposition was introduced in [28] and further developed in [41]. In this splitting the structured vectors are compressible, having a relatively small Euclidean distance to the set of sparse vectors, while the vectors in the complement are incompressible, having a large distance to sparse vectors and, as a consequence, many components of roughly comparable magnitudes. In our work, the decomposition of is closer to the one introduced in [24, 27].
Let denote a non-increasing rearrangement of absolute values of components of a vector , and let be some parameters. Further, let be a non-decreasing function from into ; we shall call it the growth function. At this moment, the choice of the growth function is not important; we can assume that grows roughly as . Define the set of gradual non-constant vectors as
(1)
In a sense, constant multiples of the gradual non-constant vectors occupy most of the space ; they play the role of the unstructured vectors in our argument. By negation, the structured vectors,
(2)
are either almost constant (with most of the components nearly equal) or have a very large ratio of and for some .
For simplicity, we only discuss the problem of singularity at this moment. As and are equidistributed, to show that it is sufficient to verify that
(3)
and
The first relation is dealt with by using a variation of the invertibility via distance argument which was introduced in [41] to obtain sharp small ball probability estimates for the smallest singular value. In the form given in [41], the argument reduces the problem of invertibility over unstructured vectors to estimating distances of the form , where is the –th column of , and is the linear span of columns of except for the –th. In our setting, however, the argument needs to be modified to pass to estimating the distance conditioned on the size of the support of the column, as this allows using much stronger anti-concentration inequalities (see the following subsection). By the invariance of the distribution of under permutation of columns, it can be shown that in order to prove the relation (3), it is enough to verify that
(4)
where is a non-zero random vector orthogonal to and measurable with respect to (see Lemma 7.4 and the beginning of the proof of Theorem 1.2). In this form, the question can be reduced to studying the anti-concentration of the linear combinations , where the Bernoulli random variables are mutually independent with and conditioned to sum up to a fixed number in . This intermediate problem is discussed in the next subsection.
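For orientation, the passage from distances to inner products rests on the following elementary inequality; the notation ($\mathbf{C}_i$ for the $i$-th column, $H_i$ for the span of the remaining columns, $x^*$ for a unit normal) is introduced here only for illustration and need not match the paper's displays:
\[
\operatorname{dist}(\mathbf{C}_i, H_i) \;\ge\; |\langle \mathbf{C}_i, x^* \rangle|, \qquad x^* \perp H_i,\ \|x^*\|_2 = 1,
\]
so a small ball (anti-concentration) estimate for the inner product of a column with a unit normal, measurable with respect to the other columns, yields a lower bound on the distance and hence on the smallest singular value.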
2.1 New anti-concentration inequalities for random vectors with prescribed support cardinality
The Littlewood–Offord theory — the study of anti-concentration properties of random variables — has been a crucial ingredient of many recent results on invertibility of random matrices, starting with the work of Tao–Vu [47]. In particular, the breakthrough result [41] of Rudelson–Vershynin mentioned in the introduction, is largely based on studying the Lévy function , with being the first column of the random matrix and — a random unit vector orthogonal to the remaining columns of .
We recall that given a random vector taking values in , the Lévy concentration function is defined by
in particular for a scalar random variable we have . A common approach is to determine structural properties of a fixed vector which would imply desired upper bounds on the Lévy function of its scalar product with a random vector (say, a matrix’ column). The classical result of Erdős–Littlewood–Offord [12, 21] asserts that whenever is a vector in with i.i.d. components, and is such that for all , we have
where is a universal constant. It can be further deduced from the Lévy–Kolmogorov–Rogozin inequality [38] that the above assertion remains true whenever is a random vector with independent components satisfying for some constant . More delicate structural properties, based on whether components of can be embedded into a generalized arithmetic progression with prescribed parameters, were employed in [47] to prove superpolynomially small upper bounds on the singularity probability of discrete random matrices.
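A hedged numerical rendering of the Erdős–Littlewood–Offord bound mentioned above (exact enumeration for small dimensions; parameters are illustrative):

```python
import itertools, math
from collections import Counter

# For coefficients b_i with |b_i| >= 1 and i.i.d. +/-1 signs, the largest atom
# of sum_i eps_i * b_i has probability O(1/sqrt(n)); the extremal case is
# b = (1, ..., 1), where the atom equals binom(n, n/2) / 2^n.
def max_atom(b):
    counts = Counter()
    for signs in itertools.product((-1, 1), repeat=len(b)):
        counts[round(sum(s * x for s, x in zip(signs, b)), 9)] += 1
    return max(counts.values()) / 2 ** len(b)

for n in (8, 12, 16):
    print(n, max_atom([1.0] * n), 1 / math.sqrt(n))
```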
The Least Common Denominator (LCD) of a unit vector introduced in [41] played a central role in establishing the exponential upper bounds on the matrix singularity under more general assumptions on the entries’ distributions. We recall that the LCD of a unit vector in can be defined as
(5)
for some parameters . The small ball probability theorem of Rudelson and Vershynin [41] states that given a vector with i.i.d. components of zero mean and unit variance satisfying some additional mild assumptions,
for some constants (see [42] for a generalization of the statement). The LCD, or its relatives, were subsequently used in studying invertibility of non-Hermitian square matrices under broader assumptions [37, 30, 31], and delocalization of eigenvectors of non-Hermitian random matrices [43, 34, 32], among many other works.
Anti-concentration properties of random linear combinations naturally play a central role in the current work; however, the measures of unstructuredness of vectors existing in the literature do not allow us to obtain the precise estimates we are aiming for. Here, we develop a new functional for dealing with linear combinations of dependent Bernoulli variables.
Given , , a vector and parameters , we define the degree of unstructuredness (u-degree) of vector by
(6)
where the sum is taken over all sequences of disjoint subsets , each of cardinality and
(7)
Here , , denote mutually independent integer random variables uniformly distributed on respective ’s. The function in the definition acts as a smoothing of , with for all and for all (we prefer to skip discussion of this purely technical element of the proof in this section, and refer to the beginning of Section 4 for the full list of conditions imposed on ).
The functional can be understood as follows. The expression inside the supremum is the average value of the integral
with the average taken over all choices of sequences . The function under the integral, disregarding the smoothing , is the absolute value of the characteristic function of the random variable , where is a random –vector with exactly ones, and with the -th one distributed uniformly on . A relation between the magnitude of the characteristic function and anti-concentration properties of a random variable (the Esseen lemma, Lemma 3.12 below) has been commonly used in works on the matrix invertibility (see, for example, [40]), and determines the shape of the functional . The definition of the u-degree is designed specifically to work with random –vectors having a fixed sum (equal to ). The next statement follows from the definition of and the Esseen lemma.
Theorem 2.1 (A Littlewood–Offord-type inequality in terms of the u-degree).
Let be positive integers with , and let . Further, let , and let be a random –vector in uniformly distributed on the set of vectors with ones and zeros. Then
where may only depend on .
The principal difference of the u-degree and the above theorem from the notion of the LCD and (5) is that the former allow one to obtain stronger anti-concentration inequalities in the same regime of sparsity, assuming that the coefficient vector is sufficiently unstructured. In fact, under certain conditions, sparse random vectors with prescribed support cardinality admit stronger anti-concentration inequalities compared to the i.i.d. model.
The last principle can be illustrated by taking the coefficient vector as a “typical” vector on the sphere . First, assume that are i.i.d. Bernoulli() , with . Then it is easy to see that for almost all (with respect to normalized Lebesgue measure) vectors ,
In words, for a typical coefficient vector on the sphere, the linear combination takes distinct values for any two distinct realizations of , and thus the Lévy function at zero is equal to the probability measure of the largest atom of the distribution of which corresponds to all equal to zero. In contrast, if the vector is uniformly distributed on the set of –vectors with support of size , then for almost all , the random sum takes distinct values. Thus,
where for small .
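The comparison just described can be made concrete with a small hedged computation (generic coefficients are modelled by random reals, so distinct subsets give distinct sums; all parameters below are illustrative):

```python
import math, random
from itertools import combinations
from collections import Counter

random.seed(0)
n, p = 12, 0.25
d = round(p * n)                            # prescribed support cardinality
x = [random.random() for _ in range(n)]     # a "typical" coefficient vector

# i.i.d. Bernoulli(p) model: the largest atom sits at zero, probability (1-p)^n
iid_atom = (1 - p) ** n

# fixed-support model: each of the C(n, d) supports gives a distinct sum
sums = Counter(round(sum(x[i] for i in s), 12) for s in combinations(range(n), d))
fixed_atom = max(sums.values()) / math.comb(n, d)

print("i.i.d. model, largest atom: ", iid_atom)       # (1-p)^n
print("fixed support, largest atom:", fixed_atom)     # 1 / C(n, d)
```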
The above example provides only qualitative estimates and does not give information on the location of the atoms of the distribution of . The notion of the u-degree addresses this problem. The following theorem, which is the main result of Section 4, asserts that with a very large probability the normal vector to the (say, last) columns of our matrix is either very structured or has a very large u-degree, much greater than the critical value .
Theorem 2.2.
Let , , , and let . Then there are , and , depending on such that the following holds. Let , , and . Let be an increasing (growth) function satisfying
(8)
Assume that is an Bernoulli() random matrix. Then with probability at least one has
for all .
We would like to emphasize that the parameter in this theorem can take values less than one, in the regime when the matrix typically has null rows and columns. In this respect, the restriction in the main theorem comes from the treatment of structured vectors.
The proof of Theorem 2.2 is rather involved, and is based on a double counting argument and specially constructed lattice approximations of the normal vectors. We refer to Section 4 for details. Here, we only note that, by taking as a sufficiently large constant, the theorem implies the relation (4), hence, accomplishes the treatment of unstructured vectors.
2.2 Almost constant, steep and -vectors
In this subsection we discuss our treatment of the set of structured vectors, . In the proof we partition the set into several subsets and work with them separately. In a simplistic form, the structured vectors are dealt with in two ways: either by constructing discretizations and taking the union bound (variations of the –net argument), or via deterministic estimates in the case when there are very few very large components in the vector. We note here that the discretization procedure has to take into account the non-centeredness of our random matrix model: while in the case of centered matrices with i.i.d. components (and under appropriate moment conditions) the norm of the matrix is typically of order times the standard deviation of an entry, for our Bernoulli() model it has order (i.e., roughly times the standard deviation of an entry), which makes a direct application of the –net argument impossible. Fortunately, this large norm is attained only in one direction — the direction of the vector , while on the orthogonal complement of the typical norm is . Therefore it is enough to take a standard net in the Euclidean norm and to make it denser in that one direction, which almost does not affect the cardinality of the net. We refer to Section 3.6 for details.
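A hedged numerical check of the norm discussion above (parameter values are illustrative only):

```python
import numpy as np

# For an n x n Bernoulli(p) matrix the operator norm is of order p*n, attained
# essentially in the direction of the all-ones vector, while after projecting
# that direction out the norm drops to order sqrt(p*n).
rng = np.random.default_rng(0)
n, p = 500, 0.05
B = (rng.random((n, n)) < p).astype(float)

ones = np.ones(n) / np.sqrt(n)
P = np.eye(n) - np.outer(ones, ones)         # projection orthogonal to ones

print("||B||          ", np.linalg.norm(B, 2))       # roughly p*n
print("||B P||        ", np.linalg.norm(B @ P, 2))   # roughly C*sqrt(p*n)
print("p*n, sqrt(p*n) ", p * n, np.sqrt(p * n))
```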
Let us first describe our approach in the (simpler) case when , where is a small enough absolute constant and is a fixed parameter (independent of ). We introduce four auxiliary sets and show that the set of unit structured vectors, , is contained in the closure of their union.
The first set, , consists of unit vectors close to vectors of the canonical basis, specifically, unit vectors satisfying , where denotes the non-increasing rearrangement of the vector . For any such vector the individual bound is rather straightforward — conditioned on the event that there are no zero columns in our matrix , and that the Euclidean norms of the matrix rows are not too large, we get . This class is the main contributor to the bound for non-invertibility over the structured vectors .
For the other three sets we use anti-concentration probability estimates and discretizations. An application of Rogozin’s lemma (Proposition 3.9) implies that the probability of having a small inner product of a given row of our matrix with is small, provided that there is a subset such that the maximal coordinate of is bounded above by , where denotes the standard Euclidean norm and is the coordinate projection onto . Combined with the tensorization Lemma 3.8, this implies exponentially (in ) small probability of the event that is close to zero — see Proposition 3.10 below. Specifically, we define as the set of unit vectors satisfying the above condition with , that is, satisfying , and for we take all unit vectors satisfying the condition with , that is, satisfying , where is a permutation satisfying , . For vectors from these two sets we have very good individual probability estimates, but, unfortunately, the complexity of both sets is large — they do not admit nets of small cardinality. To overcome this issue, we have to redefine these sets by intersecting them with specially chosen sets of vectors having many almost equal coordinates. For the precise definition of such sets, denoted by , see Subsection 3.6. A set is a variant of the class of almost constant vectors (see (9) below), introduced to deal with general . Having a large part of the coordinates of a vector almost equal to each other reduces the complexity of the set, making it possible to construct a net of small cardinality. This resolves the problem and allows us to deal with these two classes of sets. The remaining class of vectors, , consists of vectors with , i.e., vectors whose two largest components are relatively big. For such vectors we produce the needed anti-concentration estimates for the matrix-vector products by using only those two components, i.e., we consider anti-concentration for the vector , where . Since the Rogozin lemma is not suitable for this case, we compute the anti-concentration directly in Proposition 3.11. As for the classes , we actually intersect the fourth class with appropriately chosen sets of almost constant vectors in order to control the cardinalities of the nets. The final step is to show that the set is contained in the union of the four sets described here. Careful analysis of this approach shows that the result can be proved with all constants and parameters depending only on . Thus, it works for being between the two constants and .
The case of small , that is, the case , requires a more sophisticated splitting of — we split it into steep vectors and -vectors. The definition and the treatment of steep vectors essentially follow [24, 27], with corresponding adjustments for our model. The set of steep vectors consists of vectors having a large jump between order statistics measured at certain indices. The first subclass of steep vectors, , is the same as the class described above — vectors having a very large maximal coordinate — and is treated as . Similarly to the case of constant , this class is the main contributor to the bound for non-invertibility over structured vectors. Next we fix a certain and consider a sequence , , , for some specially chosen parameters and depending on and . The class will be defined as the class of vectors such that there exists with . To work with vectors from this class, we first show that for a given the event that for every choice of two disjoint sets and , a random Bernoulli() matrix has a row with exactly one in components indexed by and no ’s among components indexed by , holds with a very high probability. Then, conditioned on this event, for every , we choose corresponding to , , and corresponding to , , and the corresponding row. Then the inner product of this row with will be large in absolute value due to the jump (see Lemma 6.9 for the details). Thus, conditioned on the described event, for every we have a good lower bound on . The next two classes of steep vectors, and , consist of vectors having a jump of order , namely, vectors in satisfy and vectors in satisfy , where and ( is the parameter from the definition of ). Trying to apply the same idea to these two subclasses, one sees that the size of the corresponding sets and is too large to have exactly one among a row’s components indexed by with a high probability. Therefore the proof of the individual probability bounds, as well as the construction of the corresponding nets for , is more delicate and technical. We discuss the details in Subsection 6.6.
The class of -vectors consists of non-steep vectors to which Rogozin’s lemma (Proposition 3.9) can be applied when we project a vector onto its smallest coordinates with ; thus vectors from this class satisfy for (we will take a union over all choices of an integer in the interval ). Thus, the individual probability bounds for -vectors will follow from Rogozin’s lemma together with the tensorization lemma, as for the classes , , described above. It remains to construct a good net for -vectors. For simplicity, dealing with such vectors, we fix the normalization . Since the vectors are non-steep, we have a certain control of the largest coordinates and, thus, of the Euclidean norm of a vector. The upper bound on is chosen in such a way that the cardinality of a net corresponding to the largest coordinates of a vector is relatively small (this part lies in an -dimensional subspace). For the purpose of constructing a net of small cardinality, we need to control the Euclidean norm of for an -vector. Therefore we split -vectors into level sets according to the value of . There will be two different types of level sets — vectors with relatively large Euclidean norm of and vectors with small . A net for level sets with large is easier to construct, since we can zero all coordinates starting with . If the Euclidean norm is small, we cannot do this, so we intersect this subclass with almost constant vectors (in fact we incorporate this intersection into the definition of -vectors), defined by
(9)
As in the case of constant , this essentially reduces the dimension corresponding to the almost constant part to one and therefore reduces the cardinality of a net. The rather technical construction of nets is presented in Subsection 6.3. In some aspects the construction follows ideas developed in [24].
3 Preliminaries
3.1 General notation
By universal or absolute constants we always mean numbers independent of all involved parameters, in particular independent of and . Given positive integers we denote sets and by and correspondingly. Having two functions and we write if there are two absolute positive constants and such that . As usual, denotes the permutation group on .
For every vector , by we denote the non-increasing rearrangement of the sequence and we fix one permutation satisfying , . We use for the standard inner product on , that is . Further, we write for the -norm of . We also denote .
3.2 Lower bound on the singularity probability
Here, we provide a simple argument showing that for the sequence of random Bernoulli() matrices , with satisfying as , we have
Our approach is similar to that applied in [5] in the related context.
Fix and write . Let be the indicator of the event that there is a zero row in the matrix , and, similarly, let be the indicator of the event that has a zero column. Then, obviously,
hence,
On the other hand,
implying
Therefore,
It remains to note that, with our assumption on the growth rate of , we have , which implies
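A hedged sketch of the computation behind this lower bound, in notation introduced here only for illustration: writing $R_i$ for the event that the $i$-th row of the matrix is zero and $C_j$ for the event that the $j$-th column is zero, each of these $2n$ events has probability $(1-p)^n$, and Bonferroni's inequality gives
\[
\mathbb{P}\Big\{\bigcup_{i} R_i \,\cup\, \bigcup_{j} C_j\Big\}
\;\ge\; 2n(1-p)^n \;-\; 2\binom{n}{2}(1-p)^{2n} \;-\; n^2(1-p)^{2n-1},
\]
since two distinct rows (or two distinct columns) are simultaneously zero with probability $(1-p)^{2n}$, while a row and a column, which share one entry, are simultaneously zero with probability $(1-p)^{2n-1}$. Whenever $n(1-p)^{n-1}\to 0$, the subtracted terms are of smaller order than the main term, so the probability of a zero row or column, and hence the singularity probability, is at least $(2-o(1))\,n(1-p)^n$.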
3.3 Gradual non-constant vectors
For any , we define as the set of all vectors in with . We will call these vectors -normalized. By a growth function we mean any non-decreasing function from into .
Let be an arbitrary growth function. We will say that a vector is gradual (with respect to the function ) if for all . Further, if satisfies
(10)
then we say that the vector is essentially non-constant or just non-constant (with parameters ). Recall that the set was defined in (2) as
Vectors from this set we call gradual non-constant vectors.
Recall that the set of structured vectors was defined in (2) as the complement of scalar multiples of . The next simple lemma will allow us to reduce analysis of to the treatment of the set .
Lemma 3.1.
For any choice of parameters , the set is contained in the closure of the set .
Proof.
Let be a unit vector such that for some . If then , where . If , we can consider a sequence of vectors in defined by for and . Let
Clearly, , so for all sufficiently large we have . Thus, for all large ,
whereas . This implies the desired result. ∎
We will need the following two lemmas. The first one states that vectors which do not satisfy (10) are almost constant (that is, have a large part of their coordinates nearly equal to each other). The second one is a simple combinatorial estimate, so we omit its proof.
Lemma 3.2.
Let , . Denote and and assume . Assume does not satisfy (10). Then there exist of cardinality and with such that for every .
Proof.
By denote the non-increasing rearrangement of (we would like to emphasize that we do not take absolute values). Note that there are two subsets with satisfying if and only if . Therefore, using that does not satisfy (10), we observe . Next consider the set
Then . Since we obtain that
Therefore, there exists an index such that . Taking , we observe that for every , . This completes the proof. ∎
Lemma 3.3.
For any there are , and depending only on with the following property. Let and let satisfy . Denote by the collection of sequences with and . Let be as in (7). Then for any pair of disjoint subsets of of cardinality at least each, one has
3.4 Auxiliary results for Bernoulli r.v. and random matrices
Let , and let be a Bernoulli random variable taking value with probability and with probability . We say that is a Bernoulli() random variable. A random matrix with i.i.d. entries distributed as will be called a Bernoulli() random matrix.
Here we provide four lemmas needed below. We start with notation for random matrices used throughout the paper. The class of all matrices having entries we denote by . We will consider a probability measure on induced by the distribution of an Bernoulli() random matrix. We will use the same notation for this probability measure; the parameter will always be clear from the context. Let . By we denote the -th row of , and by — the -th column, . By we always denote the operator norm of acting as an operator . This norm is also called the spectral norm and equals the largest singular value.
We will need the following form of Bennett’s inequality.
Lemma 3.4.
Let , , and be a Bernoulli() random variable. Let and , , be independent copies of . Define the function , . Then for every ,
In particular, for ,
and for , ,
Furthermore, for ,
Moreover, if and then denoting
we have .
Proof.
Recall that Bennett’s inequality states that for mean zero independent random variables , …, satisfying (for a certain fixed ) almost surely for , one has for every ,
where (see e.g. Theorem 1.2.1 on p. 28 in [8] or Exercise 2.2 on p. 11 in [11] or Theorem 2.9 in [6]). Take , , . Then for every , and are centered, , and . Applying the Bennett inequality with twice — to and , we observe the first inequality. To prove the second inequality, we take and use that is an increasing function satisfying on . The third inequality follows by taking and using .
For the “furthermore” part, we apply the third inequality with , to get
On the other hand, using ,
Since , , , and on , this implies
Finally, to get the last inequality, we take , then
Since under our assumptions, , the bound on follows by the union bound. ∎
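For the reader's convenience, one standard form of Bennett's inequality invoked in the proof above (the normalization may differ slightly from the cited sources): if $X_1,\dots,X_N$ are independent mean zero random variables with $|X_i|\le M$ almost surely and $\sigma^2=\sum_{i}\mathbb{E}X_i^2$, then for every $t>0$,
\[
\mathbb{P}\Big\{\Big|\sum_{i=1}^{N} X_i\Big| \ge t\Big\}
\;\le\; 2\exp\Big(-\frac{\sigma^2}{M^2}\, h\Big(\frac{Mt}{\sigma^2}\Big)\Big),
\qquad h(u) = (1+u)\ln(1+u) - u .
\]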
We need the following simple corollary of Bennett’s lemma.
Lemma 3.5.
For any there is with the following property. Let and satisfy and . Further, let be an Bernoulli() random matrix. Then with probability at least one has
Proof.
For each , let be the indicator of the event
By Lemma 3.4, . Since ’s are independent, by the Markov inequality,
The result follows. ∎
The following lemma provides a bound on the norm of a random Bernoulli matrix. It is similar to [5, Theorem 1.14], where the case of symmetric matrices was treated. For the sake of completeness we sketch its proof.
Lemma 3.6.
Let be large enough and . Let be a Bernoulli() random matrix. Then for every one has
In particular, taking ,
(11) |
Proof.
Given an random matrix with independent entries taking values in , we consider it as a vector in with . Then the Hilbert–Schmidt norm of is the standard Euclidean norm on . Let be any function in which is convex and is -Lipschitz with respect to the standard Euclidean norm. Then the Talagrand inequality (see e.g. Corollary 4.10 and Proposition 1.8 in [20]) gives that for every ,
We apply this inequality twice, first with the function to the matrix . At the end of this proof we show that . Therefore, taking with , we obtain the first bound. For the second bound, note that all entries of equal , hence . Thus, the second bound follows by the triangle inequality.
It remains to prove that . Recall that are the entries of . Let , be independent copies of and set . Denote by independent Rademacher random variables and by independent standard Gaussian random variables. We assume that all our variables are mutually independent and set . Since for every , is symmetric, it has the same distribution as and the same as . Then we have
Applying a result of Bandeira and Van Handel (see the beginning of Section 3.1 in [1]), we obtain
where
Note that are Bernoulli() random variables with . Therefore, using and applying the “moreover part” of Lemma 3.4, we obtain that with probability at most . Moreover, since , we have . Therefore,
∎
As an elementary corollary of the above lemma, we have the following statement where the restriction is removed.
Corollary 3.7.
For every and there is depending on with the following property. Let be large enough and let satisfy . Let be an Bernoulli() random matrix. Then
Proof.
Let , , and let be Bernoulli() matrix. Assuming that is sufficiently large, we get
Thus, the previous lemma is applicable, and we get
for some depending only on . Since the norm of a matrix is not less than the norm of any of its submatrices, and because any submatrix of is equidistributed with , we get the result. ∎
3.5 Anti-concentration
In this subsection we combine anti-concentration inequalities with the following tensorization lemma (see Lemma 3.2 in [48], Lemma 2.2 in [41] and Lemma 5.4 in [39]). We also provide Esseen’s lemma.
Lemma 3.8 (Tensorization lemma).
Let . Let be independent random variables. Assume that for all , . Then for every one has
Moreover, if there exists and such that for every and for all one has then there exists an absolute constant such that for every ,
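For orientation, one standard form of the tensorization lemma (cf. [41, Lemma 2.2]; the version stated above may differ in the constants and normalization): if $\zeta_1,\dots,\zeta_n$ are independent non-negative random variables and $K,\varepsilon_0>0$ are such that $\mathbb{P}\{\zeta_k<\varepsilon\}\le K\varepsilon$ for all $\varepsilon\ge\varepsilon_0$ and all $k$, then
\[
\mathbb{P}\Big\{\sum_{k=1}^{n}\zeta_k^2 < \varepsilon^2 n\Big\} \;\le\; (C K \varepsilon)^n
\qquad\text{for all } \varepsilon\ge\varepsilon_0,
\]
where $C>0$ is a universal constant.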
Recall that for a real-valued random variable its Lévy concentration function is defined as
We will need bounds on the Lévy concentration function of sums of independent random variables. Such inequalities were investigated in many works, starting with Lévy, Doeblin, Kolmogorov, and Rogozin. We quote here a result due to Kesten [17], who improved Rogozin’s estimate [38].
Proposition 3.9.
Let be independent random variables and satisfy . Then there exists an absolute positive constant such that
This proposition together with Lemma 3.8 immediately implies the following consequence, in which, given and , denotes coordinate projection of on .
Proposition 3.10.
There exists an absolute constant such that the following holds. Let . Let be a Bernoulli() random variable. Let , , and , , be independent copies of . Let . Let and be such that . Then
Moreover, if then
Proof.
We start with the “moreover” part. Assume . Let . Clearly, for every , . Proposition 3.9 implies that for every satisfying one has
Choosing and (note that the assumption on ensures that for all ) we obtain the “moreover” part.
Now apply Lemma 3.8 with , , , . We have
This implies the bound under assumption , which can be removed by normalizing . ∎
We will also need the following combination of a simple anti-concentration fact with Lemma 3.8.
Proposition 3.11.
Let and . Let be a Bernoulli() random variable. Let , , and , , be independent copies of . Let . Let be such that . Then
Proof.
Without loss of generality we assume that and . Note that takes value in with probability and in with probability . Since the distance between and is and since , the first inequality follows.
Now apply Lemma 3.8 with , , , . We note that then and therefore we have
which completes the proof. ∎
Lemma 3.12 (Esseen).
There exists an absolute constant such that the following holds. Let , be independent random variables. Then for every ,
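For the reader's convenience, one standard formulation of Esseen's inequality (the exact constant and normalization in the paper's statement may differ): for a real-valued random variable $S$ and every $\lambda>0$,
\[
\sup_{x\in\mathbb{R}} \mathbb{P}\{x \le S \le x+\lambda\}
\;\le\; C\,\lambda \int_{-1/\lambda}^{1/\lambda} \big|\mathbb{E}\,e^{\mathrm{i} t S}\big|\, dt
\]
with a universal constant $C>0$; applied to $S=\sum_k \xi_k$ with independent $\xi_k$, the characteristic function inside the integral factors over the summands.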
3.6 Net argument
Here we discuss the special nets that will be used and the corresponding approximations. We fix the following notation. Let be the unit vector in the direction of . Let be the projection on and be the projection on , that is . Similarly, for , let be the projection on and be the projection on . Recall that for , the permutation satisfies , . Define a (non-linear) operator by — the coordinate projection on , where , in other words, annihilates the largest coordinate of a vector. Consider the triple norm on defined by
(note that ). We will use the following notion of shifted sparse vectors (recall here that is the permutation responsible for the non-increasing rearrangement). Given and a parameter , define
Further, given another parameter , define the set
Lemma 3.13.
Let and . Then there exists an -net in with respect to of cardinality at most
Proof.
Denote . For each let be a set from the definition of (if the choice of is not unique, we fix one of them).
Fix of cardinality . We first consider vectors satisfying . Fix and denote
(thus on ). We now construct a net for . It will be obtained as the sum of four nets, where the first one deals with just one coordinate, , “killing” the maximal coordinate; the second one deals with non-constant part of the vector, consisting of at most coordinates (excluding ); the third one deals with almost constant coordinates (corresponding to ); and the fourth net deals with the direction of the constant vector. This way, three of our four nets are -dimensional. Let be the coordinate projection onto , where . Note that the definition of implies that for every . Let, as before, be the projection onto .
Let be an -net in of cardinality at most . Let be an -net (with respect to the Euclidean metric) in of cardinality at most
Further, let be an -net in the segment with cardinality at most . Then by the construction of the nets and by the definition of for every there exist , , such that for ,
in particular, . Finally, let be an -net in the segment with cardinality at most . Then for every there exists as above and with
Thus the set is an -net for with respect to and its cardinality is bounded by
Taking the union of such nets over all choices of and all , we obtain an -net in for of the desired cardinality. Using a standard argument, we pass to an -net for . ∎
Later we apply Lemma 3.13 with the following proposition.
Proposition 3.14.
Let be large enough and , and . Denote
Then for every satisfying and every one has
Proof.
Let . Then, by the definition of the triple norm, . Clearly,
Therefore, using that , we get
Since and , using again that , we observe that
The proposition follows by the triangle inequality. ∎
4 Unstructured vectors
The goal of this section is to prove Theorem 2.2.
Recall that given growth function and parameters , the set of vectors was defined in (2). In the next two sections (dealing with invertibility over structured vectors) we work with two different growth functions; one will be applied to the case of constant and the other one (giving a worse final estimate) is suitable in the general case. For this reason, and to increase flexibility of our argument, rather than fixing a specific growth function here, we will work with an arbitrary non-decreasing function satisfying the additional assumption (8) with a “global” parameter .
4.1 Degree of unstructuredness: definition and basic properties
Below, for any non-empty finite integer subset , we denote by a random variable uniformly distributed on . Additionally, for any , we fix a smooth version of . More precisely, let us fix a function satisfying
• The function is twice continuously differentiable, with and ;
• for all ;
• for all ;
• for all .
In what follows, we view the maximum of the second derivative of as a function of (the nature of this function is completely irrelevant as we do not attempt to track magnitudes of constants involved in our arguments).
Fix an integer and an integer . Recall that given a vector and parameters , the degree of unstructuredness (u-degree) of was defined in (6). The quantity will serve as a measure of unstructuredness of the vector and is similar in spirit to the notion of the essential least common denominator introduced earlier by Rudelson and Vershynin [41]. Here unstructuredness refers to the uniformity in the locations of components of on the real line. The larger the degree is, the better the anti-concentration properties of an associated random linear combination are. The functions employed in the definition will be important when discussing certain stability properties of .
We start with a proof of Theorem 2.1 which connects the definition of the u-degree with anti-concentration properties.
Proof of Theorem 2.1.
For any sequence of disjoint subsets of of cardinality each, set
Note that each point of the probability space belongs to the same number of events from the collection , therefore, for defined in (7) we have for any and ,
(12)
Further, conditioned on an event , the random sum is equidistributed with (where we assume that are jointly independent with ). On the other hand, applying Lemma 3.12, we observe that for every ,
for a universal constant . Combining this with (12), we get for every ,
Setting , where , we obtain
in view of the definition of . The result follows. ∎
Corollary 4.1.
Let , let be integers with for all , and let . Further, let , and let be an random matrix with independent rows such that the -th row is uniformly distributed on the set of vectors with ones and zeros. Then for any non-random vector we have
The parameter , which did not participate in any way in the proof of Theorem 2.1, is needed to guarantee a certain stability property of . We would like to emphasize that the use of the functions is a technical element of the argument.
Proposition 4.2 (Stability of the u-degree).
For any there are depending only on with the following property. Let , , , , and assume that . Then there is a vector such that , and such that
To prove the proposition we need two auxiliary lemmas.
Lemma 4.3.
Let , and let be a random vector in with and with everywhere on the probability space. Then
Proof.
We can view both and as vectors in , and can assume without loss of generality that , with . Then and
Hence,
which implies the desired estimate. ∎
Lemma 4.4.
Let , and let be a random variable in with and with everywhere on the probability space. Then for any we have
Proof.
Denote . Then and . Therefore, using that and for every , we obtain
∎
Proof of Proposition 4.2.
To prove the proposition, we will use randomized rounding, which is a well-known notion in computer science and was recently applied in the random matrix context in [30] (see also [48, 31]). Define a random vector in with independent components such that each component has distribution
Then , and, deterministically, .
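A minimal hedged Python sketch of the generic randomized rounding technique referred to above (the scaling and lattice used in the actual proof may differ):

```python
import numpy as np

# Each coordinate of y is rounded independently to floor(y_i) or ceil(y_i),
# choosing the probabilities so that the rounding is unbiased; the result
# differs from y by less than 1 in every coordinate, deterministically.
rng = np.random.default_rng(0)

def randomized_round(y):
    lo = np.floor(y)
    frac = y - lo                           # P(round up) = fractional part
    return lo + (rng.random(y.shape) < frac)

y = 5 * rng.standard_normal(10)
z = randomized_round(y)
print(np.max(np.abs(z - y)) < 1.0)          # True: coordinatewise error < 1
print(np.allclose(y, np.mean([randomized_round(y) for _ in range(20000)], axis=0),
                  atol=0.05))               # unbiasedness, approximately
```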
Fix for a moment a number and a subset of cardinality . Our intermediate goal is to estimate the quantity
Denote
and consider two cases.
Case 1. . Using that for every , we observe that deterministically
(13)
Therefore, by the definition of the function , in this case we have on the entire probability space
Case 2. . Set
Then and, using again , we see that everywhere. By Lemma 4.4, , in particular, . Therefore we may apply Lemma 4.3, to obtain
This implies,
(14)
To convert the last relation to estimating , we will use the assumption that the second derivative of is uniformly bounded. Applying Taylor’s expansion around the point , we get
for some which may only depend on . Here, denotes the essential supremum of the random variable, and is bounded above by by (13). Together with (14) and with , this gives
where depends only on .
Since , in both cases we obtain for some depending only on ,
Using this inequality together with definition of , integrating over , and summing over all choices of disjoint subsets of cardinality , for every we get the relation
where are constants that may only depend on . Using independence of the components of , we can take the expectation with respect to out of the integral.
Given a vector and , denote
The above relation implies that there are two (non-random) realizations and of such that for
and
Using properties of the function , we note that for any two non-random vectors and in the range of such that they differ on a single coordinate, one has . Consider a path from to consisting of a sequence of non-random vectors in the range of such that each adjacent pair differs on a single coordinate and let
If , take . Otherwise, let . Then take and note . Thus the vector is in the range of and
Making substitutions , in the integrals in , and assuming that (in this case the condition is satisfied), we can rewrite the last inequalities as
The result follows by the definition of . ∎
The last statement to be considered in this subsection asserts that the u-degree of any vector from is at least of order .
Proposition 4.5 (Lower bound on the u-degree).
For any there is depending only on with the following property. Let , , and let . Then
Lemma 4.6.
For any and there is a constant depending only on and with the following property. Let be a finite subset of , and let be a real vector (indexed by ). Assume further that are two disjoint subsets of , each of cardinality at least such that . Let and be a function on defined by
Then for every one has
Proof.
Clearly we may assume that . Denote and
Let be of cardinality and such that for all with one has and . Then for all ,
Further, take any and observe that for each , since , we have
where may only depend on . This implies that
On the other hand, whenever is such that for at most pairs , we have
whence . Taking we obtain the desired result with . ∎
Proof of Proposition 4.5.
Let be defined as in (7) and , , be from Lemma 3.3. We assume that and . For every denote
Further, let subsets and be taken from the definition of non-constant vectors applied to . Then by Lemma 3.3 and since ,
where is the set of all sequences such that is the subset of such that
(15)
Take any and denote . Without loss of generality we assume that (15) holds for all . Applying Lemma 4.6 with and , we get for all and ,
where depends only on and . This estimate implies that for ,
where denotes the Beta-function and is a constant depending only on and . Applying Hölder’s inequality, we obtain
which implies the desired result. ∎
4.2 No moderately unstructured normal vectors
Let be an Bernoulli() random matrix. For each , denote by the span of columns , . The goal of this subsection is to prove Theorem 2.2, which asserts that under appropriate restrictions on and with a very large probability (say, at least ), the subspace is either structured or very unstructured. The main ingredient of the proof — Proposition 4.9 — will be considered in the next subsection. Here, we will only state the proposition to be used as a black box, and for this we need to introduce an additional product structure, which, in a sense, replaces the set .
Fix a permutation , two disjoint subsets of cardinality each, and a number such that
(16)
Define the sets by
(17)
In what follows, we adopt the convention that whenever does not satisfy (16).
Lemma 4.7.
There exists an absolute constant such that for every there is a subset of cardinality at most with the following property. For any two partitions and of with , , there is such that , .
This lemma immediately follows from the fact that the total number of partitions of satisfying , , is exponential in (one can take ). Using Lemma 4.7, we provide an efficient approximation of .
Lemma 4.8.
Proof.
Let , and assume that satisfies . Then, by the definition of , there exist sets , each of cardinality , satisfying
Then hence we can find a number such that
By the definition of we also have for all . By the definition of , we can find a permutation such that
Clearly for such a permutation we have for every . Using (8), we obtain
Thus
Since for some , this implies the desired result. ∎
The following statement, together with Theorem 2.1 and Proposition 4.2, is the main ingredient of the proof of Theorem 2.2.
Proposition 4.9.
Let , and let the growth function satisfy (8). There exist , , and with the following property. Let , , and let be such that . Let , , with , , and let be a random vector uniformly distributed on . Then
Let us describe the proof of Theorem 2.2 informally. Assume that the hyperplane admits a normal vector which belongs to . We need to show that with a large probability the u-degree of is very large, say, at least for a small . The idea is to split the collection into about subsets according to the magnitude of the u-degree (that is, each subset will have a form for an appropriate ). To show that for each the probability of is very small, we define a discrete approximation of consisting of all vectors such that for some and additionally, in view of Proposition 4.2, and . We can bound the cardinality of such set by , for a small , by combining Proposition 4.9 with Lemma 4.8 and with the following simple fact.
Lemma 4.10.
Let , , , with , and satisfies (8) with some . Then , where depends only on .
On the other hand, for each fixed vector in the set we can estimate the probability that it “approximates” a normal vector to by using Corollary 4.1:
for some constant . Taking the union bound, we obtain
Below, we make this argument rigorous.
Proof of Theorem 2.2.
We start by defining parameters. We always assume that is large enough, so all statements used below work for our . Fix any , and , and set . Let . Note that the function is a growth function that satisfies condition (8) with parameter . In particular, choosing so that , we have
For brevity, we denote
Set
and
We will assume that is sufficiently large so that
Moreover, we will assume that
and (18)
and
Define two auxiliary random objects as follows. Set
and let be a random vector measurable with respect to and such that
• ;
• and ;
• .
(Note that may have dimension larger than one with non-zero probability, and thus is not uniquely defined). Note that to prove the theorem, it is sufficient to show that with probability at least one has either or .
Next, we denote
Then, proving the theorem amounts to showing that with probability at most .
We say that a collection of indices is admissible if and . For admissible sets consider disjoint collection of events defined by
Further, denote
According to Corollary 3.7, , while by Lemma 3.5 and (18),
Denote by the collection of all admissible satisfying . Then for , we have , and, using that events are disjoint,
Hence,
Therefore, to prove the theorem it is sufficient to show that for any ,
Fix an admissible , denote by the matrix obtained by transposing columns , , and let be the non-random matrix with all elements equal to . Note that, in view of our definition of , the assumptions on and Proposition 4.5, we have a deterministic relation
everywhere on the probability space. For each real number , denote by the event
Splitting the interval into subintervals, we observe that it is sufficient to show that for every we have
The rest of the argument is devoted to estimating probability of for fixed and fixed . Set . Let be a (random) integer such that
Since on we have , applying Proposition 4.2, we can construct a random vector having the following properties:
• everywhere on ,
• everywhere on ,
• for all and everywhere on .
The first condition together with the inclusion implies that
Using that and that , we observe that there is a random number such that everywhere on one has
Let be a subset of
consisting of all vectors such that
• for some ;
• for all .
Note that by Lemma 4.8 the entire range of on falls into .
Combining the above observations,
whence, using that by the definition of ,
To estimate the last probability, we apply Corollary 4.1 with (note that , , and that satisfies the assumption of the corollary). We obtain that for all admissible and ,
On the other hand, the cardinality of can be estimated by combining Lemma 4.10, Lemma 4.7 and Proposition 4.9 (note that our choice of parameters guarantees applicability of these statements):
where . Thus, using our choice of parameters and assuming in addition that
by our choice of parameters. The result follows. ∎
4.3 Anti-concentration on a lattice
The goal of this subsection is to prove Proposition 4.9. Thus, in this subsection, we fix , a growth function satisfying (8), which in particular means that , a permutation , a number , two sets such that , and we do not repeat these assumptions in lemmas below. We also always use short notation for the set defined in (17).
We start with auxiliary probabilistic statements which are just special forms of Markov’s inequality.
Lemma 4.11 (Integral form of Markov’s inequality, I).
For each , let be a non-negative random variable with a.e. Assume that the random function is integrable on with probability one. Assume further that for some integrable function and some we have
for all . Then for all ,
Proof.
Consider a random set
Since for any , we have . Therefore, by the Markov inequality, for all . The result follows by noting that
∎
Lemma 4.12 (Integral form of Markov’s inequality, II).
Let be a finite set, and for each , let be a non-negative random variable with a.e. Assume further that for some and some we have
for all . Then for all ,
Our next statement will be important in an approximation (discretization) argument used later in the proof.
Lemma 4.13 (Lipschitzness of the product ).
Let and set . Further, let be some non-empty subsets of . For denote
Then (viewed as a function of ) is -Lipschitz.
Proof.
By our definition, is -Lipschitz for any , hence (viewed as a function of ) is -Lipschitz. Since , by the definition of the function , we have , hence, for all ,
Taking the product, we obtain that
whenever . This, together with the bound implies for all ,
which completes the proof. ∎
In the next two lemmas we initiate the study of random variables , more specifically, we will be interested in the property that, under appropriate assumptions on ’s, the sum of such variables is close to zero on average.
Lemma 4.14.
Let , , . Let be an integer interval and let be real numbers such that for all ,
Then
Proof.
We will determine the restrictions on parameter at the end of the proof. We have
(19)
Further, denoting and , we observe for any ,
In view of assumptions on
Therefore,
Using (19), we complete the proof. ∎
Lemma 4.15.
For every there are and , , with the following property. Let , , let () be integer intervals, and let be real numbers such that , and for all and . Then, assuming that random variables , , are mutually independent, one has
Proof.
Fix any , and set . Set and . Assume that , and let numbers and integer intervals satisfy the assumptions of the lemma. Denote the event
by , and additionally, for any subset of cardinality and any vector , set
It is not difficult to see that
whence it is sufficient to show that for any admissible ,
(20)
Without loss of generality, we can consider . Event is contained inside the event
while the latter is contained inside the event
Thus, taking the union over all admissible choices of indices , we get
To estimate the last probability, we apply Markov’s inequality, together with the bound for the second moment from Lemma 4.14 (applied with ), and using independence of , . We then get
In view of (20) this implies the result, since using and , we have
∎
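The final probabilistic step in the proof above is the second-moment form of Markov's inequality: for a real random variable $S$ and any $t>0$ (generic notation, not matching the lemma verbatim),
\[
\mathbb{P}\{|S|\ge t\}\;\le\;\frac{\mathbb{E}\,S^2}{t^2},
\]
with the second moment controlled via Lemma 4.14 and the independence of the summands.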
Our next step is to show that for the vector uniformly distributed on the random product is, in a certain sense, typically small (for most choices of ). To do this we first show that given a collection of distinct numbers which are pairwise well separated, the above product is small for at least one with very high probability.
Lemma 4.16.
For any there are and with the following property. Let be with . Let , be a random vector uniformly distributed on , and let be real numbers in such that for all . Fix disjoint subsets of , each of cardinality . Then
Proof.
Fix any and set and . Assume that . Note that, by our definition of , the coordinates of are independent and, moreover, each variable is distributed on an integer interval of cardinality at least . Thus, it is sufficient to prove that for any collection of integer intervals , , such that , the event
has probability at most , where, as usual, we assume that the variables , , , are jointly independent. Observe that, as for all , the event is contained inside the event
where , , . Denoting if and otherwise and using a simple counting argument for the matrix , we obtain that
To estimate we use Lemma 4.15 with . Note that , and that by our choice of , for any we have , while . Thus,
Hence,
which completes the proof. ∎
Lemma 4.17 (Very small product everywhere except for a set of measure ).
For any there are , and with the following property. Let , , , , and . Let be a random vector uniformly distributed on . Fix disjoint subsets of , each of cardinality . Then
Proof.
Fix any , and define , , , and .
Assume that the parameters and satisfy the assumptions of the lemma. In particular, we assume that is large enough so that and . Denote
Let . By Lemma 4.16 for any collection of points from satisfying for all , we have
Taking the union bound over all possible choices of from , we get
(21)
Further, in view of Lemma 4.13, for any realization of ’s the product
viewed as a function of , is -Lipschitz. This implies that for any pair , satisfying , we have
Moreover, for any collection of numbers from satisfying for all there are numbers with for all (we used also ). This, together with (21), yields
The event whose probability is estimated above, clearly contains the event in question —
This, and our choice of parameters, implies the result. ∎
Lemma 4.18 (Moderately small product for almost all ).
For any and there are , , and with the following property. Let , , , and . Let be a random vector uniformly distributed on . Fix disjoint subsets of , each of cardinality . Then
Proof.
The parameter will be chosen later. Fix any . Assume . For denote
Observe that by the definition of for each we have , provided . Next note that if for some complex unit numbers their average has length then, taking the unit complex number satisfying we have
therefore there are at least indices such that . This in turn implies that there exists an index such that there are at least indices with . Thus the event is contained in the event
To estimate the probability of the latter event, we take the union bound over all choices of indices from , and over all choices of . We then get
To estimate the probability under maximum we use the definition of and independence of coordinates of the vector . Note that for each fixed there is an integer interval of the length at least such that is uniformly distributed on . Therefore, fixing a realization , , we need to count how many are such that is close to an integer. This can be done by splitting into subintervals of length and considering cases , (this case can be empty), and . This leads to the following bound with an absolute constant ,
Using this estimate and the fact that for (so, each ), we obtain
The last step of the proof is somewhat similar to the one used in the proof of Lemma 4.17 — we discretize the interval and use the Lipschitzness . Recall that and thus, by Lemma 4.13, is -Lipschitz. Let
Then for any satisfying we have deterministically. This implies that
provided that for a sufficiently small universal constant . ∎
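The discretization step used at the end of the last two proofs relies on the following standard observation, stated here with generic notation: if $f$ is an $L$-Lipschitz function on an interval $I$ and $\mathcal N\subset I$ is a $\delta$-net, then
\[
\sup_{t\in I}f(t)\;\le\;\max_{s\in\mathcal N}f(s)+L\delta,
\]
so a uniform bound over the whole interval follows from a bound on a finite net, at the cost of an additive error $L\delta$ which is negligible for our choice of the mesh.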
Lemma 4.19.
Let , , , , . Let be independent random variables, with uniformly distributed on and uniformly distributed on . Then for every one has
Proof.
Clearly, it is enough to consider only. Note that
We consider two cases.
Case 1. and . In this case, deterministically, , therefore, using that on , we have for every ,
Case 2. Either or . Without loss of generality, we will assume the first inequality holds. We condition on a realization of (further in the proof, we compute conditional probabilities given ). For any , the event
is contained inside the event
On the other hand, since is uniformly distributed on a set , for some , the probability of the last event is less than . The result follows.
∎
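The anti-concentration mechanism in Case 2 is the following elementary fact, which we record for orientation (with our own notation: $\xi$ is uniformly distributed on an integer interval of cardinality $N$, and $\lambda\ge 0$):
\[
\sup_{a\in\mathbb R}\,\mathbb{P}\bigl\{|\xi-a|\le\lambda\bigr\}\;\le\;\frac{2\lambda+1}{N},
\]
since any interval of length $2\lambda$ contains at most $2\lambda+1$ integers.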
Lemma 4.20 (Integration for small ).
For any , and there are , , and with the following property. Let be defined as in (7), , , with and , and let be a random vector uniformly distributed on . Then for every ,
where the sum is taken over all disjoint subsets of cardinality each.
Proof.
Let , and be as in Lemma 3.3. For a given choice of subsets , denote
(note that functions , , depend on the choice of subsets ).
First, we study the distribution of the variable for a given choice of subsets . We assume that and . We also denote and
Fix a sequence , and let be a subset of cardinality such that
For any , , and by Lemma 4.19 we have for ,
Within , we can find at least disjoint pairs of indices satisfying the above condition. Let be a set of such pairs with . Using the independence of coordinates of , and denoting , we obtain for every ,
Applying this for all together with observations and (when ), we conclude that for every ,
At the next step, we apply Lemma 4.11 with to obtain from the previous relation
Next we apply Lemma 4.12 with , (recall that depends also on the choice of ). We obtain
Further, since by Lemma 3.3 we have and since , we observe that
deterministically. Recalling that , we obtain
for some depending only on and , provided that . The result follows by the substitution in the integral. ∎
Proof of Proposition 4.9.
As we mentioned at the beginning of this subsection, we fix , a growth function satisfying (8), a permutation , a number , two sets such that , and we use for the set defined in (17). We also fix .
We start by selecting the parameters. Assume that is large enough. Set . Let be taken from Lemma 4.18. Set . Fix an integer satisfying the condition , and take . Let be defined as in (7). We assume that is chosen in such a way that the set is non-empty. As before denotes the random vector uniformly distributed on . Let be as in Lemma 3.3. For a given choice of subsets , denote
We have
In view of Lemma 4.20, with probability at least the first summand is bounded above by . To estimate the second summand, we combine Lemmas 4.17 and 4.18 (we assume that as otherwise there is no second summand). Fix for a moment a collection . By Lemma 4.17, with probability at least the function on is bounded above by for all points outside of some set of measure at most (note that we apply variable transformation to use the lemma here). Further, by Lemma 4.18, with probability at least we have that is bounded above by for all . Thus, with probability at least ,
Applying Lemma 4.12 with and , we obtain that
with probability at least . Thus, taking , we obtain
∎
5 Complement of gradual non-constant vectors: constant
In this section, we study the problem of invertibility of the Bernoulli() matrix over the set defined by (2) in the case when the parameter is a small constant. This setting turns out to be much simpler than the treatment of the general case given in the next section. Although the results of Section 6 essentially absorb the statements of this section, we prefer to include the analysis of the constant in our work, first, because it provides a short and relatively simple illustration of our method and, second, because the estimates obtained here allow us to derive better quantitative bounds for the smallest singular value of .
5.1 Splitting of and main statements
We define the following four classes of vectors . For simplicity, we normalize vectors with respect to the Euclidean norm. The first class is the set of vectors with one coordinate much larger than the others, namely,
For the next sets we fix a parameter , where is the absolute constant from Proposition 3.10. Recall also that the operator (which annihilates the maximal coordinate of a given vector) and the set were introduced in Subsection 3.6. We also fix a small enough absolute positive constant . We do not try to compute the actual value of ; the conditions on how small is can be obtained from the proofs. We further fix an integer .
The second class of vectors consists of those vectors for which the Euclidean norm dominates the maximal coordinate. To control cardinalities of nets (discretizations) we intersect this class with ; specifically, we set
The next set is similar to , but instead of comparing with the Euclidean norm of the entire vector, we compare with . For a technical reason, we need to control the magnitude of precisely; thus we partition the third set into subsets. Let numbers , , be defined by
(22)
Clearly, . Then for each we define
To explain the choice of , note that if and , then . Thus, if in addition , then . We set
The fourth set covers the remaining options for vectors having a large almost constant part. Let numbers , , be defined by
(23)
Clearly, . Then for each define the set as
Note that if and , then , justifying the choice of . We set
Finally define as the union of these four classes,
In this section we prove the following two theorems.
Theorem 5.1.
There exist positive absolute constants such that the following holds. Let be large enough, let , and . Let be an Bernoulli() random matrix. Then
where the set is defined above.
Recall that the set was introduced in Subsection 3.3. The next theorem shows that, after a proper normalization, the complement of (taken in ) is contained in for some choice of and for the growth function (clearly, satisfying (8)).
Theorem 5.2.
There exists an absolute (small) positive constant such that the following holds. Let be a parameter. Then there exist , such that for , , , , and one has
5.2 Proof of Theorem 5.1
Theorem 5.1 is a consequence of four lemmas that we prove in this section. Each lemma treats one of the classes , , and Theorem 5.1 follows by the union bound. Recall that was introduced in Subsection 3.6 and that given , we fixed one permutation, , such that for . Recall also that the event was introduced in Proposition 3.14.
Lemma 5.3.
Let and . Let (with ) be the event introduced in Lemma 3.4 and by denote the subset of matrices with no zero columns. Then for every and every ,
In particular,
Proof.
Let , be entries of . Let . Denote, . Since , there exists such that . Then
Using that we observe that . Thus,
The trivial bound completes the first estimate. The “in particular” part follows by the “moreover” part of Lemma 3.4 and since . ∎
Lemma 5.4.
There exists a (small) absolute positive constant such that the following holds. Let be large enough and . Let and be Bernoulli() matrix. Then
Proof.
By Lemma 3.13 for there exists an –net in with respect to the triple norm , with cardinality at most
Since , by a standard “projection” trick, we can obtain from it an –net in of the same cardinality. Let . Let be such that . Since on we have , Proposition 3.10 implies that with probability at least ,
(24)
Further, in view of Proposition 3.14, conditioned on (24) and on , we have
where we have chosen . Using the union bound and our choice of , we obtain that
for sufficiently large and provided that and for small enough absolute positive constant . This completes the proof. ∎
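Schematically, the computation in the proof above (and in the two lemmas that follow) is the standard net argument: if $\mathcal N$ is an $\varepsilon$-net for the vector class under consideration, $p_0$ is a uniform upper bound on the individual small ball probability over the net, and the approximation error is absorbed using the norm bound, then, in simplified notation of ours,
\[
\mathbb{P}\bigl\{\exists\,x\ \text{in the class}:\ \|Ax\|_2\ \text{small}\bigr\}
\;\le\;|\mathcal N|\,p_0+\mathbb{P}\bigl\{\text{norm bound fails}\bigr\},
\]
and the matter reduces to checking that the cardinality estimate of Lemma 3.13 is beaten by the individual probability bound coming from Proposition 3.14.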
Remark 5.5.
Note that we used Proposition 3.10 with the set . In this case we could use a slightly easier construction for nets than the one in Lemma 3.13 — we do not need to distinguish the first coordinate in the net construction; in other words, we could have only one special direction, not two. However, this would not lead to a better estimate, and in the remaining lemmas we will need the full strength of our construction.
Next we treat the case of vectors in . The proof is similar to the proof of Lemma 5.4, but we need to remove the maximal coordinate and to deal with the remaining part of the vector. Recall that the operator serves this purpose.
Lemma 5.6.
There exists a (small) absolute positive constant such that the following holds. Let be large enough, and , . Let be a random Bernoulli() matrix. Then
Proof.
Fix . By Lemma 3.13 for there exists an –net in with respect to , with cardinality at most
Again using a “projection” trick, we can construct an –net in of the same cardinality. Let . Let be such that . Since on we have , Proposition 3.10 applied with implies that with probability at least ,
Conditioned on the above inequality and on the event , Proposition 3.14 implies that
where we have chosen . Using the union bound, our choice of and , we obtain that
for large enough and for , where is a small enough absolute constant (we also assume ). Since and , we obtain
This completes the proof. ∎
Finally we treat the case of vectors in .
Lemma 5.7.
There exists a (small) absolute positive constant such that the following holds. Let be large enough and let , . Let be a Bernoulli() random matrix. Then
Proof.
Fix . By Lemma 3.13 for there exists an –net in
with respect to with cardinality at most
By the projection trick, we get an –net in .
Let . Let be such that . Since on we have , Proposition 3.11 implies that with probability at least ,
Conditioned on the above and on , Proposition 3.14 implies that
where we have chosen
provided that . Using the union bound and our choice of we obtain that
for large enough and for , where is a small enough absolute constant. Since and , we obtain
This completes the proof. ∎
5.3 Proof of Theorem 5.2
Proof.
We prove the statement with , where is the constant from Theorem 5.1, and . Note that under our choice of parameters (and assuming is small), .
Assume that . Denote by the non-increasing rearrangement of (we would like to emphasize that we do not take absolute values). Note that for any there are two subsets with satisfying if and only if . This leads to the following two cases.
Case 1. . Since , in this case there exists an index with . Note that since , we have .
Subcase 1a. . Since we get
Therefore,
Now let . Then
(25)
Our goal is to show that (with ).
If , we are done.
Otherwise, if , then (25) implies that , that is, there are at least coordinates at distance at most from zero. Thus and hence .
If and , then necessarily for some , where are defined according to (22). Then (25) implies that , that is, there are at least coordinates at distance at most from zero. Thus and hence .
Subcase 1b. . In this case . Assume , that is . Then
We can now define and, having noted that , proceed similarly to Subcase 1a. We will need to use the condition , which holds for small enough .
Case 2. . Let be a permutation of such that , (note that is in general different from the permutation defined in connection with the non-increasing rearrangement of the absolute values ). Define the following set, which will play the role of the set in the definition of (see Subsection 3.6),
Then , and . Since , we observe that either or (or both). Moreover, since , we necessarily have that for some . Therefore, there exists an index such that . Taking , we observe that for every , . On the other hand we have
Now let . Then
(26)
The end of the proof is similar to the end of the proof of Case 1. If , we are done. If , then using (26), , and we obtain that and, thus, . If , , and then, using (26) and we obtain that and, thus, . If , , and then, similarly, using (26) and , we obtain that and, thus, . This completes the proof. ∎
6 Complement of gradual non-constant vectors: general case
We split into two classes of vectors. The first class, the class of steep vectors , is constructed in essentially the same way as in [24] and [27]. The proof of the bound for this class resembles the corresponding proofs in [24] and [27]; however, due to the differences in the models of randomness, there are important modifications. The second class , which we call -vectors, will consist of vectors to which Proposition 3.10 can be applied; therefore dealing with this class is simpler. To control the cardinality of nets, part of this class will be intersected with the almost constant vectors. Then we show that the complement of in is contained in .
We now introduce the following parameters, which will be used throughout this section. It will be convenient to denote . We always assume that and is large enough (that is, larger than a certain positive absolute constant). We also always assume that the “average degree” . Fix a sufficiently small absolute positive constant and a sufficiently large absolute positive constant (we do not try to estimate the actual values of and ; the conditions on how small can be extracted from the proofs; in particular, the condition on comes from (38)). We also fix two positive integers and such that
(27)
Note that and that implies .
For we set
Then, in the case we set . Otherwise, let . Note that with this definition we always have . The indices , , are global parameters which will be used throughout the section. Below we provide the proof only for the case ; the other case is treated similarly (in particular, in that other case the set defined below will be empty).
We also will use another parameter,
(28)
Note that the function is a decreasing function on , therefore for and sufficiently large we have . Moreover, it is easy to see that if , then . We also notice that if for some then and, using the definition of and ,
(29)
6.1 Two classes of vectors and main results
We first introduce the class of steep vectors. It will be constructed as a union of four subclasses. Set
Then for ,
Finally, for set and define
The set of steep vectors is .
The “rules” of the partition are summarized in the diagram.
For this class we prove the following bound.
Theorem 6.1.
There exist positive absolute constants such that the following holds. Let , and let satisfy . Let be a Bernoulli() random matrix and denote
where as before Then
Next we introduce the class of -vectors, denoted by . Let be the constant from Proposition 3.10 and recall that the class of almost constant vectors was defined by (9) in Subsection 2.2. Given denote and consider the sets
and
Define
The class should be thought of as the class of sufficiently spread vectors, not steep, but possibly without having two subsets of coordinates of size proportional to , which are separated by (which would allow us to treat those vectors as part of the set ). Crucially, the sets and are “low complexity” sets because they admit –nets of relatively small cardinalities (see Subsection 6.3). For the class we prove the following bound.
Theorem 6.2.
There are absolute constants with the following property. Let , , let and be such that . Then
Finally we show that together with classes and cover all (properly normalized) vectors for the growth function defined by
(30)
It is straightforward to check that satisfies (8) with some absolute constant .
Theorem 6.3.
There are universal constants with the following property. Let , , and assume that . Let , , , and let be as in (30). Then
6.2 Auxiliary lemmas
In the following lemma we provide a simple bound on the Euclidean norms of vectors in the class and its complement in terms of their order statistics.
Lemma 6.4.
Let be large enough and . Consider the vectors for some , , and . Then
Proof.
Let . Since , denoting , we have
Since , , since , and in view of (29), we obtain
This implies the first bound. The bounds for are obtained similarly. ∎
The next two Lemmas 6.5 and 6.6 will be used to bound from below the norm of the matrix-vector product for vectors with a “too large” almost constant part which does not allow us to directly apply the Lévy–Kolmogorov–Rogozin anti-concentration inequality together with the tensorization argument. Lemma 6.5 will be used to bound by a single inner product for a specially chosen index , while Lemma 6.6 will allow us to extract a subset of “good” rows having large inner products with .
Lemma 6.5.
Let and satisfy . Let be such that either
or
Let be an Bernoulli() random matrix. Denote by the event that for any choice of two disjoint sets of cardinality , there exists a row of with exactly one 1 among the components indexed by and no 1's among the components indexed by . Then
Proof.
We first treat the case . Fix two disjoint sets of the required cardinality. The probability that a fixed row has exactly one 1 among the components indexed by and no 1's among the components indexed by equals
where we used . Since the rows are independent, the probability that does not have such a row is
Note that the number of all choices of and satisfying the conditions of the lemma is
Thus the union bound over all choices of and implies
Using that and , we observe that . Since , we have . Thus,
which proves this case.
The case , is similar. Fixing two disjoint sets of the required cardinality, the probability that a fixed row has exactly one 1 among the components indexed by and no 1's among the components indexed by equals
Since rows are independent, the probability that does not have such a row is
Using the union bound over all choices of and , we obtain
which proves the lemma. ∎
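For orientation, we sketch the computation behind the first case, assuming, purely for illustration, that both index sets have the same cardinality $m$ (the actual cardinality parameter is the one specified in the statement of the lemma). The probability that a fixed row has exactly one 1 in the first set and no 1's in the second is
\[
q\;=\;m\,p\,(1-p)^{2m-1},
\]
so by independence of the rows the probability that none of the $n$ rows has this pattern is $(1-q)^n\le e^{-qn}$, and the union bound over the at most $\binom{n}{m}^2$ choices of the pair of sets gives the claimed estimate once $qn$ dominates $2m\log\frac{en}{m}$.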
In the next lemma we restrict a matrix to a certain set of columns and estimate the cardinality of the set of rows having exactly one 1. To be more precise, for any and a matrix , denote
The following statement is similar to Lemma 2.7 from [24] and Lemma 3.6 in [27].
Lemma 6.6.
Let be an integer and be such that . Let be a Bernoulli() random matrix. Then with probability at least
for every of cardinality one has
In particular, if , , and then, denoting
we have
Proof.
Fix of cardinality . Denote . Since ,
For every , let be the indicator of the event . Clearly, ’s are independent Bernoulli() random variables and . Applying Lemma 3.4, we observe that for every
Taking we obtain that
and
This implies the bound for a fixed . The lemma follows by the union bound. ∎
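In the notation of the lemma, the quantity being estimated is a sum of independent indicators; with our own symbols, for a fixed set $J$ of columns,
\[
\#\bigl\{\text{rows having exactly one 1 in the columns indexed by }J\bigr\}
=\sum_{i=1}^{n}\delta_i,\qquad
\mathbb{E}\,\delta_i=|J|\,p\,(1-p)^{|J|-1},
\]
where $\delta_i$ is the indicator that row $i$ has exactly one 1 in the columns indexed by $J$; the concentration estimate of Lemma 3.4 and the union bound over the at most $\binom{n}{|J|}$ admissible sets then complete the argument.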
6.3 Cardinality estimates for –nets
In this subsection we provide bounds on cardinality of certain discretizations of the sets of vectors introduced earlier. Recall that denotes the vector , denotes the projection on , and is the projection on , that is . We recall also that given , denotes coordinate projection of on , and that given , is a (fixed) permutation corresponding to non-increasing rearrangement of .
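Throughout this subsection we repeatedly use the standard volumetric bound on covering numbers, which we recall in generic form: a ball of radius $R$ in a $k$-dimensional normed space admits, for any mesh $0<\varepsilon\le R$, an $\varepsilon$-net (in the same norm) of cardinality at most
\[
\Bigl(1+\frac{2R}{\varepsilon}\Bigr)^{k}\;\le\;\Bigl(\frac{3R}{\varepsilon}\Bigr)^{k}.
\]
This is the source of the cardinality estimates for the nets in the coordinate projections of cubes and Euclidean balls appearing below.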
Our first lemma deals with nets for and . We will consider the following normalization:
The triple norm is defined by
Lemma 6.7.
Let , , and assume that is sufficiently large. Let . Then there exists a set , , , with the following properties:
•
• For every one has for all .
• For every there are and satisfying
Since the proof of this lemma in many parts repeats the proofs of Lemma 3.8 from [24] and of Lemma 6.8 below, we only sketch it.
Proof.
Fix and . We first repeat the proof of Lemma 3.8 from [24] with our choice of parameters. See also the beginning of the proof of Lemma 6.8 below — many definitions, constructions, and calculations are exactly the same; note, however, that the normalization is slightly different. In particular, the definitions of sets , (with ), are the same (we do not need the sets and ). This will show (for large enough ) the existence of a -net (in the metric) for such that for every one has for all and .
Next given let be such that . Then . Let be a -net in the segment of cardinality at most (note, we are in the one-dimensional setting). Note that every is of the form , , in particular, . Then for (and the corresponding ), there exists such that
Finally, note that . This completes the proof. ∎
Let , be the vector subsets introduced in Subsection 6.1. Consider the increasing sequence , , defined by
, for and . (31)
Clearly . For , and set
It is not difficult to see that the union of ’s over admissible gives . The sets are “low complexity” sets in the sense that they admit efficient -nets. For , the low complexity is a consequence of the condition that , i.e., the vectors have a very large almost constant part. For the sets , we do not assume the almost constant behavior, but instead rely on the assumption that is large (much larger than ). This will allow us to pick much larger than , and thus construct a net of small cardinality.
Lemma 6.8.
Let be a (large) constant. Then there is depending on with the following property. Let , , let and so that is sufficiently large (larger than a constant depending on ). Let , , , and , where and are defined according to relation (31). Then there exists an -net for with respect to of cardinality at most .
Proof.
Note that in case of the set is empty whenever . So, in the course of the proof we will implicitly assume that whenever .
We follow the ideas of the proof of Lemma 3.8 from [24]. We split a given vector from into a few parts according to the magnitudes of its coordinates and approximate each part separately. Then we construct nets for vectors with the same splitting and take the union over all nets. We now discuss the splitting. For each consider the following (depending on ) partition of . If , set . If then and we set
where is from the definition of (note that under the normalization in we have ). Then for . Next, we set
(one of the sets , could be empty). Denote . Note that the definition of and imply that , while the condition and the above observation for give for . Clearly, for .
Moreover, we have both for and :
(32)
Thus, given and a partition of into five sets , , with cardinalities as in (32), it is enough to construct a net for vectors with , , , and then to take the union of nets over all possible realizations of and all such partitions of .
Now we describe our construction. Fix as above and fix two parameters , and . We would like to emphasize that for the actual calculations in this lemma, taking to be a small constant multiple of would be sufficient; however, we would like to run the proof with the above choice of because this corresponds to the parameter choice in the previous Lemma 6.7, whose proof we only sketched. Note that for we have , hence and
(33)
Fix with (which will play the role of ). We shall construct a -net (in the -metric) for the set
Clearly, the nets for various ’s can be related by appropriate permutations, so without loss of generality we can assume for now that . First, consider the partition of into sets defined by
Consider the set
By the definition of , for every , one has for every (where as usual denotes the coordinate projection onto ). Define a –net (in the -metric) for by setting
where is a -net (in the -metric) of cardinality at most
in the coordinate projection of the cube . Recall that , , , where and are given by (27). Since is large enough,
which implies
To pass from the net for to the net for , let be the union of nets constructed as but for arbitrary partitions of with . Using that
and we obtain that the cardinality of is at most
Next we construct a net for the parts of the vectors corresponding to . Fix with (it will play the role of ). We construct a -net (in the -metric) for the set
Since by (33), we have for every , it is enough to take a -net of cardinality at most
in the coordinate projection of the cube .
Now we turn to the part of the vectors corresponding to . Fix with (it will play the role of ). For this part we use -metric and construct a -net (in the Euclidean metric this time) for the set
Since for we have , there exists a corresponding -net in the coordinate projection of the Euclidean ball of cardinality at most
Next we approximate the almost constant part of a vector (corresponding to ), provided that it is not empty (otherwise we skip this step). Fix with (it will play the role of ) and denote
Let . Since for every we have either or , by the definition of , every is approximated by one of within error in the -metric.
The last part of the vector, corresponding to we just approximate by . Note that for any we have , in view of the condition . On the other hand, for we have .
Now we combine our nets. Consider the net
where the union is taken over all and all partitions of into with , , , , and . Then the cardinality of ,
Using that , , , or , the obtained bounds on nets, as well as that is large enough and is small enough (smaller than a constant depending on ), we observe that the cardinality of is bounded by
By construction, for every there exists such that
where we used that and that is sufficiently small.
Finally we adjust our net to . Note that by Lemma 6.4 for every ,
Therefore, there exists an -net in of cardinality (note, the rank of is one). Then, by the constructions of nets, for every there exist and such that
Thus the set is an -net for with respect to and its cardinality is bounded by . Using a standard argument, we pass to an -net for . ∎
6.4 Proof of Theorem 6.2
Proof.
Recall that the sets were introduced just before Lemma 6.8 and the event was defined in Proposition 3.14.
Fix , , , . Set , where and are defined according to (31). Applying Lemma 6.8 with , we find an -net (in the –norm) for of cardinality at most . Take for a moment any . Note that , (where ). Then Proposition 3.10 implies , where
Let us condition on the event . Using the definition of and , the triangle inequality, and the definition of from Proposition 3.14, we get that for any there is such that , and hence
Using that , that , and the union bound, we obtain
Since and is small enough, the result follows by the union bound and by Lemma 3.6 applied with in order to estimate . ∎
6.5 Lower bounds on for vectors from
The following lemma provides a lower bound on the ratio for vectors from .
Lemma 6.9.
Proof.
Let , be entries of . Let be the event that there are no zero columns in . Clearly, .
Also, for each , let be the event introduced in Lemma 6.5 (with defined in (27)), and observe that, according to Lemma 6.5, for every .
Recall that denotes a permutation such that for . Pick any . In the case set and . In the case for some set and . Then by the definition of sets we have . Let
(if then ). Note that by our definition we have for any and , and that . Denote by the (random) set of rows of having exactly one 1 in and no 1’s in . Now we recall that the event was introduced in Lemma 3.4 (we use it with ) and set
Clearly, conditioned on , the set is not empty for any . By definition, for every there exists such that
Since (which implies ), we obtain
Observe that conditioned on we have . Thus, everywhere on we have for all ,
Finally, in the case we have and . In the case by Lemma 6.4 we have
This proves the lower bound on conditioned on . The probability bound follows by the union bound, Lemmas 3.4 and 6.5, and since , indeed
∎
6.6 Individual bounds for vectors from
In this section we provide individual probability bounds for vectors from the nets constructed in Lemma 6.7. To obtain the lower bounds on , we consider the behavior of the inner products , more specifically, of the Lévy concentration function for . To estimate this function, we will consider columns of corresponding to the biggest and smallest (in absolute value) coordinates of , where or . In a sense, our anti-concentration estimates will appear in the process of swapping ’s and ’s within a specially chosen subset of the matrix rows. A crucial element in this process is to extract a pair of subsets of indices on which the chosen matrix rows have only one non-zero component. This will allow us to get anti-concentration bounds by “sending” the non-zero component into the other index subset from the pair. The main difficulty in this scheme comes from the restriction from Lemma 6.6, which guarantees the existence of sufficiently many required subsets (and rows) but which cannot be directly applied to . To resolve this problem we use an idea from [27]. We split the initially fixed set of columns into smaller subsets of columns of size at most each, and create independent random variables corresponding to this splitting. Then we apply Proposition 3.9, which allows us to deal with the Lévy concentration function for sums of independent random variables.
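Recall that the Lévy concentration function of a real random variable $\xi$ at level $t\ge 0$ is defined by
\[
\mathcal Q(\xi,t):=\sup_{\lambda\in\mathbb R}\mathbb{P}\bigl\{|\xi-\lambda|\le t\bigr\}
\]
(the symbol $\mathcal Q$ is used here only for this reminder), and that for independent random variables $\xi$ and $\eta$ one has $\mathcal Q(\xi+\eta,t)\le\min\bigl(\mathcal Q(\xi,t),\mathcal Q(\eta,t)\bigr)$. Proposition 3.9 provides a stronger bound of this type for sums of many independent terms, and this is what relation (35) below exploits.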
We first describe subdivisions of used in [27]. Recall that denotes the class of all matrices with entries. We recall also that the probability measure on is always assumed to be induced by a Bernoulli() random matrix. Given and denote
By we denote the set of matrices with entries and with columns indexed by . Fix and a partition , , …, of . Given subsets of and , denote and consider the class
In words, we fix the columns indexed by and for each we fix the row indices having exactly one 1 in the columns indexed by . Then, for any fixed partition , , …, , is the disjoint union of classes over all and all , where denotes the power set.
The following is an important, but simple observation.
Lemma 6.10.
Let be a non-empty class (defined as above), and denote by the induced probability measure on , i.e., let
Then the matrix rows for matrices in are mutually independent with respect to ; in other words, a random matrix distributed according to has mutually independent rows.
Finally, given a vector , a class , indices , , define
(34)
We will view as random variables on (with respect to the measure ). It is not difficult to see that for every fixed , the variables are mutually independent, and, moreover, whenever , the variable is uniformly distributed on the multiset . Thus, we may apply Proposition 3.9 to
with some satisfying for every . This gives
(35)
where is a positive absolute constant.
We are ready now to estimate individual probabilities.
Lemma 6.11 (Individual probabilities).
There exist absolute constants such that the following holds. Let , . Set and let and be such that
Let and assume that satisfies
Denote and consider the event
Then in the case one has
and in the case one has
where is the event introduced in Lemma 6.6 with .
Remark 6.12.
We apply this lemma below for sets with the following choice of parameters. For we set
obtaining
For , we set
obtaining for large enough ,
To prove Lemma 6.11 it will be convenient to use the same notation as in Lemma 6.9. Given two disjoint subsets , and a matrix , denote
and
Here the upper indices and refer to left and right.
Proof.
Let and fix .
Fix and satisfying the conditions of the lemma. Let , that is, a permutation of such that for all . Denote and without loss of generality assume that either or that is a large enough integer. Let be a partition of into sets of cardinality each, and let be a partition of into sets of cardinality each. Denote
Then , , …, is a partition of , which we fix in this proof. Let be a matrix. For every pair , , let the sets and be defined as after Remark 6.12 and let . Since
and by the definition of the event (see Lemma 6.6 with ), we have
(36)
everywhere on . Now we represent as a disjoint union of classes defined at the beginning of this subsection with and . Since it is enough to prove a uniform upper bound for classes and since for every such non-empty class must satisfy (36) for every , we have
where the first maximum is taken over all with and the second maximum is taken over all with ’s satisfying condition (36).
Fix any class , where satisfies (36), and denote the corresponding induced probability measure on the class by , that is
Let
Note that . We first show that the set of ’s which belong to many ’s is large. More precisely, denote
Then, using bounds on cardinalities of ’s, one has
Thus,
Without loss of generality we assume that and only consider the first indices from it. Then .
Now, by definition, for matrices we have
Therefore there are at most rows with . Hence,
Let and for every denote
Then
Without loss of generality we assume that the maximum above is attained at . Then
(37)
where at the last step we used mutual independence of the events (with respect to measure ), see Lemma 6.10.
Next we estimate the factors in the product. Fix and . Since, by our assumptions, , we have . Consider the random variables , , defined in (34). Then by (35) we have
where . Moreover, in the case we just have
Thus it remains to estimate for . Fix , so that . Recall that, by construction, the intersection of the support of with is a singleton everywhere on . Denote the corresponding index by . Then
and note that whenever and whenever . Observe further that . Hence, we obtain
Combining the probability estimates starting with (37) and using that , we obtain in the case ,
provided that is small enough and . Note that the bound is meaningful only if is large enough. In the case we have
provided that is small enough. This completes the proof. ∎
6.7 Proof of Theorem 6.1
Recall that was defined in Proposition 3.14. For any matrix there exists satisfying
Normalize so that , that is, . By Lemma 6.4 we have .
Let be the net constructed in Lemma 6.7. Then there exist with
and for , and , such that . Applying Proposition 3.14 (where was introduced), and using that is large enough, we obtain that for every matrix there exist and with
(38)
Using our choice of , , , Lemma 6.7, and Lemma 6.11 twice — first with , , then with , (see Remark 6.12), we obtain that for small enough and large enough the probability is bounded by
and that the probability is bounded by
where is the event introduced in Lemma 6.6 with .
6.8 Proof of Theorem 6.3
Proof.
Clearly, it is enough to show that . Let and set . Note that , where was defined in (27). Denote .
Assume first that does not satisfy (10). Then by Lemma 3.2, . If then denoting , , and using the definition of , we observe
whence
On the other hand, , hence . This implies that .
Now, if then denoting , , we get
whence
As in the previous case we have , which implies that .
Next we assume that does satisfy (10). Then, by the definition of the set and our function , does not satisfy the following condition:
We fix the smallest value of which breaks this condition and consider several cases. Note that since , we must have .
Case 1. . In this case by the conditions and by minimality of , we have and . Take and . Then we have
hence
As above we have , which implies that .
Case 2. . Take and . Then we have , , and
Therefore,
Since , we observe , hence and .
In the rest of the proof we show that we must necessarily have .
Case 3. , where . Using that , in this case we have
which is impossible for large enough .
Case 4. . Using that and that , in this case we have
which is impossible for large enough .
Case 5. for some . Recall that and recall also that if (as in this case) then . Using that , in this case we have
hence
(39)
Considering the function , we observe that its derivative is linear in , therefore attains its maximum either at or at . Thus, to show that (39) is impossible it is enough to consider only. Let . By (29), , where . Therefore, the logarithm of the left hand side of (39) is
(40)
On the other hand, , therefore the logarithm of the left hand side of (39) is larger than . Thus, it is enough to check that
Both inequalities follow since , , and are large enough, and since . Next assume that . Note that in this case . Thus, to disprove (39) it is enough to show that
which clearly holds for large enough .
Case 7. . In this case we have and we proceed as in Case 6. ∎
7 Proof of the main theorem
In this section, we combine the results of Sections 4, 5, and 6, as well as Subsection 3.2, to prove the main theorem, Theorem 1.2, and the following improvement for the case of constant :
Theorem 7.1.
There exists an absolute positive constant with the following property. Let be a parameter (independent of ). Then there exist and (both depend only on ), such that for every and every a Bernoulli() random matrix satisfies
and, moreover, for every ,
At this stage, the scheme of the proof to a large extent follows the approach of Rudelson and Vershynin developed in [41]. However, a crucial part of their argument — “invertibility via distance” (see [41, Lemma 3.5]) — will be reworked in order to keep sharp probability estimates for the matrix singularity and to be able to bind this part of the argument with the previous sections, where we essentially condition on row- and column-sums of our matrix.
We start by restating the main results of Sections 5 and 6 using the vector class defined by (2), together with Lemma 3.1.
Corollary 7.2.
Corollary 7.3.
There are universal positive constants with the following property. Let be a parameter. Then there exist , such that for , , , , the random Bernoulli() matrix satisfies (41) with .
Below is our version of “invertibility via distance,” which deals with pairs of columns.
Lemma 7.4 (Invertibility via distance).
Let , and let be a growth function. Further, let and let be an random matrix. Then for any we have
where the sum is taken over all ordered pairs with and .
Proof.
For every , denote by the indicator of the event
The condition
for some implies that for every ,
where the last inequality follows from the definition of . Since , we get that everywhere on the event there are at least ordered pairs of indices such that for each pair the event occurs. Rewriting this assertion in terms of indicators, we observe
Applying Markov’s inequality in order to estimate the probability of the event on the right hand side, we obtain the desired result. ∎
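For comparison, we recall the classical single-column form of the invertibility-via-distance argument from [41]; in the generic notation of this reminder, $C_1,\dots,C_n$ are the columns of a matrix $M$ and $H_j=\mathrm{span}(C_i:\,i\ne j)$. For any vector $x$ and any index $j$,
\[
\|Mx\|_2=\Bigl\|\sum_{i=1}^{n}x_iC_i\Bigr\|_2\;\ge\;\mathrm{dist}\bigl(x_jC_j,\,H_j\bigr)=|x_j|\,\mathrm{dist}(C_j,H_j).
\]
The lemma above works with pairs of columns instead, which is needed to bind the argument with the conditioning on row- and column-sums discussed at the beginning of this section.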
Proof of Theorems 1.2 and 7.1.
The proofs of both theorems are almost the same; the only difference is that Theorem 7.1 uses Corollary 7.3 while Theorem 1.2 uses Corollary 7.2. Let the parameters be taken from Corollary 7.2 or from Corollary 7.3, respectively. We always write for . Let . Without loss of generality, we can assume that . Fix , and denote by the complement of the event
For denote
Applying Corollary 7.2 (or Corollary 7.3), Lemma 7.4 and the invariance of the conditional distribution of given under permutation of columns, we obtain
At the next step, we consider events
Since columns of are independent and consist of i.i.d. Bernoulli() variables, applying Lemma 3.4, we observe
Therefore, in view of equidistribution of the first two columns, we get
Denote by a random unit vector orthogonal to (and measurable with respect to) . Note that on the event the vector satisfies
which implies that on the event we also have , and . By the definition of , we have , therefore,
On the other hand, applying Theorem 2.2 with , we get that for some constants and , with probability at least ,
Combining the last two assertions and applying Theorem 2.1, we observe
Thus
By rescaling we obtain
In the case of constant (applying Corollary 7.3) we have and , and we get the small ball probability estimate of Theorem 7.1.
In the case of “general” (with the application of Corollary 7.2) we have and . Therefore,
for large enough , and the estimate follows.
In both cases the upper bound on , , is greater than , so we may omit it.
Finally, applying the argument of Subsection 3.2, we get the matching lower bound for the singularity probability. This completes the proof. ∎
8 Open questions
The result of this paper leaves open the problem of estimating the singularity probability for Bernoulli matrices in two regimes: when is logarithmic in and when is larger than the constant from Theorem 1.2.
For the first regime, we recall that the singularity probability of , with in a (small) neighborhood of , was determined up to the multiple in the work of Basak–Rudelson [5]. It would certainly be of interest to bridge that result and the main theorem of this paper.
Problem 8.1 (A bridge: Theorem 1.2 to Basak–Rudelson).
Let satisfy
and for each let be the matrix with i.i.d. Bernoulli() entries. Show that
Note that the main technical result for unstructured (gradual non-constant) vectors, Theorem 2.2 proved in Section 4, remains valid for these values of . It may therefore be expected that the above problem can be positively resolved by finding an efficient treatment for the structured vectors (the complement of gradual non-constant vectors), which would replace (or augment) the argument from Section 6.
On the contrary, the second problem — singularity of random Bernoulli matrices with large values of — seems to require essentially new arguments for working with the unstructured vectors, as the basic idea of Section 4 — gaining on anti-concentration estimates by grouping together several components of a random vector — does not seem to be applicable in this regime.
Problem 8.2 (Optimal singularity probability for dense Bernoulli matrices below the threshold).
Let the sequence satisfy
Show that
Acknowledgments
K.T. was partially supported by the Sloan Research Fellowship.
References
- [1] A.S. Bandeira, R. van Handel, Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Ann. Probab. 44 (2016), 2479–2506.
- [2] A. Basak, N. Cook and O. Zeitouni, Circular law for the sum of random permutation matrices, Electronic Journal of Probability, 23 (2018), Paper No. 33, 51 pp.
- [3] A. Basak and M. Rudelson, Invertibility of sparse non-Hermitian matrices, Adv. Math. 310 (2017), 426–483. MR3620692
- [4] A. Basak and M. Rudelson, The circular law for sparse non-Hermitian matrices, arXiv:1707.03675
- [5] A. Basak and M. Rudelson, Sharp transition of the invertibility of the adjacency matrices of random graphs, arXiv:1809.08454
- [6] S. Boucheron, G. Lugosi, P. Massart, Concentration inequalities. A nonasymptotic theory of independence. With a foreword by Michel Ledoux. Oxford University Press, Oxford, 2013.
- [7] J. Bourgain, V. H. Vu and P. M. Wood, On the singularity probability of discrete random matrices, J. Funct. Anal. 258 (2010), no. 2, 559–603. MR2557947
- [8] D. Chafaï, O. Guédon, G. Lecué, A. Pajor, Interactions between compressed sensing random matrices and high dimensional geometry, Panoramas et Synthèses [Panoramas and Syntheses], 37. Soc. Math. de France, Paris, 2012.
- [9] N. A. Cook, On the singularity of adjacency matrices for random regular digraphs, Probab. Theory Related Fields 167 (2017), no. 1-2, 143–200. MR3602844
- [10] N. Cook, The circular law for random regular digraphs, Ann. Inst. Henri Poincare Probab. Stat., to appear, arXiv:1703.05839
- [11] L. Devroye, G. Lugosi, Combinatorial methods in density estimation. Springer Series in Statistics. Springer-Verlag, New York, 2001.
- [12] P. Erdös, On a lemma of Littlewood and Offord, Bull. Amer. Math. Soc. 51 (1945), 898–902. MR0014608
- [13] C. G. Esseen, On the Kolmogorov-Rogozin inequality for the concentration function, Z. Wahrsch. Verw. Gebiete 5 (1966), 210–216. MR0205297
- [14] F. Götze and A. Tikhomirov, The circular law for random matrices, Ann. Probab. 38 (2010), no. 4, 1444–1491. MR2663633
- [15] J. Huang, Invertibility of adjacency matrices for random -regular graphs, preprint, arXiv:1807.06465, 2018.
- [16] J. Kahn, J. Komlós and E. Szemerédi, On the probability that a random -matrix is singular, J. Amer. Math. Soc. 8 (1995), no. 1, 223–240. MR1260107
- [17] H. Kesten, A sharper form of the Doeblin-Lévy-Kolmogorov-Rogozin inequality for concentration functions, Math. Scand. 25 (1969), 133–144. MR0258095
- [18] J. Komlós, On the determinant of matrices, Studia Sci. Math. Hungar 2 (1967), 7–21. MR0221962
- [19] B. Landon, P. Sosoe, H. Yau, Fixed energy universality of Dyson Brownian motion, Adv. Math. 346 (2019), 1137–1332.
- [20] M. Ledoux, The concentration of measure phenomenon. Mathematical Surveys and Monographs, 89. American Mathematical Society, Providence, RI, 2001.
- [21] J. E. Littlewood and A. C. Offord, On the number of real roots of a random algebraic equation. III, Rec. Math. [Mat. Sbornik] N.S. 12(54) (1943), 277–286. MR0009656
- [22] A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann and P. Youssef, Anti-concentration property for random digraphs and invertibility of their adjacency matrices, C. R. Math. Acad. Sci. Paris 354 (2016), no. 2, 121–124. MR3456885
- [23] A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann and P. Youssef, Adjacency matrices of random digraphs: singularity and anti-concentration, J. Math. Anal. Appl. 445 (2017), no. 2, 1447–1491. MR3545253
- [24] A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann and P. Youssef, The smallest singular value of a shifted -regular random square matrix, Probab. Theory Related Fields, 173 (2019), 1301–1347.
- [25] A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann and P. Youssef, The circular law for sparse random regular digraphs, J. European Math. Soc., to appear.
- [26] A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann and P. Youssef, The rank of random regular digraphs of constant degree, J. of Complexity, 48 (2018), 103–110.
- [27] A. E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann and P. Youssef, Structure of eigenvectors of random regular digraphs, Trans. Amer. Math. Soc., 371 (2019), 8097–8172.
- [28] A. E. Litvak, A. Pajor, M. Rudelson and N. Tomczak-Jaegermann, Smallest singular value of random matrices and geometry of random polytopes, Adv. Math. 195 (2005), no. 2, 491–523. MR2146352
- [29] A. E. Litvak and O. Rivasplata, Smallest singular value of sparse random matrices, Studia Math. 212 (2012), no. 3, 195–218. MR3009072
- [30] G. V. Livshyts, The smallest singular value of heavy-tailed not necessarily i.i.d. random matrices via random rounding, arXiv:1811.07038.
- [31] G. V. Livshyts, K. Tikhomirov, R. Vershynin, The smallest singular value of inhomogeneous square random matrices, arXiv:1909.04219
- [32] A. Lytova, K. Tikhomirov, On delocalization of eigenvectors of random non-Hermitian matrices, Probab. Theor. Rel. Fields, to appear.
- [33] K. Luh, S. Meehan, H.H. Nguyen, Some new results in random matrices over finite fields, arXiv:1907.02575
- [34] K. Luh, S. O’Rourke, Eigenvector Delocalization for Non-Hermitian Random Matrices and Applications, arXiv:1810.00489
- [35] A. Mészáros, The distribution of sandpile groups of random regular graphs, preprint, arXiv: 1806.03736, 2018.
- [36] H.H. Nguyen and M.M. Wood, Cokernels of adjacency matrices of random r-regular graphs, preprint, arXiv: 1806.10068, 2018.
- [37] E. Rebrova, K. Tikhomirov, Coverings of random ellipsoids, and invertibility of matrices with i.i.d. heavy-tailed entries, Israel J. Math., to appear.
- [38] B. A. Rogozin, On the increase of dispersion of sums of independent random variables, Teor. Verojatnost. i Primenen 6 (1961), 106–108. MR0131894
- [39] M. Rudelson, Invertibility of random matrices: norm of the inverse, Ann. of Math. 168 (2008), 575–600.
- [40] M. Rudelson, Recent developments in non-asymptotic theory of random matrices, in Modern aspects of random matrix theory, 83–120, Proc. Sympos. Appl. Math., 72, Amer. Math. Soc., Providence, RI. MR3288229
- [41] M. Rudelson and R. Vershynin, The Littlewood–Offord problem and invertibility of random matrices, Adv. Math. 218 (2008), no. 2, 600–633. MR2407948
- [42] M. Rudelson and R. Vershynin, Smallest singular value of a random rectangular matrix, Comm. Pure Appl. Math. 62 (2009), no. 12, 1707–1739. MR2569075
- [43] M. Rudelson and R. Vershynin, No-gaps delocalization for general random matrices, Geom. Funct. Anal. 26 (2016), no. 6, 1716–1776. MR3579707
- [44] T. Tao and V. Vu, On random matrices: singularity and determinant, Random Structures Algorithms 28 (2006), no. 1, 1–23. MR2187480
- [45] T. Tao and V. Vu, On the singularity probability of random Bernoulli matrices, J. Amer. Math. Soc. 20 (2007), no. 3, 603–628. MR2291914
- [46] T. Tao and V. Vu, Random matrices: the circular law, Commun. Contemp. Math. 10 (2008), 261–307. MR2409368
- [47] T. Tao and V. H. Vu, Inverse Littlewood-Offord theorems and the condition number of random discrete matrices, Ann. of Math. 169 (2009), 595–632. MR2480613
- [48] K. Tikhomirov, Singularity of random Bernoulli matrices, Annals of Math., to appear. arXiv:1812.09016
Alexander E. Litvak
Dept. of Math. and Stat. Sciences,
University of Alberta,
Edmonton, AB, Canada, T6G 2G1.
e-mail: [email protected]
Konstantin E. Tikhomirov
School of Math., GeorgiaTech,
686 Cherry street,
Atlanta, GA 30332, USA.
e-mail: [email protected]