On the optimal objective value of random linear programs
Abstract
We consider the problem of maximizing subject to the constraints , where , is an matrix with mutually independent centered subgaussian entries of unit variance, and is a cost vector of unit Euclidean length. In the asymptotic regime , , and under some additional assumptions on , we prove that the optimal objective value of the linear program satisfies
In the context of high-dimensional convex geometry, our findings imply sharp asymptotic bounds on the spherical mean width of the random convex polyhedron . We provide numerical experiments as supporting data for the theoretical predictions. Further, we carry out numerical studies of the limiting distribution and the standard deviation of .
1 Introduction
1.1 Problem setup and motivation
Consider the linear program (LP)
(1)
where the entries of the coefficient matrix are mutually independent random variables, and denotes the vector of all ones. Note that is a vector of free variables. In this note we deal with two settings: either is a non-random unit vector or is random uniformly distributed on the unit Euclidean sphere. For any realization of the above LP is feasible since the zero vector satisfies all the constraints. We denote the value of the objective function of (1) by , the optimal objective value by , and any optimal solution by .
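To make the setup concrete, the following is a minimal sketch (not the code used in Section 6, which relies on Gurobi) of how a single instance of LP (1) can be sampled and solved; the dimensions, the Gaussian entry distribution and the use of SciPy's linprog are illustrative assumptions.

```python
# A minimal sketch (not the authors' code, which uses Gurobi; see Section 6):
# sample one instance of LP (1) with i.i.d. standard Gaussian coefficients and
# solve it with SciPy. The dimensions below are illustrative.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_constraints, n_vars = 2000, 50

A = rng.standard_normal((n_constraints, n_vars))   # coefficient matrix, i.i.d. N(0, 1) entries
c = rng.choice([-1.0, 1.0], size=n_vars)           # Rademacher cost vector ...
c /= np.linalg.norm(c)                             # ... rescaled to unit Euclidean length

# LP (1): maximize <c, x> subject to A x <= 1 with x free.
# linprog minimizes, so the objective is -c; bounds=(None, None) removes the
# default non-negativity constraints on the variables.
res = linprog(-c, A_ub=A, b_ub=np.ones(n_constraints),
              bounds=(None, None), method="highs")
print("optimal objective value:", -res.fun)
```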
The main problem we would like to address here is the following: Under the assumption that the number of constraints is significantly larger than the number of variables, what is the magnitude of the optimal objective value?
We provide theoretical guarantees on the optimal objective value in the asymptotic regime , , and carry out empirical studies which confirm that medium size random LPs (with the number of constraints in the order of a few thousands) closely follow theoretical predictions. The numerical experiments further allow us to formulate conjectures related to the distribution of the optimal objective value, which we hope to address in future works.
Our interest in LP (1) is twofold.
-
(i)
The LP (1) with i.i.d random coefficients can be viewed as the simplest model of an “average-case” linear program. LPs of this form have been considered in the mathematical literature in the context of estimating the average-case complexity of the simplex method [7, 8, 9] (see Subsection 1.4 below for a more comprehensive literature review). Despite significant progress in that direction, fundamental questions regarding properties of random LPs remain open [44, Section 6]. The results obtained in this note contribute to a better understanding of the geometry of random feasible regions defined as intersections of independent random half-spaces. We view our work as a step towards investigating random LPs allowing for sparse, inhomogeneous and correlated constraints, which would serve as more realistic models of linear programs occurring in practice.
-
(ii)
Assuming that the cost vector is uniformly distributed on the unit sphere and is independent from , the quantity
(where is the expectation taken with respect to the randomness of ) is the spherical mean width of the random polyhedron . The mean width is a classical parameter associated with a convex set; in particular, , where is the polar body for defined as the convex hull of the rows of , and is the average value of the Minkowski functional for on the unit Euclidean sphere,
The mean width plays a key role in the study of projections of convex bodies and volume estimates (see, in particular, [50, Chapters 9–11] and [40]). The existing body of work (which we review in Subsection 1.4) has focused on the problem of estimating the mean width up to a constant multiple. In this paper, we provide sharp asymptotics of in the regime of parameters mentioned above.
1.2 Notation
In the course of the note we use the letter (with subscripts) for constant numbers and , , , for standard asymptotic notations. Specifically, for positive functions and defined on positive real numbers, if there exists and a real number such that for all , and if . Further,
Definition 1.1.
A random variable is –subgaussian if
The smallest satisfying the above inequality is the subgaussian moment of (see, for example, [50, Chapter 2]). We will further call a random vector with mutually independent components –subgaussian if every component is –subgaussian. Examples of subgaussian variables include Rademacher (i.e symmetric ) random variables, and, more generally, any bounded random variables. The Rademacher distribution is of interest to us since a matrix with i.i.d entries produces a simplest model of a normalized system of constraints with discrete bounded coefficients.
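For concreteness, one common formulation of the subgaussian property, following [50, Chapter 2], is recorded below; the normalization of the constant in Definition 1.1 may differ from the one used here.

```latex
% One standard formulation: a random variable X is K-subgaussian if
\[
  \mathbb{E}\,\exp\!\big(X^{2}/K^{2}\big) \;\le\; 2 ,
\]
% and the subgaussian moment of X is the smallest such K. Under this
% normalization, a Rademacher variable has subgaussian moment 1/\sqrt{\ln 2}.
```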
1.3 Theoretical guarantees
In this paper, we study asymptotics of the optimal objective value of (1) in the regime when the number of free variables tends to infinity. We will further assume that the number of constraints is infinitely large compared to , i.e (while the setting is of significant interest, it is not covered by our argument).
The main result of this note is
Theorem 1.2 (Main result: asymptotics of in subgaussian setting).
Let . Assume that , and that for each , the entries of the coefficient matrix are mutually independent centered –subgaussian variables of unit variances. Assume further that the non-random unit cost vectors satisfy . Then for any constant , we have
and, in particular, almost surely.
Remark.
Distributional invariance under rotations makes analysis of Gaussian random polytopes considerably simpler. In the next theorem, we avoid any structural assumptions on the cost vectors, and thereby remove any implicit upper bounds on :
Theorem 1.3 (Asymptotics of in Gaussian setting).
Assume that and that for each , the entries of the coefficient matrix are mutually independent standard Gaussian variables. For each , let be a non-random unit cost vector. Then for any constant we have
In particular, almost surely.
As we mentioned before, the setting when the cost vector is uniform on the Euclidean sphere is of special interest in the field of convex geometry since the quantity (where denotes the expectation taken with respect to ) is the spherical mean width of the random polyhedron . As corollaries of Theorems 1.2 and 1.3, we obtain
Corollary 1.4 (The mean width in subgaussian setting).
Let , let matrices be as in Theorem 1.2, and assume additionally that . Then the spherical mean width of the polyhedron satisfies
Corollary 1.5 (The mean width in Gaussian setting).
Let and be as in Theorem 1.3. Then the spherical mean width of the polyhedron satisfies
1.4 Review of related literature
1.4.1 Convex feasibility problems
In the setting of Theorem 1.2, existence of a feasible vector with is equivalent to the statement that the intersection of random convex sets
is non-empty. The sets can be naturally viewed as independent random affine subspaces of the hyperplane and, with that interpretation, the above question is a relative of the spherical perceptron problem, in which the object of interest is the intersection of random affine halfspaces of the form , for independent random vectors and a constant parameter (see [46, 47] for a comprehensive discussion). Our proof of the lower bound on (equivalently, showing that ) is based on an iterative process which falls under the umbrella of the block-iterative projection methods.
The problem of constructing or approximating a vector in the non-empty intersection of a given collection of convex sets has been actively studied in optimization literature. The special setting where the convex sets are affine hyperplanes or halfspaces, corresponding to solving systems of linear equations and inequalities, was first addressed by Kaczmarz [28], Agmon [1], Motzkin and Schoenberg [37]. We refer, among others, to papers [48, 2, 45, 17, 38, 11, 34, 25, 31, 27, 14, 5, 35, 39, 24, 33, 51, 36, 52, 53] for more recent developments and further references. In particular, the work [2] introduced the block-iterative projection methods for solving the feasibility problems, when each update is computed as a function of projections onto a subset of constraints.
In view of the numerous earlier results cited above, we would like to emphasize that the goal of our construction is verifying that the intersection is non-empty with high probability rather than approximating a solution to a problem which is known to be feasible in advance. Since we aim to get asymptotically sharp bounds on , we have to rely on a high-precision analysis of constraint violations, not allowing for any significant losses in our estimates. To the best of our knowledge, no existing results on block-iterative projection methods address the problem considered here.
1.4.2 Random linear programs
Linear programs with either the coefficient matrix or the right hand side generated randomly have been actively studied in various contexts:
-
•
Average-case analysis. As we mentioned above, the setting of random LPs with the Gaussian coefficient matrix (or, more generally, a matrix with independent rows having spherically symmetric distributions) was considered by Borgwardt [7, 8, 9] in the context of estimating the average-case complexity of the shadow simplex method. The average-case complexity of the Dantzig algorithm was addressed by Smale in [43] (see also [6]).
- •
- •
-
•
Random integer feasibility problems. We refer to [10] for the problem of integer feasibility in the randomized setting.
-
•
The optimal objective value of LPs with random cost vectors [16].
1.4.3 The mean width of random polyhedra
As we have mentioned at the beginning of the introduction, the mean width of random polytopes has been studied in the convex geometry literature, although much of the work deals with the mean width of the convex hulls of random vectors i.e of the polars to the random polyhedra considered in the present work. We refer, in particular, to [13, 3, 21] for related results (see also [22, 23]).
A two-sided bound on the mean width in the i.i.d subgaussian setting was derived in [30] (see also [20] for earlier results); specifically, it was shown that under the assumption that the entries of are i.i.d subgaussian variables of zero mean and unit variance, the mean width of the symmetric polyhedron satisfies
(2)
with high probability, where the constants depend on the subgaussian moment of the matrix entries. The argument in [30] heavily relies on results which introduce suboptimal constant multiples to the estimate of , and we believe that it cannot be adapted to derive an asymptotically sharp version of (2) which is the main goal of the present work.
1.5 Organization of the paper
We derive Theorem 1.2 and Corollary 1.4 as a combination of two separate statements: a lower bound on obtained in Section 3, and a matching upper bound treated in Section 4. The lower bound on (Theorem 3.1) is in turn obtained by combining a moderate deviations estimate from Proposition 2.5 and an iterative block-projection algorithm for constructing a feasible solution, which was briefly mentioned before. The upper bound on (Theorem 4.1) proved in Section 4 relies on the moderate deviations estimate from Proposition 2.6 and a special version of a covering argument.
The Gaussian setting (Theorem 1.3 and Corollary 1.5) is considered in Section 5. Finally, results of numerical simulations are presented in Section 6.
The diagrams below provide a high-level structure of the proofs of the main results of the paper.
2 Preliminaries
2.1 Compressible and incompressible vectors
Definition 2.1 (Sparse vectors).
A vector in is –sparse for some if the number of non-zero components in is at most .
Definition 2.2 (Compressible and incompressible vectors, [42]).
Let be a unit vector in , and let . The vector is called –compressible if the Euclidean distance from to the set of –sparse vectors is at most , that is, if there is a –sparse vector such that . Otherwise, is –incompressible.
Observe that a vector is –compressible if and only if the sum of squares of its largest (in absolute value) components is at least .
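In symbols, writing k for the sparsity level and delta for the distance parameter of Definition 2.2 (a choice of notation made here for illustration), the observation for a unit vector x reads:

```latex
\[
  x \ \text{is } (k,\delta)\text{--compressible}
  \quad\Longleftrightarrow\quad
  \sum_{i=1}^{k} x_{[i]}^{2} \;\ge\; 1-\delta^{2},
\]
% where $x_{[1]}\ge x_{[2]}\ge\dots$ is the non-increasing rearrangement of the
% absolute values of the components of $x$.
```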
2.2 Standard concentration and anti-concentration for subgaussian variables
The following result is well known, see for example [50, Section 3.1].
Proposition 2.3.
Let be a vector in with mutually independent centered –subgaussian components of unit variance. Then for every ,
where the constant depends only on .
The next proposition is an example of a Paley–Zygmund–type inequality, and its proof is standard (we provide the proof for completeness).
Proposition 2.4.
For any there is depending only on with the following property. Let be a centered –subgaussian variable of unit variance. Then
Proof.
Since is –subgaussian, all moments of are bounded, and, in particular, for some depending only on . Applying the Paley–Zygmund inequality, we get
(3)
Let be the largest number in such that
We will show by contradiction that . Assume the contrary. Then implying, in view of (3),
On the other hand,
We conclude that , leading to a contradiction. ∎
Remark.
Instead of the requirement that the variable is subgaussian in the above proposition, we could use the assumption for a fixed .
2.3 Moderate deviations for linear combinations
In this section we are interested in upper and lower bounds on probabilities
where is a fixed vector and are mutually independent centered subgaussian variables of unit variances. Under certain assumptions on and , the probability bounds match (up to term in the power of exponent) those in the Gaussian setting. The estimates below are examples of moderate deviations results for sums of mutually independent variables (see, in particular, [15, Section 3.7]), and can be obtained with help of standard Bernstein–type bounds and the change of measure method. However, we were not able to locate the specific statements that we need in the literature, and provide proofs for completeness.
Proposition 2.5 (Moderate deviations: upper bound).
Fix any . Let , and for each let be a non-random unit vector in , and let be a sequence of numbers such that converges to infinity. Assume that for every and all large , the vector is –incompressible. For each , let be mutually independent –subgaussian variables of zero means and unit variances. Then for every and all large we have
Proof.
Throughout the proof, we assume that are fixed. Let . We have, by Markov’s inequality,
Since has zero mean and unit variance, for every we have
Hence, for all such that ,
where the implicit constant in depends on only. On the other hand, for every we can write (see, in particular, [50, Section 2.5])
for some depending only on . Observe that, by the assumptions on the cost vectors , there is a sequence of numbers converging to zero as , such that
Then, combining the above estimates, we obtain
The result follows. ∎
Proposition 2.6 (Moderate deviations: lower bound).
Fix any . Let , and for each let be a non-random unit vector in , and let be a sequence of numbers such that converges to infinity. Assume that . For each , let be mutually independent –subgaussian variables of zero means and unit variances. Then for every and all large we have
Proof.
We shall apply the standard technique of change of measure. Define , and for each set
(4)
For every , let be the random variable defined via its cumulative distribution function
where is the probability measure on induced by , and such that are mutually independent. Denote by the set of all in such that . We have
To estimate , we will compute the means and the variances of . We get
and
Thus, in view of (4),
and
It follows that . To complete the proof, it remains to note that
∎
3 The lower bound on in subgaussian setting
The main result of this section is
Theorem 3.1 (Lower bound on ).
Let . Assume that and that for each , the entries of the coefficient matrix are mutually independent centered –subgaussian variables of unit variances. Assume further that the non-random unit cost vectors satisfy
for every there is such that for all , is –incompressible.
Let be the optimal objective value of (1). Then for any constant and all sufficiently large , we have
In particular, almost surely.
Although the assumptions on the cost vectors in Theorem 3.1 may appear excessively technical, we conjecture that they are optimal in the following sense: whenever there is such that is –compressible for infinitely many , there is (independent of ) and a sequence of random matrices with mutually independent centered entries of unit variances and subgaussian moments bounded by , such that almost surely. We refer to Section 6.2 for numerical experiments regarding the assumptions on the cost vectors.
As we mentioned in the introduction, the main idea of our proof of Theorem 3.1 is to construct a feasible solution to the LP (1) satisfying via an iterative process. We start by defining a vector as the multiple of the cost vector . Our goal is to add a perturbation to which would restore its feasibility. We shall select a vector which would repair the constraints violated by , i.e for all with . The vector may itself introduce some constraint violations; to repair those constraints, we will find another vector , and consider , etc. At the –th iteration, the vector will be chosen as a linear combination of the matrix rows having large scalar products with . The process continues for some finite number of steps which depends on the values of and . We shall prove that the resulting vector is feasible with high probability. To carry out this argument, we start with a number of preliminary estimates concerning subgaussian vectors as well as certain estimates on the condition numbers of random matrices.
We write for the set of integers . Further, given a finite subset of integers , we write for the linear real space of –dimensional vectors with components indexed over . Given a set or an event , we will write for the indicator of the set/event.
3.1 Auxiliary results
Proposition 2.3 and a simple union bound argument yield
Corollary 3.2.
For every there is a number depending only on with the following property. Let , let be a –set, and let be a random vector in with mutually independent centered –subgaussian components of unit variance. Then the event
The number of indices with is at most , and
has probability at least .
Proof.
Set
where is chosen sufficiently large (the specific requirements on can be extracted from the argument below). Since the vector is subgaussian, every component of satisfies
for some depending only on . Hence, the probability that more than components of are greater than , can be estimated from above by
(5)
If is chosen so that then the last expression in (5) is bounded above by .
Similarly to the above and assuming is sufficiently large, in view of Proposition 2.3, for any choice of a –subset , the probability that the vector has the Euclidean norm greater than , is bounded above by
Combining the two observations, we get the result. ∎
Our next observation concerns random matrices with subgaussian entries. Recall that the largest and smallest singular values of an () matrix are defined by
and
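In standard notation (the matrix symbol and the dimensions below are chosen for illustration), for an n × m matrix B these are

```latex
\[
  s_{\max}(B) \;=\; \max_{x\in S^{m-1}} \|Bx\|_{2} ,
  \qquad
  s_{\min}(B) \;=\; \min_{x\in S^{m-1}} \|Bx\|_{2} .
\]
```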
Proposition 3.3.
For every there are depending only on with the following property. Let and let be an random matrix with mutually independent centered subgaussian entries of unit variance and subgaussian moment at most . Further, let be a non-random projection operator of rank i.e is the orthogonal projection onto an –dimensional subspace of . Then
Proof.
Let be the unit vector in the null space of (note that it is determined uniquely up to changing the sign).
Fix any unit vector . Then the random vector has mutually independent components of unit variance, and is subgaussian, with the subgaussian moment of the components depending only on (see [50, Section 2.6]). Applying Proposition 2.3, we get
(6)
where only depends on . Since the projection operator acts as a contraction we have everywhere on the probability space. On the other hand, , where, again according to [50, Section 2.6], is a subgaussian random variable with the subgaussian moment only depending on , and therefore
(7)
for some depending only on .
The rest of the proof of the proposition is based on the standard covering argument (see, for example, [50, Chapter 4]), and we only sketch the idea. Consider a Euclidean –net on (for a sufficiently small constant ) i.e a non-random discrete subset of the Euclidean sphere such that for every vector in there is some vector in at distance at most from . It is known that can be chosen to have cardinality at most [50, Section 4.2]. In view of the last probability estimate, the event
has probability at least . Everywhere on that event, we have (see [50, Section 4.4])
and
and, in particular,
assuming that is a sufficiently small positive constant. It remains to note that, as long as the aspect ratio is sufficiently large, the quantity is exponentially small in . ∎
Remark.
It can be shown that and with probability exponentially close to one for arbitrary constant as long as is sufficiently large.
Corollary 3.4.
For every there are depending only on with the following property. Let , and let be a non-random projection operator of rank . Further, let be independent random vectors in with mutually independent centered –subgaussian components of unit variance. Let be any non-random vector in . Then with probability there is a unique vector in the linear span of such that
(8)
and, moreover, that vector satisfies .
Proof.
The next lemma provides a basis for a discretization argument which we will apply in the proof of the theorem.
Lemma 3.5.
There is a universal constant with the following property. Let . Then there is a set of vectors in satisfying all of the following:
-
•
All vectors in have non-negative components taking values in ;
-
•
The Euclidean norm of every vector in is in the interval ;
-
•
For every unit vector there is a vector such that , ;
-
•
The size of is at most .
Proof.
Define as the collection of all vectors in with non-negative components taking values in and with the Euclidean norm in the interval . Observe that for every unit vector , by defining coordinate-wise as
we get a vector with Euclidean norm in the interval , i.e a vector from . Thus, constructed this way satisfies the first three properties listed above. It remains to verify the upper bound on .
To estimate the size of we apply the standard volumetric argument. Observe that can be bounded from above by the number of integer lattice points in the rescaled Euclidean ball of radius . That, in turn, is bounded above by the maximal number of non-intersecting parallel translates of the unit cube which can be placed inside the ball . Comparison of the Lebesgue volumes of and confirms that the latter is of order , completing the proof. ∎
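For reference, the volumetric bound used above can be stated in general form as follows (with R denoting the radius of the ball and m the dimension, symbols chosen here for illustration):

```latex
\[
  \#\big(\mathbb{Z}^{m}\cap R\,B_2^{m}\big)
  \;\le\;
  \frac{\operatorname{vol}\big((R+\sqrt{m})\,B_2^{m}\big)}{\operatorname{vol}\big([0,1]^{m}\big)}
  \;=\;(R+\sqrt{m})^{m}\,\operatorname{vol}\big(B_2^{m}\big),
\]
% since the cubes z+[0,1]^m, with z ranging over the lattice points of the ball,
% are pairwise non-overlapping and are all contained in the ball (R+\sqrt{m})B_2^m.
```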
3.2 Proof of Theorem 3.1
From now on, we fix a small constant .
First, suppose that . Observe that in this case . Define , so that by the assumptions of the theorem, for every constant and all large enough , the cost vectors are –incompressible. Hence, applying Proposition 2.5, we get for every and all large enough ,
implying
We infer that, by considering , the optimal objective value of the LP satisfies
with probability , and the statement of the theorem follows. In view of the above, to complete the proof it is sufficient to verify the statement under the assumption . In what follows, we assume that for some number independent of .
We shall construct a “good” event of high probability, condition on a realization of the matrix from that event, and then follow the iterative scheme outlined at the beginning of the section to construct a feasible solution. Set
and define an integer as
Definition of a “good” event. Define
At most components of the vector are greater than ,
where is a sufficiently large universal constant whose value can be extracted from the argument below. Observe that, by our definition of and by our assumption that is small,
whence for every , in view of the definition of ,
Define , and observe that, by the assumptions of the theorem, for every constant and all large enough , the cost vectors are –incompressible. Hence, we can apply Proposition 2.5 to get for all large enough ,
In particular, the probability that for more than indices , is bounded above by
On the other hand, for every , by Proposition 2.3 and by the union bound over –subsets of , the probability that the sum of squares of largest components of is greater than , is bounded above by
where is a universal constant. Taking to be a sufficiently large constant multiple of and combining the above estimates, we obtain that the event has probability .
Fix for a moment any subset of size at most and at least . Further, denote by the non-random discrete subset of vectors in with the properties listed in Lemma 3.5. Denote by the projection onto the orthogonal complement of the cost vector . Define the event
For every there is a unique vector in the linear span of , ,
such that , ; moreover, that vector satisfies:
(a) ;
(b) the number of indices with is at most ;
where is a sufficiently large constant depending only on . By combining Corollaries 3.2 and 3.4, we obtain that the event has probability at least , whence the intersection of events taken over the collection of all subsets of of size at least and at most , has probability .
Define the “good” event
so that . Our goal is to show that, conditioned on any realization of from , the optimal objective value of the LP is at least . From now on, we assume that the matrix is non-random and satisfies the conditions from the definition of .
Definition of . Let be the –subset of corresponding to largest components of the vector (we resolve ties arbitrarily). In view of our conditioning on , for every we have
Denote by the restriction of the vector to , and let be a vector which majorizes coordinate-wise. Note that in view of the definition of , the Euclidean norm of is of order . Further, let be the unique vector in the linear span of , , such that , , satisfying the conditions (a), (b), (c) from the definition of . Define , so that , . Observe that with such a definition of , the sum satisfies
whereas . Applying the definition of the event and the bound , we get that the vector satisfies
The number of indices with is at most ;
where depends only on . We will simplify the conditions on . First, observe that in view of the definition of and since , we can assume that . Further, we can estimate from above by . Thus, the vector satisfies
The number of indices with is at most ;
Definition of . The vectors are constructed via an inductive process. Let . We define numbers recursively by the formula
With large, we then have
Further, assuming the vector has been defined, we let be a –subset of corresponding to largest components of (with ties resolved arbitrarily).
Let , and assume that a vector has been constructed that satisfies
The number of indices with is at most ;
(observe that satisfies the above assumptions, which provides a basis of induction). We denote by the restriction of the vector to (note that has the Euclidean norm at most by the induction hypothesis), and let be a vector which majorizes coordinate-wise. Similarly to the above, we let be the unique vector in the linear span of , , such that , , and satisfying the conditions (a), (b), (c) from the definition of , and define , so that , . By the definition of , the vector satisfies
(9)
for some constant . It is not difficult to check that, with our definition of and the assumption and as long as is large, the quantity is (much) less than ; further, is dominated by . Thus, satisfies
The number of indices with is at most ;
completing the induction step.
Restoring feasibility. The above process produces vectors , where satisfies, in view of (9) and the choice of ,
so that, in particular, for every . Note also that for every and , we have
To complete the proof, we have to verify that the sum is a feasible solution to the linear program. Fix any index . Let be the Boolean indicator of the expression “”, for each , let be the indicator of “”, and set . Note that, in view of the definition of the sets , only if . The set can be partitioned into consecutive subsets in such a way that
-
•
The size of each set is either or , and
-
•
Whenever for some , the index belongs to a –subset from the partition, and it is the smaller of the two indices in that subset.
We have
Fix for a moment any and consider the corresponding sum . First, assume that is a singleton, and let be its element. By our construction of the partition, we have .
-
•
If then
-
•
If then
-
•
If then
Next, assume that the size of is two, and represent the subset as . By the construction of the vectors and , we then have
Combining the two possible cases and summing over , we get
and the proof is finished.
3.3 Applications of Theorem 3.1
As a simple sufficient condition on the cost vectors which implies the assumptions in Theorem 3.1, we consider a bound on the –norms which, in particular, implies the lower bound in the main Theorem 1.2:
Corollary 3.6 (Lower bound on under sup-norm delocalization of the cost vectors).
Let satisfy , and let matrices be as in Theorem 3.1. Consider a sequence of non-random cost vectors satisfying . Then for any constant and all sufficiently large ,
implying that almost surely.
Proof.
Fix any constant , and let be any subset of of size at most . In view of the assumptions on , we have
Since the choice of the subsets was arbitrary, the last relation implies that is –incompressible for all large . It remains to apply Theorem 3.1. ∎
The next corollary constitutes the lower bound in Corollary 1.4:
Corollary 3.7 (The mean width: a sharp lower bound).
Let and be as in Theorem 3.1, and assume additionally that . Then the spherical mean width of the polyhedron satisfies
Proof.
In this proof, we assume that for each , is a uniform random unit vector in , and that , , are mutually independent and independent from . Recall that the mean width of the polyhedron is defined as
where denotes the expectation taken with respect to . It will be most convenient for us to prove the statement by contradiction. Namely, assume that there is a small positive number such that
Denote by the –field generated by the random matrices . Observe that the event
is –measurable. Condition for a moment on any realization of the polyhedra , , from , and let be an increasing sequence of integers such that for every . The condition implies
and hence
where is the conditional probability given the realization of , , from . By the Borel–Cantelli lemma, the last assertion yields
Removing the conditioning on , we arrive at the estimate
(10)
In the second part of the proof, we will show that the assertion (10) leads to a contradiction. Fix for a moment any . We claim that is –incompressible for every large enough with probability . Indeed, the standard concentration inequality on the sphere (see, in particular, [50, Chapter 5]) implies that for every choice of a non-random –subset ,
for some universal constants . Taking the union bound, we then get
Our assumption implies that
For all such , the last estimate with yields
(11)
It remains to observe that as long as , we have
so that the expression on the right-hand side of (11) is bounded above by . This verifies the claim.
4 The upper bound on in subgaussian setting
The goal of this section is to prove the following result:
Theorem 4.1 (Upper bound on ).
Fix any . Let , . For each let be a non-random unit vector in , and assume that . For each , let be an matrix with mutually independent centered –subgaussian entries of unit variances. Then for every and all large we have
In particular, almost surely.
Observe that the statement of the theorem is equivalent to the assertion that the intersection of the feasible polyhedron with the affine hyperplane
is empty with probability . To prove the result, we will combine the moderate deviations estimate from Proposition 2.6 with a covering argument.
4.1 Auxiliary results
Proposition 2.6 cannot be directly applied in our setting since the vectors in generally do not satisfy (even if normalized) the required –norm bound. To overcome this issue, we generalize the deviation bound to sums of two orthogonal vectors where one of the vectors satisfies a strong –norm estimate:
Proposition 4.2.
Fix any . Let , and for each let be a non-random unit vector in , be any fixed vector in , and let be a sequence of numbers such that converges to infinity. Assume that . For each , let be mutually independent centered –subgaussian variables of unit variances. Then for every and all large we have
Proof.
Fix a small . In view of the assumptions on , there is a sequence of positive numbers converging to zero such that for all . We shall consider two scenarios:
-
•
. Observe that , whereas the variable is centered and –subgaussian, for some depending only on . Proposition 2.4 then implies that, assuming is sufficiently large,
for some depending only on .
-
•
. Let be a –subset of corresponding to largest (in absolute value) components of , with ties resolved arbitrarily. Further, let be the restriction of to . In view of the condition , we have
Further, in view of orthogonality of and ,
whereas,
We infer that . Thus, setting , we obtain a unit vector with . Applying Proposition 2.6 with in place of and in place of , we get
assuming is sufficiently large. On the other hand, again applying Proposition 2.4,
for some depending only on . Combining the last two relations, we get the desired estimate.
∎
To apply the last proposition to our setting of interest, we shall bound the number of constraints violated by a test vector:
Corollary 4.3.
Let , , , and be as in Theorem 4.1, and let be any sequence of vectors orthogonal to . Then for every small constant and all large we have
with probability at least .
Proof.
In view of Proposition 4.2, applied with , for all large and every we have
Hence, the probability that the last condition holds for less than indices , is bounded above by
for all sufficiently large . ∎
Recall that the outer radius of a polyhedron is defined as .
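One common convention, consistent with how the outer radius is used in the proof below, is the following (with P a placeholder symbol for the polyhedron):

```latex
\[
  \operatorname{Rad}(P) \;=\; \sup_{x\in P}\ \|x\|_{2} .
\]
```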
Proposition 4.4 (An upper bound on the outer radius).
Let be as in Theorem 4.1. Then with probability , the polyhedron has the outer radius bounded above by a function of .
Proof.
Let be a small parameter to be defined later, and let be a Euclidean –net on of size at most [50, Chapter 4]. Proposition 2.4 implies that for some depending only on and for every and ,
Define as the largest number satisfying
Then, by the above, for every ,
Setting and using the above bound on the size of , we obtain that the event
has probability (recall that the with probability ; see [50, Chapter 4]). Condition on any realization of from the above event. Assume that there exists . Let be a vector satisfying . In view of the conditioning, there are at least indices such that . Consequently,
Since , we get
contradicting our choice of . The contradiction shows that the outer radius of is at most , and the claim is verified. ∎
4.2 Proof of Theorem 4.1, and an upper bound on
Proof of Theorem 4.1.
In view of Proposition 4.4, with probability the outer radius of is bounded above by , for some depending only on .
Fix any small . By the above, to complete the proof of the theorem it is sufficient to show that with probability , every vector orthogonal to and having the Euclidean norm at most , satisfies
Let be the largest integer bounded above by
Define , where is a sufficiently small constant depending only on and (the value of can be extracted from the argument below). Let be a Euclidean –net in . It is known that can be chosen to have cardinality at most (see [50, Chapter 4]).
Fix for a moment any , and observe that, in view of Corollary 4.3 and the definition of , we have
(12)
with probability at least
Therefore, assuming that is sufficiently small and taking the union bound over all elements of , we obtain that with probability , (12) holds simultaneously for all .
Condition on any realization of such that the last event holds and, moreover, such that . Assume for a moment that there is such that
In view of the above, it implies that there is a vector at distance at most from such that
for at least indices . Hence, the spectral norm of can be estimated as
contradicting the bound . The contradiction implies the result. ∎
The next corollary constitutes the upper bound in Corollary 1.4. Since its proof is very similar to that of Corollary 3.7, we only sketch the argument.
Corollary 4.5.
Let be as in Theorem 4.1, and assume additionally that . Then, almost surely.
Proof.
In view of Proposition 4.4, there is depending only on such that the event
has probability one, where, as before, denotes the outer radius of the polyhedron . Set
We will prove the corollary by contradiction. Assume that the assertion of the corollary does not hold. Then, in view of the above definitions, there is such that
Condition for a moment on any realization of the polyhedra , , from , and let be an increasing sequence of integers such that and for every . Assume that , , are mutually independent uniform random unit vectors which are also independent from the matrices . The conditions
imply that for some depending only on and ,
where is the conditional probability given the realization of , , from . By the Borel–Cantelli lemma, the last assertion yields
Removing the conditioning on , we arrive at the estimate
(13)
The standard concentration inequality on the sphere [50, Chapter 5] and the conditions on imply that with probability , and therefore, by Theorem 4.1,
The latter contradicts (13), and the result follows. ∎
5 The Gaussian setting
In this section, we prove Theorem 1.3 and Corollary 1.5. Note that in view of rotational invariance of the Gaussian distribution, we can assume without loss of generality that the cost vectors in Theorem 1.3 are of the form . Thus, in the regime , the statement is already covered by the more general results of Sections 3 and 4.
5.1 The outer radius of random Gaussian polytopes
In this subsection, we derive a bound on the outer radius of random Gaussian polytopes. The next result immediately implies the upper bounds on and in Theorem 1.3 and Corollary 1.5 in the entire range . We anticipate that the statement is known, although we are not aware of a reference, and for that reason we provide a proof.
Proposition 5.1 (Outer radius of the feasible region in the Gaussian setting).
Let satisfy , and for every let be an random matrix with i.i.d standard Gaussian entries. Then, denoting by the outer radius of the feasible region, for any constant we have
In particular, almost surely.
The following approximation of the standard Gaussian distribution is well known (see, for example, [19, Volume I, Chapter VII, Lemma 2]):
Lemma 5.2.
Let be a standard Gaussian variable. Then for every ,
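In standard notation, the estimate from [19, Volume I, Chapter VII, Lemma 2] reads: for a standard Gaussian variable Z and every t > 0,

```latex
\[
  \Big(\frac{1}{t}-\frac{1}{t^{3}}\Big)\frac{e^{-t^{2}/2}}{\sqrt{2\pi}}
  \;<\;\mathbb{P}\{Z>t\}\;<\;
  \frac{1}{t}\cdot\frac{e^{-t^{2}/2}}{\sqrt{2\pi}} .
\]
```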
Let be a standard Gaussian random vector in , and be the non-decreasing rearrangement of its components. Let . Then, from the above lemma, for every we have
(14)
As a corollary, we obtain the following deviation estimate:
Corollary 5.3.
Fix any . Let sequences of positive integers , , and , , satisfy , , and . Further, for each , let be a standard Gaussian vector in . Then
Proof.
Proof of Proposition 5.1.
Fix any . Observe that the statement is equivalent to showing that with probability the matrix has the property that for every there is with .
The argument below is similar to the proof of Proposition 4.4. Let be the largest integer bounded above by
Define , where is a sufficiently small constant depending only on . Let be a Euclidean –net in of cardinality at most (see [50, Chapter 4]).
Fix for a moment any , and observe that, in view of Corollary 5.3 and the definition of , we have
(15)
with probability at least . Therefore, assuming that is sufficiently small and taking the union bound over all elements of , we obtain that with probability , (15) holds simultaneously for all .
Condition on any realization of such that the last event holds and, moreover, such that (recall that the latter occurs with probability ; see [50, Chapter 4]). Assume for a moment that there is such that for all . In view of the above, it implies that there is a vector at distance at most from such that
for at least indices . Hence, the spectral norm of can be estimated as
contradicting the bound . The contradiction shows that for every there is such that . The result follows. ∎
5.2 Proof of Theorem 1.3 and Corollary 1.5
As we have noted, the upper bounds in Theorem 1.3 and Corollary 1.5 follow readily from Proposition 5.1, and so we only need to verify the lower bounds. In view of results of Section 3 which also apply in the Gaussian setting, we may assume that satisfies .
Regarding the proof of Theorem 1.3, since is a standard Gaussian vector, it follows from the formula for the Gaussian density that
whereas, by the assumption , we have . This implies the result.
6 Numerical experiments
Our results described in the previous sections naturally give rise to the following questions: (a) how close are the asymptotic estimates of the optimal objective value to the empirical observations? (b) what is the asymptotic distribution and standard deviation of ? (c) how does the algorithm in the proof of Theorem 3.1 perform in practice? We discuss these questions in the following subsections, and make conjectures while providing numerical evidence. We use Gurobi 10.0.1 for solving instances of the LP (1), and set all Gurobi parameters to default.
6.1 Magnitude of the optimal objective value
Below, we investigate the quality of approximation of the optimal objective value by the asymptotic bound given in Theorems 1.2 and 1.3. The empirical outcomes are obtained through multiple runs of the LP (1) under various choices of parameters. We consider two distributions of the entries of : either is standard Gaussian or its entries are i.i.d Rademacher () random variables. In either case and for different choices of , we sample the random LP (1) taking the sample size and letting be the vector of i.i.d Rademacher variables rescaled to have unit Euclidean norm. The cost vector is generated once and remains the same for each of the instances of the LP within a sample.
| constraints | variables | asymptotic estimate | sample mean | relative gap (%) |
|---|---|---|---|---|
| 1000 | 50 | 0.40853 | 0.50626 | 23.92 |
| 2000 | 50 | 0.36816 | 0.44119 | 19.83 |
| 6000 | 50 | 0.32317 | 0.37256 | 15.28 |
| 10000 | 50 | 0.30719 | 0.35176 | 14.50 |
| 20000 | 50 | 0.28888 | 0.32473 | 12.41 |
As is shown in Tables 1 and 2, for a fixed value of , as the number of constraints progressively increases in magnitude, the relative gap between and the sample mean of the optimal objective values, defined as
decreases.
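A definition of the relative gap consistent with the values reported in the tables (stated here as an inference from the tabulated data) is

```latex
\[
  \text{relative gap} \;=\;
  \frac{\text{sample mean of the optimal values}-\text{asymptotic estimate}}
       {\text{asymptotic estimate}}\times 100\% .
\]
```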
| constraints | variables | asymptotic estimate | sample mean | relative gap (%) |
|---|---|---|---|---|
| 1000 | 50 | 0.40853 | 0.50369 | 23.29 |
| 2000 | 50 | 0.36816 | 0.43801 | 18.97 |
| 6000 | 50 | 0.32317 | 0.37200 | 15.11 |
| 10000 | 50 | 0.30719 | 0.35220 | 14.65 |
| 20000 | 50 | 0.28888 | 0.32856 | 13.73 |
6.2 Structural assumptions on the cost vector
In this subsection, we carry out a numerical study of the technical conditions on the cost vectors from Theorem 1.2. Roughly speaking, those conditions stipulate that the cost vector has significantly more than “essentially non-zero” components. Since Theorem 1.2 is an asymptotic result, it is not at all clear what practical assumptions on the cost vectors should be imposed to guarantee that the resulting optimal objective value is close to the asymptotic bound.
| cost vector parameter | sample mean | relative gap (%) |
|---|---|---|
| 1 | 0.429907 | 15.08 |
| 2 | 0.454357 | 10.25 |
| 3 | 0.459387 | 9.25 |
| 4 | 0.479007 | 5.38 |
| 5 | 0.481839 | 4.82 |
| 6 | 0.486141 | 3.97 |
| 7 | 0.479635 | 5.25 |
| 8 | 0.493168 | 2.58 |
| 9 | 0.490829 | 3.04 |
| 10 | 0.498035 | 1.62 |
In the following experiment, we consider a random coefficient matrix with i.i.d subgaussian components equidistributed with the product of Bernoulli() and N(,) random variables. Note that with this definition the entries have zero mean and unit variance. Further, we consider a family of cost vectors parametrized by an integer , so that the vector has non-zero components equal to each, and the rest of the components are zeros. For a given choice of , we sample the LP (1) with the above distribution of the entries and the cost vector . Our goal is to compare the resulting sample mean of the optimal objective value to the sample mean obtained in the previous subsection for the same matrix dimensions and for the strongly incompressible cost vector. We fix and , and let be the sample mean of the optimal objective value of LP (1) with the cost vector . We denote the sample mean of the optimal objective value with the strongly incompressible cost vector by . Our empirical results show that, as long as is small (so that is very sparse), the value of the sample mean differs significantly from the one in Table 1. On the other hand, for sufficiently large the relative gap between the sample means, defined by
becomes negligible. The results are presented in Table 3.
6.3 Limiting distribution of the optimal objective value
Recall that, given mutually independent standard Gaussian variables, their maximum converges (up to appropriate rescaling and translation) to the Gumbel distribution as the number of variables tends to infinity. If the vertices of the random polyhedron were behaving like independent Gaussian vectors (with a proper rescaling) then the limiting distribution of for any given fixed cost vector would be asymptotically Gumbel. Our computations suggest that in fact the limiting distribution is Gaussian.
[Figures 3 and 4: frequency histograms and empirical cumulative distribution functions of the sampled optimal objective values for the Gaussian and Rademacher models.]
In our numerical experiments, we consider Gaussian and Rademacher random coefficient matrices of dimensions and . The sample size in either case is taken to be . As in the previous experiment, we take to be a random vector with i.i.d Rademacher components rescaled to have Euclidean norm one. We generate the cost vector once, so that it is the same for every LP in a sample. We employ the Kolmogorov–Smirnov (KS) test to compare the sample distribution of with a Gaussian of the same mean and variance. In either case, the obtained p-value exceeds the conventional significance level of 0.05, indicating a lack of substantial evidence against the null hypothesis that the data are normally distributed. Further, the frequency histograms and the empirical cumulative distribution functions of the optimal values shown in Figures 3 and 4 support this conjecture.
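The following is a minimal sketch of this normality check; the sample sizes and the use of SciPy in place of Gurobi are illustrative assumptions.

```python
# A minimal sketch of the normality check described above (not the authors' code).
import numpy as np
from scipy.optimize import linprog
from scipy.stats import kstest

rng = np.random.default_rng(1)
n_constraints, n_vars, sample_size = 2000, 50, 200

c = rng.choice([-1.0, 1.0], size=n_vars)
c /= np.linalg.norm(c)                      # the cost vector is fixed across the sample

opt_vals = np.empty(sample_size)
for i in range(sample_size):
    A = rng.standard_normal((n_constraints, n_vars))
    res = linprog(-c, A_ub=A, b_ub=np.ones(n_constraints),
                  bounds=(None, None), method="highs")
    opt_vals[i] = -res.fun

# Compare the normalized sample with the standard Gaussian distribution.
z = (opt_vals - opt_vals.mean()) / opt_vals.std(ddof=1)
print(kstest(z, "norm"))                    # a large p-value is consistent with normality
```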
The next conjecture summarizes the experimental findings:
Conjecture 6.1 (Limiting distribution of ).
Let , , be as in Theorem 1.2, and for each let be uniform random unit cost vector independent from . Then the sequence of normalized random variables
converges in distribution to the standard Gaussian.
6.4 Standard deviation of the optimal objective value
As in the previous two numerical experiments, in this one we consider two types of entry distributions: Gaussian and Rademacher. The cost vector is generated according to the same procedure as above. We sample the LP (1) taking the sample size to be , and compute the sample standard deviation (the square root of the sample variance) of the optimal objective value. Based on our numerical studies we speculate that the standard deviation of is roughly of order , at least in the setting where is very large compared to . Indeed, upon examination of the last column in Tables 4 and 5, we observe a consistent phenomenon: the quantities are of order for various choices of and .
| constraints | variables | asymptotic estimate | sample std. dev. | std. dev. × √(constraints) |
|---|---|---|---|---|
| 1000 | 50 | 0.408539 | 0.02420 | 0.765 |
| 2000 | 50 | 0.368161 | 0.01728 | 0.773 |
| 4000 | 50 | 0.337791 | 0.01333 | 0.843 |
| 6000 | 50 | 0.323170 | 0.01142 | 0.885 |
| 8000 | 50 | 0.313877 | 0.01059 | 0.947 |
| 10000 | 50 | 0.307196 | 0.01027 | 1.027 |
| 1000 | 100 | 0.465991 | 0.03014 | 0.953 |
| 2000 | 100 | 0.408539 | 0.01592 | 0.712 |
| 4000 | 100 | 0.368161 | 0.01172 | 0.742 |
| 6000 | 100 | 0.349456 | 0.00990 | 0.767 |
| 1000 | 150 | 0.528257 | 0.03192 | 1.010 |
| 2000 | 150 | 0.441515 | 0.015066 | 0.674 |
| 4000 | 150 | 0.391745 | 0.01155 | 0.731 |
| 6000 | 150 | 0.368161 | 0.011850 | 0.918 |
| constraints | variables | asymptotic estimate | sample std. dev. | std. dev. × √(constraints) |
|---|---|---|---|---|
| 1000 | 50 | 0.408539 | 0.020420 | 0.645 |
| 2000 | 50 | 0.368161 | 0.017393 | 0.777 |
| 4000 | 50 | 0.337791 | 0.011962 | 0.756 |
| 6000 | 50 | 0.323170 | 0.010207 | 0.790 |
| 8000 | 50 | 0.313877 | 0.010333 | 0.924 |
| 10000 | 50 | 0.307196 | 0.008997 | 0.899 |
| 1000 | 100 | 0.465991 | 0.023097 | 0.730 |
| 2000 | 100 | 0.408539 | 0.014648 | 0.655 |
| 4000 | 100 | 0.368161 | 0.010436 | 0.660 |
| 6000 | 100 | 0.349456 | 0.011791 | 0.913 |
| 1000 | 150 | 0.528257 | 0.032600 | 1.030 |
| 2000 | 150 | 0.441515 | 0.015466 | 0.691 |
| 4000 | 150 | 0.391745 | 0.011030 | 0.697 |
| 6000 | 150 | 0.368161 | 0.009153 | 0.708 |
6.5 An algorithm for finding a near-optimal feasible solution
As previously discussed, the proof of Theorem 3.1 involves an iterative projection process which restores feasibility starting from an initial point. In the process, we discretize the continuous space spanned by the violated constraints at each iteration (i.e use a covering argument) as a means to apply a probabilistic union bound. Recall that the goal of the algorithm described in the proof of Theorem 3.1 is to provide theoretical guarantees for . An exact software implementation of that algorithm is heavy in terms of both the amount of code and computations. On the other hand, it is of considerable interest to measure the performance of block-iterative projection methods in the context of finding a near-optimal solution to instances of the random LP (1). To approach this task, we consider a “light” version of the algorithm from the proof of Theorem 3.1, which does not involve any covering arguments (see Algorithm 1 below). The algorithm is a version of the classical Kaczmarz block-iterative projection method as presented in [18] and its randomized counterparts in [38] and [53]. In our context, the blocks correspond to the rows of the matrix that are violated by the initial point and its updates at each iteration. Table 6 shows the results of running the algorithm for various parameters and . For each instance of the LP, is the number of iterations required to restore feasibility, is the initial point, and is the output of Algorithm 1. Note that the obtained perturbations to lie in the orthogonal complement of the cost vector, and as a result . The numbers and are the numbers of constraints violated by at iteration 1 and at iteration 2, respectively. Note that these numbers are rather small compared to the size of the LP instance. In our experiment, the algorithm did not converge for the instance of the LP, which we attribute to the random nature of the problem. For all other instances, the algorithm converged quickly owing to the small number of constraint violations.
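The following is a minimal sketch of such a scheme; it is an interpretation of the description above rather than a verbatim reproduction of Algorithm 1, and the parameter values in the usage example are illustrative.

```python
# A minimal sketch of a "light" block-iterative projection scheme: starting from
# a multiple of the cost vector, each iteration applies the least-norm correction,
# orthogonal to c, that makes all currently violated constraints tight.
import numpy as np

def block_projection(A, c, t, max_iter=50, tol=1e-9):
    """Attempt to restore feasibility of x = t*c for the system A x <= 1 using
    corrections orthogonal to c; returns the final point and the iteration count."""
    n, m = A.shape
    P = np.eye(m) - np.outer(c, c)            # projector onto the orthogonal complement of c
    x = t * c
    for it in range(max_iter):
        violated = np.where(A @ x > 1.0 + tol)[0]
        if violated.size == 0:
            return x, it                      # feasible; <c, x> = t is preserved throughout
        B = A[violated] @ P                   # violated rows restricted to directions orthogonal to c
        rhs = 1.0 - A[violated] @ x
        z, *_ = np.linalg.lstsq(B, rhs, rcond=None)
        x = x + P @ z                         # least-norm correction making the violated rows tight
    return x, max_iter                        # the scheme may fail to converge on some instances

# Illustrative usage with 2000 constraints and 50 variables; the initial multiple
# is chosen close to the tabulated estimate 0.368161 for these dimensions (Table 1).
rng = np.random.default_rng(2)
A = rng.standard_normal((2000, 50))
c = rng.choice([-1.0, 1.0], size=50)
c /= np.linalg.norm(c)
x_hat, iters = block_projection(A, c, t=0.36)
print("iterations:", iters, "max constraint value:", np.max(A @ x_hat))
```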
| constraints | variables | iterations | objective value of the output | violations (iter. 1) | violations (iter. 2) |
|---|---|---|---|---|---|
| 1000 | 50 | 2 | 0.408539 | 11 | 1 |
| 1000 | 100 | 2 | 0.465991 | 28 | 4 |
| 2000 | 50 | 2 | 0.368161 | 6 | 1 |
| 2000 | 100 | 2 | 0.408539 | 23 | 1 |
| 4000 | 50 | 2 | 0.337791 | 13 | 2 |
| 4000 | 100 | 2 | 0.368161 | 30 | 1 |
| 6000 | 50 | 2 | 0.323170 | 22 | 21 |
| 6000 | 100 | 2 | 0.349456 | 31 | 8 |
| 8000 | 50 | 2 | 0.313877 | 22 | 6 |
| 8000 | 100 | 2 | 0.337791 | 21 | 5 |
| 10000 | 50 | 2 | 0.307196 | 19 | 1 |
| 10000 | 100 | 1 | 0.329505 | 34 | 0 |
| 20000 | 100 | 1 | 0.307196 | 29 | 0 |
| 40000 | 50 | 1 | 0.273493 | 12 | 0 |
| 40000 | 100 | 2 | 0.288881 | 34 | 3 |
| 60000 | 50 | 2 | 0.265558 | 16 | 2 |
| 60000 | 100 | 1 | 0.279576 | 36 | 0 |
| 80000 | 50 | 1 | 0.260329 | 21 | 0 |
| 80000 | 100 | 2 | 0.273493 | 36 | 1 |
| 100000 | 50 | 2 | 0.256479 | 28 | 5 |
| 100000 | 100 | 2 | 0.269040 | 35 | 1 |
7 Conclusion
In this paper, we study the optimal objective value of a random linear program in which the entries of the coefficient matrix have subgaussian distributions, the variables are free, and the right-hand side of each constraint is . We establish sharp asymptotics in the regime , under some additional assumptions on the cost vectors. These asymptotics provide quantitative information about the geometry of the random polyhedron defined by the feasible region of the random LP (1), specifically about its spherical mean width. The computational experiments support our theoretical guarantees and give insights into several open questions regarding the asymptotic behaviour of the random LP. The connection between the proof of the main result and convex feasibility problems allows us to provide a fast algorithm for obtaining a near-optimal solution in our randomized setting. The random LP (1) is a step towards studying models with inhomogeneous coefficient matrices and establishing stronger connections to practical applications.
Acknowledgements
We are grateful to Dylan Altschuler for bringing our attention to [15, Section 3.7].
Declarations
Funding. Konstantin Tikhomirov is partially supported by the NSF grant DMS-2331037.
Conflict of interest/Competing interests. The authors declare no conflict of interest in regard to this publication.
Code availability. The code for the numerical experiments is available at https://github.com/marzb93/RandomLinearProgram
References
- [1] Shmuel Agmon. The relaxation method for linear inequalities. Canad. J. Math., 6:382–392, 1954.
- [2] Ron Aharoni and Yair Censor. Block-iterative projection methods for parallel computation of solutions to convex feasibility problems. Linear Algebra and Its Applications, 120:165–175, 1989.
- [3] David Alonso-Gutiérrez and Joscha Prochno. On the Gaussian behavior of marginals and the mean width of random polytopes. Proc. Amer. Math. Soc., 143(2):821–832, 2015.
- [4] Madan Mohan Babbar. Distributions of solutions of a set of linear equations (with an application to linear programming). Journal of the American Statistical Association, 50(271):854–869, 1955.
- [5] Zhong-Zhi Bai and Wen-Ting Wu. On greedy randomized Kaczmarz method for solving large sparse linear systems. SIAM J. Sci. Comput., 40(1):A592–A606, 2018.
- [6] Charles Blair. Random linear programs with many variables and few constraints. Math. Programming, 34(1):62–71, 1986.
- [7] K.-H. Borgwardt. The average number of pivot steps required by the simplex-method is polynomial. Z. Oper. Res. Ser. A-B, 26(5):A157–A177, 1982.
- [8] Karl-Heinz Borgwardt. The simplex method, volume 1 of Algorithms and Combinatorics: Study and Research Texts. Springer-Verlag, Berlin, 1987. A probabilistic analysis.
- [9] Karl Heinz Borgwardt. A sharp upper bound for the expected number of shadow vertices in LP-polyhedra under orthogonal projection on two-dimensional planes. Math. Oper. Res., 24(3):544–603, 1999.
- [10] Karthekeyan Chandrasekaran and Santosh S Vempala. Integer feasibility of random polytopes: random integer programs. In Proceedings of the 5th conference on Innovations in theoretical computer science, pages 449–458, 2014.
- [11] Sergei Chubanov. A polynomial projection algorithm for linear feasibility problems. Math. Program., 153(2, Ser. A):687–713, 2015.
- [12] Daniel Dadush and Sophie Huiberts. A friendly smoothed analysis of the simplex method. In STOC’18—Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 390–403. ACM, New York, 2018.
- [13] N. Dafnis, A. Giannopoulos, and A. Tsolomitis. Asymptotic shape of a random polytope in a convex body. J. Funct. Anal., 257(9):2820–2839, 2009.
- [14] Jesús A. De Loera, Jamie Haddock, and Deanna Needell. A sampling Kaczmarz-Motzkin algorithm for linear feasibility. SIAM J. Sci. Comput., 39(5):S66–S87, 2017.
- [15] Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics (New York). Springer-Verlag, New York, second edition, 1998.
- [16] Martin E Dyer, Alan M Frieze, and Colin JH McDiarmid. On linear programs with random costs. Mathematical Programming, 35:3–16, 1986.
- [17] Yonina C. Eldar and Deanna Needell. Acceleration of randomized Kaczmarz method via the Johnson-Lindenstrauss lemma. Numer. Algorithms, 58(2):163–177, 2011.
- [18] Tommy Elfving. Block-iterative methods for consistent and inconsistent linear equations. Numerische Mathematik, 35:1–12, 1980.
- [19] William Feller. An introduction to probability theory and its applications, Volume 2, volume 81. John Wiley & Sons, 1991.
- [20] Apostolos Giannopoulos and Marianna Hartzoulaki. Random spaces generated by vertices of the cube. Discrete Comput. Geom., 28(2):255–273, 2002.
- [21] Apostolos Giannopoulos, Labrini Hioni, and Antonis Tsolomitis. Asymptotic shape of the convex hull of isotropic log-concave random vectors. Adv. in Appl. Math., 75:116–143, 2016.
- [22] E. D. Gluskin. An octahedron is poorly approximated by random subspaces. Funktsional. Anal. i Prilozhen., 20(1):14–20, 96, 1986.
- [23] E. D. Gluskin. Extremal properties of orthogonal parallelepipeds and their applications to the geometry of Banach spaces. Mat. Sb. (N.S.), 136(178)(1):85–96, 1988.
- [24] Robert M. Gower, Denali Molitor, Jacob Moorman, and Deanna Needell. On adaptive sketch-and-project for solving linear systems. SIAM J. Matrix Anal. Appl., 42(2):954–989, 2021.
- [25] Robert M. Gower and Peter Richtárik. Randomized iterative methods for linear systems. SIAM J. Matrix Anal. Appl., 36(4):1660–1690, 2015.
- [26] Sophie Huiberts, Yin Tat Lee, and Xinzhi Zhang. Upper and lower bounds on the smoothed complexity of the simplex method. In STOC’23—Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 1904–1917. ACM, New York, [2023] ©2023.
- [27] Yuling Jiao, Bangti Jin, and Xiliang Lu. Preasymptotic convergence of randomized Kaczmarz method. Inverse Problems, 33(12):125012, 21, 2017.
- [28] S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l’Académie Polonaise des Sciences et des Lettres, 1937. English translation by Jason Stockmann.
- [29] Marcel Klatt, Axel Munk, and Yoav Zemel. Limit laws for empirical optimal solutions in random linear programs. Annals of Operations Research, 315(1):251–278, 2022.
- [30] A. E. Litvak, A. Pajor, M. Rudelson, and N. Tomczak-Jaegermann. Smallest singular value of random matrices and geometry of random polytopes. Adv. Math., 195(2):491–523, 2005.
- [31] Ji Liu and Stephen J. Wright. An accelerated randomized Kaczmarz algorithm. Math. Comp., 85(297):153–178, 2016.
- [32] Shuyu Liu, Florentina Bunea, and Jonathan Niles-Weed. Asymptotic confidence sets for random linear programs. arXiv preprint arXiv:2302.12364, 2023.
- [33] Yong Liu and Chuan-Qing Gu. On greedy randomized block Kaczmarz method for consistent linear systems. Linear Algebra Appl., 616:178–200, 2021.
- [34] Anna Ma, Deanna Needell, and Aaditya Ramdas. Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods. SIAM J. Matrix Anal. Appl., 36(4):1590–1604, 2015.
- [35] Md Sarowar Morshed, Md Saiful Islam, and Md. Noor-E-Alam. Accelerated sampling Kaczmarz Motzkin algorithm for the linear feasibility problem. J. Global Optim., 77(2):361–382, 2020.
- [36] Md Sarowar Morshed, Md Saiful Islam, and Md Noor-E-Alam. Sampling kaczmarz-motzkin method for linear feasibility problems: generalization and acceleration. Mathematical Programming, 194(1-2):719–779, 2022.
- [37] Theodore Samuel Motzkin and Isaac Jacob Schoenberg. The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6:393–404, 1954.
- [38] Deanna Needell and Joel A. Tropp. Paved with good intentions: analysis of a randomized block Kaczmarz method. Linear Algebra Appl., 441:199–221, 2014.
- [39] Yu-Qi Niu and Bing Zheng. A greedy block Kaczmarz algorithm for solving large-scale linear systems. Appl. Math. Lett., 104:106294, 8, 2020.
- [40] Gilles Pisier. The volume of convex bodies and Banach space geometry, volume 94 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge, 1989.
- [41] András Prékopa. On the probability distribution of the optimum of a random linear program. SIAM Journal on Control, 4(1):211–222, 1966.
- [42] Mark Rudelson and Roman Vershynin. The Littlewood-Offord problem and invertibility of random matrices. Adv. Math., 218(2):600–633, 2008.
- [43] Steve Smale. On the average number of steps of the simplex method of linear programming. Math. Programming, 27(3):241–262, 1983.
- [44] Daniel A. Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J. ACM, 51(3):385–463, 2004.
- [45] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
- [46] Michel Talagrand. Mean field models for spin glasses. Volume I, volume 54 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer-Verlag, Berlin, 2011. Basic examples.
- [47] Michel Talagrand. Mean field models for spin glasses. Volume II, volume 55 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer, Heidelberg, 2011. Advanced replica-symmetry and low temperature.
- [48] Jan Telgen. On relaxation methods for systems of linear inequalities. European J. Oper. Res., 9(2):184–189, 1982.
- [49] Roman Vershynin. Beyond Hirsch conjecture: walks on random polytopes and smoothed complexity of the simplex method. SIAM J. Comput., 39(2):646–678, 2009.
- [50] Roman Vershynin. High-dimensional probability, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, 2018. An introduction with applications in data science, With a foreword by Sara van de Geer.
- [51] Yanjun Zhang and Hanyu Li. Block sampling kaczmarz–motzkin methods for consistent linear systems. Calcolo, 58(3):39, 2021.
- [52] Yanjun Zhang and Hanyu Li. Greedy Motzkin-Kaczmarz methods for solving linear systems. Numer. Linear Algebra Appl., 29(4):Paper No. e2429, 24, 2022.
- [53] Yanjun Zhang and Hanyu Li. Randomized block subsampling Kaczmarz-Motzkin method. Linear Algebra Appl., 667:133–150, 2023.