You Need to Calm Down: Calmness Regularity for a Class of Seminorm Optimization Problems††thanks: This work was supported by NSF award DMS-18-30418.
Abstract
Compressed sensing involves solving a minimization problem with objective function and linear constraints . Previous work has explored robustness to errors in and under special assumptions. Motivated by these results, we explore robustness to errors in for a wider class of objective functions and for a more general setting, where the solution may not be unique. Similar results for errors in are known and easier to prove. More precisely, for a seminorm with a polyhedral unit ball, we prove that the set-valued map is calm in whenever has full rank and the minimum value is positive, where calmness is a kind of local Lipschitz regularity.
1 Introduction
Convex optimization problems with linear constraints , where is a full-rank matrix with , covers a wide variety of applications. A well-known example is compressed sensing. In this example, one aims to recover a sparse signal given a vector of reduced number of linear measurements and the very special measurement vectors stored as the rows of . Under some restricting assumptions, the unknown signal is the solution of with minimal norm [1, 2, 3, 4, 5], or equivalently, the solution of
(1) |
A particular case of interest, which is motivated by application to MRI, assumes Fourier measurements [2]. In this case, sampling can be done by taking random measurements of the Fourier coefficients of the unknown signal , where is an upper bound for the sparseness of . These random measurements are dot products of the form , where is a randomly selected row of the discrete Fourier transform matrix. Therefore, has rows , , and . Candès, Romberg and Tao [2] showed that with probability the solution of (1) is unique and recovers . This observation extends to other kinds of measurements (see [4] and also [3, 5]).
In applications, the actual measurements of and are noisy, so understanding the reconstruction error under noise is important. Several works [6, 7, 8, 9] have proven that for very special measurement matrices and for noisy measurements of the form , where is sufficiently small and has a sufficiently close approximation in by an -sparse vector, the solution of (1) with replaced by is sufficiently close to in norm. Herman and Strohmer [10] extend this theory by allowing possible errors in the measurement matrix .
Many other applications involve error in the matrix . Examples include radar [11], remote sensing [12], telecommunications [13] and source separation [14]. More recently, Gutierrez et al. [15] developed a compression technique for model-based MRI reconstruction. Here there are at least two sources of error: the discretization of a set of ODEs and the search for a sparse basis to represent the simulation results. They use the total variation norm instead of the norm.
Motivated by these broad model problems we assume a seminorm from a particular class which includes the and total variation norms, and we prove a general stability result, with respect to perturbations in , for the convex optimization problem
(2) |
Even for the special case of the norm, our result differs from the earlier perturbation result of Herman and Strohmer [10] since we do not enforce the more restricted regime of singleton solution sets. We are thus able to eliminate their conditions on the matrix and prove a kind of local stability result. For general convex , Klatte and Kummer [16] have shown that (2) is stable with respect to perturbations in . We extend their result to stability in for a restricted class of .
The paper is organized as follows. In Section 2 we define the various notions of set-valued regularity and describe stability results for set-valued functions that are related to our problem. In Section 3, we formulate and prove our theorem using tools from subspace perturbation theory, the theory of polyhedral mappings, the theory of local error bounds for convex inequality systems, and the theory of set-valued regularity. Finally, in Section 4 we conclude this work and clarify the difficulty of extending our approach to the full class of seminorms.
2 Background
Our theory and proof combine ideas from different areas. We review here the necessary background according to the designated topics indicated by the section titles. We remark that the theory we use is formulated for a more general settings. Therefore, here and throughout the paper we formulate ideas both for our special setting and a setting of Banach spaces with a more general form of and constraints. We often use the same notation for both settings.
2.1 Mathematical Formulation of Our Objective
We consider the following solution map
(3) |
where denotes the constraint set
We note that generally is a set-valued function. We arbitrarily fix with full rank and quantitatively study the effect of perturbing on the set-valued solution map .
2.2 Regularity of Set-Valued Functions
In order to quantify the effect of perturbing on the set-valued solution map , we use some well-known notions of regularity of set-valued functions. We first motivate them for our setting by considering the special case where and is an entire face of the ball. Then a small perturbation in results in a jump from that face to a single vertex as we demonstrate in Figure 1. A model set-valued function with such a jump, which needs to be handled by the developed theory, is
This function has several properties, which we will formulate below.
We will state their definitions in terms of Banach spaces and , where we denote by all subsets of , the norm of and , the induced distance by between points or sets in . Throughout the paper we go back and forth between the general setting and our special setting of with the Euclidean norm, , and with the distance induced by the spectral norm. We also denote the spectral norm by . The induced distance by either norm is denoted by either , when involving only matrices, or , when involving sets of matrices. Even though we use the same notation, e.g., and for different setting, they are easily distinguishable. When assuming norms and distances in the Euclidean case, we use vectors, which we denote by boldfaced small letters; when assuming matrices, we denote them by boldfaced capital letters; and when assuming abstract Banach spaces, we denote their elements by regular small letters.
It is somewhat intuitive that the above function is “upper semicontinuous”. We clarify this notion as follows.
Definition 2.1.
A map is upper semicontinuous at if for any open set intersecting there exists a neighborhood of such that

It has been known since the 1970s that if is convex, then the solution map is upper semicontinuous [17, Theorem 1.15]. Our result upgrades the upper semicontinuity of to a kind of set-valued Lipschitz regularity, which we define next.
Definition 2.2.
A map is calm at if there exist neighborhoods and of and , respectively, and a constant such that for all and ,
We note that if is calm at then for all sufficiently close, there exists such that
This means that small perturbations to the measurement matrix leave at least two solutions and close.
We demonstrate that calmness is stronger than upper semicontinuity with the following example
The upper semicontinuity of is obvious. It is not calm since for and with sufficiently small absolute values the distance between and grows as and in general cannot be controlled by . Indeed, if, for example, and is arbitrary small, then (or one may write to emphasize the single value) and .
At last, we define a stronger notion of regularity, which will be useful later. Unlike lower Lipschitz semicontinuity, both and are allowed to vary. Here and throughout the paper, denotes an open ball in with center and radius , and similarly denotes a corresponding open ball in .
Definition 2.3.
A map is said to have the Aubin property at if there exists , and such that for all and all
Note that we will use the Aubin property on the “inverse” of defined as
Note that the Aubin property of implies a local bi-Lipschitz property of .
2.3 Sufficient Conditions for Calmness of Certain Solution Maps
In the general setting of Banach spaces , (with the above notation) and , and functions and , Klatte and Kummer [16] provide sufficient conditions for calmness at of
(4) |
These sufficient conditions are requirements on the regularity of the constraint set and the regularity of the mapping
(5) |
for a certain choice of . The regularity of these quantities is quantified by calmness and the following notion of lower Lipschitz semicontinuity. It is almost identical to calmness except that varies instead of .
Definition 2.4.
A set-valued map is called lower Lipschitz semicontinuous at if there exists such that
We formulate Theorem 3.1 of Klatte and Kummer [16] on sufficient conditions for calmness of . We use the quantity for and . It follows from the definition of that this quantity is well-defined as is constant on all values of .
2.4 Polyhedral Level Sets
We assume that the unit ball of , , is a polyhedron. For brevity, we refer by polyhedron to a convex polyhedron, which is the intersection of finitely many half spaces.
We relate the above property to the following well-known general notions of a polyhedral map and polyhedral level sets, which we use in this work. We define the former notion for a general function and the latter one only for . For Banach spaces and , a map is a polyhedral map if there exists polyhedra , such that
We say that has polyhedral level sets if the level-set map
(6) |
is polyhedral. Since is a seminorm, it is equivalent to the assumed property that consists of a single polyhedron (this claim will be clearer from the geometric argument presented later in Section 3.2.2).
We use a classic result from the theory of polyhedral mappings [18]:
Theorem 2.6.
A polyhedral set-valued mapping is calm at all .
2.5 Principal Angles
We exploit the geometric interplay between the affine subspace solution of and the affine subspaces of the polyhedral level sets using principal angles between shifted linear subspaces [19]. The principal angles between two linear subspaces and of with dimensions and , respectively, where , are defined recursively along with principal vectors , , and , , . The smallest principal angle is given by
(7) |
The principal vectors and achieve the minimum in (7). For , the th principal angle, , and principal vectors, and , are defined by
(8) |
where and achieve the minimum in (8)
2.6 Bounds for a System of Convex Inequalities
Our proof gives rise to a system of inequalities , where is a convex function, and requires bounding the distance between the solution set of this system and a point that lies near the boundary of this set. The sharp constant of the required bound uses the subdifferential of .
We first recall that the subdifferential of a convex function is defined as the following set of subgradients of :
The use of the subdifferential to bound distances to constraint sets began with Hoffman’s original work [20] on error bounds for systems of inequalities . Define to be the solution set of all that satisfy . Assuming this set is nonempty, Hoffman shows that there exists a constant so that for all
(9) |
where for , , . In his original paper, he proved the bound using results from conic geometry and then computed the constants ad hoc for a few different norms. Later results, see e.g., [21], generalize this to convex inequalities and compute sharp constants using the subdifferential . We will use the following theorem of [21], where (10) below is analogous to (9).
Theorem 2.7.
If is a convex function, and lies in the boundary of , then the following two statements are equivalent:
-
1.
There exist and such that for all
(10) -
2.
For all sequences with for all ,
(11)
Moreover is optimal in (10).
3 The Main Theorem and its Proof
We formulate the main theorem of this paper and follow up with its proof
Theorem 3.1.
Assume , and is a seminorm with polyhedral unit ball. Then the set-valued map defined in (3) is calm at for every full-rank such that and for every .
In analogy to the notation used in Theorem 2.5, we denote . The assumption of Theorem 3.1 that is reasonable from an applied perspective. For example, in the case of compressed sensing, where , corresponds to the trivial solution . The condition thus only excludes the trivial solution, where the sparsity prior is meaningless.
In order to prove this theorem, we first prove in Section 3.1 that is calm and lower Lipschitz at . We then prove in Section 3.2 that the map
is calm at . In view of Theorem 2.5, these results imply Theorem 3.1.
3.1 is Calm and Lower Lipschitz at
The following classical property of the Moore-Penrose inverse will be useful. We offer a short proof here.
Lemma 3.2.
Let and be the Moore-Penrose inverse of [22]. Then the following bound holds
(12) |
Proof.
The operator is the projection onto the kernel of and thus the projection is given by
Therefore,
∎
We derive the calmness of at from the above lemma. We set the neighborhoods of Definition 2.2 as and any bounded neighborhood of in of diameter , where any choice of is fine, but it impacts the calmness constant (see below). We arbitrarily fix and . To verify calmness, we directly apply Lemma 3.2
and further note that
Combining the above two equations and using the triangle inequality yields the desired calmness of :
(13) |
The calmness constant is bounded since the diameter is bounded and is full-rank, so its lowest singular value, is positive and thus is bounded. We remark that the bound in the first inequality of (13) is analogous to that in (9), though the latter one applies to a system of inequalities and not just equalities.
To see that is lower Lipschitz semicontinuous, we use again the formula . Since the complex roots of a polynomial vary continuously with respect to the coefficients, the eigenvalues of a real symmetric matrix vary continuously with respect to that matrix. Thus , which is the minimal eigenvalue of , depends continuously on . Therefore, for every there exists a such that for , , so that is also full-rank, and
(14) |
Swapping and in the first inequality of (13) and combining it with (14) gives
This bound shows that is lower Lipschitz semicontinuous at .
3.2 is Calm at
Recall the map defined in (6) and note that we can write
(15) |
Define also the system of inequalities
The following theorem from [16] gives conditions on and that ensure is calm.
Theorem 3.3.
Let , , be Banach spaces. If is calm at , is calm at , has the Aubin property at . Finally, assume is calm at . Then is calm at .
We will apply the theorem with , , and . We set , , , and . Let . To see that is calm at this point, we thus verify that
-
1.
has the Aubin property at .
-
2.
is calm at .
-
3.
is calm at .
We verify property 1 in Section 3.2.1, property 2 in Section 3.2.3, and property 3 in Section 3.2.3.
The main trick is that we have reduced the problem from perturbations in to better understood cases of perturbations. For example, in verifying properties 2 and 3 we need to study tolerance to perturbations in the right hand side of an inequality system, represented by either or . We prove the calmness of and by both geometric or algebraic approaches. The geometric ones are quite simple, while the algebraic ones enable us to compute explicit constants.
3.2.1 is Aubin
Since is convex, it is Lipschitz on for some . That is, there exists so that
For , and , we obtain by direct estimation and the above inequality the desired Aubin property as follows:
3.2.2 is calm
Geometric Proof: Note that by the degree-one homogeneity of the seminorm , for
That is, for , is simply a scaling of the set . By assumption, is a polyhedron in . Since , then the graph of over any small interval is a polyhedron in . This observation and Theorem 2.6 imply that is calm at .
Algebraic Proof: This proof relies on Theorem 2.7 with . We first verify (11) with the latter , which states that for all sequences with ,
(16) |
We note that if and , then . Indeed, only when is a minimum of , and since is a seminorm, its minimum is achieved only on the set . We further note that the assumption implies that . These two observations yield that
(17) |
Next, we define . If we can verify that is lower semicontinuous, then by (17) and the definition of lower semicontinuity, we conclude (16) as follows: The lower-semicontinuity of becomes obvious when we rewrite it as . Indeed, it follows from the set-valued upper semicontinuity of and the definitions of lower semicontinuity of a real-valued function and upper semicontinuity of a set-valued function.
Since we verified (11) with , Theorem 2.7 implies that (10) holds with the same and also specifies the optimal constant in (10). Note that by the definitions of and , . We thus proved that there exist , such that for all
(18) |
We conclude by showing that this bound implies the calmness of . Let (with chosen small enough so that ) and . By definition, , and we can thus write
(19) |
The combination of (18) and (19) results in
(20) |
for all . Thus is calm in .
3.2.3 is calm
Geometric Proof: Extending the polyhedral mapping proof from Section 3.2.2 is nontrivial because does not obey a simple scaling law. Instead, we analyze carefully the intersection of the affine set, , and the polyhedral set, , near . We will bound the calmness constant for by , with from (16), and the smallest positive principal angle between the affine space and the affine sets defining the faces of (of any dimension ) containing .
Denote by , , the faces of the polyhedron of any dimension less than that contain (and thus also intersect the -dimensional space ). Denote by the projection operator of onto . This operator is well-defined since is convex. Choose small enough so that the set
intersects only the faces of . Clearly, it also intersects . Choose coordinates so that . Then , can be viewed as linear subspaces of and the following vectors in ,
form a right triangle, which is demonstrated in Figure 2.
To prove that is calm at we wish to prove a bound of the form
(21) |
We start with the case when the triangle is degenerate, that is, either , or , or . First, assume that . This implies that and consequently the desired calmness: . Next, assume that . By definition, so . This observation and the fact that implies that and the same estimate holds as above. At last, we assume that . Then so and we can bound by (20). We thus verified (21) for these cases with .
In the case of non-degenerate triangles, we note that since ,
(22) |
Therefore, to obtain (21) we bound in terms of . Since , and form a right triangle, , where is the angle between and . The combination of this observation with the calmness of , stated in (20), results in the bound
(23) |
We note that depends on and thus (23) does not yield yet the desired bound on . To obtain this, we will establish a lower bound on .
We index by , , , where , the set of faces in , , having at least one positive principal angle with . Note that since we ruled out the trivial case, where the triangle formed by , and is degenerate, the latter set is not empty, that is, . For , , , we denote by the smallest positive principal angle between and . Since and are nonzero and contained in distinct subspaces, they are orthogonal to the principal vectors corresponding to the zero principal angles (see (8)). Hence for , , . We further denote by the minimal value of among all , , and have that . Combining this observation with (22) yields the desired calmness of at :
Algebraic Proof: The follows a similar approach to that in Section 3.2.2. We first write our set as the inequality system
We note that by definition and thus , though we treat it as a system of inequalities.
Similarly to the proof in in Section 3.2.2, the calmness of at with follows from (11) with the later . That is, we need to verify that for all sequences with
(24) |
The argument here is more subtle than in Section 3.2.2 since here , whereas before . We thus need a more subtle analysis of the function . The definition of and the fact that imply that the minimal value of , which is 0, is achieved at . Thus if and only if . Hence, for any sequence with , , or equivalently
(25) |
In order to conclude (24), we will show that is piecewise constant and combine this with (25). By Theorem 3 of Chapter 4 of [23] we have
(26) |
where co denotes the convex hull. Since has polyhedral level sets, is piecewise constant in the sense that outputs only finitely many different sets. Similarly, since is also a polyhedron, is piecewise constant as a set-valued function. These two observations and (26) thus imply that is piecewise constant as a set-valued function. Hence, is piecewise constant as a real-valued function and the proof is concluded.
4 Conclusion
We have shown that if has polyhedral level sets, then the solution map is calm. By tracing the constants in our proof and in Klatte and Kummer [16], we see that a calmness constant for is
where is the Lipschitz constant of from Section 3.2.1, is defined in (16) and is defined in (24).
A key assumption for proving that is calm is that has polyhedral sublevel sets. Removing this assumption would allow us to recover known calmness results such as the case [24]. Unfortunately, is in fact not calm for as we show below. Therefore, our approach needs a major change if one wants to extend it to other known seminorms.
We verify our claim above by using the following equivalent definition of calmness: A map is calm at if and only if there exist a neighborhood of and a constant such that
(27) |
Using this definition, we show that with in , and is not calm at . Recall that by definition
In our case,
Thus
so the distance is simply
Consequently, noting that we have
and thus no such bound of the form (27) exists with .
References
- [1] David L. Donoho and Michael Elad “Optimally sparse representation in general (nonorthogonal) dictionaries via minimization” In Proc. Natl. Acad. Sci. USA 100.5, 2003, pp. 2197–2202 DOI: 10.1073/pnas.0437847100
- [2] E. J. Candès, J. Romberg and T. Tao “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information” In IEEE Transactions on Information Theory 52.2, 2006, pp. 489–509 DOI: 10.1109/TIT.2005.862083
- [3] Emmanuel J. Candès and Terence Tao “Near-optimal signal recovery from random projections: Universal encoding strategies?” In IEEE Transactions on Information Theory 52.12, 2006, pp. 5406–5425 DOI: 10.1109/TIT.2006.885507
- [4] D. L. Donoho “For most large underdetermined systems of linear equations the minimal -norm solution is also the sparsest solution” In Communications on Pure and Applied Mathematics 59.6, 2006, pp. 797–829 DOI: 10.1002/cpa.20132
- [5] David L Donoho “Compressed sensing” In IEEE Transactions on information theory 52.4 IEEE, 2006, pp. 1289–1306
- [6] Emmanuel J Candès and Terence Tao “Near-optimal signal recovery from random projections: Universal encoding strategies?” In IEEE transactions on information theory 52.12 IEEE, 2006, pp. 5406–5425
- [7] Emmanuel J Candès “The restricted isometry property and its implications for compressed sensing” In Comptes rendus mathematique 346.9-10, 2008, pp. 589–592
- [8] D. L. Donoho, M. Elad and V. N. Temlyakov “Stable recovery of sparse overcomplete representations in the presence of noise” In IEEE Transactions on Information Theory 52.1, 2006, pp. 6–18 DOI: 10.1109/TIT.2005.860430
- [9] J. A. Tropp “Just relax: convex programming methods for identifying sparse signals in noise” In IEEE Transactions on Information Theory 52.3, 2006, pp. 1030–1051 DOI: 10.1109/TIT.2005.864420
- [10] Matthew A. Herman and Thomas Strohmer “General deviants: An analysis of perturbations in compressed sensing” In IEEE Journal on Selected Topics in Signal Processing 4.2, 2010, pp. 342–349 DOI: 10.1109/JSTSP.2009.2039170
- [11] Matthew A Herman and Thomas Strohmer “High-resolution radar via compressed sensing” In IEEE transactions on signal processing 57.6 IEEE, 2009, pp. 2275–2284
- [12] Albert C Fannjiang, Thomas Strohmer and Pengchong Yan “Compressed remote sensing of sparse objects” In SIAM Journal on Imaging Sciences 3.3 SIAM, 2010, pp. 595–618
- [13] Rémi Gribonval, Holger Rauhut, Karin Schnass and Pierre Vandergheynst “Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms” In Journal of Fourier analysis and Applications 14.5-6 Springer, 2008, pp. 655–687
- [14] Thomas Blumensath and Mike Davies “Compressed sensing and source separation” In International Conference on Independent Component Analysis and Signal Separation, 2007, pp. 341–348 Springer
- [15] Alex Gutierrez et al. “Reducing the Complexity of Model-Based MRI Reconstructions via Sparsification” Submitted, 2020
- [16] Diethard Klatte and Bernd Kummer “On Calmness of the Argmin Mapping in Parametric Optimization Problems” In Journal of Optimization Theory and Applications 165.3, 2015, pp. 708–719 DOI: 10.1007/s10957-014-0643-2
- [17] Diethard Klatte and Bernd Kummer “Nonsmooth Equations in Optimization: Regularity, Calculus, Methods, and Applications” In Journal of Chemical Information and Modeling 53.9 Springer, 2002, pp. 1689–1699 DOI: 10.1017/CBO9781107415324.004
- [18] Abderrahim Jourani “Hoffman’s error bound, local controllability, and sensitivity analysis” In SIAM Journal on Control and Optimization 38.3, 2000, pp. 947–970
- [19] Per Åke Wedin “On angles between subspaces of a finite dimensional inner product space”, 1983, pp. 263–285 DOI: 10.1007/bfb0062107
- [20] A.J. Hoffman “On approximate solutions of systems of linear inequalities” In Journal of Research of the National Bureau of Standards 49.4, 1952, pp. 263 DOI: 10.6028/jres.049.027
- [21] Huynh Van Ngai, Alexander Kruger and Michel Théra “Stability of Error Bounds for Semi-infinite Convex Constraint Systems” In SIAM Journal on Optimization 20.4, 2010, pp. 2080–2096 DOI: 10.1137/090767819
- [22] Stephen L Campbell and Carl D Meyer “Generalized inverses of linear transformations” SIAM, 2009
- [23] Aleksandr Davidovich Ioffe “Theory of extremal problems”, Studies in mathematics and its applications ; v. 6 Amsterdam ; New York : New York: North-Holland Pub. Co. ; Sole distributors for the U.S.A.Canada, Elsevier North-Holland, 1979
- [24] Jiu Ding “Perturbation analysis for the projection of a point to an affine set” In Linear Algebra and Its Applications 191.C, 1993, pp. 199–212 DOI: 10.1016/0024-3795(93)90515-P