
You Need to Calm Down: Calmness Regularity for a Class of Seminorm Optimization Problems

This work was supported by NSF award DMS-18-30418.

Alex Gutierrez, Gilad Lerman, Sam Stewart
Abstract

Compressed sensing involves solving a minimization problem with objective function $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ and linear constraints $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$. Previous work has explored robustness to errors in $\boldsymbol{A}$ and $\boldsymbol{b}$ under special assumptions. Motivated by these results, we explore robustness to errors in $\boldsymbol{A}$ for a wider class of objective functions $\Omega$ and for a more general setting, where the solution may not be unique. Similar results for errors in $\boldsymbol{b}$ are known and easier to prove. More precisely, for a seminorm $\Omega(\boldsymbol{x})$ with a polyhedral unit ball, we prove that the set-valued map $S(\boldsymbol{A})=\operatorname*{arg\,min}_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})$ is calm in $\boldsymbol{A}$ whenever $\boldsymbol{A}$ has full rank and the minimum value is positive, where calmness is a kind of local Lipschitz regularity.

1 Introduction

Convex optimization problems with linear constraints $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, where $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ is a full-rank matrix with $m<n$, cover a wide variety of applications. A well-known example is compressed sensing, where one aims to recover a sparse signal given a vector $\boldsymbol{b}$ of a reduced number of linear measurements and the very special measurement vectors stored as the rows of $\boldsymbol{A}$. Under some restrictive assumptions, the unknown signal is the solution of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ with minimal $\ell_{1}$ norm [1, 2, 3, 4, 5], or equivalently, the solution of

$$\min_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\left\lVert\boldsymbol{x}\right\rVert_{1}.$$ (1)

A particular case of interest, motivated by applications to MRI, assumes Fourier measurements [2]. In this case, sampling can be done by taking $m=\mathcal{O}(s\log n)$ random measurements of the Fourier coefficients of the unknown signal $\boldsymbol{x}^{*}$, where $s$ is an upper bound for the sparsity of $\boldsymbol{x}^{*}$. These random measurements are dot products of the form $\langle\boldsymbol{x}^{*},\boldsymbol{\psi}_{k}\rangle$, where $\boldsymbol{\psi}_{k}$ is a randomly selected row of the $n\times n$ discrete Fourier transform matrix. Therefore, $\boldsymbol{A}$ has rows $\boldsymbol{\psi}_{1},\ldots,\boldsymbol{\psi}_{m}$ and $\boldsymbol{b}=\boldsymbol{A}\boldsymbol{x}^{*}$. Candès, Romberg and Tao [2] showed that with overwhelming probability the solution of (1) is unique and recovers $\boldsymbol{x}^{*}$. This observation extends to other kinds of measurements (see [4] and also [3, 5]).
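To make (1) concrete, the following minimal numerical sketch (an illustration of ours, not taken from the cited works) solves it as a linear program via the standard splitting $\boldsymbol{x}=\boldsymbol{u}-\boldsymbol{v}$ with $\boldsymbol{u},\boldsymbol{v}\geq 0$; for simplicity it uses a Gaussian matrix in place of the partial Fourier measurements and SciPy's linprog as the solver.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, s = 20, 50, 3                      # measurements, ambient dimension, sparsity
A = rng.standard_normal((m, n))          # stand-in for the measurement matrix
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

# min ||x||_1 s.t. Ax = b, rewritten as: min 1'u + 1'v s.t. A(u - v) = b, u, v >= 0.
res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print(np.linalg.norm(x_hat - x_true))    # near zero when exact recovery holds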

In applications, the actual measurements of $\boldsymbol{A}$ and $\boldsymbol{b}$ are noisy, so understanding the reconstruction error under noise is important. Several works [6, 7, 8, 9] have proven that for very special measurement matrices $\boldsymbol{A}$ and for noisy measurements $\hat{\boldsymbol{b}}$ of the form $\hat{\boldsymbol{b}}=\boldsymbol{A}\boldsymbol{x}^{*}+\boldsymbol{e}$, where $\left\lVert\boldsymbol{e}\right\rVert_{2}$ is sufficiently small and $\boldsymbol{x}^{*}$ has a sufficiently close approximation in $\ell_{1}$ by an $s$-sparse vector, the solution $\boldsymbol{x}$ of (1) with $\boldsymbol{b}$ replaced by $\hat{\boldsymbol{b}}$ is sufficiently close to $\boldsymbol{x}^{*}$ in the $\ell_{2}$ norm. Herman and Strohmer [10] extend this theory by allowing possible errors in the measurement matrix $\boldsymbol{A}$.

Many other applications involve error in the matrix $\boldsymbol{A}$. Examples include radar [11], remote sensing [12], telecommunications [13] and source separation [14]. More recently, Gutierrez et al. [15] developed a compression technique for model-based MRI reconstruction. Here there are at least two sources of error: the discretization of a set of ODEs and the search for a sparse basis to represent the simulation results. They use the total variation norm instead of the $\ell_{1}$ norm.

Motivated by these broad model problems, we assume a seminorm $\Omega$ from a particular class, which includes the $\ell_{1}$ and total variation norms, and we prove a general stability result, with respect to perturbations in $\boldsymbol{A}$, for the convex optimization problem

$$\min_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x}).$$ (2)

Even for the special case of the $\ell_{1}$ norm, our result differs from the earlier perturbation result of Herman and Strohmer [10], since we do not enforce the more restricted regime of singleton solution sets. We are thus able to eliminate their conditions on the matrix $\boldsymbol{A}$ and prove a kind of local stability result. For general convex $\Omega$, Klatte and Kummer [16] have shown that (2) is stable with respect to perturbations in $\boldsymbol{b}$. We extend their result to stability in $\boldsymbol{A}$ for a restricted class of $\Omega$.

The paper is organized as follows. In Section 2 we define the various notions of set-valued regularity and describe stability results for set-valued functions that are related to our problem. In Section 3, we formulate and prove our theorem using tools from subspace perturbation theory, the theory of polyhedral mappings, the theory of local error bounds for convex inequality systems, and the theory of set-valued regularity. Finally, in Section 4 we conclude this work and clarify the difficulty of extending our approach to the full class of seminorms.

2 Background

Our theory and proof combine ideas from different areas. We review here the necessary background according to the topics indicated by the section titles. We remark that the theory we use is formulated for a more general setting. Therefore, here and throughout the paper we formulate ideas both for our special setting and for a setting of Banach spaces with a more general form of $\Omega$ and more general constraints. We often use the same notation for both settings.

2.1 Mathematical Formulation of Our Objective

We consider the following solution map

$$S(\boldsymbol{A})=\operatorname*{arg\,min}_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})=\operatorname*{arg\,min}_{\boldsymbol{x}\in M(\boldsymbol{A})}\Omega(\boldsymbol{x}),$$ (3)

where $M(\boldsymbol{A})$ denotes the constraint set

$$M(\boldsymbol{A})=\{\boldsymbol{z}\in\mathbb{R}^{n}\mid\boldsymbol{A}\boldsymbol{z}=\boldsymbol{b}\}.$$

We note that generally $S(\boldsymbol{A})$ is a set-valued function. We arbitrarily fix a full-rank $\boldsymbol{A}_{0}\in\mathbb{R}^{m\times n}$ and quantitatively study the effect of perturbing $\boldsymbol{A}_{0}$ on the set-valued solution map $S(\boldsymbol{A})$.

2.2 Regularity of Set-Valued Functions

In order to quantify the effect of perturbing $\boldsymbol{A}_{0}$ on the set-valued solution map $S(\boldsymbol{A})$, we use some well-known notions of regularity of set-valued functions. We first motivate them for our setting by considering the special case where $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ and $S(\boldsymbol{A}_{0})$ is an entire face of the $\ell_{1}$ ball. Then a small perturbation of $\boldsymbol{A}_{0}$ results in a jump from that face to a single vertex, as we demonstrate in Figure 1. A model set-valued function with such a jump, which needs to be handled by the developed theory, is

$$G(t)=\begin{cases}[0,1]&\textrm{ if }t\in(1/2,1]\\ 0&\textrm{ if }t\in[0,1/2).\end{cases}$$

This function has several properties, which we will formulate below.

We will state their definitions in terms of Banach spaces $X$ and $Y$, where we denote by $2^{X}$ the collection of all subsets of $X$, by $\|\cdot\|$ the norm of $Y$, and by $d(\cdot,\cdot)$ the distance between points or sets in $X$ induced by the norm of $X$. Throughout the paper we go back and forth between the general setting and our special setting of $X=\mathbb{R}^{n}$ with the Euclidean norm $\|\cdot\|$ and $Y=\mathbb{R}^{m\times n}$ with the distance induced by the spectral norm, which we also denote by $\|\cdot\|$. The induced distance is denoted by either $\|\cdot-\cdot\|$, when involving only points, or $d(\cdot,\cdot)$, when involving sets. Even though we use the same notation, e.g., $\|\cdot\|$ and $d(\cdot,\cdot)$, for different settings, they are easily distinguishable: when taking norms and distances in the Euclidean case we use vectors, which we denote by boldfaced small letters; matrices are denoted by boldfaced capital letters; and elements of abstract Banach spaces are denoted by regular small letters.

It is somewhat intuitive that the above function $G(t)$ is “upper semicontinuous”. We clarify this notion as follows.

Definition 2.1.

A map $F:Y\to 2^{X}$ is upper semicontinuous at $y_{0}$ if for any open set $V$ containing $F(y_{0})$ there exists a neighborhood $U$ of $y_{0}$ such that

$$F(y)\subseteq V,\quad\textrm{ for all }y\in U.$$
Figure 1: Demonstration of a case where $S(\boldsymbol{A}_{0})$ is a face of the unit $\ell_{1}$ ball and $S(\boldsymbol{A})$, where $\boldsymbol{A}=\boldsymbol{A}_{0}+\boldsymbol{E}$ and $0<\|\boldsymbol{E}\|\ll 1$, is a vertex of this ball. Here, the line intersecting the face represents the solution set of $\boldsymbol{A}_{0}\boldsymbol{x}=\boldsymbol{b}$, the other line represents the solution set of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, and the sets $S(\boldsymbol{A}_{0})$ and $S(\boldsymbol{A})$ are the intersections of these lines, respectively, with the unit $\ell_{1}$ ball, which is represented by the rotated square.

It has been known since the 1970s that if $\Omega$ is convex, then the solution map $S(\boldsymbol{A})$ is upper semicontinuous [17, Theorem 1.15]. Our result upgrades the upper semicontinuity of $S(\boldsymbol{A})$ to a kind of set-valued Lipschitz regularity, which we define next.

Definition 2.2.

A map $F:Y\to 2^{X}$ is calm at $(y_{0},x_{0})\in\textrm{Graph}(F)$ if there exist neighborhoods $U$ and $V$ of $y_{0}$ and $x_{0}$, respectively, and a constant $L(y_{0},x_{0})$ such that for all $y_{1}\in U$ and $x_{1}\in F(y_{1})\cap V$,

$$d(x_{1},F(y_{0}))\leq L(y_{0},x_{0})\left\lVert y_{0}-y_{1}\right\rVert.$$

We note that if $S(\boldsymbol{A})$ is calm at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$, then for every $\boldsymbol{A}_{1}$ sufficiently close to $\boldsymbol{A}_{0}$ and every $\boldsymbol{x}_{1}\in S(\boldsymbol{A}_{1})$ sufficiently close to $\boldsymbol{x}_{0}$, there exists $\boldsymbol{x}_{0}^{\prime}\in S(\boldsymbol{A}_{0})$ such that

$$\left\lVert\boldsymbol{x}_{0}^{\prime}-\boldsymbol{x}_{1}\right\rVert\leq C(\boldsymbol{A}_{0})\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert.$$

This means that small perturbations of the measurement matrix $\boldsymbol{A}$ keep every nearby perturbed solution $\boldsymbol{x}_{1}\in S(\boldsymbol{A}_{1})$ close to some solution $\boldsymbol{x}_{0}^{\prime}\in S(\boldsymbol{A}_{0})$ of the unperturbed problem.

We demonstrate that calmness is stronger than upper semicontinuity with the following example:

$$H(t)=[-\sqrt{t},\sqrt{t}]\quad\text{ for }t\in[0,1].$$

The upper semicontinuity of $H$ is obvious. It is not calm at $(0,0)$, since for $t_{0}$ and $t_{1}$ with sufficiently small absolute values the distance between $H(t_{0})$ and $H(t_{1})$ grows as $|\sqrt{t_{0}}-\sqrt{t_{1}}|$ and in general cannot be controlled by $|t_{0}-t_{1}|$. Indeed, if, for example, $t_{0}=0$ and $t>0$ is arbitrarily small, then $H(0)=\{0\}$ (or one may write $H(0)=0$ to emphasize the single value) and $d(H(0),H(t))=\sqrt{t}\gg t$.
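A quick numerical check of this failure (our illustration): taking $x_{1}=\sqrt{t}\in H(t)$, the ratio $d(x_{1},H(0))/|t-0|$ blows up as $t\to 0$, so no finite calmness constant exists.

import numpy as np

for t in [1e-2, 1e-4, 1e-6]:
    x1 = np.sqrt(t)      # a point of H(t) = [-sqrt(t), sqrt(t)]
    print(t, x1 / t)     # d(x1, H(0)) / |t - 0| grows like 1/sqrt(t)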

At last, we define a stronger notion of regularity, which will be useful later. Unlike lower Lipschitz semicontinuity (Definition 2.4 below), where only $y$ varies, here both $y$ and $y^{\prime}$ are allowed to vary. Here and throughout the paper, $B_{\delta}(y)$ denotes an open ball in $Y$ with center $y$ and radius $\delta$, and similarly $B_{\epsilon}(x)$ denotes a corresponding open ball in $X$.

Definition 2.3.

A map $F:Y\to 2^{X}$ is said to have the Aubin property at $(y_{0},x_{0})\in\textrm{Graph}\,F$ if there exist $\epsilon,\delta,C>0$ such that for all $y,y^{\prime}\in B_{\delta}(y_{0})$ and all $x\in B_{\epsilon}(x_{0})\cap F(y)$,

$$d(x,F(y^{\prime}))\leq C\left\lVert y-y^{\prime}\right\rVert.$$

Note that we will use the Aubin property on the “inverse” of $F$, defined as

$$F^{-1}(x)=\{y\in Y\mid x\in F(y)\}.$$

Note that the Aubin property of $F^{-1}$ implies a local bi-Lipschitz property of $F$.

2.3 Sufficient Conditions for Calmness of Certain Solution Maps

In the general setting of Banach spaces $X$ and $Y$ (with the above notation), and functions $M:Y\to 2^{X}$ and $f:X\times Y\to\mathbb{R}$, Klatte and Kummer [16] provide sufficient conditions for calmness at $(y_{0},x_{0})\in\textrm{Graph}(S)$ of

$$S(y)=\operatorname*{arg\,min}_{x\in M(y)}f(x,y).$$ (4)

These sufficient conditions are requirements on the regularity of the constraint set $M(y)$ and the regularity of the mapping

$$L(y,\mu)=\{x\in M(y)\mid f(x,y_{0})\leq\mu\}$$ (5)

for a certain choice of $\mu$. The regularity of these mappings is quantified by calmness and by the following notion of lower Lipschitz semicontinuity, which is almost identical to calmness except that $y$ varies instead of $x$.

Definition 2.4.

A set-valued map $F:Y\to 2^{X}$ is called lower Lipschitz semicontinuous at $(y_{0},x_{0})$ if there exist $\delta,C>0$ such that

$$d(x_{0},F(y))\leq C\left\lVert y_{0}-y\right\rVert,\quad\textrm{ for all }y\in B_{\delta}(y_{0}).$$

We formulate Theorem 3.1 of Klatte and Kummer [16] on sufficient conditions for calmness of $S(y)$. For $y_{0}\in Y$ we use the quantity $\mu_{0}=f(S(y_{0}),y_{0})$, which is well-defined since $f(\cdot,y_{0})$ is constant on $S(y_{0})$.

Theorem 2.5.

Assume the general setting of Banach spaces $X$ (with distance $d$) and $Y$ (with norm $\|\cdot\|$), and functions $M:Y\to 2^{X}$, $f:X\times Y\to\mathbb{R}$, $S:Y\to 2^{X}$ defined in (4) and $L:Y\times\mathbb{R}\to 2^{X}$ defined in (5). If $y_{0}\in Y$, $x_{0}\in X$, $\mu_{0}=f(S(y_{0}),y_{0})$ and

  1. $M(y)$ is calm and lower Lipschitz semicontinuous at $(y_{0},x_{0})$;

  2. $L(y,\mu)$ is calm at $((y_{0},\mu_{0}),x_{0})$;

then $S(y)$ is calm at $(y_{0},x_{0})$.

2.4 Polyhedral Level Sets

We assume that the unit ball of $\Omega$, $\{\boldsymbol{x}\mid\Omega(\boldsymbol{x})\leq 1\}$, is a polyhedron. For brevity, by a polyhedron we always mean a convex polyhedron, that is, an intersection of finitely many half-spaces.

We relate the above property to the following well-known general notions of a polyhedral map and polyhedral level sets, which we use in this work. We define the former notion for a general function and the latter one only for $\Omega$. For Banach spaces $X$ and $Y$, a map $F:X\to Y$ is a polyhedral map if there exist polyhedra $P_{i}$, $i=1,\ldots,N$, such that

$$\textrm{Graph}\,F=\{(x,F(x))\mid x\in X\}=\bigcup_{i=1}^{N}P_{i}.$$

We say that $\Omega$ has polyhedral level sets if the level-set map

$$F(\mu)=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x})\leq\mu\}$$ (6)

is polyhedral. Since $\Omega$ is a seminorm, this is equivalent to the assumed property that $F(1)$ consists of a single polyhedron $P_{1}$ (this claim will be clearer from the geometric argument presented later in Section 3.2.2).
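For instance (an illustrative aside of ours), for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ on $\mathbb{R}^{2}$,

$$F(1)=\{\boldsymbol{x}\mid|x_{1}|+|x_{2}|\leq 1\}=\bigcap_{\sigma_{1},\sigma_{2}\in\{\pm 1\}}\{\boldsymbol{x}\mid\sigma_{1}x_{1}+\sigma_{2}x_{2}\leq 1\},$$

the intersection of four half-planes, and $F(\mu)=\mu F(1)$ for $\mu>0$ by homogeneity.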

We use a classic result from the theory of polyhedral mappings [18]:

Theorem 2.6.

A polyhedral set-valued mapping $H:\mathbb{R}\to 2^{\mathbb{R}^{n}}$ is calm at all $(y_{0},\boldsymbol{x}_{0})\in\textrm{Graph}(H)$.

2.5 Principal Angles

We exploit the geometric interplay between the affine subspace of solutions of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ and the affine subspaces of the polyhedral level sets using principal angles between shifted linear subspaces [19]. The principal angles $0\leq\theta_{1}\leq\cdots\leq\theta_{r}\leq\pi/2$ between two linear subspaces $\mathcal{U}$ and $\mathcal{V}$ of $\mathbb{R}^{n}$ with dimensions $r$ and $k$, respectively, where $r<k$, are defined recursively along with principal vectors $\boldsymbol{u}_{1},\ldots,\boldsymbol{u}_{r}$ and $\boldsymbol{v}_{1},\ldots,\boldsymbol{v}_{r}$. The smallest principal angle $\theta_{1}$ is given by

$$\theta_{1}=\min_{\begin{subarray}{c}\boldsymbol{u}\in\mathcal{U},\,\boldsymbol{v}\in\mathcal{V}\\ \|\boldsymbol{u}\|=\|\boldsymbol{v}\|=1\end{subarray}}\arccos(|\boldsymbol{u}^{T}\boldsymbol{v}|).$$ (7)

The principal vectors $\boldsymbol{u}_{1}$ and $\boldsymbol{v}_{1}$ achieve the minimum in (7). For $2\leq i\leq r$, the $i$th principal angle, $\theta_{i}$, and principal vectors, $\boldsymbol{u}_{i}$ and $\boldsymbol{v}_{i}$, are defined by

$$\theta_{i}=\min_{\begin{subarray}{c}\boldsymbol{u}\in\mathcal{U},\,\|\boldsymbol{u}\|=1,\,\boldsymbol{u}\perp\boldsymbol{u}_{1},\dots,\boldsymbol{u}_{i-1}\\ \boldsymbol{v}\in\mathcal{V},\,\|\boldsymbol{v}\|=1,\,\boldsymbol{v}\perp\boldsymbol{v}_{1},\dots,\boldsymbol{v}_{i-1}\end{subarray}}\arccos(|\boldsymbol{u}^{T}\boldsymbol{v}|),$$ (8)

where $\boldsymbol{u}_{i}$ and $\boldsymbol{v}_{i}$ achieve the minimum in (8).
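Principal angles are also convenient to compute: the cosines $\cos\theta_{i}$ are the singular values of $\boldsymbol{U}^{T}\boldsymbol{V}$ for orthonormal bases $\boldsymbol{U}$ and $\boldsymbol{V}$ of the two subspaces. A minimal sketch of ours, cross-checked against SciPy's subspace_angles:

import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
U = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal basis of a 2-dim subspace of R^5
V = np.linalg.qr(rng.standard_normal((5, 3)))[0]   # orthonormal basis of a 3-dim subspace of R^5

cos_theta = np.linalg.svd(U.T @ V, compute_uv=False)   # descending singular values
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))       # principal angles, ascending
assert np.allclose(np.sort(theta), np.sort(subspace_angles(U, V)))
print(theta)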

2.6 Bounds for a System of Convex Inequalities

Our proof gives rise to a system of inequalities $f(\boldsymbol{x})\leq 0$, where $f$ is a convex function, and requires bounding the distance between the solution set of this system and a point that lies near the boundary of this set. The sharp constant of the required bound uses the subdifferential of $f$.

We first recall that the subdifferential of a convex function $f:\mathbb{R}^{n}\to\mathbb{R}$ is defined as the following set of subgradients of $f$:

$$\partial f(\boldsymbol{x}):=\{\boldsymbol{v}\in\mathbb{R}^{n}\mid f(\boldsymbol{z})-f(\boldsymbol{x})\geq\langle\boldsymbol{v},\,\boldsymbol{z}-\boldsymbol{x}\rangle\textrm{ for all }\boldsymbol{z}\in\mathbb{R}^{n}\}.$$
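As a standard example (included by us for orientation), for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$,

$$\partial\Omega(\boldsymbol{x})=\{\boldsymbol{v}\in\mathbb{R}^{n}\mid v_{i}=\operatorname{sign}(x_{i})\textrm{ if }x_{i}\neq 0,\ v_{i}\in[-1,1]\textrm{ if }x_{i}=0\};$$

in particular, $\partial\Omega$ takes only finitely many distinct values as $\boldsymbol{x}$ ranges over $\mathbb{R}^{n}$, a fact we will meet again in Section 3.2.3.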

The use of the subdifferential to bound distances to constraint sets began with Hoffman's original work [20] on error bounds for systems of inequalities $\boldsymbol{A}\boldsymbol{x}\leq\boldsymbol{b}$. Define $\mathcal{S}_{\boldsymbol{A},\boldsymbol{b}}$ to be the set of all $\boldsymbol{x}\in\mathbb{R}^{n}$ that satisfy $\boldsymbol{A}\boldsymbol{x}\leq\boldsymbol{b}$. Assuming this set is nonempty, Hoffman shows that there exists a constant $C$ so that for all $\boldsymbol{x}\in\mathbb{R}^{n}$,

$$d(\boldsymbol{x},\mathcal{S}_{\boldsymbol{A},\boldsymbol{b}})\leq C\|[\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}]_{+}\|,$$ (9)

where $([\boldsymbol{x}]_{+})_{i}=\max(x_{i},0)$ for $i=1,\ldots,n$. In his original paper, Hoffman proved the bound using results from conic geometry and then computed the constants ad hoc for a few different norms. Later results, see e.g. [21], generalize this to convex inequalities $f(\boldsymbol{x})\leq 0$ and compute sharp constants using the subdifferential $\partial f(\boldsymbol{x})$. We will use the following theorem of [21], where (10) below is analogous to (9).
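The following small sketch (ours; it assumes a randomly generated feasible system) checks a Hoffman-type bound numerically: the distance from $\boldsymbol{x}$ to $\mathcal{S}_{\boldsymbol{A},\boldsymbol{b}}$ is computed by projection with SciPy's trust-constr solver and compared with the residual $\|[\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}]_{+}\|$; by (9), the ratio of the two is a lower bound on the constant $C$.

import numpy as np
from scipy.optimize import minimize, LinearConstraint

rng = np.random.default_rng(2)
m, n = 8, 4
A = rng.standard_normal((m, n))
z0 = rng.standard_normal(n)
b = A @ z0 + rng.uniform(0.1, 1.0, size=m)   # z0 is strictly feasible, so S_{A,b} is nonempty
x = rng.standard_normal(n)

# Project x onto S_{A,b} = {z : Az <= b} by minimizing 0.5 * ||z - x||^2.
res = minimize(lambda z: 0.5 * np.sum((z - x) ** 2), x0=z0,
               jac=lambda z: z - x,
               constraints=[LinearConstraint(A, -np.inf, b)],
               method="trust-constr")
dist = np.linalg.norm(res.x - x)                       # d(x, S_{A,b})
residual = np.linalg.norm(np.maximum(A @ x - b, 0.0))  # ||[Ax - b]_+||
print(dist, residual)                                  # (9) asserts dist <= C * residual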

Theorem 2.7.

If $f:\mathbb{R}^{n}\to\mathbb{R}$ is a convex function, $Q=\{\boldsymbol{z}\in\mathbb{R}^{n}\mid f(\boldsymbol{z})\leq 0\}$ and $\boldsymbol{x}_{0}$ lies on the boundary of $Q$, then the following two statements are equivalent:

  1. There exist $c(f,\boldsymbol{x}_{0})$ and $\epsilon>0$ such that for all $\boldsymbol{x}\in B_{\epsilon}(\boldsymbol{x}_{0})$,

$$d(\boldsymbol{x},Q)\leq c(f,\boldsymbol{x}_{0})[f(\boldsymbol{x})]_{+};$$ (10)

  2. For all sequences $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $f(\boldsymbol{x}_{i})>0$ for all $i\in\mathbb{N}$,

$$\tau(f,\boldsymbol{x}_{0}):=\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}d(0,\partial f(\boldsymbol{x}_{i}))>0.$$ (11)

Moreover, $c=\tau(f,\boldsymbol{x}_{0})^{-1}$ is optimal in (10).

3 The Main Theorem and its Proof

We formulate the main theorem of this paper and follow up with its proof.

Theorem 3.1.

Assume $m\leq n$, $\boldsymbol{b}\in\mathbb{R}^{m}$, $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ and that $\Omega$ is a seminorm with a polyhedral unit ball. Then the set-valued map defined in (3) is calm at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$ for every full-rank $\boldsymbol{A}_{0}\in\mathbb{R}^{m\times n}$ such that $\min_{\boldsymbol{A}_{0}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})>0$ and for every $\boldsymbol{x}_{0}\in S(\boldsymbol{A}_{0})$.

In analogy to the notation used in Theorem 2.5, we denote $\mu_{0}=\min_{\boldsymbol{A}_{0}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})$. The assumption of Theorem 3.1 that $\mu_{0}>0$ is reasonable from an applied perspective. For example, in the case of compressed sensing, where $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$, $\mu_{0}=0$ corresponds to the trivial solution $\boldsymbol{x}=\boldsymbol{0}$. The condition $\mu_{0}>0$ thus only excludes the trivial solution, where the sparsity prior is meaningless.

In order to prove this theorem, we first prove in Section 3.1 that $M(\boldsymbol{A})$ is calm and lower Lipschitz semicontinuous at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$. We then prove in Section 3.2 that the map

$$L(\boldsymbol{A},\mu)=\{\boldsymbol{x}\in M(\boldsymbol{A})\mid\Omega(\boldsymbol{x})\leq\mu\}$$

is calm at $((\boldsymbol{A}_{0},\mu_{0}),\boldsymbol{x}_{0})$. In view of Theorem 2.5, these results imply Theorem 3.1.

3.1 $M(\boldsymbol{A})$ is Calm and Lower Lipschitz at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$

The following classical property of the Moore-Penrose inverse will be useful. We offer a short proof here.

Lemma 3.2.

Let $\boldsymbol{x}\in\mathbb{R}^{n}$ and let $\boldsymbol{A}^{\dagger}$ be the Moore-Penrose inverse of $\boldsymbol{A}$ [22]. Then the following bound holds:

$$d(\boldsymbol{x},M(\boldsymbol{A}))\leq\left\lVert\boldsymbol{A}^{\dagger}\right\rVert\left\lVert\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}\right\rVert.$$ (12)
Proof.

The operator $\boldsymbol{I}-\boldsymbol{A}^{\dagger}\boldsymbol{A}$ is the projection onto the kernel of $\boldsymbol{A}$, and thus the projection $P_{M(\boldsymbol{A})}$ onto $M(\boldsymbol{A})$ is given by

$$P_{M(\boldsymbol{A})}\boldsymbol{x}=\boldsymbol{A}^{\dagger}\boldsymbol{b}+(\boldsymbol{I}-\boldsymbol{A}^{\dagger}\boldsymbol{A})\boldsymbol{x}.$$

Therefore,

$$d(\boldsymbol{x},M(\boldsymbol{A}))=\left\lVert\boldsymbol{x}-P_{M(\boldsymbol{A})}\boldsymbol{x}\right\rVert=\left\lVert\boldsymbol{x}-\boldsymbol{A}^{\dagger}\boldsymbol{b}-\boldsymbol{x}+\boldsymbol{A}^{\dagger}\boldsymbol{A}\boldsymbol{x}\right\rVert=\left\lVert\boldsymbol{A}^{\dagger}\boldsymbol{A}\boldsymbol{x}-\boldsymbol{A}^{\dagger}\boldsymbol{b}\right\rVert\leq\left\lVert\boldsymbol{A}^{\dagger}\right\rVert\left\lVert\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}\right\rVert.$$
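A numerical check of Lemma 3.2 (our sketch): the projection formula from the proof gives the exact distance to $M(\boldsymbol{A})$, which can be compared against the bound (12).

import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6
A = rng.standard_normal((m, n))            # full rank with probability 1
b = rng.standard_normal(m)
x = rng.standard_normal(n)

Ap = np.linalg.pinv(A)                     # Moore-Penrose inverse
proj = Ap @ b + (np.eye(n) - Ap @ A) @ x   # P_{M(A)} x, as in the proof
assert np.allclose(A @ proj, b)            # proj indeed lies in M(A)

dist = np.linalg.norm(x - proj)                              # d(x, M(A))
bound = np.linalg.norm(Ap, 2) * np.linalg.norm(A @ x - b)    # right-hand side of (12)
print(dist, bound)                                           # dist <= bound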

We derive the calmness of $M(\boldsymbol{A})$ at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})\in\mathbb{R}^{m\times n}\times\mathbb{R}^{n}$ from the above lemma. We take the neighborhoods of Definition 2.2 to be $U=\mathbb{R}^{m\times n}$ and $V$ any bounded neighborhood of $\boldsymbol{x}_{0}$ in $\mathbb{R}^{n}$ of diameter $D>0$; any choice of $D>0$ is fine, but it impacts the calmness constant (see below). We arbitrarily fix $\boldsymbol{A}_{1}\in U$ and $\boldsymbol{x}_{1}\in M(\boldsymbol{A}_{1})\cap V$. To verify calmness, we directly apply Lemma 3.2,

$$d(\boldsymbol{x}_{1},M(\boldsymbol{A}_{0}))\leq\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert\left\lVert\boldsymbol{A}_{0}\boldsymbol{x}_{1}-\boldsymbol{b}\right\rVert,$$

and further note that

$$\boldsymbol{A}_{0}\boldsymbol{x}_{1}-\boldsymbol{b}=\boldsymbol{A}_{0}\boldsymbol{x}_{1}-\boldsymbol{b}-\boldsymbol{A}_{1}\boldsymbol{x}_{1}+\boldsymbol{b}=(\boldsymbol{A}_{0}-\boldsymbol{A}_{1})\boldsymbol{x}_{1}.$$

Combining the above two equations and using the triangle inequality yields the desired calmness of $M(\boldsymbol{A})$:

$$d(\boldsymbol{x}_{1},M(\boldsymbol{A}_{0}))\leq\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert\left\lVert\boldsymbol{x}_{1}\right\rVert\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert\leq\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert(\left\lVert\boldsymbol{x}_{0}\right\rVert+D)\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert.$$ (13)

The calmness constant $\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert(\left\lVert\boldsymbol{x}_{0}\right\rVert+D)$ is finite since the diameter $D$ is bounded and $\boldsymbol{A}_{0}$ is full-rank, so its smallest singular value $\sigma_{\min}(\boldsymbol{A}_{0})$ is positive and thus $\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert=1/\sigma_{\min}(\boldsymbol{A}_{0})$ is bounded. We remark that the bound in the first inequality of (13) is analogous to that in (9), though the latter applies to a system of inequalities and not just equalities.

To see that $M(\boldsymbol{A})$ is lower Lipschitz semicontinuous, we use again the formula $\left\lVert\boldsymbol{A}^{\dagger}\right\rVert=1/\sigma_{\min}(\boldsymbol{A})$. Since the complex roots of a polynomial vary continuously with respect to its coefficients, the eigenvalues of a real symmetric matrix vary continuously with respect to that matrix. Thus $\sigma_{\min}(\boldsymbol{A})$, which is the square root of the smallest eigenvalue of $\boldsymbol{A}\boldsymbol{A}^{T}$, depends continuously on $\boldsymbol{A}$. Therefore, for every $\epsilon>0$ there exists a $\delta>0$ such that for $\boldsymbol{A}_{1}\in B_{\delta}(\boldsymbol{A}_{0})$, $\sigma_{\min}(\boldsymbol{A}_{1})>0$, so that $\boldsymbol{A}_{1}$ is also full-rank, and

$$\left\lVert\boldsymbol{A}_{1}^{\dagger}\right\rVert\leq(1+\epsilon)\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert.$$ (14)

Swapping $\boldsymbol{A}_{1}$ and $\boldsymbol{A}_{0}$ in the first inequality of (13) and combining it with (14) gives

$$d(\boldsymbol{x}_{0},M(\boldsymbol{A}_{1}))\leq(1+\epsilon)\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert\left\lVert\boldsymbol{x}_{0}\right\rVert\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert.$$

This bound shows that $M(\boldsymbol{A})$ is lower Lipschitz semicontinuous at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$.

3.2 $L(\boldsymbol{A},\mu)$ is Calm at $((\boldsymbol{A}_{0},\mu_{0}),\boldsymbol{x}_{0})$

Recall the map $F(\mu)$ defined in (6) and note that we can write

$$L(\boldsymbol{A},\mu)=M(\boldsymbol{A})\cap F(\mu).$$ (15)

Define also the map

$$W(\mu)=M(\boldsymbol{A}_{0})\cap F(\mu).$$

The following theorem from [16] gives conditions on $M$ and $F$ that ensure that $L$ is calm.

Theorem 3.3.

Let $X$, $Y$, $Z$ be Banach spaces. Assume that $F_{1}:Y\to 2^{X}$ is calm at $(y_{0},x_{0})\in\textrm{Graph}\,F_{1}$, that $F_{2}:Z\to 2^{X}$ is calm at $(z_{0},x_{0})\in\textrm{Graph}\,F_{2}$, that $F_{1}^{-1}$ has the Aubin property at $(x_{0},y_{0})$, and that $F_{3}(z):=F_{1}(y_{0})\cap F_{2}(z)$ is calm at $(z_{0},x_{0})$. Then $F_{4}(y,z):=F_{1}(y)\cap F_{2}(z)$ is calm at $((y_{0},z_{0}),x_{0})$.

We will apply the theorem with $X=\mathbb{R}^{n}$, $Y=\mathbb{R}^{m\times n}$, and $Z=\mathbb{R}$. We set $F_{1}=M$, $F_{2}=F$, $F_{3}=W$, and $F_{4}=L$. Let $((\boldsymbol{A}_{0},\mu_{0}),\boldsymbol{x}_{0})\in\textrm{Graph}\,L$. To see that $L$ is calm at this point, we thus verify that

  1. $F^{-1}(\boldsymbol{x})$ has the Aubin property at $(\boldsymbol{x}_{0},\mu_{0})$;

  2. $F(\mu)$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$;

  3. $W(\mu)$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$.

We verify property 1 in Section 3.2.1, property 2 in Section 3.2.2, and property 3 in Section 3.2.3.

The main trick is that we have reduced the problem from perturbations in $\boldsymbol{A}$ to better-understood cases of perturbations. For example, in verifying properties 2 and 3 we need to study tolerance to perturbations in the right-hand side of an inequality system, represented by either $F(\mu)$ or $W(\mu)$. We prove the calmness of $W(\mu)$ and $F(\mu)$ by both geometric and algebraic approaches. The geometric ones are quite simple, while the algebraic ones enable us to compute explicit constants.

3.2.1 $F^{-1}$ is Aubin

Since $\Omega$ is convex, it is Lipschitz on $B_{\delta}(\boldsymbol{x}_{0})$ for some $\delta>0$. That is, there exists $C_{\Omega}\in\mathbb{R}$ so that

$$\left|\Omega(\boldsymbol{x})-\Omega(\boldsymbol{x}^{\prime})\right|\leq C_{\Omega}\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert\quad\textrm{ for all }\boldsymbol{x},\boldsymbol{x}^{\prime}\in B_{\delta}(\boldsymbol{x}_{0}).$$

For $\boldsymbol{x},\boldsymbol{x}^{\prime}\in B_{\delta}(\boldsymbol{x}_{0})$ and $\mu_{1}\in F^{-1}(\boldsymbol{x})=[\Omega(\boldsymbol{x}),\infty)$, we have $\mu_{1}\geq\Omega(\boldsymbol{x})$, and we obtain by direct estimation and the above inequality the desired Aubin property as follows:

$$d(\mu_{1},F^{-1}(\boldsymbol{x}^{\prime}))=[\Omega(\boldsymbol{x}^{\prime})-\mu_{1}]_{+}\leq[\Omega(\boldsymbol{x}^{\prime})-\Omega(\boldsymbol{x})]_{+}\leq\left|\Omega(\boldsymbol{x})-\Omega(\boldsymbol{x}^{\prime})\right|\leq C_{\Omega}\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert.$$
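For example (an aside of ours), for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ one may take $C_{\Omega}=\sqrt{n}$ globally, since by the triangle and Cauchy-Schwarz inequalities

$$\left|\left\lVert\boldsymbol{x}\right\rVert_{1}-\left\lVert\boldsymbol{x}^{\prime}\right\rVert_{1}\right|\leq\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert_{1}\leq\sqrt{n}\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert.$$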

3.2.2 $F$ is calm

Geometric Proof: Note that by the degree-one homogeneity of the seminorm $\Omega$, for $\mu>0$,

$$F(\mu)=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x})\leq\mu\}=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x}/\mu)\leq 1\}=\mu\cdot\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x})\leq 1\}.$$

That is, for $\mu>0$, $F(\mu)$ is simply a scaling of the set $\{\boldsymbol{x}\mid\Omega(\boldsymbol{x})\leq 1\}$. By assumption, $F(1)$ is a polyhedron in $\mathbb{R}^{n}$. Since $F(\mu)=\mu F(1)$, the graph of $F$ over any small interval is a polyhedron in $\mathbb{R}^{n+1}$. This observation and Theorem 2.6 imply that $F$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$.

Algebraic Proof: This proof relies on Theorem 2.7 with $f(\boldsymbol{x}):=\Omega(\boldsymbol{x})-\mu_{0}$. We first verify (11) with the latter $f$, which states that for all sequences $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $\Omega(\boldsymbol{x}_{i})>\mu_{0}$,

$$\tau(\Omega,\boldsymbol{x}_{0}):=\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}d(0,\partial\Omega(\boldsymbol{x}_{i}))>0.$$ (16)

We note that if $\boldsymbol{x}\in\mathbb{R}^{n}$ and $0\in\partial\Omega(\boldsymbol{x})$, then $\Omega(\boldsymbol{x})=0$. Indeed, $0\in\partial\Omega(\boldsymbol{x})$ only when $\boldsymbol{x}$ is a minimizer of $\Omega$, and since $\Omega$ is a seminorm, its minimum is achieved exactly on the set $\{\Omega(\boldsymbol{x})=0\}$. We further note that the assumption $\boldsymbol{x}_{0}\in S(\boldsymbol{A}_{0})$ implies that $\Omega(\boldsymbol{x}_{0})=\mu_{0}>0$. These two observations yield that

$$d(0,\partial\Omega(\boldsymbol{x}_{0}))>0.$$ (17)

Next, we define $h(\boldsymbol{x}):=d(0,\partial\Omega(\boldsymbol{x}))$. If we can verify that $h$ is lower semicontinuous, then (17) and the definition of lower semicontinuity imply (16) as follows: $\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}h(\boldsymbol{x}_{i})\geq h(\boldsymbol{x}_{0})>0$. The lower semicontinuity of $h$ becomes obvious when we rewrite it as $h(\boldsymbol{x})=\inf_{\boldsymbol{y}\in\partial\Omega(\boldsymbol{x})}\left\lVert\boldsymbol{y}\right\rVert_{2}$. Indeed, it then follows from the set-valued upper semicontinuity of $\partial\Omega(\boldsymbol{x})$ and the definitions of lower semicontinuity of a real-valued function and upper semicontinuity of a set-valued function.

Since we verified (11) with $f(\boldsymbol{x})=\Omega(\boldsymbol{x})-\mu_{0}$, Theorem 2.7 implies that (10) holds with the same $f$ and also specifies the optimal constant in (10). Note that by the definitions of $f$ and $F$, $Q=F(\mu_{0})$. We have thus proved that there exists $\epsilon>0$ such that for all $\boldsymbol{x}\in B_{\epsilon}(\boldsymbol{x}_{0})$,

$$d(\boldsymbol{x},F(\mu_{0}))\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}[\Omega(\boldsymbol{x})-\mu_{0}]_{+}.$$ (18)

We conclude by showing that this bound implies the calmness of $F$. Let $\mu_{1}\in B_{\delta}(\mu_{0})$ (with $\delta$ chosen small enough so that $0\notin B_{\delta}(\mu_{0})$) and $\boldsymbol{x}_{1}\in F(\mu_{1})\cap B_{\epsilon}(\boldsymbol{x}_{0})$. By definition, $\Omega(\boldsymbol{x}_{1})\leq\mu_{1}$, and we can thus write

$$\Omega(\boldsymbol{x}_{1})-\mu_{0}\leq\Omega(\boldsymbol{x}_{1})-\mu_{0}-(\Omega(\boldsymbol{x}_{1})-\mu_{1})=\mu_{1}-\mu_{0}.$$ (19)

The combination of (18) and (19) results in

$$d(\boldsymbol{x}_{1},F(\mu_{0}))\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}[\mu_{1}-\mu_{0}]_{+}\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}\left|\mu_{0}-\mu_{1}\right|$$ (20)

for all $\boldsymbol{x}_{1}\in B_{\epsilon}(\boldsymbol{x}_{0})\cap F(\mu_{1})$. Thus $F$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$.

3.2.3 $W$ is calm

Geometric Proof: Extending the polyhedral-mapping proof from Section 3.2.2 is nontrivial because $W$ does not obey a simple scaling law. Instead, we carefully analyze the intersection of the affine set, $M(\boldsymbol{A}_{0})$, and the polyhedral set, $F(\mu_{0})$, near $\boldsymbol{x}_{0}$. We will bound the calmness constant for $W$ by $(\sin(\theta_{\min})\,\tau)^{-1}$, with $\tau=\tau(\Omega,\boldsymbol{x}_{0})$ from (16), and $\theta_{\min}=\theta_{\min}(\Omega,\boldsymbol{x}_{0})$ the smallest positive principal angle between the affine space $M(\boldsymbol{A}_{0})$ and the affine sets defining the faces of $F(\mu_{0})$ (of any dimension $<n$) containing $\boldsymbol{x}_{0}$.

Denote by $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ the $p=p(\boldsymbol{x}_{0})$ faces of the polyhedron $F(\mu_{0})\subset\mathbb{R}^{n}$ of any dimension less than $n$ that contain $\boldsymbol{x}_{0}$ (and thus also intersect the $(n-m)$-dimensional affine space $M(\boldsymbol{A}_{0})$). Denote by $P_{F(\mu_{0})}$ the projection operator of $\mathbb{R}^{n}$ onto $F(\mu_{0})$. This operator is well-defined since $F(\mu_{0})$ is convex. Choose $\epsilon>0$ small enough so that the set

$$P_{F(\mu_{0})}(W(\mu_{1})\cap B_{\epsilon}(\boldsymbol{x}_{0}))$$

intersects only the faces $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ of $F(\mu_{0})$. Clearly, it also intersects $M(\boldsymbol{A}_{0})$. Choose coordinates so that $\boldsymbol{x}_{0}=\boldsymbol{0}$. Then $M(\boldsymbol{A}_{0})$, $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ can be viewed as linear subspaces of $\mathbb{R}^{n}$, and the following vectors in $\mathbb{R}^{n}$,

$$\boldsymbol{x}_{0}=\boldsymbol{0},\quad\boldsymbol{x}_{1}\in W(\mu_{1})\cap B_{\epsilon}(\boldsymbol{x}_{0}),\quad\text{ and }\quad\boldsymbol{x}_{2}:=P_{F(\mu_{0})}(\boldsymbol{x}_{1}),$$

form a right triangle, as demonstrated in Figure 2.

Figure 2: Demonstration of the points $\boldsymbol{x}_{0}$, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ and related quantities ($M(\boldsymbol{A}_{0})$, $F(\mu_{0})$, $F(\mu_{1})$, the faces $\mathcal{U}_{1}$, $\mathcal{U}_{2}$, and the angle $\theta$).

To prove that $W$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$, we wish to prove a bound of the form

$$d(\boldsymbol{x}_{1},W(\mu_{0}))\leq c(\mu_{0},\boldsymbol{x}_{0})\left|\mu_{0}-\mu_{1}\right|,\quad\textrm{ for some }c(\mu_{0},\boldsymbol{x}_{0})>0.$$ (21)

We start with the case where the triangle is degenerate, that is, either $\boldsymbol{x}_{1}=\boldsymbol{x}_{0}$, or $\boldsymbol{x}_{2}=\boldsymbol{x}_{1}$, or $\boldsymbol{x}_{2}=\boldsymbol{x}_{0}$. First, assume that $\boldsymbol{x}_{1}=\boldsymbol{x}_{0}$. This implies that $\boldsymbol{x}_{1}\in W(\mu_{0})$ and consequently the desired calmness: $d(\boldsymbol{x}_{1},W(\mu_{0}))=0\leq\left|\mu_{0}-\mu_{1}\right|$. Next, assume that $\boldsymbol{x}_{2}=\boldsymbol{x}_{1}$. By definition, $P_{F(\mu_{0})}(\boldsymbol{x}_{1})=\boldsymbol{x}_{1}$, so $\boldsymbol{x}_{1}\in F(\mu_{0})$. This observation and the fact that $\boldsymbol{x}_{1}\in M(\boldsymbol{A}_{0})$ imply that $\boldsymbol{x}_{1}\in W(\mu_{0})$, and the same estimate holds as above. At last, assume that $\boldsymbol{x}_{2}=\boldsymbol{x}_{0}$. Then $\boldsymbol{x}_{2}\in W(\mu_{0})$, so $d(\boldsymbol{x}_{1},W(\mu_{0}))\leq\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert$, and we can bound $\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert=d(\boldsymbol{x}_{1},F(\mu_{0}))$ by (20). We thus verified (21) in these cases with $c(\mu_{0},\boldsymbol{x}_{0})=1/\tau(\Omega,\boldsymbol{x}_{0})$.

In the case of a non-degenerate triangle, we note that since $\boldsymbol{x}_{0}=\boldsymbol{0}\in W(\mu_{0})$,

$$d(\boldsymbol{x}_{1},W(\mu_{0}))\leq\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{0}\right\rVert=\left\lVert\boldsymbol{x}_{1}\right\rVert.$$ (22)

Therefore, to obtain (21) we bound $\left\lVert\boldsymbol{x}_{1}\right\rVert$ in terms of $\left|\mu_{0}-\mu_{1}\right|$. Since $\boldsymbol{0}$, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ form a right triangle, $\left\lVert\boldsymbol{x}_{1}\right\rVert\sin\theta=\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert$, where $\theta$ is the angle between $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$. The combination of this observation with the calmness of $F$, stated in (20), results in the bound

$$\left\lVert\boldsymbol{x}_{1}\right\rVert\sin\theta=\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert=d(\boldsymbol{x}_{1},F(\mu_{0}))\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}\left|\mu_{1}-\mu_{0}\right|.$$ (23)

We note that $\theta$ depends on $\boldsymbol{x}_{1}$, and thus (23) does not yet yield the desired bound on $\left\lVert\boldsymbol{x}_{1}\right\rVert$. To obtain this, we will establish a lower bound on $\theta$.

We index by $k=1,\ldots,\ell$, where $\ell\leq p$, the faces among $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ having at least one positive principal angle with $M(\boldsymbol{A}_{0})$. Note that since we ruled out the trivial case, where the triangle formed by $\boldsymbol{x}_{0}$, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ is degenerate, the latter set is not empty, that is, $\ell\geq 1$. For $k=1,\ldots,\ell$, we denote by $\theta_{k}^{*}$ the smallest positive principal angle between $\mathcal{U}_{k}$ and $M(\boldsymbol{A}_{0})$. Since $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are nonzero and contained in distinct subspaces, they are orthogonal to the principal vectors corresponding to the zero principal angles (see (8)). Hence $\theta_{k}^{*}\leq\theta$ for $k=1,\ldots,\ell$. We further denote by $\theta_{\min}$ the minimal value of $\theta_{k}^{*}$ among $k=1,\ldots,\ell$, so that $\theta_{\min}\leq\theta$. Combining this observation with (22) and (23) yields the desired calmness of $W$ at $(\mu_{0},\boldsymbol{x}_{0})$:

$$d(\boldsymbol{x}_{1},W(\mu_{0}))\leq\frac{1}{\sin(\theta_{\min})\,\tau(\Omega,\boldsymbol{x}_{0})}\left|\mu_{1}-\mu_{0}\right|.$$

Algebraic Proof: This proof follows a similar approach to that of Section 3.2.2. We first write our set $W$ as the inequality system

$$W(\mu_{0})=F(\mu_{0})\cap M(\boldsymbol{A}_{0})=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid f(\boldsymbol{x})\leq 0\},\quad\textrm{ where }f(\boldsymbol{x}):=\max(\Omega(\boldsymbol{x})-\mu_{0},\,d(\boldsymbol{x},M(\boldsymbol{A}_{0}))).$$

We note that by definition $f\geq 0$, so the inequality $f(\boldsymbol{x})\leq 0$ in fact forces $f(\boldsymbol{x})=0$; we nevertheless treat $W(\mu_{0})=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid f(\boldsymbol{x})\leq 0\}$ as a system of inequalities.

Similarly to the proof in Section 3.2.2, the calmness of $W(\mu)$ at $(\mu_{0},\boldsymbol{x}_{0})$ with $\boldsymbol{x}_{0}\in W(\mu_{0})$ follows from (11) with the latter $f$. That is, we need to verify that for all sequences $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $f(\boldsymbol{x}_{i})>0$,

$$\tau^{\prime}(f,\boldsymbol{x}_{0}):=\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}d(0,\partial f(\boldsymbol{x}_{i}))>0.$$ (24)

The argument here is more subtle than in Section 3.2.2, since here $0\in\partial f(\boldsymbol{x}_{0})$, whereas before $0\notin\partial\Omega(\boldsymbol{x}_{0})$. We thus need a more careful analysis of the function $\boldsymbol{x}\mapsto d(0,\partial f(\boldsymbol{x}))$. The definition of $W(\mu_{0})$ and the fact that $f\geq 0$ imply that the minimal value of $f$, which is $0$, is achieved exactly on $W(\mu_{0})$. Thus $0\in\partial f(\boldsymbol{x})$ if and only if $f(\boldsymbol{x})=0$. Hence, for any sequence $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $f(\boldsymbol{x}_{i})>0$, we have $0\notin\partial f(\boldsymbol{x}_{i})$, or equivalently,

$$d(0,\partial f(\boldsymbol{x}_{i}))>0.$$ (25)

In order to conclude (24), we will show that $d(0,\partial f(\boldsymbol{x}))$ is piecewise constant and combine this with (25). By Theorem 3 of Chapter 4 of [23], we have

$$\partial f(\boldsymbol{x})=\textrm{co}\left(\{\partial\Omega(\boldsymbol{x})\mid\Omega(\boldsymbol{x})-\mu_{0}=f(\boldsymbol{x})\}\cup\{\partial d(\boldsymbol{x},M(\boldsymbol{A}_{0}))\mid d(\boldsymbol{x},M(\boldsymbol{A}_{0}))=f(\boldsymbol{x})\}\right),$$ (26)

where co denotes the convex hull. Since $\Omega(\boldsymbol{x})$ has polyhedral level sets, $\partial\Omega(\boldsymbol{x})$ is piecewise constant in the sense that $\partial\Omega(\boldsymbol{x})$ takes only finitely many different sets as values. Similarly, since $M(\boldsymbol{A}_{0})$ is also a polyhedron, $\partial d(\boldsymbol{x},M(\boldsymbol{A}_{0}))$ is piecewise constant as a set-valued function. These two observations and (26) thus imply that $\partial f(\boldsymbol{x})$ is piecewise constant as a set-valued function. Hence, $d(0,\partial f(\boldsymbol{x}))$ is piecewise constant as a real-valued function; in particular, it takes only finitely many values, which are positive along the sequence by (25), so the liminf in (24) is positive and the proof is concluded.

4 Conclusion

We have shown that if $\Omega$ has polyhedral level sets, then the solution map $S(\boldsymbol{A})$ is calm. By tracing the constants in our proof and in Klatte and Kummer [16], we see that a calmness constant for $S$ is

$$\max\left(\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert,\frac{1}{\tau}\right)\left(1+\frac{2C_{\Omega}}{\tau^{\prime}}\right)C_{\Omega}\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert,$$

where $C_{\Omega}$ is the Lipschitz constant of $\Omega$ from Section 3.2.1, $\tau$ is defined in (16) and $\tau^{\prime}$ is defined in (24).
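As a qualitative sanity check of this behavior (our experiment; it does not evaluate the constant above, and the printed ratio upper-bounds $d(\boldsymbol{x}_{1},S(\boldsymbol{A}_{0}))/\|\boldsymbol{E}\|$, with equality when $S(\boldsymbol{A}_{0})$ is a singleton), one can solve (1) for a fixed $\boldsymbol{A}_{0}$ and for small perturbations $\boldsymbol{A}_{0}+\boldsymbol{E}$:

import numpy as np
from scipy.optimize import linprog

def solve_l1(A, b):
    """min ||x||_1 s.t. Ax = b, via the split x = u - v with u, v >= 0."""
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(4)
m, n = 10, 30
A0 = rng.standard_normal((m, n))
x_star = np.zeros(n)
x_star[:3] = 1.0                  # a 3-sparse signal
b = A0 @ x_star
x0 = solve_l1(A0, b)
for eps in [1e-3, 1e-4, 1e-5]:
    E = eps * rng.standard_normal((m, n))
    x1 = solve_l1(A0 + E, b)
    print(eps, np.linalg.norm(x1 - x0) / np.linalg.norm(E, 2))   # stays bounded as eps -> 0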

A key assumption for proving that $W$ is calm is that $\Omega$ has polyhedral sublevel sets. Removing this assumption would allow us to recover known calmness results, such as the case $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{2}$ [24]. Unfortunately, $W$ is in fact not calm for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{2}$, as we show below. Therefore, our approach needs a major change if one wants to extend it to other known seminorms.

We verify our claim above by using the following equivalent definition of calmness: a map $G:Y\to 2^{X}$ is calm at $(\boldsymbol{y}_{0},\boldsymbol{x}_{0})$ if and only if there exist a neighborhood $U$ of $\boldsymbol{x}_{0}$ and a constant $c(\boldsymbol{x}_{0})>0$ such that

$$d(\boldsymbol{x},G(\boldsymbol{y}_{0}))\leq c(\boldsymbol{x}_{0})\,d(\boldsymbol{y}_{0},G^{-1}(\boldsymbol{x}))\quad\text{ for all }\boldsymbol{x}\in U.$$ (27)

Using this definition, we show that $W(\mu)$ with $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{2}$ in $\mathbb{R}^{2}$, $\boldsymbol{A}_{0}=(0,-1)$ and $b=1$ is not calm at $(\mu_{0},\boldsymbol{x}_{0})=(1,(0,-1))$. Recall that by definition

$$W(\mu)=F(\mu)\cap M(\boldsymbol{A}_{0}).$$

In our case,

$$F(\mu)=\{\boldsymbol{x}\mid\left\lVert\boldsymbol{x}\right\rVert_{2}\leq\mu\},\qquad M(\boldsymbol{A}_{0})=(0,-1)+\mathrm{Sp}((1,0)).$$

Thus, for $\boldsymbol{x}\in M(\boldsymbol{A}_{0})$,

$$W^{-1}(\boldsymbol{x})=[\left\lVert\boldsymbol{x}\right\rVert_{2},\infty),$$

so the distance is simply

$$d(\mu_{0},W^{-1}(\boldsymbol{x}))=\left\lVert\boldsymbol{x}\right\rVert_{2}-\mu_{0}=\sqrt{x_{1}^{2}+x_{2}^{2}}-\mu_{0}.$$

Consequently, noting that $W(\mu_{0})=\{(0,-1)\}$, we have

$$\lim_{x_{2}=-1,\,x_{1}\to 0}\frac{d(\mu_{0},W^{-1}(\boldsymbol{x}))}{d(\boldsymbol{x},W(\mu_{0}))}=\lim_{x_{2}=-1,\,x_{1}\to 0}\frac{\sqrt{x_{1}^{2}+1}-1}{|x_{1}|}=0,$$

and thus no bound of the form (27) holds with $c(\boldsymbol{x}_{0})>0$.
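Numerically (our check), the ratio in the last display indeed decays linearly in $x_{1}$:

import numpy as np

for x1 in [1e-1, 1e-2, 1e-3]:
    num = np.sqrt(x1**2 + 1.0) - 1.0   # d(mu0, W^{-1}(x)) for x = (x1, -1)
    den = abs(x1)                      # d(x, W(mu0)), since W(mu0) = {(0, -1)}
    print(x1, num / den)               # ~ x1/2 -> 0, so no c > 0 satisfies (27)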

References

  • [1] David L. Donoho and Michael Elad “Optimally sparse representation in general (nonorthogonal) dictionaries via l1l^{1} minimization” In Proc. Natl. Acad. Sci. USA 100.5, 2003, pp. 2197–2202 DOI: 10.1073/pnas.0437847100
  • [2] E. J. Candès, J. Romberg and T. Tao “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information” In IEEE Transactions on Information Theory 52.2, 2006, pp. 489–509 DOI: 10.1109/TIT.2005.862083
  • [3] Emmanuel J. Candès and Terence Tao “Near-optimal signal recovery from random projections: Universal encoding strategies?” In IEEE Transactions on Information Theory 52.12, 2006, pp. 5406–5425 DOI: 10.1109/TIT.2006.885507
  • [4] D. L. Donoho “For most large underdetermined systems of linear equations the minimal 1\ell_{1}-norm solution is also the sparsest solution” In Communications on Pure and Applied Mathematics 59.6, 2006, pp. 797–829 DOI: 10.1002/cpa.20132
  • [5] David L Donoho “Compressed sensing” In IEEE Transactions on information theory 52.4 IEEE, 2006, pp. 1289–1306
  • [6] Emmanuel J Candès and Terence Tao “Near-optimal signal recovery from random projections: Universal encoding strategies?” In IEEE transactions on information theory 52.12 IEEE, 2006, pp. 5406–5425
  • [7] Emmanuel J Candès “The restricted isometry property and its implications for compressed sensing” In Comptes rendus mathematique 346.9-10, 2008, pp. 589–592
  • [8] D. L. Donoho, M. Elad and V. N. Temlyakov “Stable recovery of sparse overcomplete representations in the presence of noise” In IEEE Transactions on Information Theory 52.1, 2006, pp. 6–18 DOI: 10.1109/TIT.2005.860430
  • [9] J. A. Tropp “Just relax: convex programming methods for identifying sparse signals in noise” In IEEE Transactions on Information Theory 52.3, 2006, pp. 1030–1051 DOI: 10.1109/TIT.2005.864420
  • [10] Matthew A. Herman and Thomas Strohmer “General deviants: An analysis of perturbations in compressed sensing” In IEEE Journal on Selected Topics in Signal Processing 4.2, 2010, pp. 342–349 DOI: 10.1109/JSTSP.2009.2039170
  • [11] Matthew A Herman and Thomas Strohmer “High-resolution radar via compressed sensing” In IEEE transactions on signal processing 57.6 IEEE, 2009, pp. 2275–2284
  • [12] Albert C Fannjiang, Thomas Strohmer and Pengchong Yan “Compressed remote sensing of sparse objects” In SIAM Journal on Imaging Sciences 3.3 SIAM, 2010, pp. 595–618
  • [13] Rémi Gribonval, Holger Rauhut, Karin Schnass and Pierre Vandergheynst “Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms” In Journal of Fourier analysis and Applications 14.5-6 Springer, 2008, pp. 655–687
  • [14] Thomas Blumensath and Mike Davies “Compressed sensing and source separation” In International Conference on Independent Component Analysis and Signal Separation, 2007, pp. 341–348 Springer
  • [15] Alex Gutierrez et al. “Reducing the Complexity of Model-Based MRI Reconstructions via Sparsification” Submitted, 2020
  • [16] Diethard Klatte and Bernd Kummer “On Calmness of the Argmin Mapping in Parametric Optimization Problems” In Journal of Optimization Theory and Applications 165.3, 2015, pp. 708–719 DOI: 10.1007/s10957-014-0643-2
  • [17] Diethard Klatte and Bernd Kummer “Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications” Kluwer Academic Publishers, 2002
  • [18] Abderrahim Jourani “Hoffman’s error bound, local controllability, and sensitivity analysis” In SIAM Journal on Control and Optimization 38.3, 2000, pp. 947–970
  • [19] Per Åke Wedin “On angles between subspaces of a finite dimensional inner product space”, 1983, pp. 263–285 DOI: 10.1007/bfb0062107
  • [20] A.J. Hoffman “On approximate solutions of systems of linear inequalities” In Journal of Research of the National Bureau of Standards 49.4, 1952, pp. 263 DOI: 10.6028/jres.049.027
  • [21] Huynh Van Ngai, Alexander Kruger and Michel Théra “Stability of Error Bounds for Semi-infinite Convex Constraint Systems” In SIAM Journal on Optimization 20.4, 2010, pp. 2080–2096 DOI: 10.1137/090767819
  • [22] Stephen L Campbell and Carl D Meyer “Generalized inverses of linear transformations” SIAM, 2009
  • [23] Aleksandr Davidovich Ioffe “Theory of Extremal Problems”, Studies in Mathematics and its Applications 6 Amsterdam: North-Holland, 1979
  • [24] Jiu Ding “Perturbation analysis for the projection of a point to an affine set” In Linear Algebra and Its Applications 191.C, 1993, pp. 199–212 DOI: 10.1016/0024-3795(93)90515-P