
You Need to Calm Down: Calmness Regularity for a Class of Seminorm Optimization Problems

This work was supported by NSF award DMS-18-30418.

Alex Gutierrez, Gilad Lerman, Sam Stewart
Abstract

Compressed sensing involves solving a minimization problem with objective function $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ and linear constraints $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$. Previous work has explored robustness to errors in $\boldsymbol{A}$ and $\boldsymbol{b}$ under special assumptions. Motivated by these results, we explore robustness to errors in $\boldsymbol{A}$ for a wider class of objective functions $\Omega$ and for a more general setting, where the solution may not be unique. Similar results for errors in $\boldsymbol{b}$ are known and easier to prove. More precisely, for a seminorm $\Omega(\boldsymbol{x})$ with a polyhedral unit ball, we prove that the set-valued map $S(\boldsymbol{A})=\operatorname*{arg\,min}_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})$ is calm in $\boldsymbol{A}$ whenever $\boldsymbol{A}$ has full rank and the minimum value is positive, where calmness is a kind of local Lipschitz regularity.

1 Introduction

Convex optimization problems with linear constraints $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, where $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ is a full-rank matrix with $m<n$, cover a wide variety of applications. A well-known example is compressed sensing, where one aims to recover a sparse signal given a vector $\boldsymbol{b}$ of a reduced number of linear measurements and the very special measurement vectors stored as the rows of $\boldsymbol{A}$. Under some restrictive assumptions, the unknown signal is the solution of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ with minimal $\ell_{1}$ norm [1, 2, 3, 4, 5], or equivalently, the solution of

$$\min_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\left\lVert\boldsymbol{x}\right\rVert_{1}.$$ (1)

A particular case of interest, motivated by applications to MRI, assumes Fourier measurements [2]. In this case, sampling can be done by taking $m=\mathcal{O}(s\log n)$ random measurements of the Fourier coefficients of the unknown signal $\boldsymbol{x}^{*}$, where $s$ is an upper bound for the sparsity of $\boldsymbol{x}^{*}$. These random measurements are dot products of the form $\langle\boldsymbol{x}^{*},\boldsymbol{\psi}_{k}\rangle$, where $\boldsymbol{\psi}_{k}$ is a randomly selected row of the $n\times n$ discrete Fourier transform matrix. Therefore, $\boldsymbol{A}$ has rows $\boldsymbol{\psi}_{1},\ldots,\boldsymbol{\psi}_{m}$ and $\boldsymbol{b}=\boldsymbol{A}\boldsymbol{x}^{*}$. Candès, Romberg and Tao [2] showed that with overwhelming probability the solution of (1) is unique and recovers $\boldsymbol{x}^{*}$. This observation extends to other kinds of measurements (see [4] and also [3, 5]).
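To make (1) concrete, the following minimal numerical sketch (an illustration of ours, not taken from the cited works) solves it as a linear program via the standard splitting $\boldsymbol{x}=\boldsymbol{u}-\boldsymbol{v}$ with $\boldsymbol{u},\boldsymbol{v}\geq 0$; for simplicity it uses a Gaussian matrix in place of the partial Fourier measurements and SciPy's linprog as the solver.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, s = 20, 50, 3                      # measurements, ambient dimension, sparsity
A = rng.standard_normal((m, n))          # stand-in for the measurement matrix
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

# min ||x||_1 s.t. Ax = b, rewritten as: min 1'u + 1'v s.t. A(u - v) = b, u, v >= 0.
res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print(np.linalg.norm(x_hat - x_true))    # near zero when exact recovery holds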

In applications, the actual measurements of $\boldsymbol{A}$ and $\boldsymbol{b}$ are noisy, so understanding the reconstruction error under noise is important. Several works [6, 7, 8, 9] have proven that for very special measurement matrices $\boldsymbol{A}$ and for noisy measurements $\hat{\boldsymbol{b}}$ of the form $\hat{\boldsymbol{b}}=\boldsymbol{A}\boldsymbol{x}^{*}+\boldsymbol{e}$, where $\left\lVert\boldsymbol{e}\right\rVert_{2}$ is sufficiently small and $\boldsymbol{x}^{*}$ has a sufficiently close approximation in $\ell_{1}$ by an $s$-sparse vector, the solution $\boldsymbol{x}$ of (1) with $\boldsymbol{b}$ replaced by $\hat{\boldsymbol{b}}$ is sufficiently close to $\boldsymbol{x}^{*}$ in the $\ell_{2}$ norm. Herman and Strohmer [10] extend this theory by allowing possible errors in the measurement matrix $\boldsymbol{A}$.

Many other applications involve error in the matrix $\boldsymbol{A}$. Examples include radar [11], remote sensing [12], telecommunications [13] and source separation [14]. More recently, Gutierrez et al. [15] developed a compression technique for model-based MRI reconstruction. Here there are at least two sources of error: the discretization of a set of ODEs and the search for a sparse basis to represent the simulation results. They use the total variation norm instead of the $\ell_{1}$ norm.

Motivated by these broad model problems, we assume a seminorm $\Omega$ from a particular class, which includes the $\ell_{1}$ and total variation norms, and we prove a general stability result, with respect to perturbations in $\boldsymbol{A}$, for the convex optimization problem

$$\min_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x}).$$ (2)

Even for the special case of the $\ell_{1}$ norm, our result differs from the earlier perturbation result of Herman and Strohmer [10], since we do not enforce the more restricted regime of singleton solution sets. We are thus able to eliminate their conditions on the matrix $\boldsymbol{A}$ and prove a kind of local stability result. For general convex $\Omega$, Klatte and Kummer [16] have shown that (2) is stable with respect to perturbations in $\boldsymbol{b}$. We extend their result to stability in $\boldsymbol{A}$ for a restricted class of $\Omega$.

The paper is organized as follows. In Section 2 we define the various notions of set-valued regularity and describe stability results for set-valued functions that are related to our problem. In Section 3, we formulate and prove our theorem using tools from subspace perturbation theory, the theory of polyhedral mappings, the theory of local error bounds for convex inequality systems, and the theory of set-valued regularity. Finally, in Section 4 we conclude this work and clarify the difficulty of extending our approach to the full class of seminorms.

2 Background

Our theory and proof combine ideas from different areas. We review here the necessary background according to the topics indicated by the section titles. We remark that the theory we use is formulated for a more general setting. Therefore, here and throughout the paper we formulate ideas both for our special setting and for a setting of Banach spaces with a more general form of $\Omega$ and more general constraints. We often use the same notation for both settings.

2.1 Mathematical Formulation of Our Objective

We consider the following solution map

$$S(\boldsymbol{A})=\operatorname*{arg\,min}_{\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})=\operatorname*{arg\,min}_{\boldsymbol{x}\in M(\boldsymbol{A})}\Omega(\boldsymbol{x}),$$ (3)

where $M(\boldsymbol{A})$ denotes the constraint set

$$M(\boldsymbol{A})=\{\boldsymbol{z}\in\mathbb{R}^{n}\mid\boldsymbol{A}\boldsymbol{z}=\boldsymbol{b}\}.$$

We note that generally $S(\boldsymbol{A})$ is a set-valued function. We arbitrarily fix a full-rank $\boldsymbol{A}_{0}\in\mathbb{R}^{m\times n}$ and quantitatively study the effect of perturbing $\boldsymbol{A}_{0}$ on the set-valued solution map $S(\boldsymbol{A})$.

2.2 Regularity of Set-Valued Functions

In order to quantify the effect of perturbing $\boldsymbol{A}_{0}$ on the set-valued solution map $S(\boldsymbol{A})$, we use some well-known notions of regularity of set-valued functions. We first motivate them for our setting by considering the special case where $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ and $S(\boldsymbol{A}_{0})$ is an entire face of the $\ell_{1}$ ball. Then a small perturbation of $\boldsymbol{A}_{0}$ results in a jump from that face to a single vertex, as we demonstrate in Figure 1. A model set-valued function with such a jump, which needs to be handled by the developed theory, is

$$G(t)=\begin{cases}[0,1]&\textrm{ if }t\in(1/2,1]\\ 0&\textrm{ if }t\in[0,1/2).\end{cases}$$

This function has several properties, which we will formulate below.

We will state their definitions in terms of Banach spaces $X$ and $Y$, where we denote by $2^{X}$ the collection of all subsets of $X$, by $\|\cdot\|$ the norm of $Y$, and by $d(\cdot,\cdot)$ the distance between points or sets in $X$ induced by the norm of $X$. Throughout the paper we go back and forth between the general setting and our special setting of $X=\mathbb{R}^{n}$ with the Euclidean norm $\|\cdot\|$ and $Y=\mathbb{R}^{m\times n}$ with the distance induced by the spectral norm, which we also denote by $\|\cdot\|$. The induced distance is denoted by either $\|\cdot-\cdot\|$, when involving only points, or $d(\cdot,\cdot)$, when involving sets. Even though we use the same notation, e.g., $\|\cdot\|$ and $d(\cdot,\cdot)$, for different settings, they are easily distinguishable: when taking norms and distances in the Euclidean case we use vectors, which we denote by boldfaced small letters; matrices are denoted by boldfaced capital letters; and elements of abstract Banach spaces are denoted by regular small letters.

It is somewhat intuitive that the above function $G(t)$ is “upper semicontinuous”. We clarify this notion as follows.

Definition 2.1.

A map $F:Y\to 2^{X}$ is upper semicontinuous at $y_{0}$ if for any open set $V$ containing $F(y_{0})$ there exists a neighborhood $U$ of $y_{0}$ such that

$$F(y)\subseteq V,\quad\textrm{ for all }y\in U.$$
Figure 1: Demonstration of a case where $S(\boldsymbol{A}_{0})$ is a face of the unit $\ell_{1}$ ball and $S(\boldsymbol{A})$, where $\boldsymbol{A}=\boldsymbol{A}_{0}+\boldsymbol{E}$ and $0<\|\boldsymbol{E}\|\ll 1$, is a vertex of this ball. Here, the line intersecting the face represents the solution set of $\boldsymbol{A}_{0}\boldsymbol{x}=\boldsymbol{b}$, the other line represents the solution set of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$, and the sets $S(\boldsymbol{A}_{0})$ and $S(\boldsymbol{A})$ are the intersections of these lines, respectively, with the unit $\ell_{1}$ ball, which is represented by the rotated square.

It has been known since the 1970s that if $\Omega$ is convex, then the solution map $S(\boldsymbol{A})$ is upper semicontinuous [17, Theorem 1.15]. Our result upgrades the upper semicontinuity of $S(\boldsymbol{A})$ to a kind of set-valued Lipschitz regularity, which we define next.

Definition 2.2.

A map $F:Y\to 2^{X}$ is calm at $(y_{0},x_{0})\in\textrm{Graph}(F)$ if there exist neighborhoods $U$ and $V$ of $y_{0}$ and $x_{0}$, respectively, and a constant $L(y_{0},x_{0})$ such that for all $y_{1}\in U$ and $x_{1}\in F(y_{1})\cap V$,

$$d(x_{1},F(y_{0}))\leq L(y_{0},x_{0})\left\lVert y_{0}-y_{1}\right\rVert.$$

We note that if $S(\boldsymbol{A})$ is calm at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$, then for every $\boldsymbol{A}_{1}$ sufficiently close to $\boldsymbol{A}_{0}$ and every $\boldsymbol{x}_{1}\in S(\boldsymbol{A}_{1})$ sufficiently close to $\boldsymbol{x}_{0}$, there exists $\boldsymbol{x}_{0}^{\prime}\in S(\boldsymbol{A}_{0})$ such that

$$\left\lVert\boldsymbol{x}_{0}^{\prime}-\boldsymbol{x}_{1}\right\rVert\leq C(\boldsymbol{A}_{0})\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert.$$

This means that small perturbations of the measurement matrix $\boldsymbol{A}$ keep every nearby perturbed solution $\boldsymbol{x}_{1}\in S(\boldsymbol{A}_{1})$ close to some solution $\boldsymbol{x}_{0}^{\prime}\in S(\boldsymbol{A}_{0})$ of the unperturbed problem.

We demonstrate that calmness is stronger than upper semicontinuity with the following example:

$$H(t)=[-\sqrt{t},\sqrt{t}]\quad\text{ for }t\in[0,1].$$

The upper semicontinuity of $H$ is obvious. It is not calm at $(0,0)$, since for $t_{0}$ and $t_{1}$ with sufficiently small absolute values the distance between $H(t_{0})$ and $H(t_{1})$ grows as $|\sqrt{t_{0}}-\sqrt{t_{1}}|$ and in general cannot be controlled by $|t_{0}-t_{1}|$. Indeed, if, for example, $t_{0}=0$ and $t>0$ is arbitrarily small, then $H(0)=\{0\}$ (or one may write $H(0)=0$ to emphasize the single value) and $d(H(0),H(t))=\sqrt{t}\gg t$.
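A quick numerical check of this failure (our illustration): taking $x_{1}=\sqrt{t}\in H(t)$, the ratio $d(x_{1},H(0))/|t-0|$ blows up as $t\to 0$, so no finite calmness constant exists.

import numpy as np

for t in [1e-2, 1e-4, 1e-6]:
    x1 = np.sqrt(t)      # a point of H(t) = [-sqrt(t), sqrt(t)]
    print(t, x1 / t)     # d(x1, H(0)) / |t - 0| grows like 1/sqrt(t)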

At last, we define a stronger notion of regularity, which will be useful later. Unlike lower Lipschitz semicontinuity (Definition 2.4 below), where only $y$ varies, here both $y$ and $y^{\prime}$ are allowed to vary. Here and throughout the paper, $B_{\delta}(y)$ denotes an open ball in $Y$ with center $y$ and radius $\delta$, and similarly $B_{\epsilon}(x)$ denotes a corresponding open ball in $X$.

Definition 2.3.

A map $F:Y\to 2^{X}$ is said to have the Aubin property at $(y_{0},x_{0})\in\textrm{Graph}\,F$ if there exist $\epsilon,\delta,C>0$ such that for all $y,y^{\prime}\in B_{\delta}(y_{0})$ and all $x\in B_{\epsilon}(x_{0})\cap F(y)$,

$$d(x,F(y^{\prime}))\leq C\left\lVert y-y^{\prime}\right\rVert.$$

Note that we will use the Aubin property on the “inverse” of $F$, defined as

$$F^{-1}(x)=\{y\in Y\mid x\in F(y)\}.$$

Note that the Aubin property of $F^{-1}$ implies a local bi-Lipschitz property of $F$.

2.3 Sufficient Conditions for Calmness of Certain Solution Maps

In the general setting of Banach spaces $X$ and $Y$ (with the above notation), and functions $M:Y\to 2^{X}$ and $f:X\times Y\to\mathbb{R}$, Klatte and Kummer [16] provide sufficient conditions for calmness at $(y_{0},x_{0})\in\textrm{Graph}(S)$ of

$$S(y)=\operatorname*{arg\,min}_{x\in M(y)}f(x,y).$$ (4)

These sufficient conditions are requirements on the regularity of the constraint set $M(y)$ and the regularity of the mapping

$$L(y,\mu)=\{x\in M(y)\mid f(x,y_{0})\leq\mu\}$$ (5)

for a certain choice of $\mu$. The regularity of these mappings is quantified by calmness and by the following notion of lower Lipschitz semicontinuity, which is almost identical to calmness except that $y$ varies instead of $x$.

Definition 2.4.

A set-valued map $F:Y\to 2^{X}$ is called lower Lipschitz semicontinuous at $(y_{0},x_{0})$ if there exist $\delta,C>0$ such that

$$d(x_{0},F(y))\leq C\left\lVert y_{0}-y\right\rVert,\quad\textrm{ for all }y\in B_{\delta}(y_{0}).$$

We formulate Theorem 3.1 of Klatte and Kummer [16] on sufficient conditions for calmness of $S(y)$. For $y_{0}\in Y$ we use the quantity $\mu_{0}=f(S(y_{0}),y_{0})$, which is well-defined since $f(\cdot,y_{0})$ is constant on $S(y_{0})$.

Theorem 2.5.

Assume the general setting of Banach spaces $X$ (with distance $d$) and $Y$ (with norm $\|\cdot\|$), and functions $M:Y\to 2^{X}$, $f:X\times Y\to\mathbb{R}$, $S:Y\to 2^{X}$ defined in (4) and $L:Y\times\mathbb{R}\to 2^{X}$ defined in (5). If $y_{0}\in Y$, $x_{0}\in X$, $\mu_{0}=f(S(y_{0}),y_{0})$ and

  1. $M(y)$ is calm and lower Lipschitz semicontinuous at $(y_{0},x_{0})$;

  2. $L(y,\mu)$ is calm at $((y_{0},\mu_{0}),x_{0})$;

then $S(y)$ is calm at $(y_{0},x_{0})$.

2.4 Polyhedral Level Sets

We assume that the unit ball of $\Omega$, $\{\boldsymbol{x}\mid\Omega(\boldsymbol{x})\leq 1\}$, is a polyhedron. For brevity, by a polyhedron we always mean a convex polyhedron, that is, an intersection of finitely many half-spaces.

We relate the above property to the following well-known general notions of a polyhedral map and polyhedral level sets, which we use in this work. We define the former notion for a general function and the latter one only for $\Omega$. For Banach spaces $X$ and $Y$, a map $F:X\to Y$ is a polyhedral map if there exist polyhedra $P_{i}$, $i=1,\ldots,N$, such that

$$\textrm{Graph}\,F=\{(x,F(x))\mid x\in X\}=\bigcup_{i=1}^{N}P_{i}.$$

We say that $\Omega$ has polyhedral level sets if the level-set map

$$F(\mu)=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x})\leq\mu\}$$ (6)

is polyhedral. Since $\Omega$ is a seminorm, this is equivalent to the assumed property that $F(1)$ consists of a single polyhedron $P_{1}$ (this claim will be clearer from the geometric argument presented later in Section 3.2.2).
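For instance (an illustrative aside of ours), for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ on $\mathbb{R}^{2}$,

$$F(1)=\{\boldsymbol{x}\mid|x_{1}|+|x_{2}|\leq 1\}=\bigcap_{\sigma_{1},\sigma_{2}\in\{\pm 1\}}\{\boldsymbol{x}\mid\sigma_{1}x_{1}+\sigma_{2}x_{2}\leq 1\},$$

the intersection of four half-planes, and $F(\mu)=\mu F(1)$ for $\mu>0$ by homogeneity.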

We use a classic result from the theory of polyhedral mappings [18]:

Theorem 2.6.

A polyhedral set-valued mapping $H:\mathbb{R}\to 2^{\mathbb{R}^{n}}$ is calm at all $(y_{0},\boldsymbol{x}_{0})\in\textrm{Graph}(H)$.

2.5 Principal Angles

We exploit the geometric interplay between the affine subspace of solutions of $\boldsymbol{A}\boldsymbol{x}=\boldsymbol{b}$ and the affine subspaces of the polyhedral level sets using principal angles between shifted linear subspaces [19]. The principal angles $0\leq\theta_{1}\leq\cdots\leq\theta_{r}\leq\pi/2$ between two linear subspaces $\mathcal{U}$ and $\mathcal{V}$ of $\mathbb{R}^{n}$ with dimensions $r$ and $k$, respectively, where $r<k$, are defined recursively along with principal vectors $\boldsymbol{u}_{1},\ldots,\boldsymbol{u}_{r}$ and $\boldsymbol{v}_{1},\ldots,\boldsymbol{v}_{r}$. The smallest principal angle $\theta_{1}$ is given by

$$\theta_{1}=\min_{\begin{subarray}{c}\boldsymbol{u}\in\mathcal{U},\,\boldsymbol{v}\in\mathcal{V}\\ \|\boldsymbol{u}\|=\|\boldsymbol{v}\|=1\end{subarray}}\arccos(|\boldsymbol{u}^{T}\boldsymbol{v}|).$$ (7)

The principal vectors $\boldsymbol{u}_{1}$ and $\boldsymbol{v}_{1}$ achieve the minimum in (7). For $2\leq i\leq r$, the $i$th principal angle, $\theta_{i}$, and principal vectors, $\boldsymbol{u}_{i}$ and $\boldsymbol{v}_{i}$, are defined by

$$\theta_{i}=\min_{\begin{subarray}{c}\boldsymbol{u}\in\mathcal{U},\,\|\boldsymbol{u}\|=1,\,\boldsymbol{u}\perp\boldsymbol{u}_{1},\dots,\boldsymbol{u}_{i-1}\\ \boldsymbol{v}\in\mathcal{V},\,\|\boldsymbol{v}\|=1,\,\boldsymbol{v}\perp\boldsymbol{v}_{1},\dots,\boldsymbol{v}_{i-1}\end{subarray}}\arccos(|\boldsymbol{u}^{T}\boldsymbol{v}|),$$ (8)

where $\boldsymbol{u}_{i}$ and $\boldsymbol{v}_{i}$ achieve the minimum in (8).
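Principal angles are also convenient to compute: the cosines $\cos\theta_{i}$ are the singular values of $\boldsymbol{U}^{T}\boldsymbol{V}$ for orthonormal bases $\boldsymbol{U}$ and $\boldsymbol{V}$ of the two subspaces. A minimal sketch of ours, cross-checked against SciPy's subspace_angles:

import numpy as np
from scipy.linalg import subspace_angles

rng = np.random.default_rng(1)
U = np.linalg.qr(rng.standard_normal((5, 2)))[0]   # orthonormal basis of a 2-dim subspace of R^5
V = np.linalg.qr(rng.standard_normal((5, 3)))[0]   # orthonormal basis of a 3-dim subspace of R^5

cos_theta = np.linalg.svd(U.T @ V, compute_uv=False)   # descending singular values
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))       # principal angles, ascending
assert np.allclose(np.sort(theta), np.sort(subspace_angles(U, V)))
print(theta)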

2.6 Bounds for a System of Convex Inequalities

Our proof gives rise to a system of inequalities $f(\boldsymbol{x})\leq 0$, where $f$ is a convex function, and requires bounding the distance between the solution set of this system and a point that lies near the boundary of this set. The sharp constant of the required bound uses the subdifferential of $f$.

We first recall that the subdifferential of a convex function $f:\mathbb{R}^{n}\to\mathbb{R}$ is defined as the following set of subgradients of $f$:

$$\partial f(\boldsymbol{x}):=\{\boldsymbol{v}\in\mathbb{R}^{n}\mid f(\boldsymbol{z})-f(\boldsymbol{x})\geq\langle\boldsymbol{v},\,\boldsymbol{z}-\boldsymbol{x}\rangle\textrm{ for all }\boldsymbol{z}\in\mathbb{R}^{n}\}.$$
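As a standard example (included by us for orientation), for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$,

$$\partial\Omega(\boldsymbol{x})=\{\boldsymbol{v}\in\mathbb{R}^{n}\mid v_{i}=\operatorname{sign}(x_{i})\textrm{ if }x_{i}\neq 0,\ v_{i}\in[-1,1]\textrm{ if }x_{i}=0\};$$

in particular, $\partial\Omega$ takes only finitely many distinct values as $\boldsymbol{x}$ ranges over $\mathbb{R}^{n}$, a fact we will meet again in Section 3.2.3.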

The use of the subdifferential to bound distances to constraint sets began with Hoffman's original work [20] on error bounds for systems of inequalities $\boldsymbol{A}\boldsymbol{x}\leq\boldsymbol{b}$. Define $\mathcal{S}_{\boldsymbol{A},\boldsymbol{b}}$ to be the set of all $\boldsymbol{x}\in\mathbb{R}^{n}$ that satisfy $\boldsymbol{A}\boldsymbol{x}\leq\boldsymbol{b}$. Assuming this set is nonempty, Hoffman shows that there exists a constant $C$ so that for all $\boldsymbol{x}\in\mathbb{R}^{n}$,

$$d(\boldsymbol{x},\mathcal{S}_{\boldsymbol{A},\boldsymbol{b}})\leq C\|[\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}]_{+}\|,$$ (9)

where $([\boldsymbol{x}]_{+})_{i}=\max(x_{i},0)$ for $i=1,\ldots,n$. In his original paper, Hoffman proved the bound using results from conic geometry and then computed the constants ad hoc for a few different norms. Later results, see e.g. [21], generalize this to convex inequalities $f(\boldsymbol{x})\leq 0$ and compute sharp constants using the subdifferential $\partial f(\boldsymbol{x})$. We will use the following theorem of [21], where (10) below is analogous to (9).
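The following small sketch (ours; it assumes a randomly generated feasible system) checks a Hoffman-type bound numerically: the distance from $\boldsymbol{x}$ to $\mathcal{S}_{\boldsymbol{A},\boldsymbol{b}}$ is computed by projection with SciPy's trust-constr solver and compared with the residual $\|[\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}]_{+}\|$; by (9), the ratio of the two is a lower bound on the constant $C$.

import numpy as np
from scipy.optimize import minimize, LinearConstraint

rng = np.random.default_rng(2)
m, n = 8, 4
A = rng.standard_normal((m, n))
z0 = rng.standard_normal(n)
b = A @ z0 + rng.uniform(0.1, 1.0, size=m)   # z0 is strictly feasible, so S_{A,b} is nonempty
x = rng.standard_normal(n)

# Project x onto S_{A,b} = {z : Az <= b} by minimizing 0.5 * ||z - x||^2.
res = minimize(lambda z: 0.5 * np.sum((z - x) ** 2), x0=z0,
               jac=lambda z: z - x,
               constraints=[LinearConstraint(A, -np.inf, b)],
               method="trust-constr")
dist = np.linalg.norm(res.x - x)                       # d(x, S_{A,b})
residual = np.linalg.norm(np.maximum(A @ x - b, 0.0))  # ||[Ax - b]_+||
print(dist, residual)                                  # (9) asserts dist <= C * residual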

Theorem 2.7.

If $f:\mathbb{R}^{n}\to\mathbb{R}$ is a convex function, $Q=\{\boldsymbol{z}\in\mathbb{R}^{n}\mid f(\boldsymbol{z})\leq 0\}$ and $\boldsymbol{x}_{0}$ lies on the boundary of $Q$, then the following two statements are equivalent:

  1. There exist $c(f,\boldsymbol{x}_{0})$ and $\epsilon>0$ such that for all $\boldsymbol{x}\in B_{\epsilon}(\boldsymbol{x}_{0})$,

$$d(\boldsymbol{x},Q)\leq c(f,\boldsymbol{x}_{0})[f(\boldsymbol{x})]_{+};$$ (10)

  2. For all sequences $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $f(\boldsymbol{x}_{i})>0$ for all $i\in\mathbb{N}$,

$$\tau(f,\boldsymbol{x}_{0}):=\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}d(0,\partial f(\boldsymbol{x}_{i}))>0.$$ (11)

Moreover, $c=\tau(f,\boldsymbol{x}_{0})^{-1}$ is optimal in (10).

3 The Main Theorem and its Proof

We formulate the main theorem of this paper and follow up with its proof.

Theorem 3.1.

Assume $m\leq n$, $\boldsymbol{b}\in\mathbb{R}^{m}$, $\boldsymbol{A}\in\mathbb{R}^{m\times n}$ and that $\Omega$ is a seminorm with a polyhedral unit ball. Then the set-valued map defined in (3) is calm at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$ for every full-rank $\boldsymbol{A}_{0}\in\mathbb{R}^{m\times n}$ such that $\min_{\boldsymbol{A}_{0}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})>0$ and for every $\boldsymbol{x}_{0}\in S(\boldsymbol{A}_{0})$.

In analogy to the notation used in Theorem 2.5, we denote $\mu_{0}=\min_{\boldsymbol{A}_{0}\boldsymbol{x}=\boldsymbol{b}}\Omega(\boldsymbol{x})$. The assumption of Theorem 3.1 that $\mu_{0}>0$ is reasonable from an applied perspective. For example, in the case of compressed sensing, where $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$, $\mu_{0}=0$ corresponds to the trivial solution $\boldsymbol{x}=\boldsymbol{0}$. The condition $\mu_{0}>0$ thus only excludes the trivial solution, where the sparsity prior is meaningless.

In order to prove this theorem, we first prove in Section 3.1 that $M(\boldsymbol{A})$ is calm and lower Lipschitz semicontinuous at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$. We then prove in Section 3.2 that the map

$$L(\boldsymbol{A},\mu)=\{\boldsymbol{x}\in M(\boldsymbol{A})\mid\Omega(\boldsymbol{x})\leq\mu\}$$

is calm at $((\boldsymbol{A}_{0},\mu_{0}),\boldsymbol{x}_{0})$. In view of Theorem 2.5, these results imply Theorem 3.1.

3.1 $M(\boldsymbol{A})$ is Calm and Lower Lipschitz at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$

The following classical property of the Moore-Penrose inverse will be useful. We offer a short proof here.

Lemma 3.2.

Let $\boldsymbol{x}\in\mathbb{R}^{n}$ and let $\boldsymbol{A}^{\dagger}$ be the Moore-Penrose inverse of $\boldsymbol{A}$ [22]. Then the following bound holds:

$$d(\boldsymbol{x},M(\boldsymbol{A}))\leq\left\lVert\boldsymbol{A}^{\dagger}\right\rVert\left\lVert\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}\right\rVert.$$ (12)
Proof.

The operator $\boldsymbol{I}-\boldsymbol{A}^{\dagger}\boldsymbol{A}$ is the projection onto the kernel of $\boldsymbol{A}$, and thus the projection $P_{M(\boldsymbol{A})}$ onto $M(\boldsymbol{A})$ is given by

$$P_{M(\boldsymbol{A})}\boldsymbol{x}=\boldsymbol{A}^{\dagger}\boldsymbol{b}+(\boldsymbol{I}-\boldsymbol{A}^{\dagger}\boldsymbol{A})\boldsymbol{x}.$$

Therefore,

$$d(\boldsymbol{x},M(\boldsymbol{A}))=\left\lVert\boldsymbol{x}-P_{M(\boldsymbol{A})}\boldsymbol{x}\right\rVert=\left\lVert\boldsymbol{x}-\boldsymbol{A}^{\dagger}\boldsymbol{b}-\boldsymbol{x}+\boldsymbol{A}^{\dagger}\boldsymbol{A}\boldsymbol{x}\right\rVert=\left\lVert\boldsymbol{A}^{\dagger}\boldsymbol{A}\boldsymbol{x}-\boldsymbol{A}^{\dagger}\boldsymbol{b}\right\rVert\leq\left\lVert\boldsymbol{A}^{\dagger}\right\rVert\left\lVert\boldsymbol{A}\boldsymbol{x}-\boldsymbol{b}\right\rVert.$$
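A numerical check of Lemma 3.2 (our sketch): the projection formula from the proof gives the exact distance to $M(\boldsymbol{A})$, which can be compared against the bound (12).

import numpy as np

rng = np.random.default_rng(3)
m, n = 3, 6
A = rng.standard_normal((m, n))            # full rank with probability 1
b = rng.standard_normal(m)
x = rng.standard_normal(n)

Ap = np.linalg.pinv(A)                     # Moore-Penrose inverse
proj = Ap @ b + (np.eye(n) - Ap @ A) @ x   # P_{M(A)} x, as in the proof
assert np.allclose(A @ proj, b)            # proj indeed lies in M(A)

dist = np.linalg.norm(x - proj)                              # d(x, M(A))
bound = np.linalg.norm(Ap, 2) * np.linalg.norm(A @ x - b)    # right-hand side of (12)
print(dist, bound)                                           # dist <= bound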

We derive the calmness of $M(\boldsymbol{A})$ at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})\in\mathbb{R}^{m\times n}\times\mathbb{R}^{n}$ from the above lemma. We take the neighborhoods of Definition 2.2 to be $U=\mathbb{R}^{m\times n}$ and $V$ any bounded neighborhood of $\boldsymbol{x}_{0}$ in $\mathbb{R}^{n}$ of diameter $D>0$; any choice of $D>0$ is fine, but it impacts the calmness constant (see below). We arbitrarily fix $\boldsymbol{A}_{1}\in U$ and $\boldsymbol{x}_{1}\in M(\boldsymbol{A}_{1})\cap V$. To verify calmness, we directly apply Lemma 3.2,

$$d(\boldsymbol{x}_{1},M(\boldsymbol{A}_{0}))\leq\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert\left\lVert\boldsymbol{A}_{0}\boldsymbol{x}_{1}-\boldsymbol{b}\right\rVert,$$

and further note that

$$\boldsymbol{A}_{0}\boldsymbol{x}_{1}-\boldsymbol{b}=\boldsymbol{A}_{0}\boldsymbol{x}_{1}-\boldsymbol{b}-\boldsymbol{A}_{1}\boldsymbol{x}_{1}+\boldsymbol{b}=(\boldsymbol{A}_{0}-\boldsymbol{A}_{1})\boldsymbol{x}_{1}.$$

Combining the above two equations and using the triangle inequality yields the desired calmness of $M(\boldsymbol{A})$:

$$d(\boldsymbol{x}_{1},M(\boldsymbol{A}_{0}))\leq\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert\left\lVert\boldsymbol{x}_{1}\right\rVert\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert\leq\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert(\left\lVert\boldsymbol{x}_{0}\right\rVert+D)\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert.$$ (13)

The calmness constant $\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert(\left\lVert\boldsymbol{x}_{0}\right\rVert+D)$ is finite since the diameter $D$ is bounded and $\boldsymbol{A}_{0}$ is full-rank, so its smallest singular value $\sigma_{\min}(\boldsymbol{A}_{0})$ is positive and thus $\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert=1/\sigma_{\min}(\boldsymbol{A}_{0})$ is bounded. We remark that the bound in the first inequality of (13) is analogous to that in (9), though the latter applies to a system of inequalities and not just equalities.

To see that $M(\boldsymbol{A})$ is lower Lipschitz semicontinuous, we use again the formula $\left\lVert\boldsymbol{A}^{\dagger}\right\rVert=1/\sigma_{\min}(\boldsymbol{A})$. Since the complex roots of a polynomial vary continuously with respect to its coefficients, the eigenvalues of a real symmetric matrix vary continuously with respect to that matrix. Thus $\sigma_{\min}(\boldsymbol{A})$, which is the square root of the smallest eigenvalue of $\boldsymbol{A}\boldsymbol{A}^{T}$, depends continuously on $\boldsymbol{A}$. Therefore, for every $\epsilon>0$ there exists a $\delta>0$ such that for $\boldsymbol{A}_{1}\in B_{\delta}(\boldsymbol{A}_{0})$, $\sigma_{\min}(\boldsymbol{A}_{1})>0$, so that $\boldsymbol{A}_{1}$ is also full-rank, and

$$\left\lVert\boldsymbol{A}_{1}^{\dagger}\right\rVert\leq(1+\epsilon)\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert.$$ (14)

Swapping $\boldsymbol{A}_{1}$ and $\boldsymbol{A}_{0}$ in the first inequality of (13) and combining it with (14) gives

$$d(\boldsymbol{x}_{0},M(\boldsymbol{A}_{1}))\leq(1+\epsilon)\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert\left\lVert\boldsymbol{x}_{0}\right\rVert\left\lVert\boldsymbol{A}_{0}-\boldsymbol{A}_{1}\right\rVert.$$

This bound shows that $M(\boldsymbol{A})$ is lower Lipschitz semicontinuous at $(\boldsymbol{A}_{0},\boldsymbol{x}_{0})$.

3.2 $L(\boldsymbol{A},\mu)$ is Calm at $((\boldsymbol{A}_{0},\mu_{0}),\boldsymbol{x}_{0})$

Recall the map $F(\mu)$ defined in (6) and note that we can write

$$L(\boldsymbol{A},\mu)=M(\boldsymbol{A})\cap F(\mu).$$ (15)

Define also the map

$$W(\mu)=M(\boldsymbol{A}_{0})\cap F(\mu).$$

The following theorem from [16] gives conditions on $M$ and $F$ that ensure that $L$ is calm.

Theorem 3.3.

Let $X$, $Y$, $Z$ be Banach spaces. Assume that $F_{1}:Y\to 2^{X}$ is calm at $(y_{0},x_{0})\in\textrm{Graph}\,F_{1}$, that $F_{2}:Z\to 2^{X}$ is calm at $(z_{0},x_{0})\in\textrm{Graph}\,F_{2}$, that $F_{1}^{-1}$ has the Aubin property at $(x_{0},y_{0})$, and that $F_{3}(z):=F_{1}(y_{0})\cap F_{2}(z)$ is calm at $(z_{0},x_{0})$. Then $F_{4}(y,z):=F_{1}(y)\cap F_{2}(z)$ is calm at $((y_{0},z_{0}),x_{0})$.

We will apply the theorem with $X=\mathbb{R}^{n}$, $Y=\mathbb{R}^{m\times n}$, and $Z=\mathbb{R}$. We set $F_{1}=M$, $F_{2}=F$, $F_{3}=W$, and $F_{4}=L$. Let $((\boldsymbol{A}_{0},\mu_{0}),\boldsymbol{x}_{0})\in\textrm{Graph}\,L$. To see that $L$ is calm at this point, we thus verify that

  1. $F^{-1}(\boldsymbol{x})$ has the Aubin property at $(\boldsymbol{x}_{0},\mu_{0})$;

  2. $F(\mu)$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$;

  3. $W(\mu)$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$.

We verify property 1 in Section 3.2.1, property 2 in Section 3.2.2, and property 3 in Section 3.2.3.

The main trick is that we have reduced the problem from perturbations in $\boldsymbol{A}$ to better-understood cases of perturbations. For example, in verifying properties 2 and 3 we need to study tolerance to perturbations in the right-hand side of an inequality system, represented by either $F(\mu)$ or $W(\mu)$. We prove the calmness of $W(\mu)$ and $F(\mu)$ by both geometric and algebraic approaches. The geometric ones are quite simple, while the algebraic ones enable us to compute explicit constants.

3.2.1 $F^{-1}$ is Aubin

Since $\Omega$ is convex, it is Lipschitz on $B_{\delta}(\boldsymbol{x}_{0})$ for some $\delta>0$. That is, there exists $C_{\Omega}\in\mathbb{R}$ so that

$$\left|\Omega(\boldsymbol{x})-\Omega(\boldsymbol{x}^{\prime})\right|\leq C_{\Omega}\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert\quad\textrm{ for all }\boldsymbol{x},\boldsymbol{x}^{\prime}\in B_{\delta}(\boldsymbol{x}_{0}).$$

For $\boldsymbol{x},\boldsymbol{x}^{\prime}\in B_{\delta}(\boldsymbol{x}_{0})$ and $\mu_{1}\in F^{-1}(\boldsymbol{x})=[\Omega(\boldsymbol{x}),\infty)$, we have $\mu_{1}\geq\Omega(\boldsymbol{x})$, and we obtain by direct estimation and the above inequality the desired Aubin property as follows:

$$d(\mu_{1},F^{-1}(\boldsymbol{x}^{\prime}))=[\Omega(\boldsymbol{x}^{\prime})-\mu_{1}]_{+}\leq[\Omega(\boldsymbol{x}^{\prime})-\Omega(\boldsymbol{x})]_{+}\leq\left|\Omega(\boldsymbol{x})-\Omega(\boldsymbol{x}^{\prime})\right|\leq C_{\Omega}\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert.$$
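For example (an aside of ours), for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{1}$ one may take $C_{\Omega}=\sqrt{n}$ globally, since by the triangle and Cauchy-Schwarz inequalities

$$\left|\left\lVert\boldsymbol{x}\right\rVert_{1}-\left\lVert\boldsymbol{x}^{\prime}\right\rVert_{1}\right|\leq\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert_{1}\leq\sqrt{n}\left\lVert\boldsymbol{x}-\boldsymbol{x}^{\prime}\right\rVert.$$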

3.2.2 $F$ is calm

Geometric Proof: Note that by the degree-one homogeneity of the seminorm $\Omega$, for $\mu>0$,

$$F(\mu)=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x})\leq\mu\}=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x}/\mu)\leq 1\}=\mu\cdot\{\boldsymbol{x}\in\mathbb{R}^{n}\mid\Omega(\boldsymbol{x})\leq 1\}.$$

That is, for $\mu>0$, $F(\mu)$ is simply a scaling of the set $\{\boldsymbol{x}\mid\Omega(\boldsymbol{x})\leq 1\}$. By assumption, $F(1)$ is a polyhedron in $\mathbb{R}^{n}$. Since $F(\mu)=\mu F(1)$, the graph of $F$ over any small interval is a polyhedron in $\mathbb{R}^{n+1}$. This observation and Theorem 2.6 imply that $F$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$.

Algebraic Proof: This proof relies on Theorem 2.7 with $f(\boldsymbol{x}):=\Omega(\boldsymbol{x})-\mu_{0}$. We first verify (11) with the latter $f$, which states that for all sequences $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $\Omega(\boldsymbol{x}_{i})>\mu_{0}$,

$$\tau(\Omega,\boldsymbol{x}_{0}):=\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}d(0,\partial\Omega(\boldsymbol{x}_{i}))>0.$$ (16)

We note that if $\boldsymbol{x}\in\mathbb{R}^{n}$ and $0\in\partial\Omega(\boldsymbol{x})$, then $\Omega(\boldsymbol{x})=0$. Indeed, $0\in\partial\Omega(\boldsymbol{x})$ only when $\boldsymbol{x}$ is a minimizer of $\Omega$, and since $\Omega$ is a seminorm, its minimum is achieved exactly on the set $\{\Omega(\boldsymbol{x})=0\}$. We further note that the assumption $\boldsymbol{x}_{0}\in S(\boldsymbol{A}_{0})$ implies that $\Omega(\boldsymbol{x}_{0})=\mu_{0}>0$. These two observations yield that

$$d(0,\partial\Omega(\boldsymbol{x}_{0}))>0.$$ (17)

Next, we define $h(\boldsymbol{x}):=d(0,\partial\Omega(\boldsymbol{x}))$. If we can verify that $h$ is lower semicontinuous, then (17) and the definition of lower semicontinuity imply (16) as follows: $\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}h(\boldsymbol{x}_{i})\geq h(\boldsymbol{x}_{0})>0$. The lower semicontinuity of $h$ becomes obvious when we rewrite it as $h(\boldsymbol{x})=\inf_{\boldsymbol{y}\in\partial\Omega(\boldsymbol{x})}\left\lVert\boldsymbol{y}\right\rVert_{2}$. Indeed, it then follows from the set-valued upper semicontinuity of $\partial\Omega(\boldsymbol{x})$ and the definitions of lower semicontinuity of a real-valued function and upper semicontinuity of a set-valued function.

Since we verified (11) with $f(\boldsymbol{x})=\Omega(\boldsymbol{x})-\mu_{0}$, Theorem 2.7 implies that (10) holds with the same $f$ and also specifies the optimal constant in (10). Note that by the definitions of $f$ and $F$, $Q=F(\mu_{0})$. We have thus proved that there exists $\epsilon>0$ such that for all $\boldsymbol{x}\in B_{\epsilon}(\boldsymbol{x}_{0})$,

$$d(\boldsymbol{x},F(\mu_{0}))\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}[\Omega(\boldsymbol{x})-\mu_{0}]_{+}.$$ (18)

We conclude by showing that this bound implies the calmness of $F$. Let $\mu_{1}\in B_{\delta}(\mu_{0})$ (with $\delta$ chosen small enough so that $0\notin B_{\delta}(\mu_{0})$) and $\boldsymbol{x}_{1}\in F(\mu_{1})\cap B_{\epsilon}(\boldsymbol{x}_{0})$. By definition, $\Omega(\boldsymbol{x}_{1})\leq\mu_{1}$, and we can thus write

$$\Omega(\boldsymbol{x}_{1})-\mu_{0}\leq\Omega(\boldsymbol{x}_{1})-\mu_{0}-(\Omega(\boldsymbol{x}_{1})-\mu_{1})=\mu_{1}-\mu_{0}.$$ (19)

The combination of (18) and (19) results in

$$d(\boldsymbol{x}_{1},F(\mu_{0}))\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}[\mu_{1}-\mu_{0}]_{+}\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}\left|\mu_{0}-\mu_{1}\right|$$ (20)

for all $\boldsymbol{x}_{1}\in B_{\epsilon}(\boldsymbol{x}_{0})\cap F(\mu_{1})$. Thus $F$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$.

3.2.3 $W$ is calm

Geometric Proof: Extending the polyhedral-mapping proof from Section 3.2.2 is nontrivial because $W$ does not obey a simple scaling law. Instead, we carefully analyze the intersection of the affine set, $M(\boldsymbol{A}_{0})$, and the polyhedral set, $F(\mu_{0})$, near $\boldsymbol{x}_{0}$. We will bound the calmness constant for $W$ by $(\sin(\theta_{\min})\,\tau)^{-1}$, with $\tau=\tau(\Omega,\boldsymbol{x}_{0})$ from (16), and $\theta_{\min}=\theta_{\min}(\Omega,\boldsymbol{x}_{0})$ the smallest positive principal angle between the affine space $M(\boldsymbol{A}_{0})$ and the affine sets defining the faces of $F(\mu_{0})$ (of any dimension $<n$) containing $\boldsymbol{x}_{0}$.

Denote by $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ the $p=p(\boldsymbol{x}_{0})$ faces of the polyhedron $F(\mu_{0})\subset\mathbb{R}^{n}$ of any dimension less than $n$ that contain $\boldsymbol{x}_{0}$ (and thus also intersect the $(n-m)$-dimensional affine space $M(\boldsymbol{A}_{0})$). Denote by $P_{F(\mu_{0})}$ the projection operator of $\mathbb{R}^{n}$ onto $F(\mu_{0})$. This operator is well-defined since $F(\mu_{0})$ is convex. Choose $\epsilon>0$ small enough so that the set

$$P_{F(\mu_{0})}(W(\mu_{1})\cap B_{\epsilon}(\boldsymbol{x}_{0}))$$

intersects only the faces $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ of $F(\mu_{0})$. Clearly, it also intersects $M(\boldsymbol{A}_{0})$. Choose coordinates so that $\boldsymbol{x}_{0}=\boldsymbol{0}$. Then $M(\boldsymbol{A}_{0})$, $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ can be viewed as linear subspaces of $\mathbb{R}^{n}$, and the following vectors in $\mathbb{R}^{n}$,

$$\boldsymbol{x}_{0}=\boldsymbol{0},\quad\boldsymbol{x}_{1}\in W(\mu_{1})\cap B_{\epsilon}(\boldsymbol{x}_{0}),\quad\text{ and }\quad\boldsymbol{x}_{2}:=P_{F(\mu_{0})}(\boldsymbol{x}_{1}),$$

form a right triangle, as demonstrated in Figure 2.

Figure 2: Demonstration of the points $\boldsymbol{x}_{0}$, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ and related quantities ($M(\boldsymbol{A}_{0})$, $F(\mu_{0})$, $F(\mu_{1})$, the faces $\mathcal{U}_{1}$, $\mathcal{U}_{2}$, and the angle $\theta$).

To prove that $W$ is calm at $(\mu_{0},\boldsymbol{x}_{0})$, we wish to prove a bound of the form

$$d(\boldsymbol{x}_{1},W(\mu_{0}))\leq c(\mu_{0},\boldsymbol{x}_{0})\left|\mu_{0}-\mu_{1}\right|,\quad\textrm{ for some }c(\mu_{0},\boldsymbol{x}_{0})>0.$$ (21)

We start with the case where the triangle is degenerate, that is, either $\boldsymbol{x}_{1}=\boldsymbol{x}_{0}$, or $\boldsymbol{x}_{2}=\boldsymbol{x}_{1}$, or $\boldsymbol{x}_{2}=\boldsymbol{x}_{0}$. First, assume that $\boldsymbol{x}_{1}=\boldsymbol{x}_{0}$. This implies that $\boldsymbol{x}_{1}\in W(\mu_{0})$ and consequently the desired calmness: $d(\boldsymbol{x}_{1},W(\mu_{0}))=0\leq\left|\mu_{0}-\mu_{1}\right|$. Next, assume that $\boldsymbol{x}_{2}=\boldsymbol{x}_{1}$. By definition, $P_{F(\mu_{0})}(\boldsymbol{x}_{1})=\boldsymbol{x}_{1}$, so $\boldsymbol{x}_{1}\in F(\mu_{0})$. This observation and the fact that $\boldsymbol{x}_{1}\in M(\boldsymbol{A}_{0})$ imply that $\boldsymbol{x}_{1}\in W(\mu_{0})$, and the same estimate holds as above. At last, assume that $\boldsymbol{x}_{2}=\boldsymbol{x}_{0}$. Then $\boldsymbol{x}_{2}\in W(\mu_{0})$, so $d(\boldsymbol{x}_{1},W(\mu_{0}))\leq\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert$, and we can bound $\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert=d(\boldsymbol{x}_{1},F(\mu_{0}))$ by (20). We thus verified (21) in these cases with $c(\mu_{0},\boldsymbol{x}_{0})=1/\tau(\Omega,\boldsymbol{x}_{0})$.

In the case of a non-degenerate triangle, we note that since $\boldsymbol{x}_{0}=\boldsymbol{0}\in W(\mu_{0})$,

$$d(\boldsymbol{x}_{1},W(\mu_{0}))\leq\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{0}\right\rVert=\left\lVert\boldsymbol{x}_{1}\right\rVert.$$ (22)

Therefore, to obtain (21) we bound $\left\lVert\boldsymbol{x}_{1}\right\rVert$ in terms of $\left|\mu_{0}-\mu_{1}\right|$. Since $\boldsymbol{0}$, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ form a right triangle, $\left\lVert\boldsymbol{x}_{1}\right\rVert\sin\theta=\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert$, where $\theta$ is the angle between $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$. The combination of this observation with the calmness of $F$, stated in (20), results in the bound

$$\left\lVert\boldsymbol{x}_{1}\right\rVert\sin\theta=\left\lVert\boldsymbol{x}_{1}-\boldsymbol{x}_{2}\right\rVert=d(\boldsymbol{x}_{1},F(\mu_{0}))\leq\frac{1}{\tau(\Omega,\boldsymbol{x}_{0})}\left|\mu_{1}-\mu_{0}\right|.$$ (23)

We note that $\theta$ depends on $\boldsymbol{x}_{1}$, and thus (23) does not yet yield the desired bound on $\left\lVert\boldsymbol{x}_{1}\right\rVert$. To obtain this, we will establish a lower bound on $\theta$.

We index by $k=1,\ldots,\ell$, where $\ell\leq p$, the faces among $\mathcal{U}_{1},\ldots,\mathcal{U}_{p}$ having at least one positive principal angle with $M(\boldsymbol{A}_{0})$. Note that since we ruled out the trivial case, where the triangle formed by $\boldsymbol{x}_{0}$, $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ is degenerate, the latter set is not empty, that is, $\ell\geq 1$. For $k=1,\ldots,\ell$, we denote by $\theta_{k}^{*}$ the smallest positive principal angle between $\mathcal{U}_{k}$ and $M(\boldsymbol{A}_{0})$. Since $\boldsymbol{x}_{1}$ and $\boldsymbol{x}_{2}$ are nonzero and contained in distinct subspaces, they are orthogonal to the principal vectors corresponding to the zero principal angles (see (8)). Hence $\theta_{k}^{*}\leq\theta$ for $k=1,\ldots,\ell$. We further denote by $\theta_{\min}$ the minimal value of $\theta_{k}^{*}$ among $k=1,\ldots,\ell$, so that $\theta_{\min}\leq\theta$. Combining this observation with (22) and (23) yields the desired calmness of $W$ at $(\mu_{0},\boldsymbol{x}_{0})$:

$$d(\boldsymbol{x}_{1},W(\mu_{0}))\leq\frac{1}{\sin(\theta_{\min})\,\tau(\Omega,\boldsymbol{x}_{0})}\left|\mu_{1}-\mu_{0}\right|.$$

Algebraic Proof: This proof follows a similar approach to that of Section 3.2.2. We first write our set $W$ as the inequality system

$$W(\mu_{0})=F(\mu_{0})\cap M(\boldsymbol{A}_{0})=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid f(\boldsymbol{x})\leq 0\},\quad\textrm{ where }f(\boldsymbol{x}):=\max(\Omega(\boldsymbol{x})-\mu_{0},\,d(\boldsymbol{x},M(\boldsymbol{A}_{0}))).$$

We note that by definition $f\geq 0$, so the inequality $f(\boldsymbol{x})\leq 0$ in fact forces $f(\boldsymbol{x})=0$; we nevertheless treat $W(\mu_{0})=\{\boldsymbol{x}\in\mathbb{R}^{n}\mid f(\boldsymbol{x})\leq 0\}$ as a system of inequalities.

Similarly to the proof in Section 3.2.2, the calmness of $W(\mu)$ at $(\mu_{0},\boldsymbol{x}_{0})$ with $\boldsymbol{x}_{0}\in W(\mu_{0})$ follows from (11) with the latter $f$. That is, we need to verify that for all sequences $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $f(\boldsymbol{x}_{i})>0$,

$$\tau^{\prime}(f,\boldsymbol{x}_{0}):=\liminf_{\boldsymbol{x}_{i}\to\boldsymbol{x}_{0}}d(0,\partial f(\boldsymbol{x}_{i}))>0.$$ (24)

The argument here is more subtle than in Section 3.2.2, since here $0\in\partial f(\boldsymbol{x}_{0})$, whereas before $0\notin\partial\Omega(\boldsymbol{x}_{0})$. We thus need a more careful analysis of the function $\boldsymbol{x}\mapsto d(0,\partial f(\boldsymbol{x}))$. The definition of $W(\mu_{0})$ and the fact that $f\geq 0$ imply that the minimal value of $f$, which is $0$, is achieved exactly on $W(\mu_{0})$. Thus $0\in\partial f(\boldsymbol{x})$ if and only if $f(\boldsymbol{x})=0$. Hence, for any sequence $\{\boldsymbol{x}_{i}\}_{i\in\mathbb{N}}\to\boldsymbol{x}_{0}$ with $f(\boldsymbol{x}_{i})>0$, we have $0\notin\partial f(\boldsymbol{x}_{i})$, or equivalently,

$$d(0,\partial f(\boldsymbol{x}_{i}))>0.$$ (25)

In order to conclude (24), we will show that $d(0,\partial f(\boldsymbol{x}))$ is piecewise constant and combine this with (25). By Theorem 3 of Chapter 4 of [23], we have

$$\partial f(\boldsymbol{x})=\textrm{co}\left(\{\partial\Omega(\boldsymbol{x})\mid\Omega(\boldsymbol{x})-\mu_{0}=f(\boldsymbol{x})\}\cup\{\partial d(\boldsymbol{x},M(\boldsymbol{A}_{0}))\mid d(\boldsymbol{x},M(\boldsymbol{A}_{0}))=f(\boldsymbol{x})\}\right),$$ (26)

where co denotes the convex hull. Since $\Omega(\boldsymbol{x})$ has polyhedral level sets, $\partial\Omega(\boldsymbol{x})$ is piecewise constant in the sense that $\partial\Omega(\boldsymbol{x})$ takes only finitely many different sets as values. Similarly, since $M(\boldsymbol{A}_{0})$ is also a polyhedron, $\partial d(\boldsymbol{x},M(\boldsymbol{A}_{0}))$ is piecewise constant as a set-valued function. These two observations and (26) thus imply that $\partial f(\boldsymbol{x})$ is piecewise constant as a set-valued function. Hence, $d(0,\partial f(\boldsymbol{x}))$ is piecewise constant as a real-valued function; in particular, it takes only finitely many values, which are positive along the sequence by (25), so the liminf in (24) is positive and the proof is concluded.

4 Conclusion

We have shown that if $\Omega$ has polyhedral level sets, then the solution map $S(\boldsymbol{A})$ is calm. By tracing the constants in our proof and in Klatte and Kummer [16], we see that a calmness constant for $S$ is

$$\max\left(\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert,\frac{1}{\tau}\right)\left(1+\frac{2C_{\Omega}}{\tau^{\prime}}\right)C_{\Omega}\left\lVert\boldsymbol{A}_{0}^{\dagger}\right\rVert,$$

where $C_{\Omega}$ is the Lipschitz constant of $\Omega$ from Section 3.2.1, $\tau$ is defined in (16) and $\tau^{\prime}$ is defined in (24).
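As a qualitative sanity check of this behavior (our experiment; it does not evaluate the constant above, and the printed ratio upper-bounds $d(\boldsymbol{x}_{1},S(\boldsymbol{A}_{0}))/\|\boldsymbol{E}\|$, with equality when $S(\boldsymbol{A}_{0})$ is a singleton), one can solve (1) for a fixed $\boldsymbol{A}_{0}$ and for small perturbations $\boldsymbol{A}_{0}+\boldsymbol{E}$:

import numpy as np
from scipy.optimize import linprog

def solve_l1(A, b):
    """min ||x||_1 s.t. Ax = b, via the split x = u - v with u, v >= 0."""
    n = A.shape[1]
    res = linprog(np.ones(2 * n), A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(4)
m, n = 10, 30
A0 = rng.standard_normal((m, n))
x_star = np.zeros(n)
x_star[:3] = 1.0                  # a 3-sparse signal
b = A0 @ x_star
x0 = solve_l1(A0, b)
for eps in [1e-3, 1e-4, 1e-5]:
    E = eps * rng.standard_normal((m, n))
    x1 = solve_l1(A0 + E, b)
    print(eps, np.linalg.norm(x1 - x0) / np.linalg.norm(E, 2))   # stays bounded as eps -> 0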

A key assumption for proving that $W$ is calm is that $\Omega$ has polyhedral sublevel sets. Removing this assumption would allow us to recover known calmness results, such as the case $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{2}$ [24]. Unfortunately, $W$ is in fact not calm for $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{2}$, as we show below. Therefore, our approach needs a major change if one wants to extend it to other known seminorms.

We verify our claim above by using the following equivalent definition of calmness: a map $G:Y\to 2^{X}$ is calm at $(\boldsymbol{y}_{0},\boldsymbol{x}_{0})$ if and only if there exist a neighborhood $U$ of $\boldsymbol{x}_{0}$ and a constant $c(\boldsymbol{x}_{0})>0$ such that

$$d(\boldsymbol{x},G(\boldsymbol{y}_{0}))\leq c(\boldsymbol{x}_{0})\,d(\boldsymbol{y}_{0},G^{-1}(\boldsymbol{x}))\quad\text{ for all }\boldsymbol{x}\in U.$$ (27)

Using this definition, we show that $W(\mu)$ with $\Omega(\boldsymbol{x})=\left\lVert\boldsymbol{x}\right\rVert_{2}$ in $\mathbb{R}^{2}$, $\boldsymbol{A}_{0}=(0,-1)$ and $b=1$ is not calm at $(\mu_{0},\boldsymbol{x}_{0})=(1,(0,-1))$. Recall that by definition

$$W(\mu)=F(\mu)\cap M(\boldsymbol{A}_{0}).$$

In our case,

$$F(\mu)=\{\boldsymbol{x}\mid\left\lVert\boldsymbol{x}\right\rVert_{2}\leq\mu\},\qquad M(\boldsymbol{A}_{0})=(0,-1)+\mathrm{Sp}((1,0)).$$

Thus, for $\boldsymbol{x}\in M(\boldsymbol{A}_{0})$,

$$W^{-1}(\boldsymbol{x})=[\left\lVert\boldsymbol{x}\right\rVert_{2},\infty),$$

so the distance is simply

$$d(\mu_{0},W^{-1}(\boldsymbol{x}))=\left\lVert\boldsymbol{x}\right\rVert_{2}-\mu_{0}=\sqrt{x_{1}^{2}+x_{2}^{2}}-\mu_{0}.$$

Consequently, noting that $W(\mu_{0})=\{(0,-1)\}$, we have

$$\lim_{x_{2}=-1,\,x_{1}\to 0}\frac{d(\mu_{0},W^{-1}(\boldsymbol{x}))}{d(\boldsymbol{x},W(\mu_{0}))}=\lim_{x_{2}=-1,\,x_{1}\to 0}\frac{\sqrt{x_{1}^{2}+1}-1}{|x_{1}|}=0,$$

and thus no bound of the form (27) holds with $c(\boldsymbol{x}_{0})>0$.
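Numerically (our check), the ratio in the last display indeed decays linearly in $x_{1}$:

import numpy as np

for x1 in [1e-1, 1e-2, 1e-3]:
    num = np.sqrt(x1**2 + 1.0) - 1.0   # d(mu0, W^{-1}(x)) for x = (x1, -1)
    den = abs(x1)                      # d(x, W(mu0)), since W(mu0) = {(0, -1)}
    print(x1, num / den)               # ~ x1/2 -> 0, so no c > 0 satisfies (27)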

References

  • [1] David L. Donoho and Michael Elad “Optimally sparse representation in general (nonorthogonal) dictionaries via l1l^{1} minimization” In Proc. Natl. Acad. Sci. USA 100.5, 2003, pp. 2197–2202 DOI: 10.1073/pnas.0437847100
  • [2] E. J. Candès, J. Romberg and T. Tao “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information” In IEEE Transactions on Information Theory 52.2, 2006, pp. 489–509 DOI: 10.1109/TIT.2005.862083
  • [3] Emmanuel J. Candès and Terence Tao “Near-optimal signal recovery from random projections: Universal encoding strategies?” In IEEE Transactions on Information Theory 52.12, 2006, pp. 5406–5425 DOI: 10.1109/TIT.2006.885507
  • [4] D. L. Donoho “For most large underdetermined systems of linear equations the minimal 1\ell_{1}-norm solution is also the sparsest solution” In Communications on Pure and Applied Mathematics 59.6, 2006, pp. 797–829 DOI: 10.1002/cpa.20132
  • [5] David L Donoho “Compressed sensing” In IEEE Transactions on information theory 52.4 IEEE, 2006, pp. 1289–1306
  • [6] Emmanuel J Candès and Terence Tao “Near-optimal signal recovery from random projections: Universal encoding strategies?” In IEEE transactions on information theory 52.12 IEEE, 2006, pp. 5406–5425
  • [7] Emmanuel J Candès “The restricted isometry property and its implications for compressed sensing” In Comptes rendus mathematique 346.9-10, 2008, pp. 589–592
  • [8] D. L. Donoho, M. Elad and V. N. Temlyakov “Stable recovery of sparse overcomplete representations in the presence of noise” In IEEE Transactions on Information Theory 52.1, 2006, pp. 6–18 DOI: 10.1109/TIT.2005.860430
  • [9] J. A. Tropp “Just relax: convex programming methods for identifying sparse signals in noise” In IEEE Transactions on Information Theory 52.3, 2006, pp. 1030–1051 DOI: 10.1109/TIT.2005.864420
  • [10] Matthew A. Herman and Thomas Strohmer “General deviants: An analysis of perturbations in compressed sensing” In IEEE Journal on Selected Topics in Signal Processing 4.2, 2010, pp. 342–349 DOI: 10.1109/JSTSP.2009.2039170
  • [11] Matthew A Herman and Thomas Strohmer “High-resolution radar via compressed sensing” In IEEE transactions on signal processing 57.6 IEEE, 2009, pp. 2275–2284
  • [12] Albert C Fannjiang, Thomas Strohmer and Pengchong Yan “Compressed remote sensing of sparse objects” In SIAM Journal on Imaging Sciences 3.3 SIAM, 2010, pp. 595–618
  • [13] Rémi Gribonval, Holger Rauhut, Karin Schnass and Pierre Vandergheynst “Atoms of all channels, unite! Average case analysis of multi-channel sparse recovery using greedy algorithms” In Journal of Fourier analysis and Applications 14.5-6 Springer, 2008, pp. 655–687
  • [14] Thomas Blumensath and Mike Davies “Compressed sensing and source separation” In International Conference on Independent Component Analysis and Signal Separation, 2007, pp. 341–348 Springer
  • [15] Alex Gutierrez et al. “Reducing the Complexity of Model-Based MRI Reconstructions via Sparsification” Submitted, 2020
  • [16] Diethard Klatte and Bernd Kummer “On Calmness of the Argmin Mapping in Parametric Optimization Problems” In Journal of Optimization Theory and Applications 165.3, 2015, pp. 708–719 DOI: 10.1007/s10957-014-0643-2
  • [17] Diethard Klatte and Bernd Kummer “Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications” Kluwer Academic Publishers, 2002
  • [18] Abderrahim Jourani “Hoffman’s error bound, local controllability, and sensitivity analysis” In SIAM Journal on Control and Optimization 38.3, 2000, pp. 947–970
  • [19] Per Åke Wedin “On angles between subspaces of a finite dimensional inner product space”, 1983, pp. 263–285 DOI: 10.1007/bfb0062107
  • [20] A.J. Hoffman “On approximate solutions of systems of linear inequalities” In Journal of Research of the National Bureau of Standards 49.4, 1952, pp. 263 DOI: 10.6028/jres.049.027
  • [21] Huynh Van Ngai, Alexander Kruger and Michel Théra “Stability of Error Bounds for Semi-infinite Convex Constraint Systems” In SIAM Journal on Optimization 20.4, 2010, pp. 2080–2096 DOI: 10.1137/090767819
  • [22] Stephen L Campbell and Carl D Meyer “Generalized inverses of linear transformations” SIAM, 2009
  • [23] Aleksandr Davidovich Ioffe “Theory of Extremal Problems”, Studies in Mathematics and its Applications 6 Amsterdam: North-Holland, 1979
  • [24] Jiu Ding “Perturbation analysis for the projection of a point to an affine set” In Linear Algebra and Its Applications 191.C, 1993, pp. 199–212 DOI: 10.1016/0024-3795(93)90515-P