
Geometric characterizations for strong minima with applications to nuclear norm minimization problems

Jalal Fadili (Normandie Université, ENSICAEN, UNICAEN, CNRS, GREYC, France; email: [email protected]),    Tran T. A. Nghia (Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA; email: [email protected]),    and    Duy Nhat Phan (Department of Mathematics and Statistics, University of Massachusetts Lowell, Lowell, MA 01854, USA; email: [email protected])

Abstract. In this paper, we introduce several geometric characterizations for strong minima of optimization problems. Applying these results to nuclear norm minimization problems allows us to obtain new necessary and sufficient quantitative conditions for this important property. Our characterizations for strong minima are weaker than the Restricted Injectivity and Nondegenerate Source Condition, which are usually used to identify solution uniqueness of nuclear norm minimization problems. Consequently, we obtain the minimum (tight) bound on the number of measurements for (strong) exact recovery of low-rank matrices.

Key Words. Convex optimization; Strong minima; Sharp minima; Second order condition; Nuclear norm minimization; Exact recovery.

Mathematics Subject Classification. 52A41 · 90C25 · 49J53 · 49J52

1 Introduction

Strong minima is an important property of a local minimizer of an optimization problem: the difference between the cost value and the optimal value is bounded below by a positive multiple of the squared norm of the difference between the corresponding feasible point and the minimizer. It is an error bound condition with various applications to sensitivity analysis, robustness, and the complexity of algorithms [4, 5, 7, 8, 18, 22, 23, 24, 29, 34, 42, 44]. Finding necessary and sufficient second order conditions for strong minima is a classical research area. For nonlinear programming, the first results in this direction were probably established in [22] under some restrictive conditions. Complete second order characterizations for nonlinear programming under mild conditions, such as the Mangasarian-Fromovitz constraint qualification, were obtained later in [4, 34]. For constrained (nonpolyhedral) optimization problems with smooth data, necessary and sufficient second order conditions for strong minima are much more involved. They often contain nontrivial “sigma terms”, which represent the curvature of certain nonpolyhedral structures in the problem. Another important feature of these conditions is that they are usually formulated as “minimax” conditions, in which the Lagrange multipliers depend on the choice of vectors in the critical cone; see, e.g., [7, 8]. Although the aforementioned sigma terms have been fully calculated for several important classes of optimization problems, such as semi-infinite programming, semi-definite programming, and second order cone programming, their calculation is complicated in general. Moreover, checking the (minimax) sufficient second order conditions numerically is quite a hard task.

An important problem that motivates our study in this paper is the nuclear norm minimization problem

minXn1×n2Xsubject toΦX=M0,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\|X\|_{*}\quad\mbox{subject to}\quad\Phi X=M_{0}, (1.1)

where \|X\|_{*} is the nuclear norm of an n_{1}\times n_{2} matrix X, \Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{m} is a linear operator, and M_{0} is a known vector (observation) in \mathbb{R}^{m}. This problem is considered the tightest convex relaxation of the celebrated NP-hard affine rank minimization problem, with various applications in computer vision, collaborative filtering, and data science; see, e.g., [1, 13, 15, 16, 40]. There are several essential reasons to study strong minima of this problem. First, strong minima of problem (1.1) guarantees the linear convergence of some proximal algorithms for solving problem (1.1) and related problems; see, e.g., [18, 29, 50]. Second, it is also sufficient for solution uniqueness and robustness of problem (1.1) [23, 24]. Solution uniqueness for problem (1.1) is a significant property for recovering the original low-rank solution X_{0}\in\mathbb{R}^{n_{1}\times n_{2}} from the observations M_{0}=\Phi X_{0}. In [15, 16], Candès and Recht introduced a nondegeneracy condition sufficient for solution uniqueness of problem (1.1). It plays an important role in their results on finding a small bound on the number of measurements m such that solving problem (1.1) recovers X_{0} exactly from the observations M_{0} over a Gaussian linear operator \Phi. Their condition was recently revealed in [24] to be a complete characterization of the so-called sharp minima introduced independently by Cromme [12] and Polyak [39]. This special property of problem (1.1) at X_{0} guarantees robust recovery with a linear rate, in the sense that any solution of the following low-rank optimization problem

\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\frac{1}{2}\|\Phi X-M\|^{2}+\mu\|X\|_{*} (1.2)

converge to X_{0} at a linear rate as \mu\downarrow 0, provided that \|M-M_{0}\|\leq c\mu for some constant c>0; see [14]. When strong minima occurs in problem (1.1), [24] shows that the convergence rate is Hölderian with order \frac{1}{2}. It is also worth noting that solution uniqueness of the nuclear norm minimization problem (1.1) can be characterized geometrically via the descent cone [1, 13]. As the descent cone is not necessarily closed, using it to check solution uniqueness numerically is not ideal. Another geometric characterization of solution uniqueness for problem (1.1) was established recently in [28], but the set in their main condition is not closed either. As strong minima is necessary for sharp minima and sufficient for solution uniqueness, the impact of strong minima on exact recovery remains an open question.
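To make the objective shared by (1.1) and (1.2) concrete, note that the nuclear norm of a 2×2 matrix admits a closed form: since \sigma_{1}^{2}+\sigma_{2}^{2}=\|X\|_{F}^{2} and \sigma_{1}\sigma_{2}=|\det X|, one has \|X\|_{*}=\sqrt{\|X\|_{F}^{2}+2|\det X|}. The following small Python sketch (an illustration of ours, not part of the paper's development; the function name is our own) checks this identity:

```python
import math

def nuclear_norm_2x2(X):
    # For a 2x2 matrix, sigma_1^2 + sigma_2^2 = ||X||_F^2 and
    # sigma_1 * sigma_2 = |det X|, hence
    # ||X||_* = sigma_1 + sigma_2 = sqrt(||X||_F^2 + 2 |det X|).
    (a, b), (c, d) = X
    frob_sq = a * a + b * b + c * c + d * d  # squared Frobenius norm
    det = a * d - b * c                      # determinant of X
    return math.sqrt(frob_sq + 2.0 * abs(det))

# diag(3, 4) has singular values 3 and 4, so its nuclear norm is 7;
# the rank-1 matrix of all ones has singular values 2 and 0.
print(nuclear_norm_2x2([[3.0, 0.0], [0.0, 4.0]]))  # → 7.0
print(nuclear_norm_2x2([[1.0, 1.0], [1.0, 1.0]]))  # → 2.0
```

For larger matrices one would of course compute a singular value decomposition instead; the 2×2 shortcut only serves to make the objective tangible.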

A sufficient second order condition for strong minima of problem (1.1) can be obtained from [18, Theorem 12]. The approach in [18] is to rewrite problem (1.1) as a composite optimization problem and to apply the classical results of [7, 8]. The second order analysis of spectral functions, including the nuclear norm, developed in [17, 18, 20, 37, 51] is helpful in understanding this result, but these second order computations applied to the nuclear norm remain complicated. Most importantly, the sufficient second order condition obtained in [18, Theorem 12] is still in minimax form, which makes it hard to check. Our main questions throughout the paper are: 1. Is it possible to obtain simple necessary and sufficient conditions for strong minima of problem (1.1)? 2. Can we avoid the minimax form usually present in these kinds of second order sufficient conditions? and 3. Is there an efficient way to check strong minima of problem (1.1) numerically?

Our contribution. To highlight the new ideas and point toward the bigger picture, we study in Section 3 the following composite optimization problem

minx𝕏f(x)+g(x),\min_{x\in\mathbb{X}}\quad f(x)+g(x), (1.3)

where \mathbb{X} is a finite dimensional space, f:\mathbb{X}\to\mathbb{R} is a twice continuously differentiable function, and g:\mathbb{X}\to\overline{\mathbb{R}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathbb{R}\cup\{+\infty\} is a proper lower semi-continuous (nonsmooth) function. This setting covers problem (1.2) and many modern optimization problems. One of the most popular ways to characterize strong minima of problem (1.3) is via the second subderivative [8, 42, 43]. As the function g is nonsmooth, its second subderivative is hard to compute in general. To avoid this computation, we assume additionally that the function g satisfies the classical quadratic growth condition [6, 49]. In the case of convex functions, we show in Section 3 that \bar{x} is a strong solution of problem (1.3) if and only if \bar{v}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\nabla f(\bar{x})\in\partial g(\bar{x}) and the following geometric condition holds

Ker2f(x¯)T(g)1(v¯)(x¯)={0},\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{(\partial g)^{-1}(\bar{v})}(\bar{x})=\{0\}, (1.4)

where \partial g:\mathbb{X}\rightrightarrows\mathbb{X}^{*} is the subdifferential mapping of g, \mbox{\rm Ker}\,\nabla^{2}f(\bar{x}) is the null space of the Hessian matrix \nabla^{2}f(\bar{x}), and T_{(\partial g)^{-1}(\bar{v})}(\bar{x}) is the Bouligand contingent cone to (\partial g)^{-1}(\bar{v}) at \bar{x}. Applying the contingent cone to this first order structure makes (1.4) a genuine second order condition, yet one that is simpler to compute than the second subderivative; see, e.g., our Corollary 4.2 for the case of the nuclear norm. Our results also apply when g is not convex; see Section 3 for a more detailed analysis.
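To see how condition (1.4) distinguishes strong from non-strong minimizers, consider the simplest setting \mathbb{X}=\mathbb{R} with g=|\cdot| (a one-dimensional example of ours, not taken from the references):

```latex
\text{Let } \bar{x}=0, \text{ so that } \bar{v}=-f'(0)\in\partial|\cdot|(0)=[-1,1].
\text{(a) } f(x)=-x+x^{4}: \text{ here } \bar{v}=1,\ (\partial g)^{-1}(1)=[0,\infty),\ T_{[0,\infty)}(0)=[0,\infty),
\text{and } \mbox{\rm Ker}\,f''(0)=\mathbb{R}, \text{ so the intersection in (1.4) is } [0,\infty)\neq\{0\};
\text{indeed } \varphi(x)=f(x)+|x|=x^{4} \text{ for } x\geq 0: \text{ a minimum that is not strong.}
\text{(b) } f(x)=-x+x^{2}: \text{ again } \bar{v}=1, \text{ but now } \mbox{\rm Ker}\,f''(0)=\{0\}, \text{ so (1.4) holds;}
\text{indeed } \varphi(x)=x^{2}+(|x|-x)\geq x^{2}: \text{ a strong minimum at } 0.
```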

Another problem considered in Section 3 is the following convex optimization problem with linear constraints

minx𝕏g(x)subject toΦxK,\min_{x\in\mathbb{X}}\quad g(x)\quad\mbox{subject to}\quad\Phi x\in K, (1.5)

where g:\mathbb{X}\to\overline{\mathbb{R}} is a continuous (nonsmooth) convex function, \Phi:\mathbb{X}\to\mathbb{Y} is a linear operator between two finite dimensional spaces, and K is a closed polyhedral subset of \mathbb{Y}. This problem covers the nuclear norm minimization problem (1.1) and a number of other significant optimization problems [1, 3, 13]. When g is twice continuously differentiable, characterizations of strong minima for problem (1.5) are simple; see, e.g., [8, Theorem 3.120]. When g is not differentiable, however, characterizing strong minima is much more involved. A standard approach is to rewrite problem (1.5) as a composite problem, which can be represented as a constrained optimization problem with smooth data [7, 8, 35]. To obtain a geometric characterization similar to (1.4) for problem (1.5), we additionally assume that the function g satisfies both the quadratic growth condition and second order regularity. The latter condition was introduced by Bonnans, Cominetti, and Shapiro in [7] to close the gap between necessary and sufficient second order optimality conditions for constrained and composite optimization problems. The extra assumption of the quadratic growth condition on the function g allows us to achieve geometric characterizations of strong minima for problem (1.5) in Theorem 3.5. Many important classes of nonsmooth functions g satisfy both the quadratic growth condition and second order regularity; to list a few: piecewise linear-quadratic convex functions [43], spectral functions [18, 37], \ell_{1}/\ell_{2} norms [50], and indicator functions of the set of positive semi-definite matrices [19]. Our results are thus applicable not only to the nuclear norm minimization problem (1.1), but also to various other optimization problems.

As the nuclear norm satisfies both the quadratic growth condition and second order regularity, geometric characterizations of strong minima for the low-rank problems (1.1) and (1.2) are to be expected. But our studies of problems (1.1) and (1.2) in Sections 4 and 5 are not just straightforward applications. In Section 4, we derive a simple calculation of the second order structure in (1.4) for the case of the nuclear norm. Furthermore, some quantitative characterizations of strong minima for problem (1.2) are obtained via the so-called Strong Restricted Injectivity and Analysis Strong Source Condition, which inherit their terminology from conditions introduced recently in [24] to characterize strong minima/solution uniqueness of group-sparsity optimization problems. These conditions are weaker than the well-known Restricted Injectivity and Nondegenerate Source Condition used in [15, 16] as sufficient conditions for solution uniqueness of the nuclear norm minimization problem (1.1); see also [25] for the case of the \ell_{1}-norm. Both conditions can be verified numerically. In Section 5, we obtain new characterizations of strong minima for problem (1.1). Our conditions are not in the form of minimax problems. Indeed, Theorem 5.2 shows that X_{0} is a strong solution of problem (1.1) if and only if there exists a dual certificate \overline{Y}\in\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*} such that

KerΦT()1(Y¯)(X0)={0},\mbox{\rm Ker}\,\Phi\cap T_{(\partial\|\cdot\|_{*})^{-1}(\overline{Y})}(X_{0})=\{0\},

which bears some similarity to (1.4). The necessary and sufficient conditions for strong minima obtained in this section reveal some interesting facts about exact recovery for the nuclear norm minimization problem (1.1). For example, one needs at least \frac{1}{2}r(r+1) measurements M_{0}=\Phi X_{0} to recover the matrix X_{0} of rank r exactly as a strong solution of problem (1.1). This bound on m is very small, but it is tight in the sense that we can construct infinitely many linear operators \Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{\frac{1}{2}r(r+1)} such that solving problem (1.1) recovers X_{0} exactly. Another compelling result in Section 5 shows that the low-rank representation problem [27, 33] always has strong minima when the linear operator \Phi in (1.1) is any q\times n_{1} matrix.

Finally, we discuss numerical methods to check strong minima and compare the results with sharp minima and solution uniqueness. For example, over 100 nuclear norm minimization problems with standard Gaussian linear operators \Phi and m=460 measurements M_{0}\in\mathbb{R}^{460} observed from an original matrix X_{0}\in\mathbb{R}^{40\times 40} of rank 3, exact recovery occurs in about 80% of the instances, of which about 40% have sharp minima and the other 40% have strong (non-sharp) minima. As the traditional approach to exact recovery in [1, 13, 16] is via sharp minima [24], seeing many unique solutions that are strong but not sharp in these experiments gives a more complete picture of exact recovery when the number of measurements m is not large enough; see Section 6 for further numerical experiments.

2 Preliminaries

Throughout this paper, we suppose that \mathbb{X} is a Euclidean space with norm \|\cdot\| and that \mathbb{X}^{*} is its dual space, endowed with the inner product \langle v,x\rangle for v\in\mathbb{X}^{*} and x\in\mathbb{X}. \mathbb{B}_{r}(\bar{x}) denotes the closed ball with center \bar{x}\in\mathbb{X} and radius r>0. Let \varphi:\mathbb{X}\to\overline{\mathbb{R}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathbb{R}\cup\{+\infty\} be a proper extended real-valued function with nonempty domain \mbox{\rm dom}\,\varphi\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{x\in\mathbb{X}|\;\varphi(x)<\infty\}\neq\emptyset. A point \bar{x}\in\mbox{\rm dom}\,\varphi is called a strong solution (or strong minimizer) of \varphi if there exist c,\varepsilon>0 such that

φ(x)φ(x¯)cxx¯2for allx𝔹ε(x¯).\varphi(x)-\varphi(\bar{x})\geq c\|x-\bar{x}\|^{2}\quad\mbox{for all}\quad x\in\mathbb{B}_{\varepsilon}(\bar{x}). (2.1)

In this case, we say that strong minima occurs at \bar{x}. The study of strong minima is usually based on second order theory, where the following subderivative constructions play crucial roles; see, e.g., [5, 8, 35, 42, 43].

Definition 2.1 (Subderivatives).

For a function φ:𝕏¯\varphi:\mathbb{X}\to\overline{\mathbb{R}} and x¯domφ\bar{x}\in\mbox{\rm dom}\,\varphi, the subderivative of φ\varphi at x¯\bar{x} is the function dφ(x¯):𝕏¯d\varphi(\bar{x}):\mathbb{X}\to\overline{\mathbb{R}} defined by

d\varphi(\bar{x})(w)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\liminf_{t\downarrow 0,w^{\prime}\to w}\dfrac{\varphi(\bar{x}+tw^{\prime})-\varphi(\bar{x})}{t}\quad\mbox{for}\quad w\in\mathbb{X}. (2.2)

The second subderivative of φ\varphi at x¯\bar{x} for v¯𝕏\bar{v}\in\mathbb{X}^{*} is the function d2φ(x¯|v¯):𝕏¯d^{2}\varphi(\bar{x}|\bar{v}):\mathbb{X}\to\overline{\mathbb{R}} defined by

d^{2}\varphi(\bar{x}|\bar{v})(w)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\liminf_{t\downarrow 0,w^{\prime}\to w}\dfrac{\varphi(\bar{x}+tw^{\prime})-\varphi(\bar{x})-t\langle\bar{v},w^{\prime}\rangle}{\frac{1}{2}t^{2}}\quad\mbox{for}\quad w\in\mathbb{X}. (2.3)

The parabolic subderivative of φ\varphi at x¯\bar{x} for wdomdφ(x¯)()w\in\mbox{\rm dom}\,d\varphi(\bar{x})(\cdot) with respect to z𝕏z\in\mathbb{X} is defined by

d2φ(x¯)(w|z)=deflim inft0,zzφ(x¯+tw+12t2z)φ(x¯)tdφ(x¯)(w)12t2.d^{2}\varphi(\bar{x})(w|z)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\liminf_{t\downarrow 0,z^{\prime}\to z}\dfrac{\varphi(\bar{x}+tw+\frac{1}{2}t^{2}z^{\prime})-\varphi(\bar{x})-td\varphi(\bar{x})(w)}{\frac{1}{2}t^{2}}. (2.4)
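These constructions can be computed by hand for the absolute value function on \mathbb{R} (an illustrative computation of ours):

```latex
\text{For } \varphi=|\cdot| \text{ and } \bar{x}=0:\quad d\varphi(0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{|tw'|}{t}=|w|.
\text{For } \bar{v}=0\in\partial\varphi(0):\quad d^{2}\varphi(0|0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{|tw'|}{\frac{1}{2}t^{2}}
=\begin{cases} 0 & \text{if } w=0,\\ +\infty & \text{if } w\neq 0. \end{cases}
```

The infinite value of the second subderivative in nonzero directions reflects the sharp (first order) growth of |\cdot| at the origin.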

Parabolic subderivatives were introduced by Ben-Tal and Zowe in [5] to study strong minima; see also [43, Theorem 13.66]. Second subderivatives date back to the seminal work of Rockafellar [42], with good calculus rules [35] available for many important classes of functions. It is well known [43, Theorem 13.24] that \bar{x} is a strong solution of \varphi if and only if 0\in\partial\varphi(\bar{x}) and

d2φ(x¯|0)(w)>0for allw0.d^{2}\varphi(\bar{x}|0)(w)>0\quad\mbox{for all}\quad w\neq 0. (2.5)

Here φ(x¯)\partial\varphi(\bar{x}) stands for the Mordukhovich limiting subdifferential of φ\varphi at x¯\bar{x} [36]:

\partial\varphi(\bar{x})=\left\{v\in\mathbb{X}^{*}|\;\exists(x_{k},v_{k})\stackrel{{\mathbb{X}\times\mathbb{X}^{*}}}{{\longrightarrow}}(\bar{x},v),\,\liminf_{x\to x_{k}}\dfrac{\varphi(x)-\varphi(x_{k})-\langle v_{k},x-x_{k}\rangle}{\|x-x_{k}\|}\geq 0\right\}. (2.6)

When \varphi is a proper l.s.c. convex function, this subdifferential coincides with the subdifferential of classical convex analysis

φ(x¯)={v𝕏|φ(x)φ(x¯)v,xx¯,x𝕏}.\partial\varphi(\bar{x})=\left\{v\in\mathbb{X}^{*}|\;\varphi(x)-\varphi(\bar{x})\geq\langle v,x-\bar{x}\rangle,x\in\mathbb{X}\right\}. (2.7)
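Criterion (2.5) can be tested directly on one-dimensional examples (our own illustrations):

```latex
\varphi(x)=x^{2}:\quad d^{2}\varphi(0|0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{t^{2}(w')^{2}}{\frac{1}{2}t^{2}}=2w^{2}>0 \text{ for } w\neq 0,
\text{so } 0 \text{ is a strong solution, in accordance with (2.1) for } c=1.
\varphi(x)=x^{4}:\quad d^{2}\varphi(0|0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{t^{4}(w')^{4}}{\frac{1}{2}t^{2}}=0 \text{ for every } w,
\text{so (2.5) fails: } 0 \text{ is the unique minimizer of } x^{4}, \text{ but not a strong solution.}
```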

We denote by \varphi^{*}:\mathbb{X}^{*}\to\overline{\mathbb{R}} the Fenchel conjugate of \varphi:

\varphi^{*}(v)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sup\{\langle v,x\rangle-\varphi(x)|\;x\in\mathbb{X}\}\quad\mbox{for}\quad v\in\mathbb{X}^{*}. (2.8)

Next let us recall some first and second order tangent structures [8, 43] of a nonempty closed set K\subset\mathbb{X} that are widely used in this paper.

Definition 2.2 (Tangent cones).

Let K be a closed subset of \mathbb{X}. The Bouligand contingent cone to K at the point \bar{x}\in K is defined by

TK(x¯)=defLimsupt0Kx¯t={w𝕏|tk0,wkw,x¯+tkwkK}.T_{K}(\bar{x})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\displaystyle\mathop{{\rm Lim}\,{\rm sup}}_{t\downarrow 0}\frac{K-\bar{x}}{t}=\left\{w\in\mathbb{X}|\;\exists\,t_{k}\downarrow 0,w_{k}\to w,\bar{x}+t_{k}w_{k}\in K\right\}. (2.9)

The inner and outer second order tangent set to KK at x¯K\bar{x}\in K in the direction w𝕏w\in\mathbb{X} are defined, respectively, by

TKi,2(x¯|w)\displaystyle T^{i,2}_{K}(\bar{x}|w) =defLiminft0Kx¯tw12t2={z𝕏|tk0,zkz,x¯+tkw+12tk2zkK}and\displaystyle\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathop{{\rm Lim}\,{\rm inf}}_{t\downarrow 0}\frac{K-\bar{x}-tw}{\frac{1}{2}t^{2}}=\left\{z\in\mathbb{X}|\;\forall\,t_{k}\downarrow 0,\exists\,z_{k}\to z,\bar{x}+t_{k}w+\frac{1}{2}t_{k}^{2}z_{k}\in K\right\}\quad\mbox{and} (2.10)
TK2(x¯|w)\displaystyle T^{2}_{K}(\bar{x}|w) =defLimsupt0Kx¯tw12t2={z𝕏|tk0,zkz,x¯+tkw+12tk2zkK}.\displaystyle\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathop{{\rm Lim}\,{\rm sup}}_{t\downarrow 0}\frac{K-\bar{x}-tw}{\frac{1}{2}t^{2}}=\left\{z\in\mathbb{X}|\;\exists\,t_{k}\downarrow 0,z_{k}\to z,\bar{x}+t_{k}w+\frac{1}{2}t_{k}^{2}z_{k}\in K\right\}.
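A standard parabola example illustrates Definition 2.2 and shows that second order tangent sets, unlike the contingent cone, need not be cones:

```latex
\text{Let } K=\{(x,y)\in\mathbb{R}^{2}|\;y\geq x^{2}\} \text{ and } \bar{x}=(0,0), \text{ so } T_{K}(\bar{x})=\{w\in\mathbb{R}^{2}|\;w_{2}\geq 0\}.
\text{For } w=(1,0):\quad \bar{x}+tw+\tfrac{1}{2}t^{2}z=\big(t+\tfrac{1}{2}t^{2}z_{1},\,\tfrac{1}{2}t^{2}z_{2}\big)\in K
\iff \tfrac{1}{2}t^{2}z_{2}\geq t^{2}+O(t^{3}),
\text{hence } T^{2}_{K}(\bar{x}|w)=T^{i,2}_{K}(\bar{x}|w)=\{z\in\mathbb{R}^{2}|\;z_{2}\geq 2\},
\text{which records the curvature of the boundary of } K \text{ and is not a cone.}
```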

The contingent cone T_{K}(\bar{x}) is a closed set. It consists of all w\in\mathbb{X} for which there exists a sequence t_{k}\downarrow 0 with {\rm dist}\,(\bar{x}+t_{k}w;K)=o(t_{k}), where {\rm dist}\,(x;K) denotes the distance from x\in\mathbb{X} to K:

dist(x;K)=min{xu|uK}.{\rm dist}\,(x;K)=\min\{\|x-u\||\;u\in K\}. (2.11)

Similarly, the inner second order tangent set to K at \bar{x} in the direction w admits the representation

T^{i,2}_{K}(\bar{x}|w)=\left\{z\in\mathbb{X}|\;{\rm dist}\,(\bar{x}+tw+\frac{1}{2}t^{2}z;K)=o(t^{2}),t\geq 0\right\}. (2.12)

When KK is convex, it is well-known that

TK(x¯)={w𝕏|dist(x¯+tw;K)=o(t),t0}.T_{K}(\bar{x})=\{w\in\mathbb{X}|\;{\rm dist}\,(\bar{x}+tw;K)=o(t),t\geq 0\}.

Since the function {\rm dist}\,(\cdot;K) is convex, T_{K}(\bar{x}) is a convex set. In this case, the inner second order tangent set T^{i,2}_{K}(\bar{x}|w) is also convex, for the same reason together with formula (2.12). Moreover, the dual cone of the contingent cone is the normal cone to K at \bar{x}:

NK(x¯)=def[TK(x¯)]={v𝕏|v,xx¯0for allxK}.N_{K}(\bar{x})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}[T_{K}(\bar{x})]^{-}=\{v\in\mathbb{X}^{*}|\;\langle v,x-\bar{x}\rangle\leq 0\quad\mbox{for all}\quad x\in K\}. (2.13)

It is also the subdifferential at \bar{x} of the indicator function \iota_{K} of the set K, defined by \iota_{K}(x)=0 if x\in K and +\infty otherwise. The normal cone can be characterized via the support function of K

σK(v)=defsup{v,x|xK}for allv𝕏\sigma_{K}(v)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sup\{\langle v,x\rangle|\;x\in K\}\quad\mbox{for all}\quad v\in\mathbb{X}^{*} (2.14)

with NK(x¯)={v𝕏|σK(v)v,x¯}N_{K}(\bar{x})=\{v\in\mathbb{X}^{*}|\;\sigma_{K}(v)\leq\langle v,\bar{x}\rangle\}.
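For instance, for the nonnegative orthant (a standard computation, included for illustration):

```latex
K=\mathbb{R}^{n}_{+},\ \bar{x}\in K:\quad T_{K}(\bar{x})=\{w\in\mathbb{R}^{n}|\;w_{i}\geq 0 \text{ whenever } \bar{x}_{i}=0\},
N_{K}(\bar{x})=\{v\in\mathbb{R}^{n}|\;v\leq 0,\ v_{i}=0 \text{ whenever } \bar{x}_{i}>0\},
\text{consistent with (2.14): } \sigma_{K}(v)=0 \text{ if } v\leq 0 \text{ and } +\infty \text{ otherwise, so}
\sigma_{K}(v)\leq\langle v,\bar{x}\rangle \iff v\leq 0 \text{ and } \langle v,\bar{x}\rangle=0.
```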

To characterize strong minima of constrained optimization problems, Bonnans, Cominetti, and Shapiro [7, Definition 3] introduced the following second order regularity condition on K; see also [8, Definition 3.85].

Definition 2.3 (Second order regularity).

The set K is said to be second order regular at \bar{x}\in K if for any w\in T_{K}(\bar{x}) the outer second order tangent set T^{2}_{K}(\bar{x}|w) coincides with the inner second order tangent set T^{i,2}_{K}(\bar{x}|w), and for any sequence x_{k}\in K of the form x_{k}=\bar{x}+t_{k}w+\frac{1}{2}t_{k}^{2}r_{k} with t_{k}\downarrow 0 and t_{k}r_{k}\to 0, one has

\lim_{k\to\infty}{\rm dist}\,(r_{k};T^{2}_{K}(\bar{x}|w))=0.

The proper l.s.c. function \varphi:\mathbb{X}\to\overline{\mathbb{R}} is said to be second order regular at \bar{x}\in\mbox{\rm dom}\,\varphi if the epigraph of \varphi is second order regular at (\bar{x},\varphi(\bar{x})).

The class of second order regular sets covers many important sets in optimization, such as polyhedral sets, the set of positive semi-definite matrices, and the second order (ice cream) cone; see, e.g., [8]. Piecewise linear-quadratic convex functions are second order regular [7]. Recently, it was proved in [18] that some special spectral functions are also second order regular.

When the function φ:𝕏¯\varphi:\mathbb{X}\to\overline{\mathbb{R}} is l.s.c. convex and second order regular at x¯domφ\bar{x}\in\mbox{\rm dom}\,\varphi, we note from [8, Proposition 3.41]

Tepiφi,2((x¯,φ(x¯))|(w,dφ(x¯)(w)))=epid2φ(x¯)(w|)T^{i,2}_{{\rm epi}\,\varphi}((\bar{x},\varphi(\bar{x}))|(w,d\varphi(\bar{x})(w)))=\mbox{\rm epi}\,d^{2}\varphi(\bar{x})(w|\cdot) (2.15)

for any w\in\mbox{\rm dom}\,d\varphi(\bar{x}). This is a convex set, which implies that d^{2}\varphi(\bar{x})(w|\cdot) is a convex function. In this case, it is known from [8, Proposition 3.103] that \varphi is parabolically regular at \bar{x} in a direction w\in\mathbb{X} for v\in\mathbb{X}^{*} in the sense that

d2φ(x¯|v)(w)=[d2φ(x¯)(w|)](v),d^{2}\varphi(\bar{x}|v)(w)=-[d^{2}\varphi(\bar{x})(w|\cdot)]^{*}(v), (2.16)

which is the Fenchel conjugate of the function d2φ(x¯)(w|)d^{2}\varphi(\bar{x})(w|\cdot) at vv provided that the pair (w,v)𝕏×𝕏(w,v)\in\mathbb{X}\times\mathbb{X}^{*} satisfies the condition v,w=dφ(x¯)(w)\langle v,w\rangle=d\varphi(\bar{x})(w).

Next let us slightly modify [8, Theorems 3.108 and 3.109], which give necessary and sufficient conditions for strong solutions of the following composite problem

minx𝕏g(F(x)),\min_{x\in\mathbb{X}}\quad g(F(x)), (2.17)

where F:\mathbb{X}\to\mathbb{Y} is a twice continuously differentiable mapping and g:\mathbb{Y}\to\overline{\mathbb{R}} is a proper l.s.c. convex function. Suppose that y_{0}=F(x_{0})\in\mbox{\rm dom}\,g with x_{0}\in\mathbb{X}. Robinson's constraint qualification at x_{0} for this composite problem reads

0int(y0+F(x0)𝕏domg);0\in{\rm int}\,(y_{0}+\nabla F(x_{0})\mathbb{X}-\mbox{\rm dom}\,g); (2.18)

see, e.g., [7, 8]. The feasible point x_{0} is called a stationary point of problem (2.17) if there exists a Lagrange multiplier \lambda\in\mathbb{Y}^{*} such that

\nabla F(x_{0})^{*}\lambda=0\quad\mbox{and}\quad\lambda\in\partial g(y_{0}). (2.19)
Theorem 2.4 (Second order characterizations for strong solutions of composite problems).

Suppose that Robinson’s constraint qualification (2.18) holds at a stationary point x0x_{0} and that the function gg is second order regular at y0y_{0}. Then x0x_{0} is a strong solution of problem (2.17) if and only if for any nonzero ww in the critical cone

C(x0)=def{u𝕏|dg(y0)(F(x0)u)=0},C(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{u\in\mathbb{X}|\;dg(y_{0})(\nabla F(x_{0})u)=0\}, (2.20)

there exists a Lagrange multiplier λ\lambda satisfying condition (2.19) such that

λ,2F(x0)(w,w)+d2g(y0|λ)(F(x0)w)>0.\langle\lambda,\nabla^{2}F(x_{0})(w,w)\rangle+d^{2}g(y_{0}|\lambda)(\nabla F(x_{0})w)>0. (2.21)

Proof. Let us justify the sufficient part first. According to [8, Theorem 3.109], x0x_{0} is a strong solution of problem (2.17) provided that for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists a Lagrange multiplier λ𝕐\lambda\in\mathbb{Y}^{*} satisfying (2.19) such that

λ,2F(x0)(w,w)Ψ(λ)>0,\langle\lambda,\nabla^{2}F(x_{0})(w,w)\rangle-\Psi^{*}(\lambda)>0, (2.22)

where Ψ()=defd2g(y0)(F(x0)w|)\Psi(\cdot)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}d^{2}g(y_{0})(\nabla F(x_{0})w|\cdot). Since gg is convex and second order regular at y0y_{0}, equation (2.15) for the function gg tells us that d2g(y0)(F(x0)w|)d^{2}g(y_{0})(\nabla F(x_{0})w|\cdot) is a convex function for any wC(x0)w\in C(x_{0}). Moreover, note that

\langle\lambda,\nabla F(x_{0})w\rangle=\langle\nabla F(x_{0})^{*}\lambda,w\rangle=0=dg(y_{0})(\nabla F(x_{0})w).

We obtain from (2.16) that d^{2}g(y_{0}|\lambda)(\nabla F(x_{0})w)=-\Psi^{*}(\lambda). This ensures the equivalence between (2.22) and (2.21). Thus x_{0} is a strong solution of problem (2.17) provided that condition (2.21) holds.

To prove the necessity part, note again that the function d^{2}g(y_{0})(\nabla F(x_{0})w|\cdot) is convex, so there is no gap between the second order necessary and sufficient conditions; that is, condition (2.22) is also necessary for x_{0} to be a strong solution; see [7, Theorem 5.2] or [8, Theorem 3.108]. Due to the equivalence of (2.21) and (2.22) above, condition (2.21) is also necessary for strong minima at x_{0}. \hfill\Box

As described in (2.21), the existence of a Lagrange multiplier depends on the choice of each vector in the critical cone. Under Robinson's constraint qualification (2.18), (2.21) is a minimax condition in the sense that it is equivalent to

minwC(x0),w=1maxλΛ(x0)[λ,2F(x0)(w,w)+d2g(y0|λ)(F(x0)w)]>0,\min_{w\in C(x_{0}),\|w\|=1}\max_{\lambda\in\Lambda(x_{0})}\big{[}\langle\lambda,\nabla^{2}F(x_{0})(w,w)\rangle+d^{2}g(y_{0}|\lambda)(\nabla F(x_{0})w)\big{]}>0, (2.23)

where \Lambda(x_{0}) is the set of all Lagrange multipliers satisfying (2.19). This condition is hard to check numerically. On the other hand, its maximin version is more desirable, as it asks for a single Lagrange multiplier \lambda such that inequality (2.21) is valid for all w\in C(x_{0})\setminus\{0\}. In general, however, it is not clear how to close the gap between the minimax and maximin forms. For the case of (1.1), we will obtain a maximin-type condition for strong minima in Theorem 5.2.

3 Geometric characterizations for strong minima of optimization problems

3.1 Geometric characterizations for strong minima of unconstrained optimization problems

In this subsection, we consider the following composite optimization problem

minx𝕏φ(x)=deff(x)+g(x),\displaystyle\min_{x\in\mathbb{X}}\quad\varphi(x)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}f(x)+g(x), (3.1)

where f,g:𝕏¯f,g:\mathbb{X}\to\overline{\mathbb{R}} are proper functions such that int(domf)domg{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g\neq\emptyset, ff is twice continuously differentiable in int(domf){\rm int}\,(\mbox{\rm dom}\,f), and gg is lower semi-continuous. We assume that x¯int(domf)domg\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g is a stationary point of problem (3.1) in the sense that

0φ(x¯)=f(x¯)+g(x¯)0\in\partial\varphi(\bar{x})=\nabla f(\bar{x})+\partial g(\bar{x})

due to the sum rule for limiting subdifferential; see, e.g., [36, Proposition 1.107] or [43, Exercise 10.10]. Obviously, x¯\bar{x} is a stationary point if and only if f(x¯)g(x¯)-\nabla f(\bar{x})\in\partial g(\bar{x}).

To characterize strong minima at the stationary point \bar{x}, one of the most typical methods is to use the second subderivative d^{2}\varphi(\bar{x}|0) defined in (2.3). As the function f is twice continuously differentiable at \bar{x}, it is well known [43, Exercise 13.18] that

d2φ(x¯|0)(w)=2f(x¯)w,w+d2g(x¯|f(x¯))(w)forw𝕏.d^{2}\varphi(\bar{x}|0)(w)=\langle\nabla^{2}f(\bar{x})w,w\rangle+d^{2}g(\bar{x}|-\nabla f(\bar{x}))(w)\quad\mbox{for}\quad w\in\mathbb{X}. (3.2)

Since g is possibly nonsmooth in many structured optimization problems, the computation of d^{2}g(\bar{x}|-\nabla f(\bar{x}))(w) can be quite challenging. In this section, we establish several new necessary and sufficient conditions for strong minima that avoid computing second subderivatives, under the additional assumption that the function g satisfies the following quadratic growth condition [6, 49]; see also [8, Section 3.5].

Definition 3.1 (Quadratic growth conditions).

Let g:𝕏¯g:\mathbb{X}\to\overline{\mathbb{R}} be a proper l.s.c. function and SS be a closed subset of 𝕏\mathbb{X} with x¯domgS\bar{x}\in\mbox{\rm dom}\,g\cap S. We say that gg satisfies the quadratic growth condition at x¯\bar{x} for some v¯g(x¯)\bar{v}\in\partial g(\bar{x}) with respect to SS if there exist constants ε,δ>0\varepsilon,\delta>0 and modulus κ>0\kappa>0 such that

g(x)g(x¯)v¯,xx¯κ2[dist(x;S)]2for allx𝔹εδ(x¯|v¯)g(x)-g(\bar{x})-\langle\bar{v},x-\bar{x}\rangle\geq\frac{\kappa}{2}[{\rm dist}\,(x;S)]^{2}\qquad\mbox{for all}\qquad x\in\mathbb{B}_{\varepsilon}^{\delta}(\bar{x}|\bar{v}) (3.3)

with 𝔹εδ(x¯|v¯)=def{x𝔹ε(x¯)|g(x)g(x¯)v¯,xx¯<δ}\mathbb{B}_{\varepsilon}^{\delta}(\bar{x}|\bar{v})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{x\in\mathbb{B}_{\varepsilon}(\bar{x})|\;g(x)-g(\bar{x})-\langle\bar{v},x-\bar{x}\rangle<\delta\}. The function gg is said to satisfy the quadratic growth condition at x¯\bar{x} for v¯\bar{v}, if it satisfies this condition at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) with respect to

S(x¯,v¯)=def{x𝕏|g(x)v¯,xg(x¯)v¯,x¯}.S(\bar{x},\bar{v})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{x\in\mathbb{X}|\;g(x)-\langle\bar{v},x\rangle\leq g(\bar{x})-\langle\bar{v},\bar{x}\rangle\right\}. (3.4)

Finally, we say the function gg satisfies the quadratic growth condition at x¯\bar{x} if it satisfies this condition at x¯\bar{x} for any v¯g(x¯)\bar{v}\in\partial g(\bar{x}).

As the function gg is l.s.c., the set S(x¯,v¯)S(\bar{x},\bar{v}) is closed and x¯S(x¯,v¯)\bar{x}\in S(\bar{x},\bar{v}). When the quadratic growth condition (3.3) holds, it is clear that

S(x¯,v¯)𝔹ε(x¯)S𝔹ε(x¯)for someε>0.S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x})\subset S\cap\mathbb{B}_{\varepsilon}(\bar{x})\quad\mbox{for some}\quad\varepsilon>0. (3.5)

Moreover, for any closed set SS fulfilling (3.5) and x𝔹ε2δ(x¯)x\in\mathbb{B}^{\delta}_{\frac{\varepsilon}{2}}(\bar{x}), we find some uS(x¯,v¯)u\in S(\bar{x},\bar{v}) such that

dist(x;S(x¯,v¯))=xuxx¯<ε2,{\rm dist}\,(x;S(\bar{x},\bar{v}))=\|x-u\|\leq\|x-\bar{x}\|<\frac{\varepsilon}{2},

which implies that ux¯xx¯+ε2<ε\|u-\bar{x}\|\leq\|x-\bar{x}\|+\frac{\varepsilon}{2}<\varepsilon, i.e., uS(x¯,v¯)𝔹ε(x¯)u\in S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x}). It follows from (3.5) that

dist(x;S(x¯,v¯))=xudist(x;S(x¯,v¯)𝔹ε(x¯))dist(x;S){\rm dist}\,(x;S(\bar{x},\bar{v}))=\|x-u\|\geq{\rm dist}\,(x;S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x}))\geq{\rm dist}\,(x;S)

for any x𝔹ε2δ(x¯)x\in\mathbb{B}^{\delta}_{\frac{\varepsilon}{2}}(\bar{x}). Hence, if the function gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v}, it also satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v} w.r.t. any closed set SS fulfilling (3.5). Many necessary and sufficient conditions for the quadratic growth condition have been established in [6, 8] and [44, 49] under the different name of weak sharp minima of order 22.
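To make Definition 3.1 concrete, the following sketch (our own illustration, not part of the analysis above) numerically verifies the quadratic growth condition (3.3) for the simple convex function g(x) = |x| at x̄ = 0 with v̄ = 0 ∈ ∂g(0). Here S(x̄, v̄) = {x : |x| ≤ 0} = {0}, so dist(x; S) = |x|, and (3.3) holds with, e.g., κ = 2 on the ball of radius ε = 1/2.

```python
import numpy as np

# Quadratic growth check (3.3) for g(x) = |x| at xbar = 0 with vbar = 0.
# Here S(xbar, vbar) = {0}, so dist(x; S) = |x|.
xbar, vbar = 0.0, 0.0
kappa, eps = 2.0, 0.5

xs = np.linspace(-eps, eps, 1001)
lhs = np.abs(xs) - np.abs(xbar) - vbar * (xs - xbar)  # g(x) - g(xbar) - <vbar, x - xbar>
rhs = 0.5 * kappa * np.abs(xs - xbar) ** 2            # (kappa/2) * dist(x; S)^2
assert np.all(lhs >= rhs - 1e-12)
print("quadratic growth (3.3) holds on the sampled ball")
```

The inequality |x| ≥ x² on this ball reflects the fact that x̄ = 0 is even a sharp minimizer of g, which is stronger than quadratic growth.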

When gg is convex and v¯g(x¯)\bar{v}\in\partial g(\bar{x}), the set S(x¯,v¯)S(\bar{x},\bar{v}) coincides with (g)1(v¯)=g(v¯)(\partial g)^{-1}(\bar{v})=\partial g^{*}(\bar{v}). The quadratic growth condition of gg at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. (g)1(v¯)(\partial g)^{-1}(\bar{v}) has been studied and connected with the so-called Łojasiewicz inequality with exponent 12\frac{1}{2} [9] and the metric subregularity of the subdifferential [2, 21, 52] (even in nonconvex cases). There are broad classes of convex functions satisfying the quadratic growth condition such as piecewise linear-quadratic convex functions [43, Definition 10.20] and many convex spectral functions [18]; see also [50] for further examples.

When gg is not convex, the quadratic growth condition of gg at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. (g)1(v¯)(\partial g)^{-1}(\bar{v}) coincides with the quadratic growth condition of gg at x¯\bar{x} for v¯\bar{v} provided that

(g)1(v¯)𝔹ε(x¯)S(x¯,v¯)𝔹ε(x¯)for sufficiently smallε>0.(\partial g)^{-1}(\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x})\subset S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x})\quad\mbox{for sufficiently small}\quad\varepsilon>0. (3.6)

It is necessary for the quadratic growth condition (3.3) at x¯\bar{x} for v¯\bar{v} that x¯\bar{x} is a local minimizer to the function gv¯(x)=defg(x)v¯,xg_{\bar{v}}(x)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}g(x)-\langle\bar{v},x\rangle, x𝕏x\in\mathbb{X}. Then, condition (3.6) is similar to the so-called proper separation of isocost surface of gv¯g_{\bar{v}} in [53], which is an improvement of the proper separation of stationary points of gv¯g_{\bar{v}} in [32]. By [21, Theorem 3.1], the function gg satisfies the quadratic growth condition at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. (g)1(v¯)(\partial g)^{-1}(\bar{v}) provided that g\partial g is metrically subregular at x¯\bar{x} for v¯\bar{v} in the sense that there exist η,>0\eta,\ell>0 such that

dist(x;(g)1(v¯))dist(v¯;g(x))forx𝔹η(x¯).{\rm dist}\,(x;(\partial g)^{-1}(\bar{v}))\leq\ell{\rm dist}\,(\bar{v};\partial g(x))\quad\mbox{for}\quad x\in\mathbb{B}_{\eta}(\bar{x}). (3.7)

This condition is satisfied when g\partial g is a piecewise polyhedral set-valued mapping, i.e., the graph of g\partial g, {(x,v)𝕏×𝕏|vg(x)}\{(x,v)\in\mathbb{X}\times\mathbb{X}^{*}|\;v\in\partial g(x)\} is a union of finitely many polyhedral sets; see, e.g., [43, Example 9.57]. Thus the class of (possibly nonconvex) piecewise linear-quadratic functions fulfills (3.7); see also [53] for several sufficient conditions for (3.7) and some special nonconvex piecewise linear-quadratic regularizers such as the SCAD and MCP penalty functions. Although our theory in this section is applicable to nonconvex functions, we focus our later applications on low-rank minimization problems (1.2) when gg is the nuclear norm, which satisfies the quadratic growth condition [50] even though the graph of g\partial g is not piecewise polyhedral.
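As a small illustration of (3.7) (a sketch of ours, using the absolute value as a model function whose subdifferential has a piecewise polyhedral graph), metric subregularity at x̄ = 0 for v̄ = 0 can be checked directly: (∂g)⁻¹(0) = {0}, and dist(0; ∂g(x)) = 1 for every x ≠ 0, so (3.7) holds with ℓ = η.

```python
import numpy as np

# Metric subregularity (3.7) for g(x) = |x| at xbar = 0, vbar = 0:
# (∂g)^{-1}(0) = {0}, so the left-hand side of (3.7) is |x|; for x != 0
# the subdifferential is {sign(x)}, hence dist(0; ∂g(x)) = 1.
def dist_vbar_to_subdiff(x):
    if x == 0.0:
        return 0.0          # 0 ∈ ∂g(0) = [-1, 1]
    return 1.0              # dist(0; {sign(x)}) = 1

eta = 0.25
ell = eta                    # (3.7) holds with this modulus on B_eta(0)
xs = np.linspace(-eta, eta, 501)
lhs = np.abs(xs)                                        # dist(x; (∂g)^{-1}(0))
rhs = ell * np.array([dist_vbar_to_subdiff(x) for x in xs])
assert np.all(lhs <= rhs + 1e-12)
```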

The following lemma plays an important role in our analysis.

Lemma 3.2 (Necessary condition for quadratic growth).

Let g:𝕏¯g:\mathbb{X}\to\overline{\mathbb{R}} be a proper l.s.c. function and SS be a closed subset of 𝕏\mathbb{X} with x¯domgS\bar{x}\in\mbox{\rm dom}\,g\cap S. If gg satisfies the quadratic growth condition at x¯\bar{x} for some v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. SS with some modulus κ>0\kappa>0, we have

d2g(x¯|v¯)(w)κ[dist(w;TS(x¯))]2for allw𝕏.d^{2}g(\bar{x}|\bar{v})(w)\geq\kappa[{\rm dist}\,(w;T_{S}(\bar{x}))]^{2}\quad\mbox{for all}\quad w\in\mathbb{X}. (3.8)

Moreover, if gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v}, we have

Kerd2g(x¯|v¯)=def{w𝕏|d2g(x¯|v¯)(w)=0}=TS(x¯,v¯)(x¯).\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{w\in\mathbb{X}|\;d^{2}g(\bar{x}|\bar{v})(w)=0\right\}=T_{S(\bar{x},\bar{v})}(\bar{x}). (3.9)

Proof. Suppose that inequality (3.3) holds with some ε,δ,κ>0\varepsilon,\delta,\kappa>0. Pick any w𝕏w\in\mathbb{X}; we only need to verify (3.8) when d2g(x¯|v¯)(w)<d^{2}g(\bar{x}|\bar{v})(w)<\infty, i.e., wdomd2g(x¯|v¯)w\in\mbox{\rm dom}\,d^{2}g(\bar{x}|\bar{v}). It follows from (2.3) that there exist sequences tk0t_{k}\downarrow 0 and wkww_{k}\to w such that

d2g(x¯|v¯)(w)=limkg(x¯+tkwk)g(x¯)tkv¯,wk12tk2.d^{2}g(\bar{x}|\bar{v})(w)=\lim_{k\to\infty}\dfrac{g(\bar{x}+t_{k}w_{k})-g(\bar{x})-t_{k}\langle\bar{v},w_{k}\rangle}{\frac{1}{2}t_{k}^{2}}. (3.10)

Hence, we have

g(x¯+tkwk)g(x¯)v¯,x¯+tkwkx¯<δg(\bar{x}+t_{k}w_{k})-g(\bar{x})-\langle\bar{v},\bar{x}+t_{k}w_{k}-\bar{x}\rangle<\delta

when kk is sufficiently large. Combining (3.3) and (3.10) gives us that

d2g(x¯|v¯)(w)κlim supk[dist(x¯+tkwk;S)tk]2=κlim supk[dist(wk;Sx¯tk)]2\displaystyle\begin{array}[]{ll}d^{2}g(\bar{x}|\bar{v})(w)&\displaystyle\geq\kappa\limsup_{k\to\infty}\left[\dfrac{{\rm dist}\,(\bar{x}+t_{k}w_{k};S)}{t_{k}}\right]^{2}\\ &\displaystyle=\kappa\limsup_{k\to\infty}\left[{\rm dist}\,\left(w_{k};\frac{S-\bar{x}}{t_{k}}\right)\right]^{2}\end{array} (3.13)

As SS is closed, there exist ukSx¯tku_{k}\in\dfrac{S-\bar{x}}{t_{k}}, i.e., x¯+tkukS\bar{x}+t_{k}u_{k}\in S such that dist(wk;Sx¯tk)=wkuk.{\rm dist}\,\left(w_{k};\dfrac{S-\bar{x}}{t_{k}}\right)=\|w_{k}-u_{k}\|. This together with (3.13) implies that

d2g(x¯|v¯)(w)κ(lim supkwkuk2).d^{2}g(\bar{x}|\bar{v})(w)\geq\kappa(\limsup_{k\to\infty}\|w_{k}-u_{k}\|^{2}).

Hence uku_{k} is bounded. By passing to a subsequence, we suppose that uku_{k} converges to u𝕏u\in\mathbb{X}. As x¯+tkukS\bar{x}+t_{k}u_{k}\in S, we have uTS(x¯)u\in T_{S}(\bar{x}). It follows from the above inequality that

d2g(x¯|v¯)(w)κwu2κ[dist(w;TS(x¯))]2,d^{2}g(\bar{x}|\bar{v})(w)\geq\kappa\|w-u\|^{2}\geq\kappa[{\rm dist}\,(w;T_{S}(\bar{x}))]^{2},

which verifies (3.8).

To justify (3.9), suppose further that gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v}. For any wKerd2g(x¯|v¯)w\in\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v}), we obtain from (3.8) that dist(w;TS(x¯,v¯)(x¯))=0{\rm dist}\,(w;T_{S(\bar{x},\bar{v})}(\bar{x}))=0, which means wTS(x¯,v¯)(x¯)w\in T_{S(\bar{x},\bar{v})}(\bar{x}). It follows that Kerd2g(x¯|v¯)TS(x¯,v¯)(x¯)\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v})\subset T_{S(\bar{x},\bar{v})}(\bar{x}). Let us prove the opposite inclusion by picking any wTS(x¯,v¯)(x¯)w\in T_{S(\bar{x},\bar{v})}(\bar{x}). There exist tk0t_{k}\downarrow 0 and wkww_{k}\to w such that x¯+tkwkS(x¯,v¯)\bar{x}+t_{k}w_{k}\in S(\bar{x},\bar{v}), i.e.,

g(x¯+tkwk)g(x¯)tkv¯,wk0.g(\bar{x}+t_{k}w_{k})-g(\bar{x})-t_{k}\langle\bar{v},w_{k}\rangle\leq 0.

We obtain from (3.8) that

0κ[dist(w;TS(x¯,v¯)(x¯))]2d2g(x¯|v¯)(w)lim infkg(x¯+tkwk)g(x¯)tkv¯,wk12tk20,0\leq\kappa[{\rm dist}\,(w;T_{S(\bar{x},\bar{v})}(\bar{x}))]^{2}\leq d^{2}g(\bar{x}|\bar{v})(w)\leq\liminf_{k\to\infty}\dfrac{g(\bar{x}+t_{k}w_{k})-g(\bar{x})-t_{k}\langle\bar{v},w_{k}\rangle}{\frac{1}{2}t_{k}^{2}}\leq 0, (3.14)

which yields wKerd2g(x¯|v¯)w\in\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v}) and verifies TS(x¯,v¯)(x¯)Kerd2g(x¯|v¯)T_{S(\bar{x},\bar{v})}(\bar{x})\subset\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v}). The proof is complete. \hfill\Box

Next let us establish the main theorem of this section, which provides a geometric characterization for strong minima of problem (3.1).

Theorem 3.3 (Necessary and sufficient conditions for strong minima).

Let x¯int(domf)domg\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g be a stationary point of problem (3.1) and define v¯=deff(x¯)g(x¯)\bar{v}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\nabla f(\bar{x})\in\partial g(\bar{x}). If x¯\bar{x} is a strong solution of problem (3.1), then

2f(x¯)w,w>0for allwTS(x¯,v¯)(x¯){0}.\langle\nabla^{2}f(\bar{x})w,w\rangle>0\qquad\mbox{for all}\quad w\in T_{S(\bar{x},\bar{v})}(\bar{x})\setminus\{0\}. (3.15)

Suppose further that gg satisfies the quadratic growth condition at x¯\bar{x} for v¯=deff(x¯)\bar{v}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\nabla f(\bar{x}) and that 2f(x¯)\nabla^{2}f(\bar{x}) is positive semidefinite, then x¯\bar{x} is a strong solution of problem (3.1) if and only if

Ker2f(x¯)TS(x¯,v¯)(x¯)={0}.\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{S(\bar{x},\bar{v})}(\bar{x})=\{0\}. (3.16)

Proof. As ff is twice continuously differentiable at x¯int(domf)\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f), we derive from (2.5) and (3.2) that x¯\bar{x} is a strong solution of φ\varphi if and only if there exists some >0\ell>0 such that

2f(x¯)w,w+d2g(x¯|v¯)(w)w2for allw𝕏.\langle\nabla^{2}f(\bar{x})w,w\rangle+d^{2}g(\bar{x}|\bar{v})(w)\geq\ell\|w\|^{2}\quad\mbox{for all}\quad w\in\mathbb{X}. (3.17)

To justify the first part, suppose that x¯\bar{x} is a strong solution of φ\varphi, i.e., (3.17) holds. Pick any wTS(x¯,v¯)(x¯){0}w\in T_{S(\bar{x},\bar{v})}(\bar{x})\setminus\{0\} and find sequences tk0t_{k}\downarrow 0 and wkww_{k}\to w such that x¯+tkwkS(x¯,v¯)\bar{x}+t_{k}w_{k}\in S(\bar{x},\bar{v}), which means

g(x¯+tkwk)g(x¯)v¯,x¯+tkwkx¯12tk20\dfrac{g(\bar{x}+t_{k}w_{k})-g(\bar{x})-\langle\bar{v},\bar{x}+t_{k}w_{k}-\bar{x}\rangle}{\frac{1}{2}t_{k}^{2}}\leq 0

By the definition of d2g(x¯|v¯)(w)d^{2}g(\bar{x}|\bar{v})(w) in (2.3), we have d2g(x¯|v¯)(w)0d^{2}g(\bar{x}|\bar{v})(w)\leq 0. This together with (3.17) verifies (3.15).

To verify the second part of the theorem, suppose that the function gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v} with some modulus κ>0\kappa>0 and that 2f(x¯)\nabla^{2}f(\bar{x}) is positive semidefinite. It is obvious that (3.15) implies (3.16). We only need to prove that (3.16) is sufficient for strong minima at x¯\bar{x}. Suppose that condition (3.16) is satisfied. If condition (3.17) failed, we could find a sequence wkw_{k} such that wk=1\|w_{k}\|=1 and

2f(x¯)wk,wk+d2g(x¯|v¯)(wk)1k.\langle\nabla^{2}f(\bar{x})w_{k},w_{k}\rangle+d^{2}g(\bar{x}|\bar{v})(w_{k})\leq\frac{1}{k}.

It follows from (3.8) that

1k2f(x¯)wk,wk+κ[dist(wk;TS(x¯,v¯)(x¯))]2.\frac{1}{k}\geq\langle\nabla^{2}f(\bar{x})w_{k},w_{k}\rangle+\kappa[{\rm dist}\,(w_{k};T_{S(\bar{x},\bar{v})}(\bar{x}))]^{2}. (3.18)

By passing to a subsequence (without relabeling), assume that wkw0w_{k}\to w_{0} with w0=1\|w_{0}\|=1. Letting kk\to\infty in (3.18), it follows that

02f(x¯)w0,w0+κ[dist(w0;TS(x¯,v¯)(x¯))]22f(x¯)w0,w00.0\geq\langle\nabla^{2}f(\bar{x})w_{0},w_{0}\rangle+\kappa[{\rm dist}\,(w_{0};T_{S(\bar{x},\bar{v})}(\bar{x}))]^{2}\geq\langle\nabla^{2}f(\bar{x})w_{0},w_{0}\rangle\geq 0.

Hence, we have 2f(x¯)w0,w0=0\langle\nabla^{2}f(\bar{x})w_{0},w_{0}\rangle=0 and dist(w0;TS(x¯,v¯)(x¯))=0{\rm dist}\,(w_{0};T_{S(\bar{x},\bar{v})}(\bar{x}))=0, which means

w0Ker2f(x¯)TS(x¯,v¯)(x¯).w_{0}\in\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{S(\bar{x},\bar{v})}(\bar{x}).

This contradicts (3.16) as w0=1\|w_{0}\|=1. Hence, (3.17) holds for some >0\ell>0 and x¯\bar{x} is a strong solution of φ\varphi. The proof is complete. \hfill\Box

Corollary 3.4 (Geometric characterization for strong minima of convex problems).

Let f,g:𝕏¯f,g:\mathbb{X}\to\overline{\mathbb{R}} be proper l.s.c. convex functions and x¯int(domf)domg\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g be a minimizer to problem (3.1). Suppose that ff is twice continuously differentiable in int(domf){\rm int}\,(\mbox{\rm dom}\,f) and that g\partial g is metrically subregular at x¯\bar{x} for v¯=f(x¯)\bar{v}=-\nabla f(\bar{x}). Then x¯\bar{x} is a strong solution to problem (3.1) if and only if

Ker2f(x¯)T(g)(v¯)(x¯)={0}.\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{(\partial g^{*})(\bar{v})}(\bar{x})=\{0\}. (3.19)

Proof. As discussed before (3.7), when gg is a convex function, the metric subregularity of g\partial g at x¯\bar{x} for v¯\bar{v} implies the quadratic growth condition (they are indeed equivalent [2, 52].) Since ff is convex, 2f(x¯)\nabla^{2}f(\bar{x}) is positive semidefinite. By Theorem 3.3, x¯\bar{x} is a strong solution if and only if (3.16) holds. Since gg is a convex function, we have S(x¯,v¯)=(g)1(v¯)=g(v¯)S(\bar{x},\bar{v})=(\partial g)^{-1}(\bar{v})=\partial g^{*}(\bar{v}). Thus (3.19) is equivalent to (3.16). The proof is complete. \hfill\Box

Unlike many other necessary and sufficient conditions for strong minima, our geometric characterizations (3.16) and (3.19) do not involve the “curvature” or the “sigma-term” of the function gg. We still need to compute the contingent cones TS(x¯,v¯)(x¯)T_{S(\bar{x},\bar{v})}(\bar{x}) in (3.16) or Tg(v¯)(x¯)T_{\partial g^{*}(\bar{v})}(\bar{x}) in (3.19). In Sections 4 and 5, we consider the case g=g=\|\cdot\|_{*}, the nuclear norm, and provide a simple calculation of Tg(v¯)(x¯)T_{\partial g^{*}(\bar{v})}(\bar{x}).
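As a toy illustration of Corollary 3.4 (our own hypothetical example, not taken from the text), consider 𝕏 = ℝ² with f(x) = ½x₁², g(x) = |x₂|, minimizer x̄ = 0, and v̄ = −∇f(0) = 0. Then Ker ∇²f(0) is the x₂-axis, while ∂g*(0) = (∂g)⁻¹(0) is the x₁-axis and so is its contingent cone at 0; the intersection in (3.19) is {0}, and strong minimality can be confirmed directly by sampling.

```python
import numpy as np

# Toy instance of the characterization (3.19) on R^2:
# f(x) = 0.5*x1^2, g(x) = |x2|, xbar = 0, vbar = -grad f(0) = 0.
ker_hess = np.array([[0.0, 1.0]]).T     # basis of Ker grad^2 f(0) (the x2-axis)
tangent = np.array([[1.0, 0.0]]).T      # basis of T_{dg*(0)}(0) (the x1-axis)
# two one-dimensional subspaces of R^2 meet only at 0 iff their stacked
# bases have full rank 2
assert np.linalg.matrix_rank(np.hstack([ker_hess, tangent])) == 2

# confirm the strong-minimum growth phi(x) >= c*||x||^2 near xbar = 0
rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(2000, 2))
phi = 0.5 * pts[:, 0] ** 2 + np.abs(pts[:, 1])
assert np.all(phi >= 0.25 * np.sum(pts ** 2, axis=1))
```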

3.2 Geometric characterization for strong minima of optimization problems with linear constraints

In this subsection, we apply the idea in Theorem 3.3 and Corollary 3.4 to the following convex optimization problem with linear constraints

minx𝕏g(x)subject toΦxK,\min_{x\in\mathbb{X}}\quad g(x)\quad\mbox{subject to}\quad\Phi x\in K, (3.20)

where g:𝕏g:\mathbb{X}\to\mathbb{R} is a continuous (nonsmooth) convex function with full domain, Φ:𝕏𝕐\Phi:\mathbb{X}\to\mathbb{Y} is a linear operator between two Euclidean spaces, and KK is a closed convex polyhedral set in 𝕐\mathbb{Y}. Unlike in problem (3.1), the function gg must satisfy additional properties, namely convexity, the quadratic growth condition, and second order regularity, as stated in the next theorem.

Let us recall that x0x_{0} is a stationary solution of problem (3.20) if there exists a Lagrange multiplier λ𝕐\lambda\in\mathbb{Y}^{*}, also known as a dual certificate, such that

Φλg(x0)andλNK(Φx0).-\Phi^{*}\lambda\in\partial g(x_{0})\quad\mbox{and}\quad\lambda\in N_{K}(\Phi x_{0}). (3.21)

The set of Lagrange multipliers is defined by

Λ(x0)=def{λNK(Φx0)|Φλg(x0)}.\displaystyle\Lambda(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{\lambda\in N_{K}(\Phi x_{0})|-\Phi^{*}\lambda\in\partial g(x_{0})\}. (3.22)

The critical cone of this problem at the stationary point x0x_{0} is

C(x0)=def{w𝕏|ΦwTK(Φx0),dg(x0)(w)=0}.C(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{w\in\mathbb{X}|\;\Phi w\in T_{K}(\Phi x_{0}),dg(x_{0})(w)=0\}. (3.23)

The point x0x_{0} is called a strong solution of problem (3.20) if there exist ε>0\varepsilon>0 and c>0c>0 such that

g(x)g(x0)cxx02whenΦxKandx𝔹ε(x0).g(x)-g(x_{0})\geq c\|x-x_{0}\|^{2}\qquad\mbox{when}\qquad\Phi x\in K\quad\mbox{and}\quad x\in\mathbb{B}_{\varepsilon}(x_{0}).
Theorem 3.5 (Geometric characterization for strong minima of problem (3.20)).

Let x0x_{0} be a stationary point of problem (3.20). Suppose that the convex function gg is second order regular at x0x_{0} and satisfies the quadratic growth condition at x0x_{0}. Then x0x_{0} is a strong solution of problem (3.20) if and only if

[λΛ(x0)Tg(Φλ)(x0)]C(x0)={0}.\left[\bigcap_{\lambda\in\Lambda(x_{0})}T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0})\right]\cap C(x_{0})=\{0\}. (3.24)

Proof. Define y0=defΦx0,y_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\Phi x_{0}, 𝕃=defImΦ\mathbb{L}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm Im}\,\Phi (a linear subspace of 𝕐\mathbb{Y}), K𝕃=defK𝕃K_{\mathbb{L}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}K\cap\mathbb{L}, the mapping F:𝕏𝕃×𝕏F:\mathbb{X}\to\mathbb{L}\times\mathbb{X} by F(x)=def(Φx,x)F(x)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(\Phi x,x) for x𝕏x\in\mathbb{X}, and the function G:𝕃×𝕏¯G:\mathbb{L}\times\mathbb{X}\to\overline{\mathbb{R}} by

G(u,x)=ιK𝕃(u)+g(x)for(u,x)𝕃×𝕏.G(u,x)=\iota_{K_{\mathbb{L}}}(u)+g(x)\quad\mbox{for}\quad(u,x)\in\mathbb{L}\times\mathbb{X}.

Hence we may replace 𝕐\mathbb{Y} by 𝕃\mathbb{L} and KK by K𝕃K_{\mathbb{L}} in problem (3.20) and rewrite it as a composite optimization problem of the form (2.17):

infx𝕏G(F(x)).\inf_{x\in\mathbb{X}}\quad G(F(x)). (3.25)

Observe that x0x_{0} is a strong solution of (3.20) if and only if it is a strong solution of (3.25). Robinson’s constraint qualification (2.18) for problem (3.25) is

0int(F(x0)+F(x0)𝕏K𝕃×𝕏).0\in{\rm int}\,(F(x_{0})+\nabla F(x_{0})\mathbb{X}-K_{\mathbb{L}}\times\mathbb{X}). (3.26)

As Φ:𝕏𝕃\Phi:\mathbb{X}\to\mathbb{L} is a surjective operator, [8, Corollary 2.101] tells us that the above condition is equivalent to the existence of w𝕏w\in\mathbb{X} satisfying

y0+ΦwK𝕃andx0+wint𝕏=𝕏.y_{0}+\Phi w\in K_{\mathbb{L}}\quad\mbox{and}\quad x_{0}+w\in{\rm int}\,\mathbb{X}=\mathbb{X}.

This condition holds trivially at w=0𝕏w=0_{\mathbb{X}}. Thus Robinson’s constraint qualification (3.26) holds.

The critical cone (2.20) for problem (3.25) is

C^(x0)=def{w𝕏|dG(F(x0)|F(x0)w)=0}={w𝕏|ΦwTK𝕃(y0),dg(x0)(w)=0}.\displaystyle\widehat{C}(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{w\in\mathbb{X}|\;dG(F(x_{0})|\;\nabla F(x_{0})w)=0\}=\{w\in\mathbb{X}|\;\Phi w\in T_{K_{\mathbb{L}}}(y_{0}),dg(x_{0})(w)=0\}. (3.27)

As K𝕃K_{\mathbb{L}} is also a polyhedral set in 𝕃\mathbb{L}, we have

TK𝕃(y0)=+(K𝕃y0)=+(Ky0)𝕃=TK(y0)𝕃.T_{K_{\mathbb{L}}}(y_{0})=\mathbb{R}_{+}(K_{\mathbb{L}}-y_{0})=\mathbb{R}_{+}(K-y_{0})\cap\mathbb{L}=T_{K}(y_{0})\cap\mathbb{L}. (3.28)

It follows that the set C^(x0)\widehat{C}(x_{0}) is exactly the critical cone C(x0)C(x_{0}) defined in (3.23).

By (2.19), the set of Lagrange multipliers of problem (3.25) is

Λ^(x0)=def{(λ,μ)𝕃×𝕏|F(x0)(λ,μ)=0,(λ,μ)G(F(x0))}={(λ,μ)𝕃×𝕏|μ=Φλ,λNK𝕃(y0),μg(x0)}.\displaystyle\begin{array}[]{ll}\widehat{\Lambda}(x_{0})&\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{(\lambda,\mu)\in\mathbb{L}^{*}\times\mathbb{X}^{*}|\;\nabla F(x_{0})^{*}(\lambda,\mu)=0,(\lambda,\mu)\in\partial G(F(x_{0}))\}\\ &=\{(\lambda,\mu)\in\mathbb{L}^{*}\times\mathbb{X}^{*}|\;\mu=-\Phi^{*}\lambda,\lambda\in N_{K_{\mathbb{L}}}(y_{0}),\mu\in\partial g(x_{0})\}.\end{array} (3.31)

Note further that

epiG=K𝕃×epig.\mbox{\rm epi}\,G=K_{\mathbb{L}}\times\mbox{\rm epi}\,g.

Since K𝕃K_{\mathbb{L}} is a polyhedral set, it is second order regular [7]. As epig\mbox{\rm epi}\,g is second order regular, so is epiG\mbox{\rm epi}\,G; see, e.g., [8, Proposition 3.89].

By Theorem 2.4, x0x_{0} is a strong solution of problem (3.25) if and only if for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists (λ,μ)Λ^(x0)(\lambda,\mu)\in\widehat{\Lambda}(x_{0}) such that

(λ,μ),2F(x0)(w,w)+d2G(F(x0)|(λ,μ))(F(x0)w)>0.\displaystyle\langle(\lambda,\mu),\nabla^{2}F(x_{0})(w,w)\rangle+d^{2}G(F(x_{0})|(\lambda,\mu))(\nabla F(x_{0})w)>0. (3.32)

Observe that

d2G(F(x0)|(λ,μ))(F(x0)w)=d2g(x0|μ)(w)+d2ιK𝕃(y0|λ)(Φw).d^{2}G(F(x_{0})|(\lambda,\mu))(\nabla F(x_{0})w)=d^{2}g(x_{0}|\mu)(w)+d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w). (3.33)

Note from (2.3) that

d2ιK𝕃(y0|λ)(Φw)=lim infzΦw,t0ιK𝕃(y0+tz)ιK𝕃(y0)tλ,z0.5t20.d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w)=\liminf_{z\to\Phi w,t\downarrow 0}\frac{\iota_{K_{\mathbb{L}}}(y_{0}+tz)-\iota_{K_{\mathbb{L}}}(y_{0})-t\langle\lambda,z\rangle}{0.5t^{2}}\geq 0. (3.34)

By (3.28), we have

d2ιK𝕃(y0|λ)(Φw)=lim infzTK𝕃(y0)Φw,t0λ,z0.5t0.d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w)=\liminf_{z\stackrel{{\scriptstyle T_{K_{\mathbb{L}}}(y_{0})}}{{\to}}\Phi w,t\downarrow 0}\frac{-\langle\lambda,z\rangle}{0.5t}\geq 0. (3.35)

Since wC(x0)w\in C(x_{0}), we have ΦwTK𝕃(Φx0)\Phi w\in T_{K_{\mathbb{L}}}(\Phi x_{0}). As λNK𝕃(Φx0)\lambda\in N_{K_{\mathbb{L}}}(\Phi x_{0}) and μ=Φλg(x0)\mu=-\Phi^{*}\lambda\in\partial g(x_{0}), it follows that

0=dg(x0)(w)μ,w=Φλ,w=λ,Φw0,0=dg(x_{0})(w)\geq\langle\mu,w\rangle=-\langle\Phi^{*}\lambda,w\rangle=-\langle\lambda,\Phi w\rangle\geq 0,

which implies that λ,Φw=0\langle\lambda,\Phi w\rangle=0. This together with (3.35) tells us that d2ιK𝕃(y0|λ)(Φw)=0d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w)=0.

Since FF is linear, 2F(x0)=0\nabla^{2}F(x_{0})=0; by (3.33), condition (3.32) is thus equivalent to

d2g(x0|Φλ)(w)>0.d^{2}g(x_{0}|-\Phi^{*}\lambda)(w)>0. (3.36)

Since KK is a polyhedral set, we have

NK𝕃(Φx0)=NK𝕃(Φx0)=NK(Φx0)+N𝕃(Φx0)=NK(Φx0)+KerΦ.N_{K_{\mathbb{L}}}(\Phi x_{0})=N_{K\cap\mathbb{L}}(\Phi x_{0})=N_{K}(\Phi x_{0})+N_{\mathbb{L}}(\Phi x_{0})=N_{K}(\Phi x_{0})+\mbox{\rm Ker}\,\Phi^{*}.

Representing λ=λ1+λ2\lambda=\lambda_{1}+\lambda_{2} with λ1NK(Φx0)\lambda_{1}\in N_{K}(\Phi x_{0}) and λ2KerΦ\lambda_{2}\in\mbox{\rm Ker}\,\Phi^{*}, we obtain

Φλ=Φ(λ1+λ2)=Φλ1.-\Phi^{*}\lambda=-\Phi^{*}(\lambda_{1}+\lambda_{2})=-\Phi^{*}\lambda_{1}.

Hence x0x_{0} is a strong solution of problem (3.25) if and only if for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists some λΛ(x0)\lambda\in\Lambda(x_{0}) in (3.21) such that (3.36) holds. Since gg satisfies the quadratic growth condition at x0x_{0}, we get from Lemma 3.2 that

Kerd2g(x0|Φλ)=Tg(Φλ)(x0).\mbox{\rm Ker}\,d^{2}g(x_{0}|-\Phi^{*}\lambda)=T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0}).

Hence condition (3.36) means for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists λΛ(x0)\lambda\in\Lambda(x_{0}) such that

wTg(Φλ)(x0),i.e.,w𝕏Tg(Φλ)(x0).w\notin T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0}),\quad\mbox{i.e.,}\quad w\in\mathbb{X}\setminus T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0}).

This is equivalent to the following inclusion

C(x0){0}[λΛ(x0)(𝕏Tg(Φλ)(x0))]=𝕏[λΛ(x0)Tg(Φλ)(x0)],C(x_{0})\setminus\{0\}\subset\left[\bigcup_{\lambda\in\Lambda(x_{0})}\left(\mathbb{X}\setminus T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0})\right)\right]=\mathbb{X}\setminus\left[\bigcap_{\lambda\in\Lambda(x_{0})}T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0})\right],

which is also equivalent to (3.24). The proof is complete. \hfill\Box

The approach of studying problem (3.20) via the composite problem (3.25) is traditional; see, e.g., [7, 34]. However, by additionally assuming that the function gg satisfies the quadratic growth condition at x0x_{0}, we are able to obtain the new geometric characterization of strong solutions in (3.24). The main idea in this result is similar to that in Theorem 3.3. Here we require the function gg to satisfy more assumptions, but they hold when applying this result to the nuclear norm minimization problem (5.1), as the nuclear norm is second order regular and also satisfies the quadratic growth condition [18].

4 Characterizations for strong minima of low-rank optimization problems

This section is devoted to new characterizations for strong minima of the low-rank optimization problem:

minXn1×n2h(ΦX)+μX,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad h(\Phi X)+\mu\|X\|_{*}, (4.1)

where Φ:n1×n2m\Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{m} is a linear operator, g(X)=defXg(X)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|X\|_{*} is the nuclear norm of Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}}, μ\mu is a positive constant, and h:m¯h:\mathbb{R}^{m}\to\overline{\mathbb{R}} satisfies the following standing assumptions [24]:

(A) hh is proper convex and twice continuously differentiable in int(domh){\rm int}\,(\mbox{\rm dom}\,h).

(B) 2h(ΦX)\nabla^{2}h(\Phi X) is positive definite for any XΦ1(int(domh))X\in\Phi^{-1}({\rm int}\,(\mbox{\rm dom}\,h)).

Strongly convex functions with full domain clearly satisfy the above standing assumptions. Another important (non-strongly convex) function satisfying these conditions that is widely used in statistical/machine learning is the Kullback-Leibler divergence. Sufficient conditions for strong minima of problem (4.1) can be obtained from [18, Theorem 12]. However, that result still relies on some computation of d2d^{2}\|\cdot\|_{*}, which is complicated; see, e.g., [20, 51] and the recent paper [37] for the case of symmetric matrices. We will provide some explicit and computable characterizations for strong minima of problem (4.1) based on Corollary 3.4. The calculation of the contingent cone Tg(Y¯)(X¯)T_{\partial g^{*}(\overline{Y})}(\overline{X}) is rather simple; see our formula (4.10) below.
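Since the Kullback-Leibler divergence is only named in passing above, the following sketch illustrates why it fits the standing assumptions. We assume the common form h(y) = Σᵢ (yᵢ log(yᵢ/bᵢ) − yᵢ + bᵢ) with a fixed positive vector b (our assumption, not a formula from the text): h is convex and twice smooth on int(dom h) = {y > 0} with Hessian diag(1/y) ≻ 0, yet it is not strongly convex since these eigenvalues vanish as y grows.

```python
import numpy as np

# Hessian of h(y) = sum_i (y_i*log(y_i/b_i) - y_i + b_i) on {y > 0}:
# grad^2 h(y) = diag(1/y), independent of the reference vector b.
def kl_hessian(y):
    return np.diag(1.0 / y)

y = np.array([0.3, 1.7, 4.0])
eigs = np.linalg.eigvalsh(kl_hessian(y))
assert np.all(eigs > 0)                    # positive definite at this point
# but no uniform lower bound: the smallest eigenvalue 1/max(y) -> 0
assert np.min(np.linalg.eigvalsh(kl_hessian(1e6 * np.ones(3)))) < 1e-5
```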

Let us recall a few standard notations for matrices. The space of all matrices n1×n2\mathbb{R}^{n_{1}\times n_{2}} (n1n2n_{1}\leq n_{2}) is endowed with the inner product

X,Y=defTr(XTY)for allX,Yn1×n2,\langle X,Y\rangle\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm Tr}\,(X^{T}Y)\quad\mbox{for all}\quad X,Y\in\mathbb{R}^{n_{1}\times n_{2}},

where Tr is the trace operator. The Frobenius norm on n1×n2\mathbb{R}^{n_{1}\times n_{2}} is

XF=defTr(XTX)for allXn1×n2.\|X\|_{F}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sqrt{\mbox{\rm Tr}\,(X^{T}X)}\quad\mbox{for all}\quad X\in\mathbb{R}^{n_{1}\times n_{2}}.

The nuclear norm and spectral norm of Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}} are defined respectively by

X=defi=1n1σi(X)andX=defσ1(X),\|X\|_{*}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sum_{i=1}^{n_{1}}\sigma_{i}(X)\quad\mbox{and}\quad\|X\|\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sigma_{1}(X),

where σ1(X)σ2(X)σn1(X)0\sigma_{1}(X)\geq\sigma_{2}(X)\geq\ldots\geq\sigma_{n_{1}}(X)\geq 0 are the singular values of XX. Suppose that a full Singular Value Decomposition (SVD) of X¯n1×n2\overline{X}\in\mathbb{R}^{n_{1}\times n_{2}} is

X¯=U(Σ¯r000)n1×n2VTwithΣ¯r=(σ1(X¯)00σr(X¯)),\overline{X}=U\begin{pmatrix}\overline{\Sigma}_{r}&0\\ 0&0\end{pmatrix}_{n_{1}\times n_{2}}V^{T}\quad\mbox{with}\quad\overline{\Sigma}_{r}=\begin{pmatrix}\sigma_{1}(\overline{X})&\dots&0\\ \vdots&\ddots&\vdots\\ 0&\ldots&\sigma_{r}(\overline{X})\end{pmatrix}, (4.2)

where r=rank(X¯)r={\rm rank}\,(\overline{X}), Un1×n1U\in\mathbb{R}^{n_{1}\times n_{1}} and Vn2×n2V\in\mathbb{R}^{n_{2}\times n_{2}} are orthogonal matrices. Let 𝒪(X¯)\mathcal{O}(\overline{X}) be the set of all such pairs (U,V)(U,V) satisfying (4.2). We write U=(UIUJ)U=\begin{pmatrix}U_{I}&U_{J}\end{pmatrix} and V=(VIVK)V=\begin{pmatrix}V_{I}&V_{K}\end{pmatrix}, where UIU_{I} and VIV_{I} are the submatrices of the first rr columns of UU and VV, respectively. We get from (4.2) that X¯=UIΣ¯rVIT\overline{X}=U_{I}\overline{\Sigma}_{r}V_{I}^{T}, which is known as a compact SVD of X¯\overline{X}.
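The decomposition (4.2) and the compact SVD X̄ = U_I Σ̄_r V_Iᵀ are easy to reproduce numerically; the sketch below (with hypothetical random data of rank 2) extracts U_I, Σ̄_r, and V_I from a full SVD.

```python
import numpy as np

# Compact SVD X = U_I * Sigma_r * V_I^T as in (4.2), for a rank-2 example.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2, 4x5

U, s, Vt = np.linalg.svd(X, full_matrices=True)   # singular values sorted
r = int(np.sum(s > 1e-10))                        # numerical rank
U_I, V_I = U[:, :r], Vt[:r, :].T                  # first r columns of U and V
Sigma_r = np.diag(s[:r])

assert r == 2
assert np.allclose(X, U_I @ Sigma_r @ V_I.T)      # X = U_I Sigma_r V_I^T
```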

The following lemma is significant in our paper. The first part is well-known [48, Example 2]. The last part was established in [50, Proposition 10], which can be viewed as a direct consequence of [48, Example 1] via convex analysis and the formula for the normal cone to a level set [41, Corollary 23.7.1].

Lemma 4.1 (Subdifferential of the nuclear norm).

The subdifferential of the nuclear norm at X¯n1×n2\overline{X}\in\mathbb{R}^{n_{1}\times n_{2}} is computed by

X¯={U(𝕀r00W)VT|W1}for any(U,V)𝒪(X¯).\partial\|\overline{X}\|_{*}=\left\{U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&W\end{pmatrix}V^{T}|\;\|W\|\leq 1\right\}\quad\mbox{for any}\quad(U,V)\in\mathcal{O}(\overline{X}). (4.3)

Moreover, Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*} if and only if Y¯1\|\overline{Y}\|\leq 1 and

X¯=Y¯,X¯.\|\overline{X}\|_{*}=\langle\overline{Y},\overline{X}\rangle. (4.4)

Furthermore, for any Y¯𝔹=def{Zn1×n2|Z1}\overline{Y}\in\mathbb{B}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{Z\in\mathbb{R}^{n_{1}\times n_{2}}|\;\|Z\|\leq 1\}, we have

g(Y¯)=N𝔹(Y¯)=U¯(𝕊+p(Y¯)000)V¯Tfor any(U¯,V¯)𝒪(Y¯),\partial g^{*}(\overline{Y})=N_{\mathbb{B}}(\overline{Y})=\overline{U}\begin{pmatrix}\mathbb{S}_{+}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}\quad\mbox{for any}\quad(\overline{U},\overline{V})\in\mathcal{O}(\overline{Y}), (4.5)

where 𝕊+p\mathbb{S}_{+}^{p} is the set of all p×pp\times p symmetric positive semidefinite matrices and p(Y¯)p(\overline{Y}) is defined by

p(Y¯)=def#{i|σi(Y¯)=1}.p(\overline{Y})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\#\{i|\;\sigma_{i}(\overline{Y})=1\}. (4.6)

Let Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*} and (U,V)𝒪(X¯)(U,V)\in\mathcal{O}(\overline{X}). It follows from (4.3) that Y¯\overline{Y} can be represented by

Y¯=U(𝕀r00W¯)VT\overline{Y}=U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&\overline{W}\end{pmatrix}V^{T} (4.7)

with some W¯(n1r)×(n2r)\overline{W}\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)} satisfying W¯1\|\overline{W}\|\leq 1. Let (U^,V^)𝒪(W¯)(\widehat{U},\widehat{V})\in\mathcal{O}(\overline{W}) and U^ΣV^T\widehat{U}\Sigma\widehat{V}^{T} be a full SVD of W¯\overline{W}. We get from (4.7) that

Y¯=U¯(𝕀r00Σ)V¯TwithU¯=def(UIUJU^)andV¯=def(VIVKV^).\overline{Y}=\overline{U}\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&\Sigma\end{pmatrix}\overline{V}^{T}\quad\mbox{with}\quad\overline{U}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(U_{I}\;U_{J}\widehat{U})\quad\mbox{and}\quad\overline{V}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(V_{I}\;V_{K}\widehat{V}). (4.8)

Observe that U¯TU¯=𝕀n1\overline{U}^{T}\overline{U}=\mathbb{I}_{{n_{1}}} and V¯TV¯=𝕀n2\overline{V}^{T}\overline{V}=\mathbb{I}_{n_{2}}. It follows that (U¯,V¯)𝒪(X¯)𝒪(Y¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X})\cap\mathcal{O}(\overline{Y}), which means X¯\overline{X} and Y¯\overline{Y} have simultaneous ordered singular value decomposition [30, 31] with orthogonal matrix pair (U¯,V¯)(\overline{U},\overline{V}) in the sense that

X¯=U¯(Diagσ(X¯))V¯TandY¯=U¯(Diagσ(Y¯))V¯T,\overline{X}=\overline{U}({\rm Diag}\,\sigma(\overline{X}))\overline{V}^{T}\qquad\mbox{and}\qquad\overline{Y}=\overline{U}({\rm Diag}\,\sigma(\overline{Y}))\overline{V}^{T}, (4.9)

where σ(X¯)=def(σ1(X¯),,σn1(X¯))T\sigma(\overline{X})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left(\sigma_{1}(\overline{X}),\ldots,\sigma_{n_{1}}(\overline{X})\right)^{T} and Diagσ(X¯)=def(σ1(X¯)00000000σn1(X¯)00)n1×n2{\rm Diag}\,\sigma(\overline{X})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}\sigma_{1}(\overline{X})&\ldots&0&0&\ldots&0\\ 0&\ddots&0&0&\ldots&0\\ 0&\ldots&\sigma_{n_{1}}(\overline{X})&0&\ldots&0\end{pmatrix}_{n_{1}\times n_{2}}.
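The construction (4.7)-(4.9) can be traced numerically. The sketch below (hypothetical data of our choosing: n₁ = 4, n₂ = 5, r = 2, and a diagonal block W̄ with ‖W̄‖ ≤ 1) builds Ȳ ∈ ∂‖X̄‖_* via (4.7), checks the membership criterion (4.4), and verifies the simultaneous ordered SVD (4.9) with the realigned pair (Ū, V̄) from (4.8).

```python
import numpy as np

def diag_rect(vals, n1, n2):
    # rectangular "Diag" matrix as below (4.9)
    D = np.zeros((n1, n2)); D[:len(vals), :len(vals)] = np.diag(vals); return D

rng = np.random.default_rng(2)
n1, n2, r = 4, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
V, _ = np.linalg.qr(rng.standard_normal((n2, n2)))
Xbar = U @ diag_rect([3.0, 1.5], n1, n2) @ V.T            # (4.2), rank 2

W = np.zeros((n1 - r, n2 - r)); W[0, 0], W[1, 1] = 0.8, 0.4   # ||W|| <= 1
Ybar = U @ np.block([[np.eye(r), np.zeros((r, n2 - r))],
                     [np.zeros((n1 - r, r)), W]]) @ V.T   # (4.7)

# (4.4): membership Ybar in the subdifferential of ||.||_* at Xbar
assert np.linalg.norm(Ybar, 2) <= 1 + 1e-12
assert np.isclose(np.sum(np.linalg.svd(Xbar, compute_uv=False)),
                  np.sum(Xbar * Ybar))

# (4.8): realign so that Xbar, Ybar share one orthogonal pair (Ubar, Vbar)
Uh, sw, Vht = np.linalg.svd(W, full_matrices=True)
Ubar = np.hstack([U[:, :r], U[:, r:] @ Uh])
Vbar = np.hstack([V[:, :r], V[:, r:] @ Vht.T])
# (4.9): simultaneous ordered SVD
assert np.allclose(Xbar, Ubar @ diag_rect([3.0, 1.5, 0.0, 0.0], n1, n2) @ Vbar.T)
assert np.allclose(Ybar, Ubar @ diag_rect([1.0, 1.0, 0.8, 0.4], n1, n2) @ Vbar.T)
```

Here p(Ȳ) = 2 in the sense of (4.6), since exactly the first two singular values of Ȳ equal one.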

The following result establishes a geometric characterization for strong solutions of problem (4.1). According to [50, Proposition 11], the subdifferential of the nuclear norm satisfies the metric subregularity condition (3.7) at any X¯\overline{X} for any YX¯Y\in\partial\|\overline{X}\|_{*}.

Corollary 4.2 (Geometric characterization for strong minima of low-rank optimization problems).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) is a minimizer of problem (4.1) with Y¯=def1μΦh(ΦX¯)X¯\overline{Y}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\frac{1}{\mu}\Phi^{*}\nabla h(\Phi\overline{X})\in\partial\|\overline{X}\|_{*}. Let (U¯,V¯)𝒪(X¯)𝒪(Y¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X})\cap\mathcal{O}(\overline{Y}) be as in (4.8) and (4.9). Then we have

TN𝔹(Y¯)(X¯)={U¯(AB0BTC0000)V¯T|A𝕊r,Br×(p(Y¯)r),C𝕊+p(Y¯)r},T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\left\{\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&C&0\\ 0&0&0\end{pmatrix}\overline{V}^{T}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p(\overline{Y})-r)},C\in\mathbb{S}^{p(\overline{Y})-r}_{+}\right\}, (4.10)

where p(Y¯)p(\overline{Y}) is defined in (4.6) and 𝕊r\mathbb{S}^{r} is the set of all symmetric matrices of size r×rr\times r. Hence X¯\overline{X} is a strong solution of (4.1) if and only if

KerΦTN𝔹(Y¯)(X¯)={0}.\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\{0\}. (4.11)

Consequently, X¯\overline{X} is a strong solution of (4.1) provided that the following Strong Sufficient Condition holds

KerΦU¯(𝕊p(Y¯)000)V¯T={0}.\mbox{\rm Ker}\,\Phi\cap\overline{U}\begin{pmatrix}\mathbb{S}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}=\{0\}. (4.12)

Proof. By Lemma 4.1, we have

g(Y¯)=N𝔹(Y¯)=U¯(𝕊+p(Y¯)000)V¯T.\partial g^{*}(\overline{Y})=N_{\mathbb{B}}(\overline{Y})=\overline{U}\begin{pmatrix}\mathbb{S}_{+}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}.

As (U¯,V¯)𝒪(X¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X}), we obtain from (4.2) that

TN𝔹(Y¯)(X¯)=U¯(T𝕊+p(Y¯)(Σ¯r000)000)V¯T,T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\overline{U}\begin{pmatrix}T_{\mathbb{S}_{+}^{p(\overline{Y})}}\begin{pmatrix}\overline{\Sigma}_{r}&0\\ 0&0\end{pmatrix}&0\\ 0&0\end{pmatrix}\overline{V}^{T},

which is exactly the right-hand side of (4.10) according to the contingent cone formula for 𝕊+p(Y¯)\mathbb{S}^{p(\overline{Y})}_{+} in [8, Example 2.65]. Since \partial\|\cdot\|_{*} is metrically subregular at X¯\overline{X} for Y¯\overline{Y} by [50, Proposition 11], it follows from Corollary 3.4 that X¯\overline{X} is a strong solution of problem (4.1) if and only if

Ker(Φ2h(ΦX¯)Φ)TN𝔹(Y¯)(X¯)={0}.\mbox{\rm Ker}\,\Big{(}\Phi^{*}\nabla^{2}h(\Phi\overline{X})\Phi\Big{)}\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\{0\}.

Since 2h(ΦX¯)0\nabla^{2}h(\Phi\overline{X})\succ 0, we have Ker(Φ2h(ΦX¯)Φ)=KerΦ\mbox{\rm Ker}\,\Big{(}\Phi^{*}\nabla^{2}h(\Phi\overline{X})\Phi\Big{)}=\mbox{\rm Ker}\,\Phi. The characterization (4.11) for strong minima at X¯\overline{X} follows from the above condition.

Finally, note from (4.10) that

TN𝔹(Y¯)(X¯)U¯(𝕊p(Y¯)000)V¯T.T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})\subset\overline{U}\begin{pmatrix}\mathbb{S}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}.

Hence the Strong Sufficient Condition (4.12) implies (4.11), and therefore strong minima at X¯\overline{X}. \hfill\Box

Remark 4.3.

The geometric characterization (4.11) for strong minima of problem (4.1) is new. A sufficient condition can indeed be obtained from [18, Theorem 12], which considers more general optimization problems involving spectral functions. However, that result contains a nontrivial sigma-term, which is calculated explicitly in the recent papers [17, 37] for the case of symmetric matrices. Our approach is entirely different and involves no sigma-terms. Moreover, our condition is a full characterization of strong minima.

Another result about strong minima of problem (4.1) was established in [29, Proposition 12], which plays an important role in proving the local linear convergence of Forward-Backward algorithms for solving problem (4.1). That result mainly states that the so-called Restricted Injectivity and Nondegeneracy Condition are sufficient for strong minima; see also [24, Proposition 4.27] for a similar observation. Let us recall these important conditions here; further discussion of them appears in Section 5. Let (U,V)𝒪(X¯)(U,V)\in\mathcal{O}(\overline{X}) and define the model tangent subspace

𝕋=def{UIYT+XVIT|Xn1×r,Yn2×r}\mathbb{T}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{U_{I}Y^{T}+XV_{I}^{T}|\;X\in\mathbb{R}^{n_{1}\times r},Y\in\mathbb{R}^{n_{2}\times r}\} (4.13)

of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension dim𝕋=r(n1+n2r)\mbox{\rm dim}\,\mathbb{T}=r(n_{1}+n_{2}-r); see, e.g., [15, 16]. The Restricted Injectivity condition means

KerΦ𝕋={0}.\mbox{\rm Ker}\,\Phi\cap\mathbb{T}=\{0\}. (4.14)

The Nondegeneracy Condition holds when

Y¯=1μΦh(ΦX¯)riX¯,\overline{Y}=-\frac{1}{\mu}\Phi^{*}\nabla h(\Phi\overline{X})\in{\rm ri}\,\partial\|\overline{X}\|_{*}, (4.15)

where riX¯{\rm ri}\,\partial\|\overline{X}\|_{*} is the relative interior of X¯\partial\|\overline{X}\|_{*}; see [41]. The validity of Nondegeneracy Condition (4.15) implies that X¯\overline{X} is an optimal solution of problem (4.1). Note that

riX¯={U(𝕀r00W)VT|W<1}withr=rank(X¯).{\rm ri}\,\partial\|\overline{X}\|_{*}=\left\{U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&W\end{pmatrix}V^{T}|\;\|W\|<1\right\}\quad\mbox{with}\quad r=\mbox{\rm rank}\,(\overline{X}). (4.16)

Hence, the Nondegeneracy Condition (4.15) means that the number of singular values equal to one, p(Y¯)p(\overline{Y}) in (4.6), coincides with the rank of X¯\overline{X}. In this case, Restricted Injectivity (4.14) clearly implies the Strong Sufficient Condition (4.12). Hence, the combination of Restricted Injectivity (4.14) and the Nondegeneracy Condition (4.15) is stronger than our Strong Sufficient Condition (4.12). The following result gives a complete picture of strong minima when the Nondegeneracy Condition (4.15) holds.

Corollary 4.4 (Strong minima under Nondegeneracy Condition).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) and Nondegeneracy Condition (4.15) holds. Then X¯\overline{X} is a strong solution of problem (4.1) if and only if the following Strict Restricted Injectivity holds

KerΦUI𝕊rVIT={0},\mbox{\rm Ker}\,\Phi\cap U_{I}\mathbb{S}^{r}V_{I}^{T}=\{0\}, (4.17)

where UIΣ¯VITU_{I}\overline{\Sigma}V_{I}^{T} is a compact SVD of X¯\overline{X}.

Proof. As the Nondegeneracy Condition (4.15) holds, X¯\overline{X} is a solution of problem (4.1). In this case, observe from (4.8) and (4.10) that TN𝔹(Y¯)(X¯)=UI𝕊rVIT.T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=U_{I}\mathbb{S}^{r}V_{I}^{T}. The equivalence between strong minima at X¯\overline{X} and (4.17) follows from Corollary 3.4.\hfill\Box

As the dimension of the subspace UI𝕊rVITU_{I}\mathbb{S}^{r}V_{I}^{T} is 12r(r+1)\frac{1}{2}r(r+1), which is usually small in low-rank optimization problems, condition (4.17) is likely to hold when the Nondegeneracy Condition (4.15) is satisfied. Further discussion of Strict Restricted Injectivity is given in Section 5.

Although the geometric characterization (4.11) looks simple, checking it in high dimensions is nontrivial. The Strong Sufficient Condition (4.12) and Strict Restricted Injectivity (4.17), however, can be verified easily. Next we establish some quantitative characterizations of strong minima. Before doing so, we record the projection formulas onto the subspaces 𝕋\mathbb{T} and 𝕋\mathbb{T}^{\perp}. For any Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}}, suppose that XX is represented in block form as:

X=(UIUJ)(ABCD)(VIVK)Twith(U,V)𝒪(X¯).\displaystyle\begin{array}[]{ll}X&=\begin{pmatrix}U_{I}&U_{J}\end{pmatrix}\begin{pmatrix}A&B\\ C&D\end{pmatrix}\begin{pmatrix}V_{I}&V_{K}\end{pmatrix}^{T}\quad\mbox{with}\quad(U,V)\in\mathcal{O}(\overline{X}).\end{array}

The projections of XX onto 𝕋\mathbb{T} and 𝕋\mathbb{T}^{\perp} are computed respectively by

P𝕋X=(UIUJ)(ABC0)(VIVK)TandP𝕋X=UJDVKT.P_{\mathbb{T}}X=\begin{pmatrix}U_{I}&U_{J}\end{pmatrix}\begin{pmatrix}A&B\\ C&0\end{pmatrix}\begin{pmatrix}V_{I}&V_{K}\end{pmatrix}^{T}\quad\mbox{and}\quad P_{\mathbb{T}^{\perp}}X=U_{J}DV_{K}^{T}. (4.19)
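The projections (4.19) are stated in rotated block coordinates; equivalently, with the compact SVD factors UIU_{I}, VIV_{I} of X¯\overline{X}, one has P𝕋X=UIUITX+XVIVITUIUITXVIVITP_{\mathbb{T}}X=U_{I}U_{I}^{T}X+XV_{I}V_{I}^{T}-U_{I}U_{I}^{T}XV_{I}V_{I}^{T} and P𝕋X=(IUIUIT)X(IVIVIT)P_{\mathbb{T}^{\perp}}X=(I-U_{I}U_{I}^{T})X(I-V_{I}V_{I}^{T}). A minimal numpy sketch of these formulas, on randomly generated data for illustration only:

```python
import numpy as np

def projections(UI, VI):
    """Return P_T and P_Tperp acting on n1 x n2 matrices, given the compact
    SVD factors U_I, V_I of Xbar (orthonormal columns)."""
    PU, PV = UI @ UI.T, VI @ VI.T    # projectors onto column/row spaces of Xbar
    I1, I2 = np.eye(PU.shape[0]), np.eye(PV.shape[0])
    P_T = lambda X: PU @ X + X @ PV - PU @ X @ PV
    P_Tperp = lambda X: (I1 - PU) @ X @ (I2 - PV)
    return P_T, P_Tperp

rng = np.random.default_rng(0)
n1, n2, r = 5, 7, 2
Xbar = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
U, _, Vt = np.linalg.svd(Xbar)
UI, VI = U[:, :r], Vt[:r, :].T
P_T, P_Tperp = projections(UI, VI)

X = rng.standard_normal((n1, n2))
assert np.allclose(P_T(X) + P_Tperp(X), X)           # complementary projections
assert np.allclose(P_T(P_T(X)), P_T(X))              # idempotent
assert np.isclose(np.vdot(P_T(X), P_Tperp(X)), 0.0)  # mutually orthogonal
assert np.allclose(P_T(Xbar), Xbar)                  # Xbar itself lies in T
```

The asserted identities are exactly the complementarity and orthogonality of the two projections.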

The following result provides a formula for the critical cone of the nuclear norm at X¯\overline{X} for Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*}.

Proposition 4.5 (Critical cone of nuclear norm).

Let Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*} and (U¯,V¯)𝒪(X¯)𝒪(Y¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X})\cap\mathcal{O}(\overline{Y}) be as in (4.2) and (4.8). Define H=def{k{r+1,,n1}|σk(Y¯)=1}H\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{k\in\{r+1,\ldots,n_{1}\}|\;\sigma_{k}(\overline{Y})=1\}. Then the critical cone 𝒞(X¯,Y¯)\mathcal{C}(\overline{X},\overline{Y}) of \|\cdot\|_{*} at X¯\overline{X} for Y¯\overline{Y} is computed by

𝒞(X¯,Y¯)=def{Wn1×n2|dX¯(W)=Y¯,W}={Wn1×n2|P𝕋WU¯H𝕊+|H|V¯HT},\mathcal{C}(\overline{X},\overline{Y})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;d\|\overline{X}\|_{*}(W)=\langle\overline{Y},W\rangle\}=\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}^{\perp}}W\in\overline{U}_{H}\mathbb{S}^{|H|}_{+}\overline{V}_{H}^{T}\right\}, (4.20)

where U¯H\overline{U}_{H} and V¯H\overline{V}_{H} are submatrices of index columns HH of U¯\overline{U} and V¯\overline{V}, respectively.

Proof. For any Wn1×n2W\in\mathbb{R}^{n_{1}\times n_{2}}, it is well-known from convex analysis [41] that

dX¯(W)=supYX¯Y,W.\displaystyle d\|\overline{X}\|_{*}(W)=\sup_{Y\in\partial\|\overline{X}\|_{*}}\langle Y,W\rangle. (4.21)

This together with (4.3) and (4.19) implies that

dX¯(W)=supYX¯Y,P𝕋W+P𝕋W=supYX¯P𝕋Y,W+Y,P𝕋W=E,W+P𝕋W\displaystyle\begin{array}[]{ll}d\|\overline{X}\|_{*}(W)&\displaystyle=\sup_{Y\in\partial\|\overline{X}\|_{*}}\langle Y,P_{\mathbb{T}}W+P_{\mathbb{T}^{\perp}}W\rangle=\sup_{Y\in\partial\|\overline{X}\|_{*}}\langle P_{\mathbb{T}}Y,W\rangle+\langle Y,P_{\mathbb{T}^{\perp}}W\rangle\\ &=\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\end{array} (4.24)

with E=defUIVITE\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}U_{I}V_{I}^{T}. As Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*}, we have W𝒞(X¯,Y¯)W\in\mathcal{C}(\overline{X},\overline{Y}) if and only if

E,W+P𝕋W=P𝕋Y¯,W+P𝕋Y¯,W=E,W+P𝕋Y¯,P𝕋W,\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}=\langle P_{\mathbb{T}}\overline{Y},W\rangle+\langle P_{\mathbb{T}^{\perp}}\overline{Y},W\rangle=\langle E,W\rangle+\langle P_{\mathbb{T}^{\perp}}\overline{Y},P_{\mathbb{T}^{\perp}}W\rangle,

which means P𝕋W=P𝕋Y¯,P𝕋W\|P_{\mathbb{T}^{\perp}}W\|_{*}=\langle P_{\mathbb{T}^{\perp}}\overline{Y},P_{\mathbb{T}^{\perp}}W\rangle. By Lemma 4.1, we have P𝕋Wg(P𝕋Y¯)P_{\mathbb{T}^{\perp}}W\in\partial g^{*}(P_{\mathbb{T}^{\perp}}\overline{Y}), or equivalently P𝕋WU¯H𝕊+|H|V¯HTP_{\mathbb{T}^{\perp}}W\in\overline{U}_{H}\mathbb{S}^{|H|}_{+}\overline{V}_{H}^{T}. The proof is complete. \hfill\Box
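The membership test in (4.20) is straightforward to implement: P𝕋WP_{\mathbb{T}^{\perp}}W belongs to U¯H𝕊+|H|V¯HT\overline{U}_{H}\mathbb{S}^{|H|}_{+}\overline{V}_{H}^{T} exactly when the compressed matrix S=U¯HT(P𝕋W)V¯HS=\overline{U}_{H}^{T}(P_{\mathbb{T}^{\perp}}W)\overline{V}_{H} is symmetric positive semidefinite and reproduces P𝕋WP_{\mathbb{T}^{\perp}}W. A hedged numpy sketch, where the orthonormal factors standing in for U¯H\overline{U}_{H}, V¯H\overline{V}_{H} are random placeholders:

```python
import numpy as np

def in_critical_cone(PTperpW, UH, VH, tol=1e-8):
    """Test P_{T^perp} W ∈ UH S_+ VH^T for orthonormal-column UH, VH."""
    S = UH.T @ PTperpW @ VH
    residual = PTperpW - UH @ S @ VH.T
    if np.linalg.norm(residual) > tol:   # component outside the H-block
        return False
    if np.linalg.norm(S - S.T) > tol:    # S must be symmetric ...
        return False
    return np.min(np.linalg.eigvalsh((S + S.T) / 2)) >= -tol   # ... and PSD

rng = np.random.default_rng(2)
n1, n2, h = 5, 6, 2
UH, _ = np.linalg.qr(rng.standard_normal((n1, h)))
VH, _ = np.linalg.qr(rng.standard_normal((n2, h)))
B = rng.standard_normal((h, h))
W_in = UH @ (B @ B.T) @ VH.T                  # PSD middle factor: inside
W_out = UH @ (-B @ B.T - np.eye(h)) @ VH.T    # negative definite: outside
assert in_critical_cone(W_in, UH, VH)
assert not in_critical_cone(W_out, UH, VH)
```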

Next, we establish the main result of this section, which contains a quantitative characterization of strong minima. A similar result for group-sparsity minimization problems was recently established in [24, Theorem 5.3].

Theorem 4.6 (Characterizations for strong minima of low-rank optimization problems).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) is a minimizer of problem (4.1) and Y¯=1μΦh(ΦX¯)\overline{Y}=-\frac{1}{\mu}\Phi^{*}\nabla h(\Phi\overline{X}) with decomposition (4.2) and (4.8). The following are equivalent:

  1. (i)

    X¯\overline{X} is a strong solution to problem (4.1).

  2. (ii)

    KerΦ𝒞={0}\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathcal{C}=\{0\} with

    =def{Wn1×n2|P𝕋WU¯(AB0BT00000)V¯T,A𝕊r,Br×(p(Y¯)r)},\displaystyle\mathcal{E}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}}W\in\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&0&0\\ 0&0&0\end{pmatrix}\overline{V}^{T},A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p(\overline{Y})-r)}\right\}, (4.25)
    𝒞=def{Wn1×n2|E,W+P𝕋W=0}withE=defUIVIT.\displaystyle\mathcal{C}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}=0\right\}\qquad\mbox{with}\qquad E\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}U_{I}V_{I}^{T}. (4.26)
  3. (iii)

    The following conditions (a) and either (b) or (c) are satisfied:

    • (a)

      (Strong Restricted Injectivity): KerΦ𝕋={0}\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathbb{T}=\{0\}.

    • (b)

      (Strong Nondegenerate Source Condition): There exists YImΦ+Y\in\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp} such that Y=U(𝕀r00Z)VTY=U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&Z\end{pmatrix}V^{T} and Z<1\|Z\|<1.

    • (c)

      (Analysis Strong Source Condition): The Strong Source Coefficient ζ(X¯)\zeta(\overline{X}), which is the optimal value of the following spectral norm optimization problem

minZ(n1r)×(n2r)Zsubject to(UJZVKT)=E\min_{Z\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)}}\qquad\|Z\|\qquad\mbox{subject to}\qquad\mathcal{M}(U_{J}ZV_{K}^{T})=-\mathcal{M}E (4.27)

      is smaller than 11, where \mathcal{M} is a linear operator such that Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}.

Proof. Let us verify the equivalence between (i) and (ii). By Corollary 4.2, it suffices to show that

KerΦTN𝔹(Y¯)(X¯)=KerΦ𝒞.\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathcal{C}. (4.28)

By Proposition 4.5 and Corollary 4.2, we have

TN𝔹(Y¯)(X¯)=𝒞(X¯,Y¯).T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\mathcal{E}\cap\mathcal{C}(\overline{X},\overline{Y}). (4.29)

As Y¯ImΦ\overline{Y}\in\mbox{\rm Im}\,\Phi^{*}, note from (4.20) and (4.24) that

KerΦ𝒞(X¯,Y¯)=KerΦ𝒞.\mbox{\rm Ker}\,\Phi\cap\mathcal{C}(\overline{X},\overline{Y})=\mbox{\rm Ker}\,\Phi\cap\mathcal{C}.

This together with (4.29) verifies (4.28) and also the equivalence between (i) and (ii).

Next, let us verify the implication [(ii)\Rightarrow(iii)]. Suppose that (ii) (or (i)) is satisfied. Note from the projection formula (4.19) that 𝕋TN𝔹(Y¯)(X¯)\mathcal{E}\cap\mathbb{T}\subset T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X}). It follows from Corollary 4.2 that the Strong Restricted Injectivity holds.

Since Y¯X¯ImΦ\overline{Y}\in\partial\|\overline{X}\|_{*}\cap\mbox{\rm Im}\,\Phi^{*}, we obtain from (4.21) and (4.24) that

E,W+P𝕋W=dg(X¯)(W)Y¯,W=0for anyWKerΦ.\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}=dg(\overline{X})(W)\geq\langle\overline{Y},W\rangle=0\quad\mbox{for any}\quad W\in\mbox{\rm Ker}\,\Phi.

Condition (ii) means

c=defmin{E,W+P𝕋W|WKerΦ,W=1}>0,c\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\min\{\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}|\;W\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E},\|W\|_{*}=1\}>0,

which implies that

k(W)=defE,W+P𝕋WcWfor allWKerΦ.k(W)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\geq c\|W\|_{*}\quad\mbox{for all}\quad W\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E}. (4.30)

As Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}, for any WKerΦW\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E}, we write W=YW=\mathcal{M}^{*}Y and derive from (4.30) that

(1c)P𝕋YP𝕋YcYE,Y=E,Y.(1-c)\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\|_{*}\geq\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\|_{*}-c\|\mathcal{M}^{*}Y\|_{*}\geq-\langle E,\mathcal{M}^{*}Y\rangle=-\langle\mathcal{M}E,Y\rangle. (4.31)

Since ImKerΦ\mbox{\rm Im}\,\mathcal{M}^{*}\subset\mbox{\rm Ker}\,\Phi, we have ImΦKer\mbox{\rm Im}\,\Phi^{*}\subset\mbox{\rm Ker}\,\mathcal{M} and thus Y¯Ker\overline{Y}\in\mbox{\rm Ker}\,\mathcal{M}. It follows that

0=Y¯=P𝕋Y¯+P𝕋Y¯=E+P𝕋Y¯.0=\mathcal{M}\overline{Y}=\mathcal{M}P_{\mathbb{T}}\overline{Y}+\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y}=\mathcal{M}E+\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y}.

This together with (4.31) implies that

(1c)P𝕋YP𝕋Y¯,Y=P𝕋Y¯,Y=P𝕋Y¯,P𝕋Y.(1-c)\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\|_{*}\geq\langle\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y},Y\rangle=\langle P_{\mathbb{T}^{\perp}}\overline{Y},\mathcal{M}^{*}Y\rangle=\langle P_{\mathbb{T}^{\perp}}\overline{Y},P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\rangle.

Define 𝔹=def{Wn1×n2|W1}\mathbb{B}_{*}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;\|W\|_{*}\leq 1\} to be the unit ball with respect to the nuclear norm. We obtain from the latter inequality and the classical minimax theorem [41, Corollary 37.3.2] that

1csupW𝔹P𝕋Y¯,WιImP𝕋(W)=supW𝔹infXKerP𝕋P𝕋Y¯+X,W=infXKerP𝕋supW𝔹P𝕋Y¯+X,W=infXKerP𝕋P𝕋Y¯+X.\begin{array}[]{ll}1-c&\displaystyle\geq\sup_{W\in\mathbb{B}_{*}}\langle P_{\mathbb{T}^{\perp}}\overline{Y},W\rangle-\iota_{{\rm Im}\,P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}}(W)\\ &=\displaystyle\sup_{W\in\mathbb{B}_{*}}\inf_{X\in{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}}\langle P_{\mathbb{T}^{\perp}}\overline{Y}+X,W\rangle\\ &=\displaystyle\inf_{X\in{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}}\sup_{W\in\mathbb{B}_{*}}\langle P_{\mathbb{T}^{\perp}}\overline{Y}+X,W\rangle\\ &=\displaystyle\inf_{X\in{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}}\|P_{\mathbb{T}^{\perp}}\overline{Y}+X\|.\end{array}

Hence there exists X0KerP𝕋X_{0}\in\mbox{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}} such that P𝕋Y¯+X0<1\|P_{\mathbb{T}^{\perp}}\overline{Y}+X_{0}\|<1. Due to the projection formula (4.19), observe that

1>P𝕋Y¯+X0P𝕋(P𝕋Y¯+X0)=P𝕋(Y¯+X0).1>\|P_{\mathbb{T}^{\perp}}\overline{Y}+X_{0}\|\geq\|P_{\mathbb{T}^{\perp}}(P_{\mathbb{T}^{\perp}}\overline{Y}+X_{0})\|=\|P_{\mathbb{T}^{\perp}}(\overline{Y}+X_{0})\|.

Defining Y0=Y¯+P𝕋X0Y_{0}=\overline{Y}+P_{\mathbb{T}^{\perp}}X_{0}, we have

Y0=Y¯+P𝕋X0=0.\mathcal{M}Y_{0}=\mathcal{M}\overline{Y}+\mathcal{M}P_{\mathbb{T}^{\perp}}X_{0}=0.

Note that Ker=ImΦ+\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}, Y¯ImΦKer\overline{Y}\in\mbox{\rm Im}\,\Phi^{*}\subset\mbox{\rm Ker}\,\mathcal{M}, and X0KerP𝕋X_{0}\in\mbox{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}. It follows that Y0Ker=ImΦ+Y_{0}\in\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}. Moreover, observe that P𝕋Y0=P𝕋Y¯=EP_{\mathbb{T}}Y_{0}=P_{\mathbb{T}}\overline{Y}=E and P𝕋Y0<1\|P_{\mathbb{T}^{\perp}}Y_{0}\|<1. Thus, Y0Y_{0} satisfies the condition in (b). Moreover, as Ker=ImΦ+\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}, conditions (b) and (c) are equivalent. This ensures the implication [(ii)\Rightarrow(iii)].

It remains to justify the implication [(iii)\Rightarrow(ii)]. Suppose that the Strong Restricted Injectivity (a) and the Strong Nondegenerate Source Condition (b) hold with some Y0ImΦ+Y_{0}\in\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp} satisfying the condition in (b). Pick any WKerΦ𝒞KerΦ=ImW\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathcal{C}\subset\mbox{\rm Ker}\,\Phi\cap\mathcal{E}=\mbox{\rm Im}\,\mathcal{M}^{*}. As Y0KerY_{0}\in\mbox{\rm Ker}\,\mathcal{M}, we have Y0,W=0\langle Y_{0},W\rangle=0. It follows that

0=E,W+P𝕋W=P𝕋Y0,W+P𝕋W=Y0,WP𝕋Y0,W+P𝕋W=P𝕋Y0,P𝕋W+P𝕋W(1P𝕋Y0)P𝕋W.\begin{array}[]{ll}0=\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}&=\displaystyle\langle P_{\mathbb{T}}Y_{0},W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\\ &=\displaystyle\langle Y_{0},W\rangle-\langle P_{\mathbb{T}^{\perp}}Y_{0},W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\\ &=\displaystyle-\langle P_{\mathbb{T}^{\perp}}Y_{0},P_{\mathbb{T}^{\perp}}W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\\ &\geq(1-\|P_{\mathbb{T}^{\perp}}Y_{0}\|)\|P_{\mathbb{T}^{\perp}}W\|_{*}.\end{array}

Since P𝕋Y0<1\|P_{\mathbb{T}^{\perp}}Y_{0}\|<1, we have P𝕋W=0P_{\mathbb{T}^{\perp}}W=0, i.e., W𝕋W\in\mathbb{T}. This implies that W=0W=0 due to the Strong Restricted Injectivity (a). The proof is complete. \hfill\Box

Remark 4.7.

The Strong Restricted Injectivity (a) means that the linear operator Φ\Phi is injective on the subspace 𝕋\mathcal{E}\cap\mathbb{T}. It is similar to the condition of the same name in [24, Theorem 5.3], which is used to characterize unique (strong) solutions of group-sparsity optimization problems. This condition is adapted from the Restricted Injectivity (4.14) in [25]; see also [14, 15, 16] for the case of nuclear norm minimization problems. The Strong Restricted Injectivity is certainly weaker than the Restricted Injectivity.

The Strong Nondegenerate Source Condition and Analysis Strong Source Condition also inherit the same terminologies introduced in [24, Theorem 5.3] for group-sparsity optimization problems. In [14, 16, 25, 46, 47], the Nondegenerate Source Condition at X¯\overline{X} means the existence of a dual certificate Y0ImΦX¯Y_{0}\in\mbox{\rm Im}\,\Phi^{*}\cap\partial\|\overline{X}\|_{*} satisfying P𝕋Y0<1\|P_{\mathbb{T}^{\perp}}Y_{0}\|<1, which is equivalent to

ImΦriX¯.\mbox{\rm Im}\,\Phi^{*}\cap{\rm ri}\,\partial\|\overline{X}\|_{*}\neq\emptyset. (4.32)

This condition is weaker than the Nondegeneracy Condition (4.15). In the case of 1\ell_{1} optimization, it is well-known that the Restricted Injectivity and Nondegenerate Source Condition together characterize solution uniqueness [25]. For nuclear norm minimization problems, it was shown in [15, 16] that they are sufficient for solution uniqueness of problems (1.1) and (4.1); see also [46, 47] for more general convex optimization problems. It is worth noting that they are not necessary conditions for solution uniqueness; see, e.g., [24, Example 4.15]. One hidden reason is that the nuclear norm is not polyhedral. Recently, [24] showed that these two conditions characterize the so-called sharp minima of (5.1), a property lying between solution uniqueness and strong minima; see our Remark 5.18 for further discussion. Due to [24, Proposition 4.27], they are sufficient for strong minima of problem (4.1). This fact can also be obtained from Theorem 4.6 by observing that our Strong Nondegenerate Source Condition is weaker than the Nondegenerate Source Condition, since the Strong Nondegenerate Source Condition, which involves the set \mathcal{E}, means

(ImΦ+)riX¯(\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp})\cap{\rm ri}\,\partial\|\overline{X}\|_{*}\neq\emptyset (4.33)

due to (4.16).

Remark 4.8 (Checking Strong Restricted Injectivity, Strong Sufficient Condition (4.12), and constructing the linear operator \mathcal{M}).

To check the Strong Restricted Injectivity, observe first that

={U¯(AB0BTCD0EF)V¯Tn1×n2|A𝕊r,Br×(pr)},\mathcal{E}=\left\{\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&C&D\\ 0&E&F\end{pmatrix}\overline{V}^{T}\in\mathbb{R}^{n_{1}\times n_{2}}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p-r)}\right\},

which is a subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension q=defr(r+1)2+r(pr)+(n1r)(n2r)q\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{r(r+1)}{2}+r(p-r)+(n_{1}-r)(n_{2}-r). Moreover, the intersection of \mathcal{E} with 𝕋\mathbb{T}

𝕋={U¯(AB0BT00000)V¯Tn1×n2|A𝕊r,Br×(pr)},\mathcal{E}\cap\mathbb{T}=\left\{\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&0&0\\ 0&0&0\end{pmatrix}\overline{V}^{T}\in\mathbb{R}^{n_{1}\times n_{2}}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p-r)}\right\}, (4.34)

is also a subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension s=defr(r+1)2+r(pr)s\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{r(r+1)}{2}+r(p-r). The set U¯(𝕊p000)V¯T\overline{U}\begin{pmatrix}\mathbb{S}^{p}&0\\ 0&0\end{pmatrix}\overline{V}^{T} in the Strong Sufficient Condition (4.12) is another subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension l=def12p(p+1)l\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{2}p(p+1). Suppose that {W1,,Ws}\{W_{1},\ldots,W_{s}\} form a basis of 𝕋\mathcal{E}\cap\mathbb{T}, {W1,,Wl}\{W_{1},\ldots,W_{l}\} form a basis of U¯(𝕊p000)V¯T\overline{U}\begin{pmatrix}\mathbb{S}^{p}&0\\ 0&0\end{pmatrix}\overline{V}^{T}, and {W1,,Wq}\{W_{1},\ldots,W_{q}\} form a basis of \mathcal{E}. For any WW\in\mathcal{E}, we write W=λ1W1++λqWqW=\lambda_{1}W_{1}+\ldots+\lambda_{q}W_{q} and obtain that

Φ(W)=λ1Φ(W1)++λqΦ(Wq).\Phi(W)=\lambda_{1}\Phi(W_{1})+\ldots+\lambda_{q}\Phi(W_{q}). (4.35)

Define Ψ=def(Φ(W1)Φ(Wq))\Psi\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}\Phi(W_{1})&\ldots&\Phi(W_{q})\end{pmatrix} to be an m×qm\times q matrix and Ψs\Psi_{s}, Ψl\Psi_{l} to be the submatrices of the first s,ls,l columns of Ψ\Psi, respectively. By (4.35), the Strong Restricted Injectivity is equivalent to the condition that KerΨs=0\mbox{\rm Ker}\,\Psi_{s}=0, i.e., rankΨs=s.\mbox{\rm rank}\,{\Psi_{s}}=s. Similarly, the Strong Sufficient Condition (4.12) means rankΨl=l\mbox{\rm rank}\,{\Psi_{l}}=l.
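The rank tests above are easy to carry out numerically. Below is a hedged numpy sketch: it builds bases of 𝕋\mathcal{E}\cap\mathbb{T} from (4.34) and of U¯(𝕊p000)V¯T\overline{U}\begin{pmatrix}\mathbb{S}^{p}&0\\ 0&0\end{pmatrix}\overline{V}^{T}, forms the matrices Ψs\Psi_{s}, Ψl\Psi_{l} of their images under Φ\Phi, and checks the two rank conditions. All data (dimensions, U¯\overline{U}, V¯\overline{V}, Φ\Phi) are random placeholders for illustration.

```python
import numpy as np

def lift(M, Ubar, Vbar):
    return Ubar @ M @ Vbar.T

def sym_block_basis(n1, n2, rows, cols):
    """Basis matrices with a single symmetric pair of ones at (i, j), (j, i)."""
    out = []
    for i in rows:
        for j in cols:
            if j < i:
                continue
            M = np.zeros((n1, n2)); M[i, j] = 1.0; M[j, i] = 1.0
            out.append(M)
    return out

rng = np.random.default_rng(1)
n1, n2, r, p, m = 6, 8, 2, 3, 30
Ubar, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
Vbar, _ = np.linalg.qr(rng.standard_normal((n2, n2)))
Phi = rng.standard_normal((m, n1 * n2))     # Phi acting on vectorized matrices

# basis of E ∩ T: symmetric A-block plus coupled B / B^T blocks
Ws = [lift(M, Ubar, Vbar) for M in sym_block_basis(n1, n2, range(r), range(p))]
s = r * (r + 1) // 2 + r * (p - r)
# basis of U (S^p 0; 0 0) V^T: full symmetric p x p upper-left block
Wl = [lift(M, Ubar, Vbar) for M in sym_block_basis(n1, n2, range(p), range(p))]
l = p * (p + 1) // 2
assert len(Ws) == s and len(Wl) == l

Psi_s = np.column_stack([Phi @ W.ravel() for W in Ws])
Psi_l = np.column_stack([Phi @ W.ravel() for W in Wl])
# (a)  Strong Restricted Injectivity  <=>  rank(Psi_s) = s
# (4.12) Strong Sufficient Condition  <=>  rank(Psi_l) = l
assert np.linalg.matrix_rank(Psi_s) == s    # generic Phi with m >= s
assert np.linalg.matrix_rank(Psi_l) == l    # generic Phi with m >= l
```

For a generic Φ\Phi with enough measurements both ranks are full, consistent with the discussion after (4.17) that these conditions are likely to hold.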

Next let us discuss how to construct the operator \mathcal{M}. Let U^ΛV^T\widehat{U}\Lambda\widehat{V}^{T} be an SVD of Ψ\Psi with k=rankΨk={\rm rank}\,\Psi. Define V^G\widehat{V}_{G} to be the q×(qk)q\times(q-k) submatrix of V^\widehat{V}, where G={k+1,,q}G=\{k+1,\ldots,q\}. Note that

ImV^G=KerΨ.\mbox{\rm Im}\,\widehat{V}_{G}=\mbox{\rm Ker}\,\Psi.

Determine the linear operator :n1×n2qk\mathcal{M}:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{q-k} by

X=defV^GT(W1,X,,Wq,X)Tfor allXn1×n2.\mathcal{M}X\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\widehat{V}_{G}^{T}\begin{pmatrix}\langle W_{1},X\rangle,\ldots,\langle W_{q},X\rangle\end{pmatrix}^{T}\quad\mbox{for all}\quad X\in\mathbb{R}^{n_{1}\times n_{2}}. (4.36)

It is easy to check that Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}. \hfill\Box
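The construction above can be sketched in a few lines of numpy. Everything below (dimensions, Φ\Phi as a random matrix on vectorized inputs, the basis of \mathcal{E}) is illustrative placeholder data; the final assertions only check the defining property Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}.

```python
import numpy as np

def basis_E(Ubar, Vbar, r, p):
    """Basis of E: symmetric A-block, coupled B/B^T blocks, and a free
    lower-right (n1-r) x (n2-r) block, rotated back by Ubar, Vbar."""
    n1, n2 = Ubar.shape[0], Vbar.shape[0]
    basis = []
    for i in range(r):                     # A symmetric, B coupled with B^T
        for j in range(i, p):
            M = np.zeros((n1, n2)); M[i, j] = 1.0; M[j, i] = 1.0
            basis.append(Ubar @ M @ Vbar.T)
    for i in range(r, n1):                 # free (C D; E F) block
        for j in range(r, n2):
            M = np.zeros((n1, n2)); M[i, j] = 1.0
            basis.append(Ubar @ M @ Vbar.T)
    return basis

rng = np.random.default_rng(1)
n1, n2, r, p, m = 6, 8, 2, 3, 20
Ubar, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
Vbar, _ = np.linalg.qr(rng.standard_normal((n2, n2)))
Phi = rng.standard_normal((m, n1 * n2))    # Phi acting on vectorized matrices

Wq = basis_E(Ubar, Vbar, r, p)
q = len(Wq)                                # = r(r+1)/2 + r(p-r) + (n1-r)(n2-r)
Psi = np.column_stack([Phi @ W.ravel() for W in Wq])
_, S, Vt_hat = np.linalg.svd(Psi)
k = int(np.sum(S > 1e-10))                 # k = rank(Psi)
VG = Vt_hat.T[:, k:]                       # columns span Ker Psi

def M_op(X):                               # the operator M of (4.36)
    return VG.T @ np.array([np.vdot(W, X) for W in Wq])

# Im M^* ⊂ Ker Phi ∩ E: every M^* z is annihilated by Phi ...
z = rng.standard_normal(q - k)
Wstar = sum(c * W for c, W in zip(VG @ z, Wq))      # M^* z
assert np.allclose(Phi @ Wstar.ravel(), 0, atol=1e-8)
# ... and Im Phi^* ⊂ Ker M, as used repeatedly in the proofs above.
y = rng.standard_normal(m)
assert np.allclose(M_op((Phi.T @ y).reshape(n1, n2)), 0, atol=1e-8)
```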

Remark 4.9 (Checking the Analysis Strong Source Condition without using the linear operator \mathcal{M}).

To verify the Analysis Strong Source Condition, Theorem 4.6 suggests solving the optimization problem (4.27), which involves the linear operator \mathcal{M}. To avoid the computation of \mathcal{M}, note first that the constraint in (4.27) means UJZVKT+EKer=ImΦ+.U_{J}ZV_{K}^{T}+E\in\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}. Suppose that 𝒩\mathcal{N} is a linear operator with Ker𝒩=ImΦ\mbox{\rm Ker}\,\mathcal{N}=\mbox{\rm Im}\,\Phi^{*}, i.e., Im𝒩=KerΦ\mbox{\rm Im}\,\mathcal{N}^{*}=\mbox{\rm Ker}\,\Phi. Then the latter condition is equivalent to

𝒩(UJZVKT+E+W)=0for someW,\mathcal{N}(U_{J}ZV_{K}^{T}+E+W)=0\quad\mbox{for some}\quad W\in\mathcal{E}^{\perp},

where \mathcal{E}^{\perp} is computed by

={U(ABCBT00D00)VT|A𝕍r,Br×(pr),Cr×(n2p),D(n1p)×r},\mathcal{E}^{\perp}=\left\{U\begin{pmatrix}A&B&C\\ -B^{T}&0&0\\ D&0&0\end{pmatrix}V^{T}|\;A\in\mathbb{V}_{r},B\in\mathbb{R}^{r\times(p-r)},C\in\mathbb{R}^{r\times(n_{2}-p)},D\in\mathbb{R}^{(n_{1}-p)\times r}\right\}, (4.37)

which is a subspace of 𝕋\mathbb{T} with dimension 12r(r1)+r(n1+n2pr)\frac{1}{2}r(r-1)+r(n_{1}+n_{2}-p-r); here 𝕍rr×r\mathbb{V}_{r}\subset\mathbb{R}^{r\times r} is the set of all skew-symmetric matrices. Hence problem (4.27) is equivalent to the following one:

minZ(n1r)×(n2r),Wn1×n2Zsubject to𝒩(UJZVKT+W)=𝒩EandW.\min_{Z\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)},W\in\mathbb{R}^{n_{1}\times n_{2}}}\qquad\|Z\|\quad\mbox{subject to}\quad\mathcal{N}(U_{J}ZV_{K}^{T}+W)=-\mathcal{N}E\quad\mbox{and}\quad W\in\mathcal{E}^{\perp}. (4.38)

The price to pay is that this optimization problem is larger than (4.27), but finding the linear operator 𝒩\mathcal{N} is much easier than finding \mathcal{M}.
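Such an operator 𝒩\mathcal{N} can be read off from an SVD of the matrix representing Φ\Phi on vectorized inputs: its rows are any basis of KerΦ\mbox{\rm Ker}\,\Phi viewed as row vectors, so that Ker𝒩=ImΦ\mbox{\rm Ker}\,\mathcal{N}=\mbox{\rm Im}\,\Phi^{*}. A small hedged sketch with random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, m = 5, 6, 12
A = rng.standard_normal((m, n1 * n2))    # matrix of Phi on vectorized inputs
_, S, Vt = np.linalg.svd(A)
k = int(np.sum(S > 1e-10))               # rank of Phi
N = Vt[k:, :]                            # rows form a basis of Ker A

# Ker N ⊃ Im Phi^*: N annihilates every Phi^* y ...
y = rng.standard_normal(m)
assert np.allclose(N @ (A.T @ y), 0, atol=1e-8)
# ... and Im N^* = Ker Phi: every N^T z is annihilated by Phi.
z = rng.standard_normal(N.shape[0])
assert np.allclose(A @ (N.T @ z), 0, atol=1e-8)
# dimensions match: dim Ker Phi = n1*n2 - rank(Phi)
assert N.shape[0] == n1 * n2 - k
```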

Corollary 4.10 (Quantitative sufficient condition for strong solution).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) is a minimizer of problem (4.1). Then X¯\overline{X} is a strong solution provided that the Strong Restricted Injectivity holds and

γ(X¯)=defP𝕋(P𝕋)1E<1,\gamma(\overline{X})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}(\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*})^{-1}\mathcal{M}E\|<1, (4.39)

where \mathcal{M} is the linear operator satisfying Im=KerΦ.\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}.

Proof. Suppose that Strong Restricted Injectivity holds at the minimizer X¯\overline{X}. We consider the following linear equation:

P𝕋Y=EforYn1×n2.\mathcal{M}P_{\mathbb{T}^{\perp}}Y=-\mathcal{M}E\quad\mbox{for}\quad Y\in\mathbb{R}^{n_{1}\times n_{2}}. (4.40)

As Y¯X¯ImΦX¯Ker\overline{Y}\in\partial\|\overline{X}\|_{*}\cap\mbox{\rm Im}\,\Phi^{*}\subset\partial\|\overline{X}\|_{*}\cap\mbox{\rm Ker}\,\mathcal{M}, we have

0=Y¯=(P𝕋Y¯+P𝕋Y¯)=E+P𝕋Y¯.0=\mathcal{M}\overline{Y}=\mathcal{M}(P_{\mathbb{T}}\overline{Y}+P_{\mathbb{T}^{\perp}}\overline{Y})=\mathcal{M}E+\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y}.

It follows that Y¯\overline{Y} is a solution to (4.40). Another solution of (4.40) is

Y^=def(P𝕋)E,\widehat{Y}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-(\mathcal{M}P_{\mathbb{T}^{\perp}})^{\dagger}\mathcal{M}E, (4.41)

where (P𝕋)(\mathcal{M}P_{\mathbb{T}^{\perp}})^{\dagger} is the Moore-Penrose generalized inverse of P𝕋\mathcal{M}P_{\mathbb{T}^{\perp}}. Next we claim that P𝕋:ImIm\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}:\mbox{\rm Im}\,\mathcal{M}\to\mbox{\rm Im}\,\mathcal{M} is a bijective mapping. Indeed, if P𝕋z=0\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z=0 for some zImz\in\mbox{\rm Im}\,\mathcal{M}, then

P𝕋z2=P𝕋z,P𝕋z=0.\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z\|^{2}=\langle P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z,P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z\rangle=0.

It follows that z𝕋Im={0}\mathcal{M}^{*}z\in\mathbb{T}\cap\mbox{\rm Im}\,\mathcal{M}^{*}=\{0\} by the Strong Restricted Injectivity, which implies zKerIm={0}z\in\mbox{\rm Ker}\,\mathcal{M}^{*}\cap\mbox{\rm Im}\,\mathcal{M}=\{0\}. Thus P𝕋\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*} is injective on Im\mbox{\rm Im}\,\mathcal{M}. As this operator is self-adjoint, it is bijective. We obtain that

Y^=P𝕋(P𝕋)1E𝕋.\widehat{Y}=-P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}(\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*})^{-1}\mathcal{M}E\in\mathbb{T}^{\perp}.

By the decomposition (4.19), we may write

Y^=U(000Z^)VTfor someZ^(n1r)×(n2r).\widehat{Y}=U\begin{pmatrix}0&0\\ 0&\widehat{Z}\end{pmatrix}V^{T}\quad\mbox{for some}\quad\widehat{Z}\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)}.

Observe from (4.27) that

ζ(X¯)Z^=Y^=γ(X¯).\zeta(\overline{X})\leq\|\widehat{Z}\|=\|\widehat{Y}\|=\gamma(\overline{X}).

Hence, if γ(X¯)<1\gamma(\overline{X})<1, then ζ(X¯)<1\zeta(\overline{X})<1 and X¯\overline{X} is a strong solution by Theorem 4.6.\hfill\Box
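A hedged sketch of the computation of γ(X¯)\gamma(\overline{X}) in (4.39): with matrix representations of \mathcal{M} and of the projector P𝕋P_{\mathbb{T}^{\perp}} on vectorized inputs (random placeholders below, standing in for the constructions of Remarks 4.8 and 4.9), Y^\widehat{Y} from (4.41) is obtained via a pseudo-inverse and γ(X¯)\gamma(\overline{X}) is its spectral norm.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, d = 4, 5, 6                  # d stands in for dim(Ker Phi ∩ E)
Q, _ = np.linalg.qr(rng.standard_normal((n1 * n2, 8)))
P = Q @ Q.T                          # orthogonal projector playing P_{T^perp}
M = rng.standard_normal((d, n1 * n2))    # placeholder for the operator M
e = rng.standard_normal(n1 * n2)         # placeholder for vectorized E = U_I V_I^T

# Yhat = -P_{T^perp} M^* (M P_{T^perp} M^*)^{-1} M E, via the pseudo-inverse
Yhat = -P @ M.T @ np.linalg.pinv(M @ P @ M.T) @ (M @ e)
gamma = np.linalg.norm(Yhat.reshape(n1, n2), ord=2)   # spectral norm

# Yhat lies in Im P (i.e., in T^perp) and solves the linear system (4.40).
assert np.allclose(P @ Yhat, Yhat)
assert np.allclose(M @ P @ Yhat, -(M @ e), atol=1e-8)
assert gamma >= 0.0
```

The quantitative test of Corollary 4.10 is then simply `gamma < 1`.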

5 Characterizations for strong minima of nuclear norm minimization problems

In this section, let us consider the nuclear norm minimization problem (1.1):

minXn1×n2Xsubject toΦX=M0,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\|X\|_{*}\quad\mbox{subject to}\quad\Phi X=M_{0}, (5.1)

where Φ:n1×n2m\Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{m} is a linear operator (n1n2n_{1}\leq n_{2}), M0mM_{0}\in\mathbb{R}^{m} is a known observation. This is a particular case of problem (3.20) with g(X)=Xg(X)=\|X\|_{*} for Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}}. Note that X0X_{0} is a solution of this problem if and only if ΦX0=M0\Phi X_{0}=M_{0} and

0X0+NΦ1(M0)(X0)=X0+ImΦ,0\in\partial\|X_{0}\|_{*}+N_{\Phi^{-1}(M_{0})}(X_{0})=\partial\|X_{0}\|_{*}+\mbox{\rm Im}\,\Phi^{*},

which is equivalent to the existence of a dual certificate YImΦX0Y\in\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*}. Define

Δ(X0)=defImΦX0\Delta(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*}

to be the set of all dual certificates.

Lemma 5.1.

Let Ω\Omega be a nonempty closed subset of n1×n2\mathbb{R}^{n_{1}\times n_{2}}. Suppose that U1,U2n1×n1U_{1},U_{2}\in\mathbb{R}^{n_{1}\times n_{1}} and V1,V2n2×n2V_{1},V_{2}\in\mathbb{R}^{n_{2}\times n_{2}} are orthogonal matrices satisfying U1ΩV1TU2ΩV2TU_{1}\Omega V_{1}^{T}\supset U_{2}\Omega V_{2}^{T}. Then we have

U1ΩV1T=U2ΩV2T.U_{1}\Omega V_{1}^{T}=U_{2}\Omega V_{2}^{T}. (5.2)

Proof. Define U=U1TU2U=U_{1}^{T}U_{2} and V=V1TV2V=V_{1}^{T}V_{2}. It is easy to check that UU and VV are orthogonal matrices. We have

UΩVTΩ.U\Omega V^{T}\subset\Omega. (5.3)

Define the mapping φ:ΩΩ\varphi:\Omega\to\Omega by φ(X)=UXVT\varphi(X)=UXV^{T} for all XΩX\in\Omega. Note that φ\varphi is an isometry with the spectral metric d(X,Y)=XYd(X,Y)=\|X-Y\| for all X,YΩX,Y\in\Omega. Indeed, we have

d(φ(X),φ(Y))=U(XY)VT=XYfor allX,YΩ.d(\varphi(X),\varphi(Y))=\|U(X-Y)V^{T}\|=\|X-Y\|\quad\mbox{for all}\quad X,Y\in\Omega.

Note further that φ(X)=X\|\varphi(X)\|=\|X\| for all XΩX\in\Omega. It follows from (5.3) that

φ(Ω𝔹¯k(0))Ω𝔹¯k(0)for anyk.\varphi(\Omega\cap\overline{\mathbb{B}}_{k}(0))\subset\Omega\cap\overline{\mathbb{B}}_{k}(0)\quad\mbox{for any}\quad k\in\mathbb{N}.

Since Ω𝔹¯k(0)\Omega\cap\overline{\mathbb{B}}_{k}(0) is a compact metric space and φ\varphi is an isometry, it is well-known that

φ(Ω𝔹¯k(0))=Ω𝔹¯k(0).\varphi(\Omega\cap\overline{\mathbb{B}}_{k}(0))=\Omega\cap\overline{\mathbb{B}}_{k}(0).

Hence we have

φ(Ω)=φ(k=1(Ω𝔹¯k(0)))=k=1φ(Ω𝔹¯k(0))=k=1(Ω𝔹¯k(0))=Ω.\varphi(\Omega)=\varphi\left(\bigcup_{k=1}^{\infty}(\Omega\cap\overline{\mathbb{B}}_{k}(0))\right)=\bigcup_{k=1}^{\infty}\varphi\left(\Omega\cap\overline{\mathbb{B}}_{k}(0)\right)=\bigcup_{k=1}^{\infty}(\Omega\cap\overline{\mathbb{B}}_{k}(0))=\Omega.

This verifies (5.2) and completes the proof of the lemma. \hfill\Box

For any YX0Y\in\partial\|X_{0}\|_{*}, recall from (4.6) that p(Y)p(Y) is the number of singular values of YY that are equal to 11. It follows from Lemma 3.2 that r=defrank(X0)p(Y)n1r\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm rank}\,(X_{0})\leq p(Y)\leq n_{1}. Define the constant

q(X0)=defmin{p(Y)|YΔ(X0)}.q(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\min\{p(Y)|\;Y\in\Delta(X_{0})\}. (5.4)

When X0X_{0} is a minimizer of problem (5.1), q(X0)q(X_{0}) is well-defined and at least rank(X0)\mbox{\rm rank}\,(X_{0}). The following theorem is the main result of this section.
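To make these quantities concrete, here is a minimal numpy sketch (the helper name `p_val` and the numerical tolerance are our own choices, not notation from the paper): it counts the singular values of a subgradient matrix Y that equal 1, which is the quantity p(Y) of (4.6); q(X0) is then the smallest such count over the dual certificates in Δ(X0).

```python
import numpy as np

def p_val(Y, tol=1e-9):
    """p(Y): number of singular values of Y equal to 1; cf. (4.6)."""
    sv = np.linalg.svd(Y, compute_uv=False)
    return int(np.sum(np.abs(sv - 1.0) < tol))

# Any Y in the subdifferential of the nuclear norm has all singular
# values in [0, 1]; p_val counts those sitting on the boundary.
Y = np.diag([1.0, 1.0, 0.3])
assert p_val(Y) == 2
```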

Theorem 5.2 (Characterizations for strong minima of nuclear norm minimization problems).

Suppose that X0X_{0} is an optimal solution of problem (5.1). The following are equivalent:

(i) X0X_{0} is a strong solution of problem (5.1).

(ii) The following equality holds

YΔ(X0)[KerΦTN𝔹(Y)(X0)]={0}.\bigcap_{Y\in\Delta(X_{0})}\Big{[}\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y)}(X_{0})\Big{]}=\{0\}. (5.5)

(iii) There exists a dual certificate Y¯Δ(X0)\overline{Y}\in\Delta(X_{0}) such that

KerΦTN𝔹(Y¯)(X0)={0}.\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(X_{0})=\{0\}. (5.6)

(iv) For any YΔ(X0)Y\in\Delta(X_{0}) satisfying p(Y)=q(X0)p(Y)=q(X_{0}), condition (5.6) is satisfied.

Proof. Note that the nuclear norm function \|\cdot\|_{*} is second order regular at X0X_{0} [18] and it also satisfies the quadratic growth condition at X0X_{0}, as \partial\|\cdot\|_{*} is metrically subregular at X0X_{0} for any YX0Y\in\partial\|X_{0}\|_{*}; see [50]. By applying Lemma 4.1 and Theorem 3.5 with 𝕏=n1×n2\mathbb{X}=\mathbb{R}^{n_{1}\times n_{2}}, 𝕐=m\mathbb{Y}=\mathbb{R}^{m}, g()=g(\cdot)=\|\cdot\|_{*}, and K={M0}K=\{M_{0}\}, we obtain that X0X_{0} is a strong solution if and only if

[YΔ(X0)TN𝔹(Y)(X0)]C(X0)={0},\left[\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right]\cap C(X_{0})=\{0\}, (5.7)

where C(X0)C(X_{0}) is the critical cone (3.23) computed by

C(X0)={Wn1×n2|WKerΦ,dg(X0)(W)=0}.C(X_{0})=\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;W\in\mbox{\rm Ker}\,\Phi,dg(X_{0})(W)=0\}.

By Proposition 4.5 and formula (4.10), we note that if WKerΦTN𝔹(Y)(X0)W\in\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y)}(X_{0}) for some YΔ(X0)Y\in\Delta(X_{0}) then W𝒞(X0,Y)W\in\mathcal{C}(X_{0},Y), i.e., dg(X0)(W)=Y,W=0dg(X_{0})(W)=\langle Y,W\rangle=0. Thus WC(X0)W\in C(X_{0}), and therefore

[YΔ(X0)TN𝔹(Y)(X0)]C(X0)=[YΔ(X0)TN𝔹(Y)(X0)]KerΦ.\left[\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right]\cap C(X_{0})=\left[\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right]\cap\mbox{\rm Ker}\,\Phi.

This together with (5.7) verifies the equivalence between (i) and (ii).

The implications [(iii)\Rightarrow(ii)] and [(iv)\Rightarrow(ii)] are trivial. To justify the converse implications, we first claim the existence of some dual certificate Y¯Δ(X0)\overline{Y}\in\Delta(X_{0}) such that p(Y¯)=q(X0)p(\overline{Y})=q(X_{0}) and

YΔ(X0)[TN𝔹(Y)(X0)]=TN𝔹(Y¯)(X0).\bigcap_{Y\in\Delta(X_{0})}\Big{[}T_{N_{\mathbb{B}}(Y)}(X_{0})\Big{]}=T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}). (5.8)

We prove this by using a version of Zorn’s lemma [11, Proposition 5.9] for downward directed partially ordered sets. Consider the following partially ordered set (poset)

𝒫={TN𝔹(Y)(X0)|YΔ(X0)}\mathcal{P}=\Big{\{}T_{N_{\mathbb{B}}(Y)}(X_{0})|\;Y\in\Delta(X_{0})\Big{\}} (5.9)

with the partial ordering \supset between sets in 𝒫\mathcal{P}. Consider any descending chain

TN𝔹(Y1)(X0)TN𝔹(Y2)(X0)TN𝔹(Yk)(X0)T_{N_{\mathbb{B}}(Y_{1})}(X_{0})\supset T_{N_{\mathbb{B}}(Y_{2})}(X_{0})\supset\ldots\supset T_{N_{\mathbb{B}}(Y_{k})}(X_{0})\supset\ldots (5.10)

for a sequence {Yk}Δ(X0)\{Y_{k}\}\subset\Delta(X_{0}). We can find a subsequence {Ykl}\{Y_{k_{l}}\} of {Yk}\{Y_{k}\} whose elements all have the same rank prp\geq r. According to the computation (4.10), there exist orthogonal matrices (Ukl,Vkl)𝒪(Ykl)𝒪(X0)(U_{k_{l}},V_{k_{l}})\in\mathcal{O}(Y_{k_{l}})\cap\mathcal{O}(X_{0}) such that

TN𝔹(Ykl)(X0)=UklΩpVklTwithΩp=def{(AB0BTC0000)|A𝕊r,Br×(pr),C𝕊+pr}.T_{N_{\mathbb{B}}(Y_{k_{l}})}(X_{0})=U_{k_{l}}\Omega_{p}V^{T}_{k_{l}}\quad\mbox{with}\quad\Omega_{p}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{\begin{pmatrix}A&B&0\\ B^{T}&C&0\\ 0&0&0\end{pmatrix}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p-r)},C\in\mathbb{S}_{+}^{p-r}\right\}.

It follows that

TN𝔹(Ykl)(X0)=UklΩpVklTUkl+1ΩpVkl+1T=TN𝔹(Ykl+1)(X0).T_{N_{\mathbb{B}}(Y_{k_{l}})}(X_{0})=U_{k_{l}}\Omega_{p}V^{T}_{k_{l}}\supset U_{k_{l+1}}\Omega_{p}V^{T}_{k_{l+1}}=T_{N_{\mathbb{B}}(Y_{k_{l+1}})}(X_{0}). (5.11)

Since Ωp\Omega_{p} is closed, we obtain from Lemma 5.1 that

TN𝔹(Ykl)(X0)=TN𝔹(Ykl+1)(X0)for alll=1,2,.T_{N_{\mathbb{B}}(Y_{k_{l}})}(X_{0})=T_{N_{\mathbb{B}}(Y_{k_{l+1}})}(X_{0})\quad\mbox{for all}\quad l=1,2,\ldots.

Hence the chain (5.10) is bounded below by TN𝔹(Yk1)(X0)𝒫T_{N_{\mathbb{B}}(Y_{k_{1}})}(X_{0})\in\mathcal{P}. This means that every descending chain of 𝒫\mathcal{P} has a minimum in 𝒫\mathcal{P}.

Let us show next that the poset 𝒫\mathcal{P} is directed downward with the partial ordering \supset in the sense that for any two elements TN𝔹(Y1)(X0)T_{N_{\mathbb{B}}(Y_{1})}(X_{0}) and TN𝔹(Y2)(X0)T_{N_{\mathbb{B}}(Y_{2})}(X_{0}) of 𝒫\mathcal{P} with Y1,Y2Δ(X0)Y_{1},Y_{2}\in\Delta(X_{0}), there exists Y3Δ(X0)Y_{3}\in\Delta(X_{0}) such that

TN𝔹(Y1)(X0)TN𝔹(Y3)(X0)andTN𝔹(Y2)(X0)TN𝔹(Y3)(X0).T_{N_{\mathbb{B}}(Y_{1})}(X_{0})\supset T_{N_{\mathbb{B}}(Y_{3})}(X_{0})\quad\mbox{and}\quad T_{N_{\mathbb{B}}(Y_{2})}(X_{0})\supset T_{N_{\mathbb{B}}(Y_{3})}(X_{0}). (5.12)

Indeed, we choose Y3=12(Y1+Y2)Δ(X0)Y_{3}=\frac{1}{2}(Y_{1}+Y_{2})\in\Delta(X_{0}). For any WTN𝔹(Y3)(X0)W\in T_{N_{\mathbb{B}}(Y_{3})}(X_{0}), we obtain from (4.10) that 0=d2g(X0|Y3)(W)0=d^{2}g(X_{0}|Y_{3})(W). Hence, there exist tk0t_{k}\downarrow 0 and WkWW_{k}\to W such that

\begin{array}[]{ll}\dfrac{1}{k}&\geq\dfrac{g(X_{0}+t_{k}W_{k})-g(X_{0})-0.5t_{k}\langle Y_{1}+Y_{2},W_{k}\rangle}{0.5t_{k}^{2}}\\ &=\dfrac{g(X_{0}+t_{k}W_{k})-g(X_{0})-t_{k}\langle Y_{1},W_{k}\rangle}{t_{k}^{2}}+\dfrac{g(X_{0}+t_{k}W_{k})-g(X_{0})-t_{k}\langle Y_{2},W_{k}\rangle}{t_{k}^{2}},\end{array}

which implies that

0d2g(X0|Y1)(W)+d2g(X0|Y2)(W).0\geq d^{2}g(X_{0}|Y_{1})(W)+d^{2}g(X_{0}|Y_{2})(W).

Since d2g(X0|Y1)(W),d2g(X0|Y2)(W)0d^{2}g(X_{0}|Y_{1})(W),d^{2}g(X_{0}|Y_{2})(W)\geq 0, we obtain from Lemma 3.9 and Lemma 4.1 that

WKerd2g(X0|Y1)=TN𝔹(Y1)(X0)andWKerd2g(X0|Y2)=TN𝔹(Y2)(X0).W\in\mbox{\rm Ker}\,d^{2}g(X_{0}|Y_{1})=T_{N_{\mathbb{B}}(Y_{1})}(X_{0})\quad\mbox{and}\quad W\in\mbox{\rm Ker}\,d^{2}g(X_{0}|Y_{2})=T_{N_{\mathbb{B}}(Y_{2})}(X_{0}).

This clearly verifies the directedness condition (5.12). By [11, Proposition 5.9], the downward directed poset 𝒫\mathcal{P} has a minimum in the sense that there exists Y¯\overline{Y} such that (5.8) is valid. The implication [(ii)\Rightarrow(iii)] follows from (5.8).

Next, let us show that p(Y¯)=q(X0)p(\overline{Y})=q(X_{0}) and

TN𝔹(Y)(X0)=TN𝔹(Y¯)(X0)T_{N_{\mathbb{B}}(Y)}(X_{0})=T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}) (5.13)

for any YΔ(X0)Y\in\Delta(X_{0}) with p(Y)=q(X0)p(Y)=q(X_{0}). Indeed, pick any YY satisfying the latter condition; we obtain from (5.8) that

TN𝔹(Y)(X0)TN𝔹(Y¯)(X0).T_{N_{\mathbb{B}}(Y)}(X_{0})\supset T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}). (5.14)

Due to the computation (4.10), the maximum ranks of matrices in TN𝔹(Y¯)(X0)T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}) and TN𝔹(Y)(X0)T_{N_{\mathbb{B}}(Y)}(X_{0}) are p(Y¯)p(\overline{Y}) and p(Y)p(Y), respectively. It follows that q(X0)p(Y¯)p(Y)=q(X0)q(X_{0})\leq p(\overline{Y})\leq p(Y)=q(X_{0}), which verifies that p(Y¯)=q(X0)p(\overline{Y})=q(X_{0}). Moreover, due to (4.10) and (5.14), we get (5.13) from Lemma 5.1. Combining (5.13) and (5.8) ensures the implication [(ii)\Rightarrow(iv)]. The proof is complete. \hfill\Box

Remark 5.3.

A sufficient condition for strong minima of nuclear norm minimization (5.1) can be obtained from [18, Theorem 12]. However, that condition takes the form of a minimax problem: for any element of the critical cone, there exists some Lagrange multiplier such that a certain second order sufficient condition is satisfied; that is, the Lagrange multiplier used in the sufficient condition depends on the choice of element in the critical cone. This is a typical situation; see (2.23) for instance. In our characterizations of strong minima in parts (iii) and (iv) of the above theorem, the existence of the dual certificate Y¯\overline{Y} is independent of the elements of the critical cone. Condition (iii) is closer to a maximin situation. Moreover, from (iv) we know extra information about these dual certificates: they have the minimum number of singular values equal to 11.

Similarly to Theorem 4.6, condition (5.6) is equivalent to the combination of Strong Restricted Injectivity and Strong Nondegenerate Source Condition. Let us recall the model tangent subspace (4.13) at X0X_{0} here:

𝕋0=def{U0YT+XV0T|Xn1×r,Yn2×r},\mathbb{T}_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{U_{0}Y^{T}+XV_{0}^{T}|\;X\in\mathbb{R}^{n_{1}\times r},Y\in\mathbb{R}^{n_{2}\times r}\},

where U0Σ0V0TU_{0}\Sigma_{0}V_{0}^{T} is a compact SVD of X0X_{0}.

Corollary 5.4 (Strong Restricted Injectivity and Strong Nondegenerate Source Condition for strong minima).

Suppose that X0X_{0} is a minimizer of problem (5.1). Then X0X_{0} is a strong solution of problem (5.1) if and only if both of the following conditions hold:

  • (a)

    Strong Restricted Injectivity: KerΦ0𝕋0={0}\mbox{\rm Ker}\,\Phi\cap\mathcal{E}_{0}\cap\mathbb{T}_{0}=\{0\}, where

    0={Wn1×n2|P𝕋0WU(AB0BT00000)VT,A𝕊r,Br×(q(X0)r)}\mathcal{E}_{0}=\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}_{0}}W\in U\begin{pmatrix}A&B&0\\ B^{T}&0&0\\ 0&0&0\end{pmatrix}V^{T},A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(q(X_{0})-r)}\right\} (5.15)

    for some (U,V)𝒪(X0)𝒪(Y0)(U,V)\in\mathcal{O}(X_{0})\cap\mathcal{O}(Y_{0}), where Y0Δ(X0)Y_{0}\in\Delta(X_{0}) satisfies p(Y0)=q(X0)p(Y_{0})=q(X_{0}).

  • (b)

    Strong Nondegenerate Source Condition: There exists YImΦ+0Y\in\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}_{0}^{\perp} such that YriX0.Y\in{\rm ri}\,\partial\|X_{0}\|_{*}.

Remark 5.5 (Sharp minima vs Strong minima).

The set 0\mathcal{E}_{0} depends only on X0X_{0}. Indeed, it follows from (5.8) and (4.10) that

0={Wn1×n2|P𝕋0WP𝕋0(YΔ(X0)TN𝔹(Y)(X0))},\mathcal{E}_{0}=\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}_{0}}W\in P_{\mathbb{T}_{0}}\left(\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right)\right\},

which is a subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}}. As discussed after Theorem 4.6, the Strong Restricted Injectivity is weaker than the Restricted Injectivity

KerΦ𝕋0={0}.\mbox{\rm Ker}\,\Phi\cap\mathbb{T}_{0}=\{0\}. (5.16)

The Strong Nondegenerate Source Condition is also weaker than the Nondegenerate Source Condition:

ImΦriX0.\mbox{\rm Im}\,\Phi^{*}\cap{\rm ri}\,\partial\|X_{0}\|_{*}\neq\emptyset. (5.17)

This condition means q(X0)=rank(X0)q(X_{0})=\mbox{\rm rank}\,(X_{0}). The Nondegenerate Source Condition together with the Restricted Injectivity is used in [15, 16] as a sufficient condition for solution uniqueness of problem (5.1) at X0X_{0}. These two conditions were recently shown in [24, Theorem 4.6] to be equivalent to the stronger property of sharp minima at X0X_{0}, in the sense that there exists some c>0c>0 such that

XX0cXX0for any Xn1×n2 satisfyingΦX=M0.\|X\|_{*}-\|X_{0}\|_{*}\geq c\|X-X_{0}\|\quad\mbox{for any $X\in\mathbb{R}^{n_{1}\times n_{2}}$ satisfying}\quad\Phi X=M_{0}. (5.18)

Our Strong Restricted Injectivity and Strong Nondegenerate Source Condition characterize the weaker property of strong minima for problem (5.1). Of course, they can also serve as sufficient conditions for solution uniqueness at X0X_{0}; see [28] for some recent characterizations of this property.

In order to check Nondegenerate Source Condition (5.17), one has to show that the Source Coefficient ρ(X0)\rho(X_{0}), the optimal value of the following optimization problem

minZ𝕋0Zsubject to𝒩Z=𝒩E0withE0=defU0V0T\min_{Z\in\mathbb{T}_{0}^{\perp}}\quad\|Z\|\quad\mbox{subject to}\quad\mathcal{N}Z=-\mathcal{N}E_{0}\quad\mbox{with}\quad E_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}U_{0}V_{0}^{T} (5.19)

is smaller than 11, where 𝒩\mathcal{N} is a linear operator satisfying Ker𝒩=ImΦ\mbox{\rm Ker}\,\mathcal{N}=\mbox{\rm Im}\,\Phi^{*}; see, e.g., [24, Remark 4.5]. When the Restricted Injectivity (5.16) holds, an upper bound for ρ(X0)\rho(X_{0}) is

τ(X0)=def𝒩𝕋0(𝒩𝕋0𝒩𝕋0)1𝒩E0with𝒩𝕋0=def𝒩P𝕋0.\tau(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|\mathcal{N}^{*}_{\mathbb{T}_{0}^{\perp}}(\mathcal{N}^{*}_{\mathbb{T}_{0}^{\perp}}\mathcal{N}_{\mathbb{T}_{0}^{\perp}})^{-1}\mathcal{N}E_{0}\|\quad\mbox{with}\quad\mathcal{N}_{\mathbb{T}_{0}^{\perp}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathcal{N}P_{\mathbb{T}_{0}^{\perp}}. (5.20)

Hence condition τ(X0)<1\tau(X_{0})<1 is sufficient for sharp minima; see, e.g., [24, Corollary 4.8]. This condition is known as the Analysis Exact Recovery Condition in [38] for the case of 1\ell_{1} optimization. Another independent condition used to check solution uniqueness of problem (5.1) is the so-called Irrepresentability Criterion [15, 16, 45]:

𝐈𝐂(X0)=defΦ𝕋0Φ𝕋0(Φ𝕋0Φ𝕋0)1E0<1withΦ𝕋0=defΦP𝕋0.{\bf IC}(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|\Phi^{*}_{\mathbb{T}_{0}^{\perp}}\Phi_{\mathbb{T}_{0}}\left(\Phi_{\mathbb{T}_{0}}^{*}\Phi_{\mathbb{T}_{0}}\right)^{-1}E_{0}\|<1\quad\mbox{with}\quad\Phi_{\mathbb{T}_{0}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\Phi P_{\mathbb{T}_{0}}. (5.21)

Note that 𝐈𝐂(X0)ρ(X0){\bf IC}(X_{0})\geq\rho(X_{0}). Thus 𝐈𝐂(X0)<1{\bf IC}(X_{0})<1 also implies that X0X_{0} is a sharp solution of problem (5.1).

Remark 5.6 (Descent cone vs tangent cone).

Sharp minima and strong minima of problem (5.1) are sufficient for solution uniqueness, which is a significant property for exact recovery [1, 13, 16]. An important geometric structure used to study solution uniqueness is the descent cone [13] at X0X_{0} defined by

𝒟(X0)=defcone{XX0|XX0}.\mathcal{D}(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}{\rm cone}\,\{X-X_{0}|\;\|X\|_{*}\leq\|X_{0}\|_{*}\}. (5.22)

Indeed, [13] shows that X0X_{0} is a unique solution of problem (5.1) if and only if

KerΦ𝒟(X0)={0}.\mbox{\rm Ker}\,\Phi\cap\mathcal{D}(X_{0})=\{0\}. (5.23)

Unlike the tangent cones in (5.5) or (5.6), the descent cone 𝒟(X0)\mathcal{D}(X_{0}) may not be closed. Although the direct connection between the descent cone 𝒟(X0){\mathcal{D}}(X_{0}) and the tangent cones TN𝔹(Y)(X0)T_{N_{\mathbb{B}}(Y)}(X_{0}) is not clear, we claim that

KerΦ𝒟(X0)YΔ(X0)[KerΦTN𝔹(Y)(X0)]\mbox{\rm Ker}\,\Phi\cap\mathcal{D}(X_{0})\subset\bigcap_{Y\in\Delta(X_{0})}\left[\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y)}(X_{0})\right] (5.24)

when X0X_{0} is a minimizer of problem (5.1), where Δ(X0)=ImΦX0\Delta(X_{0})=\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*} is the set of dual certificates at X0X_{0}. Indeed, for any WKerΦ𝒟(X0)W\in\mbox{\rm Ker}\,\Phi\cap\mathcal{D}(X_{0}), there exists τ>0\tau>0 such that X0+τWX0\|X_{0}+\tau W\|_{*}\leq\|X_{0}\|_{*}. As Φ(X0+τW)=ΦX0\Phi(X_{0}+\tau W)=\Phi X_{0}, we have X0+τW=X0\|X_{0}+\tau W\|_{*}=\|X_{0}\|_{*}. Pick any YΔ(X0)Y\in\Delta(X_{0}). In view of (4.4) and the definition of YY, it follows that

X0+τW=X0=Y,X0=Y,X0+τW,\|X_{0}+\tau W\|_{*}=\|X_{0}\|_{*}=\langle Y,X_{0}\rangle=\langle Y,X_{0}+\tau W\rangle,

which implies that X0+τWN𝔹(Y)X_{0}+\tau W\in N_{\mathbb{B}}(Y) due to (4.4)-(4.5) and thus WTN𝔹(Y)(X0)W\in T_{N_{\mathbb{B}}(Y)}(X_{0}). This verifies inclusion (5.24). Inclusion (5.24) also tells us that condition (5.5) is sufficient for (5.23). This observation is not a surprise in view of Theorem 5.2, as strong minima obviously imply solution uniqueness. But solution uniqueness of problem (5.1) does not imply strong minima; see [24, Example 5.11]. Hence, inclusion (5.24) can be strict.

Similarly to Corollary 4.4, the following result reveals the role of Strict Restricted Injectivity in strong minima.

Corollary 5.7 (Strict Restricted Injectivity for strong minima of problem (5.1)).

Suppose that U0Σ0V0TU_{0}\Sigma_{0}V_{0}^{T} is a compact SVD of X0X_{0} with r=rank(X0)r={\rm rank}\,(X_{0}). If X0X_{0} is a strong solution, then the following Strict Restricted Injectivity is satisfied:

KerΦU0𝕊rV0T={0}.\mbox{\rm Ker}\,\Phi\cap U_{0}\mathbb{S}^{r}V_{0}^{T}=\{0\}. (5.25)

This condition is also sufficient for X0X_{0} to be a strong solution provided that Nondegenerate Source Condition (5.17) holds at X0X_{0}.

Proof. Note from (4.10),

TN𝔹(Y)(X0)U0𝕊rV0Tfor anyYΔ(X0).T_{N_{\mathbb{B}}(Y)}(X_{0})\supset U_{0}\mathbb{S}^{r}V_{0}^{T}\quad\mbox{for any}\quad Y\in\Delta(X_{0}).

If X0X_{0} is a strong solution, combining (5.5) with the latter inclusion verifies (5.25).

If Nondegenerate Source Condition (5.17) holds at X0X_{0}, there exists Y0ImΦriX0Y_{0}\in\mbox{\rm Im}\,\Phi^{*}\cap{\rm ri}\,\partial\|X_{0}\|_{*}. Hence Δ(X0)\Delta(X_{0})\neq\emptyset, i.e., X0X_{0} is a solution of problem (5.1). It follows from Lemma 4.1 that p(Y0)=rank(X0)p(Y_{0})=\mbox{\rm rank}\,(X_{0}) and from (4.10) that

TN𝔹(Y0)(X0)=U0𝕊rV0T.T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=U_{0}\mathbb{S}^{r}V_{0}^{T}.

Hence the validity of (5.25) implies that X0X_{0} is a strong solution to problem (5.1) due to the equivalence between (i) and (iii) in Theorem 5.2. \hfill\Box

Suppose that U(Σ0000)VTU\begin{pmatrix}\Sigma_{0}&0\\ 0&0\end{pmatrix}V^{T} is a full SVD of X0X_{0}. The model tangent space 𝕋0\mathbb{T}_{0} at X0X_{0} can be represented by

𝕋0={U(ABC0)VT|Ar×r,Br×(n2r),C(n1r)×r}.\mathbb{T}_{0}=\left\{U\begin{pmatrix}A&B\\ C&0\end{pmatrix}V^{T}|\;A\in\mathbb{R}^{r\times r},B\in\mathbb{R}^{r\times(n_{2}-r)},C\in\mathbb{R}^{(n_{1}-r)\times r}\right\}. (5.26)

It has dimension r(n1+n2r)r(n_{1}+n_{2}-r). When Restricted Injectivity (5.16) holds, we have mr(n1+n2r)m\geq r(n_{1}+n_{2}-r). Similarly, as the dimension of U0𝕊rV0TU_{0}\mathbb{S}^{r}V_{0}^{T} is 12r(r+1)\frac{1}{2}r(r+1), it is necessary for Strict Restricted Injectivity (5.25) that m12r(r+1)m\geq\frac{1}{2}r(r+1). Next we show that this bound 12r(r+1)\frac{1}{2}r(r+1) for mm is tight.
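These dimension counts are easy to confirm numerically. In the sketch below (random orthonormal factors and the problem sizes are our own toy choices), we span 𝕋0 and U0𝕊rV0ᵀ by images of basis matrices and check that their ranks are r(n1+n2−r) and r(r+1)/2, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 5, 4, 2
# Random factors with orthonormal columns, playing the roles of U0 and V0.
U0, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n2, r)))

# dim T0 = r(n1 + n2 - r): T0 is spanned by matrices U0 Y^T and X V0^T.
t_vecs = [np.outer(U0[:, k], e).ravel() for k in range(r) for e in np.eye(n2)]
t_vecs += [np.outer(e, V0[:, k]).ravel() for k in range(r) for e in np.eye(n1)]
dim_T0 = np.linalg.matrix_rank(np.array(t_vecs))
assert dim_T0 == r * (n1 + n2 - r)            # = 14 here

# dim U0 S^r V0^T = r(r+1)/2: images of the symmetric basis matrices.
s_vecs = []
for i in range(r):
    for j in range(i, r):
        A = np.zeros((r, r)); A[i, j] = A[j, i] = 1.0
        s_vecs.append((U0 @ A @ V0.T).ravel())
dim_S = np.linalg.matrix_rank(np.array(s_vecs))
assert dim_S == r * (r + 1) // 2              # = 3 here
```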

Corollary 5.8 (Minimum bound for strong exact recovery).

Suppose that X0X_{0} is an n1×n2n_{1}\times n_{2} matrix with rank rr. Then at least 12r(r+1)\frac{1}{2}r(r+1) measurements mm of M0M_{0} are needed so that solving the nuclear norm minimization problem (5.1) recovers X0X_{0} exactly as a strong solution.

Moreover, there exist infinitely many linear operators Φ:n1×n212r(r+1)\Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{\frac{1}{2}r(r+1)} such that X0X_{0} is a strong solution of problem (5.1).

Proof. Suppose that U0ΣV0TU_{0}\Sigma V_{0}^{T} is a compact SVD of X0X_{0}. Let {A1,,As}\{A_{1},\ldots,A_{s}\} with s=def12r(r+1)s\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{2}r(r+1) be any basis of U0𝕊rV0TU_{0}\mathbb{S}^{r}V_{0}^{T}. If X0X_{0} is a strong solution of problem (5.1), Strict Restricted Injectivity (5.25) holds by Corollary 5.7. It follows that {Φ(A1),,Φ(As)}\{\Phi(A_{1}),\ldots,\Phi(A_{s})\} are linearly independent. Hence, we have msm\geq s, which verifies the first part.

To justify the second part, we construct the linear operator Φs:n1×n2s\Phi_{s}:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{s} as follows:

Φs(X)=def(Ak,X)1kssfor anyXn1×n2.\Phi_{s}(X)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(\langle A_{k},X\rangle)_{1\leq k\leq s}\in\mathbb{R}^{s}\quad\mbox{for any}\quad X\in\mathbb{R}^{n_{1}\times n_{2}}. (5.27)

Note that ImΦs=span{A1,,As}=U0𝕊rV0T\mbox{\rm Im}\,\Phi_{s}^{*}=\mbox{\rm span}\,\{A_{1},\ldots,A_{s}\}=U_{0}\mathbb{S}^{r}V_{0}^{T}. It follows that E0=U0V0TImΦsriX0E_{0}=U_{0}V_{0}^{T}\in\mbox{\rm Im}\,\Phi_{s}^{*}\cap\mbox{\rm ri}\,\partial\|X_{0}\|_{*} is a dual certificate that satisfies Nondegenerate Source Condition (5.17). As

KerΦs=(ImΦs)=(U0𝕊rV0T),\mbox{\rm Ker}\,\Phi_{s}=(\mbox{\rm Im}\,\Phi_{s}^{*})^{\perp}=(U_{0}\mathbb{S}^{r}V_{0}^{T})^{\perp},

we have KerΦsU0𝕊rV0T={0}\mbox{\rm Ker}\,\Phi_{s}\cap U_{0}\mathbb{S}^{r}V_{0}^{T}=\{0\}. Hence Strict Restricted Injectivity (5.25) holds. By Corollary 5.7 again, X0X_{0} is a strong solution of problem (5.1). \hfill\Box
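The construction (5.27) in the proof can be imitated numerically. In the sketch below (random orthonormal factors and toy sizes are our own choices), Φs is built from a basis of U0𝕊rV0ᵀ, and injectivity of Φs on that subspace, i.e., Strict Restricted Injectivity (5.25), reduces to the invertibility of a Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 4, 3, 2
U0, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n2, r)))

# Basis A_1, ..., A_s of U0 S^r V0^T with s = r(r+1)/2.
basis = []
for i in range(r):
    for j in range(i, r):
        A = np.zeros((r, r)); A[i, j] = A[j, i] = 1.0
        basis.append(U0 @ A @ V0.T)
s = len(basis)                                   # = 3 here

# The measurement map (5.27): X -> (<A_k, X>)_{1 <= k <= s}.
Phi = lambda X: np.array([np.sum(Ak * X) for Ak in basis])

# Phi restricted to U0 S^r V0^T is represented by the Gram matrix of the
# basis; full rank means Ker Phi meets the subspace only at 0.
G = np.array([Phi(Ak) for Ak in basis])
assert np.linalg.matrix_rank(G) == s
```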

Remark 5.9 (Lower bounds on the number of measurements for exact recovery).

In the theory of exact recovery, [13] shows that m3r(n1+n2r)m\geq 3r(n_{1}+n_{2}-r) random Gaussian measurements are sufficient to recover X0X_{0} exactly with high probability by solving problem (5.1) from observations M0=ΦX0M_{0}=\Phi X_{0}; see also [16] for a similar result with a different approach. Also in [13, Propositions 4.5 and 4.6], a lower bound on the number of measurements for exact recovery in atomic norm minimization is obtained via the descent cone (5.22) and Terracini’s Lemma [26]. The latter is used to estimate the dimension of a subspace component of the descent cone. In the case of the nuclear norm, this lower bound is min{n1n2,(r+1)(n1+n2)r}\min\{n_{1}n_{2},(r+1)(n_{1}+n_{2})-r\}; see also [26, Proposition 12.2]. This bound holds for any linear measurement scheme. Our lower bound 12r(r+1)\frac{1}{2}r(r+1) for mm is much smaller and depends only on the rank of X0X_{0}, but it is achieved only for special Φ\Phi.

Example 5.10.

Let us consider the following nuclear norm minimization problem

minX2×2Xsubject toΦX=def(X1100X22)=(1000),\min_{X\in\mathbb{R}^{2\times 2}}\|X\|_{*}\quad\mbox{subject to}\quad\Phi X\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}X_{11}&0\\ 0&X_{22}\end{pmatrix}=\begin{pmatrix}1&0\\ 0&0\end{pmatrix}, (5.28)

which is a matrix completion problem. Setting X0=(1000)X_{0}=\begin{pmatrix}1&0\\ 0&0\end{pmatrix}, we have

X0=(100[1,1]) and 𝕋0={(abc0)|a,b,c}.\partial\|X_{0}\|_{*}=\begin{pmatrix}1&0\\ 0&[-1,1]\end{pmatrix}\quad\mbox{ and }\quad\mathbb{T}_{0}=\left\{\begin{pmatrix}a&b\\ c&0\end{pmatrix}|\;a,b,c\in\mathbb{R}\right\}. (5.29)

Moreover, note that

KerΦ={(0bc0)|b,c} and ImΦ={(a00d)|a,d}\mbox{\rm Ker}\,\Phi=\left\{\begin{pmatrix}0&b\\ c&0\end{pmatrix}|\;b,c\in\mathbb{R}\right\}\quad\mbox{ and }\quad\mbox{\rm Im}\,\Phi^{*}=\left\{\begin{pmatrix}a&0\\ 0&d\end{pmatrix}|\;a,d\in\mathbb{R}\right\} (5.30)

Note that Δ(X0)=ImΦX0=X0\Delta(X_{0})=\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*}=\partial\|X_{0}\|_{*}. Hence X0X_{0} is a solution of problem (5.28). However, KerΦ𝕋0=KerΦ{0}\mbox{\rm Ker}\,\Phi\cap\mathbb{T}_{0}=\mbox{\rm Ker}\,\Phi\neq\{0\}, i.e., Restricted Injectivity (5.16) fails. Thus X0X_{0} is not a sharp solution of problem (5.28). Note that q(X0)=1q(X_{0})=1 and Y0=(1000)Δ(X0)Y_{0}=\begin{pmatrix}1&0\\ 0&0\end{pmatrix}\in\Delta(X_{0}) with p(Y0)=1p(Y_{0})=1. We have from (4.10) that

TN𝔹(Y0)(X0)={(a00d)|a,d}.T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\left\{\begin{pmatrix}a&0\\ 0&d\end{pmatrix}|\;a,d\in\mathbb{R}\right\}.

It is clear that KerΦTN𝔹(Y0)(X0)={0}\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\{0\}. This shows that X0X_{0} is a strong solution by Theorem 5.2. \hfill\Box
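The conclusion of this example can be double-checked numerically: here Ker Φ consists of the 2×2 matrices with zero diagonal, the tangent cone T_{N𝔹(Y0)}(X0) of the diagonal matrices, and two subspaces intersect trivially exactly when a stacked basis of both has full rank. A small sketch:

```python
import numpy as np

# Ker Phi in (5.30): matrices with X11 = X22 = 0.
ker_basis = [np.array([[0., 1.], [0., 0.]]).ravel(),
             np.array([[0., 0.], [1., 0.]]).ravel()]
# T_{N_B(Y0)}(X0) computed above: the diagonal matrices.
tan_basis = [np.diag([1., 0.]).ravel(), np.diag([0., 1.]).ravel()]

# Two 2-dimensional subspaces of R^{2x2} intersect only at 0 iff the
# stacked basis has rank 2 + 2 = 4.
M = np.array(ker_basis + tan_basis)
assert np.linalg.matrix_rank(M) == 4
```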

Example 5.11 (Checking strong minima numerically).

Let us consider the following matrix completion problem

minX3×3Xsubject toPΩ(X)=M0=def(424210400),\min_{X\in\mathbb{R}^{3\times 3}}\quad\|X\|_{*}\quad\mbox{subject to}\quad{\rm P}_{\Omega}(X)=M_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}4&2&4\\ 2&1&0\\ 4&0&0\end{pmatrix}, (5.31)

where PΩ{\rm P}_{\Omega} is the projection mapping defined by

PΩ(X)=def(X11X12X13X21X220X3100).{\rm P}_{\Omega}(X)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}X_{11}&X_{12}&X_{13}\\ X_{21}&X_{22}&0\\ X_{31}&0&0\end{pmatrix}.

Define

X0=def(424212424)andU=V=def13(221122212).X_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}4&2&4\\ 2&1&2\\ 4&2&4\end{pmatrix}\quad\mbox{and}\quad U=V\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{3}\begin{pmatrix}2&-2&1\\ 1&2&2\\ 2&1&-2\end{pmatrix}.

Note that U,VU,V are orthogonal matrices, PΩ(X0)=M0{\rm P}_{\Omega}(X_{0})=M_{0}, X0=U(900000000)VTX_{0}=U\begin{pmatrix}9&0&0\\ 0&0&0\\ 0&0&0\end{pmatrix}V^{T} is an SVD of X0X_{0}, and rank(X0)=1{\rm rank}\,(X_{0})=1. To check whether X0X_{0} is a sharp solution of problem (5.31), we compute the constant τ(E0)\tau(E_{0}) in (5.20) or ρ(E0)\rho(E_{0}) from problem (5.19). In this case the linear operator 𝒩\mathcal{N} in (5.19) is chosen as PΩP_{\Omega^{\perp}}, since KerPΩ=ImPΩ\mbox{\rm Ker}\,P_{\Omega^{\perp}}=\mbox{\rm Im}\,P_{\Omega}. With some linear algebra, we calculate τ(E0)=1.2>1\tau(E_{0})=1.2>1 with E0=X0/9E_{0}=X_{0}/9. Moreover, by using the cvx package to solve the spectral norm optimization problem (5.19), we find that the Source Coefficient ρ(E0)\rho(E_{0}) is exactly 11, together with an optimal solution Z0Z_{0}. Thus X0X_{0} is not a sharp solution of problem (5.31).

However, note further that Y0=Z0+E0KerPΩ=ImPΩY_{0}=Z_{0}+E_{0}\in\mbox{\rm Ker}\,P_{\Omega^{\perp}}=\mbox{\rm Im}\,P_{\Omega} is a dual certificate, which is computed by

Y0=(001010100).Y_{0}=\begin{pmatrix}0&0&1\\ 0&1&0\\ 1&0&0\end{pmatrix}.

Let us check the condition

KerΦTN𝔹(Y0)(X0)={0}\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\{0\} (5.32)

in (5.6). It follows from (4.3) that

Y0=UUTY0VVT=U(100001010)VT.Y_{0}=UU^{T}Y_{0}VV^{T}=U\begin{pmatrix}1&0&0\\ 0&0&1\\ 0&1&0\end{pmatrix}V^{T}.

The SVD of the submatrix (0110)\begin{pmatrix}0&1\\ 1&0\end{pmatrix} is certainly

(0110)(1001)(1001).\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}1&0\\ 0&1\end{pmatrix}\begin{pmatrix}1&0\\ 0&1\end{pmatrix}.

According to (4.8), (U¯,V¯)(\overline{U},\overline{V})\in\mathcal{O}(X_{0})\cap\mathcal{O}(Y_{0}) with

U¯=def13(212122221)andV¯=defV.\overline{U}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{3}\begin{pmatrix}2&1&-2\\ 1&2&2\\ 2&-2&1\end{pmatrix}\quad\mbox{and}\quad\overline{V}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}V.

By formula (4.10),

TN𝔹(Y0)(X0)={U¯(ABBTC)V¯T|A1×1,B1×2,C𝕊+2}.T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\left\{\overline{U}\begin{pmatrix}A&B\\ B^{T}&C\end{pmatrix}\overline{V}^{T}|\;A\in\mathbb{R}^{1\times 1},B\in\mathbb{R}^{1\times 2},C\in\mathbb{S}^{2}_{+}\right\}.

It follows that

={U¯(ABBTC)V¯T|A1×1,B1×2,C2×2}and={U¯(0BBT0)V¯T|B1×2}.\mathcal{E}=\left\{\overline{U}\begin{pmatrix}A&B\\ B^{T}&C\end{pmatrix}\overline{V}^{T}|\;A\in\mathbb{R}^{1\times 1},B\in\mathbb{R}^{1\times 2},C\in\mathbb{R}^{2\times 2}\right\}\;\mbox{and}\;\mathcal{E}^{\perp}=\left\{\overline{U}\begin{pmatrix}0&B\\ -B^{T}&0\end{pmatrix}\overline{V}^{T}|\;B\in\mathbb{R}^{1\times 2}\right\}.

To verify (5.32), we compute ζ(E0)\zeta(E_{0}), which is the optimal value of problem (4.38)

minZ2×2,W3×3Zsubject toPΩc(U¯JZVJT+W)=PΩc(E0)andW.\min_{Z\in\mathbb{R}^{2\times 2},W\in\mathbb{R}^{3\times 3}}\quad\|Z\|\quad\mbox{subject to}\quad P_{\Omega^{c}}(\overline{U}_{J}ZV_{J}^{T}+W)=-P_{\Omega^{c}}(E_{0})\quad\mbox{and}\quad W\in\mathcal{E}^{\perp}.

This is a convex optimization problem. By using cvx to solve it, we obtain that ζ(E0)=1/6\zeta(E_{0})=1/6. As ζ(E0)<1\zeta(E_{0})<1, X0X_{0} is a strong solution of problem (5.31). \hfill\Box
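The data of this example are easy to sanity-check numerically. The sketch below verifies that ‖X0‖* = 9 and that Y0 is a dual certificate via the standard subdifferential characterization (spectral norm at most 1 together with ⟨Y0, X0⟩ = ‖X0‖*, cf. (4.4)); the cvx computations themselves are not reproduced here.

```python
import numpy as np

X0 = np.array([[4., 2., 4.], [2., 1., 2.], [4., 2., 4.]])
Y0 = np.array([[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]])

sv = np.linalg.svd(X0, compute_uv=False)
nuc = sv.sum()
assert np.isclose(nuc, 9.0)                  # ||X0||_* = 9, rank(X0) = 1
assert np.linalg.norm(Y0, 2) <= 1 + 1e-12    # spectral norm of Y0 is 1
assert np.isclose(np.sum(Y0 * X0), nuc)      # <Y0, X0> = ||X0||_*
```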

The idea of checking strong minima numerically via Theorem 5.2 in the above example will be revisited in Section 6 for nuclear norm minimization problems of larger size.

In Corollary 5.12 below, we show that a large class of nuclear norm minimization problems satisfies both Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17), but not Restricted Injectivity (5.16).

Now let us consider a special case of problem (5.1)

minXn1×n2Xsubject toLX=M0,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\|X\|_{*}\quad\mbox{subject to}\quad LX=M_{0}, (5.33)

where LL and M0M_{0} are known q×n1q\times n_{1} and q×n2q\times n_{2} matrices, respectively. This is usually referred to as the low-rank representation problem [33]. It is well-known that the optimal solution to problem (5.33) is unique and given by LM0L^{\dagger}M_{0}, where LL^{\dagger} is the Moore-Penrose pseudoinverse of LL. In the following result, we strengthen this by showing that this unique solution is in fact a strong solution, but not necessarily a sharp one. Indeed, for this class, Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17) are satisfied, but Restricted Injectivity (5.16) is not.

Corollary 5.12 (Strong minima of low-rank representation problems).

Let LL be a q×n1q\times n_{1} matrix. If the linear system LX=M0LX=M_{0} is consistent, then Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17) hold at X0=defLM0X_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}L^{\dagger}M_{0} in problem (5.33).

Consequently, X0X_{0} is the strong solution of the low-rank representation problem (5.33).

Proof. Suppose that UΣVTU\Sigma V^{T} is a compact SVD of the matrix LL. Thus L=VΣ1UTL^{\dagger}=V\Sigma^{-1}U^{T} and LM0=VΣ1UTM0L^{\dagger}M_{0}=V\Sigma^{-1}U^{T}M_{0}. Let U0Σ0V0TU_{0}\Sigma_{0}V_{0}^{T} be a compact SVD of Σ1UTM0\Sigma^{-1}U^{T}M_{0} with Σ0r×r\Sigma_{0}\in\mathbb{R}^{r\times r}. Note that

(VU0)TVU0=U0TVTVU0=U0TU0=𝕀.(VU_{0})^{T}VU_{0}=U_{0}^{T}V^{T}VU_{0}=U_{0}^{T}U_{0}=\mathbb{I}.

It follows that VU0Σ0V0TVU_{0}\Sigma_{0}V_{0}^{T} is a compact SVD of X0X_{0}. By Lemma 4.1, we have E0=defVU0V0TriX0E_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}VU_{0}V_{0}^{T}\in\mbox{\rm ri}\,\partial\|X_{0}\|_{*}. Observe further that

E0=VU0V0T=VΣUTUΣ1U0V0T=LTUΣ1U0V0T=Φ(UΣ1U0V0T),E_{0}=VU_{0}V_{0}^{T}=V\Sigma U^{T}U\Sigma^{-1}U_{0}V_{0}^{T}=L^{T}U\Sigma^{-1}U_{0}V_{0}^{T}=\Phi^{*}(U\Sigma^{-1}U_{0}V_{0}^{T}),

which implies that E0ImΦriX0.E_{0}\in\mbox{\rm Im}\,\Phi^{*}\cap\mbox{\rm ri}\,\partial\|X_{0}\|_{*}. This verifies Nondegenerate Source Condition (5.17) and shows that X0X_{0} is an optimal solution of problem (5.33).

Next, let us check Strict Restricted Injectivity (5.25). For any WKerΦ(VU0)𝕊rV0TW\in\mbox{\rm Ker}\,\Phi\cap(VU_{0})\mathbb{S}^{r}V_{0}^{T} with r=rank(X0)r=\mbox{\rm rank}\,(X_{0}), we find some A𝕊rA\in\mathbb{S}^{r} such that W=VU0AV0TW=VU_{0}AV_{0}^{T}. We have

0=Φ(W)=LW=UΣVTVU0AV0T=UΣU0AV0T.0=\Phi(W)=LW=U\Sigma V^{T}VU_{0}AV_{0}^{T}=U\Sigma U_{0}AV_{0}^{T}.

It follows that

0=U0TΣ1UT(UΣU0AV0T)V0=U0TΣ1ΣU0A=U0TU0A=A,0=U_{0}^{T}\Sigma^{-1}U^{T}(U\Sigma U_{0}AV_{0}^{T})V_{0}=U_{0}^{T}\Sigma^{-1}\Sigma U_{0}A=U_{0}^{T}U_{0}A=A,

which also implies that W=0W=0. This verifies Strict Restricted Injectivity (5.25).

Consequently, according to Corollary 5.7, X0X_{0} is the strong solution of problem (5.33). \hfill\Box
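Corollary 5.12 lends itself to a quick randomized check (the random data and problem sizes below are our own choices): X0 = L†M0 is feasible, and perturbing it within Ker L should never decrease the nuclear norm, since X0 is the unique minimizer.

```python
import numpy as np

rng = np.random.default_rng(2)
q, n1, n2 = 2, 4, 3
L = rng.standard_normal((q, n1))
M0 = L @ rng.standard_normal((n1, n2))       # a consistent system LX = M0

X0 = np.linalg.pinv(L) @ M0                  # the claimed strong solution
assert np.allclose(L @ X0, M0)               # feasibility

# Randomized spot check of minimality: perturb within Ker L and compare
# nuclear norms; X0 should never be beaten.
P_ker = np.eye(n1) - np.linalg.pinv(L) @ L   # projector onto Ker L
nuc = lambda X: np.linalg.svd(X, compute_uv=False).sum()
for _ in range(100):
    W = P_ker @ rng.standard_normal((n1, n2))
    assert nuc(X0 + W) >= nuc(X0) - 1e-9
```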

In the following simple example, we show that the unique solution to (5.33) may not be a sharp solution in the sense of (5.18).

Example 5.13 (Unique solutions of low-rank representation problems are not sharp).

Consider the following optimization problem

minX2×2Xsubject to(11)X=(10).\min_{X\in\mathbb{R}^{2\times 2}}\qquad\|X\|_{*}\qquad\mbox{subject to}\qquad\begin{pmatrix}1&1\end{pmatrix}X=\begin{pmatrix}1&0\end{pmatrix}. (5.34)

As L=(11)L=\begin{pmatrix}1&1\end{pmatrix} and M0=(10)M_{0}=\begin{pmatrix}1&0\end{pmatrix}, the unique solution to problem (5.34) is

X0=LM0=(0.50.5)(10)=(0.500.50).X_{0}=L^{\dagger}M_{0}=\begin{pmatrix}0.5\\ 0.5\end{pmatrix}\begin{pmatrix}1&0\end{pmatrix}=\begin{pmatrix}0.5&0\\ 0.5&0\end{pmatrix}.

For ε>0\varepsilon>0, define Xε=def(0.5+ε00.5ε0)X_{\varepsilon}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}0.5+\varepsilon&0\\ 0.5-\varepsilon&0\end{pmatrix} and note that XεX_{\varepsilon} satisfies the linear constraint in (5.34). We have

Xε=(0.5+ε)2+(0.5ε)2=0.5+2ε2.\|X_{\varepsilon}\|_{*}=\sqrt{(0.5+\varepsilon)^{2}+(0.5-\varepsilon)^{2}}=\sqrt{0.5+2\varepsilon^{2}}.

It follows that

XεX0XεX0F=(0.5+ε)2+(0.5ε)20.52ε2=0.5+2ε20.52ε2=2ε0.5+2ε2+0.5.\dfrac{\|X_{\varepsilon}\|_{*}-\|X_{0}\|_{*}}{\|X_{\varepsilon}-X_{0}\|_{F}}=\dfrac{\sqrt{(0.5+\varepsilon)^{2}+(0.5-\varepsilon)^{2}}-\sqrt{0.5}}{\sqrt{2\varepsilon^{2}}}=\dfrac{\sqrt{0.5+2\varepsilon^{2}}-\sqrt{0.5}}{\sqrt{2\varepsilon^{2}}}=\dfrac{\sqrt{2}\varepsilon}{\sqrt{0.5+2\varepsilon^{2}}+\sqrt{0.5}}.

This shows that X0X_{0} is not a sharp solution in the sense of (5.18). The underlying reason is that Restricted Injectivity (5.16) is not satisfied for this problem. \hfill\Box
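The vanishing quotient can also be observed numerically. The sketch below evaluates the ratio in (5.18) for shrinking ε and matches it against the closed form computed above.

```python
import numpy as np

nuc = lambda X: np.linalg.svd(X, compute_uv=False).sum()
X0 = np.array([[0.5, 0.], [0.5, 0.]])
for eps in [1e-1, 1e-2, 1e-3]:
    Xe = np.array([[0.5 + eps, 0.], [0.5 - eps, 0.]])
    ratio = (nuc(Xe) - nuc(X0)) / np.linalg.norm(Xe - X0)  # Frobenius norm
    # Closed form from the example: sqrt(2)*eps / (sqrt(0.5 + 2 eps^2) + sqrt(0.5)).
    pred = np.sqrt(2) * eps / (np.sqrt(0.5 + 2 * eps**2) + np.sqrt(0.5))
    assert np.isclose(ratio, pred)
# The ratio tends to 0 with eps, so no constant c > 0 works in (5.18).
assert ratio < 1e-2
```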

A problem closely related to (5.33) is

\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad h(LX)+\mu\|X\|_{*}, (5.35)

where the function $h:\mathbb{R}^{q\times n_{2}}\to[0,\infty]$ satisfies the standing assumptions in Section 4 with open domain and $\mu$ is a positive number. This is a particular case of (4.1). Next we show that strong minima also occur in this problem.

Corollary 5.14.

Problem (5.35) has a unique and strong solution.

Proof. It is easy to see that problem (5.35) has at least one optimal solution $\overline{X}$. Let $U\Sigma V^{T}$ be a compact SVD of $L$ and define

\overline{Y}\stackrel{\text{\rm\tiny def}}{=}-\dfrac{1}{\mu}L^{T}\nabla h(L\overline{X})=-\dfrac{1}{\mu}V\Sigma U^{T}\nabla h(L\overline{X})\in\partial\|\overline{X}\|_{*}.

Let $U_{1}\Sigma_{1}V_{1}^{T}$ be a compact SVD of $-\dfrac{1}{\mu}\Sigma U^{T}\nabla h(L\overline{X})$ with $\Sigma_{1}\in\mathbb{R}^{p\times p}$. As $(VU_{1})^{T}(VU_{1})=\mathbb{I}$, it follows that $VU_{1}\Sigma_{1}V_{1}^{T}$ is a compact SVD of $\overline{Y}$. By (4.5), we can find $\overline{A}\in\mathbb{S}_{+}^{p}$ such that $\overline{X}=VU_{1}\overline{A}V_{1}^{T}$, which means

\overline{A}=U^{T}_{1}V^{T}\overline{X}V_{1}.

Next let us estimate $T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})$. For any $W\in T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})$, we find sequences $t_{k}\downarrow 0$ and $W_{k}\to W$ satisfying $\overline{X}+t_{k}W_{k}\in N_{\mathbb{B}}(\overline{Y})\subset VU_{1}\mathbb{S}_{+}^{p}V_{1}^{T}$ by Lemma 4.1 again. It follows that

W_{k}\in\frac{1}{t_{k}}VU_{1}(\mathbb{S}^{p}_{+}-\overline{A})V_{1}^{T}\subset VU_{1}\mathbb{S}^{p}V_{1}^{T},

which implies that $T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})\subset VU_{1}\mathbb{S}^{p}V_{1}^{T}$. We claim next that $\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\{0\}$. Indeed, for any $W\in T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})$ with $\Phi(W)=0$, we find $B\in\mathbb{S}^{p}$ such that $W=VU_{1}BV_{1}^{T}$ and

0=\Phi(W)=LW=U\Sigma V^{T}VU_{1}BV_{1}^{T}=U\Sigma U_{1}BV_{1}^{T}.

Hence we have

U_{1}BV_{1}^{T}=\Sigma^{-1}U^{T}U\Sigma U_{1}BV_{1}^{T}=0.

This implies that $W=0$ and verifies the claim. By Corollary 4.2, $\overline{X}$ is the strong solution of problem (5.35). \hfill\Box

6 Numerical Experiments

In this section, we perform numerical experiments to demonstrate strong minima, sharp minima, and solution uniqueness for the nuclear norm minimization problem (1.1). The experiments were conducted for different matrix ranks $r$ and numbers of measurements $m$ for $M_{0}$. Throughout the section, we also discuss how to use our conditions to check strong minima for problem (1.1).

6.1 Experiment 1

In the first experiment, we generate $X_{0}$, an $n\times n$ matrix of rank $r$, by sampling two factors $W\in\mathbb{R}^{n\times r}$ and $H\in\mathbb{R}^{n\times r}$ with independent and identically distributed (i.i.d.) random entries and setting $X_{0}=WH^{*}$. We vectorize problem (1.1) in the following form:

\min_{X\in\mathbb{R}^{n\times n}}\quad\|X\|_{*}\quad\mbox{subject to}\quad\Phi\,\text{vec}(X)=\Phi\,\text{vec}(X_{0}), (6.1)

where $\Phi\in\mathbb{R}^{m\times n^{2}}$ is drawn from the standard Gaussian ensemble, i.e., its entries are i.i.d. from a zero-mean unit-variance Gaussian distribution. We declare $X_{0}$ to be recovered (a solution to (6.1)) if $\|X_{\text{opt}}-X_{0}\|_{F}/\|X_{0}\|_{F}<10^{-3}$, as proposed in [15]. To check sharp minima, we verify Restricted Injectivity (5.16) and compute $\tau(X_{0})$ in (5.20) or the Source Coefficient $\rho(X_{0})$ in (5.19); see [24] or Remark 5.5. To verify strong minima at $X_{0}$, we use the Strong Source Coefficient $\zeta(X_{0})$ in (4.27) or (4.38).

Specifically, let $U_{0}\begin{pmatrix}\Sigma_{0}&0\\ 0&0\end{pmatrix}V_{0}^{T}$ be a full SVD of $X_{0}$. We denote by $u_{i}$ and $v_{j}$ the $i$th and $j$th columns of $U_{0}$ and $V_{0}$ for $1\leq i,j\leq n$. Note from the formula of the tangent model space $\mathbb{T}_{0}$ in (5.26) that $\mathcal{B}=\{u_{i}v_{j}^{T}:(i,j)\notin[r+1,n]\times[r+1,n]\}$ forms a basis for $\mathbb{T}_{0}$. Thus, Restricted Injectivity holds if $\mbox{\rm rank}\,\Phi B=r(2n-r)$, where $B$ is the matrix whose columns are the vectorized elements of the basis $\mathcal{B}$.
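The rank test just described takes only a few lines of numpy; here is a sketch (function and variable names are ours, not the paper's code), with $B$ assembled column-by-column from the vectorized basis elements $u_{i}v_{j}^{T}$:

```python
import numpy as np

def restricted_injectivity(Phi, U0, V0, r, tol=1e-8):
    """Check rank(Phi @ B) == r(2n - r), where the columns of B are the
    vectorized basis elements u_i v_j^T of the tangent model space T0
    (all (i, j) outside the bottom-right (n-r) x (n-r) block)."""
    n = U0.shape[0]
    cols = []
    for i in range(n):
        for j in range(n):
            if i < r or j < r:  # (i, j) indexes a basis element of T0
                cols.append(np.outer(U0[:, i], V0[:, j]).flatten(order='F'))
    B = np.stack(cols, axis=1)                 # n^2 x r(2n - r)
    return np.linalg.matrix_rank(Phi @ B, tol=tol) == r * (2 * n - r)

# Toy example: a Gaussian Phi with m >= r(2n - r) satisfies the test generically.
rng = np.random.default_rng(0)
n, r = 6, 2
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U0, _, V0T = np.linalg.svd(X0)
m = r * (2 * n - r) + 3
Phi = rng.standard_normal((m, n * n))
ok = restricted_injectivity(Phi, U0, V0T.T, r)
```

When $m$ falls below $r(2n-r)$, the rank of $\Phi B$ is at most $m$ and the test necessarily fails.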

To compute $\tau(X_{0})$, let $U\Sigma V^{T}$ be a full SVD of $\Phi$ and denote by $V_{G}$ the matrix whose columns are the last $n^{2}-m$ columns of $V$, which span $\mbox{\rm Ker}\,\Phi$ since $\Phi$ has full row rank almost surely. We then solve the following vectorized problem for the optimal solution $Z^{*}$ using the cvxpy package and compute $\tau(X_{0})=\|Z^{*}\|$:

\min_{Z\in\mathbb{T}_{0}^{\perp}}\|Z\|_{F}\quad\mbox{subject to}\quad N\,\text{vec}(Z)=-N\,\text{vec}(E_{0}), (6.2)

where $N=V_{G}^{T}$ and $\mathbb{T}_{0}^{\perp}$ is given by

\mathbb{T}_{0}^{\perp}=\left\{U_{0}\begin{pmatrix}0&0\\ 0&D\end{pmatrix}V_{0}^{T}\,\middle|\;D\in\mathbb{R}^{(n-r)\times(n-r)}\right\}. (6.3)
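Because the last $n-r$ columns of $U_{0}$ and $V_{0}$ are orthonormal, $\|Z\|_{F}=\|D\|_{F}$ for $Z\in\mathbb{T}_{0}^{\perp}$, so (6.2) is a minimum-norm linear least-squares problem that a pseudoinverse solves in closed form, without a conic solver. A numpy sketch on toy sizes (all names are ours; column-major vec is assumed, so that $\text{vec}(UDV^{T})=(V\otimes U)\text{vec}(D)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 6, 1, 24
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U0, _, V0T = np.linalg.svd(X0)           # full SVD of X0
V0 = V0T.T
E0 = U0[:, :r] @ V0[:, :r].T

Phi = rng.standard_normal((m, n * n))
N = np.linalg.svd(Phi)[2][m:, :]         # N = V_G^T: rows span Ker(Phi)

# vec(Z) = K vec(D) for Z = U0 [[0,0],[0,D]] V0^T, column-major vec
K = np.kron(V0[:, r:], U0[:, r:])
d = np.linalg.pinv(N @ K) @ (-N @ E0.flatten(order='F'))
D = d.reshape((n - r, n - r), order='F')
Z = U0[:, r:] @ D @ V0[:, r:].T          # minimum-Frobenius-norm solution of (6.2)
tau = np.linalg.norm(Z, 2)               # tau(X0) = spectral norm of Z*
```

Since the columns of $K$ are orthonormal, the pseudoinverse of $NK$ applied to the right-hand side directly gives the minimum-Frobenius-norm feasible $D$.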

To calculate $\rho(X_{0})$, we solve the following vectorized version of (5.19) for the optimal value $\rho(X_{0})$ using the cvxpy package:

\min_{Z\in\mathbb{T}_{0}^{\perp}}\|Z\|\quad\mbox{subject to}\quad N\,\text{vec}(Z)=-N\,\text{vec}(E_{0}), (6.4)

where $N$ and $\mathbb{T}_{0}^{\perp}$ are determined in (6.2) and (6.3), respectively. $X_{0}$ is a sharp solution of problem (6.1) if either $\tau(X_{0})$ or $\rho(X_{0})$ is smaller than $1$; see Remark 5.5. Due to possible (small) numerical errors, we classify sharp minima when $X_{0}$ is recovered and either $\tau(X_{0})<0.99$ or $\rho(X_{0})<0.95$.

To classify strong minima, we consider the cases when $X_{0}$ is recovered, $\tau(X_{0})>0.99$, and $0.95<\rho(X_{0})<1.05$. Let $Z_{0}$ be an optimal solution of problem (6.4) expressed in the following form:

Z_{0}=U_{0}\begin{pmatrix}0&0\\ 0&D_{0}\end{pmatrix}V_{0}^{T}\quad\mbox{and}\quad Y_{0}=U_{0}\begin{pmatrix}\mathbb{I}&0\\ 0&D_{0}\end{pmatrix}V_{0}^{T} (6.5)

with some $D_{0}\in\mathbb{R}^{(n-r)\times(n-r)}$. Note that $Y_{0}$ is a dual certificate of $X_{0}$. According to Theorem 5.2, $X_{0}$ is a strong solution provided that

\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\{0\}. (6.6)

By Theorem 4.6, this condition holds when Restricted Injectivity is satisfied and the Strong Source Coefficient $\zeta(X_{0})$, the optimal value of problem (4.27) or (4.38), is smaller than $1$. Let $\widehat{U}\widehat{\Sigma}\widehat{V}^{T}$ be an SVD of $D_{0}$. We write $U_{0}=[U_{I}\;U_{J}]$ and $V_{0}=[V_{I}\;V_{J}]$, where $U_{I}$ and $V_{I}$ are the first $r$ columns of $U_{0}$ and $V_{0}$, respectively. Defining $\overline{U}=[U_{I}\;U_{J}\widehat{U}]$ and $\overline{V}=[V_{I}\;V_{J}\widehat{V}]$, it follows that $(\overline{U},\overline{V})\in\mathcal{O}(X_{0})\cap\mathcal{O}(Y_{0})$ by (4.8). To compute $\zeta(X_{0})$, we solve the vectorized problem of (4.38):

\min_{Z\in\mathbb{T}_{0}^{\perp},\,W\in\mathcal{E}^{\perp}}\|Z\|\quad\mbox{subject to}\quad N\,\text{vec}(Z+E_{0}+W)=0, (6.7)

where $\mathbb{T}_{0}^{\perp}$ is determined in (6.3), $E_{0}=U_{I}V_{I}^{T}$, and $\mathcal{E}^{\perp}$ is taken from (4.37):

\mathcal{E}^{\perp}=\left\{\overline{U}\begin{pmatrix}A&B&C\\ -B^{T}&0&0\\ D&0&0\end{pmatrix}\overline{V}^{T}\,\middle|\;A\in\mathbb{V}_{r},\,B\in\mathbb{R}^{r\times(p-r)},\,C\in\mathbb{R}^{r\times(n-p)},\,D\in\mathbb{R}^{(n-p)\times r}\right\}. (6.8)

We classify strong (non-sharp) minima if $\zeta(X_{0})<0.95$.

To illustrate the occurrence of strong minima, sharp minima, and solution uniqueness in problem (6.1), we plot in Figure 1 the proportion of each situation with respect to the number of measurements. For fixed $n=40$ and each $2\leq r\leq 7$, at each number of measurements $m$ we study 100 random cases and record the percentage of cases that are recovered, sharply recovered, and strongly (not sharply) recovered in black, blue, and red curves, respectively. Observe that the percentage of cases where $X_{0}$ is a strong (not sharp) solution peaks at approximately 40% and exceeds the percentage of sharp minima when the number of measurements is not large enough. This phenomenon occurs at different measurement levels for different ranks, indicating a significant number of cases with strong (not sharp) solutions. Additionally, higher ranks require more measurements to achieve the highest percentage of cases with strong (not sharp) solutions.

We also plot the average values of $\tau(X_{0})$, IC($X_{0}$) (Irrepresentability Criterion (5.21)), $\rho(X_{0})$, and $\zeta(X_{0})$ for each number of measurements in Figure 2 for different ranks. Using $\rho(X_{0})$ to check sharp minima appears to capture more cases than using $\tau(X_{0})$. Moreover, $\zeta(X_{0})$ is significantly smaller than both $\tau(X_{0})$ and $\rho(X_{0})$, while IC($X_{0}$) is slightly greater than $\tau(X_{0})$.

Figure 1: Proportions of cases for which $X_{0}$ is a solution, sharp solution, and strong (not sharp) solution with respect to the number of measurements.

Figure 2: Evolution of the average values of $\tau(E_{0})$, the Source Coefficient $\rho(E_{0})$, and the Strong Source Coefficient $\zeta(E_{0})$ with respect to the number of measurements.

6.2 Experiment 2

In the second experiment, we study the following matrix completion problem:

\min_{X\in\mathbb{R}^{n\times n}}\quad\|X\|_{*}\quad\mbox{subject to}\quad X_{ij}=(X_{0})_{ij},\quad(i,j)\in\Omega. (6.9)

The procedure is similar to the first experiment. We again generate $X_{0}$, an $n\times n$ matrix of rank $r$, by sampling two factors $W\in\mathbb{R}^{n\times r}$ and $H\in\mathbb{R}^{n\times r}$ with i.i.d. random entries and setting $X_{0}=WH^{*}$. However, this time we sample an index subset $\Omega$ of $m$ entries uniformly at random from $[n]\times[n]$. The cvxpy package is again used to solve problem (6.9) for an optimal solution $X_{\text{opt}}$, and $X_{0}$ is said to be recovered if $\|X_{\text{opt}}-X_{0}\|_{F}/\|X_{0}\|_{F}<10^{-3}$. To classify sharp minima, we check Restricted Injectivity (5.16) (as done in the first experiment), compute $\tau(X_{0})$ in (5.20) or the Source Coefficient $\rho(X_{0})$ in (5.19), and require $\tau(X_{0})\leq 0.99$ or $\rho(X_{0})\leq 0.95$.

Specifically, to calculate $\tau(E_{0})$, we proceed as follows. As in the first experiment, denote by $u_{i}$ and $v_{j}$ the $i$th and $j$th columns of $U_{0}$ and $V_{0}$, where $U_{0}DV_{0}^{T}$ is a full SVD of $X_{0}$. We define $B_{ij}=P_{\Omega}(u_{i}v_{j}^{T})$ for all $(i,j)\in\Gamma\stackrel{\text{\rm\tiny def}}{=}\{(i,j)\in[n]\times[n]\,|\;(i,j)\notin[r+1,n]\times[r+1,n]\}$, where $P_{\Omega}$ is the projection mapping defined by $P_{\Omega}(X)_{ij}=X_{ij}$ if $(i,j)\in\Omega$ and $0$ otherwise. Then we solve the following linear system for $\alpha_{ij}$:

\sum_{(i,j)\in\Gamma}\alpha_{ij}P_{\Gamma}\left(U_{0}^{T}B_{ij}V_{0}\right)=\begin{pmatrix}I_{r}&0\\ 0&0\end{pmatrix}.

It is not difficult to obtain from (5.20) that

\tau(E_{0})=\left\|Y-E_{0}\right\|,

where $Y=\sum_{(i,j)\in\Gamma}\alpha_{ij}B_{ij}$ and $E_{0}=U_{0}\begin{pmatrix}I_{r}&0\\ 0&0\end{pmatrix}V_{0}^{T}$.
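These steps can be sketched in numpy on toy sizes (all names are ours; the linear system is solvable when $P_{\Omega}$ is injective on $\mathbb{T}_{0}$, which holds generically when enough entries are observed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 6, 2, 32
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U0, _, V0T = np.linalg.svd(X0)
V0 = V0T.T
E0 = U0[:, :r] @ V0[:, :r].T

# Omega: m observed entries sampled uniformly without replacement
mask = np.zeros((n, n), dtype=bool)
idx = rng.choice(n * n, size=m, replace=False)
mask[np.unravel_index(idx, (n, n))] = True

gamma = [(i, j) for i in range(n) for j in range(n) if i < r or j < r]
A = np.zeros((len(gamma), len(gamma)))
Bs = []
for c, (i, j) in enumerate(gamma):
    Bij = np.where(mask, np.outer(U0[:, i], V0[:, j]), 0.0)   # B_ij = P_Omega(u_i v_j^T)
    Bs.append(Bij)
    M = U0.T @ Bij @ V0
    A[:, c] = [M[p, q] for (p, q) in gamma]                   # entries of P_Gamma(U0^T B_ij V0)
rhs = np.array([1.0 if (p == q and p < r) else 0.0 for (p, q) in gamma])
alpha = np.linalg.lstsq(A, rhs, rcond=None)[0]                # solve for alpha_ij
Y = sum(a * B for a, B in zip(alpha, Bs))
tau = np.linalg.norm(Y - E0, 2)                               # tau(E0) = ||Y - E0||
```

A sanity check: when all entries are observed, $A$ is the identity, $Y=E_{0}$, and $\tau(E_{0})=0$.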

To compute $\rho(X_{0})$, we adapt problem (5.19) to the matrix completion setting as follows:

\min_{Z\in\mathbb{T}_{0}^{\perp}}\|Z\|\quad\mbox{subject to}\quad Z_{ij}=-(E_{0})_{ij},\quad(i,j)\notin\Omega (6.10)

and solve it using the cvxpy package for the optimal value $\rho(X_{0})$, where $\mathbb{T}_{0}^{\perp}$ is determined in (6.3).

Similarly to (6.5), let $Z_{0}$ be an optimal solution of problem (6.10) and $Y_{0}=Z_{0}+E_{0}$ the corresponding dual certificate of $X_{0}$. To check whether $X_{0}$ is a strong solution, we only need to verify (6.6) with $\mbox{\rm Ker}\,\Phi=\mbox{\rm Ker}\,P_{\Omega}$ by Theorem 5.2. According to Theorem 4.6, the latter holds under Restricted Injectivity and $\zeta(X_{0})<1$, where $\zeta(X_{0})$ is the optimal value of the following problem, a version of (4.38) for matrix completion:

\min_{Z\in\mathbb{T}_{0}^{\perp},\,W\in\mathcal{E}^{\perp}}\|Z\|\quad\mbox{subject to}\quad(Z+E_{0}+W)_{ij}=0,\quad(i,j)\notin\Omega. (6.11)

Here $\mathbb{T}_{0}^{\perp}$ and $\mathcal{E}^{\perp}$ are defined in (6.3) and (6.8). We classify a case as strong minima (strong recovery) if $X_{0}$ is recovered and $\zeta(X_{0})\leq 0.95$, but $\tau(X_{0})>0.99$ and $0.95<\rho(X_{0})<1.05$. In Figure 3, we plot the proportion of cases when $X_{0}$ is a unique solution, sharp solution, and strong (not sharp) solution against the number of measurements. Additionally, Figure 4 displays the average values of $\tau(X_{0})$, $\rho(X_{0})$, and $\zeta(X_{0})$ with respect to the number of measurements.

Based on Figure 3, the highest percentage of cases where $X_{0}$ is a strong solution (but not a sharp one) is only around 15%, much smaller than the 40% in Experiment 1. A possible reason is the special structure of the linear operator $\Phi=P_{\Omega}$ in (6.9): each row of its matrix representation contains a single entry equal to 1 and the remaining entries are 0, and these unit entries lie in distinct columns. Similar to Experiment 1, we found that higher ranks require more measurements to achieve the highest percentage of cases with strong (not sharp) solutions.

As shown in Figure 4, the average values of $\tau(X_{0})$, $\rho(X_{0})$, and $\zeta(X_{0})$ vary with the number of measurements. The gap between the curves of $\tau(X_{0})$ and $\rho(X_{0})$ is less noticeable than in Experiment 1, but the curve of $\zeta(X_{0})$ remains significantly lower than both $\tau(X_{0})$ and $\rho(X_{0})$.

Figure 3: Proportions of cases for which $X_{0}$ is a solution, sharp solution, and strong (not sharp) solution with respect to the number of measurements.

Figure 4: Evolution of the average values of $\tau(E_{0})$, the Source Coefficient $\rho(E_{0})$, and the Strong Source Coefficient $\zeta(E_{0})$ with respect to the number of measurements.

References

  • [1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp: Living on the edge: A geometric theory of phase transitions in convex optimization, IMA Inform. Inference, 3 (2014), 224–294.
  • [2] F. J. Aragón Artacho and M. H. Geoffroy: Characterizations of metric regularity of subdifferentials, J. Convex Anal., 15 (2008), 365–380.
  • [3] B. P. W. Ames and S. A. Vavasis: Nuclear norm minimization for the planted clique and biclique problems, Math. Program., 129 (2011), 69–89.
  • [4] A. Ben-Tal. Second order and related extremality conditions in nonlinear programming. J. Optim. Theory Appl., 31 (1980) 143–165.
  • [5] A. Ben-Tal and J. Zowe: A unified theory of first- and second order conditions for extremum problems in topological vector spaces. Math. Program. 19 (1982), 39–76.
  • [6] J. F. Bonnans and A. D. Ioffe: Second order sufficiency and quadratic growth for non-isolated minima, Math. Oper. Res., 20 (1995), 801–817.
  • [7] J. F. Bonnans, R. Cominetti, and A. Shapiro: Second order optimality conditions based on parabolic second order tangent sets, SIAM J. Optim., 9 (1999), 466–492.
  • [8] J. F. Bonnans and A. Shapiro: Perturbation Analysis of Optimization Problems, Springer, 2000.
  • [9] J. Bolte, T. P. Nguyen, J. Peypouquet, B. W. Suter: From error bounds to the complexity of first-order descent methods for convex functions, Math. Program., 165 (2017), 471–507.
  • [10] J. Burke: Second order necessary and sufficient conditions for convex composite NDO, Math. Program., 38 (1987), 287-302.
  • [11] P. M. Cohn: Universal algebra, Springer Science & Business Media, 2012.
  • [12] L. Cromme: Strong uniqueness. A far-reaching criterion for the convergence analysis of iterative procedures, Numer. Math., 29(2) (1977/78), 179–193
  • [13] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky: The convex geometry of linear inverse problems, Found Comput Math, 12 (2012), 805–849.
  • [14] E. Candès and Y. Plan: Matrix completion with noise, Proceeding of the IEEE, 98 (2010), 925–936.
  • [15] E. Candès and B. Recht: Exact matrix completion via convex optimization Found. Comput. Math., 9 (2009), 717–772.
  • [16] E. Candès and B. Recht: Simple bounds for recovering low-complexity models, Math. Program., 141 (2013), 577–589.
  • [17] Y. Cui and C. Ding: Nonsmooth composite matrix optimization: strong regularity, constraint nondegeneracy and beyond, ArXiv, arXiv:1907.13253
  • [18] Y. Cui, C. Ding, and X. Zhao: Quadratic growth conditions for convex matrix optimization problems associated with spectral functions, SIAM J. Optim., 27 (2017), 2332–2355.
  • [19] Y. Cui, D. Sun, and K-C. Toh: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming, Math. Program., 178 (2019),381–415
  • [20] C. Ding: Variational analysis of the Ky Fan k-norm, Set-Valued and Var. Anal., 25 (2017), 265–296.
  • [21] D. Drusvyatskiy, B. S. Mordukhovich, and T. T. A. Nghia: Second order growth, tilt stability, and metric regularity of the subdifferential, J. Convex Anal., 21 (2014), 1165–1192.
  • [22] A. V. Fiacco and G. P. McCormick: Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
  • [23] J. Fadili, J. Malick, G. Peyré: Sensitivity Analysis for Mirror-Stratifiable Convex Functions, SIAM J. Optim., 28 (2018), 2975—3000.
  • [24] J. Fadili, T. T. A. Nghia, and T. T. T. Tran: Sharp, strong and unique minimizers for low complexity robust recovery, IMA Inf. Inference, 12 (2023).
  • [25] M. Grasmair, M. Haltmeier, and O. Scherzer: Necessary and sufficient conditions for linear convergence of 1\ell_{1} regularization, Comm. Pure Applied Math. 64(2011), 161–182.
  • [26] J. Harris: Algebraic Geometry, Springer, 1992.
  • [27] C. J. Hsieh and P. Olsen: Nuclear norm minimization via active subspace selection. ICML, 32 (2014), 575–583.
  • [28] T. Hoheisel and E. Paquette: Uniqueness in nuclear norm minimization: Flatness of the nuclear norm sphere and simultaneous polarization, J. Optim. Theo. Appl., 197 (2023), 252-276.
  • [29] J. Liang, J. Fadili, G. Peyré: Activity identification and local linear convergence of forward-backward-type methods, SIAM J. Optim., 27 (2017), 408–437.
  • [30] A. S. Lewis and H. S. Sendov: Nonsmooth analysis of singular values. I. Theory. Set-Valued Anal., 13 (2005), 213–241.
  • [31] A. S. Lewis and H. S. Sendov: Nonsmooth analysis of singular values. II. Applications, Set-Valued Anal., 13 (2005), 243–264.
  • [32] Z.-Q. Luo and P. Tseng: Error bounds and convergence analysis of feasible descent methods: a general approach, Ann. Oper. Res., 46 (1993), 157–178.
  • [33] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma: Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2012), 171-184.
  • [34] A. D. Ioffe. Necessary and sufficient conditions for a local minimum III: Second order conditions and augmented duality. SIAM J. Control Optim., 17 (1979), 266–286.
  • [35] A. D. Ioffe: Variational analysis of a composite function: A formula for the lower second order epi-derivative, J. Math. Anal. Appl., 160 (1991), 379–405.
  • [36] B. S. Mordukhovich: Variational Analysis and Generalized Differentiation, Springer, 2006.
  • [37] A. Mohammadi and E. Sarabi: Parabolic regularity of spectral functions. Part I: Theory. arXiv:2301.04240
  • [38] S. Nam, M. E. Davies, M. Elad, R. Gribonval: The cosparse analysis model and algorithms, Applied and Computational Harmonic Analysis, 34 (2013), 30–56.
  • [39] B. T. Polyak. Sharp minima. Technical report, Institute of Control Sciences Lecture Notes, Moscow, USSR, 1979.
  • [40] B. Recht, M. Fazel, and P. Parrilo: Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Rev., 52 (2010), 471–501.
  • [41] R. T. Rockafellar: Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
  • [42] R. T. Rockafellar: Second order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Trans. Amer. Math. Soc., 307 (1988), 75–108.
  • [43] R. T. Rockafellar and R. J-B. Wets: Variational Analysis, Springer, Berlin, 1998.
  • [44] M. Studniarski and D. E. Ward: Weak sharp minima and suffficient conditions, SIAM J. Control Optim., 38 (1999), 219–236.
  • [45] S. Vaiter, G. Peyré, and J. Fadili: Model consistency of partly smooth regularizers. IEEE Transactions on Information Theory, 64 (2017), 1725–1737.
  • [46] S. Vaiter, M. Golbabaee, J. Fadili, and G. Peyré: Model selection with low complexity priors. Information and Inference: A Journal of the IMA, 4 (2015), 230–287.
  • [47] S. Vaiter, G. Peyré, and J. Fadili: Low complexity regularization of linear inverse problems. In Sampling Theory, a Renaissance, 103–153. Birkhäuser, Cham, 2015.
  • [48] G. A. Watson: Characterization of the subdifferential of some matrix norms, Linear Algebra Appl., 170 (1992), 33–45
  • [49] D. E. Ward: Characterizations of strict local minima and necessary conditions for weak sharp minima, J. Optim. Theory and Appl., 80 (1994), 551–571.
  • [50] Z. Zhou and A. M-C. So: A unified approach to error bounds for structured convex optimization, Math. Program., 165 (2017), 689–728.
  • [51] L. Zhang, N. Zhang, and X. Xiao: On the second order directional derivatives of singular values of matrices and symmetric matrix-valued functions, Set-Valued Var. Anal., 21 (2013), 557–586.
  • [52] R. Zhang and J. Treiman: Upper-Lipschitz multifunctions and inverse subdifferentials, Nonlinear Anal., 24 (1995), 273–286.
  • [53] X. Wang, J. J. Ye, X. Yuan, S. Zeng, and J. Zhang.: Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis, Set-Valued and Variational Analysis, 30 (2022), 39–79.