
Geometric characterizations for strong minima with applications to nuclear norm minimization problems

Jalal Fadili (Normandie Université, ENSICAEN, UNICAEN, CNRS, GREYC, France; email: [email protected]),    Tran T. A. Nghia (Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA; email: [email protected]),    and    Duy Nhat Phan (Department of Mathematics and Statistics, University of Massachusetts Lowell, Lowell, MA 01854, USA; email: [email protected])

Abstract. In this paper, we introduce several geometric characterizations for strong minima of optimization problems. Applying these results to nuclear norm minimization problems allows us to obtain new necessary and sufficient quantitative conditions for this important property. Our characterizations for strong minima are weaker than the Restricted Injectivity and Nondegenerate Source Condition, which are usually used to identify solution uniqueness of nuclear norm minimization problems. Consequently, we obtain the minimum (tight) bound on the number of measurements for (strong) exact recovery of low-rank matrices.

Key Words. Convex optimization; Strong minima; Sharp minima; Second order condition; Nuclear norm minimization; Exact recovery.

Mathematics Subject Classification. 52A41 · 90C25 · 49J53 · 49J52

1 Introduction

Strong minima is an important property of a local minimizer of an optimization problem: the difference between the cost value and the optimal value is bounded below by a positive multiple of the squared norm of the difference between the corresponding feasible point and the minimizer. It is an error bound condition with various applications to sensitivity analysis, robustness, and the complexity of algorithms [4, 5, 7, 8, 18, 22, 23, 24, 29, 34, 42, 44]. Finding necessary and sufficient second order conditions for strong minima is a classical research area. For nonlinear programming, the first results in this direction were probably established in [22] under some restrictive conditions. Complete second order characterizations for nonlinear programming under mild conditions, such as the Mangasarian-Fromovitz constraint qualification, were obtained later in [4, 34]. For constrained (nonpolyhedral) optimization problems with smooth data, necessary and sufficient second order conditions for strong minima are much more involved. They often contain nontrivial “sigma terms”, which represent the curvature of certain nonpolyhedral structures in the problem. Another important feature of these conditions is that they are usually formulated as “minimax” conditions, in which the Lagrange multipliers depend on the choice of vectors in the critical cone; see, e.g., [7, 8]. Although the aforementioned sigma terms have been fully calculated for several important classes of optimization problems, such as semi-infinite programming, semi-definite programming, and second order cone programming, their calculation is complicated in general. Moreover, checking the (minimax) sufficient second order conditions numerically is quite a hard task.

An important problem that motivates our study in this paper is the nuclear norm minimization problem

minXn1×n2Xsubject toΦX=M0,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\|X\|_{*}\quad\mbox{subject to}\quad\Phi X=M_{0}, (1.1)

where \|X\|_{*} is the nuclear norm of an n_{1}\times n_{2} matrix X, \Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{m} is a linear operator, and M_{0} is a known vector (observation) in \mathbb{R}^{m}. This problem is considered the tightest convex relaxation of the celebrated NP-hard affine rank minimization problem, with various applications in computer vision, collaborative filtering, and data science; see, e.g., [1, 13, 15, 16, 40]. There are several essential reasons to study strong minima of this problem. First, strong minima of problem (1.1) guarantees the linear convergence of some proximal algorithms for solving problem (1.1) and related problems; see, e.g., [18, 29, 50]. Second, it is also sufficient for solution uniqueness and robustness of problem (1.1) [23, 24]. Solution uniqueness for problem (1.1) is a significant property for recovering the original low-rank solution X_{0}\in\mathbb{R}^{n_{1}\times n_{2}} from the observations M_{0}=\Phi X_{0}. In [15, 16], Candès and Recht introduced a nondegeneracy condition sufficient for solution uniqueness of problem (1.1). It plays an important role in their results on finding a small bound on the number of measurements m such that solving problem (1.1) recovers X_{0} exactly from the observations M_{0} over a Gaussian linear operator \Phi. Their condition was recently revealed in [24] to be a complete characterization of the so-called sharp minima introduced independently by Cromme [12] and Polyak [39]. This special property of problem (1.1) at X_{0} guarantees robust recovery with a linear rate, in the sense that any solution of the following low-rank optimization problem

\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\frac{1}{2}\|\Phi X-M\|^{2}+\mu\|X\|_{*} (1.2)

converge to X_{0} at a linear rate as \mu\downarrow 0, provided that \|M-M_{0}\|\leq c\mu for some constant c>0; see [14]. When strong minima occurs in problem (1.1), [24] shows that the convergence rate is Hölderian with order \frac{1}{2}. It is also worth noting that solution uniqueness of the nuclear norm minimization problem (1.1) can be characterized geometrically via the descent cone [1, 13]. As the descent cone is not necessarily closed, using it to check solution uniqueness numerically is not ideal. Another geometric characterization of solution uniqueness for problem (1.1) was established recently in [28], but the set in their main condition is not closed either. As strong minima is necessary for sharp minima and sufficient for solution uniqueness, the impact of strong minima on exact recovery remains an open question.
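To make the objective shared by (1.1) and (1.2) concrete, note that the nuclear norm of a 2×2 matrix admits a closed form: since \sigma_{1}^{2}+\sigma_{2}^{2}=\|X\|_{F}^{2} and \sigma_{1}\sigma_{2}=|\det X|, one has \|X\|_{*}=\sqrt{\|X\|_{F}^{2}+2|\det X|}. The following small Python sketch (an illustration of ours, not part of the paper's development; the function name is our own) checks this identity:

```python
import math

def nuclear_norm_2x2(X):
    # For a 2x2 matrix, sigma_1^2 + sigma_2^2 = ||X||_F^2 and
    # sigma_1 * sigma_2 = |det X|, hence
    # ||X||_* = sigma_1 + sigma_2 = sqrt(||X||_F^2 + 2 |det X|).
    (a, b), (c, d) = X
    frob_sq = a * a + b * b + c * c + d * d  # squared Frobenius norm
    det = a * d - b * c                      # determinant of X
    return math.sqrt(frob_sq + 2.0 * abs(det))

# diag(3, 4) has singular values 3 and 4, so its nuclear norm is 7;
# the rank-1 matrix of all ones has singular values 2 and 0.
print(nuclear_norm_2x2([[3.0, 0.0], [0.0, 4.0]]))  # → 7.0
print(nuclear_norm_2x2([[1.0, 1.0], [1.0, 1.0]]))  # → 2.0
```

For larger matrices one would of course compute a singular value decomposition instead; the 2×2 shortcut only serves to make the objective tangible.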

A sufficient second order condition for strong minima of problem (1.1) can be obtained from [18, Theorem 12]. The approach in [18] is to rewrite problem (1.1) as a composite optimization problem and to apply the classical results of [7, 8]. The second order analysis of spectral functions, including the nuclear norm, developed in [17, 18, 20, 37, 51] is helpful in understanding this result, but these second order computations applied to the nuclear norm remain complicated. Most importantly, the sufficient second order condition obtained in [18, Theorem 12] is still in minimax form, which makes it hard to check. Our main questions throughout the paper are: 1. Is it possible to obtain simple necessary and sufficient conditions for strong minima of problem (1.1)? 2. Can we avoid the minimax form usually present in these kinds of second order sufficient conditions? and 3. Is there an efficient way to check strong minima of problem (1.1) numerically?

Our contribution. To highlight the new ideas and point toward the bigger picture, we study in Section 3 the following composite optimization problem

minx𝕏f(x)+g(x),\min_{x\in\mathbb{X}}\quad f(x)+g(x), (1.3)

where \mathbb{X} is a finite dimensional space, f:\mathbb{X}\to\mathbb{R} is a twice continuously differentiable function, and g:\mathbb{X}\to\overline{\mathbb{R}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathbb{R}\cup\{+\infty\} is a proper lower semi-continuous (nonsmooth) function. This setting covers problem (1.2) and many modern optimization problems. One of the most popular ways to characterize strong minima of problem (1.3) is via the second subderivative [8, 42, 43]. As the function g is nonsmooth, its second subderivative is hard to compute in general. To avoid this computation, we assume additionally that the function g satisfies the classical quadratic growth condition [6, 49]. In the case of convex functions, we show in Section 3 that \bar{x} is a strong solution of problem (1.3) if and only if \bar{v}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\nabla f(\bar{x})\in\partial g(\bar{x}) and the following geometric condition holds

Ker2f(x¯)T(g)1(v¯)(x¯)={0},\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{(\partial g)^{-1}(\bar{v})}(\bar{x})=\{0\}, (1.4)

where \partial g:\mathbb{X}\rightrightarrows\mathbb{X}^{*} is the subdifferential mapping of g, \mbox{\rm Ker}\,\nabla^{2}f(\bar{x}) is the null space of the Hessian matrix \nabla^{2}f(\bar{x}), and T_{(\partial g)^{-1}(\bar{v})}(\bar{x}) is the Bouligand contingent cone to (\partial g)^{-1}(\bar{v}) at \bar{x}. Applying the contingent cone to this first order structure makes (1.4) a genuine second order condition, yet one that is simpler to compute than the second subderivative; see, e.g., our Corollary 4.2 for the case of the nuclear norm. Our results also apply when g is not convex; see Section 3 for a more detailed analysis.
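To see how condition (1.4) distinguishes strong from non-strong minimizers, consider the simplest setting \mathbb{X}=\mathbb{R} with g=|\cdot| (a one-dimensional example of ours, not taken from the references):

```latex
\text{Let } \bar{x}=0, \text{ so that } \bar{v}=-f'(0)\in\partial|\cdot|(0)=[-1,1].
\text{(a) } f(x)=-x+x^{4}: \text{ here } \bar{v}=1,\ (\partial g)^{-1}(1)=[0,\infty),\ T_{[0,\infty)}(0)=[0,\infty),
\text{and } \mbox{\rm Ker}\,f''(0)=\mathbb{R}, \text{ so the intersection in (1.4) is } [0,\infty)\neq\{0\};
\text{indeed } \varphi(x)=f(x)+|x|=x^{4} \text{ for } x\geq 0: \text{ a minimum that is not strong.}
\text{(b) } f(x)=-x+x^{2}: \text{ again } \bar{v}=1, \text{ but now } \mbox{\rm Ker}\,f''(0)=\{0\}, \text{ so (1.4) holds;}
\text{indeed } \varphi(x)=x^{2}+(|x|-x)\geq x^{2}: \text{ a strong minimum at } 0.
```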

Another problem considered in Section 3 is the following convex optimization problem with linear constraints

minx𝕏g(x)subject toΦxK,\min_{x\in\mathbb{X}}\quad g(x)\quad\mbox{subject to}\quad\Phi x\in K, (1.5)

where g:\mathbb{X}\to\overline{\mathbb{R}} is a continuous (nonsmooth) convex function, \Phi:\mathbb{X}\to\mathbb{Y} is a linear operator between two finite dimensional spaces, and K is a closed polyhedral subset of \mathbb{Y}. This problem covers the nuclear norm minimization problem (1.1) and a number of other significant optimization problems [1, 3, 13]. When g is twice continuously differentiable, characterizations of strong minima for problem (1.5) are simple; see, e.g., [8, Theorem 3.120]. When g is not differentiable, however, characterizing strong minima is much more involved. A standard approach is to rewrite problem (1.5) as a composite problem, which can be represented as a constrained optimization problem with smooth data [7, 8, 35]. To obtain a geometric characterization similar to (1.4) for problem (1.5), we additionally assume that the function g satisfies both the quadratic growth condition and second order regularity. The latter condition was introduced by Bonnans, Cominetti, and Shapiro in [7] to close the gap between necessary and sufficient second order optimality conditions for constrained and composite optimization problems. The extra assumption of the quadratic growth condition on the function g allows us to achieve geometric characterizations of strong minima for problem (1.5) in Theorem 3.5. Many important classes of nonsmooth functions g satisfy both the quadratic growth condition and second order regularity; to list a few: piecewise linear-quadratic convex functions [43], spectral functions [18, 37], \ell_{1}/\ell_{2} norms [50], and indicator functions of the set of positive semi-definite matrices [19]. Our results are thus applicable not only to the nuclear norm minimization problem (1.1), but also to various other optimization problems.

As the nuclear norm satisfies both the quadratic growth condition and second order regularity, geometric characterizations of strong minima for the low-rank problems (1.1) and (1.2) are to be expected. But our studies of problems (1.1) and (1.2) in Sections 4 and 5 are not just straightforward applications. In Section 4, we derive a simple calculation of the second order structure in (1.4) for the case of the nuclear norm. Furthermore, some quantitative characterizations of strong minima for problem (1.2) are obtained via the so-called Strong Restricted Injectivity and Analysis Strong Source Condition, which inherit their terminology from conditions introduced recently in [24] to characterize strong minima/solution uniqueness of group-sparsity optimization problems. These conditions are weaker than the well-known Restricted Injectivity and Nondegenerate Source Condition used in [15, 16] as sufficient conditions for solution uniqueness of the nuclear norm minimization problem (1.1); see also [25] for the case of the \ell_{1}-norm. Both conditions can be verified numerically. In Section 5, we obtain new characterizations of strong minima for problem (1.1). Our conditions are not in the form of minimax problems. Indeed, Theorem 5.2 shows that X_{0} is a strong solution of problem (1.1) if and only if there exists a dual certificate \overline{Y}\in\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*} such that

KerΦT()1(Y¯)(X0)={0},\mbox{\rm Ker}\,\Phi\cap T_{(\partial\|\cdot\|_{*})^{-1}(\overline{Y})}(X_{0})=\{0\},

which bears some similarity to (1.4). The necessary and sufficient conditions for strong minima obtained in this section reveal some interesting facts about exact recovery for the nuclear norm minimization problem (1.1). For example, one needs at least \frac{1}{2}r(r+1) measurements M_{0}=\Phi X_{0} to recover the matrix X_{0} of rank r exactly as a strong solution of problem (1.1). This bound on m is very small, but it is tight in the sense that we can construct infinitely many linear operators \Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{\frac{1}{2}r(r+1)} such that solving problem (1.1) recovers X_{0} exactly. Another compelling result in Section 5 shows that the low-rank representation problem [27, 33] always has strong minima when the linear operator \Phi in (1.1) is any q\times n_{1} matrix.

Finally, we discuss numerical methods to check strong minima and compare the results with sharp minima and solution uniqueness. For example, over 100 nuclear norm minimization problems with standard Gaussian linear operators \Phi and m=460 measurements M_{0}\in\mathbb{R}^{460} observed from an original matrix X_{0}\in\mathbb{R}^{40\times 40} of rank 3, exact recovery occurs in about 80% of the instances, of which about 40% have sharp minima and the other 40% have strong (non-sharp) minima. As the traditional approach to exact recovery in [1, 13, 16] is via sharp minima [24], seeing many unique solutions that are strong but not sharp in these experiments gives a more complete picture of exact recovery when the number of measurements m is not large enough; see Section 6 for further numerical experiments.

2 Preliminaries

Throughout this paper, we suppose that \mathbb{X} is a Euclidean space with norm \|\cdot\| and that \mathbb{X}^{*} is its dual space, endowed with the inner product \langle v,x\rangle for v\in\mathbb{X}^{*} and x\in\mathbb{X}. \mathbb{B}_{r}(\bar{x}) denotes the closed ball with center \bar{x}\in\mathbb{X} and radius r>0. Let \varphi:\mathbb{X}\to\overline{\mathbb{R}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathbb{R}\cup\{+\infty\} be a proper extended real-valued function with nonempty domain \mbox{\rm dom}\,\varphi\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{x\in\mathbb{X}|\;\varphi(x)<\infty\}\neq\emptyset. A point \bar{x}\in\mbox{\rm dom}\,\varphi is called a strong solution (or strong minimizer) of \varphi if there exist c,\varepsilon>0 such that

φ(x)φ(x¯)cxx¯2for allx𝔹ε(x¯).\varphi(x)-\varphi(\bar{x})\geq c\|x-\bar{x}\|^{2}\quad\mbox{for all}\quad x\in\mathbb{B}_{\varepsilon}(\bar{x}). (2.1)

In this case, we say that strong minima occurs at \bar{x}. The study of strong minima is usually based on second order theory, where the following subderivative constructions play crucial roles; see, e.g., [5, 8, 35, 42, 43].

Definition 2.1 (Subderivatives).

For a function φ:𝕏¯\varphi:\mathbb{X}\to\overline{\mathbb{R}} and x¯domφ\bar{x}\in\mbox{\rm dom}\,\varphi, the subderivative of φ\varphi at x¯\bar{x} is the function dφ(x¯):𝕏¯d\varphi(\bar{x}):\mathbb{X}\to\overline{\mathbb{R}} defined by

d\varphi(\bar{x})(w)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\liminf_{t\downarrow 0,w^{\prime}\to w}\dfrac{\varphi(\bar{x}+tw^{\prime})-\varphi(\bar{x})}{t}\quad\mbox{for}\quad w\in\mathbb{X}. (2.2)

The second subderivative of φ\varphi at x¯\bar{x} for v¯𝕏\bar{v}\in\mathbb{X}^{*} is the function d2φ(x¯|v¯):𝕏¯d^{2}\varphi(\bar{x}|\bar{v}):\mathbb{X}\to\overline{\mathbb{R}} defined by

d^{2}\varphi(\bar{x}|\bar{v})(w)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\liminf_{t\downarrow 0,w^{\prime}\to w}\dfrac{\varphi(\bar{x}+tw^{\prime})-\varphi(\bar{x})-t\langle\bar{v},w^{\prime}\rangle}{\frac{1}{2}t^{2}}\quad\mbox{for}\quad w\in\mathbb{X}. (2.3)

The parabolic subderivative of φ\varphi at x¯\bar{x} for wdomdφ(x¯)()w\in\mbox{\rm dom}\,d\varphi(\bar{x})(\cdot) with respect to z𝕏z\in\mathbb{X} is defined by

d2φ(x¯)(w|z)=deflim inft0,zzφ(x¯+tw+12t2z)φ(x¯)tdφ(x¯)(w)12t2.d^{2}\varphi(\bar{x})(w|z)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\liminf_{t\downarrow 0,z^{\prime}\to z}\dfrac{\varphi(\bar{x}+tw+\frac{1}{2}t^{2}z^{\prime})-\varphi(\bar{x})-td\varphi(\bar{x})(w)}{\frac{1}{2}t^{2}}. (2.4)
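These constructions can be computed by hand for the absolute value function on \mathbb{R} (an illustrative computation of ours):

```latex
\text{For } \varphi=|\cdot| \text{ and } \bar{x}=0:\quad d\varphi(0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{|tw'|}{t}=|w|.
\text{For } \bar{v}=0\in\partial\varphi(0):\quad d^{2}\varphi(0|0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{|tw'|}{\frac{1}{2}t^{2}}
=\begin{cases} 0 & \text{if } w=0,\\ +\infty & \text{if } w\neq 0. \end{cases}
```

The infinite value of the second subderivative in nonzero directions reflects the sharp (first order) growth of |\cdot| at the origin.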

Parabolic subderivatives were introduced by Ben-Tal and Zowe in [5] to study strong minima; see also [43, Theorem 13.66]. Second subderivatives date back to the seminal work of Rockafellar [42], with good calculus rules [35] available for many important classes of functions. It is well known [43, Theorem 13.24] that \bar{x} is a strong solution of \varphi if and only if 0\in\partial\varphi(\bar{x}) and

d2φ(x¯|0)(w)>0for allw0.d^{2}\varphi(\bar{x}|0)(w)>0\quad\mbox{for all}\quad w\neq 0. (2.5)

Here φ(x¯)\partial\varphi(\bar{x}) stands for the Mordukhovich limiting subdifferential of φ\varphi at x¯\bar{x} [36]:

\partial\varphi(\bar{x})=\left\{v\in\mathbb{X}^{*}|\;\exists(x_{k},v_{k})\stackrel{{\mathbb{X}\times\mathbb{X}^{*}}}{{\longrightarrow}}(\bar{x},v),\,\liminf_{x\to x_{k}}\dfrac{\varphi(x)-\varphi(x_{k})-\langle v_{k},x-x_{k}\rangle}{\|x-x_{k}\|}\geq 0\right\}. (2.6)

When \varphi is a proper l.s.c. convex function, this subdifferential coincides with the subdifferential of classical convex analysis

φ(x¯)={v𝕏|φ(x)φ(x¯)v,xx¯,x𝕏}.\partial\varphi(\bar{x})=\left\{v\in\mathbb{X}^{*}|\;\varphi(x)-\varphi(\bar{x})\geq\langle v,x-\bar{x}\rangle,x\in\mathbb{X}\right\}. (2.7)
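Criterion (2.5) can be tested directly on one-dimensional examples (our own illustrations):

```latex
\varphi(x)=x^{2}:\quad d^{2}\varphi(0|0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{t^{2}(w')^{2}}{\frac{1}{2}t^{2}}=2w^{2}>0 \text{ for } w\neq 0,
\text{so } 0 \text{ is a strong solution, in accordance with (2.1) for } c=1.
\varphi(x)=x^{4}:\quad d^{2}\varphi(0|0)(w)=\liminf_{t\downarrow 0,\,w'\to w}\frac{t^{4}(w')^{4}}{\frac{1}{2}t^{2}}=0 \text{ for every } w,
\text{so (2.5) fails: } 0 \text{ is the unique minimizer of } x^{4}, \text{ but not a strong solution.}
```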

We denote by \varphi^{*}:\mathbb{X}^{*}\to\overline{\mathbb{R}} the Fenchel conjugate of \varphi:

\varphi^{*}(v)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sup\{\langle v,x\rangle-\varphi(x)|\;x\in\mathbb{X}\}\quad\mbox{for}\quad v\in\mathbb{X}^{*}. (2.8)

Next let us recall some first and second order tangent structures [8, 43] of a nonempty closed set K\subset\mathbb{X} that are widely used in this paper.

Definition 2.2 (Tangent cones).

Let K be a closed subset of \mathbb{X}. The Bouligand contingent cone to K at the point \bar{x}\in K is defined by

TK(x¯)=defLimsupt0Kx¯t={w𝕏|tk0,wkw,x¯+tkwkK}.T_{K}(\bar{x})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\displaystyle\mathop{{\rm Lim}\,{\rm sup}}_{t\downarrow 0}\frac{K-\bar{x}}{t}=\left\{w\in\mathbb{X}|\;\exists\,t_{k}\downarrow 0,w_{k}\to w,\bar{x}+t_{k}w_{k}\in K\right\}. (2.9)

The inner and outer second order tangent set to KK at x¯K\bar{x}\in K in the direction w𝕏w\in\mathbb{X} are defined, respectively, by

TKi,2(x¯|w)\displaystyle T^{i,2}_{K}(\bar{x}|w) =defLiminft0Kx¯tw12t2={z𝕏|tk0,zkz,x¯+tkw+12tk2zkK}and\displaystyle\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathop{{\rm Lim}\,{\rm inf}}_{t\downarrow 0}\frac{K-\bar{x}-tw}{\frac{1}{2}t^{2}}=\left\{z\in\mathbb{X}|\;\forall\,t_{k}\downarrow 0,\exists\,z_{k}\to z,\bar{x}+t_{k}w+\frac{1}{2}t_{k}^{2}z_{k}\in K\right\}\quad\mbox{and} (2.10)
TK2(x¯|w)\displaystyle T^{2}_{K}(\bar{x}|w) =defLimsupt0Kx¯tw12t2={z𝕏|tk0,zkz,x¯+tkw+12tk2zkK}.\displaystyle\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathop{{\rm Lim}\,{\rm sup}}_{t\downarrow 0}\frac{K-\bar{x}-tw}{\frac{1}{2}t^{2}}=\left\{z\in\mathbb{X}|\;\exists\,t_{k}\downarrow 0,z_{k}\to z,\bar{x}+t_{k}w+\frac{1}{2}t_{k}^{2}z_{k}\in K\right\}.
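A standard parabola example illustrates Definition 2.2 and shows that second order tangent sets, unlike the contingent cone, need not be cones:

```latex
\text{Let } K=\{(x,y)\in\mathbb{R}^{2}|\;y\geq x^{2}\} \text{ and } \bar{x}=(0,0), \text{ so } T_{K}(\bar{x})=\{w\in\mathbb{R}^{2}|\;w_{2}\geq 0\}.
\text{For } w=(1,0):\quad \bar{x}+tw+\tfrac{1}{2}t^{2}z=\big(t+\tfrac{1}{2}t^{2}z_{1},\,\tfrac{1}{2}t^{2}z_{2}\big)\in K
\iff \tfrac{1}{2}t^{2}z_{2}\geq t^{2}+O(t^{3}),
\text{hence } T^{2}_{K}(\bar{x}|w)=T^{i,2}_{K}(\bar{x}|w)=\{z\in\mathbb{R}^{2}|\;z_{2}\geq 2\},
\text{which records the curvature of the boundary of } K \text{ and is not a cone.}
```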

The contingent cone T_{K}(\bar{x}) is a closed set. It consists of all w\in\mathbb{X} for which there exists a sequence t_{k}\downarrow 0 with {\rm dist}\,(\bar{x}+t_{k}w;K)=o(t_{k}), where {\rm dist}\,(x;K) denotes the distance from x\in\mathbb{X} to K:

dist(x;K)=min{xu|uK}.{\rm dist}\,(x;K)=\min\{\|x-u\||\;u\in K\}. (2.11)

Similarly, the inner second order tangent set to K at \bar{x} in the direction w admits the representation

T^{i,2}_{K}(\bar{x}|w)=\left\{z\in\mathbb{X}|\;{\rm dist}\,(\bar{x}+tw+\frac{1}{2}t^{2}z;K)=o(t^{2}),t\geq 0\right\}. (2.12)

When KK is convex, it is well-known that

TK(x¯)={w𝕏|dist(x¯+tw;K)=o(t),t0}.T_{K}(\bar{x})=\{w\in\mathbb{X}|\;{\rm dist}\,(\bar{x}+tw;K)=o(t),t\geq 0\}.

Since the function {\rm dist}\,(\cdot;K) is convex, T_{K}(\bar{x}) is a convex set. In this case, the inner second order tangent set T^{i,2}_{K}(\bar{x}|w) is also convex, for the same reason together with formula (2.12). Moreover, the dual cone of the contingent cone is the normal cone to K at \bar{x}:

NK(x¯)=def[TK(x¯)]={v𝕏|v,xx¯0for allxK}.N_{K}(\bar{x})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}[T_{K}(\bar{x})]^{-}=\{v\in\mathbb{X}^{*}|\;\langle v,x-\bar{x}\rangle\leq 0\quad\mbox{for all}\quad x\in K\}. (2.13)

It is also the subdifferential at \bar{x} of the indicator function \iota_{K} of the set K, defined by \iota_{K}(x)=0 if x\in K and +\infty otherwise. The normal cone can be characterized via the support function of K

σK(v)=defsup{v,x|xK}for allv𝕏\sigma_{K}(v)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sup\{\langle v,x\rangle|\;x\in K\}\quad\mbox{for all}\quad v\in\mathbb{X}^{*} (2.14)

with NK(x¯)={v𝕏|σK(v)v,x¯}N_{K}(\bar{x})=\{v\in\mathbb{X}^{*}|\;\sigma_{K}(v)\leq\langle v,\bar{x}\rangle\}.
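For instance, for the nonnegative orthant (a standard computation, included for illustration):

```latex
K=\mathbb{R}^{n}_{+},\ \bar{x}\in K:\quad T_{K}(\bar{x})=\{w\in\mathbb{R}^{n}|\;w_{i}\geq 0 \text{ whenever } \bar{x}_{i}=0\},
N_{K}(\bar{x})=\{v\in\mathbb{R}^{n}|\;v\leq 0,\ v_{i}=0 \text{ whenever } \bar{x}_{i}>0\},
\text{consistent with (2.14): } \sigma_{K}(v)=0 \text{ if } v\leq 0 \text{ and } +\infty \text{ otherwise, so}
\sigma_{K}(v)\leq\langle v,\bar{x}\rangle \iff v\leq 0 \text{ and } \langle v,\bar{x}\rangle=0.
```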

To characterize strong minima of constrained optimization problems, Bonnans, Cominetti, and Shapiro [7, Definition 3] introduced the following second order regularity condition on K; see also [8, Definition 3.85].

Definition 2.3 (Second order regularity).

The set K is said to be second order regular at \bar{x}\in K if for any w\in T_{K}(\bar{x}) the outer second order tangent set T^{2}_{K}(\bar{x}|w) coincides with the inner second order tangent set T^{i,2}_{K}(\bar{x}|w), and for any sequence x_{k}\in K of the form x_{k}=\bar{x}+t_{k}w+\frac{1}{2}t_{k}^{2}r_{k} with t_{k}\downarrow 0 and t_{k}r_{k}\to 0, one has

\lim_{k\to\infty}{\rm dist}\,(r_{k};T^{2}_{K}(\bar{x}|w))=0.

The proper l.s.c. function \varphi:\mathbb{X}\to\overline{\mathbb{R}} is said to be second order regular at \bar{x}\in\mbox{\rm dom}\,\varphi if the epigraph of \varphi is second order regular at (\bar{x},\varphi(\bar{x})).

The class of second order regular sets covers many important sets in optimization, such as polyhedral sets, the set of positive semi-definite matrices, and the second order (ice cream) cone; see, e.g., [8]. Piecewise linear-quadratic convex functions are second order regular [7]. Recently, it was proved in [18] that some special spectral functions are also second order regular.

When the function φ:𝕏¯\varphi:\mathbb{X}\to\overline{\mathbb{R}} is l.s.c. convex and second order regular at x¯domφ\bar{x}\in\mbox{\rm dom}\,\varphi, we note from [8, Proposition 3.41]

Tepiφi,2((x¯,φ(x¯))|(w,dφ(x¯)(w)))=epid2φ(x¯)(w|)T^{i,2}_{{\rm epi}\,\varphi}((\bar{x},\varphi(\bar{x}))|(w,d\varphi(\bar{x})(w)))=\mbox{\rm epi}\,d^{2}\varphi(\bar{x})(w|\cdot) (2.15)

for any w\in\mbox{\rm dom}\,d\varphi(\bar{x}). This is a convex set, which implies that d^{2}\varphi(\bar{x})(w|\cdot) is a convex function. In this case, it is known from [8, Proposition 3.103] that \varphi is parabolically regular at \bar{x} in a direction w\in\mathbb{X} for v\in\mathbb{X}^{*} in the sense that

d2φ(x¯|v)(w)=[d2φ(x¯)(w|)](v),d^{2}\varphi(\bar{x}|v)(w)=-[d^{2}\varphi(\bar{x})(w|\cdot)]^{*}(v), (2.16)

which is the Fenchel conjugate of the function d2φ(x¯)(w|)d^{2}\varphi(\bar{x})(w|\cdot) at vv provided that the pair (w,v)𝕏×𝕏(w,v)\in\mathbb{X}\times\mathbb{X}^{*} satisfies the condition v,w=dφ(x¯)(w)\langle v,w\rangle=d\varphi(\bar{x})(w).

Next let us slightly modify [8, Theorems 3.108 and 3.109], which give necessary and sufficient conditions for strong solutions of the following composite problem

minx𝕏g(F(x)),\min_{x\in\mathbb{X}}\quad g(F(x)), (2.17)

where F:\mathbb{X}\to\mathbb{Y} is a twice continuously differentiable mapping and g:\mathbb{Y}\to\overline{\mathbb{R}} is a proper l.s.c. convex function. Suppose that y_{0}=F(x_{0})\in\mbox{\rm dom}\,g with x_{0}\in\mathbb{X}. Robinson's constraint qualification at x_{0} for this composite problem reads

0int(y0+F(x0)𝕏domg);0\in{\rm int}\,(y_{0}+\nabla F(x_{0})\mathbb{X}-\mbox{\rm dom}\,g); (2.18)

see, e.g., [7, 8]. The feasible point x_{0} is called a stationary point of problem (2.17) if there exists a Lagrange multiplier \lambda\in\mathbb{Y}^{*} such that

\nabla F(x_{0})^{*}\lambda=0\quad\mbox{and}\quad\lambda\in\partial g(y_{0}). (2.19)
Theorem 2.4 (Second order characterizations for strong solutions of composite problems).

Suppose that Robinson’s constraint qualification (2.18) holds at a stationary point x0x_{0} and that the function gg is second order regular at y0y_{0}. Then x0x_{0} is a strong solution of problem (2.17) if and only if for any nonzero ww in the critical cone

C(x0)=def{u𝕏|dg(y0)(F(x0)u)=0},C(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{u\in\mathbb{X}|\;dg(y_{0})(\nabla F(x_{0})u)=0\}, (2.20)

there exists a Lagrange multiplier λ\lambda satisfying condition (2.19) such that

λ,2F(x0)(w,w)+d2g(y0|λ)(F(x0)w)>0.\langle\lambda,\nabla^{2}F(x_{0})(w,w)\rangle+d^{2}g(y_{0}|\lambda)(\nabla F(x_{0})w)>0. (2.21)

Proof. Let us justify the sufficient part first. According to [8, Theorem 3.109], x0x_{0} is a strong solution of problem (2.17) provided that for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists a Lagrange multiplier λ𝕐\lambda\in\mathbb{Y}^{*} satisfying (2.19) such that

λ,2F(x0)(w,w)Ψ(λ)>0,\langle\lambda,\nabla^{2}F(x_{0})(w,w)\rangle-\Psi^{*}(\lambda)>0, (2.22)

where Ψ()=defd2g(y0)(F(x0)w|)\Psi(\cdot)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}d^{2}g(y_{0})(\nabla F(x_{0})w|\cdot). Since gg is convex and second order regular at y0y_{0}, equation (2.15) for the function gg tells us that d2g(y0)(F(x0)w|)d^{2}g(y_{0})(\nabla F(x_{0})w|\cdot) is a convex function for any wC(x0)w\in C(x_{0}). Moreover, note that

\langle\lambda,\nabla F(x_{0})w\rangle=\langle\nabla F(x_{0})^{*}\lambda,w\rangle=0=dg(y_{0})(\nabla F(x_{0})w).

We obtain from (2.16) that d^{2}g(y_{0}|\lambda)(\nabla F(x_{0})w)=-\Psi^{*}(\lambda). This ensures the equivalence between (2.22) and (2.21). Thus x_{0} is a strong solution of problem (2.17) provided that condition (2.21) holds.

To prove the necessity part, note again that the function d^{2}g(y_{0})(\nabla F(x_{0})w|\cdot) is convex, so there is no gap between the second order necessary and sufficient conditions; that is, condition (2.22) is also necessary for x_{0} to be a strong solution; see [7, Theorem 5.2] or [8, Theorem 3.108]. Due to the equivalence of (2.21) and (2.22) above, condition (2.21) is also necessary for strong minima at x_{0}. \hfill\Box

As described in (2.21), the existence of a Lagrange multiplier depends on the choice of each vector in the critical cone. Under Robinson's constraint qualification (2.18), (2.21) is a minimax condition in the sense that it is equivalent to

minwC(x0),w=1maxλΛ(x0)[λ,2F(x0)(w,w)+d2g(y0|λ)(F(x0)w)]>0,\min_{w\in C(x_{0}),\|w\|=1}\max_{\lambda\in\Lambda(x_{0})}\big{[}\langle\lambda,\nabla^{2}F(x_{0})(w,w)\rangle+d^{2}g(y_{0}|\lambda)(\nabla F(x_{0})w)\big{]}>0, (2.23)

where \Lambda(x_{0}) is the set of all Lagrange multipliers satisfying (2.19). This condition is hard to check numerically. On the other hand, its maximin version is more desirable, as it asks for a single Lagrange multiplier \lambda such that inequality (2.21) is valid for all w\in C(x_{0})\setminus\{0\}. In general, however, it is not clear how to close the gap between the minimax and maximin forms. For the case of (1.1), we will obtain a maximin-type condition for strong minima in Theorem 5.2.

3 Geometric characterizations for strong minima of optimization problems

3.1 Geometric characterizations for strong minima of unconstrained optimization problems

In this subsection, we consider the following composite optimization problem

minx𝕏φ(x)=deff(x)+g(x),\displaystyle\min_{x\in\mathbb{X}}\quad\varphi(x)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}f(x)+g(x), (3.1)

where f,g:𝕏¯f,g:\mathbb{X}\to\overline{\mathbb{R}} are proper functions such that int(domf)domg{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g\neq\emptyset, ff is twice continuously differentiable in int(domf){\rm int}\,(\mbox{\rm dom}\,f), and gg is lower semi-continuous. We assume that x¯int(domf)domg\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g is a stationary point of problem (3.1) in the sense that

0φ(x¯)=f(x¯)+g(x¯)0\in\partial\varphi(\bar{x})=\nabla f(\bar{x})+\partial g(\bar{x})

due to the sum rule for limiting subdifferential; see, e.g., [36, Proposition 1.107] or [43, Exercise 10.10]. Obviously, x¯\bar{x} is a stationary point if and only if f(x¯)g(x¯)-\nabla f(\bar{x})\in\partial g(\bar{x}).

To characterize strong minima at the stationary point \bar{x}, one of the most typical methods is to use the second subderivative d^{2}\varphi(\bar{x}|0) defined in (2.3). As the function f is twice continuously differentiable at \bar{x}, it is well known [43, Exercise 13.18] that

d2φ(x¯|0)(w)=2f(x¯)w,w+d2g(x¯|f(x¯))(w)forw𝕏.d^{2}\varphi(\bar{x}|0)(w)=\langle\nabla^{2}f(\bar{x})w,w\rangle+d^{2}g(\bar{x}|-\nabla f(\bar{x}))(w)\quad\mbox{for}\quad w\in\mathbb{X}. (3.2)

Since g is possibly nonsmooth in many structured optimization problems, the computation of d^{2}g(\bar{x}|-\nabla f(\bar{x}))(w) can be quite challenging. In this section, we establish several new necessary and sufficient conditions for strong minima that avoid computing second subderivatives, under the additional assumption that the function g satisfies the following quadratic growth condition [6, 49]; see also [8, Section 3.5].

Definition 3.1 (Quadratic growth conditions).

Let g:𝕏¯g:\mathbb{X}\to\overline{\mathbb{R}} be a proper l.s.c. function and SS be a closed subset of 𝕏\mathbb{X} with x¯domgS\bar{x}\in\mbox{\rm dom}\,g\cap S. We say that gg satisfies the quadratic growth condition at x¯\bar{x} for some v¯g(x¯)\bar{v}\in\partial g(\bar{x}) with respect to SS if there exist constants ε,δ>0\varepsilon,\delta>0 and modulus κ>0\kappa>0 such that

g(x)g(x¯)v¯,xx¯κ2[dist(x;S)]2for allx𝔹εδ(x¯|v¯)g(x)-g(\bar{x})-\langle\bar{v},x-\bar{x}\rangle\geq\frac{\kappa}{2}[{\rm dist}\,(x;S)]^{2}\qquad\mbox{for all}\qquad x\in\mathbb{B}_{\varepsilon}^{\delta}(\bar{x}|\bar{v}) (3.3)

with 𝔹εδ(x¯|v¯)=def{x𝔹ε(x¯)|g(x)g(x¯)v¯,xx¯<δ}\mathbb{B}_{\varepsilon}^{\delta}(\bar{x}|\bar{v})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{x\in\mathbb{B}_{\varepsilon}(\bar{x})|\;g(x)-g(\bar{x})-\langle\bar{v},x-\bar{x}\rangle<\delta\}. The function gg is said to satisfy the quadratic growth condition at x¯\bar{x} for v¯\bar{v}, if it satisfies this condition at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) with respect to

S(x¯,v¯)=def{x𝕏|g(x)v¯,xg(x¯)v¯,x¯}.S(\bar{x},\bar{v})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{x\in\mathbb{X}|\;g(x)-\langle\bar{v},x\rangle\leq g(\bar{x})-\langle\bar{v},\bar{x}\rangle\right\}. (3.4)

Finally, we say the function gg satisfies the quadratic growth condition at x¯\bar{x} if it satisfies this condition at x¯\bar{x} for any v¯g(x¯)\bar{v}\in\partial g(\bar{x}).

As the function gg is l.s.c., the set S(x¯,v¯)S(\bar{x},\bar{v}) is closed and x¯S(x¯,v¯)\bar{x}\in S(\bar{x},\bar{v}). When the quadratic growth condition (3.3) holds, it is clear that

S(x¯,v¯)𝔹ε(x¯)S𝔹ε(x¯)for someε>0.S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x})\subset S\cap\mathbb{B}_{\varepsilon}(\bar{x})\quad\mbox{for some}\quad\varepsilon>0. (3.5)

Moreover, for any closed set SS fulfilling (3.5) and x𝔹ε2δ(x¯)x\in\mathbb{B}^{\delta}_{\frac{\varepsilon}{2}}(\bar{x}), we find some uS(x¯,v¯)u\in S(\bar{x},\bar{v}) such that

dist(x;S(x¯,v¯))=xuxx¯<ε2,{\rm dist}\,(x;S(\bar{x},\bar{v}))=\|x-u\|\leq\|x-\bar{x}\|<\frac{\varepsilon}{2},

which implies that ux¯xx¯+ε2<ε\|u-\bar{x}\|\leq\|x-\bar{x}\|+\frac{\varepsilon}{2}<\varepsilon, i.e., uS(x¯,v¯)𝔹ε(x¯)u\in S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x}). It follows from (3.5) that

dist(x;S(x¯,v¯))=xudist(x;S(x¯,v¯)𝔹ε(x¯))dist(x;S){\rm dist}\,(x;S(\bar{x},\bar{v}))=\|x-u\|\geq{\rm dist}\,(x;S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x}))\geq{\rm dist}\,(x;S)

for any x𝔹ε2δ(x¯)x\in\mathbb{B}^{\delta}_{\frac{\varepsilon}{2}}(\bar{x}). Hence, if the function gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v}, it also satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v} w.r.t. any closed set SS fulfilling (3.5). Many necessary and sufficient conditions for the quadratic growth condition have been established in [6, 8] and [44, 49] under the different name of weak sharp minima of order 22.
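To make Definition 3.1 concrete, the following sketch (our own illustration, not part of the analysis above) numerically verifies the quadratic growth condition (3.3) for the simple convex function g(x) = |x| at x̄ = 0 with v̄ = 0 ∈ ∂g(0). Here S(x̄, v̄) = {x : |x| ≤ 0} = {0}, so dist(x; S) = |x|, and (3.3) holds with, e.g., κ = 2 on the ball of radius ε = 1/2.

```python
import numpy as np

# Quadratic growth check (3.3) for g(x) = |x| at xbar = 0 with vbar = 0.
# Here S(xbar, vbar) = {0}, so dist(x; S) = |x|.
xbar, vbar = 0.0, 0.0
kappa, eps = 2.0, 0.5

xs = np.linspace(-eps, eps, 1001)
lhs = np.abs(xs) - np.abs(xbar) - vbar * (xs - xbar)  # g(x) - g(xbar) - <vbar, x - xbar>
rhs = 0.5 * kappa * np.abs(xs - xbar) ** 2            # (kappa/2) * dist(x; S)^2
assert np.all(lhs >= rhs - 1e-12)
print("quadratic growth (3.3) holds on the sampled ball")
```

The inequality |x| ≥ x² on this ball reflects the fact that x̄ = 0 is even a sharp minimizer of g, which is stronger than quadratic growth.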

When gg is convex and v¯g(x¯)\bar{v}\in\partial g(\bar{x}), the set S(x¯,v¯)S(\bar{x},\bar{v}) coincides with (g)1(v¯)=g(v¯)(\partial g)^{-1}(\bar{v})=\partial g^{*}(\bar{v}). The quadratic growth condition of gg at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. (g)1(v¯)(\partial g)^{-1}(\bar{v}) has been studied and connected with the so-called Łojasiewicz inequality with exponent 12\frac{1}{2} [9] and the metric subregularity of the subdifferential [2, 21, 52] (even in nonconvex cases). There are broad classes of convex functions satisfying the quadratic growth condition such as piecewise linear-quadratic convex functions [43, Definition 10.20] and many convex spectral functions [18]; see also [50] for further examples.

When gg is not convex, the quadratic growth condition of gg at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. (g)1(v¯)(\partial g)^{-1}(\bar{v}) coincides with the quadratic growth condition of gg at x¯\bar{x} for v¯\bar{v} provided that

(g)1(v¯)𝔹ε(x¯)S(x¯,v¯)𝔹ε(x¯)for sufficiently smallε>0.(\partial g)^{-1}(\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x})\subset S(\bar{x},\bar{v})\cap\mathbb{B}_{\varepsilon}(\bar{x})\quad\mbox{for sufficiently small}\quad\varepsilon>0. (3.6)

It is necessary for the quadratic growth condition (3.3) at x¯\bar{x} for v¯\bar{v} that x¯\bar{x} is a local minimizer to the function gv¯(x)=defg(x)v¯,xg_{\bar{v}}(x)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}g(x)-\langle\bar{v},x\rangle, x𝕏x\in\mathbb{X}. Then, condition (3.6) is similar to the so-called proper separation of isocost surface of gv¯g_{\bar{v}} in [53], which is an improvement of the proper separation of stationary points of gv¯g_{\bar{v}} in [32]. By [21, Theorem 3.1], the function gg satisfies the quadratic growth condition at x¯\bar{x} for v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. (g)1(v¯)(\partial g)^{-1}(\bar{v}) provided that g\partial g is metrically subregular at x¯\bar{x} for v¯\bar{v} in the sense that there exist η,>0\eta,\ell>0 such that

dist(x;(g)1(v¯))dist(v¯;g(x))forx𝔹η(x¯).{\rm dist}\,(x;(\partial g)^{-1}(\bar{v}))\leq\ell{\rm dist}\,(\bar{v};\partial g(x))\quad\mbox{for}\quad x\in\mathbb{B}_{\eta}(\bar{x}). (3.7)

This condition is satisfied when g\partial g is a piecewise polyhedral set-valued mapping, i.e., the graph of g\partial g, {(x,v)𝕏×𝕏|vg(x)}\{(x,v)\in\mathbb{X}\times\mathbb{X}^{*}|\;v\in\partial g(x)\} is a union of finitely many polyhedral sets; see, e.g., [43, Example 9.57]. Thus the class of (possibly nonconvex) piecewise linear-quadratic functions fulfills (3.7); see also [53] for several sufficient conditions for (3.7) and some special nonconvex piecewise linear-quadratic regularizers such as the SCAD and MCP penalty functions. Although our theory in this section is applicable to nonconvex functions, we focus our later applications on low-rank minimization problems (1.2) when gg is the nuclear norm, which satisfies the quadratic growth condition [50] even though the graph of g\partial g is not piecewise polyhedral.
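As a small illustration of (3.7) (a sketch of ours, using the absolute value as a model function whose subdifferential has a piecewise polyhedral graph), metric subregularity at x̄ = 0 for v̄ = 0 can be checked directly: (∂g)⁻¹(0) = {0}, and dist(0; ∂g(x)) = 1 for every x ≠ 0, so (3.7) holds with ℓ = η.

```python
import numpy as np

# Metric subregularity (3.7) for g(x) = |x| at xbar = 0, vbar = 0:
# (∂g)^{-1}(0) = {0}, so the left-hand side of (3.7) is |x|; for x != 0
# the subdifferential is {sign(x)}, hence dist(0; ∂g(x)) = 1.
def dist_vbar_to_subdiff(x):
    if x == 0.0:
        return 0.0          # 0 ∈ ∂g(0) = [-1, 1]
    return 1.0              # dist(0; {sign(x)}) = 1

eta = 0.25
ell = eta                    # (3.7) holds with this modulus on B_eta(0)
xs = np.linspace(-eta, eta, 501)
lhs = np.abs(xs)                                        # dist(x; (∂g)^{-1}(0))
rhs = ell * np.array([dist_vbar_to_subdiff(x) for x in xs])
assert np.all(lhs <= rhs + 1e-12)
```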

The following lemma plays an important role in our analysis.

Lemma 3.2 (Necessary condition for quadratic growth).

Let g:𝕏¯g:\mathbb{X}\to\overline{\mathbb{R}} be a proper l.s.c. function and SS be a closed subset of 𝕏\mathbb{X} with x¯domgS\bar{x}\in\mbox{\rm dom}\,g\cap S. If gg satisfies the quadratic growth condition at x¯\bar{x} for some v¯g(x¯)\bar{v}\in\partial g(\bar{x}) w.r.t. SS with some modulus κ>0\kappa>0, we have

d2g(x¯|v¯)(w)κ[dist(w;TS(x¯))]2for allw𝕏.d^{2}g(\bar{x}|\bar{v})(w)\geq\kappa[{\rm dist}\,(w;T_{S}(\bar{x}))]^{2}\quad\mbox{for all}\quad w\in\mathbb{X}. (3.8)

Moreover, if gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v}, we have

Kerd2g(x¯|v¯)=def{w𝕏|d2g(x¯|v¯)(w)=0}=TS(x¯,v¯)(x¯).\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{w\in\mathbb{X}|\;d^{2}g(\bar{x}|\bar{v})(w)=0\right\}=T_{S(\bar{x},\bar{v})}(\bar{x}). (3.9)

Proof. Suppose that inequality (3.3) holds with some ε,δ,κ>0\varepsilon,\delta,\kappa>0. Pick any w𝕏w\in\mathbb{X}; we only need to verify (3.8) when d2g(x¯|v¯)(w)<d^{2}g(\bar{x}|\bar{v})(w)<\infty, i.e., wdomd2g(x¯|v¯)w\in\mbox{\rm dom}\,d^{2}g(\bar{x}|\bar{v}). It follows from (2.3) that there exist sequences tk0t_{k}\downarrow 0 and wkww_{k}\to w such that

d2g(x¯|v¯)(w)=limkg(x¯+tkwk)g(x¯)tkv¯,wk12tk2.d^{2}g(\bar{x}|\bar{v})(w)=\lim_{k\to\infty}\dfrac{g(\bar{x}+t_{k}w_{k})-g(\bar{x})-t_{k}\langle\bar{v},w_{k}\rangle}{\frac{1}{2}t_{k}^{2}}. (3.10)

Hence, we have

g(x¯+tkwk)g(x¯)v¯,x¯+tkwkx¯<δg(\bar{x}+t_{k}w_{k})-g(\bar{x})-\langle\bar{v},\bar{x}+t_{k}w_{k}-\bar{x}\rangle<\delta

when kk is sufficiently large. Combining (3.3) and (3.10) gives us that

d2g(x¯|v¯)(w)κlim supk[dist(x¯+tkwk;S)tk]2=κlim supk[dist(wk;Sx¯tk)]2\displaystyle\begin{array}[]{ll}d^{2}g(\bar{x}|\bar{v})(w)&\displaystyle\geq\kappa\limsup_{k\to\infty}\left[\dfrac{{\rm dist}\,(\bar{x}+t_{k}w_{k};S)}{t_{k}}\right]^{2}\\ &\displaystyle=\kappa\limsup_{k\to\infty}\left[{\rm dist}\,\left(w_{k};\frac{S-\bar{x}}{t_{k}}\right)\right]^{2}\end{array} (3.13)

As SS is closed, there exist ukSx¯tku_{k}\in\dfrac{S-\bar{x}}{t_{k}}, i.e., x¯+tkukS\bar{x}+t_{k}u_{k}\in S such that dist(wk;Sx¯tk)=wkuk.{\rm dist}\,\left(w_{k};\dfrac{S-\bar{x}}{t_{k}}\right)=\|w_{k}-u_{k}\|. This together with (3.13) implies that

d2g(x¯|v¯)(w)κ(lim supkwkuk2).d^{2}g(\bar{x}|\bar{v})(w)\geq\kappa(\limsup_{k\to\infty}\|w_{k}-u_{k}\|^{2}).

Hence uku_{k} is bounded. By passing to a subsequence, we suppose that uku_{k} converges to u𝕏u\in\mathbb{X}. As x¯+tkukS\bar{x}+t_{k}u_{k}\in S, we have uTS(x¯)u\in T_{S}(\bar{x}). It follows from the above inequality that

d2g(x¯|v¯)(w)κwu2κ[dist(w;TS(x¯))]2,d^{2}g(\bar{x}|\bar{v})(w)\geq\kappa\|w-u\|^{2}\geq\kappa[{\rm dist}\,(w;T_{S}(\bar{x}))]^{2},

which verifies (3.8).

To justify (3.9), suppose further that gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v}. For any wKerd2g(x¯|v¯)w\in\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v}), we obtain from (3.8) that dist(w;TS(x¯,v¯)(x¯))=0{\rm dist}\,(w;T_{S(\bar{x},\bar{v})}(\bar{x}))=0, which means wTS(x¯,v¯)(x¯)w\in T_{S(\bar{x},\bar{v})}(\bar{x}). It follows that Kerd2g(x¯|v¯)TS(x¯,v¯)(x¯)\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v})\subset T_{S(\bar{x},\bar{v})}(\bar{x}). Let us prove the opposite inclusion by picking any wTS(x¯,v¯)(x¯)w\in T_{S(\bar{x},\bar{v})}(\bar{x}). There exist tk0t_{k}\downarrow 0 and wkww_{k}\to w such that x¯+tkwkS(x¯,v¯)\bar{x}+t_{k}w_{k}\in S(\bar{x},\bar{v}), i.e.,

g(x¯+tkwk)g(x¯)tkv¯,wk0.g(\bar{x}+t_{k}w_{k})-g(\bar{x})-t_{k}\langle\bar{v},w_{k}\rangle\leq 0.

We obtain from (3.8) that

0κ[dist(w;TS(x¯,v¯)(x¯))]2d2g(x¯|v¯)(w)lim infkg(x¯+tkwk)g(x¯)tkv¯,wk12tk20,0\leq\kappa[{\rm dist}\,(w;T_{S(\bar{x},\bar{v})}(\bar{x}))]^{2}\leq d^{2}g(\bar{x}|\bar{v})(w)\leq\liminf_{k\to\infty}\dfrac{g(\bar{x}+t_{k}w_{k})-g(\bar{x})-t_{k}\langle\bar{v},w_{k}\rangle}{\frac{1}{2}t_{k}^{2}}\leq 0, (3.14)

which yields wKerd2g(x¯|v¯)w\in\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v}) and verifies TS(x¯,v¯)(x¯)Kerd2g(x¯|v¯)T_{S(\bar{x},\bar{v})}(\bar{x})\subset\mbox{\rm Ker}\,d^{2}g(\bar{x}|\bar{v}). The proof is complete. \hfill\Box

Next let us establish the main theorem of this section, which provides a geometric characterization for strong minima of problem (3.1).

Theorem 3.3 (Necessary and sufficient conditions for strong minima).

Let x¯int(domf)domg\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g be a stationary point of problem (3.1) and define v¯=deff(x¯)g(x¯)\bar{v}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\nabla f(\bar{x})\in\partial g(\bar{x}). If x¯\bar{x} is a strong solution of problem (3.1), then

2f(x¯)w,w>0for allwTS(x¯,v¯)(x¯){0}.\langle\nabla^{2}f(\bar{x})w,w\rangle>0\qquad\mbox{for all}\quad w\in T_{S(\bar{x},\bar{v})}(\bar{x})\setminus\{0\}. (3.15)

Suppose further that gg satisfies the quadratic growth condition at x¯\bar{x} for v¯=deff(x¯)\bar{v}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\nabla f(\bar{x}) and that 2f(x¯)\nabla^{2}f(\bar{x}) is positive semidefinite, then x¯\bar{x} is a strong solution of problem (3.1) if and only if

Ker2f(x¯)TS(x¯,v¯)(x¯)={0}.\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{S(\bar{x},\bar{v})}(\bar{x})=\{0\}. (3.16)

Proof. As ff is twice continuously differentiable at x¯int(domf)\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f), we derive from (2.5) and (3.2) that x¯\bar{x} is a strong solution of φ\varphi if and only if there exists some >0\ell>0 such that

2f(x¯)w,w+d2g(x¯|v¯)(w)w2for allw𝕏.\langle\nabla^{2}f(\bar{x})w,w\rangle+d^{2}g(\bar{x}|\bar{v})(w)\geq\ell\|w\|^{2}\quad\mbox{for all}\quad w\in\mathbb{X}. (3.17)

To justify the first part, suppose that x¯\bar{x} is a strong solution of φ\varphi, i.e., (3.17) holds. Pick any wTS(x¯,v¯)(x¯){0}w\in T_{S(\bar{x},\bar{v})}(\bar{x})\setminus\{0\} and find sequences tk0t_{k}\downarrow 0 and wkww_{k}\to w such that x¯+tkwkS(x¯,v¯)\bar{x}+t_{k}w_{k}\in S(\bar{x},\bar{v}), which means

g(x¯+tkwk)g(x¯)v¯,x¯+tkwkx¯12tk20\dfrac{g(\bar{x}+t_{k}w_{k})-g(\bar{x})-\langle\bar{v},\bar{x}+t_{k}w_{k}-\bar{x}\rangle}{\frac{1}{2}t_{k}^{2}}\leq 0

By the definition of d2g(x¯|v¯)(w)d^{2}g(\bar{x}|\bar{v})(w) in (2.3), we have d2g(x¯|v¯)(w)0d^{2}g(\bar{x}|\bar{v})(w)\leq 0. This together with (3.17) verifies (3.15).

To verify the second part of the theorem, suppose that the function gg satisfies the quadratic growth condition at x¯\bar{x} for v¯\bar{v} with some modulus κ>0\kappa>0 and that 2f(x¯)\nabla^{2}f(\bar{x}) is positive semidefinite. It is obvious that (3.15) implies (3.16). We only need to prove that (3.16) is sufficient for strong minima at x¯\bar{x}. Suppose that condition (3.16) is satisfied. If condition (3.17) failed, we could find a sequence wkw_{k} such that wk=1\|w_{k}\|=1 and

2f(x¯)wk,wk+d2g(x¯|v¯)(wk)1k.\langle\nabla^{2}f(\bar{x})w_{k},w_{k}\rangle+d^{2}g(\bar{x}|\bar{v})(w_{k})\leq\frac{1}{k}.

It follows from (3.8) that

1k2f(x¯)wk,wk+κ[dist(wk;TS(x¯,v¯)(x¯))]2.\frac{1}{k}\geq\langle\nabla^{2}f(\bar{x})w_{k},w_{k}\rangle+\kappa[{\rm dist}\,(w_{k};T_{S(\bar{x},\bar{v})}(\bar{x}))]^{2}. (3.18)

By passing to a subsequence (without relabeling), assume that wkw0w_{k}\to w_{0} with w0=1\|w_{0}\|=1. Letting kk\to\infty in (3.18), it follows that

02f(x¯)w0,w0+κ[dist(w0;TS(x¯,v¯)(x¯))]22f(x¯)w0,w00.0\geq\langle\nabla^{2}f(\bar{x})w_{0},w_{0}\rangle+\kappa[{\rm dist}\,(w_{0};T_{S(\bar{x},\bar{v})}(\bar{x}))]^{2}\geq\langle\nabla^{2}f(\bar{x})w_{0},w_{0}\rangle\geq 0.

Hence, we have 2f(x¯)w0,w0=0\langle\nabla^{2}f(\bar{x})w_{0},w_{0}\rangle=0 and dist(w0;TS(x¯,v¯)(x¯))=0{\rm dist}\,(w_{0};T_{S(\bar{x},\bar{v})}(\bar{x}))=0, which means

w0Ker2f(x¯)TS(x¯,v¯)(x¯).w_{0}\in\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{S(\bar{x},\bar{v})}(\bar{x}).

This contradicts (3.16) as w0=1\|w_{0}\|=1. Hence, (3.17) holds for some >0\ell>0 and x¯\bar{x} is a strong solution of φ\varphi. The proof is complete. \hfill\Box

Corollary 3.4 (Geometric characterization for strong minima of convex problems).

Let f,g:𝕏¯f,g:\mathbb{X}\to\overline{\mathbb{R}} be proper l.s.c. convex functions and x¯int(domf)domg\bar{x}\in{\rm int}\,(\mbox{\rm dom}\,f)\cap\mbox{\rm dom}\,g be a minimizer to problem (3.1). Suppose that ff is twice continuously differentiable in int(domf){\rm int}\,(\mbox{\rm dom}\,f) and that g\partial g is metrically subregular at x¯\bar{x} for v¯=f(x¯)\bar{v}=-\nabla f(\bar{x}). Then x¯\bar{x} is a strong solution to problem (3.1) if and only if

Ker2f(x¯)T(g)(v¯)(x¯)={0}.\mbox{\rm Ker}\,\nabla^{2}f(\bar{x})\cap T_{(\partial g^{*})(\bar{v})}(\bar{x})=\{0\}. (3.19)

Proof. As discussed before (3.7), when gg is a convex function, the metric subregularity of g\partial g at x¯\bar{x} for v¯\bar{v} implies the quadratic growth condition (they are indeed equivalent [2, 52].) Since ff is convex, 2f(x¯)\nabla^{2}f(\bar{x}) is positive semidefinite. By Theorem 3.3, x¯\bar{x} is a strong solution if and only if (3.16) holds. Since gg is a convex function, we have S(x¯,v¯)=(g)1(v¯)=g(v¯)S(\bar{x},\bar{v})=(\partial g)^{-1}(\bar{v})=\partial g^{*}(\bar{v}). Thus (3.19) is equivalent to (3.16). The proof is complete. \hfill\Box

Unlike many other necessary and sufficient conditions for strong minima, our geometric characterizations (3.16) and (3.19) do not involve the “curvature” or the “sigma-term” of the function gg. We still need to compute the contingent cones TS(x¯,v¯)(x¯)T_{S(\bar{x},\bar{v})}(\bar{x}) in (3.16) or Tg(v¯)(x¯)T_{\partial g^{*}(\bar{v})}(\bar{x}) in (3.19). In Sections 4 and 5, we consider the case g=g=\|\cdot\|_{*}, the nuclear norm, and provide a simple calculation of Tg(v¯)(x¯)T_{\partial g^{*}(\bar{v})}(\bar{x}).
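As a toy illustration of Corollary 3.4 (our own hypothetical example, not taken from the text), consider 𝕏 = ℝ² with f(x) = ½x₁², g(x) = |x₂|, minimizer x̄ = 0, and v̄ = −∇f(0) = 0. Then Ker ∇²f(0) is the x₂-axis, while ∂g*(0) = (∂g)⁻¹(0) is the x₁-axis and so is its contingent cone at 0; the intersection in (3.19) is {0}, and strong minimality can be confirmed directly by sampling.

```python
import numpy as np

# Toy instance of the characterization (3.19) on R^2:
# f(x) = 0.5*x1^2, g(x) = |x2|, xbar = 0, vbar = -grad f(0) = 0.
ker_hess = np.array([[0.0, 1.0]]).T     # basis of Ker grad^2 f(0) (the x2-axis)
tangent = np.array([[1.0, 0.0]]).T      # basis of T_{dg*(0)}(0) (the x1-axis)
# two one-dimensional subspaces of R^2 meet only at 0 iff their stacked
# bases have full rank 2
assert np.linalg.matrix_rank(np.hstack([ker_hess, tangent])) == 2

# confirm the strong-minimum growth phi(x) >= c*||x||^2 near xbar = 0
rng = np.random.default_rng(0)
pts = rng.uniform(-0.5, 0.5, size=(2000, 2))
phi = 0.5 * pts[:, 0] ** 2 + np.abs(pts[:, 1])
assert np.all(phi >= 0.25 * np.sum(pts ** 2, axis=1))
```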

3.2 Geometric characterization for strong minima of optimization problems with linear constraints

In this subsection, we apply the idea in Theorem 3.3 and Corollary 3.4 to the following convex optimization problem with linear constraints

minx𝕏g(x)subject toΦxK,\min_{x\in\mathbb{X}}\quad g(x)\quad\mbox{subject to}\quad\Phi x\in K, (3.20)

where g:𝕏g:\mathbb{X}\to\mathbb{R} is a continuous (nonsmooth) convex function with full domain, Φ:𝕏𝕐\Phi:\mathbb{X}\to\mathbb{Y} is a linear operator between two Euclidean spaces, and KK is a closed convex polyhedral set in 𝕐\mathbb{Y}. Unlike in problem (3.1), the function gg must satisfy additional properties, namely convexity, the quadratic growth condition, and second order regularity, as stated in the next theorem.

Let us recall that x0x_{0} is a stationary solution of problem (3.20) if there exists a Lagrange multiplier λ𝕐\lambda\in\mathbb{Y}^{*}, also known as a dual certificate, such that

Φλg(x0)andλNK(Φx0).-\Phi^{*}\lambda\in\partial g(x_{0})\quad\mbox{and}\quad\lambda\in N_{K}(\Phi x_{0}). (3.21)

The set of Lagrange multipliers is defined by

Λ(x0)=def{λNK(Φx0)|Φλg(x0)}.\displaystyle\Lambda(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{\lambda\in N_{K}(\Phi x_{0})|-\Phi^{*}\lambda\in\partial g(x_{0})\}. (3.22)

The critical cone of this problem at the stationary point x0x_{0} is

C(x0)=def{w𝕏|ΦwTK(Φx0),dg(x0)(w)=0}.C(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{w\in\mathbb{X}|\;\Phi w\in T_{K}(\Phi x_{0}),dg(x_{0})(w)=0\}. (3.23)

The point x0x_{0} is called a strong solution of problem (3.20) if there exist ε>0\varepsilon>0 and c>0c>0 such that

g(x)g(x0)cxx02whenΦxKandx𝔹ε(x0).g(x)-g(x_{0})\geq c\|x-x_{0}\|^{2}\qquad\mbox{when}\qquad\Phi x\in K\quad\mbox{and}\quad x\in\mathbb{B}_{\varepsilon}(x_{0}).
Theorem 3.5 (Geometric characterization for strong minima of problem (3.20)).

Let x0x_{0} be a stationary point of problem (3.20). Suppose that the convex function gg is second order regular at x0x_{0} and satisfies the quadratic growth condition at x0x_{0}. Then x0x_{0} is a strong solution of problem (3.20) if and only if

[λΛ(x0)Tg(Φλ)(x0)]C(x0)={0}.\left[\bigcap_{\lambda\in\Lambda(x_{0})}T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0})\right]\cap C(x_{0})=\{0\}. (3.24)

Proof. Define y0=defΦx0,y_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\Phi x_{0}, 𝕃=defImΦ\mathbb{L}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm Im}\,\Phi (a linear subspace of 𝕐\mathbb{Y}), K𝕃=defK𝕃K_{\mathbb{L}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}K\cap\mathbb{L}, the mapping F:𝕏𝕃×𝕏F:\mathbb{X}\to\mathbb{L}\times\mathbb{X} by F(x)=def(Φx,x)F(x)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(\Phi x,x) for x𝕏x\in\mathbb{X}, and the function G:𝕃×𝕏¯G:\mathbb{L}\times\mathbb{X}\to\overline{\mathbb{R}} by

G(u,x)=ιK𝕃(u)+g(x)for(u,x)𝕃×𝕏.G(u,x)=\iota_{K_{\mathbb{L}}}(u)+g(x)\quad\mbox{for}\quad(u,x)\in\mathbb{L}\times\mathbb{X}.

Hence we may replace 𝕐\mathbb{Y} by 𝕃\mathbb{L} and KK by K𝕃K_{\mathbb{L}} in problem (3.20) and rewrite it as a composite optimization problem of the form (2.17):

infx𝕏G(F(x)).\inf_{x\in\mathbb{X}}\quad G(F(x)). (3.25)

Observe that x0x_{0} is a strong solution of (3.20) if and only if it is a strong solution of (3.25). Robinson’s constraint qualification (2.18) for problem (3.25) is

0int(F(x0)+F(x0)𝕏K𝕃×𝕏).0\in{\rm int}\,(F(x_{0})+\nabla F(x_{0})\mathbb{X}-K_{\mathbb{L}}\times\mathbb{X}). (3.26)

As Φ:𝕏𝕃\Phi:\mathbb{X}\to\mathbb{L} is a surjective operator, [8, Corollary 2.101] tells us that the above condition is equivalent to the existence of w𝕏w\in\mathbb{X} satisfying

y0+ΦwK𝕃andx0+wint𝕏=𝕏.y_{0}+\Phi w\in K_{\mathbb{L}}\quad\mbox{and}\quad x_{0}+w\in{\rm int}\,\mathbb{X}=\mathbb{X}.

This condition holds trivially at w=0𝕏w=0_{\mathbb{X}}. Thus Robinson’s constraint qualification (3.26) holds.

The critical cone (2.20) for problem (3.25) is

C^(x0)=def{w𝕏|dG(F(x0)|F(x0)w)=0}={w𝕏|ΦwTK𝕃(y0),dg(x0)(w)=0}.\displaystyle\widehat{C}(x_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{w\in\mathbb{X}|\;dG(F(x_{0})|\;\nabla F(x_{0})w)=0\}=\{w\in\mathbb{X}|\;\Phi w\in T_{K_{\mathbb{L}}}(y_{0}),dg(x_{0})(w)=0\}. (3.27)

As K𝕃K_{\mathbb{L}} is also a polyhedral set in 𝕃\mathbb{L}, we have

TK𝕃(y0)=+(K𝕃y0)=+(Ky0)𝕃=TK(y0)𝕃.T_{K_{\mathbb{L}}}(y_{0})=\mathbb{R}_{+}(K_{\mathbb{L}}-y_{0})=\mathbb{R}_{+}(K-y_{0})\cap\mathbb{L}=T_{K}(y_{0})\cap\mathbb{L}. (3.28)

It follows that the set C^(x0)\widehat{C}(x_{0}) is exactly the critical cone C(x0)C(x_{0}) defined in (3.23).

By (2.19), the set of Lagrange multipliers of problem (3.25) is

Λ^(x0)=def{(λ,μ)𝕃×𝕏|F(x0)(λ,μ)=0,(λ,μ)G(F(x0))}={(λ,μ)𝕃×𝕏|μ=Φλ,λNK𝕃(y0),μg(x0)}.\displaystyle\begin{array}[]{ll}\widehat{\Lambda}(x_{0})&\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{(\lambda,\mu)\in\mathbb{L}^{*}\times\mathbb{X}^{*}|\;\nabla F(x_{0})^{*}(\lambda,\mu)=0,(\lambda,\mu)\in\partial G(F(x_{0}))\}\\ &=\{(\lambda,\mu)\in\mathbb{L}^{*}\times\mathbb{X}^{*}|\;\mu=-\Phi^{*}\lambda,\lambda\in N_{K_{\mathbb{L}}}(y_{0}),\mu\in\partial g(x_{0})\}.\end{array} (3.31)

Note further that

epiG=K𝕃×epig.\mbox{\rm epi}\,G=K_{\mathbb{L}}\times\mbox{\rm epi}\,g.

Since K𝕃K_{\mathbb{L}} is a polyhedral set, it is second order regular [7]. As epig\mbox{\rm epi}\,g is second order regular, so is epiG\mbox{\rm epi}\,G; see, e.g., [8, Proposition 3.89].

By Theorem 2.4, x0x_{0} is a strong solution of problem (3.25) if and only if for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists (λ,μ)Λ^(x0)(\lambda,\mu)\in\widehat{\Lambda}(x_{0}) such that

(λ,μ),2F(x0)(w,w)+d2G(F(x0)|(λ,μ))(F(x0)w)>0.\displaystyle\langle(\lambda,\mu),\nabla^{2}F(x_{0})(w,w)\rangle+d^{2}G(F(x_{0})|(\lambda,\mu))(\nabla F(x_{0})w)>0. (3.32)

Observe that

d2G(F(x0)|(λ,μ))(F(x0)w)=d2g(x0|μ)(w)+d2ιK𝕃(y0|λ)(Φw).d^{2}G(F(x_{0})|(\lambda,\mu))(\nabla F(x_{0})w)=d^{2}g(x_{0}|\mu)(w)+d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w). (3.33)

Note from (2.3) that

d2ιK𝕃(y0|λ)(Φw)=lim infzΦw,t0ιK𝕃(y0+tz)ιK𝕃(y0)tλ,z0.5t20.d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w)=\liminf_{z\to\Phi w,t\downarrow 0}\frac{\iota_{K_{\mathbb{L}}}(y_{0}+tz)-\iota_{K_{\mathbb{L}}}(y_{0})-t\langle\lambda,z\rangle}{0.5t^{2}}\geq 0. (3.34)

By (3.28), we have

d2ιK𝕃(y0|λ)(Φw)=lim infzTK𝕃(y0)Φw,t0λ,z0.5t0.d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w)=\liminf_{z\stackrel{{\scriptstyle T_{K_{\mathbb{L}}}(y_{0})}}{{\to}}\Phi w,t\downarrow 0}\frac{-\langle\lambda,z\rangle}{0.5t}\geq 0. (3.35)

Since wC(x0)w\in C(x_{0}), we have ΦwTK𝕃(Φx0)\Phi w\in T_{K_{\mathbb{L}}}(\Phi x_{0}). As λNK𝕃(Φx0)\lambda\in N_{K_{\mathbb{L}}}(\Phi x_{0}) and μ=Φλg(x0)\mu=-\Phi^{*}\lambda\in\partial g(x_{0}), it follows that

0=dg(x0)(w)μ,w=Φλ,w=λ,Φw0,0=dg(x_{0})(w)\geq\langle\mu,w\rangle=-\langle\Phi^{*}\lambda,w\rangle=-\langle\lambda,\Phi w\rangle\geq 0,

which implies that λ,Φw=0\langle\lambda,\Phi w\rangle=0. This together with (3.35) tells us that d2ιK𝕃(y0|λ)(Φw)=0d^{2}\iota_{K_{\mathbb{L}}}(y_{0}|\lambda)(\Phi w)=0.

Since FF is linear, 2F(x0)=0\nabla^{2}F(x_{0})=0; by (3.33), condition (3.32) is thus equivalent to

d2g(x0|Φλ)(w)>0.d^{2}g(x_{0}|-\Phi^{*}\lambda)(w)>0. (3.36)

Since KK is a polyhedral set, we have

NK𝕃(Φx0)=NK𝕃(Φx0)=NK(Φx0)+N𝕃(Φx0)=NK(Φx0)+KerΦ.N_{K_{\mathbb{L}}}(\Phi x_{0})=N_{K\cap\mathbb{L}}(\Phi x_{0})=N_{K}(\Phi x_{0})+N_{\mathbb{L}}(\Phi x_{0})=N_{K}(\Phi x_{0})+\mbox{\rm Ker}\,\Phi^{*}.

Representing λ=λ1+λ2\lambda=\lambda_{1}+\lambda_{2} with λ1NK(Φx0)\lambda_{1}\in N_{K}(\Phi x_{0}) and λ2KerΦ\lambda_{2}\in\mbox{\rm Ker}\,\Phi^{*}, we obtain

Φλ=Φ(λ1+λ2)=Φλ1.-\Phi^{*}\lambda=-\Phi^{*}(\lambda_{1}+\lambda_{2})=-\Phi^{*}\lambda_{1}.

Hence x0x_{0} is a strong solution of problem (3.25) if and only if for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists some λΛ(x0)\lambda\in\Lambda(x_{0}) in (3.21) such that (3.36) holds. Since gg satisfies the quadratic growth condition at x0x_{0}, we get from Lemma 3.2 that

Kerd2g(x0|Φλ)=Tg(Φλ)(x0).\mbox{\rm Ker}\,d^{2}g(x_{0}|-\Phi^{*}\lambda)=T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0}).

Hence condition (3.36) means for any wC(x0){0}w\in C(x_{0})\setminus\{0\} there exists λΛ(x0)\lambda\in\Lambda(x_{0}) such that

wTg(Φλ)(x0),i.e.,w𝕏Tg(Φλ)(x0).w\notin T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0}),\quad\mbox{i.e.,}\quad w\in\mathbb{X}\setminus T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0}).

This is equivalent to the following inclusion

C(x0){0}[λΛ(x0)(𝕏Tg(Φλ)(x0))]=𝕏[λΛ(x0)Tg(Φλ)(x0)],C(x_{0})\setminus\{0\}\subset\left[\bigcup_{\lambda\in\Lambda(x_{0})}\left(\mathbb{X}\setminus T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0})\right)\right]=\mathbb{X}\setminus\left[\bigcap_{\lambda\in\Lambda(x_{0})}T_{\partial g^{*}(-\Phi^{*}\lambda)}(x_{0})\right],

which is also equivalent to (3.24). The proof is complete. \hfill\Box

The approach of studying problem (3.20) via the composite problem (3.25) is traditional; see, e.g., [7, 34]. However, by additionally assuming that the function gg satisfies the quadratic growth condition at x0x_{0}, we are able to obtain the new geometric characterization of strong solutions in (3.24). The main idea in this result is similar to that in Theorem 3.3. Here we require the function gg to satisfy more assumptions, but they hold when applying this result to the nuclear norm minimization problem (5.1), as the nuclear norm is second order regular and also satisfies the quadratic growth condition [18].

4 Characterizations for strong minima of low-rank optimization problems

This section is devoted to new characterizations for strong minima of the low-rank optimization problem:

minXn1×n2h(ΦX)+μX,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad h(\Phi X)+\mu\|X\|_{*}, (4.1)

where Φ:n1×n2m\Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{m} is a linear operator, g(X)=defXg(X)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|X\|_{*} is the nuclear norm of Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}}, μ\mu is a positive constant, and h:m¯h:\mathbb{R}^{m}\to\overline{\mathbb{R}} satisfies the following standing assumptions [24]:

(A) hh is proper convex and twice continuously differentiable in int(domh){\rm int}\,(\mbox{\rm dom}\,h).

(B) 2h(ΦX)\nabla^{2}h(\Phi X) is positive definite for any XΦ1(int(domh))X\in\Phi^{-1}({\rm int}\,(\mbox{\rm dom}\,h)).

Strongly convex functions with full domain clearly satisfy the above standing assumptions. Another important (non-strongly convex) function satisfying these conditions that is widely used in statistical/machine learning is the Kullback-Leibler divergence. Sufficient conditions for strong minima of problem (4.1) can be obtained from [18, Theorem 12]. However, that result still relies on some computation of d2d^{2}\|\cdot\|_{*}, which is complicated; see, e.g., [20, 51] and the recent paper [37] for the case of symmetric matrices. We will provide some explicit and computable characterizations for strong minima of problem (4.1) based on Corollary 3.4. The calculation of the contingent cone Tg(Y¯)(X¯)T_{\partial g^{*}(\overline{Y})}(\overline{X}) is rather simple; see our formula (4.10) below.
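Since the Kullback-Leibler divergence is only named in passing above, the following sketch illustrates why it fits the standing assumptions. We assume the common form h(y) = Σᵢ (yᵢ log(yᵢ/bᵢ) − yᵢ + bᵢ) with a fixed positive vector b (our assumption, not a formula from the text): h is convex and twice smooth on int(dom h) = {y > 0} with Hessian diag(1/y) ≻ 0, yet it is not strongly convex since these eigenvalues vanish as y grows.

```python
import numpy as np

# Hessian of h(y) = sum_i (y_i*log(y_i/b_i) - y_i + b_i) on {y > 0}:
# grad^2 h(y) = diag(1/y), independent of the reference vector b.
def kl_hessian(y):
    return np.diag(1.0 / y)

y = np.array([0.3, 1.7, 4.0])
eigs = np.linalg.eigvalsh(kl_hessian(y))
assert np.all(eigs > 0)                    # positive definite at this point
# but no uniform lower bound: the smallest eigenvalue 1/max(y) -> 0
assert np.min(np.linalg.eigvalsh(kl_hessian(1e6 * np.ones(3)))) < 1e-5
```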

Let us recall a few standard notations for matrices. The space of all matrices n1×n2\mathbb{R}^{n_{1}\times n_{2}} (n1n2n_{1}\leq n_{2}) is endowed with the inner product

X,Y=defTr(XTY)for allX,Yn1×n2,\langle X,Y\rangle\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm Tr}\,(X^{T}Y)\quad\mbox{for all}\quad X,Y\in\mathbb{R}^{n_{1}\times n_{2}},

where Tr is the trace operator. The Frobenius norm on n1×n2\mathbb{R}^{n_{1}\times n_{2}} is

XF=defTr(XTX)for allXn1×n2.\|X\|_{F}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sqrt{\mbox{\rm Tr}\,(X^{T}X)}\quad\mbox{for all}\quad X\in\mathbb{R}^{n_{1}\times n_{2}}.

The nuclear norm and spectral norm of Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}} are defined respectively by

X=defi=1n1σi(X)andX=defσ1(X),\|X\|_{*}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sum_{i=1}^{n_{1}}\sigma_{i}(X)\quad\mbox{and}\quad\|X\|\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\sigma_{1}(X),

where σ1(X)σ2(X)σn1(X)0\sigma_{1}(X)\geq\sigma_{2}(X)\geq\ldots\geq\sigma_{n_{1}}(X)\geq 0 are the singular values of XX. Suppose that a full Singular Value Decomposition (SVD) of X¯n1×n2\overline{X}\in\mathbb{R}^{n_{1}\times n_{2}} is

X¯=U(Σ¯r000)n1×n2VTwithΣ¯r=(σ1(X¯)00σr(X¯)),\overline{X}=U\begin{pmatrix}\overline{\Sigma}_{r}&0\\ 0&0\end{pmatrix}_{n_{1}\times n_{2}}V^{T}\quad\mbox{with}\quad\overline{\Sigma}_{r}=\begin{pmatrix}\sigma_{1}(\overline{X})&\dots&0\\ \vdots&\ddots&\vdots\\ 0&\ldots&\sigma_{r}(\overline{X})\end{pmatrix}, (4.2)

where r=rank(X¯)r={\rm rank}\,(\overline{X}), Un1×n1U\in\mathbb{R}^{n_{1}\times n_{1}} and Vn2×n2V\in\mathbb{R}^{n_{2}\times n_{2}} are orthogonal matrices. Let 𝒪(X¯)\mathcal{O}(\overline{X}) be the set of all such pairs (U,V)(U,V) satisfying (4.2). We write U=(UIUJ)U=\begin{pmatrix}U_{I}&U_{J}\end{pmatrix} and V=(VIVK)V=\begin{pmatrix}V_{I}&V_{K}\end{pmatrix}, where UIU_{I} and VIV_{I} are the submatrices of the first rr columns of UU and VV, respectively. We get from (4.2) that X¯=UIΣ¯rVIT\overline{X}=U_{I}\overline{\Sigma}_{r}V_{I}^{T}, which is known as a compact SVD of X¯\overline{X}.
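The decomposition (4.2) and the compact SVD X̄ = U_I Σ̄_r V_Iᵀ are easy to reproduce numerically; the sketch below (with hypothetical random data of rank 2) extracts U_I, Σ̄_r, and V_I from a full SVD.

```python
import numpy as np

# Compact SVD X = U_I * Sigma_r * V_I^T as in (4.2), for a rank-2 example.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 5))  # rank 2, 4x5

U, s, Vt = np.linalg.svd(X, full_matrices=True)   # singular values sorted
r = int(np.sum(s > 1e-10))                        # numerical rank
U_I, V_I = U[:, :r], Vt[:r, :].T                  # first r columns of U and V
Sigma_r = np.diag(s[:r])

assert r == 2
assert np.allclose(X, U_I @ Sigma_r @ V_I.T)      # X = U_I Sigma_r V_I^T
```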

The following lemma is significant in our paper. The first part is well-known [48, Example 2]. The last part was established in [50, Proposition 10], which can be viewed as a direct consequence of [48, Example 1] via convex analysis and the formula for the normal cone to a level set [41, Corollary 23.7.1].

Lemma 4.1 (Subdifferential of the nuclear norm).

The subdifferential of the nuclear norm at X¯n1×n2\overline{X}\in\mathbb{R}^{n_{1}\times n_{2}} is computed by

X¯={U(𝕀r00W)VT|W1}for any(U,V)𝒪(X¯).\partial\|\overline{X}\|_{*}=\left\{U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&W\end{pmatrix}V^{T}|\;\|W\|\leq 1\right\}\quad\mbox{for any}\quad(U,V)\in\mathcal{O}(\overline{X}). (4.3)

Moreover, Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*} if and only if Y¯1\|\overline{Y}\|\leq 1 and

X¯=Y¯,X¯.\|\overline{X}\|_{*}=\langle\overline{Y},\overline{X}\rangle. (4.4)

Furthermore, for any Y¯𝔹=def{Zn1×n2|Z1}\overline{Y}\in\mathbb{B}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{Z\in\mathbb{R}^{n_{1}\times n_{2}}|\;\|Z\|\leq 1\}, we have

g(Y¯)=N𝔹(Y¯)=U¯(𝕊+p(Y¯)000)V¯Tfor any(U¯,V¯)𝒪(Y¯),\partial g^{*}(\overline{Y})=N_{\mathbb{B}}(\overline{Y})=\overline{U}\begin{pmatrix}\mathbb{S}_{+}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}\quad\mbox{for any}\quad(\overline{U},\overline{V})\in\mathcal{O}(\overline{Y}), (4.5)

where 𝕊+p\mathbb{S}_{+}^{p} is the set of all p×pp\times p symmetric positive semidefinite matrices and p(Y¯)p(\overline{Y}) is defined by

p(Y¯)=def#{i|σi(Y¯)=1}.p(\overline{Y})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\#\{i|\;\sigma_{i}(\overline{Y})=1\}. (4.6)

Let Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*} and (U,V)𝒪(X¯)(U,V)\in\mathcal{O}(\overline{X}). It follows from (4.3) that Y¯\overline{Y} can be represented by

Y¯=U(𝕀r00W¯)VT\overline{Y}=U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&\overline{W}\end{pmatrix}V^{T} (4.7)

with some W¯(n1r)×(n2r)\overline{W}\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)} satisfying W¯1\|\overline{W}\|\leq 1. Let (U^,V^)𝒪(W¯)(\widehat{U},\widehat{V})\in\mathcal{O}(\overline{W}) and U^ΣV^T\widehat{U}\Sigma\widehat{V}^{T} be a full SVD of W¯\overline{W}. We get from (4.7) that

Y¯=U¯(𝕀r00Σ)V¯TwithU¯=def(UIUJU^)andV¯=def(VIVKV^).\overline{Y}=\overline{U}\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&\Sigma\end{pmatrix}\overline{V}^{T}\quad\mbox{with}\quad\overline{U}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(U_{I}\;U_{J}\widehat{U})\quad\mbox{and}\quad\overline{V}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(V_{I}\;V_{K}\widehat{V}). (4.8)

Observe that U¯TU¯=𝕀n1\overline{U}^{T}\overline{U}=\mathbb{I}_{{n_{1}}} and V¯TV¯=𝕀n2\overline{V}^{T}\overline{V}=\mathbb{I}_{n_{2}}. It follows that (U¯,V¯)𝒪(X¯)𝒪(Y¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X})\cap\mathcal{O}(\overline{Y}), which means X¯\overline{X} and Y¯\overline{Y} have simultaneous ordered singular value decomposition [30, 31] with orthogonal matrix pair (U¯,V¯)(\overline{U},\overline{V}) in the sense that

X¯=U¯(Diagσ(X¯))V¯TandY¯=U¯(Diagσ(Y¯))V¯T,\overline{X}=\overline{U}({\rm Diag}\,\sigma(\overline{X}))\overline{V}^{T}\qquad\mbox{and}\qquad\overline{Y}=\overline{U}({\rm Diag}\,\sigma(\overline{Y}))\overline{V}^{T}, (4.9)

where σ(X¯)=def(σ1(X¯),,σn1(X¯))T\sigma(\overline{X})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left(\sigma_{1}(\overline{X}),\ldots,\sigma_{n_{1}}(\overline{X})\right)^{T} and Diagσ(X¯)=def(σ1(X¯)00000000σn1(X¯)00)n1×n2{\rm Diag}\,\sigma(\overline{X})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}\sigma_{1}(\overline{X})&\ldots&0&0&\ldots&0\\ 0&\ddots&0&0&\ldots&0\\ 0&\ldots&\sigma_{n_{1}}(\overline{X})&0&\ldots&0\end{pmatrix}_{n_{1}\times n_{2}}.
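The construction (4.7)-(4.9) can be traced numerically. The sketch below (hypothetical data of our choosing: n₁ = 4, n₂ = 5, r = 2, and a diagonal block W̄ with ‖W̄‖ ≤ 1) builds Ȳ ∈ ∂‖X̄‖_* via (4.7), checks the membership criterion (4.4), and verifies the simultaneous ordered SVD (4.9) with the realigned pair (Ū, V̄) from (4.8).

```python
import numpy as np

def diag_rect(vals, n1, n2):
    # rectangular "Diag" matrix as below (4.9)
    D = np.zeros((n1, n2)); D[:len(vals), :len(vals)] = np.diag(vals); return D

rng = np.random.default_rng(2)
n1, n2, r = 4, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
V, _ = np.linalg.qr(rng.standard_normal((n2, n2)))
Xbar = U @ diag_rect([3.0, 1.5], n1, n2) @ V.T            # (4.2), rank 2

W = np.zeros((n1 - r, n2 - r)); W[0, 0], W[1, 1] = 0.8, 0.4   # ||W|| <= 1
Ybar = U @ np.block([[np.eye(r), np.zeros((r, n2 - r))],
                     [np.zeros((n1 - r, r)), W]]) @ V.T   # (4.7)

# (4.4): membership Ybar in the subdifferential of ||.||_* at Xbar
assert np.linalg.norm(Ybar, 2) <= 1 + 1e-12
assert np.isclose(np.sum(np.linalg.svd(Xbar, compute_uv=False)),
                  np.sum(Xbar * Ybar))

# (4.8): realign so that Xbar, Ybar share one orthogonal pair (Ubar, Vbar)
Uh, sw, Vht = np.linalg.svd(W, full_matrices=True)
Ubar = np.hstack([U[:, :r], U[:, r:] @ Uh])
Vbar = np.hstack([V[:, :r], V[:, r:] @ Vht.T])
# (4.9): simultaneous ordered SVD
assert np.allclose(Xbar, Ubar @ diag_rect([3.0, 1.5, 0.0, 0.0], n1, n2) @ Vbar.T)
assert np.allclose(Ybar, Ubar @ diag_rect([1.0, 1.0, 0.8, 0.4], n1, n2) @ Vbar.T)
```

Here p(Ȳ) = 2 in the sense of (4.6), since exactly the first two singular values of Ȳ equal one.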

The following result establishes a geometric characterization for strong solutions of problem (4.1). According to [50, Proposition 11], the subdifferential of the nuclear norm satisfies the metric subregularity condition (3.7) at any X¯\overline{X} for any YX¯Y\in\partial\|\overline{X}\|_{*}.

Corollary 4.2 (Geometric characterization for strong minima of low-rank optimization problems).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) is a minimizer of problem (4.1) with Y¯=def1μΦh(ΦX¯)X¯\overline{Y}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-\frac{1}{\mu}\Phi^{*}\nabla h(\Phi\overline{X})\in\partial\|\overline{X}\|_{*}. Let (U¯,V¯)𝒪(X¯)𝒪(Y¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X})\cap\mathcal{O}(\overline{Y}) be as in (4.8) and (4.9). Then we have

TN𝔹(Y¯)(X¯)={U¯(AB0BTC0000)V¯T|A𝕊r,Br×(p(Y¯)r),C𝕊+p(Y¯)r},T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\left\{\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&C&0\\ 0&0&0\end{pmatrix}\overline{V}^{T}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p(\overline{Y})-r)},C\in\mathbb{S}^{p(\overline{Y})-r}_{+}\right\}, (4.10)

where p(Y¯)p(\overline{Y}) is defined in (4.6) and 𝕊r\mathbb{S}^{r} is the set of all symmetric matrices of size r×rr\times r. Hence X¯\overline{X} is a strong solution of (4.1) if and only if

KerΦTN𝔹(Y¯)(X¯)={0}.\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\{0\}. (4.11)

Consequently, X¯\overline{X} is a strong solution of (4.1) provided that the following Strong Sufficient Condition holds

KerΦU¯(𝕊p(Y¯)000)V¯T={0}.\mbox{\rm Ker}\,\Phi\cap\overline{U}\begin{pmatrix}\mathbb{S}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}=\{0\}. (4.12)

Proof. By Lemma 4.1, we have

g(Y¯)=N𝔹(Y¯)=U¯(𝕊+p(Y¯)000)V¯T.\partial g^{*}(\overline{Y})=N_{\mathbb{B}}(\overline{Y})=\overline{U}\begin{pmatrix}\mathbb{S}_{+}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}.

As (U¯,V¯)𝒪(X¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X}), we obtain from (4.2) that

TN𝔹(Y¯)(X¯)=U¯(T𝕊+p(Y¯)(Σ¯r000)000)V¯T,T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\overline{U}\begin{pmatrix}T_{\mathbb{S}_{+}^{p(\overline{Y})}}\begin{pmatrix}\overline{\Sigma}_{r}&0\\ 0&0\end{pmatrix}&0\\ 0&0\end{pmatrix}\overline{V}^{T},

which is exactly the right-hand side of (4.10) according to the contingent cone formula for 𝕊+p(Y¯)\mathbb{S}^{p(\overline{Y})}_{+} in [8, Example 2.65]. Since \partial\|\cdot\|_{*} is metrically subregular at X¯\overline{X} for Y¯\overline{Y} by [50, Proposition 11], it follows from Corollary 3.4 that X¯\overline{X} is a strong solution of problem (4.1) if and only if

Ker(Φ2h(ΦX¯)Φ)TN𝔹(Y¯)(X¯)={0}.\mbox{\rm Ker}\,\Big{(}\Phi^{*}\nabla^{2}h(\Phi\overline{X})\Phi\Big{)}\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\{0\}.

Since 2h(ΦX¯)0\nabla^{2}h(\Phi\overline{X})\succ 0, we have Ker(Φ2h(ΦX¯)Φ)=KerΦ\mbox{\rm Ker}\,\Big{(}\Phi^{*}\nabla^{2}h(\Phi\overline{X})\Phi\Big{)}=\mbox{\rm Ker}\,\Phi. The characterization (4.11) for strong minima at X¯\overline{X} follows from the above condition.

Finally, note from (4.10) that

TN𝔹(Y¯)(X¯)U¯(𝕊p(Y¯)000)V¯T.T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})\subset\overline{U}\begin{pmatrix}\mathbb{S}^{p(\overline{Y})}&0\\ 0&0\end{pmatrix}\overline{V}^{T}.

Hence the Strong Sufficient Condition (4.12) implies (4.11), and therefore strong minima at X¯\overline{X}. \hfill\Box

Remark 4.3.

The geometric characterization (4.11) for strong minima of problem (4.1) is new. A sufficient condition can indeed be obtained from [18, Theorem 12], which considers more general optimization problems involving spectral functions. However, that result contains a nontrivial sigma-term, which is calculated explicitly in the recent papers [17, 37] for the case of symmetric matrices. Our approach is entirely different and involves no sigma-terms. Moreover, our condition is a full characterization of strong minima.

Another result about strong minima of problem (4.1) was established in [29, Proposition 12], which plays an important role in proving the local linear convergence of Forward-Backward algorithms for solving problem (4.1). That result mainly states that the so-called Restricted Injectivity and Nondegeneracy Condition are sufficient for strong minima; see also [24, Proposition 4.27] for a similar observation. Let us recall these important conditions here; further discussion of them appears in Section 5. Let (U,V)𝒪(X¯)(U,V)\in\mathcal{O}(\overline{X}) and define the model tangent subspace

𝕋=def{UIYT+XVIT|Xn1×r,Yn2×r}\mathbb{T}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{U_{I}Y^{T}+XV_{I}^{T}|\;X\in\mathbb{R}^{n_{1}\times r},Y\in\mathbb{R}^{n_{2}\times r}\} (4.13)

of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension dim𝕋=r(n1+n2r)\mbox{\rm dim}\,\mathbb{T}=r(n_{1}+n_{2}-r); see, e.g., [15, 16]. The Restricted Injectivity condition means

KerΦ𝕋={0}.\mbox{\rm Ker}\,\Phi\cap\mathbb{T}=\{0\}. (4.14)

The Nondegeneracy Condition holds when

Y¯=1μΦh(ΦX¯)riX¯,\overline{Y}=-\frac{1}{\mu}\Phi^{*}\nabla h(\Phi\overline{X})\in{\rm ri}\,\partial\|\overline{X}\|_{*}, (4.15)

where riX¯{\rm ri}\,\partial\|\overline{X}\|_{*} is the relative interior of X¯\partial\|\overline{X}\|_{*}; see [41]. The validity of Nondegeneracy Condition (4.15) implies that X¯\overline{X} is an optimal solution of problem (4.1). Note that

riX¯={U(𝕀r00W)VT|W<1}withr=rank(X¯).{\rm ri}\,\partial\|\overline{X}\|_{*}=\left\{U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&W\end{pmatrix}V^{T}|\;\|W\|<1\right\}\quad\mbox{with}\quad r=\mbox{\rm rank}\,(\overline{X}). (4.16)

Hence, the Nondegeneracy Condition (4.15) means that the number of singular values equal to one, p(Y¯)p(\overline{Y}) in (4.6), coincides with the rank of X¯\overline{X}. In this case, Restricted Injectivity (4.14) clearly implies the Strong Sufficient Condition (4.12). Hence, the combination of Restricted Injectivity (4.14) and the Nondegeneracy Condition (4.15) is stronger than our Strong Sufficient Condition (4.12). The following result gives a complete picture of strong minima when the Nondegeneracy Condition (4.15) holds.

Corollary 4.4 (Strong minima under Nondegeneracy Condition).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) and Nondegeneracy Condition (4.15) holds. Then X¯\overline{X} is a strong solution of problem (4.1) if and only if the following Strict Restricted Injectivity holds

KerΦUI𝕊rVIT={0},\mbox{\rm Ker}\,\Phi\cap U_{I}\mathbb{S}^{r}V_{I}^{T}=\{0\}, (4.17)

where UIΣ¯VITU_{I}\overline{\Sigma}V_{I}^{T} is a compact SVD of X¯\overline{X}.

Proof. As the Nondegeneracy Condition (4.15) holds, X¯\overline{X} is a solution of problem (4.1). In this case, observe from (4.8) and (4.10) that TN𝔹(Y¯)(X¯)=UI𝕊rVIT.T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=U_{I}\mathbb{S}^{r}V_{I}^{T}. The equivalence between strong minima at X¯\overline{X} and (4.17) follows from Corollary 3.4.\hfill\Box

As the dimension of the subspace UI𝕊rVITU_{I}\mathbb{S}^{r}V_{I}^{T} is 12r(r+1)\frac{1}{2}r(r+1), which is usually small in low-rank optimization problems, condition (4.17) is likely to hold when the Nondegeneracy Condition (4.15) is satisfied. Further discussion of Strict Restricted Injectivity is given in Section 5.

Although the geometric characterization (4.11) looks simple, checking it in high dimensions is nontrivial. The Strong Sufficient Condition (4.12) and Strict Restricted Injectivity (4.17), however, can be verified easily. Next we establish some quantitative characterizations of strong minima. Before doing so, we record the projection formulas onto the subspaces 𝕋\mathbb{T} and 𝕋\mathbb{T}^{\perp}. For any Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}}, suppose that XX is represented in block form as:

X=(UIUJ)(ABCD)(VIVK)Twith(U,V)𝒪(X¯).\displaystyle\begin{array}[]{ll}X&=\begin{pmatrix}U_{I}&U_{J}\end{pmatrix}\begin{pmatrix}A&B\\ C&D\end{pmatrix}\begin{pmatrix}V_{I}&V_{K}\end{pmatrix}^{T}\quad\mbox{with}\quad(U,V)\in\mathcal{O}(\overline{X}).\end{array}

The projections of XX onto 𝕋\mathbb{T} and 𝕋\mathbb{T}^{\perp} are computed respectively by

P𝕋X=(UIUJ)(ABC0)(VIVK)TandP𝕋X=UJDVKT.P_{\mathbb{T}}X=\begin{pmatrix}U_{I}&U_{J}\end{pmatrix}\begin{pmatrix}A&B\\ C&0\end{pmatrix}\begin{pmatrix}V_{I}&V_{K}\end{pmatrix}^{T}\quad\mbox{and}\quad P_{\mathbb{T}^{\perp}}X=U_{J}DV_{K}^{T}. (4.19)
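The projections (4.19) are stated in rotated block coordinates; equivalently, with the compact SVD factors UIU_{I}, VIV_{I} of X¯\overline{X}, one has P𝕋X=UIUITX+XVIVITUIUITXVIVITP_{\mathbb{T}}X=U_{I}U_{I}^{T}X+XV_{I}V_{I}^{T}-U_{I}U_{I}^{T}XV_{I}V_{I}^{T} and P𝕋X=(IUIUIT)X(IVIVIT)P_{\mathbb{T}^{\perp}}X=(I-U_{I}U_{I}^{T})X(I-V_{I}V_{I}^{T}). A minimal numpy sketch of these formulas, on randomly generated data for illustration only:

```python
import numpy as np

def projections(UI, VI):
    """Return P_T and P_Tperp acting on n1 x n2 matrices, given the compact
    SVD factors U_I, V_I of Xbar (orthonormal columns)."""
    PU, PV = UI @ UI.T, VI @ VI.T    # projectors onto column/row spaces of Xbar
    I1, I2 = np.eye(PU.shape[0]), np.eye(PV.shape[0])
    P_T = lambda X: PU @ X + X @ PV - PU @ X @ PV
    P_Tperp = lambda X: (I1 - PU) @ X @ (I2 - PV)
    return P_T, P_Tperp

rng = np.random.default_rng(0)
n1, n2, r = 5, 7, 2
Xbar = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
U, _, Vt = np.linalg.svd(Xbar)
UI, VI = U[:, :r], Vt[:r, :].T
P_T, P_Tperp = projections(UI, VI)

X = rng.standard_normal((n1, n2))
assert np.allclose(P_T(X) + P_Tperp(X), X)           # complementary projections
assert np.allclose(P_T(P_T(X)), P_T(X))              # idempotent
assert np.isclose(np.vdot(P_T(X), P_Tperp(X)), 0.0)  # mutually orthogonal
assert np.allclose(P_T(Xbar), Xbar)                  # Xbar itself lies in T
```

The asserted identities are exactly the complementarity and orthogonality of the two projections.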

The following result provides a formula for the critical cone of the nuclear norm at X¯\overline{X} for Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*}.

Proposition 4.5 (Critical cone of nuclear norm).

Let Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*} and (U¯,V¯)𝒪(X¯)𝒪(Y¯)(\overline{U},\overline{V})\in\mathcal{O}(\overline{X})\cap\mathcal{O}(\overline{Y}) be as in (4.2) and (4.8). Define H=def{k{r+1,,n1}|σk(Y¯)=1}H\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{k\in\{r+1,\ldots,n_{1}\}|\;\sigma_{k}(\overline{Y})=1\}. Then the critical cone 𝒞(X¯,Y¯)\mathcal{C}(\overline{X},\overline{Y}) of \|\cdot\|_{*} at X¯\overline{X} for Y¯\overline{Y} is computed by

𝒞(X¯,Y¯)=def{Wn1×n2|dX¯(W)=Y¯,W}={Wn1×n2|P𝕋WU¯H𝕊+|H|V¯HT},\mathcal{C}(\overline{X},\overline{Y})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;d\|\overline{X}\|_{*}(W)=\langle\overline{Y},W\rangle\}=\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}^{\perp}}W\in\overline{U}_{H}\mathbb{S}^{|H|}_{+}\overline{V}_{H}^{T}\right\}, (4.20)

where U¯H\overline{U}_{H} and V¯H\overline{V}_{H} are submatrices of index columns HH of U¯\overline{U} and V¯\overline{V}, respectively.

Proof. For any Wn1×n2W\in\mathbb{R}^{n_{1}\times n_{2}}, it is well-known from convex analysis [41] that

dX¯(W)=supYX¯Y,W.\displaystyle d\|\overline{X}\|_{*}(W)=\sup_{Y\in\partial\|\overline{X}\|_{*}}\langle Y,W\rangle. (4.21)

This together with (4.3) and (4.19) implies that

dX¯(W)=supYX¯Y,P𝕋W+P𝕋W=supYX¯P𝕋Y,W+Y,P𝕋W=E,W+P𝕋W\displaystyle\begin{array}[]{ll}d\|\overline{X}\|_{*}(W)&\displaystyle=\sup_{Y\in\partial\|\overline{X}\|_{*}}\langle Y,P_{\mathbb{T}}W+P_{\mathbb{T}^{\perp}}W\rangle=\sup_{Y\in\partial\|\overline{X}\|_{*}}\langle P_{\mathbb{T}}Y,W\rangle+\langle Y,P_{\mathbb{T}^{\perp}}W\rangle\\ &=\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\end{array} (4.24)

with E=defUIVITE\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}U_{I}V_{I}^{T}. As Y¯X¯\overline{Y}\in\partial\|\overline{X}\|_{*}, we have W𝒞(X¯,Y¯)W\in\mathcal{C}(\overline{X},\overline{Y}) if and only if

E,W+P𝕋W=P𝕋Y¯,W+P𝕋Y¯,W=E,W+P𝕋Y¯,P𝕋W,\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}=\langle P_{\mathbb{T}}\overline{Y},W\rangle+\langle P_{\mathbb{T}^{\perp}}\overline{Y},W\rangle=\langle E,W\rangle+\langle P_{\mathbb{T}^{\perp}}\overline{Y},P_{\mathbb{T}^{\perp}}W\rangle,

which means P𝕋W=P𝕋Y¯,P𝕋W\|P_{\mathbb{T}^{\perp}}W\|_{*}=\langle P_{\mathbb{T}^{\perp}}\overline{Y},P_{\mathbb{T}^{\perp}}W\rangle. By Lemma 4.1, we have P𝕋Wg(P𝕋Y¯)P_{\mathbb{T}^{\perp}}W\in\partial g^{*}(P_{\mathbb{T}^{\perp}}\overline{Y}), or equivalently P𝕋WU¯H𝕊+|H|V¯HTP_{\mathbb{T}^{\perp}}W\in\overline{U}_{H}\mathbb{S}^{|H|}_{+}\overline{V}_{H}^{T}. The proof is complete. \hfill\Box
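The membership test in (4.20) is straightforward to implement: P𝕋WP_{\mathbb{T}^{\perp}}W belongs to U¯H𝕊+|H|V¯HT\overline{U}_{H}\mathbb{S}^{|H|}_{+}\overline{V}_{H}^{T} exactly when the compressed matrix S=U¯HT(P𝕋W)V¯HS=\overline{U}_{H}^{T}(P_{\mathbb{T}^{\perp}}W)\overline{V}_{H} is symmetric positive semidefinite and reproduces P𝕋WP_{\mathbb{T}^{\perp}}W. A hedged numpy sketch, where the orthonormal factors standing in for U¯H\overline{U}_{H}, V¯H\overline{V}_{H} are random placeholders:

```python
import numpy as np

def in_critical_cone(PTperpW, UH, VH, tol=1e-8):
    """Test P_{T^perp} W ∈ UH S_+ VH^T for orthonormal-column UH, VH."""
    S = UH.T @ PTperpW @ VH
    residual = PTperpW - UH @ S @ VH.T
    if np.linalg.norm(residual) > tol:   # component outside the H-block
        return False
    if np.linalg.norm(S - S.T) > tol:    # S must be symmetric ...
        return False
    return np.min(np.linalg.eigvalsh((S + S.T) / 2)) >= -tol   # ... and PSD

rng = np.random.default_rng(2)
n1, n2, h = 5, 6, 2
UH, _ = np.linalg.qr(rng.standard_normal((n1, h)))
VH, _ = np.linalg.qr(rng.standard_normal((n2, h)))
B = rng.standard_normal((h, h))
W_in = UH @ (B @ B.T) @ VH.T                  # PSD middle factor: inside
W_out = UH @ (-B @ B.T - np.eye(h)) @ VH.T    # negative definite: outside
assert in_critical_cone(W_in, UH, VH)
assert not in_critical_cone(W_out, UH, VH)
```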

Next, we establish the main result of this section, which contains a quantitative characterization of strong minima. A similar result for group-sparsity minimization problems was recently established in [24, Theorem 5.3].

Theorem 4.6 (Characterizations for strong minima of low-rank optimization problems).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) is a minimizer of problem (4.1) and Y¯=1μΦh(ΦX¯)\overline{Y}=-\frac{1}{\mu}\Phi^{*}\nabla h(\Phi\overline{X}) with decomposition (4.2) and (4.8). The following are equivalent:

  1. (i)

    X¯\overline{X} is a strong solution to problem (4.1).

  2. (ii)

    KerΦ𝒞={0}\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathcal{C}=\{0\} with

    =def{Wn1×n2|P𝕋WU¯(AB0BT00000)V¯T,A𝕊r,Br×(p(Y¯)r)},\displaystyle\mathcal{E}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}}W\in\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&0&0\\ 0&0&0\end{pmatrix}\overline{V}^{T},A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p(\overline{Y})-r)}\right\}, (4.25)
    𝒞=def{Wn1×n2|E,W+P𝕋W=0}withE=defUIVIT.\displaystyle\mathcal{C}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}=0\right\}\qquad\mbox{with}\qquad E\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}U_{I}V_{I}^{T}. (4.26)
  3. (iii)

    The following conditions (a) and either (b) or (c) are satisfied:

    • (a)

      (Strong Restricted Injectivity): KerΦ𝕋={0}\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathbb{T}=\{0\}.

    • (b)

      (Strong Nondegenerate Source Condition): There exists YImΦ+Y\in\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp} such that Y=U(𝕀r00Z)VTY=U\begin{pmatrix}\mathbb{I}_{r}&0\\ 0&Z\end{pmatrix}V^{T} and Z<1\|Z\|<1.

    • (c)

      (Analysis Strong Source Condition): The Strong Source Coefficient ζ(X¯)\zeta(\overline{X}), which is the optimal value of the following spectral norm optimization problem

minZ(n1r)×(n2r)Zsubject to(UJZVKT)=E\min_{Z\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)}}\qquad\|Z\|\qquad\mbox{subject to}\qquad\mathcal{M}(U_{J}ZV_{K}^{T})=-\mathcal{M}E (4.27)

      is smaller than 11, where \mathcal{M} is a linear operator such that Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}.

Proof. Let us verify the equivalence between (i) and (ii). By Corollary 4.2, it suffices to show that

KerΦTN𝔹(Y¯)(X¯)=KerΦ𝒞.\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathcal{C}. (4.28)

By Proposition 4.5 and Corollary 4.2, we have

TN𝔹(Y¯)(X¯)=𝒞(X¯,Y¯).T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\mathcal{E}\cap\mathcal{C}(\overline{X},\overline{Y}). (4.29)

As Y¯ImΦ\overline{Y}\in\mbox{\rm Im}\,\Phi^{*}, note from (4.20) and (4.24) that

KerΦ𝒞(X¯,Y¯)=KerΦ𝒞.\mbox{\rm Ker}\,\Phi\cap\mathcal{C}(\overline{X},\overline{Y})=\mbox{\rm Ker}\,\Phi\cap\mathcal{C}.

This together with (4.29) verifies (4.28) and also the equivalence between (i) and (ii).

Next, let us verify the implication [(ii)\Rightarrow(iii)]. Suppose that (ii) (or (i)) is satisfied. Note from the projection formula (4.19) that 𝕋TN𝔹(Y¯)(X¯)\mathcal{E}\cap\mathbb{T}\subset T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X}). It follows from Corollary 4.2 that the Strong Restricted Injectivity holds.

Since Y¯X¯ImΦ\overline{Y}\in\partial\|\overline{X}\|_{*}\cap\mbox{\rm Im}\,\Phi^{*}, we obtain from (4.21) and (4.24) that

E,W+P𝕋W=dg(X¯)(W)Y¯,W=0for anyWKerΦ.\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}=dg(\overline{X})(W)\geq\langle\overline{Y},W\rangle=0\quad\mbox{for any}\quad W\in\mbox{\rm Ker}\,\Phi.

Condition (ii) means

c=defmin{E,W+P𝕋W|WKerΦ,W=1}>0,c\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\min\{\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}|\;W\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E},\|W\|_{*}=1\}>0,

which implies that

k(W)=defE,W+P𝕋WcWfor allWKerΦ.k(W)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\geq c\|W\|_{*}\quad\mbox{for all}\quad W\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E}. (4.30)

As Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}, for any WKerΦW\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E}, we write W=YW=\mathcal{M}^{*}Y and derive from (4.30) that

(1c)P𝕋YP𝕋YcYE,Y=E,Y.(1-c)\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\|_{*}\geq\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\|_{*}-c\|\mathcal{M}^{*}Y\|_{*}\geq-\langle E,\mathcal{M}^{*}Y\rangle=-\langle\mathcal{M}E,Y\rangle. (4.31)

Since ImKerΦ\mbox{\rm Im}\,\mathcal{M}^{*}\subset\mbox{\rm Ker}\,\Phi, we have ImΦKer\mbox{\rm Im}\,\Phi^{*}\subset\mbox{\rm Ker}\,\mathcal{M} and thus Y¯Ker\overline{Y}\in\mbox{\rm Ker}\,\mathcal{M}. It follows that

0=Y¯=P𝕋Y¯+P𝕋Y¯=E+P𝕋Y¯.0=\mathcal{M}\overline{Y}=\mathcal{M}P_{\mathbb{T}}\overline{Y}+\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y}=\mathcal{M}E+\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y}.

This together with (4.31) implies that

(1c)P𝕋YP𝕋Y¯,Y=P𝕋Y¯,Y=P𝕋Y¯,P𝕋Y.(1-c)\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\|_{*}\geq\langle\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y},Y\rangle=\langle P_{\mathbb{T}^{\perp}}\overline{Y},\mathcal{M}^{*}Y\rangle=\langle P_{\mathbb{T}^{\perp}}\overline{Y},P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}Y\rangle.

Define 𝔹=def{Wn1×n2|W1}\mathbb{B}_{*}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;\|W\|_{*}\leq 1\} to be the unit ball with respect to the nuclear norm. We obtain from the latter inequality and the classical minimax theorem [41, Corollary 37.3.2] that

1csupW𝔹P𝕋Y¯,WιImP𝕋(W)=supW𝔹infXKerP𝕋P𝕋Y¯+X,W=infXKerP𝕋supW𝔹P𝕋Y¯+X,W=infXKerP𝕋P𝕋Y¯+X.\begin{array}[]{ll}1-c&\displaystyle\geq\sup_{W\in\mathbb{B}_{*}}\langle P_{\mathbb{T}^{\perp}}\overline{Y},W\rangle-\iota_{{\rm Im}\,P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}}(W)\\ &=\displaystyle\sup_{W\in\mathbb{B}_{*}}\inf_{X\in{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}}\langle P_{\mathbb{T}^{\perp}}\overline{Y}+X,W\rangle\\ &=\displaystyle\inf_{X\in{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}}\sup_{W\in\mathbb{B}_{*}}\langle P_{\mathbb{T}^{\perp}}\overline{Y}+X,W\rangle\\ &=\displaystyle\inf_{X\in{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}}\|P_{\mathbb{T}^{\perp}}\overline{Y}+X\|.\end{array}

Hence there exists X0KerP𝕋X_{0}\in\mbox{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}} such that P𝕋Y¯+X0<1\|P_{\mathbb{T}^{\perp}}\overline{Y}+X_{0}\|<1. Due to the projection formula (4.19), observe that

1>P𝕋Y¯+X0P𝕋(P𝕋Y¯+X0)=P𝕋(Y¯+X0).1>\|P_{\mathbb{T}^{\perp}}\overline{Y}+X_{0}\|\geq\|P_{\mathbb{T}^{\perp}}(P_{\mathbb{T}^{\perp}}\overline{Y}+X_{0})\|=\|P_{\mathbb{T}^{\perp}}(\overline{Y}+X_{0})\|.

Defining Y0=Y¯+P𝕋X0Y_{0}=\overline{Y}+P_{\mathbb{T}^{\perp}}X_{0}, we have

Y0=Y¯+P𝕋X0=0.\mathcal{M}Y_{0}=\mathcal{M}\overline{Y}+\mathcal{M}P_{\mathbb{T}^{\perp}}X_{0}=0.

Note that Ker=ImΦ+\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}, Y¯ImΦKer\overline{Y}\in\mbox{\rm Im}\,\Phi^{*}\subset\mbox{\rm Ker}\,\mathcal{M}, and X0KerP𝕋X_{0}\in\mbox{\rm Ker}\,\mathcal{M}P_{\mathbb{T}^{\perp}}. It follows that Y0Ker=ImΦ+Y_{0}\in\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}. Moreover, observe that P𝕋Y0=P𝕋Y¯=EP_{\mathbb{T}}Y_{0}=P_{\mathbb{T}}\overline{Y}=E and P𝕋Y0<1\|P_{\mathbb{T}^{\perp}}Y_{0}\|<1. Thus, Y0Y_{0} satisfies the condition in (b). Moreover, as Ker=ImΦ+\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}, conditions (b) and (c) are equivalent. This ensures the implication [(ii)\Rightarrow(iii)].

It remains to justify the implication [(iii)\Rightarrow(ii)]. Suppose that the Strong Restricted Injectivity (a) and the Strong Nondegenerate Source Condition (b) hold with some Y0ImΦ+Y_{0}\in\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp} satisfying the condition in (b). Pick any WKerΦ𝒞KerΦ=ImW\in\mbox{\rm Ker}\,\Phi\cap\mathcal{E}\cap\mathcal{C}\subset\mbox{\rm Ker}\,\Phi\cap\mathcal{E}=\mbox{\rm Im}\,\mathcal{M}^{*}. As Y0KerY_{0}\in\mbox{\rm Ker}\,\mathcal{M}, we have Y0,W=0\langle Y_{0},W\rangle=0. It follows that

0=E,W+P𝕋W=P𝕋Y0,W+P𝕋W=Y0,WP𝕋Y0,W+P𝕋W=P𝕋Y0,P𝕋W+P𝕋W(1P𝕋Y0)P𝕋W.\begin{array}[]{ll}0=\langle E,W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}&=\displaystyle\langle P_{\mathbb{T}}Y_{0},W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\\ &=\displaystyle\langle Y_{0},W\rangle-\langle P_{\mathbb{T}^{\perp}}Y_{0},W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\\ &=\displaystyle-\langle P_{\mathbb{T}^{\perp}}Y_{0},P_{\mathbb{T}^{\perp}}W\rangle+\|P_{\mathbb{T}^{\perp}}W\|_{*}\\ &\geq(1-\|P_{\mathbb{T}^{\perp}}Y_{0}\|)\|P_{\mathbb{T}^{\perp}}W\|_{*}.\end{array}

Since P𝕋Y0<1\|P_{\mathbb{T}^{\perp}}Y_{0}\|<1, we have P𝕋W=0P_{\mathbb{T}^{\perp}}W=0, i.e., W𝕋W\in\mathbb{T}. This implies that W=0W=0 due to the Strong Restricted Injectivity (a). The proof is complete. \hfill\Box

Remark 4.7.

The Strong Restricted Injectivity (a) means that the linear operator Φ\Phi is injective on the subspace 𝕋\mathcal{E}\cap\mathbb{T}. It is similar to the condition of the same name in [24, Theorem 5.3], which is used to characterize unique (strong) solutions of group-sparsity optimization problems. This condition is adapted from the Restricted Injectivity (4.14) in [25]; see also [14, 15, 16] for the case of nuclear norm minimization problems. The Strong Restricted Injectivity is certainly weaker than the Restricted Injectivity.

The Strong Nondegenerate Source Condition and Analysis Strong Source Condition also inherit the same terminologies introduced in [24, Theorem 5.3] for group-sparsity optimization problems. In [14, 16, 25, 46, 47], the Nondegenerate Source Condition at X¯\overline{X} means the existence of a dual certificate Y0ImΦX¯Y_{0}\in\mbox{\rm Im}\,\Phi^{*}\cap\partial\|\overline{X}\|_{*} satisfying P𝕋Y0<1\|P_{\mathbb{T}^{\perp}}Y_{0}\|<1, which is equivalent to

ImΦriX¯.\mbox{\rm Im}\,\Phi^{*}\cap{\rm ri}\,\partial\|\overline{X}\|_{*}\neq\emptyset. (4.32)

This condition is weaker than the Nondegeneracy Condition (4.15). In the case of 1\ell_{1} optimization, it is well-known that the Restricted Injectivity and Nondegenerate Source Condition together characterize solution uniqueness [25]. For nuclear norm minimization problems, it was shown in [15, 16] that they are sufficient for solution uniqueness of problems (1.1) and (4.1); see also [46, 47] for more general convex optimization problems. It is worth noting that they are not necessary conditions for solution uniqueness; see, e.g., [24, Example 4.15]. One hidden reason is that the nuclear norm is not polyhedral. Recently, [24] showed that these two conditions characterize the so-called sharp minima of (5.1), a property lying between solution uniqueness and strong minima; see our Remark 5.18 for further discussion. Due to [24, Proposition 4.27], they are sufficient for strong minima of problem (4.1). This fact can also be obtained from Theorem 4.6 by observing that our Strong Nondegenerate Source Condition is weaker than the Nondegenerate Source Condition, since the Strong Nondegenerate Source Condition, which involves the set \mathcal{E}, means

(ImΦ+)riX¯(\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp})\cap{\rm ri}\,\partial\|\overline{X}\|_{*}\neq\emptyset (4.33)

due to (4.16).

Remark 4.8 (Checking Strong Restricted Injectivity, Strong Sufficient Condition (4.12), and constructing the linear operator \mathcal{M}).

To check the Strong Restricted Injectivity, observe first that

={U¯(AB0BTCD0EF)V¯Tn1×n2|A𝕊r,Br×(pr)},\mathcal{E}=\left\{\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&C&D\\ 0&E&F\end{pmatrix}\overline{V}^{T}\in\mathbb{R}^{n_{1}\times n_{2}}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p-r)}\right\},

which is a subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension q=defr(r+1)2+r(pr)+(n1r)(n2r)q\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{r(r+1)}{2}+r(p-r)+(n_{1}-r)(n_{2}-r). Moreover, the intersection of \mathcal{E} with 𝕋\mathbb{T}

𝕋={U¯(AB0BT00000)V¯Tn1×n2|A𝕊r,Br×(pr)},\mathcal{E}\cap\mathbb{T}=\left\{\overline{U}\begin{pmatrix}A&B&0\\ B^{T}&0&0\\ 0&0&0\end{pmatrix}\overline{V}^{T}\in\mathbb{R}^{n_{1}\times n_{2}}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p-r)}\right\}, (4.34)

is also a subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension s=defr(r+1)2+r(pr)s\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{r(r+1)}{2}+r(p-r). The set U¯(𝕊p000)V¯T\overline{U}\begin{pmatrix}\mathbb{S}^{p}&0\\ 0&0\end{pmatrix}\overline{V}^{T} in the Strong Sufficient Condition (4.12) is another subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}} with dimension l=def12p(p+1)l\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{2}p(p+1). Suppose that {W1,,Ws}\{W_{1},\ldots,W_{s}\} form a basis of 𝕋\mathcal{E}\cap\mathbb{T}, {W1,,Wl}\{W_{1},\ldots,W_{l}\} form a basis of U¯(𝕊p000)V¯T\overline{U}\begin{pmatrix}\mathbb{S}^{p}&0\\ 0&0\end{pmatrix}\overline{V}^{T}, and {W1,,Wq}\{W_{1},\ldots,W_{q}\} form a basis of \mathcal{E}. For any WW\in\mathcal{E}, we write W=λ1W1++λqWqW=\lambda_{1}W_{1}+\ldots+\lambda_{q}W_{q} and obtain that

Φ(W)=λ1Φ(W1)++λqΦ(Wq).\Phi(W)=\lambda_{1}\Phi(W_{1})+\ldots+\lambda_{q}\Phi(W_{q}). (4.35)

Define Ψ=def(Φ(W1)Φ(Wq))\Psi\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}\Phi(W_{1})&\ldots&\Phi(W_{q})\end{pmatrix} to be an m×qm\times q matrix and Ψs\Psi_{s}, Ψl\Psi_{l} to be the submatrices of the first s,ls,l columns of Ψ\Psi, respectively. By (4.35), the Strong Restricted Injectivity is equivalent to the condition that KerΨs=0\mbox{\rm Ker}\,\Psi_{s}=0, i.e., rankΨs=s.\mbox{\rm rank}\,{\Psi_{s}}=s. Similarly, the Strong Sufficient Condition (4.12) means rankΨl=l\mbox{\rm rank}\,{\Psi_{l}}=l.
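The rank tests above are easy to carry out numerically. Below is a hedged numpy sketch: it builds bases of 𝕋\mathcal{E}\cap\mathbb{T} from (4.34) and of U¯(𝕊p000)V¯T\overline{U}\begin{pmatrix}\mathbb{S}^{p}&0\\ 0&0\end{pmatrix}\overline{V}^{T}, forms the matrices Ψs\Psi_{s}, Ψl\Psi_{l} of their images under Φ\Phi, and checks the two rank conditions. All data (dimensions, U¯\overline{U}, V¯\overline{V}, Φ\Phi) are random placeholders for illustration.

```python
import numpy as np

def lift(M, Ubar, Vbar):
    return Ubar @ M @ Vbar.T

def sym_block_basis(n1, n2, rows, cols):
    """Basis matrices with a single symmetric pair of ones at (i, j), (j, i)."""
    out = []
    for i in rows:
        for j in cols:
            if j < i:
                continue
            M = np.zeros((n1, n2)); M[i, j] = 1.0; M[j, i] = 1.0
            out.append(M)
    return out

rng = np.random.default_rng(1)
n1, n2, r, p, m = 6, 8, 2, 3, 30
Ubar, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
Vbar, _ = np.linalg.qr(rng.standard_normal((n2, n2)))
Phi = rng.standard_normal((m, n1 * n2))     # Phi acting on vectorized matrices

# basis of E ∩ T: symmetric A-block plus coupled B / B^T blocks
Ws = [lift(M, Ubar, Vbar) for M in sym_block_basis(n1, n2, range(r), range(p))]
s = r * (r + 1) // 2 + r * (p - r)
# basis of U (S^p 0; 0 0) V^T: full symmetric p x p upper-left block
Wl = [lift(M, Ubar, Vbar) for M in sym_block_basis(n1, n2, range(p), range(p))]
l = p * (p + 1) // 2
assert len(Ws) == s and len(Wl) == l

Psi_s = np.column_stack([Phi @ W.ravel() for W in Ws])
Psi_l = np.column_stack([Phi @ W.ravel() for W in Wl])
# (a)  Strong Restricted Injectivity  <=>  rank(Psi_s) = s
# (4.12) Strong Sufficient Condition  <=>  rank(Psi_l) = l
assert np.linalg.matrix_rank(Psi_s) == s    # generic Phi with m >= s
assert np.linalg.matrix_rank(Psi_l) == l    # generic Phi with m >= l
```

For a generic Φ\Phi with enough measurements both ranks are full, consistent with the discussion after (4.17) that these conditions are likely to hold.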

Next let us discuss how to construct the operator \mathcal{M}. Let U^ΛV^T\widehat{U}\Lambda\widehat{V}^{T} be an SVD of Ψ\Psi with k=rankΨk={\rm rank}\,\Psi. Define V^G\widehat{V}_{G} to be the q×(qk)q\times(q-k) submatrix of V^\widehat{V}, where G={k+1,,q}G=\{k+1,\ldots,q\}. Note that

ImV^G=KerΨ.\mbox{\rm Im}\,\widehat{V}_{G}=\mbox{\rm Ker}\,\Psi.

Determine the linear operator :n1×n2qk\mathcal{M}:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{q-k} by

X=defV^GT(W1,X,,Wq,X)Tfor allXn1×n2.\mathcal{M}X\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\widehat{V}_{G}^{T}\begin{pmatrix}\langle W_{1},X\rangle,\ldots,\langle W_{q},X\rangle\end{pmatrix}^{T}\quad\mbox{for all}\quad X\in\mathbb{R}^{n_{1}\times n_{2}}. (4.36)

It is easy to check that Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}. \hfill\Box
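The construction above can be sketched in a few lines of numpy. Everything below (dimensions, Φ\Phi as a random matrix on vectorized inputs, the basis of \mathcal{E}) is illustrative placeholder data; the final assertions only check the defining property Im=KerΦ\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}.

```python
import numpy as np

def basis_E(Ubar, Vbar, r, p):
    """Basis of E: symmetric A-block, coupled B/B^T blocks, and a free
    lower-right (n1-r) x (n2-r) block, rotated back by Ubar, Vbar."""
    n1, n2 = Ubar.shape[0], Vbar.shape[0]
    basis = []
    for i in range(r):                     # A symmetric, B coupled with B^T
        for j in range(i, p):
            M = np.zeros((n1, n2)); M[i, j] = 1.0; M[j, i] = 1.0
            basis.append(Ubar @ M @ Vbar.T)
    for i in range(r, n1):                 # free (C D; E F) block
        for j in range(r, n2):
            M = np.zeros((n1, n2)); M[i, j] = 1.0
            basis.append(Ubar @ M @ Vbar.T)
    return basis

rng = np.random.default_rng(1)
n1, n2, r, p, m = 6, 8, 2, 3, 20
Ubar, _ = np.linalg.qr(rng.standard_normal((n1, n1)))
Vbar, _ = np.linalg.qr(rng.standard_normal((n2, n2)))
Phi = rng.standard_normal((m, n1 * n2))    # Phi acting on vectorized matrices

Wq = basis_E(Ubar, Vbar, r, p)
q = len(Wq)                                # = r(r+1)/2 + r(p-r) + (n1-r)(n2-r)
Psi = np.column_stack([Phi @ W.ravel() for W in Wq])
_, S, Vt_hat = np.linalg.svd(Psi)
k = int(np.sum(S > 1e-10))                 # k = rank(Psi)
VG = Vt_hat.T[:, k:]                       # columns span Ker Psi

def M_op(X):                               # the operator M of (4.36)
    return VG.T @ np.array([np.vdot(W, X) for W in Wq])

# Im M^* ⊂ Ker Phi ∩ E: every M^* z is annihilated by Phi ...
z = rng.standard_normal(q - k)
Wstar = sum(c * W for c, W in zip(VG @ z, Wq))      # M^* z
assert np.allclose(Phi @ Wstar.ravel(), 0, atol=1e-8)
# ... and Im Phi^* ⊂ Ker M, as used repeatedly in the proofs above.
y = rng.standard_normal(m)
assert np.allclose(M_op((Phi.T @ y).reshape(n1, n2)), 0, atol=1e-8)
```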

Remark 4.9 (Checking the Analysis Strong Source Condition without using the linear operator \mathcal{M}).

To verify the Analysis Strong Source Condition, Theorem 4.6 suggests solving the optimization problem (4.27), which involves the linear operator \mathcal{M}. To avoid the computation of \mathcal{M}, note first that the constraint in (4.27) means UJZVKT+EKer=ImΦ+.U_{J}ZV_{K}^{T}+E\in\mbox{\rm Ker}\,\mathcal{M}=\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}^{\perp}. Suppose that 𝒩\mathcal{N} is a linear operator with Ker𝒩=ImΦ\mbox{\rm Ker}\,\mathcal{N}=\mbox{\rm Im}\,\Phi^{*}, i.e., Im𝒩=KerΦ\mbox{\rm Im}\,\mathcal{N}^{*}=\mbox{\rm Ker}\,\Phi. Then the latter condition is equivalent to

𝒩(UJZVKT+E+W)=0for someW,\mathcal{N}(U_{J}ZV_{K}^{T}+E+W)=0\quad\mbox{for some}\quad W\in\mathcal{E}^{\perp},

where \mathcal{E}^{\perp} is computed by

={U(ABCBT00D00)VT|A𝕍r,Br×(pr),Cr×(n2p),D(n1p)×r},\mathcal{E}^{\perp}=\left\{U\begin{pmatrix}A&B&C\\ -B^{T}&0&0\\ D&0&0\end{pmatrix}V^{T}|\;A\in\mathbb{V}_{r},B\in\mathbb{R}^{r\times(p-r)},C\in\mathbb{R}^{r\times(n_{2}-p)},D\in\mathbb{R}^{(n_{1}-p)\times r}\right\}, (4.37)

which is a subspace of 𝕋\mathbb{T} with dimension 12r(r1)+r(n1+n2pr)\frac{1}{2}r(r-1)+r(n_{1}+n_{2}-p-r); here 𝕍rr×r\mathbb{V}_{r}\subset\mathbb{R}^{r\times r} is the set of all skew-symmetric matrices. Hence problem (4.27) is equivalent to the following one:

minZ(n1r)×(n2r),Wn1×n2Zsubject to𝒩(UJZVKT+W)=𝒩EandW.\min_{Z\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)},W\in\mathbb{R}^{n_{1}\times n_{2}}}\qquad\|Z\|\quad\mbox{subject to}\quad\mathcal{N}(U_{J}ZV_{K}^{T}+W)=-\mathcal{N}E\quad\mbox{and}\quad W\in\mathcal{E}^{\perp}. (4.38)

The price to pay is that this optimization problem is larger than (4.27), but finding the linear operator 𝒩\mathcal{N} is much easier than finding \mathcal{M}.
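Such an operator 𝒩\mathcal{N} can be read off from an SVD of the matrix representing Φ\Phi on vectorized inputs: its rows are any basis of KerΦ\mbox{\rm Ker}\,\Phi viewed as row vectors, so that Ker𝒩=ImΦ\mbox{\rm Ker}\,\mathcal{N}=\mbox{\rm Im}\,\Phi^{*}. A small hedged sketch with random placeholder data:

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2, m = 5, 6, 12
A = rng.standard_normal((m, n1 * n2))    # matrix of Phi on vectorized inputs
_, S, Vt = np.linalg.svd(A)
k = int(np.sum(S > 1e-10))               # rank of Phi
N = Vt[k:, :]                            # rows form a basis of Ker A

# Ker N ⊃ Im Phi^*: N annihilates every Phi^* y ...
y = rng.standard_normal(m)
assert np.allclose(N @ (A.T @ y), 0, atol=1e-8)
# ... and Im N^* = Ker Phi: every N^T z is annihilated by Phi.
z = rng.standard_normal(N.shape[0])
assert np.allclose(A @ (N.T @ z), 0, atol=1e-8)
# dimensions match: dim Ker Phi = n1*n2 - rank(Phi)
assert N.shape[0] == n1 * n2 - k
```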

Corollary 4.10 (Quantitative sufficient condition for strong solution).

Suppose that X¯Φ1(int(domh))\overline{X}\in\Phi^{-1}({\rm int}(\mbox{\rm dom}\,h)) is a minimizer of problem (4.1). Then X¯\overline{X} is a strong solution provided that the Strong Restricted Injectivity holds and

γ(X¯)=defP𝕋(P𝕋)1E<1,\gamma(\overline{X})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}(\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*})^{-1}\mathcal{M}E\|<1, (4.39)

where \mathcal{M} is the linear operator satisfying Im=KerΦ.\mbox{\rm Im}\,\mathcal{M}^{*}=\mbox{\rm Ker}\,\Phi\cap\mathcal{E}.

Proof. Suppose that Strong Restricted Injectivity holds at the minimizer X¯\overline{X}. We consider the following linear equation:

P𝕋Y=EforYn1×n2.\mathcal{M}P_{\mathbb{T}^{\perp}}Y=-\mathcal{M}E\quad\mbox{for}\quad Y\in\mathbb{R}^{n_{1}\times n_{2}}. (4.40)

As Y¯X¯ImΦX¯Ker\overline{Y}\in\partial\|\overline{X}\|_{*}\cap\mbox{\rm Im}\,\Phi^{*}\subset\partial\|\overline{X}\|_{*}\cap\mbox{\rm Ker}\,\mathcal{M}, we have

0=Y¯=(P𝕋Y¯+P𝕋Y¯)=E+P𝕋Y¯.0=\mathcal{M}\overline{Y}=\mathcal{M}(P_{\mathbb{T}}\overline{Y}+P_{\mathbb{T}^{\perp}}\overline{Y})=\mathcal{M}E+\mathcal{M}P_{\mathbb{T}^{\perp}}\overline{Y}.

It follows that Y¯\overline{Y} is a solution to (4.40). Another solution of (4.40) is

Y^=def(P𝕋)E,\widehat{Y}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}-(\mathcal{M}P_{\mathbb{T}^{\perp}})^{\dagger}\mathcal{M}E, (4.41)

where (P𝕋)(\mathcal{M}P_{\mathbb{T}^{\perp}})^{\dagger} is the Moore-Penrose generalized inverse of P𝕋\mathcal{M}P_{\mathbb{T}^{\perp}}. Next we claim that P𝕋:ImIm\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}:\mbox{\rm Im}\,\mathcal{M}\to\mbox{\rm Im}\,\mathcal{M} is a bijective mapping. Indeed, if P𝕋z=0\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z=0 for some zImz\in\mbox{\rm Im}\,\mathcal{M}, then

P𝕋z2=P𝕋z,P𝕋z=0.\|P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z\|^{2}=\langle P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z,P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}z\rangle=0.

It follows that z𝕋Im={0}\mathcal{M}^{*}z\in\mathbb{T}\cap\mbox{\rm Im}\,\mathcal{M}^{*}=\{0\} by the Strong Restricted Injectivity, which implies zKerIm={0}z\in\mbox{\rm Ker}\,\mathcal{M}^{*}\cap\mbox{\rm Im}\,\mathcal{M}=\{0\}. Thus P𝕋\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*} is injective on Im\mbox{\rm Im}\,\mathcal{M}. As this operator is self-adjoint, it is bijective. We obtain that

Y^=P𝕋(P𝕋)1E𝕋.\widehat{Y}=-P_{\mathbb{T}^{\perp}}\mathcal{M}^{*}(\mathcal{M}P_{\mathbb{T}^{\perp}}\mathcal{M}^{*})^{-1}\mathcal{M}E\in\mathbb{T}^{\perp}.

By the decomposition (4.19), we may write

Y^=U(000Z^)VTfor someZ^(n1r)×(n2r).\widehat{Y}=U\begin{pmatrix}0&0\\ 0&\widehat{Z}\end{pmatrix}V^{T}\quad\mbox{for some}\quad\widehat{Z}\in\mathbb{R}^{(n_{1}-r)\times(n_{2}-r)}.

Observe from (4.27) that

ζ(X¯)Z^=Y^=γ(X¯).\zeta(\overline{X})\leq\|\widehat{Z}\|=\|\widehat{Y}\|=\gamma(\overline{X}).

Hence, if γ(X¯)<1\gamma(\overline{X})<1, then ζ(X¯)<1\zeta(\overline{X})<1 and X¯\overline{X} is a strong solution by Theorem 4.6.\hfill\Box
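A hedged sketch of the computation of γ(X¯)\gamma(\overline{X}) in (4.39): with matrix representations of \mathcal{M} and of the projector P𝕋P_{\mathbb{T}^{\perp}} on vectorized inputs (random placeholders below, standing in for the constructions of Remarks 4.8 and 4.9), Y^\widehat{Y} from (4.41) is obtained via a pseudo-inverse and γ(X¯)\gamma(\overline{X}) is its spectral norm.

```python
import numpy as np

rng = np.random.default_rng(4)
n1, n2, d = 4, 5, 6                  # d stands in for dim(Ker Phi ∩ E)
Q, _ = np.linalg.qr(rng.standard_normal((n1 * n2, 8)))
P = Q @ Q.T                          # orthogonal projector playing P_{T^perp}
M = rng.standard_normal((d, n1 * n2))    # placeholder for the operator M
e = rng.standard_normal(n1 * n2)         # placeholder for vectorized E = U_I V_I^T

# Yhat = -P_{T^perp} M^* (M P_{T^perp} M^*)^{-1} M E, via the pseudo-inverse
Yhat = -P @ M.T @ np.linalg.pinv(M @ P @ M.T) @ (M @ e)
gamma = np.linalg.norm(Yhat.reshape(n1, n2), ord=2)   # spectral norm

# Yhat lies in Im P (i.e., in T^perp) and solves the linear system (4.40).
assert np.allclose(P @ Yhat, Yhat)
assert np.allclose(M @ P @ Yhat, -(M @ e), atol=1e-8)
assert gamma >= 0.0
```

The quantitative test of Corollary 4.10 is then simply `gamma < 1`.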

5 Characterizations for strong minima of nuclear norm minimization problems

In this section, let us consider the nuclear norm minimization problem (1.1):

minXn1×n2Xsubject toΦX=M0,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\|X\|_{*}\quad\mbox{subject to}\quad\Phi X=M_{0}, (5.1)

where Φ:n1×n2m\Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{m} is a linear operator (n1n2n_{1}\leq n_{2}), M0mM_{0}\in\mathbb{R}^{m} is a known observation. This is a particular case of problem (3.20) with g(X)=Xg(X)=\|X\|_{*} for Xn1×n2X\in\mathbb{R}^{n_{1}\times n_{2}}. Note that X0X_{0} is a solution of this problem if and only if ΦX0=M0\Phi X_{0}=M_{0} and

0X0+NΦ1(M0)(X0)=X0+ImΦ,0\in\partial\|X_{0}\|_{*}+N_{\Phi^{-1}(M_{0})}(X_{0})=\partial\|X_{0}\|_{*}+\mbox{\rm Im}\,\Phi^{*},

which is equivalent to the existence of a dual certificate YImΦX0Y\in\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*}. Define

Δ(X0)=defImΦX0\Delta(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*}

to be the set of all dual certificates.

Lemma 5.1.

Let Ω\Omega be a nonempty closed subset of n1×n2\mathbb{R}^{n_{1}\times n_{2}}. Suppose that U1,U2n1×n1U_{1},U_{2}\in\mathbb{R}^{n_{1}\times n_{1}} and V1,V2n2×n2V_{1},V_{2}\in\mathbb{R}^{n_{2}\times n_{2}} are orthogonal matrices satisfying U1ΩV1TU2ΩV2TU_{1}\Omega V_{1}^{T}\supset U_{2}\Omega V_{2}^{T}. Then we have

U1ΩV1T=U2ΩV2T.U_{1}\Omega V_{1}^{T}=U_{2}\Omega V_{2}^{T}. (5.2)

Proof. Define U=U1TU2U=U_{1}^{T}U_{2} and V=V1TV2V=V_{1}^{T}V_{2}. It is easy to check that UU and VV are orthogonal matrices. We have

UΩVTΩ.U\Omega V^{T}\subset\Omega. (5.3)

Define the mapping φ:ΩΩ\varphi:\Omega\to\Omega by φ(X)=UXVT\varphi(X)=UXV^{T} for all XΩX\in\Omega. Note that φ\varphi is an isometry with the spectral metric d(X,Y)=XYd(X,Y)=\|X-Y\| for all X,YΩX,Y\in\Omega. Indeed, we have

d(φ(X),φ(Y))=U(XY)VT=XYfor allX,YΩ.d(\varphi(X),\varphi(Y))=\|U(X-Y)V^{T}\|=\|X-Y\|\quad\mbox{for all}\quad X,Y\in\Omega.

Note further that φ(X)=X\|\varphi(X)\|=\|X\| for all XΩX\in\Omega. It follows from (5.3) that

φ(Ω𝔹¯k(0))Ω𝔹¯k(0)for anyk.\varphi(\Omega\cap\overline{\mathbb{B}}_{k}(0))\subset\Omega\cap\overline{\mathbb{B}}_{k}(0)\quad\mbox{for any}\quad k\in\mathbb{N}.

Since Ω𝔹¯k(0)\Omega\cap\overline{\mathbb{B}}_{k}(0) is a compact metric space and φ\varphi is an isometry, it is well-known that

φ(Ω𝔹¯k(0))=Ω𝔹¯k(0).\varphi(\Omega\cap\overline{\mathbb{B}}_{k}(0))=\Omega\cap\overline{\mathbb{B}}_{k}(0).

Hence we have

φ(Ω)=φ(k=1(Ω𝔹¯k(0)))=k=1φ(Ω𝔹¯k(0))=k=1(Ω𝔹¯k(0))=Ω.\varphi(\Omega)=\varphi\left(\bigcup_{k=1}^{\infty}(\Omega\cap\overline{\mathbb{B}}_{k}(0))\right)=\bigcup_{k=1}^{\infty}\varphi\left(\Omega\cap\overline{\mathbb{B}}_{k}(0)\right)=\bigcup_{k=1}^{\infty}(\Omega\cap\overline{\mathbb{B}}_{k}(0))=\Omega.

This verifies (5.2) and completes the proof of the lemma. \hfill\Box

For any YX0Y\in\partial\|X_{0}\|_{*}, recall from (4.6) that p(Y)p(Y) is the number of singular values of YY that are equal to 11. It follows from Lemma 3.2 that r=defrank(X0)p(Y)n1r\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mbox{\rm rank}\,(X_{0})\leq p(Y)\leq n_{1}. Define the constant

q(X0)=defmin{p(Y)|YΔ(X0)}.q(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\min\{p(Y)|\;Y\in\Delta(X_{0})\}. (5.4)

When X0X_{0} is a minimizer of problem (5.1), q(X0)q(X_{0}) is well-defined and at least rank(X0)\mbox{\rm rank}\,(X_{0}). The following theorem is the main result of this section.
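To make these quantities concrete, here is a minimal numpy sketch (the helper name `p_val` and the numerical tolerance are our own choices, not notation from the paper): it counts the singular values of a subgradient matrix Y that equal 1, which is the quantity p(Y) of (4.6); q(X0) is then the smallest such count over the dual certificates in Δ(X0).

```python
import numpy as np

def p_val(Y, tol=1e-9):
    """p(Y): number of singular values of Y equal to 1; cf. (4.6)."""
    sv = np.linalg.svd(Y, compute_uv=False)
    return int(np.sum(np.abs(sv - 1.0) < tol))

# Any Y in the subdifferential of the nuclear norm has all singular
# values in [0, 1]; p_val counts those sitting on the boundary.
Y = np.diag([1.0, 1.0, 0.3])
assert p_val(Y) == 2
```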

Theorem 5.2 (Characterizations for strong minima of nuclear norm minimization problems).

Suppose that X0X_{0} is an optimal solution of problem (5.1). The following are equivalent:

(i) X0X_{0} is a strong solution of problem (5.1).

(ii) The following equality holds

YΔ(X0)[KerΦTN𝔹(Y)(X0)]={0}.\bigcap_{Y\in\Delta(X_{0})}\Big{[}\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y)}(X_{0})\Big{]}=\{0\}. (5.5)

(iii) There exists a dual certificate Y¯Δ(X0)\overline{Y}\in\Delta(X_{0}) such that

KerΦTN𝔹(Y¯)(X0)={0}.\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(X_{0})=\{0\}. (5.6)

(iv) For any YΔ(X0)Y\in\Delta(X_{0}) satisfying p(Y)=q(X0)p(Y)=q(X_{0}), condition (5.6) is satisfied.

Proof. Note that the nuclear norm function \|\cdot\|_{*} is second order regular at X0X_{0} [18] and it also satisfies the quadratic growth condition at X0X_{0}, as \partial\|\cdot\|_{*} is metrically subregular at X0X_{0} for any YX0Y\in\partial\|X_{0}\|_{*}; see [50]. By applying Lemma 4.1 and Theorem 3.5 with 𝕏=n1×n2\mathbb{X}=\mathbb{R}^{n_{1}\times n_{2}}, 𝕐=m\mathbb{Y}=\mathbb{R}^{m}, g()=g(\cdot)=\|\cdot\|_{*}, and K={M0}K=\{M_{0}\}, we obtain that X0X_{0} is a strong solution if and only if

[YΔ(X0)TN𝔹(Y)(X0)]C(X0)={0},\left[\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right]\cap C(X_{0})=\{0\}, (5.7)

where C(X0)C(X_{0}) is the critical cone (3.23) computed by

C(X0)={Wn1×n2|WKerΦ,dg(X0)(W)=0}.C(X_{0})=\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;W\in\mbox{\rm Ker}\,\Phi,dg(X_{0})(W)=0\}.

By Proposition 4.5 and formula (4.10), we note that if WKerΦTN𝔹(Y)(X0)W\in\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y)}(X_{0}) for some YΔ(X0)Y\in\Delta(X_{0}) then W𝒞(X0,Y)W\in\mathcal{C}(X_{0},Y), i.e., dg(X0)(W)=Y,W=0dg(X_{0})(W)=\langle Y,W\rangle=0. Thus WC(X0)W\in C(X_{0}), and therefore

[YΔ(X0)TN𝔹(Y)(X0)]C(X0)=[YΔ(X0)TN𝔹(Y)(X0)]KerΦ.\left[\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right]\cap C(X_{0})=\left[\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right]\cap\mbox{\rm Ker}\,\Phi.

This together with (5.7) verifies the equivalence between (i) and (ii).

The implications [(iii)\Rightarrow(ii)] and [(iv)\Rightarrow(ii)] are trivial. To justify the converse implications, we first claim the existence of some dual certificate Y¯Δ(X0)\overline{Y}\in\Delta(X_{0}) such that p(Y¯)=q(X0)p(\overline{Y})=q(X_{0}) and

YΔ(X0)[TN𝔹(Y)(X0)]=TN𝔹(Y¯)(X0).\bigcap_{Y\in\Delta(X_{0})}\Big{[}T_{N_{\mathbb{B}}(Y)}(X_{0})\Big{]}=T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}). (5.8)

We prove this by using a version of Zorn’s lemma [11, Proposition 5.9] for downward directed partially ordered sets. Consider the following partially ordered set (poset)

𝒫={TN𝔹(Y)(X0)|YΔ(X0)}\mathcal{P}=\Big{\{}T_{N_{\mathbb{B}}(Y)}(X_{0})|\;Y\in\Delta(X_{0})\Big{\}} (5.9)

with the partial ordering \supset between sets in 𝒫\mathcal{P}. Consider any descending chain

TN𝔹(Y1)(X0)TN𝔹(Y2)(X0)TN𝔹(Yk)(X0)T_{N_{\mathbb{B}}(Y_{1})}(X_{0})\supset T_{N_{\mathbb{B}}(Y_{2})}(X_{0})\supset\ldots\supset T_{N_{\mathbb{B}}(Y_{k})}(X_{0})\supset\ldots (5.10)

for a sequence {Yk}Δ(X0)\{Y_{k}\}\subset\Delta(X_{0}). We can find a subsequence {Ykl}\{Y_{k_{l}}\} of {Yk}\{Y_{k}\} whose elements all have the same rank prp\geq r. According to the computation (4.10), there exist orthogonal matrices (Ukl,Vkl)𝒪(Ykl)𝒪(X0)(U_{k_{l}},V_{k_{l}})\in\mathcal{O}(Y_{k_{l}})\cap\mathcal{O}(X_{0}) such that

TN𝔹(Ykl)(X0)=UklΩpVklTwithΩp=def{(AB0BTC0000)|A𝕊r,Br×(pr),C𝕊+pr}.T_{N_{\mathbb{B}}(Y_{k_{l}})}(X_{0})=U_{k_{l}}\Omega_{p}V^{T}_{k_{l}}\quad\mbox{with}\quad\Omega_{p}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\left\{\begin{pmatrix}A&B&0\\ B^{T}&C&0\\ 0&0&0\end{pmatrix}|\;A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(p-r)},C\in\mathbb{S}_{+}^{p-r}\right\}.

It follows that

TN𝔹(Ykl)(X0)=UklΩpVklTUkl+1ΩpVkl+1T=TN𝔹(Ykl+1)(X0).T_{N_{\mathbb{B}}(Y_{k_{l}})}(X_{0})=U_{k_{l}}\Omega_{p}V^{T}_{k_{l}}\supset U_{k_{l+1}}\Omega_{p}V^{T}_{k_{l+1}}=T_{N_{\mathbb{B}}(Y_{k_{l+1}})}(X_{0}). (5.11)

Since Ωp\Omega_{p} is closed, we obtain from Lemma 5.1 that

TN𝔹(Ykl)(X0)=TN𝔹(Ykl+1)(X0)for alll=1,2,.T_{N_{\mathbb{B}}(Y_{k_{l}})}(X_{0})=T_{N_{\mathbb{B}}(Y_{k_{l+1}})}(X_{0})\quad\mbox{for all}\quad l=1,2,\ldots.

Hence the chain (5.10) is bounded below by TN𝔹(Yk1)(X0)𝒫T_{N_{\mathbb{B}}(Y_{k_{1}})}(X_{0})\in\mathcal{P}. This means that every descending chain of 𝒫\mathcal{P} has a minimum in 𝒫\mathcal{P}.

Let us show next that the poset 𝒫\mathcal{P} is directed downward with the partial ordering \supset in the sense that for any two elements TN𝔹(Y1)(X0)T_{N_{\mathbb{B}}(Y_{1})}(X_{0}) and TN𝔹(Y2)(X0)T_{N_{\mathbb{B}}(Y_{2})}(X_{0}) of 𝒫\mathcal{P} with Y1,Y2Δ(X0)Y_{1},Y_{2}\in\Delta(X_{0}), there exists Y3Δ(X0)Y_{3}\in\Delta(X_{0}) such that

TN𝔹(Y1)(X0)TN𝔹(Y3)(X0)andTN𝔹(Y2)(X0)TN𝔹(Y3)(X0).T_{N_{\mathbb{B}}(Y_{1})}(X_{0})\supset T_{N_{\mathbb{B}}(Y_{3})}(X_{0})\quad\mbox{and}\quad T_{N_{\mathbb{B}}(Y_{2})}(X_{0})\supset T_{N_{\mathbb{B}}(Y_{3})}(X_{0}). (5.12)

Indeed, we choose Y3=12(Y1+Y2)Δ(X0)Y_{3}=\frac{1}{2}(Y_{1}+Y_{2})\in\Delta(X_{0}). For any WTN𝔹(Y3)(X0)W\in T_{N_{\mathbb{B}}(Y_{3})}(X_{0}), we obtain from (4.10) that 0=d2g(X0|Y3)(W)0=d^{2}g(X_{0}|Y_{3})(W). Hence, there exist tk0t_{k}\downarrow 0 and WkWW_{k}\to W such that

\begin{array}[]{ll}\dfrac{1}{k}&\geq\dfrac{g(X_{0}+t_{k}W_{k})-g(X_{0})-0.5t_{k}\langle Y_{1}+Y_{2},W_{k}\rangle}{0.5t_{k}^{2}}\\ &=\dfrac{g(X_{0}+t_{k}W_{k})-g(X_{0})-t_{k}\langle Y_{1},W_{k}\rangle}{t_{k}^{2}}+\dfrac{g(X_{0}+t_{k}W_{k})-g(X_{0})-t_{k}\langle Y_{2},W_{k}\rangle}{t_{k}^{2}},\end{array}

which implies that

0d2g(X0|Y1)(W)+d2g(X0|Y2)(W).0\geq d^{2}g(X_{0}|Y_{1})(W)+d^{2}g(X_{0}|Y_{2})(W).

Since d2g(X0|Y1)(W),d2g(X0|Y2)(W)0d^{2}g(X_{0}|Y_{1})(W),d^{2}g(X_{0}|Y_{2})(W)\geq 0, we obtain from Lemma 3.9 and Lemma 4.1 that

WKerd2g(X0|Y1)=TN𝔹(Y1)(X0)andWKerd2g(X0|Y2)=TN𝔹(Y2)(X0).W\in\mbox{\rm Ker}\,d^{2}g(X_{0}|Y_{1})=T_{N_{\mathbb{B}}(Y_{1})}(X_{0})\quad\mbox{and}\quad W\in\mbox{\rm Ker}\,d^{2}g(X_{0}|Y_{2})=T_{N_{\mathbb{B}}(Y_{2})}(X_{0}).

This clearly verifies the directedness condition (5.12). By [11, Proposition 5.9], the downward directed poset 𝒫\mathcal{P} has a minimum in the sense that there exists Y¯\overline{Y} such that (5.8) is valid. The implication [(ii)\Rightarrow(iii)] follows from (5.8).

Next, let us show that p(Y¯)=q(X0)p(\overline{Y})=q(X_{0}) and

TN𝔹(Y)(X0)=TN𝔹(Y¯)(X0)T_{N_{\mathbb{B}}(Y)}(X_{0})=T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}) (5.13)

for any YΔ(X0)Y\in\Delta(X_{0}) with p(Y)=q(X0)p(Y)=q(X_{0}). Indeed, pick any YY satisfying the latter condition; we obtain from (5.8) that

TN𝔹(Y)(X0)TN𝔹(Y¯)(X0).T_{N_{\mathbb{B}}(Y)}(X_{0})\supset T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}). (5.14)

Due to the computation (4.10), the maximum ranks of matrices in TN𝔹(Y¯)(X0)T_{N_{\mathbb{B}}(\overline{Y})}(X_{0}) and TN𝔹(Y)(X0)T_{N_{\mathbb{B}}(Y)}(X_{0}) are p(Y¯)p(\overline{Y}) and p(Y)p(Y), respectively. It follows that q(X0)p(Y¯)p(Y)=q(X0)q(X_{0})\leq p(\overline{Y})\leq p(Y)=q(X_{0}), which verifies that p(Y¯)=q(X0)p(\overline{Y})=q(X_{0}). Moreover, due to (4.10) and (5.14), we get (5.13) from Lemma 5.1. Combining (5.13) and (5.8) ensures the implication [(ii)\Rightarrow(iv)]. The proof is complete. \hfill\Box

Remark 5.3.

A sufficient condition for strong minima of nuclear norm minimization (5.1) can be obtained from [18, Theorem 12]. However, that condition takes the form of a minimax problem: for any element of the critical cone, there exists some Lagrange multiplier such that a certain second order sufficient condition is satisfied; that is, the Lagrange multiplier used in the sufficient condition depends on the choice of element in the critical cone. This is a typical situation; see (2.23) for instance. In our characterizations of strong minima in parts (iii) and (iv) of the above theorem, the existence of the dual certificate Y¯\overline{Y} is independent of the elements of the critical cone. Condition (iii) is closer to a maximin situation. Moreover, from (iv) we know extra information about these dual certificates: they have the minimum number of singular values equal to 11.

Similarly to Theorem 4.6, condition (5.6) is equivalent to the combination of Strong Restricted Injectivity and Strong Nondegenerate Source Condition. Let us recall the model tangent subspace (4.13) at X0X_{0} here:

𝕋0=def{U0YT+XV0T|Xn1×r,Yn2×r},\mathbb{T}_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\{U_{0}Y^{T}+XV_{0}^{T}|\;X\in\mathbb{R}^{n_{1}\times r},Y\in\mathbb{R}^{n_{2}\times r}\},

where U0Σ0V0TU_{0}\Sigma_{0}V_{0}^{T} is a compact SVD of X0X_{0}.

Corollary 5.4 (Strong Restricted Injectivity and Strong Nondegenerate Source Condition for strong minima).

Suppose that X0X_{0} is a minimizer of problem (5.1). Then X0X_{0} is a strong solution of problem (5.1) if and only if both of the following conditions hold:

  • (a)

    Strong Restricted Injectivity: KerΦ0𝕋0={0}\mbox{\rm Ker}\,\Phi\cap\mathcal{E}_{0}\cap\mathbb{T}_{0}=\{0\}, where

    0={Wn1×n2|P𝕋0WU(AB0BT00000)VT,A𝕊r,Br×(q(X0)r)}\mathcal{E}_{0}=\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}_{0}}W\in U\begin{pmatrix}A&B&0\\ B^{T}&0&0\\ 0&0&0\end{pmatrix}V^{T},A\in\mathbb{S}^{r},B\in\mathbb{R}^{r\times(q(X_{0})-r)}\right\} (5.15)

    for some (U,V)𝒪(X0)𝒪(Y0)(U,V)\in\mathcal{O}(X_{0})\cap\mathcal{O}(Y_{0}), where Y0Δ(X0)Y_{0}\in\Delta(X_{0}) satisfies p(Y0)=q(X0)p(Y_{0})=q(X_{0}).

  • (b)

    Strong Nondegenerate Source Condition: There exists YImΦ+0Y\in\mbox{\rm Im}\,\Phi^{*}+\mathcal{E}_{0}^{\perp} such that YriX0.Y\in{\rm ri}\,\partial\|X_{0}\|_{*}.

Remark 5.5 (Sharp minima vs Strong minima).

The set 0\mathcal{E}_{0} depends only on X0X_{0}. Indeed, it follows from (5.8) and (4.10) that

0={Wn1×n2|P𝕋0WP𝕋0(YΔ(X0)TN𝔹(Y)(X0))},\mathcal{E}_{0}=\left\{W\in\mathbb{R}^{n_{1}\times n_{2}}|\;P_{\mathbb{T}_{0}}W\in P_{\mathbb{T}_{0}}\left(\bigcap_{Y\in\Delta(X_{0})}T_{N_{\mathbb{B}}(Y)}(X_{0})\right)\right\},

which is a subspace of n1×n2\mathbb{R}^{n_{1}\times n_{2}}. As discussed after Theorem 4.6, the Strong Restricted Injectivity is weaker than the Restricted Injectivity

KerΦ𝕋0={0}.\mbox{\rm Ker}\,\Phi\cap\mathbb{T}_{0}=\{0\}. (5.16)

The Strong Nondegenerate Source Condition is also weaker than the Nondegenerate Source Condition:

ImΦriX0.\mbox{\rm Im}\,\Phi^{*}\cap{\rm ri}\,\partial\|X_{0}\|_{*}\neq\emptyset. (5.17)

This condition means q(X0)=rank(X0)q(X_{0})=\mbox{\rm rank}\,(X_{0}). The Nondegenerate Source Condition together with the Restricted Injectivity is used in [15, 16] as a sufficient condition for solution uniqueness of problem (5.1) at X0X_{0}. These two conditions were recently shown in [24, Theorem 4.6] to be equivalent to the stronger property of sharp minima at X0X_{0}, in the sense that there exists some c>0c>0 such that

XX0cXX0for any Xn1×n2 satisfyingΦX=M0.\|X\|_{*}-\|X_{0}\|_{*}\geq c\|X-X_{0}\|\quad\mbox{for any $X\in\mathbb{R}^{n_{1}\times n_{2}}$ satisfying}\quad\Phi X=M_{0}. (5.18)

Our Strong Restricted Injectivity and Strong Nondegenerate Source Condition characterize the weaker property of strong minima for problem (5.1). Of course, they can also serve as sufficient conditions for solution uniqueness at X0X_{0}; see [28] for some recent characterizations of this property.

In order to check Nondegenerate Source Condition (5.17), one has to show that the Source Coefficient ρ(X0)\rho(X_{0}), the optimal value of the following optimization problem

minZ𝕋0Zsubject to𝒩Z=𝒩E0withE0=defU0V0T\min_{Z\in\mathbb{T}_{0}^{\perp}}\quad\|Z\|\quad\mbox{subject to}\quad\mathcal{N}Z=-\mathcal{N}E_{0}\quad\mbox{with}\quad E_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}U_{0}V_{0}^{T} (5.19)

is smaller than 11, where 𝒩\mathcal{N} is a linear operator satisfying Ker𝒩=ImΦ\mbox{\rm Ker}\,\mathcal{N}=\mbox{\rm Im}\,\Phi^{*}; see, e.g., [24, Remark 4.5]. When the Restricted Injectivity (5.16) holds, an upper bound for ρ(X0)\rho(X_{0}) is

τ(X0)=def𝒩𝕋0(𝒩𝕋0𝒩𝕋0)1𝒩E0with𝒩𝕋0=def𝒩P𝕋0.\tau(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|\mathcal{N}^{*}_{\mathbb{T}_{0}^{\perp}}(\mathcal{N}^{*}_{\mathbb{T}_{0}^{\perp}}\mathcal{N}_{\mathbb{T}_{0}^{\perp}})^{-1}\mathcal{N}E_{0}\|\quad\mbox{with}\quad\mathcal{N}_{\mathbb{T}_{0}^{\perp}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\mathcal{N}P_{\mathbb{T}_{0}^{\perp}}. (5.20)

Hence condition τ(X0)<1\tau(X_{0})<1 is sufficient for sharp minima; see, e.g., [24, Corollary 4.8]. This condition is known as the Analysis Exact Recovery Condition in [38] for the case of 1\ell_{1} optimization. Another independent condition used to check solution uniqueness of problem (5.1) is the so-called Irrepresentability Criterion [15, 16, 45]:

𝐈𝐂(X0)=defΦ𝕋0Φ𝕋0(Φ𝕋0Φ𝕋0)1E0<1withΦ𝕋0=defΦP𝕋0.{\bf IC}(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\|\Phi^{*}_{\mathbb{T}_{0}^{\perp}}\Phi_{\mathbb{T}_{0}}\left(\Phi_{\mathbb{T}_{0}}^{*}\Phi_{\mathbb{T}_{0}}\right)^{-1}E_{0}\|<1\quad\mbox{with}\quad\Phi_{\mathbb{T}_{0}}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\Phi P_{\mathbb{T}_{0}}. (5.21)

Note that 𝐈𝐂(X0)ρ(X0){\bf IC}(X_{0})\geq\rho(X_{0}). Thus 𝐈𝐂(X0)<1{\bf IC}(X_{0})<1 also implies that X0X_{0} is a sharp solution of problem (5.1).

Remark 5.6 (Descent cone vs tangent cone).

Sharp minima and strong minima of problem (5.1) are sufficient for solution uniqueness, which is a significant property for exact recovery [1, 13, 16]. An important geometric structure used to study solution uniqueness is the descent cone [13] at X0X_{0} defined by

𝒟(X0)=defcone{XX0|XX0}.\mathcal{D}(X_{0})\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}{\rm cone}\,\{X-X_{0}|\;\|X\|_{*}\leq\|X_{0}\|_{*}\}. (5.22)

Indeed, [13] shows that X0X_{0} is a unique solution of problem (5.1) if and only if

KerΦ𝒟(X0)={0}.\mbox{\rm Ker}\,\Phi\cap\mathcal{D}(X_{0})=\{0\}. (5.23)

Unlike the tangent cones in (5.5) or (5.6), the descent cone 𝒟(X0)\mathcal{D}(X_{0}) may not be closed. Although the direct connection between the descent cone 𝒟(X0){\mathcal{D}}(X_{0}) and the tangent cones TN𝔹(Y)(X0)T_{N_{\mathbb{B}}(Y)}(X_{0}) is not clear, we claim that

KerΦ𝒟(X0)YΔ(X0)[KerΦTN𝔹(Y)(X0)]\mbox{\rm Ker}\,\Phi\cap\mathcal{D}(X_{0})\subset\bigcap_{Y\in\Delta(X_{0})}\left[\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y)}(X_{0})\right] (5.24)

when X0X_{0} is a minimizer of problem (5.1), where Δ(X0)=ImΦX0\Delta(X_{0})=\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*} is the set of dual certificates at X0X_{0}. Indeed, for any WKerΦ𝒟(X0)W\in\mbox{\rm Ker}\,\Phi\cap\mathcal{D}(X_{0}), there exists τ>0\tau>0 such that X0+τWX0\|X_{0}+\tau W\|_{*}\leq\|X_{0}\|_{*}. As Φ(X0+τW)=ΦX0\Phi(X_{0}+\tau W)=\Phi X_{0}, we have X0+τW=X0\|X_{0}+\tau W\|_{*}=\|X_{0}\|_{*}. Pick any YΔ(X0)Y\in\Delta(X_{0}). In view of (4.4) and the definition of YY, it follows that

X0+τW=X0=Y,X0=Y,X0+τW,\|X_{0}+\tau W\|_{*}=\|X_{0}\|_{*}=\langle Y,X_{0}\rangle=\langle Y,X_{0}+\tau W\rangle,

which implies that X0+τWN𝔹(Y)X_{0}+\tau W\in N_{\mathbb{B}}(Y) due to (4.4)-(4.5) and thus WTN𝔹(Y)(X0)W\in T_{N_{\mathbb{B}}(Y)}(X_{0}). This verifies inclusion (5.24). Inclusion (5.24) also tells us that condition (5.5) is sufficient for (5.23). This observation is not a surprise in view of Theorem 5.2, as strong minima obviously imply solution uniqueness. But solution uniqueness of problem (5.1) does not imply strong minima; see [24, Example 5.11]. Hence, inclusion (5.24) can be strict.

Similarly to Corollary 4.4, the following result reveals the role of Strict Restricted Injectivity in strong minima.

Corollary 5.7 (Strict Restricted Injectivity for strong minima of problem (5.1)).

Suppose that U0Σ0V0TU_{0}\Sigma_{0}V_{0}^{T} is a compact SVD of X0X_{0} with r=rank(X0)r={\rm rank}\,(X_{0}). If X0X_{0} is a strong solution, then the following Strict Restricted Injectivity is satisfied:

KerΦU0𝕊rV0T={0}.\mbox{\rm Ker}\,\Phi\cap U_{0}\mathbb{S}^{r}V_{0}^{T}=\{0\}. (5.25)

This condition is also sufficient for X0X_{0} to be a strong solution provided that Nondegenerate Source Condition (5.17) holds at X0X_{0}.

Proof. Note from (4.10),

TN𝔹(Y)(X0)U0𝕊rV0Tfor anyYΔ(X0).T_{N_{\mathbb{B}}(Y)}(X_{0})\supset U_{0}\mathbb{S}^{r}V_{0}^{T}\quad\mbox{for any}\quad Y\in\Delta(X_{0}).

If X0X_{0} is a strong solution, combining (5.5) with the latter inclusion verifies (5.25).

If Nondegenerate Source Condition (5.17) holds at X0X_{0}, there exists Y0ImΦriX0Y_{0}\in\mbox{\rm Im}\,\Phi^{*}\cap{\rm ri}\,\partial\|X_{0}\|_{*}. Hence Δ(X0)\Delta(X_{0})\neq\emptyset, i.e., X0X_{0} is a solution of problem (5.1). It follows from Lemma 4.1 that p(Y0)=rank(X0)p(Y_{0})=\mbox{\rm rank}\,(X_{0}) and from (4.10) that

TN𝔹(Y0)(X0)=U0𝕊rV0T.T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=U_{0}\mathbb{S}^{r}V_{0}^{T}.

Hence the validity of (5.25) implies that X0X_{0} is a strong solution to problem (5.1) due to the equivalence between (i) and (iii) in Theorem 5.2. \hfill\Box

Suppose that U(Σ0000)VTU\begin{pmatrix}\Sigma_{0}&0\\ 0&0\end{pmatrix}V^{T} is a full SVD of X0X_{0}. The model tangent space 𝕋0\mathbb{T}_{0} at X0X_{0} can be represented by

𝕋0={U(ABC0)VT|Ar×r,Br×(n2r),C(n1r)×r}.\mathbb{T}_{0}=\left\{U\begin{pmatrix}A&B\\ C&0\end{pmatrix}V^{T}|\;A\in\mathbb{R}^{r\times r},B\in\mathbb{R}^{r\times(n_{2}-r)},C\in\mathbb{R}^{(n_{1}-r)\times r}\right\}. (5.26)

It has dimension r(n1+n2r)r(n_{1}+n_{2}-r). When Restricted Injectivity (5.16) holds, we have mr(n1+n2r)m\geq r(n_{1}+n_{2}-r). Similarly, as the dimension of U0𝕊rV0TU_{0}\mathbb{S}^{r}V_{0}^{T} is 12r(r+1)\frac{1}{2}r(r+1), it is necessary for Strict Restricted Injectivity (5.25) that m12r(r+1)m\geq\frac{1}{2}r(r+1). Next we show that this bound 12r(r+1)\frac{1}{2}r(r+1) for mm is tight.
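These dimension counts are easy to confirm numerically. In the sketch below (random orthonormal factors and the problem sizes are our own toy choices), we span 𝕋0 and U0𝕊rV0ᵀ by images of basis matrices and check that their ranks are r(n1+n2−r) and r(r+1)/2, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, r = 5, 4, 2
# Random factors with orthonormal columns, playing the roles of U0 and V0.
U0, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n2, r)))

# dim T0 = r(n1 + n2 - r): T0 is spanned by matrices U0 Y^T and X V0^T.
t_vecs = [np.outer(U0[:, k], e).ravel() for k in range(r) for e in np.eye(n2)]
t_vecs += [np.outer(e, V0[:, k]).ravel() for k in range(r) for e in np.eye(n1)]
dim_T0 = np.linalg.matrix_rank(np.array(t_vecs))
assert dim_T0 == r * (n1 + n2 - r)            # = 14 here

# dim U0 S^r V0^T = r(r+1)/2: images of the symmetric basis matrices.
s_vecs = []
for i in range(r):
    for j in range(i, r):
        A = np.zeros((r, r)); A[i, j] = A[j, i] = 1.0
        s_vecs.append((U0 @ A @ V0.T).ravel())
dim_S = np.linalg.matrix_rank(np.array(s_vecs))
assert dim_S == r * (r + 1) // 2              # = 3 here
```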

Corollary 5.8 (Minimum bound for strong exact recovery).

Suppose that X0X_{0} is an n1×n2n_{1}\times n_{2} matrix with rank rr. Then at least 12r(r+1)\frac{1}{2}r(r+1) measurements mm of M0M_{0} are needed so that solving the nuclear norm minimization problem (5.1) recovers X0X_{0} exactly as a strong solution.

Moreover, there exist infinitely many linear operators Φ:n1×n212r(r+1)\Phi:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{\frac{1}{2}r(r+1)} such that X0X_{0} is a strong solution of problem (5.1).

Proof. Suppose that U0ΣV0TU_{0}\Sigma V_{0}^{T} is a compact SVD of X0X_{0}. Let {A1,,As}\{A_{1},\ldots,A_{s}\} with s=def12r(r+1)s\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{2}r(r+1) be any basis of U0𝕊rV0TU_{0}\mathbb{S}^{r}V_{0}^{T}. If X0X_{0} is a strong solution of problem (5.1), Strict Restricted Injectivity (5.25) holds by Corollary 5.7. It follows that {Φ(A1),,Φ(As)}\{\Phi(A_{1}),\ldots,\Phi(A_{s})\} are linearly independent. Hence, we have msm\geq s, which verifies the first part.

To justify the second part, we construct the linear operator Φs:n1×n2s\Phi_{s}:\mathbb{R}^{n_{1}\times n_{2}}\to\mathbb{R}^{s} as follows:

Φs(X)=def(Ak,X)1kssfor anyXn1×n2.\Phi_{s}(X)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}(\langle A_{k},X\rangle)_{1\leq k\leq s}\in\mathbb{R}^{s}\quad\mbox{for any}\quad X\in\mathbb{R}^{n_{1}\times n_{2}}. (5.27)

Note that ImΦs=span{A1,,As}=U0𝕊rV0T\mbox{\rm Im}\,\Phi_{s}^{*}=\mbox{\rm span}\,\{A_{1},\ldots,A_{s}\}=U_{0}\mathbb{S}^{r}V_{0}^{T}. It follows that E0=U0V0TImΦsriX0E_{0}=U_{0}V_{0}^{T}\in\mbox{\rm Im}\,\Phi_{s}^{*}\cap\mbox{\rm ri}\,\partial\|X_{0}\|_{*} is a dual certificate that satisfies Nondegenerate Source Condition (5.17). As

KerΦs=(ImΦs)=(U0𝕊rV0T),\mbox{\rm Ker}\,\Phi_{s}=(\mbox{\rm Im}\,\Phi_{s}^{*})^{\perp}=(U_{0}\mathbb{S}^{r}V_{0}^{T})^{\perp},

we have KerΦsU0𝕊rV0T={0}\mbox{\rm Ker}\,\Phi_{s}\cap U_{0}\mathbb{S}^{r}V_{0}^{T}=\{0\}. Hence Strict Restricted Injectivity (5.25) holds. By Corollary 5.7 again, X0X_{0} is a strong solution of problem (5.1). \hfill\Box
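The construction (5.27) in the proof can be imitated numerically. In the sketch below (random orthonormal factors and toy sizes are our own choices), Φs is built from a basis of U0𝕊rV0ᵀ, and injectivity of Φs on that subspace, i.e., Strict Restricted Injectivity (5.25), reduces to the invertibility of a Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, r = 4, 3, 2
U0, _ = np.linalg.qr(rng.standard_normal((n1, r)))
V0, _ = np.linalg.qr(rng.standard_normal((n2, r)))

# Basis A_1, ..., A_s of U0 S^r V0^T with s = r(r+1)/2.
basis = []
for i in range(r):
    for j in range(i, r):
        A = np.zeros((r, r)); A[i, j] = A[j, i] = 1.0
        basis.append(U0 @ A @ V0.T)
s = len(basis)                                   # = 3 here

# The measurement map (5.27): X -> (<A_k, X>)_{1 <= k <= s}.
Phi = lambda X: np.array([np.sum(Ak * X) for Ak in basis])

# Phi restricted to U0 S^r V0^T is represented by the Gram matrix of the
# basis; full rank means Ker Phi meets the subspace only at 0.
G = np.array([Phi(Ak) for Ak in basis])
assert np.linalg.matrix_rank(G) == s
```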

Remark 5.9 (Lower bounds on the number of measurements for exact recovery).

In the theory of exact recovery, [13] shows that m3r(n1+n2r)m\geq 3r(n_{1}+n_{2}-r) random Gaussian measurements are sufficient to recover X0X_{0} exactly with high probability by solving problem (5.1) from observations M0=ΦX0M_{0}=\Phi X_{0}; see also [16] for a similar result with a different approach. Also in [13, Propositions 4.5 and 4.6], a lower bound on the number of measurements for exact recovery in atomic norm minimization is obtained via the descent cone (5.22) and Terracini’s Lemma [26]. The latter is used to estimate the dimension of a subspace component of the descent cone. In the case of the nuclear norm, this lower bound is min{n1n2,(r+1)(n1+n2)r}\min\{n_{1}n_{2},(r+1)(n_{1}+n_{2})-r\}; see also [26, Proposition 12.2]. This bound holds for any linear measurement scheme. Our lower bound 12r(r+1)\frac{1}{2}r(r+1) for mm is much smaller and depends only on the rank of X0X_{0}, but it is achieved only for special Φ\Phi.

Example 5.10.

Let us consider the following nuclear norm minimization problem

minX2×2Xsubject toΦX=def(X1100X22)=(1000),\min_{X\in\mathbb{R}^{2\times 2}}\|X\|_{*}\quad\mbox{subject to}\quad\Phi X\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}X_{11}&0\\ 0&X_{22}\end{pmatrix}=\begin{pmatrix}1&0\\ 0&0\end{pmatrix}, (5.28)

which is a matrix completion problem. Setting X0=(1000)X_{0}=\begin{pmatrix}1&0\\ 0&0\end{pmatrix}, we have

X0=(100[1,1]) and 𝕋0={(abc0)|a,b,c}.\partial\|X_{0}\|_{*}=\begin{pmatrix}1&0\\ 0&[-1,1]\end{pmatrix}\quad\mbox{ and }\quad\mathbb{T}_{0}=\left\{\begin{pmatrix}a&b\\ c&0\end{pmatrix}|\;a,b,c\in\mathbb{R}\right\}. (5.29)

Moreover, note that

KerΦ={(0bc0)|b,c} and ImΦ={(a00d)|a,d}\mbox{\rm Ker}\,\Phi=\left\{\begin{pmatrix}0&b\\ c&0\end{pmatrix}|\;b,c\in\mathbb{R}\right\}\quad\mbox{ and }\quad\mbox{\rm Im}\,\Phi^{*}=\left\{\begin{pmatrix}a&0\\ 0&d\end{pmatrix}|\;a,d\in\mathbb{R}\right\} (5.30)

Note that Δ(X0)=ImΦX0=X0\Delta(X_{0})=\mbox{\rm Im}\,\Phi^{*}\cap\partial\|X_{0}\|_{*}=\partial\|X_{0}\|_{*}. Hence X0X_{0} is a solution of problem (5.28). However, KerΦ𝕋0=KerΦ{0}\mbox{\rm Ker}\,\Phi\cap\mathbb{T}_{0}=\mbox{\rm Ker}\,\Phi\neq\{0\}, i.e., Restricted Injectivity (5.16) fails. Thus X0X_{0} is not a sharp solution of problem (5.28). Note that q(X0)=1q(X_{0})=1 and Y0=(1000)Δ(X0)Y_{0}=\begin{pmatrix}1&0\\ 0&0\end{pmatrix}\in\Delta(X_{0}) with p(Y0)=1p(Y_{0})=1. We have from (4.10) that

TN𝔹(Y0)(X0)={(a00d)|a,d}.T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\left\{\begin{pmatrix}a&0\\ 0&d\end{pmatrix}|\;a,d\in\mathbb{R}\right\}.

It is clear that KerΦTN𝔹(Y0)(X0)={0}\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\{0\}. This shows that X0X_{0} is a strong solution by Theorem 5.2. \hfill\Box
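The conclusion of this example can be double-checked numerically: here Ker Φ consists of the 2×2 matrices with zero diagonal, the tangent cone T_{N𝔹(Y0)}(X0) of the diagonal matrices, and two subspaces intersect trivially exactly when a stacked basis of both has full rank. A small sketch:

```python
import numpy as np

# Ker Phi in (5.30): matrices with X11 = X22 = 0.
ker_basis = [np.array([[0., 1.], [0., 0.]]).ravel(),
             np.array([[0., 0.], [1., 0.]]).ravel()]
# T_{N_B(Y0)}(X0) computed above: the diagonal matrices.
tan_basis = [np.diag([1., 0.]).ravel(), np.diag([0., 1.]).ravel()]

# Two 2-dimensional subspaces of R^{2x2} intersect only at 0 iff the
# stacked basis has rank 2 + 2 = 4.
M = np.array(ker_basis + tan_basis)
assert np.linalg.matrix_rank(M) == 4
```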

Example 5.11 (Checking strong minima numerically).

Let us consider the following matrix completion problem

minX3×3Xsubject toPΩ(X)=M0=def(424210400),\min_{X\in\mathbb{R}^{3\times 3}}\quad\|X\|_{*}\quad\mbox{subject to}\quad{\rm P}_{\Omega}(X)=M_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}4&2&4\\ 2&1&0\\ 4&0&0\end{pmatrix}, (5.31)

where PΩ{\rm P}_{\Omega} is the projection mapping defined by

PΩ(X)=def(X11X12X13X21X220X3100).{\rm P}_{\Omega}(X)\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}X_{11}&X_{12}&X_{13}\\ X_{21}&X_{22}&0\\ X_{31}&0&0\end{pmatrix}.

Define

X0=def(424212424)andU=V=def13(221122212).X_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}4&2&4\\ 2&1&2\\ 4&2&4\end{pmatrix}\quad\mbox{and}\quad U=V\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{3}\begin{pmatrix}2&-2&1\\ 1&2&2\\ 2&1&-2\end{pmatrix}.

Note that U,VU,V are orthogonal matrices, PΩ(X0)=M0{\rm P}_{\Omega}(X_{0})=M_{0}, X0=U(900000000)VTX_{0}=U\begin{pmatrix}9&0&0\\ 0&0&0\\ 0&0&0\end{pmatrix}V^{T} is an SVD of X0X_{0}, and rank(X0)=1{\rm rank}\,(X_{0})=1. To check whether X0X_{0} is a sharp solution of problem (5.31), we compute the constant τ(E0)\tau(E_{0}) in (5.20) or ρ(E0)\rho(E_{0}) from problem (5.19). In this case the linear operator 𝒩\mathcal{N} in (5.19) is chosen as PΩP_{\Omega^{\perp}}, since KerPΩ=ImPΩ\mbox{\rm Ker}\,P_{\Omega^{\perp}}=\mbox{\rm Im}\,P_{\Omega}. With some linear algebra, we calculate τ(E0)=1.2>1\tau(E_{0})=1.2>1 with E0=X0/9E_{0}=X_{0}/9. Moreover, by using the cvx package to solve the spectral norm optimization problem (5.19), we find that the Source Coefficient ρ(E0)\rho(E_{0}) is exactly 11, together with an optimal solution Z0Z_{0}. Thus X0X_{0} is not a sharp solution of problem (5.31).

However, note further that Y0=Z0+E0KerPΩ=ImPΩY_{0}=Z_{0}+E_{0}\in\mbox{\rm Ker}\,P_{\Omega^{\perp}}=\mbox{\rm Im}\,P_{\Omega} is a dual certificate, which is computed by

Y0=(001010100).Y_{0}=\begin{pmatrix}0&0&1\\ 0&1&0\\ 1&0&0\end{pmatrix}.

Let us check the condition

KerΦTN𝔹(Y0)(X0)={0}\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\{0\} (5.32)

in (5.6). It follows from (4.3) that

Y0=UUTY0VVT=U(100001010)VT.Y_{0}=UU^{T}Y_{0}VV^{T}=U\begin{pmatrix}1&0&0\\ 0&0&1\\ 0&1&0\end{pmatrix}V^{T}.

The SVD of the submatrix (0110)\begin{pmatrix}0&1\\ 1&0\end{pmatrix} is certainly

(0110)(1001)(1001).\begin{pmatrix}0&1\\ 1&0\end{pmatrix}\begin{pmatrix}1&0\\ 0&1\end{pmatrix}\begin{pmatrix}1&0\\ 0&1\end{pmatrix}.

According to (4.8), (U¯,V¯)(\overline{U},\overline{V})\in\mathcal{O}(X_{0})\cap\mathcal{O}(Y_{0}) with

U¯=def13(212122221)andV¯=defV.\overline{U}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\frac{1}{3}\begin{pmatrix}2&1&-2\\ 1&2&2\\ 2&-2&1\end{pmatrix}\quad\mbox{and}\quad\overline{V}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}V.

By formula (4.10),

TN𝔹(Y0)(X0)={U¯(ABBTC)V¯T|A1×1,B1×2,C𝕊+2}.T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\left\{\overline{U}\begin{pmatrix}A&B\\ B^{T}&C\end{pmatrix}\overline{V}^{T}|\;A\in\mathbb{R}^{1\times 1},B\in\mathbb{R}^{1\times 2},C\in\mathbb{S}^{2}_{+}\right\}.

It follows that

={U¯(ABBTC)V¯T|A1×1,B1×2,C2×2}and={U¯(0BBT0)V¯T|B1×2}.\mathcal{E}=\left\{\overline{U}\begin{pmatrix}A&B\\ B^{T}&C\end{pmatrix}\overline{V}^{T}|\;A\in\mathbb{R}^{1\times 1},B\in\mathbb{R}^{1\times 2},C\in\mathbb{R}^{2\times 2}\right\}\;\mbox{and}\;\mathcal{E}^{\perp}=\left\{\overline{U}\begin{pmatrix}0&B\\ -B^{T}&0\end{pmatrix}\overline{V}^{T}|\;B\in\mathbb{R}^{1\times 2}\right\}.

To verify (5.32), we compute ζ(E0)\zeta(E_{0}), which is the optimal value of problem (4.38)

minZ2×2,W3×3Zsubject toPΩc(U¯JZVJT+W)=PΩc(E0)andW.\min_{Z\in\mathbb{R}^{2\times 2},W\in\mathbb{R}^{3\times 3}}\quad\|Z\|\quad\mbox{subject to}\quad P_{\Omega^{c}}(\overline{U}_{J}ZV_{J}^{T}+W)=-P_{\Omega^{c}}(E_{0})\quad\mbox{and}\quad W\in\mathcal{E}^{\perp}.

This is a convex optimization problem. By using cvx to solve it, we obtain that ζ(E0)=1/6\zeta(E_{0})=1/6. As ζ(E0)<1\zeta(E_{0})<1, X0X_{0} is a strong solution of problem (5.31). \hfill\Box
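The data of this example are easy to sanity-check numerically. The sketch below verifies that ‖X0‖* = 9 and that Y0 is a dual certificate via the standard subdifferential characterization (spectral norm at most 1 together with ⟨Y0, X0⟩ = ‖X0‖*, cf. (4.4)); the cvx computations themselves are not reproduced here.

```python
import numpy as np

X0 = np.array([[4., 2., 4.], [2., 1., 2.], [4., 2., 4.]])
Y0 = np.array([[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]])

sv = np.linalg.svd(X0, compute_uv=False)
nuc = sv.sum()
assert np.isclose(nuc, 9.0)                  # ||X0||_* = 9, rank(X0) = 1
assert np.linalg.norm(Y0, 2) <= 1 + 1e-12    # spectral norm of Y0 is 1
assert np.isclose(np.sum(Y0 * X0), nuc)      # <Y0, X0> = ||X0||_*
```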

The idea of checking strong minima numerically via Theorem 5.2 in the above example will be revisited in Section 6 for nuclear norm minimization problems of larger size.

In Corollary 5.12 below, we show that a large class of nuclear norm minimization problems satisfies both Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17), but not Restricted Injectivity (5.16).

Now let us consider a special case of problem (5.1)

minXn1×n2Xsubject toLX=M0,\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad\|X\|_{*}\quad\mbox{subject to}\quad LX=M_{0}, (5.33)

where LL and M0M_{0} are known q×n1q\times n_{1} and q×n2q\times n_{2} matrices, respectively. This is usually referred to as the low-rank representation problem [33]. It is well-known that the optimal solution to problem (5.33) is unique and given by LM0L^{\dagger}M_{0}, where LL^{\dagger} is the Moore-Penrose pseudoinverse of LL. In the following result, we strengthen this by showing that this unique solution is in fact a strong solution, but not necessarily a sharp one. Indeed, for this class, Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17) are satisfied, but Restricted Injectivity (5.16) is not.

Corollary 5.12 (Strong minima of low-rank representation problems).

Let LL be a q×n1q\times n_{1} matrix. If the linear system LX=M0LX=M_{0} is consistent, then Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17) hold at X0=defLM0X_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}L^{\dagger}M_{0} in problem (5.33).

Consequently, X0X_{0} is the strong solution of the low-rank representation problem (5.33).

Proof. Suppose that UΣVTU\Sigma V^{T} is a compact SVD of the matrix LL. Thus L=VΣ1UTL^{\dagger}=V\Sigma^{-1}U^{T} and LM0=VΣ1UTM0L^{\dagger}M_{0}=V\Sigma^{-1}U^{T}M_{0}. Let U0Σ0V0TU_{0}\Sigma_{0}V_{0}^{T} be a compact SVD of Σ1UTM0\Sigma^{-1}U^{T}M_{0} with Σ0r×r\Sigma_{0}\in\mathbb{R}^{r\times r}. Note that

(VU0)TVU0=U0TVTVU0=U0TU0=𝕀.(VU_{0})^{T}VU_{0}=U_{0}^{T}V^{T}VU_{0}=U_{0}^{T}U_{0}=\mathbb{I}.

It follows that VU0Σ0V0TVU_{0}\Sigma_{0}V_{0}^{T} is a compact SVD of X0X_{0}. By Lemma 4.1, we have E0=defVU0V0TriX0E_{0}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}VU_{0}V_{0}^{T}\in\mbox{\rm ri}\,\partial\|X_{0}\|_{*}. Observe further that

E0=VU0V0T=VΣUTUΣ1U0V0T=LTUΣ1U0V0T=Φ(UΣ1U0V0T),E_{0}=VU_{0}V_{0}^{T}=V\Sigma U^{T}U\Sigma^{-1}U_{0}V_{0}^{T}=L^{T}U\Sigma^{-1}U_{0}V_{0}^{T}=\Phi^{*}(U\Sigma^{-1}U_{0}V_{0}^{T}),

which implies that E0ImΦriX0.E_{0}\in\mbox{\rm Im}\,\Phi^{*}\cap\mbox{\rm ri}\,\partial\|X_{0}\|_{*}. This verifies Nondegenerate Source Condition (5.17) and shows that X0X_{0} is an optimal solution of problem (5.33).

Next, let us check Strict Restricted Injectivity (5.25). For any WKerΦ(VU0)𝕊rV0TW\in\mbox{\rm Ker}\,\Phi\cap(VU_{0})\mathbb{S}^{r}V_{0}^{T} with r=rank(X0)r=\mbox{\rm rank}\,(X_{0}), we find some A𝕊rA\in\mathbb{S}^{r} such that W=VU0AV0TW=VU_{0}AV_{0}^{T}. We have

0=Φ(W)=LW=UΣVTVU0AV0T=UΣU0AV0T.0=\Phi(W)=LW=U\Sigma V^{T}VU_{0}AV_{0}^{T}=U\Sigma U_{0}AV_{0}^{T}.

It follows that

0=U0TΣ1UT(UΣU0AV0T)V0=U0TΣ1ΣU0A=U0TU0A=A,0=U_{0}^{T}\Sigma^{-1}U^{T}(U\Sigma U_{0}AV_{0}^{T})V_{0}=U_{0}^{T}\Sigma^{-1}\Sigma U_{0}A=U_{0}^{T}U_{0}A=A,

which also implies that W=0W=0. This verifies Strict Restricted Injectivity (5.25).

Consequently, according to Corollary 5.7, X0X_{0} is the strong solution of problem (5.33). \hfill\Box
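Corollary 5.12 lends itself to a quick randomized check (the random data and problem sizes below are our own choices): X0 = L†M0 is feasible, and perturbing it within Ker L should never decrease the nuclear norm, since X0 is the unique minimizer.

```python
import numpy as np

rng = np.random.default_rng(2)
q, n1, n2 = 2, 4, 3
L = rng.standard_normal((q, n1))
M0 = L @ rng.standard_normal((n1, n2))       # a consistent system LX = M0

X0 = np.linalg.pinv(L) @ M0                  # the claimed strong solution
assert np.allclose(L @ X0, M0)               # feasibility

# Randomized spot check of minimality: perturb within Ker L and compare
# nuclear norms; X0 should never be beaten.
P_ker = np.eye(n1) - np.linalg.pinv(L) @ L   # projector onto Ker L
nuc = lambda X: np.linalg.svd(X, compute_uv=False).sum()
for _ in range(100):
    W = P_ker @ rng.standard_normal((n1, n2))
    assert nuc(X0 + W) >= nuc(X0) - 1e-9
```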

In the following simple example, we show that the unique solution to (5.33) may not be a sharp solution in the sense of (5.18).

Example 5.13 (Unique solutions of low-rank representation problems are not sharp).

Consider the following optimization problem

minX2×2Xsubject to(11)X=(10).\min_{X\in\mathbb{R}^{2\times 2}}\qquad\|X\|_{*}\qquad\mbox{subject to}\qquad\begin{pmatrix}1&1\end{pmatrix}X=\begin{pmatrix}1&0\end{pmatrix}. (5.34)

As L=(11)L=\begin{pmatrix}1&1\end{pmatrix} and M0=(10)M_{0}=\begin{pmatrix}1&0\end{pmatrix}, the unique solution to problem (5.34) is

X0=LM0=(0.50.5)(10)=(0.500.50).X_{0}=L^{\dagger}M_{0}=\begin{pmatrix}0.5\\ 0.5\end{pmatrix}\begin{pmatrix}1&0\end{pmatrix}=\begin{pmatrix}0.5&0\\ 0.5&0\end{pmatrix}.

For ε>0\varepsilon>0, define Xε=def(0.5+ε00.5ε0)X_{\varepsilon}\stackrel{{\scriptstyle\text{\rm\tiny def}}}{{=}}\begin{pmatrix}0.5+\varepsilon&0\\ 0.5-\varepsilon&0\end{pmatrix} and note that XεX_{\varepsilon} satisfies the linear constraint in (5.34). We have

Xε=(0.5+ε)2+(0.5ε)2=0.5+2ε2.\|X_{\varepsilon}\|_{*}=\sqrt{(0.5+\varepsilon)^{2}+(0.5-\varepsilon)^{2}}=\sqrt{0.5+2\varepsilon^{2}}.

It follows that

XεX0XεX0F=(0.5+ε)2+(0.5ε)20.52ε2=0.5+2ε20.52ε2=2ε0.5+2ε2+0.5.\dfrac{\|X_{\varepsilon}\|_{*}-\|X_{0}\|_{*}}{\|X_{\varepsilon}-X_{0}\|_{F}}=\dfrac{\sqrt{(0.5+\varepsilon)^{2}+(0.5-\varepsilon)^{2}}-\sqrt{0.5}}{\sqrt{2\varepsilon^{2}}}=\dfrac{\sqrt{0.5+2\varepsilon^{2}}-\sqrt{0.5}}{\sqrt{2\varepsilon^{2}}}=\dfrac{\sqrt{2}\varepsilon}{\sqrt{0.5+2\varepsilon^{2}}+\sqrt{0.5}}.

This shows that X0X_{0} is not a sharp solution in the sense of (5.18). The underlying reason is that Restricted Injectivity (5.16) is not satisfied for this problem. \hfill\Box
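The vanishing quotient can also be observed numerically. The sketch below evaluates the ratio in (5.18) for shrinking ε and matches it against the closed form computed above.

```python
import numpy as np

nuc = lambda X: np.linalg.svd(X, compute_uv=False).sum()
X0 = np.array([[0.5, 0.], [0.5, 0.]])
for eps in [1e-1, 1e-2, 1e-3]:
    Xe = np.array([[0.5 + eps, 0.], [0.5 - eps, 0.]])
    ratio = (nuc(Xe) - nuc(X0)) / np.linalg.norm(Xe - X0)  # Frobenius norm
    # Closed form from the example: sqrt(2)*eps / (sqrt(0.5 + 2 eps^2) + sqrt(0.5)).
    pred = np.sqrt(2) * eps / (np.sqrt(0.5 + 2 * eps**2) + np.sqrt(0.5))
    assert np.isclose(ratio, pred)
# The ratio tends to 0 with eps, so no constant c > 0 works in (5.18).
assert ratio < 1e-2
```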

A problem closely related to (5.33) is

\min_{X\in\mathbb{R}^{n_{1}\times n_{2}}}\quad h(LX)+\mu\|X\|_{*}, (5.35)

where the function $h:\mathbb{R}^{q\times n_{2}}\to[0,\infty]$ satisfies the standing assumptions in Section 4 with open domain and $\mu$ is a positive number. This is a particular case of (4.1). Next we show that strong minima also occur in this problem.

Corollary 5.14.

Problem (5.35) has a unique and strong solution.

Proof. It is easy to see that problem (5.35) has at least one optimal solution $\overline{X}$. Let $U\Sigma V^{T}$ be a compact SVD of $L$ and define

\overline{Y}\stackrel{\text{\rm\tiny def}}{=}-\dfrac{1}{\mu}L^{T}\nabla h(L\overline{X})=-\dfrac{1}{\mu}V\Sigma U^{T}\nabla h(L\overline{X})\in\partial\|\overline{X}\|_{*}.

Let $U_{1}\Sigma_{1}V_{1}^{T}$ be a compact SVD of $-\dfrac{1}{\mu}\Sigma U^{T}\nabla h(L\overline{X})$ with $\Sigma_{1}\in\mathbb{R}^{p\times p}$. As $(VU_{1})^{T}(VU_{1})=\mathbb{I}$, it follows that $VU_{1}\Sigma_{1}V_{1}^{T}$ is a compact SVD of $\overline{Y}$. By (4.5), we can find $\overline{A}\in\mathbb{S}_{+}^{p}$ such that $\overline{X}=VU_{1}\overline{A}V_{1}^{T}$, which means

\overline{A}=U^{T}_{1}V^{T}\overline{X}V_{1}.

Next let us estimate $T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})$. For any $W\in T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})$, we find sequences $t_{k}\downarrow 0$ and $W_{k}\to W$ satisfying $\overline{X}+t_{k}W_{k}\in N_{\mathbb{B}}(\overline{Y})\subset VU_{1}\mathbb{S}_{+}^{p}V_{1}^{T}$ by Lemma 4.1 again. It follows that

W_{k}\in\frac{1}{t_{k}}VU_{1}(\mathbb{S}^{p}_{+}-\overline{A})V_{1}^{T}\subset VU_{1}\mathbb{S}^{p}V_{1}^{T},

which implies that $T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})\subset VU_{1}\mathbb{S}^{p}V_{1}^{T}$. We claim next that $\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})=\{0\}$. Indeed, for any $W\in T_{N_{\mathbb{B}}(\overline{Y})}(\overline{X})$ with $\Phi(W)=0$, we find $B\in\mathbb{S}^{p}$ such that $W=VU_{1}BV_{1}^{T}$ and

0=\Phi(W)=LW=U\Sigma V^{T}VU_{1}BV_{1}^{T}=U\Sigma U_{1}BV_{1}^{T}.

Hence we have

U_{1}BV_{1}^{T}=\Sigma^{-1}U^{T}U\Sigma U_{1}BV_{1}^{T}=0.

This implies that $W=0$ and verifies the claim. By Corollary 4.2, $\overline{X}$ is the strong solution of problem (5.35). \hfill\Box

6 Numerical Experiments

In this section, we perform numerical experiments to demonstrate strong minima, sharp minima, and solution uniqueness for the nuclear norm minimization problem (1.1). The experiments were conducted for different matrix ranks $r$ and numbers of measurements $m$ for $M_{0}$. Throughout the section, we also discuss how to use our conditions to check strong minima for problem (1.1).

6.1 Experiment 1

In the first experiment, we generate $X_{0}$, an $n\times n$ matrix of rank $r$, by sampling two factors $W\in\mathbb{R}^{n\times r}$ and $H\in\mathbb{R}^{n\times r}$ with independent and identically distributed (i.i.d.) random entries and setting $X_{0}=WH^{*}$. We vectorize problem (1.1) in the following form:

\min_{X\in\mathbb{R}^{n\times n}}\quad\|X\|_{*}\quad\mbox{subject to}\quad\Phi\,\text{vec}(X)=\Phi\,\text{vec}(X_{0}), (6.1)

where $\Phi\in\mathbb{R}^{m\times n^{2}}$ is drawn from the standard Gaussian ensemble, i.e., its entries are i.i.d. from a zero-mean unit-variance Gaussian distribution. We declare $X_{0}$ to be recovered (a solution to (6.1)) if $\|X_{\text{opt}}-X_{0}\|_{F}/\|X_{0}\|_{F}<10^{-3}$, as proposed in [15]. To check sharp minima, we verify Restricted Injectivity (5.16) and compute $\tau(X_{0})$ in (5.20) or the Source Coefficient $\rho(X_{0})$ in (5.19); see [24] or Remark 5.5. To verify strong minima at $X_{0}$, we use the Strong Source Coefficient $\zeta(X_{0})$ in (4.27) or (4.38).

Specifically, let $U_{0}\begin{pmatrix}\Sigma_{0}&0\\ 0&0\end{pmatrix}V_{0}^{T}$ be a full SVD of $X_{0}$. We denote by $u_{i}$ and $v_{j}$ the $i$th and $j$th columns of $U_{0}$ and $V_{0}$ for $1\leq i,j\leq n$. Note from the formula of the tangent model space $\mathbb{T}_{0}$ in (5.26) that $\mathcal{B}=\{u_{i}v_{j}^{T}:(i,j)\notin[r+1,n]\times[r+1,n]\}$ forms a basis for $\mathbb{T}_{0}$. Thus, Restricted Injectivity holds if $\mbox{\rm rank}\,\Phi B=r(2n-r)$, where $B$ is the matrix whose columns are the vectorized elements of the basis $\mathcal{B}$.
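The rank test just described takes only a few lines of numpy; here is a sketch (function and variable names are ours, not the paper's code), with $B$ assembled column-by-column from the vectorized basis elements $u_{i}v_{j}^{T}$:

```python
import numpy as np

def restricted_injectivity(Phi, U0, V0, r, tol=1e-8):
    """Check rank(Phi @ B) == r(2n - r), where the columns of B are the
    vectorized basis elements u_i v_j^T of the tangent model space T0
    (all (i, j) outside the bottom-right (n-r) x (n-r) block)."""
    n = U0.shape[0]
    cols = []
    for i in range(n):
        for j in range(n):
            if i < r or j < r:  # (i, j) indexes a basis element of T0
                cols.append(np.outer(U0[:, i], V0[:, j]).flatten(order='F'))
    B = np.stack(cols, axis=1)                 # n^2 x r(2n - r)
    return np.linalg.matrix_rank(Phi @ B, tol=tol) == r * (2 * n - r)

# Toy example: a Gaussian Phi with m >= r(2n - r) satisfies the test generically.
rng = np.random.default_rng(0)
n, r = 6, 2
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U0, _, V0T = np.linalg.svd(X0)
m = r * (2 * n - r) + 3
Phi = rng.standard_normal((m, n * n))
ok = restricted_injectivity(Phi, U0, V0T.T, r)
```

When $m$ falls below $r(2n-r)$, the rank of $\Phi B$ is at most $m$ and the test necessarily fails.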

To compute $\tau(X_{0})$, let $U\Sigma V^{T}$ be a full SVD of $\Phi$ and denote by $V_{G}$ the matrix whose columns are the last $n^{2}-m$ columns of $V$, which span $\mbox{\rm Ker}\,\Phi$ since $\Phi$ has full row rank almost surely. We then solve the following vectorized problem for the optimal solution $Z^{*}$ using the cvxpy package and compute $\tau(X_{0})=\|Z^{*}\|$:

\min_{Z\in\mathbb{T}_{0}^{\perp}}\|Z\|_{F}\quad\mbox{subject to}\quad N\,\text{vec}(Z)=-N\,\text{vec}(E_{0}), (6.2)

where $N=V_{G}^{T}$ and $\mathbb{T}_{0}^{\perp}$ is given by

\mathbb{T}_{0}^{\perp}=\left\{U_{0}\begin{pmatrix}0&0\\ 0&D\end{pmatrix}V_{0}^{T}\,\middle|\;D\in\mathbb{R}^{(n-r)\times(n-r)}\right\}. (6.3)
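Because the last $n-r$ columns of $U_{0}$ and $V_{0}$ are orthonormal, $\|Z\|_{F}=\|D\|_{F}$ for $Z\in\mathbb{T}_{0}^{\perp}$, so (6.2) is a minimum-norm linear least-squares problem that a pseudoinverse solves in closed form, without a conic solver. A numpy sketch on toy sizes (all names are ours; column-major vec is assumed, so that $\text{vec}(UDV^{T})=(V\otimes U)\text{vec}(D)$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 6, 1, 24
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U0, _, V0T = np.linalg.svd(X0)           # full SVD of X0
V0 = V0T.T
E0 = U0[:, :r] @ V0[:, :r].T

Phi = rng.standard_normal((m, n * n))
N = np.linalg.svd(Phi)[2][m:, :]         # N = V_G^T: rows span Ker(Phi)

# vec(Z) = K vec(D) for Z = U0 [[0,0],[0,D]] V0^T, column-major vec
K = np.kron(V0[:, r:], U0[:, r:])
d = np.linalg.pinv(N @ K) @ (-N @ E0.flatten(order='F'))
D = d.reshape((n - r, n - r), order='F')
Z = U0[:, r:] @ D @ V0[:, r:].T          # minimum-Frobenius-norm solution of (6.2)
tau = np.linalg.norm(Z, 2)               # tau(X0) = spectral norm of Z*
```

Since the columns of $K$ are orthonormal, the pseudoinverse of $NK$ applied to the right-hand side directly gives the minimum-Frobenius-norm feasible $D$.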

To calculate $\rho(X_{0})$, we solve the following vectorized version of (5.19) for the optimal value $\rho(X_{0})$ using the cvxpy package:

\min_{Z\in\mathbb{T}_{0}^{\perp}}\|Z\|\quad\mbox{subject to}\quad N\,\text{vec}(Z)=-N\,\text{vec}(E_{0}), (6.4)

where $N$ and $\mathbb{T}_{0}^{\perp}$ are determined in (6.2) and (6.3), respectively. $X_{0}$ is a sharp solution of problem (6.1) if either $\tau(X_{0})$ or $\rho(X_{0})$ is smaller than $1$; see Remark 5.5. Due to possible (small) numerical errors, we classify sharp minima when $X_{0}$ is recovered and either $\tau(X_{0})<0.99$ or $\rho(X_{0})<0.95$.

To classify strong minima, we consider the cases when $X_{0}$ is recovered, $\tau(X_{0})>0.99$, and $0.95<\rho(X_{0})<1.05$. Let $Z_{0}$ be an optimal solution of problem (6.4) expressed in the following form:

Z_{0}=U_{0}\begin{pmatrix}0&0\\ 0&D_{0}\end{pmatrix}V_{0}^{T}\quad\mbox{and}\quad Y_{0}=U_{0}\begin{pmatrix}\mathbb{I}&0\\ 0&D_{0}\end{pmatrix}V_{0}^{T} (6.5)

with some $D_{0}\in\mathbb{R}^{(n-r)\times(n-r)}$. Note that $Y_{0}$ is a dual certificate of $X_{0}$. According to Theorem 5.2, $X_{0}$ is a strong solution provided that

\mbox{\rm Ker}\,\Phi\cap T_{N_{\mathbb{B}}(Y_{0})}(X_{0})=\{0\}. (6.6)

By Theorem 4.6, this condition holds when Restricted Injectivity is satisfied and the Strong Source Coefficient $\zeta(X_{0})$, the optimal value of problem (4.27) or (4.38), is smaller than $1$. Let $\widehat{U}\widehat{\Sigma}\widehat{V}^{T}$ be an SVD of $D_{0}$. We write $U_{0}=[U_{I}\;U_{J}]$ and $V_{0}=[V_{I}\;V_{J}]$, where $U_{I}$ and $V_{I}$ are the first $r$ columns of $U_{0}$ and $V_{0}$, respectively. Defining $\overline{U}=[U_{I}\;U_{J}\widehat{U}]$ and $\overline{V}=[V_{I}\;V_{J}\widehat{V}]$, it follows that $(\overline{U},\overline{V})\in\mathcal{O}(X_{0})\cap\mathcal{O}(Y_{0})$ by (4.8). To compute $\zeta(X_{0})$, we solve the vectorized problem of (4.38):

\min_{Z\in\mathbb{T}_{0}^{\perp},\,W\in\mathcal{E}^{\perp}}\|Z\|\quad\mbox{subject to}\quad N\,\text{vec}(Z+E_{0}+W)=0, (6.7)

where $\mathbb{T}_{0}^{\perp}$ is determined in (6.3), $E_{0}=U_{I}V_{I}^{T}$, and $\mathcal{E}^{\perp}$ is taken from (4.37):

\mathcal{E}^{\perp}=\left\{\overline{U}\begin{pmatrix}A&B&C\\ -B^{T}&0&0\\ D&0&0\end{pmatrix}\overline{V}^{T}\,\middle|\;A\in\mathbb{V}_{r},\,B\in\mathbb{R}^{r\times(p-r)},\,C\in\mathbb{R}^{r\times(n-p)},\,D\in\mathbb{R}^{(n-p)\times r}\right\}. (6.8)

We classify strong (non-sharp) minima if $\zeta(X_{0})<0.95$.

To illustrate the occurrence of strong minima, sharp minima, and solution uniqueness in problem (6.1), we plot in Figure 1 the proportion of each situation with respect to the number of measurements. For fixed $n=40$ and each $2\leq r\leq 7$, at each number of measurements $m$ we study 100 random cases and record the percentage of cases that are recovered, sharply recovered, and strongly (not sharply) recovered in black, blue, and red curves, respectively. Observe that the percentage of cases where $X_{0}$ is a strong (not sharp) solution peaks at approximately 40% and exceeds the percentage of sharp minima when the number of measurements is not large enough. This phenomenon occurs at different measurement levels for different ranks, indicating a significant number of cases with strong (not sharp) solutions. Additionally, higher ranks require more measurements to achieve the highest percentage of cases with strong (not sharp) solutions.

We also plot the average values of $\tau(X_{0})$, IC($X_{0}$) (Irrepresentability Criterion (5.21)), $\rho(X_{0})$, and $\zeta(X_{0})$ for each number of measurements in Figure 2 for different ranks. Using $\rho(X_{0})$ to check sharp minima appears to capture more cases than using $\tau(X_{0})$. Moreover, $\zeta(X_{0})$ is significantly smaller than both $\tau(X_{0})$ and $\rho(X_{0})$, while IC($X_{0}$) is slightly greater than $\tau(X_{0})$.

Figure 1: Proportions of cases for which $X_{0}$ is a solution, sharp solution, and strong (not sharp) solution with respect to the number of measurements.

Figure 2: Evolution of the average values of $\tau(E_{0})$, the Source Coefficient $\rho(E_{0})$, and the Strong Source Coefficient $\zeta(E_{0})$ with respect to the number of measurements.

6.2 Experiment 2

In the second experiment, we study the following matrix completion problem:

\min_{X\in\mathbb{R}^{n\times n}}\quad\|X\|_{*}\quad\mbox{subject to}\quad X_{ij}=(X_{0})_{ij},\quad(i,j)\in\Omega. (6.9)

The procedure is similar to the first experiment. We again generate $X_{0}$, an $n\times n$ matrix of rank $r$, by sampling two factors $W\in\mathbb{R}^{n\times r}$ and $H\in\mathbb{R}^{n\times r}$ with i.i.d. random entries and setting $X_{0}=WH^{*}$. However, this time we sample an index subset $\Omega$ of $m$ entries uniformly at random from $[n]\times[n]$. The cvxpy package is again used to solve problem (6.9) for an optimal solution $X_{\text{opt}}$, and $X_{0}$ is said to be recovered if $\|X_{\text{opt}}-X_{0}\|_{F}/\|X_{0}\|_{F}<10^{-3}$. To classify sharp minima, we check Restricted Injectivity (5.16) (as done in the first experiment), compute $\tau(X_{0})$ in (5.20) or the Source Coefficient $\rho(X_{0})$ in (5.19), and require $\tau(X_{0})\leq 0.99$ or $\rho(X_{0})\leq 0.95$.

Specifically, to calculate $\tau(E_{0})$, we proceed as follows. As in the first experiment, denote by $u_{i}$ and $v_{j}$ the $i$th and $j$th columns of $U_{0}$ and $V_{0}$, where $U_{0}DV_{0}^{T}$ is a full SVD of $X_{0}$. We define $B_{ij}=P_{\Omega}(u_{i}v_{j}^{T})$ for all $(i,j)\in\Gamma\stackrel{\text{\rm\tiny def}}{=}\{(i,j)\in[n]\times[n]\,|\;(i,j)\notin[r+1,n]\times[r+1,n]\}$, where $P_{\Omega}$ is the projection mapping defined by $P_{\Omega}(X)_{ij}=X_{ij}$ if $(i,j)\in\Omega$ and $0$ otherwise. Then we solve the following linear system for $\alpha_{ij}$:

\sum_{(i,j)\in\Gamma}\alpha_{ij}P_{\Gamma}\left(U_{0}^{T}B_{ij}V_{0}\right)=\begin{pmatrix}I_{r}&0\\ 0&0\end{pmatrix}.

It is not difficult to obtain from (5.20) that

\tau(E_{0})=\left\|Y-E_{0}\right\|,

where $Y=\sum_{(i,j)\in\Gamma}\alpha_{ij}B_{ij}$ and $E_{0}=U_{0}\begin{pmatrix}I_{r}&0\\ 0&0\end{pmatrix}V_{0}^{T}$.
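These steps can be sketched in numpy on toy sizes (all names are ours; the linear system is solvable when $P_{\Omega}$ is injective on $\mathbb{T}_{0}$, which holds generically when enough entries are observed):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 6, 2, 32
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
U0, _, V0T = np.linalg.svd(X0)
V0 = V0T.T
E0 = U0[:, :r] @ V0[:, :r].T

# Omega: m observed entries sampled uniformly without replacement
mask = np.zeros((n, n), dtype=bool)
idx = rng.choice(n * n, size=m, replace=False)
mask[np.unravel_index(idx, (n, n))] = True

gamma = [(i, j) for i in range(n) for j in range(n) if i < r or j < r]
A = np.zeros((len(gamma), len(gamma)))
Bs = []
for c, (i, j) in enumerate(gamma):
    Bij = np.where(mask, np.outer(U0[:, i], V0[:, j]), 0.0)   # B_ij = P_Omega(u_i v_j^T)
    Bs.append(Bij)
    M = U0.T @ Bij @ V0
    A[:, c] = [M[p, q] for (p, q) in gamma]                   # entries of P_Gamma(U0^T B_ij V0)
rhs = np.array([1.0 if (p == q and p < r) else 0.0 for (p, q) in gamma])
alpha = np.linalg.lstsq(A, rhs, rcond=None)[0]                # solve for alpha_ij
Y = sum(a * B for a, B in zip(alpha, Bs))
tau = np.linalg.norm(Y - E0, 2)                               # tau(E0) = ||Y - E0||
```

A sanity check: when all entries are observed, $A$ is the identity, $Y=E_{0}$, and $\tau(E_{0})=0$.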

To compute $\rho(X_{0})$, we adapt problem (5.19) to the matrix completion setting as follows:

\min_{Z\in\mathbb{T}_{0}^{\perp}}\|Z\|\quad\mbox{subject to}\quad Z_{ij}=-(E_{0})_{ij},\quad(i,j)\notin\Omega (6.10)

and solve it using the cvxpy package for the optimal value $\rho(X_{0})$, where $\mathbb{T}_{0}^{\perp}$ is determined in (6.3).

Similarly to (6.5), let $Z_{0}$ be an optimal solution of problem (6.10) and $Y_{0}=Z_{0}+E_{0}$ the corresponding dual certificate of $X_{0}$. To check whether $X_{0}$ is a strong solution, we only need to verify (6.6) with $\mbox{\rm Ker}\,\Phi=\mbox{\rm Ker}\,P_{\Omega}$ by Theorem 5.2. According to Theorem 4.6, the latter holds under Restricted Injectivity and $\zeta(X_{0})<1$, where $\zeta(X_{0})$ is the optimal value of the following problem, a version of (4.38) for matrix completion:

\min_{Z\in\mathbb{T}_{0}^{\perp},\,W\in\mathcal{E}^{\perp}}\|Z\|\quad\mbox{subject to}\quad(Z+E_{0}+W)_{ij}=0,\quad(i,j)\notin\Omega. (6.11)

Here $\mathbb{T}_{0}^{\perp}$ and $\mathcal{E}^{\perp}$ are defined in (6.3) and (6.8). We classify a case as strong minima (strong recovery) if $X_{0}$ is recovered and $\zeta(X_{0})\leq 0.95$, but $\tau(X_{0})>0.99$ and $0.95<\rho(X_{0})<1.05$. In Figure 3, we plot the proportion of cases when $X_{0}$ is a unique solution, sharp solution, and strong (not sharp) solution against the number of measurements. Additionally, Figure 4 displays the average values of $\tau(X_{0})$, $\rho(X_{0})$, and $\zeta(X_{0})$ with respect to the number of measurements.

Based on Figure 3, the highest percentage of cases where $X_{0}$ is a strong solution (but not a sharp one) is only around 15%, much smaller than the 40% in Experiment 1. A possible reason is the special structure of the linear operator $\Phi=P_{\Omega}$ in (6.9): each row of its matrix representation contains a single entry equal to 1 and the remaining entries are 0, and these unit entries lie in distinct columns. Similar to Experiment 1, we found that higher ranks require more measurements to achieve the highest percentage of cases with strong (not sharp) solutions.

As shown in Figure 4, the average values of $\tau(X_{0})$, $\rho(X_{0})$, and $\zeta(X_{0})$ vary with the number of measurements. The gap between the curves of $\tau(X_{0})$ and $\rho(X_{0})$ is less noticeable than in Experiment 1, but the curve of $\zeta(X_{0})$ remains significantly lower than both $\tau(X_{0})$ and $\rho(X_{0})$.

Figure 3: Proportions of cases for which $X_{0}$ is a solution, sharp solution, and strong (not sharp) solution with respect to the number of measurements.

Figure 4: Evolution of the average values of $\tau(E_{0})$, the Source Coefficient $\rho(E_{0})$, and the Strong Source Coefficient $\zeta(E_{0})$ with respect to the number of measurements.

References

  • [1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp: Living on the edge: A geometric theory of phase transitions in convex optimization, IMA Inform. Inference, 3 (2014), 224–294.
  • [2] F. J. Aragón Artacho and M. H. Geoffroy: Characterizations of metric regularity of subdifferentials, J. Convex Anal., 15 (2008), 365–380.
  • [3] B. P. W. Ames and S. A. Vavasis: Nuclear norm minimization for the planted clique and biclique problems, Math. Program., 129 (2011), 69–89.
  • [4] A. Ben-Tal. Second order and related extremality conditions in nonlinear programming. J. Optim. Theory Appl., 31 (1980) 143–165.
  • [5] A. Ben-Tal and J. Zowe: A unified theory of first- and second order conditions for extremum problems in topological vector spaces. Math. Program. 19 (1982), 39–76.
  • [6] J. F. Bonnans and A. D. Ioffe: Second order sufficiency and quadratic growth for non-isolated minima, Math. Oper. Res., 20 (1995), 801–817.
  • [7] J. F. Bonnans, R. Cominetti, and A. Shapiro: Second order optimality conditions based on parabolic second order tangent sets, SIAM J. Optim., 9 (1999), 466–492.
  • [8] J. F. Bonnans and A. Shapiro: Perturbation Analysis of Optimization Problems, Springer, 2000.
  • [9] J. Bolte, T. P. Nguyen, J. Peypouquet, B. W. Suter: From error bounds to the complexity of first-order descent methods for convex functions, Math. Program., 165 (2017), 471–507.
  • [10] J. Burke: Second order necessary and sufficient conditions for convex composite NDO, Math. Program., 38 (1987), 287-302.
  • [11] P. M. Cohn: Universal algebra, Springer Science & Business Media, 2012.
  • [12] L. Cromme: Strong uniqueness. A far-reaching criterion for the convergence analysis of iterative procedures, Numer. Math., 29(2) (1977/78), 179–193
  • [13] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky: The convex geometry of linear inverse problems, Found Comput Math, 12 (2012), 805–849.
  • [14] E. Candès and Y. Plan: Matrix completion with noise, Proceeding of the IEEE, 98 (2010), 925–936.
  • [15] E. Candès and B. Recht: Exact matrix completion via convex optimization Found. Comput. Math., 9 (2009), 717–772.
  • [16] E. Candès and B. Recht: Simple bounds for recovering low-complexity models, Math. Program., 141 (2013), 577–589.
  • [17] Y. Cui and C. Ding: Nonsmooth composite matrix optimization: strong regularity, constraint nondegeneracy and beyond, ArXiv, arXiv:1907.13253
  • [18] Y. Cui, C. Ding, and X. Zhao: Quadratic growth conditions for convex matrix optimization problems associated with spectral functions, SIAM J. Optim., 27 (2017), 2332–2355.
  • [19] Y. Cui, D. Sun, and K-C. Toh: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming, Math. Program., 178 (2019),381–415
  • [20] C. Ding: Variational analysis of the Ky Fan k-norm, Set-Valued and Var. Anal., 25 (2017), 265–296.
  • [21] D. Drusvyatskiy, B. S. Mordukhovich, and T. T. A. Nghia: Second order growth, tilt stability, and metric regularity of the subdifferential, J. Convex Anal., 21 (2014), 1165–1192.
  • [22] A. V. Fiacco and G. P. McCormick: Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
  • [23] J. Fadili, J. Malick, G. Peyré: Sensitivity Analysis for Mirror-Stratifiable Convex Functions, SIAM J. Optim., 28 (2018), 2975—3000.
  • [24] J. Fadili, T. T. A. Nghia, and T. T. T. Tran: Sharp, strong and unique minimizers for low complexity robust recovery, IMA Inf. Inference, 12 (2023).
  • [25] M. Grasmair, M. Haltmeier, and O. Scherzer: Necessary and sufficient conditions for linear convergence of 1\ell_{1} regularization, Comm. Pure Applied Math. 64(2011), 161–182.
  • [26] J. Harris: Algebraic Geometry, Springer, 1992.
  • [27] C. J. Hsieh and P. Olsen: Nuclear norm minimization via active subspace selection. ICML, 32 (2014), 575–583.
  • [28] T. Hoheisel and E. Paquette: Uniqueness in nuclear norm minimization: Flatness of the nuclear norm sphere and simultaneous polarization, J. Optim. Theo. Appl., 197 (2023), 252-276.
  • [29] J. Liang, J. Fadili, G. Peyré: Activity identification and local linear convergence of forward-backward-type methods, SIAM J. Optim., 27 (2017), 408–437.
  • [30] A. S. Lewis and H. S. Sendov: Nonsmooth analysis of singular values. I. Theory. Set-Valued Anal., 13 (2005), 213–241.
  • [31] A. S. Lewis and H. S. Sendov: Nonsmooth analysis of singular values. II. Applications, Set-Valued Anal., 13 (2005), 243–264.
  • [32] Z.-Q. Luo and P. Tseng: Error bounds and convergence analysis of feasible descent methods: a general approach, Ann. Oper. Res., 46 (1993), 157–178.
  • [33] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma: Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2012), 171-184.
  • [34] A. D. Ioffe. Necessary and sufficient conditions for a local minimum III: Second order conditions and augmented duality. SIAM J. Control Optim., 17 (1979), 266–286.
  • [35] A. D. Ioffe: Variational analysis of a composite function: A formula for the lower second order epi-derivative, J. Math. Anal. Appl., 160 (1991), 379–405.
  • [36] B. S. Mordukhovich: Variational Analysis and Generalized Differentiation, Springer, 2006.
  • [37] A. Mohammadi and E. Sarabi: Parabolic regularity of spectral functions. Part I: Theory. arXiv:2301.04240
  • [38] S. Nam, M. E. Davies, M. Elad, R. Gribonval: The cosparse analysis model and algorithms, Applied and Computational Harmonic Analysis, 34 (2013), 30–56.
  • [39] B. T. Polyak. Sharp minima. Technical report, Institute of Control Sciences Lecture Notes, Moscow, USSR, 1979.
  • [40] B. Recht, M. Fazel, and P. Parrilo: Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Rev., 52 (2010), 471–501.
  • [41] R. T. Rockafellar: Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
  • [42] R. T. Rockafellar: Second order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Trans. Amer. Math. Soc., 307 (1988), 75–108.
  • [43] R. T. Rockafellar and R. J-B. Wets: Variational Analysis, Springer, Berlin, 1998.
  • [44] M. Studniarski and D. E. Ward: Weak sharp minima and suffficient conditions, SIAM J. Control Optim., 38 (1999), 219–236.
  • [45] S. Vaiter, G. Peyré, and J. Fadili: Model consistency of partly smooth regularizers. IEEE Transactions on Information Theory, 64 (2017), 1725–1737.
  • [46] S. Vaiter, M. Golbabaee, J. Fadili, and G. Peyré: Model selection with low complexity priors. Information and Inference: A Journal of the IMA, 4 (2015), 230–287.
  • [47] S. Vaiter, G. Peyré, and J. Fadili: Low complexity regularization of linear inverse problems. In Sampling Theory, a Renaissance, 103–153. Birkhäuser, Cham, 2015.
  • [48] G. A. Watson: Characterization of the subdifferential of some matrix norms, Linear Algebra Appl., 170 (1992), 33–45
  • [49] D. E. Ward: Characterizations of strict local minima and necessary conditions for weak sharp minima, J. Optim. Theory and Appl., 80 (1994), 551–571.
  • [50] Z. Zhou and A. M-C. So: A unified approach to error bounds for structured convex optimization, Math. Program., 165 (2017), 689–728.
  • [51] L. Zhang, N. Zhang, and X. Xiao: On the second order directional derivatives of singular values of matrices and symmetric matrix-valued functions, Set-Valued Var. Anal., 21 (2013), 557–586.
  • [52] R. Zhang and J. Treiman: Upper-Lipschitz multifunctions and inverse subdifferentials, Nonlinear Anal., 24 (1995), 273–286.
  • [53] X. Wang, J. J. Ye, X. Yuan, S. Zeng, and J. Zhang.: Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis, Set-Valued and Variational Analysis, 30 (2022), 39–79.