Geometric characterizations for strong minima with applications to nuclear norm minimization problems
Abstract. In this paper, we introduce several geometric characterizations for strong minima of optimization problems. Applying these results to nuclear norm minimization problems allows us to obtain new necessary and sufficient quantitative conditions for this important property. Our characterizations for strong minima are weaker than the Restricted Injectivity and Nondegenerate Source Condition, which are usually used to identify solution uniqueness of nuclear norm minimization problems. Consequently, we obtain the minimum (tight) bound on the number of measurements for (strong) exact recovery of low-rank matrices.
Key Words. Convex optimization; Strong minima; Sharp minima; Second order condition; Nuclear norm minimization; Exact recovery.
Mathematics Subject Classification 52A41 90C25 49J53 49J52
1 Introduction
Strong minima is an important property of a local minimizer of an optimization problem: the difference between the cost value and the optimal value is bounded below by a positive multiple of the squared norm of the difference between the corresponding feasible solution and the minimizer. It is an error bound condition that has various applications to sensitivity analysis, robustness, and complexity of algorithms [4, 5, 7, 8, 18, 22, 23, 24, 29, 34, 42, 44]. Finding necessary and sufficient second order conditions for strong minima is a classical research area. For nonlinear programming, the first results in this direction were probably established in [22] under some restrictive conditions. Complete second order characterizations for nonlinear programming under mild conditions such as the Mangasarian-Fromovitz constraint qualification were obtained later in [4, 34]. For constrained (nonpolyhedral) optimization problems with smooth data, necessary and sufficient second order conditions for strong minima are much more involved. They often contain nontrivial “sigma terms”, which represent curvatures of certain nonpolyhedral structures in the problem. Another important feature of these conditions is that they are usually formulated as “minimax” conditions, in which the Lagrange multipliers depend on the choice of vectors in the critical cone; see, e.g., [7, 8]. Although the aforementioned sigma terms have been fully calculated for several seminal classes of optimization problems such as semi-infinite programming, semi-definite programming, and second order cone programming, their calculation is complicated in general. Moreover, checking the (minimax) sufficient second order conditions numerically is quite a hard task.
An important problem that motivates our study in this paper is the nuclear norm minimization problem
(1.1)	$\min_{X\in\mathbb{R}^{n_1\times n_2}}\ \|X\|_*\quad\text{subject to}\quad\Phi(X)=b,$
where $\|X\|_*$ is the nuclear norm of an $n_1\times n_2$ matrix $X$, $\Phi:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is a linear operator, and $b$ is a known vector (observation) in $\mathbb{R}^m$. This problem is considered the tightest convex relaxation of the celebrated NP-hard affine rank minimization problem with various applications in computer vision, collaborative filtering, and data science; see, e.g., [1, 13, 15, 16, 40]. There are some essential reasons to study strong minima of this problem. First, strong minima of problem (1.1) guarantees the linear convergence of some proximal algorithms for solving problem (1.1) and related problems; see, e.g., [18, 29, 50]. Second, it is also sufficient for solution uniqueness and robustness of problem (1.1) [23, 24]. Solution uniqueness for problem (1.1) is a significant property in recovering the original low-rank solution $X_0$ from the observation $b=\Phi(X_0)$. In [15, 16], Candès and Recht introduced a nondegeneracy condition sufficient for solution uniqueness of problem (1.1). It plays an important role in their results about finding a small bound on the number of measurements such that solving problem (1.1) recovers $X_0$ exactly from observations over a Gaussian linear operator $\Phi$. Their condition was recently revealed in [24] to be a complete characterization of the so-called sharp minima introduced by Cromme [12] and Polyak [39] independently. This special property of problem (1.1) at $X_0$ guarantees robust recovery with a linear rate in the sense that any solutions of the following low-rank optimization problem
(1.2)	$\min_{X\in\mathbb{R}^{n_1\times n_2}}\ \frac{1}{2}\|\Phi(X)-b\|^2+\mu\|X\|_*$
converge to $X_0$ with a linear rate as $\mu\downarrow 0$, provided that $\|b-\Phi(X_0)\|\le c\mu$ for some constant $c>0$; see [14]. When strong minima occurs in problem (1.1), [24] shows that the convergence rate is Hölderian with order $\frac{1}{2}$. It is also worth noting that solution uniqueness of the nuclear norm minimization problem (1.1) can be characterized geometrically via the descent cone [1, 13]. As the descent cone is not necessarily closed, using it to check solution uniqueness numerically is not ideal. Another geometric characterization for solution uniqueness of problem (1.1) was established recently in [28], but the set in their main condition is not closed either. As strong minima is necessary for sharp minima and sufficient for solution uniqueness, it is natural to study the impact of strong minima on exact recovery.
A sufficient second order condition for strong minima of problem (1.1) can be obtained from [18, Theorem 12]. The approach in [18] is to rewrite problem (1.1) as a composite optimization problem and apply the classical results in [7, 8]. Some second order analysis of spectral functions including the nuclear norm studied in [17, 18, 20, 37, 51] could be helpful in understanding this result, but these second order computations applied to the nuclear norm still look complicated. Most importantly, the sufficient second order condition obtained in [18, Theorem 12] is still in a minimax form, which makes it hard to check. Our main questions throughout the paper are: 1. Is it possible to obtain simple necessary and sufficient conditions for strong minima of problem (1.1)? 2. Can we avoid the minimax form usually present in these kinds of second order sufficient conditions? and 3. Is there any efficient way to check strong minima of problem (1.1) numerically?
Our contribution. To highlight the new ideas and point toward the bigger picture, we study in Section 3 the following composite optimization problem
(1.3)	$\min_{x\in\mathbb{X}}\ \varphi(x):=f(x)+g(x),$
where $\mathbb{X}$ is a finite dimensional space, $f:\mathbb{X}\to\mathbb{R}$ is a twice continuously differentiable function, and $g:\mathbb{X}\to\overline{\mathbb{R}}:=(-\infty,\infty]$ is a proper lower semi-continuous (nonsmooth) function. It covers problem (1.2) and many modern problems in optimization. One of the most popular ways to characterize strong minima of problem (1.3) is using the second subderivative [8, 42, 43]. As the function $g$ is nonsmooth, the second subderivative of $g$ is hard to compute in general. To avoid this computation, we assume additionally that the function $g$ satisfies the classical quadratic growth condition [6, 49]. In the case of convex functions, it is shown in Section 3 that $\bar x$ is a strong solution of problem (1.3) if and only if $\bar x$ is a stationary point, i.e., $-\nabla f(\bar x)\in\partial g(\bar x)$, and the following geometric condition holds
(1.4)	$T_{(\partial g)^{-1}(-\nabla f(\bar x))}(\bar x)\cap\operatorname{Ker}\nabla^2 f(\bar x)=\{0\},$
where $\partial g$ is the subdifferential mapping of $g$, $\operatorname{Ker}\nabla^2 f(\bar x)$ is the nullspace of the Hessian matrix $\nabla^2 f(\bar x)$, and $T_{(\partial g)^{-1}(-\nabla f(\bar x))}(\bar x)$ is the Bouligand contingent cone to the set $(\partial g)^{-1}(-\nabla f(\bar x))$ at $\bar x$. The action of the contingent cone on a first order structure in the above condition tells us that (1.4) is actually a second order condition. But it is simpler to compute than the second subderivative; see, e.g., our Corollary 4.2 for the case of the nuclear norm. Our results also work when $g$ is not convex; see Section 3 for more detailed analysis.
Another problem considered in Section 3 is the following convex optimization problem with linear constraints
(1.5)	$\min_{x\in\mathbb{X}}\ g(x)\quad\text{subject to}\quad\Phi x\in K,$
where $g:\mathbb{X}\to\mathbb{R}$ is a continuous (nonsmooth) convex function, $\Phi:\mathbb{X}\to\mathbb{Y}$ is a linear operator between two finite dimensional spaces, and $K$ is a closed polyhedral set of $\mathbb{Y}$. This problem covers the nuclear norm minimization problem (1.1) and a handful of other significant optimization problems [1, 3, 13]. When $g$ is twice continuously differentiable, characterizations of strong minima for problem (1.5) are simple; see, e.g., [8, Theorem 3.120]. But when $g$ is not differentiable, characterizing strong minima is much more involved. A standard approach is to rewrite problem (1.5) as a composite problem, which can be represented by a constrained optimization problem with smooth data [7, 8, 35]. To obtain a geometric characterization similar to (1.4) for problem (1.5), we additionally assume that the function $g$ satisfies both the quadratic growth condition and second order regularity. The latter condition was introduced by Bonnans, Cominetti, and Shapiro in [7] to close the gap between necessary and sufficient second order optimality conditions for constrained and composite optimization problems. The extra assumption of the quadratic growth condition on the function $g$ in this paper allows us to achieve geometric characterizations of strong minima for problem (1.5) in Theorem 3.5. Many vital classes of nonsmooth functions satisfy both the quadratic growth condition and second order regularity. To list a few, we have the classes of piecewise linear-quadratic convex functions [43], spectral functions [18, 37], norms [50], and indicator functions of the set of positive semi-definite matrices [19]. Our results are applicable not only to the nuclear norm minimization problem (1.1), but also to other problems in optimization.
As the nuclear norm satisfies both the quadratic growth condition and second order regularity, geometric characterizations of strong minima for the low-rank problems (1.1) and (1.2) are foreseen. But our studies of problems (1.1) and (1.2) in Sections 4 and 5 are not just straightforward applications. In Section 4, we derive a simple calculation of the second order structure in (1.4) for the case of the nuclear norm. Furthermore, some quantitative characterizations of strong minima for problem (1.2) are obtained via the so-called Strong Restricted Injectivity and Analysis Strong Source Condition, which inherit their names from conditions introduced recently in [24] to characterize strong minima/solution uniqueness of group-sparsity optimization problems. These conditions are weaker than the well-known Restricted Injectivity and Nondegenerate Source Condition used in [15, 16] as sufficient conditions for solution uniqueness of the nuclear norm minimization problem (1.1); see also [25] for the case of the $\ell_1$ norm. Both conditions can be verified numerically. In Section 5, we obtain new characterizations of strong minima for problem (1.1). Our conditions are not in the form of minimax problems. Indeed, Theorem 5.2 shows that $\bar X$ is a strong solution of problem (1.1) if and only if there exists a dual certificate $\bar Y\in\partial\|\bar X\|_*\cap\operatorname{Im}\Phi^*$ such that
$\operatorname{Ker}\Phi\cap T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)=\{0\},$
which has some similarity with (1.4). Necessary and sufficient conditions for strong minima obtained in this section reveal some interesting facts about exact recovery for the nuclear norm minimization problem (1.1). For example, one needs at least $\frac{r(r+1)}{2}$ measurements for $\Phi$ to recover exactly the matrix $X_0$ of rank $r$ as a strong solution of problem (1.1). This bound on the number of measurements is very small, but it is tight in the sense that we can construct infinitely many linear operators $\Phi$ with $m=\frac{r(r+1)}{2}$ such that solving problem (1.1) recovers $X_0$ exactly. Another compelling result in Section 5 shows that the low-rank representation problem [27, 33] always has strong minima, where the linear operator in (1.1) is given by left multiplication with an arbitrary matrix.
Finally in this paper, we discuss numerical methods to check strong minima and compare the results with sharp minima and solution uniqueness. For example, with 100 nuclear norm minimization problems over standard Gaussian linear operators with measurements observed from an original low-rank matrix, the case of exact recovery is about 80%, in which about 40% of the problems have sharp minima and the other 40% have strong (non-sharp) minima. As the traditional approach to exact recovery in [1, 13, 16] is via sharp minima [24], seeing more unique (strong, non-sharp) solutions in these numerical experiments gives us a more complete picture of exact recovery when the number of measurements is not large enough; see Section 6 for further numerical experiments.
2 Preliminaries
Throughout this paper, we suppose that $\mathbb{X}$ is a Euclidean space with norm $\|\cdot\|$ and that $\mathbb{X}^*$ is its dual space endowed with the inner product $\langle v,x\rangle$ for any $x\in\mathbb{X}$ and $v\in\mathbb{X}^*$. $\mathbb{B}(x,\varepsilon)$ denotes the closed ball with center $x$ and radius $\varepsilon$. Let $f:\mathbb{X}\to\overline{\mathbb{R}}:=(-\infty,\infty]$ be a proper extended real-valued function with nonempty domain $\operatorname{dom}f:=\{x\in\mathbb{X}:\ f(x)<\infty\}$. A point $\bar x\in\operatorname{dom}f$ is called a strong solution (or strong minimizer) of $f$ if there exist $c,\varepsilon>0$ such that
(2.1)	$f(x)\ge f(\bar x)+c\|x-\bar x\|^2\quad\text{for all}\quad x\in\mathbb{B}(\bar x,\varepsilon).$
In this case, we say that strong minima occurs at . The study of strong minima is usually based on second order theory, where the following structures of subderivatives play crucial roles; see, e.g., [5, 8, 35, 42, 43].
Definition 2.1 (Subderivatives).
For a function $f:\mathbb{X}\to\overline{\mathbb{R}}$ and $\bar x\in\operatorname{dom}f$, the subderivative of $f$ at $\bar x$ is the function $df(\bar x):\mathbb{X}\to[-\infty,\infty]$ defined by
(2.2)	$df(\bar x)(w):=\liminf_{t\downarrow 0,\ w'\to w}\frac{f(\bar x+tw')-f(\bar x)}{t}.$
The second subderivative of $f$ at $\bar x$ for $v\in\mathbb{X}^*$ is the function $d^2f(\bar x|v):\mathbb{X}\to[-\infty,\infty]$ defined by
(2.3)	$d^2f(\bar x|v)(w):=\liminf_{t\downarrow 0,\ w'\to w}\frac{f(\bar x+tw')-f(\bar x)-t\langle v,w'\rangle}{\frac{1}{2}t^2}.$
The parabolic subderivative of $f$ at $\bar x$ for $w\in\operatorname{dom}df(\bar x)$ with respect to $z\in\mathbb{X}$ is defined by
(2.4)	$d^2f(\bar x)(w\,|\,z):=\liminf_{t\downarrow 0,\ z'\to z}\frac{f(\bar x+tw+\frac{1}{2}t^2z')-f(\bar x)-t\,df(\bar x)(w)}{\frac{1}{2}t^2}.$
Parabolic subderivatives were introduced by Ben-Tal and Zowe in [5] to study strong minima; see also [43, Theorem 13.66]. Second subderivatives date back to the seminal work of Rockafellar [42], with good calculus rules [35] for many important classes of functions. It is well known [43, Theorem 13.24] that $\bar x$ is a strong solution of $f$ if and only if $0\in\partial f(\bar x)$ and
(2.5)	$d^2f(\bar x|0)(w)>0\quad\text{for all}\quad w\neq 0.$
Here $\partial f(\bar x)$ stands for the Mordukhovich limiting subdifferential of $f$ at $\bar x$ [36]:
(2.6)	$\partial f(\bar x):=\Big\{v\in\mathbb{X}^*:\ \exists\,x_k\to\bar x,\ f(x_k)\to f(\bar x),\ v_k\to v\ \text{with}\ \liminf_{x\to x_k}\frac{f(x)-f(x_k)-\langle v_k,x-x_k\rangle}{\|x-x_k\|}\ge 0\Big\}.$
When $f$ is a proper l.s.c. convex function, this subdifferential coincides with the subdifferential of classical convex analysis
(2.7)	$\partial f(\bar x)=\{v\in\mathbb{X}^*:\ \langle v,x-\bar x\rangle\le f(x)-f(\bar x)\ \text{for all}\ x\in\mathbb{X}\}.$
We denote by $f^*$ the Fenchel conjugate of $f$:
(2.8)	$f^*(v):=\sup\{\langle v,x\rangle-f(x):\ x\in\mathbb{X}\}.$
Next let us recall some first and second order tangent structures [8, 43] on a nonempty closed set $\Omega$ of $\mathbb{X}$ that are widely used in this paper.
Definition 2.2 (Tangent cones).
Let $\Omega$ be a closed set of $\mathbb{X}$. The Bouligand contingent cone to $\Omega$ at the point $\bar x\in\Omega$ is defined by
(2.9)	$T_\Omega(\bar x):=\{w\in\mathbb{X}:\ \exists\,t_k\downarrow 0,\ w_k\to w\ \text{such that}\ \bar x+t_kw_k\in\Omega\}.$
The inner and outer second order tangent sets to $\Omega$ at $\bar x\in\Omega$ in the direction $w\in T_\Omega(\bar x)$ are defined, respectively, by
(2.10)	$T^{i,2}_\Omega(\bar x,w):=\{z\in\mathbb{X}:\ \forall\,t_k\downarrow 0,\ \exists\,z_k\to z\ \text{such that}\ \bar x+t_kw+\tfrac12 t_k^2z_k\in\Omega\},$
	$T^{2}_\Omega(\bar x,w):=\{z\in\mathbb{X}:\ \exists\,t_k\downarrow 0,\ z_k\to z\ \text{such that}\ \bar x+t_kw+\tfrac12 t_k^2z_k\in\Omega\}.$
The contingent cone $T_\Omega(\bar x)$ is a closed set. It contains all $w\in\mathbb{X}$ such that there exists a sequence $t_k\downarrow 0$ such that $\operatorname{dist}(\bar x+t_kw;\Omega)=o(t_k)$, where $\operatorname{dist}(x;\Omega)$ denotes the distance from $x$ to $\Omega$:
(2.11)	$\operatorname{dist}(x;\Omega):=\inf\{\|x-y\|:\ y\in\Omega\}.$
Similarly, the inner second order tangent set to $\Omega$ at $\bar x$ in the direction $w$ is
(2.12)	$T^{i,2}_\Omega(\bar x,w)=\{z\in\mathbb{X}:\ \operatorname{dist}(\bar x+tw+\tfrac12 t^2z;\Omega)=o(t^2),\ t\ge 0\}.$
When $\Omega$ is convex, it is well-known that
$T_\Omega(\bar x)=\{w\in\mathbb{X}:\ \operatorname{dist}(\bar x+tw;\Omega)=o(t),\ t\ge 0\}.$
Since the function $\operatorname{dist}(\cdot;\Omega)$ is a convex function, $T_\Omega(\bar x)$ is a convex set. In this case, the inner second order tangent set is also convex for the same reason, by formula (2.12). Moreover, the dual of the contingent cone is the normal cone to $\Omega$ at $\bar x\in\Omega$:
(2.13)	$N_\Omega(\bar x):=\{v\in\mathbb{X}^*:\ \langle v,x-\bar x\rangle\le 0\ \text{for all}\ x\in\Omega\}.$
It is also the subdifferential of the indicator function $\delta_\Omega$ of the set $\Omega$, which is defined by $\delta_\Omega(x):=0$ if $x\in\Omega$ and $\infty$ otherwise. The normal cone can be characterized via the support function of $\Omega$
(2.14)	$\sigma_\Omega(v):=\sup\{\langle v,x\rangle:\ x\in\Omega\}$
with $v\in N_\Omega(\bar x)$ if and only if $\sigma_\Omega(v)=\langle v,\bar x\rangle$.
To characterize strong minima for constrained optimization problems, Bonnans, Cominetti, and Shapiro [7, Definition 3] introduced the following second order regularity condition on $\Omega$; see also [8, Definition 3.85].
Definition 2.3 (Second order regularity).
The set $\Omega$ is said to be second order regular at $\bar x\in\Omega$ if for any $w\in T_\Omega(\bar x)$ the outer second order tangent set $T^2_\Omega(\bar x,w)$ coincides with the inner second order tangent set $T^{i,2}_\Omega(\bar x,w)$, and for any sequence $x_k\in\Omega$ of the form $x_k=\bar x+t_kw+\tfrac12 t_k^2r_k$ with $t_k\downarrow 0$ and $t_kr_k\to 0$ one has $\lim_{k\to\infty}\operatorname{dist}(r_k;T^2_\Omega(\bar x,w))=0$.
The proper l.s.c. function $f$ is said to be second order regular at $\bar x$ if its epigraph $\operatorname{epi}f$ is second order regular at $(\bar x,f(\bar x))$.
The class of second order regular sets covers many important sets in optimization such as polyhedral sets, the set of positive semi-definite matrices, and the second order (ice cream) cone; see, e.g., [8]. Piecewise linear-quadratic convex functions are second order regular [7]. Recently, it was proved in [18] that certain spectral functions are also second order regular.
When the function $f$ is l.s.c. convex and second order regular at $\bar x$, we note from [8, Proposition 3.41] that
(2.15)	$\operatorname{epi}d^2f(\bar x)(w\,|\,\cdot)=T^2_{\operatorname{epi}f}\big((\bar x,f(\bar x)),(w,df(\bar x)(w))\big)$
for any $w\in\operatorname{dom}df(\bar x)$. This is a convex set, which implies that $d^2f(\bar x)(w\,|\,\cdot)$ is a convex function. In this case, it is known from [8, Proposition 3.103] that $f$ is parabolically regular at $\bar x$ in the direction $w$ for $v\in\partial f(\bar x)$ in the sense that
(2.16)	$d^2f(\bar x|v)(w)=\inf_{z\in\mathbb{X}}\big\{d^2f(\bar x)(w\,|\,z)-\langle v,z\rangle\big\}=-\big(d^2f(\bar x)(w\,|\,\cdot)\big)^*(v),$
which is the negative of the Fenchel conjugate of the function $d^2f(\bar x)(w\,|\,\cdot)$ at $v$, provided that the pair $(w,v)$ satisfies the condition $\langle v,w\rangle=df(\bar x)(w)$.
Next let us slightly modify [8, Theorem 3.108 and Theorem 3.109], which give necessary and sufficient conditions for strong solutions of the following composite problem
(2.17)	$\min_{x\in\mathbb{X}}\ g(F(x)),$
where $F:\mathbb{X}\to\mathbb{Y}$ is a twice continuously differentiable mapping and $g:\mathbb{Y}\to\overline{\mathbb{R}}$ is an l.s.c. proper convex function. Suppose that $\bar x\in\mathbb{X}$ with $F(\bar x)\in\operatorname{dom}g$. Robinson's constraint qualification at $\bar x$ for this composite problem is known as
(2.18)	$0\in\operatorname{int}\big(F(\bar x)+\nabla F(\bar x)\mathbb{X}-\operatorname{dom}g\big);$
see, e.g., [7, 8]. The feasible point $\bar x$ is called a stationary point of problem (2.17) if there exists a Lagrange multiplier $\lambda\in\mathbb{Y}^*$ such that
(2.19)	$\nabla F(\bar x)^*\lambda=0\quad\text{and}\quad\lambda\in\partial g(F(\bar x)).$
Theorem 2.4 (Second order characterizations for strong solutions of composite problems).
Suppose that Robinson's constraint qualification (2.18) holds at a stationary point $\bar x$ and that the function $g$ is second order regular at $F(\bar x)$. Then $\bar x$ is a strong solution of problem (2.17) if and only if for any nonzero $w$ in the critical cone
(2.20)	$C(\bar x):=\{w\in\mathbb{X}:\ dg(F(\bar x))(\nabla F(\bar x)w)=0\}$
there exists a Lagrange multiplier $\lambda$ satisfying condition (2.19) such that
(2.21)	$\langle\lambda,\nabla^2F(\bar x)(w,w)\rangle+d^2g(F(\bar x)|\lambda)(\nabla F(\bar x)w)>0.$
Proof. Let us justify the sufficiency part first. According to [8, Theorem 3.109], $\bar x$ is a strong solution of problem (2.17) provided that for any $w\in C(\bar x)\setminus\{0\}$ there exists a Lagrange multiplier $\lambda$ satisfying (2.19) such that
(2.22)	$\langle\lambda,\nabla^2F(\bar x)(w,w)\rangle-\big(d^2g(F(\bar x))(\nabla F(\bar x)w\,|\,\cdot)\big)^*(\lambda)>0.$
Since $g$ is convex and second order regular at $F(\bar x)$, equation (2.15) for the function $g$ tells us that $d^2g(F(\bar x))(\nabla F(\bar x)w\,|\,\cdot)$ is a convex function for any $w\in C(\bar x)$. Moreover, note that
$dg(F(\bar x))(\nabla F(\bar x)w)=\langle\lambda,\nabla F(\bar x)w\rangle=0\quad\text{for}\quad w\in C(\bar x).$
We obtain from (2.16) that $d^2g(F(\bar x)|\lambda)(\nabla F(\bar x)w)=-\big(d^2g(F(\bar x))(\nabla F(\bar x)w\,|\,\cdot)\big)^*(\lambda)$. This ensures the equivalence between (2.22) and (2.21). Thus $\bar x$ is a strong solution of problem (2.17) provided that condition (2.21) holds.
To prove the necessity part, we note again that the function $g$ is convex, so there is no gap between the second order necessary and sufficient conditions, i.e., condition (2.22) is also a necessary condition for the strong solution $\bar x$; see [7, Theorem 5.2] or [8, Theorem 3.108]. Due to the equivalence of (2.21) and (2.22) above, condition (2.21) is also necessary for strong minima at $\bar x$.
As described in (2.21), the existence of the Lagrange multiplier $\lambda$ depends on the choice of each vector $w$ in the critical cone. Under Robinson's constraint qualification (2.18), (2.21) is a minimax condition in the sense that it is equivalent to
(2.23)	$\min_{w\in C(\bar x),\ \|w\|=1}\ \max_{\lambda\in\Lambda(\bar x)}\ \big\{\langle\lambda,\nabla^2F(\bar x)(w,w)\rangle+d^2g(F(\bar x)|\lambda)(\nabla F(\bar x)w)\big\}>0,$
where $\Lambda(\bar x)$ is the set of all Lagrange multipliers satisfying (2.19). It is hard to check this condition numerically. On the other hand, its maximin version is more desirable, as it would mean that there exists one Lagrange multiplier $\lambda$ such that inequality (2.21) is valid for any $w\in C(\bar x)\setminus\{0\}$. However, it is not clear how to close the gap between the minimax and maximin forms. For the case of (1.1), we will obtain some kind of maximin condition for strong minima in Theorem 5.2.
3 Geometric characterizations for strong minima of optimization problems
3.1 Geometric characterizations for strong minima of unconstrained optimization problems
In this subsection, we consider the following composite optimization problem
(3.1)	$\min_{x\in\mathbb{X}}\ \varphi(x):=f(x)+g(x),$
where $f,g:\mathbb{X}\to\overline{\mathbb{R}}$ are proper functions such that $\operatorname{dom}g\subseteq\operatorname{dom}f$, $f$ is twice continuously differentiable in an open set containing $\operatorname{dom}g$, and $g$ is lower semi-continuous. We assume that $\bar x$ is a stationary point of problem (3.1) in the sense that
$0\in\partial\varphi(\bar x)=\nabla f(\bar x)+\partial g(\bar x),$
due to the sum rule for the limiting subdifferential; see, e.g., [36, Proposition 1.107] or [43, Exercise 10.10]. Obviously, $\bar x$ is a stationary point if and only if $-\nabla f(\bar x)\in\partial g(\bar x)$.
To characterize strong minima at the stationary point $\bar x$, one of the most typical methods is to use the second subderivative via characterization (2.5). As the function $f$ is twice continuously differentiable at $\bar x$, it is well-known [43, Exercise 13.18] that
(3.2)	$d^2\varphi(\bar x|0)(w)=\langle\nabla^2f(\bar x)w,w\rangle+d^2g(\bar x|-\nabla f(\bar x))(w)\quad\text{for all}\quad w\in\mathbb{X}.$
Since $g$ is possibly nonsmooth in many structured optimization problems, the computation of $d^2g$ could be quite challenging. In this section, we establish several new necessary and sufficient conditions for strong minima without computing second subderivatives, under the additional assumption that the function $g$ satisfies the following quadratic growth condition [6, 49]; see also [8, Section 3.5].
Definition 3.1 (Quadratic growth conditions).
Let $g:\mathbb{X}\to\overline{\mathbb{R}}$ be a proper l.s.c. function and $S$ be a closed subset of $\mathbb{X}$ with $\bar x\in S$. We say that $g$ satisfies the quadratic growth condition at $\bar x$ for some $\bar v\in\partial g(\bar x)$ with respect to $S$ if there exist a constant $\varepsilon>0$ and modulus $c>0$ such that
(3.3)	$g(x)\ge g(\bar x)+\langle\bar v,x-\bar x\rangle+\frac{c}{2}\operatorname{dist}^2(x;S)\quad\text{for all}\quad x\in\mathbb{B}(\bar x,\varepsilon).$
The function $g$ is said to satisfy the quadratic growth condition at $\bar x$ for $\bar v$ if it satisfies this condition at $\bar x$ for $\bar v$ with respect to
(3.4)	$S:=\{x\in\mathbb{X}:\ g(x)\le g(\bar x)+\langle\bar v,x-\bar x\rangle\}.$
Finally, we say the function $g$ satisfies the quadratic growth condition at $\bar x$ if it satisfies this condition at $\bar x$ for any $\bar v\in\partial g(\bar x)$.
As the function $g$ is l.s.c., the set $S$ in (3.4) is closed and $\bar x\in S$. When the quadratic growth condition (3.3) holds, it is clear that
(3.5) |
Moreover, for any closed set fulfilling (3.5) and , we find some such that
which implies that , i.e., . It follows from (3.5) that
for any $x\in\mathbb{B}(\bar x,\varepsilon)$. Hence, if the function $g$ satisfies the quadratic growth condition at $\bar x$ for $\bar v$, it also satisfies the quadratic growth condition at $\bar x$ for $\bar v$ w.r.t. any closed set fulfilling (3.5). Many necessary and sufficient conditions for the quadratic growth condition have been established in [6, 8] and [44, 49], where it is also studied under the different name of weak sharp minima of order two.
When $g$ is convex and $\bar v\in\partial g(\bar x)$, the set in (3.4) coincides with $(\partial g)^{-1}(\bar v)$. The quadratic growth condition of $g$ at $\bar x$ for $\bar v$ w.r.t. $(\partial g)^{-1}(\bar v)$ has been studied and connected with the so-called Łojasiewicz inequality with exponent $\frac12$ [9] and the metric subregularity of the subdifferential [2, 21, 52] (even in nonconvex cases). There are broad classes of convex functions satisfying the quadratic growth condition such as piecewise linear-quadratic convex functions [43, Definition 10.20] and many convex spectral functions [18]; see also [50] for some other ones.
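A one-dimensional illustration of ours (not from the original text) of Definition 3.1:

```latex
% Our example. For g(x)=|x|, \bar{x}=0, and \bar{v}=0\in\partial g(0),
% the set in (3.4) is S=\{0\}, and
\[
|x|\ \ge\ |0|+\langle 0,x\rangle+\tfrac12\operatorname{dist}^2\big(x;\{0\}\big)
 =\tfrac12 x^2\qquad\text{for all } x\in\mathbb{B}(0,2),
\]
% so the quadratic growth condition (3.3) holds with c=1 and \varepsilon=2.
% For g(x)=x^4 at \bar{x}=0 with \bar{v}=0 no modulus c>0 works,
% since x^4/x^2 \to 0 as x \to 0.
```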
When $g$ is not convex, the quadratic growth condition of $g$ at $\bar x$ for $\bar v$ w.r.t. $(\partial g)^{-1}(\bar v)$ is the same as the quadratic growth condition of $g$ at $\bar x$ for $\bar v$ provided that
(3.6) |
It is necessary for the quadratic growth condition (3.3) at $\bar x$ for $\bar v$ that $\bar x$ is a local minimizer of the function $g-\langle\bar v,\cdot\rangle$. Condition (3.6) is similar to the so-called proper separation of isocost surfaces of $g$ in [53], which is an improvement of the proper separation of stationary points of $g$ in [32]. By [21, Theorem 3.1], the function $g$ satisfies the quadratic growth condition at $\bar x$ for $\bar v$ w.r.t. $(\partial g)^{-1}(\bar v)$ provided that $\partial g$ is metrically subregular at $\bar x$ for $\bar v$ in the sense that there exist $\kappa,\varepsilon>0$ such that
(3.7)	$\operatorname{dist}\big(x;(\partial g)^{-1}(\bar v)\big)\le\kappa\operatorname{dist}\big(\bar v;\partial g(x)\big)\quad\text{for all}\quad x\in\mathbb{B}(\bar x,\varepsilon).$
This condition is satisfied when $\partial g$ is a piecewise polyhedral set-valued mapping, i.e., the graph of $\partial g$ is a union of finitely many polyhedral sets; see, e.g., [43, Example 9.57]. Thus the class of (possibly nonconvex) piecewise linear-quadratic functions fulfills (3.7); see also [53] for several sufficient conditions for (3.7) and some special nonconvex piecewise linear-quadratic regularizers such as the SCAD and MCP penalty functions. Although our theory in this section is applicable to nonconvex functions, we focus our later applications on low-rank minimization problems (1.2) where $g$ is the nuclear norm, which also satisfies the quadratic growth condition [50] although the graph of $\partial\|\cdot\|_*$ is not piecewise polyhedral.
The following lemma plays an important role in our analysis.
Lemma 3.2 (Necessary condition for quadratic growth).
Let $g:\mathbb{X}\to\overline{\mathbb{R}}$ be a proper l.s.c. function and $S$ be a closed subset of $\mathbb{X}$ with $\bar x\in S$. If $g$ satisfies the quadratic growth condition at $\bar x$ for some $\bar v\in\partial g(\bar x)$ w.r.t. $S$ with some modulus $c>0$, we have
(3.8)	$d^2g(\bar x|\bar v)(w)\ge c\operatorname{dist}^2\big(w;T_S(\bar x)\big)\quad\text{for all}\quad w\in\mathbb{X}.$
Moreover, if $g$ satisfies the quadratic growth condition at $\bar x$ for $\bar v$, we have
(3.9)	$\big\{w\in\mathbb{X}:\ d^2g(\bar x|\bar v)(w)=0\big\}=T_S(\bar x)$
with $S$ taken from (3.4).
Proof. Suppose that inequality (3.3) holds with some . Pick , we only need to verify (3.8) when , i.e., . It follows from (2.3) that there exist sequences and such that
(3.10) |
Hence, we have
when is sufficiently large. Combining (3.3) and (3.10) gives us that
(3.13) |
As is closed, there exist , i.e., such that This together with (3.13) implies that
Hence is bounded. By passing to a subsequence, we suppose that converges to . As , we have . It follows from the above inequality that
which verifies (3.8).
To justify (3.9), suppose further that satisfies the quadratic growth condition at for . For any , we obtain from (3.8) that , which means . It follows that . Let us prove the opposite inclusion by picking any . There exist and such that , i.e.,
We obtain from (3.8) that
(3.14) |
which yields and verifies . The proof is complete.
Next let us establish the main theorem of this section, which provides a geometric characterization for strong minima of problem (3.1).
Theorem 3.3 (Necessary and sufficient conditions for strong minima).
Proof. As is twice continuously differentiable at , we derive from (2.5) and (3.2) that is a strong solution of if and only if there exists some such that
(3.17) |
To justify the first part, suppose that is a strong solution of , i.e., (3.17) holds. Pick any and find sequences and such that , which means
By the definition of in (2.3), we have . This together with (3.17) verifies (3.15).
To verify the second part of the theorem, suppose that the function satisfies the quadratic growth condition at for with some modulus and is positive semidefinite. It is obvious that (3.15) implies (3.16). We only need to prove that (3.16) is sufficient for strong minima at . Suppose that condition (3.16) is satisfied. If condition (3.17) failed, we could find a sequence such that and
It follows from (3.8) that
(3.18) |
By passing to a subsequence, assume that with (without relabeling.) It follows that
Hence, we have and , which means
This is a contradiction to (3.16). Hence, (3.17) holds for some $c>0$ and $\bar x$ is a strong solution of $\varphi$. The proof is complete.
Corollary 3.4 (Geometric characterization for strong minima of convex problems).
Proof. As discussed before (3.7), when $g$ is a convex function, the metric subregularity of $\partial g$ at $\bar x$ for $\bar v$ implies the quadratic growth condition (they are indeed equivalent [2, 52]). Since $f$ is convex, $\nabla^2f(\bar x)$ is positive semidefinite. By Theorem 3.3, $\bar x$ is a strong solution if and only if (3.16) holds. Since $g$ is a convex function, the set in (3.4) coincides with $(\partial g)^{-1}(-\nabla f(\bar x))$. Thus (3.19) is equivalent to (3.16). The proof is complete.
Unlike many other necessary and sufficient conditions for strong minima, our geometric characterizations (3.16) and (3.19) do not involve the “curvature” or the “sigma-term” of the function $g$. We still need to compute the contingent cones in (3.16) or in (3.19). In Sections 4 and 5, we consider the case where $g$ is the nuclear norm and provide a simple calculation of the corresponding contingent cone.
3.2 Geometric characterization for strong minima of optimization problems with linear constraints
In this subsection, we apply the idea of Theorem 3.3 and Corollary 3.4 to the following convex optimization problem with linear constraints
(3.20)	$\min_{x\in\mathbb{X}}\ g(x)\quad\text{subject to}\quad\Phi x\in K,$
where $g:\mathbb{X}\to\mathbb{R}$ is a continuous (nonsmooth) convex function with full domain, $\Phi:\mathbb{X}\to\mathbb{Y}$ is a linear operator between two Euclidean spaces, and $K$ is a closed convex polyhedral set in $\mathbb{Y}$. Unlike problem (3.1), the function $g$ needs to satisfy more properties, namely convexity, the quadratic growth condition, and second order regularity, as stated in the next theorem.
Let us recall that $\bar x$ is a stationary solution of problem (3.20) if there exists a Lagrange multiplier $\lambda$, and hence a dual certificate, such that
(3.21)	$\lambda\in N_K(\Phi\bar x)\quad\text{and}\quad -\Phi^*\lambda\in\partial g(\bar x).$
The set of Lagrange multipliers is defined by
(3.22)	$\Lambda(\bar x):=\{\lambda\in N_K(\Phi\bar x):\ -\Phi^*\lambda\in\partial g(\bar x)\}.$
The critical cone of this problem at the stationary point $\bar x$ is
(3.23)	$C(\bar x):=\{w\in\mathbb{X}:\ \Phi w\in T_K(\Phi\bar x)\ \text{and}\ dg(\bar x)(w)=0\}.$
The point $\bar x$ is called a strong solution of problem (3.20) if there exist $c,\varepsilon>0$ such that
$g(x)\ge g(\bar x)+c\|x-\bar x\|^2\quad\text{for all}\quad x\in\mathbb{B}(\bar x,\varepsilon)\ \text{with}\ \Phi x\in K.$
Theorem 3.5 (Geometric characterization for strong minima of problem (3.20)).
Proof. Define being a linear subspace of , , the mapping by for , and the function by
Hence we can replace by and by in problem (3.20). Rewrite problem (3.20) as a composite optimization problem (2.17)
(3.25) |
Observe that is a strong solution of (3.20) if and only if it is a strong solution of (3.25). Robinson’s constraint qualification (2.18) for problem (3.25) is
(3.26) |
As is a surjective operator, [8, Corollary 2.101] tells us that the above condition is equivalent to the existence of satisfying
This condition holds trivially at . Thus Robinson’s constraint qualification (3.26) holds.
The critical cone (2.20) for problem (3.25) is
(3.27) |
As $K$ is also a polyhedral set in $\mathbb{Y}$, we have
(3.28) |
It follows that the set is exactly the critical cone defined in (3.23).
By (2.19), the set of Lagrange multipliers of problem (3.25) is
(3.31) |
Note further that
Since $K$ is polyhedral, it is second order regular [7]. As $g$ is second order regular, so is the composite function in (3.25); see, e.g., [8, Proposition 3.89].
By Theorem 2.4, is a strong solution of problem (3.25) if and only if for any there exists such that
(3.32) |
Observe that
(3.33) |
Note from (2.3) that
(3.34) |
By (3.28), we have
(3.35) |
Since , we have . As and , it follows that
which implies that . This together with (3.35) tells us that .
By (3.33), condition (3.32) is equivalent to
(3.36) |
Since is a polyhedral set, we have
Represent for some and , it follows that
Hence $\bar x$ is a strong solution of problem (3.25) if and only if for any $w\in C(\bar x)$ there exists some $\lambda$ as in (3.21) such that (3.36) holds. Since $g$ satisfies the quadratic growth condition at $\bar x$, we get from Lemma 3.2 that
Hence condition (3.36) means for any there exists such that
This is equivalent to the following inclusion
which is also equivalent to (3.24). The proof is complete.
The approach of using the composite function (3.25) to study problem (3.20) is traditional; see, e.g., [7, 34]. However, by additionally assuming that the function $g$ satisfies the quadratic growth condition at $\bar x$, we are able to obtain the new geometric characterization of strong solutions in (3.24). The main idea in this result is similar to that of Theorem 3.3. Here we require the function $g$ to satisfy more assumptions, but when applying this result to the nuclear norm minimization problem (5.1) they are all valid, as the nuclear norm is second order regular and also satisfies the quadratic growth condition [18].
4 Characterizations for strong minima of low-rank optimization problems
This section is devoted to new characterizations of strong minima for the low-rank optimization problem:
(4.1)	$\min_{X\in\mathbb{R}^{n_1\times n_2}}\ f(\Phi X)+\mu\|X\|_*,$
where $\Phi:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is a linear operator, $\|X\|_*$ is the nuclear norm of $X$, $\mu$ is a positive constant, and $f:\mathbb{R}^m\to\overline{\mathbb{R}}$ satisfies the following standing assumptions [24]:
-
(A)
$f$ is proper convex and twice continuously differentiable in $\operatorname{int}(\operatorname{dom}f)\neq\emptyset$.
-
(B)
$\nabla^2f(y)$ is positive definite for any $y\in\operatorname{int}(\operatorname{dom}f)$.
Strongly convex functions with full domain clearly satisfy the above standing assumptions. Another important (non-strongly convex) function with these conditions widely used in statistics/machine learning is the Kullback-Leibler divergence. Sufficient conditions for strong minima of problem (4.1) can be obtained from [18, Theorem 12]. However, that result still relies on some computation of the second subderivative of the nuclear norm, which is complicated; see, e.g., [20, 51] and the recent paper [37] for the case of symmetric matrices. We will provide some explicit and computable characterizations of strong minima for problem (4.1) based on Corollary 3.4. The calculation of the corresponding contingent cone is rather simple; see our formula (4.10) below.
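Two standard examples satisfying (A) and (B), stated by us for concreteness (the original text only names the Kullback-Leibler divergence):

```latex
% Our examples of functions satisfying the standing assumptions (A) and (B).
\[
f(y)=\tfrac12\|y-b\|^2:\qquad
 \nabla^2 f(y)=I\succ 0\ \text{ on }\operatorname{dom}f=\mathbb{R}^m,
\]
\[
f(y)=\sum_{i=1}^m\Big(y_i\log\frac{y_i}{b_i}-y_i+b_i\Big):\qquad
 \nabla^2 f(y)=\operatorname{diag}(1/y_1,\ldots,1/y_m)\succ 0
 \ \text{ on }\operatorname{dom}f=\mathbb{R}^m_{++},
\]
% the latter being the Kullback-Leibler divergence with respect to b>0.
```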
Let us recall a few standard notations for matrices. The space $\mathbb{R}^{n_1\times n_2}$ of all $n_1\times n_2$ matrices (assuming $n_1\le n_2$ throughout) is endowed with the inner product
$\langle X,Y\rangle:=\operatorname{Tr}(X^TY)\quad\text{for}\quad X,Y\in\mathbb{R}^{n_1\times n_2},$
where Tr is the trace operator. The Frobenius norm on $\mathbb{R}^{n_1\times n_2}$ is
$\|X\|_F:=\sqrt{\operatorname{Tr}(X^TX)}.$
The nuclear norm and spectral norm of $X$ are defined respectively by
$\|X\|_*:=\sum_{i=1}^{n_1}\sigma_i(X)\quad\text{and}\quad\|X\|:=\sigma_1(X),$
where $\sigma_1(X)\ge\sigma_2(X)\ge\cdots\ge\sigma_{n_1}(X)\ge 0$ are all singular values of $X$. Suppose that a full Singular Value Decomposition (SVD) of $X$ with $r=\operatorname{rank}X$ is
(4.2)	$X=U\Sigma(X)V^T,$
where $\Sigma(X)\in\mathbb{R}^{n_1\times n_2}$ is the diagonal matrix of singular values and $U\in\mathbb{R}^{n_1\times n_1}$, $V\in\mathbb{R}^{n_2\times n_2}$ are orthogonal matrices. Let $\mathbb{O}(X)$ be the set of all such pairs $(U,V)$ satisfying (4.2). We write $U=(U_1,U_2)$ and $V=(V_1,V_2)$, where $U_1$ and $V_1$ are the submatrices of the first $r$ columns of $U$ and $V$, respectively. We get from (4.2) that $X=U_1\Sigma_rV_1^T$ with $\Sigma_r:=\operatorname{diag}(\sigma_1(X),\ldots,\sigma_r(X))$, which is known as a compact SVD of $X$.
The following lemma is significant in our paper. The first part is well-known [48, Example 2]. The last part was established in [50, Proposition 10], and can be viewed as a direct consequence of [48, Example 1] via convex analysis and the formula for the normal cone to a level set [41, Corollary 23.7.1].
Lemma 4.1 (Subdifferential of the nuclear norm).
The subdifferential of the nuclear norm at $X$ is computed by
(4.3)	$\partial\|X\|_*=\big\{U_1V_1^T+W:\ U_1^TW=0,\ WV_1=0,\ \|W\|\le 1\big\}.$
Moreover, $Y\in\partial\|X\|_*$ if and only if $\|Y\|\le 1$ and
(4.4)	$\langle Y,X\rangle=\|X\|_*.$
Furthermore, for any $Y\in\partial\|X\|_*$ with $(P,Q)\in\mathbb{O}(Y)$, we have
(4.5)	$(\partial\|\cdot\|_*)^{-1}(Y)=\big\{P_1SQ_1^T:\ S\in\mathcal{S}^{s(Y)}_+\big\},$
where $\mathcal{S}^{s(Y)}_+$ is the set of all $s(Y)\times s(Y)$ symmetric positive semidefinite matrices, $P_1$ and $Q_1$ are the submatrices of the first $s(Y)$ columns of $P$ and $Q$, respectively, and $s(Y)$ is defined by
(4.6)	$s(Y):=\#\{i:\ \sigma_i(Y)=1\}.$
Let $\bar X$ be a stationary point of problem (4.1) and $\bar Y:=-\frac{1}{\mu}\Phi^*\nabla f(\Phi\bar X)\in\partial\|\bar X\|_*$. It follows from (4.3) that $\bar Y$ can be represented by
(4.7)	$\bar Y=U_1V_1^T+W$
with some $W$ satisfying $U_1^TW=0$, $WV_1=0$, and $\|W\|\le 1$. Choosing the orthogonal completions $U_2$ and $V_2$ along a full SVD of $W$, we get from (4.7) that
(4.8)	$\bar Y=U\begin{pmatrix}I_r&0\\0&\Sigma'\end{pmatrix}V^T\quad\text{with}\quad U=(U_1,U_2),\ V=(V_1,V_2),$
where $\Sigma'$ collects the singular values of $W$ in nonincreasing order. Observe that $\bar X=U_1\Sigma_rV_1^T$ and $\|W\|\le 1$. It follows that $(U,V)\in\mathbb{O}(\bar X)\cap\mathbb{O}(\bar Y)$, which means $\bar X$ and $\bar Y$ have a simultaneous ordered singular value decomposition [30, 31] with the orthogonal matrix pair $(U,V)$ in the sense that
(4.9)	$\bar X=U\Sigma(\bar X)V^T\quad\text{and}\quad\bar Y=U\Sigma(\bar Y)V^T,$
where $\Sigma(\bar X)=\operatorname{diag}(\Sigma_r,0)$ and $\Sigma(\bar Y)=\operatorname{diag}(I_r,\Sigma')$.
The following result establishes a geometric characterization of strong solutions of problem (4.1). According to [50, Proposition 11], the subdifferential mapping of the nuclear norm satisfies the metric subregularity (3.7) at any $X$ for any $Y\in\partial\|X\|_*$.
Corollary 4.2 (Geometric characterization for strong minima of low-rank optimization problems).
Suppose that $\bar X$ is a minimizer of problem (4.1) with $\bar Y:=-\frac{1}{\mu}\Phi^*\nabla f(\Phi\bar X)$, $r=\operatorname{rank}\bar X$, and $s:=s(\bar Y)$. Let $(U,V)$ be as in (4.9) or (4.8), and let $U_s$ and $V_s$ be the submatrices of the first $s$ columns of $U$ and $V$, respectively. Then we have
(4.10)	$T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)=\Big\{U_s\begin{pmatrix}A&B^T\\B&C\end{pmatrix}V_s^T:\ A\in\mathcal{S}^r,\ B\in\mathbb{R}^{(s-r)\times r},\ C\in\mathcal{S}^{s-r}_+\Big\},$
where $s(\bar Y)$ is defined in (4.6) and $\mathcal{S}^r$ is the set of all symmetric matrices of size $r\times r$. Hence $\bar X$ is a strong solution of (4.1) if and only if
(4.11)	$T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)\cap\operatorname{Ker}\Phi=\{0\}.$
Consequently, $\bar X$ is a strong solution of (4.1) provided that the following Strong Sufficient Condition holds
(4.12)	$\big\{U_sSV_s^T:\ S\in\mathcal{S}^s\big\}\cap\operatorname{Ker}\Phi=\{0\}.$
Proof. By Lemma 4.1, we have
$(\partial\|\cdot\|_*)^{-1}(\bar Y)=\big\{U_sSV_s^T:\ S\in\mathcal{S}^s_+\big\}.$
As $\bar X=U_s\operatorname{diag}(\Sigma_r,0)V_s^T$, we obtain from (4.2) that
$T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)=\big\{U_sHV_s^T:\ H\in T_{\mathcal{S}^s_+}(\operatorname{diag}(\Sigma_r,0))\big\},$
which is exactly the right-hand side of (4.10) according to the contingent cone formula for $\mathcal{S}^s_+$ in [8, Example 2.65]. Since $\partial\|\cdot\|_*$ is metrically subregular at $\bar X$ for $\bar Y$ by [50, Proposition 11], it follows from Corollary 3.4 that $\bar X$ is a strong solution of problem (4.1) if and only if
$T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)\cap\operatorname{Ker}\big(\Phi^*\nabla^2f(\Phi\bar X)\Phi\big)=\{0\}.$
Since $\nabla^2f(\Phi\bar X)$ is positive definite, we have $\operatorname{Ker}(\Phi^*\nabla^2f(\Phi\bar X)\Phi)=\operatorname{Ker}\Phi$. The characterization (4.11) for strong minima at $\bar X$ follows from the above condition.
Finally, note from (4.10) that
$T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)\subseteq\big\{U_sSV_s^T:\ S\in\mathcal{S}^s\big\}.$
Hence the Strong Sufficient Condition (4.12) implies strong minima at $\bar X$ by (4.11).
Remark 4.3.
The geometric characterization (4.11) for strong minima of problem (4.1) is new. A sufficient condition can indeed be obtained from [18, Theorem 12], which considers more general optimization problems involving spectral functions. However, that result contains a nontrivial sigma-term, which is calculated explicitly in the recent papers [17, 37] for the case of symmetric matrices. Our approach is totally different and involves no sigma-terms. Moreover, our condition is a full characterization of strong minima.
Another result about strong minima of problem (4.1) was established in [29, Proposition 12], where it plays an important role in proving the local linear convergence of Forward-Backward algorithms solving problem (4.1). That result mainly states that the so-called Restricted Injectivity and Nondegeneracy Condition are sufficient for strong minima; see also [24, Proposition 4.27] for a similar observation. Let us recall these important conditions here; see further discussions about them in Section 5. Let $\bar X=U_1\Sigma_rV_1^T$ be a compact SVD and define the model tangent subspace
(4.13)	$\mathbb{T}:=\big\{U_1Z^T+WV_1^T:\ Z\in\mathbb{R}^{n_2\times r},\ W\in\mathbb{R}^{n_1\times r}\big\}$
of $\mathbb{R}^{n_1\times n_2}$ with dimension $r(n_1+n_2-r)$; see, e.g., [15, 16]. The Restricted Injectivity condition means
(4.14)	$\operatorname{Ker}\Phi\cap\mathbb{T}=\{0\}.$
And the Nondegeneracy Condition holds when
(4.15)	$\bar Y=-\frac{1}{\mu}\Phi^*\nabla f(\Phi\bar X)\in\operatorname{ri}\partial\|\bar X\|_*,$
where $\operatorname{ri}\partial\|\bar X\|_*$ is the relative interior of $\partial\|\bar X\|_*$; see [41]. The validity of the Nondegeneracy Condition (4.15) implies that $\bar X$ is an optimal solution of problem (4.1). Note that
(4.16)	$\operatorname{ri}\partial\|\bar X\|_*=\big\{U_1V_1^T+W:\ U_1^TW=0,\ WV_1=0,\ \|W\|<1\big\}.$
Hence, the Nondegeneracy Condition (4.15) means that the number of singular values of $\bar Y$ equal to one, $s(\bar Y)$ in (4.6), is the rank $r$ of $\bar X$. In this case, Restricted Injectivity (4.14) clearly implies the Strong Sufficient Condition (4.12). Hence, the combination of Restricted Injectivity (4.14) and the Nondegeneracy Condition (4.15) is stronger than our Strong Sufficient Condition (4.12). The following result gives a complete picture about strong minima when the Nondegeneracy Condition (4.15) occurs.
Corollary 4.4 (Strong minima under Nondegeneracy Condition).
Proof. As the Nondegeneracy Condition (4.15) holds, $\bar X$ is a solution of problem (4.1) and $s(\bar Y)=r$. In this case, observe from (4.8) and (4.10) that $T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)=\{U_1AV_1^T:\ A\in\mathcal{S}^r\}$. The equivalence between strong minima at $\bar X$ and (4.17) follows from Corollary 3.4.
As the dimension of the subspace $\{U_1AV_1^T:\ A\in\mathcal{S}^r\}$ is $\frac{r(r+1)}{2}$, which is usually small in low-rank optimization problems, it is likely that condition (4.17) holds when the Nondegeneracy Condition (4.15) is satisfied. More discussions about Strict Restricted Injectivity will be given in Section 5.
Although the geometric characterization (4.11) looks simple, checking it in high dimensions is nontrivial. But the Strong Sufficient Condition (4.12) and Strict Restricted Injectivity (4.17) can be verified easily. Next we establish some quantitative characterizations of strong minima. Before doing so, we obtain some projection formulas onto the subspaces $\mathbb{T}$ and $\mathbb{T}^\perp$. For any $W\in\mathbb{R}^{n_1\times n_2}$, suppose that $W$ is represented in block matrices as:
$W=U\begin{pmatrix}W_{11}&W_{12}\\W_{21}&W_{22}\end{pmatrix}V^T\quad\text{with}\quad W_{11}\in\mathbb{R}^{r\times r}.$
The projections of $W$ onto $\mathbb{T}$ and $\mathbb{T}^\perp$ are computed respectively by
(4.19)	$P_{\mathbb{T}}W=U\begin{pmatrix}W_{11}&W_{12}\\W_{21}&0\end{pmatrix}V^T\quad\text{and}\quad P_{\mathbb{T}^\perp}W=U\begin{pmatrix}0&0\\0&W_{22}\end{pmatrix}V^T.$
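The block formulas in (4.19) are equivalent to the standard projector form $P_{\mathbb{T}}W=P_UW+WP_V-P_UWP_V$ and $P_{\mathbb{T}^\perp}W=(I-P_U)W(I-P_V)$ with $P_U=U_1U_1^T$ and $P_V=V_1V_1^T$. A minimal numpy sketch of ours (the function name and interface are our own):

```python
import numpy as np

def tangent_projections(Xbar, r):
    """Build P_T and P_{T^perp} for the model tangent space T of a rank-r matrix Xbar,
    using the projector form equivalent to (4.19)."""
    U, _, Vt = np.linalg.svd(Xbar)
    U1, V1 = U[:, :r], Vt[:r, :].T                   # leading r singular vectors
    PU, PV = U1 @ U1.T, V1 @ V1.T                    # projectors onto col(U1), col(V1)
    P_T = lambda W: PU @ W + W @ PV - PU @ W @ PV    # keeps blocks W11, W12, W21
    P_Tperp = lambda W: W - P_T(W)                   # keeps only the W22 block
    return P_T, P_Tperp
```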
The following result provides a formula for the critical cone of the nuclear norm at $\bar X$ for $\bar Y\in\partial\|\bar X\|_*$.
Proposition 4.5 (Critical cone of nuclear norm).
Proof. For any , it is well-known from convex analysis [41] that
(4.21) |
This together with (4.3) and (4.19) implies that
(4.24) |
with . As , we have if and only if
which means . By Lemma 4.1, we have , or equivalently . The proof is complete.
Next, we establish the main result of this section, which contains a quantitative characterization of strong minima. A similar result for group-sparsity minimization problems was recently established in [24, Theorem 5.3].
Theorem 4.6 (Characterizations for strong minima of low-rank optimization problems).
Suppose that is a minimizer of problem (4.1) and with decomposition (4.2) and (4.8). The following are equivalent:
-
(i)
is a strong solution to problem (4.1).
-
(ii)
with
(4.25) (4.26) -
(iii)
The following conditions (a) and either (b) or (c) are satisfied:
-
(a)
(Strong Restricted Injectivity): .
-
(b)
(Strong Nondegenerate Source Condition): There exists such that and .
-
(c)
(Analysis Strong Source Condition): The Strong Source Coefficient, which is the optimal value of the following spectral norm optimization problem
(4.27)
is smaller than $1$, where the linear operator appearing in (4.27) is constructed as in Remark 4.8.
-
(a)
Proof. Let us verify the equivalence between (i) and (ii). By Corollary 4.2, it suffices to show that
(4.28) |
By Proposition 4.5 and Corollary 4.2, we have
(4.29) |
As , note from (4.20) and (4.24) that
This together with (4.29) verifies (4.28) and also the equivalence between (i) and (ii).
Next, let us verify the implication [(ii)⇒(iii)]. Suppose that (ii) (or (i)) is satisfied. Note from the projection formula (4.19) that . It follows from Corollary 4.2 that the Strong Restricted Injectivity holds.
As , for any , we write and derive from (4.30) that
(4.31) |
Since , we have and thus . It follows that
This together with (4.31) implies that
Define the unit ball with respect to the nuclear norm. We obtain from the latter and the classical minimax theorem [41, Corollary 37.3.2] that
Hence there exists such that . Due to the projection formula (4.19), observe that
Define , we have
Note that , , and . It follows that . Moreover, observe that and . Thus, this matrix satisfies the condition in (b). As (b) and (c) are equivalent, we ensure the implication [(ii)⇒(iii)].
It remains to justify the implication [(iii)⇒(ii)]. Suppose that the Strong Restricted Injectivity (a) and the Strong Nondegenerate Source Condition (b) hold with some element satisfying the condition in (b). Indeed, pick any . As , we have . It follows that
Since , we have , i.e., . This implies that due to the Strong Restricted Injectivity (a). The proof is complete.
Remark 4.7.
The Strong Restricted Injectivity (a) means that the linear operator $\Phi$ is injective on a certain subspace. It is similar to the condition with the same name in [24, Theorem 5.3] that is used to characterize unique (strong) solutions of group-sparsity optimization problems. This condition is adapted from the Restricted Injectivity (4.14) in [25]; see also [14, 15, 16] for the case of nuclear norm minimization problems. The Strong Restricted Injectivity is certainly weaker than the Restricted Injectivity.
The Strong Nondegenerate Source Condition and Analysis Strong Source Condition also inherit the same terminologies introduced in [24, Theorem 5.3] for group-sparsity optimization problems. In [14, 16, 25, 46, 47], the Nondegenerate Source Condition at means the existence of a dual certificate satisfying , which is equivalent to
(4.32) |
This condition is weaker than the Nondegeneracy Condition (4.15). In the case of $\ell_1$ optimization, it is well-known that the Restricted Injectivity and Nondegenerate Source Condition together characterize solution uniqueness [25]. For nuclear norm minimization problems, [15, 16] showed that they are sufficient for solution uniqueness of problems (1.1) and (4.1); see also [46, 47] for more general convex optimization problems. It is worth noting that they are not necessary conditions for solution uniqueness; see, e.g., [24, Example 4.15]. One hidden reason is that the nuclear norm is not polyhedral. Recently, [24] showed that these two conditions characterize the so-called sharp minima of (5.1), which sits between solution uniqueness and strong minima; see Remark 5.5 and (5.18) for further discussion. Due to [24, Proposition 4.27], they are sufficient for strong solutions of problem (4.1). This fact can be obtained from Theorem 4.6 by observing that our Strong Nondegenerate Source Condition is weaker than the Nondegenerate Source Condition, since the Strong Nondegenerate Source Condition involving the set means
(4.33) |
due to (4.16).
Remark 4.8 (Checking Strong Restricted Injectivity, Strong Sufficient Condition (4.12), and constructing the linear operator ).
To check the Strong Restricted Injectivity, observe first that
which is a subspace of with dimension . Moreover, the restriction of on
(4.34) |
is also a subspace of with dimension . The set in the Strong Sufficient Condition (4.12) is another subspace of with dimension . Suppose that form a basis of , form a basis of , and form a basis of . For any , we write and obtain that
(4.35) |
Define the matrix whose columns are the vectorized images under $\Phi$ of these basis matrices, together with the submatrices corresponding to each basis. By (4.35), the Strong Restricted Injectivity is equivalent to the corresponding submatrix having full column rank; similarly, the Strong Sufficient Condition (4.12) amounts to a full-column-rank condition. See the numerical sketch following this remark.
Next let us discuss how to construct the linear operator. Let be an SVD of with . Define to be the submatrix of where . Note that
Determine the linear operator by
(4.36) |
It is easy to check that .
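Both rank tests in this remark can be run numerically. A schematic numpy sketch of ours (the function name, the callable interface for the measurement map, and the tolerance are our assumptions):

```python
import numpy as np

def injective_on_subspace(Phi, basis, tol=1e-10):
    """Check Ker(Phi) ∩ span(basis) = {0} for a linear map Phi acting on matrices.
    Phi: callable sending an n1 x n2 matrix to a vector in R^m.
    basis: list of n1 x n2 matrices spanning the subspace to be tested."""
    Xi = np.column_stack([np.asarray(Phi(B)).ravel() for B in basis])
    # Full column rank of Xi is exactly the injectivity of Phi on the subspace.
    return np.linalg.matrix_rank(Xi, tol) == len(basis)
```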
Remark 4.9 (Checking the Analysis Strong Source Condition without using the linear operator ).
To verify the Analysis Strong Source Condition, Theorem 4.6 suggests solving the optimization problem (4.27), which involves the linear operator above. To avoid its computation, note first that the constraint in (4.27) means Suppose that is the linear operator with , i.e., ; the latter condition is equivalent to
where is computed by
(4.37) |
which is a subspace of with dimension ; here is the set of all skew-symmetric matrices. Hence problem (4.27) is equivalent to the following one
(4.38) |
The price to pay is that the size of this optimization problem is bigger than that of (4.27). But finding this linear operator is much easier.
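Problems such as (4.27) and (4.38) minimize a spectral norm over an affine family and are directly solvable by SDP-capable solvers. A schematic cvxpy sketch of ours (the inputs Y0 and directions, which encode the affine constraint set, are our assumptions rather than the paper's notation):

```python
import cvxpy as cp

def spectral_norm_coefficient(Y0, directions):
    """Minimize || Y0 + sum_i z_i D_i ||_2 over z, a generic form covering (4.27)/(4.38).
    Y0: fixed matrix; directions: list of matrices spanning the feasible perturbations.
    The optimal value is compared against 1 to decide the source condition."""
    z = cp.Variable(len(directions))
    expr = Y0 + sum(z[i] * D for i, D in enumerate(directions))
    problem = cp.Problem(cp.Minimize(cp.sigma_max(expr)))
    problem.solve()
    return problem.value
```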
Corollary 4.10 (Quantitative sufficient condition for strong solution).
Suppose that is a minimizer of problem (4.1). Then is a strong solution provided that the Strong Restricted Injectivity holds and
(4.39) |
where is the linear operator satisfying
Proof. Suppose that Strong Restricted Injectivity holds at the minimizer . We consider the following linear equation:
(4.40) |
As , we have
It follows that is a solution to (4.40). Another solution of (4.40) is
(4.41) |
where is the Moore-Penrose generalized inverse of . Next we claim that is a bijective mapping. Indeed, suppose that for some , we have
It follows that , which implies . Thus is injective in . As the operator is self-dual, it is bijective. We obtain that
By the decomposition (4.19), we may write
Observe from (4.27) that
If (4.39) holds, it follows that the Strong Source Coefficient is smaller than $1$, so we have a strong solution due to Theorem 4.6.
5 Characterizations for strong minima of nuclear norm minimization problems
In this section, let us consider the nuclear norm minimization problem (1.1):
(5.1)	$\min_{X\in\mathbb{R}^{n_1\times n_2}}\ \|X\|_*\quad\text{subject to}\quad\Phi X=b,$
where $\Phi:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^m$ is a linear operator ($m<n_1n_2$) and $b\in\mathbb{R}^m$ is a known observation. This is a particular case of problem (3.20) with $g=\|\cdot\|_*$ and $K=\{b\}$. Note that $\bar X$ is a solution of this problem if and only if $\Phi\bar X=b$ and
$\partial\|\bar X\|_*\cap\operatorname{Im}\Phi^*\neq\emptyset,$
which is equivalent to the existence of a dual certificate $Y\in\partial\|\bar X\|_*\cap\operatorname{Im}\Phi^*$. Define
$\mathcal{Y}(\bar X):=\partial\|\bar X\|_*\cap\operatorname{Im}\Phi^*$
to be the set of all dual certificates.
Lemma 5.1.
Let be a nonempty closed set of . Suppose that and are orthogonal matrices satisfying . Then we have
(5.2) |
Proof. Define and . It is easy to check that and are orthogonal matrices. We have
(5.3) |
Define the mapping by for all . Note that is an isometry with the spectral metric for all . Indeed, we have
Note further that for all . It follows from (5.3) that
Since is a compact metric space and is an isometry, it is well-known that
Hence we have
This verifies (5.2) and completes the proof of the lemma.
For any $Y\in\partial\|\bar X\|_*$, recall from (4.6) that $s(Y)$ is the number of singular values of $Y$ that are equal to $1$. It follows from Lemma 4.1 that $s(Y)\ge\operatorname{rank}\bar X$. Define the following constant
(5.4)	$\bar s:=\min\big\{s(Y):\ Y\in\mathcal{Y}(\bar X)\big\}.$
When $\bar X$ is a minimizer of problem (5.1), $\bar s$ is well-defined and bigger than or equal to $\operatorname{rank}\bar X$. The following theorem is the main result in this section.
Theorem 5.2 (Characterizations for strong minima of nuclear norm minimization problems).
Suppose that $\bar X$ is an optimal solution of problem (5.1). The following are equivalent:
(i) $\bar X$ is a strong solution of problem (5.1).
(ii) The following equality holds
(5.5)	$\operatorname{Ker}\Phi\cap\bigcap_{Y\in\mathcal{Y}(\bar X)}T_{(\partial\|\cdot\|_*)^{-1}(Y)}(\bar X)=\{0\}.$
(iii) There exists a dual certificate $\bar Y\in\mathcal{Y}(\bar X)$ such that
(5.6)	$\operatorname{Ker}\Phi\cap T_{(\partial\|\cdot\|_*)^{-1}(\bar Y)}(\bar X)=\{0\}.$
(iv) For any $Y\in\mathcal{Y}(\bar X)$ satisfying $s(Y)=\bar s$, condition (5.6) is satisfied.
Proof. Note that the nuclear norm function is second order regular at any point [18] and it also satisfies the quadratic growth condition, as $\partial\|\cdot\|_*$ is metrically subregular at any $X$ for any $Y\in\partial\|X\|_*$; see [50]. By applying Lemma 4.1 and Theorem 3.5 with $g=\|\cdot\|_*$, $K=\{b\}$, and $\Phi$ as above, we have that $\bar X$ is a strong solution if and only if
(5.7) |
where is the critical cone (3.23) computed by
By Proposition 4.5 and formula (4.10), we note that if for some then , i.e., . Thus and that
This together with (5.7) verifies the equivalence between (i) and (ii).
The implications [(iii)⇒(ii)] and [(iv)⇒(ii)] are trivial. To justify the converse implications, we first claim the existence of some dual certificate $\bar Y\in\mathcal{Y}(\bar X)$ such that $s(\bar Y)=\bar s$ and
(5.8) |
We prove this by using a popular version of Zorn's lemma [11, Proposition 5.9] for directed partially ordered sets. Consider the following partially ordered set (poset)
(5.9) |
with the partial ordering between sets in . Take into account any downward chain
(5.10) |
for a sequence . We can find a subsequence of such that they have the same rank . According to the computation (4.10), there exist orthogonal matrices such that
It follows that
(5.11) |
Since is closed, we obtain from Lemma 5.1 that
Hence the chain (5.10) is bounded below. This means that every downward chain of the poset has a minimum in it.
Let us show next that the poset is directed downward with the partial ordering in the sense that for any two elements and of with , there exists such that
(5.12) |
Indeed, we choose . For any , we obtain from (4.10) that . Hence, there exists and such that
which implies that
Since , we obtain from (3.9) and Lemma 4.1 that
This clearly verifies the directedness condition (5.12). By [11, Proposition 5.9], the downward directed poset has a minimum, in the sense that there exists $\bar Y$ such that (5.8) is valid. The implication [(ii)⇒(iii)] follows from (5.8).
Next, let us show that and
(5.13) |
for any with . Indeed, picking any satisfying the latter condition, we obtain from (5.8) that
(5.14) |
Due to the computation (4.10), the maximum ranks of matrices in and are and , respectively. It follows that , which verifies that . Moreover, due to (4.10) and (5.14), we get (5.13) from Lemma 5.1. Combining (5.13) and (5.8) ensures the implication [(ii)⇒(iv)]. The proof is complete.
Remark 5.3.
A sufficient condition for strong minima of the nuclear norm minimization problem (5.1) can be obtained from [18, Theorem 12]. However, their condition has the format of a minimax problem: for any element in the critical cone, there exists some Lagrange multiplier such that a certain second order sufficient condition is satisfied, i.e., the Lagrange multiplier used in the sufficient condition depends on the choice of the element in the critical cone. This is a typical situation; see (2.23) for instance. In our characterizations of strong minima in parts (iii) and (iv) of the above theorem, the existence of the dual certificate is independent of the elements of the critical cone. Condition (iii) is close to the maximin situation. Moreover, (iv) gives us extra information about these dual certificates: they should have the minimum number of singular values that are equal to 1.
Similarly to Theorem 4.6, condition (5.6) is equivalent to the combination of the Strong Restricted Injectivity and the Strong Nondegenerate Source Condition. Let us recall the model tangent subspace (4.13) at $\bar X$ here:
$\mathbb{T}=\big\{U_1Z^T+WV_1^T:\ Z\in\mathbb{R}^{n_2\times r},\ W\in\mathbb{R}^{n_1\times r}\big\},$
where $\bar X=U_1\Sigma_rV_1^T$ is a compact SVD of $\bar X$.
Corollary 5.4 (Strong Restricted Injectivity and Strong Nondegenerate Source Condition for strong minima ).
Remark 5.5 (Sharp minima vs Strong minima).
The set is only dependent on . Indeed, it follows from (5.8) and (4.10) that
which is a subspace of . As discussed after Theorem 4.6, the Strong Restricted Injectivity is weaker than the Restricted Injectivity
(5.16)	$\operatorname{Ker}\Phi\cap\mathbb{T}=\{0\}.$
The Strong Nondegenerate Source Condition is also weaker than the Nondegenerate Source Condition:
(5.17)	$\operatorname{Im}\Phi^*\cap\operatorname{ri}\partial\|\bar X\|_*\neq\emptyset.$
This condition means that $\bar s=\operatorname{rank}\bar X$. The Nondegenerate Source Condition together with the Restricted Injectivity is used in [15, 16] as sufficient conditions for solution uniqueness of problem (5.1) at $X_0$. These two conditions were recently shown to be equivalent to the stronger property of sharp minima at $X_0$ in [24, Theorem 4.6], in the sense that there exists some $c>0$ such that
(5.18)	$\|X\|_*-\|X_0\|_*\ge c\|X-X_0\|\quad\text{for all}\quad X\ \text{with}\ \Phi X=\Phi X_0.$
Our Strong Restricted Injectivity and Strong Nondegenerate Source Condition are characterizations of a weaker property, strong minima for problem (5.1). Of course, they can also serve as sufficient conditions for solution uniqueness at $X_0$; see [28] for some recent characterizations of this property.
In order to check the Nondegenerate Source Condition (5.17), one has to show that the Source Coefficient, the optimal value of the following optimization problem
(5.19)	$\min\big\{\|P_{\mathbb{T}^\perp}Y\|:\ Y\in\operatorname{Im}\Phi^*,\ P_{\mathbb{T}}Y=U_1V_1^T\big\},$
is smaller than $1$; see, e.g., [24, Remark 4.5]. When the Restricted Injectivity (5.16) holds, an upper bound for the Source Coefficient is
(5.20) |
Hence the condition that this bound is smaller than $1$ is sufficient for sharp minima; see, e.g., [24, Corollary 4.8]. This condition is known as the Analysis Exact Recovery Condition in [38] for the case of $\ell_1$ optimization. Another independent condition that is also used to check solution uniqueness of problem (5.1) is the so-called Irrepresentability Criterion [15, 16, 45]:
(5.21) |
Note that the quantity in (5.21) bounds the Source Coefficient from above. Thus its being smaller than $1$ also implies that $X_0$ is a sharp solution of problem (5.1).
Remark 5.6 (Descent cone vs tangent cone).
Sharp minima and strong minima of problem (5.1) are sufficient for solution uniqueness, which is a significant property for exact recovery [1, 13, 16]. An important geometric structure used to study solution uniqueness is the descent cone [13] at $X_0$ defined by
(5.22)	$\mathcal{D}_{\|\cdot\|_*}(X_0):=\big\{W\in\mathbb{R}^{n_1\times n_2}:\ \exists\,t>0\ \text{such that}\ \|X_0+tW\|_*\le\|X_0\|_*\big\}.$
Indeed, [13] shows that $X_0$ is the unique solution of problem (5.1) if and only if
(5.23)	$\operatorname{Ker}\Phi\cap\mathcal{D}_{\|\cdot\|_*}(X_0)=\{0\}.$
Unlike the tangent cones in (5.5) or (5.6), the descent cone may not be closed. Although the direct connection between the descent cone and those tangent cones is not clear, we claim that
(5.24)	$\operatorname{Ker}\Phi\cap\mathcal{D}_{\|\cdot\|_*}(\bar X)\subseteq\operatorname{Ker}\Phi\cap\bigcap_{Y\in\mathcal{Y}(\bar X)}T_{(\partial\|\cdot\|_*)^{-1}(Y)}(\bar X)$
when $\bar X$ is a minimizer of problem (5.1), where $\mathcal{Y}(\bar X)$ is the set of dual certificates at $\bar X$. Indeed, for any $W$ in the left-hand side of (5.24), there exists $t>0$ such that $\|\bar X+tW\|_*\le\|\bar X\|_*$. As $\Phi W=0$, we have $\Phi(\bar X+tW)=b$. Pick any $Y\in\mathcal{Y}(\bar X)$. In view of (4.4) and the definition of $\mathcal{Y}(\bar X)$, it follows that
$\langle Y,\bar X+tW\rangle=\langle Y,\bar X\rangle=\|\bar X\|_*\ge\|\bar X+tW\|_*\ge\langle Y,\bar X+tW\rangle,$
which implies that $\bar X+tW\in(\partial\|\cdot\|_*)^{-1}(Y)$ due to (4.4)-(4.5) and thus $W\in T_{(\partial\|\cdot\|_*)^{-1}(Y)}(\bar X)$. This verifies inclusion (5.24). Inclusion (5.24) also tells us that condition (5.5) is sufficient for (5.23). This observation is not a surprise in the sense of Theorem 5.2, as strong minima obviously implies solution uniqueness. But solution uniqueness of problem (5.1) does not imply strong minima; see [24, Example 5.11]. Hence, inclusion (5.24) is strict.
Similarly to Corollary 4.4, the following result reveals the role of Strict Restricted Injectivity in strong minima.
Corollary 5.7 (Strict Restricted Injectivity for strong minima of problem (5.1)).
Suppose that $\bar X=U_1\Sigma_rV_1^T$ is a compact SVD of $\bar X$ with $r=\operatorname{rank}\bar X$. If $\bar X$ is a strong solution, then the following Strict Restricted Injectivity is satisfied:
(5.25)	$\operatorname{Ker}\Phi\cap\big\{U_1AV_1^T:\ A\in\mathcal{S}^r\big\}=\{0\}.$
This condition is also sufficient for a strong solution at $\bar X$ provided that the Nondegenerate Source Condition (5.17) holds at $\bar X$.
Proof. Note from (4.10) that
$\big\{U_1AV_1^T:\ A\in\mathcal{S}^r\big\}\subseteq T_{(\partial\|\cdot\|_*)^{-1}(Y)}(\bar X)\quad\text{for any}\quad Y\in\mathcal{Y}(\bar X).$
If $\bar X$ is a strong solution, combining (5.5) with the latter inclusion verifies (5.25).
If the Nondegenerate Source Condition (5.17) holds at $\bar X$, there exists $Y\in\operatorname{Im}\Phi^*\cap\operatorname{ri}\partial\|\bar X\|_*$. Hence $Y$ is a dual certificate, i.e., $\bar X$ is a solution of problem (5.1). It follows from Lemma 4.1 that $s(Y)=r$ and from (4.10) that
$T_{(\partial\|\cdot\|_*)^{-1}(Y)}(\bar X)=\big\{U_1AV_1^T:\ A\in\mathcal{S}^r\big\}.$
Hence the validity of (5.25) implies that $\bar X$ is a strong solution of problem (5.1) due to the equivalence between (i) and (iii) in Theorem 5.2.
Suppose that $\bar X=U\Sigma(\bar X)V^T$ is a full SVD of $\bar X$. The model tangent space $\mathbb{T}$ at $\bar X$ can be represented by
(5.26)	$\mathbb{T}=\Big\{U\begin{pmatrix}W_{11}&W_{12}\\W_{21}&0\end{pmatrix}V^T:\ W_{11}\in\mathbb{R}^{r\times r},\ W_{12}\in\mathbb{R}^{r\times(n_2-r)},\ W_{21}\in\mathbb{R}^{(n_1-r)\times r}\Big\}.$
It has dimension $r(n_1+n_2-r)$. When the Restricted Injectivity (5.16) holds, we have $m\ge r(n_1+n_2-r)$. Similarly, as the dimension of $\{U_1AV_1^T:\ A\in\mathcal{S}^r\}$ is $\frac{r(r+1)}{2}$, it is necessary for the Strict Restricted Injectivity (5.25) that $m\ge\frac{r(r+1)}{2}$. Next we show that this bound for $m$ is tight.
Corollary 5.8 (Minimum bound for strong exact recovery).
Suppose that $X_0$ is an $n_1\times n_2$ matrix with rank $r$. Then one needs at least $\frac{r(r+1)}{2}$ measurements so that solving the nuclear norm minimization problem (5.1) recovers $X_0$ as a strong solution.
Moreover, there exist infinitely many linear operators $\Phi:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^{r(r+1)/2}$ such that $X_0$ is a strong solution of problem (5.1).
Proof. Suppose that $X_0=U_1\Sigma_rV_1^T$ is a compact SVD of $X_0$. Let $B_1,\ldots,B_q$ with $q=\frac{r(r+1)}{2}$ be any basis of the subspace $\{U_1AV_1^T:\ A\in\mathcal{S}^r\}$. If $X_0$ is a strong solution of problem (5.1), the Strict Restricted Injectivity (5.25) holds by Corollary 5.7. It follows that $\Phi B_1,\ldots,\Phi B_q$ are linearly independent. Hence, we have $m\ge q=\frac{r(r+1)}{2}$, which verifies the first part.
To justify the second part, we construct the linear operator $\Phi:\mathbb{R}^{n_1\times n_2}\to\mathbb{R}^q$ as follows:
(5.27)	$\Phi(X):=\big(\langle B_1,X\rangle,\ldots,\langle B_q,X\rangle\big)\quad\text{for}\quad X\in\mathbb{R}^{n_1\times n_2}.$
Note that $\operatorname{Im}\Phi^*=\operatorname{span}\{B_1,\ldots,B_q\}=\{U_1AV_1^T:\ A\in\mathcal{S}^r\}\ni U_1V_1^T$. It follows that $U_1V_1^T$ is a dual certificate that satisfies the Nondegenerate Source Condition (5.17). As
$\operatorname{Ker}\Phi=\operatorname{span}\{B_1,\ldots,B_q\}^\perp,$
we have $\operatorname{Ker}\Phi\cap\{U_1AV_1^T:\ A\in\mathcal{S}^r\}=\{0\}$. Hence the Strict Restricted Injectivity (5.25) holds. By Corollary 5.7 again, $X_0$ is a strong solution of problem (5.1).
Remark 5.9 (Lower bounds on the number of measurements for exact recovery).
In the theory of exact recovery, [13] shows that with random Gaussian measurements it is sufficient to recover $X_0$ exactly with high probability by solving problem (5.1) when the number of measurements is proportional to $r(n_1+n_2-r)$; see also [16] for a similar result with a different approach. Also in [13, Propositions 4.5 and 4.6], a lower bound on the number of measurements is discussed for exact recovery in atomic norm minimization via the descent cone (5.22) and Terracini's Lemma [26]. The latter is used to get an estimate of the dimension of a subspace component of the descent cone; see also [26, Proposition 12.2] for the case of the nuclear norm. That bound holds for any linear measurement scheme. Our lower bound $\frac{r(r+1)}{2}$ is much smaller and only depends on the rank of $X_0$, but it is achieved only for special operators $\Phi$.
Example 5.10.
Let us consider the following nuclear norm minimization problem
(5.28) |
which is a matrix completion problem. Set , we have
(5.29) |
Moreover, note that
(5.30) |
Note that . Hence is a solution of problem (5.28). However, , i.e., Restricted Injectivity (5.16) fails. Thus is not a sharp solution of problem (5.28). Note that and with . We have from (4.10) that
It is clear that . This shows that is a strong solution by Theorem 5.2.
Example 5.11 (Checking strong minima numerically).
Let us consider the following matrix completion problem
(5.31) |
where is the projection mapping defined by
Define
Note that are orthogonal matrices, , is an SVD of , and . To check whether this is a sharp solution of problem (5.31), we compute the constant in (5.20) or the Source Coefficient from problem (5.19). In this case the linear operator in (5.19) is chosen as . With some linear algebra, we calculate with . Moreover, by using the cvx package to solve the spectral norm optimization problem (5.19), the Source Coefficient is exactly $1$ and gives us a corresponding solution . Thus this is not a sharp solution of problem (5.31).
However, note further that is a dual certificate, which is computed by
Let us check the condition
(5.32) |
in (5.6). It follows from (4.3) that
The SVD of the submatrix is certainly
According to (4.8), with
By formula (4.10),
It follows that
To verify (5.32), we compute , which is the optimal solution of problem (4.38)
This is a convex optimization problem. By using cvx to solve it, we obtain that the optimal value is smaller than $1$. Hence this is a strong solution of problem (5.31).
The idea of checking strong solutions numerically by using Theorem 5.2 in the above example will be taken up again in Section 6 for nuclear norm minimization problems of larger sizes.
In Corollary 5.12 below, we will show that a big class of nuclear norm minimization problems satisfies both the Strict Restricted Injectivity (5.25) and the Nondegenerate Source Condition (5.17), but not the Restricted Injectivity (5.16).
Now let us consider a special case of problem (5.1):
(5.33)	$\min_{X\in\mathbb{R}^{n_1\times n_2}}\ \|X\|_*\quad\text{subject to}\quad AX=B,$
where $A$ and $B$ are known $m\times n_1$ and $m\times n_2$ matrices, respectively. This is usually referred to as the low-rank representation problem [33]. It is well-known that the optimal solution to problem (5.33) is unique and determined by $A^\dagger B$, where $A^\dagger$ is the Moore-Penrose generalized inverse of $A$. In the following result, we advance this by showing that this unique solution is indeed a strong solution, but not necessarily a sharp one. Indeed, for this class, the Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17) are satisfied, but the Restricted Injectivity (5.16) is not.
Corollary 5.12 (Strong minima of low-rank representation problems).
Let $A$ be an $m\times n_1$ matrix. If the linear system $AX=B$ is consistent, then the Strict Restricted Injectivity (5.25) and Nondegenerate Source Condition (5.17) hold at $\bar X:=A^\dagger B$ in problem (5.33).
Consequently, $A^\dagger B$ is the strong solution of the low-rank representation problem (5.33).
Proof. Suppose that is a compact SVD of the matrix . Thus and note that . Let be a compact SVD of with . Note that
It follows that is a compact SVD of . By Lemma 4.1, we have . Observe further that
which implies that . This verifies Nondegenerate Source Condition (5.17) and shows that is an optimal solution of problem (5.33).
Next, let us check Strict Restricted Injectivity (5.25). For any with , we find some such that . We have
It follows that
which also implies that . This verifies Strict Restricted Injectivity (5.25).
In the following simple example, we show that the unique solution to (5.33) may fail to be a sharp solution in the sense of (5.18).
Example 5.13 (Unique solutions of low-rank representation problems are not sharp).
Consider the following optimization problem
(5.34)
As and , the unique solution to problem (5.34) is
Pick any and note that satisfies the linear constraint in (5.34). We have
It follows that
This shows that is not a sharp solution in the sense of (5.18). The underlying reason is that Restricted Injectivity (5.16) fails for this problem.
A problem closely related to (5.33) is
(5.35)
where the function satisfies the standing assumptions in Section 4 with open domain and is a positive number. This is a particular case of (4.1). Next we show that strong minima also occur in this problem.
Corollary 5.14.
Problem (5.35) has a unique and strong solution.
Proof. It is easy to see that problem (5.35) has at least one optimal solution . Let be a compact SVD of and define
Let be a compact SVD of with . As , it follows that is a compact SVD of . By (4.5), we can find such that , which means
Next let us estimate . For any , we find sequences and satisfying by Lemma 4.1 again. It follows that
which implies that . We claim next that . Indeed, taking any with , we find such that and
Hence we have
This implies that and verifies the claim. By Corollary 4.2, is the strong solution of problem (5.35).
6 Numerical Experiments
In this section, we perform numerical experiments to demonstrate strong minima, sharp minima, and solution uniqueness for the nuclear norm minimization problem (1.1). The experiments were conducted for different matrix ranks and numbers of measurements for . Throughout the section, we also discuss how to use our conditions to check strong minima for problem (1.1).
6.1 Experiment 1
In the first experiment, we generate , an matrix of rank , by sampling two factors and with independent and identically distributed (i.i.d.) random entries and setting . We vectorize problem (1.1) in the following form:
(6.1)
where is drawn from the standard Gaussian ensemble, i.e., its entries are i.i.d. from a zero-mean, unit-variance Gaussian distribution. We declare to be recovered (a solution to (6.1)) if , as proposed in [15]; a cvxpy sketch of this generation-and-recovery step is given below. To check sharp minima, we verify Restricted Injectivity (5.16) and compute (5.20) or the Source Coefficient (5.19); see [24] or Remark 5.5. To verify strong minima at , we use the Strong Source Coefficient in (4.27) or (4.38).
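The sketch below illustrates the generation-and-recovery step in cvxpy; the problem sizes, the random seed, and the recovery tolerance `tol` are illustrative placeholders rather than the values used in our experiments.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n1, n2, r, m = 10, 10, 2, 60             # illustrative sizes only
L_ = rng.standard_normal((n1, r))
R_ = rng.standard_normal((n2, r))
X0 = L_ @ R_.T                           # ground-truth rank-r matrix

Phi = rng.standard_normal((m, n1 * n2))  # standard Gaussian measurement ensemble
y = Phi @ X0.flatten(order="F")          # observations of vec(X0)

X = cp.Variable((n1, n2))
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [Phi @ cp.vec(X, order="F") == y])  # `order` needs cvxpy >= 1.4
prob.solve()

tol = 1e-5                               # placeholder recovery tolerance
recovered = np.linalg.norm(X.value - X0, "fro") <= tol * np.linalg.norm(X0, "fro")
print(recovered)
```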
Specifically, let be a full SVD of . We denote by and the th and th columns of and , respectively, for . Note from the formula of the tangent model space in (5.26) that forms a basis for . Thus, Restricted Injectivity holds if , where is a matrix whose columns are all vectors from the basis .
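In code, this rank test is a one-liner once the basis matrix is assembled; in the sketch below, `basis_T` stands for the matrix whose columns are the vectorized basis elements of the tangent model space, which we do not reconstruct here.

```python
import numpy as np

def restricted_injectivity(Phi, basis_T):
    """Check whether the measurement operator is injective on the tangent model space.

    Phi     : (m, n1*n2) measurement matrix
    basis_T : (n1*n2, dim_T) matrix whose columns form a basis of the vectorized tangent space
    """
    dim_T = basis_T.shape[1]
    # Phi restricted to the tangent space is injective iff Phi @ basis_T has full column rank.
    return np.linalg.matrix_rank(Phi @ basis_T) == dim_T
```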
To compute , we assume that is an SVD of and denote by the matrix whose columns are the last columns of . We then solve the following vectorized problem for the optimal solution by using the cvxpy package and compute :
(6.2)
where and is given by
(6.3)
To calculate , we solve the following vectorized problem of (5.19) for the optimal value by using the cvxpy package:
(6.4)
where and are determined in (6.2) and (6.3), respectively. is a sharp solution of problem (6.1) if either or is smaller than ; see Remark 5.5. To allow for small numerical errors, we classify a case as sharp minima if is recovered and either or .
To classify strong minima, we consider the cases when is recovered, and and . Let be an optimal solution of problem (6.4), expressed in the following form:
(6.5)
with some . Note that is a dual certificate of . According to Theorem 5.2, is a strong solution provided that
(6.6)
By Theorem 4.6, this condition holds when Restricted Injectivity is satisfied and the Strong Source Coefficient , the optimal value of problem (4.27) or (4.38), is smaller than . Let be an SVD of . We write and , where and are the first columns of and , respectively. Defining and , it follows from (4.8) that . To compute , we solve the vectorized problem of (4.38):
(6.7)
where is determined in (6.3), , and is taken from (4.37):
(6.8)
We classify strong (non-sharp) minima if .
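The decision rules above can be collected into a small routine; in the sketch below, the threshold 1 and the numerical slack `tol` reflect our reading of (5.19), (5.20), and (4.38), the variable names are ours, and the exact tolerances are placeholders.

```python
def classify(recovered, zeta, sc, ssc, tol=1e-4):
    """Classify one random trial as sharp or strong (non-sharp) recovery.

    recovered : bool, whether the ground truth was recovered from the measurements
    zeta      : the constant from (5.20)
    sc        : the Source Coefficient, optimal value of (6.4)
    ssc       : the Strong Source Coefficient, optimal value of (6.7)
    tol       : placeholder numerical slack
    """
    sharp = recovered and (zeta < 1 - tol or sc < 1 - tol)
    strong_not_sharp = recovered and not sharp and ssc < 1 - tol
    return sharp, strong_not_sharp
```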
To illustrate the occurrence of strong minima, sharp minima, and solution uniqueness for problem (6.1), we plot in Figure 1 the proportion of each situation with respect to the number of measurements. For each fixed and , at any number of measurements we study 100 random cases and record the percentage of cases that are recovered, sharply recovered, and strongly (not sharply) recovered in black, blue, and red curves, respectively. Observe that the percentage of cases where is a strong (not sharp) solution peaks at approximately 40% and exceeds the percentage of sharp minima when the number of measurements is not large enough. This phenomenon occurs at different numbers of measurements for different ranks, indicating a significant number of cases with strong (not sharp) solutions. Additionally, higher ranks require more measurements to achieve the highest percentage of cases with strong (not sharp) solutions.
We also plot the average values of , IC() (Irrepresentability Criterion (5.21)), , and for each number of measurements in Figure 2 for different ranks. Using to check sharp minima appears to capture more cases than using . Moreover, is significantly smaller than both and , while IC() is slightly greater than .
6.2 Experiment 2
In the second experiment, we study the following matrix completion problem:
(6.9)
following a process similar to the first experiment; a cvxpy sketch of this step is given below. We again generate , an matrix of rank , by sampling two factors and with i.i.d. random entries and setting . However, this time we sample an index subset of observed entries uniformly at random from . The cvxpy package is again used to solve problem (6.9), producing an optimal solution . is said to be recovered if . To classify sharp minima, we check Restricted Injectivity (5.16) (as done in the first experiment), compute in (5.20) or the Source Coefficient in (5.19), and require or .
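The sketch below illustrates this completion experiment in cvxpy; the sizes, seed, and recovery tolerance are illustrative placeholders, and the 0/1 mask `M` encodes the sampled index set of observed entries.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r, m = 10, 2, 55                               # illustrative sizes only
X0 = rng.standard_normal((n, r)) @ rng.standard_normal((n, r)).T  # rank-r ground truth

idx = rng.choice(n * n, size=m, replace=False)    # m observed entries, sampled uniformly
M = np.zeros((n, n))
M.flat[idx] = 1.0                                 # 0/1 mask encoding the index set

X = cp.Variable((n, n))
# Fix the observed entries of X to those of X0 and leave the rest free.
prob = cp.Problem(cp.Minimize(cp.normNuc(X)),
                  [cp.multiply(M, X) == M * X0])
prob.solve()

recovered = np.linalg.norm(X.value - X0, "fro") <= 1e-5 * np.linalg.norm(X0, "fro")
print(recovered)
```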
Specifically, to calculate , we proceed as follows. As in the first experiment, denote by and the th and th columns of and , where is a full SVD of . We define for all , where is the projection mapping defined by if and 0 otherwise. Then we solve the following linear system for :
It is not difficult to obtain from (5.20) that
where and .
To compute , we transform problem (5.19) to the case of matrix completion as follows:
(6.10)
and solve it by using the cvxpy package for the optimal value , where is determined in (6.3).
Similarly to (6.5), we let be an optimal solution of problem (6.10) and denote by a dual certificate of . To check whether is a strong solution, we only need to verify (6.6) with , by Theorem 5.2. According to Theorem 4.6, the latter holds under Restricted Injectivity and , where is the optimal value of the following problem, a version of (4.38) for matrix completion:
(6.11)
Here and are defined in (6.3) and (6.8). We classify a case as strong minima (strong recovery) if is recovered and , but and . In Figure 3, we plot the proportion of cases in which is a unique solution, a sharp solution, and a strong (not sharp) solution in relation to the number of measurements. Additionally, Figure 4 displays the curves of the average values of , , and with respect to the number of measurements.
Based on Figure 3, we can see that the highest percentage of cases where is a strong solution (but not a sharp one) is only around 15%, much smaller than the 40% in Experiment 1. A possible reason is the special structure of the linear operator in (6.9): each row contains a single entry equal to 1, the remaining entries are 0, and these unit entries must lie in distinct columns. As in Experiment 1, higher ranks require more measurements to achieve the highest percentage of cases with strong (not sharp) solutions.
As shown in Figure 4, the average values of , , and vary with the number of measurements. The difference between the curves of and is less pronounced than in Experiment 1, but the curve of is still significantly lower than both.
References
- [1] D. Amelunxen, M. Lotz, M. B. McCoy, and J. A. Tropp: Living on the edge: A geometric theory of phase transitions in convex optimization, IMA Inf. Inference, 3 (2014), 224–294.
- [2] F. J. Aragón Artacho and M. H. Geoffroy: Characterizations of metric regularity of subdifferentials, J. Convex Anal., 15 (2008), 365–380.
- [3] B. P. W. Ames and S. A. Vavasis: Nuclear norm minimization for the planted clique and biclique problems, Math. Program., 129 (2011), 69–89.
- [4] A. Ben-Tal: Second order and related extremality conditions in nonlinear programming, J. Optim. Theory Appl., 31 (1980), 143–165.
- [5] A. Ben-Tal and J. Zowe: A unified theory of first- and second order conditions for extremum problems in topological vector spaces, Math. Program., 19 (1982), 39–76.
- [6] J. F. Bonnans and A. D. Ioffe: Second order sufficiency and quadratic growth for non-isolated minima, Math. Oper. Res., 20 (1995), 801–817.
- [7] J. F. Bonnans, R. Cominetti, and A. Shapiro: Second order optimality conditions based on parabolic second order tangent sets, SIAM J. Optim., 9 (1999), 466–492.
- [8] J. F. Bonnans and A. Shapiro: Perturbation Analysis of Optimization Problems, Springer, 2000.
- [9] J. Bolte, T. P. Nguyen, J. Peypouquet, B. W. Suter: From error bounds to the complexity of first-order descent methods for convex functions, Math. Program., 165 (2017), 471–507.
- [10] J. Burke: Second order necessary and sufficient conditions for convex composite NDO, Math. Program., 38 (1987), 287–302.
- [11] P. M. Cohn: Universal algebra, Springer Science & Business Media, 2012.
- [12] L. Cromme: Strong uniqueness. A far-reaching criterion for the convergence analysis of iterative procedures, Numer. Math., 29(2) (1977/78), 179–193.
- [13] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky: The convex geometry of linear inverse problems, Found Comput Math, 12 (2012), 805–849.
- [14] E. Candès and Y. Plan: Matrix completion with noise, Proceeding of the IEEE, 98 (2010), 925–936.
- [15] E. Candès and B. Recht: Exact matrix completion via convex optimization, Found. Comput. Math., 9 (2009), 717–772.
- [16] E. Candès and B. Recht: Simple bounds for recovering low-complexity models, Math. Program., 141 (2013), 577–589.
- [17] Y. Cui and C. Ding: Nonsmooth composite matrix optimization: strong regularity, constraint nondegeneracy and beyond, arXiv:1907.13253.
- [18] Y. Cui, C. Ding, and X. Zhao: Quadratic growth conditions for convex matrix optimization problems associated with spectral functions, SIAM J. Optim., 27 (2017), 2332–2355.
- [19] Y. Cui, D. Sun, and K-C. Toh: On the R-superlinear convergence of the KKT residuals generated by the augmented Lagrangian method for convex composite conic programming, Math. Program., 178 (2019), 381–415.
- [20] C. Ding: Variational analysis of the Ky Fan k-norm, Set-Valued Var. Anal., 25 (2017), 265–296.
- [21] D. Drusvyatskiy, B. S. Mordukhovich, and T. T. A. Nghia: Second order growth, tilt stability, and metric regularity of the subdifferential, J. Convex Anal., 21 (2014), 1165–1192.
- [22] A. V. Fiacco and G. P. McCormick: Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Wiley, New York, 1968.
- [23] J. Fadili, J. Malick, G. Peyré: Sensitivity Analysis for Mirror-Stratifiable Convex Functions, SIAM J. Optim., 28 (2018), 2975—3000.
- [24] J. Fadili, T. T. A. Nghia, and T. T. T. Tran: Sharp, strong and unique minimizers for low complexity robust recovery, IMA Inf. Inference, 12 (2023).
- [25] M. Grasmair, M. Haltmeier, and O. Scherzer: Necessary and sufficient conditions for linear convergence of regularization, Comm. Pure Applied Math. 64(2011), 161–182.
- [26] J. Harris: Algebraic Geometry, Springer, 1992.
- [27] C. J. Hsieh and P. Olsen: Nuclear norm minimization via active subspace selection. ICML, 32 (2014), 575–583.
- [28] T. Hoheisel and E. Paquette: Uniqueness in nuclear norm minimization: Flatness of the nuclear norm sphere and simultaneous polarization, J. Optim. Theory Appl., 197 (2023), 252–276.
- [29] J. Liang, J. Fadili, G. Peyré: Activity identification and local linear convergence of forward-backward-type methods, SIAM J. Optim., 27 (2017), 408–437.
- [30] A. S. Lewis and H. S. Sendov: Nonsmooth analysis of singular values. I. Theory. Set-Valued Anal., 13 (2005), 213–241.
- [31] A. S. Lewis and H. S. Sendov: Nonsmooth analysis of singular values. II. Applications, Set-Valued Anal., 13 (2005), 243–264.
- [32] Z.-Q. Luo and P. Tseng: Error bounds and convergence analysis of feasible descent methods: a general approach, Ann. Oper. Res., 46 (1993), 157–178.
- [33] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma: Robust recovery of subspace structures by low-rank representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2012), 171–184.
- [34] A. D. Ioffe: Necessary and sufficient conditions for a local minimum III: Second order conditions and augmented duality, SIAM J. Control Optim., 17 (1979), 266–286.
- [35] A. D. Ioffe: Variational analysis of a composite function: A formula for the lower second order epi-derivative, J. Math. Anal. Appl., 160 (1991), 379–405.
- [36] B. S. Mordukhovich: Variational Analysis and Generalized Differentiation, Springer, 2006.
- [37] A. Mohammadi and E. Sarabi: Parabolic regularity of spectral functions. Part I: Theory. arXiv:2301.04240
- [38] S. Nam, M. E. Davies, M. Elad, R. Gribonval: The cosparse analysis model and algorithms, Applied and Computational Harmonic Analysis, 34 (2013), 30–56.
- [39] B. T. Polyak. Sharp minima. Technical report, Institute of Control Sciences Lecture Notes, Moscow, USSR, 1979.
- [40] B. Recht, M. Fazel, and P. Parrilo: Guaranteed minimum rank solutions of matrix equations via nuclear norm minimization. SIAM Rev., 52 (2010), 471–501.
- [41] R. T. Rockafellar: Convex Analysis, Princeton University Press, Princeton, New Jersey, 1970.
- [42] R. T. Rockafellar: Second order optimality conditions in nonlinear programming obtained by way of epi-derivatives, Trans. Amer. Math. Soc., 307 (1988), 75–108.
- [43] R. T. Rockafellar and R. J-B. Wets: Variational Analysis, Springer, Berlin, 1998.
- [44] M. Studniarski and D. E. Ward: Weak sharp minima: characterizations and sufficient conditions, SIAM J. Control Optim., 38 (1999), 219–236.
- [45] S. Vaiter, G. Peyré, and J. Fadili: Model consistency of partly smooth regularizers. IEEE Transactions on Information Theory, 64 (2017), 1725–1737.
- [46] S. Vaiter, M. Golbabaee, J. Fadili, and G. Peyré: Model selection with low complexity priors. Information and Inference: A Journal of the IMA, 4 (2015), 230–287.
- [47] S. Vaiter, G. Peyré, and J. Fadili: Low complexity regularization of linear inverse problems. In Sampling Theory, a Renaissance, 103–153. Birkhäuser, Cham, 2015.
- [48] G. A. Watson: Characterization of the subdifferential of some matrix norms, Linear Algebra Appl., 170 (1992), 33–45
- [49] D. E. Ward: Characterizations of strict local minima and necessary conditions for weak sharp minima, J. Optim. Theory and Appl., 80 (1994), 551–571.
- [50] Z. Zhou and A. M-C. So: A unified approach to error bounds for structured convex optimization, Math. Program., 165 (2017), 689–728.
- [51] L. Zhang, N. Zhang, and X. Xiao: On the second order directional derivatives of singular values of matrices and symmetric matrix-valued functions, Set-Valued Var. Anal., 21 (2013), 557–586.
- [52] R. Zhang and J. Treiman: Upper-Lipschitz multifunctions and inverse subdifferentials, Nonlinear Anal., 24 (1995), 273–286.
- [53] X. Wang, J. J. Ye, X. Yuan, S. Zeng, and J. Zhang: Perturbation techniques for convergence analysis of proximal gradient method and other first-order algorithms via variational analysis, Set-Valued Var. Anal., 30 (2022), 39–79.