Tight error bounds for log-determinant cones without constraint qualifications
Abstract
In this paper, without requiring any constraint qualifications, we establish tight error bounds for the log-determinant cone, which is the closure of the hypograph of the perspective function of the log-determinant function. This error bound is obtained using the recently developed framework based on one-step facial residual functions.
Keywords: error bounds, facial residual functions, log-determinant cone
1 Introduction
The convex conic feasibility problem has attracted significant attention due to its power in modeling convex problems. Specifically, a convex conic feasibility problem admits the following form:
$$\text{find} \quad \bm{x} \in (\mathcal{L} + \bm{a}) \cap \mathcal{K}, \tag{Feas}$$
where $\mathcal{K}$ is a closed convex cone contained in a finite dimensional Euclidean space $\mathcal{E}$, $\mathcal{L} \subseteq \mathcal{E}$ is a subspace and $\bm{a} \in \mathcal{E}$ is given. Various aspects of (Feas), such as numerical algorithms and applications, have been studied in the literature; see, e.g., [5, 17]. Here we focus on the theoretical aspects, particularly error bounds for (Feas). More precisely, assuming the feasibility of (Feas), we want to establish inequalities that give upper bounds on the distance from an arbitrary point to $(\mathcal{L} + \bm{a}) \cap \mathcal{K}$ based on the individual distances from the point to $\mathcal{K}$ and to $\mathcal{L} + \bm{a}$. As a fundamental topic in optimization [18, 21, 29, 32, 43], error bounds possess a wide range of applications, especially in algorithm design and convergence analysis.
In this paper, we consider (Feas) with $\mathcal{K}$ being the log-determinant cone defined as
$$\mathcal{K}_{\log\det} := \operatorname{cl}\left\{(x, y, Z) \in \mathbb{R} \times \mathbb{R}_{++} \times \mathbb{S}^d_{++} \;:\; x \le y \log\det(Z/y)\right\},$$
where $d$ is a positive integer, $\mathbb{R}_{++}$ is the set of positive real numbers, and $\mathbb{S}^d_+$ (resp., $\mathbb{S}^d_{++}$) is the set of $d \times d$ positive semidefinite (resp., positive definite) matrices. We note that the log-determinant cone is the closure of the hypograph of the perspective function of the log-determinant function.
The log-determinant function has both theoretical and practical importance. Its negative, $Z \mapsto -\log\det(Z)$, is a self-concordant barrier function for $\mathbb{S}^d_{++}$, and hence it is useful for defining logarithmically homogeneous self-concordant barrier functions (LHSCBs) for various matrix cones. LHSCBs are crucial for the complexity analysis of the celebrated primal-dual interior point methods for solving conic optimization problems; see, e.g., [31, 9]. In practice, the log-determinant function appears frequently in real-world applications, especially in the area of machine learning; examples include sparse inverse covariance estimation [16], the fused multiple graphical Lasso problem [41, 42], Gaussian processes [34, 36], sparse covariance selection [13, 12], finding minimum-volume ellipsoids [1, 38, 39], determinantal point processes [20], kernel learning [4], and D-optimal design [2, 8].
An elementary observation is that
$$x \le \log\det(Z) \quad \Longleftrightarrow \quad (x, 1, Z) \in \mathcal{K}_{\log\det};$$
in this way, a problem that has a log-determinant term in its objective can be recast as a problem over the log-determinant cone $\mathcal{K}_{\log\det}$. In view of the importance and prevalence of the log-determinant function, the cone $\mathcal{K}_{\log\det}$ can also be used to handle numerous applications.
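For illustration, here is a minimal worked instance of such a recasting (a sketch of ours; the linear map $\mathcal{A}$ and data $\bm{b}$ are placeholder notation, not notation used elsewhere in this paper):
$$\max_{Z \in \mathbb{S}^d}\ \log\det(Z) \ \ \text{s.t.}\ \ \mathcal{A}(Z) = \bm{b},\ Z \succ 0 \qquad \Longleftrightarrow \qquad \max_{x \in \mathbb{R},\, Z \in \mathbb{S}^d}\ x \ \ \text{s.t.}\ \ \mathcal{A}(Z) = \bm{b},\ (x, 1, Z) \in \mathcal{K}_{\log\det},$$
since fixing $y = 1$ in the definition of $\mathcal{K}_{\log\det}$ recovers precisely the constraint $x \le \log\det(Z)$, and membership with $y = 1$ already forces $Z \succ 0$.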
That said, if one wishes to use conic linear optimization to solve problems involving log-determinants, it is not strictly necessary to use $\mathcal{K}_{\log\det}$. Indeed, it is possible, for example, to consider a reformulation using positive semidefinite cones and exponential cones; see, e.g., [30, Section 6.2.3].
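As we recall it, the construction in [30, Section 6.2.3] runs along the following lines (a sketch under our reading of that reference; the lower-triangular matrix $Y$ is auxiliary notation of ours): $t \le \log\det(X)$ with $X \in \mathbb{S}^d_+$ is representable as
$$\begin{pmatrix} X & Y \\ Y^\top & \operatorname{Diag}(Y) \end{pmatrix} \succeq 0, \qquad Y \text{ lower triangular}, \qquad t \le \sum_{i=1}^d \log Y_{ii},$$
where $\operatorname{Diag}(Y)$ is the diagonal matrix carrying the diagonal of $Y$, and each scalar constraint $s_i \le \log Y_{ii}$ is in turn an exponential-cone membership. Indeed, when $Y_{ii} > 0$ for all $i$, a Schur complement argument gives $X \succeq Y \operatorname{Diag}(Y)^{-1} Y^\top$, so that $\det(X) \ge \det(Y)^2 / \prod_i Y_{ii} = \prod_i Y_{ii}$; conversely, a suitably scaled Cholesky factor of $X$ attains equality. Note the $d(d+1)/2$ auxiliary variables and the $d$ extra exponential cones: this is the dimension increase alluded to below.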
A natural question then is whether it is more advantageous to use a reformulation or to handle $\mathcal{K}_{\log\det}$ directly. Indeed, Hypatia implements the log-determinant cone as a predefined exotic cone [9], and the accompanying numerical experiments show that the direct use of the log-determinant cone gives numerical advantages compared to the use of reformulations; see [11] and [10, Sections 8.4.1, 8.4.2]. One reason that other formulations may be less efficient is that they increase the dimension of the problem. Another drawback is that they do not capture the geometry of the hypograph of the log-determinant function as tightly.
Motivated by these results, we present a study of the facial structure of $\mathcal{K}_{\log\det}$ and its properties in connection to feasibility problems as in (Feas).
Specifically, we deduce tight error bounds for (Feas) with $\mathcal{K} = \mathcal{K}_{\log\det}$ by deploying a recently developed framework [23, 24], which is based on the facial reduction algorithm [7, 33, 40] and one-step facial residual functions [23, Definition 3.4]. This framework has been used with success to develop concrete error bounds for symmetric cones [26], exponential cones [23], $p$-cones [24] and power cones [22]. Although the log-determinant cone is a high-dimensional generalization of the exponential cone, whose error bounds were studied in depth in [23], the derivation of the error bounds for the log-determinant cone is not straightforward. Indeed, the exponential cone is three-dimensional and so its facial structure can be visualized explicitly. In contrast, being of higher dimension, the log-determinant cone has a more involved facial structure.
2 Notation and preliminaries
In this paper, we will use lowercase letters to represent real scalars, bold-faced letters to denote vectors (including “generalized” vectors such as , which consists of real scalars and a matrix), capital letters to denote matrices and curly capital letters for spaces, subspaces or cones.
Let $\mathcal{E}$ be a finite dimensional Euclidean space, and let $\mathbb{R}_+$ and $\mathbb{R}_{++}$ be the sets of nonnegative and positive real numbers, respectively. For a real number , we denote that . The inner product on $\mathcal{E}$ is denoted by $\langle \cdot, \cdot \rangle$ and the induced norm is denoted by $\|\cdot\|$. With the induced norm, for any $\bm{x} \in \mathcal{E}$ and a closed convex set $C \subseteq \mathcal{E}$, we denote the projection of $\bm{x}$ onto $C$ by $P_C(\bm{x})$ and the distance between $\bm{x}$ and $C$ by $\operatorname{dist}(\bm{x}, C)$. For any $\bm{x} \in \mathcal{E}$ and $\eta > 0$, we denote the ball centered at $\bm{x}$ with radius $\eta$ by $B(\bm{x}; \eta)$. For notational simplicity, we will use $B(\eta)$ to denote the ball centered at the origin with radius $\eta$. Meanwhile, we will denote the orthogonal complement of a set by the superscript $\perp$.
2.1 Matrices
We use $\mathbb{R}^{m \times n}$ to denote the set of all $m \times n$ real matrices and $\mathbb{S}^d$ to denote the set of $d \times d$ symmetric matrices. The identity matrix will be denoted by $I$. Let $\mathbb{S}^d_+$ and $\mathbb{S}^d_{++}$ be the set of symmetric positive semidefinite matrices and positive definite matrices, respectively. The interior of $\mathbb{S}^d_+$ is $\mathbb{S}^d_{++}$. We write $Z \succeq 0$ (resp., $Z \succ 0$) if $Z \in \mathbb{S}^d_+$ (resp., $Z \in \mathbb{S}^d_{++}$). For any $Z \in \mathbb{S}^d$, we let $\lambda_i(Z)$ denote the $i$-th eigenvalue of $Z$, arranged so that $\lambda_1(Z) \ge \cdots \ge \lambda_d(Z)$. We will use $\lambda_{\max}(Z)$ and $\lambda_{\min}(Z)$ to denote the maximum and minimum eigenvalues of $Z$, respectively. The rank of $Z$, denoted by $\operatorname{rank}(Z)$, is the number of non-zero eigenvalues of $Z$. The trace (resp., determinant) of $Z$ is given by $\operatorname{tr}(Z) = \sum_{i=1}^d \lambda_i(Z)$ (resp., $\det(Z) = \prod_{i=1}^d \lambda_i(Z)$). With these, we recall that the Frobenius inner product on $\mathbb{S}^d$ is given by $\langle X, Y \rangle = \operatorname{tr}(XY)$ for any $X, Y \in \mathbb{S}^d$, and the Frobenius norm is $\|X\|_F = \sqrt{\operatorname{tr}(X^2)}$. For any $Z \in \mathbb{S}^d_+$ (resp., $Z \in \mathbb{S}^d_{++}$), we have $\lambda_i(Z) \ge 0$ (resp., $\lambda_i(Z) > 0$) for all $i$. We hence also have for any $Z \in \mathbb{S}^d_+$ that
$$\|Z\|_F \le \operatorname{tr}(Z) \le \sqrt{d}\, \|Z\|_F. \tag{2.1}$$
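The inequalities in (2.1) are elementary consequences of $\operatorname{tr}(Z) = \sum_i \lambda_i(Z)$ with $\lambda_i(Z) \ge 0$ and the Cauchy–Schwarz inequality; the following small script (our own illustration, not part of the development) verifies them on random positive semidefinite matrices:

```python
# Quick numerical check (ours) of (2.1): ||Z||_F <= tr(Z) <= sqrt(d)*||Z||_F
# for positive semidefinite Z, since tr(Z) = sum(lambda_i) with lambda_i >= 0.
import numpy as np

rng = np.random.default_rng(0)
d = 5
for _ in range(1000):
    A = rng.standard_normal((d, d))
    Z = A @ A.T                          # Z is positive semidefinite
    fro = np.linalg.norm(Z, "fro")       # sqrt of the sum of lambda_i^2
    tr = float(np.trace(Z))              # sum of lambda_i
    assert fro <= tr + 1e-9 and tr <= np.sqrt(d) * fro + 1e-9
print("(2.1) verified on 1000 random PSD matrices")
```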
For a given non-zero positive semidefinite matrix, the next result connects its determinant with its trace and rank.
Lemma 2.1.
Let . Then for any , there exists so that
(2.2) |
Proof.
Let $Z = Q \Lambda Q^\top$ be an eigendecomposition of $Z$, where $Q$ is orthogonal and $\Lambda$ is diagonal, and let $r$ be the rank of $Z$. Then $r \ge 1$ since $Z \neq 0$. Without loss of generality, we may suppose that the first $r$ diagonal entries of $\Lambda$, denoted as $\sigma_1, \dots, \sigma_r$, are nonzero and are arranged in descending order. Then $\sigma_r$ is the smallest positive eigenvalue of $Z$ and we have for any that
(2.3) |
where is the submatrix of formed by for , (a) holds since (thanks to ), and (b) is true because of the interlacing theorem (see [19, Theorem 4.3.8]). ∎
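The interlacing theorem invoked in step (b) can also be checked numerically. The script below is our own illustration: with eigenvalues in descending order, a leading principal $r \times r$ submatrix $Z_r$ of $Z \in \mathbb{S}^d$ satisfies $\lambda_{i+d-r}(Z) \le \lambda_i(Z_r) \le \lambda_i(Z)$ for $i = 1, \dots, r$.

```python
# Numerical check (ours, not from the paper) of Cauchy eigenvalue interlacing
# [19, Theorem 4.3.8] for a leading principal submatrix of a symmetric matrix.
import numpy as np

rng = np.random.default_rng(1)
d, r = 6, 4
A = rng.standard_normal((d, d))
Z = (A + A.T) / 2                                        # a symmetric matrix
lam = np.sort(np.linalg.eigvalsh(Z))[::-1]               # eigenvalues of Z, descending
lam_sub = np.sort(np.linalg.eigvalsh(Z[:r, :r]))[::-1]   # eigenvalues of the r x r submatrix
for i in range(r):
    assert lam[i + d - r] - 1e-12 <= lam_sub[i] <= lam[i] + 1e-12
print("interlacing verified")
```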
2.2 Error bounds for conic feasibility problems
We first recall the definition of error bounds.
We remark that typically it is required that satisfy , be nondecreasing and be right-continuous at . Under these conditions, the error bound in Definition 2.2 can be understood in the context of consistent error bound functions; see [25, Definition 3.1] and the following footnote. (Footnote: For $B = B(\eta)$ (where $B(\eta)$ is the ball centered at the origin with radius $\eta$), if Definition 2.2 holds, then can be taken to be a nondecreasing function of (since considering a larger constant still preserves the error bound inequality). In this way, the function given by satisfies [25, Definition 3.1], provided that has the aforementioned properties.) With different residual functions, we will have different error bounds, among which the Lipschitzian and Hölderian error bounds are the most widely studied in the literature.
Particularly, we say that (Feas) satisfies a uniform Hölderian error bound with exponent $\gamma \in (0, 1]$ if Definition 2.2 holds with a residual function of the form $c_B\, \epsilon^\gamma$ for every bounded set $B$. That is, for every bounded set $B$, there exists a constant $c_B > 0$ such that
$$\operatorname{dist}\big(\bm{x}, (\mathcal{L} + \bm{a}) \cap \mathcal{K}\big) \le c_B \max\left\{\operatorname{dist}(\bm{x}, \mathcal{K}),\ \operatorname{dist}(\bm{x}, \mathcal{L} + \bm{a})\right\}^{\gamma}$$
for all $\bm{x} \in B$. If $\gamma = 1$, then the error bound is said to be Lipschitzian. Hölderian error bounds are a particular case of consistent error bounds; see [25, Theorem 3.5].
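To make the role of the exponent $\gamma$ concrete, here is a classical two-dimensional illustration (our own example, in the spirit of Sturm's error bound [37]). Take
$$\mathcal{K} = \mathbb{S}^2_+, \qquad \mathcal{L} + \bm{a} = \{X \in \mathbb{S}^2 : X_{11} = 0\}, \qquad \text{so that} \quad \mathcal{K} \cap (\mathcal{L} + \bm{a}) = \left\{ \begin{pmatrix} 0 & 0 \\ 0 & t \end{pmatrix} : t \ge 0 \right\},$$
and consider, for small $\varepsilon > 0$, the points $X_\varepsilon = \begin{pmatrix} \varepsilon^2 & \varepsilon \\ \varepsilon & 1 \end{pmatrix} \in \mathcal{K}$. Then $\operatorname{dist}(X_\varepsilon, \mathcal{L} + \bm{a}) = \varepsilon^2$, while $\operatorname{dist}\big(X_\varepsilon, \mathcal{K} \cap (\mathcal{L} + \bm{a})\big) = \sqrt{2\varepsilon^2 + \varepsilon^4} > \varepsilon$. Hence no Lipschitzian error bound can hold near the origin, whereas the exponent $\gamma = 1/2$ is attained (and is tight) for this instance.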
Let $\mathcal{K}$ be a closed convex cone contained in $\mathcal{E}$ and $\mathcal{K}^*$ be its dual cone. We will denote the boundary, relative interior, linear span, and dimension of $\mathcal{K}$ by $\partial \mathcal{K}$, $\operatorname{ri} \mathcal{K}$, $\operatorname{span} \mathcal{K}$ and $\dim \mathcal{K}$, respectively. If $\mathcal{K} \cap (-\mathcal{K}) = \{0\}$, then $\mathcal{K}$ is said to be pointed. If $\mathcal{F}$ is a face of $\mathcal{K}$, i.e., $\mathcal{F}$ is a closed convex cone contained in $\mathcal{K}$ such that for any $\bm{x}, \bm{y} \in \mathcal{K}$ with $\bm{x} + \bm{y} \in \mathcal{F}$, we have $\bm{x}, \bm{y} \in \mathcal{F}$, then we write $\mathcal{F} \trianglelefteq \mathcal{K}$. (By convention, we only consider nonempty faces.) If further $\mathcal{F} = \mathcal{K} \cap \{\bm{z}\}^\perp$ for some $\bm{z} \in \mathcal{K}^*$, we say that $\mathcal{F}$ is an exposed face of $\mathcal{K}$. A face is said to be proper if $\mathcal{F} \neq \mathcal{K}$, and we denote it by $\mathcal{F} \lhd \mathcal{K}$. If $\mathcal{F}$ is proper and $\mathcal{F} \neq \{0\}$, then $\mathcal{F}$ is said to be a nontrivial face of $\mathcal{K}$.
The facial reduction algorithm [7, 33, 40] and the FRA-poly algorithm [27] play important roles in making full use of the facial structure of a cone; see also [23, Section 3]. More precisely, assuming (Feas) is feasible, the facial reduction algorithm aims at finding the minimal face that contains the feasible region and satisfies some constraint qualification. One of the most commonly used constraint qualifications is the so-called partial-polyhedral Slater (PPS) condition [26, Definition 3]. For (Feas), if and satisfy the PPS condition, then a Lipschitzian error bound holds for and ; see [6, Corollary 3] and the discussion preceding [23, Proposition 2.3]. Thanks to this property, we can apply the facial reduction algorithm to deduce error bounds based on one-step facial residual functions [23, Definition 3.4] without requiring any constraint qualifications, following the framework developed recently in [26]; see also [23, 24]. This framework is highly inspired by the fundamental work of Sturm on error bounds for LMIs; see [37]. For the convenience of the reader, we recall the definition of the one-step facial residual function as follows.
Definition 2.3 (One-step facial residual function (1-FRF)).
Let $\mathcal{K}$ be a closed convex cone and $\bm{z} \in \mathcal{K}^*$. Suppose that $\psi_{\mathcal{K}, \bm{z}} : \mathbb{R}_+ \times \mathbb{R}_+ \to \mathbb{R}_+$ satisfies the following properties:
- (i) $\psi_{\mathcal{K}, \bm{z}}$ is nonnegative, nondecreasing in each argument and it holds that $\psi_{\mathcal{K}, \bm{z}}(0, \alpha) = 0$ for every $\alpha \in \mathbb{R}_+$.
- (ii) The following implication holds for any $\bm{x} \in \operatorname{span} \mathcal{K}$ and $\epsilon \ge 0$:
$$\operatorname{dist}(\bm{x}, \mathcal{K}) \le \epsilon \ \text{ and } \ \langle \bm{x}, \bm{z} \rangle \le \epsilon \quad \Longrightarrow \quad \operatorname{dist}\big(\bm{x},\, \mathcal{K} \cap \{\bm{z}\}^\perp\big) \le \psi_{\mathcal{K}, \bm{z}}(\epsilon, \|\bm{x}\|).$$
Then $\psi_{\mathcal{K}, \bm{z}}$ is said to be a one-step facial residual function (1-FRF) for $\mathcal{K}$ and $\bm{z}$.
The one-step facial residual function is used in each step of the facial reduction algorithm to connect a face and its subface until a face is found such that and satisfy the PPS condition. Then the error bound for and can be obtained as a special composition of those one-step facial residual functions. Due to the importance of the PPS condition in this framework, we shall define the distance to the PPS condition of a feasible (Feas), denoted by , as the length minus one of the shortest chain of faces (among those chains constructed as in [26, Proposition 5]) such that the PPS condition holds for the final face in the chain and .
Before ending this subsection, we present a lemma and a proposition that will help simplify our subsequent analysis.
Lemma 2.4 (Formula of ).
Let be a closed convex cone and be such that is a nontrivial exposed face of . Let and let and . Then, we have
(2.5) |
and,
(2.6) |
Proof.
Since , we have
Moreover, we can notice that and .
The above display implies that for any , are three vertices of a right-angled triangle with being the right angle. This fact also leads us to the observation that . Indeed, suppose not, then there exists with such that . Then and hence form a new right-angled triangle. Thus,
Since , the above display contradicts the fact that . Therefore, and . ∎
The next proposition states an error bound result related to the positive semidefinite cone. We present a proof based on the results in [26], although it can also be obtained from Sturm’s error bound in [37].
Proposition 2.5 (Error bound for positive semidefinite cones).
Let and , then there exists such that
(2.7) |
where
(2.8) |
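For intuition, the Hölder-$1/2$ phenomenon behind such bounds (cf. Sturm [37]) can be observed numerically. The script below is our own illustration rather than the content of Proposition 2.5; it revisits the two-dimensional example from the discussion of Hölderian error bounds above:

```python
# Our own numerical illustration of Hölder-1/2 behaviour for the PSD cone.
# Take z = e1 e1^T in (S^2_+)* and the exposed face
#   F = S^2_+ ∩ {z}^⊥ = { [[0, 0], [0, t]] : t >= 0 }.
# The points X(eps) = [[eps^2, eps], [eps, 1]] lie in S^2_+, and we compare
# dist(X, F) against dist(X, {z}^⊥) = <z, X> = eps^2.
import numpy as np

for eps in (1e-1, 1e-2, 1e-3, 1e-4):
    X = np.array([[eps**2, eps], [eps, 1.0]])
    dist_hyp = X[0, 0]                               # distance to {z}^⊥, equals eps^2
    proj_F = np.array([[0.0, 0.0], [0.0, X[1, 1]]])  # projection of X onto F
    dist_F = np.linalg.norm(X - proj_F, "fro")       # about sqrt(2)*eps
    print(f"eps={eps:.0e}  dist_F/dist_hyp={dist_F/dist_hyp:.1e}  "
          f"dist_F/sqrt(dist_hyp)={dist_F/np.sqrt(dist_hyp):.3f}")
# The first ratio blows up (no Lipschitzian bound), while the second stays
# bounded near sqrt(2): the exponent 1/2 is the right one here.
```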
3 Error bounds for the log-determinant cones
In this section, we will compute the one-step facial residual functions for the log-determinant cones and obtain error bounds. Let $d$ be a positive integer and $\bar{d} := d(d+1)/2$ be the dimension of $\mathbb{S}^d$; we consider the $(\bar{d}+2)$-dimensional space $\mathcal{E} := \mathbb{R} \times \mathbb{R} \times \mathbb{S}^d$. We let $\bm{q} := (x, y, Z)$ denote an element of $\mathcal{E}$, where $x, y \in \mathbb{R}$ and $Z \in \mathbb{S}^d$, and equip $\mathcal{E}$ with the following inner product:
$$\langle \bm{q}, \bm{q}' \rangle := x x' + y y' + \operatorname{tr}(Z Z').$$
Recall that the log-determinant cone is defined as follows.
$$\mathcal{K}_{\log\det} := \operatorname{cl}\left\{(x, y, Z) \in \mathbb{R} \times \mathbb{R}_{++} \times \mathbb{S}^d_{++} \;:\; x \le y \log\det(Z/y)\right\} \tag{3.1}$$
$$\hphantom{\mathcal{K}_{\log\det} :}= \left\{(x, y, Z) \in \mathbb{R} \times \mathbb{R}_{++} \times \mathbb{S}^d_{+} \;:\; \det(Z) \ge y^d e^{x/y}\right\} \cup \left\{(x, 0, Z) \;:\; x \le 0,\ Z \in \mathbb{S}^d_+\right\}. \tag{3.2}$$
Its dual cone is given by the following (see the footnote at the end of this paragraph for a proof sketch).
$$(\mathcal{K}_{\log\det})^* = \operatorname{cl}\left\{(x, y, Z) \in \mathbb{R}_{--} \times \mathbb{R} \times \mathbb{S}^d_{++} \;:\; y \ge x\left(\log\det(Z/(-x)) + d\right)\right\} \tag{3.3}$$
$$\hphantom{(\mathcal{K}_{\log\det})^* }= \left\{(x, y, Z) \in \mathbb{R}_{--} \times \mathbb{R} \times \mathbb{S}^d_{+} \;:\; \det(Z) \ge (-x)^d e^{y/x - d}\right\} \cup \left(\{0\} \times \mathbb{R}_+ \times \mathbb{S}^d_+\right). \tag{3.4}$$
(Footnote: Here is a sketch. Let be such that and let be the closed convex cone generated by the set . We have . That is, is the closure of . By [35, Theorem 14.4], the closed convex cone generated by satisfies , where is the polar of . The conjugate of is for . Overall, we conclude that iff iff is in the closure of . Finally, this implies that if and only if is in the closure of . This means iff . Thus, we conclude that the cones in (3.1) and (3.3) are dual to each other.)
One should notice that if $d = 1$, then the log-determinant cone reduces to the exponential cone, whose corresponding error bound results were discussed in [23]. Hence, without loss of generality, we assume that $d \ge 2$ in the rest of this paper. Notice from (3.2) and (3.4) that $(\mathcal{K}_{\log\det})^*$ is a scaled and rotated version of $\mathcal{K}_{\log\det}$.
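For concreteness (our own remark): when $d = 1$ we may identify $Z$ with a scalar $z$ and $\mathbb{S}^1_{++}$ with $\mathbb{R}_{++}$, so that (3.1) becomes
$$\operatorname{cl}\left\{(x, y, z) \in \mathbb{R} \times \mathbb{R}_{++} \times \mathbb{R}_{++} \;:\; x \le y \log(z/y)\right\},$$
and since $x \le y \log(z/y) \iff z \ge y e^{x/y}$ for $y > 0$, this is the exponential cone $\operatorname{cl}\{(x, y, z) : y > 0,\ z \ge y e^{x/y}\}$, possibly up to a reordering of coordinates depending on the convention adopted in [23].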
For convenience, we further define
(3.5) |
With that, we have
(3.6) |
and
Before moving on, we present several inequalities, which will be useful for our subsequent analysis.
-
1.
Let , and let with and and satisfy . Then, we have
(3.7) -
2.
Let and . The following inequalities hold for all sufficiently small ,
(3.8)
3.1 Facial structure
In general, we are more interested in nontrivial faces, especially nontrivial exposed faces. Recall that if there exists such that , then is a nontrivial exposed face of . Different nonzero ’s along will induce different nontrivial exposed faces.
The next proposition completely characterizes the facial structure of the log-determinant cone.
Proposition 3.1 (Facial structure of ).
All nontrivial faces of the log-determinant cone can be classified into the following types:
-
(a)
infinitely many 1-dimensional faces exposed by with ,
(3.9) where
(3.10) -
(b)
a single -dimensional exposed face exposed by with ,
(3.11) -
(c)
infinitely many -dimensional exposed faces given by
(3.12) which are exposed by
(3.13) -
(d)
a single -dimensional exposed face exposed by
that is, ,
(3.14) - (e)
Proof.
Let be such that is a nontrivial face of . Recall that is pointed, so . By (3), and we can determine whether or by checking whether or not. Therefore, we shall consider the following cases.
: indicates that , then we must have
For any , since , we can see that if and only if
(3.16) |
If , then , and so . This together with and (2.1) imply that . Since , we observe that
Thus, and . The latter relation leads to . Consequently, .
If , then by the definition of the log-determinant cone and hence . Then, we know that and hence (3.16) becomes
After rearranging terms, we have
(3.17) |
Note also that
where and .
Let , we can rewrite (3.17) as follows,
(3.18) |
Since for all and if and only if , (3.18) holds if and only if
This illustrates that all the eigenvalues of are . Hence, one can immediately see and so . By substituting this expression of into , we obtain (3.9).
: indicates that , then and . Now, for any , we have if and only if
(3.19) |
Since and , we observe that both summands on the left hand side of (3.19) are nonnegative. Therefore, (3.19) holds if and only if
(3.20) |
These together with (2.1) make clear which cases we need to consider.
Specifically, if , we consider the following four cases.
-
1.
If and , then , which contradicts our assumption. This case is hence impossible.
- 2.
-
3.
If , then but is not definite, so . Since holds, this corresponds to (3.12).
-
4.
If , i.e., , then . This corresponds to (3.14).
Therefore, we obtain the exposed faces defined as in (3.11), (3.12) and (3.14).
We now show that all nontrivial faces of $\mathcal{K}_{\log\det}$ are accounted for by (3.9), (3.11), (3.12), (3.14) and (3.15). First of all, by the previous discussion, all nontrivial exposed faces must be among the ones in (3.9), (3.11), (3.12), and (3.14). Suppose $\mathcal{F}$ is a non-exposed face of $\mathcal{K}_{\log\det}$. Then it must be contained in a nontrivial exposed face of $\mathcal{K}_{\log\det}$; see, e.g., [7, Proposition 3.6] or [28, Proposition 2.1]. The faces in (3.9) and (3.14) are one-dimensional, so the only candidates for containing $\mathcal{F}$ are the faces as in (3.11) and (3.12).
It is worth noting that when , the case corresponding to does not occur. We also have the following relationships between these nontrivial faces. Let with be given. If , then the corresponding faces and satisfy the following inclusion
(3.21) |
If , then we have
(3.22) |
For distinct and with and , suppose and expose and , respectively. If , then (see, e.g., [3, Section 6]). A similar result also holds for non-exposed faces, that is, denote the non-exposed faces by and , respectively, with respect to and , if , then .
3.2 One-step facial residual functions
In this subsection, we shall apply the strategy in [23, Section 3.1] to compute the corresponding one-step facial residual functions for nontrivial exposed faces of the log-determinant cone. Put concretely, consider with . For and some nondecreasing function with and for some , we define
(3.23) |
In view of [23, Theorem 3.10] and [23, Lemma 3.9], if then we can use and to construct a one-step facial residual function for and . In [23], the positivity of (with the exponential cone in place of and some properly selected ) was shown by contradiction. Here, we will follow a similar strategy and make extensive use of the following fact from [23, Lemma 3.12]: if , then there exist and a sequence such that
(3.24) |
where , and .
3.2.1 : the unique -dimensional faces
We define the piecewise modified Boltzmann-Shannon entropy as follows:
(3.25) |
Note that is nondecreasing with and for any .
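To see in what sense such a bound is non-Hölderian, note that near the origin the Boltzmann–Shannon entropy behaves like $t \mapsto -t \log t$. The following check (our own illustration, assuming this local behaviour of the function in (3.25)) compares it against Lipschitzian and Hölderian rates:

```python
# Our own illustration: near 0, the entropy t -> -t*log(t) decays more slowly
# than any Lipschitzian rate c*t, yet faster than t**gamma for every gamma < 1;
# an entropic residual therefore sits strictly between the two regimes.
import numpy as np

t = np.array([1e-2, 1e-4, 1e-8, 1e-16])
ent = -t * np.log(t)
print("entropy / t:", ent / t)                        # grows without bound
for gamma in (0.5, 0.9, 0.99):
    print(f"entropy / t**{gamma}:", ent / t**gamma)   # tends to 0 as t -> 0
```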
The next theorem shows that for , which implies that an entropic error bound holds.
Theorem 3.2 (Entropic error bound concerning ).
Let with such that . Let and let be defined as in (3.23) with and . Then and
(3.26) |
Proof.
By (3.11), with . Since for all , we have and for all . Hence, for all .
Recall that with , then Since and is a hyperplane, one can immediately see that for all ,
Using (3.11), and , we see that and . We thus obtain that for all ,
Because , for sufficiently large , we have . Hence,
where (a) comes from the fact and (3.7). This contradicts (3.24) with in place of and hence this case cannot happen. Therefore, we conclude that , with which and [23, Theorem 3.10], (3.26) holds. ∎
Remark 3.3 (Tightness of (3.26)).
We claim that for , there is a specific choice of sequence in with along which both sides of (3.26) vanish at the same order of magnitude. Recall that we assumed that $d \ge 2$; see the discussions following (3.4). (Footnote: When $d = 1$, the log-determinant cone reduces to the exponential cone studied in [23], where the tightness of the corresponding error bounds was shown in Remark 4.14 therein.) Let with so that . Define for every . Then . Since for any and as , there exists such that . Thus, applying (3.26), there exists such that
Noticing that the projection of onto (see (3.11)) is given by , we obtain
Let for every . Then since . In view of the definition of (see (3.25)) and its monotonicity, we conclude that for large enough we have
That means it holds that for all sufficiently large ,
Consequently, for any given nonnegative function such that , we have upon noting that
which shows that the choice of in (3.26) is tight.
3.2.2 : the family of -dimensional faces
Let and let be such that . Let be defined as in (3.23) with and some nondecreasing function with and for some . If , in view of [23, Lemma 3.12], there exists and a sequence such that (3.24) holds. As we will see later in the proofs of Theorem 3.6 and Theorem 3.8 below, we will encounter the following three cases:
-
(I)
and for all large ;
-
(II)
and infinitely often;
-
(III)
and infinitely often.
For case (I), we have the following lemma which will aid in our further analysis. One should notice that this lemma holds for both and .
Lemma 3.5.
Let with and such that with or . Let be arbitrary and be such that
where and . Then
where is defined as in (2.8) with being .
Proof.
Note that implies with and for all . Then, , which is nonnegative since both and are positive semidefinite. Because and is a hyperplane, one can immediately see that for all ,
On the other hand, by Lemma 2.4 and the formula of , we obtain that for all ,
where the final inequality comes from Proposition 2.5 and is defined as in (2.8) with being .
Now, we can conclude that
This completes the proof. ∎
Now, we are ready to show the error bound concerning . We first show that we have a Hölderian error bound concerning when .
Theorem 3.6 (Hölderian error bound concerning if ).
Let with , and such that . Let and let be defined as in (3.23) with and . Then and
(3.27) |
Proof.
If , in view of [23, Lemma 3.12], there exist and a sequence such that (3.24) holds with and . Since , the equation for the boundary of (see (3.6) and (3.21)) implies that we have the following two cases:
-
(i)
infinitely often;
-
(ii)
for all large .
(i) Passing to a subsequence if necessary, we can assume that for all , that is,
Then, , which is positive since and both are positive semidefinite.
Now, one can check that
(3.28) |
On the other hand, by Lemma 2.4, the formula of and Proposition 2.5, we obtain the following inequality for all ,
(3.29) |
Let and .
If infinitely often, then by extracting a subsequence if necessary, we may assume that for all . Then we have from (3.29) and (3.7) that for all large ,
where (a) holds by (3.8) with and the fact that (since ), (b) is true since and for all thanks to (3.28).
This contradicts (3.24) with in place of and hence this case cannot happen.
Remark 3.7 (Tightness of (3.27)).
Fix any (recall that we assumed ; see the discussions following (3.4)). Let with , and . Then, we have from (3.12). Let be such that where is diagonal, and . Then
(3.30) |
Fix a with . For every , we define
Then there exists such that . We also observe that for all based on standard arguments involving the Schur complement. Then . With that, we have
Next, we consider the case where . Define as follows
(3.31) |
We note that is increasing with and for all . Moreover, for any . With , the next theorem shows that for , which implies that a log-type error bound holds.
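Before the theorem, a brief orienting remark (ours): if, as its name suggests, the function in (3.31) behaves like $-1/\log t$ for small $t > 0$, then a log-type bound is weaker than every Hölderian bound. Indeed, for any $\gamma \in (0, 1]$,
$$\lim_{t \downarrow 0} \frac{t^{\gamma}}{-1/\log t} = \lim_{t \downarrow 0} \left(-\log t\right) t^{\gamma} = 0,$$
so $t^{\gamma} = o(-1/\log t)$ as $t \downarrow 0$: the log-type residual decays more slowly than any Hölderian one, and no Hölder exponent can reproduce it.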
Theorem 3.8 (Log-type error bound concerning if ).
Proof.
If , in view of [23, Lemma 3.12], there exists and sequences being defined as those therein, with the cone being and the face being , such that (3.24) holds with as in (3.31). As in the proof of Theorem 3.6, the condition means that we need to consider the following two cases:
-
(i)
infinitely often;
-
(ii)
for all large .
(i) Passing to a subsequence if necessary, we can assume that for all , that is,
Then , which is nonnegative since .
Now, one can check that for all ,
(3.33) |
On the other hand, by Lemma 2.4, the formula of and Proposition 2.5, we obtain that for all ,
(3.34) |
Let and .
If infinitely often, then, by passing to a subsequence if necessary, we may assume that for all , and hence for all . Thus, upon invoking Lemma 2.1, we obtain that for all ,
(3.35) |
Then, for all sufficiently large ,
where and , (a) comes from (3.34) and (3.7), (b) holds because of (3.35) and the fact that is increasing for all sufficiently small positive , (c) is true by (3.8) (with ).
Therefore, we conclude that
This contradicts (3.24) with in place of and hence this case cannot happen.
If infinitely often, then by passing to a subsequence if necessary, we may assume that for all large . Moreover, recalling the exponential form of in (3.2), we have for all . Invoking Lemma 2.1, we then see that for all ,
Thus, by taking logarithm on both sides, the above inequality becomes
Since , , and both sequences are positive, we note that for all large . After multiplying on both sides of the above display and rearranging terms, we see that for all large ,
Then, by passing to the limit on both sides of the above display, we obtain that
(3.36) |
Therefore, we conclude that
where (a) is true owing to (3.33) and (3.34), (b) comes from (3.36) and the fact , the last inequality holds because thanks to for all large . The above display contradicts (3.24) with in place of and so this case cannot happen.
Remark 3.9 (Tightness of (3.32)).
Let with , . Then, we have from (3.12). Consider the sequence , and for every , we note that and for every . Moreover, there exists such that . Therefore, applying (3.32), there exists such that
In view of the definition of (see (3.31)) and its monotonicity, for large enough we have
Consequently, it holds that for all sufficiently large ,
Similar to the argument in Remark 3.3, we conclude that the choice of is tight.
3.2.3 : the exceptional 1-dimensional face
We first show a Lipschitz error bound concerning if .
Theorem 3.11 (Lipschitz error bound concerning if ).
Let with and such that . Let and let be defined as in (3.23) with and . Then and
(3.37) |
Proof.
If , in view of [23, Lemma 3.12], there exists and sequences being defined as those therein, with the cone being and the face being , such that (3.24) holds with . Note that means that we need to consider the following two cases:
-
(i)
infinitely often;
-
(ii)
for all large .
(i) Without loss of generality, we assume that for all by passing to a subsequence if necessary, that is,
Then, and
On the other hand, by Lemma 2.4, we obtain that for all ,
(3.38) |
If infinitely often, by passing to a subsequence if necessary, we may assume that for all large and hence, recalling that (since ), we obtain
where (a) holds because for all .
Combining these identities and using (2.1) yields:
This contradicts (3.24) with in place of and hence this case cannot happen.
Note that a Lipschitz error bound is always tight up to a constant, so (3.37) is tight.
If , we have the following log-type error bound for .
Theorem 3.12 (Log-type error bound concerning if ).
Proof.
If , in view of [23, Lemma 3.12], there exists and sequences being defined as those therein, with the cone being and the face being , such that (3.24) holds with as in (3.31). Note that means that we need to consider the following two cases:
-
(i)
infinitely often;
-
(ii)
for all large .
(i) Without loss of generality, we assume that for all by passing to a subsequence if necessary, that is,
Then and
(3.40) |
In addition, by Lemma 2.4, we obtain that for all ,
(3.41) |
Let .
If infinitely often, then by passing to a subsequence if necessary, we may assume that for all . Hence we have . Thus, combining Lemma 2.1 with , we obtain that for all ,
(3.42) |
Then, for sufficiently large ,
(3.43) | ||||
where , (a) comes from (3.41) and (3.7), (b) holds because of (2.1), (3.42) and the fact that is increasing for all sufficiently small positive , (c) is true because for sufficiently small and because .
Hence,
where the first inequality comes from (3.40) and (3.43), the second inequality comes from (3.8) (with and ). This contradicts (3.24) with in place of and hence this case cannot happen.
If infinitely often, then by passing to a subsequence if necessary, we may assume that for all . Moreover, recalling the exponential form of in (3.2), we have for all . Upon invoking Lemma 2.1 with , we then see that for all , we have
Thus, by taking the logarithm on both sides, the above inequality becomes
Since and , are positive sequences, we note that for all large . After multiplying on both sides of the above display and rearranging terms, we see that for all large ,
Then, by passing to the limit on both sides of the above display, we obtain that
(3.44) |
Note also that since has full rank, we have upon invoking the equivalence in (2.1) that . Then Proposition 2.5 guarantees that .
Remark 3.13 (Tightness of (3.39)).
Let with . Then, . Consider the same sequences as in Remark 3.9, i.e., for every ,
Note that there exists such that and for any . Therefore, applying (3.39), there exists such that
In view of the definition of (see (3.31)) and its monotonicity, for large enough we have
Consequently, it holds that for all sufficiently large ,
Similar to the argument in Remark 3.3, we conclude that the choice of is tight.
3.2.4 : the family of 1-dimensional faces
Theorem 3.15 (Hölderian error bound concerning ).
Let with and such that . Let and let be defined as (3.23) with and . Then and
(3.45) |
Proof.
If , in view of [23, Lemma 3.12], there exists and sequences being defined as those therein, with the cone being and the face being , such that (3.24) holds with . We consider two different cases.
-
(i)
infinitely often, i.e., infinitely often (wherefore );
-
(ii)
for all large , i.e., for all large .
(i) If infinitely often, by extracting a subsequence if necessary, we may assume that
Combining this with the definition of , we have
Here, we recall that .
Since projections are non-expansive, we have . Moreover, since , we have . Thus,
This display shows that (3.24) for does not hold in this case. Since holds for small , we conclude that (3.24) for does not hold either.
(ii) If for all large , by passing to a subsequence if necessary, we can assume that
Thus, we have
(3.46) |
and
(3.47) |
where for and , and the nonnegativity comes from the observation that for all and the facts that and ; recall that here and , then for all , and hence is well-defined.
Next, we turn to compute . Using Lemma 2.4, (3.9) and (3.10), one can see for all ,
where (a) holds because . We remark that is a symmetric matrix. Let
We notice that and for and . Then, we have for all , (Footnote: For , we denote the nuclear norm and spectral norm of by and , respectively.)
where , and (a) and (b) hold since the dual norm of the nuclear norm is the spectral norm . Hence, we obtain that for all ,
(3.48) |
Before moving on, we define two auxiliary functions and discuss some useful properties. Define
(3.49) |
We observe that
(3.50) | |||
In addition, for all and if and only if .
Now, recall from the setting of that and . This and the formula of in (3.47) reveal that we need to consider the following two cases:
-
(I)
;
-
(II)
.
For notational simplicity, we define for all .
(I) Without loss of generality, by passing to a further subsequence, we assume that . Combining this assumption and the fact that for all with (3.50), we know that . Now, consider the Taylor expansion of at , that is,
It follows that there exists such that for any satisfying , . Thus, we have for all large that,
(3.51) |
We can deduce the lower bound of for sufficiently large as follows:
where (a) comes from (3.46), (b) comes from (3.47) and (3.49), (c) holds by (3.51), (d) comes from the root-mean inequality.
Next, to derive a bound for , we shall relate to . To this end, notice that Then, there exists such that for any ,
Therefore, by (3.48), for all sufficiently large ,
We thus conclude that
where (a) holds since . This contradicts (3.24) with in place of and hence this case cannot happen.
(II) In this case, in view of (3.50), by passing to a further subsequence if necessary, we can assume that there exist and such that for all large , that is, for all large . Then, for all large . Now, consider the following function
where is defined as in (3.49). Since implies for some , we see that is well-defined. Moreover, one can check that is lower semi-continuous and never zero.
We claim that . Granting this, we have
where (a) comes from (3.46), (3.47), (3.48) and the definition of and in (3.49), (b) holds thanks to the definition of . The above display contradicts (3.24) with in place of and hence this case cannot happen. Therefore, we obtain that with . Together with [23, Theorem 3.10], we deduce that (3.45) holds.
Now it remains to show that . Since is lower semi-continuous and never zero, it suffices to show that . (Footnote: Suppose that . Then, there exists a sequence such that . If is unbounded, we can find a subsequence such that and holds, which would contradict . So must be bounded and, passing to a subsequence, we may assume it converges to some . By lower semicontinuity, we have . However, is always positive, so this cannot happen either. Therefore, implies .)
To this end, consider a sequence such that and
then there exists at least one such that . Consequently, and , and so both and tend to . Passing to a subsequence, we can assume that for each , exists and we can split into three parts:
-
(1)
, then , . Denote the set of indices of these components by where refers to “constant”.
For any , we have
Thus, there exists a constant such that for all sufficiently large and all ,
-
(2)
, then , . Denote the set of indices of these components by .
-
(3)
, then , . Denote the set of these components by . We have , since otherwise .
For any , we notice that
Thus, for all sufficiently large and all ,
Combining the above three cases, we obtain
(3.52) |
where (a) comes from the fact that
which holds because the numerator tends to 0 while the denominator tends to infinity. ∎
Remark 3.16 (Tightness of (3.45)).
Let be defined as in Proposition 3.1(a) and
so that , and there exists such that . Then we have
For the sake of notational simplicity, we denote , then
(3.53) |
Considering the Taylor expansion of with respect to at , we have
(3.54) |
Next, upon invoking the definitions of and (see (3.9) and (3.10), respectively), we can see that
For the sake of brevity, we denote and . Then we have
Noting
we know that attains its minimum at , which is larger than 0 for sufficiently large (or, equivalently, sufficiently small ).
Next we move towards the analysis of . Considering the Taylor expansions of and with respect to at , we have that
and
Then, for all sufficiently large ,
(3.55) |
Next we show . Suppose that ; then and . (Footnote: Note that this quadratic in has real roots because ; see the discussions following (3.4).) This implies that and hence thanks to . Therefore, we can see that because either or .
By Theorem 3.15, we have the following one-step facial residual function for and .
Corollary 3.17.
Let with and such that . Let be as in (3.23) with and . Then the function defined by
is a one-step facial residual function for and .
3.3 Error bounds
In this subsection, we combine all the previous analysis to deduce the error bound concerning (Feas) with . We proceed as follows.
We consider (Feas) with and we suppose (Feas) is feasible. We also let , where we recall that denotes the distance to the PPS condition, i.e., the minimum number of facial reduction steps necessary to find a face such that and satisfy the PPS condition; see [26, Section 2.4.1].
In particular, invoking [26, Proposition 5], there exists a chain of faces
(3.57) |
together with satisfying the following properties:
-
(a)
For all we have
-
(b)
and and satisfy the PPS condition.
In order to get the final error bound for (Feas) we aggregate the one-step facial residual functions for each of and using the recipe described in [23, Theorem 3.8].
So far, we have only computed facial residual functions for and , but we also need the ones for the other and . Fortunately, thanks to the facial structure of , if , then must be a face of the form or (see (3.11) and (3.12)). This is because all other possibilities correspond to non-exposed faces or faces of dimension (for which the PPS condition is automatically satisfied).
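Before proceeding, we recall a standard fact about faces of the positive semidefinite cone (stated here for the reader's convenience): every face of $\mathbb{S}^d_+$ has the form
$$\left\{X \in \mathbb{S}^d_+ \;:\; \operatorname{range}(X) \subseteq \operatorname{range}(U)\right\} = U\, \mathbb{S}^r_+\, U^\top$$
for some matrix $U \in \mathbb{R}^{d \times r}$ with orthonormal columns, and is therefore linearly isomorphic to $\mathbb{S}^r_+$.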
and are symmetric cones [14, 15] since they are linearly isomorphic to a direct product of and a face of a positive semidefinite cone (which are symmetric cones in their own right, e.g., [26, Proposition 31]). The conclusion is that for the faces “down the chain” we can compute the one-step facial residual functions using the general result for symmetric cones given in [26, Theorem 35]. We note this as a lemma.
Lemma 3.18.
Let be a face of . Let . Then, there exists a constant such that the function
is a one-step facial residual function for and .
Proof.
Follows by invoking [26, Theorem 35] with , and . ∎
We are now positioned to prove our main result in this paper.
Theorem 3.19 (Error bounds for (Feas) with ).
Consider (Feas) with . Suppose (Feas) is feasible and let and consider a chain of faces as in (3.57). Then and the following items hold:
-
(i)
If , then (Feas) satisfies a Lipschitzian error bound.
-
(ii)
If , we have or or or or .
-
(iii)
If we have or is of form . Then, an error bound with residual function holds, where and
(3.58)
Proof.
Following the discussion so far, if , it is because or is of the form . Also, as remarked previously, in this case, is a symmetric cone that is a direct product of a polyhedral cone (of rank at most ) and a symmetric cone of rank at most . Considering the conic feasibility problem with , it follows from [26, Proposition 24, Remark 39] that
Hence, by adding the first facial reduction step to get , we obtain the bound on . Next, we examine the possibilities for .
∎
From Theorem 3.19 we see the presence of non-Hölderian behaviour in the cases of entropic and logarithmic error bounds. A similar phenomenon was observed in the study of error bounds for the exponential cone, see [23, Section 4.4]. The analysis of convergence rates of algorithms under non-Hölderian error bounds is still a challenge (see [25, Sections 5 and 6]) and is thus another interesting test bed for research ideas on this topic.
References
- [1] S. D. Ahipasaoglu, P. Sun, and M. J. Todd. Linear convergence of a modified Frank-Wolfe algorithm for computing minimum-volume enclosing ellipsoids. Optimization Methods and Software, 23(1):5–19, 2008.
- [2] C. L. Atwood. Optimal and efficient designs of experiments. Annals of Mathematical Statistics, 40:1570–1602, 1969.
- [3] G. P. Barker and D. Carlson. Cones of diagonally dominant matrices. Pacific Journal of Mathematics, 57(1):15 – 32, 1975.
- [4] S. Bartels, W. Boomsma, J. Frellsen, and D. Garreau. Kernel-matrix determinant estimates from stopped Cholesky decomposition. Journal of Machine Learning Research, 24:71:1–71:57, 2023.
- [5] H. H. Bauschke and J. M. Borwein. On projection algorithms for solving convex feasibility problems. SIAM Review, 38(3):367–426, 1996.
- [6] H. H. Bauschke, J. M. Borwein, and W. Li. Strong conical hull intersection property, bounded linear regularity, Jameson’s property (G), and error bounds in convex optimization. Mathematical Programming, 86(1):135–160, Sep 1999.
- [7] J. M. Borwein and H. Wolkowicz. Regularizing the abstract convex program. Journal of Mathematical Analysis and Applications, 83(2):495 – 530, 1981.
- [8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, 2004.
- [9] C. Coey, L. Kapelevich, and J. P. Vielma. Solving natural conic formulations with Hypatia.jl. INFORMS Journal on Computing, 34:2686–2699, 2022.
- [10] C. Coey, L. Kapelevich, and J. P. Vielma. Conic optimization with spectral functions on Euclidean Jordan algebras. Mathematics of Operations Research, 48(4):1906–1933, 2023.
- [11] C. Coey, L. Kapelevich, and J. P. Vielma. Performance enhancements for a generic conic interior point algorithm. Mathematical Programming Computation, 15(1):53–101, 2023.
- [12] A. d’Aspremont, O. Banerjee, and L. El Ghaoui. First-order methods for sparse covariance selection. SIAM Journal on Matrix Analysis and Applications, 30(1):56–66, 2008.
- [13] A. P. Dempster. Covariance selection. Biometrics, 28(1):157–175, 1972.
- [14] J. Faraut and A. Korányi. Analysis on Symmetric Cones. Clarendon Press, Oxford, 1994.
- [15] L. Faybusovich. Several Jordan-algebraic aspects of optimization. Optimization, 57(3):379–393, 2008.
- [16] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3):432–441, 2008.
- [17] D. Henrion and J. Malick. Projection methods for conic feasibility problems: applications to polynomial sum-of-squares decompositions. Optimization Methods and Software, 26(1):23–46, 2011.
- [18] A. J. Hoffman. On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards, 49(4):263–265, 1952.
- [19] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1990.
- [20] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2-3):123–286, 2012.
- [21] A. S. Lewis and J.-S. Pang. Error bounds for convex inequality systems. In Generalized Convexity, Generalized Monotonicity: Recent Results, pages 75–110. Springer, US, 1998.
- [22] Y. Lin, S. B. Lindstrom, B. F. Lourenço, and T. K. Pong. Generalized power cones: optimal error bounds and automorphisms. To appear in SIAM Journal on Optimization, 2023. arXiv:2211.16142.
- [23] S. B. Lindstrom, B. F. Lourenço, and T. K. Pong. Error bounds, facial residual functions and applications to the exponential cone. Mathematical Programming, 200:229–278, 2023.
- [24] S. B. Lindstrom, B. F. Lourenço, and T. K. Pong. Optimal error bounds in the absence of constraint qualifications with applications to the $p$-cones and beyond. arXiv preprint, 2021. arXiv:2109.11729.
- [25] T. Liu and B. F. Lourenço. Convergence analysis under consistent error bounds. Foundations of Computational Mathematics, pages 1–51, 2022.
- [26] B. F. Lourenço. Amenable cones: error bounds without constraint qualifications. Mathematical Programming, 186:1–48, 2021.
- [27] B. F. Lourenço, M. Muramatsu, and T. Tsuchiya. Facial reduction and partial polyhedrality. SIAM Journal on Optimization, 28(3):2304–2326, 2018.
- [28] B. F. Lourenço, V. Roshchina, and J. Saunderson. Amenable cones are particularly nice. SIAM Journal on Optimization, 32(3):2347–2375, 2022.
- [29] Z.-Q. Luo and P. Tseng. Error bounds and convergence analysis of feasible descent methods: a general approach. Annals of Operations Research, 46(1):157–178, 1993.
- [30] MOSEK ApS. MOSEK Modeling Cookbook Release 3.3.0, 2022. URL: https://docs.mosek.com/modeling-cookbook/index.html.
- [31] Y. Nesterov and A. Nemirovskii. Interior-point Polynomial Algorithms in Convex Programming. SIAM, Philadelphia, 1994.
- [32] J.-S. Pang. Error bounds in mathematical programming. Mathematical Programming, 79(1):299–332, 1997.
- [33] G. Pataki. Strong duality in conic linear programming: Facial reduction and extended duals. In Computational and Analytical Mathematics, volume 50, pages 613–634. Springer, New York, 2013.
- [34] C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, MA, 2005.
- [35] R. T. Rockafellar. Convex Analysis. Princeton University Press, New Jersey, 1997.
- [36] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Chapman and Hall/CRC, New York, 2005.
- [37] J. F. Sturm. Error bounds for linear matrix inequalities. SIAM Journal on Optimization, 10(4):1228–1248, 2000.
- [38] M. J. Todd. Minimum-volume Ellipsoids: Theory and Algorithms. SIAM, Philadelphia, 2016.
- [39] S. Van Aelst and P. Rousseeuw. Minimum volume ellipsoid. Wiley Interdisciplinary Reviews: Computational Statistics, 1(1):71–82, 2009.
- [40] H. Waki and M. Muramatsu. Facial reduction algorithms for conic optimization problems. Journal of Optimization Theory and Applications, 158(1):188–215, 2013.
- [41] S. Yang, Z. Lu, X. Shen, P. Wonka, and J. Ye. Fused multiple graphical lasso. SIAM Journal on Optimization, 25(2):916–943, 2015.
- [42] N. Zhang, Y. Zhang, D. Sun, and K.-C. Toh. An efficient linearly convergent regularized proximal point algorithm for fused multiple graphical Lasso problems. SIAM Journal on Mathematics of Data Science, 3(2):524–543, 2021.
- [43] Z. Zhou and A. M.-C. So. A unified approach to error bounds for structured convex optimization problems. Mathematical Programming, 165(2):689–728, Oct 2017.