A Unified Convergence Analysis of a Second-Order Method of Multipliers for Nonlinear Conic Programming
Abstract
In this paper, we provide a unified convergence analysis of a second-order method of multipliers (i.e., a second-order augmented Lagrangian method) for solving conventional nonlinear conic optimization problems. Specifically, the algorithm that we investigate incorporates a specially designed nonsmooth (generalized) Newton step to furnish a second-order update rule for the multipliers. We first show, in a unified fashion, that under a few abstract assumptions the proposed method is locally convergent and possesses a (non-asymptotic) superlinear convergence rate, even when the penalty parameter is fixed and/or the strict complementarity fails. Subsequently, we demonstrate that, for the three typical scenarios, i.e., classic nonlinear programming, nonlinear second-order cone programming, and nonlinear semidefinite programming, these abstract assumptions are exactly implied by the standard sufficient conditions that have been used to establish the Q-linear convergence rate of the method of multipliers without assuming strict complementarity.
Keywords. second-order method of multipliers, augmented Lagrangian method, convergence rate, generalized Newton method, second-order cone programming, semidefinite programming
AMS Subject Classification [2000]. Primary 65K05, 49J52; Secondary 90C22, 26E25
1 Introduction
Let and be two finite-dimensional real Hilbert spaces each endowed with an inner product and its induced norm . Consider the following general nonlinear conic optimization problem
(1.1) |
where , and are three twice continuously differentiable functions, and is a closed convex self-dual cone in , i.e., it holds that
Let be the given penalty parameter. The augmented Lagrangian function of problem (1.1) is defined by
(1.2) |
where denotes the metric projection operator onto .
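Since the display (1.2) did not survive extraction, we record one common form of the augmented Lagrangian for problems of the type (1.1), written in our own (assumed) notation: minimize f(x) subject to h(x) = 0 and g(x) ∈ K, with multipliers (ξ, λ), penalty parameter σ > 0, and Π_K the projection onto K (cf. [51, 29]). The paper's actual symbols and sign conventions may differ.

```latex
% A hedged reconstruction of (1.2) in assumed notation; not a verbatim restatement.
\mathcal{L}_{\sigma}(x,\xi,\lambda)
  := f(x) + \langle \xi, h(x)\rangle + \frac{\sigma}{2}\,\|h(x)\|^{2}
   + \frac{1}{2\sigma}\Bigl(\bigl\|\Pi_{K}\bigl(\lambda-\sigma g(x)\bigr)\bigr\|^{2}
   - \|\lambda\|^{2}\Bigr).
```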
The method of multipliers (also known as the augmented Lagrangian method) for solving problem (1.1) can be stated, in brief, as follows. Let be a given constant and be the initial estimate of the multipliers. For , determine that minimizes the augmented Lagrangian function with respect to , and then update the multipliers via
(1.3) |
thereafter, update the penalty parameter such that .
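To make the scheme above concrete, here is a minimal Python sketch of the classic (first-order) method of multipliers under the same assumed notation; `min_aug_lagrangian`, `h`, `g` and `proj_K` are hypothetical placeholders (they do not appear in the paper), and the multiplier update mirrors (1.3) under the sign convention assumed above.

```python
import numpy as np

def alm_first_order(x0, xi0, lam0, sigma0, h, g, proj_K,
                    min_aug_lagrangian, rho=2.0, tol=1e-10, max_iter=100):
    """Schematic first-order method of multipliers for (1.1).

    min_aug_lagrangian(x_init, xi, lam, sigma) is assumed to return an
    (approximate) minimizer of the augmented Lagrangian in x.
    """
    x, xi, lam, sigma = x0, xi0, lam0, sigma0
    for k in range(max_iter):
        # Step 1: minimize the augmented Lagrangian with respect to x.
        x = min_aug_lagrangian(x, xi, lam, sigma)
        # Step 2: first-order multiplier update, cf. (1.3).
        xi = xi + sigma * h(x)                  # equality-constraint multiplier
        lam = proj_K(lam - sigma * g(x))        # conic-constraint multiplier
        # Stop once a simple KKT-type residual is small enough.
        residual = np.linalg.norm(h(x)) \
            + np.linalg.norm(lam - proj_K(lam - sigma * g(x)))
        if residual < tol:
            break
        # Step 3: optionally increase the (nondecreasing) penalty parameter.
        sigma = rho * sigma
    return x, xi, lam
```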
This method was initiated in Hestenes [23] and Powell [37] for solving equality constrained nonlinear programming (NLP) problems, and was generalized in Rockafellar [43] to NLP problems with inequality constraints. Nowadays, the augmented Lagrangian method serves as the backbone of many optimization solvers, such as [17, 55, 53, 28], which are readily available to both academic and commercial users. The literature on the augmented Lagrangian method is vast, so we only review the references on analyzing and improving its convergence rate, which are the most relevant to this work.
The linear convergence rates of the method of multipliers for NLP problems have been extensively discussed in the literature, such as Powell [37], Rockafellar [43, 44], Tretyakov [52], Bertsekas [2], Conn [16], Ito and Kunisch [24], and Contesse-Becker [18], under various assumptions. One may refer to the monograph of Bertsekas [5] and the references therein for a thorough discussion of the augmented Lagrangian method for NLP problems. For nonlinear second-order cone programming (NLSOCP) problems, the local linear convergence rate of the augmented Lagrangian method was analyzed in Liu and Zhang [29], and the corresponding results were further improved in Liu and Zhang [30]. For nonlinear semidefinite programming (NLSDP) problems, the local linear rate of convergence was analyzed by Sun et al. [51], and, importantly, the strict complementarity is not necessary for this result. We should mention that, in the vast majority of the above-mentioned references, the convergence rates of the method of multipliers could be asymptotically superlinear, provided that the penalty parameter sequence tends to infinity as .
In the seminal work [45] of Rockafellar, it was observed that, in the convex programming setting, the method of multipliers can be viewed as a proximal point algorithm applied to the dual. On the other hand, without assuming convexity, one can locally interpret the iteration procedure (1.3) for updating the multipliers as a first-order steepest ascent step applied to a certain concave (dual) problem. Consequently, given the linear convergence rates of the method of multipliers mentioned above, it is natural to ask whether one can design a second-order method of multipliers, in the sense that a second-order update form of the multipliers, instead of (1.3), is incorporated in the method of multipliers for calculating and . Meanwhile, it is also interesting to know whether the corresponding superlinear convergence rates are achievable, even when the sequence is bounded and/or the strict complementarity fails.
In fact, several attempts have been made over the years to resolve this question. For NLP problems without inequality constraints, Buys [11] first introduced a second-order procedure for updating the multipliers, which can be viewed as an application of the classic Newton method to a certain dual problem. Such a procedure was later investigated and refined by Bertsekas [3, 4, 5]. Subsequently, Brusch [9] and Fletcher [19] independently proposed updating rules for the multipliers via the quasi-Newton method, and this was also studied in Fontecilla et al. [20]. For NLP problems with inequality constraints, the augmented Lagrangian function is no longer twice continuously differentiable. Therefore, it is much more difficult to design a second-order method of multipliers for these problems. The first progress in this direction was made by Yuan [54], in which the updating rule is based on the classic Newton method, but a corresponding convergence result has not been formally provided. More recently, Chen and Liao [13] proposed a second-order augmented Lagrangian method for solving NLP problems with inequality constraints. This algorithm is based on a specially designed generalized Newton method, and the corresponding local convergence, as well as the superlinear convergence rate, was established. At the same time, an extensive numerical study of the second-order method of multipliers has been conducted for convex quadratic programming [10], and the numerical results suggest that the second-order update formula for the multipliers, at least locally, can be much better than the usual first-order update rule [10, Section 6].
Motivated by the references mentioned above, in this paper we aim to extend the second-order method of multipliers in [13] from NLP problems with inequality constraints to the more general conic optimization cases, i.e., the more complicated NLSOCP and NLSDP problems. We should emphasize that, as can be observed later, such an extension is highly nontrivial, because the required variational analysis for the non-polyhedral conic constraints is much more intricate than that for the NLP case, in which is a polyhedral convex set.
In the same vein as [13], we first use a specially designed generalized Newton method to deal with the nonsmoothness induced by the conic constraints, so as to construct a unified second-order method of multipliers for general nonlinear conic programming problems. We analyze its convergence (local convergence with a superlinear rate, even if the strict complementarity fails and the penalty parameter is fixed) under certain abstract assumptions. After that, for the NLP, NLSOCP and NLSDP cases, we show separately that these abstract assumptions are implied by certain second-order optimality conditions plus a constraint qualification, which have been frequently used for establishing the Q-linear convergence rate of the method of multipliers. Furthermore, when the conic constraints vanish, the algorithm studied in this paper reduces to the classic second-order method of multipliers for NLP problems with only equality constraints, and the convergence results in this paper are consistent with all the previous related studies.
The remaining parts of this paper are organized as follows. In Section 2, we prepare the necessary preliminaries from the nonsmooth analysis literature for further discussions. In Section 3, we propose a second-order method of multipliers for solving problem (1.1), and establish its convergence under certain abstract assumptions. In Sections 4, 5 and 6, we separately specify the abstract assumptions of Section 3 for the NLP, NLSOCP and NLSDP cases, which constitutes the main contribution of this paper. Section 7 presents a simple numerical example to illustrate the superlinear convergence of the method studied in this work. We conclude this paper in Section 8.
2 Notation and preliminaries
2.1 Notation
Let be an arbitrarily given finite-dimensional real Hilbert space endowed with an inner product that induces the norm . We use to denote the linear space of all the linear operators from to itself. For any given , we define the open -neighborhood in by
Let be a nonempty closed (not necessarily convex) set in and denote the lineality space of , i.e., the largest linear space contained in . We use to denote the affine hull of , i.e., the smallest affine subspace of that contains . Moreover, we define the distance from a point in to the set by
We use and to denote the tangent cone and contingent cone of the set at , respectively, which are defined by
and
If, additionally, is a convex set, one has that , and such tangent and contingent cones are simply called the tangent cone of at . In this case, we define the projection operator onto the set by
Let and be the other two given finite-dimensional real Hilbert spaces each endowed with an inner product and its induced norm . For any set , we define
Let be an arbitrarily given function. The directional derivative of at along a nonzero direction , if it exists, is denoted by . is said to be directionally differentiable at if exists for any nonzero . If is a linear mapping, we use to denote its adjoint. If is Fréchet-differentiable at , we use to denote the Fréchet-derivative of at this point and, in this case, we denote
Let be an open set in . If is locally Lipschitz continuous on , then is almost everywhere Fréchet-differentiable in , thanks to [46, Definition 9.1] and Rademacher's theorem [46, Theorem 9.60]. In this case, we denote the set of all the points in where is Fréchet-differentiable by . Then, the Bouligand-subdifferential of at , denoted by , is defined by
Moreover, Clarke's generalized Jacobian of at is defined by
i.e., the convex hull of .
Let be an arbitrarily given function. The partial Fréchet-derivative of with respect to at , if it exists, is denoted by , and we define . Furthermore, if is twice Fréchet-differentiable at , we denote
2.2 Nonsmooth Newton methods
Recall that is a finite-dimensional real Hilbert space. Let be a nonempty open set, let be a locally Lipschitz continuous function, and consider the following nonlinear equation
(2.1) |
Definition 2.1 (Semismoothness).
Suppose that is locally Lipschitz continuous. is said to be semismooth at a point if is directionally differentiable at and, for any sequence converging to with , where denotes Clarke's generalized Jacobian of at , it holds that
(2.2) |
Moreover, is said to be strongly semismooth at if is semismooth at but with (2.2) being replaced by .
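Because display (2.2) is garbled in this copy, we restate the standard semismoothness condition in our own (assumed) notation, with F the function, y the reference point, d the perturbation and ∂F the Clarke generalized Jacobian; this follows the usual definition in [38, 50] rather than reproducing (2.2) verbatim.

```latex
% Semismoothness of F at y (assumed notation):
F(y+d) - F(y) - V d = o(\|d\|)
  \quad \text{for all } V \in \partial F(y+d), \ d \to 0.
% Strong semismoothness: the right-hand side is strengthened to O(\|d\|^{2}).
```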
To solve the nonlinear equation (2.1), the semismooth Newton method, as an instance of nonsmooth (generalized) Newton methods, requires a surrogate in for the Jacobian used in the classic Newton’s method. For this purpose, one can prescribe a multi-valued mapping and make the following assumption.
Assumption 2.1.
The mapping satisfies
(a) for any , is a nonempty and compact subset of ;
(b) the mapping is outer semicontinuous (the definition of outer semicontinuity can be found in Rockafellar [46, Definition 5.4]);
(c) , .
Here, is similarly defined as in [15, Proposition 2.6.2], i.e., by decomposing with each being a finite-dimensional Hilbert space, one can write , so that one can define
Based on the mapping defined above, the semismooth Newton method for solving (2.1) can be stated as follows: Let be the given initial point. For ,
1. choose ;
2. compute the generalized Newton direction via solving the linear system ;
3. set .
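The three steps above can be summarized by the following minimal Python sketch; `F` and `pick_V` (a routine returning one element of the prescribed multi-valued mapping at the current point) are hypothetical placeholders.

```python
import numpy as np

def semismooth_newton(F, pick_V, y0, tol=1e-12, max_iter=50):
    """Schematic semismooth Newton method for solving F(y) = 0.

    pick_V(y) is assumed to return one linear operator (matrix) from the
    prescribed multi-valued mapping at y, cf. Assumption 2.1.
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        Fy = F(y)
        if np.linalg.norm(Fy) < tol:
            break
        V = pick_V(y)                  # step 1: choose a generalized Jacobian
        d = np.linalg.solve(V, -Fy)    # step 2: generalized Newton direction
        y = y + d                      # step 3: full (local) Newton step
    return y
```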
Remark 2.2.
The notion of semismoothness was first introduced in Mifflin [32] for functionals, and was later extended in Qi and Sun [38] to vector-valued functions. Here we adopt the definition for semismoothness from Sun and Sun [50, Definition 3.6], which was derived based on Pang and Qi [36, Proposition 1]. The semismooth Newton method was first proposed in Kummer [27, 26], and was further extended by Qi and Sun [38], Qi [39], and Sun and Han [49]. The following theorem gives the convergence properties of the above method.
Theorem 2.3 ([38, Theorem 3.2]).
Let be a solution to the nonlinear equation (2.1). Suppose that Assumption 2.1 holds and is nonsingular, i.e., all are nonsingular, and is semismooth at . Then, there exists a positive number such that for any , the sequence generated by the semismooth Newton method is well-defined and converges to with a superlinear rate, i.e., for all sufficiently large , it holds that
Furthermore, if F is strongly semismooth at , then the convergence rate is Q-quadratic, i.e., for all sufficiently large , it holds that .
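The rate displays in Theorem 2.3 did not survive extraction; in our own (assumed) notation, with y^k the iterates and y^* the solution, they read as follows.

```latex
\|y^{k+1}-y^{*}\| = o\bigl(\|y^{k}-y^{*}\|\bigr) \quad \text{(superlinear rate)},
\qquad
\|y^{k+1}-y^{*}\| = O\bigl(\|y^{k}-y^{*}\|^{2}\bigr) \quad \text{(Q-quadratic rate)}.
```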
As can be observed from the proof of [38, Theorem 3.2], the nonsingularity of implies that is uniformly bounded and nonsingular in a neighborhood of . Meanwhile, the semismoothness of guarantees that, for any sequence converging to with being sufficiently large,
(2.3) |
In fact, from the proof of [38, Theorem 3.2], one can also see that, even if the mapping fails to satisfy Assumption 2.1(c), the semismooth Newton method is well-defined and convergent if, additionally, (2.3) holds. Consequently, a much broader algorithmic framework of nonsmooth (generalized) Newton methods emerges beyond the semismooth Newton method. This observation constitutes the basis for designing the second-order method of multipliers in the subsequent section. For the convenience of further discussions, we summarize the facts mentioned above as the following proposition.
Proposition 2.4.
Let be a solution to the nonlinear equation (2.1). Suppose that Assumption 2.1(a) and Assumption 2.1(b) hold, and (2.3) holds for any sequence converging to . If is nonsingular and is semismooth at , then there exists a positive number such that for any , the sequence generated by the semismooth Newton method is well-defined and converges to with a superlinear rate.
3 A second-order method of multipliers
In this section, we propose a second-order method of multipliers for solving problem (1.1). This method is essentially a specially designed nonsmooth (generalized) Newton method, which is based on Proposition 2.4. Throughout this section, we make the blanket assumption that the projection operator is strongly semismooth, as this will be the case for all three classes of problems studied in the forthcoming three sections.
The Lagrangian function of problem (1.1) is defined by
(3.1) |
A vector is called a stationary point of problem (1.1) if there exists a vector such that is a solution to the following Karush-Kuhn-Tucker (KKT) system:
(3.2) |
In this case, the vector is called a Lagrange multiplier at . We denote the set of all the Lagrange multipliers at by , which is an empty set if fails to be a stationary point.
Let be a given constant and be a given stationary point of problem (1.1) with the Lagrange multipliers . We define the mapping
(3.3) |
We make the following abstract assumption for the given stationary point .
Assumption 3.1.
(i) is the unique Lagrange multiplier;
(ii) there exist two positive constants and such that for any ,
where the linear operator is defined by (3.3).
The following result was given in Sun et al. [51, Proposition 1].
Proposition 3.1.
Suppose that Assumption 3.1 holds and is fixed. Then, there exist two constants and , both depending on , and a locally Lipschitz continuous function defined on by
(3.4) |
such that:
(a) is semismooth at any point in ;
(b) for any and , it holds that every element in is positive definite;
(c) for any , is the unique optimal solution to
If Assumption 3.1 holds and is fixed, we can also fix the two parameters and such that the conclusions (a)–(c) in Proposition 3.1 hold. In this case, we can define the real-valued function
(3.5) |
Obviously, is concave on its domain and
Moreover, the following result can be obtained from Proposition 3.1, and one may refer to Sun et al. [51, Proposition 2] for the detailed proof.
Lemma 3.2.
Now we start to propose our second-order method of multipliers for solving problem (1.1). Suppose that we have obtained a certain , which we denote by with the index for convenience. Then, the procedure in (1.3) to get with can be reformulated as
which can be interpreted as a (first-order) gradient ascent step at for the purpose of maximizing . Thus, it is natural to consider whether one can introduce a second-order update rule for the multipliers based on this (first-order) gradient information. Obviously, the classical Newton method cannot be applied directly, due to the nonsmoothness induced by the projection . However, even though the function is semismooth on (according to Lemma 3.2), the semismooth Newton method introduced in Section 2.2 is still not directly applicable, because is a composite function and it is highly difficult to find a mapping such that
(3.7) |
Fortunately, thanks to Proposition 2.4, one can still hope to design a second-order method even if (3.7) fails. Next, we design a specific nonsmooth Newton method to obtain a second-order method of multipliers for problem (1.1).
For this purpose, we suppose that Assumption 3.1 holds and is fixed, so that we can define the functions
(3.8) |
and
(3.9) |
where both the constant and the function are specified by Proposition 3.1. In this case, we can also define the following set-valued mapping (from to )
(3.10) |
where denotes the identity operator in . According to (3.2) and (3.3), one has that
(3.11) |
The following result on the properties of the mapping defined above is crucial to the forthcoming algorithmic design. Its proof is given in Appendix A, which is mainly based on the work of [13].
Proposition 3.3.
Based on the above discussions, we are able to present the following second-order method of multipliers.
Remark 3.4.
(a) Algorithm 1 is essentially an application of a nonsmooth Newton method for (locally) maximizing the function , or equivalently, for solving the nonlinear equation . In the second step of Algorithm 1, the linear operator for computing can be explicitly obtained for the NLP, NLSOCP and NLSDP problems. Consequently, one can explicitly compute via (3.9) and the linear operator can be calculated via (3.10).
(b) Generally, it is hard to check whether . Consequently, Algorithm 1 cannot be regarded as an instance of the classic semismooth Newton method introduced in Section 2.2. We should mention that it is still uncertain whether one can explicitly find a linear operator such that or . However, as will be shown later, is sufficient for our purpose.
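As a rough illustration of the iteration just described, one outer step of Algorithm 1 may be sketched in Python as follows; `min_aug_lagrangian`, `grad_theta` (returning the gradient of the dual-type function in (3.5)/(3.6)) and `pick_V` (returning one element of the mapping in (3.10)) are hypothetical placeholders, not the paper's actual routines.

```python
import numpy as np

def second_order_multiplier_step(x, lam, sigma, min_aug_lagrangian,
                                 grad_theta, pick_V):
    """One schematic outer iteration of a second-order method of multipliers.

    Instead of the first-order update (1.3), the multiplier is moved by a
    generalized Newton step on the (concave) dual-type function theta.
    """
    # Step 1: minimize the augmented Lagrangian in x for the current multiplier.
    x_new = min_aug_lagrangian(x, lam, sigma)
    # Step 2: first-order (gradient) information of theta at the current multiplier.
    g = grad_theta(x_new, lam, sigma)
    # Step 3: pick one generalized second-order operator V from (3.10) and take
    # a generalized Newton step: solve V d = -g, then set lam <- lam + d.
    V = pick_V(x_new, lam, sigma)
    d = np.linalg.solve(V, -g)
    return x_new, lam + d
```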
To analyze the convergence properties of Algorithm 1, we make the following abstract assumption as a supplementary to Assumption 3.1.
Assumption 3.2.
For any , any linear operator satisfies .
The following result is essential to the convergence analysis of Algorithm 1. The proof of this theorem is given in Appendix B.
Theorem 3.5.
Now we establish the local convergence properties of Algorithm 1.
Theorem 3.6.
Proof.
According to Lemma 3.2, the function defined by (3.6) is semismooth. Meanwhile, the mapping defined in (3.10) is locally nonempty, compact-valued and outer semicontinuous in a neighborhood of . Then, by Proposition 3.1 and Proposition 3.3 we know that Algorithm 1 is well-defined. Consequently, the theorem follows from Theorem 3.5 and Proposition 2.4. This completes the proof. ∎
We make the following remark on the abstract assumptions used in Theorem 3.6.
Remark 3.7.
Theorem 3.6 has established the local convergence, as well as the superlinear convergence rate, of the second-order method of multipliers (Algorithm 1). However, at first glance, both Assumptions 3.1 and 3.2 appear too abstract to be appreciated. Therefore, it is of great importance to translate them into concrete conditions. In the following three sections, we show separately that, for the three typical scenarios, i.e., the NLP, the NLSOCP, and the NLSDP, Assumptions 3.1 and 3.2 are the consequence of a certain constraint qualification together with a second-order sufficient condition, and the corresponding analysis constitutes the main contribution of this paper. As can be observed later, these concrete conditions are exactly the prerequisites that have been frequently used for establishing the local Q-linear convergence rate of the method of multipliers without requiring strict complementarity.
Before finishing this section, we would like to make the following remark on Algorithm 1.
Remark 3.8.
In the conventional problem settings of NLP, NLSOCP and NLSDP, even the classic augmented Lagrangian method is not necessarily globally convergent. Moreover, Algorithm 1 is based on a specially designed generalized Newton method obtained by regarding the update of multipliers in the augmented Lagrangian method as a first-order step. Since second-order approaches often require stronger conditions to guarantee even the local convergence property, such a method is not globally convergent in general when its first-order counterpart is not. Consequently, the best possible result for the multiplier sequence generated by Algorithm 1 is local convergence. Moreover, as the convergence rate study relies on a certain local error bound (it is not realistic in general to expect a global error bound), the corresponding convergence rate results are also local ones.
4 The NLP case
Consider the NLP problem
(4.1) |
where , and are three twice continuously differentiable functions. Problem (4.1) is an instance of problem (1.1) with , and . For convenience, we define the two index sets and . The following two definitions are well-known [33] for NLP problem (4.1).
Definition 4.1 (LICQ).
Let be a point such that and . We say that the linear independence constraint qualification (LICQ) holds at if the set of active constraint gradients
is linearly independent.
Let be a solution to the KKT system (3.2). The critical cone at for is defined by
Definition 4.2 (Second-order sufficient condition for NLP).
Let be a stationary point of problem (4.1). We say that the second-order sufficient condition holds at if there exists such that
The following definition was introduced in Robinson [40].
Definition 4.3 (Strong second-order sufficient condition for NLP).
Let be a stationary point of problem (4.1). We say that the strong second-order sufficient condition holds at if there exists such that
where is the affine hull of .
The following proposition is the main result of Chen and Liao [13].
5 The NLSOCP case
In this section, we consider the following NLSOCP problem
(5.1) |
where , , and are three twice continuously differentiable functions, with each , being a second-order cone in , i.e.,
Note that . Problem (5.1) is an instance of problem (1.1) with and . Moreover, for convenience, we can partition the function as with each such that is real-valued. In the rest of this section, we first introduce some preliminary results related to NLSOCP problems. After that, we present the main results of this section, and discuss a special case.
5.1 Preliminaries on NLSOCP
For each , we use and to denote the interior and the boundary of the second-order cone , respectively. According to [21, Proposition 3.3], the projection operator onto the cone takes the form of , with
where
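For concreteness, the well-known closed-form projection onto a single second-order cone {(t, z) : t ≥ ‖z‖} can be sketched in Python as follows; this is a hedged illustration of the formula from [21, Proposition 3.3] in our own notation, not a verbatim restatement.

```python
import numpy as np

def proj_soc(v):
    """Projection of v = (t, z) onto the second-order cone {(t, z): t >= ||z||}."""
    t, z = v[0], v[1:]
    nz = np.linalg.norm(z)
    if nz <= t:            # v already lies in the cone
        return v.copy()
    if nz <= -t:           # v lies in the polar cone: the projection is the origin
        return np.zeros_like(v)
    # Otherwise, project onto the boundary ray determined by z.
    alpha = (t + nz) / 2.0
    out = np.empty_like(v)
    out[0] = alpha
    out[1:] = alpha * (z / nz)
    return out
```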
It has been established in Chen et al. [14, Proposition 4.3] (and also in Hayashi et al. [22, Proposition 4.5]) that the projection operator is strongly semismooth. Moreover, the Bouligand-subdifferential of this projection operator has been discussed in Pang et al. [35, Lemma 14], Chen et al. [12, Lemma 4] and Outrata and Sun [34, Lemma 1]. The following result directly follows from [25, Lemmas 2.5 & 2.6].
Lemma 5.1.
For any given point , each element is a block diagonal matrix , with each taking one of the following representations:
(a)
(b)
(c)
(d)
According to [6, Lemma 25], the tangent cone of at , with each , , is given by , where
Consequently, the corresponding lineality space of takes the following form
(5.2) |
with
The following definition of constraint nondegeneracy for NLSOCP problems follows from [7, Section 2]. (The name "constraint nondegeneracy" was coined by Robinson in [42]; for NLP problems, constraint nondegeneracy is exactly the LICQ, cf. Robinson [41] and Shapiro [47].)
Definition 5.2 (Constraint nondegeneracy).
We say that the constraint nondegeneracy holds at a feasible point to the NLSOCP problem (5.1) if
(5.3) |
Let be a stationary point of the NLSOCP problem (5.1) with . The critical cone of NLSOCP at is defined, e.g., in [8, 6], by
where is the orthogonal complement of in . The explicit formulation of can be found in [6, Corollary 26]. Moreover, as in [6, (47)], for any , one has that and for all ,
Hence, one has that
(5.4) |
The following definition of the strong second-order sufficient condition for the NLSOCP problem comes from Bonnans and Ramírez [6].
Definition 5.3 (Strong second-order sufficient condition for NLSOCP).
Let be a stationary point of the NLSOCP problem (5.1). We say that the strong second-order sufficient condition holds at if there exists a vector such that
(5.5) |
where is a matrix with each , defined by
5.2 Main results
Based on the previous subsection, we are ready to establish the main result of this section.
Theorem 5.4.
Proof.
Assumption 3.1 directly follows from [6, Proposition 17] and [29, Proposition 10]. In the following, we show that Assumption 3.2 holds.
Let be fixed. Denote with each , and , . Since Assumption 3.1 holds, we know that for any , the linear operator defined in (3.3) is positive definite, so that by (3.11) we know that defined by (3.10) is nonempty. Suppose that there exists a certain such that the corresponding is not negative definite. From (3.10) we know that . Therefore, what we actually suppose is that is singular, i.e., there exist two vectors and such that
(5.6) |
Note that
(5.7) |
Meanwhile, from Lemma 5.1 we know that with each , . Moreover, it is easy to deduce from [31, Proposition 1] that . Hence, by using the fact that , one has from (5.6) and (5.7) that
(5.8) |
and
(5.9) |
Since the constraint nondegeneracy condition (5.3) holds at , there exist a vector and a vector with , , such that
Therefore, one can get from (5.8) and (5.9) that
(5.10) |
Since , one has from (5.2) that for any ,
(5.11) |
Now, for each , we separate our analysis into the following four cases.
Case 1. If , one has that so that . Therefore, by Lemma 5.1 (a) we know that . Hence, in this case, .
Case 2. If , one has that . Hence, also holds.
Case 3. If and , one has that . Hence, in this case, we know from Lemma 5.1 (c) that
If , one has that . Otherwise, one has that
(5.12) |
Note that , , and . Then, one can see from (5.11) that . Therefore, one can get from (5.12) that .
Case 4. If and , from [30, Lemma 2.2] (the proof of this lemma originally comes from [1]) we know that there exists a constant such that . Hence, in this case, it holds that
(5.13) |
Since , it holds that . Therefore, by using Lemma 5.1 (a), the equation (5.13), and the fact that , one can get that
(5.14) |
Therefore, it holds that
which, together with (5.9), implies that
If , the above equation implies that so that . Otherwise, one has that , and the above equation can be equivalently formulated as
One can reformulate the above equality as
(5.15) |
with
Then, in order for (5.15) to hold, its discriminant (viewing it as a quadratic polynomial in ) must satisfy , which implies that
As a result, it holds that . Note that, if , one has from (5.15) that , or else . Hence, there always exists a nonzero constant such that
(5.16) |
Note that . Hence, from (5.14) and (5.16) we know that
Then, according to the above equation and (5.11), one has that
6 The NLSDP case
Let be the linear space of all real symmetric matrices and be the cone of positive semidefinite matrices in . We consider the NLSDP problem:
(6.1) |
where , and are three twice continuously differentiable functions. Problem (6.1) is an instance of problem (1.1) with and . We first give a quick review of some preliminary results for NLSDP problems. One may refer to [48] for more details.
6.1 Preliminaries on NLSDP
For any two matrices , their inner product is defined by , and its induced norm is the Frobenius norm given by . Moreover, we write . Hence, it holds that
(6.2) |
One can write its spectral decomposition as , with being an orthogonal matrix and being a diagonal matrix whose diagonal entries are ordered by . We use the three index sets and to indicate the positive, zero, and negative eigenvalues of , respectively, i.e.,
(6.3) |
Then, one can write
(6.4) |
with , , and . Moreover, we define the matrix by
(6.5) |
with the convention that . As has been shown in Sun and Sun [50], the projection operator is strongly semismooth. Moreover, the following result in Pang et al. [35, Lemma 11] characterizes the Bouligand-subdifferential of this projection operator.
Lemma 6.1.
Suppose that has the spectral decomposition with and being partitioned by (6.4). Then, for any , there exist two index sets and that partition , together with a matrix with entries in , such that
(6.6) |
where denotes the Hadamard product and
It is easy to see that
(6.7) |
where and . From (6.7) we know that the lineality space of this set can be represented by
(6.8) |
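Before turning to constraint nondegeneracy, we give a small Python sketch of the projection onto the positive semidefinite cone via the spectral decomposition (6.4); it reflects the classical formula (project the eigenvalues onto the nonnegative orthant and reassemble) and is offered only as an illustration in assumed notation.

```python
import numpy as np

def proj_psd(A):
    """Projection of a symmetric matrix A onto the cone of PSD matrices."""
    A = 0.5 * (A + A.T)                       # symmetrize to guard against round-off
    eigval, eigvec = np.linalg.eigh(A)        # spectral decomposition A = P diag(l) P^T
    eigval_plus = np.clip(eigval, 0.0, None)  # keep only the nonnegative eigenvalues
    return (eigvec * eigval_plus) @ eigvec.T  # reassemble P diag(max(l, 0)) P^T
```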
The following definition of constraint nondegeneracy for the NLSDP problem comes from Sun [48, Definition 3.2], and it is an analogue of the LICQ for NLP problems. The relation between this condition and the concept of nondegeneracy in Robinson [41] has also been well discussed in [48, Section 3].
Definition 6.2 (Constraint nondegeneracy).
We say that a feasible point to the NLSDP problem is constraint nondegenerate if
(6.9) |
The critical cone of at is defined by
Therefore,
Let be a solution to the KKT system (3.2) of the NLSDP problem (6.1). Then, the critical cone of NLSDP at is defined as
Since an explicit formula for the affine hull of is not readily available, an outer approximation to this affine hull with respect to was introduced in Sun [48] (one may refer to Sun [48, Section 3] for more information), i.e.,
Moreover, for any , one has that , which is different from (5.4) for the NLSOCP problems.
For any given matrix , we define the linear-quadratic function , which is linear in the first argument and quadratic in the second argument, by
where is the Moore-Penrose pseudo-inverse of . The following definition of the strong second-order sufficient condition for NLSDP problems was given in Sun [48, Definition 3.2], which is an extension of the strong second-order sufficient condition for NLP problems in Robinson [40] to NLSDP problems.
Definition 6.3 (Strong second-order sufficient condition for NLSDP).
Let be a stationary point of the NLSDP problem. We say that the strong second-order sufficient condition holds at if
where
6.2 Main results
Based on the above preparations, we are ready to establish the main results of this section.
Theorem 6.4.
Proof.
The proof for Assumption 3.1 comes from [8, Proposition 17] and [51, Proposition 4]. Next, we show that Assumption 3.2 holds.
According to Assumption 3.1 we know that is a singleton. Recall that is a solution to the KKT system of the NLSDP problem. Then,
If has the spectral decomposition with and having the representations in (6.4) and being defined in (6.3), one has from (6.2) that
(6.10) |
Therefore, by using (6.7) and (6.8) one has
so that
(6.11) |
Define and . Then, by using (6.10) one can get
Let and such that . From (6.6) and (6.10) we know that for any ,
|
(6.12) |
where is the identity matrix and has the same structure and properties as the matrix in Lemma 6.1. Then, by (6.5) and the fact that is arbitrarily given, we can conclude that is positive semidefinite, where is the identity operator on . Since Assumption 3.1 holds, any can be represented by the right-hand side of (3.10) with being nonsingular. Therefore,
Therefore, if , one has that
(6.13) |
and
(6.14) |
Since the constraint nondegeneracy (6.9) holds for the NLSDP problem at , there exist a vector and a matrix such that
Therefore, one can get from (6.13) and (6.14) that
Since , we know from (6.11) that . Therefore, it holds that
7 Numerical illustration
In this section, we use a simple numerical example to test whether the second-order augmented Lagrangian method can achieve superlinear convergence in practice. For comparison, the classic augmented Lagrangian method, which has a linear convergence rate, is also implemented.
Consider the following problem
It is easy to verify that the solution of this problem is given by , and both the LICQ (Definition 4.1) and the strong second-order sufficient condition (Definition 4.3) are satisfied at this point with the corresponding multipliers given by . We set the penalty parameter , and implement both the augmented Lagrangian method and the second-order augmented Lagrangian method for this problem with the initial multipliers (the initial multiplier is not intentionally chosen; one can observe the same numerical behavior from other initial points).
The numerical experiment is conducted by running Matlab R2020a on a MacBook Pro (macOS 11.2.3 system) with one Intel 2.7 GHz Quad-Core i7 Processor and 16 GB RAM. We measure the error at the -th iteration via
The algorithms are terminated if . Moreover, we set as if , so that a valid value can be obtained by taking the logarithm of the error when it is too close to zero.
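The error-tracking and termination logic described above can be sketched as follows; `one_iteration`, `compute_error` and the clipping threshold `EPS_FLOOR` are hypothetical placeholders standing in for the (elided) error measure and the small constant mentioned in the text, so that the logarithm of the error remains finite.

```python
import numpy as np

TOL = 1e-10          # termination tolerance used in the experiment
EPS_FLOOR = 1e-16    # floor applied before taking logarithms (assumed value)

def run_and_log(one_iteration, state, compute_error, max_iter=100):
    """Run a multiplier method and record log10-errors for plotting."""
    log_errors = []
    for k in range(max_iter):
        state = one_iteration(state)
        err = compute_error(state)
        log_errors.append(np.log10(max(err, EPS_FLOOR)))  # clip to keep log finite
        if err < TOL:
            break
    return state, log_errors
```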
Figure 1 clearly shows the behaviors of the two methods. The left part shows how the errors of the two methods vary with respect to the iteration number, while the right part of this figure, by taking the logarithm of the errors, explicitly shows the linear convergence of the classic augmented Lagrangian method, as well as the superlinear convergence of the second-order method of multipliers.
8 Conclusions
In this paper, we have proposed a second-order method of multipliers for solving the NLP, the NLSOCP and the NLSDP problems. The proposed method is a combination of the classic method of multipliers and a specially designed nonsmooth Newton method for updating the multipliers. The local convergence, as well as the superlinear convergence rate, of this second-order method was established under the constraint nondegeneracy condition (or the LICQ) together with a strong second-order sufficient condition. We re-emphasize that the conditions we used in this paper for proving the superlinear convergence rate of the second-order method of multipliers are exactly those used in the literature for establishing the linear convergence rate (or the asymptotically superlinear rate if the penalty parameter goes to infinity) of the classic method of multipliers without assuming strict complementarity. Besides, a simple numerical test was conducted to demonstrate the superlinear convergence behavior of the method studied in this paper.
Since the proposed method is based on both the augmented Lagrangian method and a generalized Newton method, we are not able to establish any global convergence property of this method in general. However, to make the method implementable, globalization techniques should be further studied, together with computational studies on solving the subproblems and updating the multipliers.
References
- [1] Alizadeh F, Goldfarb D. Second-order cone programming. Math Program, 2003, 95(1): 3–51
- [2] Bertsekas D P. On penalty and multiplier methods for constrained minimization. SIAM J Control Optim, 1976, 14(2): 216–235
- [3] Bertsekas D P. Multiplier methods: A survey. Automatica, 1976, 12(2): 133–145
- [4] Bertsekas D P. On the convergence properties of second-order methods of multipliers. J Optim Theory Appl, 1978, 25(3): 443–449
- [5] Bertsekas D P. Constrained Optimization and Lagrange Multiplier Methods. New York: Academic, 1982
- [6] Bonnans J F, Ramírez H. Perturbation analysis of second-order cone programming problems. Math Program, 2005, 104(2): 205–227
- [7] Bonnans J F, Shapiro A. Nondegeneracy and quantitative stability of parameterized optimization problems with multiple solutions. SIAM J Optim, 1998, 8(4): 940–946
- [8] Bonnans J F, Shapiro A. Perturbation Analysis of Optimization Problems. New York: Springer, 2000
- [9] Brusch R. A rapidly convergent method for equality constrained function minimization. In: Proceedings of the 1973 IEEE Conference on Decision and Control. San Diego, California, 1973, 80–81
- [10] Bueno L, Haeser G, Santos L. Towards an efficient augmented Lagrangian method for convex quadratic programming. Comput Optim Appl, 2020, 76: 767–800
- [11] Buys J. Dual algorithms for constrained optimization problems. PhD Dissertation. Netherlands: Univ. of Leiden, 1972
- [12] Chen J-S, Chen X, Tseng P. Analysis of nonsmooth vector-valued functions associated with second-order cones. Math Program, 2004, 101: 95–117
- [13] Chen L, Liao A P. On the convergence properties of a second-order augmented Lagrangian method for nonlinear programming problems with inequality constraints. J Optim Theory Appl, 2020, 187: 248–265
- [14] Chen X, Sun D F, Sun J. Complementarity functions and numerical experiments on some smoothing Newton methods for second-order-cone complementarity problems. Comput Optim Appl, 2003, 25: 39–56
- [15] Clarke F H. Optimization and Nonsmooth Analysis. New York: Wiley, 1983
- [16] Conn A, Gould N, Toint P. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J Numer Anal, 1991, 28(2): 545–572
- [17] Conn A, Gould N, Toint P. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Berlin Heidelberg: Springer-Verlag, 1992
- [18] Contesse-Becker L. Extended convergence results for the method of multipliers for nonstrictly binding inequality constraints. J Optim Theory Appl, 1993, 79(2): 273–310
- [19] Fletcher R. An ideal penalty function for constrained optimization. IMA J Appl Math, 1975, 15(3): 319–342
- [20] Fontecilla R, Steihaug T, Tapia R. A convergence theory for a class of quasi-Newton methods for constrained optimization. SIAM J Numer Anal, 1987, 24(5): 1133–1151
- [21] Fukushima M, Luo Z Q, Tseng P. Smoothing functions for second-order-cone complementarity problems. SIAM J Optim, 2001, 12: 436–460
- [22] Hayashi S, Yamashita N, Fukushima M. A combined smoothing and regularization method for monotone second-order cone complementarity problems. SIAM J Optim, 2005, 15: 593–615
- [23] Hestenes M. Multiplier and gradient methods. J Optim Theory Appl, 1969, 4(5): 303–320
- [24] Ito K, Kunisch K. The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math Program, 1990, 46(1-3): 341–360
- [25] Kanzow C, Ferenczi I, Fukushima M. On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM J Optim, 2009, 20(1): 297–320
- [26] Kummer B. Newton’s method based on generalized derivatives for nonsmooth functions: Convergence analysis. In: Oettli W (ed.), Advances in Optimization, Berlin: Springer, 1992, 171–194
- [27] Kummer B. Newton’s method for nondifferentiable functions. In: Guddat J (ed.), Advances in Mathematical Optimization, Berlin: Akademie-Verlag, 1988, 114–125
- [28] Li X D, Sun D F, Toh K-C. A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems. SIAM J Optim, 2018, 28: 433–458
- [29] Liu Y J, Zhang L W. Convergence analysis of the augmented Lagrangian method for nonlinear second-order cone optimization problems. Nonlinear Anal, 2007, 67: 1359–1373
- [30] Liu Y J, Zhang L W. Convergence of the augmented Lagrangian method for nonlinear optimization problems over second-order cones. J Optim Theory Appl, 2008, 139: 557–575
- [31] Meng F W, Sun D F, Zhao G Y. Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Math Program, 2005, 104: 561–581
- [32] Mifflin R. Semismooth and semiconvex functions in constrained optimization. SIAM J Control Optim, 1977, 15(6): 959–972
- [33] Nocedal J, Wright S J. Numerical Optimization (2nd Edition). New York: Springer-Verlag, 2006
- [34] Outrata J V, Sun D F. On the coderivative of the projection operator onto the second-order cone. Set-Valued Anal, 2008, 16: 999–1014
- [35] Pang J-S, Sun D F, Sun J. Semismooth homeomorphisms and strong stability of semidefinite and Lorentz complementarity problems. Math Oper Res, 2003, 28: 39–63
- [36] Pang J-S, Qi L Q. Nonsmooth equations: motivation and algorithms. SIAM J Optim, 1993, 3: 443–465
- [37] Powell M J D. A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, New York: Academic, 1969, 283–298
- [38] Qi L Q, Sun J. A nonsmooth version of Newton’s method. Math Program, 1993, 58(1-3): 353–367
- [39] Qi L Q. Convergence analysis of some algorithms for solving nonsmooth equations. Math Oper Res, 1993, 18(1): 227–244
- [40] Robinson S M. Strongly regular generalized equations. Math Oper Res, 1980, 5(1): 43–62
- [41] Robinson S M. Local structure of feasible sets in nonlinear programming, Part II: Nondegeneracy. Math Program Study, 1984, 22: 217–230
- [42] Robinson S M. Constraint nondegeneracy in variational analysis. Math Oper Res, 2003, 28: 201–232
- [43] Rockafellar R T. A dual approach to solving nonlinear programming problems by unconstrained optimization. Math Program, 1973, 5(1): 354–373
- [44] Rockafellar R T. The multiplier method of Hestenes and Powell applied to convex programming. J Optim Theory Appl, 1973, 12(6): 555–562
- [45] Rockafellar R T. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math Oper Res, 1976, 1(2): 97–116
- [46] Rockafellar R T, Wets R J-B. Variational Analysis. Berlin: Springer-Verlag, 1998
- [47] Shapiro A. Sensitivity analysis of generalized equations. J Math Sci, 2003, 115: 2554–2565.
- [48] Sun D F. The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications. Math Oper Res, 2006, 31(4): 761–776
- [49] Sun D F, Han J Y. Newton and quasi-Newton methods for a class of nonsmooth equations and related problems. SIAM J Optim, 1997, 7: 463–480
- [50] Sun D, Sun J. Semismooth matrix valued functions. Math Oper Res, 2002, 27: 150–169
- [51] Sun D F, Sun J, Zhang L W. The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math Program, 2008, 114(2): 349–391
- [52] Tretyakov N. A method of penalty estimates for convex programming problems. Ékonomika i Matematicheskie Metody, 1973, 9: 525–540
- [53] Yang L Q, Sun D F, Toh K-C. SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math Program Comput, 2015, 7: 331–366
- [54] Yuan Y-X. Analysis on a superlinearly convergent augmented Lagrangian method. Acta Math Sinica, English Series, 2014, 30(1): 1–10
- [55] Zhao X Y, Sun D F, Toh K-C. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J Optim, 2010, 20: 1737–1765
Appendix A Proof of Proposition 3.3
The following result is a consequence of Qi and Sun [38, Theorem 2.3].
Lemma A.1.
The locally Lipschitz continuous function is semismooth at if and only if for any sequence converging to with , it holds that .
Now we start to prove Proposition 3.3.
Proof of Proposition 3.3.
Since Assumption 3.1 holds and , one can get the two constants and , as well as the locally Lipschitz function , from Proposition 3.1. For convenience, we denote . Suppose that . From (1.2), (3.1), (3.4) and (3.8) one can get
Then, by taking the directional derivative of along the given direction and using (3.8), one can see that
(A.1) |
On the other hand, since the projection operator is strongly semismooth, we know that there always exists a linear operator such that
(A.2) |
Therefore, by combining (A.1), (A.2) and (3.9) together, one can get
(A.3) |
From [31, Proposition 1] we know that the range of is bounded. Then, by [46, Proposition 5.51(b)] and [46, Proposition 5.52(b)] we know that the composite mapping is outer semicontinuous in . Note that the set is bounded and bounded away from zero. Hence, by (3.11) and Assumption 3.1, we know from [46, Proposition 5.12(a)] that there exists a positive constant such that is always positive definite if with . Consequently, in this case, we know from (A.3) that
(A.4) |
We use to denote the set of all the points in where is Fréchet-differentiable. Since is semismooth in , it holds that for any ,
(A.5) |
Thus, by combining (A.2), (A.4) and (A.5) together, one can see from (3.10) that
Since is locally Lipschitz continuous and is bounded-valued and outer semicontinuous, by [46, Propositions 5.51(b) & 5.52(b)] we know that is compact-valued and outer semicontinuous. Hence, for any , one has that . This completes the proof. ∎
Appendix B Proof of Theorem 3.5
Proof.
Since Assumption 3.1 holds and , one can get the two constants and , as well as the locally Lipschitz function , via Proposition 3.1. Meanwhile, from Proposition 3.3 we know that there exists a constant such that the mapping defined by (3.10) is well-defined, compact-valued and outer semicontinuous in . Since the sequence converges to , one always has that for all sufficiently large . Therefore, the set is nonempty and compact if is sufficiently large. Moreover, according to Assumption 3.2 and [46, Proposition 5.12(a)] we know that there exists a positive constant such that every element in is negative definite whenever .
Next, we prove (3.12). For all sufficiently large , with the convention that , one can see from (3.10) that there exists a linear operator such that
(B.1) |
Meanwhile, one can see from (3.2) and Lemma 3.2 that
(B.2) |
For convenience, we denote and . Then, by (B.1) and (B.2) we know that (3.12) holds if and only if for all one has that, with ,
(B.3) |
and
(B.4) |
We first verify (B.3). We know from (A.2) and (A.4) that, when is sufficiently large,
(B.5) |
where the linear operator is chosen such that
(B.6) |
As has been discussed in Appendix A, the range of is bounded, so that the sequence is also bounded. Moreover, the set is bounded and bounded away from zero, while the composite mapping is outer semicontinuous in . Since the functions and are twice continuously differentiable, the sequences and are bounded. Then, we know from (B.5) that there exists a constant such that, for all sufficiently large , .
Since is strongly semismooth and , one can see from (B.6) that, when and is sufficiently large,
where the penultimate equality is due to Lemma A.1, and the last inequality comes from the fact that the function is semismooth (hence directionally differentiable) around . Then, from the fact that is bounded, we can get from the above equation that
(B.7) |
Note that (A.1) in Appendix A holds for . Then, by using (A.1) and (3.9) together we can get that for and ,
(B.8) |
Then, by substituting (B.7) into (B.8), one can get that for all sufficiently large with ,
which, together with Assumption 3.1 and the fact that the mapping is outer semicontinuous in , implies that for all sufficiently large , as ,
(B.9) |
Since is locally semismooth, we have that for sufficiently large ,
(B.10) |
Now, by substituting (B.9) in (B.10) and noting that the sequence is bounded, we can get (B.3). Next, we verify (B.4). Note that for sufficiently large with , it holds that
(B.11) |
and
(B.12) |
Meanwhile, one can choose for each the linear operator such that
(B.13) |
Then by combining (B.11), (B.12) and (B.13) together, we can get that
(B.14) |
where the last inequality comes from Lemma A.1 and the fact that . Note that (B.9) implies that
(B.15) |
Thus, by substituting (B.15) in (B.14) we can get (B.4), which completes the proof. ∎