
A Unified Convergence Analysis of a Second-Order Method of Multipliers for Nonlinear Conic Programming

Liang Chen, School of Mathematics, Hunan University, Changsha 410082, P.R. China ([email protected]). This author was supported in part by the National Natural Science Foundation of China (11801158), the Hunan Provincial Natural Science Foundation of China (2019JJ50040), and the Fundamental Research Funds for the Central Universities in China.
Junyuan Zhu, School of Mathematics, Hunan University, Changsha 410082, P.R. China ([email protected]).
Xinyuan Zhao, School of Mathematics, Beijing University of Technology, Beijing 100124, P.R. China ([email protected]). This author was supported in part by the National Natural Science Foundation of China (11871002) and the General Program of Science and Technology of Beijing Municipal Education Commission.
Abstract

In this paper, we accomplish a unified convergence analysis of a second-order method of multipliers (i.e., a second-order augmented Lagrangian method) for solving conventional nonlinear conic optimization problems. Specifically, the algorithm we investigate incorporates a specially designed nonsmooth (generalized) Newton step to furnish a second-order update rule for the multipliers. We first show, in a unified fashion, that under a few abstract assumptions the proposed method is locally convergent and possesses a (nonasymptotic) superlinear convergence rate, even though the penalty parameter is fixed and/or the strict complementarity fails. Subsequently, we demonstrate that, for the three typical scenarios, i.e., classic nonlinear programming, nonlinear second-order cone programming, and nonlinear semidefinite programming, these abstract assumptions are exactly the implications of the well-known sufficient conditions that were assumed for establishing the Q-linear convergence rates of the method of multipliers without assuming strict complementarity.

Keywords.  second-order method of multipliers, augmented Lagrangian method, convergence rate, generalized Newton method, second-order cone programming, semidefinite programming

AMS Subject Classification [2000].  Primary 65K05, 49J52; Secondary 90C22, 26E25

1 Introduction

Let $\mathcal{X}$ and $\mathcal{Z}$ be two finite-dimensional real Hilbert spaces, each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. Consider the following general nonlinear conic optimization problem

\min_{x\in\mathcal{X}} f(x) \quad \mbox{s.t.} \quad h(x)=0,\ g(x)\in K, \qquad (1.1)

where $f:\mathcal{X}\to\mathds{R}$, $h:\mathcal{X}\to\mathds{R}^{m}$ and $g:\mathcal{X}\to\mathcal{Z}$ are three twice continuously differentiable functions, and $K$ is a closed convex self-dual cone in $\mathcal{Z}$, i.e., it holds that

K \equiv K^{*} := \{\mu \mid \langle\mu,\mu'\rangle\geq 0,\ \forall\mu'\in K\}.

Let $c>0$ be the given penalty parameter. The augmented Lagrangian function of problem (1.1) is defined by

\mathcal{L}_{c}(x,\lambda,\mu) := f(x) - \langle\lambda,h(x)\rangle + \frac{c}{2}\|h(x)\|^{2} + \frac{1}{2c}\big(\|\Pi_{K}(\mu-cg(x))\|^{2} - \|\mu\|^{2}\big), \quad \forall(x,\lambda,\mu)\in\mathcal{X}\times\mathds{R}^{m}\times\mathcal{Z}, \qquad (1.2)

where $\Pi_{K}(\cdot)$ denotes the metric projection operator onto $K$.

The method of multipliers (also known as the augmented Lagrangian method) for solving problem (1.1) can be stated, in brief, as follows. Let $c_{0}>0$ be a given constant and $(\lambda^{0},\mu^{0})\in\mathds{R}^{m}\times K$ be the initial estimate of the multipliers. For $k=0,1,\ldots$, determine $x^{k+1}\in\mathcal{X}$ that minimizes the augmented Lagrangian function $\mathcal{L}_{c_{k}}(x,\lambda^{k},\mu^{k})$ with respect to $x$, and then update the multipliers via

\lambda^{k+1} := \lambda^{k} - c_{k}h(x^{k+1}), \qquad \mu^{k+1} := \Pi_{K}\big(\mu^{k} - c_{k}g(x^{k+1})\big), \qquad (1.3)

and thereafter update the penalty parameter such that $c_{k+1}\geq c_{k}$.
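To fix ideas, the following is a minimal sketch (not the authors' implementation) of the scheme (1.2)-(1.3) in Python, specialized to the polyhedral case $K=\mathds{R}^{p}_{+}$, so that $\Pi_{K}$ is the componentwise $\max(\cdot,0)$; the toy problem, the solver used for the inner minimization, and all names are our own hypothetical choices.

```python
# First-order method of multipliers for min f(x) s.t. h(x)=0, g(x) in R_+^p.
# Toy problem (hypothetical): min (x1-1)^2 + (x2-2)^2  s.t.  x1+x2 = 1,  x2 >= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
h = lambda x: np.array([x[0] + x[1] - 1.0])   # equality constraint h(x) = 0
g = lambda x: np.array([x[1]])                # conic constraint g(x) in K = R_+

def L_c(x, lam, mu, c):
    # augmented Lagrangian (1.2) with Pi_K = max(., 0)
    proj = np.maximum(mu - c * g(x), 0.0)
    return (f(x) - lam @ h(x) + 0.5 * c * h(x) @ h(x)
            + (proj @ proj - mu @ mu) / (2.0 * c))

lam, mu, c, x = np.zeros(1), np.zeros(1), 10.0, np.zeros(2)
for k in range(25):
    x = minimize(L_c, x, args=(lam, mu, c)).x   # inner minimization in x
    lam = lam - c * h(x)                        # first line of (1.3)
    mu = np.maximum(mu - c * g(x), 0.0)         # second line of (1.3)
print(x, lam, mu)   # approaches x* = (0, 1), lambda* = -2, mu* = 0
```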

This method was initiated by Hestenes [23] and Powell [37] for solving equality constrained nonlinear programming (NLP) problems, and was generalized by Rockafellar [43] to NLP problems with inequality constraints. Nowadays, the augmented Lagrangian method serves as the backbone of many optimization solvers, such as [17, 55, 53, 28], which are readily available to both academic and commercial users. The literature on the augmented Lagrangian method is vast, so we only review the references on analyzing and improving its convergence rate, which are the most relevant to this work.

The linear convergence rates of the method of multipliers for NLP problems have been extensively discussed in the literature, such as Powell [37], Rockafellar [43, 44], Tretyakov [52], Bertsekas [2], Conn [16], Ito and Kunisch [24], and Contesse-Becker [18], under various assumptions. One may refer to the monograph of Bertsekas [5] and the references therein for a thorough discussion of the augmented Lagrangian method for NLP problems. For nonlinear second-order cone programming (NLSOCP) problems, the local linear convergence rate of the augmented Lagrangian method was analyzed in Liu and Zhang [29], and the corresponding results were further improved in Liu and Zhang [30]. For nonlinear semidefinite programming (NLSDP) problems, the local linear rate of convergence was analyzed by Sun et al. [51], and, importantly, the strict complementarity is not necessary for this result. Here, we should mention that, in the vast majority of the references above, the convergence rates of the method of multipliers become asymptotically superlinear provided that the penalty parameter sequence $\{c_{k}\}$ tends to infinity as $k\to\infty$.

In the seminal work [45] of Rockafellar, it was observed that, in the convex programming setting, the method of multipliers can be viewed as a proximal point algorithm applied to the dual problem. On the other hand, without assuming convexity, one can locally interpret the iteration procedure (1.3) for updating the multipliers as a first-order steepest ascent step applied to a certain concave (dual) problem. Consequently, given the linear convergence rates of the method of multipliers mentioned above, it is natural to ask whether one can design a second-order method of multipliers, in the sense that a second-order update rule for the multipliers, instead of (1.3), is incorporated in the method of multipliers for calculating $\lambda^{k+1}$ and $\mu^{k+1}$. Meanwhile, it is also interesting to know whether the corresponding superlinear convergence rates are achievable, even when the sequence $\{c_{k}\}$ is bounded and/or the strict complementarity fails.

In fact, attempts to resolve this question go back a long way. For NLP problems without inequality constraints, Buys [11] first introduced a second-order procedure for updating the multipliers, which can be viewed as an application of the classic Newton method to a certain dual problem. Such a procedure was later investigated and refined by Bertsekas [3, 4, 5]. Later, Brusch [9] and Fletcher [19] independently proposed updating rules for the multipliers via the quasi-Newton method, and this was also studied in Fontecilla et al. [20]. For NLP problems with inequality constraints, the augmented Lagrangian function is no longer twice continuously differentiable, which makes it much more difficult to design a second-order method of multipliers. The first progress in this direction was made by Yuan [54], whose updating rule is based on the classic Newton method, although the corresponding convergence result has not been formally provided. More recently, Chen and Liao [13] proposed a second-order augmented Lagrangian method for solving NLP problems with inequality constraints. This algorithm is based on a specially designed generalized Newton method, and the corresponding local convergence, as well as the superlinear convergence rate, was established. At the same time, an extensive numerical study of the second-order method of multipliers has been conducted for convex quadratic programming [10], and the numerical results suggest that the second-order update formula for the multipliers, at least locally, can perform much better than the usual first-order update rule [10, Section 6].

Motivated by the references mentioned above, in this paper we approach the problem of extending the second-order method of multipliers in [13] from NLP problems with inequality constraints to the more general conic optimization setting, i.e., the more complicated NLSOCP and NLSDP problems. We should emphasize that, as can be observed later, such an extension is highly nontrivial, inasmuch as the required variational analysis for the non-polyhedral conic constraints is much more intricate than that for the NLP case, in which $K$ is a polyhedral convex set.

In the same vein as [13], we first use a specially designed generalized Newton method to deal with the nonsmoothness induced by the conic constraints, so as to construct a unified second-order method of multipliers for general nonlinear conic programming problems. We analyze its convergence (local convergence with a superlinear convergence rate, even if the strict complementarity fails and the penalty parameter is fixed) under certain abstract assumptions. After that, for the NLP, NLSOCP and NLSDP cases, we show separately that these abstract assumptions are implied by certain second-order optimality conditions plus a constraint qualification, which have been frequently used for establishing the Q-linear convergence rate of the method of multipliers. Furthermore, when the conic constraints vanish, the algorithm studied in this paper reduces to the classic second-order method of multipliers for NLP problems with only equality constraints, and the convergence results in this paper are consistent with all the previous related studies.

The remaining parts of this paper are organized as follows. In Section 2, we prepare the necessary preliminaries from the nonsmooth analysis literature for further discussions. In Section 3, we propose a second-order method of multipliers for solving problem (1.1), and establish its convergence under certain abstract assumptions. In Sections 4, 5 and 6, we separately specify the abstract assumptions of Section 3 for the NLP, NLSOCP and NLSDP cases, which constitutes the main contribution of this paper. Section 7 presents a simple numerical example to illustrate the superlinear convergence of the method studied in this work. We conclude this paper in Section 8.

2 Notation and preliminaries

2.1 Notation

Let $\mathcal{U}$ be an arbitrarily given finite-dimensional real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ that induces the norm $\|\cdot\|$. We use $\mathds{L}(\mathcal{U},\mathcal{U})$ to denote the linear space of all the linear operators from $\mathcal{U}$ to itself. For any given $\varepsilon>0$, we define the open $\varepsilon$-neighborhood in $\mathcal{U}$ by

\mathds{B}_{\varepsilon}(u) := \{u'\in\mathcal{U} \mid \|u-u'\| < \varepsilon\}, \quad \forall u\in\mathcal{U}.

Let $U$ be a nonempty closed (not necessarily convex) set in $\mathcal{U}$ and ${\sf lin\,}U$ denote the lineality space of $U$, i.e., the largest linear space contained in $U$. We use ${\sf aff\,}U$ to denote the affine hull of $U$, i.e., the smallest affine subspace of $\mathcal{U}$ that contains $U$. Moreover, we define the distance from a point in $\mathcal{U}$ to the set $U$ by

{\sf dist}(u,U) := \inf_{u'\in U}\|u-u'\|, \quad \forall u\in\mathcal{U}.

We use $\mathcal{T}_{U}^{i}(u)$ and $\mathcal{T}_{U}(u)$ to denote the tangent cone and the contingent cone of the set $U$ at $u\in\mathcal{U}$, respectively, which are defined by

\mathcal{T}_{U}^{i}(u) := \{u'\in\mathcal{U} \mid {\sf dist}(u+\tau u',U) = o(\tau),\ \forall\tau\geq 0\},

and

\mathcal{T}_{U}(u) := \{u'\in\mathcal{U} \mid \exists\,\tau_{k}\downarrow 0 \mbox{ such that } {\sf dist}(u+\tau_{k}u',U) = o(\tau_{k})\}.

If, additionally, $U$ is a convex set, one has that $\mathcal{T}_{U}^{i}(u) = \mathcal{T}_{U}(u)$ for all $u\in\mathcal{U}$, and this common cone is simply called the tangent cone of $U$ at $u$. In this case, we define the projection operator onto the set $U$ by

\Pi_{U}(u) = \operatorname*{arg\,min}_{u'\in U}\|u-u'\|, \quad \forall u\in\mathcal{U}.

Let $\mathcal{V}$ and $\mathcal{W}$ be two other given finite-dimensional real Hilbert spaces, each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. For any set $C\subseteq\mathcal{U}\times\mathcal{V}$, we define

\pi_{\mathcal{U}}(C) := \{u\in\mathcal{U} \mid \exists\, v\in\mathcal{V} \mbox{ such that } (u,v)\in C\}.

Let $F:\mathcal{U}\to\mathcal{W}$ be an arbitrarily given function. The directional derivative of $F(\cdot)$ at $u\in\mathcal{U}$ along a nonzero direction $\Delta u\in\mathcal{U}$, if it exists, is denoted by $F'(u;\Delta u)$. $F$ is said to be directionally differentiable at $u\in\mathcal{U}$ if $F'(u;\Delta u)$ exists for every nonzero $\Delta u\in\mathcal{U}$. If $F$ is a linear mapping, we use $F^{*}$ to denote its adjoint. If $F$ is Fréchet-differentiable at $u\in\mathcal{U}$, we use $\mathcal{J}F(u)$ to denote the Fréchet derivative of $F$ at this point and, in this case, we denote

\nabla F(u) := (\mathcal{J}F(u))^{*}.

Let $O$ be an open set in $\mathcal{U}$. If $F:O\to\mathcal{W}$ is locally Lipschitz continuous on $O$, then $F$ is almost everywhere Fréchet-differentiable in $O$, thanks to [46, Definition 9.1] and Rademacher's theorem [46, Theorem 9.60]. In this case, we denote the set of all the points in $O$ where $F$ is Fréchet-differentiable by $D_{F}$. Then, the Bouligand-subdifferential of $F$ at $u\in O$, denoted by $\partial_{B}F(u)$, is defined by

\partial_{B}F(u) := \Big\{\lim_{k\to\infty}\mathcal{J}F(u^{k}) \mid \{u^{k}\}\subset D_{F},\ u^{k}\to u\Big\}.

Moreover, the Clarke generalized Jacobian of $F$ at $u\in O$ is defined by

\partial F(u) := {\sf conv}\{\partial_{B}F(u)\},

i.e., the convex hull of $\partial_{B}F(u)$.
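For instance, for $F(u)=|u|$ on $\mathds{R}$ one has $D_{F}=\mathds{R}\setminus\{0\}$, $\partial_{B}F(0)=\{-1,1\}$, and $\partial F(0)=[-1,1]$.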

Let $G:\mathcal{U}\times\mathcal{V}\to\mathcal{W}$ be an arbitrarily given function. The partial Fréchet derivative of $G$ with respect to $u$ at $(u,v)\in\mathcal{U}\times\mathcal{V}$, if it exists, is denoted by $\mathcal{J}_{u}G(u,v)$, and we define $\nabla_{u}G(u,v) := (\mathcal{J}_{u}G(u,v))^{*}$. Furthermore, if $G$ is twice Fréchet-differentiable at $(u,v)\in\mathcal{U}\times\mathcal{V}$, we denote

\mathcal{J}^{2}G(u,v) := \mathcal{J}(\mathcal{J}G)(u,v), \quad \mathcal{J}^{2}_{u,u}G(u,v) := \mathcal{J}_{u}(\mathcal{J}_{u}G)(u,v), \quad \nabla^{2}G(u,v) := \mathcal{J}(\nabla G)(u,v), \quad \nabla^{2}_{uu}G(u,v) := \mathcal{J}_{u}(\nabla_{u}G)(u,v).

Finally, we mention that the notation in Sections 4, 5 and 6 is local to each of these sections, as their contents are independent of one another.

2.2 Nonsmooth Newton methods

Recall that $\mathcal{U}$ is a finite-dimensional real Hilbert space, and let $O\subset\mathcal{U}$ be a nonempty open set. Let $F:O\to\mathcal{U}$ be a locally Lipschitz continuous function and consider the following nonlinear equation

F(u) = 0. \qquad (2.1)
Definition 2.1 (Semismoothness).

Suppose that $F:O\to\mathcal{U}$ is locally Lipschitz continuous. $F$ is said to be semismooth at a point $u\in O$ if $F$ is directionally differentiable at $u$ and, for any sequence $\{u^{k}\}\subset O$ converging to $u$ and any $V^{k}\in\partial F(u^{k})$, i.e., the Clarke generalized Jacobian of $F$ at $u^{k}$, it holds that

F(u^{k}) - F(u) - V^{k}(u^{k}-u) = o(\|u^{k}-u\|). \qquad (2.2)

Moreover, $F$ is said to be strongly semismooth at $u$ if $F$ is semismooth at $u$ with (2.2) replaced by $F(u^{k}) - F(u) - V^{k}(u^{k}-u) = O(\|u^{k}-u\|^{2})$.
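As a simple illustration, $F(u)=\Pi_{\mathds{R}_{+}}(u)=\max(u,0)$ is strongly semismooth at $u=0$: for $u^{k}>0$ one has $\partial F(u^{k})=\{1\}$ and $F(u^{k})-F(0)-V^{k}u^{k}=u^{k}-u^{k}=0$, and likewise for $u^{k}<0$, so the remainder in (2.2) vanishes identically.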

To solve the nonlinear equation (2.1), the semismooth Newton method, as an instance of nonsmooth (generalized) Newton methods, requires a surrogate in $\mathds{L}(\mathcal{U},\mathcal{U})$ for the Jacobian used in the classic Newton method. For this purpose, one can prescribe a multi-valued mapping $T_{F}:O\rightrightarrows\mathds{L}(\mathcal{U},\mathcal{U})$ and make the following assumption.

Assumption 2.1.

The mapping $T_{F}:O\rightrightarrows\mathds{L}(\mathcal{U},\mathcal{U})$ satisfies

  • (a)

for any $u\in O$, $T_{F}(u)$ is a nonempty and compact subset of $\mathds{L}(\mathcal{U},\mathcal{U})$;

  • (b)

the mapping $T_{F}$ is outer semicontinuous (the definition of outer semicontinuity can be found in Rockafellar [46, Definition 5.4]);

  • (c)

$\partial_{B}F(u)\Delta u \subset T_{F}(u)\Delta u \subset \partial_{C}F(u)\Delta u$ for all $u\in O$ and $\Delta u\in\mathcal{U}$.

Here, $\partial_{C}$ is defined similarly as in [15, Proposition 2.6.2]: by decomposing $\mathcal{U} = \prod_{i=1}^{n}\mathcal{U}_{i}$ with each $\mathcal{U}_{i}$ being a finite-dimensional Hilbert space, one can write $F(u) = \prod_{i=1}^{n}F_{i}(u)$ for all $u\in O$, so that one can define

\partial_{C}F(u) := \prod_{i=1}^{n}\partial F_{i}(u) \quad\mbox{with}\quad \partial F_{i}(u) := \pi_{\mathcal{U}_{i}}(\partial F(u)), \quad \forall u\in O.

Based on the mapping $T_{F}$ defined above, the semismooth Newton method for solving (2.1) can be stated as follows (a minimal code sketch is given after the list). Let $u^{0}\in O$ be the given initial point. For $k=0,1,\ldots$,

  • 1.

choose $V^{k}\in T_{F}(u^{k})$;

  • 2.

compute the generalized Newton direction $\Delta u^{k}=\Delta u$ via solving the linear system $F(u^{k}) + V^{k}\Delta u = 0$;

  • 3.

set $u^{k+1} := u^{k} + \Delta u^{k}$.
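The following one-dimensional sketch (a hypothetical illustration, not from the original text) instantiates these three steps for the piecewise linear equation $F(u) = u + \max(u,0) - 1 = 0$, whose unique root is $u^{*}=0.5$; here $T_{F}$ simply returns one scalar element of $\partial_{B}F(u)$.

```python
# Semismooth Newton method for F(u) = u + max(u, 0) - 1 = 0 (root u* = 0.5).
F = lambda u: u + max(u, 0.0) - 1.0
T_F = lambda u: 2.0 if u >= 0.0 else 1.0   # one element of partial_B F(u)

u = -2.0                                   # initial point u^0
for k in range(10):
    V = T_F(u)                             # step 1: choose V^k in T_F(u^k)
    du = -F(u) / V                         # step 2: solve F(u^k) + V^k * du = 0
    u = u + du                             # step 3: u^{k+1} = u^k + du^k
    if abs(F(u)) < 1e-12:
        break
print(k, u)                                # reaches u = 0.5 within a few steps
```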

Remark 2.2.

The notion of semismoothness was first introduced in Mifflin [32] for functionals, and was later extended in Qi and Sun [38] to vector-valued functions. Here we adopt the definition of semismoothness from Sun and Sun [50, Definition 3.6], which was derived based on Pang and Qi [36, Proposition 1]. The semismooth Newton method was first proposed in Kummer [27, 26], and was further developed by Qi and Sun [38], Qi [39], and Sun and Han [49]. The following theorem gives the convergence properties of the above method.

Theorem 2.3 ([38, Theorem 3.2]).

Let $u^{*}\in\mathcal{U}$ be a solution to the nonlinear equation (2.1). Suppose that Assumption 2.1 holds, that $T_{F}(u^{*})$ is nonsingular, i.e., all $V\in T_{F}(u^{*})$ are nonsingular, and that $F$ is semismooth at $u^{*}$. Then, there exists $\varepsilon>0$ such that for any $u^{0}\in\mathds{B}_{\varepsilon}(u^{*})$, the sequence $\{u^{k}\}$ generated by the semismooth Newton method is well-defined and converges to $u^{*}$ with a superlinear rate, i.e., for all sufficiently large $k$, it holds that

\|u^{k+1}-u^{*}\| = o(\|u^{k}-u^{*}\|).

Furthermore, if $F$ is strongly semismooth at $u^{*}$, then the convergence rate is Q-quadratic, i.e., for all sufficiently large $k$, it holds that $\|u^{k+1}-u^{*}\| = O(\|u^{k}-u^{*}\|^{2})$.

As can be observed from the proof of [38, Theorem 3.2], the nonsingularity of $T_{F}(u^{*})$ implies that $T_{F}(\cdot)$ is uniformly bounded and nonsingular in a neighborhood of $u^{*}$. Meanwhile, the semismoothness of $F$ guarantees that, for any sequence $\{u^{k}\}$ converging to $u^{*}$ and all sufficiently large $k$,

F(u^{k}) - F(u^{*}) - V^{k}(u^{k}-u^{*}) = o(\|u^{k}-u^{*}\|), \quad \forall V^{k}\in T_{F}(u^{k}). \qquad (2.3)

In fact, from the proof of [38, Theorem 3.2] one can also see that, even if the mapping $T_{F}$ fails to satisfy Assumption 2.1(c), the semismooth Newton method remains well-defined and convergent provided that, additionally, (2.3) holds. Consequently, a much broader algorithmic framework of nonsmooth (generalized) Newton methods emerges beyond the semismooth Newton method. This observation constitutes the basis for designing the second-order method of multipliers in the subsequent section. For the convenience of further discussions, we summarize the facts mentioned above in the following proposition.

Proposition 2.4.

Let $u^{*}\in\mathcal{U}$ be a solution to the nonlinear equation (2.1). Suppose that Assumptions 2.1(a) and 2.1(b) hold, and that (2.3) holds for any sequence $\{u^{k}\}$ converging to $u^{*}$. If $T_{F}(u^{*})$ is nonsingular and $F$ is semismooth at $u^{*}$, then there exists $\varepsilon>0$ such that for any $u^{0}\in\mathds{B}_{\varepsilon}(u^{*})$, the sequence $\{u^{k}\}$ generated by the semismooth Newton method is well-defined and converges to $u^{*}$ with a superlinear rate.

Proof.

By using (2.3) instead of Assumption 2.1(c) and repeating the proof of [38, Theorem 3.2], one can get the result. ∎

3 A second-order method of multipliers

In this section, we propose a second-order method of multipliers for solving problem (1.1). This method is essentially a specially designed nonsmooth (generalized) Newton method based on Proposition 2.4. Throughout this section, we make the blanket assumption that the projection operator $\Pi_{K}(\cdot)$ is strongly semismooth, as this is fulfilled for all three classes of problems studied in the forthcoming three sections.

The Lagrangian function of problem (1.1) is defined by

\mathcal{L}_{0}(x,\lambda,\mu) := f(x) - \langle\lambda,h(x)\rangle - \langle\mu,g(x)\rangle, \quad \forall(x,\lambda,\mu)\in\mathcal{X}\times\mathds{R}^{m}\times\mathcal{Z}. \qquad (3.1)

A vector $x^{*}\in\mathcal{X}$ is called a stationary point of problem (1.1) if there exists a vector $(\lambda^{*},\mu^{*})\in\mathds{R}^{m}\times\mathcal{Z}$ such that $(x^{*},\lambda^{*},\mu^{*})$ is a solution to the following Karush-Kuhn-Tucker (KKT) system:

\nabla_{x}\mathcal{L}_{0}(x,\lambda,\mu) = 0, \quad h(x) = 0, \quad g(x)\in K,\ \ \mu\in K,\ \ \langle\mu,g(x)\rangle = 0. \qquad (3.2)

In this case, the vector $(\lambda^{*},\mu^{*})$ is called a Lagrange multiplier at $x^{*}$. We denote the set of all the Lagrange multipliers at $x\in\mathcal{X}$ by $\mathcal{M}(x)$, which is an empty set if $x$ fails to be a stationary point.
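As a quick numerical sanity check, the residual of the KKT system (3.2) can be measured directly; the following is a minimal sketch for the polyhedral case $K=\mathds{R}^{p}_{+}$, where the last line of (3.2) is equivalent to $\min(g(x),\mu)=0$ componentwise (all helper names are hypothetical).

```python
# Residual of the KKT system (3.2) with K = R_+^p.
import numpy as np

def kkt_residual(x, lam, mu, grad_f, h, Jh, g, Jg):
    r_stat = grad_f(x) - Jh(x).T @ lam - Jg(x).T @ mu  # grad_x L_0(x, lam, mu) = 0
    r_eq = h(x)                                        # h(x) = 0
    r_comp = np.minimum(g(x), mu)                      # g(x) >= 0, mu >= 0, <mu, g(x)> = 0
    return np.linalg.norm(np.concatenate([r_stat, r_eq, r_comp]))
```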

Let $c>0$ be a given constant and $x^{*}\in\mathcal{X}$ be a given stationary point of problem (1.1) with the Lagrange multiplier $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$. We define the mapping

\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W) := \nabla_{xx}^{2}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*}) + c\nabla h(x^{*})\mathcal{J}h(x^{*}) + c\nabla g(x^{*})W\mathcal{J}g(x^{*}), \quad \forall W\in\mathds{L}(\mathcal{Z},\mathcal{Z}). \qquad (3.3)

We make the following abstract assumption for the given stationary point $x^{*}\in\mathcal{X}$.

Assumption 3.1.

(i) $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ is the unique Lagrange multiplier;

(ii) there exist two positive constants $\bar{c}$ and $\eta$ such that for any $c\geq\bar{c}$,

\langle x, \mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)\,x\rangle \geq \eta\|x\|^{2}, \quad \forall W\in\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*})),\ \forall x\in\mathcal{X},

where the linear operator $\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)$ is defined by (3.3).

The following result was given in Sun et al. [51, Proposition 1].

Proposition 3.1.

Suppose that Assumption 3.1 holds and $c\geq\bar{c}$ is fixed. Then, there exist two constants $\varepsilon>0$ and $\delta_{0}>0$, both depending on $c$, and a locally Lipschitz continuous function $\mathbf{x}_{c}(\cdot)$ defined on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$ by

\mathbf{x}_{c}(\lambda,\mu) := \operatorname*{arg\,min}_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda,\mu), \qquad (3.4)

such that:

  • (a)

$\mathbf{x}_{c}(\cdot)$ is semismooth at any point in $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$;

  • (b)

for any $x\in\mathds{B}_{\varepsilon}(x^{*})$ and $(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$, every element in $\pi_{x}\partial_{B}(\nabla_{x}\mathcal{L}_{c})(x,\lambda,\mu)$ is positive definite;

  • (c)

for any $(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$, $\mathbf{x}_{c}(\lambda,\mu)$ is the unique optimal solution to

\min_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda,\mu).

If Assumption 3.1 holds and $c\geq\bar{c}$ is fixed, we can also fix the two parameters $\varepsilon>0$ and $\delta_{0}>0$ such that the conclusions (a)-(c) in Proposition 3.1 hold. In this case, we can define the real-valued function

\vartheta_{c}(\lambda,\mu) := \min_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda,\mu), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}). \qquad (3.5)

Obviously, $\vartheta_{c}(\cdot)$ is concave on its domain and

\vartheta_{c}(\lambda,\mu) = \mathcal{L}_{c}(\mathbf{x}_{c}(\lambda,\mu),\lambda,\mu), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}).

Moreover, the following result can be obtained from Proposition 3.1, and one may refer to Sun et al. [51, Proposition 2] for the detailed proof.

Lemma 3.2.

Suppose that Assumption 3.1 holds and $c\geq\bar{c}$ is fixed. Let the two constants $\varepsilon>0$ and $\delta_{0}>0$ be given by Proposition 3.1. Then, the concave function $\vartheta_{c}(\cdot)$ defined by (3.5) is continuously differentiable on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$ with its gradient given by

\nabla\vartheta_{c}(\lambda,\mu) = \begin{pmatrix} -h(\mathbf{x}_{c}(\lambda,\mu)) \\ \frac{1}{c}\,\Pi_{K}\big(\mu - cg(\mathbf{x}_{c}(\lambda,\mu))\big) - \frac{\mu}{c} \end{pmatrix}. \qquad (3.6)

Moreover, $\nabla\vartheta_{c}(\cdot)$ is semismooth on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$.

Now we start to propose our second-order method of multipliers for solving problem (1.1). Suppose that we have obtained a certain $(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$, which we denote by $(\lambda^{k},\mu^{k})$ with the index $k\geq 0$ for convenience. Then, the procedure in (1.3) to get $(\lambda^{k+1},\mu^{k+1})$ with $x^{k+1} := \mathbf{x}_{c}(\lambda^{k},\mu^{k})$ can be reformulated as

(\lambda^{k+1},\mu^{k+1}) = (\lambda^{k},\mu^{k}) + c\nabla\vartheta_{c}(\lambda^{k},\mu^{k}),

which can be interpreted as a (first-order) gradient ascent step at $(\lambda^{k},\mu^{k})$ for the purpose of maximizing $\vartheta_{c}$. Thus, it is natural to ask whether one can introduce a second-order update rule for the multipliers based on this (first-order) gradient information. Obviously, the classical Newton method cannot be applied directly, owing to the nonsmoothness induced by the projection $\Pi_{K}(\cdot)$. Moreover, even though the function $\nabla\vartheta_{c}$ is semismooth on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$ (according to Lemma 3.2), the semismooth Newton method introduced in Section 2.2 is still not directly applicable, because $\nabla\vartheta_{c}$ is a composite function and it is highly difficult to find a mapping $T_{\vartheta_{c}}:\mathds{R}^{m}\times\mathcal{Z}\rightrightarrows\mathds{L}(\mathds{R}^{m}\times\mathcal{Z},\mathds{R}^{m}\times\mathcal{Z})$ such that

\partial_{B}(\nabla\vartheta_{c})(\lambda,\mu)y \subset T_{\vartheta_{c}}(\lambda,\mu)y \subset \partial_{C}(\nabla\vartheta_{c})(\lambda,\mu)y, \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}),\ \forall y\in\mathds{R}^{m}\times\mathcal{Z}. \qquad (3.7)

Fortunately, thanks to Proposition 2.4, even if (3.7) fails, one can still hope to design a second-order method to address this issue. Next, we design a specific nonsmooth Newton method that yields a second-order method of multipliers for problem (1.1).

For this purpose, we suppose that Assumption 3.1 holds and $c\geq\bar{c}$ is fixed, so that we can define the functions

\lambda_{c}(\lambda,\mu) := \lambda - ch(\mathbf{x}_{c}(\lambda,\mu)), \qquad \mu_{c}(\lambda,\mu) := \Pi_{K}\big(\mu - cg(\mathbf{x}_{c}(\lambda,\mu))\big), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}), \qquad (3.8)

and

\mathcal{A}_{c}(\lambda,\mu,W) := \nabla_{xx}^{2}\mathcal{L}_{0}(\mathbf{x}_{c}(\lambda,\mu),\lambda_{c}(\lambda,\mu),\mu_{c}(\lambda,\mu)) + c\nabla h(\mathbf{x}_{c}(\lambda,\mu))\mathcal{J}h(\mathbf{x}_{c}(\lambda,\mu)) + c\nabla g(\mathbf{x}_{c}(\lambda,\mu))W\mathcal{J}g(\mathbf{x}_{c}(\lambda,\mu)), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}),\ \forall W\in\mathds{L}(\mathcal{Z},\mathcal{Z}), \qquad (3.9)

where both the constant $\delta_{0}$ and the function $\mathbf{x}_{c}(\cdot)$ are specified by Proposition 3.1. In this case, we can also define the following set-valued mapping (from $\mathds{R}^{m}\times\mathcal{Z}$ to $\mathds{L}(\mathds{R}^{m}\times\mathcal{Z},\mathds{R}^{m}\times\mathcal{Z})$)

\mathds{V}_{c}(\lambda,\mu) := \left\{ -\begin{pmatrix} \mathcal{J}h(\mathbf{x}_{c}(\lambda,\mu)) \\ W\mathcal{J}g(\mathbf{x}_{c}(\lambda,\mu)) \end{pmatrix} (\mathcal{A}_{c}(\lambda,\mu,W))^{-1} \Big(\nabla h(\mathbf{x}_{c}(\lambda,\mu)),\ \nabla g(\mathbf{x}_{c}(\lambda,\mu))W\Big) - \begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{c}(\mathcal{I}_{\mathcal{Z}}-W) \end{pmatrix} \;\middle|\; W\in\partial_{B}\Pi_{K}\Big(\mu - cg\big(\mathbf{x}_{c}(\lambda,\mu)\big)\Big) \right\}, \qquad (3.10)

where $\mathcal{I}_{\mathcal{Z}}$ denotes the identity operator in $\mathcal{Z}$. According to (3.2) and (3.3), one has that

\mathcal{A}_{c}(\lambda^{*},\mu^{*},W) = \mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W), \quad \forall W\in\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*})). \qquad (3.11)

The following result on the properties of the mapping $\mathds{V}_{c}$ defined above is crucial to the forthcoming algorithmic design. Its proof, which is mainly based on the work of [13], is given in Appendix A.

Proposition 3.3.

Suppose that Assumption 3.1 holds and $c\geq\bar{c}$. Then, there exists a constant $\delta_{1}\in(0,\delta_{0}]$ such that the mapping $\mathds{V}_{c}$ in (3.10) is well-defined, compact-valued and outer semicontinuous on $\mathds{B}_{\delta_{1}}(\lambda^{*},\mu^{*})$. Moreover, it holds that

\partial_{B}(\nabla\vartheta_{c})(\lambda,\mu) \subseteq \mathds{V}_{c}(\lambda,\mu), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{1}}(\lambda^{*},\mu^{*}).

Based on the above discussions, we are able to present the following second-order method of multipliers.

Algorithm 1 A second-order method of multipliers for problem (1.1).

Given $c>0$ and $(\lambda^{0},\mu^{0})\in\mathds{R}^{m}\times\mathcal{Z}$. For $k=0,1,\ldots$ (a minimal code sketch of one iteration follows the list),

  • 1.

compute $x^{k+1} := \mathbf{x}_{c}(\lambda^{k},\mu^{k}) = \operatorname*{arg\,min}_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda^{k},\mu^{k})$;

  • 2.

choose a linear operator $V^{k}\in\mathds{V}_{c}(\lambda^{k},\mu^{k})$ and solve the linear system $\nabla\vartheta_{c}(\lambda^{k},\mu^{k}) + V^{k}\Delta y = 0$ to obtain $\Delta y^{k+1}\in\mathds{R}^{m}\times\mathcal{Z}$, where $\nabla\vartheta_{c}$ is given by (3.6) and $\mathds{V}_{c}(\lambda^{k},\mu^{k})$ is given by (3.10);

  • 3.

compute $(\lambda^{k+1},\mu^{k+1}) := (\lambda^{k},\mu^{k}) + \Delta y^{k+1}$.
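For concreteness, the following is a minimal sketch of steps 2-3 for the polyhedral case $K=\mathds{R}^{p}_{+}$, where every $W\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1}))$ is a diagonal $0/1$ matrix. The caller supplies the output $x$ of step 1 together with $g(x)$, $h(x)$, the Hessian $\nabla^{2}_{xx}\mathcal{L}_{0}$ evaluated at the multipliers (3.8), and the Jacobians of $h$ and $g$; all names are hypothetical, and this is not the authors' implementation.

```python
# One multiplier update of Algorithm 1 with K = R_+^p.
import numpy as np

def second_order_step(lam, mu, c, g_x, h_x, hess_L, Jh, Jg):
    m = Jh.shape[0]
    z = mu - c * g_x
    w = (z >= 0).astype(float)           # diagonal of one W in partial_B Pi_K(z)
    # A_c(lam, mu, W) from (3.9): hess_L + c Jh^T Jh + c Jg^T W Jg
    A = hess_L + c * Jh.T @ Jh + c * (Jg.T * w) @ Jg
    B = np.vstack([Jh, w[:, None] * Jg]) # stacked block (Jh; W Jg)
    V = -B @ np.linalg.solve(A, B.T)     # first term of (3.10); B.T = (grad h, grad g W)
    V[m:, m:] -= np.diag(1.0 - w) / c    # subtract the (1/c)(I - W) block
    grad = np.concatenate([-h_x, (np.maximum(z, 0.0) - mu) / c])  # (3.6)
    dy = np.linalg.solve(V, -grad)       # step 2: solve grad + V dy = 0
    return lam + dy[:m], mu + dy[m:]     # step 3
```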

Remark 3.4.

(a) Algorithm 1 is essentially an application of a nonsmooth Newton method for (locally) maximizing the function $\vartheta_{c}$, or equivalently, for solving the nonlinear equation $\nabla\vartheta_{c}(\lambda,\mu) = 0$. In the second step of Algorithm 1, a linear operator $W\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1}))$ for computing $V^{k}$ can be obtained explicitly for the NLP, NLSOCP and NLSDP problems. Consequently, one can explicitly compute $\mathcal{A}_{c}(\lambda^{k},\mu^{k},W^{k})$ via (3.9), and the linear operator $V^{k}$ can then be calculated via (3.10).

(b) Generally, it is hard to check whether $\mathds{V}_{c}(\lambda^{k},\mu^{k})\subseteq\partial_{C}(\nabla\vartheta_{c})(\lambda^{k},\mu^{k})$. Consequently, Algorithm 1 cannot be classified as the classic semismooth Newton method introduced in Section 2.2. We should mention that it is still uncertain whether one can explicitly find a linear operator $V^{k}$ such that $V^{k}\in\partial_{C}(\nabla\vartheta_{c})(\lambda^{k},\mu^{k})$ or $V^{k}\in\partial_{B}(\nabla\vartheta_{c})(\lambda^{k},\mu^{k})$. However, as will be shown later, $V^{k}\in\mathds{V}_{c}(\lambda^{k},\mu^{k})$ is sufficient for our purpose.

To analyze the convergence properties of Algorithm 1, we make the following abstract assumption as a supplement to Assumption 3.1.

Assumption 3.2.

For any $c\geq\bar{c}$, every linear operator $V\in\mathds{V}_{c}(\lambda^{*},\mu^{*})$ satisfies $V\prec 0$.

The following result is essential to the convergence analysis of Algorithm 1. The proof of this theorem is given in Appendix B.

Theorem 3.5.

Suppose that Assumptions 3.1 and 3.2 hold, and $c\geq\bar{c}$. Let $\{(\lambda^{k},\mu^{k})\}$ be an arbitrary sequence converging to $(\lambda^{*},\mu^{*})$. Then, for all sufficiently large $k$, the set $\mathds{V}_{c}(\lambda^{k},\mu^{k})$ is nonempty and compact, and it holds that

\nabla\vartheta_{c}(\lambda^{k},\mu^{k}) - \nabla\vartheta_{c}(\lambda^{*},\mu^{*}) - V^{k}(\lambda^{k}-\lambda^{*},\mu^{k}-\mu^{*}) = o\big(\|(\lambda^{k}-\lambda^{*},\mu^{k}-\mu^{*})\|\big), \quad \forall V^{k}\in\mathds{V}_{c}(\lambda^{k},\mu^{k}). \qquad (3.12)

Now we establish the local convergence properties of Algorithm 1.

Theorem 3.6.

Suppose that Assumptions 3.1 and 3.2 hold. If $c>0$ is sufficiently large, then there exists a constant $\delta>0$ such that, if $(\lambda^{0},\mu^{0})\in\mathds{B}_{\delta}(\lambda^{*},\mu^{*})$, the whole sequence $\{(\lambda^{k},\mu^{k})\}$ generated by Algorithm 1 is well-defined and converges to $(\lambda^{*},\mu^{*})$ with a superlinear convergence rate.

Proof.

According to Lemma 3.2, the function $\nabla\vartheta_{c}$ defined by (3.6) is semismooth. Meanwhile, the mapping $\mathds{V}_{c}$ defined in (3.10) is locally nonempty, compact-valued and outer semicontinuous in a neighborhood of $(\lambda^{*},\mu^{*})$. Then, by Propositions 3.1 and 3.3, we know that Algorithm 1 is well-defined. Consequently, the theorem follows from Theorem 3.5 and Proposition 2.4. This completes the proof. ∎

We make the following remark on the abstract assumptions used in Theorem 3.6.

Remark 3.7.

Theorem 3.6 establishes the local convergence, as well as the superlinear convergence rate, of the second-order method of multipliers (Algorithm 1). However, at first glance, both Assumptions 3.1 and 3.2 are too abstract to be readily appreciated. Therefore, it is of great importance to translate them into concrete conditions. In the following three sections, we show separately that, for the three typical scenarios, i.e., the NLP, the NLSOCP, and the NLSDP, Assumptions 3.1 and 3.2 are consequences of a certain constraint qualification together with a second-order sufficient condition, and the corresponding analysis constitutes the main contribution of this paper. As can be observed later, these concrete conditions are exactly the prerequisites that were frequently used for establishing the local Q-linear convergence rate of the method of multipliers without requiring strict complementarity.

Before finishing this section, we would like to make the following remark on Algorithm 1.

Remark 3.8.

In the conventional problem settings of NLP, NLSOCP and NLSDP, even the classic augmented Lagrangian method is not necessarily globally convergent. Moreover, Algorithm 1 is based on a specially designed generalized Newton method that regards the update of the multipliers in the augmented Lagrangian method as a first-order step. As second-order approaches often require stronger conditions even for local convergence, the method is not necessarily globally convergent in general if its first-order counterpart is not. Consequently, the best possible result for the multiplier sequence generated by Algorithm 1 is local convergence. Moreover, as the study of the convergence rate relies on a certain local error bound (a global error bound being unrealistic in general), the corresponding convergence rate results are also local.

4 The NLP case

Consider the NLP problem

\min_{x} f(x) \quad\mbox{s.t.}\quad h(x)=0, \quad g(x)\geq 0, \qquad (4.1)

where $f:\mathds{R}^{n}\to\mathds{R}$, $h:\mathds{R}^{n}\to\mathds{R}^{m}$ and $g:\mathds{R}^{n}\to\mathds{R}^{p}$ are three twice continuously differentiable functions. Problem (4.1) is an instance of problem (1.1) with $\mathcal{X}\equiv\mathds{R}^{n}$, $\mathcal{Z}\equiv\mathds{R}^{p}$ and $K\equiv\mathds{R}_{+}^{p} := \{\mu\in\mathds{R}^{p} \mid \mu_{j}\geq 0,\ j=1,2,\ldots,p\}$. For convenience, we define the two index sets $\mathds{I} := \{1,\ldots,m\}$ and $\mathds{J} := \{1,\ldots,p\}$.
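In this polyhedral case, $\Pi_{K}(\mu) = \max(\mu,0)$ componentwise, and each $W\in\partial_{B}\Pi_{K}(\mu)$ is a diagonal matrix whose $j$-th diagonal entry equals $1$ if $\mu_{j}>0$, equals $0$ if $\mu_{j}<0$, and can be either $0$ or $1$ if $\mu_{j}=0$. The following two definitions are well-known [33] for the NLP problem (4.1).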

Definition 4.1 (LICQ).

Let $x\in\mathds{R}^{n}$ be a point such that $h(x)=0$ and $g(x)\geq 0$. We say that the linear independence constraint qualification (LICQ) holds at $x$ if the set of active constraint gradients

\{\nabla h_{i}(x), \nabla g_{j}(x) \mid i\in\mathds{I},\ j\in\mathds{J},\ g_{j}(x)=0\}

is linearly independent.

Let $(x^{*},\lambda^{*},\mu^{*})$ be a solution to the KKT system (3.2). The critical cone at $x^{*}$ for $(\lambda^{*},\mu^{*})$ is defined by

\mathcal{C}(x^{*},(\lambda^{*},\mu^{*})) := \left\{ d\in\mathds{R}^{n} \;\middle|\; \begin{array}{l} \langle\nabla h_{i}(x^{*}),d\rangle = 0,\ \forall i\in\mathds{I},\\ \langle\nabla g_{j}(x^{*}),d\rangle = 0,\ \forall j\in\mathds{J} \mbox{ such that } g_{j}(x^{*})=0 \mbox{ and } \mu^{*}_{j}>0,\\ \langle\nabla g_{j}(x^{*}),d\rangle \geq 0,\ \forall j\in\mathds{J} \mbox{ such that } g_{j}(x^{*})=0 \mbox{ and } \mu^{*}_{j}=0 \end{array} \right\}.
Definition 4.2 (Second-order sufficient condition for NLP).

Let $x^{*}$ be a stationary point of problem (4.1). We say that the second-order sufficient condition holds at $x^{*}$ if there exists $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ such that

\langle d, \nabla_{xx}^{2}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*})d\rangle > 0, \quad \forall d\in\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))\setminus\{0\}.

The following definition was introduced in Robinson [40].

Definition 4.3 (Strong second-order sufficient condition for NLP).

Let $x^{*}$ be a stationary point of problem (4.1). We say that the strong second-order sufficient condition holds at $x^{*}$ if there exists $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ such that

\langle d, \nabla_{xx}^{2}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*})d\rangle > 0, \quad \forall d\in{\sf aff\,}\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))\setminus\{0\},

where ${\sf aff\,}\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))$ is the affine hull of $\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))$.

The following proposition is the main result of Chen and Liao [13].

Proposition 4.4.

Suppose that both the LICQ and the strong second-order sufficient condition hold at the given stationary point $x^{*}$ of the NLP problem (4.1). Then, Assumptions 3.1 and 3.2 hold at $x^{*}$ for problem (4.1).

5 The NLSOCP case

In this section, we consider the following NLSOCP problem

\min f(x) \quad\mbox{s.t.}\quad h(x)=0, \quad g(x)\in K, \qquad (5.1)

where $f:\mathds{R}^{n}\to\mathds{R}$, $h:\mathds{R}^{n}\to\mathds{R}^{m}$, and $g:\mathds{R}^{n}\to\mathds{R}^{p}$ are three twice continuously differentiable functions, and $K := \mathcal{Q}_{1}\times\cdots\times\mathcal{Q}_{s}$ with each $\mathcal{Q}_{j}$, $j=1,\ldots,s$, being a second-order cone in $\mathds{R}^{r_{j}+1}$, i.e.,

\mathcal{Q}_{j} := \{\mu_{j} = (\bar{\mu}_{j},\dot{\mu}_{j})\in\mathds{R}^{r_{j}}\times\mathds{R} \mid \dot{\mu}_{j}\geq\|\bar{\mu}_{j}\|\}.

Note that $p = \sum_{j=1}^{s}(r_{j}+1)$. Problem (5.1) is an instance of problem (1.1) with $\mathcal{X}\equiv\mathds{R}^{n}$ and $\mathcal{Z}\equiv\mathds{R}^{p}$. Moreover, for convenience, we partition the function $g$ as $g(\cdot) = (g_{1}(\cdot),\ldots,g_{s}(\cdot))$ with each $g_{j}(\cdot) = (\bar{g}_{j}(\cdot),\dot{g}_{j}(\cdot))$ such that $\dot{g}_{j}$ is real-valued. In the remainder of this section, we first introduce some preliminary results related to NLSOCP problems. After that, we present the main results of this section and discuss a special case.

5.1 Preliminaries on NLSOCP

For each $j=1,\ldots,s$, we use ${\sf int\,}\mathcal{Q}_{j}$ and ${\sf bdry\,}\mathcal{Q}_{j}$ to denote the interior and the boundary of the second-order cone $\mathcal{Q}_{j}$, respectively. According to [21, Proposition 3.3], the projection operator onto the cone $K$ takes the form $\Pi_{K}(\mu) = (\nu_{1},\ldots,\nu_{s})$ for all $\mu = (\mu_{1},\ldots,\mu_{s})\in\mathds{R}^{r_{1}+1}\times\cdots\times\mathds{R}^{r_{s}+1}$, with

\nu_{j} = \begin{cases} \frac{1}{2}\left(1+\frac{\dot{\mu}_{j}}{\|\bar{\mu}_{j}\|}\right)(\bar{\mu}_{j},\|\bar{\mu}_{j}\|), & \mbox{if } |\dot{\mu}_{j}| < \|\bar{\mu}_{j}\|,\\ \mu_{j}, & \mbox{if } \dot{\mu}_{j} \geq \|\bar{\mu}_{j}\|,\\ 0, & \mbox{if } \dot{\mu}_{j} \leq -\|\bar{\mu}_{j}\|, \end{cases}

where $\mu_{j} = (\bar{\mu}_{j},\dot{\mu}_{j})\in\mathds{R}^{r_{j}}\times\mathds{R}$ for all $j=1,\ldots,s$.
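A minimal sketch of this blockwise projection in code (a hypothetical illustration; the function name is ours) reads as follows.

```python
# Projection onto a single second-order cone Q = {(bar, dot) : dot >= ||bar||};
# Pi_K for K = Q_1 x ... x Q_s applies this block by block.
import numpy as np

def proj_soc(mu):
    bar, dot = mu[:-1], mu[-1]
    nb = np.linalg.norm(bar)
    if dot >= nb:                  # mu already lies in the cone
        return mu.copy()
    if dot <= -nb:                 # mu lies in the polar cone: project to 0
        return np.zeros_like(mu)
    coef = 0.5 * (1.0 + dot / nb)  # remaining case |dot| < ||bar||
    return coef * np.append(bar, nb)
```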

It has been established in Chen et al. [14, Proposition 4.3] (and also in Hayashi et al. [22, Proposition 4.5]) that the projection operator $\Pi_{K}(\cdot)$ is strongly semismooth. Moreover, the Bouligand-subdifferential of this projection operator has been discussed in Pang et al. [35, Lemma 14], Chen et al. [12, Lemma 4] and Outrata and Sun [34, Lemma 1]. The following result directly follows from [25, Lemmas 2.5 & 2.6].

Lemma 5.1.

For any given point $\mu = (\mu_{1},\ldots,\mu_{s})\in\mathds{R}^{r_{1}+1}\times\cdots\times\mathds{R}^{r_{s}+1}$, each element $W\in\partial_{B}\Pi_{K}(\mu)$ is a block diagonal matrix $W = {\sf Diag\,}(W_{1},\ldots,W_{s})$, with each $W_{j}\in\mathds{R}^{(r_{j}+1)\times(r_{j}+1)}$ taking one of the following representations (a code sketch covering case (a) follows the list):

  • (a)

W_{j} = \begin{cases} \frac{1}{2}\begin{pmatrix} \left(1+\frac{\dot{\mu}_{j}}{\|\bar{\mu}_{j}\|}\right)I_{r_{j}} - \frac{\dot{\mu}_{j}}{\|\bar{\mu}_{j}\|}\frac{\bar{\mu}_{j}\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|^{2}} & \frac{\bar{\mu}_{j}}{\|\bar{\mu}_{j}\|} \\ \frac{\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|} & 1 \end{pmatrix}, & \mbox{if } |\dot{\mu}_{j}| < \|\bar{\mu}_{j}\|,\\ I_{r_{j}+1}, & \mbox{if } \dot{\mu}_{j} > \|\bar{\mu}_{j}\|,\\ 0, & \mbox{if } \dot{\mu}_{j} < -\|\bar{\mu}_{j}\|; \end{cases}

  • (b)

W_{j} \in \left\{ I_{r_{j}+1},\ I_{r_{j}+1} + \frac{1}{2}\begin{pmatrix} -\frac{\bar{\mu}_{j}\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|^{2}} & \frac{\bar{\mu}_{j}}{\|\bar{\mu}_{j}\|} \\ \frac{\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|} & -1 \end{pmatrix} \right\}, \quad \mbox{if } \mu_{j}\neq 0 \mbox{ and } \dot{\mu}_{j} = \|\bar{\mu}_{j}\|;

  • (c)

W_{j} \in \left\{ 0,\ \frac{1}{2}\begin{pmatrix} \frac{\bar{\mu}_{j}\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|^{2}} & \frac{\bar{\mu}_{j}}{\|\bar{\mu}_{j}\|} \\ \frac{\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|} & 1 \end{pmatrix} \right\}, \quad \mbox{if } \mu_{j}\neq 0 \mbox{ and } \dot{\mu}_{j} = -\|\bar{\mu}_{j}\|;

  • (d)

W_{j} \in \left\{ \frac{1}{2}\begin{pmatrix} 2\tau I_{r_{j}} + (1-2\tau)\bar{\nu}\bar{\nu}^{T} & \bar{\nu} \\ \bar{\nu}^{T} & 1 \end{pmatrix} \;\middle|\; \bar{\nu}\in\mathds{R}^{r_{j}},\ \|\bar{\nu}\| = 1,\ \tau\in[0,1] \right\} \cup \{0\} \cup \{I_{r_{j}+1}\}, \quad \mbox{if } \mu_{j} = 0.
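The following is a minimal sketch (hypothetical helper) of one block $W_{j}$ from case (a) of Lemma 5.1, i.e., at points where $\Pi_{\mathcal{Q}_{j}}$ is differentiable; in the boundary cases (b)-(d) one would instead pick one of the finitely many matrices listed above.

```python
# One block W_j of an element of partial_B Pi_K(mu) in the smooth cases.
import numpy as np

def W_soc(mu):
    bar, dot = mu[:-1], mu[-1]
    r = bar.size
    nb = np.linalg.norm(bar)
    if dot > nb:                        # interior of Q_j
        return np.eye(r + 1)
    if dot < -nb:                       # interior of the polar cone
        return np.zeros((r + 1, r + 1))
    u = bar / nb                        # case |dot| < ||bar||
    W = np.empty((r + 1, r + 1))
    W[:r, :r] = (1.0 + dot / nb) * np.eye(r) - (dot / nb) * np.outer(u, u)
    W[:r, r] = u
    W[r, :r] = u
    W[r, r] = 1.0
    return 0.5 * W
```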

According to [6, Lemma 25], the tangent cone of $K$ at $\mu = (\mu_{1},\ldots,\mu_{s})\in K$, with each $\mu_{j}\in\mathds{R}^{r_{j}+1}$, $j=1,\ldots,s$, is given by $\mathcal{T}_{K}(\mu) = \mathcal{T}_{\mathcal{Q}_{1}}(\mu_{1})\times\cdots\times\mathcal{T}_{\mathcal{Q}_{s}}(\mu_{s})$, where

\mathcal{T}_{\mathcal{Q}_{j}}(\mu_{j}) := \begin{cases} \mathds{R}^{r_{j}+1}, & \mbox{if } \dot{\mu}_{j} > \|\bar{\mu}_{j}\|,\\ \mathcal{Q}_{j}, & \mbox{if } \mu_{j} = 0,\\ \{\nu_{j}\in\mathds{R}^{r_{j}+1} \mid \langle\bar{\nu}_{j},\bar{\mu}_{j}\rangle - \dot{\nu}_{j}\dot{\mu}_{j} \leq 0\}, & \mbox{otherwise}. \end{cases}

Consequently, the corresponding lineality space of $\mathcal{T}_{K}(\mu)$ takes the following form

{\sf lin\,}(\mathcal{T}_{K}(\mu)) = \Theta_{1}\times\cdots\times\Theta_{s} \qquad (5.2)

with

\Theta_{j} = \begin{cases} \mathds{R}^{r_{j}+1}, & \mbox{if } \dot{\mu}_{j} > \|\bar{\mu}_{j}\|,\\ \{0\}, & \mbox{if } \mu_{j} = 0,\\ \{\nu_{j}\in\mathds{R}^{r_{j}+1} \mid \langle\bar{\nu}_{j},\bar{\mu}_{j}\rangle - \dot{\nu}_{j}\dot{\mu}_{j} = 0\}, & \mbox{otherwise}. \end{cases}

The following definition of constraint nondegeneracy for NLSOCP problems follows from [7, Section 2]. (The name "constraint nondegeneracy" was coined by Robinson in [42]; for NLP problems, constraint nondegeneracy is exactly the LICQ, cf. Robinson [41] and Shapiro [47].)

Definition 5.2 (Constraint nondegeneracy).

We say that constraint nondegeneracy holds at a feasible point $x$ of the NLSOCP problem (5.1) if

\begin{pmatrix} \mathcal{J}h(x) \\ \mathcal{J}g(x) \end{pmatrix}\mathds{R}^{n} + \begin{pmatrix} \{0\} \\ {\sf lin\,}\big(\mathcal{T}_{K}(g(x))\big) \end{pmatrix} = \begin{pmatrix} \mathds{R}^{m} \\ \mathds{R}^{p} \end{pmatrix}. \qquad (5.3)

Let $x^{*}$ be a stationary point of the NLSOCP problem (5.1) with $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$. The critical cone $\mathcal{C}(x^{*})$ of NLSOCP at $x^{*}$ is defined, e.g., in [8, 6], by

\mathcal{C}(x^{*}) := \{d\in\mathds{R}^{n} \mid \mathcal{J}h(x^{*})d = 0,\ \mathcal{J}g(x^{*})d \in \mathcal{T}_{K}(g(x^{*}))\cap(\mu^{*})^{\bot}\},

where $(\mu^{*})^{\bot}$ is the orthogonal complement of $\mu^{*}$ in $\mathds{R}^{p}$. The explicit formulation of $\mathcal{C}(x^{*})$ can be found in [6, Corollary 26]. Moreover, as in [6, (47)], for any $d\in{\sf aff\,}\mathcal{C}(x^{*})$, one has that $\mathcal{J}h(x^{*})d = 0$ and, for all $j=1,\ldots,s$,

\begin{cases} \mathcal{J}g_{j}(x^{*})d = 0, & \mbox{if } \mu^{*}_{j}\in{\sf int\,}\mathcal{Q}_{j},\\ \mathcal{J}g_{j}(x^{*})d \in {\sf aff\,}\{(-\bar{\mu}^{*}_{j},\dot{\mu}^{*}_{j})\}, & \mbox{if } \mu^{*}_{j}\in{\sf bdry\,}\mathcal{Q}_{j}\backslash\{0\} \mbox{ and } g_{j}(x^{*}) = 0,\\ \langle\mathcal{J}g_{j}(x^{*})d,\mu_{j}^{*}\rangle = 0, & \mbox{if } \mu^{*}_{j},\ g_{j}(x^{*})\in{\sf bdry\,}\mathcal{Q}_{j}\backslash\{0\}. \end{cases}

Hence, one has that

{\sf aff\,}\mathcal{C}(x^{*}) = \{d\in\mathds{R}^{n} \mid \mathcal{J}h(x^{*})d = 0,\ \mathcal{J}g(x^{*})d \in {\sf aff\,}\big(\mathcal{T}_{K}(g(x^{*}))\cap(\mu^{*})^{\bot}\big)\}. \qquad (5.4)

The following definition of the strong second-order sufficient condition for the NLSOCP problem comes from Bonnans and Ramírez [6].

Definition 5.3 (Strong second-order sufficient condition for NLSOCP).

Let $x^{*}$ be a stationary point of the NLSOCP problem (5.1). We say that the strong second-order sufficient condition holds at $x^{*}$ if there exists a vector $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ such that

\langle d, \big(\mathcal{J}^{2}_{xx}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*}) + \mathcal{H}(x^{*},\mu^{*})\big)d\rangle > 0, \quad \forall d\in{\sf aff\,}\mathcal{C}(x^{*})\backslash\{0\}, \qquad (5.5)

where $\mathcal{H}(x^{*},\mu^{*}) = \sum_{j=1}^{s}\mathcal{H}_{j}(x^{*},\mu^{*})$ is a matrix with each $\mathcal{H}_{j}(x^{*},\mu^{*})\in\mathds{R}^{n\times n}$, $j=1,\ldots,s$, defined by

\mathcal{H}_{j}(x^{*},\mu^{*}) := \begin{cases} -\frac{\dot{\mu}^{*}_{j}}{\dot{g}_{j}(x^{*})}\nabla g_{j}(x^{*})\begin{pmatrix} -I_{r_{j}} & 0 \\ 0 & 1 \end{pmatrix}\mathcal{J}g_{j}(x^{*}), & \mbox{if } g_{j}(x^{*})\in{\sf bdry\,}\mathcal{Q}_{j}\backslash\{0\},\\ 0, & \mbox{otherwise}. \end{cases}
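For reference, a minimal sketch (hypothetical helper; $Jg_{j}$ denotes the $(r_{j}+1)\times n$ Jacobian block of $g_{j}$ at $x^{*}$) of the curvature term $\mathcal{H}_{j}(x^{*},\mu^{*})$ in the nontrivial branch reads:

```python
# H_j(x*, mu*) for a block with g_j(x*) in bdry Q_j \ {0}.
import numpy as np

def H_j(Jg_j, g_j, mu_dot_j):
    r = Jg_j.shape[0] - 1
    g_dot = g_j[-1]                           # real-valued component of g_j(x*)
    D = np.diag(np.append(-np.ones(r), 1.0))  # Diag(-I_{r_j}, 1)
    return -(mu_dot_j / g_dot) * Jg_j.T @ D @ Jg_j
```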

5.2 Main results

Based on the previous subsection, we are ready to establish the main result of this section.

Theorem 5.4.

Suppose that both the constraint nondegeneracy condition (5.3) and the strong second-order sufficient condition (5.5) hold for problem (5.1) at $x^{*}$ with $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$. Then, Assumptions 3.1 and 3.2 hold.

Proof.

Assumption 3.1 directly follows from [6, Proposition 17] and [29, Proposition 10]. In the following, we show that Assumption 3.2 holds.

Let $c\geq\bar{c}$ be fixed. Denote $z = (z_{1},\ldots,z_{s}) := \mu^{*} - cg(x^{*})$ with each $z_{j} := \mu^{*}_{j} - cg_{j}(x^{*})$ and $g_{j}(\cdot) = (\bar{g}_{j}(\cdot),\dot{g}_{j}(\cdot))$, $j=1,\ldots,s$. Since Assumption 3.1 holds, we know that for any $W\in\partial_{B}\Pi_{K}(z)$, the linear operator $\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)$ defined in (3.3) is positive definite, so that by (3.11) the set $\mathds{V}_{c}(\lambda^{*},\mu^{*})$ defined by (3.10) is nonempty. Suppose that there exists a certain $W\in\partial_{B}\Pi_{K}(z)$ such that the corresponding $V\in\mathds{V}_{c}(\lambda^{*},\mu^{*})$ is not negative definite. From (3.10) we know that $V\preceq 0$. Therefore, what we actually suppose is that $V$ is singular, i.e., there exist two vectors $\Delta\lambda\in\mathds{R}^{m}$ and $\Delta\mu\in\mathds{R}^{p}$ such that

\Delta y := (\Delta\lambda,\Delta\mu) \neq 0 \quad\mbox{but}\quad \langle\Delta y, V\Delta y\rangle = 0. \qquad (5.6)

Note that

-\langle\Delta y, V(\Delta y)\rangle = \big\langle \nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu),\ (\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W))^{-1}\big(\nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu)\big)\big\rangle + \frac{1}{c}\langle\Delta\mu, (I_{p}-W)(\Delta\mu)\rangle. \qquad (5.7)

Meanwhile, from Lemma 5.1 we know that $W = {\sf Diag\,}(W_{1},\ldots,W_{s})$ with each $W_{j}\in\mathds{R}^{(r_{j}+1)\times(r_{j}+1)}$, $j=1,\ldots,s$. Moreover, it is easy to deduce from [31, Proposition 1] that $I_{r_{j}+1}\succeq W_{j}\succeq 0$. Hence, by using the fact that $\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)\succ 0$, one has from (5.6) and (5.7) that

\nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu) = 0, \qquad (5.8)

and

\langle(\Delta\mu)_{j}, (I_{r_{j}+1}-W_{j})(\Delta\mu)_{j}\rangle = 0, \quad \forall j=1,\ldots,s. \qquad (5.9)

Since the constraint nondegeneracy condition (5.3) holds at $x^{*}$, there exist a vector $d\in\mathds{R}^{n}$ and a vector $\xi = (\xi_{1},\ldots,\xi_{s})\in{\sf lin\,}\big(\mathcal{T}_{K}(g(x^{*}))\big)$ with $\xi_{j} = (\bar{\xi}_{j},\dot{\xi}_{j})\in\mathds{R}^{r_{j}}\times\mathds{R}$, $j=1,\ldots,s$, such that

\mathcal{J}h(x^{*})d = \Delta\lambda \quad\mbox{and}\quad \mathcal{J}g(x^{*})d + \xi = \Delta\mu.

Therefore, one can get from (5.8) and (5.9) that

\begin{array}{rl} \|\Delta\lambda\|^{2} + \|\Delta\mu\|^{2} &= \langle\Delta\lambda,\Delta\lambda\rangle + \langle\Delta\mu,\Delta\mu\rangle\\ &= \langle\Delta\lambda,\Delta\lambda\rangle + \langle W\Delta\mu,\Delta\mu\rangle + \langle(I_{p}-W)\Delta\mu,\Delta\mu\rangle\\ &= \langle\Delta\lambda,\mathcal{J}h(x^{*})d\rangle + \langle W\Delta\mu,\mathcal{J}g(x^{*})d+\xi\rangle + \langle(I_{p}-W)\Delta\mu,\Delta\mu\rangle\\ &= \langle\nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu), d\rangle + \langle W\Delta\mu,\xi\rangle + \langle(I_{p}-W)\Delta\mu,\Delta\mu\rangle\\ &= \langle W\Delta\mu,\xi\rangle. \end{array} \qquad (5.10)

Since $\xi\in{\sf lin\,}\big(\mathcal{T}_{K}(g(x^{*}))\big)$, one has from (5.2) that for any $j=1,\ldots,s$,

\begin{cases} \xi_{j}\in\mathds{R}^{r_{j}+1}, & \mbox{if } g_{j}(x^{*})\in{\sf int\,}\mathcal{Q}_{j},\\ \xi_{j} = 0, & \mbox{if } g_{j}(x^{*}) = 0,\\ \langle\bar{\xi}_{j},\bar{g}_{j}(x^{*})\rangle - \dot{\xi}_{j}\dot{g}_{j}(x^{*}) = 0, & \mbox{otherwise}. \end{cases} \qquad (5.11)

Now, for each j\in\{1,\ldots,s\}, we separate our analysis into the following four cases.

Case 1. If gj(x)𝗂𝗇𝗍𝒬jg_{j}(x^{*})\in{{\sf int\,}}\,{\mathcal{Q}}_{j}, one has that μj=0\mu_{j}^{*}=0 so that zj=cgj(x)𝗂𝗇𝗍𝒬j-z_{j}=cg_{j}(x^{*})\in{{\sf int\,}}\,{\mathcal{Q}}_{j}. Therefore, by Lemma 5.1 (a) we know that Wj=0W_{j}=0. Hence, in this case, Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0.

Case 2. If gj(x)=0g_{j}(x^{*})=0, one has that ξj=0\xi_{j}=0. Hence, Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0 also holds.

Case 3. If gj(x)𝖻𝖽𝗋𝗒𝒬j\{0}g_{j}(x^{*})\in{\sf bdry\,}\,{\mathcal{Q}}_{j}\backslash\{0\} and μj=0\mu_{j}^{*}=0, one has that zj𝖻𝖽𝗋𝗒𝒬j\{0}-z_{j}\in{\sf bdry\,}{\mathcal{Q}}_{j}\backslash\{0\}. Hence, in this case, we know from Lemma 5.1 (c) that

Wj{0,12(z¯jz¯jTz¯j2z¯jz¯jz¯jTz¯j1)}.W_{j}\in\left\{0,\frac{1}{2}\begin{pmatrix}\displaystyle\frac{\bar{z}_{j}\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|^{2}}&\ \displaystyle\frac{\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[8.53581pt] \displaystyle\frac{\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|}&\displaystyle 1\end{pmatrix}\right\}.

If Wj=0W_{j}=0, one has that Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0. Otherwise, one has that

Wjξj=12(z¯jz¯jTz¯j2z¯jz¯jz¯jTz¯j1)ξj=12((z¯jTξ¯j)z¯jz¯j2+ξ˙jz¯jz¯jz¯jTξ¯jz¯j+ξ˙j)=12(1z¯j2(z¯jTξ¯j+ξ˙jz¯j)z¯j1z¯j(z¯jTξ¯j+ξ˙jz¯j)).\begin{array}[]{ll}W_{j}\xi_{j}&=\frac{1}{2}\begin{pmatrix}\displaystyle\frac{\bar{z}_{j}\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|^{2}}&\displaystyle\frac{\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[8.53581pt] \displaystyle\frac{\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|}&\displaystyle 1\end{pmatrix}\xi_{j}=\frac{1}{2}\begin{pmatrix}\displaystyle\frac{(\bar{z}_{j}^{T}\bar{\xi}_{j})\bar{z}_{j}}{\|\bar{z}_{j}\|^{2}}+\frac{\dot{\xi}_{j}\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[8.53581pt] \displaystyle\frac{\bar{z}_{j}^{T}\bar{\xi}_{j}}{\|\bar{z}_{j}\|}+\dot{\xi}_{j}\end{pmatrix}\\[28.45274pt] &=\frac{1}{2}\begin{pmatrix}\displaystyle\frac{1}{{\|\bar{z}_{j}\|^{2}}}\left(\bar{z}_{j}^{T}\bar{\xi}_{j}+{\dot{\xi}_{j}}{\|\bar{z}_{j}\|}\right)\bar{z}_{j}\\[8.53581pt] \displaystyle\frac{1}{\|\bar{z}_{j}\|}\left(\bar{z}_{j}^{T}\bar{\xi}_{j}+\dot{\xi}_{j}\|\bar{z}_{j}\|\right)\end{pmatrix}.\end{array} (5.12)

Note that μj=0\mu_{j}^{*}=0, z˙j=z¯j\dot{z}_{j}=-\|\bar{z}_{j}\|, and g˙j(x)>0\dot{g}_{j}(x^{*})>0. Then, one can see from (5.11) that ξ¯jTz¯j+ξ˙jz¯j=ξ¯jTz¯jξ˙jz˙j=cξ¯j,g¯j(x)+cξ˙jg˙j(x)=0.\bar{\xi}_{j}^{T}\bar{z}_{j}+\dot{\xi}_{j}\|\bar{z}_{j}\|=\bar{\xi}_{j}^{T}\bar{z}_{j}-\dot{\xi}_{j}\dot{z}_{j}=-c\langle\bar{\xi}_{j},\bar{g}_{j}(x^{*})\rangle+c\dot{\xi}_{j}\dot{g}_{j}(x^{*})=0. Therefore, one can get from (5.12) that Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0.

Case 4. If g_{j}(x^{*})\in{\sf bdry\,}{\mathcal{Q}}_{j}\backslash\{0\} and \mu_{j}^{*}\neq 0, we know from [30, Lemma 2.2] (whose proof originally comes from [1]) that there exists a constant \sigma_{j}>0 such that \mu_{j}^{*}=\sigma_{j}(-\bar{g}_{j}(x^{*}),\dot{g}_{j}(x^{*})). Hence, in this case, it holds that

z_{j}=\mu_{j}^{*}-cg_{j}(x^{*})=\big{(}-(\sigma_{j}+c)\bar{g}_{j}(x^{*}),(\sigma_{j}-c)\dot{g}_{j}(x^{*})\big{)}. (5.13)

Since σj+c>σjc>(σj+c)\sigma_{j}+c>\sigma_{j}-c>-(\sigma_{j}+c), it holds that z¯j<z˙j<z¯j-\|\bar{z}_{j}\|<\dot{z}_{j}<\|\bar{z}_{j}\|. Therefore, by using Lemma 5.1 (a), the equation (5.13), and the fact that g¯j(x)=g˙j(x)\|\bar{g}_{j}(x^{*})\|=\dot{g}_{j}(x^{*}), one can get that

Wj=12((1+z˙jz¯j)Irjz˙jz¯jz¯jz¯jTz¯j2z¯jz¯jz¯jTz¯j1)=12((1+σjcσj+c)Irjσjcσj+cg¯j(x)(g¯j(x))Tg¯j(x)2g¯j(x)g¯j(x)(g¯j(x))Tg¯j(x)1).\begin{array}[]{lll}\displaystyle W_{j}&=&\displaystyle\frac{1}{2}\begin{pmatrix}\displaystyle\left(1+\frac{\dot{z}_{j}}{\|\bar{z}_{j}\|}\right)I_{r_{j}}-\frac{\dot{z}_{j}}{\|\bar{z}_{j}\|}\frac{\bar{z}_{j}\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|^{2}}&\displaystyle\frac{\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[11.38109pt] \displaystyle\frac{\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|}&\displaystyle 1\end{pmatrix}\\[28.45274pt] &=&\displaystyle\frac{1}{2}\begin{pmatrix}\displaystyle\left(1+\frac{\sigma_{j}-c}{\sigma_{j}+c}\right)I_{r_{j}}-\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{\bar{g}_{j}(x^{*})(\bar{g}_{j}(x^{*}))^{T}}{\|\bar{g}_{j}(x^{*})\|^{2}}&\displaystyle\quad\frac{-\bar{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|}\\[11.38109pt] \displaystyle\frac{(-\bar{g}_{j}(x^{*}))^{T}}{\|\bar{g}_{j}(x^{*})\|}&\displaystyle 1\end{pmatrix}.\end{array} (5.14)

Therefore, it holds that

(Irj+1Wj)(Δμ)j=12(2cσj+c(Δμ¯)j+σjcσj+c(g¯j(x))T(Δμ¯)jg¯j(x)2g¯j(x)+(Δμ˙)jg¯j(x)g¯j(x)(g¯j(x))T(Δμ¯)jg¯j(x)+(Δμ˙)j),\begin{array}[]{ll}(I_{r_{j}+1}-W_{j})(\Delta\mu)_{j}\\[5.69054pt] =\frac{1}{2}\begin{pmatrix}\displaystyle\frac{2c}{\sigma_{j}+c}{{(\Delta\bar{\mu})}_{j}}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|^{2}}\bar{g}_{j}(x^{*})+\frac{(\Delta\dot{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|}\bar{g}_{j}(x^{*})\\[11.38109pt] \displaystyle\frac{(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|}+(\Delta\dot{\mu})_{j}\end{pmatrix},\end{array}

which, together with (5.9), implies that

2cσj+c(Δμ¯)j2+σjcσj+c[(g¯j(x))T(Δμ¯)j]2g¯j(x)2+2(Δμ˙)j(g¯j(x))T(Δμ¯)jg¯j(x)+(Δμ˙)j2=0.\frac{2c}{\sigma_{j}+c}\|(\Delta\bar{\mu})_{j}\|^{2}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{[(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}]^{2}}{\|\bar{g}_{j}(x^{*})\|^{2}}+2\frac{(\Delta\dot{\mu})_{j}(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|}+(\Delta\dot{\mu})_{j}^{2}=0.

If (Δμ¯)j=0(\Delta\bar{\mu})_{j}=0, the above equation implies that (Δμ˙)j=0(\Delta\dot{\mu})_{j}=0 so that Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0. Otherwise, one has that (Δμ¯)j0(\Delta\bar{\mu})_{j}\neq 0, and the above equation can be equivalently formulated as

2cσj+c+σjcσj+c[(g¯j(x))T(Δμ¯)j]2g¯j(x)2(Δμ¯)j2+2(Δμ˙)j(g¯j(x))T(Δμ¯)jg¯j(x)(Δμ¯)j2+(Δμ˙)j2(Δμ¯)j2=0.\frac{2c}{\sigma_{j}+c}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{[(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}]^{2}}{\|\bar{g}_{j}(x^{*})\|^{2}\|(\Delta\bar{\mu})_{j}\|^{2}}+2\frac{(\Delta\dot{\mu})_{j}(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|\|(\Delta\bar{\mu})_{j}\|^{2}}+\frac{(\Delta\dot{\mu})_{j}^{2}}{\|(\Delta\bar{\mu})_{j}\|^{2}}=0.

One can reformulate the above equality as

ωj2+2θjωj+2cσj+c+σjcσj+cθj2=0\omega_{j}^{2}+2\theta_{j}\omega_{j}+\frac{2c}{\sigma_{j}+c}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\theta_{j}^{2}=0 (5.15)

with

θj:=(g¯j(x))T(Δμ¯)jg¯j(x)(Δμ¯)jandωj:=(Δμ˙)j(Δμ¯)j.\theta_{j}:=\frac{(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|\|(\Delta\bar{\mu})_{j}\|}\quad\mbox{and}\quad\omega_{j}:=\frac{(\Delta\dot{\mu})_{j}}{\|(\Delta\bar{\mu})_{j}\|}.

Then, in order for (5.15) to admit a real solution \omega_{j}, its discriminant (viewing (5.15) as a quadratic polynomial in \omega_{j}) must be nonnegative, i.e., 4\theta^{2}_{j}-4\left(\frac{2c}{\sigma_{j}+c}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\theta^{2}_{j}\right)\geq 0, which implies that

cσj+cθj2cσj+c0.\frac{c}{\sigma_{j}+c}\theta^{2}_{j}-\frac{c}{\sigma_{j}+c}\geq 0.

As a result, \theta^{2}_{j}\geq 1; since the Cauchy-Schwarz inequality gives |\theta_{j}|\leq 1, it must hold that \theta^{2}_{j}=1. Note that, if \theta_{j}=1, one has from (5.15) that \omega_{j}=-1, whereas if \theta_{j}=-1, then \omega_{j}=1. Hence, there always exists a nonzero constant \kappa_{j} such that

(Δμ)j=κj(g¯j(x),g˙j(x))and{κj>0, if θj=1,κj<0, if θj=1.(\Delta\mu)_{j}=\kappa_{j}\big{(}\bar{g}_{j}(x^{*}),-\dot{g}_{j}(x^{*})\big{)}\quad\mbox{and}\quad\left\{\begin{array}[]{ll}\kappa_{j}>0,&\mbox{ if }\theta_{j}=1,\\ \kappa_{j}<0,&\mbox{ if }\theta_{j}=-1.\end{array}\right. (5.16)

Note that g˙j(x)=g¯j(x)\dot{g}_{j}(x^{*})=\|\bar{g}_{j}(x^{*})\|. Hence, from (5.14) and (5.16) we know that

\begin{array}[]{ll}&W_{j}(\Delta\mu)_{j}\\[5.69054pt] &=\displaystyle\frac{1}{2}\begin{pmatrix}\displaystyle\frac{2\sigma_{j}}{\sigma_{j}+c}\kappa_{j}\bar{g}_{j}(x^{*})-\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{(\bar{g}_{j}(x^{*}))^{T}\bar{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|^{2}}\kappa_{j}\bar{g}_{j}(x^{*})+\frac{\kappa_{j}\dot{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|}\bar{g}_{j}(x^{*})\\[11.38109pt] \displaystyle-\kappa_{j}\frac{(\bar{g}_{j}(x^{*}))^{T}\bar{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|}-\kappa_{j}\dot{g}_{j}(x^{*})\end{pmatrix}\\[28.45274pt] &=\displaystyle\left(\kappa_{j}\bar{g}_{j}(x^{*}),-\kappa_{j}\dot{g}_{j}(x^{*})\right).\end{array}

Then, according to the above equation and (5.11), one has that

ξj,Wj(Δμ)j=κj(ξ¯j,g¯j(x)ξ˙jg˙j(x))=0.\langle\xi_{j},W_{j}(\Delta\mu)_{j}\rangle=\kappa_{j}\big{(}\langle\bar{\xi}_{j},\bar{g}_{j}(x^{*})\rangle-\dot{\xi}_{j}\dot{g}_{j}(x^{*})\big{)}=0.

From the above discussion, we have proved that \langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0 for all j=1,\ldots,s. Consequently, from (5.10) we know that \|\Delta\lambda\|^{2}+\|\Delta\mu\|^{2}=0, which contradicts (5.6). That is, \langle\Delta y,V(\Delta y)\rangle=0 forces \Delta y=0. Therefore, every V\in{\mathds{V}}_{c}(\lambda^{*},\mu^{*}) is negative definite, which completes the proof. ∎
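Before proceeding to the NLSDP case, we remark that the Jacobian formula of Lemma 5.1 (a) invoked in (5.14) can be checked numerically. The following Python snippet is a minimal illustrative sketch and is not part of the analysis above; it assumes the component ordering z=(\bar{z},\dot{z}) used in this section, assembles W at a point with -\|\bar{z}\|<\dot{z}<\|\bar{z}\| (where \Pi_{{\mathcal{Q}}} is continuously differentiable), and compares it with a finite-difference approximation of the derivative of the projection.

import numpy as np

# Sketch only: verify the Jacobian of the SOC projection on its smooth region.
rng = np.random.default_rng(1)
r = 4
z_bar = rng.standard_normal(r)
nb = np.linalg.norm(z_bar)
z_dot = 0.3 * nb                          # strictly between -||z_bar|| and ||z_bar||

def proj_soc(z):
    """Projection onto Q = {(u, t) in R^r x R : t >= ||u||}."""
    u, t = z[:-1], z[-1]
    s = np.linalg.norm(u)
    if t >= s:
        return z.copy()
    if t <= -s:
        return np.zeros_like(z)
    return 0.5 * (t + s) * np.concatenate([u / s, [1.0]])

# W as in Lemma 5.1 (a), cf. the first line of (5.14).
W = np.zeros((r + 1, r + 1))
W[:r, :r] = 0.5 * ((1 + z_dot / nb) * np.eye(r)
                   - (z_dot / nb) * np.outer(z_bar, z_bar) / nb**2)
W[:r, r] = W[r, :r] = 0.5 * z_bar / nb
W[r, r] = 0.5

z = np.concatenate([z_bar, [z_dot]])
eps = 1e-7
FD = np.column_stack([(proj_soc(z + eps * e) - proj_soc(z)) / eps
                      for e in np.eye(r + 1)])
print(np.linalg.norm(W - FD))             # small, of order eps

The agreement up to O(eps) is consistent with the continuous differentiability of \Pi_{{\mathcal{Q}}} on this region.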

6 The NLSDP case

Let 𝒮p{\mathcal{S}}^{p} be the linear space of all p×pp\times p real symmetric matrices and 𝒮+p{\mathcal{S}}^{p}_{+} be the cone of positive semidefinite matrices in 𝒮p{\mathcal{S}}^{p}. We consider the NLSDP problem:

minxf(x)s.t.h(x)=0,g(x)𝒮+p,\min_{x}f(x)\quad\mbox{s.t.}\quad h(x)=0,\quad g(x)\in{\mathcal{S}}_{+}^{p}, (6.1)

where f:𝒳f:{\mathcal{X}}\to{\mathds{R}}, h:𝒳mh:{\mathcal{X}}\to{\mathds{R}}^{m} and g:𝒳𝒮pg:{\mathcal{X}}\to{\mathcal{S}}^{p} are three twice continuously differentiable functions. Problem (6.1) is an instance of problem (1.1) with 𝒵=𝒮p{\mathcal{Z}}={\mathcal{S}}^{p} and K=𝒮+pK={\mathcal{S}}_{+}^{p}. We first give a quick review of some preliminary results for NLSDP problems. One may refer to [48] for more details.

6.1 Preliminaries on NLSDP

For any two matrices Z,Z^{\prime}\in{\mathcal{S}}^{p}, their inner product is defined by \langle Z,Z^{\prime}\rangle:={\rm Tr}(Z^{T}Z^{\prime}), and the induced norm is the Frobenius norm given by \|Z\|=\sqrt{\langle Z,Z\rangle}. Moreover, we write Z_{+}:=\Pi_{{\mathcal{S}}_{+}^{p}}(Z). Hence, it holds that

Z+𝒮+p,Z+Z𝒮+p,Z+,(Z+Z)=0.Z_{+}\in{\mathcal{S}}_{+}^{p},\quad Z_{+}-Z\in{\mathcal{S}}_{+}^{p},\quad\langle Z_{+},(Z_{+}-Z)\rangle=0. (6.2)

One can write the spectral decomposition of Z\in{\mathcal{S}}^{p} as Z=PDP^{T} with P\in{\mathds{R}}^{p\times p} being an orthogonal matrix and D\in{\mathds{R}}^{p\times p} being a diagonal matrix whose diagonal entries are ordered by \varrho_{1}\geq\varrho_{2}\geq\cdots\geq\varrho_{p}. We use the three index sets \alpha,\beta and \gamma to indicate the positive, zero, and negative eigenvalues of Z, respectively, i.e.,

α:={iϱi>0},β:={iϱi=0}andγ:={iϱi<0},i=1,,p.\alpha:=\{i\mid\varrho_{i}>0\},\quad\beta:=\{i\mid\varrho_{i}=0\}\quad\mbox{and}\quad\gamma:=\{i\mid\varrho_{i}<0\},\quad\forall i=1,\ldots,p. (6.3)

Then, one can write

D=(Dα000Dβ000Dγ)andP=(Pα,Pβ,Pγ)D=\begin{pmatrix}D_{\alpha}&0&0\\ 0&D_{\beta}&0\\ 0&0&D_{\gamma}\end{pmatrix}\quad\mbox{and}\quad P=\begin{pmatrix}P_{\alpha},P_{\beta},P_{\gamma}\end{pmatrix} (6.4)

with Pαp×|α|P_{\alpha}\in{\mathds{R}}^{p\times|\alpha|}, Pβp×|β|P_{\beta}\in{\mathds{R}}^{p\times|\beta|}, and Pγp×|γ|P_{\gamma}\in{\mathds{R}}^{p\times|\gamma|}. Moreover, we define the matrix U𝒮pU\in{\mathcal{S}}^{p} by

Uij:=max{ϱi,0}+max{ϱj,0}|ϱi|+|ϱj|,i,j=1,,p,U_{ij}:=\frac{\max\{\varrho_{i},0\}+\max\{\varrho_{j},0\}}{|\varrho_{i}|+|\varrho_{j}|},\quad i,j=1,\ldots,p, (6.5)

with the convention that 0/0=10/0=1. As has been shown in Sun and Sun [50], the projection operator Π𝒮+p\Pi_{{\mathcal{S}}_{+}^{p}} is strongly semismooth. Moreover, the following result in Pang et al. [35, Lemma 11] characterizes the Bouligand-subdifferential of this projection operator.

Lemma 6.1.

Suppose that Z𝒮pZ\in{\mathcal{S}}^{p} has the spectral decomposition Z=PDPTZ=PDP^{T} with PP and DD being partitioned by (6.4). Then, for any 𝐖BΠ𝒮+p(Z){\mathbf{W}}\in\partial_{B}\Pi_{{\mathcal{S}}_{+}^{p}}(Z), there exist two index sets α\alpha^{\prime} and γ\gamma^{\prime} that partition β\beta, together with a matrix Γ|α|×|γ|\Gamma\in{\mathds{R}}^{|\alpha^{\prime}|\times|\gamma^{\prime}|} with entries in [0,1][0,1], such that

𝐖(H)=P([PTHP]αα[PTHP]αβUαγ[PTHP]αγ[PTHP]αβTM([PTHP]ββ)0[PTHP]αγTUαγT00)PT,H𝒮p,\begin{array}[]{l}{\mathbf{W}}(H)=P\begin{pmatrix}[P^{T}HP]_{\alpha\alpha}&[P^{T}HP]_{\alpha\beta}&U_{\alpha\gamma}\circ[P^{T}HP]_{\alpha\gamma}\\[5.69054pt] [P^{T}HP]_{\alpha\beta}^{T}&M([P^{T}HP]_{\beta\beta})&0\\[5.69054pt] [P^{T}HP]_{\alpha\gamma}^{T}\circ U_{\alpha\gamma}^{T}&0&0\end{pmatrix}P^{T},\\[5.69054pt] \hfill\forall\,H\in{\mathcal{S}}^{p},\end{array} (6.6)

where ``"``\circ" denotes the Hadamard product and

M([PTHP]ββ):=(([PTHP]ββ)ααΓ([PTHP]ββ)αγ([PTHP]ββ)γαΓT0).M([P^{T}HP]_{\beta\beta}):=\begin{pmatrix}([P^{T}HP]_{\beta\beta})_{\alpha^{\prime}\alpha^{\prime}}&\Gamma\circ([P^{T}HP]_{\beta\beta})_{\alpha^{\prime}\gamma^{\prime}}\\[5.69054pt] ([P^{T}HP]_{\beta\beta})_{\gamma^{\prime}\alpha^{\prime}}\circ\Gamma^{T}&0\end{pmatrix}.
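As an illustration that is not part of the original development, formula (6.6) can be checked numerically at a nonsingular Z: there \beta=\emptyset, the projection \Pi_{{\mathcal{S}}_{+}^{p}} is differentiable, and (6.6) reduces to {\mathbf{W}}(H)=P\,(U\circ(P^{T}HP))\,P^{T} with U given by (6.5). A minimal Python sketch:

import numpy as np

# Sketch only: check (6.5)-(6.6) at a generic (nonsingular) symmetric Z.
rng = np.random.default_rng(0)
p = 5
Z = rng.standard_normal((p, p)); Z = (Z + Z.T) / 2   # nonsingular almost surely

def proj_psd(Y):
    """Projection onto the PSD cone via the spectral decomposition."""
    d, P = np.linalg.eigh(Y)
    return P @ np.diag(np.maximum(d, 0.0)) @ P.T

d, P = np.linalg.eigh(Z)     # eigh orders ascending; (6.6) is order-invariant here
U = (np.maximum(d, 0)[:, None] + np.maximum(d, 0)[None, :]) / (
        np.abs(d)[:, None] + np.abs(d)[None, :])     # (6.5); no 0/0 case arises

H = rng.standard_normal((p, p)); H = (H + H.T) / 2
WH = P @ (U * (P.T @ H @ P)) @ P.T                   # (6.6) with beta empty

eps = 1e-6
FD = (proj_psd(Z + eps * H) - proj_psd(Z)) / eps     # finite-difference derivative
print(np.linalg.norm(WH - FD))                       # small, of order eps

Since every entry of U lies in [0,1], the same computation also illustrates why {\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}} is positive semidefinite, a fact used in the proof of Theorem 6.4 below.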

It is easy to see that

𝒯𝒮+p(Z+)={H𝒮pPα¯THPα¯0},{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(Z_{+})=\{H\in{\mathcal{S}}^{p}\mid P_{\bar{\alpha}}^{T}HP_{\bar{\alpha}}\succeq 0\}, (6.7)

where α¯:=βγ\bar{\alpha}:=\beta\cup\gamma and Pα¯:=[Pβ,Pγ]P_{\bar{\alpha}}:=[P_{\beta},P_{\gamma}]. From (6.7) we know that the lineality space of this set can be represented by

{{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(Z_{+})\big{)}=\{H\in{\mathcal{S}}^{p}\mid P_{\bar{\alpha}}^{T}HP_{\bar{\alpha}}=0\}. (6.8)

The following definition of constraint nondegeneracy for the NLSDP problem comes from Sun [48, Definition 3.2], which is an analogue of the LICQ for NLP problems. The relation between this condition and the concept of nondegeneracy in Robinson [41] also has been well discussed in [48, Section 3].

Definition 6.2 (Constraint nondegeneracy).

We say that a feasible point xx to the NLSDP problem is constraint nondegenerate if

(𝒥h(x)𝒥g(x))𝒳+({0}𝗅𝗂𝗇(𝒯𝒮+p(g(x))))=(m𝒮p).\begin{pmatrix}{\mathcal{J}}h(x)\\ {\mathcal{J}}g(x)\end{pmatrix}{\mathcal{X}}+\begin{pmatrix}\{0\}\\ {{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x))\big{)}\end{pmatrix}=\begin{pmatrix}{\mathds{R}}^{m}\\ {\mathcal{S}}^{p}\end{pmatrix}. (6.9)

The critical cone of {\mathcal{S}}_{+}^{p} at Z\in{\mathcal{S}}^{p} is defined by

\begin{array}[]{ll}{\mathcal{C}}(Z;{\mathcal{S}}_{+}^{p}):&={\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(Z_{+})\cap(Z_{+}-Z)^{\bot}\\[5.69054pt] &=\{H\in{\mathcal{S}}^{p}:P_{\beta}^{T}HP_{\beta}\succeq 0,\,P_{\beta}^{T}HP_{\gamma}=0,\,P_{\gamma}^{T}HP_{\gamma}=0\}.\end{array}

Therefore,

𝖺𝖿𝖿(𝒞(Z;S+p))={H𝒮pPβTHPγ=0,PγTHPγ=0}.{{\sf aff\,}}\big{(}{\mathcal{C}}(Z;S_{+}^{p})\big{)}=\{H\in{\mathcal{S}}^{p}\mid P_{\beta}^{T}HP_{\gamma}=0,\,P_{\gamma}^{T}HP_{\gamma}=0\}.

Let (x,λ,μ)(x^{*},\lambda^{*},\mu^{*}) be a solution to the KKT system (3.2) of the NLSDP problem (6.1). Then, the critical cone 𝒞(x){\mathcal{C}}(x^{*}) of NLSDP at xx^{*} is defined as

{\mathcal{C}}(x^{*}):=\{d\in{\mathcal{X}}\mid{\mathcal{J}}h(x^{*})d=0,\ {\mathcal{J}}g(x^{*})d\in{\mathcal{C}}(g(x^{*})-\mu^{*};{\mathcal{S}}_{+}^{p})\}.

Since an explicit formula for the affine hull of {\mathcal{C}}(x^{*}) is not readily available, an outer approximation to this affine hull with respect to (\lambda^{*},\mu^{*}) was introduced in Sun [48] (one may refer to [48, Section 3] for more information), i.e.,

\begin{array}[]{ll}{{\sf app\,}}(\lambda^{*},\mu^{*}):&=\{d\in{\mathcal{X}}:{\mathcal{J}}h(x^{*})d=0,\ {\mathcal{J}}g(x^{*})d\in{{\sf aff\,}}\,{\mathcal{C}}(g(x^{*})-\mu^{*};{\mathcal{S}}_{+}^{p})\}\\[5.69054pt] &=\{d\in{\mathcal{X}}:{\mathcal{J}}h(x^{*})d=0,\ P_{\beta}^{T}({\mathcal{J}}g(x^{*})d)P_{\gamma}=0,\ P_{\gamma}^{T}({\mathcal{J}}g(x^{*})d)P_{\gamma}=0\}.\end{array}

Moreover, for any (λ,μ)(x)(\lambda^{*},\mu^{*})\in{\mathcal{M}}(x^{*}), one has that 𝖺𝖿𝖿(𝒞(x))𝖺𝗉𝗉(λ,μ){{\sf aff\,}}({\mathcal{C}}(x^{*}))\subseteq{{\sf app\,}}(\lambda^{*},\mu^{*}), which is different from (5.4) for the NLSOCP problems.

For any given matrix B𝒮pB\in{\mathcal{S}}^{p}, we define the linear-quadratic function ΥB:𝒮p×𝒮p\Upsilon_{B}:{\mathcal{S}}^{p}\times{\mathcal{S}}^{p}\to{\mathds{R}}, which is linear in the first argument and quadratic in the second argument, by

ΥB(Γ,A):=2Γ,ABA,(Γ,A)𝒮p×𝒮p,\Upsilon_{B}(\Gamma,A):=2\langle\Gamma,AB^{\dagger}A\rangle,\quad(\Gamma,A)\in{\mathcal{S}}^{p}\times{\mathcal{S}}^{p},

where BB^{\dagger} is the Moore-Penrose pseudo-inverse of BB. The following definition of the strong second-order sufficient condition for NLSDP problems was given in Sun [48, Definition 3.2], which is an extension of the strong second-order sufficient condition for NLP problems in Robinson [40] to NLSDP problems.

Definition 6.3 (Strong second-order sufficient condition for NLSDP).

Let xx^{*} be a stationary point of the NLSDP problem. We say that the strong second-order sufficient condition holds at xx^{*} if

sup(λ,μ)(x){d,𝒥xx20(x,λ,μ)dΥg(x)(μ,𝒥g(x)d)}>0,d𝒞^(x)\{0},\sup_{(\lambda,\mu)\in{\mathcal{M}}(x^{*})}\left\{\langle d,{\mathcal{J}}^{2}_{xx}{\mathcal{L}}_{0}(x^{*},\lambda,\mu)d\rangle-\Upsilon_{g(x^{*})}(\mu,{\mathcal{J}}g(x^{*})d)\right\}>0,\ \forall d\in\widehat{\mathcal{C}}(x^{*})\backslash\{0\},

where

𝒞^(x):=(λ,μ)(x)𝖺𝗉𝗉(λ,μ).\widehat{\mathcal{C}}(x^{*}):=\bigcap_{(\lambda,\mu)\in{\mathcal{M}}(x^{*})}{{\sf app\,}}(\lambda,\mu).

6.2 Main results

Based on the above preparations, we are ready to establish the main results of this section.

Theorem 6.4.

Suppose that both the constraint nondegeneracy and the strong second-order sufficient condition for problem (6.1) hold at x^{*} (with (\lambda^{*},\mu^{*})\in{\mathcal{M}}(x^{*})). Then, Assumptions 3.1 and 3.2 hold.

Proof.

The validity of Assumption 3.1 follows from [8, Proposition 17] and [51, Proposition 4]. Next, we show that Assumption 3.2 holds.

According to Assumption 3.1 we know that (x)={(λ,μ)}{\mathcal{M}}(x^{*})=\{(\lambda^{*},\mu^{*})\} is a singleton. Recall that (x,λ,μ)(x^{*},\lambda^{*},\mu^{*}) is a solution to the KKT system of the NLSDP problem. Then,

g(x)𝒮+pandμ𝒮+p.g(x^{*})\in{\mathcal{S}}_{+}^{p}\quad\mbox{and}\quad\mu^{*}\in{\mathcal{S}}_{+}^{p}.

Let \Xi:=g(x^{*})-\mu^{*} have the spectral decomposition \Xi=PDP^{T}, with P and D having the representations in (6.4) and \alpha,\beta,\gamma being defined in (6.3). Then one has from (6.2) that

g(x)=P(Dα00000000)PTandμ=P(00000000Dγ)PT.g(x^{*})=P\begin{pmatrix}D_{\alpha}&0&0\\ 0&0&0\\ 0&0&0\end{pmatrix}P^{T}\quad\mbox{and}\quad-\mu^{*}=P\begin{pmatrix}0&0&0\\ 0&0&0\\ 0&0&D_{\gamma}\end{pmatrix}P^{T}. (6.10)

Therefore, by using (6.7) and (6.8) one has

𝒯𝒮+p(g(x))={H𝒮p[PβPγ]TH[PβPγ]0},{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x^{*}))=\{H\in{\mathcal{S}}^{p}\mid[P_{\beta}\ P_{\gamma}]^{T}H[P_{\beta}\ P_{\gamma}]\succeq 0\},

so that

𝗅𝗂𝗇(𝒯S+p(g(x)))={H𝒮p[PβPγ]TH[PβPγ]=0}.{{\sf lin\,}}\big{(}{\mathcal{T}}_{S_{+}^{p}}(g(x^{*}))\big{)}=\{H\in{\mathcal{S}}^{p}\mid[P_{\beta}\ P_{\gamma}]^{T}H[P_{\beta}\ P_{\gamma}]=0\}. (6.11)

Define Z:=μcg(x)Z:=\mu^{*}-cg(x^{*}) and Q=[Pγ,Pβ,Pα]Q=[P_{\gamma},P_{\beta},P_{\alpha}]. Then, by using (6.10) one can get

Z=P(cDα0000000Dγ)PT=Q(Dγ0000000cDα)QT.Z=P\begin{pmatrix}-cD_{\alpha}&0&0\\ 0&0&0\\ 0&0&-D_{\gamma}\end{pmatrix}P^{T}=Q\begin{pmatrix}-D_{\gamma}&0&0\\ 0&0&0\\ 0&0&-cD_{\alpha}\end{pmatrix}Q^{T}.

Let \Delta\lambda\in{\mathds{R}}^{m} and \Delta\mu\in{\mathcal{S}}^{p} be such that (\Delta\lambda,\Delta\mu)\neq 0. From (6.6) and (6.10) we know that for any {\mathbf{W}}\in\partial_{B}\Pi_{{\mathcal{S}}_{+}^{p}}(Z),

Δμ,(𝒮p𝐖)Δμ=Δμ,QQT(Δμ)QQTQ(PγTΔμPγPγTΔμPβUγαPγTΔμPαPβTΔμPγM¯(PβTΔμPβ)0PαTΔμPγUγαT00)QT=QTΔμQ,QT(Δμ)Q(PγTΔμPγPγTΔμPβUγαPγTΔμPαPβTΔμPγM¯(PβTΔμPβ)0PαTΔμPγUγαT00)=QTΔμQ,(00PγTΔμPαUγαPγTΔμPα0(IβM¯)(PβTΔμPβ)PβTΔμPαPαTΔμPγPαTΔμPγUγαTPαTΔμPβPαTΔμPα),\begin{array}[]{l}\displaystyle\langle\Delta\mu,({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu\rangle\\ =\left\langle\Delta\mu,Q\,Q^{T}(\Delta\mu)Q\,Q^{T}-Q\begin{pmatrix}P_{\gamma}^{T}{\Delta\mu}P_{\gamma}&P_{\gamma}^{T}{\Delta\mu}P_{\beta}&U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] P_{\beta}^{T}{\Delta\mu}P_{\gamma}&\overline{M}(P_{\beta}^{T}{\Delta\mu}P_{\beta})&0\\[5.69054pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&0&0\end{pmatrix}Q^{T}\right\rangle\\[34.1433pt] =\left\langle Q^{T}\Delta\mu Q,Q^{T}(\Delta\mu)Q\,-\begin{pmatrix}P_{\gamma}^{T}{\Delta\mu}P_{\gamma}&P_{\gamma}^{T}{\Delta\mu}P_{\beta}&U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] P_{\beta}^{T}{\Delta\mu}P_{\gamma}&\overline{M}(P_{\beta}^{T}{\Delta\mu}P_{\beta})&0\\[5.69054pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&0&0\end{pmatrix}\right\rangle\\[34.1433pt] =\left\langle Q^{T}\Delta\mu Q,\begin{pmatrix}0&0&P_{\gamma}^{T}{\Delta\mu}P_{\alpha}-U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] 0&(I_{\beta}-\overline{M})(P_{\beta}^{T}{\Delta\mu}P_{\beta})&P_{\beta}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}-P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&P_{\alpha}^{T}{\Delta\mu}P_{\beta}&P_{\alpha}^{T}{\Delta\mu}P_{\alpha}\end{pmatrix}\right\rangle,\end{array}

(6.12)

where Iβ|β|×|β|I_{\beta}\in{\mathds{R}}^{|\beta|\times|\beta|} is the identity matrix and M¯\overline{M} has the same structure and properties as the matrix MM in Lemma 6.1. Then, by (6.5) and the fact that Δμ\Delta\mu is arbitrarily given, we can conclude that (𝒮p𝐖)({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}}) is positive semidefinite, where 𝒮p{\mathcal{I}}_{{\mathcal{S}}^{p}} is the identity operator on 𝒮p{\mathcal{S}}^{p}. Since Assumption 3.1 holds, any 𝐕𝕍c(λ,μ){\mathbf{V}}\in{\mathds{V}}_{c}(\lambda^{*},\mu^{*}) can be represented by the right-hand side of (3.10) with 𝒜c(λ,μ,𝐖){\mathcal{A}}_{c}(\lambda^{*},\mu^{*},{\mathbf{W}}) being nonsingular. Therefore,

(Δλ,Δμ),𝐕(Δλ,Δμ)=h(x)(Δλ)+g(x)𝐖(Δμ),(𝒜c(λ,μ,𝐖))1(h(x)(Δλ)+g(x)𝐖(Δμ))+1cΔμ,(𝒮p𝐖)Δμ.\begin{array}[]{ll}-\langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle\\[5.69054pt] =\Big{\langle}\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu),\\[5.69054pt] \quad({\mathcal{A}}_{c}(\lambda^{*},\mu^{*},{\mathbf{W}}))^{-1}\big{(}\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu)\big{)}\Big{\rangle}+\frac{1}{c}\left\langle\Delta\mu,({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu\right\rangle.\end{array}

Therefore, if (Δλ,Δμ),𝐕(Δλ,Δμ)=0\langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle=0, one has that

Δμ,(𝒮p𝐖)Δμ=0,\langle\Delta\mu,({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu\rangle=0, (6.13)

and

h(x)(Δλ)+g(x)𝐖(Δμ)=0.\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu)=0. (6.14)

Since the constraint nondegeneracy (6.9) holds for the NLSDP problem at x^{*}, there exist a vector d\in{\mathcal{X}} and a matrix S\in{{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x^{*}))\big{)} such that

𝒥h(x)d=Δλand𝒥g(x)d+S=Δμ.{\mathcal{J}}h(x^{*})d=\Delta\lambda\quad\mbox{and}\quad{\mathcal{J}}g(x^{*})d+S=\Delta\mu.

Therefore, one can get from (6.13) and (6.14) that

Δλ2+Δμ2=Δλ,Δλ+Δμ,Δμ=Δλ,Δλ+𝐖Δμ,Δμ+(𝒮p𝐖)Δμ,Δμ=Δλ,𝒥h(x)d+𝐖Δμ,𝒥g(x)d+S+(𝒮p𝐖)Δμ,Δμ=h(x)(Δλ)+g(x)𝐖(Δμ),d+(𝒮p𝐖)Δμ,Δμ+𝐖Δμ,S=𝐖Δμ,S.\begin{array}[]{l}\|\Delta\lambda\|^{2}+\|\Delta\mu\|^{2}=\langle\Delta\lambda,\Delta\lambda\rangle+\langle\Delta\mu,\Delta\mu\rangle\\[5.69054pt] =\langle\Delta\lambda,\Delta\lambda\rangle+\langle{\mathbf{W}}\Delta\mu,\Delta\mu\rangle+\langle({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu,\Delta\mu\rangle\\[5.69054pt] =\langle\Delta\lambda,{\mathcal{J}}h(x^{*})d\rangle+\langle{\mathbf{W}}\Delta\mu,{\mathcal{J}}g(x^{*})d+S\rangle+\langle({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu,\Delta\mu\rangle\\[5.69054pt] =\langle\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu),d\rangle+\langle({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu,\Delta\mu\rangle+\langle{\mathbf{W}}\Delta\mu,S\rangle\\[5.69054pt] =\langle{\mathbf{W}}\Delta\mu,S\rangle.\end{array}

Since S𝗅𝗂𝗇(𝒯𝒮+p(g(x)))S\in{{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x^{*}))\big{)}, we know from (6.11) that [PβPγ]TS[PβPγ]=0[P_{\beta}\ P_{\gamma}]^{T}S[P_{\beta}\ P_{\gamma}]=0. Therefore, it holds that

𝐖Δμ,S=(PγTΔμPγPγTΔμPβUγαPγTΔμPαPβTΔμPγM¯(PβTΔμPβ)0PαTΔμPγUγαT00),(00PγTSPα00PβTSPαPαTSPγPαTSPβPαTSPα)=UγαPγTΔμPα,PγTSPα+PαTΔμPγUγαT,PαTSPγ.\begin{array}[]{lll}\langle{\mathbf{W}}\Delta\mu,S\rangle&=&\left\langle\begin{pmatrix}P_{\gamma}^{T}{\Delta\mu}P_{\gamma}&P_{\gamma}^{T}{\Delta\mu}P_{\beta}&U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[2.84526pt] P_{\beta}^{T}{\Delta\mu}P_{\gamma}&\overline{M}(P_{\beta}^{T}{\Delta\mu}P_{\beta})&0\\[2.84526pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&0&0\end{pmatrix}\right.,\\[28.45274pt] &&\qquad\qquad\qquad\left.\begin{pmatrix}0&0&P^{T}_{\gamma}SP_{\alpha}\\[2.84526pt] 0&0&P^{T}_{\beta}SP_{\alpha}\\[2.84526pt] P^{T}_{\alpha}SP_{\gamma}&P^{T}_{\alpha}SP_{\beta}&P^{T}_{\alpha}SP_{\alpha}\end{pmatrix}\right\rangle\\[28.45274pt] &=&\langle U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha},P^{T}_{\gamma}SP_{\alpha}\rangle+\langle P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T},P^{T}_{\alpha}SP_{\gamma}\rangle.\end{array}

Recall the supposition that \langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle=0. From (6.12) and (6.13) we know that

PγTΔμPα,PγTΔμPαUγαPγTΔμPα=0.\langle P_{\gamma}^{T}{\Delta\mu}P_{\alpha},P_{\gamma}^{T}{\Delta\mu}P_{\alpha}-U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\rangle=0.

Then, by using (6.5) and (6.10), and noting that every entry of U_{\gamma\alpha} lies in the open interval (0,1), one can get \|P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\|=0. Hence \langle{\mathbf{W}}\Delta\mu,S\rangle=0. Consequently, it holds that

Δλ2+Δμ2=0.\|\Delta\lambda\|^{2}+\|\Delta\mu\|^{2}=0.

This implies that if (Δλ,Δμ),𝐕(Δλ,Δμ)=0\langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle=0, one must have Δλ=0\Delta\lambda=0 and Δμ=0\Delta\mu=0. Therefore, any 𝐕𝕍c(λ,μ){\mathbf{V}}\in{\mathds{V}}_{c}(\lambda^{*},\mu^{*}) is negative definite. This completes the proof. ∎

7 Numerical illustration

In this section, we use a simple numerical example to test whether the second-order augmented Lagrangian method can achieve superlinear convergence in practice. For comparison, the classic augmented Lagrangian method, which converges at a linear rate, is also implemented.

Consider the following problem

min(x1,x2)2{12(x11)2+12(x22)2x1=0,x20}.\min_{(x_{1},x_{2})\in{\mathds{R}}^{2}}\left\{\frac{1}{2}(x_{1}-1)^{2}+\frac{1}{2}(x_{2}-2)^{2}\ \mid\ x_{1}=0,\ x_{2}\geq 0\right\}.

It is easy to verify that the solution of this problem is given by (x_{1}^{*},x_{2}^{*})=(0,2), and both the LICQ (Definition 4.1) and the strong second-order sufficient condition (Definition 4.3) are satisfied at this point with the corresponding multipliers given by (\lambda^{*},\mu^{*})=(-1,0). We set the penalty parameter c=1, and implement both the augmented Lagrangian method and the second-order augmented Lagrangian method for this problem with the initial multipliers (\lambda^{0},\mu^{0}):=(100,100) (this initial point is not intentionally chosen; the same numerical behavior can be observed from other initial points).

The numerical experiment is conducted by running Matlab R2020a on a MacBook Pro (macOS 11.2.3) with an Intel 2.7 GHz quad-core i7 processor and 16 GB of RAM. We measure the error at the k-th iteration via

ηk:=(λk,μk)(λ,μ).\eta^{k}:=\|(\lambda^{k},\mu^{k})-(\lambda^{*},\mu^{*})\|.

The algorithms are terminated once \eta^{k}\leq 10^{-20}. Moreover, we set \eta^{k} to 10^{-50} whenever \eta^{k}\leq 10^{-50}, so that a valid value is still obtained when taking the logarithm of an error that is extremely close to zero.
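For reference, both update rules can be reproduced in a few lines of code. The following Python sketch is our own illustrative reconstruction (the reported experiment was run in Matlab): the closed-form inner minimizer of {\mathcal{L}}_{c}(\cdot,\lambda,\mu) and the element V of {\mathds{V}}_{c}(\lambda,\mu) from (3.10) are specialized to this toy problem, with h(x)=x_{1}, g(x)=x_{2}, K={\mathds{R}}_{+}, and the classic update \lambda^{+}=\lambda-ch(x), \mu^{+}=\Pi_{K}(\mu-cg(x)).

import numpy as np

# Sketch only: first-order vs. second-order multiplier updates on the toy problem.
c = 1.0
y_star = np.array([-1.0, 0.0])

def inner_solve(lam, mu):
    """Closed-form minimizer of the augmented Lagrangian for this problem."""
    x1 = (1.0 + lam) / (1.0 + c)
    x2 = (2.0 + mu) / (1.0 + c) if mu > 2.0 * c else 2.0
    return np.array([x1, x2])

def grad_theta(lam, mu):
    """Gradient of the dual function theta_c, cf. (B.2)."""
    x = inner_solve(lam, mu)
    return np.array([-x[0], (max(mu - c * x[1], 0.0) - mu) / c]), x

def V_elem(mu, x):
    """An element of V_c(lam, mu) per (3.10); W is 0 or 1 since K = R_+."""
    W = 1.0 if mu - c * x[1] >= 0.0 else 0.0
    A = np.diag([1.0 + c, 1.0 + c * W])   # A_c here: Hessian of L_0 is the identity
    G = np.array([[1.0, 0.0], [0.0, W]])  # columns: grad h and grad g * W
    return -G.T @ np.linalg.solve(A, G) - (1.0 / c) * np.diag([0.0, 1.0 - W])

y1 = np.array([100.0, 100.0])   # classic method of multipliers
y2 = y1.copy()                  # second-order method of multipliers
for k in range(10):
    x = inner_solve(*y1)        # first-order update of the multipliers
    y1 = np.array([y1[0] - c * x[0], max(y1[1] - c * x[1], 0.0)])
    g, x = grad_theta(*y2)      # one generalized Newton step on the dual
    y2 = y2 - np.linalg.solve(V_elem(y2[1], x), g)
    print(k, np.linalg.norm(y1 - y_star), np.linalg.norm(y2 - y_star))

In this run the error of the classic update decreases by a factor of roughly 1/(1+c) per iteration, while the generalized Newton update reaches the exact multipliers within a couple of steps, matching the behavior displayed in Figure 1.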

Figure 1 clearly shows the behaviors of the two methods. The left part shows how the errors of the two methods vary with the iteration number, while the right part, by taking the logarithm of the errors, explicitly shows the linear convergence of the classic augmented Lagrangian method, as well as the superlinear convergence of the second-order method of multipliers.

Figure 1: The error (left) and the logarithm of the error (right) with respect to the iteration number.

8 Conclusions

In this paper, we have proposed a second-order method of multipliers for solving the NLP, the NLSOCP and the NLSDP problems. The proposed method is a combination of the classic method of multipliers and a specially designed nonsmooth Newton method for updating the multipliers. The local convergence, as well as the superlinear convergence rate, of this second-order method was established under the constraint nondegeneracy (or the LICQ) together with a strong second-order sufficient condition. We re-emphasize that the conditions we used in this paper for proving the superlinear convergence rate of the second-order method of multipliers are exactly those used in the literature for establishing the linear convergence rate (or the asymptotically superlinear rate, if the penalty parameter goes to infinity) of the classic method of multipliers without assuming strict complementarity. Besides, a simple numerical test was conducted to demonstrate the superlinear convergence behavior of the method studied in this paper.

Since the proposed method builds on both the augmented Lagrangian method and the generalized Newton method, we are not able to establish any global convergence property of it in general. However, to make the method implementable, globalization techniques should be further studied, together with computational studies on solving the subproblems and updating the multipliers.

References

  • [1] Alizadeh F, Goldfarb D. Second-order cone programming. Math Program, 2003, 95(1): 3–51
  • [2] Bertsekas D P. On penalty and multiplier methods for constrained minimization. SIAM J Control Optim, 1976, 14(2): 216–235
  • [3] Bertsekas D P. Multiplier methods: A survey. Automatica, 1976, 12(2): 133–145
  • [4] Bertsekas D P. On the convergence properties of second-order methods of multipliers. J Optim Theory Appl, 1978, 25(3): 443–449
  • [5] Bertsekas D P. Constrained Optimization and Lagrange Multiplier Methods. New York: Academic, 1982
  • [6] Bonnans J F, Ramírez H. Perturbation analysis of second-order cone programming problems. Math Program, 2005, 104(2): 205–227
  • [7] Bonnans J F, Shapiro A. Nondegeneracy and quantitative stability of parameterized optimization problems with multiple solutions. SIAM J Optim, 1998, 8(4): 940–946
  • [8] Bonnans J F, Shapiro A. Perturbation Analysis of Optimization Problems. New York: Springer, 2000
  • [9] Brusch R. A rapidly convergent method for equality constrained function minimization. In: Proceedings of the 1973 IEEE Conference on Decision and Control. San Diego, California, 1973, 80–81
  • [10] Bueno L, Haeser G, Santos L. Towards an efficient augmented Lagrangian method for convex quadratic programming. Comput Optim Appl, 2020, 76: 767–800
  • [11] Buys J. Dual algorithms for constrained optimization problems. PhD Dissertation. Netherlands: Univ. of Leiden, 1972
  • [12] Chen J-S, Chen X, Tseng P. Analysis of nonsmooth vector-valued functions associated with second-order cones. Math Program, 2004, 101: 95–117
  • [13] Chen L, Liao A P. On the convergence properties of a second-order augmented Lagrangian method for nonlinear programming problems with inequality constraints. J Optim Theory Appl, 2020, 187: 248–265
  • [14] Chen X, Sun D F, Sun J. Complementarity functions and numerical experiments on some smoothing Newton methods for second-order-cone complementarity problems. Comput Optim Appl, 2003, 25: 39–56
  • [15] Clarke F H. Optimization and Nonsmooth Analysis. New York: Wiley, 1983
  • [16] Conn A, Gould N, Toint P. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J Numer Anal, 1991, 28(2): 545–572
  • [17] Conn A, Gould N, Toint P. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Berlin Heidelberg: Springer-Verlag, 1992
  • [18] Contesse-Becker L. Extended convergence results for the method of multipliers for nonstrictly binding inequality constraints. J Optim Theory Appl, 1993, 79(2): 273–310
  • [19] Fletcher R. An ideal penalty function for constrained optimization. IMA J Appl Math, 1975, 15(3): 319–342
  • [20] Fontecilla R, Steihaug T, Tapia R. A convergence theory for a class of quasi-Newton methods for constrained optimization. SIAM J Numer Anal, 1987, 24(5): 1133–1151
  • [21] Fukushima M, Luo Z Q, Tseng P. Smoothing functions for second-order-cone complementarity problems. SIAM J Optim, 2001, 12: 436–460
  • [22] Hayashi S, Yamashita N, Fukushima M. A combined smoothing and regularization method for monotone second-order cone complementarity problems. SIAM J Optim, 2005, 15: 593–615
  • [23] Hestenes M. Multiplier and gradient methods. J Optim Theory Appl, 1969, 4(5): 303–320
  • [24] Ito K, Kunisch K. The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math Program, 1990, 46(1-3): 341–360
  • [25] Kanzow C, Ferenczi I, Fukushima M. On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM J Optim, 2009, 20(1): 297–320
  • [26] Kummer B. Newton’s method based on generalized derivatives for nonsmooth functions: Convergence analysis. In: Oettli W (ed.), Advances in Optimization, Berlin: Springer, 1992, 171–194
  • [27] Kummer B. Newton’s method for nondifferentiable functions. In: Guddat J (ed.), Advances in Mathematical Optimization, Berlin: Akademie-Verlag, 1998, 114–125
  • [28] Li X D, Sun D F, Toh K-C. A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems. SIAM J Optim, 2018, 28: 433–458
  • [29] Liu Y J, Zhang L W. Convergence analysis of the augmented Lagrangian method for nonlinear second-order cone optimization problems. Nonlinear Anal, 2007, 67: 1359–1373
  • [30] Liu Y J, Zhang L W. Convergence of the augmented Lagrangian method for nonlinear optimization problems over second-order cones. J Optim Theory Appl, 2008, 139: 557–575
  • [31] Meng F W, Sun D F, Zhao G Y. Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Math Program, 2005, 104: 561–581
  • [32] Mifflin R. Semismooth and semiconvex functions in constrained optimization. SIAM J Control Optim, 1977, 15(6): 959–972
  • [33] Nocedal J, Wright S J. Numerical Optimization (2nd Edition). New York: Springer-Verlag, 2006
  • [34] Outrata J V, Sun D F. On the coderivative of the projection operator onto the second-order cone. Set-Valued Anal, 2008, 16: 999–1014
  • [35] Pang J-S, Sun D F, Sun J. Semismooth homeomorphisms and strong stability of semidefinite and Lorentz complementarity problems. Math Oper Res, 2003, 28: 39–63
  • [36] Pang J-S, Qi L Q. Nonsmooth equations: motivation and algorithms. SIAM J Optim, 1993, 3: 443–465
  • [37] Powell M J D. A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, New York: Academic, 1969, 283–298
  • [38] Qi L Q, Sun J. A nonsmooth version of Newton’s method. Math Program, 1993, 58(1-3): 353–367
  • [39] Qi L Q. Convergence analysis of some algorithms for solving nonsmooth equations. Math Oper Res, 1993, 18(1): 227–244
  • [40] Robinson S M. Strongly regular generalized equations. Math Oper Res, 1980, 5(1): 43–62
  • [41] Robinson S M. Local structure of feasible sets in nonlinear programming, Part II: Nondegeneracy. Math Program Study, 1984, 22: 217–230
  • [42] Robinson S M. Constraint nondegeneracy in variational analysis. Math Oper Res, 2003, 28: 201–232
  • [43] Rockafellar R T. A dual approach to solving nonlinear programming problems by unconstrained optimization. Math Program, 1973, 5(1): 354–373
  • [44] Rockafellar R T. The multiplier method of Hestenes and Powell applied to convex programming. J Optim Theory Appl, 1973, 12(6): 555–562
  • [45] Rockafellar R T. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math Oper Res, 1976, 1(2): 97–116
  • [46] Rockafellar R T, Wets R J-B. Variational Analysis. Berlin: Springer-Verlag, 1998
  • [47] Shapiro A. Sensitivity analysis of generalized equations. J Math Sci, 2003, 115: 2554–2565
  • [48] Sun D F. The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications. Math Oper Res, 2006, 31(4): 761–776
  • [49] Sun D F, Han J Y. Newton and quasi-Newton methods for a class of nonsmooth equations and related problems. SIAM J Optim, 1997, 7: 463–480
  • [50] Sun D, Sun J. Semismooth matrix valued functions. Math Oper Res, 2002, 27: 150–169
  • [51] Sun D F, Sun J, Zhang L W. The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math Program, 2008, 114(2): 349–391
  • [52] Tretyakov N. A method of penalty estimates for convex programming problems. Ékonomika i Matematicheskie Metody, 1973, 9: 525–540
  • [53] Yang L Q, Sun D F, Toh K-C. SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math Program Comput, 2015, 7: 331–366
  • [54] Yuan Y-X. Analysis on a superlinearly convergent augmented Lagrangian method. Acta Math Sinica, English Series, 2014, 30(1): 1–10
  • [55] Zhao X Y, Sun D F, Toh K-C. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J Optim, 2010, 20: 1737–1765

Appendix A Proof of Proposition 3.3

The following result is a consequence of Qi and Sun [38, Theorem 2.3].

Lemma A.1.

The locally Lipschitz continuous function F:O𝒰F:O\to{\mathcal{U}} is semismooth at uOu\in O if and only if for any sequence {uk}O\{u^{k}\}\subset O converging to uu with VkF(uk)V^{k}\in\partial F(u^{k}), it holds that F(u;uku)Vk(uku)=o(uku)F\,^{\prime}(u;u^{k}-u)-V^{k}(u^{k}-u)=o(\|u^{k}-u\|).

Now we start to prove Proposition 3.3.

Proof of Proposition 3.3.

Since Assumption 3.1 holds and cc¯c\geq\bar{c}, one can get the two constants ε\varepsilon and δ0\delta_{0}, as well as the locally Lipschitz function 𝐱c(){\mathbf{x}}_{c}(\cdot), from Proposition 3.1. For convenience, we denote y:=(λ,μ)m×𝒵y:=(\lambda,\mu)\in{\mathds{R}}^{m}\times{\mathcal{Z}}. Suppose that y𝔹δ0(λ,μ)y\in{\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}). From (1.2), (3.1), (3.4) and (3.8) one can get

x0(𝐱c(y),λc(y),μc(y))=xc(𝐱c(y),λ,μ)=0.\nabla_{x}{\mathcal{L}}_{0}({\mathbf{x}}_{c}(y),\lambda_{c}(y),\mu_{c}(y))=\nabla_{x}{\mathcal{L}}_{c}({\mathbf{x}}_{c}(y),\lambda,\mu)=0.

Then, by taking the directional derivative of \nabla_{x}{\mathcal{L}}_{0}({\mathbf{x}}_{c}(y),\lambda_{c}(y),\mu_{c}(y)) along the given direction \Delta y:=(\Delta\lambda,\Delta\mu)\in{\mathds{R}}^{m}\times{\mathcal{Z}} and using (3.8), one can see that

0=xx20(𝐱c(y),λc(y),μc(y))𝐱c(y;Δy)h(𝐱c(y))(Δλ)+ch(𝐱c(y))𝒥h(𝐱c(y))𝐱c(y;Δy)g(𝐱c(y))ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy)).\begin{array}[]{ll}0=&\nabla_{xx}^{2}{\mathcal{L}}_{0}({\mathbf{x}}_{c}(y),\lambda_{c}(y),\mu_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] &-\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+c\nabla h({\mathbf{x}}_{c}(y)){\mathcal{J}}h({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] &-\nabla g({\mathbf{x}}_{c}(y))\Pi_{K}^{\prime}(\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)).\end{array} (A.1)

On the other hand, since the projection operator ΠK()\Pi_{K}(\cdot) is strongly semismooth, we know that there always exists a linear operator W^BΠK(μcg(𝐱c(y)))\widehat{W}\in\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y))) such that

ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy))=W^(Δμc𝒥g(𝐱c(y))𝐱c(y;Δy)).\Pi_{K}^{\prime}\left(\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\right)=\widehat{W}\left(\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\right). (A.2)

Therefore, by combining (A.1), (A.2) and (3.9) together, one can get

𝒜c(λ,μ,W^)𝐱c(y;Δy)=h(𝐱c(y))(Δλ)+g(𝐱c(y))W^(Δμ).{\mathcal{A}}_{c}(\lambda,\mu,\widehat{W}){\mathbf{x}}_{c}^{\prime}(y;\Delta y)=\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+\nabla g({\mathbf{x}}_{c}(y))\widehat{W}(\Delta\mu). (A.3)

From [31, Proposition 1] we know that the range of \partial_{B}\Pi_{K}(\cdot) is bounded. Then, by [46, Proposition 5.51(b)] and [46, Proposition 5.52(b)] we know that the composite mapping {\mathcal{A}}_{c}(\lambda,\mu,\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y)))) is outer semicontinuous in {\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}). Note that the set {\mathcal{A}}_{c}(\lambda^{*},\mu^{*},\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*}))) is bounded and bounded away from zero. Hence, by (3.11) and Assumption 3.1, we know from [46, Proposition 5.12(a)] that there exists a positive constant \delta_{1}\leq\delta_{0} such that {\mathcal{A}}_{c}(\lambda,\mu,W) is always positive definite whenever W\in\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y))) with y=(\lambda,\mu)\in{\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}). Consequently, in this case, we know from (A.3) that

𝐱c(y;Δy)=(𝒜c(λ,μ,W^))1(h(𝐱c(y))(Δλ)+g(𝐱c(y))W^(Δμ)).{\mathbf{x}}_{c}^{\prime}(y;\Delta y)=({\mathcal{A}}_{c}(\lambda,\mu,\widehat{W}))^{-1}\left(\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+\nabla g({\mathbf{x}}_{c}(y))\widehat{W}(\Delta\mu)\right). (A.4)

We use DϑcD_{\nabla\vartheta_{c}} to denote the set of all the points in 𝔹δ1(λ,μ){\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}) where ϑc()\nabla\vartheta_{c}(\cdot) is Fréchet-differentiable. Since ϑc()\nabla\vartheta_{c}(\cdot) is semismooth in 𝔹δ1(λ,μ){\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}), it holds that for any y=(λ,μ)Dϑcy=(\lambda,\mu)\in D_{\nabla\vartheta_{c}},

2ϑc(y)(Δy)=(𝒥h(𝐱c(y))𝐱c(y;Δy)c1Δμ+c1ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy))).\nabla^{2}\vartheta_{c}(y)(\Delta y)=\left(\begin{matrix}-{\mathcal{J}}h({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[2.84526pt] -c^{-1}\Delta\mu+c^{-1}\Pi_{K}^{\prime}\big{(}\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\big{)}\end{matrix}\right). (A.5)

Thus, by combining (A.2), (A.4) and (A.5) together, one can see from (3.10) that

\nabla^{2}\vartheta_{c}(y)(\Delta y)\in\left\{V(\Delta y)\mid V\in{\mathds{V}}_{c}(y)\right\},\quad\forall y=(\lambda,\mu)\in D_{\nabla\vartheta_{c}}.

Since 𝐱c(){\mathbf{x}}_{c}(\cdot) is locally Lipschitz continuous and BΠK()\partial_{B}\Pi_{K}(\cdot) is bounded-valued and outer semicontinuous, by [46, Propositions 5.51(b) & 5.52(b)] we know that 𝕍c{\mathds{V}}_{c} is compact-valued and outer semicontinuous. Hence, for any (λ,μ)𝔹δ1(λ,μ)(\lambda,\mu)\in{\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}), one has that B(ϑc)(λ,μ)𝕍c(λ,μ)\partial_{B}(\nabla\vartheta_{c})(\lambda,\mu)\subseteq{\mathds{V}}_{c}(\lambda,\mu). This completes the proof. ∎

Appendix B Proof of Theorem 3.5

Proof.

Since Assumption 3.1 holds and cc¯c\geq\bar{c}, one can get the two constants ε\varepsilon and δ0\delta_{0}, as well as the locally Lipschitz function 𝐱c(){\mathbf{x}}_{c}(\cdot), via Proposition 3.1. Meanwhile, from Proposition 3.3 we know that there exists a constant δ1δ0\delta_{1}\leq\delta_{0} such that the mapping 𝕍c{\mathds{V}}_{c} defined by (3.10) is well-defined, compact-valued and outer semicontinuous in 𝔹δ1(λ,μ){\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}). Since the sequence {(λk,μk)}\{(\lambda^{k},\mu^{k})\} converges to (λ,μ)(\lambda^{*},\mu^{*}), one always has that (λk,μk)𝔹δ1(λ,μ)(\lambda^{k},\mu^{k})\in{\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}) for all sufficiently large kk. Therefore, the set 𝕍c(λk,μk){\mathds{V}}_{c}(\lambda^{k},\mu^{k}) is nonempty and compact if kk is sufficiently large. Moreover, according to Assumption 3.2 and [46, Proposition 5.12(a)] we know that there exists a positive constant δ2δ1\delta_{2}\leq\delta_{1} such that every element in 𝕍c(λ,μ){\mathds{V}}_{c}(\lambda,\mu) is negative definite whenever (λ,μ)𝔹δ2(λ,μ)(\lambda,\mu)\in{\mathds{B}}_{\delta_{2}}(\lambda^{*},\mu^{*}).

Next, we prove (3.12). For all sufficiently large kk, with the convention that xk+1:=𝐱c(λk,μk)x^{k+1}:={\mathbf{x}}_{c}(\lambda^{k},\mu^{k}), one can see from (3.10) that there exists a linear operator WkBΠK(μkcg(xk+1))W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})) such that

Vk=(𝒥h(xk+1)Wk𝒥g(xk+1))(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)c1(000𝒵Wk).\begin{array}[]{ll}V^{k}=&-\begin{pmatrix}{\mathcal{J}}h(x^{k+1})\\[0.0pt] W^{k}{\mathcal{J}}g(x^{k+1})\end{pmatrix}\left({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k})\right)^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)\\[11.38109pt] &-c^{-1}\left(\begin{matrix}0&0\\[0.0pt] 0&{\mathcal{I}}_{{\mathcal{Z}}}-W^{k}\end{matrix}\right).\end{array} (B.1)

Meanwhile, one can see from (3.2) and Lemma 3.2 that

ϑc(λk,μk)ϑc(λ,μ)=(h(xk+1)+h(x)c1(μkμ)+c1ΠK(μkcg(xk+1))c1ΠK(μcg(x))).\begin{array}[]{l}\nabla\vartheta_{c}(\lambda^{k},\mu^{k})-\nabla\vartheta_{c}(\lambda^{*},\mu^{*})\\[5.69054pt] =\left(\begin{matrix}-h(x^{k+1})+h(x^{*})\\[5.69054pt] -c^{-1}(\mu^{k}-\mu^{*})+c^{-1}\Pi_{K}\big{(}\mu^{k}-cg(x^{k+1}))-c^{-1}\Pi_{K}(\mu^{*}-cg(x^{*})\big{)}\end{matrix}\right).\end{array} (B.2)

For convenience, we denote y^{k}:=(\lambda^{k},\mu^{k}) and y^{*}:=(\lambda^{*},\mu^{*}). Then, by (B.1) and (B.2) we know that (3.12) holds if and only if, for all W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})), one has that, as k\to\infty,

h(x)h(xk+1)+𝒥h(xk+1)(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)=o(yky)\begin{array}[]{l}h(x^{*})-h(x^{k+1})\\[2.84526pt] +{\mathcal{J}}h(x^{k+1})({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})\\[2.84526pt] =o(\|y^{k}-y^{*}\|)\end{array} (B.3)

and

Wk𝒥g(xk+1)(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)+c1ΠK(μkcg(xk+1))c1ΠK(μcg(x))c1Wk(μkμ)=o(yky).W^{k}{\mathcal{J}}g(x^{k+1})({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})\\[2.84526pt] +c^{-1}\Pi_{K}(\mu^{k}-cg(x^{k+1}))-c^{-1}\Pi_{K}(\mu^{*}-cg(x^{*}))-c^{-1}W^{k}(\mu^{k}-\mu^{*})\\[2.84526pt] =o(\|y^{k}-y^{*}\|). (B.4)

We first verify (B.3). We know from (A.2) and (A.4) that, when k is sufficiently large,

𝐱c(yk;yky)=(𝒜c(λk,μk,W^k))1(h(xk+1)(λkλ)+g(xk+1)W^k(μkμ)),\begin{array}[]{ll}{\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =\big{(}{\mathcal{A}}_{c}(\lambda^{k},\mu^{k},\widehat{W}^{k})\big{)}^{-1}\big{(}\nabla h(x^{k+1})(\lambda^{k}-\lambda^{*})+\nabla g(x^{k+1})\widehat{W}^{k}(\mu^{k}-\mu^{*})\big{)},\end{array} (B.5)

where the linear operator W^kBΠK(μkcg(xk+1))\widehat{W}^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})) is chosen such that

ΠK(μkcg(xk+1);μkμc𝒥g(xk+1)𝐱c(yk;yky))=W^k(μkμc𝒥g(xk+1)𝐱c(yk;yky)).\begin{array}[]{ll}\displaystyle\Pi_{K}^{\prime}\big{(}\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\big{)}\\[5.69054pt] \displaystyle=\widehat{W}^{k}\big{(}\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\big{)}.\end{array} (B.6)

As has been discussed in Appendix A, the range of \partial_{B}\Pi_{K}(\cdot) is bounded, so that the sequence \{\widehat{W}^{k}\} is also bounded. Moreover, the set {\mathcal{A}}_{c}(\lambda^{*},\mu^{*},\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*}))) is bounded and bounded away from zero, while the composite mapping {\mathcal{A}}_{c}(\lambda,\mu,\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y)))) is outer semicontinuous in {\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}). Since the functions g and h are twice continuously differentiable, the sequences \{\nabla h(x^{k+1})\} and \{\nabla g(x^{k+1})\} are bounded. Then, we know from (B.5) that there exists a constant \rho_{1}>0 such that, for all sufficiently large k, \|{\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\|\leq\rho_{1}\|y^{k}-y^{*}\|.

Since \Pi_{K}(\cdot) is strongly semismooth and W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})), one can see from (B.6) that, for all sufficiently large k, as k\to\infty,

ΠK(μkcg(xk+1);μkμc𝒥g(xk+1)𝐱c(yk;yky))Wk(μkμc𝒥g(xk+1)𝐱c(yk;yky))=(W^kWk)(μkμc𝒥g(xk+1)𝐱c(yk;yky))=(W^kWk)(μkcg(xk+1)μ+cg(x))+c(W^kWk)(g(xk+1)g(x)𝒥g(xk+1)𝐱c(yk;yky)))=o(μkcg(𝐱c(yk))μ+cg(x))+c(W^kWk)(g(𝐱c(yk))g(𝐱c(y))𝒥g(xk+1)𝐱c(yk;yky))=o(yky),\begin{array}[]{l}\Pi_{K}^{\prime}\left(\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\right)\\[5.69054pt] \qquad-W^{k}(\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*}))\\[5.69054pt] =(\widehat{W}^{k}-W^{k})\left(\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\right)\\[5.69054pt] =(\widehat{W}^{k}-W^{k})\left(\mu^{k}-cg(x^{k+1})-\mu^{*}+cg(x^{*})\right)\\[5.69054pt] \quad+c(\widehat{W}^{k}-W^{k})\left(g(x^{k+1})-g(x^{*})-{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*}))\right)\\[5.69054pt] =o\left(\|\mu^{k}-cg({\mathbf{x}}_{c}(y^{k}))-\mu^{*}+cg(x^{*})\|\right)\\[2.84526pt] \quad+c(\widehat{W}^{k}-W^{k})\big{(}g({\mathbf{x}}_{c}(y^{k}))-g({\mathbf{x}}_{c}(y^{*}))-{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\big{)}\\[5.69054pt] =o(\|y^{k}-y^{*}\|),\end{array}

where the penultimate equality is due to Lemma A.1, and the last equality comes from the fact that the function g({\mathbf{x}}_{c}(\cdot)) is semismooth (hence directionally differentiable) around y^{*}. Then, since \{\nabla g(x^{k+1})\} is bounded, we can get from the above equation that

g(xk+1)ΠK(μkcg(xk+1);μkμc𝒥g(xk+1)𝐱c(yk;yky))+cg(xk+1)Wk𝒥g(xk+1)𝐱c(yk;yky)g(xk+1)Wk(μkμ)=o(yky).\begin{array}[]{l}\nabla g(x^{k+1})\Pi_{K}^{\prime}\left(\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\right)\\[5.69054pt] \qquad+c\nabla g(x^{k+1})W^{k}{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})-\nabla g(x^{k+1})W^{k}(\mu^{k}-\mu^{*})\\[5.69054pt] =o(\|y^{k}-y^{*}\|).\end{array} (B.7)

Note that (A.1) in Appendix A holds for y𝔹δ2(λ,μ)y\in{\mathds{B}}_{\delta_{2}}(\lambda^{*},\mu^{*}). Then, by using (A.1) and (3.9) together we can get that for y𝔹δ2(λ,μ)y\in{\mathds{B}}_{\delta_{2}}(\lambda^{*},\mu^{*}) and WBΠK(μcg(𝐱c(y)))W\in\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y))),

𝒜c(λ,μ,W)𝐱c(y;Δy)=h(𝐱c(y))(Δλ)+cg(𝐱c(λ,μ))W𝒥g(𝐱c(λ,μ))𝐱c(y;Δy)+g(𝐱c(y))ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy)).\begin{array}[]{ll}\displaystyle{\mathcal{A}}_{c}(\lambda,\mu,W){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] =\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+c\nabla g({\mathbf{x}}_{c}(\lambda,\mu))W{\mathcal{J}}g({\mathbf{x}}_{c}(\lambda,\mu)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] \qquad+\nabla g({\mathbf{x}}_{c}(y))\Pi_{K}^{\prime}(\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)).\end{array} (B.8)

Then, by substituting (B.7) into (B.8), one can get that for all sufficiently large k with W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})),

𝒜c(λk,μk,Wk)𝐱c(yk;yky)=h(xk+1)(λkλ)+g(xk+1)Wk(μkμ)+o(yky),\begin{array}[]{ll}{\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =\nabla h(x^{k+1})(\lambda^{k}-\lambda^{*})+\nabla g(x^{k+1})W^{k}(\mu^{k}-\mu^{*})+o(\|y^{k}-y^{*}\|),\end{array}

which, together with Assumption 3.1 and the fact that the mapping {\mathcal{A}}_{c}(\lambda,\mu,\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y)))) is outer semicontinuous in {\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}), implies that for all sufficiently large k, as k\to\infty,

𝐱c(yk;yky)=(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)+o(yky).\begin{array}[]{ll}{\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})+o(\|y^{k}-y^{*}\|).\end{array} (B.9)

Since h({\mathbf{x}}_{c}(\cdot)) is locally semismooth, we have that, for all sufficiently large k, as k\to\infty,

h(x)=h(xk+1)𝒥h(xk+1)𝐱c(yk;yky)+o(yky).h(x^{*})=h(x^{k+1})-{\mathcal{J}}h(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})+o(\|y^{k}-y^{*}\|). (B.10)

Now, by substituting (B.9) into (B.10) and noting that the sequence \{{\mathcal{J}}h(x^{k+1})\} is bounded, we can get (B.3). Next, we verify (B.4). Note that, for all sufficiently large k, as k\to\infty, it holds that

\begin{array}[]{l}c^{-1}(\Pi_{K}(\mu^{k}-cg(x^{k+1}))-\Pi_{K}(\mu^{*}-cg(x^{*})))\\[2.84526pt] =c^{-1}\Pi_{K}^{\prime}\left(\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)+o(\|y^{k}-y^{*}\|),\end{array} (B.11)

and

cg(xk+1)+cg(x)=c𝒥g(xk+1)𝐱c(yk;yky)+o(yky).-cg(x^{k+1})+cg(x^{*})=-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})+o(\|y^{k}-y^{*}\|). (B.12)

Meanwhile, one can choose for each kk the linear operator W~kBΠK(μkcg(xk+1))\widetilde{W}^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})) such that

W~k(μkμcg(xk+1)+cg(x))=ΠK(μkcg(xk+1);μkμcg(xk+1)+cg(x)).\begin{array}[]{l}\widetilde{W}^{k}\big{(}\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\big{)}\\[5.69054pt] =\Pi_{K}^{\prime}\big{(}\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\big{)}.\end{array} (B.13)

Then, by combining (B.11), (B.12) and (B.13), we can get that

c1(ΠK(μkcg(xk+1))ΠK(μcg(x)))c1Wk(μkμ)+Wk𝒥g(xk+1)𝐱c(yk;yky)=c1W~k(μkμcg(xk+1)+cg(x))c1Wk(μkμcg(xk+1)+cg(x))+o(yky)=c1(W~kWk)(μkμcg(xk+1)+cg(x))+o(yky)=o(yky),\begin{array}[]{l}c^{-1}\left(\Pi_{K}(\mu^{k}-cg(x^{k+1}))-\Pi_{K}(\mu^{*}-cg(x^{*}))\right)-c^{-1}W^{k}(\mu^{k}-\mu^{*})\\[5.69054pt] \quad+W^{k}{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =c^{-1}\widetilde{W}^{k}\left(\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)\\[5.69054pt] \qquad-c^{-1}W^{k}\left(\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)+o(\|y^{k}-y^{*}\|)\\[5.69054pt] =c^{-1}(\widetilde{W}^{k}-W^{k})\left(\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)+o(\|y^{k}-y^{*}\|)\\[5.69054pt] =o(\|y^{k}-y^{*}\|),\end{array} (B.14)

where the last equality comes from Lemma A.1 and the fact that \widetilde{W}^{k},W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})). Note that (B.9) implies that

Wk𝒥g(xk+1)𝐱c(yk;yky)=Wk𝒥g(xk+1)(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)+o(yky).\begin{array}[]{l}W^{k}{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =W^{k}{\mathcal{J}}g(x^{k+1})({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})+o(\|y^{k}-y^{*}\|).\end{array} (B.15)

Thus, by substituting (B.15) into (B.14) we can get (B.4), which completes the proof. ∎