
A Unified Convergence Analysis of a Second-Order Method of Multipliers for Nonlinear Conic Programming

Liang Chen, School of Mathematics, Hunan University, Changsha 410082, P.R. China ([email protected]). This author was supported in part by the National Natural Science Foundation of China (11801158), the Hunan Provincial Natural Science Foundation of China (2019JJ50040), and the Fundamental Research Funds for the Central Universities in China.
Junyuan Zhu, School of Mathematics, Hunan University, Changsha 410082, P.R. China ([email protected]).
Xinyuan Zhao, School of Mathematics, Beijing University of Technology, Beijing 100124, P.R. China ([email protected]). This author was supported in part by the National Natural Science Foundation of China (11871002) and the General Program of Science and Technology of Beijing Municipal Education Commission.
Abstract

In this paper, we accomplish a unified convergence analysis of a second-order method of multipliers (i.e., a second-order augmented Lagrangian method) for solving conventional nonlinear conic optimization problems. Specifically, the algorithm we investigate incorporates a specially designed nonsmooth (generalized) Newton step to furnish a second-order update rule for the multipliers. We first show, in a unified fashion, that under a few abstract assumptions the proposed method is locally convergent and possesses a (nonasymptotic) superlinear convergence rate, even though the penalty parameter is fixed and/or the strict complementarity fails. Subsequently, we demonstrate that, for the three typical scenarios, i.e., classic nonlinear programming, nonlinear second-order cone programming, and nonlinear semidefinite programming, these abstract assumptions are exactly the implications of the well-known sufficient conditions that were assumed for establishing the Q-linear convergence rates of the method of multipliers without assuming strict complementarity.

Keywords.  second-order method of multipliers, augmented Lagrangian method, convergence rate, generalized Newton method, second-order cone programming, semidefinite programming

AMS Subject Classification [2000].  Primary 65K05, 49J52; Secondary 90C22, 26E25

1 Introduction

Let $\mathcal{X}$ and $\mathcal{Z}$ be two finite-dimensional real Hilbert spaces, each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. Consider the following general nonlinear conic optimization problem

\min_{x\in\mathcal{X}} f(x) \quad \mbox{s.t.} \quad h(x)=0,\ g(x)\in K, \qquad (1.1)

where $f:\mathcal{X}\to\mathds{R}$, $h:\mathcal{X}\to\mathds{R}^{m}$ and $g:\mathcal{X}\to\mathcal{Z}$ are three twice continuously differentiable functions, and $K$ is a closed convex self-dual cone in $\mathcal{Z}$, i.e., it holds that

K \equiv K^{*} := \{\mu \mid \langle\mu,\mu'\rangle\geq 0,\ \forall\mu'\in K\}.

Let $c>0$ be the given penalty parameter. The augmented Lagrangian function of problem (1.1) is defined by

\mathcal{L}_{c}(x,\lambda,\mu) := f(x) - \langle\lambda,h(x)\rangle + \frac{c}{2}\|h(x)\|^{2} + \frac{1}{2c}\big(\|\Pi_{K}(\mu-cg(x))\|^{2} - \|\mu\|^{2}\big), \quad \forall(x,\lambda,\mu)\in\mathcal{X}\times\mathds{R}^{m}\times\mathcal{Z}, \qquad (1.2)

where $\Pi_{K}(\cdot)$ denotes the metric projection operator onto $K$.

The method of multipliers (also known as the augmented Lagrangian method) for solving problem (1.1) can be stated, in brief, as follows. Let $c_{0}>0$ be a given constant and $(\lambda^{0},\mu^{0})\in\mathds{R}^{m}\times K$ be the initial estimate of the multipliers. For $k=0,1,\ldots$, determine $x^{k+1}\in\mathcal{X}$ that minimizes the augmented Lagrangian function $\mathcal{L}_{c_{k}}(x,\lambda^{k},\mu^{k})$ with respect to $x$, and then update the multipliers via

\lambda^{k+1} := \lambda^{k} - c_{k}h(x^{k+1}), \qquad \mu^{k+1} := \Pi_{K}\big(\mu^{k} - c_{k}g(x^{k+1})\big), \qquad (1.3)

and thereafter update the penalty parameter such that $c_{k+1}\geq c_{k}$.
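To fix ideas, the following is a minimal sketch (not the authors' implementation) of the scheme (1.2)-(1.3) in Python, specialized to the polyhedral case $K=\mathds{R}^{p}_{+}$, so that $\Pi_{K}$ is the componentwise $\max(\cdot,0)$; the toy problem, the solver used for the inner minimization, and all names are our own hypothetical choices.

```python
# First-order method of multipliers for min f(x) s.t. h(x)=0, g(x) in R_+^p.
# Toy problem (hypothetical): min (x1-1)^2 + (x2-2)^2  s.t.  x1+x2 = 1,  x2 >= 0.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2
h = lambda x: np.array([x[0] + x[1] - 1.0])   # equality constraint h(x) = 0
g = lambda x: np.array([x[1]])                # conic constraint g(x) in K = R_+

def L_c(x, lam, mu, c):
    # augmented Lagrangian (1.2) with Pi_K = max(., 0)
    proj = np.maximum(mu - c * g(x), 0.0)
    return (f(x) - lam @ h(x) + 0.5 * c * h(x) @ h(x)
            + (proj @ proj - mu @ mu) / (2.0 * c))

lam, mu, c, x = np.zeros(1), np.zeros(1), 10.0, np.zeros(2)
for k in range(25):
    x = minimize(L_c, x, args=(lam, mu, c)).x   # inner minimization in x
    lam = lam - c * h(x)                        # first line of (1.3)
    mu = np.maximum(mu - c * g(x), 0.0)         # second line of (1.3)
print(x, lam, mu)   # approaches x* = (0, 1), lambda* = -2, mu* = 0
```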

This method was initiated by Hestenes [23] and Powell [37] for solving equality constrained nonlinear programming (NLP) problems, and was generalized by Rockafellar [43] to NLP problems with inequality constraints. Nowadays, the augmented Lagrangian method serves as the backbone of many optimization solvers, such as [17, 55, 53, 28], which are readily available to both academic and commercial users. The literature on the augmented Lagrangian method is vast, so we only review the references on analyzing and improving its convergence rate, which are the most relevant to this work.

The linear convergence rates of the method of multipliers for NLP problems have been extensively discussed in the literature, such as Powell [37], Rockafellar [43, 44], Tretyakov [52], Bertsekas [2], Conn [16], Ito and Kunisch [24], and Contesse-Becker [18], under various assumptions. One may refer to the monograph of Bertsekas [5] and the references therein for a thorough discussion of the augmented Lagrangian method for NLP problems. For nonlinear second-order cone programming (NLSOCP) problems, the local linear convergence rate of the augmented Lagrangian method was analyzed in Liu and Zhang [29], and the corresponding results were further improved in Liu and Zhang [30]. For nonlinear semidefinite programming (NLSDP) problems, the local linear rate of convergence was analyzed by Sun et al. [51], and, importantly, the strict complementarity is not necessary for this result. Here, we should mention that, in the vast majority of the references above, the convergence rates of the method of multipliers become asymptotically superlinear provided that the penalty parameter sequence $\{c_{k}\}$ tends to infinity as $k\to\infty$.

In the seminal work [45] of Rockafellar, it was observed that, in the convex programming setting, the method of multipliers can be viewed as a proximal point algorithm applied to the dual problem. On the other hand, without assuming convexity, one can locally interpret the iteration procedure (1.3) for updating the multipliers as a first-order steepest ascent step applied to a certain concave (dual) problem. Consequently, given the linear convergence rates of the method of multipliers mentioned above, it is natural to ask whether one can design a second-order method of multipliers, in the sense that a second-order update rule for the multipliers, instead of (1.3), is incorporated in the method of multipliers for calculating $\lambda^{k+1}$ and $\mu^{k+1}$. Meanwhile, it is also interesting to know whether the corresponding superlinear convergence rates are achievable, even when the sequence $\{c_{k}\}$ is bounded and/or the strict complementarity fails.

In fact, attempts to resolve this question go back a long way. For NLP problems without inequality constraints, Buys [11] first introduced a second-order procedure for updating the multipliers, which can be viewed as an application of the classic Newton method to a certain dual problem. Such a procedure was later investigated and refined by Bertsekas [3, 4, 5]. Later, Brusch [9] and Fletcher [19] independently proposed updating rules for the multipliers via the quasi-Newton method, and this was also studied in Fontecilla et al. [20]. For NLP problems with inequality constraints, the augmented Lagrangian function is no longer twice continuously differentiable, which makes it much more difficult to design a second-order method of multipliers. The first progress in this direction was made by Yuan [54], whose updating rule is based on the classic Newton method, although the corresponding convergence result has not been formally provided. More recently, Chen and Liao [13] proposed a second-order augmented Lagrangian method for solving NLP problems with inequality constraints. This algorithm is based on a specially designed generalized Newton method, and the corresponding local convergence, as well as the superlinear convergence rate, was established. At the same time, an extensive numerical study of the second-order method of multipliers has been conducted for convex quadratic programming [10], and the numerical results suggest that the second-order update formula for the multipliers, at least locally, can perform much better than the usual first-order update rule [10, Section 6].

Motivated by the references mentioned above, in this paper we approach the problem of extending the second-order method of multipliers in [13] from NLP problems with inequality constraints to the more general conic optimization setting, i.e., the more complicated NLSOCP and NLSDP problems. We should emphasize that, as can be observed later, such an extension is highly nontrivial, inasmuch as the required variational analysis for the non-polyhedral conic constraints is much more intricate than that for the NLP case, in which $K$ is a polyhedral convex set.

In the same vein as [13], we first use a specially designed generalized Newton method to deal with the nonsmoothness induced by the conic constraints, so as to construct a unified second-order method of multipliers for general nonlinear conic programming problems. We analyze its convergence (local convergence with a superlinear convergence rate, even if the strict complementarity fails and the penalty parameter is fixed) under certain abstract assumptions. After that, for the NLP, NLSOCP and NLSDP cases, we show separately that these abstract assumptions are implied by certain second-order optimality conditions plus a constraint qualification, which have been frequently used for establishing the Q-linear convergence rate of the method of multipliers. Furthermore, when the conic constraints vanish, the algorithm studied in this paper reduces to the classic second-order method of multipliers for NLP problems with only equality constraints, and the convergence results in this paper are consistent with all the previous related studies.

The remaining parts of this paper are organized as follows. In Section 2, we prepare the necessary preliminaries from the nonsmooth analysis literature for further discussions. In Section 3, we propose a second-order method of multipliers for solving problem (1.1), and establish its convergence under certain abstract assumptions. In Sections 4, 5 and 6, we separately specify the abstract assumptions of Section 3 for the NLP, NLSOCP and NLSDP cases, which constitutes the main contribution of this paper. Section 7 presents a simple numerical example to illustrate the superlinear convergence of the method studied in this work. We conclude this paper in Section 8.

2 Notation and preliminaries

2.1 Notation

Let $\mathcal{U}$ be an arbitrarily given finite-dimensional real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ that induces the norm $\|\cdot\|$. We use $\mathds{L}(\mathcal{U},\mathcal{U})$ to denote the linear space of all the linear operators from $\mathcal{U}$ to itself. For any given $\varepsilon>0$, we define the open $\varepsilon$-neighborhood in $\mathcal{U}$ by

\mathds{B}_{\varepsilon}(u) := \{u'\in\mathcal{U} \mid \|u-u'\| < \varepsilon\}, \quad \forall u\in\mathcal{U}.

Let $U$ be a nonempty closed (not necessarily convex) set in $\mathcal{U}$ and ${\sf lin\,}U$ denote the lineality space of $U$, i.e., the largest linear space contained in $U$. We use ${\sf aff\,}U$ to denote the affine hull of $U$, i.e., the smallest affine subspace of $\mathcal{U}$ that contains $U$. Moreover, we define the distance from a point in $\mathcal{U}$ to the set $U$ by

{\sf dist}(u,U) := \inf_{u'\in U}\|u-u'\|, \quad \forall u\in\mathcal{U}.

We use $\mathcal{T}_{U}^{i}(u)$ and $\mathcal{T}_{U}(u)$ to denote the tangent cone and the contingent cone of the set $U$ at $u\in\mathcal{U}$, respectively, which are defined by

\mathcal{T}_{U}^{i}(u) := \{u'\in\mathcal{U} \mid {\sf dist}(u+\tau u',U) = o(\tau),\ \forall\tau\geq 0\},

and

\mathcal{T}_{U}(u) := \{u'\in\mathcal{U} \mid \exists\,\tau_{k}\downarrow 0 \mbox{ such that } {\sf dist}(u+\tau_{k}u',U) = o(\tau_{k})\}.

If, additionally, $U$ is a convex set, one has that $\mathcal{T}_{U}^{i}(u) = \mathcal{T}_{U}(u)$ for all $u\in\mathcal{U}$, and this common cone is simply called the tangent cone of $U$ at $u$. In this case, we define the projection operator onto the set $U$ by

\Pi_{U}(u) = \operatorname*{arg\,min}_{u'\in U}\|u-u'\|, \quad \forall u\in\mathcal{U}.

Let $\mathcal{V}$ and $\mathcal{W}$ be two other given finite-dimensional real Hilbert spaces, each endowed with an inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. For any set $C\subseteq\mathcal{U}\times\mathcal{V}$, we define

\pi_{\mathcal{U}}(C) := \{u\in\mathcal{U} \mid \exists\, v\in\mathcal{V} \mbox{ such that } (u,v)\in C\}.

Let $F:\mathcal{U}\to\mathcal{W}$ be an arbitrarily given function. The directional derivative of $F(\cdot)$ at $u\in\mathcal{U}$ along a nonzero direction $\Delta u\in\mathcal{U}$, if it exists, is denoted by $F'(u;\Delta u)$. $F$ is said to be directionally differentiable at $u\in\mathcal{U}$ if $F'(u;\Delta u)$ exists for every nonzero $\Delta u\in\mathcal{U}$. If $F$ is a linear mapping, we use $F^{*}$ to denote its adjoint. If $F$ is Fréchet-differentiable at $u\in\mathcal{U}$, we use $\mathcal{J}F(u)$ to denote the Fréchet derivative of $F$ at this point and, in this case, we denote

\nabla F(u) := (\mathcal{J}F(u))^{*}.

Let $O$ be an open set in $\mathcal{U}$. If $F:O\to\mathcal{W}$ is locally Lipschitz continuous on $O$, then $F$ is almost everywhere Fréchet-differentiable in $O$, thanks to [46, Definition 9.1] and Rademacher's theorem [46, Theorem 9.60]. In this case, we denote the set of all the points in $O$ where $F$ is Fréchet-differentiable by $D_{F}$. Then, the Bouligand-subdifferential of $F$ at $u\in O$, denoted by $\partial_{B}F(u)$, is defined by

\partial_{B}F(u) := \Big\{\lim_{k\to\infty}\mathcal{J}F(u^{k}) \mid \{u^{k}\}\subset D_{F},\ u^{k}\to u\Big\}.

Moreover, the Clarke generalized Jacobian of $F$ at $u\in O$ is defined by

\partial F(u) := {\sf conv}\{\partial_{B}F(u)\},

i.e., the convex hull of $\partial_{B}F(u)$.
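For instance, for $F(u)=|u|$ on $\mathds{R}$ one has $D_{F}=\mathds{R}\setminus\{0\}$, $\partial_{B}F(0)=\{-1,1\}$, and $\partial F(0)=[-1,1]$.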

Let $G:\mathcal{U}\times\mathcal{V}\to\mathcal{W}$ be an arbitrarily given function. The partial Fréchet derivative of $G$ with respect to $u$ at $(u,v)\in\mathcal{U}\times\mathcal{V}$, if it exists, is denoted by $\mathcal{J}_{u}G(u,v)$, and we define $\nabla_{u}G(u,v) := (\mathcal{J}_{u}G(u,v))^{*}$. Furthermore, if $G$ is twice Fréchet-differentiable at $(u,v)\in\mathcal{U}\times\mathcal{V}$, we denote

\mathcal{J}^{2}G(u,v) := \mathcal{J}(\mathcal{J}G)(u,v), \quad \mathcal{J}^{2}_{u,u}G(u,v) := \mathcal{J}_{u}(\mathcal{J}_{u}G)(u,v), \quad \nabla^{2}G(u,v) := \mathcal{J}(\nabla G)(u,v), \quad \nabla^{2}_{uu}G(u,v) := \mathcal{J}_{u}(\nabla_{u}G)(u,v).

Finally, we mention that the notation in Sections 4, 5 and 6 is local to each of these sections, as their contents are independent of one another.

2.2 Nonsmooth Newton methods

Recall that $\mathcal{U}$ is a finite-dimensional real Hilbert space, and let $O\subset\mathcal{U}$ be a nonempty open set. Let $F:O\to\mathcal{U}$ be a locally Lipschitz continuous function and consider the following nonlinear equation

F(u) = 0. \qquad (2.1)
Definition 2.1 (Semismoothness).

Suppose that $F:O\to\mathcal{U}$ is locally Lipschitz continuous. $F$ is said to be semismooth at a point $u\in O$ if $F$ is directionally differentiable at $u$ and, for any sequence $\{u^{k}\}\subset O$ converging to $u$ and any $V^{k}\in\partial F(u^{k})$, i.e., the Clarke generalized Jacobian of $F$ at $u^{k}$, it holds that

F(u^{k}) - F(u) - V^{k}(u^{k}-u) = o(\|u^{k}-u\|). \qquad (2.2)

Moreover, $F$ is said to be strongly semismooth at $u$ if $F$ is semismooth at $u$ with (2.2) replaced by $F(u^{k}) - F(u) - V^{k}(u^{k}-u) = O(\|u^{k}-u\|^{2})$.
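As a simple illustration, $F(u)=\Pi_{\mathds{R}_{+}}(u)=\max(u,0)$ is strongly semismooth at $u=0$: for $u^{k}>0$ one has $\partial F(u^{k})=\{1\}$ and $F(u^{k})-F(0)-V^{k}u^{k}=u^{k}-u^{k}=0$, and likewise for $u^{k}<0$, so the remainder in (2.2) vanishes identically.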

To solve the nonlinear equation (2.1), the semismooth Newton method, as an instance of nonsmooth (generalized) Newton methods, requires a surrogate in $\mathds{L}(\mathcal{U},\mathcal{U})$ for the Jacobian used in the classic Newton method. For this purpose, one can prescribe a multi-valued mapping $T_{F}:O\rightrightarrows\mathds{L}(\mathcal{U},\mathcal{U})$ and make the following assumption.

Assumption 2.1.

The mapping $T_{F}:O\rightrightarrows\mathds{L}(\mathcal{U},\mathcal{U})$ satisfies

  • (a)

for any $u\in O$, $T_{F}(u)$ is a nonempty and compact subset of $\mathds{L}(\mathcal{U},\mathcal{U})$;

  • (b)

the mapping $T_{F}$ is outer semicontinuous (the definition of outer semicontinuity can be found in Rockafellar [46, Definition 5.4]);

  • (c)

$\partial_{B}F(u)\Delta u \subset T_{F}(u)\Delta u \subset \partial_{C}F(u)\Delta u$ for all $u\in O$ and $\Delta u\in\mathcal{U}$.

Here, $\partial_{C}$ is defined similarly as in [15, Proposition 2.6.2]: by decomposing $\mathcal{U} = \prod_{i=1}^{n}\mathcal{U}_{i}$ with each $\mathcal{U}_{i}$ being a finite-dimensional Hilbert space, one can write $F(u) = \prod_{i=1}^{n}F_{i}(u)$ for all $u\in O$, so that one can define

\partial_{C}F(u) := \prod_{i=1}^{n}\partial F_{i}(u) \quad\mbox{with}\quad \partial F_{i}(u) := \pi_{\mathcal{U}_{i}}(\partial F(u)), \quad \forall u\in O.

Based on the mapping $T_{F}$ defined above, the semismooth Newton method for solving (2.1) can be stated as follows (a minimal code sketch is given after the list). Let $u^{0}\in O$ be the given initial point. For $k=0,1,\ldots$,

  • 1.

choose $V^{k}\in T_{F}(u^{k})$;

  • 2.

compute the generalized Newton direction $\Delta u^{k}=\Delta u$ via solving the linear system $F(u^{k}) + V^{k}\Delta u = 0$;

  • 3.

set $u^{k+1} := u^{k} + \Delta u^{k}$.
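The following one-dimensional sketch (a hypothetical illustration, not from the original text) instantiates these three steps for the piecewise linear equation $F(u) = u + \max(u,0) - 1 = 0$, whose unique root is $u^{*}=0.5$; here $T_{F}$ simply returns one scalar element of $\partial_{B}F(u)$.

```python
# Semismooth Newton method for F(u) = u + max(u, 0) - 1 = 0 (root u* = 0.5).
F = lambda u: u + max(u, 0.0) - 1.0
T_F = lambda u: 2.0 if u >= 0.0 else 1.0   # one element of partial_B F(u)

u = -2.0                                   # initial point u^0
for k in range(10):
    V = T_F(u)                             # step 1: choose V^k in T_F(u^k)
    du = -F(u) / V                         # step 2: solve F(u^k) + V^k * du = 0
    u = u + du                             # step 3: u^{k+1} = u^k + du^k
    if abs(F(u)) < 1e-12:
        break
print(k, u)                                # reaches u = 0.5 within a few steps
```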

Remark 2.2.

The notion of semismoothness was first introduced in Mifflin [32] for functionals, and was later extended in Qi and Sun [38] to vector-valued functions. Here we adopt the definition of semismoothness from Sun and Sun [50, Definition 3.6], which was derived based on Pang and Qi [36, Proposition 1]. The semismooth Newton method was first proposed in Kummer [27, 26], and was further developed by Qi and Sun [38], Qi [39], and Sun and Han [49]. The following theorem gives the convergence properties of the above method.

Theorem 2.3 ([38, Theorem 3.2]).

Let $u^{*}\in\mathcal{U}$ be a solution to the nonlinear equation (2.1). Suppose that Assumption 2.1 holds, that $T_{F}(u^{*})$ is nonsingular, i.e., all $V\in T_{F}(u^{*})$ are nonsingular, and that $F$ is semismooth at $u^{*}$. Then, there exists $\varepsilon>0$ such that for any $u^{0}\in\mathds{B}_{\varepsilon}(u^{*})$, the sequence $\{u^{k}\}$ generated by the semismooth Newton method is well-defined and converges to $u^{*}$ with a superlinear rate, i.e., for all sufficiently large $k$, it holds that

\|u^{k+1}-u^{*}\| = o(\|u^{k}-u^{*}\|).

Furthermore, if $F$ is strongly semismooth at $u^{*}$, then the convergence rate is Q-quadratic, i.e., for all sufficiently large $k$, it holds that $\|u^{k+1}-u^{*}\| = O(\|u^{k}-u^{*}\|^{2})$.

As can be observed from the proof of [38, Theorem 3.2], the nonsingularity of $T_{F}(u^{*})$ implies that $T_{F}(\cdot)$ is uniformly bounded and nonsingular in a neighborhood of $u^{*}$. Meanwhile, the semismoothness of $F$ guarantees that, for any sequence $\{u^{k}\}$ converging to $u^{*}$ and all sufficiently large $k$,

F(u^{k}) - F(u^{*}) - V^{k}(u^{k}-u^{*}) = o(\|u^{k}-u^{*}\|), \quad \forall V^{k}\in T_{F}(u^{k}). \qquad (2.3)

In fact, from the proof of [38, Theorem 3.2] one can also see that, even if the mapping $T_{F}$ fails to satisfy Assumption 2.1(c), the semismooth Newton method remains well-defined and convergent provided that, additionally, (2.3) holds. Consequently, a much broader algorithmic framework of nonsmooth (generalized) Newton methods emerges beyond the semismooth Newton method. This observation constitutes the basis for designing the second-order method of multipliers in the subsequent section. For the convenience of further discussions, we summarize the facts mentioned above in the following proposition.

Proposition 2.4.

Let $u^{*}\in\mathcal{U}$ be a solution to the nonlinear equation (2.1). Suppose that Assumptions 2.1(a) and 2.1(b) hold, and that (2.3) holds for any sequence $\{u^{k}\}$ converging to $u^{*}$. If $T_{F}(u^{*})$ is nonsingular and $F$ is semismooth at $u^{*}$, then there exists $\varepsilon>0$ such that for any $u^{0}\in\mathds{B}_{\varepsilon}(u^{*})$, the sequence $\{u^{k}\}$ generated by the semismooth Newton method is well-defined and converges to $u^{*}$ with a superlinear rate.

Proof.

By using (2.3) instead of Assumption 2.1(c) and repeating the proof of [38, Theorem 3.2], one can get the result. ∎

3 A second-order method of multipliers

In this section, we propose a second-order method of multipliers for solving problem (1.1). This method is essentially a specially designed nonsmooth (generalized) Newton method based on Proposition 2.4. Throughout this section, we make the blanket assumption that the projection operator $\Pi_{K}(\cdot)$ is strongly semismooth, as this is fulfilled for all three classes of problems studied in the forthcoming three sections.

The Lagrangian function of problem (1.1) is defined by

\mathcal{L}_{0}(x,\lambda,\mu) := f(x) - \langle\lambda,h(x)\rangle - \langle\mu,g(x)\rangle, \quad \forall(x,\lambda,\mu)\in\mathcal{X}\times\mathds{R}^{m}\times\mathcal{Z}. \qquad (3.1)

A vector $x^{*}\in\mathcal{X}$ is called a stationary point of problem (1.1) if there exists a vector $(\lambda^{*},\mu^{*})\in\mathds{R}^{m}\times\mathcal{Z}$ such that $(x^{*},\lambda^{*},\mu^{*})$ is a solution to the following Karush-Kuhn-Tucker (KKT) system:

\nabla_{x}\mathcal{L}_{0}(x,\lambda,\mu) = 0, \quad h(x) = 0, \quad g(x)\in K,\ \ \mu\in K,\ \ \langle\mu,g(x)\rangle = 0. \qquad (3.2)

In this case, the vector $(\lambda^{*},\mu^{*})$ is called a Lagrange multiplier at $x^{*}$. We denote the set of all the Lagrange multipliers at $x\in\mathcal{X}$ by $\mathcal{M}(x)$, which is an empty set if $x$ fails to be a stationary point.
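As a quick numerical sanity check, the residual of the KKT system (3.2) can be measured directly; the following is a minimal sketch for the polyhedral case $K=\mathds{R}^{p}_{+}$, where the last line of (3.2) is equivalent to $\min(g(x),\mu)=0$ componentwise (all helper names are hypothetical).

```python
# Residual of the KKT system (3.2) with K = R_+^p.
import numpy as np

def kkt_residual(x, lam, mu, grad_f, h, Jh, g, Jg):
    r_stat = grad_f(x) - Jh(x).T @ lam - Jg(x).T @ mu  # grad_x L_0(x, lam, mu) = 0
    r_eq = h(x)                                        # h(x) = 0
    r_comp = np.minimum(g(x), mu)                      # g(x) >= 0, mu >= 0, <mu, g(x)> = 0
    return np.linalg.norm(np.concatenate([r_stat, r_eq, r_comp]))
```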

Let $c>0$ be a given constant and $x^{*}\in\mathcal{X}$ be a given stationary point of problem (1.1) with the Lagrange multiplier $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$. We define the mapping

\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W) := \nabla_{xx}^{2}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*}) + c\nabla h(x^{*})\mathcal{J}h(x^{*}) + c\nabla g(x^{*})W\mathcal{J}g(x^{*}), \quad \forall W\in\mathds{L}(\mathcal{Z},\mathcal{Z}). \qquad (3.3)

We make the following abstract assumption for the given stationary point $x^{*}\in\mathcal{X}$.

Assumption 3.1.

(i) $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ is the unique Lagrange multiplier;

(ii) there exist two positive constants $\bar{c}$ and $\eta$ such that for any $c\geq\bar{c}$,

\langle x, \mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)\,x\rangle \geq \eta\|x\|^{2}, \quad \forall W\in\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*})),\ \forall x\in\mathcal{X},

where the linear operator $\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)$ is defined by (3.3).

The following result was given in Sun et al. [51, Proposition 1].

Proposition 3.1.

Suppose that Assumption 3.1 holds and $c\geq\bar{c}$ is fixed. Then, there exist two constants $\varepsilon>0$ and $\delta_{0}>0$, both depending on $c$, and a locally Lipschitz continuous function $\mathbf{x}_{c}(\cdot)$ defined on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$ by

\mathbf{x}_{c}(\lambda,\mu) := \operatorname*{arg\,min}_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda,\mu), \qquad (3.4)

such that:

  • (a)

$\mathbf{x}_{c}(\cdot)$ is semismooth at any point in $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$;

  • (b)

for any $x\in\mathds{B}_{\varepsilon}(x^{*})$ and $(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$, every element in $\pi_{x}\partial_{B}(\nabla_{x}\mathcal{L}_{c})(x,\lambda,\mu)$ is positive definite;

  • (c)

for any $(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$, $\mathbf{x}_{c}(\lambda,\mu)$ is the unique optimal solution to

\min_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda,\mu).

If Assumption 3.1 holds and $c\geq\bar{c}$ is fixed, we can also fix the two parameters $\varepsilon>0$ and $\delta_{0}>0$ such that the conclusions (a)-(c) in Proposition 3.1 hold. In this case, we can define the real-valued function

\vartheta_{c}(\lambda,\mu) := \min_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda,\mu), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}). \qquad (3.5)

Obviously, $\vartheta_{c}(\cdot)$ is concave on its domain and

\vartheta_{c}(\lambda,\mu) = \mathcal{L}_{c}(\mathbf{x}_{c}(\lambda,\mu),\lambda,\mu), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}).

Moreover, the following result can be obtained from Proposition 3.1, and one may refer to Sun et al. [51, Proposition 2] for the detailed proof.

Lemma 3.2.

Suppose that Assumption 3.1 holds and $c\geq\bar{c}$ is fixed. Let the two constants $\varepsilon>0$ and $\delta_{0}>0$ be given by Proposition 3.1. Then, the concave function $\vartheta_{c}(\cdot)$ defined by (3.5) is continuously differentiable on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$ with its gradient given by

\nabla\vartheta_{c}(\lambda,\mu) = \begin{pmatrix} -h(\mathbf{x}_{c}(\lambda,\mu)) \\ \frac{1}{c}\,\Pi_{K}\big(\mu - cg(\mathbf{x}_{c}(\lambda,\mu))\big) - \frac{\mu}{c} \end{pmatrix}. \qquad (3.6)

Moreover, $\nabla\vartheta_{c}(\cdot)$ is semismooth on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$.

Now we start to propose our second-order method of multipliers for solving problem (1.1). Suppose that we have obtained a certain $(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$, which we denote by $(\lambda^{k},\mu^{k})$ with the index $k\geq 0$ for convenience. Then, the procedure in (1.3) to get $(\lambda^{k+1},\mu^{k+1})$ with $x^{k+1} := \mathbf{x}_{c}(\lambda^{k},\mu^{k})$ can be reformulated as

(\lambda^{k+1},\mu^{k+1}) = (\lambda^{k},\mu^{k}) + c\nabla\vartheta_{c}(\lambda^{k},\mu^{k}),

which can be interpreted as a (first-order) gradient ascent step at $(\lambda^{k},\mu^{k})$ for the purpose of maximizing $\vartheta_{c}$. Thus, it is natural to ask whether one can introduce a second-order update rule for the multipliers based on this (first-order) gradient information. Obviously, the classical Newton method cannot be applied directly, owing to the nonsmoothness induced by the projection $\Pi_{K}(\cdot)$. Moreover, even though the function $\nabla\vartheta_{c}$ is semismooth on $\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*})$ (according to Lemma 3.2), the semismooth Newton method introduced in Section 2.2 is still not directly applicable, because $\nabla\vartheta_{c}$ is a composite function and it is highly difficult to find a mapping $T_{\vartheta_{c}}:\mathds{R}^{m}\times\mathcal{Z}\rightrightarrows\mathds{L}(\mathds{R}^{m}\times\mathcal{Z},\mathds{R}^{m}\times\mathcal{Z})$ such that

\partial_{B}(\nabla\vartheta_{c})(\lambda,\mu)y \subset T_{\vartheta_{c}}(\lambda,\mu)y \subset \partial_{C}(\nabla\vartheta_{c})(\lambda,\mu)y, \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}),\ \forall y\in\mathds{R}^{m}\times\mathcal{Z}. \qquad (3.7)

Fortunately, thanks to Proposition 2.4, even if (3.7) fails, one can still hope to design a second-order method to address this issue. Next, we design a specific nonsmooth Newton method that yields a second-order method of multipliers for problem (1.1).

For this purpose, we suppose that Assumption 3.1 holds and $c\geq\bar{c}$ is fixed, so that we can define the functions

\lambda_{c}(\lambda,\mu) := \lambda - ch(\mathbf{x}_{c}(\lambda,\mu)), \qquad \mu_{c}(\lambda,\mu) := \Pi_{K}\big(\mu - cg(\mathbf{x}_{c}(\lambda,\mu))\big), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}), \qquad (3.8)

and

\mathcal{A}_{c}(\lambda,\mu,W) := \nabla_{xx}^{2}\mathcal{L}_{0}(\mathbf{x}_{c}(\lambda,\mu),\lambda_{c}(\lambda,\mu),\mu_{c}(\lambda,\mu)) + c\nabla h(\mathbf{x}_{c}(\lambda,\mu))\mathcal{J}h(\mathbf{x}_{c}(\lambda,\mu)) + c\nabla g(\mathbf{x}_{c}(\lambda,\mu))W\mathcal{J}g(\mathbf{x}_{c}(\lambda,\mu)), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{0}}(\lambda^{*},\mu^{*}),\ \forall W\in\mathds{L}(\mathcal{Z},\mathcal{Z}), \qquad (3.9)

where both the constant $\delta_{0}$ and the function $\mathbf{x}_{c}(\cdot)$ are specified by Proposition 3.1. In this case, we can also define the following set-valued mapping (from $\mathds{R}^{m}\times\mathcal{Z}$ to $\mathds{L}(\mathds{R}^{m}\times\mathcal{Z},\mathds{R}^{m}\times\mathcal{Z})$)

\mathds{V}_{c}(\lambda,\mu) := \left\{ -\begin{pmatrix} \mathcal{J}h(\mathbf{x}_{c}(\lambda,\mu)) \\ W\mathcal{J}g(\mathbf{x}_{c}(\lambda,\mu)) \end{pmatrix} (\mathcal{A}_{c}(\lambda,\mu,W))^{-1} \Big(\nabla h(\mathbf{x}_{c}(\lambda,\mu)),\ \nabla g(\mathbf{x}_{c}(\lambda,\mu))W\Big) - \begin{pmatrix} 0 & 0 \\ 0 & \frac{1}{c}(\mathcal{I}_{\mathcal{Z}}-W) \end{pmatrix} \;\middle|\; W\in\partial_{B}\Pi_{K}\Big(\mu - cg\big(\mathbf{x}_{c}(\lambda,\mu)\big)\Big) \right\}, \qquad (3.10)

where $\mathcal{I}_{\mathcal{Z}}$ denotes the identity operator in $\mathcal{Z}$. According to (3.2) and (3.3), one has that

\mathcal{A}_{c}(\lambda^{*},\mu^{*},W) = \mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W), \quad \forall W\in\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*})). \qquad (3.11)

The following result on the properties of the mapping $\mathds{V}_{c}$ defined above is crucial to the forthcoming algorithmic design. Its proof, which is mainly based on the work of [13], is given in Appendix A.

Proposition 3.3.

Suppose that Assumption 3.1 holds and $c\geq\bar{c}$. Then, there exists a constant $\delta_{1}\in(0,\delta_{0}]$ such that the mapping $\mathds{V}_{c}$ in (3.10) is well-defined, compact-valued and outer semicontinuous on $\mathds{B}_{\delta_{1}}(\lambda^{*},\mu^{*})$. Moreover, it holds that

\partial_{B}(\nabla\vartheta_{c})(\lambda,\mu) \subseteq \mathds{V}_{c}(\lambda,\mu), \quad \forall(\lambda,\mu)\in\mathds{B}_{\delta_{1}}(\lambda^{*},\mu^{*}).

Based on the above discussions, we are able to present the following second-order method of multipliers.

Algorithm 1 A second-order method of multipliers for problem (1.1).

Given $c>0$ and $(\lambda^{0},\mu^{0})\in\mathds{R}^{m}\times\mathcal{Z}$. For $k=0,1,\ldots$ (a minimal code sketch of one iteration follows the list),

  • 1.

compute $x^{k+1} := \mathbf{x}_{c}(\lambda^{k},\mu^{k}) = \operatorname*{arg\,min}_{x\in\mathds{B}_{\varepsilon}(x^{*})}\mathcal{L}_{c}(x,\lambda^{k},\mu^{k})$;

  • 2.

choose a linear operator $V^{k}\in\mathds{V}_{c}(\lambda^{k},\mu^{k})$ and solve the linear system $\nabla\vartheta_{c}(\lambda^{k},\mu^{k}) + V^{k}\Delta y = 0$ to obtain $\Delta y^{k+1}\in\mathds{R}^{m}\times\mathcal{Z}$, where $\nabla\vartheta_{c}$ is given by (3.6) and $\mathds{V}_{c}(\lambda^{k},\mu^{k})$ is given by (3.10);

  • 3.

compute $(\lambda^{k+1},\mu^{k+1}) := (\lambda^{k},\mu^{k}) + \Delta y^{k+1}$.
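For concreteness, the following is a minimal sketch of steps 2-3 for the polyhedral case $K=\mathds{R}^{p}_{+}$, where every $W\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1}))$ is a diagonal $0/1$ matrix. The caller supplies the output $x$ of step 1 together with $g(x)$, $h(x)$, the Hessian $\nabla^{2}_{xx}\mathcal{L}_{0}$ evaluated at the multipliers (3.8), and the Jacobians of $h$ and $g$; all names are hypothetical, and this is not the authors' implementation.

```python
# One multiplier update of Algorithm 1 with K = R_+^p.
import numpy as np

def second_order_step(lam, mu, c, g_x, h_x, hess_L, Jh, Jg):
    m = Jh.shape[0]
    z = mu - c * g_x
    w = (z >= 0).astype(float)           # diagonal of one W in partial_B Pi_K(z)
    # A_c(lam, mu, W) from (3.9): hess_L + c Jh^T Jh + c Jg^T W Jg
    A = hess_L + c * Jh.T @ Jh + c * (Jg.T * w) @ Jg
    B = np.vstack([Jh, w[:, None] * Jg]) # stacked block (Jh; W Jg)
    V = -B @ np.linalg.solve(A, B.T)     # first term of (3.10); B.T = (grad h, grad g W)
    V[m:, m:] -= np.diag(1.0 - w) / c    # subtract the (1/c)(I - W) block
    grad = np.concatenate([-h_x, (np.maximum(z, 0.0) - mu) / c])  # (3.6)
    dy = np.linalg.solve(V, -grad)       # step 2: solve grad + V dy = 0
    return lam + dy[:m], mu + dy[m:]     # step 3
```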

Remark 3.4.

(a) Algorithm 1 is essentially an application of a nonsmooth Newton method for (locally) maximizing the function $\vartheta_{c}$, or equivalently, for solving the nonlinear equation $\nabla\vartheta_{c}(\lambda,\mu) = 0$. In the second step of Algorithm 1, a linear operator $W\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1}))$ for computing $V^{k}$ can be obtained explicitly for the NLP, NLSOCP and NLSDP problems. Consequently, one can explicitly compute $\mathcal{A}_{c}(\lambda^{k},\mu^{k},W^{k})$ via (3.9), and the linear operator $V^{k}$ can then be calculated via (3.10).

(b) Generally, it is hard to check whether $\mathds{V}_{c}(\lambda^{k},\mu^{k})\subseteq\partial_{C}(\nabla\vartheta_{c})(\lambda^{k},\mu^{k})$. Consequently, Algorithm 1 cannot be classified as the classic semismooth Newton method introduced in Section 2.2. We should mention that it is still uncertain whether one can explicitly find a linear operator $V^{k}$ such that $V^{k}\in\partial_{C}(\nabla\vartheta_{c})(\lambda^{k},\mu^{k})$ or $V^{k}\in\partial_{B}(\nabla\vartheta_{c})(\lambda^{k},\mu^{k})$. However, as will be shown later, $V^{k}\in\mathds{V}_{c}(\lambda^{k},\mu^{k})$ is sufficient for our purpose.

To analyze the convergence properties of Algorithm 1, we make the following abstract assumption as a supplement to Assumption 3.1.

Assumption 3.2.

For any $c\geq\bar{c}$, every linear operator $V\in\mathds{V}_{c}(\lambda^{*},\mu^{*})$ satisfies $V\prec 0$.

The following result is essential to the convergence analysis of Algorithm 1. The proof of this theorem is given in Appendix B.

Theorem 3.5.

Suppose that Assumptions 3.1 and 3.2 hold, and $c\geq\bar{c}$. Let $\{(\lambda^{k},\mu^{k})\}$ be an arbitrary sequence converging to $(\lambda^{*},\mu^{*})$. Then, for all sufficiently large $k$, the set $\mathds{V}_{c}(\lambda^{k},\mu^{k})$ is nonempty and compact, and it holds that

\nabla\vartheta_{c}(\lambda^{k},\mu^{k}) - \nabla\vartheta_{c}(\lambda^{*},\mu^{*}) - V^{k}(\lambda^{k}-\lambda^{*},\mu^{k}-\mu^{*}) = o\big(\|(\lambda^{k}-\lambda^{*},\mu^{k}-\mu^{*})\|\big), \quad \forall V^{k}\in\mathds{V}_{c}(\lambda^{k},\mu^{k}). \qquad (3.12)

Now we establish the local convergence properties of Algorithm 1.

Theorem 3.6.

Suppose that Assumptions 3.1 and 3.2 hold. If $c>0$ is sufficiently large, then there exists a constant $\delta>0$ such that, if $(\lambda^{0},\mu^{0})\in\mathds{B}_{\delta}(\lambda^{*},\mu^{*})$, the whole sequence $\{(\lambda^{k},\mu^{k})\}$ generated by Algorithm 1 is well-defined and converges to $(\lambda^{*},\mu^{*})$ with a superlinear convergence rate.

Proof.

According to Lemma 3.2, the function $\nabla\vartheta_{c}$ defined by (3.6) is semismooth. Meanwhile, the mapping $\mathds{V}_{c}$ defined in (3.10) is locally nonempty, compact-valued and outer semicontinuous in a neighborhood of $(\lambda^{*},\mu^{*})$. Then, by Propositions 3.1 and 3.3, we know that Algorithm 1 is well-defined. Consequently, the theorem follows from Theorem 3.5 and Proposition 2.4. This completes the proof. ∎

We make the following remark on the abstract assumptions used in Theorem 3.6.

Remark 3.7.

Theorem 3.6 establishes the local convergence, as well as the superlinear convergence rate, of the second-order method of multipliers (Algorithm 1). However, at first glance, both Assumptions 3.1 and 3.2 are too abstract to be readily appreciated. Therefore, it is of great importance to translate them into concrete conditions. In the following three sections, we show separately that, for the three typical scenarios, i.e., the NLP, the NLSOCP, and the NLSDP, Assumptions 3.1 and 3.2 are consequences of a certain constraint qualification together with a second-order sufficient condition, and the corresponding analysis constitutes the main contribution of this paper. As can be observed later, these concrete conditions are exactly the prerequisites that were frequently used for establishing the local Q-linear convergence rate of the method of multipliers without requiring strict complementarity.

Before finishing this section, we would like to make the following remark on Algorithm 1.

Remark 3.8.

In the conventional problem settings of NLP, NLSOCP and NLSDP, even the classic augmented Lagrangian method is not necessarily globally convergent. Moreover, Algorithm 1 is based on a specially designed generalized Newton method that regards the update of the multipliers in the augmented Lagrangian method as a first-order step. As second-order approaches often require stronger conditions even for local convergence, the method is not necessarily globally convergent in general if its first-order counterpart is not. Consequently, the best possible result for the multiplier sequence generated by Algorithm 1 is local convergence. Moreover, as the study of the convergence rate relies on a certain local error bound (a global error bound being unrealistic in general), the corresponding convergence rate results are also local.

4 The NLP case

Consider the NLP problem

\min_{x} f(x) \quad\mbox{s.t.}\quad h(x)=0, \quad g(x)\geq 0, \qquad (4.1)

where $f:\mathds{R}^{n}\to\mathds{R}$, $h:\mathds{R}^{n}\to\mathds{R}^{m}$ and $g:\mathds{R}^{n}\to\mathds{R}^{p}$ are three twice continuously differentiable functions. Problem (4.1) is an instance of problem (1.1) with $\mathcal{X}\equiv\mathds{R}^{n}$, $\mathcal{Z}\equiv\mathds{R}^{p}$ and $K\equiv\mathds{R}_{+}^{p} := \{\mu\in\mathds{R}^{p} \mid \mu_{j}\geq 0,\ j=1,2,\ldots,p\}$. For convenience, we define the two index sets $\mathds{I} := \{1,\ldots,m\}$ and $\mathds{J} := \{1,\ldots,p\}$.
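In this polyhedral case, $\Pi_{K}(\mu) = \max(\mu,0)$ componentwise, and each $W\in\partial_{B}\Pi_{K}(\mu)$ is a diagonal matrix whose $j$-th diagonal entry equals $1$ if $\mu_{j}>0$, equals $0$ if $\mu_{j}<0$, and can be either $0$ or $1$ if $\mu_{j}=0$. The following two definitions are well-known [33] for the NLP problem (4.1).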

Definition 4.1 (LICQ).

Let $x\in\mathds{R}^{n}$ be a point such that $h(x)=0$ and $g(x)\geq 0$. We say that the linear independence constraint qualification (LICQ) holds at $x$ if the set of active constraint gradients

\{\nabla h_{i}(x), \nabla g_{j}(x) \mid i\in\mathds{I},\ j\in\mathds{J},\ g_{j}(x)=0\}

is linearly independent.

Let $(x^{*},\lambda^{*},\mu^{*})$ be a solution to the KKT system (3.2). The critical cone at $x^{*}$ for $(\lambda^{*},\mu^{*})$ is defined by

\mathcal{C}(x^{*},(\lambda^{*},\mu^{*})) := \left\{ d\in\mathds{R}^{n} \;\middle|\; \begin{array}{l} \langle\nabla h_{i}(x^{*}),d\rangle = 0,\ \forall i\in\mathds{I},\\ \langle\nabla g_{j}(x^{*}),d\rangle = 0,\ \forall j\in\mathds{J} \mbox{ such that } g_{j}(x^{*})=0 \mbox{ and } \mu^{*}_{j}>0,\\ \langle\nabla g_{j}(x^{*}),d\rangle \geq 0,\ \forall j\in\mathds{J} \mbox{ such that } g_{j}(x^{*})=0 \mbox{ and } \mu^{*}_{j}=0 \end{array} \right\}.
Definition 4.2 (Second-order sufficient condition for NLP).

Let $x^{*}$ be a stationary point of problem (4.1). We say that the second-order sufficient condition holds at $x^{*}$ if there exists $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ such that

\langle d, \nabla_{xx}^{2}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*})d\rangle > 0, \quad \forall d\in\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))\setminus\{0\}.

The following definition was introduced in Robinson [40].

Definition 4.3 (Strong second-order sufficient condition for NLP).

Let $x^{*}$ be a stationary point of problem (4.1). We say that the strong second-order sufficient condition holds at $x^{*}$ if there exists $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ such that

\langle d, \nabla_{xx}^{2}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*})d\rangle > 0, \quad \forall d\in{\sf aff\,}\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))\setminus\{0\},

where ${\sf aff\,}\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))$ is the affine hull of $\mathcal{C}(x^{*},(\lambda^{*},\mu^{*}))$.

The following proposition is the main result of Chen and Liao [13].

Proposition 4.4.

Suppose that both the LICQ and the strong second-order sufficient condition hold at the given stationary point $x^{*}$ of the NLP problem (4.1). Then, Assumptions 3.1 and 3.2 hold at $x^{*}$ for problem (4.1).

5 The NLSOCP case

In this section, we consider the following NLSOCP problem

\min f(x) \quad\mbox{s.t.}\quad h(x)=0, \quad g(x)\in K, \qquad (5.1)

where $f:\mathds{R}^{n}\to\mathds{R}$, $h:\mathds{R}^{n}\to\mathds{R}^{m}$, and $g:\mathds{R}^{n}\to\mathds{R}^{p}$ are three twice continuously differentiable functions, and $K := \mathcal{Q}_{1}\times\cdots\times\mathcal{Q}_{s}$ with each $\mathcal{Q}_{j}$, $j=1,\ldots,s$, being a second-order cone in $\mathds{R}^{r_{j}+1}$, i.e.,

\mathcal{Q}_{j} := \{\mu_{j} = (\bar{\mu}_{j},\dot{\mu}_{j})\in\mathds{R}^{r_{j}}\times\mathds{R} \mid \dot{\mu}_{j}\geq\|\bar{\mu}_{j}\|\}.

Note that $p = \sum_{j=1}^{s}(r_{j}+1)$. Problem (5.1) is an instance of problem (1.1) with $\mathcal{X}\equiv\mathds{R}^{n}$ and $\mathcal{Z}\equiv\mathds{R}^{p}$. Moreover, for convenience, we partition the function $g$ as $g(\cdot) = (g_{1}(\cdot),\ldots,g_{s}(\cdot))$ with each $g_{j}(\cdot) = (\bar{g}_{j}(\cdot),\dot{g}_{j}(\cdot))$ such that $\dot{g}_{j}$ is real-valued. In the remainder of this section, we first introduce some preliminary results related to NLSOCP problems. After that, we present the main results of this section and discuss a special case.

5.1 Preliminaries on NLSOCP

For each $j=1,\ldots,s$, we use ${\sf int\,}\mathcal{Q}_{j}$ and ${\sf bdry\,}\mathcal{Q}_{j}$ to denote the interior and the boundary of the second-order cone $\mathcal{Q}_{j}$, respectively. According to [21, Proposition 3.3], the projection operator onto the cone $K$ takes the form $\Pi_{K}(\mu) = (\nu_{1},\ldots,\nu_{s})$ for all $\mu = (\mu_{1},\ldots,\mu_{s})\in\mathds{R}^{r_{1}+1}\times\cdots\times\mathds{R}^{r_{s}+1}$, with

\nu_{j} = \begin{cases} \frac{1}{2}\left(1+\frac{\dot{\mu}_{j}}{\|\bar{\mu}_{j}\|}\right)(\bar{\mu}_{j},\|\bar{\mu}_{j}\|), & \mbox{if } |\dot{\mu}_{j}| < \|\bar{\mu}_{j}\|,\\ \mu_{j}, & \mbox{if } \dot{\mu}_{j} \geq \|\bar{\mu}_{j}\|,\\ 0, & \mbox{if } \dot{\mu}_{j} \leq -\|\bar{\mu}_{j}\|, \end{cases}

where $\mu_{j} = (\bar{\mu}_{j},\dot{\mu}_{j})\in\mathds{R}^{r_{j}}\times\mathds{R}$ for all $j=1,\ldots,s$.
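A minimal sketch of this blockwise projection in code (a hypothetical illustration; the function name is ours) reads as follows.

```python
# Projection onto a single second-order cone Q = {(bar, dot) : dot >= ||bar||};
# Pi_K for K = Q_1 x ... x Q_s applies this block by block.
import numpy as np

def proj_soc(mu):
    bar, dot = mu[:-1], mu[-1]
    nb = np.linalg.norm(bar)
    if dot >= nb:                  # mu already lies in the cone
        return mu.copy()
    if dot <= -nb:                 # mu lies in the polar cone: project to 0
        return np.zeros_like(mu)
    coef = 0.5 * (1.0 + dot / nb)  # remaining case |dot| < ||bar||
    return coef * np.append(bar, nb)
```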

It has been established in Chen et al. [14, Proposition 4.3] (and also in Hayashi et al. [22, Proposition 4.5]) that the projection operator $\Pi_{K}(\cdot)$ is strongly semismooth. Moreover, the Bouligand-subdifferential of this projection operator has been discussed in Pang et al. [35, Lemma 14], Chen et al. [12, Lemma 4] and Outrata and Sun [34, Lemma 1]. The following result directly follows from [25, Lemmas 2.5 & 2.6].

Lemma 5.1.

For any given point $\mu = (\mu_{1},\ldots,\mu_{s})\in\mathds{R}^{r_{1}+1}\times\cdots\times\mathds{R}^{r_{s}+1}$, each element $W\in\partial_{B}\Pi_{K}(\mu)$ is a block diagonal matrix $W = {\sf Diag\,}(W_{1},\ldots,W_{s})$, with each $W_{j}\in\mathds{R}^{(r_{j}+1)\times(r_{j}+1)}$ taking one of the following representations (a code sketch covering case (a) follows the list):

  • (a)

W_{j} = \begin{cases} \frac{1}{2}\begin{pmatrix} \left(1+\frac{\dot{\mu}_{j}}{\|\bar{\mu}_{j}\|}\right)I_{r_{j}} - \frac{\dot{\mu}_{j}}{\|\bar{\mu}_{j}\|}\frac{\bar{\mu}_{j}\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|^{2}} & \frac{\bar{\mu}_{j}}{\|\bar{\mu}_{j}\|} \\ \frac{\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|} & 1 \end{pmatrix}, & \mbox{if } |\dot{\mu}_{j}| < \|\bar{\mu}_{j}\|,\\ I_{r_{j}+1}, & \mbox{if } \dot{\mu}_{j} > \|\bar{\mu}_{j}\|,\\ 0, & \mbox{if } \dot{\mu}_{j} < -\|\bar{\mu}_{j}\|; \end{cases}

  • (b)

W_{j} \in \left\{ I_{r_{j}+1},\ I_{r_{j}+1} + \frac{1}{2}\begin{pmatrix} -\frac{\bar{\mu}_{j}\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|^{2}} & \frac{\bar{\mu}_{j}}{\|\bar{\mu}_{j}\|} \\ \frac{\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|} & -1 \end{pmatrix} \right\}, \quad \mbox{if } \mu_{j}\neq 0 \mbox{ and } \dot{\mu}_{j} = \|\bar{\mu}_{j}\|;

  • (c)

W_{j} \in \left\{ 0,\ \frac{1}{2}\begin{pmatrix} \frac{\bar{\mu}_{j}\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|^{2}} & \frac{\bar{\mu}_{j}}{\|\bar{\mu}_{j}\|} \\ \frac{\bar{\mu}_{j}^{T}}{\|\bar{\mu}_{j}\|} & 1 \end{pmatrix} \right\}, \quad \mbox{if } \mu_{j}\neq 0 \mbox{ and } \dot{\mu}_{j} = -\|\bar{\mu}_{j}\|;

  • (d)

W_{j} \in \left\{ \frac{1}{2}\begin{pmatrix} 2\tau I_{r_{j}} + (1-2\tau)\bar{\nu}\bar{\nu}^{T} & \bar{\nu} \\ \bar{\nu}^{T} & 1 \end{pmatrix} \;\middle|\; \bar{\nu}\in\mathds{R}^{r_{j}},\ \|\bar{\nu}\| = 1,\ \tau\in[0,1] \right\} \cup \{0\} \cup \{I_{r_{j}+1}\}, \quad \mbox{if } \mu_{j} = 0.
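The following is a minimal sketch (hypothetical helper) of one block $W_{j}$ from case (a) of Lemma 5.1, i.e., at points where $\Pi_{\mathcal{Q}_{j}}$ is differentiable; in the boundary cases (b)-(d) one would instead pick one of the finitely many matrices listed above.

```python
# One block W_j of an element of partial_B Pi_K(mu) in the smooth cases.
import numpy as np

def W_soc(mu):
    bar, dot = mu[:-1], mu[-1]
    r = bar.size
    nb = np.linalg.norm(bar)
    if dot > nb:                        # interior of Q_j
        return np.eye(r + 1)
    if dot < -nb:                       # interior of the polar cone
        return np.zeros((r + 1, r + 1))
    u = bar / nb                        # case |dot| < ||bar||
    W = np.empty((r + 1, r + 1))
    W[:r, :r] = (1.0 + dot / nb) * np.eye(r) - (dot / nb) * np.outer(u, u)
    W[:r, r] = u
    W[r, :r] = u
    W[r, r] = 1.0
    return 0.5 * W
```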

According to [6, Lemma 25], the tangent cone of $K$ at $\mu = (\mu_{1},\ldots,\mu_{s})\in K$, with each $\mu_{j}\in\mathds{R}^{r_{j}+1}$, $j=1,\ldots,s$, is given by $\mathcal{T}_{K}(\mu) = \mathcal{T}_{\mathcal{Q}_{1}}(\mu_{1})\times\cdots\times\mathcal{T}_{\mathcal{Q}_{s}}(\mu_{s})$, where

\mathcal{T}_{\mathcal{Q}_{j}}(\mu_{j}) := \begin{cases} \mathds{R}^{r_{j}+1}, & \mbox{if } \dot{\mu}_{j} > \|\bar{\mu}_{j}\|,\\ \mathcal{Q}_{j}, & \mbox{if } \mu_{j} = 0,\\ \{\nu_{j}\in\mathds{R}^{r_{j}+1} \mid \langle\bar{\nu}_{j},\bar{\mu}_{j}\rangle - \dot{\nu}_{j}\dot{\mu}_{j} \leq 0\}, & \mbox{otherwise}. \end{cases}

Consequently, the corresponding lineality space of $\mathcal{T}_{K}(\mu)$ takes the following form

{\sf lin\,}(\mathcal{T}_{K}(\mu)) = \Theta_{1}\times\cdots\times\Theta_{s} \qquad (5.2)

with

\Theta_{j} = \begin{cases} \mathds{R}^{r_{j}+1}, & \mbox{if } \dot{\mu}_{j} > \|\bar{\mu}_{j}\|,\\ \{0\}, & \mbox{if } \mu_{j} = 0,\\ \{\nu_{j}\in\mathds{R}^{r_{j}+1} \mid \langle\bar{\nu}_{j},\bar{\mu}_{j}\rangle - \dot{\nu}_{j}\dot{\mu}_{j} = 0\}, & \mbox{otherwise}. \end{cases}

The following definition of constraint nondegeneracy for NLSOCP problems follows from [7, Section 2]. (The name "constraint nondegeneracy" was coined by Robinson in [42]; for NLP problems, constraint nondegeneracy is exactly the LICQ, cf. Robinson [41] and Shapiro [47].)

Definition 5.2 (Constraint nondegeneracy).

We say that constraint nondegeneracy holds at a feasible point $x$ of the NLSOCP problem (5.1) if

\begin{pmatrix} \mathcal{J}h(x) \\ \mathcal{J}g(x) \end{pmatrix}\mathds{R}^{n} + \begin{pmatrix} \{0\} \\ {\sf lin\,}\big(\mathcal{T}_{K}(g(x))\big) \end{pmatrix} = \begin{pmatrix} \mathds{R}^{m} \\ \mathds{R}^{p} \end{pmatrix}. \qquad (5.3)

Let $x^{*}$ be a stationary point of the NLSOCP problem (5.1) with $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$. The critical cone $\mathcal{C}(x^{*})$ of NLSOCP at $x^{*}$ is defined, e.g., in [8, 6], by

\mathcal{C}(x^{*}) := \{d\in\mathds{R}^{n} \mid \mathcal{J}h(x^{*})d = 0,\ \mathcal{J}g(x^{*})d \in \mathcal{T}_{K}(g(x^{*}))\cap(\mu^{*})^{\bot}\},

where $(\mu^{*})^{\bot}$ is the orthogonal complement of $\mu^{*}$ in $\mathds{R}^{p}$. The explicit formulation of $\mathcal{C}(x^{*})$ can be found in [6, Corollary 26]. Moreover, as in [6, (47)], for any $d\in{\sf aff\,}\mathcal{C}(x^{*})$, one has that $\mathcal{J}h(x^{*})d = 0$ and, for all $j=1,\ldots,s$,

\begin{cases} \mathcal{J}g_{j}(x^{*})d = 0, & \mbox{if } \mu^{*}_{j}\in{\sf int\,}\mathcal{Q}_{j},\\ \mathcal{J}g_{j}(x^{*})d \in {\sf aff\,}\{(-\bar{\mu}^{*}_{j},\dot{\mu}^{*}_{j})\}, & \mbox{if } \mu^{*}_{j}\in{\sf bdry\,}\mathcal{Q}_{j}\backslash\{0\} \mbox{ and } g_{j}(x^{*}) = 0,\\ \langle\mathcal{J}g_{j}(x^{*})d,\mu_{j}^{*}\rangle = 0, & \mbox{if } \mu^{*}_{j},\ g_{j}(x^{*})\in{\sf bdry\,}\mathcal{Q}_{j}\backslash\{0\}. \end{cases}

Hence, one has that

{\sf aff\,}\mathcal{C}(x^{*}) = \{d\in\mathds{R}^{n} \mid \mathcal{J}h(x^{*})d = 0,\ \mathcal{J}g(x^{*})d \in {\sf aff\,}\big(\mathcal{T}_{K}(g(x^{*}))\cap(\mu^{*})^{\bot}\big)\}. \qquad (5.4)

The following definition of the strong second-order sufficient condition for the NLSOCP problem comes from Bonnans and Ramírez [6].

Definition 5.3 (Strong second-order sufficient condition for NLSOCP).

Let $x^{*}$ be a stationary point of the NLSOCP problem (5.1). We say that the strong second-order sufficient condition holds at $x^{*}$ if there exists a vector $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$ such that

\langle d, \big(\mathcal{J}^{2}_{xx}\mathcal{L}_{0}(x^{*},\lambda^{*},\mu^{*}) + \mathcal{H}(x^{*},\mu^{*})\big)d\rangle > 0, \quad \forall d\in{\sf aff\,}\mathcal{C}(x^{*})\backslash\{0\}, \qquad (5.5)

where $\mathcal{H}(x^{*},\mu^{*}) = \sum_{j=1}^{s}\mathcal{H}_{j}(x^{*},\mu^{*})$ is a matrix with each $\mathcal{H}_{j}(x^{*},\mu^{*})\in\mathds{R}^{n\times n}$, $j=1,\ldots,s$, defined by

\mathcal{H}_{j}(x^{*},\mu^{*}) := \begin{cases} -\frac{\dot{\mu}^{*}_{j}}{\dot{g}_{j}(x^{*})}\nabla g_{j}(x^{*})\begin{pmatrix} -I_{r_{j}} & 0 \\ 0 & 1 \end{pmatrix}\mathcal{J}g_{j}(x^{*}), & \mbox{if } g_{j}(x^{*})\in{\sf bdry\,}\mathcal{Q}_{j}\backslash\{0\},\\ 0, & \mbox{otherwise}. \end{cases}
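For reference, a minimal sketch (hypothetical helper; $Jg_{j}$ denotes the $(r_{j}+1)\times n$ Jacobian block of $g_{j}$ at $x^{*}$) of the curvature term $\mathcal{H}_{j}(x^{*},\mu^{*})$ in the nontrivial branch reads:

```python
# H_j(x*, mu*) for a block with g_j(x*) in bdry Q_j \ {0}.
import numpy as np

def H_j(Jg_j, g_j, mu_dot_j):
    r = Jg_j.shape[0] - 1
    g_dot = g_j[-1]                           # real-valued component of g_j(x*)
    D = np.diag(np.append(-np.ones(r), 1.0))  # Diag(-I_{r_j}, 1)
    return -(mu_dot_j / g_dot) * Jg_j.T @ D @ Jg_j
```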

5.2 Main results

Based on the previous subsection, we are ready to establish the main result of this section.

Theorem 5.4.

Suppose that both the constraint nondegeneracy condition (5.3) and the strong second-order sufficient condition (5.5) hold for problem (5.1) at $x^{*}$ with $(\lambda^{*},\mu^{*})\in\mathcal{M}(x^{*})$. Then, Assumptions 3.1 and 3.2 hold.

Proof.

Assumption 3.1 directly follows from [6, Proposition 17] and [29, Proposition 10]. In the following, we show that Assumption 3.2 holds.

Let $c\geq\bar{c}$ be fixed. Denote $z = (z_{1},\ldots,z_{s}) := \mu^{*} - cg(x^{*})$ with each $z_{j} := \mu^{*}_{j} - cg_{j}(x^{*})$ and $g_{j}(\cdot) = (\bar{g}_{j}(\cdot),\dot{g}_{j}(\cdot))$, $j=1,\ldots,s$. Since Assumption 3.1 holds, we know that for any $W\in\partial_{B}\Pi_{K}(z)$, the linear operator $\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)$ defined in (3.3) is positive definite, so that by (3.11) the set $\mathds{V}_{c}(\lambda^{*},\mu^{*})$ defined by (3.10) is nonempty. Suppose that there exists a certain $W\in\partial_{B}\Pi_{K}(z)$ such that the corresponding $V\in\mathds{V}_{c}(\lambda^{*},\mu^{*})$ is not negative definite. From (3.10) we know that $V\preceq 0$. Therefore, what we actually suppose is that $V$ is singular, i.e., there exist two vectors $\Delta\lambda\in\mathds{R}^{m}$ and $\Delta\mu\in\mathds{R}^{p}$ such that

\Delta y := (\Delta\lambda,\Delta\mu) \neq 0 \quad\mbox{but}\quad \langle\Delta y, V\Delta y\rangle = 0. \qquad (5.6)

Note that

-\langle\Delta y, V(\Delta y)\rangle = \big\langle \nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu),\ (\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W))^{-1}\big(\nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu)\big)\big\rangle + \frac{1}{c}\langle\Delta\mu, (I_{p}-W)(\Delta\mu)\rangle. \qquad (5.7)

Meanwhile, from Lemma 5.1 we know that $W = {\sf Diag\,}(W_{1},\ldots,W_{s})$ with each $W_{j}\in\mathds{R}^{(r_{j}+1)\times(r_{j}+1)}$, $j=1,\ldots,s$. Moreover, it is easy to deduce from [31, Proposition 1] that $I_{r_{j}+1}\succeq W_{j}\succeq 0$. Hence, by using the fact that $\mathcal{A}_{(c,\lambda^{*},\mu^{*})}(W)\succ 0$, one has from (5.6) and (5.7) that

\nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu) = 0, \qquad (5.8)

and

\langle(\Delta\mu)_{j}, (I_{r_{j}+1}-W_{j})(\Delta\mu)_{j}\rangle = 0, \quad \forall j=1,\ldots,s. \qquad (5.9)

Since the constraint nondegeneracy condition (5.3) holds at $x^{*}$, there exist a vector $d\in\mathds{R}^{n}$ and a vector $\xi = (\xi_{1},\ldots,\xi_{s})\in{\sf lin\,}\big(\mathcal{T}_{K}(g(x^{*}))\big)$ with $\xi_{j} = (\bar{\xi}_{j},\dot{\xi}_{j})\in\mathds{R}^{r_{j}}\times\mathds{R}$, $j=1,\ldots,s$, such that

\mathcal{J}h(x^{*})d = \Delta\lambda \quad\mbox{and}\quad \mathcal{J}g(x^{*})d + \xi = \Delta\mu.

Therefore, one can get from (5.8) and (5.9) that

\begin{array}{rl} \|\Delta\lambda\|^{2} + \|\Delta\mu\|^{2} &= \langle\Delta\lambda,\Delta\lambda\rangle + \langle\Delta\mu,\Delta\mu\rangle\\ &= \langle\Delta\lambda,\Delta\lambda\rangle + \langle W\Delta\mu,\Delta\mu\rangle + \langle(I_{p}-W)\Delta\mu,\Delta\mu\rangle\\ &= \langle\Delta\lambda,\mathcal{J}h(x^{*})d\rangle + \langle W\Delta\mu,\mathcal{J}g(x^{*})d+\xi\rangle + \langle(I_{p}-W)\Delta\mu,\Delta\mu\rangle\\ &= \langle\nabla h(x^{*})(\Delta\lambda) + \nabla g(x^{*})W(\Delta\mu), d\rangle + \langle W\Delta\mu,\xi\rangle + \langle(I_{p}-W)\Delta\mu,\Delta\mu\rangle\\ &= \langle W\Delta\mu,\xi\rangle. \end{array} \qquad (5.10)

Since $\xi\in{\sf lin\,}\big(\mathcal{T}_{K}(g(x^{*}))\big)$, one has from (5.2) that for any $j=1,\ldots,s$,

\begin{cases} \xi_{j}\in\mathds{R}^{r_{j}+1}, & \mbox{if } g_{j}(x^{*})\in{\sf int\,}\mathcal{Q}_{j},\\ \xi_{j} = 0, & \mbox{if } g_{j}(x^{*}) = 0,\\ \langle\bar{\xi}_{j},\bar{g}_{j}(x^{*})\rangle - \dot{\xi}_{j}\dot{g}_{j}(x^{*}) = 0, & \mbox{otherwise}. \end{cases} \qquad (5.11)

Now, for each j\in\{1,\ldots,s\}, we separate our analysis into the following four cases.

Case 1. If gj(x)𝗂𝗇𝗍𝒬jg_{j}(x^{*})\in{{\sf int\,}}\,{\mathcal{Q}}_{j}, one has that μj=0\mu_{j}^{*}=0 so that zj=cgj(x)𝗂𝗇𝗍𝒬j-z_{j}=cg_{j}(x^{*})\in{{\sf int\,}}\,{\mathcal{Q}}_{j}. Therefore, by Lemma 5.1 (a) we know that Wj=0W_{j}=0. Hence, in this case, Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0.

Case 2. If gj(x)=0g_{j}(x^{*})=0, one has that ξj=0\xi_{j}=0. Hence, Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0 also holds.

Case 3. If gj(x)𝖻𝖽𝗋𝗒𝒬j\{0}g_{j}(x^{*})\in{\sf bdry\,}\,{\mathcal{Q}}_{j}\backslash\{0\} and μj=0\mu_{j}^{*}=0, one has that zj𝖻𝖽𝗋𝗒𝒬j\{0}-z_{j}\in{\sf bdry\,}{\mathcal{Q}}_{j}\backslash\{0\}. Hence, in this case, we know from Lemma 5.1 (c) that

Wj{0,12(z¯jz¯jTz¯j2z¯jz¯jz¯jTz¯j1)}.W_{j}\in\left\{0,\frac{1}{2}\begin{pmatrix}\displaystyle\frac{\bar{z}_{j}\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|^{2}}&\ \displaystyle\frac{\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[8.53581pt] \displaystyle\frac{\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|}&\displaystyle 1\end{pmatrix}\right\}.

If Wj=0W_{j}=0, one has that Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0. Otherwise, one has that

Wjξj=12(z¯jz¯jTz¯j2z¯jz¯jz¯jTz¯j1)ξj=12((z¯jTξ¯j)z¯jz¯j2+ξ˙jz¯jz¯jz¯jTξ¯jz¯j+ξ˙j)=12(1z¯j2(z¯jTξ¯j+ξ˙jz¯j)z¯j1z¯j(z¯jTξ¯j+ξ˙jz¯j)).\begin{array}[]{ll}W_{j}\xi_{j}&=\frac{1}{2}\begin{pmatrix}\displaystyle\frac{\bar{z}_{j}\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|^{2}}&\displaystyle\frac{\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[8.53581pt] \displaystyle\frac{\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|}&\displaystyle 1\end{pmatrix}\xi_{j}=\frac{1}{2}\begin{pmatrix}\displaystyle\frac{(\bar{z}_{j}^{T}\bar{\xi}_{j})\bar{z}_{j}}{\|\bar{z}_{j}\|^{2}}+\frac{\dot{\xi}_{j}\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[8.53581pt] \displaystyle\frac{\bar{z}_{j}^{T}\bar{\xi}_{j}}{\|\bar{z}_{j}\|}+\dot{\xi}_{j}\end{pmatrix}\\[28.45274pt] &=\frac{1}{2}\begin{pmatrix}\displaystyle\frac{1}{{\|\bar{z}_{j}\|^{2}}}\left(\bar{z}_{j}^{T}\bar{\xi}_{j}+{\dot{\xi}_{j}}{\|\bar{z}_{j}\|}\right)\bar{z}_{j}\\[8.53581pt] \displaystyle\frac{1}{\|\bar{z}_{j}\|}\left(\bar{z}_{j}^{T}\bar{\xi}_{j}+\dot{\xi}_{j}\|\bar{z}_{j}\|\right)\end{pmatrix}.\end{array} (5.12)

Note that μj=0\mu_{j}^{*}=0, z˙j=z¯j\dot{z}_{j}=-\|\bar{z}_{j}\|, and g˙j(x)>0\dot{g}_{j}(x^{*})>0. Then, one can see from (5.11) that ξ¯jTz¯j+ξ˙jz¯j=ξ¯jTz¯jξ˙jz˙j=cξ¯j,g¯j(x)+cξ˙jg˙j(x)=0.\bar{\xi}_{j}^{T}\bar{z}_{j}+\dot{\xi}_{j}\|\bar{z}_{j}\|=\bar{\xi}_{j}^{T}\bar{z}_{j}-\dot{\xi}_{j}\dot{z}_{j}=-c\langle\bar{\xi}_{j},\bar{g}_{j}(x^{*})\rangle+c\dot{\xi}_{j}\dot{g}_{j}(x^{*})=0. Therefore, one can get from (5.12) that Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0.

Case 4. If g_{j}(x^{*})\in{\sf bdry\,}{\mathcal{Q}}_{j}\backslash\{0\} and \mu_{j}^{*}\neq 0, we know from [30, Lemma 2.2] (whose proof originally comes from [1]) that there exists a constant \sigma_{j}>0 such that \mu_{j}^{*}=\sigma_{j}(-\bar{g}_{j}(x^{*}),\dot{g}_{j}(x^{*})). Hence, in this case, it holds that

z_{j}=\mu_{j}^{*}-cg_{j}(x^{*})=\big{(}-(\sigma_{j}+c)\bar{g}_{j}(x^{*}),(\sigma_{j}-c)\dot{g}_{j}(x^{*})\big{)}. (5.13)

Since σj+c>σjc>(σj+c)\sigma_{j}+c>\sigma_{j}-c>-(\sigma_{j}+c), it holds that z¯j<z˙j<z¯j-\|\bar{z}_{j}\|<\dot{z}_{j}<\|\bar{z}_{j}\|. Therefore, by using Lemma 5.1 (a), the equation (5.13), and the fact that g¯j(x)=g˙j(x)\|\bar{g}_{j}(x^{*})\|=\dot{g}_{j}(x^{*}), one can get that

Wj=12((1+z˙jz¯j)Irjz˙jz¯jz¯jz¯jTz¯j2z¯jz¯jz¯jTz¯j1)=12((1+σjcσj+c)Irjσjcσj+cg¯j(x)(g¯j(x))Tg¯j(x)2g¯j(x)g¯j(x)(g¯j(x))Tg¯j(x)1).\begin{array}[]{lll}\displaystyle W_{j}&=&\displaystyle\frac{1}{2}\begin{pmatrix}\displaystyle\left(1+\frac{\dot{z}_{j}}{\|\bar{z}_{j}\|}\right)I_{r_{j}}-\frac{\dot{z}_{j}}{\|\bar{z}_{j}\|}\frac{\bar{z}_{j}\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|^{2}}&\displaystyle\frac{\bar{z}_{j}}{\|\bar{z}_{j}\|}\\[11.38109pt] \displaystyle\frac{\bar{z}_{j}^{T}}{\|\bar{z}_{j}\|}&\displaystyle 1\end{pmatrix}\\[28.45274pt] &=&\displaystyle\frac{1}{2}\begin{pmatrix}\displaystyle\left(1+\frac{\sigma_{j}-c}{\sigma_{j}+c}\right)I_{r_{j}}-\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{\bar{g}_{j}(x^{*})(\bar{g}_{j}(x^{*}))^{T}}{\|\bar{g}_{j}(x^{*})\|^{2}}&\displaystyle\quad\frac{-\bar{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|}\\[11.38109pt] \displaystyle\frac{(-\bar{g}_{j}(x^{*}))^{T}}{\|\bar{g}_{j}(x^{*})\|}&\displaystyle 1\end{pmatrix}.\end{array} (5.14)

Therefore, it holds that

(Irj+1Wj)(Δμ)j=12(2cσj+c(Δμ¯)j+σjcσj+c(g¯j(x))T(Δμ¯)jg¯j(x)2g¯j(x)+(Δμ˙)jg¯j(x)g¯j(x)(g¯j(x))T(Δμ¯)jg¯j(x)+(Δμ˙)j),\begin{array}[]{ll}(I_{r_{j}+1}-W_{j})(\Delta\mu)_{j}\\[5.69054pt] =\frac{1}{2}\begin{pmatrix}\displaystyle\frac{2c}{\sigma_{j}+c}{{(\Delta\bar{\mu})}_{j}}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|^{2}}\bar{g}_{j}(x^{*})+\frac{(\Delta\dot{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|}\bar{g}_{j}(x^{*})\\[11.38109pt] \displaystyle\frac{(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|}+(\Delta\dot{\mu})_{j}\end{pmatrix},\end{array}

which, together with (5.9), implies that

2cσj+c(Δμ¯)j2+σjcσj+c[(g¯j(x))T(Δμ¯)j]2g¯j(x)2+2(Δμ˙)j(g¯j(x))T(Δμ¯)jg¯j(x)+(Δμ˙)j2=0.\frac{2c}{\sigma_{j}+c}\|(\Delta\bar{\mu})_{j}\|^{2}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{[(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}]^{2}}{\|\bar{g}_{j}(x^{*})\|^{2}}+2\frac{(\Delta\dot{\mu})_{j}(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|}+(\Delta\dot{\mu})_{j}^{2}=0.

If (Δμ¯)j=0(\Delta\bar{\mu})_{j}=0, the above equation implies that (Δμ˙)j=0(\Delta\dot{\mu})_{j}=0 so that Wj(Δμ)j,ξj=0\langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0. Otherwise, one has that (Δμ¯)j0(\Delta\bar{\mu})_{j}\neq 0, and the above equation can be equivalently formulated as

2cσj+c+σjcσj+c[(g¯j(x))T(Δμ¯)j]2g¯j(x)2(Δμ¯)j2+2(Δμ˙)j(g¯j(x))T(Δμ¯)jg¯j(x)(Δμ¯)j2+(Δμ˙)j2(Δμ¯)j2=0.\frac{2c}{\sigma_{j}+c}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{[(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}]^{2}}{\|\bar{g}_{j}(x^{*})\|^{2}\|(\Delta\bar{\mu})_{j}\|^{2}}+2\frac{(\Delta\dot{\mu})_{j}(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|\|(\Delta\bar{\mu})_{j}\|^{2}}+\frac{(\Delta\dot{\mu})_{j}^{2}}{\|(\Delta\bar{\mu})_{j}\|^{2}}=0.

One can reformulate the above equality as

ωj2+2θjωj+2cσj+c+σjcσj+cθj2=0\omega_{j}^{2}+2\theta_{j}\omega_{j}+\frac{2c}{\sigma_{j}+c}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\theta_{j}^{2}=0 (5.15)

with

θj:=(g¯j(x))T(Δμ¯)jg¯j(x)(Δμ¯)jandωj:=(Δμ˙)j(Δμ¯)j.\theta_{j}:=\frac{(\bar{g}_{j}(x^{*}))^{T}(\Delta\bar{\mu})_{j}}{\|\bar{g}_{j}(x^{*})\|\|(\Delta\bar{\mu})_{j}\|}\quad\mbox{and}\quad\omega_{j}:=\frac{(\Delta\dot{\mu})_{j}}{\|(\Delta\bar{\mu})_{j}\|}.

Then, in order for (5.15) to admit a real solution \omega_{j}, its discriminant (viewing (5.15) as a quadratic polynomial in \omega_{j}) must be nonnegative, i.e., 4\theta^{2}_{j}-4\left(\frac{2c}{\sigma_{j}+c}+\frac{\sigma_{j}-c}{\sigma_{j}+c}\theta^{2}_{j}\right)\geq 0, which implies that

cσj+cθj2cσj+c0.\frac{c}{\sigma_{j}+c}\theta^{2}_{j}-\frac{c}{\sigma_{j}+c}\geq 0.

As a result, \theta^{2}_{j}\geq 1; since the Cauchy-Schwarz inequality gives |\theta_{j}|\leq 1, it must hold that \theta^{2}_{j}=1. Note that, if \theta_{j}=1, one has from (5.15) that \omega_{j}=-1, whereas if \theta_{j}=-1, then \omega_{j}=1. Hence, there always exists a nonzero constant \kappa_{j} such that

(Δμ)j=κj(g¯j(x),g˙j(x))and{κj>0, if θj=1,κj<0, if θj=1.(\Delta\mu)_{j}=\kappa_{j}\big{(}\bar{g}_{j}(x^{*}),-\dot{g}_{j}(x^{*})\big{)}\quad\mbox{and}\quad\left\{\begin{array}[]{ll}\kappa_{j}>0,&\mbox{ if }\theta_{j}=1,\\ \kappa_{j}<0,&\mbox{ if }\theta_{j}=-1.\end{array}\right. (5.16)

Note that g˙j(x)=g¯j(x)\dot{g}_{j}(x^{*})=\|\bar{g}_{j}(x^{*})\|. Hence, from (5.14) and (5.16) we know that

\begin{array}[]{ll}&W_{j}(\Delta\mu)_{j}\\[5.69054pt] &=\displaystyle\frac{1}{2}\begin{pmatrix}\displaystyle\frac{2\sigma_{j}}{\sigma_{j}+c}\kappa_{j}\bar{g}_{j}(x^{*})-\frac{\sigma_{j}-c}{\sigma_{j}+c}\frac{(\bar{g}_{j}(x^{*}))^{T}\bar{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|^{2}}\kappa_{j}\bar{g}_{j}(x^{*})+\frac{\kappa_{j}\dot{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|}\bar{g}_{j}(x^{*})\\[11.38109pt] \displaystyle-\kappa_{j}\frac{(\bar{g}_{j}(x^{*}))^{T}\bar{g}_{j}(x^{*})}{\|\bar{g}_{j}(x^{*})\|}-\kappa_{j}\dot{g}_{j}(x^{*})\end{pmatrix}\\[28.45274pt] &=\displaystyle\left(\kappa_{j}\bar{g}_{j}(x^{*}),-\kappa_{j}\dot{g}_{j}(x^{*})\right).\end{array}

Then, according to the above equation and (5.11), one has that

ξj,Wj(Δμ)j=κj(ξ¯j,g¯j(x)ξ˙jg˙j(x))=0.\langle\xi_{j},W_{j}(\Delta\mu)_{j}\rangle=\kappa_{j}\big{(}\langle\bar{\xi}_{j},\bar{g}_{j}(x^{*})\rangle-\dot{\xi}_{j}\dot{g}_{j}(x^{*})\big{)}=0.

From the above discussion, we have proved that \langle W_{j}(\Delta\mu)_{j},\xi_{j}\rangle=0 for all j=1,\ldots,s. Consequently, from (5.10) we know that \|\Delta\lambda\|^{2}+\|\Delta\mu\|^{2}=0, which contradicts (5.6). That is, \langle\Delta y,V(\Delta y)\rangle=0 forces \Delta y=0. Therefore, every V\in{\mathds{V}}_{c}(\lambda^{*},\mu^{*}) is negative definite, which completes the proof. ∎
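Before proceeding to the NLSDP case, we remark that the Jacobian formula of Lemma 5.1 (a) invoked in (5.14) can be checked numerically. The following Python snippet is a minimal illustrative sketch and is not part of the analysis above; it assumes the component ordering z=(\bar{z},\dot{z}) used in this section, assembles W at a point with -\|\bar{z}\|<\dot{z}<\|\bar{z}\| (where \Pi_{{\mathcal{Q}}} is continuously differentiable), and compares it with a finite-difference approximation of the derivative of the projection.

import numpy as np

# Sketch only: verify the Jacobian of the SOC projection on its smooth region.
rng = np.random.default_rng(1)
r = 4
z_bar = rng.standard_normal(r)
nb = np.linalg.norm(z_bar)
z_dot = 0.3 * nb                          # strictly between -||z_bar|| and ||z_bar||

def proj_soc(z):
    """Projection onto Q = {(u, t) in R^r x R : t >= ||u||}."""
    u, t = z[:-1], z[-1]
    s = np.linalg.norm(u)
    if t >= s:
        return z.copy()
    if t <= -s:
        return np.zeros_like(z)
    return 0.5 * (t + s) * np.concatenate([u / s, [1.0]])

# W as in Lemma 5.1 (a), cf. the first line of (5.14).
W = np.zeros((r + 1, r + 1))
W[:r, :r] = 0.5 * ((1 + z_dot / nb) * np.eye(r)
                   - (z_dot / nb) * np.outer(z_bar, z_bar) / nb**2)
W[:r, r] = W[r, :r] = 0.5 * z_bar / nb
W[r, r] = 0.5

z = np.concatenate([z_bar, [z_dot]])
eps = 1e-7
FD = np.column_stack([(proj_soc(z + eps * e) - proj_soc(z)) / eps
                      for e in np.eye(r + 1)])
print(np.linalg.norm(W - FD))             # small, of order eps

The agreement up to O(eps) is consistent with the continuous differentiability of \Pi_{{\mathcal{Q}}} on this region.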

6 The NLSDP case

Let 𝒮p{\mathcal{S}}^{p} be the linear space of all p×pp\times p real symmetric matrices and 𝒮+p{\mathcal{S}}^{p}_{+} be the cone of positive semidefinite matrices in 𝒮p{\mathcal{S}}^{p}. We consider the NLSDP problem:

minxf(x)s.t.h(x)=0,g(x)𝒮+p,\min_{x}f(x)\quad\mbox{s.t.}\quad h(x)=0,\quad g(x)\in{\mathcal{S}}_{+}^{p}, (6.1)

where f:𝒳f:{\mathcal{X}}\to{\mathds{R}}, h:𝒳mh:{\mathcal{X}}\to{\mathds{R}}^{m} and g:𝒳𝒮pg:{\mathcal{X}}\to{\mathcal{S}}^{p} are three twice continuously differentiable functions. Problem (6.1) is an instance of problem (1.1) with 𝒵=𝒮p{\mathcal{Z}}={\mathcal{S}}^{p} and K=𝒮+pK={\mathcal{S}}_{+}^{p}. We first give a quick review of some preliminary results for NLSDP problems. One may refer to [48] for more details.

6.1 Preliminaries on NLSDP

For any two matrices Z,Z^{\prime}\in{\mathcal{S}}^{p}, their inner product is defined by \langle Z,Z^{\prime}\rangle:={\rm Tr}(Z^{T}Z^{\prime}), and the induced norm is the Frobenius norm given by \|Z\|=\sqrt{\langle Z,Z\rangle}. Moreover, we write Z_{+}:=\Pi_{{\mathcal{S}}_{+}^{p}}(Z). Hence, it holds that

Z+𝒮+p,Z+Z𝒮+p,Z+,(Z+Z)=0.Z_{+}\in{\mathcal{S}}_{+}^{p},\quad Z_{+}-Z\in{\mathcal{S}}_{+}^{p},\quad\langle Z_{+},(Z_{+}-Z)\rangle=0. (6.2)

One can write the spectral decomposition of Z\in{\mathcal{S}}^{p} as Z=PDP^{T} with P\in{\mathds{R}}^{p\times p} being an orthogonal matrix and D\in{\mathds{R}}^{p\times p} being a diagonal matrix whose diagonal entries are ordered by \varrho_{1}\geq\varrho_{2}\geq\cdots\geq\varrho_{p}. We use the three index sets \alpha,\beta and \gamma to indicate the positive, zero, and negative eigenvalues of Z, respectively, i.e.,

α:={iϱi>0},β:={iϱi=0}andγ:={iϱi<0},i=1,,p.\alpha:=\{i\mid\varrho_{i}>0\},\quad\beta:=\{i\mid\varrho_{i}=0\}\quad\mbox{and}\quad\gamma:=\{i\mid\varrho_{i}<0\},\quad\forall i=1,\ldots,p. (6.3)

Then, one can write

D=(Dα000Dβ000Dγ)andP=(Pα,Pβ,Pγ)D=\begin{pmatrix}D_{\alpha}&0&0\\ 0&D_{\beta}&0\\ 0&0&D_{\gamma}\end{pmatrix}\quad\mbox{and}\quad P=\begin{pmatrix}P_{\alpha},P_{\beta},P_{\gamma}\end{pmatrix} (6.4)

with Pαp×|α|P_{\alpha}\in{\mathds{R}}^{p\times|\alpha|}, Pβp×|β|P_{\beta}\in{\mathds{R}}^{p\times|\beta|}, and Pγp×|γ|P_{\gamma}\in{\mathds{R}}^{p\times|\gamma|}. Moreover, we define the matrix U𝒮pU\in{\mathcal{S}}^{p} by

Uij:=max{ϱi,0}+max{ϱj,0}|ϱi|+|ϱj|,i,j=1,,p,U_{ij}:=\frac{\max\{\varrho_{i},0\}+\max\{\varrho_{j},0\}}{|\varrho_{i}|+|\varrho_{j}|},\quad i,j=1,\ldots,p, (6.5)

with the convention that 0/0=10/0=1. As has been shown in Sun and Sun [50], the projection operator Π𝒮+p\Pi_{{\mathcal{S}}_{+}^{p}} is strongly semismooth. Moreover, the following result in Pang et al. [35, Lemma 11] characterizes the Bouligand-subdifferential of this projection operator.

Lemma 6.1.

Suppose that Z𝒮pZ\in{\mathcal{S}}^{p} has the spectral decomposition Z=PDPTZ=PDP^{T} with PP and DD being partitioned by (6.4). Then, for any 𝐖BΠ𝒮+p(Z){\mathbf{W}}\in\partial_{B}\Pi_{{\mathcal{S}}_{+}^{p}}(Z), there exist two index sets α\alpha^{\prime} and γ\gamma^{\prime} that partition β\beta, together with a matrix Γ|α|×|γ|\Gamma\in{\mathds{R}}^{|\alpha^{\prime}|\times|\gamma^{\prime}|} with entries in [0,1][0,1], such that

𝐖(H)=P([PTHP]αα[PTHP]αβUαγ[PTHP]αγ[PTHP]αβTM([PTHP]ββ)0[PTHP]αγTUαγT00)PT,H𝒮p,\begin{array}[]{l}{\mathbf{W}}(H)=P\begin{pmatrix}[P^{T}HP]_{\alpha\alpha}&[P^{T}HP]_{\alpha\beta}&U_{\alpha\gamma}\circ[P^{T}HP]_{\alpha\gamma}\\[5.69054pt] [P^{T}HP]_{\alpha\beta}^{T}&M([P^{T}HP]_{\beta\beta})&0\\[5.69054pt] [P^{T}HP]_{\alpha\gamma}^{T}\circ U_{\alpha\gamma}^{T}&0&0\end{pmatrix}P^{T},\\[5.69054pt] \hfill\forall\,H\in{\mathcal{S}}^{p},\end{array} (6.6)

where ``"``\circ" denotes the Hadamard product and

M([PTHP]ββ):=(([PTHP]ββ)ααΓ([PTHP]ββ)αγ([PTHP]ββ)γαΓT0).M([P^{T}HP]_{\beta\beta}):=\begin{pmatrix}([P^{T}HP]_{\beta\beta})_{\alpha^{\prime}\alpha^{\prime}}&\Gamma\circ([P^{T}HP]_{\beta\beta})_{\alpha^{\prime}\gamma^{\prime}}\\[5.69054pt] ([P^{T}HP]_{\beta\beta})_{\gamma^{\prime}\alpha^{\prime}}\circ\Gamma^{T}&0\end{pmatrix}.
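As an illustration that is not part of the original development, formula (6.6) can be checked numerically at a nonsingular Z: there \beta=\emptyset, the projection \Pi_{{\mathcal{S}}_{+}^{p}} is differentiable, and (6.6) reduces to {\mathbf{W}}(H)=P\,(U\circ(P^{T}HP))\,P^{T} with U given by (6.5). A minimal Python sketch:

import numpy as np

# Sketch only: check (6.5)-(6.6) at a generic (nonsingular) symmetric Z.
rng = np.random.default_rng(0)
p = 5
Z = rng.standard_normal((p, p)); Z = (Z + Z.T) / 2   # nonsingular almost surely

def proj_psd(Y):
    """Projection onto the PSD cone via the spectral decomposition."""
    d, P = np.linalg.eigh(Y)
    return P @ np.diag(np.maximum(d, 0.0)) @ P.T

d, P = np.linalg.eigh(Z)     # eigh orders ascending; (6.6) is order-invariant here
U = (np.maximum(d, 0)[:, None] + np.maximum(d, 0)[None, :]) / (
        np.abs(d)[:, None] + np.abs(d)[None, :])     # (6.5); no 0/0 case arises

H = rng.standard_normal((p, p)); H = (H + H.T) / 2
WH = P @ (U * (P.T @ H @ P)) @ P.T                   # (6.6) with beta empty

eps = 1e-6
FD = (proj_psd(Z + eps * H) - proj_psd(Z)) / eps     # finite-difference derivative
print(np.linalg.norm(WH - FD))                       # small, of order eps

Since every entry of U lies in [0,1], the same computation also illustrates why {\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}} is positive semidefinite, a fact used in the proof of Theorem 6.4 below.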

It is easy to see that

𝒯𝒮+p(Z+)={H𝒮pPα¯THPα¯0},{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(Z_{+})=\{H\in{\mathcal{S}}^{p}\mid P_{\bar{\alpha}}^{T}HP_{\bar{\alpha}}\succeq 0\}, (6.7)

where α¯:=βγ\bar{\alpha}:=\beta\cup\gamma and Pα¯:=[Pβ,Pγ]P_{\bar{\alpha}}:=[P_{\beta},P_{\gamma}]. From (6.7) we know that the lineality space of this set can be represented by

{{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(Z_{+})\big{)}=\{H\in{\mathcal{S}}^{p}\mid P_{\bar{\alpha}}^{T}HP_{\bar{\alpha}}=0\}. (6.8)

The following definition of constraint nondegeneracy for the NLSDP problem comes from Sun [48, Definition 3.2], which is an analogue of the LICQ for NLP problems. The relation between this condition and the concept of nondegeneracy in Robinson [41] also has been well discussed in [48, Section 3].

Definition 6.2 (Constraint nondegeneracy).

We say that a feasible point xx to the NLSDP problem is constraint nondegenerate if

(𝒥h(x)𝒥g(x))𝒳+({0}𝗅𝗂𝗇(𝒯𝒮+p(g(x))))=(m𝒮p).\begin{pmatrix}{\mathcal{J}}h(x)\\ {\mathcal{J}}g(x)\end{pmatrix}{\mathcal{X}}+\begin{pmatrix}\{0\}\\ {{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x))\big{)}\end{pmatrix}=\begin{pmatrix}{\mathds{R}}^{m}\\ {\mathcal{S}}^{p}\end{pmatrix}. (6.9)

The critical cone of {\mathcal{S}}_{+}^{p} at Z\in{\mathcal{S}}^{p} is defined by

\begin{array}[]{ll}{\mathcal{C}}(Z;{\mathcal{S}}_{+}^{p}):&={\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(Z_{+})\cap(Z_{+}-Z)^{\bot}\\[5.69054pt] &=\{H\in{\mathcal{S}}^{p}:P_{\beta}^{T}HP_{\beta}\succeq 0,\,P_{\beta}^{T}HP_{\gamma}=0,\,P_{\gamma}^{T}HP_{\gamma}=0\}.\end{array}

Therefore,

𝖺𝖿𝖿(𝒞(Z;S+p))={H𝒮pPβTHPγ=0,PγTHPγ=0}.{{\sf aff\,}}\big{(}{\mathcal{C}}(Z;S_{+}^{p})\big{)}=\{H\in{\mathcal{S}}^{p}\mid P_{\beta}^{T}HP_{\gamma}=0,\,P_{\gamma}^{T}HP_{\gamma}=0\}.

Let (x,λ,μ)(x^{*},\lambda^{*},\mu^{*}) be a solution to the KKT system (3.2) of the NLSDP problem (6.1). Then, the critical cone 𝒞(x){\mathcal{C}}(x^{*}) of NLSDP at xx^{*} is defined as

{\mathcal{C}}(x^{*}):=\{d\in{\mathcal{X}}\mid{\mathcal{J}}h(x^{*})d=0,\ {\mathcal{J}}g(x^{*})d\in{\mathcal{C}}(g(x^{*})-\mu^{*};{\mathcal{S}}_{+}^{p})\}.

Since an explicit formula for the affine hull of {\mathcal{C}}(x^{*}) is not readily available, an outer approximation to this affine hull with respect to (\lambda^{*},\mu^{*}) was introduced in Sun [48] (one may refer to [48, Section 3] for more information), i.e.,

\begin{array}[]{ll}{{\sf app\,}}(\lambda^{*},\mu^{*}):&=\{d\in{\mathcal{X}}:{\mathcal{J}}h(x^{*})d=0,\ {\mathcal{J}}g(x^{*})d\in{{\sf aff\,}}\,{\mathcal{C}}(g(x^{*})-\mu^{*};{\mathcal{S}}_{+}^{p})\}\\[5.69054pt] &=\{d\in{\mathcal{X}}:{\mathcal{J}}h(x^{*})d=0,\ P_{\beta}^{T}({\mathcal{J}}g(x^{*})d)P_{\gamma}=0,\ P_{\gamma}^{T}({\mathcal{J}}g(x^{*})d)P_{\gamma}=0\}.\end{array}

Moreover, for any (λ,μ)(x)(\lambda^{*},\mu^{*})\in{\mathcal{M}}(x^{*}), one has that 𝖺𝖿𝖿(𝒞(x))𝖺𝗉𝗉(λ,μ){{\sf aff\,}}({\mathcal{C}}(x^{*}))\subseteq{{\sf app\,}}(\lambda^{*},\mu^{*}), which is different from (5.4) for the NLSOCP problems.

For any given matrix B𝒮pB\in{\mathcal{S}}^{p}, we define the linear-quadratic function ΥB:𝒮p×𝒮p\Upsilon_{B}:{\mathcal{S}}^{p}\times{\mathcal{S}}^{p}\to{\mathds{R}}, which is linear in the first argument and quadratic in the second argument, by

ΥB(Γ,A):=2Γ,ABA,(Γ,A)𝒮p×𝒮p,\Upsilon_{B}(\Gamma,A):=2\langle\Gamma,AB^{\dagger}A\rangle,\quad(\Gamma,A)\in{\mathcal{S}}^{p}\times{\mathcal{S}}^{p},

where BB^{\dagger} is the Moore-Penrose pseudo-inverse of BB. The following definition of the strong second-order sufficient condition for NLSDP problems was given in Sun [48, Definition 3.2], which is an extension of the strong second-order sufficient condition for NLP problems in Robinson [40] to NLSDP problems.

Definition 6.3 (Strong second-order sufficient condition for NLSDP).

Let xx^{*} be a stationary point of the NLSDP problem. We say that the strong second-order sufficient condition holds at xx^{*} if

sup(λ,μ)(x){d,𝒥xx20(x,λ,μ)dΥg(x)(μ,𝒥g(x)d)}>0,d𝒞^(x)\{0},\sup_{(\lambda,\mu)\in{\mathcal{M}}(x^{*})}\left\{\langle d,{\mathcal{J}}^{2}_{xx}{\mathcal{L}}_{0}(x^{*},\lambda,\mu)d\rangle-\Upsilon_{g(x^{*})}(\mu,{\mathcal{J}}g(x^{*})d)\right\}>0,\ \forall d\in\widehat{\mathcal{C}}(x^{*})\backslash\{0\},

where

𝒞^(x):=(λ,μ)(x)𝖺𝗉𝗉(λ,μ).\widehat{\mathcal{C}}(x^{*}):=\bigcap_{(\lambda,\mu)\in{\mathcal{M}}(x^{*})}{{\sf app\,}}(\lambda,\mu).

6.2 Main results

Based on the above preparations, we are ready to establish the main results of this section.

Theorem 6.4.

Suppose that both the constraint nondegeneracy and the strong second-order sufficient condition for problem (6.1) hold at x^{*} (with (\lambda^{*},\mu^{*})\in{\mathcal{M}}(x^{*})). Then, Assumptions 3.1 and 3.2 hold.

Proof.

The validity of Assumption 3.1 follows from [8, Proposition 17] and [51, Proposition 4]. Next, we show that Assumption 3.2 holds.

According to Assumption 3.1 we know that (x)={(λ,μ)}{\mathcal{M}}(x^{*})=\{(\lambda^{*},\mu^{*})\} is a singleton. Recall that (x,λ,μ)(x^{*},\lambda^{*},\mu^{*}) is a solution to the KKT system of the NLSDP problem. Then,

g(x)𝒮+pandμ𝒮+p.g(x^{*})\in{\mathcal{S}}_{+}^{p}\quad\mbox{and}\quad\mu^{*}\in{\mathcal{S}}_{+}^{p}.

Let \Xi:=g(x^{*})-\mu^{*} have the spectral decomposition \Xi=PDP^{T}, with P and D having the representations in (6.4) and \alpha,\beta,\gamma being defined in (6.3). Then one has from (6.2) that

g(x)=P(Dα00000000)PTandμ=P(00000000Dγ)PT.g(x^{*})=P\begin{pmatrix}D_{\alpha}&0&0\\ 0&0&0\\ 0&0&0\end{pmatrix}P^{T}\quad\mbox{and}\quad-\mu^{*}=P\begin{pmatrix}0&0&0\\ 0&0&0\\ 0&0&D_{\gamma}\end{pmatrix}P^{T}. (6.10)

Therefore, by using (6.7) and (6.8) one has

𝒯𝒮+p(g(x))={H𝒮p[PβPγ]TH[PβPγ]0},{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x^{*}))=\{H\in{\mathcal{S}}^{p}\mid[P_{\beta}\ P_{\gamma}]^{T}H[P_{\beta}\ P_{\gamma}]\succeq 0\},

so that

𝗅𝗂𝗇(𝒯S+p(g(x)))={H𝒮p[PβPγ]TH[PβPγ]=0}.{{\sf lin\,}}\big{(}{\mathcal{T}}_{S_{+}^{p}}(g(x^{*}))\big{)}=\{H\in{\mathcal{S}}^{p}\mid[P_{\beta}\ P_{\gamma}]^{T}H[P_{\beta}\ P_{\gamma}]=0\}. (6.11)

Define Z:=μcg(x)Z:=\mu^{*}-cg(x^{*}) and Q=[Pγ,Pβ,Pα]Q=[P_{\gamma},P_{\beta},P_{\alpha}]. Then, by using (6.10) one can get

Z=P(cDα0000000Dγ)PT=Q(Dγ0000000cDα)QT.Z=P\begin{pmatrix}-cD_{\alpha}&0&0\\ 0&0&0\\ 0&0&-D_{\gamma}\end{pmatrix}P^{T}=Q\begin{pmatrix}-D_{\gamma}&0&0\\ 0&0&0\\ 0&0&-cD_{\alpha}\end{pmatrix}Q^{T}.

Let \Delta\lambda\in{\mathds{R}}^{m} and \Delta\mu\in{\mathcal{S}}^{p} be such that (\Delta\lambda,\Delta\mu)\neq 0. From (6.6) and (6.10) we know that for any {\mathbf{W}}\in\partial_{B}\Pi_{{\mathcal{S}}_{+}^{p}}(Z),

Δμ,(𝒮p𝐖)Δμ=Δμ,QQT(Δμ)QQTQ(PγTΔμPγPγTΔμPβUγαPγTΔμPαPβTΔμPγM¯(PβTΔμPβ)0PαTΔμPγUγαT00)QT=QTΔμQ,QT(Δμ)Q(PγTΔμPγPγTΔμPβUγαPγTΔμPαPβTΔμPγM¯(PβTΔμPβ)0PαTΔμPγUγαT00)=QTΔμQ,(00PγTΔμPαUγαPγTΔμPα0(IβM¯)(PβTΔμPβ)PβTΔμPαPαTΔμPγPαTΔμPγUγαTPαTΔμPβPαTΔμPα),\begin{array}[]{l}\displaystyle\langle\Delta\mu,({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu\rangle\\ =\left\langle\Delta\mu,Q\,Q^{T}(\Delta\mu)Q\,Q^{T}-Q\begin{pmatrix}P_{\gamma}^{T}{\Delta\mu}P_{\gamma}&P_{\gamma}^{T}{\Delta\mu}P_{\beta}&U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] P_{\beta}^{T}{\Delta\mu}P_{\gamma}&\overline{M}(P_{\beta}^{T}{\Delta\mu}P_{\beta})&0\\[5.69054pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&0&0\end{pmatrix}Q^{T}\right\rangle\\[34.1433pt] =\left\langle Q^{T}\Delta\mu Q,Q^{T}(\Delta\mu)Q\,-\begin{pmatrix}P_{\gamma}^{T}{\Delta\mu}P_{\gamma}&P_{\gamma}^{T}{\Delta\mu}P_{\beta}&U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] P_{\beta}^{T}{\Delta\mu}P_{\gamma}&\overline{M}(P_{\beta}^{T}{\Delta\mu}P_{\beta})&0\\[5.69054pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&0&0\end{pmatrix}\right\rangle\\[34.1433pt] =\left\langle Q^{T}\Delta\mu Q,\begin{pmatrix}0&0&P_{\gamma}^{T}{\Delta\mu}P_{\alpha}-U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] 0&(I_{\beta}-\overline{M})(P_{\beta}^{T}{\Delta\mu}P_{\beta})&P_{\beta}^{T}{\Delta\mu}P_{\alpha}\\[5.69054pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}-P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&P_{\alpha}^{T}{\Delta\mu}P_{\beta}&P_{\alpha}^{T}{\Delta\mu}P_{\alpha}\end{pmatrix}\right\rangle,\end{array}

(6.12)

where Iβ|β|×|β|I_{\beta}\in{\mathds{R}}^{|\beta|\times|\beta|} is the identity matrix and M¯\overline{M} has the same structure and properties as the matrix MM in Lemma 6.1. Then, by (6.5) and the fact that Δμ\Delta\mu is arbitrarily given, we can conclude that (𝒮p𝐖)({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}}) is positive semidefinite, where 𝒮p{\mathcal{I}}_{{\mathcal{S}}^{p}} is the identity operator on 𝒮p{\mathcal{S}}^{p}. Since Assumption 3.1 holds, any 𝐕𝕍c(λ,μ){\mathbf{V}}\in{\mathds{V}}_{c}(\lambda^{*},\mu^{*}) can be represented by the right-hand side of (3.10) with 𝒜c(λ,μ,𝐖){\mathcal{A}}_{c}(\lambda^{*},\mu^{*},{\mathbf{W}}) being nonsingular. Therefore,

(Δλ,Δμ),𝐕(Δλ,Δμ)=h(x)(Δλ)+g(x)𝐖(Δμ),(𝒜c(λ,μ,𝐖))1(h(x)(Δλ)+g(x)𝐖(Δμ))+1cΔμ,(𝒮p𝐖)Δμ.\begin{array}[]{ll}-\langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle\\[5.69054pt] =\Big{\langle}\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu),\\[5.69054pt] \quad({\mathcal{A}}_{c}(\lambda^{*},\mu^{*},{\mathbf{W}}))^{-1}\big{(}\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu)\big{)}\Big{\rangle}+\frac{1}{c}\left\langle\Delta\mu,({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu\right\rangle.\end{array}

Therefore, if (Δλ,Δμ),𝐕(Δλ,Δμ)=0\langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle=0, one has that

Δμ,(𝒮p𝐖)Δμ=0,\langle\Delta\mu,({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu\rangle=0, (6.13)

and

h(x)(Δλ)+g(x)𝐖(Δμ)=0.\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu)=0. (6.14)

Since the constraint nondegeneracy (6.9) holds for the NLSDP problem at x^{*}, there exist a vector d\in{\mathcal{X}} and a matrix S\in{{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x^{*}))\big{)} such that

𝒥h(x)d=Δλand𝒥g(x)d+S=Δμ.{\mathcal{J}}h(x^{*})d=\Delta\lambda\quad\mbox{and}\quad{\mathcal{J}}g(x^{*})d+S=\Delta\mu.

Therefore, one can get from (6.13) and (6.14) that

Δλ2+Δμ2=Δλ,Δλ+Δμ,Δμ=Δλ,Δλ+𝐖Δμ,Δμ+(𝒮p𝐖)Δμ,Δμ=Δλ,𝒥h(x)d+𝐖Δμ,𝒥g(x)d+S+(𝒮p𝐖)Δμ,Δμ=h(x)(Δλ)+g(x)𝐖(Δμ),d+(𝒮p𝐖)Δμ,Δμ+𝐖Δμ,S=𝐖Δμ,S.\begin{array}[]{l}\|\Delta\lambda\|^{2}+\|\Delta\mu\|^{2}=\langle\Delta\lambda,\Delta\lambda\rangle+\langle\Delta\mu,\Delta\mu\rangle\\[5.69054pt] =\langle\Delta\lambda,\Delta\lambda\rangle+\langle{\mathbf{W}}\Delta\mu,\Delta\mu\rangle+\langle({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu,\Delta\mu\rangle\\[5.69054pt] =\langle\Delta\lambda,{\mathcal{J}}h(x^{*})d\rangle+\langle{\mathbf{W}}\Delta\mu,{\mathcal{J}}g(x^{*})d+S\rangle+\langle({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu,\Delta\mu\rangle\\[5.69054pt] =\langle\nabla h(x^{*})(\Delta\lambda)+\nabla g(x^{*}){\mathbf{W}}(\Delta\mu),d\rangle+\langle({\mathcal{I}}_{{\mathcal{S}}^{p}}-{\mathbf{W}})\Delta\mu,\Delta\mu\rangle+\langle{\mathbf{W}}\Delta\mu,S\rangle\\[5.69054pt] =\langle{\mathbf{W}}\Delta\mu,S\rangle.\end{array}

Since S𝗅𝗂𝗇(𝒯𝒮+p(g(x)))S\in{{\sf lin\,}}\big{(}{\mathcal{T}}_{{\mathcal{S}}_{+}^{p}}(g(x^{*}))\big{)}, we know from (6.11) that [PβPγ]TS[PβPγ]=0[P_{\beta}\ P_{\gamma}]^{T}S[P_{\beta}\ P_{\gamma}]=0. Therefore, it holds that

𝐖Δμ,S=(PγTΔμPγPγTΔμPβUγαPγTΔμPαPβTΔμPγM¯(PβTΔμPβ)0PαTΔμPγUγαT00),(00PγTSPα00PβTSPαPαTSPγPαTSPβPαTSPα)=UγαPγTΔμPα,PγTSPα+PαTΔμPγUγαT,PαTSPγ.\begin{array}[]{lll}\langle{\mathbf{W}}\Delta\mu,S\rangle&=&\left\langle\begin{pmatrix}P_{\gamma}^{T}{\Delta\mu}P_{\gamma}&P_{\gamma}^{T}{\Delta\mu}P_{\beta}&U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\\[2.84526pt] P_{\beta}^{T}{\Delta\mu}P_{\gamma}&\overline{M}(P_{\beta}^{T}{\Delta\mu}P_{\beta})&0\\[2.84526pt] P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T}&0&0\end{pmatrix}\right.,\\[28.45274pt] &&\qquad\qquad\qquad\left.\begin{pmatrix}0&0&P^{T}_{\gamma}SP_{\alpha}\\[2.84526pt] 0&0&P^{T}_{\beta}SP_{\alpha}\\[2.84526pt] P^{T}_{\alpha}SP_{\gamma}&P^{T}_{\alpha}SP_{\beta}&P^{T}_{\alpha}SP_{\alpha}\end{pmatrix}\right\rangle\\[28.45274pt] &=&\langle U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha},P^{T}_{\gamma}SP_{\alpha}\rangle+\langle P_{\alpha}^{T}{\Delta\mu}P_{\gamma}\circ U_{\gamma\alpha}^{T},P^{T}_{\alpha}SP_{\gamma}\rangle.\end{array}

Recall the supposition that \langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle=0. From (6.12) and (6.13) we know that

PγTΔμPα,PγTΔμPαUγαPγTΔμPα=0.\langle P_{\gamma}^{T}{\Delta\mu}P_{\alpha},P_{\gamma}^{T}{\Delta\mu}P_{\alpha}-U_{\gamma\alpha}\circ P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\rangle=0.

Then, by using (6.5) and (6.10), and noting that every entry of U_{\gamma\alpha} lies in the open interval (0,1), one can get \|P_{\gamma}^{T}{\Delta\mu}P_{\alpha}\|=0. Hence \langle{\mathbf{W}}\Delta\mu,S\rangle=0. Consequently, it holds that

Δλ2+Δμ2=0.\|\Delta\lambda\|^{2}+\|\Delta\mu\|^{2}=0.

This implies that if (Δλ,Δμ),𝐕(Δλ,Δμ)=0\langle(\Delta\lambda,\Delta\mu),{\mathbf{V}}(\Delta\lambda,\Delta\mu)\rangle=0, one must have Δλ=0\Delta\lambda=0 and Δμ=0\Delta\mu=0. Therefore, any 𝐕𝕍c(λ,μ){\mathbf{V}}\in{\mathds{V}}_{c}(\lambda^{*},\mu^{*}) is negative definite. This completes the proof. ∎

7 Numerical illustration

In this section, we use a simple numerical example to test whether the second-order augmented Lagrangian method can achieve superlinear convergence in practice. For comparison, the classic augmented Lagrangian method, which converges at a linear rate, is also implemented.

Consider the following problem

min(x1,x2)2{12(x11)2+12(x22)2x1=0,x20}.\min_{(x_{1},x_{2})\in{\mathds{R}}^{2}}\left\{\frac{1}{2}(x_{1}-1)^{2}+\frac{1}{2}(x_{2}-2)^{2}\ \mid\ x_{1}=0,\ x_{2}\geq 0\right\}.

It is easy to verify that the solution of this problem is given by (x_{1}^{*},x_{2}^{*})=(0,2), and both the LICQ (Definition 4.1) and the strong second-order sufficient condition (Definition 4.3) are satisfied at this point with the corresponding multipliers given by (\lambda^{*},\mu^{*})=(-1,0). We set the penalty parameter c=1, and implement both the augmented Lagrangian method and the second-order augmented Lagrangian method for this problem with the initial multipliers (\lambda^{0},\mu^{0}):=(100,100) (this initial point is not intentionally chosen; the same numerical behavior can be observed from other initial points).

The numerical experiment is conducted by running Matlab R2020a on a MacBook Pro (macOS 11.2.3) with an Intel 2.7 GHz quad-core i7 processor and 16 GB of RAM. We measure the error at the k-th iteration via

ηk:=(λk,μk)(λ,μ).\eta^{k}:=\|(\lambda^{k},\mu^{k})-(\lambda^{*},\mu^{*})\|.

The algorithms are terminated once \eta^{k}\leq 10^{-20}. Moreover, we set \eta^{k} to 10^{-50} whenever \eta^{k}\leq 10^{-50}, so that a valid value is still obtained when taking the logarithm of an error that is extremely close to zero.
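For reference, both update rules can be reproduced in a few lines of code. The following Python sketch is our own illustrative reconstruction (the reported experiment was run in Matlab): the closed-form inner minimizer of {\mathcal{L}}_{c}(\cdot,\lambda,\mu) and the element V of {\mathds{V}}_{c}(\lambda,\mu) from (3.10) are specialized to this toy problem, with h(x)=x_{1}, g(x)=x_{2}, K={\mathds{R}}_{+}, and the classic update \lambda^{+}=\lambda-ch(x), \mu^{+}=\Pi_{K}(\mu-cg(x)).

import numpy as np

# Sketch only: first-order vs. second-order multiplier updates on the toy problem.
c = 1.0
y_star = np.array([-1.0, 0.0])

def inner_solve(lam, mu):
    """Closed-form minimizer of the augmented Lagrangian for this problem."""
    x1 = (1.0 + lam) / (1.0 + c)
    x2 = (2.0 + mu) / (1.0 + c) if mu > 2.0 * c else 2.0
    return np.array([x1, x2])

def grad_theta(lam, mu):
    """Gradient of the dual function theta_c, cf. (B.2)."""
    x = inner_solve(lam, mu)
    return np.array([-x[0], (max(mu - c * x[1], 0.0) - mu) / c]), x

def V_elem(mu, x):
    """An element of V_c(lam, mu) per (3.10); W is 0 or 1 since K = R_+."""
    W = 1.0 if mu - c * x[1] >= 0.0 else 0.0
    A = np.diag([1.0 + c, 1.0 + c * W])   # A_c here: Hessian of L_0 is the identity
    G = np.array([[1.0, 0.0], [0.0, W]])  # columns: grad h and grad g * W
    return -G.T @ np.linalg.solve(A, G) - (1.0 / c) * np.diag([0.0, 1.0 - W])

y1 = np.array([100.0, 100.0])   # classic method of multipliers
y2 = y1.copy()                  # second-order method of multipliers
for k in range(10):
    x = inner_solve(*y1)        # first-order update of the multipliers
    y1 = np.array([y1[0] - c * x[0], max(y1[1] - c * x[1], 0.0)])
    g, x = grad_theta(*y2)      # one generalized Newton step on the dual
    y2 = y2 - np.linalg.solve(V_elem(y2[1], x), g)
    print(k, np.linalg.norm(y1 - y_star), np.linalg.norm(y2 - y_star))

In this run the error of the classic update decreases by a factor of roughly 1/(1+c) per iteration, while the generalized Newton update reaches the exact multipliers within a couple of steps, matching the behavior displayed in Figure 1.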

Figure 1 clearly shows the behaviors of the two methods. The left part shows how the errors of the two methods vary with the iteration number, while the right part, by taking the logarithm of the errors, explicitly shows the linear convergence of the classic augmented Lagrangian method, as well as the superlinear convergence of the second-order method of multipliers.

Figure 1: The error (left) and the logarithm of the error (right) with respect to the iteration number.

8 Conclusions

In this paper, we have proposed a second-order method of multipliers for solving the NLP, the NLSOCP and the NLSDP problems. The proposed method is a combination of the classic method of multipliers and a specially designed nonsmooth Newton method for updating the multipliers. The local convergence, as well as the superlinear convergence rate, of this second-order method was established under the constraint nondegeneracy (or the LICQ) together with a strong second-order sufficient condition. We re-emphasize that the conditions we used in this paper for proving the superlinear convergence rate of the second-order method of multipliers are exactly those used in the literature for establishing the linear convergence rate (or the asymptotically superlinear rate, if the penalty parameter goes to infinity) of the classic method of multipliers without assuming strict complementarity. Besides, a simple numerical test was conducted to demonstrate the superlinear convergence behavior of the method studied in this paper.

Since the proposed method builds on both the augmented Lagrangian method and the generalized Newton method, we are not able to establish any global convergence property of it in general. However, to make the method implementable, globalization techniques should be further studied, together with computational studies on solving the subproblems and updating the multipliers.

References

  • [1] Alizadeh F, Goldfarb D. Second-order cone programming. Math Program, 2003, 95(1): 3–51
  • [2] Bertsekas D P. On penalty and multiplier methods for constrained minimization. SIAM J Control Optim, 1976, 14(2): 216–235
  • [3] Bertsekas D P. Multiplier methods: A survey. Automatica, 1976, 12(2): 133–145
  • [4] Bertsekas D P. On the convergence properties of second-order methods of multipliers. J Optim Theory Appl, 1978, 25(3): 443–449
  • [5] Bertsekas D P. Constrained Optimization and Lagrange Multiplier Methods. New York: Academic, 1982
  • [6] Bonnans J F, Ramírez H. Perturbation analysis of second-order cone programming problems. Math Program, 2005, 104(2): 205–227
  • [7] Bonnans J F, Shapiro A. Nondegeneracy and quantitative stability of parameterized optimization problems with multiple solutions. SIAM J Optim, 1998, 8(4): 940–946
  • [8] Bonnans J F, Shapiro A. Perturbation Analysis of Optimization Problems. New York: Springer, 2000
  • [9] Brusch R. A rapidly convergent method for equality constrained function minimization. In: Proceedings of the 1973 IEEE Conference on Decision and Control. San Diego, California, 1973, 80–81
  • [10] Bueno L, Haeser G, Santos L. Towards an efficient augmented Lagrangian method for convex quadratic programming. Comput Optim Appl, 2020, 76: 767–800
  • [11] Buys J. Dual algorithms for constrained optimization problems. PhD Dissertation. Netherlands: Univ. of Leiden, 1972
  • [12] Chen J-S, Chen X, Tseng P. Analysis of nonsmooth vector-valued functions associated with second-order cones. Math Program, 2004, 101: 95–117
  • [13] Chen L, Liao A P. On the convergence properties of a second-order augmented Lagrangian method for nonlinear programming problems with inequality constraints. J Optim Theory Appl, 2020, 187: 248–265
  • [14] Chen X, Sun D F, Sun J. Complementarity functions and numerical experiments on some smoothing Newton methods for second-order-cone complementarity problems. Comput Optim Appl, 2003, 25: 39–56
  • [15] Clarke F H. Optimization and Nonsmooth Analysis. New York: Wiley, 1983
  • [16] Conn A, Gould N, Toint P. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J Numer Anal, 1991, 28(2): 545–572
  • [17] Conn A, Gould N, Toint P. LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Berlin Heidelberg: Springer-Verlag, 1992
  • [18] Contesse-Becker L. Extended convergence results for the method of multipliers for nonstrictly binding inequality constraints. J Optim Theory Appl, 1993, 79(2): 273–310
  • [19] Fletcher R. An ideal penalty function for constrained optimization. IMA J Appl Math, 1975, 15(3): 319–342
  • [20] Fontecilla R, Steihaug T, Tapia R. A convergence theory for a class of quasi-Newton methods for constrained optimization. SIAM J Numer Anal, 1987, 24(5): 1133–1151
  • [21] Fukushima M, Luo Z Q, Tseng P. Smoothing functions for second-order-cone complementarity problems. SIAM J Optim, 2001, 12: 436–460
  • [22] Hayashi S, Yamashita N, Fukushima M. A combined smoothing and regularization method for monotone second-order cone complementarity problems. SIAM J Optim, 2005, 15: 593–615
  • [23] Hestenes M. Multiplier and gradient methods. J Optim Theory Appl, 1969, 4(5): 303–320
  • [24] Ito K, Kunisch K. The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math Program, 1990, 46(1-3): 341–360
  • [25] Kanzow C, Ferenczi I, Fukushima M. On the local convergence of semismooth Newton methods for linear and nonlinear second-order cone programs without strict complementarity. SIAM J Optim, 2009, 20(1): 297–320
  • [26] Kummer B. Newton’s method based on generalized derivatives for nonsmooth functions: Convergence analysis. In: Oettli W (ed.), Advances in Optimization, Berlin: Springer, 1992, 171–194
  • [27] Kummer B. Newton’s method for nondifferentiable functions. In: Guddat J (ed.), Advances in Mathematical Optimization, Berlin: Akademie-Verlag, 1998, 114–125
  • [28] Li X D, Sun D F, Toh K-C. A highly efficient semismooth Newton augmented Lagrangian method for solving Lasso problems. SIAM J Optim, 2018, 28: 433–458
  • [29] Liu Y J, Zhang L W. Convergence analysis of the augmented Lagrangian method for nonlinear second-order cone optimization problems. Nonlinear Anal, 2007, 67: 1359–1373
  • [30] Liu Y J, Zhang L W. Convergence of the augmented Lagrangian method for nonlinear optimization problems over second-order cones. J Optim Theory Appl, 2008, 139: 557–575
  • [31] Meng F W, Sun D F, Zhao G Y. Semismoothness of solutions to generalized equations and the Moreau-Yosida regularization. Math Program, 2005, 104: 561–581
  • [32] Mifflin R. Semismooth and semiconvex functions in constrained optimization. SIAM J Control Optim, 1977, 15(6): 959–972
  • [33] Nocedal J, Wright S J. Numerical Optimization (2nd Edition). New York: Springer-Verlag, 2006
  • [34] Outrata J V, Sun D F. On the coderivative of the projection operator onto the second-order cone. Set-Valued Anal, 2008, 16: 999–1014
  • [35] Pang J-S, Sun D F, Sun J. Semismooth homeomorphisms and strong stability of semidefinite and Lorentz complementarity problems. Math Oper Res, 2003, 28: 39–63
  • [36] Pang J-S, Qi L Q. Nonsmooth equations: motivation and algorithms. SIAM J Optim, 1993, 3: 443–465
  • [37] Powell M J D. A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization, New York: Academic, 1969, 283–298
  • [38] Qi L Q, Sun J. A nonsmooth version of Newton’s method. Math Program, 1993, 58(1-3): 353–367
  • [39] Qi L Q. Convergence analysis of some algorithms for solving nonsmooth equations. Math Oper Res, 1993, 18(1): 227–244
  • [40] Robinson S M. Strongly regular generalized equations. Math Oper Res, 1980, 5(1): 43–62
  • [41] Robinson S M. Local structure of feasible sets in nonlinear programming, Part II: Nondegeneracy. Math Program Study, 1984, 22: 217–230
  • [42] Robinson S M. Constraint nondegeneracy in variational analysis. Math Oper Res, 2003, 28: 201–232
  • [43] Rockafellar R T. A dual approach to solving nonlinear programming problems by unconstrained optimization. Math Program, 1973, 5(1): 354–373
  • [44] Rockafellar R T. The multiplier method of Hestenes and Powell applied to convex programming. J Optim Theory Appl, 1973, 12(6): 555–562
  • [45] Rockafellar R T. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math Oper Res, 1976, 1(2): 97–116
  • [46] Rockafellar R T, Wets R J-B. Variational Analysis. Berlin: Springer-Verlag, 1998
  • [47] Shapiro A. Sensitivity analysis of generalized equations. J Math Sci, 2003, 115: 2554–2565
  • [48] Sun D F. The strong second-order sufficient condition and constraint nondegeneracy in nonlinear semidefinite programming and their implications. Math Oper Res, 2006, 31(4): 761–776
  • [49] Sun D F, Han J Y. Newton and quasi-Newton methods for a class of nonsmooth equations and related problems. SIAM J Optim, 1997, 7: 463–480
  • [50] Sun D, Sun J. Semismooth matrix valued functions. Math Oper Res, 2002, 27: 150–169
  • [51] Sun D F, Sun J, Zhang L W. The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math Program, 2008, 114(2): 349–391
  • [52] Tretyakov N. A method of penalty estimates for convex programming problems. Ékonomika i Matematicheskie Metody, 1973, 9: 525–540
  • [53] Yang L Q, Sun D F, Toh K-C. SDPNAL+: a majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math Program Comput, 2015, 7: 331–366
  • [54] Yuan Y-X. Analysis on a superlinearly convergent augmented Lagrangian method. Acta Math Sinica, English Series, 2014, 30(1): 1–10
  • [55] Zhao X Y, Sun D F, Toh K-C. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J Optim, 2010, 20: 1737–1765

Appendix A Proof of Proposition 3.3

The following result is a consequence of Qi and Sun [38, Theorem 2.3].

Lemma A.1.

The locally Lipschitz continuous function F:O𝒰F:O\to{\mathcal{U}} is semismooth at uOu\in O if and only if for any sequence {uk}O\{u^{k}\}\subset O converging to uu with VkF(uk)V^{k}\in\partial F(u^{k}), it holds that F(u;uku)Vk(uku)=o(uku)F\,^{\prime}(u;u^{k}-u)-V^{k}(u^{k}-u)=o(\|u^{k}-u\|).

Now we start to prove Proposition 3.3.

Proof of Proposition 3.3.

Since Assumption 3.1 holds and cc¯c\geq\bar{c}, one can get the two constants ε\varepsilon and δ0\delta_{0}, as well as the locally Lipschitz function 𝐱c(){\mathbf{x}}_{c}(\cdot), from Proposition 3.1. For convenience, we denote y:=(λ,μ)m×𝒵y:=(\lambda,\mu)\in{\mathds{R}}^{m}\times{\mathcal{Z}}. Suppose that y𝔹δ0(λ,μ)y\in{\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}). From (1.2), (3.1), (3.4) and (3.8) one can get

x0(𝐱c(y),λc(y),μc(y))=xc(𝐱c(y),λ,μ)=0.\nabla_{x}{\mathcal{L}}_{0}({\mathbf{x}}_{c}(y),\lambda_{c}(y),\mu_{c}(y))=\nabla_{x}{\mathcal{L}}_{c}({\mathbf{x}}_{c}(y),\lambda,\mu)=0.

Then, by taking the directional derivative of \nabla_{x}{\mathcal{L}}_{0}({\mathbf{x}}_{c}(y),\lambda_{c}(y),\mu_{c}(y)) along the given direction \Delta y:=(\Delta\lambda,\Delta\mu)\in{\mathds{R}}^{m}\times{\mathcal{Z}} and using (3.8), one can see that

0=xx20(𝐱c(y),λc(y),μc(y))𝐱c(y;Δy)h(𝐱c(y))(Δλ)+ch(𝐱c(y))𝒥h(𝐱c(y))𝐱c(y;Δy)g(𝐱c(y))ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy)).\begin{array}[]{ll}0=&\nabla_{xx}^{2}{\mathcal{L}}_{0}({\mathbf{x}}_{c}(y),\lambda_{c}(y),\mu_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] &-\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+c\nabla h({\mathbf{x}}_{c}(y)){\mathcal{J}}h({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] &-\nabla g({\mathbf{x}}_{c}(y))\Pi_{K}^{\prime}(\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)).\end{array} (A.1)

On the other hand, since the projection operator ΠK()\Pi_{K}(\cdot) is strongly semismooth, we know that there always exists a linear operator W^BΠK(μcg(𝐱c(y)))\widehat{W}\in\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y))) such that

ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy))=W^(Δμc𝒥g(𝐱c(y))𝐱c(y;Δy)).\Pi_{K}^{\prime}\left(\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\right)=\widehat{W}\left(\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\right). (A.2)

Therefore, by combining (A.1), (A.2) and (3.9) together, one can get

𝒜c(λ,μ,W^)𝐱c(y;Δy)=h(𝐱c(y))(Δλ)+g(𝐱c(y))W^(Δμ).{\mathcal{A}}_{c}(\lambda,\mu,\widehat{W}){\mathbf{x}}_{c}^{\prime}(y;\Delta y)=\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+\nabla g({\mathbf{x}}_{c}(y))\widehat{W}(\Delta\mu). (A.3)

From [31, Proposition 1] we know that the range of \partial_{B}\Pi_{K}(\cdot) is bounded. Then, by [46, Proposition 5.51(b)] and [46, Proposition 5.52(b)] we know that the composite mapping {\mathcal{A}}_{c}(\lambda,\mu,\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y)))) is outer semicontinuous in {\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}). Note that the set {\mathcal{A}}_{c}(\lambda^{*},\mu^{*},\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*}))) is bounded and bounded away from zero. Hence, by (3.11) and Assumption 3.1, we know from [46, Proposition 5.12(a)] that there exists a positive constant \delta_{1}\leq\delta_{0} such that {\mathcal{A}}_{c}(\lambda,\mu,W) is always positive definite whenever W\in\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y))) with y=(\lambda,\mu)\in{\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}). Consequently, in this case, we know from (A.3) that

𝐱c(y;Δy)=(𝒜c(λ,μ,W^))1(h(𝐱c(y))(Δλ)+g(𝐱c(y))W^(Δμ)).{\mathbf{x}}_{c}^{\prime}(y;\Delta y)=({\mathcal{A}}_{c}(\lambda,\mu,\widehat{W}))^{-1}\left(\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+\nabla g({\mathbf{x}}_{c}(y))\widehat{W}(\Delta\mu)\right). (A.4)

We use DϑcD_{\nabla\vartheta_{c}} to denote the set of all the points in 𝔹δ1(λ,μ){\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}) where ϑc()\nabla\vartheta_{c}(\cdot) is Fréchet-differentiable. Since ϑc()\nabla\vartheta_{c}(\cdot) is semismooth in 𝔹δ1(λ,μ){\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}), it holds that for any y=(λ,μ)Dϑcy=(\lambda,\mu)\in D_{\nabla\vartheta_{c}},

2ϑc(y)(Δy)=(𝒥h(𝐱c(y))𝐱c(y;Δy)c1Δμ+c1ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy))).\nabla^{2}\vartheta_{c}(y)(\Delta y)=\left(\begin{matrix}-{\mathcal{J}}h({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[2.84526pt] -c^{-1}\Delta\mu+c^{-1}\Pi_{K}^{\prime}\big{(}\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\big{)}\end{matrix}\right). (A.5)

Thus, by combining (A.2), (A.4) and (A.5) together, one can see from (3.10) that

\nabla^{2}\vartheta_{c}(y)(\Delta y)\in\left\{V(\Delta y)\mid V\in{\mathds{V}}_{c}(y)\right\},\quad\forall y=(\lambda,\mu)\in D_{\nabla\vartheta_{c}}.

Since 𝐱c(){\mathbf{x}}_{c}(\cdot) is locally Lipschitz continuous and BΠK()\partial_{B}\Pi_{K}(\cdot) is bounded-valued and outer semicontinuous, by [46, Propositions 5.51(b) & 5.52(b)] we know that 𝕍c{\mathds{V}}_{c} is compact-valued and outer semicontinuous. Hence, for any (λ,μ)𝔹δ1(λ,μ)(\lambda,\mu)\in{\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}), one has that B(ϑc)(λ,μ)𝕍c(λ,μ)\partial_{B}(\nabla\vartheta_{c})(\lambda,\mu)\subseteq{\mathds{V}}_{c}(\lambda,\mu). This completes the proof. ∎

Appendix B Proof of Theorem 3.5

Proof.

Since Assumption 3.1 holds and cc¯c\geq\bar{c}, one can get the two constants ε\varepsilon and δ0\delta_{0}, as well as the locally Lipschitz function 𝐱c(){\mathbf{x}}_{c}(\cdot), via Proposition 3.1. Meanwhile, from Proposition 3.3 we know that there exists a constant δ1δ0\delta_{1}\leq\delta_{0} such that the mapping 𝕍c{\mathds{V}}_{c} defined by (3.10) is well-defined, compact-valued and outer semicontinuous in 𝔹δ1(λ,μ){\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}). Since the sequence {(λk,μk)}\{(\lambda^{k},\mu^{k})\} converges to (λ,μ)(\lambda^{*},\mu^{*}), one always has that (λk,μk)𝔹δ1(λ,μ)(\lambda^{k},\mu^{k})\in{\mathds{B}}_{\delta_{1}}(\lambda^{*},\mu^{*}) for all sufficiently large kk. Therefore, the set 𝕍c(λk,μk){\mathds{V}}_{c}(\lambda^{k},\mu^{k}) is nonempty and compact if kk is sufficiently large. Moreover, according to Assumption 3.2 and [46, Proposition 5.12(a)] we know that there exists a positive constant δ2δ1\delta_{2}\leq\delta_{1} such that every element in 𝕍c(λ,μ){\mathds{V}}_{c}(\lambda,\mu) is negative definite whenever (λ,μ)𝔹δ2(λ,μ)(\lambda,\mu)\in{\mathds{B}}_{\delta_{2}}(\lambda^{*},\mu^{*}).

Next, we prove (3.12). For all sufficiently large kk, with the convention that xk+1:=𝐱c(λk,μk)x^{k+1}:={\mathbf{x}}_{c}(\lambda^{k},\mu^{k}), one can see from (3.10) that there exists a linear operator WkBΠK(μkcg(xk+1))W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})) such that

Vk=(𝒥h(xk+1)Wk𝒥g(xk+1))(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)c1(000𝒵Wk).\begin{array}[]{ll}V^{k}=&-\begin{pmatrix}{\mathcal{J}}h(x^{k+1})\\[0.0pt] W^{k}{\mathcal{J}}g(x^{k+1})\end{pmatrix}\left({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k})\right)^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)\\[11.38109pt] &-c^{-1}\left(\begin{matrix}0&0\\[0.0pt] 0&{\mathcal{I}}_{{\mathcal{Z}}}-W^{k}\end{matrix}\right).\end{array} (B.1)

Meanwhile, one can see from (3.2) and Lemma 3.2 that

ϑc(λk,μk)ϑc(λ,μ)=(h(xk+1)+h(x)c1(μkμ)+c1ΠK(μkcg(xk+1))c1ΠK(μcg(x))).\begin{array}[]{l}\nabla\vartheta_{c}(\lambda^{k},\mu^{k})-\nabla\vartheta_{c}(\lambda^{*},\mu^{*})\\[5.69054pt] =\left(\begin{matrix}-h(x^{k+1})+h(x^{*})\\[5.69054pt] -c^{-1}(\mu^{k}-\mu^{*})+c^{-1}\Pi_{K}\big{(}\mu^{k}-cg(x^{k+1}))-c^{-1}\Pi_{K}(\mu^{*}-cg(x^{*})\big{)}\end{matrix}\right).\end{array} (B.2)

For convenience, we denote y^{k}:=(\lambda^{k},\mu^{k}) and y^{*}:=(\lambda^{*},\mu^{*}). Then, by (B.1) and (B.2) we know that (3.12) holds if and only if, for all W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})), one has that, as k\to\infty,

h(x)h(xk+1)+𝒥h(xk+1)(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)=o(yky)\begin{array}[]{l}h(x^{*})-h(x^{k+1})\\[2.84526pt] +{\mathcal{J}}h(x^{k+1})({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})\\[2.84526pt] =o(\|y^{k}-y^{*}\|)\end{array} (B.3)

and

Wk𝒥g(xk+1)(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)+c1ΠK(μkcg(xk+1))c1ΠK(μcg(x))c1Wk(μkμ)=o(yky).W^{k}{\mathcal{J}}g(x^{k+1})({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})\\[2.84526pt] +c^{-1}\Pi_{K}(\mu^{k}-cg(x^{k+1}))-c^{-1}\Pi_{K}(\mu^{*}-cg(x^{*}))-c^{-1}W^{k}(\mu^{k}-\mu^{*})\\[2.84526pt] =o(\|y^{k}-y^{*}\|). (B.4)

We first verify (B.3). We know from (A.2) and (A.4) that, when k is sufficiently large,

𝐱c(yk;yky)=(𝒜c(λk,μk,W^k))1(h(xk+1)(λkλ)+g(xk+1)W^k(μkμ)),\begin{array}[]{ll}{\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =\big{(}{\mathcal{A}}_{c}(\lambda^{k},\mu^{k},\widehat{W}^{k})\big{)}^{-1}\big{(}\nabla h(x^{k+1})(\lambda^{k}-\lambda^{*})+\nabla g(x^{k+1})\widehat{W}^{k}(\mu^{k}-\mu^{*})\big{)},\end{array} (B.5)

where the linear operator W^kBΠK(μkcg(xk+1))\widehat{W}^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})) is chosen such that

ΠK(μkcg(xk+1);μkμc𝒥g(xk+1)𝐱c(yk;yky))=W^k(μkμc𝒥g(xk+1)𝐱c(yk;yky)).\begin{array}[]{ll}\displaystyle\Pi_{K}^{\prime}\big{(}\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\big{)}\\[5.69054pt] \displaystyle=\widehat{W}^{k}\big{(}\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\big{)}.\end{array} (B.6)

As has been discussed in Appendix A, the range of \partial_{B}\Pi_{K}(\cdot) is bounded, so that the sequence \{\widehat{W}^{k}\} is also bounded. Moreover, the set {\mathcal{A}}_{c}(\lambda^{*},\mu^{*},\partial_{B}\Pi_{K}(\mu^{*}-cg(x^{*}))) is bounded and bounded away from zero, while the composite mapping {\mathcal{A}}_{c}(\lambda,\mu,\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y)))) is outer semicontinuous in {\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}). Since the functions g and h are twice continuously differentiable, the sequences \{\nabla h(x^{k+1})\} and \{\nabla g(x^{k+1})\} are bounded. Then, we know from (B.5) that there exists a constant \rho_{1}>0 such that, for all sufficiently large k, \|{\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\|\leq\rho_{1}\|y^{k}-y^{*}\|.

Since \Pi_{K}(\cdot) is strongly semismooth and W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})), one can see from (B.6) that, for all sufficiently large k, as k\to\infty,

ΠK(μkcg(xk+1);μkμc𝒥g(xk+1)𝐱c(yk;yky))Wk(μkμc𝒥g(xk+1)𝐱c(yk;yky))=(W^kWk)(μkμc𝒥g(xk+1)𝐱c(yk;yky))=(W^kWk)(μkcg(xk+1)μ+cg(x))+c(W^kWk)(g(xk+1)g(x)𝒥g(xk+1)𝐱c(yk;yky)))=o(μkcg(𝐱c(yk))μ+cg(x))+c(W^kWk)(g(𝐱c(yk))g(𝐱c(y))𝒥g(xk+1)𝐱c(yk;yky))=o(yky),\begin{array}[]{l}\Pi_{K}^{\prime}\left(\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\right)\\[5.69054pt] \qquad-W^{k}(\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*}))\\[5.69054pt] =(\widehat{W}^{k}-W^{k})\left(\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\right)\\[5.69054pt] =(\widehat{W}^{k}-W^{k})\left(\mu^{k}-cg(x^{k+1})-\mu^{*}+cg(x^{*})\right)\\[5.69054pt] \quad+c(\widehat{W}^{k}-W^{k})\left(g(x^{k+1})-g(x^{*})-{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*}))\right)\\[5.69054pt] =o\left(\|\mu^{k}-cg({\mathbf{x}}_{c}(y^{k}))-\mu^{*}+cg(x^{*})\|\right)\\[2.84526pt] \quad+c(\widehat{W}^{k}-W^{k})\big{(}g({\mathbf{x}}_{c}(y^{k}))-g({\mathbf{x}}_{c}(y^{*}))-{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\big{)}\\[5.69054pt] =o(\|y^{k}-y^{*}\|),\end{array}

where the penultimate equality is due to Lemma A.1, and the last equality comes from the fact that the function g({\mathbf{x}}_{c}(\cdot)) is semismooth (hence directionally differentiable) around y^{*}. Then, since \{\nabla g(x^{k+1})\} is bounded, we can get from the above equation that

g(xk+1)ΠK(μkcg(xk+1);μkμc𝒥g(xk+1)𝐱c(yk;yky))+cg(xk+1)Wk𝒥g(xk+1)𝐱c(yk;yky)g(xk+1)Wk(μkμ)=o(yky).\begin{array}[]{l}\nabla g(x^{k+1})\Pi_{K}^{\prime}\left(\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\right)\\[5.69054pt] \qquad+c\nabla g(x^{k+1})W^{k}{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})-\nabla g(x^{k+1})W^{k}(\mu^{k}-\mu^{*})\\[5.69054pt] =o(\|y^{k}-y^{*}\|).\end{array} (B.7)

Note that (A.1) in Appendix A holds for y𝔹δ2(λ,μ)y\in{\mathds{B}}_{\delta_{2}}(\lambda^{*},\mu^{*}). Then, by using (A.1) and (3.9) together we can get that for y𝔹δ2(λ,μ)y\in{\mathds{B}}_{\delta_{2}}(\lambda^{*},\mu^{*}) and WBΠK(μcg(𝐱c(y)))W\in\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y))),

𝒜c(λ,μ,W)𝐱c(y;Δy)=h(𝐱c(y))(Δλ)+cg(𝐱c(λ,μ))W𝒥g(𝐱c(λ,μ))𝐱c(y;Δy)+g(𝐱c(y))ΠK(μcg(𝐱c(y));Δμc𝒥g(𝐱c(y))𝐱c(y;Δy)).\begin{array}[]{ll}\displaystyle{\mathcal{A}}_{c}(\lambda,\mu,W){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] =\nabla h({\mathbf{x}}_{c}(y))(\Delta\lambda)+c\nabla g({\mathbf{x}}_{c}(\lambda,\mu))W{\mathcal{J}}g({\mathbf{x}}_{c}(\lambda,\mu)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)\\[5.69054pt] \qquad+\nabla g({\mathbf{x}}_{c}(y))\Pi_{K}^{\prime}(\mu-cg({\mathbf{x}}_{c}(y));\Delta\mu-c{\mathcal{J}}g({\mathbf{x}}_{c}(y)){\mathbf{x}}_{c}^{\prime}(y;\Delta y)).\end{array} (B.8)

Then, by substituting (B.7) into (B.8), one can get that for all sufficiently large k with W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})),

𝒜c(λk,μk,Wk)𝐱c(yk;yky)=h(xk+1)(λkλ)+g(xk+1)Wk(μkμ)+o(yky),\begin{array}[]{ll}{\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =\nabla h(x^{k+1})(\lambda^{k}-\lambda^{*})+\nabla g(x^{k+1})W^{k}(\mu^{k}-\mu^{*})+o(\|y^{k}-y^{*}\|),\end{array}

which, together with Assumption 3.1 and the fact that the mapping {\mathcal{A}}_{c}(\lambda,\mu,\partial_{B}\Pi_{K}(\mu-cg({\mathbf{x}}_{c}(y)))) is outer semicontinuous in {\mathds{B}}_{\delta_{0}}(\lambda^{*},\mu^{*}), implies that for all sufficiently large k, as k\to\infty,

𝐱c(yk;yky)=(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)+o(yky).\begin{array}[]{ll}{\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})+o(\|y^{k}-y^{*}\|).\end{array} (B.9)

Since h({\mathbf{x}}_{c}(\cdot)) is locally semismooth, we have that, for all sufficiently large k, as k\to\infty,

h(x)=h(xk+1)𝒥h(xk+1)𝐱c(yk;yky)+o(yky).h(x^{*})=h(x^{k+1})-{\mathcal{J}}h(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})+o(\|y^{k}-y^{*}\|). (B.10)

Now, by substituting (B.9) into (B.10) and noting that the sequence \{{\mathcal{J}}h(x^{k+1})\} is bounded, we can get (B.3). Next, we verify (B.4). Note that, for all sufficiently large k, as k\to\infty, it holds that

\begin{array}[]{l}c^{-1}(\Pi_{K}(\mu^{k}-cg(x^{k+1}))-\Pi_{K}(\mu^{*}-cg(x^{*})))\\[2.84526pt] =c^{-1}\Pi_{K}^{\prime}\left(\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)+o(\|y^{k}-y^{*}\|),\end{array} (B.11)

and

cg(xk+1)+cg(x)=c𝒥g(xk+1)𝐱c(yk;yky)+o(yky).-cg(x^{k+1})+cg(x^{*})=-c{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})+o(\|y^{k}-y^{*}\|). (B.12)

Meanwhile, one can choose for each kk the linear operator W~kBΠK(μkcg(xk+1))\widetilde{W}^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})) such that

W~k(μkμcg(xk+1)+cg(x))=ΠK(μkcg(xk+1);μkμcg(xk+1)+cg(x)).\begin{array}[]{l}\widetilde{W}^{k}\big{(}\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\big{)}\\[5.69054pt] =\Pi_{K}^{\prime}\big{(}\mu^{k}-cg(x^{k+1});\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\big{)}.\end{array} (B.13)

Then, by combining (B.11), (B.12) and (B.13), we can get that

c1(ΠK(μkcg(xk+1))ΠK(μcg(x)))c1Wk(μkμ)+Wk𝒥g(xk+1)𝐱c(yk;yky)=c1W~k(μkμcg(xk+1)+cg(x))c1Wk(μkμcg(xk+1)+cg(x))+o(yky)=c1(W~kWk)(μkμcg(xk+1)+cg(x))+o(yky)=o(yky),\begin{array}[]{l}c^{-1}\left(\Pi_{K}(\mu^{k}-cg(x^{k+1}))-\Pi_{K}(\mu^{*}-cg(x^{*}))\right)-c^{-1}W^{k}(\mu^{k}-\mu^{*})\\[5.69054pt] \quad+W^{k}{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =c^{-1}\widetilde{W}^{k}\left(\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)\\[5.69054pt] \qquad-c^{-1}W^{k}\left(\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)+o(\|y^{k}-y^{*}\|)\\[5.69054pt] =c^{-1}(\widetilde{W}^{k}-W^{k})\left(\mu^{k}-\mu^{*}-cg(x^{k+1})+cg(x^{*})\right)+o(\|y^{k}-y^{*}\|)\\[5.69054pt] =o(\|y^{k}-y^{*}\|),\end{array} (B.14)

where the last equality comes from Lemma A.1 and the fact that \widetilde{W}^{k},W^{k}\in\partial_{B}\Pi_{K}(\mu^{k}-cg(x^{k+1})). Note that (B.9) implies that

Wk𝒥g(xk+1)𝐱c(yk;yky)=Wk𝒥g(xk+1)(𝒜c(λk,μk,Wk))1(h(xk+1),g(xk+1)Wk)(yky)+o(yky).\begin{array}[]{l}W^{k}{\mathcal{J}}g(x^{k+1}){\mathbf{x}}_{c}^{\prime}(y^{k};y^{k}-y^{*})\\[5.69054pt] =W^{k}{\mathcal{J}}g(x^{k+1})({\mathcal{A}}_{c}(\lambda^{k},\mu^{k},W^{k}))^{-1}\left(\nabla h(x^{k+1}),\nabla g(x^{k+1})W^{k}\right)(y^{k}-y^{*})+o(\|y^{k}-y^{*}\|).\end{array} (B.15)

Thus, by substituting (B.15) into (B.14) we can get (B.4), which completes the proof. ∎