Convergence analysis of primal-dual augmented Lagrangian methods and duality theory

M.V. Dolgopolik (Institute for Problems in Mechanical Engineering of the Russian Academy of Sciences, Saint Petersburg, Russia)
Abstract

We develop a unified theory of augmented Lagrangians for nonconvex optimization problems that encompasses both duality theory and convergence analysis of primal-dual augmented Lagrangian methods in the infinite dimensional setting. Our goal is to present many well-known concepts and results related to augmented Lagrangians in a unified manner and bridge a gap between existing convergence analysis of primal-dual augmented Lagrangian methods and abstract duality theory. Within our theory we specifically emphasize the role of various fundamental duality concepts (such as duality gap, optimal dual solutions, global saddle points, etc.) in convergence analysis of augmented Lagrangian methods and underline interconnections between all these concepts and convergence of primal and dual sequences generated by such methods. In particular, we prove that the zero duality gap property is a necessary condition for the boundedness of the primal sequence, while the existence of an optimal dual solution is a necessary condition for the boundedness of the sequences of multipliers and penalty parameters, irrespective of the way in which the multipliers and the penalty parameter are updated. Our theoretical results are applicable to many different augmented Lagrangians for various types of cone constrained optimization problems, including Rockafellar-Wets' augmented Lagrangian, (penalized) exponential/hyperbolic-type augmented Lagrangians, modified barrier functions, etc.

1 Introduction

Augmented Lagrangians play a fundamental role in optimization and many other closely related fields [2, 25, 34, 5, 19], and there is a vast literature on various aspects of augmented Lagrangian theory and corresponding methods. A wide range of topics that is studied within the more theoretically oriented part of the literature includes such important problems as analysis of the zero duality gap property [31, 32, 17, 39, 86, 72, 79], exact penalty representation [31, 32, 86, 9, 72, 87, 85, 12, 24], existence of global saddle points [68, 76, 77, 84, 43, 70, 22], existence of augmented Lagrange multipliers [62, 61, 88, 18, 21], etc.

In turn, more practically oriented papers are usually devoted to convergence analysis of primal-dual augmented Lagrangian methods and do not utilise any theoretical concepts or results from duality theory, except for the zero duality gap property [8, 10, 1, 16, 14]. Very few attempts have been made to connect convergence analysis of such methods with fundamental results from duality theory. Paper [7], in which the equivalence between primal convergence and differentiability of the augmented dual function for the sharp Lagrangian for equality constrained problems was established, is perhaps the most notable among them.

In addition, papers dealing with convergence analysis of primal-dual augmented Lagrangian methods typically consider only one particular augmented Lagrangian for one particular type of constraints. As a consequence, very similar results have to be repeated multiple times in different settings (cf. convergence analysis of augmented Lagrangian methods based on the Hestenes-Powell-Rockafellar augmented Lagrangian for mathematical programming problems [5], nonconvex problems with second order cone constraints [40, 41], nonconvex semidefinite programming problems [65, 45, 74], as well as similar convergence analysis of augmented Lagrangian methods based on the exponential-type augmented Lagrangian/modified barrier function for nonconvex problems with second order cone constraints [81] and nonconvex semidefinite programming problems [38]). Some attempts have been made to unify convergence analysis of a number of primal-dual augmented Lagrangian methods [44, 42, 71], but only for finite dimensional inequality constrained optimization problems.

Thus, there are two fundamental challenges within the theory of augmented Lagrangians and corresponding optimization methods. The first one is connected with a noticeable gap that exists between more theoretically oriented results dealing with duality theory and more practically oriented results on convergence analysis of primal-dual methods. Often, the duality theory plays little to no role in convergence analysis of primal-dual methods, and theoretical results from this theory are almost never utilised to help better understand convergence of augmented Lagrangian methods. The second challenge is connected with unification of numerous similar results on duality theory and augmented Lagrangian methods that are scattered in the literature.

The main goal of this article is to present a general theory of augmented Lagrangians for nonconvex cone constrained optimization problems in the infinite dimensional setting that encompasses both fundamental theoretical concepts from duality theory and convergence analysis of primal-dual augmented Lagrangian methods, and also highlights interconnections between them, thus at least partially solving the two aforementioned challenges.

Our other aim is to correct for the bias that exists in the literature on augmented Lagrangians, in which a disproportionate number of papers is devoted to analysis of Rockafellar-Wets' augmented Lagrangian [58, Section 11.K] and corresponding numerical methods, while very little attention is paid to other classes of augmented Lagrangians. In particular, to the best of the author's knowledge, many concepts (such as the exact penalty map [12], exact penalty representation, and augmented Lagrange multipliers) have been introduced and studied exclusively in the context of Rockafellar-Wets' augmented Lagrangian. Our aim is to show that these concepts can be defined and be useful in a much broader setting and to demonstrate that many results from duality theory and convergence analysis of primal-dual methods can be proved for Rockafellar-Wets' augmented Lagrangian and many other augmented Lagrangians in a unified way.

To achieve our goals, we adopt an axiomatic augmented Lagrangian setting developed by the author in [22] and inspired by [39]. Within this setting, one defines an abstract augmented Lagrangian without specifying its structure, while all theoretical results are proved using a set of axioms (basic assumptions). To demonstrate the broad applicability of this approach, we present many particular examples of well-known augmented Lagrangians and prove that they satisfy these axioms.

We also develop a general duality theory for augmented Lagrangians that complements the results of the author's earlier papers [22, 23] and, in particular, provide simple necessary and sufficient conditions for the zero duality gap property to hold true, from which many existing results on this property can be easily derived. Finally, we present a general convergence analysis for a model augmented Lagrangian method with arbitrary rules for updating the multipliers and the penalty parameter. Under some natural assumptions that are satisfied in many particular cases, we study primal and dual convergence of this method, placing specific emphasis on the role of various fundamental concepts from duality theory in convergence analysis of augmented Lagrangian methods. In particular, we point out direct connections between primal convergence and the zero duality gap property, as well as direct connections between dual convergence, boundedness of the penalty parameter, and the existence of optimal dual solutions/global saddle points.

The paper is organized as follows. An abstract axiomatic augmented Lagrangian setting for cone constrained optimization problems in normed spaces, including a list of basic assumptions (axioms) on an augmented Lagrangian, is presented in Section 2. Many particular examples of augmented Lagrangians for various types of cone constrained optimization problems and the basic assumptions that these augmented Lagrangians satisfy are discussed in detail in Section 3. Section 4 is devoted to duality theory. In this section we analyse the zero duality gap property for the augmented Lagrangian, and also study interconnections between global saddle points, globally optimal solutions of the augmented dual problem, augmented Lagrange multipliers, and the penalty map. Finally, a general convergence theory for a model augmented Lagrangian method that encompasses many existing primal-dual augmented Lagrangian methods as particular cases is presented in Section 5.

2 Axiomatic augmented Lagrangian setting

We start by presenting an axiomatic approach to augmented Lagrangians that serves as a foundation for our duality and convergence theories. Let $X$ and $Y$ be real normed spaces, $Q\subseteq X$ be a nonempty closed set (not necessarily convex), and $K\subset Y$ be a closed convex cone. Suppose also that some functions $f\colon X\to\mathbb{R}\cup\{+\infty\}$ and $G\colon X\to Y$ are given. In this article we study augmented Lagrangians for the following cone constrained optimization problem:

$$\min\,f(x)\quad\text{subject to}\quad G(x)\in K,\quad x\in Q.\qquad(\mathcal{P})$$

We assume that the feasible region of this problem is nonempty and its optimal value, denoted by $f_*$, is finite.

The topological dual space of $Y$ is denoted by $Y^*$, and let $\langle\cdot,\cdot\rangle$ be either the inner product in $\mathbb{R}^s$, $s\in\mathbb{N}$, or the duality pairing between $Y$ and its dual, depending on the context. Recall that $K^*=\{y^*\in Y^*\mid\langle y^*,y\rangle\leq 0\ \forall y\in K\}$ is the polar cone of the cone $K$.

Denote by $\preceq$ the binary relation over $Y\times Y$ defined as $y_1\preceq y_2$ if and only if $y_2-y_1\in-K$. We say that this binary relation is induced by the cone $-K$. As one can readily check, this binary relation is a partial order on $Y$ if and only if $K\cap(-K)=\{0\}$. In this case $\preceq$ is called the partial order induced by the cone $-K$. Note that the cone constraint $G(x)\in K$ can be rewritten as $G(x)\preceq 0$.
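For instance, if $Y=\mathbb{R}^m$ and $K=\mathbb{R}^m_-$, then $K\cap(-K)=\{0\}$, $-K=\mathbb{R}^m_+$, and $\preceq$ is the usual coordinate-wise partial order: $y_1\preceq y_2$ if and only if $y_{1i}\leq y_{2i}$ for all $i$ (cf. Section 3.3 below).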

We define an augmented Lagrangian for the problem $(\mathcal{P})$ as follows. Choose any function $\Phi\colon Y\times Y^*\times(0,+\infty)\to\mathbb{R}\cup\{\pm\infty\}$, $\Phi=\Phi(y,\lambda,c)$. An augmented Lagrangian for the problem $(\mathcal{P})$ is defined as

$$\mathscr{L}(x,\lambda,c)=f(x)+\Phi(G(x),\lambda,c)$$

if $\Phi(G(x),\lambda,c)>-\infty$, and $\mathscr{L}(x,\lambda,c)=-\infty$ otherwise. Here $\lambda\in Y^*$ is a multiplier and $c>0$ is a penalty parameter. Note that only the constraint $G(x)\in K$ is incorporated into the augmented Lagrangian.
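To make this abstract definition concrete, below is a minimal computational sketch of $\mathscr{L}(x,\lambda,c)=f(x)+\Phi(G(x),\lambda,c)$, assuming finite-dimensional data; the sample problem and the names augmented_lagrangian, f, G, Phi, as well as the particular choice of $\Phi$ (the Hestenes-Powell-Rockafellar term from Example 5 below), are illustrative only and not part of the theory.

    import numpy as np

    def augmented_lagrangian(f, G, Phi, x, lam, c):
        # L(x, lam, c) = f(x) + Phi(G(x), lam, c), with the convention
        # L(x, lam, c) = -inf whenever Phi(G(x), lam, c) = -inf.
        phi_val = Phi(G(x), lam, c)
        if phi_val == -np.inf:
            return -np.inf
        return f(x) + phi_val

    # Illustrative (hypothetical) data: minimize x1^2 + x2^2 subject to
    # 1 - x1 - x2 <= 0, i.e. G(x) in K = R^1_-, with the
    # Hestenes-Powell-Rockafellar term Phi from Example 5 below.
    f = lambda x: x[0]**2 + x[1]**2
    G = lambda x: np.array([1.0 - x[0] - x[1]])

    def Phi(y, lam, c):
        t = np.maximum(y, -lam / c)
        return np.sum(lam * t + 0.5 * c * t**2)

    print(augmented_lagrangian(f, G, Phi, np.array([0.2, 0.3]), np.array([1.0]), 10.0))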

Unlike most existing works on augmented Lagrangians and corresponding optimization methods, we do not impose any assumptions on the structure of the function $\Phi$. It can be defined in an arbitrary way. Instead, being inspired by [39] and following the ideas of our earlier paper [22], we propose to use a set of axioms (assumptions) describing the behaviour of the function $\Phi(y,\lambda,c)$ for various types of arguments (e.g. as $c$ increases unboundedly or when $y\in K$). This approach allows one to construct an axiomatic theory of augmented Lagrangians and helps one to better understand what kind of assumptions the function $\Phi$ must satisfy to guarantee that the augmented Lagrangian $\mathscr{L}(x,\lambda,c)$ and optimization methods based on this augmented Lagrangian have certain desirable properties. As we will show below, most well-known augmented Lagrangians satisfy our proposed set of axioms, which means that our axiomatic theory is rich enough and can be applied in many particular cases.

In order to unite several particular cases into one general theory, we formulate axioms/assumptions on the function $\Phi$ with respect to a prespecified closed convex cone $\Lambda\subseteq Y^*$ of admissible multipliers. Usually, $\Lambda=Y^*$ or $\Lambda=K^*$.

We group the assumptions on $\Phi$ according to their meaning. If the function $\Phi(y,\lambda,c)$ is viewed as a black box with input $(y,\lambda,c)$ and output $\Phi(y,\lambda,c)$, then assumptions (A1)-(A6) describe what kind of output is produced by this black box for certain specific kinds of input. Assumptions (A7)-(A11) describe general properties of the function $\Phi$, such as monotonicity, differentiability, and convexity. Assumptions (A12)-(A15) impose restrictions on the way the function $\Phi(y,\lambda,c)$ behaves as $c$ increases unboundedly or along certain sequences $\{(y_n,\lambda_n,c_n)\}$. Finally, the subscript "s" indicates a stronger, i.e. more restrictive, version of an assumption. A small numerical illustration of this black-box viewpoint is given right after the list below.

Denote $\operatorname{dist}(y,K)=\inf_{z\in K}\|y-z\|$ for any $y\in Y$, and let $B(x,r)$ be the closed ball with centre $x$ and radius $r>0$. Below is the list of basic assumptions on the function $\Phi$ that are utilised throughout the article:

  • (A1)

    $\forall y\in K\ \forall\lambda\in\Lambda\ \forall c>0$ one has $\Phi(y,\lambda,c)\leq 0$;

  • (A2)

    $\forall y\in K\ \forall c>0\ \exists\lambda\in\Lambda$ such that $\Phi(y,\lambda,c)\geq 0$;

  • (A3)

    $\forall y\notin K\ \forall c>0\ \exists\lambda\in\Lambda$ such that $\Phi(y,t\lambda,c)\to+\infty$ as $t\to+\infty$;

  • (A4)

    $\forall y\in K\ \forall\lambda\in K^*\ \forall c>0$ such that $\langle\lambda,y\rangle=0$ one has $\Phi(y,\lambda,c)=0$;

  • (A5)

    $\forall y\in K\ \forall\lambda\in K^*\ \forall c>0$ such that $\langle\lambda,y\rangle\neq 0$ one has $\Phi(y,\lambda,c)<0$;

  • (A6)

    $\forall y\in K\ \forall\lambda\in\Lambda\setminus K^*\ \forall c>0$ one has $\Phi(y,\lambda,c)<0$;

  • (A7)

    $\forall y\in Y\ \forall\lambda\in\Lambda$ the function $c\mapsto\Phi(y,\lambda,c)$ is non-decreasing;

  • (A8)

    $\forall\lambda\in\Lambda\ \forall c>0$ the function $y\mapsto\Phi(y,\lambda,c)$ is convex and non-decreasing with respect to the binary relation $\preceq$;

  • (A9)

    $\forall y\in Y\ \forall c>0$ the function $\lambda\mapsto\Phi(y,\lambda,c)$ is concave;

  • $(A9)_s$

    $\forall y\in Y$ the function $(\lambda,c)\mapsto\Phi(y,\lambda,c)$ is concave;

  • (A10)

    $\forall y\in Y$ the function $(\lambda,c)\mapsto\Phi(y,\lambda,c)$ is upper semicontinuous;

  • (A11)

    $\forall y\in K\ \forall\lambda\in K^*\ \forall c>0$ such that $\langle\lambda,y\rangle=0$ the function $\Phi(\cdot,\lambda,c)$ is Fréchet differentiable at $y$ and its Fréchet derivative satisfies the equality $D_y\Phi(y,\lambda,c)=\Phi_0(\lambda)$ for some surjective mapping $\Phi_0\colon K^*\to K^*$ that does not depend on $y$ and $c$, and such that $\langle\Phi_0(\lambda),y\rangle=0$ if and only if $\langle\lambda,y\rangle=0$;

  • (A12)

    $\forall\lambda\in\Lambda\ \forall c_0>0\ \forall r>0$ one has

    $$\lim_{c\to+\infty}\inf\big\{\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)\bigm| y\in Y\colon\operatorname{dist}(y,K)\geq r,\ |\Phi(y,\lambda,c_0)|<+\infty\big\}=+\infty;$$

  • $(A12)_s$

    $\forall c_0>0\ \forall r>0$ and for any bounded subset $\Lambda_0\subseteq\Lambda$ one has

    $$\lim_{c\to+\infty}\inf_{\lambda\in\Lambda_0}\inf\big\{\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)\bigm| y\in Y\colon\operatorname{dist}(y,K)\geq r,\ |\Phi(y,\lambda,c_0)|<+\infty\big\}=+\infty;$$

  • (A13)

    $\forall\lambda\in\Lambda$ and for any sequences $\{c_n\}\subset(0,+\infty)$ and $\{y_n\}\subset Y$ such that $c_n\to+\infty$ and $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$ one has $\liminf_{n\to\infty}\Phi(y_n,\lambda,c_n)\geq 0$;

  • $(A13)_s$

    for any sequences $\{c_n\}\subset(0,+\infty)$ and $\{y_n\}\subset Y$ such that $c_n\to+\infty$ and $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$ and for any bounded subset $\Lambda_0\subseteq\Lambda$ one has $\liminf_{n\to\infty}\inf_{\lambda\in\Lambda_0}\Phi(y_n,\lambda,c_n)\geq 0$;

  • (A14)

    $\forall\lambda\in\Lambda\ \forall\{c_n\}\subset(0,+\infty)$ such that $c_n\to+\infty$ as $n\to\infty$ there exists $\{t_n\}\subset(0,+\infty)$ such that $t_n\to 0$ as $n\to\infty$ and for any $\{y_n\}\subset Y$ with $\operatorname{dist}(y_n,K)\leq t_n$ one has $\lim_{n\to\infty}\Phi(y_n,\lambda,c_n)=0$;

  • $(A14)_s$

    $\forall\{c_n\}\subset(0,+\infty)$ such that $c_n\to+\infty$ as $n\to\infty$ and for any bounded sequence $\{\lambda_n\}\subseteq\Lambda$ there exists $\{t_n\}\subset(0,+\infty)$ such that $t_n\to 0$ as $n\to\infty$ and for any $\{y_n\}\subset Y$ with $\operatorname{dist}(y_n,K)\leq t_n$ the following equality holds true: $\lim_{n\to\infty}\Phi(y_n,\lambda_n,c_n)=0$;

  • (A15)

    for any bounded sequences $\{\lambda_n\}\subset\Lambda$ and $\{c_n\}\subset(0,+\infty)$ and for any sequence $\{y_n\}\subset Y$ such that $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$ one has $\limsup_{n\to\infty}\Phi(y_n,\lambda_n,c_n)\leq 0$.
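The following toy script illustrates the black-box viewpoint described above: it numerically spot-checks assumptions (A1), (A4), and (A7) on random inputs. This is a sketch under purely illustrative assumptions (finite dimensions, $K=\mathbb{R}^m_-$, $\Lambda=\mathbb{R}^m$, and the Hestenes-Powell-Rockafellar term of Example 5 below, here hard-coded as Phi); such random tests can refute an assumption but, of course, never prove it.

    import numpy as np

    rng = np.random.default_rng(0)

    def Phi(y, lam, c):
        # Hestenes-Powell-Rockafellar term for K = R^m_- (Example 5 below).
        t = np.maximum(y, -lam / c)
        return np.sum(lam * t + 0.5 * c * t**2)

    m = 3
    for _ in range(1000):
        y = -rng.random(m)             # y in K = R^m_-
        lam = rng.standard_normal(m)   # lam in Lambda = Y* = R^m
        c = 0.1 + 10.0 * rng.random()
        # (A1): Phi(y, lam, c) <= 0 for all y in K, lam in Lambda, c > 0.
        assert Phi(y, lam, c) <= 1e-12
        # (A7): c -> Phi(y, lam, c) is non-decreasing.
        assert Phi(y, lam, c) <= Phi(y, lam, c + 1.0) + 1e-12
        # (A4): y in K, lam in K* = R^m_+, <lam, y> = 0 imply Phi(y, lam, c) = 0.
        mask = rng.random(m) < 0.5
        lam0 = np.where(mask, np.abs(lam), 0.0)  # lam0_i = 0 wherever y0_i < 0
        y0 = np.where(mask, 0.0, y)              # complementary to lam0
        assert abs(Phi(y0, lam0, c)) <= 1e-12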

Remark 1.

In the case of assumptions (A13), $(A13)_s$, (A14), $(A14)_s$, and (A15), we say that the function $\Phi$ satisfies the restricted version of the corresponding assumption, if one replaces "any sequence $\{y_n\}\subset Y$" in the formulation of the corresponding assumption with "any bounded sequence $\{y_n\}\subset Y$". Note that the validity of a (non-restricted) assumption implies that its restricted version also holds true.

Remark 2.

It should be noted that assumption $(A13)_s$ can be reformulated as follows: for any sequences $\{c_n\}\subset(0,+\infty)$ and $\{y_n\}\subset Y$ such that $c_n\to+\infty$ and $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$ and for any bounded sequence $\{\lambda_n\}\subseteq\Lambda$ one has $\liminf_{n\to\infty}\Phi(y_n,\lambda_n,c_n)\geq 0$.

Below we will often use the following corollary to assumptions (A13) and $(A13)_s$, which can be readily verified directly.

Lemma 1.

If $\Phi$ satisfies assumption (A13), then for any $\lambda\in\Lambda$ one has

$$\liminf_{c\to+\infty}\inf_{y\in K}\Phi(y,\lambda,c)\geq 0.$$

If $\Phi$ satisfies assumption $(A13)_s$, then for any bounded subset $\Lambda_0\subseteq\Lambda$ one has

$$\liminf_{c\to+\infty}\inf_{\lambda\in\Lambda_0}\inf_{y\in K}\Phi(y,\lambda,c)\geq 0.$$

In the following section we will give many particular examples of augmented Lagrangians for different types of cone constrained optimization problems (e.g. equality constrained problems, inequality constrained problems, problems with semidefinite constraints, etc.). If an optimization problem has several different types of constraints simultaneously, it is convenient to represent them as cone constraints of the form $G_i(x)\in K_i$ for some mappings $G_i\colon X\to Y_i$ and closed convex cones $K_i$ in real Banach spaces $Y_i$, $i\in\{1,\ldots,m\}$, with $m\in\mathbb{N}$. Then one can define $Y=Y_1\times\ldots\times Y_m$ and

$$G(\cdot)=(G_1(\cdot),\ldots,G_m(\cdot)),\quad K=K_1\times\ldots\times K_m$$

to formally rewrite such optimization problems as the problem $(\mathcal{P})$. The space $Y$ is equipped with the norm $\|y\|=\|y_1\|+\ldots+\|y_m\|$ for all $y=(y_1,\ldots,y_m)\in Y$.

In order to define a function $\Phi(y,\lambda,c)$ in this case, one can define corresponding functions $\Phi_i(y_i,\lambda_i,c)$ for each constraint $G_i(x)\in K_i$ individually (here $y_i\in Y_i$ and $\lambda_i\in\Lambda_i\subseteq Y_i^*$) and then put

$$\Phi(y,\lambda,c)=\sum_{i=1}^m\Phi_i(y_i,\lambda_i,c),\quad y=(y_1,\ldots,y_m),\quad\lambda=(\lambda_1,\ldots,\lambda_m),\qquad(1)$$

if $\Phi_i(y_i,\lambda_i,c)>-\infty$ for all $i$, and $\Phi(y,\lambda,c)=-\infty$ otherwise. Most (if not all) existing augmented Lagrangians for problems with several different types of constraints are defined in this way. Let us show that, roughly speaking, if $\Lambda=\Lambda_1\times\ldots\times\Lambda_m$ and all functions $\Phi_i$, $i\in\{1,\ldots,m\}$, satisfy some basic assumption simultaneously, then the function $\Phi$ also satisfies this basic assumption.
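In code, the construction (1) is just a sum over constraint blocks. The following sketch, under illustrative assumptions (finite-dimensional blocks addressed by slices; the names compose_Phi, Phi_eq, Phi_ineq are hypothetical), assembles $\Phi$ from the individual terms $\Phi_i$ and propagates the value $-\infty$ exactly as prescribed above.

    import numpy as np

    def compose_Phi(parts):
        # Build Phi(y, lam, c) = sum_i Phi_i(y_i, lam_i, c) as in (1).
        # `parts` is a list of (Phi_i, block) pairs, where `block` is a slice
        # selecting the coordinates of the i-th constraint inside y and lam.
        # Returns -inf as soon as one term equals -inf.
        def Phi(y, lam, c):
            total = 0.0
            for Phi_i, block in parts:
                v = Phi_i(y[block], lam[block], c)
                if v == -np.inf:
                    return -np.inf
                total += v
            return total
        return Phi

    # Example: an equality block (K_1 = {0}) and an inequality block (K_2 = R^2_-),
    # using the Hestenes-Powell (Example 2) and Hestenes-Powell-Rockafellar
    # (Example 5) terms.
    Phi_eq = lambda y, lam, c: float(lam @ y + 0.5 * c * (y @ y))
    def Phi_ineq(y, lam, c):
        t = np.maximum(y, -lam / c)
        return float(lam @ t + 0.5 * c * (t @ t))

    Phi = compose_Phi([(Phi_eq, slice(0, 1)), (Phi_ineq, slice(1, 3))])
    print(Phi(np.array([0.1, -0.2, 0.3]), np.array([1.0, 0.5, 0.0]), 10.0))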

Theorem 1.

Let $\Lambda=\Lambda_1\times\ldots\times\Lambda_m$. Then the following statements hold true:

  1.

    if all functions $\Phi_i$, $i\in I:=\{1,\ldots,m\}$, satisfy one of the basic assumptions simultaneously, except for assumptions (A5), (A6), (A12), and $(A12)_s$, then the function $\Phi$ defined in (1) satisfies the same basic assumption;

  2.

    if all functions $\Phi_i$, $i\in I$, satisfy assumptions (A1) and (A5) (or (A1) and (A6)) simultaneously, then the function $\Phi$ defined in (1) also satisfies assumption (A5) (or (A6));

  3.

    if all functions $\Phi_i$, $i\in I$, satisfy assumptions (A7) and (A12) (or (A7) and $(A12)_s$) simultaneously, then the function $\Phi$ defined in (1) also satisfies assumption (A12) (or $(A12)_s$).

Proof.

Assumption (A1). If $y\in K$ and $\lambda\in\Lambda$, then $y_i\in K_i$ and $\lambda_i\in\Lambda_i$ for all $i\in I$. Therefore, $\Phi_i(y_i,\lambda_i,c)\leq 0$ by assumption (A1), which implies that $\Phi(y,\lambda,c)\leq 0$.

Assumption (A2). Choose any $y\in K$ and $c>0$. By assumption (A2) for any $i\in I$ there exists $\lambda_i\in\Lambda_i$ such that $\Phi_i(y_i,\lambda_i,c)\geq 0$. Put $\lambda=(\lambda_1,\ldots,\lambda_m)$. Then $\Phi(y,\lambda,c)\geq 0$.

Assumption (A3). Choose any $y\notin K$ and $c>0$. By assumption (A3) for any $i\in I$ there exists $\lambda_i\in\Lambda_i$ such that $\Phi_i(y_i,t\lambda_i,c)\to+\infty$ as $t\to+\infty$. Put $\lambda=(\lambda_1,\ldots,\lambda_m)$. Then $\Phi(y,t\lambda,c)\to+\infty$ as $t\to+\infty$.

Assumption (A4). Let $y\in K$ and $\lambda\in K^*$ be such that $\langle\lambda,y\rangle=0$. It is easily seen that the condition $\lambda\in K^*$ implies that $\lambda_i\in K_i^*$ for any $i\in I$ and, therefore, $\langle\lambda_i,y_i\rangle\leq 0$ for all $i\in I$. Hence $\langle\lambda_i,y_i\rangle=0$ for all $i\in I$ and by assumption (A4) one has $\Phi_i(y_i,\lambda_i,c)=0$, which yields $\Phi(y,\lambda,c)=0$.

Assumption (A5). If $y\in K$ and $\lambda\in K^*$ are such that $\langle\lambda,y\rangle\neq 0$, then there exists $k\in I$ such that $\langle\lambda_k,y_k\rangle\neq 0$. Therefore, $\Phi_k(y_k,\lambda_k,c)<0$. In turn, for any $i\neq k$ one has $\Phi_i(y_i,\lambda_i,c)\leq 0$ by assumption (A1). Hence $\Phi(y,\lambda,c)<0$.

Assumption (A6). If $y\in K$ and $\lambda\in\Lambda\setminus K^*$, then $\lambda_k\in\Lambda_k\setminus K_k^*$ for some $k\in I$. Therefore, $\Phi_k(y_k,\lambda_k,c)<0$, while $\Phi_i(y_i,\lambda_i,c)\leq 0$ for any $i\neq k$ by assumption (A1). Consequently, $\Phi(y,\lambda,c)<0$.

Assumption (A7). The function $\Phi(y,\lambda,c)$ is non-decreasing in $c$ as the sum of functions that are non-decreasing in $c$.

Assumption (A8). Note that if the function $\Phi_i(\cdot,\lambda_i,c)$ is non-decreasing with respect to the binary relation induced by the cone $-K_i$, then the function $y\mapsto\Phi_i(y_i,\lambda_i,c)$ is non-decreasing with respect to the binary relation induced by the cone $-K$. Therefore the function $\Phi(y,\lambda,c)$ is convex and non-decreasing with respect to the binary relation induced by the cone $-K$ as the sum of convex and non-decreasing functions.

Assumptions (A9) and $(A9)_s$. The function $\Phi(y,\lambda,c)$ is concave in $\lambda$ (or in $(\lambda,c)$) as the sum of concave functions.

Assumption (A10). The function $\Phi(y,\lambda,c)$ is upper semicontinuous as the sum of a finite number of upper semicontinuous functions.

Assumption (A11). If $y\in K$ and $\lambda\in K^*$ are such that $\langle\lambda,y\rangle=0$, then, as was noted above, $y_i\in K_i$, $\lambda_i\in K_i^*$, and $\langle\lambda_i,y_i\rangle=0$ for all $i\in I$. Then each of the functions $\Phi_i(\cdot,\lambda_i,c)$ is Fréchet differentiable at $y_i$ and its derivative has the form $D_{y_i}\Phi_i(y_i,\lambda_i,c)=\Phi_{i0}(\lambda_i)$ for some function $\Phi_{i0}\colon K_i^*\to K_i^*$ such that $\langle\Phi_{i0}(\lambda_i),y_i\rangle=0$ if and only if $\langle\lambda_i,y_i\rangle=0$. Therefore, the function $\Phi(\cdot,\lambda,c)$ is Fréchet differentiable at the point $y$ and its Fréchet derivative is equal to $\Phi_0(\lambda)=(\Phi_{10}(\lambda_1),\ldots,\Phi_{m0}(\lambda_m))$. Clearly, one has

$$\langle\Phi_0(\lambda),y\rangle=0\;\Leftrightarrow\;\langle\Phi_{i0}(\lambda_i),y_i\rangle=0\ \ \forall i\in I\;\Leftrightarrow\;\langle\lambda_i,y_i\rangle=0\ \ \forall i\in I\;\Leftrightarrow\;\langle\lambda,y\rangle=0,$$

which implies the required result.

Assumption (A12). Fix any $\lambda\in\Lambda$, $c_0>0$, and $r>0$. Choose some $\alpha>0$. Let $y\in Y$ be such that $|\Phi(y,\lambda,c_0)|<+\infty$ and $\operatorname{dist}(y,K)\geq r$. Then there exists $i\in I$ such that $\operatorname{dist}(y_i,K_i)\geq r/m$. By assumption (A12) for the functions $\Phi_j$, for any $j\in I$ there exists $c_j(r/m,\alpha)\geq c_0$ such that for all $c\geq c_j(r/m,\alpha)$ one has

$$\inf\big\{\Phi_j(z_j,\lambda_j,c)-\Phi_j(z_j,\lambda_j,c_0)\bigm| z_j\in Y_j\colon\operatorname{dist}(z_j,K_j)\geq r/m,\ |\Phi_j(z_j,\lambda_j,c_0)|<+\infty\big\}\geq\alpha.$$

Let $c(r,\alpha):=\max\{c_j(r/m,\alpha)\mid j\in I\}$. Then with the use of assumption (A7) one gets that for any $c\geq c(r,\alpha)$ the following inequalities hold true:

$$\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)=\sum_{i=1}^m\big(\Phi_i(y_i,\lambda_i,c)-\Phi_i(y_i,\lambda_i,c_0)\big)\geq\Phi_i(y_i,\lambda_i,c)-\Phi_i(y_i,\lambda_i,c_0)\geq\alpha.$$

Since $y\in Y$ such that $|\Phi(y,\lambda,c_0)|<+\infty$ and $\operatorname{dist}(y,K)\geq r$ was chosen arbitrarily, one can conclude that for any $\alpha>0$ and $r>0$ there exists $c(r,\alpha)\geq c_0$ such that

$$\inf\big\{\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)\bigm| y\in Y\colon\operatorname{dist}(y,K)\geq r,\ |\Phi(y,\lambda,c_0)|<+\infty\big\}\geq\alpha$$

for all $c\geq c(r,\alpha)$. Consequently, the function $\Phi$ satisfies assumption (A12).

Assumption $(A12)_s$. Choose some $c_0>0$ and let $\Lambda_0\subset\Lambda$ be a bounded set. Clearly, one can find bounded sets $\Lambda_{i0}\subset\Lambda_i$ such that $\Lambda_0\subseteq\widehat{\Lambda}_0$, where $\widehat{\Lambda}_0:=\Lambda_{10}\times\ldots\times\Lambda_{m0}$.

By assumption $(A12)_s$ for the functions $\Phi_i$, for any $i\in I$, $r>0$, and $\alpha>0$ one can find $c_i(r,\alpha,\Lambda_{i0})\geq c_0$ such that

$$\inf_{\lambda_i\in\Lambda_{i0}}\inf\big\{\Phi_i(y_i,\lambda_i,c)-\Phi_i(y_i,\lambda_i,c_0)\bigm| y_i\in Y_i(r,\Phi_i)\big\}\geq\alpha\quad\forall c\geq c_i(r,\alpha,\Lambda_{i0}),$$

where $Y_i(r,\Phi_i):=\{y_i\in Y_i\mid\operatorname{dist}(y_i,K_i)\geq r,\ |\Phi_i(y_i,\lambda_i,c_0)|<+\infty\}$.

Denote $Y(r,\Phi)=\{y\in Y\mid\operatorname{dist}(y,K)\geq r,\ |\Phi(y,\lambda,c_0)|<+\infty\}$. Fix some $r>0$ and choose any $y\in Y(r,\Phi)$. Then $\operatorname{dist}(y_i,K_i)\geq r/m$ for some $i\in I$. Hence with the use of assumption (A7) one gets that

$$\inf_{\lambda\in\Lambda_0}\inf\big\{\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)\bigm| y\in Y(r,\Phi)\big\}\geq\inf_{\lambda\in\widehat{\Lambda}_0}\inf\big\{\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)\bigm| y\in Y(r,\Phi)\big\}\geq\inf_{\lambda_i\in\Lambda_{i0}}\inf\big\{\Phi_i(y_i,\lambda_i,c)-\Phi_i(y_i,\lambda_i,c_0)\bigm| y_i\in Y_i(r/m,\Phi_i)\big\}\geq\alpha$$

for any $c\geq c(r,\alpha,\Lambda_0):=\max\{c_j(r/m,\alpha,\Lambda_{j0})\mid j\in I\}$. Since $\alpha>0$ was chosen arbitrarily, one can conclude that the function $\Phi$ satisfies assumption $(A12)_s$.

Assumption (A13). If sequences $\{c_n\}\subset(0,+\infty)$ and $\{y_n\}\subset Y$ are such that $c_n\to+\infty$ and $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$, then $\operatorname{dist}(y_{in},K_i)\to 0$ as $n\to\infty$ and $\liminf_{n\to\infty}\Phi_i(y_{in},\lambda_i,c_n)\geq 0$ for all $i\in I$. From these inequalities it obviously follows that $\liminf_{n\to\infty}\Phi(y_n,\lambda,c_n)\geq 0$ as well.

Assumption $(A13)_s$. The proof is essentially the same as the proof for assumption (A13).

Assumption (A14). Fix any $\lambda\in\Lambda$ and a sequence $\{c_n\}\subset(0,+\infty)$ such that $c_n\to+\infty$ as $n\to\infty$. By our assumption for any $i\in I$ there exists a sequence $\{t_{in}\}\subset(0,+\infty)$ converging to zero and such that for any $\{y_{in}\}\subset Y_i$ with $\operatorname{dist}(y_{in},K_i)\leq t_{in}$ for all $n\in\mathbb{N}$ one has $\Phi_i(y_{in},\lambda_i,c_n)\to 0$ as $n\to\infty$.

Define $t_n=\min\{t_{1n},\ldots,t_{mn}\}$ and choose any sequence $\{y_n\}\subset Y$ such that $\operatorname{dist}(y_n,K)\leq t_n$ for all $n\in\mathbb{N}$. Then for any $i\in I$ and $n\in\mathbb{N}$ one has $\operatorname{dist}(y_{in},K_i)\leq t_{in}$, which implies that $\Phi_i(y_{in},\lambda_i,c_n)\to 0$ as $n\to\infty$ and, therefore, $\Phi(y_n,\lambda,c_n)\to 0$ as $n\to\infty$.

Assumption $(A14)_s$. The proof is the same as the proof for assumption (A14).

Assumption (A15). If $\{\lambda_n\}\subset\Lambda$ and $\{c_n\}$ are bounded sequences and a sequence $\{y_n\}\subset Y$ is such that $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$, then the sequences $\{\lambda_{in}\}$ are also bounded and $\operatorname{dist}(y_{in},K_i)\to 0$ as $n\to\infty$ for all $i\in I$. Consequently, one has

$$\limsup_{n\to\infty}\Phi_i(y_{in},\lambda_{in},c_n)\leq 0\quad\forall i\in I.$$

Applying these inequalities one can readily check that $\limsup_{n\to\infty}\Phi(y_n,\lambda_n,c_n)\leq 0$.

The proof of the claim of the theorem for the restricted versions of assumptions (A13)-(A15), $(A13)_s$, and $(A14)_s$ is essentially the same as the proofs for the non-restricted versions of these assumptions. ∎

Remark 3.

Let us note that assumptions (A1) and (A7) are satisfied for all particular augmented Lagrangians presented in this paper and all augmented Lagrangians known to the author.

Thus, the previous theorem allows one to analyse augmented Lagrangians for each particular type of cone constraints separately and then simply define an augmented Lagrangian for problems with several different types of cone constraints as in (1). Then the basic assumptions will be satisfied for this augmented Lagrangian by Theorem 1.

3 Examples of augmented Lagrangians

Let us present some particular examples of augmented Lagrangians for various types of cone constraints and discuss which of the basic assumptions are satisfied for these augmented Lagrangians. Our aim is to show that the basic assumptions are not restrictive and are satisfied for most augmented Lagrangians appearing in the literature.

Many examples of augmented Lagrangians were already presented in the author's previous paper [22]. However, many of the basic assumptions from this paper are completely new and were not used in [22] (e.g. assumptions (A9), (A10), and (A13)-(A15), as well as their stronger and restricted versions). Therefore, for the sake of completeness we will present detailed descriptions of all examples, even though they partially overlap with the contents of [22, Section 3].

3.1 Augmented Lagrangians for problems with general cone constraints

First, we consider an augmented Lagrangian that is defined for any cone constrained optimization problem of the form $(\mathcal{P})$. Since particular versions of this augmented Lagrangian are apparently studied and used in optimization methods more often than any other augmented Lagrangian, we will discuss it in more detail than the other examples.

Example 1 (Rockafellar-Wets’ augmented Lagrangian).

Let $\sigma\colon Y\to[0,+\infty]$ be such that $\sigma(0)=0$ and $\sigma(y)\neq 0$ for any $y\neq 0$. In particular, one can set $\sigma(y)=\|y\|$ or $\sigma(y)=0.5\|y\|^2$. Define

$$\Phi(y,\lambda,c)=\inf_{p\in K-y}\big(-\langle\lambda,p\rangle+c\sigma(p)\big).\qquad(2)$$

The augmented Lagrangian with the term $\Phi$ defined in this way was first introduced by Rockafellar and Wets [58, Section 11.K] (see also [31, 62, 32, 21]). Various generalizations of this augmented Lagrangian were studied in [9, 72, 70].
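As a quick illustration of definition (2), consider equality constraints, i.e. $K=\{0\}$ (see Section 3.2 below). In this case $K-y=\{-y\}$, so the infimum in (2) is attained at the single point $p=-y$ and

$$\Phi(y,\lambda,c)=\langle\lambda,y\rangle+c\sigma(-y),$$

which for $\sigma(y)=0.5\|y\|^2$ is precisely the Hestenes-Powell term from Example 2 below.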

Lemma 2.

Let $\Lambda=Y^*$. Then the function $\Phi$ defined in (2) satisfies:

  1.

    assumptions (A1)-(A4), (A7), (A9), $(A9)_s$, and (A10) in the general case;

  2.

    assumptions (A5) and (A6), if $\sigma(ty)=o(t)$ as $t\to 0$ (that is, $\sigma(ty)/t\to 0$ as $t\to 0$) for any $y\in Y$;

  3.

    assumption (A8), if the function $\sigma$ is convex;

  4.

    assumption (A11) with $\Phi_0(\lambda)\equiv\lambda$, if $\sigma(y)=0.5\|y\|^2$ and $Y$ is a Hilbert space;

  5.

    assumptions (A12) and $(A12)_s$, if $\sigma$ has a valley at zero, that is, for any neighbourhood $U$ of zero in $Y$ there exists $\delta>0$ such that $\sigma(y)\geq\delta$ for all $y\in Y\setminus U$;

  6.

    assumptions (A13) and $(A13)_s$, if $\sigma(y)\geq\omega(\|y\|)$ for all $y\in Y$ and some continuous function $\omega\colon[0,+\infty)\to[0,+\infty)$ such that $\omega(t)=0$ if and only if $t=0$, and $\liminf_{t\to+\infty}\omega(t)/t>0$;

  7.

    assumptions (A14) and $(A14)_s$, if $\sigma$ is continuous at zero and there exists a continuous function $\omega\colon[0,+\infty)\to[0,+\infty)$ such that $\sigma(y)\geq\omega(\|y\|)$ for all $y\in Y$, $\omega(t)=0$ if and only if $t=0$, and $\liminf_{t\to+\infty}\omega(t)/t>0$;

  8.

    assumption (A15), if the function $\sigma$ is continuous at zero.

Proof.

We divide the proof of the lemma into several parts corresponding to its separate statements.

Part 1. Assumptions (A1) (set $p=0$), (A2) (set $\lambda=0$), and (A7) are obviously satisfied. Assumption (A3) is satisfied with $\lambda\in Y^*$ obtained by applying the separation theorem to the sets $\{y\}$ and $K$. Assumption (A4) is satisfied, since if $\langle\lambda,y\rangle=0$ and $\lambda\in K^*$, then

$$\Phi(y,\lambda,c)=\inf_{p\in K}\big(-\langle\lambda,p\rangle+c\sigma(p-y)\big)\geq c\inf_{p\in K}\sigma(p-y)\geq 0,$$

which along with assumption (A1) implies that $\Phi(y,\lambda,c)=0$. Assumptions (A9), $(A9)_s$, and (A10) are satisfied by virtue of the fact that the function $(\lambda,c)\mapsto\Phi(y,\lambda,c)$ is the infimum of a family of linear functions.

Part 2. Let $y\in K$ and $\lambda\in K^*$ be such that $\langle\lambda,y\rangle\neq 0$. For any $t$ in a sufficiently small neighbourhood of zero one has $z(t)=(1+t\operatorname{sign}(\langle\lambda,y\rangle))y\in K$, which implies that for $p=z(t)-y\in K-y$ the following inequalities hold true:

$$\Phi(y,\lambda,c)\leq-\langle\lambda,p\rangle+c\sigma(p)=-t\big|\langle\lambda,y\rangle\big|+c\sigma\big(t\operatorname{sign}(\langle\lambda,y\rangle)y\big).$$

The last expression is negative for any sufficiently small $t>0$, since $\sigma(ty)=o(t)$, that is, assumption (A5) holds true.

Let now $y\in K$ and $\lambda\in\Lambda\setminus K^*$. For any such $\lambda$ one can find $p_0\in K$ for which $\langle\lambda,p_0\rangle>0$. Then putting $p=tp_0$ for $t>0$ (note that $tp_0+y\in K$, since $y\in K$ and $K$ is a convex cone, which yields $p\in K-y$) one gets

$$\Phi(y,\lambda,c)\leq-t\langle\lambda,p_0\rangle+c\sigma(tp_0)<0$$

for any sufficiently small $t$, thanks to the fact that $\sigma(tp_0)=o(t)$.

Part 3. Fix any $\alpha\in(0,1)$ and $y_1,y_2\in Y$. Choose any $M_i>\Phi(y_i,\lambda,c)$, $i\in\{1,2\}$. By definition one can find $p_i\in K-y_i$ such that $M_i>-\langle\lambda,p_i\rangle+c\sigma(p_i)$, $i\in\{1,2\}$. Then for $p(\alpha)=\alpha p_1+(1-\alpha)p_2\in K-(\alpha y_1+(1-\alpha)y_2)$ one has

$$\Phi(\alpha y_1+(1-\alpha)y_2,\lambda,c)\leq-\langle\lambda,p(\alpha)\rangle+c\sigma(p(\alpha))\leq\alpha\big(-\langle\lambda,p_1\rangle+c\sigma(p_1)\big)+(1-\alpha)\big(-\langle\lambda,p_2\rangle+c\sigma(p_2)\big)<\alpha M_1+(1-\alpha)M_2.$$

Hence by [54, Theorem I.4.2] one can conclude that the function $\Phi(\cdot,\lambda,c)$ is convex. Let us now show that it is non-decreasing with respect to the binary relation induced by the cone $-K$.

Indeed, fix any $y_1,y_2\in Y$ such that $y_1\preceq y_2$, i.e. $y_2-y_1\in-K$. One can obviously suppose that $\Phi(y_2,\lambda,c)<+\infty$. By definition for any $M>\Phi(y_2,\lambda,c)$ one can find $p\in K-y_2$ such that $M\geq-\langle\lambda,p\rangle+c\sigma(p)$. Let $z\in K$ be such that $p=z-y_2$. Note that $z-(y_2-y_1)\in K$ due to the fact that $-(y_2-y_1)\in K$ by our assumption. Then $p=z-(y_2-y_1)-y_1\in K-y_1$, which yields

$$\Phi(y_1,\lambda,c)\leq-\langle\lambda,p\rangle+c\sigma(p)\leq M.$$

Since $M>\Phi(y_2,\lambda,c)$ was chosen arbitrarily, one can conclude that the function $\Phi(\cdot,\lambda,c)$ is non-decreasing with respect to the binary relation $\preceq$.

Part 4. The proof of this statement of the lemma can be found in [62].

Part 5. Fix any $c_0>0$. For any $y\in Y$ and $\lambda\in\Lambda$ such that $\Phi(y,\lambda,c_0)$ is finite one has

$$\Phi(y,\lambda,c)=\inf_{p\in K-y}\big(-\langle\lambda,p\rangle+(c-c_0+c_0)\sigma(p)\big)\geq\inf_{p\in K-y}\big(-\langle\lambda,p\rangle+c_0\sigma(p)\big)+(c-c_0)\inf_{p\in K-y}\sigma(p)$$

for any $c\geq c_0$. If $\operatorname{dist}(y,K)\geq r$, then $\|p\|\geq r$ for any $p\in K-y$. Therefore by our assumption there exists $\delta>0$, independent of $\lambda\in\Lambda$, $c\geq c_0$, and $y\in\{z\in Y\mid\operatorname{dist}(z,K)\geq r\}$, such that

$$\Phi(y,\lambda,c)-\Phi(y,\lambda,c_0)\geq(c-c_0)\delta.$$

With the use of this inequality one can easily prove that assumptions (A12) and $(A12)_s$ hold true.

Part 6. Let $\{c_n\}\subset(0,+\infty)$ be an increasing unbounded sequence. Fix any bounded set $\Lambda_0\subset\Lambda$. Then for any $y\in Y$ and $\lambda\in\Lambda_0$ one has

$$\Phi(y,\lambda,c_n)\geq\inf_{p\in K-y}\big(-\|\lambda\|\|p\|+c_n\omega(\|p\|)\big)\geq\inf_{t\geq 0}\big(-Rt+c_n\omega(t)\big),$$

where $R>0$ is such that $\|\lambda\|\leq R$ for all $\lambda\in\Lambda_0$. By applying the assumptions on the function $\omega$ one can readily check that

$$\liminf_{n\to\infty}\inf_{t\geq 0}\big(-Rt+c_n\omega(t)\big)\geq 0,$$

which yields

$$\liminf_{n\to\infty}\inf_{\lambda\in\Lambda_0}\inf_{y\in Y}\Phi(y,\lambda,c_n)\geq 0.\qquad(3)$$

Consequently, assumptions (A13) and $(A13)_s$ hold true.

Part 7. Let $\{\lambda_n\}\subset Y^*$ be a bounded sequence and a sequence $\{c_n\}\subset(0,+\infty)$ be such that $c_n\to+\infty$ as $n\to\infty$. Due to the continuity of $\sigma$ at zero, for any $n\in\mathbb{N}$ one can find $\delta_n>0$ such that for all $p\in B(0,\delta_n)$ one has $\sigma(p)\leq 1/(c_nn)$. One can obviously suppose that $\delta_n\to 0$ as $n\to\infty$.

Define $t_n=\delta_n/2$ for all $n\in\mathbb{N}$. Then for any sequence $\{y_n\}\subset Y$ such that $\operatorname{dist}(y_n,K)\leq t_n$ for all $n\in\mathbb{N}$ one can find a sequence $\{z_n\}\subset K$ such that $\|z_n-y_n\|\leq\delta_n$ for all $n\in\mathbb{N}$. Observe that for $p_n=z_n-y_n\in K-y_n$ one has

$$\Phi(y_n,\lambda_n,c_n)\leq-\langle\lambda_n,p_n\rangle+c_n\sigma(p_n)\quad\forall n\in\mathbb{N},\qquad(4)$$

and the right-hand side of this inequality converges to zero, since $p_n\to 0$ as $n\to\infty$ and the sequence $\{\lambda_n\}$ is bounded. Combining this fact with inequality (3) one obtains that we have found a sequence $\{t_n\}$ for which assumptions (A14) and $(A14)_s$ hold true.

Part 8. Let $\{\lambda_n\}\subset Y^*$ and $\{c_n\}\subset(0,+\infty)$ be bounded sequences, and a sequence $\{y_n\}\subset Y$ be such that $\operatorname{dist}(y_n,K)\to 0$ as $n\to\infty$. Then thanks to the continuity of $\sigma$ at zero one can find a sequence $\{z_n\}\subset K$ such that $\|p_n\|\to 0$ and $\sigma(p_n)\to 0$ as $n\to\infty$, where $p_n=z_n-y_n$. Now, applying inequality (4) and taking into account the fact that the right-hand side of this inequality obviously converges to zero, one can conclude that assumption (A15) is valid. ∎

Thus, all basic assumptions are satisfied for $\sigma(y)=0.5\|y\|^2$. In the other important case of the sharp Lagrangian [27, 8, 10, 1, 16], that is, the case when $\sigma(y)=\|y\|$, all assumptions except for (A5), (A6), and (A11) hold true. It should be noted that these three assumptions are not used in the main results presented in this article and are needed only to strengthen some of these results in the convex case.

Remark 4.

Note that neither assumption (A5) nor assumption (A6) is satisfied for $\sigma(y)=\|y\|$, due to the fact that in this case $\Phi(y,\lambda,c)\geq 0$ for all $y\in Y$ whenever $c>\|\lambda\|$. Thus, if $\sigma(ty)\neq o(t)$ for some $y\in Y$, then assumptions (A5) and (A6) might not hold true.

3.2 Augmented Lagrangians for problems with equality constraints

Let us now consider the following equality constrained problem:

$$\min\,f(x)\quad\text{subject to}\quad G(x)=0,\quad x\in Q.$$

The constraint $G(x)=0$ can obviously be rewritten as the cone constraint $G(x)\in K$, if one puts $K=\{0\}$. The binary relation $\preceq$ in this case coincides with the equality relation "$=$", and all functions are non-decreasing with respect to this relation.

Example 2 (Hestenes-Powell’s augmented Lagrangian).

Define $\Lambda=Y^*$ and

$$\Phi(y,\lambda,c)=\langle\lambda,y\rangle+\frac{c}{2}\|y\|^2.$$

Then the corresponding augmented Lagrangian is a particular case of the augmented Lagrangian from Example 1 with $\sigma(y)=0.5\|y\|^2$. Therefore, this function $\Phi$ satisfies all basic assumptions, except for assumption (A11), in the general case, and it satisfies assumption (A11) with $\Phi_0(\lambda)\equiv\lambda$, if $Y$ is a Hilbert space. Note that in the case $Y=\mathbb{R}^m$ the corresponding augmented Lagrangian $\mathscr{L}(\cdot)$ coincides with the Hestenes-Powell augmented Lagrangian [29, 52, 5].

Example 3 (sharp Lagrangian).

Define $\Lambda=Y^*$ and

$$\Phi(y,\lambda,c)=\langle\lambda,y\rangle+c\|y\|.$$

Then the corresponding augmented Lagrangian is a particular case of the augmented Lagrangian from Example 1 with $\sigma(y)=\|y\|$. Therefore, this function $\Phi$ satisfies all basic assumptions, except for assumptions (A5), (A6), and (A11).

In the case of equality constrained problems of the form

$$\min\,f(x)\quad\text{subject to}\quad g_i(x)=0,\quad i\in I,\quad x\in Q,$$

where $I=\{1,\ldots,m\}$ and $g_i\colon X\to\mathbb{R}$ are given functions, one can define a more general class of augmented Lagrangians. This problem can be written as the problem $(\mathcal{P})$ with $Y=\mathbb{R}^m$, $G(\cdot)=(g_1(\cdot),\ldots,g_m(\cdot))$, and $K=\{0\}$. Note that in this particular case the dual space $Y^*$ can be identified with $\mathbb{R}^m$.

Example 4 (Mangasarian’s augmented Lagrangian).

Let $\phi\colon\mathbb{R}\to\mathbb{R}$ be a twice differentiable strictly convex function such that $\phi(0)=\phi'(0)=0$ and $\phi'(\cdot)$ is surjective. Define

$$\Phi(y,\lambda,c)=\sum_{i=1}^m\frac{1}{c}\big(\phi(cy_i+\lambda_i)-\phi(\lambda_i)\big)$$

for all $y=(y_1,\ldots,y_m)\in Y$ and $\lambda=(\lambda_1,\ldots,\lambda_m)\in\Lambda$. Then the corresponding augmented Lagrangian $\mathscr{L}(\cdot)$ coincides with Mangasarian's augmented Lagrangian from [46] (see also [76]). In the case $\phi(t)=0.5t^2$, this augmented Lagrangian coincides with the Hestenes-Powell augmented Lagrangian. One can also put, e.g. $\phi(t)=|t|t^{2n}/(2n+1)$ or $\phi(t)=t^{2n}/(2n)$ for any $n\in\{1,2,\ldots\}$.

Let $\Lambda=Y^*\cong\mathbb{R}^m$. Then one can readily check that all assumptions, except for assumptions (A9) and $(A9)_s$, are satisfied in the general case (assumption (A11) holds true with $\Phi_0(\lambda)\equiv(\phi'(\lambda_1),\ldots,\phi'(\lambda_m))$). Assumptions (A9) and $(A9)_s$ are satisfied if and only if $\phi(t)=at^2$ for some $a>0$. The validity of these assumptions in the case when the function $\phi$ is quadratic can be readily verified directly. Let us prove the converse statement.

Suppose that the function $\lambda\mapsto\Phi(y,\lambda,c)$ is concave. Then applying the second order derivative test for concavity one gets that $\phi''(\lambda_i)\geq\phi''(cy_i+\lambda_i)$ for all $\lambda_i,y_i\in\mathbb{R}$ and $c>0$ or, equivalently, that $\phi''(\cdot)$ is a constant function. Hence bearing in mind the conditions $\phi(0)=\phi'(0)=0$ one gets that $\phi(t)=at^2$ for some $a>0$.

3.3 Augmented Lagrangians for problems with inequality constraints

Next we will present several examples of augmented Lagrangians for the inequality constrained problem

$$\min\,f(x)\quad\text{subject to}\quad g_i(x)\leq 0,\quad i\in I,\quad x\in Q,\qquad(5)$$

where $I=\{1,\ldots,m\}$ and $g_i\colon X\to\mathbb{R}$ are given functions. This problem can be written as the problem $(\mathcal{P})$ with $Y=\mathbb{R}^m$, $G(\cdot)=(g_1(\cdot),\ldots,g_m(\cdot))$, and $K=\mathbb{R}_-^m$, where $\mathbb{R}_-=(-\infty,0]$. The dual space $Y^*$ can be identified with $\mathbb{R}^m$, while $K^*$ can be identified with $\mathbb{R}^m_+$, where $\mathbb{R}_+=[0,+\infty)$. The binary relation $\preceq$ in this case is the coordinate-wise partial order.

All particular augmented Lagrangians for problem (5) used in optimization methods and known to the author are separable (except for nonlinear Lagrangians; see [60, 17, 72]), that is, the corresponding function $\Phi(y,\lambda,c)$ has the form

$$\Phi(y,\lambda,c)=\sum_{i=1}^m\Phi_i(y_i,\lambda_i,c)\quad\forall y=(y_1,\ldots,y_m),\ \lambda=(\lambda_1,\ldots,\lambda_m)\qquad(6)$$

for some functions $\Phi_i\colon\mathbb{R}^2\times(0,+\infty)\to\mathbb{R}\cup\{\pm\infty\}$. Although one can choose different functions $\Phi_i$ for different $i\in I$ (that is, for different inequality constraints), to the best of the author's knowledge, only the case when the $\Phi_i$ are the same for all $i\in I$ is considered in the vast majority of papers on augmented Lagrangians for inequality constrained problems.

Example 5 (essentially quadratic/Hestenes-Powell-Rockafellar’s augmented Lagrangian).

Let $\phi\colon\mathbb{R}\to\mathbb{R}$ be a twice continuously differentiable strictly convex function such that $\phi(0)=\phi'(0)=0$, and the derivative $\phi'(\cdot)$ is surjective. Following Bertsekas [2, Section 5.1.2, Example 1], for any $y,\lambda\in\mathbb{R}$ define

$$P(y,\lambda)=\begin{cases}\lambda y+\phi(y),&\text{if }\lambda+\phi'(y)\geq 0,\\ \min_{t\in\mathbb{R}}\big(\lambda t+\phi(t)\big),&\text{otherwise}\end{cases}$$

(note that the minimum is finite and attained at any $t$ such that $\lambda+\phi'(t)=0$, which exists due to the surjectivity of $\phi'(\cdot)$) and put

$$\Phi_i(y_i,\lambda_i,c)=\frac{1}{c}P(cy_i,\lambda_i)\quad\forall i\in I.$$

The corresponding augmented Lagrangian $\mathscr{L}(\cdot)$ is called the essentially quadratic augmented Lagrangian for problem (5) (see [68, 39, 71]). In the case $\phi(t)=t^2/2$ one has

$$\Phi_i(y_i,\lambda_i,c)=\lambda_i\max\Big\{y_i,-\frac{\lambda_i}{c}\Big\}+\frac{c}{2}\max\Big\{y_i,-\frac{\lambda_i}{c}\Big\}^2,$$

and $\mathscr{L}(\cdot)$ is the well-known Hestenes-Powell-Rockafellar augmented Lagrangian [29, 52, 55, 56, 57, 5], which is a particular case of the augmented Lagrangian from Example 1 with $\sigma(y)=\|y\|^2/2$ and $\|\cdot\|$ being the Euclidean norm.
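The stated identification with Example 1 can be spot-checked numerically. The sketch below is an illustration only (finite dimensions, $K=\mathbb{R}^m_-$, Euclidean norm, $\sigma(p)=0.5\|p\|^2$; the names Phi_hpr and Phi_rw are hypothetical): it evaluates the infimum in (2) coordinate-wise, using the fact that the minimizer of $p_i\mapsto-\lambda_ip_i+\tfrac{c}{2}p_i^2$ over $p_i\leq-y_i$ is the projection of the unconstrained minimizer $\lambda_i/c$, and compares the result with the closed-form expression above.

    import numpy as np

    def Phi_hpr(y, lam, c):
        # Closed-form Hestenes-Powell-Rockafellar term (phi(t) = t^2/2).
        t = np.maximum(y, -lam / c)
        return np.sum(lam * t + 0.5 * c * t**2)

    def Phi_rw(y, lam, c):
        # Direct evaluation of (2) for K = R^m_-, sigma(p) = 0.5*||p||^2:
        # inf over p <= -y of (-<lam, p> + 0.5*c*||p||^2), coordinate-wise.
        p = np.minimum(lam / c, -y)  # projection of the unconstrained minimizer
        return np.sum(-lam * p + 0.5 * c * p**2)

    rng = np.random.default_rng(1)
    for _ in range(1000):
        y, lam = rng.standard_normal(3), rng.standard_normal(3)
        c = 0.1 + 10.0 * rng.random()
        assert np.isclose(Phi_hpr(y, lam, c), Phi_rw(y, lam, c))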

Let $\Lambda=Y^*=\mathbb{R}^m$. Then one can readily verify that all basic assumptions, except for assumption $(A9)_s$, hold true in the general case (assumption (A11) is satisfied with $\Phi_0(\lambda)\equiv\lambda$). Assumption $(A9)_s$ is satisfied for $\phi(t)=at^2$, $a>0$.

Example 6 (cubic augmented Lagrangian).

Let

$$\Phi_i(y_i,\lambda_i,c)=\frac{1}{3c}\Big(\max\big\{\operatorname{sign}(\lambda_i)\sqrt{|\lambda_i|}+cy_i,0\big\}^3-|\lambda_i|^{3/2}\Big)\quad\forall i\in I.$$

Then $\mathscr{L}(\cdot)$ coincides with the cubic augmented Lagrangian [36]. One can easily check that all basic assumptions, except for assumptions (A9) and $(A9)_s$, are satisfied in this case with $\Lambda=Y^*=\mathbb{R}^m$ (assumption (A11) is satisfied with $\Phi_0(\lambda)\equiv\lambda$). Assumption (A9) holds true, provided $\Lambda\subseteq K^*$, while assumption $(A9)_s$ is not satisfied for any choice of $\Lambda$.

Example 7 (Mangasarian’s augmented Lagrangian).

Let $\phi\colon\mathbb{R}\to\mathbb{R}$ be a twice continuously differentiable strictly convex function such that $\phi(0)=\phi'(0)=0$ and the function $\phi'(\cdot)$ is surjective. Define

$$\Phi_i(y_i,\lambda_i,c)=\frac{1}{c}\Big(\phi\big(\max\{cy_i+\lambda_i,0\}\big)-\phi(\lambda_i)\Big)\quad\forall i\in I.\qquad(7)$$

Then $\mathscr{L}(\cdot)$ coincides with the augmented Lagrangian introduced by Mangasarian [46] and studied, e.g. in [76]. Let $\Lambda=Y^*=\mathbb{R}^m$. Then all basic assumptions, except for assumptions (A9) and $(A9)_s$, hold true (assumption (A11) is satisfied with $\Phi_0(\lambda)=(\phi'(\max\{\lambda_1,0\}),\ldots,\phi'(\max\{\lambda_m,0\}))$). Assumptions (A9) and $(A9)_s$ are satisfied for $\phi(t)=at^2$ with $a>0$.

Example 8 (exponential-type augmented Lagrangian).

Let $\phi\colon\mathbb{R}\to\mathbb{R}$ be a twice differentiable strictly increasing function such that $\phi(0)=0$. Define

$$\Phi_i(y_i,\lambda_i,c)=\frac{\lambda_i}{c}\phi(cy_i)\quad\forall i\in I.$$

If $\phi(t)=e^t-1$, then $\mathscr{L}(\cdot)$ coincides with the exponential penalty function [2, 69, 68, 39, 71]. In turn, if $\phi(t)=2(\ln(e^t+1)-\ln 2)$, then $\mathscr{L}(\cdot)$ is Polyak's log-sigmoid Lagrangian [50, 51]. In the general case we call the corresponding function $\mathscr{L}(\cdot)$ the exponential-type augmented Lagrangian.

Let $\Lambda=K^*=\mathbb{R}^m_+$. Then assumptions (A1)-(A6), (A9), (A10), and (A15) are satisfied in the general case. Assumptions (A7) and (A8) hold true, provided the function $\phi$ is convex. Assumption (A11) is satisfied with $\Phi_0(\lambda)\equiv\phi'(0)\lambda$ if and only if $\phi'(0)\neq 0$. Restricted versions of assumptions (A13), $(A13)_s$, (A14), and $(A14)_s$ (see Remark 1) are satisfied if and only if $\phi(t)/t\to 0$ as $t\to-\infty$, while non-restricted versions of these assumptions are satisfied if and only if the function $\phi$ is bounded below. Finally, assumptions $(A9)_s$, (A12), and $(A12)_s$ (put $\lambda=0$) are never satisfied for the exponential-type augmented Lagrangian.

Thus, all basic assumptions, except for assumptions $(A9)_s$, (A12), and $(A12)_s$, are valid for the exponential penalty function and the log-sigmoid Lagrangian.
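For concreteness, here is a small sketch of the two named instances; the function names are illustrative, and the random test merely spot-checks assumption (A1) (for $y\in K=\mathbb{R}^m_-$ and $\lambda\in\Lambda=K^*=\mathbb{R}^m_+$ both choices of $\phi$ are non-positive on the feasible side, so $\Phi\leq 0$).

    import numpy as np

    phi_exp = lambda t: np.exp(t) - 1.0                           # exponential penalty
    phi_ls = lambda t: 2.0 * (np.log1p(np.exp(t)) - np.log(2.0))  # log-sigmoid

    def Phi_exp_type(y, lam, c, phi):
        # Exponential-type term: sum_i (lam_i / c) * phi(c * y_i).
        return np.sum(lam / c * phi(c * y))

    rng = np.random.default_rng(2)
    for _ in range(1000):
        y = -rng.random(3)       # feasible: y in K = R^3_-
        lam = rng.random(3)      # lam in Lambda = K* = R^3_+
        c = 0.1 + 10.0 * rng.random()
        for phi in (phi_exp, phi_ls):
            assert Phi_exp_type(y, lam, c, phi) <= 1e-12   # (A1) on K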

Example 9 (penalized exponential-type augmented Lagrangian).

Suppose that $\phi\colon\mathbb{R}\to\mathbb{R}$ is a twice differentiable strictly increasing function such that $\phi(0)=0$, and $\xi\colon\mathbb{R}\to\mathbb{R}$ is a twice continuously differentiable non-decreasing function such that $\xi(t)=0$ for all $t\leq 0$ and $\xi(t)>0$ for all $t>0$ (for example, one can set $\xi(t)=\max\{0,t\}^3$). Following Bertsekas [2, Section 5.1.2, Example 2] define

$$\Phi_i(y_i,\lambda_i,c)=\frac{\lambda_i}{c}\phi(cy_i)+\frac{1}{c}\xi(cy_i)\quad\forall i\in I.$$

Then the function $\mathscr{L}(\cdot)$ is called the penalized exponential-type augmented Lagrangian [68, 39, 71], since it is obtained from the augmented Lagrangian from the previous example by adding the penalty term $\xi(cy_i)/c$.

Let \Lambda=K^{*}=\mathbb{R}^{m}_{+}. Then assumptions (A1)–(A6), (A9), (A10), and (A15) are satisfied in the general case. Assumptions (A7) and (A8) hold true, provided the functions \phi and \xi are convex. Assumption (A11) is satisfied with \Phi_{0}(\lambda)\equiv\phi^{\prime}(0)\lambda if and only if \phi^{\prime}(0)\neq 0. Assumptions (A12) and (A12)_{s} are valid, provided \xi(t)/t\to+\infty as t\to\infty and \phi is either bounded below or convex. Restricted versions of assumptions (A13), (A13)_{s}, (A14), and (A14)_{s} hold true if and only if \phi(t)/t\to 0 as t\to-\infty, while non-restricted versions of these assumptions hold true if and only if \phi is bounded below.

Thus, if ϕ(t)=et1\phi(t)=e^{t}-1 or ϕ(t)=2(ln(et+1)ln2)\phi(t)=2(\ln(e^{t}+1)-\ln 2) and ξ(t)=max{0,t}3\xi(t)=\max\{0,t\}^{3}, then all basic assumptions, except for assumption (A9)s(A9)_{s}, hold true.
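
A minimal sketch of this penalized term for the choices \phi(t)=e^{t}-1 and \xi(t)=\max\{0,t\}^{3} (illustrative code, not part of the theory) might look as follows:

```python
import numpy as np

# Sketch of the penalized exponential-type term with phi(t) = exp(t) - 1
# and xi(t) = max{0, t}**3, the choices for which all basic assumptions
# except (A9)_s hold.
def Phi_i(y_i, lam_i, c):
    phi = np.exp(c * y_i) - 1.0
    xi = max(0.0, c * y_i) ** 3
    return lam_i / c * phi + xi / c

# The penalty term only activates at infeasible points (y_i > 0):
print(Phi_i(-1.0, 1.0, 2.0), Phi_i(1.0, 1.0, 2.0))
```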

Example 10 (p-th power augmented Lagrangian).

Let b0b\geq 0 and a continuous non-decreasing function ϕ:+\phi\colon\mathbb{R}\to\mathbb{R}_{+} be such that ϕ(t)>ϕ(b)>0\phi(t)>\phi(b)>0 for all t>bt>b. For example, one can set ϕ(t)=et\phi(t)=e^{t} with b0b\geq 0 or ϕ(t)=max{0,t}\phi(t)=\max\{0,t\} with b>0b>0. By our assumption the inequality gi(x)0g_{i}(x)\leq 0 is satisfied if and only if ϕ(gi(x)+b)/ϕ(b)1\phi(g_{i}(x)+b)/\phi(b)\leq 1. Furthermore, ϕ(gi(x)+b)0\phi(g_{i}(x)+b)\geq 0 for all xXx\in X. Define

Φi(yi,λi,c)=λic+1((ϕ(yi+b)ϕ(b))c+11)iI.\Phi_{i}(y_{i},\lambda_{i},c)=\frac{\lambda_{i}}{c+1}\left(\left(\frac{\phi(y_{i}+b)}{\phi(b)}\right)^{c+1}-1\right)\quad\forall i\in I.

Then ()\mathscr{L}(\cdot) coincides with the p-th power augmented Lagrangian [37, 75, 39].

Let Λ=K=+m\Lambda=K^{*}=\mathbb{R}^{m}_{+}. Then assumptions (A1)(A1)(A7)(A7), (A9)(A9), (A10)(A10), (A13)(A13)(A15)(A15), (A13)s(A13)_{s}, and (A14)s(A14)_{s} hold true. Assumption (A8)(A8) is satisfied, if the function ϕ\phi is convex. Assumption (A11)(A11) is satisfied with Φ0(λ)ϕ(b)λ\Phi_{0}(\lambda)\equiv\phi^{\prime}(b)\lambda, provided ϕ\phi is differentiable and ϕ(b)0\phi^{\prime}(b)\neq 0. Finally, assumptions (A9)s(A9)_{s}, (A12)(A12), and (A12)s(A12)_{s} are not satisfied for the p-th power augmented Lagrangian.
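
As an illustration, the following sketch (names are ours) evaluates the p-th power term for the choice \phi(t)=e^{t} and b=0:

```python
import numpy as np

# Sketch of the p-th power term for the choice phi(t) = exp(t), b = 0,
# so that phi(y + b)/phi(b) = exp(y).
def Phi_i(y_i, lam_i, c, b=0.0):
    ratio = np.exp(y_i + b) / np.exp(b)
    return lam_i / (c + 1.0) * (ratio ** (c + 1.0) - 1.0)

# Feasible points (y_i <= 0) give a nonpositive term for lam_i >= 0:
print(Phi_i(-0.5, 1.0, 3.0), Phi_i(0.5, 1.0, 3.0))
```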

Remark 5.

Let ϕ\phi be as in the previous example and ξ\xi be as in Example 9. Then by analogy with the penalized exponential-type augmented Lagrangian one can define the penalized p-th power augmented Lagrangian as follows:

Φi(yi,λi,c)=λic+1((ϕ(yi+b)ϕ(b))c+11)+1cξ(cyi)iI.\Phi_{i}(y_{i},\lambda_{i},c)=\frac{\lambda_{i}}{c+1}\left(\left(\frac{\phi(y_{i}+b)}{\phi(b)}\right)^{c+1}-1\right)+\frac{1}{c}\xi(cy_{i})\quad\forall i\in I.

If the function \phi is convex and differentiable, \phi^{\prime}(b)\neq 0, \xi is convex, and \xi(t)/t\to+\infty as t\to\infty, then one can verify that the penalized p-th power augmented Lagrangian satisfies all basic assumptions, except for assumption (A9)_{s}. Let us also mention that this trick of adding the penalty term \xi(cy_{i})/c can be applied to any other augmented Lagrangian for inequality constrained problems that does not satisfy assumptions (A12) and (A12)_{s}, in order to construct a penalized version of this augmented Lagrangian that satisfies assumptions (A12) and (A12)_{s} and retains all other properties of the non-penalized version.

Example 11 (hyperbolic-type augmented Lagrangian).

Let ϕ:\phi\colon\mathbb{R}\to\mathbb{R} be a twice differentiable strictly increasing convex function such that ϕ(0)=0\phi(0)=0. Define

Φi(yi,λi,c)=1cϕ(cλiyi)iI.\Phi_{i}(y_{i},\lambda_{i},c)=\frac{1}{c}\phi(c\lambda_{i}y_{i})\quad\forall i\in I.

If \phi(t)=t+\sqrt{t^{2}+1}-1, then \mathscr{L}(\cdot) coincides with the hyperbolic augmented Lagrangian [78, 53]. In the general case we call such a function \mathscr{L}(\cdot) the hyperbolic-type augmented Lagrangian.

Let \Lambda=K^{*}=\mathbb{R}^{m}_{+}. Then assumptions (A1)–(A8), (A10), (A11), and (A15) are satisfied in the general case (assumption (A11) is satisfied with \Phi_{0}(\lambda)\equiv\phi^{\prime}(0)\lambda). Assumption (A9) is satisfied if and only if \phi is a linear function. Restricted assumptions (A13), (A13)_{s}, (A14), and (A14)_{s} hold true if and only if \phi(t)/t\to 0 as t\to-\infty, while non-restricted versions of these assumptions hold true if and only if \phi is bounded below. Finally, assumptions (A9)_{s}, (A12), and (A12)_{s} are never satisfied for the hyperbolic-type augmented Lagrangian.
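
A minimal numerical sketch of the hyperbolic term with \phi(t)=t+\sqrt{t^{2}+1}-1 (illustrative code) is given below; note that \phi\geq-1, so the term is bounded below by -1/c, in line with the validity of the non-restricted versions of assumptions (A13)–(A14)_{s} for this choice of \phi:

```python
import numpy as np

# Sketch of the hyperbolic augmented Lagrangian term with
# phi(t) = t + sqrt(t**2 + 1) - 1.
def Phi_i(y_i, lam_i, c):
    t = c * lam_i * y_i
    return (t + np.sqrt(t * t + 1.0) - 1.0) / c

# phi is bounded below by -1, so Phi_i(y_i, lam_i, c) >= -1/c:
print(Phi_i(-100.0, 1.0, 2.0))  # close to -0.5
```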

Example 12 (modified barrier function).

Let ϕ:(,1)\phi\colon(-\infty,1)\to\mathbb{R} be a twice differentiable strictly increasing function such that ϕ(0)=0\phi(0)=0 and ϕ(t)+\phi(t)\to+\infty as t1t\to 1. Define

Φi(yi,λi,c)={λicϕ(cyi),if cyi<1,+,otherwiseiI.\Phi_{i}(y_{i},\lambda_{i},c)=\begin{cases}\frac{\lambda_{i}}{c}\phi(cy_{i}),&\text{if }cy_{i}<1,\\ +\infty,&\text{otherwise}\end{cases}\qquad\forall i\in I.

Then the augmented Lagrangian \mathscr{L}(\cdot) coincides with the modified barrier function introduced by R. Polyak [49]. In particular, in the case \phi(t)=-\ln(1-t) the augmented Lagrangian \mathscr{L}(\cdot) is the modified Frisch function, while in the case \phi(t)=1/(1-t)-1 it is the modified Carrol function [49] (see also [68, 39, 71]).

Let Λ=K=+m\Lambda=K^{*}=\mathbb{R}^{m}_{+}. Then assumptions (A1)(A1)(A6)(A6), (A9)(A9), (A10)(A10), (A12)(A12), (A12)s(A12)_{s}, and (A15)(A15) are satisfied in the general case. Assumptions (A7)(A7) and (A8)(A8) hold true, if the function ϕ\phi is convex. Assumption (A11)(A11) is satisfied with Φ0(λ)=ϕ(0)λ\Phi_{0}(\lambda)=\phi^{\prime}(0)\lambda if and only if ϕ(0)0\phi^{\prime}(0)\neq 0. Restricted assumptions (A13)(A13), (A13)s(A13)_{s}, (A14)(A14), and (A14)s(A14)_{s} hold true if and only if ϕ(t)/t0\phi(t)/t\to 0 as tt\to-\infty, while non-restricted versions of these assumptions are valid if and only if the function ϕ\phi is bounded below. Finally, assumption (A9)s(A9)_{s} cannot hold true for the modified barrier function.

Thus, the modified Carrol function satisfies all basic assumptions, except for assumption (A9)_{s}, while the modified Frisch function satisfies all assumptions, except for (A9)_{s} and the non-restricted versions of assumptions (A13), (A13)_{s}, (A14), and (A14)_{s}.
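
For illustration, a small sketch of the modified barrier term with the modified Frisch and Carrol choices of \phi (names are ours) may be written as follows:

```python
import numpy as np

# Sketch of the modified barrier term; phi_frisch and phi_carrol give the
# modified Frisch and Carrol functions, respectively.
def phi_frisch(t):
    return -np.log(1.0 - t)

def phi_carrol(t):
    return 1.0 / (1.0 - t) - 1.0

def Phi_i(y_i, lam_i, c, phi=phi_frisch):
    return lam_i / c * phi(c * y_i) if c * y_i < 1.0 else np.inf

# The barrier blows up as y_i approaches 1/c from below:
print(Phi_i(0.49, 1.0, 2.0), Phi_i(0.51, 1.0, 2.0))
```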

Example 13 (He-Wu-Meng’s augmented Lagrangian).

Let

Φi(yi,λi,c)=1c0cyi(t2+λi2+t)𝑑tiI.\Phi_{i}(y_{i},\lambda_{i},c)=\frac{1}{c}\int_{0}^{cy_{i}}\big{(}\sqrt{t^{2}+\lambda_{i}^{2}}+t\big{)}\,dt\quad\forall i\in I.

Then ()\mathscr{L}(\cdot) coincides with the augmented Lagrangian introduced by He, Wu, and Meng [28]. Let us note that

Φi(yi,λi,c)=yi2(cyi)2+λi2+cyi22+λi22cln((cyi)2+λi2+cyi)λi22cln|λi|,\Phi_{i}(y_{i},\lambda_{i},c)=\frac{y_{i}}{2}\sqrt{(cy_{i})^{2}+\lambda_{i}^{2}}+\frac{cy_{i}^{2}}{2}+\frac{\lambda_{i}^{2}}{2c}\ln\big{(}\sqrt{(cy_{i})^{2}+\lambda_{i}^{2}}+cy_{i}\big{)}-\frac{\lambda_{i}^{2}}{2c}\ln|\lambda_{i}|,

if λi0\lambda_{i}\neq 0, and Φi(yi,0,c)=cyi(yi+|yi|)/2\Phi_{i}(y_{i},0,c)=cy_{i}(y_{i}+|y_{i}|)/2.

Let \Lambda=Y^{*}=\mathbb{R}^{m}. Then assumptions (A1)–(A5), (A7), (A8), (A10), (A11) with \Phi_{0}(\lambda)\equiv\lambda, (A12), (A12)_{s}, (A15), and the restricted versions of assumptions (A13), (A13)_{s}, (A14), and (A14)_{s} are satisfied in the general case. Assumption (A6) holds true if and only if \Lambda\subseteq K^{*}, since \Phi(0,\lambda,c)=0 for all \lambda\in Y^{*}. Finally, assumptions (A9) and (A9)_{s} and the non-restricted versions of assumptions (A13), (A13)_{s}, (A14), and (A14)_{s} are not satisfied for He-Wu-Meng’s augmented Lagrangian. The non-restricted versions of the last four assumptions fail due to the fact that \Phi_{i}(y,\lambda,c)\to-\infty as y\to-\infty.
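
Since the closed-form expression above is easy to misread, the following small sketch (illustrative, assuming \lambda_{i}\neq 0 and that SciPy is available) checks it against the defining integral using numerical quadrature:

```python
import numpy as np
from scipy.integrate import quad

# Sketch checking the closed-form expression above against the defining
# integral (the branch lam != 0 is assumed).
def Phi_integral(y, lam, c):
    val, _ = quad(lambda t: np.sqrt(t * t + lam * lam) + t, 0.0, c * y)
    return val / c

def Phi_closed(y, lam, c):
    r = np.sqrt((c * y) ** 2 + lam ** 2)
    return (y / 2.0 * r + c * y ** 2 / 2.0
            + lam ** 2 / (2.0 * c) * (np.log(r + c * y) - np.log(abs(lam))))

# The two values should agree up to quadrature error:
print(Phi_integral(0.7, -1.3, 2.0), Phi_closed(0.7, -1.3, 2.0))
```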

Remark 6.

Many more particular examples of augmented Lagrangians for inequality constrained optimization problems can be found in [3].

3.4 Augmented Lagrangians for problems with second order cone constraints

Let us now consider nonlinear second order cone programming problems:

minf(x)subject togi(x)𝒦i+1,iI,xQ,\min\>f(x)\quad\text{subject to}\quad g_{i}(x)\in\mathcal{K}_{\ell_{i}+1},\quad i\in I,\quad x\in Q, (8)

where gi:Xi+1g_{i}\colon X\to\mathbb{R}^{\ell_{i}+1}, iI:={1,,m}i\in I:=\{1,\ldots,m\}, are given functions,

𝒦i+1={y=(y0,y¯)×i|y0y¯}\mathcal{K}_{\ell_{i}+1}=\Big{\{}y=(y^{0},\overline{y})\in\mathbb{R}\times\mathbb{R}^{\ell_{i}}\Bigm{|}y^{0}\geq\|\overline{y}\|\Big{\}}

is the second order (Lorentz/ice cream) cone of dimension i+1\ell_{i}+1, and \|\cdot\| is the Euclidean norm.

Problem (8) can be rewritten as the problem (𝒫)(\mathcal{P}) with

Y=1+1××m+1,K=𝒦1+1××𝒦m+1,Y=\mathbb{R}^{\ell_{1}+1}\times\ldots\times\mathbb{R}^{\ell_{m}+1},\quad K=\mathcal{K}_{\ell_{1}+1}\times\ldots\times\mathcal{K}_{\ell_{m}+1},

and G()=(g1(),,gm())G(\cdot)=(g_{1}(\cdot),\ldots,g_{m}(\cdot)). Note that the dual space YY^{*} can be identified with YY, while the polar cone KK^{*} can be identified with (𝒦1+1)××(𝒦m+1)(-\mathcal{K}_{\ell_{1}+1})\times\ldots\times(-\mathcal{K}_{\ell_{m}+1}).

Example 14 (Hestenes-Powell-Rockafellar’s augmented Lagrangian).

For any y=(y1,,ym)Yy=(y_{1},\ldots,y_{m})\in Y, λ=(λ1,,λm)Y\lambda=(\lambda_{1},\ldots,\lambda_{m})\in Y^{*}, and c>0c>0 define

Φ(y,λ,c)=c2i=1m[dist2(yi+1cλi,𝒦i+1)1c2λi2].\Phi(y,\lambda,c)=\frac{c}{2}\sum_{i=1}^{m}\left[\operatorname{dist}^{2}\left(y_{i}+\frac{1}{c}\lambda_{i},\mathcal{K}_{\ell_{i}+1}\right)-\frac{1}{c^{2}}\|\lambda_{i}\|^{2}\right].

This function Φ\Phi is a particular case of the function Φ\Phi from Example 1 with σ(y)=(y12++ym2)/2\sigma(y)=(\|y_{1}\|^{2}+\ldots+\|y_{m}\|^{2})/2 (see [40, 41, 84]). Therefore it satisfies all basic assumptions with Λ=Y\Lambda=Y^{*} (assumption (A11)(A11) holds true with Φ0(λ)λ\Phi_{0}(\lambda)\equiv\lambda).
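
A minimal sketch of one second-order cone block of this term, based on the well-known closed-form projection onto \mathcal{K}_{\ell+1} (illustrative names), is given below:

```python
import numpy as np

# Sketch of one block of the term above, using the closed-form projection
# onto the second-order cone K_{l+1}.
def proj_soc(y):
    y0, ybar = y[0], y[1:]
    r = np.linalg.norm(ybar)
    if y0 >= r:                   # already inside the cone
        return y.copy()
    if y0 <= -r:                  # inside the polar cone: projects to 0
        return np.zeros_like(y)
    alpha = (y0 + r) / 2.0        # boundary case (here r > 0)
    return np.concatenate(([alpha], alpha * ybar / r))

def Phi_block(y_i, lam_i, c):
    z = y_i + lam_i / c
    d = np.linalg.norm(z - proj_soc(z))   # dist(y_i + lam_i/c, K)
    return c / 2.0 * (d ** 2 - np.dot(lam_i, lam_i) / c ** 2)

print(Phi_block(np.array([0.0, 1.0]), np.array([1.0, 0.0]), 2.0))
```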

To define another augmented Lagrangian for optimization problems with nonlinear second order cone constraints, recall (see [26, 64]) that in the context of such problems Löwner’s operator associated with a function ψ:\psi\colon\mathbb{R}\to\mathbb{R} is defined as follows:

\Psi(y)=\frac{1}{2}\begin{pmatrix}\psi(y^{0}+\|\overline{y}\|)+\psi(y^{0}-\|\overline{y}\|)\\ \big(\psi(y^{0}+\|\overline{y}\|)-\psi(y^{0}-\|\overline{y}\|)\big)\frac{\overline{y}}{\|\overline{y}\|}\end{pmatrix}

for any y=(y^{0},\overline{y})\in\mathbb{R}\times\mathbb{R}^{\ell} with \overline{y}\neq 0, and \Psi(y)=(\psi(y^{0}),0) for y=(y^{0},0). One can readily verify that if \psi(0)=0 and the function \psi is strictly increasing, then -\Psi(-y)\in\mathcal{K}_{\ell+1} for any y\in\mathcal{K}_{\ell+1}, while -\Psi(-y)\notin\mathcal{K}_{\ell+1} for any y\notin\mathcal{K}_{\ell+1}.
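
For illustration, Löwner’s operator on \mathbb{R}\times\mathbb{R}^{\ell} can be sketched as follows (illustrative code; psi is a scalar function applied to the spectral values y^{0}\pm\|\overline{y}\|):

```python
import numpy as np

# Sketch of Loewner's operator on R x R^l associated with a scalar psi.
def loewner_soc(y, psi):
    y0, ybar = y[0], y[1:]
    r = np.linalg.norm(ybar)
    if r == 0.0:
        return np.concatenate(([psi(y0)], np.zeros_like(ybar)))
    head = 0.5 * (psi(y0 + r) + psi(y0 - r))
    tail = 0.5 * (psi(y0 + r) - psi(y0 - r)) / r * ybar
    return np.concatenate(([head], tail))

# e.g. psi(t) = exp(t) - 1 reproduces the construction used in Example 15:
print(loewner_soc(np.array([1.0, 0.5, 0.0]), lambda t: np.exp(t) - 1.0))
```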

Example 15 (exponential-type augmented Lagrangian/modified barrier function).

Let ψ:{+}\psi\colon\mathbb{R}\to\mathbb{R}\cup\{+\infty\} be a non-decreasing convex function such that domψ=(,ε0)\operatorname{dom}\psi=(-\infty,\varepsilon_{0}) for some ε0(0,+]\varepsilon_{0}\in(0,+\infty], ψ(t)+\psi(t)\to+\infty as tε0t\to\varepsilon_{0} in the case ε0<+\varepsilon_{0}<+\infty and ψ(t)/t+\psi(t)/t\to+\infty as t+t\to+\infty in the case ε0=+\varepsilon_{0}=+\infty. Suppose also that ψ\psi is twice differentiable on domψ\operatorname{dom}\psi, ψ(0)=0\psi(0)=0, and ψ(0)=1\psi^{\prime}(0)=1. For any y=(y1,,ym)Yy=(y_{1},\ldots,y_{m})\in Y and λ=(λ1,,λm)Y\lambda=(\lambda_{1},\ldots,\lambda_{m})\in Y^{*} define

Φ(y,λ,c)=1ci=1mλi,Ψ(cyi),\Phi(y,\lambda,c)=-\frac{1}{c}\sum_{i=1}^{m}\big{\langle}\lambda_{i},\Psi(-cy_{i})\big{\rangle},

if -y_{i}^{0}+\|\overline{y}_{i}\|<\varepsilon_{0}/c for all i\in I, and \Phi(y,\lambda,c)=+\infty otherwise. The corresponding augmented Lagrangian, which can be viewed as an extension of the augmented Lagrangians from Examples 8 and 12 to the case of nonlinear second order cone programming problems, was introduced in [81].

Let Λ=K\Lambda=K^{*}. Then assumptions (A1)(A1)(A7)(A7), (A9)(A9)(A11)(A11), and (A15)(A15) hold true in the general case (assumption (A11)(A11) is satisfied with Φ0(λ)λ\Phi_{0}(\lambda)\equiv\lambda by [81, Lemma 3.1]). Assumptions (A12)(A12) and (A12)s(A12)_{s} are satisfied if and only if ε0<+\varepsilon_{0}<+\infty. Restricted assumptions (A13)(A13), (A13)s(A13)_{s}, (A14)(A14), and (A14)s(A14)_{s} hold true if and only if ψ(t)/t0\psi(t)/t\to 0 as tt\to-\infty, while non-restricted versions of these assumptions hold true if and only if ψ\psi is bounded below. Finally, assumptions (A8)(A8) and (A9)s(A9)_{s} are not satisfied for the function Φ\Phi from this example.

3.5 Augmented Lagrangians for problems with semidefinite constraints

Let us now consider nonlinear semidefinite programming problems of the form:

minf(x)subject toG(x)𝕆×,xQ,\min\>f(x)\quad\text{subject to}\quad G(x)\preceq\mathbb{O}_{\ell\times\ell},\quad x\in Q, (9)

where G:X𝕊G\colon X\to\mathbb{S}^{\ell} is a given function, 𝕊\mathbb{S}^{\ell} is the space of all real symmetric matrices of order \ell endowed with the inner product A,B=Tr(AB)\langle A,B\rangle=\operatorname{Tr}(AB) and the corresponding norm AF=Tr(A2)\|A\|_{F}=\sqrt{\operatorname{Tr}(A^{2})}, A,B𝕊A,B\in\mathbb{S}^{\ell}, which is called the Frobenius norm, Tr()\operatorname{Tr}(\cdot) is the trace operator, 𝕆×\mathbb{O}_{\ell\times\ell} is the zero matrix of order ×\ell\times\ell, and \preceq is the Löwner partial order on the space 𝕊\mathbb{S}^{\ell}, that is, ABA\preceq B for some A,B𝕊A,B\in\mathbb{S}^{\ell} if and only if the matrix BAB-A is positive semidefinite.

Problem (9) can be written as the problem (𝒫)(\mathcal{P}) with Y=𝕊Y=\mathbb{S}^{\ell} and KK being the cone of negative semidefinite matrices 𝕊\mathbb{S}^{\ell}_{-}. Note that the binary relation induced by the cone K-K coincides with the Löwner partial order. The dual space YY^{*} in this case can be identified with 𝕊\mathbb{S}^{\ell}, while the polar cone KK^{*} can be identified with the cone of positive semidefinite matrices 𝕊+\mathbb{S}^{\ell}_{+}.

Example 16 (Hestenes-Powell-Rockafellar’s augmented Lagrangian).

For any y,λ𝕊y,\lambda\in\mathbb{S}^{\ell} and c>0c>0 define

Φ(y,λ,c)=12c(Tr([cy+λ]+2)Tr(λ2)),\Phi(y,\lambda,c)=\frac{1}{2c}\Big{(}\operatorname{Tr}\big{(}[cy+\lambda]_{+}^{2}\big{)}-\operatorname{Tr}(\lambda^{2})\Big{)},

where []+[\cdot]_{+} is the projection of a matrix onto the cone 𝕊+\mathbb{S}_{+}^{\ell}. This function Φ\Phi is a particular case of the function Φ\Phi from Example 1 with σ(y)=yF2/2\sigma(y)=\|y\|_{F}^{2}/2 and, therefore, it satisfies all basic assumptions with Λ=Y\Lambda=Y^{*} (assumption (A11)(A11) holds true with Φ0(λ)λ\Phi_{0}(\lambda)\equiv\lambda). The corresponding augmented Lagrangian and optimization methods for nonlinear semidefinite programming problems utilising this augmented Lagrangian were studied in [33, 67, 65, 83, 73, 66, 45, 74, 77, 80].
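
For illustration, this semidefinite term can be sketched as follows (illustrative names; the projection [\cdot]_{+} is computed from a spectral decomposition):

```python
import numpy as np

# Sketch of the semidefinite term above; [.]_+ is the projection onto the
# PSD cone, computed from a spectral decomposition.
def proj_psd(a):
    w, e = np.linalg.eigh(a)
    return (e * np.maximum(w, 0.0)) @ e.T

def Phi(y, lam, c):
    p = proj_psd(c * y + lam)
    return (np.trace(p @ p) - np.trace(lam @ lam)) / (2.0 * c)

y = np.diag([-1.0, 0.5])
lam = np.eye(2)
print(Phi(y, lam, 4.0))
```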

One can also extend the exponential-type augmented Lagrangian/modified barrier function for inequality constrained problems to the case of nonlinear semidefinite programming problems. To define such an extension, recall that the matrix function/Löwner’s operator [30, 64] associated with a function \psi\colon\mathbb{R}\to\mathbb{R} is defined as follows:

Ψ(y)=Ediag(ψ(σ1(y)),,ψ(σ(y)))ETyY,\Psi(y)=E\operatorname{diag}\Big{(}\psi(\sigma_{1}(y)),\ldots,\psi(\sigma_{\ell}(y))\Big{)}E^{T}\quad\forall y\in Y,

where y=E\operatorname{diag}(\sigma_{1}(y),\ldots,\sigma_{\ell}(y))E^{T} is a spectral decomposition of a matrix y\in\mathbb{S}^{\ell} and \sigma_{1}(y),\ldots,\sigma_{\ell}(y) are the eigenvalues of y listed in decreasing order. Note that if the function \psi is non-decreasing and \psi(0)=0, then \Psi(y)\in\mathbb{S}^{\ell}_{-} for any y\in\mathbb{S}_{-}^{\ell}.
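
A minimal sketch of this matrix function (illustrative code) is given below:

```python
import numpy as np

# Sketch of the matrix function/Loewner operator associated with psi.
def loewner(y, psi):
    w, e = np.linalg.eigh(y)          # y = E diag(w) E^T
    return (e * psi(w)) @ e.T         # E diag(psi(w)) E^T

# A non-decreasing psi with psi(0) = 0 maps S^l_- into S^l_-:
y = np.array([[-2.0, 1.0], [1.0, -2.0]])  # negative definite
print(np.linalg.eigvalsh(loewner(y, lambda t: np.exp(t) - 1.0)))
```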

Example 17 (exponential-type augmented Lagrangian/modified barrier function).

Let a function ψ:{+}\psi\colon\mathbb{R}\to\mathbb{R}\cup\{+\infty\} be as in Example 15. For any y,λ𝕊y,\lambda\in\mathbb{S}^{\ell} and c>0c>0 define

Φ(y,λ,c)=1cλ,Ψ(cy),\Phi(y,\lambda,c)=\frac{1}{c}\big{\langle}\lambda,\Psi(cy)\big{\rangle},

if c\sigma_{1}(y)<\varepsilon_{0}, and \Phi(y,\lambda,c)=+\infty otherwise. The corresponding augmented Lagrangian \mathscr{L}(\cdot) is an extension of the augmented Lagrangians for inequality constrained optimization problems from Examples 8 and 12. It was studied in detail in [63, 47, 38, 82, 43].

Let Λ=K=𝕊+\Lambda=K^{*}=\mathbb{S}_{+}^{\ell}. Then assumptions (A1)(A1)(A7)(A7), (A9)(A9)(A11)(A11), and (A15)(A15) hold true in the general case (assumption (A11)(A11) is satisfied with Φ0(λ)λ\Phi_{0}(\lambda)\equiv\lambda by [43, Proposition 4.2]). Assumption (A8)(A8) is satisfied, if the matrix function Ψ()\Psi(\cdot) is monotone and convex (see, e.g. [30]). Assumptions (A12)(A12) and (A12)s(A12)_{s} are satisfied if and only if ε0<+\varepsilon_{0}<+\infty. Restricted assumptions (A13)(A13), (A13)s(A13)_{s}, (A14)(A14), and (A14)s(A14)_{s} hold true if and only if ψ(t)/t0\psi(t)/t\to 0 as tt\to-\infty, while non-restricted versions of these assumptions hold true if and only if ψ\psi is bounded below. Finally, assumption (A9)s(A9)_{s} is not satisfied for the function Φ\Phi from this example.

Example 18 (penalized exponential-type augmented Lagrangian).

Let \psi\colon\mathbb{R}\to\mathbb{R} be a twice continuously differentiable non-decreasing convex function such that \psi(t)/t\to+\infty as t\to+\infty, \psi(0)=0, and \psi^{\prime}(0)=1. Let also \xi\colon\mathbb{R}\to\mathbb{R} be a twice continuously differentiable non-decreasing convex function such that \xi(t)=0 for all t\leq 0 and \xi(t)>0 for all t>0. Denote by \Xi(\cdot) Löwner’s operator associated with \xi(\cdot). For any y,\lambda\in\mathbb{S}^{\ell} and c>0 define

Φ(y,λ,c)=1cλ,Ψ(cy)+1cTr(Ξ(cy)).\Phi(y,\lambda,c)=\frac{1}{c}\big{\langle}\lambda,\Psi(cy)\big{\rangle}+\frac{1}{c}\operatorname{Tr}\Big{(}\Xi(cy)\Big{)}.

The corresponding augmented Lagrangian \mathscr{L}(\cdot) was introduced in [43] and is an extension of the penalized exponential-type augmented Lagrangian from Example 9 to the case of nonlinear semidefinite programming problems.

Let Λ=K=𝕊+\Lambda=K^{*}=\mathbb{S}_{+}^{\ell}. Then assumptions (A1)(A1)(A7)(A7), (A9)(A9)(A11)(A11), and (A15)(A15) hold true in the general case (assumption (A11)(A11) is satisfied with Φ0(λ)λ\Phi_{0}(\lambda)\equiv\lambda by [43, Proposition 4.2]). Assumption (A8)(A8) is satisfied, provided the matrix functions Ψ()\Psi(\cdot) and Ξ()\Xi(\cdot) are monotone and convex. Assumptions (A12)(A12) and (A12)s(A12)_{s} are satisfied, if ξ(t)/t+\xi(t)/t\to+\infty as t+t\to+\infty. Restricted assumptions (A13)(A13), (A13)s(A13)_{s}, (A14)(A14), and (A14)s(A14)_{s} hold true if and only if ψ(t)/t0\psi(t)/t\to 0 as tt\to-\infty, while non-restricted versions of these assumptions hold true if and only if ψ\psi is bounded below. Finally, assumption (A9)s(A9)_{s} is not satisfied, regardless of the choice of ψ\psi and ξ\xi.

3.6 Augmented Lagrangians for problems with pointwise inequality constraints

Let (T,\mathfrak{A},\mu) be a measure space and X be some space of functions x\colon T\to\mathbb{R}^{m}. For example, X can be defined as L^{p}(T,\mathfrak{A},\mu) or as a Sobolev space when T is an open subset of \mathbb{R}^{d}. Let us consider problems with pointwise inequality constraints of the form:

minf(x)subject tog(x(t),t)0for a.e. tT,xQ,\min\>f(x)\quad\text{subject to}\quad g(x(t),t)\leq 0\enspace\text{for a.e. }t\in T,\quad x\in Q, (10)

where g\colon\mathbb{R}^{m}\times T\to\mathbb{R} is a given function such that g(x(\cdot),\cdot)\in L^{p}(T,\mathfrak{A},\mu) for some fixed p\in[1,+\infty) and all x\in X.

One can rewrite problem (10) as the problem (\mathcal{P}) with Y=L^{p}(T,\mathfrak{A},\mu), K being the cone of nonpositive functions L^{p}_{-}(T,\mathfrak{A},\mu), and G(x)(\cdot)=g(x(\cdot),\cdot) for all x\in X. In this case the dual space Y^{*} can be identified with L^{q}(T,\mathfrak{A},\mu), where q\in(1,+\infty] is the conjugate exponent of p, that is, 1/p+1/q=1 (q=+\infty, if p=1). In turn, the polar cone K^{*} can be identified with the cone of nonnegative functions L^{q}_{+}(T,\mathfrak{A},\mu).

For the sake of shortness we will consider only an augmented Lagrangian for problem (10) based on the Hestenes-Powell-Rockafellar augmented Lagrangian. However, it should be mentioned that one can define an augmented Lagrangian for this problem based on any other augmented Lagrangian for inequality constrained optimization problems.

Example 19 (Hestenes-Powell-Rockafellar augmented Lagrangian).

Suppose that either p=2p=2 or p2p\geq 2 and the measure μ\mu is finite. For any yY:=Lp(T,𝔄,μ)y\in Y:=L^{p}(T,\mathfrak{A},\mu), λYLq(T,𝔄,μ)\lambda\in Y^{*}\cong L^{q}(T,\mathfrak{A},\mu), and c>0c>0 define

Φ(y,λ,c)=T(λ(t)max{y(t),λ(t)c}+c2max{y(t),λ(t)c}2)dμ(t).\Phi(y,\lambda,c)=\int_{T}\left(\lambda(t)\max\left\{y(t),-\frac{\lambda(t)}{c}\right\}+\frac{c}{2}\max\left\{y(t),-\frac{\lambda(t)}{c}\right\}^{2}\right)d\mu(t).

Observe that

|λ(t)max{y(t),λ(t)c}+c2max{y(t),λ(t)c}2|1+c2|y(t)|2+(12+12c)|λ(t)|2.\left|\lambda(t)\max\left\{y(t),-\frac{\lambda(t)}{c}\right\}+\frac{c}{2}\max\left\{y(t),-\frac{\lambda(t)}{c}\right\}^{2}\right|\\ \leq\frac{1+c}{2}|y(t)|^{2}+\left(\frac{1}{2}+\frac{1}{2c}\right)|\lambda(t)|^{2}.

Therefore, the value Φ(y,λ,c)\Phi(y,\lambda,c) is correctly defined and finite for any yYy\in Y, λY\lambda\in Y^{*}, and c>0c>0, if p=2p=2 or p2p\geq 2 and the measure μ\mu is finite.

Let Λ=Y\Lambda=Y^{*}. Then one can readily verify that all basic assumptions hold true in the general case, except for assumptions (A12)(A12) and (A12)s(A12)_{s}. Assumptions (A12)(A12) and (A12)s(A12)_{s} are satisfied in the case p=2p=2, since

\Phi(y,\lambda,c)-\Phi(y,\lambda,c_{0})\geq\frac{c-c_{0}}{2}\int_{T}\max\{0,y(t)\}^{2}\,d\mu(t)=\frac{c-c_{0}}{2}\operatorname{dist}(y,K)^{2}

for all yYy\in Y, λΛ\lambda\in\Lambda, and cc0>0c\geq c_{0}>0 (see the proof of the validity of assumptions (A12)(A12) and (A12)s(A12)_{s} for the Rockafellar-Wets’ augmented Lagrangian).
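
For illustration, the following sketch approximates \Phi(y,\lambda,c) for T=[0,1] with the Lebesgue measure and p=2 by a uniform-grid Riemann sum (all names are ours):

```python
import numpy as np

# Sketch evaluating Phi(y, lam, c) for T = [0, 1] with Lebesgue measure and
# p = 2, approximating the integral by a uniform-grid Riemann sum.
def Phi(y, lam, c, n=100000):
    t = np.linspace(0.0, 1.0, n)
    m = np.maximum(y(t), -lam(t) / c)
    return np.mean(lam(t) * m + c / 2.0 * m ** 2)

# Example with y(t) = t - 0.5 (violated on (0.5, 1]) and lam(t) = 1:
print(Phi(lambda t: t - 0.5, lambda t: np.ones_like(t), 2.0))
```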

3.7 Some comments on particular augmented Lagrangians

Before we proceed to the analysis of the augmented dual problem and primal-dual augmented Lagrangian methods, let us make a few general observations about the presented examples:

  1.

    All basic assumptions, except for assumption (A9)s(A9)_{s}, are satisfied for all particular augmented Lagrangians presented above (under appropriate additional assumptions), except for the exponential-type augmented Lagrangian (Examples 8, 15, and 17), the p-th power augmented Lagrangian (Example 10), the hyperbolic-type augmented Lagrangian (Example 11), and He-Wu-Meng’s augmented Lagrangian (Example 13). The exponential-type augmented Lagrangian (the case ε0=+\varepsilon_{0}=+\infty in Examples 15 and 17) and the p-th power augmented Lagrangian do not satisfy assumptions (A12)(A12) and (A12)s(A12)_{s}, the hyperbolic-type augmented Lagrangian does not satisfy assumptions (A9)(A9), (A12)(A12), and (A12)s(A12)_{s}, while He-Wu-Meng’s augmented Lagrangian does not satisfy assumption (A9)(A9) and non-restricted versions of assumptions (A13)(A13), (A13)s(A13)_{s}, (A14)(A14), and (A14)s(A14)_{s}.

  2.

    Assumption (A9)_{s} is satisfied only for the Hestenes-Powell-Rockafellar augmented Lagrangian (Examples 2, 5, 14, 16, and 19) and its generalization, Rockafellar-Wets’ augmented Lagrangian (Example 1).

  3.

    For assumptions (A1), (A6)–(A8), (A12)–(A15), and (A12)_{s}–(A14)_{s} to be satisfied for the exponential-type augmented Lagrangian (Examples 8, 15, and 17), the penalized exponential-type augmented Lagrangian (Examples 9 and 18), the modified barrier function (Example 12), the p-th power augmented Lagrangian (Example 10), and the hyperbolic-type augmented Lagrangian (Example 11) it is necessary that \Lambda\subseteq K^{*}. In contrast, for all other particular augmented Lagrangians presented in this section these assumptions are satisfied for \Lambda=Y^{*} (in the case of He-Wu-Meng’s augmented Lagrangian only the restricted versions of assumptions (A13), (A13)_{s}, (A14), and (A14)_{s} are satisfied for \Lambda=Y^{*}).

  4.

    Our theory of augmented Lagrangians encompasses penalty functions of the form F_{c}(\cdot)=f(\cdot)+c\operatorname{dist}(G(\cdot),K)^{\alpha} with \alpha>0. One simply needs to define \Phi(y,\lambda,c):=c\operatorname{dist}(y,K)^{\alpha}. This function \Phi satisfies assumptions (A1), (A2), (A4), (A7), (A9), (A9)_{s}, (A10), (A12)–(A15), and (A12)_{s}–(A14)_{s} for any choice of the set \Lambda (assumption (A8) is satisfied, if \alpha\geq 1, while assumption (A6) is satisfied, if \Lambda\subseteq K^{*}), which means that the main results of this paper on the zero duality gap and convergence of augmented Lagrangian methods can be applied to the penalty function F_{c}(\cdot); a small sketch of this specialization is given after this list.
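
Here is the promised sketch of this specialization for a single inequality constraint g(x)\leq 0, where \operatorname{dist}(y,K)=\max\{0,y\} (a toy illustration with hypothetical data, not part of the theory):

```python
import numpy as np

# Minimal sketch of the penalty function F_c from item 4 for a single
# inequality constraint g(x) <= 0, where dist(y, K) = max{0, y}.
def F(x, c, f, g, alpha=2.0):
    return f(x) + c * max(0.0, g(x)) ** alpha

# e.g. f(x) = x**2 and g(x) = 1 - x (hypothetical toy data); the grid
# minimizer approaches the constrained solution x = 1 as c grows:
for c in (1.0, 10.0, 100.0):
    x_grid = np.linspace(-2.0, 2.0, 40001)
    vals = [F(x, c, lambda s: s * s, lambda s: 1.0 - s) for x in x_grid]
    print(c, x_grid[int(np.argmin(vals))])
```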

4 Duality theory

One of the central concepts of the theory of augmented Lagrangians and corresponding optimization methods is the (augmented) dual problem:

max(λ,c)Θ(λ,c)subject toλΛ,c>0,\max_{(\lambda,c)}\>\Theta(\lambda,c)\quad\text{subject to}\quad\lambda\in\Lambda,\enspace c>0,\qquad (𝒟)

where

Θ(λ,c):=infxQ(x,λ,c)λΛ,c>0\Theta(\lambda,c):=\inf_{x\in Q}\mathscr{L}(x,\lambda,c)\quad\forall\lambda\in\Lambda,\>c>0 (11)

is the (augmented) dual function. As is well known and will be discussed in detail below, convergence of augmented Lagrangian methods is interlinked with various properties of the dual problem. Therefore, before turning to augmented Lagrangian methods, we need to analyse how standard duality results translate into our axiomatic augmented Lagrangian setting.

Remark 7.

Note that if assumption (A9)s(A9)_{s} is satisfied, then the dual function Θ\Theta is concave and the augmented dual problem (𝒟)(\mathcal{D}) is a concave optimization problem, even if the original problem (𝒫)(\mathcal{P}) is nonconvex. Furthermore, assumption (A10)(A10) ensures that the dual function is upper semicontinuous, as the infimum of the family {(x,)}\{\mathscr{L}(x,\cdot)\}, xQx\in Q, of upper semicontinuous functions.

4.1 Zero duality gap property

Let us first study how optimal values of the problems (𝒫)(\mathcal{P}) and (𝒟)(\mathcal{D}) relate to each other. We start by showing that under an essentially nonrestrictive assumption the optimal value of the augmented dual problem does not exceed the optimal value of the primal problem.

Proposition 1 (weak duality).

Let assumption (A1)(A1) hold true. Then

Θ(λ,c)f(x)xΩ,λΛ,c>0,\Theta(\lambda,c)\leq f(x)\quad\forall x\in\Omega,\>\lambda\in\Lambda,\>c>0,

where \Omega is the feasible region of the problem (\mathcal{P}). In particular, \Theta_{*}\leq f_{*}, where \Theta_{*} is the optimal value of the problem (\mathcal{D}) and f_{*} is the optimal value of the problem (\mathcal{P}).

Proof.

By assumption (A1)(A1) for any point xΩx\in\Omega and all λΛ\lambda\in\Lambda and c>0c>0 one has (x,λ,c)f(x)\mathscr{L}(x,\lambda,c)\leq f(x). Hence applying the definition of Θ\Theta (see (11)) and the fact that ΩQ\Omega\subseteq Q one obtains the required result. ∎
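
As a quick numerical illustration of Proposition 1, consider the toy problem \min x^{2} subject to 1-x\leq 0, for which f_{*}=1, with the Hestenes-Powell-Rockafellar term from Example 2. The sketch below (illustrative code, not part of the theory) approximates \Theta(\lambda,c) by brute-force minimization over a grid standing in for Q and confirms that every dual value stays below f_{*}:

```python
import numpy as np

# Weak duality check for min x**2 s.t. 1 - x <= 0 (f_* = 1), with the
# Hestenes-Powell-Rockafellar term; Theta(lam, c) is approximated by
# minimizing L(., lam, c) over a grid standing in for Q.
def L(x, lam, c):
    m = np.maximum(1.0 - x, -lam / c)
    return x ** 2 + lam * m + c / 2.0 * m ** 2

x = np.linspace(-10.0, 10.0, 200001)
for lam, c in [(0.5, 1.0), (2.0, 1.0), (2.0, 10.0)]:
    print(lam, c, L(x, lam, c).min())  # each value is <= f_* = 1
```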

As is well-known, the optimal values of the primal and dual problems might not coincide, especially for nonconvex problems. In this case Θ<f\Theta_{*}<f_{*} and the quantity fΘ>0f_{*}-\Theta_{*}>0 is called the duality gap.

Definition 1.

One says that there is zero duality gap between the primal problem (𝒫)(\mathcal{P}) and the dual problem (𝒟)(\mathcal{D}) (or that the augmented Lagrangian ()\mathscr{L}(\cdot) has the zero duality gap property, or that the strong duality with respect to the augmented Lagrangian ()\mathscr{L}(\cdot) holds true), if Θ=f\Theta_{*}=f_{*}.

Our aim now is to understand what kind of assumptions one must impose on the function Φ\Phi to ensure that the corresponding augmented Lagrangian (x,λ,c):=f(x)+Φ(G(x),λ,c)\mathscr{L}(x,\lambda,c):=f(x)+\Phi(G(x),\lambda,c) has the zero duality gap property. To this end, we extend the standard result (see, e.g. [59]) connecting the optimal value of the dual problem with the behaviour of the optimal value (perturbation) function

β(p)=inf{f(x)|xQ:G(x)pK}pY\beta(p)=\inf\big{\{}f(x)\bigm{|}x\in Q\colon G(x)-p\in K\big{\}}\quad\forall p\in Y

of the problem (𝒫)(\mathcal{P}) to our case. Denote by domλΘ\operatorname{dom}_{\lambda}\Theta the effective domain of Θ\Theta in λ\lambda, that is, domλΘ={λΛc>0:Θ(λ,c)>}\operatorname{dom}_{\lambda}\Theta=\{\lambda\in\Lambda\mid\exists c>0\colon\Theta(\lambda,c)>-\infty\}. Note that λdomλΘ\lambda\in\operatorname{dom}_{\lambda}\Theta if and only if the function (,λ,c)\mathscr{L}(\cdot,\lambda,c) is bounded below on QQ for some c>0c>0.

Theorem 2 (optimal dual value formula).

Let assumptions (A1)(A1), (A7)(A7), and (A12)(A12)(A14)(A14) hold true. Then

Θ:=supλΛ,c>0Θ(λ,c)={,if domλΘ=,min{f,lim infp0β(p)},if domλΘ.\Theta_{*}:=\sup_{\lambda\in\Lambda,c>0}\Theta(\lambda,c)=\begin{cases}-\infty,&\text{if }\operatorname{dom}_{\lambda}\Theta=\emptyset,\\ \min\left\{f_{*},\liminf_{p\to 0}\beta(p)\right\},&\text{if }\operatorname{dom}_{\lambda}\Theta\neq\emptyset.\end{cases}

In addition, Θ=limc+Θ(λ,c)\Theta_{*}=\lim\limits_{c\to+\infty}\Theta(\lambda,c) for all λdomλΘ\lambda\in\operatorname{dom}_{\lambda}\Theta.

Proof.

Note that Θ(λ,c)=\Theta(\lambda,c)=-\infty for all λΛ\lambda\in\Lambda and c>0c>0, and Θ=\Theta_{*}=-\infty, if domλΘ=\operatorname{dom}_{\lambda}\Theta=\emptyset. Therefore, below we can suppose that domλΘ\operatorname{dom}_{\lambda}\Theta\neq\emptyset.

By assumption (A7)(A7) the function Φ(y,λ,c)\Phi(y,\lambda,c) is non-decreasing in cc. Therefore the functions (x,λ,c)\mathscr{L}(x,\lambda,c) and Θ(λ,c)\Theta(\lambda,c) are non-decreasing in cc for all xXx\in X and λΛ\lambda\in\Lambda. Hence, as is easy to see, one has

supλΛ,c>0Θ(λ,c)=supλΛsupc>0Θ(λ,c)=supλΛlimc+Θ(λ,c)=supλdomλΘlimc+Θ(λ,c).\sup_{\lambda\in\Lambda,c>0}\Theta(\lambda,c)=\sup_{\lambda\in\Lambda}\sup_{c>0}\Theta(\lambda,c)=\sup_{\lambda\in\Lambda}\lim_{c\to+\infty}\Theta(\lambda,c)=\sup_{\lambda\in\operatorname{dom}_{\lambda}\Theta}\lim_{c\to+\infty}\Theta(\lambda,c).

Consequently, it is sufficient to check that

Θ(λ):=limc+Θ(λ,c)=min{f,lim infp0β(p)}λdomλΘ.\Theta_{*}(\lambda):=\lim\limits_{c\to+\infty}\Theta(\lambda,c)=\min\left\{f_{*},\liminf_{p\to 0}\beta(p)\right\}\quad\forall\lambda\in\operatorname{dom}_{\lambda}\Theta. (12)

Let us prove this equality.

Fix any λdomλΘ\lambda\in\operatorname{dom}_{\lambda}\Theta and any unbounded strictly increasing sequence {cn}(0,+)\{c_{n}\}\subset(0,+\infty) such that the function (,λ,c0)\mathscr{L}(\cdot,\lambda,c_{0}) is bounded below on QQ (such c0c_{0} exists by the definition of domλΘ\operatorname{dom}_{\lambda}\Theta). Then by Proposition 1 one has

fΘ(λ)Θ(λ,cn)Θ(λ,c0)>n,limnΘ(λ,cn)=Θ(λ).\begin{split}f_{*}\geq\Theta_{*}(\lambda)\geq&\Theta(\lambda,c_{n})\geq\Theta(\lambda,c_{0})>-\infty\quad\forall n\in\mathbb{N},\quad\\ &\lim_{n\to\infty}\Theta(\lambda,c_{n})=\Theta_{*}(\lambda).\end{split} (13)

Let {xn}Q\{x_{n}\}\subset Q be a sequence such that (xn,λ,cn)Θ(λ,cn)+1/(n+1)\mathscr{L}(x_{n},\lambda,c_{n})\leq\Theta(\lambda,c_{n})+1/(n+1) for all nn\in\mathbb{N}. Observe that from (13) it follows that

limn(xn,λ,cn)=Θ(λ)f.\lim_{n\to\infty}\mathscr{L}(x_{n},\lambda,c_{n})=\Theta_{*}(\lambda)\leq f_{*}. (14)

Note also that due to assumption (A7)(A7) for all r>0r>0, nn\in\mathbb{N}, and xQx\in Q such that dist(G(x),K)r\operatorname{dist}(G(x),K)\geq r one has (x,λ,cn)=+\mathscr{L}(x,\lambda,c_{n})=+\infty, if Φ(G(x),λ,c0)=+\Phi(G(x),\lambda,c_{0})=+\infty, and

(x,λ,cn)=(x,λ,c0)+Φ(G(x),λ,cn)Φ(G(x),λ,c0)Θ(λ,c0)+inf{Φ(y,λ,cn)Φ(y,λ,c0)|yY,dist(y,K)r,|Φ(y,λ,c0)|<+},\mathscr{L}(x,\lambda,c_{n})=\mathscr{L}(x,\lambda,c_{0})+\Phi(G(x),\lambda,c_{n})-\Phi(G(x),\lambda,c_{0})\geq\Theta(\lambda,c_{0})\\ +\inf\Big{\{}\Phi(y,\lambda,c_{n})-\Phi(y,\lambda,c_{0})\Bigm{|}y\in Y,\>\operatorname{dist}(y,K)\geq r,\>|\Phi(y,\lambda,c_{0})|<+\infty\Big{\}},

if Φ(G(x),λ,c0)<+\Phi(G(x),\lambda,c_{0})<+\infty (note that Φ(G(x),λ,c0)>\Phi(G(x),\lambda,c_{0})>-\infty, since λdomλΘ\lambda\in\operatorname{dom}_{\lambda}\Theta). Therefore, by assumption (A12)(A12) for any r>0r>0 one has (x,λ,cn)+\mathscr{L}(x,\lambda,c_{n})\to+\infty as nn\to\infty uniformly on the set {xQdist(G(x),K)r}\{x\in Q\mid\operatorname{dist}(G(x),K)\geq r\}, which with the use of (14) implies that dist(G(xn),K)0\operatorname{dist}(G(x_{n}),K)\to 0 as nn\to\infty. Let us consider two cases.

Case I. Suppose that there exists a subsequence {xnk}\{x_{n_{k}}\} that is feasible for the problem (𝒫)(\mathcal{P}). Then with the use of Lemma 1 one gets

limk(xnk,λ,cnk)\displaystyle\lim_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda,c_{n_{k}}) lim infk(f(xnk)+infyKΦ(y,λ,cnk))\displaystyle\geq\liminf_{k\to\infty}\Big{(}f(x_{n_{k}})+\inf_{y\in K}\Phi(y,\lambda,c_{n_{k}})\Big{)}
lim infkf(xnk)f.\displaystyle\geq\liminf_{k\to\infty}f(x_{n_{k}})\geq f_{*}.

Case II. Suppose now that G(x_{n_{k}})\notin K for some subsequence \{x_{n_{k}}\}. Then with the use of assumption (A13) one gets

limk(xnk,λ,cnk)lim infkf(xnk)lim infkβ(pk)lim infp0β(p),\lim_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda,c_{n_{k}})\geq\liminf_{k\to\infty}f(x_{n_{k}})\geq\liminf_{k\to\infty}\beta(p_{k})\geq\liminf_{p\to 0}\beta(p),

where \{p_{k}\}\subset Y is any sequence such that G(x_{n_{k}})-p_{k}\in K and \|p_{k}\|\to 0 as k\to\infty (note that such a sequence exists, since \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty).

Combining the two cases and inequalities (13) and (14) one obtains that

fΘ(λ)min{f,lim infp0β(p)}.f_{*}\geq\Theta_{*}(\lambda)\geq\min\{f_{*},\liminf_{p\to 0}\beta(p)\}. (15)

To prove equality (12), suppose that f>lim infp0β(p)=:βf_{*}>\liminf_{p\to 0}\beta(p)=:\beta_{*}.

Let {pk}Y\{p_{k}\}\subset Y be any sequence such that pk0p_{k}\to 0 and β(pk)β\beta(p_{k})\to\beta_{*} as kk\to\infty. Let also {tn}\{t_{n}\} be the sequence from assumption (A14)(A14). Clearly, there exists a subsequence {pkn}\{p_{k_{n}}\} such that pkntn\|p_{k_{n}}\|\leq t_{n} for all nn\in\mathbb{N}. By the definition of the optimal value function β\beta for any nn\in\mathbb{N} one can find xnQx_{n}\in Q such that G(xn)pknKG(x_{n})-p_{k_{n}}\in K (i.e. dist(G(xn),K)tn\operatorname{dist}(G(x_{n}),K)\leq t_{n}) and f(xn)β(pkn)+1/(n+1)f(x_{n})\leq\beta(p_{k_{n}})+1/(n+1) in the case when β>\beta_{*}>-\infty, and f(xn)f(x_{n})\to-\infty as nn\to\infty in the case when β=\beta_{*}=-\infty.

If β>\beta_{*}>-\infty, then thanks to assumption (A14)(A14) one has

Θ(λ)=limnΘ(λ,cn)limn(xn,λ,cn)=limnf(xn)\displaystyle\Theta_{*}(\lambda)=\lim_{n\to\infty}\Theta(\lambda,c_{n})\leq\lim_{n\to\infty}\mathscr{L}(x_{n},\lambda,c_{n})=\lim_{n\to\infty}f(x_{n}) =limnβ(pkn)\displaystyle=\lim_{n\to\infty}\beta(p_{k_{n}})
=lim infp0β(p),\displaystyle=\liminf_{p\to 0}\beta(p),

which along with (15) implies the required result. In turn, if β=\beta_{*}=-\infty, then due to assumption (A14)(A14) one has

Θ(λ)=limnΘ(λ,cn)limn(xn,λ,cn)=limnf(xn)==lim infp0β(p),\Theta_{*}(\lambda)=\lim_{n\to\infty}\Theta(\lambda,c_{n})\leq\lim_{n\to\infty}\mathscr{L}(x_{n},\lambda,c_{n})=\lim_{n\to\infty}f(x_{n})=-\infty=\liminf_{p\to 0}\beta(p),

which implies that equality (12) holds true. ∎

Corollary 1 (duality gap formula).

If under the assumptions of the previous theorem one has domλΘ\operatorname{dom}_{\lambda}\Theta\neq\emptyset, then

fΘ=max{0,flim infp0β(p)}.f_{*}-\Theta_{*}=\max\left\{0,f_{*}-\liminf_{p\to 0}\beta(p)\right\}.

In particular, if the duality gap is positive, then it is equal to flim infp0β(p)f_{*}-\liminf_{p\to 0}\beta(p).

Remark 8.

Although in the proof of Theorem 2 we considered the case \beta_{*}=-\infty, in actuality the assumptions of this theorem ensure that \beta_{*}>-\infty. Namely, if assumptions (A7) and (A14) are satisfied and \mathscr{L}(\cdot,\lambda,c) is bounded below on Q for some \lambda\in\Lambda and c>0, then \beta_{*}=\liminf_{p\to 0}\beta(p)>-\infty. Indeed, suppose by contradiction that \beta_{*}=-\infty. Then there exists a sequence \{p_{n}\}\subset Y such that p_{n}\to 0 and \beta(p_{n})\to-\infty as n\to\infty. By the definition of the optimal value function one can find a sequence \{x_{n}\}\subset Q such that G(x_{n})-p_{n}\in K for all n\in\mathbb{N} and f(x_{n})\to-\infty as n\to\infty. Note that \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty, since p_{n} converges to zero.

Let {cn}(c,+)\{c_{n}\}\subset(c,+\infty) be any increasing unbounded sequence and {tk}\{t_{k}\} be the sequence from assumption (A14)(A14). Clearly, one can find a subsequence {xnk}\{x_{n_{k}}\} such that dist(G(xnk),K)tk\operatorname{dist}(G(x_{n_{k}}),K)\leq t_{k} for all kk\in\mathbb{N}. Then by assumption (A14)(A14) one has Φ(G(xnk),λ,cnk)0\Phi(G(x_{n_{k}}),\lambda,c_{n_{k}})\to 0 as kk\to\infty, which implies that

limk(xnk,λ,cnk)=limkf(xnk)=,\lim_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda,c_{n_{k}})=\lim_{k\to\infty}f(x_{n_{k}})=-\infty,

which due to assumption (A7)(A7) contradicts the fact that (,λ,c)\mathscr{L}(\cdot,\lambda,c) is bounded below on QQ.

Remark 9.

The claim of Theorem 2 remains valid if only the restricted versions of assumptions (A13) and (A14) hold true and one additionally assumes that the projection of the set G(Q) onto the cone K is bounded. If this projection is bounded, then one can show that the sequences \{G(x_{n_{k}})\} appearing in the proof of the theorem are also bounded. Therefore, only the restricted versions of assumptions (A13) and (A14) are needed to prove the theorem in this case, which makes the theorem applicable, for example, to He-Wu-Meng’s augmented Lagrangian (Example 13).

Let us note that the assumption on the boundedness of the projection of G(Q) onto K is not uncommon in the literature on augmented Lagrangians and primal-dual augmented Lagrangian methods (see, e.g. [44, Assumption 2], [39, Assumption 2], [71, Assumption 2], [70, condition (2)], etc.). In many particular cases this assumption is not restrictive from the theoretical point of view. For example, one can always guarantee that it is satisfied for inequality constrained problems by replacing the constraints g_{i}(x)\leq 0 with e^{g_{i}(x)}-1\leq 0.

As a simple corollary to Theorem 2 we can obtain necessary and sufficient conditions for the augmented Lagrangian (x,λ,c)\mathscr{L}(x,\lambda,c) to have the zero duality gap property.

Theorem 3 (zero duality gap characterisation).

Let assumptions (A1)(A1), (A7)(A7), and (A12)(A12)(A14)(A14) be valid. Then the zero duality gap property holds true if and only if the optimal value function β\beta is lower semicontinuous (lsc) at the origin and there exist λΛ\lambda\in\Lambda and c>0c>0 such that the function (,λ,c)\mathscr{L}(\cdot,\lambda,c) is bounded below on QQ.

Proof.

Suppose that the zero duality gap property holds true. Then the optimal value of the dual problem is finite, which implies that domλΘ\operatorname{dom}_{\lambda}\Theta\neq\emptyset or, equivalently, there exist λΛ\lambda\in\Lambda and c>0c>0 such that the function (,λ,c)\mathscr{L}(\cdot,\lambda,c) is bounded below on QQ. Moreover, f=Θ=min{f,lim infp0β(p)}f_{*}=\Theta_{*}=\min\{f_{*},\liminf_{p\to 0}\beta(p)\} by Theorem 2, which means that lim infp0β(p)f=β(0)\liminf_{p\to 0}\beta(p)\geq f_{*}=\beta(0), that is, the optimal value function β\beta is lsc at the origin.

Conversely, suppose that β\beta is lsc at the origin and domλΘ\operatorname{dom}_{\lambda}\Theta\neq\emptyset. Then by Theorem 2 one has Θ=f\Theta_{*}=f_{*}, that is, there is zero duality gap between the primal and dual problems. ∎

Remark 10.

Let us note that one can prove the zero duality gap property for ()\mathscr{L}(\cdot) under slightly less restrictive assumptions on the function Φ\Phi than in the previous theorems. Namely, instead of assuming that the claims of assumptions (A12)(A12)(A14)(A14), (A16)(A16) are satisfied for all λΛ\lambda\in\Lambda, it is sufficient to suppose that there exists λ0domλΘ\lambda_{0}\in\operatorname{dom}_{\lambda}\Theta satisfying these assumptions (in the case of the coercivity assumption one must suppose that the function (,λ,c)\mathscr{L}(\cdot,\lambda,c) is coercive for λ=λ0\lambda=\lambda_{0}). Then arguing in the same way as in the proof of Theorem 2 one can check that

fsupλΛ,c>0Θ(λ,c)limc+Θ(λ0,c)=min{f,lim infp0β(p)}.f_{*}\geq\sup_{\lambda\in\Lambda,c>0}\Theta(\lambda,c)\geq\lim_{c\to+\infty}\Theta(\lambda_{0},c)=\min\big{\{}f_{*},\liminf_{p\to 0}\beta(p)\big{\}}.

This inequality obviously implies that the zero duality gap property holds true, provided the optimal value function \beta is lsc at the origin. Although such a small change in the assumptions of the theorem might seem insignificant, in actuality it considerably broadens the class of augmented Lagrangians to which the sufficient conditions for the validity of the zero duality gap property can be applied. For example, Theorems 2 and 3 are inapplicable to the exponential-type augmented Lagrangian (Example 8), since this augmented Lagrangian does not satisfy assumption (A12). However, it satisfies the claim of assumption (A12) for any \lambda\in\mathbb{R}_{+}^{m} that lies in the interior of \mathbb{R}_{+}^{m} (i.e. that does not have zero components) and, therefore, one can conclude that the zero duality gap property holds true for the exponential-type augmented Lagrangian, provided the optimal value function is lsc at the origin and there exists \lambda_{0}\in\operatorname{dom}_{\lambda}\Theta\cap\operatorname{int}\mathbb{R}_{+}^{m}.

Remark 11.

Theorem 3 implies that under suitable assumptions the zero duality gap property depends not on the properties of the augmented Lagrangian \mathscr{L}(\cdot), but rather on the properties of the optimization problem (\mathcal{P}) itself. Similarly, by Corollary 1 the duality gap f_{*}-\Theta_{*} does not depend on the augmented Lagrangian or on any other characteristic of the dual problem (\mathcal{D}); it is completely predefined by the properties of the optimization problem under consideration. Thus, in a sense, the absence of the duality gap between the primal and dual problems, as well as the size of the duality gap, when it is positive, are properties of optimization problems themselves, not of the augmented Lagrangians or augmented dual problems that are used for analysing and/or solving these problems.

For the sake of completeness, let us also present a simple characterisation of the lower semicontinuity of the optimal value function β\beta, from which one can easily derive a number of well-known sufficient conditions for this function to be lsc at the origin.

Proposition 2.

For the optimal value function β\beta to be lsc at the origin it is necessary and sufficient that there does not exist a sequence {xn}Q\{x_{n}\}\subset Q such that dist(G(xn),K)0\operatorname{dist}(G(x_{n}),K)\to 0 as nn\to\infty and lim infnf(xn)<f\liminf_{n\to\infty}f(x_{n})<f_{*}.

Proof.

Necessity. Suppose that \beta is lsc at the origin. Let \{x_{n}\}\subset Q be any sequence such that \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty. For any n\in\mathbb{N} choose p_{n}\in Y such that G(x_{n})-p_{n}\in K and \|p_{n}\|\leq\operatorname{dist}(G(x_{n}),K)+1/(n+1). Then p_{n}\to 0 as n\to\infty and, since x_{n} is feasible for the perturbed problem defining \beta(p_{n}), one has \beta(p_{n})\leq f(x_{n}). Hence due to the lower semicontinuity of \beta at the origin one has

lim infnf(xn)lim infnβ(pn)β(0)=f.\liminf_{n\to\infty}f(x_{n})\geq\liminf_{n\to\infty}\beta(p_{n})\geq\beta(0)=f_{*}.

In other words, there does not exist a sequence {xn}Q\{x_{n}\}\subset Q satisfying the conditions from the formulation of the proposition.

Sufficiency. Suppose by contradiction that the function \beta is not lsc at the origin. Then there exist \varepsilon>0 and a sequence \{p_{n}\}\subset Y converging to zero and such that \beta(p_{n})\leq\beta(0)-\varepsilon for all n\in\mathbb{N}. By the definition of the optimal value function for any n\in\mathbb{N} one can find x_{n}\in Q such that G(x_{n})\in K+p_{n} and f(x_{n})\leq\beta(p_{n})+\varepsilon/2\leq f_{*}-\varepsilon/2 (recall that \beta(0)=f_{*}). Therefore \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty and \liminf_{n\to\infty}f(x_{n})<f_{*}, which contradicts the assumption of the proposition that no sequence \{x_{n}\}\subset Q satisfies these conditions. ∎

Corollary 2.

Let the space X be reflexive, the set Q be weakly sequentially closed (in particular, one can suppose that Q is convex and closed), and the functions f and \operatorname{dist}(G(\cdot),K) be weakly sequentially lsc on Q. Then for the optimal value function \beta to be lsc at the origin it is necessary and sufficient that there does not exist a sequence \{x_{n}\}\subset Q such that \|x_{n}\|\to+\infty and \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty, and \liminf_{n\to\infty}f(x_{n})<f_{*}.

Proof.

The necessity of the conditions from the formulation of the corollary for the lower semicontinuity of the function β\beta follows directly from the previous proposition. Let us prove that they are also sufficient for the lower semicontinuity of β\beta.

Taking into account Proposition 2 it is sufficient to prove that there does not exist a bounded sequence \{x_{n}\}\subset Q such that \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty and \liminf_{n\to\infty}f(x_{n})<f_{*}. Suppose by contradiction that such a bounded sequence exists. Replacing this sequence with a subsequence, if necessary, one can assume that the sequence \{f(x_{n})\} converges and \lim_{n\to\infty}f(x_{n})<f_{*}. Since the space X is reflexive, one can extract a subsequence \{x_{n_{k}}\} that weakly converges to some point x_{*}; this point belongs to the set Q, since Q is weakly sequentially closed. Furthermore, G(x_{*})\in K, i.e. x_{*} is feasible for the problem (\mathcal{P}), since \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty and the function \operatorname{dist}(G(\cdot),K) is weakly sequentially lsc. Hence taking into account the fact that f is also weakly sequentially lsc one gets that

f>limnf(xn)=limkf(xnk)f(x),f_{*}>\lim_{n\to\infty}f(x_{n})=\lim_{k\to\infty}f(x_{n_{k}})\geq f(x_{*}),

which is impossible by virtue of the fact that the point xx_{*} is feasible for the problem (𝒫)(\mathcal{P}). ∎

Corollary 3.

Let the assumptions of the previous corollary be satisfied and one of the following conditions hold true:

  1.

    the set QQ is bounded;

  2.

    the function ff is coercive on QQ;

  3.

    the function dist(G(),K)\operatorname{dist}(G(\cdot),K) is coercive on QQ;

  4.

    the penalty function f()+cdist(G(),K)αf(\cdot)+c\operatorname{dist}(G(\cdot),K)^{\alpha} is coercive on QQ for some c>0c>0 and α>0\alpha>0.

Then the optimal value function β\beta is lsc at the origin.

Thus, by Theorem 3 and Corollary 2, in the case when the space X is reflexive (in particular, in the finite dimensional case), under some natural lower semicontinuity assumptions the duality gap between the problems (\mathcal{P}) and (\mathcal{D}) is positive if and only if there exists an unbounded sequence \{x_{n}\}\subset Q such that \operatorname{dist}(G(x_{n}),K)\to 0 as n\to\infty and the lower limit of the sequence \{f(x_{n})\} is smaller than the optimal value of the problem (\mathcal{P}). Furthermore, one can verify that the infimum of all such lower limits is equal to \liminf_{p\to 0}\beta(p) and, therefore, determines the value of the duality gap f_{*}-\Theta_{*}.

Remark 12.

Many existing results on the zero duality gap property for various augmented Lagrangians are either particular cases of Theorems 2 and 3 combined with Corollaries 2 and 3 or can be easily derived directly from these theorems and corollaries, including [31, Theorem 4.1], [32, Theorem 2.2], [8, Theorem 3], [39, Theorem 2.1], [86, Theorem 4.1], [87, Theorem 2.1], the claims about the zero duality gap property in Theorems 7, 9, and 11 in [79], etc.

4.2 Optimal dual solutions and global saddle points

As we will show below, dual convergence of augmented Lagrangian methods (that is, convergence of the sequence of multipliers) is directly connected with the existence of globally optimal solutions of the dual problem (\mathcal{D}). Therefore, let us analyse the main properties of optimal dual solutions that will help us to better understand dual convergence of augmented Lagrangian methods. For the sake of shortness, below we use the term optimal dual solution instead of globally optimal solution of the dual problem (\mathcal{D}).

First, we make a simple observation about the role of the penalty parameter cc in optimal dual solutions.

Proposition 3.

Let assumption (A7)(A7) hold true and (λ,c)(\lambda_{*},c_{*}) be an optimal dual solution. Then for any ccc\geq c_{*} the equality Θ(λ,c)=Θ(λ,c)\Theta(\lambda_{*},c)=\Theta(\lambda_{*},c_{*}) holds true and the pair (λ,c)(\lambda_{*},c) is also an optimal dual solution.

Proof.

Under assumption (A7) the augmented Lagrangian \mathscr{L}(x,\lambda,c) is non-decreasing in c, which obviously implies that the augmented dual function \Theta(\lambda,c)=\inf_{x\in Q}\mathscr{L}(x,\lambda,c) is non-decreasing in c as well. Therefore, for any c\geq c_{*} one has \Theta(\lambda_{*},c)\geq\Theta(\lambda_{*},c_{*}), which means that (\lambda_{*},c) is a globally optimal solution of the dual problem and \Theta(\lambda_{*},c)=\Theta(\lambda_{*},c_{*}). ∎

With the use of the previous proposition we can describe the structure of the set of optimal dual solutions, which we denote by 𝒟\mathscr{D}_{*}. Define

c(λ)=inf{c>0|(λ,c)𝒟}λΛ.c_{*}(\lambda)=\inf\big{\{}c>0\bigm{|}(\lambda,c)\in\mathscr{D}_{*}\big{\}}\quad\forall\lambda\in\Lambda.

Note that by definition c(λ)=+c_{*}(\lambda)=+\infty, if (λ,c)𝒟(\lambda,c)\notin\mathscr{D}_{*} for any c>0c>0. In addition, if c(λ)<+c_{*}(\lambda)<+\infty and assumption (A7)(A7) holds true, then according to the previous proposition (λ,c)𝒟(\lambda,c)\in\mathscr{D}_{*} for any c>c(λ)c>c_{*}(\lambda). Following the work of Burachik et al. [12], we call the function c()c_{*}(\cdot) the penalty map.

Let us prove some properties of the penalty map and describe the structure of the set of optimal dual solutions with the use of this map.

Corollary 4.

Let assumptions (A7)(A7), (A9)(A9), and (A10)(A10) be valid and the dual problem (𝒟)(\mathcal{D}) have a globally optimal solution. Then the set domc()\operatorname{dom}c_{*}(\cdot) is convex, the penalty map c()c_{*}(\cdot) is a quasiconvex function and

\mathscr{D}_{*}=\bigcup_{\substack{\lambda\in\operatorname{dom}c_{*}(\cdot)\\ c_{*}(\lambda)>0}}\{\lambda\}\times[c_{*}(\lambda),+\infty)\;\cup\;\bigcup_{\substack{\lambda\in\operatorname{dom}c_{*}(\cdot)\\ c_{*}(\lambda)=0}}\{\lambda\}\times(0,+\infty).
Proof.

Let us first show that the set \operatorname{dom}c_{*}(\cdot) is convex. Indeed, choose any \lambda_{1},\lambda_{2}\in\operatorname{dom}c_{*}(\cdot), \alpha\in[0,1], and c>\max\{c_{*}(\lambda_{1}),c_{*}(\lambda_{2})\}. Then by Proposition 3 both (\lambda_{1},c)\in\mathscr{D}_{*} and (\lambda_{2},c)\in\mathscr{D}_{*}. Hence taking into account the fact that the dual function \Theta(\lambda,c) is concave in \lambda by assumption (A9) one gets

Θ(αλ1+(1α)λ2,c)αΘ(λ1,c)+(1α)Θ(λ2,c)=αΘ+(1α)Θ=Θ.\Theta(\alpha\lambda_{1}+(1-\alpha)\lambda_{2},c)\geq\alpha\Theta(\lambda_{1},c)+(1-\alpha)\Theta(\lambda_{2},c)=\alpha\Theta_{*}+(1-\alpha)\Theta_{*}=\Theta_{*}.

Therefore (\alpha\lambda_{1}+(1-\alpha)\lambda_{2},c)\in\mathscr{D}_{*}, which means that \alpha\lambda_{1}+(1-\alpha)\lambda_{2}\in\operatorname{dom}c_{*}(\cdot) and the set \operatorname{dom}c_{*}(\cdot) is convex. Moreover, since c>\max\{c_{*}(\lambda_{1}),c_{*}(\lambda_{2})\} was chosen arbitrarily, one has c_{*}(\alpha\lambda_{1}+(1-\alpha)\lambda_{2})\leq\max\{c_{*}(\lambda_{1}),c_{*}(\lambda_{2})\}, that is, the penalty map is a quasiconvex function.

As was noted above, for any λdomc()\lambda\in\operatorname{dom}c_{*}(\cdot) and c>c(λ)c>c_{*}(\lambda) one has (λ,c)𝒟(\lambda,c)\in\mathscr{D}_{*}. Hence bearing in mind the fact that the augmented dual function Θ\Theta is upper semicontinuous (usc) by assumption (A10)(A10) one can conclude that

{λ}×[c(λ),+)\displaystyle\{\lambda\}\times[c_{*}(\lambda),+\infty) 𝒟λdomc():c(λ)>0,\displaystyle\subseteq\mathscr{D}_{*}\quad\forall\lambda\in\operatorname{dom}c_{*}(\cdot)\colon c_{*}(\lambda)>0,
{λ}×(0,+)\displaystyle\{\lambda\}\times(0,+\infty) 𝒟λdomc():c(λ)=0.\displaystyle\subseteq\mathscr{D}_{*}\quad\forall\lambda\in\operatorname{dom}c_{*}(\cdot)\colon c_{*}(\lambda)=0.

The validity of the converse inclusions follows directly from the definition of the penalty map and Proposition 3. ∎

Remark 13.

Note that we need to consider the case c(λ)=0c_{*}(\lambda)=0 separately due to the fact that many particular augmented Lagrangians are not defined for c=0c=0 (see examples in Section 3). However, if a given augmented Lagrangian is correctly defined for c=0c=0 and assumptions (A7)(A7), (A9)(A9), and (A10)(A10) are satisfied for c[0,+)c\in[0,+\infty), then

\mathscr{D}_{*}=\bigcup_{\lambda_{*}\in\operatorname{dom}c_{*}(\cdot)}\{\lambda_{*}\}\times[c_{*}(\lambda_{*}),+\infty)=\operatorname{epi}c_{*}(\cdot),

where epic()\operatorname{epi}c_{*}(\cdot) is the epigraph of the penalty map. This equality holds true, in particular, for Rockafellar-Wets’ augmented Lagrangian from Example 1. Let us also note that in the case when assumption (A9)s(A9)_{s} is satisfied (e.g. in the case of Rockafellar-Wets’ augmented Lagrangian) the penalty map is convex, since in this case the dual function Θ\Theta is concave and, therefore, the set of optimal dual solutions 𝒟\mathscr{D}_{*} is convex.

Optimal dual solutions can be described in terms of global saddle points of the augmented Lagrangian.

Definition 2.

A pair (x,λ)Q×Λ(x_{*},\lambda_{*})\in Q\times\Lambda is called a global saddle point of the augmented Lagrangian ()\mathscr{L}(\cdot), if there exists c>0c_{*}>0 such that

supλΛ(x,λ,c)(x,λ,c)infxQ(x,λ,c)<+cc.\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c)\leq\mathscr{L}(x_{*},\lambda_{*},c)\leq\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)<+\infty\quad\forall c\geq c_{*}. (16)

The infimum of all such c_{*} is denoted by c_{*}(x_{*},\lambda_{*}) and is called the least exact penalty parameter for the global saddle point (x_{*},\lambda_{*}).

Remark 14.

It is worth noting that inequalities (16) from the definition of a global saddle point are obviously satisfied as (and, therefore, can be replaced with) equalities.

The following theorem, which combines several well-known results (cf. [58, Theorem 11.59], [62, Theorem 2.1, part (v)], [88, Theorem 2.1], etc.), shows how optimal dual solutions are interconnected with global saddle points. We give a full proof, since, to the best of the author’s knowledge, all three claims of this theorem cannot be derived from existing results within our axiomatic augmented Lagrangian setting.

Theorem 4.

Let assumptions (A1)(A1)(A3)(A3) and (A7)(A7) be valid. Then the following statements hold true:

  1.

    if a global saddle point (x,λ)(x_{*},\lambda_{*}) of ()\mathscr{L}(\cdot) exists, then the zero duality gap property holds true, xx_{*} is a globally optimal solution of the problem (𝒫)(\mathcal{P}), and for any c>c(x,λ)c_{*}>c_{*}(x_{*},\lambda_{*}) the pair (λ,c)(\lambda_{*},c_{*}) is an optimal dual solution;

  2. 2.

    if (λ,c)(\lambda_{*},c_{*}) is an optimal dual solution and the zero duality gap property holds true, then for any globally optimal solution xx_{*} of the problem (𝒫)(\mathcal{P}) the pair (x,λ)(x_{*},\lambda_{*}) is a global saddle point of ()\mathscr{L}(\cdot) and c(x,λ)cc_{*}(x_{*},\lambda_{*})\leq c_{*};

  3. 3.

    if (x,λ)(x_{*},\lambda_{*}) is a global saddle point of ()\mathscr{L}(\cdot), then

    f=(x,λ,c)=infxQsupλΛ(x,λ,c)=supλΛinfxQ(x,λ,c)=Θf_{*}=\mathscr{L}(x_{*},\lambda_{*},c)=\inf_{x\in Q}\sup_{\lambda\in\Lambda}\mathscr{L}(x,\lambda,c)=\sup_{\lambda\in\Lambda}\inf_{x\in Q}\mathscr{L}(x,\lambda,c)=\Theta_{*}

    for all c>c(x,λ)c>c_{*}(x_{*},\lambda_{*}).

Proof.

Part 1. Let (x,λ)(x_{*},\lambda_{*}) be a global saddle point of ()\mathscr{L}(\cdot) and some c>c(x,λ)c>c_{*}(x_{*},\lambda_{*}) be fixed. Let us first show that xx_{*} is feasible. Indeed, assume that G(x)KG(x_{*})\notin K. Then by assumption (A3)(A3) there exists a multiplier λ0Λ\lambda_{0}\in\Lambda such that Φ(G(x),tλ0,c)+\Phi(G(x_{*}),t\lambda_{0},c)\to+\infty as t+t\to+\infty, which contradicts the inequalities

(x,tλ0,c)supλΛ(x,λ,c)(x,λ,c)<+c>c(x,λ)\mathscr{L}(x_{*},t\lambda_{0},c)\leq\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c)\leq\mathscr{L}(x_{*},\lambda_{*},c)<+\infty\quad\forall c>c_{*}(x_{*},\lambda_{*})

that follow from the definition of the global saddle point. Thus, G(x)KG(x_{*})\in K, that is, xx_{*} is feasible.

By assumption (A2)(A2) there exists λ^Λ\widehat{\lambda}\in\Lambda such that Φ(G(x),λ^,c)0\Phi(G(x_{*}),\widehat{\lambda},c)\geq 0, which by the definition of global saddle point implies that

f(x)(x,λ^,c)supλΛ(x,λ,c)(x,λ,c)f(x)c>c(x,λ),f(x_{*})\leq\mathscr{L}(x_{*},\widehat{\lambda},c)\leq\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c)\leq\mathscr{L}(x_{*},\lambda_{*},c)\leq f(x_{*})\quad\forall c>c_{*}(x_{*},\lambda_{*}),

where the last inequality is valid by assumption (A1)(A1). Consequently, one has

(x,λ,c)=f(x),Φ(G(x),λ,c)=0c>c(x,λ).\mathscr{L}(x_{*},\lambda_{*},c)=f(x_{*}),\quad\Phi(G(x_{*}),\lambda_{*},c)=0\quad\forall c>c_{*}(x_{*},\lambda_{*}).

Hence applying the definition of global saddle point and assumption (A1)(A1) once more one gets that

f(x)=(x,λ,c)infxQ(x,λ,c)infxQ:G(x)K(x,λ,c)f(x)f(x_{*})=\mathscr{L}(x_{*},\lambda_{*},c)\leq\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)\leq\inf_{x\in Q\colon G(x)\in K}\mathscr{L}(x,\lambda_{*},c)\leq f(x)

for any feasible xx, which means that xx_{*} is a globally optimal solution of the problem (𝒫)(\mathcal{P}). Furthermore, one has

Θ(λ,c)=infxQ(x,λ,c)=(x,λ,c)=f(x)=fc>c(x,λ).\Theta(\lambda_{*},c)=\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)=\mathscr{L}(x_{*},\lambda_{*},c)=f(x_{*})=f_{*}\quad\forall c>c_{*}(x_{*},\lambda_{*}).

Therefore, by Proposition 1 the zero duality gap property holds true and the pair $(\lambda_{*},c)$ is an optimal dual solution for any $c>c_{*}(x_{*},\lambda_{*})$.

Part 2. Let $(\lambda_{*},c_{*})$ and $x_{*}$ be globally optimal solutions of the problems $(\mathcal{D})$ and $(\mathcal{P})$, respectively, and suppose that the zero duality gap property holds true. Fix any $c\geq c_{*}$. Then applying assumption $(A1)$ and Proposition 3 one obtains that

\mathscr{L}(x_{*},\lambda_{*},c)\leq f(x_{*})=f_{*}=\Theta_{*}=\Theta(\lambda_{*},c)=\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)\quad\forall c\geq c_{*}.

Consequently, (x,λ,c)=f(x)\mathscr{L}(x_{*},\lambda_{*},c)=f(x_{*}), and applying assumption (A1)(A1) once again one gets

\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c)\leq f(x_{*})=\mathscr{L}(x_{*},\lambda_{*},c)=\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)\quad\forall c\geq c_{*},

which obviously means that (x,λ)(x_{*},\lambda_{*}) is a global saddle point of the augmented Lagrangian and c(x,λ)cc_{*}(x_{*},\lambda_{*})\leq c_{*}.

Part 3. Let (x,λ)(x_{*},\lambda_{*}) be a global saddle point. Choose any c>c(x,λ)c>c_{*}(x_{*},\lambda_{*}). From the proof of the first statement of the theorem it follows that

(x,λ,c)=f=Θ=Θ(λ,c)=infxQ(x,λ,c)=supλΛinfxQ(x,λ,c),\mathscr{L}(x_{*},\lambda_{*},c)=f_{*}=\Theta_{*}=\Theta(\lambda_{*},c)=\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)=\sup_{\lambda\in\Lambda}\inf_{x\in Q}\mathscr{L}(x,\lambda,c),

where the last equality and the equality $\Theta_{*}=\Theta(\lambda_{*},c)$ follow from the fact that $(\lambda_{*},c)$ is an optimal dual solution.

By the definition of global saddle point (x,λ,c)=supλΛ(x,λ,c)\mathscr{L}(x_{*},\lambda_{*},c)=\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c), which implies that

(x,λ,c)infxQsupλΛ(x,λ,c).\mathscr{L}(x_{*},\lambda_{*},c)\geq\inf_{x\in Q}\sup_{\lambda\in\Lambda}\mathscr{L}(x,\lambda,c).

On the other hand, by the same definition one also has

(x,λ,c)=infxQ(x,λ,c)infxQsupλΛ(x,λ,c).\mathscr{L}(x_{*},\lambda_{*},c)=\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c)\leq\inf_{x\in Q}\sup_{\lambda\in\Lambda}\mathscr{L}(x,\lambda,c).

Thus, (x,λ,c)=infxQsupλΛ(x,λ,c)\mathscr{L}(x_{*},\lambda_{*},c)=\inf_{x\in Q}\sup_{\lambda\in\Lambda}\mathscr{L}(x,\lambda,c), and the proof is complete. ∎

Remark 15.

Note that assumption (A7)(A7) is not needed for the validity of the first and third statements of the theorem, since it is not used in the proofs of these statements. In turn, assumptions (A2)(A2) and (A3)(A3) are not needed for the validity of the second statement of the theorem.

Combining the first and second statements of the previous theorem one obtains the following two useful results.

Corollary 5.

Let assumptions (A1)(A1)(A3)(A3) and (A7)(A7) hold true. Then a global saddle point of ()\mathscr{L}(\cdot) exists if and only if there exist globally optimal solutions of the primal problem (𝒫)(\mathcal{P}) and the dual problem (𝒟)(\mathcal{D}) and the zero duality gap property holds true.

Corollary 6.

Let assumptions (A1)(A1)(A3)(A3) and (A7)(A7) be valid and (x,λ)(x_{*},\lambda_{*}) be a global saddle point of ()\mathscr{L}(\cdot). Then for any globally optimal solution zz_{*} of the problem (𝒫)(\mathcal{P}) the pair (z,λ)(z_{*},\lambda_{*}) is also a global saddle point of ()\mathscr{L}(\cdot) and c(x,λ)=c(z,λ)=c(λ)c_{*}(x_{*},\lambda_{*})=c_{*}(z_{*},\lambda_{*})=c_{*}(\lambda_{*}).

Thus, the least exact penalty parameter c(x,λ)c_{*}(x_{*},\lambda_{*}) does not depend on a globally optimal solution xx_{*} of the problem (𝒫)(\mathcal{P}) and is equal to the value of the penalty map c(λ)c_{*}(\lambda_{*}).

Remark 16.

As was shown in [22, Proposition 9], if the functions ff and GG are differentiable, then under some natural assumptions on the function Φ\Phi any global saddle point of the augmented Lagrangian ()\mathscr{L}(\cdot) is a KKT-point of the problem (𝒫)(\mathcal{P}). This result implies that if there are two globally optimal solutions of the problem (𝒫)(\mathcal{P}) having disjoint sets of multipliers satisfying KKT optimality conditions (note that problems having such optimal solutions are necessarily nonconvex), then there are no global saddle points of the augmented Lagrangian ()\mathscr{L}(\cdot) and by Corollary 5 either the duality gap between the primal and dual problems is positive or the dual problem has no globally optimal solutions. As we will show below, this fact leads to the unboundedness of the sequence of multipliers or the sequence of penalty parameters generated by augmented Lagrangian methods for problems having optimal solutions with disjoint sets of Lagrange multipliers.

Let us give an example illustrating the previous remark.

Example 20.

Let X=Y=X=Y=\mathbb{R}. Consider the following optimization problem:

minf(x)=x2subject tog1(x)=x10,g2(x)=x10.\min\>f(x)=-x^{2}\quad\text{subject to}\enspace g_{1}(x)=x-1\leq 0,\enspace g_{2}(x)=-x-1\leq 0. (17)

This problem has two globally optimal solutions: $1$ and $-1$. The corresponding Lagrange multipliers are $(2,0)$ and $(0,2)$: for instance, at $x_{*}=1$ the stationarity condition $-2x_{*}+\lambda_{1}-\lambda_{2}=0$ together with the complementarity condition $\lambda_{2}=0$ (the second constraint is inactive) yields $\lambda_{1}=2$, and the case $x_{*}=-1$ is symmetric. Thus, the sets of Lagrange multipliers corresponding to two different globally optimal solutions are disjoint.

The optimal value function for problem (17) has the form:

\beta(p)=\inf\Big\{-x^{2}\Bigm|x-1-p_{1}\leq 0,\>-x-1-p_{2}\leq 0\Big\}=\begin{cases}-\max\big\{|1+p_{1}|,|1+p_{2}|\big\}^{2},&\text{if }p_{1}+p_{2}\geq-2,\\ +\infty,&\text{if }p_{1}+p_{2}<-2.\end{cases}

The function β\beta is obviously continuous at the origin. Therefore, under the assumptions of Theorem 3 the zero duality gap property holds true. In particular, it holds true for the Hestenes-Powell-Rockafellar augmented Lagrangian for problem (17):

(x,λ,c)=x2+λ1max{x1,λ1c}+c2max{x1,λ1c}2+λ2max{x1,λ2c}+c2max{x1,λ2c}2.\begin{split}\mathscr{L}(x,\lambda,c)=-x^{2}&+\lambda_{1}\max\left\{x-1,-\frac{\lambda_{1}}{c}\right\}+\frac{c}{2}\max\left\{x-1,-\frac{\lambda_{1}}{c}\right\}^{2}\\ &+\lambda_{2}\max\left\{-x-1,-\frac{\lambda_{2}}{c}\right\}+\frac{c}{2}\max\left\{-x-1,-\frac{\lambda_{2}}{c}\right\}^{2}.\end{split} (18)

However, the corresponding augmented dual problem has no globally optimal solutions.

Indeed, if an optimal dual solution (λ,c)(\lambda_{*},c_{*}) exists, then by Theorem 4 for any globally optimal solution xx_{*} of problem (17) the pair (x,λ)(x_{*},\lambda_{*}) is a global saddle point of the augmented Lagrangian and c(x,λ)cc_{*}(x_{*},\lambda_{*})\leq c_{*}. Therefore by the third statement of Theorem 4 and the definition of global saddle point for any c>cc>c_{*} one has

f=1=(1,λ,c)=(1,λ,c)=infx(x,λ,c).f_{*}=-1=\mathscr{L}(1,\lambda_{*},c)=\mathscr{L}(-1,\lambda_{*},c)=\inf_{x\in\mathbb{R}}\mathscr{L}(x,\lambda_{*},c). (19)

Hence applying assumption (A1)(A1) one gets that

\begin{aligned}0&=\Phi(G(1),\lambda_{*},c)=(\lambda_{*})_{2}\max\left\{-2,-\frac{(\lambda_{*})_{2}}{c}\right\}+\frac{c}{2}\max\left\{-2,-\frac{(\lambda_{*})_{2}}{c}\right\}^{2},\\ 0&=\Phi(G(-1),\lambda_{*},c)=(\lambda_{*})_{1}\max\left\{-2,-\frac{(\lambda_{*})_{1}}{c}\right\}+\frac{c}{2}\max\left\{-2,-\frac{(\lambda_{*})_{1}}{c}\right\}^{2}.\end{aligned}

Clearly, there exists c0>0c_{0}>0 such that (λ)1/c<2(\lambda_{*})_{1}/c<2 and (λ)2/c<2(\lambda_{*})_{2}/c<2 for any cc0c\geq c_{0}. Therefore, by the equalities above for any cc0c\geq c_{0} one has

0=(λ)122c,0=(λ)222c0=-\frac{(\lambda_{*})_{1}^{2}}{2c},\quad 0=-\frac{(\lambda_{*})_{2}^{2}}{2c}

or, equivalently, λ=0\lambda_{*}=0. However, as one can easily check,

infx(x,λ,c)=infx(x,0,c)=12c2<fc>2,\inf_{x\in\mathbb{R}}\mathscr{L}(x,\lambda_{*},c)=\inf_{x\in\mathbb{R}}\mathscr{L}(x,0,c)=-1-\frac{2}{c-2}<f_{*}\quad\forall c>2,

which contradicts (19). Thus, the augmented dual problem has no globally optimal solutions. In the following section we will show how a standard primal-dual augmented Lagrangian method behaves for problem (17) (see Example 22).
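To make the non-existence of an optimal dual solution in this example more tangible, the dual value $\Theta(0,c)=\inf_{x\in\mathbb{R}}\mathscr{L}(x,0,c)$ can be checked numerically. The following Python snippet is an informal sketch (not part of the formal development): it minimizes the augmented Lagrangian (18) with $\lambda=0$ over a grid, which crudely stands in for the exact inner infimum, and compares the result with the closed-form value $-1-2/(c-2)$.

```python
import numpy as np

def theta(lam, c, xs):
    """Grid evaluation of Theta(lam, c) = inf_x L(x, lam, c) for the
    Hestenes-Powell-Rockafellar augmented Lagrangian (18) of problem (17)."""
    m1 = np.maximum(xs - 1.0, -lam[0] / c)   # term for the constraint x - 1 <= 0
    m2 = np.maximum(-xs - 1.0, -lam[1] / c)  # term for the constraint -x - 1 <= 0
    L = -xs**2 + lam[0]*m1 + 0.5*c*m1**2 + lam[1]*m2 + 0.5*c*m2**2
    return L.min()

xs = np.linspace(-5.0, 5.0, 400001)
for c in [3.0, 10.0, 100.0, 1000.0]:
    print(f"c = {c:6.0f}:  Theta(0, c) = {theta(np.zeros(2), c, xs): .6f},"
          f"  -1 - 2/(c - 2) = {-1.0 - 2.0/(c - 2.0): .6f}")
```

The two printed columns agree to grid accuracy: $\Theta(0,c)$ increases to $f_{*}=-1$ as $c\to+\infty$, but never attains this value, in accordance with the argument above.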

Another object in augmented duality theory that is directly connected with optimal dual solutions and global saddle points is the augmented Lagrange multiplier, which we introduce below by analogy with the theory of Rockafellar-Wets’ augmented Lagrangians [62, 61, 88, 18, 21].

Definition 3.

A vector λΛ\lambda_{*}\in\Lambda is called an augmented Lagrange multiplier (of the augmented Lagrangian ()\mathscr{L}(\cdot)), if there exists xQx_{*}\in Q such that the pair (x,λ)(x_{*},\lambda_{*}) is a global saddle point of the augmented Lagrangian. The set of all augmented Lagrange multipliers is denoted by 𝒜\mathscr{A}_{*}.

Remark 17.

(i) Our definition of the augmented Lagrange multiplier is equivalent to the one used in the context of Rockafellar-Wets’ augmented Lagrangians by [62, Theorem 2.1].

(ii) By Theorem 4, in the case when the zero duality gap property is satisfied and there exists a globally optimal solution of the primal problem, a vector λΛ\lambda_{*}\in\Lambda is an augmented Lagrange multiplier if and only if there exists c>0c_{*}>0 such that (λ,c)(\lambda_{*},c_{*}) is an optimal dual solution. Consequently, 𝒜=domc()\mathscr{A}_{*}=\operatorname{dom}c_{*}(\cdot).

Let us point out some interesting properties of augmented Lagrange multipliers.

Proposition 4.

Let assumption $(A1)$ and the zero duality gap property hold true, and suppose that there exists a globally optimal solution of the problem $(\mathcal{P})$. Then a vector $\lambda_{*}\in\Lambda$ is an augmented Lagrange multiplier if and only if there exists $c_{*}>0$ such that

Θ(λ,c):=infxQ(x,λ,c)=f.\Theta(\lambda_{*},c_{*}):=\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c_{*})=f_{*}. (20)

Furthermore, if this equality and assumptions (A2)(A2), (A3)(A3), and (A7)(A7) are satisfied, then the following statements hold true:

  1. 1.

    Θ(λ,c)=f\Theta(\lambda_{*},c)=f_{*} for all ccc\geq c_{*};

  2. 2.

    the infimum of all cc_{*} for which (20) holds true is equal to c(λ)c_{*}(\lambda_{*});

  3. 3.

    for all c>c(λ)c_{*}>c_{*}(\lambda_{*}) the infimum in (20) is attained at every globally optimal solution of the problem (𝒫)(\mathcal{P});

  4. 4.

    if the function cΦ(y,λ,c)c\mapsto\Phi(y,\lambda_{*},c) is strictly increasing on domΦ(y,λ,)\operatorname{dom}\Phi(y,\lambda_{*},\cdot) for any yKy\notin K and on T(y):={c(0,+):<Φ(y,λ,c)<0}T(y):=\{c\in(0,+\infty)\colon-\infty<\Phi(y,\lambda_{*},c)<0\} for any yKy\in K, then for all c>c(λ)c_{*}>c_{*}(\lambda_{*}) the infimum in (20) is attained at some xQx\in Q if and only if xx is a globally optimal solution of the problem (𝒫)(\mathcal{P}).

Proof.

Part 1. Let λ\lambda_{*} be an augmented Lagrange multiplier. Then by Theorem 4 there exists c>0c_{*}>0 such that (λ,c)(\lambda_{*},c_{*}) is an optimal dual solution. Hence with the use of the fact that the zero duality gap property holds true one can conclude that equality (20) is valid.

Suppose now that equality (20) is satisfied for some λΛ\lambda_{*}\in\Lambda and c>0c_{*}>0. Then by Proposition 1 the pair (λ,c)(\lambda_{*},c_{*}) is an optimal dual solution, which by Theorem 4 implies that λ\lambda_{*} is an augmented Lagrange multiplier.

Part 2.1. Suppose that equality (20) is satisfied for some $\lambda_{*}\in\Lambda$ and $c_{*}>0$, and that assumptions $(A2)$, $(A3)$, and $(A7)$ hold true. Then, as was noted earlier, the function $\Theta(\lambda,c)$ is non-decreasing in $c$, which along with Proposition 1 implies that $\Theta(\lambda_{*},c)=f_{*}$ for all $c\geq c_{*}$.

Part 2.2. The fact that the infimum of all cc_{*} for which (20) holds true is equal to c(λ)c_{*}(\lambda_{*}) follows directly from the definition of the penalty map.

Part 2.3. If a pair $(\lambda_{*},c_{*})$ satisfies equality (20), then it is an optimal dual solution. Consequently, by Theorem 4 for any globally optimal solution $x_{*}$ of the problem $(\mathcal{P})$ the pair $(x_{*},\lambda_{*})$ is a global saddle point of the augmented Lagrangian. By the definition of global saddle point this means that the infimum in (20) is attained at $x_{*}$ (see (16)).

Part 2.4. Suppose finally that the function $c\mapsto\Phi(y,\lambda_{*},c)$ is strictly increasing on $\operatorname{dom}\Phi(y,\lambda_{*},\cdot)$ for any $y\notin K$ and on $T(y)$ for any $y\in K$, and that the infimum in (20) is attained at some $x\in Q$. If $x$ is feasible and $\Phi(G(x),\lambda_{*},c_{*})<0$ or $x$ is infeasible, then for any $c_{*}(\lambda_{*})<c<c_{*}$ by our assumption on the function $\Phi$ one has $f_{*}=\mathscr{L}(x,\lambda_{*},c_{*})>\mathscr{L}(x,\lambda_{*},c)\geq f_{*}$, which is obviously impossible. Therefore, $x$ is feasible and $\Phi(G(x),\lambda_{*},c_{*})=0$, which means that $f(x)=f_{*}$, that is, $x$ is a globally optimal solution of the problem $(\mathcal{P})$. ∎

Thus, if λΛ\lambda_{*}\in\Lambda is an augmented Lagrange multiplier, then under some additional assumptions the problem (𝒫)(\mathcal{P}) has the same optimal value and the same globally optimal solutions as the problem

minx(x,λ,c)subject to xQ\min_{x}\>\mathscr{L}(x,\lambda_{*},c)\quad\text{subject to }x\in Q

for any c>c(λ)c>c_{*}(\lambda_{*}). That is why in [58] augmented Lagrange multipliers were called multipliers supporting an exact penalty representation (see [58, Definition 11.60] and [31, 32, 86, 9, 72, 87, 12]).

Remark 18.

The assumption that the function $c\mapsto\Phi(y,\lambda_{*},c)$ is strictly increasing on $\operatorname{dom}\Phi(y,\lambda_{*},\cdot)$ for any $y\notin K$ and on $T(y)$ for any $y\in K$ is very mild and is satisfied in many particular cases. For example, it is satisfied for any $\lambda_{*}\in\Lambda$ for Rockafellar-Wets’ augmented Lagrangian (Example 1), provided the function $\sigma$ has a valley at zero and is continuous at this point, the Hestenes-Powell-Rockafellar augmented Lagrangian (Examples 2, 14, 16, 19), the sharp Lagrangian (Example 3), Mangasarian’s augmented Lagrangian (Examples 4 and 7), the essentially quadratic augmented Lagrangian (Example 5), the cubic augmented Lagrangian (Example 6), the penalized exponential-type augmented Lagrangian (Examples 9 and 18), provided the function $\xi$ is strictly convex on $(0,+\infty)$, and He-Wu-Meng’s augmented Lagrangian (Example 13). This assumption is also satisfied for any $\lambda_{*}\in\Lambda$ that does not have zero components for the exponential-type augmented Lagrangian (Example 8), the hyperbolic-type augmented Lagrangian (Example 11), and the modified barrier function (Example 12), provided the function $\phi$ is strictly convex in these examples, and for the p-th power augmented Lagrangian (Example 10).

Remark 19.

To the best of the author’s knowledge, the penalty map, augmented Lagrange multipliers, and an exact penalty representation have been introduced and studied earlier only in the context of Rockafellar-Wets’ augmented Lagrangians. In this section we demonstrated (see Corollary 4 and Proposition 4) that there is nothing in these concepts that is inherently tied to Rockafellar-Wets’ augmented Lagrangians. They can be naturally introduced and studied for any other augmented Lagrangian, including the (penalized) exponential-type augmented Lagrangian, modified barrier functions, Mangasarian’s augmented Lagrangian, etc., that are typically not considered in the theory of Rockafellar-Wets’ augmented Lagrangians.

4.3 Optimal dual solutions for convex problems

In the case when the problem (𝒫)(\mathcal{P}) is convex, optimal dual solutions, roughly speaking, do not depend on the penalty parameter cc (more precisely, the penalty map c(λ)c_{*}(\lambda_{*}) does not depend on λ𝒜\lambda_{*}\in\mathscr{A}_{*}), and one can give a very simple (and well-known) description of the set of optimal dual solutions in terms of Lagrange multipliers.

Let the function $f$ and the set $Q$ be convex, and let the mapping $G$ be convex with respect to the binary relation $\preceq$, that is,

G(αx1+(1α)x2)αG(x1)+(1α)G(x2)x1,x2X,α[0,1].G(\alpha x_{1}+(1-\alpha)x_{2})\preceq\alpha G(x_{1})+(1-\alpha)G(x_{2})\quad\forall x_{1},x_{2}\in X,\>\alpha\in[0,1].

Then the problem (𝒫)(\mathcal{P}) is convex. Moreover, in this case under assumption (A8)(A8) the augmented Lagrangian (x,λ,c)\mathscr{L}(x,\lambda,c) is convex in xx. Indeed, by applying first the fact that Φ(,λ,c)\Phi(\cdot,\lambda,c) is non-decreasing with respect to \preceq and then the fact that this function is convex one obtains that for any x1,x2Xx_{1},x_{2}\in X and α[0,1]\alpha\in[0,1] the following inequalities hold true:

(αx1\displaystyle\mathscr{L}(\alpha x_{1} +(1α)x2,λ,c)=f(αx1+(1α)x2)+Φ(G(αx1+(1α)x2),λ,c)\displaystyle+(1-\alpha)x_{2},\lambda,c)=f(\alpha x_{1}+(1-\alpha)x_{2})+\Phi(G(\alpha x_{1}+(1-\alpha)x_{2}),\lambda,c)
αf(x1)+(1α)f(x2)+Φ(αG(x1)+(1α)G(x2),λ,c)\displaystyle\leq\alpha f(x_{1})+(1-\alpha)f(x_{2})+\Phi(\alpha G(x_{1})+(1-\alpha)G(x_{2}),\lambda,c)
αf(x1)+(1α)f(x2)+αΦ(G(x1),λ,c)+(1α)Φ(G(x2),λ,c)\displaystyle\leq\alpha f(x_{1})+(1-\alpha)f(x_{2})+\alpha\Phi(G(x_{1}),\lambda,c)+(1-\alpha)\Phi(G(x_{2}),\lambda,c)
=α(x1,λ,c)+(1α)(x2,λ,c),\displaystyle=\alpha\mathscr{L}(x_{1},\lambda,c)+(1-\alpha)\mathscr{L}(x_{2},\lambda,c),

which means that the function (,λ,c)\mathscr{L}(\cdot,\lambda,c) is convex for any λΛ\lambda\in\Lambda and c>0c>0 (in the case when Φ(αG(x1)+(1α)G(x2),λ,c)=\Phi(\alpha G(x_{1})+(1-\alpha)G(x_{2}),\lambda,c)=-\infty, instead of the inequalities above one should apply [54, Theorem I.4.2]).

Denote by L(x,λ)=f(x)+λ,G(x)L(x,\lambda)=f(x)+\langle\lambda,G(x)\rangle the standard Lagrangian for the problem (𝒫)(\mathcal{P}), where λK\lambda\in K^{*}. It is easily seen that the function L(,λ)L(\cdot,\lambda) is convex. Recall that a vector λK\lambda_{*}\in K^{*} is called a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) at a feasible point xx_{*}, if 0xL(x,λ)+NQ(x)0\in\partial_{x}L(x_{*},\lambda_{*})+N_{Q}(x_{*}) and λ,G(x)=0\langle\lambda_{*},G(x_{*})\rangle=0, where xL(x,λ)\partial_{x}L(x_{*},\lambda_{*}) is the subdifferential of the function L(,λ)L(\cdot,\lambda_{*}) at xx_{*} in the sense of convex analysis and NQ(x)={xXx,xx0xQ}N_{Q}(x_{*})=\{x^{*}\in X^{*}\mid\langle x^{*},x-x_{*}\rangle\leq 0\>\forall x\in Q\} is the normal cone to the set QQ at xx_{*} (see, e.g. [6, Definition 3.5]). The existence of a Lagrange multiplier at xx_{*} is a sufficient, and in the case when 0int(G(Q)K)0\in\operatorname{int}(G(Q)-K) necessary, global optimality condition, and the set Λ\Lambda_{*} of Lagrange multipliers of the problem (𝒫)(\mathcal{P}) is a nonempty, convex, weak compact set that does not depend on an optimal solution xx_{*} (see [6, Theorem 3.6]). Furthermore, λ\lambda_{*} is a Lagrange multiplier at xx_{*} if and only if (x,λ)(x_{*},\lambda_{*}) is a saddle point of the Lagrangian L()L(\cdot):

supλKL(x,λ)L(x,λ)infxQL(x,λ).\sup_{\lambda\in K^{*}}L(x_{*},\lambda)\leq L(x_{*},\lambda_{*})\leq\inf_{x\in Q}L(x,\lambda_{*}).

Note finally that if L(,λ)L(\cdot,\lambda_{*}) is directionally differentiable at xx_{*}, then λ\lambda_{*} is a Lagrange multiplier at xx_{*} if and only if [L(,λ)](x,h)0[L(\cdot,\lambda_{*})]^{\prime}(x_{*},h)\geq 0 for all hTQ(x)h\in T_{Q}(x_{*}) and λ,G(x)=0\langle\lambda_{*},G(x_{*})\rangle=0, where [L(,λ)](x,h)[L(\cdot,\lambda_{*})]^{\prime}(x_{*},h) is the directional derivative of L(,λ)L(\cdot,\lambda_{*}) at xx_{*} in the direction hh, and TQ(x)T_{Q}(x_{*}) is the contingent cone to the set QQ at the point xx_{*} (cf. [6, Lemma 3.7]).

Let us now present a complete characterisation of the set of optimal dual solutions in terms of Lagrange multipliers in the convex case. Roughly speaking, this result shows that under suitable assumptions there is essentially no difference between standard duality theory, based on the Lagrangian $L(\cdot)$, and augmented duality theory, based on the augmented Lagrangian $\mathscr{L}(\cdot)$. For the sake of simplicity we will prove this result under the assumption that the functions $f$ and $G$ are directionally differentiable at a globally optimal solution of the problem $(\mathcal{P})$, although in various particular cases (e.g. in the case of inequality constrained problems) it can be proved without this assumption.

Theorem 5.

Let the following conditions hold true:

  1. 1.

    ff and QQ are convex, GG is convex with respect to the binary relation \preceq;

  2. 2.

    assumptions (A1)(A1), (A4)(A4), (A5)(A5), (A7)(A7), (A8)(A8), and (A11)(A11) hold true;

  3. 3.

    KΛK^{*}\subseteq\Lambda;

  4. 4.

    ff and GG are directionally differentiable at a globally optimal solution xx_{*} of the problem (𝒫)(\mathcal{P}).

Then a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) exists if and only if the zero duality gap property holds true and there exists a globally optimal solution (λ,c)(\lambda_{*},c_{*}) of the augmented dual problem (𝒟)(\mathcal{D}) with λK\lambda_{*}\in K^{*}.

Moreover, if a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) exists, then (λ,c)(\lambda_{*},c_{*}) with λK\lambda_{*}\in K^{*} is an optimal dual solution if and only if Φ0(λ)\Phi_{0}(\lambda_{*}) is a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) and c>0c_{*}>0, where Φ0\Phi_{0} is from assumption (A11)(A11).

Proof.

Suppose that a Lagrange multiplier $\lambda_{*}\in K^{*}$ exists. Then, as was noted above, $[L(\cdot,\lambda_{*})]^{\prime}(x_{*},h)\geq 0$ for all $h\in T_{Q}(x_{*})$. By assumption $(A11)$ the function $\Phi(\cdot,\lambda,c)$ is Fréchet differentiable, its Fréchet derivative is $D_{y}\Phi(y,\lambda,c)=\Phi_{0}(\lambda)$, and $\Phi_{0}$ maps $K^{*}$ onto $K^{*}$. Therefore, there exists $\mu_{*}\in K^{*}$ such that $\Phi_{0}(\mu_{*})=\lambda_{*}$. Applying the chain rule for directional derivatives one gets

[(,μ,c)](x,h)=f(x,h)+Φ0(μ),G(x,h)=[L(,λ)](x,h)0[\mathscr{L}(\cdot,\mu_{*},c)]^{\prime}(x_{*},h)=f^{\prime}(x_{*},h)+\langle\Phi_{0}(\mu_{*}),G^{\prime}(x_{*},h)\rangle=[L(\cdot,\lambda_{*})]^{\prime}(x_{*},h)\geq 0

for any c>0c>0 and hTQ(x)h\in T_{Q}(x_{*}). As was noted above, under the assumptions of the theorem the function (,μ,c)\mathscr{L}(\cdot,\mu_{*},c) is convex. Consequently, the inequality above implies that xx_{*} is a point of global minimum of (,μ,c)\mathscr{L}(\cdot,\mu_{*},c) on the set QQ. Recall that λ,G(x)=0\langle\lambda_{*},G(x_{*})\rangle=0, since λ\lambda_{*} is a Lagrange multiplier. Hence by assumption (A11)(A11) one has μ,G(x)=0\langle\mu_{*},G(x_{*})\rangle=0, and with the use of assumption (A4)(A4) one gets

f=f(x)=(x,μ,c)=infxQ(x,μ,c)=Θ(μ,c)c>0,f_{*}=f(x_{*})=\mathscr{L}(x_{*},\mu_{*},c)=\inf_{x\in Q}\mathscr{L}(x,\mu_{*},c)=\Theta(\mu_{*},c)\quad\forall c>0,

which by Proposition 1 means that the zero duality gap property is satisfied and (μ,c)(\mu_{*},c) with any c>0c>0 is an optimal dual solution.

Suppose now that the zero duality gap property holds true and there exists an optimal solution (μ,c)(\mu_{*},c_{*}) of the problem (𝒟)(\mathcal{D}) with μK\mu_{*}\in K^{*}. Then by Theorem 4 (see also Remark 15) the pair (x,μ)(x_{*},\mu_{*}) is a global saddle point of the augmented Lagrangian and c(μ)cc_{*}(\mu_{*})\leq c_{*}. Therefore, by the definition of global saddle point xx_{*} is a point of global minimum of (,μ,c)\mathscr{L}(\cdot,\mu_{*},c) on the set QQ and

(x,μ,c)=Θ(μ,c)=Θ(μ,c)=Θ=fc>c\mathscr{L}(x_{*},\mu_{*},c)=\Theta(\mu_{*},c)=\Theta(\mu_{*},c_{*})=\Theta_{*}=f_{*}\quad\forall c>c_{*} (21)

(here we used Proposition 3). Hence with the use of assumption (A11)(A11) one obtains that

0[(,μ,c)](x,h)=f(x,h)+Φ0(μ),G(x,h)=[L(,λ)](x,h)0\leq[\mathscr{L}(\cdot,\mu_{*},c)]^{\prime}(x_{*},h)=f^{\prime}(x_{*},h)+\langle\Phi_{0}(\mu_{*}),G^{\prime}(x_{*},h)\rangle=[L(\cdot,\lambda_{*})]^{\prime}(x_{*},h)

for any $h\in T_{Q}(x_{*})$, where $\lambda_{*}=\Phi_{0}(\mu_{*})$. Moreover, $\langle\mu_{*},G(x_{*})\rangle=0$, since otherwise by assumption $(A5)$ one has $\mathscr{L}(x_{*},\mu_{*},c)<f(x_{*})=f_{*}$, which contradicts (21); hence $\langle\lambda_{*},G(x_{*})\rangle=0$ by assumption $(A11)$. Thus, $\lambda_{*}$ is a Lagrange multiplier.

It remains to note that, as was shown above, if λ\lambda_{*} is a Lagrange multiplier, then for any c>0c>0 and μK\mu_{*}\in K^{*} such that Φ0(μ)=λ\Phi_{0}(\mu_{*})=\lambda_{*} the pair (μ,c)(\mu_{*},c) is an optimal dual solution. Conversely, if (μ,c)(\mu_{*},c_{*}) with μK\mu_{*}\in K^{*} is an optimal dual solution, then λ=Φ0(μ)\lambda_{*}=\Phi_{0}(\mu_{*}) is a Lagrange multiplier. ∎

Corollary 7.

Let ff and QQ be convex, GG be convex with respect to the binary relation \preceq, ff and GG be directionally differentiable at an optimal solution xx_{*} of the problem (𝒫)(\mathcal{P}), KΛK^{*}\subseteq\Lambda, and suppose that assumptions (A1)(A1)(A8)(A8) and (A11)(A11) hold true. Then a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) exists if and only if the zero duality gap property holds true and there exists an optimal dual solution. Moreover, if a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) exists, then (λ,c)(\lambda_{*},c_{*}) is a globally optimal solution of the dual problem (𝒟)(\mathcal{D}) if and only if Φ0(λ)\Phi_{0}(\lambda_{*}) is a Lagrange multiplier of the problem (𝒫)(\mathcal{P}) and c>0c_{*}>0.

Proof.

Let (λ,c)(\lambda_{*},c_{*}) be an optimal dual solution. Then by Theorem 4 the pair (x,λ)(x_{*},\lambda_{*}) is a global saddle point, and with the use of [22, Proposition 2] one can conclude that λK\lambda_{*}\in K^{*}. Thus, for any optimal dual solution (λ,c)(\lambda_{*},c_{*}) one has λK\lambda_{*}\in K^{*} and the claim of the corollary follows directly from Theorem 5. ∎

With the use of the previous corollary we can finally describe the structure of the set of optimal dual solutions 𝒟\mathscr{D}_{*}, the penalty map c()c_{*}(\cdot), and the set of augmented Lagrange multipliers 𝒜\mathscr{A}_{*} in the convex case.

Corollary 8.

Let the assumptions of the previous corollary be valid. Then 𝒜=Φ01(Λ)\mathscr{A}_{*}=\Phi_{0}^{-1}(\Lambda_{*}), c(λ)=0c_{*}(\lambda_{*})=0 for any λ𝒜\lambda_{*}\in\mathscr{A}_{*}, and the following equality holds true:

𝒟=Φ01(Λ)×(0,+).\mathscr{D}_{*}=\Phi_{0}^{-1}\big{(}\Lambda_{*}\big{)}\times(0,+\infty). (22)
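The following Python snippet illustrates Corollary 8 on the simple convex problem $\min\,(x-2)^{2}$ subject to $x-1\leq 0$, whose unique Lagrange multiplier is $\lambda_{*}=2$. If, as one expects for the Hestenes-Powell-Rockafellar augmented Lagrangian, $\Phi_{0}$ is the identity mapping, then equality (22) predicts that $(2,c)$ is an optimal dual solution for every $c>0$, i.e. $c_{*}(2)=0$. The grid-search evaluation below is an informal numerical check of this prediction, not a proof.

```python
import numpy as np

def hpr(x, lam, c):
    """Hestenes-Powell-Rockafellar augmented Lagrangian for the convex problem
    min (x - 2)^2  subject to  x - 1 <= 0."""
    m = np.maximum(x - 1.0, -lam / c)
    return (x - 2.0)**2 + lam * m + 0.5 * c * m**2

xs = np.linspace(-10.0, 10.0, 400001)
for c in [0.01, 0.1, 1.0, 10.0]:
    vals = hpr(xs, 2.0, c)  # lambda_* = 2 is the unique Lagrange multiplier
    print(f"c = {c:5.2f}:  Theta(2, c) = {vals.min():.6f},"
          f"  argmin = {xs[vals.argmin()]:.4f}")
# every tested c > 0 yields Theta(2, c) = f_* = 1 attained at x_* = 1,
# which is consistent with c_*(2) = 0
```

Even for very small values of the penalty parameter the dual value equals $f_{*}=1$ and the minimum is attained at $x_{*}=1$, in contrast with Example 21 below, where assumptions $(A5)$, $(A6)$, and $(A11)$ fail and the penalty map is not identically zero.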
Remark 20.

The fact that optimal dual solutions for convex problems do not depend on the penalty parameter motivates one to consider a slightly different augmented dual problem in the convex case:

\max_{\lambda}\>\Theta_{c}(\lambda)\quad\text{subject to }\lambda\in\Lambda. (23)

Here $\Theta_{c}(\lambda)=\Theta(\lambda,c)$ and $c>0$ is fixed, that is, the penalty parameter is not considered as a variable of the augmented dual problem, but rather as a fixed external parameter. Note that the function $\Theta_{c}(\cdot)$ is concave, provided assumption $(A9)$ holds true, which is satisfied for most particular augmented Lagrangians, in contrast to the much more restrictive assumption $(A9)_{s}$, which is satisfied, to the best of the author’s knowledge, only for Rockafellar-Wets’ augmented Lagrangian and, in particular, the Hestenes-Powell-Rockafellar augmented Lagrangian. Taking into account the concavity of the function $\Theta_{c}(\cdot)$ one can consider primal-dual augmented Lagrangian methods based on solving problem (23) instead of the augmented dual problem $(\mathcal{D})$, i.e. augmented Lagrangian methods with fixed penalty parameter. Convergence analysis of such methods is always based on the particular structure of the augmented Lagrangian under consideration (see the survey of such methods in [35]), which makes it difficult to extend such analysis to the general axiomatic augmented Lagrangian setting adopted in this article.

Let us show that in the case when assumptions $(A5)$, $(A6)$, and $(A11)$ are not satisfied, equality (22) might no longer be valid and the penalty map might not be identically equal to zero on $\operatorname{dom}c_{*}(\cdot)$, even when the problem $(\mathcal{P})$ is convex.

Example 21.

Let X=Y=X=Y=\mathbb{R}. Consider the following optimization problem:

minf(x)=xsubject tog(x)=x0.\min\>f(x)=-x\quad\text{subject to}\enspace g(x)=x\leq 0. (24)

Let ()\mathscr{L}(\cdot) be the sharp Lagrangian for this problem (i.e. the augmented Lagrangian from Example 1 with σ(y)=y\sigma(y)=\|y\|). Then

(x,λ,c)\displaystyle\mathscr{L}(x,\lambda,c) =f(x)+infp(,g(x)](λp+c|p|)\displaystyle=f(x)+\inf_{p\in(-\infty,-g(x)]}\big{(}-\lambda p+c|p|\big{)}
=x+{λx+c|x|,if λ>c(λ+c)max{0,x},if |λ|c,if λ<c,\displaystyle=-x+\begin{cases}\lambda x+c|x|,&\text{if }\lambda>c\\ (\lambda+c)\max\{0,x\},&\text{if }|\lambda|\leq c\\ -\infty,&\text{if }\lambda<-c,\end{cases}

and, as one can readily verify by carefully writing down all particular cases,

Θ(λ,c)={0,if c|λ1|,,otherwise.\Theta(\lambda,c)=\begin{cases}0,&\text{if }c\geq|\lambda-1|,\\ -\infty,&\text{otherwise.}\end{cases}

Consequently, one has

𝒟={(λ,c)2|c|λ1|},c(λ)=|λ1|λ,\mathscr{D}_{*}=\Big{\{}(\lambda,c)\in\mathbb{R}^{2}\Bigm{|}c\geq|\lambda-1|\Big{\}},\quad c_{*}(\lambda)=|\lambda-1|\quad\forall\lambda\in\mathbb{R},

that is, the claims of Corollary 8 do not hold true for the sharp Lagrangian (recall that this Lagrangian does not satisfy assumptions (A5)(A5), (A6)(A6), and (A11)(A11)).
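The penalty map in this example can also be traced numerically. The snippet below is an informal sketch: for several multipliers $\lambda$ it scans increasing values of $c$ and reports the smallest grid value of $c$ for which the computed dual value vanishes, thereby recovering the penalty map $c_{*}(\lambda)=|\lambda-1|$ computed above. On the bounded grid the value $\Theta(\lambda,c)=-\infty$ manifests itself as a large negative number.

```python
import numpy as np

def theta(lam, c, xs):
    """Grid evaluation of Theta(lam, c) for the sharp Lagrangian of problem (24)."""
    if lam < -c:
        return -np.inf
    if lam > c:
        L = -xs + lam * xs + c * np.abs(xs)
    else:
        L = -xs + (lam + c) * np.maximum(0.0, xs)
    return L.min()

xs = np.linspace(-50.0, 50.0, 200001)
cs = np.arange(0.05, 6.001, 0.05)
for lam in [-1.0, 0.0, 0.5, 2.0, 3.0]:
    c_star = next(c for c in cs if theta(lam, c, xs) > -1e-9)
    print(f"lambda = {lam: .1f}:  smallest grid c with Theta(lambda, c) = 0"
          f" is {c_star:.2f},  |lambda - 1| = {abs(lam - 1.0):.2f}")
```

The reported values match $c_{*}(\lambda)=|\lambda-1|$ up to the resolution of the grid of penalty parameters.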

5 Convergence analysis of augmented Lagrangian methods

The goal of this section is to prove general convergence theorems for a large class of augmented Lagrangian methods and to analyse interrelations between convergence of augmented Lagrangian methods, zero duality gap property, and the existence of global saddle points/optimal dual solutions. We aim at presenting such results that explicitly highlight this kind of interrelations, instead of implicitly using them within the proofs, as it is usually done in the literature.

5.1 Model augmented Lagrangian method

We present all theoretical results for the model augmented Lagrangian method given in Algorithm 1. In order to include various particular cases into the general theory, we do not specify a way in which the multipliers and the penalty parameter are updated by the method, that is, they can be updated in any way that satisfies certain assumptions presented below. This makes our results applicable to the vast majority of existing augmented Lagrangian methods, including methods with nonmonotone penalty parameter updates as in [4], methods based on maximizing the augmented dual function with the use of bundle methods as in [48], etc. However, one should underline that our convergence analysis is by no means universal and there are augmented Lagrangian methods to which it cannot be applied. We will briefly discuss some such methods further in this section (see Remark 21 below).

Initialization. Choose an initial value of the multipliers λ0Λ\lambda_{0}\in\Lambda, a minimal value of the penalty parameter cmin>0c_{min}>0, an initial value of the penalty parameter c0cminc_{0}\geq c_{\min}, and a sequence {εn}(0,+)\{\varepsilon_{n}\}\subset(0,+\infty) of tolerances. Put n=0n=0.
Step 1. Solution of subproblem. Find an εn\varepsilon_{n}-optimal solution xnx_{n} of the problem
min(x,λn,cn)subject toxQ,\min\>\mathscr{L}(x,\lambda_{n},c_{n})\quad\text{subject to}\quad x\in Q,
that is, find xnQx_{n}\in Q such that (xn,λn,cn)(x,λn,cn)+εn\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\mathscr{L}(x,\lambda_{n},c_{n})+\varepsilon_{n} for all xQx\in Q.
Step 2. Multiplier update. Choose some λn+1Λ\lambda_{n+1}\in\Lambda.
Step 3. Penalty parameter update. Choose some cn+1cminc_{n+1}\geq c_{min}. Set nn+1n\rightarrow n+1 and go to Step 1.
Algorithm 1 Model augmented Lagrangian method
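To give a computational counterpart to Algorithm 1, the following Python sketch instantiates it in the simplest possible setting: the Hestenes-Powell-Rockafellar augmented Lagrangian for finitely many inequality constraints, the classical first-order multiplier update $\lambda_{n+1}=\max\{0,\lambda_{n}+c_{n}g(x_{n})\}$, and a geometric penalty increase triggered by insufficient decrease of infeasibility. These update rules are illustrative choices only, since the model method deliberately leaves Steps 2 and 3 unspecified, and the grid search in Step 1 plays the role of an $\varepsilon_{n}$-optimal solution of the subproblem with $\varepsilon_{n}$ determined by the grid resolution.

```python
import numpy as np

def model_alm(f, g, xs, lam0, c0, c_min=1.0, rho=2.0, tau=0.5, iters=15):
    """One illustrative instantiation of the model augmented Lagrangian method
    (Algorithm 1) for min f(x) s.t. g_i(x) <= 0, based on the
    Hestenes-Powell-Rockafellar augmented Lagrangian."""
    lam, c = np.array(lam0, dtype=float), max(c0, c_min)
    viol_prev = np.inf
    for n in range(iters):
        # Step 1: eps_n-optimal solution of min_x L(x, lam_n, c_n) via grid search
        G = np.stack([gi(xs) for gi in g])
        M = np.maximum(G, -lam[:, None] / c)
        L = f(xs) + (lam[:, None] * M + 0.5 * c * M * M).sum(axis=0)
        x = xs[L.argmin()]
        gx = np.array([gi(x) for gi in g])
        # Step 2: classical first-order multiplier update
        lam = np.maximum(0.0, lam + c * gx)
        # Step 3: increase c when infeasibility has not decreased enough
        viol = np.linalg.norm(np.maximum(gx, 0.0))
        if viol > tau * viol_prev:
            c = max(rho * c, c_min)
        viol_prev = viol
        print(f"n={n:2d}  x={x: .4f}  lam={np.round(lam, 4)}  c={c:g}  viol={viol:.2e}")
    return x, lam, c

# convex test problem: min (x - 2)^2  subject to  x - 1 <= 0 (Lagrange multiplier 2)
model_alm(lambda x: (x - 2.0)**2, [lambda x: x - 1.0],
          np.linspace(-3.0, 3.0, 60001), lam0=[0.0], c0=1.0)
```

On this convex test problem the printed multipliers approach $\lambda_{*}=2$ and the penalty parameter settles down, so the assumptions $(B1)$–$(B4)$ introduced below can be inspected directly along the run.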

Let us comment on Step 1 of Algorithm 1. For the purposes of theoretical analysis of primal-dual augmented Lagrangian methods it is often assumed that the augmented Lagrangian subproblem

min(x,λn,cn)subject toxQ,\min\>\mathscr{L}(x,\lambda_{n},c_{n})\quad\text{subject to}\quad x\in Q,

is solved exactly, i.e. that xnx_{n} is a globally optimal solution of this problem [69, 50, 27, 8, 40, 41, 38, 13, 5]. Moreover, even when it is assumed that this subproblem is solved only approximately as in [15, 44, 71, 10, 7, 11, 16, 14], one almost always assumes that εn0\varepsilon_{n}\to 0 as nn\to\infty, and the case when εn\varepsilon_{n} does not tend to zero is not properly analysed (papers [45, 48] are very rare exceptions to this rule). However, from the practical point of view the assumption that εn0\varepsilon_{n}\to 0 as nn\to\infty cannot be satisfied, especially in the infinite dimensional case, due to round off errors, discretisation errors, etc. The value εn>0\varepsilon_{n}>0 should be viewed as an unavoidable error reflecting the overall precision with which computations can be performed that does not tend to zero with iterations. To take this unavoidable error into account, below we present a detailed analysis of the model augmented Lagrangian method without assuming that εn0\varepsilon_{n}\to 0 as nn\to\infty, and then show how corresponding convergence theorems can be clarified and strengthened by imposing this additional purely theoretical assumption.

It should also be noted that practical augmented Lagrangian methods must include stopping criteria. We do not include a stopping criterion in our formulation of the model augmented Lagrangian method, because we are interested only in its theoretical (asymptotic) analysis, that is, in the analysis of the way sequences {(xn,λn,cn)}\{(x_{n},\lambda_{n},c_{n})\} generated by this method behave as nn\to\infty. This asymptotic analysis can be used to devise appropriate stopping criteria for practical implementations of augmented Lagrangian methods.

Below we will utilise the following natural assumptions on the model augmented Lagrangian method and sequences generated by this method that are satisfied in many particular cases:

  • (B1)

    for any nn\in\mathbb{N} the function (,λn,cn)\mathscr{L}(\cdot,\lambda_{n},c_{n}) is bounded below on QQ;

  • (B2)

    the sequence of multipliers {λn}\{\lambda_{n}\} is bounded;

  • (B3)

    if the sequence of penalty parameters {cn}\{c_{n}\} is bounded, then one has dist(G(xn),K)0\operatorname{dist}(G(x_{n}),K)\to 0 as nn\to\infty; if, in addition, some subsequence {λnk}\{\lambda_{n_{k}}\} is bounded, then Φ(G(xnk),λnk,cnk)0\Phi(G(x_{n_{k}}),\lambda_{n_{k}},c_{n_{k}})\to 0 as kk\to\infty;

  • (B4)

    if the sequence of penalty parameters {cn}\{c_{n}\} is unbounded, then cn+c_{n}\to+\infty as nn\to\infty.

The assumption (B1)(B1) is a basic assumption for all primal-dual augmented Lagrangian methods, which is needed to ensure that the sequence {xn}\{x_{n}\} is correctly defined. The assumption (B2)(B2) is often imposed for the purposes of convergence analysis of augmented Lagrangian methods and, as is noted in [5], is usually satisfied in practice for traditional rules for updating multipliers. Moreover, various techniques can be used to guarantee the validity of assumption (B2)(B2), such as safeguarding and normalization of multipliers [44, 42, 5].

We formulate $(B3)$ as an assumption due to the fact that we do not impose any restrictions on the way in which the penalty parameter $c_{n}$ is updated. For many augmented Lagrangian methods, penalty parameter updates are specifically designed to ensure that assumption $(B3)$ is satisfied by default (see the rules for updating the penalty parameter and the corresponding convergence analysis in [44, 42, 71, 5] and other aforementioned papers on augmented Lagrangian methods). Finally, assumption $(B4)$ is needed only in the case of methods with nonmonotone penalty parameter updates. It excludes the undesirable situation of unboundedly increasing oscillations of the penalty parameter (e.g. $c_{2n}=n$ and $c_{2n+1}=1$ for all $n\in\mathbb{N}$), which cannot be properly analysed within our general augmented Lagrangian setting.

Remark 21.

(i) Assumption (B3)(B3) plays one of the key roles in our convergence analysis of the model augmented Lagrangian method. Therefore, this analysis is inapplicable to those methods for which assumption (B3)(B3) is not satisfied, such as the modified subgradient algorithm (the MSG) proposed by Gasimov [27] (the fact that assumption (B3)(B3) is not satisfied for the MSG in the general case follows from [8, Example 1]).

(ii) The convergence analysis of the model augmented Lagrangian method presented below heavily relies on the assumption on boundedness of the sequence of multipliers and cannot be applied in the case when the sequence {λn}\{\lambda_{n}\} does not have at least a bounded subsequence. However, there are primal-dual augmented Lagrangian methods for which one can prove convergence of the sequence {xn}\{x_{n}\} to the set of globally optimal solutions of the problem (𝒫)(\mathcal{P}) even in the case when the norm of the multipliers λn\lambda_{n} increases unboundedly with iterations. Augmented Lagrangian methods with the so-called conditional multiplier updating (see [20], [44, Algorithm 3], [42, Algorithm 3], [45, Algorithm 3]) and the algorithms from [71] are examples of such methods. The main idea behind these methods consists in designing multiplier and penalty parameter updating rules in such a way as to ensure that an increase of the norm of the multipliers λn\|\lambda_{n}\| is compensated by a sufficient increase of the penalty parameter cnc_{n}, so that one can prove that

limndist(G(xn),K)=0,limnΦ(G(xn),λn,cn)=0\lim_{n\to\infty}\operatorname{dist}(G(x_{n}),K)=0,\quad\lim_{n\to\infty}\Phi(G(x_{n}),\lambda_{n},c_{n})=0 (25)

even if λn+\|\lambda_{n}\|\to+\infty as nn\to\infty. It is possible to extend convergence analysis of these methods to our axiomatic augmented Lagrangian setting by either imposing some restrictive assumptions on the function Φ\Phi or directly assuming that relations (25) hold true. We do not present such extension here and leave it as a problem for future research.

5.2 Convergence analysis of the method

Let us now turn to convergence analysis. We start with two simple observations. The first one can be viewed as a generalization of some existing results (e.g. [27, Theorem 5] and [48, Theorem 1]) to the case of general cone constrained problems and arbitrary augmented Lagrangians satisfying a certain assumption.

Lemma 3.

Let the function yΦ(y,λ,c)y\mapsto\Phi(y,\lambda,c) be non-decreasing with respect to the binary relation \preceq for any λΛ\lambda\in\Lambda and c>0c>0, and let {(xn,λn,cn)}\{(x_{n},\lambda_{n},c_{n})\} be the sequence generated by the model augmented Lagrangian method. Then for any nn\in\mathbb{N} the point xnx_{n} is an εn\varepsilon_{n}-optimal solution of the problem

\min\>f(x)\quad\text{subject to }G(x)\preceq G(x_{n}),\quad x\in Q. (26)

Moreover, if G(xn)=0G(x_{n})=0, then xnx_{n} is an εn\varepsilon_{n}-optimal solution of the problem (𝒫)(\mathcal{P}).

Proof.

Suppose by contradiction that the claim of the lemma is false. Then one can find a feasible point xx of problem (26) such that f(xn)>f(x)+εnf(x_{n})>f(x)+\varepsilon_{n}. Hence by our assumption on the function Φ\Phi one gets

(x,λn,cn)=f(x)+Φ(G(x),λn,cn)\displaystyle\mathscr{L}(x,\lambda_{n},c_{n})=f(x)+\Phi(G(x),\lambda_{n},c_{n}) <f(xn)εn+Φ(G(xn),λn,cn)\displaystyle<f(x_{n})-\varepsilon_{n}+\Phi(G(x_{n}),\lambda_{n},c_{n})
=(xn,λn,cn)εn,\displaystyle=\mathscr{L}(x_{n},\lambda_{n},c_{n})-\varepsilon_{n},

which contradicts the definition of $x_{n}$.

Suppose now that G(xn)=0G(x_{n})=0. Recall that the inequality G(x)G(xn)G(x)\preceq G(x_{n}) means that G(xn)G(x)KG(x_{n})-G(x)\in-K or, equivalently, G(x)K+G(xn)=KG(x)\in K+G(x_{n})=K. Therefore the feasible region of the problem (𝒫)(\mathcal{P}) coincides with the feasible region of problem (26), which implies that an εn\varepsilon_{n}-optimal solution of this problem is also an εn\varepsilon_{n}-optimal solution of the problem (𝒫)(\mathcal{P}). ∎

Remark 22.

The assumption that the function yΦ(y,λ,c)y\mapsto\Phi(y,\lambda,c) is non-decreasing is satisfied for all particular augmented Lagrangians presented in this paper, except for the one from Example 15.

The second observation is connected with the augmented dual function. Recall that if the function Φ\Phi satisfies assumption (A1)(A1), then (x,λ,c)f(x)\mathscr{L}(x,\lambda,c)\leq f(x) for any feasible point xx. Therefore, the following result holds true.

Lemma 4.

Let assumption (A1)(A1) be valid. Then

Θ(λn,cn)(xn,λn,cn)Θ(λn,cn)+εnf+εnn.\Theta(\lambda_{n},c_{n})\leq\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\Theta(\lambda_{n},c_{n})+\varepsilon_{n}\leq f_{*}+\varepsilon_{n}\quad\forall n\in\mathbb{N}.

Thus, if the sequence {Θ(λn,cn)}\{\Theta(\lambda_{n},c_{n})\} is bounded below, then the corresponding sequence {(xn,λn,cn)}\{\mathscr{L}(x_{n},\lambda_{n},c_{n})\} is bounded. Let us analyse how these sequences behave in the limit. As we will see below, this analysis is a key ingredient in the global convergence theory of augmented Lagrangian methods.

The following cumbersome technical result, which can be viewed as a partial generalization of Theorem 2, plays a key role in our convergence analysis of the model augmented Lagrangian method. The proof of this result is very similar to the proof of Theorem 2, and we include it only for the sake of completeness.

Lemma 5.

Let assumptions (A1)(A1), (A13)s(A13)_{s}, and (A14)s(A14)_{s} hold true. Suppose also that a sequence {(xn,λn,cn)}Q×domΘ\{(x_{n},\lambda_{n},c_{n})\}\subset Q\times\operatorname{dom}\Theta is such that:

  1. 1.

$\operatorname{dist}(G(x_{n}),K)\to 0$ as $n\to\infty$,

  2. 2.

    the sequence {λn}\{\lambda_{n}\} is bounded,

  3. 3.

    cn+c_{n}\to+\infty as nn\to\infty,

  4. 4.

    the sequence {(xn,λn,cn)Θ(λn,cn)}\{\mathscr{L}(x_{n},\lambda_{n},c_{n})-\Theta(\lambda_{n},c_{n})\} is bounded above.

Let finally ε=lim supn((xn,λn,cn)Θ(λn,cn))\varepsilon_{*}=\limsup_{n\to\infty}(\mathscr{L}(x_{n},\lambda_{n},c_{n})-\Theta(\lambda_{n},c_{n})). Then

ϑεlim infnΘ(λn,cn)\displaystyle\vartheta_{*}-\varepsilon_{*}\leq\liminf_{n\to\infty}\Theta(\lambda_{n},c_{n}) lim supnΘ(λn,cn)ϑ\displaystyle\leq\limsup_{n\to\infty}\Theta(\lambda_{n},c_{n})\leq\vartheta_{*} (27)
ϑlim infn(xn,λn,cn)\displaystyle\vartheta_{*}\leq\liminf_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n}) lim supn(xn,λn,cn)ϑ+ε\displaystyle\leq\limsup_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\vartheta_{*}+\varepsilon_{*} (28)
ϑlim infnf(xn)\displaystyle\vartheta_{*}\leq\liminf_{n\to\infty}f(x_{n}) lim supnf(xn)ϑ+ε.\displaystyle\leq\limsup_{n\to\infty}f(x_{n})\leq\vartheta_{*}+\varepsilon_{*}. (29)

where ϑ=min{f,lim infp0β(p)}\vartheta_{*}=\min\big{\{}f_{*},\liminf_{p\to 0}\beta(p)\big{\}}.

Proof.

Note that the upper estimate for the limit superior in (29) follows directly from the upper estimate in (28) and assumption (A13)s(A13)_{s}. In turn, the lower estimate for the limit inferior in (29) follows directly from the fact that dist(G(xn),K)0\operatorname{dist}(G(x_{n}),K)\to 0 as nn\to\infty.

Indeed, if some subsequence {xnk}\{x_{n_{k}}\} is feasible for the problem (𝒫)(\mathcal{P}), then obviously lim infkf(xnk)f\liminf_{k\to\infty}f(x_{n_{k}})\geq f_{*}. In turn, if each member of a subsequence {xnk}\{x_{n_{k}}\} is infeasible for the problem (𝒫)(\mathcal{P}), then f(xnk)β(pk)f(x_{n_{k}})\geq\beta(p_{k}) for any pkYp_{k}\in Y such that G(xnk)pkKG(x_{n_{k}})-p_{k}\in K. Since dist(G(xnk),K)0\operatorname{dist}(G(x_{n_{k}}),K)\to 0 as kk\to\infty, one can choose pkYp_{k}\in Y, kk\in\mathbb{N}, such that pk0p_{k}\to 0 as kk\to\infty. Consequently, one has

lim infkf(xnk)lim infkβ(pk)lim infp0β(p),\liminf_{k\to\infty}f(x_{n_{k}})\geq\liminf_{k\to\infty}\beta(p_{k})\geq\liminf_{p\to 0}\beta(p),

which obviously implies that the lower estimate in (29) holds true.

Thus, we need to prove only inequalities (27) and (28). Let us consider two cases.

Case I. Suppose that there exists a subsequence {xnk}\{x_{n_{k}}\} that is feasible for the problem (𝒫)(\mathcal{P}). Then with the use of Lemma 1 one gets

lim infk(xnk,λnk,cnk)\displaystyle\liminf_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}}) lim infk(f(xnk)+infλB(0,η)ΛinfyKΦ(y,λ,cnk))\displaystyle\geq\liminf_{k\to\infty}\Big{(}f(x_{n_{k}})+\inf_{\lambda\in B(0,\eta)\cap\Lambda}\inf_{y\in K}\Phi(y,\lambda,c_{n_{k}})\Big{)}
lim infkf(xnk)f,\displaystyle\geq\liminf_{k\to\infty}f(x_{n_{k}})\geq f_{*},

where η=supkλnk\eta=\sup_{k}\|\lambda_{n_{k}}\|.

Case II. Suppose now that there exists a subsequence {xnk}\{x_{n_{k}}\} such that G(xnk)KG(x_{n_{k}})\notin K for all kk\in\mathbb{N}. Then with the use of assumption (A13)s(A13)_{s} one gets

lim infk(xnk,λnk,cnk)lim infkf(xnk)lim infkβ(pk)lim infp0β(p),\liminf_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\geq\liminf_{k\to\infty}f(x_{n_{k}})\geq\liminf_{k\to\infty}\beta(p_{k})\geq\liminf_{p\to 0}\beta(p),

where {pk}Y\{p_{k}\}\subset Y is any sequence such that G(xnk)pkKG(x_{n_{k}})-p_{k}\in K for all kk\in\mathbb{N} and pk0\|p_{k}\|\to 0 as kk\to\infty (clearly, such sequence exists, since dist(G(xn),K)0\operatorname{dist}(G(x_{n}),K)\to 0 as nn\to\infty).

Combining the two cases one gets that the lower estimates for the limit inferiors in (27) and (28) hold true. Let us now prove the upper estimates for the limit superiors. Note that the upper estimate in (28) follows directly from the upper estimate in (27) and Lemma 4. Therefore, it suffices to prove only the upper estimate for the limit superior of $\{\Theta(\lambda_{n},c_{n})\}$.

By Proposition 1 one has Θ(λn,cn)f\Theta(\lambda_{n},c_{n})\leq f_{*} for all nn\in\mathbb{N}, which implies that lim supnΘ(λn,cn)f\limsup_{n\to\infty}\Theta(\lambda_{n},c_{n})\leq f_{*}. If β:=lim infp0β(p)f\beta_{*}:=\liminf_{p\to 0}\beta(p)\geq f_{*}, then the proof is complete. Therefore, suppose that β<f\beta_{*}<f_{*}.

By the definition of limit inferior there exists a sequence {pk}Y\{p_{k}\}\subset Y such that pk0p_{k}\to 0 and β(pk)β\beta(p_{k})\to\beta_{*} as kk\to\infty. Let {tn}\{t_{n}\} be the sequence from assumption (A14)s(A14)_{s}. Since pk0p_{k}\to 0 as kk\to\infty, there exists a subsequence {pkn}\{p_{k_{n}}\} such that pkntn\|p_{k_{n}}\|\leq t_{n} for all nn\in\mathbb{N}.

By the definition of the optimal value function $\beta$ for any $n\in\mathbb{N}$ one can find $z_{n}\in Q$ such that $G(z_{n})-p_{k_{n}}\in K$ (which implies that $\operatorname{dist}(G(z_{n}),K)\leq t_{n}$) and $f(z_{n})\leq\beta(p_{k_{n}})+1/n$, if $\beta(p_{k_{n}})>-\infty$, and $f(z_{n})\leq-n$ otherwise. By applying assumption $(A14)_{s}$, one gets that

lim supnΘ(λn,cn)lim supn(zn,λn,cn)\displaystyle\limsup_{n\to\infty}\Theta(\lambda_{n},c_{n})\leq\limsup_{n\to\infty}\mathscr{L}(z_{n},\lambda_{n},c_{n}) =lim supnf(zn)\displaystyle=\limsup_{n\to\infty}f(z_{n})
=limnβ(pkn)=lim infp0β(p),\displaystyle=\lim_{n\to\infty}\beta(p_{k_{n}})=\liminf_{p\to 0}\beta(p),

which means that the upper estimate in (27) holds true. ∎

Remark 23.

The claim of the lemma above (as well as the claim of Theorem 6 based on that lemma) remains valid if only restricted versions of assumptions $(A13)_{s}$ and $(A14)_{s}$ are satisfied and one additionally assumes that the projection of the set $G(Q)$ onto the cone $K$ is bounded (see Remark 9).

Corollary 9.

Let assumptions (A1)(A1), (A13)s(A13)_{s}, and (A14)s(A14)_{s} hold true. Let also a sequence {(λn,cn)}domΘ\{(\lambda_{n},c_{n})\}\subset\operatorname{dom}\Theta be such that the sequence {λn}\{\lambda_{n}\} is bounded and cn+c_{n}\to+\infty as nn\to\infty. Suppose finally that there exists a sequence {zn}Q\{z_{n}\}\subset Q such that dist(G(zn),K)0\operatorname{dist}(G(z_{n}),K)\to 0 and ((zn,λn,cn)Θ(λn,cn))0(\mathscr{L}(z_{n},\lambda_{n},c_{n})-\Theta(\lambda_{n},c_{n}))\to 0 as nn\to\infty. Then

limnΘ(λn,cn)=min{f,lim infp0β(p)}\lim_{n\to\infty}\Theta(\lambda_{n},c_{n})=\min\big{\{}f_{*},\liminf_{p\to 0}\beta(p)\big{\}} (30)
Proof.

Apply the previous lemma to the sequence {(xn,λn,cn)}\{(x_{n},\lambda_{n},c_{n})\} with xn=znx_{n}=z_{n} for all nn\in\mathbb{N}. ∎

Remark 24.

The corollary above strengthens Lemma 5. Namely, it states that if there exists a sequence $\{z_{n}\}$ satisfying the assumptions of the corollary, then one can actually replace the lower and upper estimates (27) with equality (30). Note also that by the definition of the augmented dual function there always exists a sequence $\{z_{n}\}\subset Q$ such that $(\mathscr{L}(z_{n},\lambda_{n},c_{n})-\Theta(\lambda_{n},c_{n}))\to 0$ as $n\to\infty$. The main assumption of the corollary is that one can find a sequence $\{z_{n}\}$ not only satisfying this condition, but also such that $\operatorname{dist}(G(z_{n}),K)\to 0$ as $n\to\infty$.

Let us also provide necessary and sufficient conditions for the sequence {dist(G(xn),K)}\{\operatorname{dist}(G(x_{n}),K)\} to converge to zero.

Lemma 6.

Let assumptions (A1)(A1), (A7)(A7), and (A12)s(A12)_{s} hold true and a sequence {(xn,λn,cn)}Q×domΘ\{(x_{n},\lambda_{n},c_{n})\}\subset Q\times\operatorname{dom}\Theta be such that:

  1. 1.

    the sequence {λn}\{\lambda_{n}\} is bounded,

  2. 2.

    cn+c_{n}\to+\infty as nn\to\infty,

  3. 3.

    the sequence {(xn,λn,cn)Θ(λn,cn)}\{\mathscr{L}(x_{n},\lambda_{n},c_{n})-\Theta(\lambda_{n},c_{n})\} is bounded above,

  4. 4.

    there exists τ>0\tau>0 such that the function infnΦ(G(),λn,τ)\inf_{n}\Phi(G(\cdot),\lambda_{n},\tau) is bounded below on QQ.

Then for the sequence $\{\operatorname{dist}(G(x_{n}),K)\}$ to converge to zero it is sufficient that the sequence $\{f(x_{n})\}$ be bounded below. This condition becomes necessary when $\liminf_{p\to 0}\beta(p)>-\infty$.

Proof.

If only a finite number of elements of the sequence $\{x_{n}\}$ are infeasible for the problem $(\mathcal{P})$, then the claim of the lemma is trivial, since in this case $G(x_{n})\in K$ and $f(x_{n})\geq f_{*}$ for any sufficiently large $n$. Therefore, replacing, if necessary, the sequence $\{x_{n}\}$ with a subsequence, one can suppose that $G(x_{n})\notin K$ for all $n\in\mathbb{N}$.

Sufficiency. Denote εn=(xn,λn,cn)Θ(λn,cn)\varepsilon_{n}=\mathscr{L}(x_{n},\lambda_{n},c_{n})-\Theta(\lambda_{n},c_{n}) and

\overline{\varepsilon}=\sup_{n}\varepsilon_{n}<+\infty,\quad\underline{f}=\inf_{n}f(x_{n})>-\infty,\quad\underline{\Phi}=\inf_{n}\inf_{x\in Q}\Phi(G(x),\lambda_{n},\tau)>-\infty.

By Lemma 4 one has

(xn,λn,cn)Θ(λn,cn)+εnf+ε¯n,\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\Theta(\lambda_{n},c_{n})+\varepsilon_{n}\leq f_{*}+\overline{\varepsilon}\quad\forall n\in\mathbb{N}, (31)

that is, the sequence {(xn,λn,cn)}\{\mathscr{L}(x_{n},\lambda_{n},c_{n})\} is bounded above.

Fix any r>0r>0. Due to the boundedness of the sequence {λn}\{\lambda_{n}\} and assumption (A12)s(A12)_{s} there exists t(r)τt(r)\geq\tau such that for any ct(r)c\geq t(r), nn\in\mathbb{N}, and xEnx\in E_{n} one has

Φ(G(x),λn,c)Φ(G(x),λn,τ)ff¯+ε¯+1Φ¯,\Phi(G(x),\lambda_{n},c)-\Phi(G(x),\lambda_{n},\tau)\geq f_{*}-\underline{f}+\overline{\varepsilon}+1-\underline{\Phi},

which implies that Φ(G(x),λn,c)ff¯+ε¯+1\Phi(G(x),\lambda_{n},c)\geq f_{*}-\underline{f}+\overline{\varepsilon}+1, where

E_{n}:=\Big\{x\in Q\Bigm|\operatorname{dist}(G(x),K)\geq r,\>\Phi(G(x),\lambda_{n},\tau)<+\infty\Big\}.

Therefore, for any nn\in\mathbb{N} such that cnt(r)c_{n}\geq t(r) and dist(G(xn),K)r\operatorname{dist}(G(x_{n}),K)\geq r one has

(xn,λn,cn)=f(xn)+Φ(G(xn),λn,cn)f+ε¯+1,\mathscr{L}(x_{n},\lambda_{n},c_{n})=f(x_{n})+\Phi(G(x_{n}),\lambda_{n},c_{n})\geq f_{*}+\overline{\varepsilon}+1,

which contradicts (31). Consequently, for any nn\in\mathbb{N} such that cnt(r)c_{n}\geq t(r) one has dist(G(xn),K)<r\operatorname{dist}(G(x_{n}),K)<r, which implies that dist(G(xn),K)0\operatorname{dist}(G(x_{n}),K)\to 0 as nn\to\infty.

Necessity. Suppose by contradiction that $\operatorname{dist}(G(x_{n}),K)\to 0$, but the sequence $\{f(x_{n})\}$ is unbounded below. Then for any sequence $\{p_{n}\}\subset Y$ such that $G(x_{n})-p_{n}\in K$ and $p_{n}\to 0$ as $n\to\infty$ (at least one such sequence exists, since $\operatorname{dist}(G(x_{n}),K)\to 0$ as $n\to\infty$) one has

=lim infnf(xn)lim infnβ(pn)lim infp0β(p),-\infty=\liminf_{n\to\infty}f(x_{n})\geq\liminf_{n\to\infty}\beta(p_{n})\geq\liminf_{p\to 0}\beta(p),

which contradicts our assumption. ∎

Remark 25.

(i) The last assumption of the lemma is satisfied, in particular, if for any bounded set $\Lambda_{0}\subset\Lambda$ there exists $\tau>0$ such that the function $\inf_{\lambda\in\Lambda_{0}}\Phi(\cdot,\lambda,\tau)$ is bounded below on $Y$. Under appropriate additional assumptions, this condition is satisfied for all particular examples of augmented Lagrangians from Section 3, except for He-Wu-Meng’s augmented Lagrangian (Example 13). Namely, in the case of Rockafellar-Wets’ augmented Lagrangian (Example 1) one needs to additionally assume that $\sigma(\cdot)\geq\sigma_{0}\|\cdot\|^{\alpha}$ for some $\sigma_{0}>0$ and $\alpha\geq 1$, while in the case of the (penalized) exponential-type augmented Lagrangians (Examples 8, 9, 15, 17, and 18) and the hyperbolic-type augmented Lagrangian (Example 11) one needs to additionally assume that the function $\phi$/$\psi$ is bounded below. In all other examples the assumption on the boundedness below of the function $\Phi$ is satisfied without any additional assumptions.

(ii) The last assumption of the lemma is satisfied for He-Wu-Meng’s augmented Lagrangian and the (penalized) exponential/hyperbolic-type augmented Lagrangians with unbounded below functions ϕ\phi/ψ\psi, if the projection of the set G(Q)G(Q) onto KK is bounded. In particular, in the case of inequality constrained problems it is sufficient to suppose that the functions gig_{i}, defining the constraints gi(x)0g_{i}(x)\leq 0, are bounded below. As was noted in Remark 9, this assumption is not restrictive from the theoretical point of view.

(iii) It should be noted that we used the last assumption of the lemma in order to implicitly prove that assumption $(A12)_{s}$ implies that

$$\lim_{c\to\infty}\inf_{n\in\mathbb{N}}\inf\big\{\Phi(y,\lambda_{n},c)\bigm|y\in Y\colon\operatorname{dist}(y,K)\geq r\big\}=+\infty\qquad(32)$$

for any $r>0$. Therefore one might wonder whether it would be better to formulate (32) as a basic assumption and use it instead of assumption $(A12)_{s}$ and the assumption on the boundedness below of the function $\Phi(G(\cdot),\lambda_{n},\tau)$. Note, however, that in most particular cases the boundedness below of the function $\Phi$ is a necessary condition for (32) to hold true. In particular, one can easily check that if a separable augmented Lagrangian (see (6)) satisfies condition (32), then each function $\Phi_{i}$ is bounded below. That is why we opted to use assumption $(A12)_{s}$ along with the assumption on the boundedness below of the function $\Phi$ instead of condition (32).
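To indicate the kind of reasoning behind the last claim, here is a sketch under additional simplifying assumptions (ours, purely for illustration): suppose that $Y=Y_{1}\times\dots\times Y_{m}$ and $K=K_{1}\times\dots\times K_{m}$ with a product distance satisfying $\operatorname{dist}(y,K)\geq\operatorname{dist}(y_{j},K_{j})$ for each $j$, and that the separable augmented Lagrangian from (6) has the form $\Phi(y,\lambda,c)=\sum_{j=1}^{m}\Phi_{j}(y_{j},\lambda_{j},c)$. Fix $i$ and choose $\widehat{y}_{j}$, $j\neq i$, with $\operatorname{dist}(\widehat{y}_{j},K_{j})\geq r$ and $\Phi_{j}(\widehat{y}_{j},\lambda_{j},c)<+\infty$. Then every $y$ with $y_{j}=\widehat{y}_{j}$ for $j\neq i$ satisfies $\operatorname{dist}(y,K)\geq r$, so (32) yields, for all sufficiently large $c$,

$$\Phi_{i}(y_{i},\lambda_{i},c)\geq\inf\big\{\Phi(y,\lambda,c)\bigm|\operatorname{dist}(y,K)\geq r\big\}-\sum_{j\neq i}\Phi_{j}(\widehat{y}_{j},\lambda_{j},c)>-\infty\quad\forall y_{i}\in Y_{i},$$

that is, $\Phi_{i}(\cdot,\lambda_{i},c)$ is bounded below for such $c$.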

Now we are ready to estimate the limits of the sequence $\{\mathscr{L}(x_{n},\lambda_{n},c_{n})\}$ and of the corresponding sequences $\{\Theta(\lambda_{n},c_{n})\}$ and $\{f(x_{n})\}$ of augmented dual and objective function values for sequences $\{(x_{n},\lambda_{n},c_{n})\}$ generated by the model augmented Lagrangian method. Recall that $\Theta_{*}$ is the optimal value of the augmented dual problem.

Theorem 6 (main convergence theorem).

Let $\{(x_{n},\lambda_{n},c_{n})\}$ be the sequence generated by the model augmented Lagrangian method, and suppose that the following conditions are satisfied:

  1. assumptions $(A1)$, $(A7)$, $(A12)_{s}$–$(A14)_{s}$, and $(A15)$ hold true;

  2. assumptions $(B1)$–$(B4)$ hold true;

  3. the sequence $\{\varepsilon_{n}\}$ is bounded and $\limsup_{n\to\infty}\varepsilon_{n}=\varepsilon_{*}$;

  4. for any bounded set $\Lambda_{0}\subset\Lambda$ there exists $\tau>0$ such that the function $\inf_{\lambda\in\Lambda_{0}}\Phi(G(\cdot),\lambda,\tau)$ is bounded below on $Q$.

Then in the case when the sequence $\{c_{n}\}$ is bounded one has $\operatorname{dist}(G(x_{n}),K)\to 0$ as $n\to\infty$, and in the case when the sequence $\{c_{n}\}$ is unbounded one has $\operatorname{dist}(G(x_{n}),K)\to 0$ if and only if the sequence $\{f(x_{n})\}$ is bounded below. Furthermore, if the sequence $\{f(x_{n})\}$ is bounded below, then

$$\Theta_{*}-\varepsilon_{*}\leq\liminf_{n\to\infty}\Theta(\lambda_{n},c_{n})\leq\limsup_{n\to\infty}\Theta(\lambda_{n},c_{n})\leq\Theta_{*},\qquad(33)$$
$$\Theta_{*}\leq\liminf_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\limsup_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\Theta_{*}+\varepsilon_{*},\qquad(34)$$
$$\Theta_{*}\leq\liminf_{n\to\infty}f(x_{n})\leq\limsup_{n\to\infty}f(x_{n})\leq\Theta_{*}+\varepsilon_{*}.\qquad(35)$$
Proof.

If the sequence $\{c_{n}\}$ is unbounded, then by assumption $(B4)$ one has $c_{n}\to+\infty$ as $n\to\infty$, and the claim of the theorem follows directly from Lemmas 5 and 6 (the fact that $\lim_{p\to 0}\beta(p)>-\infty$ follows directly from Remark 8). Therefore, suppose that the sequence of penalty parameters $\{c_{n}\}$ is bounded. Note that inequalities (33) in this case follow from the first inequality in (34) and the definition of $\Theta_{*}$. Note further that by assumption $(B3)$ one has $\operatorname{dist}(G(x_{n}),K)\to 0$ and $\Phi(G(x_{n}),\lambda_{n},c_{n})\to 0$ as $n\to\infty$. Consequently, one has

$$\liminf_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})=\liminf_{n\to\infty}f(x_{n}),\qquad\limsup_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})=\limsup_{n\to\infty}f(x_{n}).$$

Thus, it is sufficient to prove either of the inequalities (34) and (35). We divide the rest of the proof into two parts.

Part 1. Lower estimate. Let $\{x_{n_{k}}\}$ be any subsequence such that

$$\lim_{k\to\infty}f(x_{n_{k}})=\liminf_{n\to\infty}f(x_{n})$$

(at least one such subsequence exists by the definition of the limit inferior). Suppose at first that there exists a subsequence of the sequence $\{x_{n_{k}}\}$, which we denote again by $\{x_{n_{k}}\}$, that is feasible for the problem $(\mathcal{P})$. Then $f(x_{n_{k}})\geq f_{*}$ for all $k\in\mathbb{N}$ and the lower estimate for the limit inferior in (35) holds true by Proposition 1.

Suppose now that $x_{n_{k}}$ is infeasible for the problem $(\mathcal{P})$ for all $k$ greater than some $k_{0}\in\mathbb{N}$. Since $\operatorname{dist}(G(x_{n}),K)\to 0$ as $n\to\infty$, for any $k\geq k_{0}$ one can find $p_{k}\in Y$ such that $G(x_{n_{k}})-p_{k}\in K$ and $p_{k}\to 0$ as $k\to\infty$. Consequently, $f(x_{n_{k}})\geq\beta(p_{k})$ for all $k\geq k_{0}$ and

$$\lim_{k\to\infty}f(x_{n_{k}})\geq\liminf_{k\to\infty}\beta(p_{k})\geq\liminf_{p\to 0}\beta(p),$$

which by Theorem 2 implies that the lower estimate for the limit inferior in (35) is valid.

Part 2. Upper estimate. Suppose at first that $f_{*}\leq\liminf_{p\to 0}\beta(p)$. Then by Lemma 4 and Theorem 2 one has

$$\limsup_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq f_{*}+\varepsilon_{*}=\Theta_{*}+\varepsilon_{*},$$

that is, the upper estimate for the limit superior in (34) holds true.

Let us now consider the case $f_{*}>\liminf_{p\to 0}\beta(p)=:\beta_{*}$. By the definition of the limit inferior there exists a sequence $\{p_{n}\}\subset Y$ such that $p_{n}\to 0$ and $\beta(p_{n})\to\beta_{*}$ as $n\to\infty$. Note that $\beta_{*}>-\infty$ by Remark 8 and, therefore, one can suppose that $\beta(p_{n})>-\infty$ for all $n\in\mathbb{N}$.

By the definition of the optimal value function one can find a sequence $\{z_{n}\}\subset Q$ such that $G(z_{n})-p_{n}\in K$ and $f(z_{n})\leq\beta(p_{n})+1/n$ for all $n\in\mathbb{N}$. Hence, in particular, $\operatorname{dist}(G(z_{n}),K)\to 0$ as $n\to\infty$, which by assumption $(A15)$ implies that $\limsup_{n\to\infty}\Phi(G(z_{n}),\lambda_{n},c_{n})\leq 0$. Consequently, one has

$$\limsup_{n\to\infty}\Theta(\lambda_{n},c_{n})\leq\limsup_{n\to\infty}\mathscr{L}(z_{n},\lambda_{n},c_{n})\leq\lim_{n\to\infty}f(z_{n})=\lim_{n\to\infty}\beta(p_{n})=\beta_{*}.$$

Recall that by the definition of $x_{n}$ one has $\mathscr{L}(x_{n},\lambda_{n},c_{n})\leq\Theta(\lambda_{n},c_{n})+\varepsilon_{n}$. Therefore the inequality above, along with Theorem 2, implies that the upper estimate for the limit superior in (34) holds true. ∎

Remark 26.

(i) The previous theorem can be slightly generalized in the following way. Namely, suppose that assumption $(B2)$ does not hold true, but there exists a bounded subsequence $\{\lambda_{n_{k}}\}$. Then the claim of Theorem 6 holds true for the corresponding subsequence $\{(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\}$. In the case when the sequence $\{c_{n}\}$ is unbounded, one simply needs to apply Lemmas 5 and 6 to this subsequence. In the case when the sequence $\{c_{n}\}$ is bounded, one just needs to repeat the proof of the theorem with the sequence $\{(x_{n},\lambda_{n},c_{n})\}$ replaced by the subsequence $\{(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\}$.

(ii) Note that from the proof of Theorem 6 it follows that the sequence $\{f(x_{n})\}$ is always bounded below in the case when the sequence $\{c_{n}\}$ is bounded.

(iii) In many papers on augmented Lagrangians and augmented Lagrangian methods it is assumed by default that the function $f$ is bounded below on the set $Q$ (see [44, Assumption 1], [42, Assumption 2.3], [39, Assumption 1], [71, Assumption 1], [45, Assumption 1], [70, Assumption (1)], etc.). From the theoretical point of view this assumption is not restrictive, since one can always replace the objective function $f$ with $e^{f(\cdot)}$, which does not change the set of globally optimal solutions because $t\mapsto e^{t}$ is strictly increasing. This assumption ensures that $\operatorname{dist}(G(x_{n}),K)\to 0$ as $n\to\infty$ for sequences $\{(x_{n},\lambda_{n},c_{n})\}$ generated by the model augmented Lagrangian method, regardless of whether the sequence of penalty parameters is bounded or not. Furthermore, in the case when the sequence $\{c_{n}\}$ increases unboundedly, this assumption guarantees that estimates (33) can be replaced with equality (30) by Corollary 9 and Lemma 6.

Corollary 10.

Let the assumptions of Theorem 6 hold true, and suppose that $\varepsilon_{n}\to 0$ as $n\to\infty$. Then

$$\lim_{n\to\infty}\Theta(\lambda_{n},c_{n})=\lim_{n\to\infty}\mathscr{L}(x_{n},\lambda_{n},c_{n})=\lim_{n\to\infty}f(x_{n})=\Theta_{*}.$$

5.3 Primal convergence

Now we are ready to prove general theorems on the convergence of the sequence $\{x_{n}\}$ generated by the model augmented Lagrangian method. Denote by $\Delta_{*}=f_{*}-\Theta_{*}$ the duality gap between the primal problem $(\mathcal{P})$ and the augmented dual problem $(\mathcal{D})$.

Theorem 7 (primal convergence vs. duality gap).

Let the assumptions of Theorem 6 be valid, the sequence $\{f(x_{n})\}$ be bounded below, and the functions $f$ and $\operatorname{dist}(G(\cdot),K)$ be lsc on $Q$. Then the sequence $\{x_{n}\}$ has limit points only if $\Delta_{*}\leq\varepsilon_{*}$ (that is, only if the duality gap does not exceed the tolerance with which the augmented Lagrangian subproblems are solved). Furthermore, all limit points of the sequence $\{x_{n}\}$ (if such points exist) are $(\varepsilon_{*}-\Delta_{*})$-optimal solutions of the problem $(\mathcal{P})$.

Proof.

Suppose that there exists a limit point $x_{*}$ of the sequence $\{x_{n}\}$, i.e. there exists a subsequence $\{x_{n_{k}}\}$ that converges to $x_{*}$. Recall that $\operatorname{dist}(G(x_{n}),K)\to 0$ as $n\to\infty$ and

$$\limsup_{n\to\infty}f(x_{n})\leq\Theta_{*}+\varepsilon_{*}$$

by Theorem 6. Hence, taking into account the semicontinuity assumptions, one can conclude that $\operatorname{dist}(G(x_{*}),K)=0$, i.e. $x_{*}$ is feasible for the problem $(\mathcal{P})$, and

$$f_{*}\leq f(x_{*})\leq\Theta_{*}+\varepsilon_{*}\leq f_{*}+\varepsilon_{*}-\Delta_{*}.$$

Therefore, $\Delta_{*}\leq\varepsilon_{*}$ and $x_{*}$ is an $(\varepsilon_{*}-\Delta_{*})$-optimal solution of the problem $(\mathcal{P})$. ∎

Corollary 11 (primal convergence vs. zero duality gap).

If, under the assumptions of the previous theorem, $\varepsilon_{n}\to 0$ as $n\to\infty$, then for the sequence $\{x_{n}\}$ to have limit points it is necessary that there is zero duality gap between the primal problem $(\mathcal{P})$ and the augmented dual problem $(\mathcal{D})$. Furthermore, in this case all limit points of the sequence $\{x_{n}\}$ (if such points exist) are globally optimal solutions of the problem $(\mathcal{P})$.

In the case when the space $X$ is reflexive (in particular, finite dimensional), we can prove a somewhat stronger result. Namely, we can show that if the zero duality gap property is not satisfied, then the sequence $\{x_{n}\}$ necessarily escapes to infinity as $n\to\infty$.

Theorem 8 (boundedness vs. duality gap).

Let the assumptions of Theorem 6 be valid, the space $X$ be reflexive, the set $Q$ be weakly sequentially closed, the functions $f$ and $\operatorname{dist}(G(\cdot),K)$ be weakly sequentially lsc on $Q$, and the sequence $\{f(x_{n})\}$ be bounded below. Then the following statements hold true:

  1. for the sequence $\{x_{n}\}$ to have a bounded subsequence it is necessary that $\Delta_{*}\leq\varepsilon_{*}$;

  2. all weak limit points of the sequence $\{x_{n}\}$ (if such points exist at all) are $(\varepsilon_{*}-\Delta_{*})$-optimal solutions of the problem $(\mathcal{P})$;

  3. if $\varepsilon_{n}\to 0$ as $n\to\infty$, then for the sequence $\{x_{n}\}$ to have a bounded subsequence it is necessary that the zero duality gap property holds true; furthermore, in this case all weak limit points of the sequence $\{x_{n}\}$ are globally optimal solutions of the problem $(\mathcal{P})$.

Proof.

Bearing in mind the fact that any bounded sequence in a reflexive Banach space has a weakly convergent subsequence and arguing in the same way as in the proof of Theorem 7 one can easily verify that all claims of this theorem hold true. ∎

5.4 Dual convergence

Let us now turn to the analysis of dual convergence, that is, convergence of the sequence of multipliers $\{\lambda_{n}\}$ or, more precisely, of the dual sequence $\{(\lambda_{n},c_{n})\}$. Although for some particular augmented Lagrangian methods convergence of the multipliers can be studied with the use of optimality conditions, even in the case when the sequence of penalty parameters $\{c_{n}\}$ increases unboundedly, apparently only convergence of the whole sequence $\{(\lambda_{n},c_{n})\}$ is connected with fundamental properties of the augmented dual problem. Such a connection might also exist in the case when the penalty parameter increases unboundedly, but its analysis is a challenging task, which we leave as an open problem for future research.

We start our study of dual convergence with a simple auxiliary result that provides an important characterisation of limit points of the sequence $\{(\lambda_{n},c_{n})\}$.

Lemma 7.

Let all assumptions of Theorem 6 be valid, except for assumption $(B2)$. Suppose also that assumption $(A10)$ holds true and the sequence $\{c_{n}\}$ is bounded. Then any limit point $(\lambda_{*},c_{*})$ of the sequence $\{(\lambda_{n},c_{n})\}$ (if such a point exists) is an $\varepsilon_{*}$-optimal solution of the dual problem.

Proof.

Let a subsequence $\{(\lambda_{n_{k}},c_{n_{k}})\}$ converge to some $(\lambda_{*},c_{*})\in\Lambda\times(0,+\infty)$. Then, in particular, the sequence $\{\lambda_{n_{k}}\}$ is bounded and by Theorem 6 (see also Remark 26) one has

$$\liminf_{k\to\infty}\Theta(\lambda_{n_{k}},c_{n_{k}})\geq\Theta_{*}-\varepsilon_{*}.$$

Hence, bearing in mind the fact that the function $\Theta$ is upper semicontinuous by assumption $(A10)$ (see Remark 7), one can conclude that $\Theta(\lambda_{*},c_{*})\geq\Theta_{*}-\varepsilon_{*}$, that is, $(\lambda_{*},c_{*})$ is an $\varepsilon_{*}$-optimal solution of the dual problem. ∎

Corollary 12.

Let the assumptions of Lemma 7 be valid and suppose that the functions $f$ and $\operatorname{dist}(G(\cdot),K)$ are lsc on $Q$, while the augmented Lagrangian $\mathscr{L}(\cdot)$ is lsc on $Q\times\Lambda\times(0,+\infty)$. Then any limit point $(x_{*},\lambda_{*},c_{*})$ of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$ is a $2\varepsilon_{*}$-saddle point of the augmented Lagrangian, that is,

$$\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c_{*})-2\varepsilon_{*}\leq\mathscr{L}(x_{*},\lambda_{*},c_{*})\leq\inf_{x\in Q}\mathscr{L}(x,\lambda_{*},c_{*})+\varepsilon_{*}.\qquad(36)$$
Proof.

Suppose that a subsequence $\{(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\}$ converges to some triplet $(x_{*},\lambda_{*},c_{*})\in Q\times\Lambda\times(0,+\infty)$. Then by Theorem 6 (see also Remark 26) one has $\operatorname{dist}(G(x_{n_{k}}),K)\to 0$ as $k\to\infty$ and

$$\limsup_{k\to\infty}f(x_{n_{k}})\leq\Theta_{*}+\varepsilon_{*},\qquad\limsup_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\leq\Theta_{*}+\varepsilon_{*}.$$

Therefore, by the lower semicontinuity assumptions one has $G(x_{*})\in K$ and

$$f(x_{*})\leq\Theta_{*}+\varepsilon_{*},\qquad\mathscr{L}(x_{*},\lambda_{*},c_{*})\leq\Theta_{*}+\varepsilon_{*}.$$

On the other hand, Proposition 1 and Lemma 7 imply that

$$\Theta_{*}\leq f(x_{*}),\qquad\Theta_{*}-\varepsilon_{*}\leq\Theta(\lambda_{*},c_{*})\leq\mathscr{L}(x_{*},\lambda_{*},c_{*}).$$

Hence with the use of assumption $(A1)$ one gets that

$$\sup_{\lambda\in\Lambda}\mathscr{L}(x_{*},\lambda,c_{*})\leq f(x_{*})\leq\mathscr{L}(x_{*},\lambda_{*},c_{*})+2\varepsilon_{*},$$

that is, the first inequality in (36) is satisfied.

By the definition of $x_{n}$ one has

$$\mathscr{L}(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\leq\mathscr{L}(x,\lambda_{n_{k}},c_{n_{k}})+\varepsilon_{n_{k}}\quad\forall k\in\mathbb{N}\ \forall x\in Q,$$

which implies that

$$\liminf_{k\to\infty}\mathscr{L}(x_{n_{k}},\lambda_{n_{k}},c_{n_{k}})\leq\limsup_{k\to\infty}\mathscr{L}(x,\lambda_{n_{k}},c_{n_{k}})+\varepsilon_{*}\quad\forall x\in Q.$$

Hence, with the use of assumption $(A10)$ and the fact that the function $\mathscr{L}(\cdot)$ is lsc, one obtains

$$\mathscr{L}(x_{*},\lambda_{*},c_{*})\leq\mathscr{L}(x,\lambda_{*},c_{*})+\varepsilon_{*}\quad\forall x\in Q,$$

that is, the second inequality in (36) is satisfied. ∎

Remark 27.

(i) It should be noted that the augmented Lagrangian is lsc on $Q\times\Lambda\times(0,+\infty)$ for all particular examples of the function $\Phi$ from Section 3, except for Rockafellar-Wets' augmented Lagrangian, provided the function $f$ is lsc on $Q$ and the function $G$ is continuous on this set. In the case of Rockafellar-Wets' augmented Lagrangian (Example 1) one needs to impose some additional assumptions, such as $\sigma(\cdot)=0.5\|\cdot\|^{2}$, or $K=\{0\}$ and $\sigma(\cdot)=\|\cdot\|$. Let us also mention that in the case of inequality constrained problems, instead of assuming that $G$ is continuous, it is sufficient to suppose that the functions $g_{i}$ defining the constraints $g_{i}(x)\leq 0$ are only lower semicontinuous.

(ii) If the augmented Lagrangian $\mathscr{L}(\cdot)$ is continuous on $Q\times\Lambda\times(0,+\infty)$, then one can replace $2\varepsilon_{*}$ in the first inequality in (36) with $\varepsilon_{*}$. Indeed, in this case by Theorem 6 one has $\mathscr{L}(x_{*},\lambda_{*},c_{*})\geq\Theta_{*}$ and, therefore, $f(x_{*})\leq\mathscr{L}(x_{*},\lambda_{*},c_{*})+\varepsilon_{*}$, which implies the required result.

With the use of Lemma 7 we can easily show how dual convergence is connected with the existence of optimal dual solutions/global saddle points of the augmented Lagrangian.

Theorem 9 (dual convergence vs. existence of optimal dual solutions).

Let $\{(x_{n},\lambda_{n},c_{n})\}$ be the sequence generated by the model augmented Lagrangian method, and suppose that the following conditions are satisfied:

  1. assumptions $(A1)$, $(A7)$, $(A10)$, $(A12)_{s}$–$(A14)_{s}$, and $(A15)$ hold true;

  2. assumptions $(B1)$, $(B3)$, and $(B4)$ hold true;

  3. $\varepsilon_{n}\to 0$ as $n\to\infty$;

  4. for any bounded set $\Lambda_{0}\subset\Lambda$ there exists $\tau>0$ such that the function $\inf_{\lambda\in\Lambda_{0}}\Phi(G(\cdot),\lambda,\tau)$ is bounded below on $Q$.

Then for the sequence of penalty parameters $\{c_{n}\}$ to be bounded and the sequence of multipliers $\{\lambda_{n}\}$ to have a limit point it is necessary that a globally optimal solution of the dual problem $(\mathcal{D})$ exists.

Proof.

By Lemma 7, under the assumptions of the theorem any limit point of the sequence $\{(\lambda_{n},c_{n})\}$ is a globally optimal solution of the dual problem. Therefore, for the existence of such a limit point (or, equivalently, for the sequence $\{c_{n}\}$ to be bounded and the sequence $\{\lambda_{n}\}$ to have a limit point) it is necessary that a globally optimal solution of the dual problem exists. ∎

Theorem 10 (full convergence vs. existence of global saddle points).

Let all assumptions of Theorem 9 be valid and suppose that the functions $f$ and $\operatorname{dist}(G(\cdot),K)$ are lsc on the set $Q$. Then for the sequence of penalty parameters $\{c_{n}\}$ to be bounded and the sequence $\{(x_{n},\lambda_{n})\}$ to have a limit point it is necessary that there exists a global saddle point of the augmented Lagrangian $\mathscr{L}(\cdot)$. Moreover, for any limit point $(x_{*},\lambda_{*},c_{*})$ of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$ (if such a point exists) the pair $(x_{*},\lambda_{*})$ is a global saddle point of the augmented Lagrangian and $c_{*}\geq c_{*}(x_{*},\lambda_{*})$.

Proof.

Let $(x_{*},\lambda_{*},c_{*})$ be a limit point of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$. Then by Lemma 7 the pair $(\lambda_{*},c_{*})$ is an optimal dual solution. In turn, by applying Theorem 6 (see also Remark 26) one can readily verify that the zero duality gap property is satisfied and $x_{*}$ is a globally optimal solution of the problem $(\mathcal{P})$. Therefore, by Theorem 4 the pair $(x_{*},\lambda_{*})$ is a global saddle point of $\mathscr{L}(x,\lambda,c)$ and $c_{*}\geq c_{*}(x_{*},\lambda_{*})$. Consequently, for the existence of a limit point of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$ it is necessary that there exists a global saddle point of the augmented Lagrangian. ∎

In the case when the space $Y$ is reflexive, one can prove somewhat stronger versions of the previous theorems that uncover a connection between the boundedness of the sequences of multipliers and penalty parameters and the existence of optimal dual solutions/global saddle points.

Theorem 11 (boundedness vs. existence of optimal dual solutions).

Suppose that $\{(x_{n},\lambda_{n},c_{n})\}$ is the sequence generated by the model augmented Lagrangian method, and let the following conditions be satisfied:

  1. the space $Y$ is reflexive;

  2. assumptions $(A1)$, $(A7)$, $(A9)_{s}$, $(A10)$, $(A12)_{s}$–$(A14)_{s}$, and $(A15)$ hold true;

  3. assumptions $(B1)$, $(B3)$, and $(B4)$ hold true;

  4. the sequence $\{\varepsilon_{n}\}$ is bounded and $\limsup_{n\to\infty}\varepsilon_{n}=\varepsilon_{*}$;

  5. for any bounded set $\Lambda_{0}\subset\Lambda$ there exists $\tau>0$ such that the function $\inf_{\lambda\in\Lambda_{0}}\Phi(G(\cdot),\lambda,\tau)$ is bounded below on $Q$.

Then the following statements hold true:

  1. any weak limit point of the sequence $\{(\lambda_{n},c_{n})\}$ (if such a point exists) is an $\varepsilon_{*}$-optimal solution of the problem $(\mathcal{D})$;

  2. if $\varepsilon_{n}\to 0$ as $n\to\infty$, then for the boundedness of the sequence $\{(\lambda_{n},c_{n})\}$ it is necessary that there exists a globally optimal solution of the augmented dual problem $(\mathcal{D})$;

  3. if, in addition, $X$ is reflexive, $Q$ is weakly sequentially closed, the functions $f$ and $\operatorname{dist}(G(\cdot),K)$ are weakly sequentially lsc on $Q$, and $\varepsilon_{n}\to 0$ as $n\to\infty$, then for the boundedness of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$ it is necessary that there exists a global saddle point of the augmented Lagrangian; furthermore, for any weak limit point $(x_{*},\lambda_{*},c_{*})$ of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$ the pair $(x_{*},\lambda_{*})$ is a global saddle point of $\mathscr{L}(x,\lambda,c)$ and $c_{*}\geq c_{*}(x_{*},\lambda_{*})$.

Proof.

The proof of this theorem almost literally repeats the proofs of Lemma 7 and Theorems 9 and 10. One only needs to use the facts that (i) any bounded sequence in a reflexive Banach space has a weakly convergent subsequence, (ii) the augmented dual function $\Theta(\lambda,c)$ is usc and concave by assumptions $(A9)_{s}$ and $(A10)$, and (iii) any usc concave function defined on a Banach space is also weakly sequentially usc. ∎

Remark 28.

It should be underlined that the previous theorem provides necessary conditions for the boundedness of sequences $\{(x_{n},\lambda_{n},c_{n})\}$ generated by the model augmented Lagrangian method irrespective of the way in which the sequences of multipliers $\{\lambda_{n}\}$ and penalty parameters $\{c_{n}\}$ are updated. As long as the assumptions of the theorem are satisfied, the existence of a global saddle point is a necessary condition for the boundedness of the sequence $\{(x_{n},\lambda_{n},c_{n})\}$. Similarly, the existence of an optimal dual solution is a necessary condition for the boundedness of the sequences $\{\lambda_{n}\}$ and $\{c_{n}\}$, regardless of the way in which they are updated.

Let us finally return to Example 20 in order to demonstrate how one can apply the general convergence results obtained in this section to better understand and predict the behaviour of primal-dual augmented Lagrangian methods.

Example 22.

Let $X=Y=\mathbb{R}$. Consider the following optimization problem:

$$\min\ f(x)=-x^{2}\quad\text{subject to}\quad g_{1}(x)=x-1\leq 0,\quad g_{2}(x)=-x-1\leq 0.\qquad(37)$$

Let $\mathscr{L}(\cdot)$ be the Hestenes-Powell-Rockafellar augmented Lagrangian for this problem (see (18)). As was shown in Example 20, the zero duality gap property holds true in this case, but the augmented dual problem has no globally optimal solutions.
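For the computations below it is convenient to have this function written out explicitly. In the standard Hestenes-Powell-Rockafellar form for inequality constraints (to which we assume (18) reduces here), with multipliers $\lambda=(\lambda_{1},\lambda_{2})$, $\lambda_{i}\geq 0$, and penalty parameter $c>0$, it reads

$$\mathscr{L}(x,\lambda,c)=-x^{2}+\frac{1}{2c}\sum_{i=1}^{2}\Big(\max\big\{0,\lambda_{i}+c\,g_{i}(x)\big\}^{2}-\lambda_{i}^{2}\Big).$$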

Let the multipliers and the penalty parameter be updated in accordance with the classic augmented Lagrangian method (see, e.g., [5, Algorithm 4.1]), that is,

$$\lambda_{(n+1),1}=\max\big\{\lambda_{n1}+c_{n}g_{1}(x_{n}),0\big\},\qquad\lambda_{(n+1),2}=\max\big\{\lambda_{n2}+c_{n}g_{2}(x_{n}),0\big\}\qquad(38)$$

and

$$c_{n+1}=\begin{cases}c_{n},&\text{if }n=0\text{ or }\|V_{n}\|\leq\tau\|V_{n-1}\|,\\\gamma c_{n},&\text{otherwise},\end{cases}\qquad(39)$$

where $\tau\in(0,1)$ and $\gamma>1$ are fixed parameters and

$$V_{ni}:=\min\left\{-g_{i}(x_{n}),\frac{\lambda_{ni}}{c_{n}}\right\},\quad i\in\{1,2\}.$$

One can readily check that assumptions $(B1)$, $(B3)$, and $(B4)$ hold true in this case if $c_{\min}\geq 2$. Hence with the use of Theorem 11 one can conclude that the sequence $\{(\lambda_{n},c_{n})\}$ has no bounded subsequence, which means that either $c_{n}\to+\infty$ or $\|\lambda_{n}\|\to+\infty$ as $n\to\infty$. Let us provide a numerical example to illustrate this claim.

Table 1: First 10 iterations of the model augmented Lagrangian method for problem (37).

| $n$            | 0 | 1  | 2   | 3    | 4   | 5    | 6      | 7       | 8      | 9       |
|----------------|---|----|-----|------|-----|------|--------|---------|--------|---------|
| $x_{n}$        | 2 | -3 | 1.5 | -1.5 | 1.2 | -1.2 | 1.0909 | -1.0909 | 1.0435 | -1.0435 |
| $\lambda_{n1}$ | 1 | 4  | 0   | 3    | 0   | 2.4  | 0      | 2.1818  | 0      | 2.087   |
| $\lambda_{n2}$ | 1 | 0  | 6   | 0    | 3   | 0    | 2.4    | 0       | 2.1818 | 0       |
| $c_{n}$        | 3 | 3  | 6   | 6    | 12  | 12   | 24     | 24      | 48     | 48      |

Let $\tau=0.9$, $\gamma=2$, $c_{0}=3$, and $\lambda_{0}=(1,1)$. The first 10 iterations of the model augmented Lagrangian method with multiplier updates (38) and penalty parameter updates (39) are given in Table 1. Let us note that the points of global minimum of the augmented Lagrangian were computed analytically. The results of computer simulation show that $c_{n}\to\infty$, $\lambda_{2n}\to(0,2)$, $\lambda_{2n+1}\to(2,0)$, $x_{2n}\to 1$, and $x_{2n+1}\to-1$ as $n\to\infty$ (this fact can be proved analytically, but the proof is rather cumbersome, which is why we do not present it here). Thus, the iterations of the method oscillate between gradually shrinking neighbourhoods of the two globally optimal solutions of problem (37) and the KKT multipliers corresponding to these solutions, while the penalty parameter increases unboundedly.
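For readers who wish to reproduce Table 1, below is a minimal Python sketch of this computation (our own illustration, not code accompanying the paper). It assumes the explicit Hestenes-Powell-Rockafellar form of $\mathscr{L}$ written out above and, instead of the analytic minimization used in the example, approximates the global minimizer of each subproblem by a brute-force grid search over $[-5,5]$.

```python
import numpy as np

# Sketch of the classic augmented Lagrangian method (38)-(39) for problem (37).
# The subproblems are solved by grid search, so this is an illustration only.

def aug_lagrangian(x, lam, c):
    """Hestenes-Powell-Rockafellar augmented Lagrangian of problem (37),
    evaluated at every point of the array x."""
    g = np.stack([x - 1.0, -x - 1.0])   # g1(x) = x - 1, g2(x) = -x - 1
    penalty = (np.maximum(lam[:, None] + c * g, 0.0) ** 2
               - lam[:, None] ** 2).sum(axis=0) / (2.0 * c)
    return -x ** 2 + penalty            # f(x) = -x^2

tau, gamma = 0.9, 2.0                   # parameters of the penalty update (39)
lam = np.array([1.0, 1.0])              # lambda_0 = (1, 1)
c = 3.0                                 # c_0 = 3
grid = np.linspace(-5.0, 5.0, 1_000_001)
V_prev = None
for n in range(10):
    x = grid[np.argmin(aug_lagrangian(grid, lam, c))]  # approximate global minimizer
    print(f"n={n}: x={x:+.4f}, lambda=({lam[0]:.4f}, {lam[1]:.4f}), c={c:g}")
    g = np.array([x - 1.0, -x - 1.0])
    V = np.minimum(-g, lam / c)         # residual V_n
    lam = np.maximum(lam + c * g, 0.0)  # multiplier update (38)
    if n > 0 and np.linalg.norm(V) > tau * np.linalg.norm(V_prev):
        c *= gamma                      # penalty update (39)
    V_prev = V
```

With this grid resolution the printed iterates should reproduce Table 1 up to the displayed digits. Note that for $c_{n}<2$ the subproblems are unbounded below (the quadratic penalty grows like $(c_{n}/2)x^{2}$ against $-x^{2}$), which is in line with the requirement $c_{\min}\geq 2$ mentioned above.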

Remark 29.

The convergence theory for the model augmented Lagrangian method presented in this paper generalizes and unifies many existing results on convergence of augmented Lagrangian methods. In particular, many such results either directly follow from Theorems 6, 7, and 10 and Lemma 7 or can be easily deduced from them, including [2, Proposition 2.1], [50, Theorem 6.1, part (1)], [44, Theorems 1–3 and 7], [42, Theorems 2.4 and 3.1], [45, Theorems 4 and 7], [5, Theorem 5.2], [48, Theorem 5, part (iii)], etc.

References

  • [1] A. M. Bagirov, G. Ozturk, and R. Kasimbeyli. A sharp augmented Lagrangian-based method for constrained non-convex optimization. Optim. Methods Softw., 34:462–488, 2019.
  • [2] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
  • [3] E. G. Birgin, R. A. Castillo, and J. M. Martinez. Numerical comparison of augmented Lagrangian algorithms for nonconvex problems. Comput. Optim. Appl., 31:31–55, 2005.
  • [4] E. G. Birgin and J. M. Martinez. Augmented Lagrangian method with nonmonotone penalty parameters for constrained optimization. Comput. Optim. Appl., 51:941–965, 2012.
  • [5] E. G. Birgin and J. M. Martinez. Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, Philadelphia, 2014.
  • [6] J. F. Bonnans and A. Shapiro. Perturbation analysis of optimization problems. Springer, New York, 2000.
  • [7] R. S. Burachik. On primal convergence for augmented Lagrangian duality. Optim., 60:979–990, 2011.
  • [8] R. S. Burachik, R. N. Gasimov, N. A. Ismayilova, and C. Kaya. On a modified subgradient algorithm for dual problems via sharp augmented Lagrangian. J. Glob. Optim., 34:55–78, 2006.
  • [9] R. S. Burachik, A. N. Iusem, and J. G. Melo. Duality and exact penalization for general augmented Lagrangians. J. Optim. Theory Appl., 147:125–140, 2010.
  • [10] R. S. Burachik, A. N. Iusem, and J. G. Melo. A primal dual modified subgradient algorithm with sharp Lagrangian. J. Glob. Optim., 46:55–78, 2010.
  • [11] R. S. Burachik, A. N. Iusem, and J. G. Melo. An inexact modified subgradient algorithm for primal-dual problems via augmented Lagrangians. J. Optim. Theory Appl., 157:108–131, 2013.
  • [12] R. S. Burachik, A. N. Iusem, and J. G. Melo. The exact penalty map for nonsmooth and nonconvex optimization. Optim., 64:717–738, 2015.
  • [13] R. S. Burachik and C. Y. Kaya. An update rule and a convergence result for a penalty function method. J. Ind. Manag. Optim., 3:381–398, 2007.
  • [14] R. S. Burachik, C. Y. Kaya, and X. Liu. A primal-dual algorithm as applied to optimal control problems. arXiv: 2304.03465, 2023.
  • [15] R. S. Burachik, C. Y. Kaya, and M. Mammadov. An inexact modified subgradient algorithm for nonconvex optimization. Comput. Optim. Appl., 45:1–34, 2008.
  • [16] R. S. Burachik and X. Liu. An inexact deflected subgradient algorithm in infinite dimensional spaces. arXiv: 2302.02072, 2023.
  • [17] R. S. Burachik and A. Rubinov. Abstract convexity and augmented Lagrangians. SIAM J. Optim., 18:413–436, 2007.
  • [18] R. S. Burachik, X. Q. Yang, and Y. Y. Zhou. Existence of augmented Lagrange multipliers for semi-infinite programming problems. J. Optim. Theory Appl., 173:471–503, 2017.
  • [19] E. Burman, P. Hansbo, and M. G. Larson. The augmented Lagrangian method as a framework for stabilised methods in computational mechanics. Arch. Comput. Methods Eng., 30:2579–2604, 2023.
  • [20] A. Conn, N. I. M. Gould, and P. Toint. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal., 28:545–572, 1991.
  • [21] M. V. Dolgopolik. Existence of augmented Lagrange multipliers: reduction to exact penalty functions and localization principle. Math. Program., 166:297–326, 2017.
  • [22] M. V. Dolgopolik. Augmented Lagrangian functions for cone constrained optimization: the existence of global saddle points and exact penalty property. J. Glob. Optim., 71:237–296, 2018.
  • [23] M. V. Dolgopolik. Minimax exactness and global saddle points of nonlinear augmented Lagrangians. J. Appl. Numer. Optim., 3:61–83, 2021.
  • [24] M. J. Feizollahi, S. Ahmed, and A. Sun. Exact augmented Lagrangian duality for mixed integer linear programming. Math. Program., 161:365–387, 2017.
  • [25] M. Fortin and R. Glowinski. Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam, 1983.
  • [26] M. Fukushima, Z.-Q. Luo, and P. Tseng. Smoothing functions for second-order-cone complementarity problems. SIAM J. Optim., 12:436–460, 2001.
  • [27] R. N. Gasimov. Augmented Lagrangian duality and nondifferentiable optimization methods in nonconvex programming. J. Glob. Optim., 24:187–203, 2002.
  • [28] S. He, L. Wu, and H. Meng. A nonlinear Lagrangian for constrained optimization problems. J. Appl. Math. Comput., 38:669–685, 2012.
  • [29] M. R. Hestenes. Multiplier and gradient methods. J. Optim. Theory Appl., 4:303–320, 1969.
  • [30] R. A. Horn and C. R. Johnson. Topics in matrix analysis. Cambridge University Press, Cambridge, 1991.
  • [31] X. X. Huang and X. Q. Yang. A unified augmented Lagrangian approach to duality and exact penalization. Math. Oper. Res., 28:533–552, 2003.
  • [32] X. X. Huang and X. Q. Yang. Further study on augmented Lagrangian duality theory. J. Glob. Optim., 31:193–210, 2005.
  • [33] X. X. Huang, X. Q. Yang, and K. L. Teo. Augmented Lagrangian and nonlinear semidefinite programs. In F. Giannessi and A. Maugeri, editors, Variational Analysis and Applications, pages 513–529. Springer, Boston, 2005.
  • [34] K. Ito and K. Kunisch. Lagrange Multiplier Approach to Variational Problems and Applications. SIAM, Philadelphia, 2008.
  • [35] A. N. Iusem. Augmented Lagrangian methods and proximal point methods for convex optimization. Investigación Operativa, 8:11–49, 1999.
  • [36] K. C. Kiwiel. On the twice differentiable cubic augmented Lagrangian. J. Optim. Theory Appl., 88:233–236, 1996.
  • [37] D. Li and X. L. Sun. Convexification and existence of saddle point in a pth-power reformulation for nonconvex constrained optimization. Nonlinear Anal., 47:5611–5622, 2001.
  • [38] Y. Li and L. Zhang. A new nonlinear Lagrangian method for nonconvex semidefinite programming. J. Appl. Anal., 15:149–172, 2009.
  • [39] Q. Liu and X. Yang. Zero duality and saddle points of a class of augmented Lagrangian functions in constrained non-convex optimization. Optim., 57:655–667, 2008.
  • [40] Y. J. Liu and L. W. Zhang. Convergence analysis of the augmented Lagrangian method for nonlinear second-order cone optimization problems. Nonlinear Anal.: Theory, Methods, Appl., 67:1359–1373, 2007.
  • [41] Y. J. Liu and L. W. Zhang. Convergence of the augmented Lagrangian method for nonlinear optimization problems over second-order cones. J. Optim. Theory Appl., 139:557–575, 2008.
  • [42] H. Luo, X. Sun, and H. Wu. Convergence properties of augmented Lagrangian methods for constrained global optimization. Optim. Methods Softw., 23:763–778, 2008.
  • [43] H. Luo, H. Wu, and J. Liu. On saddle points in semidefinite optimization via separation scheme. J. Optim. Theory Appl., 165:113–150, 2015.
  • [44] H. Z. Luo, X. L. Sun, and D. Li. On the convergence of augmented Lagrangian methods for constrained global optimization. SIAM J. Optim., 18:1209–1230, 2007.
  • [45] H. Z. Luo, H. X. Wu, and G. T. Chen. On the convergence of augmented Lagrangian methods for nonlinear semidefinite programming. J. Glob. Optim., 54:599–618, 2012.
  • [46] O. L. Mangasarian. Unconstrained Lagrangians in nonlinear programming. SIAM J. Control, 12:772–791, 1975.
  • [47] D. Noll. Local convergence of an augmented Lagrangian method for matrix inequality constrained programming. Optim. Methods Softw., 22:777–802, 2007.
  • [48] M. C. W. Oliveira and C. Sagastizábal. Revisiting augmented Lagrangian duals. Math. Program., 196:235–277, 2022.
  • [49] R. Polyak. Modified barrier functions: theory and methods. Math. Program., 54:177–222, 1992.
  • [50] R. A. Polyak. Log-sigmoid multipliers method in constrained optimization. Ann. Oper. Res., 101:427–460, 2001.
  • [51] R. A. Polyak. Nonlinear rescaling vs. smoothing technique in convex optimization. Math. Program., 92:197–235, 2002.
  • [52] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, London, 1969.
  • [53] L. M. Ramirez, N. Maculan, A. E. Xavier, and V. L. Xavier. Dislocation hyperbolic augmented Lagrangian algorithm for nonconvex optimization. RAIRO Oper. Res., 57:2941–2950, 2023.
  • [54] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
  • [55] R. T. Rockafellar. The multiplier method of Hestenes and Powell applied to convex programming. J. Optim. Theory Appl., 12:555–562, 1973.
  • [56] R. T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control Optim., 12:268–285, 1974.
  • [57] R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35:183–238, 1993.
  • [58] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, Berlin, 1998.
  • [59] A. M. Rubinov, X. X. Huang, and X. Q. Yang. The zero duality gap property and lower semicontinuity of the perturbation function. Math. Oper. Res., 27:775–791, 2002.
  • [60] A. M. Rubinov and X. Q. Yang. Lagrange-type functions in constrained nonconvex optimization. Kluwer Academic Publishers, Boston, 2003.
  • [61] J. J. Rückmann and A. Shapiro. Augmented Lagrangians in semi-infinite programming. Math. Program., 116:499–512, 2009.
  • [62] A. Shapiro and J. Sun. Some properties of the augmented Lagrangian in cone constrained optimization. Math. Oper. Res., 29:479–491, 2004.
  • [63] M. Stingl. On the solution of nonlinear semidefinite programs by augmented Lagrangian methods. PhD thesis, Institute of Applied Mathematics II, Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, 2006.
  • [64] D. Sun and J. Sun. Löwner's operator and spectral functions in Euclidean Jordan algebras. Math. Oper. Res., 33:421–445, 2008.
  • [65] D. Sun, J. Sun, and L. Zhang. The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math. Program., 114:349–391, 2008.
  • [66] J. Sun. On methods for solving nonlinear semidefinite optimization problems. Numer. Algebra Control Optim., 1:1–14, 2011.
  • [67] J. Sun, L. W. Zhang, and Y. Wu. Properties of the augmented Lagrangian in nonlinear semidefinite optimization. J. Optim. Theory Appl., 129:437–456, 2006.
  • [68] X. L. Sun, D. Li, and K. I. M. McKinnon. On saddle points of augmented Lagrangians for constrained nonconvex optimization. SIAM J. Optim., 15:1128–1146, 2005.
  • [69] P. Tseng and D. P. Bertsekas. On the convergence of the exponential multiplier method for convex programming. Math. Program., 60:1–19, 1993.
  • [70] C. Wang, Q. Liu, and B. Qu. Global saddle points of nonlinear augmented Lagrangian functions. J. Glob. Optim., 68:125–146, 2017.
  • [71] C. Y. Wang and D. Li. Unified theory of augmented Lagrangian methods for constrained global optimization. J. Glob. Optim., 44:433–458, 2009.
  • [72] C. Y. Wang, X. Q. Yang, and X. M. Yang. Nonlinear augmented Lagrangian and duality theory. Math. Oper. Res., 38:740–760, 2012.
  • [73] Z. Wen, D. Goldfarb, and W. Yin. Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput., 2:203–230, 2010.
  • [74] H. Wu, H. Luo, X. Ding, and G. Chen. Global convergence of modified augmented Lagrangian methods for nonlinear semidefinite programming. Comput. Optim. Appl., 56:531–558, 2013.
  • [75] H. X. Wu and H. Z. Luo. A note on the existence of saddle points of p-th power Lagrangian for constrained nonconvex optimization. Optim., 61:1231–1245, 2012.
  • [76] H. X. Wu and H. Z. Luo. Saddle points of general augmented Lagrangians for constrained nonconvex optimization. J. Glob. Optim., 53:683–697, 2012.
  • [77] H. X. Wu, H. Z. Luo, and J. F. Yang. Nonlinear separation approach for the augmented Lagrangian in nonlinear semidefinite programming. J. Glob. Optim., 59:695–727, 2014.
  • [78] A. E. Xavier. Hyperbolic penalty: a new method for nonlinear programming with inequalities. Int. Trans. Oper. Res., 8:659–671, 2001.
  • [79] G. D. Yalcin and R. Kasimbeyli. On weak conjugacy, augmented Lagrangians and duality in nonconvex optimization. Math. Methods Oper. Res., 92:199–228, 2020.
  • [80] H. Yamashita and H. Yabe. A survey of numerical methods for nonlinear semidefinite programming. J. Oper. Res. Soc. Jpn., 58:24–60, 2015.
  • [81] L. Zhang, J. Gu, and X. Xiao. A class of nonlinear Lagrangians for nonconvex second order cone programming. Comput. Optim. Appl., 49:61–99, 2011.
  • [82] L. Zhang, Y. Li, and J. Wu. Nonlinear rescaling Lagrangians for nonconvex semidefinite programming. Optim., 63:899–920, 2014.
  • [83] X. Y. Zhao, D. Sun, and K.-C. Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim., 20:1737–1765, 2010.
  • [84] J. Zhou and J. S. Chen. On the existence of saddle points for nonlinear second-order cone programming problems. J. Glob. Optim., 62:459–480, 2015.
  • [85] J. Zhou, N. Xiu, and C. Wang. Saddle point and exact penalty representation for generalized proximal Lagrangians. J. Glob. Optim., 54:669–687, 2012.
  • [86] Y. Y. Zhou and X. Q. Yang. Duality and penalization in optimization via an augmented Lagrangian function with applications. J. Optim. Theory Appl., 140:171–188, 2009.
  • [87] Y. Y. Zhou and X. Q. Yang. Augmented Lagrangian functions for constrained optimization problems. J. Glob. Optim., 52:95–108, 2012.
  • [88] Y. Y. Zhou, J. C. Zhou, and X. Q. Yang. Existence of augmented Lagrange multipliers for cone constrained optimization problems. J. Glob. Optim., 58:243–260, 2014.