Convergence analysis of primal-dual augmented Lagrangian methods and duality theory
Abstract
We develop a unified theory of augmented Lagrangians for nonconvex optimization problems that encompasses both duality theory and convergence analysis of primal-dual augmented Lagrangian methods in the infinite dimensional setting. Our goal is to present many well-known concepts and results related to augmented Lagrangians in a unified manner and bridge a gap between existing convergence analysis of primal-dual augmented Lagrangian methods and abstract duality theory. Within our theory we specifically emphasize the role of various fundamental duality concepts (such as duality gap, optimal dual solutions, global saddle points, etc.) in convergence analysis of augmented Lagrangian methods and underline interconnections between all these concepts and convergence of primal and dual sequences generated by such methods. In particular, we prove that the zero duality gap property is a necessary condition for the boundedness of the primal sequence, while the existence of an optimal dual solution is a necessary condition for the boundedness of the sequences of multipliers and penalty parameters, irrespective of the way in which the multipliers and the penalty parameter are updated. Our theoretical results are applicable to many different augmented Lagrangians for various types of cone constrained optimization problems, including Rockafellar-Wets’ augmented Lagrangian, (penalized) exponential/hyperbolic-type augmented Lagrangians, modified barrier functions, etc.
1 Introduction
Augmented Lagrangians play a fundamental role in optimization and many other closely related fields [2, 25, 34, 5, 19], and there is a vast literature on various aspects of augmented Lagrangian theory and corresponding methods. The wide range of topics studied within the more theoretically oriented part of the literature includes such important problems as analysis of the zero duality gap property [31, 32, 17, 39, 86, 72, 79], exact penalty representation [31, 32, 86, 9, 72, 87, 85, 12, 24], existence of global saddle points [68, 76, 77, 84, 43, 70, 22], existence of augmented Lagrange multipliers [62, 61, 88, 18, 21], etc.
In turn, more practically oriented papers are usually devoted to convergence analysis of primal-dual augmented Lagrangian methods and do not utilise any theoretical concepts or results from duality theory, except for the zero duality gap property [8, 10, 1, 16, 14]. Very few attempts have been made to connect convergence analysis of such methods with fundamental results from duality theory. Paper [7], in which the equivalence between primal convergence and differentiability of the augmented dual function for the sharp Lagrangian for equality constrained problems was established, is perhaps the most notable among them.
In addition, papers dealing with convergence analysis of primal-dual augmented Lagrangian methods typically consider only one particular augmented Lagrangian for one particular type of constraints. As a consequence of that, very similar results have to be repeated multiple times in different settings (cf. convergence analysis of augmented Lagrangian methods based on the Hestenes-Powell-Rockafellar augmented Lagrangian for mathematical programming problems [5], nonconvex problems with second order cone constraints [40, 41], nonconvex semidefinite programming problems [65, 45, 74], as well as similar convergence analysis of augmented Lagrangian methods based on the exponential-type augmented Lagrangian/modified barrier function for nonconvex problems with second order cone constraints [81] and nonconvex semidefinite programming problems [38]). Some attempts have been made to unify convergence analysis of a number of primal-dual augmented Lagrangian methods [44, 42, 71], but only for finite dimensional inequality constrained optimization problems.
Thus, there are two fundamental challenges within the theory of augmented Lagrangians and corresponding optimization methods. The first one is connected with a noticeable gap that exists between more theoretically oriented results dealing with duality theory and more practically oriented results on convergence analysis of primal-dual methods. Often, duality theory plays little to no role in the convergence analysis of primal-dual methods, and theoretical results from this theory are almost never utilised to help better understand the convergence of augmented Lagrangian methods. The second challenge is connected with the unification of numerous similar results on duality theory and augmented Lagrangian methods that are scattered in the literature.
The main goal of this article is to present a general theory of augmented Lagrangians for nonconvex cone constrained optimization problems in the infinite dimensional setting that encompasses both fundamental theoretical concepts from duality theory and convergence analysis of primal-dual augmented Lagrangian methods, and also highlights interconnections between them, thus at least partially solving the two aforementioned challenges.
Our other aim is to correct for the bias that exists in the literature on augmented Lagrangians, in which a disproportionate number of papers are devoted to analysis of Rockafellar-Wets’ augmented Lagrangian [58, Section 11.K] and corresponding numerical methods, while very little attention is paid to other classes of augmented Lagrangians. In particular, to the best of the author’s knowledge, many concepts (such as the exact penalty map [12], exact penalty representation, and augmented Lagrange multipliers) have been introduced and studied exclusively in the context of Rockafellar-Wets’ augmented Lagrangian. We aim to show that these concepts can be defined and be useful in a much broader setting and to demonstrate that many results from duality theory and convergence analysis of primal-dual methods can be proved for Rockafellar-Wets’ augmented Lagrangian and many other augmented Lagrangians in a unified way.
To achieve our goals, we adopt an axiomatic augmented Lagrangian setting developed by the author in [22] and inspired by [39]. Within this setting, one defines an abstract augmented Lagrangian without specifying its structure, while all theoretical results are proved using a set of axioms (basic assumptions). To demonstrate the broad applicability of this approach, we present many particular examples of well-known augmented Lagrangians and prove that they satisfy these axioms.
We also develop a general duality theory for augmented Lagrangians that complements the results of the author’s earlier papers [22, 23] and, in particular, provide simple necessary and sufficient conditions for the zero duality gap property to hold true, from which many existing results on this property can be easily derived. Finally, we present a general convergence analysis for a model augmented Lagrangian method with arbitrary rules for updating multipliers and penalty parameter. Under some natural assumptions that are satisfied in many particular cases, we study primal and dual convergence of this method, making specific emphasis on the role of various fundamental concepts from duality theory in convergence analysis of augmented Lagrangian methods. In particular, we point out direct connections between primal convergence and the zero duality gap property, as well as direct connections between dual convergence, boundedness of the penalty parameter, and the existence of optimal dual solutions/global saddle points.
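The model method with arbitrary update rules generalizes, in particular, the classical first-order method of multipliers. As a purely illustrative numerical sketch (the toy problem, the closed-form inner step, and all parameter values below are our own choices, not the method analysed in this paper), the basic primal-dual iteration looks as follows:

```python
# Minimal sketch of a primal-dual augmented Lagrangian (method of
# multipliers) iteration for the toy problem
#     minimize x^2  subject to  x - 1 = 0,
# using the quadratic augmented Lagrangian
#     L(x, lam, c) = x^2 + lam*(x - 1) + (c/2)*(x - 1)^2.
# The problem, updates, and iteration count are illustrative choices.

def solve(lam=0.0, c=10.0, iters=50):
    x = 0.0
    for _ in range(iters):
        # Primal step: minimize L(., lam, c) exactly (quadratic in x):
        # dL/dx = 2x + lam + c*(x - 1) = 0  =>  x = (c - lam)/(2 + c).
        x = (c - lam) / (2.0 + c)
        # Dual step: first-order multiplier update.
        lam = lam + c * (x - 1.0)
    return x, lam

x, lam = solve()
# The KKT pair of the toy problem is x* = 1, lam* = -2.
```

The dual update drives the constraint residual to zero while the multiplier converges to an optimal dual solution; this is precisely the kind of behaviour whose connection with duality theory is studied below.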
The paper is organized as follows. An abstract axiomatic augmented Lagrangian setting for cone constrained optimization problems in normed spaces, including a list of basic assumptions (axioms) on an augmented Lagrangian, is presented in Section 2. Many particular examples of augmented Lagrangians for various types of cone constrained optimization problems, and the basic assumptions that these augmented Lagrangians satisfy, are discussed in detail in Section 3. Section 4 is devoted to duality theory. In this section we analyse the zero duality gap property for the augmented Lagrangian, and also study interconnections between global saddle points, globally optimal solutions of the augmented dual problem, augmented Lagrange multipliers, and the penalty map. Finally, a general convergence theory for a model augmented Lagrangian method that encompasses many existing primal-dual augmented Lagrangian methods as particular cases is presented in Section 5.
2 Axiomatic augmented Lagrangian setting
We start by presenting an axiomatic approach to augmented Lagrangians that serves as a foundation for our duality and convergence theories. Let and be real normed spaces, be a nonempty closed set (not necessarily convex), and be a closed convex cone. Suppose also that some functions and are given. In this article we study augmented Lagrangians for the following cone constrained optimization problem:
We assume that the feasible region of this problem is nonempty and its optimal value, denoted by , is finite.
The topological dual space of is denoted by , and let be either the inner product in , , or the duality pairing between and its dual, depending on the context. Recall that is the polar cone of the cone .
Denote by the binary relation over defined as if and only if . We say that this binary relation is induced by the cone . As one can readily check, this binary relation is a partial order on if and only if . In this case is called the partial order induced by the cone . Note that the cone constraint can be rewritten as .
We define an augmented Lagrangian for the problem as follows. Choose any function , . An augmented Lagrangian for the problem is defined as
if , and , otherwise. Here is a multiplier and is a penalty parameter. Note that only the constraint is incorporated into the augmented Lagrangian.
Unlike most existing works on augmented Lagrangians and corresponding optimization methods, we do not impose any assumptions on the structure of the function . It can be defined in an arbitrary way. Instead, being inspired by [39] and following the ideas of our earlier paper [22], we propose to use a set of axioms (assumptions) describing behaviour of the function for various types of arguments (e.g. as increases unboundedly or when ). This approach allows one to construct an axiomatic theory of augmented Lagrangians and helps one to better understand what kind of assumptions the function must satisfy to guarantee that the augmented Lagrangian and optimization methods based on this augmented Lagrangian have certain desirable properties. As we will show below, most well-known augmented Lagrangians satisfy our proposed set of axioms, which means that our axiomatic theory is rich enough and can be applied in many particular cases.
In order to unite several particular cases into one general theory, we formulate axioms/assumptions on the function with respect to a prespecified closed convex cone of admissible multipliers. Usually, or .
We grouped the assumptions on according to their meaning. If the function is viewed as a black box with input and output , then assumptions – describe what kind of output is produced by this black box with respect to certain specific kinds of input. Assumptions – describe general properties of the function , such as monotonicity, differentiability, and convexity. Assumptions – impose restrictions on the way the function behaves as increases unboundedly or along certain sequences . Finally, the subscript “s” indicates a stronger, i.e. more restrictive, version of an assumption.
Denote for any , and let be the closed ball with centre and radius . Below is the list of basic assumptions on the function that are utilised throughout the article:
-
(A1)
one has ;
-
(A2)
such that ;
-
(A3)
such that as ;
-
(A4)
such that one has ;
-
(A5)
such that one has ;
-
(A6)
one has ;
-
(A7)
the function is non-decreasing;
-
(A8)
the function is convex and non-decreasing with respect to the binary relation ;
-
(A9)
the function is concave;
-
the function is concave;
-
(A10)
the function is upper semicontinuous;
-
(A11)
such that the function is Fréchet differentiable at and its Fréchet derivative satisfies the equality for some surjective mapping that does not depend on and , and such that if and only if ;
-
(A12)
one has
-
and for any bounded subset one has
-
(A13)
and for any sequences and such that and as one has ;
-
for any sequences and such that and as and for any bounded subset one has ;
-
(A14)
such that as there exists such that as and for any with one has ;
-
such that as and for any bounded sequence there exists such that as and for any with the following equality holds true ;
-
(A15)
for any bounded sequences and and for any sequence such that as one has ;
Remark 1.
In the case of assumptions , , , , and , we say that the function satisfies the restricted version of the corresponding assumption, if one replaces “any sequence ” in the formulation of the corresponding assumption with “any bounded sequence ”. Note that the validity of a (non-restricted) assumption implies that its restricted version also holds true.
Remark 2.
It should be noted that assumption can be reformulated as follows: for any sequences and such that and as and for any bounded sequence one has .
Below we will often use the following corollary to assumptions and that can be readily verified directly.
Lemma 1.
If satisfies assumption , then for any one has
If satisfies assumption , then for any bounded subset one has
In the following section we will give many particular examples of augmented Lagrangians for different types of cone constrained optimization problems (e.g. equality constrained problems, inequality constrained problems, problems with semidefinite constraints, etc.). If an optimization problem has several different types of constraints simultaneously, it is convenient to represent them as cone constraints of the form for some mappings and closed convex cones in real Banach spaces , , with . Then one can define and
to formally rewrite such optimization problems as the problem . The space is equipped with the norm for all .
In order to define a function in this case, one can define corresponding functions for each constraint individually (here and ) and then put
(1)
if for all , and , otherwise. Most (if not all) existing augmented Lagrangians for problems with several different types of constraints are defined in this way. Let us show that, roughly speaking, if and all functions , , satisfy some basic assumption simultaneously, then the function also satisfies this basic assumption.
Theorem 1.
Let . Then the following statements hold true:
-
1.
if all functions , , satisfy one of the basic assumptions simultaneously, except for assumptions , , , and , then the function defined in (1) satisfies the same basic assumption;
-
2.
if all functions , , satisfy assumptions and (or and ) simultaneously, then the function defined in (1) also satisfies assumption (or );
-
3.
if all functions , , satisfy assumptions and (or and ) simultaneously, then the function defined in (1) also satisfies assumption (or ).
Proof.
Assumption (A1). If and , then and for all . Therefore, by assumption , which implies that .
Assumption (A2). Choose any and . By assumption for any there exists such that . Put . Then .
Assumption (A3). Choose any and . By assumption for any there exists such that as . Put . Then as .
Assumption (A4). Let and be such that . It is easily seen that the condition implies that for any and, therefore, for all . Hence for all and by assumption one has , which yields .
Assumption (A5). If and are such that , then there exists such that . Therefore, . In turn, for any one has by assumption . Hence .
Assumption (A6). If and , then for some . Therefore, , while for any by assumption . Consequently, .
Assumption (A7). The function is non-decreasing in as the sum of non-decreasing in functions.
Assumption (A8). Note that if the function is non-decreasing with respect to the binary relation induced by the cone , then the function is non-decreasing with respect to the binary relation induced by the cone . Therefore the function is convex and non-decreasing with respect to the binary relation induced by the cone as the sum of convex and non-decreasing functions.
Assumptions (A9) and (A9)s. The function is concave in (or ) as the sum of concave functions.
Assumption (A10). The function is upper semicontinuous as the sum of a finite number of upper semicontinuous functions.
Assumption (A11). If and are such that , then, as was noted above, , , and for all . Then each of the functions is Fréchet differentiable at and its derivative has the form for some function such that if and only if . Therefore, the function is Fréchet differentiable at the point and its Fréchet derivative is equal to . Clearly, one has
which implies the required result.
Assumption (A12). Fix any , , and . Choose some . Let be such that and . Then there exists such that . By assumption for the functions , for any there exists such that for all one has
Let . Then with the use of assumption one gets that for any the following inequalities hold true:
Since such that and were chosen arbitrarily, one can conclude that for any and there exists such that
for all . Consequently, the function satisfies assumption .
Assumption (A12)s. Choose some and let be a bounded set. Clearly, one can find bounded sets such that , where .
By assumption for the functions , for any , , and one can find such that
where .
Denote . Fix some and choose any . Then for some . Hence with the use of assumption one gets that
for any . Since was chosen arbitrarily, one can conclude that the function satisfies assumption .
Assumption (A13). If sequences and are such that and as , then as and for all . From these inequalities it obviously follows that as well.
Assumption (A13)s. The proof is essentially the same as the proof for assumption .
Assumption (A14). Fix any and a sequence such that as . By our assumption for any there exists a sequence converging to zero and such that for any with for all one has as .
Define and choose any sequence such that for all . Then for any and one has , which implies that as and, therefore, as .
Assumption (A14)s. The proof is the same as the proof for assumption .
Assumption (A15). If and are bounded sequences and a sequence is such that as , then the sequences are also bounded and as for all . Consequently, one has
Applying these inequalities one can readily check that .
The proof of the claim of the theorem for the restricted versions of assumptions –, , and is essentially the same as the proofs for the non-restricted versions of these assumptions. ∎
Remark 3.
Let us note that assumptions and are satisfied for all particular augmented Lagrangians presented in this paper and all augmented Lagrangians known to the author.
Thus, the previous theorem allows one to analyse augmented Lagrangians for each particular type of cone constraints separately and then simply define an augmented Lagrangian for problems with several different types of cone constraints as in (1). Then the basic assumptions will be satisfied for this augmented Lagrangian by Theorem 1.
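The additive construction (1) can be illustrated numerically. The particular per-constraint terms below (a quadratic Hestenes-Powell term for an equality block and a Hestenes-Powell-Rockafellar term for an inequality block) are illustrative choices; any per-constraint terms satisfying the basic assumptions could be combined in the same way:

```python
# Sketch of the additive construction (1): per-constraint terms are
# defined separately and the composite term is their sum.  The two
# scalar terms below are illustrative choices.

def phi_eq(y, lam, c):
    # Quadratic (Hestenes-Powell) term for an equality constraint y = 0.
    return lam * y + 0.5 * c * y * y

def phi_ineq(y, lam, c):
    # Hestenes-Powell-Rockafellar term for an inequality constraint y <= 0.
    return (max(0.0, lam + c * y) ** 2 - lam ** 2) / (2.0 * c)

def phi_combined(y, lam, c):
    # Composite term for the product cone: a plain sum, as in (1).
    return phi_eq(y[0], lam[0], c) + phi_ineq(y[1], lam[1], c)

val = phi_combined((0.5, -1.0), (1.0, 1.0), 2.0)
```

By Theorem 1, verifying the basic assumptions for each scalar term separately suffices for the composite term.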
3 Examples of augmented Lagrangians
Let us present some particular examples of augmented Lagrangians for various types of cone constraints and discuss which of the basic assumptions are satisfied for these augmented Lagrangians. Our aim is to show that the basic assumptions are not restrictive and satisfied for most augmented Lagrangians appearing in the literature.
Many examples of augmented Lagrangians were already presented in the author’s previous paper [22]. However, many of the basic assumptions from this paper are completely new and were not used in [22] (e.g. assumptions , , and –, as well as their stronger and restricted versions). Therefore, for the sake of completeness we will present detailed descriptions of all examples, even though they partially overlap with the contents of [22, Section 3].
3.1 Augmented Lagrangians for problems with general cone constraints
First, we consider an augmented Lagrangian that is defined for any cone constrained optimization problem of the form . Since particular versions of this augmented Lagrangian are apparently studied and used in optimization methods more often than any other augmented Lagrangian, we will discuss it in more detail than the other examples.
Example 1 (Rockafellar-Wets’ augmented Lagrangian).
Let be such that and for any . In particular, one can set or . Define
(2)
The augmented Lagrangian with the term defined in this way was first introduced by Rockafellar and Wets [58, Section 11.K] (see also [31, 62, 32, 21]). Various generalizations of this augmented Lagrangian were studied in [9, 72, 70].
Lemma 2.
Let . Then the function defined in (2) satisfies:
-
1.
assumptions –, , , , and in the general case;
-
2.
assumptions and , if as (that is, as ) for any ;
-
3.
assumption , if the function is convex;
-
4.
assumption with , if and is a Hilbert space;
-
5.
assumptions and , if has a valley at zero, that is, for any neighbourhood of zero in there exists such that for all ;
-
6.
assumptions and , if for all and some continuous function such that if and only if , and ;
-
7.
assumptions and , if is continuous at zero and there exists a continuous function such that for all , if and only if , and ;
-
8.
assumption , if the function is continuous at zero.
Proof.
We divide the proof of the lemma into several parts corresponding to its separate statements.
Part 1. Assumptions (set ), (set ), and are obviously satisfied. Assumption is satisfied for from the separation theorem for the sets and . Assumption is satisfied, since if and , then
which along with assumption implies that . Assumptions , , and are satisfied by virtue of the fact that the function is the infimum of a family of linear functions.
Part 2. Let and be such that . For any in a sufficiently small neighbourhood of zero one has , which implies that for the following inequalities hold true:
The last expression is negative for any sufficiently small , since , that is, assumption holds true.
Let now and . For any such one can find for which . Then putting for (note that , since and is a convex cone, which yields ) one gets
for any sufficiently small , thanks to the fact that .
Part 3. Fix any and . Choose any , . By definition one can find such that , . Then for one has
Hence by [54, Theorem I.4.2] one can conclude that the function is convex. Let us now show that it is non-decreasing with respect to the binary relation induced by the cone .
Indeed, fix any such that , i.e. . One can obviously suppose that . By definition for any one can find such that . Let be such that . Note that due to the fact that by our assumption. Then , which yields
Since was chosen arbitrarily, one can conclude that the function is non-decreasing with respect to the binary relation .
Part 4. The proof of this statement of the lemma can be found in [62].
Part 5. Fix any . For any and such that is finite one has
for any . If , then for any . Therefore by our assumption there exists , independent of , , and , such that
With the use of this inequality one can easily prove that assumptions and hold true.
Part 6. Let be an increasing unbounded sequence. Fix any bounded set . Then for any and one has
where is such that for all . By applying the assumptions on the function one can readily check that
which yields
(3)
Consequently, assumptions and hold true.
Part 7. Let be a bounded sequence and a sequence be such that as . Due to the continuity of at zero, for any one can find such that for all one has . One can obviously suppose that as .
Define for all . Then for any sequence such that for all one can find a sequence such that for all . Observe that for one has
(4)
and the right-hand side of this inequality converges to zero, since as and the sequence is bounded. Combining this fact with inequality (3) one obtains that we have found a sequence for which assumptions and hold true.
Part 8. Let and be bounded sequences, and a sequence be such that as . Then thanks to the continuity of at zero one can find a sequence such that and as , where . Now, applying inequality (4) and taking into account the fact that the right-hand side of this inequality obviously converges to zero one can conclude that assumption is valid. ∎
Thus, all basic assumptions are satisfied for . In the other important case of the sharp Lagrangian [27, 8, 10, 1, 16], that is, the case when , all assumptions, except for , , and , hold true. It should be noted that these assumptions are not used in the main results presented in this article and are needed only to strengthen some of these results in the convex case.
Remark 4.
Note that neither assumption nor assumption is satisfied for due to the fact that in this case for all , if . Thus, if for some , then assumptions and might not hold true.
3.2 Augmented Lagrangians for problems with equality constraints
Let us now consider the following equality constrained problem:
The constraint can obviously be rewritten as the cone constraint , if one puts . The binary relation in this case coincides with the equality relation “”, and all functions are non-decreasing with respect to this relation.
Example 2 (Hestenes-Powell’s augmented Lagrangian).
Define and
Then the corresponding augmented Lagrangian is a particular case of the augmented Lagrangian from Example 1 with . Therefore, this function satisfies all basic assumptions, except for assumption , in the general case, and it satisfies assumption with , if is a Hilbert space. Note that in the case the corresponding augmented Lagrangian coincides with the Hestenes-Powell augmented Lagrangian [29, 52, 5].
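For the reader's convenience, the Hestenes-Powell augmented Lagrangian admits the familiar explicit form below; the symbols $f$, $G$, $\lambda$, $c$ for the objective, the equality constraint mapping, the multiplier, and the penalty parameter are our illustrative choice of notation for the quantities elided above:

```latex
% Classical Hestenes-Powell augmented Lagrangian for the equality
% constraint G(x) = 0 (notation chosen for illustration):
\mathscr{L}(x,\lambda,c)
  = f(x) + \langle \lambda, G(x) \rangle + \frac{c}{2}\,\|G(x)\|^{2}.
```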
Example 3 (sharp Lagrangian).
Define and
Then the corresponding augmented Lagrangian is a particular case of the augmented Lagrangian from Example 1 with . Therefore, this function satisfies all basic assumptions, except for assumptions , , and .
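In explicit form, the sharp Lagrangian replaces the squared norm of the quadratic term from Example 2 with the norm itself; as before, the symbols $f$, $G$, $\lambda$, $c$ are our illustrative choice of notation:

```latex
% Sharp Lagrangian for the equality constraint G(x) = 0: the augmenting
% term is the norm itself rather than its square (illustrative notation):
\mathscr{L}(x,\lambda,c)
  = f(x) + \langle \lambda, G(x) \rangle + c\,\|G(x)\|.
```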
In the case of equality constrained problems of the form
where and are given functions, one can define a more general class of augmented Lagrangians. This problem can be written as the problem with , , and . Note that in this particular case the dual space can be identified with .
Example 4 (Mangasarian’s augmented Lagrangian).
Let be a twice differentiable strictly convex function such that and is surjective. Define
for all and . Then the corresponding augmented Lagrangian coincides with Mangasarian’s augmented Lagrangian from [46] (see also [76]). In the case , this augmented Lagrangian coincides with the Hestenes-Powell augmented Lagrangian. One can also put, e.g. or for any .
Let . Then one can readily check that all assumptions, except for assumptions and , are satisfied in the general case (assumption holds true with ). Assumptions and are satisfied if and only if for some . The validity of these assumptions in the case when the function is quadratic can be readily verified directly. Let us prove the converse statement.
Suppose that the function is concave. Then applying the second order derivative test for concavity one gets that for all and or, equivalently, is a constant function. Hence bearing in mind the conditions one gets that for some .
3.3 Augmented Lagrangians for problems with inequality constraints
Next we will present several examples of augmented Lagrangians for the inequality constrained problem
(5)
where and are given functions. This problem can be written as the problem with , , and , where . The dual space can be identified with , while can be identified with , where . The binary relation in this case is the coordinate-wise partial order.
All particular augmented Lagrangians for problem (5) used in optimization methods and known to the author are separable (except for nonlinear Lagrangians; see [60, 17, 72]), that is, the corresponding function has the form
(6)
for some functions . Although one can choose different functions for different (that is, for different inequality constraints), to the best of the author’s knowledge, only the case when are the same for all is considered in the vast majority of papers on augmented Lagrangians for inequality constrained problems.
Example 5 (essentially quadratic/Hestenes-Powell-Rockafellar’s augmented Lagrangian).
Let be a twice continuously differentiable strictly convex function such that , and the derivative is surjective. Following Bertsekas [2, Section 5.1.2, Example 1], for any define
(note that the minimum is finite and attained at any such that , which exists due to the surjectivity of ) and put
The corresponding augmented Lagrangian is called the essentially quadratic augmented Lagrangian for problem (5) (see [68, 39, 71]). In the case one has
and is the well-known Hestenes-Powell-Rockafellar augmented Lagrangian [29, 52, 55, 56, 57, 5], which is a particular case of the augmented Lagrangian from Example 1 with and being the Euclidean norm.
Let . Then one can readily verify that all basic assumptions, except for assumption , hold true in the general case (assumption is satisfied with ). Assumption is satisfied for , .
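In the scalar quadratic case, the standard closed form of the Hestenes-Powell-Rockafellar term, $(\max\{0,\lambda + cy\}^2 - \lambda^2)/(2c)$, can be checked against its definition as a minimum over a slack variable; the test points below are arbitrary choices:

```python
# Illustrative check that the scalar Hestenes-Powell-Rockafellar term
# for an inequality constraint y <= 0,
#     phi(y, lam, c) = (max(0, lam + c*y)^2 - lam^2) / (2c),
# agrees with its definition as a minimum over a slack variable u >= y:
#     phi(y, lam, c) = min_{u >= y} [ lam*u + (c/2)*u^2 ].

def phi_closed(y, lam, c):
    return (max(0.0, lam + c * y) ** 2 - lam ** 2) / (2.0 * c)

def phi_min(y, lam, c):
    # The minimand is convex in u with unconstrained minimizer u = -lam/c,
    # so the constrained minimum is attained at u = max(y, -lam/c).
    u = max(y, -lam / c)
    return lam * u + 0.5 * c * u * u

pts = [(-1.0, 1.0, 2.0), (0.5, 1.0, 2.0), (0.0, 3.0, 0.5), (2.0, 0.0, 1.0)]
gap = max(abs(phi_closed(y, l, c) - phi_min(y, l, c)) for y, l, c in pts)
```

The slack-variable representation is what makes this term a special case of the construction from Example 1.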
Example 6 (cubic augmented Lagrangian).
Let
Then coincides with the cubic augmented Lagrangian [36]. One can easily check that all basic assumptions, except for assumptions and , are satisfied in this case with (assumption is satisfied with ). Assumption holds true, provided , while assumption is not satisfied for any choice of .
Example 7 (Mangasarian’s augmented Lagrangian).
Let be a twice continuously differentiable strictly convex function such that and the function is surjective. Define
(7)
Then coincides with the augmented Lagrangian introduced by Mangasarian [46] and studied, e.g. in [76]. Let . Then all basic assumptions, except for assumptions and , hold true (assumption is satisfied with ). Assumptions and are satisfied for with .
Example 8 (exponential-type augmented Lagrangian).
Let be a twice differentiable strictly increasing function such that . Define
If , then coincides with the exponential penalty function [2, 69, 68, 39, 71]. In turn, if , then is Polyak’s log-sigmoid Lagrangian [50, 51]. In the general case we call the corresponding function the exponential-type augmented Lagrangian.
Let . Then assumptions –, , , and are satisfied in the general case. Assumptions and hold true, provided the function is convex. Assumption is satisfied with if and only if . Restricted versions of assumptions , , , and (see Remark 1) are satisfied if and only if as , while non-restricted versions of these assumptions are satisfied if and only if the function is bounded below. Finally, assumptions , , and (put ) are never satisfied for the exponential-type augmented Lagrangian.
Thus, all basic assumptions, except for assumptions , , and , are valid for the exponential penalty function and the log-sigmoid Lagrangian.
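For concreteness, the exponential multiplier scheme can be sketched on a toy problem. The normalization of the augmented term as $(\lambda/c)(e^{cy}-1)$, the multiplicative dual update, the toy problem, and the bisection inner solver below are our own illustrative assumptions, consistent with the classical exponential method of multipliers:

```python
import math

# Illustrative sketch of the exponential multiplier method for the toy
# problem  minimize (x - 2)^2  subject to  x - 1 <= 0,  whose KKT pair
# is x* = 1, lam* = 2.  One common normalization of the exponential
# augmented term is (lam/c) * (exp(c*y) - 1).

def inner_argmin(lam, c, lo=-10.0, hi=10.0):
    # Stationarity: 2*(x - 2) + lam*exp(c*(x - 1)) = 0.  For lam > 0 the
    # left-hand side is strictly increasing in x, so bisection applies.
    def grad(x):
        return 2.0 * (x - 2.0) + lam * math.exp(c * (x - 1.0))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lam, c = 1.0, 1.0
x = 0.0
for _ in range(100):
    x = inner_argmin(lam, c)
    lam *= math.exp(c * (x - 1.0))  # multiplicative dual update
```

Note that, unlike the quadratic case, the multiplier is updated multiplicatively and automatically stays positive.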
Example 9 (penalized exponential-type augmented Lagrangian).
Suppose that is a twice differentiable strictly increasing function such that , and is a twice continuously differentiable non-decreasing function such that for all and for all (for example, one can set ). Following Bertsekas [2, Section 5.1.2, Example 2] define
Then the function is called the penalized exponential-type augmented Lagrangian [68, 39, 71], since it is obtained from the augmented Lagrangian from the previous example by adding the penalty term .
Let . Then assumptions –, , , and , are satisfied in the general case. Assumptions and hold true, provided the functions and are convex. Assumption is satisfied with if and only if . Assumptions and are valid, provided as and is either bounded below or convex. Restricted assumptions , , , and , hold true if and only if as , while non-restricted versions of these assumptions hold true if and only if is bounded below.
Thus, if or and , then all basic assumptions, except for assumption , hold true.
Example 10 (p-th power augmented Lagrangian).
Let and a continuous non-decreasing function be such that for all . For example, one can set with or with . By our assumption the inequality is satisfied if and only if . Furthermore, for all . Define
Then coincides with the p-th power augmented Lagrangian [37, 75, 39].
Let . Then assumptions –, , , –, , and hold true. Assumption is satisfied, if the function is convex. Assumption is satisfied with , provided is differentiable and . Finally, assumptions , , and are not satisfied for the p-th power augmented Lagrangian.
Remark 5.
Let be as in the previous example and be as in Example 9. Then by analogy with the penalized exponential-type augmented Lagrangian one can define the penalized p-th power augmented Lagrangian as follows:
If the function is convex and differentiable, , is convex, and as , then one can verify that the penalized p-th power augmented Lagrangian satisfies all basic assumptions, except for assumption . Let us also mention that one can apply this trick of adding the penalty term to any other augmented Lagrangian for inequality constrained problems, if it does not satisfy assumptions and , in order to construct a penalized version of this augmented Lagrangian that satisfies assumptions and and has all other properties of the non-penalized version.
Example 11 (hyperbolic-type augmented Lagrangian).
Let be a twice differentiable strictly increasing convex function such that . Define
If , then coincides with the hyperbolic augmented Lagrangian [78, 53]. In the general case we call such a function the hyperbolic-type augmented Lagrangian.
Let . Then assumptions –, , , and are satisfied in the general case (assumption is satisfied with ). Assumption is satisfied if and only if is a linear function. Restricted assumptions , , , and , hold true if and only if as , while non-restricted versions of these assumptions hold true if and only if is bounded below. Finally, assumptions , , and are never satisfied for the hyperbolic-type augmented Lagrangian.
Example 12 (modified barrier function).
Let be a twice differentiable strictly increasing function such that and as . Define
Then the augmented Lagrangian coincides with the modified barrier function introduced by R. Polyak [49]. In particular, in the case the augmented Lagrangian is the modified Frisch function, while in the case it is the modified Carroll function [49] (see also [68, 39, 71]).
Let . Then assumptions –, , , , , and are satisfied in the general case. Assumptions and hold true, if the function is convex. Assumption is satisfied with if and only if . Restricted assumptions , , , and hold true if and only if as , while non-restricted versions of these assumptions are valid if and only if the function is bounded below. Finally, assumption cannot hold true for the modified barrier function.
Thus, the modified Carroll function satisfies all basic assumptions, except for assumption , while the modified Frisch function satisfies all assumptions, except for and non-restricted assumptions , , , and .
Example 13 (He-Wu-Meng’s augmented Lagrangian).
Let
Then coincides with the augmented Lagrangian introduced by He, Wu, and Meng [28]. Let us note that
if , and .
Let . Then assumptions –, , , , with , , , , and restricted assumptions , , , and are satisfied in the general case. Assumption holds true if and only if , since for all . Finally, assumptions , , , , , and are not satisfied for He-Wu-Meng’s augmented Lagrangian in the general case. Let us note that the non-restricted versions of the last 4 assumptions are not satisfied due to the fact that as .
Remark 6.
Many more particular examples of augmented Lagrangians for inequality constrained optimization problems can be found in [3].
3.4 Augmented Lagrangians for problems with second order cone constraints
Let us now consider nonlinear second order cone programming problems:
(8)
where , , are given functions,
is the second order (Lorentz/ice cream) cone of dimension , and is the Euclidean norm.
Problem (8) can be rewritten as the problem with
and . Note that the dual space can be identified with , while the polar cone can be identified with .
Example 14 (Hestenes-Powell-Rockafellar’s augmented Lagrangian).
To define another augmented Lagrangian for optimization problems with nonlinear second order cone constraints, recall (see [26, 64]) that in the context of such problems Löwner’s operator associated with a function is defined as follows:
for any with , and for . One can readily verify that if and the function is strictly increasing, then for any , while for any .
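Constructions of Hestenes-Powell-Rockafellar type for second order cone constraints rely on the Euclidean projection onto the Lorentz cone, which admits a well-known closed form; the following sketch (with our own function name, not taken from the paper) implements it:

```python
import numpy as np

def project_soc(z):
    """Euclidean projection of z = (z0, zbar) onto the second order (Lorentz)
    cone {(z0, zbar) : ||zbar|| <= z0}, via the standard closed-form formula."""
    z0, zbar = z[0], z[1:]
    nrm = np.linalg.norm(zbar)
    if nrm <= z0:                       # z already lies in the cone
        return z.copy()
    if nrm <= -z0:                      # z lies in the polar cone: project to 0
        return np.zeros_like(z)
    alpha = (z0 + nrm) / 2.0            # intermediate case
    out = np.empty_like(z)
    out[0] = alpha
    out[1:] = alpha * zbar / nrm
    return out

p = project_soc(np.array([0.0, 1.0, 0.0]))  # intermediate case: p = (0.5, 0.5, 0)
```

The three branches correspond to the decomposition of the space into the cone, its polar, and the remaining region, mirroring the Moreau decomposition used throughout the analysis of such augmented Lagrangians.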
Example 15 (exponential-type augmented Lagrangian/modified barrier function).
Let be a non-decreasing convex function such that for some , as in the case and as in the case . Suppose also that is twice differentiable on , , and . For any and define
if , and otherwise. The corresponding augmented Lagrangian, which can be viewed as an extension of augmented Lagrangian from Examples 8 and 12 to the case of nonlinear second order cone programming problems, was introduced in [81].
Let . Then assumptions –, –, and hold true in the general case (assumption is satisfied with by [81, Lemma 3.1]). Assumptions and are satisfied if and only if . Restricted assumptions , , , and hold true if and only if as , while non-restricted versions of these assumptions hold true if and only if is bounded below. Finally, assumptions and are not satisfied for the function from this example.
3.5 Augmented Lagrangians for problems with semidefinite constraints
Let us now consider nonlinear semidefinite programming problems of the form:
(9)
where is a given function, is the space of all real symmetric matrices of order endowed with the inner product and the corresponding norm , , which is called the Frobenius norm, is the trace operator, is the zero matrix of order , and is the Löwner partial order on the space , that is, for some if and only if the matrix is positive semidefinite.
Problem (9) can be written as the problem with and being the cone of negative semidefinite matrices . Note that the binary relation induced by the cone coincides with the Löwner partial order. The dual space in this case can be identified with , while the polar cone can be identified with the cone of positive semidefinite matrices .
Example 16 (Hestenes-Powell-Rockafellar’s augmented Lagrangian).
For any and define
where is the projection of a matrix onto the cone . This function is a particular case of the function from Example 1 with and, therefore, it satisfies all basic assumptions with (assumption holds true with ). The corresponding augmented Lagrangian and optimization methods for nonlinear semidefinite programming problems utilising this augmented Lagrangian were studied in [33, 67, 65, 83, 73, 66, 45, 74, 77, 80].
One can also extend the exponential-type augmented Lagrangian/modified barrier function for inequality constrained problems to the case of nonlinear semidefinite programming problems. To define such an extension, recall that the matrix function/Löwner’s operator [30, 64] associated with a function is defined as follows:
where is a spectral decomposition of a matrix , while are the eigenvalues of listed in decreasing order. Note that if the function is non-decreasing and , then for any .
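A minimal numerical sketch of the Löwner operator through the spectral decomposition; the function name is ours, and the projection onto the positive semidefinite cone is recovered as the special case f(t) = max(t, 0):

```python
import numpy as np

def loewner_operator(f, A):
    """Apply a scalar function f to a symmetric matrix A through its spectral
    decomposition A = Q diag(lambda) Q^T, i.e. f(A) = Q diag(f(lambda)) Q^T."""
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * f(eigvals)) @ eigvecs.T

A = np.array([[2.0, 0.0], [0.0, -3.0]])
# The projection onto the positive semidefinite cone is the Loewner operator
# associated with t -> max(t, 0):
P = loewner_operator(lambda t: np.maximum(t, 0.0), A)
# P = [[2, 0], [0, 0]]
```

Since `numpy.linalg.eigh` returns the eigenvalues in increasing order, the ordering convention differs from the text, but the resulting matrix function is of course the same.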
Example 17 (exponential-type augmented Lagrangian/modified barrier function).
Let a function be as in Example 15. For any and define
if , and otherwise. The corresponding augmented Lagrangian is an extension of the augmented Lagrangians for inequality constrained optimization problems from Examples 8 and 12. It was studied in detail in [63, 47, 38, 82, 43].
Let . Then assumptions –, –, and hold true in the general case (assumption is satisfied with by [43, Proposition 4.2]). Assumption is satisfied, if the matrix function is monotone and convex (see, e.g. [30]). Assumptions and are satisfied if and only if . Restricted assumptions , , , and hold true if and only if as , while non-restricted versions of these assumptions hold true if and only if is bounded below. Finally, assumption is not satisfied for the function from this example.
Example 18 (penalized exponential-type augmented Lagrangian).
Let a function be a twice continuously differentiable non-decreasing convex function such that as , and . Let also be a twice continuously differentiable non-decreasing convex function such that for all and for all . Denote by the Löwner operator associated with . For any and define
The corresponding augmented Lagrangian was introduced in [43] and is an extension of the penalized exponential-type augmented Lagrangian from Example 9 to the case of nonlinear semidefinite programming problems.
Let . Then assumptions –, –, and hold true in the general case (assumption is satisfied with by [43, Proposition 4.2]). Assumption is satisfied, provided the matrix functions and are monotone and convex. Assumptions and are satisfied, if as . Restricted assumptions , , , and hold true if and only if as , while non-restricted versions of these assumptions hold true if and only if is bounded below. Finally, assumption is not satisfied, regardless of the choice of and .
3.6 Augmented Lagrangians for problems with pointwise inequality constraints
Let be a measure space and be some space of functions . For example, can be defined as or as the Sobolev space, when is an open subset of . Let us consider problems with pointwise inequality constraints of the form:
(10)
where is a given function such that for some fixed and all .
One can rewrite problem (10) as the problem with , being the cone of nonpositive functions , and for all . In this case the dual space can be identified with , where is the conjugate exponent of , that is, (, if ). In turn, the polar cone can be identified with the cone of nonnegative functions .
For the sake of brevity we will consider only an augmented Lagrangian for problem (10) based on the Hestenes-Powell-Rockafellar augmented Lagrangian. However, it should be mentioned that one can define an augmented Lagrangian for this problem based on any other augmented Lagrangian for inequality constrained optimization problems.
Example 19 (Hestenes-Powell-Rockafellar augmented Lagrangian).
Suppose that either or and the measure is finite. For any , , and define
Observe that
Therefore, the value is correctly defined and finite for any , , and , if or and the measure is finite.
Let . Then one can readily verify that all basic assumptions hold true in the general case, except for assumptions and . Assumptions and are satisfied in the case , since
for all , , and (see the proof of the validity of assumptions and for the Rockafellar-Wets’ augmented Lagrangian).
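For reference, the scalar Hestenes-Powell-Rockafellar augmentation term that is integrated over the measure space in this example can be sketched as follows; we assume its classical form (max(0, λ + ct)² − λ²)/(2c), and the function name is ours:

```python
import numpy as np

def hpr_term(t, lam, c):
    """Scalar Hestenes-Powell-Rockafellar augmentation term
    (max(0, lam + c*t)**2 - lam**2) / (2*c) for a constraint value t
    (the constraint being t <= 0), multiplier lam >= 0, and penalty c > 0."""
    return (np.maximum(0.0, lam + c * t) ** 2 - lam ** 2) / (2.0 * c)

v1 = hpr_term(1.0, 2.0, 4.0)   # violated constraint: (6**2 - 4)/8 = 4.0
v2 = hpr_term(-1.0, 2.0, 4.0)  # t <= -lam/c: constant branch, -lam**2/(2*c) = -0.5
```

The quadratic growth of this term in the constraint value is what makes the integrand dominated by an integrable function only when the stated summability conditions on hold, which is why the exponent restrictions in this example appear.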
3.7 Some comments on particular augmented Lagrangians
Before we proceed to the analysis of the augmented dual problem and primal-dual augmented Lagrangian methods, let us make a few general observations about the presented examples:
-
1.
All basic assumptions, except for assumption , are satisfied for all particular augmented Lagrangians presented above (under appropriate additional assumptions), except for the exponential-type augmented Lagrangian (Examples 8, 15, and 17), the p-th power augmented Lagrangian (Example 10), the hyperbolic-type augmented Lagrangian (Example 11), and He-Wu-Meng’s augmented Lagrangian (Example 13). The exponential-type augmented Lagrangian (the case in Examples 15 and 17) and the p-th power augmented Lagrangian do not satisfy assumptions and , the hyperbolic-type augmented Lagrangian does not satisfy assumptions , , and , while He-Wu-Meng’s augmented Lagrangian does not satisfy assumption and non-restricted versions of assumptions , , , and .
- 2.
-
3.
For assumptions , –, –, and – to be satisfied for the exponential-type augmented Lagrangian (Examples 8, 15, and 17), the penalized exponential-type augmented Lagrangian (Examples 9 and 18), the modified barrier function (Example 12), the p-th power augmented Lagrangian (Example 10), and the hyperbolic-type augmented Lagrangian (Example 11) it is necessary that . In contrast, for all other particular augmented Lagrangians presented in this section these assumptions are satisfied for (in the case of He-Wu-Meng’s augmented Lagrangian only the restricted versions of assumptions , , , and are satisfied for ).
-
4.
Our theory of augmented Lagrangians encompasses penalty functions of the form with . One simply needs to define . This function satisfies assumptions , , , , , , , – and – for any choice of the set (assumption is satisfied, if , while assumption is satisfied, if ), which means that the main results of this paper on the zero duality gap and convergence of augmented Lagrangian methods can be applied to the penalty function .
4 Duality theory
One of the central concepts of the theory of augmented Lagrangians and corresponding optimization methods is the (augmented) dual problem:
where
(11)
is the (augmented) dual function. As is well-known and will be discussed in details below, convergence of augmented Lagrangian methods is interlinked with various properties of the dual problem. Therefore, before turning to augmented Lagrangian methods, we need to analyse how standard duality results are translated into our axiomatic augmented Lagrangian setting.
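To make the link between the dual problem and augmented Lagrangian methods concrete, here is a minimal method-of-multipliers sketch on a one-dimensional toy problem with the Hestenes-Powell-Rockafellar augmented Lagrangian; the problem, the closed-form inner minimizer, and the function name are ours and serve only as an illustration:

```python
def method_of_multipliers(lam=0.0, c=4.0, iters=50):
    """Method of multipliers for the toy problem
        min x^2  subject to  1 - x <= 0   (solution x* = 1, lam* = 2),
    with the Hestenes-Powell-Rockafellar augmented Lagrangian
        L(x, lam, c) = x^2 + (max(0, lam + c*(1 - x))**2 - lam**2) / (2*c).
    For this problem the inner minimizer is available in closed form."""
    x = 0.0
    for _ in range(iters):
        # argmin_x L(x, lam, c); this formula is valid while lam + c*(1 - x) >= 0,
        # which holds along the whole iteration for this problem
        x = (lam + c) / (2.0 + c)
        # classical first-order multiplier update
        lam = max(0.0, lam + c * (1.0 - x))
    return x, lam

x_k, lam_k = method_of_multipliers()
# x_k -> 1.0 and lam_k -> 2.0 geometrically for this convex toy problem
```

Here the multiplier sequence converges to an optimal dual solution with a fixed penalty parameter, the benign situation that the convergence analysis below contrasts with problems where no optimal dual solution exists.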
Remark 7.
Note that if assumption is satisfied, then the dual function is concave and the augmented dual problem is a concave optimization problem, even if the original problem is nonconvex. Furthermore, assumption ensures that the dual function is upper semicontinuous, as the infimum of the family , , of upper semicontinuous functions.
4.1 Zero duality gap property
Let us first study how optimal values of the problems and relate to each other. We start by showing that under an essentially nonrestrictive assumption the optimal value of the augmented dual problem does not exceed the optimal value of the primal problem.
Proposition 1 (weak duality).
Let assumption hold true. Then
where is the feasible region of the problem . In particular, , where is the optimal value of the problem and is the optimal value of the problem .
Proof.
By assumption for any point and all and one has . Hence applying the definition of (see (11)) and the fact that one obtains the required result. ∎
As is well-known, the optimal values of the primal and dual problems might not coincide, especially for nonconvex problems. In this case and the quantity is called the duality gap.
Definition 1.
One says that there is zero duality gap between the primal problem and the dual problem (or that the augmented Lagrangian has the zero duality gap property, or that the strong duality with respect to the augmented Lagrangian holds true), if .
Our aim now is to understand what kind of assumptions one must impose on the function to ensure that the corresponding augmented Lagrangian has the zero duality gap property. To this end, we extend the standard result (see, e.g. [59]) connecting the optimal value of the dual problem with the behaviour of the optimal value (perturbation) function
of the problem to our case. Denote by the effective domain of in , that is, . Note that if and only if the function is bounded below on for some .
Theorem 2 (optimal dual value formula).
Let assumptions , , and – hold true. Then
In addition, for all .
Proof.
Note that for all and , and , if . Therefore, below we can suppose that .
By assumption the function is non-decreasing in . Therefore the functions and are non-decreasing in for all and . Hence, as is easy to see, one has
Consequently, it is sufficient to check that
(12)
Let us prove this equality.
Fix any and any unbounded strictly increasing sequence such that the function is bounded below on (such a sequence exists by the definition of ). Then by Proposition 1 one has
(13)
Let be a sequence such that for all . Observe that from (13) it follows that
(14)
Note also that due to assumption for all , , and such that one has , if , and
if (note that , since ). Therefore, by assumption for any one has as uniformly on the set , which with the use of (14) implies that as . Let us consider two cases.
Case I. Suppose that there exists a subsequence that is feasible for the problem . Then with the use of Lemma 1 one gets
Case II. Suppose now that for some subsequence Then with the use of assumption one gets
where is any sequence such that and as (note that such a sequence exists, since as ).
Combining the two cases and inequalities (13) and (14) one obtains that
(15)
To prove equality (12), suppose that .
Let be any sequence such that and as . Let also be the sequence from assumption . Clearly, there exists a subsequence such that for all . By the definition of the optimal value function for any one can find such that (i.e. ) and in the case when , and as in the case when .
Corollary 1 (duality gap formula).
If under the assumptions of the previous theorem one has , then
In particular, if the duality gap is positive, then it is equal to .
Remark 8.
Although in the proof of Theorem 2 we considered the case , in actuality, the assumptions of this theorem ensure that . Namely, if assumptions and are satisfied and is bounded below on for some and , then . Indeed, suppose by contradiction that . Then there exists a sequence such that and as . By the definition of the optimal value function one can find a sequence such that for all and as . Note that as , since converges to zero.
Let be any increasing unbounded sequence and be the sequence from assumption . Clearly, one can find a subsequence such that for all . Then by assumption one has as , which implies that
which due to assumption contradicts the fact that is bounded below on .
Remark 9.
The claim of Theorem 2 remains valid if only the restricted versions of assumptions and hold true and one additionally assumes that the projection of the set onto the cone is bounded. If this projection is bounded, then one can show that the sequences appearing in the proof of the theorem are also bounded. Therefore, only the restricted versions of assumptions and are needed to prove the theorem in this case, which makes the theorem applicable, for example, to He-Wu-Meng’s augmented Lagrangian (Example 13).
Let us note that the assumption on the boundedness of the projection of onto is not uncommon in the literature on augmented Lagrangians and primal-dual augmented Lagrangian methods (see, e.g. [44, Assumption 2], [39, Assumption 2], [71, Assumption 2], [70, condition (2)], etc.). In many particular cases this assumption is not restrictive from the theoretical point of view. For example, one can always guarantee that this assumption is satisfied for inequality constrained problems by replacing the constraints with .
As a simple corollary to Theorem 2 we can obtain necessary and sufficient conditions for the augmented Lagrangian to have the zero duality gap property.
Theorem 3 (zero duality gap characterisation).
Let assumptions , , and – be valid. Then the zero duality gap property holds true if and only if the optimal value function is lower semicontinuous (lsc) at the origin and there exist and such that the function is bounded below on .
Proof.
Suppose that the zero duality gap property holds true. Then the optimal value of the dual problem is finite, which implies that or, equivalently, there exist and such that the function is bounded below on . Moreover, by Theorem 2, which means that , that is, the optimal value function is lsc at the origin.
Conversely, suppose that is lsc at the origin and . Then by Theorem 2 one has , that is, there is zero duality gap between the primal and dual problems. ∎
Remark 10.
Let us note that one can prove the zero duality gap property for under slightly less restrictive assumptions on the function than in the previous theorems. Namely, instead of assuming that the claims of assumptions –, are satisfied for all , it is sufficient to suppose that there exists satisfying these assumptions (in the case of the coercivity assumption one must suppose that the function is coercive for ). Then arguing in the same way as in the proof of Theorem 2 one can check that
This inequality obviously implies that the zero duality gap property holds true, provided the optimal value function is lsc at the origin. Although such a small change in the assumptions of the theorem might seem insignificant, in actuality it considerably broadens the class of augmented Lagrangians to which the sufficient conditions for the validity of the zero duality gap property can be applied. For example, Theorems 2 and 3 are inapplicable to the exponential-type augmented Lagrangian (Example 8), since this augmented Lagrangian does not satisfy assumption . However, it satisfies the claim of assumption for any that lies in the interior of (i.e. that does not have zero components) and, therefore, one can conclude that the zero duality gap property holds true for the exponential-type augmented Lagrangian, provided the optimal value function is lsc at the origin and there exists .
Remark 11.
Theorem 3 implies that under suitable assumptions the zero duality gap property depends not on the properties of the augmented Lagrangian , but rather on the properties of the optimization problem itself. Similarly, by Corollary 1 the duality gap depends neither on the augmented Lagrangian nor on any other characteristic of the dual problem . It is completely determined by the properties of the optimization problem under consideration. Thus, in a sense, the absence of a duality gap between the primal and dual problems, as well as the size of the duality gap, when it is positive, are properties of optimization problems themselves, not of the augmented Lagrangians or augmented dual problems that are used for analysing and/or solving these problems.
For the sake of completeness, let us also present a simple characterisation of the lower semicontinuity of the optimal value function , from which one can easily derive a number of well-known sufficient conditions for this function to be lsc at the origin.
Proposition 2.
For the optimal value function to be lsc at the origin it is necessary and sufficient that there does not exist a sequence such that as and .
Proof.
Necessity. Suppose that is lsc at the origin. Let be any sequence such that as . Denote . Then as and due to the lower semicontinuity of at the origin one has
In other words, there does not exist a sequence satisfying the conditions from the formulation of the proposition.
Sufficiency. Suppose by contradiction that the function is not lsc at the origin. Then there exist and a sequence converging to zero and such that for all . By the definition of the optimal value function for any one can find such that and (recall that ). Therefore as and , which contradicts the assumptions of the proposition that there does not exist a sequence satisfying these conditions. ∎
Corollary 2.
Let the space be reflexive, the set be weakly sequentially closed (in particular, one can suppose that is convex), and the functions and be weakly sequentially lsc on . Then for the optimal value function to be lsc at the origin it is necessary and sufficient that there does not exist a sequence such that and as , and .
Proof.
The necessity of the conditions from the formulation of the corollary for the lower semicontinuity of the function follows directly from the previous proposition. Let us prove that they are also sufficient for the lower semicontinuity of .
Taking into account Proposition 2 it is sufficient to prove that there does not exist a bounded sequence such that as and . Suppose by contradiction that such a bounded sequence exists. Replacing this sequence with a subsequence, if necessary, one can assume that the sequence converges. Since the space is reflexive, one can extract a subsequence that weakly converges to some point that belongs to the set , since this set is weakly sequentially closed. Furthermore, , i.e. is feasible for the problem , since as and the function is weakly sequentially lsc. Hence taking into account the fact that is also weakly sequentially lsc one gets that
which is impossible by virtue of the fact that the point is feasible for the problem . ∎
Corollary 3.
Let the assumptions of the previous corollary be satisfied and one of the following conditions hold true:
-
1.
the set is bounded;
-
2.
the function is coercive on ;
-
3.
the function is coercive on ;
-
4.
the penalty function is coercive on for some and .
Then the optimal value function is lsc at the origin.
Thus, by Theorem 3 and Corollary 2, in the case when the space is reflexive (in particular, in the finite dimensional case), under some natural lower semicontinuity assumptions the duality gap between the problems and is positive if and only if there exists an unbounded sequence such that as and the lower limit of the sequence is smaller than the optimal value of the problem . Furthermore, one can verify that the infimum of all such lower limits is equal to and, therefore, defines the value of the duality gap .
Remark 12.
Many existing results on the zero duality gap property for various augmented Lagrangians are either particular cases of Theorems 2 and 3 combined with Corollaries 2 and 3 or can be easily derived directly from these theorems and corollaries, including [31, Theorem 4.1], [32, Theorem 2.2], [8, Theorem 3], [39, Theorem 2.1], [86, Theorem 4.1], [87, Theorem 2.1], the claims about the zero duality gap property in Theorems 7, 9, and 11 in [79], etc.
4.2 Optimal dual solutions and global saddle points
As we will show below, dual convergence of augmented Lagrangian methods (that is, the convergence of the sequence of multipliers) is directly connected with the existence of globally optimal solutions of the dual problem . Therefore, let us analyse the main properties of optimal dual solutions that will help us to better understand dual convergence of augmented Lagrangian methods. For the sake of brevity, below we use the term optimal dual solution instead of globally optimal solution of the dual problem .
First, we make a simple observation about the role of the penalty parameter in optimal dual solutions.
Proposition 3.
Let assumption hold true and be an optimal dual solution. Then for any the equality holds true and the pair is also an optimal dual solution.
Proof.
Under the assumption the augmented Lagrangian is non-decreasing in , which obviously implies that the augmented dual function is non-decreasing in as well. Therefore, for any one has , which means that is a globally optimal solution of the dual problem and . ∎
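Proposition 3 rests on the monotonicity of the dual function in the penalty parameter. This can be observed numerically for the Hestenes-Powell-Rockafellar augmented Lagrangian of a toy problem; the problem, the grid-based inner minimization, and the function name below are our illustration, not taken from the text:

```python
import numpy as np

def dual_function(lam, c, xs=np.linspace(-5.0, 5.0, 2001)):
    """Grid approximation of theta(lam, c) = inf_x L(x, lam, c), where L is the
    Hestenes-Powell-Rockafellar augmented Lagrangian of the toy problem
    min x^2 subject to 1 - x <= 0."""
    g = 1.0 - xs
    L = xs ** 2 + (np.maximum(0.0, lam + c * g) ** 2 - lam ** 2) / (2.0 * c)
    return float(L.min())

# For a fixed multiplier the dual function is non-decreasing in the penalty
# parameter, since the augmented Lagrangian itself is non-decreasing in it:
vals = [dual_function(1.0, c) for c in (0.5, 1.0, 2.0, 4.0, 8.0)]
monotone = all(a <= b + 1e-12 for a, b in zip(vals, vals[1:]))
```

Because the pointwise monotonicity in the penalty parameter is preserved under taking the infimum over the same grid, the computed values are monotone exactly, not merely up to discretization error.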
With the use of the previous proposition we can describe the structure of the set of optimal dual solutions, which we denote by . Define
Note that by definition , if for any . In addition, if and assumption holds true, then according to the previous proposition for any . Following the work of Burachik et al. [12], we call the function the penalty map.
Let us prove some properties of the penalty map and describe the structure of the set of optimal dual solutions with the use of this map.
Corollary 4.
Let assumptions , , and be valid and the dual problem have a globally optimal solution. Then the set is convex, the penalty map is a quasiconvex function and
Proof.
Let us first show that the set is convex. Indeed, choose any , , and . Then by Proposition 3 both and . Hence taking into account the fact that the dual function is concave in by assumption one gets
Therefore , which means that and the set is convex. Moreover, since was chosen arbitrarily, one has , that is, the penalty map is a quasiconvex function.
As was noted above, for any and one has . Hence bearing in mind the fact that the augmented dual function is upper semicontinuous (usc) by assumption one can conclude that
The validity of the converse inclusions follows directly from the definition of the penalty map and Proposition 3. ∎
Remark 13.
Note that we need to consider the case separately due to the fact that many particular augmented Lagrangians are not defined for (see examples in Section 3). However, if a given augmented Lagrangian is correctly defined for and assumptions , , and are satisfied for , then
where is the epigraph of the penalty map. This equality holds true, in particular, for Rockafellar-Wets’ augmented Lagrangian from Example 1. Let us also note that in the case when assumption is satisfied (e.g. in the case of Rockafellar-Wets’ augmented Lagrangian) the penalty map is convex, since in this case the dual function is concave and, therefore, the set of optimal dual solutions is convex.
Optimal dual solutions can be described in terms of global saddle points of the augmented Lagrangian.
Definition 2.
A pair is called a global saddle point of the augmented Lagrangian , if there exists such that
(16)
The infimum of all such is denoted by and is called the least exact penalty parameter for the global saddle point .
Remark 14.
It is worth noting that inequalities (16) from the definition of global saddle point are obviously satisfied as (and, therefore, can be replaced with) equalities.
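For a convex toy problem the saddle point inequalities (16) can be verified numerically. Everything in the sketch below (the problem min x² s.t. 1 − x ≤ 0, its saddle point (x*, λ*) = (1, 2) with the Hestenes-Powell-Rockafellar augmented Lagrangian, and the function names) is our illustration, not taken from the text:

```python
import numpy as np

def hpr_lagrangian(x, lam, c):
    """Hestenes-Powell-Rockafellar augmented Lagrangian for the toy problem
    min x^2 subject to 1 - x <= 0."""
    g = 1.0 - x
    return x ** 2 + (np.maximum(0.0, lam + c * g) ** 2 - lam ** 2) / (2.0 * c)

# Check L(x*, mu, c) <= L(x*, lam*, c) <= L(x, lam*, c) at (x*, lam*) = (1, 2)
# on grids of multipliers mu >= 0 and points x:
c, x_star, lam_star = 4.0, 1.0, 2.0
L_star = hpr_lagrangian(x_star, lam_star, c)  # equals f(x*) = 1
ok_left = all(hpr_lagrangian(x_star, mu, c) <= L_star + 1e-12
              for mu in np.linspace(0.0, 10.0, 101))
ok_right = all(L_star <= hpr_lagrangian(x, lam_star, c) + 1e-12
               for x in np.linspace(-3.0, 3.0, 601))
```

In line with Remark 14, the left inequality actually holds with equality here, since the constraint is active at the saddle point and the augmentation term vanishes at x* for every multiplier.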
The following theorem, which combines several well-known results (cf. [58, Theorem 11.59], [62, Theorem 2.1, part (v)], [88, Theorem 2.1], etc.), shows how optimal dual solutions are interconnected with global saddle points. We present a complete proof of this theorem because, to the best of the author’s knowledge, all three of its claims cannot be derived from existing results within our axiomatic augmented Lagrangian setting.
Theorem 4.
Let assumptions – and be valid. Then the following statements hold true:
-
1.
if a global saddle point of exists, then the zero duality gap property holds true, is a globally optimal solution of the problem , and for any the pair is an optimal dual solution;
-
2.
if is an optimal dual solution and the zero duality gap property holds true, then for any globally optimal solution of the problem the pair is a global saddle point of and ;
-
3.
if is a global saddle point of , then
for all .
Proof.
Part 1. Let be a global saddle point of and some be fixed. Let us first show that is feasible. Indeed, assume that . Then by assumption there exists a multiplier such that as , which contradicts the inequalities
that follow from the definition of the global saddle point. Thus, , that is, is feasible.
By assumption there exists such that , which by the definition of global saddle point implies that
where the last inequality is valid by assumption . Consequently, one has
Hence applying the definition of global saddle point and assumption once more one gets that
for any feasible , which means that is a globally optimal solution of the problem . Furthermore, one has
Therefore, by Proposition 1 the zero duality gap property holds true and the pair is an optimal dual solution for any .
Part 2. Let and be globally optimal solutions of the problems and respectively, and suppose that the zero duality gap property holds true. Fix any . Then applying assumption and Proposition 3 one obtains that
Consequently, , and applying assumption once again one gets
which obviously means that is a global saddle point of the augmented Lagrangian and .
Part 3. Let be a global saddle point. Choose any . From the proof of the first statement of the theorem it follows that
where the last equality and the fact that follow from the fact that is an optimal dual solution.
By the definition of global saddle point , which implies that
On the other hand, by the same definition one also has
Thus, , and the proof is complete. ∎
Remark 15.
Note that assumption is not needed for the validity of the first and third statements of the theorem, since it is not used in the proofs of these statements. In turn, assumptions and are not needed for the validity of the second statement of the theorem.
Combining the first and second statements of the previous theorem one obtains the two following useful results.
Corollary 5.
Let assumptions – and hold true. Then a global saddle point of exists if and only if there exist globally optimal solutions of the primal problem and the dual problem and the zero duality gap property holds true.
Corollary 6.
Let assumptions – and be valid and be a global saddle point of . Then for any globally optimal solution of the problem the pair is also a global saddle point of and .
Thus, the least exact penalty parameter does not depend on a globally optimal solution of the problem and is equal to the value of the penalty map .
Remark 16.
As was shown in [22, Proposition 9], if the functions and are differentiable, then under some natural assumptions on the function any global saddle point of the augmented Lagrangian is a KKT-point of the problem . This result implies that if there are two globally optimal solutions of the problem having disjoint sets of multipliers satisfying KKT optimality conditions (note that problems having such optimal solutions are necessarily nonconvex), then there are no global saddle points of the augmented Lagrangian and by Corollary 5 either the duality gap between the primal and dual problems is positive or the dual problem has no globally optimal solutions. As we will show below, this fact leads to the unboundedness of the sequence of multipliers or the sequence of penalty parameters generated by augmented Lagrangian methods for problems having optimal solutions with disjoint sets of Lagrange multipliers.
Let us give an example illustrating the previous remark.
Example 20.
Let . Consider the following optimization problem:
(17)
This problem has two globally optimal solutions: and . The corresponding Lagrange multipliers are and . Thus, the sets of Lagrange multipliers corresponding to two different globally optimal solutions are disjoint.
The optimal value function for problem (17) has the form:
The function is obviously continuous at the origin. Therefore, under the assumptions of Theorem 3 the zero duality gap property holds true. In particular, it holds true for the Hestenes-Powell-Rockafellar augmented Lagrangian for problem (17):
(18)
However, the corresponding augmented dual problem has no globally optimal solutions.
Indeed, if an optimal dual solution exists, then by Theorem 4 for any globally optimal solution of problem (17) the pair is a global saddle point of the augmented Lagrangian and . Therefore by the third statement of Theorem 4 and the definition of global saddle point for any one has
(19)
Hence applying assumption one gets that
Clearly, there exists such that and for any . Therefore, by the equalities above for any one has
or, equivalently, . However, as one can easily check,
which contradicts (19). Thus, the augmented dual problem has no globally optimal solutions. In the following section we will show how a standard primal-dual augmented Lagrangian method behaves for problem (17) (see Example 22).
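Since the Hestenes-Powell-Rockafellar augmented Lagrangian recurs throughout these examples, its standard form for inequality constraints, L(x, λ, c) = f(x) + (‖max(0, λ + c g(x))‖² − ‖λ‖²)/(2c) for the problem min f(x) s.t. g(x) ≤ 0, is easy to put into executable shape. The sketch below uses illustrative toy data, not the (omitted) data of problem (17):

```python
import numpy as np

def hpr_lagrangian(f, g, x, lam, c):
    """Hestenes-Powell-Rockafellar augmented Lagrangian for the problem
    minimize f(x) subject to g(x) <= 0 (componentwise):
    L(x, lam, c) = f(x) + (||max(0, lam + c*g(x))||^2 - ||lam||^2) / (2*c)."""
    gx = np.atleast_1d(np.asarray(g(x), dtype=float))
    lam = np.atleast_1d(np.asarray(lam, dtype=float))
    shifted = np.maximum(0.0, lam + c * gx)
    return f(x) + (shifted @ shifted - lam @ lam) / (2.0 * c)

# Toy illustration (hypothetical data): min x^2 s.t. 1 - x <= 0.
f = lambda x: x ** 2
g = lambda x: 1.0 - x
value_at_solution = hpr_lagrangian(f, g, 1.0, 2.0, 5.0)  # equals f(1) = 1
```

At a feasible point with λ ≥ 0 and an active constraint (g(x) = 0) the penalty term vanishes, so L(x, λ, c) = f(x), while infeasible points are penalized more heavily as c grows.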
Another object in augmented duality theory that is directly connected with optimal dual solutions and global saddle points is the augmented Lagrange multiplier, which we introduce below by analogy with the theory of Rockafellar-Wets’ augmented Lagrangians [62, 61, 88, 18, 21].
Definition 3.
A vector is called an augmented Lagrange multiplier (of the augmented Lagrangian ), if there exists such that the pair is a global saddle point of the augmented Lagrangian. The set of all augmented Lagrange multipliers is denoted by .
Remark 17.
(i) Our definition of the augmented Lagrange multiplier is equivalent to the one used in the context of Rockafellar-Wets’ augmented Lagrangians by [62, Theorem 2.1].
(ii) By Theorem 4, in the case when the zero duality gap property is satisfied and there exists a globally optimal solution of the primal problem, a vector is an augmented Lagrange multiplier if and only if there exists such that is an optimal dual solution. Consequently, .
Let us point out some interesting properties of augmented Lagrange multipliers.
Proposition 4.
Let assumption and the zero duality gap property hold true, and there exist a globally optimal solution of the problem . Then a vector is an augmented Lagrange multiplier if and only if there exists such that
(20)
Furthermore, if this equality and assumptions , , and are satisfied, then the following statements hold true:
1. for all ;
2. the infimum of all for which (20) holds true is equal to ;
3. for all the infimum in (20) is attained at every globally optimal solution of the problem ;
4. if the function is strictly increasing on for any and on for any , then for all the infimum in (20) is attained at some if and only if is a globally optimal solution of the problem .
Proof.
Part 1. Let be an augmented Lagrange multiplier. Then by Theorem 4 there exists such that is an optimal dual solution. Hence with the use of the fact that the zero duality gap property holds true one can conclude that equality (20) is valid.
Suppose now that equality (20) is satisfied for some and . Then by Proposition 1 the pair is an optimal dual solution, which by Theorem 4 implies that is an augmented Lagrange multiplier.
Part 2.1. Suppose that equality (20) is satisfied for some and , and assumptions , , and hold true. Then, as was noted earlier, the function is non-decreasing in , which along with Proposition 1 implies that for all .
Part 2.2. The fact that the infimum of all for which (20) holds true is equal to follows directly from the definition of the penalty map.
Part 2.3. If a pair satisfies equality (20), then it is an optimal dual solution. Consequently, by Theorem 4 for any globally optimal solution of the problem the pair is a global saddle point of the augmented Lagrangian. By the definition of global saddle point it means that the infimum in (20) is attained at (see (16)).
Part 2.4. Suppose finally that the function is strictly increasing on for any and on for any , and the infimum in (20) is attained at some . If is feasible and or is infeasible, then for any by our assumption on the function one has , which is obviously impossible. Therefore, is feasible and , which means that , that is, is a globally optimal solution of the problem . ∎
Thus, if is an augmented Lagrange multiplier, then under some additional assumptions the problem has the same optimal value and the same globally optimal solutions as the problem
for any . That is why in [58] augmented Lagrange multipliers were called multipliers supporting an exact penalty representation (see [58, Definition 11.60] and [31, 32, 86, 9, 72, 87, 12]).
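The exact penalty representation admits a quick numerical sanity check on a hypothetical example (not taken from the paper): for min x² subject to 1 − x ≤ 0 the Lagrange multiplier at the solution x* = 1 is λ* = 2, and with the Hestenes-Powell-Rockafellar augmented Lagrangian the minimizer of L(·, λ*, c) coincides with x* for every c > 0, illustrating a multiplier supporting an exact penalty representation:

```python
import numpy as np

def hpr(x, lam, c):
    # HPR augmented Lagrangian for the toy problem: min x^2 s.t. 1 - x <= 0
    return x ** 2 + (np.maximum(0.0, lam + c * (1.0 - x)) ** 2 - lam ** 2) / (2.0 * c)

lam_star = 2.0                       # Lagrange multiplier at the solution x* = 1
xs = np.linspace(0.0, 3.0, 300001)   # crude global solver: minimize over a fine grid
for c in (0.1, 1.0, 10.0):
    x_min = xs[np.argmin(hpr(xs, lam_star, c))]
    assert abs(x_min - 1.0) < 1e-4   # unconstrained minimizer equals x* for every c
```

That the representation here holds for every positive penalty parameter is a feature of this convex toy problem; in general the infimum of admissible penalty parameters is the value of the penalty map.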
Remark 18.
The assumption that the function is strictly increasing on for any and on for any is very mild and satisfied in many particular cases. For example, it is satisfied for any for Rockafellar-Wets’ augmented Lagrangian (Example 1), provided the function has a valley at zero and is continuous at this point, the Hestenes-Powell-Rockafellar augmented Lagrangian (Examples 2, 14, 16, 19), the sharp Lagrangian (Example 3), Mangasarian’s augmented Lagrangian (Examples 4 and 7), the essentially quadratic augmented Lagrangian (Example 5), the cubic augmented Lagrangian (Example 6), the penalized exponential-type augmented Lagrangian (Examples 9 and 18), provided the function is strictly convex on , and He-Wu-Meng’s augmented Lagrangian (Example 13). This assumption is also satisfied for any that does not have zero components for the exponential-type augmented Lagrangian (Example 8), the hyperbolic-type augmented Lagrangian (Example 11), and the modified barrier function (Example 12), provided the function is strictly convex in these examples, and for the p-th power augmented Lagrangian (Example 10).
Remark 19.
To the best of the author’s knowledge, the penalty map, augmented Lagrange multipliers, and an exact penalty representation have been introduced and studied earlier only in the context of Rockafellar-Wets’ augmented Lagrangians. In this section we demonstrated (see Corollary 4 and Proposition 4) that there is nothing specific in these concepts that is inherently connected to Rockafellar-Wets’ augmented Lagrangians. They can be naturally introduced and studied for any other augmented Lagrangian, including the (penalized) exponential-type augmented Lagrangian, modified barrier functions, Mangasarian’s augmented Lagrangian, etc., that are typically not considered in the theory of Rockafellar-Wets’ augmented Lagrangians.
4.3 Optimal dual solutions for convex problems
In the case when the problem is convex, optimal dual solutions, roughly speaking, do not depend on the penalty parameter (more precisely, the penalty map does not depend on ), and one can give a very simple (and well-known) description of the set of optimal dual solutions in terms of Lagrange multipliers.
Let the function and the set be convex, and the mapping be convex with respect to the binary relation , that is,
Then the problem is convex. Moreover, in this case under assumption the augmented Lagrangian is convex in . Indeed, by applying first the fact that is non-decreasing with respect to and then the fact that this function is convex one obtains that for any and the following inequalities hold true:
which means that the function is convex for any and (in the case when , instead of the inequalities above one should apply [54, Theorem I.4.2]).
Denote by the standard Lagrangian for the problem , where . It is easily seen that the function is convex. Recall that a vector is called a Lagrange multiplier of the problem at a feasible point , if and , where is the subdifferential of the function at in the sense of convex analysis and is the normal cone to the set at (see, e.g. [6, Definition 3.5]). The existence of a Lagrange multiplier at is a sufficient, and in the case when necessary, global optimality condition, and the set of Lagrange multipliers of the problem is a nonempty, convex, weak∗ compact set that does not depend on an optimal solution (see [6, Theorem 3.6]). Furthermore, is a Lagrange multiplier at if and only if is a saddle point of the Lagrangian :
Note finally that if is directionally differentiable at , then is a Lagrange multiplier at if and only if for all and , where is the directional derivative of at in the direction , and is the contingent cone to the set at the point (cf. [6, Lemma 3.7]).
Let us now present a complete characterisation of the set of optimal dual solutions in terms of Lagrange multipliers in the convex case. Roughly speaking, this result shows that under suitable assumptions there is essentially no difference between standard duality theory, based on the Lagrangian , and augmented duality theory, based on the augmented Lagrangian . For the sake of simplicity we will prove this result under the assumption that the functions and are directionally differentiable at a globally optimal solution of the problem , although in various particular cases (e.g. in the case of inequality constrained problems) this result can be proved without this assumption.
Theorem 5.
Let the following conditions hold true:
1. and are convex, is convex with respect to the binary relation ;
2. assumptions , , , , , and hold true;
3. ;
4. and are directionally differentiable at a globally optimal solution of the problem .
Then a Lagrange multiplier of the problem exists if and only if the zero duality gap property holds true and there exists a globally optimal solution of the augmented dual problem with .
Moreover, if a Lagrange multiplier of the problem exists, then with is an optimal dual solution if and only if is a Lagrange multiplier of the problem and , where is from assumption .
Proof.
Suppose that a Lagrange multiplier exists. Then, as was noted above, for all . By assumption the function is Fréchet differentiable and its Fréchet derivative is a surjective mapping from onto . Therefore, there exists such that . Applying the chain rule for directional derivatives one gets
for any and . As was noted above, under the assumptions of the theorem the function is convex. Consequently, the inequality above implies that is a point of global minimum of on the set . Recall that , since is a Lagrange multiplier. Hence by assumption one has , and with the use of assumption one gets
which by Proposition 1 means that the zero duality gap property is satisfied and with any is an optimal dual solution.
Suppose now that the zero duality gap property holds true and there exists an optimal solution of the problem with . Then by Theorem 4 (see also Remark 15) the pair is a global saddle point of the augmented Lagrangian and . Therefore, by the definition of global saddle point is a point of global minimum of on the set and
(21)
(here we used Proposition 3). Hence with the use of assumption one obtains that
for any , where . Moreover, and, therefore, by assumption , since otherwise by assumption one has , which contradicts (21). Thus, is a Lagrange multiplier.
It remains to note that, as was shown above, if is a Lagrange multiplier, then for any and such that the pair is an optimal dual solution. Conversely, if with is an optimal dual solution, then is a Lagrange multiplier. ∎
Corollary 7.
Let and be convex, be convex with respect to the binary relation , and be directionally differentiable at an optimal solution of the problem , , and suppose that assumptions – and hold true. Then a Lagrange multiplier of the problem exists if and only if the zero duality gap property holds true and there exists an optimal dual solution. Moreover, if a Lagrange multiplier of the problem exists, then is a globally optimal solution of the dual problem if and only if is a Lagrange multiplier of the problem and .
Proof.
With the use of the previous corollary we can finally describe the structure of the set of optimal dual solutions , the penalty map , and the set of augmented Lagrange multipliers in the convex case.
Corollary 8.
Let the assumptions of the previous corollary be valid. Then , for any , and the following equality holds true:
(22)
Remark 20.
The fact that optimal dual solutions for convex problems do not depend on the penalty parameter motivates one to consider a slightly different augmented dual problem in the convex case:
(23)
Here and is fixed, that is, the penalty parameter is not considered as a variable of augmented dual problem, but rather as a fixed external parameter. Note that the function is concave, provided assumption holds true, which is satisfied for most particular augmented Lagrangians, in contrast to the much more restrictive assumption , which is satisfied, to the best of the author’s knowledge, only for Rockafellar-Wets’ augmented Lagrangian and, in particular, the Hestenes-Powell-Rockafellar augmented Lagrangian. Taking into account the concavity of the function one can consider primal-dual augmented Lagrangian methods based on solving problem (23), instead of the augmented dual problem , i.e. augmented Lagrangian methods with fixed penalty parameter. Convergence analysis of such methods is always based on the use of a particular structure of an augmented Lagrangian under consideration (see the survey of such methods in [35]), which makes it difficult to extend such analysis to the general axiomatic augmented Lagrangian setting adopted in this article.
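As a numerical illustration of the concavity of the fixed-penalty dual function (a sketch on hypothetical toy data, not problem data from the paper), one can tabulate λ ↦ inf_x L(x, λ, c) for the Hestenes-Powell-Rockafellar augmented Lagrangian and check midpoint concavity and weak duality directly:

```python
import numpy as np

# Toy convex problem: min x^2 s.t. 1 - x <= 0, with optimal value f* = 1.
xs = np.linspace(0.0, 3.0, 300001)

def dual_value(lam, c=1.0):
    # fixed-penalty augmented dual function: inf over x of the HPR Lagrangian,
    # approximated by minimization over a fine grid
    vals = xs ** 2 + (np.maximum(0.0, lam + c * (1.0 - xs)) ** 2 - lam ** 2) / (2.0 * c)
    return vals.min()

q0, q1, q2 = dual_value(0.0), dual_value(1.0), dual_value(2.0)
assert q1 >= 0.5 * (q0 + q2) - 1e-8   # midpoint concavity on the segment [0, 2]
assert max(q0, q1, q2) <= 1.0 + 1e-8  # weak duality: dual values never exceed f*
assert abs(q2 - 1.0) < 1e-6           # f* is attained at the multiplier lam* = 2
```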
Let us show that in the case when assumptions , , and are not satisfied, equality (22) might be no longer valid and the penalty map might not be identically equal to zero on , even when the problem is convex.
Example 21.
Let . Consider the following optimization problem:
(24)
Let be the sharp Lagrangian for this problem (i.e. the augmented Lagrangian from Example 1 with ). Then
and, as one can readily verify by carefully writing down all particular cases,
Consequently, one has
that is, the claims of Corollary 8 do not hold true for the sharp Lagrangian (recall that this Lagrangian does not satisfy assumptions , , and ).
5 Convergence analysis of augmented Lagrangian methods
The goal of this section is to prove general convergence theorems for a large class of augmented Lagrangian methods and to analyse interrelations between convergence of augmented Lagrangian methods, the zero duality gap property, and the existence of global saddle points/optimal dual solutions. We aim to present results that explicitly highlight these interrelations, instead of using them implicitly within the proofs, as is usually done in the literature.
5.1 Model augmented Lagrangian method
We present all theoretical results for the model augmented Lagrangian method given in Algorithm 1. In order to include various particular cases into the general theory, we do not specify the way in which multipliers and the penalty parameter are updated by the method, that is, they can be updated in any way that satisfies certain assumptions presented below. This makes our results applicable to the vast majority of existing augmented Lagrangian methods, including methods with nonmonotone penalty parameter updates as in [4], methods based on maximizing the augmented dual function with the use of bundle methods as in [48], etc. However, one should underline that our convergence analysis is by no means universal and there are augmented Lagrangian methods to which it cannot be applied. We will briefly discuss some such methods further in this section (see Remark 21 below).
Let us comment on Step 1 of Algorithm 1. For the purposes of theoretical analysis of primal-dual augmented Lagrangian methods it is often assumed that the augmented Lagrangian subproblem
is solved exactly, i.e. that is a globally optimal solution of this problem [69, 50, 27, 8, 40, 41, 38, 13, 5]. Moreover, even when it is assumed that this subproblem is solved only approximately as in [15, 44, 71, 10, 7, 11, 16, 14], one almost always assumes that as , and the case when does not tend to zero is not properly analysed (papers [45, 48] are very rare exceptions to this rule). However, from the practical point of view the assumption that as cannot be satisfied, especially in the infinite dimensional case, due to round off errors, discretisation errors, etc. The value should be viewed as an unavoidable error reflecting the overall precision with which computations can be performed that does not tend to zero with iterations. To take this unavoidable error into account, below we present a detailed analysis of the model augmented Lagrangian method without assuming that as , and then show how corresponding convergence theorems can be clarified and strengthened by imposing this additional purely theoretical assumption.
It should also be noted that practical augmented Lagrangian methods must include stopping criteria. We do not include a stopping criterion in our formulation of the model augmented Lagrangian method, because we are interested only in its theoretical (asymptotic) analysis, that is, in the analysis of the way sequences generated by this method behave as . This asymptotic analysis can be used to devise appropriate stopping criteria for practical implementations of augmented Lagrangian methods.
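Although the paper keeps the multiplier and penalty updates abstract, the skeleton of the model method can be made concrete in its most classical instance: the first-order method of multipliers with the Hestenes-Powell-Rockafellar update λ_{k+1} = max(0, λ_k + c_k g(x_{k+1})). The sketch below uses hypothetical one-dimensional problem data and a grid-based subproblem solver whose fixed resolution plays the role of the error ε_k that stays bounded away from zero:

```python
import numpy as np

# Hypothetical test problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0,
# with solution x* = 1 and Lagrange multiplier lam* = 2.
f = lambda x: x ** 2
g = lambda x: 1.0 - x

xs = np.linspace(0.0, 3.0, 300001)   # fixed grid: its spacing acts as the error eps_k
lam, c = 0.0, 1.0                    # initial multiplier; penalty kept fixed here

for k in range(60):
    # Step 1: (inexactly) minimize the HPR augmented Lagrangian over the grid
    vals = f(xs) + (np.maximum(0.0, lam + c * g(xs)) ** 2 - lam ** 2) / (2.0 * c)
    x = xs[np.argmin(vals)]
    # Step 2: first-order multiplier update
    lam = max(0.0, lam + c * g(x))

assert abs(x - 1.0) < 1e-3 and abs(lam - 2.0) < 1e-3
```

The penalty parameter is kept fixed purely for brevity; practical instances of the model method typically increase the penalty when infeasibility does not decrease sufficiently, and any such rule is admissible for the theory as long as the assumptions (B1)-(B4) below are met.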
Below we will utilise the following natural assumptions on the model augmented Lagrangian method and sequences generated by this method that are satisfied in many particular cases:
(B1) for any the function is bounded below on ;
(B2) the sequence of multipliers is bounded;
(B3) if the sequence of penalty parameters is bounded, then one has as ; if, in addition, some subsequence is bounded, then as ;
(B4) if the sequence of penalty parameters is unbounded, then as .
The assumption is a basic assumption for all primal-dual augmented Lagrangian methods, which is needed to ensure that the sequence is correctly defined. The assumption is often imposed for the purposes of convergence analysis of augmented Lagrangian methods and, as is noted in [5], is usually satisfied in practice for traditional rules for updating multipliers. Moreover, various techniques can be used to guarantee the validity of assumption , such as safeguarding and normalization of multipliers [44, 42, 5].
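One concrete safeguarding technique of the kind cited above is to project the classical multiplier update onto a fixed box, as done in ALGENCAN-type methods; a minimal sketch follows (the bound lam_max is an arbitrary illustrative parameter, not one prescribed by the paper):

```python
import numpy as np

def safeguarded_multiplier_update(lam, g_val, c, lam_max=1e6):
    # Classical first-order update max(0, lam + c*g(x)) followed by projection
    # onto the box [0, lam_max]^m, keeping the multipliers bounded by construction.
    raw = np.asarray(lam, dtype=float) + c * np.asarray(g_val, dtype=float)
    return np.clip(raw, 0.0, lam_max)

updated = safeguarded_multiplier_update([1.0, -5.0], [1e7, -1.0], 1.0)
# first component hits the safeguard; second is projected onto the nonnegative orthant
```

Whatever the unsafeguarded update produces, the projected multipliers stay in a compact box, so the boundedness assumption (B2) holds automatically.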
We formulate as an assumption due to the fact that we do not impose any restrictions on the way in which the penalty parameter is updated. For many augmented Lagrangian methods, penalty parameter updates are specifically designed to ensure that assumption is satisfied by default (see the rules for updating the penalty parameter and corresponding convergence analysis in [44, 42, 71, 5] and other aforementioned papers on augmented Lagrangian methods). Finally, assumption is needed only in the case of methods with nonmonotone penalty parameter updates. It excludes the undesirable situation of unboundedly increasing oscillations of the penalty parameter (e.g. and for all ), which cannot be properly analysed within our general augmented Lagrangian setting.
Remark 21.
(i) Assumption plays one of the key roles in our convergence analysis of the model augmented Lagrangian method. Therefore, this analysis is inapplicable to those methods for which assumption is not satisfied, such as the modified subgradient algorithm (the MSG) proposed by Gasimov [27] (the fact that assumption is not satisfied for the MSG in the general case follows from [8, Example 1]).
(ii) The convergence analysis of the model augmented Lagrangian method presented below heavily relies on the assumption on boundedness of the sequence of multipliers and cannot be applied in the case when the sequence does not have at least a bounded subsequence. However, there are primal-dual augmented Lagrangian methods for which one can prove convergence of the sequence to the set of globally optimal solutions of the problem even in the case when the norm of the multipliers increases unboundedly with iterations. Augmented Lagrangian methods with the so-called conditional multiplier updating (see [20], [44, Algorithm 3], [42, Algorithm 3], [45, Algorithm 3]) and the algorithms from [71] are examples of such methods. The main idea behind these methods consists in designing multiplier and penalty parameter updating rules in such a way as to ensure that an increase of the norm of the multipliers is compensated by a sufficient increase of the penalty parameter , so that one can prove that
(25)
even if as . It is possible to extend convergence analysis of these methods to our axiomatic augmented Lagrangian setting by either imposing some restrictive assumptions on the function or directly assuming that relations (25) hold true. We do not present such extension here and leave it as a problem for future research.
5.2 Convergence analysis of the method
Let us now turn to convergence analysis. We start with two simple observations. The first one can be viewed as a generalization of some existing results (e.g. [27, Theorem 5] and [48, Theorem 1]) to the case of general cone constrained problems and arbitrary augmented Lagrangians satisfying a certain assumption.
Lemma 3.
Let the function be non-decreasing with respect to the binary relation for any and , and let be the sequence generated by the model augmented Lagrangian method. Then for any the point is an -optimal solution of the problem
(26)
Moreover, if , then is an -optimal solution of the problem .
Proof.
Suppose by contradiction that the claim of the lemma is false. Then one can find a feasible point of problem (26) such that . Hence by our assumption on the function one gets
which contradicts the definition of .
Suppose now that . Recall that the inequality means that or, equivalently, . Therefore the feasible region of the problem coincides with the feasible region of problem (26), which implies that an -optimal solution of this problem is also an -optimal solution of the problem . ∎
Remark 22.
The assumption that the function is non-decreasing is satisfied for all particular augmented Lagrangians presented in this paper, except for the one from Example 15.
The second observation is connected with the augmented dual function. Recall that if the function satisfies assumption , then for any feasible point . Therefore, the following result holds true.
Lemma 4.
Let assumption be valid. Then
Thus, if the sequence is bounded below, then the corresponding sequence is bounded. Let us analyse how these sequences behave in the limit. As we will see below, this analysis is a key ingredient in the global convergence theory of augmented Lagrangian methods.
The following cumbersome technical result, which can be viewed as a partial generalization of Theorem 2, plays a key role in our convergence analysis of the model augmented Lagrangian method. The proof of this result is very similar to the proof of Theorem 2, and we include it only for the sake of completeness.
Lemma 5.
Let assumptions , , and hold true. Suppose also that a sequence is such that:
1. as ,
2. the sequence is bounded,
3. as ,
4. the sequence is bounded above.
Let finally . Then
(27)
(28)
(29)
where .
Proof.
Note that the upper estimate for the limit superior in (29) follows directly from the upper estimate in (28) and assumption . In turn, the lower estimate for the limit inferior in (29) follows directly from the fact that as .
Indeed, if some subsequence is feasible for the problem , then obviously . In turn, if each member of a subsequence is infeasible for the problem , then for any such that . Since as , one can choose , , such that as . Consequently, one has
which obviously implies that the lower estimate in (29) holds true.
Case I. Suppose that there exists a subsequence that is feasible for the problem . Then with the use of Lemma 1 one gets
where .
Case II. Suppose now that there exists a subsequence such that for all . Then with the use of assumption one gets
where is any sequence such that for all and as (clearly, such sequence exists, since as ).
Combining the two cases one gets that the lower estimates for the limit inferiors in (27) and (28) hold true. Let us now prove the upper estimates for the limit superiors. Note that the upper estimate in (28) follows directly from the upper estimate in (27) and Lemma 4. Therefore, it suffices to prove only the upper estimate for the limit superior of .
By Proposition 1 one has for all , which implies that . If , then the proof is complete. Therefore, suppose that .
By the definition of limit inferior there exists a sequence such that and as . Let be the sequence from assumption . Since as , there exists a subsequence such that for all .
By the definition of the optimal value function for any one can find such that (which implies that ) and , if , and otherwise. By applying assumption , one gets that
which means that the upper estimate in (27) holds true. ∎
Remark 23.
Corollary 9.
Let assumptions , , and hold true. Let also a sequence be such that the sequence is bounded and as . Suppose finally that there exists a sequence such that and as . Then
(30)
Proof.
Apply the previous lemma to the sequence with for all . ∎
Remark 24.
The corollary above strengthens Lemma 5. Namely, it states that if there exists a sequence satisfying the assumptions of the corollary, then one can actually replace the lower and upper estimates (27) with equality (30). Note also that by the definition of augmented dual function there always exists a sequence such that as . The main assumption of the corollary is that one can find a sequence not only satisfying this condition, but also such that as .
Let us also provide necessary and sufficient conditions for the sequence to converge to zero.
Lemma 6.
Let assumptions , , and hold true and a sequence be such that:
1. the sequence is bounded,
2. as ,
3. the sequence is bounded above,
4. there exists such that the function is bounded below on .
Then for the sequence to converge to zero it is sufficient that the sequence is bounded below. This condition becomes necessary when .
Proof.
If only a finite number of elements of the sequence is infeasible for the problem , then the claim of the lemma is trivial, since in this case and for any sufficiently large . Therefore, replacing, if necessary, the sequence with its subsequence one can suppose that for all .
Fix any . Due to the boundedness of the sequence and assumption there exists such that for any , , and one has
which implies that , where
Therefore, for any such that and one has
which contradicts (31). Consequently, for any such that one has , which implies that as .
Necessity. Suppose by contradiction that , but the sequence is unbounded below. Then for any sequence such that and as (at least one such sequence exists, since as ) one has
which contradicts our assumption. ∎
Remark 25.
(i) The last assumption of the lemma is satisfied, in particular, if for any bounded set there exists such that the function is bounded below on . This assumption is satisfied for all particular examples of augmented Lagrangians from Section 3, except for He-Wu-Meng’s augmented Lagrangian (Example 13), under appropriate additional assumptions. Namely, in the case of Rockafellar-Wets’ augmented Lagrangian (Example 1) one needs to additionally assume that for some and , while in the case of the (penalized) exponential-type augmented Lagrangians (Examples 8, 9, 15, 17, and 18) and the hyperbolic-type augmented Lagrangian (Example 11) one needs to additionally assume that the function / is bounded below. In all other examples the assumption on the boundedness below of the function is satisfied without any additional assumptions.
(ii) The last assumption of the lemma is satisfied for He-Wu-Meng’s augmented Lagrangian and the (penalized) exponential/hyperbolic-type augmented Lagrangians with unbounded below functions /, if the projection of the set onto is bounded. In particular, in the case of inequality constrained problems it is sufficient to suppose that the functions , defining the constraints , are bounded below. As was noted in Remark 9, this assumption is not restrictive from the theoretical point of view.
(iii) It should be noted that we used the last assumption of the lemma in order to implicitly prove that assumption implies that
(32)
for any . Therefore one might wonder whether it would be better to formulate (32) as a basic assumption and use it instead of assumption and the assumption on the boundedness below of the function . Note, however, that in most particular cases the boundedness below of the function is a necessary condition for (32) to hold true. In particular, one can easily check that if a separable augmented Lagrangian (see (6)) satisfies condition (32), then each function is bounded below. That is why we opted to use assumption along with the assumption on the boundedness below of the function instead of condition (32).
Now we are ready to estimate the limit of the sequence and the corresponding sequences and of the augmented dual and objective functions’ values for sequences generated by the model augmented Lagrangian method. Recall that is the optimal value of the augmented dual problem.
Theorem 6 (main convergence theorem).
Let be the sequence generated by the model augmented Lagrangian method, and suppose that the following conditions are satisfied:
1. assumptions , , –, and hold true;
2. assumptions – hold true;
3. the sequence is bounded and ;
4. for any bounded set there exists such that the function is bounded below on .
Then in the case when the sequence is bounded one has as , and in the case when the sequence is unbounded one has if and only if the sequence is bounded below. Furthermore, if the sequence is bounded below, then
(33)
(34)
(35)
Proof.
If the sequence is unbounded, then by assumption one has as , and the claim of the theorem follows directly from Lemmas 5 and 6 (the fact that follows directly from Remark 8). Therefore, suppose that the sequence of penalty parameters is bounded. Note that inequalities (33) in this case follow from the first inequality in (34) and the definition of . Note further that by assumption one has and as . Consequently, one has
Thus, it is sufficient to prove either of the inequalities (34) and (35). We divide the rest of the proof into two parts.
Part 1. Lower estimate. Let be any subsequence such that
(at least one such subsequence exists by the definition of limit inferior). Suppose at first that there exists a subsequence of the sequence , which we denote again by , that is feasible for the problem . Then for all and the lower estimate for the limit inferior in (35) holds true by Proposition 1.
Suppose now that is infeasible for the problem for all greater than some . Since as , for any one can find such that and as . Consequently, for all and
which by Theorem 2 implies that the lower estimate for the limit inferior in (35) is valid.
Part 2. Upper estimate. Suppose at first that . Then by Lemma 4 and Theorem 2 one has
that is, the upper estimate for the limit superior in (34) holds true.
Let us now consider the case . By the definition of limit inferior there exists such that and as . Note that by Remark 8 and, therefore, one can suppose that for all .
By the definition of the optimal value function one can find a sequence such that and for all . Hence, in particular, as , which by assumption implies that . Consequently, one has
Recall that by the definition of one has . Therefore the inequality above, along with Theorem 2, implies that the upper estimate for the limit superior in (34) holds true. ∎
Remark 26.
(i) The previous theorem can be slightly generalized in the following way. Namely, suppose that assumption does not hold true, but there exists a bounded subsequence . Then the claim of Theorem 6 holds true for the corresponding subsequence . In the case when the sequence is unbounded, one simply needs to apply Lemmas 5 and 6 to this subsequence. In the case when the sequence is bounded, one just needs to repeat the proof of the theorem with the sequence replaced by the subsequence .
(ii) Note that from the proof of Theorem 6 it follows that the sequence is always bounded below in the case when the sequence is bounded.
(iii) In many papers on augmented Lagrangians and augmented Lagrangian methods, it is assumed by default that the function is bounded below on the set (see [44, Assumption 1], [42, Assumption 2.3], [39, Assumption 1], [71, Assumption 1], [45, Assumption 1], [70, Assumption (1)], etc.). From the theoretical point of view this assumption is not restrictive, since one can always replace the objective function with . This assumption ensures that as for sequences generated by the model augmented Lagrangian method, regardless of whether the sequence of penalty parameters is bounded or not. Furthermore, in the case when the sequence increases unboundedly, this assumption guarantees that estimates (33) can be replaced with equality (30) by Corollary 9 and Lemma 6.
Corollary 10.
Let the assumptions of Theorem 6 hold true, and suppose that as . Then
5.3 Primal convergence
Now we are ready to prove general theorems on convergence of the sequence generated by the model augmented Lagrangian method. Denote by the duality gap between the primal problem and the augmented dual problem .
Theorem 7 (primal convergence vs. duality gap).
Let the assumptions of Theorem 6 be valid, the sequence be bounded below, and the functions and be lsc on . Then the sequence has limit points only if (that is, only if the duality gap is smaller than the tolerance with which the augmented Lagrangian subproblems are solved). Furthermore, all limit points of the sequence (if such points exist) are -optimal solutions of the problem .
Proof.
Suppose that there exists a limit point of the sequence , i.e. there exists a subsequence that converges to . Recall that as and
by Theorem 6. Hence taking into account the semicontinuity assumptions one can conclude that , i.e. is feasible for the problem , and
Therefore, and is an -optimal solution of the problem . ∎
Corollary 11 (primal convergence vs. zero duality gap).
If under the assumptions of the previous theorem as , then for the sequence to have limit points it is necessary that there is zero duality gap between the primal problem and the augmented dual problem . Furthermore, in this case all limit points of the sequence (if such points exist) are globally optimal solutions of the problem .
In the case when the space is reflexive (in particular, finite dimensional), we can prove a somewhat stronger result. Namely, we can show that if the zero duality gap property is not satisfied, then the sequence necessarily escapes to infinity as .
Theorem 8 (boundedness vs. duality gap).
Let the assumptions of Theorem 6 be valid, the space be reflexive, the set be weakly sequentially closed, the functions and be weakly sequentially lsc on , and the sequence be bounded below. Then the following statements hold true:
1. for the sequence to have a bounded subsequence it is necessary that ;
2. all weak limit points of the sequence (if such points exist at all) are -optimal solutions of the problem ;
3. if as , then for the sequence to have a bounded subsequence it is necessary that the zero duality gap property holds true; furthermore, in this case all weak limit points of the sequence are globally optimal solutions of the problem .
Proof.
Bearing in mind the fact that any bounded sequence in a reflexive Banach space has a weakly convergent subsequence and arguing in the same way as in the proof of Theorem 7 one can easily verify that all claims of this theorem hold true. ∎
5.4 Dual convergence
Let us now turn to the analysis of dual convergence, that is, convergence of the sequence of multipliers or, more precisely, convergence of the dual sequence . Although convergence of multipliers for some particular augmented Lagrangian methods can be studied with the use of optimality conditions, even in the case when the sequence of multipliers increases unboundedly, apparently only convergence of the whole sequence is connected with fundamental properties of the augmented dual problem. Such a connection might also exist in the case when the penalty parameter increases unboundedly, but an analysis of this connection is a challenging task, which we leave as an open problem for future research.
We start our study of the dual convergence with a simple auxiliary result that provides an important characterisation of limit points of the sequence .
Lemma 7.
Let all assumptions of Theorem 6 be valid, except for assumption . Suppose also that assumption holds true and the sequence is bounded. Then any limit point of the sequence (if such a point exists) is an -optimal solution of the dual problem.
Proof.
Let a subsequence converge to some . Then, in particular, the sequence is bounded and by Theorem 6 (see also Remark 26) one has
Hence bearing in mind the fact that the function is upper semicontinuous by assumption (see Remark 7) one can conclude that , that is, is an -optimal solution of the dual problem. ∎
Corollary 12.
Let the assumptions of Lemma 7 be valid and suppose that the functions and are lsc on , while the augmented Lagrangian is lsc on . Then any limit point of the sequence is an -saddle point of the augmented Lagrangian, that is,
(36)
Proof.
Suppose that a subsequence converges to some triplet . Then by Theorem 6 (see also Remark 26) one has as and
Therefore, by the lower semicontinuity assumptions one has and
On the other hand, Proposition 1 and Lemma 7 imply that
Hence with the use of assumption one gets that
that is, the first inequality in (36) is satisfied.
By the definition of one has
which implies that
Hence with the use of assumption and the fact that the function is lsc one obtains
that is, the second inequality in (36) is satisfied. ∎
Remark 27.
(i) It should be noted that the augmented Lagrangian is lsc on for all particular examples of the function from Section 3, except for Rockafellar-Wets’ augmented Lagrangian, if the function is lsc on and the function is continuous on this set. In the case of Rockafellar-Wets’ augmented Lagrangian (Example 1) one needs to impose some additional assumptions, such as or and . Let us also mention that in the case of inequality constrained problems, instead of assuming that is continuous, it is sufficient to suppose that the functions defining the constraints are only lower semicontinuous.
With the use of Lemma 7 we can easily show how dual convergence is connected with existence of optimal dual solutions/global saddle points of the augmented Lagrangian.
Theorem 9 (dual convergence vs. existence of optimal dual solutions).
Let be the sequence generated by the model augmented Lagrangian method, and suppose that the following conditions are satisfied:
1. assumptions , , , –, and hold true;
2. assumptions , , and hold true;
3. as ;
4. for any bounded set there exists such that the function is bounded below on .
Then for the sequence of penalty parameters to be bounded and the sequence of multipliers to have a limit point it is necessary that a globally optimal solution of the dual problem exists.
Proof.
By Lemma 7, under the assumptions of the theorem any limit point of the sequence is a globally optimal solution of the dual problem. Therefore, for the existence of such a limit point (or, equivalently, for the sequence to be bounded and the sequence to have a limit point) it is necessary that a globally optimal solution of the dual problem exists. ∎
Theorem 10 (full convergence vs. existence of global saddle points).
Let all assumptions of Theorem 9 be valid and suppose that the functions and are lsc on the set . Then for the sequence of penalty parameters to be bounded and the sequence to have a limit point it is necessary that there exists a global saddle point of the augmented Lagrangian . Moreover, for any limit point of the sequence (if such a point exists) the pair is a global saddle point of the augmented Lagrangian and .
Proof.
Let be a limit point of the sequence . Then by Lemma 7 the pair is an optimal dual solution. In turn, by applying Theorem 6 (see also Remark 26) one can readily verify that the zero duality gap property is satisfied and is a globally optimal solution of the problem . Therefore, by Theorem 4 the pair is a global saddle point of and . Consequently, for the existence of a limit point of the sequence it is necessary that there exists a global saddle point of the augmented Lagrangian. ∎
In the case when the space is reflexive one can prove somewhat stronger versions of the previous theorems that uncover a connection between the boundedness of the sequences of multipliers and penalty parameters and the existence of optimal dual solutions/global saddle points.
Theorem 11 (boundedness vs. existence of optimal dual solutions).
Suppose that is the sequence generated by the model augmented Lagrangian method, and let the following conditions be satisfied:
1. the space is reflexive;
2. assumptions , , , , –, and hold true;
3. assumptions , , and hold true;
4. the sequence is bounded and ;
5. for any bounded set there exists such that the function is bounded below on .
Then the following statements hold true:
1. any weak limit point of the sequence (if such a point exists) is an -optimal solution of the problem ;
2. if as , then for the boundedness of the sequence it is necessary that there exists a globally optimal solution of the augmented dual problem ;
3. if, in addition, is reflexive, is weakly sequentially closed, the functions and are weakly sequentially lsc on , and as , then for the boundedness of the sequence it is necessary that there exists a global saddle point of the augmented Lagrangian; furthermore, for any weak limit point of the sequence the pair is a global saddle point of and .
Proof.
The proof of this theorem almost literally repeats the proofs of Lemma 7 and Theorems 9 and 10. One only needs to use the facts that (i) any bounded sequence from a reflexive Banach space has a weakly convergent subsequence, (ii) the augmented dual function is usc and concave by assumptions and , and (iii) any usc concave function defined on a Banach space is also weakly sequentially usc. ∎
Remark 28.
It should be underlined that the previous theorem provides necessary conditions for the boundedness of sequences generated by the model augmented Lagrangian method irrespective of the way in which the sequences of multipliers and penalty parameters are updated. As long as the assumptions of the theorem are satisfied, the existence of a global saddle point is a necessary condition for the boundedness of the sequence . Similarly, the existence of an optimal dual solution is a necessary condition for the boundedness of the sequences and , regardless of the way in which they are updated.
Let us finally return to Example 20 in order to demonstrate how one can apply the general convergence results obtained in this section to better understand and predict the behaviour of primal-dual augmented Lagrangian methods.
Example 22.
Let . Consider the following optimization problem:
(37)
Let be the Hestenes-Powell-Rockafellar augmented Lagrangian for this problem (see (18)). As was shown in Example 20, the zero duality gap property holds true in this case, but the augmented dual problem has no globally optimal solutions.
Let the multipliers and the penalty parameter be updated in accordance with the classic augmented Lagrangian method (see, e.g. [5, Algorithm 4.1]), that is,
(38)
and
(39)
where and are fixed parameters and
One can readily check that assumptions , , and hold true in this case, if . Hence with the use of Theorem 11 one can conclude that the sequence has no bounded subsequence, which means that either or as . Let us provide a numerical example to illustrate this claim.
|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
|  | 2 | -3 | 1.5 | -1.5 | 1.2 | -1.2 | 1.0909 | -1.0909 | 1.0435 | -1.0435 |
|  | 1 | 4 | 0 | 3 | 0 | 2.4 | 0 | 2.1818 | 0 | 2.087 |
|  | 1 | 0 | 6 | 0 | 3 | 0 | 2.4 | 0 | 2.1818 | 0 |
|  | 3 | 3 | 6 | 6 | 12 | 12 | 24 | 24 | 48 | 48 |
Let , , , and . The first 10 iterations of the model augmented Lagrangian method with multiplier updates (38) and penalty parameter updates (39) are given in Table 1. Let us note that the points of global minimum of the augmented Lagrangian were computed analytically. The results of the computer simulation show that , , , , and as (this fact can be proved analytically, but its proof is rather cumbersome, which is why we do not present it here). Thus, the iterations of the method oscillate between gradually shrinking neighbourhoods of two globally optimal solutions of problem (37) and the KKT multipliers corresponding to these solutions, while the penalty parameter increases unboundedly.
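For contrast with the divergent behaviour in Example 22, the classic multiplier and penalty updates of the form (38)–(39) can be sketched in code on a problem where an optimal dual solution does exist, so that, in line with Theorem 11, the multipliers converge and the penalty parameter stays bounded. The sketch below is illustrative only: the test problem, its solution x* = (1.5, 0.5), the multiplier λ* = 1, and all parameter values are our own choices (not the paper's problem (37), whose data is not reproduced here), and the closed-form subproblem solver exploits the fact that this particular problem is a convex quadratic with one linear inequality constraint.

```python
import math

# Hypothetical test problem (not the paper's Example 22):
#   minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2  subject to  g(x) = x1 + x2 - 2 <= 0.
# Known solution: x* = (1.5, 0.5) with KKT multiplier lambda* = 1.

def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

def g(x):
    return x[0] + x[1] - 2.0

def argmin_subproblem(lam, c):
    # Exact global minimizer of the Hestenes-Powell-Rockafellar augmented Lagrangian
    #   f(x) + (max(0, lam + c*g(x))^2 - lam^2) / (2c)
    # for this quadratic problem; mu is the value of lam + c*g(x) at the minimizer.
    mu = (lam + c) / (1.0 + c)
    return (2.0 - mu / 2.0, 1.0 - mu / 2.0)

def augmented_lagrangian_method(lam=0.0, c=10.0, gamma=2.0, tau=0.5, iters=50):
    viol_prev = math.inf
    for _ in range(iters):
        x = argmin_subproblem(lam, c)
        viol = max(g(x), -lam / c)      # infeasibility/complementarity measure
        lam = max(0.0, lam + c * g(x))  # classic multiplier update, cf. (38)
        if abs(viol) > tau * viol_prev:
            c *= gamma                  # increase penalty parameter, cf. (39)
        viol_prev = abs(viol)
    return x, lam, c

x, lam, c = augmented_lagrangian_method()
print(x, lam, c)  # x approaches (1.5, 0.5), lam approaches 1.0, c stays at 10.0
```

Here the infeasibility measure shrinks geometrically at every iteration, so the ratio test never triggers a penalty increase; this is exactly the well-behaved regime that Theorem 11 shows is impossible in Example 22, where no optimal dual solution exists.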
Remark 29.
The convergence theory for the model augmented Lagrangian method presented in this paper generalizes and unifies many existing results on convergence of augmented Lagrangian methods. In particular, many such results either directly follow from Theorems 6, 7, and 10 and Lemma 7 or can be easily deduced from them, including [2, Proposition 2.1], [50, Theorem 6.1, part (1)], [44, Theorems 1–3 and 7], [42, Theorems 2.4 and 3.1], [45, Theorems 4 and 7], [5, Theorem 5.2], [48, Theorem 5, part (iii)], etc.
References
- [1] A. M. Bagirov, G. Ozturk, and R. Kasimbeyli. A sharp augmented Lagrangian-based method for constrained non-convex optimization. Optim. Methods Softw., 34:462–488, 2019.
- [2] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
- [3] E. G. Birgin, R. A. Castillo, and J. M. Martinez. Numerical comparison of augmented Lagrangian algorithms for nonconvex problems. Comput. Optim. Appl., 31:31–55, 2005.
- [4] E. G. Birgin and J. M. Martinez. Augmented Lagrangian method with nonmonotone penalty parameters for constrained optimization. Comput. Optim. Appl., 51:941–965, 2012.
- [5] E. G. Birgin and J. M. Martinez. Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM, Philadelphia, 2014.
- [6] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer, New York, 2000.
- [7] R. S. Burachik. On primal convergence for augmented Lagrangian duality. Optim., 60:979–990, 2011.
- [8] R. S. Burachik, R. N. Gasimov, N. A. Ismayilova, and C. Kaya. On a modified subgradient algorithm for dual problems via sharp augmented Lagrangian. J. Glob. Optim., 34:55–78, 2006.
- [9] R. S. Burachik, A. N. Iusem, and J. G. Melo. Duality and exact penalization for general augmented Lagrangians. J. Optim. Theory Appl., 147:125–140, 2010.
- [10] R. S. Burachik, A. N. Iusem, and J. G. Melo. A primal dual modified subgradient algorithm with sharp Lagrangian. J. Glob. Optim., 46:55–78, 2010.
- [11] R. S. Burachik, A. N. Iusem, and J. G. Melo. An inexact modified subgradient algorithm for primal-dual problems via augmented Lagrangians. J. Optim. Theory Appl., 157:108–131, 2013.
- [12] R. S. Burachik, A. N. Iusem, and J. G. Melo. The exact penalty map for nonsmooth and nonconvex optimization. Optim., 64:717–738, 2015.
- [13] R. S. Burachik and C. Y. Kaya. An update rule and a convergence result for a penalty function method. J. Ind. Manag. Optim., 3:381–398, 2007.
- [14] R. S. Burachik, C. Y. Kaya, and X. Liu. A primal-dual algorithm as applied to optimal control problems. arXiv: 2304.03465, 2023.
- [15] R. S. Burachik, C. Y. Kaya, and M. Mammadov. An inexact modified subgradient algorithm for nonconvex optimization. Comput. Optim. Appl., 45:1–34, 2008.
- [16] R. S. Burachik and X. Liu. An inexact deflected subgradient algorithm in infinite dimensional spaces. arXiv: 2302.02072, 2023.
- [17] R. S. Burachik and A. Rubinov. Abstract convexity and augmented Lagrangians. SIAM J. Optim., 18:413–436, 2007.
- [18] R. S. Burachik, X. Q. Yang, and Y. Y. Zhou. Existence of augmented Lagrange multipliers for semi-infinite programming problems. J. Optim. Theory Appl., 173:471–503, 2017.
- [19] E. Burman, P. Hansbo, and M. G. Larson. The augmented Lagrangian method as a framework for stabilised methods in computational mechanics. Arch. Comput. Methods Eng., 30:2579–2604, 2023.
- [20] A. Conn, N. I. M. Gould, and P. Toint. A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal., 28:545–572, 1991.
- [21] M. V. Dolgopolik. Existence of augmented Lagrange multipliers: reduction to exact penalty functions and localization principle. Math. Program., 166:297–326, 2017.
- [22] M. V. Dolgopolik. Augmented Lagrangian functions for cone constrained optimization: the existence of global saddle points and exact penalty property. J. Glob. Optim., 71:237–296, 2018.
- [23] M. V. Dolgopolik. Minimax exactness and global saddle points of nonlinear augmented Lagrangians. J. Appl. Numer. Optim., 3:61–83, 2021.
- [24] M. J. Feizollahi, S. Ahmed, and A. Sun. Exact augmented Lagrangian duality for mixed integer linear programming. Math. Program., 161:365–387, 2017.
- [25] M. Fortin and R. Glowinski. Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems. North-Holland, Amsterdam, 1983.
- [26] M. Fukushima, Z.-Q. Luo, and P. Tseng. Smoothing functions for second-order-cone complementarity problems. SIAM J. Optim., 12:436–460, 2001.
- [27] R. N. Gasimov. Augmented Lagrangian duality and nondifferentiable optimization methods in nonconvex programming. J. Glob. Optim., 24:187–203, 2002.
- [28] S. He, L. Wu, and H. Meng. A nonlinear Lagrangian for constrained optimization problems. J. Appl. Math. Comput., 38:669–685, 2012.
- [29] M. R. Hestenes. Multiplier and gradient methods. J. Optim. Theory Appl., 4:303–320, 1969.
- [30] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991.
- [31] X. X. Huang and X. Q. Yang. A unified augmented Lagrangian approach to duality and exact penalization. Math. Oper. Res., 28:533–552, 2003.
- [32] X. X. Huang and X. Q. Yang. Further study on augmented Lagrangian duality theory. J. Glob. Optim., 31:193–210, 2005.
- [33] X. X. Huang, X. Q. Yang, and K. L. Teo. Augmented Lagrangian and nonlinear semidefinite programs. In F. Giannessi and A. Maugeri, editors, Variational Analysis and Applications, pages 513–529. Springer, Boston, 2005.
- [34] K. Ito and K. Kunisch. Lagrange Multiplier Approach to Variational Problems and Applications. SIAM, Philadelphia, 2008.
- [35] A. N. Iusem. Augmented Lagrangian methods and proximal point methods for convex optimization. Investigación Operativa, 8:11–49, 1999.
- [36] K. C. Kiwiel. On the twice differentiable cubic augmented Lagrangian. J. Optim. Theory Appl., 88:233–236, 1996.
- [37] D. Li and X. L. Sun. Convexification and existence of saddle point in a pth-power reformulation for nonconvex constrained optimization. Nonlinear Anal., 47:5611–5622, 2001.
- [38] Y. Li and L. Zhang. A new nonlinear Lagrangian method for nonconvex semidefinite programming. J. Appl. Anal., 15:149–172, 2009.
- [39] Q. Liu and X. Yang. Zero duality and saddle points of a class of augmented Lagrangian functions in constrained non-convex optimization. Optim., 57:655–667, 2008.
- [40] Y. J. Liu and L. W. Zhang. Convergence analysis of the augmented Lagrangian method for nonlinear second-order cone optimization problems. Nonlinear Anal.: Theory, Methods, Appl., 67:1359–1373, 2007.
- [41] Y. J. Liu and L. W. Zhang. Convergence of the augmented Lagrangian method for nonlinear optimization problems over second-order cones. J. Optim. Theory Appl., 139:557–575, 2008.
- [42] H. Luo, X. Sun, and H. Wu. Convergence properties of augmented Lagrangian methods for constrained global optimization. Optim. Methods Softw., 23:763–778, 2008.
- [43] H. Luo, H. Wu, and J. Liu. On saddle points in semidefinite optimization via separation scheme. J. Optim. Theory Appl., 165:113–150, 2015.
- [44] H. Z. Luo, X. L. Sun, and D. Li. On the convergence of augmented Lagrangian methods for constrained global optimization. SIAM J. Optim., 18:1209–1230, 2007.
- [45] H. Z. Luo, H. X. Wu, and G. T. Chen. On the convergence of augmented Lagrangian methods for nonlinear semidefinite programming. J. Glob. Optim., 54:599–618, 2012.
- [46] O. L. Mangasarian. Unconstrained Lagrangians in nonlinear programming. SIAM J. Control, 12:772–791, 1975.
- [47] D. Noll. Local convergence of an augmented Lagrangian method for matrix inequality constrained programming. Optim. Methods Softw., 22:777–802, 2007.
- [48] M. C. W. Oliveira and C. Sagastizábal. Revisiting augmented Lagrangian duals. Math. Program., 196:235–277, 2022.
- [49] R. Polyak. Modified barrier functions: theory and methods. Math. Program., 54:177–222, 1992.
- [50] R. A. Polyak. Log-sigmoid multipliers method in constrained optimization. Ann. Oper. Res., 101:427–460, 2001.
- [51] R. A. Polyak. Nonlinear rescaling vs. smoothing technique in convex optimization. Math. Program., 92:197–235, 2002.
- [52] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, pages 283–298. Academic Press, London, 1969.
- [53] L. M. Ramirez, N. Maculan, A. E. Xavier, and V. L. Xavier. Dislocation hyperbolic augmented Lagrangian algorithm for nonconvex optimization. RAIRO Oper. Res., 57:2941–2950, 2023.
- [54] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
- [55] R. T. Rockafellar. The multiplier method of Hestenes and Powell applied to convex programming. J. Optim. Theory Appl., 12:555–562, 1973.
- [56] R. T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM J. Control Optim., 12:268–285, 1974.
- [57] R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review, 35:183–238, 1993.
- [58] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, Berlin, 1998.
- [59] A. M. Rubinov, X. X. Huang, and X. Q. Yang. The zero duality gap property and lower semicontinuity of the perturbation function. Math. Oper. Res., 27:775–791, 2002.
- [60] A. M. Rubinov and X. Q. Yang. Lagrange-type functions in constrained nonconvex optimization. Kluwer Academic Publishers, Boston, 2003.
- [61] J. J. Rückmann and A. Shapiro. Augmented Lagrangians in semi-infinite programming. Math. Program., 116:499–512, 2009.
- [62] A. Shapiro and J. Sun. Some properties of the augmented Lagrangian in cone constrained optimization. Math. Oper. Res., 29:479–491, 2004.
- [63] M. Stingl. On the solution of nonlinear semidefinite programs by augmented Lagrangian methods. PhD thesis, Institute of Applied Mathematics II, Friedrich-Alexander University of Erlangen-Nuremberg, Erlangen, Germany, 2006.
- [64] D. Sun and J. Sun. Löwner’s operator and spectral functions in Euclidean Jordan algebras. Math. Oper. Res., 33:421–445, 2008.
- [65] D. Sun, J. Sun, and L. Zhang. The rate of convergence of the augmented Lagrangian method for nonlinear semidefinite programming. Math. Program., 114:349–391, 2008.
- [66] J. Sun. On methods for solving nonlinear semidefinite optimization problems. Numer. Algebra Control Optim., 1:1–14, 2011.
- [67] J. Sun, L. W. Zhang, and Y. Wu. Properties of the augmented Lagrangian in nonlinear semidefinite optimization. J. Optim. Theory Appl., 129:437–456, 2006.
- [68] X. L. Sun, D. Li, and K. I. M. McKinnon. On saddle points of augmented Lagrangians for constrained nonconvex optimization. SIAM J. Optim., 15:1128–1146, 2005.
- [69] P. Tseng and D. P. Bertsekas. On the convergence of the exponential multiplier method for convex programming. Math. Program., 60:1–19, 1993.
- [70] C. Wang, Q. Liu, and B. Qu. Global saddle points of nonlinear augmented Lagrangian functions. J. Glob. Optim., 68:125–146, 2017.
- [71] C. Y. Wang and D. Li. Unified theory of augmented Lagrangian methods for constrained global optimization. J. Glob. Optim., 44:433–458, 2009.
- [72] C. Y. Wang, X. Q. Yang, and X. M. Yang. Nonlinear augmented Lagrangian and duality theory. Math. Oper. Res., 38:740–760, 2012.
- [73] Z. Wen, D. Goldfarb, and W. Yin. Alternating direction augmented Lagrangian methods for semidefinite programming. Math. Program. Comput., 2:203–230, 2010.
- [74] H. Wu, H. Luo, X. Ding, and G. Chen. Global convergence of modified augmented Lagrangian methods for nonlinear semidefinite programming. Comput. Optim. Appl., 56:531–558, 2013.
- [75] H. X. Wu and H. Z. Luo. A note on the existence of saddle points of p-th power Lagrangian for constrained nonconvex optimization. Optim., 61:1231–1245, 2012.
- [76] H. X. Wu and H. Z. Luo. Saddle points of general augmented Lagrangians for constrained nonconvex optimization. J. Glob. Optim., 53:683–697, 2012.
- [77] H. X. Wu, H. Z. Luo, and J. F. Yang. Nonlinear separation approach for the augmented Lagrangian in nonlinear semidefinite programming. J. Glob. Optim., 59:695–727, 2014.
- [78] A. E. Xavier. Hyperbolic penalty: a new method for nonlinear programming with inequalities. Int. Trans. Oper. Res., 8:659–671, 2001.
- [79] G. D. Yalcin and R. Kasimbeyli. On weak conjugacy, augmented Lagrangians and duality in nonconvex optimization. Math. Methods Oper. Res., 92:199–228, 2020.
- [80] H. Yamashita and H. Yabe. A survey of numerical methods for nonlinear semidefinite programming. J. Oper. Res. Soc. Jpn., 58:24–60, 2015.
- [81] L. Zhang, J. Gu, and X. Xiao. A class of nonlinear Lagrangians for nonconvex second order cone programming. Comput. Optim. Appl., 49:61–99, 2011.
- [82] L. Zhang, Y. Li, and J. Wu. Nonlinear rescaling Lagrangians for nonconvex semidefinite programming. Optim., 63:899–920, 2014.
- [83] X. Y. Zhao, D. Sun, and K.-C. Toh. A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim., 20:1737–1765, 2010.
- [84] J. Zhou and J. S. Chen. On the existence of saddle points for nonlinear second-order cone programming problems. J. Glob. Optim., 62:459–480, 2015.
- [85] J. Zhou, N. Xiu, and C. Wang. Saddle point and exact penalty representation for generalized proximal Lagrangians. J. Glob. Optim., 54:669–687, 2012.
- [86] Y. Y. Zhou and X. Q. Yang. Duality and penalization in optimization via an augmented Lagrangian function with applications. J. Optim. Theory Appl., 140:171–188, 2009.
- [87] Y. Y. Zhou and X. Q. Yang. Augmented Lagrangian functions for constrained optimization problems. J. Glob. Optim., 52:95–108, 2012.
- [88] Y. Y. Zhou, J. C. Zhou, and X. Q. Yang. Existence of augmented Lagrange multipliers for cone constrained optimization problems. J. Glob. Optim., 58:243–260, 2014.