An efficient descent method for locally Lipschitz multiobjective optimization problems
Abstract
In this article, we present an efficient descent method for locally Lipschitz continuous multiobjective optimization problems (MOPs). The method is realized by combining a theoretical result regarding the computation of descent directions for nonsmooth MOPs with a practical method to approximate the subdifferentials of the objective functions. We show convergence to points which satisfy a necessary condition for Pareto optimality. Using a set of test problems, we compare our method to the multiobjective proximal bundle method by Mäkelä. The results indicate that our method is competitive while being easier to implement. While the number of objective function evaluations is larger, the overall number of subgradient evaluations is lower. Finally, we show that our method can be combined with a subdivision algorithm to compute entire Pareto sets of nonsmooth MOPs.
1 Introduction
In many scenarios in real life, the problem of optimizing multiple objectives at the same time arises. In engineering, for example, one often wants to steer a physical system as close as possible to a desired state while minimizing the required energy cost at the same time. These problems are called multiobjective optimization problems (MOPs) and generally do not possess a single optimal solution. Instead, the solution is the set of all optimal compromises, the so-called Pareto set containing all Pareto optimal points. Due to this, the numerical computation of solutions to MOPs is more challenging than for single-objective problems. On top of that, there are numerous applications where the objectives are nonsmooth, for example contact problems in mechanics, which adds to the difficulty. In this article, we will address both difficulties combined by considering nonsmooth MOPs.
When addressing the above-mentioned difficulties, i.e., multiple objectives and nonsmoothness, separately, there exists a large number of solution methods. For smooth MOPs, the most popular methods include evolutionary [9, 10] and scalarization methods [27]. Additionally, some methods from single-objective optimization have been generalized, like gradient descent methods [13, 30, 14] and Newton's method [12]. For the nonsmooth single-objective case, commonly used methods include subgradient methods [31], bundle methods [20] and gradient sampling methods [3]. More recently, in [22], a generalization of the steepest descent method to the nonsmooth case was proposed, which is based on an efficient approximation of the subdifferential of the objective function. For nonsmooth multiobjective optimization, the literature is much scarcer. Since classical scalarization approaches do not require the existence of gradients, they can still be used. In [1], a generalization of the steepest descent method was proposed for the case when the full subdifferentials of the objectives are known, which is rarely the case in practice. In [2, 7], the subgradient method was generalized to the multiobjective case, but both articles report that their method is not suitable for real-life applications due to inefficiency. In [25] (see also [18, 23]), a multiobjective version of the proximal bundle method was proposed, which currently appears to be the most efficient solver.
In this article, we develop a new descent method for locally Lipschitz continuous MOPs by combining the descent direction from [1] with the approximation of the subdifferentials from [22]. In [1] it was shown that the element with the smallest norm in the negative convex hull of the subdifferentials of the objective functions is a common descent direction for all objectives. In [22], the subdifferential of the objective function was approximated by starting with a single subgradient and then systematically computing new subgradients until the element with the smallest norm in the convex hull of all subgradients is a direction of (sufficient) descent. Combining both approaches yields a descent direction for locally Lipschitz MOPs and together with an Armijo step length, we obtain a descent method. We show convergence to points which satisfy a necessary condition for Pareto optimality. Using a set of test problems, we compare the performance of our method to the multiobjective proximal bundle method from [25]. The results indicate that our method is inferior in terms of function evaluations, but superior in terms of subgradient evaluations.
The structure of this article is as follows: we start with a short introduction to nonsmooth and multiobjective optimization in Section 2. In Section 3, we derive our descent method by replacing the Clarke subdifferential for the computation of the descent direction by the Goldstein ε-subdifferential and then showing how the latter can be efficiently approximated. In Section 4, we apply our descent method to numerical examples. We first visualize and discuss the typical behavior of our method before comparing it to the multiobjective proximal bundle method from [25] using a set of test problems. Afterwards, we show how our method can be combined with a subdivision algorithm to approximate entire Pareto sets. Finally, in Section 5, we draw a conclusion and discuss possible future work.
2 Nonsmooth multiobjective optimization
We consider the nonsmooth multiobjective optimization problem
(MOP)   min_{x ∈ ℝⁿ} f(x),
where f = (f₁, …, f_k) : ℝⁿ → ℝᵏ is the objective vector with components f_i : ℝⁿ → ℝ, i ∈ {1, …, k}, called objective functions. We assume the objective functions to be locally Lipschitz continuous, i.e., for each i ∈ {1, …, k} and x ∈ ℝⁿ, there are L_i > 0 and ε > 0 with
|f_i(y) − f_i(z)| ≤ L_i ‖y − z‖  for all y, z ∈ B_ε(x),
where ‖·‖ denotes the Euclidean norm in ℝⁿ and B_ε(x) is the closed ε-ball around x. Since (MOP) is an optimization problem with a vector-valued objective function, the classical concept of optimality from the scalar case cannot directly be conveyed. Instead, we are looking for the Pareto set, which is defined in the following way:
Definition 2.1.
A point x ∈ ℝⁿ is called Pareto optimal if there is no y ∈ ℝⁿ such that f_i(y) ≤ f_i(x) for all i ∈ {1, …, k} and f_j(y) < f_j(x) for some j ∈ {1, …, k}.
The set of all Pareto optimal points is the Pareto set.
In practice, to check if a given point is Pareto optimal, we need optimality conditions. In the smooth case, there are the well-known KKT conditions (cf. [27]), which are based on the gradients of the objective functions. In case the objective functions are merely locally Lipschitz, the KKT conditions can be generalized using the concept of subdifferentials. In the following, we will recall the required definitions and results from nonsmooth analysis. For a more detailed introduction, we refer to [6].
Definition 2.2.
Let f : ℝⁿ → ℝ be locally Lipschitz and let Ω ⊆ ℝⁿ be the set of points where f is not differentiable. Then
∂f(x) := conv({ ξ ∈ ℝⁿ : ∃ a sequence (x_j)_j in ℝⁿ ∖ Ω with x_j → x and ∇f(x_j) → ξ })
is the (Clarke) subdifferential of f in x. Each ξ ∈ ∂f(x) is a subgradient.
It is easy to see that if f is continuously differentiable, then the Clarke subdifferential is the set containing only the gradient of f. We will later use the following technical result on some properties of the Clarke subdifferential (cf. [6], Prop. 2.1.2).
Lemma 2.3.
∂f(x) is nonempty, convex and compact.
Using the subdifferential, we can state a necessary optimality condition for locally Lipschitz MOPs (cf. [24], Thm. 12).
Theorem 2.4.
Let x ∈ ℝⁿ be Pareto optimal. Then
(1)   0 ∈ conv( ⋃_{i=1}^{k} ∂f_i(x) ).
In the smooth case, (1) reduces to the classical multiobjective KKT conditions. Note that in contrast to the smooth case, the optimality condition (1) is numerically challenging to work with, as subdifferentials are difficult to compute. Thus, in numerical methods, (1) is only used implicitly.
The method we are presenting in this paper is a descent method, which means that, starting from a point x₁ ∈ ℝⁿ, we want to generate a sequence (x_j)_j in which each point is an improvement over the previous point. This is done by computing descent directions d_j ∈ ℝⁿ and step lengths t_j > 0 such that x_{j+1} = x_j + t_j d_j and
f_i(x_{j+1}) < f_i(x_j)  for all i ∈ {1, …, k}.
For the computation of d_j, we recall the following basic result from convex analysis that forms the theoretical foundation for descent methods in the presence of multiple (sub)gradients. Let ‖·‖ be the Euclidean norm in ℝⁿ.
Theorem 2.5.
Let W ⊆ ℝⁿ be convex and compact and
(2)   v̄ := argmin { ‖v‖² : v ∈ W }.
Then either v̄ ≠ 0 and
(3)   ⟨v̄, v⟩ ≥ ‖v̄‖² > 0  for all v ∈ W,
or v̄ = 0 and there is no d ∈ ℝⁿ with ⟨v, d⟩ < 0 for all v ∈ W.
Proof.
Since v̄ is the projection of the origin onto the closed and convex set W, we have
⟨v̄, v − v̄⟩ ≥ 0, i.e., ⟨v̄, v⟩ ≥ ‖v̄‖²
for all v ∈ W (cf. [4], Lem.). In particular, if v̄ ≠ 0, then ⟨v̄, v⟩ ≥ ‖v̄‖² > 0 for all v ∈ W.
Conversely, v̄ = 0 implies 0 ∈ W, so in this case there can not be any d ∈ ℝⁿ with ⟨v, d⟩ < 0 for all v ∈ W. ∎
Roughly speaking, Theorem 2.5 states that the element v̄ of minimal norm in the convex and compact set W is directionally opposed to all elements of W. To be more precise, −v̄ is contained in the intersection of all half-spaces {d ∈ ℝⁿ : ⟨v, d⟩ < 0} induced by elements v ∈ W. In the context of optimization, this result has several applications:
(i) For smooth, single-objective problems, W = {∇f(x)} yields the classical steepest descent direction.
(ii) For smooth MOPs, W = conv({∇f₁(x), …, ∇f_k(x)}) yields the descent direction from [13].
(iii) For nonsmooth, single-objective problems, W = ∂f(x) yields the steepest descent direction in the nonsmooth case (cf. [22]).
(iv) For nonsmooth MOPs, W = conv( ⋃_{i=1}^{k} ∂f_i(x) ) yields the descent direction from [1].
In (i) and (ii), the solution of problem (2) is straightforward, since W is a convex polytope with the gradients as vertices. In (iii), the solution of (2) is non-trivial due to the difficulty of computing the subdifferential. In subgradient methods [31], the solution is approximated by using a single subgradient instead of the entire subdifferential. As a result, it cannot be guaranteed that the resulting direction is a descent direction, and in particular, (2) cannot be used as a stopping criterion. In gradient sampling methods [3], the subdifferential is approximated by the convex hull of gradients of the objective function in randomly sampled points around the current point. Due to the randomness, it cannot be guaranteed that the resulting direction yields sufficient descent. Additionally, a check for differentiability of the objective is required, which can pose a problem [17]. In (iv), the solution of (2) becomes even more complicated due to the presence of multiple subdifferentials. So far, the only methods that deal with (2) in this case are multiobjective versions of the subgradient method [2, 7], which were reported to be unsuitable for real-life applications.
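For the polytope cases, problem (2) is a small quadratic program over the unit simplex. The following sketch approximates the minimal-norm element of the convex hull of finitely many (sub)gradients with a plain Frank–Wolfe iteration; the function names, the fixed iteration count and the use of Frank–Wolfe (rather than an exact QP solver) are our own illustrative choices, not part of the methods discussed above.

```python
import numpy as np

def min_norm_in_hull(G, iters=20000):
    """Approximate argmin ||v||^2 over conv of the rows of G (Frank-Wolfe).

    Returns the simplex weights lam and the point v with v = G.T @ lam.
    """
    G = np.asarray(G, dtype=float)
    lam = np.zeros(G.shape[0])
    lam[0] = 1.0                      # start at a vertex of the simplex
    for t in range(iters):
        v = G.T @ lam                 # current point in the hull
        grad = 2.0 * (G @ v)          # gradient of ||G^T lam||^2 w.r.t. lam
        j = int(np.argmin(grad))      # best vertex for the linear subproblem
        gamma = 2.0 / (t + 2.0)       # standard diminishing step size
        lam = (1.0 - gamma) * lam
        lam[j] += gamma
    return lam, G.T @ lam

# Two "gradients" pointing along the axes: the minimal-norm element of
# their convex hull is (0.5, 0.5), so d = -v is a common descent direction.
lam, v = min_norm_in_hull([[1.0, 0.0], [0.0, 1.0]])
d = -v
```

By property (3), every element g of the hull then satisfies ⟨g, v⟩ ≥ ‖v‖², up to the approximation error of the iteration.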
In the following section, we will describe a new way to compute descent directions for nonsmooth MOPs by systematically computing an approximation of the subdifferentials that is sufficient to obtain a "good enough" descent direction from (2).
3 Descent method for nonsmooth MOPs
In this section, we will present a method to compute descent directions of nonsmooth MOPs that generalizes the method from [22] to the multiobjective case. As described in the previous section, when computing descent directions via Theorem 2.5, one has the problem of having to compute subdifferentials. Since these are difficult to come by in practice, we will instead replace the set W in Theorem 2.5 by an approximation such that the resulting direction is still guaranteed to yield sufficient descent. To this end, we will first replace the Clarke subdifferential by the so-called ε-subdifferential, and then take a finite approximation of the latter.
3.1 The ε-subdifferential
By definition, the Clarke subdifferential is the convex hull of the limits of the gradient of f along all sequences of differentiable points that converge to x. Thus, if we evaluate the gradient in a number of points close to x (where it is defined) and take the convex hull, we expect the resulting set to be an approximation of the subdifferential. To formalize this, we introduce the following definition [16, 21].
Definition 3.1.
Let f : ℝⁿ → ℝ be locally Lipschitz, x ∈ ℝⁿ and ε ≥ 0. Then
∂_ε f(x) := conv( ⋃_{y ∈ B_ε(x)} ∂f(y) )
is the (Goldstein) ε-subdifferential of f in x. Each ξ ∈ ∂_ε f(x) is an ε-subgradient.
Note that ∂_0 f(x) = ∂f(x) and ∂f(x) ⊆ ∂_ε f(x) for all ε ≥ 0. For the multiobjective setting, we define the analogous set
conv( ⋃_{i=1}^{k} ∂_ε f_i(x) ).
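The effect of the parameter ε can be made concrete for f(x) = |x|: the ε-subdifferential collects the slopes of all points at distance at most ε, so it "sees" the kink before x reaches it. The following toy sketch (all names are ours, and gradients are collected on a grid for illustration, whereas gradient sampling methods use random points) approximates ∂_ε f(x) from sampled gradients:

```python
import numpy as np

def grad_abs(x):
    """A subgradient of f(x) = |x|; we pick +1 at the kink x = 0."""
    return 1.0 if x >= 0 else -1.0

def sampled_eps_subgradients(grad, x, eps, n=50):
    """Gradients at n points of the closed eps-ball around x.

    Their convex hull is an inner approximation of the Goldstein
    eps-subdifferential of f in x.
    """
    return sorted({grad(p) for p in np.linspace(x - eps, x + eps, n)})

# At x = 0.5 with eps = 1, the ball [-0.5, 1.5] contains the kink of |x|,
# so both slopes -1 and +1 appear; with eps = 0.1 the ball [0.4, 0.6]
# misses the kink and only the slope +1 is found.
wide = sampled_eps_subgradients(grad_abs, 0.5, 1.0)
narrow = sampled_eps_subgradients(grad_abs, 0.5, 0.1)
```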
To be able to choose this set as W in Theorem 2.5, we first need to establish some of its properties.
Lemma 3.2.
∂_ε f(x) is nonempty, convex and compact. In particular, the same holds for the multiobjective set defined above.
Proof.
For ∂_ε f(x), this was shown in [16], Prop. 2.3. For the multiobjective set, it then follows directly from the definition. ∎
Corollary 3.3.
Let x ∈ ℝⁿ and ε ≥ 0.
a) If x is Pareto optimal, then
(4)   0 ∈ conv( ⋃_{i=1}^{k} ∂_ε f_i(x) ).
b) Let
(5)   v̄ := argmin { ‖v‖² : v ∈ conv( ⋃_{i=1}^{k} ∂_ε f_i(x) ) }.
Then either v̄ ≠ 0 and
(6)   ⟨v̄, v⟩ ≥ ‖v̄‖² > 0  for all v ∈ conv( ⋃_{i=1}^{k} ∂_ε f_i(x) ),
or v̄ = 0 and there is no d ∈ ℝⁿ with ⟨v, d⟩ < 0 for all v in this set.
The previous corollary states that when working with the ε-subdifferential instead of the Clarke subdifferential, we still have a necessary optimality condition and a way to compute descent directions, although the optimality condition is weaker and the resulting direction yields less descent. This is illustrated in the following example.
Example 3.4.
Consider the locally Lipschitz function
The set of nondifferentiable points of is . For and we have
For we have
Figure 1 shows the Clarke subdifferential (a), the ε-subdifferential (b) for and the corresponding sets for .
[Figure 1: panels (a) and (b)]
Additionally, the corresponding solutions of (5) are shown. In this case, the predicted descent (cf. (3)) is approximately in (a) and in (b).
Figure 2 shows the same scenario for .
[Figure 2: panels (a) and (b)]
The following lemma shows that for the direction from (5), there is a lower bound for the step sizes up to which we have guaranteed descent in each objective function.
Lemma 3.5.
Let x ∈ ℝⁿ, ε > 0 and let v̄ ≠ 0 be the solution of (5). Then, for d := −v̄,
f_i(x + t d) ≤ f_i(x) − t ‖v̄‖²  for all i ∈ {1, …, k} and t ∈ (0, ε/‖v̄‖].
Proof.
Let t ∈ (0, ε/‖v̄‖] and i ∈ {1, …, k}. By the mean value theorem of Lebourg (cf. [6]), there are s ∈ (0, 1) and ξ ∈ ∂f_i(x + s t d) with f_i(x + t d) − f_i(x) = t ⟨ξ, d⟩. Since ‖s t d‖ ≤ ε, we have ξ ∈ ∂_ε f_i(x), so ⟨ξ, d⟩ = −⟨ξ, v̄⟩ ≤ −‖v̄‖² by (6). ∎
In the following, we will describe how we can obtain a good approximation of (5) without requiring full knowledge of the -subdifferentials.
3.2 Efficient computation of descent directions
In this part, we will describe how the solution of (5) can be approximated when only a single subgradient can be computed at every point. Similar to the gradient sampling approach, the idea behind our method is to replace the ε-subdifferentials in (5) by the convex hull of a finite number of ε-subgradients. Since it is impossible to know a priori how many and which ε-subgradients are required to obtain a good descent direction, we solve (5) multiple times in an iterative approach to enrich our approximation until a satisfying direction has been found. To this end, we have to specify how to enrich our current approximation and how to characterize an acceptable descent direction.
Let x ∈ ℝⁿ, ε > 0, let W be a finite set of ε-subgradients of the objective functions in x and
(7)   ṽ := argmin { ‖v‖² : v ∈ conv(W) },   d̃ := −ṽ.
Let c ∈ (0, 1). Motivated by Lemma 3.5, we regard d̃ as an acceptable descent direction if
(8)   f_i(x + ε d̃/‖d̃‖) ≤ f_i(x) − c ε ‖d̃‖  for all i ∈ {1, …, k}.
If the set of objectives for which (8) is violated is non-empty, then we have to find a new ε-subgradient such that the enlarged set yields a better descent direction. Intuitively, (8) being violated means that the local behavior of some f_i in x in the direction d̃ is not sufficiently captured in W. Thus, for each such objective, we expect that there exists some ε-subgradient whose addition to W improves the approximation of the ε-subdifferential. This is proven in the following lemma.
Lemma 3.6.
Proof.
The following example visualizes the previous lemma.
Example 3.7.
Consider as in Example 3.4, and . The dashed lines in Figure 3 show the ε-subdifferentials and the resulting descent direction (cf. Figures 1 and 2).
[Figure 3: panels (a) and (b)]
Let . Then , so and
Let and be the approximation of , shown as the solid line in Figure 3(a). Let be the solution of (7) for this and choose . Checking (8), we have
By Lemma 3.6, this means that there is some and such that
In this case, we can take for example , resulting in
Figure 3(b) shows the improved approximation and the resulting descent direction. By checking (8) for this new direction, we see that it is acceptable. (Note that in general, a single improvement step like this will not be sufficient to reach an acceptable direction.)
Note that Lemma 3.6 only shows the existence of the required quantities without stating a way to actually compute them. To this end, let be the index of an objective function for which (8) is not satisfied, define
(11) |
(cf. [22]) and consider Algorithm 1. If is continuously differentiable around , then (9) is equivalent to , i.e., being monotonically increasing around . Thus, the idea of Algorithm 1 is to find some such that is monotonically increasing around , while checking if (9) is satisfied for a subgradient .
Although in the general setting, we can not guarantee that Algorithm 1 yields a subgradient satisfying (9), we can at least show that after finitely many iterations, a factor is found such that contains a subgradient that satisfies (9).
Lemma 3.8.
Proof.
Let be finite with last element . Then Algorithm 1 must have stopped in step 3, i.e., some satisfying (9) was found.
Now let be infinite. By construction, is a Cauchy sequence in the compact set , so it has to converge to some . Additionally, since (8) is violated for the index by assumption, we have
Let and be the sequences corresponding to and in Algorithm 1 (at the start of each iteration). Then for all . Thus, by the mean value theorem, there has to be some such that
In particular, and since , for all . By upper semicontinuity of there must be some with . By the chain rule, we have
(12) |
Thus, there must be some with
By upper semicontinuity of it also follows that there is some such that for all . Applying the same argument as above completes the proof. ∎
In the following remark, we will briefly discuss the implication of Lemma 3.8 for practical use of Algorithm 1.
Remark 3.9.
Let be as in Lemma 3.8.
a) Note that if and is differentiable in , then we have
i.e., the stopping criterion in step 3 is satisfied. Thus, if Algorithm 1 generates an infinite sequence, must be nonsmooth in for all . In particular, must be nonsmooth in for all .
b) If is regular (cf. [6], Def. 2.3.4), then equality holds when applying the chain rule to (cf. [6], Thm. 2.3.10), i.e.,
Thus, if Algorithm 1 generates an infinite sequence, then for all there must be both positive and negative elements in . By convexity of , this implies that for all , i.e., must have infinitely many (nonsmooth) stationary points in .
Motivated by the previous remark, we will from now on assume that Algorithm 1 stops after finitely many iterations and thus yields a new subgradient satisfying (9). We can use this method of finding new subgradients to construct an algorithm that computes descent directions of nonsmooth MOPs, namely Algorithm 2.
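The enrichment loop of Algorithm 2 can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: the minimal-norm subproblem (7) is solved by a plain Frank–Wolfe iteration, the acceptance test checks ⟨ξ, d⟩ ≤ −c‖v‖² for a trial subgradient (a subgradient-based stand-in for (8)), and the new ε-subgradient is taken directly at x + ε d/‖d‖ rather than located by the bisection of Algorithm 1; all names are ours.

```python
import numpy as np

def min_norm(G, iters=20000):
    """Frank-Wolfe for argmin ||v||^2 over conv of the rows of G."""
    G = np.asarray(G, dtype=float)
    lam = np.zeros(G.shape[0]); lam[0] = 1.0
    for t in range(iters):
        j = int(np.argmin(G @ (G.T @ lam)))   # best vertex
        gamma = 2.0 / (t + 2.0)
        lam *= 1.0 - gamma; lam[j] += gamma
    return G.T @ lam

def descent_direction(subgrads, x, eps=0.2, c=0.25, delta=1e-3, max_iter=20):
    """Enrich a finite set of eps-subgradients until d = -v is acceptable.

    subgrads: one callable per objective returning a single subgradient.
    """
    W = [g(x) for g in subgrads]              # initial rough approximation
    d = np.zeros_like(x)
    for _ in range(max_iter):
        v = min_norm(np.array(W))
        nv = float(np.linalg.norm(v))
        if nv <= delta:                       # x is (eps, delta)-critical
            return np.zeros_like(v)
        d = -v
        trial = x + eps * d / nv              # probe point inside the eps-ball
        new = [g(trial) for g in subgrads
               if np.dot(g(trial), d) > -c * nv**2]
        if not new:                           # acceptance test passed
            return d
        W.extend(new)                         # enrich the approximation
    return d

# Single nonsmooth objective f(x) = |x1| + x2 on R^2, close to the kink
# x1 = 0: one enrichment step turns the gradient (1, 1) into the pair
# {(1, 1), (-1, 1)}, whose minimal-norm element yields d close to (0, -1).
sg = [lambda x: np.array([np.sign(x[0]) if x[0] != 0 else 1.0, 1.0])]
d = descent_direction(sg, np.array([0.05, 0.0]))
```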
The following theorem shows that Algorithm 2 stops after a finite number of iterations and produces an acceptable descent direction (cf. (8)).
Theorem 3.10.
Algorithm 2 terminates. In particular, if is the last element of , then either or is an acceptable descent direction, i.e.,
Proof.
Assume that Algorithm 2 does not terminate, i.e., is an infinite sequence. Let and . Since and we have
(13) |
for all . Since we must have
(14) |
by step 5. Let be a common Lipschitz constant of all , , on the closed -ball around . Then by [6], Prop. 2.1.2, and the definition of the -subdifferential, we must have for all . So in particular,
(15) |
Combining (13) with (14) and (15) yields
Let . Since and we have . We obtain
Since Algorithm 2 did not terminate, it holds . It follows that
Let . Note that we have for all , so . Additionally, does not depend on , so we have
In particular, there is some such that , which is a contradiction. ∎
Remark 3.11.
The proof of Theorem 3.10 shows that for convergence of Algorithm 2, it would be sufficient to consider only a single objective in step 5. Similarly, for the initial approximation, a single ε-subgradient of any objective would be enough. A modification of either step can potentially reduce the number of executions of step 5 (i.e., Algorithm 1) in Algorithm 2 in case the ε-subdifferentials of multiple objective functions are similar. Nonetheless, we will restrict the discussion in this article to Algorithm 2 as it is, since both modifications also introduce a bias towards certain objective functions, which we want to avoid.
To highlight the strengths of Algorithm 2, we will consider an example where standard gradient sampling approaches can fail to obtain a useful descent direction.
Example 3.12.
For consider the locally Lipschitz function
The set of nondifferentiable points is
separating into four smooth areas (cf. Figure 4(a)). For large , the two areas above the graph of become small, making it difficult to compute the subdifferential close to .
Let , , and . In this case, is the minimal point of and
In particular, , so the descent direction with the exact ε-subdifferentials from (5) is zero. When applying Algorithm 2 in , after two iterations we obtain
i.e., . Thus, is correctly identified as (potentially) Pareto optimal. The final approximation of is
The first two elements of are the gradients of and in from the first iteration of Algorithm 2, and the last element is the gradient of in from the second iteration. The result is visualized in Figure 4.
[Figure 4: panels (a) and (b)]
Building on Algorithm 2, it is now straightforward to construct the descent method for locally Lipschitz continuous MOPs given in Algorithm 3. In step 4, the classical Armijo backtracking line search is used (cf. [13]) for the sake of simplicity. Note that it is well defined due to step 4 in Algorithm 2.
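Multiobjective Armijo backtracking simply requires sufficient decrease in every objective simultaneously. A minimal sketch, where the parameter values and the decrease measure c·t·‖d‖² are illustrative assumptions rather than the paper's exact choices:

```python
def armijo_step(fs, x, d, t0=1.0, beta=0.5, c=1e-4, max_halvings=50):
    """Backtracking line search: shrink t until every objective satisfies
    f_i(x + t*d) <= f_i(x) - c * t * ||d||^2."""
    norm_d_sq = sum(di * di for di in d)
    t = t0
    for _ in range(max_halvings):
        trial = tuple(xi + t * di for xi, di in zip(x, d))
        if all(f(trial) <= f(x) - c * t * norm_d_sq for f in fs):
            return t
        t *= beta                      # halve the step and try again
    return t

# Two convex objectives with the common descent direction d = (-1,) at x = 2:
f1 = lambda p: p[0] ** 2
f2 = lambda p: (p[0] - 0.5) ** 2
t = armijo_step([f1, f2], (2.0,), (-1.0,))
```

Here the full step t = 1 already decreases both objectives, so no backtracking occurs.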
Since we introduced the two tolerances ε > 0 (for the ε-subdifferential) and δ > 0 (as a threshold for when we consider ε-subgradients to be zero), we can not expect that Algorithm 3 computes points which satisfy the optimality condition (1). This is why we introduce the following definition, similar to the definition of ε-stationarity from [3].
Definition 3.13.
Let x ∈ ℝⁿ, ε > 0 and δ > 0. Then x is called (ε, δ)-critical if
min { ‖v‖ : v ∈ conv( ⋃_{i=1}^{k} ∂_ε f_i(x) ) } ≤ δ.
It is easy to see that (ε, δ)-criticality is a necessary optimality condition for Pareto optimality, but a weaker one than (1). The following theorem shows that convergence in the sense of (ε, δ)-criticality is what we can expect from our descent method.
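For a single objective on ℝ, the criticality test becomes especially transparent: the convex hull of two sampled ε-subgradients is an interval, and its minimal-norm element is zero exactly when the endpoint signs differ. A toy sketch (the sampling at the two ball endpoints and all names are our simplifications):

```python
def is_eps_delta_critical(subgrad, x, eps, delta):
    """Toy (eps, delta)-criticality test for one objective on R.

    Sample eps-subgradients at the ends of the eps-ball; their convex
    hull is the interval [lo, hi], whose minimal-norm element is 0 if
    the signs differ and min(|lo|, |hi|) otherwise.
    """
    a, b = subgrad(x - eps), subgrad(x + eps)
    lo, hi = min(a, b), max(a, b)
    min_norm = 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi))
    return min_norm <= delta

sg = lambda x: 1.0 if x >= 0 else -1.0   # subgradient of f(x) = |x|

# Near the minimizer 0 the slopes -1 and +1 both appear, so the point is
# critical; far away only the slope +1 appears and it is not.
near_kink = is_eps_delta_critical(sg, 0.001, eps=0.01, delta=0.1)
far_away = is_eps_delta_critical(sg, 0.5, eps=0.01, delta=0.1)
```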
Theorem 3.14.
Let (x_j)_j be the sequence generated by Algorithm 3. Then either f_i is unbounded below for each i ∈ {1, …, k}, or (x_j)_j is finite with the last element being (ε, δ)-critical.
Proof.
Assume that is infinite. Then for all . By step 4 and Lemma 3.5 we have
for all . This implies that each f_i is unbounded below.
Now assume that is finite, with and being the last elements of and , respectively. Since the algorithm stopped, we must have . From the application of Algorithm 2 in step 2, we know that there must be some such that . This implies
∎
4 Numerical examples
In this section we will apply our nonsmooth descent method (Algorithm 3) to several examples. We will begin by discussing its typical behavior before comparing its performance to the multiobjective proximal bundle method [25]. Finally, we will combine our method with the subdivision algorithm [11] in order to approximate the entire Pareto set of nonsmooth MOPs.
4.1 Typical behavior
In smooth areas, the behavior of Algorithm 3 is almost identical to the behavior of the multiobjective steepest descent method [13]. The only difference stems from the fact that, unlike the Clarke subdifferential, the ε-subdifferential does not reduce to the gradient when f is continuously differentiable. As a result, Algorithm 3 may behave differently in points where
is large. (If f is twice differentiable, this can obviously be characterized in terms of second-order derivatives.) Nevertheless, if ε is chosen small enough, this difference can be neglected. Thus, in the following, we will focus on the behavior with respect to the nonsmoothness of f.
To show the typical behavior of Algorithm 3, we consider the objective function
(16) |
from [25] (combining Crescent from [19] and Mifflin 2 from [26]). The set of nondifferentiable points is . We consider the starting points
the tolerances , and the Armijo parameters , . The results are shown in Figure 5.
[Figure 5]
We will briefly go over the result for each starting point:
• For the first starting point, the sequence moves through the smooth area like the steepest descent method until a point is found with a distance less than or equal to ε to the set of nondifferentiable points. In that point, more than one ε-subgradient is required to obtain a sufficient approximation of the ε-subdifferentials. Since this part of the nondifferentiable set is Pareto optimal, no acceptable descent direction (cf. (8)) is found and the algorithm stops (in an (ε, δ)-critical point).
• For the second starting point, the sequence starts zig-zagging around the non-optimal part of the nondifferentiable set, since the points are too far away from it for the algorithm to notice the nondifferentiability. When a point is found with distance less than or equal to ε, a better descent direction is found, breaking the zig-zagging motion.
• For the third starting point, the sequence shows a similar zig-zagging motion to the previous case. The difference is that this time, the sequence moves along the nondifferentiable set until a Pareto optimal point is found.
As described above, the zig-zagging behavior for the second starting point is caused by the fact that ε was too small for the method to notice the nondifferentiability. To circumvent problems like this and quickly move through problematic areas, it is possible to apply Algorithm 3 consecutively with decreasing values of ε. The result is Algorithm 4. (A similar idea was implemented in [22].)
4.2 Comparison to the multiobjective proximal bundle method
We will now compare Algorithms 3 and 4 to the multiobjective proximal bundle method (MPB) by Mäkelä, Karmitsa and Wilppu from [25] (see also [23]). As test problems, we consider the 18 MOPs in Table 1, which are created on the basis of the scalar problems from [25]. Problems 1 to 15 are convex (and were also considered in [28]) and problems 16 to 18 are nonconvex. Due to their simplicity, we are able to differentiate all test problems by hand to obtain explicit formulas for the subgradients. For each test problem, we choose 100 starting points on a 10 × 10 grid in the corresponding area given in Table 1.
Nr. | Objectives | Area | Nr. | Objectives | Area
---|---|---|---|---|---
1. | CB3, DEM | | 10. | QL, LQ |
2. | CB3, QL | | 11. | QL, Mifflin 1 |
3. | CB3, LQ | | 12. | QL, Wolfe |
4. | CB3, Mifflin 1 | | 13. | LQ, Mifflin 1 |
5. | CB3, Wolfe | | 14. | LQ, Wolfe |
6. | DEM, QL | | 15. | Mifflin 1, Wolfe |
7. | DEM, LQ | | 16. | Crescent, Mifflin 2 |
8. | DEM, Mifflin 1 | | 17. | Mifflin 2, WF |
9. | DEM, Wolfe | | 18. | Mifflin 2, SPIRAL |
For the MPB, we use the Fortran implementation from [23] with the default parameters. For Algorithm 3, we use , , and (i.e., the initial step size is chosen depending on the norm of the descent direction in the current iteration). For Algorithm 4, we additionally use , , . By this choice of parameters, all three methods produce results of similar approximation quality.
To compare the performance of the three methods, we count the number of evaluations of objective functions, the number of subgradient evaluations and the number of iterations (i.e., descent steps) needed. (This means that one call of f will account for k evaluations of objectives.) Since the MPB always evaluates all objectives and subgradients in a point, the values for the objectives and the subgradients are the same here. The results are shown in Table 2 and are discussed in the following.
| # function evals | | | # subgradient evals | | | # iterations | | |
Nr. | MPB | Alg. 3 | Alg. 4 | MPB | Alg. 3 | Alg. 4 | MPB | Alg. 3 | Alg. 4
1. | 1780 | 6924 | 7801 | 1780 | 1102 | 1751 | 761 | 492 | 695 |
2. | 2522 | 14688 | 12263 | 2522 | 1906 | 2351 | 1151 | 842 | 914 |
3. | 880 | 5625 | 6447 | 880 | 921 | 1534 | 340 | 448 | 662 |
4. | 4416 | 103826 | 17664 | 4416 | 11774 | 3415 | 1832 | 4644 | 1242 |
5. | 2956 | 30457 | 16877 | 2956 | 3479 | 3037 | 1377 | 1616 | 1161 |
6. | 1640 | 8357 | 8684 | 1640 | 1209 | 1802 | 706 | 552 | 736 |
7. | 1702 | 8736 | 8483 | 1702 | 1307 | 1832 | 723 | 595 | 739 |
8. | 4226 | 8283 | 8620 | 4226 | 1318 | 1914 | 1204 | 582 | 759 |
9. | 1828 | 8201 | 8794 | 1828 | 1194 | 1805 | 793 | 536 | 732 |
10. | 1782 | 6799 | 7201 | 1782 | 1101 | 1722 | 684 | 543 | 733 |
11. | 4426 | 52096 | 17594 | 4426 | 6311 | 3189 | 1964 | 2442 | 1206 |
12. | 2482 | 15146 | 12446 | 2482 | 1992 | 2401 | 1140 | 967 | 1010 |
13. | 2662 | 36570 | 9513 | 2662 | 4958 | 2247 | 1221 | 1692 | 787 |
14. | 4264 | 95303 | 12227 | 4264 | 9524 | 2571 | 1774 | 4379 | 921 |
15. | 3594 | 85936 | 15669 | 3594 | 9329 | 3124 | 1444 | 3963 | 1125 |
16. | 2206 | 20372 | 11094 | 2206 | 2596 | 2400 | 884 | 1194 | 947 |
17. | 2388 | 7920 | 5852 | 2388 | 1272 | 1556 | 868 | 626 | 706 |
18. | 11430 | 166707 | 31528 | 11430 | 16676 | 6902 | 2789 | 8291 | 2412 |
Avg. | 3176.9 | 37885.9 | 12153.2 | 3176.9 | 4331.6 | 2530.7 | 1203.1 | 1911.3 | 971.5 |
Rel. | 100% | 1192.5% | 382.5% | 100% | 136.3% | 79.7% | 100% | 158.9% | 80.8%
• Function evaluations: When considering the number of function evaluations, it is clear that the MPB requires far fewer evaluations than both of our algorithms. In our methods, these evaluations are used to check whether a descent direction is acceptable (cf. (8)) and to compute the Armijo step length. One reason for the larger total amount is the fact that, unlike the MPB, our methods are autonomous in the sense that they do not reuse information from previous iterations, so some information is potentially gathered multiple times. Additionally, the step length we use is fairly simple, so it might be possible to lower the number of evaluations by using a more sophisticated step length. When comparing our methods to each other, we see that Algorithm 4 is considerably more efficient than Algorithm 3 when the number of evaluations is high and slightly less efficient when it is low. The reason for this is that for simple problems (i.e., where the number of evaluations is low), some of the iterations of Algorithm 4 are redundant, because the critical point of one iteration is already critical for the tolerances of the next.
• Subgradient evaluations: For the subgradient evaluations, we see that the MPB is slightly superior to our methods on problems 3, 5 and 16, but inferior on the rest. Regarding the comparison of Algorithms 3 and 4, we observe the same pattern as for the function evaluations: Algorithm 3 is superior for simple problems and Algorithm 4 for complex ones.
• Iterations: For the number of iterations, except for problem 5, we see the same pattern as for the number of subgradient evaluations. Note that the MPB can perform null steps, which are iterations where only the bundle is enriched while the current point of the descent sequence stays the same.
For our set of test problems, this leads us to the overall conclusion that in terms of function evaluations, the MPB seems to be superior to our methods, while in terms of subgradient evaluations, our methods seem to be (almost always) more efficient. Furthermore, we remark that the implementation of the MPB is somewhat challenging, whereas our method can be implemented relatively quickly.
4.3 Combination with the subdivision algorithm
Note that so far, we have a method which takes some initial point and produces a single critical point (close to an actual Pareto optimal point). Ultimately, however, we are interested not in one but in all Pareto optimal points. The intuitive and straightforward way to extend our method would be to take a large set of well-spread initial points and apply the method to each of them. The problem with this is that we have no guarantee that the result is a good approximation of the Pareto set. To solve this issue, we combine our method with the subdivision algorithm which was developed for smooth problems in [11]. Since we only have to make minor adjustments for the nonsmooth case, we will only sketch the method here and refer to [11] for the details.
The idea is to interpret the nonsmooth descent method as a discrete dynamical system
(17)   x_{j+1} = G(x_j),   j = 0, 1, 2, …,
where G : ℝⁿ → ℝⁿ is the map that applies one iteration of Algorithm 3 to a point in ℝⁿ. (For the sake of brevity, we have omitted the rest of the input of the algorithm here.) Since no information is carried over between iterations of the algorithm, the trajectory (i.e., the sequence) generated by the system (17) is the same as the one generated by Algorithm 3. In particular, this means that the Pareto set of the MOP is contained in the set of fixed points of the system (17). Thus, the subdivision algorithm (which was originally designed to compute attractors of dynamical systems) can be used to compute (a superset of) the Pareto set.
The subdivision algorithm starts with a large hypercube (or box) in ℝⁿ that contains the Pareto set and mainly consists of two steps:
1. Subdivision: Divide each box in the current set of boxes into smaller boxes.
2. Selection: Compute the image of the union of the current set of boxes under the map from (17) and remove all boxes that have an empty intersection with this image. Go to step 1.
In practice, we realize step 1 by evenly dividing each box into smaller boxes and step 2 by using the image of a set of sample points. The algorithm is visualized in Figure 6.
[Figure 6: panels (a) and (b)]
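The subdivision–selection loop above can be sketched in one dimension. Here a contracting gradient step on a smooth model problem stands in for the descent map (17); box splitting, the sample-point grid and all names are our own illustrative choices:

```python
def subdivision(F, boxes, iters=8, samples=5):
    """Subdivision algorithm: split every box, then keep only the boxes
    hit by the image of sample points from the current collection."""
    for _ in range(iters):
        # Step 1 (subdivision): halve every box.
        boxes = [half for lo, hi in boxes
                 for half in ((lo, 0.5 * (lo + hi)), (0.5 * (lo + hi), hi))]
        # Step 2 (selection): sample the union of boxes, map the samples,
        # and keep only boxes containing at least one image point.
        pts = [lo + (hi - lo) * j / (samples - 1)
               for lo, hi in boxes for j in range(samples)]
        images = [F(p) for p in pts]
        boxes = [(lo, hi) for lo, hi in boxes
                 if any(lo <= q <= hi for q in images)]
    return boxes

# A gradient step on f(x) = x^2 with step size 0.4 is the contraction
# F(x) = 0.2 x; its fixed point (the minimizer 0) is the attractor, so
# the surviving boxes shrink onto it.
F = lambda x: x - 0.4 * 2.0 * x
boxes = subdivision(F, [(-2.0, 2.0)])
```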
Unfortunately, the convergence results for the subdivision algorithm only apply if the underlying map is a diffeomorphism. If the objective function is smooth, then the descent direction is at least continuous (cf. [13]) and the resulting dynamical system, while not being a diffeomorphism, still behaves well enough for the subdivision algorithm to work. If the objective function is nonsmooth, then our descent direction is inherently discontinuous close to the nonsmooth points. Thus, the subdivision algorithm applied to (17) will usually fail to work. In practice, we were able to solve this issue by applying multiple iterations of Algorithm 3 at once instead of just one. Roughly speaking, this smooths the map by pushing the influence of the discontinuity further away from the Pareto set, and was sufficient for convergence in our tests.
Figures 7 to 9 show the result of the subdivision algorithm for some of the problems from Table 1. For each problem, we applied multiple iterations of Algorithm 3 in G, chose a box containing the Pareto set as the starting box, and performed several iterations of the subdivision algorithm. For the approximation of the Pareto front (i.e., the image of the Pareto set), we evaluated the objective function at all points of the image of G from the last selection step of the subdivision algorithm. In all of these examples, the algorithm produced a tight approximation of the Pareto set.
5 Conclusion and outlook
In this article, we have developed a new descent method for locally Lipschitz continuous multiobjective optimization problems, which is based on the efficient approximation of the Clarke subdifferentials of the objective functions from [22]. In [1], it was shown that the element with the smallest norm in the negative convex hull of the union of the subdifferentials is a descent direction for all objectives at the same time. In practice, the entire subdifferentials which are required to compute this direction are rarely known and only single subgradients can be computed. To solve this issue, we presented a method to obtain an approximation of the subdifferentials which is sufficient to obtain a descent direction. The idea is to start with a rough approximation of the subdifferentials by only a few subgradients and then systematically enrich the approximation with new subgradients until a direction of sufficient descent is found. By combining the descent direction with an Armijo step length, we obtained a descent method for nonsmooth MOPs and showed convergence to points which satisfy a necessary condition for Pareto optimality. We then compared the performance to the multiobjective proximal bundle method from [25]. For the 18 test problems we considered, the MPB was superior in terms of objective function evaluations, but our method required fewer subgradient evaluations and iterations. Finally, we showed that our descent method can be combined with the subdivision algorithm from [11] to compute approximations of entire Pareto sets.
For future work, we believe that it is straightforward to extend our method to constrained MOPs by adding constraints to the problem (7) that ensure that the descent direction maintains the feasibility of the descent sequence (similar to [14] for smooth problems). Additionally, in [8], the classical gradient sampling method for scalar nonsmooth optimization was generalized by allowing variable norms in the direction finding problem, increasing its efficiency. We expect that a similar generalization can be performed for problem (7), which potentially yields a similar increase in efficiency. Additional potential for increased performance lies in more advanced step length schemes as well as descent directions with memory (for instance, conjugate-gradient-like). Furthermore, it might be possible to extend our method to infinite-dimensional nonsmooth MOPs [29, 5]. Finally, in the context of nonsmooth many-objective optimization, we believe that considering subsets of objectives is a very promising and efficient approach (cf. [15] for smooth problems). However, theoretical advances are required for locally Lipschitz continuous problems.
Acknowledgement
This research was funded by the DFG Priority Programme 1962 “Non-smooth and Complementarity-based Distributed Parameter Systems”.
References
- [1] H. Attouch, G. Garrigos, and X. Goudou. A dynamic gradient approach to Pareto optimization with nonsmooth convex objective functions. Journal of Mathematical Analysis and Applications, 422(1):741–771, 2015.
- [2] Y. Bello-Cruz. A Subgradient Method for Vector Optimization Problems. SIAM Journal on Optimization, Nov 2013.
- [3] J. Burke, A. Lewis, and M. Overton. A Robust Gradient Sampling Algorithm for Nonsmooth, Nonconvex Optimization. SIAM Journal on Optimization, 15:751–779, Jan 2005.
- [4] W. Cheney and A. A. Goldstein. Proximity maps for convex sets. Proceedings of the American Mathematical Society, 10(3):448–450, 1959.
- [5] C. Christof and G. Müller. Multiobjective Optimal Control of a Non-Smooth Semilinear Elliptic Partial Differential Equation. European Series in Applied and Industrial Mathematics (ESAIM) : Control, Optimisation and Calculus of Variations, 2020.
- [6] F. Clarke. Optimization and Nonsmooth Analysis. Society for Industrial and Applied Mathematics, 1983.
- [7] J. Cruz Neto, G. Silva, O. Ferreira, and J. Lopes. A subgradient method for multiobjective optimization. Computational Optimization and Applications, 54:461–472, Apr 2013.
- [8] F. E. Curtis and X. Que. An adaptive gradient sampling algorithm for non-smooth optimization. Optimization Methods and Software, 28(6):1302–1324, 2013.
- [9] K. Deb. Multi-objective optimization using evolutionary algorithms, volume 16. John Wiley & Sons, 2001.
- [10] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, Apr 2002.
- [11] M. Dellnitz, O. Schütze, and T. Hestermeyer. Covering Pareto Sets by Multilevel Subdivision Techniques. Journal of Optimization Theory and Applications, 124(1):113–136, 2005.
- [12] J. Fliege, L. M. Graña Drummond, and B. F. Svaiter. Newton’s Method for Multiobjective Optimization. SIAM Journal on Optimization, 20, Mar 2008.
- [13] J. Fliege and B. F. Svaiter. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494, Aug 2000.
- [14] B. Gebken, S. Peitz, and M. Dellnitz. A Descent Method for Equality and Inequality Constrained Multiobjective Optimization Problems. In L. Trujillo, O. Schütze, Y. Maldonado, and P. Valle, editors, Numerical and Evolutionary Optimization – NEO 2017, pages 29–61. Springer, Cham, 2019.
- [15] B. Gebken, S. Peitz, and M. Dellnitz. On the hierarchical structure of Pareto critical sets. Journal of Global Optimization, 73(4):891–913, 2019.
- [16] A. Goldstein. Optimization of Lipschitz continuous functions. Mathematical Programming, 13:14–22, 12 1977.
- [17] E. S. Helou, S. A. Santos, and L. E. A. Simões. On the differentiability check in gradient sampling methods. Optimization Methods and Software, 31(5):983–1007, 2016.
- [18] K. C. Kiwiel. A descent method for nonsmooth convex multiobjective minimization. Large scale systems, 8(2):119–129, 1985.
- [19] K. C. Kiwiel. Methods of Descent for Nondifferentiable Optimization. Springer-Verlag Berlin Heidelberg, 1985.
- [20] K. C. Kiwiel. Proximity Control in Bundle Methods for Convex Nondifferentiable Minimization. Mathematical Programming, 46:105–122, Jan 1990.
- [21] K. C. Kiwiel. A Nonderivative Version of the Gradient Sampling Algorithm for Nonsmooth Nonconvex Optimization. SIAM Journal on Optimization, 20(4):1983–1994, 2010.
- [22] N. Mahdavi-Amiri and R. Yousefpour. An Effective Nonsmooth Optimization Algorithm for Locally Lipschitz Functions. Journal of Optimization Theory and Applications, 155(1):180–195, Oct 2012.
- [23] M. M. Mäkelä. Multiobjective proximal bundle method for nonconvex nonsmooth optimization: Fortran subroutine MPBNGC 2.0. Reports of the Department of Mathematical Information Technology, Series B. Scientific Computing, No. B 13/2003, 2003.
- [24] M. M. Mäkelä, V.-P. Eronen, and N. Karmitsa. On Nonsmooth Multiobjective Optimality Conditions with Generalized Convexities, pages 333–357. Springer New York, New York, NY, 2014.
- [25] M. M. Mäkelä, N. Karmitsa, and O. Wilppu. Multiobjective proximal bundle method for nonsmooth optimization. TUCS technical report No 1120, Turku Centre for Computer Science, Turku, 2014.
- [26] M. M. Mäkelä and P. Neittaanmäki. Nonsmooth Optimization: Analysis and Algorithms with Applications to Optimal Control. World Scientific, 1992.
- [27] K. Miettinen. Nonlinear Multiobjective Optimization. Springer US, 1998.
- [28] O. Montonen, N. Karmitsa, and M. M. Mäkelä. Multiple subgradient descent bundle method for convex nonsmooth multiobjective optimization. Optimization, 67(1):139–158, 2018.
- [29] B. Mordukhovich. Multiobjective optimization problems with equilibrium constraints. Mathematical Programming, 117:331–354, Jul 2008.
- [30] S. Schäffler, R. Schultz, and K. Weinzierl. Stochastic Method for the Solution of Unconstrained Vector Optimization Problems. J. Optim. Theory Appl., 114(1):209–222, 2002.
- [31] N. Shor. Minimization Methods for Non-Differentiable Functions. Springer-Verlag Berlin Heidelberg, 1985.