X.X. Jiang, College of Mathematics, Sichuan University, Chengdu, 610065, China ([email protected])
L.P. Tang, National Center for Applied Mathematics in Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China ([email protected])
🖂 X.M. Yang (corresponding author), National Center for Applied Mathematics in Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China ([email protected])
On the Convergence of Newton-type Proximal Gradient Method for Multiobjective Optimization Problems
Abstract
In a recent study, Ansary (Optim Methods Softw 38(3):570–590, 2023) proposed a Newton-type proximal gradient method for nonlinear multiobjective optimization problems (NPGMO). However, the favorable convergence properties typically associated with Newton-type methods were not established for the NPGMO in Ansary’s work. In response to this gap, we develop a straightforward framework for analyzing the convergence behavior of the NPGMO. Specifically, under the assumption of strong convexity, we demonstrate that the NPGMO enjoys quadratic termination, superlinear convergence, and quadratic convergence for problems whose smooth parts are quadratic, twice continuously differentiable, and twice Lipschitz continuously differentiable, respectively.
Keywords: Multiobjective optimization · Newton-type method · Proximal gradient method · Convergence
MSC: 90C29 · 90C30
1 Introduction
A multiobjective composite optimization problem (MCOP) can be formulated as follows:
\[
\min_{x \in \mathbb{R}^n} \; F(x), \tag{MCOP}
\]
where $F := (F_1, \dots, F_m)^{\top} : \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued function. Each component $F_i$, $i = 1, \dots, m$, is defined by
\[
F_i(x) := f_i(x) + g_i(x),
\]
where $f_i : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable and $g_i : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is proper, convex and lower semicontinuous, but not necessarily differentiable.
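As a concrete illustration (ours, not taken from the original text), one may take a bi-objective instance that combines a smooth data-fitting term with a nonsmooth regularizer:
\[
m = 2, \qquad f_1(x) = \tfrac{1}{2}\|Ax - b\|^2, \quad f_2(x) = \tfrac{1}{2}\|x - c\|^2, \qquad g_1(x) = g_2(x) = \gamma \|x\|_1,
\]
where $A \in \mathbb{R}^{p \times n}$, $b \in \mathbb{R}^p$, $c \in \mathbb{R}^n$ and $\gamma > 0$ are given; each $f_i$ is smooth (indeed twice continuously differentiable), while each $g_i$ is proper, convex, lower semicontinuous and nondifferentiable.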
Over the past two decades, descent methods have garnered increasing attention within the multiobjective optimization community (see, e.g., AP2021 ; BI2005 ; CL2016 ; CTY2023 ; FS2000 ; FD2009 ; FV2016 ; GI2004 ; LP2018 ; MP2018 ; MP2019 ; P2014 ; QG2011 ; TFY2019 and references therein). These methods generate descent directions by solving subproblems, eliminating the need for predefined parameters. As far as we know, the study of multiobjective gradient descent methods can be traced back to the pioneering efforts of Mukai M1980 as well as the seminal work of Fliege and Svaiter FS2000 .
Recently, Tanabe et al. TFY2019 proposed a proximal gradient method for MCOPs (PGMO), and their subsequent convergence rate analysis TFY2023 revealed that the PGMO exhibits slow convergence when dealing with ill-conditioned problems. Notably, most multiobjective first-order methods are sensitive to problem conditioning. In response to this challenge, Fliege et al. FD2009 leveraged a quadratic model to better capture the problem’s geometry and proposed Newton’s method for MOPs (NMO). Parallel to its single-objective counterpart, the NMO demonstrates locally superlinear and quadratic convergence under standard assumptions. Building upon this foundation, Ansary recently adopted the idea of Fliege et al. FD2009 and extended it to derive the Newton-type proximal gradient method for MOPs (NPGMO) A2023 . However, the appealing convergence properties associated with Newton’s method were not established within Ansary’s work. It is worth mentioning that the Newton-type proximal gradient method for SOPs LSS2014 exhibits convergence behavior akin to that of Newton’s method. This naturally raises the question: Does NPGMO enjoy the same desirable convergence properties as NMO?
The primary objective of this paper is to provide an affirmative answer to the question. The contributions of this study can be summarized as follows:
(i) We demonstrate that the NPGMO finds an exact Pareto solution in one iteration for strongly convex quadratic problems.
(ii) Under the assumptions that $f_j$ is both strongly convex and twice continuously differentiable for $j \in [m]$, we prove that the NPGMO is strongly convergent and that its local convergence rate is superlinear. Additionally, by further assuming that each $f_j$ is twice Lipschitz continuously differentiable (i.e., its Hessian is Lipschitz continuous), the local convergence rate of the NPGMO is elevated to a quadratic rate.
(iii) We derive a simplified analysis framework to elucidate the appealing convergence properties of NPGMO, which can be linked back to the properties of NMO. The proofs are grounded in a modified fundamental inequality, a common tool employed in the convergence analysis of first-order methods.
The paper is organized as follows. In Section 2, we present some necessary notations and definitions that will be used later. Section 3 revisits the description and highlights certain properties of the NPGMO. The convergence analysis of the NPGMO is detailed in Section 4. Lastly, we draw our conclusions in the final section of the paper.
2 Preliminaries
Throughout this paper, the $n$-dimensional Euclidean space $\mathbb{R}^n$ is equipped with the inner product $\langle \cdot, \cdot \rangle$ and the induced norm $\|\cdot\|$. Denote by $\mathbb{S}^n_{+}$ ($\mathbb{S}^n_{++}$) the set of symmetric positive semidefinite (positive definite) matrices in $\mathbb{R}^{n \times n}$. For a positive definite matrix $H$, the notation $\|x\|_{H} := \sqrt{\langle x, Hx \rangle}$ is used to represent the norm induced by $H$ on a vector $x$. Additionally, we define
\[
F_j'(x; d) := \lim_{t \downarrow 0} \frac{F_j(x + td) - F_j(x)}{t},
\]
the directional derivative of $F_j$ at $x$ in the direction $d$.
For simplicity, we utilize the notation $[m] := \{1, 2, \dots, m\}$, and define
\[
\Delta_m := \Big\{ \lambda \in \mathbb{R}^m : \lambda_i \geq 0, \; \sum_{i \in [m]} \lambda_i = 1 \Big\}
\]
to represent the $m$-dimensional unit simplex. To prevent any ambiguity, we establish the order $\leq$ in $\mathbb{R}^m$ as follows:
\[
u \leq v \;\Longleftrightarrow\; v - u \in \mathbb{R}^m_{+},
\]
and the order $<$ in $\mathbb{R}^m$ as:
\[
u < v \;\Longleftrightarrow\; v - u \in \operatorname{int}(\mathbb{R}^m_{+}).
\]
In the following, we introduce the concepts of optimality for (MCOP) in the Pareto sense.
Definition 1.
A vector $x^* \in \mathbb{R}^n$ is called a Pareto solution of (MCOP) if there exists no $x \in \mathbb{R}^n$ such that $F(x) \leq F(x^*)$ and $F(x) \neq F(x^*)$.
Definition 2.
A vector $x^* \in \mathbb{R}^n$ is called a weakly Pareto solution of (MCOP) if there exists no $x \in \mathbb{R}^n$ such that $F(x) < F(x^*)$.
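For intuition, consider the following one-dimensional example (our illustration, not part of the original text):
\[
F : \mathbb{R} \to \mathbb{R}^2, \qquad F(x) = \big( (x - 1)^2, \, (x + 1)^2 \big).
\]
Every $x^* \in [-1, 1]$ is a Pareto solution, since decreasing one component forces the other to increase, whereas no point outside $[-1, 1]$ is even weakly Pareto, because moving toward the interval strictly decreases both components.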
Definition 3.
A vector $x^* \in \mathbb{R}^n$ is called a Pareto critical point of (MCOP) if
\[
\max_{j \in [m]} F_j'(x^*; d) \geq 0 \quad \text{for all } d \in \mathbb{R}^n.
\]
From Definitions 1 and 2, it is evident that Pareto solutions are always weakly Pareto solutions. The following lemma shows the relationships among the three concepts of Pareto optimality.
Lemma 1 (Theorem 3.1 of FD2009 ).
The following statements hold:
(i) if $x \in \mathbb{R}^n$ is a weakly Pareto solution of (MCOP), then $x$ is a Pareto critical point;
(ii) if every $F_j$ is convex and $x$ is a Pareto critical point of (MCOP), then $x$ is a weakly Pareto solution;
(iii) if every $F_j$ is strictly convex and $x$ is a Pareto critical point of (MCOP), then $x$ is a Pareto solution.
Definition 4.
A twice continuously differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ is $\mu$-strongly convex if
\[
\langle d, \nabla^2 f(x) d \rangle \geq \mu \|d\|^2
\]
holds for all $x, d \in \mathbb{R}^n$.
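As a quick sanity check (our illustration), a quadratic function $f(x) = \tfrac{1}{2}\langle x, Ax \rangle + \langle b, x \rangle$ with $A \in \mathbb{S}^n_{++}$ satisfies Definition 4 with $\mu = \lambda_{\min}(A)$, since
\[
\langle d, \nabla^2 f(x) d \rangle = \langle d, A d \rangle \geq \lambda_{\min}(A) \|d\|^2 \quad \text{for all } x, d \in \mathbb{R}^n.
\]
This is precisely the setting of the quadratic-termination result in Section 4.2.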
3 Newton-type proximal gradient method for MCOPs
In this section, we revisit the multiobjective Newton-type proximal gradient method, which was originally introduced by Ansary A2023 .
3.1 Newton-type proximal gradient method
A multiobjective Newton-type proximal search direction corresponds to the unique optimal solution of the following subproblem:
\[
\min_{d \in \mathbb{R}^n} \; \max_{j \in [m]} \Big\{ \langle \nabla f_j(x), d \rangle + \frac{1}{2} \langle d, \nabla^2 f_j(x) d \rangle + g_j(x + d) - g_j(x) \Big\}, \tag{1}
\]
namely,
\[
d(x) := \mathop{\arg\min}_{d \in \mathbb{R}^n} \; \max_{j \in [m]} \Big\{ \langle \nabla f_j(x), d \rangle + \frac{1}{2} \langle d, \nabla^2 f_j(x) d \rangle + g_j(x + d) - g_j(x) \Big\}.
\]
The optimal value of the subproblem is denoted by
\[
\theta(x) := \max_{j \in [m]} \Big\{ \langle \nabla f_j(x), d(x) \rangle + \frac{1}{2} \langle d(x), \nabla^2 f_j(x) d(x) \rangle + g_j(x + d(x)) - g_j(x) \Big\}.
\]
For simplicity, we denote by $\psi_j(x, d)$ the $j$-th term in the above maximum, so that subproblem (1) reads $\min_{d} \max_{j \in [m]} \psi_j(x, d)$. By Sion’s minimax theorem (S1958 ), there exists $\lambda(x) \in \Delta_m$ such that
\[
d(x) = \mathop{\arg\min}_{d \in \mathbb{R}^n} \; \sum_{j \in [m]} \lambda_j(x) \, \psi_j(x, d);
\]
we use the first-order optimality condition to get
\[
0 \in \sum_{j \in [m]} \lambda_j(x) \Big( \nabla f_j(x) + \nabla^2 f_j(x) d(x) + \partial g_j(x + d(x)) \Big). \tag{2}
\]
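Although not part of the original presentation, it is worth noting that subproblem (1) can be recast in an epigraph form that is convenient for generic convex solvers:
\[
\min_{(t, d) \in \mathbb{R} \times \mathbb{R}^n} \; t
\quad \text{s.t.} \quad
\langle \nabla f_j(x), d \rangle + \frac{1}{2} \langle d, \nabla^2 f_j(x) d \rangle + g_j(x + d) - g_j(x) \leq t, \quad j \in [m].
\]
This is a convex program whenever each $\nabla^2 f_j(x)$ is positive semidefinite; when the Hessians are positive definite, its optimal value equals $\theta(x)$ and the $d$-component of its unique solution is $d(x)$.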
The following lemma presents several properties of $d(x)$ and $\theta(x)$.
Lemma 2 (Lemma 3.2 and Theorem 3.1 of A2023 ). For any $x \in \mathbb{R}^n$, $\theta(x) \leq 0$; moreover, $x$ is a Pareto critical point of (MCOP) if and only if $d(x) = 0$, or equivalently, $\theta(x) = 0$.
Lemma 3 (Theorem 3.2 of A2023 ).
Suppose $f_j$ is a strongly convex function with modulus $\mu$ for $j \in [m]$. Then
The multiobjective Newton-type proximal gradient method with line search is stated as Algorithm 1.
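Since the pseudocode of Algorithm 1 is not reproduced here, the following Python sketch is offered purely as an illustration of how the iteration can be organized; it is our own minimal sketch rather than Ansary's implementation. The names (npgmo_sketch, fs, grads, hessians, gs), the SLSQP-based epigraph solve of subproblem (1), and the acceptance test $F_j(x + t d) - F_j(x) \leq \rho\, t\, \theta(x)$ with backtracking factor $\sigma$ are illustrative assumptions; in particular, the generic solver call presumes each $g_j$ is smooth enough to be handled directly.

```python
import numpy as np
from scipy.optimize import minimize


def npgmo_sketch(x0, fs, grads, hessians, gs, max_iter=100, tol=1e-8,
                 rho=1e-4, sigma=0.5):
    """Illustrative Newton-type proximal gradient loop for F_j = f_j + g_j.

    Subproblem (1) is solved in its epigraph form
        min_{t,d} t   s.t.   psi_j(x, d) <= t,   j = 1, ..., m,
    via a generic SLSQP call (an assumption made only for this sketch).
    """
    x = np.asarray(x0, dtype=float)
    m, n = len(fs), x.size
    for _ in range(max_iter):
        G = [grad(x) for grad in grads]      # gradients of the smooth parts
        H = [hess(x) for hess in hessians]   # Hessians of the smooth parts
        gx = [g(x) for g in gs]              # values g_j(x)

        def psi(d, j):
            # quadratic model of F_j(x + d) - F_j(x), cf. subproblem (1)
            return G[j] @ d + 0.5 * d @ H[j] @ d + gs[j](x + d) - gx[j]

        # epigraph reformulation with variables z = (t, d)
        cons = [{'type': 'ineq', 'fun': lambda z, j=j: z[0] - psi(z[1:], j)}
                for j in range(m)]
        res = minimize(lambda z: z[0], np.zeros(n + 1),
                       constraints=cons, method='SLSQP')
        theta, d = res.x[0], res.x[1:]

        if -theta <= tol:        # theta(x) ~ 0  =>  x is Pareto critical
            return x

        # backtracking line search: largest t = sigma^l such that
        # F_j(x + t d) - F_j(x) <= rho * t * theta(x) for every j
        Fx = [fs[j](x) + gx[j] for j in range(m)]
        t = 1.0
        while any(fs[j](x + t * d) + gs[j](x + t * d) - Fx[j] > rho * t * theta
                  for j in range(m)):
            t *= sigma
        x = x + t * d
    return x
```

In exact arithmetic the loop returns either a Pareto critical point or the final iterate after max_iter steps; for strongly convex quadratic $f_j$ with $g_j \equiv 0$, a single pass already suffices (cf. Section 4.2).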
4 Convergence analysis of NPGMO
As we know, the Newton-type proximal gradient method for SOPs exhibits the desirable convergence behavior of Newton-type methods for minimizing smooth functions; see LSS2014 . Naturally, this prompts the question: does the NPGMO exhibit convergence properties as desirable as those of the NMO applied to smooth problems?
4.1 Strong convergence
As evident from Algorithm 1, it terminates either with a Pareto critical point in a finite number of iterations or generates an infinite sequence of points. In the forthcoming analysis, we will assume that Algorithm 1 generates an infinite sequence of noncritical points. In A2023 , Ansary analyzed the global convergence of NPGMO.
Theorem 4.1 (Theorem 4.1 of A2023 ). Let $\{x^k\}$ be the sequence produced by Algorithm 1. Then every accumulation point of $\{x^k\}$, if one exists, is a Pareto critical point of (MCOP).
We can derive the strong convergence of NPGMO.
Theorem 4.2.
Suppose $f_j$ is strongly convex with modulus $\mu$ for all $j \in [m]$. Let $\{x^k\}$ be the sequence produced by Algorithm 1. Then $\{x^k\}$ converges to some Pareto point $x^*$.
Proof.
Since $f_j$ is strongly convex and $g_j$ is convex for $j \in [m]$, this, together with the continuity of $f_j$ and the lower semicontinuity of $g_j$ for $j \in [m]$, yields that the level set $\mathcal{L}_F(x^0) := \{x \in \mathbb{R}^n : F(x) \leq F(x^0)\}$ is compact and that any Pareto critical point is a Pareto point. Moreover, we can infer $\{x^k\} \subset \mathcal{L}_F(x^0)$ due to the fact that $\{F(x^k)\}$ is decreasing. With the compactness of $\mathcal{L}_F(x^0)$, it follows that $\{x^k\}$ has an accumulation point $x^*$ and $\lim_{k \to \infty} F(x^k) = F(x^*)$. By applying a similar argument as presented in the proof of (TFY2019 , Theorem 4.2), it can be concluded that $x^*$ is a Pareto critical point (Pareto point). Next, we proceed to prove the uniqueness of the accumulation point. Suppose on the contrary that there exists another distinct accumulation point $\bar{x} \neq x^*$. By the strong convexity of $F_j$ for $j \in [m]$, the following inequality holds:
\[
F_j\Big(\frac{x^* + \bar{x}}{2}\Big) < \frac{1}{2} F_j(x^*) + \frac{1}{2} F_j(\bar{x}) = F_j(x^*), \quad j \in [m],
\]
where the equality follows from the convergence of $\{F(x^k)\}$, which forces $F(\bar{x}) = F(x^*)$. However, this contradicts the fact that $x^*$ is a Pareto point. The uniqueness of the accumulation point of $\{x^k\}$ implies that $\{x^k\}$ converges to $x^*$.
4.2 Quadratic termination
In this section, we will delve into the analysis of the quadratic termination property of Algorithm 1. Prior to presenting the outcome, we lay the foundation by establishing the subsequent modified fundamental inequality.
Proposition 1 (modified fundamental inequality).
Suppose $F_j$ is strictly convex for all $j \in [m]$. Let $\{x^k\}$ be the sequence produced by Algorithm 1. If $t_k = 1$, then there exists $\lambda^k \in \Delta_m$ such that
(3)
Proof.
Theorem 4.3. Suppose $f_j$ is a strongly convex quadratic function for $j \in [m]$. Let $\{x^k\}$ be the sequence produced by Algorithm 1. Then $x^1 = x^*$, i.e., the NPGMO finds an exact Pareto point in one iteration.
Proof.
First, we aim to establish that $t_k = 1$ for all $k$. Since $f_j$ is a strongly convex quadratic function for $j \in [m]$, there exists a positive definite matrix $A_j$ such that $\nabla^2 f_j(x) = A_j$ for all $x \in \mathbb{R}^n$. This leads us to the conclusion that the quadratic model in subproblem (1) is exact, namely $\psi_j(x^k, d) = F_j(x^k + d) - F_j(x^k)$ for all $d \in \mathbb{R}^n$, so the line search accepts the unit step.
Thus, $t_k = 1$ for all $k$. For all $z \in \mathbb{R}^n$, we use relation (3) to get
where $\lambda^k \in \Delta_m$. Applying Theorem 4.2, we can assert the existence of a Pareto point $x^*$ such that $F(x^*) \leq F(x^k)$ for all $k$. Substituting $z = x^*$ into the above inequality, we have
Combining this with the fact that $F(x^*) \leq F(x^1)$ leads to the following inequality
Considering that $A_j$ is a positive definite matrix for $j \in [m]$, it follows that $x^1 = x^*$. This completes the proof.
By setting $g_j \equiv 0$ for $j \in [m]$, the quadratic termination property also applies to the NMO as presented in FD2009 .
Corollary 1.
Suppose $f_j$ is a strongly convex quadratic function for $j \in [m]$. Let $\{x^k\}$ be the sequence produced by the NMO. Then $x^1 = x^*$, i.e., the NMO finds an exact Pareto point in one iteration.
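To see the mechanism in the simplest setting, consider the single-objective smooth case $m = 1$, $g_1 \equiv 0$ (our illustration, not part of the original text). For $f_1(x) = \tfrac{1}{2}\langle x, Ax \rangle + \langle b, x \rangle$ with $A \in \mathbb{S}^n_{++}$, subproblem (1) at $x^0$ reads
\[
\min_{d \in \mathbb{R}^n} \; \langle Ax^0 + b, d \rangle + \frac{1}{2} \langle d, A d \rangle,
\qquad
d(x^0) = -A^{-1}(Ax^0 + b),
\]
so the unit step gives $x^1 = x^0 + d(x^0) = -A^{-1}b$, the exact minimizer, after a single iteration, exactly as Newton's method behaves on a strongly convex quadratic.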
4.3 Local superlinear convergence
Next, we give sufficient conditions for local superlinear convergence.
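Before stating the result, recall the standard notions of rate (restated here for convenience; they are not quoted from the original paper): $\{x^k\}$ converges superlinearly to $x^*$ if
\[
\lim_{k \to \infty} \frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|} = 0,
\]
and quadratically if $\|x^{k+1} - x^*\| \leq C \|x^k - x^*\|^2$ for some $C > 0$ and all sufficiently large $k$; the latter rate is the subject of Section 4.4.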
Theorem 4.4.
Suppose $f_j$ is strongly convex with modulus $\mu$ and its Hessian $\nabla^2 f_j$ is continuous for $j \in [m]$. Let $\{x^k\}$ be the sequence produced by Algorithm 1. Then, for any $\varepsilon > 0$, there exists $k_1 \in \mathbb{N}$ such that
holds for all $k \geq k_1$, where $x^*$ is the Pareto point given by Theorem 4.2. Furthermore, the sequence $\{x^k\}$ converges superlinearly to $x^*$.
Proof.
Referring to the arguments presented in the proof of Theorem 4.2, we conclude that $\{x^k\}$ converges to a certain Pareto point $x^*$ and $\{F(x^k)\}$ converges to $F(x^*)$. This implies that for any $\varepsilon > 0$, there exists $k_0 \in \mathbb{N}$ such that $\|x^k - x^*\| \leq \varepsilon$ for all $k \geq k_0$. Given that $\nabla^2 f_j$ is continuous, it follows that $\nabla^2 f_j$ is uniformly continuous on the compact set $\mathcal{L}_F(x^0)$. For any $\varepsilon > 0$, there exists $k_1 \in \mathbb{N}$ such that, for all $k \geq k_1$,
where the last inequality is due to the facts that $d(x^k) \to 0$ (Lemma 3) and $x^k \to x^*$. Consequently, we can deduce that $t_k = 1$ for all $k \geq k_1$. Substituting $t_k = 1$ into (3), we obtain
(4)
where the first inequality comes from the fact for all . On the other hand, can be equivalently expressed as
By substituting the above relation into (4) with $z = x^k$ and $z = x^*$, respectively, we have
(5)
Since the sequence $\{x^k\}$ converges to a Pareto solution $x^*$, there exists $k_2 \in \mathbb{N}$ such that, for all $k \geq k_2$,
and
Substituting these two bounds into (5), we obtain
This, together with the $\mu$-strong convexity of $f_j$, implies
(6)
Through direct calculation, we have
Rearranging and dividing by , we have
where . Since , we deduce that . Substituting into relation (6), we derive that
Furthermore, since $\|x^k - x^*\|$ tends to $0$ as $k$ tends to infinity, it follows that
We use the relation to get
This concludes the proof.
4.4 Quadratic convergence
The additional assumption of Lipschitz continuity of the Hessian $\nabla^2 f_j$ for $j \in [m]$ guarantees a quadratic convergence rate of the NPGMO, as we will now demonstrate.
Theorem 4.5.
Suppose $f_j$ is strongly convex with modulus $\mu$ and its Hessian $\nabla^2 f_j$ is Lipschitz continuous with constant $L$ for $j \in [m]$. Let $\{x^k\}$ be the sequence produced by Algorithm 1. Then, for any $\varepsilon > 0$, there exists $k_1 \in \mathbb{N}$ such that
holds for all $k \geq k_1$, where $x^*$ is the Pareto point given by Theorem 4.2. Furthermore, the sequence $\{x^k\}$ converges quadratically to $x^*$.
Proof.
Drawing from the arguments presented in the proof of Theorem 4.4, we can establish that for any $\varepsilon > 0$, there exists a threshold $k_1 \in \mathbb{N}$ such that, for all $k \geq k_1$,
and
where . Utilizing relation (5), we deduce that
where the second inequality can be attributed to the Lipschitz continuity of $\nabla^2 f_j$ for $j \in [m]$, while the final equality originates from the definition of . By reordering terms and leveraging the $\mu$-strong convexity of $f_j$, we derive
This, together with the convergence of $\{x^k\}$ to $x^*$ and the limit , leads to
This concludes the proof.
5 Conclusions
In this paper, we have demonstrated the appealing convergence properties of NPGMO, including quadratic termination, locally superlinear convergence, and locally quadratic convergence. These results were established within a unified framework, which can potentially serve as a template for analyzing second-order methods for MOPs.
References
- [1] M. A. T. Ansary. A Newton-type proximal gradient method for nonlinear multi-objective optimization problems. Optimization Methods and Software, 38(3):570–590, 2023.
- [2] M. A. T. Ansary and G. Panda. A globally convergent SQCQP method for multiobjective optimization problems. SIAM Journal on Optimization, 31(1):91–113, 2021.
- [3] H. Bonnel, A. N. Iusem, and B. F. Svaiter. Proximal methods in vector optimization. SIAM Journal on Optimization, 15(4):953–970, 2005.
- [4] G. A. Carrizo, P. A. Lotito, and M. C. Maciel. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Mathematical Programming, 159(1):339–369, 2016.
- [5] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
- [6] J. Chen, L. P. Tang, and X. M. Yang. A Barzilai-Borwein descent method for multiobjective optimization problems. European Journal of Operational Research, 311(1):196–209, 2023.
- [7] J. Fliege, L. M. Graña Drummond, and B. F. Svaiter. Newton’s method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626, 2009.
- [8] J. Fliege and B. F. Svaiter. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494, 2000.
- [9] J. Fliege and A. I. F. Vaz. A method for constrained multiobjective optimization based on SQP techniques. SIAM Journal on Optimization, 26(4):2091–2119, 2016.
- [10] L. M. Graña Drummond and A. N. Iusem. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1):5–29, 2004.
- [11] J. D. Lee, Y. Sun, and M. A. Saunders. Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24(3):1420–1443, 2014.
- [12] L. R. Lucambio Pérez and L. F. Prudente. Nonlinear conjugate gradient methods for vector optimization. SIAM Journal on Optimization, 28(3):2690–2720, 2018.
- [13] Q. Mercier, F. Poirion, and J. A. Désidéri. A stochastic multiple gradient descent algorithm. European Journal of Operational Research, 271(3):808–817, 2018.
- [14] V. Morovati and L. Pourkarimi. Extension of Zoutendijk method for solving constrained multiobjective optimization problems. European Journal of Operational Research, 273(1):44–57, 2019.
- [15] H. Mukai. Algorithms for multicriterion optimization. IEEE Transactions on Automatic Control, 25(2):177–186, 1980.
- [16] Ž. Povalej. Quasi-Newton’s method for multiobjective optimization. Journal of Computational and Applied Mathematics, 255:765–777, 2014.
- [17] S. Qu, M. Goh, and F. T. Chan. Quasi-Newton methods for solving multiobjective optimization. Operations Research Letters, 39(5):397–399, 2011.
- [18] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
- [19] H. Tanabe, E. H. Fukuda, and N. Yamashita. Proximal gradient methods for multiobjective optimization and their applications. Computational Optimization and Applications, 72:339–361, 2019.
- [20] H. Tanabe, E. H. Fukuda, and N. Yamashita. Convergence rates analysis of a multiobjective proximal gradient method. Optimization Letters, 17:333–350, 2023.