
J. Chen: National Center for Applied Mathematics in Chongqing, Chongqing Normal University, Chongqing 401331, China, and School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China. [email protected]
X.X. Jiang: College of Mathematics, Sichuan University, Chengdu 610065, China. [email protected]
L.P. Tang: National Center for Applied Mathematics in Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China. [email protected]
🖂 X.M. Yang (corresponding author): National Center for Applied Mathematics in Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China. [email protected]

On the Convergence of Newton-type Proximal Gradient Method for Multiobjective Optimization Problems

Jian Chen    Xiaoxue Jiang    Liping Tang    Xinmin Yang
(Received: date / Accepted: date)
Abstract

In a recent study, Ansary (Optim Methods Softw 38(3):570-590,2023) proposed a Newton-type proximal gradient method for nonlinear multiobjective optimization problems (NPGMO). However, the favorable convergence properties typically associated with Newton-type methods were not established for NPGMO in Ansary’s work. In response to this gap, we develop a straightforward framework for analyzing the convergence behavior of the NPGMO. Specifically, under the assumption of strong convexity, we demonstrate that the NPGMO enjoys quadratic termination, superlinear convergence, and quadratic convergence for problems that are quadratic, twice continuously differentiable and twice Lipschitz continuously differentiable, respectively.

Keywords:
Multiobjective optimization · Newton-type method · Proximal gradient method · Convergence
MSC:
90C29 · 90C30

1 Introduction

The multiobjective composite optimization problem (MCOP) can be formulated as follows:

\min_{x\in\mathbb{R}^{n}}F(x), \qquad\qquad (\mathrm{MCOP})

where $F:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ is a vector-valued function. Each component $F_{i}$, $i=1,2,\dots,m$, is defined by

F_{i}:=f_{i}+g_{i},

where $f_{i}$ is continuously differentiable and $g_{i}$ is proper, convex, and lower semicontinuous but not necessarily differentiable.

Over the past two decades, descent methods have garnered increasing attention within the multiobjective optimization community (see, e.g., AP2021 ; BI2005 ; CL2016 ; CTY2023 ; FS2000 ; FD2009 ; FV2016 ; GI2004 ; LP2018 ; MP2018 ; MP2019 ; P2014 ; QG2011 ; TFY2019 and references therein). These methods generate descent directions by solving subproblems, eliminating the need for predefined parameters. As far as we know, the study of multiobjective gradient descent methods can be traced back to the pioneering efforts of Mukai M1980 as well as the seminal work of Fliege and Svaiter FS2000 .

Recently, Tanabe et al. TFY2019 proposed a proximal gradient method for MCOPs (PGMO), and their subsequent convergence rate analysis TFY2023 revealed that the PGMO exhibits slow convergence when dealing with ill-conditioned problems. Notably, most multiobjective first-order methods are sensitive to problem conditioning. In response to this challenge, Fliege et al. FD2009 leveraged a quadratic model to better capture the problem's geometry and proposed Newton's method for MOPs (NMO). Parallel to its single-objective counterpart, the NMO demonstrates locally superlinear and quadratic convergence under standard assumptions. Building upon this foundation, Ansary recently adopted the idea of Fliege et al. FD2009 and extended it to derive the Newton-type proximal gradient method for MOPs (NPGMO) A2023 . However, the appealing convergence properties associated with Newton's method were not established in Ansary's work. It is worth mentioning that the Newton-type proximal gradient method for SOPs LSS2014 exhibits convergence akin to that of Newton's method. This naturally raises the question: does NPGMO enjoy the same desirable convergence properties as NMO?

The primary objective of this paper is to provide an affirmative answer to the question. The contributions of this study can be summarized as follows:

(i) We demonstrate that the NPGMO finds an exact Pareto solution in one iteration for strongly convex quadratic problems.

(ii) Under the assumption that $f_{i}$ is both strongly convex and twice continuously differentiable for $i=1,2,\dots,m$, we prove that the whole sequence generated by NPGMO converges strongly to a Pareto point, and that the local convergence rate is superlinear. Additionally, by further assuming that each $f_{i}$ is twice Lipschitz continuously differentiable, the local convergence rate of NPGMO improves to quadratic.

(iii) We derive a simplified analysis framework to elucidate the appealing convergence properties of NPGMO, which can be linked back to the properties of NMO. The proofs are grounded in a modified fundamental inequality, a common tool employed in the convergence analysis of first-order methods.

The paper is organized as follows. In Section 2, we present some necessary notations and definitions that will be used later. Section 3 revisits the description and highlights certain properties of NPGMO. The convergence analysis of NPGMO is detailed in Section 4. Lastly, we draw our conclusions in the final section of the paper.

2 Preliminaries

Throughout this paper, the $n$-dimensional Euclidean space $\mathbb{R}^{n}$ is equipped with the inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. Denote by $\mathbb{S}^{n}_{++}$ ($\mathbb{S}^{n}_{+}$) the set of symmetric positive (semi)definite matrices in $\mathbb{R}^{n\times n}$. For a positive definite matrix $H$, the notation $\|x\|_{H}=\sqrt{\langle x,Hx\rangle}$ is used to represent the norm induced by $H$ on a vector $x$. Additionally, we define

F^{\prime}_{i}(x;d):=\lim_{t\downarrow 0}\frac{F_{i}(x+td)-F_{i}(x)}{t},

the directional derivative of $F_{i}$ at $x$ in the direction $d$.

For simplicity, we utilize the notation $[m]:=\{1,2,\dots,m\}$, and define

\Delta_{m}:=\left\{\lambda:\sum_{i\in[m]}\lambda_{i}=1,\ \lambda_{i}\geq 0,\ i\in[m]\right\}

to represent the $m$-dimensional unit simplex. To prevent any ambiguity, we establish the order $\preceq$ ($\prec$) in $\mathbb{R}^{m}$ as follows:

u\preceq(\prec)v~\Leftrightarrow~v-u\in\mathbb{R}^{m}_{+}(\mathbb{R}^{m}_{++}),

and in $\mathbb{S}^{n}$ as:

U\preceq(\prec)V~\Leftrightarrow~V-U\in\mathbb{S}^{n}_{+}(\mathbb{S}^{n}_{++}).

In the following, we introduce the concepts of optimality for (MCOP) in the Pareto sense.

Definition 1.

A vector $x^{\ast}\in\mathbb{R}^{n}$ is called a Pareto solution of (MCOP) if there exists no $x\in\mathbb{R}^{n}$ such that $F(x)\preceq F(x^{\ast})$ and $F(x)\neq F(x^{\ast})$.

Definition 2.

A vector $x^{\ast}\in\mathbb{R}^{n}$ is called a weakly Pareto solution of (MCOP) if there exists no $x\in\mathbb{R}^{n}$ such that $F(x)\prec F(x^{\ast})$.

Definition 3.

A vector $x^{\ast}\in\mathbb{R}^{n}$ is called a Pareto critical point of (MCOP) if

\max_{i\in[m]}F_{i}^{\prime}(x^{*};d)\geq 0,~\forall d\in\mathbb{R}^{n}.

From Definitions 1 and 2, it is evident that Pareto solutions are always weakly Pareto solutions. The following lemma shows the relationships among the three concepts of Pareto optimality.

Lemma 1 (Theorem 3.1 of FD2009 ).

The following statements hold.

  • (i) If $x\in\mathbb{R}^{n}$ is a weakly Pareto solution of (MCOP), then $x$ is a Pareto critical point.

  • (ii) Let every component $F_{i}$ of $F$ be convex. If $x\in\mathbb{R}^{n}$ is a Pareto critical point of (MCOP), then $x$ is a weakly Pareto solution.

  • (iii) Let every component $F_{i}$ of $F$ be strictly convex. If $x\in\mathbb{R}^{n}$ is a Pareto critical point of (MCOP), then $x$ is a Pareto solution.

Definition 4.

A twice continuously differentiable function $h$ is $\mu$-strongly convex if

\mu I\preceq\nabla^{2}h(x)

holds for all $x\in\mathbb{R}^{n}$.
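In particular, Definition 4 yields the lower bound used repeatedly in the sequel (for instance in Lemma 3 and in the estimates of Section 4):

\left\langle d,\nabla^{2}h(x)d\right\rangle\geq\mu\left\lVert d\right\rVert^{2},\quad\text{i.e.,}\quad\left\lVert d\right\rVert^{2}_{\nabla^{2}h(x)}\geq\mu\left\lVert d\right\rVert^{2},\qquad\forall\,x,d\in\mathbb{R}^{n}.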

3 Newton-type proximal gradient method for MCOPs

In this section, we revisit the multiobjective proximal Newton-type method, which was originally introduced by Ansary A2023 .

3.1 Newton-type proximal gradient method

A multiobjective Newton-type proximal search direction corresponds to the unique optimal solution of the following subproblem:

\min_{d\in\mathbb{R}^{n}}\max_{i\in[m]}\left\{\langle\nabla f_{i}(x),d\rangle+g_{i}(x+d)-g_{i}(x)+\frac{1}{2}\left\langle d,\nabla^{2}f_{i}(x)d\right\rangle\right\}, \qquad (1)

namely,

d(x):=\arg\min_{d\in\mathbb{R}^{n}}\max_{i\in[m]}\left\{\langle\nabla f_{i}(x),d\rangle+g_{i}(x+d)-g_{i}(x)+\frac{1}{2}\left\langle d,\nabla^{2}f_{i}(x)d\right\rangle\right\}.

The optimal value of the subproblem is denoted by

\theta(x):=\min_{d\in\mathbb{R}^{n}}\max_{i\in[m]}\left\{\langle\nabla f_{i}(x),d\rangle+g_{i}(x+d)-g_{i}(x)+\frac{1}{2}\left\langle d,\nabla^{2}f_{i}(x)d\right\rangle\right\}.
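To make the subproblem concrete, the following is a minimal Python sketch for the smooth case $g_{i}\equiv 0$, in which (1) reduces to a min-max of convex quadratics and can be solved via its epigraph reformulation $\min_{(d,t)}\{t:\langle\nabla f_{i}(x),d\rangle+\frac{1}{2}\langle d,\nabla^{2}f_{i}(x)d\rangle\leq t,\ i\in[m]\}$. The helper name newton_proximal_direction and the use of SciPy's generic SLSQP solver are illustrative assumptions; they are not the solver used in A2023 .

import numpy as np
from scipy.optimize import minimize

def newton_proximal_direction(grads, hessians):
    """Sketch: solve the smooth (g_i = 0) direction subproblem (1),
    min_d max_i { <grad_i, d> + 0.5 <d, H_i d> },
    through its epigraph form min_{d,t} t s.t. q_i(d) <= t."""
    m = len(grads)
    n = grads[0].shape[0]

    def q(i, d):
        # quadratic model of the i-th objective at the current point
        return grads[i] @ d + 0.5 * d @ hessians[i] @ d

    # decision variable z = (d, t); the objective is the epigraph variable t
    obj = lambda z: z[-1]
    cons = [{'type': 'ineq', 'fun': (lambda z, i=i: z[-1] - q(i, z[:-1]))}
            for i in range(m)]

    res = minimize(obj, np.zeros(n + 1), constraints=cons, method='SLSQP')
    d, theta = res.x[:-1], res.x[-1]
    return d, theta  # search direction d(x) and optimal value theta(x)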

For simplicity, we denote

h_{\lambda}(x):=\sum_{i\in[m]}\lambda_{i}h_{i}(x),\qquad
\nabla h_{\lambda}(x):=\sum_{i\in[m]}\lambda_{i}\nabla h_{i}(x),\qquad
\nabla^{2}h_{\lambda}(x):=\sum_{i\in[m]}\lambda_{i}\nabla^{2}h_{i}(x).

By Sion's minimax theorem (S1958 ), there exists $\lambda^{k}\in\Delta_{m}$ such that

d^{k}=\arg\min_{d\in\mathbb{R}^{n}}\left\{\langle\nabla f_{\lambda^{k}}(x^{k}),d\rangle+g_{\lambda^{k}}(x^{k}+d)-g_{\lambda^{k}}(x^{k})+\frac{1}{2}\left\langle d,\nabla^{2}f_{\lambda^{k}}(x^{k})d\right\rangle\right\}.

Applying the first-order optimality condition, we get

-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})d^{k}\in\partial g_{\lambda^{k}}(x^{k}+d^{k}). \qquad (2)

The following lemma presents several properties of $d(x)$.

Lemma 2 (Lemma 3.2 and Theorem 3.1 of A2023 ).

Suppose $f_{i}$ is a strictly convex function for all $i\in[m]$. Let $d(x)$ be the optimal solution of (1). Then the following statements hold.

  • (i) $x\in\mathbb{R}^{n}$ is a Pareto critical point of (MCOP) if and only if $d(x)=0$.

  • (ii) If $\{x^{k}\}$ converges to $x^{*}$ and $\{d(x^{k})\}$ converges to $d^{*}$, then $d(x^{*})=d^{*}$.

Lemma 3 (Theorem 3.2 of A2023 ).

Suppose $f_{i}$ is a strongly convex function with modulus $\mu>0$ for all $i\in[m]$. Then

\theta(x)\leq-\frac{\mu}{2}\left\lVert d(x)\right\rVert^{2}.

The multiobjective Newton-type proximal gradient method with line search is described as follows:

Data: $x^{0}\in\mathbb{R}^{n}$, $\epsilon>0$, $\sigma,\gamma\in(0,1)$
for $k=0,1,\dots$ do
    Compute $d^{k}$ and $\theta^{k}$ by solving subproblem (1) with $x=x^{k}$
    if $\|d^{k}\|<\epsilon$ then
        return the approximate Pareto critical point $x^{k}$
    else
        Compute the stepsize $t_{k}\in(0,1]$ as the maximum of
        $T_{k}:=\{\gamma^{j}:j\in\mathbb{N},~F_{i}(x^{k}+\gamma^{j}d^{k})-F_{i}(x^{k})\leq\gamma^{j}\sigma\theta^{k},~i\in[m]\}$
        Update $x^{k+1}:=x^{k}+t_{k}d^{k}$
    end if
end for
Algorithm 1: Newton-type proximal gradient method for MCOPs A2023
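The loop below is a minimal Python sketch of Algorithm 1 for the smooth case $g_{i}\equiv 0$, assuming the newton_proximal_direction helper sketched after subproblem (1) is available. The names npgmo, grads_fn, hess_fn, and F_fn are our own illustrative choices and are not part of A2023 .

import numpy as np

def npgmo(x0, grads_fn, hess_fn, F_fn, sigma=1e-4, gamma=0.5, eps=1e-8, max_iter=100):
    # grads_fn(x) -> list of gradients, hess_fn(x) -> list of Hessians,
    # F_fn(x) -> array of objective values F_1(x), ..., F_m(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d, theta = newton_proximal_direction(grads_fn(x), hess_fn(x))
        if np.linalg.norm(d) < eps:
            break  # approximate Pareto critical point
        t, Fx = 1.0, F_fn(x)
        # Armijo-type backtracking over all objectives simultaneously
        while np.any(F_fn(x + t * d) - Fx > sigma * t * theta) and t > 1e-12:
            t *= gamma
        x = x + t * d
    return x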

4 Convergence analysis of NPGMO

As we know, the Newton-type proximal gradient method for SOPs exhibits the desirable convergence behavior of Newton-type methods for minimizing smooth functions; see LSS2014 . Naturally, this prompts the question: does NPGMO exhibit desirable convergence properties similar to those of NMO when applied to minimizing smooth functions?

4.1 Strong convergence

As evident from Algorithm 1, the method either terminates with a Pareto critical point in a finite number of iterations or generates an infinite sequence of points. In the forthcoming analysis, we assume that Algorithm 1 generates an infinite sequence of noncritical points. In A2023 , Ansary analyzed the global convergence of NPGMO.

Theorem 4.1 (Theorem 4.1 of A2023 ).

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$, its gradient is Lipschitz continuous with constant $L_{1}$ for all $i\in[m]$, and the level set $\mathcal{L}_{F}(x^{0}):=\{x:F(x)\preceq F(x^{0})\}$ is bounded. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then every accumulation point of $\{x^{k}\}$ is a Pareto critical point of (MCOP).

We can derive the strong convergence of NPGMO.

Theorem 4.2.

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$ for all $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then $\{x^{k}\}$ converges to some Pareto point $x^{*}$.

Proof.

Since $f_{i}$ is strongly convex and $g_{i}$ is convex for $i\in[m]$, this, together with the continuity of $f_{i}$ and the lower semicontinuity of $g_{i}$ for $i\in[m]$, implies that the level set $\mathcal{L}_{F}(x^{0})\subset\{x:F_{i}(x)\leq F_{i}(x^{0})\}$ is compact and that any Pareto critical point is a Pareto point. Moreover, $\{x^{k}\}\subset\mathcal{L}_{F}(x^{0})$ due to the fact that $\{F(x^{k})\}$ is decreasing. With the compactness of $\mathcal{L}_{F}(x^{0})$, it follows that $\{x^{k}\}$ has an accumulation point $x^{*}$ and $\lim_{k\rightarrow\infty}F(x^{k})=F(x^{*})$. By applying a similar argument as presented in the proof of (TFY2019 , Theorem 4.2), it can be concluded that $x^{*}$ is a Pareto critical point, and hence a Pareto point. Next, we proceed to prove the uniqueness of $x^{*}$. Suppose, on the contrary, that there exists another distinct accumulation point $x^{*}_{1}$. By the strong convexity of $F_{i}$ for $i\in[m]$, for any $s\in(0,1)$ the following inequality holds:

F(sx^{*}+(1-s)x^{*}_{1})\prec sF(x^{*})+(1-s)F(x^{*}_{1})=F(x^{*}),

where the equality follows from the convergence of $\{F(x^{k})\}$. However, this contradicts the fact that $x^{*}$ is a Pareto point. The uniqueness of the accumulation point of $\{x^{k}\}$ implies that $\{x^{k}\}$ converges to $x^{*}$.

4.2 Quadratic termination

In this section, we analyze the quadratic termination property of Algorithm 1. Before presenting the result, we first establish the following modified fundamental inequality.

Proposition 1 (modified fundamental inequality).

Suppose $f_{i}$ is strictly convex for all $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. If $t_{k}=1$, then there exists $\lambda^{k}\in\Delta_{m}$ such that

\begin{aligned}
F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x)
&\leq\frac{1}{2}\left\lVert x^{k}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle.
\end{aligned}\qquad (3)
Proof.

Since $f_{i}$ is twice continuously differentiable, Taylor's theorem with integral remainder gives

\begin{aligned}
F_{i}(x^{k+1})-F_{i}(x)
&=(f_{i}(x^{k+1})-f_{i}(x^{k}))-(f_{i}(x)-f_{i}(x^{k}))+g_{i}(x^{k+1})-g_{i}(x)\\
&=\left\langle\nabla f_{i}(x^{k}),x^{k+1}-x^{k}\right\rangle+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad+\left\langle\nabla f_{i}(x^{k}),x^{k}-x\right\rangle-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle+g_{i}(x^{k+1})-g_{i}(x)\\
&=\left\langle\nabla f_{i}(x^{k}),x^{k+1}-x\right\rangle+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle+g_{i}(x^{k+1})-g_{i}(x).
\end{aligned}

On the other hand, from (2), we have $-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})d^{k}\in\partial g_{\lambda^{k}}(x^{k}+d^{k})$ for some $\lambda^{k}\in\Delta_{m}$. This, together with the fact that $t_{k}=1$ (so that $d^{k}=x^{k+1}-x^{k}$), implies

\begin{aligned}
g_{\lambda^{k}}(x^{k+1})-g_{\lambda^{k}}(x)&\leq\left\langle-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})d^{k},x^{k+1}-x\right\rangle\\
&=\left\langle-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})(x^{k+1}-x^{k}),x^{k+1}-x\right\rangle.
\end{aligned}

Multiplying the first relation by $\lambda^{k}_{i}$, summing over $i\in[m]$, and combining with the last relation, we get

\begin{aligned}
F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x)
&\leq\left\langle\nabla^{2}f_{\lambda^{k}}(x^{k})(x^{k}-x^{k+1}),x^{k+1}-x\right\rangle+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle\\
&=\frac{1}{2}\left\lVert x^{k}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle,
\end{aligned}

where the equality comes from the three points lemma CT1993 . This completes the proof.
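For completeness, the three points lemma CT1993 is invoked here in its Euclidean form with a weighting matrix: for any $H\in\mathbb{S}^{n}_{+}$ and $a,b,c\in\mathbb{R}^{n}$,

\left\langle H(a-b),b-c\right\rangle=\frac{1}{2}\left\lVert a-c\right\rVert^{2}_{H}-\frac{1}{2}\left\lVert a-b\right\rVert^{2}_{H}-\frac{1}{2}\left\lVert b-c\right\rVert^{2}_{H},

which follows by expanding $\frac{1}{2}\lVert a-c\rVert^{2}_{H}=\frac{1}{2}\lVert(a-b)+(b-c)\rVert^{2}_{H}$; the equality above uses it with $a=x^{k}$, $b=x^{k+1}$, $c=x$, and $H=\nabla^{2}f_{\lambda^{k}}(x^{k})$.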

Theorem 4.3.

Suppose $f_{i}$ is a strongly convex quadratic function for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then $x^{k+1}=x^{*}$, i.e., Algorithm 1 finds an exact Pareto point of (MCOP) in one iteration.

Proof.

First, we aim to establish that $t_{k}=1$ for all $k$. Since $f_{i}$ is a strongly convex quadratic function for $i\in[m]$, there exists a positive definite matrix $A_{i}$ such that $\nabla^{2}f_{i}(x)=A_{i}$ for all $x\in\mathbb{R}^{n}$. This leads us to the conclusion that

\begin{aligned}
F_{i}(x^{k}+d^{k})-F_{i}(x^{k})&=\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},A_{i}d^{k}\right\rangle\\
&\leq\max_{i\in[m]}\left\{\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},A_{i}d^{k}\right\rangle\right\}\\
&=\theta^{k}\\
&\leq\sigma\theta^{k},
\end{aligned}

where the last inequality holds because $\theta^{k}\leq 0$ and $\sigma\in(0,1)$. Thus, $t_{k}=1$ for all $k$. For all $x\in\mathbb{R}^{n}$, we use relation (3) to get

\begin{aligned}
F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x)&\leq\frac{1}{2}\left\lVert x^{k}-x\right\rVert^{2}_{A_{\lambda^{k}}}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{A_{\lambda^{k}}}-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{A_{\lambda^{k}}}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}A_{\lambda^{k}}t(x^{k+1}-x^{k})\,dt\right\rangle-\left\langle x-x^{k},\int_{0}^{1}A_{\lambda^{k}}t(x-x^{k})\,dt\right\rangle\\
&=-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{A_{\lambda^{k}}},
\end{aligned}

where $A_{\lambda^{k}}:=\sum_{i\in[m]}\lambda^{k}_{i}A_{i}$. Applying Theorem 4.2, we can assert the existence of $x^{*}$ such that $F(x^{*})\preceq F(x^{k})$ for all $k$. Substituting $x=x^{*}$ into the above inequality, we have

F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x^{*})\leq-\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{A_{\lambda^{k}}}.

Combining this with the fact that $F(x^{*})\preceq F(x^{k+1})$ leads to the following inequality:

\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{A_{\lambda^{k}}}\leq 0.

Considering that $A_{i}$ is a positive definite matrix for $i\in[m]$, it follows that $x^{k+1}=x^{*}$. This completes the proof.

By setting $g_{i}(x)\equiv 0$ for $i\in[m]$, the quadratic termination property also applies to the NMO as presented in FD2009 .

Corollary 1.

Suppose $F_{i}$ is a strongly convex quadratic function for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by the NMO. Then $x^{k+1}=x^{*}$, i.e., the NMO finds an exact Pareto point in one iteration.
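As a simple single-objective illustration of Corollary 1 (the matrix $A$ and vector $b$ are introduced here only for the example), take $m=1$ and $F_{1}(x)=\frac{1}{2}\langle x,Ax\rangle-\langle b,x\rangle$ with $A\in\mathbb{S}^{n}_{++}$. Subproblem (1) then reads $\min_{d}\{\langle Ax^{0}-b,d\rangle+\frac{1}{2}\langle d,Ad\rangle\}$, whose solution is the Newton step, and the full step $t_{0}=1$ is accepted since $F_{1}(x^{0}+d^{0})-F_{1}(x^{0})=\theta^{0}\leq\sigma\theta^{0}$. One iteration lands exactly at the unique minimizer:

d^{0}=-A^{-1}(Ax^{0}-b),\qquad x^{1}=x^{0}+d^{0}=A^{-1}b=x^{*}.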

4.3 Local superlinear convergence

Next, we give sufficient conditions for local superlinear convergence.

Theorem 4.4.

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$ and its Hessian is continuous for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then, for any $0<\epsilon\leq(1-\sigma)\mu$, there exists $K_{\epsilon}>0$ such that

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\sqrt{\frac{\epsilon(1+\tau_{k}^{2})}{\mu}}\left\lVert x^{k}-x^{*}\right\rVert

holds for all $k\geq K_{\epsilon}$, where $\tau_{k}:=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]$. Furthermore, the sequence $\{x^{k}\}$ converges superlinearly to $x^{*}$.

Proof.

Referring to the arguments presented in the proof of Theorem 4.2, we conclude that $\{x^{k}\}$ converges to a certain Pareto point $x^{*}$ and $\{d^{k}\}$ converges to $0$. This implies that for any $r>0$, there exists $K_{r}>0$ such that $x^{k},x^{k}+d^{k}\in B[x^{*},r]$ for all $k\geq K_{r}$. Given that $\nabla^{2}f_{i}$ is continuous, it follows that $\nabla^{2}f_{i}$ is uniformly continuous on the compact set $B[x^{*},r]$. Hence, for any $0<\epsilon\leq(1-\sigma)\mu$, there exists $K^{1}_{\epsilon}\geq K_{r}$ such that, for all $k\geq K^{1}_{\epsilon}$,

\begin{aligned}
F_{i}(x^{k}+d^{k})-F_{i}(x^{k})&\leq\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},\nabla^{2}f_{i}(x^{k})d^{k}\right\rangle+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&\leq\max_{i\in[m]}\left\{\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},\nabla^{2}f_{i}(x^{k})d^{k}\right\rangle\right\}+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&=\theta^{k}+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&=\sigma\theta^{k}+(1-\sigma)\theta^{k}+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&\leq\sigma\theta^{k},
\end{aligned}

where the last inequality is due to the facts that $\theta^{k}\leq-\frac{\mu}{2}\left\lVert d^{k}\right\rVert^{2}$ (Lemma 3) and $\epsilon\leq(1-\sigma)\mu$. Consequently, we can deduce that $t_{k}=1$ for all $k\geq K^{1}_{\epsilon}$. Substituting $x=x^{*}$ into (3), we obtain

\begin{aligned}
0&\leq F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x^{*})\\
&\leq\frac{1}{2}\left\lVert x^{k}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))\,ds\,(t(x^{*}-x^{k}))\,dt\right\rangle,
\end{aligned}\qquad (4)

where the first inequality comes from the fact that $F(x^{*})\preceq F(x^{k})$ for all $k$. On the other hand, $\frac{1}{2}\left\lVert x-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}$ can be equivalently expressed as

\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k})\,ds\,(t(x-x^{k}))\,dt\right\rangle.

By substituting the above relation into (4) with $x=x^{*}$ and $x=x^{k+1}$, respectively, we have

\begin{aligned}
\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}
&\leq\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle.
\end{aligned}\qquad (5)

Since the sequence $\{x^{k}\}$ converges to the Pareto solution $x^{*}$, there exists $K_{\epsilon}\geq K^{1}_{\epsilon}$ such that, for all $k\geq K_{\epsilon}$,

\left\lVert\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right\rVert\leq\epsilon,~\forall s,t\in[0,1],

and

\left\lVert\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right\rVert\leq\epsilon,~\forall s,t\in[0,1].

Substituting these two bounds into (5), we obtain

\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\leq\frac{\epsilon}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\frac{\epsilon}{2}\left\lVert x^{*}-x^{k}\right\rVert^{2}.

This, together with the $\mu$-strong convexity of $f_{i}$, implies

\mu\left\lVert x^{k+1}-x^{*}\right\rVert^{2}\leq\epsilon\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\epsilon\left\lVert x^{*}-x^{k}\right\rVert^{2}.\qquad (6)

Through direct calculation, we have

\begin{aligned}
\epsilon\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\epsilon\left\lVert x^{*}-x^{k}\right\rVert^{2}
&\geq\mu\left\lVert x^{k+1}-x^{*}\right\rVert^{2}\\
&=\mu\left\lVert x^{k+1}-x^{k}+x^{k}-x^{*}\right\rVert^{2}\\
&\geq\mu\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\mu\left\lVert x^{k}-x^{*}\right\rVert^{2}-2\mu\left\lVert x^{k+1}-x^{k}\right\rVert\left\lVert x^{k}-x^{*}\right\rVert.
\end{aligned}

Rearranging and dividing by $\left\lVert x^{k}-x^{*}\right\rVert^{2}$, we have

(\mu-\epsilon)\tau_{k}^{2}-2\mu\tau_{k}+\mu-\epsilon\leq 0,

where $\tau_{k}:=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}$. Since $\epsilon\leq(1-\sigma)\mu<\mu$, the roots of this quadratic in $\tau_{k}$ are $\frac{\mu\pm\sqrt{\mu^{2}-(\mu-\epsilon)^{2}}}{\mu-\epsilon}=\frac{\mu\pm\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}$, and hence $\tau_{k}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]$. Substituting $\tau_{k}$ into relation (6), we derive that

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\sqrt{\frac{\epsilon(1+\tau_{k}^{2})}{\mu}}\left\lVert x^{k}-x^{*}\right\rVert.

Furthermore, since the above estimates hold for every $0<\epsilon\leq(1-\sigma)\mu$ once $k$ is sufficiently large, letting $\epsilon$ tend to $0$ gives

\lim_{k\rightarrow\infty}\tau_{k}\in\lim_{\epsilon\rightarrow 0}\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]=\{1\}.

We use this relation, together with the bound above, to get

\lim_{k\rightarrow\infty}\frac{\left\lVert x^{k+1}-x^{*}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}=0.

This concludes the proof.

4.4 Quadratic convergence

The additional assumption of Lipschitz continuity of the Hessian $\nabla^{2}f_{i}$ for $i\in[m]$ guarantees a quadratic convergence rate of the NPGMO, as we will now demonstrate.

Theorem 4.5.

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$ and its Hessian is Lipschitz continuous with constant $L_{2}$ for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then, for any $0<\epsilon\leq(1-\sigma)\mu$, there exists $K_{\epsilon}>0$ such that

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\frac{(2\tau_{k}+1)L_{2}}{3\mu-\tau_{k}L_{2}\left\lVert x^{k}-x^{*}\right\rVert}\left\lVert x^{k}-x^{*}\right\rVert^{2}

holds for all $k\geq K_{\epsilon}$, where $\tau_{k}:=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]$. Furthermore, the sequence $\{x^{k}\}$ converges quadratically to $x^{*}$.

Proof.

Drawing from the arguments presented in the proof of Theorem 4.4, we can establish that for any $0<\epsilon\leq(1-\sigma)\mu$, there exists a threshold $K_{\epsilon}>0$ such that, for all $k\geq K_{\epsilon}$,

t_{k}=1

and

\tau_{k}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right],

where $\tau_{k}=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}$. Utilizing relation (5), we deduce that

\begin{aligned}
\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}
&\leq\left\langle x^{k+1}-x^{*}+x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{k+1}-x^{*}+x^{*}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle\\
&=\left\langle x^{k+1}-x^{*},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{k+1}-x^{*}))\,dt\right\rangle\\
&\quad+2\left\langle x^{k+1}-x^{*},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle\\
&\quad+\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle\\
&\leq\int_{0}^{1}\int_{0}^{1}L_{2}st^{2}\left\lVert x^{k+1}-x^{k}\right\rVert ds\,dt\,\left\lVert x^{k+1}-x^{*}\right\rVert^{2}+2\int_{0}^{1}\int_{0}^{1}L_{2}st^{2}\left\lVert x^{k+1}-x^{k}\right\rVert ds\,dt\,\left\lVert x^{k+1}-x^{*}\right\rVert\left\lVert x^{k}-x^{*}\right\rVert\\
&\quad+\int_{0}^{1}\int_{0}^{1}L_{2}st^{2}\left\lVert x^{k+1}-x^{*}\right\rVert ds\,dt\,\left\lVert x^{k}-x^{*}\right\rVert^{2}\\
&=\frac{L_{2}}{6}\left\lVert x^{k+1}-x^{*}\right\rVert\left(\left\lVert x^{k+1}-x^{k}\right\rVert\left\lVert x^{k+1}-x^{*}\right\rVert+2\left\lVert x^{k+1}-x^{k}\right\rVert\left\lVert x^{k}-x^{*}\right\rVert+\left\lVert x^{k}-x^{*}\right\rVert^{2}\right)\\
&=\frac{L_{2}}{6}\left\lVert x^{k+1}-x^{*}\right\rVert\left(\tau_{k}\left\lVert x^{k}-x^{*}\right\rVert\left\lVert x^{k+1}-x^{*}\right\rVert+(2\tau_{k}+1)\left\lVert x^{k}-x^{*}\right\rVert^{2}\right),
\end{aligned}

where the second inequality can be attributed to the Lipschitz continuity of $\nabla^{2}f_{i}$ for $i\in[m]$, while the final equality originates from the definition of $\tau_{k}$. By reordering terms and leveraging the $\mu$-strong convexity of $f_{i}$, we derive

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\frac{(2\tau_{k}+1)L_{2}}{3\mu-\tau_{k}L_{2}\left\lVert x^{k}-x^{*}\right\rVert}\left\lVert x^{k}-x^{*}\right\rVert^{2}.

This, together with the convergence of $\{x^{k}\}$ to $x^{*}$ and the limit $\lim_{k\rightarrow\infty}\tau_{k}=1$, leads to

\limsup_{k\rightarrow\infty}\frac{\left\lVert x^{k+1}-x^{*}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert^{2}}\leq\frac{L_{2}}{\mu}.

This concludes the proof.

5 Conclusions

In this paper, we have demonstrated the appealing convergence properties of NPGMO, including quadratic termination, locally superlinear convergence, and locally quadratic convergence. These results were established within a unified framework, which can potentially serve as a template for analyzing second-order methods for MOPs.

References

  • [1] M. A. T. Ansary. A Newton-type proximal gradient method for nonlinear multi-objective optimization problems. Optimization Methods and Software, 38(3):570–590, 2023.
  • [2] M. A. T. Ansary and G. Panda. A globally convergent SQCQP method for multiobjective optimization problems. SIAM Journal on Optimization, 31(1):91–113, 2021.
  • [3] H. Bonnel, A. N. Iusem, and B. F. Svaiter. Proximal methods in vector optimization. SIAM Journal on Optimization, 15(4):953–970, 2005.
  • [4] G. A. Carrizo, P. A. Lotito, and M. C. Maciel. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Mathematical Programming, 159(1):339–369, 2016.
  • [5] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
  • [6] J. Chen, L. P. Tang, and X. M. Yang. A Barzilai-Borwein descent method for multiobjective optimization problems. European Journal of Operational Research, 311(1):196–209, 2023.
  • [7] J. Fliege, L. M. Graña Drummond, and B. F. Svaiter. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626, 2009.
  • [8] J. Fliege and B. F. Svaiter. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494, 2000.
  • [9] J. Fliege and A. I. F. Vaz. A method for constrained multiobjective optimization based on SQP techniques. SIAM Journal on Optimization, 26(4):2091–2119, 2016.
  • [10] L. M. Graña Drummond and A. N. Iusem. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1):5–29, 2004.
  • [11] J. D. Lee, Y. Sun, and M. A. Saunders. Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24(3):1420–1443, 2014.
  • [12] L. R. Lucambio Pérez and L. F. Prudente. Nonlinear conjugate gradient methods for vector optimization. SIAM Journal on Optimization, 28(3):2690–2720, 2018.
  • [13] Q. Mercier, F. Poirion, and J. A. Désidéri. A stochastic multiple gradient descent algorithm. European Journal of Operational Research, 271(3):808–817, 2018.
  • [14] V. Morovati and L. Pourkarimi. Extension of Zoutendijk method for solving constrained multiobjective optimization problems. European Journal of Operational Research, 273(1):44–57, 2019.
  • [15] H. Mukai. Algorithms for multicriterion optimization. IEEE Transactions on Automatic Control, 25(2):177–186, 1980.
  • [16] Ž. Povalej. Quasi-Newton’s method for multiobjective optimization. Journal of Computational and Applied Mathematics, 255:765–777, 2014.
  • [17] S. Qu, M. Goh, and F. T. Chan. Quasi-Newton methods for solving multiobjective optimization. Operations Research Letters, 39(5):397–399, 2011.
  • [18] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
  • [19] H. Tanabe, E. H. Fukuda, and N. Yamashita. Proximal gradient methods for multiobjective optimization and their applications. Computational Optimization and Applications, 72:339–361, 2019.
  • [20] H. Tanabe, E. H. Fukuda, and N. Yamashita. Convergence rates analysis of a multiobjective proximal gradient method. Optimization Letters, 17:333–350, 2023.
Acknowledgements.
This work was funded by the Major Program of the National Natural Science Foundation of China [grant numbers 11991020, 11991024]; the National Natural Science Foundation of China [grant numbers 11971084, 12171060]; NSFC-RGC (Hong Kong) Joint Research Program [grant number 12261160365]; the Team Project of Innovation Leading Talent in Chongqing [grant number CQYC20210309536]; the Natural Science Foundation of Chongqing [grant number ncamc2022-msxm01]; and Foundation of Chongqing Normal University [grant numbers 22XLB005, 22XLB006].