
J. Chen: National Center for Applied Mathematics in Chongqing, Chongqing Normal University, Chongqing 401331, China, and School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China. [email protected]
X.X. Jiang: College of Mathematics, Sichuan University, Chengdu 610065, China. [email protected]
L.P. Tang: National Center for Applied Mathematics in Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China. [email protected]
🖂 X.M. Yang (corresponding author): National Center for Applied Mathematics in Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China. [email protected]

On the Convergence of Newton-type Proximal Gradient Method for Multiobjective Optimization Problems

Jian Chen    Xiaoxue Jiang    Liping Tang    Xinmin Yang
(Received: date / Accepted: date)
Abstract

In a recent study, Ansary (Optim Methods Softw 38(3):570-590,2023) proposed a Newton-type proximal gradient method for nonlinear multiobjective optimization problems (NPGMO). However, the favorable convergence properties typically associated with Newton-type methods were not established for NPGMO in Ansary’s work. In response to this gap, we develop a straightforward framework for analyzing the convergence behavior of the NPGMO. Specifically, under the assumption of strong convexity, we demonstrate that the NPGMO enjoys quadratic termination, superlinear convergence, and quadratic convergence for problems that are quadratic, twice continuously differentiable and twice Lipschitz continuously differentiable, respectively.

Keywords:
Multiobjective optimization · Newton-type method · Proximal gradient method · Convergence
MSC:
90C29 · 90C30

1 Introduction

The multiobjective composite optimization problem (MCOP) can be formulated as follows:

\min_{x\in\mathbb{R}^{n}}F(x), \qquad\qquad (\mathrm{MCOP})

where $F:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ is a vector-valued function. Each component $F_{i}$, $i=1,2,\dots,m$, is defined by

F_{i}:=f_{i}+g_{i},

where $f_{i}$ is continuously differentiable and $g_{i}$ is proper, convex, and lower semicontinuous but not necessarily differentiable.

Over the past two decades, descent methods have garnered increasing attention within the multiobjective optimization community (see, e.g., AP2021 ; BI2005 ; CL2016 ; CTY2023 ; FS2000 ; FD2009 ; FV2016 ; GI2004 ; LP2018 ; MP2018 ; MP2019 ; P2014 ; QG2011 ; TFY2019 and references therein). These methods generate descent directions by solving subproblems, eliminating the need for predefined parameters. As far as we know, the study of multiobjective gradient descent methods can be traced back to the pioneering efforts of Mukai M1980 as well as the seminal work of Fliege and Svaiter FS2000 .

Recently, Tanabe et al. TFY2019 proposed a proximal gradient method for MCOPs (PGMO), and their subsequent convergence rate analysis TFY2023 revealed that the PGMO exhibits slow convergence when dealing with ill-conditioned problems. Notably, most multiobjective first-order methods are sensitive to problem conditioning. In response to this challenge, Fliege et al. FD2009 leveraged a quadratic model to better capture the problem's geometry and proposed Newton's method for MOPs (NMO). Parallel to its single-objective counterpart, the NMO demonstrates locally superlinear and quadratic convergence under standard assumptions. Building upon this foundation, Ansary recently adopted the idea of Fliege et al. FD2009 and extended it to derive the Newton-type proximal gradient method for MOPs (NPGMO) A2023 . However, the appealing convergence properties associated with Newton's method were not established in Ansary's work. It is worth mentioning that the Newton-type proximal gradient method for SOPs LSS2014 exhibits convergence akin to that of Newton's method. This naturally raises the question: does NPGMO enjoy the same desirable convergence properties as NMO?

The primary objective of this paper is to provide an affirmative answer to the question. The contributions of this study can be summarized as follows:

(i) We demonstrate that the NPGMO finds an exact Pareto solution in one iteration for strongly convex quadratic problems.

(ii) Under the assumption that $f_{i}$ is both strongly convex and twice continuously differentiable for $i=1,2,\dots,m$, we prove that the whole sequence generated by NPGMO converges strongly to a Pareto point, and that the local convergence rate is superlinear. Additionally, by further assuming that each $f_{i}$ is twice Lipschitz continuously differentiable, the local convergence rate of NPGMO improves to quadratic.

(iii) We derive a simplified analysis framework to elucidate the appealing convergence properties of NPGMO, which can be linked back to the properties of NMO. The proofs are grounded in a modified fundamental inequality, a common tool employed in the convergence analysis of first-order methods.

The paper is organized as follows. In Section 2, we present some necessary notations and definitions that will be used later. Section 3 revisits the description and highlights certain properties of NPGMO. The convergence analysis of NPGMO is detailed in Section 4. Lastly, we draw our conclusions in the final section of the paper.

2 Preliminaries

Throughout this paper, the $n$-dimensional Euclidean space $\mathbb{R}^{n}$ is equipped with the inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. Denote by $\mathbb{S}^{n}_{++}$ ($\mathbb{S}^{n}_{+}$) the set of symmetric positive (semi)definite matrices in $\mathbb{R}^{n\times n}$. For a positive definite matrix $H$, the notation $\|x\|_{H}=\sqrt{\langle x,Hx\rangle}$ is used to represent the norm induced by $H$ on a vector $x$. Additionally, we define

F^{\prime}_{i}(x;d):=\lim_{t\downarrow 0}\frac{F_{i}(x+td)-F_{i}(x)}{t},

the directional derivative of $F_{i}$ at $x$ in the direction $d$.

For simplicity, we utilize the notation $[m]:=\{1,2,\dots,m\}$, and define

\Delta_{m}:=\left\{\lambda:\sum_{i\in[m]}\lambda_{i}=1,\ \lambda_{i}\geq 0,\ i\in[m]\right\}

to represent the $m$-dimensional unit simplex. To prevent any ambiguity, we establish the order $\preceq$ ($\prec$) in $\mathbb{R}^{m}$ as follows:

u\preceq(\prec)v~\Leftrightarrow~v-u\in\mathbb{R}^{m}_{+}(\mathbb{R}^{m}_{++}),

and in $\mathbb{S}^{n}$ as:

U\preceq(\prec)V~\Leftrightarrow~V-U\in\mathbb{S}^{n}_{+}(\mathbb{S}^{n}_{++}).

In the following, we introduce the concepts of optimality for (MCOP) in the Pareto sense.

Definition 1.

A vector $x^{\ast}\in\mathbb{R}^{n}$ is called a Pareto solution of (MCOP) if there exists no $x\in\mathbb{R}^{n}$ such that $F(x)\preceq F(x^{\ast})$ and $F(x)\neq F(x^{\ast})$.

Definition 2.

A vector $x^{\ast}\in\mathbb{R}^{n}$ is called a weakly Pareto solution of (MCOP) if there exists no $x\in\mathbb{R}^{n}$ such that $F(x)\prec F(x^{\ast})$.

Definition 3.

A vector $x^{\ast}\in\mathbb{R}^{n}$ is called a Pareto critical point of (MCOP) if

\max_{i\in[m]}F_{i}^{\prime}(x^{*};d)\geq 0,~\forall d\in\mathbb{R}^{n}.

From Definitions 1 and 2, it is evident that Pareto solutions are always weakly Pareto solutions. The following lemma shows the relationships among the three concepts of Pareto optimality.

Lemma 1 (Theorem 3.1 of FD2009 ).

The following statements hold.

  • (i) If $x\in\mathbb{R}^{n}$ is a weakly Pareto solution of (MCOP), then $x$ is a Pareto critical point.

  • (ii) Let every component $F_{i}$ of $F$ be convex. If $x\in\mathbb{R}^{n}$ is a Pareto critical point of (MCOP), then $x$ is a weakly Pareto solution.

  • (iii) Let every component $F_{i}$ of $F$ be strictly convex. If $x\in\mathbb{R}^{n}$ is a Pareto critical point of (MCOP), then $x$ is a Pareto solution.

Definition 4.

A twice continuously differentiable function $h$ is $\mu$-strongly convex if

\mu I\preceq\nabla^{2}h(x)

holds for all $x\in\mathbb{R}^{n}$.
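In particular, Definition 4 yields the lower bound used repeatedly in the sequel (for instance in Lemma 3 and in the estimates of Section 4):

\left\langle d,\nabla^{2}h(x)d\right\rangle\geq\mu\left\lVert d\right\rVert^{2},\quad\text{i.e.,}\quad\left\lVert d\right\rVert^{2}_{\nabla^{2}h(x)}\geq\mu\left\lVert d\right\rVert^{2},\qquad\forall\,x,d\in\mathbb{R}^{n}.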

3 Newton-type proximal gradient method for MCOPs

In this section, we revisit the multiobjective proximal Newton-type method, which was originally introduced by Ansary A2023 .

3.1 Newton-type proximal gradient method

A multiobjective Newton-type proximal search direction corresponds to the unique optimal solution of the following subproblem:

\min_{d\in\mathbb{R}^{n}}\max_{i\in[m]}\left\{\langle\nabla f_{i}(x),d\rangle+g_{i}(x+d)-g_{i}(x)+\frac{1}{2}\left\langle d,\nabla^{2}f_{i}(x)d\right\rangle\right\}, \qquad (1)

namely,

d(x):=\arg\min_{d\in\mathbb{R}^{n}}\max_{i\in[m]}\left\{\langle\nabla f_{i}(x),d\rangle+g_{i}(x+d)-g_{i}(x)+\frac{1}{2}\left\langle d,\nabla^{2}f_{i}(x)d\right\rangle\right\}.

The optimal value of the subproblem is denoted by

\theta(x):=\min_{d\in\mathbb{R}^{n}}\max_{i\in[m]}\left\{\langle\nabla f_{i}(x),d\rangle+g_{i}(x+d)-g_{i}(x)+\frac{1}{2}\left\langle d,\nabla^{2}f_{i}(x)d\right\rangle\right\}.
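To make the subproblem concrete, the following is a minimal Python sketch for the smooth case $g_{i}\equiv 0$, in which (1) reduces to a min-max of convex quadratics and can be solved via its epigraph reformulation $\min_{(d,t)}\{t:\langle\nabla f_{i}(x),d\rangle+\frac{1}{2}\langle d,\nabla^{2}f_{i}(x)d\rangle\leq t,\ i\in[m]\}$. The helper name newton_proximal_direction and the use of SciPy's generic SLSQP solver are illustrative assumptions; they are not the solver used in A2023 .

import numpy as np
from scipy.optimize import minimize

def newton_proximal_direction(grads, hessians):
    """Sketch: solve the smooth (g_i = 0) direction subproblem (1),
    min_d max_i { <grad_i, d> + 0.5 <d, H_i d> },
    through its epigraph form min_{d,t} t s.t. q_i(d) <= t."""
    m = len(grads)
    n = grads[0].shape[0]

    def q(i, d):
        # quadratic model of the i-th objective at the current point
        return grads[i] @ d + 0.5 * d @ hessians[i] @ d

    # decision variable z = (d, t); the objective is the epigraph variable t
    obj = lambda z: z[-1]
    cons = [{'type': 'ineq', 'fun': (lambda z, i=i: z[-1] - q(i, z[:-1]))}
            for i in range(m)]

    res = minimize(obj, np.zeros(n + 1), constraints=cons, method='SLSQP')
    d, theta = res.x[:-1], res.x[-1]
    return d, theta  # search direction d(x) and optimal value theta(x)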

For simplicity, we denote

h_{\lambda}(x):=\sum_{i\in[m]}\lambda_{i}h_{i}(x),\qquad
\nabla h_{\lambda}(x):=\sum_{i\in[m]}\lambda_{i}\nabla h_{i}(x),\qquad
\nabla^{2}h_{\lambda}(x):=\sum_{i\in[m]}\lambda_{i}\nabla^{2}h_{i}(x).

By Sion's minimax theorem (S1958 ), there exists $\lambda^{k}\in\Delta_{m}$ such that

d^{k}=\arg\min_{d\in\mathbb{R}^{n}}\left\{\langle\nabla f_{\lambda^{k}}(x^{k}),d\rangle+g_{\lambda^{k}}(x^{k}+d)-g_{\lambda^{k}}(x^{k})+\frac{1}{2}\left\langle d,\nabla^{2}f_{\lambda^{k}}(x^{k})d\right\rangle\right\}.

Applying the first-order optimality condition, we get

-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})d^{k}\in\partial g_{\lambda^{k}}(x^{k}+d^{k}). \qquad (2)

The following lemma presents several properties of $d(x)$.

Lemma 2 (Lemma 3.2 and Theorem 3.1 of A2023 ).

Suppose $f_{i}$ is a strictly convex function for all $i\in[m]$. Let $d(x)$ be the optimal solution of (1). Then the following statements hold.

  • (i) $x\in\mathbb{R}^{n}$ is a Pareto critical point of (MCOP) if and only if $d(x)=0$.

  • (ii) If $\{x^{k}\}$ converges to $x^{*}$ and $\{d(x^{k})\}$ converges to $d^{*}$, then $d(x^{*})=d^{*}$.

Lemma 3 (Theorem 3.2 of A2023 ).

Suppose $f_{i}$ is a strongly convex function with modulus $\mu>0$ for all $i\in[m]$. Then

\theta(x)\leq-\frac{\mu}{2}\left\lVert d(x)\right\rVert^{2}.

The multiobjective Newton-type proximal gradient method with line search is described as follows:

Data: $x^{0}\in\mathbb{R}^{n}$, $\epsilon>0$, $\sigma,\gamma\in(0,1)$
for $k=0,1,\dots$ do
    Compute $d^{k}$ and $\theta^{k}$ by solving subproblem (1) with $x=x^{k}$
    if $\|d^{k}\|<\epsilon$ then
        return the approximate Pareto critical point $x^{k}$
    else
        Compute the stepsize $t_{k}\in(0,1]$ as the maximum of
        $T_{k}:=\{\gamma^{j}:j\in\mathbb{N},~F_{i}(x^{k}+\gamma^{j}d^{k})-F_{i}(x^{k})\leq\gamma^{j}\sigma\theta^{k},~i\in[m]\}$
        Update $x^{k+1}:=x^{k}+t_{k}d^{k}$
    end if
end for
Algorithm 1: Newton-type proximal gradient method for MCOPs A2023
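The loop below is a minimal Python sketch of Algorithm 1 for the smooth case $g_{i}\equiv 0$, assuming the newton_proximal_direction helper sketched after subproblem (1) is available. The names npgmo, grads_fn, hess_fn, and F_fn are our own illustrative choices and are not part of A2023 .

import numpy as np

def npgmo(x0, grads_fn, hess_fn, F_fn, sigma=1e-4, gamma=0.5, eps=1e-8, max_iter=100):
    # grads_fn(x) -> list of gradients, hess_fn(x) -> list of Hessians,
    # F_fn(x) -> array of objective values F_1(x), ..., F_m(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d, theta = newton_proximal_direction(grads_fn(x), hess_fn(x))
        if np.linalg.norm(d) < eps:
            break  # approximate Pareto critical point
        t, Fx = 1.0, F_fn(x)
        # Armijo-type backtracking over all objectives simultaneously
        while np.any(F_fn(x + t * d) - Fx > sigma * t * theta) and t > 1e-12:
            t *= gamma
        x = x + t * d
    return x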

4 Convergence analysis of NPGMO

As we know, the Newton-type proximal gradient method for SOPs exhibits the desirable convergence behavior of Newton-type methods for minimizing smooth functions; see LSS2014 . Naturally, this prompts the question: does NPGMO exhibit desirable convergence properties similar to those of NMO when applied to minimizing smooth functions?

4.1 Strong convergence

As evident from Algorithm 1, the method either terminates with a Pareto critical point in a finite number of iterations or generates an infinite sequence of points. In the forthcoming analysis, we assume that Algorithm 1 generates an infinite sequence of noncritical points. In A2023 , Ansary analyzed the global convergence of NPGMO.

Theorem 4.1 (Theorem 4.1 of A2023 ).

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$, its gradient is Lipschitz continuous with constant $L_{1}$ for all $i\in[m]$, and the level set $\mathcal{L}_{F}(x^{0}):=\{x:F(x)\preceq F(x^{0})\}$ is bounded. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then every accumulation point of $\{x^{k}\}$ is a Pareto critical point of (MCOP).

We can derive the strong convergence of NPGMO.

Theorem 4.2.

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$ for all $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then $\{x^{k}\}$ converges to some Pareto point $x^{*}$.

Proof.

Since $f_{i}$ is strongly convex and $g_{i}$ is convex for $i\in[m]$, this, together with the continuity of $f_{i}$ and the lower semicontinuity of $g_{i}$ for $i\in[m]$, implies that the level set $\mathcal{L}_{F}(x^{0})\subset\{x:F_{i}(x)\leq F_{i}(x^{0})\}$ is compact and that any Pareto critical point is a Pareto point. Moreover, $\{x^{k}\}\subset\mathcal{L}_{F}(x^{0})$ due to the fact that $\{F(x^{k})\}$ is decreasing. With the compactness of $\mathcal{L}_{F}(x^{0})$, it follows that $\{x^{k}\}$ has an accumulation point $x^{*}$ and $\lim_{k\rightarrow\infty}F(x^{k})=F(x^{*})$. By applying a similar argument as presented in the proof of (TFY2019 , Theorem 4.2), it can be concluded that $x^{*}$ is a Pareto critical point, and hence a Pareto point. Next, we proceed to prove the uniqueness of $x^{*}$. Suppose, on the contrary, that there exists another distinct accumulation point $x^{*}_{1}$. By the strong convexity of $F_{i}$ for $i\in[m]$, for any $s\in(0,1)$ the following inequality holds:

F(sx^{*}+(1-s)x^{*}_{1})\prec sF(x^{*})+(1-s)F(x^{*}_{1})=F(x^{*}),

where the equality follows from the convergence of $\{F(x^{k})\}$. However, this contradicts the fact that $x^{*}$ is a Pareto point. The uniqueness of the accumulation point of $\{x^{k}\}$ implies that $\{x^{k}\}$ converges to $x^{*}$.

4.2 Quadratic termination

In this section, we analyze the quadratic termination property of Algorithm 1. Before presenting the result, we first establish the following modified fundamental inequality.

Proposition 1 (modified fundamental inequality).

Suppose $f_{i}$ is strictly convex for all $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. If $t_{k}=1$, then there exists $\lambda^{k}\in\Delta_{m}$ such that

\begin{aligned}
F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x)
&\leq\frac{1}{2}\left\lVert x^{k}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle.
\end{aligned}\qquad (3)
Proof.

Since $f_{i}$ is twice continuously differentiable, Taylor's theorem with integral remainder gives

\begin{aligned}
F_{i}(x^{k+1})-F_{i}(x)
&=(f_{i}(x^{k+1})-f_{i}(x^{k}))-(f_{i}(x)-f_{i}(x^{k}))+g_{i}(x^{k+1})-g_{i}(x)\\
&=\left\langle\nabla f_{i}(x^{k}),x^{k+1}-x^{k}\right\rangle+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad+\left\langle\nabla f_{i}(x^{k}),x^{k}-x\right\rangle-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle+g_{i}(x^{k+1})-g_{i}(x)\\
&=\left\langle\nabla f_{i}(x^{k}),x^{k+1}-x\right\rangle+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{i}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle+g_{i}(x^{k+1})-g_{i}(x).
\end{aligned}

On the other hand, from (2), we have $-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})d^{k}\in\partial g_{\lambda^{k}}(x^{k}+d^{k})$ for some $\lambda^{k}\in\Delta_{m}$. This, together with the fact that $t_{k}=1$ (so that $d^{k}=x^{k+1}-x^{k}$), implies

\begin{aligned}
g_{\lambda^{k}}(x^{k+1})-g_{\lambda^{k}}(x)&\leq\left\langle-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})d^{k},x^{k+1}-x\right\rangle\\
&=\left\langle-\nabla f_{\lambda^{k}}(x^{k})-\nabla^{2}f_{\lambda^{k}}(x^{k})(x^{k+1}-x^{k}),x^{k+1}-x\right\rangle.
\end{aligned}

Multiplying the first relation by $\lambda^{k}_{i}$, summing over $i\in[m]$, and combining with the last relation, we get

\begin{aligned}
F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x)
&\leq\left\langle\nabla^{2}f_{\lambda^{k}}(x^{k})(x^{k}-x^{k+1}),x^{k+1}-x\right\rangle+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle\\
&=\frac{1}{2}\left\lVert x^{k}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x-x^{k}))\,ds\,(t(x-x^{k}))\,dt\right\rangle,
\end{aligned}

where the equality comes from the three points lemma CT1993 . This completes the proof.
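For completeness, the three points lemma CT1993 is invoked here in its Euclidean form with a weighting matrix: for any $H\in\mathbb{S}^{n}_{+}$ and $a,b,c\in\mathbb{R}^{n}$,

\left\langle H(a-b),b-c\right\rangle=\frac{1}{2}\left\lVert a-c\right\rVert^{2}_{H}-\frac{1}{2}\left\lVert a-b\right\rVert^{2}_{H}-\frac{1}{2}\left\lVert b-c\right\rVert^{2}_{H},

which follows by expanding $\frac{1}{2}\lVert a-c\rVert^{2}_{H}=\frac{1}{2}\lVert(a-b)+(b-c)\rVert^{2}_{H}$; the equality above uses it with $a=x^{k}$, $b=x^{k+1}$, $c=x$, and $H=\nabla^{2}f_{\lambda^{k}}(x^{k})$.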

Theorem 4.3.

Suppose $f_{i}$ is a strongly convex quadratic function for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then $x^{k+1}=x^{*}$, i.e., Algorithm 1 finds an exact Pareto point of (MCOP) in one iteration.

Proof.

First, we aim to establish that $t_{k}=1$ for all $k$. Since $f_{i}$ is a strongly convex quadratic function for $i\in[m]$, there exists a positive definite matrix $A_{i}$ such that $\nabla^{2}f_{i}(x)=A_{i}$ for all $x\in\mathbb{R}^{n}$. This leads us to the conclusion that

\begin{aligned}
F_{i}(x^{k}+d^{k})-F_{i}(x^{k})&=\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},A_{i}d^{k}\right\rangle\\
&\leq\max_{i\in[m]}\left\{\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},A_{i}d^{k}\right\rangle\right\}\\
&=\theta^{k}\\
&\leq\sigma\theta^{k},
\end{aligned}

where the last inequality holds because $\theta^{k}\leq 0$ and $\sigma\in(0,1)$. Thus, $t_{k}=1$ for all $k$. For all $x\in\mathbb{R}^{n}$, we use relation (3) to get

\begin{aligned}
F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x)&\leq\frac{1}{2}\left\lVert x^{k}-x\right\rVert^{2}_{A_{\lambda^{k}}}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{A_{\lambda^{k}}}-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{A_{\lambda^{k}}}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}A_{\lambda^{k}}t(x^{k+1}-x^{k})\,dt\right\rangle-\left\langle x-x^{k},\int_{0}^{1}A_{\lambda^{k}}t(x-x^{k})\,dt\right\rangle\\
&=-\frac{1}{2}\left\lVert x^{k+1}-x\right\rVert^{2}_{A_{\lambda^{k}}},
\end{aligned}

where $A_{\lambda^{k}}:=\sum_{i\in[m]}\lambda^{k}_{i}A_{i}$. Applying Theorem 4.2, we can assert the existence of $x^{*}$ such that $F(x^{*})\preceq F(x^{k})$ for all $k$. Substituting $x=x^{*}$ into the above inequality, we have

F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x^{*})\leq-\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{A_{\lambda^{k}}}.

Combining this with the fact that $F(x^{*})\preceq F(x^{k+1})$ leads to the following inequality:

\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{A_{\lambda^{k}}}\leq 0.

Considering that $A_{i}$ is a positive definite matrix for $i\in[m]$, it follows that $x^{k+1}=x^{*}$. This completes the proof.

By setting $g_{i}(x)\equiv 0$ for $i\in[m]$, the quadratic termination property also applies to the NMO as presented in FD2009 .

Corollary 1.

Suppose $F_{i}$ is a strongly convex quadratic function for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by the NMO. Then $x^{k+1}=x^{*}$, i.e., the NMO finds an exact Pareto point in one iteration.
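As a simple single-objective illustration of Corollary 1 (the matrix $A$ and vector $b$ are introduced here only for the example), take $m=1$ and $F_{1}(x)=\frac{1}{2}\langle x,Ax\rangle-\langle b,x\rangle$ with $A\in\mathbb{S}^{n}_{++}$. Subproblem (1) then reads $\min_{d}\{\langle Ax^{0}-b,d\rangle+\frac{1}{2}\langle d,Ad\rangle\}$, whose solution is the Newton step, and the full step $t_{0}=1$ is accepted since $F_{1}(x^{0}+d^{0})-F_{1}(x^{0})=\theta^{0}\leq\sigma\theta^{0}$. One iteration lands exactly at the unique minimizer:

d^{0}=-A^{-1}(Ax^{0}-b),\qquad x^{1}=x^{0}+d^{0}=A^{-1}b=x^{*}.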

4.3 Local superlinear convergence

Next, we give sufficient conditions for local superlinear convergence.

Theorem 4.4.

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$ and its Hessian is continuous for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then, for any $0<\epsilon\leq(1-\sigma)\mu$, there exists $K_{\epsilon}>0$ such that

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\sqrt{\frac{\epsilon(1+\tau_{k}^{2})}{\mu}}\left\lVert x^{k}-x^{*}\right\rVert

holds for all $k\geq K_{\epsilon}$, where $\tau_{k}:=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]$. Furthermore, the sequence $\{x^{k}\}$ converges superlinearly to $x^{*}$.

Proof.

Referring to the arguments presented in the proof of Theorem 4.2, we conclude that $\{x^{k}\}$ converges to a certain Pareto point $x^{*}$ and $\{d^{k}\}$ converges to $0$. This implies that for any $r>0$, there exists $K_{r}>0$ such that $x^{k},x^{k}+d^{k}\in B[x^{*},r]$ for all $k\geq K_{r}$. Given that $\nabla^{2}f_{i}$ is continuous, it follows that $\nabla^{2}f_{i}$ is uniformly continuous on the compact set $B[x^{*},r]$. Hence, for any $0<\epsilon\leq(1-\sigma)\mu$, there exists $K^{1}_{\epsilon}\geq K_{r}$ such that, for all $k\geq K^{1}_{\epsilon}$,

\begin{aligned}
F_{i}(x^{k}+d^{k})-F_{i}(x^{k})&\leq\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},\nabla^{2}f_{i}(x^{k})d^{k}\right\rangle+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&\leq\max_{i\in[m]}\left\{\left\langle\nabla f_{i}(x^{k}),d^{k}\right\rangle+g_{i}(x^{k}+d^{k})-g_{i}(x^{k})+\frac{1}{2}\left\langle d^{k},\nabla^{2}f_{i}(x^{k})d^{k}\right\rangle\right\}+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&=\theta^{k}+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&=\sigma\theta^{k}+(1-\sigma)\theta^{k}+\frac{\epsilon}{2}\|d^{k}\|^{2}\\
&\leq\sigma\theta^{k},
\end{aligned}

where the last inequality is due to the facts that $\theta^{k}\leq-\frac{\mu}{2}\left\lVert d^{k}\right\rVert^{2}$ (Lemma 3) and $\epsilon\leq(1-\sigma)\mu$. Consequently, we can deduce that $t_{k}=1$ for all $k\geq K^{1}_{\epsilon}$. Substituting $x=x^{*}$ into (3), we obtain

\begin{aligned}
0&\leq F_{\lambda^{k}}(x^{k+1})-F_{\lambda^{k}}(x^{*})\\
&\leq\frac{1}{2}\left\lVert x^{k}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}-\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\\
&\quad+\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))\,ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))\,ds\,(t(x^{*}-x^{k}))\,dt\right\rangle,
\end{aligned}\qquad (4)

where the first inequality comes from the fact that $F(x^{*})\preceq F(x^{k})$ for all $k$. On the other hand, $\frac{1}{2}\left\lVert x-x^{k}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}$ can be equivalently expressed as

\left\langle x-x^{k},\int_{0}^{1}\int_{0}^{1}\nabla^{2}f_{\lambda^{k}}(x^{k})\,ds\,(t(x-x^{k}))\,dt\right\rangle.

By substituting the above relation into (4) with $x=x^{*}$ and $x=x^{k+1}$, respectively, we have

\begin{aligned}
\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}
&\leq\left\langle x^{k+1}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{k+1}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle.
\end{aligned}\qquad (5)

Since the sequence $\{x^{k}\}$ converges to the Pareto solution $x^{*}$, there exists $K_{\epsilon}\geq K^{1}_{\epsilon}$ such that, for all $k\geq K_{\epsilon}$,

\left\lVert\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right\rVert\leq\epsilon,~\forall s,t\in[0,1],

and

\left\lVert\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right\rVert\leq\epsilon,~\forall s,t\in[0,1].

Substituting these two bounds into (5), we obtain

\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}\leq\frac{\epsilon}{2}\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\frac{\epsilon}{2}\left\lVert x^{*}-x^{k}\right\rVert^{2}.

This, together with the $\mu$-strong convexity of $f_{i}$, implies

\mu\left\lVert x^{k+1}-x^{*}\right\rVert^{2}\leq\epsilon\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\epsilon\left\lVert x^{*}-x^{k}\right\rVert^{2}.\qquad (6)

Through direct calculation, we have

\begin{aligned}
\epsilon\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\epsilon\left\lVert x^{*}-x^{k}\right\rVert^{2}
&\geq\mu\left\lVert x^{k+1}-x^{*}\right\rVert^{2}\\
&=\mu\left\lVert x^{k+1}-x^{k}+x^{k}-x^{*}\right\rVert^{2}\\
&\geq\mu\left\lVert x^{k+1}-x^{k}\right\rVert^{2}+\mu\left\lVert x^{k}-x^{*}\right\rVert^{2}-2\mu\left\lVert x^{k+1}-x^{k}\right\rVert\left\lVert x^{k}-x^{*}\right\rVert.
\end{aligned}

Rearranging and dividing by $\left\lVert x^{k}-x^{*}\right\rVert^{2}$, we have

(\mu-\epsilon)\tau_{k}^{2}-2\mu\tau_{k}+\mu-\epsilon\leq 0,

where $\tau_{k}:=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}$. Since $\epsilon\leq(1-\sigma)\mu<\mu$, the roots of this quadratic in $\tau_{k}$ are $\frac{\mu\pm\sqrt{\mu^{2}-(\mu-\epsilon)^{2}}}{\mu-\epsilon}=\frac{\mu\pm\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}$, and hence $\tau_{k}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]$. Substituting $\tau_{k}$ into relation (6), we derive that

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\sqrt{\frac{\epsilon(1+\tau_{k}^{2})}{\mu}}\left\lVert x^{k}-x^{*}\right\rVert.

Furthermore, since the above estimates hold for every $0<\epsilon\leq(1-\sigma)\mu$ once $k$ is sufficiently large, letting $\epsilon$ tend to $0$ gives

\lim_{k\rightarrow\infty}\tau_{k}\in\lim_{\epsilon\rightarrow 0}\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]=\{1\}.

We use this relation, together with the bound above, to get

\lim_{k\rightarrow\infty}\frac{\left\lVert x^{k+1}-x^{*}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}=0.

This concludes the proof.

4.4 Quadratic convergence

The additional assumption of Lipschitz continuity of the Hessian $\nabla^{2}f_{i}$ for $i\in[m]$ guarantees a quadratic convergence rate of the NPGMO, as we will now demonstrate.

Theorem 4.5.

Suppose $f_{i}$ is strongly convex with modulus $\mu>0$ and its Hessian is Lipschitz continuous with constant $L_{2}$ for $i\in[m]$. Let $\{x^{k}\}$ be the sequence produced by Algorithm 1. Then, for any $0<\epsilon\leq(1-\sigma)\mu$, there exists $K_{\epsilon}>0$ such that

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\frac{(2\tau_{k}+1)L_{2}}{3\mu-\tau_{k}L_{2}\left\lVert x^{k}-x^{*}\right\rVert}\left\lVert x^{k}-x^{*}\right\rVert^{2}

holds for all $k\geq K_{\epsilon}$, where $\tau_{k}:=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right]$. Furthermore, the sequence $\{x^{k}\}$ converges quadratically to $x^{*}$.

Proof.

Drawing from the arguments presented in the proof of Theorem 4.4, we can establish that for any $0<\epsilon\leq(1-\sigma)\mu$, there exists a threshold $K_{\epsilon}>0$ such that, for all $k\geq K_{\epsilon}$,

t_{k}=1

and

\tau_{k}\in\left[\frac{\mu-\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon},\frac{\mu+\sqrt{2\mu\epsilon-\epsilon^{2}}}{\mu-\epsilon}\right],

where $\tau_{k}=\frac{\left\lVert x^{k+1}-x^{k}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert}$. Utilizing relation (5), we deduce that

\begin{aligned}
\frac{1}{2}\left\lVert x^{k+1}-x^{*}\right\rVert^{2}_{\nabla^{2}f_{\lambda^{k}}(x^{k})}
&\leq\left\langle x^{k+1}-x^{*}+x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{k+1}-x^{*}+x^{*}-x^{k}))\,dt\right\rangle\\
&\quad-\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle\\
&=\left\langle x^{k+1}-x^{*},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{k+1}-x^{*}))\,dt\right\rangle\\
&\quad+2\left\langle x^{k+1}-x^{*},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k})\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle\\
&\quad+\left\langle x^{*}-x^{k},\int_{0}^{1}\int_{0}^{1}\left(\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{k+1}-x^{k}))-\nabla^{2}f_{\lambda^{k}}(x^{k}+st(x^{*}-x^{k}))\right)ds\,(t(x^{*}-x^{k}))\,dt\right\rangle\\
&\leq\int_{0}^{1}\int_{0}^{1}L_{2}st^{2}\left\lVert x^{k+1}-x^{k}\right\rVert ds\,dt\,\left\lVert x^{k+1}-x^{*}\right\rVert^{2}+2\int_{0}^{1}\int_{0}^{1}L_{2}st^{2}\left\lVert x^{k+1}-x^{k}\right\rVert ds\,dt\,\left\lVert x^{k+1}-x^{*}\right\rVert\left\lVert x^{k}-x^{*}\right\rVert\\
&\quad+\int_{0}^{1}\int_{0}^{1}L_{2}st^{2}\left\lVert x^{k+1}-x^{*}\right\rVert ds\,dt\,\left\lVert x^{k}-x^{*}\right\rVert^{2}\\
&=\frac{L_{2}}{6}\left\lVert x^{k+1}-x^{*}\right\rVert\left(\left\lVert x^{k+1}-x^{k}\right\rVert\left\lVert x^{k+1}-x^{*}\right\rVert+2\left\lVert x^{k+1}-x^{k}\right\rVert\left\lVert x^{k}-x^{*}\right\rVert+\left\lVert x^{k}-x^{*}\right\rVert^{2}\right)\\
&=\frac{L_{2}}{6}\left\lVert x^{k+1}-x^{*}\right\rVert\left(\tau_{k}\left\lVert x^{k}-x^{*}\right\rVert\left\lVert x^{k+1}-x^{*}\right\rVert+(2\tau_{k}+1)\left\lVert x^{k}-x^{*}\right\rVert^{2}\right),
\end{aligned}

where the second inequality can be attributed to the Lipschitz continuity of $\nabla^{2}f_{i}$ for $i\in[m]$, while the final equality originates from the definition of $\tau_{k}$. By reordering terms and leveraging the $\mu$-strong convexity of $f_{i}$, we derive

\left\lVert x^{k+1}-x^{*}\right\rVert\leq\frac{(2\tau_{k}+1)L_{2}}{3\mu-\tau_{k}L_{2}\left\lVert x^{k}-x^{*}\right\rVert}\left\lVert x^{k}-x^{*}\right\rVert^{2}.

This, together with the convergence of $\{x^{k}\}$ to $x^{*}$ and the limit $\lim_{k\rightarrow\infty}\tau_{k}=1$, leads to

\limsup_{k\rightarrow\infty}\frac{\left\lVert x^{k+1}-x^{*}\right\rVert}{\left\lVert x^{k}-x^{*}\right\rVert^{2}}\leq\frac{L_{2}}{\mu}.

This concludes the proof.

5 Conclusions

In this paper, we have demonstrated the appealing convergence properties of NPGMO, including quadratic termination, locally superlinear convergence, and locally quadratic convergence. These results were established within a unified framework, which can potentially serve as a template for analyzing second-order methods for MOPs.

References

  • [1] M. A. T. Ansary. A Newton-type proximal gradient method for nonlinear multi-objective optimization problems. Optimization Methods and Software, 38(3):570–590, 2023.
  • [2] M. A. T. Ansary and G. Panda. A globally convergent SQCQP method for multiobjective optimization problems. SIAM Journal on Optimization, 31(1):91–113, 2021.
  • [3] H. Bonnel, A. N. Iusem, and B. F. Svaiter. Proximal methods in vector optimization. SIAM Journal on Optimization, 15(4):953–970, 2005.
  • [4] G. A. Carrizo, P. A. Lotito, and M. C. Maciel. Trust region globalization strategy for the nonconvex unconstrained multiobjective optimization problem. Mathematical Programming, 159(1):339–369, 2016.
  • [5] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM Journal on Optimization, 3(3):538–543, 1993.
  • [6] J. Chen, L. P. Tang, and X. M. Yang. A Barzilai-Borwein descent method for multiobjective optimization problems. European Journal of Operational Research, 311(1):196–209, 2023.
  • [7] J. Fliege, L. M. Graña Drummond, and B. F. Svaiter. Newton's method for multiobjective optimization. SIAM Journal on Optimization, 20(2):602–626, 2009.
  • [8] J. Fliege and B. F. Svaiter. Steepest descent methods for multicriteria optimization. Mathematical Methods of Operations Research, 51(3):479–494, 2000.
  • [9] J. Fliege and A. I. F. Vaz. A method for constrained multiobjective optimization based on SQP techniques. SIAM Journal on Optimization, 26(4):2091–2119, 2016.
  • [10] L. M. Graña Drummond and A. N. Iusem. A projected gradient method for vector optimization problems. Computational Optimization and Applications, 28(1):5–29, 2004.
  • [11] J. D. Lee, Y. Sun, and M. A. Saunders. Proximal Newton-type methods for minimizing composite functions. SIAM Journal on Optimization, 24(3):1420–1443, 2014.
  • [12] L. R. Lucambio Pérez and L. F. Prudente. Nonlinear conjugate gradient methods for vector optimization. SIAM Journal on Optimization, 28(3):2690–2720, 2018.
  • [13] Q. Mercier, F. Poirion, and J. A. Désidéri. A stochastic multiple gradient descent algorithm. European Journal of Operational Research, 271(3):808–817, 2018.
  • [14] V. Morovati and L. Pourkarimi. Extension of Zoutendijk method for solving constrained multiobjective optimization problems. European Journal of Operational Research, 273(1):44–57, 2019.
  • [15] H. Mukai. Algorithms for multicriterion optimization. IEEE Transactions on Automatic Control, 25(2):177–186, 1980.
  • [16] Ž. Povalej. Quasi-Newton’s method for multiobjective optimization. Journal of Computational and Applied Mathematics, 255:765–777, 2014.
  • [17] S. Qu, M. Goh, and F. T. Chan. Quasi-Newton methods for solving multiobjective optimization. Operations Research Letters, 39(5):397–399, 2011.
  • [18] M. Sion. On general minimax theorems. Pacific Journal of Mathematics, 8(1):171–176, 1958.
  • [19] H. Tanabe, E. H. Fukuda, and N. Yamashita. Proximal gradient methods for multiobjective optimization and their applications. Computational Optimization and Applications, 72:339–361, 2019.
  • [20] H. Tanabe, E. H. Fukuda, and N. Yamashita. Convergence rates analysis of a multiobjective proximal gradient method. Optimization Letters, 17:333–350, 2023.
Acknowledgements.
This work was funded by the Major Program of the National Natural Science Foundation of China [grant numbers 11991020, 11991024]; the National Natural Science Foundation of China [grant numbers 11971084, 12171060]; NSFC-RGC (Hong Kong) Joint Research Program [grant number 12261160365]; the Team Project of Innovation Leading Talent in Chongqing [grant number CQYC20210309536]; the Natural Science Foundation of Chongqing [grant number ncamc2022-msxm01]; and Foundation of Chongqing Normal University [grant numbers 22XLB005, 22XLB006].