
J. Chen: Department of Mathematics, Shanghai University, Shanghai 200444, China. [email protected]

L.P. Tang: National Center for Applied Mathematics of Chongqing, Chongqing 401331, China. [email protected]

X.M. Yang (corresponding author): National Center for Applied Mathematics of Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China. [email protected]

Convergence rates analysis of Interior Bregman Gradient Method for Vector Optimization Problems

Jian Chen    Liping Tang    Xinmin Yang
(Received: date / Accepted: date)
Abstract

In recent years, by using the Bregman distance, Lipschitz gradient continuity and strong convexity were lifted and replaced by relative smoothness and relative strong convexity. Under these mild assumptions, it was proved that gradient methods with Bregman regularity converge linearly for single-objective optimization problems (SOPs). In this paper, we extend relative smoothness and relative strong convexity to vector-valued functions and analyze the convergence of an interior Bregman gradient method for vector optimization problems (VOPs). Specifically, the global convergence rates are \mathcal{O}(\frac{1}{k}) and \mathcal{O}(r^{k}) (0<r<1) for convex and relative strongly convex VOPs, respectively. Moreover, the proposed method converges linearly for VOPs that satisfy a vector Bregman-PL inequality.

Keywords:
Vector optimization · Bregman distance · Relative smoothness · Relative strong convexity · Linear convergence
MSC:
65K05 90C26 90C29 90C30

1 Introduction

Let C\subset\mathbb{R}^{m} be a closed, convex, and pointed cone with a non-empty interior. The partial order induced by C is given by

y\preceq_{C}(\mathrm{resp.}\prec_{C})\,y^{\prime}\ \Leftrightarrow\ y^{\prime}-y\in C\ (\mathrm{resp.}\ \mathrm{int}(C)).

In this paper, we consider the following vector optimization problem

\min\limits_{x\in\Omega}F(x), \qquad (VOP)

where every component F_{i}:\mathbb{R}^{n}\rightarrow\mathbb{R} of F:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m} is differentiable, and \Omega\subset\mathbb{R}^{n} is closed and convex with a non-empty interior. For the optimal solutions of (VOP), none of the objectives can be improved without sacrificing the others; the concept of optimality is thus defined as efficiency. It is worth noting that (VOP) corresponds to a multiobjective optimization problem (MOP) when C=\mathbb{R}^{m}_{+}, where \mathbb{R}^{m}_{+} is the non-negative orthant of \mathbb{R}^{m}. Some applications of multiobjective optimization problems can be found in engineering 1a, economics 1b; eco, and machine learning mac1; m2; mac2, etc.
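For a concrete illustration (our example, not taken from the original): take n=1, m=2, C=\mathbb{R}^{2}_{+}, \Omega=[0,1] and

F(x)=(x^{2},(x-1)^{2}).

On [0,1] the first objective decreases and the second increases as x moves toward 0, so neither objective can be improved without sacrificing the other, and every point of [0,1] is efficient.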

Scalarization approaches mnp; luc; joh are widely explored for VOPs; they convert a VOP into a single-objective optimization problem so that classical numerical optimization methods can be applied to obtain solutions of the original problem. Unfortunately, these approaches burden the decision-maker with parameters that are unknown in advance. To overcome this drawback, Fliege and Svaiter 6 invented a parameter-free method, called the steepest descent method, for multiobjective optimization. The corresponding method for VOPs is described in vsd. Motivated by the work of Fliege and Svaiter, some standard first-order methods have been extended to solve MOPs and VOPs 8; 9a; 9b; 9c; 9d; chen. The main idea of these methods is to compute a descent direction; then a line search is performed along this direction to ensure a sufficient decrease of all objective functions. With this strategy, these descent methods produce a sequence of points that enjoys global convergence. Very recently, it was proved that the convergence rates of the multiobjective gradient descent method com and proximal gradient method pcom are the same as the ones for SOPs.

Similar to first-order methods for SOPs, a widely used assumption for MOPs is that the gradients of the objective functions are globally Lipschitz continuous, which guarantees a sufficient descent in each iteration. By using the Bregman distance, this restrictive assumption was replaced by the Lipschitz-like condition lc, or relative smoothness rs, for SOPs. On the other hand, relative strong convexity was proposed to replace strong convexity in rs. Under these mild assumptions, linear convergence results were established for the gradient rs and proximal gradient as methods with Bregman regularization. Moreover, linear convergence of the gradient method was obtained under a Bregman error bound condition nlns. Auslender and Teboulle ip presented another advantage of Bregman methods: the produced sequence can be kept in the interior of the feasible set by choosing a suitable Bregman distance, thus eliminating the constraints.

Bregman gradient methods have also been explored for VOPs. Villacorta and Oliveira mip developed an interior Bregman gradient method with a modified convergence sensing condition for convex VOPs with a non-polyhedral feasible set. In cz, Chen et al. introduced the vector-valued Bregman function for MOPs, and the corresponding vector-valued Bregman distance replaced the Euclidean distance in the proximal point algorithm (PPA). Very recently, the PPA with vector-valued Bregman distance was applied to solve multiobjective DC programming in 2022. Moreover, the proximal gradient method with Bregman functions for MOPs has been investigated by Ansary and Dutta 2022a.

It should be noted that convergence analysis exploiting the appealing potential of the Bregman distance was not discussed in the methods mentioned above. Naturally, a question arises: can we obtain linear convergence results of the Bregman gradient method for VOPs without Lipschitz gradient continuity and strong convexity? The paper's main contribution is to answer this question positively. More specifically:

• We extend relative smoothness and relative strong convexity to vector-valued functions in the context of VOPs;

• We present two merit functions and analyze their properties, such as optimality, continuity, and interrelation. We also propose a generalized vector Bregman-PL inequality, which will be used to prove the linear convergence of the proposed method.

• We propose an interior Bregman gradient method for VOPs that keeps the produced sequence inside the feasible set. It is proved that every accumulation point of the produced sequence is a C-stationary point of (VOP), and that the whole sequence converges to a weakly efficient solution when the objective function is C-convex.

• With relative smooth and relative strongly convex objective functions, we prove that the produced sequence converges linearly to a weakly efficient point in the sense of the Bregman distance. We also prove that a merit function converges linearly under the vector Bregman-PL inequality.

The organization of the paper is as follows. Some notations and definitions are given in Sect. 2 for our later use. Sect. 3 discusses the merit functions for VOPs. Sect. 4 is devoted to introducing the interior Bregman gradient method and proving the convergence results for the proposed method. At the end of the paper, some conclusions are drawn.

2 Preliminaries

2.1 Notations

• [m]=\{1,2,\dots,m\}.

• \Delta^{+}_{m}=\left\{\lambda:\sum\limits_{i\in[m]}\lambda_{i}=1,\ \lambda_{i}>0,\ i\in[m]\right\} the relative interior of the m-dimensional unit simplex.

• \|\cdot\| the Euclidean norm, \langle\cdot,\cdot\rangle the inner product.

• \mathrm{int}(\cdot) and \mathrm{cl}(\cdot) the interior and the closure of a set, respectively.

• \mathrm{conv}(\cdot) the convex hull of a set.

• B[x,r] the closed ball centred at x with radius r.

• JF(x)\in\mathbb{R}^{m\times n} and \nabla F_{i}(x)\in\mathbb{R}^{n} the Jacobian matrix of F and the gradient of F_{i} at x, respectively.

• C^{*}=\{c^{*}\in\mathbb{R}^{m}:\langle c^{*},c\rangle\geq 0,\ \forall c\in C\} the positive polar cone of C.

• G=\mathrm{conv}(\{c^{*}\in C^{*}:\|c^{*}\|=1\}).

• \omega^{*}(x^{*})=\sup\limits_{x\in\mathbb{R}^{n}}\{\langle x^{*},x\rangle-\omega(x)\} the conjugate function of \omega.

2.2 Vector optimization

Recall some definitions of solutions to (VOP) as follows.

Definition 1.

joh A vector x^{*}\in\Omega is called an efficient solution of (VOP) if there exists no x\in\Omega such that F(x)\preceq_{C}F(x^{\ast}) and F(x)\neq F(x^{\ast}).

Definition 2.

joh A vector x^{*}\in\Omega is called a weakly efficient solution of (VOP) if there exists no x\in\Omega such that F(x)\prec_{C}F(x^{\ast}).

Definition 3.

vsd A vector x^{\ast}\in\Omega is called a C-stationary point of (VOP) if

JF(x^{*})(\Omega-x^{*})\cap(-\mathrm{int}(C))=\emptyset.

For a non-stationary point xx, there exists a descent direction that is defined as:

Definition 4.

vsd A vector d\in\mathbb{R}^{n} is called a C-descent direction for F at x if

JF(x)d\in-\mathrm{int}(C).

Remark 1.

If x\in\Omega is a non-stationary point, then there exists a vector y\in\Omega such that JF(x)(y-x)\in-\mathrm{int}(C). Since \mathrm{int}(C) is an open set, the relation also holds for some y\in\mathrm{int}(\Omega). It follows from Definition 4 that y-x is a C-descent direction.

Definition 5.

joh A function F(\cdot) is called C-convex on \Omega if

F(\lambda x+(1-\lambda)y)\preceq_{C}\lambda F(x)+(1-\lambda)F(y),\ \forall x,y\in\Omega,\ \lambda\in[0,1].

Since F(\cdot) is differentiable, the C-convexity of F(\cdot) on \Omega is equivalent to

JF(x)(y-x)\preceq_{C}F(y)-F(x),\ \forall x,y\in\Omega.

By using the positive polar cone C^{*}, an equivalent characterization of the C-convexity of F(\cdot) on \Omega is presented as follows.

Lemma 1.

luc A function F(\cdot) is C-convex on \Omega if and only if \langle c^{*},F(\cdot)\rangle is convex on \Omega for all c^{*}\in C^{*}.

Remark 2.

Note that \frac{c^{*}}{\|c^{*}\|}\in G for all c^{*}\in C^{*}\setminus\{0\}; hence F(\cdot) is C-convex on \Omega if and only if \langle c^{*},F(\cdot)\rangle is convex on \Omega for all c^{*}\in G.

In the C-convex case, we obtain the equivalence between C-stationary points and weakly efficient solutions.

Lemma 2.

joh Assume that the objective function F(\cdot) is C-convex on \Omega. Then x^{*}\in\Omega is a C-stationary point of (VOP) if and only if x^{*} is a weakly efficient solution of (VOP).

2.3 Relative smoothness and relative strong convexity

Definition 6 (Legendre function).

roc Let \omega:X\rightarrow(-\infty,+\infty] be a proper closed convex function, where X\subset\mathbb{R}^{n}. Then

• (i) \omega(\cdot) is essentially smooth if \omega(\cdot) is differentiable on \mathrm{int(dom}\omega) and \lim\limits_{k\rightarrow\infty}\|\nabla\omega(x^{k})\|=\infty for every sequence \{x^{k}\}\subset\mathrm{int(dom}\omega) converging to a boundary point of \mathrm{dom}\omega;

• (ii) \omega(\cdot) is a Legendre function if \omega(\cdot) is essentially smooth and strictly convex on \mathrm{int(dom}\omega).

Lemma 3.

roc For a Legendre function \omega(\cdot), we have the following useful properties.

• (i) \omega(\cdot) is Legendre if and only if its conjugate \omega^{*} is Legendre.

• (ii) \omega^{*}(\nabla\omega(x))=\langle x,\nabla\omega(x)\rangle-\omega(x).

• (iii) (\nabla\omega(\cdot))^{-1}=\nabla\omega^{*}(\cdot).

• (iv) \mathrm{dom}\partial\omega=\mathrm{int(dom}\omega) with \partial\omega(x)=\{\nabla\omega(x)\} for all x\in\mathrm{int(dom}\omega).

Using the Legendre function \omega(\cdot), the Bregman distance D_{\omega}(\cdot,\cdot) w.r.t. \omega(\cdot) is defined as

D_{\omega}(x,y)=\omega(x)-\omega(y)-\langle\nabla\omega(y),x-y\rangle,\ \forall(x,y)\in\mathrm{dom}(\omega)\times\mathrm{int(dom}\omega).

Since \omega(\cdot) is strictly convex on \mathrm{int(dom}\omega), we have D_{\omega}(x,y)\geq 0 for all x,y\in\mathrm{int(dom}\omega), and the equality holds if and only if x=y. In what follows, we recall a basic property of D_{\omega}(\cdot,\cdot) that will be used in the sequel:

D_{\omega}(x,y)=D_{\omega^{*}}(\nabla\omega(y),\nabla\omega(x)),\ \forall x,y\in\mathrm{int(dom}\omega). \qquad (1)
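A standard example (recalled here for illustration) is the Boltzmann–Shannon entropy \omega(x)=\sum_{i=1}^{n}(x_{i}\log x_{i}-x_{i}) with \mathrm{dom}\,\omega=\mathbb{R}^{n}_{+} (and 0\log 0:=0), a supercoercive Legendre function with \mathrm{int(dom}\omega)=\mathbb{R}^{n}_{++}; the associated Bregman distance is the Kullback–Leibler divergence

D_{\omega}(x,y)=\sum_{i=1}^{n}\left(x_{i}\log\frac{x_{i}}{y_{i}}-x_{i}+y_{i}\right).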

In general, D_{\omega}(\cdot,\cdot) is not symmetric. The measure of symmetry for D_{\omega}(\cdot,\cdot) was introduced in lc. The symmetric coefficient is given by

\alpha(\omega):=\inf\left\{\frac{D_{\omega}(x,y)}{D_{\omega}(y,x)}:x,y\in\mathrm{int(dom}\omega),\ x\neq y\right\}\in[0,1].
Remark 3.

As noted in lc, we call D_{\omega}(\cdot,\cdot) symmetric if \alpha(\omega)=1. If \alpha(\omega)=0, then symmetry is absent for D_{\omega}(\cdot,\cdot). The former happens when \omega is a strictly convex quadratic function, which was deduced from (sym, Lemma 3.16). As shown in (lc, Proposition 2), if \mathrm{dom}\omega is not open, then \alpha(\omega)=0.

In the context of vector optimization, we propose the smoothness of F(\cdot) relative to a Legendre function \omega(\cdot).

Definition 7.

For a vector e\in\mathrm{int}C, we say F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega) if there exists L>0 such that, for all x,y\in\mathrm{int}\Omega,

F(y)\preceq_{C}F(x)+JF(x)(y-x)+LD_{\omega}(y,x)e.
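To illustrate Definition 7 (our example, not from the original): let C=\mathbb{R}^{2}_{+}, e=(1,1) and \omega(x)=\frac{1}{4}\|x\|^{4}+\frac{1}{2}\|x\|^{2}. Then F(x)=(\frac{1}{4}\|x\|^{4},\frac{1}{2}\|x\|^{2}) is (1,C,e)-smooth relative to \omega(\cdot) on \mathbb{R}^{n}, since \omega(\cdot)-F_{1}(\cdot)=\frac{1}{2}\|\cdot\|^{2} and \omega(\cdot)-F_{2}(\cdot)=\frac{1}{4}\|\cdot\|^{4} are convex (cf. Proposition 1(i) below), even though \nabla F_{1} is not globally Lipschitz continuous.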

The following proposition presents some properties for relative smoothness.

Proposition 1.

Assume F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega), then

• (i) this is equivalent to the C-convexity of L\omega(\cdot)e-F(\cdot) on \Omega;

• (ii) for any c\in\mathrm{int}C, F(\cdot) is (\gamma L,C,c)-smooth relative to \omega(\cdot) on \Omega, where \gamma=\max\limits_{c^{*}\in G}\left\{\frac{\langle c^{*},e\rangle}{\langle c^{*},c\rangle}\right\}.

Proof.

(i) From Remark 2, the C-convexity of L\omega(\cdot)e-F(\cdot) on \Omega is equivalent to the convexity of \langle c^{*},L\omega(\cdot)e-F(\cdot)\rangle on \Omega for all c^{*}\in G, i.e.,

\langle c^{*},LD_{\omega}(y,x)e-F(y)+F(x)+JF(x)(y-x)\rangle\geq 0,\ \forall x,y\in\mathrm{int}(\Omega),\ c^{*}\in G.

This is equivalent to F(y)\preceq_{C}F(x)+JF(x)(y-x)+LD_{\omega}(y,x)e, and assertion (i) is proved.

(ii) From assertion (i), \langle c^{*},L\omega(\cdot)e-F(\cdot)\rangle is convex on \Omega for all c^{*}\in G. Since \omega(\cdot) is convex, for any c\in\mathrm{int}C, \langle c^{*},\bar{L}\omega(\cdot)c-F(\cdot)\rangle is convex on \Omega for all c^{*}\in G whenever \bar{L}\langle c^{*},c\rangle\geq L\langle c^{*},e\rangle holds for all c^{*}\in G. The latter holds by setting \bar{L}=\max\limits_{c^{*}\in G}\left\{\frac{\langle c^{*},e\rangle}{\langle c^{*},c\rangle}\right\}L, and the maximum is well-defined due to c\in\mathrm{int}C and the compactness of G. \square

We also propose the strong convexity of F(\cdot) relative to a Legendre function \omega(\cdot).

Definition 8.

For a vector e\in\mathrm{int}C, we say F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega) if there exists \mu\geq 0 such that, for all x,y\in\mathrm{int}\Omega,

F(x)+JF(x)(y-x)+\mu D_{\omega}(y,x)e\preceq_{C}F(y).

Similarly, we present some properties for relative strong convexity as follows.

Proposition 2.

Assume F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega), then

• (i) this is equivalent to the C-convexity of F(\cdot)-\mu\omega(\cdot)e on \Omega;

• (ii) for any c\in\mathrm{int}C, F(\cdot) is (\rho\mu,C,c)-strongly convex relative to \omega(\cdot) on \Omega, where \rho=\min\limits_{c^{*}\in G}\left\{\frac{\langle c^{*},e\rangle}{\langle c^{*},c\rangle}\right\}.

Proof.

The proof is similar to the arguments used in Proposition 1, so we omit it here. \square

Remark 4.

When \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}, the (L,C,e)-smoothness reduces to condition (A) in cw. If C=\mathbb{R}_{+}, the (L,C,e)-smoothness and (\mu,C,e)-strong convexity correspond to relative L-smoothness and relative \mu-strong convexity in rs, respectively.

From the above two propositions, the constants of relative smoothness and relative strong convexity depend on \omega(\cdot), F(\cdot) and e. Recall that the quotient \frac{L}{\mu} plays a key role in the geometric convergence of first-order methods in the presence of strong convexity. If \frac{L}{\mu} is large, the objective decreases slowly when first-order methods are applied, and the function is then said to be ill-conditioned. To alleviate this dilemma, for general \omega(\cdot) and F(\cdot), the vector e is selected as:

e:=\mathop{\arg\max}\limits_{c\in B[0,r]}\min\limits_{c^{*}\in G}\langle c^{*},c\rangle. \qquad (2)

When C=\mathbb{R}_{+}^{m}, the vector e defined in (2) corresponds to (\frac{r}{\sqrt{m}},\dots,\frac{r}{\sqrt{m}}). For simplicity, we select a suitable r>0 such that

\max\limits_{c^{*}\in G}\langle c^{*},e\rangle=1. \qquad (3)

On the other hand, since C^{*}\subset\mathbb{R}^{m} is a closed, convex, and pointed cone, we have 0\notin G. Then we define

\delta:=\min\limits_{c^{*}\in G}\langle c^{*},e\rangle>0. \qquad (4)
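For illustration, consider C=\mathbb{R}^{m}_{+} (a direct computation, not taken from the original). Then C^{*}=\mathbb{R}^{m}_{+}, and e=(\frac{1}{\sqrt{m}},\dots,\frac{1}{\sqrt{m}}) satisfies (3), since \langle c^{*},e\rangle\leq\|c^{*}\|\|e\|=1 for all c^{*}\in G with equality at c^{*}=e. Moreover, the minimum in (4) is attained at an extreme point of G, i.e., at some unit vector c^{*}\geq 0, and since \sum_{i}c^{*}_{i}\geq\|c^{*}\|=1 with equality at the standard basis vectors, we get \delta=\frac{1}{\sqrt{m}}.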

3 Merit Functions for VOPs

A function is called a merit function for (VOP) if it returns 0 at the solutions of (VOP) and strictly positive values otherwise. In this section, we first present the following two merit functions:

u_{0}(x):=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(y)\rangle, \qquad (5)
v_{\ell}(x):=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}, \qquad (6)

where \ell>0 is a constant and \mathrm{cl}(\mathrm{dom}\omega)=\Omega.
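As a sanity check (noted here for illustration), when m=1, C=\mathbb{R}_{+} and hence G=\{1\}, the two functions reduce to

u_{0}(x)=F(x)-\inf\limits_{y\in\Omega}F(y),\qquad v_{\ell}(x)=\sup\limits_{y\in\Omega}\{\langle\nabla F(x),x-y\rangle-\ell D_{\omega}(y,x)\},

so u_{0} measures the objective gap, while v_{\ell} can be viewed as a Bregman analogue of the squared gradient norm; indeed, for \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} and \Omega=\mathbb{R}^{n} one gets v_{\ell}(x)=\frac{1}{2\ell}\|\nabla F(x)\|^{2}.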

3.1 Properties of merit functions

We can show that u_{0}(\cdot) and v_{\ell}(\cdot) are merit functions in the sense of weak efficiency and stationarity, respectively.

Theorem 1.

Let u_{0}(\cdot) and v_{\ell}(\cdot) be defined as in (5) and (6), respectively. Then

• (i) the point x\in\Omega is a weakly efficient solution of (VOP) if and only if u_{0}(x)=0;

• (ii) the point x\in\mathrm{int}(\Omega) is a C-stationary point of (VOP) if and only if v_{\ell}(x)=0.

Proof.

(i) Suppose x is a weakly efficient solution of (VOP). Then we have

F(x)-F(y)\notin\mathrm{int}(C),\ \forall y\in\Omega,

and therefore

\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(y)\rangle\leq 0,\ \forall y\in\Omega,

which is equivalent to

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(y)\rangle\leq 0.

On the other hand, the definition of u_{0}(\cdot) implies that

u_{0}(x)\geq\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(x)\rangle=0.

Hence u_{0}(x)=0. Since each of the above relations is in fact an equivalence, the converse direction follows by the same argument.

(ii) Suppose x is a C-stationary point of (VOP). Then we have

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y)\rangle\leq 0,

which, together with the fact that D_{\omega}(y,x)\geq 0, yields

v_{\ell}(x)=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}\leq 0.

On the other hand, the definition of v_{\ell} implies that

v_{\ell}(x)\geq\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-x)\rangle-\ell D_{\omega}(x,x)\}=0.

Hence, v_{\ell}(x)=0.

Conversely, suppose that v_{\ell}(x)=0 but x is not C-stationary. Then, there exists y_{0}\in\Omega such that

\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y_{0})\rangle>0.

For all t\in[0,1], x+t(y_{0}-x)\in\Omega. Hence,

v_{\ell}(x)=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}\geq\min\limits_{c^{*}\in G}\{\langle c^{*},tJF(x)(x-y_{0})\rangle-\ell D_{\omega}(x+t(y_{0}-x),x)\}.

On the other hand, the differentiability of \omega(\cdot) implies \lim\limits_{t\rightarrow 0}\frac{D_{\omega}(x+t(y_{0}-x),x)}{t}=0. This, together with the above inequality and \min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y_{0})\rangle>0, yields v_{\ell}(x)>0, contradicting v_{\ell}(x)=0. The proof is completed. \square

We denote by V_{\ell}(x) the optimal solution of the problem defining v_{\ell}(x), i.e.,

V_{\ell}(x):=\mathop{\arg\sup}\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}. \qquad (7)

In the sequel, we systematically assume that V_{\ell}(\cdot) is well-defined on \mathrm{int(dom}\omega), i.e., V_{\ell}(\cdot) is a single-valued mapping from \mathrm{int(dom}\omega) to \mathrm{int(dom}\omega). We give a sufficient condition for this assumption.

Lemma 4.

If \omega(\cdot) is supercoercive, i.e.,

\lim\limits_{\|x\|\rightarrow\infty}\frac{\omega(x)}{\|x\|}=\infty,

then V_{\ell}(\cdot) is a single-valued mapping from \mathrm{int(dom}\omega) to \mathrm{int(dom}\omega).

Proof.

Fix a point x\in\mathrm{int(dom}\omega) and \ell>0, and define the function P_{\ell}(\cdot) as

P_{\ell}(u)=\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle-\ell D_{\omega}(u,x)\}. \qquad (8)

Then V_{\ell}(x)=\mathop{\arg\sup}\limits_{u\in\Omega}P_{\ell}(u). Substituting the definition of the Bregman distance into (8), we have

P_{\ell}(u)=\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle-\ell(\omega(u)-\omega(x)-\nabla\omega(x)(u-x))\}=\|u\|\left(\frac{\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle\}}{\|u\|}-\frac{\ell\omega(u)}{\|u\|}+\frac{\ell(\omega(x)+\nabla\omega(x)(u-x))}{\|u\|}\right).

As \|u\|\rightarrow\infty, the term \frac{\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle\}}{\|u\|} is bounded due to the compactness of G. This, combined with the boundedness of \frac{\ell(\omega(x)+\nabla\omega(x)(u-x))}{\|u\|} and the supercoercivity of \omega(\cdot), implies \lim\limits_{\|u\|\rightarrow\infty}P_{\ell}(u)=-\infty. Hence, P_{\ell}(\cdot) is level-bounded on \mathbb{R}^{n}. It follows from the continuity of P_{\ell}(\cdot) and Weierstrass's theorem (rocv, Theorem 1.9) that V_{\ell}(x) is nonempty. The uniqueness of V_{\ell}(x) is given by the strict convexity of \omega(\cdot). From the optimality condition for V_{\ell}(x), it follows that \partial\omega(V_{\ell}(x)) is nonempty. Then Lemma 3(iv) implies that V_{\ell}(x)\in\mathrm{int(dom}\omega). \square

Remark 5.

Recall that \mathrm{cl}(\mathrm{dom}\omega)=\Omega, so \mathrm{int(dom}\omega)=\mathrm{int}(\Omega). From Lemma 4, if \omega(\cdot) is supercoercive, then V_{\ell}(\cdot) is a single-valued mapping from \mathrm{int}(\Omega) to \mathrm{int}(\Omega).

Proposition 3.

For all \ell>0, let v_{\ell}(\cdot) and V_{\ell}(\cdot) be defined as in (6) and (7), respectively. Then

• (i) there exists c_{\ell}(x)\in G such that

V_{\ell}(x)=\mathop{\arg\sup}\limits_{y\in\Omega}\{\langle c_{\ell}(x),JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}, \qquad (9)

and

c_{\ell}(x)\in\mathop{\arg\min}\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-V_{\ell}(x))\rangle; \qquad (10)

• (ii) V_{\ell}(x)=\nabla\omega^{*}(\nabla\omega(x)-\frac{1}{\ell}JF(x)^{T}c_{\ell}(x)),\ \forall x\in\mathrm{int}(\Omega);

• (iii) \langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle=\ell(D_{\omega}(x,V_{\ell}(x))+D_{\omega}(V_{\ell}(x),x)),\ \forall x\in\mathrm{int}(\Omega);

• (iv) v_{\ell}(x)=\ell D_{\omega}(x,V_{\ell}(x)),\ \forall x\in\mathrm{int}(\Omega).

Proof.

(i) Note that \Omega is convex, G is compact and convex, and \langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x) is convex in c^{*} and concave in y. Therefore, it follows from Sion's minimax theorem 32 that there exists c_{\ell}(x)\in G such that

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}=\min\limits_{c^{*}\in G}\sup\limits_{y\in\Omega}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}=\langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle-\ell D_{\omega}(V_{\ell}(x),x),

with

V_{\ell}(x)\in\mathop{\arg\sup}\limits_{y\in\Omega}\{\langle c_{\ell}(x),JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}

and

c_{\ell}(x)\in\mathop{\arg\min}\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-V_{\ell}(x))\rangle.

Hence, we obtain (10), and (9) follows from the strict convexity of \omega(\cdot).

(ii) From the optimality condition for (9) and the fact that V_{\ell}(x)\in\mathrm{int}(\Omega), we have

-JF(x)^{T}c_{\ell}(x)-\ell(\nabla\omega(V_{\ell}(x))-\nabla\omega(x))=0. \qquad (11)

Then, assertion (ii) follows from the fact that (\nabla\omega(\cdot))^{-1}=\nabla\omega^{*}(\cdot).

(iii) From (11), we have

\langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle=\ell\langle\nabla\omega(V_{\ell}(x))-\nabla\omega(x),V_{\ell}(x)-x\rangle=\ell(D_{\omega}(V_{\ell}(x),x)+D_{\omega}(x,V_{\ell}(x))).

Hence, we obtain the desired result.

(iv) Using (9) and the fact that V_{\ell}(x) is the unique solution of the problem defining v_{\ell}(x), we obtain

v_{\ell}(x)=\langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle-\ell D_{\omega}(V_{\ell}(x),x)=\ell(D_{\omega}(V_{\ell}(x),x)+D_{\omega}(x,V_{\ell}(x)))-\ell D_{\omega}(V_{\ell}(x),x)=\ell D_{\omega}(x,V_{\ell}(x)),

where the second equality is given by assertion (iii). The proof is completed. \square
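As a special case (noted for illustration, not claimed in the original), when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} and \Omega=\mathbb{R}^{n}, we have \nabla\omega=\nabla\omega^{*}=\mathrm{id}, so Proposition 3(ii) reduces to the explicit update V_{\ell}(x)=x-\frac{1}{\ell}JF(x)^{T}c_{\ell}(x), i.e., a step along an aggregated negative-gradient direction; this recovers, up to scaling of the dual weights, the steepest descent direction for vector optimization of 6; vsd.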

We can also show the continuity of v_{\ell}(\cdot) and V_{\ell}(\cdot).

Proposition 4.

For all \ell>0, v_{\ell}(\cdot) and V_{\ell}(\cdot) are continuous on \mathrm{int}(\Omega).

Proof.

From Proposition 3(iv), it is sufficient to prove the continuity of V_{\ell}(\cdot). For all x\in\mathrm{int}(\Omega), the optimality condition (11) gives

\ell\langle\nabla\omega(x)-\nabla\omega(V_{\ell}(x)),z-V_{\ell}(x)\rangle=\langle JF(x)^{T}c_{\ell}(x),z-V_{\ell}(x)\rangle,\ \forall z\in\Omega.

Let \bar{x}\in\mathrm{int}(\Omega) and let \{x_{k}\}\subset\mathrm{int}(\Omega) be a sequence satisfying x_{k}\rightarrow\bar{x}. Substituting (x,z)=(\bar{x},V_{\ell}(x_{k})) and (x,z)=(x_{k},V_{\ell}(\bar{x})) into the above equality, respectively, and summing up, we have

\ell\langle\nabla\omega(x_{k})-\nabla\omega(\bar{x})+\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle
=\langle JF(x_{k})^{T}c_{\ell}(x_{k})-JF(\bar{x})^{T}c_{\ell}(\bar{x}),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle
=\langle JF(x_{k})^{T}c_{\ell}(x_{k}),x_{k}-V_{\ell}(x_{k})\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),\bar{x}-V_{\ell}(\bar{x})\rangle+\langle JF(x_{k})^{T}c_{\ell}(x_{k}),V_{\ell}(\bar{x})-x_{k}\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),V_{\ell}(x_{k})-\bar{x}\rangle
\leq\langle JF(x_{k})^{T}c_{\ell}(\bar{x}),x_{k}-V_{\ell}(x_{k})\rangle+\langle JF(\bar{x})^{T}c_{\ell}(x_{k}),\bar{x}-V_{\ell}(\bar{x})\rangle+\langle JF(x_{k})^{T}c_{\ell}(x_{k}),V_{\ell}(\bar{x})-x_{k}\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),V_{\ell}(x_{k})-\bar{x}\rangle
=\langle(JF(x_{k})-JF(\bar{x}))^{T}c_{\ell}(\bar{x}),x_{k}-V_{\ell}(x_{k})\rangle+\langle(JF(\bar{x})-JF(x_{k}))^{T}c_{\ell}(x_{k}),\bar{x}-V_{\ell}(\bar{x})\rangle+\langle JF(x_{k})^{T}c_{\ell}(x_{k}),\bar{x}-x_{k}\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),x_{k}-\bar{x}\rangle,

where the inequality is given by (10). For simplicity, we denote the right-hand side of the above inequality by RHS. Then, we have

\ell\langle\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle\leq RHS+\ell\langle\nabla\omega(\bar{x})-\nabla\omega(x_{k}),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle.

Since \omega(\cdot) is strictly convex on \mathrm{int}(\Omega), we have

\langle\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle\geq 0.

On the other hand, since \nabla\omega(\cdot) and JF(\cdot) are continuous and x_{k}\rightarrow\bar{x}, both RHS and \nabla\omega(\bar{x})-\nabla\omega(x_{k}) tend to 0. Then \langle\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle tends to 0. It follows from the strict convexity of \omega that V_{\ell}(x_{k})\rightarrow V_{\ell}(\bar{x}). Hence, the continuity of V_{\ell}(\cdot) follows from the arbitrariness of \{x_{k}\} and \bar{x}\in\mathrm{int}(\Omega). \square

3.2 Vector Bregman-PL inequality

In this subsection, a novel vector Bregman-PL inequality is derived via the merit functions. We first present the connection between D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*})) and D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})) for some \ell_{1},\ell_{2}>0, which will be used to establish the connection between v_{\ell_{1}}(x) and v_{\ell_{2}}(x).

Lemma 5.

Let \ell_{1},\ell_{2}>0, c^{*}\in G and x\in\mathrm{int}(\Omega). If \ell_{1}\geq\ell_{2}, we have

D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*}))\geq D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})). \qquad (12)

Proof.

The proof is similar to the arguments in the proof of (nlns, Lemma 3.5), so we omit it here. \square

Note that it is not clear whether the inequality holds for 0<\ell_{1}<\ell_{2} in Lemma 5. To overcome this difficulty, we present the following assumption.

Assumption 1.

Let \ell_{1},\ell_{2}>0, c^{*}\in G and x\in\mathrm{int}(\Omega). Then there exists \theta:\mathbb{R}_{++}\rightarrow\mathbb{R}_{++} such that

D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*}))\geq\theta\left(\frac{\ell_{1}}{\ell_{2}}\right)D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})). \qquad (13)

Assumption 1 may seem restrictive in general. Indeed, we have the following sufficient conditions for Assumption 1, which were first presented in nlns.

Remark 6.
• (i) If \ell_{1}\geq\ell_{2}, Assumption 1 holds with \theta(\cdot)\equiv 1.

• (ii) Assumption 1 holds when \omega(\cdot) is \sigma-strongly convex and \kappa-smooth. In this setting, the conjugate \omega^{*}(\cdot) satisfies

\frac{\|u-v\|^{2}}{2\kappa}\leq D_{\omega^{*}}(u,v)\leq\frac{\|u-v\|^{2}}{2\sigma},\ \forall u,v\in\mathrm{int}(\Omega).

It follows from (1) and the above inequalities that

D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*}))\geq\frac{\ell_{1}^{2}}{2\kappa}\|JF(x)^{T}c^{*}\|^{2}

and

\|JF(x)^{T}c^{*}\|^{2}\geq\frac{2\sigma}{\ell_{2}^{2}}D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})).

Then Assumption 1 holds with \theta(t):=\frac{\sigma t^{2}}{\kappa}.
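For instance (an immediate special case, noted here for illustration), when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} we may take \sigma=\kappa=1, so Assumption 1 holds with \theta(t)=t^{2}; indeed, in this case D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell JF(x)^{T}c^{*}))=\frac{\ell^{2}}{2}\|JF(x)^{T}c^{*}\|^{2} scales exactly quadratically in \ell.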

We are now in a position to present the relation between v_{\ell_{1}}(\cdot) and v_{\ell_{2}}(\cdot).

Proposition 5.

Suppose Assumption 1 holds. Let \ell_{1},\ell_{2}>0 and x\in\mathrm{int}(\Omega). For \ell_{1}\geq\ell_{2}, we have

v_{\ell_{1}}(x)\leq v_{\ell_{2}}(x)\leq\frac{\ell_{2}}{\theta(\frac{\ell_{2}}{\ell_{1}})\ell_{1}}v_{\ell_{1}}(x). \qquad (14)

Proof.

The left-hand side of (14) follows directly from the definition of v_{\ell}(\cdot). Next, we prove the right-hand side of (14). From (9) and the definition of c_{\ell}(\cdot), we have

v_{\ell_{2}}(x)\leq\sup\limits_{y\in\Omega}\{\langle c_{\ell_{1}}(x),JF(x)(x-y)\rangle-\ell_{2}D_{\omega}(y,x)\}.

Using a similar argument as in the proof of Proposition 3(iv), we obtain

v_{\ell_{2}}(x)\leq\sup\limits_{y\in\Omega}\{\langle c_{\ell_{1}}(x),JF(x)(x-y)\rangle-\ell_{2}D_{\omega}(y,x)\}=\ell_{2}D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\frac{1}{\ell_{2}}JF(x)^{T}c_{\ell_{1}}(x))).

We further use inequality (13) to get

v_{\ell_{2}}(x)\leq\frac{\ell_{2}}{\theta(\frac{\ell_{2}}{\ell_{1}})}D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\frac{1}{\ell_{1}}JF(x)^{T}c_{\ell_{1}}(x)))=\frac{\ell_{2}}{\theta(\frac{\ell_{2}}{\ell_{1}})\ell_{1}}v_{\ell_{1}}(x),

where the equality is given by Proposition 3(iv). Therefore, we obtain the right-hand side of (14). \square

Moreover, we present the relations between u_{0}(\cdot) and v_{\ell}(\cdot).

Proposition 6.

For the merit functions u_{0}(\cdot) and v_{\ell}(\cdot), the following statements hold.

• (i) If F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) for some L>0, then

v_{L}(x)\leq u_{0}(x),\ \forall x\in\mathrm{int}(\Omega). \qquad (15)

• (ii) If F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) for some \mu\geq 0, then

u_{0}(x)\leq v_{\delta\mu}(x),\ \forall x\in\mathrm{int}(\Omega). \qquad (16)

Proof.

(i) From the (L,C,e)-smoothness of F(\cdot), we have

F(y)\preceq_{C}F(x)+JF(x)(y-x)+LD_{\omega}(y,x)e,\ \forall y\in\Omega.

Hence,

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},F(x)-F(y)\rangle\}\geq\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)-LD_{\omega}(y,x)e\rangle\}
\geq\sup\limits_{y\in\Omega}\left\{\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y)\rangle+\min\limits_{c^{*}\in G}\langle c^{*},-LD_{\omega}(y,x)e\rangle\right\}
\geq\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-LD_{\omega}(y,x)\},

where the last inequality is given by (3). It follows that u_{0}(x)\geq v_{L}(x).

(ii) From the (\mu,C,e)-strong convexity of F(\cdot), we have

F(x)+JF(x)(y-x)+\mu D_{\omega}(y,x)e\preceq_{C}F(y),\ \forall y\in\Omega.

Therefore,

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},F(x)-F(y)\rangle\}\leq\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)-\mu D_{\omega}(y,x)e\rangle\}
\leq\sup\limits_{y\in\Omega}\left\{\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y)\rangle+\max\limits_{c^{*}\in G}\langle c^{*},-\mu D_{\omega}(y,x)e\rangle\right\}
=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\delta\mu D_{\omega}(y,x)\},

where the second inequality uses \min_{c^{*}}\{A(c^{*})+B(c^{*})\}\leq\min_{c^{*}}A(c^{*})+\max_{c^{*}}B(c^{*}), and the last equality is given by (4). Hence, u_{0}(x)\leq v_{\delta\mu}(x). \square

We propose the following error bound property, which will be used in convergence rate analysis.

Definition 9.

We say that the vector Bregman-PL inequality holds for F(\cdot) on \mathrm{int}(\Omega) if

u_{0}(x)\leq\tau(\ell)v_{\ell}(x),\ \forall x\in\mathrm{int}(\Omega),\ \ell>0, \qquad (17)

where \tau:\mathbb{R}_{++}\rightarrow\mathbb{R}_{++}.

Remark 7.

The vector Bregman-PL inequality (17) corresponds to the general PL inequality when m=1 and \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}. It is also a generalization of the gradient-dominated inequality (nlns, Definition 3.2) in the context of vector optimization problems. Recently, a multiobjective proximal-PL inequality was established in (pcom, Definition 4.1); the vector Bregman-PL inequality coincides with it when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} and g_{i}(\cdot)=\sigma_{\Omega}(\cdot) for all i\in[m].

By virtue of Propositions 5 and 6, we give sufficient conditions for the vector Bregman-PL inequality.

Proposition 7.

Suppose that Assumption 1 holds and F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) for some \mu\geq 0. Then for all x\in\mathrm{int}(\Omega) and \ell>0, the vector Bregman-PL inequality (17) holds with \tau(\ell)=\frac{\delta\mu}{\theta(\frac{\delta\mu}{\ell})\ell}.

Proof.

The result follows directly from inequalities (14) and (16).\ \ \square

4 Interior Bregman Gradient Method for VOPs

In this section, we first introduce the interior Bregman gradient method for VOPs.

Data: x^{0}\in\mathrm{int}(\Omega), \ell>0, a supercoercive Legendre function \omega with \Omega=\mathrm{cl}(\mathrm{dom}\omega), and F(\cdot) being (L,C,e)-smooth relative to \omega(\cdot).
1 for k=0,1,\dots do
2     x^{k+1}=V_{\ell}(x^{k})
3     if x^{k+1}=x^{k} then
4         return C-stationary point x^{k}
5     end if
6 end for
Algorithm 1: Interior Bregman gradient method for VOPs

The proposed algorithm generates a sequence via the iteration

x^{k+1}=V_{\ell}(x^{k}).

From Lemma 4 and the strict convexity of \omega(\cdot), the algorithm is well-defined, and the generated sequence \{x^{k}\} lies in \mathrm{int}(\Omega). Note that V_{\ell}(x^{k}) is the unique solution of

v_{\ell}(x^{k})=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x^{k})(x^{k}-y)\rangle-\ell D_{\omega}(y,x^{k})\};

hence

V_{\ell}(x^{k})=\mathop{\arg\inf}\limits_{y\in\Omega}\max\limits_{c^{*}\in G}\{\langle c^{*},JF(x^{k})(y-x^{k})\rangle+\ell D_{\omega}(y,x^{k})\}.
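To make the subproblem concrete, the following minimal numerical sketch implements Algorithm 1 in Python (our illustration, not part of the original paper). It assumes m=2, C=\mathbb{R}^{2}_{+}, \Omega=\mathbb{R}^{n}_{+} and the Boltzmann–Shannon entropy, for which Proposition 3(ii) yields the componentwise multiplicative update V_{\ell}(x)=x\exp(-\frac{1}{\ell}JF(x)^{T}c_{\ell}(x)); the dual weight c_{\ell}(x) is approximated by a crude grid search over G=\{c\geq 0:\|c\|\leq 1,\ c_{1}+c_{2}\geq 1\}, minimizing the dual value \ell D_{\omega}(x,V_{\ell}(x)) from Proposition 3(iv). The two objectives (one linear, one of Kullback–Leibler type, both smooth relative to the entropy) and all function names are chosen for illustration only.

import numpy as np

a = np.array([1.0, 2.0])   # F1(x) = <a, x>: linear, relatively smooth for any L
b = np.array([0.6, 0.4])   # F2(x) = sum(x*log(x/b)): relatively 1-smooth w.r.t. the entropy

def JF(x):
    # Rows: gradients of F1 and F2.
    return np.vstack([a, np.log(x / b) + 1.0])

def kl(x, y):
    # Bregman distance of the entropy: the Kullback-Leibler divergence.
    return np.sum(x * np.log(x / y) - x + y)

def V(x, c, ell):
    # Proposition 3(ii) with the entropy: interior (multiplicative) update.
    return x * np.exp(-(JF(x).T @ c) / ell)

def bregman_step(x, ell, grid=60):
    # Approximate c_l(x) by grid search over G = {c >= 0 : ||c|| <= 1, c1 + c2 >= 1},
    # minimizing ell * D_omega(x, V(x; c)); cf. Proposition 3(iv).
    best_val, best_y = np.inf, x
    for c1 in np.linspace(0.0, 1.0, grid):
        for c2 in np.linspace(0.0, 1.0, grid):
            if c1 + c2 >= 1.0 and c1 ** 2 + c2 ** 2 <= 1.0:
                y = V(x, np.array([c1, c2]), ell)
                val = ell * kl(x, y)
                if val < best_val:
                    best_val, best_y = val, y
    return best_y, best_val   # best_val approximates v_ell(x)

x = np.array([0.7, 0.3])      # x0 in int(Omega)
for k in range(50):
    x, vx = bregman_step(x, ell=2.0)   # ell > L = 1, see the comment below
print(x, vx)                  # vx tends to 0 as the iterates approach a C-stationary point

Here L=1 works for both objectives, and since the entropy's domain is not open we have \alpha(\omega)=0 (cf. Remark 3), so any \ell>1 satisfies the sufficient descent condition of Lemma 6 below.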

Define the mapping \varphi:\mathbb{R}^{m}\rightarrow\mathbb{R} as

\varphi(z):=\max\limits_{c^{*}\in G}\langle c^{*},z\rangle.

From the iterates in Algorithm 1, we deduce the following sufficient descent property.

Lemma 6 (Sufficient descent property).

Assume that F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) and \ell>0. Then the sequence \{x^{k}\} produced by Algorithm 1 satisfies

\varphi(F(x^{k+1})-F(x^{k}))\leq-((1+\alpha(\omega))\ell-L)D_{\omega}(x^{k+1},x^{k}). \qquad (18)

Furthermore, the sufficient descent property holds when

\ell>\frac{L}{1+\alpha(\omega)}.

Proof.

From the definition of (L,C,e)-smoothness, we have

F(x^{k+1})-F(x^{k})\preceq_{C}JF(x^{k})(x^{k+1}-x^{k})+LD_{\omega}(x^{k+1},x^{k})e.

Then

\varphi(F(x^{k+1})-F(x^{k}))\leq\varphi(JF(x^{k})(x^{k+1}-x^{k}))+LD_{\omega}(x^{k+1},x^{k})
=-\ell D_{\omega}(x^{k+1},x^{k})-\ell D_{\omega}(x^{k},x^{k+1})+LD_{\omega}(x^{k+1},x^{k})
\leq-((1+\alpha(\omega))\ell-L)D_{\omega}(x^{k+1},x^{k}),

where the equality follows from x^{k+1}=V_{\ell}(x^{k}), the definition of \varphi(\cdot), relation (10) and Proposition 3(iii), and the second inequality is given by the definition of \alpha(\omega). This, together with \ell>\frac{L}{1+\alpha(\omega)}, yields the sufficient descent property. \square
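For instance, for the Boltzmann–Shannon entropy \omega(x)=\sum_{i}(x_{i}\log x_{i}-x_{i}), the domain \mathrm{dom}\,\omega=\mathbb{R}^{n}_{+} is not open, so \alpha(\omega)=0 by Remark 3 and the sufficient descent condition reduces to \ell>L; for \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}, the Bregman distance is symmetric, \alpha(\omega)=1, and the weaker condition \ell>\frac{L}{2} suffices.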

Algorithm 1 either terminates with a C-stationary point in a finite number of iterations or generates an infinite sequence. In the sequel, we suppose that Algorithm 1 generates an infinite sequence of nonstationary points. First, we present the global convergence of Algorithm 1 in the nonconvex case.

Theorem 2.

Assume that \{x:F(x)\preceq_{C}F(x^{0})\} is bounded, F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) for some L>0, and \ell>\frac{L}{1+\alpha(\omega)}. Let \{x^{k}\} be the sequence generated by Algorithm 1. Then, we have

• (i) there exists F^{*}\preceq_{C}F(x^{k}) for all k such that \lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*};

• (ii) \sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty;

• (iii) \{x^{k}\} has at least one accumulation point, and every accumulation point x^{*}\in\mathrm{int}(\Omega) is a C-stationary point; moreover, if \nabla\omega(\cdot) is L_{\omega}-Lipschitz continuous on \mathrm{int}(\Omega), every accumulation point x^{*}\in\mathrm{bd}(\Omega) is also a C-stationary point;

• (iv) \min\limits_{0\leq p\leq k-1}v_{\ell}(x^{p})\leq\frac{-\ell\varphi(F^{*}-F(x^{0}))}{\alpha(\omega)((1+\alpha(\omega))\ell-L)k}.

Proof.

(i) From Lemma 6, it follows that \{F(x^{k})\} is decreasing with respect to the partial order induced by C. This, combined with the boundedness of \{x:F(x)\preceq_{C}F(x^{0})\}, implies that \{x^{k}\} is bounded and that there exists F^{*}\preceq_{C}F(x^{k}) for all k such that \lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*}.

(ii) Summing inequality (18) over k=0,1,\dots,p, we obtain

-\sum\limits_{k=0}^{p}((1+\alpha(\omega))\ell-L)D_{\omega}(x^{k+1},x^{k})\geq\sum\limits_{k=0}^{p}\varphi(F(x^{k+1})-F(x^{k}))\geq\varphi\left(\sum\limits_{k=0}^{p}\{F(x^{k+1})-F(x^{k})\}\right)\geq\varphi(F^{*}-F(x^{0})),

where the second inequality follows from the subadditivity of \varphi(\cdot), and the last inequality is given by the definition of \varphi(\cdot) and F^{*}\preceq_{C}F(x^{k}). Hence

\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty.

(iii) From Proposition 3(iv), we have

v_{\ell}(x^{k})=\ell D_{\omega}(x^{k},x^{k+1})\leq\frac{\ell}{\alpha(\omega)}D_{\omega}(x^{k+1},x^{k}), \qquad (19)

which together with assertion (ii) implies

\lim\limits_{k\rightarrow\infty}v_{\ell}(x^{k})=0.

On the other hand, the boundedness of \{x^{k}\} yields a subsequence \{x^{k}\}_{k\in\mathcal{K}} converging to some accumulation point x^{*}. Next, we prove that x^{*} is a C-stationary point. We distinguish two cases: x^{*}\in\mathrm{int}(\Omega) and x^{*}\in\mathrm{bd}(\Omega). If x^{*}\in\mathrm{int}(\Omega), it follows from the continuity of v_{\ell}(\cdot) that

v_{\ell}(x^{*})=\lim\limits_{k\in\mathcal{K}}v_{\ell}(x^{k})=0.

Therefore, x^{*} is a C-stationary point. If x^{*}\in\mathrm{bd}(\Omega), suppose to the contrary that x^{*} is a non-stationary point. From Remark 1, there exists y_{0}\in\mathrm{int}(\Omega) such that

JF(x^{*})(y_{0}-x^{*})\in-\mathrm{int}(C).

Then, we set

\varepsilon:=\varphi(JF(x^{*})(y_{0}-x^{*}))<0.

From the continuity of JF(\cdot) and the compactness of G, we have

\varphi(JF(x^{k})(y_{0}-x^{k}))\leq\frac{\varepsilon}{2}

for all sufficiently large k\in\mathcal{K}. Then, we have

v_{\ell}(x^{k})=-\inf\limits_{y\in\Omega}\max\limits_{c^{*}\in G}\{\langle c^{*},JF(x^{k})(y-x^{k})\rangle+\ell D_{\omega}(y,x^{k})\}
\geq-t\varphi(JF(x^{k})(y_{0}-x^{k}))-\ell D_{\omega}(x^{k}+t(y_{0}-x^{k}),x^{k})
\geq-\frac{\varepsilon}{2}t-\ell D_{\omega}(x^{k}+t(y_{0}-x^{k}),x^{k})
\geq-\frac{\varepsilon}{2}t-\frac{\ell L_{\omega}}{2}t^{2}\|y_{0}-x^{k}\|^{2}
\geq-\frac{\varepsilon}{2}t-\ell L_{\omega}t^{2}\|y_{0}-x^{*}\|^{2}

for all sufficiently large k\in\mathcal{K} and all t\in(0,1), where the first inequality follows from x^{k}+t(y_{0}-x^{k})\in\Omega, the third inequality is given by the L_{\omega}-Lipschitz continuity of \nabla\omega(\cdot), and the last inequality holds since, for sufficiently large k\in\mathcal{K}, we may assume without loss of generality that \|y_{0}-x^{k}\|\leq\sqrt{2}\|y_{0}-x^{*}\|. Since the above inequalities hold for all t\in(0,1) and \varepsilon<0, there exists t_{0}\in(0,1) such that v_{\ell}(x^{k})\geq-\frac{\varepsilon t_{0}}{4}>0 for all sufficiently large k\in\mathcal{K}. This contradicts v_{\ell}(x^{k})\rightarrow 0. Therefore, x^{*} is a C-stationary point.

(iv) In the proof of assertion (ii), we obtained

\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})\leq\frac{-\varphi(F^{*}-F(x^{0}))}{(1+\alpha(\omega))\ell-L},

which, combined with (19), yields

\sum\limits_{k=0}^{\infty}v_{\ell}(x^{k})\leq\frac{-\ell\varphi(F^{*}-F(x^{0}))}{\alpha(\omega)((1+\alpha(\omega))\ell-L)}.

Then, assertion (iv) holds due to

\sum\limits_{k=0}^{\infty}v_{\ell}(x^{k})\geq\sum\limits_{p=0}^{k-1}v_{\ell}(x^{p})\geq k\min\limits_{0\leq p\leq k-1}v_{\ell}(x^{p}). \qquad\square

Remark 8.

In Theorem 2(iii), if the accumulation point x^{*} lies in \mathrm{bd}(\Omega), the continuity of v_{\ell}(\cdot) and V_{\ell}(\cdot) cannot be invoked because \mathrm{dom}\nabla\omega=\mathrm{int}(\Omega). Recall that when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}, the continuity of d(\cdot), which corresponds to V_{\ell}(\cdot), implies the stationarity of accumulation points 6; vsd. Alternatively, since the gradient of \frac{1}{2}\|\cdot\|^{2} is 1-Lipschitz continuous, the stationarity of accumulation points follows directly from a subsequence \{d(x^{k})\}_{k\in\mathcal{K}}\rightarrow 0. This argument is crucial when the continuity of v_{\ell}(\cdot) and V_{\ell}(\cdot) is unknown. For example, the authors used but did not prove the continuity of d(\cdot) in (dsd, Theorem 3.2). However, the result remains valid, as it can be proved via \{d(x^{k})\}_{k\in\mathcal{K}}\rightarrow 0.

4.1 Strong convergence

In this subsection, we discuss the strong convergence of Algorithm 1 in the convex case. First, we present the following identity, known as the three points lemma.

Lemma 7 (Three points lemma).

tp Assume that a,b,c\in\mathrm{int(dom}\omega). Then the following equality holds:

\langle\nabla\omega(b)-\nabla\omega(a),c-a\rangle=D_{\omega}(c,a)+D_{\omega}(a,b)-D_{\omega}(c,b).

Applying the three points lemma, we can now establish a fundamental inequality for Algorithm 1 with C-convex objective functions.

Lemma 8.

Assume that F(\cdot) is (L,C,e)-smooth and (\mu,C,e)-strongly convex relative to \omega(\cdot) for some L>0 and \mu\geq 0. Let x^{k+1}=V_{\ell}(x^{k}) for some \ell>0. Then, for all x\in\mathrm{int}(\Omega), we have

\langle c_{\ell}(x^{k}),F(x^{k+1})-F(x)\rangle\leq(\ell-\delta\mu)D_{\omega}(x,x^{k})-\ell D_{\omega}(x,x^{k+1})+(L-\ell)D_{\omega}(x^{k+1},x^{k}). \qquad (20)

Proof.

From the (L,C,e)-smoothness of F(\cdot), for all x\in\mathrm{int}(\Omega), we have

F(x^{k+1})\preceq_{C}F(x^{k})+JF(x^{k})(x^{k+1}-x^{k})+LD_{\omega}(x^{k+1},x^{k})e\preceq_{C}F(x)+JF(x^{k})(x^{k+1}-x)-\mu D_{\omega}(x,x^{k})e+LD_{\omega}(x^{k+1},x^{k})e,

where the second inequality is given by the (\mu,C,e)-strong convexity of F(\cdot). Rearranging and multiplying by c_{\ell}(x^{k}), we obtain

\langle c_{\ell}(x^{k}),F(x^{k+1})-F(x)\rangle
\leq\langle c_{\ell}(x^{k}),JF(x^{k})(x^{k+1}-x)\rangle-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
\leq\langle c_{\ell}(x^{k}),JF(x^{k})(x^{k+1}-x^{k})\rangle+\langle c_{\ell}(x^{k}),JF(x^{k})(x^{k}-x)\rangle-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
=-\ell(D_{\omega}(x^{k+1},x^{k})+D_{\omega}(x^{k},x^{k+1}))+\ell\langle\nabla\omega(x^{k})-\nabla\omega(x^{k+1}),x^{k}-x\rangle-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
=-\ell(D_{\omega}(x^{k+1},x^{k})+D_{\omega}(x^{k},x^{k+1}))+\ell(D_{\omega}(x,x^{k})+D_{\omega}(x^{k},x^{k+1})-D_{\omega}(x,x^{k+1}))-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
=(\ell-\delta\mu)D_{\omega}(x,x^{k})-\ell D_{\omega}(x,x^{k+1})+(L-\ell)D_{\omega}(x^{k+1},x^{k}),

where the first equality follows from Proposition 3(iii) and equality (11), and the second equality is given by the three points lemma with a=x^{k}, b=x^{k+1} and c=x. \square

Before presenting the convergence result in the convex case, we recall the following result on nonnegative sequences.

Lemma 9.

(qf, Lemma 2) Let \{a^{k}\} and \{b^{k}\} be nonnegative sequences. Assume that \sum\limits_{k=1}^{\infty}b^{k}<\infty and

a^{k+1}\leq a^{k}+b^{k}.

Then, \{a^{k}\} converges.

We also recall the following assumption.

Assumption 2.

(lc, Assumption H) Assume the Legendre function \omega(\cdot) satisfies

• (i) If \{x^{k}\}\subset\mathrm{int}(\Omega) converges to some point x\in\Omega, then D_{\omega}(x,x^{k})\rightarrow 0.

• (ii) Reciprocally, if x\in\Omega and \{x^{k}\}\subset\mathrm{int}(\Omega) is such that D_{\omega}(x,x^{k})\rightarrow 0, then x^{k}\rightarrow x.

Remark 9.

Note that \omega(\cdot) is differentiable on \mathrm{int}(\Omega), so Assumption 2(i) holds directly for x\in\mathrm{int}(\Omega). On the other hand, if x\in\mathrm{int}(\Omega), Assumption 2(ii) follows from the strict convexity of \omega(\cdot) on \mathrm{int}(\Omega). Hence, Assumption 2 is proposed for x\in\mathrm{bd}(\Omega).

In what follows, we give the convergence result for the convex case.

Theorem 3.

Suppose that the assumptions of Theorem 2 hold and that F(\cdot) is C-convex. Then, we have

• (i) there exists F^{*}\preceq_{C}F(x^{k}) for all k such that \lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*};

• (ii) \sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty;

• (iii) every accumulation point of \{x^{k}\} is a weakly efficient solution; if, in addition, Assumption 2 holds, then \{x^{k}\} converges to some weakly efficient solution x^{*};

• (iv) \langle\bar{c}_{\ell}^{k-1},F(x^{k})-F^{*}\rangle\leq\frac{\ell D_{\omega}(x^{*},x^{0})+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p})}{k}, where \bar{c}_{\ell}^{k-1}=\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p});

• (v) if \ell\geq L, then \langle\bar{c}_{\ell}^{k-1},F(x^{k})-F^{*}\rangle\leq\frac{\ell D_{\omega}(x^{*},x^{0})}{k}.

Proof.

Since all assumptions of Theorem 2 hold, we obtain assertions (i) and (ii).

(iii) Denote X^{*}_{F}=\{x\in\Omega:F(x)=F^{*}\}. From the proof of Theorem 2(iii), there exists an accumulation point x^{*}\in X^{*}_{F} of \{x^{k}\}. Next, we prove that x^{*} is a weakly efficient point. Suppose to the contrary that there exists \hat{x}\in\Omega such that F(\hat{x})\prec_{C}F(x^{*}). Substituting x=\hat{x} and \mu=0 into inequality (20), we have

\langle c_{\ell}(x^{k}),F(x^{k+1})-F(\hat{x})\rangle\leq\ell(D_{\omega}(\hat{x},x^{k})-D_{\omega}(\hat{x},x^{k+1}))+(L-\ell)D_{\omega}(x^{k+1},x^{k}). \qquad (21)

Summing the above inequality over 0 to k1k-1, we obtain

\ell(D_{\omega}(\hat{x},x^{0})-D_{\omega}(\hat{x},x^{k}))+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p})\geq\sum\limits_{p=0}^{k-1}\langle c_{\ell}(x^{p}),F(x^{p+1})-F(\hat{x})\rangle\geq\langle\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{*})-F(\hat{x})\rangle,

where the second inequality follows from F(x^{*})\preceq_{C}F(x^{p+1}). Multiplying by \frac{1}{k}, we obtain

\langle\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{*})-F(\hat{x})\rangle\leq\frac{\ell D_{\omega}(\hat{x},x^{0})+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p})}{k}. \qquad (22)

Since $\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p})\in G$, this together with $F(\hat{x})\prec_{C}F(x^{*})$ implies

1kp=0k1c(xp),F(x)F(x^)δ0,\displaystyle\langle\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{*})-F(\hat{x})\rangle\geq\delta_{0}, (23)

where $\delta_{0}:=\min\limits_{c^{*}\in G}\langle c^{*},F(x^{*})-F(\hat{x})\rangle>0$. However, as $k\rightarrow\infty$, the right-hand side of inequality (22) tends to $0$ because $\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty$. This contradicts inequality (23). Therefore, $x^{*}$ is a weakly efficient point. In what follows, we prove that the whole sequence converges to $x^{*}$. Substituting $x=x^{*}$ and $\mu=0$ into inequality (20), we have

c(xk),F(xk+1)F(x)(Dω(x,xk)Dω(x,xk+1))+(L)Dω(xk+1,xk).\langle c_{\ell}(x^{k}),F(x^{k+1})-F(x^{*})\rangle\leq\ell(D_{\omega}(x^{*},x^{k})-D_{\omega}(x^{*},x^{k+1}))+(L-\ell)D_{\omega}(x^{k+1},x^{k}). (24)

The left-hand side of the above inequality is nonnegative since $F^{*}\preceq_{C}F(x^{k+1})$; hence,

Dω(x,xk+1)Dω(x,xk)+LDω(xk+1,xk).D_{\omega}(x^{*},x^{k+1})\leq D_{\omega}(x^{*},x^{k})+\frac{L-\ell}{\ell}D_{\omega}(x^{k+1},x^{k}).

Combining this with $\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty$ and Lemma 9, we conclude that $\{D_{\omega}(x^{*},x^{k})\}$ converges. On the other hand, $x^{*}$ is an accumulation point of $\{x^{k}\}$, so Assumption 2(i) gives $D_{\omega}(x^{*},x^{k})\rightarrow 0$ along the corresponding subsequence; since the whole sequence $\{D_{\omega}(x^{*},x^{k})\}$ converges, its limit is $0$. Therefore, by Assumption 2(ii), the sequence $\{x^{k}\}$ converges to $x^{*}$.

(iv) Summing inequality (24) over iterations $0$ to $k-1$, we obtain

(Dω(x,x0)Dω(x,xk))+(L)p=0k1Dω(xp+1,xp)\displaystyle\ell(D_{\omega}(x^{*},x^{0})-D_{\omega}(x^{*},x^{k}))+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p}) p=0k1c(xp),F(xp+1)F\displaystyle\geq\langle\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{p+1})-F^{*}\rangle
p=0k1c(xp),F(xk)F,\displaystyle\geq\langle\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{k})-F^{*}\rangle,

where the second inequality follows from the monotonicity $F(x^{p+1})\preceq_{C}F(x^{p})$, which gives $F(x^{k})\preceq_{C}F(x^{p+1})$ for all $p\leq k-1$. Multiplying by $\frac{1}{k}$, we obtain assertion (iv).

(v) If $\ell\geq L$, the second term in the bound of assertion (iv) is nonpositive, so assertion (v) follows directly.\ \ \square
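
To make the scheme concrete, the following is a minimal numerical sketch (illustrative only, not part of the analysis) for a bi-objective instance with $C=\mathbb{R}^{2}_{+}$, $e=(1,1)$, and $\Omega=\mathbb{R}^{2}_{+}$, assuming the update of Algorithm 1 takes the form $x^{k+1}\in\arg\min_{x\in\mathrm{int}(\Omega)}\max_{i}\langle\nabla F_{i}(x^{k}),x-x^{k}\rangle+\ell D_{\omega}(x,x^{k})$ with the Boltzmann-Shannon kernel $\omega$. The subproblem is solved through its concave dual over the simplex weight $\lambda$ (justified by Sion's minimax theorem); the names grads and bregman_step and the choice $\ell=4$ are ours, for illustration only.

import numpy as np
from scipy.optimize import minimize_scalar

A = np.array([1.0, 2.0])  # unconstrained minimizer of F_1
B = np.array([2.0, 1.0])  # unconstrained minimizer of F_2

def grads(x):
    # Gradients of F_1(x) = ||x - A||^2 / 2 and F_2(x) = ||x - B||^2 / 2.
    return x - A, x - B

def bregman_step(xk, ell):
    # One interior Bregman gradient step with the Boltzmann-Shannon kernel:
    # minimize max_i <grad F_i(xk), x - xk> + ell * D_omega(x, xk) over x > 0,
    # solved via its concave dual in the simplex weight lam.
    g1, g2 = grads(xk)
    def neg_dual(lam):
        g = lam * g1 + (1.0 - lam) * g2
        # Inner minimizer in closed form: x_i = xk_i * exp(-g_i / ell);
        # substituting it back yields the dual value below (negated for min).
        return -np.sum(xk * (-g + ell * (1.0 - np.exp(-g / ell))))
    lam = minimize_scalar(neg_dual, bounds=(0.0, 1.0), method='bounded').x
    g = lam * g1 + (1.0 - lam) * g2
    return xk * np.exp(-g / ell)  # multiplicative update: stays interior

x, ell = np.array([0.5, 0.5]), 4.0
for _ in range(200):
    x = bregman_step(x, ell)
print(x)  # approaches the efficient segment joining A and B

By construction the iterates remain in $\mathbb{R}^{2}_{++}$, and a fixed point satisfies $\lambda\nabla F_{1}(x)+(1-\lambda)\nabla F_{2}(x)=0$ for some $\lambda\in[0,1]$, i.e., it lies on the efficient segment joining $(1,2)$ and $(2,1)$.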

4.2 Linear convergence

This subsection is devoted to discussing the linear convergence of Algorithm 1. First, we present the following results under the assumption that $F(\cdot)$ is $(\mu,C,e)$-strongly convex relative to $\omega(\cdot)$ for some $\mu>0$.

Theorem 4.

Suppose that the assumptions of Theorem 2 hold and that $F(\cdot)$ is $(\mu,C,e)$-strongly convex relative to $\omega(\cdot)$ for some $\mu>0$. Then, the following assertions hold:

  • (i)\mathrm{(i)}

    there exists $F^{*}$ with $F^{*}\preceq_{C}F(x^{k})$ for all $k$ such that $\lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*}$.

  • (ii)\mathrm{(ii)}

    k=0Dω(xk+1,xk)<\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty.

  • (iii)\mathrm{(iii)}

    every accumulation point of $\{x^{k}\}$ is a weakly efficient solution; if, in addition, Assumption 2 holds, then $\{x^{k}\}$ converges to some weakly efficient solution $x^{*}$.

  • (iv)\mathrm{(iv)}

    if L\ell\geq L, then Dω(x,xk+1)δμDω(x,xk)D_{\omega}(x^{*},x^{k+1})\leq\frac{\ell-\delta\mu}{\ell}D_{\omega}(x^{*},x^{k}).

Proof.

Since $(\mu,C,e)$-strong convexity relative to $\omega(\cdot)$ is stronger than $C$-convexity, assertions (i)-(iii) follow directly from Theorem 3.

(iv) Substituting x=xx=x^{*} into inequality (20), we have

c(xk),F(xk+1)F(δμ)Dω(x,xk)Dω(x,xk+1)+(L)Dω(xk+1,xk).\langle c_{\ell}(x^{k}),F(x^{k+1})-F^{*}\rangle\leq(\ell-\delta\mu)D_{\omega}(x^{*},x^{k})-\ell D_{\omega}(x^{*},x^{k+1})+(L-\ell)D_{\omega}(x^{k+1},x^{k}).

This, together with $F^{*}\preceq_{C}F(x^{k+1})$ and $\ell\geq L$, yields

Dω(x,xk+1)δμDω(x,xk).D_{\omega}(x^{*},x^{k+1})\leq\frac{\ell-\delta\mu}{\ell}D_{\omega}(x^{*},x^{k}).

\ \ \square
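
Iterating the contraction in assertion (iv) makes the geometric rate explicit; assuming $0<\delta\mu<\ell$ so that the factor lies in $(0,1)$, one obtains

D_{\omega}(x^{*},x^{k})\leq\Big(\frac{\ell-\delta\mu}{\ell}\Big)^{k}D_{\omega}(x^{*},x^{0}),

that is, $D_{\omega}(x^{*},x^{k})=\mathcal{O}(r^{k})$ with $r=\frac{\ell-\delta\mu}{\ell}$, which is the linear rate announced for relatively strongly convex VOPs.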

Next, we show that $\{u_{0}(x^{k})\}$ converges Q-linearly under the vector Bregman-PL inequality.

Theorem 5.

Assume that the vector Bregman-PL inequality (17) holds and that $F(\cdot)$ is $(L,C,e)$-smooth relative to $\omega(\cdot)$ for some $L>0$. If $\ell\geq L$, then the sequence $\{x^{k}\}$ generated by Algorithm 1 satisfies

u0(xk+1)(11τ())u0(xk).u_{0}(x^{k+1})\leq\left(1-\frac{1}{\tau(\ell)}\right)u_{0}(x^{k}).

Proof.

Since F()F(\cdot) is (L,C,e)(L,C,e)-smooth relative to ω()\omega(\cdot) and L\ell\geq L, we have

F(xk+1)CF(xk)+JF(xk)(xk+1xk)+Dω(xk+1,xk)e.F(x^{k+1})\preceq_{C}F(x^{k})+JF(x^{k})(x^{k+1}-x^{k})+\ell D_{\omega}(x^{k+1},x^{k})e.

Then, for all xΩx\in\Omega we obtain

F(xk+1)F(x)CF(xk)F(x)+JF(xk)(xk+1xk)+Dω(xk+1,xk)e.F(x^{k+1})-F(x)\preceq_{C}F(x^{k})-F(x)+JF(x^{k})(x^{k+1}-x^{k})+\ell D_{\omega}(x^{k+1},x^{k})e.

It follows that

supxΩmincGc,F(xk)F(x)\displaystyle\sup\limits_{x\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x^{k})-F(x)\rangle
\displaystyle\geq supxΩmincG{c,F(xk+1)F(x)+JF(xk)(xkxk+1)Dω(xk+1,xk)e}\displaystyle\sup\limits_{x\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},F(x^{k+1})-F(x)+JF(x^{k})(x^{k}-x^{k+1})-\ell D_{\omega}(x^{k+1},x^{k})e\rangle\}
\displaystyle\geq supxΩ{mincGc,F(xk+1)F(x)+mincGc,JF(xk)(xkxk+1)Dω(xk+1,xk)}\displaystyle\sup\limits_{x\in\Omega}\{\min\limits_{c^{*}\in G}\langle c^{*},F(x^{k+1})-F(x)\rangle+\min\limits_{c^{*}\in G}\langle c^{*},JF(x^{k})(x^{k}-x^{k+1})\rangle-\ell D_{\omega}(x^{k+1},x^{k})\}
=\displaystyle= supxΩmincGc,F(xk+1)F(x)+v(xk),\displaystyle\sup\limits_{x\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x^{k+1})-F(x)\rangle+v_{\ell}(x^{k}),

where the equality follows from the definition of $v_{\ell}(x^{k})$, the last two terms being independent of $x$. Hence,

u0(xk+1)\displaystyle u_{0}(x^{k+1}) u0(xk)v(xk)\displaystyle\leq u_{0}(x^{k})-v_{\ell}(x^{k})
(11τ())u0(xk).\displaystyle\leq\left(1-\frac{1}{\tau(\ell)}\right)u_{0}(x^{k}).

Here, the second inequality follows from the vector Bregman-PL inequality (17). From (15), (17) and $\ell\geq L$, we have $\tau(\ell)\geq 1$; then $0<1-\frac{1}{\tau(\ell)}<1$, and we conclude that $\{u_{0}(x^{k})\}$ converges linearly.\ \ \square
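
Unrolling this recursion gives $u_{0}(x^{k})\leq\left(1-\frac{1}{\tau(\ell)}\right)^{k}u_{0}(x^{0})$. Consequently, for any accuracy $\varepsilon\in(0,u_{0}(x^{0}))$, we have $u_{0}(x^{k})\leq\varepsilon$ whenever

k\geq\tau(\ell)\log\frac{u_{0}(x^{0})}{\varepsilon},

using $\log(1-t)\leq-t$ for $t\in(0,1)$; this iteration-complexity bound is a direct consequence of Theorem 5, recorded here for reference.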

5 Conclusions

We extended relative smoothness and relative strong convexity to vector-valued functions in the context of VOPs. We then presented convergence rates for the interior Bregman gradient method under these generalized assumptions. In addition, we devised a vector Bregman-PL inequality, which was used to prove linear convergence of the proposed method. It is worth noting that the convergence results extend those in com to problems without Lipschitz gradient continuity or strong convexity.

References

  • (1) Marler, R. T., Arora, J. S.: Survey of multi-objective optimization methods for engineering. Struct. Multidisc. Optim. 26(6), 369-395 (2004)
  • (2) Tapia, M., Coello, C.: Applications of multi-objective evolutionary algorithms in economics and finance: A survey. IEEE Congress on Evolutionary Computation. 532-539 (2007)
  • (3) Fliege, J., Werner, R.: Robust multiobjective optimization & applications in portfolio optimization. Eur. J. Oper. Res. 234(2), 422-433 (2014)
  • (4) Sener, O., Koltun, V.: Multi-task learning as multiobjective optimization. Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 527-538 (2018)
  • (5) Mahapatra, D., Rajan, V.: Multi-task learning with user preferences: gradient descent with controlled ascent in Pareto optimization. Proc. Int. Conf. Mach. Learn. (ICML), 119, 6597-6601 (2020)
  • (6) Ye, F., Lin, B., Yue, Z., Guo, P., Xiao, Q., Zhang, Y.: Multi-objective meta learning. Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2021
  • (7) Miettinen, K.: Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Boston, Massachusetts (1999)
  • (8) Luc, T. D.: Theory of Vector Optimization, Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, Berlin (1989)
  • (9) John, J.: Vector Optimization: Theory, Applications and Extensions. 2nd ed. Springer, Berlin (2011)
  • (10) Fliege, J., Svaiter, B. F.: Steepest descent methods for multicriteria optimization. Math. Method. Oper. Res. 51(3), 479-494 (2000)
  • (11) Graña Drummond, L. M., Svaiter, B. F.: A steepest descent method for vector optimization problems. J. Comput. Appl. Math. 175, 395-414 (2005)
  • (12) Drummond, L. M. G., Iusem, A. N.: A projected gradient method for vector optimization problems. Comput. Optim. Appl. 28(1), 5-29 (2004)
  • (13) Bonnel, H., Iusem, A. N., Svaiter, B. F.: Proximal methods in vector optimization. SIAM J. Optim. 15(4), 953-970 (2005)
  • (14) Lucambio Pérez, L. R., Prudente, L. F.: Nonlinear conjugate gradient methods for vector optimization. SIAM J. Optim. 28(3), 2690-2720 (2018)
  • (15) Bento, G. C., Cruz Neto, J. X., López, G., Soubeyran, A., Souza, J. C. O.: The proximal point method for locally Lipschitz functions in multiobjective optimization with application to the compromise problem. SIAM J. Optim. 28(2), 1104-1120 (2018)
  • (16) Tanabe, H., Fukuda, E. H., Yamashita, N.: Proximal gradient methods for multiobjective optimization and their applications. Comput. Optim. Appl. 72, 339-361 (2019)
  • (17) Chen, J., Tang, L. P., Yang, X. M.: A Barzilai-Borwein descent method for multiobjective optimization problems. arXiv:2204.08616 (2022)
  • (18) Fliege, J., Vaz, A. I. F., Vicente, L. N.: Complexity of gradient descent for multiobjective optimization. Optim. Methods Softw. 34(5), 949-959 (2019)
  • (19) Tanabe, H., Fukuda, E. H., Yamashita, N.: Convergence rates analysis of a multiobjective proximal gradient method. Optim. Lett. (2022)
  • (20) Bauschke, H. H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330-348 (2016)
  • (21) Lu, H., Freund, R. M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333-354 (2018)
  • (22) Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170, 67-96 (2018)
  • (23) Bauschke, H. H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182, 1068-1087 (2019)
  • (24) Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697-725 (2006)
  • (25) Villacorta, K. D. V., Oliveira, P. R.: An interior proximal method in vector optimization. Eur. J. Oper. Res. 214(3), 485-492 (2011)
  • (26) Chen, Z., Huang, X. X., Yang, X. Q.: Generalized proximal point algorithms for multiobjective optimization problems. Appl. Anal. 90(6), 935-949 (2011)
  • (27) Souza, João Carlos de O., et al.: The proximal point method with a vectorial Bregman regularization in multiobjective DC programming. Optimization 71(1), 263-284 (2022)
  • (28) Ansary, M. A. T., Dutta, J.: A proximal gradient method for multi-objective optimization problems using Bregman functions. Optimization Online (2022)
  • (29) Rockafellar, R. T.: Convex Analysis. Princeton University Press, Princeton (1970)
  • (30) Bauschke, H. H., Borwein, J. M.: Joint and separate convexity of the Bregman distance. In: Butnariu, D., Censor, Y., Reich, S. (eds.) Inherently Parallel Algorithms in Feasibility and Optimization and their Applications (Haifa 2000), pp. 23-36. Elsevier, Amsterdam (2001)
  • (31) Chen, W., Yang, X. M., Zhao, Y.: Conditional gradient method for vector optimization. arXiv:2109.11296 (2022)
  • (32) Rockafellar, R. T., Wets, R. J. B.: Variational Analysis, Grundlehren Math. Wiss. 317, Springer, Berlin (1998)
  • (33) Sion, M.: On general minimax theorems. Pac. J. Math., 8(1), 171-176 (1958)
  • (34) Moudden, M. E., Mouatasim, A. E.: Accelerated diagonal steepest descent method for unconstrained multiobjective optimization. J. Optim. Theory Appl. 188(1), 220-242 (2021)
  • (35) Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538-543 (1993)
  • (36) Polyak, B. T.: Introduction to Optimization. Optimization Software, Inc., New York (1987)