
J. Chen: Department of Mathematics, Shanghai University, Shanghai 200444, China. [email protected]

L.P. Tang: National Center for Applied Mathematics of Chongqing, Chongqing 401331, China. [email protected]

X.M. Yang (corresponding author): National Center for Applied Mathematics of Chongqing, and School of Mathematical Sciences, Chongqing Normal University, Chongqing 401331, China. [email protected]

Convergence rates analysis of Interior Bregman Gradient Method for Vector Optimization Problems

Jian Chen    Liping Tang    Xinmin Yang
(Received: date / Accepted: date)
Abstract

In recent years, by using the Bregman distance, Lipschitz gradient continuity and strong convexity were lifted and replaced by relative smoothness and relative strong convexity. Under these mild assumptions, it was proved that gradient methods with Bregman regularity converge linearly for single-objective optimization problems (SOPs). In this paper, we extend relative smoothness and relative strong convexity to vector-valued functions and analyze the convergence of an interior Bregman gradient method for vector optimization problems (VOPs). Specifically, the global convergence rates are \mathcal{O}(\frac{1}{k}) and \mathcal{O}(r^{k}) (0<r<1) for convex and relative strongly convex VOPs, respectively. Moreover, the proposed method converges linearly for VOPs that satisfy a vector Bregman-PL inequality.

Keywords:
Vector optimization · Bregman distance · Relative smoothness · Relative strong convexity · Linear convergence
MSC:
65K05 90C26 90C29 90C30

1 Introduction

Let C\subset\mathbb{R}^{m} be a closed, convex, and pointed cone with a non-empty interior. The partial order induced by C is given by

y\preceq_{C}(\mathrm{resp.}\prec_{C})\,y^{\prime}\ \Leftrightarrow\ y^{\prime}-y\in C\ (\mathrm{resp.}\ \mathrm{int}(C)).

In this paper, we consider the following vector optimization problem

\min\limits_{x\in\Omega}F(x), \qquad (VOP)

where every component F_{i}:\mathbb{R}^{n}\rightarrow\mathbb{R} of F:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m} is differentiable, and \Omega\subset\mathbb{R}^{n} is closed and convex with a non-empty interior. For the optimal solutions of (VOP), none of the objectives can be improved without sacrificing the others; the concept of optimality is thus defined as efficiency. It is worth noting that (VOP) corresponds to a multiobjective optimization problem (MOP) when C=\mathbb{R}^{m}_{+}, where \mathbb{R}^{m}_{+} is the non-negative orthant of \mathbb{R}^{m}. Some applications of multiobjective optimization problems can be found in engineering 1a, economics 1b; eco, and machine learning mac1; m2; mac2, etc.
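For a concrete illustration (our example, not taken from the original): take n=1, m=2, C=\mathbb{R}^{2}_{+}, \Omega=[0,1] and

F(x)=(x^{2},(x-1)^{2}).

On [0,1] the first objective decreases and the second increases as x moves toward 0, so neither objective can be improved without sacrificing the other, and every point of [0,1] is efficient.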

Scalarization approaches mnp; luc; joh are widely explored for VOPs; they convert a VOP into a single-objective optimization problem so that classical numerical optimization methods can be applied to obtain solutions of the original problem. Unfortunately, these approaches burden the decision-maker with parameters that are unknown in advance. To overcome this drawback, Fliege and Svaiter 6 invented a parameter-free method, called the steepest descent method, for multiobjective optimization. The corresponding method for VOPs is described in vsd. Motivated by the work of Fliege and Svaiter, some standard first-order methods have been extended to solve MOPs and VOPs 8; 9a; 9b; 9c; 9d; chen. The main idea of these methods is to compute a descent direction; then a line search is performed along this direction to ensure a sufficient decrease of all objective functions. With this strategy, these descent methods produce a sequence of points that enjoys global convergence. Very recently, it was proved that the convergence rates of the multiobjective gradient descent method com and proximal gradient method pcom are the same as the ones for SOPs.

Similar to first-order methods for SOPs, a widely used assumption for MOPs is that the gradients of the objective functions are globally Lipschitz continuous, which guarantees a sufficient descent in each iteration. By using the Bregman distance, this restrictive assumption was replaced by the Lipschitz-like condition lc, or relative smoothness rs, for SOPs. On the other hand, relative strong convexity was proposed to replace strong convexity in rs. Under these mild assumptions, linear convergence results were established for the gradient rs and proximal gradient as methods with Bregman regularization. Moreover, linear convergence of the gradient method was obtained under a Bregman error bound condition nlns. Auslender and Teboulle ip presented another advantage of Bregman methods: the produced sequence can be kept in the interior of the feasible set by choosing a suitable Bregman distance, thus eliminating the constraints.

Bregman gradient methods have also been explored for VOPs. Villacorta and Oliveira mip developed an interior Bregman gradient method with a modified convergence sensing condition for convex VOPs with a non-polyhedral feasible set. In cz, Chen et al. introduced the vector-valued Bregman function for MOPs, and the corresponding vector-valued Bregman distance replaced the Euclidean distance in the proximal point algorithm (PPA). Very recently, the PPA with vector-valued Bregman distance was applied to solve multiobjective DC programming in 2022. Moreover, the proximal gradient method with Bregman functions for MOPs has been investigated by Ansary and Dutta 2022a.

It should be noted that convergence analysis exploiting the appealing potential of the Bregman distance was not discussed in the methods mentioned above. Naturally, a question arises: can we obtain linear convergence results of the Bregman gradient method for VOPs without Lipschitz gradient continuity and strong convexity? The paper's main contribution is to answer this question positively. More specifically:

• We extend relative smoothness and relative strong convexity to vector-valued functions in the context of VOPs;

• We present two merit functions and analyze their properties, such as optimality, continuity, and interrelation. We also propose a generalized vector Bregman-PL inequality, which will be used to prove the linear convergence of the proposed method.

• We propose an interior Bregman gradient method for VOPs that keeps the produced sequence inside the feasible set. It is proved that every accumulation point of the produced sequence is a C-stationary point of (VOP), and that the whole sequence converges to a weakly efficient solution when the objective function is C-convex.

• With relative smooth and relative strongly convex objective functions, we prove that the produced sequence converges linearly to a weakly efficient point in the sense of the Bregman distance. We also prove that a merit function converges linearly under the vector Bregman-PL inequality.

The organization of the paper is as follows. Some notations and definitions are given in Sect. 2 for our later use. Sect. 3 discusses the merit functions for VOPs. Sect. 4 is devoted to introducing the interior Bregman gradient method and proving the convergence results for the proposed method. At the end of the paper, some conclusions are drawn.

2 Preliminaries

2.1 Notations

• [m]=\{1,2,\dots,m\}.

• \Delta^{+}_{m}=\left\{\lambda:\sum\limits_{i\in[m]}\lambda_{i}=1,\ \lambda_{i}>0,\ i\in[m]\right\} the relative interior of the m-dimensional unit simplex.

• \|\cdot\| the Euclidean norm, \langle\cdot,\cdot\rangle the inner product.

• \mathrm{int}(\cdot) and \mathrm{cl}(\cdot) the interior and the closure of a set, respectively.

• \mathrm{conv}(\cdot) the convex hull of a set.

• B[x,r] the closed ball centred at x with radius r.

• JF(x)\in\mathbb{R}^{m\times n} and \nabla F_{i}(x)\in\mathbb{R}^{n} the Jacobian matrix of F and the gradient of F_{i} at x, respectively.

• C^{*}=\{c^{*}\in\mathbb{R}^{m}:\langle c^{*},c\rangle\geq 0,\ \forall c\in C\} the positive polar cone of C.

• G=\mathrm{conv}(\{c^{*}\in C^{*}:\|c^{*}\|=1\}).

• \omega^{*}(x^{*})=\sup\limits_{x\in\mathbb{R}^{n}}\{\langle x^{*},x\rangle-\omega(x)\} the conjugate function of \omega.

2.2 Vector optimization

Recall some definitions of solutions to (VOP) as follows.

Definition 1.

joh A vector x^{*}\in\Omega is called an efficient solution of (VOP) if there exists no x\in\Omega such that F(x)\preceq_{C}F(x^{\ast}) and F(x)\neq F(x^{\ast}).

Definition 2.

joh A vector x^{*}\in\Omega is called a weakly efficient solution of (VOP) if there exists no x\in\Omega such that F(x)\prec_{C}F(x^{\ast}).

Definition 3.

vsd A vector x^{\ast}\in\Omega is called a C-stationary point of (VOP) if

JF(x^{*})(\Omega-x^{*})\cap(-\mathrm{int}(C))=\emptyset.

For a non-stationary point xx, there exists a descent direction that is defined as:

Definition 4.

vsd A vector d\in\mathbb{R}^{n} is called a C-descent direction for F at x if

JF(x)d\in-\mathrm{int}(C).

Remark 1.

If x\in\Omega is a non-stationary point, then there exists a vector y\in\Omega such that JF(x)(y-x)\in-\mathrm{int}(C). Since \mathrm{int}(C) is an open set, the relation also holds for some y\in\mathrm{int}(\Omega). It follows from Definition 4 that y-x is a C-descent direction.

Definition 5.

joh A function F(\cdot) is called C-convex on \Omega if

F(\lambda x+(1-\lambda)y)\preceq_{C}\lambda F(x)+(1-\lambda)F(y),\ \forall x,y\in\Omega,\ \lambda\in[0,1].

Since F(\cdot) is differentiable, the C-convexity of F(\cdot) on \Omega is equivalent to

JF(x)(y-x)\preceq_{C}F(y)-F(x),\ \forall x,y\in\Omega.

By using the positive polar cone C^{*}, an equivalent characterization of the C-convexity of F(\cdot) on \Omega is presented as follows.

Lemma 1.

luc A function F(\cdot) is C-convex on \Omega if and only if \langle c^{*},F(\cdot)\rangle is convex on \Omega for all c^{*}\in C^{*}.

Remark 2.

Note that \frac{c^{*}}{\|c^{*}\|}\in G for all c^{*}\in C^{*}\setminus\{0\}; hence F(\cdot) is C-convex on \Omega if and only if \langle c^{*},F(\cdot)\rangle is convex on \Omega for all c^{*}\in G.

In the C-convex case, we obtain the equivalence between C-stationary points and weakly efficient solutions.

Lemma 2.

joh Assume that the objective function F(\cdot) is C-convex on \Omega. Then x^{*}\in\Omega is a C-stationary point of (VOP) if and only if x^{*} is a weakly efficient solution of (VOP).

2.3 Relative smoothness and relative strong convexity

Definition 6 (Legendre function).

roc Let \omega:X\rightarrow(-\infty,+\infty] be a proper closed convex function, where X\subset\mathbb{R}^{n}. Then

• (i) \omega(\cdot) is essentially smooth if \omega(\cdot) is differentiable on \mathrm{int(dom}\omega) and \lim\limits_{k\rightarrow\infty}\|\nabla\omega(x^{k})\|=\infty for every sequence \{x^{k}\}\subset\mathrm{int(dom}\omega) converging to a boundary point of \mathrm{dom}\omega;

• (ii) \omega(\cdot) is a Legendre function if \omega(\cdot) is essentially smooth and strictly convex on \mathrm{int(dom}\omega).

Lemma 3.

roc For a Legendre function \omega(\cdot), we have the following useful properties.

• (i) \omega(\cdot) is Legendre if and only if its conjugate \omega^{*} is Legendre.

• (ii) \omega^{*}(\nabla\omega(x))=\langle x,\nabla\omega(x)\rangle-\omega(x).

• (iii) (\nabla\omega(\cdot))^{-1}=\nabla\omega^{*}(\cdot).

• (iv) \mathrm{dom}\partial\omega=\mathrm{int(dom}\omega) with \partial\omega(x)=\{\nabla\omega(x)\} for all x\in\mathrm{int(dom}\omega).

Using the Legendre function \omega(\cdot), the Bregman distance D_{\omega}(\cdot,\cdot) w.r.t. \omega(\cdot) is defined as

D_{\omega}(x,y)=\omega(x)-\omega(y)-\langle\nabla\omega(y),x-y\rangle,\ \forall(x,y)\in\mathrm{dom}(\omega)\times\mathrm{int(dom}\omega).

Since \omega(\cdot) is strictly convex on \mathrm{int(dom}\omega), we have D_{\omega}(x,y)\geq 0 for all x,y\in\mathrm{int(dom}\omega), and the equality holds if and only if x=y. In what follows, we recall a basic property of D_{\omega}(\cdot,\cdot) that will be used in the sequel:

D_{\omega}(x,y)=D_{\omega^{*}}(\nabla\omega(y),\nabla\omega(x)),\ \forall x,y\in\mathrm{int(dom}\omega). \qquad (1)
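A standard example (recalled here for illustration) is the Boltzmann–Shannon entropy \omega(x)=\sum_{i=1}^{n}(x_{i}\log x_{i}-x_{i}) with \mathrm{dom}\,\omega=\mathbb{R}^{n}_{+} (and 0\log 0:=0), a supercoercive Legendre function with \mathrm{int(dom}\omega)=\mathbb{R}^{n}_{++}; the associated Bregman distance is the Kullback–Leibler divergence

D_{\omega}(x,y)=\sum_{i=1}^{n}\left(x_{i}\log\frac{x_{i}}{y_{i}}-x_{i}+y_{i}\right).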

In general, D_{\omega}(\cdot,\cdot) is not symmetric. The measure of symmetry for D_{\omega}(\cdot,\cdot) was introduced in lc. The symmetric coefficient is given by

\alpha(\omega):=\inf\left\{\frac{D_{\omega}(x,y)}{D_{\omega}(y,x)}:x,y\in\mathrm{int(dom}\omega),\ x\neq y\right\}\in[0,1].
Remark 3.

As noted in lc, we call D_{\omega}(\cdot,\cdot) symmetric if \alpha(\omega)=1. If \alpha(\omega)=0, then symmetry is absent for D_{\omega}(\cdot,\cdot). The former happens when \omega is a strictly convex quadratic function, which was deduced from (sym, Lemma 3.16). As shown in (lc, Proposition 2), if \mathrm{dom}\omega is not open, then \alpha(\omega)=0.

In the context of vector optimization, we propose the smoothness of F(\cdot) relative to a Legendre function \omega(\cdot).

Definition 7.

For a vector e\in\mathrm{int}C, we say F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega) if there exists L>0 such that, for all x,y\in\mathrm{int}\Omega,

F(y)\preceq_{C}F(x)+JF(x)(y-x)+LD_{\omega}(y,x)e.
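To illustrate Definition 7 (our example, not from the original): let C=\mathbb{R}^{2}_{+}, e=(1,1) and \omega(x)=\frac{1}{4}\|x\|^{4}+\frac{1}{2}\|x\|^{2}. Then F(x)=(\frac{1}{4}\|x\|^{4},\frac{1}{2}\|x\|^{2}) is (1,C,e)-smooth relative to \omega(\cdot) on \mathbb{R}^{n}, since \omega(\cdot)-F_{1}(\cdot)=\frac{1}{2}\|\cdot\|^{2} and \omega(\cdot)-F_{2}(\cdot)=\frac{1}{4}\|\cdot\|^{4} are convex (cf. Proposition 1(i) below), even though \nabla F_{1} is not globally Lipschitz continuous.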

The following proposition presents some properties for relative smoothness.

Proposition 1.

Assume F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega), then

• (i) this is equivalent to the C-convexity of L\omega(\cdot)e-F(\cdot) on \Omega;

• (ii) for any c\in\mathrm{int}C, F(\cdot) is (\gamma L,C,c)-smooth relative to \omega(\cdot) on \Omega, where \gamma=\max\limits_{c^{*}\in G}\left\{\frac{\langle c^{*},e\rangle}{\langle c^{*},c\rangle}\right\}.

Proof.

(i) From Remark 2, the C-convexity of L\omega(\cdot)e-F(\cdot) on \Omega is equivalent to the convexity of \langle c^{*},L\omega(\cdot)e-F(\cdot)\rangle on \Omega for all c^{*}\in G, i.e.,

\langle c^{*},LD_{\omega}(y,x)e-F(y)+F(x)+JF(x)(y-x)\rangle\geq 0,\ \forall x,y\in\mathrm{int}(\Omega),\ c^{*}\in G.

This is equivalent to F(y)\preceq_{C}F(x)+JF(x)(y-x)+LD_{\omega}(y,x)e, and assertion (i) is proved.

(ii) From assertion (i), \langle c^{*},L\omega(\cdot)e-F(\cdot)\rangle is convex on \Omega for all c^{*}\in G. Since \omega(\cdot) is convex, for any c\in\mathrm{int}C, \langle c^{*},\bar{L}\omega(\cdot)c-F(\cdot)\rangle is convex on \Omega for all c^{*}\in G whenever \bar{L}\langle c^{*},c\rangle\geq L\langle c^{*},e\rangle holds for all c^{*}\in G. The latter holds by setting \bar{L}=\max\limits_{c^{*}\in G}\left\{\frac{\langle c^{*},e\rangle}{\langle c^{*},c\rangle}\right\}L, and the maximum is well-defined due to c\in\mathrm{int}C and the compactness of G. \square

We also propose the strong convexity of F(\cdot) relative to a Legendre function \omega(\cdot).

Definition 8.

For a vector e\in\mathrm{int}C, we say F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega) if there exists \mu\geq 0 such that, for all x,y\in\mathrm{int}\Omega,

F(x)+JF(x)(y-x)+\mu D_{\omega}(y,x)e\preceq_{C}F(y).

Similarly, we present some properties for relative strong convexity as follows.

Proposition 2.

Assume F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) on \Omega\subseteq\mathrm{dom}(\omega), then

• (i) this is equivalent to the C-convexity of F(\cdot)-\mu\omega(\cdot)e on \Omega;

• (ii) for any c\in\mathrm{int}C, F(\cdot) is (\rho\mu,C,c)-strongly convex relative to \omega(\cdot) on \Omega, where \rho=\min\limits_{c^{*}\in G}\left\{\frac{\langle c^{*},e\rangle}{\langle c^{*},c\rangle}\right\}.

Proof.

The proof is similar to the arguments used in Proposition 1, so we omit it here. \square

Remark 4.

When \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}, the (L,C,e)-smoothness reduces to condition (A) in cw. If C=\mathbb{R}_{+}, the (L,C,e)-smoothness and (\mu,C,e)-strong convexity correspond to relative L-smoothness and relative \mu-strong convexity in rs, respectively.

From the above two propositions, the constants of relative smoothness and relative strong convexity depend on \omega(\cdot), F(\cdot) and e. Recall that the quotient \frac{L}{\mu} plays a key role in the geometric convergence of first-order methods in the presence of strong convexity. If \frac{L}{\mu} is large, the objective decreases slowly when first-order methods are applied, and the function is then said to be ill-conditioned. To alleviate this dilemma, for general \omega(\cdot) and F(\cdot), the vector e is selected as:

e:=\mathop{\arg\max}\limits_{c\in B[0,r]}\min\limits_{c^{*}\in G}\langle c^{*},c\rangle. \qquad (2)

When C=\mathbb{R}_{+}^{m}, the vector e defined in (2) corresponds to (\frac{r}{\sqrt{m}},\dots,\frac{r}{\sqrt{m}}). For simplicity, we select a suitable r>0 such that

\max\limits_{c^{*}\in G}\langle c^{*},e\rangle=1. \qquad (3)

On the other hand, since C^{*}\subset\mathbb{R}^{m} is a closed, convex, and pointed cone, we have 0\notin G. Then we define

\delta:=\min\limits_{c^{*}\in G}\langle c^{*},e\rangle>0. \qquad (4)
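For illustration, consider C=\mathbb{R}^{m}_{+} (a direct computation, not taken from the original). Then C^{*}=\mathbb{R}^{m}_{+}, and e=(\frac{1}{\sqrt{m}},\dots,\frac{1}{\sqrt{m}}) satisfies (3), since \langle c^{*},e\rangle\leq\|c^{*}\|\|e\|=1 for all c^{*}\in G with equality at c^{*}=e. Moreover, the minimum in (4) is attained at an extreme point of G, i.e., at some unit vector c^{*}\geq 0, and since \sum_{i}c^{*}_{i}\geq\|c^{*}\|=1 with equality at the standard basis vectors, we get \delta=\frac{1}{\sqrt{m}}.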

3 Merit Functions for VOPs

A function is called a merit function for (VOP) if it returns 0 at the solutions of (VOP) and strictly positive values otherwise. In this section, we first present the following two merit functions:

u_{0}(x):=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(y)\rangle, \qquad (5)
v_{\ell}(x):=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}, \qquad (6)

where \ell>0 is a constant and \mathrm{cl}(\mathrm{dom}\omega)=\Omega.
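As a sanity check (noted here for illustration), when m=1, C=\mathbb{R}_{+} and hence G=\{1\}, the two functions reduce to

u_{0}(x)=F(x)-\inf\limits_{y\in\Omega}F(y),\qquad v_{\ell}(x)=\sup\limits_{y\in\Omega}\{\langle\nabla F(x),x-y\rangle-\ell D_{\omega}(y,x)\},

so u_{0} measures the objective gap, while v_{\ell} can be viewed as a Bregman analogue of the squared gradient norm; indeed, for \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} and \Omega=\mathbb{R}^{n} one gets v_{\ell}(x)=\frac{1}{2\ell}\|\nabla F(x)\|^{2}.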

3.1 Properties of merit functions

We can show that u_{0}(\cdot) and v_{\ell}(\cdot) are merit functions in the sense of weak efficiency and stationarity, respectively.

Theorem 1.

Let u_{0}(\cdot) and v_{\ell}(\cdot) be defined as in (5) and (6), respectively. Then

• (i) the point x\in\Omega is a weakly efficient solution of (VOP) if and only if u_{0}(x)=0;

• (ii) the point x\in\mathrm{int}(\Omega) is a C-stationary point of (VOP) if and only if v_{\ell}(x)=0.

Proof.

(i) Suppose x is a weakly efficient solution of (VOP). Then we have

F(x)-F(y)\notin\mathrm{int}(C),\ \forall y\in\Omega,

and therefore

\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(y)\rangle\leq 0,\ \forall y\in\Omega,

which is equivalent to

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(y)\rangle\leq 0.

On the other hand, the definition of u_{0}(\cdot) implies that

u_{0}(x)\geq\min\limits_{c^{*}\in G}\langle c^{*},F(x)-F(x)\rangle=0.

Hence u_{0}(x)=0. Since each of the above relations is in fact an equivalence, the converse direction follows by the same argument.

(ii) Suppose x is a C-stationary point of (VOP). Then we have

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y)\rangle\leq 0,

which, together with the fact that D_{\omega}(y,x)\geq 0, yields

v_{\ell}(x)=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}\leq 0.

On the other hand, the definition of v_{\ell} implies that

v_{\ell}(x)\geq\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-x)\rangle-\ell D_{\omega}(x,x)\}=0.

Hence, v_{\ell}(x)=0.

Conversely, suppose that v_{\ell}(x)=0 but x is not C-stationary. Then, there exists y_{0}\in\Omega such that

\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y_{0})\rangle>0.

For all t\in[0,1], x+t(y_{0}-x)\in\Omega. Hence,

v_{\ell}(x)=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}\geq\min\limits_{c^{*}\in G}\{\langle c^{*},tJF(x)(x-y_{0})\rangle-\ell D_{\omega}(x+t(y_{0}-x),x)\}.

On the other hand, the differentiability of \omega(\cdot) implies \lim\limits_{t\rightarrow 0}\frac{D_{\omega}(x+t(y_{0}-x),x)}{t}=0. This, together with the above inequality and \min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y_{0})\rangle>0, yields v_{\ell}(x)>0, contradicting v_{\ell}(x)=0. The proof is completed. \square

We denote by V_{\ell}(x) the optimal solution of the problem defining v_{\ell}(x), i.e.,

V_{\ell}(x):=\mathop{\arg\sup}\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}. \qquad (7)

In the sequel, we systematically assume that V_{\ell}(\cdot) is well-defined on \mathrm{int(dom}\omega), i.e., V_{\ell}(\cdot) is a single-valued mapping from \mathrm{int(dom}\omega) to \mathrm{int(dom}\omega). We give a sufficient condition for this assumption.

Lemma 4.

If \omega(\cdot) is supercoercive, i.e.,

\lim\limits_{\|x\|\rightarrow\infty}\frac{\omega(x)}{\|x\|}=\infty,

then V_{\ell}(\cdot) is a single-valued mapping from \mathrm{int(dom}\omega) to \mathrm{int(dom}\omega).

Proof.

Fix a point x\in\mathrm{int(dom}\omega) and \ell>0, and define the function P_{\ell}(\cdot) as

P_{\ell}(u)=\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle-\ell D_{\omega}(u,x)\}. \qquad (8)

Then V_{\ell}(x)=\mathop{\arg\sup}\limits_{u\in\Omega}P_{\ell}(u). Substituting the definition of the Bregman distance into (8), we have

P_{\ell}(u)=\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle-\ell(\omega(u)-\omega(x)-\nabla\omega(x)(u-x))\}=\|u\|\left(\frac{\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle\}}{\|u\|}-\frac{\ell\omega(u)}{\|u\|}+\frac{\ell(\omega(x)+\nabla\omega(x)(u-x))}{\|u\|}\right).

As \|u\|\rightarrow\infty, the term \frac{\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-u)\rangle\}}{\|u\|} is bounded due to the compactness of G. This, combined with the boundedness of \frac{\ell(\omega(x)+\nabla\omega(x)(u-x))}{\|u\|} and the supercoercivity of \omega(\cdot), implies \lim\limits_{\|u\|\rightarrow\infty}P_{\ell}(u)=-\infty. Hence, P_{\ell}(\cdot) is level-bounded on \mathbb{R}^{n}. It follows from the continuity of P_{\ell}(\cdot) and Weierstrass's theorem (rocv, Theorem 1.9) that V_{\ell}(x) is nonempty. The uniqueness of V_{\ell}(x) is given by the strict convexity of \omega(\cdot). From the optimality condition for V_{\ell}(x), it follows that \partial\omega(V_{\ell}(x)) is nonempty. Then Lemma 3(iv) implies that V_{\ell}(x)\in\mathrm{int(dom}\omega). \square

Remark 5.

Recall that \mathrm{cl}(\mathrm{dom}\omega)=\Omega, so \mathrm{int(dom}\omega)=\mathrm{int}(\Omega). From Lemma 4, if \omega(\cdot) is supercoercive, then V_{\ell}(\cdot) is a single-valued mapping from \mathrm{int}(\Omega) to \mathrm{int}(\Omega).

Proposition 3.

For all \ell>0, let v_{\ell}(\cdot) and V_{\ell}(\cdot) be defined as in (6) and (7), respectively. Then

• (i) there exists c_{\ell}(x)\in G such that

V_{\ell}(x)=\mathop{\arg\sup}\limits_{y\in\Omega}\{\langle c_{\ell}(x),JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}, \qquad (9)

and

c_{\ell}(x)\in\mathop{\arg\min}\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-V_{\ell}(x))\rangle; \qquad (10)

• (ii) V_{\ell}(x)=\nabla\omega^{*}(\nabla\omega(x)-\frac{1}{\ell}JF(x)^{T}c_{\ell}(x)),\ \forall x\in\mathrm{int}(\Omega);

• (iii) \langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle=\ell(D_{\omega}(x,V_{\ell}(x))+D_{\omega}(V_{\ell}(x),x)),\ \forall x\in\mathrm{int}(\Omega);

• (iv) v_{\ell}(x)=\ell D_{\omega}(x,V_{\ell}(x)),\ \forall x\in\mathrm{int}(\Omega).

Proof.

(i) Note that \Omega is convex, G is compact and convex, and \langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x) is convex in c^{*} and concave in y. Therefore, it follows from Sion's minimax theorem 32 that there exists c_{\ell}(x)\in G such that

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}=\min\limits_{c^{*}\in G}\sup\limits_{y\in\Omega}\{\langle c^{*},JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}=\langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle-\ell D_{\omega}(V_{\ell}(x),x),

with

V_{\ell}(x)\in\mathop{\arg\sup}\limits_{y\in\Omega}\{\langle c_{\ell}(x),JF(x)(x-y)\rangle-\ell D_{\omega}(y,x)\}

and

c_{\ell}(x)\in\mathop{\arg\min}\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-V_{\ell}(x))\rangle.

Hence, we obtain (10), and (9) follows from the strict convexity of \omega(\cdot).

(ii) From the optimality condition for (9) and the fact that V_{\ell}(x)\in\mathrm{int}(\Omega), we have

-JF(x)^{T}c_{\ell}(x)-\ell(\nabla\omega(V_{\ell}(x))-\nabla\omega(x))=0. \qquad (11)

Then, assertion (ii) follows from the fact that (\nabla\omega(\cdot))^{-1}=\nabla\omega^{*}(\cdot).

(iii) From (11), we have

\langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle=\ell\langle\nabla\omega(V_{\ell}(x))-\nabla\omega(x),V_{\ell}(x)-x\rangle=\ell(D_{\omega}(V_{\ell}(x),x)+D_{\omega}(x,V_{\ell}(x))).

Hence, we obtain the desired result.

(iv) Using (9) and the fact that V_{\ell}(x) is the unique solution of the problem defining v_{\ell}(x), we obtain

v_{\ell}(x)=\langle c_{\ell}(x),JF(x)(x-V_{\ell}(x))\rangle-\ell D_{\omega}(V_{\ell}(x),x)=\ell(D_{\omega}(V_{\ell}(x),x)+D_{\omega}(x,V_{\ell}(x)))-\ell D_{\omega}(V_{\ell}(x),x)=\ell D_{\omega}(x,V_{\ell}(x)),

where the second equality is given by assertion (iii). The proof is completed. \square
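As a special case (noted for illustration, not claimed in the original), when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} and \Omega=\mathbb{R}^{n}, we have \nabla\omega=\nabla\omega^{*}=\mathrm{id}, so Proposition 3(ii) reduces to the explicit update V_{\ell}(x)=x-\frac{1}{\ell}JF(x)^{T}c_{\ell}(x), i.e., a step along an aggregated negative-gradient direction; this recovers, up to scaling of the dual weights, the steepest descent direction for vector optimization of 6; vsd.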

We can also show the continuity of v_{\ell}(\cdot) and V_{\ell}(\cdot).

Proposition 4.

For all \ell>0, v_{\ell}(\cdot) and V_{\ell}(\cdot) are continuous on \mathrm{int}(\Omega).

Proof.

From Proposition 3(iv), it is sufficient to prove the continuity of V_{\ell}(\cdot). For all x\in\mathrm{int}(\Omega), the optimality condition (11) gives

\ell\langle\nabla\omega(x)-\nabla\omega(V_{\ell}(x)),z-V_{\ell}(x)\rangle=\langle JF(x)^{T}c_{\ell}(x),z-V_{\ell}(x)\rangle,\ \forall z\in\Omega.

Let \bar{x}\in\mathrm{int}(\Omega) and let \{x_{k}\}\subset\mathrm{int}(\Omega) be a sequence satisfying x_{k}\rightarrow\bar{x}. Substituting (x,z)=(\bar{x},V_{\ell}(x_{k})) and (x,z)=(x_{k},V_{\ell}(\bar{x})) into the above equality, respectively, and summing up, we have

\ell\langle\nabla\omega(x_{k})-\nabla\omega(\bar{x})+\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle
=\langle JF(x_{k})^{T}c_{\ell}(x_{k})-JF(\bar{x})^{T}c_{\ell}(\bar{x}),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle
=\langle JF(x_{k})^{T}c_{\ell}(x_{k}),x_{k}-V_{\ell}(x_{k})\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),\bar{x}-V_{\ell}(\bar{x})\rangle+\langle JF(x_{k})^{T}c_{\ell}(x_{k}),V_{\ell}(\bar{x})-x_{k}\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),V_{\ell}(x_{k})-\bar{x}\rangle
\leq\langle JF(x_{k})^{T}c_{\ell}(\bar{x}),x_{k}-V_{\ell}(x_{k})\rangle+\langle JF(\bar{x})^{T}c_{\ell}(x_{k}),\bar{x}-V_{\ell}(\bar{x})\rangle+\langle JF(x_{k})^{T}c_{\ell}(x_{k}),V_{\ell}(\bar{x})-x_{k}\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),V_{\ell}(x_{k})-\bar{x}\rangle
=\langle(JF(x_{k})-JF(\bar{x}))^{T}c_{\ell}(\bar{x}),x_{k}-V_{\ell}(x_{k})\rangle+\langle(JF(\bar{x})-JF(x_{k}))^{T}c_{\ell}(x_{k}),\bar{x}-V_{\ell}(\bar{x})\rangle+\langle JF(x_{k})^{T}c_{\ell}(x_{k}),\bar{x}-x_{k}\rangle+\langle JF(\bar{x})^{T}c_{\ell}(\bar{x}),x_{k}-\bar{x}\rangle,

where the inequality is given by (10). For simplicity, we denote the right-hand side of the above inequality by RHS. Then, we have

\ell\langle\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle\leq RHS+\ell\langle\nabla\omega(\bar{x})-\nabla\omega(x_{k}),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle.

Since \omega(\cdot) is strictly convex on \mathrm{int}(\Omega), we have

\langle\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle\geq 0.

On the other hand, since \nabla\omega(\cdot) and JF(\cdot) are continuous and x_{k}\rightarrow\bar{x}, both RHS and \nabla\omega(\bar{x})-\nabla\omega(x_{k}) tend to 0. Then \langle\nabla\omega(V_{\ell}(\bar{x}))-\nabla\omega(V_{\ell}(x_{k})),V_{\ell}(\bar{x})-V_{\ell}(x_{k})\rangle tends to 0. It follows from the strict convexity of \omega that V_{\ell}(x_{k})\rightarrow V_{\ell}(\bar{x}). Hence, the continuity of V_{\ell}(\cdot) follows from the arbitrariness of \{x_{k}\} and \bar{x}\in\mathrm{int}(\Omega). \square

3.2 Vector Bregman-PL inequality

In this subsection, a novel vector Bregman-PL inequality is derived via the merit functions. We first present the connection between D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*})) and D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})) for some \ell_{1},\ell_{2}>0, which will be used to establish the connection between v_{\ell_{1}}(x) and v_{\ell_{2}}(x).

Lemma 5.

Let \ell_{1},\ell_{2}>0, c^{*}\in G and x\in\mathrm{int}(\Omega). If \ell_{1}\geq\ell_{2}, we have

D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*}))\geq D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})). \qquad (12)

Proof.

The proof is similar to the arguments in the proof of (nlns, Lemma 3.5), so we omit it here. \square

Note that it is not clear whether the inequality holds for 0<\ell_{1}<\ell_{2} in Lemma 5. To overcome this difficulty, we present the following assumption.

Assumption 1.

Let \ell_{1},\ell_{2}>0, c^{*}\in G and x\in\mathrm{int}(\Omega). Then there exists \theta:\mathbb{R}_{++}\rightarrow\mathbb{R}_{++} such that

D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*}))\geq\theta\left(\frac{\ell_{1}}{\ell_{2}}\right)D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})). \qquad (13)

Assumption 1 may seem restrictive in general. Indeed, we have the following sufficient conditions for Assumption 1, which were first presented in nlns.

Remark 6.
• (i) If \ell_{1}\geq\ell_{2}, Assumption 1 holds with \theta(\cdot)\equiv 1.

• (ii) Assumption 1 holds when \omega(\cdot) is \sigma-strongly convex and \kappa-smooth. In this setting, the conjugate \omega^{*}(\cdot) satisfies

\frac{\|u-v\|^{2}}{2\kappa}\leq D_{\omega^{*}}(u,v)\leq\frac{\|u-v\|^{2}}{2\sigma},\ \forall u,v\in\mathrm{int}(\Omega).

It follows from (1) and the above inequalities that

D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{1}JF(x)^{T}c^{*}))\geq\frac{\ell_{1}^{2}}{2\kappa}\|JF(x)^{T}c^{*}\|^{2}

and

\|JF(x)^{T}c^{*}\|^{2}\geq\frac{2\sigma}{\ell_{2}^{2}}D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell_{2}JF(x)^{T}c^{*})).

Then Assumption 1 holds with \theta(t):=\frac{\sigma t^{2}}{\kappa}.
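For instance (an immediate special case, noted here for illustration), when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} we may take \sigma=\kappa=1, so Assumption 1 holds with \theta(t)=t^{2}; indeed, in this case D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\ell JF(x)^{T}c^{*}))=\frac{\ell^{2}}{2}\|JF(x)^{T}c^{*}\|^{2} scales exactly quadratically in \ell.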

We are now in a position to present the relation between v_{\ell_{1}}(\cdot) and v_{\ell_{2}}(\cdot).

Proposition 5.

Suppose Assumption 1 holds. Let \ell_{1},\ell_{2}>0 and x\in\mathrm{int}(\Omega). For \ell_{1}\geq\ell_{2}, we have

v_{\ell_{1}}(x)\leq v_{\ell_{2}}(x)\leq\frac{\ell_{2}}{\theta(\frac{\ell_{2}}{\ell_{1}})\ell_{1}}v_{\ell_{1}}(x). \qquad (14)

Proof.

The left-hand side of (14) follows directly from the definition of v_{\ell}(\cdot). Next, we prove the right-hand side of (14). From (9) and the definition of c_{\ell}(\cdot), we have

v_{\ell_{2}}(x)\leq\sup\limits_{y\in\Omega}\{\langle c_{\ell_{1}}(x),JF(x)(x-y)\rangle-\ell_{2}D_{\omega}(y,x)\}.

Using a similar argument as in the proof of Proposition 3(iv), we obtain

v_{\ell_{2}}(x)\leq\sup\limits_{y\in\Omega}\{\langle c_{\ell_{1}}(x),JF(x)(x-y)\rangle-\ell_{2}D_{\omega}(y,x)\}=\ell_{2}D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\frac{1}{\ell_{2}}JF(x)^{T}c_{\ell_{1}}(x))).

We further use inequality (13) to get

v_{\ell_{2}}(x)\leq\frac{\ell_{2}}{\theta(\frac{\ell_{2}}{\ell_{1}})}D_{\omega}(x,\nabla\omega^{*}(\nabla\omega(x)-\frac{1}{\ell_{1}}JF(x)^{T}c_{\ell_{1}}(x)))=\frac{\ell_{2}}{\theta(\frac{\ell_{2}}{\ell_{1}})\ell_{1}}v_{\ell_{1}}(x),

where the equality is given by Proposition 3(iv). Therefore, we obtain the right-hand side of (14). \square

Moreover, we present the relations between u_{0}(\cdot) and v_{\ell}(\cdot).

Proposition 6.

For the merit functions u_{0}(\cdot) and v_{\ell}(\cdot), the following statements hold.

• (i) If F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) for some L>0, then

v_{L}(x)\leq u_{0}(x),\ \forall x\in\mathrm{int}(\Omega). \qquad (15)

• (ii) If F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) for some \mu\geq 0, then

u_{0}(x)\leq v_{\delta\mu}(x),\ \forall x\in\mathrm{int}(\Omega). \qquad (16)

Proof.

(i) From the (L,C,e)-smoothness of F(\cdot), we have

F(y)\preceq_{C}F(x)+JF(x)(y-x)+LD_{\omega}(y,x)e,\ \forall y\in\Omega.

Hence,

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},F(x)-F(y)\rangle\}\geq\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)-LD_{\omega}(y,x)e\rangle\}
\geq\sup\limits_{y\in\Omega}\left\{\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y)\rangle+\min\limits_{c^{*}\in G}\langle c^{*},-LD_{\omega}(y,x)e\rangle\right\}
\geq\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-LD_{\omega}(y,x)\},

where the last inequality is given by (3). It follows that u_{0}(x)\geq v_{L}(x).

(ii) From the (\mu,C,e)-strong convexity of F(\cdot), we have

F(x)+JF(x)(y-x)+\mu D_{\omega}(y,x)e\preceq_{C}F(y),\ \forall y\in\Omega.

Therefore,

\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},F(x)-F(y)\rangle\}\leq\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)-\mu D_{\omega}(y,x)e\rangle\}
\leq\sup\limits_{y\in\Omega}\left\{\min\limits_{c^{*}\in G}\langle c^{*},JF(x)(x-y)\rangle+\max\limits_{c^{*}\in G}\langle c^{*},-\mu D_{\omega}(y,x)e\rangle\right\}
=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x)(x-y)\rangle-\delta\mu D_{\omega}(y,x)\},

where the second inequality uses \min_{c^{*}}\{A(c^{*})+B(c^{*})\}\leq\min_{c^{*}}A(c^{*})+\max_{c^{*}}B(c^{*}), and the last equality is given by (4). Hence, u_{0}(x)\leq v_{\delta\mu}(x). \square

We propose the following error bound property, which will be used in convergence rate analysis.

Definition 9.

We say that the vector Bregman-PL inequality holds for F(\cdot) on \mathrm{int}(\Omega) if

u_{0}(x)\leq\tau(\ell)v_{\ell}(x),\ \forall x\in\mathrm{int}(\Omega),\ \ell>0, \qquad (17)

where \tau:\mathbb{R}_{++}\rightarrow\mathbb{R}_{++}.

Remark 7.

The vector Bregman-PL inequality (17) corresponds to the general PL inequality when m=1 and \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}. It is also a generalization of the gradient-dominated inequality (nlns, Definition 3.2) in the context of vector optimization problems. Recently, a multiobjective proximal-PL inequality was established in (pcom, Definition 4.1); the vector Bregman-PL inequality coincides with it when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2} and g_{i}(\cdot)=\sigma_{\Omega}(\cdot) for all i\in[m].

By virtue of Propositions 5 and 6, we give sufficient conditions for the vector Bregman-PL inequality.

Proposition 7.

Suppose that Assumption 1 holds and F(\cdot) is (\mu,C,e)-strongly convex relative to \omega(\cdot) for some \mu\geq 0. Then for all x\in\mathrm{int}(\Omega) and \ell>0, the vector Bregman-PL inequality (17) holds with \tau(\ell)=\frac{\delta\mu}{\theta(\frac{\delta\mu}{\ell})\ell}.

Proof.

The result follows directly from inequalities (14) and (16).\ \ \square

4 Interior Bregman Gradient Method for VOPs

In this section, we first introduce the interior Bregman gradient method for VOPs.

Data: x^{0}\in\mathrm{int}(\Omega), \ell>0, a supercoercive Legendre function \omega with \Omega=\mathrm{cl}(\mathrm{dom}\omega), and F(\cdot) being (L,C,e)-smooth relative to \omega(\cdot).
1 for k=0,1,\dots do
2     x^{k+1}=V_{\ell}(x^{k})
3     if x^{k+1}=x^{k} then
4         return C-stationary point x^{k}
5     end if
6 end for
Algorithm 1: Interior Bregman gradient method for VOPs

The proposed algorithm generates a sequence via the iteration

x^{k+1}=V_{\ell}(x^{k}).

From Lemma 4 and the strict convexity of \omega(\cdot), the algorithm is well-defined, and the generated sequence \{x^{k}\} lies in \mathrm{int}(\Omega). Note that V_{\ell}(x^{k}) is the unique solution of

v_{\ell}(x^{k})=\sup\limits_{y\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},JF(x^{k})(x^{k}-y)\rangle-\ell D_{\omega}(y,x^{k})\};

hence

V_{\ell}(x^{k})=\mathop{\arg\inf}\limits_{y\in\Omega}\max\limits_{c^{*}\in G}\{\langle c^{*},JF(x^{k})(y-x^{k})\rangle+\ell D_{\omega}(y,x^{k})\}.
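To make the subproblem concrete, the following minimal numerical sketch implements Algorithm 1 in Python (our illustration, not part of the original paper). It assumes m=2, C=\mathbb{R}^{2}_{+}, \Omega=\mathbb{R}^{n}_{+} and the Boltzmann–Shannon entropy, for which Proposition 3(ii) yields the componentwise multiplicative update V_{\ell}(x)=x\exp(-\frac{1}{\ell}JF(x)^{T}c_{\ell}(x)); the dual weight c_{\ell}(x) is approximated by a crude grid search over G=\{c\geq 0:\|c\|\leq 1,\ c_{1}+c_{2}\geq 1\}, minimizing the dual value \ell D_{\omega}(x,V_{\ell}(x)) from Proposition 3(iv). The two objectives (one linear, one of Kullback–Leibler type, both smooth relative to the entropy) and all function names are chosen for illustration only.

import numpy as np

a = np.array([1.0, 2.0])   # F1(x) = <a, x>: linear, relatively smooth for any L
b = np.array([0.6, 0.4])   # F2(x) = sum(x*log(x/b)): relatively 1-smooth w.r.t. the entropy

def JF(x):
    # Rows: gradients of F1 and F2.
    return np.vstack([a, np.log(x / b) + 1.0])

def kl(x, y):
    # Bregman distance of the entropy: the Kullback-Leibler divergence.
    return np.sum(x * np.log(x / y) - x + y)

def V(x, c, ell):
    # Proposition 3(ii) with the entropy: interior (multiplicative) update.
    return x * np.exp(-(JF(x).T @ c) / ell)

def bregman_step(x, ell, grid=60):
    # Approximate c_l(x) by grid search over G = {c >= 0 : ||c|| <= 1, c1 + c2 >= 1},
    # minimizing ell * D_omega(x, V(x; c)); cf. Proposition 3(iv).
    best_val, best_y = np.inf, x
    for c1 in np.linspace(0.0, 1.0, grid):
        for c2 in np.linspace(0.0, 1.0, grid):
            if c1 + c2 >= 1.0 and c1 ** 2 + c2 ** 2 <= 1.0:
                y = V(x, np.array([c1, c2]), ell)
                val = ell * kl(x, y)
                if val < best_val:
                    best_val, best_y = val, y
    return best_y, best_val   # best_val approximates v_ell(x)

x = np.array([0.7, 0.3])      # x0 in int(Omega)
for k in range(50):
    x, vx = bregman_step(x, ell=2.0)   # ell > L = 1, see the comment below
print(x, vx)                  # vx tends to 0 as the iterates approach a C-stationary point

Here L=1 works for both objectives, and since the entropy's domain is not open we have \alpha(\omega)=0 (cf. Remark 3), so any \ell>1 satisfies the sufficient descent condition of Lemma 6 below.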

Define the mapping \varphi:\mathbb{R}^{m}\rightarrow\mathbb{R} as

\varphi(z):=\max\limits_{c^{*}\in G}\langle c^{*},z\rangle.

From the iterates in Algorithm 1, we deduce the following sufficient descent property.

Lemma 6 (Sufficient descent property).

Assume that F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) and \ell>0. Then the sequence \{x^{k}\} produced by Algorithm 1 satisfies

\varphi(F(x^{k+1})-F(x^{k}))\leq-((1+\alpha(\omega))\ell-L)D_{\omega}(x^{k+1},x^{k}). \qquad (18)

Furthermore, the sufficient descent property holds when

\ell>\frac{L}{1+\alpha(\omega)}.

Proof.

From the definition of (L,C,e)-smoothness, we have

F(x^{k+1})-F(x^{k})\preceq_{C}JF(x^{k})(x^{k+1}-x^{k})+LD_{\omega}(x^{k+1},x^{k})e.

Then

\varphi(F(x^{k+1})-F(x^{k}))\leq\varphi(JF(x^{k})(x^{k+1}-x^{k}))+LD_{\omega}(x^{k+1},x^{k})
=-\ell D_{\omega}(x^{k+1},x^{k})-\ell D_{\omega}(x^{k},x^{k+1})+LD_{\omega}(x^{k+1},x^{k})
\leq-((1+\alpha(\omega))\ell-L)D_{\omega}(x^{k+1},x^{k}),

where the equality follows from x^{k+1}=V_{\ell}(x^{k}), the definition of \varphi(\cdot), relation (10) and Proposition 3(iii), and the second inequality is given by the definition of \alpha(\omega). This, together with \ell>\frac{L}{1+\alpha(\omega)}, yields the sufficient descent property. \square
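For instance, for the Boltzmann–Shannon entropy \omega(x)=\sum_{i}(x_{i}\log x_{i}-x_{i}), the domain \mathrm{dom}\,\omega=\mathbb{R}^{n}_{+} is not open, so \alpha(\omega)=0 by Remark 3 and the sufficient descent condition reduces to \ell>L; for \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}, the Bregman distance is symmetric, \alpha(\omega)=1, and the weaker condition \ell>\frac{L}{2} suffices.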

Algorithm 1 either terminates with a C-stationary point in a finite number of iterations or generates an infinite sequence. In the sequel, we suppose that Algorithm 1 generates an infinite sequence of nonstationary points. First, we present the global convergence of Algorithm 1 in the nonconvex case.

Theorem 2.

Assume that \{x:F(x)\preceq_{C}F(x^{0})\} is bounded, F(\cdot) is (L,C,e)-smooth relative to \omega(\cdot) for some L>0, and \ell>\frac{L}{1+\alpha(\omega)}. Let \{x^{k}\} be the sequence generated by Algorithm 1. Then, we have

• (i) there exists F^{*}\preceq_{C}F(x^{k}) for all k such that \lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*};

• (ii) \sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty;

• (iii) \{x^{k}\} has at least one accumulation point, and every accumulation point x^{*}\in\mathrm{int}(\Omega) is a C-stationary point; moreover, if \nabla\omega(\cdot) is L_{\omega}-Lipschitz continuous on \mathrm{int}(\Omega), every accumulation point x^{*}\in\mathrm{bd}(\Omega) is also a C-stationary point;

• (iv) \min\limits_{0\leq p\leq k-1}v_{\ell}(x^{p})\leq\frac{-\ell\varphi(F^{*}-F(x^{0}))}{\alpha(\omega)((1+\alpha(\omega))\ell-L)k}.

Proof.

(i) From Lemma 6, it follows that \{F(x^{k})\} is decreasing with respect to the partial order induced by C. This, combined with the boundedness of \{x:F(x)\preceq_{C}F(x^{0})\}, implies that \{x^{k}\} is bounded and that there exists F^{*}\preceq_{C}F(x^{k}) for all k such that \lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*}.

(ii) Summing inequality (18) over k=0,1,\dots,p, we obtain

-\sum\limits_{k=0}^{p}((1+\alpha(\omega))\ell-L)D_{\omega}(x^{k+1},x^{k})\geq\sum\limits_{k=0}^{p}\varphi(F(x^{k+1})-F(x^{k}))\geq\varphi\left(\sum\limits_{k=0}^{p}\{F(x^{k+1})-F(x^{k})\}\right)\geq\varphi(F^{*}-F(x^{0})),

where the second inequality follows from the subadditivity of \varphi(\cdot), and the last inequality is given by the definition of \varphi(\cdot) and F^{*}\preceq_{C}F(x^{k}). Hence

\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty.

(iii) From Proposition 3(iv), we have

v_{\ell}(x^{k})=\ell D_{\omega}(x^{k},x^{k+1})\leq\frac{\ell}{\alpha(\omega)}D_{\omega}(x^{k+1},x^{k}), \qquad (19)

which together with assertion (ii) implies

\lim\limits_{k\rightarrow\infty}v_{\ell}(x^{k})=0.

On the other hand, the boundedness of \{x^{k}\} yields a subsequence \{x^{k}\}_{k\in\mathcal{K}} converging to some accumulation point x^{*}. Next, we prove that x^{*} is a C-stationary point. We distinguish two cases: x^{*}\in\mathrm{int}(\Omega) and x^{*}\in\mathrm{bd}(\Omega). If x^{*}\in\mathrm{int}(\Omega), it follows from the continuity of v_{\ell}(\cdot) that

v_{\ell}(x^{*})=\lim\limits_{k\in\mathcal{K}}v_{\ell}(x^{k})=0.

Therefore, x^{*} is a C-stationary point. If x^{*}\in\mathrm{bd}(\Omega), suppose to the contrary that x^{*} is a non-stationary point. From Remark 1, there exists y_{0}\in\mathrm{int}(\Omega) such that

JF(x^{*})(y_{0}-x^{*})\in-\mathrm{int}(C).

Then, we set

\varepsilon:=\varphi(JF(x^{*})(y_{0}-x^{*}))<0.

From the continuity of JF(\cdot) and the compactness of G, we have

\varphi(JF(x^{k})(y_{0}-x^{k}))\leq\frac{\varepsilon}{2}

for all sufficiently large k\in\mathcal{K}. Then, we have

v_{\ell}(x^{k})=-\inf\limits_{y\in\Omega}\max\limits_{c^{*}\in G}\{\langle c^{*},JF(x^{k})(y-x^{k})\rangle+\ell D_{\omega}(y,x^{k})\}
\geq-t\varphi(JF(x^{k})(y_{0}-x^{k}))-\ell D_{\omega}(x^{k}+t(y_{0}-x^{k}),x^{k})
\geq-\frac{\varepsilon}{2}t-\ell D_{\omega}(x^{k}+t(y_{0}-x^{k}),x^{k})
\geq-\frac{\varepsilon}{2}t-\frac{\ell L_{\omega}}{2}t^{2}\|y_{0}-x^{k}\|^{2}
\geq-\frac{\varepsilon}{2}t-\ell L_{\omega}t^{2}\|y_{0}-x^{*}\|^{2}

for all sufficiently large k\in\mathcal{K} and all t\in(0,1), where the first inequality follows from x^{k}+t(y_{0}-x^{k})\in\Omega, the third inequality is given by the L_{\omega}-Lipschitz continuity of \nabla\omega(\cdot), and the last inequality holds since, for sufficiently large k\in\mathcal{K}, we may assume without loss of generality that \|y_{0}-x^{k}\|\leq\sqrt{2}\|y_{0}-x^{*}\|. Since the above inequalities hold for all t\in(0,1) and \varepsilon<0, there exists t_{0}\in(0,1) such that v_{\ell}(x^{k})\geq-\frac{\varepsilon t_{0}}{4}>0 for all sufficiently large k\in\mathcal{K}. This contradicts v_{\ell}(x^{k})\rightarrow 0. Therefore, x^{*} is a C-stationary point.

(iv) In the proof of assertion (ii), we obtained

\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})\leq\frac{-\varphi(F^{*}-F(x^{0}))}{(1+\alpha(\omega))\ell-L},

which, combined with (19), yields

\sum\limits_{k=0}^{\infty}v_{\ell}(x^{k})\leq\frac{-\ell\varphi(F^{*}-F(x^{0}))}{\alpha(\omega)((1+\alpha(\omega))\ell-L)}.

Then, assertion (iv) holds due to

\sum\limits_{k=0}^{\infty}v_{\ell}(x^{k})\geq\sum\limits_{p=0}^{k-1}v_{\ell}(x^{p})\geq k\min\limits_{0\leq p\leq k-1}v_{\ell}(x^{p}). \qquad\square

Remark 8.

In Theorem 2(iii), if the accumulation point x^{*} lies in \mathrm{bd}(\Omega), the continuity of v_{\ell}(\cdot) and V_{\ell}(\cdot) cannot be invoked because \mathrm{dom}\nabla\omega=\mathrm{int}(\Omega). Recall that when \omega(\cdot)=\frac{1}{2}\|\cdot\|^{2}, the continuity of d(\cdot), which corresponds to V_{\ell}(\cdot), implies the stationarity of accumulation points 6; vsd. Alternatively, since the gradient of \frac{1}{2}\|\cdot\|^{2} is 1-Lipschitz continuous, the stationarity of accumulation points follows directly from a subsequence \{d(x^{k})\}_{k\in\mathcal{K}}\rightarrow 0. This argument is crucial when the continuity of v_{\ell}(\cdot) and V_{\ell}(\cdot) is unknown. For example, the authors used but did not prove the continuity of d(\cdot) in (dsd, Theorem 3.2). However, the result remains valid, as it can be proved via \{d(x^{k})\}_{k\in\mathcal{K}}\rightarrow 0.

4.1 Strong convergence

In this subsection, we discuss the strong convergence of Algorithm 1 in the convex case. First, we present the following identity, known as the three points lemma.

Lemma 7 (Three points lemma).

tp Assume that a,b,c\in\mathrm{int(dom}\omega). Then the following equality holds:

\langle\nabla\omega(b)-\nabla\omega(a),c-a\rangle=D_{\omega}(c,a)+D_{\omega}(a,b)-D_{\omega}(c,b).

Applying the three points lemma, we can now establish a fundamental inequality for Algorithm 1 with C-convex objective functions.

Lemma 8.

Assume that F(\cdot) is (L,C,e)-smooth and (\mu,C,e)-strongly convex relative to \omega(\cdot) for some L>0 and \mu\geq 0. Let x^{k+1}=V_{\ell}(x^{k}) for some \ell>0. Then, for all x\in\mathrm{int}(\Omega), we have

\langle c_{\ell}(x^{k}),F(x^{k+1})-F(x)\rangle\leq(\ell-\delta\mu)D_{\omega}(x,x^{k})-\ell D_{\omega}(x,x^{k+1})+(L-\ell)D_{\omega}(x^{k+1},x^{k}). \qquad (20)

Proof.

From the (L,C,e)-smoothness of F(\cdot), for all x\in\mathrm{int}(\Omega), we have

F(x^{k+1})\preceq_{C}F(x^{k})+JF(x^{k})(x^{k+1}-x^{k})+LD_{\omega}(x^{k+1},x^{k})e\preceq_{C}F(x)+JF(x^{k})(x^{k+1}-x)-\mu D_{\omega}(x,x^{k})e+LD_{\omega}(x^{k+1},x^{k})e,

where the second inequality is given by the (\mu,C,e)-strong convexity of F(\cdot). Rearranging and multiplying by c_{\ell}(x^{k}), we obtain

\langle c_{\ell}(x^{k}),F(x^{k+1})-F(x)\rangle
\leq\langle c_{\ell}(x^{k}),JF(x^{k})(x^{k+1}-x)\rangle-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
\leq\langle c_{\ell}(x^{k}),JF(x^{k})(x^{k+1}-x^{k})\rangle+\langle c_{\ell}(x^{k}),JF(x^{k})(x^{k}-x)\rangle-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
=-\ell(D_{\omega}(x^{k+1},x^{k})+D_{\omega}(x^{k},x^{k+1}))+\ell\langle\nabla\omega(x^{k})-\nabla\omega(x^{k+1}),x^{k}-x\rangle-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
=-\ell(D_{\omega}(x^{k+1},x^{k})+D_{\omega}(x^{k},x^{k+1}))+\ell(D_{\omega}(x,x^{k})+D_{\omega}(x^{k},x^{k+1})-D_{\omega}(x,x^{k+1}))-\delta\mu D_{\omega}(x,x^{k})+LD_{\omega}(x^{k+1},x^{k})
=(\ell-\delta\mu)D_{\omega}(x,x^{k})-\ell D_{\omega}(x,x^{k+1})+(L-\ell)D_{\omega}(x^{k+1},x^{k}),

where the first equality follows from Proposition 3(iii) and equality (11), and the second equality is given by the three points lemma with a=x^{k}, b=x^{k+1} and c=x. \square

Before presenting the convergence result in the convex case, we recall the following result on nonnegative sequences.

Lemma 9.

(qf, Lemma 2) Let \{a^{k}\} and \{b^{k}\} be nonnegative sequences. Assume that \sum\limits_{k=1}^{\infty}b^{k}<\infty and

a^{k+1}\leq a^{k}+b^{k}.

Then, \{a^{k}\} converges.

We also recall the following assumption.

Assumption 2.

(lc, Assumption H) Assume the Legendre function \omega(\cdot) satisfies

• (i) If \{x^{k}\}\subset\mathrm{int}(\Omega) converges to some point x\in\Omega, then D_{\omega}(x,x^{k})\rightarrow 0.

• (ii) Reciprocally, if x\in\Omega and \{x^{k}\}\subset\mathrm{int}(\Omega) is such that D_{\omega}(x,x^{k})\rightarrow 0, then x^{k}\rightarrow x.

Remark 9.

Note that \omega(\cdot) is differentiable on \mathrm{int}(\Omega), so Assumption 2(i) holds directly for x\in\mathrm{int}(\Omega). On the other hand, if x\in\mathrm{int}(\Omega), Assumption 2(ii) follows from the strict convexity of \omega(\cdot) on \mathrm{int}(\Omega). Hence, Assumption 2 is proposed for x\in\mathrm{bd}(\Omega).

In what follows, we give the convergence result for the convex case.

Theorem 3.

Suppose that the assumptions of Theorem 2 hold and that F(\cdot) is C-convex. Then, we have

• (i) there exists F^{*}\preceq_{C}F(x^{k}) for all k such that \lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*};

• (ii) \sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty;

• (iii) every accumulation point of \{x^{k}\} is a weakly efficient solution; if, in addition, Assumption 2 holds, then \{x^{k}\} converges to some weakly efficient solution x^{*};

• (iv) \langle\bar{c}_{\ell}^{k-1},F(x^{k})-F^{*}\rangle\leq\frac{\ell D_{\omega}(x^{*},x^{0})+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p})}{k}, where \bar{c}_{\ell}^{k-1}=\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p});

• (v) if \ell\geq L, then \langle\bar{c}_{\ell}^{k-1},F(x^{k})-F^{*}\rangle\leq\frac{\ell D_{\omega}(x^{*},x^{0})}{k}.

Proof.

Since all assumptions of Theorem 2 hold, we obtain assertions (i) and (ii).

(iii) Denote X^{*}_{F}=\{x\in\Omega:F(x)=F^{*}\}. From the proof of Theorem 2(iii), there exists an accumulation point x^{*}\in X^{*}_{F} of \{x^{k}\}. Next, we prove that x^{*} is a weakly efficient point. Suppose to the contrary that there exists \hat{x}\in\Omega such that F(\hat{x})\prec_{C}F(x^{*}). Substituting x=\hat{x} and \mu=0 into inequality (20), we have

\langle c_{\ell}(x^{k}),F(x^{k+1})-F(\hat{x})\rangle\leq\ell(D_{\omega}(\hat{x},x^{k})-D_{\omega}(\hat{x},x^{k+1}))+(L-\ell)D_{\omega}(x^{k+1},x^{k}). \qquad (21)

Summing the above inequality over 0 to k1k-1, we obtain

\ell(D_{\omega}(\hat{x},x^{0})-D_{\omega}(\hat{x},x^{k}))+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p})\geq\sum\limits_{p=0}^{k-1}\langle c_{\ell}(x^{p}),F(x^{p+1})-F(\hat{x})\rangle\geq\langle\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{*})-F(\hat{x})\rangle,

where the second inequality follows from F(x^{*})\preceq_{C}F(x^{p+1}). Multiplying by \frac{1}{k}, we obtain

\langle\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{*})-F(\hat{x})\rangle\leq\frac{\ell D_{\omega}(\hat{x},x^{0})+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p})}{k}. \qquad (22)

Since $\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p})\in G$, this together with $F(\hat{x})\prec_{C}F(x^{*})$ implies

1kp=0k1c(xp),F(x)F(x^)δ0,\displaystyle\langle\frac{1}{k}\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{*})-F(\hat{x})\rangle\geq\delta_{0}, (23)

where $\delta_{0}:=\min\limits_{c^{*}\in G}\langle c^{*},F(x^{*})-F(\hat{x})\rangle>0$. However, as $k\rightarrow\infty$, the right-hand side of inequality (22) tends to $0$ because $\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty$. This contradicts inequality (23). Therefore, $x^{*}$ is a weakly efficient point. In what follows, we prove that the whole sequence converges to $x^{*}$. Substituting $x=x^{*}$ and $\mu=0$ into inequality (20), we have

c(xk),F(xk+1)F(x)(Dω(x,xk)Dω(x,xk+1))+(L)Dω(xk+1,xk).\langle c_{\ell}(x^{k}),F(x^{k+1})-F(x^{*})\rangle\leq\ell(D_{\omega}(x^{*},x^{k})-D_{\omega}(x^{*},x^{k+1}))+(L-\ell)D_{\omega}(x^{k+1},x^{k}). (24)

The left-hand side of the above inequality is nonnegative since $F^{*}\preceq_{C}F(x^{k+1})$; hence,

Dω(x,xk+1)Dω(x,xk)+LDω(xk+1,xk).D_{\omega}(x^{*},x^{k+1})\leq D_{\omega}(x^{*},x^{k})+\frac{L-\ell}{\ell}D_{\omega}(x^{k+1},x^{k}).

Combining this with $\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty$ and Lemma 9, we conclude that $\{D_{\omega}(x^{*},x^{k})\}$ converges. On the other hand, $x^{*}$ is an accumulation point of $\{x^{k}\}$, so Assumption 2(i) gives $D_{\omega}(x^{*},x^{k})\rightarrow 0$ along the corresponding subsequence; since the whole sequence $\{D_{\omega}(x^{*},x^{k})\}$ converges, its limit is $0$. Therefore, by Assumption 2(ii), the sequence $\{x^{k}\}$ converges to $x^{*}$.

(iv) Summing inequality (24) over iterations $0$ to $k-1$, we obtain

(Dω(x,x0)Dω(x,xk))+(L)p=0k1Dω(xp+1,xp)\displaystyle\ell(D_{\omega}(x^{*},x^{0})-D_{\omega}(x^{*},x^{k}))+(L-\ell)\sum\limits_{p=0}^{k-1}D_{\omega}(x^{p+1},x^{p}) p=0k1c(xp),F(xp+1)F\displaystyle\geq\langle\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{p+1})-F^{*}\rangle
p=0k1c(xp),F(xk)F,\displaystyle\geq\langle\sum\limits_{p=0}^{k-1}c_{\ell}(x^{p}),F(x^{k})-F^{*}\rangle,

where the second inequality follows from the monotonicity $F(x^{p+1})\preceq_{C}F(x^{p})$, which gives $F(x^{k})\preceq_{C}F(x^{p+1})$ for all $p\leq k-1$. Multiplying by $\frac{1}{k}$, we obtain assertion (iv).

(v) If $\ell\geq L$, the second term in the bound of assertion (iv) is nonpositive, so assertion (v) follows directly.\ \ \square
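
To make the scheme concrete, the following is a minimal numerical sketch (illustrative only, not part of the analysis) for a bi-objective instance with $C=\mathbb{R}^{2}_{+}$, $e=(1,1)$, and $\Omega=\mathbb{R}^{2}_{+}$, assuming the update of Algorithm 1 takes the form $x^{k+1}\in\arg\min_{x\in\mathrm{int}(\Omega)}\max_{i}\langle\nabla F_{i}(x^{k}),x-x^{k}\rangle+\ell D_{\omega}(x,x^{k})$ with the Boltzmann-Shannon kernel $\omega$. The subproblem is solved through its concave dual over the simplex weight $\lambda$ (justified by Sion's minimax theorem); the names grads and bregman_step and the choice $\ell=4$ are ours, for illustration only.

import numpy as np
from scipy.optimize import minimize_scalar

A = np.array([1.0, 2.0])  # unconstrained minimizer of F_1
B = np.array([2.0, 1.0])  # unconstrained minimizer of F_2

def grads(x):
    # Gradients of F_1(x) = ||x - A||^2 / 2 and F_2(x) = ||x - B||^2 / 2.
    return x - A, x - B

def bregman_step(xk, ell):
    # One interior Bregman gradient step with the Boltzmann-Shannon kernel:
    # minimize max_i <grad F_i(xk), x - xk> + ell * D_omega(x, xk) over x > 0,
    # solved via its concave dual in the simplex weight lam.
    g1, g2 = grads(xk)
    def neg_dual(lam):
        g = lam * g1 + (1.0 - lam) * g2
        # Inner minimizer in closed form: x_i = xk_i * exp(-g_i / ell);
        # substituting it back yields the dual value below (negated for min).
        return -np.sum(xk * (-g + ell * (1.0 - np.exp(-g / ell))))
    lam = minimize_scalar(neg_dual, bounds=(0.0, 1.0), method='bounded').x
    g = lam * g1 + (1.0 - lam) * g2
    return xk * np.exp(-g / ell)  # multiplicative update: stays interior

x, ell = np.array([0.5, 0.5]), 4.0
for _ in range(200):
    x = bregman_step(x, ell)
print(x)  # approaches the efficient segment joining A and B

By construction the iterates remain in $\mathbb{R}^{2}_{++}$, and a fixed point satisfies $\lambda\nabla F_{1}(x)+(1-\lambda)\nabla F_{2}(x)=0$ for some $\lambda\in[0,1]$, i.e., it lies on the efficient segment joining $(1,2)$ and $(2,1)$.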

4.2 Linear convergence

This subsection is devoted to discussing the linear convergence of Algorithm 1. First, we present the following results under the assumption that $F(\cdot)$ is $(\mu,C,e)$-strongly convex relative to $\omega(\cdot)$ for some $\mu>0$.

Theorem 4.

Suppose that the assumptions of Theorem 2 hold and that $F(\cdot)$ is $(\mu,C,e)$-strongly convex relative to $\omega(\cdot)$ for some $\mu>0$. Then, the following assertions hold:

  • (i)\mathrm{(i)}

    there exists $F^{*}$ with $F^{*}\preceq_{C}F(x^{k})$ for all $k$ such that $\lim\limits_{k\rightarrow\infty}F(x^{k})=F^{*}$.

  • (ii)\mathrm{(ii)}

    k=0Dω(xk+1,xk)<\sum\limits_{k=0}^{\infty}D_{\omega}(x^{k+1},x^{k})<\infty.

  • (iii)\mathrm{(iii)}

    every accumulation point of $\{x^{k}\}$ is a weakly efficient solution; if, in addition, Assumption 2 holds, then $\{x^{k}\}$ converges to some weakly efficient solution $x^{*}$.

  • (iv)\mathrm{(iv)}

    if L\ell\geq L, then Dω(x,xk+1)δμDω(x,xk)D_{\omega}(x^{*},x^{k+1})\leq\frac{\ell-\delta\mu}{\ell}D_{\omega}(x^{*},x^{k}).

Proof.

Since $(\mu,C,e)$-strong convexity relative to $\omega(\cdot)$ is stronger than $C$-convexity, assertions (i)-(iii) follow directly from Theorem 3.

(iv) Substituting x=xx=x^{*} into inequality (20), we have

c(xk),F(xk+1)F(δμ)Dω(x,xk)Dω(x,xk+1)+(L)Dω(xk+1,xk).\langle c_{\ell}(x^{k}),F(x^{k+1})-F^{*}\rangle\leq(\ell-\delta\mu)D_{\omega}(x^{*},x^{k})-\ell D_{\omega}(x^{*},x^{k+1})+(L-\ell)D_{\omega}(x^{k+1},x^{k}).

This, together with $F^{*}\preceq_{C}F(x^{k+1})$ and $\ell\geq L$, yields

Dω(x,xk+1)δμDω(x,xk).D_{\omega}(x^{*},x^{k+1})\leq\frac{\ell-\delta\mu}{\ell}D_{\omega}(x^{*},x^{k}).

\ \ \square
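
Iterating the contraction in assertion (iv) makes the geometric rate explicit; assuming $0<\delta\mu<\ell$ so that the factor lies in $(0,1)$, one obtains

D_{\omega}(x^{*},x^{k})\leq\Big(\frac{\ell-\delta\mu}{\ell}\Big)^{k}D_{\omega}(x^{*},x^{0}),

that is, $D_{\omega}(x^{*},x^{k})=\mathcal{O}(r^{k})$ with $r=\frac{\ell-\delta\mu}{\ell}$, which is the linear rate announced for relatively strongly convex VOPs.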

Next, we show that $\{u_{0}(x^{k})\}$ converges Q-linearly under the vector Bregman-PL inequality.

Theorem 5.

Assume that the vector Bregman-PL inequality (17) holds and that $F(\cdot)$ is $(L,C,e)$-smooth relative to $\omega(\cdot)$ for some $L>0$. If $\ell\geq L$, then the sequence $\{x^{k}\}$ generated by Algorithm 1 satisfies

u0(xk+1)(11τ())u0(xk).u_{0}(x^{k+1})\leq\left(1-\frac{1}{\tau(\ell)}\right)u_{0}(x^{k}).

Proof.

Since F()F(\cdot) is (L,C,e)(L,C,e)-smooth relative to ω()\omega(\cdot) and L\ell\geq L, we have

F(xk+1)CF(xk)+JF(xk)(xk+1xk)+Dω(xk+1,xk)e.F(x^{k+1})\preceq_{C}F(x^{k})+JF(x^{k})(x^{k+1}-x^{k})+\ell D_{\omega}(x^{k+1},x^{k})e.

Then, for all xΩx\in\Omega we obtain

F(xk+1)F(x)CF(xk)F(x)+JF(xk)(xk+1xk)+Dω(xk+1,xk)e.F(x^{k+1})-F(x)\preceq_{C}F(x^{k})-F(x)+JF(x^{k})(x^{k+1}-x^{k})+\ell D_{\omega}(x^{k+1},x^{k})e.

It follows that

supxΩmincGc,F(xk)F(x)\displaystyle\sup\limits_{x\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x^{k})-F(x)\rangle
\displaystyle\geq supxΩmincG{c,F(xk+1)F(x)+JF(xk)(xkxk+1)Dω(xk+1,xk)e}\displaystyle\sup\limits_{x\in\Omega}\min\limits_{c^{*}\in G}\{\langle c^{*},F(x^{k+1})-F(x)+JF(x^{k})(x^{k}-x^{k+1})-\ell D_{\omega}(x^{k+1},x^{k})e\rangle\}
\displaystyle\geq supxΩ{mincGc,F(xk+1)F(x)+mincGc,JF(xk)(xkxk+1)Dω(xk+1,xk)}\displaystyle\sup\limits_{x\in\Omega}\{\min\limits_{c^{*}\in G}\langle c^{*},F(x^{k+1})-F(x)\rangle+\min\limits_{c^{*}\in G}\langle c^{*},JF(x^{k})(x^{k}-x^{k+1})\rangle-\ell D_{\omega}(x^{k+1},x^{k})\}
=\displaystyle= supxΩmincGc,F(xk+1)F(x)+v(xk),\displaystyle\sup\limits_{x\in\Omega}\min\limits_{c^{*}\in G}\langle c^{*},F(x^{k+1})-F(x)\rangle+v_{\ell}(x^{k}),

where the equality follows from the definition of $v_{\ell}(x^{k})$, the last two terms being independent of $x$. Hence,

u0(xk+1)\displaystyle u_{0}(x^{k+1}) u0(xk)v(xk)\displaystyle\leq u_{0}(x^{k})-v_{\ell}(x^{k})
(11τ())u0(xk).\displaystyle\leq\left(1-\frac{1}{\tau(\ell)}\right)u_{0}(x^{k}).

Here, the second inequality follows from the vector Bregman-PL inequality (17). From (15), (17) and $\ell\geq L$, we have $\tau(\ell)\geq 1$; then $0<1-\frac{1}{\tau(\ell)}<1$, and we conclude that $\{u_{0}(x^{k})\}$ converges linearly.\ \ \square
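
Unrolling this recursion gives $u_{0}(x^{k})\leq\left(1-\frac{1}{\tau(\ell)}\right)^{k}u_{0}(x^{0})$. Consequently, for any accuracy $\varepsilon\in(0,u_{0}(x^{0}))$, we have $u_{0}(x^{k})\leq\varepsilon$ whenever

k\geq\tau(\ell)\log\frac{u_{0}(x^{0})}{\varepsilon},

using $\log(1-t)\leq-t$ for $t\in(0,1)$; this iteration-complexity bound is a direct consequence of Theorem 5, recorded here for reference.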

5 Conclusions

We extended relative smoothness and relative strong convexity to vector-valued functions in the context of VOPs. We then presented convergence rates for the interior Bregman gradient method under these generalized assumptions. In addition, we devised a vector Bregman-PL inequality, which was used to prove linear convergence of the proposed method. It is worth noting that the convergence results extend those in com to problems without Lipschitz gradient continuity or strong convexity.

References

  • (1) Marler, R. T., Arora, J. S.: Survey of multi-objective optimization methods for engineering. Struct. Multidisc. Optim. 26(6), 369-395 (2004)
  • (2) Tapia, M., Coello, C.: Applications of multi-objective evolutionary algorithms in economics and finance: A survey. IEEE Congress on Evolutionary Computation. 532-539 (2007)
  • (3) Fliege, J., Werner, R.: Robust multiobjective optimization & applications in portfolio optimization. Eur. J. Oper. Res. 234(2), 422-433 (2014)
  • (4) Sener, O., Koltun, V.: Multi-task learning as multiobjective optimization. Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 527-538 (2018)
  • (5) Mahapatra, D., Rajan, V.: Multi-task learning with user preferences: gradient descent with controlled ascent in Pareto optimization. Proc. Int. Conf. Mach. Learn. (ICML), 119, 6597-6601 (2020)
  • (6) Ye, F., Lin, B., Yue, Z., Guo, P., Xiao, Q., Zhang, Y.: Multi-objective meta learning. Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2021
  • (7) Miettinen, K.: Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Boston, Massachusetts (1999)
  • (8) Luc, T. D.: Theory of Vector Optimization, Lecture Notes in Economics and Mathematical Systems. Springer-Verlag, Berlin (1989)
  • (9) John, J.: Vector Optimization: Theory, Applications and Extensions. 2nd ed. Springer, Berlin (2011)
  • (10) Fliege, J., Svaiter, B. F.: Steepest descent methods for multicriteria optimization. Math. Method. Oper. Res. 51(3), 479-494 (2000)
  • (11) Graña Drummond, L. M., Svaiter, B. F.: A steepest descent method for vector optimization problems. J. Comput. Appl. Math. 175, 395-414 (2005)
  • (12) Drummond, L. M. G., Iusem, A. N.: A projected gradient method for vector optimization problems. Comput. Optim. Appl. 28(1), 5-29 (2004)
  • (13) Bonnel, H., Iusem, A. N., Svaiter, B. F.: Proximal methods in vector optimization. SIAM J. Optim. 15(4), 953-970 (2005)
  • (14) Lucambio Pérez, L. R., Prudente, L. F.: Nonlinear conjugate gradient methods for vector optimization. SIAM J. Optim. 28(3), 2690-2720 (2018)
  • (15) Bento, G. C., Cruz Neto, J. X., López, G., Soubeyran, A., Souza, J. C. O.: The proximal point method for locally Lipschitz functions in multiobjective optimization with application to the compromise problem. SIAM J. Optim. 28(2), 1104-1120 (2018)
  • (16) Tanabe, H., Fukuda, E. H., Yamashita, N.: Proximal gradient methods for multiobjective optimization and their applications. Comput. Optim. Appl. 72, 339-361 (2019)
  • (17) Chen, J., Tang, L. P., Yang, X. M.: A Barzilai-Borwein descent method for multiobjective optimization problems. arXiv:2204.08616 (2022)
  • (18) Fliege, J., Vaz, A. I. F., Vicente, L. N.: Complexity of gradient descent for multiobjective optimization. Optim. Methods Softw. 34(5), 949-959 (2019)
  • (19) Tanabe, H., Fukuda, E. H., Yamashita, N.: Convergence rates analysis of a multiobjective proximal gradient method. Optim. Lett. (2022)
  • (20) Bauschke, H. H., Bolte, J., Teboulle, M.: A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications. Math. Oper. Res. 42(2), 330-348 (2016)
  • (21) Lu, H., Freund, R. M., Nesterov, Y.: Relatively smooth convex optimization by first-order methods, and applications. SIAM J. Optim. 28(1), 333-354 (2018)
  • (22) Teboulle, M.: A simplified view of first order methods for optimization. Math. Program. 170, 67-96 (2018)
  • (23) Bauschke, H. H., Bolte, J., Chen, J., Teboulle, M., Wang, X.: On linear convergence of non-Euclidean gradient methods without strong convexity and Lipschitz gradient continuity. J. Optim. Theory Appl. 182, 1068-1087 (2019)
  • (24) Auslender, A., Teboulle, M.: Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 16(3), 697-725 (2006)
  • (25) Villacorta, K. D. V., Oliveira, P. R.: An interior proximal method in vector optimization. Eur. J. Oper. Res. 214(3), 485-492 (2011)
  • (26) Chen, Z., Huang, X. X., Yang, X. Q.: Generalized proximal point algorithms for multiobjective optimization problems. Appl. Anal. 90(6), 935-949 (2011)
  • (27) Souza, João Carlos de O., et al.: The proximal point method with a vectorial Bregman regularization in multiobjective DC programming. Optimization 71(1), 263-284 (2022)
  • (28) Ansary, M. A. T., Dutta, J.: A proximal gradient method for multi-objective optimization problems using Bregman functions. Optimization Online (2022)
  • (29) Rockafellar, R. T.: Convex Analysis. Princeton University Press, Princeton (1970)
  • (30) Bauschke, H. H., Borwein, J. M.: Joint and separate convexity of the Bregman distance. In: Butnariu, D., Censor, Y., Reich, S. (eds.) Inherently Parallel Algorithms in Feasibility and Optimization and their Applications (Haifa 2000), pp. 23-36. Elsevier, Amsterdam (2001)
  • (31) Chen, W., Yang, X. M., Zhao, Y.: Conditional gradient method for vector optimization. arXiv:2109.11296 (2022)
  • (32) Rockafellar, R. T., Wets, R. J. B.: Variational Analysis, Grundlehren Math. Wiss. 317, Springer, Berlin (1998)
  • (33) Sion, M.: On general minimax theorems. Pac. J. Math., 8(1), 171-176 (1958)
  • (34) Moudden, M. E., Mouatasim, A. E.: Accelerated diagonal steepest descent method for unconstrained multiobjective optimization. J. Optim. Theory Appl. 188(1), 220-242 (2021)
  • (35) Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3, 538-543 (1993)
  • (36) Polyak, B. T.: Introduction to Optimization. Optimization Software, Inc., New York (1987)