
An inexact golden ratio primal-dual algorithm with linesearch step for a saddle point problem

Changjie Fang, Jinxiu Liu, Jingtao Qiu and Shenglan Chen
College of Science, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. E-mail: [email protected].

Abstract In this paper, we propose an inexact golden ratio primal-dual algorithm with linesearch step (IP-GRPDAL) for solving saddle point problems, where the two subproblems can be solved approximately by applying the notions of inexact extended proximal operators with matrix norm. Our proposed IP-GRPDAL method allows for larger stepsizes by replacing the extrapolation step with a convex combination step. Each iteration of the linesearch requires updating only the dual variable, and hence it is quite cheap. In addition, we prove convergence of the proposed algorithm and show an $O(1/N)$ ergodic convergence rate, where $N$ represents the number of iterations. When one of the component functions is strongly convex, accelerated $O(1/N^{2})$ convergence rate results are established by choosing some algorithmic parameters adaptively. Furthermore, when both component functions are strongly convex, linear convergence rate results are achieved. Numerical simulation results on sparse recovery and image deblurring problems illustrate the feasibility and efficiency of our inexact algorithms.

Keywords Convex optimization \cdot Inexact extended proximal operators \cdot Golden ratio primal-dual algorithm \cdot Linesearch \cdot Image deblurring

1 Introduction

Let $X:=\mathbb{R}^{n}$ and $Y:=\mathbb{R}^{m}$ be two finite-dimensional Euclidean spaces equipped with a standard inner product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\|=\sqrt{\langle\cdot,\cdot\rangle}$. Let $f:X\to(-\infty,+\infty]$ and $g,h:Y\to(-\infty,+\infty]$ be proper lower semicontinuous (l.s.c.) convex functions, and let $A:X\to Y$ be a bounded linear mapping. Denote the Legendre-Fenchel conjugate of $h$ and the adjoint of $A$ by $h^{*}$ and $A^{*}$, respectively. Now we consider the primal problem

$$\min_{x\in X}f(x)+h(Ax)\qquad(1.1)$$

together with its dual problem

$$\max_{y\in Y}f^{*}(-A^{*}y)+h^{*}(y).\qquad(1.2)$$

If a primal-dual solution pair $(\overline{x},\overline{y})$ of (1.1) and (1.2) exists, i.e.,

$$0\in\partial f(\overline{x})+A^{*}\overline{y},\qquad 0\in\partial h(A\overline{x})-\overline{y},$$

then the problem (1.1) is equivalent to the following saddle-point formulation:

$$\min_{x\in X}\max_{y\in Y}f(x)+\langle Ax,y\rangle-h^{*}(y).\qquad(1.3)$$

It is well known that many application problems can be formulated as the saddle point problem (1.3), such as image restoration, magnetic resonance imaging and computer vision; see, for example, [28, 30, 35]. Among the most popular approaches are the primal-dual algorithm (PDA) [9, 20], the alternating direction method of multipliers (ADMM) [5, 19], and their accelerated and generalized variants [24, 26, 25]. To solve model (1.3), the following first-order primal-dual algorithm (PDA) [9] has attracted much attention:

$$\begin{cases}{x}^{k+1}=\textrm{Prox}_{\tau f}(x^{k}-\tau A^{*}y^{k}),\\ \hat{x}^{k+1}=x^{k+1}+\theta({x^{k+1}-x^{k}}),\\ {y}^{k+1}=\textrm{Prox}_{\sigma h^{*}}(y^{k}+\sigma A\hat{x}^{k+1}),\end{cases}\qquad(1.4)$$

where $\tau>0$ and $\sigma>0$ are regularization parameters and $\theta\in(0,1)$ is an extrapolation parameter; for $\theta=1$, the convergence of PDA was proved under the stepsize requirement $\tau\sigma\|A\|^{2}<1$. Generally, with fixed $\tau$ and $\sigma$, a flexible extrapolation parameter $\theta$ is of benefit to a potential acceleration of the algorithm (1.4), which motivates researchers to enlarge the range of $\theta$; see, e.g., [18, 32]. Indeed, the scheme (1.4) reduces to the classic Arrow-Hurwicz method [4] when $\theta=0$. However, the convergence of the Arrow-Hurwicz method can only be guaranteed under the restrictive assumption that $\tau$ and $\sigma$ are sufficiently small. In order to overcome this difficulty, Chang et al. [11] proposed a golden ratio primal-dual algorithm (GRPDA) for solving (1.3) based on a seminal convex combination technique introduced by Malitsky [25], which, started at $(x^{0},y^{0})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$, iterates as

$$\begin{cases}z^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},\\ {x}^{k+1}=\textrm{Prox}_{\tau f}(z^{k+1}-\tau A^{*}y^{k}),\\ {y}^{k+1}=\textrm{Prox}_{\sigma h^{*}}(y^{k}+\sigma Ax^{k+1}),\end{cases}\qquad(1.5)$$

where $\phi\in(1,\frac{1+\sqrt{5}}{2}]$ determines the convex combination coefficients. In [11], iterative convergence and ergodic convergence rate results are established under the condition $\tau\sigma\|A\|^{2}<\phi$. Since $\phi>1$, this stepsize condition is much more relaxed than that of the PDA method (1.4); see also [12]. Further, Chang et al. incorporated a linesearch strategy into the GRPDA method, in which the next iterate $y^{k+1}$ is computed implicitly, since the stepsize parameter $\tau_{k+1}$ is determined by an inequality involving $y^{k+1}$; see Step 2 of Algorithm 3.1 in [13].
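To make the convex combination step concrete, the following NumPy sketch (ours, not taken from [11] or [13]) implements the exact GRPDA iteration (1.5); `prox_f` and `prox_hstar` stand for user-supplied exact proximal operators of $\tau f$ and $\sigma h^{*}$, and all names are illustrative.

```python
import numpy as np

def grpda(A, prox_f, prox_hstar, tau, sigma, phi, x0, y0, n_iters=1000):
    """Exact GRPDA iteration (1.5) for min_x f(x) + h(Ax).

    prox_f(v, tau)      : proximal operator of tau*f evaluated at v
    prox_hstar(v, sigma): proximal operator of sigma*h^* evaluated at v
    phi in (1, (1+sqrt(5))/2] controls the convex combination step.
    """
    x, z, y = x0.copy(), x0.copy(), y0.copy()
    for _ in range(n_iters):
        z = (phi - 1.0) / phi * x + z / phi          # convex combination step
        x = prox_f(z - tau * (A.T @ y), tau)         # primal update
        y = prox_hstar(y + sigma * (A @ x), sigma)   # dual update
    return x, y
```

Convergence of this exact scheme requires $\tau\sigma\|A\|^{2}<\phi$ [11]; the inexact linesearch variant studied below replaces the two proximal evaluations by approximate ones.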

However, when the proximal operators of $f$ and $h^{*}$ are not easy to compute, such primal-dual algorithms do not perform ideally well in terms of computing time and efficiency; see, e.g., the examples in [8, 15] and the numerical experiments in [17]. In many practical applications one often encounters the case that at least one of the proximal operators does not possess a closed-form solution, so that its evaluation requires an inner iterative algorithm. In this situation, some researchers are dedicated to solving the subproblems approximately instead of finding their accurate solutions; see, for example, [21, 24, 27, 34, 33]. An absolute error criterion was adopted in [14], where the subproblem errors are controlled by a summable sequence of error tolerances. Jiang et al. [22, 23] studied two inexact primal-dual algorithms with absolute and relative error criteria, respectively, where for the inexact primal-dual method with a relative error criterion only the $O(1/N)$ convergence rate was established. In [29], Rasch and Chambolle proposed inexact first-order primal-dual methods by applying the concepts of inexact proxima, where all the controlled errors were required to be summable. Further, Fang et al. [16] proposed an inexact primal-dual method with correction step by introducing the notions of inexact extended proximal operators with matrix norm, where the $O(1/N)$ ergodic convergence rate was achieved. However, accelerated versions of the method of [16] under the assumption that $f$ or $h^{*}$ is strongly convex have not been considered.

In this paper, we are concerned with the following saddle point problem:

$$\min_{x\in X}\max_{y\in Y}L(x,y)=f(x)+{\langle{Ax,y}\rangle}-g(y).\qquad(1.6)$$

Recall that $(\overline{x},\overline{y})$ is called a saddle point of (1.6) if it satisfies the inequalities

$$L(\overline{x},y)\leqslant L(\overline{x},\overline{y})\leqslant L(x,\overline{y}),\quad\forall x\in X,\forall y\in Y.\qquad(1.7)$$

Hence, Problem (1.3) is a special case of Problem (1.6).

Motivated by the research works [13, 16], in this paper, we propose an inexact golden ratio primal-dual algorithm with linesearch step for solving problem (1.6) by applying the type-2 approximation of the extended proximal point introduced in [16]. The main contributions of this paper are summarized as follows.

• For the case where the proximal operators of $f$ and $g$ are not easy to compute, we propose an inexact IP-GRPDAL method with extended proximal terms containing a symmetric positive definite matrix. Both subproblems in our method can be solved approximately under the type-2 approximation criteria. Global convergence and $O(1/N)$ ergodic convergence rate results are established under the condition $\tau\sigma\|A\|_{T}^{2}<\phi$, where $N$ denotes the iteration counter. Furthermore, we establish the convergence rates in the case where the error tolerances $\{\delta_{k}\}$ and $\{\varepsilon_{k}\}$ are required to decrease like $O(1/k^{2\alpha+1})$ for some $\alpha>0$.

• Our method updates the dual variable by adopting a linesearch step to allow adaptive and potentially much larger stepsizes, which effectively reduces the computational effort of each iteration. In addition, the next iterate $y^{k+1}$ is computed explicitly, in contrast with that in [13]; see (3.3) in Algorithm 1.

• We propose accelerated versions of the IP-GRPDAL method, which were not provided in [16]. When one of the underlying functions is strongly convex, $O(1/N^{2})$ convergence rate results are established by adaptively choosing some algorithmic parameters, for example, $\beta$ is replaced by $\beta_{k}$; see (3.32) of Algorithm 2. In addition, linear convergence rate results can be established when both $f$ and $g$ are strongly convex.

• We perform numerical experiments for the sparse recovery and image deblurring problems, demonstrating that our method outperforms some existing methods [13, 9, 16, 26].

The rest of this paper is organized as follows. In Section 2, we introduce the concepts of inexact extended proximal terms and present some auxiliary material. In Section 3, the main algorithm and its accelerated versions are presented. At the same time, we also prove the convergence of our algorithms and analyze their convergence rates. Numerical experiment results are reported in Section 4. Some conclusions are presented in Section 5.

2 Preliminaries

For given $x_{1}\in X$ and $y_{1}\in Y$, we define $P_{x_{1},y_{1}}(x):=f(x)-f(x_{1})+{\langle{x-x_{1},A^{*}y_{1}}\rangle}$ and $D_{x_{1},y_{1}}(y):=g(y)-g(y_{1})+{\langle{y-y_{1},-Ax_{1}}\rangle}$ for any $x\in X$ and $y\in Y$. Let $(\overline{x},\overline{y})\in X\times Y$ be a generic saddle point. When there is no confusion, we will omit the subscripts in $P$ and $D$,

$$\begin{cases}P(x):=P_{\overline{x},\overline{y}}(x)=f(x)-f(\overline{x})+\langle x-\overline{x},A^{*}\overline{y}\rangle,\ \forall x\in X,\\ D(y):=D_{\overline{x},\overline{y}}(y)=g(y)-g(\overline{y})+\langle y-\overline{y},-A\overline{x}\rangle,\ \forall y\in Y.\end{cases}\qquad(2.1)$$

By the subgradient inequality, it is clear that $P(x)\geqslant 0$ and $D(y)\geqslant 0$. Note that the functions $P(x)$ and $D(y)$ are convex in $x$ and $y$, respectively. The primal-dual gap is defined as $G(x,y):=L(x,\overline{y})-L(\overline{x},y)$ for $(x,y)\in X\times Y$. It is easy to verify that

$$G(x,y):=G_{\overline{x},\overline{y}}(x,y)=P(x)+D(y)\geqslant 0,\quad\forall(x,y)\in X\times Y.\qquad(2.2)$$

The saddle point conditions can be reformulated as

$$0\in\partial f(\overline{x})+A^{*}\overline{y},\qquad 0\in\partial g(\overline{y})-A\overline{x}.$$

Suppose that $h$ is a convex function on $\mathbb{R}^{n}$ and that $D\in\mathbb{R}^{n\times n}$ is a symmetric positive definite matrix. For any $D\succ 0$ and given $y\in\mathbb{R}^{n}$, denote

$$J_{y}(x):=h(x)+\frac{1}{2\tau}{\|x-y\|}^{2}_{D},\quad\forall x\in\mathbb{R}^{n},\qquad(2.3)$$

and define the proximal operator of $h$ as

$$\textnormal{Prox}_{\tau h}^{D}(y)=\mathop{\arg\min}\limits_{x\in X}\{h(x)+\frac{1}{2\tau}{\|x-y\|}^{2}_{D}\},\qquad(2.4)$$

where ${\|x\|}_{D}^{2}=\langle{x,Dx}\rangle$ and $D^{-1}$ denotes the inverse of $D$. The first-order optimality condition for the proximum gives the following characterizations of the proximal operator:

$$\overline{z}:=\textrm{Prox}_{\tau h}^{D}(y)\Longleftrightarrow 0\in\partial J_{y}(\overline{z})\Longleftrightarrow\frac{1}{\tau}D(y-\overline{z})\in\partial h(\overline{z}).$$
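As a simple illustration of the extended proximal operator (2.4), if $D$ is diagonal with positive entries $d_{i}$ and $h=\|\cdot\|_{1}$, the minimization separates across coordinates and reduces to soft-thresholding with coordinate-wise thresholds $\tau/d_{i}$. A minimal NumPy sketch of this special case (our illustration, with hypothetical names):

```python
import numpy as np

def prox_l1_diag(y, tau, d):
    """Prox_{tau*||.||_1}^D(y) for a diagonal D = diag(d) with d > 0.

    Coordinate i solves min_x |x| + (d_i / (2*tau)) * (x - y_i)^2,
    i.e. soft-thresholding of y_i at level tau / d_i.
    """
    thresh = tau / d
    return np.sign(y) * np.maximum(np.abs(y) - thresh, 0.0)
```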

Below, we recall definitions of three different types of inexact extended proximal operators with matrix norm, which can be found in [16].

Definition 2.1

Let $\varepsilon\geqslant 0$. $z\in X$ is said to be a type-0 approximation of the extended proximal point $\textnormal{Prox}_{\tau h}^{D}(y)$ with precision $\varepsilon$ if

$$z\thickapprox_{0}^{\varepsilon}\textnormal{Prox}_{\tau h}^{D}(y)\Longleftrightarrow\|z-\overline{z}\|_{D}\leqslant\sqrt{2\tau\varepsilon}.\qquad(2.5)$$

Definition 2.2

Let $\varepsilon\geqslant 0$. $z\in X$ is said to be a type-1 approximation of the extended proximal point $\textnormal{Prox}_{\tau h}^{D}(y)$ with precision $\varepsilon$ if

$$z\thickapprox_{1}^{\varepsilon}\textnormal{Prox}_{\tau h}^{D}(y)\Longleftrightarrow 0\in\partial_{\varepsilon}J_{y}(z),\qquad(2.6)$$

where $\partial_{\varepsilon}J_{y}(z)=\{p\in X\,|\,J_{y}(x)\geqslant J_{y}(z)+\langle{p,x-z}\rangle-\varepsilon,\ \forall x\in X\}$.

Definition 2.3

Let $\varepsilon\geqslant 0$. $z\in X$ is said to be a type-2 approximation of the extended proximal point $\textnormal{Prox}_{\tau h}^{D}(y)$ with precision $\varepsilon$ if

$$z\thickapprox_{2}^{\varepsilon}\textnormal{Prox}_{\tau h}^{D}(y)\Longleftrightarrow\frac{1}{\tau}D(y-z)\in\partial_{\varepsilon}h(z),\qquad(2.7)$$

where $\partial_{\varepsilon}h(z)=\{p\in X\,|\,h(x)\geqslant h(z)+\langle{p,x-z}\rangle-\varepsilon,\ \forall x\in X\}$.
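For $h=\|\cdot\|_{1}$ the type-2 criterion (2.7) can be certified in closed form: with $p=\frac{1}{\tau}D(y-z)$, the membership $p\in\partial_{\varepsilon}\|\cdot\|_{1}(z)$ holds if and only if $\|p\|_{\infty}\leqslant 1$ and $\|z\|_{1}-\langle p,z\rangle\leqslant\varepsilon$, since the conjugate of $\|\cdot\|_{1}$ is the indicator of the unit $\ell_{\infty}$-ball. The following sketch (our illustration, not from the paper) returns the smallest $\varepsilon$ certified for a candidate approximate proximal point $z$:

```python
import numpy as np

def type2_eps_l1(z, y, tau, D):
    """Smallest eps with  z ~_2^eps Prox_{tau*||.||_1}^D(y),  cf. (2.7).

    With p = D(y - z)/tau, p lies in the eps-subdifferential of ||.||_1 at z
    iff ||p||_inf <= 1, and the tightest certified eps is ||z||_1 - <p, z>.
    """
    p = D @ (y - z) / tau
    if np.max(np.abs(p)) > 1.0 + 1e-12:   # outside the unit l_inf ball: no finite eps
        return np.inf
    return np.sum(np.abs(z)) - p @ z
```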

We give two simple but useful lemmas in the following.

Lemma 2.4

For any $x,y,z$ and a symmetric positive definite matrix $D$, we have the identity

$$\langle{D(x-y),x-z}\rangle=\frac{1}{2}[\|x-y\|_{D}^{2}+\|x-z\|_{D}^{2}-\|y-z\|_{D}^{2}].\qquad(2.8)$$

For $\alpha\in\mathbb{R}$, there holds

$$\|\alpha x+(1-\alpha)y\|_{D}^{2}=\alpha\|x\|_{D}^{2}+(1-\alpha)\|y\|_{D}^{2}-\alpha(1-\alpha)\|x-y\|_{D}^{2}.\qquad(2.9)$$

Lemma 2.5

[31] Assuming that $\lambda$ and $\Lambda$ are the minimum and maximum eigenvalues of the symmetric positive definite matrix $D$, respectively, we have

$$\sqrt{\lambda}\|x\|\leqslant\|x\|_{D}\leqslant\sqrt{\Lambda}\|x\|.\qquad(2.10)$$

3 Algorithm and convergence properties

In this section, we propose an inexact GRPDA algorithm and then show the convergence of the proposed method. If $f$ is further assumed to be strongly convex, we can modify our method to accelerate the convergence rate. Moreover, if both $f$ and $g$ are strongly convex, a linear convergence rate can be achieved.

3.1 Convex case

1: Let $\varphi=\frac{\sqrt{5}+1}{2}$ be the golden ratio, that is, $\varphi^{2}=1+\varphi$. Choose $x^{0}=z^{0}\in\mathbb{R}^{n}$, $y^{0}\in\mathbb{R}^{m}$, $\phi\in(1,\varphi)$, $\eta\in(0,1)$, $\mu\in(0,1)$, $\tau_{0}>0$ and $\beta>0$. Let $S\in\mathbb{R}^{n\times n}$ and $T\in\mathbb{R}^{m\times m}$ be given symmetric positive definite matrices. Set $\psi=\frac{1+\phi}{\phi^{2}}$ and $k=0$.
2: Compute
$${z}^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},\qquad(3.1)$$
$${x}^{k+1}\approx_{2}^{\delta_{k+1}}\mathop{\arg\min}\limits_{x\in X}\{L(x,y^{k})+\frac{1}{2\tau_{k}}\|x-z^{k+1}\|_{S}^{2}\}.\qquad(3.2)$$
3: Choose any $\tau_{k+1}\in[\tau_{k},\psi\tau_{k}]$ and run
  3.a: Compute
$${y}^{k+1}\approx_{2}^{\varepsilon_{k+1}}\mathop{\arg\max}\limits_{y\in Y}\{L(x^{k+1},y)-\frac{1}{2\beta\tau_{k+1}}\|y-y^{k}\|_{T}^{2}\}.\qquad(3.3)$$
  3.b: Break linesearch if
$$\sqrt{\beta\tau_{k+1}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\leqslant\eta\sqrt{\frac{\phi}{\tau_{k}}}\|y^{k+1}-y^{k}\|_{T}.\qquad(3.4)$$
  Otherwise, set $\tau_{k+1}:=\tau_{k+1}\mu$ and go to 3.a.
4: Set $k\leftarrow k+1$ and return to 2.
Algorithm 1 An inexact GRPDA with linesearch (IP-GRPDAL)
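The sketch below (ours, with $S=T=I$ for readability and hypothetical helper names) shows one way to organize Algorithm 1 in NumPy. The inexact solvers `solve_x` and `solve_y` are assumed to return type-2 approximations of the subproblems (3.2) and (3.3) with tolerances $\delta_{k+1}$ and $\varepsilon_{k+1}$; note that each pass of the linesearch only recomputes the dual variable, which is why it is cheap.

```python
import numpy as np

def ip_grpdal(A, solve_x, solve_y, x0, y0, tau0=1.0, beta=1.0,
              phi=1.5, eta=0.95, mu=0.7, n_iters=500):
    """Sketch of Algorithm 1 (IP-GRPDAL) with S = T = I.

    solve_x(z, y, tau)  : type-2 approximate solution of (3.2),
                          i.e. an approximate prox of tau*f at z - tau*A^T y
    solve_y(x, y, step) : type-2 approximate solution of (3.3),
                          i.e. an approximate prox of step*g at y + step*A x
    """
    psi = (1.0 + phi) / phi**2
    x, z, y = x0.copy(), x0.copy(), y0.copy()
    tau = tau0
    for _ in range(n_iters):
        z = (phi - 1.0) / phi * x + z / phi              # (3.1)
        x = solve_x(z, y, tau)                           # (3.2)
        tau_next = psi * tau                             # trial stepsize in [tau, psi*tau]
        while True:                                      # linesearch: only the dual update is repeated
            y_next = solve_y(x, y, beta * tau_next)      # (3.3)
            lhs = np.sqrt(beta * tau_next) * np.linalg.norm(A.T @ (y_next - y))
            rhs = eta * np.sqrt(phi / tau) * np.linalg.norm(y_next - y)
            if lhs <= rhs:                               # stopping rule (3.4)
                break
            tau_next *= mu
        y, tau = y_next, tau_next
    return x, y
```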

Firstly, we summarize several useful lemmas which will be used in the sequel.

Lemma 3.1

(i) The linesearch in Algorithm 1 always terminates.
(ii) There exists $\underline{\tau}:=\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}>0$ such that $\tau_{k+1}>\underline{\tau}$ for all $k\geqslant 0$.
(iii) For any integer $N>0$, we have $|\mathcal{A}_{N}|\geqslant\hat{c}N$ for some constant $\hat{c}>0$, where $\mathcal{A}_{N}=\{1\leqslant k\leqslant N:\tau_{k}\geqslant\underline{\tau}\}$ and $|\mathcal{A}_{N}|$ is the cardinality of the set $\mathcal{A}_{N}$, which implies $\sum_{k=1}^{N}\tau_{k+1}\geqslant\underline{c}N$ with $\underline{c}=\hat{c}\underline{\tau}$.

Proof. (i) In each iteration of the linesearch, $\tau_{k}$ is multiplied by the factor $\mu\in(0,1)$. Since (3.4) is fulfilled whenever $\tau_{k}\leqslant\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$, where $L=\|A^{*}\|_{T}$, the inner loop cannot run indefinitely.

(ii) We argue by induction, assuming that $\tau_{0}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$. Our goal is to show that $\tau_{k}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$ implies $\tau_{k+1}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$. Suppose that $\tau_{k+1}=\psi\tau_{k}\mu^{i}$ for some $i\in Z^{+}$. If $i=0$, then $\tau_{k+1}>\tau_{k}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$. If $i>0$, then $\tau_{k+1}^{\prime}=\psi\tau_{k}\mu^{i-1}$ does not satisfy (3.4). Thus $\tau_{k+1}^{\prime}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$, and hence $\tau_{k+1}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$.

(iii) The detailed proof follows a similar approach to that of Lemma 3.1 (iii) of [13] and is thus omitted.

Lemma 3.2

Suppose that $\lambda_{1},\lambda_{2}>\eta$ are the minimum eigenvalues of $S$ and $T$, respectively. Let $\theta_{k+1}=\frac{\tau_{k+1}}{\tau_{k}}$ and let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 1. Then, for any $(\overline{x},\overline{y})\in X\times Y$, there holds

$$\begin{split}\tau_{k+1}&G(x^{k+1},y^{k+1})\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle\\ &+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle+\tau_{k+1}\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle\\ &+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.5)$$

Proof. By Definition 2.3, the optimality condition of (3.2) yields

$$\frac{1}{\tau_{k}}S(z^{k+1}-x^{k+1})-A^{*}y^{k}\in\partial_{\delta_{k+1}}f(x^{k+1}).$$

In view of the definition of the $\varepsilon$-subdifferential, we have

$$f(x)-f(x^{k+1})+\frac{1}{\tau_{k}}\langle S{(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},x-x^{k+1}}\rangle+\delta_{k+1}\geqslant 0,\quad\forall x\in X.\qquad(3.6)$$

Setting $x=\overline{x}$ and $x=x^{k+2}$ in (3.6), respectively, we obtain

$$\tau_{k}(f(x^{k+1})-f(\overline{x}))\leqslant\langle S(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},\overline{x}-x^{k+1}\rangle+\tau_{k}\delta_{k+1},\qquad(3.7)$$
$$\tau_{k}(f(x^{k+1})-f(x^{k+2}))\leqslant\langle S(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\tau_{k}\delta_{k+1}.\qquad(3.8)$$

In view of (3.7) with $k$ replaced by $k+1$, we have

$$\tau_{k+1}(f(x^{k+2})-f(\overline{x}))\leqslant\langle S(x^{k+2}-z^{k+2})+\tau_{k+1}A^{*}y^{k+1},\overline{x}-x^{k+2}\rangle+\tau_{k+1}\delta_{k+2}.\qquad(3.9)$$

Multiplying (3.8) by $\theta_{k+1}$ and using the fact $x^{k+1}-z^{k+1}=\phi(x^{k+1}-z^{k+2})$, we obtain

$$\tau_{k+1}(f(x^{k+1})-f(x^{k+2}))\leqslant\langle\phi\theta_{k+1}S(x^{k+1}-z^{k+2})+\tau_{k+1}A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\tau_{k+1}\delta_{k+1}.\qquad(3.10)$$

Similarly, for $\overline{y}\in Y$, from (3.3) we obtain

$$\tau_{k+1}(g(y^{k+1})-g(\overline{y}))\leqslant\frac{1}{\beta}\langle T(y^{k+1}-y^{k})-\beta\tau_{k+1}Ax^{k+1},\overline{y}-y^{k+1}\rangle+\tau_{k+1}\varepsilon_{k+1}.\qquad(3.11)$$

Direct calculations show that a summation of (3.9), (3.10) and (3.11) gives

$$\begin{split}&\tau_{k+1}(f(x^{k+1})-f(\overline{x}))+\tau_{k+1}(g(y^{k+1})-g(\overline{y}))+\tau_{k+1}\langle A^{*}\overline{y},x^{k+1}-\overline{x}\rangle-\tau_{k+1}\langle A\overline{x},y^{k+1}-\overline{y}\rangle\\ &\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\tau_{k+1}\langle A^{*}y^{k+1},\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &\quad+\tau_{k+1}\langle A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle-\tau_{k+1}\langle Ax^{k+1},\overline{y}-y^{k+1}\rangle\\ &\quad+\tau_{k+1}\langle A^{*}\overline{y},x^{k+1}-\overline{x}\rangle-\tau_{k+1}\langle A\overline{x},y^{k+1}-\overline{y}\rangle+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}$$

Applying the definitions of $P(x)$ and $D(y)$ in (2.1) to the above inequality, we have

$$\begin{split}\tau_{k+1}P(x^{k+1})+\tau_{k+1}D(y^{k+1})\leqslant&\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle+\tau_{k+1}\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle\\ &+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}),\end{split}$$

which, by the definition of G(x,y)G(x,y), implies (3.5) immediately.

Lemma 3.3

Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 1. For $k\geqslant 0$, it holds that

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-(1-\frac{\eta}{\lambda_{1}})\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}\\ &\quad-\frac{\lambda_{2}-\eta}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2}-\theta_{k+1}\phi\|x^{k+1}-z^{k+2}\|_{S}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.12)$$

Proof. First, it is easy to verify from $\psi=\frac{1+\phi}{\phi^{2}}$ and $\theta_{k+1}=\frac{\tau_{k+1}}{\tau_{k}}\leqslant\psi$ that

$$1+\frac{1}{\phi}-\phi\theta_{k+1}\geqslant 1+\frac{1}{\phi}-\phi\psi=0.\qquad(3.13)$$

Since $\lambda_{1}$ and $\lambda_{2}$ are the minimum eigenvalues of $S$ and $T$, respectively, it follows from (2.10) that

$$\|x^{k+2}-x^{k+1}\|\leqslant\frac{1}{\sqrt{\lambda_{1}}}\|x^{k+2}-x^{k+1}\|_{S}\quad\textrm{and}\quad\|A^{*}y^{k+1}-A^{*}y^{k}\|\leqslant\frac{1}{\sqrt{\lambda_{2}}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}.$$

Using the Cauchy-Schwarz inequality, from (3.4) we obtain

$$\begin{split}&2\tau_{k+1}\|A^{*}y^{k+1}-A^{*}y^{k}\|\|x^{k+2}-x^{k+1}\|\\ &\leqslant 2\tau_{k+1}\frac{1}{\sqrt{\lambda_{1}}\sqrt{\lambda_{2}}}\|x^{k+2}-x^{k+1}\|_{S}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\\ &\leqslant\eta(\frac{\theta_{k+1}\phi}{\lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2}).\end{split}\qquad(3.14)$$

Applying (2.8) and the Cauchy-Schwarz inequality, from (3.5) we have

$$\begin{split}\|x^{k+2}&-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\|z^{k+2}-\overline{x}\|_{S}^{2}+(\phi\theta_{k+1}-1)\|x^{k+2}-z^{k+2}\|_{S}^{2}-\phi\theta_{k+1}\|x^{k+1}-z^{k+2}\|_{S}^{2}\\ &-\phi\theta_{k+1}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{1}{\beta}\|y^{k+1}-y^{k}\|_{T}^{2}\\ &+2\tau_{k+1}\|A^{*}y^{k+1}-A^{*}y^{k}\|\|x^{k+1}-x^{k+2}\|+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.15)$$

In view of (3.1), we get $x^{k+2}=\frac{\phi}{\phi-1}z^{k+3}-\frac{1}{\phi-1}z^{k+2}$ and $z^{k+3}-z^{k+2}=\frac{\phi-1}{\phi}(x^{k+2}-z^{k+2})$. Thus, from (2.9) we deduce

$$\begin{split}\|x^{k+2}-\overline{x}\|_{S}^{2}&=\|\frac{\phi}{\phi-1}(z^{k+3}-\overline{x})-\frac{1}{\phi-1}(z^{k+2}-\overline{x})\|_{S}^{2}\\ &=\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}-\frac{1}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{\phi}{(\phi-1)^{2}}\|z^{k+3}-z^{k+2}\|_{S}^{2}\\ &=\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}-\frac{1}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\phi}\|x^{k+2}-z^{k+2}\|_{S}^{2}.\end{split}\qquad(3.16)$$

Combining (3.14) and (3.16) with (3.15), we obtain

$$\begin{split}\frac{\phi}{\phi-1}&\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}-(1-\phi\theta_{k+1}+\frac{1}{\phi})\|x^{k+2}-z^{k+2}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}\\ &-\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}+\eta(\frac{\theta_{k+1}\phi}{\lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2})\\ &-\frac{1}{\beta}\|y^{k+1}-y^{k}\|_{T}^{2}-\theta_{k+1}\phi\|x^{k+1}-z^{k+2}\|_{S}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})\\ \leqslant&\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-(1-\frac{\eta}{\lambda_{1}})\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}\\ &-\frac{\lambda_{2}-\eta}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2}-\theta_{k+1}\phi\|x^{k+1}-z^{k+2}\|_{S}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.17)$$

In the following, we summarize the convergence result for Algorithm 1.

Theorem 3.4

Suppose that $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ is the sequence generated by Algorithm 1. Then $\{(x^{k+1},y^{k+1}):k\geqslant 0\}$ is bounded and every limit point of $\{(x^{k+1},y^{k+1}):k\geqslant 0\}$ is a solution of (1.6).

Proof. Since $\lambda_{1},\lambda_{2}>\eta$ and $G(x^{k+1},y^{k+1})\geqslant 0$, (3.12) yields

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}\\ &\leqslant\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.18)$$

By summing over $k=0,1,2,\ldots,N-1$, we obtain

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{N}-\overline{y}\|_{T}^{2}\\ &\leqslant\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.19)$$

Since the sequences $\{\delta_{k}\}$ and $\{\varepsilon_{k}\}$ are summable and $\{\tau_{k}\}$ is bounded, $\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})$ is bounded. From this we deduce that $\frac{\phi}{\phi-1}\|z^{k+1}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}$ is bounded. Thus, $\{z^{k+1}\}$ and $\{y^{k+1}\}$ are bounded sequences. Hence, from (3.1) we know that $\{x^{k+1}\}$ is also bounded.

Summing up (3.12) from $k=0$ to $N-1$, we get

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{N}-\overline{y}\|_{T}^{2}+\phi\sum_{k=0}^{N-1}\theta_{k+1}\|x^{k+1}-z^{k+2}\|_{S}^{2}\\ &\leqslant\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})<\infty.\end{split}$$

Since $\theta_{k+1}=\frac{\tau_{k+1}}{\tau_{k}}>1$, we have $\sum_{k=0}^{\infty}\theta_{k+1}=\infty$. Letting $N\to\infty$ in the above inequality and applying the equivalence of $\|\cdot\|_{M}$ and $\|\cdot\|_{2}$, where $M$ denotes a symmetric positive definite matrix, we get $\lim\limits_{k\to\infty}\|x^{k+1}-z^{k+2}\|=0$.

Similarly, we can deduce that $\lim\limits_{k\to\infty}\|y^{k+1}-y^{k}\|=0$. Thus, $(x^{k+1},y^{k+1})$ has at least one limit point $(x^{\infty},y^{\infty})$, and hence there exists a subsequence $\{(x^{k_{i}+1},y^{k_{i}+1})\}$ such that $(x^{k_{i}+1},y^{k_{i}+1})\to(x^{\infty},y^{\infty})$ as $i\to\infty$. Similar to (3.7) and (3.11), for any $(x,y)$, there hold

$$\tau_{k_{i}}(f(x^{k_{i}+1})-f(x))\leqslant\langle S(x^{k_{i}+1}-z^{k_{i}+1})+\tau_{k_{i}}A^{*}y^{k_{i}},x-x^{k_{i}+1}\rangle+\tau_{k_{i}}\delta_{k_{i}+1}\qquad(3.20)$$

and

$$\tau_{k_{i}+1}(g(y^{k_{i}+1})-g(y))\leqslant\frac{1}{\beta}\langle T(y^{k_{i}+1}-y^{k_{i}})-\beta\tau_{k_{i}+1}Ax^{k_{i}+1},y-y^{k_{i}+1}\rangle+\tau_{k_{i}+1}\varepsilon_{k_{i}+1}.\qquad(3.21)$$

In view of Lemma 3.1 (ii), we have $\tau_{k_{i}+1}>\underline{\tau}>0$. Letting $i\rightarrow\infty$ in (3.20) and (3.21), respectively, and taking into account that $\delta_{k}\to 0$ and $\varepsilon_{k}\to 0$ as $k\to\infty$, we obtain

$$f(x)-f(x^{\infty})+\langle A^{*}y^{\infty},x-x^{\infty}\rangle\geqslant 0\quad\textrm{and}\quad g(y)-g(y^{\infty})-\langle Ax^{\infty},y-y^{\infty}\rangle\geqslant 0.\qquad(3.22)$$

This shows that (x,y)(x^{\infty},y^{\infty}) is a saddle point of (1.6).

We now establish the convergence rates of Algorithm 1.

Theorem 3.5

Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 1, and let $(\overline{x},\overline{y})$ be any saddle point of (1.6). Then, for the ergodic sequence $\{(X^{N},Y^{N})\}$ given by

$${X}^{N}=\frac{1}{S^{N}}\sum_{k=1}^{N}\tau_{k}x^{k}\quad\textrm{and}\quad{Y}^{N}=\frac{1}{S^{N}}\sum_{k=1}^{N}\tau_{k}y^{k}\quad\textrm{with}\quad S^{N}=\sum_{k=1}^{N}\tau_{k},\qquad(3.23)$$

it holds that

$$G(X^{N},Y^{N})\leqslant\frac{c_{1}}{2\underline{c}N},\qquad(3.24)$$

where

$$c_{1}=\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+\sum_{k=0}^{N-1}2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).$$

Proof. Since $\lambda_{1},\lambda_{2}>\eta$, it follows from (3.12) that

$$\begin{split}2\tau_{k+1}G(x^{k+1},y^{k+1})\leqslant&\ \frac{\phi}{\phi-1}(\|z^{k+2}-\overline{x}\|_{S}^{2}-\|z^{k+3}-\overline{x}\|_{S}^{2})+\frac{1}{\beta}(\|y^{k}-\overline{y}\|_{T}^{2}-\|y^{k+1}-\overline{y}\|_{T}^{2})\\ &+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}$$

By taking summation over $k=0,1,2,\ldots,N-1$, we obtain

$$2\sum_{k=0}^{N-1}\tau_{k+1}G(x^{k+1},y^{k+1})\leqslant\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\qquad(3.25)$$

Since $P(x)$ and $D(y)$ are convex,

$$\begin{cases}P(X^{N})\leqslant\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}P(x^{k+1}),\\ D(Y^{N})\leqslant\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}D(y^{k+1}),\end{cases}$$

where $S^{N}=\sum_{k=0}^{N-1}\tau_{k+1}$. Combining (3.23) with (3.25), we obtain

$$\begin{split}G(&X^{N},Y^{N})=P(X^{N})+D(Y^{N})\\ &\leqslant\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}(P(x^{k+1})+D(y^{k+1}))=\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{1}{2S^{N}}(\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}(\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}))).\end{split}\qquad(3.26)$$

In view of Lemma 3.1 (iii), we have $S^{N}=\sum_{k=1}^{N}\tau_{k}\geqslant\underline{c}N$. Hence, (3.24) holds.

Lemma 3.6

([29]) For $\xi>-1$, let $s^{N}:=\sum_{k=1}^{N}k^{\xi}$. Then

$$s^{N}=O(N^{1+\xi}).$$

Similar to Corollary 3.4 in [29], we have the following theorem.

Theorem 3.7

If $\alpha>0$, $\delta_{k}=O(\frac{1}{k^{\alpha}})$ and $\varepsilon_{k}=O(\frac{1}{k^{\alpha}})$, we have

$$G(X^{N},Y^{N})=\left\{\begin{array}{ll}O(1/N),&\alpha>1,\\ O(\ln N/N),&\alpha=1,\\ O(N^{-\alpha}),&\alpha\in(0,1).\end{array}\right.\qquad(3.27)$$

Proof. If $\alpha>1$, then the sequences $\{\delta_{k}\}$ and $\{\varepsilon_{k}\}$ are summable. Since the sequence $\{\tau_{k}\}$ is bounded, from (3.24) we have

$$G(X^{N},Y^{N})=O(1/N).$$

If $\alpha=1$, then $\delta_{k}=O(\frac{1}{k})$ and $\varepsilon_{k}=O(\frac{1}{k})$. In view of the assumption on $\delta_{k}$, for some $r>0$ we have

$$\delta_{k}\leqslant\frac{r}{k+1}.$$

Thus,

$$\sum_{k=0}^{N-1}\delta_{k+1}\leqslant\sum_{k=0}^{N-1}\frac{r}{k+1}=r(1+\sum_{k=2}^{N}\frac{1}{k})\leqslant r(1+\int_{1}^{N}\frac{1}{t}dt)=r(1+\ln N).$$

Hence, from the boundedness of $\{\tau_{k}\}$ we know that $\sum_{k=0}^{N-1}\tau_{k+1}\delta_{k+1}=O(\ln N)$. Similarly, $\sum_{k=0}^{N-1}\tau_{k+1}\varepsilon_{k+1}=O(\ln N)$. Therefore, we get

$$G(X^{N},Y^{N})=O(\ln N/N).$$

If $\alpha\in(0,1)$, then $-\alpha>-1$, and from Lemma 3.6 we obtain $\sum_{k=0}^{N-1}\delta_{k+1}=O(N^{1-\alpha})$ and $\sum_{k=0}^{N-1}\varepsilon_{k+1}=O(N^{1-\alpha})$. Thus, we have

$$G(X^{N},Y^{N})=O(N^{-\alpha}).$$

3.2 Partially strongly convex case

Now, we consider the acceleration of Algorithm 1 under the additional assumption that $f$ is $\gamma_{f}$-strongly convex, i.e., it holds for some $\gamma_{f}>0$ that

$$f(y)\geqslant f(x)+\langle u,y-x\rangle+\frac{\gamma_{f}}{2}\|y-x\|^{2},\quad\forall u\in\partial f(x),\forall x,y\in\mathbb{R}^{n}.\qquad(3.28)$$

When $g$ is strongly convex, the corresponding results can be achieved similarly and are thus omitted. The accelerated version of Algorithm 1 is summarized in Algorithm 2.

1: Choose $x^{0}=z^{0}\in\mathbb{R}^{n}$, $y^{0}\in\mathbb{R}^{m}$, $\beta_{0}>0$, $\tau_{0}>0$, $\mu\in(0,1)$ and $\phi\in(\xi,\varphi)$, where $\xi$ is the unique real root of $\xi^{3}-\xi-1=0$. Let $S\in\mathbb{R}^{n\times n}$ and $T\in\mathbb{R}^{m\times m}$ be given symmetric positive definite matrices, let $\lambda_{1},\lambda_{2}>1$ be the minimum eigenvalues of $S$ and $T$, respectively, and let $\Lambda_{1}$ be the maximum eigenvalue of $S$. Set $\psi=\frac{1+\phi}{\phi^{2}}$ and $k=0$.
2: Compute
$${z}^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},\qquad(3.29)$$
$${x}^{k+1}\approx_{2}^{\delta_{k+1}}\mathop{\arg\min}\limits_{x\in X}\{L(x,y^{k})+\frac{1}{2\tau_{k}}\|x-z^{k+1}\|_{S}^{2}\},\qquad(3.30)$$
$$\omega_{k+1}=\frac{\phi-\psi}{\phi\Lambda_{1}+\psi\gamma_{f}\tau_{k}},\qquad(3.31)$$
$$\beta_{k+1}=\beta_{k}(1+\gamma_{f}\omega_{k+1}\tau_{k}).\qquad(3.32)$$
3: Choose any $\tau_{k+1}\in[\tau_{k},\psi\tau_{k}]$ and run
  3.a: Compute
$${y}^{k+1}\approx_{2}^{\varepsilon_{k+1}}\mathop{\arg\max}\limits_{y\in Y}\{L(x^{k+1},y)-\frac{1}{2\beta_{k+1}\tau_{k+1}}\|y-y^{k}\|_{T}^{2}\}.\qquad(3.33)$$
  3.b: Break linesearch if
$$\sqrt{\beta_{k+1}\tau_{k+1}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\leqslant\sqrt{\frac{\phi}{\tau_{k}}}\|y^{k+1}-y^{k}\|_{T}.\qquad(3.34)$$
  Otherwise, set $\tau_{k+1}:=\tau_{k+1}\mu$ and go to 3.a.
4: Set $k\leftarrow k+1$ and return to 2.
Algorithm 2 Accelerated IP-GRPDAL when $f$ is $\gamma_{f}$-strongly convex
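Compared with Algorithm 1, the structural changes are that $\eta$ is dropped from the linesearch test (3.34) (effectively $\eta=1$) and that $\beta$ becomes iteration dependent through (3.31)-(3.32). A minimal sketch of this parameter update (our illustration):

```python
def accelerated_params(beta_k, tau_k, gamma_f, phi, Lambda1):
    """Parameter updates (3.31)-(3.32) of Algorithm 2."""
    psi = (1.0 + phi) / phi**2
    omega = (phi - psi) / (phi * Lambda1 + psi * gamma_f * tau_k)   # (3.31)
    beta_next = beta_k * (1.0 + gamma_f * omega * tau_k)            # (3.32)
    return omega, beta_next
```

As shown in Lemma 3.8 (ii) below, this recursion makes $\beta_{k}$ grow at least like $k^{2}$, which is what drives the $O(1/N^{2})$ rate in Theorem 3.9.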

Similar to Lemma 3.1, we have the following results.

Lemma 3.8

(i) The linesearch in Algorithm 2 always terminates.
(ii) There exists a constant $c>0$ such that $\beta_{k}\geqslant ck^{2}$.

Proof. (i) This conclusion follows from Lemma 3.1 (i) by replacing $\beta$ with $\beta_{k}$ and setting $\eta=1$.
(ii) The proof is similar to that of Lemma 4.1 (ii) in [13] and is thus omitted.

Next we will establish the $O(1/{N^{2}})$ convergence rate of Algorithm 2.

Theorem 3.9

Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 2, and let $(\overline{x},\overline{y})$ be any saddle point of (1.6). Then we have $\|z^{N+2}-\overline{x}\|_{S}=O(\frac{1}{N})$ and, for the ergodic sequence given by

$$\overline{S}^{N}=\sum_{k=1}^{N}\beta_{k}\tau_{k},\quad\overline{X}^{N}=\frac{1}{\overline{S}^{N}}\sum_{k=1}^{N}\beta_{k}\tau_{k}x^{k},\quad\textrm{and}\quad\overline{Y}^{N}=\frac{1}{\overline{S}^{N}}\sum_{k=1}^{N}\beta_{k}\tau_{k}y^{k},\qquad(3.35)$$

there exists a constant $c_{3}>0$ such that

$$G(\overline{X}^{N},\overline{Y}^{N})\leqslant\frac{c_{3}}{N^{2}}.\qquad(3.36)$$

Proof. Since $f$ is strongly convex, it follows from (3.30) and Definition 2.3 that

$$f(x)-f(x^{k+1})+\frac{1}{\tau_{k}}\langle{S(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},x-x^{k+1}}\rangle-\frac{\gamma_{f}}{2}\|x-x^{k+1}\|^{2}+\delta_{k+1}\geqslant 0.\qquad(3.37)$$

Since $\Lambda_{1}$ is the maximum eigenvalue of $S$, from (2.10) we get $\|x-x^{k+1}\|^{2}\geqslant\frac{1}{\Lambda_{1}}\|x-x^{k+1}\|_{S}^{2}$. Similar to (3.7)-(3.10), we obtain

$$\begin{split}\tau_{k+1}&(f(x^{k+2})-f(\overline{x})+\frac{\gamma_{f}}{2\Lambda_{1}}\|\overline{x}-x^{k+2}\|_{S}^{2})\\ &\leqslant\langle S(x^{k+2}-z^{k+2})+\tau_{k+1}A^{*}y^{k+1},\overline{x}-x^{k+2}\rangle+\tau_{k+1}\delta_{k+2},\end{split}\qquad(3.38)$$
$$\begin{split}\tau_{k+1}&(f(x^{k+1})-f(x^{k+2})+\frac{\gamma_{f}}{2\Lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2})\\ &\leqslant\langle\phi\theta_{k+1}S(x^{k+1}-z^{k+2})+\tau_{k+1}A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\tau_{k+1}\delta_{k+1}.\end{split}\qquad(3.39)$$

From (3.33), for $\overline{y}\in Y$ we have

$$\tau_{k+1}(g(y^{k+1})-g(\overline{y}))\leqslant\frac{1}{\beta_{k+1}}\langle T(y^{k+1}-y^{k})-\beta_{k+1}\tau_{k+1}Ax^{k+1},\overline{y}-y^{k+1}\rangle+\tau_{k+1}\varepsilon_{k+1}.\qquad(3.40)$$

Then, by adding (3.38), (3.39) and (3.40) and using similar arguments as in Lemma 3.2, we deduce

$$\begin{split}\tau_{k+1}G&(x^{k+1},y^{k+1})\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &+\frac{1}{\beta_{k+1}}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle+\tau_{k+1}\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle\\ &-\frac{\gamma_{f}\tau_{k+1}}{2\Lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}-\frac{\gamma_{f}\tau_{k+1}}{2\Lambda_{1}}\|\overline{x}-x^{k+2}\|_{S}^{2}+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.41)$$

Using (2.8) and the Cauchy-Schwarz inequality, from (3.41) we get

$$\begin{split}(1+&\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\|\overline{x}-x^{k+2}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|\overline{y}-y^{k+1}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{1}{\beta_{k+1}}\|y^{k}-y^{k+1}\|_{T}^{2}\\ &+(\phi\theta_{k+1}-1)\|z^{k+2}-x^{k+2}\|_{S}^{2}-\phi\theta_{k+1}\|z^{k+2}-x^{k+1}\|_{S}^{2}-\phi\theta_{k+1}\|x^{k+2}-x^{k+1}\|_{S}^{2}\\ &+2\tau_{k+1}\|A^{*}(y^{k+1}-y^{k})\|\|x^{k+1}-x^{k+2}\|+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.42)$$

Combining (3.16) with (3.42), we have

$$\begin{split}(1+&\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}\|\overline{x}-z^{k+3}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|\overline{y}-y^{k+1}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{1}{\beta_{k+1}}\|y^{k}-y^{k+1}\|_{T}^{2}\\ &+(\phi\theta_{k+1}-1-\frac{1+(\gamma_{f}\tau_{k+1}/\Lambda_{1})}{\phi})\|z^{k+2}-x^{k+2}\|_{S}^{2}-\phi\theta_{k+1}\|z^{k+2}-x^{k+1}\|_{S}^{2}\\ &-\phi\theta_{k+1}\|x^{k+2}-x^{k+1}\|_{S}^{2}+2\tau_{k+1}\|A^{*}(y^{k+1}-y^{k})\|\|x^{k+1}-x^{k+2}\|\\ &+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.43)$$

Thus, it follows from (2.10), (3.34) and the Cauchy-Schwarz inequality that

$$2\tau_{k+1}\|A^{*}y^{k+1}-A^{*}y^{k}\|\|x^{k+1}-x^{k+2}\|\leqslant\frac{\theta_{k+1}\phi}{\lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\lambda_{2}\beta_{k+1}}\|y^{k+1}-y^{k}\|_{T}^{2}.\qquad(3.44)$$

Since $\psi=\frac{1+\phi}{\phi^{2}}$ and $\theta_{k+1}\leqslant\psi$, we have $\phi\theta_{k+1}-1-\frac{1+(\gamma_{f}\tau_{k+1}/\Lambda_{1})}{\phi}\leqslant\phi\psi-1-\frac{1}{\phi}-\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})}{\phi}=-\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}\phi}$. Therefore, substituting (3.44) into (3.43), we obtain

$$\begin{split}(1+&\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}\|\overline{x}-z^{k+3}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|\overline{y}-y^{k+1}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{\gamma_{f}\tau_{k+1}}{\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &-(1-\frac{1}{\lambda_{1}})\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}-\frac{\lambda_{2}-1}{\lambda_{2}\beta_{k+1}}\|y^{k+1}-y^{k}\|_{T}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{\gamma_{f}\tau_{k+1}}{\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.45)$$

Note that $(1+\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}=\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}$. Thus, it follows from $\tau_{k+2}\leqslant\psi\tau_{k+1}$ and $\omega_{k+2}=\frac{\phi-\psi}{\phi\Lambda_{1}+\psi\gamma_{f}\tau_{k+1}}$ that

$$\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}\geqslant\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\psi\tau_{k+1}/\Lambda_{1})}=1+\frac{(\phi-\psi)\gamma_{f}\tau_{k+1}}{\phi\Lambda_{1}+\gamma_{f}\psi\tau_{k+1}}=1+\omega_{k+2}\gamma_{f}\tau_{k+1}.\qquad(3.46)$$

Since $\beta_{k+2}=\beta_{k+1}(1+\gamma_{f}\omega_{k+2}\tau_{k+1})$,

$$\begin{split}(1+\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}&=\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}\\ &\geqslant(1+\omega_{k+2}\gamma_{f}\tau_{k+1})\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}\\ &=\frac{\beta_{k+2}}{\beta_{k+1}}\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}.\end{split}\qquad(3.47)$$

Substituting (3.47) into (3.45) and multiplying the resulting inequality by $\frac{1}{2}\beta_{k+1}$, we have

$$\begin{split}\frac{\beta_{k+2}}{2}&\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}\|\overline{x}-z^{k+3}\|_{S}^{2}+\frac{1}{2}\|\overline{y}-y^{k+1}\|_{T}^{2}+\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{\beta_{k+1}}{2}\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{2}\|y^{k}-\overline{y}\|_{T}^{2}\\ &-\frac{\beta_{k+1}}{2}\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}\phi}\|z^{k+2}-x^{k+2}\|_{S}^{2}+\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.48)$$

Define $A_{k+1}:=\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{2(\phi-1)}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{2\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}$, and hence (3.48) yields

$$\begin{split}\beta_{k+2}A_{k+2}+\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})\leqslant&\ \beta_{k+1}A_{k+1}-\frac{\beta_{k+1}\gamma_{f}\tau_{k+1}}{2\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &+\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}$$

By summing over $k=0,1,2,\ldots,N-1$ in the above inequality, we obtain

$$\begin{split}\beta_{N+1}A_{N+1}&+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})+\sum_{k=0}^{N-1}\frac{\beta_{k+1}\gamma_{f}\tau_{k+1}}{2\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &\leqslant\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.49)$$

In view of the convexity of $G(x,y)$, from (3.49) we have

$$\begin{split}G(\overline{X}^{N},\overline{Y}^{N})&\leqslant\frac{1}{\overline{S}^{N}}\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{1}{\overline{S}^{N}}(\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})).\end{split}\qquad(3.50)$$

According to the definition of $A_{k+1}$ and (3.49), we get

$$A_{N+1}=\frac{(\gamma_{f}\tau_{N+1}/\Lambda_{1})+\phi}{2(\phi-1)}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{2\beta_{N+1}}\|y^{N}-\overline{y}\|_{T}^{2}\leqslant\frac{\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})}{\beta_{N+1}}.$$

Thus,

$$\|z^{N+2}-\overline{x}\|_{S}^{2}\leqslant\frac{2(\phi-1)}{(\gamma_{f}\tau_{N+1}/\Lambda_{1})+\phi}A_{N+1}\leqslant\frac{2(\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}))}{\beta_{N+1}}.\qquad(3.51)$$

From Lemma 3.8 (ii), we know that there exists $c>0$ such that $\beta_{k}\geqslant ck^{2}$ for all $k\geqslant 1$, and hence (3.51) implies $\|z^{N+2}-\overline{x}\|_{S}\leqslant\frac{c_{2}}{N}$ with

$$c_{2}=\sqrt{\frac{2(\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}))}{c}}>0.$$

Since $\beta_{k+1}=\beta_{k}(1+\gamma_{f}\omega_{k+1}\tau_{k})$ and $\omega_{k+1}=\frac{\phi-\psi}{\phi\Lambda_{1}+\psi\gamma_{f}\tau_{k}}<1$,

$$\beta_{k}\tau_{k}=\frac{\beta_{k+1}-\beta_{k}}{\gamma_{f}\omega_{k+1}}\geqslant\frac{\beta_{k+1}-\beta_{k}}{\gamma_{f}}.\qquad(3.52)$$

Since $\overline{S}^{N}=\sum_{k=1}^{N}\beta_{k}\tau_{k}\geqslant\frac{\beta_{N+1}-\beta_{1}}{\gamma_{f}}$ and $\beta_{k}\geqslant ck^{2}$, we obtain $\frac{1}{\overline{S}^{N}}=O(1/N^{2})$. This means that $G(\overline{X}^{N},\overline{Y}^{N})\leqslant\frac{c_{3}}{N^{2}}$ for some constant $c_{3}>0$. This completes the proof.

3.3 Completely strongly convex case

We further assume that $f$ is $\gamma_{f}$-strongly convex and $g$ is $\gamma_{g}$-strongly convex. In this setting, Algorithm 3 achieves a linear convergence rate by properly selecting the parameters $\tau_{k}$, $\delta_{k}$ and $\varepsilon_{k}$.

1: Choose $x^{0}=z^{0}\in\mathbb{R}^{n}$, $y^{0}\in\mathbb{R}^{m}$, $\beta>0$, $\tau_{0}>0$, $\mu\in(0,1)$ and $\phi\in(\xi,\varphi)$, where $\xi$ is the unique real root of $\xi^{3}-\xi-1=0$. Let $S\in\mathbb{R}^{n\times n}$ and $T\in\mathbb{R}^{m\times m}$ be given symmetric positive definite matrices. Set $\psi=\frac{1+\phi}{\phi^{2}}$ and $k=0$.
2: Compute
$${z}^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},$$
$${x}^{k+1}\approx_{2}^{\delta_{k+1}}\mathop{\arg\min}\limits_{x\in X}\{L(x,y^{k})+\frac{1}{2\tau_{k}}\|x-z^{k+1}\|_{S}^{2}\}.$$
3: Choose any $\tau_{k+1}\in[\tau_{k},\psi\tau_{k}]$ and run
  3.a: Compute
$${y}^{k+1}\approx_{2}^{\varepsilon_{k+1}}\mathop{\arg\max}\limits_{y\in Y}\{L(x^{k+1},y)-\frac{1}{2\beta\tau_{k+1}}\|y-y^{k}\|_{T}^{2}\}.\qquad(3.53)$$
  3.b: Break linesearch if
$$\sqrt{\beta\tau_{k+1}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\leqslant\sqrt{\frac{\phi}{\tau_{k}}}\|y^{k+1}-y^{k}\|_{T}.\qquad(3.54)$$
  Otherwise, set $\tau_{k+1}:=\tau_{k+1}\mu$ and go to 3.a.
4: Set $k\leftarrow k+1$ and return to 2.
Algorithm 3 Accelerated IP-GRPDAL when $f$ and $g$ are strongly convex

Now, we summarize the linear convergence rate of Algorithm 3 in the following theorem.

Theorem 3.10

Suppose that $\Lambda_{1},\Lambda_{2}>1$ are the maximum eigenvalues of $S$ and $T$, respectively, that $\delta_{k}=\varepsilon_{k}=O(q^{k})$ with $q\in(0,1)$, and that $\tau_{k}=\tau$ is such that $1+\frac{\gamma_{f}\tau}{\Lambda_{1}}=1+\frac{\beta\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho}$. Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 3 and let $(\overline{x},\overline{y})$ be any saddle point of (1.6). Then, for the ergodic sequence $\{(\widetilde{X}^{N},\widetilde{Y}^{N})\}$ given by

$$\widetilde{X}^{N}=\frac{1}{\widetilde{S}^{N}}\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}x^{k+1}\quad\textrm{and}\quad\widetilde{Y}^{N}=\frac{1}{\widetilde{S}^{N}}\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}y^{k+1}\quad\textrm{with}\quad\widetilde{S}^{N}=\sum_{k=0}^{N-1}\frac{1}{\rho^{k}},\qquad(3.55)$$

it holds that

$$G(\widetilde{X}^{N},\widetilde{Y}^{N})+\frac{1}{2\tau}\|\overline{x}-z^{N+2}\|_{S}^{2}=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q.\end{array}\right.$$
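Before giving the proof, note that the condition $1+\frac{\gamma_{f}\tau}{\Lambda_{1}}=1+\frac{\beta\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho}$ fixes two of the parameters explicitly (an elementary reformulation we record only for orientation):

$$\beta=\frac{\gamma_{f}\Lambda_{2}}{\gamma_{g}\Lambda_{1}},\qquad\rho=\frac{\Lambda_{1}}{\Lambda_{1}+\gamma_{f}\tau}\in(0,1),$$

so a larger constant stepsize $\tau$ yields a smaller contraction factor $\rho$.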

Proof. Similar to the proof of (3.41) in Theorem 3.9, one can deduce that

$$\begin{split}\tau G(&x^{k+1},y^{k+1})\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle+\tau\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle-\frac{\gamma_{f}\tau}{2\Lambda_{1}}\|x^{k+2}-\overline{x}\|_{S}^{2}\\ &-\frac{\gamma_{f}\tau}{2\Lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}-\frac{\gamma_{g}\tau}{2\Lambda_{2}}\|y^{k+1}-\overline{y}\|_{T}^{2}+\tau(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.56)$$

Applying the same arguments as in (3.42)-(3.45), we have

$$\begin{split}(1&+\frac{\gamma_{f}\tau}{\Lambda_{1}})\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+(\frac{1}{\beta}+\frac{\gamma_{g}\tau}{\Lambda_{2}})\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau G(x^{k+1},y^{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{\gamma_{f}\tau}{\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &+2\tau(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.57)$$

Since $1+\frac{\gamma_{f}\tau}{\Lambda_{1}}=1+\frac{\beta\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho}$, from (3.47) we get

$$(1+\frac{\gamma_{f}\tau}{\Lambda_{1}})\frac{\phi}{\phi-1}\geqslant\frac{\phi(1+(\gamma_{f}\tau/\Lambda_{1}))}{\phi+(\gamma_{f}\tau/\Lambda_{1})}\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\geqslant\frac{1}{\rho}\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\quad\textrm{and}\quad\frac{1}{\beta}+\frac{\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho\beta}.$$

Thus, the inequality (3.57) implies that

$$\begin{split}\frac{1}{\rho}&\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\rho\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau G(x^{k+1},y^{k+1})\\ &\leqslant\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}+2\tau(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.58)$$

Multiplying (3.58) by $\rho^{-k}$ and summing the resulting inequality from $k=0$ to $N-1$, we get

$$\begin{split}\frac{1}{\rho^{N}}&\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{\rho^{N}\beta}\|y^{N}-\overline{y}\|_{T}^{2}+2\tau\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\tau\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.59)$$

Since P(x) and D(y) are convex, it follows from (3.55) and Jensen's inequality that

\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}G(x^{k+1},y^{k+1})=\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(P(x^{k+1})+D(y^{k+1}))\geqslant\widetilde{S}^{N}(P(\widetilde{X}^{N})+D(\widetilde{Y}^{N}))=\widetilde{S}^{N}G(\widetilde{X}^{N},\widetilde{Y}^{N}).

Therefore, from (3.59) we have

\begin{split}\widetilde{S}^{N}&G(\widetilde{X}^{N},\widetilde{Y}^{N})+\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{2\tau\rho^{N}(\phi-1)}\|\overline{x}-z^{N+2}\|_{S}^{2}\\ &\leqslant\frac{(\gamma_{f}\tau/\Lambda_{1})+\phi}{2\tau(\phi-1)}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{2\beta\tau}\|y^{0}-\overline{y}\|_{T}^{2}+\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split} (3.60)

Note that \delta_{k}=\varepsilon_{k}=O(q^{k}), \frac{1}{\widetilde{S}^{N}}=1/(\sum_{k=0}^{N-1}\frac{1}{\rho^{k}})=O(\rho^{N}) with q,\rho\in(0,1), and, for \rho\neq q,

\rho^{N}\sum_{k=0}^{N-1}\frac{q^{k}}{\rho^{k}}=\rho^{N}\sum_{k=0}^{N-1}\Big(\frac{q}{\rho}\Big)^{k}=(q^{N}-\rho^{N})\frac{\rho}{q-\rho}.

Thus,

\displaystyle\frac{1}{\widetilde{S}^{N}}\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q.\end{array}\right.

Since G(\widetilde{X}^{N},\widetilde{Y}^{N})\geqslant 0 and \frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\rho^{N}(\phi-1)}>\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}>1, it follows from (3.60) that

G(\widetilde{X}^{N},\widetilde{Y}^{N})=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q,\end{array}\right.\qquad\frac{1}{2\tau}\|\overline{x}-z^{N+2}\|_{S}^{2}=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q.\end{array}\right.

This completes the proof.

4 Numerical experiments

In this section, we apply the proposed Algorithm 1 to solve sparse recovery and image deblurring problems. We compare the IP-GRPDAL method with Algorithm 1 in [9] (denoted by PDA), Algorithm 1 in [16] (denoted by IPDA), the PDAL method in [26], and the GRPDAL method in [13]. All codes were written in MATLAB R2015a on a PC with an Intel i5-5200U CPU. For simplicity we set S=\left(\begin{array}[]{cc}\frac{1}{r_{1}}I_{1}&0\\ 0&\frac{1}{r_{2}}I_{2}\end{array}\right), T=\left(\begin{array}[]{cc}\frac{1}{s_{1}}I_{1}&0\\ 0&\frac{1}{s_{2}}I_{2}\end{array}\right), \delta_{k+1}=0, and \varepsilon_{k+1}=O(1/(k+1)^{\alpha}) in Algorithm 1.
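For illustration, the block-diagonal scaling matrices S and T can be assembled in MATLAB as follows; this is a minimal sketch in which the block sizes n1 and n2 and the dense construction via blkdiag are placeholders for the actual implementation.

n1 = 100; n2 = 100;                              % illustrative block sizes
s1 = 2; s2 = 1; r1 = 0.99/s1; r2 = 0.99/s2;      % values used in Section 4.1
S = blkdiag((1/r1)*eye(n1), (1/r2)*eye(n2));     % S = diag(I1/r1, I2/r2)
T = blkdiag((1/s1)*eye(n1), (1/s2)*eye(n2));     % T = diag(I1/s1, I2/s2)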

4.1 l_{1}-regularized analysis sparse recovery problem

We study the following problem:

\mathop{\textrm{min}}\limits_{x}\varPhi(x):=\frac{1}{2}\|Ax-b\|^{2}+\zeta\|x\|_{1}, (4.1)

where A\in\mathbb{R}^{m\times n}, x\in\mathbb{R}^{n}, b\in\mathbb{R}^{m}, and \zeta>0 is a regularization parameter. Let f(x)=\zeta\|x\|_{1} and g^{*}(y)=\frac{1}{2}\|y\|^{2}+\langle b,y\rangle. Then (4.1) can be rewritten as

\mathop{\textrm{min}}\limits_{x}\mathop{\textrm{max}}\limits_{y}\langle Ax,y\rangle+f(x)-g^{*}(y). (4.2)
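The dual term g^{*} above is the Fenchel conjugate of z\mapsto\frac{1}{2}\|z-b\|^{2}; a short verification, with w=z-b and the inner supremum attained at w=y, is

g^{*}(y)=\sup_{z}\Big\{\langle y,z\rangle-\frac{1}{2}\|z-b\|^{2}\Big\}=\langle b,y\rangle+\sup_{w}\Big\{\langle y,w\rangle-\frac{1}{2}\|w\|^{2}\Big\}=\frac{1}{2}\|y\|^{2}+\langle b,y\rangle.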

We use the proximal gradient method to solve the subproblems in these algorithms. The parameters are set as follows (a MATLAB sketch of this setup is given after the list):
(i) A:=\frac{1}{\sqrt{n}}\textrm{randn}(n,p);
(ii) \omega\in R^{n} is a random vector, where s random coordinates are drawn from the uniform distribution on [-10,10] and the rest are zeros;
(iii) The observed vector b is generated by b=A\omega+N(0,0.1), and the regularization parameter \zeta is set to 0.1;
(iv) We set \tau=\frac{1}{10\|A\|},\sigma=\frac{10}{\|A\|} for the PDA method. For the PDAL, GRPDAL, and IP-GRPDAL methods, we set \tau_{0}=\|y_{-1}-y_{0}\|/(\sqrt{\beta}\|A^{*}y_{-1}-A^{*}y_{0}\|),\beta=100,\phi=1.618,\mu=0.7, and \eta=0.99. In addition, we set s_{1}=2,r_{1}=\frac{0.99}{s_{1}},s_{2}=1,r_{2}=\frac{0.99}{s_{2}}, and \alpha=2 for the IP-GRPDAL method, the same as for the IPDA method. The initial points are x^{0}=(0,...,0) and y^{0}=Ax^{0}+b.
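The following minimal MATLAB sketch illustrates the data generation in (i)-(iii) and the soft-thresholding operator, i.e., the proximal map of a weighted l_{1}-norm, which is the building block of the x-subproblems here. Variable names are illustrative; \omega is taken in R^{p} so that A\omega is well defined, and N(0,0.1) is read as Gaussian noise with standard deviation 0.1.

n = 500; p = 800; s = 50;                     % e.g. the second test group
A = randn(n, p) / sqrt(n);                    % (i) random matrix
omega = zeros(p, 1);                          % (ii) s-sparse ground truth
idx = randperm(p, s);
omega(idx) = -10 + 20*rand(s, 1);             % uniform entries in [-10, 10]
b = A*omega + 0.1*randn(n, 1);                % (iii) noisy observations
zeta = 0.1;                                   % regularization parameter
soft = @(x, t) sign(x).*max(abs(x) - t, 0);   % prox of t*||.||_1 (soft thresholding)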

In this experiment, we terminate all the algorithms when \varPhi(x^{k})-\varPhi^{*}<10^{-10}, where an approximation of the optimal value \varPhi^{*} is obtained by running the algorithms for 5000 iterations.
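In MATLAB this test can be written as below, where Phi_star denotes the precomputed approximation of \varPhi^{*}; the function handles are illustrative.

Phi = @(x) 0.5*norm(A*x - b)^2 + zeta*norm(x, 1);      % objective (4.1)
converged = @(x, Phi_star) Phi(x) - Phi_star < 1e-10;  % outer stopping test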

To investigate the stability and efficiency of our method, we test three groups of problems with different triples (n,p,s) and run the tests 10 times for each group. The average performance of the tested methods, including the CPU time (Time, in seconds), the number of iterations (Iter), and, for the linesearch-based methods, the number of extra linesearch trial steps (LS), is reported in Table 1.

Table 1: Numerical results of the tested algorithms with random tight frames

(n, p, s)           PDA: Time / Iter     IPDA: Time / Iter     PDAL: Time / Iter / LS
(100, 100, 10)      62.23 / 58842        13.68 / 51229         29.27 / 28316 / 27918
(500, 800, 50)      138.89 / \diagup     60.64 / 92863         79.56 / 51500 / 50276
(1000, 2000, 100)   427.67 / \diagup     391.08 / 184676       206.37 / 108251 / 106938

(n, p, s)           GRPDAL: Time / Iter / LS     IP-GRPDAL: Time / Iter / LS
(100, 100, 10)      18.75 / 19390 / 5976         9.49 / 18292 / 4647
(500, 800, 50)      58.23 / 42407 / 13980        53.01 / 38594 / 12260
(1000, 2000, 100)   175.49 / \diagup / 72619     113.30 / \diagup / 58627
Figure 1: Evolution of function value residuals with respect to CPU time. (a) (n,p,s)=(100,100,10); (b) (n,p,s)=(500,800,50); (c) (n,p,s)=(1000,2000,100).

From the results in Table 1, we observe that our IP-GRPDAL method requires less CPU time and fewer iterations than the other methods. In Figure 1, the ordinate denotes the function value residual \varPhi(x^{k})-\varPhi^{*} and the abscissa denotes the CPU time; the plots show that the IP-GRPDAL method is much faster than the other methods.

4.2 TV-L_{1} image deblurring problem

In this subsection, we study the numerical solution of the TV-L_{1} model (4.1) in [16] for image deblurring:

\mathop{\textrm{min}}\limits_{x\in X}F(x)=\|Kx-f\|_{1}+\nu\|Dx\|_{1}, (4.3)

where f\in Y is a given (noisy) image, K:X\to Y is a known linear (blurring) operator, D:X\to Y denotes the gradient operator, and \nu is a regularization parameter. We introduce parameters \kappa_{1},\kappa_{2}>0 satisfying \kappa_{1}+\kappa_{2}=\nu. Then (4.3) can be written as

\mathop{\textrm{min}}\limits_{x\in X}\|Kx-f\|_{1}+\kappa_{1}\|Dx\|_{1}+\kappa_{2}\|Dx\|_{1}.

Further, the above formula can be rewritten as

\mathop{\textrm{min}}\limits_{x\in X}\mathop{\textrm{max}}\limits_{y\in Y}L(x,y):=\kappa_{1}\|Dx\|_{1}+\langle Ax,y\rangle-\Upsilon_{C_{1}}(u)-\Upsilon_{C_{\kappa_{2}}}(v)-\langle f,u\rangle,

where C_{\kappa}=\{y\in Y\,|\,\|y\|_{\infty}\leqslant\kappa\}, y=\left(\begin{array}[]{c}u\\ v\end{array}\right), and A=\left(\begin{array}[]{c}K\\ \kappa_{2}D\end{array}\right). We adopt the following inequality as the stopping criterion of the inner loop:

\varPsi(y^{k+1},v^{k+1})\leqslant\varepsilon_{k+1},

where \varepsilon_{k+1}=O(1/(k+1)^{\alpha}) and \varPsi is defined as in (4.11) of [16].
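In this formulation, the proximal maps associated with the indicator terms \Upsilon_{C_{1}} and \Upsilon_{C_{\kappa_{2}}} reduce to componentwise projections onto l_{\infty}-balls (the linear term \langle f,u\rangle only shifts the argument of the projection). A minimal MATLAB sketch of this projection, with an illustrative function handle name, is

proj_linf = @(y, kappa) min(max(y, -kappa), kappa);   % projection onto {y : ||y||_inf <= kappa}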

We now report the numerical results. In this test, an average blur with hsize = 9 was applied to the original image cameraman.png (256 \times 256) via fspecial('average', 9), and 20\% salt-and-pepper noise was added (see Figure 2). We adopt the following stopping rule:

\frac{F(x^{k})-F(x^{*})}{F(x^{*})}<10^{-5},

where x^{*} is a solution of the TV-L_{1} model (4.3).

Figure 2: Cameraman.png (256 \times 256). (a) Original image; (b) noisy image.

We fixed the number of iterations at 100 and the penalty coefficient \nu=0.1. The respective parameters of the five algorithms are chosen as follows:
PDA: \tau=\sigma=0.99;
IPDA: \tau_{0}=\sigma_{0}=1,s_{1}=2,r_{1}=\frac{0.99}{s_{1}},s_{2}=1,r_{2}=\frac{0.99}{s_{2}},\alpha=2;
PDAL: \tau_{0}=0.1,\beta=1,\mu=0.1,\eta=0.99;
GRPDAL: \tau_{0}=0.1,\phi=1.618,\beta=1,\mu=0.1,\eta=0.99;
IP-GRPDAL: \tau_{0}=0.1,\phi=1.618,\beta=1,s_{1}=2,r_{1}=\frac{0.99}{s_{1}},s_{2}=1,r_{2}=\frac{0.99}{s_{2}},\alpha=2,\mu=0.1,\eta=0.99.

Figure 3: Restored images. (a) PDA; (b) IPDA; (c) PDAL; (d) GRPDAL; (e) IP-GRPDAL.

The images restored by the above algorithms are displayed in Figure 3. Our IP-GRPDAL method achieves better restoration quality than the PDA, IPDA, PDAL, and GRPDAL methods. We also find that, if the number of iterations is increased to 1000 or more, all algorithms restore the image with almost the same quality, but our algorithm needs fewer iterations than the other ones.

5 Conclusions

In this paper, we propose an inexact golden ratio primal-dual algorithm with linesearch for the saddle point problem by applying the inexact extended proximal terms with matrix norm introduced in [16]. Under the assumption that \tau\sigma\|A\|_{T}^{2}<\phi, we prove the convergence of Algorithm 1, provided that both controlled error sequences \{\delta_{k+1}\} and \{\varepsilon_{k+1}\} are summable. The O(1/N) convergence rate in the ergodic sense is also established in the general convex case. When one of the component functions is strongly convex, an accelerated version of Algorithm 1 is proposed, which achieves an O(1/N^{2}) ergodic convergence rate. Furthermore, linear convergence results are established when both component functions are strongly convex. We also apply our method to solve sparse recovery and TV-L_{1} image deblurring problems and verify its efficiency numerically. It remains an interesting open problem to establish a nonergodic convergence rate for the IP-GRPDAL method with linesearch.

References

  • 4. Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in linear and non-linear programming. Stanford Mathematical Studies in the Social Sciences, II. Stanford University Press, Stanford, Calif. With contributions by Chenery, H.B., Johnson, S.M., Karlin, S., Marschak, T., Solow, R.M (1958)
  • 5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1-122 (2011)
  • 6. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences 2(1), 183-202(2009)
  • 7. Condat, L.: A primal dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. Journal of Optimization Theory and Applications 158(2), 460-479(2013)
  • 8. Cai, J. F., Candès, E. J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4), 1956-1982 (2010)
  • 9. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(1), 120-145(2011)
  • 10. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numerica 25(1), 161-319(2016)
  • 11. Chang, X., Yang, J.: A golden ratio primal-dual algorithm for structured convex optimization. Journal of Scientific Computing 87(1), 1-26(2021)
  • 12. Chang, X., Yang, J.: GRPDA revisited: relaxed condition and connection to Chambolle-Pock’s primal-dual algorithm. Journal of Scientific Computing 93(3), 1-18(2022)
  • 13. Chang, X., Yang, J., Zhang, H.: Golden ratio primal-dual algorithm with linesearch. SIAM Journal on Optimization 32(3), 1584-1613(2022)
  • 14. Eckstein, J., Bertsekas, D. P.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming 55(1), 293-318(1992)
  • 15. Ehrhardt, M. J., Betcke, M. M.: Multicontrast MRI reconstruction with structure-guided total variation. SIAM Journal on Imaging Sciences 9(3), 1084-1106(2016)
  • 16. Fang, C., Hu, L., Chen, S.: An inexact primal-dual method with correction step for a saddle point problem in image deblurring. Journal of Global Optimization 87(2), 965-988 (2023)
  • 17. Han, D., He, H., Yang, H., Yuan, X.: A customized Douglas-Rachford splitting algorithm for separable convex minimization with linear constraints. Numerische Mathematik 127(1), 167-200(2014)
  • 18. He, H., Desai, J., Wang, K.: A primal-dual prediction-correction algorithm for saddle point optimization. Journal of Global Optimization 66(1), 573-583(2016)
  • 19. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM Journal on Optimization 22(2), 313-340(2012)
  • 20. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences 5(1), 119-149(2012)
  • 21. Hien, L. T. K., Zhao, R., Haskell, W. B.: An inexact primal-dual smoothing framework for large-scale non-bilinear saddle point problems. Journal of Optimization Theory and Applications 200(1), 34-67(2024)
  • 22. Jiang, F., Cai, X., Wu, Z., Han, D.: Approximate first-order primal-dual algorithms for saddle point problems. Mathematics of Computation 90(329), 1227-1262(2021)
  • 23. Jiang, F., Wu, Z., Cai, X., Zhang, H.: A first-order inexact primal-dual algorithm for a class of convex-concave saddle point problems. Numerical Algorithms 1(1), 1-28(2021)
  • 24. Liu, Y., Xu, Y., Yin, W.: Acceleration of primal-dual methods by preconditioning and simple subproblem procedures. Journal of Scientific Computing 86(2), 21-46(2021)
  • 25. Malitsky, Y.: Golden ratio algorithms for variational inequalities. Mathematical Programming 184(1), 383-410(2020)
  • 26. Malitsky, Y., Pock, T.: A first-order primal-dual algorithm with linesearch. SIAM Journal on Optimization 28(1), 411-432 (2018)
  • 27. Ng, M. K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM Journal on Scientific Computing 33(4), 1643-1668(2011)
  • 28. Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford-Shah functional. In 2009 IEEE 12th International Conference on Computer Vision, IEEE, 1133-1140 (2009)
  • 29. Rasch, J., Chambolle, A.: Inexact first-order primal-dual algorithms. Computational Optimization and Applications 76(2), 381-430(2020)
  • 30. Rudin, L. I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60(1-4), 259-268(1992)
  • 31. Shores, T. S.: Applied Linear Algebra and Matrix Analysis. Springer, New York (2007)
  • 32. Wang, K., He, H.: A double extrapolation primal-dual algorithm for saddle point problems. Journal of Scientific Computing 85(2), 1-30(2020)
  • 33. Wu, Z., Song, Y., Jiang, F.: Inexact generalized ADMM with relative error criteria for linearly constrained convex optimization problems. Optimization Letters 18(2), 447-470(2024)
  • 34. Xie, J.: On inexact ADMMs with relative error criteria. Computational Optimization and Applications 71(3), 743-765(2018)
  • 35. Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 34(2), 8-34 (2008)