
An inexact golden ratio primal-dual algorithm with linesearch step for a saddle point problem

Changjie Fang, Jinxiu Liu, Jingtao Qiu and Shenglan Chen
College of Science, Chongqing University of Posts and Telecommunications, Chongqing 400065, China. E-mail: [email protected].

Abstract In this paper, we propose an inexact golden ratio primal-dual algorithm with linesearch step (IP-GRPDAL) for solving saddle point problems, where the two subproblems can be solved approximately by applying the notions of inexact extended proximal operators with matrix norm. Our proposed IP-GRPDAL method allows for larger stepsizes by replacing the extrapolation step with a convex combination step. Each iteration of the linesearch requires updating only the dual variable, and hence it is quite cheap. In addition, we prove convergence of the proposed algorithm and show an $O(1/N)$ ergodic convergence rate, where $N$ represents the number of iterations. When one of the component functions is strongly convex, accelerated $O(1/N^{2})$ convergence rate results are established by choosing some algorithmic parameters adaptively. Furthermore, when both component functions are strongly convex, linear convergence rate results are achieved. Numerical simulation results on sparse recovery and image deblurring problems illustrate the feasibility and efficiency of our inexact algorithms.

Keywords Convex optimization \cdot Inexact extended proximal operators \cdot Golden ratio primal-dual algorithm \cdot Linesearch \cdot Image deblurring

1 Introduction

Let $X:=\mathbb{R}^{n}$ and $Y:=\mathbb{R}^{m}$ be two finite-dimensional Euclidean spaces equipped with a standard inner product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\|=\sqrt{\langle\cdot,\cdot\rangle}$. Let $f:X\to(-\infty,+\infty]$ and $g,h:Y\to(-\infty,+\infty]$ be proper lower semicontinuous (l.s.c.) convex functions, and let $A:X\to Y$ be a bounded linear mapping. Denote the Legendre-Fenchel conjugate of $h$ and the adjoint of $A$ by $h^{*}$ and $A^{*}$, respectively. Now we consider the primal problem

$$\min_{x\in X}f(x)+h(Ax)\qquad(1.1)$$

together with its dual problem

$$\max_{y\in Y}f^{*}(-A^{*}y)+h^{*}(y).\qquad(1.2)$$

If a primal-dual solution pair $(\overline{x},\overline{y})$ of (1.1) and (1.2) exists, i.e.,

$$0\in\partial f(\overline{x})+A^{*}\overline{y},\qquad 0\in\partial h(A\overline{x})-\overline{y},$$

then the problem (1.1) is equivalent to the following saddle-point formulation:

$$\min_{x\in X}\max_{y\in Y}f(x)+\langle Ax,y\rangle-h^{*}(y).\qquad(1.3)$$

It is well known that many application problems can be formulated as the saddle point problem (1.3), such as image restoration, magnetic resonance imaging and computer vision; see, for example, [28, 30, 35]. Among the most popular approaches are the primal-dual algorithm (PDA) [9, 20], the alternating direction method of multipliers (ADMM) [5, 19], and their accelerated and generalized variants [24, 26, 25]. To solve model (1.3), the following first-order primal-dual algorithm (PDA) [9] has attracted much attention:

$$\begin{cases}{x}^{k+1}=\textrm{Prox}_{\tau f}(x^{k}-\tau A^{*}y^{k}),\\ \hat{x}^{k+1}=x^{k+1}+\theta({x^{k+1}-x^{k}}),\\ {y}^{k+1}=\textrm{Prox}_{\sigma h^{*}}(y^{k}+\sigma A\hat{x}^{k+1}),\end{cases}\qquad(1.4)$$

where $\tau>0$ and $\sigma>0$ are regularization parameters and $\theta\in(0,1)$ is an extrapolation parameter; for $\theta=1$, the convergence of PDA was proved under the stepsize requirement $\tau\sigma\|A\|^{2}<1$. Generally, with fixed $\tau$ and $\sigma$, a flexible extrapolation parameter $\theta$ is of benefit to a potential acceleration of the algorithm (1.4), which motivates researchers to enlarge the range of $\theta$; see, e.g., [18, 32]. Indeed, the scheme (1.4) reduces to the classic Arrow-Hurwicz method [4] when $\theta=0$. However, the convergence of the Arrow-Hurwicz method can only be guaranteed under the restrictive assumption that $\tau$ and $\sigma$ are sufficiently small. In order to overcome this difficulty, Chang et al. [11] proposed a golden ratio primal-dual algorithm (GRPDA) for solving (1.3) based on a seminal convex combination technique introduced by Malitsky [25], which, started at $(x^{0},y^{0})\in\mathbb{R}^{n}\times\mathbb{R}^{m}$, iterates as

$$\begin{cases}z^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},\\ {x}^{k+1}=\textrm{Prox}_{\tau f}(z^{k+1}-\tau A^{*}y^{k}),\\ {y}^{k+1}=\textrm{Prox}_{\sigma h^{*}}(y^{k}+\sigma Ax^{k+1}),\end{cases}\qquad(1.5)$$

where $\phi\in(1,\frac{1+\sqrt{5}}{2}]$ determines the convex combination coefficients. In [11], iterative convergence and ergodic convergence rate results are established under the condition $\tau\sigma\|A\|^{2}<\phi$. Since $\phi>1$, this stepsize condition is much more relaxed than that of the PDA method (1.4); see also [12]. Further, Chang et al. incorporated a linesearch strategy into the GRPDA method, in which the next iterate $y^{k+1}$ is computed implicitly, since the stepsize parameter $\tau_{k+1}$ is determined by an inequality involving $y^{k+1}$; see Step 2 of Algorithm 3.1 in [13].
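To make the convex combination step concrete, the following NumPy sketch (ours, not taken from [11] or [13]) implements the exact GRPDA iteration (1.5); `prox_f` and `prox_hstar` stand for user-supplied exact proximal operators of $\tau f$ and $\sigma h^{*}$, and all names are illustrative.

```python
import numpy as np

def grpda(A, prox_f, prox_hstar, tau, sigma, phi, x0, y0, n_iters=1000):
    """Exact GRPDA iteration (1.5) for min_x f(x) + h(Ax).

    prox_f(v, tau)      : proximal operator of tau*f evaluated at v
    prox_hstar(v, sigma): proximal operator of sigma*h^* evaluated at v
    phi in (1, (1+sqrt(5))/2] controls the convex combination step.
    """
    x, z, y = x0.copy(), x0.copy(), y0.copy()
    for _ in range(n_iters):
        z = (phi - 1.0) / phi * x + z / phi          # convex combination step
        x = prox_f(z - tau * (A.T @ y), tau)         # primal update
        y = prox_hstar(y + sigma * (A @ x), sigma)   # dual update
    return x, y
```

Convergence of this exact scheme requires $\tau\sigma\|A\|^{2}<\phi$ [11]; the inexact linesearch variant studied below replaces the two proximal evaluations by approximate ones.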

However, when the proximal operators of $f$ and $h^{*}$ are not easy to compute, such primal-dual algorithms do not perform ideally well in terms of computing time and efficiency; see, e.g., the examples in [8, 15] and the numerical experiments in [17]. In many practical applications one often encounters the case that at least one of the proximal operators does not possess a closed-form solution, so that its evaluation requires an inner iterative algorithm. In this situation, some researchers are dedicated to solving the subproblems approximately instead of finding their accurate solutions; see, for example, [21, 24, 27, 34, 33]. An absolute error criterion was adopted in [14], where the subproblem errors are controlled by a summable sequence of error tolerances. Jiang et al. [22, 23] studied two inexact primal-dual algorithms with absolute and relative error criteria, respectively, where for the inexact primal-dual method with a relative error criterion only the $O(1/N)$ convergence rate was established. In [29], Rasch and Chambolle proposed inexact first-order primal-dual methods by applying the concepts of inexact proxima, where all the controlled errors were required to be summable. Further, Fang et al. [16] proposed an inexact primal-dual method with correction step by introducing the notions of inexact extended proximal operators with matrix norm, where the $O(1/N)$ ergodic convergence rate was achieved. However, accelerated versions of the method of [16] under the assumption that $f$ or $h^{*}$ is strongly convex have not been considered.

In this paper, we are concerned with the following saddle point problem:

$$\min_{x\in X}\max_{y\in Y}L(x,y)=f(x)+{\langle{Ax,y}\rangle}-g(y).\qquad(1.6)$$

Recall that $(\overline{x},\overline{y})$ is called a saddle point of (1.6) if it satisfies the inequalities

$$L(\overline{x},y)\leqslant L(\overline{x},\overline{y})\leqslant L(x,\overline{y}),\quad\forall x\in X,\forall y\in Y.\qquad(1.7)$$

Hence, Problem (1.3) is a special case of Problem (1.6).

Motivated by the research works [13, 16], in this paper, we propose an inexact golden ratio primal-dual algorithm with linesearch step for solving problem (1.6) by applying the type-2 approximation of the extended proximal point introduced in [16]. The main contributions of this paper are summarized as follows.

• For the case where the proximal operators of $f$ and $g$ are not easy to compute, we propose an inexact IP-GRPDAL method with extended proximal terms containing a symmetric positive definite matrix. Both subproblems in our method can be solved approximately under the type-2 approximation criteria. Global convergence and $O(1/N)$ ergodic convergence rate results are established under the condition $\tau\sigma\|A\|_{T}^{2}<\phi$, where $N$ denotes the iteration counter. Furthermore, we establish the convergence rates in the case where the error tolerances $\{\delta_{k}\}$ and $\{\varepsilon_{k}\}$ are required to decrease like $O(1/k^{2\alpha+1})$ for some $\alpha>0$.

• Our method updates the dual variable by adopting a linesearch step to allow adaptive and potentially much larger stepsizes, which effectively reduces the computational effort of each iteration. In addition, the next iterate $y^{k+1}$ is computed explicitly, in contrast with that in [13]; see (3.3) in Algorithm 1.

• We propose accelerated versions of the IP-GRPDAL method, which were not provided in [16]. When one of the underlying functions is strongly convex, $O(1/N^{2})$ convergence rate results are established by adaptively choosing some algorithmic parameters, for example, $\beta$ is replaced by $\beta_{k}$; see (3.32) of Algorithm 2. In addition, linear convergence rate results can be established when both $f$ and $g$ are strongly convex.

• We perform numerical experiments for the sparse recovery and image deblurring problems, demonstrating that our method outperforms some existing methods [13, 9, 16, 26].

The rest of this paper is organized as follows. In Section 2, we introduce the concepts of inexact extended proximal terms and present some auxiliary material. In Section 3, the main algorithm and its accelerated versions are presented. At the same time, we also prove the convergence of our algorithms and analyze their convergence rates. Numerical experiment results are reported in Section 4. Some conclusions are presented in Section 5.

2 Preliminaries

For given $x_{1}\in X$ and $y_{1}\in Y$, we define $P_{x_{1},y_{1}}(x):=f(x)-f(x_{1})+{\langle{x-x_{1},A^{*}y_{1}}\rangle}$ and $D_{x_{1},y_{1}}(y):=g(y)-g(y_{1})+{\langle{y-y_{1},-Ax_{1}}\rangle}$ for any $x\in X$ and $y\in Y$. Let $(\overline{x},\overline{y})\in X\times Y$ be a generic saddle point. When there is no confusion, we will omit the subscripts in $P$ and $D$,

$$\begin{cases}P(x):=P_{\overline{x},\overline{y}}(x)=f(x)-f(\overline{x})+\langle x-\overline{x},A^{*}\overline{y}\rangle,\ \forall x\in X,\\ D(y):=D_{\overline{x},\overline{y}}(y)=g(y)-g(\overline{y})+\langle y-\overline{y},-A\overline{x}\rangle,\ \forall y\in Y.\end{cases}\qquad(2.1)$$

By the subgradient inequality, it is clear that $P(x)\geqslant 0$ and $D(y)\geqslant 0$. Note that the functions $P(x)$ and $D(y)$ are convex in $x$ and $y$, respectively. The primal-dual gap is defined as $G(x,y):=L(x,\overline{y})-L(\overline{x},y)$ for $(x,y)\in X\times Y$. It is easy to verify that

$$G(x,y):=G_{\overline{x},\overline{y}}(x,y)=P(x)+D(y)\geqslant 0,\quad\forall(x,y)\in X\times Y.\qquad(2.2)$$

The saddle point conditions can be reformulated as

$$0\in\partial f(\overline{x})+A^{*}\overline{y},\qquad 0\in\partial g(\overline{y})-A\overline{x}.$$

Suppose that $h$ is a convex function on $\mathbb{R}^{n}$ and that $D\in\mathbb{R}^{n\times n}$ is a symmetric positive definite matrix. For any $D\succ 0$ and given $y\in\mathbb{R}^{n}$, denote

$$J_{y}(x):=h(x)+\frac{1}{2\tau}{\|x-y\|}^{2}_{D},\quad\forall x\in\mathbb{R}^{n},\qquad(2.3)$$

and define the proximal operator of $h$ as

$$\textnormal{Prox}_{\tau h}^{D}(y)=\mathop{\arg\min}\limits_{x\in X}\{h(x)+\frac{1}{2\tau}{\|x-y\|}^{2}_{D}\},\qquad(2.4)$$

where ${\|x\|}_{D}^{2}=\langle{x,Dx}\rangle$ and $D^{-1}$ denotes the inverse of $D$. The first-order optimality condition for the proximum gives the following characterizations of the proximal operator:

$$\overline{z}:=\textrm{Prox}_{\tau h}^{D}(y)\Longleftrightarrow 0\in\partial J_{y}(\overline{z})\Longleftrightarrow\frac{1}{\tau}D(y-\overline{z})\in\partial h(\overline{z}).$$
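As a simple illustration of the extended proximal operator (2.4), if $D$ is diagonal with positive entries $d_{i}$ and $h=\|\cdot\|_{1}$, the minimization separates across coordinates and reduces to soft-thresholding with coordinate-wise thresholds $\tau/d_{i}$. A minimal NumPy sketch of this special case (our illustration, with hypothetical names):

```python
import numpy as np

def prox_l1_diag(y, tau, d):
    """Prox_{tau*||.||_1}^D(y) for a diagonal D = diag(d) with d > 0.

    Coordinate i solves min_x |x| + (d_i / (2*tau)) * (x - y_i)^2,
    i.e. soft-thresholding of y_i at level tau / d_i.
    """
    thresh = tau / d
    return np.sign(y) * np.maximum(np.abs(y) - thresh, 0.0)
```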

Below, we recall definitions of three different types of inexact extended proximal operators with matrix norm, which can be found in [16].

Definition 2.1

Let $\varepsilon\geqslant 0$. $z\in X$ is said to be a type-0 approximation of the extended proximal point $\textnormal{Prox}_{\tau h}^{D}(y)$ with precision $\varepsilon$ if

$$z\thickapprox_{0}^{\varepsilon}\textnormal{Prox}_{\tau h}^{D}(y)\Longleftrightarrow\|z-\overline{z}\|_{D}\leqslant\sqrt{2\tau\varepsilon}.\qquad(2.5)$$

Definition 2.2

Let $\varepsilon\geqslant 0$. $z\in X$ is said to be a type-1 approximation of the extended proximal point $\textnormal{Prox}_{\tau h}^{D}(y)$ with precision $\varepsilon$ if

$$z\thickapprox_{1}^{\varepsilon}\textnormal{Prox}_{\tau h}^{D}(y)\Longleftrightarrow 0\in\partial_{\varepsilon}J_{y}(z),\qquad(2.6)$$

where $\partial_{\varepsilon}J_{y}(z)=\{p\in X\,|\,J_{y}(x)\geqslant J_{y}(z)+\langle{p,x-z}\rangle-\varepsilon,\ \forall x\in X\}$.

Definition 2.3

Let $\varepsilon\geqslant 0$. $z\in X$ is said to be a type-2 approximation of the extended proximal point $\textnormal{Prox}_{\tau h}^{D}(y)$ with precision $\varepsilon$ if

$$z\thickapprox_{2}^{\varepsilon}\textnormal{Prox}_{\tau h}^{D}(y)\Longleftrightarrow\frac{1}{\tau}D(y-z)\in\partial_{\varepsilon}h(z),\qquad(2.7)$$

where $\partial_{\varepsilon}h(z)=\{p\in X\,|\,h(x)\geqslant h(z)+\langle{p,x-z}\rangle-\varepsilon,\ \forall x\in X\}$.
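For $h=\|\cdot\|_{1}$ the type-2 criterion (2.7) can be certified in closed form: with $p=\frac{1}{\tau}D(y-z)$, the membership $p\in\partial_{\varepsilon}\|\cdot\|_{1}(z)$ holds if and only if $\|p\|_{\infty}\leqslant 1$ and $\|z\|_{1}-\langle p,z\rangle\leqslant\varepsilon$, since the conjugate of $\|\cdot\|_{1}$ is the indicator of the unit $\ell_{\infty}$-ball. The following sketch (our illustration, not from the paper) returns the smallest $\varepsilon$ certified for a candidate approximate proximal point $z$:

```python
import numpy as np

def type2_eps_l1(z, y, tau, D):
    """Smallest eps with  z ~_2^eps Prox_{tau*||.||_1}^D(y),  cf. (2.7).

    With p = D(y - z)/tau, p lies in the eps-subdifferential of ||.||_1 at z
    iff ||p||_inf <= 1, and the tightest certified eps is ||z||_1 - <p, z>.
    """
    p = D @ (y - z) / tau
    if np.max(np.abs(p)) > 1.0 + 1e-12:   # outside the unit l_inf ball: no finite eps
        return np.inf
    return np.sum(np.abs(z)) - p @ z
```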

We give two simple but useful lemmas in the following.

Lemma 2.4

For any $x,y,z$ and a symmetric positive definite matrix $D$, we have the identity

$$\langle{D(x-y),x-z}\rangle=\frac{1}{2}[\|x-y\|_{D}^{2}+\|x-z\|_{D}^{2}-\|y-z\|_{D}^{2}].\qquad(2.8)$$

For $\alpha\in\mathbb{R}$, there holds

$$\|\alpha x+(1-\alpha)y\|_{D}^{2}=\alpha\|x\|_{D}^{2}+(1-\alpha)\|y\|_{D}^{2}-\alpha(1-\alpha)\|x-y\|_{D}^{2}.\qquad(2.9)$$

Lemma 2.5

[31] Assuming that $\lambda$ and $\Lambda$ are the minimum and maximum eigenvalues of the symmetric positive definite matrix $D$, respectively, we have

$$\sqrt{\lambda}\|x\|\leqslant\|x\|_{D}\leqslant\sqrt{\Lambda}\|x\|.\qquad(2.10)$$

3 Algorithm and convergence properties

In this section, we propose an inexact GRPDA algorithm and then show the convergence of the proposed method. If $f$ is further assumed to be strongly convex, we can modify our method to accelerate the convergence rate. Moreover, if both $f$ and $g$ are strongly convex, a linear convergence rate can be achieved.

3.1 Convex case

1: Let $\varphi=\frac{\sqrt{5}+1}{2}$ be the golden ratio, that is, $\varphi^{2}=1+\varphi$. Choose $x^{0}=z^{0}\in\mathbb{R}^{n}$, $y^{0}\in\mathbb{R}^{m}$, $\phi\in(1,\varphi)$, $\eta\in(0,1)$, $\mu\in(0,1)$, $\tau_{0}>0$ and $\beta>0$. Let $S\in\mathbb{R}^{n\times n}$ and $T\in\mathbb{R}^{m\times m}$ be given symmetric positive definite matrices. Set $\psi=\frac{1+\phi}{\phi^{2}}$ and $k=0$.
2: Compute
$${z}^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},\qquad(3.1)$$
$${x}^{k+1}\approx_{2}^{\delta_{k+1}}\mathop{\arg\min}\limits_{x\in X}\{L(x,y^{k})+\frac{1}{2\tau_{k}}\|x-z^{k+1}\|_{S}^{2}\}.\qquad(3.2)$$
3: Choose any $\tau_{k+1}\in[\tau_{k},\psi\tau_{k}]$ and run
  3.a: Compute
$${y}^{k+1}\approx_{2}^{\varepsilon_{k+1}}\mathop{\arg\max}\limits_{y\in Y}\{L(x^{k+1},y)-\frac{1}{2\beta\tau_{k+1}}\|y-y^{k}\|_{T}^{2}\}.\qquad(3.3)$$
  3.b: Break linesearch if
$$\sqrt{\beta\tau_{k+1}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\leqslant\eta\sqrt{\frac{\phi}{\tau_{k}}}\|y^{k+1}-y^{k}\|_{T}.\qquad(3.4)$$
  Otherwise, set $\tau_{k+1}:=\tau_{k+1}\mu$ and go to 3.a.
4: Set $k\leftarrow k+1$ and return to 2.
Algorithm 1 An inexact GRPDA with linesearch (IP-GRPDAL)
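The sketch below (ours, with $S=T=I$ for readability and hypothetical helper names) shows one way to organize Algorithm 1 in NumPy. The inexact solvers `solve_x` and `solve_y` are assumed to return type-2 approximations of the subproblems (3.2) and (3.3) with tolerances $\delta_{k+1}$ and $\varepsilon_{k+1}$; note that each pass of the linesearch only recomputes the dual variable, which is why it is cheap.

```python
import numpy as np

def ip_grpdal(A, solve_x, solve_y, x0, y0, tau0=1.0, beta=1.0,
              phi=1.5, eta=0.95, mu=0.7, n_iters=500):
    """Sketch of Algorithm 1 (IP-GRPDAL) with S = T = I.

    solve_x(z, y, tau)  : type-2 approximate solution of (3.2),
                          i.e. an approximate prox of tau*f at z - tau*A^T y
    solve_y(x, y, step) : type-2 approximate solution of (3.3),
                          i.e. an approximate prox of step*g at y + step*A x
    """
    psi = (1.0 + phi) / phi**2
    x, z, y = x0.copy(), x0.copy(), y0.copy()
    tau = tau0
    for _ in range(n_iters):
        z = (phi - 1.0) / phi * x + z / phi              # (3.1)
        x = solve_x(z, y, tau)                           # (3.2)
        tau_next = psi * tau                             # trial stepsize in [tau, psi*tau]
        while True:                                      # linesearch: only the dual update is repeated
            y_next = solve_y(x, y, beta * tau_next)      # (3.3)
            lhs = np.sqrt(beta * tau_next) * np.linalg.norm(A.T @ (y_next - y))
            rhs = eta * np.sqrt(phi / tau) * np.linalg.norm(y_next - y)
            if lhs <= rhs:                               # stopping rule (3.4)
                break
            tau_next *= mu
        y, tau = y_next, tau_next
    return x, y
```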

Firstly, we summarize several useful lemmas which will be used in the sequel.

Lemma 3.1

(i) The linesearch in Algorithm 1 always terminates.
(ii) There exists $\underline{\tau}:=\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}>0$ such that $\tau_{k+1}>\underline{\tau}$ for all $k\geqslant 0$.
(iii) For any integer $N>0$, we have $|\mathcal{A}_{N}|\geqslant\hat{c}N$ for some constant $\hat{c}>0$, where $\mathcal{A}_{N}=\{1\leqslant k\leqslant N:\tau_{k}\geqslant\underline{\tau}\}$ and $|\mathcal{A}_{N}|$ is the cardinality of the set $\mathcal{A}_{N}$, which implies $\sum_{k=1}^{N}\tau_{k+1}\geqslant\underline{c}N$ with $\underline{c}=\hat{c}\underline{\tau}$.

Proof. (i) In each iteration of the linesearch, $\tau_{k}$ is multiplied by the factor $\mu\in(0,1)$. Since (3.4) is fulfilled whenever $\tau_{k}\leqslant\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$, where $L=\|A^{*}\|_{T}$, the inner loop cannot run indefinitely.

(ii) We argue by induction, assuming that $\tau_{0}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$. Our goal is to show that $\tau_{k}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$ implies $\tau_{k+1}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$. Suppose that $\tau_{k+1}=\psi\tau_{k}\mu^{i}$ for some $i\in Z^{+}$. If $i=0$, then $\tau_{k+1}>\tau_{k}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$. If $i>0$, then $\tau_{k+1}^{\prime}=\psi\tau_{k}\mu^{i-1}$ does not satisfy (3.4). Thus $\tau_{k+1}^{\prime}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$, and hence $\tau_{k+1}>\frac{\eta\sqrt{\phi}}{L\sqrt{\beta\psi}}$.

(iii) The detailed proof follows a similar approach to that of Lemma 3.1 (iii) of [13] and is thus omitted.

Lemma 3.2

Suppose that $\lambda_{1},\lambda_{2}>\eta$ are the minimum eigenvalues of $S$ and $T$, respectively. Let $\theta_{k+1}=\frac{\tau_{k+1}}{\tau_{k}}$ and let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 1. Then, for any $(\overline{x},\overline{y})\in X\times Y$, there holds

$$\begin{split}\tau_{k+1}&G(x^{k+1},y^{k+1})\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle\\ &+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle+\tau_{k+1}\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle\\ &+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.5)$$

Proof. By Definition 2.3, the optimality condition of (3.2) yields

$$\frac{1}{\tau_{k}}S(z^{k+1}-x^{k+1})-A^{*}y^{k}\in\partial_{\delta_{k+1}}f(x^{k+1}).$$

In view of the definition of the $\varepsilon$-subdifferential, we have

$$f(x)-f(x^{k+1})+\frac{1}{\tau_{k}}\langle S{(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},x-x^{k+1}}\rangle+\delta_{k+1}\geqslant 0,\quad\forall x\in X.\qquad(3.6)$$

Setting $x=\overline{x}$ and $x=x^{k+2}$ in (3.6), respectively, we obtain

$$\tau_{k}(f(x^{k+1})-f(\overline{x}))\leqslant\langle S(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},\overline{x}-x^{k+1}\rangle+\tau_{k}\delta_{k+1},\qquad(3.7)$$
$$\tau_{k}(f(x^{k+1})-f(x^{k+2}))\leqslant\langle S(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\tau_{k}\delta_{k+1}.\qquad(3.8)$$

In view of (3.7) with $k$ replaced by $k+1$, we have

$$\tau_{k+1}(f(x^{k+2})-f(\overline{x}))\leqslant\langle S(x^{k+2}-z^{k+2})+\tau_{k+1}A^{*}y^{k+1},\overline{x}-x^{k+2}\rangle+\tau_{k+1}\delta_{k+2}.\qquad(3.9)$$

Multiplying (3.8) by $\theta_{k+1}$ and using the fact $x^{k+1}-z^{k+1}=\phi(x^{k+1}-z^{k+2})$, we obtain

$$\tau_{k+1}(f(x^{k+1})-f(x^{k+2}))\leqslant\langle\phi\theta_{k+1}S(x^{k+1}-z^{k+2})+\tau_{k+1}A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\tau_{k+1}\delta_{k+1}.\qquad(3.10)$$

Similarly, for $\overline{y}\in Y$, from (3.3) we obtain

$$\tau_{k+1}(g(y^{k+1})-g(\overline{y}))\leqslant\frac{1}{\beta}\langle T(y^{k+1}-y^{k})-\beta\tau_{k+1}Ax^{k+1},\overline{y}-y^{k+1}\rangle+\tau_{k+1}\varepsilon_{k+1}.\qquad(3.11)$$

Direct calculations show that a summation of (3.9), (3.10) and (3.11) gives

$$\begin{split}&\tau_{k+1}(f(x^{k+1})-f(\overline{x}))+\tau_{k+1}(g(y^{k+1})-g(\overline{y}))+\tau_{k+1}\langle A^{*}\overline{y},x^{k+1}-\overline{x}\rangle-\tau_{k+1}\langle A\overline{x},y^{k+1}-\overline{y}\rangle\\ &\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\tau_{k+1}\langle A^{*}y^{k+1},\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &\quad+\tau_{k+1}\langle A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle-\tau_{k+1}\langle Ax^{k+1},\overline{y}-y^{k+1}\rangle\\ &\quad+\tau_{k+1}\langle A^{*}\overline{y},x^{k+1}-\overline{x}\rangle-\tau_{k+1}\langle A\overline{x},y^{k+1}-\overline{y}\rangle+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}$$

Applying the definitions of $P(x)$ and $D(y)$ in (2.1) to the above inequality, we have

$$\begin{split}\tau_{k+1}P(x^{k+1})+\tau_{k+1}D(y^{k+1})\leqslant&\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle+\tau_{k+1}\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle\\ &+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}),\end{split}$$

which, by the definition of G(x,y)G(x,y), implies (3.5) immediately.

Lemma 3.3

Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 1. For $k\geqslant 0$, it holds that

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-(1-\frac{\eta}{\lambda_{1}})\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}\\ &\quad-\frac{\lambda_{2}-\eta}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2}-\theta_{k+1}\phi\|x^{k+1}-z^{k+2}\|_{S}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.12)$$

Proof. First, it is easy to verify from $\psi=\frac{1+\phi}{\phi^{2}}$ and $\theta_{k+1}=\frac{\tau_{k+1}}{\tau_{k}}\leqslant\psi$ that

$$1+\frac{1}{\phi}-\phi\theta_{k+1}\geqslant 1+\frac{1}{\phi}-\phi\psi=0.\qquad(3.13)$$

Since $\lambda_{1}$ and $\lambda_{2}$ are the minimum eigenvalues of $S$ and $T$, respectively, it follows from (2.10) that

$$\|x^{k+2}-x^{k+1}\|\leqslant\frac{1}{\sqrt{\lambda_{1}}}\|x^{k+2}-x^{k+1}\|_{S}\quad\textrm{and}\quad\|A^{*}y^{k+1}-A^{*}y^{k}\|\leqslant\frac{1}{\sqrt{\lambda_{2}}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}.$$

Using the Cauchy-Schwarz inequality, from (3.4) we obtain

$$\begin{split}&2\tau_{k+1}\|A^{*}y^{k+1}-A^{*}y^{k}\|\|x^{k+2}-x^{k+1}\|\\ &\leqslant 2\tau_{k+1}\frac{1}{\sqrt{\lambda_{1}}\sqrt{\lambda_{2}}}\|x^{k+2}-x^{k+1}\|_{S}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\\ &\leqslant\eta(\frac{\theta_{k+1}\phi}{\lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2}).\end{split}\qquad(3.14)$$

Applying (2.8) and the Cauchy-Schwarz inequality, from (3.5) we have

$$\begin{split}\|x^{k+2}&-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\|z^{k+2}-\overline{x}\|_{S}^{2}+(\phi\theta_{k+1}-1)\|x^{k+2}-z^{k+2}\|_{S}^{2}-\phi\theta_{k+1}\|x^{k+1}-z^{k+2}\|_{S}^{2}\\ &-\phi\theta_{k+1}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{1}{\beta}\|y^{k+1}-y^{k}\|_{T}^{2}\\ &+2\tau_{k+1}\|A^{*}y^{k+1}-A^{*}y^{k}\|\|x^{k+1}-x^{k+2}\|+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.15)$$

In view of (3.1), we get $x^{k+2}=\frac{\phi}{\phi-1}z^{k+3}-\frac{1}{\phi-1}z^{k+2}$ and $z^{k+3}-z^{k+2}=\frac{\phi-1}{\phi}(x^{k+2}-z^{k+2})$. Thus, from (2.9) we deduce

$$\begin{split}\|x^{k+2}-\overline{x}\|_{S}^{2}&=\|\frac{\phi}{\phi-1}(z^{k+3}-\overline{x})-\frac{1}{\phi-1}(z^{k+2}-\overline{x})\|_{S}^{2}\\ &=\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}-\frac{1}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{\phi}{(\phi-1)^{2}}\|z^{k+3}-z^{k+2}\|_{S}^{2}\\ &=\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}-\frac{1}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\phi}\|x^{k+2}-z^{k+2}\|_{S}^{2}.\end{split}\qquad(3.16)$$

Combining (3.14) and (3.16) with (3.15), we obtain

$$\begin{split}\frac{\phi}{\phi-1}&\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}-(1-\phi\theta_{k+1}+\frac{1}{\phi})\|x^{k+2}-z^{k+2}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}\\ &-\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}+\eta(\frac{\theta_{k+1}\phi}{\lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2})\\ &-\frac{1}{\beta}\|y^{k+1}-y^{k}\|_{T}^{2}-\theta_{k+1}\phi\|x^{k+1}-z^{k+2}\|_{S}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})\\ \leqslant&\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-(1-\frac{\eta}{\lambda_{1}})\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}\\ &-\frac{\lambda_{2}-\eta}{\lambda_{2}\beta}\|y^{k+1}-y^{k}\|_{T}^{2}-\theta_{k+1}\phi\|x^{k+1}-z^{k+2}\|_{S}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.17)$$

In the following, we summarize the convergence result for Algorithm 1.

Theorem 3.4

Suppose that $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ is the sequence generated by Algorithm 1. Then $\{(x^{k+1},y^{k+1}):k\geqslant 0\}$ is bounded and every limit point of $\{(x^{k+1},y^{k+1}):k\geqslant 0\}$ is a solution of (1.6).

Proof. Since $\lambda_{1},\lambda_{2}>\eta$ and $G(x^{k+1},y^{k+1})\geqslant 0$, (3.12) yields

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}\\ &\leqslant\frac{\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.18)$$

By summing over $k=0,1,2,\ldots,N-1$, we obtain

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{N}-\overline{y}\|_{T}^{2}\\ &\leqslant\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.19)$$

Since the sequences $\{\delta_{k}\}$ and $\{\varepsilon_{k}\}$ are summable and $\{\tau_{k}\}$ is bounded, $\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})$ is bounded. From this we deduce that $\frac{\phi}{\phi-1}\|z^{k+1}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}$ is bounded. Thus, $\{z^{k+1}\}$ and $\{y^{k+1}\}$ are bounded sequences. Hence, from (3.1) we know that $\{x^{k+1}\}$ is also bounded.

Summing up (3.12) from $k=0$ to $N-1$, we get

$$\begin{split}&\frac{\phi}{\phi-1}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{N}-\overline{y}\|_{T}^{2}+\phi\sum_{k=0}^{N-1}\theta_{k+1}\|x^{k+1}-z^{k+2}\|_{S}^{2}\\ &\leqslant\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})<\infty.\end{split}$$

Since $\theta_{k+1}=\frac{\tau_{k+1}}{\tau_{k}}>1$, we have $\sum_{k=0}^{\infty}\theta_{k+1}=\infty$. Letting $N\to\infty$ in the above inequality and applying the equivalence of $\|\cdot\|_{M}$ and $\|\cdot\|_{2}$, where $M$ denotes a symmetric positive definite matrix, we get $\lim\limits_{k\to\infty}\|x^{k+1}-z^{k+2}\|=0$.

Similarly, we can deduce that $\lim\limits_{k\to\infty}\|y^{k+1}-y^{k}\|=0$. Thus, $(x^{k+1},y^{k+1})$ has at least one limit point $(x^{\infty},y^{\infty})$, and hence there exists a subsequence $\{(x^{k_{i}+1},y^{k_{i}+1})\}$ such that $(x^{k_{i}+1},y^{k_{i}+1})\to(x^{\infty},y^{\infty})$ as $i\to\infty$. Similar to (3.7) and (3.11), for any $(x,y)$, there hold

$$\tau_{k_{i}}(f(x^{k_{i}+1})-f(x))\leqslant\langle S(x^{k_{i}+1}-z^{k_{i}+1})+\tau_{k_{i}}A^{*}y^{k_{i}},x-x^{k_{i}+1}\rangle+\tau_{k_{i}}\delta_{k_{i}+1}\qquad(3.20)$$

and

$$\tau_{k_{i}+1}(g(y^{k_{i}+1})-g(y))\leqslant\frac{1}{\beta}\langle T(y^{k_{i}+1}-y^{k_{i}})-\beta\tau_{k_{i}+1}Ax^{k_{i}+1},y-y^{k_{i}+1}\rangle+\tau_{k_{i}+1}\varepsilon_{k_{i}+1}.\qquad(3.21)$$

In view of Lemma 3.1 (ii), we have $\tau_{k_{i}+1}>\underline{\tau}>0$. Letting $i\rightarrow\infty$ in (3.20) and (3.21), respectively, and taking into account that $\delta_{k}\to 0$ and $\varepsilon_{k}\to 0$ as $k\to\infty$, we obtain

$$f(x)-f(x^{\infty})+\langle A^{*}y^{\infty},x-x^{\infty}\rangle\geqslant 0\quad\textrm{and}\quad g(y)-g(y^{\infty})-\langle Ax^{\infty},y-y^{\infty}\rangle\geqslant 0.\qquad(3.22)$$

This shows that (x,y)(x^{\infty},y^{\infty}) is a saddle point of (1.6).

We now establish the convergence rates of Algorithm 1.

Theorem 3.5

Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 1, and let $(\overline{x},\overline{y})$ be any saddle point of (1.6). Then, for the ergodic sequence $\{(X^{N},Y^{N})\}$ given by

$${X}^{N}=\frac{1}{S^{N}}\sum_{k=1}^{N}\tau_{k}x^{k}\quad\textrm{and}\quad{Y}^{N}=\frac{1}{S^{N}}\sum_{k=1}^{N}\tau_{k}y^{k}\quad\textrm{with}\quad S^{N}=\sum_{k=1}^{N}\tau_{k},\qquad(3.23)$$

it holds that

$$G(X^{N},Y^{N})\leqslant\frac{c_{1}}{2\underline{c}N},\qquad(3.24)$$

where

$$c_{1}=\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+\sum_{k=0}^{N-1}2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).$$

Proof. Since $\lambda_{1},\lambda_{2}>\eta$, it follows from (3.12) that

$$\begin{split}2\tau_{k+1}G(x^{k+1},y^{k+1})\leqslant&\ \frac{\phi}{\phi-1}(\|z^{k+2}-\overline{x}\|_{S}^{2}-\|z^{k+3}-\overline{x}\|_{S}^{2})+\frac{1}{\beta}(\|y^{k}-\overline{y}\|_{T}^{2}-\|y^{k+1}-\overline{y}\|_{T}^{2})\\ &+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}$$

By taking summation over $k=0,1,2,\ldots,N-1$, we obtain

$$2\sum_{k=0}^{N-1}\tau_{k+1}G(x^{k+1},y^{k+1})\leqslant\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\qquad(3.25)$$

Since $P(x)$ and $D(y)$ are convex,

$$\begin{cases}P(X^{N})\leqslant\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}P(x^{k+1}),\\ D(Y^{N})\leqslant\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}D(y^{k+1}),\end{cases}$$

where $S^{N}=\sum_{k=0}^{N-1}\tau_{k+1}$. Combining (3.23) with (3.25), we obtain

$$\begin{split}G(&X^{N},Y^{N})=P(X^{N})+D(Y^{N})\\ &\leqslant\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}(P(x^{k+1})+D(y^{k+1}))=\frac{1}{S^{N}}\sum_{k=0}^{N-1}\tau_{k+1}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{1}{2S^{N}}(\frac{\phi}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\sum_{k=0}^{N-1}(\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}))).\end{split}\qquad(3.26)$$

In view of Lemma 3.1 (iii), we have $S^{N}=\sum_{k=1}^{N}\tau_{k}\geqslant\underline{c}N$. Hence, (3.24) holds.

Lemma 3.6

([29]) For $\xi>-1$, let $s^{N}:=\sum_{k=1}^{N}k^{\xi}$. Then

$$s^{N}=O(N^{1+\xi}).$$

Similar to Corollary 3.4 in [29], we have the following theorem.

Theorem 3.7

If $\alpha>0$, $\delta_{k}=O(\frac{1}{k^{\alpha}})$ and $\varepsilon_{k}=O(\frac{1}{k^{\alpha}})$, we have

$$G(X^{N},Y^{N})=\left\{\begin{array}{ll}O(1/N),&\alpha>1,\\ O(\ln N/N),&\alpha=1,\\ O(N^{-\alpha}),&\alpha\in(0,1).\end{array}\right.\qquad(3.27)$$

Proof. If $\alpha>1$, then the sequences $\{\delta_{k}\}$ and $\{\varepsilon_{k}\}$ are summable. Since the sequence $\{\tau_{k}\}$ is bounded, from (3.24) we have

$$G(X^{N},Y^{N})=O(1/N).$$

If $\alpha=1$, then $\delta_{k}=O(\frac{1}{k})$ and $\varepsilon_{k}=O(\frac{1}{k})$. In view of the assumption on $\delta_{k}$, for some $r>0$ we have

$$\delta_{k}\leqslant\frac{r}{k+1}.$$

Thus,

$$\sum_{k=0}^{N-1}\delta_{k+1}\leqslant\sum_{k=0}^{N-1}\frac{r}{k+1}=r(1+\sum_{k=2}^{N}\frac{1}{k})\leqslant r(1+\int_{1}^{N}\frac{1}{t}dt)=r(1+\ln N).$$

Hence, from the boundedness of $\{\tau_{k}\}$ we know that $\sum_{k=0}^{N-1}\tau_{k+1}\delta_{k+1}=O(\ln N)$. Similarly, $\sum_{k=0}^{N-1}\tau_{k+1}\varepsilon_{k+1}=O(\ln N)$. Therefore, we get

$$G(X^{N},Y^{N})=O(\ln N/N).$$

If $\alpha\in(0,1)$, then $-\alpha>-1$, and from Lemma 3.6 we obtain $\sum_{k=0}^{N-1}\delta_{k+1}=O(N^{1-\alpha})$ and $\sum_{k=0}^{N-1}\varepsilon_{k+1}=O(N^{1-\alpha})$. Thus, we have

$$G(X^{N},Y^{N})=O(N^{-\alpha}).$$

3.2 Partially strongly convex case

Now, we consider the acceleration of Algorithm 1 under the additional assumption that $f$ is $\gamma_{f}$-strongly convex, i.e., it holds for some $\gamma_{f}>0$ that

$$f(y)\geqslant f(x)+\langle u,y-x\rangle+\frac{\gamma_{f}}{2}\|y-x\|^{2},\quad\forall u\in\partial f(x),\forall x,y\in\mathbb{R}^{n}.\qquad(3.28)$$

When $g$ is strongly convex, the corresponding results can be achieved similarly and are thus omitted. The accelerated version of Algorithm 1 is summarized in Algorithm 2.

1: Choose $x^{0}=z^{0}\in\mathbb{R}^{n}$, $y^{0}\in\mathbb{R}^{m}$, $\beta_{0}>0$, $\tau_{0}>0$, $\mu\in(0,1)$ and $\phi\in(\xi,\varphi)$, where $\xi$ is the unique real root of $\xi^{3}-\xi-1=0$. Let $S\in\mathbb{R}^{n\times n}$ and $T\in\mathbb{R}^{m\times m}$ be given symmetric positive definite matrices, let $\lambda_{1},\lambda_{2}>1$ be the minimum eigenvalues of $S$ and $T$, respectively, and let $\Lambda_{1}$ be the maximum eigenvalue of $S$. Set $\psi=\frac{1+\phi}{\phi^{2}}$ and $k=0$.
2: Compute
$${z}^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},\qquad(3.29)$$
$${x}^{k+1}\approx_{2}^{\delta_{k+1}}\mathop{\arg\min}\limits_{x\in X}\{L(x,y^{k})+\frac{1}{2\tau_{k}}\|x-z^{k+1}\|_{S}^{2}\},\qquad(3.30)$$
$$\omega_{k+1}=\frac{\phi-\psi}{\phi\Lambda_{1}+\psi\gamma_{f}\tau_{k}},\qquad(3.31)$$
$$\beta_{k+1}=\beta_{k}(1+\gamma_{f}\omega_{k+1}\tau_{k}).\qquad(3.32)$$
3: Choose any $\tau_{k+1}\in[\tau_{k},\psi\tau_{k}]$ and run
  3.a: Compute
$${y}^{k+1}\approx_{2}^{\varepsilon_{k+1}}\mathop{\arg\max}\limits_{y\in Y}\{L(x^{k+1},y)-\frac{1}{2\beta_{k+1}\tau_{k+1}}\|y-y^{k}\|_{T}^{2}\}.\qquad(3.33)$$
  3.b: Break linesearch if
$$\sqrt{\beta_{k+1}\tau_{k+1}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\leqslant\sqrt{\frac{\phi}{\tau_{k}}}\|y^{k+1}-y^{k}\|_{T}.\qquad(3.34)$$
  Otherwise, set $\tau_{k+1}:=\tau_{k+1}\mu$ and go to 3.a.
4: Set $k\leftarrow k+1$ and return to 2.
Algorithm 2 Accelerated IP-GRPDAL when $f$ is $\gamma_{f}$-strongly convex
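Compared with Algorithm 1, the structural changes are that $\eta$ is dropped from the linesearch test (3.34) (effectively $\eta=1$) and that $\beta$ becomes iteration dependent through (3.31)-(3.32). A minimal sketch of this parameter update (our illustration):

```python
def accelerated_params(beta_k, tau_k, gamma_f, phi, Lambda1):
    """Parameter updates (3.31)-(3.32) of Algorithm 2."""
    psi = (1.0 + phi) / phi**2
    omega = (phi - psi) / (phi * Lambda1 + psi * gamma_f * tau_k)   # (3.31)
    beta_next = beta_k * (1.0 + gamma_f * omega * tau_k)            # (3.32)
    return omega, beta_next
```

As shown in Lemma 3.8 (ii) below, this recursion makes $\beta_{k}$ grow at least like $k^{2}$, which is what drives the $O(1/N^{2})$ rate in Theorem 3.9.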

Similar to Lemma 3.1, we have the following results.

Lemma 3.8

(i) The linesearch in Algorithm 2 always terminates.
(ii) There exists a constant $c>0$ such that $\beta_{k}\geqslant ck^{2}$.

Proof. (i) This conclusion follows from Lemma 3.1 (i) by replacing $\beta$ with $\beta_{k}$ and setting $\eta=1$.
(ii) The proof is similar to that of Lemma 4.1 (ii) in [13] and is thus omitted.

Next we will establish the $O(1/{N^{2}})$ convergence rate of Algorithm 2.

Theorem 3.9

Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 2, and let $(\overline{x},\overline{y})$ be any saddle point of (1.6). Then we have $\|z^{N+2}-\overline{x}\|_{S}=O(\frac{1}{N})$ and, for the ergodic sequence given by

$$\overline{S}^{N}=\sum_{k=1}^{N}\beta_{k}\tau_{k},\quad\overline{X}^{N}=\frac{1}{\overline{S}^{N}}\sum_{k=1}^{N}\beta_{k}\tau_{k}x^{k},\quad\textrm{and}\quad\overline{Y}^{N}=\frac{1}{\overline{S}^{N}}\sum_{k=1}^{N}\beta_{k}\tau_{k}y^{k},\qquad(3.35)$$

there exists a constant $c_{3}>0$ such that

$$G(\overline{X}^{N},\overline{Y}^{N})\leqslant\frac{c_{3}}{N^{2}}.\qquad(3.36)$$

Proof. Since $f$ is strongly convex, it follows from (3.30) and Definition 2.3 that

$$f(x)-f(x^{k+1})+\frac{1}{\tau_{k}}\langle{S(x^{k+1}-z^{k+1})+\tau_{k}A^{*}y^{k},x-x^{k+1}}\rangle-\frac{\gamma_{f}}{2}\|x-x^{k+1}\|^{2}+\delta_{k+1}\geqslant 0.\qquad(3.37)$$

Since $\Lambda_{1}$ is the maximum eigenvalue of $S$, from (2.10) we get $\|x-x^{k+1}\|^{2}\geqslant\frac{1}{\Lambda_{1}}\|x-x^{k+1}\|_{S}^{2}$. Similar to (3.7)-(3.10), we obtain

$$\begin{split}\tau_{k+1}&(f(x^{k+2})-f(\overline{x})+\frac{\gamma_{f}}{2\Lambda_{1}}\|\overline{x}-x^{k+2}\|_{S}^{2})\\ &\leqslant\langle S(x^{k+2}-z^{k+2})+\tau_{k+1}A^{*}y^{k+1},\overline{x}-x^{k+2}\rangle+\tau_{k+1}\delta_{k+2},\end{split}\qquad(3.38)$$
$$\begin{split}\tau_{k+1}&(f(x^{k+1})-f(x^{k+2})+\frac{\gamma_{f}}{2\Lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2})\\ &\leqslant\langle\phi\theta_{k+1}S(x^{k+1}-z^{k+2})+\tau_{k+1}A^{*}y^{k},x^{k+2}-x^{k+1}\rangle+\tau_{k+1}\delta_{k+1}.\end{split}\qquad(3.39)$$

From (3.33), for $\overline{y}\in Y$ we have

$$\tau_{k+1}(g(y^{k+1})-g(\overline{y}))\leqslant\frac{1}{\beta_{k+1}}\langle T(y^{k+1}-y^{k})-\beta_{k+1}\tau_{k+1}Ax^{k+1},\overline{y}-y^{k+1}\rangle+\tau_{k+1}\varepsilon_{k+1}.\qquad(3.40)$$

Then, by adding (3.38), (3.39) and (3.40) and using similar arguments as in Lemma 3.2, we deduce

$$\begin{split}\tau_{k+1}G&(x^{k+1},y^{k+1})\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &+\frac{1}{\beta_{k+1}}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle+\tau_{k+1}\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle\\ &-\frac{\gamma_{f}\tau_{k+1}}{2\Lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}-\frac{\gamma_{f}\tau_{k+1}}{2\Lambda_{1}}\|\overline{x}-x^{k+2}\|_{S}^{2}+\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.41)$$

Using (2.8) and the Cauchy-Schwarz inequality, from (3.41) we get

$$\begin{split}(1+&\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\|\overline{x}-x^{k+2}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|\overline{y}-y^{k+1}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{1}{\beta_{k+1}}\|y^{k}-y^{k+1}\|_{T}^{2}\\ &+(\phi\theta_{k+1}-1)\|z^{k+2}-x^{k+2}\|_{S}^{2}-\phi\theta_{k+1}\|z^{k+2}-x^{k+1}\|_{S}^{2}-\phi\theta_{k+1}\|x^{k+2}-x^{k+1}\|_{S}^{2}\\ &+2\tau_{k+1}\|A^{*}(y^{k+1}-y^{k})\|\|x^{k+1}-x^{k+2}\|+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.42)$$

Combining (3.16) with (3.42), we have

$$\begin{split}(1+&\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}\|\overline{x}-z^{k+3}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|\overline{y}-y^{k+1}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{1}{\beta_{k+1}}\|y^{k}-y^{k+1}\|_{T}^{2}\\ &+(\phi\theta_{k+1}-1-\frac{1+(\gamma_{f}\tau_{k+1}/\Lambda_{1})}{\phi})\|z^{k+2}-x^{k+2}\|_{S}^{2}-\phi\theta_{k+1}\|z^{k+2}-x^{k+1}\|_{S}^{2}\\ &-\phi\theta_{k+1}\|x^{k+2}-x^{k+1}\|_{S}^{2}+2\tau_{k+1}\|A^{*}(y^{k+1}-y^{k})\|\|x^{k+1}-x^{k+2}\|\\ &+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.43)$$

Thus, it follows from (2.10), (3.34) and the Cauchy-Schwarz inequality that

$$2\tau_{k+1}\|A^{*}y^{k+1}-A^{*}y^{k}\|\|x^{k+1}-x^{k+2}\|\leqslant\frac{\theta_{k+1}\phi}{\lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}+\frac{1}{\lambda_{2}\beta_{k+1}}\|y^{k+1}-y^{k}\|_{T}^{2}.\qquad(3.44)$$

Since $\psi=\frac{1+\phi}{\phi^{2}}$ and $\theta_{k+1}\leqslant\psi$, we have $\phi\theta_{k+1}-1-\frac{1+(\gamma_{f}\tau_{k+1}/\Lambda_{1})}{\phi}\leqslant\phi\psi-1-\frac{1}{\phi}-\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})}{\phi}=-\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}\phi}$. Therefore, substituting (3.44) into (3.43), we obtain

$$\begin{split}(1+&\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}\|\overline{x}-z^{k+3}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|\overline{y}-y^{k+1}\|_{T}^{2}+2\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{\gamma_{f}\tau_{k+1}}{\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &-(1-\frac{1}{\lambda_{1}})\theta_{k+1}\phi\|x^{k+2}-x^{k+1}\|_{S}^{2}-\frac{\lambda_{2}-1}{\lambda_{2}\beta_{k+1}}\|y^{k+1}-y^{k}\|_{T}^{2}+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{\gamma_{f}\tau_{k+1}}{\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &+2\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.45)$$

Note that $(1+\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}=\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}$. Thus, it follows from $\tau_{k+2}\leqslant\psi\tau_{k+1}$ and $\omega_{k+2}=\frac{\phi-\psi}{\phi\Lambda_{1}+\psi\gamma_{f}\tau_{k+1}}$ that

$$\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}\geqslant\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\psi\tau_{k+1}/\Lambda_{1})}=1+\frac{(\phi-\psi)\gamma_{f}\tau_{k+1}}{\phi\Lambda_{1}+\gamma_{f}\psi\tau_{k+1}}=1+\omega_{k+2}\gamma_{f}\tau_{k+1}.\qquad(3.46)$$

Since $\beta_{k+2}=\beta_{k+1}(1+\gamma_{f}\omega_{k+2}\tau_{k+1})$,

$$\begin{split}(1+\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}})\frac{\phi}{\phi-1}&=\frac{\phi(1+(\gamma_{f}\tau_{k+1}/\Lambda_{1}))}{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}\\ &\geqslant(1+\omega_{k+2}\gamma_{f}\tau_{k+1})\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}\\ &=\frac{\beta_{k+2}}{\beta_{k+1}}\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}.\end{split}\qquad(3.47)$$

Substituting (3.47) into (3.45) and multiplying the resulting inequality by $\frac{1}{2}\beta_{k+1}$, we have

$$\begin{split}\frac{\beta_{k+2}}{2}&\frac{\phi+(\gamma_{f}\tau_{k+2}/\Lambda_{1})}{\phi-1}\|\overline{x}-z^{k+3}\|_{S}^{2}+\frac{1}{2}\|\overline{y}-y^{k+1}\|_{T}^{2}+\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})\\ \leqslant&\frac{\beta_{k+1}}{2}\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{2}\|y^{k}-\overline{y}\|_{T}^{2}\\ &-\frac{\beta_{k+1}}{2}\frac{\gamma_{f}\tau_{k+1}}{\Lambda_{1}\phi}\|z^{k+2}-x^{k+2}\|_{S}^{2}+\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.48)$$

Define $A_{k+1}:=\frac{(\gamma_{f}\tau_{k+1}/\Lambda_{1})+\phi}{2(\phi-1)}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{2\beta_{k+1}}\|y^{k}-\overline{y}\|_{T}^{2}$, and hence (3.48) yields

$$\begin{split}\beta_{k+2}A_{k+2}+\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})\leqslant&\ \beta_{k+1}A_{k+1}-\frac{\beta_{k+1}\gamma_{f}\tau_{k+1}}{2\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &+\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}$$

By summing over $k=0,1,2,\ldots,N-1$ in the above inequality, we obtain

$$\begin{split}\beta_{N+1}A_{N+1}&+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})+\sum_{k=0}^{N-1}\frac{\beta_{k+1}\gamma_{f}\tau_{k+1}}{2\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &\leqslant\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.49)$$

In view of the convexity of $G(x,y)$, from (3.49) we have

$$\begin{split}G(\overline{X}^{N},\overline{Y}^{N})&\leqslant\frac{1}{\overline{S}^{N}}\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{1}{\overline{S}^{N}}(\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})).\end{split}\qquad(3.50)$$

According to the definition of $A_{k+1}$ and (3.49), we get

$$A_{N+1}=\frac{(\gamma_{f}\tau_{N+1}/\Lambda_{1})+\phi}{2(\phi-1)}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{2\beta_{N+1}}\|y^{N}-\overline{y}\|_{T}^{2}\leqslant\frac{\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})}{\beta_{N+1}}.$$

Thus,

$$\|z^{N+2}-\overline{x}\|_{S}^{2}\leqslant\frac{2(\phi-1)}{(\gamma_{f}\tau_{N+1}/\Lambda_{1})+\phi}A_{N+1}\leqslant\frac{2(\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}))}{\beta_{N+1}}.\qquad(3.51)$$

From Lemma 3.8 (ii), we know that there exists $c>0$ such that $\beta_{k}\geqslant ck^{2}$ for all $k\geqslant 1$, and hence (3.51) implies $\|z^{N+2}-\overline{x}\|_{S}\leqslant\frac{c_{2}}{N}$ with

$$c_{2}=\sqrt{\frac{2(\beta_{1}A_{1}+\sum_{k=0}^{N-1}\beta_{k+1}\tau_{k+1}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}))}{c}}>0.$$

Since $\beta_{k+1}=\beta_{k}(1+\gamma_{f}\omega_{k+1}\tau_{k})$ and $\omega_{k+1}=\frac{\phi-\psi}{\phi\Lambda_{1}+\psi\gamma_{f}\tau_{k}}<1$,

$$\beta_{k}\tau_{k}=\frac{\beta_{k+1}-\beta_{k}}{\gamma_{f}\omega_{k+1}}\geqslant\frac{\beta_{k+1}-\beta_{k}}{\gamma_{f}}.\qquad(3.52)$$

Since $\overline{S}^{N}=\sum_{k=1}^{N}\beta_{k}\tau_{k}\geqslant\frac{\beta_{N+1}-\beta_{1}}{\gamma_{f}}$ and $\beta_{k}\geqslant ck^{2}$, we obtain $\frac{1}{\overline{S}^{N}}=O(1/N^{2})$. This means that $G(\overline{X}^{N},\overline{Y}^{N})\leqslant\frac{c_{3}}{N^{2}}$ for some constant $c_{3}>0$. This completes the proof.

3.3 Completely strongly convex case

We further assume that $f$ is $\gamma_{f}$-strongly convex and $g$ is $\gamma_{g}$-strongly convex. In this setting, Algorithm 3 achieves a linear convergence rate by properly selecting the parameters $\tau_{k}$, $\delta_{k}$ and $\varepsilon_{k}$.

1: Choose $x^{0}=z^{0}\in\mathbb{R}^{n}$, $y^{0}\in\mathbb{R}^{m}$, $\beta>0$, $\tau_{0}>0$, $\mu\in(0,1)$ and $\phi\in(\xi,\varphi)$, where $\xi$ is the unique real root of $\xi^{3}-\xi-1=0$. Let $S\in\mathbb{R}^{n\times n}$ and $T\in\mathbb{R}^{m\times m}$ be given symmetric positive definite matrices. Set $\psi=\frac{1+\phi}{\phi^{2}}$ and $k=0$.
2: Compute
$${z}^{k+1}=\frac{\phi-1}{\phi}x^{k}+\frac{1}{\phi}z^{k},$$
$${x}^{k+1}\approx_{2}^{\delta_{k+1}}\mathop{\arg\min}\limits_{x\in X}\{L(x,y^{k})+\frac{1}{2\tau_{k}}\|x-z^{k+1}\|_{S}^{2}\}.$$
3: Choose any $\tau_{k+1}\in[\tau_{k},\psi\tau_{k}]$ and run
  3.a: Compute
$${y}^{k+1}\approx_{2}^{\varepsilon_{k+1}}\mathop{\arg\max}\limits_{y\in Y}\{L(x^{k+1},y)-\frac{1}{2\beta\tau_{k+1}}\|y-y^{k}\|_{T}^{2}\}.\qquad(3.53)$$
  3.b: Break linesearch if
$$\sqrt{\beta\tau_{k+1}}\|A^{*}y^{k+1}-A^{*}y^{k}\|_{T}\leqslant\sqrt{\frac{\phi}{\tau_{k}}}\|y^{k+1}-y^{k}\|_{T}.\qquad(3.54)$$
  Otherwise, set $\tau_{k+1}:=\tau_{k+1}\mu$ and go to 3.a.
4: Set $k\leftarrow k+1$ and return to 2.
Algorithm 3 Accelerated IP-GRPDAL when $f$ and $g$ are strongly convex

Now, we summarize the linear convergence rate of Algorithm 3 in the following theorem.

Theorem 3.10

Suppose that $\Lambda_{1},\Lambda_{2}>1$ are the maximum eigenvalues of $S$ and $T$, respectively, that $\delta_{k}=\varepsilon_{k}=O(q^{k})$ with $q\in(0,1)$, and that $\tau_{k}=\tau$ is such that $1+\frac{\gamma_{f}\tau}{\Lambda_{1}}=1+\frac{\beta\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho}$. Let $\{(z^{k+1},x^{k+1},y^{k+1}):k\geqslant 0\}$ be the sequence generated by Algorithm 3 and let $(\overline{x},\overline{y})$ be any saddle point of (1.6). Then, for the ergodic sequence $\{(\widetilde{X}^{N},\widetilde{Y}^{N})\}$ given by

$$\widetilde{X}^{N}=\frac{1}{\widetilde{S}^{N}}\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}x^{k+1}\quad\textrm{and}\quad\widetilde{Y}^{N}=\frac{1}{\widetilde{S}^{N}}\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}y^{k+1}\quad\textrm{with}\quad\widetilde{S}^{N}=\sum_{k=0}^{N-1}\frac{1}{\rho^{k}},\qquad(3.55)$$

it holds that

$$G(\widetilde{X}^{N},\widetilde{Y}^{N})+\frac{1}{2\tau}\|\overline{x}-z^{N+2}\|_{S}^{2}=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q.\end{array}\right.$$
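Before giving the proof, note that the condition $1+\frac{\gamma_{f}\tau}{\Lambda_{1}}=1+\frac{\beta\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho}$ fixes two of the parameters explicitly (an elementary reformulation we record only for orientation):

$$\beta=\frac{\gamma_{f}\Lambda_{2}}{\gamma_{g}\Lambda_{1}},\qquad\rho=\frac{\Lambda_{1}}{\Lambda_{1}+\gamma_{f}\tau}\in(0,1),$$

so a larger constant stepsize $\tau$ yields a smaller contraction factor $\rho$.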

Proof. Similar to the proof of (3.41) in Theorem 3.9, one can deduce that

$$\begin{split}\tau G(&x^{k+1},y^{k+1})\leqslant\langle S(x^{k+2}-z^{k+2}),\overline{x}-x^{k+2}\rangle+\phi\theta_{k+1}\langle S(x^{k+1}-z^{k+2}),x^{k+2}-x^{k+1}\rangle\\ &+\frac{1}{\beta}\langle T(y^{k+1}-y^{k}),\overline{y}-y^{k+1}\rangle+\tau\langle A^{*}(y^{k+1}-y^{k}),x^{k+1}-x^{k+2}\rangle-\frac{\gamma_{f}\tau}{2\Lambda_{1}}\|x^{k+2}-\overline{x}\|_{S}^{2}\\ &-\frac{\gamma_{f}\tau}{2\Lambda_{1}}\|x^{k+2}-x^{k+1}\|_{S}^{2}-\frac{\gamma_{g}\tau}{2\Lambda_{2}}\|y^{k+1}-\overline{y}\|_{T}^{2}+\tau(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.56)$$

Applying the same arguments as in (3.42)-(3.45), we have

$$\begin{split}(1&+\frac{\gamma_{f}\tau}{\Lambda_{1}})\frac{\phi}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+(\frac{1}{\beta}+\frac{\gamma_{g}\tau}{\Lambda_{2}})\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau G(x^{k+1},y^{k+1})\\ \leqslant&\frac{(\gamma_{f}\tau/\Lambda_{1})+\phi}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}-\frac{\gamma_{f}\tau}{\phi\Lambda_{1}}\|z^{k+2}-x^{k+2}\|_{S}^{2}\\ &+2\tau(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.57)$$

Since $1+\frac{\gamma_{f}\tau}{\Lambda_{1}}=1+\frac{\beta\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho}$, from (3.47) we get

$$(1+\frac{\gamma_{f}\tau}{\Lambda_{1}})\frac{\phi}{\phi-1}\geqslant\frac{\phi(1+(\gamma_{f}\tau/\Lambda_{1}))}{\phi+(\gamma_{f}\tau/\Lambda_{1})}\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\geqslant\frac{1}{\rho}\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\quad\textrm{and}\quad\frac{1}{\beta}+\frac{\gamma_{g}\tau}{\Lambda_{2}}=\frac{1}{\rho\beta}.$$

Thus, the inequality (3.57) implies that

$$\begin{split}\frac{1}{\rho}&\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{k+3}-\overline{x}\|_{S}^{2}+\frac{1}{\rho\beta}\|y^{k+1}-\overline{y}\|_{T}^{2}+2\tau G(x^{k+1},y^{k+1})\\ &\leqslant\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{k+2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{k}-\overline{y}\|_{T}^{2}+2\tau(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.58)$$

Multiplying (3.58) by $\rho^{-k}$ and summing the resulting inequality from $k=0$ to $N-1$, we get

$$\begin{split}\frac{1}{\rho^{N}}&\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{N+2}-\overline{x}\|_{S}^{2}+\frac{1}{\rho^{N}\beta}\|y^{N}-\overline{y}\|_{T}^{2}+2\tau\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}G(x^{k+1},y^{k+1})\\ &\leqslant\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{\beta}\|y^{0}-\overline{y}\|_{T}^{2}+2\tau\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split}\qquad(3.59)$$

Since P(x) and D(y) are convex, it follows from (3.55) and Jensen's inequality that

\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}G(x^{k+1},y^{k+1})=\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(P(x^{k+1})+D(y^{k+1}))\geqslant\widetilde{S}^{N}(P(\widetilde{X}^{N})+D(\widetilde{Y}^{N}))=\widetilde{S}^{N}G(\widetilde{X}^{N},\widetilde{Y}^{N}).

Therefore, from (3.59) we have

\begin{split}\widetilde{S}^{N}&G(\widetilde{X}^{N},\widetilde{Y}^{N})+\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{2\tau\rho^{N}(\phi-1)}\|\overline{x}-z^{N+2}\|_{S}^{2}\\ &\leqslant\frac{(\gamma_{f}\tau/\Lambda_{1})+\phi}{2\tau(\phi-1)}\|z^{2}-\overline{x}\|_{S}^{2}+\frac{1}{2\beta\tau}\|y^{0}-\overline{y}\|_{T}^{2}+\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1}).\end{split} (3.60)

Note that \delta_{k}=\varepsilon_{k}=O(q^{k}), \frac{1}{\widetilde{S}^{N}}=1/(\sum_{k=0}^{N-1}\frac{1}{\rho^{k}})=O(\rho^{N}) with q,\rho\in(0,1), and, for \rho\neq q,

\rho^{N}\sum_{k=0}^{N-1}\frac{q^{k}}{\rho^{k}}=\rho^{N}\sum_{k=0}^{N-1}\Big(\frac{q}{\rho}\Big)^{k}=(q^{N}-\rho^{N})\frac{\rho}{q-\rho}.

Thus,

\displaystyle\frac{1}{\widetilde{S}^{N}}\sum_{k=0}^{N-1}\frac{1}{\rho^{k}}(\delta_{k+1}+\delta_{k+2}+\varepsilon_{k+1})=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q.\end{array}\right.

Since G(\widetilde{X}^{N},\widetilde{Y}^{N})\geqslant 0 and \frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\rho^{N}(\phi-1)}>\frac{\phi+(\gamma_{f}\tau/\Lambda_{1})}{\phi-1}>1, it follows from (3.60) that

G(\widetilde{X}^{N},\widetilde{Y}^{N})=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q,\end{array}\right.\qquad\frac{1}{2\tau}\|\overline{x}-z^{N+2}\|_{S}^{2}=\left\{\begin{array}[]{ll}O(\rho^{N}),&\rho>q,\\ O(N\rho^{N}),&\rho=q,\\ O(q^{N}),&\rho<q.\end{array}\right.

This completes the proof.

4 Numerical experiments

In this section, we apply the proposed Algorithm 1 to solve sparse recovery and image deblurring problems. We compare the IP-GRPDAL method with Algorithm 1 in [9] (denoted by PDA), Algorithm 1 in [16] (denoted by IPDA), the PDAL method in [26], and the GRPDAL method in [13]. All codes were written in MATLAB R2015a on a PC with an Intel i5-5200U CPU. For simplicity we set S=\left(\begin{array}[]{cc}\frac{1}{r_{1}}I_{1}&0\\ 0&\frac{1}{r_{2}}I_{2}\end{array}\right), T=\left(\begin{array}[]{cc}\frac{1}{s_{1}}I_{1}&0\\ 0&\frac{1}{s_{2}}I_{2}\end{array}\right), \delta_{k+1}=0, and \varepsilon_{k+1}=O(1/(k+1)^{\alpha}) in Algorithm 1.
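For illustration, the block-diagonal scaling matrices S and T can be assembled in MATLAB as follows; this is a minimal sketch in which the block sizes n1 and n2 and the dense construction via blkdiag are placeholders for the actual implementation.

n1 = 100; n2 = 100;                              % illustrative block sizes
s1 = 2; s2 = 1; r1 = 0.99/s1; r2 = 0.99/s2;      % values used in Section 4.1
S = blkdiag((1/r1)*eye(n1), (1/r2)*eye(n2));     % S = diag(I1/r1, I2/r2)
T = blkdiag((1/s1)*eye(n1), (1/s2)*eye(n2));     % T = diag(I1/s1, I2/s2)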

4.1 l_{1}-regularized analysis sparse recovery problem

We study the following problem:

\mathop{\textrm{min}}\limits_{x}\varPhi(x):=\frac{1}{2}\|Ax-b\|^{2}+\zeta\|x\|_{1}, (4.1)

where A\in\mathbb{R}^{m\times n}, x\in\mathbb{R}^{n}, b\in\mathbb{R}^{m}, and \zeta>0 is a regularization parameter. Let f(x)=\zeta\|x\|_{1} and g^{*}(y)=\frac{1}{2}\|y\|^{2}+\langle b,y\rangle. Then (4.1) can be rewritten as

\mathop{\textrm{min}}\limits_{x}\mathop{\textrm{max}}\limits_{y}\langle Ax,y\rangle+f(x)-g^{*}(y). (4.2)
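The dual term g^{*} above is the Fenchel conjugate of z\mapsto\frac{1}{2}\|z-b\|^{2}; a short verification, with w=z-b and the inner supremum attained at w=y, is

g^{*}(y)=\sup_{z}\Big\{\langle y,z\rangle-\frac{1}{2}\|z-b\|^{2}\Big\}=\langle b,y\rangle+\sup_{w}\Big\{\langle y,w\rangle-\frac{1}{2}\|w\|^{2}\Big\}=\frac{1}{2}\|y\|^{2}+\langle b,y\rangle.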

We use the proximal gradient method to solve the subproblems in these algorithms. The parameters are set as follows (a MATLAB sketch of this setup is given after the list):
(i) A:=\frac{1}{\sqrt{n}}\textrm{randn}(n,p);
(ii) \omega\in R^{n} is a random vector, where s random coordinates are drawn from the uniform distribution on [-10,10] and the rest are zeros;
(iii) The observed vector b is generated by b=A\omega+N(0,0.1), and the regularization parameter \zeta is set to 0.1;
(iv) We set \tau=\frac{1}{10\|A\|},\sigma=\frac{10}{\|A\|} for the PDA method. For the PDAL, GRPDAL, and IP-GRPDAL methods, we set \tau_{0}=\|y_{-1}-y_{0}\|/(\sqrt{\beta}\|A^{*}y_{-1}-A^{*}y_{0}\|),\beta=100,\phi=1.618,\mu=0.7, and \eta=0.99. In addition, we set s_{1}=2,r_{1}=\frac{0.99}{s_{1}},s_{2}=1,r_{2}=\frac{0.99}{s_{2}}, and \alpha=2 for the IP-GRPDAL method, the same as for the IPDA method. The initial points are x^{0}=(0,...,0) and y^{0}=Ax^{0}+b.
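The following minimal MATLAB sketch illustrates the data generation in (i)-(iii) and the soft-thresholding operator, i.e., the proximal map of a weighted l_{1}-norm, which is the building block of the x-subproblems here. Variable names are illustrative; \omega is taken in R^{p} so that A\omega is well defined, and N(0,0.1) is read as Gaussian noise with standard deviation 0.1.

n = 500; p = 800; s = 50;                     % e.g. the second test group
A = randn(n, p) / sqrt(n);                    % (i) random matrix
omega = zeros(p, 1);                          % (ii) s-sparse ground truth
idx = randperm(p, s);
omega(idx) = -10 + 20*rand(s, 1);             % uniform entries in [-10, 10]
b = A*omega + 0.1*randn(n, 1);                % (iii) noisy observations
zeta = 0.1;                                   % regularization parameter
soft = @(x, t) sign(x).*max(abs(x) - t, 0);   % prox of t*||.||_1 (soft thresholding)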

In this experiment, we terminate all the algorithms when \varPhi(x^{k})-\varPhi^{*}<10^{-10}, where an approximation of the optimal value \varPhi^{*} is obtained by running the algorithms for 5000 iterations.
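In MATLAB this test can be written as below, where Phi_star denotes the precomputed approximation of \varPhi^{*}; the function handles are illustrative.

Phi = @(x) 0.5*norm(A*x - b)^2 + zeta*norm(x, 1);      % objective (4.1)
converged = @(x, Phi_star) Phi(x) - Phi_star < 1e-10;  % outer stopping test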

To investigate the stability and efficiency of our method, we test three groups of problems with different triples (n,p,s) and run the tests 10 times for each group. The average performance of the tested methods, including the CPU time (Time, in seconds), the number of iterations (Iter), and, for the linesearch-based methods, the number of extra linesearch trial steps (LS), is reported in Table 1.

Table 1: Numerical results of the tested algorithms with random tight frames

(n, p, s)           PDA: Time / Iter     IPDA: Time / Iter     PDAL: Time / Iter / LS
(100, 100, 10)      62.23 / 58842        13.68 / 51229         29.27 / 28316 / 27918
(500, 800, 50)      138.89 / \diagup     60.64 / 92863         79.56 / 51500 / 50276
(1000, 2000, 100)   427.67 / \diagup     391.08 / 184676       206.37 / 108251 / 106938

(n, p, s)           GRPDAL: Time / Iter / LS     IP-GRPDAL: Time / Iter / LS
(100, 100, 10)      18.75 / 19390 / 5976         9.49 / 18292 / 4647
(500, 800, 50)      58.23 / 42407 / 13980        53.01 / 38594 / 12260
(1000, 2000, 100)   175.49 / \diagup / 72619     113.30 / \diagup / 58627
Figure 1: Evolution of function value residuals with respect to CPU time. (a) (n,p,s)=(100,100,10); (b) (n,p,s)=(500,800,50); (c) (n,p,s)=(1000,2000,100).

From the results in Table 1, we observe that our IP-GRPDAL method requires less CPU time and fewer iterations than the other methods. In Figure 1, the ordinate denotes the function value residual \varPhi(x^{k})-\varPhi^{*} and the abscissa denotes the CPU time; the plots show that the IP-GRPDAL method is much faster than the other methods.

4.2 TV-L_{1} image deblurring problem

In this subsection, we study the numerical solution of the TV-L_{1} model (4.1) in [16] for image deblurring:

\mathop{\textrm{min}}\limits_{x\in X}F(x)=\|Kx-f\|_{1}+\nu\|Dx\|_{1}, (4.3)

where f\in Y is a given (noisy) image, K:X\to Y is a known linear (blurring) operator, D:X\to Y denotes the gradient operator, and \nu is a regularization parameter. We introduce parameters \kappa_{1},\kappa_{2}>0 satisfying \kappa_{1}+\kappa_{2}=\nu. Then (4.3) can be written as

\mathop{\textrm{min}}\limits_{x\in X}\|Kx-f\|_{1}+\kappa_{1}\|Dx\|_{1}+\kappa_{2}\|Dx\|_{1}.

Further, the above formula can be rewritten as

\mathop{\textrm{min}}\limits_{x\in X}\mathop{\textrm{max}}\limits_{y\in Y}L(x,y):=\kappa_{1}\|Dx\|_{1}+\langle Ax,y\rangle-\Upsilon_{C_{1}}(u)-\Upsilon_{C_{\kappa_{2}}}(v)-\langle f,u\rangle,

where C_{\kappa}=\{y\in Y\,|\,\|y\|_{\infty}\leqslant\kappa\}, y=\left(\begin{array}[]{c}u\\ v\end{array}\right), and A=\left(\begin{array}[]{c}K\\ \kappa_{2}D\end{array}\right). We adopt the following inequality as the stopping criterion of the inner loop:

\varPsi(y^{k+1},v^{k+1})\leqslant\varepsilon_{k+1},

where \varepsilon_{k+1}=O(1/(k+1)^{\alpha}) and \varPsi is defined as in (4.11) of [16].
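In this formulation, the proximal maps associated with the indicator terms \Upsilon_{C_{1}} and \Upsilon_{C_{\kappa_{2}}} reduce to componentwise projections onto l_{\infty}-balls (the linear term \langle f,u\rangle only shifts the argument of the projection). A minimal MATLAB sketch of this projection, with an illustrative function handle name, is

proj_linf = @(y, kappa) min(max(y, -kappa), kappa);   % projection onto {y : ||y||_inf <= kappa}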

We now report the numerical results. In this test, an average blur with hsize = 9 was applied to the original image cameraman.png (256 \times 256) via fspecial('average', 9), and 20\% salt-and-pepper noise was added (see Figure 2). We adopt the following stopping rule:

\frac{F(x^{k})-F(x^{*})}{F(x^{*})}<10^{-5},

where x^{*} is a solution of the TV-L_{1} model (4.3).

Figure 2: Cameraman.png (256 \times 256). (a) Original image; (b) noisy image.

We fixed the number of iterations at 100 and the penalty coefficient \nu=0.1. The respective parameters of the five algorithms are chosen as follows:
PDA: \tau=\sigma=0.99;
IPDA: \tau_{0}=\sigma_{0}=1,s_{1}=2,r_{1}=\frac{0.99}{s_{1}},s_{2}=1,r_{2}=\frac{0.99}{s_{2}},\alpha=2;
PDAL: \tau_{0}=0.1,\beta=1,\mu=0.1,\eta=0.99;
GRPDAL: \tau_{0}=0.1,\phi=1.618,\beta=1,\mu=0.1,\eta=0.99;
IP-GRPDAL: \tau_{0}=0.1,\phi=1.618,\beta=1,s_{1}=2,r_{1}=\frac{0.99}{s_{1}},s_{2}=1,r_{2}=\frac{0.99}{s_{2}},\alpha=2,\mu=0.1,\eta=0.99.

Figure 3: Restored images. (a) PDA; (b) IPDA; (c) PDAL; (d) GRPDAL; (e) IP-GRPDAL.

The images restored by the above algorithms are displayed in Figure 3. Our IP-GRPDAL method achieves better restoration quality than the PDA, IPDA, PDAL, and GRPDAL methods. We also find that, if the number of iterations is increased to 1000 or more, all algorithms restore the image with almost the same quality, but our algorithm needs fewer iterations than the other ones.

5 Conclusions

In this paper, we propose an inexact golden ratio primal-dual algorithm with linesearch for the saddle point problem by applying the inexact extended proximal terms with matrix norm introduced in [16]. Under the assumption that \tau\sigma\|A\|_{T}^{2}<\phi, we prove the convergence of Algorithm 1, provided that both controlled error sequences \{\delta_{k+1}\} and \{\varepsilon_{k+1}\} are summable. The O(1/N) convergence rate in the ergodic sense is also established in the general convex case. When one of the component functions is strongly convex, an accelerated version of Algorithm 1 is proposed, which achieves an O(1/N^{2}) ergodic convergence rate. Furthermore, linear convergence results are established when both component functions are strongly convex. We also apply our method to solve sparse recovery and TV-L_{1} image deblurring problems and verify its efficiency numerically. It remains an interesting open problem to establish a nonergodic convergence rate for the IP-GRPDAL method with linesearch.

References

  • 4. Arrow, K.J., Hurwicz, L., Uzawa, H.: Studies in linear and non-linear programming. Stanford Mathematical Studies in the Social Sciences, II. Stanford University Press, Stanford, Calif. With contributions by Chenery, H.B., Johnson, S.M., Karlin, S., Marschak, T., Solow, R.M (1958)
  • 5. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 3(1), 1-122 (2011)
  • 6. Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM journal on imaging sciences 2(1), 183-202(2009)
  • 7. Condat, L.: A primal dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. Journal of Optimization Theory and Applications 158(2), 460-479(2013)
  • 8. Cai, J. F., Candès, E. J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20(4), 1956-1982 (2010)
  • 9. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision 40(1), 120-145(2011)
  • 10. Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numerica 25(1), 161-319(2016)
  • 11. Chang, X., Yang, J.: A golden ratio primal-dual algorithm for structured convex optimization. Journal of Scientific Computing 87(1), 1-26(2021)
  • 12. Chang, X., Yang, J.: GRPDA revisited: relaxed condition and connection to Chambolle-Pock’s primal-dual algorithm. Journal of Scientific Computing 93(3), 1-18(2022)
  • 13. Chang, X., Yang, J., Zhang, H.: Golden ratio primal-dual algorithm with linesearch. SIAM Journal on Optimization 32(3), 1584-1613(2022)
  • 14. Eckstein, J., Bertsekas, D. P.: On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming 55(1), 293-318(1992)
  • 15. Ehrhardt, M. J., Betcke, M. M.: Multicontrast MRI reconstruction with structure-guided total variation. SIAM Journal on Imaging Sciences 9(3), 1084-1106(2016)
  • 16. Fang, C., Hu, L., Chen, S.: An inexact primal-dual method with correction step for a saddle point problem in image deblurring. Journal of Global Optimization 87(2), 965-988 (2023)
  • 17. Han, D., He, H., Yang, H., Yuan, X.: A customized Douglas-Rachford splitting algorithm for separable convex minimization with linear constraints. Numerische Mathematik 127(1), 167-200(2014)
  • 18. He, H., Desai, J., Wang, K.: A primal-dual prediction-correction algorithm for saddle point optimization. Journal of Global Optimization 66(1), 573-583(2016)
  • 19. He, B., Tao, M., Yuan, X.: Alternating direction method with Gaussian back substitution for separable convex programming. SIAM Journal on Optimization 22(2), 313-340(2012)
  • 20. He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM Journal on Imaging Sciences 5(1), 119-149(2012)
  • 21. Hien, L. T. K., Zhao, R., Haskell, W. B.: An inexact primal-dual smoothing framework for large-scale non-bilinear saddle point problems. Journal of Optimization Theory and Applications 200(1), 34-67(2024)
  • 22. Jiang, F., Cai, X., Wu, Z., Han, D.: Approximate first-order primal-dual algorithms for saddle point problems. Mathematics of Computation 90(329), 1227-1262(2021)
  • 23. Jiang, F., Wu, Z., Cai, X., Zhang, H.: A first-order inexact primal-dual algorithm for a class of convex-concave saddle point problems. Numerical Algorithms 1(1), 1-28(2021)
  • 24. Liu, Y., Xu, Y., Yin, W.: Acceleration of primal-dual methods by preconditioning and simple subproblem procedures. Journal of Scientific Computing 86(2), 21-46(2021)
  • 25. Malitsky, Y.: Golden ratio algorithms for variational inequalities. Mathematical Programming 184(1), 383-410(2020)
  • 26. Malitsky, Y., Pock, T.: A first-order primal-dual algorithm with linesearch. SIAM Journal on Optimization 28(1), 411-432 (2018)
  • 27. Ng, M. K., Wang, F., Yuan, X.: Inexact alternating direction methods for image recovery. SIAM Journal on Scientific Computing 33(4), 1643-1668(2011)
  • 28. Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the Mumford-Shah functional. In 2009 IEEE 12th International Conference on Computer Vision, IEEE, 1133-1140 (2009)
  • 29. Rasch, J., Chambolle, A.: Inexact first-order primal-dual algorithms. Computational Optimization and Applications 76(2), 381-430(2020)
  • 30. Rudin, L. I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena 60(1-4), 259-268(1992)
  • 31. Shores, T. S.: Applied Linear Algebra and Matrix Analysis. Springer, New York (2007)
  • 32. Wang, K., He, H.: A double extrapolation primal-dual algorithm for saddle point problems. Journal of Scientific Computing 85(2), 1-30(2020)
  • 33. Wu, Z., Song, Y., Jiang, F.: Inexact generalized ADMM with relative error criteria for linearly constrained convex optimization problems. Optimization Letters 18(2), 447-470(2024)
  • 34. Xie, J.: On inexact ADMMs with relative error criteria. Computational Optimization and Applications 71(3), 743-765(2018)
  • 35. Zhu, M., Chan, T.: An efficient primal-dual hybrid gradient algorithm for total variation image restoration. UCLA CAM Report 34(2), 8-34 (2008)