
A note on the convergence of the monotone inclusion version of the primal-dual hybrid gradient algorithm

Levon Nurbekyan
Department of Mathematics, Emory University
[email protected]
Abstract

The note contains a direct extension of the convergence proof of the primal-dual hybrid gradient (PDHG) algorithm in [3] to the case of monotone inclusions.

1 Introduction

Assume that $\mathcal{H}_{1},\mathcal{H}_{2}$ are Hilbert spaces, and $A:\mathcal{H}_{1}\to 2^{\mathcal{H}_{1}}$, $B:\mathcal{H}_{2}\to 2^{\mathcal{H}_{2}}$ are maximally monotone maps. Furthermore, assume that $C:\mathcal{H}_{1}\to\mathcal{H}_{2}$ is a non-zero bounded linear operator, and consider the following pair of primal-dual monotone inclusions

$$\begin{split}\text{find } x\in\mathcal{H}_{1}\ \text{s.t.}\ 0\in Ax+C^{*}(B(Cx))&\quad\text{(P)}\\ \text{find } y\in\mathcal{H}_{2}\ \text{s.t.}\ y\in B(Cx),\ -C^{*}y\in Ax,\ \text{for some } x\in\mathcal{H}_{1}&\quad\text{(D)}\end{split}\tag{1}$$

When $A,B$ are subdifferential maps of proper convex lower semicontinuous functions, the problem above reduces to a pair of primal-dual convex programs or, equivalently, a convex-concave saddle point problem. More specifically, if $A=\partial f_{1}$, $B=\partial f_{2}$ for $f_{1}:\mathcal{H}_{1}\to\overline{\mathbb{R}}$, $f_{2}:\mathcal{H}_{2}\to\overline{\mathbb{R}}$, then (1) is equivalent to

$$\begin{split}\inf_{x\in\mathcal{H}_{1}}\left\{f_{1}(x)+f_{2}(Cx)\right\}=&\inf_{x\in\mathcal{H}_{1}}\sup_{y\in\mathcal{H}_{2}}\left\{f_{1}(x)+\langle Cx,y\rangle-f_{2}^{*}(y)\right\}\\ =&\sup_{y\in\mathcal{H}_{2}}\left\{-f_{1}^{*}(-C^{*}y)-f_{2}^{*}(y)\right\}.\end{split}\tag{2}$$
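For orientation, here is a short formal check of this equivalence (a sketch only; it assumes the subdifferential chain rule $\partial(f_{2}\circ C)(x)=C^{*}\partial f_{2}(Cx)$, which holds under a standard constraint qualification):

```latex
% Fermat's rule for the primal problem in (2), assuming the chain rule
% for subdifferentials applies:
\[
  0 \in \partial\bigl(f_{1}+f_{2}\circ C\bigr)(x)
    = \partial f_{1}(x) + C^{*}\,\partial f_{2}(Cx)
    = Ax + C^{*}(B(Cx)),
\]
% which is (P); and if y \in \partial f_{2}(Cx) = B(Cx) realizes this
% inclusion, then -C^{*}y \in \partial f_{1}(x) = Ax, which is (D).
```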

In [2, 3], the authors introduced a first-order primal-dual splitting scheme for solving (2), which in its simplest form reads as

$$\begin{cases}x^{n+1}=\underset{x\in\mathcal{H}_{1}}{\operatorname{argmin}}\ f_{1}(x)+\langle Cx,y^{n}\rangle+\frac{\|x-x^{n}\|^{2}}{2\tau},\\ \tilde{x}^{n+1}=2x^{n+1}-x^{n},\\ y^{n+1}=\underset{y\in\mathcal{H}_{2}}{\operatorname{argmax}}\ \langle C\tilde{x}^{n+1},y\rangle-f_{2}^{*}(y)-\frac{\|y-y^{n}\|^{2}}{2\sigma},\end{cases}\tag{3}$$

where $\tau,\sigma>0$. The main results in [2, 3] provide convergence of the ergodic sequences

$$X^{N}=\frac{1}{N}\sum_{n=1}^{N}x^{n},\quad Y^{N}=\frac{1}{N}\sum_{n=1}^{N}y^{n},\tag{4}$$

under the assumption

$$\tau\sigma<\frac{1}{\|C\|^{2}}.\tag{5}$$
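As an illustration only (not part of the note), here is a minimal numerical sketch of (3) in Python for the toy choices $f_{1}(x)=\frac{1}{2}\|x-b\|^{2}$ and $f_{2}(z)=\frac{1}{2}\|z\|^{2}$ (so that $f_{2}^{*}=f_{2}$ and both proximal steps have closed forms); the data C, b and the step sizes below are hypothetical.

```python
import numpy as np

# PDHG iteration (3) for min_x 0.5*||x - b||^2 + 0.5*||C x||^2, whose
# solution solves (I + C^T C) x = b.  Here C: H1 = R^30 -> H2 = R^20.
rng = np.random.default_rng(0)
C = rng.standard_normal((20, 30))
b = rng.standard_normal(30)

L = np.linalg.norm(C, 2)        # operator norm ||C||
tau = sigma = 0.99 / L          # so tau * sigma < 1 / ||C||^2, cf. (5)

x, y = np.zeros(30), np.zeros(20)
X, Y = np.zeros(30), np.zeros(20)                     # ergodic means, cf. (4)
for n in range(1, 2001):
    x_new = (x + tau * (b - C.T @ y)) / (1.0 + tau)   # prox step for f_1
    x_tilde = 2.0 * x_new - x                         # over-relaxation
    y = (y + sigma * (C @ x_tilde)) / (1.0 + sigma)   # prox step for f_2^*
    x = x_new
    X += (x - X) / n                                  # running ergodic means
    Y += (y - Y) / n

x_star = np.linalg.solve(np.eye(30) + C.T @ C, b)     # direct solve
print(np.linalg.norm(X - x_star))                     # should be small
```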

In [6], the author considers a more general version of (1) and introduces a splitting scheme, which in its simplest form reads as

$$\begin{cases}x^{n+1}=(I+\tau A)^{-1}\left(x^{n}-\tau C^{*}y^{n}\right),\\ \tilde{x}^{n+1}=2x^{n+1}-x^{n},\\ y^{n+1}=(I+\sigma B^{-1})^{-1}\left(y^{n}+\sigma C\tilde{x}^{n+1}\right).\end{cases}\tag{6}$$

Using techniques different from the ones in [2, 3], the author in [6] proves the convergence of the iterates in (6) to a solution of (1) under the same assumption (5). The key idea is to rewrite (6) in the form of a forward-backward splitting algorithm analyzed in [4].
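Continuing the hypothetical toy data from the sketch above (C, b, tau, sigma), the resolvent form (6) can be sketched as follows; for those choices, $A=\partial f_{1}$ and $B=\partial f_{2}$ have affine resolvents, and (6) reproduces (3) step by step.

```python
# Resolvents for A(x) = x - b and B = I (so B^{-1} = I) from the toy
# problem above; both are single-valued and affine here.
def res_A(z, tau):              # (I + tau*A)^{-1}
    return (z + tau * b) / (1.0 + tau)

def res_Binv(w, sigma):         # (I + sigma*B^{-1})^{-1}
    return w / (1.0 + sigma)

x, y = np.zeros(30), np.zeros(20)
for _ in range(2000):
    x_new = res_A(x - tau * (C.T @ y), tau)
    x_tilde = 2.0 * x_new - x
    y = res_Binv(y + sigma * (C @ x_tilde), sigma)
    x = x_new
```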

In this note, we provide a direct extension of the convergence proof of (3) in [3] for the monotone inclusion version (6).

2 Notation and hypotheses

Throughout the note, we assume that $\mathcal{H}_{1},\mathcal{H}_{2}$ are Hilbert spaces, $A,B$ are maximally monotone, and $C$ is a non-zero bounded linear operator. Furthermore, assume that $\psi_{1}:\mathcal{H}_{1}\to\mathbb{R}$ and $\psi_{2}:\mathcal{H}_{2}\to\mathbb{R}$ are continuously Fréchet differentiable convex functions, and denote by

$$\begin{split}D_{1}(x,\bar{x})=&\psi_{1}(x)-\psi_{1}(\bar{x})-\langle\nabla\psi_{1}(\bar{x}),x-\bar{x}\rangle,\quad x,\bar{x}\in\mathcal{H}_{1},\\ D_{2}(y,\bar{y})=&\psi_{2}(y)-\psi_{2}(\bar{y})-\langle\nabla\psi_{2}(\bar{y}),y-\bar{y}\rangle,\quad y,\bar{y}\in\mathcal{H}_{2},\end{split}\tag{7}$$

their Bregman divergences. We assume that there exists $\alpha>0$ such that

$$D_{1}(x,\bar{x})+D_{2}(y,\bar{y})-\langle C(x-\bar{x}),y-\bar{y}\rangle\geq\alpha\left(\|x-\bar{x}\|^{2}+\|y-\bar{y}\|^{2}\right),\quad\forall x,\bar{x}\in\mathcal{H}_{1},\ \forall y,\bar{y}\in\mathcal{H}_{2}.\tag{8}$$

Taking $y=\bar{y}$ we obtain

$$\psi_{1}(x)-\psi_{1}(\bar{x})-\langle\nabla\psi_{1}(\bar{x}),x-\bar{x}\rangle=D_{1}(x,\bar{x})\geq\alpha\|x-\bar{x}\|^{2},\quad\forall x,\bar{x}\in\mathcal{H}_{1},\tag{9}$$

which means that $\psi_{1}$ is $2\alpha$-strongly convex. Similarly, we have that

$$\psi_{2}(y)-\psi_{2}(\bar{y})-\langle\nabla\psi_{2}(\bar{y}),y-\bar{y}\rangle=D_{2}(y,\bar{y})\geq\alpha\|y-\bar{y}\|^{2},\quad\forall y,\bar{y}\in\mathcal{H}_{2},\tag{10}$$

and so $\psi_{2}$ is also $2\alpha$-strongly convex.

Lemma 1.

Assume that $\mathcal{H}$ is a Hilbert space, $\psi:\mathcal{H}\to\mathbb{R}$ is a continuously Fréchet differentiable strongly convex function, and $M:\mathcal{H}\to 2^{\mathcal{H}}$ is a maximally monotone operator. Furthermore, denote by

$$D(x,\bar{x})=\psi(x)-\psi(\bar{x})-\langle\nabla\psi(\bar{x}),x-\bar{x}\rangle,\quad x,\bar{x}\in\mathcal{H}.$$

Then the map

$$Tx=\nabla_{x}D(x,\bar{x})+Mx,\quad x\in\mathcal{H},$$

is surjective for all $\bar{x}\in\mathcal{H}$.

Proof.

Fix an arbitrary $\bar{x}\in\mathcal{H}$. Since $x\mapsto D(x,\bar{x})$ is convex and smooth, [1, Theorem 20.25] yields that $x\mapsto\nabla_{x}D(x,\bar{x})$ is maximally monotone with domain $\mathcal{H}$. Hence, by [5, Theorem 1], we have that $T$ is maximally monotone.

Next, let $(x_{0},y_{0})\in\operatorname{gra}M$. Then for every $x\in\mathcal{H}$ we have that

$$\begin{split}\inf\|Tx\|=&\inf\|\nabla\psi(x)-\nabla\psi(x_{0})+Mx-y_{0}+(\nabla\psi(x_{0})+y_{0}-\nabla\psi(\bar{x}))\|\\ \geq&\inf\|\nabla\psi(x)-\nabla\psi(x_{0})+Mx-y_{0}\|-\|\nabla\psi(x_{0})+y_{0}-\nabla\psi(\bar{x})\|.\end{split}$$

Furthermore, the strong convexity of ψ\psi yields that

$$\langle\nabla\psi(x)-\nabla\psi(x_{0})+Mx-y_{0},x-x_{0}\rangle\geq 2\alpha\|x-x_{0}\|^{2},$$

for some $\alpha>0$, and from the Cauchy-Schwarz inequality we obtain that

$$\inf\|\nabla\psi(x)-\nabla\psi(x_{0})+Mx-y_{0}\|\geq 2\alpha\|x-x_{0}\|,\quad\forall x\in\mathcal{H}.$$

Hence

$$\inf\|Tx\|\geq 2\alpha\|x-x_{0}\|-\|\nabla\psi(x_{0})+y_{0}-\nabla\psi(\bar{x})\|,\quad\forall x\in\mathcal{H},$$

which implies

$$\lim_{\|x\|\to\infty}\inf\|Tx\|=\infty,$$

and [1, Corollary 21.24] concludes the proof. ∎

3 The algorithm and its convergence

Consider the following primal-dual splitting algorithm:

$$\begin{cases}x^{n+1}=(\nabla_{x}D_{1}(\cdot,x^{n})+A)^{-1}\left(-C^{*}y^{n}\right),\\ \tilde{x}^{n+1}=2x^{n+1}-x^{n},\\ y^{n+1}=(\nabla_{y}D_{2}(\cdot,y^{n})+B^{-1})^{-1}\left(C\tilde{x}^{n+1}\right).\end{cases}\tag{11}$$

The algorithm above extends [3, Algorithm 1] by replacing the subdifferential maps with general maximally monotone maps. When

$$\psi_{1}(x)=\frac{\|x\|^{2}}{2\tau},\quad\psi_{2}(y)=\frac{\|y\|^{2}}{2\sigma},\quad x\in\mathcal{H}_{1},\ y\in\mathcal{H}_{2},$$

we obtain

$$D_{1}(x,\bar{x})=\frac{\|x-\bar{x}\|^{2}}{2\tau},\quad D_{2}(y,\bar{y})=\frac{\|y-\bar{y}\|^{2}}{2\sigma},$$

and (11) reduces to (6). Moreover, for this choice the existence of an $\alpha>0$ such that (8) holds is equivalent to (5).
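For completeness, here is a short check of this equivalence (a sketch; the reduction to a scalar quadratic form uses $\langle Cu,v\rangle\leq\|C\|\,\|u\|\,\|v\|$, with near-equality attainable since $C\neq 0$):

```latex
% With u = x - \bar{x}, v = y - \bar{y}, s = ||u||, t = ||v||, condition (8)
% for the quadratic psi's amounts to
%   s^2/(2 tau) + t^2/(2 sigma) - ||C|| s t >= alpha (s^2 + t^2)
% for all s, t >= 0, i.e.
\[
  \begin{pmatrix} \tfrac{1}{2\tau}-\alpha & -\tfrac{\|C\|}{2}\\
                  -\tfrac{\|C\|}{2} & \tfrac{1}{2\sigma}-\alpha \end{pmatrix}
  \succeq 0 \ \text{ for some } \alpha>0
  \iff \frac{1}{4\tau\sigma}>\frac{\|C\|^{2}}{4}
  \iff \tau\sigma\|C\|^{2}<1.
\]
```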

Furthermore, Lemma 1 guarantees that all steps in (11) are well defined, and the algorithm will not halt.
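To indicate what (11) buys beyond (6), here is a hypothetical Python sketch of the x-update with a diagonal quadratic $\psi_{1}$ (elementwise step sizes, in the spirit of preconditioned PDHG); the map $A(x)=x-b$ and the vector t below are illustrative assumptions, not part of the note.

```python
# Bregman x-step of (11) with psi_1(x) = 0.5 * sum(x**2 / t) for an
# elementwise positive (hypothetical) vector t, so that
#   grad_x D_1(x, x_n) = (x - x_n) / t.
# For A(x) = x - b the equation (x - x_n)/t + (x - b) = -C^T y_n has the
# closed form below; a constant t = tau recovers the x-step of (6).
def bregman_x_step(x_n, y_n, t, C, b):
    return (x_n + t * (b - C.T @ y_n)) / (1.0 + t)
```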

Theorem 1.

Assume that (1) admits a solution $(x^{*},y^{*})\in\mathcal{H}_{1}\times\mathcal{H}_{2}$, and that $(x^{n},\tilde{x}^{n},y^{n})$ are generated by (11) with arbitrary initial points $(x^{0},\tilde{x}^{0},y^{0})\in\mathcal{H}_{1}\times\mathcal{H}_{1}\times\mathcal{H}_{2}$. Then the ergodic sequence $\{(X^{N},Y^{N})\}$ defined in (4) is bounded, and all its weak limit points are solutions of (1).

Proof.

We introduce the following function

$$\begin{split}\mathcal{L}(x,\zeta;y,\eta)=&\sup_{(u,v)\in Ax\times B^{-1}y}\langle x-\zeta,-u-C^{*}\eta\rangle+\langle C\zeta-v,y-\eta\rangle\\ =&\sup_{(u,v)\in Ax\times B^{-1}y}\langle\zeta-x,u\rangle+\langle\eta-y,v\rangle-\langle Cx,\eta\rangle+\langle C\zeta,y\rangle,\end{split}\tag{12}$$

where we set the supremum of an empty set to be -\infty. As pointed out in [3], the basic building block of (11) is the iteration

$$\begin{cases}\hat{x}=(\nabla_{x}D_{1}(\cdot,\bar{x})+A)^{-1}\left(-C^{*}\tilde{y}\right),\\ \hat{y}=(\nabla_{y}D_{2}(\cdot,\bar{y})+B^{-1})^{-1}\left(C\tilde{x}\right),\end{cases}\tag{13}$$

for suitable choices of $\bar{x},\hat{x},\tilde{x}$ and $\bar{y},\hat{y},\tilde{y}$. In expanded form, (13) can be written as

$$\begin{cases}\nabla_{x}D_{1}(\hat{x},\bar{x})+\hat{u}=-C^{*}\tilde{y},\\ \nabla_{y}D_{2}(\hat{y},\bar{y})+\hat{v}=C\tilde{x},\end{cases}\tag{14}$$

where $(\hat{u},\hat{v})\in A\hat{x}\times B^{-1}\hat{y}$. Thus, we first obtain estimates for the general iteration (14) and then apply them to (11).

Let (14) hold, and let $(x,y)\in\mathcal{H}_{1}\times\mathcal{H}_{2}$, $(u,v)\in Ax\times B^{-1}y$ be arbitrary. Then by the monotonicity of $A$ and (14) we have that

$$\begin{split}\langle u,x-\hat{x}\rangle\geq&\langle\hat{u},x-\hat{x}\rangle=\langle-C^{*}\tilde{y}-\nabla_{x}D_{1}(\hat{x},\bar{x}),x-\hat{x}\rangle\\ =&\langle-C^{*}\tilde{y},x-\hat{x}\rangle+D_{1}(\hat{x},\bar{x})+D_{1}(x,\hat{x})-D_{1}(x,\bar{x}),\end{split}\tag{15}$$

where we also used the identity

$$\langle-\nabla_{x}D_{1}(\hat{x},\bar{x}),x-\hat{x}\rangle=D_{1}(\hat{x},\bar{x})+D_{1}(x,\hat{x})-D_{1}(x,\bar{x}).$$
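(This is the standard three-point identity for Bregman divergences; for completeness, a direct check from definition (7), using $\nabla_{x}D_{1}(\hat{x},\bar{x})=\nabla\psi_{1}(\hat{x})-\nabla\psi_{1}(\bar{x})$:)

```latex
% The psi_1 terms cancel, and only inner products remain:
\begin{align*}
  D_{1}(\hat{x},\bar{x})+D_{1}(x,\hat{x})-D_{1}(x,\bar{x})
  &= -\langle\nabla\psi_{1}(\bar{x}),\hat{x}-\bar{x}\rangle
     -\langle\nabla\psi_{1}(\hat{x}),x-\hat{x}\rangle
     +\langle\nabla\psi_{1}(\bar{x}),x-\bar{x}\rangle\\
  &= \langle\nabla\psi_{1}(\bar{x})-\nabla\psi_{1}(\hat{x}),x-\hat{x}\rangle
   = \langle-\nabla_{x}D_{1}(\hat{x},\bar{x}),x-\hat{x}\rangle.
\end{align*}
```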

Similarly, using the monotonicity of $B^{-1}$ we obtain

$$\begin{split}\langle v,y-\hat{y}\rangle\geq&\langle\hat{v},y-\hat{y}\rangle=\langle C\tilde{x}-\nabla_{y}D_{2}(\hat{y},\bar{y}),y-\hat{y}\rangle\\ =&\langle C\tilde{x},y-\hat{y}\rangle+D_{2}(\hat{y},\bar{y})+D_{2}(y,\hat{y})-D_{2}(y,\bar{y}).\end{split}\tag{16}$$

Combining (15), (16), we obtain

$$\begin{split}&D_{1}(x,\bar{x})-D_{1}(\hat{x},\bar{x})-D_{1}(x,\hat{x})+D_{2}(y,\bar{y})-D_{2}(\hat{y},\bar{y})-D_{2}(y,\hat{y})\\ \geq&\langle x-\hat{x},-u-C^{*}\tilde{y}\rangle+\langle C\tilde{x}-v,y-\hat{y}\rangle\\ =&\langle x-\hat{x},-u-C^{*}\hat{y}\rangle+\langle C\hat{x}-v,y-\hat{y}\rangle+\langle C(x-\hat{x}),\hat{y}-\tilde{y}\rangle+\langle C(\tilde{x}-\hat{x}),y-\hat{y}\rangle.\end{split}$$

Since $(u,v)\in Ax\times B^{-1}y$ are arbitrary, we obtain that

$$\begin{split}\mathcal{L}(x,\hat{x};y,\hat{y})\leq&D_{1}(x,\bar{x})-D_{1}(\hat{x},\bar{x})-D_{1}(x,\hat{x})+D_{2}(y,\bar{y})-D_{2}(\hat{y},\bar{y})-D_{2}(y,\hat{y})\\ &+\langle C(x-\hat{x}),\tilde{y}-\hat{y}\rangle+\langle C(\tilde{x}-\hat{x}),\hat{y}-y\rangle,\quad\forall x\in\mathcal{H}_{1},\ y\in\mathcal{H}_{2}.\end{split}\tag{17}$$

As in [3], the inequality above is the key estimate of the proof. Indeed, (11) corresponds to choosing

$$\hat{x}=x^{n+1},\quad\bar{x}=x^{n},\quad\tilde{x}=\tilde{x}^{n+1}=2x^{n+1}-x^{n},\quad\hat{y}=y^{n+1},\quad\bar{y}=y^{n},\quad\tilde{y}=y^{n},$$

in (13), and so (17) yields

$$\begin{split}\mathcal{L}(x,x^{n+1};y,y^{n+1})\leq&\left\{D_{1}(x,x^{n})+D_{2}(y,y^{n})-\langle C(x-x^{n}),y-y^{n}\rangle\right\}\\ &-\left\{D_{1}(x,x^{n+1})+D_{2}(y,y^{n+1})-\langle C(x-x^{n+1}),y-y^{n+1}\rangle\right\}\\ &-\left\{D_{1}(x^{n+1},x^{n})+D_{2}(y^{n+1},y^{n})-\langle C(x^{n+1}-x^{n}),y^{n+1}-y^{n}\rangle\right\}.\end{split}$$

Hence, by the convexity of $(\zeta,\eta)\mapsto\mathcal{L}(x,\zeta;y,\eta)$, we obtain

$$\begin{split}N\mathcal{L}(x,X^{N};y,Y^{N})\leq&\sum_{n=1}^{N}\mathcal{L}(x,x^{n};y,y^{n})\\ \leq&\left\{D_{1}(x,x^{0})+D_{2}(y,y^{0})-\langle C(x-x^{0}),y-y^{0}\rangle\right\}\\ &-\left\{D_{1}(x,x^{N})+D_{2}(y,y^{N})-\langle C(x-x^{N}),y-y^{N}\rangle\right\}\\ &-\sum_{n=1}^{N}\left\{D_{1}(x^{n},x^{n-1})+D_{2}(y^{n},y^{n-1})-\langle C(x^{n}-x^{n-1}),y^{n}-y^{n-1}\rangle\right\},\end{split}\tag{18}$$

for all $x\in\mathcal{H}_{1}$, $y\in\mathcal{H}_{2}$, and $N\in\mathbb{N}$. Note that (8) guarantees that the expressions in the curly brackets are nonnegative.

Recall that $(x^{*},y^{*})$ is a solution of (1), and so

$$-C^{*}y^{*}\in Ax^{*},\quad Cx^{*}\in B^{-1}y^{*}.\tag{19}$$

But then by the definition of $\mathcal{L}$ we have that

$$\mathcal{L}(x^{*},\zeta;y^{*},\eta)\geq\langle x^{*}-\zeta,C^{*}y^{*}-C^{*}\eta\rangle+\langle C\zeta-Cx^{*},y^{*}-\eta\rangle=0,\quad\forall\zeta\in\mathcal{H}_{1},\ \forall\eta\in\mathcal{H}_{2}.$$

In particular, we have that

$$\mathcal{L}(x^{*},X^{N};y^{*},Y^{N})\geq 0,\tag{20}$$

and (18) yields that

$$D_{1}(x^{*},x^{N})+D_{2}(y^{*},y^{N})-\langle C(x^{*}-x^{N}),y^{*}-y^{N}\rangle\leq D_{1}(x^{*},x^{0})+D_{2}(y^{*},y^{0})-\langle C(x^{*}-x^{0}),y^{*}-y^{0}\rangle,$$

and (8) implies that

$$\|x^{N}-x^{*}\|^{2}+\|y^{N}-y^{*}\|^{2}\leq\frac{D_{1}(x^{*},x^{0})+D_{2}(y^{*},y^{0})-\langle C(x^{*}-x^{0}),y^{*}-y^{0}\rangle}{\alpha},\quad\forall N\in\mathbb{N}.$$

Therefore, $\{(x^{n},y^{n})\}$ is a bounded sequence, and the convexity of the norm yields the boundedness of the ergodic sequence with the same bound; that is,

$$\|X^{N}-x^{*}\|^{2}+\|Y^{N}-y^{*}\|^{2}\leq\frac{D_{1}(x^{*},x^{0})+D_{2}(y^{*},y^{0})-\langle C(x^{*}-x^{0}),y^{*}-y^{0}\rangle}{\alpha},\quad\forall N\in\mathbb{N}.$$

Assume that $(X,Y)$ is a weak (subsequential) limit of $\{(X^{N},Y^{N})\}$. Invoking (18) again, we obtain

$$\mathcal{L}(x,X^{N};y,Y^{N})\leq\frac{D_{1}(x,x^{0})+D_{2}(y,y^{0})-\langle C(x-x^{0}),y-y^{0}\rangle}{N},\tag{21}$$

for all $x\in\mathcal{H}_{1}$, $y\in\mathcal{H}_{2}$, and $N\in\mathbb{N}$. Let $(u,v)\in Ax\times B^{-1}y$ be arbitrary. Then we have that

$$\langle X^{N}-x,u\rangle+\langle Y^{N}-y,v\rangle-\langle Cx,Y^{N}\rangle+\langle CX^{N},y\rangle\leq\mathcal{L}(x,X^{N};y,Y^{N}),$$

and so the weak convergence and (21) yield

$$\begin{split}&\langle X-x,u\rangle+\langle Y-y,v\rangle-\langle Cx,Y\rangle+\langle CX,y\rangle\\ =&\lim_{N\to\infty}\langle X^{N}-x,u\rangle+\langle Y^{N}-y,v\rangle-\langle Cx,Y^{N}\rangle+\langle CX^{N},y\rangle\\ \leq&\liminf_{N\to\infty}\mathcal{L}(x,X^{N};y,Y^{N})\leq 0.\end{split}$$

Therefore we have that

$$\mathcal{L}(x,X;y,Y)\leq 0,\quad\forall x\in\mathcal{H}_{1},\ y\in\mathcal{H}_{2}.\tag{22}$$

Taking $y=Y$ in (22) we obtain

$$\langle x-X,u+C^{*}Y\rangle\geq 0,\quad\forall(x,u)\in\operatorname{gra}A,$$

and so the maximal monotonicity of $A$ yields that

$$(X,-C^{*}Y)\in\operatorname{gra}A\quad\Longleftrightarrow\quad-C^{*}Y\in AX.\tag{23}$$

Similarly, plugging in $x=X$ in (22) we find that

$$\langle y-Y,v-CX\rangle\geq 0,\quad\forall(y,v)\in\operatorname{gra}B^{-1},$$

and the maximal monotonicity of $B^{-1}$ yields that

$$(Y,CX)\in\operatorname{gra}B^{-1}\quad\Longleftrightarrow\quad Y\in B(CX).\tag{24}$$

Combining (23) and (24) we obtain that $(X,Y)$ is a solution of (1). ∎

References

  • [1] H. H. Bauschke and P. L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC. Springer, Cham, second edition, 2017. With a foreword by Hédy Attouch.
  • [2] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vision, 40(1):120–145, 2011.
  • [3] A. Chambolle and T. Pock. On the ergodic convergence rates of a first-order primal-dual algorithm. Math. Program., 159(1-2):253–287, 2016.
  • [4] P. L. Combettes. Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization, 53(5-6):475–504, 2004.
  • [5] R. T. Rockafellar. On the maximality of sums of nonlinear monotone operators. Trans. Amer. Math. Soc., 149:75–88, 1970.
  • [6] B. C. Vũ. A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math., 38(3):667–681, 2013.