
Explicit Convergence Rate of The Proximal Point Algorithm under R-Continuity

Ba Khiet Le, Optimization Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam. E-mail: [email protected]
Michel Théra, Mathematics and Computer Science Department, University of Limoges, 123 Avenue Albert Thomas, 87060 Limoges CEDEX, France, and School of Engineering, IT and Physical Sciences, Federation University, Ballarat, Victoria, 3350, Australia. E-mail: [email protected]
Abstract

The paper provides a thorough comparison between R-continuity and other fundamental tools in optimization such as metric regularity, metric subregularity and calmness. We show that R-continuity has some advantages in the convergence rate analysis of algorithms solving optimization problems. We also present some properties of R-continuity and study the explicit convergence rate of the Proximal Point Algorithm ($\mathbf{PPA}$) under R-continuity.

Keywords. R-continuity, metric regularity, metric subregularity, calmness, Proximal Point Algorithm, convergence rate

AMS Subject Classification. 28B05, 34A36, 34A60, 49J52, 49J53, 93D20

1 Introduction

In what follows, $\mathbb{X}$ and $\mathbb{Y}$ are real Banach spaces whose norms are designated by $\|\cdot\|$. We use the notation $\mathbb{B}(x,r)$ to denote the closed ball with center $x$ and radius $r>0$ and by $\mathbb{B}$ the closed unit ball, which consists of the elements of norm less than or equal to $1$. By a set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$, we mean a mapping which assigns to each $x\in\mathbb{X}$ a subset $\mathcal{A}(x)$ (possibly empty) of $\mathbb{Y}$. The domain, the range and the graph of $\mathcal{A}$ are defined respectively by

\[
{\rm dom}\,\mathcal{A}=\{x\in\mathbb{X}:\mathcal{A}(x)\neq\emptyset\},\qquad{\rm rge}\,\mathcal{A}=\bigcup_{x\in\mathbb{X}}\mathcal{A}(x),
\]

and

\[
{\rm Graph\,}\mathcal{A}=\{(x,y)\in\mathbb{X}\times\mathbb{Y}\;\text{such that}\;y\in\mathcal{A}(x)\}.
\]

As usual we denote by $\mathcal{A}^{-1}:\mathbb{Y}\rightrightarrows\mathbb{X}$ the inverse of $\mathcal{A}$ defined by

\[
x\in\mathcal{A}^{-1}(y)\;\iff\;y\in\mathcal{A}(x).
\]

The notation $\mathbf{d}(x,\mathbf{S})$ stands for the distance from a point $x\in\mathbb{X}$ to a subset $\mathbf{S}\subset\mathbb{X}$:

\[
\mathbf{d}(x,\mathbf{S}):=\inf_{y\in\mathbf{S}}\|x-y\|.
\]

Given a set-valued mapping $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ defined in a Hilbert space $\mathcal{H}$, and inspired by Rockafellar’s paper [25], recently B. K. Le [17] introduced the notion of R-continuity for studying the convergence rate of the Tikhonov regularization of the inclusion

\[
0\in\mathcal{A}(x). \tag{1}
\]

In [17], it was proved that R-continuity is a useful tool to analyze the convergence of $\mathbf{DCA}$ (Difference of Convex Algorithm) and can explain why $\mathbf{DCA}$ is effective in approximating solutions for a broad class of functions. In recent decades, there has been a surge of interest in variational inclusions such as (1), since they model a variety of important systems. This is the case especially in optimization when the condition for critical points is considered (the Fermat rule). Another connection with the inclusion (1) arises in PDEs and is well discussed in the books [6, 9]. The fact that the solution set $\mathbf{S}:=\mathcal{A}^{-1}(0)$ of (1) involves the inverse operator of $\mathcal{A}$ leads naturally to the study of the continuity of $\mathcal{A}^{-1}$ at zero. According to [17], the mapping $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ is said to be R-continuous at zero if there exist a radius $\sigma>0$ and a non-decreasing modulus function $\rho:\mathbb{R}^{+}\to\mathbb{R}^{+}$ satisfying $\lim_{r\to 0^{+}}\rho(r)=\rho(0)=0$ such that

\[
\mathcal{A}(x)\subset\mathcal{A}(0)+\rho(\|x\|)\mathbb{B},\;\text{ for every }x\in\sigma\mathbb{B}. \tag{2}
\]

Denoting by $\mathbf{ex}(C,D):=\sup_{x\in C}\mathbf{d}(x,D)$ the excess of the set $C$ over the set $D$, with the convention that $\mathbf{ex}(\emptyset,D)=0$ when $D$ is nonempty and $\mathbf{ex}(C,\emptyset)=+\infty$ for any $C$, (2) is equivalent to saying that

\[
\mathbf{ex}(\mathcal{A}(x),\mathcal{A}(0))\leq\rho(\|x\|)\;\text{ whenever }\;\|x\|\leq\sigma. \tag{3}
\]

When $\rho(r):=Lr$ for some $L>0$, (3) becomes

\[
\mathbf{ex}(\mathcal{A}(x),\mathcal{A}(0))\leq L\|x\|\;\text{ whenever }\;\|x\|\leq\sigma. \tag{4}
\]

In this case, $\mathcal{A}$ is referred to as R-Lipschitz continuous at zero or, equivalently, upper Lipschitz at zero in the sense of Robinson [24] or outer Lipschitz continuous at zero [5]. In addition, if $\mathcal{A}(0)$ is a singleton, R-Lipschitz continuity at zero of $\mathcal{A}$ is exactly the Lipschitz continuity at zero introduced by R. T. Rockafellar in [25], who considered Lipschitz continuity at zero of set-valued mappings as an important tool in optimization. Rockafellar demonstrated the linear convergence rate of the proximal point algorithm after a finite number of iterations from any starting point $x_{0}$. However, the requirement that the solution set is a singleton is often quite restrictive in practice. Thus, allowing the solution set $\mathbf{S}$ to be set-valued and not requiring the continuity modulus function $\rho$ to be of Lipschitz type makes R-continuity a competitive alternative. It provides a viable option alongside other fundamental concepts such as metric regularity, metric subregularity or calmness in the study of sensitivity analysis as well as in establishing the convergence rate of algorithms. Note that R-continuity can also be extended to Banach spaces. In order to compare these regularity concepts, let us recall the definitions of metric regularity, metric subregularity and calmness (see, e.g., [14, 15, 12, 30, 31, 10, 16, 26, 11, 21, 22, 28] and the references therein). Given $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$, where $\mathbb{X},\mathbb{Y}$ are real Banach spaces, and $(\bar{x},\bar{y})\in{\rm Graph\,}\mathcal{A}$, one says that $\mathcal{A}$ is metrically regular near $(\bar{x},\bar{y})$ if a linear error bound holds:

\[
\mathbf{d}(x,\mathcal{A}^{-1}(y))\leq\kappa\,\mathbf{d}(y,\mathcal{A}(x)) \tag{5}
\]

for some $\kappa>0$ and for all $(x,y)$ close to $(\bar{x},\bar{y})$. If (5) is satisfied for all $(x,y)\in\mathbb{X}\times\mathbb{Y}$, then $\mathcal{A}$ is termed globally metrically regular. When $(\bar{x},0)\in{\rm Graph\,}\mathcal{A}$, inequality (5) provides an estimate of how far a point $x$ around $\bar{x}$ is from the solution set $\mathcal{A}^{-1}(0)$. However, metric regularity can be too stringent as it requires the inequality (5) to hold in a neighborhood of $(\bar{x},\bar{y})$. One says that $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is calm at $(\bar{x},\bar{y})\in{\rm Graph\,}\mathcal{A}$ if there exist $\kappa>0,\epsilon>0,\sigma>0$ such that

\[
\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon)\subset\mathcal{A}(\bar{x})+\kappa\|x-\bar{x}\|\mathbb{B},\;\;\text{for all}\;x\in\mathbb{B}(\bar{x},\sigma), \tag{6}
\]

or equivalently

\[
\mathbf{ex}(\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon),\mathcal{A}(\bar{x}))\leq\kappa\|x-\bar{x}\|,\;\;\text{for all}\;x\in\mathbb{B}(\bar{x},\sigma). \tag{7}
\]

Note that when the set $\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon)$ is empty, the inequality (7) is always satisfied. When $\bar{x}=0$, (7) becomes

\[
\mathbf{ex}(\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon),\mathcal{A}(0))\leq\kappa\|x\|,\;\;\text{whenever}\;\|x\|\leq\sigma. \tag{8}
\]

Another well-known regularity concept is metric subregularity. The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called metrically subregular at $(\bar{x},0)\in{\rm Graph\,}\mathcal{A}$ if there exists a constant $\kappa>0$ such that

\[
\mathbf{d}(x,\mathcal{A}^{-1}(0))\leq\kappa\,\mathbf{d}(0,\mathcal{A}(x)),\;\;\text{ for all }\;x\;\text{close to}\;\bar{x}. \tag{9}
\]

It is known (see, e.g., [14, Proposition 2.62]) that metric subregularity of $\mathcal{A}$ at $(\bar{x},0)$ is equivalent to calmness of $\mathcal{A}^{-1}$ at $(0,\bar{x})$. From (3) and (8), let us observe that if $\mathcal{A}$ is R-Lipschitz continuous at $0$ then it is calm at $(0,\bar{y})$ for any $\bar{y}\in\mathcal{A}(0)$. While metric regularity is too strong, calmness is relatively too weak to deduce useful properties. For example, when the set $\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon)$ is empty for some $x\in\epsilon\mathbb{B}$, no information can be deduced. In addition, when $\mathcal{A}$ is complicated, we do not know a specific $\bar{y}$ and have to use computers to find an approximation of an element of $\mathcal{A}(0)$. The first advantage of R-continuity is that it is unnecessary to know a particular $\bar{y}\in\mathcal{A}(0)$ in advance. Furthermore, in (2), since $\|x\|$ is small, for each $y\in\mathcal{A}(x)$ there exists some $\tilde{y}\in\mathcal{A}(0)$ close to $y$. This fact is meaningful since it is difficult to ensure that $y$ lies in the vicinity of $\bar{y}$, which is unknown. Secondly, R-continuity is straightforward and always guarantees the system consistency in Hoffman’s sense [13]. Indeed, Theorem 7 establishes that under the R-continuity of $\mathcal{A}^{-1}$ at zero, the inclusion (1) is consistent, i.e., if $y_{\sigma}$ has a small norm and satisfies $y_{\sigma}\in\mathcal{A}(x_{\sigma})$, then we can find a solution $\bar{x}\in\mathbf{S}$ such that $x_{\sigma}$ is close to $\bar{x}$. Note that to guarantee the consistency property, metric subregularity or metric regularity must be satisfied globally. Thirdly, R-continuity does not require the property (2) to hold around a point belonging to the graph of the operator; it is easily verified for a broad class of set-valued mappings. In order to achieve metric regularity at some point $(\bar{x},\bar{y})\in{\rm Graph\,}\mathcal{A}$, the celebrated Robinson-Ursescu Theorem [23, 29] requires the operator $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ to have a closed and convex graph and also that $\bar{y}\in{\rm int}(\mathcal{A}(\mathbb{X}))$ (the interior of $\mathcal{A}(\mathbb{X})$). R-continuity only necessitates that the graph of the operator be closed. Indeed, if the operator has a closed graph at zero and is locally compact at zero, then it is R-continuous at zero (Theorem 5). Conversely, if the operator is R-continuous at zero and its value at zero is closed, then it has a closed graph at zero (Theorem 6). The condition of having a graph closed at zero is relatively mild. If an operator’s graph is closed, then it has a closed graph at zero as well. Furthermore, an operator has a closed graph if and only if its inverse has a closed graph. Closed graph operators are usually found in optimization when dealing with continuous single-valued mappings, maximally monotone operators (see, e.g., [8]), the sum of two closed graph operators where one of them is single-valued, or the sum of two closed graph set-valued operators where one of them is locally compact (Proposition 3). The Sign function in ${\mathbb{R}}^{n}$, which equals the convex subdifferential of the norm function, is a well-known example of an operator with compact range and is widely used in image processing, mechanical and electrical engineering (see, e.g., [2, 20]).

Finally, we demonstrate that R-continuity can be used to analyse the explicit convergence rate of the proximal point algorithm ($\mathbf{PPA}$) when applied to a maximally monotone operator $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$, where $\mathcal{H}$ is a Hilbert space. The $\mathbf{PPA}$, introduced by B. Martinet and further developed by Rockafellar, Bauschke and Combettes and others (see, e.g., [7, 18, 25]), is an essential tool in convex optimization if $\mathcal{A}$ is set-valued and lacks special structure. We show that if $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero (i.e., $\sigma=+\infty$), the convergence of the proximal point algorithm is linear from the beginning (Theorem 11). When considering the case where $\mathcal{A}=\partial f$, i.e. when $\mathcal{A}$ is the convex subdifferential of a proper lower semicontinuous, not necessarily smooth, extended real-valued convex function $f:\mathcal{H}\to{\mathbb{R}}\cup\{+\infty\}$, if $\mathcal{A}^{-1}$ is only R-continuous, we obtain an explicit convergence rate of $\mathbf{PPA}$ based on the modulus function. If $(\partial f)^{-1}$ is R-Lipschitz continuous at zero, one has the linear convergence of the generated sequence $(x_{n})$ after some iterations (Theorem 14).

The paper is organized as follows. First we review the necessary material about R-continuity of set-valued mappings and maximally monotone operators in Section 2. Some key properties of R-continuity are established in Section 3. The analysis of the convergence rate of $\mathbf{PPA}$ under R-continuity is presented in Section 4. The paper ends in Section 5 with some conclusions and perspectives.

2 Mathematical preliminaries

In what follows, we extend the notion of R-continuity introduced in [17] from Hilbert spaces to Banach spaces, defined for any point in the domain of the operators. Let $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ be a set-valued mapping where $\mathbb{X},\mathbb{Y}$ are Banach spaces and $\bar{x}\in{\rm dom}\,\mathcal{A}$.

Definition 1.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called R-continuous at $\bar{x}$ if there exist $\sigma>0$ and a non-decreasing function $\rho:\mathbb{R}^{+}\to\mathbb{R}^{+}$ satisfying $\lim_{r\to 0^{+}}\rho(r)=\rho(0)=0$ such that

\[
\mathcal{A}(x)\subset\mathcal{A}(\bar{x})+\rho(\|x-\bar{x}\|)\mathbb{B},\;\forall x\in\mathbb{B}(\bar{x},\sigma), \tag{10}
\]

or equivalently, for each $y\in\mathcal{A}(x)$ with $x\in\mathbb{B}(\bar{x},\sigma)$, there exists $\bar{y}\in\mathcal{A}(\bar{x})$ such that $\|y-\bar{y}\|\leq\rho(\|x-\bar{x}\|)$.

The function $\rho$ is called a continuity modulus function of $\mathcal{A}$ at $\bar{x}$ and $\sigma$ is called the radius. We say that $\mathcal{A}$ is R-Lipschitz continuous at $\bar{x}$ with modulus $L$ if $\rho(r)=Lr$ for some $L>0$. In addition, if $\sigma=\infty$ then $\mathcal{A}$ is said to be globally R-Lipschitz continuous at $\bar{x}$.

Remark 1.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is R-continuous at $\bar{x}$ with modulus function $\rho$ and radius $\sigma$ iff $-\mathcal{A}$ is R-continuous at $\bar{x}$ with modulus function $\rho$ and radius $\sigma$.

Definition 2.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called R-continuous if it is R-continuous at any point of its domain. It is called R-Lipschitz continuous if it is R-Lipschitz continuous at any point of its domain. In addition, if the radius $\sigma=+\infty$, then it is called globally R-Lipschitz continuous.

Proposition 1.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is globally metrically regular then $\mathcal{A}^{-1}:\mathbb{Y}\rightrightarrows\mathbb{X}$ is globally R-Lipschitz continuous.

Proof.

Let $y,\bar{y}\in{\rm dom}\,\mathcal{A}^{-1}$ and $x\in\mathcal{A}^{-1}(y)$ be given. We have $y\in\mathcal{A}(x)$ and, since $\mathcal{A}$ is globally metrically regular, one has

\[
\mathbf{d}(x,\mathcal{A}^{-1}(\bar{y}))\leq\kappa\,\mathbf{d}(\bar{y},\mathcal{A}(x))\leq\kappa\|y-\bar{y}\|, \tag{11}
\]

for some $\kappa>0$. Since $x\in\mathcal{A}^{-1}(y)$ is arbitrary, we deduce that

\[
\mathbf{ex}(\mathcal{A}^{-1}(y),\mathcal{A}^{-1}(\bar{y}))\leq\kappa\|y-\bar{y}\|
\]

and the conclusion follows. ∎

The following simple example shows that metric regularity is strictly stronger than R-Lipschitz continuity. This important fact means that we can still obtain the convergence rate of optimization algorithms when only R-Lipschitz continuity or even R-continuity holds (see, e.g., [17]).

Example 1.

Let $\mathcal{A}:{\mathbb{R}}\rightrightarrows{\mathbb{R}}$ be defined by

\[
\mathcal{A}(x):=\begin{cases}[0,\infty)&\text{if }x=1,\\ \{0\}&\text{if }-1<x<1,\\ (-\infty,0]&\text{if }x=-1.\end{cases}
\]

Then $\mathcal{A}^{-1}(y)=\mathbf{Sign}(y)$ is R-Lipschitz continuous for any positive modulus. However, if $y_{n}<0$ and $y_{n}\to 0$ then

\[
\mathbf{d}(1,\mathcal{A}^{-1}(y_{n}))=2,\qquad\mathbf{d}(y_{n},\mathcal{A}(1))=\mathbf{d}(y_{n},[0,\infty))=|y_{n}|\to 0.
\]

Thus $\mathcal{A}$ is not metrically regular at $(1,0)$.
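To make the comparison concrete, here is a small numerical sketch (ours, not part of the original example, with helper names of our own choosing) for the operator of Example 1: it checks that the excess $\mathbf{ex}(\mathcal{A}^{-1}(y),\mathcal{A}^{-1}(0))$ stays equal to zero for $y$ near $0$ (so any modulus $L>0$ works), while the metric-regularity ratio $\mathbf{d}(1,\mathcal{A}^{-1}(y_{n}))/\mathbf{d}(y_{n},\mathcal{A}(1))$ blows up as $y_{n}\to 0^{-}$.

```python
import numpy as np

def sign_set(y):
    """A^{-1}(y) = Sign(y), represented as a closed interval (lo, hi)."""
    if y > 0:
        return (1.0, 1.0)
    if y < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def dist_to_interval(p, iv):
    lo, hi = iv
    return max(lo - p, p - hi, 0.0)

def excess(iv_c, iv_d):
    """ex(C, D) for intervals: the farthest point of C from D (checking endpoints suffices)."""
    lo, hi = iv_c
    return max(dist_to_interval(lo, iv_d), dist_to_interval(hi, iv_d))

for y in [-1e-1, -1e-3, -1e-6]:
    ex_val = excess(sign_set(y), sign_set(0.0))          # R-Lipschitz side: stays 0
    ratio = dist_to_interval(1.0, sign_set(y)) / abs(y)  # d(1, A^{-1}(y)) / d(y, A(1)) blows up
    print(f"y={y:>9.1e}  ex(A^-1(y),A^-1(0))={ex_val:.1f}  d(1,A^-1(y))/d(y,A(1))={ratio:.2e}")
```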

In the text that follows, we consider set-valued mappings with closed graph.

Definition 3.

We say that $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph if $y_{n}\in\mathcal{A}(x_{n})$, $y_{n}\to y$ and $x_{n}\to x$ imply $y\in\mathcal{A}(x)$. We say that $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph at zero if $y_{n}\in\mathcal{A}(x_{n})$, $x_{n}\to 0$ and $y_{n}\to y$ imply $y\in\mathcal{A}(0)$.

Proposition 2.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph if and only if $\mathcal{A}^{-1}:\mathbb{Y}\rightrightarrows\mathbb{X}$ has a closed graph.

Proof.

It follows directly from the definition of the inverse set-valued mapping $\mathcal{A}^{-1}$. ∎

Proposition 3.

Suppose that $\mathcal{A}=\mathcal{A}_{1}+\mathcal{A}_{2}$ where $\mathcal{A}_{1},\mathcal{A}_{2}:\mathbb{X}\rightrightarrows\mathbb{Y}$ have closed graphs.
a) If $\mathcal{A}_{2}$ is single-valued, then $\mathcal{A}$ has a closed graph;
b) If $\mathcal{A}_{2}$ is locally compact, i.e., for each $\bar{x}\in\mathbb{X}$ there exists $\epsilon>0$ such that the set $\bigcup_{x\in\mathbb{B}(\bar{x},\epsilon)}\mathcal{A}_{2}(x)$ is compact, then $\mathcal{A}$ has a closed graph.

Proof.

a) Let $y_{n}\in\mathcal{A}(x_{n})=\mathcal{A}_{1}(x_{n})+\mathcal{A}_{2}(x_{n})$, $y_{n}\to y$ and $x_{n}\to x$. We have $y_{n}=z_{n}+\mathcal{A}_{2}(x_{n})$ for some $z_{n}\in\mathcal{A}_{1}(x_{n})$. Since $z_{n}=y_{n}-\mathcal{A}_{2}(x_{n})\to y-\mathcal{A}_{2}(x)$ and $\mathcal{A}_{1}$ has a closed graph, we deduce that $y-\mathcal{A}_{2}(x)\in\mathcal{A}_{1}(x)$ and the conclusion follows.
b) Similarly, let $y_{n}\in\mathcal{A}(x_{n})=\mathcal{A}_{1}(x_{n})+\mathcal{A}_{2}(x_{n})$, $y_{n}\to y$ and $x_{n}\to x$. We have $y_{n}=z_{n}+v_{n}$ for some $z_{n}\in\mathcal{A}_{1}(x_{n})$ and $v_{n}\in\mathcal{A}_{2}(x_{n})$. Since $\mathcal{A}_{2}$ is locally compact, there exists a subsequence of $(v_{n})$, not relabelled, converging to some $v\in\mathcal{A}_{2}(x)$. Thus $z_{n}=y_{n}-v_{n}$ converges to $y-v\in\mathcal{A}_{1}(x)$. Therefore $y\in\mathcal{A}_{1}(x)+v\subset\mathcal{A}_{1}(x)+\mathcal{A}_{2}(x)$. ∎

Finally, we recall some useful properties of monotone operators. Let $\mathcal{H}$ be a Hilbert space. A set-valued mapping $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ is called monotone provided

\[
\langle\bar{x}-\bar{y},x-y\rangle\geq 0\;\;\forall\;x,y\in\mathcal{H},\;\bar{x}\in\mathcal{A}(x)\;\text{and}\;\bar{y}\in\mathcal{A}(y).
\]

In addition, it is called maximally monotone if there is no monotone operator $\mathcal{B}$ such that the graph of $\mathcal{A}$ is strictly included in the graph of $\mathcal{B}$. The mapping $\mathcal{A}$ is called $\gamma$-strongly monotone if

\[
\langle\bar{x}-\bar{y},x-y\rangle\geq\gamma\|x-y\|^{2}\;\;\forall\;x,y\in\mathcal{H},\;\bar{x}\in\mathcal{A}(x)\;\text{and}\;\bar{y}\in\mathcal{A}(y).
\]

Note that such operators have been studied extensively because of their role in convex analysis and certain partial differential equations. The resolvent of a maximally monotone operator $\mathcal{A}$ is defined by $\mathbf{J}_{\mathcal{A}}:=(\mathbf{Id}+\mathcal{A})^{-1}$. It is well-known that resolvents are single-valued and non-expansive (see, e.g., [19]).
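As a concrete illustration (ours, not from the paper), take $\mathcal{A}=\partial|\cdot|$ on $\mathcal{H}={\mathbb{R}}$: the resolvent $\mathbf{J}_{\gamma\mathcal{A}}$ is the classical soft-thresholding map $v\mapsto\mathrm{sign}(v)\max(|v|-\gamma,0)$. The sketch below (the helper name `resolvent_abs` is ours) checks the non-expansiveness numerically on sampled pairs.

```python
import numpy as np

def resolvent_abs(v, gamma):
    """J_{gamma*A}(v) for A = subdifferential of |.|: the soft-thresholding map,
    i.e. the unique x solving v in x + gamma*Sign(x)."""
    return np.sign(v) * max(abs(v) - gamma, 0.0)

gamma = 0.5
rng = np.random.default_rng(0)
pts = rng.uniform(-3, 3, size=1000)
vals = np.array([resolvent_abs(v, gamma) for v in pts])

# Non-expansiveness: |J(v) - J(w)| <= |v - w| for all sampled consecutive pairs.
assert np.all(np.abs(vals[:-1] - vals[1:]) <= np.abs(pts[:-1] - pts[1:]) + 1e-12)
print("non-expansiveness of the resolvent verified on", len(pts) - 1, "sampled pairs")
```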

A set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called $\gamma$-coercive with modulus $\gamma>0$ if for all $x,y\in{\rm dom}\,\mathcal{A}$ and for all $\bar{x}\in\mathcal{A}(x)$, $\bar{y}\in\mathcal{A}(y)$, we have

\[
\|\bar{x}-\bar{y}\|\geq\gamma\|x-y\|. \tag{12}
\]

In order to simplify the writing we will use the notation: for all $x,y\in{\rm dom}\,\mathcal{A}$, we have

\[
\|\mathcal{A}(x)-\mathcal{A}(y)\|\geq\gamma\|x-y\|,
\]

to describe this property. It is easy to see that a $\gamma$-strongly monotone operator is $\gamma$-coercive. Another example is given by matrices with full column rank (see, e.g., [17]). Next we extend to a pair of set-valued mappings the notion of monotonicity of a pair of single-valued operators introduced in Hilbert spaces by Adly-Cojocaru-Le [4].

Definition 4.

Let two set-valued mappings $\mathcal{B},\mathcal{C}:\mathbb{X}\rightrightarrows\mathbb{Y}$ be given. The pair $(\mathcal{B},\mathcal{C})$ is called monotone if for all $x,y\in\mathbb{X}$, $x_{b}\in\mathcal{B}(x)$, $y_{b}\in\mathcal{B}(y)$, $x_{c}\in\mathcal{C}(x)$, $y_{c}\in\mathcal{C}(y)$, we have

\[
\|(x_{b}-y_{b})+(x_{c}-y_{c})\|^{2}\geq\|x_{b}-y_{b}\|^{2}+\|x_{c}-y_{c}\|^{2}. \tag{13}
\]

Using the notation introduced above, (13) can be written as

\[
\|\mathcal{B}(x)-\mathcal{B}(y)+\mathcal{C}(x)-\mathcal{C}(y)\|^{2}\geq\|\mathcal{B}(x)-\mathcal{B}(y)\|^{2}+\|\mathcal{C}(x)-\mathcal{C}(y)\|^{2}.
\]

Note that when $\mathbb{Y}=\mathcal{H}$ is a Hilbert space, the monotonicity of $(\mathcal{B},\mathcal{C})$ is equivalent to

\[
\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle\geq 0,\;\;\forall\;x,y\in\mathcal{H}.
\]

In addition, we say $(\mathcal{B},\mathcal{C})$ is $\gamma$-strongly monotone $(\gamma>0)$ if

\[
\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle\geq\gamma\|x-y\|^{2},\;\;\forall\;x,y\in\mathcal{H}.
\]
Remark 2.
  1. In Hilbert spaces, the monotonicity of $(\mathcal{B},\mathcal{C})$ means that the increments of $\mathcal{B}$ and $\mathcal{C}$ do not form an obtuse angle.

  2. If $(\mathcal{B},\mathcal{C})$ is monotone then for all $x,y\in\mathbb{X}$, one has
\[
\|\mathcal{B}(x)-\mathcal{B}(y)+\mathcal{C}(x)-\mathcal{C}(y)\|\geq\|\mathcal{B}(x)-\mathcal{B}(y)\|.
\]

  3. If $\mathcal{B}$ is monotone then the pair $(\mathcal{B},\mathbf{Id})$ is monotone.

  4. The strong monotonicity of the pair $(\mathcal{B},\mathcal{C})$ is important for the linear convergence of optimization algorithms solving the inclusion $0\in\mathcal{B}(x)$ [4]. When $\mathcal{B}$ fails to be monotone, with some suitable choice of $\mathcal{C}$, the pair $(\mathcal{B},\mathcal{C})$ becomes strongly monotone, as the following example shows.

Example 2.

Let $\mathcal{B},\mathcal{C}:{\mathbb{R}}^{2}\rightrightarrows{\mathbb{R}}^{2}$ be defined by

\[
\mathcal{B}(x_{1},x_{2})=\begin{pmatrix}\mathbf{Sign}(x_{2})+3x_{2}+\sin|x_{1}|\\ \mathbf{Sign}(x_{1})+3x_{1}+\cos|x_{2}|\end{pmatrix},\qquad\mathcal{C}(x_{1},x_{2})=\begin{pmatrix}3x_{2}\\ 3x_{1}\end{pmatrix}
\]

where

\[
\mathbf{Sign}(a)=\begin{cases}1&\text{if }a>0,\\ [-1,1]&\text{if }a=0,\\ -1&\text{if }a<0.\end{cases}
\]

Then $\mathcal{B},\mathcal{C}$ are not monotone but $(\mathcal{B},\mathcal{C})$ is $6$-strongly monotone. Indeed, for all $x=(x_{1},x_{2})$ and $y=(y_{1},y_{2})$, we have

\[
\begin{aligned}
\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle
&\geq 9(x_{2}-y_{2})^{2}+3(\sin|x_{1}|-\sin|y_{1}|)(x_{2}-y_{2})+9(x_{1}-y_{1})^{2}+3(\cos|x_{2}|-\cos|y_{2}|)(x_{1}-y_{1})\\
&\geq 6\|x-y\|^{2}.
\end{aligned}
\]
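A quick randomized sanity check of the displayed inequality (a sketch of ours, not part of the paper): we sample pairs of points, use an arbitrary selection of $\mathbf{Sign}$ at zero, and verify $\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle\geq 6\|x-y\|^{2}$.

```python
import numpy as np

def sign_sel(a, rng):
    """A selection of Sign: any value in [-1, 1] is allowed at a = 0."""
    return np.sign(a) if a != 0.0 else rng.uniform(-1.0, 1.0)

def B(x, rng):
    x1, x2 = x
    return np.array([sign_sel(x2, rng) + 3*x2 + np.sin(abs(x1)),
                     sign_sel(x1, rng) + 3*x1 + np.cos(abs(x2))])

def C(x):
    x1, x2 = x
    return np.array([3*x2, 3*x1])

rng = np.random.default_rng(1)
for _ in range(10000):
    x, y = rng.uniform(-5, 5, 2), rng.uniform(-5, 5, 2)
    lhs = np.dot(B(x, rng) - B(y, rng), C(x) - C(y))
    assert lhs >= 6*np.dot(x - y, x - y) - 1e-6   # 6-strong monotonicity of the pair (B, C)
print("6-strong monotonicity of (B, C) verified on 10000 random pairs")
```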

3 Properties of R-Continuity

First we show that the sum (and also the difference) of two R-continuous mappings is also R-continuous.

Theorem 4.

Suppose that $\mathcal{A}_{1},\mathcal{A}_{2}:\mathbb{X}\rightrightarrows\mathbb{Y}$ are R-continuous at $\bar{x}$. Then $\mathcal{A}_{1}+\mathcal{A}_{2}$ is also R-continuous at $\bar{x}$.

Proof.

Let $\rho_{1},\rho_{2}$ and $\sigma_{1},\sigma_{2}$ be the modulus functions and radii of $\mathcal{A}_{1}$ and $\mathcal{A}_{2}$, respectively. We set $\sigma:=\min\{\sigma_{1},\sigma_{2}\}$ and $\rho(r):=\max\{\rho_{1}(r),\rho_{2}(r)\}$ for all $r\geq 0$; then $\rho$ is non-decreasing and $\lim_{r\to 0^{+}}\rho(r)=\rho(0)=0$. Take $x\in\mathbb{B}(\bar{x},\sigma)$ and $y\in\mathcal{A}_{1}(x)+\mathcal{A}_{2}(x)$; then $y=y_{1}+y_{2}$ where $y_{1}\in\mathcal{A}_{1}(x)$ and $y_{2}\in\mathcal{A}_{2}(x)$. Since $\mathcal{A}_{1},\mathcal{A}_{2}$ are R-continuous at $\bar{x}$, there exist $\bar{y}_{1}\in\mathcal{A}_{1}(\bar{x})$ and $\bar{y}_{2}\in\mathcal{A}_{2}(\bar{x})$ such that $\|y_{1}-\bar{y}_{1}\|\leq\rho_{1}(\|x-\bar{x}\|)$ and $\|y_{2}-\bar{y}_{2}\|\leq\rho_{2}(\|x-\bar{x}\|)$. Let $\bar{y}=\bar{y}_{1}+\bar{y}_{2}\in\mathcal{A}_{1}(\bar{x})+\mathcal{A}_{2}(\bar{x})$; then

\[
\|y-\bar{y}\|\leq\|y_{1}-\bar{y}_{1}\|+\|y_{2}-\bar{y}_{2}\|\leq 2\rho(\|x-\bar{x}\|). \tag{14}
\]

This means that $\mathcal{A}_{1}+\mathcal{A}_{2}$ is R-continuous at $\bar{x}$ with modulus function $2\rho$ and radius $\sigma$. ∎

Next we show that R-continuity is satisfied for a large class of operators and is closely connected with the closed graph property at zero.

Theorem 5.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph at zero and is locally compact at zero, then $\mathcal{A}$ is R-continuous at zero.

Proof.

We define the function $\rho$ as follows: set $\rho(0)=0$ and, for $\sigma>0$, set

\[
\rho(\sigma)=\inf\{\delta>0:\mathcal{A}(x)\subset\mathcal{A}(0)+\delta\mathbb{B},\;\;\forall x\in\sigma\mathbb{B}\}.
\]

It is easy to see that $\rho$ is well-defined and non-decreasing because $\mathcal{A}$ is locally compact at zero. Since $\rho$ is non-decreasing and bounded from below by $0$, $\lim_{\sigma\to 0^{+}}\rho(\sigma)$ exists. If we suppose that $\mathcal{A}$ is not R-continuous at $0$, then we must have $\lim_{\sigma\to 0^{+}}\rho(\sigma)=\delta^{*}>0$. Hence there exist two sequences $(x_{n})$, $(y_{n})$ such that $x_{n}\to 0$, $y_{n}\in\mathcal{A}(x_{n})$ and

\[
y_{n}\notin\mathcal{A}(0)+\frac{\delta^{*}}{2}\mathbb{B}. \tag{15}
\]

Since $\mathcal{A}$ is locally compact at zero, on relabeling if necessary, we may suppose that $(y_{n})$ converges to some $\bar{y}$, and $\bar{y}\in\mathcal{A}(0)$ because $\mathcal{A}$ has a closed graph at zero. This contradicts (15) and the proof is complete. ∎

Theorem 6.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is R-continuous at zero and $\mathcal{A}(0)$ is closed, then $\mathcal{A}$ has a closed graph at zero.

Proof.

Suppose that $y_{n}\to y$, $x_{n}\to 0$ and $y_{n}\in\mathcal{A}(x_{n})$. From the R-continuity of $\mathcal{A}$, we have

\[
y_{n}\in\mathcal{A}(x_{n})\subset\mathcal{A}(0)+\rho(\|x_{n}\|)\mathbb{B}.
\]

Since $y_{n}\to y$, $\rho(\|x_{n}\|)\to 0$ and $\mathcal{A}(0)$ is closed, we deduce that $y\in\mathcal{A}(0)$ and thus $\mathcal{A}$ has a closed graph at zero. ∎

Although R-continuity is straightforward, it always guarantees the consistency of the system in Hoffman’s sense [13].

Theorem 7.

Let $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ be a set-valued mapping. If $\mathcal{A}^{-1}$ is R-continuous at zero then the inclusion $0\in\mathcal{A}(x)$ is consistent in the sense of Hoffman, i.e., if $y_{\sigma}$ has a small norm and satisfies $y_{\sigma}\in\mathcal{A}(x_{\sigma})$, then there exists a solution $\bar{x}\in\mathbf{S}:=\mathcal{A}^{-1}(0)$ such that $x_{\sigma}$ is close to $\bar{x}$.

Proof.

Suppose that $\mathcal{A}^{-1}$ is R-continuous at zero with modulus function $\rho$ and radius $\sigma$. Let $y_{\sigma}\in\mathcal{A}(x_{\sigma})$ with $\|y_{\sigma}\|\leq\sigma$. We claim that we can find some $\bar{x}$ such that $0\in\mathcal{A}(\bar{x})$ and $\|x_{\sigma}-\bar{x}\|$ is also small. Indeed, we have

\[
x_{\sigma}\in\mathcal{A}^{-1}(y_{\sigma})\subset\mathcal{A}^{-1}(0)+\rho(\sigma)\mathbb{B}=\mathbf{S}+\rho(\sigma)\mathbb{B},
\]

which implies that

\[
\mathbf{d}(x_{\sigma},\mathbf{S})\leq\rho(\sigma)
\]

and the conclusion follows. ∎

Remark 3.

Let us note that to obtain the consistency property, metric subregularity or metric regularity has to be satisfied globally. Indeed, suppose that metric subregularity (9) holds and $\mathbf{d}(0,\mathcal{A}(x))$ is small but $x$ is not close to $\bar{x}$; then we cannot conclude that $x$ is close to a solution.

In optimization, when we consider the inclusion $0\in\mathcal{A}(x)$, we want to know whether $\mathcal{A}^{-1}$ is R-continuous or even R-Lipschitz continuous. Here we provide some cases where the global R-Lipschitz continuity is satisfied. It is known that the inverse of any square matrix is globally R-Lipschitz continuous [17, Proposition 2.10]. Now we prove that the inverse of any matrix is globally R-Lipschitz continuous. Note that this result cannot be deduced from the Robinson-Ursescu Theorem. Finally we give some nontrivial nonlinear examples (see also Proposition 1 and [5, Theorem 7]).

Theorem 8.

If $\mathbf{A}\in{\mathbb{R}}^{m\times n}$ is a matrix, then $\mathbf{A}^{-1}$ is globally R-Lipschitz continuous.

Proof.

Let $\bar{y}\in{\rm dom}\,\mathbf{A}^{-1}$ be given. For all $y\in{\rm dom}\,\mathbf{A}^{-1}$ and $x\in\mathbf{A}^{-1}(y)$, we want to find $\bar{x}\in\mathbf{A}^{-1}(\bar{y})$ such that

\[
\|x-\bar{x}\|\leq\kappa\|y-\bar{y}\|,
\]

for some $\kappa>0$. First we take some $x^{\prime}\in\mathbf{A}^{-1}(\bar{y})$. We have $y=\mathbf{A}x$ and $\bar{y}=\mathbf{A}x^{\prime}$. Note that $\mathbf{A}^{T}\mathbf{A}\in{\mathbb{R}}^{n\times n}$ is a positive semi-definite matrix. Let $x=x_{i}+x_{k}$ where $x_{i},x_{k}$ are the projections of $x$ onto ${\rm Im}(\mathbf{A}^{T}\mathbf{A})$ and ${\rm Ker}(\mathbf{A}^{T}\mathbf{A})={\rm Ker}(\mathbf{A})$, respectively. Similarly we decompose $x^{\prime}=x^{\prime}_{i}+x^{\prime}_{k}$. Then we have

\[
\|\mathbf{A}^{T}\|\|y-\bar{y}\|\geq\|\mathbf{A}^{T}(y-\bar{y})\|=\|\mathbf{A}^{T}\mathbf{A}(x_{i}-x_{i}^{\prime})\|\geq k\|x_{i}-x_{i}^{\prime}\|
\]

for some $k>0$ (see, e.g., [17, Lemma 2.8] or [27, Lemma 3]). Thus if we choose $\bar{x}=x_{i}^{\prime}+x_{k}$ then $\mathbf{A}\bar{x}=\mathbf{A}(x_{i}^{\prime}+x_{k})=\mathbf{A}(x_{i}^{\prime}+x^{\prime}_{k})=\mathbf{A}x^{\prime}=\bar{y}$ since $x_{k},x^{\prime}_{k}\in{\rm Ker}(\mathbf{A})$. In addition, we have

\[
\|x-\bar{x}\|=\|x_{i}-x_{i}^{\prime}\|\leq\frac{\|\mathbf{A}^{T}\|}{k}\|y-\bar{y}\|,
\]

and the conclusion follows. ∎
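As a sanity check of Theorem 8 (a sketch of ours, not part of the paper), the code below takes a rank-deficient matrix $\mathbf{A}$, draws $y=\mathbf{A}x$ and $\bar{y}=\mathbf{A}x^{\prime}$, builds $\bar{x}$ as in the proof by correcting only the component of $x$ orthogonal to ${\rm Ker}(\mathbf{A})$, and verifies $\|x-\bar{x}\|\leq\kappa\|y-\bar{y}\|$ with $\kappa$ taken as the reciprocal of the smallest positive singular value of $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]])   # rank-deficient: rank 1
P_row = np.linalg.pinv(A) @ A                         # projector onto Im(A^T A) = (Ker A)^perp

svals = np.linalg.svd(A, compute_uv=False)
kappa = 1.0 / svals[svals > 1e-12].min()              # 1 / smallest positive singular value

for _ in range(1000):
    x, xp = rng.normal(size=2), rng.normal(size=2)
    y, ybar = A @ x, A @ xp
    xbar = P_row @ xp + (x - P_row @ x)               # keep the Ker(A) part of x, replace the rest
    assert np.allclose(A @ xbar, ybar)                # xbar is indeed a preimage of ybar
    assert np.linalg.norm(x - xbar) <= kappa * np.linalg.norm(y - ybar) + 1e-9
print(f"upper Lipschitz bound for A^(-1) verified with kappa = {kappa:.3f}")
```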

Proposition 9.

If $\mathcal{A}:{\mathbb{R}}^{n}\rightrightarrows{\mathbb{R}}^{m}$ has the form $\mathcal{A}=\mathbf{B}+\mathcal{F}\circ\mathbf{B}$ where $\mathbf{B}\in{\mathbb{R}}^{m\times n}$ is a matrix and $\mathcal{F}:{\mathbb{R}}^{m}\rightrightarrows{\mathbb{R}}^{m}$ is monotone, then $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous.

Proof.

Suppose that $y=\mathbf{B}x+y_{1}\in\mathcal{A}(x)$ with $y_{1}\in\mathcal{F}(\mathbf{B}x)$, and let $\bar{y}\in{\rm dom}\,\mathcal{A}^{-1}$. We want to find $\bar{x}\in\mathcal{A}^{-1}(\bar{y})$ such that

\[
\|x-\bar{x}\|\leq\kappa\|y-\bar{y}\|,
\]

for some $\kappa>0$. First we take some $x^{\prime}\in\mathcal{A}^{-1}(\bar{y})$, i.e., $\bar{y}=\mathbf{B}x^{\prime}+\bar{y}_{1}\in\mathcal{A}(x^{\prime})$ with $\bar{y}_{1}\in\mathcal{F}(\mathbf{B}x^{\prime})$. Using the proof of Theorem 8, there exists $\bar{x}$ such that $\mathbf{B}\bar{x}=\mathbf{B}x^{\prime}$ and $\|\mathbf{B}x-\mathbf{B}\bar{x}\|\geq\kappa_{1}\|x-\bar{x}\|$ for some $\kappa_{1}>0$. Since $\mathcal{F}$ is monotone, one has

\[
\|y-\bar{y}\|^{2}=\|\mathbf{B}x-\mathbf{B}\bar{x}+y_{1}-\bar{y}_{1}\|^{2}\geq\|\mathbf{B}x-\mathbf{B}\bar{x}\|^{2}\geq\kappa_{1}^{2}\|x-\bar{x}\|^{2}
\]

and the conclusion follows. ∎

Proposition 10.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has the form $\mathcal{A}=\mathcal{B}+\mathcal{C}$ where $\mathcal{B}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is coercive and the pair $(\mathcal{B},\mathcal{C})$ is monotone, then $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous.

Proof.

Suppose that $\bar{y}=\bar{y}_{1}+\bar{y}_{2}\in\mathcal{A}(\bar{x})$ where $\bar{y}_{1}\in\mathcal{B}(\bar{x})$, $\bar{y}_{2}\in\mathcal{C}(\bar{x})$, and $y=y_{1}+y_{2}\in\mathcal{A}(x)$ where $y_{1}\in\mathcal{B}(x)$, $y_{2}\in\mathcal{C}(x)$. Since $\mathcal{B}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is coercive and the pair $(\mathcal{B},\mathcal{C})$ is monotone, one has

\[
\|\bar{y}-y\|^{2}=\|\bar{y}_{1}-y_{1}+\bar{y}_{2}-y_{2}\|^{2}\geq\|\bar{y}_{1}-y_{1}\|^{2}\geq\kappa^{2}\|\bar{x}-x\|^{2},
\]

for some $\kappa>0$. Thus $\mathcal{A}^{-1}$ is single-valued and Lipschitz continuous, and hence R-Lipschitz continuous. ∎

4 Explicit Convergence Rate of $\mathbf{PPA}$ under R-Continuity

Let $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ be a maximally monotone operator with $\mathbf{S}:=\mathcal{A}^{-1}(0)$ nonempty, where $\mathcal{H}$ is a Hilbert space. Let us recall that $\mathbf{PPA}$ generates, for any starting point $x_{0}$, a sequence $(x_{n})$ defined by the rule:

\[
x_{n+1}=\mathbf{J}_{\gamma\mathcal{A}}(x_{n}),\;x_{0}\in\mathcal{H}, \tag{16}
\]

for some $\gamma>0$, where $\mathbf{J}_{\gamma\mathcal{A}}(x):=(\mathbf{Id}+\gamma\mathcal{A})^{-1}(x)$. $\mathbf{PPA}$ plays an important role in convex optimization if $\mathcal{A}$ is set-valued and has no special structure. The following result shows that the convergence rate of $\mathbf{PPA}$ is indeed linear if $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero.

Theorem 11.

Suppose that $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero with modulus function $\rho(r)=Lr$ for some $L>0$. Then for $n\geq 1$, the sequence $(\mathbf{d}(x_{n},\mathbf{S}))_{n\in{\mathbb{N}}}$ converges to zero with linear rate, where $(x_{n})$ is the sequence generated by $\mathbf{PPA}$ with $\gamma>2L$.

Proof.

Let $\bar{x}_{n}=\mathbf{Proj}_{\mathbf{S}}(x_{n})$. We have $\|x_{n+1}-\bar{x}_{n}\|\leq\|x_{n}-\bar{x}_{n}\|=\mathbf{d}(x_{n},\mathbf{S})$, where the inequality is obtained by using the nonexpansiveness of the resolvent. We have

\[
-\frac{x_{n+1}-x_{n}}{\gamma}\in\mathcal{A}(x_{n+1}). \tag{17}
\]

Since $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero, we obtain

\[
x_{n+1}\in\mathcal{A}^{-1}\Big(-\frac{x_{n+1}-x_{n}}{\gamma}\Big)\subset\mathcal{A}^{-1}(0)+\rho\Big(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\Big)\mathbb{B}=\mathbf{S}+\rho\Big(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\Big)\mathbb{B}.
\]

Consequently

\[
\mathbf{d}(x_{n+1},\mathbf{S})\leq\frac{L\|x_{n+1}-x_{n}\|}{\gamma}\leq\frac{L}{\gamma}\Big(\|x_{n+1}-\bar{x}_{n}\|+\|x_{n}-\bar{x}_{n}\|\Big)\leq\frac{2L}{\gamma}\|x_{n}-\bar{x}_{n}\|=\kappa\,\mathbf{d}(x_{n},\mathbf{S})
\]

where $\kappa=\frac{2L}{\gamma}<1$ by choosing $\gamma>2L$. ∎
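To illustrate Theorem 11 numerically (a sketch of ours, under the assumption that $\mathcal{A}$ is the linear maximally monotone operator given by the positive semi-definite matrix ${\rm diag}(1,0)$ on ${\mathbb{R}}^{2}$): here $\mathbf{S}=\mathcal{A}^{-1}(0)$ is the whole $x_{2}$-axis and $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero with $L=1$, so running $\mathbf{PPA}$ with $\gamma>2L$ should give a per-step contraction of $\mathbf{d}(x_{n},\mathbf{S})$ no worse than $2L/\gamma$.

```python
import numpy as np

# A(x) = M x with M = diag(1, 0) (PSD, hence maximally monotone).
# A^{-1}(y) = {(y1, t) : t real} when y2 = 0, so S = A^{-1}(0) is the x2-axis and
# ex(A^{-1}(y), S) = |y1| <= ||y||: A^{-1} is globally R-Lipschitz at zero with L = 1.
M = np.diag([1.0, 0.0])
L, gamma = 1.0, 4.0                       # Theorem 11 asks for gamma > 2L
J = np.linalg.inv(np.eye(2) + gamma * M)  # resolvent of gamma*A (linear case)

x = np.array([5.0, -3.0])
dist = lambda z: abs(z[0])                # d(x, S) = |x_1| since S = {0} x R
for n in range(6):
    x_next = J @ x
    ratio = dist(x_next) / dist(x)
    print(f"n={n}  d(x_n,S)={dist(x):.3e}  ratio={ratio:.3f}  (bound 2L/gamma={2*L/gamma})")
    x = x_next
```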

Next we consider the convergence rate of $\mathbf{PPA}$ under only the R-continuity of $\mathcal{A}^{-1}$. The following result is similar to [3, Lemma 3.1]. Here we give a proof.

Lemma 12.

Let $f:[t_{0},\infty)\to{\mathbb{R}}^{+}$ be a nonincreasing function which is locally absolutely continuous and belongs to $\mathbf{L}^{1}(t_{0},\infty)$. Then $\lim_{t\to\infty}tf(t)=0$.

Proof.

Without loss of generality, suppose that $t_{0}\geq 0$, since the limit only involves large $t$. Let $F(t)=tf(t)$; then $F$ is locally absolutely continuous, bounded from below by $0$, and for almost all $t\geq t_{0}$ we have

\[
\frac{d}{dt}F(t)=f(t)+tf^{\prime}(t)\leq f(t).
\]

Since $f\in\mathbf{L}^{1}(t_{0},\infty)$, using [1, Lemma 5.1], we deduce that $\lim_{t\to\infty}F(t)$ exists. Suppose that $\lim_{t\to\infty}F(t)=c>0$. Then there exist $T>0$, $c_{1}>0$ such that for all $t\geq T$ one has $F(t)\geq c_{1}$, or equivalently $f(t)\geq c_{1}t^{-1}$. Then

\[
\int_{T}^{\infty}f(t)\,dt\geq c_{1}\int_{T}^{\infty}t^{-1}\,dt=\infty,
\]

a contradiction. Thus we must have $\lim_{t\to\infty}F(t)=0$, or equivalently $\lim_{t\to\infty}tf(t)=0$. ∎

Lemma 13.

Let $(a_{n})$ be a nonincreasing nonnegative sequence such that $\sum_{n=1}^{\infty}a_{n}<\infty$. Then we have $\lim_{n\to\infty}na_{n}=0$.

Proof.

Since $\sum_{n=1}^{\infty}a_{n}<\infty$, we have $\sum_{n=1}^{\infty}a_{n+1}<\infty$ and thus $\sum_{n=1}^{\infty}b_{n}<\infty$ where $b_{n}:=(a_{n}+a_{n+1})/2$. Let $f:[1,\infty)\to{\mathbb{R}}^{+}$ be defined as follows: if $t\in[n,n+1]$ then $f(t)=a_{n}+(a_{n+1}-a_{n})(t-n)$, $n\geq 1$. Then $f$ is a nonincreasing function which is locally absolutely continuous and belongs to $\mathbf{L}^{1}(1,\infty)$ since

\[
\int_{1}^{\infty}f(t)\,dt=\sum_{n=1}^{\infty}b_{n}<\infty.
\]

Note that if $t\in[n-1,n)$ then $f(t)\geq a_{n}$ and $tf(t)\geq(n-1)a_{n}$. Using Lemma 12, we have $\lim_{t\to\infty}tf(t)=0$ and thus

\[
\lim_{n\to\infty}(n-1)a_{n}=0.
\]

Therefore $\lim_{n\to\infty}na_{n}=0$ since $\lim_{n\to\infty}a_{n}=0$. ∎

Remark 4.

The assumption that the sequence $(a_{n})$ is nonincreasing cannot be omitted. For example, let $n_{k}=k^{2}$, $k=1,2,3,\ldots$, and $x_{n}=0$ if $n\neq n_{k}$ and $x_{n}=\frac{1}{n}$ if $n=n_{k}$. Then

\[
\sum_{n=1}^{\infty}x_{n}=\sum_{k=1}^{\infty}\frac{1}{k^{2}}<\infty.
\]

However, $nx_{n}=0$ if $n\neq n_{k}$ and $nx_{n}=1$ if $n=n_{k}$. Thus the sequence $(nx_{n})$ is not convergent.

Theorem 14.

Suppose that $f:\mathcal{H}\to{\mathbb{R}}\cup\{+\infty\}$ is a proper lower semicontinuous convex function such that $\inf f>-\infty$ and $(\partial f)^{-1}$ is R-continuous at zero with modulus function $\rho$ and radius $\sigma$. Let $(x_{n})$ be the sequence generated by the proximal point algorithm

\[
x_{0}\in\mathcal{H},\quad x_{n+1}=\mathbf{J}_{\gamma\partial f}(x_{n}),\;\text{for some}\;\gamma>0. \tag{18}
\]

We have:
a) $\|x_{n+1}-x_{n}\|^{2}=o(\frac{1}{n})$.
b) Let $n_{0}$ be such that $\frac{\|x_{n_{0}+1}-x_{n_{0}}\|}{\gamma}\leq\sigma$. Then for all $n\geq n_{0}$, we have

\[
\mathbf{d}(x_{n},\mathbf{S})\leq\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)\to 0
\]

and

\[
f(x_{n})-f^{*}\leq\mathbf{d}(x_{n},\mathbf{S})\,o\Big(\frac{1}{\sqrt{n}}\Big)\leq\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)o\Big(\frac{1}{\sqrt{n}}\Big)\to 0
\]

where $f^{*}=\min_{x\in\mathcal{H}}f(x)$.

c) If $\rho(r)=Lr$ for some $L>0$ and $\gamma>2L$, then for $n\geq n_{0}$ the distance $\mathbf{d}(x_{n},\mathbf{S})$ converges to zero with linear rate, and so does $f(x_{n})-f^{*}$.

Proof.

a) From

\[
-\frac{x_{n+1}-x_{n}}{\gamma}\in\partial f(x_{n+1}) \tag{19}
\]

and the fact that $f$ is convex, we deduce that

\[
\left\langle-\frac{x_{n+1}-x_{n}}{\gamma},x_{n}-x_{n+1}\right\rangle\leq f(x_{n})-f(x_{n+1}).
\]

Hence

\[
\sum_{n=1}^{\infty}\|x_{n+1}-x_{n}\|^{2}\leq\gamma(f(x_{0})-\inf f)<\infty
\]

and thus $\|x_{n+1}-x_{n}\|$ converges to zero. Since $(\|x_{n+1}-x_{n}\|)_{n}$ is nonincreasing due to the nonexpansiveness of the resolvent, using Lemma 13 we have

\[
\|x_{n+1}-x_{n}\|^{2}=o\Big(\frac{1}{n}\Big).
\]

b) From (19) and the R-continuity of $(\partial f)^{-1}$, for $n\geq n_{0}$, we obtain

\[
x_{n+1}\in(\partial f)^{-1}\left(-\frac{x_{n+1}-x_{n}}{\gamma}\right)\subset(\partial f)^{-1}(0)+\rho\left(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\right)\mathbb{B}=\mathbf{S}+\rho\left(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\right)\mathbb{B}.
\]

Therefore

\[
\mathbf{d}(x_{n+1},\mathbf{S})\leq\rho\Big(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\Big)=\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)\to 0,\;\;\text{as}\;\;n\to\infty. \tag{20}
\]

Let $x_{n+1}^{*}=\mathbf{Proj}_{\mathbf{S}}(x_{n+1})$; then $f(x_{n+1}^{*})=f^{*}$. Using (19), one has

\[
\left\langle-\frac{x_{n+1}-x_{n}}{\gamma},x_{n+1}^{*}-x_{n+1}\right\rangle\leq f(x_{n+1}^{*})-f(x_{n+1}),
\]

which implies that

\[
f(x_{n+1})-f^{*}\leq\left\|\frac{x_{n+1}-x_{n}}{\gamma}\right\|\mathbf{d}(x_{n+1},\mathbf{S})\leq\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)o\Big(\frac{1}{\sqrt{n}}\Big)\to 0.
\]

c) Let $\bar{x}_{n}=\mathbf{Proj}_{\mathbf{S}}(x_{n})$. We know that $\|x_{n+1}-\bar{x}_{n}\|\leq\|x_{n}-\bar{x}_{n}\|=\mathbf{d}(x_{n},\mathbf{S})$. From (20), for $n\geq n_{0}$, we have

\[
\mathbf{d}(x_{n+1},\mathbf{S})\leq\frac{L\|x_{n+1}-x_{n}\|}{\gamma}\leq\frac{L}{\gamma}\Big(\|x_{n+1}-\bar{x}_{n}\|+\|x_{n}-\bar{x}_{n}\|\Big)\leq\frac{2L}{\gamma}\|x_{n}-\bar{x}_{n}\|=\kappa\,\mathbf{d}(x_{n},\mathbf{S})
\]

where $\kappa=\frac{2L}{\gamma}<1$ by choosing $\gamma>2L$. ∎
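As an illustration of part b) (a sketch of ours, not from the paper), take $f(x)=x^{4}/4$ on $\mathcal{H}={\mathbb{R}}$: then $(\partial f)^{-1}(y)=y^{1/3}$ is R-continuous at zero with $\rho(r)=r^{1/3}$ but not R-Lipschitz, $\mathbf{S}=\{0\}$ and $f^{*}=0$. The sketch below runs (18) and compares $\mathbf{d}(x_{n},\mathbf{S})$ with the guaranteed upper-bound shape $\rho(n^{-1/2})=n^{-1/6}$; the observed decay can of course be faster than the guarantee.

```python
import numpy as np

gamma = 1.0

def prox(v):
    """J_{gamma*df}(v) for f(x) = x^4/4: the unique real solution of x + gamma*x^3 = v."""
    roots = np.roots([gamma, 0.0, 1.0, -v])
    return float(roots[np.argmin(np.abs(roots.imag))].real)

x = 2.0
for n in range(1, 10001):
    x = prox(x)                         # PPA step (18) with A = df
    if n in (10, 100, 1000, 10000):
        # d(x_n, S) = |x_n| since S = {0}; n^(-1/6) is the upper-bound shape from part b)
        print(f"n={n:>6}  d(x_n,S)={abs(x):.3e}  n^(-1/6)={n**(-1/6):.3e}")
```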

Remark 5.

R-continuity ensures that $\mathbf{PPA}$ (and also $\mathbf{DCA}$ [17]) is consistent in the numerical sense, i.e., if $\|x_{n+1}-x_{n}\|$ is small then $x_{n}$ is close to a solution. It provides an estimate of the distance between $x_{n}$ and the solution set based on the convergence rate of $\|x_{n+1}-x_{n}\|$ to zero.

5 Conclusions

In this paper, we highlighted several advantages of R-continuity compared to other key tools used in optimization, such as metric regularity, metric subregularity and calmness. We explored important properties of R-continuity and derived an explicit convergence rate for the Proximal Point Algorithm ($\mathbf{PPA}$) under this framework. We believe that the technique developed in the paper can be extended to gain further insights into the convergence rate of other optimization algorithms.

6 Acknowledgements

This research benefited from the support of the FMJH Program Gaspard Monge for optimization and operations research and their interactions with data science.

References

  • [1] B. Abbas, H. Attouch, B. F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, J. Optim. Theory Appl., 161 (2), 331–360, 2014
  • [2] V. Acary, O. Bonnefon, B. Brogliato, Nonsmooth Modeling and Simulation for Switched Circuits, Lecture Notes in Electrical Engineering Vol 69. Springer Netherlands, 2011
  • [3] S. Adly, H. Attouch, M. H. Le, A doubly nonlinear evolution system with threshold effects associated with dry friction, J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02417-2
  • [4] S. Adly, M. G. Cojocaru, B. K. Le, State-Dependent Sweeping Processes: Asymptotic Behavior and Algorithmic Approaches. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02485-4
  • [5] S. Adly, A. L. Dontchev, M. Théra, On one-sided Lipschitz stability of set-valued contractions. Numer. Funct. Anal. Optim. 35, 837–850 (2014)
  • [6] H. Attouch, G. Buttazzo, G. Michaille, Variational Analysis in Sobolev and BV Spaces, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014
  • [7] H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
  • [8] H. Brezis, Opérateurs Maximaux Monotones et Semi-groupes de Contractions dans les Espaces de Hilbert, Math. Studies 5, North-Holland American Elsevier, 1973
  • [9] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer New York, NY, 2010
  • [10] A. V. Dmitruk, A. Y. Kruger, Metric regularity and systems of generalized equations, J. Math. Anal. Appl., 342(2):864–873, 2008
  • [11] A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings. A View from Variational Analysis. Springer Monographs in Mathematics. Springer, Dordrecht, 2009
  • [12] R. Henrion, J. V. Outrata, Calmness of constraint systems with applications, Math. Program., Ser. B 104, 437–464 (2005)
  • [13] A. J. Hoffman, On approximate solutions of systems of linear inequalities, J. Research Nat. Bureau Standards 49(4) (1952), 263–265
  • [14] A. D. Ioffe, Metric regularity–a survey. Part I. Theory. J. Aust. Math. Soc., 101(2): 188–243, 2016.
  • [15] A. D. Ioffe, J. V. Outrata, On metric and calmness qualification conditions in subdifferential calculus. Set-Valued Anal., 16(2-3): 199–227, 2008
  • [16] A. Y. Kruger, Nonlinear metric subregularity. J. Optim. Theory Appl., 171(3): 820–855, 2016.
  • [17] B. K. Le, R-Continuity with Applications to Convergence Analysis of Tikhonov Regularization and DC Programming, Journal of Convex Analysis, Volume 31 (2024), No. 1, 243–254
  • [18] B. Martinet, Régularisation d’inéquations variationnelles par approximations successives, Rev. Française Inf. Rech. Oper., (1970), pp. 154–159
  • [19] G. J. Minty, Monotone (Nonlinear) Operators in Hilbert Space. Duke Mathematical Journal, 29, 341–346, (1962).
  • [20] Ch. A. Micchelli, L. Chen and Y. Xu, Proximity algorithms for image models: Denoising, Inverse Problems, Vol. 27(4) 045009, 2011
  • [21] H. V. Ngai, M. Théra, Error Bounds in Metric Spaces and Application to the Perturbation Stability of Metric Regularity, SIAM J. Optim. 19(1), 1–20 (2008)
  • [22] J. P. Penot, Calculus without Derivatives, Springer New York, 2014
  • [23] S. M. Robinson, Regularity and stability for convex multivalued functions, Math. Oper. Res. 1 (1976), 130–143
  • [24] S. M. Robinson, Some continuity properties of polyhedral multifunctions, Math. Program. Study, 14 (1981) 206–214
  • [25] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optimization 14(5), 877–898, 1976
  • [26] R. T. Rockafellar, R.J.-B. Wets, Variational analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer- Verlag, Berlin, 1998
  • [27] A. Tanwani, B. Brogliato, C. Prieur, Well-Posedness and Output Regulation for Implicit Time-Varying Evolution Variational Inequalities, SIAM J. Control Opti., Vol. 56(2), 751–781, 2018
  • [28] L. Thibault, Unilateral Variational Analysis in Banach Spaces. World Scientific, 2023
  • [29] C. Ursescu, Multifunctions with closed convex graphs, Czechoslovak Math. J. 25 (1975), 438–441.
  • [30] X. Y. Zheng and K. F. Ng, Metric subregularity and constraint qualifications for convex generalized equations in Banach spaces. SIAM J. Optim., 18(2):437–460, 2007
  • [31] X. Y. Zheng and K. F. Ng, Metric subregularity and calmness for nonconvex generalized equations in Banach spaces, SIAM J. Optim., 20(5):2119–2136, 2010