
Explicit Convergence Rate of The Proximal Point Algorithm under R-Continuity

Ba Khiet Le, Optimization Research Group, Faculty of Mathematics and Statistics, Ton Duc Thang University, Ho Chi Minh City, Vietnam. E-mail: [email protected]
Michel Théra, Mathematics and Computer Science Department, University of Limoges, 123 Avenue Albert Thomas, 87060 Limoges CEDEX, France, and School of Engineering, IT and Physical Sciences, Federation University, Ballarat, Victoria, 3350, Australia. E-mail: [email protected]
Abstract

The paper provides a thorough comparison between R-continuity and other fundamental tools in optimization such as metric regularity, metric subregularity and calmness. We show that R-continuity has some advantages in the convergence rate analysis of algorithms solving optimization problems. We also present some properties of R-continuity and study the explicit convergence rate of the Proximal Point Algorithm ($\mathbf{PPA}$) under R-continuity.

Keywords. R-continuity, metric regularity, metric subregularity, calmness, Proximal Point Algorithm, convergence rate

AMS Subject Classification. 28B05, 34A36, 34A60, 49J52, 49J53, 93D20

1 Introduction

In what follows, $\mathbb{X}$ and $\mathbb{Y}$ are real Banach spaces whose norms are designated by $\|\cdot\|$. We use the notation $\mathbb{B}(x,r)$ to denote the closed ball with center $x$ and radius $r>0$ and by $\mathbb{B}$ the closed unit ball, which consists of the elements of norm less than or equal to $1$. By a set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$, we mean a mapping which assigns to each $x\in\mathbb{X}$ a subset $\mathcal{A}(x)$ (possibly empty) of $\mathbb{Y}$. The domain, the range and the graph of $\mathcal{A}$ are defined respectively by

\[
{\rm dom}\,\mathcal{A}=\{x\in\mathbb{X}:\mathcal{A}(x)\neq\emptyset\},\qquad{\rm rge}\,\mathcal{A}=\bigcup_{x\in\mathbb{X}}\mathcal{A}(x),
\]

and

\[
{\rm Graph\,}\mathcal{A}=\{(x,y)\in\mathbb{X}\times\mathbb{Y}\;\text{such that}\;y\in\mathcal{A}(x)\}.
\]

As usual we denote by $\mathcal{A}^{-1}:\mathbb{Y}\rightrightarrows\mathbb{X}$ the inverse of $\mathcal{A}$ defined by

\[
x\in\mathcal{A}^{-1}(y)\;\iff\;y\in\mathcal{A}(x).
\]

The notation $\mathbf{d}(x,\mathbf{S})$ stands for the distance from a point $x\in\mathbb{X}$ to a subset $\mathbf{S}\subset\mathbb{X}$:

\[
\mathbf{d}(x,\mathbf{S}):=\inf_{y\in\mathbf{S}}\|x-y\|.
\]

Given a set-valued mapping $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ defined in a Hilbert space $\mathcal{H}$, and inspired by Rockafellar’s paper [25], recently B. K. Le [17] introduced the notion of R-continuity for studying the convergence rate of the Tikhonov regularization of the inclusion

\[
0\in\mathcal{A}(x). \tag{1}
\]

In [17], it was proved that R-continuity is a useful tool to analyze the convergence of $\mathbf{DCA}$ (Difference of Convex Algorithm) and can explain why $\mathbf{DCA}$ is effective in approximating solutions for a broad class of functions. In recent decades, there has been a surge of interest in variational inclusions such as (1), since they model a variety of important systems. This is the case especially in optimization when the condition for critical points is considered (the Fermat rule). Another connection with the inclusion (1) arises in PDEs and is well discussed in the books [6, 9]. The fact that the solution set $\mathbf{S}:=\mathcal{A}^{-1}(0)$ of (1) involves the inverse operator of $\mathcal{A}$ leads naturally to the study of the continuity of $\mathcal{A}^{-1}$ at zero. According to [17], the mapping $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ is said to be R-continuous at zero if there exist a radius $\sigma>0$ and a non-decreasing modulus function $\rho:\mathbb{R}^{+}\to\mathbb{R}^{+}$ satisfying $\lim_{r\to 0^{+}}\rho(r)=\rho(0)=0$ such that

\[
\mathcal{A}(x)\subset\mathcal{A}(0)+\rho(\|x\|)\mathbb{B},\;\text{ for every }x\in\sigma\mathbb{B}. \tag{2}
\]

Denoting by $\mathbf{ex}(C,D):=\sup_{x\in C}\mathbf{d}(x,D)$ the excess of the set $C$ over the set $D$, with the convention that $\mathbf{ex}(\emptyset,D)=0$ when $D$ is nonempty and $\mathbf{ex}(C,\emptyset)=+\infty$ for any $C$, (2) is equivalent to saying that

\[
\mathbf{ex}(\mathcal{A}(x),\mathcal{A}(0))\leq\rho(\|x\|)\;\text{ whenever }\;\|x\|\leq\sigma. \tag{3}
\]

When $\rho(r):=Lr$ for some $L>0$, (3) becomes

\[
\mathbf{ex}(\mathcal{A}(x),\mathcal{A}(0))\leq L\|x\|\;\text{ whenever }\;\|x\|\leq\sigma. \tag{4}
\]

In this case, $\mathcal{A}$ is referred to as R-Lipschitz continuous at zero or, equivalently, upper Lipschitz at zero in the sense of Robinson [24] or outer Lipschitz continuous at zero [5]. In addition, if $\mathcal{A}(0)$ is a singleton, R-Lipschitz continuity at zero of $\mathcal{A}$ is exactly the Lipschitz continuity at zero introduced by R. T. Rockafellar in [25], who considered Lipschitz continuity at zero of set-valued mappings as an important tool in optimization. Rockafellar demonstrated the linear convergence rate of the proximal point algorithm after a finite number of iterations from any starting point $x_{0}$. However, the requirement that the solution set is a singleton is often quite restrictive in practice. Thus, allowing the solution set $\mathbf{S}$ to be set-valued and not requiring the continuity modulus function $\rho$ to be of Lipschitz type makes R-continuity a competitive alternative. It provides a viable option alongside other fundamental concepts such as metric regularity, metric subregularity or calmness in the study of sensitivity analysis as well as in establishing the convergence rate of algorithms. Note that R-continuity can also be extended to Banach spaces. In order to compare these regularity concepts, let us recall the definitions of metric regularity, metric subregularity and calmness (see, e.g., [14, 15, 12, 30, 31, 10, 16, 26, 11, 21, 22, 28] and the references therein). Given $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$, where $\mathbb{X},\mathbb{Y}$ are real Banach spaces, and $(\bar{x},\bar{y})\in{\rm Graph\,}\mathcal{A}$, one says that $\mathcal{A}$ is metrically regular near $(\bar{x},\bar{y})$ if a linear error bound holds:

\[
\mathbf{d}(x,\mathcal{A}^{-1}(y))\leq\kappa\,\mathbf{d}(y,\mathcal{A}(x)) \tag{5}
\]

for some $\kappa>0$ and for all $(x,y)$ close to $(\bar{x},\bar{y})$. If (5) is satisfied for all $(x,y)\in\mathbb{X}\times\mathbb{Y}$, then $\mathcal{A}$ is termed globally metrically regular. When $(\bar{x},0)\in{\rm Graph\,}\mathcal{A}$, inequality (5) provides an estimate of how far a point $x$ around $\bar{x}$ is from the solution set $\mathcal{A}^{-1}(0)$. However, metric regularity can be too stringent as it requires the inequality (5) to hold in a neighborhood of $(\bar{x},\bar{y})$. One says that $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is calm at $(\bar{x},\bar{y})\in{\rm Graph\,}\mathcal{A}$ if there exist $\kappa>0,\epsilon>0,\sigma>0$ such that

\[
\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon)\subset\mathcal{A}(\bar{x})+\kappa\|x-\bar{x}\|\mathbb{B},\;\;\text{for all}\;x\in\mathbb{B}(\bar{x},\sigma), \tag{6}
\]

or equivalently

\[
\mathbf{ex}(\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon),\mathcal{A}(\bar{x}))\leq\kappa\|x-\bar{x}\|,\;\;\text{for all}\;x\in\mathbb{B}(\bar{x},\sigma). \tag{7}
\]

Note that when the set $\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon)$ is empty, the inequality (7) is always satisfied. When $\bar{x}=0$, (7) becomes

\[
\mathbf{ex}(\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon),\mathcal{A}(0))\leq\kappa\|x\|,\;\;\text{whenever}\;\|x\|\leq\sigma. \tag{8}
\]

Another well-known regularity concept is metric subregularity. The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called metrically subregular at $(\bar{x},0)\in{\rm Graph\,}\mathcal{A}$ if there exists a constant $\kappa>0$ such that

\[
\mathbf{d}(x,\mathcal{A}^{-1}(0))\leq\kappa\,\mathbf{d}(0,\mathcal{A}(x)),\;\;\text{ for all }\;x\;\text{close to}\;\bar{x}. \tag{9}
\]

It is known (see, e.g., [14, Proposition 2.62]) that metric subregularity of $\mathcal{A}$ at $(\bar{x},0)$ is equivalent to calmness of $\mathcal{A}^{-1}$ at $(0,\bar{x})$. From (3) and (8), let us observe that if $\mathcal{A}$ is R-Lipschitz continuous at $0$ then it is calm at $(0,\bar{y})$ for any $\bar{y}\in\mathcal{A}(0)$. While metric regularity is too strong, calmness is relatively too weak to deduce useful properties. For example, when the set $\mathcal{A}(x)\cap\mathbb{B}(\bar{y},\epsilon)$ is empty for some $x\in\epsilon\mathbb{B}$, no information can be deduced. In addition, when $\mathcal{A}$ is complicated, we do not know a specific $\bar{y}$ and have to use computers to find an approximation of an element of $\mathcal{A}(0)$. The first advantage of R-continuity is that it is unnecessary to know a particular $\bar{y}\in\mathcal{A}(0)$ in advance. Furthermore, in (2), since $\|x\|$ is small, for each $y\in\mathcal{A}(x)$ there exists some $\tilde{y}\in\mathcal{A}(0)$ close to $y$. This fact is meaningful since it is difficult to ensure that $y$ lies in the vicinity of $\bar{y}$, which is unknown. Secondly, R-continuity is straightforward and always guarantees the system consistency in Hoffman’s sense [13]. Indeed, Theorem 7 establishes that under the R-continuity of $\mathcal{A}^{-1}$ at zero, the inclusion (1) is consistent, i.e., if $y_{\sigma}$ has a small norm and satisfies $y_{\sigma}\in\mathcal{A}(x_{\sigma})$, then we can find a solution $\bar{x}\in\mathbf{S}$ such that $x_{\sigma}$ is close to $\bar{x}$. Note that to guarantee the consistency property, metric subregularity or metric regularity must be satisfied globally. Thirdly, R-continuity does not require the property (2) to hold around a point belonging to the graph of the operator; it is easily verified for a broad class of set-valued mappings. In order to achieve metric regularity at some point $(\bar{x},\bar{y})\in{\rm Graph\,}\mathcal{A}$, the celebrated Robinson-Ursescu Theorem [23, 29] requires the operator $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ to have a closed and convex graph and also that $\bar{y}\in{\rm int}(\mathcal{A}(\mathbb{X}))$ (the interior of $\mathcal{A}(\mathbb{X})$). R-continuity only necessitates that the graph of the operator be closed. Indeed, if the operator has a closed graph at zero and is locally compact at zero, then it is R-continuous at zero (Theorem 5). Conversely, if the operator is R-continuous at zero and its value at zero is closed, then it has a closed graph at zero (Theorem 6). The condition of having a graph closed at zero is relatively mild. If an operator’s graph is closed, then it has a closed graph at zero as well. Furthermore, an operator has a closed graph if and only if its inverse has a closed graph. Closed graph operators are usually found in optimization when dealing with continuous single-valued mappings, maximally monotone operators (see, e.g., [8]), the sum of two closed graph operators where one of them is single-valued, or the sum of two closed graph set-valued operators where one of them is locally compact (Proposition 3). The Sign function in ${\mathbb{R}}^{n}$, which equals the convex subdifferential of the norm function, is a well-known example of an operator with compact range and is widely used in image processing, mechanical and electrical engineering (see, e.g., [2, 20]).

Finally, we demonstrate that R-continuity can be used to analyse the explicit convergence rate of the proximal point algorithm ($\mathbf{PPA}$) when applied to a maximally monotone operator $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$, where $\mathcal{H}$ is a Hilbert space. The $\mathbf{PPA}$, introduced by B. Martinet and further developed by Rockafellar, Bauschke and Combettes and others (see, e.g., [7, 18, 25]), is an essential tool in convex optimization if $\mathcal{A}$ is set-valued and lacks special structure. We show that if $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero (i.e., $\sigma=+\infty$), the convergence of the proximal point algorithm is linear from the beginning (Theorem 11). When considering the case where $\mathcal{A}=\partial f$, i.e. when $\mathcal{A}$ is the convex subdifferential of a proper lower semicontinuous, not necessarily smooth, extended real-valued convex function $f:\mathcal{H}\to{\mathbb{R}}\cup\{+\infty\}$, if $\mathcal{A}^{-1}$ is only R-continuous, we obtain an explicit convergence rate of $\mathbf{PPA}$ based on the modulus function. If $(\partial f)^{-1}$ is R-Lipschitz continuous at zero, one has the linear convergence of the generated sequence $(x_{n})$ after some iterations (Theorem 14).

The paper is organized as follows. First we review the necessary material about R-continuity of set-valued mappings and maximally monotone operators in Section 2. Some key properties of R-continuity are established in Section 3. The analysis of the convergence rate of $\mathbf{PPA}$ under R-continuity is presented in Section 4. The paper ends in Section 5 with some conclusions and perspectives.

2 Mathematical preliminaries

In what follows, we extend the notion of R-continuity introduced in [17] from Hilbert spaces to Banach spaces, defined for any point in the domain of the operators. Let $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ be a set-valued mapping where $\mathbb{X},\mathbb{Y}$ are Banach spaces and $\bar{x}\in{\rm dom}\,\mathcal{A}$.

Definition 1.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called R-continuous at $\bar{x}$ if there exist $\sigma>0$ and a non-decreasing function $\rho:\mathbb{R}^{+}\to\mathbb{R}^{+}$ satisfying $\lim_{r\to 0^{+}}\rho(r)=\rho(0)=0$ such that

\[
\mathcal{A}(x)\subset\mathcal{A}(\bar{x})+\rho(\|x-\bar{x}\|)\mathbb{B},\;\forall x\in\mathbb{B}(\bar{x},\sigma), \tag{10}
\]

or equivalently, for each $y\in\mathcal{A}(x)$ with $x\in\mathbb{B}(\bar{x},\sigma)$, there exists $\bar{y}\in\mathcal{A}(\bar{x})$ such that $\|y-\bar{y}\|\leq\rho(\|x-\bar{x}\|)$.

The function $\rho$ is called a continuity modulus function of $\mathcal{A}$ at $\bar{x}$ and $\sigma$ is called the radius. We say that $\mathcal{A}$ is R-Lipschitz continuous at $\bar{x}$ with modulus $L$ if $\rho(r)=Lr$ for some $L>0$. In addition, if $\sigma=\infty$ then $\mathcal{A}$ is said to be globally R-Lipschitz continuous at $\bar{x}$.

Remark 1.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is R-continuous at $\bar{x}$ with modulus function $\rho$ and radius $\sigma$ iff $-\mathcal{A}$ is R-continuous at $\bar{x}$ with modulus function $\rho$ and radius $\sigma$.

Definition 2.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called R-continuous if it is R-continuous at any point of its domain. It is called R-Lipschitz continuous if it is R-Lipschitz continuous at any point of its domain. In addition, if the radius $\sigma=+\infty$, then it is called globally R-Lipschitz continuous.

Proposition 1.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is globally metrically regular then $\mathcal{A}^{-1}:\mathbb{Y}\rightrightarrows\mathbb{X}$ is globally R-Lipschitz continuous.

Proof.

Let $y,\bar{y}\in{\rm dom}\,\mathcal{A}^{-1}$ and $x\in\mathcal{A}^{-1}(y)$ be given. We have $y\in\mathcal{A}(x)$ and, since $\mathcal{A}$ is globally metrically regular, one has

\[
\mathbf{d}(x,\mathcal{A}^{-1}(\bar{y}))\leq\kappa\,\mathbf{d}(\bar{y},\mathcal{A}(x))\leq\kappa\|y-\bar{y}\|, \tag{11}
\]

for some $\kappa>0$. Since $x\in\mathcal{A}^{-1}(y)$ is arbitrary, we deduce that

\[
\mathbf{ex}(\mathcal{A}^{-1}(y),\mathcal{A}^{-1}(\bar{y}))\leq\kappa\|y-\bar{y}\|
\]

and the conclusion follows. ∎

The following simple example shows that metric regularity is strictly stronger than R-Lipschitz continuity. This important fact means that we can still obtain the convergence rate of optimization algorithms when only R-Lipschitz continuity or even R-continuity holds (see, e.g., [17]).

Example 1.

Let $\mathcal{A}:{\mathbb{R}}\rightrightarrows{\mathbb{R}}$ be defined by

\[
\mathcal{A}(x):=\begin{cases}[0,\infty)&\text{if }x=1,\\ \{0\}&\text{if }-1<x<1,\\ (-\infty,0]&\text{if }x=-1.\end{cases}
\]

Then $\mathcal{A}^{-1}(y)=\mathbf{Sign}(y)$ is R-Lipschitz continuous for any positive modulus. However, if $y_{n}<0$ and $y_{n}\to 0$ then

\[
\mathbf{d}(1,\mathcal{A}^{-1}(y_{n}))=2,\qquad\mathbf{d}(y_{n},\mathcal{A}(1))=\mathbf{d}(y_{n},[0,\infty))=|y_{n}|\to 0.
\]

Thus $\mathcal{A}$ is not metrically regular at $(1,0)$.
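To make the comparison concrete, here is a small numerical sketch (ours, not part of the original example, with helper names of our own choosing) for the operator of Example 1: it checks that the excess $\mathbf{ex}(\mathcal{A}^{-1}(y),\mathcal{A}^{-1}(0))$ stays equal to zero for $y$ near $0$ (so any modulus $L>0$ works), while the metric-regularity ratio $\mathbf{d}(1,\mathcal{A}^{-1}(y_{n}))/\mathbf{d}(y_{n},\mathcal{A}(1))$ blows up as $y_{n}\to 0^{-}$.

```python
import numpy as np

def sign_set(y):
    """A^{-1}(y) = Sign(y), represented as a closed interval (lo, hi)."""
    if y > 0:
        return (1.0, 1.0)
    if y < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def dist_to_interval(p, iv):
    lo, hi = iv
    return max(lo - p, p - hi, 0.0)

def excess(iv_c, iv_d):
    """ex(C, D) for intervals: the farthest point of C from D (checking endpoints suffices)."""
    lo, hi = iv_c
    return max(dist_to_interval(lo, iv_d), dist_to_interval(hi, iv_d))

for y in [-1e-1, -1e-3, -1e-6]:
    ex_val = excess(sign_set(y), sign_set(0.0))          # R-Lipschitz side: stays 0
    ratio = dist_to_interval(1.0, sign_set(y)) / abs(y)  # d(1, A^{-1}(y)) / d(y, A(1)) blows up
    print(f"y={y:>9.1e}  ex(A^-1(y),A^-1(0))={ex_val:.1f}  d(1,A^-1(y))/d(y,A(1))={ratio:.2e}")
```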

In the text that follows, we consider set-valued mappings with closed graph.

Definition 3.

We say that $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph if $y_{n}\in\mathcal{A}(x_{n})$, $y_{n}\to y$ and $x_{n}\to x$ imply $y\in\mathcal{A}(x)$. We say that $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph at zero if $y_{n}\in\mathcal{A}(x_{n})$, $x_{n}\to 0$ and $y_{n}\to y$ imply $y\in\mathcal{A}(0)$.

Proposition 2.

The set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph if and only if $\mathcal{A}^{-1}:\mathbb{Y}\rightrightarrows\mathbb{X}$ has a closed graph.

Proof.

It follows directly from the definition of the inverse set-valued mapping $\mathcal{A}^{-1}$. ∎

Proposition 3.

Suppose that $\mathcal{A}=\mathcal{A}_{1}+\mathcal{A}_{2}$ where $\mathcal{A}_{1},\mathcal{A}_{2}:\mathbb{X}\rightrightarrows\mathbb{Y}$ have closed graphs.
a) If $\mathcal{A}_{2}$ is single-valued, then $\mathcal{A}$ has a closed graph;
b) If $\mathcal{A}_{2}$ is locally compact, i.e., for each $\bar{x}\in\mathbb{X}$ there exists $\epsilon>0$ such that the set $\bigcup_{x\in\mathbb{B}(\bar{x},\epsilon)}\mathcal{A}_{2}(x)$ is compact, then $\mathcal{A}$ has a closed graph.

Proof.

a) Let $y_{n}\in\mathcal{A}(x_{n})=\mathcal{A}_{1}(x_{n})+\mathcal{A}_{2}(x_{n})$, $y_{n}\to y$ and $x_{n}\to x$. We have $y_{n}=z_{n}+\mathcal{A}_{2}(x_{n})$ for some $z_{n}\in\mathcal{A}_{1}(x_{n})$. Since $z_{n}=y_{n}-\mathcal{A}_{2}(x_{n})\to y-\mathcal{A}_{2}(x)$ and $\mathcal{A}_{1}$ has a closed graph, we deduce that $y-\mathcal{A}_{2}(x)\in\mathcal{A}_{1}(x)$ and the conclusion follows.
b) Similarly, let $y_{n}\in\mathcal{A}(x_{n})=\mathcal{A}_{1}(x_{n})+\mathcal{A}_{2}(x_{n})$, $y_{n}\to y$ and $x_{n}\to x$. We have $y_{n}=z_{n}+v_{n}$ for some $z_{n}\in\mathcal{A}_{1}(x_{n})$ and $v_{n}\in\mathcal{A}_{2}(x_{n})$. Since $\mathcal{A}_{2}$ is locally compact, there exists a subsequence of $(v_{n})$, not relabelled, converging to some $v\in\mathcal{A}_{2}(x)$. Thus $z_{n}=y_{n}-v_{n}$ converges to $y-v\in\mathcal{A}_{1}(x)$. Therefore $y\in\mathcal{A}_{1}(x)+v\subset\mathcal{A}_{1}(x)+\mathcal{A}_{2}(x)$. ∎

Finally, we recall some useful properties of monotone operators. Let $\mathcal{H}$ be a Hilbert space. A set-valued mapping $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ is called monotone provided

\[
\langle\bar{x}-\bar{y},x-y\rangle\geq 0\;\;\forall\;x,y\in\mathcal{H},\;\bar{x}\in\mathcal{A}(x)\;\text{and}\;\bar{y}\in\mathcal{A}(y).
\]

In addition, it is called maximally monotone if there is no monotone operator $\mathcal{B}$ such that the graph of $\mathcal{A}$ is strictly included in the graph of $\mathcal{B}$. The mapping $\mathcal{A}$ is called $\gamma$-strongly monotone if

\[
\langle\bar{x}-\bar{y},x-y\rangle\geq\gamma\|x-y\|^{2}\;\;\forall\;x,y\in\mathcal{H},\;\bar{x}\in\mathcal{A}(x)\;\text{and}\;\bar{y}\in\mathcal{A}(y).
\]

Note that such operators have been studied extensively because of their role in convex analysis and certain partial differential equations. The resolvent of a maximally monotone operator $\mathcal{A}$ is defined by $\mathbf{J}_{\mathcal{A}}:=(\mathbf{Id}+\mathcal{A})^{-1}$. It is well-known that resolvents are single-valued and non-expansive (see, e.g., [19]).
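As a concrete illustration (ours, not from the paper), take $\mathcal{A}=\partial|\cdot|$ on $\mathcal{H}={\mathbb{R}}$: the resolvent $\mathbf{J}_{\gamma\mathcal{A}}$ is the classical soft-thresholding map $v\mapsto\mathrm{sign}(v)\max(|v|-\gamma,0)$. The sketch below (the helper name `resolvent_abs` is ours) checks the non-expansiveness numerically on sampled pairs.

```python
import numpy as np

def resolvent_abs(v, gamma):
    """J_{gamma*A}(v) for A = subdifferential of |.|: the soft-thresholding map,
    i.e. the unique x solving v in x + gamma*Sign(x)."""
    return np.sign(v) * max(abs(v) - gamma, 0.0)

gamma = 0.5
rng = np.random.default_rng(0)
pts = rng.uniform(-3, 3, size=1000)
vals = np.array([resolvent_abs(v, gamma) for v in pts])

# Non-expansiveness: |J(v) - J(w)| <= |v - w| for all sampled consecutive pairs.
assert np.all(np.abs(vals[:-1] - vals[1:]) <= np.abs(pts[:-1] - pts[1:]) + 1e-12)
print("non-expansiveness of the resolvent verified on", len(pts) - 1, "sampled pairs")
```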

A set-valued mapping $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is called $\gamma$-coercive with modulus $\gamma>0$ if for all $x,y\in{\rm dom}\,\mathcal{A}$ and for all $\bar{x}\in\mathcal{A}(x)$, $\bar{y}\in\mathcal{A}(y)$, we have

\[
\|\bar{x}-\bar{y}\|\geq\gamma\|x-y\|. \tag{12}
\]

In order to simplify the writing we will use the notation: for all $x,y\in{\rm dom}\,\mathcal{A}$, we have

\[
\|\mathcal{A}(x)-\mathcal{A}(y)\|\geq\gamma\|x-y\|,
\]

to describe this property. It is easy to see that a $\gamma$-strongly monotone operator is $\gamma$-coercive. Another example is given by matrices with full column rank (see, e.g., [17]). Next we extend to a pair of set-valued mappings the notion of monotonicity of a pair of single-valued operators introduced in Hilbert spaces by Adly-Cojocaru-Le [4].

Definition 4.

Let two set-valued mappings $\mathcal{B},\mathcal{C}:\mathbb{X}\rightrightarrows\mathbb{Y}$ be given. The pair $(\mathcal{B},\mathcal{C})$ is called monotone if for all $x,y\in\mathbb{X}$, $x_{b}\in\mathcal{B}(x)$, $y_{b}\in\mathcal{B}(y)$, $x_{c}\in\mathcal{C}(x)$, $y_{c}\in\mathcal{C}(y)$, we have

\[
\|(x_{b}-y_{b})+(x_{c}-y_{c})\|^{2}\geq\|x_{b}-y_{b}\|^{2}+\|x_{c}-y_{c}\|^{2}. \tag{13}
\]

Using the notation introduced above, (13) can be written as

\[
\|\mathcal{B}(x)-\mathcal{B}(y)+\mathcal{C}(x)-\mathcal{C}(y)\|^{2}\geq\|\mathcal{B}(x)-\mathcal{B}(y)\|^{2}+\|\mathcal{C}(x)-\mathcal{C}(y)\|^{2}.
\]

Note that when $\mathbb{Y}=\mathcal{H}$ is a Hilbert space, the monotonicity of $(\mathcal{B},\mathcal{C})$ is equivalent to

\[
\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle\geq 0,\;\;\forall\;x,y\in\mathcal{H}.
\]

In addition, we say $(\mathcal{B},\mathcal{C})$ is $\gamma$-strongly monotone $(\gamma>0)$ if

\[
\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle\geq\gamma\|x-y\|^{2},\;\;\forall\;x,y\in\mathcal{H}.
\]
Remark 2.
  1. In Hilbert spaces, the monotonicity of $(\mathcal{B},\mathcal{C})$ means that the increments of $\mathcal{B}$ and $\mathcal{C}$ do not form an obtuse angle.

  2. If $(\mathcal{B},\mathcal{C})$ is monotone then for all $x,y\in\mathbb{X}$, one has
\[
\|\mathcal{B}(x)-\mathcal{B}(y)+\mathcal{C}(x)-\mathcal{C}(y)\|\geq\|\mathcal{B}(x)-\mathcal{B}(y)\|.
\]

  3. If $\mathcal{B}$ is monotone then the pair $(\mathcal{B},\mathbf{Id})$ is monotone.

  4. The strong monotonicity of the pair $(\mathcal{B},\mathcal{C})$ is important for the linear convergence of optimization algorithms solving the inclusion $0\in\mathcal{B}(x)$ [4]. When $\mathcal{B}$ fails to be monotone, with some suitable choice of $\mathcal{C}$, the pair $(\mathcal{B},\mathcal{C})$ becomes strongly monotone, as the following example shows.

Example 2.

Let $\mathcal{B},\mathcal{C}:{\mathbb{R}}^{2}\rightrightarrows{\mathbb{R}}^{2}$ be defined by

\[
\mathcal{B}(x_{1},x_{2})=\begin{pmatrix}\mathbf{Sign}(x_{2})+3x_{2}+\sin|x_{1}|\\ \mathbf{Sign}(x_{1})+3x_{1}+\cos|x_{2}|\end{pmatrix},\qquad\mathcal{C}(x_{1},x_{2})=\begin{pmatrix}3x_{2}\\ 3x_{1}\end{pmatrix}
\]

where

\[
\mathbf{Sign}(a)=\begin{cases}1&\text{if }a>0,\\ [-1,1]&\text{if }a=0,\\ -1&\text{if }a<0.\end{cases}
\]

Then $\mathcal{B},\mathcal{C}$ are not monotone but $(\mathcal{B},\mathcal{C})$ is $6$-strongly monotone. Indeed, for all $x=(x_{1},x_{2})$ and $y=(y_{1},y_{2})$, we have

\[
\begin{aligned}
\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle
&\geq 9(x_{2}-y_{2})^{2}+3(\sin|x_{1}|-\sin|y_{1}|)(x_{2}-y_{2})+9(x_{1}-y_{1})^{2}+3(\cos|x_{2}|-\cos|y_{2}|)(x_{1}-y_{1})\\
&\geq 6\|x-y\|^{2}.
\end{aligned}
\]
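A quick randomized sanity check of the displayed inequality (a sketch of ours, not part of the paper): we sample pairs of points, use an arbitrary selection of $\mathbf{Sign}$ at zero, and verify $\langle\mathcal{B}(x)-\mathcal{B}(y),\mathcal{C}(x)-\mathcal{C}(y)\rangle\geq 6\|x-y\|^{2}$.

```python
import numpy as np

def sign_sel(a, rng):
    """A selection of Sign: any value in [-1, 1] is allowed at a = 0."""
    return np.sign(a) if a != 0.0 else rng.uniform(-1.0, 1.0)

def B(x, rng):
    x1, x2 = x
    return np.array([sign_sel(x2, rng) + 3*x2 + np.sin(abs(x1)),
                     sign_sel(x1, rng) + 3*x1 + np.cos(abs(x2))])

def C(x):
    x1, x2 = x
    return np.array([3*x2, 3*x1])

rng = np.random.default_rng(1)
for _ in range(10000):
    x, y = rng.uniform(-5, 5, 2), rng.uniform(-5, 5, 2)
    lhs = np.dot(B(x, rng) - B(y, rng), C(x) - C(y))
    assert lhs >= 6*np.dot(x - y, x - y) - 1e-6   # 6-strong monotonicity of the pair (B, C)
print("6-strong monotonicity of (B, C) verified on 10000 random pairs")
```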

3 Properties of R-Continuity

First we show that the sum (and also the difference) of two R-continuous mappings is also R-continuous.

Theorem 4.

Suppose that $\mathcal{A}_{1},\mathcal{A}_{2}:\mathbb{X}\rightrightarrows\mathbb{Y}$ are R-continuous at $\bar{x}$. Then $\mathcal{A}_{1}+\mathcal{A}_{2}$ is also R-continuous at $\bar{x}$.

Proof.

Let $\rho_{1},\rho_{2}$ and $\sigma_{1},\sigma_{2}$ be the modulus functions and radii of $\mathcal{A}_{1}$ and $\mathcal{A}_{2}$, respectively. We set $\sigma:=\min\{\sigma_{1},\sigma_{2}\}$ and $\rho(r):=\max\{\rho_{1}(r),\rho_{2}(r)\}$ for all $r\geq 0$; then $\rho$ is non-decreasing and $\lim_{r\to 0^{+}}\rho(r)=\rho(0)=0$. Take $x\in\mathbb{B}(\bar{x},\sigma)$ and $y\in\mathcal{A}_{1}(x)+\mathcal{A}_{2}(x)$; then $y=y_{1}+y_{2}$ where $y_{1}\in\mathcal{A}_{1}(x)$ and $y_{2}\in\mathcal{A}_{2}(x)$. Since $\mathcal{A}_{1},\mathcal{A}_{2}$ are R-continuous at $\bar{x}$, there exist $\bar{y}_{1}\in\mathcal{A}_{1}(\bar{x})$ and $\bar{y}_{2}\in\mathcal{A}_{2}(\bar{x})$ such that $\|y_{1}-\bar{y}_{1}\|\leq\rho_{1}(\|x-\bar{x}\|)$ and $\|y_{2}-\bar{y}_{2}\|\leq\rho_{2}(\|x-\bar{x}\|)$. Let $\bar{y}=\bar{y}_{1}+\bar{y}_{2}\in\mathcal{A}_{1}(\bar{x})+\mathcal{A}_{2}(\bar{x})$; then

\[
\|y-\bar{y}\|\leq\|y_{1}-\bar{y}_{1}\|+\|y_{2}-\bar{y}_{2}\|\leq 2\rho(\|x-\bar{x}\|). \tag{14}
\]

This means that $\mathcal{A}_{1}+\mathcal{A}_{2}$ is R-continuous at $\bar{x}$ with modulus function $2\rho$ and radius $\sigma$. ∎

Next we show that R-continuity is satisfied for a large class of operators and is closely connected with the closed graph property at zero.

Theorem 5.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has a closed graph at zero and is locally compact at zero, then $\mathcal{A}$ is R-continuous at zero.

Proof.

We define the function $\rho$ as follows: set $\rho(0)=0$ and, for $\sigma>0$, set

\[
\rho(\sigma)=\inf\{\delta>0:\mathcal{A}(x)\subset\mathcal{A}(0)+\delta\mathbb{B},\;\;\forall x\in\sigma\mathbb{B}\}.
\]

It is easy to see that $\rho$ is well-defined and non-decreasing because $\mathcal{A}$ is locally compact at zero. Since $\rho$ is non-decreasing and bounded from below by $0$, $\lim_{\sigma\to 0^{+}}\rho(\sigma)$ exists. If we suppose that $\mathcal{A}$ is not R-continuous at $0$, then we must have $\lim_{\sigma\to 0^{+}}\rho(\sigma)=\delta^{*}>0$. Hence there exist two sequences $(x_{n})$, $(y_{n})$ such that $x_{n}\to 0$, $y_{n}\in\mathcal{A}(x_{n})$ and

\[
y_{n}\notin\mathcal{A}(0)+\frac{\delta^{*}}{2}\mathbb{B}. \tag{15}
\]

Since $\mathcal{A}$ is locally compact at zero, on relabeling if necessary, we may suppose that $(y_{n})$ converges to some $\bar{y}$, and $\bar{y}\in\mathcal{A}(0)$ because $\mathcal{A}$ has a closed graph at zero. This contradicts (15) and the proof is complete. ∎

Theorem 6.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is R-continuous at zero and $\mathcal{A}(0)$ is closed, then $\mathcal{A}$ has a closed graph at zero.

Proof.

Suppose that $y_{n}\to y$, $x_{n}\to 0$ and $y_{n}\in\mathcal{A}(x_{n})$. From the R-continuity of $\mathcal{A}$, we have

\[
y_{n}\in\mathcal{A}(x_{n})\subset\mathcal{A}(0)+\rho(\|x_{n}\|)\mathbb{B}.
\]

Since $y_{n}\to y$, $\rho(\|x_{n}\|)\to 0$ and $\mathcal{A}(0)$ is closed, we deduce that $y\in\mathcal{A}(0)$ and thus $\mathcal{A}$ has a closed graph at zero. ∎

Although R-continuity is straightforward, it always guarantees the consistency of the system in Hoffman’s sense [13].

Theorem 7.

Let $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ be a set-valued mapping. If $\mathcal{A}^{-1}$ is R-continuous at zero then the inclusion $0\in\mathcal{A}(x)$ is consistent in the sense of Hoffman, i.e., if $y_{\sigma}$ has a small norm and satisfies $y_{\sigma}\in\mathcal{A}(x_{\sigma})$, then there exists a solution $\bar{x}\in\mathbf{S}:=\mathcal{A}^{-1}(0)$ such that $x_{\sigma}$ is close to $\bar{x}$.

Proof.

Suppose that $\mathcal{A}^{-1}$ is R-continuous at zero with modulus function $\rho$ and radius $\sigma$. Let $y_{\sigma}\in\mathcal{A}(x_{\sigma})$ with $\|y_{\sigma}\|\leq\sigma$. We claim that we can find some $\bar{x}$ such that $0\in\mathcal{A}(\bar{x})$ and $\|x_{\sigma}-\bar{x}\|$ is also small. Indeed, we have

\[
x_{\sigma}\in\mathcal{A}^{-1}(y_{\sigma})\subset\mathcal{A}^{-1}(0)+\rho(\sigma)\mathbb{B}=\mathbf{S}+\rho(\sigma)\mathbb{B},
\]

which implies that

\[
\mathbf{d}(x_{\sigma},\mathbf{S})\leq\rho(\sigma)
\]

and the conclusion follows. ∎

Remark 3.

Let us note that to obtain the consistency property, metric subregularity or metric regularity has to be satisfied globally. Indeed, suppose that metric subregularity (9) holds and $\mathbf{d}(0,\mathcal{A}(x))$ is small but $x$ is not close to $\bar{x}$; then we cannot conclude that $x$ is close to a solution.

In optimization, when we consider the inclusion $0\in\mathcal{A}(x)$, we want to know whether $\mathcal{A}^{-1}$ is R-continuous or even R-Lipschitz continuous. Here we provide some cases where the global R-Lipschitz continuity is satisfied. It is known that the inverse of any square matrix is globally R-Lipschitz continuous [17, Proposition 2.10]. Now we prove that the inverse of any matrix is globally R-Lipschitz continuous. Note that this result cannot be deduced from the Robinson-Ursescu Theorem. Finally we give some nontrivial nonlinear examples (see also Proposition 1 and [5, Theorem 7]).

Theorem 8.

If $\mathbf{A}\in{\mathbb{R}}^{m\times n}$ is a matrix, then $\mathbf{A}^{-1}$ is globally R-Lipschitz continuous.

Proof.

Let $\bar{y}\in{\rm dom}\,\mathbf{A}^{-1}$ be given. For all $y\in{\rm dom}\,\mathbf{A}^{-1}$ and $x\in\mathbf{A}^{-1}(y)$, we want to find $\bar{x}\in\mathbf{A}^{-1}(\bar{y})$ such that

\[
\|x-\bar{x}\|\leq\kappa\|y-\bar{y}\|,
\]

for some $\kappa>0$. First we take some $x^{\prime}\in\mathbf{A}^{-1}(\bar{y})$. We have $y=\mathbf{A}x$ and $\bar{y}=\mathbf{A}x^{\prime}$. Note that $\mathbf{A}^{T}\mathbf{A}\in{\mathbb{R}}^{n\times n}$ is a positive semi-definite matrix. Let $x=x_{i}+x_{k}$ where $x_{i},x_{k}$ are the projections of $x$ onto ${\rm Im}(\mathbf{A}^{T}\mathbf{A})$ and ${\rm Ker}(\mathbf{A}^{T}\mathbf{A})={\rm Ker}(\mathbf{A})$, respectively. Similarly we decompose $x^{\prime}=x^{\prime}_{i}+x^{\prime}_{k}$. Then we have

\[
\|\mathbf{A}^{T}\|\|y-\bar{y}\|\geq\|\mathbf{A}^{T}(y-\bar{y})\|=\|\mathbf{A}^{T}\mathbf{A}(x_{i}-x_{i}^{\prime})\|\geq k\|x_{i}-x_{i}^{\prime}\|
\]

for some $k>0$ (see, e.g., [17, Lemma 2.8] or [27, Lemma 3]). Thus if we choose $\bar{x}=x_{i}^{\prime}+x_{k}$ then $\mathbf{A}\bar{x}=\mathbf{A}(x_{i}^{\prime}+x_{k})=\mathbf{A}(x_{i}^{\prime}+x^{\prime}_{k})=\mathbf{A}x^{\prime}=\bar{y}$ since $x_{k},x^{\prime}_{k}\in{\rm Ker}(\mathbf{A})$. In addition, we have

\[
\|x-\bar{x}\|=\|x_{i}-x_{i}^{\prime}\|\leq\frac{\|\mathbf{A}^{T}\|}{k}\|y-\bar{y}\|,
\]

and the conclusion follows. ∎
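As a sanity check of Theorem 8 (a sketch of ours, not part of the paper), the code below takes a rank-deficient matrix $\mathbf{A}$, draws $y=\mathbf{A}x$ and $\bar{y}=\mathbf{A}x^{\prime}$, builds $\bar{x}$ as in the proof by correcting only the component of $x$ orthogonal to ${\rm Ker}(\mathbf{A})$, and verifies $\|x-\bar{x}\|\leq\kappa\|y-\bar{y}\|$ with $\kappa$ taken as the reciprocal of the smallest positive singular value of $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]])   # rank-deficient: rank 1
P_row = np.linalg.pinv(A) @ A                         # projector onto Im(A^T A) = (Ker A)^perp

svals = np.linalg.svd(A, compute_uv=False)
kappa = 1.0 / svals[svals > 1e-12].min()              # 1 / smallest positive singular value

for _ in range(1000):
    x, xp = rng.normal(size=2), rng.normal(size=2)
    y, ybar = A @ x, A @ xp
    xbar = P_row @ xp + (x - P_row @ x)               # keep the Ker(A) part of x, replace the rest
    assert np.allclose(A @ xbar, ybar)                # xbar is indeed a preimage of ybar
    assert np.linalg.norm(x - xbar) <= kappa * np.linalg.norm(y - ybar) + 1e-9
print(f"upper Lipschitz bound for A^(-1) verified with kappa = {kappa:.3f}")
```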

Proposition 9.

If $\mathcal{A}:{\mathbb{R}}^{n}\rightrightarrows{\mathbb{R}}^{m}$ has the form $\mathcal{A}=\mathbf{B}+\mathcal{F}\circ\mathbf{B}$ where $\mathbf{B}\in{\mathbb{R}}^{m\times n}$ is a matrix and $\mathcal{F}:{\mathbb{R}}^{m}\rightrightarrows{\mathbb{R}}^{m}$ is monotone, then $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous.

Proof.

Suppose that $y=\mathbf{B}x+y_{1}\in\mathcal{A}(x)$ with $y_{1}\in\mathcal{F}(\mathbf{B}x)$, and let $\bar{y}\in{\rm dom}\,\mathcal{A}^{-1}$. We want to find $\bar{x}\in\mathcal{A}^{-1}(\bar{y})$ such that

\[
\|x-\bar{x}\|\leq\kappa\|y-\bar{y}\|,
\]

for some $\kappa>0$. First we take some $x^{\prime}\in\mathcal{A}^{-1}(\bar{y})$, i.e., $\bar{y}=\mathbf{B}x^{\prime}+\bar{y}_{1}\in\mathcal{A}(x^{\prime})$ with $\bar{y}_{1}\in\mathcal{F}(\mathbf{B}x^{\prime})$. Using the proof of Theorem 8, there exists $\bar{x}$ such that $\mathbf{B}\bar{x}=\mathbf{B}x^{\prime}$ and $\|\mathbf{B}x-\mathbf{B}\bar{x}\|\geq\kappa_{1}\|x-\bar{x}\|$ for some $\kappa_{1}>0$. Since $\mathcal{F}$ is monotone, one has

\[
\|y-\bar{y}\|^{2}=\|\mathbf{B}x-\mathbf{B}\bar{x}+y_{1}-\bar{y}_{1}\|^{2}\geq\|\mathbf{B}x-\mathbf{B}\bar{x}\|^{2}\geq\kappa_{1}^{2}\|x-\bar{x}\|^{2}
\]

and the conclusion follows. ∎

Proposition 10.

If $\mathcal{A}:\mathbb{X}\rightrightarrows\mathbb{Y}$ has the form $\mathcal{A}=\mathcal{B}+\mathcal{C}$ where $\mathcal{B}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is coercive and the pair $(\mathcal{B},\mathcal{C})$ is monotone, then $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous.

Proof.

Suppose that $\bar{y}=\bar{y}_{1}+\bar{y}_{2}\in\mathcal{A}(\bar{x})$ where $\bar{y}_{1}\in\mathcal{B}(\bar{x})$, $\bar{y}_{2}\in\mathcal{C}(\bar{x})$, and $y=y_{1}+y_{2}\in\mathcal{A}(x)$ where $y_{1}\in\mathcal{B}(x)$, $y_{2}\in\mathcal{C}(x)$. Since $\mathcal{B}:\mathbb{X}\rightrightarrows\mathbb{Y}$ is coercive and the pair $(\mathcal{B},\mathcal{C})$ is monotone, one has

\[
\|\bar{y}-y\|^{2}=\|\bar{y}_{1}-y_{1}+\bar{y}_{2}-y_{2}\|^{2}\geq\|\bar{y}_{1}-y_{1}\|^{2}\geq\kappa^{2}\|\bar{x}-x\|^{2},
\]

for some $\kappa>0$. Thus $\mathcal{A}^{-1}$ is single-valued and Lipschitz continuous, and hence R-Lipschitz continuous. ∎

4 Explicit Convergence Rate of $\mathbf{PPA}$ under R-Continuity

Let $\mathcal{A}:\mathcal{H}\rightrightarrows\mathcal{H}$ be a maximally monotone operator with $\mathbf{S}:=\mathcal{A}^{-1}(0)$ nonempty, where $\mathcal{H}$ is a Hilbert space. Let us recall that $\mathbf{PPA}$ generates, for any starting point $x_{0}$, a sequence $(x_{n})$ defined by the rule:

\[
x_{n+1}=\mathbf{J}_{\gamma\mathcal{A}}(x_{n}),\;x_{0}\in\mathcal{H}, \tag{16}
\]

for some $\gamma>0$, where $\mathbf{J}_{\gamma\mathcal{A}}(x):=(\mathbf{Id}+\gamma\mathcal{A})^{-1}(x)$. $\mathbf{PPA}$ plays an important role in convex optimization if $\mathcal{A}$ is set-valued and has no special structure. The following result shows that the convergence rate of $\mathbf{PPA}$ is indeed linear if $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero.

Theorem 11.

Suppose that $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero with modulus function $\rho(r)=Lr$ for some $L>0$. Then for $n\geq 1$, the sequence $(\mathbf{d}(x_{n},\mathbf{S}))_{n\in{\mathbb{N}}}$ converges to zero with linear rate, where $(x_{n})$ is the sequence generated by $\mathbf{PPA}$ with $\gamma>2L$.

Proof.

Let $\bar{x}_{n}=\mathbf{Proj}_{\mathbf{S}}(x_{n})$. We have $\|x_{n+1}-\bar{x}_{n}\|\leq\|x_{n}-\bar{x}_{n}\|=\mathbf{d}(x_{n},\mathbf{S})$, where the inequality is obtained by using the nonexpansiveness of the resolvent. We have

\[
-\frac{x_{n+1}-x_{n}}{\gamma}\in\mathcal{A}(x_{n+1}). \tag{17}
\]

Since $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero, we obtain

\[
x_{n+1}\in\mathcal{A}^{-1}\Big(-\frac{x_{n+1}-x_{n}}{\gamma}\Big)\subset\mathcal{A}^{-1}(0)+\rho\Big(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\Big)\mathbb{B}=\mathbf{S}+\rho\Big(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\Big)\mathbb{B}.
\]

Consequently

\[
\mathbf{d}(x_{n+1},\mathbf{S})\leq\frac{L\|x_{n+1}-x_{n}\|}{\gamma}\leq\frac{L}{\gamma}\Big(\|x_{n+1}-\bar{x}_{n}\|+\|x_{n}-\bar{x}_{n}\|\Big)\leq\frac{2L}{\gamma}\|x_{n}-\bar{x}_{n}\|=\kappa\,\mathbf{d}(x_{n},\mathbf{S})
\]

where $\kappa=\frac{2L}{\gamma}<1$ by choosing $\gamma>2L$. ∎
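To illustrate Theorem 11 numerically (a sketch of ours, under the assumption that $\mathcal{A}$ is the linear maximally monotone operator given by the positive semi-definite matrix ${\rm diag}(1,0)$ on ${\mathbb{R}}^{2}$): here $\mathbf{S}=\mathcal{A}^{-1}(0)$ is the whole $x_{2}$-axis and $\mathcal{A}^{-1}$ is globally R-Lipschitz continuous at zero with $L=1$, so running $\mathbf{PPA}$ with $\gamma>2L$ should give a per-step contraction of $\mathbf{d}(x_{n},\mathbf{S})$ no worse than $2L/\gamma$.

```python
import numpy as np

# A(x) = M x with M = diag(1, 0) (PSD, hence maximally monotone).
# A^{-1}(y) = {(y1, t) : t real} when y2 = 0, so S = A^{-1}(0) is the x2-axis and
# ex(A^{-1}(y), S) = |y1| <= ||y||: A^{-1} is globally R-Lipschitz at zero with L = 1.
M = np.diag([1.0, 0.0])
L, gamma = 1.0, 4.0                       # Theorem 11 asks for gamma > 2L
J = np.linalg.inv(np.eye(2) + gamma * M)  # resolvent of gamma*A (linear case)

x = np.array([5.0, -3.0])
dist = lambda z: abs(z[0])                # d(x, S) = |x_1| since S = {0} x R
for n in range(6):
    x_next = J @ x
    ratio = dist(x_next) / dist(x)
    print(f"n={n}  d(x_n,S)={dist(x):.3e}  ratio={ratio:.3f}  (bound 2L/gamma={2*L/gamma})")
    x = x_next
```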

Next we consider the convergence rate of $\mathbf{PPA}$ under only the R-continuity of $\mathcal{A}^{-1}$. The following result is similar to [3, Lemma 3.1]. Here we give a proof.

Lemma 12.

Let $f:[t_{0},\infty)\to{\mathbb{R}}^{+}$ be a nonincreasing function which is locally absolutely continuous and belongs to $\mathbf{L}^{1}(t_{0},\infty)$. Then $\lim_{t\to\infty}tf(t)=0$.

Proof.

Without loss of generality, suppose that $t_{0}\geq 0$, since the limit only involves large $t$. Let $F(t)=tf(t)$; then $F$ is locally absolutely continuous, bounded from below by $0$, and for almost all $t\geq t_{0}$ we have

\[
\frac{d}{dt}F(t)=f(t)+tf^{\prime}(t)\leq f(t).
\]

Since $f\in\mathbf{L}^{1}(t_{0},\infty)$, using [1, Lemma 5.1], we deduce that $\lim_{t\to\infty}F(t)$ exists. Suppose that $\lim_{t\to\infty}F(t)=c>0$. Then there exist $T>0$, $c_{1}>0$ such that for all $t\geq T$ one has $F(t)\geq c_{1}$, or equivalently $f(t)\geq c_{1}t^{-1}$. Then

\[
\int_{T}^{\infty}f(t)\,dt\geq c_{1}\int_{T}^{\infty}t^{-1}\,dt=\infty,
\]

a contradiction. Thus we must have $\lim_{t\to\infty}F(t)=0$, or equivalently $\lim_{t\to\infty}tf(t)=0$. ∎

Lemma 13.

Let $(a_{n})$ be a nonincreasing nonnegative sequence such that $\sum_{n=1}^{\infty}a_{n}<\infty$. Then we have $\lim_{n\to\infty}na_{n}=0$.

Proof.

Since $\sum_{n=1}^{\infty}a_{n}<\infty$, we have $\sum_{n=1}^{\infty}a_{n+1}<\infty$ and thus $\sum_{n=1}^{\infty}b_{n}<\infty$ where $b_{n}:=(a_{n}+a_{n+1})/2$. Let $f:[1,\infty)\to{\mathbb{R}}^{+}$ be defined as follows: if $t\in[n,n+1]$ then $f(t)=a_{n}+(a_{n+1}-a_{n})(t-n)$, $n\geq 1$. Then $f$ is a nonincreasing function which is locally absolutely continuous and belongs to $\mathbf{L}^{1}(1,\infty)$ since

\[
\int_{1}^{\infty}f(t)\,dt=\sum_{n=1}^{\infty}b_{n}<\infty.
\]

Note that if $t\in[n-1,n)$ then $f(t)\geq a_{n}$ and $tf(t)\geq(n-1)a_{n}$. Using Lemma 12, we have $\lim_{t\to\infty}tf(t)=0$ and thus

\[
\lim_{n\to\infty}(n-1)a_{n}=0.
\]

Therefore $\lim_{n\to\infty}na_{n}=0$ since $\lim_{n\to\infty}a_{n}=0$. ∎

Remark 4.

The assumption that the sequence $(a_{n})$ is nonincreasing cannot be omitted. For example, let $n_{k}=k^{2}$, $k=1,2,3,\ldots$, and $x_{n}=0$ if $n\neq n_{k}$ and $x_{n}=\frac{1}{n}$ if $n=n_{k}$. Then

\[
\sum_{n=1}^{\infty}x_{n}=\sum_{k=1}^{\infty}\frac{1}{k^{2}}<\infty.
\]

However, $nx_{n}=0$ if $n\neq n_{k}$ and $nx_{n}=1$ if $n=n_{k}$. Thus the sequence $(nx_{n})$ is not convergent.

Theorem 14.

Suppose that $f:\mathcal{H}\to{\mathbb{R}}\cup\{+\infty\}$ is a proper lower semicontinuous convex function such that $\inf f>-\infty$ and $(\partial f)^{-1}$ is R-continuous at zero with modulus function $\rho$ and radius $\sigma$. Let $(x_{n})$ be the sequence generated by the proximal point algorithm

\[
x_{0}\in\mathcal{H},\quad x_{n+1}=\mathbf{J}_{\gamma\partial f}(x_{n}),\;\text{for some}\;\gamma>0. \tag{18}
\]

We have:
a) $\|x_{n+1}-x_{n}\|^{2}=o(\frac{1}{n})$.
b) Let $n_{0}$ be such that $\frac{\|x_{n_{0}+1}-x_{n_{0}}\|}{\gamma}\leq\sigma$. Then for all $n\geq n_{0}$, we have

\[
\mathbf{d}(x_{n},\mathbf{S})\leq\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)\to 0
\]

and

\[
f(x_{n})-f^{*}\leq\mathbf{d}(x_{n},\mathbf{S})\,o\Big(\frac{1}{\sqrt{n}}\Big)\leq\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)o\Big(\frac{1}{\sqrt{n}}\Big)\to 0
\]

where $f^{*}=\min_{x\in\mathcal{H}}f(x)$.

c) If $\rho(r)=Lr$ for some $L>0$ and $\gamma>2L$, then for $n\geq n_{0}$ the distance $\mathbf{d}(x_{n},\mathbf{S})$ converges to zero with linear rate, and so does $f(x_{n})-f^{*}$.

Proof.

a) From

\[
-\frac{x_{n+1}-x_{n}}{\gamma}\in\partial f(x_{n+1}) \tag{19}
\]

and the fact that $f$ is convex, we deduce that

\[
\left\langle-\frac{x_{n+1}-x_{n}}{\gamma},x_{n}-x_{n+1}\right\rangle\leq f(x_{n})-f(x_{n+1}).
\]

Hence

\[
\sum_{n=1}^{\infty}\|x_{n+1}-x_{n}\|^{2}\leq\gamma(f(x_{0})-\inf f)<\infty
\]

and thus $\|x_{n+1}-x_{n}\|$ converges to zero. Since $(\|x_{n+1}-x_{n}\|)_{n}$ is nonincreasing due to the nonexpansiveness of the resolvent, using Lemma 13 we have

\[
\|x_{n+1}-x_{n}\|^{2}=o\Big(\frac{1}{n}\Big).
\]

b) From (19) and the R-continuity of $(\partial f)^{-1}$, for $n\geq n_{0}$, we obtain

\[
x_{n+1}\in(\partial f)^{-1}\left(-\frac{x_{n+1}-x_{n}}{\gamma}\right)\subset(\partial f)^{-1}(0)+\rho\left(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\right)\mathbb{B}=\mathbf{S}+\rho\left(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\right)\mathbb{B}.
\]

Therefore

\[
\mathbf{d}(x_{n+1},\mathbf{S})\leq\rho\Big(\frac{\|x_{n+1}-x_{n}\|}{\gamma}\Big)=\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)\to 0,\;\;\text{as}\;\;n\to\infty. \tag{20}
\]

Let $x_{n+1}^{*}=\mathbf{Proj}_{\mathbf{S}}(x_{n+1})$; then $f(x_{n+1}^{*})=f^{*}$. Using (19), one has

\[
\left\langle-\frac{x_{n+1}-x_{n}}{\gamma},x_{n+1}^{*}-x_{n+1}\right\rangle\leq f(x_{n+1}^{*})-f(x_{n+1}),
\]

which implies that

\[
f(x_{n+1})-f^{*}\leq\left\|\frac{x_{n+1}-x_{n}}{\gamma}\right\|\mathbf{d}(x_{n+1},\mathbf{S})\leq\rho\Big(o\Big(\frac{1}{\sqrt{n}}\Big)\Big)o\Big(\frac{1}{\sqrt{n}}\Big)\to 0.
\]

c) Let $\bar{x}_{n}=\mathbf{Proj}_{\mathbf{S}}(x_{n})$. We know that $\|x_{n+1}-\bar{x}_{n}\|\leq\|x_{n}-\bar{x}_{n}\|=\mathbf{d}(x_{n},\mathbf{S})$. From (20), for $n\geq n_{0}$, we have

\[
\mathbf{d}(x_{n+1},\mathbf{S})\leq\frac{L\|x_{n+1}-x_{n}\|}{\gamma}\leq\frac{L}{\gamma}\Big(\|x_{n+1}-\bar{x}_{n}\|+\|x_{n}-\bar{x}_{n}\|\Big)\leq\frac{2L}{\gamma}\|x_{n}-\bar{x}_{n}\|=\kappa\,\mathbf{d}(x_{n},\mathbf{S})
\]

where $\kappa=\frac{2L}{\gamma}<1$ by choosing $\gamma>2L$. ∎
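As an illustration of part b) (a sketch of ours, not from the paper), take $f(x)=x^{4}/4$ on $\mathcal{H}={\mathbb{R}}$: then $(\partial f)^{-1}(y)=y^{1/3}$ is R-continuous at zero with $\rho(r)=r^{1/3}$ but not R-Lipschitz, $\mathbf{S}=\{0\}$ and $f^{*}=0$. The sketch below runs (18) and compares $\mathbf{d}(x_{n},\mathbf{S})$ with the guaranteed upper-bound shape $\rho(n^{-1/2})=n^{-1/6}$; the observed decay can of course be faster than the guarantee.

```python
import numpy as np

gamma = 1.0

def prox(v):
    """J_{gamma*df}(v) for f(x) = x^4/4: the unique real solution of x + gamma*x^3 = v."""
    roots = np.roots([gamma, 0.0, 1.0, -v])
    return float(roots[np.argmin(np.abs(roots.imag))].real)

x = 2.0
for n in range(1, 10001):
    x = prox(x)                         # PPA step (18) with A = df
    if n in (10, 100, 1000, 10000):
        # d(x_n, S) = |x_n| since S = {0}; n^(-1/6) is the upper-bound shape from part b)
        print(f"n={n:>6}  d(x_n,S)={abs(x):.3e}  n^(-1/6)={n**(-1/6):.3e}")
```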

Remark 5.

R-continuity ensures that $\mathbf{PPA}$ (and also $\mathbf{DCA}$ [17]) is consistent in the numerical sense, i.e., if $\|x_{n+1}-x_{n}\|$ is small then $x_{n}$ is close to a solution. It provides an estimate of the distance between $x_{n}$ and the solution set based on the convergence rate of $\|x_{n+1}-x_{n}\|$ to zero.

5 Conclusions

In this paper, we highlighted several advantages of R-continuity compared to other key tools used in optimization, such as metric regularity, metric subregularity and calmness. We explored important properties of R-continuity and derived an explicit convergence rate for the Proximal Point Algorithm ($\mathbf{PPA}$) under this framework. We believe that the technique developed in the paper can be extended to gain further insights into the convergence rate of other optimization algorithms.

6 Acknowledgements

This research benefited from the support of the FMJH Program Gaspard Monge for optimization and operations research and their interactions with data science.

References

  • [1] B. Abbas, H. Attouch, B. F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, J. Optim. Theory Appl., 161 (2), 331–360, 2014
  • [2] V. Acary, O. Bonnefon, B. Brogliato, Nonsmooth Modeling and Simulation for Switched Circuits, Lecture Notes in Electrical Engineering Vol 69. Springer Netherlands, 2011
  • [3] S. Adly, H. Attouch, M. H. Le, A doubly nonlinear evolution system with threshold effects associated with dry friction, J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02417-2
  • [4] S. Adly, M. G. Cojocaru, B. K. Le, State-Dependent Sweeping Processes: Asymptotic Behavior and Algorithmic Approaches. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02485-4
  • [5] S. Adly, A. L. Dontchev, M. Théra, On one-sided Lipschitz stability of set-valued contractions. Numer. Funct. Anal. Optim. 35, 837–850 (2014)
  • [6] H. Attouch, G. Buttazzo, G. Michaille, Variational Analysis in Sobolev and BV Spaces, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014
  • [7] H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
  • [8] H. Brezis, Opérateurs Maximaux Monotones et Semi-groupes de Contractions dans les Espaces de Hilbert, Math. Studies 5, North-Holland American Elsevier, 1973
  • [9] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer New York, NY, 2010
  • [10] A. V. Dmitruk, A. Y. Kruger, Metric regularity and systems of generalized equations, J. Math. Anal. Appl., 342(2):864–873, 2008
  • [11] A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings. A View from Variational Analysis. Springer Monographs in Mathematics. Springer, Dordrecht, 2009
  • [12] R. Henrion, J. V. Outrata, Calmness of constraint systems with applications, Math. Program., Ser. B 104, 437–464 (2005)
  • [13] A. J. Hoffman, On approximate solutions of systems of linear inequalities, J. Research Nat. Bureau Standards 49(4) (1952), 263–265
  • [14] A. D. Ioffe, Metric regularity–a survey. Part I. Theory. J. Aust. Math. Soc., 101(2): 188–243, 2016.
  • [15] A. D. Ioffe, J. V. Outrata, On metric and calmness qualification conditions in subdifferential calculus. Set-Valued Anal., 16(2-3): 199–227, 2008
  • [16] A. Y. Kruger, Nonlinear metric subregularity. J. Optim. Theory Appl., 171(3): 820–855, 2016.
  • [17] B. K. Le, R-Continuity with Applications to Convergence Analysis of Tikhonov Regularization and DC Programming, Journal of Convex Analysis, Volume 31 (2024), No. 1, 243–254
  • [18] B. Martinet, Régularisation d’inéquations variationnelles par approximations successives, Rev. Française Inf. Rech. Oper., (1970), pp. 154–159
  • [19] G. J. Minty, Monotone (Nonlinear) Operators in Hilbert Space. Duke Mathematical Journal, 29, 341–346, (1962).
  • [20] Ch. A. Micchelli, L. Chen and Y. Xu, Proximity algorithms for image models: Denoising, Inverse Problems, Vol. 27(4) 045009, 2011
  • [21] H. V. Ngai, M. Théra, Error Bounds in Metric Spaces and Application to the Perturbation Stability of Metric Regularity, SIAM J. Optim. 19(1), 1–20 (2008)
  • [22] J. P. Penot, Calculus without Derivatives, Springer New York, 2014
  • [23] S. M. Robinson, Regularity and stability for convex multivalued functions, Math. Oper. Res. 1 (1976), 130–143
  • [24] S. M. Robinson, Some continuity properties of polyhedral multifunctions, Math. Program. Study, 14 (1981) 206–214
  • [25] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optimization 14(5), 877–898, 1976
  • [26] R. T. Rockafellar, R.J.-B. Wets, Variational analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer- Verlag, Berlin, 1998
  • [27] A. Tanwani, B. Brogliato, C. Prieur, Well-Posedness and Output Regulation for Implicit Time-Varying Evolution Variational Inequalities, SIAM J. Control Opti., Vol. 56(2), 751–781, 2018
  • [28] L. Thibault, Unilateral Variational Analysis in Banach Spaces. World Scientific, 2023
  • [29] C. Ursescu, Multifunctions with closed convex graphs, Czechoslovak Math. J. 25 (1975), 438–441.
  • [30] X. Y. Zheng and K. F. Ng, Metric subregularity and constraint qualifications for convex generalized equations in Banach spaces. SIAM J. Optim., 18(2):437–460, 2007
  • [31] X. Y. Zheng and K. F. Ng, Metric subregularity and calmness for nonconvex generalized equations in Banach spaces, SIAM J. Optim., 20(5):2119–2136, 2010