Explicit Convergence Rate of The Proximal Point Algorithm under R-Continuity
Abstract
The paper provides a thorough comparison between R-continuity and other fundamental tools in optimization such as metric regularity, metric subregularity and calmness. We show that R-continuity has some advantages in the convergence rate analysis of algorithms solving optimization problems. We also present some properties of R-continuity and study the explicit convergence rate of the Proximal Point Algorithm under the R-continuity.
Keywords. R-continuity, metric regularity, metric subregularity, calmness, Proximal Point Algorithm, convergence rate
AMS Subject Classification. 28B05, 34A36, 34A60, 49J52, 49J53, 93D20
1 Introduction
In what follows, and are real Banach spaces whose norms are designated by . We use the notation to denote the closed ball with center and radius and by the closed unit ball which consists of the elements of norm less than or equal to . By a set-valued mapping , we mean a mapping which assigns to each a subset (possibly empty) of . The domain, the range and the graph of are defined respectively by
and
As usual we denote by the inverse of defined by
The notion stands for the distance from a point to a subset :
Given a set-valued mapping defined in a Hilbert space , and inspired by Rockafellar’s paper [25], recently B. K. Le [17] introduced the notion of R-continuity for studying the convergence rate of the Tikhonov regularization of the inclusion
(1) |
In [17], it was proved that R-continuity is a useful tool to analyze the convergence of (Difference of Convex Algorithm) and can explain why is effective in approximating solutions for a broad class of functions. In recent decades, there has been a surge of interest to study variational inclusions such as (1) since they modelise a variety of important systems. This is the case especially in optimization when the condition for critical points is considered (the Fermat rule). Another connection with the inclusion (1) arises in PDEs and is well discussed in the books [6, 9]. The fact that the solution set of (1) involves the inverse operator of , conducts naturally to study the continuity of at zero. According to [17], the mapping is said to be R-continuous at zero if there exist a radius and a non-decreasing modulus function satisfying such that
(2) |
Denoting by , the excess of the set over the set , with the convention that when is nonempty and for any , (2) is equivalent to saying that
(3) |
When for some , (3) is changed into
(4) |
In this case, is referred to as R-Lipschitz continuous at zero or equivalently upper Lipschitz at zero in the sense of Robinson [24] or outer Lipschitz continuous at zero [5]. In addition, if is a singleton, R-Lipschitz continuity at zero of is exactly the Lipschitz continuity at zero introduced by R. T. Rockafellar in [25] who considered Lipschitz continuity at zero of set-valued mappings as an important tool in optimization. Rockafellar demonstrated the linear convergence rate of the proximal point algorithm after a finite number of iterations from any starting point . However the requirement that the solution set is a singleton is often quite restrictive in practice. Thus, allowing the solution set to be set-valued and and not requiring the continuity modulus function to be Lipschitz continuous makes R-continuity a competitive alternative. It provides a viable option alongside other fundamental concepts such as metric regularity, metric subregularity or calmness in the study of sensitivity analysis as well as in establishing the convergence rate of algorithms. Note that R-continuity can be also extended to Banach spaces. In order to compare these regularity concepts, let us recall the definitions of metric regularity, metric subregularity and calmness (see, e.g., [14, 15, 12, 30, 31, 10, 16, 26, 11, 21, 22, 28] and the references therein). Given where are real Banach spaces and , one says that is metrically regular near if a linear error bound holds
(5) |
for some and for all close to . If (5) is satisfied for all , then is termed globally metrically regular. When , inequality (5) provides an estimate of how far is a point around to the solution set . However, metric regularity can be too stringent as it requires the inequality (5) to hold in a neighborhood of . One says that is calm at if there exist such that
(6) |
or equivalently
(7) |
Note that when the set is empty the inequality (7) is always satisfied. When , (7) becomes
(8) |
Another well-known regularity concept is metric subregularity. The set-valued mapping is called metrically subregular at if there exists a constant such that
(9) |
It is known (see, e.g., [14, Proposition 2.62]) that metric subregularity of at is equivalent to calmness of at . From (3) and (8), let us observe that if is R-Lipschitz continuous at 0 then it is calm at for any . While metric regularity is too strong, calmness is relatively weak to deduce useful properties. For example when the set is empty for some , no information can be deduced. In addition, with complicated , we do not know a specific and have to use computers to find an approximation of an element of . The first advantage of R-continuity is that it is unnecessary to know a prior solution in advance. Furthermore in (2), since is small, for each , there exists some close to . This fact is meaningful since it is difficult to ensure to be in the vicinity of , which is unknown. Secondly, R-continuity is straightforward and always guarantees the system consistency in Hoffman’s sense [13]. Indeed, Theorem 7 establishes that under the R-continuity of at zero, the inclusion (1) is consistent, i.e., if has a small norm and satisfies , then we can find a solution such that is close to . Note that to guarantee the consistency property, metric subregularity or metric regularity must be satisfied globally. Thirdly, R-continuity does not require the property (2) to hold around a point belonging to the graph of the operator; it is easily verified for a broad class of set-valued mappings. In order to achieve the metric regularity at some point , the celebrated Robinson-Ursescu Theorem [23, 29] requires for the operator to have a closed and convex graph and also that (the interior of . R-continuity only necessitates that the graph of the operators be closed. Indeed, if the operator has a closed graph at zero and is locally compact at zero then it is R-continuous at zero (Theorem 5). Conversely, if the operator is R-continuous at zero and its value at zero is closed, then it has a closed graph at zero (Theorem 6). The condition for having its graph closed at zero is relatively mild. If an operator’s graph is closed, then it has a closed graph at zero as well. Furthermore, an operator has a closed graph if and only if its inverse has a closed graph. Closed graph operators are usually found in optimization when dealing with continuous single-valued mappings, maximally monotone operators (see, e.g., [8]), the sum of two closed graph operators where one of them is single-valued, the sum of two closed graph set-valued operators where one of them is locally compact (Proposition 3). The Sign function in , which equals to the convex subdifferential of the norm function, is a well-known example of an operator with compact range and is widely used in image processing, mechanical and electrical engineering (see, e.g., [2, 20]).
Finally, we demonstrate that R-continuity can be used to analyse the explicit convergence rate of the proximal point algorithm () when applied to a maximally monotone operator , where is a Hilbert space. The , introduced by B. Martinet and further developed by Rockafellar, Bauschke and Combettes and others (see, e.g., [7, 18, 25]), is an essential tool in convex optimization if is set-valued and lacks special structure. We show that if is globally R-Lipschitz continuous at zero (i.e., ), the convergence of the proximal point algorithm is linear from the beginning (Theorem 11). When considering the case where , i.e. when is the convex subdifferential of a proper lower semicontinuous extended real-valued convex function, not necessarily smooth , if is only R-continuous, we obtain an explicit convergence rate of based on the modulus function. If is R-Lipschitz continuous at zero, one has the linear convergence of the generated sequence after some iterations (Theorem 14).
The paper is organized as follows. First we review the necessary material about R-continuity of set-valued mappings and maximally monotone operators in Section 2. Some key properties of R-continuity are established in Section 3. The analysis of the convergence rate of under the R-continuity is presented in Section 4. The paper ends in Section 5 with some conclusions and perspectives.
2 Mathematical preliminaries
In what follows, we extend the notion of R-continuity introduced in [17] from Hilbert spaces to Banach spaces, defined for any point in the domain of the operators. Let be a set-valued mapping where are Banach spaces and .
Definition 1.
The set-valued mapping is called R-continuous at if there exist and a non-decreasing function satisfying such that
(10) |
or equivalently, for each , there exists such that for all .
The function is called a continuity modulus function of at and is called the radius. We say that is -Lipschitz continuous at with modulus if for some . In addition, if then is said globally R-Lipschitz continuous at .
Remark 1.
The set-valued mapping is R-continuous at with modulus function and radius iff is R-continuous at with modulus function and radius .
Definition 2.
The set-valued mapping is called R-continuous if it is R-continuous at any point in its domain. It is called R-Lipschitz continuous if it is R-Lipschitz continuous at any point in its domain. In addition, if the radius , then it is called globally R-Lipschitz continuous.
Proposition 1.
If is globally metrically regular then is globally R-Lipschitz continuous.
Proof.
Let be given and . We have and since is globally metrically regular, one has
(11) |
for some . Since is arbitrary, we deduce that
and the conclusion follows. ∎
The following simple example shows that the metric regularity is strictly stronger than R-Lipschitz continuity. This important fact means that we still obtain the convergence rate of optimization algorithms if only R-Lipschitz continuity or even R-continuity holds (see, e.g., [17]).
Example 1.
Let be defined by
Then is R-Lipschitz continuous for any positive modulus. However, if and then
Thus is not metrically regular at .
In the text that follows, we consider set-valued mappings with closed graph.
Definition 3.
We say that has a closed graph if and then . It is said that has a closed graph at zero if and then .
Proposition 2.
The set-valued mapping has a closed graph if and only if has a closed graph.
Proof.
It follows directly from the definition of the inverse set-valued mapping . ∎
Proposition 3.
Suppose that where has a closed graph;
a) If is single-valued, then has a closed graph;
b) If is locally compact, i.e., for each there exists such that the set is compact, then has a closed graph.
Proof.
a) Let and . We have for some . Since and has a closed graph, we imply that and the conclusion follows.
b) Similarly let and . We have for some and . Since is locally compact, there exists a subsequence of , without relabelling w.l.o.g, converging to some . Thus converges to . Therefore
∎
Finally some useful properties of monotone operators are reminded. Let be a Hilbert space, a set-valued mapping is called monotone provided
In addition, it is called maximally monotone if there is no monotone operator such that the graph of is strictly included in the graph of . The mapping is called -strongly monotone if
Note that such operators have been studied extensively because of their role in convex analysis and certain partial differential equations. The resolvent of a maximally monotone operator is defined respectively by . It is well-known that resolvents are single-valued and non-expansive (see, e.g., [19]).
A set-valued mapping is called -coercive with modulus if for all and for all , , we have
(12) |
In order to simplify the writing we will use the notation: for all , we have
to describe this property. It is easy to see that a -strongly monotone operator is -coercive. Another example is given by matrices with full column rank (see, e.g., [17]). Next we extend to a pair of set-valued mappings the notion of monotonicity of a pair of single-valued operators introduced in Hilbert spaces by Adly-Cojocaru-Le [4].
Definition 4.
Let be given two set-valued mappings . The pair is called monotone if for all , we have
(13) |
Equivalently, using the notation previously given (13) is equivalent to
Note that when is a Hilbert space, the monotonicity of is equivalent to
In addition, we say is -strongly monotone if
Remark 2.
-
1.
In Hilbert spaces, the monotonicity of means that the increments of and does not form an obtuse angle.
-
2.
If is monotone then for all , one has
-
3.
If is monotone then the pair is monotone.
-
4.
The strong monotonicity of the pair is important for the linear convergence of optimization algorithms solving the inclusion [4].When fails to be monotone, with some suitable choice of , the pair becomes strongly monotone, as the following example shows.
Example 2.
Let be defined by
where
Then are not monotone but is -strongly monotone. Indeed for all and , we have
3 Properties of R-Continuity
First we show that the sum (and also the difference) of two R-continuous mappings is also R-continuous.
Theorem 4.
Suppose that are R-continuous at then is also R-continuous at .
Proof.
Let and be the modulus functions and radii of and respectively. We set and for all then is non-decreasing and . Taking and , then where and . Since are R-continuous at there exist and such that and . Let then
(14) |
It means that is R-continuous at with modulus function and radius . ∎
Next we show that R-continuity is satisfied for a large class of operators and has a closed connection with the closed graph property at zero.
Theorem 5.
If has a closed graph at zero and is locally compact at zero then is -continuous at zero.
Proof.
We define the function as follows: set and if , set
It is easy to see that is well-defined and non-decreasing because is locally compact at zero. Since is non-decreasing and bounded from below by , exists. If we suppose that is not R-continuous at , then we must have . Hence there exist two sequences , such that , and
(15) |
Since is locally compact at zero, on relabeling if necessary, we may suppose that converges to since has a closed graph at zero. This contradicts and the proof is complete. ∎
Theorem 6.
If is R-continuous at zero and is closed then has a closed graph at zero.
Proof.
Suppose that , and . From the R-continuity of , we have
Since , and is closed, we deduce that and thus has a closed graph at zero. ∎
Although R-continuity is straightforward, it always guarantees the consistency of the system in Hoffman’s sense [13].
Theorem 7.
Let be a set-valued mapping. If is R-continuous at zero then the inclusion is consistent in the sense of Hoffman, i.e., if has a small norm satisfying then there exists a solution such that is close to .
Proof.
Suppose that is R-continuous at zero with modulus function and radius . Let and . We claim that we can find some such that and is also small. Indeed, we have
which implies that
and the conclusion follows. ∎
Remark 3.
Let us note that to obtain the consistency property, metric subregularity or metric regularity have to be satisfied globally. Indeed suppose that metric subregularity (9) holds and is small but is not close to , we cannot conclude that is close to a solution.
In optimization, when we consider the inclusion , we want to know whether is R-continuous or even R-Lipschitz continuous. Here we provide some cases where the global R-Lipschitz continuity is satisfied. It is known that the inverse of any square matrix is globally R-Lipschitz continuous [17, Proposition 2.10]. Now we prove that the inverse of any matrix is globally R-Lipschitz continuous. Note that this result cannot be deduced from the Robinson-Ursescu Theorem. Finally we give some nontrivial nonlinear examples (see also Proposition 1 and [5, Theorem 7]).
Theorem 8.
If is a matrix, then is globally R-Lipschitz continuous.
Proof.
Let be given . For all and , we want to find such that
for some . First we take some . We have and . Note that is a positive semi-definite matrix. Let where are the projections of onto and , respectively. Similarly we decompose . Then we have
for some (see, e.g., [17, Lemma 2.8] or [27, Lemma 3]). Thus if we choose then since . In addition, we have
and the conclusion follows. ∎
Proposition 9.
If has the form where is a matrix and is monotone then is globally R-Lipschitz continuous.
Proof.
Suppose that and . We want to find such that
for some . First we take some , i.e., . Using the proof of Theorem 8, there exists such that and for some . Since is monotone, one has
and the conclusion follows. ∎
Proposition 10.
If has the form where is coercive and the pair is monotone then is globally R-Lipschitz continuous.
Proof.
Suppose that where and where . Since is coercive and the pair is monotone, one has
for some . Thus is single-valued Lipschitz continuous and thus is R-Lipschitz continuous. ∎
4 Explicit Convergence Rate of under the R-Continuity
Let be a maximally monotone operator with nonempty where is a Hilbert space. Let us recall that generates for any starting point , a sequence defined by the rule:
(16) |
for some where . plays an important role in convex optimization if is set-valued and has no special structure. The following result shows that the convergence rate of is indeed linear if is R-Lipschitz continuous at zero globally.
Theorem 11.
Suppose that is R-Lipschitz continuous at zero globally with modulus function for some . Then for , the sequence converges to zero with linear rate where is the sequence generated by with .
Proof.
Let . We have where the inequality is obtained by using the nonexpansiveness of the resolvent. We have
(17) |
Since is R-Lipschitz continuous at zero globally, we obtain
Consequently
where by choosing . ∎
Next we consider the convergence rate of under only the R-continuity of . The following result is similar to [3, Lemma 3.1]. Here we give a proof.
Lemma 12.
Let be a nonincreasing function which is locally absolutely continuous and belongs to . Then
Proof.
Without loss of generality, suppose that since the limit involves only for large. Let then is locally absolutely continuous, bounded from below by and for almost all , we have
Since , using [1, Lemma 5.1], we deduce that exists. Suppose that then there exists such that for all one has , or equivalently . Then
a contradiction. Thus we must have or equivalently . ∎
Lemma 13.
Let be a nonincreasing nonnegative sequence such that . Then we have .
Proof.
Since , we have and thus where . Let be defined as follows: if then , . Then is a nonincreasing function which is locally absolutely continuous and belongs to since
Note that if then and . Using Lemma 12, we have and thus
Therefore since . ∎
Remark 4.
The fact that the sequence is nonincreasing cannot be omitted. For example let and if and if . Then
However if and if . Thus the sequence is not convergent.
Theorem 14.
Suppose that is a proper lower semicontinuous convex function such that and is R-continuous at zero with modulus function and radius . Let be the sequence generated by the proximal point algorithm
(18) |
We have
a)
b) Let be such that . Then for all , we have
and
where .
c) If for some and then for , the distance converges to zero with linear rate and so is .
Proof.
a) From
(19) |
and the fact that is convex, we imply that
Hence
and thus converges to zero. Since is nonincreasing due to the nonexpansiveness of the resolvent, using Lemma 13, we have
b) From (19) and the R-continuity of , for , we obtain
Therefore
(20) |
Let then . Using (19), one has
which implies that
c) Let . We know that . From (20), for , we have
where by choosing .
∎
Remark 5.
R-continuity ensures (and also [17]) to be consistent in the numerical sense, i.e., if is small then is close to a solution. It provides an estimation of the distance between and the solution set based on the convergence rate of to zero.
5 Conclusions
In this paper, we highlighted several advantages of R-continuity compared to other key tools used in optimization, such as metric regularity, metric subregularity and calmness. We explored important properties of R-continuity and derived an explicit convergence rate for the Proximal Point Algorithm () under this framework. We believe that the technique developed in the paper can be extended to gain further insights into the convergence rate of other optimization algorithms.
6 Acknowledgements
This research benefited from the support of the FMJH Program Gaspard Monge for optimization and operations research and their interactions with data science.
References
- [1] B. Abbas, H. Attouch, B. F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, J. Optim. Theory Appl., 161 (2), 331–360, 2014
- [2] V. Acary, O. Bonnefon, B. Brogliato, Nonsmooth Modeling and Simulation for Switched Circuits, Lecture Notes in Electrical Engineering Vol 69. Springer Netherlands, 2011
- [3] S. Adly, H. Attouch, M. H. Le, A doubly nonlinear evolution system with threshold effects associated with dry friction, J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02417-2
- [4] S. Adly, M. G. Cojocaru, B. K. Le, State-Dependent Sweeping Processes: Asymptotic Behavior and Algorithmic Approaches. J Optim Theory Appl (2024). https://doi.org/10.1007/s10957-024-02485-4
- [5] S. Adly, A. L. Dontchev, M. Théra, On one-sided Lipschitz stability of set-valued contractions. Numer. Funct. Anal. Optim. 35, 837–850 (2014)
- [6] H. Attouch and G. Buttazzo, G. Michaille, Variational Analysis in Sobolev and BV Spaces, Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014
- [7] H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, Berlin (2011)
- [8] H. Brezis, Opérateurs Maximaux Monotones et Semi-groupes de Contractions dans les Espaces de Hilbert, Math. Studies 5, North-Holland American Elsevier, 1973
- [9] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Springer New York, NY, 2010
- [10] A. V. Dmitruk, A. Y. Kruger, Metric regularity and systems of generalized equations, J. Math. Anal. Appl., 342(2):864–873, 2008
- [11] A. L. Dontchev and R. T. Rockafellar, Implicit Functions and Solution Mappings. A View from Variational Analysis. Springer Monographs in Mathematics. Springer, Dordrecht, 2009
- [12] R. Henrion, J. V. Outrata, Calmness of constraint systems with applications, Math. Program., Ser. B 104, 437–464 (2005)
- [13] A. J. Hoffman, On approximate solutions of systems of linear inequalities, J. Research Nat. Bureau Standards 49(4) (1952), 263–265
- [14] A. D. Ioffe, Metric regularity–a survey. Part I. Theory. J. Aust. Math. Soc., 101(2): 188–243, 2016.
- [15] A. D. Ioffe, J. V. Outrata, On metric and calmness qualification conditions in subdifferential calculus. Set-Valued Anal., 16(2-3): 199–227, 2008
- [16] A. Y. Kruger, Nonlinear metric subregularity. J. Optim. Theory Appl., 171(3): 820–855, 2016.
- [17] B. K. Le, R-Continuity with Applications to Convergence Analysis of Tikhonov Regularization and DC Programming, Journal of Convex Analysis, Volume 31 (2024), No. 1, 243–254
- [18] B. Martinet, Regularisation d’inéquations variationelles par approximations successives, Rev. Francaise Inf. Rech.Oper., (1970), pp. 154–159
- [19] G. J. Minty, Monotone (Nonlinear) Operators in Hilbert Space. Duke Mathematical Journal, 29, 341–346, (1962).
- [20] Ch. A. Micchelli, L. Chen and Y. Xu, Proximity algorithms for image models: Denoising, Inverse Problems, Vol. 27(4) 045009, 2011
- [21] H. V. Ngai, M. Théra, Error Bounds in Metric Spaces and Application to the Perturbation Stability of Metric Regularity, SIAM J. Optim. 19(1), 1–20 (2008)
- [22] J. P. Penot, Calculus without Derivatives, Springer New York, 2014
- [23] S. M. Robinson, Regularity and stability for convex multivalued functions, Math. Oper. Res. 1 (1976), 130–143
- [24] S. M. Robinson, Some continuity properties of polyhedral multifunctions, Math. Program. Study, 14 (1981) 206–214
- [25] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optimization 14(5), 877–898, 1976
- [26] R. T. Rockafellar, R.J.-B. Wets, Variational analysis, volume 317 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer- Verlag, Berlin, 1998
- [27] A. Tanwani, B. Brogliato, C. Prieur, Well-Posedness and Output Regulation for Implicit Time-Varying Evolution Variational Inequalities, SIAM J. Control Opti., Vol. 56(2), 751–781, 2018
- [28] L. Thibault, Unilateral Variational Analysis in Banach Spaces. World Scientific, 2023
- [29] C. Ursescu, Multifunctions with closed convex graphs, Czechoslovak Math. J. 25 (1975), 438–411.
- [30] X. Y. Zheng and K. F. Ng, Metric subregularity and constraint qualifications for convex generalized equations in Banach spaces. SIAM J. Optim., 18(2):437–460, 2007
- [31] X. Y. Zheng and K. F. Ng, Metric subregularity and calmness for nonconvex generalized equations in Banach spaces, SIAM J. Optim., 20(5):2119–2136, 2010