This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Inexact Newton Methods for Solving Generalized Equations on Riemannian Manifolds

Mauricio S. Louzeiro School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong, China and IME/UFG, Avenida Esperança, s/n, Campus Samambaia, CEP 74690-900, Goiânia, GO, Brazil (Email: [email protected]). This author was partially supported by National Natural Science Foundation of China (No. 12171087).    Gilson N. Silva Departamento de Matemática, Universidade Federal do Piauí­, Teresina, Piauí­, 64049-550, Brazil (Email: [email protected]). This author was partially supported by CNPq, Brazil (401864/2022-7 and 306593/2022-0).    Jinyun Yuan School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong, China (Email: [email protected]). This author was partially supported by National Natural Science Foundation of China (No. 12171087), Dongguan University of Technology, China (221110093 ), and Shanghai Municipal Science and Technology Commission, China (23WZ2501400).    Daoping Zhang School of Mathematical Sciences and LPMC, Nankai University, Tianjin 300071, China (Email: [email protected]). This author was supported by National Natural Science Foundation of China (No. 12201320) and the Fundamental Research Funds for the Central Universities, Nankai University (No. 63221039 and 63231144).
Abstract

The convergence of inexact Newton methods is studied for solving generalized equations on Riemannian manifolds by using the metric regularity property, which is also explored. Under appropriate conditions and without any additional geometric assumptions, local convergence results with linear and quadratic rates, as well as a semi-local convergence result, are obtained for the proposed method. Finally, the theory is applied to the problem of finding a singularity for the sum of two vector fields. In particular, the KKT system for the constrained Riemannian center of mass on the sphere is explored numerically.

Key words: Generalized equation, inexact Newton method, metric regularity, Riemannian manifolds, vector fields.
AMS subject classification:  90C33 \cdot 49M37 \cdot 65K05

1 Introduction

In recent years, constrained generalized equations on Banach spaces, i.e., finding a solution to the inclusion

xC,f(x)+F(x)0x\in C,\quad f(x)+F(x)\ni 0 (1)

where 𝕏\mathbb{X} and 𝕐\mathbb{Y} are two Banach spaces, C𝕏C\subset\mathbb{X} is a nonempty, closed, and convex set, f:𝕏𝕐f\colon\mathbb{X}\to\mathbb{Y} is a mapping, and F:𝕏𝕐F\colon\mathbb{X}\rightrightarrows\mathbb{Y} is a set-valued mapping, have gained increased attention [18, 7]. This is attributed to the fact that the model (1) covers many well-known problems, such as constrained variational inequality and split variational inequality problem [15, 34], nonlinear equations, systems of equations and inequalities, optimality condition in mathematical programming and optimal control, and equilibrium problem. The readers are referred to [3, 5, 4, 36, 21, 17, 30, 28, 26, 9, 24, 22, 23, 8, 33, 31, 38] for a detailed study of (1) with C=𝕏C=\mathbb{X}.

For a Riemannian manifold {\cal M}, a closed set Ω\Omega\subset{\cal M} with a nonempty interior, a continuously differentiable mapping f:mf\colon{\cal M}\to\mathbb{R}^{m}, and a set-valued mapping F:mF\colon{\cal M}\rightrightarrows\mathbb{R}^{m}, we consider the generalized equation

pΩ,f(p)+F(p)0.p\in\Omega,\quad f(p)+F(p)\ni 0. (2)

Evidently, (2) covers the nonlinear equation f(p)=0f(p)=0 (F0)F\equiv 0) and the nonlinear inclusion problem f(p)Kf(p)\in K (FKF\equiv-K) for a fixed cone KmK\subset\mathbb{R}^{m} [52]. In this paper, we shall prove that problem (2) also covers the problem of finding a singularity of the sum of two vector fields of the form

pΩ,V(p)+Z(p)0p,p\in\Omega,\quad V(p)+Z(p)\ni 0_{p}, (3)

where V:𝒯V\colon{\cal M}\to{\cal T}{\cal M} is a single-valued vector field, Z:𝒯Z\colon{\cal M}\rightrightarrows{\cal T}{\cal M} is a set-valued vector field. In particular, we demonstrate that (2) can also be used to obtain a solution for a variational inequality problem and the Karush–Kuhn–Tucker (KKT) conditions for a constrained optimization problem on manifolds.

Problem (3) has been extensively investigated [5, 28] and solved using Newton’s method for Z0Z\equiv 0 [6, 25, 51, 42, 2, 19]. This problem naturally arises, for example, in the first-order optimality conditions of the minimization problem

Minimize ς(p)+ϑ(p),pintΩ,\text{Minimize }\varsigma(p)+\vartheta(p),\quad p\in\operatorname{int}\Omega, (4)

where ς:¯{±}\varsigma\colon{\cal M}\to\overline{\mathbb{R}}\coloneqq\mathbb{R}\cup\{\pm\infty\} is a differentiable function defined over intΩ\operatorname{int}\Omega (the interior of Ω\Omega) and ϑ:¯\vartheta\colon{\cal M}\to\overline{\mathbb{R}} is non-differentiable. In fact, one can consider (3) with VV representing the Riemannian gradient of ς\varsigma (gradς\operatorname{grad}\varsigma), ZZ representing the Riemannian subdifferential of ϑ\vartheta (ϑ\partial\vartheta), and Ω=intΩ\Omega=\operatorname{int}\Omega, i.e.,

pintΩ,gradς(p)+ϑ(p)0p.p\in\operatorname{int}\Omega,\quad\operatorname{grad}\varsigma(p)+\partial\vartheta(p)\ni 0_{p}.

Problem (4) is associated with several important applications, including sparse principal component analysis [32], sparse blind deconvolution [54], unsupervised feature selection [49], and image restoration [12, 11].

The extensive scope of generalized equations and the growing interest in optimization on manifolds [47, 14, 1] in recent years have motivated us to explore the integration of these two areas in this paper. Our focus is on studying a Riemannian version of the inexact Newton method proposed by [16] with the aim of solving problem (2). Given an initial point p0p_{0}\in\mathcal{M}, our method generates a sequence of iterations {pk}\{p_{k}\} as follows:

(f(pk)+𝒟f(pk)[vk]+F(exppkvk))Rk(pk),pk+1exppkvk,\left(f(p_{k})+\mathcal{D}f(p_{k})[v_{k}]+F(\exp_{p_{k}}v_{k})\right)\cap R_{k}(p_{k})\neq\varnothing,\qquad p_{k+1}\coloneqq\exp_{p_{k}}v_{k}, (5)

where Rk:mR_{k}\colon\mathcal{M}\rightrightarrows\mathbb{R}^{m} is a sequence of set-valued mappings with closed graphs representing the inexactness, and 𝒟f\mathcal{D}f denotes the differential of ff. In other words, the method involves: selecting an initial point p0p_{0} on the manifold (sufficiently close to a solution), solving a subproblem to find vkv_{k} in the intersection above (under certain conditions, it is possible to guarantee that it is nonempty), and computing the next iteration pk+1p_{k+1} by applying the exponential map to (pk,vk)(p_{k},v_{k}). It is noteworthy that whenever vkv_{k} is sufficiently small, (5) can be expressed as:

(f(pk)+𝒟f(pk)[exppk1pk+1]+F(pk+1))Rk(pk).\left(f(p_{k})+\mathcal{D}f(p_{k})[\exp^{-1}_{p_{k}}p_{k+1}]+F(p_{k+1})\right)\cap R_{k}(p_{k})\neq\varnothing. (6)

Although it may appear that obtaining vkv_{k} requires solving the subproblem in (5) exactly, the map RkR_{k} is specifically introduced to circumvent this necessity. This is evident in the particular case of (5) where Rk(pk)R_{k}(p_{k}) is defined as the closed ball centered at 0 with a radius of ηkf(pk)e\eta_{k}\|f(p_{k})\|_{e}, with ηk\eta_{k} being a forcing sequence of non-negative real numbers converging to 0, and F0F\equiv 0. In this scenario, the subproblem in (5) consists of finding a vkv_{k} that satisfies

f(pk)+𝒟f(pk)[vk]eηkf(pk)e,k=0,1,,\|f(p_{k})+{\cal D}f(p_{k})[v_{k}]\|_{e}\leq\eta_{k}\|f(p_{k})\|_{e},\qquad k=0,1,\ldots,

where e\|\cdot\|_{e} denotes the Euclidean norm, which can be interpreted as an inexact Newton method for by solving f(p)=0f(p)=0. The convergence analysis presented in this paper is conducted using (5) because it is more general and can be applied to other potential particular cases of (2).

Assuming that the set-valued mapping f+Ff+F in (2) is metrically regular at p¯Ω\bar{p}\in\Omega for 0 and that RkR_{k} satisfies a suitable boundedness condition, we demonstrate that a sequence {pk}\{p_{k}\} generated by (6) exhibits both linear and quadratic convergence towards p¯\bar{p}, with the exact nature of the convergence depending on the specific assumptions regarding RkR_{k}. These results can be obtained without requiring prior knowledge of the sequence of mappings RkR_{k} in terms of problem-specific data. However, it is necessary to ensure that a sequence {uk}\{u_{k}\} in Rk(pk)R_{k}(p_{k}) converges to 0 at the same rate as the sequence {pk}\{p_{k}\} converges to p¯\bar{p}. This requirement is a standard assumption in the context of inexact Newton-type methods, even when applied to nonlinear equations.

Under the condition of metric regularity of a linearization of f+Ff+F at p¯\bar{p} for 0, and provided that certain additional conditions are met, we present variations of the aforementioned results. In these variations, a neighborhood of p¯\bar{p} is assumed to be known, which allows for a more suitable choice of the initial point p0p_{0} for the sequence {pk}\{p_{k}\}. We also provide a semi-local convergence result, where the required conditions are related to p0p_{0} rather than p¯\bar{p}. This result is new even in the case where {\cal M} is a Euclidean space.

To understand the concept of metric regularity well on Riemannian manifolds, we constructed examples of mappings that are metrically regular over the set of positive definite symmetric matrices equipped with a well-established Riemannian metric. Additionally, we present conditions that guarantee the metric regularity property for certain mappings, and in particular, we compare the metric regularity of f+Ff+F with that of its linearization. Finally, some examples are given to show that our proposed concept is useful and a numerical example applied to the KKT system is given as well for the constrained Riemannian center of mass on the sphere to illustrate our theoretical results.

This work is organized as follows. In Section 2, some notations and basic concepts are reviewed. In Section 3 the metric regularity assumption is explored. In Section 4, local and semi-local convergence are studied for the proposed method. In Section 5 the relationship is investigated between (2) and (3), and some examples of classical problems which can be viewed as generalized equations are given. In Section 6, a numerical example is provided to illustrate our theoretical results. Finally, conclusions are presented in the last section.

2 Preliminary

In this section we recall some notations, definitions and basic properties of Riemannian manifolds used throughout the paper, which can be found in many introductory books on Riemannian geometry, for example [20, 41, 40, 50].

Suppose that {\cal M} is a connected, nn-dimensional smooth manifold. At each point pp\in{\cal M}, the tangent space 𝒯p{\cal T}_{p}{\cal M} is an nn-dimensional vector space with its origin at 0p0_{p}. The disjoint union of all tangent spaces, denoted as 𝒯{\cal T}{\cal M}, is the tangent bundle of {\cal M}. We assume that {\cal M} is equipped with a Riemannian metric, making it a Riemannian manifold. At each point pp\in{\cal M}, this Riemannian metric is denoted by ,p:𝒯p×𝒯p{\langle}\cdot,\cdot{\rangle}_{p}\colon{\cal T}_{p}{\cal M}\times{\cal T}_{p}{\cal M}\to\mathbb{R}, and the associated norm in 𝒯p{\cal T}_{p}{\cal M} is represented as p\|\cdot\|_{p}. The Riemannian distance between two points pp and qq in {\cal M}, denoted as d(p,q)d(p,q), is defined as the infimum of the lengths of all piecewise smooth curve segments connecting pp to qq. Additionally, the distance from a point pp to a subset 𝒲{\cal W}\subset{\cal M} is defined as d(p,𝒲)infq𝒲d(p,q)d(p,{\cal W})\coloneqq\inf_{q\in{\cal W}}d(p,q) and the interior of 𝒲{\cal W} is represented by int𝒲\operatorname{int}{\cal W}.

A vector field VV on {\cal M} is a correspondence that associates to each point pp\in{\cal M} a vector V(p)𝒯pV(p)\in{\cal T}_{p}{\cal M}. The point pp is said to be a singularity of VV if and only if V(p)=0pV(p)=0_{p}. The set of smooth vector fields on 𝒲{\cal W}\subseteq{\cal M} is denoted by 𝒳(𝒲){\cal X}({\cal W}).

The tangent vector of a smooth curve γ:I\gamma\colon I\to{\cal M} defined on some open interval II\subseteq\mathbb{R} is denoted by γ˙(t)\dot{\gamma}(t). For each a,tIa,t\in I, a<ta<t, the Levi-Civita connection :𝒳()×𝒳()𝒳()\nabla\colon{\cal X}({\cal M})\times{\cal X}({\cal M})\to{\cal X}({\cal M}) induces an isometry Pγ,a,t:𝒯γ(a)𝒯γ(t)P_{\gamma,a,t}\colon{\cal T}_{\gamma(a)}{\mathcal{M}}\to{\cal T}_{\gamma(t)}{\mathcal{M}} relative to the Riemannian metric on {\cal M}, given by Pγ,a,t,v=V(γ(t))P_{\gamma,a,t},v=V(\gamma(t)), where VV is the unique vector field on γ\gamma such that γ˙(t)V(γ(t))=0\nabla_{\dot{\gamma}(t)}V(\gamma(t))=0 and V(γ(a))=vV(\gamma(a))=v. The isometry Pγ,a,tP_{\gamma,a,t} is the parallel transport along γ\gamma joining γ(a)\gamma(a) to γ(t)\gamma(t). When the geodesic γ\gamma connecting p=γ(a)p=\gamma(a) and q=γ(t)q=\gamma(t) is unique, the notation PpqP_{pq} will be used instead of Pγ,a,tP_{\gamma,a,t}. It is well-known that PqpPpqP_{qp}\circ P_{pq} is equal to the identity map over 𝒯p{\cal T}_{p}{\cal M}.

The differential of a smooth function f:f\colon{\cal M}\to\mathbb{R} at pp is the linear map 𝒟f(p):𝒯p{\cal D}f(p)\colon{\cal T}_{p}{\cal M}\to\mathbb{R} which assigns to each v𝒯pv\in{\cal T}_{p}{\cal M} the value

𝒟f(p)[v]=γ˙(t0)[f]=ddt(fγ)|t=t0,{\cal D}f(p)[v]=\dot{\gamma}(t_{0})[f]=\frac{\operatorname{d}}{\operatorname{dt}}(f\circ\gamma)\Bigl{|}_{t=t_{0}},

for every smooth curve γ:I\gamma\colon I\to{\cal M} satisfying γ(t0)=p\gamma(t_{0})=p and γ˙(t0)=v\dot{\gamma}(t_{0})=v. The gradient at pp of ff, denoted as gradf(p)\operatorname{grad}f(p), is defined by the unique tangent vector at pp such that gradf(p),vp=𝒟f(p)[v]{\langle}\operatorname{grad}f(p),v{\rangle}_{p}={\cal D}f(p)[v] for all v𝒯p.v\in{\cal T}_{p}{\cal M}. For a smooth multifunction f(f1,,fm):mf\coloneqq(f_{1},\ldots,f_{m})\colon{\cal M}\to\mathbb{R}^{m}, its differential 𝒟f(p):𝒯pm{\cal D}f(p)\colon{\cal T}_{p}{\cal M}\to\mathbb{R}^{m} is given by

𝒟f(p)[v]=(gradf1(p),vp,,gradfm(p),vp),v𝒯p.{\cal D}f(p)[v]=({\langle}\operatorname{grad}f_{1}(p),v{\rangle}_{p},\ldots,{\langle}\operatorname{grad}f_{m}(p),v{\rangle}_{p}),\qquad v\in{\cal T}_{p}{\cal M}. (7)

Note that 𝒟f(p){\cal D}f(p) is a linear map from 𝒯p{\cal T}_{p}{\cal M} into m\mathbb{R}^{m} for all pp\in{\cal M}. The norm of a linear map A:𝒯pmA\colon{\cal T}_{p}{\cal M}\to\mathbb{R}^{m} is defined by Amapsup{Ave:v𝒯p,vp=1}\|A\|_{map}\coloneqq\sup\{\|Av\|_{e}\colon v\in{\cal T}_{p}{\cal M},\,\|v\|_{p}=1\} where e\|\cdot\|_{e} is the Euclidean norm on m\mathbb{R}^{m}.

A vector field VV along a smooth curve γ\gamma is said to be parallel if and only if γ˙V=0\nabla_{\dot{\gamma}}V=0. The curve γ\gamma is a geodesic when γ˙\dot{\gamma} is self-parallel. When the geodesic equation γ˙γ˙=0\nabla_{\dot{\gamma}}\dot{\gamma}=0 is a second-order nonlinear ordinary differential equation, the geodesic γ=γv(,p)\gamma=\gamma_{v}(\cdot,p) is determined by its position pp and velocity vv at pp. The restriction of a geodesic to a closed bounded interval is called a geodesic segment. Denote the unique geodesic segment γ:[0,1]\gamma\colon[0,1]\to{\cal M} satisfying γ(0)=p\gamma(0)=p and γ(1)=q\gamma(1)=q by γpq\gamma_{pq}. A geodesic segment joining pp to qq in \mathcal{M} is said to be minimal if its length is equal to d(p,q)d(p,q). A Riemannian manifold is complete if its geodesics are defined for all values of tt\in\mathbb{R}. Hopf-Rinow’s theorem asserts that every pair of points in a complete, connected Riemannian manifold \mathcal{M} can be joined by a (not necessarily unique) minimal geodesic segment. Due to the completeness of \mathcal{M}, the exponential map expp:𝒯p\exp_{p}\colon{\cal T}_{p}\mathcal{M}\to\mathcal{M} is given by exppv=γv(1,p)\exp_{p}v=\gamma_{v}(1,p), for each pp\in\mathcal{M}. In this paper, all manifolds are assumed to be connected, finite-dimensional, and complete.

The open and closed balls on {\cal M} of radius r>0r>0 centered at pp\in{\cal M} are defined, respectively, as r(p){q:d(p,q)<r}{\cal B}_{r}(p)\coloneqq\left\{q\in{\cal M}\colon d(p,q)<r\right\} and r[p]{q:d(p,q)r}{\cal B}_{r}[p]\coloneqq\left\{q\in{\cal M}\colon d(p,q)\leq r\right\}. Analogously, the open and closed balls on 𝒯p{\cal T}_{p}{\cal M} of radius r>0r>0 centered at u𝒯pu\in{\cal T}_{p}{\cal M} are defined, respectively, as Br(u){v𝒯p:vup<r}B_{r}(u)\coloneqq\left\{v\in{\cal T}_{p}{\cal M}\colon\|v-u\|_{p}<r\right\} and Br[u]{v𝒯p:vupr}B_{r}[u]\coloneqq\left\{v\in{\cal T}_{p}{\cal M}\colon\|v-u\|_{p}\leq r\right\}. It is well-known that there exists r>0r>0 such that expp:Br(0p)r(p)\exp_{p}\colon B_{r}(0_{p})\to{\cal B}_{r}(p) is a diffeomorphism, and r(p){\cal B}_{r}(p) is called a normal ball with center pp and radius rr. Whenever r(p){\cal B}_{r}(p) is a normal ball, it and its closure will be denoted by Br(p)\operatorname{B}_{r}(p) and Br[p]\operatorname{B}_{r}[p], respectively. The injectivity radius of \cal M at pp, denoted by rinj(p)r_{inj}(p), is the supremum of all r>0r>0 such that r(p){\cal B}_{r}(p) is a normal ball. The equality

d(q,p)=expp1qpd(q,p)=\|\exp_{p}^{-1}q\|_{p} (8)

holds for all qr(p)q\in{\cal B}_{r}(p), where expp1:r(p)𝒯p\exp_{p}^{-1}\colon{\cal B}_{r}(p)\to{\cal T}_{p}{\cal M} denotes the inverse of the exponential map. Recall that there exist r>0r>0 and δ>0\delta>0 such that, for every qBr(p)q\in\operatorname{B}_{r}(p), δ(q){\cal B}_{\delta}(q) is a normal ball and Br(p)δ(q)\operatorname{B}_{r}(p)\subset{\cal B}_{\delta}(q), see [20, Theorem 3.7]. In this case, Br(p)\operatorname{B}_{r}(p) is called a totally normal ball of center pp and radius rr. When \mathcal{M} is a Hadamard manifold, that is, a complete, simply connected Riemannian manifold with nonpositive sectional curvature, Br(p)\operatorname{B}_{r}(p) is totally normal for all r>0r>0 and pp\in\mathcal{M}.

A sequence {pk}\{p_{k}\}\subset{\cal M} is linearly convergent to p¯\bar{p} when there exist θ(0,1)\theta\in(0,1) and k0k_{0}\in\mathbb{N} such that

d(pk+1,p¯)θd(pk,p¯),for all k>k0.d(p_{k+1},\bar{p})\leq\theta d(p_{k},\bar{p}),\quad\mbox{for all }k>k_{0}.

It is said to be quadratically convergent to p¯\bar{p} when there exist θ>0\theta>0 and k0k_{0}\in\mathbb{N} such that

d(pk+1,p¯)θd2(pk,p¯),for all k>k0.d(p_{k+1},\bar{p})\leq\theta d^{2}(p_{k},\bar{p}),\quad\mbox{for all }k>k_{0}.

We end this section by presenting the concept of Lipschitz continuity for the differential of continuously differentiable functions in the Riemannian context. This concept will be fundamental for obtaining the quadratic convergence rate for our algorithm.

Definition 1.

Let f:mf\colon{\cal M}\to\mathbb{R}^{m} be continuously differentiable at p¯\bar{p}. Then 𝒟f{\cal D}f is L-Lipschitz continuous around p¯\bar{p} if there exists δL>0\delta_{L}>0 such that for every p,qδL(p¯)p,q\in{\cal B}_{\delta_{L}}(\bar{p})

𝒟f(q)Pγ,0,1𝒟f(p)mapLγ˙(0)p,\|{\cal D}f(q)P_{\gamma,0,1}-{\cal D}f(p)\|_{map}\leq L\|\dot{\gamma}(0)\|_{p}, (9)

where γ:[0,1]\gamma\colon[0,1]\to{\cal M} is a geodesic connecting p=γ(0)p=\gamma(0) to q=γ(1)q=\gamma(1).

3 Metric Regularity on Riemannian Manifolds

Let F:mF\colon{\cal M}\rightrightarrows\mathbb{R}^{m} be a set-valued mapping, whose the graph of FF is the set gphF{(p,x)×m:xF(p)}\operatorname{gph}F\coloneqq\{(p,x)\in{\cal M}\times\mathbb{R}^{m}\colon x\in F(p)\} with domain domF{p:F(p)}\operatorname{dom}F\coloneqq\{p\in{\cal M}\colon F(p)\neq\emptyset\}. The inverse of FF is defined as xF1(x)={p:xF(p)}x\mapsto F^{-1}(x)=\{p\in{\cal M}\colon x\in F(p)\}.

A function f:mf\colon{\cal M}\to\mathbb{R}^{m} is said to be Lipschitz continuous relative to a set 𝒲domf{\cal W}\subset\operatorname{dom}f if there exists a constant κ0\kappa\geq 0 such that

|f(p)f(p)|κd(p,p) for all p,p𝒲.|f(p^{\prime})-f(p)|\leq\kappa d(p^{\prime},p)\mbox{ for all }p^{\prime},p\in{\cal W}.

Let p¯intdomf\bar{p}\in\operatorname{int}\operatorname{dom}f. The Lipschitz modulus of ff at p¯\bar{p} is defined by

lip(f;p¯)lim supp,pp¯pp|f(p)f(p)|d(p,p).\operatorname{lip}(f;\bar{p})\coloneqq\underset{\begin{subarray}{c}p^{\prime},p\to\bar{p}\\ p\neq p^{\prime}\end{subarray}}{\limsup}\frac{|f(p^{\prime})-f(p)|}{d(p^{\prime},p)}.

We now introduce the following notations on m\mathbb{R}^{m}: 𝔹r[x]\mathbb{B}_{r}[x] denotes the Euclidean closed ball of radius r>0r>0 and center xx, and ded_{e} denotes the Euclidean distance from a point to a set. The next concept plays an important role in role in this paper and it comes from [23, p. 279] in the metric spaces setting:

Definition 2.

A set-valued mapping Φ:m\Phi\colon{\cal M}\rightrightarrows\mathbb{R}^{m} is said to be metrically regular at p¯\bar{p}\in{\cal M} for x¯m\bar{x}\in\mathbb{R}^{m} when x¯Φ(p¯)\bar{x}\in\Phi(\bar{p}), and there exist positive constants σ\sigma, aa and bb such that

the setgphΦ(a[p¯]×𝔹b[x¯])is closed\mbox{the set}\,\,\operatorname{gph}\Phi\,\cap({\cal B}_{a}[\bar{p}]\times\mathbb{B}_{b}[\bar{x}])\,\,\mbox{is closed} (10)

and

d(p,Φ1(x))σde(x,Φ(p)) for all (p,x)a[p¯]×𝔹b[x¯].d(p,\Phi^{-1}(x))\leq\sigma d_{e}(x,\Phi(p))\,\,\mbox{ for all }\,\,\,(p,x)\in{\cal B}_{a}[\bar{p}]\times\mathbb{B}_{b}[\bar{x}]. (11)

The infimum of σ\sigma over all such combinations of σ\sigma, aa and bb is called the regularity modulus for Φ\Phi at p¯\bar{p} for x¯\bar{x} and denoted by reg(Φ;p¯|x¯)\operatorname{reg}(\Phi;\bar{p}|\bar{x}). The absence of metric regularity is signaled by reg(Φ;p¯|x¯)=\operatorname{reg}(\Phi;\bar{p}|\bar{x})=\infty.

It is important to emphasize that one of the main assumptions used to guarantee the convergence of the proposed algorithm in this paper is metric regularity. This concept, in the context of Riemannian manifolds, has been explored in recent works such as [4, 2, 28]. We will now briefly discuss this property in the case where n\mathcal{M}\equiv\mathbb{R}^{n}. If f:nnf\colon\mathbb{R}^{n}\to\mathbb{R}^{n} is a differentiable function, the metric regularity of ff at p¯\bar{p} for 0 is equivalent to stating that the inverse of the Jacobian matrix of ff at p¯\bar{p} exists, which is the standard regularity assumption applied to solve the nonlinear equation f(p)=0f(p)=0. On the other hand, if we are interested in solving a nonlinear system of equations and inequalities, for example,

g(p)x,h(p)=y,g(p)\leq x,\quad h(p)=y, (12)

where xmx\in\mathbb{R}^{m} and ysy\in\mathbb{R}^{s} are given parameters, and g:nmg:\mathbb{R}^{n}\to\mathbb{R}^{m} and h:nsh:\mathbb{R}^{n}\to\mathbb{R}^{s} are continuously differentiable functions, then we can associate problem (12) with the following generalized equation:

ξf(p)+F(p),ξ=[xy],f=[gh],F=[+m0].\xi\in f(p)+F(p),\quad\xi=\begin{bmatrix}x\\ y\end{bmatrix},\quad f=\begin{bmatrix}g\\ h\end{bmatrix},\quad F=\begin{bmatrix}\mathbb{R}_{+}^{m}\\ 0\end{bmatrix}.

In this particular case, it is well known that the metric regularity of f+Ff+F at p¯\bar{p} for 0 is equivalent to the standard Mangasarian-Fromovitz constraint qualification at p¯\bar{p}. Notably, the case where FNCF\equiv N_{C} is of particular interest, with NCN_{C} denoting the normal cone to a closed convex set CmC\subset\mathbb{R}^{m}. The metric regularity of f+NCf+N_{C} at p¯\bar{p} for 0 is equivalent to the concept of strong metric regularity, meaning that f+NCf+N_{C} is locally single-valued and Lipschitz continuous at p¯\bar{p} for 0. Thus, if ff and CC are chosen such that f+NC0f+N_{C}\ni 0 forms the KKT system for a constrained optimization problem, then strong metric regularity is equivalent to the linear independence of the gradients of the active constraints and the strong second-order sufficient condition. These details are thoroughly discussed in [24]; see also Examples 6 and 7 in Section 5.

We present in Subsection 3.1 some examples of set-valued mapping that satisfy the previous concept. The next lemma is a version in the Riemannian setting of Corollary 3F.3 given in [24]. Its proof is similar to the corresponding result in Euclidean space and will be omitted here.

Lemma 1.

Consider a mapping Φ:m\Phi\colon{\cal M}\rightrightarrows\mathbb{R}^{m} and a point (p¯,x¯)gphΦ(\bar{p},\bar{x})\in\operatorname{gph}\Phi. Then for every g:mg\colon{\cal M}\to\mathbb{R}^{m} with lip(g;p¯)=0\operatorname{lip}(g;\bar{p})=0, one has reg(g+Φ;p¯|g(p¯)+x¯)=reg(Φ;p¯|x¯)\operatorname{reg}(g+\Phi;\bar{p}|g(\bar{p})+\bar{x})=\operatorname{reg}(\Phi;\bar{p}|\bar{x}).

In the theorem below, we demonstrate that the metric regularity condition of a mapping at p¯\bar{p}\in{\cal M} for x¯m\bar{x}\in\mathbb{R}^{m} is equivalent to the same condition for its linearization at p¯\bar{p}.

Theorem 1.

Consider a normal ball Br(p¯)\operatorname{B}_{r}(\bar{p})\subset{\cal M}. Let f:mf\colon{\cal M}\to\mathbb{R}^{m} be a continuously differentiable function at p¯\bar{p} and F:mF\colon{\cal M}\rightrightarrows\mathbb{R}^{m} be a set-valued mapping. Then for G:mG\colon{\cal M}\rightrightarrows\mathbb{R}^{m} defined by

G(p)={f(p¯)+𝒟f(p¯)[expp¯1p]+F(p),pBr(p¯),f(p)+F(p),p\Br(p¯),G(p)=\begin{cases}f(\bar{p})+{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]+F(p),&p\in\operatorname{B}_{r}(\bar{p}),\\ f(p)+F(p),&p\in{\cal M}\backslash\operatorname{B}_{r}(\bar{p}),\end{cases} (13)

one has reg(G;p¯|0)=reg(f+F;p¯|0)\operatorname{reg}(G;\bar{p}|0)=\operatorname{reg}(f+F;\bar{p}|0).

Proof.

Pick ϵ>0\epsilon>0 arbitrary. By Lemma 5 given in the Appendix of this paper, there exists a totally normal ball Bδϵ(p¯)Br(p¯)\operatorname{B}_{\delta_{\epsilon}}(\bar{p})\subset\operatorname{B}_{r}(\bar{p}) such that

𝒟f(p)[expp1pexpp1p′′]𝒟f(p¯)[expp¯1pexpp¯1p′′]eϵd(p,p′′),p,p,p′′Bδϵ(p¯).\|{\cal D}f(p)[\exp_{p}^{-1}p^{\prime}-\exp_{p}^{-1}p^{\prime\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}p^{\prime}-\exp_{\bar{p}}^{-1}p^{\prime\prime}]\|_{e}\leq\epsilon d(p^{\prime},p^{\prime\prime}),\qquad p,p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}). (14)

Consider a function g:mg\colon{\cal M}\to\mathbb{R}^{m} given by

g(q)={f(p¯)f(q)+𝒟f(p¯)[expp¯1q],qBr(p¯),0,q\Br(p¯).g(q)=\begin{cases}f(\bar{p})-f(q)+{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}q],&q\in\operatorname{B}_{r}(\bar{p}),\\ \qquad 0,&q\in{\cal M}\backslash\operatorname{B}_{r}(\bar{p}).\end{cases} (15)

In particular, this function satisfies

g(p′′)g(p)e=f(p)f(p′′)𝒟f(p¯)[expp¯1pexpp¯1p′′]e,p,p′′Bδϵ(p¯).\|g(p^{\prime\prime})-g(p^{\prime})\|_{e}=\|f(p^{\prime})-f(p^{\prime\prime})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime}]\|_{e},\qquad p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}). (16)

The first part of Proposition 2 guarantees that there exists δ(0,δϵ)\delta\in(0,\delta_{\epsilon}) such that

f(p)f(p′′)=01𝒟f(γp′′p(t))[γ˙p′′p(t)]dt,p,p′′Bδ(p¯),f(p^{\prime})-f(p^{\prime\prime})=\int_{0}^{1}{\cal D}f(\gamma_{p^{\prime\prime}p^{\prime}}(t))[\dot{\gamma}_{p^{\prime\prime}p^{\prime}}(t)]\operatorname{dt},\qquad p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta}(\bar{p}),

where γp′′p:[0,1]\gamma_{p^{\prime\prime}p^{\prime}}\colon[0,1]\to{\cal M} is the geodesic satisfying γp′′p(0)=p′′\gamma_{p^{\prime\prime}p^{\prime}}(0)=p^{\prime\prime} and γp′′p(1)=p\gamma_{p^{\prime\prime}p^{\prime}}(1)=p^{\prime}. Adding 𝒟f(p¯)[expp¯1pexpp¯1p′′]-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime}] to both sides, it follows that

f(p)f(p′′)𝒟f(p¯)[expp¯1pexpp¯1p′′]=01𝒟f(γp′′p(t))[γ˙p′′p(t)]𝒟f(p¯)[expp¯1pexpp¯1p′′]dt,p,p′′Bδ(p¯).f(p^{\prime})-f(p^{\prime\prime})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime}]=\\ \int_{0}^{1}{\cal D}f(\gamma_{p^{\prime\prime}p^{\prime}}(t))[\dot{\gamma}_{p^{\prime\prime}p^{\prime}}(t)]-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime}]\operatorname{dt},\qquad p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta}(\bar{p}).

Applying norm on both sides, we conclude that

f(p)f(p′′)𝒟f(p¯)[expp¯1pexpp¯1p′′]e01𝒟f(γp′′p(t))[γ˙p′′p(t)]𝒟f(p¯)[expp¯1pexpp¯1p′′]edt,p,p′′Bδ(p¯).\|f(p^{\prime})-f(p^{\prime\prime})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime}]\|_{e}\leq\\ \int_{0}^{1}\|{\cal D}f(\gamma_{p^{\prime\prime}p^{\prime}}(t))[\dot{\gamma}_{p^{\prime\prime}p^{\prime}}(t)]-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime}]\|_{e}\operatorname{dt},\qquad p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta}(\bar{p}). (17)

On the other hand, from Proposition 3 with p=p′′p=p^{\prime\prime} and q=pq=p^{\prime} we have

γ˙p′′p(t)=expγp′′p(t)1pexpγp′′p(t)1p′′,t[0,1].\dot{\gamma}_{p^{\prime\prime}p^{\prime}}(t)=\exp_{\gamma_{p^{\prime\prime}p^{\prime}}(t)}^{-1}p^{\prime}-\exp_{\gamma_{p^{\prime\prime}p^{\prime}}(t)}^{-1}p^{\prime\prime},\qquad t\in[0,1].

From this equality and (14) with p=γp′′p(t)p=\gamma_{p^{\prime\prime}p^{\prime}}(t), (17) yields

f(p)f(p′′)𝒟f(p¯)(expp¯1pexpp¯1p′′)eϵd(p,p′′),p,p′′Bδ(p¯).\|f(p^{\prime})-f(p^{\prime\prime})-{\cal D}f(\bar{p})(\exp^{-1}_{\bar{p}}p^{\prime}-\exp^{-1}_{\bar{p}}p^{\prime\prime})\|_{e}\leq\epsilon d(p^{\prime},p^{\prime\prime}),\qquad p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta}(\bar{p}).

Then (16) implies that g(p′′)g(p)eϵd(p,p′′)\|g(p^{\prime\prime})-g(p^{\prime})\|_{e}\leq\epsilon d(p^{\prime},p^{\prime\prime}) for all p,p′′Bδ(p¯)p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta}(\bar{p}). Since ϵ>0\epsilon>0 was chosen arbitrarily, we can conclude that lip(g;p¯)=0\operatorname{lip}(g;\bar{p})=0. Finally, to obtain the desired equality, just use Lemma  1 with gg as in (15) and Φ=f+F\Phi=f+F. ∎

Next, we define the Aubin property for a set-valued mapping :m\aleph\colon\mathbb{R}^{m}\rightrightarrows\mathcal{M}. To this end, it is necessary to introduce the concept of excess between subsets of \mathcal{M}. For sets 𝒲1{\cal W}_{1} and 𝒲2{\cal W}_{2} in \mathcal{M}, the excess of 𝒲1{\cal W}_{1} beyond 𝒲2{\cal W}_{2} is defined as

e(𝒲1,𝒲2)=supp𝒲1d(p,𝒲2),e({\cal W}_{1},{\cal W}_{2})=\sup_{p\in{\cal W}_{1}}d(p,{\cal W}_{2}),

with the convention that e(,𝒲2)=0e(\emptyset,{\cal W}_{2})=0 for 𝒲2{\cal W}_{2}\neq\emptyset and e(,𝒲2)=e(\emptyset,{\cal W}_{2})=\infty otherwise.

Definition 3.

A set-valued mapping :m\aleph\colon\mathbb{R}^{m}\rightrightarrows\mathcal{M} is said to have the Aubin property at x¯m\bar{x}\in\mathbb{R}^{m} for p¯\bar{p}\in\mathcal{M} if p¯(x¯)\bar{p}\in\aleph(\bar{x}), and there exist positive constants σ\sigma, aa, and bb such that

e((x)a[p¯],(x))σxxfor all x,x𝔹b[x¯].e\left(\aleph(x)\cap\mathcal{B}_{a}[\bar{p}],\aleph(x^{\prime})\right)\leq\sigma\|x-x^{\prime}\|\quad\text{for all }x,x^{\prime}\in\mathbb{B}_{b}[\bar{x}].

The infimum of σ\sigma over all such combinations of σ\sigma, aa, and bb is called the Lipschitz modulus of \aleph at x¯\bar{x} for p¯\bar{p} and is denoted by lip(;x¯|p¯)\operatorname{lip}(\aleph;\bar{x}|\bar{p}). The absence of this property is indicated by lip(;x¯|p¯)=\operatorname{lip}(\aleph;\bar{x}|\bar{p})=\infty.

The next result provides a relationship between the metric regularity of a set-valued mapping Φ:m\Phi\colon{\cal M}\rightrightarrows\mathbb{R}^{m} and the Aubin property for its inverse. Its proof follows from a straightforward adaptation of the demonstration in [24, Theorem 3E.6] and the concepts introduced above, and hence will be omitted.

Theorem 2.

A set-valued mapping Φ:m\Phi\colon{\cal M}\rightrightarrows\mathbb{R}^{m} is metrically regular at p¯\bar{p} for x¯\bar{x} with a constant σ\sigma if and only if its inverse Φ1:m\Phi^{-1}:\mathbb{R}^{m}\rightrightarrows{\cal M} has the Aubin property at x¯\bar{x} for p¯\bar{p} with constant σ\sigma. Thus, lip(Φ1;x¯|p¯)=reg(Φ;p¯|x¯).\operatorname{lip}(\Phi^{-1};\bar{x}|\bar{p})=\operatorname{reg}(\Phi;\bar{p}|\bar{x}).

3.1 Example in the SPD Matrices Cone

In this section, some examples of metrically regular mappings Φ\Phi on a particular manifold {\cal M} are presented.

Denote the set of symmetric matrices of size n×nn\times n for some nn\in\mathbb{N} by 𝒫(n){\cal P}(n), and the cone of symmetric positive definite matrices by =𝒫+(n){\cal M}={\cal P}_{+}(n). The latter is endowed with the affine invariant Riemannian metric given by

v,utr(vp1up1),p,v,u𝒯p,\langle v,u\rangle\coloneqq\mbox{tr}(vp^{-1}up^{-1}),\qquad p\in\mathcal{M},\quad v,u\in\mathcal{T}_{p}\mathcal{M}, (18)

where tr denotes the trace of a matrix. The tangent space 𝒯p\mathcal{T}_{p}\mathcal{M} can be identified with 𝒫(n){\cal P}(n). Moreover, \mathcal{M} is a Hadamard manifold, see, for example, [39, Theorem 1.2. p. 325]. The Riemannian distance is given by

d(p,q)=tr12(ln2(p12qp12)),p,q,d(p,q)=\operatorname{tr}^{\frac{1}{2}}(\ln^{2}(p^{-\frac{1}{2}}qp^{-\frac{1}{2}})),\qquad p,q\in\mathcal{M}, (19)

where ln\ln denotes the matrix logarithm. The Riemannian gradient of f:f\colon\cal M\to\mathbb{R} at pp is given by

gradf(p)=pf(p)p,\operatorname{grad}f(p)=pf^{\prime}(p)p, (20)

where f(p)f^{\prime}(p) represents the Euclidean gradient of ff at pp, see [48]. The identity matrix of size n×nn\times n will be denoted by id\operatorname{id}.

In the following two examples, we verify the property of metric regularity for single-valued functions.

Example 1.

Consider the function Φ:\Phi\colon\mathcal{M}\to{\mathbb{R}} defined by Φ(p)=ln(tr(p))\Phi(p)=\ln(\operatorname{tr}(p)). For every xx\in\mathbb{R}, the following equality holds:

Φ(exln(tr(p))p)=ln(tr(exln(tr(p))p))=ln(exln(tr(p))tr(p))=x,p.\Phi(e^{x-\ln({\operatorname{tr}(p)})}p)=\ln(\operatorname{tr}(e^{x-\ln({\operatorname{tr}(p)})}p))=\ln(e^{x-\ln({\operatorname{tr}(p)})}\operatorname{tr}(p))=x,\qquad p\in\mathcal{M}. (21)

This implies that {exln(tr(p))p:p}Φ1(x)\{e^{x-\ln({\operatorname{tr}(p)})}p\colon p\in\mathcal{M}\}\subset\Phi^{-1}(x) for all x.x\in\mathbb{R}. Thus, using (19) and simple algebraic manipulations, we obtain

d(Φ1(x),q)\displaystyle d(\Phi^{-1}(x),q) d(exln(tr(q))q,q),\displaystyle\leq d(e^{x-\ln({\operatorname{tr}(q)})}q,q),
=tr12(|xln(tr(q))|2id),\displaystyle=\operatorname{tr}^{\frac{1}{2}}(|x-\ln({\operatorname{tr}(q)})|^{2}\operatorname{id}),
=n|xln(tr(q))|=n|xΦ(q)|,x,q.\displaystyle=\sqrt{n}\,|x-\ln({\operatorname{tr}(q)})|=\sqrt{n}\,|x-\Phi(q)|,\qquad x\in\mathbb{R},\quad q\in\mathcal{M}.

On the other hand, since Φ\Phi is Euclidean differentiable, it follows from (20) that Φ\Phi will also be Riemannian differentiable with respect to metric (18). Therefore, Φ\Phi is continuous and, by Proposition 4, it has closed graph. Given this and the last inequality above, we conclude that Φ\Phi is metrically regular at pp for Φ(p)\Phi(p) for all pp\in{\cal M}.

Example 2.

Define the function Φ:\Phi\colon{\cal M}\to{\mathbb{R}} by Φ(p)=1/tr(p)\Phi(p)=1/\operatorname{tr}(p). Note that {(xtr(p))1p:p}Φ1(x)\{(x\operatorname{tr}(p))^{-1}p\colon p\in\mathcal{M}\}\subset\Phi^{-1}(x) for all x\{0}x\in\mathbb{R}\backslash\{0\}. Using this and (19), we obtain

d(Φ1(x),p)\displaystyle d(\Phi^{-1}(x),p) d((xtr(p))1p,p)\displaystyle\leq d((x\operatorname{tr}(p))^{-1}p,p)
=tr12(ln2(xtr(p)id))=n|ln(xtr(p))|,x\{0},p.\displaystyle=\operatorname{tr}^{\frac{1}{2}}(\ln^{2}(x\operatorname{tr}(p)\operatorname{id}))=\sqrt{n}\,|\ln(x\operatorname{tr}(p))|,\qquad x\in\mathbb{R}\backslash\{0\},\quad p\in\mathcal{M}. (22)

Now recall that

tr(p)=λ1(p)++λn(p),\operatorname{tr}(p)=\lambda_{1}(p)+\ldots+\lambda_{n}(p), (23)

where λ1(p),,λn(p)\lambda_{1}(p),\ldots,\lambda_{n}(p) are the eigenvalues of pp. Pick a constant a>0a>0. From (19), for all i{1,,n}i\in\{1,\ldots,n\}, one has

|ln(λi(p))|<(ln2(λ1(p))++ln2(λn(p)))12=tr12(ln2(p))=d(p,id)<a,pBa(id).|\ln(\lambda_{i}(p))|<(\ln^{2}(\lambda_{1}(p))+\ldots+\ln^{2}(\lambda_{n}(p)))^{\frac{1}{2}}=\operatorname{tr}^{\frac{1}{2}}(\ln^{2}(p))=d(p,\operatorname{id})<a,\qquad p\in\operatorname{B}_{a}(\operatorname{id}).

Then ea<λi(p)<eae^{-a}<\lambda_{i}(p)<e^{a} holds for all pBa(id)p\in\operatorname{B}_{a}(\operatorname{id}) and i=1,,ni=1,\ldots,n. Consequently

nea<λ1(p)++λn(p)<nea,pBa(id).ne^{-a}<\lambda_{1}(p)+\ldots+\lambda_{n}(p)<ne^{a},\qquad p\in\operatorname{B}_{a}(\operatorname{id}).

With (23), we conclude that 1/tr(p)(ea/n,ea/n)1/\operatorname{tr}(p)\in(e^{-a}/n,e^{a}/n) for all pBa(id)p\in\operatorname{B}_{a}(\operatorname{id}). Since the function ln\ln is Lipschitz on the interval (ea/n,ea/n)(e^{-a}/n,e^{a}/n) (its gradient is bounded on this interval), there exists σ>0\sigma>0 such that

|ln(xtr(p))|=|ln(x)ln(1/tr(p))|σ|x1/tr(p)|,x(ea/n,ea/n),pBa(id).|\ln(x\operatorname{tr}(p))|=\left|\ln(x)-\ln(1/\operatorname{tr}(p))\right|\leq\sigma\left|x-1/\operatorname{tr}(p)\right|,\qquad x\in(e^{-a}/n,e^{a}/n),\quad p\in\operatorname{B}_{a}(\operatorname{id}). (24)

Therefore, (2) and (24) yield

d(Φ1(x),p)nσ|xΦ(p)|,x(ea/n,ea/n),pBa(id).d(\Phi^{-1}(x),p)\leq\sqrt{n}\,\sigma\left|x-\Phi(p)\right|,\qquad x\in(e^{-a}/n,e^{a}/n),\quad p\in\operatorname{B}_{a}(\operatorname{id}).

Furthermore, taking into account the argument presented at the end of Example 1, it can be concluded that gphΦ\operatorname{gph}\Phi is closed. Then Φ\Phi is metrically regular at pp for Φ(p)\Phi(p) for all pBa(id)p\in\operatorname{B}_{a}(\operatorname{id}).

We now present two examples involving set-valued mappings. In these examples, we demonstrate that the given mappings are metrically regular at specific points in {\cal M}.

Example 3.

Define the set-valued mapping Φ:\Phi\colon\mathcal{M}\rightrightarrows{\mathbb{R}} by

Φ(p)={ln(tr(p)),p\(1/nid),{0}[1,2],p=1/nid.\Phi(p)=\begin{cases}\,\,\ln(\operatorname{tr}(p)),&p\in{\cal M}\backslash(1/n\operatorname{id}),\\ \{0\}\cup[1,2],&p=1/n\operatorname{id}.\end{cases} (25)

By (25) and the calculations in (21), if xx\in\mathbb{R} and qq\in\mathcal{M} satisfy exln(tr(q))q\(1/nid)e^{x-\ln({\operatorname{tr}(q)})}q\in\mathcal{M}\backslash(1/n\operatorname{id}) then Φ(exln(tr(q))q)ln(tr(exln(tr(q))q))=x\Phi(e^{x-\ln({\operatorname{tr}(q)})}q)\ni\ln(\operatorname{tr}(e^{x-\ln({\operatorname{tr}(q)})}q))=x. Now, consider the case exln(tr(q))q=1/nide^{x-\ln({\operatorname{tr}(q)})}q=1/n\operatorname{id}. By applying tr\operatorname{tr} on both sides, it follows that exln(tr(q))tr(q)=1e^{x-\ln({\operatorname{tr}(q)})}\operatorname{tr}(q)=1. Clearly, this implies that x=0x=0. Therefore, by (25), we get Φ(exln(tr(q))q)x\Phi(e^{x-\ln({\operatorname{tr}(q)})}q)\ni x for this case as well. Thus, {exln(tr(q))q:q}Φ1(x)\{e^{x-\ln({\operatorname{tr}(q)})}q\colon q\in\mathcal{M}\}\subset\Phi^{-1}(x) holds for all xx\in\mathbb{R}, which implies

d(Φ1(x),q)\displaystyle d(\Phi^{-1}(x),q) d(exln(tr(q))q,q),\displaystyle\leq d(e^{x-\ln({\operatorname{tr}(q)})}q,q),
=tr12(|xln(tr(q))|2id)=n|xln(tr(q))|,x,q.\displaystyle=\operatorname{tr}^{\frac{1}{2}}(|x-\ln({\operatorname{tr}(q)})|^{2}\operatorname{id})=\sqrt{n}\,|x-\ln({\operatorname{tr}(q)})|,\qquad x\in\mathbb{R},\quad q\in\mathcal{M}.

On the other hand, it follows from (25) that

|xln(tr(q))|=de(x,Φ(q)),x(,1/2),q.|x-\ln({\operatorname{tr}(q)})|=d_{e}(x,\Phi(q)),\qquad x\in(-\infty,1/2),\quad q\in\mathcal{M}.

Overall, we conclude

d(Φ1(x),q)nde(x,Φ(q)),x(,1/2),q.d(\Phi^{-1}(x),q)\leq\sqrt{n}\,d_{e}(x,\Phi(q)),\qquad x\in(-\infty,1/2),\quad q\in\mathcal{M}.

With a justification analogous to the one presented in Example 1, we state that Φ\Phi has closed graph. Therefore, Φ\Phi is metrically regular at pp for ln(tr(p))\ln(\operatorname{tr}(p)) for all p{q:ln(tr(q))(,1/2)}p\in\{q\in{\cal M}\colon\ln({\operatorname{tr}(q)})\in(-\infty,1/2)\}.

Example 4.

Consider the set-valued mapping Φ:\Phi\colon\mathcal{M}\rightrightarrows{\mathbb{R}} by

Φ(p)={    1/tr(p),p\id,{1/n}[2,3],p=id.\Phi(p)=\begin{cases}\,\,\,\,1/\operatorname{tr}(p),&p\in{\cal M}\backslash\operatorname{id},\\ \{1/n\}\cup[2,3],&p=\operatorname{id}.\end{cases} (26)

Following the steps of Example 3, we can show that Φ(1/(xtr(q))q)1/tr(1/(xtr(q))q)=x\Phi(1/(x\operatorname{tr}(q))\,q)\ni 1/\operatorname{tr}(1/(x\operatorname{tr}(q))\,q)=x holds for all x\{0}x\in\mathbb{R}\backslash\{0\}, qq\in\mathcal{M}. This means that {1/(xtr(q))q:q}Φ1(x)\{1/(x\operatorname{tr}(q))\,q\colon q\in\mathcal{M}\}\subset\Phi^{-1}(x) for all x\{0}x\in\mathbb{R}\backslash\{0\}, and hence

d(Φ1(x),q)\displaystyle d(\Phi^{-1}(x),q) d((xtr(q))1q,q)\displaystyle\leq d((x\operatorname{tr}(q))^{-1}q,q)
=tr12(ln2(xtr(q)id))=n|ln(xtr(q))|,x\{0},q.\displaystyle=\operatorname{tr}^{\frac{1}{2}}(\ln^{2}(x\operatorname{tr}(q)\operatorname{id}))=\sqrt{n}\,|\ln(x\operatorname{tr}(q))|,\qquad x\in\mathbb{R}\backslash\{0\},\quad q\in\mathcal{M}.

From (24), for all a>0a>0 there exists σ>0\sigma>0 such that

d(Φ1(x),q)σ|x1/tr(q)|,x(ea/n,ea/n),qBa(id).d(\Phi^{-1}(x),q)\leq\sigma\left|x-1/\operatorname{tr}(q)\right|,\qquad x\in(e^{-a}/n,e^{a}/n),\quad q\in\operatorname{B}_{a}(\operatorname{id}).

Pick a>0a>0. Exploiting the definition of Φ\Phi in (26), it follows that

d(Φ1(x),q)σde(x,Φ(q)),x(ea/n,min{ea/n,1}),qBa(id).d(\Phi^{-1}(x),q)\leq\sigma d_{e}(x,\Phi(q)),\qquad x\in(e^{-a}/n,\min\{e^{a}/n,1\}),\quad q\in\operatorname{B}_{a}(\operatorname{id}).

Combining this with the fact that gphΦ\operatorname{gph}\Phi is closed, we conclude that Φ\Phi is metrically regular at pp for 1/tr(p)1/\operatorname{tr}(p) for all p{q:1/tr(q)(ea/n,min{ea/n,1})}p\in\{q\in{\cal M}\colon 1/\operatorname{tr}(q)\in(e^{-a}/n,\min\{e^{a}/n,1\})\}.

4 Convergence

In this section, we will establish three convergence theorems for the sequence generated by the inexact Newton method described in (6). The proofs presented here rely on a Riemannian version of Theorem 5G.3 from [24], which is presented in the following lemma. The proof of this lemma is a straightforward adaptation of its Euclidean version and will be omitted.

Lemma 2.

Consider a set-valued mapping F:mF\colon\mathcal{M}\rightrightarrows\mathbb{R}^{m} and a point (p¯,x¯)gphF(\bar{p},\bar{x})\in\operatorname{gph}F at which FF is metrically regular with positive constants σ\sigma, aa and bb satisfying (10)-(11). Let ν>0\nu>0 be such that σν<1\sigma\nu<1 and σ>σ/(1σν)\sigma^{\prime}>\sigma/(1-\sigma\nu). Then for every positive α\alpha and β\beta such that

αa/2,να+2βb,2σβα\alpha\leq a/2,\quad\nu\alpha+2\beta\leq b,\quad 2\sigma^{\prime}\beta\leq\alpha

and for every function g:mg\colon{\cal M}\to\mathbb{R}^{m} satisfying

g(p¯)eβ,g(q)g(q)eνd(q,q),q,qα[p¯],\|g(\bar{p})\|_{e}\leq\beta,\quad\|g(q)-g(q^{\prime})\|_{e}\leq\nu d(q,q^{\prime}),\quad q,q^{\prime}\in{\cal B}_{\alpha}[\bar{p}],

the mapping g+Fg+F has the following property: for every x,x𝔹β[x¯]x,x^{\prime}\in\mathbb{B}_{\beta}[\bar{x}] and every p(g+F)1(x)α[p¯]p\in(g+F)^{-1}(x)\cap{\cal B}_{\alpha}[\bar{p}] there exists p(g+F)1(x)p^{\prime}\in(g+F)^{-1}(x^{\prime}) such that d(p,p)σxxe.d(p,p^{\prime})\leq\sigma^{\prime}\|x-x^{\prime}\|_{e}.

Our first convergence result is a local analysis of the inexact Newton method (6) for solving (2). This approach makes assumptions around a solution p¯\bar{p} of (2).

Theorem 3.

Let p¯\bar{p} be a solution of (2). Suppose that the following conditions hold:

  1. (i)

    the function f:mf\colon{\cal M}\to\mathbb{R}^{m} is continuously differentiable at p¯\bar{p} and reg(f+F;p¯|0)=σ\operatorname{reg}(f+F;\bar{p}|0)=\sigma;

  2. (ii)

    the sequence {Rk}\{R_{k}\} satisfies

    lim suppp¯pp¯{1d(p,p¯)supk0supxRk(p)xe}<1σ,k=0,1,.\underset{\begin{subarray}{c}p\to\bar{p}\\ p\neq\bar{p}\end{subarray}}{\limsup}\left\{\frac{1}{d(p,\bar{p})}\sup_{k\in\mathbb{N}_{0}}\sup_{x\in R_{k}(p)}\|x\|_{e}\right\}<\frac{1}{\sigma},\qquad k=0,1,\ldots. (27)

Then there exist θ(0,1)\theta\in(0,1) and a totally normal ball Br(p¯)\operatorname{B}_{r}(\bar{p})\subset{\cal M} such that for every pBr(p¯)\{p¯}p\in\operatorname{B}_{r}(\bar{p})\backslash\{\bar{p}\}, every k0k\in\mathbb{N}_{0}, and every ukRk(p)u_{k}\in R_{k}(p) there exists qBr(p¯)q^{\prime}\in\operatorname{B}_{r}(\bar{p}) satisfying

f(p)+𝒟f(p)[expp1q]+F(q)ukf(p)+{\cal D}f(p)[\exp^{-1}_{p}q^{\prime}]+F(q^{\prime})\ni u_{k} (28)

and

d(q,p¯)θd(p,p¯).d(q^{\prime},\bar{p})\leq\theta d(p,\bar{p}). (29)

Consequently, for any starting point p0Br(p¯)p_{0}\in\operatorname{B}_{r}(\bar{p}), there exists a sequence {pk}\{p_{k}\} generated by (6) that converges linearly to p¯\bar{p}.

Proof.

First, (27) implies that there exists ι>0\iota>0 satisfying the following inequality:

lim suppp¯pp¯{1d(p,p¯)supk0supxRk(p)xe}<ι<1σ.\underset{\begin{subarray}{c}p\to\bar{p}\\ p\neq\bar{p}\end{subarray}}{\limsup}\left\{\frac{1}{d(p,\bar{p})}\sup_{k\in\mathbb{N}_{0}}\sup_{x\in R_{k}(p)}\|x\|_{e}\right\}<\iota<\frac{1}{\sigma}. (30)

Choose ι\iota satisfying (30) and positive constants μ>0\mu>0, κ>σ\kappa>\sigma, ϵ>0\epsilon>0 and θ(0,1)\theta\in(0,1) such that

μκ<1,κ(ϵ+ι)<θ(1μκ).\mu\kappa<1,\qquad\kappa(\epsilon+\iota)<\theta(1-\mu\kappa). (31)

Pick Θ>0\Theta>0 and τ(σ,κ)\tau\in(\sigma,\kappa) such that

σ1μσ<Θ<κ1μκ,τ1μτ<Θ.\frac{\sigma}{1-\mu\sigma}<\Theta<\frac{\kappa}{1-\mu\kappa},\qquad\frac{\tau}{1-\mu\tau}<\Theta. (32)

From the first inequality in (30), there exists δ>0\delta>0 such that

xe<ιd(p,p¯)wheneverpδ(p¯)\{p¯},xRk(p),k0.\|x\|_{e}<\iota d(p,\bar{p})\quad\mbox{whenever}\quad p\in{\cal B}_{\delta}(\bar{p})\backslash\{\bar{p}\},\quad x\in R_{k}(p),\quad k\in\mathbb{N}_{0}. (33)

Choose a totally normal ball Bδ¯(p¯)δ(p¯)\operatorname{B}_{\bar{\delta}}(\bar{p})\subset{\cal B}_{\delta}(\bar{p}). Note that the function gp:mg_{p}\colon{\cal M}\to\mathbb{R}^{m} defined by

gp(q)={𝒟f(p)[expp1qexpp1p¯]𝒟f(p¯)[expp¯1q],qBδ¯(p¯),0,q\Bδ¯(p¯).g_{p}(q)=\begin{cases}{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}\bar{p}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}q],&q\in\operatorname{B}_{\bar{\delta}}(\bar{p}),\\ \qquad 0,&q\in{\cal M}\backslash\operatorname{B}_{\bar{\delta}}(\bar{p}).\end{cases} (34)

satisfies gp(p¯)=0g_{p}(\bar{p})=0 for all pBδ¯(p¯)p\in\operatorname{B}_{\bar{\delta}}(\bar{p}). On the other hand, since ff and FF satisfy (i)(i), it follows from Theorem 1 that reg(G;p¯|0)=σ\operatorname{reg}(G;\bar{p}|0)=\sigma, where G:mG\colon{\cal M}\rightrightarrows\mathbb{R}^{m} is the function defined in (13) with r=δ¯r=\bar{\delta}. Hence, by the definition of reg\operatorname{reg} in Definition 2, there exist a>0a>0 and b>0b>0 such that (10) and (11) are satisfied for Φ=G\Phi=G and σ=τ\sigma=\tau. Moreover, by Lemma 5 and (34), there exists α<min{δ¯,a/2,b/μ}\alpha<\min\{\bar{\delta},a/2,b/\mu\} such that

gp(q)gp(q)e=𝒟f(p)[expp1qexpp1q]𝒟f(p¯)[expp¯1qexpp¯1q]eμd(q,q),\|g_{p}(q)-g_{p}(q^{\prime})\|_{e}=\|{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}q^{\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}q-\exp_{\bar{p}}^{-1}q^{\prime}]\|_{e}\leq\mu d(q,q^{\prime}),

for all p,q,qBα(p¯)p,q,q^{\prime}\in\operatorname{B}_{\alpha}(\bar{p}). Therefore, for every pBα(p¯)p\in\operatorname{B}_{\alpha}(\bar{p}) one has

gp(p¯)=0,gp(q)gp(q)eμd(q,q),q,qBα[p¯].g_{p}(\bar{p})=0,\quad\|g_{p}(q)-g_{p}(q^{\prime})\|_{e}\leq\mu d(q,q^{\prime}),\quad q,q^{\prime}\in\operatorname{B}_{\alpha}[\bar{p}].

Overall, there exists β>0\beta>0 (independent of pp) such that

αa/2,μα+2βb,2Θβα,\alpha\leq a/2,\qquad\mu\alpha+2\beta\leq b,\qquad 2\Theta\beta\leq\alpha,

and

gp(p¯)eβ,gp(q)gp(q)eμd(q,q),p,q,qBα[p¯].\|g_{p}(\bar{p})\|_{e}\leq\beta,\quad\|g_{p}(q)-g_{p}(q^{\prime})\|_{e}\leq\mu d(q,q^{\prime}),\quad p,q,q^{\prime}\in\operatorname{B}_{\alpha}[\bar{p}].

Applying Lemma 2 with g=gpg=g_{p}, F=GF=G, ν=μ\nu=\mu, σ=Θ\sigma^{\prime}=\Theta and x¯=0\bar{x}=0, we conclude that for every x,x𝔹β[0]x,x^{\prime}\in\mathbb{B}_{\beta}[0] and every q(gp+G)1(x)Bα[p¯]q\in(g_{p}+G)^{-1}(x)\cap\operatorname{B}_{\alpha}[\bar{p}] there exists q(gp+G)1(x)q^{\prime}\in(g_{p}+G)^{-1}(x^{\prime}) such that d(q,q)Θxxe.d(q,q^{\prime})\leq\Theta\|x-x^{\prime}\|_{e}. Taking x=x¯=0x=\bar{x}=0 and q=p¯q=\bar{p}, we conclude that for every x𝔹β[0]x^{\prime}\in\mathbb{B}_{\beta}[0] there exists q(gp+G)1(x)q^{\prime}\in(g_{p}+G)^{-1}(x^{\prime}) such that

d(p¯,q)Θxe.d(\bar{p},q^{\prime})\leq\Theta\|x^{\prime}\|_{e}. (35)

Using the second part of Proposition 2 with ϵ>0\epsilon>0 chosen as in (31), we have the existence of a normal ball Bδϵ(p¯)\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) such that

f(p)f(p¯)𝒟f(p¯)[expp¯1p]eϵd(p,p¯),pBδϵ(p¯),\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq\epsilon d(p,\bar{p}),\qquad p\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}), (36)

Fixed rr such that

0<r<min{βϵ+ι,δ¯,δϵ}0<r<\min\left\{\frac{\beta}{\epsilon+\iota},\bar{\delta},\delta_{\epsilon}\right\} (37)

and pBr(p¯)\{p¯}p\in\operatorname{B}_{r}(\bar{p})\backslash\{\bar{p}\}, by choosing k0k\in\mathbb{N}_{0} and ukRk(p)u_{k}\in R_{k}(p), it follows from (33) that uku_{k} satisfies ukeιd(p,p¯)\|u_{k}\|_{e}\leq\iota d(p,\bar{p}). Denote

ykf(p)f(p¯)+𝒟f(p)[expp1p¯]uk.y_{k}\coloneqq f(p)-f(\bar{p})+{\cal D}f(p)[\exp^{-1}_{p}\bar{p}]-u_{k}. (38)

If yk=0y_{k}=0, then qp¯q^{\prime}\coloneqq\bar{p} satisfies (28) because f(p¯)F(p¯)-f(\bar{p})\in F(\bar{p}), and (29) holds trivially. For yk0y_{k}\neq 0, it follows from (38), (36) and (33) that

ykef(p)f(p¯)+𝒟f(p)[expp1p¯]e+uke(ϵ+ι)d(p,p¯).\|y_{k}\|_{e}\leq\|f(p)-f(\bar{p})+{\cal D}f(p)[\exp^{-1}_{p}\bar{p}]\|_{e}+\|u_{k}\|_{e}\leq(\epsilon+\iota)d(p,\bar{p}). (39)

Since d(p,p¯)<r<β/(ϵ+ι)d(p,\bar{p})<r<\beta/(\epsilon+\iota), it follows from (39) that yke<β\|y_{k}\|_{e}<\beta. Applying (35) with x=ykx^{\prime}=-y_{k}, we obtain that there exists q(gp+G)1(yk)q^{\prime}\in(g_{p}+G)^{-1}(-y_{k}) such that d(q,p¯)Θyked(q^{\prime},\bar{p})\leq\Theta\|y_{k}\|_{e}. Hence, from the upper bound for Θ\Theta given in (32), (39), and the last inequality in (31), it follows that

d(q,p¯)<κ1μκyke<κ(ϵ+ι)1μκd(p,p¯)<θd(p,p¯).d(q^{\prime},\bar{p})<\frac{\kappa}{1-\mu\kappa}\|y_{k}\|_{e}<\frac{\kappa(\epsilon+\iota)}{1-\mu\kappa}d(p,\bar{p})<\theta d(p,\bar{p}).

Furthermore, it comes from yk(gp+G)(q)-y_{k}\in(g_{p}+G)(q^{\prime}), (38), (34) and (13) that

f(p)+f(p¯)𝒟f(p)[expp1p¯]+ukf(p¯)+𝒟f(p)[expp1qexpp1p¯]+F(q),-f(p)+f(\bar{p})-{\cal D}f(p)[\exp^{-1}_{p}\bar{p}]+u_{k}\in f(\bar{p})+{\cal D}f(p)[\exp_{p}^{-1}q^{\prime}-\exp_{p}^{-1}\bar{p}]+F(q^{\prime}),

which means that ukf(p)+𝒟f(p)[expp1q]+F(q).u_{k}\in f(p)+{\cal D}f(p)[\exp_{p}^{-1}q^{\prime}]+F(q^{\prime}). Thus, qq^{\prime} satisfies (28) and (29).

Now choose p0Br(p¯)p_{0}\in\operatorname{B}_{r}(\bar{p}). If pk=p¯p_{k}=\bar{p}, then pk+1p¯p_{k+1}\coloneqq\bar{p} verifies (6) because 0Rk(x¯)0\in R_{k}(\bar{x}). If pkp¯p_{k}\neq\bar{p}, applying (28) and (29) with p=p0p=p_{0} and q=p1q^{\prime}=p_{1} we have f(p0)+𝒟f(p0)[expp01p1]+F(p1)u0f(p_{0})+{\cal D}f(p_{0})[\exp^{-1}_{p_{0}}p_{1}]+F(p_{1})\ni u_{0} and

d(p1,p¯)θd(p0,p¯).d(p_{1},\bar{p})\leq\theta d(p_{0},\bar{p}).

Repeating this argument one can conclude that there exists a sequence {pk}\{p_{k}\} in Br(p¯)\operatorname{B}_{r}(\bar{p}) which satisfies (6) and converges linearly to p¯\bar{p}. ∎

An upper bound for the radius rr of the ball mentioned in the statement of Theorem 3 is given in (37). This bound depends on certain constants that, while not known a priori, are guaranteed to exist due to conditions i) and ii) in Lemma 2, as well as additional results presented in the Appendix of this paper.

Below, we present a version of Theorem 3 with more technical conditions, where it is assumed that the constants mentioned above are known. Under these new conditions, we can explicitly determine a radius of convergence for the sequence pk{p_{k}} generated by (6). The proof of this new theorem follows a similar structure to the proof of Theorem 3 and will therefore be omitted.

Theorem 4.

Let p¯\bar{p} be a solution to the equation (2). We assume that the following conditions are satisfied:

  1. (i)

    Bδ¯(p¯)\operatorname{B}_{\bar{\delta}}(\bar{p}) is a totally normal ball, and the mapping G:mG\colon{\cal M}\rightrightarrows\mathbb{R}^{m} defined in (13) with r=δ¯r=\bar{\delta} is metrically regular at p¯\bar{p} for 0 with the constant τ>0\tau>0 and neighborhoods a[p¯]{\cal B}_{a}[\bar{p}] and 𝔹b[0]\mathbb{B}_{b}[0].

  2. (ii)

    ι(0,1/τ)\iota\in(0,1/\tau) is chosen such that xe<ιd(p,p¯)\|x\|_{e}<\iota d(p,\bar{p}) holds for all pBδ¯(p¯)\{p¯}p\in\operatorname{B}_{\bar{\delta}}(\bar{p})\backslash\{\bar{p}\}, xRk(p)x\in R_{k}(p), k0k\in\mathbb{N}_{0}.

  3. (iii)

    μ>0\mu>0, κ>τ\kappa>\tau, ϵ>0\epsilon>0 and Θ>0\Theta>0 satisfy

    μκ<1,κ(ϵ+ι)<1μκ,τ1μτ<Θ<κ1μκ.\mu\kappa<1,\qquad\kappa(\epsilon+\iota)<1-\mu\kappa,\qquad\frac{\tau}{1-\mu\tau}<\Theta<\frac{\kappa}{1-\mu\kappa}.
  4. (iv)

    α<min{δ¯,a/2,b/μ}\alpha<\min\{\bar{\delta},a/2,b/\mu\} and

    𝒟f(p)[expp1qexpp1q]𝒟f(p¯)[expp¯1qexpp¯1q]eμd(q,q)for all p,q,qBα(p¯).\|{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}q^{\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}q-\exp_{\bar{p}}^{-1}q^{\prime}]\|_{e}\leq\mu d(q,q^{\prime})\quad\mbox{for all }\,p,q,q^{\prime}\in\operatorname{B}_{\alpha}(\bar{p}).
  5. (v)

    β>0\beta>0 is such that μα+2βb\mu\alpha+2\beta\leq b and 2Θβα2\Theta\beta\leq\alpha.

  6. (vi)

    δϵ>0\delta_{\epsilon}>0 satisfies f(p)f(p¯)𝒟f(p¯)[expp¯1p]eϵd(p,p¯)\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq\epsilon d(p,\bar{p}) for all pBδϵ(p¯)p\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}).

Then, for every starting point p0Br(p¯)p_{0}\in\operatorname{B}_{r}(\bar{p}) with 0<r<min{β/(ϵ+ι),δ¯,δϵ}0<r<\min\{\beta/(\epsilon+\iota),\bar{\delta},\delta_{\epsilon}\}, there exists a sequence {pk}\{p_{k}\} generated by (6) that is well defined and converges linearly to p¯\bar{p}.

Next, we modify the assumption on the multifunction RkR_{k} and introduce a stronger condition for the differentiability of the function ff in order to establish quadratic convergence of the sequence (6).

Theorem 5.

Let p¯\bar{p} be a solution of (2). Suppose the following conditions hold:

  1. (i)

    the function f:mf\colon{\cal M}\to\mathbb{R}^{m} is continuously differentiable at p¯\bar{p}, and reg(f+F;p¯|0)=σ\operatorname{reg}(f+F;\bar{p}|0)=\sigma;

  2. (ii)

    𝒟f{\cal D}f is L-Lipschitz continuous around p¯\bar{p};

  3. (iii)

    the sequence {Rk}\{R_{k}\} satisfies

    lim suppp¯pp¯{1d2(p,p¯)supk0supxRk(p)xe}<1σ,k=0,1,.\underset{\begin{subarray}{c}p\to\bar{p}\\ p\neq\bar{p}\end{subarray}}{\limsup}\left\{\frac{1}{d^{2}(p,\bar{p})}\sup_{k\in\mathbb{N}_{0}}\sup_{x\in R_{k}(p)}\|x\|_{e}\right\}<\frac{1}{\sigma},\qquad k=0,1,\ldots. (40)

Then there exist θ>0\theta>0 and a totally normal ball Br(p¯)\operatorname{B}_{r}(\bar{p})\subset{\cal M} such that for every pBr(p¯)\{p¯}p\in\operatorname{B}_{r}(\bar{p})\backslash\{\bar{p}\}, every k0k\in\mathbb{N}_{0}, and every ukRk(p)u_{k}\in R_{k}(p), there exists qBr(p¯)q^{\prime}\in\operatorname{B}_{r}(\bar{p}) satisfying

f(p)+𝒟f(p)[expp1q]+F(q)ukf(p)+{\cal D}f(p)[\exp^{-1}_{p}q^{\prime}]+F(q^{\prime})\ni u_{k} (41)

and

d(q,p¯)θd2(p,p¯).d(q^{\prime},\bar{p})\leq\theta d^{2}(p,\bar{p}). (42)

Consequently, for any starting point p0Br(p¯)p_{0}\in\operatorname{B}_{r}(\bar{p}), there exists a sequence {pk}\{p_{k}\} generated by (6) that converges quadratically to p¯\bar{p}.

Proof.

This proof is analogous to the proof of Theorem 3. Using (40), we can find ι<1/σ\iota<1/\sigma and a totally normal ball Bδ(p¯)\operatorname{B}_{\delta}(\bar{p}) such that

xe<ιd2(p,p¯)wheneverpBδ(p¯)\{p¯},k0,xRk(p).\|x\|_{e}<\iota d^{2}(p,\bar{p})\quad\mbox{whenever}\quad p\in\operatorname{B}_{\delta}(\bar{p})\backslash\{\bar{p}\},\quad k\in\mathbb{N}_{0},\quad x\in R_{k}(p). (43)

Choose μ>0\mu>0, κ>σ\kappa>\sigma, Θ>0\Theta>0 and τ(σ,κ)\tau\in(\sigma,\kappa) satisfying

μκ<1,σ1μσ<Θ<κ1μκ,τ1μτ<Θ.\mu\kappa<1,\qquad\frac{\sigma}{1-\mu\sigma}<\Theta<\frac{\kappa}{1-\mu\kappa},\qquad\frac{\tau}{1-\mu\tau}<\Theta. (44)

Note that μτ<1\mu\tau<1. From (i)(i) and Theorem 1, we obtain reg(G;p¯|0)=σ\operatorname{reg}(G;\bar{p}|0)=\sigma, where G:mG\colon\mathcal{M}\rightrightarrows\mathbb{R}^{m} is the function defined in (13) with r=δr=\delta. Thus, there exist a>0a>0 and b>0b>0 such that GG is metrically regular at p¯\bar{p} for 0 with the constant τ\tau and neighborhoods a[p¯]{\cal B}_{a}[\bar{p}] and 𝔹b[0]\mathbb{B}_{b}[0]. Now, consider the auxiliary functions gp:mg_{p}\colon{\cal M}\to\mathbb{R}^{m}, pBδ(p¯)p\in\operatorname{B}_{\delta}(\bar{p}), defined in (34). Recall that gp(p¯)=0g_{p}(\bar{p})=0 for all pBδ(p¯)p\in\operatorname{B}_{\delta}(\bar{p}). Make δ>0\delta>0 smaller, if necessary, to ensure

gp(q)gp(q)eμd(q,q),p,q,qBδ[p¯].\|g_{p}(q)-g_{p}(q^{\prime})\|_{e}\leq\mu d(q,q^{\prime}),\qquad p,q,q^{\prime}\in\operatorname{B}_{\delta}[\bar{p}].

Hence, we can apply Lemma 2 with g=gpg=g_{p}, F=GF=G, σ=τ\sigma=\tau, ν=μ\nu=\mu, σ=Θ\sigma^{\prime}=\Theta, x=x¯=0x=\bar{x}=0, and p=p¯p=\bar{p}. This yields the existence of β>0\beta>0 (independent of the point pδ(p¯)p\in\mathcal{B}_{\delta}(\bar{p}) that determines the function gpg_{p}) such that for each x𝔹β[0]x^{\prime}\in\mathbb{B}_{\beta}[0], there exists q(gp+G)1(x)q^{\prime}\in(g_{p}+G)^{-1}(x^{\prime}) such that

d(q,p¯)Θxe.d(q^{\prime},\bar{p})\leq\Theta\|x^{\prime}\|_{e}. (45)

With (ii)(ii), the last part of Proposition 2 implies that there exists δL>0\delta_{L}>0 such that

f(p)f(p¯)𝒟f(p¯)[expp¯1p]eLd2(p,p¯),pBδL(p¯).\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq Ld^{2}(p,\bar{p}),\qquad p\in\operatorname{B}_{\delta_{L}}(\bar{p}). (46)

Fixed rr such that

0<r<min{(βL+ι)1/2,1μκκ(L+ι),δ,δL}0<r<\min\left\{\left(\frac{\beta}{L+\iota}\right)^{1/2},\frac{1-\mu\kappa}{\kappa(L+\iota)},\;\delta,\;\delta_{L}\right\} (47)

and pBr(p¯)\{p¯}p\in\operatorname{B}_{r}(\bar{p})\backslash\{\bar{p}\} fixed, by choosing k0k\in\mathbb{N}_{0} and ukRk(p)u_{k}\in R_{k}(p) it follows from (43) that uku_{k} satisfies ukeιd2(p,p¯)\|u_{k}\|_{e}\leq\iota d^{2}(p,\bar{p}). Denote

ykf(p)f(p¯)+𝒟f(p)[expp1p¯]uk.y_{k}\coloneqq f(p)-f(\bar{p})+{\cal D}f(p)[\exp^{-1}_{p}\bar{p}]-u_{k}. (48)

If yk=0y_{k}=0, then qp¯q^{\prime}\coloneqq\bar{p} satisfies (41) because f(p¯)F(p¯)-f(\bar{p})\in F(\bar{p}) and (42) hold trivially. Assume that yk0y_{k}\neq 0. Using (46) and (48), we obtain

ykef(p)f(p¯)+𝒟f(p)[expp1p¯]e+uke(L+ι)d2(p,p¯).\|y_{k}\|_{e}\leq\|f(p)-f(\bar{p})+{\cal D}f(p)[\exp^{-1}_{p}\bar{p}]\|_{e}+\|u_{k}\|_{e}\leq(L+\iota)d^{2}(p,\bar{p}). (49)

Since d(p,p¯)<r<(β/(L+ι))1/2d(p,\bar{p})<r<\left(\beta/(L+\iota)\right)^{1/2}, it follows from (49) that yke<β\|y_{k}\|_{e}<\beta. Applying (45) with x=ykx^{\prime}=-y_{k}, we obtain that there exists q(gp+G)1(yk)q^{\prime}\in(g_{p}+G)^{-1}(-y_{k}) such that d(q,p¯)Θyked(q^{\prime},\bar{p})\leq\Theta\|y_{k}\|_{e}. Hence, utilizing the upper bound for Θ\Theta given in (44) and (49), it follows that

d(q,p¯)κ1μκykeθd2(p,p¯)whereθ:=κ(L+ι)1μκ.d(q^{\prime},\bar{p})\leq\frac{\kappa}{1-\mu\kappa}\|y_{k}\|_{e}\leq\theta d^{2}(p,\bar{p})\quad\mbox{where}\quad\theta:=\frac{\kappa(L+\iota)}{1-\mu\kappa}.

Furthermore, it comes from yk(gp+G)(q)-y_{k}\in(g_{p}+G)(q^{\prime}), (48), (34) and (13) that

f(p)+f(p¯)𝒟f(p)[expp1p¯]+ukf(p¯)+𝒟f(p)[expp1qexpp1p¯]+F(q),-f(p)+f(\bar{p})-{\cal D}f(p)[\exp^{-1}_{p}\bar{p}]+u_{k}\in f(\bar{p})+{\cal D}f(p)[\exp_{p}^{-1}q^{\prime}-\exp_{p}^{-1}\bar{p}]+F(q^{\prime}),

which means that ukf(p)+𝒟f(p)[expp1q]+F(q).u_{k}\in f(p)+{\cal D}f(p)[\exp_{p}^{-1}q^{\prime}]+F(q^{\prime}). Thus, qq^{\prime} satisfies (41) and (42).

To finish the proof, choose any p0Br(p¯)p_{0}\in\operatorname{B}_{r}(\bar{p}). If p0=p¯p_{0}=\bar{p}, then p1p¯p_{1}\coloneqq\bar{p} verifies (6) because 0R0(p¯)0\in R_{0}(\bar{p}). If p0p¯p_{0}\neq\bar{p}, applying (41) and (42) we obtain that for every u0R0(p0)u_{0}\in R_{0}(p_{0}) there exists p1p_{1} such that

f(p0)+𝒟f(p0)[expp01p1]+F(p1)u0andd(p1,p¯)θd2(p0,p¯).f(p_{0})+{\cal D}f(p_{0})[\exp^{-1}_{p_{0}}p_{1}]+F(p_{1})\ni u_{0}\quad\mbox{and}\quad d(p_{1},\bar{p})\leq\theta d^{2}(p_{0},\bar{p}).

By considering the definition of rr and θ,\theta, we obtain from the above inequality that p1Br(p¯).p_{1}\in\operatorname{B}_{r}(\bar{p}). Repeating the previous argument it is possible to construct a sequence {pk}\{p_{k}\} in Br(p¯)\operatorname{B}_{r}(\bar{p}) that satisfies (6) and converges quadratically to p¯\bar{p}. ∎

Under suitable conditions, it is possible to determine the radius rr mentioned in Theorem 5. Details are given in the following result. The proof of this result is along the same lines as the proof of Theorem 5 and will therefore be omitted.

Theorem 6.

Let p¯\bar{p} be a solution of (2). Suppose that the following conditions hold:

  1. (i)

    Bδ(p¯)\operatorname{B}_{\delta}(\bar{p}) is a totally normal ball, and the mapping G:mG\colon{\cal M}\rightrightarrows\mathbb{R}^{m} defined in (13) with r=δr=\delta is metrically regular at p¯\bar{p} for 0 with the constant τ>0\tau>0 and neighborhoods a[p¯]{\cal B}_{a}[\bar{p}] and 𝔹b[0]\mathbb{B}_{b}[0].

  2. (ii)

    ι(0,1/τ)\iota\in(0,1/\tau) satisfies xe<ιd2(p,p¯)\|x\|_{e}<\iota d^{2}(p,\bar{p}) for all pBδ(p¯)\{p¯}p\in\operatorname{B}_{\delta}(\bar{p})\backslash\{\bar{p}\}, xRk(p)x\in R_{k}(p), k0k\in\mathbb{N}_{0}.

  3. (iii)

    μ>0\mu>0, κ>τ\kappa>\tau and Θ>0\Theta>0 satisfy

    μκ<1,τ1μτ<Θ<κ1μκ.\mu\kappa<1,\qquad\frac{\tau}{1-\mu\tau}<\Theta<\frac{\kappa}{1-\mu\kappa}.
  4. (iv)

    α<min{δ,a/2,b/μ}\alpha<\min\{\delta,a/2,b/\mu\} and

    𝒟f(p)[expp1qexpp1q]𝒟f(p¯)[expp¯1qexpp¯1q]eμd(q,q)for all p,q,qBα(p¯).\|{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}q^{\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}q-\exp_{\bar{p}}^{-1}q^{\prime}]\|_{e}\leq\mu d(q,q^{\prime})\quad\mbox{for all }\,p,q,q^{\prime}\in\operatorname{B}_{\alpha}(\bar{p}).
  5. (v)

    β>0\beta>0 satisfies μα+2βb\mu\alpha+2\beta\leq b and 2Θβα2\Theta\beta\leq\alpha.

  6. (vi)

    L>0L>0 and δL>0\delta_{L}>0 satisfy f(p)f(p¯)𝒟f(p¯)[expp¯1p]eLd2(p,p¯)\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq Ld^{2}(p,\bar{p}) for all pBδL(p¯)p\in\operatorname{B}_{\delta_{L}}(\bar{p}).

Then for every rr satisfying (47) and starting point p0Br(p¯)p_{0}\in\operatorname{B}_{r}(\bar{p}) there exists a sequence {pk}\{p_{k}\} generated by (6) that is well defined and converges linearly to p¯\bar{p}.

Some comments about the previous results are in order. First, Theorems 3 and 5 establish conditions for ensuring that the inexact Newton method (6) converges with linear and quadratic rates, respectively. Second, a similar result to Theorem 3 is presented in [17, Theorem 2.1] by considering {\cal M} as Banach spaces. Thus, if =X{\cal M}=X, where XX is a Banach space, and the derivative 𝒟f(){\cal D}f(\cdot) is replaced by a suitable approximation, then Theorems 3 and [17, Theorem 2.1] are equivalent. Third, to ensure superlinear and quadratic convergence, the authors in [17] assume a stronger condition, namely, the strong metric regularity (see [17, Theorem 2.3]). We recall that a set-valued mapping FF is strongly metrically regular at x¯\bar{x} for y¯\bar{y} if and only if its inverse F1F^{-1} has a single-valued graphical localization around y¯\bar{y} for x¯\bar{x} which is Lipschitz continuous around y¯\bar{y} with Lipschitz modulus at y¯\bar{y} equal to reg(F;x¯|y¯)\operatorname{reg}(F;\bar{x}|\bar{y}), see [17, p. 1007]. Fourth, we establish quadratic convergence of (6) by only assuming the metric regularity condition, which is clearly a weaker assumption than strong metric regularity, see the previous comment. Thus, the result obtained in Theorem 5 is stronger than [17, Theorem 2.3]. Finally, Theorems 4 and 6 refine Theorems 3 and 5, respectively, in the sense that they provide guidance on how to find the neighborhood to start the proposed method in (6).

It is also important to mention that in [17] is introduced a version of the Dennis-Moré theorem for the sequence generated by (6). We do not go futher in this topic because we do not proposed a quasi-Newton method in this paper.

We conclude this section by presenting a semi-local convergence result for the inexact Newton method proposed in this paper to solve (2). This result makes no assumptions about the unknown solution to the problem under investigation; instead, the assumptions are made about the starting point p0p_{0}. It is worth mentioning that this result is novel, even in the Euclidean context.

Theorem 7.

Assume that for (p0,y0)Ω×m(p_{0},y_{0})\in\Omega\times\mathbb{R}^{m} with y0f(p0)+F(p0)y_{0}\in f(p_{0})+F(p_{0}) and u0R0(p0)u_{0}\in R_{0}(p_{0}), the following conditions hold:

  1. (i)

    Bδ(p0)Ω\operatorname{B}_{\delta}(p_{0})\subset\Omega is a totally normal ball, and the mapping G:mG\colon{\cal M}\rightrightarrows\mathbb{R}^{m} defined by

    G(p)={f(p0)+𝒟f(p0)[expp01p]+F(p),pBδ(p0)f(p)+F(p),p\Bδ(p0)G(p)=\begin{cases}f(p_{0})+{\cal D}f(p_{0})[\exp^{-1}_{p_{0}}p]+F(p),&p\in\operatorname{B}_{\delta}(p_{0})\\ f(p)+F(p),&p\in{\cal M}\backslash\operatorname{B}_{\delta}(p_{0})\end{cases} (50)

    is metrically regular at p0p_{0} for y0y_{0} with the constant σ>0\sigma>0 and neighborhoods a[p0]{\cal B}_{a}[p_{0}] and 𝔹b[y0]\mathbb{B}_{b}[y_{0}].

  2. (ii)

    the positive constants μ\mu, α\alpha, β\beta, Θ\Theta, ϵ\epsilon and ι\iota satisfy

    μσ<1,αa/2,μα+2βb,\mu\sigma<1,\qquad\alpha\leq a/2,\qquad\mu\alpha+2\beta\leq b,
    σ/(1μσ)<Θα/(2β),ϵ+ι<2β/α.\sigma/(1-\mu\sigma)<\Theta\leq\alpha/(2\beta),\qquad\epsilon+\iota<2\beta/\alpha.
  3. (iii)

    u0eιy0e\|u_{0}\|_{e}\leq\iota\|y_{0}\|_{e} and for all pBδ(p0)p\in\operatorname{B}_{\delta}(p_{0}) there holds

    y0emin{βΘ(1+ι),β1+ι,β(1α^)α^α^2+1,δ(1α^)Θ(1+ι)},α^Θ(ϵ+ι).\|y_{0}\|_{e}\leq\min\left\{\frac{\beta}{\Theta(1+\iota)},\frac{\beta}{1+\iota},\frac{\beta(1-\hat{\alpha})}{\hat{\alpha}-\hat{\alpha}^{2}+1},\frac{\delta(1-\hat{\alpha})}{\Theta(1+\iota)}\right\},\qquad\hat{\alpha}\coloneqq\Theta(\epsilon+\iota). (51)
  4. (iv)

    xe+yeιd(p~,q~)\|x\|_{e}+\|y\|_{e}\leq\iota d(\tilde{p},\tilde{q}) holds for all p~,q~Bδ(p0)\tilde{p},\tilde{q}\in\operatorname{B}_{\delta}(p_{0}), xRk(p~)x\in R_{k}(\tilde{p}), yRk1(q~)y\in R_{k-1}(\tilde{q}), kk\in\mathbb{N} and p~q~\tilde{p}\neq\tilde{q}.

  5. (v)

    f:mf\colon{\cal M}\to\mathbb{R}^{m} is continuously differentiable, and the inequalities

    f(q)f(p)𝒟f(p)[expp1q]eϵd(p,q)\|f(q)-f(p)-{\cal D}f(p)[\exp^{-1}_{p}q]\|_{e}\leq\epsilon d(p,q)

    and

    𝒟f(p)[expp1qexpp1q]𝒟f(p0)[expp01qexpp01q]eμd(q,q)\|{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}q^{\prime}]-{\cal D}f(p_{0})[\exp_{p_{0}}^{-1}q-\exp_{p_{0}}^{-1}q^{\prime}]\|_{e}\leq\mu d(q,q^{\prime})

    hold for all p,q,qBδ(p0)p,q,q^{\prime}\in\operatorname{B}_{\delta}(p_{0}).

Then there exist sequences {pk}k\{p_{k}\}_{k\in\mathbb{N}} and {uk}k\{u_{k}\}_{k\in\mathbb{N}} satisfying:

  • (A1)

    d(pk,p0)1α^k1α^Θ(1+ι)y0e.d(p_{k},p_{0})\leq\frac{1-\hat{\alpha}^{k}}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e}.

  • (A2)

    d(pk,pk1)α^k1Θ(1+ι)y0e.d(p_{k},p_{k-1})\leq\hat{\alpha}^{k-1}\Theta(1+\iota)\|y_{0}\|_{e}.

  • (A3)

    uk1f(pk1)+𝒟f(pk1)[exppk11pk]+F(pk).u_{k-1}\in f(p_{k-1})+{\cal D}f(p_{k-1})[\exp^{-1}_{p_{k-1}}p_{k}]+F(p_{k}).

In particular, {pk}k\{p_{k}\}_{k\in\mathbb{N}} remains in Bδ(p0)\operatorname{B}_{\delta}(p_{0}), converges to a solution p¯\bar{p} of (2), and satisfies

d(pk,p¯)α^k1α^Θ(1+ι)y0e for all k.d(p_{k},\bar{p})\leq\frac{\hat{\alpha}^{k}}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e}\,\,\mbox{ for all }\,\,k\in\mathbb{N}.
Proof.

Note that the function gp:mg_{p}\colon{\cal M}\to\mathbb{R}^{m}, pBδ(p0)p\in\operatorname{B}_{\delta}(p_{0}), defined by

gp(q)={𝒟f(p)[expp1qexpp1p0]𝒟f(p0)[expp01q],qBδ(p0)0,q\Bδ(p0),g_{p}(q)=\begin{cases}{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}p_{0}]-{\cal D}f(p_{0})[\exp_{p_{0}}^{-1}q],&q\in\operatorname{B}_{\delta}(p_{0})\\ \qquad 0,&q\in{\cal M}\backslash\operatorname{B}_{\delta}(p_{0}),\end{cases} (52)

satisfies gp(p0)=0g_{p}(p_{0})=0. From (52) and the second inequality in (v)(v), it follows that

gp(q)gp(q)e=𝒟f(p)[expp1qexpp1q]𝒟f(p0)[expp01qexpp01q]eμd(q,q),\|g_{p}(q)-g_{p}(q^{\prime})\|_{e}=\|{\cal D}f(p)[\exp_{p}^{-1}q-\exp_{p}^{-1}q^{\prime}]-{\cal D}f(p_{0})[\exp_{p_{0}}^{-1}q-\exp_{p_{0}}^{-1}q^{\prime}]\|_{e}\leq\mu d(q,q^{\prime}),

for all p,q,qBδ(p0)p,q,q^{\prime}\in\operatorname{B}_{\delta}(p_{0}). Thus, by using (ii)(ii), one has

αa/2,μα+2βb,2Θβα,\alpha\leq a/2,\qquad\mu\alpha+2\beta\leq b,\qquad 2\Theta\beta\leq\alpha,

and

gp(p0)e<β,gp(q)gp(q)eμd(q,q),p,q,qα(p0).\|g_{p}(p_{0})\|_{e}<\beta,\quad\|g_{p}(q)-g_{p}(q^{\prime})\|_{e}\leq\mu d(q,q^{\prime}),\quad p,q,q^{\prime}\in{\cal B}_{\alpha}(p_{0}).

Hence, we can apply Lemma 2 with g=gpg=g_{p}, F=GF=G, ν=μ\nu=\mu, σ=Θ\sigma^{\prime}=\Theta, x¯=y0\bar{x}=y_{0} and p¯=p0,\bar{p}=p_{0}, to obtain the following statement:

  • (S)

    for every x,x𝔹β[y0]x,x^{\prime}\in\mathbb{B}_{\beta}[y_{0}] and every q~(gp+G)1(x)Bα[p0]\tilde{q}\in(g_{p}+G)^{-1}(x)\cap\operatorname{B}_{\alpha}[p_{0}] there exists q~(gp+G)1(x)\tilde{q}^{\prime}\in(g_{p}+G)^{-1}(x^{\prime}) such that d(q~,q~)Θxxed(\tilde{q},\tilde{q}^{\prime})\leq\Theta\|x-x^{\prime}\|_{e}.

Due to (iv)(iv), for every kk\in\mathbb{N}, the following can be stated:

pkpk1Bδ(p0),ukRk(pk)uke+uk1eιd(pk,pk1).p_{k}\neq p_{k-1}\in\operatorname{B}_{\delta}(p_{0}),\,u_{k}\in R_{k}(p_{k})\implies\|u_{k}\|_{e}+\|u_{k-1}\|_{e}\leq\iota d(p_{k},p_{k-1}). (53)

We are now able to prove (A1)–(A3) by induction. By using statement (S) with x=y0,x=y_{0}, q~=p=p0,\tilde{q}=p=p_{0}, and x=u0,x^{\prime}=u_{0}, we conclude that there exists p1:=q~(gp0+G)1(u0)p_{1}:=\tilde{q}^{\prime}\in(g_{p_{0}}+G)^{-1}(u_{0}) such that

d(p1,p0)Θy0u0eΘ(1+ι)y0e,d(p_{1},p_{0})\leq\Theta\|y_{0}-u_{0}\|_{e}\leq\Theta(1+\iota)\|y_{0}\|_{e},

where the last inequality follows from (iii)(iii). Moreover, the inclusion p1(gp0+G)1(u0)p_{1}\in(g_{p_{0}}+G)^{-1}(u_{0}) is equivalent to

u0f(p0)+𝒟f(p0)[expp01p1]+F(p1).u_{0}\in f(p_{0})+{\cal D}f(p_{0})[\exp^{-1}_{p_{0}}p_{1}]+F(p_{1}).

Hence, (A1)–(A3) hold for k=1k=1. For k>1k>1, assume the induction hypothesis: there exist pjBδ(p0)p_{j}\in\operatorname{B}_{\delta}(p_{0}) and ujRj(pj)u_{j}\in R_{j}(p_{j}) such that (A1)–(A3) hold for every j{1,2,,k}j\in\{1,2,\ldots,k\}. Denote

zk\displaystyle z_{k} f(p0)f(pk)𝒟f(pk)[exppk1p0]+uk,\displaystyle\coloneqq f(p_{0})-f(p_{k})-{\cal D}f(p_{k})[\exp^{-1}_{p_{k}}p_{0}]+u_{k}, (54)
wk\displaystyle w_{k} f(p0)f(pk1)𝒟f(pk)[exppk1p0]𝒟f(pk1)[exppk11pk]+uk1.\displaystyle\coloneqq f(p_{0})-f(p_{k-1})-{\cal D}f(p_{k})[\exp^{-1}_{p_{k}}p_{0}]-{\cal D}f(p_{k-1})[\exp^{-1}_{p_{k-1}}p_{k}]+u_{k-1}. (55)

By (A3), (52), (50) we find pk(gpk+G)1(wk)p_{k}\in(g_{p_{k}}+G)^{-1}(w_{k}). On the other hand, the first inequality in (v)(v), (53), (A1), (A2) and (51) yield

zky0e\displaystyle\|z_{k}-y_{0}\|_{e} f(p0)f(pk)𝒟f(pk)[exppk1p0]e+uke+y0e\displaystyle\leq\|f(p_{0})-f(p_{k})-{\cal D}f(p_{k})[\exp^{-1}_{p_{k}}p_{0}]\|_{e}+\|u_{k}\|_{e}+\|y_{0}\|_{e}
ϵd(pk,p0)+ιd(pk,pk1)+y0e\displaystyle\leq\epsilon d(p_{k},p_{0})+\iota d(p_{k},p_{k-1})+\|y_{0}\|_{e}
ϵd(pk,p0)+(ϵ+ι)d(pk,pk1)+y0e\displaystyle\leq\epsilon d(p_{k},p_{0})+(\epsilon+\iota)d(p_{k},p_{k-1})+\|y_{0}\|_{e}

and

wky0e\displaystyle\|w_{k}-y_{0}\|_{e} f(p0)f(pk)𝒟f(pk)[exppk1p0]e\displaystyle\leq\|f(p_{0})-f(p_{k})-{\cal D}f(p_{k})[\exp^{-1}_{p_{k}}p_{0}]\|_{e}
+f(pk)f(pk1)𝒟f(pk1)[exppk11pk]e+uk1e+y0e\displaystyle+\|f(p_{k})-f(p_{k-1})-{\cal D}f(p_{k-1})[\exp^{-1}_{p_{k-1}}p_{k}]\|_{e}+\|u_{k-1}\|_{e}+\|y_{0}\|_{e}
ϵd(pk,p0)+(ϵ+ι)d(pk,pk1)+y0e\displaystyle\leq\epsilon d(p_{k},p_{0})+(\epsilon+\iota)d(p_{k},p_{k-1})+\|y_{0}\|_{e}

and

ϵd(pk,p0)+(ϵ+ι)d(pk,pk1)+y0e\displaystyle\epsilon d(p_{k},p_{0})+(\epsilon+\iota)d(p_{k},p_{k-1})+\|y_{0}\|_{e} ϵΘ1α^y0+(ϵ+ι)Θy0+y0e\displaystyle\leq\frac{\epsilon\Theta}{1-\hat{\alpha}}\|y_{0}\|+(\epsilon+\iota)\Theta\|y_{0}\|+\|y_{0}\|_{e}
(ϵ+ι)Θ1α^y0+(ϵ+ι)Θy0+y0e\displaystyle\leq\frac{(\epsilon+\iota)\Theta}{1-\hat{\alpha}}\|y_{0}\|+(\epsilon+\iota)\Theta\|y_{0}\|+\|y_{0}\|_{e}
=α^α^2+11α^y0eβ,\displaystyle=\frac{\hat{\alpha}-\hat{\alpha}^{2}+1}{1-\hat{\alpha}}\|y_{0}\|_{e}\leq\beta,

which implies wk,zk𝔹β[y0]w_{k},z_{k}\in\mathbb{B}_{\beta}[y_{0}]. Therefore, we can apply (S) with x=zk,x^{\prime}=z_{k}, q~=p=pk\tilde{q}=p=p_{k} and x=wkx=w_{k} to conclude that there exists q(gpk+G)1(zk)q^{\prime}\in(g_{p_{k}}+G)^{-1}(z_{k}) such that d(q,pk)Θzkwked(q^{\prime},p_{k})\leq\Theta\|z_{k}-w_{k}\|_{e}. Putting q=pk+1q^{\prime}=p_{k+1} in this inequality and taking into account (54), (55), the first inequality in (v)(v), (53) and (A2), we get

d(pk+1,pk)\displaystyle d(p_{k+1},p_{k}) Θzkwke\displaystyle\leq\Theta\|z_{k}-w_{k}\|_{e}
Θ(f(pk)+f(pk1)+𝒟f(pk1)[exppk11pk]e+uke+uk1e)\displaystyle\leq\Theta\left(\|-f(p_{k})+f(p_{k-1})+{\cal D}f(p_{k-1})[\exp^{-1}_{p_{k-1}}p_{k}]\|_{e}+\|u_{k}\|_{e}+\|u_{k-1}\|_{e}\right)
Θ(ϵ+ι)d(pk,pk1)=α^d(pk,pk1)α^kΘy0eα^kΘ(1+ι)y0e.\displaystyle\leq\Theta(\epsilon+\iota)d(p_{k},p_{k-1})=\hat{\alpha}d(p_{k},p_{k-1})\leq\hat{\alpha}^{k}\Theta\|y_{0}\|_{e}\leq\hat{\alpha}^{k}\Theta(1+\iota)\|y_{0}\|_{e}.

In view of this and (A1), we also have

d(pk+1,p0)\displaystyle d(p_{k+1},p_{0}) d(pk+1,pk)+d(pk,p0)\displaystyle\leq d(p_{k+1},p_{k})+d(p_{k},p_{0})
α^kΘ(1+ι)y0e+1α^k1α^Θ(1+ι)y0e=1α^k+11α^Θ(1+ι)y0e.\displaystyle\leq\hat{\alpha}^{k}\Theta(1+\iota)\|y_{0}\|_{e}+\frac{1-\hat{\alpha}^{k}}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e}=\frac{1-\hat{\alpha}^{k+1}}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e}.

Furthermore, it comes from pk+1(gpk+G)1(zk)p_{k+1}\in(g_{p_{k}}+G)^{-1}(z_{k}) that

ukf(pk)+𝒟f(pk)[exppk1pk+1]+F(pk+1).u_{k}\in f(p_{k})+{\cal D}f(p_{k})[\exp_{p_{k}}^{-1}p_{k+1}]+F(p_{k+1}).

Overall, we can conclude that (A1), (A2) and (A3) hold for all k1.k\geq 1.

Now our focus is to show the convergence of {pk}.\{p_{k}\}. Using (A1) and (A2) in a simple induction procedure, we can show that

d(pm,pn)α^m1α^nm1α^Θ(1+ι)y0ed(p_{m},p_{n})\leq\hat{\alpha}^{m}\frac{1-\hat{\alpha}^{n-m}}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e}

holds for all m<nm<n, m,nm,n\in\mathbb{N}. Hence, since α^(0,1)\hat{\alpha}\in(0,1), {pk}\{p_{k}\} is a Cauchy sequence, and consequently, there exists p¯\bar{p}\in{\cal M} such that limkpk=p¯.\lim_{k\to\infty}p_{k}=\bar{p}. From (A1) and (51), it follows that pkBδ(p0)p_{k}\in\operatorname{B}_{\delta}(p_{0}) for all \mathbb{N}, which implies that p¯Bδ[p0]Ω\bar{p}\in\operatorname{B}_{\delta}[p_{0}]\subset\Omega. As ff is continuous and FF has closed graph, by letting k+k\to+\infty in (53) and (A3), we conclude that p¯\bar{p} is a solution of (2). Finally, using (A2), we get

d(pk,p¯)\displaystyle d(p_{k},\bar{p}) =limmd(pk,pk+m)limmi=kk1+md(pi,pi+1)\displaystyle=\lim_{m\to\infty}d(p_{k},p_{k+m})\leq\lim_{m\to\infty}\sum_{i=k}^{k-1+m}d(p_{i},p_{i+1})
limmi=kk1+mα^iΘ(1+ι)y0e=limmα^k(1α^m)1α^Θ(1+ι)y0e=α^k1α^Θ(1+ι)y0e,\displaystyle\leq\lim_{m\to\infty}\sum_{i=k}^{k-1+m}\hat{\alpha}^{i}\Theta(1+\iota)\|y_{0}\|_{e}=\lim_{m\to\infty}\frac{\hat{\alpha}^{k}(1-\hat{\alpha}^{m})}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e}=\frac{\hat{\alpha}^{k}}{1-\hat{\alpha}}\Theta(1+\iota)\|y_{0}\|_{e},

for all kk\in\mathbb{N}. ∎

We conclude this section with some remarks about Theorem 7. First, it is a result to guarantee the existence of a neighborhood which the inexact Newton’s method is well-defined and, hence its convergence to a solution of (2). Thus, in practice, it can be hard to evaluate all the parameters in itens (i)(v)(i)-(v). Second, it states that if the initial point p0p_{0}\in{\cal M} satisfies the inclusion y0(f+F)(p0)y_{0}\in(f+F)(p_{0}) where y0Bχ(0)y_{0}\in B_{\chi}(0) and

0<χ:=min{βΘ(1+ι),β1+ι,β(1α^)α^α^2+1,δ(1α^)Θ}0<\chi:=\min\left\{\frac{\beta}{\Theta(1+\iota)},\frac{\beta}{1+\iota},\frac{\beta(1-\hat{\alpha})}{\hat{\alpha}-\hat{\alpha}^{2}+1},\frac{\delta(1-\hat{\alpha})}{\Theta}\right\}

then the inexact Newton’s method finds a solution of (2) in Bχ~(p0)B_{\tilde{\chi}}(p_{0}), where

χ~=Θ(1+ι)1α^y0e.\tilde{\chi}=\frac{\Theta(1+\iota)}{1-\hat{\alpha}}\|y_{0}\|_{e}.

We can interpret this remark as follows: although we cannot evaluate χ\chi in practice, if we choose y0y_{0} close to 0 then the proposed method will work. Third, the assumption u0ιy0e\|u_{0}\|\leq\iota\|y_{0}\|_{e} in (iii)(iii) ensures that the first call (and the subsequent calls) to the inexact Newton’s method for solving (2) address the inexactness. Fourth, conditions in (i)-(ii) are usual in the context of metric regularity, and both inequalities in (v)(v) are related to the smoothness assumption of ff and the continuity of the exponential map. Thus, if ϵ\epsilon and η\eta are small enough, this assumption holds.

5 Application

In this section, we investigate three well-known problems that can be formulated as generalized equations on Riemannian manifolds.

Example 5 (System of Inequalities and Equalities).

Consider the generalized equation (2) with FKF\equiv-K, where KmK\subset\mathbb{R}^{m} is the fixed cone

s×{0}ms{xm:xi0fori=1,,sandxi=0fori=s+1,,m}.\mathbb{R}_{-}^{s}\times\{0\}^{m-s}\coloneqq\{x\in\mathbb{R}^{m}\colon x_{i}\leq 0\,\,\mbox{for}\,\,i=1,\ldots,s\,\,\mbox{and}\,\,x_{i}=0\,\,\mbox{for}\,\,i=s+1,\ldots,m\}.

It is easy to see that this generalized equation is equivalent to the following system of equalities and inequalities:

fi(p)\displaystyle f_{i}(p) 0for i=1,,s,\displaystyle\leq 0\quad\text{for }i=1,\ldots,s, (56)
fi(p)\displaystyle f_{i}(p) =0for i=s+1,,m.\displaystyle=0\quad\text{for }i=s+1,\ldots,m.

Let p¯\bar{p} solve (56) and let each fif_{i} be continuously differentiable around p¯\bar{p} for all i=1,,mi=1,\ldots,m. As defined in [10, Definition 3.12], the Mangasarian–Fromovitz condition for system (56) is as follows:

v𝒯p¯ such that{gradfi(p¯),v<0for i{1,,s} with fi(p¯)=0,gradfi(p¯),v=0for i{s+1,,m}.\exists v\in{\cal T}_{\bar{p}}{\cal M}\text{ such that}\quad\begin{cases}\langle\operatorname{grad}f_{i}(\bar{p}),v\rangle<0&\quad\text{for }i\in\{1,\ldots,s\}\text{ with }f_{i}(\bar{p})=0,\\ \langle\operatorname{grad}f_{i}(\bar{p}),v\rangle=0&\quad\text{for }i\in\{s+1,\ldots,m\}.\end{cases} (57)

After making simple adaptations of [24, Example 4D.3] to the Riemannian context, we can assert that condition (57) is equivalent to the Aubin property of (f+K)1(-f+K)^{-1} at x¯\bar{x} for p¯\bar{p}, which, in turn, by Theorem 2, is equivalent to the metric regularity of f+K-f+K at p¯\bar{p} for x¯\bar{x}.

The following proposition serves as a prerequisite for discussing the last two examples in this section. This result establishes an equivalence between problems (2) and (3).

Proposition 1.

Let Ω\Omega\subseteq\mathcal{M} and let {E1,,En}\{E_{1},\ldots,E_{n}\} be a basis for 𝒳(Ω)\mathcal{X}(\Omega). Suppose V:Ω𝒯V\colon\Omega\to\mathcal{T}\mathcal{M} is a single-valued vector field, and Z:Ω𝒯Z\colon\Omega\rightrightarrows\mathcal{T}\mathcal{M} is a set-valued vector field. Then, a point p¯\bar{p} is a solution to (3) if and only if it is a solution to the generalized equation (2) with f:Ωnf\colon\Omega\to\mathbb{R}^{n} and F:ΩnF\colon\Omega\rightrightarrows\mathbb{R}^{n} defined, respectively, by

f(p)(V(p),E1(p),,V(p),En(p))andF(p)vZ(p)(v,E1(p),,v,En(p)).f(p)\coloneqq(\langle V(p),E_{1}(p)\rangle,\ldots,\langle V(p),E_{n}(p)\rangle)\quad\text{and}\quad F(p)\coloneqq\bigcup_{v\in Z(p)}(\langle v,E_{1}(p)\rangle,\ldots,\langle v,E_{n}(p)\rangle). (58)
Proof.

If p¯\bar{p} is a solution of (3), then there exists v¯Z(p¯)\bar{v}\in Z(\bar{p}) such that V(p¯)+v¯=0p¯V(\bar{p})+\bar{v}=0_{\bar{p}}. Consequently,

V(p¯),Ei(p¯)+v¯,Ei(p¯)=V(p¯)+v¯,Ei(p¯)=0for all i=1,,n.\langle V(\bar{p}),E_{i}(\bar{p})\rangle+\langle\bar{v},E_{i}(\bar{p})\rangle=\langle V(\bar{p})+\bar{v},E_{i}(\bar{p})\rangle=0\quad\text{for all }i=1,\ldots,n.

Using the definitions of ff and FF in (58), we find that p¯\bar{p} satisfies (2). Conversely, if p¯\bar{p} is a solution to (2) with ff and FF defined in (58), then there exists v¯Z(p¯)\bar{v}\in Z(\bar{p}) such that

(V(p¯),E1(p¯),,V(p¯),En(p¯))+(v¯,E1(p¯),,v¯,En(p¯))=0.(\langle V(\bar{p}),E_{1}(\bar{p})\rangle,\ldots,\langle V(\bar{p}),E_{n}(\bar{p})\rangle)+(\langle\bar{v},E_{1}(\bar{p})\rangle,\ldots,\langle\bar{v},E_{n}(\bar{p})\rangle)=0.

Since {E1(p¯),,En(p¯)}\{E_{1}(\bar{p}),\ldots,E_{n}(\bar{p})\} forms a basis for 𝒯p¯\mathcal{T}_{\bar{p}}\mathcal{M}, it follows that V(p¯)+v¯,v=0\langle V(\bar{p})+\bar{v},v\rangle=0 for all v𝒯p¯v\in\mathcal{T}_{\bar{p}}\mathcal{M}. Thus, 0p¯=V(p¯)+v¯0_{\bar{p}}=V(\bar{p})+\bar{v}, implying that p¯\bar{p} is a solution to (3). ∎

In the following example, we discuss the variational inequality problem proposed in [44], which extends the problem introduced in [46].

Example 6 (Variational Inequality Problem).

Let V:Ω𝒯V\colon\Omega\subseteq{\cal M}\to\mathcal{T}\mathcal{M} be a single-valued vector field. Consider the variational inequality problem

pΩ,V(p),γ˙(0)0for all γΓp,qΩ,p\in\Omega,\quad\langle V(p),\dot{\gamma}(0)\rangle\geq 0\quad\text{for all }\gamma\in\Gamma^{\Omega}_{p,q}, (59)

where Γp,qΩ\Gamma^{\Omega}_{p,q} is the set of all geodesics γ:[0,1]\gamma\colon[0,1]\to\mathcal{M} satisfying γ(0)=p\gamma(0)=p, γ(1)=q\gamma(1)=q, and γ(t)Ω\gamma(t)\in\Omega for all t[0,1]t\in[0,1]. Since the normal cone associated with Ω\Omega is the set-valued vector field NΩ:Ω𝒯N_{\Omega}\colon\Omega\rightrightarrows\mathcal{T}\mathcal{M} defined by

NΩ(p){v𝒯p:v,γ˙(0)0 for each γΓp,qΩ},N_{\Omega}(p)\coloneqq\left\{v\in\mathcal{T}_{p}\mathcal{M}\colon\langle v,\dot{\gamma}(0)\rangle\leq 0\text{ for each }\gamma\in\Gamma^{\Omega}_{p,q}\right\},

for all pΩp\in\Omega (see [43]), the problem (59) is equivalent to

pΩ,V(p)+NΩ(p)0p,p\in\Omega,\quad V(p)+N_{\Omega}(p)\ni 0_{p},

which, in turn, by Proposition 1, is equivalent to generalized equation (2) with f:Ωnf\colon\Omega\to\mathbb{R}^{n} and F:ΩnF\colon\Omega\rightrightarrows\mathbb{R}^{n} as in (58), with Z=NΩZ=N_{\Omega} and {E1,,En}\{E_{1},\ldots,E_{n}\} being any basis for 𝒳(Ω)\mathcal{X}(\Omega).

The following example proposes an approach based on the Riemannian extension of the analysis presented in [36].

Example 7 (KKT Conditions).

Consider the constrained nonlinear optimization problem on {\cal M}:

minimize 𝐟(p)\displaystyle\,{\bf f}(p) (60)
subject to 𝐠(p)0,𝐡(p)=0,\displaystyle\,{\bf g}(p)\leq 0,\,\,{\bf h}(p)=0, (61)

where 𝐟:{\bf f}\colon{\cal M}\to\mathbb{R}, 𝐠(𝐠1,,𝐠m1):m1{\bf g}\coloneqq({\bf g}_{1},\ldots,{\bf g}_{m_{1}})\colon{\cal M}\to\mathbb{R}^{m_{1}}, and 𝐡(𝐡1,,𝐡m2):m2{\bf h}\coloneqq({\bf h}_{1},\ldots,{\bf h}_{m_{2}})\colon{\cal M}\to\mathbb{R}^{m_{2}} are assumed to be continuously differentiable functions. The constraint 𝐠(p)0{\bf g}(p)\leq 0 means that 𝐠i(p)0{\bf g}_{i}(p)\leq 0 for all i=1,,m1i=1,\ldots,m_{1}. The Lagrangian function :×m1×m2{\cal L}\colon{\cal M}\times\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}}\to\mathbb{R} is given by

(p,μ,λ):=𝐟(p)+i=1m1μi𝐠i(p)+i=1m2λi𝐡i(p),\mathcal{L}(p,\mu,\lambda):={\bf f}(p)+\sum_{i=1}^{m_{1}}\mu_{i}{\bf g}_{i}(p)+\sum_{i=1}^{m_{2}}\lambda_{i}{\bf h}_{i}(p),

where μ(μ1,,μm1)m1\mu\coloneqq(\mu_{1},\ldots,\mu_{m_{1}})\in\mathbb{R}^{m_{1}} and λ(λ1,,λm2)m2\lambda\coloneqq(\lambda_{1},\ldots,\lambda_{m_{2}})\in\mathbb{R}^{m_{2}}. For each (μ,λ)m1×m2(\mu,\lambda)\in\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}}, consider the function μ,λ:{\cal L}_{\mu,\lambda}\colon{\cal M}\to\mathbb{R} defined by μ,λ(p)=(p,μ,λ){\cal L}_{\mu,\lambda}(p)=\mathcal{L}(p,\mu,\lambda) for all pp\in{\cal M}. Based on [10], we assert that the KKT conditions for (60)–(61) are:

gradμ,λ(p)=grad𝐟(p)+i=1m1μigrad𝐠i(p)+i=1m2λigrad𝐡i(p)=0p,\displaystyle\operatorname{grad}\mathcal{L}_{\mu,\lambda}(p)=\operatorname{grad}{\bf f}(p)+\sum_{i=1}^{m_{1}}\mu_{i}\operatorname{grad}{\bf g}_{i}(p)+\sum_{i=1}^{m_{2}}\lambda_{i}\operatorname{grad}{\bf h}_{i}(p)=0_{p}, (62)
μ0,𝐠(p)0,i=1m1μi𝐠i(p)=0,\displaystyle\mu\geq 0,\quad{\bf g}(p)\leq 0,\quad\sum_{i=1}^{m_{1}}\mu_{i}{\bf g}_{i}(p)=0, (63)
𝐡(p)=0.\displaystyle{\bf h}(p)=0. (64)

Let ~×m1×m2\widetilde{\mathcal{M}}\coloneqq\mathcal{M}\times\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}} and consider the vector field V:~𝒯~𝒯×m1×m2V\colon\widetilde{\mathcal{M}}\to\mathcal{T}\widetilde{\mathcal{M}}\equiv\mathcal{T}\mathcal{M}\times\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}} defined by

V(p,μ,λ)=(gradμ,λ(p),𝐠(p),𝐡(p)),V(p,\mu,\lambda)=(\operatorname{grad}\mathcal{L}_{\mu,\lambda}(p),{\bf g}(p),{\bf h}(p)),

and the set-valued vector field Z:~𝒯~Z\colon\widetilde{\mathcal{M}}\rightrightarrows\mathcal{T}\widetilde{\mathcal{M}} defined by

Z(p,μ,λ)={{0p}×{y+m1:i=1m1μiyi=0}×{0},if μ0;,otherwise,Z(p,\mu,\lambda)=\left\{\begin{array}[]{lll}\{0_{p}\}\times\left\{y\in\mathbb{R}_{+}^{m_{1}}\colon\sum_{i=1}^{m_{1}}\mu_{i}y_{i}=0\right\}\times\{0\},&\text{if }\mu\geq 0;\\ \,\emptyset,&\text{otherwise},\end{array}\right.

where +m1\mathbb{R}_{+}^{m_{1}} denotes the set of vectors in m1\mathbb{R}^{m_{1}} with nonnegative coordinates. Note that KKT system (62)–(63)–(64) is equivalent to the problem

(p,μ,λ)~,V(p,μ,λ)+Z(p,μ,λ)(0p,0,0).(p,\mu,\lambda)\in\widetilde{\mathcal{M}},\quad V(p,\mu,\lambda)+Z(p,\mu,\lambda)\ni(0_{p},0,0). (65)

Let {E1,,En}\{E_{1},\ldots,E_{n}\} be a basis for 𝒳()\mathcal{X}(\mathcal{M}) and {En+1,,En+m1+m2}\{E_{n+1},\ldots,E_{n+m_{1}+m_{2}}\} be the canonical basis for m1×m2\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}}. Then, {E1,,En+m1+m2}\{E_{1},\ldots,E_{n+m_{1}+m_{2}}\} forms a basis for 𝒳(~)\mathcal{X}(\widetilde{\mathcal{M}}) and, by Proposition 1, (65) is equivalent to the generalized equation

(p,μ,λ)~,f(p,μ,λ)+F(p,μ,λ)0,(p,\mu,\lambda)\in\widetilde{\mathcal{M}},\quad f(p,\mu,\lambda)+F(p,\mu,\lambda)\ni 0,

where f:~n×m1×m2f\colon\widetilde{\mathcal{M}}\to\mathbb{R}^{n}\times\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}} and F:~n×m1×m2F\colon\widetilde{\mathcal{M}}\rightrightarrows\mathbb{R}^{n}\times\mathbb{R}^{m_{1}}\times\mathbb{R}^{m_{2}} are defined, respectively, by

(f(p,μ,λ))i={gradμ,λ(p),Ei(p),if i=1,,n;(𝐠(p))in,if i=n+1,,n+m1;(𝐡(p))inm1,if i=n+m1+1,,n+m1+m2;(f(p,\mu,\lambda))_{i}=\left\{\begin{array}[]{lll}\langle\operatorname{grad}\mathcal{L}_{\mu,\lambda}(p),E_{i}(p)\rangle,&\text{if }i=1,\ldots,n;\\ ({\bf g}(p))_{i-n},&\text{if }i=n+1,\ldots,n+m_{1};\\ ({\bf h}(p))_{i-n-m_{1}},&\text{if }i=n+m_{1}+1,\ldots,n+m_{1}+m_{2};\end{array}\right.

and

F(p,μ,λ)={{0}×{y+m1:i=1m1μiyi=0}×{0},if μ0;,otherwise.F(p,\mu,\lambda)=\left\{\begin{array}[]{lll}\{0\}\times\left\{y\in\mathbb{R}_{+}^{m_{1}}\colon\sum_{i=1}^{m_{1}}\mu_{i}y_{i}=0\right\}\times\{0\},&\text{if }\mu\geq 0;\\ \,\emptyset,&\text{otherwise}.\end{array}\right.

6 Numerical Example

In this section, we present a numerical example based on a generalized equation derived from Example 7, and solve it using the inexact Newton method described in (5). All computations were performed on a MacBook Pro running macOS Sonoma 14.5, equipped with 16 GB RAM, an Apple M1 Pro CPU, and MATLAB R2022a. To ensure reproducibility, we fixed the randomness using MATLAB’s built-in function rng(2024).

Here, we consider the Riemannian constrained optimization problem on {\cal M}:

minimize 𝐟(p)1Ni=1Nd2(p,pi)\displaystyle\,{\bf f}(p)\coloneqq\frac{1}{N}\sum_{i=1}^{N}d^{2}(p,p^{i}) (66)
subject to 𝐠(p)d2(p,p~)r20,\displaystyle\,{\bf g}(p)\coloneqq d^{2}(p,\tilde{p})-r^{2}\leq 0, (67)

where r>0r>0 and p1,,pN,p~p^{1},\ldots,p^{N},\tilde{p}\in\cal{M} are chosen such that r<rinj(p~)r<r_{inj}(\tilde{p}) and p1,,pNr(p~)p^{1},\ldots,p^{N}\in{\cal B}_{r}(\tilde{p}). We will use the fact that

grad𝐟(p)=2Ni=1Nexpp1piandgrad𝐠(p)=2expp1p~,pr(p~).\operatorname{grad}{\bf f}(p)=-\frac{2}{N}\sum_{i=1}^{N}\exp^{-1}_{p}p^{i}\quad\text{and}\quad\operatorname{grad}{\bf g}(p)=-2\exp^{-1}_{p}\tilde{p},\quad\forall p\in{\cal B}_{r}(\tilde{p}). (68)

The problem defined by (66)–(67) represents a constrained version of the Riemannian center of mass, also known as the (Riemannian) mean, as proposed in [10]. This problem was first introduced in [37] and has been extensively studied in recent literature (see, for example, [13, 29, 53]).

Particularly, we are interested in problem (66)–(67) with =𝕊3{p4|pe=1}\mathcal{M}=\mathbb{S}^{3}\coloneqq\{p\in\mathbb{R}^{4}\ |\ \|p\|_{e}=1\}, equipped with the metric of the ambient space 4\mathbb{R}^{4}. According to [35], the skew-symmetric matrices

M1=[0100100000010010],M2=[0010000110000100],M3=[0001001001001000],M_{1}=\begin{bmatrix}0&-1&0&0\\ 1&0&0&0\\ 0&0&0&-1\\ 0&0&1&0\end{bmatrix},\quad M_{2}=\begin{bmatrix}0&0&-1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&-1&0&0\end{bmatrix},\quad M_{3}=\begin{bmatrix}0&0&0&-1\\ 0&0&-1&0\\ 0&1&0&0\\ 1&0&0&0\end{bmatrix},

induce an orthonormal basis of vector fields {E1,E2,E3}\{E_{1},E_{2},E_{3}\} on 𝕊3\mathbb{S}^{3}, defined by Ei(p)=MipE_{i}(p)=M_{i}p (standard matrix-vector product) for all p𝕊3p\in\mathbb{S}^{3} and i=1,2,3i=1,2,3. By defining μ:𝕊3\mathcal{L}_{\mu}\colon\mathbb{S}^{3}\to\mathbb{R} by

μ(p)𝐟(p)+μ𝐠(p),\mathcal{L}_{\mu}(p)\coloneqq{\bf f}(p)+\mu{\bf g}(p),

it follows from Example 7 that the KKT conditions for (66)–(67) are:

gradμ(p)=grad𝐟(p)+μgrad𝐠(p)\displaystyle\operatorname{grad}\mathcal{L}_{\mu}(p)=\operatorname{grad}{\bf f}(p)+\mu\operatorname{grad}{\bf g}(p) =0p,\displaystyle=0_{p}, (69)
μ0,𝐠(p)0,μ𝐠(p)\displaystyle\mu\geq 0,\quad{\bf g}(p)\leq 0,\quad\mu{\bf g}(p) =0,\displaystyle=0, (70)

which is equivalent to the generalized equation

(p,μ)𝕊3×,f(p,μ)+F(p,μ)0,(p,\mu)\in\mathbb{S}^{3}\times\mathbb{R},\quad f(p,\mu)+F(p,\mu)\ni 0, (71)

where f:𝕊3×3×f\colon\mathbb{S}^{3}\times\mathbb{R}\to\mathbb{R}^{3}\times\mathbb{R} and F:𝕊3×3×F\colon\mathbb{S}^{3}\times\mathbb{R}\to\mathbb{R}^{3}\times\mathbb{R} are defined, respectively, by

f(p,μ)=(gradμ(p),E1(p),gradμ(p),E2(p),gradμ(p),E3(p),𝐠(p))f(p,\mu)=(\langle\operatorname{grad}\mathcal{L}_{\mu}(p),E_{1}(p)\rangle,\langle\operatorname{grad}\mathcal{L}_{\mu}(p),E_{2}(p)\rangle,\langle\operatorname{grad}\mathcal{L}_{\mu}(p),E_{3}(p)\rangle,{\bf g}(p)) (72)

and

F(p,μ)={{0}×{y+:μy=0},if μ0,,otherwise,F(p,\mu)=\left\{\begin{array}[]{ll}\{0\}\times\left\{y\in\mathbb{R}_{+}\colon\mu y=0\right\},&\text{if }\mu\geq 0,\\ \emptyset,&\text{otherwise},\end{array}\right. (73)

for all (p,μ)𝕊3×(p,\mu)\in\mathbb{S}^{3}\times\mathbb{R}.

To apply the inexact Newton method in (5) to solve (71), the subproblem in each iteration kk involves computing (vk,νk)𝒯(pk,μk)(𝕊3×)𝒯pk𝕊3×(v_{k},\nu_{k})\in\mathcal{T}_{(p_{k},\mu_{k})}(\mathbb{S}^{3}\times\mathbb{R})\equiv\mathcal{T}_{p_{k}}\mathbb{S}^{3}\times\mathbb{R} such that

(f(pk,μk)+𝒟f(pk,μk)[(vk,νk)]+F(exppkvk,μk+νk))Rk(pk,μk).\left(f(p_{k},\mu_{k})+\mathcal{D}f(p_{k},\mu_{k})[(v_{k},\nu_{k})]+F(\exp_{p_{k}}v_{k},\mu_{k}+\nu_{k})\right)\cap R_{k}(p_{k},\mu_{k})\neq\varnothing.

To accomplish this, select ukRk(pk,μk)u_{k}\in R_{k}(p_{k},\mu_{k}) (we use uk=[1/(10k),1/(10k),1/(10k),1/(10k)]tu_{k}=\left[1/(10^{k}),1/(10^{k}),1/(10^{k}),1/(10^{k})\right]^{t}, where tt denotes the transpose operation, for all kk), and then solve the following optimization problem:

minimize 12z+f(pk,μk)+𝒟f(pk,μk)[(v,ν)]uke2\displaystyle\,\,\frac{1}{2}\|z+f(p_{k},\mu_{k})+\mathcal{D}f(p_{k},\mu_{k})[(v,\nu)]-u_{k}\|_{e}^{2} (74)
subject to (v,ν)𝒯pk𝕊3×,z=(z1,z2,z3,z4)F(exppkv,μk+ν).\displaystyle\,(v,\nu)\in\mathcal{T}_{p_{k}}\mathbb{S}^{3}\times\mathbb{R},\quad z=(z_{1},z_{2},z_{3},z_{4})\in F(\exp_{p_{k}}v,\mu_{k}+\nu). (75)

Based on the definition of FF in (73) and the fact that every v𝒯pk𝕊3v\in\mathcal{T}_{p_{k}}\mathbb{S}^{3} can be expressed as a linear combination of E1(pk)E_{1}(p_{k}), E2(pk)E_{2}(p_{k}), and E3(pk)E_{3}(p_{k}), we can find a solution to (74)–(75) by solving the Euclidean quadratic constraint problem:

minimize 12z+f(pk,μk)+𝒟f(pk,μk)[(α1E1(pk)+α2E2(pk)+α3E3(pk),ν)]uke2\displaystyle\,\frac{1}{2}\|z+f(p_{k},\mu_{k})+\mathcal{D}f(p_{k},\mu_{k})[(\alpha_{1}E_{1}(p_{k})+\alpha_{2}E_{2}(p_{k})+\alpha_{3}E_{3}(p_{k}),\nu)]-u_{k}\|_{e}^{2} (76)
subject to (α1,α2,α3)3,z1=z2=z3=0,z4+,μk+ν+,z4(μk+ν)=0.\displaystyle\,(\alpha_{1},\alpha_{2},\alpha_{3})\in\mathbb{R}^{3},\,\,z_{1}=z_{2}=z_{3}=0,\,\,z_{4}\in\mathbb{R}^{+},\,\,\mu_{k}+\nu\in\mathbb{R}^{+},\,\,z_{4}(\mu_{k}+\nu)=0. (77)

Thus, the iteration k+1k+1 is obtained as follows:

(pk+1,μk+1)=(exppk(α1kE1(pk)+α2kE2(pk)+α3kE3(pk)),μk+νk)(p_{k+1},\mu_{k+1})=(\exp_{p_{k}}(\alpha^{k}_{1}E_{1}(p_{k})+\alpha^{k}_{2}E_{2}(p_{k})+\alpha^{k}_{3}E_{3}(p_{k})),\mu_{k}+\nu_{k})

where α1k,α2k,α3k,νk\alpha^{k}_{1},\alpha^{k}_{2},\alpha^{k}_{3},\nu_{k} are a solution for (76)–(77).

For the implementation of our algorithm, we use the following expressions derived from (68) and (72):

f(pk,μk)=[2Ni=1Nexppk1pi2μkexppk1p~,E1(pk)2Ni=1Nexppk1pi2μkexppk1p~,E2(pk)2Ni=1Nexppk1pi2μkexppk1p~,E3(pk)d2(pk,p~)r2]f(p_{k},\mu_{k})=\begin{bmatrix}\left\langle-\frac{2}{N}\sum_{i=1}^{N}\exp^{-1}_{p_{k}}p^{i}-2\mu_{k}\exp^{-1}_{p_{k}}\tilde{p},E_{1}(p_{k})\right\rangle\\ \left\langle-\frac{2}{N}\sum_{i=1}^{N}\exp^{-1}_{p_{k}}p^{i}-2\mu_{k}\exp^{-1}_{p_{k}}\tilde{p},E_{2}(p_{k})\right\rangle\\ \left\langle-\frac{2}{N}\sum_{i=1}^{N}\exp^{-1}_{p_{k}}p^{i}-2\mu_{k}\exp^{-1}_{p_{k}}\tilde{p},E_{3}(p_{k})\right\rangle\\ d^{2}(p_{k},\tilde{p})-r^{2}\end{bmatrix}

and

𝒟f(pk,μk)[(v,ν)]=[gradpf1(pk,μk),v2logpkp~,E1(pk)νgradpf2(pk,μk),v2logpkp~,E2(pk)νgradpf3(pk,μk),v2logpkp~,E3(pk)ν2logpkp~,v],(v,ν)𝒯pk𝕊3×,\mathcal{D}f(p_{k},\mu_{k})[(v,\nu)]=\begin{bmatrix}\langle\operatorname{grad}_{p}f_{1}(p_{k},\mu_{k}),v\rangle-2\langle\log_{p_{k}}\tilde{p},E_{1}(p_{k})\rangle\nu\\ \langle\operatorname{grad}_{p}f_{2}(p_{k},\mu_{k}),v\rangle-2\langle\log_{p_{k}}\tilde{p},E_{2}(p_{k})\rangle\nu\\ \langle\operatorname{grad}_{p}f_{3}(p_{k},\mu_{k}),v\rangle-2\langle\log_{p_{k}}\tilde{p},E_{3}(p_{k})\rangle\nu\\ -2\langle\log_{p_{k}}\tilde{p},v\rangle\end{bmatrix},\quad(v,\nu)\in\mathcal{T}_{p_{k}}\mathbb{S}^{3}\times\mathbb{R},

where, for each j{1,2,3}j\in\{1,2,3\}, gradpfj(pk,μk)\operatorname{grad}_{p}f_{j}(p_{k},\mu_{k}) denotes the Riemannian gradient of fj(,μk)f_{j}(\cdot,\mu_{k}) at pkp_{k}, which is the projection of the Euclidean gradient of fj(,μk)f_{j}(\cdot,\mu_{k}) at pkp_{k} onto 𝒯pk𝕊3\mathcal{T}_{p_{k}}\mathbb{S}^{3}. This projection can be obtained by multiplying the vector by the matrix id4×4pk(pk)t\operatorname{id}_{4\times 4}-p_{k}(p_{k})^{t}, where id4×4\operatorname{id}_{4\times 4} denotes the identity matrix of dimension 4×44\times 4. Additionally, we need to know the following expressions:

expp(v){cos(v2)p+sin(v2)vv2,v𝒯p𝕊3{0},p,v=0,\exp_{p}(v)\coloneqq\left\{\begin{split}&\cos(\|v\|_{2})p+\sin(\|v\|_{2})\frac{v}{\|v\|_{2}},\ v\in\mathcal{T}_{p}\mathbb{S}^{3}\setminus\{0\},\\ &p,\qquad\qquad\qquad\qquad\qquad\qquad v=0,\end{split}\right.

and

expp1(q){arccosp,q1p,q2(IppT)q,q{p,p},0,q=p.\exp_{p}^{-1}(q)\coloneqq\left\{\begin{split}&\frac{\arccos\langle p,q\rangle}{\sqrt{1-\langle p,q\rangle^{2}}}(I-pp^{T})q,\ q\notin\{p,-p\},\\ &0,\qquad\qquad\qquad\qquad\qquad\ q=p.\end{split}\right.

These formulas on the sphere can be found in [27], for example.

Now, we are prepared for the numerical implementation. We use the Matlab built-in function rand to generate pip^{i}, normalizing them to unit vectors. Each component of pip^{i} lies within the interval (0,1)(0,1). Specifically, we consider four cases:

  • A1.

    N=10N=10, r=2r=2, p~=[0,0,0,1]\tilde{p}=[0,0,0,1], and p0=p1p_{0}=p^{1};

  • A2.

    N=500N=500, r=2r=2, p~=[0,0,0,1]\tilde{p}=[0,0,0,1], and p0=p1p_{0}=p^{1};

  • A3.

    N=10N=10, r=0.1r=0.1, p~=[0,0,0,1]\tilde{p}=[0,0,0,1], and p0=p1p_{0}=p^{1};

  • A4.

    N=500N=500, r=0.1r=0.1, p~=[0,0,0,1]\tilde{p}=[0,0,0,1], and p0=p1p_{0}=p^{1};

where p0p_{0} is the initial iteration point. For the stopping criteria, we use

Φ(pk,μk)e1012and𝐠(pk)1012\|\Phi(p_{k},\mu_{k})\|_{e}\leq 10^{-12}\quad\mathrm{and}\quad{\bf g}(p_{k})\leq 10^{-12}

where Φ(f1,f2,f3)\Phi\coloneqq(f_{1},f_{2},f_{3}).

From Figure 1 and Table 1, we can claim that for the above four cases A1-A4, we find a solution (p,μ)(p^{\star},\mu^{\star}) for generalized equation (71). This is because for all four cases, we achieve Φ(p,μ)e1012\|\Phi(p^{\star},\mu^{\star})\|_{e}\leq 10^{-12}, 𝐠(p)1012{\bf g}(p^{\star})\leq 10^{-12} and μ0\mu^{\star}\geq 0. In addition, KKT system (69)-(70) is also satisfied under (p,μ)(p^{\star},\mu^{\star}), which is asserted in theory and again ensured numerically. Furthermore, looking into cases A1-A2, we have 𝐠(p)<0{\bf g}(p^{\star})<0, which means that pp^{\star} lies in the interior of the constraint region. But for cases A3-A4, pp^{\star} is on the boundary of the constraint region because 𝐠(p){\bf g}(p^{\star}) is almost equal to 0. Due to the effect of the constraint, the convergence rate of the norm of Φ(pk,μk)\Phi(p_{k},\mu_{k}) for cases A1-A2 is stable. For cases A3-A4, the convergence rate of the norm of Φ(pk,μk)\Phi(p_{k},\mu_{k}) in the first several iterations is unstable but after several iterations, the convergence rate becomes stable. This coincides with the theory that the convergence rate of the inexact Newton method becomes stable when the iteration point is close to the solution.

Refer to caption
(a) Case A1
Refer to caption
(b) Case A2
Refer to caption
(c) Case A3
Refer to caption
(d) Case A4
Figure 1: Norm of Φ(pk,μk)\Phi(p_{k},\mu_{k}) at each iteration for four cases A1-A4.
Table 1: The final results of four cases A1-A4.
pp^{\star} μ\mu^{\star} 𝐠(p){\bf g}(p^{\star}) μ𝐠(p)\mu{\bf g}(p^{\star}) gradμ(p)p\|\operatorname{grad}\mathcal{L}_{\mu^{\star}}(p^{\star})\|_{p} No. of Iteration
Case A1 [0.4388,0.4862,0.5665,0.5001] 0 -2.9037 0 8.02×10138.02\times 10^{-13} 21
Case A2 [0.4908,0.5100,0.4878,0.5109] 0 -2.9298 0 3.26×10133.26\times 10^{-13} 19
Case A3 [0.0504,0.0566,0.0650,0.9950] 8.9114 1.05×10151.05\times 10^{-15} 9.32×10159.32\times 10^{-15} 7.91×10137.91\times 10^{-13} 16
Case A4 [0.0570,0.0593,0.0566,0.9950] 8.7870 1.05×10151.05\times 10^{-15} 9.20×10159.20\times 10^{-15} 7.89×10137.89\times 10^{-13} 16

7 Conclusion

In this paper, we address the problem of finding solutions to generalized equations on Riemannian manifolds. If the manifold is a Euclidean space, then it is well-known that this problem encompasses several other contexts, such as standard nonlinear optimization, variational inequalities, and equilibrium problems. Here, we present a general inexact Newton method for solving generalized equations. Firstly, we discuss the metric regularity property and provide some examples of mappings satisfying this property. Secondly, we establish local convergence results, including both linear and quadratic convergence under suitable assumptions, along with a semi-local convergence result. Finally, we discuss the relationship between problems (2) and (3).

All results obtained here are derived under assumptions that are highly natural in comparison to their Euclidean counterparts, without requiring additional conditions related to the geometry of the manifold. From this perspective, considering a problem as a generalized equation may be a promising approach in the context of manifolds as well.

As a next step, we plan to investigate how to extend the theory presented here to cases where the exponential map is replaced by a general retraction. Moreover, we intend to apply this new concept to quasi-Newton-type methods for solving problem (2).

Data Availability Data sharing not applicable to this article as no datasets were generated or analyzed during the current study

Declarations

Conflict of interest The authors declare that they have no conflict of interest

Appendix A Appendix

We begin this section by presenting two supporting lemmas. The proof of the first one can be found in [45, Lemma A.1]. The second one is analogous to [45, Lemma A.2], but its proof requires some adaptations. Therefore, we provide the proof here.

Lemma 3.

Let (p¯,v¯)(\bar{p},\bar{v}) be a point on 𝒯{\cal T}{\cal M}, and Br(p¯)\operatorname{B}_{r}(\bar{p}) be a normal ball. Define the following vector field on Br(p¯)\operatorname{B}_{r}(\bar{p}):

V(p)=Pp¯pv¯,pBr(p¯).V(p)=P_{\bar{p}p}\bar{v},\qquad p\in\operatorname{B}_{r}(\bar{p}).

Then, VV is a smooth vector field on Br(p¯)\operatorname{B}_{r}(\bar{p}).

Lemma 4.

Let f:mf\colon{\cal M}\to\mathbb{R}^{m} be a continuously differentiable function at p¯\bar{p} and Br(p¯)\operatorname{B}_{r}(\bar{p}) a normal ball. Then, for each ϵ>0\epsilon>0, there exists δ(0,r)\delta\in(0,r) such that

𝒟f(p)Pp¯p𝒟f(p¯)mapϵfor all pBδ(p¯).\|{\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p})\|_{map}\leq\epsilon\quad\mbox{for all }p\in\operatorname{B}_{\delta}(\bar{p}).
Proof.

It follows from (7) and parallel transport properties that

𝒟f(p)[Pp¯pv]\displaystyle{\cal D}f(p)[P_{\bar{p}p}v] =(gradf1(p),Pp¯pvp,,gradfm(p),Pp¯pvp),\displaystyle=({\langle}\operatorname{grad}f_{1}(p),P_{\bar{p}p}v{\rangle}_{p},\ldots,{\langle}\operatorname{grad}f_{m}(p),P_{\bar{p}p}v{\rangle}_{p}),
=(Ppp¯gradf1(p),vp¯,,Ppp¯gradfm(p),vp¯),v𝒯p¯,pBr(p¯).\displaystyle=({\langle}P_{p\bar{p}}\operatorname{grad}f_{1}(p),v{\rangle}_{\bar{p}},\ldots,{\langle}P_{p\bar{p}}\operatorname{grad}f_{m}(p),v{\rangle}_{\bar{p}}),\quad v\in{\cal T}_{\bar{p}}{\cal M},\,\,p\in\operatorname{B}_{r}(\bar{p}).

From norm properties, we have

(𝒟f(p)Pp¯p𝒟f(p¯))[v]e2\displaystyle\|({\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p}))[v]\|_{e}^{2} =i=1m|Ppp¯gradfi(p)gradfi(p¯),vp¯|2,\displaystyle=\sum_{i=1}^{m}|{\langle}P_{p\bar{p}}\operatorname{grad}f_{i}(p)-\operatorname{grad}f_{i}(\bar{p}),v{\rangle}_{\bar{p}}|^{2},
i=1mPpp¯gradfi(p)gradfi(p¯)p¯2vp¯2,v𝒯p¯,pBr(p¯).\displaystyle\leq\sum_{i=1}^{m}\|P_{p\bar{p}}\operatorname{grad}f_{i}(p)-\operatorname{grad}f_{i}(\bar{p})\|_{\bar{p}}^{2}\|v\|_{\bar{p}}^{2},\quad v\in{\cal T}_{\bar{p}}{\cal M},\,\,p\in\operatorname{B}_{r}(\bar{p}).

Since 𝒟f(p)Pp¯p𝒟f(p¯):𝒯p¯m{\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p})\colon{\cal T}_{\bar{p}}{\cal M}\to\mathbb{R}^{m} is a linear map, the above inequality implies that

𝒟f(p)Pp¯p𝒟f(p¯)map2\displaystyle\|{\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p})\|_{map}^{2} i=1mPpp¯gradfi(p)gradfi(p¯)p¯2\displaystyle\leq\sum_{i=1}^{m}\|P_{p\bar{p}}\operatorname{grad}f_{i}(p)-\operatorname{grad}f_{i}(\bar{p})\|_{\bar{p}}^{2}
=i=1mgradfi(p)Pp¯pgradfi(p¯)p2,pBr(p¯).\displaystyle=\sum_{i=1}^{m}\|\operatorname{grad}f_{i}(p)-P_{\bar{p}p}\operatorname{grad}f_{i}(\bar{p})\|_{p}^{2},\quad p\in\operatorname{B}_{r}(\bar{p}).

Under the assumptions of the lemma, gradfi\operatorname{grad}f_{i} is a continuous vector field around p¯\bar{p} for all i=1,,mi=1,\ldots,m. Hence, using Lemma 3 with v¯=gradfi(p¯)\bar{v}=\operatorname{grad}f_{i}(\bar{p}) and considering the fact that the function pp{\cal M}\ni p\mapsto\|\cdot\|_{p} is continuous, we obtain

limpp¯gradfi(p)Pp¯pgradfi(p¯))p=gradfi(p¯)Pp¯p¯gradfi(p¯)p¯=0.\lim_{p\to\bar{p}}\|\operatorname{grad}f_{i}(p)-P_{\bar{p}p}\operatorname{grad}f_{i}(\bar{p}))\|_{p}=\|\operatorname{grad}f_{i}(\bar{p})-P_{\bar{p}\bar{p}}\operatorname{grad}f_{i}(\bar{p})\|_{\bar{p}}=0.

Thus, it follows from the last inequality above that limpp¯𝒟f(p)Pp¯p𝒟f(p¯)map=0\lim_{p\to\bar{p}}\|{\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p})\|_{map}=0, which is equivalent to what we want. ∎

In the following proposition, we provide a Riemannian version of the Fundamental Theorem of Calculus and two inequalities that play a crucial role throughout this paper. We claim that the inequalities in the following result do not hold for a general retraction.

Proposition 2.

Let f:mf\colon{\cal M}\to\mathbb{R}^{m} be a continuously differentiable function at p¯\bar{p}. Then, there exists δ>0\delta>0 such that for each p,pδ(p¯)p^{\prime},p\in{\cal B}_{\delta}(\bar{p}), we have

f(p)f(p)=01𝒟f(γ(t))[γ˙(t)]dt,f(p)-f(p^{\prime})=\int_{0}^{1}{\cal D}f(\gamma(t))[\dot{\gamma}(t)]\operatorname{dt}, (78)

where γ:[0,1]\gamma\colon[0,1]\to{\cal M} is a geodesic satisfying γ(0)=p\gamma(0)=p^{\prime} and γ(1)=p\gamma(1)=p. In particular, for each ϵ>0\epsilon>0, there exists a normal ball Bδϵ(p¯)\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) such that

f(p)f(p¯)𝒟f(p¯)[expp¯1p]eϵd(p,p¯),pBδϵ(p¯).\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq\epsilon d(p,\bar{p}),\qquad p\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}). (79)

Furthermore, if 𝒟f{\cal D}f is LL-Lipschitz continuous around p¯\bar{p} then there exists a normal ball BδL(p¯)\operatorname{B}_{\delta_{L}}(\bar{p}) such that

f(p)f(p¯)𝒟f(p¯)[expp¯1p]eLd2(p,p¯),pBδL(p¯).\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq Ld^{2}(p,\bar{p}),\qquad p\in\operatorname{B}_{\delta_{L}}(\bar{p}). (80)
Proof.

Choose δ>0\delta>0 such that ff is continuously differentiable on δ(p¯){\cal B}_{\delta}(\bar{p}). Pick p,pδ(p¯)p^{\prime},p\in{\cal B}_{\delta}(\bar{p}) and consider a geodesic γ:[0,1]\gamma\colon[0,1]\to{\cal M} connecting p=γ(0)p^{\prime}=\gamma(0) to p=γ(1)p=\gamma(1). Using [20, Proposition 2.7], we conclude that

ddt(fγ)(t)=𝒟f(γ(t))[γ˙(t)],t[0,1].\frac{\operatorname{d}}{\operatorname{dt}}(f\circ\gamma)(t)={\cal D}f(\gamma(t))[\dot{\gamma}(t)],\qquad t\in[0,1].

Applying integral with respect to tt on both sides and using the Fundamental Theorem of Calculus for the real function fγf\circ\gamma, we conclude that

f(p)f(p)=f(γ(1))f(γ(0))=01𝒟f(γ(t))[γ˙(t)]dt,f(p)-f(p^{\prime})=f(\gamma(1))-f(\gamma(0))=\int_{0}^{1}{\cal D}f(\gamma(t))[\dot{\gamma}(t)]\operatorname{dt}, (81)

which completes the proof of (78). Now, let us prove (79) and (80). Let Br(p¯)\operatorname{B}_{r}(\bar{p}) be a normal ball. For each pBr(p¯)p\in\operatorname{B}_{r}(\bar{p}), the geodesic γp¯p:[0,1]\gamma_{\bar{p}p}\colon[0,1]\to{\cal M} connecting p¯\bar{p} to pp can be written as γp¯p(t)=expp¯(texpp¯1p)\gamma_{\bar{p}p}(t)=\exp_{\bar{p}}(t\exp_{\bar{p}}^{-1}p), which implies that γ˙p¯p(t)=Pp¯γp¯p(t)expp¯1p\dot{\gamma}_{\bar{p}p}(t)=P_{\bar{p}\gamma_{\bar{p}p}(t)}\exp^{-1}_{\bar{p}}p holds for all t[0,1]t\in[0,1]. Hence, it follows from (81) with γ=γp¯p\gamma=\gamma_{\bar{p}p} that

f(p)f(p¯)=01𝒟f(γp¯p(t))[Pp¯γp¯p(t)expp¯1p]dt.f(p)-f(\bar{p})=\int_{0}^{1}{\cal D}f(\gamma_{\bar{p}p}(t))[P_{\bar{p}\gamma_{\bar{p}p}(t)}\exp^{-1}_{\bar{p}}p]\operatorname{dt}.

By adding the term 𝒟f(p¯)[expp¯1p]-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p] on both sides, it comes that

f(p)f(p¯)𝒟f(p¯)[expp¯1p]=01(𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯))[expp¯1p]dt.f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]=\int_{0}^{1}\left({\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\right)[\exp^{-1}_{\bar{p}}p]\,\operatorname{dt}.

Applying the Euclidean norm of m\mathbb{R}^{m}, we get

f(p)f(p¯)𝒟f(p¯)[expp¯1p]e\displaystyle\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e} =01(𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯))[expp¯1p]dte\displaystyle=\left\|\int_{0}^{1}\left({\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\right)[\exp^{-1}_{\bar{p}}p]\,\operatorname{dt}\right\|_{e}
01(𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯))[expp¯1p]edt\displaystyle\leq\int_{0}^{1}\left\|\left({\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\right)[\exp^{-1}_{\bar{p}}p]\right\|_{e}\,\operatorname{dt}
01𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯)mapexpp¯1pp¯dt.\displaystyle\leq\int_{0}^{1}\|{\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\|_{map}\,\|\exp^{-1}_{\bar{p}}p\|_{\bar{p}}\,\,\operatorname{dt}.

It follows from (8) and simple properties of integral that

f(p)f(p¯)𝒟f(p¯)[expp¯1p]esupt[0,1]{𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯)map}d(p,p¯),\|f(p)-f(\bar{p})-{\cal D}f(\bar{p})[\exp^{-1}_{\bar{p}}p]\|_{e}\leq\sup_{t\in[0,1]}\left\{\|{\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\|_{map}\right\}\,d(p,\bar{p}), (82)

for all pBr(p¯)p\in\operatorname{B}_{r}(\bar{p}). For arbitrary ϵ>0\epsilon>0, it follows from Lemma 4 and δϵ(0,r)\delta_{\epsilon}\in(0,r) that 𝒟f(p)Pp¯p𝒟f(p¯)mapϵ\|{\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p})\|_{map}\leq\epsilon for all pBδϵ(p¯),p\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}), which implies that 𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯)mapϵ\|{\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\|_{map}\leq\epsilon for all pBδϵ(p¯)p\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) and t[0,1]t\in[0,1]. Therefore, (82) yields (79). Finally, assume that 𝒟f{\cal D}f is LL-Lipschitz continuous around p¯\bar{p}. By Definition 1, there exists δL(0,r)\delta_{L}\in(0,r) such that 𝒟f(p)Pp¯p𝒟f(p¯)mapLd(p,p¯)\|{\cal D}f(p)P_{\bar{p}p}-{\cal D}f(\bar{p})\|_{map}\leq Ld(p,\bar{p}) for all pBδL(p¯)p\in\operatorname{B}_{\delta_{L}}(\bar{p}), which implies that

supt[0,1]{𝒟f(γp¯p(t))Pp¯γp¯p(t)𝒟f(p¯)map}Lsupt[0,1]{d(γp¯p(t),p¯)}=Ld(p,p¯),\sup_{t\in[0,1]}\left\{\|{\cal D}f(\gamma_{\bar{p}p}(t))P_{\bar{p}\gamma_{\bar{p}p}(t)}-{\cal D}f(\bar{p})\|_{map}\right\}\leq L\sup_{t\in[0,1]}\left\{d(\gamma_{\bar{p}p}(t),\bar{p})\right\}=Ld(p,\bar{p}),

for all pBδL(p¯)p\in\operatorname{B}_{\delta_{L}}(\bar{p}). Using the previous inequality and (82), we conclude the proof of (80). ∎

The following proposition provides a geometric property of the exponential map.

Proposition 3.

Let Br(p¯)\operatorname{B}_{r}(\bar{p}) be a totally normal ball. Then, for each pp and qq in Br(p¯)\operatorname{B}_{r}(\bar{p}), we have

γ˙pq(t)=expγpq(t)1qexpγpq(t)1p,t[0,1],\dot{\gamma}_{pq}(t)=\exp_{\gamma_{pq}(t)}^{-1}q-\exp_{\gamma_{pq}(t)}^{-1}p,\qquad t\in[0,1], (83)

where γpq:[0,1]\gamma_{pq}\colon[0,1]\to{\cal M} is the geodesic connecting p=γpq(0)p=\gamma_{pq}(0) to q=γpq(1)q=\gamma_{pq}(1).

Proof.

Let pp and qq be arbitrary points in Br(p¯)\operatorname{B}_{r}(\bar{p}). Note that (83) is trivially satisfied when t=0t=0 and t=1t=1. Thus, we only need to analyze the case where t(0,1)t\in(0,1). Considering that the geodesic starting from pp with direction γ˙pq(0)\dot{\gamma}_{pq}(0) is unique, and that Br(p¯)\operatorname{B}_{r}(\bar{p}) is a totally normal ball, we have

1d(p,q)γ˙pq(0)=1d(p,γpq(t))expp1(γpq(t)),t(0,1).\frac{1}{d(p,q)}\dot{\gamma}_{pq}(0)=\frac{1}{d(p,\gamma_{pq}(t))}\exp_{p}^{-1}(\gamma_{pq}(t)),\quad t\in(0,1).

Applying parallel transport on both sides of the equation, we obtain

1d(p,q)Ppγpq(t)[γ˙pq(0)]=1d(p,γpq(t))Ppγpq(t)[expp1(γpq(t))]=1d(p,γpq(t))[expγpq(t)1p],t(0,1).\frac{1}{d(p,q)}P_{p\gamma_{pq}(t)}[\dot{\gamma}_{pq}(0)]=\frac{1}{d(p,\gamma_{pq}(t))}P_{p\gamma_{pq}(t)}[\exp_{p}^{-1}(\gamma_{pq}(t))]=\frac{1}{d(p,\gamma_{pq}(t))}[-\exp_{\gamma_{pq}(t)}^{-1}p],\quad t\in(0,1).

Since γpq\gamma_{pq} is a geodesic, the field γ˙pq\dot{\gamma}_{pq} is parallel along γpq\gamma_{pq}, and, consequently, the equality γ˙pq(t)=Ppγpq(t)[γ˙pq(0)]\dot{\gamma}_{pq}(t)=P_{p\gamma_{pq}(t)}[\dot{\gamma}_{pq}(0)] holds for all t(0,1)t\in(0,1). Thus, the last expression implies

d(p,γpq(t))d(p,q)γ˙pq(t)=expγpq(t)1(p),t(0,1).\frac{d(p,\gamma_{pq}(t))}{d(p,q)}\dot{\gamma}_{pq}(t)=-\exp_{\gamma_{pq}(t)}^{-1}(p),\quad t\in(0,1).

In a similar manner, we can show that

d(γpq(t),q)d(p,q)γ˙pq(t)=expγpq(t)1(q),t(0,1).\frac{d(\gamma_{pq}(t),q)}{d(p,q)}\dot{\gamma}_{pq}(t)=\exp_{\gamma_{pq}(t)}^{-1}(q),\quad t\in(0,1).

To conclude the proof, simply add the last two equalities and use the fact that d(p,γpq(t))+d(γpq(t),q)=d(p,q)d(p,\gamma_{pq}(t))+d(\gamma_{pq}(t),q)=d(p,q) for all t(0,1)t\in(0,1). ∎

In the next result, we will establish a sufficient condition for the function f:f:\mathcal{M}\to\mathbb{R} to have closed graph.

Proposition 4.

Let f:f\colon{\cal M}\to\mathbb{R} be a continuous function. Then, the graph of ff is closed in ×{\cal M}\times\mathbb{R}.

Proof.

Since the function ff is continuous, the function φ:×\varphi\colon{\cal M}\times\mathbb{R}\to\mathbb{R} defined by φ(p,x)=|f(p)x|\varphi(p,x)=|f(p)-x| is also continuous. On the other hand, we have that

gphf{(p,x)×:x=f(p)}=φ1(0).\operatorname{gph}f\coloneqq\{(p,x)\in{\cal M}\times\mathbb{R}\colon x=f(p)\}=\varphi^{-1}(0).

Therefore, gphf\operatorname{gph}f is closed because it is the inverse image, by the continuous function φ\varphi, of the closed set {0}\{0\}. ∎

Throughout this paper, our strategy to prove some results requires the following lemma.

Lemma 5.

Let f:mf\colon{\cal M}\to\mathbb{R}^{m} be a continuously differentiable function at p¯\bar{p}. Then, for each ϵ>0\epsilon>0, there exists a totally normal ball Bδϵ(p¯)\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) such that

𝒟f(p)[expp1pexpp1p′′]𝒟f(p¯)[expp¯1pexpp¯1p′′]eϵd(p,p′′),p,p,p′′Bδϵ(p¯).\|{\cal D}f(p)[\exp_{p}^{-1}p^{\prime}-\exp_{p}^{-1}p^{\prime\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}p^{\prime}-\exp_{\bar{p}}^{-1}p^{\prime\prime}]\|_{e}\leq\epsilon d(p^{\prime},p^{\prime\prime}),\qquad p,p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}).
Proof.

Let Br(p¯)\operatorname{B}_{r}(\bar{p}) be a totally normal ball. Define the function Ψ:Br(p¯)×Br(p¯)m\Psi\colon\operatorname{B}_{r}(\bar{p})\times\operatorname{B}_{r}(\bar{p})\to\mathbb{R}^{m} by

Ψ(p,q)=𝒟f(p)[expp1q]𝒟f(p¯)[expp¯1q].\Psi(p,q)={\cal D}f(p)[\exp_{p}^{-1}q]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}q]. (84)

Let ϵ>0\epsilon>0, and consider ×\cal M\times\cal M with the product metric. Since 𝒟f{\cal D}f and exp1\exp^{-1} are continuous at p¯\bar{p} and continuously differentiable at (p¯,p¯)(\bar{p},\bar{p}), respectively, 𝒟2Ψ(p,q){\cal D}_{2}\Psi(p,q) (the differential of Ψ(p,)\Psi(p,\cdot) at qq) is continuous at (p¯,p¯)(\bar{p},\bar{p}). Therefore, there exists δ(0,r)\delta\in(0,r) such that

d((p,q),(p¯,p¯))=(d2(p,p¯)+d2(q,p¯))12<δimplies𝒟2Ψ(p,q)𝒟2Ψ(p¯,p¯)e<ϵ.d((p,q),(\bar{p},\bar{p}))=(d^{2}(p,\bar{p})+d^{2}(q,\bar{p}))^{\frac{1}{2}}<\delta\quad\mbox{implies}\quad\|{\cal D}_{2}\Psi(p,q)-{\cal D}_{2}\Psi(\bar{p},\bar{p})\|_{e}<\epsilon.

From (84), it follows that Ψ(p¯,q)=0\Psi(\bar{p},q)=0 for all qBr(p¯)q\in\operatorname{B}_{r}(\bar{p}) which implies 𝒟2Ψ(p¯,p¯)=0{\cal D}_{2}\Psi(\bar{p},\bar{p})=0. Thus, the last line can be rewritten as

d2(p,p¯)+d2(q,p¯)<δ2implies𝒟2Ψ(p,q)e<ϵ.d^{2}(p,\bar{p})+d^{2}(q,\bar{p})<\delta^{2}\quad\mbox{implies}\quad\|{\cal D}_{2}\Psi(p,q)\|_{e}<\epsilon. (85)

Set δϵδ/2\delta_{\epsilon}\coloneqq\delta/\sqrt{2}. It follows from d2(p,p¯)+d2(q,p¯)<δ2d^{2}(p,\bar{p})+d^{2}(q,\bar{p})<\delta^{2} for all (p,q)Bδϵ(p¯)×Bδϵ(p¯)(p,q)\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p})\times\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) and (85) that 𝒟2Ψ(p,q)e<ϵ\|{\cal D}_{2}\Psi(p,q)\|_{e}<\epsilon for all (p,q)Bδϵ(p¯)×Bδϵ(p¯)(p,q)\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p})\times\operatorname{B}_{\delta_{\epsilon}}(\bar{p}). Consequently,

supqδϵ(p¯){𝒟2Ψ(p,q)e}ϵfor all pBδϵ(p¯).\sup_{q\in{\cal B}_{\delta_{\epsilon}}(\bar{p})}\{\|{\cal D}_{2}\Psi(p,q)\|_{e}\}\leq\epsilon\quad\mbox{for all }p\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}). (86)

For p,p,p′′Bδϵ(p¯)p,p^{\prime},p^{\prime\prime}\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}), it follows from the first part of Proposition 2 and (84) that

𝒟f(p)[expp1pexpp1p′′]𝒟f(p¯)[expp¯1pexpp¯1p′′]\displaystyle{\cal D}f(p)[\exp_{p}^{-1}p^{\prime}-\exp_{p}^{-1}p^{\prime\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}p^{\prime}-\exp_{\bar{p}}^{-1}p^{\prime\prime}] =Ψ(p,p)Ψ(p,p′′),\displaystyle=\Psi(p,p^{\prime})-\Psi(p,p^{\prime\prime}),
=01𝒟2Ψ(p,γ(t))[γ˙(t)]dt,\displaystyle=\int_{0}^{1}{\cal D}_{2}\Psi(p,\gamma(t))[\dot{\gamma}(t)]\operatorname{dt},

where γ:[0,1]\gamma\colon[0,1]\to{\cal M} is a geodesic that satisfies γ(0)=p′′\gamma(0)=p^{\prime\prime} and γ(1)=p\gamma(1)=p^{\prime}. Since Bδϵ(p¯)\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) is totally normal, γ(t)Bδϵ(p¯)\gamma(t)\in\operatorname{B}_{\delta_{\epsilon}}(\bar{p}) for all t[0,1]t\in[0,1]. Hence, applying the norm to both sides of the above equality and using norm properties, we obtain

𝒟f(p)[expp1pexpp1p′′]𝒟f(p¯)[expp¯1pexpp¯1p′′]e\displaystyle\|{\cal D}f(p)[\exp_{p}^{-1}p^{\prime}-\exp_{p}^{-1}p^{\prime\prime}]-{\cal D}f(\bar{p})[\exp_{\bar{p}}^{-1}p^{\prime}-\exp_{\bar{p}}^{-1}p^{\prime\prime}]\|_{e} 01𝒟2Ψ(p,γ(t))[γ˙(t)]edt,\displaystyle\leq\int_{0}^{1}\|{\cal D}_{2}\Psi(p,\gamma(t))[\dot{\gamma}(t)]\|_{e}\operatorname{dt},
01𝒟2Ψ(p,γ(t))mapγ˙(t)γ(t)dt,\displaystyle\leq\int_{0}^{1}\|{\cal D}_{2}\Psi(p,\gamma(t))\|_{map}\|\dot{\gamma}(t)\|_{\gamma(t)}\operatorname{dt},
supqδϵ(p¯){𝒟2Ψ(p,q)map}d(p,p′′).\displaystyle\leq\sup_{q\in{\cal B}_{\delta_{\epsilon}}(\bar{p})}\left\{\|{\cal D}_{2}\Psi(p,q)\|_{map}\right\}d(p^{\prime},p^{\prime\prime}).

The conclusion of this proof follows from the previous inequality and (86). ∎

References

  • [1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
  • [2] R. L. Adler, J.-P. Dedieu, J. Y. Margulies, M. Martens, and M. Shub. Newton’s method on Riemannian manifolds and a geometric model for the human spine. IMA Journal of Numerical Analysis, 22(3):359–390, 2002.
  • [3] S. Adly, R. Cibulka, and H. van Ngai. Newton’s method for solving inclusions using set-valued approximations. SIAM J. Optim., 25:159–184, 2015.
  • [4] S. Adly, H. van Ngai, and N. V. Vu. Newton-type method for solving generalized equations on Riemannian manifolds. Journal of Convex Analysis, 25(2):341–370, 2018.
  • [5] S. Adly, H. van Ngai, and N. V. Vu. Dennis-Moré condition for set-valued vector fields and the superlinear convergence of Broyden updates in Riemannian manifolds. Journal of Convex Analysis, 29(3), 2022.
  • [6] F. Alvarez, J. Bolte, and J. Munier. A unifying local convergence result for Newton’s method in Riemannian manifolds. Foundations of Computational Mathematics, 8(2):197–226, 2008.
  • [7] R. Andreani, R. M. de Carvalho, L. D. Secchin, and G. N. Silva. Convergence of Quasi-Newton methods for solving constrained generalized equations. ESAIM:COCV, 28, 2022.
  • [8] F. J. Aragón Artacho, A. Belyakov, A. L. Dontchev, and M. López. Local convergence of quasi-Newton methods under metric regularity. Comput. Optim. Appl., 58(1):225–247, 2014.
  • [9] F. J. Aragón Artacho, A. L. Dontchev, M. Gaydu, M. H. Geoffroy, and V. M. Veliov. Metric regularity of Newton’s iteration. SIAM J. Control Optim., 49(2):339–362, 2011.
  • [10] R. Bergmann and R. Herzog. Intrinsic formulation of KKT conditions and constraint qualifications on smooth manifolds. SIAM Journal on Optimization, 29(4):2423–2444, 2019.
  • [11] R. Bergmann, R. Herzog, M. Silva Louzeiro, D. Tenbrinck, and J. Vidal-Núñez. Fenchel duality theory and a primal-dual algorithm on Riemannian manifolds. Foundations of Computational Mathematics, 21(6):1465–1504, 2021.
  • [12] R. Bergmann, J. Persch, and G. Steidl. A parallel Douglas–Rachford algorithm for minimizing rof-like functionals on images with values in symmetric Hadamard manifolds. SIAM Journal on Imaging Sciences, 9(3):901–937, 2016.
  • [13] D. A. Bini and B. Iannazzo. Computing the karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4):1700–1710, 2013.
  • [14] N. Boumal. An introduction to optimization on smooth manifolds. Cambridge University Press, 2023.
  • [15] Y. Censor, A. Gibali, and S. Reich. Algorithms for the split variational inequality problem. Numerical Algorithms, 59:301–323, 2011.
  • [16] R. Cibulka, A. Dontchev, and M. H. Geoffroy. Inexact Newton Methods and Dennis–Moré Theorems for Nonsmooth Generalized Equations. SIAM J. Control Optim., 53(2):1003–1019, 2015.
  • [17] R. Cibulka, A. Dontchev, and M. H. Geoffroy. Inexact Newton methods and Dennis–Moré theorems for nonsmooth generalized equations. SIAM Journal on Control and Optimization, 53(2):1003–1019, 2015.
  • [18] F. R. de Oliveira, O. P. Ferreira, and G. N. Silva. Newton’s method with feasible inexact projections for solving constrained generalized equations. Computational Optimization and Applications, 72:159–177, 2019.
  • [19] J.-P. Dedieu, P. Priouret, and G. Malajovich. Newton’s method on Riemannian manifolds: covariant alpha theory. IMA Journal of Numerical Analysis, 23(3):395–419, 2003.
  • [20] M. P. do Carmo. Riemannian geometry. Mathematics: Theory & Applications. Birkhäuser Boston, Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.
  • [21] A. L. Dontchev. Local analysis of a Newton-type method based on partial linearization. In The mathematics of numerical analysis (Park City, UT, 1995), volume 32 of Lectures in Appl. Math., pages 295–306. Amer. Math. Soc., Providence, RI, 1996.
  • [22] A. L. Dontchev and R. T. Rockafellar. Newton’s method for generalized equations: a sequential implicit function theorem. Math. Program., 123(1, Ser. B):139–159, 2010.
  • [23] A. L. Dontchev and R. T. Rockafellar. Convergence of inexact Newton methods for generalized equations. Math. Program., 139(1-2, Ser. B):115–137, 2013.
  • [24] A. L. Dontchev and R. T. Rockafellar. Implicit Functions and Solution Mappings: A View from Variational Analysis. Springer, New York, 2nd edition, 2014.
  • [25] T. A. Fernandes, O. P. Ferreira, and J. Yuan. On the superlinear convergence of Newton’s method on Riemannian manifolds. Journal of Optimization Theory and Applications, 173(3):828–843, 2017.
  • [26] O. Ferreira and G. Silva. Local convergence analysis of Newton’s method for solving strongly regular generalized equations. Journal of Mathematical Analysis and Applications, 458(1):481–496, 2018.
  • [27] O. P. Ferreira, A. N. Iusem, and S. Z. Németh. Projections onto convex sets on the sphere. Journal of Global Optimization, 57(3):663–676, 2013.
  • [28] O. P. Ferreira, C. Jean-Alexis, and A. Piétrus. Metrically regular vector field and iterative processes for generalized equations in Hadamard manifolds. Journal of Optimization Theory and Applications, 175(3):624–651, 2017.
  • [29] O. P. Ferreira, M. S. Louzeiro, and L. Prudente. Gradient method for optimization on riemannian manifolds with lower bounded curvature. SIAM Journal on Optimization, 29(4):2517–2541, 2019.
  • [30] O. P. Ferreira and G. N. Silva. Kantorovich’s Theorem on Newton’s Method for Solving Strongly Regular Generalized Equation. SIAM J. Optim., 27(2):910–926, 2017.
  • [31] M. Gaydu and G. N. Silva. A general iterative procedure to solve generalized equations with differentiable multifunction. Journal of Optimization Theory and Applications, 185:207–222, 2020.
  • [32] M. Genicot, W. Huang, and N. T. Trendafilov. Weakly correlated sparse components with nearly orthonormal loadings. In International Conference on Geometric Science of Information, pages 484–490. Springer, 2015.
  • [33] M. H. Geoffroy and A. Piétrus. Local convergence of some iterative methods for generalized equations. Journal of Mathematical Analysis and Applications, 290(2):497–505, 2004.
  • [34] H. He, C. Ling, and H.-K. Xu. A relaxed projection method for split variational inequalities. Journal of Optimization Theory and Applications, 166:213–233, 2015.
  • [35] L. Hesselholt. Vector fields on spheres. Preprint, http://www-math. mit. edu  larsh/teaching/vectorfields. pdf.
  • [36] A. F. Izmailov and M. V. Solodov. Inexact Josephy–Newton framework for generalized equations and its applications to local analysis of Newtonian methods for constrained optimization. Computational Optimization and Applications, 46(2):347–368, 2010.
  • [37] H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on pure and applied mathematics, 30(5):509–541, 1977.
  • [38] D. Klatte and B. Kummer. Approximations and generalized Newton methods. Mathematical Programming, 168:673–716, 2018.
  • [39] S. Lang. Fundamentals of differential geometry, volume 191 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1999.
  • [40] J. M. Lee. Introduction to Smooth Manifolds, volume 218 of Graduate Texts in Mathematics. Springer-Verlag, New York, 2003.
  • [41] J. M. Lee. Riemannian manifolds: an introduction to curvature, volume 176. Springer Science & Business Media, 2006.
  • [42] C. Li and J. Wang. Convergence of the Newton method and uniqueness of zeros of vector fields on Riemannian manifolds. Science in China Series A: Mathematics, 48(11):1465–1478, 2005.
  • [43] C. Li and J.-C. Yao. Variational inequalities for set-valued vector fields on Riemannian manifolds: Convexity of the solution set and the proximal point algorithm. SIAM Journal on Control and Optimization, 50(4):2486–2514, 2012.
  • [44] S.-L. Li, C. Li, Y.-C. Liou, and J.-C. Yao. Existence of solutions for variational inequalities on Riemannian manifolds. Nonlinear Analysis: Theory, Methods & Applications, 71(11):5695–5706, 2009.
  • [45] C. Liu and N. Boumal. Simple algorithms for optimization on Riemannian manifolds with constraints. Applied Mathematics & Optimization, 82(3):949–981, 2020.
  • [46] S. Németh. Variational inequalities on hadamard manifolds. Nonlinear Analysis: Theory, Methods & Applications, 52(5):1491–1498, 2003.
  • [47] H. Sato. Riemannian optimization and its applications. Springer, 2021.
  • [48] S. Sra and R. Hosseini. Conic geometric optimization on the manifold of positive definite matrices. SIAM J. Optim., 25(1):713–739, 2015.
  • [49] J. Tang and H. Liu. Unsupervised feature selection for linked social media data. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 904–912, 2012.
  • [50] L. W. Tu. An Introduction to Manifolds. Universitext. Springer, New York, 2 edition, 2011.
  • [51] C. Udrişte. Convex functions and optimization methods on Riemannian manifolds, volume 297 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1994.
  • [52] J.-H. Wang, S. Huang, and C. Li. Extended Newton’s method for mappings on Riemannian manifolds with values in a cone. Taiwanese J. Math., 13(2B):633–656, 2009.
  • [53] M. Weber and S. Sra. Riemannian optimization via frank-wolfe methods. Mathematical Programming, 199(1):525–556, 2023.
  • [54] Y. Zhang, Y. Lau, H.-w. Kuo, S. Cheung, A. Pasupathy, and J. Wright. On the global geometry of sphere-constrained sparse blind deconvolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4894–4902, 2017.