
Hedy ATTOUCH
IMAG, Univ. Montpellier, CNRS, Montpellier, France
[email protected]
Supported by COST Action CA16228

Aïcha BALHAG, Zaki CHBANI, Hassan RIAHI
Cadi Ayyad University, Sémlalia Faculty of Sciences, 40000 Marrakech, Morocco
[email protected], [email protected], [email protected]

Damped inertial dynamics with vanishing Tikhonov regularization: strong asymptotic convergence towards the minimum norm solution

Hedy ATTOUCH    Aïcha BALHAG    Zaki CHBANI    Hassan RIAHI
Abstract

In a Hilbert space, we provide a fast dynamic approach to the hierarchical minimization problem which consists in finding the minimum norm solution of a convex minimization problem. For this, we study the convergence properties of the trajectories generated by a damped inertial dynamic with Tikhonov regularization. As time goes to infinity, the Tikhonov regularization parameter is supposed to tend towards zero, not too fast, which is a key property to make the trajectories strongly converge towards the minimizer of f of minimum norm. In line with the structure of the heavy ball method for strongly convex functions, the viscous damping coefficient is proportional to the square root of the Tikhonov regularization parameter. Therefore, it also converges to zero, which ensures rapid convergence of the values. Precisely, under a proper tuning of these parameters, based on Lyapunov analysis, we show that the trajectories strongly converge towards the minimizer of minimum norm, and we provide the convergence rate of the values. We exhibit a trade-off between the property of fast convergence of the values and the property of strong convergence towards the minimum norm solution. This study improves several previous works where this type of result was obtained under restrictive hypotheses.

Keywords:
Accelerated gradient methods; convex optimization; damped inertial dynamics; hierarchical minimization; Nesterov accelerated gradient method; Tikhonov approximation.
MSC:
37N40, 46N10, 49M30, 65K05, 65K10, 90B50, 90C25.

1 Introduction

Throughout the paper, \mathcal{H} is a real Hilbert space endowed with the scalar product \langle\cdot,\cdot\rangle, with \|x\|^{2}=\langle x,x\rangle for x\in\mathcal{H}. We consider the convex minimization problem

\min\left\{f(x):\ x\in\mathcal{H}\right\}, \qquad (1)

where f:\mathcal{H}\rightarrow\mathbb{R} is a convex continuously differentiable function whose solution set S=\operatorname{argmin}f is nonempty. We aim at finding, by rapid methods, the element of minimum norm of S. Our approach is in line with the dynamic approach developed by Attouch and László in AL to solve this question. It is based on the asymptotic analysis, as t\to+\infty, of the nonautonomous damped inertial dynamic

{\rm(TRIGS)}\qquad\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0,

where the function f and the Tikhonov regularization parameter \varepsilon satisfy the following hypothesis (in Section 4, we will extend our study to the case of a convex lower semicontinuous proper function f:{\mathcal{H}}\to{\mathbb{R}}\cup\left\{+\infty\right\}):

(\mathcal{H}_{0})\;\begin{cases}\;\;f:\mathcal{H}\rightarrow\mathbb{R}\mbox{ is convex and differentiable},\ \nabla f\mbox{ is Lipschitz continuous on bounded sets};\\ \;\;S:=\operatorname{argmin}f\neq\emptyset.\mbox{ We denote by }x^{*}\mbox{ the element of minimum norm of }S;\\ \;\;\varepsilon:[t_{0},+\infty[\to\mathbb{R}^{+}\mbox{ is a nonincreasing function, of class }\mathcal{C}^{1},\mbox{ such that }\lim_{t\to\infty}\varepsilon(t)=0.\end{cases}

The Cauchy problem for (TRIGS) is well posed: the proof of the existence and uniqueness of a global solution is given in the appendix (see also AL). It is based on classical arguments combining the Cauchy-Lipschitz theorem with energy estimates. Our main contribution is to develop a new Lyapunov analysis which gives the strong convergence of the trajectories of (TRIGS) to the element of minimum norm of S. Precisely, we give sufficient conditions on \varepsilon(t) which ensure that \lim_{t\to+\infty}\|x(t)-x^{\ast}\|=0. This improves the results of AL.

1.1 Attouch-László Lyapunov analysis of (TRIGS)

The main idea developed in AL consists of starting from the Polyak heavy ball with friction dynamic for strongly convex functions, then adapting it via Tikhonov approximation to deal with the case of general convex functions. Recall that a function f:{\mathcal{H}}\to\mathbb{R} is said to be \mu-strongly convex for some \mu>0 if f-\frac{\mu}{2}\|\cdot\|^{2} is convex. In this setting, we have the following exponential convergence result for the damped autonomous inertial dynamic where the damping coefficient is twice the square root of the modulus of strong convexity of f:

Theorem 1.1

Suppose that f:{\mathcal{H}}\to\mathbb{R} is a function of class {\mathcal{C}}^{1} which is \mu-strongly convex for some \mu>0. Let x(\cdot):[t_{0},+\infty[\to{\mathcal{H}} be a solution trajectory of

\ddot{x}(t)+2\sqrt{\mu}\dot{x}(t)+\nabla f(x(t))=0. \qquad (2)

Then, the following property holds:

f(x(t))-\min_{\mathcal{H}}f=\mathcal{O}\left(e^{-\sqrt{\mu}t}\right)\;\mbox{ as }\;t\to+\infty.
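As a quick sanity check (our own illustration, not taken from AL), consider the one-dimensional quadratic f(x)=\frac{\mu}{2}x^{2}. Then (2) becomes the linear equation \ddot{x}(t)+2\sqrt{\mu}\dot{x}(t)+\mu x(t)=0, whose characteristic polynomial r^{2}+2\sqrt{\mu}r+\mu has the double root r=-\sqrt{\mu}. Hence

x(t)=(A+Bt)e^{-\sqrt{\mu}t},\qquad f(x(t))-\min_{\mathcal{H}}f=\frac{\mu}{2}(A+Bt)^{2}e^{-2\sqrt{\mu}t}=\mathcal{O}\left(e^{-\sqrt{\mu}t}\right),

which exhibits both the rate of Theorem 1.1 and the critical (non-oscillating) nature of the damping coefficient 2\sqrt{\mu}.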

To adapt this result to the case of a general convex differentiable function f:{\mathcal{H}}\to\mathbb{R}, a natural idea is to use Tikhonov's regularization method. This leads to considering the non-autonomous dynamic which at time t is governed by the gradient of the strongly convex function

\varphi_{t}:{\mathcal{H}}\to\mathbb{R},\quad\varphi_{t}(x):=f(x)+\frac{\varepsilon(t)}{2}\|x\|^{2}.

Then, replacing f by \varphi_{t} in (2), and noticing that \varphi_{t} is \varepsilon(t)-strongly convex, this gives the following dynamic, which was introduced in AL (\delta is a positive parameter):

{\rm(TRIGS)}\qquad\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0.

(TRIGS) is shorthand for Tikhonov regularization of inertial gradient systems. In order not to asymptotically modify the equilibria, it is supposed that \varepsilon(t)\to 0 as t\to+\infty (this is the key property of the asymptotic version, t\to+\infty, of the Browder-Tikhonov regularization method). This condition implies that (TRIGS) falls within the framework of the inertial gradient systems with asymptotically vanishing damping. The importance of this class of inertial dynamics has been highlighted by several recent studies AAD1, ABotCest, AC10, ACPR, AP, CD, SBC, which make the link with the accelerated gradient method of Nesterov Nest1; Nest2.
The control of the decay of \varepsilon(t) to zero as t\to+\infty plays a key role in the Lyapunov analysis of (TRIGS), and uses the following condition.

Definition 1

Given \delta>0, we say that t\mapsto\varepsilon(t) satisfies the controlled decay property {\rm(CD)}_{\lambda} if it is a nonincreasing function for which there exists t_{1}\geq t_{0} such that, for all t\geq t_{1},

\dfrac{d}{dt}\left(\frac{1}{\sqrt{\varepsilon(t)}}\right)\leq\min(2\lambda-\delta,\delta-\lambda), \qquad (3)

where \lambda is a parameter such that \frac{\delta}{2}<\lambda<\delta for 0<\delta\leq 2, and \frac{\delta+\sqrt{\delta^{2}-4}}{2}<\lambda<\delta for \delta>2.

By integrating the differential inequality (3), one can easily verify that this condition implies that \varepsilon(t) is greater than or equal to C/t^{2}. Since the damping coefficient is proportional to \sqrt{\varepsilon(t)}, this means that it must be greater than or equal to C/t. This is in accordance with the theory of inertial gradient systems with time-dependent viscosity coefficient, which states that the asymptotic optimization property is valid provided that the integral on [t_{0},+\infty[ of the viscous damping coefficient is infinite, see AC10, CEG. Let us state the following convergence result obtained in AL.
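Indeed (a short verification, filling in the computation): set m:=\min(2\lambda-\delta,\delta-\lambda)>0. Integrating (3) from t_{1} to t gives

\frac{1}{\sqrt{\varepsilon(t)}}\leq\frac{1}{\sqrt{\varepsilon(t_{1})}}+m(t-t_{1})\leq C^{\prime}t\ \mbox{ for }t\mbox{ large enough},

hence \varepsilon(t)\geq C/t^{2} with C=1/C^{\prime 2}, and therefore \delta\sqrt{\varepsilon(t)}\geq\delta\sqrt{C}/t.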

Theorem 1.2

(Attouch-László AL) Let x:[t_{0},+\infty[\to\mathcal{H} be a solution trajectory of (TRIGS). Let \delta be a positive parameter. Suppose that \varepsilon(\cdot) satisfies the condition {\rm(CD)}_{\lambda} for some \lambda>0. Then, we have the following rate of convergence of the values: for all t\geq t_{1}

f(x(t))-\min_{{\mathcal{H}}}f\leq\frac{\lambda\|x^{*}\|^{2}}{2}\frac{1}{\gamma(t)}\int_{t_{1}}^{t}\varepsilon^{\frac{3}{2}}(s)\gamma(s)ds+\frac{C}{\gamma(t)}, \qquad (4)

where

\gamma(t)=\exp\left(\int_{t_{1}}^{t}\mu(s)ds\right),\quad\mu(t)=-\frac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}

and C=\left(f(x(t_{1}))-f^{\ast}\right)+\frac{\varepsilon(t_{1})}{2}\|x(t_{1})\|^{2}+\frac{1}{2}\|\lambda\sqrt{\varepsilon(t_{1})}(x(t_{1})-x^{\ast})+\dot{x}(t_{1})\|^{2}.

The proof is based on the Lyapunov function \mathcal{E}:[t_{0},+\infty[\to\mathbb{R}_{+},

\mathcal{E}(t):=\left(f(x(t))-\min f\right)+\frac{\varepsilon(t)}{2}\|x(t)\|^{2}+\frac{1}{2}\|c(t)(x(t)-x^{\ast})+\dot{x}(t)\|^{2}, \qquad (5)

where the function c:[t_{0},+\infty[\to{\mathbb{R}} is chosen appropriately. Based on this Lyapunov analysis, it is proved in AL that \liminf_{t\to+\infty}\|x(t)-x^{\ast}\|=0. We will improve this result and show that \lim_{t\to+\infty}\|x(t)-x^{\ast}\|=0. For this, we will develop a new Lyapunov analysis. Let us first recall some related previous results showing the progression of the understanding of these delicate questions, where the hierarchical minimization property is reached asymptotically.

1.2 Historical facts and related results

In relation to hierarchical optimization, a rich literature has been devoted to the coupling of dynamic gradient systems with Tikhonov regularization, and to the study of the corresponding algorithms.

1.2.1 First-order gradient dynamics

For first-order gradient systems and subdifferential inclusions, the asymptotic hierarchical minimization property which results from the introduction of a vanishing viscosity term in the dynamic (in our context the Tikhonov approximation Tikh; TA) has been highlighted in a series of papers AlvCab, Att2, AttCom, AttCza2, BaiCom, CPS, Hirstoaga. In a parallel way, there is a vast literature on convex descent algorithms involving Tikhonov and, more generally, penalty and regularization terms. The historical evolution can be traced back to Fiacco and McCormick FM, and the interpretation of interior point methods with the help of a vanishing logarithmic barrier. Some more specific references for the coupling of Prox and Tikhonov can be found in Cominetti Com. The time discretization of the first-order gradient systems and subdifferential inclusions involving multiscale (in time) features provides a natural link between the continuous and discrete dynamics. The resulting algorithms combine proximal based methods (for example forward-backward algorithms) with the viscosity of penalization methods, see AttCzaPey1, AttCzaPey2, BotCse1, Cabot-inertiel; Cab, Hirstoaga.

1.2.2 Second-order gradient dynamics

The first studies concerning the coupling of damped inertial dynamics with Tikhonov approximation concerned the heavy ball with friction system of Polyak Polyak, where the damping coefficient \gamma>0 is fixed. In AttCza1, Attouch-Czarnecki considered the system

\ddot{x}(t)+\gamma\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0. \qquad (6)

In the slow parametrization case \int_{0}^{+\infty}\varepsilon(t)dt=+\infty, they proved that any solution x(\cdot) of (6) converges strongly to the minimum norm element of \operatorname{argmin}f, see also JM-Tikh. A parallel study has been developed for PDE's, see AA for damped hyperbolic equations with non-isolated equilibria, and AlvCab for semilinear PDE's. The system (6) is a special case of the general dynamic model

\ddot{x}(t)+\gamma\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)\nabla g(x(t))=0 \qquad (7)

which involves two functions f and g intervening with different time scales. When \varepsilon(\cdot) tends to zero moderately slowly, it was shown in Att-Czar-last that the trajectories of (7) converge asymptotically to equilibria that are solutions of the following hierarchical problem: they minimize the function g on the set of minimizers of f. When \mathcal{H}={\mathcal{H}}_{1}\times{\mathcal{H}}_{2} is a product space, defining for x=(x_{1},x_{2}), f(x_{1},x_{2}):=f_{1}(x_{1})+f_{2}(x_{2}) and g(x_{1},x_{2}):=\|A_{1}x_{1}-A_{2}x_{2}\|^{2}, where the A_{i},\,i\in\{1,2\} are linear operators, (7) provides (weakly) coupled inertial systems. The continuous and discrete-time versions of these systems have a natural connection to the best response dynamics for potential games AttCza2, domain decomposition for PDE's abc2, optimal transport abc, and coupled wave equations HJ2.

In the quest for a faster convergence, the following system

\mbox{(AVD)}_{\alpha,\varepsilon}\quad\quad\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0, \qquad (8)

has been studied by Attouch-Chbani-Riahi ACR . It is a Tikhonov regularization of the dynamic

\mbox{(AVD)}_{\alpha}\quad\quad\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla f(x(t))=0, \qquad (9)

which was introduced by Su, Boyd and Candès in SBC. When \alpha=3, \mbox{(AVD)}_{\alpha} can be viewed as a continuous version of the accelerated gradient method of Nesterov. It has been the subject of many recent studies which have given an in-depth understanding of the Nesterov acceleration method, see AAD1, AC10, ACPR, SBC, Siegel, WRJ.

1.3 Model results

Let us illustrate our results in the case \varepsilon(t)=\frac{1}{t^{p}}. In section 3, we will prove the following result:

Theorem 1.3

Take \varepsilon(t)=1/t^{p},  0<p<2. Let x:[t_{0},+\infty[\to\mathcal{H} be a solution trajectory of

\ddot{x}(t)+\frac{\delta}{t^{\frac{p}{2}}}\dot{x}(t)+\nabla f\left(x(t)\right)+\frac{1}{t^{p}}x(t)=0.

Then, we have the following rates of convergence for the values and the trajectory:

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right)\mbox{ as }t\to+\infty \qquad (10)

\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\frac{1}{t^{\frac{2-p}{2}}}\right)\mbox{ as }t\to+\infty. \qquad (11)

According to the strong convergence of x_{\varepsilon(t)} to the minimum norm solution, the above property implies that x(t) strongly converges to the minimum norm solution.
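The behavior described by Theorem 1.3 is easy to observe numerically. Below is a minimal sketch (our own illustration, not part of the paper) that integrates (TRIGS) with \varepsilon(t)=1/t^{p} for the quadratic f(x_{1},x_{2})=\frac{1}{2}(x_{1}+x_{2}-2)^{2}, whose solution set is the line x_{1}+x_{2}=2 and whose minimum norm solution is (1,1); the function, the parameter values and the time horizon are all illustrative choices.

```python
# Minimal numerical sketch of (TRIGS) with eps(t) = 1/t^p (illustrative choices throughout).
# f(x1, x2) = 0.5*(x1 + x2 - 2)^2: argmin f is the line x1 + x2 = 2, min norm solution (1, 1).
import numpy as np
from scipy.integrate import solve_ivp

p, delta, t0 = 2/3, 2.0, 1.0   # illustrative parameters

def grad_f(x):
    return (x[0] + x[1] - 2.0) * np.ones(2)

def trigs(t, y):
    x, v = y[:2], y[2:]        # y stacks position x(t) and velocity xdot(t)
    eps = t**(-p)
    acc = -delta * np.sqrt(eps) * v - grad_f(x) - eps * x
    return np.concatenate([v, acc])

y0 = np.array([3.0, -1.0, 0.0, 0.0])   # x(t0) = (3, -1), xdot(t0) = (0, 0)
sol = solve_ivp(trigs, (t0, 1e4), y0, rtol=1e-8, atol=1e-10, dense_output=True)

for t in (1e1, 1e2, 1e3, 1e4):
    x = sol.sol(t)[:2]
    print(f"t = {t:8.0f}   ||x(t) - (1,1)|| = {np.linalg.norm(x - 1.0):.2e}")
```

The printed distances decay towards zero, in accordance with the strong convergence of x(t) to the minimum norm solution, while f(x(t)) decays at a rate compatible with (10).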

In many respects, these results represent an important advance compared to previous works:

i) Let us underline that in our approach the rapid convergence of the values and the strong convergence towards the solution of minimum norm are obtained for the same dynamic, whereas in the previous works ACR, AttCza1, they are obtained for different dynamics corresponding to different settings of the parameters. Moreover, we obtain the strong convergence of the trajectories to the minimum norm solution, whereas in AL and ACR it was only obtained that \liminf_{t\to+\infty}\|x(t)-x^{\ast}\|=0. The results extend naturally to obtaining strong convergence towards the solution closest to a desired state x_{d}: it suffices to replace \|x\|^{2} by \|x-x_{d}\|^{2} in the Tikhonov approximation. This is important for inverse problems. In addition, we obtain a convergence rate of the values which is better than the one obtained in AL.

ii) These results show the trade-off between the property of rapid convergence of the values and the property of strong convergence towards the minimum norm solution. The two rates of convergence move in opposite directions as p varies. The determination of a good compromise between these two antagonistic criteria is an interesting subject that we will consider later.

iii) Note that in the limiting case p=2, which is the most interesting case for obtaining a fast convergence of the values comparable to the accelerated gradient method of Nesterov, our analysis does not allow us to conclude that the trajectories converge towards the solution of minimum norm. This question remains open; the interested reader can consult AL.

1.4 Contents

The paper is organized as follows. In section 2, for a general Tikhonov regularization parameter \varepsilon(t), we study the asymptotic convergence properties of the solution trajectories of (TRIGS). Based on Lyapunov analysis, we show their strong convergence to the element of minimum norm of S, and we provide the convergence rate of the values. In section 3, we apply these results to the particular case \varepsilon(t)=\frac{1}{t^{p}}. Section 4 considers the extension of these results to the nonsmooth case. Section 5 contains numerical illustrations. We conclude in section 6 with some perspectives and open questions.

2 Convergence analysis for general \varepsilon(t)

We are going to analyze, via Lyapunov analysis, the convergence properties as t\to+\infty of the solution trajectories of the inertial dynamic (TRIGS), which we recall below:

\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0. \qquad (12)

Throughout the paper, we assume that t_{0} is the origin of time and that \delta is a positive parameter. For each t\geq t_{0}, let us introduce the function \varphi_{t}:{\mathcal{H}}\to{\mathbb{R}} defined by

\varphi_{t}(x):=f(x)+\dfrac{\varepsilon(t)}{2}\|x\|^{2}, \qquad (13)

and set

x_{\varepsilon(t)}:=\operatorname{argmin}_{{\mathcal{H}}}\varphi_{t},

which is the unique minimizer of the strongly convex function \varphi_{t}. The first order optimality condition gives

\nabla f(x_{\varepsilon(t)})+\varepsilon(t)x_{\varepsilon(t)}=0. \qquad (14)

The following properties are immediate consequences of the classical properties of the Tikhonov regularization:

\forall t\geq t_{0}\;\;\|x_{\varepsilon(t)}\|\leq\|x^{*}\|; \qquad (15)

\lim_{t\rightarrow+\infty}\|x_{\varepsilon(t)}-x^{*}\|=0,\quad\hbox{where }x^{*}=\operatorname{proj}_{\operatorname{argmin}f}0. \qquad (16)
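To fix ideas, here is a small worked example (ours, not from the text). Take \mathcal{H}=\mathbb{R}^{2} and f(x_{1},x_{2})=\frac{1}{2}(x_{1}+x_{2}-2)^{2}, so that S=\operatorname{argmin}f is the line x_{1}+x_{2}=2 and x^{*}=(1,1). The optimality condition (14) reads

(x_{1}+x_{2}-2)\begin{pmatrix}1\\ 1\end{pmatrix}+\varepsilon(t)\begin{pmatrix}x_{1}\\ x_{2}\end{pmatrix}=0,

whose unique (symmetric) solution is x_{\varepsilon(t)}=\left(\frac{2}{2+\varepsilon(t)},\frac{2}{2+\varepsilon(t)}\right). One checks directly that \|x_{\varepsilon(t)}\|=\frac{2\sqrt{2}}{2+\varepsilon(t)}\leq\sqrt{2}=\|x^{*}\| and that x_{\varepsilon(t)}\to(1,1) as \varepsilon(t)\to 0, in accordance with (15) and (16).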

2.1 Preparatory results for Lyapunov analysis

Let us introduce the real-valued function t\in[t_{0},+\infty[\mapsto E(t)\in{\mathbb{R}}^{+} that plays a key role in our Lyapunov analysis. It is defined by

E(t):=\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\|v(t)\|^{2}, \qquad (17)

where \varphi_{t} has been defined in (13), and

v(t)=\tau(t)\left(x(t)-x_{\varepsilon(t)}\right)+\dot{x}(t). \qquad (18)

The time dependent parameter \tau(\cdot) will be adjusted during the proof. We will show that, under a judicious choice of the parameters, t\mapsto E(t) is a decreasing function. Moreover, we will estimate the rate of convergence of E(t) towards zero. This provides the rates of convergence of the values and the trajectories, as the following lemma shows.

Lemma 1

Let x(\cdot):[t_{0},+\infty[\to{\mathcal{H}} be a solution trajectory of the damped inertial dynamic (TRIGS), and let t\in[t_{0},+\infty[\mapsto E(t)\in{\mathbb{R}}^{+} be the energy function defined in (17). Then, the following inequalities are satisfied: for any t\geq t_{0},

f(x(t))-\min_{\mathcal{H}}f\leq E(t)+\dfrac{\varepsilon(t)}{2}\|x^{*}\|^{2}; \qquad (19)

\|x(t)-x_{\varepsilon(t)}\|^{2}\leq\frac{2E(t)}{\varepsilon(t)}. \qquad (20)

Therefore, x(t) converges strongly to x^{*} as soon as \lim_{t\to+\infty}\frac{E(t)}{\varepsilon(t)}=0.

Proof

i) According to the definition of \varphi_{t}, we have

\begin{array}{lll}f(x(t))-\min_{\mathcal{H}}f&=&\varphi_{t}(x(t))-\varphi_{t}(x^{*})+\dfrac{\varepsilon(t)}{2}\left(\|x^{*}\|^{2}-\|x(t)\|^{2}\right)\\ &=&\left[\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right]+\left[\underbrace{\varphi_{t}(x_{\varepsilon(t)})-\varphi_{t}(x^{*})}_{\leq 0}\right]+\dfrac{\varepsilon(t)}{2}\left(\|x^{*}\|^{2}-\|x(t)\|^{2}\right)\\ &\leq&\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})+\dfrac{\varepsilon(t)}{2}\|x^{*}\|^{2}.\end{array}

By definition of E(t) we have

\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\leq E(t) \qquad (21)

which, combined with the above inequality, gives (19).

ii) By the strong convexity of \varphi_{t}, and x_{\varepsilon(t)}:=\operatorname{argmin}_{{\mathcal{H}}}\varphi_{t}, we have

\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\geq\frac{\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}.

By combining the inequality above with (21), we get

E(t)\geq\frac{\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2},

which gives (20).
By assumption (\mathcal{H}_{0}), we have \lim_{t\to\infty}\varepsilon(t)=0. According to (16), we deduce that x_{\varepsilon(t)} converges strongly to x^{*}. The conclusion is a direct consequence of inequality (20).

To estimate E(t), we will show that it satisfies a first order differential inequality of the form

\dot{E}(t)+\mu(t)E(t)\leq\rho(t)\|x^{*}\|^{2} \qquad (22)

where \rho(t) and \mu(t) are positive functions that will be made precise in the proof. So the first step of the proof is to compute \frac{d}{dt}E(t). This computation involves the two terms \frac{d}{dt}\left(\varphi_{t}(x_{\varepsilon(t)})\right) and \frac{d}{dt}\left(x_{\varepsilon(t)}\right). They are evaluated in the lemma below.

Lemma 2

For each t\geq t_{0}, we have

i) \dfrac{d}{dt}\left(\varphi_{t}(x_{\varepsilon(t)})\right)=\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2};

ii) \left\|\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right)\right\|^{2}\leq-\dfrac{\dot{\varepsilon}(t)}{\varepsilon(t)}\left\langle\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right),x_{\varepsilon(t)}\right\rangle.

Therefore, \left\|\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right)\right\|\leq-\dfrac{\dot{\varepsilon}(t)}{\varepsilon(t)}\|x_{\varepsilon(t)}\|.

Proof

i) We have \varphi_{t}(x_{\varepsilon(t)})=\inf_{y\in{\mathcal{H}}}\left\{f(y)+\frac{\varepsilon(t)}{2}\|y-0\|^{2}\right\}=f_{\frac{1}{\varepsilon(t)}}(0).
Since \frac{d}{d\lambda}f_{\lambda}(x)=-\frac{1}{2}\|\nabla f_{\lambda}(x)\|^{2} (see (Att-book, Lemma 3.27), (Att2, Corollary 6.2)), we have

\dfrac{d}{dt}f_{\lambda(t)}(x)=-\frac{\dot{\lambda}(t)}{2}\|\nabla f_{\lambda(t)}(x)\|^{2}.

Therefore,

\dfrac{d}{dt}\varphi_{t}(x_{\varepsilon(t)})=\dfrac{d}{dt}\left(f_{\frac{1}{\varepsilon(t)}}(0)\right)=\frac{1}{2}\dfrac{\dot{\varepsilon}(t)}{\varepsilon^{2}(t)}\|\nabla f_{\frac{1}{\varepsilon(t)}}(0)\|^{2}. \qquad (23)

On the other hand, we have

\nabla\varphi_{t}(x_{\varepsilon(t)})=0\Longleftrightarrow\nabla f(x_{\varepsilon(t)})+\varepsilon(t)x_{\varepsilon(t)}=0\Longleftrightarrow x_{\varepsilon(t)}=J^{f}_{\frac{1}{\varepsilon(t)}}(0).

Since \nabla f_{\frac{1}{\varepsilon(t)}}(0)=\varepsilon(t)\left(0-J^{f}_{\frac{1}{\varepsilon(t)}}(0)\right), we get \nabla f_{\frac{1}{\varepsilon(t)}}(0)=-\varepsilon(t)x_{\varepsilon(t)}. This combined with (23) gives

\dfrac{d}{dt}\varphi_{t}(x_{\varepsilon(t)})=\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}.

ii) We have

-\varepsilon(t)x_{\varepsilon(t)}=\nabla f(x_{\varepsilon(t)})\quad\hbox{and}\quad-\varepsilon(t+h)x_{\varepsilon(t+h)}=\nabla f(x_{\varepsilon(t+h)}).

According to the monotonicity of \nabla f, we have

\langle\varepsilon(t)x_{\varepsilon(t)}-\varepsilon(t+h)x_{\varepsilon(t+h)},x_{\varepsilon(t+h)}-x_{\varepsilon(t)}\rangle\geq 0,

which implies

-\varepsilon(t)\|x_{\varepsilon(t+h)}-x_{\varepsilon(t)}\|^{2}+\left(\varepsilon(t)-\varepsilon(t+h)\right)\langle x_{\varepsilon(t+h)},x_{\varepsilon(t+h)}-x_{\varepsilon(t)}\rangle\geq 0.

After division by h^{2}, we obtain

\dfrac{\varepsilon(t)-\varepsilon(t+h)}{h}\left\langle x_{\varepsilon(t+h)},\dfrac{x_{\varepsilon(t+h)}-x_{\varepsilon(t)}}{h}\right\rangle\geq\varepsilon(t)\left\|\dfrac{x_{\varepsilon(t+h)}-x_{\varepsilon(t)}}{h}\right\|^{2}.

By letting h\rightarrow 0, we get

-\dot{\varepsilon}(t)\left\langle x_{\varepsilon(t)},\dfrac{d}{dt}x_{\varepsilon(t)}\right\rangle\geq\varepsilon(t)\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2},

which completes the proof.

2.2 Main result

Given a general parameter \varepsilon(\cdot), let us proceed with the Lyapunov analysis.

Theorem 2.1

Suppose that f:{\mathcal{H}}\to\mathbb{R} is a convex function of class {\mathcal{C}}^{1}. Let x(\cdot):[t_{0},+\infty[\to{\mathcal{H}} be a solution trajectory of the system (TRIGS) with \delta>0:

\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0.

Let us assume that there exist a,c>1 and t_{1}\geq t_{0} such that, for all t\geq t_{1},

(\mathcal{H}_{1})\qquad\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)\leq\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right),

where \lambda is such that

\frac{\delta}{2}<\lambda<\frac{a}{a+1}\delta\;\mbox{ for }0<\delta\leq 2-\frac{1}{c},\quad\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)<\lambda<\frac{a}{a+1}\delta\;\mbox{ for }\delta>2-\frac{1}{c}.

Then, the following property holds:

E(t)\leq\dfrac{\|x^{*}\|^{2}}{2}\dfrac{\displaystyle\int_{t_{1}}^{t}\left[\left((\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(s)}{\varepsilon^{\frac{3}{2}}(s)}-\dot{\varepsilon}(s)\right)\gamma(s)\right]ds}{\gamma(t)}+\dfrac{\gamma(t_{1})E(t_{1})}{\gamma(t)} \qquad (24)

where \gamma(t)=\exp\left(\int_{t_{1}}^{t}\mu(s)ds\right) and \mu(t)=-\dfrac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}.

Proof

According to the classical chain rule and Lemma 2 i), differentiation of the function E(\cdot) gives

\dot{E}(t)=\langle\nabla\varphi_{t}(x(t)),\dot{x}(t)\rangle+\frac{1}{2}\dot{\varepsilon}(t)\|x(t)\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}+\langle\dot{v}(t),v(t)\rangle. \qquad (25)

Using the constitutive equation (12), we have

\begin{array}{lll}\dot{v}(t)&=&\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\tau(t)\dot{x}(t)-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}+\ddot{x}(t)\\ &=&\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\tau(t)\dot{x}(t)-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}-\delta\sqrt{\varepsilon(t)}\dot{x}(t)-\nabla\varphi_{t}(x(t))\\ &=&\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\dot{x}(t)-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}-\nabla\varphi_{t}(x(t)).\end{array}

Therefore,

\begin{array}{lll}\langle\dot{v}(t),v(t)\rangle&=&\left\langle\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\dot{x}(t),\ \tau(t)(x(t)-x_{\varepsilon(t)})+\dot{x}(t)\right\rangle\\ &&+\left\langle-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}-\nabla\varphi_{t}(x(t)),\ \tau(t)(x(t)-x_{\varepsilon(t)})+\dot{x}(t)\right\rangle\\ &=&\dot{\tau}(t)\tau(t)\|x(t)-x_{\varepsilon(t)}\|^{2}+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle\\ &&+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\|\dot{x}(t)\|^{2}-\tau^{2}(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},x(t)-x_{\varepsilon(t)}\right\rangle-\tau(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},\dot{x}(t)\right\rangle\\ &&\underbrace{-\tau(t)\langle\nabla\varphi_{t}(x(t)),x(t)-x_{\varepsilon(t)}\rangle}_{=A_{0}}-\langle\nabla\varphi_{t}(x(t)),\dot{x}(t)\rangle.\end{array} \qquad (26)

Since \varphi_{t} is strongly convex, we have

\varphi_{t}(x_{\varepsilon(t)})-\varphi_{t}(x(t))\geq\left\langle\nabla\varphi_{t}(x(t)),x_{\varepsilon(t)}-x(t)\right\rangle+\frac{\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}.

Using the above inequality, we get

A_{0}=-\tau(t)\langle\nabla\varphi_{t}(x(t)),x(t)-x_{\varepsilon(t)}\rangle\leq-\tau(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)-\dfrac{\tau(t)\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}. \qquad (27)

Moreover, for any a>1, we have

-\tau(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},\dot{x}(t)\right\rangle\leq\dfrac{\tau(t)}{2a}\|\dot{x}(t)\|^{2}+\dfrac{a\tau(t)}{2}\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}

and, for any b>0,

-\tau^{2}(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},x(t)-x_{\varepsilon(t)}\right\rangle\leq\dfrac{b\tau(t)}{2}\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}+\dfrac{\tau^{3}(t)}{2b}\|x(t)-x_{\varepsilon(t)}\|^{2}.

By combining the two above inequalities with (25), (26) and (27), and after reduction we obtain

\begin{array}{lll}\dot{E}(t)&\leq&-\tau(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\dot{\varepsilon}(t)\|x(t)\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}\\ &&+\left[\dot{\tau}(t)\tau(t)-\dfrac{\tau(t)\varepsilon(t)}{2}\right]\|x(t)-x_{\varepsilon(t)}\|^{2}\\ &&+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\|\dot{x}(t)\|^{2}\\ &&-\tau^{2}(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},x(t)-x_{\varepsilon(t)}\right\rangle-\tau(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},\dot{x}(t)\right\rangle\\ &\leq&-\tau(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\left[(b+a)\tau(t)\right]\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}\\ &&+\frac{\dot{\varepsilon}(t)}{2}\|x(t)\|^{2}-\dfrac{\dot{\varepsilon}(t)}{2}\|x_{\varepsilon(t)}\|^{2}+\left(\left(1+\frac{1}{2a}\right)\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\|\dot{x}(t)\|^{2}\\ &&+\left[\dot{\tau}(t)\tau(t)-\dfrac{\tau(t)\varepsilon(t)}{2}+\dfrac{\tau^{3}(t)}{2b}\right]\|x(t)-x_{\varepsilon(t)}\|^{2}\\ &&+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle.\end{array} \qquad (28)

On the other hand, for a positive function \mu(t), we have

\begin{array}{lll}\mu(t)E(t)&=&\mu(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\dfrac{\mu(t)}{2}\|v(t)\|^{2}\\ &=&\mu(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\dfrac{\mu(t)\tau^{2}(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}+\dfrac{\mu(t)}{2}\|\dot{x}(t)\|^{2}\\ &&+\mu(t)\tau(t)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle.\end{array} \qquad (29)

By adding (28) and (29), we get

\begin{array}{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\left(\mu(t)-\tau(t)\right)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\left[(b+a)\tau(t)\right]\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}\\ &&-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}+\frac{\dot{\varepsilon}(t)}{2}\|x(t)\|^{2}+\left(\left(1+\frac{1}{2a}\right)\tau(t)-\delta\sqrt{\varepsilon(t)}+\frac{\mu(t)}{2}\right)\|\dot{x}(t)\|^{2}\\ &&+\left[\dot{\tau}(t)\tau(t)-\dfrac{\tau(t)\varepsilon(t)}{2}+\dfrac{\tau^{3}(t)}{2b}+\dfrac{\mu(t)\tau^{2}(t)}{2}\right]\|x(t)-x_{\varepsilon(t)}\|^{2}\\ &&+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}+\mu(t)\tau(t)\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle.\end{array} \qquad (30)

As we do not know a priori the sign of \langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle, we take the coefficient in front of this term equal to zero, which gives

\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}+\mu(t)\tau(t)=0. \qquad (31)

Let us make the following choice of the time dependent parameter \tau(t) introduced in (18) (indeed, it is a key ingredient of our Lyapunov analysis):

\tau(t)=\lambda\sqrt{\varepsilon(t)},

where \lambda is a positive parameter to be fixed. Then, the relation (31) can be equivalently written

\mu(t)=-\dfrac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}.
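For the reader's convenience, here is the short computation behind this equivalence: with \tau(t)=\lambda\sqrt{\varepsilon(t)} we have \dot{\tau}(t)=\frac{\lambda\dot{\varepsilon}(t)}{2\sqrt{\varepsilon(t)}}, so dividing (31) by \tau(t)>0 gives

\mu(t)=-\frac{\dot{\tau}(t)}{\tau(t)}-\tau(t)+\delta\sqrt{\varepsilon(t)}=-\frac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}.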

According to this choice for \mu(t) and \tau(t), and neglecting the term \frac{\dot{\varepsilon}(t)}{2}\|x(t)\|^{2}, which is less than or equal to zero, the inequality (30) becomes

\begin{array}{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\dfrac{1}{2\varepsilon(t)}\left(-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)\right)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)\\ &&+\dfrac{1}{2}\left[(b+a)\lambda\sqrt{\varepsilon(t)}\right]\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}\\ &&+\dfrac{1}{4\varepsilon(t)}\left[2\left(\left(1+\frac{1}{a}\right)\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)\right]\|\dot{x}(t)\|^{2}\\ &&+\dfrac{\lambda}{4}\left[\lambda\dot{\varepsilon}(t)+2\left(\delta\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)+2\frac{\lambda^{2}}{b}\varepsilon^{\frac{3}{2}}(t)\right]\|x(t)-x_{\varepsilon(t)}\|^{2}.\end{array} \qquad (32)

According to item ii) of Lemma 2 and inequality (15),

\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}\leq\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{2}(t)}\|x_{\varepsilon(t)}\|^{2}\leq\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{2}(t)}\|x^{*}\|^{2}.

Using again inequality (15), and the fact that -\dot{\varepsilon}(t)\geq 0, we have

-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}\leq-\frac{1}{2}\dot{\varepsilon}(t)\|x^{*}\|^{2}.

By inserting the two inequalities above in (32), we obtain

\begin{array}{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\dfrac{1}{2\varepsilon(t)}\left(-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)\right)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)\\ &&+\dfrac{1}{2}\left[(b+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\|x^{*}\|^{2}\\ &&+\dfrac{1}{4\varepsilon(t)}\left[2\left(\left(1+\frac{1}{a}\right)\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)\right]\|\dot{x}(t)\|^{2}\\ &&+\dfrac{\lambda}{4}\left[\lambda\dot{\varepsilon}(t)+2\left(\delta\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)+2\frac{\lambda^{2}}{b}\varepsilon^{\frac{3}{2}}(t)\right]\|x(t)-x_{\varepsilon(t)}\|^{2}.\end{array} \qquad (33)

By taking b=c\lambda, with c>1, we get

\begin{array}{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\dfrac{1}{2\varepsilon(t)}\underbrace{\left(-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)\right)}_{=A}\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)\\ &&+\dfrac{1}{2}\left[(\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\|x^{*}\|^{2}\\ &&+\dfrac{1}{4\varepsilon(t)}\underbrace{\left[2\left(\left(1+\frac{1}{a}\right)\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)\right]}_{=B}\|\dot{x}(t)\|^{2}\\ &&+\dfrac{\lambda}{4}\underbrace{\left[\lambda\dot{\varepsilon}(t)+2\left(\left(\delta+\frac{1}{c}\right)\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)\right]}_{=C}\|x(t)-x_{\varepsilon(t)}\|^{2}.\end{array} \qquad (34)

We are looking for sufficient conditions on the parameters \lambda and \varepsilon(t) which make A, B, and C less than or equal to zero. It is here that the hypothesis (\mathcal{H}_{1}) formulated in the statement of Theorem 2.1 intervenes. It is recalled below for the convenience of the reader.

(\mathcal{H}_{1})  There exist a,c>1 and t_{1}\geq t_{0} such that, for all t\geq t_{1},

\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)\leq\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right),

where \lambda is such that \frac{\delta}{2}<\lambda<\frac{a}{a+1}\delta for 0<\delta\leq 2-\frac{1}{c}, and \frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)<\lambda<\frac{a}{a+1}\delta for \delta>2-\frac{1}{c}.


Clearly, condition (\mathcal{H}_{1}) is equivalent to

\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)\leq 2\lambda-\delta\quad\mbox{and}\quad\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)\leq\delta-\frac{a+1}{a}\lambda.

Under condition (\mathcal{H}_{1}) we immediately obtain that A and B are less than or equal to zero:

\bullet\ A=-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)=2\varepsilon^{\frac{3}{2}}(t)\left[\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)+\delta-2\lambda\right]\leq 0;

\bullet\ B=2\left(\left(1+\frac{1}{a}\right)\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)=2\varepsilon^{\frac{3}{2}}(t)\left[\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)+\dfrac{a+1}{a}\lambda-\delta\right]\leq 0.

\bullet\ Let us now examine C. We have

\begin{array}{lll}C&=&\lambda\dot{\varepsilon}(t)+2\left(\left(\delta+\frac{1}{c}\right)\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)\\ &=&2\varepsilon^{\frac{3}{2}}(t)\left[-\lambda\underbrace{\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)}_{\geq 0}+\left(\left(\delta+\frac{1}{c}\right)\lambda-\lambda^{2}-1\right)\right].\end{array}

When \delta\leq 2-\frac{1}{c}, we have (\delta+\frac{1}{c})\lambda-\lambda^{2}-1\leq 2\lambda-\lambda^{2}-1=-(\lambda-1)^{2}\leq 0.
When \delta>2-\frac{1}{c}, we have (\delta+\frac{1}{c})\lambda-\lambda^{2}-1\leq 0, because \lambda\geq\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right) is at least the largest root of the quadratic \lambda\mapsto\lambda^{2}-(\delta+\frac{1}{c})\lambda+1.
This implies that C\leq 0.

According to (34), under condition (\mathcal{H}_{1}) we conclude that

\dot{E}(t)+\mu(t)E(t)\leq\frac{1}{2}\left[(\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\|x^{*}\|^{2}. \qquad (35)

By multiplying the inequality above by \gamma(t)=\exp\left(\int_{t_{1}}^{t}\mu(s)ds\right), we obtain

\dfrac{d}{dt}\left(\gamma(t)E(t)\right)\leq\dfrac{\|x^{*}\|^{2}}{2}\left[(\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\gamma(t). \qquad (36)

By integrating (36) on [t_{1},t], we get

E(t)\leq\dfrac{\|x^{*}\|^{2}}{2}\dfrac{\displaystyle\int_{t_{1}}^{t}\left[\left((\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(s)}{\varepsilon^{\frac{3}{2}}(s)}-\dot{\varepsilon}(s)\right)\gamma(s)\right]ds}{\gamma(t)}+\dfrac{\gamma(t_{1})E(t_{1})}{\gamma(t)}.

This completes the proof of the Lyapunov analysis.

Remark 1

Given \delta>0, the condition (\mathcal{H}_{1}) gives the admissible values of the parameters a>1, c>1 and \lambda>0 which enter into the convergence rates obtained in Theorem 2.1. Let us verify that the inequalities that define the values of these parameters are consistent. We consider successively the two cases \delta<2, then \delta\geq 2.

a) When \delta<2, we have \delta<2-\frac{1}{c} for c sufficiently large. Because a>1, we have \frac{1}{2}<\frac{a}{a+1}, and hence the interval ]\frac{1}{2}\delta,\frac{a}{a+1}\delta[ is nonempty. Therefore, in this case the conditions are consistent.

b) Suppose now that \delta\geq 2. Then \delta>2-\frac{1}{c} for all c>0, and we can argue with c arbitrarily large. Let us verify that we can find a and c such that

\frac{\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}}{2}<\frac{a}{a+1}\delta, \qquad (37)

and hence a value of \lambda belonging to the corresponding interval. By letting c\to+\infty and a\to+\infty in the above inequality we obtain

\delta+\sqrt{\delta^{2}-4}<2\delta,

which is equivalent to \sqrt{\delta^{2}-4}<\delta, and hence is satisfied. By a continuity argument, we obtain that the inequality (37) is satisfied by taking a and c sufficiently large.
Note that, since we are interested in asymptotic results, the important point is the existence of parameters for which the Lyapunov analysis is valid. If one is interested in complexity results, then the precise values of these parameters matter.
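For instance (a numerical consistency check, with values of our own choosing): take \delta=1, c=2, a=3 and \lambda=0.7. Then \delta=1\leq 2-\frac{1}{c}=1.5 and \frac{\delta}{2}=0.5<\lambda=0.7<\frac{a}{a+1}\delta=0.75, so that

\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right)=\min\left(0.4\;,\;1-\frac{4\times 0.7}{3}\right)\approx 0.0667>0,

and (\mathcal{H}_{1}) holds for every nonincreasing \varepsilon(\cdot) such that \frac{d}{dt}\left(\frac{1}{\sqrt{\varepsilon(t)}}\right) eventually falls below this threshold, e.g. \varepsilon(t)=1/t^{p} with 0<p<2.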

Remark 2

The above argument shows that the controlled decay property {\rm(CD)}_{\lambda} used in AL corresponds to the limiting case c\to+\infty and a\to+\infty in our condition (\mathcal{H}_{1}).

Remark 3

As in AL, our Lyapunov analysis is valid for an arbitrary choice of the parameter \delta. It would be interesting to know the best choice of \delta.

3 Particular cases

Let us study the case \varepsilon(t)=\frac{1}{t^{p}}, and discuss, according to the value of the parameter 0<p<2, the convergence rate of the values and the convergence rate to zero of \|x(t)-x_{\varepsilon(t)}\|. The following results were stated in Theorem 1.3, in the introduction, as model results. We reproduce them here for the convenience of the reader. The point is simply to apply the general Theorem 2.1 to this particular situation, and to show that the different quantities involved in the convergence results can be computed explicitly.

Theorem 3.1

Take \varepsilon(t)=\frac{1}{t^{p}},  0<p<2. Let x:[t_{0},+\infty[\to\mathcal{H} be a solution trajectory of

\ddot{x}(t)+\frac{\delta}{t^{\frac{p}{2}}}\dot{x}(t)+\nabla f\left(x(t)\right)+\frac{1}{t^{p}}x(t)=0.

Then, we have convergence of the values, and strong convergence to the minimum norm solution, with the following rates:

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right)\mbox{ as }t\to+\infty; \qquad (38)

\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\frac{1}{t^{\frac{2-p}{2}}}\right)\mbox{ as }t\to+\infty. \qquad (39)
Proof

a) Convergence rate of the values: With the notations of Theorem 2.1, we have

\mu(t)=-\dfrac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}=\dfrac{p}{2t}+\dfrac{\delta-\lambda}{t^{\frac{p}{2}}}.

So

\begin{array}{lll}\gamma(t)&=&\exp\left(\displaystyle\int_{t_{1}}^{t}\mu(s)ds\right)=\left(\dfrac{t}{t_{1}}\right)^{\frac{p}{2}}\exp\left[\frac{2(\delta-\lambda)}{2-p}\left(t^{\frac{2-p}{2}}-t_{1}^{\frac{2-p}{2}}\right)\right]\\ &=&C_{1}t^{\frac{p}{2}}\exp\left[\frac{2(\delta-\lambda)}{2-p}t^{\frac{2-p}{2}}\right]\end{array} \qquad (40)

where C_{1}=\left(t_{1}^{\frac{p}{2}}\exp\left[\frac{2(\delta-\lambda)}{2-p}t_{1}^{\frac{2-p}{2}}\right]\right)^{-1}. Let us choose the parameters a,c>1, \lambda>0 such that

\frac{\delta}{2}<\lambda<\frac{a}{a+1}\delta for 0<\delta\leq 2-\frac{1}{c}, and \frac{\delta}{2}<\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)<\lambda<\frac{a}{a+1}\delta for \delta>2-\frac{1}{c}.

Notice that, in these two cases, we have \frac{1}{2}\delta<\lambda<\frac{a}{a+1}\delta. Therefore, \min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right)>0. On the other hand, since p<2, we have \lim_{t\to+\infty}t^{\frac{p-2}{2}}=0. So we have, for t\geq t_{1} large enough,

\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)=\frac{p}{2}t^{\frac{p-2}{2}}\leq\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right).

As a consequence, the condition (\mathcal{H}_{1}) is satisfied.

According to (24), we have E(t)\leq E_{1}(t)\|x^{*}\|^{2}+E_{2}(t), where

E_{1}(t)=\dfrac{1}{2\gamma(t)}\displaystyle\int_{t_{1}}^{t}\left[\left((\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(s)}{\varepsilon^{\frac{3}{2}}(s)}-\dot{\varepsilon}(s)\right)\gamma(s)\right]ds \qquad (41)

and

E_{2}(t)=\dfrac{\gamma(t_{1})E(t_{1})}{\gamma(t)}. \qquad (42)

According to (40), we have

E_{2}(t)\leq Ct^{-\frac{p}{2}}\exp\left[-\frac{2(\delta-\lambda)}{2-p}t^{\frac{2-p}{2}}\right].

Since 0<p<2 and \delta>\lambda, we have that E_{2}(t) tends to zero at an exponential rate as t\to+\infty.
Thus, we only have to focus on the asymptotic behavior of E_{1}(t). Let us simplify the formula by setting \lambda_{0}:=(\lambda c+a)\lambda and \delta_{0}:=\frac{2(\delta-\lambda)}{2-p} in (24). Replacing \varepsilon(t) and \gamma(t) by their values in (41) gives

E_{1}(t)=\dfrac{p}{2t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\displaystyle\int_{t_{1}}^{t}\left(\dfrac{\lambda_{0}p}{s^{2}}+\frac{1}{s^{\frac{p+2}{2}}}\right)\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)ds.

To compute the above integral, we notice that

\dfrac{d}{ds}\left(\dfrac{1}{\rho s}\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)\right)=\left(-\dfrac{1}{\rho s^{2}}+\dfrac{\delta_{0}(2-p)}{2\rho s^{\frac{p+2}{2}}}\right)\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right).

Then, note that, when we set \rho>0 such that \rho<\min\left(1\;,\;\frac{1}{a+1}\delta\right), we obtain

\begin{array}{lll}\dfrac{\lambda_{0}p}{s^{2}}+\dfrac{1}{s^{\frac{p+2}{2}}}\leq-\dfrac{1}{\rho s^{2}}+\dfrac{\delta_{0}(2-p)}{2\rho s^{\frac{p+2}{2}}}&\Longleftrightarrow&\dfrac{\lambda_{0}p+\frac{1}{\rho}}{s^{2}}\leq\left(\dfrac{\delta_{0}(2-p)}{2\rho}-1\right)\dfrac{1}{s^{\frac{p+2}{2}}}=\dfrac{\frac{\delta-\lambda}{\rho}-1}{s^{\frac{p+2}{2}}}\\ &\Longleftrightarrow&\dfrac{\lambda_{0}p+\frac{1}{\rho}}{s^{\frac{2-p}{2}}}\leq\dfrac{\delta-\lambda}{\rho}-1.\end{array}

Let us verify that the last inequality above is satisfied for s large enough. First, since 0<p<2, we have \frac{2-p}{2}>0, and hence \lim_{s\to+\infty}\frac{1}{s^{\frac{2-p}{2}}}=0. On the other hand, according to (\mathcal{H}_{1}) we have \lambda<\frac{a}{a+1}\delta. This property combined with the choice of \rho implies

\delta-\rho>\delta-\frac{1}{a+1}\delta=\frac{a}{a+1}\delta>\lambda.

Therefore \dfrac{\delta-\lambda}{\rho}-1>0. So, for t_{1} large enough,

\begin{array}{lll}E_{1}(t)&\leq&\dfrac{p}{2t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\displaystyle\int_{t_{1}}^{t}\left(-\dfrac{1}{\rho s^{2}}+\dfrac{\delta_{0}(2-p)}{2\rho s^{\frac{p+2}{2}}}\right)\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)ds\\ &=&\dfrac{p}{2t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\displaystyle\int_{t_{1}}^{t}\dfrac{d}{ds}\left(\dfrac{1}{\rho s}\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)\right)ds\\ &=&\dfrac{p}{2\rho t^{\frac{p+2}{2}}}-\dfrac{p}{2\rho t_{1}}\dfrac{\exp\left(\delta_{0}t_{1}^{\frac{2-p}{2}}\right)}{t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\leq\dfrac{p}{2\rho t^{\frac{p+2}{2}}}.\end{array}

Since E_{2}(t) decays exponentially to zero, we deduce that there exists a positive constant C such that, for t large enough,

E(t)\leq\dfrac{C}{t^{\frac{p+2}{2}}}.

According to Lemma 1, we get

f(x(t))-\min_{\mathcal{H}}f\leq C\left(\dfrac{1}{t^{\frac{p+2}{2}}}+\frac{1}{t^{p}}\right).

Since 0<p<2, we have p<\frac{p+2}{2}. We conclude that

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right)\mbox{ as }t\to+\infty.

b) Convergence rate to zero of \|x(t)-x_{\varepsilon(t)}\| (which entails the strong convergence of x(t) to x^{*}). According to Lemma 1, we have

\|x(t)-x_{\varepsilon(t)}\|^{2}\leq\frac{2E(t)}{\varepsilon(t)}, \qquad (43)

and since, for t large enough, E(t)\leq\frac{C}{t^{\frac{p+2}{2}}}, we obtain

\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\dfrac{1}{t^{\frac{2-p}{2}}}\right),

which completes the proof.

Remark 4

The convergence rate of the values \mathcal{O}\left(\frac{1}{t^{p}}\right) obtained in Theorem 3.1 notably improves the result obtained in AL, where the convergence rate was of order \mathcal{O}\left(\frac{1}{t^{\frac{3p-2}{2}}}\right). Indeed, for p<2 we have p>\frac{3p-2}{2}. In addition, we have obtained that, for any 0<p<2 and any trajectory of (TRIGS), the trajectory converges strongly to the minimum norm solution as t tends towards +\infty. In AL it was only shown that \liminf_{t\to+\infty}\|x(t)-x^{*}\|=0.

Remark 5

A close look at the proof of Theorem 3.1 shows that the convergence rate of the values is still valid in the case p=2. Precisely, by taking \varepsilon(t)=\frac{c}{t^{2}} with c sufficiently large (so that \frac{d}{dt}\left(\frac{1}{\sqrt{\varepsilon(t)}}\right)=\frac{1}{\sqrt{c}} is small enough), we have that the condition (\mathcal{H}_{1}) is satisfied, and hence

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\frac{1}{t^{2}}\right)\mbox{ as }t\to+\infty.
Remark 6

As a key ingredient of our proof of the strong convergence of the trajectories of (TRIGS) to the element of minimum norm of S, we use the function h(t):=\frac{1}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}, and show that \lim_{t\to+\infty}h(t)=0. This strategy, which consists in showing that the trajectory is not too far from the viscosity curve \varepsilon\mapsto x_{\varepsilon}, was already present in the approach developed by Attouch and Cominetti in AttCom for the study of similar questions in the case of the steepest descent method.

3.1 Trade-off between the convergence rate of values and trajectories

The following elementary diagram shows the respective evolution, as p varies in ]0,2[, of the convergence rate of the values and of the convergence rate of \|x(t)-x_{\varepsilon(t)}\|^{2}.

[Figure: exponents of the two rates as functions of p\in]0,2[: the value rate f(x(t))-\min_{\mathcal{H}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right) (red) improves as p increases, while the trajectory rate \|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\frac{1}{t^{\frac{2-p}{2}}}\right) (blue) deteriorates; the two curves cross at p=\frac{2}{3}.]

We observe that p=\frac{2}{3} is a good compromise between these two antagonistic properties.
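Indeed, equating the two rate exponents gives the balance point directly:

p=\frac{2-p}{2}\;\Longleftrightarrow\;3p=2\;\Longleftrightarrow\;p=\frac{2}{3},

in which case both rates equal \mathcal{O}\left(1/t^{\frac{2}{3}}\right). Let us state the corresponding result below.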

Corollary 1

Take \varepsilon(t)=\frac{1}{t^{\frac{2}{3}}} and \delta>0. Let x:[t_{0},+\infty[\to\mathcal{H} be a solution trajectory of

\ddot{x}(t)+\frac{\delta}{t^{\frac{1}{3}}}\dot{x}(t)+\nabla f\left(x(t)\right)+\frac{1}{t^{\frac{2}{3}}}x(t)=0.

Then, we have convergence of the values, and strong convergence to the minimum norm solution, with the following rates:

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\frac{1}{t^{\frac{2}{3}}}\right)\mbox{ and }\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\frac{1}{t^{\frac{2}{3}}}\right)\mbox{ as }t\to+\infty.

b) Another interesting case is to take p<2 with p close to 2. In this case, we obtain a convergence rate of the values which is arbitrarily close to that of the accelerated gradient method of Nesterov, with a guarantee of strong convergence towards the minimum norm solution. The case p=2 has been studied extensively in AL. The strong convergence of the trajectories to the minimum norm solution is an open question in the case p=2.

c) Estimating the convergence rate of x(t) to x^{*} relies on getting information about the viscosity trajectory \varepsilon\mapsto x_{\varepsilon}, and how fast x_{\varepsilon} converges to x^{*} as \varepsilon\to 0. This is a difficult problem, because the viscosity trajectory can have infinite length, as Torralba showed in Torralba. His counterexample involves the construction of a convex function from its sub-level sets, and relates to a convex function whose sub-level sets vary greatly. Our analysis deals with a general f, i.e. the worst case. This suggests that, under good geometric properties of f, such as the Kurdyka-Lojasiewicz property, one should be able to obtain better results; see also AttCom, which shows the importance of this kind of question for the coupling of the steepest descent with Tikhonov approximation.

4 Non-smooth case

Let us extend the previous results to the case of a proper, lower semicontinuous, convex function f:\mathcal{H}\to\mathbb{R}\cup\left\{+\infty\right\}. We rely on the basic properties of the Moreau envelope. Recall that, for any \theta>0, f_{\theta}:\mathcal{H}\to\mathbb{R} is defined by

f_{\theta}(x)=\min_{\xi\in\mathcal{H}}\left\{f(\xi)+\frac{1}{2\theta}\|x-\xi\|^{2}\right\},\quad\mbox{for any }x\in\mathcal{H}. (44)

Then f_{\theta} is a convex differentiable function whose gradient is \theta^{-1}-Lipschitz continuous, and which satisfies \min_{\mathcal{H}}f=\min_{\mathcal{H}}f_{\theta} and \operatorname{argmin}_{\mathcal{H}}f_{\theta}=\operatorname{argmin}_{\mathcal{H}}f. Denoting by \operatorname{prox}_{\theta f}(x) the unique point where the minimum in (44) is achieved, let us recall the following classical formulae:

1. f_{\theta}(x)=f(\operatorname{prox}_{\theta f}(x))+\frac{1}{2\theta}\|x-\operatorname{prox}_{\theta f}(x)\|^{2};

2. \nabla f_{\theta}(x)=\frac{1}{\theta}\left(x-\operatorname{prox}_{\theta f}(x)\right).
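To make these formulae concrete, here is a minimal Python sketch, purely for illustration, for the scalar function f(x)=|x|, whose proximal mapping is the classical soft-thresholding operator; it evaluates f_{\theta} via formula 1 and checks formula 2 against a finite difference (the test point and step are arbitrary choices of ours).

```python
import numpy as np

# Minimal sketch (our illustration): Moreau envelope of the scalar function
# f(x) = |x|, whose prox is the classical soft-thresholding operator.
def prox_abs(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def moreau_env(x, theta):
    # Formula 1: f_theta(x) = f(prox_{theta f}(x)) + ||x - prox_{theta f}(x)||^2/(2 theta).
    p = prox_abs(x, theta)
    return np.abs(p) + (x - p) ** 2 / (2.0 * theta)

# Check formula 2, grad f_theta(x) = (x - prox_{theta f}(x))/theta,
# against a centered finite difference.
x, theta, h = 1.3, 0.5, 1e-6
grad = (x - prox_abs(x, theta)) / theta
fd = (moreau_env(x + h, theta) - moreau_env(x - h, theta)) / (2.0 * h)
print(grad, fd)  # both close to 1.0, since |x| > theta here
```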

The interested reader may refer to BC; Bre1 for a comprehensive treatment of the Moreau envelope in a Hilbert setting. Since the set of minimizers is preserved by taking the Moreau envelope, the idea is to replace f by f_{\theta} in the inertial dynamic (TRIGS). Then (TRIGS) applied to f_{\theta} reads

\mbox{(TRIGS)}_{\theta}\quad\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f_{\theta}(x(t))+\varepsilon(t)x(t)=0.

Clearly, since f_{\theta} is continuously differentiable with Lipschitz continuous gradient, the hypotheses (\mathcal{H}_{0}) are satisfied by the above dynamic. By applying Theorem 3.1 to \mbox{(TRIGS)}_{\theta}, we get the following result, which provides the convergence rate of the values and the strong convergence to the minimum norm solution.

Theorem 4.1

Let f:\mathcal{H}\to\mathbb{R}\cup\left\{+\infty\right\} be a convex, lower semicontinuous, proper function. Take \varepsilon(t)=1/t^{p} with 0<p<2, and \theta>0. Let x:[t_{0},+\infty[\to\mathcal{H} be a solution trajectory of \mbox{(TRIGS)}_{\theta}, i.e.

\ddot{x}(t)+\frac{\delta}{t^{p/2}}\dot{x}(t)+\nabla f_{\theta}\left(x(t)\right)+\frac{1}{t^{p}}x(t)=0.

Then, we have the following convergence rates: as t+t\to+\infty

f(\operatorname{prox}_{\theta f}(x(t)))-\min_{\mathcal{H}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right); (45)
\|x(t)-\operatorname{prox}_{\theta f}(x(t))\|^{2}=\mathcal{O}\left(\frac{1}{t^{p}}\right); (46)
\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\frac{1}{t^{\frac{2-p}{2}}}\right). (47)
Proof

By applying Theorem 3.1 to the function f_{\theta}, and since \inf f_{\theta}=\inf f, we get

f_{\theta}(x(t))-\min_{\mathcal{H}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right)\ \mbox{ as }\ t\to+\infty, (48)
\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\frac{1}{t^{\frac{2-p}{2}}}\right)\ \mbox{ as }\ t\to+\infty. (49)

According to the identity f_{\theta}(x)-\min_{\mathcal{H}}f=\left(f(\operatorname{prox}_{\theta f}(x))-\min_{\mathcal{H}}f\right)+\frac{1}{2\theta}\|x-\operatorname{prox}_{\theta f}(x)\|^{2}, we get

f(\operatorname{prox}_{\theta f}(x(t)))-\min_{\mathcal{H}}f\leq f_{\theta}(x(t))-\min_{\mathcal{H}}f=\mathcal{O}\left(\frac{1}{t^{p}}\right),

which gives (45). Since the same identity also yields \frac{1}{2\theta}\|x(t)-\operatorname{prox}_{\theta f}(x(t))\|^{2}\leq f_{\theta}(x(t))-\min_{\mathcal{H}}f, the estimate (46) follows; finally, (47) is exactly (49).
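As a quick illustration of Theorem 4.1, the following minimal Python sketch (all parameter and initial data choices are ours) integrates \mbox{(TRIGS)}_{\theta} for the nonsmooth function f=\|\cdot\|_{1}, for which \operatorname{argmin}f=\{0\} and hence x^{*}=0; the gradient \nabla f_{\theta} is evaluated through formula 2.

```python
import numpy as np
from scipy.integrate import solve_ivp

n, delta, theta, p = 5, 1.0, 0.5, 2.0 / 3.0  # illustrative choices

def prox_l1(x, t):
    # prox of t*||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def grad_moreau(x):
    # Formula 2: grad f_theta(x) = (x - prox_{theta f}(x))/theta.
    return (x - prox_l1(x, theta)) / theta

def rhs(t, z):
    # First-order form of (TRIGS)_theta with eps(t) = t^{-p}; z = (x, x').
    x, v = z[:n], z[n:]
    eps = t ** (-p)
    return np.concatenate([v, -delta * np.sqrt(eps) * v - grad_moreau(x) - eps * x])

z0 = np.concatenate([np.linspace(1.0, 2.0, n), np.zeros(n)])
sol = solve_ivp(rhs, (1.0, 1e4), z0, rtol=1e-8, atol=1e-10)
print(np.linalg.norm(sol.y[:n, -1]))  # small: x(t) approaches x* = 0
```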

Remark 7

The above result suggests that, in the case of a nonsmooth convex function f:\mathcal{H}\to\mathbb{R}\cup\left\{+\infty\right\}, the corresponding proximal algorithms will inherit the convergence properties of the continuous dynamic (TRIGS). When considering convex minimization problems with additive structure \min\{f+g\}, with f smooth and g nonsmooth, it is in general difficult to compute the proximal mapping of f+g. A common device then consists in using a splitting method, writing the minimization problem as the fixed point problem Tx=x where, given \theta>0,

Tx=\operatorname{prox}_{\theta g}\left(x-\theta\nabla f(x)\right).

Under appropriate conditions, T is an averaged nonexpansive operator BC, so the associated iterative method (the proximal gradient method) converges to a fixed point of T, and therefore to a solution of the initial minimization problem. In our context, this naturally leads to studying the inertial system

\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+(I-T)(x(t))+\varepsilon(t)x(t)=0.

Many properties of the Tikhonov approximation remain valid for maximally monotone operators, which leads us to expect good convergence properties for the above system. This is a subject for further research.
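For concreteness, here is a minimal sketch of the operator T and of its fixed-point iteration for a lasso-type problem f(x)=\frac{1}{2}\|Ax-b\|^{2} and g=\lambda\|\cdot\|_{1}; the data A, b and the parameters \lambda, \theta are hypothetical choices of ours, with \theta smaller than 2/L (L the Lipschitz constant of \nabla f) so that T is indeed averaged nonexpansive.

```python
import numpy as np

def prox_l1(x, t):
    # prox of t*||.||_1: componentwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def T(x, A, b, lam, theta):
    # Forward-backward operator: T x = prox_{theta g}(x - theta grad f(x)),
    # with f(x) = 0.5*||Ax - b||^2 and g = lam*||.||_1 (hypothetical choices).
    return prox_l1(x - theta * A.T @ (A @ x - b), theta * lam)

rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((30, 50)), rng.standard_normal(30), 0.1
theta = 1.0 / np.linalg.norm(A, 2) ** 2  # theta < 2/L with L = ||A||^2

x = np.zeros(50)
for _ in range(500):
    x = T(x, A, b, lam, theta)
print(np.linalg.norm(x - T(x, A, b, lam, theta)))  # near 0 at a fixed point
```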

5 Numerical illustration

Let us illustrate our results in the following elementary situation. Take \mathcal{H}=\mathbb{R}^{20} equipped with the classical Euclidean structure, and let f:\mathbb{R}^{20}\to\mathbb{R}^{+} be given by

f(x_{1},\ldots,x_{20}):=\frac{1}{2}\sum_{i=1}^{10}\left(x_{2i-1}+x_{2i}-1\right)^{2}.

The function f is convex, but not strongly convex. In this case, the solution set S is the affine subspace \{x\in\mathbb{R}^{20}:x_{2i-1}+x_{2i}-1=0\mbox{ for all }i=1,\ldots,10\}, and the projection of the origin onto S is x^{*}=(\frac{1}{2},\ldots,\frac{1}{2}).

Figure 1: (TRIGS) with f(x):=\frac{1}{2}\sum_{i=1}^{10}\left(x_{2i-1}+x_{2i}-1\right)^{2}. Convergence rates of the values f(x(t))-f(x^{*}) (top), of \|x(t)-x^{*}\|_{2} (middle), and of \|x(t)-x_{\varepsilon(t)}\|_{2} (bottom)

The numerical experiments described in Figure 1 are in agreement with our theoretical findings. They show the trade-off between the convergence rate of the values f(x(t))-f(x^{*}) and that of the trajectories \|x(t)-x_{\varepsilon(t)}\|_{2}, and that taking p around \frac{2}{3} is a good compromise. We limit our illustration to the case 0<p<1, which is the most interesting one for obtaining a good convergence rate of the trajectories towards the minimum norm solution. We also notice that the Tikhonov regularization term \frac{1}{t^{p}}x(t) in the system (TRIGS) reduces the oscillations. This suggests introducing Hessian-driven damping into these dynamics to further attenuate oscillations; see ACFR, APR, BCL and references therein. This is related to the notion of strong damping for PDEs.
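For readers wishing to reproduce such curves, the following minimal sketch (initial data, horizon and tolerances are our choices) integrates (TRIGS) for this function with \varepsilon(t)=1/t^{2/3} and \delta=1, and measures the distance of x(t) to x^{*}.

```python
import numpy as np
from scipy.integrate import solve_ivp

n, delta, p = 20, 1.0, 2.0 / 3.0

def grad_f(x):
    # grad of f(x) = (1/2) sum_i (x_{2i-1} + x_{2i} - 1)^2: both coordinates
    # of each pair receive the residual of that pair.
    return np.repeat(x[0::2] + x[1::2] - 1.0, 2)

def rhs(t, z):
    # Hamiltonian (first-order) form of (TRIGS); z = (x, x').
    x, v = z[:n], z[n:]
    eps = t ** (-p)
    return np.concatenate([v, -delta * np.sqrt(eps) * v - grad_f(x) - eps * x])

rng = np.random.default_rng(0)
z0 = np.concatenate([rng.standard_normal(n), np.zeros(n)])
sol = solve_ivp(rhs, (1.0, 1e4), z0, rtol=1e-8, atol=1e-10)
x_star = 0.5 * np.ones(n)
print(np.linalg.norm(sol.y[:n, -1] - x_star))  # small: strong convergence to x*
```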


6 Conclusion, perspective

In the general framework of convex optimization in Hilbert spaces, we have introduced a damped inertial dynamic which generates trajectories rapidly converging towards the minimum norm solution. We obtained these results by removing restrictive assumptions concerning the convergence of trajectories made in previous works. This seems to be the first time that these two properties have been obtained for the same inertial dynamic. We have developed an in-depth Lyapunov analysis of the dynamic, which is a valuable tool for establishing corresponding results for algorithms obtained by temporal discretization; this algorithmic study will be the subject of subsequent work. Many interesting questions, such as the introduction of Hessian-driven damping to attenuate oscillations, and the study of the impact of errors and perturbations, merit further study. These results also adapt well to inverse problems, for which the strong convergence of trajectories and obtaining a solution close to a desired state are key properties. It is likely that a parallel approach can be developed for constrained optimization, for multiobjective optimization via the dynamical approach to Pareto optima, and within the framework of potential games. The Lyapunov analysis developed in this paper could also be useful to study the asymptotic stabilization of several classes of PDEs, for example nonlinear damped wave equations.

Appendix A Auxiliary results

A.1 Existence and uniqueness for the Cauchy problem, energy estimates

Let us first show that the Cauchy problem for (TRIGS) is well posed. The proof, which was given in AL, relies on classical arguments combining the Cauchy-Lipschitz theorem with energy estimates. We reproduce it for the convenience of the reader, and add supplementary energy estimates.

Theorem A.1

Let us make the assumptions (\mathcal{H}_{0}) on f and \varepsilon. Then, given (x_{0},v_{0})\in\mathcal{H}\times\mathcal{H}, there exists a unique global classical solution x:[t_{0},+\infty[\to\mathcal{H} of the Cauchy problem

\begin{cases}\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f\left(x(t)\right)+\varepsilon(t)x(t)=0\\ x(t_{0})=x_{0},\ \dot{x}(t_{0})=v_{0}.\end{cases} (50)

In addition, the global energy function t\mapsto W(t) is decreasing, where

W(t):=\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\varepsilon(t)\|x(t)\|^{2},

and we have the energy estimate

t0+ε(t)x˙(t)2𝑑t<+.\int_{t_{0}}^{+\infty}\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}dt<+\infty. (51)
Proof

Consider the Hamiltonian formulation of (50), which gives the first-order system

\begin{cases}\dot{x}(t)-y(t)=0\\ \dot{y}(t)+\delta\sqrt{\varepsilon(t)}y(t)+\nabla f\left(x(t)\right)+\varepsilon(t)x(t)=0\\ x(t_{0})=x_{0},\ y(t_{0})=v_{0}.\end{cases} (52)

According to the hypotheses (\mathcal{H}_{0}), and by applying the Cauchy-Lipschitz theorem in the locally Lipschitz case, we obtain the existence and uniqueness of a local solution of the Cauchy problem (52). Then, in order to pass from a local to a global solution, we use energy estimates. Taking the scalar product of (TRIGS) with \dot{x}(t), we obtain

\frac{d}{dt}\Big(\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\varepsilon(t)\|x(t)\|^{2}\Big)+\delta\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\|x(t)\|^{2}=0. (53)

According to (\mathcal{H}_{0}), the function \varepsilon(\cdot) is nonincreasing, so \dot{\varepsilon}(t)\leq 0; since moreover \delta\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}\geq 0, the identity (53) shows that the energy function t\mapsto W(t) is decreasing. The end of the proof follows a standard argument. Take a maximal solution defined on an interval [t_{0},T[. If T=+\infty, the proof is over. Otherwise, if T is finite, the above energy estimate shows that \|\dot{x}(t)\| remains bounded, and hence so do \|x(t)\| and \|\ddot{x}(t)\| (use (TRIGS)). Therefore, x(t) and \dot{x}(t) have limits as t\to T. Applying the local existence result at T with the initial conditions thus obtained contradicts the maximality of the solution.
It remains to prove the energy estimate (51). Returning to (53), and using that \dot{\varepsilon}(t)\leq 0, we get

\frac{d}{dt}\Big(\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\varepsilon(t)\|x(t)\|^{2}\Big)+\delta\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}\leq 0. (54)

Integrating (54) over [t_{0},t], and using that W is bounded from below by \min_{\mathcal{H}}f, we obtain \delta\int_{t_{0}}^{t}\sqrt{\varepsilon(s)}\|\dot{x}(s)\|^{2}\,ds\leq W(t_{0})-\min_{\mathcal{H}}f for all t\geq t_{0}, which gives (51).
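As a numerical sanity check of the decay of W, one can sample it along a trajectory computed for the toy function of Section 5 (dimension and parameters reduced here, all choices ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

n, delta, p = 4, 1.0, 2.0 / 3.0

f = lambda x: 0.5 * np.sum((x[0::2] + x[1::2] - 1.0) ** 2)
grad_f = lambda x: np.repeat(x[0::2] + x[1::2] - 1.0, 2)

def rhs(t, z):
    # First-order form of (TRIGS) with eps(t) = t^{-p}; z = (x, x').
    x, v = z[:n], z[n:]
    eps = t ** (-p)
    return np.concatenate([v, -delta * np.sqrt(eps) * v - grad_f(x) - eps * x])

ts = np.linspace(1.0, 100.0, 400)
sol = solve_ivp(rhs, (1.0, 100.0), np.concatenate([3.0 * np.ones(n), np.zeros(n)]),
                t_eval=ts, rtol=1e-9, atol=1e-12)
# Sample W(t) = (1/2)||x'(t)||^2 + f(x(t)) + (1/2) eps(t) ||x(t)||^2.
W = [0.5 * np.dot(sol.y[n:, k], sol.y[n:, k]) + f(sol.y[:n, k])
     + 0.5 * ts[k] ** (-p) * np.dot(sol.y[:n, k], sol.y[:n, k])
     for k in range(ts.size)]
print(np.all(np.diff(W) <= 1e-8))  # True: W nonincreasing, up to integration error
```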

References

  • (1) F. Alvarez, H. Attouch, Convergence and asymptotic stabilization for some damped hyperbolic equations with non-isolated equilibria, ESAIM Control Optim. Calc. Var. 6 (2001), 539–552.
  • (2) F. Alvarez, A. Cabot, Asymptotic selection of viscosity equilibria of semilinear evolution equations by the introduction of a slowly vanishing term, Discrete Contin. Dyn. Syst. 15 (2006), 921–938.
• (3) V. Apidopoulos, J.-F. Aujol, Ch. Dossal, The differential inclusion modeling the FISTA algorithm and optimality of convergence rate in the case b\leq 3, SIAM J. Optim., 28(1) (2018), 551–574.
  • (4) H. Attouch, Variational convergence for functions and operators, Applicable Mathematics Series, Pitman Advanced Publishing Program, 1984.
  • (5) H. Attouch, Viscosity solutions of minimization problems, SIAM J. Optim. 6 (3) (1996), 769–806.
  • (6) H. Attouch, R.I. Boţ, E.R. Csetnek, Fast optimization via inertial dynamics with closed-loop damping, Journal of the European Mathematical Society (JEMS), 2021, hal-02910307.
  • (7) H. Attouch, L.M. Briceño-Arias, P.L. Combettes, A parallel splitting method for coupled monotone inclusions, SIAM J. Control Optim. 48 (5) (2010), 3246–3270.
  • (8) H. Attouch, L.M. Briceño-Arias, P.L. Combettes, A strongly convergent primal-dual method for nonoverlapping domain decomposition, Numerische Mathematik, 133(3) (2016), 443–470.
  • (9) H. Attouch, A. Cabot, Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity, J. Differential Equations, 263 (9), (2017), 5412–5458.
  • (10) H. Attouch, Z. Chbani, J. Fadili, H. Riahi, First order optimization algorithms via inertial systems with Hessian driven damping, Math. Program. (2020), https://doi.org/10.1007/s10107-020-01591-1.
  • (11) H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming, 168 (1-2) (2018), 123–175.
  • (12) H. Attouch, Z. Chbani, H. Riahi, Combining fast inertial dynamics for convex optimization with Tikhonov regularization, J. Math. Anal. Appl, 457 (2018), 1065–1094.
  • (13) H. Attouch, R. Cominetti, A dynamical approach to convex minimization coupling approximation with the steepest descent method, J. Differential Equations, 128 (2) (1996), 519–540.
  • (14) H. Attouch, M.-O. Czarnecki, Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria, J. Differential Equations 179 (2002), 278–310.
  • (15) H. Attouch, M.-O. Czarnecki, Asymptotic behavior of coupled dynamical systems with multiscale aspects, J. Differential Equations 248 (2010), 1315–1344.
  • (16) H. Attouch, M.-O. Czarnecki, J. Peypouquet, Prox-penalization and splitting methods for constrained variational problems, SIAM J. Optim. 21 (2011), 149–173.
  • (17) H. Attouch, M.-O. Czarnecki, J. Peypouquet, Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities, SIAM J. Optim. 21 (2011), 1251–1274.
  • (18) H. Attouch, M.-O. Czarnecki, Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects, J. Differential Equations, 262 (3) (2017), 2745–2770.
  • (19) H. Attouch, S. László, Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution, arXiv:2104.11987v1 [math.OC] 24 Apr 2021.
• (20) H. Attouch, J. Peypouquet, The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k^{2}, SIAM J. Optim., 26(3) (2016), 1824–1834.
  • (21) H. Attouch, J. Peypouquet, P. Redont, Fast convex minimization via inertial dynamics with Hessian driven damping, J. Differential Equations, 261(10), (2016), 5734–5783.
• (22) J.-B. Baillon, R. Cominetti, A convergence result for non-autonomous subgradient evolution equations and its application to the steepest descent exponential penalty trajectory in linear programming, J. Funct. Anal. 187 (2001), 263–273.
  • (23) H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert spaces, CMS Books in Mathematics, Springer, (2011).
  • (24) R. I. Bot, E. R. Csetnek, Forward-Backward and Tseng’s type penalty schemes for monotone inclusion problems, Set-Valued Var. Anal. 22 (2014), 313–331.
  • (25) R. I. Boţ, E. R. Csetnek, S.C. László, Tikhonov regularization of a second order dynamical system with Hessian damping, Math. Program. (2020), https://doi.org/10.1007/s10107-020-01528-8.
  • (26) H. Brézis, Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland, (1972).
  • (27) A. Cabot, Inertial gradient-like dynamical system controlled by a stabilizing term, J. Optim. Theory Appl. 120 (2004) 275–303.
  • (28) A. Cabot, Proximal point algorithm controlled by a slowly vanishing term: Applications to hierarchical minimization, SIAM J. Optim. 15 (2) (2005), 555–572.
  • (29) A. Cabot, H. Engler, S. Gadat, On the long time behavior of second order differential equations with asymptotically small dissipation, Trans. Amer. Math. Soc. 361 (2009), 5983–6017.
• (30) A. Chambolle, Ch. Dossal, On the convergence of the iterates of FISTA, J. Optim. Theory Appl., 166 (2015), 968–982.
  • (31) R. Cominetti, Coupling the proximal point algorithm with approximation methods, J. Optim. Theory Appl. 95 (3) (1997), 581–600.
  • (32) R. Cominetti, J. Peypouquet, S. Sorin, Strong asymptotic convergence of evolution equations governed by maximal monotone operators with Tikhonov regularization, J. Differential Equations, 245 (2008), 3753–3763.
  • (33) A. Fiacco, G. McCormick, Nonlinear programming: Sequential Unconstrained Minimization Techniques, John Wiley and Sons, New York, (1968).
  • (34) A. Haraux, M.A. Jendoubi, A Liapunov function approach to the stabilization of second-order coupled systems, (2016) arXiv preprint arXiv:1604.06547.
  • (35) S.A. Hirstoaga, Approximation et résolution de problèmes d’équilibre, de point fixe et d’inclusion monotone. PhD thesis, Université Pierre et Marie Curie - Paris VI, 2006, HAL Id: tel-00137228.
  • (36) M.A. Jendoubi, R. May, On an asymptotically autonomous system with Tikhonov type regularizing term, Archiv der Mathematik 95 (4) (2010), 389–399.
• (37) Y. Nesterov, A method of solving a convex programming problem with convergence rate O(1/k^{2}), Soviet Math. Dokl. 27 (1983), 372–376.
  • (38) Y. Nesterov, Introductory lectures on convex optimization: A basic course, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004.
  • (39) B. Polyak, Introduction to Optimization, New York, NY: Optimization Software-Inc, 1987.
  • (40) W. Siegel, Accelerated first-order methods: Differential equations and Lyapunov functions, arXiv:1903.05671v1 [math.OC], 2019.
  • (41) W. Su, S. Boyd, E. J. Candès, A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights. NIPS, December 2014.
  • (42) A. N. Tikhonov, Doklady Akademii Nauk SSSR 151 (1963) 501–504, (Translated in ”Solution of incorrectly formulated problems and the regularization method”. Soviet Mathematics 4 (1963) 1035–1038).
  • (43) A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-Posed Problems, Winston, New York, 1977.
  • (44) D. Torralba, Convergence epigraphique et changements d’échelle en analyse variationnelle et optimisation, PhD thesis, Université Montpellier, 1996.
  • (45) A. C. Wilson, B. Recht, M. I. Jordan, A Lyapunov analysis of momentum methods in optimization, arXiv preprint arXiv:1611.02635, 2016.