∎

¹¹institutetext: Hedy ATTOUCH ²²institutetext: IMAG, Univ. Montpellier, CNRS, Montpellier, France
[email protected],
Supported by COST Action: CA16228 ³³institutetext: Aïcha BALHAG ⁴⁴institutetext: Zaki CHBANI ⁵⁵institutetext: Hassan RIAHI
Cadi Ayyad University
Sémlalia Faculty of Sciences 40000 Marrakech, Morroco
[email protected] ⁶⁶institutetext: [email protected] ⁷⁷institutetext: [email protected]

Damped inertial dynamics with vanishing Tikhonov regularization: strong asymptotic convergence towards the minimum norm solution

Hedy ATTOUCH Aïcha BALHAG Zaki CHBANI Hassan RIAHI

Abstract

In a Hilbert space, we provide a fast dynamic approach to the hierarchical minimization problem which consists in finding the minimum norm solution of a convex minimization problem. For this, we study the convergence properties of the trajectories generated by a damped inertial dynamic with Tikhonov regularization. When the time goes to infinity, the Tikhonov regularization parameter is supposed to tend towards zero, not too fast, which is a key property to make the trajectories strongly converge towards the minimizer of $f$ of minimum norm. According to the structure of the heavy ball method for strongly convex functions, the viscous damping coefficient is proportional to the square root of the Tikhonov regularization parameter. Therefore, it also converges to zero, which will ensure rapid convergence of values. Precisely, under a proper tuning of these parameters, based on Lyapunov’s analysis, we show that the trajectories strongly converge towards the minimizer of minimum norm, and we provide the convergence rate of the values. We show a trade off between the property of fast convergence of values, and the property of strong convergence towards the minimum norm solution. This study improves several previous works where this type of results was obtained under restrictive hypotheses.

Keywords:

Accelerated gradient methods; convex optimization; damped inertial dynamics; hierarchical minimization; Nesterov accelerated gradient method; Tikhonov approximation.

MSC:

37N40, 46N10, 49M30, 65K05, 65K10, 90B50, 90C25.

1 Introduction

Throughout the paper, $\mathcal{H}$ is a real Hilbert space which is endowed with the scalar product $\langle\cdot,\cdot\rangle$ , with $\|x\|^{2}=\langle x,x\rangle$ for $x\in\mathcal{H}$ . We consider the convex minimization problem

\min\left\{f(x):\ x\in\mathcal{H}\right\},

(1)

where $f:\mathcal{H}\rightarrow\mathbb{R}$ is a convex continuously differentiable function whose solution set $S=\operatorname{argmin}f$ is nonempty. We aim at finding by rapid methods the element of minimum norm of $S$ . Our approach is in line with the dynamic approach developed by Attouch and László in AL to solve this question. It is based on the asymptotic analysis, as $t\to+\infty$ , of the nonautonomous damped inertial dynamic

{\rm(TRIGS)}\qquad\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0,

where the function $f$ and the Tikhonov regularization parameter $\varepsilon$ satisfy the following hypothesis¹¹1In section 4, we will extend our study to the case of a convex lower semicontinuous proper function $f:{\mathcal{H}}\to{\mathbb{R}}\cup\left\{+\infty\right\}$ .:

\displaystyle(\mathcal{H}_{0})\;\begin{cases}\;\;f:\mathcal{H}\rightarrow\mathbb{R}\mbox{ is convex and differentiable},\nabla f\mbox{ is Lipschitz continuous on bounded sets};\vspace{1mm}\\ \;\;S:=\mbox{argmin}f\neq\emptyset.\mbox{ We denote by }x^{*}\mbox{ the element of minimum norm of }S;\vspace{1mm}\\ \;\;\varepsilon:[t_{0},+\infty[\to\mathbb{R}^{+}\mbox{ is a nonincreasing function, of class }\mathcal{C}^{1},\mbox{ such that }\ \lim_{t\to\infty}\varepsilon(t)=0.\end{cases}

The Cauchy problem for (TRIGS) is well posed. The proof of the existence and uniqueness of a global solution for the corresponding Cauchy problem is given in the appendix (see also AL ). It is based on classical arguments combining the Cauchy-Lipschitz theorem with energy estimates. Our main contribution is to develop a new Lyapunov analysis which gives the strong convergence of the trajectories of (TRIGS) to the element of minimum norm of $S$ . Precisely, we give sufficient conditions on $\varepsilon(t)$ which ensure that $\lim_{t\to+\infty}\|x(t)-x^{\ast}\|=0$ . This improves the results of AL .

1.1 Attouch-László Lyapunov analysis of (TRIGS)

The main idea developed in AL consists of starting from the Polyak heavy ball with friction dynamic for strongly convex functions, then adapting it via Tikhonov approximation to deal with the case of general convex functions. Recall that a function $f:{\mathcal{H}}\to\mathbb{R}$ is said to be $\mu$ -strongly convex for some $\mu>0$ if $f-\frac{\mu}{2}\|\cdot\|^{2}$ is convex. In this setting, we have the following exponential convergence result for the damped autonomous inertial dynamic where the damping coefficient is twice the square root of the modulus of strong convexity of $f$ :

Theorem 1.1

Suppose that $f:{\mathcal{H}}\to\mathbb{R}$ is a function of class ${\mathcal{C}}^{1}$ which is $\mu$ -strongly convex for some $\mu>0$ . Let $x(\cdot):[t_{0},+\infty[\to{\mathcal{H}}$ be a solution trajectory of

\ddot{x}(t)+2\sqrt{\mu}\dot{x}(t)+\nabla f(x(t))=0.

(2)

Then, the following property holds:

f(x(t))-\min_{\mathcal{H}}f=\mathcal{O}\left(e^{-\sqrt{\mu}t}\right)\;\mbox{ as }\;t\to+\infty.

To adapt this result to the case of a general convex differentiable function $f:{\mathcal{H}}\to\mathbb{R}$ , a natural idea is to use Tikhonov’s method of regularization. This leads to consider the non-autonomous dynamic which at time $t$ is governed by the gradient of the strongly convex function

\varphi_{t}:{\mathcal{H}}\to\mathbb{R},\quad\varphi_{t}(x):=f(x)+\frac{\varepsilon(t)}{2}\|x\|^{2}.

Then, replacing $f$ by $\varphi_{t}$ in (2), and noticing that $\varphi_{t}$ is $\varepsilon(t)$ -strongly convex, this gives the following dynamic which was introduced in AL ( $\delta$ is a positive parameter)

{\rm(TRIGS)}\qquad\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0.

(TRIGS) stands shortly for Tikhonov regularization of inertial gradient systems. In order not to asymptotically modify the equilibria, it is supposed that $\varepsilon(t)\to 0$ as $t\to+\infty$ ²²2This is the key property of the asymptotic version ( $t\to+\infty$ ) of the Browder-Tikhonov regularization method.. This condition implies that (TRIGS) falls within the framework of the inertial gradient systems with asymptotically vanishing damping. The importance of this class of inertial dynamics has been highlighted by several recent studies AAD1 , ABotCest , AC10 , ACPR , AP , CD , SBC , which make the link with the accelerated gradient method of Nesterov Nest1 ; Nest2 .
The control of the decay of $\varepsilon(t)$ to zero as $t\to+\infty$ plays a key role in the Lyapunov analysis of (TRIGS), and uses the following condition.

Definition 1

Let us give $\delta>0$ . We say that $t\mapsto\varepsilon(t)$ satisfies the controlled decay property ${\rm(CD)}_{\lambda}$ , if it is a nonincreasing function which satisfies: there exists $t_{1}\geq t_{0}$ such that for all $t\geq t_{1},$

\dfrac{d}{dt}\left(\frac{1}{\sqrt{\varepsilon(t)}}\right)\leq\min(2\lambda-\delta,\delta-\lambda),

(3)

where $\lambda$ is a parameter such that $\frac{\delta}{2}<\lambda<\delta$ for $0<\delta\leq 2$ , and $\frac{\delta+\sqrt{\delta^{2}-4}}{2}<\lambda<\delta$ for $\delta>2$ .

By integrating the differential inequality (3), one can easily verify that this condition implies that $\varepsilon(t)$ is greater than or equal to $C/t^{2}$ . Since the damping coefficient is proportional to $\sqrt{\varepsilon(t)}$ , this means that it must be greater than or equal to $C/t$ . This is in accordance with the theory of inertial gradient systems with time-dependent viscosity coefficient, which states that the asymptotic optimization property is valid provided that the integral on $[t_{0},+\infty[$ of the viscous damping coefficient is infinite, see AC10 , CEG . Let us state the following convergence result obtained in AL .

Theorem 1.2

(Attouch-László AL ) Let $x:[t_{0},+\infty[\to\mathcal{H}$ be a solution trajectory of (TRIGS). Let $\delta$ be a positive parameter. Suppose that $\varepsilon(\cdot)$ satisfies the condition ${\rm(CD)}_{\lambda}$ for some $\lambda>0$ . Then, we have the following rate of convergence of values: for all $t\geq t_{1}$

f(x(t))-\min_{{\mathcal{H}}}f\leq\frac{\lambda\|x^{*}\|^{2}}{2}\frac{1}{\gamma(t)}\int_{t_{1}}^{t}\varepsilon^{\frac{3}{2}}(s)\gamma(s)ds+\frac{C}{\gamma(t)},

(4)

where

\gamma(t)=\exp\left({\displaystyle{\int_{t_{1}}^{t}\mu(s)ds}}\right),\quad\mu(t)=-\frac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}

$\mbox{ and }\;C=\left(f(x(t_{1}))-f^{\ast}\right)+\frac{\varepsilon(t_{1})}{2}\|x(t_{1})\|^{2}+\frac{1}{2}\|\lambda\sqrt{\varepsilon(t_{1})}(x(t_{1})-x^{\ast})+\dot{x}(t_{1})\|^{2}.$

The proof is based on the following Lyapunov function $\mathcal{E}:[t_{0},+\infty[\to\mathbb{R}_{+},$

\displaystyle\mathcal{E}(t):=\left(f(x(t))-\min f\right)+\frac{\varepsilon(t)}{2}\|x(t)\|^{2}+\frac{1}{2}\|c(t)(x(t)-x^{\ast})+\dot{x}(t)\|^{2},

(5)

where the function $c:[t_{0},+\infty[\to{\mathbb{R}}$ is chosen appropriately. Based on this Lyapunov analysis, it is proved in AL that $\liminf_{t\to+\infty}\|x(t)-x^{\ast}\|=0$ . We will improve this result, and show that $\lim_{t\to+\infty}\|x(t)-x^{\ast}\|=0$ . For this, we will develop a new Lyapunov analysis. Let us first recall some related previous results showing the progression of the understanding of these delicate questions, where the hierarchical minimization property is reached asymptotically.

1.2 Historical facts and related results

In relation to hierarchical optimization, a rich literature has been devoted to the coupling of dynamic gradient systems with Tikhonov regularization, and to the study of the corresponding algorithms.

1.2.1 First-order gradient dynamics

For first-order gradient systems and subdifferential inclusions, the asymptotic hierarchical minimization property which results from the introduction of a vanishing viscosity term in the dynamic (in our context the Tikhonov approximation Tikh ; TA ) has been highlighted in a series of papers AlvCab , Att2 , AttCom , AttCza2 , BaiCom , CPS , Hirstoaga . In parallel way, there is a vast literature on convex descent algorithms involving Tikhonov and more general penalty, regularization terms. The historical evolution can be traced back to Fiacco and McCormick FM , and the interpretation of interior point methods with the help of a vanishing logarithmic barrier. Some more specific references for the coupling of Prox and Tikhonov can be found in Cominetti Com . The time discretization of the first-order gradient systems and subdifferential inclusions involving multiscale (in time) features provides a natural link between the continuous and discrete dynamics. The resulting algorithms combine proximal based methods (for example forward-backward algorithms), with the viscosity of penalization methods, see AttCzaPey1 , AttCzaPey2 , BotCse1 , Cabot-inertiel ; Cab , Hirstoaga .

1.2.2 Second order gradient dynamics

First studies concerning the coupling of damped inertial dynamics with Tikhonov approximation concerned the heavy ball with friction system of Polyak Polyak , where the damping coefficient $\gamma>0$ is fixed. In AttCza1 Attouch-Czarnecki considered the system

\ddot{x}(t)+\gamma\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0.

(6)

In the slow parametrization case $\int_{0}^{+\infty}\varepsilon(t)dt=+\infty$ , they proved that any solution $x(\cdot)$ of (6) converges strongly to the minimum norm element of $\operatorname{argmin}f$ , see also JM-Tikh . A parallel study has been developed for PDE’s, see AA for damped hyperbolic equations with non-isolated equilibria, and AlvCab for semilinear PDE’s. The system (6) is a special case of the general dynamic model

\ddot{x}(t)+\gamma\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)\nabla g(x(t))=0

(7)

which involves two functions $f$ and $g$ intervening with different time scale. When $\varepsilon(\cdot)$ tends to zero moderately slowly, it was shown in Att-Czar-last that the trajectories of (7) converge asymptotically to equilibria that are solutions of the following hierarchical problem: they minimize the function $g$ on the set of minimizers of $f$ . When $\mathcal{H}={\mathcal{H}}_{1}\times{\mathcal{H}}_{2}$ is a product space, defining for $x=(x_{1},x_{2})$ , $f(x_{1},x_{2}):=f_{1}(x_{1})+f_{2}(x_{2})$ and $g(x_{1},x_{2}):=\|A_{1}x_{1}-A_{2}x_{2}\|^{2}$ , where the $A_{i},\,i\in\{1,2\}$ are linear operators, (7) provides (weakly) coupled inertial systems. The continuous and discrete-time versions of these systems have a natural connection to the best response dynamics for potential games AttCza2 , domain decomposition for PDE’s abc2 , optimal transport abc , coupled wave equations HJ2 .

In the quest for a faster convergence, the following system

\mbox{(AVD)}_{\alpha,\varepsilon}\quad\quad\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0,

(8)

has been studied by Attouch-Chbani-Riahi ACR . It is a Tikhonov regularization of the dynamic

\mbox{(AVD)}_{\alpha}\quad\quad\ddot{x}(t)+\frac{\alpha}{t}\dot{x}(t)+\nabla f(x(t))=0,

(9)

which was introduced by Su, Boyd and Candès in SBC . When $\alpha=3$ , $\mbox{(AVD)}_{\alpha}$ can be viewed as a continuous version of the accelerated gradient method of Nesterov. It has been the subject of many recent studies which have given an in-depth understanding of the Nesterov acceleration method, see AAD1 , AC10 , ACPR , SBC , Siegel , WRJ .

1.3 Model results

Let us illustrate our results in the case $\varepsilon(t)=\frac{1}{t^{p}}$ . In section 3, we will prove the following result:

Theorem 1.3

Take $\varepsilon(t)=1/t^{p}$ , $0<p<2$ . Let $x:[t_{0},+\infty[\to\mathcal{H}$ be a solution trajectory of

\ddot{x}(t)+\frac{\delta}{t^{\frac{p}{2}}}\dot{x}(t)+\nabla f\left(x(t)\right)+\frac{1}{t^{p}}x(t)=0.

Then, we have the following rates of convergence for the values and the trajectory:

	$\displaystyle f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right)\mbox{ as }\;t\to+\infty$		(10)
	$\displaystyle\\|x(t)-x_{\varepsilon(t)}\\|^{2}=\mathcal{O}\left(\dfrac{1}{t^{\frac{2-p}{2}}}\right)\mbox{ as }\;t\to+\infty.$		(11)

According to the strong convergence of $x_{\varepsilon(t)}$ to the minimum norm solution, the above property implies that $x(t)$ strongly converges to the minimum norm solution.

With many respect, these results represent an important advance compared to previous works:

i) Let us underline that in our approach the rapid convergence of the values and the strong convergence towards the solution of minimum norm are obtained for the same dynamic, whereas in the previous works ACR , AttCza1 , they are obtained for different dynamics corresponding to different settings of the parameters. Moreover, we obtain the strong convergence of the trajectories to the minimum norm solution, whereas in AL and ACR it was only obtained that $\liminf_{t\to+\infty}{\|x(t)-x^{\ast}\|}=0.$ It is clear that the results extend naturally to obtaining strong convergence towards the solution closest to a desired state $x_{d}$ . It suffices to replace in Tikhonov’s approximation $\|x\|^{2}$ by $\|x-x_{d}\|^{2}$ . This is important for inverse problems. In addition, we obtain a convergence rate of the values which is better than the one obtained inAL .

ii) These results show the trade-off between the property of rapid convergence of values, and the property of strong convergence towards the minimum norm solution. The two rates of convergence move in opposite directions with $p$ varies. The determination of a good compromise between these two antagonistic criteria is an interesting subject that we will consider later.

iii) Note that at the limit, when $p=2$ , which is the most interesting case to obtain a fast convergence of values comparable to the accelerated gradient method of Nesterov, then our analysis does not allow us to conclude that the trajectories are converging towards the solution of minimum norm. This question remains open, the interested reader can consult AL .

1.4 Contents

The paper is organized as follows. In section 2, for a general Tikhonov regularization parameter $\varepsilon(t)$ , we study the asymptotic convergence properties of the solution trajectories of (TRIGS). Based on Lyapunov analysis, we show their strong convergence to the element of minimum norm of $S$ , and we provide convergence rate of the values. In section 3, we apply these results to the particular case $\varepsilon(t)=\frac{1}{t^{p}}$ . Section 4 considers the extension of these results to the nonsmooth case. Section 5 contains numerical illustrations. We conclude in section 6 with some perspective and open questions.

2 Convergence analysis for general $\varepsilon(t)$

We are going to analyze via Lyapunov analysis the convergence properties as $t\to+\infty$ of the solution trajectories of the inertial dynamic (TRIGS) that we recall below

\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0.

(12)

Throughout the paper, we assume that $t_{0}$ is the origin of time, $\delta$ is a positive parameter. For each $t\geq t_{0}$ , let us introduce the function $\varphi_{t}:{\mathcal{H}}\to{\mathbb{R}}$ defined by

\varphi_{t}(x):=f(x)+\dfrac{\varepsilon(t)}{2}\|x\|^{2},

(13)

and set

x_{\varepsilon(t)}:=\operatorname{argmin}_{{\mathcal{H}}}\varphi_{t},

which is the unique minimizer of the strongly convex function $\varphi_{t}$ . The first order optimality condition gives

\nabla f(x_{\varepsilon(t)})+\varepsilon(t)x_{\varepsilon(t)}=0.

(14)

The following properties are immediate consequences of the classical properties of the Tikhonov regularization

	$\displaystyle\forall t\geq t_{0}\;\;\\|x_{\varepsilon(t)}\\|\leq\\|x^{*}\\|\vspace{3mm}$		(15)
	$\displaystyle\lim_{t\rightarrow+\infty}\\|x_{\varepsilon(t)}-x^{}\\|=0\quad\hbox{where}\>x^{}=\mbox{proj}_{\operatorname{argmin}f}0.$		(16)

2.1 Preparatory results for Lyapunov analysis

Let us introduce the real-valued function function $t\in[t_{0},+\infty[\mapsto E(t)\in{\mathbb{R}}^{+}$ that plays a key role in our Lyapunov analysis. It is defined by

E(t):=\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\|v(t)\|^{2},

(17)

where $\varphi_{t}$ has been defined in (13), and

v(t)=\tau(t)\left(x(t)-x_{\varepsilon(t)}\right)+\dot{x}(t).

(18)

The time dependent parameter $\tau(\cdot)$ will be adjusted during the proof. We will show that under judicious choice of the parameters, then $t\mapsto E(t)$ is a decreasing function. Moreover, we will estimate the rate of convergence of $E(t)$ towards zero. This will provide the rates of convergence of values and trajectories, as the following lemma shows.

Lemma 1

Let $x(\cdot):[t_{0},+\infty[\to{\mathcal{H}}$ be a solution trajectory of the damped inertial dynamic (TRIGS), and $t\in[t_{0},+\infty[\mapsto E(t)\in{\mathbb{R}}^{+}$ be the energy function defined in (17). Then, the following inequalites are satisfied: for any $t\geq t_{0}$ ,

	$\displaystyle f(x(t))-\min_{\mathcal{H}}f\leq E(t)+\dfrac{\varepsilon(t)}{2}\\|x^{*}\\|^{2};$		(19)
	$\displaystyle\\|x(t)-x_{\varepsilon(t)}\\|^{2}\leq\frac{2E(t)}{\varepsilon(t)}.$		(20)

Therefore, $x(t)$ converges strongly to $x^{*}$ as soon as $\lim_{t\to+\infty}\displaystyle{\frac{E(t)}{\varepsilon(t)}}=0.$

Proof

$i)$ According to the definition of $\varphi_{t}$ , we have

\begin{array}[]{lll}f(x(t))-\min_{\mathcal{H}}f&=&\varphi_{t}(x(t))-\varphi_{t}(x^{*})+\dfrac{\varepsilon(t)}{2}\left(\|x^{*}\|^{2}-\|x(t)\|^{2}\right)\\ &=&\left[\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right]+\left[\underbrace{\varphi_{t}(x_{\varepsilon(t)})-\varphi_{t}(x^{*})}_{\leq 0}\right]+\dfrac{\varepsilon(t)}{2}\left(\|x^{*}\|^{2}-\|x(t)\|^{2}\right)\\ &\leq&\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})+\dfrac{\varepsilon(t)}{2}\|x^{*}\|^{2}.\end{array}

By definition of $E(t)$ we have

\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\leq E(t)

(21)

which, combined with the above inequality, gives (19).

$ii)$ By the strong convexity of $\varphi_{t}$ , and $x_{\varepsilon(t)}:=\operatorname{argmin}_{{\mathcal{H}}}\varphi_{t}$ , we have

\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)}\geq\frac{\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}.

By combining the inequality above with (21), we get

E(t)\geq\frac{\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2},

which gives (20).
By the assumption $\mathcal{H}_{0})$ , we have $\lim_{t\to\infty}\varepsilon(t)=0$ . According to (16), we deduce that $x_{\varepsilon(t)}$ converges strongly to $x^{*}$ . The conclusion is a direct consequence of inequality (20).

To estimate $E(t)$ , we will show that it satisfies a first order differential inequality of the form

\dot{E}(t)+\mu(t)E(t)\leq\rho(t)\|x^{*}\|^{2}

(22)

where $\rho(t)$ and $\mu(t)$ are positive functions that will be made precise in the proof. So the first step of the proof is to compute $\dfrac{d}{dt}E(t)$ . The computation of $\dfrac{d}{dt}E(t)$ involves the two terms $\dfrac{d}{dt}\left(\varphi_{t}(x_{\varepsilon(t)})\right)$ and $\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right).$ They are evaluated in the lemma below.

Lemma 2

For each $t\geq t_{0}$ , we have

$i)$

$\dfrac{d}{dt}\left(\varphi_{t}(x_{\varepsilon(t)})\right)=\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2};$
$ii)$

$\left\|\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right)\right\|^{2}\leq-\dfrac{\dot{\varepsilon}(t)}{\varepsilon(t)}\left\langle\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right),x_{\varepsilon(t)}\right\rangle$ .

Therefore, $\left\|\dfrac{d}{dt}\left(x_{\varepsilon(t)}\right)\right\|\leq-\dfrac{\dot{\varepsilon}(t)}{\varepsilon(t)}\|x_{\varepsilon(t)}\|.$

Proof

$i)$ We have $\varphi_{t}(x_{\varepsilon(t)})=\inf_{y\in H}\{f(y)+\frac{\varepsilon(t)}{2}\|y-0\|^{2}\}=f_{\frac{1}{\varepsilon(t)}}(0).$
Since $\dfrac{d}{d\lambda}f_{\lambda}(x)=-\frac{1}{2}\|\nabla f_{\lambda}(x)\|^{2},$ (see (Att-book, , Lemma 3.27), (Att2, , Corollary 6.2)), we have:

\dfrac{d}{dt}f_{\lambda(t)}(x)=-\frac{\dot{\lambda}(t)}{2}\|\nabla f_{\lambda(t)}(x)\|^{2}.

Therefore,

\dfrac{d}{dt}\varphi_{t}(x_{\varepsilon(t)})=\dfrac{d}{dt}\left(f_{\frac{1}{\varepsilon(t)}}(0)\right)=\frac{1}{2}\dfrac{\dot{\varepsilon}(t)}{\varepsilon^{2}(t)}\|\nabla f_{\frac{1}{\varepsilon(t)}}(0)\|^{2}.

(23)

On the other hand, we have

\nabla\varphi_{t}((x_{\varepsilon(t)})=0\Longleftrightarrow\nabla f(x_{\varepsilon(t)})+\varepsilon(t)x_{\varepsilon(t)}=0\Longleftrightarrow x_{\varepsilon(t)}=J^{f}_{\frac{1}{\varepsilon(t)}}(0).

Since $\nabla f_{\frac{1}{\varepsilon(t)}}(0)=\varepsilon(t)\left(0-J^{f}_{\frac{1}{\varepsilon(t)}}(0)\right),$ we get $\nabla f_{\frac{1}{\varepsilon(t)}}(0)=-\varepsilon(t)x_{\varepsilon(t)}$ . This combined with (23) gives

\dfrac{d}{dt}\varphi_{t}(x_{\varepsilon(t)})=\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}.

We have

-\varepsilon(t)x_{\varepsilon(t)}=\nabla f(x_{\varepsilon(t)})\quad\hbox{and}\quad-\varepsilon(t+h)x_{\varepsilon(t+h)}=\nabla f(x_{\varepsilon(t+h)}).

According to the monotonicity of $\nabla f,$ we have

\langle\varepsilon(t)x_{\varepsilon(t)}-\varepsilon(t+h)x_{\varepsilon(t+h)},x_{\varepsilon(t+h)}-x_{\varepsilon(t)}\rangle\geq 0,

which implies

-\varepsilon(t)\|u_{\varepsilon(t+h)}-u_{\varepsilon(t)}\|^{2}+\left(\varepsilon(t)-\varepsilon(t+h)\right)\langle u_{\varepsilon(t+h)},u_{\varepsilon(t+h)}-u_{\varepsilon(t)}\rangle\geq 0.

After division by $h^{2},$ we obtain

\dfrac{\left(\varepsilon(t)-\varepsilon(t+h)\right)}{h}\left\langle x_{\varepsilon(t+h)},\dfrac{x_{\varepsilon(t+h)}-x_{\varepsilon(t)}}{h}\right\rangle\geq\varepsilon(t)\left\|\dfrac{x_{\varepsilon(t+h)}-x_{\varepsilon(t)}}{h}\right\|^{2}.

By letting $h\rightarrow 0,$ we get

-\dot{\varepsilon}(t)\left\langle x_{\varepsilon(t)},\dfrac{d}{dt}x_{\varepsilon(t)}\right\rangle\geq\varepsilon(t)\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2},

which completes the proof.

2.2 Main result

Given a general parameter $\varepsilon(\cdot)$ , let’s proceed with the Lyapunov analysis.

Theorem 2.1

Suppose that $f:{\mathcal{H}}\to\mathbb{R}$ is a convex function of class ${\mathcal{C}}^{1}$ . Let $x(\cdot):[t_{0},+\infty[\to{\mathcal{H}}$ be a solution trajectory of the system (TRIGS) with $\delta>0$

\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f(x(t))+\varepsilon(t)x(t)=0.

Let us assume that there exists $a,c>1$ and $t_{1}\geq t_{0}$ such that for all $t\geq t_{1},$

(\mathcal{H}_{1})\qquad\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)\leq\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right),

where $\lambda$ is such that

\frac{\delta}{2}<\lambda<\frac{a}{a+1}\delta\;\mbox{ for }0<\delta\leq 2-\frac{1}{c},\quad\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)<\lambda<\frac{a}{a+1}\delta\;\mbox{ for }\delta>2-\frac{1}{c}.

Then, the following property holds:

E(t)\leq\dfrac{\|x^{*}\|^{2}}{2}\dfrac{\displaystyle\int_{t_{1}}^{t}\left[\left((\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(s)}{\varepsilon^{\frac{3}{2}}(s)}-\dot{\varepsilon}(s)\right)\gamma(s)\right]ds}{\gamma(t)}+\dfrac{\gamma(t_{1})E(t_{1})}{\gamma(t)}

(24)

where $\gamma(t)=\exp\left(\displaystyle\int_{t_{1}}^{t}\mu(s)ds\right),$ and $\mu(t)=-\dfrac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}$ .

Proof

According to the classical derivation chain rule and Lemma 2 $i)$ , the differentiation of the function $E(\cdot)$ gives

\begin{array}[]{lll}\dot{E}(t)&=&\langle\nabla\varphi_{t}(x(t)),\dot{x}(t)\rangle+\frac{1}{2}\dot{\varepsilon}(t)\|x(t)\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}+\langle\dot{v}(t),v(t)\rangle.\end{array}

(25)

Using the constitutive equation (12), we have

\begin{array}[]{lll}\dot{v}(t)&=&\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\tau(t)\dot{x}(t)-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}+\ddot{x}(t),\\ &=&\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\tau(t)\dot{x}(t)-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}-\delta\sqrt{\varepsilon(t)}\dot{x}(t)-\nabla\varphi_{t}(x(t))\\ &=&\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\dot{x}(t)-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}-\nabla\varphi_{t}(x(t)).\end{array}

Therefore,

$\displaystyle\langle\dot{v}(t),v(t)\rangle$	$\displaystyle=$	$\displaystyle\left\langle\dot{\tau}(t)\left(x(t)-x_{\varepsilon(t)}\right)+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\dot{x}(t),\tau(t)(x(t)-x_{\varepsilon(t)})+\dot{x}(t)\right\rangle$	(26)
	$\displaystyle+$	$\displaystyle\left\langle-\tau(t)\dfrac{d}{dt}x_{\varepsilon(t)}-\nabla\varphi_{t}(x(t)),\tau(t)(x(t)-x_{\varepsilon(t)})+\dot{x}(t)\right\rangle$
	$\displaystyle=$	$\displaystyle\dot{\tau}(t)\tau(t)\\|x(t)-x_{\varepsilon(t)}\\|^{2}+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle$
	$\displaystyle+$	$\displaystyle\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\\|\dot{x}(t)\\|^{2}-\tau^{2}(t)\langle\dfrac{d}{dt}x_{\varepsilon(t)},x(t)-x_{\varepsilon(t)}\rangle-\tau(t)\langle\dfrac{d}{dt}x_{\varepsilon(t)},\dot{x}(t)\rangle$
		$\displaystyle\underbrace{-\tau(t)\langle\nabla\varphi_{t}(x(t)),x(t)-x_{\varepsilon(t)})\rangle}_{=A_{0}}-\langle\nabla\varphi_{t}(x(t)),\dot{x}(t)\rangle,$

Since $\varphi_{t}$ is strongly convex, we have

\varphi_{t}(x_{\varepsilon(t)})-\varphi_{t}(x(t))\geq\left\langle\nabla\varphi_{t}(x(t)),x_{\varepsilon(t)}-x(t)\right\rangle+\frac{\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}.

Using the above inequality, we get

A_{0}=-\tau(t)\langle\nabla\varphi_{t}(x(t)),x(t)-x_{\varepsilon(t)})\rangle\leq-\tau(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)-\dfrac{\tau(t)\varepsilon(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}.

(27)

Moreover, we have for any $a>1,$

-\tau(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},\dot{x}(t)\right\rangle\leq\dfrac{\tau(t)}{2a}\|\dot{x}(t)\|^{2}+\dfrac{a\tau(t)}{2}\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}

and for any $b>0$

-\tau^{2}(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},x(t)-x_{\varepsilon(t)}\right\rangle\leq\dfrac{b\tau(t)}{2}\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}+\dfrac{\tau^{3}(t)}{2b}\|x(t)-x_{\varepsilon(t)}\|^{2}.

By combining the two above inequalities with (25), (26) and (27), and after reduction we obtain

$\displaystyle\dot{E}(t)$	$\displaystyle\leq$	$\displaystyle-\tau(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\dot{\varepsilon}(t)\\|x(t)\\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\\|x_{\varepsilon(t)}\\|^{2}$	(28)
		$\displaystyle+\left[\dot{\tau}(t)\tau(t)-\dfrac{\tau(t)\varepsilon(t)}{2}\right]\\|x(t)-x_{\varepsilon(t)}\\|^{2}$
		$\displaystyle+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle+\left(\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\\|\dot{x}(t)\\|^{2}$
		$\displaystyle-\tau^{2}(t)\langle\dfrac{d}{dt}x_{\varepsilon(t)},x(t)-x_{\varepsilon(t)}\rangle-\tau(t)\left\langle\dfrac{d}{dt}x_{\varepsilon(t)},\dot{x}(t)\right\rangle$
	$\displaystyle\leq$	$\displaystyle-\tau(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\left[(b+a)\tau(t)\right]\left\\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\\|^{2}$
		$\displaystyle+\frac{\dot{\varepsilon}(t)}{2}\\|x(t)\\|^{2}-\dfrac{\dot{\varepsilon}(t)}{2}\\|x_{\varepsilon(t)}\\|^{2}+\left((1+\frac{1}{2a})\tau(t)-\delta\sqrt{\varepsilon(t)}\right)\\|\dot{x}(t)\\|^{2}$
		$\displaystyle+\left[\dot{\tau}(t)\tau(t)-\dfrac{\tau(t)\varepsilon(t)}{2}+\dfrac{\tau^{3}(t)}{2b}\right]\\|x(t)-x_{\varepsilon(t)}\\|^{2}$
		$\displaystyle+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle.$

On the other hand, for a positive function $\mu(t),$ we have

\begin{array}[]{lll}\mu(t)E(t)&=&\mu(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\dfrac{\mu(t)}{2}\|v(t)\|^{2}\\ &=&\mu(t)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\dfrac{\mu(t)\tau^{2}(t)}{2}\|x(t)-x_{\varepsilon(t)}\|^{2}+\dfrac{\mu(t)}{2}\|\dot{x}(t)\|^{2}\\ &&+\mu(t)\tau(t)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle.\end{array}

(29)

By adding (28) and (29), we get

$\displaystyle\dot{E}(t)+\mu(t)E(t)$	$\displaystyle\leq$	$\displaystyle\left(\mu(t)-\tau(t)\right)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)+\frac{1}{2}\left[(b+a)\tau(t)\right]\\|\dfrac{d}{dt}x_{\varepsilon(t)}\\|^{2}$	(30)
		$\displaystyle-\frac{1}{2}\dot{\varepsilon}(t)\\|x_{\varepsilon(t)}\\|^{2}+\frac{\dot{\varepsilon}(t)}{2}\\|x(t)\\|^{2}+\left((1+\frac{1}{2a})\tau(t)-\delta\sqrt{\varepsilon(t)}+\frac{\mu(t)}{2}\right)\\|\dot{x}(t)\\|^{2}$
		$\displaystyle+\left[\dot{\tau}(t)\tau(t)-\dfrac{\tau(t)\varepsilon(t)}{2}+\dfrac{\tau^{3}(t)}{2b}+\dfrac{\mu(t)\tau^{2}(t)}{2}\right]\\|x(t)-x_{\varepsilon(t)}\\|^{2}$
		$\displaystyle+\left(\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}+\mu(t)\tau(t)\right)\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle.$

As we do not know a priori the sign of $\langle x(t)-x_{\varepsilon(t)},\dot{x}(t)\rangle,$ we take the coefficient in front of this term equal to zero, which gives

\dot{\tau}(t)+\tau^{2}(t)-\delta\tau(t)\sqrt{\varepsilon(t)}+\mu(t)\tau(t)=0.

(31)

Let us make the following choice of the time dependent parameter $\tau(t)$ introduced in (18) (indeed, it is a key ingredient of our Lyapunov analysis)

\tau(t)=\lambda\sqrt{\varepsilon(t)},

where $\lambda$ is a positive parameter to be fixed. Then, the relation (31) can be equivalently written

\mu(t)=-\dfrac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}.

According to this choice for $\mu(t)$ and $\tau(t),$ and neglecting the term $\frac{\dot{\varepsilon}(t)}{2}\|x(t)\|^{2}$ which is less than or equal to zero, the inequality (30) becomes

\begin{array}[]{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\dfrac{1}{2\varepsilon(t)}\left(-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)\right)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)\\ &&+\dfrac{1}{2}\left[(b+a)\lambda\sqrt{\varepsilon(t)}\right]\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}\\ &&+\dfrac{1}{4\varepsilon(t)}\left[2\left((1+\frac{1}{a})\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)\right]\|\dot{x}(t)\|^{2}\\ &&+\dfrac{\lambda}{4}\left[\lambda\dot{\varepsilon}(t)+2\left(\delta\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)+2\frac{\lambda^{2}}{b}\varepsilon^{\frac{3}{2}}(t)\right]\|x(t)-x_{\varepsilon(t)}\|^{2}.\end{array}

(32)

According to item $ii)$ of Lemma 2, and inequality (15)

$\left\|\dfrac{d}{dt}x_{\varepsilon(t)}\right\|^{2}\leq\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{2}(t)}\|x_{\varepsilon(t)}\|^{2}\leq\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{2}(t)}\|x^{*}\|^{2}.$

Using again inequality (15), and the fact that $-\dot{\varepsilon}(t)\geq 0$ , we have

$-\frac{1}{2}\dot{\varepsilon}(t)\|x_{\varepsilon(t)}\|^{2}\leq-\frac{1}{2}\dot{\varepsilon}(t)\|x^{*}\|^{2}.$

By inserting the two inequalities above in (32), we obtain

\begin{array}[]{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\dfrac{1}{2\varepsilon(t)}\left(-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)\right)\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)\\ &&+\dfrac{1}{2}\left[(b+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\|x^{*}\|^{2}\\ &&+\dfrac{1}{4\varepsilon(t)}\left[2\left((1+\frac{1}{a})\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)\right]\|\dot{x}(t)\|^{2}\\ &&+\dfrac{\lambda}{4}\left[\lambda\dot{\varepsilon}(t)+2\left(\delta\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)+2\frac{\lambda^{2}}{b}\varepsilon^{\frac{3}{2}}(t)\right]\|x(t)-x_{\varepsilon(t)}\|^{2}.\end{array}

(33)

By taking $b=c\lambda,$ with $c>1,$ we get

\begin{array}[]{lll}\dot{E}(t)+\mu(t)E(t)&\leq&\dfrac{1}{2\varepsilon(t)}\underbrace{\left(-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)\right)}_{=A}\left(\varphi_{t}(x(t))-\varphi_{t}(x_{\varepsilon(t)})\right)\\ &&+\dfrac{1}{2}\left[(\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\|x^{*}\|^{2}\\ &&+\dfrac{1}{4\varepsilon(t)}\underbrace{\left[2\left((1+\frac{1}{a})\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)\right]}_{=B}\|\dot{x}(t)\|^{2}\\ &&+\dfrac{\lambda}{4}\underbrace{\left[\lambda\dot{\varepsilon}(t)+2\left((\delta+\frac{1}{c})\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)\right]}_{=C}\|x(t)-x_{\varepsilon(t)}\|^{2}.\end{array}

(34)

We are looking for sufficient conditions on the parameters $\lambda$ and $\varepsilon(t)$ which make $A$ , $B$ , and $C$ less than or equal to zero. It is here that the hypothesis $(\mathcal{H}_{1})$ formulated in the statement of the Theorem 2.1 intervenes. It is recalled below for the convenience of the reader

$(\mathcal{H}_{1})$ There exists $a,c>1$ and $t_{1}\geq t_{0}$ such that for all $t\geq t_{1},$

\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon}(t)}\right)\leq\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right),

where $\lambda$ is such that $\frac{\delta}{2}<\lambda<\frac{a}{a+1}\delta$ for $0<\delta\leq 2-\frac{1}{c}$ and $\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)<\lambda<\frac{a}{a+1}\delta$ for $\delta>2-\frac{1}{c}.$

Clearly, condition $(\mathcal{H}_{1})$ is equivalent to

\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon}(t)}\right)\leq 2\lambda-\delta\quad and\quad\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon}(t)}\right)\leq\delta-\frac{a+1}{a}\lambda.

Under condition $(\mathcal{H}_{1})$ we immediately obtain that $A$ and $B$ are less than or equal to zero:

$\bullet$ $A=-\dot{\varepsilon}(t)+2(\delta-2\lambda)\varepsilon^{\frac{3}{2}}(t)=2\varepsilon^{\frac{3}{2}}(t)\left[\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)+\delta-2\lambda\right]\leq 0;$

$\bullet$ $B=2\left((1+\frac{1}{a})\lambda-\delta\right)\varepsilon^{\frac{3}{2}}(t)-\dot{\varepsilon}(t)=2\varepsilon^{\frac{3}{2}}(t)\left[\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)+\dfrac{a+1}{a}\lambda-\delta\right]\leq 0.$

$\bullet$ Let us now examine $C$ .

\begin{array}[]{lll}C&=&\lambda\dot{\varepsilon}(t)+2\left((\delta+\frac{1}{c})\lambda-\lambda^{2}-1\right)\varepsilon^{\frac{3}{2}}(t)\\ &=&2\varepsilon^{\frac{3}{2}}(t)\left[-\lambda\underbrace{\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)}_{\geq 0}+\left((\delta+\frac{1}{c})\lambda-\lambda^{2}-1\right)\right].\end{array}

When $\delta\leq 2-\frac{1}{c}$ we have $(\delta+\frac{1}{c})\lambda-\lambda^{2}-1\leq 2\lambda-\lambda^{2}-1\leq 0.$
When $\delta>2-\frac{1}{c},$ we have $(\delta+\frac{1}{c})\lambda-\lambda^{2}-1\leq 0,$ because $\lambda\geq\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)$ .
This implies that $C\leq 0.$

According to (34), under condition $(\mathcal{H}_{1})$ we conclude that

\dot{E}(t)+\mu(t)E(t)\leq\frac{1}{2}\left[(\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\|x^{*}\|^{2}

(35)

By multiplying the inequality above with $\gamma(t)=\exp\left(\displaystyle\int_{t_{1}}^{t}\mu(s)ds\right),$ we obtain

\dfrac{d}{dt}\left(\gamma(t)E(t)\right)\leq\dfrac{\|x^{*}\|^{2}}{2}\left[(\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(t)}{\varepsilon^{\frac{3}{2}}(t)}-\dot{\varepsilon}(t)\right]\gamma(t)

(36)

By integrating (36) on $[t_{1},t]$ , we get

E(t)\leq\dfrac{\|x^{*}\|^{2}}{2}\dfrac{\displaystyle\int_{t_{1}}^{t}\left[\left((\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(s)}{\varepsilon^{\frac{3}{2}}(s)}-\dot{\varepsilon}(s)\right)\gamma(s)\right]ds}{\gamma(t)}+\dfrac{\gamma(t_{1})E(t_{1})}{\gamma(t)}.

This completes the proof of the Lyapunov analysis.

Remark 1

Given $\delta>0$ , the condition $(\mathcal{H}_{1})$ gives the admissible values of the parameters $a>1,\;c>1$ and $\lambda>0$ which enter into the convergence rates obtained in Theorem 2.1. Let us verify that the inequalities that define the values of these parameters are consistent. We consider successively the two cases $\delta<2$ , then $\delta\geq 2$ .

a) When $\delta<2$ , we have $\delta<2-\frac{1}{c}$ for $c$ sufficiently large. Because $a>1$ , we have $\frac{1}{2}<\frac{a}{a+1}$ , and hence the interval $[\frac{1}{2}\delta,\frac{a}{a+1}\delta]$ is nonempty. Therefore, in this case the conditions are consistent.

b) Suppose now that $\delta\geq 2$ . Then $\delta>2-\frac{1}{c}$ for all $c>0$ , and we can argue with $c$ arbitrarily large. Let us verify that we can find $a$ and $c$ such that

\frac{\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}}{2\;}<\frac{a}{a+1}\delta,

(37)

and hence a value of $\delta$ belonging to the corresponding interval. By letting $c\to+\infty$ and $a\to+\infty$ in the above inequality we obtain

\delta+\sqrt{\delta^{2}-4}<2\delta,

which is equivalent to $\sqrt{\delta^{2}-4}<\delta$ , and hence is satisfied. By a continuity argument, we obtain that the inequality (37) is satisfied by taking $a$ and $c$ sufficiently large.
Note that, since we are interested in asymptotic results the important point is to get the existence of parameters for which the Lyapunov analysis is valid. If we are interested in complexity results then the precise value of these parameters is important.

Remark 2

The above argument shows that the controlled decay property ${\rm(CD)}_{\lambda}$ used in AL corresponds to the limiting case $c\to+\infty$ and $a\to+\infty$ in our condition $(\mathcal{H}_{1})$ .

Remark 3

As in AL , our Lyapunov analysis is valid for an arbitrary choice of the parameter $\delta$ . It would be interesting to know what is the best choice for $\delta$ .

3 Particular cases

Let’s study the case $\varepsilon(t)=\dfrac{1}{t^{p}}$ , and discuss, according to the value of the parameter $0<p<2$ , the convergence rate of values, and the convergence rate to zero of $\|x(t)-x_{\epsilon(t)}\|$ . The following results were stated in Theorem 1.3, in the introduction, as model results. We reproduce them here for the convenience of the reader. The point is simply to apply the general Theorem 2.1 to this particular situation, and to show that the different quantities involved in the convergence results can be computed explicitly.

Theorem 3.1

Take $\varepsilon(t)=\displaystyle\frac{1}{t^{p}}$ , $0<p<2$ . Let $x:[t_{0},+\infty[\to\mathcal{H}$ be a solution trajectory of

\ddot{x}(t)+\frac{\delta}{\displaystyle{t^{\frac{p}{2}}}}\dot{x}(t)+\nabla f\left(x(t)\right)+\frac{1}{t^{p}}x(t)=0.

Then, we have convergence of the values, and strong convergence to the minimum norm solution with the following rates:

	$\displaystyle f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle\frac{1}{t^{p}}\right)\mbox{ as }\;t\to+\infty;$		(38)
	$\displaystyle\\|x(t)-x_{\varepsilon(t)}\\|^{2}=\mathcal{O}\left(\dfrac{1}{t^{\frac{2-p}{2}}}\right)\mbox{ as }\;t\to+\infty.$		(39)

Proof

a) Convergence rate of the values: With the notations of Theorem 2.1, we have

\mu(t)=-\dfrac{\dot{\varepsilon}(t)}{2\varepsilon(t)}+(\delta-\lambda)\sqrt{\varepsilon(t)}=\dfrac{p}{2t}+\dfrac{\delta-\lambda}{t^{\frac{p}{2}}}.

	$\displaystyle\gamma(t)$	$\displaystyle=$	$\displaystyle\exp\left(\displaystyle\int_{t_{1}}^{t}\mu(s)ds\right)=\left(\dfrac{t}{t_{1}}\right)^{\frac{p}{2}}\exp\left[\frac{2(\delta-\lambda)}{2-p}\left(t^{\frac{2-p}{2}}-t_{1}^{\frac{2-p}{2}}\right)\right]$		(40)
		$\displaystyle=$	$\displaystyle C_{1}t^{\frac{p}{2}}\exp\left[\frac{2(\delta-\lambda)}{2-p}t^{\frac{2-p}{2}}\right]$		(40)

where $C_{1}=\left(t_{1}^{\frac{p}{2}}\exp\left[\frac{2(\delta-\lambda)}{2-p}t_{1}^{\frac{2-p}{2}}\right]\right)^{-1}.$ Let us choose the parameters $a,c>1$ , $\lambda>0$ such that

$\frac{\delta}{2}<\lambda<\frac{a}{a+1}\delta$ for $0<\delta\leq 2-\frac{1}{c}$ and $\frac{\delta}{2}<\frac{1}{2}\left(\delta+\frac{1}{c}+\sqrt{(\delta+\frac{1}{c})^{2}-4}\right)<\lambda<\frac{a}{a+1}\delta$ for $\delta>2-\frac{1}{c}$ .

Notice that, in these two cases, we have $\frac{1}{2}\delta<\lambda<\frac{a+1}{a}\delta$ . Therefore, $\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right)>0$ . On the other hand, since $p<2$ , we have $\lim_{t\to+\infty}t^{\frac{p-2}{2}}=0$ . So we have, for $t\geq t_{1}$ large enough,

\dfrac{d}{dt}\left(\dfrac{1}{\sqrt{\varepsilon(t)}}\right)=\frac{p}{2}t^{\frac{p-2}{2}}\leq\min\left(2\lambda-\delta\;,\;\delta-\frac{a+1}{a}\lambda\right).

As a consequence, the condition $(\mathcal{H}_{1})$ is satisfied.

According to (24), we have $E(t)\leq E_{1}(t)\|x^{*}\|^{2}+E_{2}(t)$ where

E_{1}(t)=\dfrac{1}{2\gamma(t)}\displaystyle{\int_{t_{1}}^{t}\left[\left((\lambda c+a)\lambda\dfrac{\dot{\varepsilon}^{2}(s)}{\varepsilon^{\frac{3}{2}}(s)}-\dot{\varepsilon}(s)\right)\gamma(s)\right]ds}

(41)

and

E_{2}(t)=\dfrac{\gamma(t_{1})E(t_{1})}{\gamma(t)}.

(42)

According to (40) we have

E_{2}(t)\leq Ct^{-\frac{p}{2}}\exp\left[-\frac{2(\delta-\lambda)}{2-p}t^{\frac{2-p}{2}}\right].

Since $0<p<2$ and $\delta>\lambda$ , we have that $E_{2}(t)$ tends to zero at an exponential rate, as $t\to+\infty$ .
Thus, we only have to focus on the asymptotic behavior of $E_{1}(t)$ . Let us simplify the formula by setting $\lambda_{0}:=(\lambda c+a)\lambda,\;\delta_{0}:=\dfrac{2(\delta-\lambda)}{2-p}$ in (24). Replacing $\varepsilon(t)$ and $\gamma(t)$ by their values in (41) gives

E_{1}(t)=\dfrac{p}{2t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\displaystyle\int_{t_{1}}^{t}\left(\dfrac{\lambda_{0}p}{s^{2}}+\frac{1}{s^{\frac{p+2}{2}}}\right)\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)ds.

To compute the above integral, we notice that

\dfrac{d}{ds}\left(\dfrac{1}{\rho s}\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)\right)=\left(-\dfrac{1}{\rho s^{2}}+\dfrac{\delta_{0}(2-p)}{2\rho s^{\frac{p+2}{2}}}\right)\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right).

Then, note that, when we set $\rho>0$ such that $\rho<\min\left(1\;,\;\frac{1}{a+1}\delta\right)$ , we obtain

\begin{array}[]{lll}\dfrac{\lambda_{0}p}{s^{2}}+\dfrac{1}{s^{\frac{p+2}{2}}}\leq-\dfrac{1}{\rho s^{2}}+\dfrac{\delta_{0}(2-p)}{2\rho s^{\frac{p+2}{2}}}&\Longleftrightarrow&\dfrac{\lambda_{0}p+\frac{1}{\rho}}{s^{2}}\leq\left(\dfrac{\delta_{0}(2-p)}{2\rho}-1\right)\dfrac{1}{s^{\frac{p+2}{2}}}=\dfrac{\frac{\delta-\lambda}{\rho}-1}{s^{\frac{p+2}{2}}}\\ &\Longleftrightarrow&\dfrac{\lambda_{0}p+\frac{1}{\rho}}{s^{\frac{2-p}{2}}}\leq\dfrac{\delta-\lambda}{\rho}-1.\end{array}

Let us verify that the last above inequality is satisfied for $s$ large enough. First, since $0<p<2$ , we have $\frac{2-p}{2}>0$ , and hence $\lim_{s\to+\infty}\dfrac{1}{s^{\frac{2-p}{2}}}=0$ . On the other hand, according to $(\mathcal{H}_{1})$ we have $\lambda<\frac{a}{a+1}\delta$ . This property combined with the choice of $\rho$ implies

\delta-\rho>\delta-\frac{1}{a+1}\delta=\frac{a}{a+1}\delta>\lambda.

Therefore $\dfrac{\delta-\lambda}{\rho}-1>0$ . So, for $t_{1}$ large enough

\begin{array}[]{lll}E_{1}(t)&\leq&\dfrac{p}{2t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\displaystyle\int_{t_{1}}^{t}\left(-\dfrac{1}{\rho s^{2}}+\dfrac{\delta_{0}(2-p)}{2\rho s^{\frac{p+2}{2}}}\right)\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)ds\\ &=&\dfrac{1}{2t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\displaystyle\int_{t_{1}}^{t}\dfrac{d}{ds}\left(\dfrac{1}{\rho s}\exp\left(\delta_{0}s^{\frac{2-p}{2}}\right)\right)ds\\ &=&\dfrac{p}{2\rho t^{\frac{p+2}{2}}}-\dfrac{p}{t^{\frac{p}{2}}\exp\left(\delta_{0}t^{\frac{2-p}{2}}\right)}\dfrac{1}{2\rho t_{1}}\exp\left(\delta_{0}t_{1}^{\frac{2-p}{2}}\right)\leq\dfrac{p}{2\rho t^{\frac{p+2}{2}}}.\end{array}

Since $E_{2}(t)$ has an exponential decay to zero, we deduce that there exists a positive constant $C$ such that for $t$ large enough

E(t)\leq\dfrac{C}{t^{\frac{p+2}{2}}}.

According to Lemma 1, we get

f(x(t))-\min_{\mathcal{H}}f\leq C\left(\dfrac{1}{t^{\frac{p+2}{2}}}+\frac{1}{t^{p}}\right).

Since $0<p<2,$ we have $p<\frac{p+2}{2}$ . We conclude that

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right)\mbox{ as }\;t\to+\infty.

b) Convergence rate to zero of $\|x(t)-x^{*}\|$ . According to Lemma 1, we have

\|x(t)-x_{\varepsilon(t)}\|^{2}\leq\frac{2E(t)}{\varepsilon(t)},

(43)

and since, for $t$ large enough, $E(t)\leq\dfrac{C}{t^{\frac{p+2}{2}}}$ , we obtain

\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\dfrac{1}{t^{\frac{2-p}{2}}}\right),

which completes the proof.

Remark 4

The convergence rate of the values $\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right)$ obtained in Theorem 3.1 notably improves the result obtained in AL , where the convergence rate was of order $\mathcal{O}\left(\dfrac{1}{t^{\frac{3p-2}{2}}}\right)$ . Indeed, for $p<2$ we have $p>\frac{3p-2}{2}$ . In addition, we have obtained that for any $0<p<2$ , for any trajectory of (TRIGS), we have strong convergence of the trajectory to the minimum norm solution, as time $t$ tends toward $+\infty$ . In AL it was only obtained $\liminf_{t\to+\infty}\|x(t)-x^{*}\|=0$ .

Remark 5

A close look at the proof of Theorem 3.1 shows that the convergence rate of the values is still valid in the case $p=2$ . Precisely, by taking $\varepsilon(t)=\frac{c}{t^{2}}$ with $c$ sufficiently small, we have that the condition $(\mathcal{H}_{1})$ is satisfied, and hence

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle\frac{1}{t^{2}}\right)\mbox{ as }\;t\to+\infty.

Remark 6

As a key ingredient of our proof of the strong convergence of the trajectories of (TRIGS) to the element of minimum norm of $S$ , we use the function $h(t):=\frac{1}{2}\|x(t)-x_{\varepsilon(t)}\|^{2},$ and show that $\lim_{t\to+\infty}h(t)=0$ . This strategy which consists in showing that the trajectory is not too far from the viscosity curve $\varepsilon\mapsto x_{\varepsilon}$ was already present in the approach developed by Attouch and Cominetti in AttCom for the study of similar questions in the case of the steepest descent method.

3.1 Trade-off between the convergence rate of values and trajectories

The following elementary diagram shows the respective evolution as $p$ varies of the convergence rate of the values, and the convergence rate of $\|x(t)-x_{\varepsilon(t)}\|^{2}$ .

We observe that $p=\frac{2}{3}$ is a good compromise between these two antagonist properties. Let us state the corresponding result below.

Corollary 1

Take $\varepsilon(t)=\displaystyle{\frac{1}{t^{\frac{2}{3}}}}$ , $\delta>0$ . Let $x:[t_{0},+\infty[\to\mathcal{H}$ be a solution trajectory of

\ddot{x}(t)+\displaystyle{\frac{\delta}{t^{\frac{1}{3}}}}\dot{x}(t)+\nabla f\left(x(t)\right)+\displaystyle{\frac{1}{t^{\frac{2}{3}}}}x(t)=0.

Then, we have convergence of the values, and strong convergence to the minimum norm solution with the following rates:

f(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle{\frac{1}{t^{\frac{2}{3}}}}\right)\mbox{ and }\;\;\|x(t)-x_{\varepsilon(t)}\|^{2}=\mathcal{O}\left(\displaystyle{\frac{1}{t^{\frac{2}{3}}}}\right)\mbox{ as }\;t\to+\infty.

b) Another interesting case is to take $p<2$ , with $p$ close to $2$ . In this case, we have a convergence rate of the values which is arbitrarily close to the convergence rate of the accelerated gradient method of Nesterov, with a guarantee of strong convergence towards the minimum norm solution. The case $p=2$ has been studied extensively in AL . The strong convergence of the trajectories to the minimum norm solution is an open question in the case $p=2$ .

c) Estimating the convergence rate of $x(t)$ to $x^{*}$ relies on getting informations about the viscosity trajectory $\epsilon\mapsto x_{\varepsilon}$ , and how fast $x_{\varepsilon}$ converges to $x^{*}$ as $\varepsilon\to 0$ . This is a difficult problem because the viscosity trajectory can have an infinite length, as Torralba showed in Torralba . His counterexample involves the construction of a convex function from its sub-level sets, and relates to a convex function whose sub-level sets vary greatly. Our analysis focuses on general $f$ , i.e. the worst case. This suggests that, under good geometric properties of $f$ , such as the Kurdyka-Lojasiewicz property, one should be able to obtain better results; see also AttCom where it is shown the importance of this kind of question for the coupling of the steepest descent with Tikhonov approximation.

4 Non-smooth case

Let us extend the previous results to the case of a proper lower semicontinuous and convex function $f:{\mathcal{H}}\to{\mathbb{R}}\cup\left\{+\infty\right\}$ . We rely on the basic properties of the Moreau envelope. Recall that, for any $\theta>0$ , $f_{\theta}:{\mathcal{H}}\to{\mathbb{R}}$ is defined by

f_{\theta}(x)=\min_{\xi\in{\mathcal{H}}}\left\{f(\xi)+\frac{1}{2\theta}\|x-\xi\|^{2}\right\},\quad\text{for any $x\in{\mathcal{H}}$.}

(44)

Then $f_{\theta}$ is a convex differentiable function, whose gradient is $\theta^{-1}$ -Lipschitz continuous, and such that $\min_{{\mathcal{H}}}f=\min_{{\mathcal{H}}}f_{\theta}$ , $\operatorname{argmin}_{{\mathcal{H}}}f_{\theta}=\operatorname{argmin}_{{\mathcal{H}}}f$ . Denoting by $\operatorname{prox}_{\theta f}(x)$ the unique point where the minimum value is achieved in (44), let us recall the following classical formulae:

1.

$f_{\theta}(x)=f(\operatorname{prox}_{\theta f}(x))+\frac{1}{2\theta}\|x-\operatorname{prox}_{\theta f}(x)\|^{2}$ .
2.

$\nabla f_{\theta}(x)=\frac{1}{\theta}\left(x-\operatorname{prox}_{\theta f}(x)\right)$ .

The interested reader may refer to BC ; Bre1 for a comprehensive treatment of the Moreau envelope in a Hilbert setting. Since the set of minimizers is preserved by taking the Moreau envelope, the idea is to replace $f$ by $f_{\theta}$ in the inertial dynamic (TRIGS). Then, (TRIGS) applied to $f_{\theta}$ now reads

\mbox{(TRIGS)}_{\theta}\quad\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f_{\theta}(x(t))+\varepsilon(t)x(t)=0.

Clearly, since $f_{\theta}$ is continuously differentiable, the hypothesis $(\mathcal{H}_{0})$ are satisfied by the above dynamic. By applying Theorem 3.1 to $\mbox{(TRIGS)}_{\theta}$ , we get the following result, which provides convergence rate of the values and strong convergence to the minimum norm solution.

Theorem 4.1

Let $f:{\mathcal{H}}\to{\mathbb{R}}\cup\left\{+\infty\right\}$ be a convex, lower semicontinuous, proper function. Take $\varepsilon(t)=1/t^{p}$ , $0<p<2$ , and $\theta>0$ . Let $x:[t_{0},+\infty[\to\mathcal{H}$ be a solution trajectory of $\mbox{\rm(TRIGS)}_{\theta}$ , i.e.

\ddot{x}(t)+\frac{\delta}{t^{\frac{p}{2}}}\dot{x}(t)+\nabla f_{\theta}\left(x(t)\right)+\frac{1}{t^{p}}x(t)=0.

Then, we have the following convergence rates: as $t\to+\infty$

	$\displaystyle f({\operatorname{prox}}_{\theta f}(x(t)))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right);$		(45)
	$\displaystyle\\|x(t)-{\operatorname{prox}}_{\theta f}(x(t)))\\|^{2}=\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right);$		(46)
	$\displaystyle\\|x(t)-x_{\varepsilon(t)}\\|^{2}=\mathcal{O}\left(\dfrac{1}{t^{\frac{2-p}{2}}}\right).$		(47)

Proof

By applying Theorem 3.1 to the function $f_{\theta}$ , and since $\inf f_{\theta}=\inf f$ , we get

	$\displaystyle f_{\theta}(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right)\mbox{ as }\;t\to+\infty$		(48)
	$\displaystyle\\|x(t)-x_{\varepsilon(t)}\\|^{2}=\mathcal{O}\left(\dfrac{1}{t^{\frac{2-p}{2}}}\right)\mbox{ as }\;t\to+\infty.$		(49)

According to $f_{\theta}(x)-\min_{{\mathcal{H}}}f=\Big{(}f(\operatorname{prox}_{\theta f}(x))-\min_{{\mathcal{H}}}f\Big{)}+\frac{1}{2\theta}\|x-\operatorname{prox}_{\theta f}(x)\|^{2}$ , we get

f({\operatorname{prox}}_{\theta f}(x(t)))-\min_{{\mathcal{H}}}f\leq f_{\theta}(x(t))-\min_{{\mathcal{H}}}f=\mathcal{O}\left(\displaystyle{\frac{1}{t^{p}}}\right),

which gives the claims.

Remark 7

The above result suggests that, in the case of a nonsmooth convex function $f:{\mathcal{H}}\to{\mathbb{R}}\cup\left\{+\infty\right\}$ , the corresponding proximal algorithms will inherit the convergence properties of the continuous dynamic (TRIALS). When considering convex minimization problems with additive structure $\min\{f+g\}$ with $f$ smooth and $g$ nonsmooth, it is in general difficult to compute the proximal mapping of $f+g$ . A common device then consists of using a splitting method, and writing the minimization problem as the fixed point problem $Tx=x$ where, given $\theta>0$

Tx=\mbox{prox}_{\theta g}\left(x-\theta\nabla f(x)\right).

Under appropriate conditions, $T$ is an averaged nonexpansive operator BC , so the associated iterative method (proximal gradient method) converges to a fixed point of $T$ , and therefore of the initial minimization problem. In our context, this naturally leads to studying the inertial system

\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+(I-T)(x(t))+\varepsilon(t)x(t)=0.

Many properties of the Tikhonov approximation are still valid for maximally monotone operators, which allows to expect good convergence properties for the above system. This is a subject for further research.

5 Numerical illustration

Let us illustrate our results in the following elementary situation. Take ${\mathcal{H}}={\mathbb{R}}^{20}$ equipped with the classical Euclidean structure, and $f:{\mathbb{R}}^{20}\to{\mathbb{R}}^{+}$ is given by

f(x_{1},\cdots,x_{20}):=\dfrac{1}{2}\displaystyle\sum_{i=1}^{10}\left(x_{2i-1}+x_{2i}-1\right)^{2}.

The function $f$ is convex, but not strongly convex. In this case, the solution set $S$ is the entire affine subspace $\{x\in{\mathbb{R}}^{20}:x_{2i-1}+x_{2i}-1=0,\text{ for all }i=1,\cdots,10\}$ , and the projection of the origin onto $S$ is given by $x^{*}=(\frac{1}{2},\cdots,\frac{1}{2})$ .

Refer to caption — Figure 1: (TRIGS) with $f(x):=\frac{1}{2}\sum_{i=1}^{10}\left(x_{2i-1}+x_{2i}-1\right)^{2}$ . Convergence rates of values $f(x(t))-f(x^{*})$ (top), of $\|x(t)-x^{*}\|_{2}$ (middle), of $\|x(t)-x_{\varepsilon(t)}\|_{2}$ (below)

The numerical experiments described in Figure 1 are in agreement with our theoretical findings. They show the trade-off between the convergence rate of values $f(x(t))-f(x^{*})$ and trajectories $\|x(t)-x_{\varepsilon(t)})\|_{2}$ , and that taken around $p=\frac{2}{3}$ is a good compromise. We limit our illustration to the case $0<p<1$ . It is the most interesting case for obtaining good convergence rate of the trajectories towards the minimum norm solution. We also notice that the regularization Tikhonov’s term $\frac{1}{t^{p}}x(t)$ in the system (TRIGS) reduces the oscillations. This suggests introducing the Hessian driven damping into these dynamics to further dampen oscillations, see ACFR , APR , BCL and references therein. This is related to the notion of strong damping for PDE’s.

6 Conclusion, perspective

In the general framework of convex optimization in Hilbert spaces, we have introduced a damped inertial dynamic which generates trajectories rapidly converging towards the minimum norm solution. We obtained these results by removing restrictive assumptions concerning the convergence of trajectories, made in previous works. This seems to be the first time that these two properties have been obtained for the same inertial dynamic. We have developed an in-depth mathematical Lyapunov analysis of the dynamic which is a valuable tool for the development of corresponding results for algorithms obtained by temporal discretization. Precisely, the corresponding algorithmic study must be the subject of a subsequent study. Many interesting questions such as the introduction of Hessian-driven damping to attenuate oscillations, and the study of the impact of error disturbances, merit further study. These results also adapt well to inverse problems for which strong convergence of trajectories, and obtaining a solution close to a desired state are key properties. It is likely that a parallel approach can be developed for constrained optimization, in multiobjective optimization for the dynamical approach to Pareto optima, and within the framework of potential games. The Lyapunov analysis developed in this paper could also be very useful to study the asymptotic stabilization of several classes of PDE’s, for example nonlinear damped wave equations.

Appendix A Auxiliary results

A.1 Existence and uniqueness for the Cauchy problem, energy estimates

Let us first show that the Cauchy problem for (TRIGS) is well posed. The proof relies on classical arguments combining the Cauchy-Lipschitz theorem with energy estimates. The following proof has been given in AL . We reproduce it for the convenience of the reader, and give supplementary energy estimates.

Theorem A.1

Let us make the assumptions $(\mathcal{H}_{0})$ on $f$ and $\varepsilon$ . Then, given $(x_{0},v_{0})\in{\mathcal{H}}\times{\mathcal{H}}$ , there exists a unique global classical solution $x:[t_{0},+\infty[\to\mathcal{H}$ of the Cauchy problem

\displaystyle\begin{cases}\ddot{x}(t)+\delta\sqrt{\varepsilon(t)}\dot{x}(t)+\nabla f\left(x(t)\right)+\epsilon(t)x(t)=0\vspace{1mm}\\ x(t_{0})=x_{0},\,\dot{x}(t_{0})=v_{0}.\end{cases}

(50)

In addition, the global energy function $t\mapsto W(t)$ is decreasing where

W(t):=\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\epsilon(t)\|x(t)\|^{2},

and we have the energy estimate

\int_{t_{0}}^{+\infty}\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}dt<+\infty.

(51)

Proof

Consider the Hamiltonian formulation of (50), which gives the first order system

\displaystyle\begin{cases}\dot{x}(t)-y(t)=0\vspace{1mm}\\ \dot{y}(t)+\delta\sqrt{\varepsilon(t)}y(t)+\nabla f\left(x(t)\right)+\epsilon(t)x(t)=0\vspace{1mm}\\ x(t_{0})=x_{0},\,y(t_{0})=v_{0}.\end{cases}

(52)

According to the hypothesis $(\mathcal{H}_{0})$ , and by applying the Cauchy-Lipschitz theorem in the locally Lipschitz case, we obtain the existence and uniqueness of a local solution of the Cauchy problem (52). Then, in order to pass from a local solution to a global solution, we use energy estimates. By taking the scalar product of (TRIGS) with $\dot{x}(t)$ , we obtain

\frac{d}{dt}\Big{(}\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\epsilon(t)\|x(t)\|^{2})\Big{)}+\delta\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}-\frac{1}{2}\dot{\epsilon}(t)\|x(t)\|^{2}=0.

(53)

According to $(H_{3})$ , the function $\epsilon(\cdot)$ is non-increasing. Therefore, the energy function $t\mapsto W(t)$ is decreasing where $W(t):=\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\epsilon(t)\|x(t)\|^{2}.$ The end of the proof follows a standard argument. Take a maximal solution defined on an interval $[t_{0},T[$ . If $T$ is infinite, the proof is over. Otherwise, if $T$ is finite, according to the above energy estimate, we have that $\|\dot{x}(t)\|$ remains bounded, just like $\|x(t)\|$ and $\|\ddot{x}(t)\|$ (use (TRIGS)). Therefore, the limit of $x(t)$ and $\dot{x}(t)$ exists when $t\to T$ . Applying the local existence result at $T$ with the initial conditions thus obtained gives a contradiction to the maximality of the solution.
Let us complete the proof with the energy estimates. Returning to (53), we get

\frac{d}{dt}\Big{(}\frac{1}{2}\|\dot{x}(t)\|^{2}+f(x(t))+\frac{1}{2}\epsilon(t)\|x(t)\|^{2})\Big{)}+\delta\sqrt{\varepsilon(t)}\|\dot{x}(t)\|^{2}\leq 0.

(54)

After integration of (54), we get (51).

References

(1) F. Alvarez, H. Attouch, Convergence and asymptotic stabilization for some damped hyperbolic equations with non-isolated equilibria, ESAIM Control Optim. Calc. Var. 6 (2001), 539–552.
(2) F. Alvarez, A. Cabot, Asymptotic selection of viscosity equilibria of semilinear evolution equations by the introduction of a slowly vanishing term, Discrete Contin. Dyn. Syst. 15 (2006), 921–938.
(3) V. Apidopoulos, J.-F. Aujol, Ch. Dossal, The differential inclusion modeling the FISTA algorithm and optimality of convergence rate in the case $b\leq 3$ , SIAM J. Optim., 28(1) (2018), 551—574.
(4) H. Attouch, Variational convergence for functions and operators, Applicable Mathematics Series, Pitman Advanced Publishing Program, 1984.
(5) H. Attouch, Viscosity solutions of minimization problems, SIAM J. Optim. 6 (3) (1996), 769–806.
(6) H. Attouch, R.I. Boţ, E.R. Csetnek, Fast optimization via inertial dynamics with closed-loop damping, Journal of the European Mathematical Society (JEMS), 2021, hal-02910307.
(7) H. Attouch, L.M. Briceño-Arias, P.L. Combettes, A parallel splitting method for coupled monotone inclusions, SIAM J. Control Optim. 48 (5) (2010), 3246–3270.
(8) H. Attouch, L.M. Briceño-Arias, P.L. Combettes, A strongly convergent primal-dual method for nonoverlapping domain decomposition, Numerische Mathematik, 133(3) (2016), 443–470.
(9) H. Attouch, A. Cabot, Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity, J. Differential Equations, 263 (9), (2017), 5412–5458.
(10) H. Attouch, Z. Chbani, J. Fadili, H. Riahi, First order optimization algorithms via inertial systems with Hessian driven damping, Math. Program. (2020), https://doi.org/10.1007/s10107-020-01591-1.
(11) H. Attouch, Z. Chbani, J. Peypouquet, P. Redont, Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity, Mathematical Programming, 168 (1-2) (2018), 123–175.
(12) H. Attouch, Z. Chbani, H. Riahi, Combining fast inertial dynamics for convex optimization with Tikhonov regularization, J. Math. Anal. Appl, 457 (2018), 1065–1094.
(13) H. Attouch, R. Cominetti, A dynamical approach to convex minimization coupling approximation with the steepest descent method, J. Differential Equations, 128 (2) (1996), 519–540.
(14) H. Attouch, M.-O. Czarnecki, Asymptotic control and stabilization of nonlinear oscillators with non-isolated equilibria, J. Differential Equations 179 (2002), 278–310.
(15) H. Attouch, M.-O. Czarnecki, Asymptotic behavior of coupled dynamical systems with multiscale aspects, J. Differential Equations 248 (2010), 1315–1344.
(16) H. Attouch, M.-O. Czarnecki, J. Peypouquet, Prox-penalization and splitting methods for constrained variational problems, SIAM J. Optim. 21 (2011), 149–173.
(17) H. Attouch, M.-O. Czarnecki, J. Peypouquet, Coupling forward-backward with penalty schemes and parallel splitting for constrained variational inequalities, SIAM J. Optim. 21 (2011), 1251–1274.
(18) H. Attouch, M.-O. Czarnecki, Asymptotic behavior of gradient-like dynamical systems involving inertia and multiscale aspects, J. Differential Equations, 262 (3) (2017), 2745–2770.
(19) H. Attouch, S. László, Convex optimization via inertial algorithms with vanishing Tikhonov regularization: fast convergence to the minimum norm solution, arXiv:2104.11987v1 [math.OC] 24 Apr 2021.
(20) H. Attouch, J. Peypouquet, The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than $1/k^{2}$ , SIAM J. Optim., 26(3) (2016), pp. 1824–1834.
(21) H. Attouch, J. Peypouquet, P. Redont, Fast convex minimization via inertial dynamics with Hessian driven damping, J. Differential Equations, 261(10), (2016), 5734–5783.
(22) J.-B. Baillon, R. Cominetti, A convergence result for non-autonomous subgradient evolution equations and its application to the steepest descent exponential penalty trajectory in linear programming, J. Funct. Anal. 187 (2001) 263-273.
(23) H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert spaces, CMS Books in Mathematics, Springer, (2011).
(24) R. I. Bot, E. R. Csetnek, Forward-Backward and Tseng’s type penalty schemes for monotone inclusion problems, Set-Valued Var. Anal. 22 (2014), 313–331.
(25) R. I. Boţ, E. R. Csetnek, S.C. László, Tikhonov regularization of a second order dynamical system with Hessian damping, Math. Program. (2020), https://doi.org/10.1007/s10107-020-01528-8.
(26) H. Brézis, Opérateurs maximaux monotones dans les espaces de Hilbert et équations d’évolution, Lecture Notes 5, North Holland, (1972).
(27) A. Cabot, Inertial gradient-like dynamical system controlled by a stabilizing term, J. Optim. Theory Appl. 120 (2004) 275–303.
(28) A. Cabot, Proximal point algorithm controlled by a slowly vanishing term: Applications to hierarchical minimization, SIAM J. Optim. 15 (2) (2005), 555–572.
(29) A. Cabot, H. Engler, S. Gadat, On the long time behavior of second order differential equations with asymptotically small dissipation, Trans. Amer. Math. Soc. 361 (2009), 5983–6017.
(30) A. Chambolle, Ch. Dossal, On the convergence of the iterates of Fista, J. Opt. Theory Appl., 166 (2015), 968–982.
(31) R. Cominetti, Coupling the proximal point algorithm with approximation methods, J. Optim. Theory Appl. 95 (3) (1997), 581–600.
(32) R. Cominetti, J. Peypouquet, S. Sorin, Strong asymptotic convergence of evolution equations governed by maximal monotone operators with Tikhonov regularization, J. Differential Equations, 245 (2008), 3753–3763.
(33) A. Fiacco, G. McCormick, Nonlinear programming: Sequential Unconstrained Minimization Techniques, John Wiley and Sons, New York, (1968).
(34) A. Haraux, M.A. Jendoubi, A Liapunov function approach to the stabilization of second-order coupled systems, (2016) arXiv preprint arXiv:1604.06547.
(35) S.A. Hirstoaga, Approximation et résolution de problèmes d’équilibre, de point fixe et d’inclusion monotone. PhD thesis, Université Pierre et Marie Curie - Paris VI, 2006, HAL Id: tel-00137228.
(36) M.A. Jendoubi, R. May, On an asymptotically autonomous system with Tikhonov type regularizing term, Archiv der Mathematik 95 (4) (2010), 389–399.
(37) Y. Nesterov, A method of solving a convex programming problem with convergence rate $O(1/k^{2})$ , Soviet Math. Dokl. 27 (1983), 372–376.
(38) Y. Nesterov, Introductory lectures on convex optimization: A basic course, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004.
(39) B. Polyak, Introduction to Optimization, New York, NY: Optimization Software-Inc, 1987.
(40) W. Siegel, Accelerated first-order methods: Differential equations and Lyapunov functions, arXiv:1903.05671v1 [math.OC], 2019.
(41) W. Su, S. Boyd, E. J. Candès, A Differential Equation for Modeling Nesterov’s Accelerated Gradient Method: Theory and Insights. NIPS, December 2014.
(42) A. N. Tikhonov, Doklady Akademii Nauk SSSR 151 (1963) 501–504, (Translated in ”Solution of incorrectly formulated problems and the regularization method”. Soviet Mathematics 4 (1963) 1035–1038).
(43) A. N. Tikhonov, V. Y. Arsenin, Solutions of Ill-Posed Problems, Winston, New York, 1977.
(44) D. Torralba, Convergence epigraphique et changements d’échelle en analyse variationnelle et optimisation, PhD thesis, Université Montpellier, 1996.
(45) A. C. Wilson, B. Recht, M. I. Jordan, A Lyapunov analysis of momentum methods in optimization, arXiv preprint arXiv:1611.02635, 2016.

Damped inertial dynamics with vanishing Tikhonov regularization: strong asymptotic convergence towards the minimum norm solution

Abstract

Keywords:

MSC:

1 Introduction

1.1 Attouch-László Lyapunov analysis of (TRIGS)

Theorem 1.1

Definition 1

Theorem 1.2

1.2 Historical facts and related results

1.2.1 First-order gradient dynamics

1.2.2 Second order gradient dynamics

1.3 Model results

Theorem 1.3

1.4 Contents

2 Convergence analysis for general ε​(t)\varepsilon(t)

2.1 Preparatory results for Lyapunov analysis

Lemma 1

Proof

Lemma 2

Proof

2.2 Main result

Theorem 2.1

Proof

Remark 1

Remark 2

Remark 3

3 Particular cases

Theorem 3.1

Proof

Remark 4

Remark 5

Remark 6

3.1 Trade-off between the convergence rate of values and trajectories

Corollary 1

4 Non-smooth case

Theorem 4.1

Proof

Remark 7

5 Numerical illustration

6 Conclusion, perspective

Appendix A Auxiliary results

A.1 Existence and uniqueness for the Cauchy problem, energy estimates

Theorem A.1

Proof

References

2 Convergence analysis for general $\varepsilon(t)$