Time-adaptive Lagrangian Variational Integrators
for Accelerated Optimization on Manifolds

Valentin Duruisseaux and Melvin Leok

Abstract.

A variational framework for accelerated optimization was recently introduced on normed vector spaces and Riemannian manifolds in Wibisono et al. [65] and Duruisseaux and Leok [19]. It was observed that a careful combination of time-adaptivity and symplecticity in the numerical integration can result in a significant gain in computational efficiency. It is however well known that symplectic integrators lose their near energy preservation properties when variable time-steps are used. The most common approach to circumvent this problem involves the Poincaré transformation on the Hamiltonian side, and was used in Duruisseaux et al. [20] to construct efficient explicit algorithms for symplectic accelerated optimization. However, the current formulations of Hamiltonian variational integrators do not make intrinsic sense on more general spaces such as Riemannian manifolds and Lie groups. In contrast, Lagrangian variational integrators are well-defined on manifolds, so we develop here a framework for time-adaptivity in Lagrangian variational integrators and use the resulting geometric integrators to solve optimization problems on vector spaces and Lie groups.

1. Introduction

Many machine learning algorithms are designed around the minimization of a loss function or the maximization of a likelihood function. Due to the ever-growing scale of data sets, there has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. In 1983, Nesterov’s accelerated gradient method was introduced in [54], and was shown to converge in $\mathcal{O}(1/k^{2})$ to the minimum of the convex objective function $f$ , improving on the $\mathcal{O}(1/k)$ convergence rate exhibited by the standard gradient descent methods. This $\mathcal{O}(1/k^{2})$ convergence rate was shown in [55] to be optimal among first-order methods using only information about $\nabla f$ at consecutive iterates. This phenomenon in which an algorithm displays this improved rate of convergence is referred to as acceleration, and other accelerated algorithms have been derived since Nesterov’s algorithm. More recently, it was shown in [59] that Nesterov’s accelerated gradient method limits, as the time-step goes to 0, to a second-order differential equation and that the objective function $f(x(t))$ converges to its optimal value at a rate of $\mathcal{O}(1/t^{2})$ along the trajectories of this ordinary differential equation. It was later shown in [65] that in continuous time, the convergence rate of $f(x(t))$ can be accelerated to an arbitrary convergence rate $\mathcal{O}(1/t^{p})$ in normed spaces, by considering flow maps generated by a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This framework for accelerated optimization in normed vector spaces has been studied and exploited using geometric numerical integrators in [3; 31; 8; 22; 21; 20]. In [20], time-adaptive geometric integrators have been proposed to take advantage of the time-rescaling property of the Bregman family and design efficient explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity could result in a significant gain in computational efficiency, by simulating higher-order Bregman dynamics using the computationally efficient lower-order Bregman integrators applied to the time-rescaled dynamics.

More generally, symplectic integrators form a class of geometric integrators of interest since, when applied to Hamiltonian systems, they yield discrete approximations of the flow that preserve the symplectic 2-form and as a result also preserve many qualitative aspects of the underlying dynamical system (see [26]). In particular, when applied to conservative Hamiltonian systems, symplectic integrators exhibit excellent long-time near-energy preservation. However, when symplectic integrators were first used in combination with variable time-steps, the near-energy preservation was lost and the integrators performed poorly (see [7; 24]). There has been a great effort to circumvent this problem, and there have been many successes, including methods based on the Poincaré transformation [69; 25]: a Poincaré transformed Hamiltonian in extended phase space is constructed which allows the use of variable time-steps in symplectic integrators without losing the nice conservation properties associated to these integrators. In [20], the Poincaré transformation was incorporated in the Hamiltonian variational integrator framework which provides a systematic method for constructing symplectic integrators of arbitrarily high-order based on the discretization of Hamilton’s principle [49; 27], or equivalently, by the approximation of the generating function of the symplectic flow map. The Poincaré transformation was at the heart of the construction of time-adaptive geometric integrators for Bregman Hamiltonian systems which resulted in efficient, explicit algorithms for accelerated optimization in [20].

In [60; 42], accelerated optimization algorithms were proposed in the Lie group setting for specific choices of parameters in the Bregman family, and [2] provided a first example of Bregman dynamics on Riemannian manifolds. The entire variational framework was later generalized to the Riemannian manifold setting in [19], and time-adaptive geometric integrators taking advantage of the time-rescaling property of the Bregman family have been proposed in the Riemannian manifold setting as well using discrete variational integrators incorporating holonomic constraints [16] and projection-based variational integrators [18]. Note that both these strategies relied on exploiting the structure of the Euclidean spaces in which the Riemannian manifolds are embedded. Although the Whitney and Nash Embedding Theorems [64; 63; 53] imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, designing intrinsic methods that would exploit and preserve the symmetries and geometric properties of the manifold could have advantages both in terms of computational efficiency and in terms of improving our understanding of the acceleration phenomenon on Riemannian manifolds. Developing an intrinsic extension of Hamiltonian variational integrators to manifolds would require some additional work, since the current approach involves Type II/III generating functions $H_{d}^{+}(q_{k},p_{k+1})$ , $H_{d}^{-}(p_{k},q_{k+1})$ , which depend on the position at one boundary point, and the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space. On the other hand, Lagrangian variational integrators involve a Type I generating function $L_{d}(q_{k},q_{k+1})$ which only depends on the position at the boundary points and is therefore well-defined on manifolds, and many Lagrangian variational integrators have been derived on Riemannian manifolds, especially in the Lie group [38; 39; 5; 27; 56; 43; 28; 37; 36] and homogeneous space [40] settings. This gives an incentive to construct a mechanism on the Lagrangian side which mimics the Poincaré transformation, since it is more natural and easier to work on the Lagrangian side on more general spaces than on the Hamiltonian side. However, a first difficulty is that the Poincaré transformed Hamiltonian is degenerate and therefore does not have a corresponding Type I Lagrangian formulation. As a result, we cannot exploit the usual correspondence between Hamiltonian and Lagrangian dynamics and need to come up with a different strategy. A second difficulty is that all the literature to this day on the Poincaré transformation constructs the Poincaré transformed system by reverse-engineering, which does not provide much insight into the origin of the mechanism and how it can be extended to different systems.

Outline. We first review the basics of variational integration of Lagrangian and Hamiltonian systems, and the Poincaré transformation in Section 2. We then introduce a simple but novel derivation of the Poincaré transformation from a variational principle in Section 3.1. This gives additional insight into the transformation mechanism and provides natural candidates for time-adaptivity on the Lagrangian side, which we then construct both in continuous and discrete time in Sections 3.2 and 3.3. We then compare the performance of the resulting time-adaptive Lagrangian accelerated optimization algorithms to their Poincaré Hamiltonian analogues in Section 4. Finally, we demonstrate in Section 5 that our time-adaptive Lagrangian approach extends naturally to more general spaces without having to face the obstructions experienced on the Hamiltonian side.

Contributions. In summary, the main contributions of this paper are:

•

A novel derivation of the Poincaré transformation from a variational principle, in Section 3.1
•

New frameworks for variable time-stepping in Lagrangian integrators, and new discrete variational formulations of Lagrangian mechanics with these new variable time-stepping mechanisms, in Sections 3.2 and 3.3
•

New explicit symplectic accelerated optimization algorithms on normed vector spaces
•

New intrinsic symplectic accelerated optimization algorithms on Riemannian manifolds

2. Background

2.1. Lagrangian and Hamiltonian Mechanics

Given a $n$ -dimensional manifold $\mathcal{Q}$ , a Lagrangian is a function $L:T\mathcal{Q}\rightarrow\mathbb{R}$ . The corresponding action integral $\mathcal{S}$ is the functional

\mathcal{S}(q)=\int_{0}^{T}{L(q,\dot{q})dt},

(2.1)

over the space of smooth curves $q:[0,T]\rightarrow\mathcal{Q}$ . Hamilton’s variational principle states that $\delta\mathcal{S}=0$ where the variation $\delta\mathcal{S}$ is induced by an infinitesimal variation $\delta q$ of the trajectory $q$ that vanishes at the endpoints. Given local coordinates $(q^{1},\ldots,q^{n})$ on the manifold $\mathcal{Q}$ , Hamilton’s variational principle can be shown to be equivalent to the Euler–Lagrange equations,

\frac{d}{dt}\left(\frac{\partial L}{\partial\dot{q}^{k}}\right)=\frac{\partial L}{\partial q^{k}},\qquad\text{for }k=1,\ldots,n.

(2.2)

The Legendre transform $\mathbb{F}L:T\mathcal{Q}\rightarrow T^{*}\mathcal{Q}$ of $L$ is defined fiberwise by $\mathbb{F}L:(q^{i},\dot{q}^{i})\mapsto\left(q^{i},\frac{\partial L}{\partial\dot{q}^{i}}\right)$ , and we say that a Lagrangian $L$ is regular or nondegenerate if the Hessian matrix $\frac{\partial^{2}L}{\partial\dot{q}^{2}}$ is invertible for every $q$ and $\dot{q}$ , and hyperregular if the Legendre transform $\mathbb{F}L$ is a diffeomorphism. A hyperregular Lagrangian on $T\mathcal{Q}$ induces a Hamiltonian system on $T^{*}\mathcal{Q}$ via

H(q,p)=\langle\mathbb{F}L(q,\dot{q}),\dot{q}\rangle-L(q,\dot{q})=\sum_{j=1}^{n}{p_{j}\dot{q}^{j}}-L(q,\dot{q})\bigg{|}_{p_{i}=\frac{\partial L}{\partial\dot{q}^{i}}},

(2.3)

where $p_{i}=\frac{\partial L}{\partial\dot{q}^{i}}\in T^{*}\mathcal{Q}$ is the conjugate momentum of $q^{i}$ . A Hamiltonian $H$ is called hyperregular if $\mathbb{F}H:T^{*}\mathcal{Q}\rightarrow T\mathcal{Q}$ defined by $\mathbb{F}H(\alpha)\cdot\beta=\frac{d}{ds}\big{|}_{s=0}H(\alpha+s\beta)$ , is a diffeomorphism. Hyperregularity of the Hamiltonian $H$ implies invertibility of the Hessian matrix $\frac{\partial^{2}H}{\partial p^{2}}$ and thus nondegeneracy of $H$ . Theorem 7.4.3 in [48] states that hyperregular Lagrangians and hyperregular Hamiltonians correspond in a bijective manner. We can also define a Hamiltonian variational principle on the Hamiltonian side in momentum phase space which is equivalent to Hamilton’s equations,

\dot{p}_{k}=-\frac{\partial H}{\partial q^{k}}(p,q),\qquad\dot{q}^{k}=\frac{\partial H}{\partial p_{k}}(p,q),\qquad\text{for }k=1,\ldots,n.

(2.4)

These equations can also be shown to be equivalent to the Euler–Lagrange equations (2.2), provided that the Lagrangian is hyperregular.

2.2. Variational Integrators

Variational integrators are derived by discretizing Hamilton’s principle, instead of discretizing Hamilton’s equations directly. As a result, variational integrators are symplectic, preserve many invariants and momentum maps, and have excellent long-time near-energy preservation (see [49]).

Traditionally, variational integrators have been designed based on the Type I generating function commonly known as the discrete Lagrangian, $L_{d}:Q\times Q\rightarrow\mathbb{R}$ . The exact discrete Lagrangian that generates the time- $h$ flow of Hamilton’s equations can be represented both in a variational form and in a boundary-value form. The latter is given by

L_{d}^{E}(q_{0},q_{1};h)=\int_{0}^{h}L(q(t),\dot{q}(t))dt,

(2.5)

where $q(0)=q_{0},$ $q(h)=q_{1},$ and $q$ satisfies the Euler–Lagrange equations over the time interval $[0,h]$ . A variational integrator is defined by constructing an approximation $L_{d}:Q\times Q\rightarrow\mathbb{R}$ to $L_{d}^{E}$ , and then applying the discrete Euler–Lagrange equations,

p_{k}=-D_{1}L_{d}(q_{k},q_{k+1}),\qquad p_{k+1}=D_{2}L_{d}(q_{k},q_{k+1}),

(2.6)

where $D_{i}$ denotes a partial derivative with respect to the $i$ -th argument, and these equations implicitly define the integrator map $\tilde{F}_{L_{d}}:(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1})$ . The error analysis is greatly simplified via Theorem 2.3.1 of [49], which states that if a discrete Lagrangian, $L_{d}:Q\times Q\rightarrow\mathbb{R}$ , approximates the exact discrete Lagrangian $L_{d}^{E}:Q\times Q\rightarrow\mathbb{R}$ to order $r$ , i.e.,

L_{d}(q_{0},q_{1};h)=L_{d}^{E}(q_{0},q_{1};h)+\mathcal{O}(h^{r+1}),

(2.7)

then the discrete Hamiltonian map $\tilde{F}_{L_{d}}:(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1})$ , viewed as a one-step method, has order of accuracy $r$ . Many other properties of the integrator, such as momentum conservation properties of the method, can be determined by analyzing the associated discrete Lagrangian, as opposed to analyzing the integrator directly.

Variational integrators have been extended to the framework of Type II and Type III generating functions, commonly referred to as discrete Hamiltonians (see [34; 46; 57]). Hamiltonian variational integrators are derived by discretizing Hamilton’s phase space principle. The boundary-value formulation of the exact Type II generating function of the time- $h$ flow of Hamilton’s equations is given by the exact discrete right Hamiltonian,

H_{d}^{+,E}(q_{0},p_{1};h)=p_{1}^{\top}q_{1}-\int_{0}^{h}\left[p(t)^{\top}\dot{q}(t)-H(q(t),p(t))\right]dt,

(2.8)

where $(q,p)$ satisfies Hamilton’s equations with boundary conditions $q(0)=q_{0}$ and $p(h)=p_{1}$ . A Type II Hamiltonian variational integrator is constructed by using a discrete right Hamiltonian $H_{d}^{+}$ approximating $H_{d}^{+,E}$ , and applying the discrete right Hamilton’s equations,

p_{0}=D_{1}H_{d}^{+}(q_{0},p_{1}),\qquad q_{1}=D_{2}H_{d}^{+}(q_{0},p_{1}),

(2.9)

which implicitly defines the integrator, $\tilde{F}_{H_{d}^{+}}:(q_{0},p_{0})\mapsto(q_{1},p_{1})$ .

Theorem 2.3.1 of [49], which simplified the error analysis for Lagrangian variational integrators, has an analogue for Hamiltonian variational integrators. Theorem 2.2 in [57] states that if a discrete right Hamiltonian $H^{+}_{d}$ approximates the exact discrete right Hamiltonian $H_{d}^{+,E}$ to order $r$ , i.e.,

H^{+}_{d}(q_{0},p_{1};h)=H_{d}^{+,E}(q_{0},p_{1};h)+\mathcal{O}(h^{r+1}),

(2.10)

then the discrete right Hamilton’s map $\tilde{F}_{H^{+}_{d}}:(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1})$ , viewed as a one-step method, is order $r$ accurate. Note that discrete left Hamiltonians and corresponding discrete left Hamilton’s maps can also be constructed in the Type III case (see [46; 20]).

Examples of variational integrators include Galerkin variational integrators [46; 49], Prolongation-Collocation variational integrators [45], and Taylor variational integrators [58]. In many cases, the Type I and Type II/III approaches will produce equivalent integrators. This equivalence has been established in [58] for Taylor variational integrators provided the Lagrangian is hyperregular, and in [46] for generalized Galerkin variational integrators constructed using the same choices of basis functions and numerical quadrature formula provided the Hamiltonian is hyperregular. However, Hamiltonian and Lagrangian variational integrators are not always equivalent. In particular, it was shown in [57] that even when the Hamiltonian and Lagrangian integrators are analytically equivalent, they might still have different numerical properties because of numerical conditioning issues. Even more to the point, Lagrangian variational integrators cannot always be constructed when the underlying Hamiltonian is degenerate. This is particularly relevant in variational accelerated optimization since the time-adaptive Hamiltonian framework for accelerated optimization presented in [20] relies on a degenerate Hamiltonian which has no associated Lagrangian description. We will thus not be able to exploit the usual correspondence between Hamiltonian and Lagrangian dynamics and will have to come up with a different strategy to allow time-adaptivity on the Lagrangian side.

We now describe the construction of Taylor variational integrators as introduced in [58] as we will use them in our numerical experiments. A discrete approximate Lagrangian or Hamiltonian is constructed by approximating the flow map and the trajectory associated with the boundary values using a Taylor method, and approximating the integral by a quadrature rule. The Taylor variational integrator is generated by the implicit discrete Euler–Lagrange equations associated with the discrete Lagrangian or by the Hamilton’s equations associated with the discrete Hamiltonian. More explicitly, we first construct $\rho$ -order and $(\rho+1)$ -order Taylor methods $\Psi_{h}^{(\rho)}$ and $\Psi_{h}^{(\rho+1)}$ approximating the exact time- $h$ flow map $\Phi_{h}:TQ\rightarrow TQ$ corresponding to the Euler–Lagrange equation in the Type I case or the exact time- $h$ flow map $\Phi_{h}:T^{*}Q\rightarrow T^{*}Q$ corresponding to Hamilton’s equation in the Type II case. Let $\pi_{Q}:(q,p)\mapsto q$ and $\pi_{T^{*}Q}:(q,p)\mapsto p$ . Given a quadrature rule of order $s$ with weights and nodes $(b_{i},c_{i})$ for $i=1,...,m$ , the Taylor variational integrators are then constructed as follows:

Type I Lagrangian Taylor Variational Integrator (LTVI):

(i)

Approximate $\dot{q}(0)=v_{0}$ by the solution $\tilde{v}_{0}$ of the problem $q_{1}=\pi_{Q}\circ\Psi_{h}^{(\rho+1)}(q_{0},\tilde{v}_{0}).$
(ii)

Generate approximations $(q_{c_{i}},v_{c_{i}})\approx(q(c_{i}h),\dot{q}(c_{i}h))$ via $(q_{c_{i}},v_{c_{i}})=\Psi_{c_{i}h}^{(\rho)}(q_{0},\tilde{v}_{0}).$
(iii)

Apply the quadrature rule to obtain the associated discrete Lagrangian,

$L_{d}(q_{0},q_{1};h)=h\sum_{i=1}^{m}{b_{i}L(q_{c_{i}},v_{c_{i}})}.$
(iv)

The variational integrator is then defined by the implicit discrete Euler–Lagrange equations,

$p_{0}=-D_{1}L_{d}(q_{0},q_{1}),\qquad p_{1}=D_{2}L_{d}(q_{0},q_{1}).$

Type II Hamiltonian Taylor Variational Integrator (HTVI):

(i)

Approximate $p(0)=p_{0}$ by the solution $\tilde{p}_{0}$ of the problem $p_{1}=\pi_{T^{*}Q}\circ\Psi_{h}^{(\rho)}(q_{0},\tilde{p}_{0}).$
(ii)

Generate approximations $(q_{c_{i}},p_{c_{i}})\approx(q(c_{i}h),p(c_{i}h))$ via $(q_{c_{i}},p_{c_{i}})=\Psi_{c_{i}h}^{(\rho)}(q_{0},\tilde{p}_{0}).$
(iii)

Approximate $q_{1}$ via $\tilde{q}_{1}=\pi_{Q}\circ\Psi_{h}^{(\rho+1)}(q_{0},\tilde{p}_{0}).$
(iv)

Use the continuous Legendre transform to obtain $\dot{q}_{c_{i}}=\frac{\partial H}{\partial p_{c_{i}}}$ .

(v)

Apply the quadrature rule to obtain the associated discrete right Hamiltonian,

H_{d}^{+}(q_{0},p_{1};h)=p_{1}^{\top}\tilde{q}_{1}-h\sum_{i=1}^{m}{b_{i}\left[p_{c_{i}}^{\top}\dot{q}_{c_{i}}-H(q_{c_{i}},p_{c_{i}})\right]}.

(vi)

The variational integrator is then defined by the implicit discrete right Hamilton’s equations,

$q_{1}=D_{2}H_{d}^{+}(q_{0},p_{1}),\qquad p_{0}=D_{1}H_{d}^{+}(q_{0},p_{1}).$

The following error analysis results were derived in [58] and [20]:

Theorem 2.1.

Suppose the Lagrangian $L$ is Lipschitz continuous in both variables, and is sufficiently regular for the Taylor method $\Psi_{h}^{(\rho+1)}$ to be well-defined.

Then $L_{d}(q_{0},q_{1})$ approximates $L_{d}^{E}(q_{0},q_{1})$ with at least order of accuracy $\min{(\rho+1,s)}$ .

By Theorem 2.3.1 in [49], the associated discrete Hamiltonian map has the same order of accuracy.

Theorem 2.2.

Suppose the Hamiltonian $H$ and its partial derivative $\frac{\partial H}{\partial p}$ are Lipschitz continuous in both variables, and $H$ is sufficiently regular for the Taylor method $\Psi_{h}^{(\rho+1)}$ to be well-defined.

Then $H_{d}^{+}(q_{0},p_{1})$ approximates $H_{d}^{+,E}(q_{0},p_{1})$ with at least order of accuracy $\min{(\rho+1,s)}$ .

By Theorem 2.2 in [57], the associated discrete right Hamilton’s map has the same order of accuracy.

Note that analogous constructions and error analysis results have been derived in [20; 58] for discrete left Hamiltonians in the Type III case.

2.3. Time-adaptive Hamiltonian integrators via the Poincaré transformation

Symplectic integrators form a class of geometric numerical integrators of interest since, when applied to conservative Hamiltonian systems, they yield discrete approximations of the flow that preserve the symplectic 2-form [26], which results in the preservation of many qualitative aspects of the underlying system. In particular, symplectic integrators exhibit excellent long-time near-energy preservation. However, when symplectic integrators were first used in combination with variable time-steps, the near-energy preservation was lost and the integrators performed poorly [7; 24]. Backward error analysis provided justification both for the excellent long-time near-energy preservation of symplectic integrators and for the poor performance experienced when using variable time-steps (see Chapter IX of [26]): it shows that symplectic integrators can be associated with a modified Hamiltonian in the form of a formal power series in terms of the time-step. The use of a variable time-step results in a different modified Hamiltonian at every iteration, which is the source of the poor energy conservation. The Poincaré transformation is one way to incorporate variable time-steps in geometric integrators without losing the nice conservation properties associated with these integrators.

Given a Hamiltonian $H(q,t,p)$ , consider a desired transformation of time $t\mapsto\tau$ described by the monitor function $g(q,t,p)$ via

\frac{dt}{d\tau}=g(q,t,p).

(2.11)

The time $t$ shall be referred to as the physical time, while $\tau$ will be referred to as the fictive time, and we will denote derivatives with respect to $t$ and $\tau$ by dots and apostrophes, respectively. A new Hamiltonian system is constructed using the Poincaré transformation,

\bar{H}(\bar{q},\bar{p})=g(q,\mathfrak{q},p)\left(H(q,\mathfrak{q},p)+\mathfrak{p}\right),

(2.12)

in the extended phase space defined by $\bar{q}=\left[\begin{smallmatrix}q\\ \mathfrak{q}\end{smallmatrix}\right]\in\bar{\mathcal{Q}}$ and $\bar{p}=\left[\begin{smallmatrix}p\\ \mathfrak{p}\end{smallmatrix}\right]$ where $\mathfrak{p}$ is the conjugate momentum for $\mathfrak{q}=t$ with $\mathfrak{p}(0)=-H(q(0),0,p(0))$ . The corresponding equations of motion in the extended phase space are then given by

\bar{q}^{\prime}=\frac{\partial\bar{H}}{\partial\bar{p}},\qquad\qquad\bar{p}^{\prime}=-\frac{\partial\bar{H}}{\partial\bar{q}}.

(2.13)

Suppose $(\bar{Q}(\tau),\bar{P}(\tau))$ are solutions to these extended equations of motion, and let $(q(t),p(t))$ solve Hamilton’s equations for the original Hamiltonian $H$ . Then

\bar{H}(\bar{Q}(\tau),\bar{P}(\tau))=\bar{H}(\bar{Q}(0),\bar{P}(0))=0.

(2.14)

Therefore, the components $(Q(\tau),P(\tau))$ in the original phase space of the augmented solutions $(\bar{Q}(\tau),\bar{P}(\tau))$ satisfy

H(Q(\tau),\tau,P(\tau))=-\mathfrak{p}(\tau),\qquad H(Q(0),0,P(0))=-\mathfrak{p}(0)=H(q(0),0,p(0)).

(2.15)

Then, $(Q(\tau),P(\tau))$ and $(q(t),p(t))$ both satisfy Hamilton’s equations for the original Hamiltonian $H$ with the same initial values, so they must be the same. Note that the Hessian is given by

\frac{\partial^{2}\bar{H}}{\partial\bar{p}^{2}}=\begin{bmatrix}\frac{\partial H}{\partial p}\nabla_{p}g(\bar{q},p)^{\top}+g(\bar{q},p)\frac{\partial^{2}H}{\partial p^{2}}+\nabla_{p}g(\bar{q},p)\frac{\partial H}{\partial p}^{\top}\ \ \nabla_{p}g(\bar{q},p)\\ \nabla_{p}g(\bar{q},p)^{\top}\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad 0\end{bmatrix},

(2.16)

which will be singular in many cases. The degeneracy of the Hamiltonian $\bar{H}$ implies that there is no corresponding Type I Lagrangian formulation. This approach works seamlessly with the existing methods and theorems for Hamiltonian variational integrators, but where the system under consideration is the transformed Hamiltonian system resulting from the Poincaré transformation. We can use a symplectic integrator with constant time-step in fictive time $\tau$ on the Poincaré transformed system, which will have the effect of integrating the original system with the desired variable time-step in physical time $t$ via the relation $\frac{dt}{d\tau}=g(q,t,p)$ .

3. Time-adaptive Lagrangian Integrators

The Poincaré transformation for time-adaptive symplectic integrators on the Hamiltonian side presented in Section 2.3 with autonomous monitor function $g(q,p)$ was first introduced in [69], and extended to the case where $g$ can also depend on time based on ideas from [25]. All the literature to date on the Poincaré transformation have constructed the Poincaré transformed system by reverse-engineering: the Poincaré transformed Hamiltonian is chosen in such a way that the corresponding component dynamics satisfy Hamilton’s equations in the original space.

3.1. Variational Derivation of the Poincaré Hamiltonian

We now depart from the traditional reverse-engineering strategy for the Poincaré transformation and present a new way to think about the Poincaré transformed Hamiltonian by deriving it from a variational principle. This simple derivation gives additional insight into the transformation mechanism and provides natural candidates for time-adaptivity on the Lagrangian side and for more general frameworks.

As before, we work in the extended space $(q,\mathfrak{q},p,\mathfrak{p})$ where $\mathfrak{q}=t$ and $\mathfrak{p}$ is the corresponding conjugate momentum, and consider a time transformation $t\rightarrow\tau$ given by

\frac{dt}{d\tau}=g(q,t,p).

(3.1)

We define an extended action functional $\mathfrak{S}:C^{2}([0,T],T^{*}\bar{\mathcal{Q}})\rightarrow\mathbb{R}$ by

$\displaystyle\mathfrak{S}(\bar{q}(\cdot),\bar{p}(\cdot))$	$\displaystyle=\bar{p}(T)\bar{q}(T)-\int_{0}^{T}{\left[\bar{p}(t)\dot{\bar{q}}(t)-H(q(t),t,p(t))-\mathfrak{p}(t)\right]dt}$	(3.2)
	$\displaystyle=\bar{p}(T)\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left[\bar{p}(\tau)\frac{d\tau}{dt}\bar{q}^{\prime}(\tau)-H(q(\tau),\mathfrak{q}(\tau),p(\tau))-\mathfrak{p}(\tau)\right]\frac{dt}{d\tau}d\tau}$	(3.3)
	$\displaystyle=\bar{p}(T)\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left\{\bar{p}(\tau)\bar{q}^{\prime}(\tau)-\frac{dt}{d\tau}\left[H(q(\tau),\mathfrak{q}(\tau),p(\tau))+\mathfrak{p}(\tau)\right]\right\}d\tau},$	(3.4)

where we have performed a change of variables in the integral. Then,

\mathfrak{S}(\bar{q}(\cdot),\bar{p}(\cdot))=\bar{p}(T)\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left\{\bar{p}(\tau)\bar{q}^{\prime}(\tau)-g(q(\tau),\mathfrak{q}(\tau),p(\tau))\left[H(q(\tau),\mathfrak{q}(\tau),p(\tau))+\mathfrak{p}(\tau)\right]\right\}d\tau}.

(3.5)

Computing the variation of $\mathfrak{S}$ yields

	$\displaystyle\delta\mathfrak{S}$	$\displaystyle=\bar{q}(T)\delta\bar{p}(T)+\bar{p}(T)\delta\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{q}^{\prime}\delta\mathfrak{p}+\mathfrak{p}\delta\mathfrak{q}^{\prime}-\left(g\frac{\partial H}{\partial\mathfrak{q}}+\frac{\partial g}{\partial\mathfrak{q}}(H+\mathfrak{q})\right)\delta\mathfrak{q}-g\delta\mathfrak{p}\right]d\tau}$
		$\displaystyle\qquad\qquad-\int_{\tau(t=0)}^{\tau(t=T)}{\left[q^{\prime}\delta p+p\delta q^{\prime}-\left(g\frac{\partial H}{\partial q}+\frac{\partial g}{\partial q}(H+\mathfrak{p})\right)\delta q-\left(g\frac{\partial H}{\partial p}+\frac{\partial g}{\partial p}(H+\mathfrak{p})\right)\delta p\right]d\tau},$

and using integration by parts and the boundary conditions $\delta\bar{q}(0)=0$ and $\delta\bar{p}(T)=0,$ gives

	$\displaystyle\delta\mathfrak{S}$	$\displaystyle=\int_{\tau(t=0)}^{\tau(t=T)}{\left[p^{\prime}+g\frac{\partial H}{\partial q}+\frac{\partial g}{\partial q}(H+\mathfrak{p})\right]\delta qd\tau}+\int_{\tau(t=0)}^{\tau(t=T)}{\left[g\frac{\partial H}{\partial p}+\frac{\partial g}{\partial p}(H+\mathfrak{p})-q^{\prime}\right]\delta pd\tau}$
		$\displaystyle\qquad\qquad+\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{p}^{\prime}+g\frac{\partial H}{\partial\mathfrak{q}}+\frac{\partial g}{\partial\mathfrak{q}}(H+\mathfrak{p})\right]\delta\mathfrak{q}d\tau}-\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{q}^{\prime}-g\right]\delta\mathfrak{p}d\tau}.$

Thus, the condition that $\mathfrak{S}(\bar{q}(\cdot),\bar{p}(\cdot))$ is stationary with respect to the boundary conditions $\delta\bar{q}(0)=0$ and $\delta\bar{p}(T)=0$ is equivalent to $(\bar{q}(\cdot),\bar{p}(\cdot))$ satisfying Hamilton’s canonical equations corresponding to the Poincaré transformed Hamiltonian,

	$\displaystyle\mathfrak{q}^{\prime}=g(q,\mathfrak{q},p),$		(3.6)
	$\displaystyle q^{\prime}=g(q,\mathfrak{q},p)\frac{\partial H}{\partial p}(q,\mathfrak{q},p)+\frac{\partial g}{\partial p}(q,\mathfrak{q},p)\left[H(q,\mathfrak{q},p)+\mathfrak{p}\right],$		(3.7)
	$\displaystyle p^{\prime}=-g(q,\mathfrak{q},p)\frac{\partial H}{\partial q}(q,\mathfrak{q},p)-\frac{\partial g}{\partial q}(q,\mathfrak{q},p)\left[H(q,\mathfrak{q},p)+\mathfrak{p}\right],$		(3.8)
	$\displaystyle\mathfrak{p}^{\prime}=-g(q,\mathfrak{q},p)\frac{\partial H}{\partial\mathfrak{q}}(q,\mathfrak{q},p)-\frac{\partial g}{\partial\mathfrak{q}}(q,\mathfrak{q},p)\left[H(q,\mathfrak{q},p)+\mathfrak{p}\right].$		(3.9)

An alternative way to reach the same conclusion is by interpreting equation (3.5) as the usual Type II action functional for the modified Hamiltonian,

g(q(\tau),\mathfrak{q}(\tau),p(\tau))\left[H(q(\tau),\mathfrak{q}(\tau),p(\tau))+\mathfrak{p}(\tau)\right],

(3.10)

which coincides with the Poincaré transformed Hamiltonian.

3.2. Time-adaptivity from a Variational Principle on the Lagrangian side

We will now derive a mechanism for time-adaptivity on the Lagrangian side by mimicking the derivation of the Poincaré Hamiltonian. We will work in the extended space $\bar{q}=(q,\mathfrak{q},\lambda)^{\top}\in\bar{\mathcal{Q}}$ where $\mathfrak{q}=t$ and $\lambda$ is a Lagrange multiplier used to enforce the time rescaling $\frac{dt}{d\tau}=g(t).$ Consider the action functional $\mathfrak{S}:C^{2}([0,T],T\bar{\mathcal{Q}})\rightarrow\mathbb{R}$ given by

$\displaystyle\mathfrak{S}(\bar{q}(\cdot),\dot{\bar{q}}(\cdot))$	$\displaystyle=\int_{0}^{T}{\left[L(q(t),\dot{q}(t),\mathfrak{q}(t))-\lambda(t)\left(\frac{d\mathfrak{q}}{d\tau}-g(\mathfrak{q}(t))\right)\right]dt}$	(3.11)
	$\displaystyle=\int_{\tau(t=0)}^{\tau(t=T)}{\left[\frac{dt}{d\tau}L\left(q(\tau),\frac{d\tau}{dt}q^{\prime}(\tau),\mathfrak{q}(\tau)\right)-\lambda(\tau)\frac{dt}{d\tau}\left(\frac{d\mathfrak{q}}{d\tau}-g(\mathfrak{q}(\tau))\right)\right]d\tau}$	(3.12)
	$\displaystyle=\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{q}^{\prime}(\tau)L\left(q(\tau),\frac{d\tau}{dt}q^{\prime}(\tau),\mathfrak{q}(\tau)\right)-\lambda(\tau)\mathfrak{q}^{\prime}(\tau)\left[\mathfrak{q}^{\prime}(\tau)-g(\mathfrak{q}(\tau))\right]\right]d\tau},$	(3.13)

where, as before, we have performed a change of variables in the integral. This is the usual Type I action functional for the extended autonomous Lagrangian,

\bar{L}(\bar{q}(\tau),\bar{q}^{\prime}(\tau))=\mathfrak{q}^{\prime}(\tau)L\left(q(\tau),\frac{d\tau}{dt}q^{\prime}(\tau),\mathfrak{q}(\tau)\right)-\lambda(\tau)\mathfrak{q}^{\prime}(\tau)\left[\mathfrak{q}^{\prime}(\tau)-g(\mathfrak{q}(\tau))\right].

(3.14)

Theorem 3.1.

If $\left(\bar{q}(\tau),\bar{q}^{\prime}(\tau)\right)$ satisfies the Euler–Lagrange equations corresponding to the extended Lagrangian $\bar{L}$ , then its components satisfy $\frac{dt}{d\tau}=g(t)$ and the original Euler–Lagrange equations

\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}\left(q,\dot{q},t\right)=\frac{\partial L}{\partial q}\left(q,\dot{q},t\right).

(3.15)

Proof.

Substituting the expression for $\bar{L}$ into the Euler–Lagrange equations, $\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial\lambda^{\prime}}=\frac{\partial\bar{L}}{\partial\lambda},$ and $\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial q^{\prime}}=\frac{\partial\bar{L}}{\partial q}$ , gives

\mathfrak{q}^{\prime}\left[\mathfrak{q}^{\prime}-g(\mathfrak{q})\right]=0,

and

\frac{d\mathfrak{q}}{d\tau}\frac{d}{d\mathfrak{q}}\left[\mathfrak{q}^{\prime}\frac{\partial L\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)}{\partial q^{\prime}}\right]=\mathfrak{q}^{\prime}\frac{\partial L\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)}{\partial q}.

Now, $\mathfrak{q}^{\prime}=g(\mathfrak{q})>0$ so $\mathfrak{q}^{\prime}=g(\mathfrak{q})$ , and the chain rule gives

\frac{d}{d\mathfrak{q}}\frac{\partial L}{\partial\dot{q}}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)=\frac{\partial L}{\partial q}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right).

Using the equation $\dot{q}=\frac{d\tau}{d\mathfrak{q}}q^{\prime}$ and replacing $\mathfrak{q}$ by $t$ recovers the original Euler–Lagrange equations. ∎

We now introduce a discrete variational formulation of these continuous Lagrangian mechanics. Suppose we are given a partition $0=\tau_{0}<\tau_{1}<\ldots<\tau_{N}=\mathcal{T}$ of the interval $[0,\mathcal{T}]$ , and a discrete curve in $\mathcal{Q}\times\mathbb{R}\times\mathbb{R}$ denoted by $\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}$ such that $q_{k}\approx q(\tau_{k})$ , $\mathfrak{q}_{k}\approx\mathfrak{q}(\tau_{k})$ , and $\lambda_{k}\approx\lambda(\tau_{k})$ . Consider the discrete action functional,

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}},

(3.16)

where $L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})$ is obtained by approximating the exact discrete Lagrangian, which is related to Jacobi’s solution of the Hamilton–Jacobi equation and is the generating function for the exact time- $h$ flow map. It is given by the extremum of the action integral from $\tau_{k}$ to $\tau_{k+1}$ over twice continuously differentiable curves $(q,\mathfrak{q})\in\mathcal{Q}\times\mathbb{R}$ satisfying the boundary conditions $(q(\tau_{k}),\mathfrak{q}(\tau_{k}))=(q_{k},\mathfrak{q}_{k}),$ and $(q(\tau_{k+1}),\mathfrak{q}(\tau_{k+1}))=(q_{k+1},\mathfrak{q}_{k+1})$ :

L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],\mathcal{Q}\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=(q_{k+1},\mathfrak{q}_{k+1})\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}.

(3.17)

In practice, we can obtain an approximation by replacing the integral with a quadrature rule, and extremizing over a finite-dimensional function space instead of $C^{2}([\tau_{k},\tau_{k+1}],\mathcal{Q}\times\mathbb{R})$ . This discrete functional $\bar{\mathfrak{S}}_{d}$ is a discrete analogue of the action functional $\bar{\mathfrak{S}}:C^{2}([0,T],\mathcal{Q}\times\mathbb{R}\times\mathbb{R})\rightarrow\mathbb{R}$ given by

	$\displaystyle\bar{\mathfrak{S}}(q(\cdot),\mathfrak{q}(\cdot),\lambda(\cdot))$	$\displaystyle=\int_{0}^{\mathcal{T}}{\bar{L}\left(q(\tau),\mathfrak{q}(\tau),\lambda(\tau),q^{\prime}(\tau),\mathfrak{q}^{\prime}(\tau),\lambda^{\prime}(\tau)\right)d\tau}$		(3.18)
		$\displaystyle=\int_{0}^{\mathcal{T}}{\left[L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)-\lambda\mathfrak{q}^{\prime}+\lambda g(\mathfrak{q})\right]\mathfrak{q}^{\prime}d\tau}.$		(3.19)

We can derive the following result which relates a discrete Type I variational principle to a set of discrete Euler–Lagrange equations:

Theorem 3.2.

The Type I discrete Hamilton’s variational principle,

\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

(3.20)

is equivalent to the discrete extended Euler–Lagrange equations,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

(3.21)

\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,

(3.22)

		$\displaystyle\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]$		(3.23)
		$\displaystyle\qquad+\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,$		(3.23)

where $L_{d_{k}}$ denotes $L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})$ .

Proof.

See Appendix A.1. ∎

Defining the discrete momenta via the discrete Legendre transformations,

p_{k}=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),\qquad\qquad\mathfrak{p}_{k}=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),

(3.24)

and using a constant time-step $h$ in $\tau$ , the discrete Euler–Lagrange equations can be rewritten as

$\displaystyle p_{k}$	$\displaystyle=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(3.25)
$\displaystyle\mathfrak{p}_{k}$	$\displaystyle=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(3.26)
$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+hg(\mathfrak{q}_{k}),$	(3.27)
$\displaystyle p_{k+1}$	$\displaystyle=\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}D_{3}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(3.28)
$\displaystyle\mathfrak{p}_{k+1}$	$\displaystyle=\frac{L_{d_{k}}-L_{d_{k+1}}}{hg(\mathfrak{q}_{k+1})}+\frac{\lambda_{k+1}}{h}+\lambda_{k+1}\nabla g(\mathfrak{q}_{k+1})+\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}\left[D_{4}L_{d_{k}}-\frac{\lambda_{k}}{h}\right].$	(3.29)

3.3. A Second Time-Adaptive Framework obtained by Reverse-Engineering

As mentioned earlier, all the literature to date on the Poincaré transformation have constructed the Poincaré transformed system by reverse-engineering. The Poincaré transformed Hamiltonian is chosen in such a way that the corresponding component dynamics satisfy the Hamilton’s equations in the original space. We will follow a similar strategy to derive a second framework for time-adaptivity from the Lagrangian perspective.

Given a time-dependent Lagrangian $L(q(t),\dot{q}(t),t)$ consider a transformation of time $t\rightarrow\tau$ ,

\frac{dt}{d\tau}=g(t),

(3.30)

described by the monitor function $g(t)$ . The time $t$ shall be referred to as the physical time, while $\tau$ will be referred to as the fictive time, and we will denote derivatives with respect to $t$ and $\tau$ by dots and apostrophes, respectively. We define the autonomous Lagrangian,

\bar{L}(\bar{q}(\tau),\bar{q}^{\prime}(\tau))=\mathfrak{q}^{\prime}L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)-\lambda\left(\mathfrak{q}^{\prime}-g(\mathfrak{q})\right),

(3.31)

in the extended space with $\bar{q}=\left(q,\mathfrak{q},\lambda\right)^{\top}$ where $\mathfrak{q}=t$ , and where $\lambda$ is a multiplier used to impose the constraint that the time evolution is guided by the monitor function $g(t)$ . Note that in contrast to the earlier framework, the Lagrange multiplier term lacks an extra multiplicative factor of $\mathfrak{q}^{\prime}$ .

Theorem 3.3.

\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}\left(q,\dot{q},t\right)=\frac{\partial L}{\partial q}\left(q,\dot{q},t\right).

(3.32)

Proof.

Substituting the expression for $\bar{L}$ in the Euler–Lagrange equations $\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial\lambda^{\prime}}=\frac{\partial\bar{L}}{\partial\lambda}$ and $\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial q^{\prime}}=\frac{\partial\bar{L}}{\partial q}$ gives

\mathfrak{q}^{\prime}=g(\mathfrak{q}),

and

\frac{d\mathfrak{q}}{d\tau}\frac{d}{d\mathfrak{q}}\left[\frac{d\mathfrak{q}}{d\tau}\frac{\partial L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)}{\partial q^{\prime}}\right]=\ \frac{d\mathfrak{q}}{d\tau}\frac{\partial L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)}{\partial q}.

We can divide by $\frac{d\mathfrak{q}}{d\tau}$ and use the chain rule to get $\mathfrak{q}^{\prime}=g(\mathfrak{q})$ and

\frac{d}{d\mathfrak{q}}\frac{\partial L}{\partial\dot{q}}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)=\frac{\partial L}{\partial q}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right).

Using the equations $\dot{q}=\frac{d\tau}{d\mathfrak{q}}q^{\prime}$ and $\mathfrak{q}^{\prime}=g(\mathfrak{q})$ , and replacing $\mathfrak{q}$ by $t$ recovers the desired equations. ∎

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left\{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\right]+\lambda_{k}g(\mathfrak{q}_{k})\right\}},

(3.33)

where,

L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],\mathcal{Q}\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=(q_{k+1},\mathfrak{q}_{k+1})\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}.

(3.34)

This discrete functional $\bar{\mathfrak{S}}_{d}$ is a discrete analogue of the action functional $\bar{\mathfrak{S}}:C^{2}([0,T],\mathcal{Q}\times\mathbb{R}\times\mathbb{R})\rightarrow\mathbb{R}$ given by

	$\displaystyle\bar{\mathfrak{S}}(q(\cdot),\mathfrak{q}(\cdot),\lambda(\cdot))$	$\displaystyle=\int_{0}^{\mathcal{T}}{\bar{L}\left(q(\tau),\mathfrak{q}(\tau),\lambda(\tau),q^{\prime}(\tau),\mathfrak{q}^{\prime}(\tau),\lambda^{\prime}(\tau)\right)d\tau}$		(3.35)
		$\displaystyle=\int_{0}^{\mathcal{T}}{\left\{\mathfrak{q}^{\prime}\left[L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)-\lambda\right]+\lambda g(\mathfrak{q})\right\}d\tau}.$		(3.36)

We can derive the following result which relates a discrete Type I variational principle to a set of discrete Euler–Lagrange equations:

Theorem 3.4.

The Type I discrete Hamilton’s variational principle,

\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

(3.37)

is equivalent to the discrete extended Euler–Lagrange equations,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

(3.38)

\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,

(3.39)

\displaystyle\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{L_{d_{k}}}{\tau_{k+1}-\tau_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{L_{d_{k-1}}}{\tau_{k}-\tau_{k-1}}=\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}-\lambda_{k}\nabla g(\mathfrak{q}_{k}),

(3.40)

where $L_{d_{k}}$ denotes $L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})$ .

Proof.

See Appendix A.2. ∎

Defining the discrete momenta via the discrete Legendre transformations,

p_{k}=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),\qquad\qquad\mathfrak{p}_{k}=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),

(3.41)

and using a constant time-step $h$ in $\tau$ , the discrete Euler–Lagrange equations can be rewritten as

$\displaystyle p_{k}$	$\displaystyle=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(3.42)
$\displaystyle\mathfrak{p}_{k}$	$\displaystyle=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(3.43)
$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+hg(\mathfrak{q}_{k}),$	(3.44)
$\displaystyle p_{k+1}$	$\displaystyle=\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}D_{3}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(3.45)
$\displaystyle\mathfrak{p}_{k+1}$	$\displaystyle=\frac{L_{d_{k}}-L_{d_{k+1}}+\lambda_{k+1}-\lambda_{k}+h\lambda_{k+1}\nabla g(\mathfrak{q}_{k+1})+hg(\mathfrak{q}_{k})D_{4}L_{d_{k}}}{hg(\mathfrak{q}_{k+1})}.$	(3.46)

3.4. Remarks on the Framework for Time-Adaptivity

Time-adaptivity comes more naturally on the Hamiltonian side through the Poincaré transformation. Indeed, in the Hamiltonian case, the time-rescaling equation $\mathfrak{q}^{\prime}=g(q,\mathfrak{q},p)$ emerged naturally through the change of time variable inside the extended action functional. By contrast, in the Lagrangian case, we need to impose the time-rescaling equation as a constraint via a multiplier, which we then consider as an extra position coordinate. This strategy can be thought of as being part of the more general framework for constrained variational integrators (see [49; 16]).

The Poincaré transformation on the Hamiltonian side was presented in [69; 25; 20] for the general case where the monitor function can depend on position, time and momentum, $g=g(q,t,p)$ . For the accelerated optimization application which was our main motivation to develop a time-adaptive framework for geometric integrators, the monitor function only depends on time, $g=g(t)$ . For the sake of simplicity and clarity, we have decided to only present the theory for time-adaptive Lagrangian integrators for monitor functions of the form $g=g(t)$ in this paper. Note however that this time-adaptivity framework on the Lagrangian side can be extended to the case where the monitor function also depends on position, $g=g(q,t)$ . The action integral remains the same with the exception that $g$ is now a function of $(q,\mathfrak{q})$ . Unlike the case where $g=g(t)$ , the corresponding Euler–Lagrange equation $\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial q^{\prime}}=\frac{\partial L}{\partial q}$ yields an extra term $\lambda(t)\frac{\partial g}{\partial q}(q,t)$ in the original phase-space,

\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}\left(q,\dot{q},t\right)-\frac{\partial L}{\partial q}\left(q,\dot{q},t\right)=\lambda(t)\frac{\partial g}{\partial q}(q,t).

(3.47)

The discrete Euler–Lagrange equations become more complicated and involve terms with partial derivatives $D_{1}g(q_{k},\mathfrak{q}_{k})$ of $g$ with respect to $q$ . Furthermore, when $g=g(q,t)$ , the discrete Euler–Lagrange equations involve $\lambda_{k}$ but the time-evolution of the Lagrange multiplier $\lambda$ is not well-defined, so the discrete Hamiltonian map corresponding to the discrete Lagrangian $L_{d}$ is not well-defined, as explained in [49, page 440]. Although there are ways to circumvent this problem in practice, this adds some difficulty and makes the time-adaptive Lagrangian approach with $g=g(q,t)$ less natural and desirable than the corresponding Poincaré transformation on the Hamiltonian side. It might also be tempting to generalize further and consider the case where $g=g(q,\dot{q},t)$ . However, in this case, the time-rescaling equation $\frac{dt}{d\tau}=g(q,\dot{q},t)$ becomes implicit and it becomes less clear how to generalize the variational derivation presented in this paper. There are examples where time-adaptivity with these more general monitor functions proved advantageous (see for instance Kepler’s problem in [20]). This motivates further effort towards developing a better framework for time-adaptivity on the Lagrangian side with more general monitor functions.

It might be more natural to consider these time-rescaled Lagrangian and Hamiltonian dynamics as Dirac mechanics [44; 66; 67] on the Pontryagin bundle $(q,v,p)\in T\mathcal{Q}\oplus T^{*}\mathcal{Q}$ . Dirac dynamics are described by the Hamilton-Pontryagin variational principle where the momentum $p$ acts as a Lagrange multiplier to impose the kinematic equation $\dot{q}=v$ ,

\delta\int_{0}^{T}{\left[L(q,v,t)+p(\dot{q}-v)\right]dt}=0.

(3.48)

This provides a variational description of both Lagrangian and Hamiltonian mechanics, yields the implicit Euler–Lagrange equations

\dot{q}=v,\qquad\dot{p}=\frac{\partial L}{\partial q},\qquad p=\frac{\partial L}{\partial v},

(3.49)

and suggests the introduction of a more general quantity, the generalized energy

E(q,v,p,t)=pv-L(q,v,t),

(3.50)

as an alternative to the Hamiltonian.

4. Application to Accelerated Optimization on Vector Spaces

4.1. A Variational Framework for Accelerated Optimization

A variational framework was introduced in [65] for accelerated optimization on normed vector spaces. The $p$ -Bregman Lagrangians and Hamiltonians are defined to be

\mathcal{L}_{p}(x,v,t)=\frac{t^{p+1}}{2p}\langle v,v\rangle-Cpt^{2p-1}f(x),

(4.1)

\mathcal{H}_{p}(x,r,t)=\frac{p}{2t^{p+1}}\langle r,r\rangle+Cpt^{2p-1}f(x),

(4.2)

which are scalar-valued functions of position $x\in\mathcal{X}$ , velocity $v\in\mathbb{R}^{d}$ or momentum $r\in\mathbb{R}^{d}$ , and time $t$ . In [65], it was shown that solutions to the $p$ -Bregman Euler–Lagrange equations converge to a minimizer of the convex objective function $f$ at a convergence rate of $\mathcal{O}(1/t^{p})$ . Furthermore, this family of Bregman dynamics is closed under time dilation: time-rescaling a solution to the $p$ -Bregman Euler–Lagrange equations via $\tau(t)=t^{\mathring{p}/p}$ yields a solution to the $\mathring{p}$ -Bregman Euler–Lagrange equations. Thus, the entire subfamily of Bregman trajectories indexed by the parameter $p$ can be obtained by speeding up or slowing down along the Bregman curve corresponding to any value of $p$ . In [20], the time-rescaling property of the Bregman dynamics was exploited together with a carefully chosen Poincaré transformation to transform the $p$ -Bregman Hamiltonian into an autonomous version of the $\mathring{p}$ -Bregman Hamiltonian in extended phase-space, where $\mathring{p}<p$ . This strategy allowed us to achieve the faster rate of convergence associated with the higher-order $p$ -Bregman dynamics, but with the computational efficiency of integrating the lower-order $\mathring{p}$ -Bregman dynamics. Explicitly, using the time rescaling $\tau(t)=t^{\mathring{p}/p}$ within the Poincaré transformation framework yields the adaptive approach $p\rightarrow\mathring{p}$ -Bregman Hamiltonian,

\displaystyle\bar{H}_{p\rightarrow\mathring{p}}(\bar{q},\bar{r})

\displaystyle=\frac{p^{2}}{2\mathring{p}\mathfrak{q}^{p+\mathring{p}/p}}\langle r,r\rangle+\frac{Cp^{2}}{\mathring{p}}\mathfrak{q}^{2p-\mathring{p}/p}f(q)+\frac{p}{\mathring{p}}\mathfrak{r}\mathfrak{q}^{1-\mathring{p}/p},

(4.3)

and when $\mathring{p}=p$ , the direct approach $p$ -Bregman Hamiltonian,

\bar{H}_{p}(\bar{q},\bar{r})=\frac{p}{2\mathfrak{q}^{p+1}}\langle r,r\rangle+Cp\mathfrak{q}^{2p-1}f(q)+\mathfrak{r}.

(4.4)

In [20], a careful computational study was performed on how time-adaptivity and symplecticity of the numerical scheme improve the performance of the resulting optimization algorithm. In particular, it was observed that time-adaptive Hamiltonian variational discretizations, which are automatically symplectic, with adaptive time-steps informed by the time-rescaling of the family of $p$ -Bregman Hamiltonians yielded the most robust and computationally efficient optimization algorithms, outperforming fixed-timestep symplectic discretizations, adaptive-timestep non-symplectic discretizations, and Nesterov’s accelerated gradient algorithm which is neither time-adaptive nor symplectic.

4.2. Numerical Methods

4.2.1. A Lagrangian Taylor Variational Integrator (LTVI)

We will now construct a time-adaptive Lagrangian Taylor variational integrator (LTVI) for the $p$ -Bregman Lagrangian,

\bar{L}_{p}\left(q,q^{\prime},\mathfrak{q}\right)=\frac{\mathfrak{q}^{p+1}}{2p}\langle q^{\prime},q^{\prime}\rangle-Cp\mathfrak{q}^{2p-1}f(q),

(4.5)

using the strategy outlined in Section 2.2 together with the discrete Euler–Lagrange equations derived in Sections 3.2 and 3.3.

Looking at the form of the continuous $p$ -Bregman Euler–Lagrange equations,

\ddot{q}+\frac{p+1}{\mathfrak{q}}\dot{q}+Cp^{2}\mathfrak{q}^{p-2}\nabla f(q)=0,

(4.6)

we can note that $\nabla f$ appears in the expression for $\ddot{q}$ . Now, the construction of a LTVI as presented in Section 2.2 requires $\rho$ -order and $(\rho+1)$ -order Taylor approximations of $q$ . This means that if we take $\rho\geq 1$ , then $\nabla f$ and higher-order derivatives of $f$ will appear in the resulting discrete Lagrangian $L_{d}$ , and as a consequence, the discrete Euler–Lagrange equations,

p_{0}=-D_{1}L_{d}(q_{0},q_{1}),\qquad p_{1}=D_{2}L_{d}(q_{0},q_{1}),

(4.7)

will yield an integrator which is not gradient-based. Keeping in mind the machine learning applications where data sets are very large, we will restrict ourselves to explicit first-order optimization algorithms, and therefore the highest value of $\rho$ that we can choose to obtain a gradient-based algorithm is $\rho=0$ .

With $\rho=0$ , the choice of quadrature rule does not matter, so we can take the rectangular quadrature rule about the initial point ( $c_{1}=0$ and $b_{1}=1$ ). We first approximate $\dot{\bar{q}}(0)=\bar{v}_{0}$ by the solution $\tilde{\bar{v}}_{0}$ of the problem $\bar{q}_{1}=\pi_{Q}\circ\Psi_{h}^{(1)}(\bar{q}_{0},\tilde{\bar{v}}_{0})=\bar{q}_{0}+h\tilde{\bar{v}}_{0}$ , that is $\tilde{\bar{v}}_{0}=\frac{\bar{q}_{1}-\bar{q}_{0}}{h}$ . Then, applying the quadrature rule gives the associated discrete Lagrangian,

\displaystyle L_{d}(\bar{q}_{0},\bar{q}_{1})

\displaystyle=hL_{p}\left(q_{0},\frac{\tilde{v}_{0}}{g(\mathfrak{q}_{0})},\mathfrak{q}_{0}\right)=\frac{\mathfrak{q}_{0}^{p+1}}{2p(g(\mathfrak{q}_{0}))^{2}}h\langle\tilde{v}_{0},\tilde{v}_{0}\rangle-Chp\mathfrak{q}_{0}^{2p-1}f(q_{0}).

(4.8)

The variational integrator is then defined by the discrete extended Euler–Lagrange equations derived in Sections 3.2 and 3.3. In practice, we are not interested in the evolution of the conjugate momentum $\mathfrak{r}$ , and since it will not appear in the updates for the other variables, the discrete equations of motion from Sections 3.2 and 3.3 both reduce to the same updates,

$\displaystyle r_{k}$	$\displaystyle=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(4.9)
$\displaystyle r_{k+1}$	$\displaystyle=\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}D_{3}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),$	(4.10)
$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+hg(\mathfrak{q}_{k}).$	(4.11)

Now, for the adaptive approach, substituting $g(\mathfrak{q})=\frac{p}{\mathring{p}}\mathfrak{q}^{1-\frac{\mathring{p}}{p}}$ and

L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})=\frac{\mathring{p}^{2}}{2hp^{3}}\mathfrak{q}_{k}^{p-1+2\mathring{p}/p}\langle q_{k+1}-q_{k},q_{k+1}-q_{k}\rangle-Chp\mathfrak{q}_{0}^{2p-1}f(q_{k}),

(4.12)

yields the adaptive LTVI algorithm,

$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+h\frac{p}{\mathring{p}}\mathfrak{q}_{k}^{1-\mathring{p}/p},$	(4.13)
$\displaystyle q_{k+1}$	$\displaystyle=q_{k}+\frac{hp^{3}}{\mathring{p}^{2}\mathfrak{q}_{k}^{p-1+2\mathring{p}/p}}r_{k}-\frac{Ch^{2}p^{4}}{\mathring{p}^{2}}\mathfrak{q}_{k}^{p-2\mathring{p}/p}\nabla f(q_{k}),$	(4.14)
$\displaystyle r_{k+1}$	$\displaystyle=\frac{\mathring{p}^{2}\mathfrak{q}_{k}^{p+\mathring{p}/p}}{hp^{3}\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}(q_{k+1}-q_{k}).$	(4.15)

In the direct approach, $\mathring{p}=p$ so $g(\mathfrak{q})=1$ and we obtain the direct LTVI algorithm,

$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+h,$	(4.16)
$\displaystyle q_{k+1}$	$\displaystyle=q_{k}+\frac{hp}{\mathfrak{q}_{k}^{p+1}}r_{k}-Ch^{2}p^{2}\mathfrak{q}_{k}^{p-2}\nabla f(q_{k}),$	(4.17)
$\displaystyle r_{k+1}$	$\displaystyle=\frac{\mathfrak{q}_{k}^{p+1}}{hp}(q_{k+1}-q_{k}).$	(4.18)

4.2.2. A Hamiltonian Taylor Variational Integrator (HTVI)

In [20], a Type II Hamiltonian Taylor Variational Integrator (HTVI) was derived following the strategy from Section 2.2 with $\rho=0$ for the adaptive approach $p\rightarrow\mathring{p}$ -Bregman Hamiltonian,

\displaystyle\bar{H}_{p\rightarrow\mathring{p}}(\bar{q},\bar{r})

\displaystyle=\frac{p^{2}}{2\mathring{p}\mathfrak{q}^{p+\mathring{p}/p}}\langle r,r\rangle+\frac{Cp^{2}}{\mathring{p}}\mathfrak{q}^{2p-\mathring{p}/p}f(q)+\frac{p}{\mathring{p}}\mathfrak{r}\mathfrak{q}^{1-\mathring{p}/p}.

(4.19)

This adaptive HTVI is the most natural Hamiltonian analogue of the LTVI described in Section 4.2.1, and its updates are given by

$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+h\frac{p}{\mathring{p}}\mathfrak{q}_{k}^{1-\mathring{p}/p},$	(4.20)
$\displaystyle r_{k+1}$	$\displaystyle=r_{k}-\frac{p^{2}}{\mathring{p}}Ch\mathfrak{q}_{k}^{2p-\mathring{p}/p}\nabla f(q_{k}),$	(4.21)
$\displaystyle q_{k+1}$	$\displaystyle=q_{k}+\frac{p^{2}}{\mathring{p}}h\mathfrak{q}_{k}^{-p-\mathring{p}/p}r_{k+1}.$	(4.22)

When $\mathring{p}=p$ , it reduces to the direct HTVI,

$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+h,$	(4.23)
$\displaystyle r_{k+1}$	$\displaystyle=r_{k}-hCp\mathfrak{q}_{k}^{2p-1}\nabla f(q_{k}),$	(4.24)
$\displaystyle q_{k+1}$	$\displaystyle=q_{k}+hp\mathfrak{q}_{k}^{-p-1}r_{k+1}.$	(4.25)

4.3. Numerical Experiments

Numerical experiments using the numerical methods presented in the previous section have been conducted to minimize the quartic function,

f(x)=\left[(x-1)^{\top}\Sigma(x-1)\right]^{2},

(4.26)

where $x\in\mathbb{R}^{d}$ and $\Sigma_{ij}=0.9^{|i-j|}$ . This convex function achieves its minimum value $0$ at $x^{*}=1$ .

As was observed in [20] for the HTVI algorithm, the numerical experiments showed that a carefully tuned adaptive approach algorithm enjoys a significantly better rate of convergence and requires a much smaller number of steps to achieve convergence than the direct approach, as can be seen in Figure 1 for the LTVI methods. Although the adaptive approach requires a smaller fictive time-step $h$ than the direct approach, the physical time-steps resulting from $t=\tau^{p/\mathring{p}}$ in the adaptive approach grow rapidly to values larger than the constant physical time-step of the direct approach. The results of Figure 1 also show that the adaptive and direct LTVI methods become more and more efficient as $p$ is increased, which was also the case for the HTVI algorithm in [20].

Refer to caption — Figure 1. Comparison of the direct and adaptive approaches for the LTVI algorithm, when applied to the quartic function (4.26).

The LTVI and HTVI algorithms presented in Section 4.2 perform empirically almost exactly in the same way for the same parameters and time-step, as can be seen for instance in Figure 2. As a result, the computational analysis carried in [20] for the HTVI algorithm extends to the LTVI algorithm. In particular, it was shown in [20] that the HTVI algorithm is much more efficient than non-adaptive non-symplectic and adaptive non-symplectic integrators for the Bregman dynamics, and that it can be a competitive first-order explicit algorithm which can outperform certain popular optimization algorithms such as Nesterov’s Accelerated Gradient [54], Trust Region Steepest Descent, ADAM [33], AdaGrad [15], and RMSprop [61], for certain objective functions. Since the computational performance of the LTVI algorithm is almost exactly the same as that of the HTVI algorithm, this means that the LTVI algorithm is also much more efficient than non-symplectic integrators for the Bregman dynamics and can also be very competitive as a first-order explicit optimization algorithm.

5. Accelerated Optimization on More General Spaces

5.1. Motivation and Prior Work

The variational framework for accelerated optimization on normed vector spaces from [65; 20] was extended to the Riemannian manifold setting in [19] via a Riemannian $p$ -Bregman Lagrangian $\mathcal{L}_{p}:T\mathcal{Q}\times\mathbb{R}\rightarrow\mathbb{R}$ and a corresponding Riemannian $p$ -Bregman Hamiltonian $\mathcal{H}_{p}:T^{*}\mathcal{Q}\times\mathbb{R}\rightarrow\mathbb{R}$ , for $p>0$ , of the form

\mathcal{L}_{p}(X,V,t)=\frac{t^{\frac{\zeta}{\lambda}p+1}}{2p}\langle V,V\rangle-Cpt^{(\frac{\zeta}{\lambda}+1)p-1}f(X),

(5.1)

\mathcal{H}_{p}(X,R,t)=\frac{p}{2t^{\frac{\zeta}{\lambda}p+1}}\llangle R,R\rrangle+Cpt^{(\frac{\zeta}{\lambda}+1)p-1}f(X),

(5.2)

where $\zeta$ and $\lambda$ are constants having to do with the curvature of the manifold and the convexity of the objective function $f$ . These yield the associated $p$ -Bregman Euler–Lagrange equations,

\nabla_{\dot{X}}\dot{X}+\frac{\zeta p+\lambda}{\lambda t}\dot{X}+Cp^{2}t^{p-2}\text{grad}f(X)=0.

(5.3)

Here, $\text{grad}f$ denotes the Riemannian gradient of $f$ , $\nabla_{X}Y$ is the covariant derivative of $Y$ along $X$ , and $\llangle\cdot,\cdot\rrangle$ is the fiber metric on $T^{*}\mathcal{Q}$ induced by the Riemannian metric $\langle\cdot,\cdot\rangle$ on $\mathcal{Q}$ whose local coordinates representation is the inverse of the local representation of $\langle\cdot,\cdot\rangle$ . See [1; 6; 48; 35; 32; 19] for a more detailed description of these notions from Riemannian geometry and of this Riemannian variational framework for accelerated optimization. Note that some work was done on accelerated optimization via numerical integration of Bregman dynamics in the Lie group setting [42; 60] before the theory for more general Bregman families on Riemannian manifolds was established in [19].

It was shown in [19] that solutions to the $p$ -Bregman Euler–Lagrange equations converge to a minimizer of $f$ at a convergence rate of $\mathcal{O}(1/t^{p})$ , under suitable assumptions, and proven that time-rescaling a solution to the $p$ -Bregman Euler–Lagrange equations via $\tau(t)=t^{\mathring{p}/p}$ yields a solution to the $\mathring{p}$ -Bregman Euler–Lagrange equations. As a result, the adaptive approach involving the Poincaré transformation was extended to the Riemannian manifold setting via the adaptive approach Riemannian $p\rightarrow\mathring{p}$ Bregman Hamiltonian,

\displaystyle\bar{\mathcal{H}}_{p\rightarrow\mathring{p}}\left(\bar{Q},\bar{R}\right)

\displaystyle=\frac{p^{2}}{2\mathring{p}\mathfrak{Q}^{\frac{\zeta}{\lambda}p+\frac{\mathring{p}}{p}}}\llangle R,R\rrangle+\frac{Cp^{2}}{\mathring{p}}\mathfrak{Q}^{\left(\frac{\zeta}{\lambda}+1\right)p-\frac{\mathring{p}}{p}}f(Q)+\frac{p}{\mathring{p}}\mathfrak{Q}^{1-\frac{\mathring{p}}{p}}\mathfrak{R}.

(5.4)

This adaptive framework was exploited using discrete variational integrators incorporating holonomic constraints [16] and projection-based variational integrators [18]. Both these strategies relied on embedding the Riemannian manifolds into an ambient Euclidean space. Although the Whitney and Nash Embedding Theorems [64; 63; 53] imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, designing intrinsic methods that would exploit and preserve the symmetries and geometric properties of the Riemannian manifold and of the problem at hand could have advantages, both in terms of computation and in terms of improving our understanding of the acceleration phenomenon on Riemannian manifolds.

Developing an intrinsic extension of Hamiltonian variational integrators to manifolds would require some additional work, since the current approach involves Type II/III generating functions $H_{d}^{+}(q_{k},p_{k+1})$ , $H_{d}^{-}(p_{k},q_{k+1})$ , which depend on the position at one boundary point, and on the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space, and one should ideally consider a Hamiltonian variational integrator construction based on the discrete generalized energy in the context of discrete Dirac mechanics [44; 66; 67].

On the other hand, Lagrangian variational integrators involve a Type I generating function $L_{d}(q_{k},q_{k+1})$ which only depends on the position at the boundary points and is therefore well-defined on manifolds, and many Lagrangian variational integrators have been derived on Riemannian manifolds, especially in the Lie group [38; 39; 5; 27; 56; 43; 28; 37; 36] and homogeneous space [40] settings. The time-adaptive framework developed in this paper now makes it possible to design time-adaptive Lagrangian integrators for accelerated optimization on these more general spaces, where it is more natural and easier to work on the Lagrangian side than on the Hamiltonian side.

5.2. Accelerated Optimization on Lie Groups

Although it is possible to work on Riemannian manifolds, we will restrict ourselves to Lie groups for simplicity of exposition since there is more literature available on Lie group integrators than Riemannian integrators. Note as well that prior work is available on accelerated optimization via numerical integration of Bregman dynamics in the Lie group setting [42; 60].

Here, we will work in the setting introduced in [42]. The setting of [60] can be thought of as a special case of the more general Lie group framework for accelerated optimization presented here. Consider a $n$ -dimensional Lie group $G$ with associated Lie algebra $\mathfrak{g}=T_{e}G$ , and a left-trivialization of the tangent bundle of the group $TG\simeq G\times\mathfrak{g}$ , via $(q,\dot{q})\mapsto(q,\text{L}_{q^{-1}}\dot{q})\equiv(q,\xi)$ , where $\text{L}:G\times G\rightarrow G$ is the left action defined by $\text{L}_{q}h=qh$ for all $q,h\in G$ . Suppose that $\mathfrak{g}$ is equipped with an inner product which induces an inner product on $T_{q}G$ via left-trivialization,

(v\bullet w)_{T_{q}G}=(\text{T}_{q}\text{L}_{q^{-1}}v\bullet\text{T}_{q}\text{L}_{q^{-1}}w)_{\mathfrak{g}},\qquad\forall v,w\in T_{q}G.

With this inner product, we identify $\mathfrak{g}\simeq\mathfrak{g}^{*}$ and $T_{q}G\simeq T_{q}^{*}G\simeq G\times\mathfrak{g}^{*}$ via the Riesz representation. Let $\mathbf{J}:\mathfrak{g}\rightarrow\mathfrak{g}^{*}$ be chosen such that $(\mathbf{J}(\xi)\bullet\zeta)$ is positive-definite and symmetric as a bilinear form of $\xi,\zeta\in\mathfrak{g}$ . Then, the metric $\langle\cdot,\cdot\rangle:\mathfrak{g}\times\mathfrak{g}\rightarrow\mathbb{R}$ with $\langle\xi,\zeta\rangle=(\mathbf{J}(\xi)\bullet\zeta)$ serves as a left-invariant Riemannian metric on $G$ . The adjoint and ad operators are denoted by $\text{Ad}_{q}:\mathfrak{g}\rightarrow\mathfrak{g}$ and $\text{ad}_{\xi}:\mathfrak{g}\rightarrow\mathfrak{g}$ , respectively. We refer the reader to [48; 41; 23] for a more detailed description of Lie group theory and mechanics on Lie groups.

As mentioned earlier, there is a lot of literature available on Lie group integrators. We refer the reader to [30; 11; 12; 13] for very thorough surveys of the literature on Lie group methods, which acknowledge all the foundational contributions leading to the current state of Lie group integrator theory. In particular, the Crouch and Grossman approach [14], the Lewis and Simo approach [47], Runge–Kutta–Munthe–Kaas methods [51; 52; 50; 9], Magnus and Fer expansions [68; 4; 29], and commutator-free Lie group methods [10] are outlined in these surveys. Variational integrators have also been derived on the Lagrangian side in the Lie group setting [38; 39; 5; 27; 56; 43; 28; 37; 36].

We now introduce a discrete variational formulation of time-adaptive Lagrangian mechanics on Lie groups. Suppose we are given a partition $0=\tau_{0}<\tau_{1}<\ldots<\tau_{N}=\mathcal{T}$ of the interval $[0,\mathcal{T}]$ , and a discrete curve in $G\times\mathbb{R}\times\mathbb{R}$ denoted by $\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}$ such that $q_{k}\approx q(\tau_{k})$ , $\mathfrak{q}_{k}\approx\mathfrak{q}(\tau_{k})$ , and $\lambda_{k}\approx\lambda(\tau_{k})$ . The discrete kinematics equation is chosen to be

q_{k+1}=q_{k}f_{k},

(5.5)

where $f_{k}\in G$ represents the relative update over a single step.

Consider the discrete action functional,

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}},

(5.6)

where,

L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],G\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=\left(q_{k}f_{k},\mathfrak{q}_{k+1}\right)\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{\xi}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}.

(5.7)

We can derive the following result which relates a discrete Type I variational principle to a set of discrete Euler–Lagrange equations:

Theorem 5.1.

The Type I discrete Hamilton’s variational principle,

\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

(5.8)

where,

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}},

(5.9)

is equivalent to the discrete extended Euler–Lagrange equations,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

(5.10)

\emph{Ad}^{*}_{f_{k}^{-1}}\left(\emph{T}_{e}^{*}\emph{L}_{f_{k}}D_{2}L_{d_{k}}\right)=\emph{T}_{e}^{*}\emph{L}_{q_{k}}D_{1}L_{d_{k}}+\frac{\tau_{k+1}-\tau_{k}}{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\emph{T}_{e}^{*}\emph{L}_{f_{k-1}}D_{2}L_{d_{k-1}},

(5.11)

		$\displaystyle\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]$		(5.12)
		$\displaystyle\qquad+\left[D_{4}L_{d_{k}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,$		(5.12)

where $L_{d_{k}}$ denotes $L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})$ .

Proof.

See Appendix A.3. ∎

Now, define

\displaystyle\mathfrak{p}_{k}

\displaystyle=-D_{3}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})

(5.13)

and

\displaystyle\mu_{k}

\displaystyle=\text{Ad}^{*}_{f_{k}^{-1}}\left(\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\right)-\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}).

(5.14)

Then,

\displaystyle\mu_{k+1}

\displaystyle=\frac{\tau_{k+2}-\tau_{k+1}}{\mathfrak{q}_{k+2}-\mathfrak{q}_{k+1}}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}).

(5.15)

With these definitions, if we use a constant time-step $h$ in fictive time $\tau$ and substitute $g(\mathfrak{q})=\frac{p}{\mathring{p}}\mathfrak{q}^{1-\mathring{p}/p}$ , the discrete Euler–Lagrange equations can be rewritten as

$\displaystyle\mu_{k}$	$\displaystyle=\text{Ad}^{}_{f_{k}^{-1}}\left(\text{T}_{e}^{}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\right)-\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}),$	(5.16)
$\displaystyle\mu_{k+1}$	$\displaystyle=\frac{\mathfrak{q}_{k}^{1-\mathring{p}/p}}{\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}),$	(5.17)
$\displaystyle\mathfrak{q}_{k+1}$	$\displaystyle=\mathfrak{q}_{k}+h\frac{p}{\mathring{p}}\mathfrak{q}_{k}^{1-\mathring{p}/p},$	(5.18)
$\displaystyle\mathfrak{p}_{k+1}$	$\displaystyle=\frac{\mathring{p}\left[\lambda_{k+1}-\lambda_{k}+L_{d_{k}}-L_{d_{k+1}}\right]}{hp\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}+D_{4}L_{d_{k}}+\frac{\lambda_{k+1}}{\mathfrak{q}_{k+1}}\left(1-\frac{\mathring{p}}{p}\right).$	(5.19)

In the Lie group setting, the Riemannian $p$ -Bregman Lagrangian becomes

\mathcal{L}_{p}(q,\xi,t)=\frac{t^{\kappa p+1}}{2p}\langle\xi,\xi\rangle-Cpt^{(\kappa+1)p-1}f(q),

(5.20)

with corresponding Euler–Lagrange equation,

\frac{d\mathbf{J}(\xi)}{dt}+\frac{\kappa p+1}{t}\mathbf{J}(\xi)-\text{ad}_{\xi}^{*}\mathbf{J}(\xi)+Cp^{2}t^{p-2}\nabla_{\text{L}}f(q)=0,

(5.21)

where $\nabla_{\text{L}}f$ is the left-trivialized derivative of $f$ , given by $\nabla_{\text{L}}f(q)=\text{T}_{e}^{*}\text{L}_{q}(\text{D}_{q}f(q))$ . We then consider the discrete Lagrangian,

\displaystyle L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})

\displaystyle=\frac{\mathfrak{q}_{k}^{\kappa p+1}}{hp(g(\mathfrak{q}_{k}))^{2}}T_{d}(f_{k})-Chp\mathfrak{q}_{k}^{(\kappa+1)p-1}f(q_{k}),

(5.22)

where $T_{d}(f_{k})\approx\frac{1}{2}\langle h\xi_{k},h\xi_{k}\rangle$ , which approximates

L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],G\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=(q_{k}f_{k},\mathfrak{q}_{k+1})\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{\xi}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}.

(5.23)

5.3. Numerical Experiment on $\text{SO}(3)$

We work on the 3-dimensional Special Orthogonal group,

\text{SO}(3)=\{R\in\mathbb{R}^{3\times 3}|R^{\top}R=I_{3\times 3},\det{(R)}=1\}.

(5.24)

Its Lie algebra is

\mathfrak{so}(3)=\{S\in\mathbb{R}^{3\times 3}|S^{\top}=-S\},

(5.25)

with the matrix commutator as the Lie bracket. We have an identification between $\mathbb{R}^{3}$ and $\mathfrak{so}(3)$ given by the hat map $\hat{\cdot}:\mathbb{R}^{3}\rightarrow\mathfrak{so}(3)$ , defined such that $\hat{x}y=x\times y$ for any $x,y\in\mathbb{R}^{3}$ . The inverse of the hat map is the vee map $(\cdot)^{\vee}:\mathfrak{so}(3)\rightarrow\mathbb{R}^{3}$ . The inner product on $\mathfrak{so}(3)$ is given by

\left(\hat{\eta}\bullet\hat{\xi}\right)_{\mathfrak{so}(3)}=\frac{1}{2}\text{Trace}\left(\hat{\eta}^{\top}\hat{\xi}\right)=\eta^{\top}\xi,

(5.26)

and the metric is chosen so that

\langle\hat{\eta},\hat{\xi}\rangle=\left(\mathbf{J}(\hat{\eta})\bullet\hat{\xi}\right)_{\mathfrak{so}(3)}=\text{Trace}\left(\hat{\eta}^{\top}J_{d}\hat{\xi}\right)=\eta^{\top}J\xi,

(5.27)

where $J\in\mathbb{R}^{3\times 3}$ is a symmetric positive-definite matrix and $J_{d}=\frac{1}{2}\text{Trace}(J)I_{3\times 3}-J$ .
On $\text{SO}(3)$ , for any $u,v\in\mathbb{R}^{3}$ and $F\in\text{SO}(3)$ ,

\text{ad}_{\hat{u}}\hat{v}=[\hat{u},\hat{v}]=\hat{u}\hat{v}-\hat{v}\hat{u}=\widehat{u\times v},\qquad\text{Ad}_{F}\hat{u}=F\hat{u}F^{\top}=\widehat{Fu}.

(5.28)

Identifying $\mathfrak{so}(3)^{*}\simeq\mathfrak{so}(3)\simeq\mathbb{R}^{3}$ , we have for any $u,v\in\mathbb{R}^{3}$ and $F\in\text{SO}(3)$ that

\text{ad}_{u}v=\hat{u}v=u\times v,\qquad\text{ad}^{*}_{u}v=-\hat{u}v=v\times u,\qquad\text{Ad}_{F}u=Fu,\qquad\text{Ad}_{F}^{*}u=F^{\top}u.

(5.29)

On $\text{SO}(3)$ , the Riemannian $p$ -Bregman Lagrangian becomes

\mathcal{L}_{p}(R,\Omega,t)=\frac{t^{p+1}}{2p}\Omega^{\top}J\Omega-Cpt^{2p-1}f(R),

(5.30)

and the corresponding Euler–Lagrange equations are given by

J\dot{\Omega}+\frac{p+1}{t}J\Omega+\hat{\Omega}J\Omega+Cp^{2}t^{p-2}\nabla_{\text{L}}f(R)=0,\qquad\dot{R}=R\hat{\Omega}.

(5.31)

The discrete kinematics equation is written as

R_{k+1}=R_{k}F_{k},

(5.32)

where $F_{k}\in\text{SO}(3)$ , and $\kappa=1$ so we get the discrete Lagrangian,

\displaystyle L_{d}(R_{k},F_{k},\mathfrak{R}_{k},\mathfrak{R}_{k+1})

\displaystyle=\frac{\mathring{p}^{2}}{hp^{3}}\mathfrak{R}_{k}^{p-1+2\mathring{p}/p}T_{d}(F_{k})-Chp\mathfrak{R}_{k}^{2p-1}f(R_{k}).

(5.33)

As in [38; 42], the angular velocity is approximated by $\hat{\Omega}_{k}\approx\frac{1}{h}R_{k}^{\top}(R_{k+1}-R_{k})=\frac{1}{h}(F_{k}-I_{3\times 3})$ so we can take

T_{d}(F_{k})=\text{Trace}\left([I_{3\times 3}-F_{k}]J_{d}\right).

(5.34)

Differentiating this equation and using the identity $\text{Trace}(-\hat{x}A)=(A-A^{\top})^{\vee}\cdot x$ yields

\text{T}_{I}^{*}\text{L}_{F_{k}}\left(D_{F_{k}}T_{d}(F_{k})\right)=\left(J_{d}F_{k}-F_{k}^{\top}J_{d}\right)^{\vee}.

(5.35)

Then, the discrete Euler–Lagrange equations for $\mu_{k}$ and $\mu_{k+1}$ become

	$\displaystyle\mu_{k}$	$\displaystyle=\frac{\mathring{p}^{2}}{hp^{3}}\mathfrak{R}_{k}^{p-1+2\mathring{p}/p}\left(F_{k}J_{d}-J_{d}F_{k}^{\top}\right)^{\vee}+Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k}),$		(5.36)
	$\displaystyle\mu_{k+1}$	$\displaystyle=\frac{\mathfrak{q}_{k}^{1-\mathring{p}/p}}{\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}F_{k}^{\top}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right].$		(5.37)

Now, equation (5.36) can be solved explicitly when $J=I_{3\times 3}$ as described in [42]:

\displaystyle F_{k}=\exp{\left(\frac{\sin^{-1}{\|a_{k}\|}}{\|a_{k}\|}\hat{a}_{k}\right)},\qquad\text{where}\quad a_{k}=\frac{hp^{3}}{\mathring{p}^{2}}\mathfrak{R}_{k}^{1-p-2\mathring{p}/p}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right].

(5.38)

Therefore, we get the Adaptive LLGVI (Adaptive Lagrangian Lie Group Variational Integrator)

	$\displaystyle F_{k}=\exp{\left(\frac{\sin^{-1}{\\|a_{k}\\|}}{\\|a_{k}\\|}\hat{a}_{k}\right)},\text{ }\text{ where }\text{ }a_{k}=\frac{hp^{3}}{\mathring{p}^{2}}\mathfrak{R}_{k}^{1-p-2\mathring{p}/p}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right],$		(5.39)
	$\displaystyle\mathfrak{R}_{k+1}=\mathfrak{R}_{k}+h\frac{p}{\mathring{p}}\mathfrak{R}_{k}^{1-\mathring{p}/p},$		(5.40)
	$\displaystyle\mu_{k+1}=\frac{\mathfrak{R}_{k}^{1-\mathring{p}/p}}{\mathfrak{R}_{k+1}^{1-\mathring{p}/p}}F_{k}^{\top}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right],$		(5.41)
	$\displaystyle R_{k+1}=R_{k}F_{k}.$		(5.42)

We will use this integrator to solve the problem of minimizing the objective function,

f(R)=\frac{1}{2}\|A-R\|_{F}^{2}=\frac{1}{2}\left(\|A\|_{F}^{2}+3\right)-\text{Trace}(A^{\top}R),

(5.43)

over $R\in\text{SO}(3)$ , where $\|\cdot\|_{F}$ denotes the Frobenius norm. Its left-trivialized gradient is given by

\nabla_{\text{L}}f(R)=\left(A^{\top}R-R^{\top}A\right)^{\vee}.

(5.44)

Minimizing this objective function appears in the least-squares estimation of attitude, commonly known as Wahba’s problem [62]. The optimal attitude is explicitly given by

R^{*}=U\text{diag}\left[1,1,\det(UV)\right]V^{\top},

(5.45)

where $A=USV^{\top}$ is the singular value decomposition of $A$ with $U,V\in\text{O}(3)$ and $S$ diagonal.

We have tested the Adaptive LLGVI integrator on Wahba’s problem against the Implicit Lie Group Variational Integrator (Implicit LGVI) from [42]. The Implicit LGVI is a Lagrangian Lie group variational integrator which adaptively adjusts the step size at every step. It should be noted that these two adaptive approaches use adaptivity in two fundamentally different ways: our Adaptive LLGVI method uses a priori adaptivity based on known global properties of the family of differential equations considered (i.e. the time-rescaling symmetry of the family of Bregman dynamics), while the implicit method from [42] adapts the time-steps in an a posteriori way, by solving a system of nonlinear equations coming from an extended variational principle. The results of our numerical experiments are presented in Figures 3 and 4. In these numerical experiments, we have used the termination criteria

|f(R_{k})-f(R^{*})|<\delta\qquad\text{and}\qquad|f(R_{k})-f(R_{k-1})|<\delta.

(5.46)

We can see from Figure 3 that both algorithms preserve the orthogonality condition $R_{k}^{\top}R_{k}=I_{3\times 3}$ very well. Now, we can observe from Figure 3 that although both algorithms follow the same curve in time $t$ , they do not travel along this curve at the same speed. Despite the fact that the Adaptive LLGVI algorithm initially takes smaller time-steps, those time-steps eventually become much larger than for the Implicit LGVI algorithm, and as a result, the Adaptive LLGVI algorithm achieves the termination criteria in a smaller number of iterations, which can also be seen more explicitly in the table from Figure 4. Unlike the Implicit LGVI algorithm, the Adaptive LLGVI algorithm is explicit, so each iteration is much cheaper and is therefore significantly faster, as can be seen from the running times displayed in Figure 4. Furthermore, the Adaptive LLGVI algorithm is significantly easier to implement.

6. Conclusion

A variational framework for accelerated optimization on vector spaces was introduced [65] by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This variational framework was exploited in [20] by using time-adaptive geometric Hamiltonian integrators to design efficient, explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity, which was possible on the Hamiltonian side thanks to the Poincaré transformation, could result in a significant gain in computational efficiency, by simulating higher-order Bregman dynamics using the computationally efficient lower-order Bregman integrators applied to the time-rescaled dynamics.

The variational framework and time-adaptive approach on the Hamiltonian side were later extended to the Riemannian manifolds setting in [19]. However, the current formulations of Hamiltonian variational integrators do not make sense intrinsically on manifolds, so this framework was only exploited using methods which take advantage of the structure of the Euclidean spaces in which the Riemannian manifolds are embedded [16; 18] instead of the structure of the Riemannian manifolds themselves. On the other hand, existing formulations of Lagrangian variational integrators are well-defined on manifolds, and many Lagrangian variational integrators have been derived on Riemannian manifolds, especially in the Lie group setting. This motivated exploring whether it is possible to construct a mechanism on the Lagrangian side which mimics the Poincaré transformation, since it is more natural and easier to work on the Lagrangian side on curved manifolds.

The usual correspondence between Hamiltonian and Lagrangian dynamics could not be exploited here since the Poincaré Hamiltonian is degenerate and therefore does not have a corresponding Lagrangian formulation. Instead, we introduced a novel derivation of the Poincaré transformation from a variational principle which gave us additional insight into the transformation mechanism and provided natural candidates for a time-adaptive framework on the Lagrangian side. Based on these observations, we constructed a theory of time-adaptive Lagrangian mechanics both in continuous and discrete time, and applied the resulting time-adaptive Lagrangian variational integrators to solve optimization problems by simulating Bregman dynamics, within the variational framework introduced in [65]. We observed empirically that our time-adaptive Lagrangian variational integrators performed almost exactly in the same way as the time-adaptive Hamiltonian variational integrators coming from the Poincaré framework of [20], whenever they are used with the same parameters and time-step. As a result, the computational analysis carried in [20] for the HTVI algorithm extends to the LTVI algorithm, and thus the LTVI algorithm is much more efficient than non-symplectic integrators for the Bregman dynamics and can be a competitive first-order explicit algorithm since it can outperform commonly used optimization algorithms for certain objective functions.

Finally, we showed that our time-adaptive Lagrangian approach extends naturally to more general spaces such as Riemannian manifolds and Lie groups without having to face the difficulties experienced on the Hamiltonian side, and we then applied time-adaptive Lie group Lagrangian variational integrators to solve an optimization problem on the three-dimensional Special Orthogonal group $\text{SO}(3)$ . In particular, the resulting Lie group accelerated optimization algorithms were significantly faster and easier to implement than other recently proposed time-adaptive Lie group variational integrators for accelerated optimization.

In future work, we will explore the issue of time-adaptive Lagrangian mechanics for more general monitor functions, using the primal-dual framework of Dirac mechanics. We will also study the convergence properties of the discrete-time algorithms, and try to better understand how to reconcile the Nesterov barrier theorem with the convergence properties of the continuous Bregman flows. It would also be useful to study the extent to which the practical considerations recently presented in [17], which significantly improved the computational performance of the symplectic optimization algorithms in the normed vector space setting, extend to the Riemannian manifold and Lie group settings with the Lagrangian Riemannian and Lie group variational integrators.

Acknowledgments

The authors would like to thank Taeyoung Lee and Molei Tao for sharing the code for the implicit Lie group variational integrator from [42], which was used in the numerical comparison with the method introduced in this paper. The authors would also like to thank the referees for their helpful comments and suggestions.

Funding

The authors were supported in part by NSF under grants DMS-1345013, DMS-1813635, CCF-2112665, by AFOSR under grant FA9550-18-1-0288, and by the DoD under grant HQ00342010023 (Newton Award for Transformative Ideas during the COVID-19 Pandemic).

Appendix A Proofs of Theorems

A.1. Proof of Theorem 3.2

Theorem A.1.

The Type I discrete Hamilton’s variational principle,

\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

where,

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}},

is equivalent to the discrete extended Euler–Lagrange equations,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,

	$\displaystyle\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]$
	$\displaystyle\qquad+\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,$

where $L_{d_{k}}$ denotes $L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})$ .

Proof.

We use the notation $L_{d_{k}}=L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})$ , and we will use the fact that $\delta q_{0}=\delta q_{N}=\delta\mathfrak{q}_{0}=\delta\mathfrak{q}_{N}=0$ throughout the proof. We have

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}$	$\displaystyle=\delta\left(\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}\right)$
		$\displaystyle=\sum_{k=1}^{N-1}{\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\qquad+\sum_{k=0}^{N-2}{\left[D_{4}L_{d_{k}}-\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k+1}}+\sum_{k=0}^{N-2}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k+1}}$
		$\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}\delta q_{k}}+\sum_{k=0}^{N-2}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{3}L_{d_{k}}\delta q_{k+1}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.$

Thus,

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}$	$\displaystyle=\sum_{k=1}^{N-1}{\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\delta\mathfrak{q}_{k}}+\sum_{k=1}^{N-1}{\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d_{k-1}}\right]\delta q_{k}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.$

As a consequence, if

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,

	$\displaystyle\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]$
	$\displaystyle\qquad+\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,$

then $\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0$ . Conversely, if $\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0$ , then a discrete fundamental theorem of the calculus of variations yields the above equations. ∎

A.2. Proof of Theorem 3.4

Theorem A.2.

The Type I discrete Hamilton’s variational principle,

\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

where,

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left\{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\right]+\lambda_{k}g(\mathfrak{q}_{k})\right\}},

is equivalent to the discrete extended Euler–Lagrange equations,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,

\displaystyle\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}L_{d_{k-1}}=\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}-\lambda_{k}\nabla g(\mathfrak{q}_{k}),

where $L_{d_{k}}$ denotes $L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})$ .

Proof.

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}$	$\displaystyle=\delta\left(\sum_{k=0}^{N-1}{\left\{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\right]+\lambda_{k}g(\mathfrak{q}_{k})\right\}}\right)$
		$\displaystyle=\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}+\sum_{k=0}^{N-2}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{4}L_{d_{k}}+\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}\right]\delta\mathfrak{q}_{k+1}}$
		$\displaystyle\quad\qquad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}\delta q_{k}}+\sum_{k=0}^{N-2}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{3}L_{d_{k}}\delta q_{k+1}}+\sum_{k=0}^{N-1}{\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.$

Thus,

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}$	$\displaystyle=\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}L_{d_{k-1}}-\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d_{k-1}}\right]\delta q_{k}}+\sum_{k=0}^{N-1}{\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.$

As a consequence, if

\displaystyle\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}L_{d_{k-1}}=\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}-\lambda_{k}\nabla g(\mathfrak{q}_{k}),

\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

A.3. Proof of Theorem 5.1

Theorem A.3.

The Type I discrete Hamilton’s variational principle,

\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

where,

\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}},}

is equivalent to the discrete extended Euler–Lagrange equations,

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

\displaystyle\emph{Ad}^{*}_{f_{k}^{-1}}\left(\emph{T}_{e}^{*}\emph{L}_{f_{k}}D_{2}L_{d_{k}}\right)

\displaystyle=\emph{T}_{e}^{*}\emph{L}_{q_{k}}D_{1}L_{d_{k}}+\frac{\tau_{k+1}-\tau_{k}}{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\emph{T}_{e}^{*}\emph{L}_{f_{k-1}}D_{2}L_{d_{k-1}},

	$\displaystyle\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]$
	$\displaystyle\qquad+\left[D_{4}L_{d_{k}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,$

where $L_{d_{k}}$ denotes $L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})$ .

Proof.

We use the notation $L_{d_{k}}=L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})$ and we will use the fact that $\delta q_{0}=\delta q_{N}=\delta\mathfrak{q}_{0}=\delta\mathfrak{q}_{N}=\eta_{0}=\eta_{N}=0$ throughout the proof. We have

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)$	$\displaystyle=\delta\left(\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}\right)$
		$\displaystyle=\sum_{k=1}^{N-1}{\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\qquad+\sum_{k=0}^{N-2}{\left[D_{4}L_{d_{k}}-\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k+1}}+\sum_{k=0}^{N-2}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k+1}}$
		$\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}\delta q_{k}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}\delta f_{k}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.$

We can write $\delta g_{k}$ as $\delta g_{k}=g_{k}\eta_{k}$ for some $\eta_{k}\in\mathfrak{g}$ . Then, taking the variation of the discrete kinematics equation $q_{k+1}=q_{k}f_{k}$ gives the equation $\delta q_{k+1}=\delta q_{k}f_{k}+q_{k}\delta f_{k}$ and $f_{k}=q_{k}^{-1}q_{k+1}$ . Therefore,

\displaystyle\delta f_{k}

\displaystyle=q_{k}^{-1}\delta q_{k+1}-q_{k}^{-1}\delta q_{k}f_{k}=q_{k}^{-1}q_{k+1}\eta_{k+1}-q_{k}^{-1}q_{k}\eta_{k}f_{k}=f_{k}\eta_{k+1}-\eta_{k}f_{k},

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)$	$\displaystyle=\sum_{k=1}^{N-1}{\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\quad+\sum_{k=1}^{N-1}{\left[\left(D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right)\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left(L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right)\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\qquad\quad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{}\text{L}_{q_{k}}D_{1}L_{d_{k}}\bullet\eta_{k}\right)}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{}\text{L}_{f_{k}}D_{2}L_{d_{k}}\bullet\left[\eta_{k+1}-f_{k}^{-1}\eta_{k}f_{k}\right]\right)}$
		$\displaystyle\qquad\qquad\qquad+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.$

Then,

	$\displaystyle\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)$	$\displaystyle=\sum_{k=1}^{N-1}{\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad+\sum_{k=1}^{N-1}{\left[\left(D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right)\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left(L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right)\right]\delta\mathfrak{q}_{k}}$
		$\displaystyle\qquad\quad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d_{k}}\bullet\eta_{k}\right)}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}$
		$\displaystyle\qquad\qquad+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\left(\text{T}_{e}^{}\text{L}_{f_{k-1}}D_{2}L_{d_{k-1}}\bullet\eta_{k}\right)}-\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{}\text{L}_{f_{k}}D_{2}L_{d_{k}}\bullet\text{Ad}_{f_{k}^{-1}}\eta_{k}\right)}.$

As a consequence, if

\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

\displaystyle\emph{Ad}^{*}_{f_{k}^{-1}}\left(\emph{T}_{e}^{*}\emph{L}_{f_{k}}D_{2}L_{d_{k}}\right)

\displaystyle=\emph{T}_{e}^{*}\emph{L}_{q_{k}}D_{1}L_{d_{k}}+\frac{\tau_{k+1}-\tau_{k}}{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\emph{T}_{e}^{*}\emph{L}_{f_{k-1}}D_{2}L_{d_{k-1}},

	$\displaystyle\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]$
	$\displaystyle\qquad+\left[D_{4}L_{d_{k}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,$

The authors have no competing interests to declare that are relevant to the content of this article.

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

Absil et al. [2008] P. A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008. ISBN 978-0-691-13298-3.
Alimisis et al. [2020] F. Alimisis, A. Orvieto, G. Bécigneul, and A. Lucchi. A continuous-time perspective for modeling acceleration in Riemannian optimization. In Proceedings of the 23rd International AISTATS Conference, volume 108 of PMLR, pages 1297–1307, 2020.
Betancourt et al. [2018] M. Betancourt, M. I. Jordan, and A. Wilson. On symplectic optimization. 2018.
Blanes et al. [2008] S. Blanes, F. Casas, J. A. Oteo, and J. Ros. The Magnus expansion and some of its applications. Physics Reports, 470:151–238, 11 2008. doi: 10.1016/j.physrep.2008.11.001.
Bou-Rabee and Marsden [2009] N. Bou-Rabee and J. E. Marsden. Hamilton–Pontryagin integrators on Lie groups part I: Introduction and structure-preserving properties. Foundations of Computational Mathematics, 9(2):197–219, 2009.
Boumal [2020] N. Boumal. An introduction to optimization on smooth manifolds, 2020. URL http://www.nicolasboumal.net/book.
Calvo and Sanz-Serna [1993] J. P. Calvo and J. M. Sanz-Serna. The development of variable-step symplectic integrators, with application to the two-body problem. SIAM J. Sci. Comp., 14(4):936–952, 1993.
Campos et al. [2021] C. M. Campos, A. Mahillo, and D. Martín de Diego. A discrete variational derivation of accelerated methods in optimization. 2021. URL https://arxiv.org/abs/2106.02700.
Casas and Owren [2003] F. Casas and B. Owren. Cost efficient Lie group integrators in the RKMK class. BIT, 43:723–742, 11 2003. doi: 10.1023/B:BITN.0000009959.29287.d4.
Celledoni et al. [2003] E. Celledoni, A. Marthinsen, and B. Owren. Commutator-free Lie group methods. Future Gener. Comput. Syst., 19:341–352, 2003.
Celledoni et al. [2014] E. Celledoni, H. Marthinsen, and B. Owren. An introduction to Lie group integrators – basics, new developments and applications. Journal of Computational Physics, 257:1040–1061, 2014. ISSN 0021-9991. doi: 10.1016/j.jcp.2012.12.031.
Celledoni et al. [2022] E. Celledoni, E. Çokaj, A. Leone, D. Murari, and B. Owren. Lie group integrators for mechanical systems. International Journal of Computer Mathematics, 99(1):58–88, 2022. doi: 10.1080/00207160.2021.1966772.
Christiansen et al. [2011] S. H. Christiansen, H. Z. Munthe-Kaas, and B. Owren. Topics in structure-preserving discretization. Acta Numerica, 20:1–119, 2011. doi: 10.1017/S096249291100002X.
Crouch and Grossman [1993] P. E. Crouch and R. L. Grossman. Numerical integration of ordinary differential equations on manifolds. Journal of Nonlinear Science, 3:1–33, 1993.
Duchi et al. [2011] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(61):2121–2159, 2011.
Duruisseaux and Leok [2022a] V. Duruisseaux and M. Leok. Accelerated optimization on Riemannian manifolds via discrete constrained variational integrators. Journal of Nonlinear Science, 32(42), 2022a. URL https://doi.org/10.1007/s00332-022-09795-9.
Duruisseaux and Leok [2022b] V. Duruisseaux and M. Leok. Practical perspectives on symplectic accelerated optimization. 2022b. URL https://arxiv.org/abs/2207.1146.
Duruisseaux and Leok [2022c] V. Duruisseaux and M. Leok. Accelerated optimization on Riemannian manifolds via projected variational integrators. 2022c. URL https://arxiv.org/abs/2201.02904.
Duruisseaux and Leok [2022d] V. Duruisseaux and M. Leok. A variational formulation of accelerated optimization on Riemannian manifolds. SIAM Journal on Mathematics of Data Science, 4(2):649–674, 2022d. URL https://doi.org/10.1137/21M1395648.
Duruisseaux et al. [2021] V. Duruisseaux, J. Schmitt, and M. Leok. Adaptive Hamiltonian variational integrators and applications to symplectic accelerated optimization. SIAM Journal on Scientific Computing, 43(4):A2949–A2980, 2021. URL https://doi.org/10.1137/20M1383835.
[21] G. França, M. I. Jordan, and R. Vidal. On dissipative symplectic integration with applications to gradient-based optimization. Journal of Statistical Mechanics: Theory and Experiment, 2021(4):043402. doi: 10.1088/1742-5468/abf5d4.
França et al. [2021] G. França, A. Barp, M. Girolami, and M. I. Jordan. Optimization on manifolds: A symplectic approach, 07 2021.
Gallier and Quaintance [2020] J. Gallier and J. Quaintance. Differential Geometry and Lie Groups: A Computational Perspective. Geometry and Computing. Springer International Publishing, 2020. ISBN 9783030460402.
Gladman et al. [1991] B. Gladman, M. Duncan, and J. Candy. Symplectic integrators for long-time integrations in celestial mechanics. Celestial Mech. Dynamical Astronomy, 52:221–240, 1991.
Hairer [1997] E. Hairer. Variable time step integration with symplectic methods. Applied Numerical Mathematics, 25(2-3):219–227, 1997.
Hairer et al. [2006] E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration, volume 31 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, second edition, 2006.
Hall and Leok [2015] J. Hall and M. Leok. Spectral Variational Integrators. Numer. Math., 130(4):681–740, 2015.
Hussein et al. [2006] I. I. Hussein, M. Leok, A. K. Sanyal, and A. M. Bloch. A discrete variational integrator for optimal control problems on SO(3). Proceedings of the 45th IEEE Conference on Decision and Control, pages 6636–6641, 2006.
Iserles and Nørsett [1999] A. Iserles and S. P. Nørsett. On the solution of linear differential equations in Lie groups. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 357(1754):983–1019, 1999. ISSN 1364503X.
Iserles et al. [2000] A. Iserles, H. Z. Munthe-Kaas, S. P. Nørsett, and A. Zanna. Lie group methods. In Acta Numerica, volume 9, pages 215–365. Cambridge University Press, 2000.
[31] M. I. Jordan. Dynamical, symplectic and stochastic perspectives on gradient-based optimization. In Proceedings of the International Congress of Mathematicians (ICM 2018), pages 523–549. doi: 10.1142/9789813272880˙0022.
Jost [2017] J. Jost. Riemannian Geometry and Geometric Analysis. Universitext. Springer, Cham, 7th edition, 2017.
Kingma and Ba [2014] D. Kingma and J. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014.
Lall and West [2006] S. Lall and M. West. Discrete variational Hamiltonian mechanics. J. Phys. A, 39(19):5509–5519, 2006.
Lee [2018] J. M. Lee. Introduction to Riemannian Manifolds, volume 176 of Graduate Texts in Mathematics. Springer, Cham, second edition, 2018.
Lee [2008] T. Lee. Computational geometric mechanics and control of rigid bodies. Ph.D. dissertation, University of Michigan, 2008.
[37] T. Lee, N. H. McClamroch, and M. Leok. A Lie group variational integrator for the attitude dynamics of a rigid body with applications to the 3d pendulum. Proceedings of 2005 IEEE Conference on Control Applications. (CCA 2005), pages 962–967.
Lee et al. [2007a] T. Lee, M. Leok, and N. H. McClamroch. Lie group variational integrators for the full body problem. Comput. Methods Appl. Mech. Engrg., 196(29-30):2907–2924, 2007a.
Lee et al. [2007b] T. Lee, M. Leok, and N. H. McClamroch. Lie group variational integrators for the full body problem in orbital mechanics. Celestial Mechanics and Dynamical Astronomy, 98(2):121–144, 2007b.
Lee et al. [2009] T. Lee, M. Leok, and N. H. McClamroch. Lagrangian mechanics and variational integrators on two-spheres. International Journal for Numerical Methods in Engineering, 79(9):1147–1174, 2009.
Lee et al. [2017] T. Lee, M. Leok, and N. H. McClamroch. Global Formulations of Lagrangian and Hamiltonian Dynamics on Manifolds: A Geometric Approach to Modeling and Analysis. Interaction of Mechanics and Mathematics. Springer International Publishing, 2017. ISBN 9783319569536.
Lee et al. [2021] T. Lee, M. Tao, and M. Leok. Variational symplectic accelerated optimization on Lie groups. 2021.
Leok [2004] M. Leok. Generalized Galerkin variational integrators: Lie group, multiscale, and pseudospectral methods. (preprint, arXiv:math.NA/0508360), 2004.
Leok and Ohsawa [2011] M. Leok and T. Ohsawa. Variational and geometric structures of discrete Dirac mechanics. Found. Comput. Math., 11(5):529--562, 2011.
Leok and Shingel [2012] M. Leok and T. Shingel. Prolongation-collocation variational integrators. IMA J. Numer. Anal., 32(3):1194--1216, 2012.
Leok and Zhang [2011] M. Leok and J. Zhang. Discrete Hamiltonian variational integrators. IMA Journal of Numerical Analysis, 31(4):1497--1532, 2011.
Lewis and Simo [1994] D. Lewis and J. C. Simo. Conserving algorithms for the dynamics of Hamiltonian systems on Lie groups. Journal of Nonlinear Science, 4(1):253--299, December 1994. doi: 10.1007/BF02430634.
Marsden and Ratiu [1999] J. E. Marsden and T. S. Ratiu. Introduction to mechanics and symmetry, volume 17 of Texts in Applied Mathematics. Springer-Verlag, New York, second edition, 1999.
Marsden and West [2001] J. E. Marsden and M. West. Discrete mechanics and variational integrators. Acta Numer., 10:357--514, 2001.
Munthe-Kaas [1995] H. Z. Munthe-Kaas. Lie-Butcher theory for Runge-Kutta methods. BIT, 35:572--587, 1995.
Munthe-Kaas [1998] H. Z. Munthe-Kaas. Runge-Kutta methods on Lie groups. BIT Numerical Mathematics, 38:92--111, 1998.
Munthe-Kaas [1999] H. Z. Munthe-Kaas. High order Runge-Kutta methods on manifolds. Applied Numerical Mathematics, 29:115--127, 1999.
Nash [1956] J. Nash. The imbedding problem for Riemannian manifolds. Annals of Mathematics, 63(1):20--63, 1956. ISSN 0003486X.
Nesterov [1983] Y. Nesterov. A method of solving a convex programming problem with convergence rate $\mathcal{O}(1/k^{2})$ . Soviet Mathematics Doklady, 27(2):372--376, 1983.
Nesterov [2004] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004.
Nordkvist and Sanyal [2010] N. Nordkvist and A. K. Sanyal. A Lie group variational integrator for rigid body motion in SE(3) with applications to underwater vehicle dynamics. In 49th IEEE Conference on Decision and Control (CDC), pages 5414--5419, 2010. doi: 10.1109/CDC.2010.5717622.
Schmitt and Leok [2017] J. M. Schmitt and M. Leok. Properties of Hamiltonian variational integrators. IMA Journal of Numerical Analysis, 38(1):377--398, 03 2017.
Schmitt et al. [2018] J. M. Schmitt, T. Shingel, and M. Leok. Lagrangian and Hamiltonian Taylor variational integrators. BIT Numerical Mathematics, 58:457--488, 2018. doi: 10.1007/s10543-017-0690-9.
Su et al. [2016] W. Su, S. Boyd, and E. Candes. A differential equation for modeling Nesterov’s Accelerated Gradient method: theory and insights. Journal of Machine Learning Research, 17(153):1--43, 2016.
Tao and Ohsawa [2020] M. Tao and T. Ohsawa. Variational optimization on Lie groups, with examples of leading (generalized) eigenvalue problems. In Proceedings of the 23rd International AISTATS Conference, volume 108 of PMLR, 2020.
Tieleman and Hinton [2012] T. Tieleman and G. Hinton. Coursera: Neural Networks for Machine Learning - Lecture 6.5: RMSprop. 2012.
Wahba [1965] G. Wahba. A Least Squares Estimate of Satellite Attitude. SIAM Review, 7(3):409, July 1965. doi: 10.1137/1007077.
Whitney [1944a] H. Whitney. The singularities of a smooth $n$ -manifold in $(2n-1)$ -space. Annals of Mathematics, 45(2):247--293, 1944a. ISSN 0003486X.
Whitney [1944b] H. Whitney. The self-intersections of a smooth $n$ -manifold in $2n$ -space. Annals of Mathematics, 45(2):220--246, 1944b. ISSN 0003486X.
Wibisono et al. [2016] A. Wibisono, A. Wilson, and M. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351--E7358, 2016.
Yoshimura and Marsden [2006a] H. Yoshimura and J. E. Marsden. Dirac structures in Lagrangian mechanics Part I: Implicit Lagrangian systems. Journal of Geometry and Physics, 57(1):133--156, 2006a.
Yoshimura and Marsden [2006b] H. Yoshimura and J. E. Marsden. Dirac structures in Lagrangian mechanics Part II: Variational structures. Journal of Geometry and Physics, 57(1):209--250, 2006b.
[68] A. Zanna. Collocation and relaxed collocation for the Fer and the Magnus expansions. SIAM Journal on Numerical Analysis, 36(4):1145--1182. ISSN 00361429.
Zare and Szebehely [1975] K. Zare and V. G. Szebehely. Time transformations in the extended phase-space. Celestial mechanics, 11:469--482, 1975.

Time-adaptive Lagrangian Variational Integrators for Accelerated Optimization on Manifolds

Abstract.

1. Introduction

2. Background

2.1. Lagrangian and Hamiltonian Mechanics

2.2. Variational Integrators

Theorem 2.1.

Theorem 2.2.

2.3. Time-adaptive Hamiltonian integrators via the Poincaré transformation

3. Time-adaptive Lagrangian Integrators

3.1. Variational Derivation of the Poincaré Hamiltonian

3.2. Time-adaptivity from a Variational Principle on the Lagrangian side

Theorem 3.1.

Proof.

Theorem 3.2.

Proof.

3.3. A Second Time-Adaptive Framework obtained by Reverse-Engineering

Theorem 3.3.

Proof.

Theorem 3.4.

Proof.

3.4. Remarks on the Framework for Time-Adaptivity

4. Application to Accelerated Optimization on Vector Spaces

4.1. A Variational Framework for Accelerated Optimization

4.2. Numerical Methods

4.2.1. A Lagrangian Taylor Variational Integrator (LTVI)

4.2.2. A Hamiltonian Taylor Variational Integrator (HTVI)

4.3. Numerical Experiments

5. Accelerated Optimization on More General Spaces

5.1. Motivation and Prior Work

5.2. Accelerated Optimization on Lie Groups

Theorem 5.1.

Proof.

5.3. Numerical Experiment on SO​(3)\text{SO}(3)

6. Conclusion

Acknowledgments

Funding

Appendix A Proofs of Theorems

A.1. Proof of Theorem 3.2

Theorem A.1.

Proof.

A.2. Proof of Theorem 3.4

Theorem A.2.

Proof.

A.3. Proof of Theorem 5.1

Theorem A.3.

Proof.

References

Time-adaptive Lagrangian Variational Integrators
for Accelerated Optimization on Manifolds

5.3. Numerical Experiment on $\text{SO}(3)$