This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Time-adaptive Lagrangian Variational Integrators
for Accelerated Optimization on Manifolds

Valentin Duruisseaux and Melvin Leok
Abstract.

A variational framework for accelerated optimization was recently introduced on normed vector spaces and Riemannian manifolds in Wibisono et al. [65] and Duruisseaux and Leok [19]. It was observed that a careful combination of time-adaptivity and symplecticity in the numerical integration can result in a significant gain in computational efficiency. It is however well known that symplectic integrators lose their near energy preservation properties when variable time-steps are used. The most common approach to circumvent this problem involves the Poincaré transformation on the Hamiltonian side, and was used in Duruisseaux et al. [20] to construct efficient explicit algorithms for symplectic accelerated optimization. However, the current formulations of Hamiltonian variational integrators do not make intrinsic sense on more general spaces such as Riemannian manifolds and Lie groups. In contrast, Lagrangian variational integrators are well-defined on manifolds, so we develop here a framework for time-adaptivity in Lagrangian variational integrators and use the resulting geometric integrators to solve optimization problems on vector spaces and Lie groups.

1. Introduction

Many machine learning algorithms are designed around the minimization of a loss function or the maximization of a likelihood function. Due to the ever-growing scale of data sets, there has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. In 1983, Nesterov’s accelerated gradient method was introduced in  [54], and was shown to converge in 𝒪(1/k2)\mathcal{O}(1/k^{2}) to the minimum of the convex objective function ff, improving on the 𝒪(1/k)\mathcal{O}(1/k) convergence rate exhibited by the standard gradient descent methods. This 𝒪(1/k2)\mathcal{O}(1/k^{2}) convergence rate was shown in  [55] to be optimal among first-order methods using only information about f\nabla f at consecutive iterates. This phenomenon in which an algorithm displays this improved rate of convergence is referred to as acceleration, and other accelerated algorithms have been derived since Nesterov’s algorithm. More recently, it was shown in  [59] that Nesterov’s accelerated gradient method limits, as the time-step goes to 0, to a second-order differential equation and that the objective function f(x(t))f(x(t)) converges to its optimal value at a rate of 𝒪(1/t2)\mathcal{O}(1/t^{2}) along the trajectories of this ordinary differential equation. It was later shown in  [65] that in continuous time, the convergence rate of f(x(t))f(x(t)) can be accelerated to an arbitrary convergence rate 𝒪(1/tp)\mathcal{O}(1/t^{p}) in normed spaces, by considering flow maps generated by a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This framework for accelerated optimization in normed vector spaces has been studied and exploited using geometric numerical integrators in [3; 31; 8; 22; 21; 20]. In [20], time-adaptive geometric integrators have been proposed to take advantage of the time-rescaling property of the Bregman family and design efficient explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity could result in a significant gain in computational efficiency, by simulating higher-order Bregman dynamics using the computationally efficient lower-order Bregman integrators applied to the time-rescaled dynamics.

More generally, symplectic integrators form a class of geometric integrators of interest since, when applied to Hamiltonian systems, they yield discrete approximations of the flow that preserve the symplectic 2-form and as a result also preserve many qualitative aspects of the underlying dynamical system (see  [26]). In particular, when applied to conservative Hamiltonian systems, symplectic integrators exhibit excellent long-time near-energy preservation. However, when symplectic integrators were first used in combination with variable time-steps, the near-energy preservation was lost and the integrators performed poorly (see  [7; 24]). There has been a great effort to circumvent this problem, and there have been many successes, including methods based on the Poincaré transformation [69; 25]: a Poincaré transformed Hamiltonian in extended phase space is constructed which allows the use of variable time-steps in symplectic integrators without losing the nice conservation properties associated to these integrators. In [20], the Poincaré transformation was incorporated in the Hamiltonian variational integrator framework which provides a systematic method for constructing symplectic integrators of arbitrarily high-order based on the discretization of Hamilton’s principle [49; 27], or equivalently, by the approximation of the generating function of the symplectic flow map. The Poincaré transformation was at the heart of the construction of time-adaptive geometric integrators for Bregman Hamiltonian systems which resulted in efficient, explicit algorithms for accelerated optimization in [20].

In [60; 42], accelerated optimization algorithms were proposed in the Lie group setting for specific choices of parameters in the Bregman family, and [2] provided a first example of Bregman dynamics on Riemannian manifolds. The entire variational framework was later generalized to the Riemannian manifold setting in  [19], and time-adaptive geometric integrators taking advantage of the time-rescaling property of the Bregman family have been proposed in the Riemannian manifold setting as well using discrete variational integrators incorporating holonomic constraints [16] and projection-based variational integrators [18]. Note that both these strategies relied on exploiting the structure of the Euclidean spaces in which the Riemannian manifolds are embedded. Although the Whitney and Nash Embedding Theorems [64; 63; 53] imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, designing intrinsic methods that would exploit and preserve the symmetries and geometric properties of the manifold could have advantages both in terms of computational efficiency and in terms of improving our understanding of the acceleration phenomenon on Riemannian manifolds. Developing an intrinsic extension of Hamiltonian variational integrators to manifolds would require some additional work, since the current approach involves Type II/III generating functions Hd+(qk,pk+1)H_{d}^{+}(q_{k},p_{k+1}), Hd(pk,qk+1)H_{d}^{-}(p_{k},q_{k+1}), which depend on the position at one boundary point, and the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space. On the other hand, Lagrangian variational integrators involve a Type I generating function Ld(qk,qk+1)L_{d}(q_{k},q_{k+1}) which only depends on the position at the boundary points and is therefore well-defined on manifolds, and many Lagrangian variational integrators have been derived on Riemannian manifolds, especially in the Lie group [38; 39; 5; 27; 56; 43; 28; 37; 36] and homogeneous space [40] settings. This gives an incentive to construct a mechanism on the Lagrangian side which mimics the Poincaré transformation, since it is more natural and easier to work on the Lagrangian side on more general spaces than on the Hamiltonian side. However, a first difficulty is that the Poincaré transformed Hamiltonian is degenerate and therefore does not have a corresponding Type I Lagrangian formulation. As a result, we cannot exploit the usual correspondence between Hamiltonian and Lagrangian dynamics and need to come up with a different strategy. A second difficulty is that all the literature to this day on the Poincaré transformation constructs the Poincaré transformed system by reverse-engineering, which does not provide much insight into the origin of the mechanism and how it can be extended to different systems.

Outline. We first review the basics of variational integration of Lagrangian and Hamiltonian systems, and the Poincaré transformation in Section 2. We then introduce a simple but novel derivation of the Poincaré transformation from a variational principle in Section 3.1. This gives additional insight into the transformation mechanism and provides natural candidates for time-adaptivity on the Lagrangian side, which we then construct both in continuous and discrete time in Sections 3.2 and 3.3. We then compare the performance of the resulting time-adaptive Lagrangian accelerated optimization algorithms to their Poincaré Hamiltonian analogues in Section 4. Finally, we demonstrate in Section  5 that our time-adaptive Lagrangian approach extends naturally to more general spaces without having to face the obstructions experienced on the Hamiltonian side.

Contributions. In summary, the main contributions of this paper are:

  • A novel derivation of the Poincaré transformation from a variational principle, in Section 3.1

  • New frameworks for variable time-stepping in Lagrangian integrators, and new discrete variational formulations of Lagrangian mechanics with these new variable time-stepping mechanisms, in Sections 3.2 and 3.3

  • New explicit symplectic accelerated optimization algorithms on normed vector spaces

  • New intrinsic symplectic accelerated optimization algorithms on Riemannian manifolds

2. Background

2.1. Lagrangian and Hamiltonian Mechanics

Given a nn-dimensional manifold 𝒬\mathcal{Q}, a Lagrangian is a function L:T𝒬L:T\mathcal{Q}\rightarrow\mathbb{R}. The corresponding action integral 𝒮\mathcal{S} is the functional

𝒮(q)=0TL(q,q˙)𝑑t,\mathcal{S}(q)=\int_{0}^{T}{L(q,\dot{q})dt}, (2.1)

over the space of smooth curves q:[0,T]𝒬q:[0,T]\rightarrow\mathcal{Q}. Hamilton’s variational principle states that δ𝒮=0\delta\mathcal{S}=0 where the variation δ𝒮\delta\mathcal{S} is induced by an infinitesimal variation δq\delta q of the trajectory qq that vanishes at the endpoints. Given local coordinates (q1,,qn)(q^{1},\ldots,q^{n}) on the manifold 𝒬\mathcal{Q}, Hamilton’s variational principle can be shown to be equivalent to the Euler–Lagrange equations,

ddt(Lq˙k)=Lqk,for k=1,,n.\frac{d}{dt}\left(\frac{\partial L}{\partial\dot{q}^{k}}\right)=\frac{\partial L}{\partial q^{k}},\qquad\text{for }k=1,\ldots,n. (2.2)

The Legendre transform 𝔽L:T𝒬T𝒬\mathbb{F}L:T\mathcal{Q}\rightarrow T^{*}\mathcal{Q} of LL is defined fiberwise by 𝔽L:(qi,q˙i)(qi,Lq˙i)\mathbb{F}L:(q^{i},\dot{q}^{i})\mapsto\left(q^{i},\frac{\partial L}{\partial\dot{q}^{i}}\right), and we say that a Lagrangian LL is regular or nondegenerate if the Hessian matrix 2Lq˙2\frac{\partial^{2}L}{\partial\dot{q}^{2}} is invertible for every qq and q˙\dot{q}, and hyperregular if the Legendre transform 𝔽L\mathbb{F}L is a diffeomorphism. A hyperregular Lagrangian on T𝒬T\mathcal{Q} induces a Hamiltonian system on T𝒬T^{*}\mathcal{Q} via

H(q,p)=𝔽L(q,q˙),q˙L(q,q˙)=j=1npjq˙jL(q,q˙)|pi=Lq˙i,H(q,p)=\langle\mathbb{F}L(q,\dot{q}),\dot{q}\rangle-L(q,\dot{q})=\sum_{j=1}^{n}{p_{j}\dot{q}^{j}}-L(q,\dot{q})\bigg{|}_{p_{i}=\frac{\partial L}{\partial\dot{q}^{i}}}, (2.3)

where pi=Lq˙iT𝒬p_{i}=\frac{\partial L}{\partial\dot{q}^{i}}\in T^{*}\mathcal{Q} is the conjugate momentum of qiq^{i}. A Hamiltonian HH is called hyperregular if 𝔽H:T𝒬T𝒬\mathbb{F}H:T^{*}\mathcal{Q}\rightarrow T\mathcal{Q} defined by 𝔽H(α)β=dds|s=0H(α+sβ)\mathbb{F}H(\alpha)\cdot\beta=\frac{d}{ds}\big{|}_{s=0}H(\alpha+s\beta), is a diffeomorphism. Hyperregularity of the Hamiltonian HH implies invertibility of the Hessian matrix 2Hp2\frac{\partial^{2}H}{\partial p^{2}} and thus nondegeneracy of HH. Theorem 7.4.3 in [48] states that hyperregular Lagrangians and hyperregular Hamiltonians correspond in a bijective manner. We can also define a Hamiltonian variational principle on the Hamiltonian side in momentum phase space which is equivalent to Hamilton’s equations,

p˙k=Hqk(p,q),q˙k=Hpk(p,q),for k=1,,n.\dot{p}_{k}=-\frac{\partial H}{\partial q^{k}}(p,q),\qquad\dot{q}^{k}=\frac{\partial H}{\partial p_{k}}(p,q),\qquad\text{for }k=1,\ldots,n. (2.4)

These equations can also be shown to be equivalent to the Euler–Lagrange equations (2.2), provided that the Lagrangian is hyperregular.

2.2. Variational Integrators

Variational integrators are derived by discretizing Hamilton’s principle, instead of discretizing Hamilton’s equations directly. As a result, variational integrators are symplectic, preserve many invariants and momentum maps, and have excellent long-time near-energy preservation (see [49]).

Traditionally, variational integrators have been designed based on the Type I generating function commonly known as the discrete Lagrangian, Ld:Q×QL_{d}:Q\times Q\rightarrow\mathbb{R}. The exact discrete Lagrangian that generates the time-hh flow of Hamilton’s equations can be represented both in a variational form and in a boundary-value form. The latter is given by

LdE(q0,q1;h)=0hL(q(t),q˙(t))𝑑t,L_{d}^{E}(q_{0},q_{1};h)=\int_{0}^{h}L(q(t),\dot{q}(t))dt, (2.5)

where q(0)=q0,q(0)=q_{0}, q(h)=q1,q(h)=q_{1}, and qq satisfies the Euler–Lagrange equations over the time interval [0,h][0,h]. A variational integrator is defined by constructing an approximation Ld:Q×QL_{d}:Q\times Q\rightarrow\mathbb{R} to LdEL_{d}^{E}, and then applying the discrete Euler–Lagrange equations,

pk=D1Ld(qk,qk+1),pk+1=D2Ld(qk,qk+1),p_{k}=-D_{1}L_{d}(q_{k},q_{k+1}),\qquad p_{k+1}=D_{2}L_{d}(q_{k},q_{k+1}), (2.6)

where DiD_{i} denotes a partial derivative with respect to the ii-th argument, and these equations implicitly define the integrator map F~Ld:(qk,pk)(qk+1,pk+1)\tilde{F}_{L_{d}}:(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1}). The error analysis is greatly simplified via Theorem 2.3.1 of [49], which states that if a discrete Lagrangian, Ld:Q×QL_{d}:Q\times Q\rightarrow\mathbb{R}, approximates the exact discrete Lagrangian LdE:Q×QL_{d}^{E}:Q\times Q\rightarrow\mathbb{R} to order rr, i.e.,

Ld(q0,q1;h)=LdE(q0,q1;h)+𝒪(hr+1),L_{d}(q_{0},q_{1};h)=L_{d}^{E}(q_{0},q_{1};h)+\mathcal{O}(h^{r+1}), (2.7)

then the discrete Hamiltonian map F~Ld:(qk,pk)(qk+1,pk+1)\tilde{F}_{L_{d}}:(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1}), viewed as a one-step method, has order of accuracy rr. Many other properties of the integrator, such as momentum conservation properties of the method, can be determined by analyzing the associated discrete Lagrangian, as opposed to analyzing the integrator directly.

Variational integrators have been extended to the framework of Type II and Type III generating functions, commonly referred to as discrete Hamiltonians (see [34; 46; 57]). Hamiltonian variational integrators are derived by discretizing Hamilton’s phase space principle. The boundary-value formulation of the exact Type II generating function of the time-hh flow of Hamilton’s equations is given by the exact discrete right Hamiltonian,

Hd+,E(q0,p1;h)=p1q10h[p(t)q˙(t)H(q(t),p(t))]𝑑t,H_{d}^{+,E}(q_{0},p_{1};h)=p_{1}^{\top}q_{1}-\int_{0}^{h}\left[p(t)^{\top}\dot{q}(t)-H(q(t),p(t))\right]dt, (2.8)

where (q,p)(q,p) satisfies Hamilton’s equations with boundary conditions q(0)=q0q(0)=q_{0} and p(h)=p1p(h)=p_{1}. A Type II Hamiltonian variational integrator is constructed by using a discrete right Hamiltonian Hd+H_{d}^{+} approximating Hd+,EH_{d}^{+,E}, and applying the discrete right Hamilton’s equations,

p0=D1Hd+(q0,p1),q1=D2Hd+(q0,p1),p_{0}=D_{1}H_{d}^{+}(q_{0},p_{1}),\qquad q_{1}=D_{2}H_{d}^{+}(q_{0},p_{1}), (2.9)

which implicitly defines the integrator, F~Hd+:(q0,p0)(q1,p1)\tilde{F}_{H_{d}^{+}}:(q_{0},p_{0})\mapsto(q_{1},p_{1}).

Theorem 2.3.1 of [49], which simplified the error analysis for Lagrangian variational integrators, has an analogue for Hamiltonian variational integrators. Theorem 2.2 in [57] states that if a discrete right Hamiltonian Hd+H^{+}_{d} approximates the exact discrete right Hamiltonian Hd+,EH_{d}^{+,E} to order rr, i.e.,

Hd+(q0,p1;h)=Hd+,E(q0,p1;h)+𝒪(hr+1),H^{+}_{d}(q_{0},p_{1};h)=H_{d}^{+,E}(q_{0},p_{1};h)+\mathcal{O}(h^{r+1}), (2.10)

then the discrete right Hamilton’s map F~Hd+:(qk,pk)(qk+1,pk+1)\tilde{F}_{H^{+}_{d}}:(q_{k},p_{k})\mapsto(q_{k+1},p_{k+1}), viewed as a one-step method, is order rr accurate. Note that discrete left Hamiltonians and corresponding discrete left Hamilton’s maps can also be constructed in the Type III case (see [46; 20]).

Examples of variational integrators include Galerkin variational integrators  [46; 49], Prolongation-Collocation variational integrators [45], and Taylor variational integrators [58]. In many cases, the Type I and Type II/III approaches will produce equivalent integrators. This equivalence has been established in [58] for Taylor variational integrators provided the Lagrangian is hyperregular, and in [46] for generalized Galerkin variational integrators constructed using the same choices of basis functions and numerical quadrature formula provided the Hamiltonian is hyperregular. However, Hamiltonian and Lagrangian variational integrators are not always equivalent. In particular, it was shown in [57] that even when the Hamiltonian and Lagrangian integrators are analytically equivalent, they might still have different numerical properties because of numerical conditioning issues. Even more to the point, Lagrangian variational integrators cannot always be constructed when the underlying Hamiltonian is degenerate. This is particularly relevant in variational accelerated optimization since the time-adaptive Hamiltonian framework for accelerated optimization presented in [20] relies on a degenerate Hamiltonian which has no associated Lagrangian description. We will thus not be able to exploit the usual correspondence between Hamiltonian and Lagrangian dynamics and will have to come up with a different strategy to allow time-adaptivity on the Lagrangian side.

We now describe the construction of Taylor variational integrators as introduced in [58] as we will use them in our numerical experiments. A discrete approximate Lagrangian or Hamiltonian is constructed by approximating the flow map and the trajectory associated with the boundary values using a Taylor method, and approximating the integral by a quadrature rule. The Taylor variational integrator is generated by the implicit discrete Euler–Lagrange equations associated with the discrete Lagrangian or by the Hamilton’s equations associated with the discrete Hamiltonian. More explicitly, we first construct ρ\rho-order and (ρ+1)(\rho+1)-order Taylor methods Ψh(ρ)\Psi_{h}^{(\rho)} and Ψh(ρ+1)\Psi_{h}^{(\rho+1)} approximating the exact time-hh flow map Φh:TQTQ\Phi_{h}:TQ\rightarrow TQ corresponding to the Euler–Lagrange equation in the Type I case or the exact time-hh flow map Φh:TQTQ\Phi_{h}:T^{*}Q\rightarrow T^{*}Q corresponding to Hamilton’s equation in the Type II case. Let πQ:(q,p)q\pi_{Q}:(q,p)\mapsto q and πTQ:(q,p)p\pi_{T^{*}Q}:(q,p)\mapsto p. Given a quadrature rule of order ss with weights and nodes (bi,ci)(b_{i},c_{i}) for i=1,,mi=1,...,m, the Taylor variational integrators are then constructed as follows:

Type I Lagrangian Taylor Variational Integrator (LTVI):

  1. (i)

    Approximate q˙(0)=v0\dot{q}(0)=v_{0} by the solution v~0\tilde{v}_{0} of the problem q1=πQΨh(ρ+1)(q0,v~0).q_{1}=\pi_{Q}\circ\Psi_{h}^{(\rho+1)}(q_{0},\tilde{v}_{0}).

  2. (ii)

    Generate approximations (qci,vci)(q(cih),q˙(cih))(q_{c_{i}},v_{c_{i}})\approx(q(c_{i}h),\dot{q}(c_{i}h)) via (qci,vci)=Ψcih(ρ)(q0,v~0).(q_{c_{i}},v_{c_{i}})=\Psi_{c_{i}h}^{(\rho)}(q_{0},\tilde{v}_{0}).

  3. (iii)

    Apply the quadrature rule to obtain the associated discrete Lagrangian,

    Ld(q0,q1;h)=hi=1mbiL(qci,vci).L_{d}(q_{0},q_{1};h)=h\sum_{i=1}^{m}{b_{i}L(q_{c_{i}},v_{c_{i}})}.
  4. (iv)

    The variational integrator is then defined by the implicit discrete Euler–Lagrange equations,

    p0=D1Ld(q0,q1),p1=D2Ld(q0,q1).p_{0}=-D_{1}L_{d}(q_{0},q_{1}),\qquad p_{1}=D_{2}L_{d}(q_{0},q_{1}).

Type II Hamiltonian Taylor Variational Integrator (HTVI):

  1. (i)

    Approximate p(0)=p0p(0)=p_{0} by the solution p~0\tilde{p}_{0} of the problem p1=πTQΨh(ρ)(q0,p~0).p_{1}=\pi_{T^{*}Q}\circ\Psi_{h}^{(\rho)}(q_{0},\tilde{p}_{0}).

  2. (ii)

    Generate approximations (qci,pci)(q(cih),p(cih))(q_{c_{i}},p_{c_{i}})\approx(q(c_{i}h),p(c_{i}h)) via (qci,pci)=Ψcih(ρ)(q0,p~0).(q_{c_{i}},p_{c_{i}})=\Psi_{c_{i}h}^{(\rho)}(q_{0},\tilde{p}_{0}).

  3. (iii)

    Approximate q1q_{1} via q~1=πQΨh(ρ+1)(q0,p~0).\tilde{q}_{1}=\pi_{Q}\circ\Psi_{h}^{(\rho+1)}(q_{0},\tilde{p}_{0}).

  4. (iv)

    Use the continuous Legendre transform to obtain q˙ci=Hpci\dot{q}_{c_{i}}=\frac{\partial H}{\partial p_{c_{i}}}.

  5. (v)

    Apply the quadrature rule to obtain the associated discrete right Hamiltonian,

    Hd+(q0,p1;h)=p1q~1hi=1mbi[pciq˙ciH(qci,pci)].H_{d}^{+}(q_{0},p_{1};h)=p_{1}^{\top}\tilde{q}_{1}-h\sum_{i=1}^{m}{b_{i}\left[p_{c_{i}}^{\top}\dot{q}_{c_{i}}-H(q_{c_{i}},p_{c_{i}})\right]}.
  6. (vi)

    The variational integrator is then defined by the implicit discrete right Hamilton’s equations,

    q1=D2Hd+(q0,p1),p0=D1Hd+(q0,p1).q_{1}=D_{2}H_{d}^{+}(q_{0},p_{1}),\qquad p_{0}=D_{1}H_{d}^{+}(q_{0},p_{1}).

The following error analysis results were derived in [58] and [20]:

Theorem 2.1.

Suppose the Lagrangian LL is Lipschitz continuous in both variables, and is sufficiently regular for the Taylor method Ψh(ρ+1)\Psi_{h}^{(\rho+1)} to be well-defined.

Then Ld(q0,q1)L_{d}(q_{0},q_{1}) approximates LdE(q0,q1)L_{d}^{E}(q_{0},q_{1}) with at least order of accuracy min(ρ+1,s)\min{(\rho+1,s)}.

By Theorem 2.3.1 in [49], the associated discrete Hamiltonian map has the same order of accuracy.

Theorem 2.2.

Suppose the Hamiltonian HH and its partial derivative Hp\frac{\partial H}{\partial p} are Lipschitz continuous in both variables, and HH is sufficiently regular for the Taylor method Ψh(ρ+1)\Psi_{h}^{(\rho+1)} to be well-defined.

Then Hd+(q0,p1)H_{d}^{+}(q_{0},p_{1}) approximates Hd+,E(q0,p1)H_{d}^{+,E}(q_{0},p_{1}) with at least order of accuracy min(ρ+1,s)\min{(\rho+1,s)}.

By Theorem 2.2 in [57], the associated discrete right Hamilton’s map has the same order of accuracy.

Note that analogous constructions and error analysis results have been derived in [20; 58] for discrete left Hamiltonians in the Type III case.

2.3. Time-adaptive Hamiltonian integrators via the Poincaré transformation

Symplectic integrators form a class of geometric numerical integrators of interest since, when applied to conservative Hamiltonian systems, they yield discrete approximations of the flow that preserve the symplectic 2-form [26], which results in the preservation of many qualitative aspects of the underlying system. In particular, symplectic integrators exhibit excellent long-time near-energy preservation. However, when symplectic integrators were first used in combination with variable time-steps, the near-energy preservation was lost and the integrators performed poorly [7; 24]. Backward error analysis provided justification both for the excellent long-time near-energy preservation of symplectic integrators and for the poor performance experienced when using variable time-steps (see Chapter IX of [26]): it shows that symplectic integrators can be associated with a modified Hamiltonian in the form of a formal power series in terms of the time-step. The use of a variable time-step results in a different modified Hamiltonian at every iteration, which is the source of the poor energy conservation. The Poincaré transformation is one way to incorporate variable time-steps in geometric integrators without losing the nice conservation properties associated with these integrators.

Given a Hamiltonian H(q,t,p)H(q,t,p), consider a desired transformation of time tτt\mapsto\tau described by the monitor function g(q,t,p)g(q,t,p) via

dtdτ=g(q,t,p).\frac{dt}{d\tau}=g(q,t,p). (2.11)

The time tt shall be referred to as the physical time, while τ\tau will be referred to as the fictive time, and we will denote derivatives with respect to tt and τ\tau by dots and apostrophes, respectively. A new Hamiltonian system is constructed using the Poincaré transformation,

H¯(q¯,p¯)=g(q,𝔮,p)(H(q,𝔮,p)+𝔭),\bar{H}(\bar{q},\bar{p})=g(q,\mathfrak{q},p)\left(H(q,\mathfrak{q},p)+\mathfrak{p}\right), (2.12)

in the extended phase space defined by q¯=[q𝔮]𝒬¯\bar{q}=\left[\begin{smallmatrix}q\\ \mathfrak{q}\end{smallmatrix}\right]\in\bar{\mathcal{Q}} and p¯=[p𝔭]\bar{p}=\left[\begin{smallmatrix}p\\ \mathfrak{p}\end{smallmatrix}\right] where 𝔭\mathfrak{p} is the conjugate momentum for 𝔮=t\mathfrak{q}=t with 𝔭(0)=H(q(0),0,p(0))\mathfrak{p}(0)=-H(q(0),0,p(0)). The corresponding equations of motion in the extended phase space are then given by

q¯=H¯p¯,p¯=H¯q¯.\bar{q}^{\prime}=\frac{\partial\bar{H}}{\partial\bar{p}},\qquad\qquad\bar{p}^{\prime}=-\frac{\partial\bar{H}}{\partial\bar{q}}. (2.13)

Suppose (Q¯(τ),P¯(τ))(\bar{Q}(\tau),\bar{P}(\tau)) are solutions to these extended equations of motion, and let (q(t),p(t))(q(t),p(t)) solve Hamilton’s equations for the original Hamiltonian HH. Then

H¯(Q¯(τ),P¯(τ))=H¯(Q¯(0),P¯(0))=0.\bar{H}(\bar{Q}(\tau),\bar{P}(\tau))=\bar{H}(\bar{Q}(0),\bar{P}(0))=0. (2.14)

Therefore, the components (Q(τ),P(τ))(Q(\tau),P(\tau)) in the original phase space of the augmented solutions (Q¯(τ),P¯(τ))(\bar{Q}(\tau),\bar{P}(\tau)) satisfy

H(Q(τ),τ,P(τ))=𝔭(τ),H(Q(0),0,P(0))=𝔭(0)=H(q(0),0,p(0)).H(Q(\tau),\tau,P(\tau))=-\mathfrak{p}(\tau),\qquad H(Q(0),0,P(0))=-\mathfrak{p}(0)=H(q(0),0,p(0)). (2.15)

Then, (Q(τ),P(τ))(Q(\tau),P(\tau)) and (q(t),p(t))(q(t),p(t)) both satisfy Hamilton’s equations for the original Hamiltonian HH with the same initial values, so they must be the same. Note that the Hessian is given by

2H¯p¯2=[Hppg(q¯,p)+g(q¯,p)2Hp2+pg(q¯,p)Hppg(q¯,p)pg(q¯,p)0],\frac{\partial^{2}\bar{H}}{\partial\bar{p}^{2}}=\begin{bmatrix}\frac{\partial H}{\partial p}\nabla_{p}g(\bar{q},p)^{\top}+g(\bar{q},p)\frac{\partial^{2}H}{\partial p^{2}}+\nabla_{p}g(\bar{q},p)\frac{\partial H}{\partial p}^{\top}\ \ \nabla_{p}g(\bar{q},p)\\ \nabla_{p}g(\bar{q},p)^{\top}\qquad\qquad\qquad\qquad\qquad\qquad\qquad\qquad 0\end{bmatrix}, (2.16)

which will be singular in many cases. The degeneracy of the Hamiltonian H¯\bar{H} implies that there is no corresponding Type I Lagrangian formulation. This approach works seamlessly with the existing methods and theorems for Hamiltonian variational integrators, but where the system under consideration is the transformed Hamiltonian system resulting from the Poincaré transformation. We can use a symplectic integrator with constant time-step in fictive time τ\tau on the Poincaré transformed system, which will have the effect of integrating the original system with the desired variable time-step in physical time tt via the relation dtdτ=g(q,t,p)\frac{dt}{d\tau}=g(q,t,p).

3. Time-adaptive Lagrangian Integrators

The Poincaré transformation for time-adaptive symplectic integrators on the Hamiltonian side presented in Section 2.3 with autonomous monitor function g(q,p)g(q,p) was first introduced in [69], and extended to the case where gg can also depend on time based on ideas from [25]. All the literature to date on the Poincaré transformation have constructed the Poincaré transformed system by reverse-engineering: the Poincaré transformed Hamiltonian is chosen in such a way that the corresponding component dynamics satisfy Hamilton’s equations in the original space.

3.1. Variational Derivation of the Poincaré Hamiltonian

We now depart from the traditional reverse-engineering strategy for the Poincaré transformation and present a new way to think about the Poincaré transformed Hamiltonian by deriving it from a variational principle. This simple derivation gives additional insight into the transformation mechanism and provides natural candidates for time-adaptivity on the Lagrangian side and for more general frameworks.

As before, we work in the extended space (q,𝔮,p,𝔭)(q,\mathfrak{q},p,\mathfrak{p}) where 𝔮=t\mathfrak{q}=t and 𝔭\mathfrak{p} is the corresponding conjugate momentum, and consider a time transformation tτt\rightarrow\tau given by

dtdτ=g(q,t,p).\frac{dt}{d\tau}=g(q,t,p). (3.1)

We define an extended action functional 𝔖:C2([0,T],T𝒬¯)\mathfrak{S}:C^{2}([0,T],T^{*}\bar{\mathcal{Q}})\rightarrow\mathbb{R} by

𝔖(q¯(),p¯())\displaystyle\mathfrak{S}(\bar{q}(\cdot),\bar{p}(\cdot)) =p¯(T)q¯(T)0T[p¯(t)q¯˙(t)H(q(t),t,p(t))𝔭(t)]𝑑t\displaystyle=\bar{p}(T)\bar{q}(T)-\int_{0}^{T}{\left[\bar{p}(t)\dot{\bar{q}}(t)-H(q(t),t,p(t))-\mathfrak{p}(t)\right]dt} (3.2)
=p¯(T)q¯(T)τ(t=0)τ(t=T)[p¯(τ)dτdtq¯(τ)H(q(τ),𝔮(τ),p(τ))𝔭(τ)]dtdτ𝑑τ\displaystyle=\bar{p}(T)\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left[\bar{p}(\tau)\frac{d\tau}{dt}\bar{q}^{\prime}(\tau)-H(q(\tau),\mathfrak{q}(\tau),p(\tau))-\mathfrak{p}(\tau)\right]\frac{dt}{d\tau}d\tau} (3.3)
=p¯(T)q¯(T)τ(t=0)τ(t=T){p¯(τ)q¯(τ)dtdτ[H(q(τ),𝔮(τ),p(τ))+𝔭(τ)]}𝑑τ,\displaystyle=\bar{p}(T)\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left\{\bar{p}(\tau)\bar{q}^{\prime}(\tau)-\frac{dt}{d\tau}\left[H(q(\tau),\mathfrak{q}(\tau),p(\tau))+\mathfrak{p}(\tau)\right]\right\}d\tau}, (3.4)

where we have performed a change of variables in the integral. Then,

𝔖(q¯(),p¯())=p¯(T)q¯(T)τ(t=0)τ(t=T){p¯(τ)q¯(τ)g(q(τ),𝔮(τ),p(τ))[H(q(τ),𝔮(τ),p(τ))+𝔭(τ)]}𝑑τ.\mathfrak{S}(\bar{q}(\cdot),\bar{p}(\cdot))=\bar{p}(T)\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left\{\bar{p}(\tau)\bar{q}^{\prime}(\tau)-g(q(\tau),\mathfrak{q}(\tau),p(\tau))\left[H(q(\tau),\mathfrak{q}(\tau),p(\tau))+\mathfrak{p}(\tau)\right]\right\}d\tau}. (3.5)

Computing the variation of 𝔖\mathfrak{S} yields

δ𝔖\displaystyle\delta\mathfrak{S} =q¯(T)δp¯(T)+p¯(T)δq¯(T)τ(t=0)τ(t=T)[𝔮δ𝔭+𝔭δ𝔮(gH𝔮+g𝔮(H+𝔮))δ𝔮gδ𝔭]𝑑τ\displaystyle=\bar{q}(T)\delta\bar{p}(T)+\bar{p}(T)\delta\bar{q}(T)-\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{q}^{\prime}\delta\mathfrak{p}+\mathfrak{p}\delta\mathfrak{q}^{\prime}-\left(g\frac{\partial H}{\partial\mathfrak{q}}+\frac{\partial g}{\partial\mathfrak{q}}(H+\mathfrak{q})\right)\delta\mathfrak{q}-g\delta\mathfrak{p}\right]d\tau}
τ(t=0)τ(t=T)[qδp+pδq(gHq+gq(H+𝔭))δq(gHp+gp(H+𝔭))δp]𝑑τ,\displaystyle\qquad\qquad-\int_{\tau(t=0)}^{\tau(t=T)}{\left[q^{\prime}\delta p+p\delta q^{\prime}-\left(g\frac{\partial H}{\partial q}+\frac{\partial g}{\partial q}(H+\mathfrak{p})\right)\delta q-\left(g\frac{\partial H}{\partial p}+\frac{\partial g}{\partial p}(H+\mathfrak{p})\right)\delta p\right]d\tau},

and using integration by parts and the boundary conditions δq¯(0)=0\delta\bar{q}(0)=0 and δp¯(T)=0,\delta\bar{p}(T)=0, gives

δ𝔖\displaystyle\delta\mathfrak{S} =τ(t=0)τ(t=T)[p+gHq+gq(H+𝔭)]δq𝑑τ+τ(t=0)τ(t=T)[gHp+gp(H+𝔭)q]δp𝑑τ\displaystyle=\int_{\tau(t=0)}^{\tau(t=T)}{\left[p^{\prime}+g\frac{\partial H}{\partial q}+\frac{\partial g}{\partial q}(H+\mathfrak{p})\right]\delta qd\tau}+\int_{\tau(t=0)}^{\tau(t=T)}{\left[g\frac{\partial H}{\partial p}+\frac{\partial g}{\partial p}(H+\mathfrak{p})-q^{\prime}\right]\delta pd\tau}
+τ(t=0)τ(t=T)[𝔭+gH𝔮+g𝔮(H+𝔭)]δ𝔮𝑑ττ(t=0)τ(t=T)[𝔮g]δ𝔭𝑑τ.\displaystyle\qquad\qquad+\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{p}^{\prime}+g\frac{\partial H}{\partial\mathfrak{q}}+\frac{\partial g}{\partial\mathfrak{q}}(H+\mathfrak{p})\right]\delta\mathfrak{q}d\tau}-\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{q}^{\prime}-g\right]\delta\mathfrak{p}d\tau}.

Thus, the condition that 𝔖(q¯(),p¯())\mathfrak{S}(\bar{q}(\cdot),\bar{p}(\cdot)) is stationary with respect to the boundary conditions δq¯(0)=0\delta\bar{q}(0)=0 and δp¯(T)=0\delta\bar{p}(T)=0 is equivalent to (q¯(),p¯())(\bar{q}(\cdot),\bar{p}(\cdot)) satisfying Hamilton’s canonical equations corresponding to the Poincaré transformed Hamiltonian,

𝔮=g(q,𝔮,p),\displaystyle\mathfrak{q}^{\prime}=g(q,\mathfrak{q},p), (3.6)
q=g(q,𝔮,p)Hp(q,𝔮,p)+gp(q,𝔮,p)[H(q,𝔮,p)+𝔭],\displaystyle q^{\prime}=g(q,\mathfrak{q},p)\frac{\partial H}{\partial p}(q,\mathfrak{q},p)+\frac{\partial g}{\partial p}(q,\mathfrak{q},p)\left[H(q,\mathfrak{q},p)+\mathfrak{p}\right], (3.7)
p=g(q,𝔮,p)Hq(q,𝔮,p)gq(q,𝔮,p)[H(q,𝔮,p)+𝔭],\displaystyle p^{\prime}=-g(q,\mathfrak{q},p)\frac{\partial H}{\partial q}(q,\mathfrak{q},p)-\frac{\partial g}{\partial q}(q,\mathfrak{q},p)\left[H(q,\mathfrak{q},p)+\mathfrak{p}\right], (3.8)
𝔭=g(q,𝔮,p)H𝔮(q,𝔮,p)g𝔮(q,𝔮,p)[H(q,𝔮,p)+𝔭].\displaystyle\mathfrak{p}^{\prime}=-g(q,\mathfrak{q},p)\frac{\partial H}{\partial\mathfrak{q}}(q,\mathfrak{q},p)-\frac{\partial g}{\partial\mathfrak{q}}(q,\mathfrak{q},p)\left[H(q,\mathfrak{q},p)+\mathfrak{p}\right]. (3.9)

An alternative way to reach the same conclusion is by interpreting equation (3.5) as the usual Type II action functional for the modified Hamiltonian,

g(q(τ),𝔮(τ),p(τ))[H(q(τ),𝔮(τ),p(τ))+𝔭(τ)],g(q(\tau),\mathfrak{q}(\tau),p(\tau))\left[H(q(\tau),\mathfrak{q}(\tau),p(\tau))+\mathfrak{p}(\tau)\right], (3.10)

which coincides with the Poincaré transformed Hamiltonian.

3.2. Time-adaptivity from a Variational Principle on the Lagrangian side

We will now derive a mechanism for time-adaptivity on the Lagrangian side by mimicking the derivation of the Poincaré Hamiltonian. We will work in the extended space q¯=(q,𝔮,λ)𝒬¯\bar{q}=(q,\mathfrak{q},\lambda)^{\top}\in\bar{\mathcal{Q}} where 𝔮=t\mathfrak{q}=t and λ\lambda is a Lagrange multiplier used to enforce the time rescaling dtdτ=g(t).\frac{dt}{d\tau}=g(t). Consider the action functional 𝔖:C2([0,T],T𝒬¯)\mathfrak{S}:C^{2}([0,T],T\bar{\mathcal{Q}})\rightarrow\mathbb{R} given by

𝔖(q¯(),q¯˙())\displaystyle\mathfrak{S}(\bar{q}(\cdot),\dot{\bar{q}}(\cdot)) =0T[L(q(t),q˙(t),𝔮(t))λ(t)(d𝔮dτg(𝔮(t)))]𝑑t\displaystyle=\int_{0}^{T}{\left[L(q(t),\dot{q}(t),\mathfrak{q}(t))-\lambda(t)\left(\frac{d\mathfrak{q}}{d\tau}-g(\mathfrak{q}(t))\right)\right]dt} (3.11)
=τ(t=0)τ(t=T)[dtdτL(q(τ),dτdtq(τ),𝔮(τ))λ(τ)dtdτ(d𝔮dτg(𝔮(τ)))]𝑑τ\displaystyle=\int_{\tau(t=0)}^{\tau(t=T)}{\left[\frac{dt}{d\tau}L\left(q(\tau),\frac{d\tau}{dt}q^{\prime}(\tau),\mathfrak{q}(\tau)\right)-\lambda(\tau)\frac{dt}{d\tau}\left(\frac{d\mathfrak{q}}{d\tau}-g(\mathfrak{q}(\tau))\right)\right]d\tau} (3.12)
=τ(t=0)τ(t=T)[𝔮(τ)L(q(τ),dτdtq(τ),𝔮(τ))λ(τ)𝔮(τ)[𝔮(τ)g(𝔮(τ))]]𝑑τ,\displaystyle=\int_{\tau(t=0)}^{\tau(t=T)}{\left[\mathfrak{q}^{\prime}(\tau)L\left(q(\tau),\frac{d\tau}{dt}q^{\prime}(\tau),\mathfrak{q}(\tau)\right)-\lambda(\tau)\mathfrak{q}^{\prime}(\tau)\left[\mathfrak{q}^{\prime}(\tau)-g(\mathfrak{q}(\tau))\right]\right]d\tau}, (3.13)

where, as before, we have performed a change of variables in the integral. This is the usual Type I action functional for the extended autonomous Lagrangian,

L¯(q¯(τ),q¯(τ))=𝔮(τ)L(q(τ),dτdtq(τ),𝔮(τ))λ(τ)𝔮(τ)[𝔮(τ)g(𝔮(τ))].\bar{L}(\bar{q}(\tau),\bar{q}^{\prime}(\tau))=\mathfrak{q}^{\prime}(\tau)L\left(q(\tau),\frac{d\tau}{dt}q^{\prime}(\tau),\mathfrak{q}(\tau)\right)-\lambda(\tau)\mathfrak{q}^{\prime}(\tau)\left[\mathfrak{q}^{\prime}(\tau)-g(\mathfrak{q}(\tau))\right]. (3.14)
Theorem 3.1.

If (q¯(τ),q¯(τ))\left(\bar{q}(\tau),\bar{q}^{\prime}(\tau)\right) satisfies the Euler–Lagrange equations corresponding to the extended Lagrangian L¯\bar{L}, then its components satisfy dtdτ=g(t)\frac{dt}{d\tau}=g(t) and the original Euler–Lagrange equations

ddtLq˙(q,q˙,t)=Lq(q,q˙,t).\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}\left(q,\dot{q},t\right)=\frac{\partial L}{\partial q}\left(q,\dot{q},t\right). (3.15)
Proof.

Substituting the expression for L¯\bar{L} into the Euler–Lagrange equations, ddτL¯λ=L¯λ,\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial\lambda^{\prime}}=\frac{\partial\bar{L}}{\partial\lambda}, and ddτL¯q=L¯q\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial q^{\prime}}=\frac{\partial\bar{L}}{\partial q}, gives

𝔮[𝔮g(𝔮)]=0,\mathfrak{q}^{\prime}\left[\mathfrak{q}^{\prime}-g(\mathfrak{q})\right]=0,

and

d𝔮dτdd𝔮[𝔮L(q,dτd𝔮q,𝔮)q]=𝔮L(q,dτd𝔮q,𝔮)q.\frac{d\mathfrak{q}}{d\tau}\frac{d}{d\mathfrak{q}}\left[\mathfrak{q}^{\prime}\frac{\partial L\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)}{\partial q^{\prime}}\right]=\mathfrak{q}^{\prime}\frac{\partial L\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)}{\partial q}.

Now, 𝔮=g(𝔮)>0\mathfrak{q}^{\prime}=g(\mathfrak{q})>0 so 𝔮=g(𝔮)\mathfrak{q}^{\prime}=g(\mathfrak{q}), and the chain rule gives

dd𝔮Lq˙(q,dτd𝔮q,𝔮)=Lq(q,dτd𝔮q,𝔮).\frac{d}{d\mathfrak{q}}\frac{\partial L}{\partial\dot{q}}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)=\frac{\partial L}{\partial q}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right).

Using the equation q˙=dτd𝔮q\dot{q}=\frac{d\tau}{d\mathfrak{q}}q^{\prime} and replacing 𝔮\mathfrak{q} by tt recovers the original Euler–Lagrange equations. ∎

We now introduce a discrete variational formulation of these continuous Lagrangian mechanics. Suppose we are given a partition 0=τ0<τ1<<τN=𝒯0=\tau_{0}<\tau_{1}<\ldots<\tau_{N}=\mathcal{T} of the interval [0,𝒯][0,\mathcal{T}], and a discrete curve in 𝒬××\mathcal{Q}\times\mathbb{R}\times\mathbb{R} denoted by {(qk,𝔮k,λk)}k=0N\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N} such that qkq(τk)q_{k}\approx q(\tau_{k}), 𝔮k𝔮(τk)\mathfrak{q}_{k}\approx\mathfrak{q}(\tau_{k}), and λkλ(τk)\lambda_{k}\approx\lambda(\tau_{k}). Consider the discrete action functional,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1[Ld(qk,𝔮k,qk+1,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk,\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}, (3.16)

where Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}) is obtained by approximating the exact discrete Lagrangian, which is related to Jacobi’s solution of the Hamilton–Jacobi equation and is the generating function for the exact time-hh flow map. It is given by the extremum of the action integral from τk\tau_{k} to τk+1\tau_{k+1} over twice continuously differentiable curves (q,𝔮)𝒬×(q,\mathfrak{q})\in\mathcal{Q}\times\mathbb{R} satisfying the boundary conditions (q(τk),𝔮(τk))=(qk,𝔮k),(q(\tau_{k}),\mathfrak{q}(\tau_{k}))=(q_{k},\mathfrak{q}_{k}), and (q(τk+1),𝔮(τk+1))=(qk+1,𝔮k+1)(q(\tau_{k+1}),\mathfrak{q}(\tau_{k+1}))=(q_{k+1},\mathfrak{q}_{k+1}):

Ld(qk,𝔮k,qk+1,𝔮k+1)ext(q,𝔮)C2([τk,τk+1],𝒬×)(q,𝔮)(τk)=(qk,𝔮k), (q,𝔮)(τk+1)=(qk+1,𝔮k+1) τkτk+1L(q,qg(𝔮),𝔮)𝑑τ.L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],\mathcal{Q}\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=(q_{k+1},\mathfrak{q}_{k+1})\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}. (3.17)

In practice, we can obtain an approximation by replacing the integral with a quadrature rule, and extremizing over a finite-dimensional function space instead of C2([τk,τk+1],𝒬×)C^{2}([\tau_{k},\tau_{k+1}],\mathcal{Q}\times\mathbb{R}). This discrete functional 𝔖¯d\bar{\mathfrak{S}}_{d} is a discrete analogue of the action functional 𝔖¯:C2([0,T],𝒬××)\bar{\mathfrak{S}}:C^{2}([0,T],\mathcal{Q}\times\mathbb{R}\times\mathbb{R})\rightarrow\mathbb{R} given by

𝔖¯(q(),𝔮(),λ())\displaystyle\bar{\mathfrak{S}}(q(\cdot),\mathfrak{q}(\cdot),\lambda(\cdot)) =0𝒯L¯(q(τ),𝔮(τ),λ(τ),q(τ),𝔮(τ),λ(τ))𝑑τ\displaystyle=\int_{0}^{\mathcal{T}}{\bar{L}\left(q(\tau),\mathfrak{q}(\tau),\lambda(\tau),q^{\prime}(\tau),\mathfrak{q}^{\prime}(\tau),\lambda^{\prime}(\tau)\right)d\tau} (3.18)
=0𝒯[L(q,qg(𝔮),𝔮)λ𝔮+λg(𝔮)]𝔮𝑑τ.\displaystyle=\int_{0}^{\mathcal{T}}{\left[L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)-\lambda\mathfrak{q}^{\prime}+\lambda g(\mathfrak{q})\right]\mathfrak{q}^{\prime}d\tau}. (3.19)

We can derive the following result which relates a discrete Type I variational principle to a set of discrete Euler–Lagrange equations:

Theorem 3.2.

The Type I discrete Hamilton’s variational principle,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0,\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0, (3.20)

is equivalent to the discrete extended Euler–Lagrange equations,

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}), (3.21)
𝔮k+1𝔮kτk+1τkD1Ld(qk,𝔮k,qk+1,𝔮k+1)+𝔮k𝔮k1τkτk1D3Ld(qk1,𝔮k1,qk,𝔮k)=0,\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0, (3.22)
[D2Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk1τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]\displaystyle\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right] (3.23)
+[D4Ldk1λk11τkτk1]𝔮k𝔮k1τkτk1+1τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]=0,\displaystyle\qquad+\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,

where LdkL_{d_{k}} denotes Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}).

Proof.

See Appendix A.1. ∎

Defining the discrete momenta via the discrete Legendre transformations,

pk=D1Ld(qk,𝔮k,qk+1,𝔮k+1),𝔭k=D2Ld(qk,𝔮k,qk+1,𝔮k+1),p_{k}=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),\qquad\qquad\mathfrak{p}_{k}=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.24)

and using a constant time-step hh in τ\tau, the discrete Euler–Lagrange equations can be rewritten as

pk\displaystyle p_{k} =D1Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.25)
𝔭k\displaystyle\mathfrak{p}_{k} =D2Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.26)
𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+hg(𝔮k),\displaystyle=\mathfrak{q}_{k}+hg(\mathfrak{q}_{k}), (3.27)
pk+1\displaystyle p_{k+1} =g(𝔮k)g(𝔮k+1)D3Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}D_{3}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.28)
𝔭k+1\displaystyle\mathfrak{p}_{k+1} =LdkLdk+1hg(𝔮k+1)+λk+1h+λk+1g(𝔮k+1)+g(𝔮k)g(𝔮k+1)[D4Ldkλkh].\displaystyle=\frac{L_{d_{k}}-L_{d_{k+1}}}{hg(\mathfrak{q}_{k+1})}+\frac{\lambda_{k+1}}{h}+\lambda_{k+1}\nabla g(\mathfrak{q}_{k+1})+\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}\left[D_{4}L_{d_{k}}-\frac{\lambda_{k}}{h}\right]. (3.29)

3.3. A Second Time-Adaptive Framework obtained by Reverse-Engineering

As mentioned earlier, all the literature to date on the Poincaré transformation have constructed the Poincaré transformed system by reverse-engineering. The Poincaré transformed Hamiltonian is chosen in such a way that the corresponding component dynamics satisfy the Hamilton’s equations in the original space. We will follow a similar strategy to derive a second framework for time-adaptivity from the Lagrangian perspective.

Given a time-dependent Lagrangian L(q(t),q˙(t),t)L(q(t),\dot{q}(t),t) consider a transformation of time tτt\rightarrow\tau,

dtdτ=g(t),\frac{dt}{d\tau}=g(t), (3.30)

described by the monitor function g(t)g(t). The time tt shall be referred to as the physical time, while τ\tau will be referred to as the fictive time, and we will denote derivatives with respect to tt and τ\tau by dots and apostrophes, respectively. We define the autonomous Lagrangian,

L¯(q¯(τ),q¯(τ))=𝔮L(q,qg(𝔮),𝔮)λ(𝔮g(𝔮)),\bar{L}(\bar{q}(\tau),\bar{q}^{\prime}(\tau))=\mathfrak{q}^{\prime}L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)-\lambda\left(\mathfrak{q}^{\prime}-g(\mathfrak{q})\right), (3.31)

in the extended space with q¯=(q,𝔮,λ)\bar{q}=\left(q,\mathfrak{q},\lambda\right)^{\top} where 𝔮=t\mathfrak{q}=t, and where λ\lambda is a multiplier used to impose the constraint that the time evolution is guided by the monitor function g(t)g(t). Note that in contrast to the earlier framework, the Lagrange multiplier term lacks an extra multiplicative factor of 𝔮\mathfrak{q}^{\prime}.

Theorem 3.3.

If (q¯(τ),q¯(τ))\left(\bar{q}(\tau),\bar{q}^{\prime}(\tau)\right) satisfies the Euler–Lagrange equations corresponding to the extended Lagrangian L¯\bar{L}, then its components satisfy dtdτ=g(t)\frac{dt}{d\tau}=g(t) and the original Euler–Lagrange equations

ddtLq˙(q,q˙,t)=Lq(q,q˙,t).\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}\left(q,\dot{q},t\right)=\frac{\partial L}{\partial q}\left(q,\dot{q},t\right). (3.32)
Proof.

Substituting the expression for L¯\bar{L} in the Euler–Lagrange equations ddτL¯λ=L¯λ\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial\lambda^{\prime}}=\frac{\partial\bar{L}}{\partial\lambda} and ddτL¯q=L¯q\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial q^{\prime}}=\frac{\partial\bar{L}}{\partial q} gives

𝔮=g(𝔮),\mathfrak{q}^{\prime}=g(\mathfrak{q}),

and

d𝔮dτdd𝔮[d𝔮dτL(q,qg(𝔮),𝔮)q]=d𝔮dτL(q,qg(𝔮),𝔮)q.\frac{d\mathfrak{q}}{d\tau}\frac{d}{d\mathfrak{q}}\left[\frac{d\mathfrak{q}}{d\tau}\frac{\partial L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)}{\partial q^{\prime}}\right]=\ \frac{d\mathfrak{q}}{d\tau}\frac{\partial L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)}{\partial q}.

We can divide by d𝔮dτ\frac{d\mathfrak{q}}{d\tau} and use the chain rule to get 𝔮=g(𝔮)\mathfrak{q}^{\prime}=g(\mathfrak{q}) and

dd𝔮Lq˙(q,dτd𝔮q,𝔮)=Lq(q,dτd𝔮q,𝔮).\frac{d}{d\mathfrak{q}}\frac{\partial L}{\partial\dot{q}}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right)=\frac{\partial L}{\partial q}\left(q,\frac{d\tau}{d\mathfrak{q}}q^{\prime},\mathfrak{q}\right).

Using the equations q˙=dτd𝔮q\dot{q}=\frac{d\tau}{d\mathfrak{q}}q^{\prime} and 𝔮=g(𝔮)\mathfrak{q}^{\prime}=g(\mathfrak{q}), and replacing 𝔮\mathfrak{q} by tt recovers the desired equations. ∎

We now introduce a discrete variational formulation of these continuous Lagrangian mechanics. Suppose we are given a partition 0=τ0<τ1<<τN=𝒯0=\tau_{0}<\tau_{1}<\ldots<\tau_{N}=\mathcal{T} of the interval [0,𝒯][0,\mathcal{T}], and a discrete curve in 𝒬××\mathcal{Q}\times\mathbb{R}\times\mathbb{R} denoted by {(qk,𝔮k,λk)}k=0N\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N} such that qkq(τk)q_{k}\approx q(\tau_{k}), 𝔮k𝔮(τk)\mathfrak{q}_{k}\approx\mathfrak{q}(\tau_{k}), and λkλ(τk)\lambda_{k}\approx\lambda(\tau_{k}). Consider the discrete action functional,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1{𝔮k+1𝔮kτk+1τk[Ld(qk,𝔮k,qk+1,𝔮k+1)λk]+λkg(𝔮k)},\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left\{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\right]+\lambda_{k}g(\mathfrak{q}_{k})\right\}}, (3.33)

where,

Ld(qk,𝔮k,qk+1,𝔮k+1)ext(q,𝔮)C2([τk,τk+1],𝒬×)(q,𝔮)(τk)=(qk,𝔮k), (q,𝔮)(τk+1)=(qk+1,𝔮k+1) τkτk+1L(q,qg(𝔮),𝔮)𝑑τ.L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],\mathcal{Q}\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=(q_{k+1},\mathfrak{q}_{k+1})\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}. (3.34)

This discrete functional 𝔖¯d\bar{\mathfrak{S}}_{d} is a discrete analogue of the action functional 𝔖¯:C2([0,T],𝒬××)\bar{\mathfrak{S}}:C^{2}([0,T],\mathcal{Q}\times\mathbb{R}\times\mathbb{R})\rightarrow\mathbb{R} given by

𝔖¯(q(),𝔮(),λ())\displaystyle\bar{\mathfrak{S}}(q(\cdot),\mathfrak{q}(\cdot),\lambda(\cdot)) =0𝒯L¯(q(τ),𝔮(τ),λ(τ),q(τ),𝔮(τ),λ(τ))𝑑τ\displaystyle=\int_{0}^{\mathcal{T}}{\bar{L}\left(q(\tau),\mathfrak{q}(\tau),\lambda(\tau),q^{\prime}(\tau),\mathfrak{q}^{\prime}(\tau),\lambda^{\prime}(\tau)\right)d\tau} (3.35)
=0𝒯{𝔮[L(q,qg(𝔮),𝔮)λ]+λg(𝔮)}𝑑τ.\displaystyle=\int_{0}^{\mathcal{T}}{\left\{\mathfrak{q}^{\prime}\left[L\left(q,\frac{q^{\prime}}{g(\mathfrak{q})},\mathfrak{q}\right)-\lambda\right]+\lambda g(\mathfrak{q})\right\}d\tau}. (3.36)

We can derive the following result which relates a discrete Type I variational principle to a set of discrete Euler–Lagrange equations:

Theorem 3.4.

The Type I discrete Hamilton’s variational principle,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0,\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0, (3.37)

is equivalent to the discrete extended Euler–Lagrange equations,

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}), (3.38)
𝔮k+1𝔮kτk+1τkD1Ld(qk,𝔮k,qk+1,𝔮k+1)+𝔮k𝔮k1τkτk1D3Ld(qk1,𝔮k1,qk,𝔮k)=0,\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0, (3.39)
𝔮k+1𝔮kτk+1τkD2LdkLdkτk+1τk+𝔮k𝔮k1τkτk1D4Ldk1+Ldk1τkτk1=λk1τkτk1λkτk+1τkλkg(𝔮k),\displaystyle\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{L_{d_{k}}}{\tau_{k+1}-\tau_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{L_{d_{k-1}}}{\tau_{k}-\tau_{k-1}}=\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}-\lambda_{k}\nabla g(\mathfrak{q}_{k}), (3.40)

where LdkL_{d_{k}} denotes Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}).

Proof.

See Appendix A.2. ∎

Defining the discrete momenta via the discrete Legendre transformations,

pk=D1Ld(qk,𝔮k,qk+1,𝔮k+1),𝔭k=D2Ld(qk,𝔮k,qk+1,𝔮k+1),p_{k}=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}),\qquad\qquad\mathfrak{p}_{k}=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.41)

and using a constant time-step hh in τ\tau, the discrete Euler–Lagrange equations can be rewritten as

pk\displaystyle p_{k} =D1Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.42)
𝔭k\displaystyle\mathfrak{p}_{k} =D2Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=-D_{2}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.43)
𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+hg(𝔮k),\displaystyle=\mathfrak{q}_{k}+hg(\mathfrak{q}_{k}), (3.44)
pk+1\displaystyle p_{k+1} =g(𝔮k)g(𝔮k+1)D3Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}D_{3}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (3.45)
𝔭k+1\displaystyle\mathfrak{p}_{k+1} =LdkLdk+1+λk+1λk+hλk+1g(𝔮k+1)+hg(𝔮k)D4Ldkhg(𝔮k+1).\displaystyle=\frac{L_{d_{k}}-L_{d_{k+1}}+\lambda_{k+1}-\lambda_{k}+h\lambda_{k+1}\nabla g(\mathfrak{q}_{k+1})+hg(\mathfrak{q}_{k})D_{4}L_{d_{k}}}{hg(\mathfrak{q}_{k+1})}. (3.46)

3.4. Remarks on the Framework for Time-Adaptivity

Time-adaptivity comes more naturally on the Hamiltonian side through the Poincaré transformation. Indeed, in the Hamiltonian case, the time-rescaling equation 𝔮=g(q,𝔮,p)\mathfrak{q}^{\prime}=g(q,\mathfrak{q},p) emerged naturally through the change of time variable inside the extended action functional. By contrast, in the Lagrangian case, we need to impose the time-rescaling equation as a constraint via a multiplier, which we then consider as an extra position coordinate. This strategy can be thought of as being part of the more general framework for constrained variational integrators (see [49; 16]).

The Poincaré transformation on the Hamiltonian side was presented in [69; 25; 20] for the general case where the monitor function can depend on position, time and momentum, g=g(q,t,p)g=g(q,t,p). For the accelerated optimization application which was our main motivation to develop a time-adaptive framework for geometric integrators, the monitor function only depends on time, g=g(t)g=g(t). For the sake of simplicity and clarity, we have decided to only present the theory for time-adaptive Lagrangian integrators for monitor functions of the form g=g(t)g=g(t) in this paper. Note however that this time-adaptivity framework on the Lagrangian side can be extended to the case where the monitor function also depends on position, g=g(q,t)g=g(q,t). The action integral remains the same with the exception that gg is now a function of (q,𝔮)(q,\mathfrak{q}). Unlike the case where g=g(t)g=g(t), the corresponding Euler–Lagrange equation ddτL¯q=Lq\frac{d}{d\tau}\frac{\partial\bar{L}}{\partial q^{\prime}}=\frac{\partial L}{\partial q} yields an extra term λ(t)gq(q,t)\lambda(t)\frac{\partial g}{\partial q}(q,t) in the original phase-space,

ddtLq˙(q,q˙,t)Lq(q,q˙,t)=λ(t)gq(q,t).\frac{d}{dt}\frac{\partial L}{\partial\dot{q}}\left(q,\dot{q},t\right)-\frac{\partial L}{\partial q}\left(q,\dot{q},t\right)=\lambda(t)\frac{\partial g}{\partial q}(q,t). (3.47)

The discrete Euler–Lagrange equations become more complicated and involve terms with partial derivatives D1g(qk,𝔮k)D_{1}g(q_{k},\mathfrak{q}_{k}) of gg with respect to qq. Furthermore, when g=g(q,t)g=g(q,t), the discrete Euler–Lagrange equations involve λk\lambda_{k} but the time-evolution of the Lagrange multiplier λ\lambda is not well-defined, so the discrete Hamiltonian map corresponding to the discrete Lagrangian LdL_{d} is not well-defined, as explained in [49, page 440]. Although there are ways to circumvent this problem in practice, this adds some difficulty and makes the time-adaptive Lagrangian approach with g=g(q,t)g=g(q,t) less natural and desirable than the corresponding Poincaré transformation on the Hamiltonian side. It might also be tempting to generalize further and consider the case where g=g(q,q˙,t)g=g(q,\dot{q},t). However, in this case, the time-rescaling equation dtdτ=g(q,q˙,t)\frac{dt}{d\tau}=g(q,\dot{q},t) becomes implicit and it becomes less clear how to generalize the variational derivation presented in this paper. There are examples where time-adaptivity with these more general monitor functions proved advantageous (see for instance Kepler’s problem in [20]). This motivates further effort towards developing a better framework for time-adaptivity on the Lagrangian side with more general monitor functions.

It might be more natural to consider these time-rescaled Lagrangian and Hamiltonian dynamics as Dirac mechanics [44; 66; 67] on the Pontryagin bundle (q,v,p)T𝒬T𝒬(q,v,p)\in T\mathcal{Q}\oplus T^{*}\mathcal{Q}. Dirac dynamics are described by the Hamilton-Pontryagin variational principle where the momentum pp acts as a Lagrange multiplier to impose the kinematic equation q˙=v\dot{q}=v,

δ0T[L(q,v,t)+p(q˙v)]𝑑t=0.\delta\int_{0}^{T}{\left[L(q,v,t)+p(\dot{q}-v)\right]dt}=0. (3.48)

This provides a variational description of both Lagrangian and Hamiltonian mechanics, yields the implicit Euler–Lagrange equations

q˙=v,p˙=Lq,p=Lv,\dot{q}=v,\qquad\dot{p}=\frac{\partial L}{\partial q},\qquad p=\frac{\partial L}{\partial v}, (3.49)

and suggests the introduction of a more general quantity, the generalized energy

E(q,v,p,t)=pvL(q,v,t),E(q,v,p,t)=pv-L(q,v,t), (3.50)

as an alternative to the Hamiltonian.

4. Application to Accelerated Optimization on Vector Spaces

4.1. A Variational Framework for Accelerated Optimization

A variational framework was introduced in [65] for accelerated optimization on normed vector spaces. The pp-Bregman Lagrangians and Hamiltonians are defined to be

p(x,v,t)=tp+12pv,vCpt2p1f(x),\mathcal{L}_{p}(x,v,t)=\frac{t^{p+1}}{2p}\langle v,v\rangle-Cpt^{2p-1}f(x), (4.1)
p(x,r,t)=p2tp+1r,r+Cpt2p1f(x),\mathcal{H}_{p}(x,r,t)=\frac{p}{2t^{p+1}}\langle r,r\rangle+Cpt^{2p-1}f(x), (4.2)

which are scalar-valued functions of position x𝒳x\in\mathcal{X}, velocity vdv\in\mathbb{R}^{d} or momentum rdr\in\mathbb{R}^{d}, and time tt. In [65], it was shown that solutions to the pp-Bregman Euler–Lagrange equations converge to a minimizer of the convex objective function ff at a convergence rate of 𝒪(1/tp)\mathcal{O}(1/t^{p}). Furthermore, this family of Bregman dynamics is closed under time dilation: time-rescaling a solution to the pp-Bregman Euler–Lagrange equations via τ(t)=tp̊/p\tau(t)=t^{\mathring{p}/p} yields a solution to the p̊\mathring{p}-Bregman Euler–Lagrange equations. Thus, the entire subfamily of Bregman trajectories indexed by the parameter pp can be obtained by speeding up or slowing down along the Bregman curve corresponding to any value of pp. In [20], the time-rescaling property of the Bregman dynamics was exploited together with a carefully chosen Poincaré transformation to transform the pp-Bregman Hamiltonian into an autonomous version of the p̊\mathring{p}-Bregman Hamiltonian in extended phase-space, where p̊<p\mathring{p}<p. This strategy allowed us to achieve the faster rate of convergence associated with the higher-order pp-Bregman dynamics, but with the computational efficiency of integrating the lower-order p̊\mathring{p}-Bregman dynamics. Explicitly, using the time rescaling τ(t)=tp̊/p\tau(t)=t^{\mathring{p}/p} within the Poincaré transformation framework yields the adaptive approach pp̊p\rightarrow\mathring{p}-Bregman Hamiltonian,

H¯pp̊(q¯,r¯)\displaystyle\bar{H}_{p\rightarrow\mathring{p}}(\bar{q},\bar{r}) =p22p̊𝔮p+p̊/pr,r+Cp2p̊𝔮2pp̊/pf(q)+pp̊𝔯𝔮1p̊/p,\displaystyle=\frac{p^{2}}{2\mathring{p}\mathfrak{q}^{p+\mathring{p}/p}}\langle r,r\rangle+\frac{Cp^{2}}{\mathring{p}}\mathfrak{q}^{2p-\mathring{p}/p}f(q)+\frac{p}{\mathring{p}}\mathfrak{r}\mathfrak{q}^{1-\mathring{p}/p}, (4.3)

and when p̊=p\mathring{p}=p, the direct approach pp-Bregman Hamiltonian,

H¯p(q¯,r¯)=p2𝔮p+1r,r+Cp𝔮2p1f(q)+𝔯.\bar{H}_{p}(\bar{q},\bar{r})=\frac{p}{2\mathfrak{q}^{p+1}}\langle r,r\rangle+Cp\mathfrak{q}^{2p-1}f(q)+\mathfrak{r}. (4.4)

In [20], a careful computational study was performed on how time-adaptivity and symplecticity of the numerical scheme improve the performance of the resulting optimization algorithm. In particular, it was observed that time-adaptive Hamiltonian variational discretizations, which are automatically symplectic, with adaptive time-steps informed by the time-rescaling of the family of pp-Bregman Hamiltonians yielded the most robust and computationally efficient optimization algorithms, outperforming fixed-timestep symplectic discretizations, adaptive-timestep non-symplectic discretizations, and Nesterov’s accelerated gradient algorithm which is neither time-adaptive nor symplectic.

4.2. Numerical Methods

4.2.1. A Lagrangian Taylor Variational Integrator (LTVI)

We will now construct a time-adaptive Lagrangian Taylor variational integrator (LTVI) for the pp-Bregman Lagrangian,

L¯p(q,q,𝔮)=𝔮p+12pq,qCp𝔮2p1f(q),\bar{L}_{p}\left(q,q^{\prime},\mathfrak{q}\right)=\frac{\mathfrak{q}^{p+1}}{2p}\langle q^{\prime},q^{\prime}\rangle-Cp\mathfrak{q}^{2p-1}f(q), (4.5)

using the strategy outlined in Section 2.2 together with the discrete Euler–Lagrange equations derived in Sections 3.2 and 3.3.

Looking at the form of the continuous pp-Bregman Euler–Lagrange equations,

q¨+p+1𝔮q˙+Cp2𝔮p2f(q)=0,\ddot{q}+\frac{p+1}{\mathfrak{q}}\dot{q}+Cp^{2}\mathfrak{q}^{p-2}\nabla f(q)=0, (4.6)

we can note that f\nabla f appears in the expression for q¨\ddot{q}. Now, the construction of a LTVI as presented in Section 2.2 requires ρ\rho-order and (ρ+1)(\rho+1)-order Taylor approximations of qq. This means that if we take ρ1\rho\geq 1, then f\nabla f and higher-order derivatives of ff will appear in the resulting discrete Lagrangian LdL_{d}, and as a consequence, the discrete Euler–Lagrange equations,

p0=D1Ld(q0,q1),p1=D2Ld(q0,q1),p_{0}=-D_{1}L_{d}(q_{0},q_{1}),\qquad p_{1}=D_{2}L_{d}(q_{0},q_{1}), (4.7)

will yield an integrator which is not gradient-based. Keeping in mind the machine learning applications where data sets are very large, we will restrict ourselves to explicit first-order optimization algorithms, and therefore the highest value of ρ\rho that we can choose to obtain a gradient-based algorithm is ρ=0\rho=0.

With ρ=0\rho=0, the choice of quadrature rule does not matter, so we can take the rectangular quadrature rule about the initial point (c1=0c_{1}=0 and b1=1b_{1}=1). We first approximate q¯˙(0)=v¯0\dot{\bar{q}}(0)=\bar{v}_{0} by the solution v¯~0\tilde{\bar{v}}_{0} of the problem q¯1=πQΨh(1)(q¯0,v¯~0)=q¯0+hv¯~0\bar{q}_{1}=\pi_{Q}\circ\Psi_{h}^{(1)}(\bar{q}_{0},\tilde{\bar{v}}_{0})=\bar{q}_{0}+h\tilde{\bar{v}}_{0}, that is v¯~0=q¯1q¯0h\tilde{\bar{v}}_{0}=\frac{\bar{q}_{1}-\bar{q}_{0}}{h}. Then, applying the quadrature rule gives the associated discrete Lagrangian,

Ld(q¯0,q¯1)\displaystyle L_{d}(\bar{q}_{0},\bar{q}_{1}) =hLp(q0,v~0g(𝔮0),𝔮0)=𝔮0p+12p(g(𝔮0))2hv~0,v~0Chp𝔮02p1f(q0).\displaystyle=hL_{p}\left(q_{0},\frac{\tilde{v}_{0}}{g(\mathfrak{q}_{0})},\mathfrak{q}_{0}\right)=\frac{\mathfrak{q}_{0}^{p+1}}{2p(g(\mathfrak{q}_{0}))^{2}}h\langle\tilde{v}_{0},\tilde{v}_{0}\rangle-Chp\mathfrak{q}_{0}^{2p-1}f(q_{0}). (4.8)

The variational integrator is then defined by the discrete extended Euler–Lagrange equations derived in Sections 3.2 and 3.3. In practice, we are not interested in the evolution of the conjugate momentum 𝔯\mathfrak{r}, and since it will not appear in the updates for the other variables, the discrete equations of motion from Sections 3.2 and 3.3 both reduce to the same updates,

rk\displaystyle r_{k} =D1Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=-D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (4.9)
rk+1\displaystyle r_{k+1} =g(𝔮k)g(𝔮k+1)D3Ld(qk,𝔮k,qk+1,𝔮k+1),\displaystyle=\frac{g(\mathfrak{q}_{k})}{g(\mathfrak{q}_{k+1})}D_{3}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), (4.10)
𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+hg(𝔮k).\displaystyle=\mathfrak{q}_{k}+hg(\mathfrak{q}_{k}). (4.11)

Now, for the adaptive approach, substituting g(𝔮)=pp̊𝔮1p̊pg(\mathfrak{q})=\frac{p}{\mathring{p}}\mathfrak{q}^{1-\frac{\mathring{p}}{p}} and

Ld(qk,𝔮k,qk+1,𝔮k+1)=p̊22hp3𝔮kp1+2p̊/pqk+1qk,qk+1qkChp𝔮02p1f(qk),L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})=\frac{\mathring{p}^{2}}{2hp^{3}}\mathfrak{q}_{k}^{p-1+2\mathring{p}/p}\langle q_{k+1}-q_{k},q_{k+1}-q_{k}\rangle-Chp\mathfrak{q}_{0}^{2p-1}f(q_{k}), (4.12)

yields the adaptive LTVI algorithm,

𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+hpp̊𝔮k1p̊/p,\displaystyle=\mathfrak{q}_{k}+h\frac{p}{\mathring{p}}\mathfrak{q}_{k}^{1-\mathring{p}/p}, (4.13)
qk+1\displaystyle q_{k+1} =qk+hp3p̊2𝔮kp1+2p̊/prkCh2p4p̊2𝔮kp2p̊/pf(qk),\displaystyle=q_{k}+\frac{hp^{3}}{\mathring{p}^{2}\mathfrak{q}_{k}^{p-1+2\mathring{p}/p}}r_{k}-\frac{Ch^{2}p^{4}}{\mathring{p}^{2}}\mathfrak{q}_{k}^{p-2\mathring{p}/p}\nabla f(q_{k}), (4.14)
rk+1\displaystyle r_{k+1} =p̊2𝔮kp+p̊/php3𝔮k+11p̊/p(qk+1qk).\displaystyle=\frac{\mathring{p}^{2}\mathfrak{q}_{k}^{p+\mathring{p}/p}}{hp^{3}\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}(q_{k+1}-q_{k}). (4.15)

In the direct approach, p̊=p\mathring{p}=p so g(𝔮)=1g(\mathfrak{q})=1 and we obtain the direct LTVI algorithm,

𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+h,\displaystyle=\mathfrak{q}_{k}+h, (4.16)
qk+1\displaystyle q_{k+1} =qk+hp𝔮kp+1rkCh2p2𝔮kp2f(qk),\displaystyle=q_{k}+\frac{hp}{\mathfrak{q}_{k}^{p+1}}r_{k}-Ch^{2}p^{2}\mathfrak{q}_{k}^{p-2}\nabla f(q_{k}), (4.17)
rk+1\displaystyle r_{k+1} =𝔮kp+1hp(qk+1qk).\displaystyle=\frac{\mathfrak{q}_{k}^{p+1}}{hp}(q_{k+1}-q_{k}). (4.18)

4.2.2. A Hamiltonian Taylor Variational Integrator (HTVI)

In [20], a Type II Hamiltonian Taylor Variational Integrator (HTVI) was derived following the strategy from Section 2.2 with ρ=0\rho=0 for the adaptive approach pp̊p\rightarrow\mathring{p}-Bregman Hamiltonian,

H¯pp̊(q¯,r¯)\displaystyle\bar{H}_{p\rightarrow\mathring{p}}(\bar{q},\bar{r}) =p22p̊𝔮p+p̊/pr,r+Cp2p̊𝔮2pp̊/pf(q)+pp̊𝔯𝔮1p̊/p.\displaystyle=\frac{p^{2}}{2\mathring{p}\mathfrak{q}^{p+\mathring{p}/p}}\langle r,r\rangle+\frac{Cp^{2}}{\mathring{p}}\mathfrak{q}^{2p-\mathring{p}/p}f(q)+\frac{p}{\mathring{p}}\mathfrak{r}\mathfrak{q}^{1-\mathring{p}/p}. (4.19)

This adaptive HTVI is the most natural Hamiltonian analogue of the LTVI described in Section 4.2.1, and its updates are given by

𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+hpp̊𝔮k1p̊/p,\displaystyle=\mathfrak{q}_{k}+h\frac{p}{\mathring{p}}\mathfrak{q}_{k}^{1-\mathring{p}/p}, (4.20)
rk+1\displaystyle r_{k+1} =rkp2p̊Ch𝔮k2pp̊/pf(qk),\displaystyle=r_{k}-\frac{p^{2}}{\mathring{p}}Ch\mathfrak{q}_{k}^{2p-\mathring{p}/p}\nabla f(q_{k}), (4.21)
qk+1\displaystyle q_{k+1} =qk+p2p̊h𝔮kpp̊/prk+1.\displaystyle=q_{k}+\frac{p^{2}}{\mathring{p}}h\mathfrak{q}_{k}^{-p-\mathring{p}/p}r_{k+1}. (4.22)

When p̊=p\mathring{p}=p, it reduces to the direct HTVI,

𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+h,\displaystyle=\mathfrak{q}_{k}+h, (4.23)
rk+1\displaystyle r_{k+1} =rkhCp𝔮k2p1f(qk),\displaystyle=r_{k}-hCp\mathfrak{q}_{k}^{2p-1}\nabla f(q_{k}), (4.24)
qk+1\displaystyle q_{k+1} =qk+hp𝔮kp1rk+1.\displaystyle=q_{k}+hp\mathfrak{q}_{k}^{-p-1}r_{k+1}. (4.25)

4.3. Numerical Experiments

Numerical experiments using the numerical methods presented in the previous section have been conducted to minimize the quartic function,

f(x)=[(x1)Σ(x1)]2,f(x)=\left[(x-1)^{\top}\Sigma(x-1)\right]^{2}, (4.26)

where xdx\in\mathbb{R}^{d} and Σij=0.9|ij|\Sigma_{ij}=0.9^{|i-j|}. This convex function achieves its minimum value 0 at x=1x^{*}=1.

As was observed in [20] for the HTVI algorithm, the numerical experiments showed that a carefully tuned adaptive approach algorithm enjoys a significantly better rate of convergence and requires a much smaller number of steps to achieve convergence than the direct approach, as can be seen in Figure 1 for the LTVI methods. Although the adaptive approach requires a smaller fictive time-step hh than the direct approach, the physical time-steps resulting from t=τp/p̊t=\tau^{p/\mathring{p}} in the adaptive approach grow rapidly to values larger than the constant physical time-step of the direct approach. The results of Figure 1 also show that the adaptive and direct LTVI methods become more and more efficient as pp is increased, which was also the case for the HTVI algorithm in [20].

Refer to caption
Figure 1. Comparison of the direct and adaptive approaches for the LTVI algorithm, when applied to the quartic function (4.26).

The LTVI and HTVI algorithms presented in Section 4.2 perform empirically almost exactly in the same way for the same parameters and time-step, as can be seen for instance in Figure 2. As a result, the computational analysis carried in [20] for the HTVI algorithm extends to the LTVI algorithm. In particular, it was shown in [20] that the HTVI algorithm is much more efficient than non-adaptive non-symplectic and adaptive non-symplectic integrators for the Bregman dynamics, and that it can be a competitive first-order explicit algorithm which can outperform certain popular optimization algorithms such as Nesterov’s Accelerated Gradient [54], Trust Region Steepest Descent, ADAM [33], AdaGrad [15], and RMSprop [61], for certain objective functions. Since the computational performance of the LTVI algorithm is almost exactly the same as that of the HTVI algorithm, this means that the LTVI algorithm is also much more efficient than non-symplectic integrators for the Bregman dynamics and can also be very competitive as a first-order explicit optimization algorithm.

Refer to caption
Figure 2. Comparison of the HTVI and LTVI algorithms with the same parameters.

5. Accelerated Optimization on More General Spaces

5.1. Motivation and Prior Work

The variational framework for accelerated optimization on normed vector spaces from [65; 20] was extended to the Riemannian manifold setting in [19] via a Riemannian pp-Bregman Lagrangian p:T𝒬×\mathcal{L}_{p}:T\mathcal{Q}\times\mathbb{R}\rightarrow\mathbb{R} and a corresponding Riemannian pp-Bregman Hamiltonian p:T𝒬×\mathcal{H}_{p}:T^{*}\mathcal{Q}\times\mathbb{R}\rightarrow\mathbb{R}, for p>0p>0, of the form

p(X,V,t)=tζλp+12pV,VCpt(ζλ+1)p1f(X),\mathcal{L}_{p}(X,V,t)=\frac{t^{\frac{\zeta}{\lambda}p+1}}{2p}\langle V,V\rangle-Cpt^{(\frac{\zeta}{\lambda}+1)p-1}f(X), (5.1)
p(X,R,t)=p2tζλp+1\llangleR,R\rrangle+Cpt(ζλ+1)p1f(X),\mathcal{H}_{p}(X,R,t)=\frac{p}{2t^{\frac{\zeta}{\lambda}p+1}}\llangle R,R\rrangle+Cpt^{(\frac{\zeta}{\lambda}+1)p-1}f(X), (5.2)

where ζ\zeta and λ\lambda are constants having to do with the curvature of the manifold and the convexity of the objective function ff. These yield the associated pp-Bregman Euler–Lagrange equations,

X˙X˙+ζp+λλtX˙+Cp2tp2gradf(X)=0.\nabla_{\dot{X}}\dot{X}+\frac{\zeta p+\lambda}{\lambda t}\dot{X}+Cp^{2}t^{p-2}\text{grad}f(X)=0. (5.3)

Here, gradf\text{grad}f denotes the Riemannian gradient of ff, XY\nabla_{X}Y is the covariant derivative of YY along XX, and \llangle,\rrangle\llangle\cdot,\cdot\rrangle is the fiber metric on T𝒬T^{*}\mathcal{Q} induced by the Riemannian metric ,\langle\cdot,\cdot\rangle on 𝒬\mathcal{Q} whose local coordinates representation is the inverse of the local representation of ,\langle\cdot,\cdot\rangle. See [1; 6; 48; 35; 32; 19] for a more detailed description of these notions from Riemannian geometry and of this Riemannian variational framework for accelerated optimization. Note that some work was done on accelerated optimization via numerical integration of Bregman dynamics in the Lie group setting [42; 60] before the theory for more general Bregman families on Riemannian manifolds was established in [19].

It was shown in [19] that solutions to the pp-Bregman Euler–Lagrange equations converge to a minimizer of ff at a convergence rate of 𝒪(1/tp)\mathcal{O}(1/t^{p}), under suitable assumptions, and proven that time-rescaling a solution to the pp-Bregman Euler–Lagrange equations via τ(t)=tp̊/p\tau(t)=t^{\mathring{p}/p} yields a solution to the p̊\mathring{p}-Bregman Euler–Lagrange equations. As a result, the adaptive approach involving the Poincaré transformation was extended to the Riemannian manifold setting via the adaptive approach Riemannian pp̊p\rightarrow\mathring{p} Bregman Hamiltonian,

¯pp̊(Q¯,R¯)\displaystyle\bar{\mathcal{H}}_{p\rightarrow\mathring{p}}\left(\bar{Q},\bar{R}\right) =p22p̊𝔔ζλp+p̊p\llangleR,R\rrangle+Cp2p̊𝔔(ζλ+1)pp̊pf(Q)+pp̊𝔔1p̊p.\displaystyle=\frac{p^{2}}{2\mathring{p}\mathfrak{Q}^{\frac{\zeta}{\lambda}p+\frac{\mathring{p}}{p}}}\llangle R,R\rrangle+\frac{Cp^{2}}{\mathring{p}}\mathfrak{Q}^{\left(\frac{\zeta}{\lambda}+1\right)p-\frac{\mathring{p}}{p}}f(Q)+\frac{p}{\mathring{p}}\mathfrak{Q}^{1-\frac{\mathring{p}}{p}}\mathfrak{R}. (5.4)

This adaptive framework was exploited using discrete variational integrators incorporating holonomic constraints [16] and projection-based variational integrators [18]. Both these strategies relied on embedding the Riemannian manifolds into an ambient Euclidean space. Although the Whitney and Nash Embedding Theorems [64; 63; 53] imply that there is no loss of generality when studying Riemannian manifolds only as submanifolds of Euclidean spaces, designing intrinsic methods that would exploit and preserve the symmetries and geometric properties of the Riemannian manifold and of the problem at hand could have advantages, both in terms of computation and in terms of improving our understanding of the acceleration phenomenon on Riemannian manifolds.

Developing an intrinsic extension of Hamiltonian variational integrators to manifolds would require some additional work, since the current approach involves Type II/III generating functions Hd+(qk,pk+1)H_{d}^{+}(q_{k},p_{k+1}), Hd(pk,qk+1)H_{d}^{-}(p_{k},q_{k+1}), which depend on the position at one boundary point, and on the momentum at the other boundary point. However, this does not make intrinsic sense on a manifold, since one needs the base point in order to specify the corresponding cotangent space, and one should ideally consider a Hamiltonian variational integrator construction based on the discrete generalized energy in the context of discrete Dirac mechanics [44; 66; 67].

On the other hand, Lagrangian variational integrators involve a Type I generating function Ld(qk,qk+1)L_{d}(q_{k},q_{k+1}) which only depends on the position at the boundary points and is therefore well-defined on manifolds, and many Lagrangian variational integrators have been derived on Riemannian manifolds, especially in the Lie group [38; 39; 5; 27; 56; 43; 28; 37; 36] and homogeneous space [40] settings. The time-adaptive framework developed in this paper now makes it possible to design time-adaptive Lagrangian integrators for accelerated optimization on these more general spaces, where it is more natural and easier to work on the Lagrangian side than on the Hamiltonian side.

5.2. Accelerated Optimization on Lie Groups

Although it is possible to work on Riemannian manifolds, we will restrict ourselves to Lie groups for simplicity of exposition since there is more literature available on Lie group integrators than Riemannian integrators. Note as well that prior work is available on accelerated optimization via numerical integration of Bregman dynamics in the Lie group setting [42; 60].

Here, we will work in the setting introduced in [42]. The setting of [60] can be thought of as a special case of the more general Lie group framework for accelerated optimization presented here. Consider a nn-dimensional Lie group GG with associated Lie algebra 𝔤=TeG\mathfrak{g}=T_{e}G, and a left-trivialization of the tangent bundle of the group TGG×𝔤TG\simeq G\times\mathfrak{g}, via (q,q˙)(q,Lq1q˙)(q,ξ)(q,\dot{q})\mapsto(q,\text{L}_{q^{-1}}\dot{q})\equiv(q,\xi), where L:G×GG\text{L}:G\times G\rightarrow G is the left action defined by Lqh=qh\text{L}_{q}h=qh for all q,hGq,h\in G. Suppose that 𝔤\mathfrak{g} is equipped with an inner product which induces an inner product on TqGT_{q}G via left-trivialization,

(vw)TqG=(TqLq1vTqLq1w)𝔤,v,wTqG.(v\bullet w)_{T_{q}G}=(\text{T}_{q}\text{L}_{q^{-1}}v\bullet\text{T}_{q}\text{L}_{q^{-1}}w)_{\mathfrak{g}},\qquad\forall v,w\in T_{q}G.

With this inner product, we identify 𝔤𝔤\mathfrak{g}\simeq\mathfrak{g}^{*} and TqGTqGG×𝔤T_{q}G\simeq T_{q}^{*}G\simeq G\times\mathfrak{g}^{*} via the Riesz representation. Let 𝐉:𝔤𝔤\mathbf{J}:\mathfrak{g}\rightarrow\mathfrak{g}^{*} be chosen such that (𝐉(ξ)ζ)(\mathbf{J}(\xi)\bullet\zeta) is positive-definite and symmetric as a bilinear form of ξ,ζ𝔤\xi,\zeta\in\mathfrak{g}. Then, the metric ,:𝔤×𝔤\langle\cdot,\cdot\rangle:\mathfrak{g}\times\mathfrak{g}\rightarrow\mathbb{R} with ξ,ζ=(𝐉(ξ)ζ)\langle\xi,\zeta\rangle=(\mathbf{J}(\xi)\bullet\zeta) serves as a left-invariant Riemannian metric on GG. The adjoint and ad operators are denoted by Adq:𝔤𝔤\text{Ad}_{q}:\mathfrak{g}\rightarrow\mathfrak{g} and adξ:𝔤𝔤\text{ad}_{\xi}:\mathfrak{g}\rightarrow\mathfrak{g}, respectively. We refer the reader to [48; 41; 23] for a more detailed description of Lie group theory and mechanics on Lie groups.

As mentioned earlier, there is a lot of literature available on Lie group integrators. We refer the reader to [30; 11; 12; 13] for very thorough surveys of the literature on Lie group methods, which acknowledge all the foundational contributions leading to the current state of Lie group integrator theory. In particular, the Crouch and Grossman approach [14], the Lewis and Simo approach [47], Runge–Kutta–Munthe–Kaas methods [51; 52; 50; 9], Magnus and Fer expansions [68; 4; 29], and commutator-free Lie group methods [10] are outlined in these surveys. Variational integrators have also been derived on the Lagrangian side in the Lie group setting [38; 39; 5; 27; 56; 43; 28; 37; 36].

We now introduce a discrete variational formulation of time-adaptive Lagrangian mechanics on Lie groups. Suppose we are given a partition 0=τ0<τ1<<τN=𝒯0=\tau_{0}<\tau_{1}<\ldots<\tau_{N}=\mathcal{T} of the interval [0,𝒯][0,\mathcal{T}], and a discrete curve in G××G\times\mathbb{R}\times\mathbb{R} denoted by {(qk,𝔮k,λk)}k=0N\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N} such that qkq(τk)q_{k}\approx q(\tau_{k}), 𝔮k𝔮(τk)\mathfrak{q}_{k}\approx\mathfrak{q}(\tau_{k}), and λkλ(τk)\lambda_{k}\approx\lambda(\tau_{k}). The discrete kinematics equation is chosen to be

qk+1=qkfk,q_{k+1}=q_{k}f_{k}, (5.5)

where fkGf_{k}\in G represents the relative update over a single step.

Consider the discrete action functional,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1[Ld(qk,fk,𝔮k,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk,\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}, (5.6)

where,

Ld(qk,fk,𝔮k,𝔮k+1)ext(q,𝔮)C2([τk,τk+1],G×)(q,𝔮)(τk)=(qk,𝔮k), (q,𝔮)(τk+1)=(qkfk,𝔮k+1) τkτk+1L(q,ξg(𝔮),𝔮)𝑑τ.L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],G\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=\left(q_{k}f_{k},\mathfrak{q}_{k+1}\right)\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{\xi}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}. (5.7)

We can derive the following result which relates a discrete Type I variational principle to a set of discrete Euler–Lagrange equations:

Theorem 5.1.

The Type I discrete Hamilton’s variational principle,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0,\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0, (5.8)

where,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1[Ld(qk,fk,𝔮k,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk,\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}, (5.9)

is equivalent to the discrete extended Euler–Lagrange equations,

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}), (5.10)
Adfk1(TeLfkD2Ldk)=TeLqkD1Ldk+τk+1τk𝔮k+1𝔮k𝔮k𝔮k1τkτk1TeLfk1D2Ldk1,\emph{Ad}^{*}_{f_{k}^{-1}}\left(\emph{T}_{e}^{*}\emph{L}_{f_{k}}D_{2}L_{d_{k}}\right)=\emph{T}_{e}^{*}\emph{L}_{q_{k}}D_{1}L_{d_{k}}+\frac{\tau_{k+1}-\tau_{k}}{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\emph{T}_{e}^{*}\emph{L}_{f_{k-1}}D_{2}L_{d_{k-1}}, (5.11)
[D3Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk1τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]\displaystyle\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right] (5.12)
+[D4Ldkλk11τkτk1]𝔮k𝔮k1τkτk1+1τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]=0,\displaystyle\qquad+\left[D_{4}L_{d_{k}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,

where LdkL_{d_{k}} denotes Ld(qk,fk,𝔮k,𝔮k+1)L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}).

Proof.

See Appendix A.3. ∎

Now, define

𝔭k\displaystyle\mathfrak{p}_{k} =D3Ld(qk,fk,𝔮k,𝔮k+1)\displaystyle=-D_{3}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}) (5.13)

and

μk\displaystyle\mu_{k} =Adfk1(TeLfkD2Ld(qk,fk,𝔮k,𝔮k+1))TeLqkD1Ld(qk,fk,𝔮k,𝔮k+1).\displaystyle=\text{Ad}^{*}_{f_{k}^{-1}}\left(\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\right)-\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}). (5.14)

Then,

μk+1\displaystyle\mu_{k+1} =τk+2τk+1𝔮k+2𝔮k+1𝔮k+1𝔮kτk+1τkTeLfkD2Ld(qk,fk,𝔮k,𝔮k+1).\displaystyle=\frac{\tau_{k+2}-\tau_{k+1}}{\mathfrak{q}_{k+2}-\mathfrak{q}_{k+1}}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}). (5.15)

With these definitions, if we use a constant time-step hh in fictive time τ\tau and substitute g(𝔮)=pp̊𝔮1p̊/pg(\mathfrak{q})=\frac{p}{\mathring{p}}\mathfrak{q}^{1-\mathring{p}/p}, the discrete Euler–Lagrange equations can be rewritten as

μk\displaystyle\mu_{k} =Adfk1(TeLfkD2Ld(qk,fk,𝔮k,𝔮k+1))TeLqkD1Ld(qk,fk,𝔮k,𝔮k+1),\displaystyle=\text{Ad}^{*}_{f_{k}^{-1}}\left(\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\right)-\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}), (5.16)
μk+1\displaystyle\mu_{k+1} =𝔮k1p̊/p𝔮k+11p̊/pTeLfkD2Ld(qk,fk,𝔮k,𝔮k+1),\displaystyle=\frac{\mathfrak{q}_{k}^{1-\mathring{p}/p}}{\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}), (5.17)
𝔮k+1\displaystyle\mathfrak{q}_{k+1} =𝔮k+hpp̊𝔮k1p̊/p,\displaystyle=\mathfrak{q}_{k}+h\frac{p}{\mathring{p}}\mathfrak{q}_{k}^{1-\mathring{p}/p}, (5.18)
𝔭k+1\displaystyle\mathfrak{p}_{k+1} =p̊[λk+1λk+LdkLdk+1]hp𝔮k+11p̊/p+D4Ldk+λk+1𝔮k+1(1p̊p).\displaystyle=\frac{\mathring{p}\left[\lambda_{k+1}-\lambda_{k}+L_{d_{k}}-L_{d_{k+1}}\right]}{hp\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}+D_{4}L_{d_{k}}+\frac{\lambda_{k+1}}{\mathfrak{q}_{k+1}}\left(1-\frac{\mathring{p}}{p}\right). (5.19)

In the Lie group setting, the Riemannian pp-Bregman Lagrangian becomes

p(q,ξ,t)=tκp+12pξ,ξCpt(κ+1)p1f(q),\mathcal{L}_{p}(q,\xi,t)=\frac{t^{\kappa p+1}}{2p}\langle\xi,\xi\rangle-Cpt^{(\kappa+1)p-1}f(q), (5.20)

with corresponding Euler–Lagrange equation,

d𝐉(ξ)dt+κp+1t𝐉(ξ)adξ𝐉(ξ)+Cp2tp2Lf(q)=0,\frac{d\mathbf{J}(\xi)}{dt}+\frac{\kappa p+1}{t}\mathbf{J}(\xi)-\text{ad}_{\xi}^{*}\mathbf{J}(\xi)+Cp^{2}t^{p-2}\nabla_{\text{L}}f(q)=0, (5.21)

where Lf\nabla_{\text{L}}f is the left-trivialized derivative of ff, given by Lf(q)=TeLq(Dqf(q))\nabla_{\text{L}}f(q)=\text{T}_{e}^{*}\text{L}_{q}(\text{D}_{q}f(q)). We then consider the discrete Lagrangian,

Ld(qk,fk,𝔮k,𝔮k+1)\displaystyle L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}) =𝔮kκp+1hp(g(𝔮k))2Td(fk)Chp𝔮k(κ+1)p1f(qk),\displaystyle=\frac{\mathfrak{q}_{k}^{\kappa p+1}}{hp(g(\mathfrak{q}_{k}))^{2}}T_{d}(f_{k})-Chp\mathfrak{q}_{k}^{(\kappa+1)p-1}f(q_{k}), (5.22)

where Td(fk)12hξk,hξkT_{d}(f_{k})\approx\frac{1}{2}\langle h\xi_{k},h\xi_{k}\rangle, which approximates

Ld(qk,fk,𝔮k,𝔮k+1)ext(q,𝔮)C2([τk,τk+1],G×)(q,𝔮)(τk)=(qk,𝔮k), (q,𝔮)(τk+1)=(qkfk,𝔮k+1) τkτk+1L(q,ξg(𝔮),𝔮)𝑑τ.L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})\approx\operatorname*{ext}_{\begin{subarray}{c}(q,\mathfrak{q})\in C^{2}([\tau_{k},\tau_{k+1}],G\times\mathbb{R})\\ (q,\mathfrak{q})(\tau_{k})=(q_{k},\mathfrak{q}_{k}),\text{ }(q,\mathfrak{q})(\tau_{k+1})=(q_{k}f_{k},\mathfrak{q}_{k+1})\end{subarray}}\text{ }\int_{\tau_{k}}^{\tau_{k+1}}{L\left(q,\frac{\xi}{g(\mathfrak{q})},\mathfrak{q}\right)d\tau}. (5.23)

5.3. Numerical Experiment on SO(3)\text{SO}(3)

We work on the 3-dimensional Special Orthogonal group,

SO(3)={R3×3|RR=I3×3,det(R)=1}.\text{SO}(3)=\{R\in\mathbb{R}^{3\times 3}|R^{\top}R=I_{3\times 3},\det{(R)}=1\}. (5.24)

Its Lie algebra is

𝔰𝔬(3)={S3×3|S=S},\mathfrak{so}(3)=\{S\in\mathbb{R}^{3\times 3}|S^{\top}=-S\}, (5.25)

with the matrix commutator as the Lie bracket. We have an identification between 3\mathbb{R}^{3} and 𝔰𝔬(3)\mathfrak{so}(3) given by the hat map ^:3𝔰𝔬(3)\hat{\cdot}:\mathbb{R}^{3}\rightarrow\mathfrak{so}(3), defined such that x^y=x×y\hat{x}y=x\times y for any x,y3x,y\in\mathbb{R}^{3}. The inverse of the hat map is the vee map ():𝔰𝔬(3)3(\cdot)^{\vee}:\mathfrak{so}(3)\rightarrow\mathbb{R}^{3}. The inner product on 𝔰𝔬(3)\mathfrak{so}(3) is given by

(η^ξ^)𝔰𝔬(3)=12Trace(η^ξ^)=ηξ,\left(\hat{\eta}\bullet\hat{\xi}\right)_{\mathfrak{so}(3)}=\frac{1}{2}\text{Trace}\left(\hat{\eta}^{\top}\hat{\xi}\right)=\eta^{\top}\xi, (5.26)

and the metric is chosen so that

η^,ξ^=(𝐉(η^)ξ^)𝔰𝔬(3)=Trace(η^Jdξ^)=ηJξ,\langle\hat{\eta},\hat{\xi}\rangle=\left(\mathbf{J}(\hat{\eta})\bullet\hat{\xi}\right)_{\mathfrak{so}(3)}=\text{Trace}\left(\hat{\eta}^{\top}J_{d}\hat{\xi}\right)=\eta^{\top}J\xi, (5.27)

where J3×3J\in\mathbb{R}^{3\times 3} is a symmetric positive-definite matrix and Jd=12Trace(J)I3×3JJ_{d}=\frac{1}{2}\text{Trace}(J)I_{3\times 3}-J.
On SO(3)\text{SO}(3), for any u,v3u,v\in\mathbb{R}^{3} and FSO(3)F\in\text{SO}(3),

adu^v^=[u^,v^]=u^v^v^u^=u×v^,AdFu^=Fu^F=Fu^.\text{ad}_{\hat{u}}\hat{v}=[\hat{u},\hat{v}]=\hat{u}\hat{v}-\hat{v}\hat{u}=\widehat{u\times v},\qquad\text{Ad}_{F}\hat{u}=F\hat{u}F^{\top}=\widehat{Fu}. (5.28)

Identifying 𝔰𝔬(3)𝔰𝔬(3)3\mathfrak{so}(3)^{*}\simeq\mathfrak{so}(3)\simeq\mathbb{R}^{3}, we have for any u,v3u,v\in\mathbb{R}^{3} and FSO(3)F\in\text{SO}(3) that

aduv=u^v=u×v,aduv=u^v=v×u,AdFu=Fu,AdFu=Fu.\text{ad}_{u}v=\hat{u}v=u\times v,\qquad\text{ad}^{*}_{u}v=-\hat{u}v=v\times u,\qquad\text{Ad}_{F}u=Fu,\qquad\text{Ad}_{F}^{*}u=F^{\top}u. (5.29)

On SO(3)\text{SO}(3), the Riemannian pp-Bregman Lagrangian becomes

p(R,Ω,t)=tp+12pΩJΩCpt2p1f(R),\mathcal{L}_{p}(R,\Omega,t)=\frac{t^{p+1}}{2p}\Omega^{\top}J\Omega-Cpt^{2p-1}f(R), (5.30)

and the corresponding Euler–Lagrange equations are given by

JΩ˙+p+1tJΩ+Ω^JΩ+Cp2tp2Lf(R)=0,R˙=RΩ^.J\dot{\Omega}+\frac{p+1}{t}J\Omega+\hat{\Omega}J\Omega+Cp^{2}t^{p-2}\nabla_{\text{L}}f(R)=0,\qquad\dot{R}=R\hat{\Omega}. (5.31)

The discrete kinematics equation is written as

Rk+1=RkFk,R_{k+1}=R_{k}F_{k}, (5.32)

where FkSO(3)F_{k}\in\text{SO}(3), and κ=1\kappa=1 so we get the discrete Lagrangian,

Ld(Rk,Fk,k,k+1)\displaystyle L_{d}(R_{k},F_{k},\mathfrak{R}_{k},\mathfrak{R}_{k+1}) =p̊2hp3kp1+2p̊/pTd(Fk)Chpk2p1f(Rk).\displaystyle=\frac{\mathring{p}^{2}}{hp^{3}}\mathfrak{R}_{k}^{p-1+2\mathring{p}/p}T_{d}(F_{k})-Chp\mathfrak{R}_{k}^{2p-1}f(R_{k}). (5.33)

As in [38; 42], the angular velocity is approximated by Ω^k1hRk(Rk+1Rk)=1h(FkI3×3)\hat{\Omega}_{k}\approx\frac{1}{h}R_{k}^{\top}(R_{k+1}-R_{k})=\frac{1}{h}(F_{k}-I_{3\times 3}) so we can take

Td(Fk)=Trace([I3×3Fk]Jd).T_{d}(F_{k})=\text{Trace}\left([I_{3\times 3}-F_{k}]J_{d}\right). (5.34)

Differentiating this equation and using the identity Trace(x^A)=(AA)x\text{Trace}(-\hat{x}A)=(A-A^{\top})^{\vee}\cdot x yields

TILFk(DFkTd(Fk))=(JdFkFkJd).\text{T}_{I}^{*}\text{L}_{F_{k}}\left(D_{F_{k}}T_{d}(F_{k})\right)=\left(J_{d}F_{k}-F_{k}^{\top}J_{d}\right)^{\vee}. (5.35)

Then, the discrete Euler–Lagrange equations for μk\mu_{k} and μk+1\mu_{k+1} become

μk\displaystyle\mu_{k} =p̊2hp3kp1+2p̊/p(FkJdJdFk)+Chpk2p1Lf(Rk),\displaystyle=\frac{\mathring{p}^{2}}{hp^{3}}\mathfrak{R}_{k}^{p-1+2\mathring{p}/p}\left(F_{k}J_{d}-J_{d}F_{k}^{\top}\right)^{\vee}+Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k}), (5.36)
μk+1\displaystyle\mu_{k+1} =𝔮k1p̊/p𝔮k+11p̊/pFk[μkChpk2p1Lf(Rk)].\displaystyle=\frac{\mathfrak{q}_{k}^{1-\mathring{p}/p}}{\mathfrak{q}_{k+1}^{1-\mathring{p}/p}}F_{k}^{\top}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right]. (5.37)

Now, equation (5.36) can be solved explicitly when J=I3×3J=I_{3\times 3} as described in [42]:

Fk=exp(sin1akaka^k),whereak=hp3p̊2k1p2p̊/p[μkChpk2p1Lf(Rk)].\displaystyle F_{k}=\exp{\left(\frac{\sin^{-1}{\|a_{k}\|}}{\|a_{k}\|}\hat{a}_{k}\right)},\qquad\text{where}\quad a_{k}=\frac{hp^{3}}{\mathring{p}^{2}}\mathfrak{R}_{k}^{1-p-2\mathring{p}/p}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right]. (5.38)

Therefore, we get the Adaptive LLGVI (Adaptive Lagrangian Lie Group Variational Integrator)

Fk=exp(sin1akaka^k),  where  ak=hp3p̊2k1p2p̊/p[μkChpk2p1Lf(Rk)],\displaystyle F_{k}=\exp{\left(\frac{\sin^{-1}{\|a_{k}\|}}{\|a_{k}\|}\hat{a}_{k}\right)},\text{ }\text{ where }\text{ }a_{k}=\frac{hp^{3}}{\mathring{p}^{2}}\mathfrak{R}_{k}^{1-p-2\mathring{p}/p}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right], (5.39)
k+1=k+hpp̊k1p̊/p,\displaystyle\mathfrak{R}_{k+1}=\mathfrak{R}_{k}+h\frac{p}{\mathring{p}}\mathfrak{R}_{k}^{1-\mathring{p}/p}, (5.40)
μk+1=k1p̊/pk+11p̊/pFk[μkChpk2p1Lf(Rk)],\displaystyle\mu_{k+1}=\frac{\mathfrak{R}_{k}^{1-\mathring{p}/p}}{\mathfrak{R}_{k+1}^{1-\mathring{p}/p}}F_{k}^{\top}\left[\mu_{k}-Chp\mathfrak{R}_{k}^{2p-1}\nabla_{\text{L}}f(R_{k})\right], (5.41)
Rk+1=RkFk.\displaystyle R_{k+1}=R_{k}F_{k}. (5.42)

We will use this integrator to solve the problem of minimizing the objective function,

f(R)=12ARF2=12(AF2+3)Trace(AR),f(R)=\frac{1}{2}\|A-R\|_{F}^{2}=\frac{1}{2}\left(\|A\|_{F}^{2}+3\right)-\text{Trace}(A^{\top}R), (5.43)

over RSO(3)R\in\text{SO}(3), where F\|\cdot\|_{F} denotes the Frobenius norm. Its left-trivialized gradient is given by

Lf(R)=(ARRA).\nabla_{\text{L}}f(R)=\left(A^{\top}R-R^{\top}A\right)^{\vee}. (5.44)

Minimizing this objective function appears in the least-squares estimation of attitude, commonly known as Wahba’s problem [62]. The optimal attitude is explicitly given by

R=Udiag[1,1,det(UV)]V,R^{*}=U\text{diag}\left[1,1,\det(UV)\right]V^{\top}, (5.45)

where A=USVA=USV^{\top} is the singular value decomposition of AA with U,VO(3)U,V\in\text{O}(3) and SS diagonal.

We have tested the Adaptive LLGVI integrator on Wahba’s problem against the Implicit Lie Group Variational Integrator (Implicit LGVI) from [42]. The Implicit LGVI is a Lagrangian Lie group variational integrator which adaptively adjusts the step size at every step. It should be noted that these two adaptive approaches use adaptivity in two fundamentally different ways: our Adaptive LLGVI method uses a priori adaptivity based on known global properties of the family of differential equations considered (i.e. the time-rescaling symmetry of the family of Bregman dynamics), while the implicit method from [42] adapts the time-steps in an a posteriori way, by solving a system of nonlinear equations coming from an extended variational principle. The results of our numerical experiments are presented in Figures 3 and 4. In these numerical experiments, we have used the termination criteria

|f(Rk)f(R)|<δand|f(Rk)f(Rk1)|<δ.|f(R_{k})-f(R^{*})|<\delta\qquad\text{and}\qquad|f(R_{k})-f(R_{k-1})|<\delta. (5.46)

We can see from Figure 3 that both algorithms preserve the orthogonality condition RkRk=I3×3R_{k}^{\top}R_{k}=I_{3\times 3} very well. Now, we can observe from Figure 3 that although both algorithms follow the same curve in time tt, they do not travel along this curve at the same speed. Despite the fact that the Adaptive LLGVI algorithm initially takes smaller time-steps, those time-steps eventually become much larger than for the Implicit LGVI algorithm, and as a result, the Adaptive LLGVI algorithm achieves the termination criteria in a smaller number of iterations, which can also be seen more explicitly in the table from Figure 4. Unlike the Implicit LGVI algorithm, the Adaptive LLGVI algorithm is explicit, so each iteration is much cheaper and is therefore significantly faster, as can be seen from the running times displayed in Figure 4. Furthermore, the Adaptive LLGVI algorithm is significantly easier to implement.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3. Comparison of the Adaptive LLGVI algorithm and of the Implicit LGVI algorithm from [42] with p=6p=6, to solve Wahba’s problem (5.43).
Refer to caption
Figure 4. Time and number of iterations needed by the Adaptive LLGVI and Implicit LGVI algorithms with p=6p=6, to satisfy the termination criterion (5.46) on Wahba’s problem (5.43).

6. Conclusion

A variational framework for accelerated optimization on vector spaces was introduced [65] by considering a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time-rescaling. This variational framework was exploited in [20] by using time-adaptive geometric Hamiltonian integrators to design efficient, explicit algorithms for symplectic accelerated optimization. It was observed that a careful use of adaptivity and symplecticity, which was possible on the Hamiltonian side thanks to the Poincaré transformation, could result in a significant gain in computational efficiency, by simulating higher-order Bregman dynamics using the computationally efficient lower-order Bregman integrators applied to the time-rescaled dynamics.

The variational framework and time-adaptive approach on the Hamiltonian side were later extended to the Riemannian manifolds setting in [19]. However, the current formulations of Hamiltonian variational integrators do not make sense intrinsically on manifolds, so this framework was only exploited using methods which take advantage of the structure of the Euclidean spaces in which the Riemannian manifolds are embedded [16; 18] instead of the structure of the Riemannian manifolds themselves. On the other hand, existing formulations of Lagrangian variational integrators are well-defined on manifolds, and many Lagrangian variational integrators have been derived on Riemannian manifolds, especially in the Lie group setting. This motivated exploring whether it is possible to construct a mechanism on the Lagrangian side which mimics the Poincaré transformation, since it is more natural and easier to work on the Lagrangian side on curved manifolds.

The usual correspondence between Hamiltonian and Lagrangian dynamics could not be exploited here since the Poincaré Hamiltonian is degenerate and therefore does not have a corresponding Lagrangian formulation. Instead, we introduced a novel derivation of the Poincaré transformation from a variational principle which gave us additional insight into the transformation mechanism and provided natural candidates for a time-adaptive framework on the Lagrangian side. Based on these observations, we constructed a theory of time-adaptive Lagrangian mechanics both in continuous and discrete time, and applied the resulting time-adaptive Lagrangian variational integrators to solve optimization problems by simulating Bregman dynamics, within the variational framework introduced in [65]. We observed empirically that our time-adaptive Lagrangian variational integrators performed almost exactly in the same way as the time-adaptive Hamiltonian variational integrators coming from the Poincaré framework of [20], whenever they are used with the same parameters and time-step. As a result, the computational analysis carried in [20] for the HTVI algorithm extends to the LTVI algorithm, and thus the LTVI algorithm is much more efficient than non-symplectic integrators for the Bregman dynamics and can be a competitive first-order explicit algorithm since it can outperform commonly used optimization algorithms for certain objective functions.

Finally, we showed that our time-adaptive Lagrangian approach extends naturally to more general spaces such as Riemannian manifolds and Lie groups without having to face the difficulties experienced on the Hamiltonian side, and we then applied time-adaptive Lie group Lagrangian variational integrators to solve an optimization problem on the three-dimensional Special Orthogonal group SO(3)\text{SO}(3). In particular, the resulting Lie group accelerated optimization algorithms were significantly faster and easier to implement than other recently proposed time-adaptive Lie group variational integrators for accelerated optimization.

In future work, we will explore the issue of time-adaptive Lagrangian mechanics for more general monitor functions, using the primal-dual framework of Dirac mechanics. We will also study the convergence properties of the discrete-time algorithms, and try to better understand how to reconcile the Nesterov barrier theorem with the convergence properties of the continuous Bregman flows. It would also be useful to study the extent to which the practical considerations recently presented in [17], which significantly improved the computational performance of the symplectic optimization algorithms in the normed vector space setting, extend to the Riemannian manifold and Lie group settings with the Lagrangian Riemannian and Lie group variational integrators.

Acknowledgments

The authors would like to thank Taeyoung Lee and Molei Tao for sharing the code for the implicit Lie group variational integrator from [42], which was used in the numerical comparison with the method introduced in this paper. The authors would also like to thank the referees for their helpful comments and suggestions.

Funding

The authors were supported in part by NSF under grants DMS-1345013, DMS-1813635, CCF-2112665, by AFOSR under grant FA9550-18-1-0288, and by the DoD under grant HQ00342010023 (Newton Award for Transformative Ideas during the COVID-19 Pandemic).

Appendix A Proofs of Theorems

A.1. Proof of Theorem 3.2

Theorem A.1.

The Type I discrete Hamilton’s variational principle,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0,\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

where,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1[Ld(qk,𝔮k,qk+1,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk,\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}},

is equivalent to the discrete extended Euler–Lagrange equations,

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),
𝔮k+1𝔮kτk+1τkD1Ld(qk,𝔮k,qk+1,𝔮k+1)+𝔮k𝔮k1τkτk1D3Ld(qk1,𝔮k1,qk,𝔮k)=0,\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,
[D2Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk1τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]\displaystyle\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]
+[D4Ldk1λk11τkτk1]𝔮k𝔮k1τkτk1+1τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]=0,\displaystyle\qquad+\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,

where LdkL_{d_{k}} denotes Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}).

Proof.

We use the notation Ldk=Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d_{k}}=L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), and we will use the fact that δq0=δqN=δ𝔮0=δ𝔮N=0\delta q_{0}=\delta q_{N}=\delta\mathfrak{q}_{0}=\delta\mathfrak{q}_{N}=0 throughout the proof. We have

δ𝔖¯d\displaystyle\delta\bar{\mathfrak{S}}_{d} =δ(k=0N1[Ld(qk,𝔮k,qk+1,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk)\displaystyle=\delta\left(\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}\right)
=k=1N1[D2Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τkδ𝔮kk=1N11τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k\displaystyle=\sum_{k=1}^{N-1}{\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}
+k=0N2[D4Ldkλk1τk+1τk]𝔮k+1𝔮kτk+1τkδ𝔮k+1+k=0N21τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k+1\displaystyle\qquad\qquad+\sum_{k=0}^{N-2}{\left[D_{4}L_{d_{k}}-\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k+1}}+\sum_{k=0}^{N-2}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k+1}}
+k=1N1𝔮k+1𝔮kτk+1τkD1Ldkδqk+k=0N2𝔮k+1𝔮kτk+1τkD3Ldkδqk+1+k=0N1𝔮k+1𝔮kτk+1τk(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk.\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}\delta q_{k}}+\sum_{k=0}^{N-2}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{3}L_{d_{k}}\delta q_{k+1}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.

Thus,

δ𝔖¯d\displaystyle\delta\bar{\mathfrak{S}}_{d} =k=1N1[D2Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τkδ𝔮kk=1N11τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k\displaystyle=\sum_{k=1}^{N-1}{\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}
+k=1N1[D4Ldk1λk11τkτk1]𝔮k𝔮k1τkτk1δ𝔮k+k=1N11τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]δ𝔮k\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\delta\mathfrak{q}_{k}}+\sum_{k=1}^{N-1}{\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]\delta\mathfrak{q}_{k}}
+k=1N1[𝔮k+1𝔮kτk+1τkD1Ldk+𝔮k𝔮k1τkτk1D3Ldk1]δqk+k=0N1𝔮k+1𝔮kτk+1τk(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk.\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d_{k-1}}\right]\delta q_{k}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.

As a consequence, if

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),
𝔮k+1𝔮kτk+1τkD1Ld(qk,𝔮k,qk+1,𝔮k+1)+𝔮k𝔮k1τkτk1D3Ld(qk1,𝔮k1,qk,𝔮k)=0,\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,
[D2Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk1τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]\displaystyle\left[D_{2}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]
+[D4Ldk1λk11τkτk1]𝔮k𝔮k1τkτk1+1τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]=0,\displaystyle\qquad+\left[D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,

then δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0. Conversely, if δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0, then a discrete fundamental theorem of the calculus of variations yields the above equations. ∎

A.2. Proof of Theorem 3.4

Theorem A.2.

The Type I discrete Hamilton’s variational principle,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0,\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

where,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1{𝔮k+1𝔮kτk+1τk[Ld(qk,𝔮k,qk+1,𝔮k+1)λk]+λkg(𝔮k)},\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left\{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\right]+\lambda_{k}g(\mathfrak{q}_{k})\right\}},

is equivalent to the discrete extended Euler–Lagrange equations,

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),
𝔮k+1𝔮kτk+1τkD1Ld(qk,𝔮k,qk+1,𝔮k+1)+𝔮k𝔮k1τkτk1D3Ld(qk1,𝔮k1,qk,𝔮k)=0,\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,
𝔮k+1𝔮kτk+1τkD2Ldk1τk+1τkLdk+𝔮k𝔮k1τkτk1D4Ldk1+1τkτk1Ldk1=λk1τkτk1λkτk+1τkλkg(𝔮k),\displaystyle\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}L_{d_{k-1}}=\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}-\lambda_{k}\nabla g(\mathfrak{q}_{k}),

where LdkL_{d_{k}} denotes Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}).

Proof.

We use the notation Ldk=Ld(qk,𝔮k,qk+1,𝔮k+1)L_{d_{k}}=L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1}), and we will use the fact that δq0=δqN=δ𝔮0=δ𝔮N=0\delta q_{0}=\delta q_{N}=\delta\mathfrak{q}_{0}=\delta\mathfrak{q}_{N}=0 throughout the proof. We have

δ𝔖¯d\displaystyle\delta\bar{\mathfrak{S}}_{d} =δ(k=0N1{𝔮k+1𝔮kτk+1τk[Ld(qk,𝔮k,qk+1,𝔮k+1)λk]+λkg(𝔮k)})\displaystyle=\delta\left(\sum_{k=0}^{N-1}{\left\{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left[L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})-\lambda_{k}\right]+\lambda_{k}g(\mathfrak{q}_{k})\right\}}\right)
=k=1N1[𝔮k+1𝔮kτk+1τkD2Ldk1τk+1τkLdk+λkτk+1τk+λkg(𝔮k)]δ𝔮k+k=0N2[𝔮k+1𝔮kτk+1τkD4Ldk+1τk+1τkLdkλkτk+1τk]δ𝔮k+1\displaystyle=\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}+\sum_{k=0}^{N-2}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{4}L_{d_{k}}+\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}\right]\delta\mathfrak{q}_{k+1}}
+k=1N1𝔮k+1𝔮kτk+1τkD1Ldkδqk+k=0N2𝔮k+1𝔮kτk+1τkD3Ldkδqk+1+k=0N1(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk.\displaystyle\quad\qquad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}\delta q_{k}}+\sum_{k=0}^{N-2}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{3}L_{d_{k}}\delta q_{k+1}}+\sum_{k=0}^{N-1}{\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.

Thus,

δ𝔖¯d\displaystyle\delta\bar{\mathfrak{S}}_{d} =k=1N1[𝔮k+1𝔮kτk+1τkD2Ldk1τk+1τkLdk+λkτk+1τk+λkg(𝔮k)+𝔮k𝔮k1τkτk1D4Ldk1+1τkτk1Ldk1λk1τkτk1]δ𝔮k\displaystyle=\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}L_{d_{k-1}}-\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}\right]\delta\mathfrak{q}_{k}}
+k=1N1[𝔮k+1𝔮kτk+1τkD1Ldk+𝔮k𝔮k1τkτk1D3Ldk1]δqk+k=0N1(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk.\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\left[\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d_{k-1}}\right]\delta q_{k}}+\sum_{k=0}^{N-1}{\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.

As a consequence, if

𝔮k+1𝔮kτk+1τkD2Ldk1τk+1τkLdk+𝔮k𝔮k1τkτk1D4Ldk1+1τkτk1Ldk1=λk1τkτk1λkτk+1τkλkg(𝔮k),\displaystyle\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}L_{d_{k}}+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{4}L_{d_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}L_{d_{k-1}}=\frac{\lambda_{k-1}}{\tau_{k}-\tau_{k-1}}-\frac{\lambda_{k}}{\tau_{k+1}-\tau_{k}}-\lambda_{k}\nabla g(\mathfrak{q}_{k}),
𝔮k+1𝔮kτk+1τkD1Ld(qk,𝔮k,qk+1,𝔮k+1)+𝔮k𝔮k1τkτk1D3Ld(qk1,𝔮k1,qk,𝔮k)=0,\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d}(q_{k},\mathfrak{q}_{k},q_{k+1},\mathfrak{q}_{k+1})+\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}D_{3}L_{d}(q_{k-1},\mathfrak{q}_{k-1},q_{k},\mathfrak{q}_{k})=0,
𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),

then δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0. Conversely, if δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0, then a discrete fundamental theorem of the calculus of variations yields the above equations. ∎

A.3. Proof of Theorem 5.1

Theorem A.3.

The Type I discrete Hamilton’s variational principle,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0,\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0,

where,

𝔖¯d({(qk,𝔮k,λk)}k=0N)=k=0N1[Ld(qk,fk,𝔮k,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk,\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}},}

is equivalent to the discrete extended Euler–Lagrange equations,

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),
Adfk1(TeLfkD2Ldk)\displaystyle\emph{Ad}^{*}_{f_{k}^{-1}}\left(\emph{T}_{e}^{*}\emph{L}_{f_{k}}D_{2}L_{d_{k}}\right) =TeLqkD1Ldk+τk+1τk𝔮k+1𝔮k𝔮k𝔮k1τkτk1TeLfk1D2Ldk1,\displaystyle=\emph{T}_{e}^{*}\emph{L}_{q_{k}}D_{1}L_{d_{k}}+\frac{\tau_{k+1}-\tau_{k}}{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\emph{T}_{e}^{*}\emph{L}_{f_{k-1}}D_{2}L_{d_{k-1}},
[D3Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk1τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]\displaystyle\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]
+[D4Ldkλk11τkτk1]𝔮k𝔮k1τkτk1+1τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]=0,\displaystyle\qquad+\left[D_{4}L_{d_{k}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,

where LdkL_{d_{k}} denotes Ld(qk,fk,𝔮k,𝔮k+1)L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}).

Proof.

We use the notation Ldk=Ld(qk,fk,𝔮k,𝔮k+1)L_{d_{k}}=L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1}) and we will use the fact that δq0=δqN=δ𝔮0=δ𝔮N=η0=ηN=0\delta q_{0}=\delta q_{N}=\delta\mathfrak{q}_{0}=\delta\mathfrak{q}_{N}=\eta_{0}=\eta_{N}=0 throughout the proof. We have

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)\displaystyle\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right) =δ(k=0N1[Ld(qk,fk,𝔮k,𝔮k+1)λk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk)\displaystyle=\delta\left(\sum_{k=0}^{N-1}{\left[L_{d}(q_{k},f_{k},\mathfrak{q}_{k},\mathfrak{q}_{k+1})-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}}\right)
=k=1N1[D3Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τkδ𝔮kk=1N11τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k\displaystyle=\sum_{k=1}^{N-1}{\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}
+k=0N2[D4Ldkλk1τk+1τk]𝔮k+1𝔮kτk+1τkδ𝔮k+1+k=0N21τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k+1\displaystyle\qquad\qquad+\sum_{k=0}^{N-2}{\left[D_{4}L_{d_{k}}-\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k+1}}+\sum_{k=0}^{N-2}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k+1}}
+k=1N1𝔮k+1𝔮kτk+1τkD1Ldkδqk+k=0N1𝔮k+1𝔮kτk+1τkD2Ldkδfk+k=0N1𝔮k+1𝔮kτk+1τk(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk.\displaystyle\qquad\qquad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{1}L_{d_{k}}\delta q_{k}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}D_{2}L_{d_{k}}\delta f_{k}}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.

We can write δgk\delta g_{k} as δgk=gkηk\delta g_{k}=g_{k}\eta_{k} for some ηk𝔤\eta_{k}\in\mathfrak{g}. Then, taking the variation of the discrete kinematics equation qk+1=qkfkq_{k+1}=q_{k}f_{k} gives the equation δqk+1=δqkfk+qkδfk\delta q_{k+1}=\delta q_{k}f_{k}+q_{k}\delta f_{k} and fk=qk1qk+1f_{k}=q_{k}^{-1}q_{k+1}. Therefore,

δfk\displaystyle\delta f_{k} =qk1δqk+1qk1δqkfk=qk1qk+1ηk+1qk1qkηkfk=fkηk+1ηkfk,\displaystyle=q_{k}^{-1}\delta q_{k+1}-q_{k}^{-1}\delta q_{k}f_{k}=q_{k}^{-1}q_{k+1}\eta_{k+1}-q_{k}^{-1}q_{k}\eta_{k}f_{k}=f_{k}\eta_{k+1}-\eta_{k}f_{k},

so

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)\displaystyle\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right) =k=1N1[D3Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τkδ𝔮kk=1N11τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k\displaystyle=\sum_{k=1}^{N-1}{\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}
+k=1N1[(D4Ldk1λk11τkτk1)𝔮k𝔮k1τkτk1+1τkτk1(Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1))]δ𝔮k\displaystyle\qquad\quad+\sum_{k=1}^{N-1}{\left[\left(D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right)\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left(L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right)\right]\delta\mathfrak{q}_{k}}
+k=1N1𝔮k+1𝔮kτk+1τk(TeLqkD1Ldkηk)+k=0N1𝔮k+1𝔮kτk+1τk(TeLfkD2Ldk[ηk+1fk1ηkfk])\displaystyle\qquad\qquad\quad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d_{k}}\bullet\eta_{k}\right)}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d_{k}}\bullet\left[\eta_{k+1}-f_{k}^{-1}\eta_{k}f_{k}\right]\right)}
+k=0N1𝔮k+1𝔮kτk+1τk(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk.\displaystyle\qquad\qquad\qquad+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}.

Then,

δ𝔖¯d({(qk,𝔮k,λk)}k=0N)\displaystyle\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right) =k=1N1[D3Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τkδ𝔮kk=1N11τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]δ𝔮k\displaystyle=\sum_{k=1}^{N-1}{\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\delta\mathfrak{q}_{k}}-\sum_{k=1}^{N-1}{\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]\delta\mathfrak{q}_{k}}
+k=1N1[(D4Ldk1λk11τkτk1)𝔮k𝔮k1τkτk1+1τkτk1(Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1))]δ𝔮k\displaystyle\qquad+\sum_{k=1}^{N-1}{\left[\left(D_{4}L_{d_{k-1}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right)\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left(L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right)\right]\delta\mathfrak{q}_{k}}
+k=1N1𝔮k+1𝔮kτk+1τk(TeLqkD1Ldkηk)+k=0N1𝔮k+1𝔮kτk+1τk(g(𝔮k)𝔮k+1𝔮kτk+1τk)δλk\displaystyle\qquad\quad+\sum_{k=1}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{*}\text{L}_{q_{k}}D_{1}L_{d_{k}}\bullet\eta_{k}\right)}+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(g(\mathfrak{q}_{k})-\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\right)\delta\lambda_{k}}
+k=0N1𝔮k𝔮k1τkτk1(TeLfk1D2Ldk1ηk)k=0N1𝔮k+1𝔮kτk+1τk(TeLfkD2LdkAdfk1ηk).\displaystyle\qquad\qquad+\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\left(\text{T}_{e}^{*}\text{L}_{f_{k-1}}D_{2}L_{d_{k-1}}\bullet\eta_{k}\right)}-\sum_{k=0}^{N-1}{\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}\left(\text{T}_{e}^{*}\text{L}_{f_{k}}D_{2}L_{d_{k}}\bullet\text{Ad}_{f_{k}^{-1}}\eta_{k}\right)}.

As a consequence, if

𝔮k+1=𝔮k+(τk+1τk)g(𝔮k),\mathfrak{q}_{k+1}=\mathfrak{q}_{k}+(\tau_{k+1}-\tau_{k})g(\mathfrak{q}_{k}),
Adfk1(TeLfkD2Ldk)\displaystyle\emph{Ad}^{*}_{f_{k}^{-1}}\left(\emph{T}_{e}^{*}\emph{L}_{f_{k}}D_{2}L_{d_{k}}\right) =TeLqkD1Ldk+τk+1τk𝔮k+1𝔮k𝔮k𝔮k1τkτk1TeLfk1D2Ldk1,\displaystyle=\emph{T}_{e}^{*}\emph{L}_{q_{k}}D_{1}L_{d_{k}}+\frac{\tau_{k+1}-\tau_{k}}{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}\emph{T}_{e}^{*}\emph{L}_{f_{k-1}}D_{2}L_{d_{k-1}},
[D3Ldk+λk1τk+1τk+λkg(𝔮k)]𝔮k+1𝔮kτk+1τk1τk+1τk[Ldkλk𝔮k+1𝔮kτk+1τk+λkg(𝔮k)]\displaystyle\left[D_{3}L_{d_{k}}+\lambda_{k}\frac{1}{\tau_{k+1}-\tau_{k}}+\lambda_{k}\nabla g(\mathfrak{q}_{k})\right]\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}-\frac{1}{\tau_{k+1}-\tau_{k}}\left[L_{d_{k}}-\lambda_{k}\frac{\mathfrak{q}_{k+1}-\mathfrak{q}_{k}}{\tau_{k+1}-\tau_{k}}+\lambda_{k}g(\mathfrak{q}_{k})\right]
+[D4Ldkλk11τkτk1]𝔮k𝔮k1τkτk1+1τkτk1[Ldk1λk1𝔮k𝔮k1τkτk1+λk1g(𝔮k1)]=0,\displaystyle\qquad+\left[D_{4}L_{d_{k}}-\lambda_{k-1}\frac{1}{\tau_{k}-\tau_{k-1}}\right]\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\frac{1}{\tau_{k}-\tau_{k-1}}\left[L_{d_{k-1}}-\lambda_{k-1}\frac{\mathfrak{q}_{k}-\mathfrak{q}_{k-1}}{\tau_{k}-\tau_{k-1}}+\lambda_{k-1}g(\mathfrak{q}_{k-1})\right]=0,

then δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0. Conversely, if δ𝔖¯d({(qk,𝔮k,λk)}k=0N)=0\delta\bar{\mathfrak{S}}_{d}\left(\{(q_{k},\mathfrak{q}_{k},\lambda_{k})\}_{k=0}^{N}\right)=0, then a discrete fundamental theorem of the calculus of variations yields the above equations. ∎

The authors have no competing interests to declare that are relevant to the content of this article.

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

References

  • Absil et al. [2008] P. A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008. ISBN 978-0-691-13298-3.
  • Alimisis et al. [2020] F. Alimisis, A. Orvieto, G. Bécigneul, and A. Lucchi. A continuous-time perspective for modeling acceleration in Riemannian optimization. In Proceedings of the 23rd International AISTATS Conference, volume 108 of PMLR, pages 1297–1307, 2020.
  • Betancourt et al. [2018] M. Betancourt, M. I. Jordan, and A. Wilson. On symplectic optimization. 2018.
  • Blanes et al. [2008] S. Blanes, F. Casas, J. A. Oteo, and J. Ros. The Magnus expansion and some of its applications. Physics Reports, 470:151–238, 11 2008. doi: 10.1016/j.physrep.2008.11.001.
  • Bou-Rabee and Marsden [2009] N. Bou-Rabee and J. E. Marsden. Hamilton–Pontryagin integrators on Lie groups part I: Introduction and structure-preserving properties. Foundations of Computational Mathematics, 9(2):197–219, 2009.
  • Boumal [2020] N. Boumal. An introduction to optimization on smooth manifolds, 2020. URL http://www.nicolasboumal.net/book.
  • Calvo and Sanz-Serna [1993] J. P. Calvo and J. M. Sanz-Serna. The development of variable-step symplectic integrators, with application to the two-body problem. SIAM J. Sci. Comp., 14(4):936–952, 1993.
  • Campos et al. [2021] C. M. Campos, A. Mahillo, and D. Martín de Diego. A discrete variational derivation of accelerated methods in optimization. 2021. URL https://arxiv.org/abs/2106.02700.
  • Casas and Owren [2003] F. Casas and B. Owren. Cost efficient Lie group integrators in the RKMK class. BIT, 43:723–742, 11 2003. doi: 10.1023/B:BITN.0000009959.29287.d4.
  • Celledoni et al. [2003] E. Celledoni, A. Marthinsen, and B. Owren. Commutator-free Lie group methods. Future Gener. Comput. Syst., 19:341–352, 2003.
  • Celledoni et al. [2014] E. Celledoni, H. Marthinsen, and B. Owren. An introduction to Lie group integrators – basics, new developments and applications. Journal of Computational Physics, 257:1040–1061, 2014. ISSN 0021-9991. doi: 10.1016/j.jcp.2012.12.031.
  • Celledoni et al. [2022] E. Celledoni, E. Çokaj, A. Leone, D. Murari, and B. Owren. Lie group integrators for mechanical systems. International Journal of Computer Mathematics, 99(1):58–88, 2022. doi: 10.1080/00207160.2021.1966772.
  • Christiansen et al. [2011] S. H. Christiansen, H. Z. Munthe-Kaas, and B. Owren. Topics in structure-preserving discretization. Acta Numerica, 20:1–119, 2011. doi: 10.1017/S096249291100002X.
  • Crouch and Grossman [1993] P. E. Crouch and R. L. Grossman. Numerical integration of ordinary differential equations on manifolds. Journal of Nonlinear Science, 3:1–33, 1993.
  • Duchi et al. [2011] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(61):2121–2159, 2011.
  • Duruisseaux and Leok [2022a] V. Duruisseaux and M. Leok. Accelerated optimization on Riemannian manifolds via discrete constrained variational integrators. Journal of Nonlinear Science, 32(42), 2022a. URL https://doi.org/10.1007/s00332-022-09795-9.
  • Duruisseaux and Leok [2022b] V. Duruisseaux and M. Leok. Practical perspectives on symplectic accelerated optimization. 2022b. URL https://arxiv.org/abs/2207.1146.
  • Duruisseaux and Leok [2022c] V. Duruisseaux and M. Leok. Accelerated optimization on Riemannian manifolds via projected variational integrators. 2022c. URL https://arxiv.org/abs/2201.02904.
  • Duruisseaux and Leok [2022d] V. Duruisseaux and M. Leok. A variational formulation of accelerated optimization on Riemannian manifolds. SIAM Journal on Mathematics of Data Science, 4(2):649–674, 2022d. URL https://doi.org/10.1137/21M1395648.
  • Duruisseaux et al. [2021] V. Duruisseaux, J. Schmitt, and M. Leok. Adaptive Hamiltonian variational integrators and applications to symplectic accelerated optimization. SIAM Journal on Scientific Computing, 43(4):A2949–A2980, 2021. URL https://doi.org/10.1137/20M1383835.
  • [21] G. França, M. I. Jordan, and R. Vidal. On dissipative symplectic integration with applications to gradient-based optimization. Journal of Statistical Mechanics: Theory and Experiment, 2021(4):043402. doi: 10.1088/1742-5468/abf5d4.
  • França et al. [2021] G. França, A. Barp, M. Girolami, and M. I. Jordan. Optimization on manifolds: A symplectic approach, 07 2021.
  • Gallier and Quaintance [2020] J. Gallier and J. Quaintance. Differential Geometry and Lie Groups: A Computational Perspective. Geometry and Computing. Springer International Publishing, 2020. ISBN 9783030460402.
  • Gladman et al. [1991] B. Gladman, M. Duncan, and J. Candy. Symplectic integrators for long-time integrations in celestial mechanics. Celestial Mech. Dynamical Astronomy, 52:221–240, 1991.
  • Hairer [1997] E. Hairer. Variable time step integration with symplectic methods. Applied Numerical Mathematics, 25(2-3):219–227, 1997.
  • Hairer et al. [2006] E. Hairer, C. Lubich, and G. Wanner. Geometric Numerical Integration, volume 31 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin, second edition, 2006.
  • Hall and Leok [2015] J. Hall and M. Leok. Spectral Variational Integrators. Numer. Math., 130(4):681–740, 2015.
  • Hussein et al. [2006] I. I. Hussein, M. Leok, A. K. Sanyal, and A. M. Bloch. A discrete variational integrator for optimal control problems on SO(3). Proceedings of the 45th IEEE Conference on Decision and Control, pages 6636–6641, 2006.
  • Iserles and Nørsett [1999] A. Iserles and S. P. Nørsett. On the solution of linear differential equations in Lie groups. Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 357(1754):983–1019, 1999. ISSN 1364503X.
  • Iserles et al. [2000] A. Iserles, H. Z. Munthe-Kaas, S. P. Nørsett, and A. Zanna. Lie group methods. In Acta Numerica, volume 9, pages 215–365. Cambridge University Press, 2000.
  • [31] M. I. Jordan. Dynamical, symplectic and stochastic perspectives on gradient-based optimization. In Proceedings of the International Congress of Mathematicians (ICM 2018), pages 523–549. doi: 10.1142/9789813272880˙0022.
  • Jost [2017] J. Jost. Riemannian Geometry and Geometric Analysis. Universitext. Springer, Cham, 7th edition, 2017.
  • Kingma and Ba [2014] D. Kingma and J. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 12 2014.
  • Lall and West [2006] S. Lall and M. West. Discrete variational Hamiltonian mechanics. J. Phys. A, 39(19):5509–5519, 2006.
  • Lee [2018] J. M. Lee. Introduction to Riemannian Manifolds, volume 176 of Graduate Texts in Mathematics. Springer, Cham, second edition, 2018.
  • Lee [2008] T. Lee. Computational geometric mechanics and control of rigid bodies. Ph.D. dissertation, University of Michigan, 2008.
  • [37] T. Lee, N. H. McClamroch, and M. Leok. A Lie group variational integrator for the attitude dynamics of a rigid body with applications to the 3d pendulum. Proceedings of 2005 IEEE Conference on Control Applications. (CCA 2005), pages 962–967.
  • Lee et al. [2007a] T. Lee, M. Leok, and N. H. McClamroch. Lie group variational integrators for the full body problem. Comput. Methods Appl. Mech. Engrg., 196(29-30):2907–2924, 2007a.
  • Lee et al. [2007b] T. Lee, M. Leok, and N. H. McClamroch. Lie group variational integrators for the full body problem in orbital mechanics. Celestial Mechanics and Dynamical Astronomy, 98(2):121–144, 2007b.
  • Lee et al. [2009] T. Lee, M. Leok, and N. H. McClamroch. Lagrangian mechanics and variational integrators on two-spheres. International Journal for Numerical Methods in Engineering, 79(9):1147–1174, 2009.
  • Lee et al. [2017] T. Lee, M. Leok, and N. H. McClamroch. Global Formulations of Lagrangian and Hamiltonian Dynamics on Manifolds: A Geometric Approach to Modeling and Analysis. Interaction of Mechanics and Mathematics. Springer International Publishing, 2017. ISBN 9783319569536.
  • Lee et al. [2021] T. Lee, M. Tao, and M. Leok. Variational symplectic accelerated optimization on Lie groups. 2021.
  • Leok [2004] M. Leok. Generalized Galerkin variational integrators: Lie group, multiscale, and pseudospectral methods. (preprint, arXiv:math.NA/0508360), 2004.
  • Leok and Ohsawa [2011] M. Leok and T. Ohsawa. Variational and geometric structures of discrete Dirac mechanics. Found. Comput. Math., 11(5):529--562, 2011.
  • Leok and Shingel [2012] M. Leok and T. Shingel. Prolongation-collocation variational integrators. IMA J. Numer. Anal., 32(3):1194--1216, 2012.
  • Leok and Zhang [2011] M. Leok and J. Zhang. Discrete Hamiltonian variational integrators. IMA Journal of Numerical Analysis, 31(4):1497--1532, 2011.
  • Lewis and Simo [1994] D. Lewis and J. C. Simo. Conserving algorithms for the dynamics of Hamiltonian systems on Lie groups. Journal of Nonlinear Science, 4(1):253--299, December 1994. doi: 10.1007/BF02430634.
  • Marsden and Ratiu [1999] J. E. Marsden and T. S. Ratiu. Introduction to mechanics and symmetry, volume 17 of Texts in Applied Mathematics. Springer-Verlag, New York, second edition, 1999.
  • Marsden and West [2001] J. E. Marsden and M. West. Discrete mechanics and variational integrators. Acta Numer., 10:357--514, 2001.
  • Munthe-Kaas [1995] H. Z. Munthe-Kaas. Lie-Butcher theory for Runge-Kutta methods. BIT, 35:572--587, 1995.
  • Munthe-Kaas [1998] H. Z. Munthe-Kaas. Runge-Kutta methods on Lie groups. BIT Numerical Mathematics, 38:92--111, 1998.
  • Munthe-Kaas [1999] H. Z. Munthe-Kaas. High order Runge-Kutta methods on manifolds. Applied Numerical Mathematics, 29:115--127, 1999.
  • Nash [1956] J. Nash. The imbedding problem for Riemannian manifolds. Annals of Mathematics, 63(1):20--63, 1956. ISSN 0003486X.
  • Nesterov [1983] Y. Nesterov. A method of solving a convex programming problem with convergence rate 𝒪(1/k2)\mathcal{O}(1/k^{2}). Soviet Mathematics Doklady, 27(2):372--376, 1983.
  • Nesterov [2004] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course, volume 87 of Applied Optimization. Kluwer Academic Publishers, Boston, MA, 2004.
  • Nordkvist and Sanyal [2010] N. Nordkvist and A. K. Sanyal. A Lie group variational integrator for rigid body motion in SE(3) with applications to underwater vehicle dynamics. In 49th IEEE Conference on Decision and Control (CDC), pages 5414--5419, 2010. doi: 10.1109/CDC.2010.5717622.
  • Schmitt and Leok [2017] J. M. Schmitt and M. Leok. Properties of Hamiltonian variational integrators. IMA Journal of Numerical Analysis, 38(1):377--398, 03 2017.
  • Schmitt et al. [2018] J. M. Schmitt, T. Shingel, and M. Leok. Lagrangian and Hamiltonian Taylor variational integrators. BIT Numerical Mathematics, 58:457--488, 2018. doi: 10.1007/s10543-017-0690-9.
  • Su et al. [2016] W. Su, S. Boyd, and E. Candes. A differential equation for modeling Nesterov’s Accelerated Gradient method: theory and insights. Journal of Machine Learning Research, 17(153):1--43, 2016.
  • Tao and Ohsawa [2020] M. Tao and T. Ohsawa. Variational optimization on Lie groups, with examples of leading (generalized) eigenvalue problems. In Proceedings of the 23rd International AISTATS Conference, volume 108 of PMLR, 2020.
  • Tieleman and Hinton [2012] T. Tieleman and G. Hinton. Coursera: Neural Networks for Machine Learning - Lecture 6.5: RMSprop. 2012.
  • Wahba [1965] G. Wahba. A Least Squares Estimate of Satellite Attitude. SIAM Review, 7(3):409, July 1965. doi: 10.1137/1007077.
  • Whitney [1944a] H. Whitney. The singularities of a smooth nn-manifold in (2n1)(2n-1)-space. Annals of Mathematics, 45(2):247--293, 1944a. ISSN 0003486X.
  • Whitney [1944b] H. Whitney. The self-intersections of a smooth nn-manifold in 2n2n-space. Annals of Mathematics, 45(2):220--246, 1944b. ISSN 0003486X.
  • Wibisono et al. [2016] A. Wibisono, A. Wilson, and M. Jordan. A variational perspective on accelerated methods in optimization. Proceedings of the National Academy of Sciences, 113(47):E7351--E7358, 2016.
  • Yoshimura and Marsden [2006a] H. Yoshimura and J. E. Marsden. Dirac structures in Lagrangian mechanics Part I: Implicit Lagrangian systems. Journal of Geometry and Physics, 57(1):133--156, 2006a.
  • Yoshimura and Marsden [2006b] H. Yoshimura and J. E. Marsden. Dirac structures in Lagrangian mechanics Part II: Variational structures. Journal of Geometry and Physics, 57(1):209--250, 2006b.
  • [68] A. Zanna. Collocation and relaxed collocation for the Fer and the Magnus expansions. SIAM Journal on Numerical Analysis, 36(4):1145--1182. ISSN 00361429.
  • Zare and Szebehely [1975] K. Zare and V. G. Szebehely. Time transformations in the extended phase-space. Celestial mechanics, 11:469--482, 1975.