
Dynamics on Lie groups with applications to attitude estimation

T. Forrest Kieffer (Mathematician, Weapon Control Concepts Development Group, Air and Missile Defense Sector; [email protected]; Corresponding Author) and Michael L. Wall (Chief Scientist, Weapon Control Concepts Development Group, Air and Missile Defense Sector), Johns Hopkins University Applied Physics Laboratory, Laurel, MD 20723. Alphabetical author order; both authors contributed equally to this work.
Abstract

The problem of filtering – propagation of states through stochastic differential equations (SDEs) and association of measurement data using Bayesian inference – in a state space which forms a Lie group is considered. Particular emphasis is given to concentrated Gaussians (CGs) as a parametric family of probability distributions to capture the uncertainty associated with an estimated state. The so-called group-affine property of the state evolution is shown to be necessary and sufficient for linearity of the dynamics on the associated Lie algebra, in turn implying CGs are invariant under such evolution. A putative SDE on the group is then reformulated as an SDE on the associated Lie algebra. The vector space structure of the Lie algebra together with the notion of a CG enables the leveraging of techniques from conventional Gaussian-based Kalman filtering in an approach called the tangent space filter (TSF). We provide example calculations for several Lie groups that arise in the problem of estimating position, velocity, and orientation of a rigid body from a noisy, potentially biased inertial measurement unit (IMU). For the specific problem of attitude estimation, numerical experiments demonstrate that TSF-based approaches are more accurate and robust than another widely used attitude filtering technique.

Nomenclature

$\mathbb{Z}$ = $\{0,\pm 1,\pm 2,\dots\}$, the set of integers.
$\mathbb{R}$ = the set of real numbers.
$\mathbb{R}^{n}$ = the set of $n$-dimensional vectors with real entries.
$\mathbb{R}^{n\times m}$ = the set of $n\times m$ matrices with real entries.
$I_{n}$ = the $n\times n$ identity matrix.
$\langle\cdot,\cdot\rangle$ = the standard Euclidean inner product on $\mathbb{R}^{n}$.
$[\cdot]_{\times}$ = the cross-product matrix, satisfying $[a]_{\times}b=a\times b$ for $a,b\in\mathbb{R}^{3}$.
$[A,B]$ = $AB-BA$, the commutator of $A,B\in\mathbb{R}^{n\times n}$.
$\mathbb{S}^{3}$ = $\{q\in\mathbb{R}^{4}:\|q\|=1\}$, the $3$-sphere.
$\mathrm{GL}_{n}$ = $\{A\in\mathbb{R}^{n\times n}:\det A\neq 0\}$, the general linear group in $n$ dimensions.
$\mathrm{SO}_{n}$ = $\{A\in\mathbb{R}^{n\times n}:AA^{T}=A^{T}A=I_{n},~\det A=1\}$, the special orthogonal group in $n$ dimensions.
$\mathfrak{so}_{n}$ = $\{A\in\mathbb{R}^{n\times n}:A^{T}=-A\}$, the Lie algebra of $\mathrm{SO}_{n}$.
$\mathrm{SE}_{n}$ = $\mathrm{SO}_{n}\ltimes\mathbb{R}^{n}$, the special Euclidean group in $n$ dimensions.
$\mathfrak{se}_{n}$ = the Lie algebra of $\mathrm{SE}_{n}$.
$G$ = a matrix Lie group (i.e., a closed subgroup of $\mathrm{GL}_{d}$ for some $d>0$).
$\mathfrak{g}$ = the Lie algebra associated with $G$, often denoted $\mathrm{Lie}(G)$.
$\mathcal{V}_{\mathfrak{g}}$ = the "vectorization" map identifying the Lie algebra $\mathfrak{g}$ with $\mathbb{R}^{\dim\mathfrak{g}}$ as $X\in\mathfrak{g}\mapsto\mathcal{V}_{\mathfrak{g}}(X)\in\mathbb{R}^{\dim\mathfrak{g}}$.
$\mathcal{M}_{\mathfrak{g}}$ = the "matrization" map identifying $\mathbb{R}^{\dim\mathfrak{g}}$ with $\mathfrak{g}$, i.e., the inverse of $\mathcal{V}_{\mathfrak{g}}$.
$\mathrm{ad}_{X}$ = the adjoint action on $\mathfrak{g}$, given by $\mathrm{ad}_{X}Y=[X,Y]$ for $X,Y\in\mathfrak{g}$.
$J_{\mathfrak{g}}(X)$ = $\int_{0}^{1}e^{-s\,\mathrm{ad}_{X}}\,\mathrm{d}s$, for $X\in\mathfrak{g}$, the Jacobian of the exponential map.
$W_{\mathfrak{g}}(X)$ = $J_{\mathfrak{g}}^{-1}(-X)$, for $X\in\mathfrak{g}$.
$\overline{\mathrm{ad}}_{\xi_{1}}\xi_{2}$ = $\mathcal{V}_{\mathfrak{g}}([\mathcal{M}_{\mathfrak{g}}(\xi_{1}),\mathcal{M}_{\mathfrak{g}}(\xi_{2})])$, for $\xi_{1},\xi_{2}\in\mathbb{R}^{\dim\mathfrak{g}}$.
$\overline{J}_{\mathfrak{g}}(\xi)$ = $\int_{0}^{1}e^{-s\,\overline{\mathrm{ad}}_{\xi}}\,\mathrm{d}s$, for $\xi\in\mathbb{R}^{\dim\mathfrak{g}}$.
$\overline{W}_{\mathfrak{g}}(\xi)$ = $\overline{J}_{\mathfrak{g}}^{-1}(-\xi)$, for $\xi\in\mathbb{R}^{\dim\mathfrak{g}}$.
$\mathrm{BCH}(X_{1},X_{2})$ = $\log(e^{X_{1}}e^{X_{2}})$, for $X_{1},X_{2}\in\mathfrak{g}$.
$\mathcal{T}_{G}(\mu,\Sigma)$ = the CG distribution on $G$ with group mean $\mu\in G$ and covariance matrix $\Sigma$.
$\mathcal{N}^{n}(\mu,\Sigma)$ = the $n$-dimensional Gaussian distribution with mean $\mu\in\mathbb{R}^{n}$ and covariance $\Sigma\in\mathbb{R}^{n\times n}$.

I Introduction

The impact of the Kalman filter (KF) in statistical estimation and control theory is difficult to overstate. The KF enables the calculation of the joint probability distribution of a vector of state variables $\mathbf{x}$ from a collection of noisy measurements supplemented with linear models of the state vector dynamics and measurements kalman1960new ; kalman1961new . The key assumptions required for the KF to be optimal – that the dynamics and measurements are linear in the state space variables, that the noise appearing in the dynamics and measurements is additive and white, and that the state vector distribution is initially Gaussian – are physically reasonable or can be made to hold approximately with good accuracy in many cases. This has led to the KF being used ubiquitously in guidance, navigation, and control grewal2007global , in robotics, in space and defense bar2004estimation ; ristic2004beyond , and in myriad other applications. Where the linearity assumption breaks down, the extended Kalman filter (EKF), in which the nonlinear models are linearized about the current state estimate, or the unscented Kalman filter (UKF), in which a collection of nonlinear model outcomes is used to estimate state vector statistics julier1997new , are widely employed, often to good accuracy. Some exact results are known for distributions other than Gaussians, though these usually require specific dynamical and measurement models to ensure that the resulting filter process leaves the form of the non-Gaussian distribution invariant and only updates its parameters; the work of Benes̆ benevs1981exact and Daum daum2005nonlinear is paradigmatic of this approach. At the extreme of nonlinear filtering are particle filters gordon1993novel , which represent the filtered distribution as a weighted collection of delta functions ristic2004beyond and so place no constraints on the dynamical or measurement models or the underlying state vector distribution. However, particle filters are limited by a sampling error that can be slow to converge, especially for large state space dimension daum2003curse , which can pose challenges for stability and efficient implementation.

A more subtle setting for nonlinear filtering is estimation of a state that is subject to a nonlinear constraint. Here, several complications can occur. Linearization of the constraint may be inaccurate across the support of the state vector distribution, so the constraint is not maintained. The constraint also reduces the effective dimensionality of the state vector, which introduces singularities into the associated covariance matrix that are known to cause issues for filtering. In such cases, a modification of the distribution is required to avoid non-physical predictions, but such a modification will likely be at odds with the Gaussian assumptions required for the standard KF.

The present work investigates a special case of this situation in which the constraint is formulated as a requirement that the state vector lives on a Lie group, which is a mathematical space that is both a smooth manifold and a group. In this setting, the filtering problem may be translated from the Lie group to its associated tangent spaces (i.e., its Lie algebra), which enables the constraint to be maintained globally through the exponential map connecting the tangent spaces with the Lie group, allows for accurate linearization due to the "flat nature" of the tangent spaces, and enables an unconstrained, singularity-free representation of the state and covariance.

Before presenting our general theoretical results, we introduce the motivating application of attitude estimation from noisy, potentially biased gyro measurements in Section II. In addition to motivating our research, this allows for concrete example calculations alongside the theory. Our theoretical exposition starts with the most general setting in Section III: an arbitrary matrix Lie group $G$ with Lie algebra $\mathfrak{g}$, endowed with an evolution equation of the form $\dot{g}=f(g)$ for some vector field $f:G\rightarrow\mathfrak{g}$. We then address the question of necessary and sufficient conditions on $f$ under which the corresponding dynamics on $\mathfrak{g}$ are linear. Leveraging the results of barrau2019linear , we show that $f$ must satisfy the so-called group-affine property, and we go beyond prior work to give a complete characterization of group-affine vector fields (see Theorem 1).

Section IV formally introduces the class of CGs on a matrix Lie group. The main result here is Corollary 1, which makes precise the claim that the class of CGs is invariant under the flow generated by a group-affine vector field. This Corollary provides the theoretical foundation for one of the main contentions in this work, namely that CGs are the most natural choice of probability distributions to filter with on a Lie group. Particular emphasis is then given to choosing a group law on the underlying state space which makes the state space dynamics in question group-affine. Section V considers an application-motivated stochastic generalization of group-affine dynamics and derives the corresponding SDE on $\mathfrak{g}$. This SDE on $\mathfrak{g}$ will be referred to as the tangent space SDE (TSSDE) and is, in general, an SDE with state-dependent diffusion. The section ends with a discussion of the invariance of CGs in the presence of noise.

The theoretical results of Sections III, IV, and V motivate our Lie group filtering methodology, detailed in Section VI. After introducing a discrete-time observation process, we design an (approximate) nonlinear filtering algorithm, called the TSF, that is based on the unscented transform (UT). Key novelties of this algorithm are the use of the continuous-time unscented transform (CTUT) sarkka2007unscented (generalized to state-dependent diffusion) to propagate moments through the TSSDE and a whitening procedure based on the Baker–Campbell–Hausdorff (BCH) formula. We also suggest how the TSF may be made more computationally efficient by considering a short-time expansion of the Fokker–Planck equation (FPE) associated with the TSSDE and deriving moment evolution equations from it to replace the CTUT for moment propagation.

In Section VII, three application-relevant examples are worked out in detail. These correspond to the well-known group $\mathrm{SU}_{2}$, the special Euclidean group $\mathrm{SE}_{3}$, and the group $\mathrm{SE}_{2,3}$, whose underlying state space is $\mathrm{SO}_{3}\times\mathbb{R}^{6}$ and which is a simple generalization of $\mathrm{SE}_{3}$. The $\mathrm{SU}_{2}$ example is motivated by the problem of attitude filtering under the paradigm of dynamic model replacement (DMR) without gyro bias, while the $\mathrm{SE}_{2,3}$ example is motivated by the inertial navigation system (INS) state estimation problem without IMU biases. The group $\mathrm{SE}_{3}$ provides a bridge between these two examples. The natural dynamics for the $\mathrm{SU}_{2}$ problem falls into the class of constant Lie group dynamics, which is the simplest to analyze. On the other hand, the natural dynamics motivated by the INS problem are not constant, but fall into the more general class of group-affine dynamics. Emphasis is placed on how the $\mathrm{SE}_{2,3}$ group law on the underlying state space $\mathrm{SO}_{3}\times\mathbb{R}^{6}$ is the choice which results in group-affine dynamics (as compared to, e.g., the direct product (DP) group law). Additional insights regarding the fundamental solution of the FPE associated with the $\mathrm{SU}_{2}$ problem are provided at the end of Section VII.1.

A fourth example, covered in Section VIII, is the Lie group whose underlying manifold is $\mathrm{SO}_{3}\times\mathbb{R}^{3}$. This group is motivated by the problem of attitude estimation when the gyro bias is to be estimated as well. This example highlights the interplay between the equations of motion (i.e., the dynamics), which are motivated by physics, and the freedom to choose a group structure which makes the associated vector field group-affine. In this fourth example, we argue that there is no fixed group structure on $\mathrm{SO}_{3}\times\mathbb{R}^{3}$ which makes the attitude dynamics with gyro bias group-affine. Instead, a time-parameterized family of group laws must be considered to yield a vector field which is almost group-affine. This family of group laws is approximately $\mathrm{SE}_{3}$ to zeroth order in time. (This observation is consistent with the proposal to use the $\mathrm{SE}_{3}$ group law for treating gyro bias in, e.g., barrau2022geometry .) The section ends with a numerical experiment that compares the performance of the TSF using two different group laws on $\mathrm{SO}_{3}\times\mathbb{R}^{3}$, namely that induced by the DP and that of a semidirect product (i.e., $\mathrm{SE}_{3}$), to the Unscented Quaternion Estimator (USQUE) Crassidis_UKF . It is shown that the TSF using the $\mathrm{SE}_{3}$ group law exhibits near-perfect filter consistency, far outperforming the USQUE and the TSF using the DP group law (Figure 1).

II Attitude descriptions and estimation

In attitude filtering, one is concerned with estimating an element of the non-abelian Lie group of $3$-dimensional spatial rotations, denoted $\mathrm{SO}_{3}:=\{R\in\mathbb{R}^{3\times 3}:RR^{T}=I_{3},~\det R=1\}$. Specifically, a matrix $R\in\mathrm{SO}_{3}$ describes a rotation from one reference frame (e.g., the body frame of a rigid body) to another reference frame (e.g., an inertial or Earth-fixed frame). Since $\mathrm{SO}_{3}$ is a $3$-dimensional manifold, one may use, at least locally, a set of $3$ real parameters (e.g., Euler angles) to describe a matrix $R\in\mathrm{SO}_{3}$. However, since $\mathrm{SO}_{3}$ is not homeomorphic to $\mathbb{R}^{3}$, a single set of such parameters will necessarily have a singularity at some point (e.g., Euler angles have a singularity that causes the problem of gimbal lock).

The standard approach to avoiding coordinate singularities is to work with a higher-dimensional parameterization of rotations, such as the unit quaternions. A quaternion may be thought of as a vector $q\in\mathbb{R}^{4}$ which is partitioned as $q=(v~s)$, where $v=(v_{1}~v_{2}~v_{3})\in\mathbb{R}^{3}$ is the "vector" part and $s\in\mathbb{R}$ is the "scalar" part. A unit quaternion $q$ has $\|q\|=1$, where $\|\cdot\|$ indicates the standard Euclidean length in $4$ dimensions. The set of all unit quaternions is the $3$-sphere, denoted $\mathbb{S}^{3}:=\{q\in\mathbb{R}^{4}:\|q\|=1\}$. Given a unit quaternion $q\in\mathbb{S}^{3}$, there is a map $R:\mathbb{S}^{3}\rightarrow\mathrm{SO}_{3}$ that associates $q$ with a rotation $R(q)$ between two specified reference frames. Explicitly,

$$R(q)=I_{3}-2s[v]_{\times}+2[v]_{\times}^{2},\qquad(1)$$

where $[v]_{\times}\in\mathbb{R}^{3\times 3}$ is the cross-product matrix, i.e., the matrix satisfying $[v]_{\times}w=v\times w$ for all $w\in\mathbb{R}^{3}$. Note that $R(q)R^{T}(q)=I_{3}$ and $\det R(q)=1$ are consequences of the constraint $\|q\|=1$.

One may endow $\mathbb{S}^{3}$ with a multiplication operation $\otimes$ such that the multiplication of two unit quaternions describes the composition of their individual rotations, i.e., $R(q\otimes q^{\prime})=R(q)R(q^{\prime})$. (Note that we are using M. Shuster's convention for quaternion multiplication shuster1993survey . This choice is made so that the map $R$ in (1) is a (Lie) group homomorphism $\mathbb{S}^{3}\rightarrow\mathrm{SO}_{3}$. The reader should be aware that another convention is in use in the literature, and which convention is being used should always be verified for consistency.) It may be shown that $q\otimes q^{\prime}$ defines another unit quaternion given by

$$q\otimes q^{\prime}=\left(\begin{array}{c}sv^{\prime}+s^{\prime}v-v\times v^{\prime}\\ ss^{\prime}-\langle v,v^{\prime}\rangle\end{array}\right),\qquad(4)$$

where $\langle v,v^{\prime}\rangle=v_{1}v_{1}^{\prime}+v_{2}v_{2}^{\prime}+v_{3}v_{3}^{\prime}$ is the standard Euclidean inner product of the vectors $v,v^{\prime}\in\mathbb{R}^{3}$. This multiplication operation can be represented in matrix form as

$$\left[q\otimes\right]:=sI_{4}+\mathcal{M}_{\mathfrak{s}}(v),\qquad(5)$$

where $\mathcal{M}_{\mathfrak{s}}:\mathbb{R}^{3}\rightarrow\mathbb{R}^{4\times 4}$ produces the antisymmetric matrix

$$\mathcal{M}_{\mathfrak{s}}(v)=\left(\begin{array}{cc}-[v]_{\times}&v\\ -v^{T}&0\end{array}\right),\qquad(8)$$

in which $v$ is interpreted as a column vector and $v^{T}$ as a row vector. Under the operation $\otimes$, the inverse quaternion is given by $q^{-1}=(-v~s)$ and the identity quaternion by $(0~1)$. Hence, $\mathbb{S}^{3}$ endowed with quaternionic multiplication forms a Lie group. As a Lie group, $\mathbb{S}^{3}$ is recognized as the double cover of $\mathrm{SO}_{3}$. The term double cover refers to the fact that the map (1) satisfies $R(q)=R(-q)$; in other words, two distinct unit quaternions yield the same $\mathrm{SO}_{3}$-matrix $R$. Finally, it is worth noting that the Lie group $\mathbb{S}^{3}$ is typically identified with the special unitary group $\mathrm{SU}_{2}=\{U\in\mathbb{C}^{2\times 2}:UU^{\dagger}=I_{2},~\det U=1\}$.
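As a quick numerical sanity check of (1) and (4) — not part of the original development — the homomorphism property $R(q\otimes q^{\prime})=R(q)R(q^{\prime})$ and the inverse rule $q\otimes q^{-1}=(0~1)$ can be verified with a short NumPy sketch; the helper names `R`, `qmul`, and `cross` are ours:

```python
import numpy as np

def cross(v):
    # cross-product matrix: cross(v) @ w == np.cross(v, w)
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def R(q):
    # rotation matrix of eq. (1); q = (v, s) with v = q[:3], s = q[3]
    v, s = q[:3], q[3]
    return np.eye(3) - 2.0 * s * cross(v) + 2.0 * cross(v) @ cross(v)

def qmul(q, qp):
    # Shuster-convention quaternion product, eq. (4)
    v, s = q[:3], q[3]
    vp, sp = qp[:3], qp[3]
    return np.concatenate([s * vp + sp * v - np.cross(v, vp),
                           [s * sp - v @ vp]])

rng = np.random.default_rng(0)
q1 = rng.normal(size=4); q1 /= np.linalg.norm(q1)
q2 = rng.normal(size=4); q2 /= np.linalg.norm(q2)

# homomorphism: R(q1 (x) q2) = R(q1) R(q2)
hom_err = np.linalg.norm(R(qmul(q1, q2)) - R(q1) @ R(q2))
# inverse: q (x) q^{-1} = (0 1)
inv_err = np.linalg.norm(qmul(q1, np.concatenate([-q1[:3], [q1[3]]]))
                         - np.array([0.0, 0.0, 0.0, 1.0]))
```

Both errors should sit at machine precision; flipping the sign of the cross-product term in `qmul` (i.e., the other multiplication convention) breaks the homomorphism check, which illustrates why the convention must be verified.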

If a rigid body is subject to an angular velocity $\omega(t)\in\mathbb{R}^{3}$ relative to a chosen reference frame and $R(t)\in\mathrm{SO}_{3}$ describes its orientation relative to that frame, then $\dot{R}=[\omega]_{\times}R$, where the "dot" denotes the time derivative. This equation of motion for rotations may be recast in terms of quaternions using the inverse of the map (1), which results in

$$\left\{\begin{array}{l}\dot{q}=\dfrac{1}{2}\mathcal{M}_{\mathfrak{s}}(\omega)q\\ q(0)=q_{0}\in\mathbb{S}^{3},\end{array}\right.\qquad(11)$$

where $q_{0}$ describes the initial orientation of the rigid body. In the case of deterministic and constant $\omega(t)=\omega_{0}$, the solution is given by the matrix exponential:

$$q(t)=e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}q_{0}.\qquad(12)$$
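A brief numerical check of (12) — our illustration, assuming NumPy and SciPy's `expm` — confirms that the closed-form solution stays on $\mathbb{S}^{3}$ and satisfies the ODE (11):

```python
import numpy as np
from scipy.linalg import expm

def cross(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Ms(v):
    # the 4x4 antisymmetric matrix of eq. (8)
    M = np.zeros((4, 4))
    M[:3, :3] = -cross(v)
    M[:3, 3] = v
    M[3, :3] = -v
    return M

omega0 = np.array([0.3, -0.2, 0.5])       # constant angular rate [rad/s]
q0 = np.array([0.0, 0.0, 0.0, 1.0])       # identity quaternion
t, h = 1.7, 1e-6

qt = expm(t * Ms(omega0) / 2.0) @ q0
norm_err = abs(np.linalg.norm(qt) - 1.0)  # exp of antisymmetric is orthogonal

# central finite difference of q(t) should match (1/2) Ms(omega0) q(t)
qdot = (expm((t + h) * Ms(omega0) / 2.0) @ q0
        - expm((t - h) * Ms(omega0) / 2.0) @ q0) / (2.0 * h)
ode_err = np.linalg.norm(qdot - 0.5 * Ms(omega0) @ qt)
```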

Typically, the angular rate $\omega$ that is passed to (11) is a noisy measurement obtained from a gyroscope. A common parameterization of angular rate measurements is given by the Farrenkopf model farrenkopf1978analytic ; Crassidis_Extension . Stated as a system of coupled SDEs, the Farrenkopf model reads

$$\left\{\begin{array}{l}\omega_{m}\,\mathrm{d}t=\left(\omega+\beta\right)\mathrm{d}t+\mathrm{d}\eta\\ \mathrm{d}\beta=\mathrm{d}\zeta,\end{array}\right.\qquad(15)$$

in which $\omega_{m}$ is the measured rate, $\omega$ is the true rate, $\beta$ is the rate bias, and $\mathrm{d}\eta$ and $\mathrm{d}\zeta$ are vectors of mutually uncorrelated Wiener processes whose derivatives are delta-correlated white noises with spectral density matrices $Q_{\eta}$ and $Q_{\zeta}$, respectively. This model excludes gyro misalignment and scale factor errors, but they can be straightforwardly included.
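To make the model concrete, a discretized (Euler–Maruyama) simulation of (15) can be sketched as follows. This is our illustration, with NumPy assumed; the spectral densities, step size, and true rate are chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(42)
dt, n_steps = 0.01, 10_000                 # sample interval [s] and horizon (assumed)
Q_eta = (1e-3) ** 2 * np.eye(3)            # rate-noise spectral density (assumed)
Q_zeta = (1e-5) ** 2 * np.eye(3)           # bias-walk spectral density (assumed)
omega_true = np.array([0.1, 0.0, -0.05])   # constant true rate [rad/s] (assumed)

beta = np.zeros(3)                         # gyro bias, integrates d(zeta)
omega_m = np.empty((n_steps, 3))
for k in range(n_steps):
    d_eta = rng.multivariate_normal(np.zeros(3), Q_eta * dt)
    d_zeta = rng.multivariate_normal(np.zeros(3), Q_zeta * dt)
    omega_m[k] = omega_true + beta + d_eta / dt   # measured rate over step k
    beta = beta + d_zeta                          # random-walk bias update

# over a short horizon the measured rate averages close to the true rate
mean_err = np.linalg.norm(omega_m.mean(axis=0) - omega_true)
```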

It must be stressed that the vector $\omega_{m}$ is a measurement, and so, strictly speaking, should be estimated when utilized in a filter Crassidis_Extension . Instead, the conventional approach, known as dynamic model replacement (DMR) Crassidis_UKF , is followed. DMR uses the measurements $\omega_{m}$ directly in the model and treats the measured angular rate as constant over the propagation time step. The SDE describing the evolution of the quaternion under the Farrenkopf model is then determined by solving eqn. (15) for $\omega$ and substituting the result into eqn. (11). The result is the quaternion SDE:

$$\left\{\begin{array}{l}\dot{q}=\dfrac{1}{2}\mathcal{M}_{\mathfrak{s}}(\omega_{m}-\beta-\eta)q\\ \dot{\beta}=\zeta\\ \|q\|=1.\end{array}\right.\qquad(19)$$

As an SDE, (19) must carry the Stratonovich interpretation; otherwise, solutions may not maintain the constraint $\|q\|=1$. Eqn. (19) represents the SDE relevant for propagation in an attitude filter based on DMR using the Farrenkopf model (15).

A key question not yet addressed is: what is an appropriate family of probability distributions to describe the uncertainty associated with $q$? In this regard, let $\delta\sim\mathcal{N}^{3}(0,\Sigma)$ and let $\mu\in\mathbb{S}^{3}$ be some fixed reference quaternion. Consider the distribution over $\mathbb{S}^{3}$ induced by $q=e^{\mathcal{M}_{\mathfrak{s}}(\delta)}\mu$ (cf. (12)). The distribution associated with the random quaternion $q$ is an example of a CG on the Lie group $\mathbb{S}^{3}$ with parameters $\Sigma$ and $\mu$. With this family of distributions identified, the next question of immediate practical importance is: how do the parameters of the distribution evolve under (19)? Here, we find our first connection with the main results of our work. Namely, if $q_{0}=e^{\mathcal{M}_{\mathfrak{s}}(\delta_{0})}\mu_{0}$ for some $\delta_{0}\sim\mathcal{N}^{3}(0,\Sigma_{0})$ and $\mu_{0}\in\mathbb{S}^{3}$, then the solution (12) is of the form $q(t)=e^{\mathcal{M}_{\mathfrak{s}}(\delta(t))}\mu(t)$ for some $\delta(t)\sim\mathcal{N}^{3}(0,\Sigma(t))$ and $\mu(t)\in\mathbb{S}^{3}$. In other words, if $q_{0}$ is CG distributed, then $q(t)$ in (12) is also CG distributed (cf. Corollary 1). The remainder of this section is concerned with explicitly demonstrating this claim. First, it is necessary to collect a few more facts about $\mathbb{S}^{3}$ and its Lie algebra.

Eqn. (5) gives us a convenient faithful representation $\rho$ of the group $\mathbb{S}^{3}$ as a subgroup of $\mathrm{SO}_{4}$, the Lie group of $4$-dimensional rotations and a higher-dimensional generalization of $\mathrm{SO}_{3}$. (In what follows, we will write $\rho(\mathbb{S}^{3})$ to denote the subgroup of $\mathrm{SO}_{4}$ representing unit quaternions.) The convenience comes, in part, from the determination of the Lie algebra of $\rho(\mathbb{S}^{3})$, namely the set $\mathfrak{s}:=\{\mathcal{M}_{\mathfrak{s}}(a):a\in\mathbb{R}^{3}\}\subset\mathfrak{so}_{4}$, where $\mathfrak{so}_{4}$ is the set of all $4\times 4$ antisymmetric matrices (i.e., the Lie algebra of $\mathrm{SO}_{4}$). That $\mathfrak{s}$ is the Lie algebra of $\rho(\mathbb{S}^{3})$ follows from the identity

$$e^{\mathcal{M}_{\mathfrak{s}}(a)}=\cos(\|a\|)I_{4}+\frac{\sin(\|a\|)}{\|a\|}\mathcal{M}_{\mathfrak{s}}(a)\in\rho(\mathbb{S}^{3}),\qquad(20)$$

which follows from the Taylor series expansion of the matrix exponential and $\mathcal{M}_{\mathfrak{s}}(a)^{2}=-\|a\|^{2}I_{4}$.
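Identity (20) and the relation $\mathcal{M}_{\mathfrak{s}}(a)^{2}=-\|a\|^{2}I_{4}$ are easy to confirm numerically. The sketch below is ours (NumPy/SciPy assumed) and compares the closed form against a general-purpose matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

def cross(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Ms(v):
    # the 4x4 antisymmetric matrix of eq. (8)
    M = np.zeros((4, 4))
    M[:3, :3] = -cross(v)
    M[:3, 3] = v
    M[3, :3] = -v
    return M

a = np.array([0.7, -1.1, 0.4])
na = np.linalg.norm(a)

closed_form = np.cos(na) * np.eye(4) + (np.sin(na) / na) * Ms(a)
exp_err = np.linalg.norm(expm(Ms(a)) - closed_form)           # eq. (20)
sq_err = np.linalg.norm(Ms(a) @ Ms(a) + na ** 2 * np.eye(4))  # Ms(a)^2 = -||a||^2 I_4
```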

Formula (20) is an instance of the exponential map $\exp:\mathfrak{g}\rightarrow G$ from the Lie algebra $\mathfrak{g}$ of a Lie group $G$, a concept that will be dealt with regularly in the sequel. The exponential map is defined for abstract Lie groups and their associated Lie algebras, but for matrix Lie groups (i.e., closed subgroups of $\mathrm{GL}_{n}$), the abstract exponential map coincides with the standard matrix exponential. The Lie bracket on $\{\mathcal{M}_{\mathfrak{s}}(a):a\in\mathbb{R}^{3}\}$ is given by

$$[\mathcal{M}_{\mathfrak{s}}(a),\mathcal{M}_{\mathfrak{s}}(b)]:=\mathcal{M}_{\mathfrak{s}}(a)\mathcal{M}_{\mathfrak{s}}(b)-\mathcal{M}_{\mathfrak{s}}(b)\mathcal{M}_{\mathfrak{s}}(a)=-2\mathcal{M}_{\mathfrak{s}}(a\times b).\qquad(21)$$

In other words, the map $a\mapsto-\frac{1}{2}\mathcal{M}_{\mathfrak{s}}(a)$ is a Lie algebra isomorphism from the Lie algebra $(\mathbb{R}^{3},\times)$ to the matrix Lie algebra $\mathfrak{s}$. The physical interpretation of $e^{\mathcal{M}_{\mathfrak{s}}(a)/2}$ is that, when applied to a unit quaternion, it yields a rotation through the angle $\|a\|$ about the axis $a/\|a\|$.
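The commutator identity (21) can likewise be spot-checked numerically (our NumPy sketch):

```python
import numpy as np

def cross(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Ms(v):
    # the 4x4 antisymmetric matrix of eq. (8)
    M = np.zeros((4, 4))
    M[:3, :3] = -cross(v)
    M[:3, 3] = v
    M[3, :3] = -v
    return M

rng = np.random.default_rng(1)
a, b = rng.normal(size=3), rng.normal(size=3)

# [Ms(a), Ms(b)] should equal -2 Ms(a x b), per eq. (21)
bracket = Ms(a) @ Ms(b) - Ms(b) @ Ms(a)
bracket_err = np.linalg.norm(bracket - (-2.0) * Ms(np.cross(a, b)))
```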

Let $\omega_{0}\in\mathbb{R}^{3}$. Suppose $q(t)=e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}q_{0}$, where $q_{0}=e^{\mathcal{M}_{\mathfrak{s}}(\delta_{0})}\mu_{0}$ with $\delta_{0}\sim\mathcal{N}^{3}(0,\Sigma_{0})$ and $\mu_{0}\in\mathbb{S}^{3}$. The goal is to show that $q(t)$ is still of the form $q(t)=e^{\mathcal{M}_{\mathfrak{s}}(\delta(t))}\mu(t)$ for some $\delta(t)\sim\mathcal{N}^{3}(0,\Sigma(t))$ and $\mu(t)\in\mathbb{S}^{3}$. The first step is to observe that

$$q(t)=e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}e^{\mathcal{M}_{\mathfrak{s}}(\delta_{0})}\mu_{0}=e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}e^{\mathcal{M}_{\mathfrak{s}}(\delta_{0})}e^{-t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}\mu(t),$$

where $\mu(t)=e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}\mu_{0}$. What remains is to demonstrate that $e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}e^{\mathcal{M}_{\mathfrak{s}}(\delta_{0})}e^{-t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}=e^{\mathcal{M}_{\mathfrak{s}}(\delta(t))}$ for some $\delta(t)\in\mathbb{R}^{3}$.

Recall a standard result in Lie theory, namely the braiding identity (hall2013lie, Proposition 3.35):

$$e^{X}Ye^{-X}=e^{\mathrm{ad}_{X}}Y,\qquad(22)$$

which holds for all $X,Y\in\mathfrak{g}$. In (22), $\mathrm{ad}_{X}:\mathfrak{g}\rightarrow\mathfrak{g}$ is given by $\mathrm{ad}_{X}Y:=[X,Y]$ and denotes the adjoint action on the Lie algebra $\mathfrak{g}$. Let $a,b\in\mathbb{R}^{3}$. The braiding identity (22), the commutator identity (21), and the linearity of $\mathcal{M}_{\mathfrak{s}}$ yield

$$e^{\mathcal{M}_{\mathfrak{s}}(a)}e^{\mathcal{M}_{\mathfrak{s}}(b)}e^{-\mathcal{M}_{\mathfrak{s}}(a)}=e^{e^{\mathcal{M}_{\mathfrak{s}}(a)}\mathcal{M}_{\mathfrak{s}}(b)e^{-\mathcal{M}_{\mathfrak{s}}(a)}}=e^{e^{\mathrm{ad}_{\mathcal{M}_{\mathfrak{s}}(a)}}\mathcal{M}_{\mathfrak{s}}(b)}=e^{\mathcal{M}_{\mathfrak{s}}(e^{-2[a]_{\times}}b)}.$$

Applying the previous identity to the expression $e^{t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}e^{\mathcal{M}_{\mathfrak{s}}(\delta_{0})}e^{-t\mathcal{M}_{\mathfrak{s}}(\omega_{0})/2}$ (with $a=t\omega_{0}/2$ and $b=\delta_{0}$), one finds that the desired $\delta(t)$ is given by $\delta(t)=e^{-t[\omega_{0}]_{\times}}\delta_{0}$. In other words, $\delta(t)\sim\mathcal{N}^{3}(0,\Sigma(t))$, where $\Sigma(t)=e^{-t[\omega_{0}]_{\times}}\Sigma_{0}e^{t[\omega_{0}]_{\times}}$. What has just been demonstrated is the invariance of the class of CGs on $\mathbb{S}^{3}$ under the flow generated by the dynamics (11). The deeper theory developed in Section III informs us that this invariance is really due to the fact that the vector field $q\mapsto\frac{1}{2}\mathcal{M}_{\mathfrak{s}}(\omega)q$ is group-affine on $\mathbb{S}^{3}$.
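The conjugation argument above can be replayed numerically. In the sketch below (ours; NumPy/SciPy assumed), a sampled $\delta_{0}$ is pushed through the flow by conjugation, and the recovered tangent vector is compared with $e^{-t[\omega_{0}]_{\times}}\delta_{0}$:

```python
import numpy as np
from scipy.linalg import expm, logm

def cross(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Ms(v):
    # the 4x4 antisymmetric matrix of eq. (8)
    M = np.zeros((4, 4))
    M[:3, :3] = -cross(v)
    M[:3, 3] = v
    M[3, :3] = -v
    return M

rng = np.random.default_rng(7)
omega0 = np.array([0.2, -0.4, 0.1])
delta0 = rng.normal(scale=0.05, size=3)    # one sample from N^3(0, Sigma_0)
t = 2.0

# conjugate exp(Ms(delta0)) by the flow exp(t Ms(omega0)/2)
A = expm(t * Ms(omega0) / 2) @ expm(Ms(delta0)) @ expm(-t * Ms(omega0) / 2)
X = logm(A).real                           # lands back in the algebra s
delta_t = X[:3, 3]                         # Ms(v) stores v in its last column

conj_err = np.linalg.norm(delta_t - expm(-t * cross(omega0)) @ delta0)
```

Since $\|\delta_{0}\|$ is small, the principal matrix logarithm recovers $\mathcal{M}_{\mathfrak{s}}(\delta(t))$ unambiguously.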

In general, additional uncertainty contributes to the attitude quaternion during propagation due to bias and noise corrupting the measured angular rates, as in (15). Some natural questions are then: are CGs invariant under propagation in the presence of bias and noise? If not, how should a CG approximation be maintained during propagation? These and related questions are addressed in generality throughout the remainder of this manuscript. In particular, in Section VI, the main result of Section V is used together with the CTUT to (approximately) propagate the parameters of a CG forward in time in the presence of noise. Once these tools are in hand, the problem of attitude filtering (with the inclusion of gyro bias) is revisited in Section VIII.

III Characterization of group-affine vector fields

The main theoretical exposition begins with a determination of necessary and sufficient conditions for dynamics on a Lie group to result in linear dynamics on the associated Lie algebra. Let $G$ be a matrix Lie group of dimension $n$, and let $\mathfrak{g}=\mathrm{Lie}(G)$ be its Lie algebra. Throughout this manuscript, $G$ is assumed to have a surjective exponential map. Consider an initial value problem (IVP) on $G$ of the form

$$\left\{\begin{array}{l}\dot{g}=f(g)\\ g(0)=g_{0}\in G,\end{array}\right.\qquad(25)$$

where $f:G\rightarrow\mathfrak{g}$ is a smooth map and the dot indicates the time derivative. Here, $f$ is taken to be time-independent to avoid technical complications, but this is not necessary. Suppose that the solution $g:[0,\infty)\rightarrow G$ to (25) satisfies $g(t)=e^{X(t)}\mu(t)$, where $X(t)$ has $X(0)=X_{0}\in\mathfrak{g}$, and $\mu(t)$ satisfies an ordinary differential equation (ODE) analogous to the first equation in (25):

$$\left\{\begin{array}{l}\dot{\mu}=f(\mu)\\ \mu(0)=\mu_{0}\in G,\end{array}\right.$$

with $\mu_{0}\in G$ known a priori. Our goal is to find the corresponding ODE for $X(t)\in\mathfrak{g}$, and then determine the conditions on $f$ required to make this ODE for $X$ affine (in the sense of affine maps on $\mathbb{R}^{n}$).

In order to derive an ODE for $X$, we recall the result in Lie theory known as the derivative of the exponential map (hall2013lie, Theorem 5.4). Let $X:\mathbb{R}\rightarrow\mathfrak{g}$ be a $C^{1}$-path in the Lie algebra. The derivative of the exponential map is given by

$$\frac{\mathrm{d}}{\mathrm{d}t}e^{X(t)}=e^{X(t)}J_{\mathfrak{g}}(X(t))\dot{X}(t),\qquad(26)$$

where $J_{\mathfrak{g}}(X)$ is referred to as the Jacobian of the exponential map and is given by

$$J_{\mathfrak{g}}(X)=\int_{0}^{1}e^{-s\,\mathrm{ad}_{X}}\,\mathrm{d}s.\qquad(27)$$

(Note that some authors define the Jacobian with the $e^{X}$ on the right-hand side (RHS) of (26) included.) Taking the time derivative of $g(t)=e^{X(t)}\mu(t)$, while using (26) and (25), results in $e^{X}J_{\mathfrak{g}}(X)\dot{X}\mu+e^{X}f(\mu)=f(e^{X}\mu)$, or, equivalently,

$$\left\{\begin{array}{l}\dot{X}=J_{\mathfrak{g}}^{-1}(X)(e^{-X}f(e^{X}\mu)-f(\mu))\mu^{-1}\\ X(0)=X_{0}\in\mathfrak{g}.\end{array}\right.\qquad(30)$$
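For $\mathfrak{so}_{3}$, where the vectorized adjoint of $\xi$ is simply $[\xi]_{\times}$, the derivative formula (26) and the integral (27) can be checked by finite differences. The sketch below is ours (NumPy/SciPy assumed), with $J$ computed from the truncated series $\sum_{k\geq 0}(-\mathrm{ad}_{\xi})^{k}/(k+1)!$:

```python
import numpy as np
from scipy.linalg import expm

def cross(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def J_so3(xi, terms=30):
    # J(xi) = int_0^1 exp(-s ad_xi) ds = sum_k (-ad_xi)^k / (k+1)!,
    # with the vectorized adjoint ad_xi = cross(xi) for so(3)
    A, P, out, fact = -cross(xi), np.eye(3), np.zeros((3, 3)), 1.0
    for k in range(terms):
        fact *= k + 1                      # running (k+1)!
        out += P / fact
        P = P @ A
    return out

rng = np.random.default_rng(3)
xi0, v = rng.normal(size=3), rng.normal(size=3)
h = 1e-6

# d/dt exp([xi0 + t v]_x) at t = 0, by central differences ...
lhs = (expm(cross(xi0 + h * v)) - expm(cross(xi0 - h * v))) / (2.0 * h)
# ... versus the right-hand side of eq. (26)
rhs = expm(cross(xi0)) @ cross(J_so3(xi0) @ v)
jac_err = np.linalg.norm(lhs - rhs)
```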

The goal now is to determine a condition on ff that makes the RHS of (30) a linear (affine) map of XX. The relevant condition here is referred to as the group-affine property barrau2019linear . A vector field f:G𝔤f:G\rightarrow\mathfrak{g} is said to be group-affine if, for all g1,g2Gg_{1},g_{2}\in G,

f(g1g2)=f(g1)g2+g1f(g2)g1f(I)g2,\displaystyle f(g_{1}g_{2})=f(g_{1})g_{2}+g_{1}f(g_{2})-g_{1}f(I)g_{2}, (31)

where IGI\in G is the identity element. An important class of group-affine vector fields is given by those of the form f(g)=X1g+gX2f(g)=X_{1}g+gX_{2} for X1,X2𝔤X_{1},X_{2}\in\mathfrak{g}. The ODE in (25) with f(g)=X1gf(g)=X_{1}g is referred to as the (left) Poisson equation on the group GG in some of the literature (see, e.g., muller2021review ). In this manuscript, however, this special case is referred to as "constant" dynamics on the group, by analogy with n\mathbb{R}^{n}. Constant vector fields will be analyzed in detail in Section V. As shown in Theorem 1, not all group-affine vector fields are constant.
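That vector fields of the form f(g) = X_1 g + g X_2 satisfy (31) reduces to a purely algebraic cancellation, which the following sketch verifies numerically (illustrative Python/NumPy code; because the identity is algebraic, generic matrices may stand in for group elements):

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
X1, X2 = A1 - A1.T, A2 - A2.T    # two skew-symmetric (so(3)) elements

def f(g):
    """A vector field of the claimed group-affine form."""
    return X1 @ g + g @ X2

I = np.eye(3)
g1 = I + 0.1 * rng.normal(size=(3, 3))   # generic (invertible) matrices suffice,
g2 = I + 0.1 * rng.normal(size=(3, 3))   # since the identity (31) is algebraic
lhs = f(g1 @ g2)
rhs = f(g1) @ g2 + g1 @ f(g2) - g1 @ f(I) @ g2
assert np.allclose(lhs, rhs)
```

Expanding the right hand side by hand, the cross terms g1 X2 g2 and g1 X1 g2 cancel against g1 f(I) g2 = g1 (X1 + X2) g2, leaving X1 g1 g2 + g1 g2 X2, which is exactly f(g1 g2).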

It is not entirely trivial to see how (31) forces (30) to simplify to a linear equation. Indeed, if ff is assumed to be group-affine and the identity (31) is applied to simplify the RHS of (30), then

{X˙=J𝔤1(X)(eXf(eX)f(I))X(0)=X0𝔤.\displaystyle\left\{\begin{array}[]{l}\dot{X}=J_{\mathfrak{g}}^{-1}(X)(e^{-X}f(e^{X})-f(I))\\[3.0pt] X(0)=X_{0}\in\mathfrak{g}.\end{array}\right. (34)

Strikingly, while the dependence on μ\mu was eliminated from the RHS of (30), eqn. (34) does not appear to reduce to a linear equation in XX. Additional facts regarding group-affine vector fields beyond condition (31) are needed to argue that (34) is indeed a linear system.

Let ψt:GG\psi_{t}:G\rightarrow G denote the flow associated with the equation g˙=f(g)\dot{g}=f(g), where ff satisfies (31). The reason why (31) is called group-affine is owed to (barrau2019linear, , Proposition 8), which asserts that, for each t>0t>0, there exists an element φtAut(G)\varphi_{t}\in\mathrm{Aut}({G}) of the automorphism group Aut(G)\mathrm{Aut}({G}) of GG and νtG\nu_{t}\in G such that

ψt(g0)=φt(g0)νt.\displaystyle\psi_{t}(g_{0})=\varphi_{t}(g_{0})\nu_{t}. (35)

The terminology of "group-affine" now becomes clear by analogy with linear systems on (n,+)(\mathbb{R}^{n},+) since automorphisms of (n,+)(\mathbb{R}^{n},+) are precisely linear maps. Eqn. (35) is key to proving one implication in the following Theorem. Before presenting this Theorem, recall that a derivation is a linear map D:𝔤𝔤D:\mathfrak{g}\rightarrow\mathfrak{g} that satisfies D[X,Y]=[DX,Y]+[X,DY]D[X,Y]=[DX,Y]+[X,DY].

Theorem 1

A vector field f:G𝔤f:G\rightarrow\mathfrak{g} is group-affine (31) if and only if there exists a derivation D:𝔤𝔤D:\mathfrak{g}\rightarrow\mathfrak{g} and Y1,Y2𝔤Y_{1},Y_{2}\in\mathfrak{g} such that, for all X𝔤X\in\mathfrak{g},

f(eX)=eXJ𝔤(X)DX+eXY1+Y2eX,\displaystyle f(e^{X})=e^{X}J_{\mathfrak{g}}(X)DX+e^{X}Y_{1}+Y_{2}e^{X}, (36)

where J𝔤(X)J_{\mathfrak{g}}(X) is given by (27). As a consequence, for a group-affine vector field, the ODE (34) simplifies to

{X˙=(D+adY2)XX(0)=X0𝔤.\displaystyle\left\{\begin{array}[]{l}\dot{X}=(D+\mathrm{ad}_{Y_{2}})X\\[3.0pt] X(0)=X_{0}\in\mathfrak{g}.\end{array}\right. (39)

The appearance of Y2Y_{2} in (39) instead of Y1Y_{1} is a consequence of our choice to represent gGg\in G as eXμe^{X}\mu instead of μeX\mu e^{X}.

Proof. It is first argued that, if f:G𝔤f:G\rightarrow\mathfrak{g} is group-affine, then there exists a derivation D:𝔤𝔤D:\mathfrak{g}\rightarrow\mathfrak{g} such that ff has the form (36). This argument leverages three key facts: 1) the flow associated with ff is of the form (35), 2) an automorphism of a Lie group induces an automorphism of the Lie algebra (40), and 3) the Lie algebra of Aut(𝔤)\mathrm{Aut}({\mathfrak{g}}) (viewed as a Lie group) is equal to the set of derivations on 𝔤\mathfrak{g}.

The starting point to show the RHS of (34) is a linear function of XX is (hall2013lie, , Theorem 5.6), which asserts the existence of a unique TtAut(𝔤)T_{t}\in\mathrm{Aut}({\mathfrak{g}}) such that

φt(eX)=eTt(X),X𝔤,\displaystyle\varphi_{t}(e^{X})=e^{T_{t}(X)},\hskip 14.22636pt\forall X\in\mathfrak{g}, (40)

where φtAut(G)\varphi_{t}\in\mathrm{Aut}({G}) is the automorphism appearing in (35). (TtAut(𝔤)T_{t}\in\mathrm{Aut}({\mathfrak{g}}) means that TtT_{t} is an invertible linear map on 𝔤\mathfrak{g} that preserves brackets.) Using the automorphism property of φt\varphi_{t},

ψt(g0)=ψt(eX0μ0)=φt(eX0μ0)νt=φt(eX0)φt(μ0)νt=eTt(X0)ψt(μ0).\displaystyle\psi_{t}(g_{0})=\psi_{t}(e^{X_{0}}\mu_{0})=\varphi_{t}(e^{X_{0}}\mu_{0})\nu_{t}=\varphi_{t}(e^{X_{0}})\varphi_{t}(\mu_{0})\nu_{t}=e^{T_{t}(X_{0})}\psi_{t}(\mu_{0}).

By definition, ψt\psi_{t} satisfies ddtψt(g0)=f(ψt(g0))\frac{\mathrm{d}}{\mathrm{d}t}\psi_{t}(g_{0})=f(\psi_{t}(g_{0})) and limt0ψt(g0)=g0\lim_{t\rightarrow 0}\psi_{t}(g_{0})=g_{0}. Therefore, by time-differentiating the identity ψt(g0)=eTt(X0)ψt(μ0)\psi_{t}(g_{0})=e^{T_{t}(X_{0})}\psi_{t}(\mu_{0}) and using the definition of the flow, one finds that TtAut(𝔤)T_{t}\in\mathrm{Aut}({\mathfrak{g}}) must satisfy the ODE

J𝔤(Tt(X0))T˙t(X0)=eTt(X0)f(eTt(X0))f(I),\displaystyle J_{\mathfrak{g}}(T_{t}(X_{0}))\dot{T}_{t}(X_{0})=e^{-T_{t}(X_{0})}f(e^{T_{t}(X_{0})})-f(I), (41)

which is precisely (34). Since TtAut(𝔤)T_{t}\in\mathrm{Aut}({\mathfrak{g}}), there exists a derivation D:𝔤𝔤D:\mathfrak{g}\rightarrow\mathfrak{g} such that Tt(X)=etDXT_{t}(X)=e^{tD}X. This observation together with (41) implies that ff satisfies

f(eX)=eXJ𝔤(X)DX+eXf(I).\displaystyle f(e^{X})=e^{X}J_{\mathfrak{g}}(X)DX+e^{X}f(I). (42)

For the reverse direction, suppose ff has the form (36) for some derivation D:𝔤𝔤D:\mathfrak{g}\rightarrow\mathfrak{g}. It is readily verified that (36) satisfies (31) when D=0D=0. Therefore, without loss of generality, take Y1=Y2=0Y_{1}=Y_{2}=0. Let X1,X2𝔤X_{1},X_{2}\in\mathfrak{g}, and let BCH(X1,X2)𝔤\mathrm{BCH}({X_{1}},{X_{2}})\in\mathfrak{g} be the element of 𝔤\mathfrak{g} such that

eBCH(X1,X2)=eX1eX2.\displaystyle e^{\mathrm{BCH}({X_{1}},{X_{2}})}=e^{X_{1}}e^{X_{2}}. (43)

In other words, BCH(X1,X2)logeX1eX2\mathrm{BCH}({X_{1}},{X_{2}})\equiv\log{e^{X_{1}}e^{X_{2}}}. As is well-known and suggested by our notation, BCH(X1,X2)𝔤\mathrm{BCH}({X_{1}},{X_{2}})\in\mathfrak{g} is given by the Baker–Campbell–Hausdorff (BCH) series, which consists of nested commutators of X1X_{1} and X2X_{2}. With this context, the left hand side (LHS) of (31) reads

f(eX1eX2)=eX1eX2J𝔤(BCH(X1,X2))DBCH(X1,X2),\displaystyle f(e^{X_{1}}e^{X_{2}})=e^{X_{1}}e^{X_{2}}J_{\mathfrak{g}}(\mathrm{BCH}({X_{1}},{X_{2}}))D\hskip 1.42262pt\mathrm{BCH}({X_{1}},{X_{2}}), (44)

while the RHS of (31) reads

eX1f(eX2)+f(eX1)eX2eX1f(I)eX2=eX1eX2J𝔤(X2)DX2+eX1J𝔤(X1)DX1eX2.\displaystyle e^{X_{1}}f(e^{X_{2}})+f(e^{X_{1}})e^{X_{2}}-e^{X_{1}}f(I)e^{X_{2}}=e^{X_{1}}e^{X_{2}}J_{\mathfrak{g}}(X_{2})DX_{2}+e^{X_{1}}J_{\mathfrak{g}}(X_{1})DX_{1}e^{X_{2}}. (45)

Hence, the objective is to argue that the following equality holds:

eX1eX2J𝔤(BCH(X1,X2))DBCH(X1,X2)=eX1eX2J𝔤(X2)DX2+eX1J𝔤(X1)DX1eX2.\displaystyle e^{X_{1}}e^{X_{2}}J_{\mathfrak{g}}(\mathrm{BCH}({X_{1}},{X_{2}}))D\hskip 1.42262pt\mathrm{BCH}({X_{1}},{X_{2}})=e^{X_{1}}e^{X_{2}}J_{\mathfrak{g}}(X_{2})DX_{2}+e^{X_{1}}J_{\mathfrak{g}}(X_{1})DX_{1}e^{X_{2}}.

Notice that eBCH(X1,X2)J𝔤(BCH(X1,X2))DBCH(X1,X2)=ddteetDBCH(X1,X2)|t=0e^{\mathrm{BCH}({X_{1}},{X_{2}})}J_{\mathfrak{g}}(\mathrm{BCH}({X_{1}},{X_{2}}))D\hskip 1.42262pt\mathrm{BCH}({X_{1}},{X_{2}})=\frac{\mathrm{d}}{\mathrm{d}t}e^{e^{tD}\mathrm{BCH}({X_{1}},{X_{2}})}\Big{|}_{t=0}. Since DD is a derivation, etDAut(𝔤)e^{tD}\in\mathrm{Aut}({\mathfrak{g}}). Using the fact that BCH(X1,X2)\mathrm{BCH}({X_{1}},{X_{2}}) is given by a series of nested commutators, etDBCH(X1,X2)=BCH(etDX1,etDX2)e^{tD}\mathrm{BCH}({X_{1}},{X_{2}})=\mathrm{BCH}({e^{tD}X_{1}},{e^{tD}X_{2}}), and so

eBCH(X1,X2)J𝔤(BCH(X1,X2))DBCH(X1,X2)\displaystyle e^{\mathrm{BCH}({X_{1}},{X_{2}})}J_{\mathfrak{g}}(\mathrm{BCH}({X_{1}},{X_{2}}))D\hskip 1.42262pt\mathrm{BCH}({X_{1}},{X_{2}}) =ddteBCH(etDX1,etDX2)|t=0\displaystyle=\frac{\mathrm{d}}{\mathrm{d}t}e^{\mathrm{BCH}({e^{tD}X_{1}},{e^{tD}X_{2}})}\Big{|}_{t=0}
=ddteetDX1eetDX2|t=0\displaystyle=\frac{\mathrm{d}}{\mathrm{d}t}e^{e^{tD}X_{1}}e^{e^{tD}X_{2}}\Big{|}_{t=0}
=eX1J𝔤(X1)DX1eX2+eX1eX2J𝔤(X2)DX2.\displaystyle=e^{X_{1}}J_{\mathfrak{g}}(X_{1})DX_{1}e^{X_{2}}+e^{X_{1}}e^{X_{2}}J_{\mathfrak{g}}(X_{2})DX_{2}.

The identity follows. \square

IV The class of CG distributions

The definition of a CG on a (matrix) Lie group has been alluded to at several points in the preceding exposition. It is now appropriate to present a formal definition. First, consider a "matrization" map 𝔤:dim𝔤𝔤\mathcal{M}_{\mathfrak{g}}:\mathbb{R}^{\dim{\mathfrak{g}}}\rightarrow\mathfrak{g} that associates an element of dim𝔤\mathbb{R}^{\dim{\mathfrak{g}}} to its matrix representation in 𝔤\mathfrak{g} with respect to some basis for 𝔤\mathfrak{g} (c.f. eqn. (8)). Let ξ𝒩n(0,Σ)\xi\sim\mathcal{N}^{n}\left(0,\Sigma\right) be an nn-dimensional Gaussian distributed vector with zero mean and covariance matrix Σ\Sigma. Fix a group element μG\mu\in G, which is to be interpreted as the MAP estimate (this may also be interpreted as the Lie group mean wolfe2011bayesian ). From ξ\xi and μ\mu, induce a random element gGg\in G via the formula

g=e𝔤(ξ)μ.\displaystyle g=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}\mu. (46)

Write g𝒯G(μ,Σ)g\sim\mathcal{T}_{G}(\mu,\Sigma) to indicate gg is a random element of GG with a distribution following (46). The distribution 𝒯G(μ,Σ)\mathcal{T}_{G}(\mu,\Sigma) on GG induced via eqn. (46) is known as a (left) CG in the literature wolfe2011bayesian ; barrau2014intrinsic ; brossard2017unscented . A right CG has the e𝔤(ξ)e^{\mathcal{M}_{\mathfrak{g}}({\xi})} and μ\mu in (46) interchanged. A CG has the physically intuitive interpretation that there is a reference group element μG\mu\in G representing the "best" estimate of the state and the uncertainty in the estimate is given by random "rotations" e𝔤(ξ)Ge^{\mathcal{M}_{\mathfrak{g}}({\xi})}\in G generated by the random vector ξn\xi\in\mathbb{R}^{n}. When the Lie group in question is (n,+)(\mathbb{R}^{n},+), then a CG is simply a Gaussian and the theory reduces to the standard Gaussian-based filtering framework on n\mathbb{R}^{n}.
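Definition (46) translates directly into a sampling procedure. The sketch below (illustrative Python/NumPy code with hypothetical values of mu and Sigma; the hat map and truncated-series exponential are our own conventions) draws xi ~ N(0, Sigma), forms g = exp(hat(xi)) mu on SO(3), and confirms the sample lies on the group:

```python
import numpy as np

def hat(v):
    """Matrization map R^3 -> so(3)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def expm(A, terms=30):
    """Matrix exponential by truncated power series (fine for small ||A||)."""
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

rng = np.random.default_rng(2)
mu = expm(hat(np.array([0.3, -0.2, 0.5])))   # reference attitude / MAP estimate (illustrative)
Sigma = np.diag([1e-2, 1e-2, 4e-2])          # small-angle covariance (illustrative)
L = np.linalg.cholesky(Sigma)
xi = L @ rng.normal(size=3)                  # xi ~ N(0, Sigma)
g = expm(hat(xi)) @ mu                       # left-CG sample, eqn. (46)
# the sample is a proper rotation, i.e., it stays on the group exactly
assert np.allclose(g.T @ g, np.eye(3), atol=1e-10)
assert abs(np.linalg.det(g) - 1.0) < 1e-10
```

Note that the randomness enters only through the Lie algebra, so every sample is a valid group element by construction; no projection back onto the group is needed.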

There is a substantial body of literature focused on proposing distributions on Lie groups for the purposes of filtering, and no attempt is made to provide a comprehensive review here (see, e.g., AttitudeSurvey07 ; barrau2014intrinsic ; WangLee2021 for relatively recent reviews and approaches for 𝕊3\mathbb{S}^{3} and SO3\mathrm{SO}_{3}). The class of CGs has been considered by other authors for the purposes of estimation on general Lie groups and, in particular, attitude estimation barfoot2014associating ; barrau2014intrinsic ; brossard2017unscented . However, one major contribution of this manuscript, in addition to Theorem 1, is the following corollary of Theorem 1.

Corollary 1

Let g0𝒯G(μ0,Σ0)g_{0}\sim\mathcal{T}_{G}(\mu_{0},\Sigma_{0}) for some μ0G\mu_{0}\in G and positive-definite, symmetric Σ0n×n\Sigma_{0}\in\mathbb{R}^{n\times n}. Consider the following IVP on GG

{g˙=f(g)g(0)=g0G.\displaystyle\left\{\begin{array}[]{l}\dot{g}=f(g)\\ g(0)=g_{0}\in G.\end{array}\right.

Suppose ff is group-affine (i.e., it satisfies (31)). Then, for all t>0t>0, the solution g:[0,)Gg:[0,\infty)\rightarrow G of the IVP satisfies g(t)𝒯G(μ(t),Σ(t))g(t)\sim\mathcal{T}_{G}(\mu(t),\Sigma(t)), where μ:[0,)G\mu:[0,\infty)\rightarrow G is the solution of the IVP

{μ˙=f(μ)μ(0)=μ0G,\displaystyle\left\{\begin{array}[]{l}\dot{\mu}=f(\mu)\\ \mu(0)=\mu_{0}\in G,\end{array}\right.

and Σ:[0,)n×n\Sigma:[0,\infty)\rightarrow\mathbb{R}^{n\times n} solves

{Σ˙=(D+adY2)Σ+Σ(D+adY2)TΣ(0)=Σ0n×n,\displaystyle\left\{\begin{array}[]{l}\dot{\Sigma}=(D+\mathrm{ad}_{Y_{2}})\Sigma+\Sigma(D+\mathrm{ad}_{Y_{2}})^{T}\\ \Sigma(0)=\Sigma_{0}\in\mathbb{R}^{n\times n},\end{array}\right.

where DD and Y2Y_{2} are defined via (36).

Corollary 1 demonstrates that the class of CGs (46) is invariant under the flow generated by group-affine dynamics. The somewhat surprising aspect of this is that group-affine dynamics may be nonconstant (on the group), yet invariance is maintained (see, e.g., Section VII.3). For conventional Kalman filtering, it is well-known that linear state space dynamics take Gaussian initial conditions to Gaussians with updated parameters. What has just been proven is that for CGs the analogous condition on the vector field is that it is group-affine. This observation provides a case for the claim that CGs are the natural class of probability distributions on a Lie group to filter with respect to, assuming the dynamics under consideration are group-affine. (It is being assumed that the posterior distribution after an observation on the state is well approximated by a CG when making this claim. This may not always be the case.) In fact, given an underlying state space (i.e., a manifold of some dimension 1\geq 1) and a dynamical system on this state space, one is encouraged to design a group law which makes the associated vector field group-affine (c.f. barrau2022geometry ). If this is not possible, and one still insists on filtering with respect to CGs, then one should seek a group law which makes the dynamics "as close to group-affine" as possible. Numerical evidence for this claim is provided in the special case of attitude filtering in the presence of gyro bias (see Section VIII).
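The covariance ODE in Corollary 1 is a linear (Lyapunov) equation; writing A for the matrix of D + ad_{Y_2}, its solution is the closed form Sigma(t) = e^{tA} Sigma_0 e^{tA^T}. The sketch below (illustrative Python/NumPy code with a hypothetical A and Sigma_0) checks this closed form against a finite-difference time derivative:

```python
import numpy as np

def expm(A, terms=40):
    """Matrix exponential by truncated power series (fine for small ||A||)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

A = np.array([[0.0, -0.3, 0.1],
              [0.3, 0.0, -0.2],
              [-0.1, 0.2, 0.0]])    # illustrative matrix of D + ad_{Y2}
S0 = np.diag([0.04, 0.02, 0.03])    # illustrative SPD initial covariance

def Sigma(t):
    """Closed-form solution of Sigma' = A Sigma + Sigma A^T."""
    E = expm(t * A)
    return E @ S0 @ E.T

t, h = 0.7, 1e-6
fd = (Sigma(t + h) - Sigma(t - h)) / (2 * h)   # finite-difference time derivative
assert np.allclose(fd, A @ Sigma(t) + Sigma(t) @ A.T, atol=1e-8)
```

Since Sigma(t) is a congruence transform of Sigma_0, symmetry and positive-definiteness are preserved automatically, mirroring the invariance of the CG family asserted by the corollary.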

V The TSSDE

In this section, a stochastic generalization of a group-affine dynamical system as required for filtering is discussed. For simplicity, the focus is on the case where the deterministic part of the dynamics is constant (i.e., assuming D=0D=0 in (36)). The more general case carries over straightforwardly with minimal modifications to the calculations below.

Suppose there is a smooth function g:[0,)Gg:[0,\infty)\rightarrow G that satisfies

{g˙(t)=X(t)g(t)g(0)=g0G.\displaystyle\left\{\begin{array}[]{l}\dot{g}(t)=X(t)g(t)\\[3.0pt] g(0)=g_{0}\in G.\end{array}\right. (49)

for some (possibly time-dependent) X(t)𝔤X(t)\in\mathfrak{g}. The solution to (49) may be written in a Lie-algebraic way as g(t)=eΘ(t)g0g(t)=e^{\Theta(t)}g_{0} with Θ(t)𝔤\Theta(t)\in\mathfrak{g} satisfying (01esadΘ(t)ds)Θ(t)=X(t)\left(\int_{0}^{1}e^{s\mathrm{ad}_{\Theta(t)}}\mathrm{d}s\right)\Theta^{\prime}(t)=X(t) and giving rise to the well-known Magnus expansion. In particular, if [X(t1),X(t2)]=0[X(t_{1}),X(t_{2})]=0 for all t1,t20t_{1},t_{2}\geq 0, then the Magnus expansion truncates to Θ(t)=0tX(s)ds\Theta(t)=\int_{0}^{t}X(s)\,\mathrm{d}s, so that g(t)=e0tX(s)dsg0g(t)=e^{\int_{0}^{t}X(s)\mathrm{d}s}g_{0}.

Let η:n\eta:\mathbb{R}\rightarrow\mathbb{R}^{n} be a white-noise process (the formal time derivative of a Brownian motion) with delta-correlated spectral density matrix QηQ_{\eta}. Consider the following Langevin equation on GG:

{g˙=(X+𝔤(η))gg(0)=g0G.\displaystyle\left\{\begin{array}[]{l}\dot{g}=(X+\mathcal{M}_{\mathfrak{g}}({\eta}))g\\[3.0pt] g(0)=g_{0}\in G.\end{array}\right. (52)

As an SDE, (52) must carry the Stratonovich interpretation, otherwise solutions may not remain in GG. Suppose gg satisfying (52) is of the form g(t)=e𝔤(ξ(t))μ(t)g(t)=e^{\mathcal{M}_{\mathfrak{g}}({\xi(t)})}\mu(t), where μ(t)G\mu(t)\in G is given by μ(t)=eΘ(t)g0\mu(t)=e^{\Theta(t)}g_{0} and the equation for ξ(t)n\xi(t)\in\mathbb{R}^{n} is to-be-determined. In what follows, drop the implied tt-dependence. Following a derivation similar to that of eqn. (30), ξ˙\dot{\xi} satisfies

J𝔤(𝔤(ξ))𝔤(ξ˙)=e𝔤(ξ)(X+𝔤(η))e𝔤(ξ)X.\displaystyle J_{\mathfrak{g}}(\mathcal{M}_{\mathfrak{g}}({\xi}))\mathcal{M}_{\mathfrak{g}}({\dot{\xi}})=e^{-\mathcal{M}_{\mathfrak{g}}({\xi})}(X+\mathcal{M}_{\mathfrak{g}}({\eta}))e^{\mathcal{M}_{\mathfrak{g}}({\xi})}-X. (53)

Applying the braiding identity (22) to (53) and simplifying yields

J𝔤(𝔤(ξ))𝔤(ξ˙)=(ead𝔤(ξ)I)X+ead𝔤(ξ)𝔤(η).\displaystyle J_{\mathfrak{g}}(\mathcal{M}_{\mathfrak{g}}({\xi}))\mathcal{M}_{\mathfrak{g}}({\dot{\xi}})=\left(e^{-\mathrm{ad}_{\mathcal{M}_{\mathfrak{g}}({\xi})}}-I\right)X+e^{-\mathrm{ad}_{\mathcal{M}_{\mathfrak{g}}({\xi})}}\mathcal{M}_{\mathfrak{g}}({\eta}). (54)

To simplify eqn. (54) further, introduce the operator W𝔤:𝔤n×nW_{\!\mathfrak{g}}:\mathfrak{g}\rightarrow\mathbb{R}^{n\times n} given by

W𝔤(X)=(01esadXds)1,\displaystyle W_{\!\mathfrak{g}}(X)=\left(\int_{0}^{1}e^{s\mathrm{ad}_{X}}\mathrm{d}s\right)^{-1}, (55)

and note J𝔤1(X)eadX=W𝔤(X)J_{\mathfrak{g}}^{-1}(X)e^{-\mathrm{ad}_{X}}=W_{\!\mathfrak{g}}(X). Noting further that J𝔤1(X)(eadXI)=adXJ_{\mathfrak{g}}^{-1}(X)(e^{-\mathrm{ad}_{X}}-I)=-\mathrm{ad}_{X}, eqn. (54) simplifies to

𝔤(ξ˙)=adX𝔤(ξ)+W𝔤(𝔤(ξ))𝔤(η).\displaystyle\mathcal{M}_{\mathfrak{g}}({\dot{\xi}})=\mathrm{ad}_{X}\mathcal{M}_{\mathfrak{g}}({\xi})+W_{\!\mathfrak{g}}(\mathcal{M}_{\mathfrak{g}}({\xi}))\mathcal{M}_{\mathfrak{g}}({\eta}). (56)

Eqn. (56) is essentially our desired SDE defined on the Lie algebra. However, it is an equation for matrices and it is desirable to convert this to an equation for the coordinates ξn\xi\in\mathbb{R}^{n}. In order to accomplish this, introduce the map 𝒱𝔤:𝔤n\mathcal{V}_{\mathfrak{g}}:\mathfrak{g}\rightarrow\mathbb{R}^{n} defined as the inverse of 𝔤:n𝔤\mathcal{M}_{\mathfrak{g}}:\mathbb{R}^{n}\rightarrow\mathfrak{g}. Using this map, define

ad¯ξ1ξ2:=𝒱𝔤(ad𝔤(ξ1)𝔤(ξ2)),\displaystyle\overline{\mathrm{ad}}_{\xi_{1}}\xi_{2}:=\mathcal{V}_{\mathfrak{g}}({\mathrm{ad}_{\mathcal{M}_{\mathfrak{g}}({\xi_{1}})}\mathcal{M}_{\mathfrak{g}}({\xi_{2}})}), (57)

for ξ1,ξ2n\xi_{1},\xi_{2}\in\mathbb{R}^{n}. For each ξn\xi\in\mathbb{R}^{n}, ad¯ξ:nn\overline{\mathrm{ad}}_{\xi}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n} is a linear map and, hence, may be represented as a matrix. Examples of this matrix representation of the adjoint are in Section VII. Moreover, the identity 𝔤(ad¯ξ1ξ2)=ad𝔤(ξ1)𝔤(ξ2)\mathcal{M}_{\mathfrak{g}}({\overline{\mathrm{ad}}_{\xi_{1}}\xi_{2}})=\mathrm{ad}_{\mathcal{M}_{\mathfrak{g}}({\xi_{1}})}\mathcal{M}_{\mathfrak{g}}({\xi_{2}}) holds for all ξ1,ξ2n\xi_{1},\xi_{2}\in\mathbb{R}^{n}. This identity is a consequence of the fact that the maps 𝒱𝔤\mathcal{V}_{\mathfrak{g}} and 𝔤\mathcal{M}_{\mathfrak{g}} are inverses of each other. From this identity, introduce

W¯𝔤(ξ):=(01esad¯ξds)1\displaystyle\overline{W}_{\!\!\mathfrak{g}}(\xi):=\left(\int_{0}^{1}e^{s\overline{\mathrm{ad}}_{\xi}}\mathrm{d}s\right)^{-1} (58)

and note that W𝔤(𝔤(ξ))𝔤(η)=𝔤(W¯𝔤(ξ)η)W_{\!\mathfrak{g}}(\mathcal{M}_{\mathfrak{g}}({\xi}))\mathcal{M}_{\mathfrak{g}}({\eta})=\mathcal{M}_{\mathfrak{g}}({\overline{W}_{\!\!\mathfrak{g}}(\xi)\eta}). Using these notations, one may "vectorize" (56) to find the equation

ξ˙=ad¯𝒱𝔤(X)ξ+W¯𝔤(ξ)η.\displaystyle\dot{\xi}=\overline{\mathrm{ad}}_{\mathcal{V}_{\mathfrak{g}}({X})}\xi+\overline{W}_{\!\!\mathfrak{g}}(\xi)\eta. (59)

Again, as an SDE, eqn. (59) must carry the Stratonovich interpretation. If the derivation DD in eqn. (36) is non-zero, then (59) becomes

ξ˙=(D¯+ad¯𝒱𝔤(X))ξ+W¯𝔤(ξ)η,\displaystyle\dot{\xi}=(\overline{D}+\overline{\mathrm{ad}}_{\mathcal{V}_{\mathfrak{g}}({X})})\xi+\overline{W}_{\!\!\mathfrak{g}}(\xi)\eta, (60)

where D¯:nn\overline{D}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n} is the linear map defined by D¯(ξ):=𝒱𝔤(D𝔤(ξ))\overline{D}(\xi):=\mathcal{V}_{\mathfrak{g}}({D\mathcal{M}_{\mathfrak{g}}({\xi})}) for ξn\xi\in\mathbb{R}^{n}.
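For SO(3)/S^3, the matrix of ad-bar_xi is simply the cross-product matrix of xi, so the operators (27), (55), and (58) reduce to 3x3 matrix series. The sketch below (illustrative Python/NumPy code; the series truncations are our own) evaluates them and verifies the identity J^{-1}(X) e^{-ad_X} = W(X) invoked in the derivation of (56):

```python
import numpy as np

def hat(v):
    """For so(3), the matrix of ad-bar_xi is the cross-product matrix hat(xi)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def expm(A, terms=30):
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def int_exp(A, terms=30):
    """int_0^1 e^{sA} ds = sum_{k>=0} A^k/(k+1)!, by truncated power series."""
    out, term, fact = np.eye(3), np.eye(3), 1.0
    for k in range(1, terms):
        term = term @ A
        fact *= k + 1
        out = out + term / fact
    return out

xi = np.array([0.4, -0.2, 0.3])     # illustrative tangent vector
ad = hat(xi)
J = int_exp(-ad)                    # Jacobian of the exponential map, eqn. (27)
W = np.linalg.inv(int_exp(ad))      # W-bar_g(xi), eqn. (58)
# identities used to pass from (54) to (56):
assert np.allclose(np.linalg.inv(J) @ expm(-ad), W, atol=1e-10)
assert np.allclose(np.linalg.inv(J) @ (expm(-ad) - np.eye(3)), -ad, atol=1e-10)
```

Both identities are one-line computations with the scalar function (1 - e^{-z})/z in place of the matrix series, and the numerics confirm them to machine precision.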

Eqn. (60) represents the TSSDE relevant for propagation of the group elements in terms of the exponential coordinates which parameterize the Lie algebra through the vectorization operation. Clearly, if η=0\eta=0, then (60) becomes a linear equation, which is consistent with Theorem 1. Due to the nonlinear, state-dependent diffusion tensor W¯𝔤(ξ)\overline{W}_{\!\!\mathfrak{g}}(\xi) in (60), it no longer appears as though the class of CGs is invariant under propagation in the presence of noise. Indeed, at the end of Section VII.1, the FPE associated with (60) for the case of 𝕊3\mathbb{S}^{3} (i.e., the attitude filtering problem in the absence of gyro bias) is analyzed and the fundamental solution of this FPE is explicitly computed. The FPE fundamental solution defines the class of probability distributions which remain invariant during propagation in the presence of noise, and the computed fundamental solution for 𝕊3\mathbb{S}^{3} is notably not a CG. It is shown, however, that this fundamental solution is well-approximated by a CG when ΔtQη1\Delta tQ_{\eta}\ll 1 (in appropriate units). In many applications ΔtQη\Delta tQ_{\eta} is small and the CG approximation appears to hold with high accuracy during propagation.

VI Filtering on Lie groups: the TSF

With the TSSDE in hand, one has at one's disposal the myriad of nonlinear filtering techniques to determine how the mean and covariance of ξ\xi (approximately) propagate through (60). Perhaps the crudest approach would be to linearize in an EKF fashion and then numerically integrate the resulting ODEs defining the (approximate) mean and covariance evolution. One can do much better, however. In Section VI.1 a generalization of the CTUT described in sarkka2007unscented to the case of state-dependent diffusion is provided. It is used to propagate the mean and covariance of ξ\xi through (60). This approach provides a relatively accurate way to update the mean and covariance of ξ\xi in between measurement updates within the TSF framework.

Propagation of distributional parameters is only one of two fundamental stages in recursive filtering, the other being the measurement stage, i.e., refining these distributional parameters with an incoming measurement zz according to Bayes’ theorem. In this regard, a measurement zdz\in\mathbb{R}^{d} is assumed to be related to the group element gg through the equation z=h(g,w)z=h(g,w), where h:G×mdh:G\times\mathbb{R}^{m}\rightarrow\mathbb{R}^{d} is a function of the group element and a random vector wmw\in\mathbb{R}^{m} that is typically assumed to be w𝒩m(0,R)w\sim\mathcal{N}^{m}(0,R). The function hh associating group elements to measurement values is typically of a form that demands an approximate means of performing the Bayesian fusion. In this work, we advocate for utilization of the UT UnscentedKF04 . Its application to the measurement stage of the TSF is described in Section VI.2. A similar approach to a UT-based measurement update on Lie groups using a CG as a reference distribution appeared in brossard2017unscented ; sjoberg2021lie .

When using the CTUT to update the first two moments of the distribution over ξ\xi according to the TSSDE (60), the mean of ξ\xi will no longer remain zero, even if it is initially zero. This is a result of the nonlinear, state-dependent diffusion tensor W¯𝔤(ξ)\overline{W}_{\!\!\mathfrak{g}}(\xi) present in (60). (However, it has been observed that the mean generally remains small over time scales Δt\Delta t for which Δtσ1\Delta t\sigma\ll 1, where σ\sigma is the largest eigenvalue of QηQ_{\eta}). The mean of ξ\xi will also generally become non-zero after the measurement stage. Because the definition of a CG (46) requires the mean of ξ\xi to be zero (in order to interpret μ\mu as the MAP estimate), a procedure to approximate the distribution g=e𝔤(ξ)μg=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}\mu, ξ𝒩n(ξ^,Σ)\xi\sim\mathcal{N}^{n}(\widehat{\xi},\Sigma), by a CG must be developed. This is done in Section VI.3 and the resulting procedure is called whitening. Whitening is fundamental to the TSF and is required to produce statistically consistent MAP estimates.

Since the bulk of the TSF algorithm presented here relies on the UT, it is helpful to briefly review its basics. The UT represents the distribution of a random vector xnx\in\mathbb{R}^{n} as a collection of sigma points σin\sigma_{i}\in\mathbb{R}^{n}, i=0,,2ni=0,\cdots,2n, given by

{σ0=x^σi=x^+n+λL:ii=1,,nσi=x^n+λL:ii=n+1,,2n,\displaystyle\left\{\begin{array}[]{ll}\sigma_{0}=\widehat{x}&\\[5.0pt] \sigma_{i}=\widehat{x}+\sqrt{n+\lambda}~{}L_{:i}&i=1,\cdots,n\\[5.0pt] \sigma_{i}=\widehat{x}-\sqrt{n+\lambda}~{}L_{:i}&i=n+1,\cdots,2n,\end{array}\right. (64)

in which L:iL_{:i} denotes the ithi^{\mathrm{th}} column of the Cholesky factor of the covariance matrix of xx, and λn\lambda\neq-n is a tuning parameter that controls the spread of the sigma points around their mean. In addition, one defines the collection of weights

{w0=λλ+nwi=12(λ+n),i=1,,2n.\displaystyle\left\{\begin{array}[]{lc}w_{0}=\dfrac{\lambda}{\lambda+n}&\\[5.0pt] w_{i}=\dfrac{1}{2\left(\lambda+n\right)},&i=1,\cdots,2n.\end{array}\right. (67)

The unscented transform then approximates the expectation of the distribution of y=f(x)y=f(x) for an arbitrary map f:nmf:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m} as 𝔼[f(x)]i=02nwif(σi)\mathbb{E}[{f(x)}]\simeq\sum_{i=0}^{2n}w_{i}f\left(\sigma_{i}\right). There are variants of the UT, such as the simplex UT, that use fewer sigma points to represent the distribution of xx, thereby reducing the computational resources needed to evaluate summations. In this manuscript, only the "vanilla" UT as just described is considered.
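The construction (64)-(67) takes only a few lines to implement. The sketch below (illustrative Python/NumPy code with a hypothetical mean and covariance) also verifies that the weights sum to one and that the sigma points reproduce the first two moments exactly, since the symmetric pairs cancel in the mean and each pair contributes L_{:i} L_{:i}^T to the covariance:

```python
import numpy as np

def unscented_points(xhat, P, lam=1.0):
    """Sigma points (64) and weights (67) for an n-dimensional distribution."""
    n = xhat.size
    L = np.linalg.cholesky(P)   # P = L L^T
    pts = [xhat]
    pts += [xhat + np.sqrt(n + lam) * L[:, i] for i in range(n)]
    pts += [xhat - np.sqrt(n + lam) * L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (lam + n)))
    w[0] = lam / (lam + n)
    return np.array(pts), w

xhat = np.array([1.0, -2.0])        # illustrative mean
P = np.array([[2.0, 0.3],
              [0.3, 1.0]])          # illustrative covariance
pts, w = unscented_points(xhat, P)
# the UT reproduces the first two moments exactly
assert np.isclose(w.sum(), 1.0)
assert np.allclose(w @ pts, xhat)
d = pts - xhat
assert np.allclose(sum(wi * np.outer(di, di) for wi, di in zip(w, d)), P)
```

For a nonlinear f, the weighted sum over transformed sigma points is then the UT estimate of E[f(x)], exact for polynomials of degree at most two in x.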

VI.1 Propagation

This section is devoted to describing the propagation step within the TSF using the CTUT sarkka2007unscented . However, the CTUT of sarkka2007unscented did not consider state-dependent diffusion, which the W¯𝔤(ξ)dη\overline{W}_{\!\!\mathfrak{g}}(\xi)\mathrm{d}\eta term in (60) now demands. Hence, it is necessary to derive a generalization of (sarkka2007unscented, , Algorithm 4.4). This is done for a general (Stratonovich) SDE

dx=f(x)dt+G(x)dw,\displaystyle\mathrm{d}x=f(x)\mathrm{d}t+G(x)\mathrm{d}w, (68)

where xnx\in\mathbb{R}^{n} is a vector-valued stochastic process, f:nnf:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n} is a vector field governing the deterministic portion of the dynamics of xx, G:nn×mG:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n\times m} is the diffusion tensor, and ww is a Brownian motion with delta-correlated spectral density Qm×mQ\in\mathbb{R}^{m\times m}. Eqn. (60) is reproduced by setting ξx\xi\to x, ηw\eta\to w, f(x)=(D¯+ad¯𝒱𝔤(X))xf(x)=(\overline{D}+\overline{\mathrm{ad}}_{\mathcal{V}_{\mathfrak{g}}({X})})x, and G(x)=W¯𝔤(x)G(x)=\overline{W}_{\!\!\mathfrak{g}}(x). Interpreting (68) as a Stratonovich SDE, the FPE for the time-dependent transition probability density p:×n[0,)p:\mathbb{R}\times\mathbb{R}^{n}\rightarrow[0,\infty) associated with (68) reads

tp=div(fp)+12i=1nk,=1mj=1ni(GikQkj(Gjp)).\displaystyle\partial_{t}p=-\mathrm{div}({fp})+\frac{1}{2}\sum_{i=1}^{n}\sum_{k,\ell=1}^{m}\sum_{j=1}^{n}\partial_{i}(G_{ik}Q_{k\ell}\partial_{j}(G_{j\ell}p)). (69)

It is possible to rewrite this equation as

tp=div(f~p)+12i,j=1nij((GQGT)ijp),\displaystyle\partial_{t}p=-\mathrm{div}({\tilde{f}p})+\frac{1}{2}\sum_{i,j=1}^{n}\partial_{i}\partial_{j}((GQG^{T})_{ij}p), (70)

where f~:nn\tilde{f}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n} is a vector field whose iith component is

f~i(x)=fi(x)+12k,=1mj=1nGjQkjGik.\displaystyle\tilde{f}_{i}(x)=f_{i}(x)+\frac{1}{2}\sum_{k,\ell=1}^{m}\sum_{j=1}^{n}G_{j\ell}Q_{k\ell}\partial_{j}G_{ik}. (71)

By multiplying (70) by xx and xxTxx^{T} and integrating, one may derive the moment evolution equations for the mean x^:=𝔼[x]\widehat{x}:=\mathbb{E}[{x}] and covariance P:=Cov[x]=𝔼[(xx^)(xx^)T]P:=\mathrm{Cov}[{x}]=\mathbb{E}[{(x-\widehat{x})(x-\widehat{x})^{T}}] corresponding to (70):

{dx^dt=𝔼[f~(x)]dPdt=Cov[x,f~(x)]+Cov[f~(x),x]+𝔼[G(x)QGT(x)],\displaystyle\left\{\begin{array}[]{l}\dfrac{\mathrm{d}\widehat{x}}{\mathrm{d}t}=\mathbb{E}[{\tilde{f}(x)}]\\[5.0pt] \dfrac{\mathrm{d}P}{\mathrm{d}t}=\mathrm{Cov}[{x,\tilde{f}(x)}]+\mathrm{Cov}[{\tilde{f}(x),x}]+\mathbb{E}[{G(x)QG^{T}(x)}],\end{array}\right. (74)

where Cov[x,f(x)]=𝔼[(xx^)(f(x)f¯)T]\mathrm{Cov}[{x,f(x)}]=\mathbb{E}[{(x-\widehat{x})(f(x)-\overline{f})^{T}}] with f¯:=𝔼[f(x)]\overline{f}:=\mathbb{E}[{f(x)}].

As discussed in (sarkka2007unscented, , Appendix), the CTUT propagation equations in (sarkka2007unscented, , Algorithm 4.4) are obtained by applying the UT to the expectations on the RHS of equations (74). However, as mentioned earlier, that derivation did not consider state-dependent GG in (74). Hence, the covariance evolution in (sarkka2007unscented, , Algorithm 4.4) must be augmented with the term 𝔼[G(x)QGT(x)]i=02nwiG(σi)QGT(σi)\mathbb{E}[{G(x)QG^{T}(x)}]\simeq\sum_{i=0}^{2n}w_{i}G(\sigma_{i})QG^{T}(\sigma_{i}), where the weights wiw_{i} are defined in (67). With this, all the needed ingredients are present to write down the system of coupled ODEs governing the evolution of the sigma points during the propagation step of the TSF. The derivation follows that in (sarkka2007unscented, , Appendix) closely and is omitted. The end result is essentially (sarkka2007unscented, , Equation 35) with the matrix MM therein modified to include the term i=02nwiG(σi)QGT(σi)\sum_{i=0}^{2n}w_{i}G(\sigma_{i})QG^{T}(\sigma_{i}).
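The UT evaluation of the additional term is a direct weighted sum. The sketch below (illustrative Python/NumPy code; the diagonal diffusion tensor G is hypothetical and stands in for W-bar of (60)) compares the UT approximation of E[G(x) Q G(x)^T] against a Monte Carlo estimate:

```python
import numpy as np

def G(x):
    """Hypothetical state-dependent diffusion tensor (illustration only)."""
    return np.array([[1.0 + 0.1 * x[0], 0.0],
                     [0.0, 1.0 + 0.1 * x[1]]])

Q = np.diag([0.5, 0.2])     # illustrative noise spectral density
xhat, P, n, lam = np.zeros(2), 0.04 * np.eye(2), 2, 1.0
Lc = np.linalg.cholesky(P)
sig = [xhat] + [xhat + np.sqrt(n + lam) * Lc[:, i] for i in range(n)] \
             + [xhat - np.sqrt(n + lam) * Lc[:, i] for i in range(n)]
w = [lam / (lam + n)] + [1.0 / (2 * (lam + n))] * (2 * n)
ut = sum(wi * G(s) @ Q @ G(s).T for wi, s in zip(w, sig))

# Monte Carlo reference for E[G(x) Q G(x)^T] under x ~ N(xhat, P)
rng = np.random.default_rng(5)
xs = rng.multivariate_normal(xhat, P, size=100_000)
mc = np.mean([G(x) @ Q @ G(x).T for x in xs], axis=0)
assert np.allclose(ut, mc, atol=1e-3)
```

Since the entries of G Q G^T are quadratic in x here, the UT estimate is exact, and the Monte Carlo value agrees up to sampling error.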

Depending on the dimension of GG and the technique used to numerically integrate the sigma point ODEs, it may be too computationally expensive to use the CTUT for real-time applications. In this regard, another approach for moment propagation through the TSSDE is briefly indicated. This approach is based on a "short-time expansion" of the FPE and may be seen as a generalization of the EKF to state-dependent diffusion. Assuming the initial probability density associated with ξ\xi in (60) is sufficiently localized around the origin of the Lie algebra, then a physically reasonable approximation is to assume it remains localized around the origin over propagation time scales Δt\Delta t for which qηΔt1q_{\!\eta}\Delta t\ll 1 (in appropriate units), where qηq_{\!\eta} is the largest eigenvalue of QηQ_{\eta}, the spectral density associated with η\eta in (60). Thus, only the behavior of the diffusion tensor W¯𝔤\overline{W}_{\!\!\mathfrak{g}} near the origin contributes to the evolution of the density under the FPE. More rigorously, for each λ>0\lambda>0, consider a rescaled density pλ(t,x)=λnp(t,x/λ)p_{\lambda}(t,x)=\lambda^{-n}p(t,x/\lambda), where p(t,x)p(t,x) is assumed to satisfy (70). Rewriting the FPE (70) for pλp_{\lambda}, one may expand the coefficients of this transformed PDE in powers of λ\lambda. Removing terms that fall off faster than λ1\lambda^{-1}, one arrives at an "effective" FPE that captures the evolution of the density over "short" time scales. The result of this procedure is stated for the SU2\mathrm{SU}_{2} example in Section VII.1. Similar ideas are described in ye2023uncertainty ; ye2024uncertainty .

VI.2 Measurement update

The previous section was concerned with the prediction stage of the TSF. The present section is concerned with the measurement stage of filtering, in which our state estimates and their uncertainties are refined by an incoming measurement zz according to Bayes’ theorem. The measurement zdz\in\mathbb{R}^{d} is assumed to be related to the group element gg through the equation z=h(g)+wz=h(g)+w, where h:Gdh:G\rightarrow\mathbb{R}^{d} and wdw\in\mathbb{R}^{d} is assumed to be a zero-mean Gaussian distributed random vector. An example measurement model in the case of attitude filtering is a triaxial magnetometer which measures an ambient magnetic field BB in the coordinate frame to which the attitude is referenced. Rotating to body-frame coordinates, h(q)=R(q)Bh(q)=R(q)B, where R(q)R(q) is the rotation matrix given by eqn. (1). Note that this requires a model of the ambient magnetic field BB.

Rarely will hh be equivalent to a linear function on the Lie algebra of GG. This means the class of CGs over the group are not, in general, invariant under the Bayesian update. Hence, nonlinear filtering techniques are required to approximate the posterior distribution over the group. To apply the UT to the measurement stage of filtering using a CG (46) as the reference distribution, start by generating sigma points according to (64), where the random vector ξn\xi\in\mathbb{R}^{n} in (46) is used in place of xx in (64). These sigma points are then transformed to "measurement space" via νi=h(e𝔤(σi)μ)\nu^{i}=h\left(e^{\mathcal{M}_{\mathfrak{g}}({\sigma^{i}})}\mu\right). Since both the measurement space and the Lie algebra where the sigma points are defined have the structure of a finite-dimensional vector space, the Bayesian fusion amounts to a standard Kalman filter update. These update equations are standard and omitted for brevity (see, e.g., (Crassidis_UKF, , Eqns. 2-4)). This procedure will produce an (approximate) posterior distribution over GG of the form g=e𝔤(ξ)μg=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}\mu, where ξ𝒩n(ξ^,Σ)\xi\sim\mathcal{N}^{n}(\widehat{\xi},\Sigma) and ξ^0\widehat{\xi}\neq 0 in general. In order to maintain consistency, this posterior must be transformed into the form of a CG. This process is called whitening and is described in the following Section.
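The full measurement stage can be sketched for a magnetometer-type model h(g) = g B on SO(3). In the illustrative Python/NumPy code below, the field direction B, the prior, the noise covariance, and the measurement z are all hypothetical values chosen for demonstration:

```python
import numpy as np

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def expm(A, terms=30):
    out, term = np.eye(3), np.eye(3)
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

B = np.array([0.0, 0.0, 1.0])               # reference-frame field direction (hypothetical)
mu = expm(hat(np.array([0.1, -0.3, 0.2])))  # prior MAP attitude (hypothetical)
Sigma = 0.05 * np.eye(3)                    # prior tangent-space covariance
Rn = 0.01 * np.eye(3)                       # measurement noise covariance
z = np.array([0.05, -0.02, 0.99])           # incoming measurement (hypothetical)

n, lam = 3, 1.0
Lc = np.linalg.cholesky(Sigma)
sig = [np.zeros(n)]
sig += [ np.sqrt(n + lam) * Lc[:, i] for i in range(n)]
sig += [-np.sqrt(n + lam) * Lc[:, i] for i in range(n)]
w = np.array([lam / (lam + n)] + [1.0 / (2 * (lam + n))] * (2 * n))

nu = np.array([expm(hat(s)) @ mu @ B for s in sig])  # sigma points in measurement space
zhat = w @ nu
Pzz = sum(wi * np.outer(v - zhat, v - zhat) for wi, v in zip(w, nu)) + Rn
Pxz = sum(wi * np.outer(s, v - zhat) for wi, s, v in zip(w, sig, nu))
K = Pxz @ np.linalg.inv(Pzz)                # Kalman gain in the tangent space
xi_hat = K @ (z - zhat)                     # posterior tangent-space mean
Sigma_post = Sigma - K @ Pzz @ K.T
assert np.trace(Sigma_post) < np.trace(Sigma)
```

Note that the posterior tangent-space mean xi_hat is generically nonzero, which is precisely what necessitates the whitening step of Section VI.3.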

VI.3 Whitening

In this section, a procedure is developed to approximate the distribution g=e𝔤(ξ)μg=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}\mu, with ξ𝒩n(ξ^,Σ)\xi\sim\mathcal{N}^{n}(\widehat{\xi},\Sigma), by a CG (46). In contrast to eqn. (46), ξ^\widehat{\xi} is not assumed to be zero. The approximation is accomplished by rotating μ\mu by a predetermined amount, while applying the reverse rotation to the distribution ξ\xi, where the rotation is chosen in such a way that the new distribution over ξ\xi has zero mean to some specified tolerance. The resulting algorithm will be referred to as whitening.

The problem is to find a fixed vector ana\in\mathbb{R}^{n} such that the mean of

ξ~:=𝒱𝔤(BCH(𝔤(ξ),𝔤(a)))\displaystyle\tilde{\xi}:=\mathcal{V}_{\mathfrak{g}}({\mathrm{BCH}({\mathcal{M}_{\mathfrak{g}}({\xi})},{-\mathcal{M}_{\mathfrak{g}}({a})})}) (75)

is zero. If this can be accomplished, then write

g=e𝔤(ξ)μ=(e𝔤(ξ)e𝔤(a))e𝔤(a)μ=e𝔤(ξ~)μ~,\displaystyle g=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}\mu=\left(e^{\mathcal{M}_{\mathfrak{g}}({\xi})}e^{-\mathcal{M}_{\mathfrak{g}}({a})}\right)e^{\mathcal{M}_{\mathfrak{g}}({a})}\mu=e^{\mathcal{M}_{\mathfrak{g}}({\tilde{\xi}})}\tilde{\mu},

where μ~=e𝔤(a)μ\tilde{\mu}=e^{\mathcal{M}_{\mathfrak{g}}({a})}\mu. Since ξ~\tilde{\xi} has zero mean, μ~\tilde{\mu} may be interpreted as the MAP of the distribution over GG in accordance with the definition of a CG (46). Due to the nonlinear nature of the BCH expansion, it seems improbable that one may find a closed-form for an aa in terms of the moments of ξ\xi. Instead, a simpler route is taken that can be iterated upon until a desired convergence criterion is met. In this vein, an ana\in\mathbb{R}^{n} is sought such that the mean of ξ~\tilde{\xi} is strictly smaller (in magnitude) than the mean of ξ\xi.

To proceed, first recall a few facts about the BCH expansion. Let X,Y𝔤X,Y\in\mathfrak{g}, where 𝔤\mathfrak{g} is the Lie algebra of some (matrix) Lie group GG. Consider the series representation of BCH (hall2013lie, , Section 5.6), whose first few terms are given by

BCH(X,Y)=X+Y+12[X,Y]+112([X,[X,Y]][Y,[X,Y]])+.\displaystyle\mathrm{BCH}({X},{Y})=X+Y+\frac{1}{2}[X,Y]+\frac{1}{12}([X,[X,Y]]-[Y,[X,Y]])+\cdots. (76)

If all the commutators in (76) that only contain a single YY are collected, then it is possible to write an alternative form of the BCH series given by

BCH(X,Y)=X+J𝔤1(X)Y+𝒪(Y2),\displaystyle\mathrm{BCH}({X},{Y})=X+J_{\mathfrak{g}}^{-1}(X)Y+\mathcal{O}(Y^{2}), (77)

where J𝔤(X)J_{\mathfrak{g}}(X) was introduced in (27).

Using formula (77) to expand the RHS of (75) and vectorizing, one finds

ξ~=ξJ¯𝔤1(ξ)a+𝒪(a2),\displaystyle\tilde{\xi}=\xi-\overline{J}^{-1}_{\mathfrak{g}}(\xi)a+\mathcal{O}(a^{2}), (78)

where J¯𝔤\overline{J}_{\!\mathfrak{g}} is defined similarly to (58). If one takes the expectation of both sides of (78), and imposes the constraint that 𝔼[ξ~]=0\mathbb{E}[{\tilde{\xi}}]=0, then

0=ξ^𝔼[J¯𝔤1(ξ)]a+𝒪(a2).\displaystyle 0=\widehat{\xi}-\mathbb{E}[{\overline{J}_{\!\mathfrak{g}}^{-1}(\xi)}]a+\mathcal{O}(a^{2}). (79)

Ignoring higher-order terms in aa, solve (79) for aa to find

a(𝔼[J¯𝔤1(ξ)])1ξ^.\displaystyle a\simeq\left(\mathbb{E}[{\overline{J}_{\!\mathfrak{g}}^{-1}(\xi)}]\right)^{-1}\widehat{\xi}. (80)

In order to use (80), one must be able to compute 𝔼[J¯𝔤1(ξ)]\mathbb{E}[{\overline{J}_{\!\mathfrak{g}}^{-1}(\xi)}] when ξ𝒩n(ξ^,Σ)\xi\sim\mathcal{N}^{n}(\widehat{\xi},\Sigma). Without making simplifying assumptions about the covariance Σ\Sigma of ξ\xi, it appears difficult to find a closed form. There are at least two routes to approximating this expectation value. One route is to use the UT, while another may use a first-order Taylor series approach, i.e. 𝔼[J¯𝔤1(ξ)]J¯𝔤1(ξ^)\mathbb{E}[{\overline{J}_{\!\mathfrak{g}}^{-1}(\xi)}]\simeq\overline{J}_{\!\mathfrak{g}}^{-1}(\widehat{\xi}). Using the fact that J¯𝔤1(ξ)ξ=ξ\overline{J}_{\!\mathfrak{g}}^{-1}(\xi)\xi=\xi for any ξn\xi\in\mathbb{R}^{n}, then under this first-order approximation, aξ^a\simeq\widehat{\xi}. However, if aa in (80) is computed using, for example, the UT, then, in general, aξ^a\neq\widehat{\xi}.

To compute the covariance of ξ~\tilde{\xi}, expand (75) as a series in ξ\xi, not aa. For this, note the BCH symmetry BCH(X,Y)=BCH(Y,X)\mathrm{BCH}({X},{Y})=-\mathrm{BCH}({-Y},{-X}), true for any X,Y𝔤X,Y\in\mathfrak{g}. Then, there is an expansion similar to (78), but now in powers of ξ\xi:

ξ~=J¯𝔤1(a)(ξa)+𝒪(ξ2).\displaystyle\tilde{\xi}=\overline{J}_{\!\mathfrak{g}}^{-1}\left(a\right)\left(\xi-a\right)+\mathcal{O}(\xi^{2}). (81)

The previous expression tells us that the covariance of ξ~\tilde{\xi} is Σ~J¯𝔤1(a)(Σ+(aξ^)(aξ^)T)J¯𝔤T(a)\tilde{\Sigma}\simeq\overline{J}_{\!\mathfrak{g}}^{-1}\left(a\right)(\Sigma+(a-\widehat{\xi})(a-\widehat{\xi})^{T})\overline{J}_{\!\mathfrak{g}}^{-T}\left(a\right). Choose a=ξ^a=\widehat{\xi}. Then Σ~J¯𝔤1(ξ^)ΣJ¯𝔤T(ξ^)\tilde{\Sigma}\simeq\overline{J}_{\!\mathfrak{g}}^{-1}\left(\widehat{\xi}\right)\Sigma\overline{J}_{\!\mathfrak{g}}^{-T}\left(\widehat{\xi}\right).

If aa is computed based on an approximation such as (80), then the mean of ξ~\tilde{\xi} in (75) will not be exactly zero. However, by analyzing higher-order terms in (81) using, say, a=ξ^a=\widehat{\xi}, one may show that the new mean of ξ~\tilde{\xi} is strictly smaller in norm than ξ^\widehat{\xi}. This implies that if the mean and covariance of ξ~\tilde{\xi} are (approximately) computed using, for example, the choice a=ξ^a=\widehat{\xi}, then this whitening process may be repeated using this new mean and covariance until a desired convergence criterion is met. In practice, it has been observed that a small number of iterations, e.g. 𝒪(10)\mathcal{O}\left(10\right), will suffice to produce a distribution with a mean on the order of machine double precision.
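To make the iteration concrete, the following self-contained sketch runs the whitening loop on SO3\mathrm{SO}_{3} using Monte Carlo samples in place of sigma points. All helper routines and parameter values are our own illustrative choices, not the paper's; each pass applies the first-order choice a=ξ^a=\widehat{\xi} and the update ξ~=log(e[ξ]×e[a]×)\tilde{\xi}=\log(e^{[\xi]_{\times}}e^{-[a]_{\times}}), the matrix analogue of eqn. (75).

```python
# Illustrative Monte Carlo sketch of the whitening iteration on SO(3).
# Helper names and parameters are ours; samples stand in for sigma points,
# and each pass uses the first-order choice a = xi_hat from eqn. (80).
import math
import random

def hat(w):
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_exp_so3(w):  # Rodrigues' rotation formula for exp([w]_x)
    th = math.sqrt(sum(c * c for c in w))
    I = [[float(i == j) for j in range(3)] for i in range(3)]
    if th < 1e-12:
        return I
    K = hat(w)
    KK = mat_mul(K, K)
    a, b = math.sin(th) / th, (1.0 - math.cos(th)) / th ** 2
    return [[I[i][j] + a * K[i][j] + b * KK[i][j] for j in range(3)]
            for i in range(3)]

def mat_log_so3(R):  # inverse of Rodrigues' formula (rotation angle < pi)
    c = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    th = math.acos(c)
    if th < 1e-12:
        return [0.0, 0.0, 0.0]
    f = th / (2.0 * math.sin(th))
    return [f * (R[2][1] - R[1][2]),
            f * (R[0][2] - R[2][0]),
            f * (R[1][0] - R[0][1])]

random.seed(0)
# A posterior over the Lie algebra with nonzero mean, as left by an update.
xi_hat0, sigma = [0.3, -0.2, 0.1], 0.05
samples = [[random.gauss(m, sigma) for m in xi_hat0] for _ in range(2000)]

for _ in range(20):  # whitening passes: each shifts the MAP by a = sample mean
    a = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
    Ea = mat_exp_so3(a)
    Ea_T = [[Ea[j][i] for j in range(3)] for i in range(3)]  # = exp(-[a]_x)
    # xi_tilde = log(exp([xi]_x) exp(-[a]_x)), the analogue of eqn. (75)
    samples = [mat_log_so3(mat_mul(mat_exp_so3(s), Ea_T)) for s in samples]

mean = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
residual = math.sqrt(sum(c * c for c in mean))
print(residual)
```

With the parameters above, the empirical mean collapses to numerical zero well within the 𝒪(10) passes quoted in the text.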

VI.4 Summary of the TSF

Sections VI.1, VI.2, and VI.3 may be summarized to describe a complete TSF based on the UT. One recursive step of the filter proceeds as follows. Let g𝒯G(μ,Σ)g\sim\mathcal{T}_{G}(\mu_{-},\Sigma_{-}) be the prior CG. In particular, g=e𝔤(ξ)μg=e^{\mathcal{M}_{\mathfrak{g}}({\xi_{-}})}\mu_{-} where ξ𝒩n(0,Σ)\xi_{-}\sim\mathcal{N}^{n}(0,\Sigma_{-}). Suppose there is an incoming measurement zdz\in\mathbb{R}^{d}. The first step is to update the distribution of ξ\xi_{-} with the measurement zz using the UT as described in Section VI.2. This produces a new distribution ξ+𝒩n(ξ^+,Σ+)\xi_{+}\sim\mathcal{N}^{n}(\widehat{\xi}_{+},\Sigma_{+}) where, in general, ξ^+0\widehat{\xi}_{+}\neq 0. To remedy this, several iterations of the whitening procedure described in Section VI.3 are performed. This transforms the posterior distribution g=e𝔤(ξ+)μg=e^{\mathcal{M}_{\mathfrak{g}}({\xi_{+}})}\mu_{-} into a (roughly) equivalent CG distribution, g𝒯G(μ~+,Σ~+)g\sim\mathcal{T}_{G}(\tilde{\mu}_{+},\tilde{\Sigma}_{+}). Next, one must propagate the MAP μ~+\tilde{\mu}_{+} and the first two moments of ξ~+𝒩n(0,Σ~+)\tilde{\xi}_{+}\sim\mathcal{N}^{n}(0,\tilde{\Sigma}_{+}) until the next measurement is received. The MAP estimate μ~+\tilde{\mu}_{+} is propagated forward by solving

{μ˙=f(μ)μ(0)=μ~+\displaystyle\left\{\begin{array}[]{l}\dot{\mu}=f(\mu)\\ \mu(0)=\tilde{\mu}_{+}\end{array}\right.

over the necessary time step, where ff is the vector field governing the dynamics on the Lie group. The moments of ξ~+\tilde{\xi}_{+} are propagated forward in time under the appropriate TSSDE (60) using the CTUT as described in Section VI.1. Before the next incoming measurement is processed, another iteration of the whitening procedure is typically required due to the nonlinear, state-dependent diffusion tensor in (60). Once this is complete, the algorithm just outlined may be repeated until all measurements are exhausted.

VI.5 Performance metrics

In the context of Gaussian-based nonlinear filtering on n\mathbb{R}^{n}, a standard filter performance metric in simulations is the (normalized) χ2\chi^{2} value. Specifically, if χ2=1nxμ,P1(xμ)\chi^{2}=\frac{1}{n}\langle x-\mu,P^{-1}(x-\mu)\rangle with x𝒩n(μ,P)x\sim\mathcal{N}^{n}(\mu,P), then nχ2n\chi^{2} follows a chi-squared distribution with nn degrees of freedom, so χ2\chi^{2} has mean 𝔼[χ2]=1\mathbb{E}[{\chi^{2}}]=1. If, after averaging over many filter realizations, the χ2\chi^{2} is substantially different from its theoretical expectation, then there is strong evidence that the distribution followed by the estimated error is not a Gaussian with covariance matrix given by the associated state error covariance. Hence, the χ2\chi^{2} value gives a measure of statistical consistency of the filter. Poor statistical consistency could be the result of flawed assumptions, poor parameter tuning, numerical issues, or genuine non-Gaussian behavior (just to name a few). A key aspect of the χ2\chi^{2} metric is that it takes into account the full state error covariance matrix PP, capturing important cross-correlations between states.

The goal here is to identify a similar metric when filtering with respect to CGs on Lie groups. Suppose one is attempting to estimate gGg\in G and, at any given time in the simulation, an estimate of gg, denoted μG\mu\in G, is computed along with an associated covariance matrix Σn×n\Sigma\in\mathbb{R}^{n\times n} (here, n=dimGn=\dim{G}). In analogy with filtering on Euclidean space, the error in our estimate μ\mu is stipulated to be gμ1g\mu^{-1}. If the TSF’s CG approximations of the true underlying probability distribution over gg are accurate, then g𝒯G(μ,Σ)g\sim\mathcal{T}_{G}(\mu,\Sigma) and, in particular, gμ1=e𝔤(ξ)g\mu^{-1}=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}, where ξ𝒩n(0,Σ)\xi\sim\mathcal{N}^{n}(0,\Sigma). Hence, 𝒱𝔤(log(gμ1))𝒩n(0,Σ)\mathcal{V}_{\mathfrak{g}}({\log{(g\mu^{-1})}})\sim\mathcal{N}^{n}(0,\Sigma) and so the quantity

χG2:=1n𝒱𝔤(log(gμ1)),Σ1𝒱𝔤(log(gμ1)),\displaystyle\chi^{2}_{G}:=\frac{1}{n}\langle\mathcal{V}_{\mathfrak{g}}({\log{(g\mu^{-1})}}),\Sigma^{-1}\mathcal{V}_{\mathfrak{g}}({\log{(g\mu^{-1})}})\rangle, (82)

averaged over many random realizations should be (approximately) 11. The quantity χG2\chi^{2}_{G} will be used as the primary metric to evaluate simulated filter performance in Section VIII.1.
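As a sanity check of (82), the following sketch draws g=e𝔤(ξ)μg=e^{\mathcal{M}_{\mathfrak{g}}({\xi})}\mu with ξ𝒩3(0,Σ)\xi\sim\mathcal{N}^{3}(0,\Sigma) on the unit quaternions and averages χG2\chi^{2}_{G} over many realizations. The quaternion exponential convention below (geodesic distance ξ\|\xi\| from the identity) is our own and may differ from the paper's 𝔰\mathcal{M}_{\mathfrak{s}} by a constant factor; the statistic is insensitive to that choice.

```python
# Monte Carlo check that the average of chi^2_G in (82) is close to 1 when the
# CG model is exact. Quaternion conventions and parameters are our choices.
import math
import random

def q_mul(p, q):  # Hamilton product
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return [a1*a2 - b1*b2 - c1*c2 - d1*d2,
            a1*b2 + b1*a2 + c1*d2 - d1*c2,
            a1*c2 - b1*d2 + c1*a2 + d1*b2,
            a1*d2 + b1*c2 - c1*b2 + d1*a2]

def q_exp(xi):  # exponential map on S^3; ||xi|| is the geodesic distance
    r = math.sqrt(sum(c * c for c in xi))
    if r < 1e-12:
        return [1.0, 0.0, 0.0, 0.0]
    s = math.sin(r) / r
    return [math.cos(r), s * xi[0], s * xi[1], s * xi[2]]

def q_log(q):  # inverse of q_exp for geodesic distance < pi
    vn = math.sqrt(q[1]**2 + q[2]**2 + q[3]**2)
    if vn < 1e-12:
        return [0.0, 0.0, 0.0]
    th = math.atan2(vn, q[0])
    return [th * q[i] / vn for i in (1, 2, 3)]

random.seed(1)
var = [0.04**2, 0.03**2, 0.05**2]   # diagonal Sigma, chosen arbitrarily
mu = [0.9, 0.1, -0.3, 0.2]
n = math.sqrt(sum(c * c for c in mu))
mu = [c / n for c in mu]             # MAP estimate, a unit quaternion
mu_inv = [mu[0], -mu[1], -mu[2], -mu[3]]

total, N = 0.0, 5000
for _ in range(N):
    xi = [random.gauss(0.0, math.sqrt(v)) for v in var]
    g = q_mul(q_exp(xi), mu)          # g ~ T_G(mu, Sigma)
    err = q_log(q_mul(g, mu_inv))     # V(log(g mu^{-1})), recovers xi
    total += sum(e * e / v for e, v in zip(err, var)) / 3.0

avg = total / N
print(avg)  # close to 1 for a statistically consistent model
```

A filter whose reported Σ\Sigma is too small (overconfident) would drive this average well above 1, which is how (82) is used as a diagnostic in Section VIII.1.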

VII Examples

In the following sections, eqn. (60) is examined for some specific examples of Lie groups that are relevant for applications. Most of the work lies in determining the explicit form of the diffusion tensor (58) for the given example.

VII.1 SU2\mathrm{SU}_{2}

Consider the case of unit quaternions, which define the Lie group 𝕊3\mathbb{S}^{3} under quaternionic multiplication. This Lie group is isomorphic to SU2\mathrm{SU}_{2}, hence the name of this example. As discussed in Sec. II, quaternions define a convenient parameterization of elements of SO3\mathrm{SO}_{3}, which are the primary mathematical objects of interest in attitude filtering. In this case, the relevant group equation of motion is eqn. (11), where ω\omega is a noisy measurement provided by a gyroscope. This section does not consider the case where ω\omega is corrupted by a bias, as this is reserved for Section VIII. Instead, simply assume the measured angular rate satisfies ωm=ω+η\omega_{m}=\omega+\eta, where η𝒩3(0,Qη)\eta\sim\mathcal{N}^{3}(0,Q_{\eta}). Under the paradigm of DMR, the Langevin equation governing propagation in between attitude measurement updates then reads

q˙=12𝔰(ωmη)q,\displaystyle\dot{q}=\dfrac{1}{2}\mathcal{M}_{\mathfrak{s}}({\omega_{m}-\eta})q, (83)

where 𝔰\mathcal{M}_{\mathfrak{s}} is given by (8). Eqn. (83) is precisely of the form (52).

The first order of business is to compute ad¯\overline{\mathrm{ad}}, defined in (57), for the Lie algebra 𝔰\mathfrak{s}. From the identity (21), it can be seen that ad¯ξ=2[ξ]×\overline{\mathrm{ad}}_{\xi}=-2[\xi]_{\times} for ξ3\xi\in\mathbb{R}^{3}. Therefore, W¯𝔰(ξ)=(01e2s[ξ]×ds)1\overline{W}_{\!\!\mathfrak{s}}(\xi)=\left(\int_{0}^{1}e^{-2s[\xi]_{\times}}\mathrm{d}s\right)^{-1}. To find a closed-form for W¯𝔰(ξ)\overline{W}_{\!\!\mathfrak{s}}(\xi), it is necessary to recall Rodrigues’ rotation formula:

es[ξ]×=ξξTξ2+cos(sξ)(I3ξξTξ2)+sin(sξ)ξ[ξ]×.\displaystyle e^{s[\xi]_{\times}}=\frac{\xi\xi^{T}}{\|\xi\|^{2}}+\cos{(s\|\xi\|)}\left(I_{3}-\frac{\xi\xi^{T}}{\|\xi\|^{2}}\right)+\frac{\sin{(s\|\xi\|)}}{\|\xi\|}[\xi]_{\times}. (84)

Applying (84) to compute 01e2s[ξ]×ds\int_{0}^{1}e^{-2s[\xi]_{\times}}\mathrm{d}s, taking the matrix inverse, and simplifying, yields

W¯𝔰(ξ)=ξξTξ2+κ(ξ)(I3ξξTξ2)+[ξ]×,\displaystyle\overline{W}_{\!\!\mathfrak{s}}(\xi)=\frac{\xi\xi^{T}}{\|\xi\|^{2}}+\kappa(\xi)\left(I_{3}-\frac{\xi\xi^{T}}{\|\xi\|^{2}}\right)+[\xi]_{\times}, (85)

where κ:Bπ(0)(,1]\kappa:B_{\pi}(0)\rightarrow(-\infty,1] is the radial function

κ(ξ)=ξcotξ.\displaystyle\kappa(\xi)=\|\xi\|\cot{\|\xi\|}. (86)
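The closed form (85)-(86) can be checked numerically: the right-hand side of (85) should be the exact inverse of 01e2s[ξ]×ds\int_{0}^{1}e^{-2s[\xi]_{\times}}\mathrm{d}s. The sketch below evaluates the integral by Simpson quadrature of (84) at an arbitrary test point (our choice) and multiplies by the closed form.

```python
# Numerical check of (85): closed form times the defining integral equals I.
import math

def hat(w):
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rodrigues(y):  # exp([y]_x) via eqn. (84)
    r = math.sqrt(sum(c * c for c in y))
    I = [[float(i == j) for j in range(3)] for i in range(3)]
    if r < 1e-12:
        return I
    P = [[y[i] * y[j] / r**2 for j in range(3)] for i in range(3)]
    K = hat(y)
    return [[P[i][j] + math.cos(r) * (I[i][j] - P[i][j])
             + math.sin(r) / r * K[i][j] for j in range(3)] for i in range(3)]

xi = [0.4, -0.7, 0.2]  # arbitrary test point with ||xi|| < pi
r = math.sqrt(sum(c * c for c in xi))

# Simpson's rule for M = \int_0^1 exp(-2 s [xi]_x) ds
N = 200
hs = 1.0 / N
M = [[0.0] * 3 for _ in range(3)]
for k in range(N + 1):
    wgt = 1 if k in (0, N) else (4 if k % 2 else 2)
    E = rodrigues([-2.0 * k * hs * c for c in xi])
    for i in range(3):
        for j in range(3):
            M[i][j] += wgt * E[i][j] * hs / 3.0

# Closed form (85): P + kappa(xi)(I - P) + [xi]_x, kappa = ||xi|| cot ||xi||
kappa = r / math.tan(r)
P = [[xi[i] * xi[j] / r**2 for j in range(3)] for i in range(3)]
K = hat(xi)
W = [[P[i][j] + kappa * (float(i == j) - P[i][j]) + K[i][j]
      for j in range(3)] for i in range(3)]

prod = mat_mul(W, M)
err = max(abs(prod[i][j] - float(i == j)) for i in range(3) for j in range(3))
print(err)  # near machine precision: (85) inverts the defining integral
```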

As a final note, observe that in the SU2\mathrm{SU}_{2}-case, the "constant" piece of the Langevin equation X12𝔰(ωm)X\equiv\frac{1}{2}\mathcal{M}_{\mathfrak{s}}({\omega_{m}}), and so ad¯𝒱𝔤(X)[ωm]×\overline{\mathrm{ad}}_{\mathcal{V}_{\mathfrak{g}}({X})}\equiv-[\omega_{m}]_{\times}. Plugging the previous results into (59) yields

ξ˙=[ωm]×ξ+12W¯𝔰(ξ)η.\displaystyle\dot{\xi}=-[\omega_{m}]_{\times}\xi+\frac{1}{2}\overline{W}_{\!\!\mathfrak{s}}(\xi)\eta. (87)

The remainder of this example is devoted to a discussion of the FPE associated with (87), its fundamental solution, and the relevance to optimal attitude filtering. Much of what is covered here is in similar spirit to Markley_SO3 . In the interest of notational simplicity, for the remainder of this subsection, W¯𝔰(ξ)\overline{W}_{\!\!\mathfrak{s}}(\xi) will be denoted by G(ξ)W¯𝔰(ξ)G(\xi)\equiv\overline{W}_{\!\!\mathfrak{s}}(\xi). For brevity, the interested reader is referred to do1992riemannian ; rosenberg1997laplacian ; berger2003panoramic ; faraut2008analysis for additional information regarding the following terminology.

Using the identity iξiGij(ξ)=2(1κ(ξ))ξj/ξ2\sum_{i}\partial_{\xi_{i}}G_{ij}(\xi)=2(1-\kappa(\xi))\xi_{j}/\|\xi\|^{2}, the FPE for the transition probability density p:[0,)×Bπ(0)[0,)p:[0,\infty)\times B_{\pi}(0)\rightarrow[0,\infty) associated with (87) reads

\displaystyle\partial_{t}p=\mathrm{div}\left(\left([\omega_{m}]_{\times}\xi+\frac{1-\kappa(\xi)}{\|\xi\|^{2}}\,G(\xi)Q_{\eta}\xi\right)p\right)+\frac{1}{2}\nabla\cdot\left(G(\xi)Q_{\eta}G^{T}(\xi)\nabla p\right). (88)

The second-order differential operator ∇⋅G(ξ)QηGT(ξ)∇\nabla\cdot G(\xi)Q_{\eta}G^{T}(\xi)\nabla appearing in (88) is the Laplace-Beltrami operator associated with the Riemannian metric

γ(ξ)=(G(ξ)QηGT(ξ))1\displaystyle\gamma(\xi)=(G(\xi)Q_{\eta}G^{T}(\xi))^{-1} (89)

on 𝕊3\mathbb{S}^{3}. For Qη=I3Q_{\eta}=I_{3}, γ\gamma simplifies to

γ(ξ)=ξξTξ2+h(ξ)(I3ξξTξ2),\displaystyle\gamma(\xi)=\dfrac{\xi\xi^{T}}{\|\xi\|^{2}}+h(\xi)\left(I_{3}-\dfrac{\xi\xi^{T}}{\|\xi\|^{2}}\right), (90)

where h:[0,)h:\mathbb{R}\rightarrow[0,\infty) is the radial function

h(ξ)=sin2(ξ)ξ2.\displaystyle h(\xi)=\frac{\sin^{2}{(\|\xi\|)}}{\|\xi\|^{2}}. (91)

Furthermore, the FPE (88) simplifies to

tp=div(([ωm]×ξ+1κ(ξ)ξ2ξ)p)+12(h(ξ)1ξ2Δ𝕊2+Δ)p,\displaystyle\partial_{t}p=\mathrm{div}({\left([\omega_{m}]_{\times}\xi+\frac{1-\kappa(\xi)}{\|\xi\|^{2}}\xi\right)p})+\frac{1}{2}\left(\frac{h(\xi)-1}{\|\xi\|^{2}}\Delta_{\mathbb{S}^{2}}+\Delta\right)p, (92)

where Δ\Delta is the standard Laplacian on 3\mathbb{R}^{3}, and Δ𝕊2\Delta_{\mathbb{S}^{2}} is the standard Laplace-Beltrami operator on the unit 22-sphere. Formula (90) is the expression for the standard round metric on 𝕊3\mathbb{S}^{3} written in exponential coordinates. In general, the metric (89) is obtained by left-translating the inner product induced by QηQ_{\eta} on the Lie algebra of 𝕊3\mathbb{S}^{3}. Therefore, the FPE (88) is closely related to the heat equation on 𝕊3\mathbb{S}^{3} equipped with an arbitrary left-invariant Riemannian metric on 𝕊3\mathbb{S}^{3}. This is natural considering that the SDE (87) was derived from a diffusive process on 𝕊3\mathbb{S}^{3}, i.e. eqn. (83).

Schulman Schulman68 was the first to write down a closed-form expression for the heat kernel for 𝕊3\mathbb{S}^{3} equipped with the round metric (90). Leveraging Schulman’s formula, it is possible to give the fundamental solution K:[0,)×Bπ(0)(0,)K:[0,\infty)\times B_{\pi}(0)\rightarrow(0,\infty) to (92) when ωm=0\omega_{m}=0:

K(t,ξ)=h(ξ)et/2(2πt)3/2nξ+2πnsinξe(ξ+2πn)22t.\displaystyle K(t,\xi)=\frac{h(\xi)e^{t/2}}{(2\pi t)^{3/2}}\sum_{n\in\mathbb{Z}}\frac{\|\xi\|+2\pi n}{\sin{\|\xi\|}}e^{-\frac{\left(\|\xi\|+2\pi n\right)^{2}}{2t}}. (93)

One may verify that K0K\geq 0 and is normalized on Bπ(0)3B_{\pi}(0)\subset\mathbb{R}^{3}, and hence defines a probability density on Bπ(0)B_{\pi}(0). If p0:Bπ(0)[0,)p_{0}:B_{\pi}(0)\rightarrow[0,\infty) is some initial probability density, it follows from the left-invariance of the heat kernel faraut2008analysis that

p(t,ξ)=Bπ(0)K(t,z(ξ,δ))p0(δ)dδ\displaystyle p(t,\xi)=\int_{B_{\pi}(0)}K(t,z(\xi,\delta))p_{0}(\delta)\mathrm{d}\delta (94)

is normalized, satisfies (92) (with ωm=0\omega_{m}=0), and limt0p(t,ξ)=p0(ξ)\lim_{t\rightarrow 0}p(t,\xi)=p_{0}(\xi), where

z(ξ,δ)=𝒱𝔰(BCH(𝔰(ξ),𝔰(δ)))\displaystyle z(\xi,\delta)=\mathcal{V}_{\mathfrak{s}}({\mathrm{BCH}({\mathcal{M}_{\mathfrak{s}}({\xi})},{-\mathcal{M}_{\mathfrak{s}}({\delta})})}). (95)

Formula (94) essentially reflects the fact that convolution against the heat kernel on a Lie group yields the propagated solution to the heat equation on the group. Convolution is in terms of the group operation, which in this case arises from the underlying group structure on 𝕊3\mathbb{S}^{3}. From the Chapman-Kolmogorov equation, (93) essentially gives the class of probability distributions which are invariant under propagation through (87) in the case of isotropic QηQ_{\eta}. This observation should be compared with the canonical example of the heat kernel on n\mathbb{R}^{n} endowed with the standard Euclidean metric. This is the Gaussian function K(x,y,t)=1(4πt)n/2exp(xy24t)K(x,y,t)=\frac{1}{(4\pi t)^{n/2}}\exp{\left(-\frac{\|x-y\|^{2}}{4t}\right)}, for x,ynx,y\in\mathbb{R}^{n}, which satisfies the heat equation tK=ΔK\partial_{t}K=\Delta K on n\mathbb{R}^{n}. The Chapman-Kolmogorov equation for the previous heat kernel encodes the well-known fact that the convolution of two Gaussians is again a Gaussian, a fact which is the essence of the propagation step in the classical linear Kalman filter. Notice that (93) is well-approximated by the Gaussian eξ2/2t/(2πt)3/2e^{-\|\xi\|^{2}/2t}/(2\pi t)^{3/2} for t1t\ll 1 and ξπ\|\xi\|\ll\pi, which provides theoretical support for earlier claims that CGs are the appropriate class of distributions to filter with respect to on Lie groups, even in the presence of noise.
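The normalization claim for (93) can be verified numerically. In the radial integrand 4πr2K(t,r)4\pi r^{2}K(t,r), the factor r2h(ξ)/sinξr^{2}h(\xi)/\sin\|\xi\| reduces to sinξ\sin\|\xi\|, leaving a smooth integrand with no singular endpoints. The truncation |n|10|n|\leq 10 and the value of tt below are our choices, both ample for the regime shown.

```python
# Numerical check that the heat kernel (93) is normalized on B_pi(0).
import math

def radial_integrand(t, r):  # 4*pi*r^2*K(t, xi) with ||xi|| = r
    pref = 4.0 * math.pi * math.exp(t / 2.0) / (2.0 * math.pi * t) ** 1.5
    series = sum((r + 2.0 * math.pi * n)
                 * math.exp(-(r + 2.0 * math.pi * n) ** 2 / (2.0 * t))
                 for n in range(-10, 11))
    return pref * math.sin(r) * series

t = 0.5
N = 2000  # Simpson's rule on [0, pi]
hs = math.pi / N
total = sum((1 if k in (0, N) else (4 if k % 2 else 2))
            * radial_integrand(t, k * hs)
            for k in range(N + 1)) * hs / 3.0
print(total)  # integrates to 1: K defines a probability density on B_pi(0)
```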

Using left-invariance, a relatively simple modification of (93) yields the fundamental solution, Kωm(t,ξ)K_{\omega_{m}}(t,\xi), for (92) when ωm0\omega_{m}\neq 0: Kωm(t,ξ)=K(t,z(ξ,ωm))K_{\omega_{m}}(t,\xi)=K(t,z(\xi,\omega_{m})), where zz was defined in (95). A further generalization of (93) to the case of non-isotropic QηQ_{\eta} (i.e., the fundamental solution to (88)) would require, at a minimum, a closed-form for the geodesic distance on 𝕊3\mathbb{S}^{3} equipped with an arbitrary left-invariant Riemannian metric, and such a formula appears to be unknown. A class of likelihood functions which leave (93) invariant under a Bayesian update is also unknown, and we suspect there is no such "natural" observation process (as compared to the classical linear Kalman filter where linear measurements with Gaussian distributed measurement noise leaves the class of Gaussian priors invariant under a Bayesian update). If these problems could be solved, however, then one would have a theory of attitude filtering on 𝕊3\mathbb{S}^{3} under the paradigm of DMR that is analogous to Kalman’s classical theory on Euclidean space.

As described at the end of Section VI.1, there is a short-time expansion that approximates (92) to yield a simpler FPE for which the moment evolution equations may be solved exactly. This approximate FPE is obtained by truncating the Taylor expansions of κ\kappa (86) and hh (91) to second order, and then substituting these truncated expansions for κ\kappa and hh in (92). The resulting FPE reads

tp=div(([ωm]×+qη212I3)ξp)+qη22(112Δ𝕊2+Δ)p,\displaystyle\partial_{t}p=\mathrm{div}({\left([\omega_{m}]_{\times}+\frac{q_{\eta}^{2}}{12}I_{3}\right)\xi p})+\frac{q_{\eta}^{2}}{2}\left(\frac{1}{12}\Delta_{\mathbb{S}^{2}}+\Delta\right)p, (96)

where qηq_{\!\eta} is defined via Qη=qη2I3Q_{\eta}=q_{\eta}^{2}I_{3}. The moment evolution equations for the density which satisfies (96) read

{dξ^dt=([ωm]×+qη26I3)ξ^dPdt=([ωm]×+qη212I3)PP([ωm]×+qη212I3)Tqη212(PTr(P)I3)+qη2I3,\displaystyle\left\{\begin{array}[]{l}\dfrac{\mathrm{d}\widehat{\xi}}{\mathrm{d}t}=-\left([\omega_{m}]_{\times}+\dfrac{q_{\!\eta}^{2}}{6}I_{3}\right)\widehat{\xi}\\[5.0pt] \dfrac{\mathrm{d}P}{\mathrm{d}t}=-\left([\omega_{m}]_{\times}+\dfrac{q_{\!\eta}^{2}}{12}I_{3}\right)P-P\left([\omega_{m}]_{\times}+\dfrac{q_{\!\eta}^{2}}{12}I_{3}\right)^{T}-\dfrac{q_{\!\eta}^{2}}{12}\left(P-\mathrm{Tr}({P})I_{3}\right)+q_{\!\eta}^{2}I_{3},\end{array}\right. (99)

where ξ^:=𝔼[ξ]\widehat{\xi}:=\mathbb{E}[{\xi}] and P=Cov[ξ]P=\mathrm{Cov}[{\xi}]. The evolution equations (99) are both linear, a desirable feature if computational speed-up is required.
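The linearity of (99) makes the moment equations cheap to integrate. The sketch below does so with a classical RK4 step and checks two exact consequences of (99): since [ωm]×[\omega_{m}]_{\times} is skew (norm-preserving), ξ^(t)=eqη2t/6ξ^(0)\|\widehat{\xi}(t)\|=e^{-q_{\eta}^{2}t/6}\|\widehat{\xi}(0)\|, and the covariance trace obeys TrP(t)=TrP(0)+3qη2t\mathrm{Tr}\,P(t)=\mathrm{Tr}\,P(0)+3q_{\eta}^{2}t. Parameter values and step size are our illustrative choices.

```python
# RK4 integration of the linear moment equations (99), checked against two
# exact invariants of (99). Parameters are arbitrary illustrative choices.
import math

q2 = 0.3 ** 2            # q_eta^2
wm = [0.2, -0.1, 0.3]    # measured angular rate omega_m

def hat(w):
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

W = hat(wm)

def rhs_mean(x):  # d xi_hat/dt = -([w_m]_x + (q2/6) I) xi_hat
    return [-(sum(W[i][j] * x[j] for j in range(3)) + q2 / 6.0 * x[i])
            for i in range(3)]

def rhs_cov(P):   # dP/dt per the second line of (99); P stays symmetric
    A = [[W[i][j] + (q2 / 12.0) * (i == j) for j in range(3)] for i in range(3)]
    AP = [[sum(A[i][k] * P[k][j] for k in range(3)) for j in range(3)]
          for i in range(3)]
    tr = P[0][0] + P[1][1] + P[2][2]
    return [[-AP[i][j] - AP[j][i]
             - (q2 / 12.0) * (P[i][j] - tr * (i == j))
             + q2 * (i == j) for j in range(3)] for i in range(3)]

def step(x, P, dt):  # one classical RK4 step for the coupled linear ODEs
    def adv(kx, kP, h):
        return ([x[i] + h * kx[i] for i in range(3)],
                [[P[i][j] + h * kP[i][j] for j in range(3)] for i in range(3)])
    k1x, k1P = rhs_mean(x), rhs_cov(P)
    x2, P2 = adv(k1x, k1P, dt / 2.0)
    k2x, k2P = rhs_mean(x2), rhs_cov(P2)
    x3, P3 = adv(k2x, k2P, dt / 2.0)
    k3x, k3P = rhs_mean(x3), rhs_cov(P3)
    x4, P4 = adv(k3x, k3P, dt)
    k4x, k4P = rhs_mean(x4), rhs_cov(P4)
    xn = [x[i] + dt / 6.0 * (k1x[i] + 2*k2x[i] + 2*k3x[i] + k4x[i])
          for i in range(3)]
    Pn = [[P[i][j] + dt / 6.0 * (k1P[i][j] + 2*k2P[i][j] + 2*k3P[i][j]
                                 + k4P[i][j]) for j in range(3)]
          for i in range(3)]
    return xn, Pn

x = [0.1, 0.05, -0.02]
P = [[0.02, 0.005, 0.0], [0.005, 0.03, 0.001], [0.0, 0.001, 0.015]]
nrm0 = math.sqrt(sum(c * c for c in x))
trP0 = P[0][0] + P[1][1] + P[2][2]

dt, T = 1e-3, 1.0
for _ in range(int(T / dt)):
    x, P = step(x, P, dt)

mean_err = abs(math.sqrt(sum(c * c for c in x))
               - math.exp(-q2 * T / 6.0) * nrm0)
trace_err = abs((P[0][0] + P[1][1] + P[2][2]) - (trP0 + 3.0 * q2 * T))
print(mean_err, trace_err)  # both near machine precision
```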

VII.2 SE3\mathrm{SE}_{3}

Consider the special Euclidean group in 33 dimensions, SE3\mathrm{SE}_{3}, which is the semidirect product of SO3\mathrm{SO}_{3} and 3\mathbb{R}^{3}, typically written as SE3=SO33\mathrm{SE}_{3}=\mathrm{SO}_{3}\ltimes\mathbb{R}^{3}. An element ASE3A\in\mathrm{SE}_{3} will be represented as a 4×44\times 4 matrix of the form

A=(Rr01)GL4,\displaystyle A=\left(\begin{array}[]{cc}R&r\\ 0&1\end{array}\right)\in\mathrm{GL}_{4}, (102)

where RSO3R\in\mathrm{SO}_{3} and r3r\in\mathbb{R}^{3}. The elements of SE3\mathrm{SE}_{3} may be ascribed a physical context as the position r3r\in\mathbb{R}^{3} and attitude RSO3R\in\mathrm{SO}_{3} of a rigid body. If the rigid body is subject to an angular velocity ω3\omega\in\mathbb{R}^{3} and r3r\in\mathbb{R}^{3} describes its position in the reference frame of the rigid body, then the attitude R(t)R(t) and position r(t)r(t) of the body at any instant in time tt satisfy

{R˙(t)=[ω]×R(t)r˙(t)=[ω]×r(t)+v.\displaystyle\left\{\begin{array}[]{lc}\dot{R}(t)=[\omega]_{\times}R(t)\\[5.0pt] \dot{r}(t)=[\omega]_{\times}r(t)+v.\end{array}\right. (105)

The ODEs (105) may be related to the INS problem as follows. Let RTibR\equiv T_{i}^{b} be the rotation from an inertial frame to a body frame. Then,

R˙=T˙ib=Tib[ωbii]×=Tib[ωbii]×TbiTib=[ωbib]×Tib,\displaystyle\dot{R}=\dot{T}_{i}^{b}=T_{i}^{b}[\omega^{i}_{bi}]_{\times}=T_{i}^{b}[\omega^{i}_{bi}]_{\times}T_{b}^{i}T_{i}^{b}=[\omega^{b}_{bi}]_{\times}T_{i}^{b}, (106)

where ωbii\omega^{i}_{bi} denotes the angular rate of the inertial frame with respect to the body frame, expressed in the inertial frame, and likewise for ωbib\omega^{b}_{bi}. Let rrbr\equiv r^{b}, where rb=Tibrir^{b}=T_{i}^{b}r^{i}, i.e. the position expressed in the body frame. Then,

r˙=r˙b=Tib([ωbii]×ri+r˙i)=Tib[ωbii]×Tbirb+vb=[ωbib]×rb+vb,\displaystyle\dot{r}=\dot{r}^{b}=T_{i}^{b}\left([\omega^{i}_{bi}]_{\times}r^{i}+\dot{r}^{i}\right)=T_{i}^{b}[\omega^{i}_{bi}]_{\times}T_{b}^{i}r^{b}+v^{b}=[\omega^{b}_{bi}]_{\times}r^{b}+v^{b}, (107)

where vb=Tibviv^{b}=T_{i}^{b}v^{i} is the velocity expressed in the body frame. From the previous two equations, ωωbib\omega\equiv\omega^{b}_{bi} and vvbv\equiv v^{b} in (105).

The Lie algebra 𝔰𝔢3\mathfrak{se}_{3} of SE3\mathrm{SE}_{3} consists of matrices of the form

𝔰𝔢3(ω,v)=([ω]×v00),\displaystyle\mathcal{M}_{\mathfrak{se}_{3}}({\omega,v})=\left(\begin{array}[]{cc}[\omega]_{\times}&v\\ 0&0\end{array}\right), (110)

where ω,v3\omega,v\in\mathbb{R}^{3}. The exponential map is

e𝔰𝔢3(ω,v)=(e[ω]×J¯𝔰𝔬3(ω)v01),\displaystyle e^{\mathcal{M}_{\mathfrak{se}_{3}}({\omega,v})}=\left(\begin{array}[]{cc}e^{[\omega]_{\times}}&\overline{J}_{\!\mathfrak{so}_{3}}(-\omega)v\\ 0&1\end{array}\right), (113)

where

J¯𝔰𝔬3(ω):=01es[ω]×ds.\displaystyle\overline{J}_{\!\mathfrak{so}_{3}}(\omega):=\int_{0}^{1}e^{-s[\omega]_{\times}}\mathrm{d}s. (114)

Therefore, the curve

A(t)=et𝔰𝔢3(ω,v)A(0)=(et[ω]×R0et[ω]×r0+0te(ts)[ω]×vds01)SE3,\displaystyle A(t)=e^{t\mathcal{M}_{\mathfrak{se}_{3}}({\omega,v})}A(0)=\left(\begin{array}[]{cc}e^{t[\omega]_{\times}}R_{0}&e^{t[\omega]_{\times}}r_{0}+\int_{0}^{t}e^{(t-s)[\omega]_{\times}}v\mathrm{d}s\\ 0&1\end{array}\right)\in\mathrm{SE}_{3}, (117)

is the solution of (105) written in a Lie algebraic way. The vector field in A˙(t)=𝔰𝔢3(ω,v)A(t)\dot{A}(t)=\mathcal{M}_{\mathfrak{se}_{3}}({\omega,v})A(t) is of the form (36) (with D=0D=0) and is therefore group-affine. Hence, in light of the group structure of SE3\mathrm{SE}_{3}, the "natural" state vector to estimate is x=(Tib,rb)x=(T_{i}^{b},r^{b}), assuming one desires to use CGs over SE3\mathrm{SE}_{3} to capture the uncertainty associated with xx.
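The closed-form exponential (113)-(114) can be cross-checked against a direct series evaluation of the 4×44\times 4 matrix exponential. In the sketch below, J¯𝔰𝔬3(ω)v\overline{J}_{\!\mathfrak{so}_{3}}(-\omega)v is evaluated via the closed form of 01es[ω]×ds\int_{0}^{1}e^{s[\omega]_{\times}}\mathrm{d}s (obtainable from Rodrigues' formula (84)); the test point (ω,v)(\omega,v) is our choice.

```python
# Cross-check of the closed-form SE(3) exponential (113) against a plain
# truncated Taylor series for the 4x4 matrix exponential.
import math

def hat(w):
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]

def mexp_taylor(M, terms=40):  # exp(M) = sum_k M^k / k!, truncated
    n = len(M)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    T = [row[:] for row in S]
    for k in range(1, terms + 1):
        T = [[sum(T[i][l] * M[l][j] for l in range(n)) / k for j in range(n)]
             for i in range(n)]
        S = [[S[i][j] + T[i][j] for j in range(n)] for i in range(n)]
    return S

w = [0.3, -0.5, 0.2]   # arbitrary test point
v = [1.0, -2.0, 0.5]
r = math.sqrt(sum(c * c for c in w))

K = hat(w)
M4 = [[K[0][0], K[0][1], K[0][2], v[0]],
      [K[1][0], K[1][1], K[1][2], v[1]],
      [K[2][0], K[2][1], K[2][2], v[2]],
      [0.0, 0.0, 0.0, 0.0]]
E = mexp_taylor(M4)

P = [[w[i] * w[j] / r**2 for j in range(3)] for i in range(3)]
I3 = [[float(i == j) for j in range(3)] for i in range(3)]
# rotation block via Rodrigues' formula (84)
R = [[P[i][j] + math.cos(r) * (I3[i][j] - P[i][j]) + math.sin(r) / r * K[i][j]
      for j in range(3)] for i in range(3)]
# J-bar_{so3}(-w) = \int_0^1 e^{s[w]_x} ds in closed form
J = [[P[i][j] + math.sin(r) / r * (I3[i][j] - P[i][j])
      + (1.0 - math.cos(r)) / r**2 * K[i][j] for j in range(3)]
     for i in range(3)]
trans = [sum(J[i][j] * v[j] for j in range(3)) for i in range(3)]

err = max(max(abs(E[i][j] - R[i][j]) for i in range(3) for j in range(3)),
          max(abs(E[i][3] - trans[i]) for i in range(3)))
print(err)  # the series exponential matches the closed form (113)
```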

Similar to the SU2\mathrm{SU}_{2}-case, the Lie algebra coordinates (ω,v)6(\omega,v)\in\mathbb{R}^{6} are treated as the measured quantities. The measured angular rate and velocity satisfy ωm=ω+η\omega_{m}=\omega+\eta and vm=v+ζv_{m}=v+\zeta, respectively, with η𝒩3(0,Qη)\eta\sim\mathcal{N}^{3}(0,Q_{\eta}) and ζ𝒩3(0,Qζ)\zeta\sim\mathcal{N}^{3}(0,Q_{\zeta}). Under DMR, the Langevin equation governing propagation in between measurement updates then reads

A˙=(𝔰𝔢3(ωm,vm)𝔰𝔢3(η,ζ))A.\displaystyle\dot{A}=\left(\mathcal{M}_{\mathfrak{se}_{3}}({\omega_{m},v_{m}})-\mathcal{M}_{\mathfrak{se}_{3}}({\eta,\zeta})\right)A. (118)

Eqn. (118) is of the form (52). Again, the first task is to compute ad¯\overline{\mathrm{ad}} for 𝔰𝔢3\mathfrak{se}_{3}. A computation shows, for (ω,v)3×3(\omega,v)\in\mathbb{R}^{3}\times\mathbb{R}^{3},

ad¯(ω,v)=([ω]×0[v]×[ω]×).\displaystyle\overline{\mathrm{ad}}_{(\omega,v)}=\left(\begin{array}[]{cc}[\omega]_{\times}&0\\ ~{}[v]_{\times}&[\omega]_{\times}\end{array}\right). (121)

Using the Van Loan method van1978computing to compute the matrix exponential yields

W¯𝔰𝔢3(ω,v)=(W¯𝔰𝔬3(ω)0W¯𝔰𝔬3(ω)N(ω,v)W¯𝔰𝔬3(ω)W¯𝔰𝔬3(ω)),\displaystyle\overline{W}_{\!\!\mathfrak{se}_{3}}(\omega,v)=\left(\begin{array}[]{cc}\overline{W}_{\!\!\mathfrak{so}_{3}}(\omega)&0\\ -\overline{W}_{\!\!\mathfrak{so}_{3}}(\omega)N(\omega,v)\overline{W}_{\!\!\mathfrak{so}_{3}}(\omega)&\overline{W}_{\!\!\mathfrak{so}_{3}}(\omega)\end{array}\right), (124)

where

W¯𝔰𝔬3(ω):=(01es[ω]×ds)1,\displaystyle\overline{W}_{\!\!\mathfrak{so}_{3}}(\omega):=\left(\int_{0}^{1}e^{s[\omega]_{\times}}\mathrm{d}s\right)^{-1}, (125)

and

N(ω,v):=01et[ω]×[0tes[ω]×vds]×dt.\displaystyle N(\omega,v):=\int_{0}^{1}e^{t[\omega]_{\times}}[\int_{0}^{t}e^{-s[\omega]_{\times}}v\mathrm{d}s]_{\times}\mathrm{d}t. (126)

Note that W¯𝔰𝔬3(ω)\overline{W}_{\!\!\mathfrak{so}_{3}}(\omega) is essentially given by (85) after a simple rescaling of variables. Plugging (121)-(124) into (59) yields the TSSDE for SE3\mathrm{SE}_{3} corresponding to (118):

ddt(δw)=([ωm]×0[vm]×[ωm]×)(δw)+W¯𝔰𝔢3(δ,w)(ηζ)\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}\left(\begin{array}[]{c}\delta\\ w\end{array}\right)=\left(\begin{array}[]{cc}[\omega_{m}]_{\times}&0\\ ~{}[v_{m}]_{\times}&[\omega_{m}]_{\times}\end{array}\right)\left(\begin{array}[]{c}\delta\\ w\end{array}\right)+\overline{W}_{\!\!\mathfrak{se}_{3}}(\delta,w)\left(\begin{array}[]{c}\eta\\ \zeta\end{array}\right) (135)

in which δ,w3\delta,w\in\mathbb{R}^{3} are the coordinates parameterizing the Lie algebra 𝔰𝔢3\mathfrak{se}_{3}.

VII.3 SE2,3\mathrm{SE}_{2,3}

Ultimately, SE3\mathrm{SE}_{3} is not the Lie group appropriate to describe state estimation from IMU measurements, since it is not the body frame velocity vbv^{b} the IMU measures, but rather the body frame acceleration aba^{b} (minus gravity). This complicates things in an interesting way. First, an appropriate Lie group needs to be identified. Inspired by the results of Section VII.2, consider the state vector given by x=(Tib,rb,vb)x=(T_{i}^{b},r^{b},v^{b}). This state vector form motivates the group SE2,3GL5\mathrm{SE}_{2,3}\subset\mathrm{GL}_{5} given by matrices of the form

A=(Rvr010001),\displaystyle A=\left(\begin{array}[]{ccc}R&v&r\\ 0&1&0\\ 0&0&1\end{array}\right), (139)

where RSO3R\in\mathrm{SO}_{3} and r,v3r,v\in\mathbb{R}^{3} (thought of as 3×13\times 1 column vectors). This makes SE2,3=SO36\mathrm{SE}_{2,3}=\mathrm{SO}_{3}\ltimes\mathbb{R}^{6}. To the best of our knowledge, the group SE2,3\mathrm{SE}_{2,3} first appeared in barrau2016invariant , and is referred to as the group of double direct spatial isometries.

Notice that v˙b=[ωbib]×vb+ab+Tibgi\dot{v}^{b}=[\omega^{b}_{bi}]_{\times}v^{b}+a^{b}+T_{i}^{b}g^{i}, where gi3g^{i}\in\mathbb{R}^{3} is the gravitational acceleration in an inertial frame and will be treated as constant (i.e., independent of rir^{i}). (In what follows, drop the superscripts bb and ii for notational simplicity.) The equations of motion on SE2,3\mathrm{SE}_{2,3} read

{R˙(t)=[ω]×R(t)r˙(t)=[ω]×r(t)+v(t)v˙(t)=[ω]×v(t)+a+R(t)g,\displaystyle\left\{\begin{array}[]{lc}\dot{R}(t)=[\omega]_{\times}R(t)\\[5.0pt] \dot{r}(t)=[\omega]_{\times}r(t)+v(t)\\[5.0pt] \dot{v}(t)=[\omega]_{\times}v(t)+a+R(t)g,\end{array}\right. (143)

or, equivalently,

A(t)=𝔰𝔢2,3(ω,a+R(t)g,v(t))A(t),\displaystyle A^{\prime}(t)=\mathcal{M}_{\mathfrak{se}_{2,3}}({\omega,a+R(t)g,v(t)})A(t), (144)

where

𝔰𝔢2,3(ω,a,v)=([ω]×av000000),\displaystyle\mathcal{M}_{\mathfrak{se}_{2,3}}({\omega,a,v})=\left(\begin{array}[]{ccc}[\omega]_{\times}&a&v\\ 0&0&0\\ 0&0&0\end{array}\right), (148)

is a member of the Lie algebra 𝔰𝔢2,3\mathfrak{se}_{2,3} of SE2,3\mathrm{SE}_{2,3}.

Eqn. (144) does not fall into the class of constant dynamical systems because of the terms involving the gravitational acceleration gg and the velocity variables vv. This results in a distinct departure from the SU2\mathrm{SU}_{2} and SE3\mathrm{SE}_{3} examples considered so far. In an example INS application under DMR (see Sec. II), ω\omega and aa in (143) are replaced by ωmη\omega_{m}-\eta and amζa_{m}-\zeta, where ωm\omega_{m} and ama_{m} are the measured angular rate and body frame acceleration, respectively, and η𝒩3(0,Qη)\eta\sim\mathcal{N}^{3}(0,Q_{\eta}) and ζ𝒩3(0,Qζ)\zeta\sim\mathcal{N}^{3}(0,Q_{\zeta}). ωm\omega_{m} and ama_{m} are treated as constants over the time interval of integration. A model for gg (in the inertial frame) is typically assumed. The vector field A𝔰𝔢2,3(ωm,am+Rg,v)AA\mapsto\mathcal{M}_{\mathfrak{se}_{2,3}}({\omega_{m},a_{m}+Rg,v})A is verified to be group-affine, assuming gg is independent of rr. The fidelity of this assumption depends on the scenario being considered. According to Theorem 1, there must exist a linear equation of the form (60) on the Lie algebra corresponding to the ODE A˙=𝔰𝔢2,3(ωm,am+Rg,v)A\dot{A}=\mathcal{M}_{\mathfrak{se}_{2,3}}({\omega_{m},a_{m}+Rg,v})A. This will be shown to be the case. It is noteworthy that the DP group law on SO3×6\mathrm{SO}_{3}\times\mathbb{R}^{6}, corresponding to the state vector x=(Tib,ri,vi)x=(T_{i}^{b},r^{i},v^{i}), fails to make the dynamics

{R˙(t)=[ω]×R(t)r˙(t)=v(t)v˙(t)=RT(t)a+g\displaystyle\left\{\begin{array}[]{lc}\dot{R}(t)=[\omega]_{\times}R(t)\\[5.0pt] \dot{r}(t)=v(t)\\[5.0pt] \dot{v}(t)=R^{T}(t)a+g\end{array}\right.

group-affine.

Let the Lie algebra coordinates be partitioned as ξ=(δ,u,w)9\xi=(\delta,u,w)\in\mathbb{R}^{9}. The exponential map on SE2,3\mathrm{SE}_{2,3} reads

e𝔰𝔢2,3(ξ)=(e[δ]×J¯𝔰𝔬3(δ)uJ¯𝔰𝔬3(δ)w010001),\displaystyle e^{\mathcal{M}_{\mathfrak{se}_{2,3}}({\xi})}=\left(\begin{array}[]{ccc}e^{[\delta]_{\times}}&\overline{J}_{\!\mathfrak{so}_{3}}(-\delta)u&\overline{J}_{\!\mathfrak{so}_{3}}(-\delta)w\\ 0&1&0\\ 0&0&1\end{array}\right), (152)

where J¯𝔰𝔬3\overline{J}_{\!\mathfrak{so}_{3}} was introduced in (114). Supply (144) with the initial condition A(0)(R0,v0,r0)A(0)\equiv(R_{0},v_{0},r_{0}). The MAP μ(t)SE2,3\mu(t)\in\mathrm{SE}_{2,3} is then given by

μ(t)=(et[ωm]×R0v^(t)et[ωm]×r0+0te(ts)[ωm]×v^(s)ds010001),\displaystyle\mu(t)=\left(\begin{array}[]{ccc}e^{t[\omega_{m}]_{\times}}R_{0}&\widehat{v}(t)&e^{t[\omega_{m}]_{\times}}r_{0}+\int_{0}^{t}e^{(t-s)[\omega_{m}]_{\times}}\widehat{v}(s)\mathrm{d}s\\ 0&1&0\\ 0&0&1\end{array}\right), (156)

where v^(t)=et[ωm]×v0+0te(ts)[ωm]×(am+es[ωm]×R0g)ds\widehat{v}(t)=e^{t[\omega_{m}]_{\times}}v_{0}+\int_{0}^{t}e^{(t-s)[\omega_{m}]_{\times}}(a_{m}+e^{s[\omega_{m}]_{\times}}R_{0}g)\mathrm{d}s. Starting from eqn. (34), a derivation using (152)-(156) shows that eqn. (60) in the case of SE2,3\mathrm{SE}_{2,3} becomes

ddt(δuw)=([ωm]×00[am]×[ωm]×00I[ωm]×)(δuw)W¯𝔰𝔢2,3(ξ)(ηζ0),\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}\left(\begin{array}[]{c}\delta\\ u\\ w\end{array}\right)=\left(\begin{array}[]{ccc}~{}[\omega_{m}]_{\times}&0&0\\ ~{}[a_{m}]_{\times}&[\omega_{m}]_{\times}&0\\ 0&I&[\omega_{m}]_{\times}\end{array}\right)\left(\begin{array}[]{c}\delta\\ u\\ w\end{array}\right)-\overline{W}_{\!\!\mathfrak{se}_{2,3}}(\xi)\left(\begin{array}[]{c}\eta\\ \zeta\\ 0\end{array}\right), (169)

where

W¯𝔰𝔢2,3(ξ)=(W¯𝔰𝔬3(δ)00W¯𝔰𝔬3(δ)N(δ,u)W¯𝔰𝔬3(δ)W¯𝔰𝔬3(δ)0W¯𝔰𝔬3(δ)N(δ,w)W¯𝔰𝔬3(δ)0W¯𝔰𝔬3(δ)),\displaystyle\overline{W}_{\!\!\mathfrak{se}_{2,3}}(\xi)=\left(\begin{array}[]{ccc}\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)&0&0\\ -\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)N(\delta,u)\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)&\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)&0\\ -\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)N(\delta,w)\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)&0&\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)\end{array}\right), (173)

with N:63×3N:\mathbb{R}^{6}\rightarrow\mathbb{R}^{3\times 3} given by (126). It is interesting that, even in the noiseless case, the evolution of the MAP is not "constant", but the evolution on the Lie algebra is linear. It is also noteworthy that the gravitational acceleration does not appear in (169). This provides a non-trivial example in which filtering on the group with a CG yields a propagation stage with complexity equal to that of standard Kalman filtering, but which exactly incorporates nonlinearity missed by conventional Kalman approaches.
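Because (169) is linear with a constant coefficient matrix (for ωm and am held fixed over the propagation interval), the tangent space covariance obeys the Lyapunov ODE Σ̇ = FΣ + ΣFᵀ + Q̃ with Q̃ = W̄QW̄ᵀ. The following sketch uses the small-ξ approximation W̄ ≈ I and hypothetical rates; it is not the paper's CTUT propagation, just a direct integration of the moment equation:

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def f_matrix(omega_m, a_m):
    # the 9x9 coefficient matrix of (169), acting on (delta, u, w)
    F = np.zeros((9, 9))
    F[0:3, 0:3] = hat(omega_m)
    F[3:6, 0:3] = hat(a_m)
    F[3:6, 3:6] = hat(omega_m)
    F[6:9, 3:6] = np.eye(3)
    F[6:9, 6:9] = hat(omega_m)
    return F

def propagate_cov(F, Sigma0, Q, t, n=400):
    # RK4 integration of the Lyapunov ODE  Sigma_dot = F Sigma + Sigma F^T + Q
    dt = t / n
    S = Sigma0.copy()
    rhs = lambda S: F @ S + S @ F.T + Q
    for _ in range(n):
        k1 = rhs(S); k2 = rhs(S + 0.5 * dt * k1)
        k3 = rhs(S + 0.5 * dt * k2); k4 = rhs(S + dt * k3)
        S = S + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return S
```

In the noiseless case the result must equal the sandwich e^{Ft} Σ₀ e^{Fᵀt}, which provides a convenient self-check.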

VIII Gyro Bias Estimation

This section addresses the problem of jointly estimating both the attitude of a rigid body and the bias contributing to the measured angular rate from a gyroscope, as in the Farrenkopf model (15). The underlying state space here is the 66-dimensional manifold SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3}, where the state to be estimated is (R,β)(R,\beta), with RR the rotation matrix representing the attitude of the rigid body and β\beta the gyro bias. The Langevin equation describing the evolution of the state (R,β)(R,\beta) reads

{R˙=[ωmβη]×Rβ˙=ζ,\displaystyle\left\{\begin{array}[]{l}\dot{R}=[\omega_{m}-\beta-\eta]_{\times}R\\[5.0pt] \dot{\beta}=\zeta,\end{array}\right. (176)

where η,ζ\eta,\zeta are defined in (15).

There is more than one group law one may assign to pairs (R,β)(R,\beta) which turns SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3} into a Lie group. The simplest is the DP group law, in which two pairs (R1,β1),(R2,β2)(R_{1},\beta_{1}),(R_{2},\beta_{2}) are combined according to (R1,β1)(R2,β2)=(R1R2,β1+β2)(R_{1},\beta_{1})(R_{2},\beta_{2})=(R_{1}R_{2},\beta_{1}+\beta_{2}). Another is the SE3\mathrm{SE}_{3} group law (i.e., a semidirect product), where (R1,β1)(R2,β2)=(R1R2,β1+R1β2)(R_{1},\beta_{1})(R_{2},\beta_{2})=(R_{1}R_{2},\beta_{1}+R_{1}\beta_{2}). Unfortunately, neither of these group laws makes the vector field f(R,β)=([ωmβ]×R,0)f(R,\beta)=([\omega_{m}-\beta]_{\times}R,0) group-affine. However, a time-parameterized family of group operations on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3} is identified that ensures the vector field ff is "almost" group-affine (and is group-affine when restricted to SO3\mathrm{SO}_{3}). This family corresponds to the SE3\mathrm{SE}_{3} group law to zeroth order in time, thus providing theoretical motivation for this choice of group law over, e.g., the DP group law. Furthermore, the numerical superiority of the SE3\mathrm{SE}_{3} group law over the DP group law for this state estimation problem is demonstrated in Section VIII.1.
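Both candidate group laws are easy to state concretely. A sketch that checks the group axioms numerically on random test elements (the rotations are generated by exponentiating random skew matrices):

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def dp(g1, g2):
    # direct-product law: (R1, b1)(R2, b2) = (R1 R2, b1 + b2)
    (R1, b1), (R2, b2) = g1, g2
    return (R1 @ R2, b1 + b2)

def se3(g1, g2):
    # semidirect-product (SE3) law: (R1, b1)(R2, b2) = (R1 R2, b1 + R1 b2)
    (R1, b1), (R2, b2) = g1, g2
    return (R1 @ R2, b1 + R1 @ b2)

def se3_inv(g):
    # inverse under the SE3 law
    R, b = g
    return (R.T, -R.T @ b)

def close(g, h):
    return np.allclose(g[0], h[0]) and np.allclose(g[1], h[1])
```

Associativity and the inverse property can then be verified directly on sampled elements.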

Identify SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3} with SO3×𝔰𝔬3\mathrm{SO}_{3}\times\mathfrak{so}_{3} via the map (R,β)(R,[β]×)(R,\beta)\mapsto(R,[\beta]_{\times}). Consider a family of Lie groups GtG_{t} parameterized by time tt\in\mathbb{R} where for each tt the underlying manifold of GtG_{t} is simply SO3×𝔰𝔬3\mathrm{SO}_{3}\times\mathfrak{so}_{3}. The group operation is a function of tt, and is defined via

(R1[β1]×)t(R2[β2]×)=(R1R2t1BCH(tR1[β2]×R1T,t[β1]×)).\displaystyle\left(\begin{array}[]{c}R_{1}\\ ~{}[\beta_{1}]_{\times}\end{array}\right)\circ_{t}\left(\begin{array}[]{c}R_{2}\\ ~{}[\beta_{2}]_{\times}\end{array}\right)=\left(\begin{array}[]{c}R_{1}R_{2}\\ t^{-1}\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}})\end{array}\right). (183)

It may be verified that t\circ_{t} yields a genuine group law for each t0t\neq 0. (The only nontrivial aspect of this is checking the associativity of t\circ_{t}. This fact essentially boils down to the identity BCH(BCH(X,Y),Z)=BCH(X,BCH(Y,Z))\mathrm{BCH}({\mathrm{BCH}({X},{Y})},{Z})=\mathrm{BCH}({X},{\mathrm{BCH}({Y},{Z})}) for X,Y,Z𝔤X,Y,Z\in\mathfrak{g}, which, in turn, follows from associativity of matrix multiplication.) From the BCH series (76), observe that
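The group law (183) and its associativity can also be checked numerically by realizing BCH through matrix exponentials and principal logarithms (valid for small arguments). A SciPy sketch with small, hypothetical elements:

```python
import numpy as np
from scipy.linalg import expm, logm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def bch(X, Y):
    # BCH(X, Y) = log(e^X e^Y), principal branch (small arguments only)
    return np.real(logm(expm(X) @ expm(Y)))

def circ_t(t, g1, g2):
    # the time-parameterized group law (183) on SO3 x so3
    (R1, B1), (R2, B2) = g1, g2
    return (R1 @ R2, bch(t * (R1 @ B2 @ R1.T), t * B1) / t)
```

For small elements the principal-branch logarithm agrees with the BCH series, so associativity holds to numerical precision.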

1tBCH(tR1[β2]×R1T,t[β1]×)=[β1]×+R1[β2]×R1Tt2[[β1]×,R1[β2]×R1T]+𝒪(t2).\displaystyle\frac{1}{t}\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}})=[\beta_{1}]_{\times}+R_{1}[\beta_{2}]_{\times}R_{1}^{T}-\frac{t}{2}[[\beta_{1}]_{\times},R_{1}[\beta_{2}]_{\times}R_{1}^{T}]+\mathcal{O}(t^{2}). (184)

From the identity R1[β2]×R1T=[R1β2]×R_{1}[\beta_{2}]_{\times}R_{1}^{T}=[R_{1}\beta_{2}]_{\times}, the group law t\circ_{t} is seen to be equivalent to the SE3\mathrm{SE}_{3} group law at t=0t=0. Finally, we note that there is an explicit representation of the group law t\circ_{t} on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3} that doesn’t pass through 3𝔰𝔬3\mathbb{R}^{3}\simeq\mathfrak{so}_{3} and utilizes the closed form for the BCH series on 𝔰𝔬3\mathfrak{so}_{3}. Indeed, let c3c\in\mathbb{R}^{3} be the vector such that [c]×=BCH([a]×,[b]×)[c]_{\times}=\mathrm{BCH}({[a]_{\times}},{[b]_{\times}}). Then, in terms of the half angles, cc satisfies \|c\|=2\arccos\left(\cos(\|a\|/2)\cos(\|b\|/2)-\sin(\|a\|/2)\sin(\|b\|/2)\frac{\langle a,b\rangle}{\|a\|\|b\|}\right) and

\displaystyle\frac{\sin(\|c\|/2)}{\|c\|}c=\frac{\sin(\|a\|/2)\cos(\|b\|/2)}{\|a\|}a+\frac{\sin(\|b\|/2)\cos(\|a\|/2)}{\|b\|}b+\frac{\sin(\|a\|/2)\sin(\|b\|/2)}{\|a\|\|b\|}a\times b.
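This closed form can be cross-checked against the defining relation e^{[a]×}e^{[b]×} = e^{[c]×}. A sketch that computes c from the unit-quaternion (half-angle) product and verifies the relation with SciPy (inputs are hypothetical; nonzero, non-antipodal rotation vectors are assumed):

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def bch_so3(a, b):
    # rotation vector c with e^{hat(c)} = e^{hat(a)} e^{hat(b)}, computed
    # from the unit-quaternion product (half-angle convention)
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    ca, sa = np.cos(0.5 * na), np.sin(0.5 * na)
    cb, sb = np.cos(0.5 * nb), np.sin(0.5 * nb)
    w = ca * cb - sa * sb * np.dot(a, b) / (na * nb)
    vec = sa * cb * a / na + sb * ca * b / nb + sa * sb * np.cross(a, b) / (na * nb)
    angle = 2.0 * np.arctan2(np.linalg.norm(vec), w)
    return angle * vec / np.linalg.norm(vec)
```

For example, composing 90-degree rotations about x and then y gives a 120-degree rotation about (1, 1, 1)/√3, which the formula reproduces.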

For simplicity, set ωm=0\omega_{m}=0. Let us denote by ψt:SO3×𝔰𝔬3SO3×𝔰𝔬3\psi_{t}:\mathrm{SO}_{3}\times\mathfrak{so}_{3}\rightarrow\mathrm{SO}_{3}\times\mathfrak{so}_{3} the flow associated with (176) in the absence of noise. In particular,

ψt(R,[β]×)=(et[β]×R[β]×).\displaystyle\psi_{t}(R,[\beta]_{\times})=\left(\begin{array}[]{c}e^{-t[\beta]_{\times}}R\\ ~{}[\beta]_{\times}\end{array}\right). (187)

From (barrau2019linear, Proposition 8), group-affinity (under the group law (183)) would be equivalent to the existence of φtAut(Gt)\varphi_{t}\in\mathrm{Aut}({G_{t}}) and νtGt\nu_{t}\in G_{t} such that the flow ψt\psi_{t} (187) satisfies ψt(R,[β]×)=φt(R,[β]×)tνt\psi_{t}(R,[\beta]_{\times})=\varphi_{t}(R,[\beta]_{\times})\circ_{t}\nu_{t}. Since an automorphism maps the identity to itself, it must be the case that νt=(I3,0)\nu_{t}=(I_{3},0). In other words, the flow ψt\psi_{t} must be an automorphism of GtG_{t}, if the associated vector field is group-affine. Let us now analyze how close ψt\psi_{t} in (187) is to being an automorphism.

The defining property of an automorphism asserts that

ψt(R1R2,t1BCH(tR1[β2]×R1T,t[β1]×))=ψt(R1,β1)tψt(R2,β2),\displaystyle\psi_{t}(R_{1}R_{2},t^{-1}\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}}))=\psi_{t}(R_{1},\beta_{1})\circ_{t}\psi_{t}(R_{2},\beta_{2}), (188)

for all (R1,[β1]×),(R2,[β2]×)Gt(R_{1},[\beta_{1}]_{\times}),(R_{2},[\beta_{2}]_{\times})\in G_{t}. For the LHS of (188):

ψt(R1R2,t1BCH(tR1[β2]×R1T,t[β1]×))=(eBCH(tR1[β2]×R1T,t[β1]×)R1R2t1BCH(tR1[β2]×R1T,t[β1]×)),\displaystyle\psi_{t}(R_{1}R_{2},t^{-1}\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}}))=\left(\begin{array}[]{c}e^{-\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}})}R_{1}R_{2}\\ t^{-1}\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}})\end{array}\right), (191)

while the RHS of (188) reads

ψt(R1,β1)tψt(R2,β2)=(et[β1]×R1et[β2]×R2t1BCH(tet[β1]×R1[β2]×R1Tet[β1]×,t[β1]×)).\displaystyle\psi_{t}(R_{1},\beta_{1})\circ_{t}\psi_{t}(R_{2},\beta_{2})=\left(\begin{array}[]{c}e^{-t[\beta_{1}]_{\times}}R_{1}e^{-t[\beta_{2}]_{\times}}R_{2}\\ t^{-1}\mathrm{BCH}({te^{-t[\beta_{1}]_{\times}}R_{1}[\beta_{2}]_{\times}R_{1}^{T}e^{t[\beta_{1}]_{\times}}},{t[\beta_{1}]_{\times}})\end{array}\right). (194)

Comparing (191) and (194), the bias terms are seen to differ in the first argument of BCH by a conjugation by et[β1]×e^{-t[\beta_{1}]_{\times}}. For the SO3\mathrm{SO}_{3} term in (194), notice that

et[β1]×R1et[β2]×R2=et[β1]×etR1[β2]×R1TR1R2=eBCH(t[β1]×,tR1[β2]×R1T)R1R2.\displaystyle e^{-t[\beta_{1}]_{\times}}R_{1}e^{-t[\beta_{2}]_{\times}}R_{2}=e^{-t[\beta_{1}]_{\times}}e^{-tR_{1}[\beta_{2}]_{\times}R_{1}^{T}}R_{1}R_{2}=e^{\mathrm{BCH}({-t[\beta_{1}]_{\times}},{-tR_{1}[\beta_{2}]_{\times}R_{1}^{T}})}R_{1}R_{2}.

Recalling the BCH symmetry BCH(X,Y)=BCH(Y,X)\mathrm{BCH}({X},{Y})=-\mathrm{BCH}({-Y},{-X}), one concludes that

et[β1]×R1et[β2]×R2=eBCH(tR1[β2]×R1T,t[β1]×)R1R2.\displaystyle e^{-t[\beta_{1}]_{\times}}R_{1}e^{-t[\beta_{2}]_{\times}}R_{2}=e^{-\mathrm{BCH}({tR_{1}[\beta_{2}]_{\times}R_{1}^{T}},{t[\beta_{1}]_{\times}})}R_{1}R_{2}.

Hence, the SO3\mathrm{SO}_{3} pieces of (191) and (194) agree, ensuring that ψt\psi_{t} is an automorphism of GtG_{t} when restricted to SO3\mathrm{SO}_{3}. A modification of the group law (183) which makes the bias terms in (191)-(194) agree while simultaneously making the SO3\mathrm{SO}_{3} terms agree has not been identified.

The next task is to derive the SDEs on the Lie algebra corresponding to the Langevin equations (176), for both the SE3\mathrm{SE}_{3} and DP group laws on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3}. Since neither of these group laws makes the deterministic part of (176) group-affine, the TSSDE (60) derived in Section V does not directly apply. Instead, one must return to eqn. (30), which was derived without making any assumptions on ff. The computation of the terms in (30) is carried out below for the SE3\mathrm{SE}_{3} group law on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3}; the DP group law follows in a similar fashion.

The vector field f:SE3𝔰𝔢3f:\mathrm{SE}_{3}\rightarrow\mathfrak{se}_{3} corresponding to the deterministic piece of (176) reads

f(A)=𝔰𝔢3(ωmβ,[ωm]×β)A,\displaystyle f(A)=\mathcal{M}_{\mathfrak{se}_{3}}({\omega_{m}-\beta,-[\omega_{m}]_{\times}\beta})A, (195)

where 𝔰𝔢3\mathcal{M}_{\mathfrak{se}_{3}} is given by (110) and AA is given by

A=(Rβ01).\displaystyle A=\left(\begin{array}[]{cc}R&\beta\\ 0&1\end{array}\right).

Suppose g:[0,)SE3g:[0,\infty)\rightarrow\mathrm{SE}_{3} satisfies (176) without noise and where g(0)=e𝔰𝔢3(ξ0)μ0g(0)=e^{\mathcal{M}_{\mathfrak{se}_{3}}({\xi_{0}})}\mu_{0} with ξ0𝒩6(0,Σ0)\xi_{0}\sim\mathcal{N}^{6}(0,\Sigma_{0}) and

μ0=(R0β^001)\displaystyle\mu_{0}=\left(\begin{array}[]{cc}R_{0}&\widehat{\beta}_{0}\\ 0&1\end{array}\right)

is given. Further suppose g(t)g(t) for t>0t>0 may be represented as g(t)=e𝔰𝔢3(ξ(t))μ(t)g(t)=e^{\mathcal{M}_{\mathfrak{se}_{3}}({\xi(t)})}\mu(t), where μ:[0,)SE3\mu:[0,\infty)\rightarrow\mathrm{SE}_{3} equals μ(t)=et𝔰𝔢3(ωmβ^0,[ωm]×β^0)μ0\mu(t)=e^{t\mathcal{M}_{\mathfrak{se}_{3}}({\omega_{m}-\widehat{\beta}_{0},-[\omega_{m}]_{\times}\widehat{\beta}_{0}})}\mu_{0}. In other words, μ\mu follows the same evolution equation as gg, except the initial condition is known and given by μ0SE3\mu_{0}\in\mathrm{SE}_{3}. From (30), 𝔰𝔢3(ξ)\mathcal{M}_{\mathfrak{se}_{3}}({\xi}) then satisfies the ODE

J𝔰𝔢3(𝔰𝔢3(ξ))𝔰𝔢3(ξ˙)μ=e𝔰𝔢3(ξ)f(e𝔰𝔢3(ξ)μ)f(μ),\displaystyle J_{\mathfrak{se}_{3}}(\mathcal{M}_{\mathfrak{se}_{3}}({\xi}))\mathcal{M}_{\mathfrak{se}_{3}}({\dot{\xi}})\mu=e^{-\mathcal{M}_{\mathfrak{se}_{3}}({\xi})}f(e^{\mathcal{M}_{\mathfrak{se}_{3}}({\xi})}\mu)-f(\mu), (196)

Let the Lie algebra coordinates be partitioned as ξ=(δ,u)3×3\xi=(\delta,u)\in\mathbb{R}^{3}\times\mathbb{R}^{3}. Using (195), the SE3\mathrm{SE}_{3} exponential map (113), and the braiding identity (22), the RHS of (196) may be simplified to read

\displaystyle J_{\mathfrak{se}_{3}}(\mathcal{M}_{\mathfrak{se}_{3}}({\xi}))\mathcal{M}_{\mathfrak{se}_{3}}({\dot{\xi}})=\left(e^{-\mathrm{ad}_{\mathcal{M}_{\mathfrak{se}_{3}}({\xi})}}-I\right)\mathcal{M}_{\mathfrak{se}_{3}}({\omega_{m}-\widehat{\beta}_{0},-[\omega_{m}]_{\times}\widehat{\beta}_{0}})+e^{-\mathrm{ad}_{\mathcal{M}_{\mathfrak{se}_{3}}({\xi})}}\mathcal{M}_{\mathfrak{se}_{3}}({\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0}),[\omega_{m}]_{\times}(\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0}))}),

where

β~(δ,u,β^0)=e[δ]×(β^0+J¯𝔰𝔬3(δ)u),\displaystyle\tilde{\beta}(\delta,u,\widehat{\beta}_{0})=e^{[\delta]_{\times}}(\widehat{\beta}_{0}+\overline{J}_{\!\mathfrak{so}_{3}}(\delta)u), (197)

with J¯𝔰𝔬3\overline{J}_{\!\mathfrak{so}_{3}} given by (114). Vectorizing the previous equation, while noting that J¯𝔰𝔢31(ξ)(ead¯ξI)=ad¯ξ\overline{J}_{\!\mathfrak{se}_{3}}^{-1}(\xi)\left(e^{-\overline{\mathrm{ad}}_{\xi}}-I\right)=-\overline{\mathrm{ad}}_{\xi} and J¯𝔰𝔢31(ξ)ead¯ξ=W¯𝔰𝔢3(ξ)\overline{J}_{\!\mathfrak{se}_{3}}^{-1}(\xi)e^{-\overline{\mathrm{ad}}_{\xi}}=\overline{W}_{\!\!\mathfrak{se}_{3}}(\xi), where W¯𝔰𝔢3\overline{W}_{\!\!\mathfrak{se}_{3}} is given by (124), results in

ξ˙=ad¯ξ(ωmβ^0[ωm]×β^0)+W¯𝔰𝔢3(ξ)(β^0β~(δ,u,β^0)[ωm]×(β^0β~(δ,u,β^0))).\displaystyle\dot{\xi}=-\overline{\mathrm{ad}}_{\xi}\left(\begin{array}[]{c}\omega_{m}-\widehat{\beta}_{0}\\ -[\omega_{m}]_{\times}\widehat{\beta}_{0}\end{array}\right)+\overline{W}_{\!\!\mathfrak{se}_{3}}(\xi)\left(\begin{array}[]{c}\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0})\\ ~{}[\omega_{m}]_{\times}\left(\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0})\right)\end{array}\right). (202)

The term β^0β~(δ,u,β^0)\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0}) represents the deviation that fails to make (202) a linear equation on the Lie algebra. This term may be interpreted as the difference between the gyro bias MAP β^0\widehat{\beta}_{0} and its random fluctuation β~(δ,u,β^0)\tilde{\beta}(\delta,u,\widehat{\beta}_{0}).
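To first order in (δ, u) this deviation is β̂₀ − β̃(δ, u, β̂₀) ≈ [β̂₀]×δ − u, which can be checked numerically. In the sketch below, the Jacobian is a standard SO3 Jacobian assumed to match the paper's (114); only its value at δ = 0 (the identity) affects the first-order check:

```python
import numpy as np
from scipy.linalg import expm

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def Jbar_so3(d):
    # an SO3 Jacobian: I - (1-cos t)/t^2 [d]x + (t - sin t)/t^3 [d]x^2
    # (assumed convention; J(0) = I in any convention)
    t = np.linalg.norm(d)
    K = hat(d)
    if t < 1e-8:
        return np.eye(3) - 0.5 * K
    return np.eye(3) - (1 - np.cos(t)) / t**2 * K + (t - np.sin(t)) / t**3 * (K @ K)

def beta_tilde(d, u, beta0):
    # the random bias fluctuation of eqn (197)
    return expm(hat(d)) @ (beta0 + Jbar_so3(d) @ u)
```

The deviation and its linearization agree to second order in the size of (δ, u).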

Eqn. (202) is the desired ODE on the Lie algebra. Simplifying this equation further results in valuable insights. Using the formula (121) for ad¯\overline{\mathrm{ad}} on 𝔰𝔢3\mathfrak{se}_{3} and the explicit formula for W¯𝔰𝔢3\overline{W}_{\!\!\mathfrak{se}_{3}} in (124), eqn. (202) may be simplified to yield the following set of equations for the evolution of ξ=(δ,u)\xi=(\delta,u):

\displaystyle\left\{\begin{array}{l}\dot{\delta}=[\omega_{m}]_{\times}\delta-u,\\[3.0pt] \dot{u}=-[u]_{\times}(\omega_{m}-\widehat{\beta}_{0})+[\delta]_{\times}[\omega_{m}]_{\times}\widehat{\beta}_{0}+\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)[\omega_{m}]_{\times}\left(\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0})\right)+\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)N(\delta,u)\left([\delta]_{\times}\widehat{\beta}_{0}+u\right),\end{array}\right. (205)

where NN is defined in (126) and W¯𝔰𝔬3\overline{W}_{\!\!\mathfrak{so}_{3}} is given by (125). Observe that the equation for δ\delta in (205) is linear, while the equation for uu is nonlinear. This is also consistent with the observation that the vector field in (176) is group-affine for SE3\mathrm{SE}_{3} (to zeroth order in tt) for the attitude piece, but not the bias. Moreover, the second equation for u˙\dot{u} in (205) notably depends on the MAP through β^0\widehat{\beta}_{0}. This is consistent with the observation that, in the course of computing (34), the MAP was only eliminated from (30) because of the group-affine assumption on ff. Since the SE3\mathrm{SE}_{3} group structure doesn’t make the vector field in (176) group-affine, this is expected.

The derivation of eqn. (205) did not include the noise in (176). The noisy generalization of (205) will now be derived. The Langevin eqn. (176) may be written as

A˙=𝔰𝔢3(ωmβη,[ωmη]×β+ζ)A.\displaystyle\dot{A}=\mathcal{M}_{\mathfrak{se}_{3}}({\omega_{m}-\beta-\eta,-[\omega_{m}-\eta]_{\times}\beta+\zeta})A.

The new additional term on the RHS of (196) is

\displaystyle e^{-\mathcal{M}_{\mathfrak{se}_{3}}({\xi})}\left(\begin{array}{cc}-[\eta]_{\times}e^{[\delta]_{\times}}e^{t[\omega_{m}-\widehat{\beta}_{0}]_{\times}}R_{0}&\zeta\\ 0&0\end{array}\right)\mu^{-1}=\left(\begin{array}{cc}-e^{-[\delta]_{\times}}[\eta]_{\times}e^{[\delta]_{\times}}&e^{-[\delta]_{\times}}[\eta]_{\times}e^{[\delta]_{\times}}\widehat{\beta}_{0}+e^{-[\delta]_{\times}}\zeta\\ 0&0\end{array}\right).

Upon vectorizing the previous equation and appropriately augmenting the RHS of (202) with the result yields the desired Langevin equation on the Lie algebra:

ξ˙=ad¯ξ(ωmβ^0[ωm]×β^0)+W¯𝔰𝔢3(ξ)(β^0β~(δ,u,β^0)η[ωm]×(β^0β~(δ,u,β^0))+ζ[β~(δ,u,β^0)]×η).\displaystyle\dot{\xi}=-\overline{\mathrm{ad}}_{\xi}\left(\begin{array}[]{c}\omega_{m}-\widehat{\beta}_{0}\\ -[\omega_{m}]_{\times}\widehat{\beta}_{0}\end{array}\right)+\overline{W}_{\!\!\mathfrak{se}_{3}}(\xi)\left(\begin{array}[]{c}\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0})-\eta\\ ~{}[\omega_{m}]_{\times}\left(\widehat{\beta}_{0}-\tilde{\beta}(\delta,u,\widehat{\beta}_{0})\right)+\zeta-[\tilde{\beta}(\delta,u,\widehat{\beta}_{0})]_{\times}\eta\end{array}\right). (210)

Notice how the diffusion tensor depends on the MAP μ\mu through β^0\widehat{\beta}_{0}, unlike (60). A compact formula for NN (126) that is convenient for the numerical propagation of moments through (210) is presented in the Appendix (“Formula for NN”).

If instead the DP group law on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3} is used, then the Langevin equation on the Lie algebra would read

{δ˙=[ωmβ^0]×δW¯𝔰𝔬3(δ)uW¯𝔰𝔬3(δ)η,u˙=ζ.\displaystyle\left\{\begin{array}[]{l}\dot{\delta}=[\omega_{m}-\widehat{\beta}_{0}]_{\times}\delta-\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)u-\overline{W}_{\!\!\mathfrak{so}_{3}}(\delta)\eta,\\[3.0pt] \dot{u}=\zeta.\end{array}\right. (213)

Comparing (205) with the deterministic part of (213), the evolution of the Lie algebra coordinates associated with rotations (i.e., the δ\delta) is now nonlinear, while the evolution of the coordinates associated with the gyro bias is linear. This is a result of the DP group law making the vector field in (176) group-affine when restricted to the bias states. This highlights a key trade-off between working with the SE3\mathrm{SE}_{3} and DP group laws on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3} for filtering. In the following, a performance comparison of an attitude filter is made using these two choices of group law. A dramatic improvement is observed when working with the SE3\mathrm{SE}_{3} group law versus the DP group law, as well as compared to the USQUE Crassidis_UKF.
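As an illustration, the DP-law Langevin equation (213) can be propagated by Monte Carlo with a simple Euler–Maruyama scheme. The sketch below uses the small-angle approximation W̄so3(δ) ≈ I3 and hypothetical noise intensities; it is a sketch of moment propagation, not the CTUT-based propagation used by the filters:

```python
import numpy as np

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def em_dp(delta0, omega_eff, q_eta, q_zeta, T, n_steps, n_mc, seed=0):
    # Euler-Maruyama Monte Carlo for (213) with W_so3(delta) ~ I3:
    #   d delta = ([omega_eff]x delta - u) dt - dW_eta,   d u = dW_zeta,
    # where omega_eff = omega_m - beta_hat_0 and the noises are isotropic.
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    K = hat(omega_eff)
    d = np.tile(delta0, (n_mc, 1))
    u = np.zeros((n_mc, 3))
    for _ in range(n_steps):
        dW_eta = np.sqrt(q_eta * dt) * rng.standard_normal((n_mc, 3))
        dW_zeta = np.sqrt(q_zeta * dt) * rng.standard_normal((n_mc, 3))
        d = d + dt * (d @ K.T - u) - dW_eta
        u = u + np.sqrt(1.0) * dW_zeta
    return d, u
```

Two properties provide a sanity check: the bias covariance grows linearly as q_ζ·T, and the mean of δ follows the rotation e^{TK}δ₀ since u has zero mean.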

VIII.1 Numerical experiment

As a proof-of-principle test of attitude filtering using CGs on SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3}, consider an example of a satellite undergoing purely Keplerian motion with inclination 3535 [deg] and radius 6775.196775.19 [km] Crassidis_UKF . The spacecraft keeps its body xx-axis (roll) along the velocity vector, its body zz-axis (yaw) along the nadir, and the body yy-axis (pitch) along the negative orbit normal. With this geometry, the orbit period is exactly 92.592.5 [min] = 55505550 [s] and the body frame spacecraft angular velocity is ωorbit=(0,2π/5550,0)\omega_{\mathrm{orbit}}=(0,-2\pi/5550,0) [rad/sec]. The spacecraft is equipped with a strapdown, three axis gyroscope with its axes aligned with the spacecraft’s body frame. Gyro measurements are simulated in accordance with the Farrenkopf model (15) using the procedure outlined in Crassidis_SPKF .

Measurements of the spacecraft’s attitude are given by an unbiased triaxial magnetometer with zero-mean Gaussian distributed measurement noise with covariance σm2I3\sigma_{m}^{2}I_{3}, i.e., z=RB+nz=RB+n, where BB is Earth’s magnetic field in an inertial reference frame (to be defined), RR is the rotation from the inertial frame to the spacecraft’s body frame, and n𝒩3(0,σm2I3)n\sim\mathcal{N}^{3}(0,\sigma_{m}^{2}I_{3}). The Earth’s magnetic field along the circular orbit is modeled as a tilted dipole according to B=(25.54[μT])×(3m(t),runitrunitm(t))B=\left(25.54~{}[\mu\mbox{T}\right])\times\left(3\langle m(t),r_{\mathrm{unit}}\rangle r_{\mathrm{unit}}-m(t)\right), in which runit=rorbit/rorbitr_{\mathrm{unit}}=r_{\mathrm{orbit}}/\|r_{\mathrm{orbit}}\| is a unit vector from the center of the Earth to the spacecraft and the Earth dipole is m(t)=(sin(θE)sin(αEt),sin(θE)cos(αEt),cos(θE))m(t)=(\sin{(\theta_{E})}\sin{(\alpha_{E}t)},\sin{(\theta_{E})}\cos{(\alpha_{E}t)},\cos{(\theta_{E})}), where θE=168.6\theta_{E}=168.6 [deg] and αE=4.178×103\alpha_{E}=4.178\times 10^{-3} [deg/s].

The conditions of the spacecraft at the first measurement are that it is crossing the ascending node, so that the angle between the body xx axis and the equatorial plane is the orbit inclination θi=35\theta_{i}=35 [deg]. To clarify, let us take the inertial (XYZXYZ) reference frame such that the ascending node lies on the negative YY axis. The unit vector from the center of the Earth to the spacecraft is then rorbit=(cos(θi)sin(ωorbitt),cos(ωorbitt),sin(θi)sin(ωorbitt))r_{\mathrm{orbit}}=(\cos{(\theta_{i})}\sin{(\|\omega_{\mathrm{orbit}}\|t)},-\cos{(\|\omega_{\mathrm{orbit}}\|t)},\sin{(\theta_{i})}\sin{(\|\omega_{\mathrm{orbit}}\|t)}), and the velocity vector points along vorbit=ωorbit(cos(θi)cos(ωorbitt),sin(ωorbitt),sin(θi)cos(ωorbitt))v_{\mathrm{orbit}}=\|\omega_{\mathrm{orbit}}\|(\cos{(\theta_{i})}\cos{(\|\omega_{\mathrm{orbit}}\|t)},\sin{(\|\omega_{\mathrm{orbit}}\|t)},\sin{(\theta_{i})}\cos{(\|\omega_{\mathrm{orbit}}\|t)}). The true initial attitude quaternion is qtrue(0)=(0.6744,0.2126,0.2126,0.6744)q_{\mathrm{true}}(0)=(-0.6744,-0.2126,-0.2126,0.6744).
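The orbit geometry and tilted-dipole field above can be sketched directly; the constants follow the stated values, and here v_orbit is the rate of change of the unit position direction:

```python
import numpy as np

W_O = 2 * np.pi / 5550.0       # orbital rate [rad/s]
TH_I = np.radians(35.0)        # inclination
TH_E = np.radians(168.6)       # dipole tilt
AL_E = np.radians(4.178e-3)    # dipole rotation rate, converted to [rad/s]

def r_orbit(t):
    # unit vector from Earth's center to the spacecraft
    s, c = np.sin(W_O * t), np.cos(W_O * t)
    return np.array([np.cos(TH_I) * s, -c, np.sin(TH_I) * s])

def v_orbit(t):
    # time derivative of r_orbit (direction of the velocity)
    s, c = np.sin(W_O * t), np.cos(W_O * t)
    return W_O * np.array([np.cos(TH_I) * c, s, np.sin(TH_I) * c])

def b_field(t):
    # tilted-dipole field [uT] evaluated along the orbit
    m = np.array([np.sin(TH_E) * np.sin(AL_E * t),
                  np.sin(TH_E) * np.cos(AL_E * t),
                  np.cos(TH_E)])
    ru = r_orbit(t)
    return 25.54 * (3 * np.dot(m, ru) * ru - m)
```

Useful sanity checks: r_orbit stays unit norm and orthogonal to v_orbit, and the dipole field magnitude lies between one and two times 25.54 μT (the dipole bound √(1 + 3cos²ψ) ∈ [1, 2]).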

Three attitude filtering approaches are analyzed for the scenario outlined above. Two of the filters are TSFs, one using the SE3\mathrm{SE}_{3} group law and the other the DP group law on the state space SO3×3\mathrm{SO}_{3}\times\mathbb{R}^{3}. These filters are dubbed TSF-SE3\mathrm{SE}_{3} and TSF-DP, respectively. Both filters follow the recursive algorithm summarized in Section VI.4, with the exception that the SDE used for propagation of the tangent space variables is (210) for the SE3\mathrm{SE}_{3} case, and (213) for the DP case. For both the TSF-SE3\mathrm{SE}_{3} and TSF-DP, a fourth order Runge-Kutta method is used to integrate the sigma point ODEs (see Section VI.1). The third filter is the USQUE described in Crassidis_UKF .

The primary metric used to compare filter performance is the statistical consistency as measured by the chi-squared value. For the CG-based filters, this is the quantity χG2\chi_{G}^{2} defined in (82). For the USQUE, the estimated state error covariance for the attitude state is over (generalized) Rodrigues parameters. Hence, to compute a chi-squared statistic for the USQUE, first compute the error quaternion qtrueqest1q_{\mathrm{true}}q_{\mathrm{est}}^{-1} and then map this error quaternion to its associated Rodrigues parameter. This gives a 33 component error vector whose associated covariance is being computed by the USQUE. Together with the error in the estimated gyro bias, a chi-squared statistic similar to that of χG2\chi_{G}^{2} may be computed.
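The consistency statistic has the generic quadratic form e⊤P⁻¹e. A sketch of a version normalized by the state dimension, so that a consistent filter averages 11 (the precise definition of χG2\chi_{G}^{2} is given in (82) and may differ in details):

```python
import numpy as np

def chi2_normalized(errors, P):
    # per-sample e^T P^{-1} e / dim(e); for e ~ N(0, P) the expectation is 1
    E = np.atleast_2d(errors)
    sol = np.linalg.solve(P, E.T)            # P^{-1} e for each sample
    return np.einsum('ij,ji->i', E, sol) / E.shape[1]
```

Sampling errors from the filter's own reported covariance and averaging this statistic is exactly the "expected value 1" target plotted in Figure 1.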

Table 1 gives a complete list of the simulation parameters used. The initial state covariance in Table 1 is for the TSF-SE3\mathrm{SE}_{3} filter. For the TSF-DP and USQUE, this initial state covariance is transformed into an equivalent covariance using the UT. The initial attitude was sampled from a CG with MAP given by the true initial attitude and tangent space covariance given by the initial attitude covariance in Table 1. The initial CG covariance matrix (attitude and bias) is chosen so that the initial statistical consistency value for all filters is approximately 11. All additional parameters for the USQUE were specifically chosen to reflect the analysis in Crassidis_UKF .

Figure 1 compares the root-mean-square (RMS) of the χ2\chi^{2} metrics for statistical consistency between the three filters for 200200 Monte Carlo (MC) runs with randomly generated magnetometer measurements, gyro measurements, and initial attitude estimates. Values with a time average near 11 indicate that the filter’s reported covariance faithfully contains its errors, while substantially larger values indicate overconfidence. The TSF-SE3\mathrm{SE}_{3} maintains close proximity to the expected value of 11 throughout, pointing to a far more robust representation of attitude uncertainty for large initial errors, while the TSF-DP and USQUE show dramatic overconfidence in the first hour of the simulation and never quite converge to the theoretically expected value by the end of the time span. Rapid convergence and robustness are desirable for essentially all applications.

Figure 2 compares the RMS of the roll, pitch, and yaw errors and their associated 3σ3\sigma values for each filter. Only the first 1/21/2 hour of the simulation is shown in order to highlight the differences in the errors. One may observe a clear difference in the errors between the three filters, while the 3σ3\sigma values appear to be roughly the same. The USQUE shows overconfidence in each of the roll, pitch, and yaw directions, which is consistent with Figure 1. The TSF-DP has slightly worse error in each direction compared to the TSF-SE3\mathrm{SE}_{3}, which has the lowest overall error and is consistent with its 3σ3\sigma bounds (cf. Figure 1). Even though the TSF-DP roll, pitch, and yaw errors appear to be contained within their 3σ3\sigma bounds, the χG2\chi_{G}^{2} consistency metric for the TSF-DP in Figure 1 indicates that there are important cross correlations that are not being faithfully captured by this filter.

Figure 3 compares the total estimated gyro bias error and (33 times) the square root of the largest eigenvalue of its associated 3×33\times 3 covariance matrix for each filter. The story here is similar to that of Figures 1 and 2. The USQUE total estimated gyro bias error is not contained within the 3σ3\sigma bound in the first hour of the simulation. The TSF-SE3\mathrm{SE}_{3} exhibits containment throughout, which is again consistent with Figure 1. Overall, the TSF-SE3\mathrm{SE}_{3} exhibits near optimal performance in estimating both attitude and gyro bias, as well as the cross-correlations between the two. As a final note, all three filters exhibit similar computational speed. The primary factor contributing to runtime in the TSF-based filters appears to be the propagation step based on the CTUT. Demanding a high degree of accuracy when integrating the sigma point ODEs will slow down the filter runtime. If this is a bottleneck for real-time applications, one may resort to cruder approaches when propagating the mean and covariance through the appropriate SDEs.

Table 1: Simulation Parameters
Simulation length 44 [Hr]
Gyro sampling rate 10 [Hz]
Gyro ARW Qη=(3.1623×107)2I3Q_{\eta}=(3.1623\times 10^{-7})^{2}I_{3} [rad/s1/2]
Gyro RRW Qζ=(3.1623×1010)2I3Q_{\zeta}=(3.1623\times 10^{-10})^{2}I_{3} [rad/s3/2]
True Initial Gyro Bias (20,20,20)(20,~{}20,~{}20) [deg/hr]
Magnetometer sampling rate 1 [Hz]
Magnetometer measurement noise 1σ1\sigma 50 [nT]
Initial Attitude Covariance (10π/180)2I3(10\pi/180)^{2}I_{3}
Initial Bias Covariance (20π/(180×3600))2I3(20\pi/(180\times 3600))^{2}I_{3}
Initial Attitude Estimate Randomly Sampled
Initial Gyro Bias Estimate (0,0,0)(0,~{}0,~{}0) [deg/hr]
Whitening Tolerance 101510^{-15}
UT λ\lambda 0
Refer to caption
Figure 1: The RMS over 200200 MC runs of the (normalized) χG2\chi^{2}_{G} statistical consistency metric (82) for the TSF-SE3\mathrm{SE}_{3} (blue) and TSF-DP (red), along with a similar statistical consistency metric for the USQUE (black). The magenta lines represent the theoretically expected statistical consistency value for all filters, with the solid line between 11 (the expected value) and the dashed lines representing the ±1σ\pm 1\sigma around this expected value.
Refer to caption
Figure 2: The RMS over 200200 MC runs of the roll, pitch, and yaw errors and 3σ3\sigma bounds for the TSF-SE3\mathrm{SE}_{3} (blue), TSF-DP (red), and USQUE (black). The xx-axis only ranges over the first 1/21/2 hour of the simulation in order to highlight the differences in the errors.
Refer to caption
Figure 3: The RMS over 200200 MC runs of the total gyro bias error and (33 times the square root of) the largest eigenvalue of the associated error covariance matrix for the TSF-SE3\mathrm{SE}_{3} (blue), TSF-DP (red), and USQUE (black).

References

  • (1) Kalman, R. E., “A New Approach to Linear Filtering and Prediction Problems,” Journal of Basic Engineering, Vol. 82, No. 1, 1960, pp. 35–45, 10.1115/1.3662552.
  • (2) Kalman, R. E. and Bucy, R. S., “New Results in Linear Filtering and Prediction Theory,” Journal of Basic Engineering, Vol. 83, No. 1, 1961, pp. 95–108, 10.1115/1.3658902.
  • (3) Grewal, M. S., Weill, L. R., and Andrews, A. P., Global positioning systems, inertial navigation, and integration, John Wiley & Sons, 2007, 10.1002/0470099720.
  • (4) Bar-Shalom, Y., Li, X. R., and Kirubarajan, T., Estimation with applications to tracking and navigation: theory algorithms and software, John Wiley & Sons, 2004, 10.1002/0471221279.
  • (5) Ristic, B., Arulampalam, S., and Gordon, N., Beyond the Kalman filter: Particle filters for tracking applications, Vol. 685, Artech house Boston, 2004.
  • (6) Julier, S. J. and Uhlmann, J. K., “New extension of the Kalman filter to nonlinear systems,” in Kadar, I., ed., “Signal Processing, Sensor Fusion, and Target Recognition VI,” International Society for Optics and Photonics, SPIE, Vol. 3068, 1997, pp. 182 – 193, 10.1117/12.280797.
  • (7) Beneš, V. E., “Exact finite-dimensional filters for certain diffusions with nonlinear drift,” Stochastics, Vol. 5, No. 1-2, 1981, pp. 65–92, 10.1080/17442508108833174.
  • (8) Daum, F., “Nonlinear filters: beyond the Kalman filter,” IEEE Aerospace and Electronic Systems Magazine, Vol. 20, No. 8, 2005, pp. 57–69, 10.1109/MAES.2005.1499276.
  • (9) Gordon, N., Salmond, D., and Smith, A., “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” IEE Proceedings F (Radar and Signal Processing), Vol. 140, 1993, pp. 107–113, 10.1049/ip-f-2.1993.0015.
  • (10) Daum, F. and Huang, J., “Curse of dimensionality and particle filters,” in “2003 IEEE Aerospace Conference Proceedings (Cat. No.03TH8652),” Vol. 4, 2003, 10.1109/AERO.2003.1235126.
  • (11) Barrau, A. and Bonnabel, S., “Linear observed systems on groups,” Systems & Control Letters, Vol. 129, 2019, pp. 36–42, https://doi.org/10.1016/j.sysconle.2019.05.005.
  • (12) Sarkka, S., “On Unscented Kalman Filtering for State Estimation of Continuous-Time Nonlinear Systems,” IEEE Transactions on Automatic Control, Vol. 52, No. 9, 2007, pp. 1631–1641, 10.1109/TAC.2007.904453.
  • (13) Barrau, A. and Bonnabel, S., “The Geometry of Navigation Problems,” IEEE Transactions on Automatic Control, Vol. 68, No. 2, 2023, pp. 689–704, 10.1109/TAC.2022.3144328.
  • (14) Crassidis, J. L. and Markley, F. L., “Unscented Filtering for Spacecraft Attitude Estimation,” Journal of Guidance, Control, and Dynamics, Vol. 26, No. 4, 2003, pp. 536–542, 10.2514/2.5102.
  • (15) Shuster, M. D., “Survey of attitude representations,” Journal of the Astronautical Sciences, Vol. 41, No. 4, 1993, pp. 439–517.
  • (16) Farrenkopf, R., “Analytic Steady-State Accuracy Solutions for Two Common Spacecraft Attitude Estimators,” Journal of Guidance and Control, Vol. 1, No. 4, 1978, pp. 282–284, 10.2514/3.55779.
  • (17) Dianetti, A. D. and Crassidis, J. L., Extension of Farrenkopf Steady-State Solutions with Estimated Angular Rate, 10.2514/6.2018-2095.
  • (18) Hall, B. C., Lie groups, Lie algebras, and representations, Springer, 2013, 10.1007/978-3-319-13467-3.
  • (19) Müller, A., “Review of the exponential and Cayley map on \mathrm{SE}_{3} as relevant for Lie group integration of the generalized Poisson equation and flexible multibody systems,” Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol. 477, No. 2253, 2021, p. 20210303, 10.1098/rspa.2021.0303.
  • (20) Wolfe, K. C., Mashner, M., and Chirikjian, G. S., “Bayesian fusion on Lie groups,” Journal of Algebraic Statistics, Vol. 2, No. 1, 10.18409/JAS.V2I1.11.
  • (21) Barrau, A. and Bonnabel, S., “Intrinsic Filtering on Lie Groups With Applications to Attitude Estimation,” IEEE Transactions on Automatic Control, Vol. 60, No. 2, 2015, pp. 436–449, 10.1109/TAC.2014.2342911.
  • (22) Brossard, M., Bonnabel, S., and Condomines, J.-P., “Unscented Kalman filtering on Lie groups,” in “2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),” , 2017, pp. 2485–2491, 10.1109/IROS.2017.8206066.
  • (23) Crassidis, J. L., Markley, F. L., and Cheng, Y., “Survey of Nonlinear Attitude Estimation Methods,” Journal of Guidance, Control, and Dynamics, Vol. 30, No. 1, 2007, pp. 12–28, 10.2514/1.22452.
  • (24) Wang, W. and Lee, T., “Bingham-Gaussian Distribution on \mathbb{S}^{3}\times\mathbb{R}^{n} for Unscented Attitude Estimation,” in “2021 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI),” , 2021, pp. 1–7, 10.1109/MFI52462.2021.9591164.
  • (25) Barfoot, T. D. and Furgale, P. T., “Associating Uncertainty With Three-Dimensional Poses for Use in Estimation Problems,” IEEE Transactions on Robotics, Vol. 30, No. 3, 2014, pp. 679–693, 10.1109/TRO.2014.2298059.
  • (26) Julier, S. and Uhlmann, J., “Unscented filtering and nonlinear estimation,” Proceedings of the IEEE, Vol. 92, No. 3, 2004, pp. 401–422, 10.1109/JPROC.2003.823141.
  • (27) Sjøberg, A. M. and Egeland, O., “Lie Algebraic Unscented Kalman Filter for Pose Estimation,” IEEE Transactions on Automatic Control, Vol. 67, No. 8, 2022, pp. 4300–4307, 10.1109/TAC.2021.3121247.
  • (28) Ye, J., Jayaraman, A. S., and Chirikjian, G. S., “Uncertainty Propagation on Unimodular Matrix Lie Groups,” IEEE Transactions on Automatic Control, pp. 1–16, 10.1109/TAC.2024.3486652.
  • (29) Ye, J. and Chirikjian, G. S., “Uncertainty Propagation and Bayesian Fusion on Unimodular Lie Groups from a Parametric Perspective,” arXiv preprint arXiv:2401.03425.
  • (30) Markley, F. L., “Attitude filtering on \mathrm{SO}(3),” The Journal of the Astronautical Sciences, Vol. 54, No. 3, 2006, pp. 391–413, 10.1007/BF03256497.
  • (31) Do Carmo, M. P. and Flaherty Francis, J., Riemannian geometry, Vol. 6, Springer, 1992.
  • (32) Rosenberg, S., The Laplacian on a Riemannian Manifold: An Introduction to Analysis on Manifolds, London Mathematical Society Student Texts, Cambridge University Press, 1997, 10.1017/CBO9780511623783.
  • (33) Berger, M., A panoramic view of Riemannian geometry, Springer, 2003, 10.1007/978-3-642-18245-7.
  • (34) Faraut, J., Analysis on Lie Groups: An Introduction, Cambridge Studies in Advanced Mathematics, Cambridge University Press, 2008, 10.1017/CBO9780511755170.
  • (35) Schulman, L., “A Path Integral for Spin,” Phys. Rev., Vol. 176, 1968, pp. 1558–1569, 10.1103/PhysRev.176.1558.
  • (36) Van Loan, C., “Computing integrals involving the matrix exponential,” IEEE Transactions on Automatic Control, Vol. 23, No. 3, 1978, pp. 395–404, 10.1109/TAC.1978.1101743.
  • (37) Barrau, A. and Bonnabel, S., “The Invariant Extended Kalman Filter as a Stable Observer,” IEEE Transactions on Automatic Control, Vol. 62, No. 4, 2017, pp. 1797–1812, 10.1109/TAC.2016.2594085.
  • (38) Crassidis, J., “Sigma-point Kalman filtering for integrated GPS and inertial navigation,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 42, No. 2, 2006, pp. 750–756, 10.1109/TAES.2006.1642588.

Formula for N

This appendix provides a formula for N in (126) that is suitable for numerical implementation when propagating moments through the SDE (210). This formula is based on the following integral identity for the matrix exponential.

Lemma 1

Let A\in\mathbb{R}^{n\times n} be a matrix which commutes with its Moore-Penrose pseudo-inverse, A^{\#} (e.g., a normal matrix). Then,

\displaystyle\int_{0}^{1}e^{-sA}\mathrm{d}s=\mathrm{proj}_{\ker{A}}+(I_{n}-e^{-A})A^{\#}, (214)

and

\displaystyle\int_{0}^{1}se^{-sA}\mathrm{d}s=\frac{1}{2}\mathrm{proj}_{\ker{A}}+(I_{n}-(I_{n}+A)e^{-A})(A^{\#})^{2}. (215)

Proof. In general, A^{\#} satisfies A^{\#}A=I_{n}-\mathrm{proj}_{\ker{A}}. Combined with the assumption [A,A^{\#}]=0, this gives AA^{\#}=I_{n}-\mathrm{proj}_{\ker{A}} as well. In other words, when [A,A^{\#}]=0, A^{\#} is a two-sided inverse of A when restricted to the orthogonal complement of \ker{A}. Therefore,

\displaystyle(e^{A}-I)A^{\#}=\left(I+\frac{1}{2}A+\frac{1}{3!}A^{2}+\cdots\right)(I-\mathrm{proj}_{\ker{A}})=\left(\int_{0}^{1}e^{tA}\mathrm{d}t\right)(I-\mathrm{proj}_{\ker{A}}).

Hence,

\displaystyle\int_{0}^{1}e^{sA}\mathrm{d}s=(e^{A}-I)A^{\#}+\left(\int_{0}^{1}e^{tA}\mathrm{d}t\right)\mathrm{proj}_{\ker{A}}.

The final step is to note \left(\int_{0}^{1}e^{tA}\mathrm{d}t\right)\mathrm{proj}_{\ker{A}}=\mathrm{proj}_{\ker{A}}. Replacing A with -A results in (214). A similar argument proves (215). \square
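As a sanity check, both identities in Lemma 1 can be verified numerically for a skew-symmetric matrix, which is normal (hence commutes with its pseudo-inverse) and singular. The sketch below, assuming NumPy is available, compares each closed form against composite trapezoidal quadrature; the truncated-series `expm`, `trapezoid`, and `skew` helpers are illustrative stand-ins, not part of the paper.

```python
import numpy as np

def expm(A, terms=30):
    # truncated power series; adequate for the small-norm matrices used here
    E, T = np.eye(A.shape[0]), np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def trapezoid(ys, xs):
    # composite trapezoidal rule for a list of matrix-valued samples
    acc = np.zeros_like(ys[0])
    for i in range(1, len(xs)):
        acc += 0.5 * (xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1])
    return acc

def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

A = skew(np.array([0.3, -0.2, 0.5]))   # skew-symmetric: normal and singular
A_pinv = np.linalg.pinv(A)
proj_ker = np.eye(3) - A_pinv @ A      # orthogonal projector onto ker(A)

s = np.linspace(0.0, 1.0, 2001)
samples = [expm(-si * A) for si in s]
lhs_214 = trapezoid(samples, s)
lhs_215 = trapezoid([si * Ei for si, Ei in zip(s, samples)], s)

rhs_214 = proj_ker + (np.eye(3) - expm(-A)) @ A_pinv
rhs_215 = 0.5 * proj_ker + (np.eye(3) - (np.eye(3) + A) @ expm(-A)) @ A_pinv @ A_pinv

assert np.allclose(lhs_214, rhs_214, atol=1e-6)
assert np.allclose(lhs_215, rhs_215, atol=1e-6)
```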

Formula (214) does not directly apply with A replaced by (121) because that matrix does not commute with its Moore-Penrose pseudo-inverse. However, for \delta\in\mathbb{R}^{3}, the Moore-Penrose pseudo-inverse of [\delta]_{\times} is -[\delta]_{\times}/\|\delta\|^{2}. Together with (214), this implies

\displaystyle\overline{J}_{\!\mathfrak{so}_{3}}(\delta)=\int_{0}^{1}e^{-s[\delta]_{\times}}\mathrm{d}s=\frac{\delta\delta^{T}}{\|\delta\|^{2}}-\frac{1}{\|\delta\|^{2}}(I_{3}-e^{-[\delta]_{\times}})[\delta]_{\times}. (216)
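A minimal numerical check of (216), again assuming NumPy; the function names `skew`, `expm`, and `Jbar_so3` are illustrative choices, and the closed form is compared against direct quadrature of the defining integral.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def expm(A, terms=30):
    # truncated power series; adequate for the small-norm matrices used here
    E, T = np.eye(3), np.eye(3)
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def Jbar_so3(delta):
    # closed form (216); no quadrature needed at run time
    n2 = float(delta @ delta)
    D = skew(delta)
    return np.outer(delta, delta) / n2 - (np.eye(3) - expm(-D)) @ D / n2

# compare (216) against trapezoidal quadrature of the defining integral
delta = np.array([0.4, -0.1, 0.7])
s = np.linspace(0.0, 1.0, 2001)
samples = [expm(-si * skew(delta)) for si in s]
quad = np.zeros((3, 3))
for i in range(1, len(s)):
    quad += 0.5 * (s[i] - s[i - 1]) * (samples[i] + samples[i - 1])
assert np.allclose(Jbar_so3(delta), quad, atol=1e-6)
```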

Therefore,

\displaystyle N(\delta,u)=\int_{0}^{1}e^{s[\delta]_{\times}}[\int_{0}^{s}e^{-r[\delta]_{\times}}u\,\mathrm{d}r]_{\times}\mathrm{d}s
\displaystyle=\int_{0}^{1}e^{s[\delta]_{\times}}[\left(s\frac{\delta\delta^{T}}{\|\delta\|^{2}}-\frac{1}{\|\delta\|^{2}}(I_{3}-e^{-s[\delta]_{\times}})[\delta]_{\times}\right)u]_{\times}\mathrm{d}s
\displaystyle=\frac{1}{\|\delta\|^{2}}\left(\int_{0}^{1}se^{s[\delta]_{\times}}\mathrm{d}s\right)[\delta\delta^{T}u]_{\times}-\frac{1}{\|\delta\|^{2}}\int_{0}^{1}e^{s[\delta]_{\times}}[(I_{3}-e^{-s[\delta]_{\times}})[\delta]_{\times}u]_{\times}\mathrm{d}s. (217)

From (215),

\displaystyle\int_{0}^{1}se^{s[\delta]_{\times}}\mathrm{d}s=\frac{1}{2}\frac{\delta\delta^{T}}{\|\delta\|^{2}}+\frac{1}{\|\delta\|^{4}}(I_{3}-(I_{3}-[\delta]_{\times})e^{[\delta]_{\times}})[\delta]_{\times}^{2}. (218)

Plugging (218) into the first term on the RHS of (217) and simplifying, using \delta\delta^{T}[\delta]_{\times}=0 and [\delta]_{\times}^{3}=-\|\delta\|^{2}[\delta]_{\times}, one finds

\displaystyle\frac{1}{\|\delta\|^{2}}\left(\int_{0}^{1}se^{s[\delta]_{\times}}\mathrm{d}s\right)[\delta\delta^{T}u]_{\times}=-\frac{\langle\delta,u\rangle}{\|\delta\|^{4}}(I_{3}-(I_{3}-[\delta]_{\times})e^{[\delta]_{\times}})[\delta]_{\times}.

For the next term on the RHS of (217), first observe that, since [Rv]_{\times}=R[v]_{\times}R^{T} for any rotation R,

\displaystyle e^{s[\delta]_{\times}}[(I_{3}-e^{-s[\delta]_{\times}})[\delta]_{\times}u]_{\times}=\left[e^{s[\delta]_{\times}},[[\delta]_{\times}u]_{\times}\right], (219)

where the outermost brackets on the right-hand side of (219) denote the matrix commutator. Hence,

\displaystyle\int_{0}^{1}e^{s[\delta]_{\times}}[(I_{3}-e^{-s[\delta]_{\times}})[\delta]_{\times}u]_{\times}\mathrm{d}s=\left[\int_{0}^{1}e^{s[\delta]_{\times}}\mathrm{d}s,[[\delta]_{\times}u]_{\times}\right].

In total,

\displaystyle N(\delta,u)=\frac{1}{\|\delta\|^{2}}\left[[[\delta]_{\times}u]_{\times},\overline{J}_{\!\mathfrak{so}_{3}}(-\delta)\right]-\frac{\langle\delta,u\rangle}{\|\delta\|^{4}}(I_{3}-(I_{3}-[\delta]_{\times})e^{[\delta]_{\times}})[\delta]_{\times}. (220)

Formula (220) is convenient for numerical implementation because it does not involve explicit trigonometric functions through, e.g., Rodrigues’ rotation formula (84), and it reuses quantities that must already be computed (e.g., \overline{J}_{\!\mathfrak{so}_{3}}(-\delta)). Moreover, derivatives of (220) are easier to compute analytically, which is necessary for the drift correction in (71).
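The closed form (220) can likewise be checked against the defining double integral in the first line of (217). The following sketch assumes NumPy; the helpers `skew`, `expm`, `Jbar_so3`, and `N_closed` are illustrative names, and the nested quadrature is a simple trapezoidal rule rather than a production integrator.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])

def expm(A, terms=30):
    # truncated power series; adequate for the small-norm matrices used here
    E, T = np.eye(3), np.eye(3)
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def Jbar_so3(delta):
    # closed form (216)
    n2 = float(delta @ delta)
    D = skew(delta)
    return np.outer(delta, delta) / n2 - (np.eye(3) - expm(-D)) @ D / n2

def N_closed(delta, u):
    # closed form (220): commutator with Jbar_so3(-delta) plus a correction term
    n2 = float(delta @ delta)
    D = skew(delta)
    C = skew(D @ u)
    J = Jbar_so3(-delta)
    term1 = (C @ J - J @ C) / n2
    term2 = (delta @ u) / n2**2 * (np.eye(3) - (np.eye(3) - D) @ expm(D)) @ D
    return term1 - term2

# check (220) against the defining double integral in (217)
delta = np.array([0.5, -0.3, 0.2])
u = np.array([0.1, 0.7, -0.4])
D = skew(delta)
s = np.linspace(0.0, 1.0, 801)
f = np.array([expm(-si * D) @ u for si in s])
w = np.zeros_like(f)      # w[i] approximates the inner integral up to s[i]
for i in range(1, len(s)):
    w[i] = w[i - 1] + 0.5 * (s[i] - s[i - 1]) * (f[i] + f[i - 1])
outer = [expm(s[i] * D) @ skew(w[i]) for i in range(len(s))]
N_quad = np.zeros((3, 3))
for i in range(1, len(s)):
    N_quad += 0.5 * (s[i] - s[i - 1]) * (outer[i] + outer[i - 1])
assert np.allclose(N_closed(delta, u), N_quad, atol=1e-5)
```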