
Convergence estimates for the Magnus expansion IIA. 2×2 matrices with operator norm

Gyula Lakos Department of Geometry, Institute of Mathematics, Eötvös University, Pázmány Péter s. 1/C, Budapest, H–1117, Hungary [email protected]
Abstract.

We review and provide simplified proofs related to the Magnus expansion, and improve convergence estimates. Observations and improvements concerning the Baker–Campbell–Hausdorff expansion are also made.

In this Part IIA, we investigate the case of 2×2 matrices with respect to the operator norm. We consider norm estimates and minimal presentations in terms of the Magnus and BCH expansions. Some results are obtained in the complex case, but a more complete picture is obtained in the real case.

Key words and phrases:
Magnus expansion, Baker–Campbell–Hausdorff expansion, growth estimates, Davis–Wielandt shell, conformal range of operators, minimal exponential presentations
2010 Mathematics Subject Classification:
Primary: 47A12, 15A16, Secondary: 15A60.

Introduction to Part IIA

The present paper is a direct continuation of Part II [12]. It assumes general familiarity with Part I [11] and a good understanding of Part II [12]. In Part II, we obtained norm estimates and inclusion theorems with respect to the conformal range. In Part IIA, we take a closer look at the more treatable case of 2×2 matrices. Some investigations will be about testing or sharpening our earlier results for 2×2 matrices; others will deal with minimal Magnus or Baker–Campbell–Hausdorff presentations. Studying 2×2 matrices may seem to be a modest objective but, in reality, the computations are not trivial. (Basic analytical calculations already appear in Magnus [15]; more sophisticated techniques appear in Michel [16], but in the context of the Frobenius norm. Our computations are similar in spirit but more complicated, as the operator norm is taken seriously.) The information exhibited here is relatively complete in the case of 2×2 real matrices, and much more partial in the case of 2×2 complex matrices.

In Section 1, we recall some theorems and examples from Part II; our results here on 2×2 matrices can be viewed relative to these. In Section 2, we review some technical tools concerning 2×2 matrices. To a variable extent, all later sections use the information discussed there; hence, despite its elementary nature, this section is crucial. Its content can, however, be consulted as needed later; a cursory reading is advised in any case.

Sections 3 and 4 discuss how Schur’s formulae and the BCH formula, respectively, simplify for 2×2 matrices. In Section 5, we show that for 2×2 complex matrices exponentials only rarely play the “role of geodesics” (i. e. of parts of minimal Magnus presentations). In Section 6, examples of BCH expansions from \operatorname{SL}_{2}(\mathbb{R}) are given, with interest in norm growth. In Section 7, we demonstrate that (appropriately) balanced BCH expansions of 2×2 real matrices with cumulative norm π are uniformly bounded. Next, we turn toward BCH minimal presentations of real 2×2 matrices (with norm restrictions). Sections 8 and 9 study the moments of the Schur maps and apply them to minimal BCH presentations. In Section 10, the (critical) “BCH unit ball” \mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}} is described, resulting in the “wedge cap”. At this point several further questions are left open; but, having seen that case of prime interest, we leave the discussion of BCH presentations there.

Then, we start the investigation of (minimal) Magnus expansions for real 2×2 matrices. We find that, in this setting, Magnus minimal presentations are more natural and approachable than BCH minimal presentations. In Section 11, we prove a logarithmic monotonicity property for the principal disks of 2×2 real matrices. In Section 12, we consider some examples which test our earlier norm estimates for the Magnus expansion, but which also help to understand the 2×2 real case better by introducing the “canonical Magnus developments”. Section 13 develops a systematic analysis of the 2×2 real case. The first observation is that in this case the Magnus exponent can be read off from the conformal range / principal disk. Based on the previous examples, we consider (minimal) normal presentations for 2×2 real matrices given as time-ordered exponentials which are not ordinary exponentials. The conclusion is that those normal presentations are better suited to the geometric description of \widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R}) than the customary exponentials. In Section 14, we give information about the asymptotics of the optimal norm estimate for 2×2 real matrices. In Section 15, however, it is demonstrated that the Magnus exponent of complex 2×2 matrices cannot be read off from the conformal range as simply as in the real case. (Thus the previous methods do not transfer that easily.)

Notation and terminology

Line y of formula (X) will be denoted as (X/y). If a,b are points of a Euclidean (or just affine) space, then [a,b]_{\mathrm{e}} denotes the segment connecting them. This notation is also applied to (half-)open segments. It can also be used conveniently in \mathbb{R} when a>b. Instead of ‘by the uniqueness principle of analytic continuation’ we will often say ‘by analytic continuation’, even if the function is already constructed. In general, we try to follow the notation established in Parts I and II.

1. Some results from Part II (Review)

Here we recall some points of Part II. This cannot replace the detailed discussion given in Part II, but it serves as a reference.

1.A. Conformal range

Suppose that \mathfrak{H} is a real Hilbert space. The logarithmic distance on \mathfrak{H}\setminus\{0\} is given by

(1) d_{\log}(\mathbf{v},\mathbf{w})=\sqrt{\left(\log\frac{|\mathbf{w}|_{2}}{|\mathbf{v}|_{2}}\right)^{2}+\left((\mathbf{v},\mathbf{w})\sphericalangle\right)^{2}}.
Theorem 1.1.

Suppose that \mathbf{z}:[a,b]\rightarrow\mathfrak{H}\setminus\{0\} is continuous. Then

d_{\log}(\mathbf{z}(a),\mathbf{z}(b))\leq\int_{t\in[a,b]}\frac{|\mathrm{d}\mathbf{z}(t)|_{2}}{|\mathbf{z}(t)|_{2}}.

In case of equality, \mathbf{z} is a (not necessarily strictly) monotone subpath of a distance segment connecting \mathbf{z}(a) to \mathbf{z}(b). ∎

For \mathbf{x},\mathbf{y}\in\mathfrak{H}\setminus\{0\}, let \sphericalangle(\mathbf{x},\mathbf{y}) denote their angle. This can already be obtained from the underlying real scalar product \langle\mathbf{x},\mathbf{y}\rangle_{\mathrm{real}}=\operatorname{Re}\langle\mathbf{x},\mathbf{y}\rangle. For \mathbf{x},\mathbf{y}\in\mathfrak{H}, \mathbf{x}\neq 0, let

\mathbf{y}:\mathbf{x}=\frac{\langle\mathbf{y},\mathbf{x}\rangle_{\mathrm{real}}}{|\mathbf{x}|_{2}^{2}}+\mathrm{i}\left|\frac{\mathbf{y}}{|\mathbf{x}|_{2}}-\frac{\langle\mathbf{y},\mathbf{x}\rangle_{\mathrm{real}}}{|\mathbf{x}|_{2}^{2}}\frac{\mathbf{x}}{|\mathbf{x}|_{2}}\right|_{2}.

For A\in\mathcal{B}(\mathfrak{H}), we define the (extended) conformal range as

\operatorname{CR^{ext}}(A)=\{A\mathbf{x}:\mathbf{x},\,\overline{(A\mathbf{x}:\mathbf{x})}\,:\,\mathbf{x}\in\mathfrak{H}\setminus\{0\}\};

and the restricted conformal range as

\operatorname{CR}(A)=\{A\mathbf{x}:\mathbf{x}\,:\,\mathbf{x}\in\mathfrak{H}\setminus\{0\}\}.

(This is a partial aspect of the Davis–Wielandt shell; see Wielandt [20], Davis [5], [6], and also [12].)

Theorem 1.2.

(Time ordered exponential mapping theorem.)

If \phi is a \mathcal{B}(\mathfrak{H})-valued ordered measure, then

(2) \operatorname{CR^{ext}}(\operatorname{exp_{L}}\phi)\subset\exp\operatorname{D}(0,\textstyle{\int\|\phi\|_{2}}),

and

(3) \operatorname{sp}(\operatorname{exp_{L}}\phi)\subset\exp\operatorname{D}(0,\textstyle{\int\|\phi\|_{2}}).

In particular, if \int\|\phi\|_{2}<\pi, then \log\operatorname{exp_{L}}\phi is well-defined, and for its spectral radius

(4) \mathrm{r}(\log\operatorname{exp_{L}}\phi)\leq\textstyle{\int\|\phi\|_{2}}. ∎

1.B. Convergence theorems

Theorem 1.3.

(Mityagin–Moan–Niesen–Casas logarithmic convergence theorem.)

If \phi is a \mathcal{B}(\mathfrak{H})-valued ordered measure and \int\|\phi\|_{2}<\pi, then the Magnus expansion \sum_{k=1}^{\infty}\mu_{k,\mathrm{L}}(\phi) is absolutely convergent. In fact, \log\operatorname{exp_{L}}(\phi)=\sum_{k=1}^{\infty}\mu_{k,\mathrm{L}}(\phi) also holds.

The statement also holds if \mathcal{B}(\mathfrak{H}) is replaced by any C^{*}-algebra.

[Cf. Schäfer [19], Mityagin [17], Moan, Niesen [18], Casas [4].] ∎

We say that the ordered measure \phi is a multiple Baker–Campbell–Hausdorff (mBCH) measure if, up to reparametrization, \phi is of the form A_{1}\mathbf{1}\boldsymbol{.}\ldots\boldsymbol{.}A_{n}\mathbf{1}. In this case, \phi also allows a mass-normalized version

(5) \psi=B_{1}\mathbf{1}_{(0,t_{1}]}\boldsymbol{.}\ldots\boldsymbol{.}B_{k}\mathbf{1}_{[t_{k-1},t_{k}]}

where t_{i}<t_{i+1}, \|B_{i}\|_{2}=1, and thus t_{k}=\int\|\phi\|_{2}. It is constructed by replacing A_{i} with A_{i}/\|A_{i}\|_{2} if \|A_{i}\|_{2}\neq 0, and eliminating the term if A_{i}=0. As it is obtained by a kind of reparametrization, the Magnus expansion is not affected.

Theorem 1.4.

(Finite dimensional critical BCH convergence theorem.)

Let \mathfrak{H} be a finite dimensional Hilbert space. Consider the \mathcal{B}(\mathfrak{H})-valued mBCH measure \phi=A_{1}\mathbf{1}\boldsymbol{.}\ldots\boldsymbol{.}A_{k}\mathbf{1} with cumulative norm \|A_{1}\|_{2}+\ldots+\|A_{k}\|_{2}=\pi. Then, the convergence radius of the Magnus (mBCH) expansion of \phi is greater than 1. In particular, finite dimensional mBCH expansions with cumulative norm π converge. ∎

Lemma 1.5.

Let \psi be a mass-normalized mBCH measure as in (5), \int\|\psi\|_{2}>0.

Consider all the Hilbert subspaces \mathfrak{V} of \mathfrak{H} such that

(i) \mathfrak{H}=\mathfrak{V}\oplus\mathfrak{V}^{\bot} is an invariant orthogonal decomposition for all B_{i}.

(ii) B_{1}|_{\mathfrak{V}}=\ldots=B_{k}|_{\mathfrak{V}}, and these are orthogonal (unitary).

Then there is a single maximal such \mathfrak{V}. ∎

In the context of the previous lemma, if the maximal such \mathfrak{V} is 0, then we call \psi reduced. In particular, this applies if \bigcap_{i<j}\ker(B_{i}-B_{j})=0.

Theorem 1.6.

(Finite dimensional logarithmic critical BCH convergence theorem.)

Let \mathfrak{H} be a finite dimensional Hilbert space. Consider the \mathcal{B}(\mathfrak{H})-valued mass-normalized mBCH measure \psi as in (5) with cumulative norm \int\|\psi\|_{2}=\pi. We claim:

(a) Unless the component operators B_{i} have a common eigenvector for \mathrm{i} or -\mathrm{i} (complex case), or a common eigenblock \begin{bmatrix}&-1\\ 1&\end{bmatrix} (real case), \log\operatorname{exp_{L}}(\psi)=\sum_{k=1}^{\infty}\mu_{k,\mathrm{L}}(\psi) also holds.

(b) If \psi is reduced, then, for any t\in\operatorname{D}(0,1), \log\operatorname{exp_{L}}(t\cdot\psi)=\sum_{k=1}^{\infty}t^{k}\mu_{k,\mathrm{L}}(\psi) holds. Thus, the log-able radius of the Magnus (BCH) expansion is also greater than 1. ∎

1.C. Growth estimates

For p>0, let us define 0<\ell(p)<\frac{\pi}{2} as the solution of the equation

\ell(p)+p\sin\ell(p)=\frac{\pi}{2}.

Then \ell:(0,\infty)\rightarrow(\pi/2,0)_{\mathrm{e}} is a decreasing diffeomorphism. In particular,

\ell(\pi)=0.386519539\ldots\,.
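The defining equation of \ell(p) can be spot-checked numerically. A minimal sketch (the function name ell and the bisection scheme are ours, not from the text), relying only on the left-hand side being strictly increasing in \ell:

```python
import math

def ell(p):
    """Solve ell + p*sin(ell) = pi/2 for ell in (0, pi/2) by bisection."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(200):
        mid = (lo + hi) / 2
        # the left-hand side is strictly increasing in ell on (0, pi/2)
        if mid + p * math.sin(mid) < math.pi / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(ell(math.pi))  # approximately 0.386519539
```

The same routine also exhibits the monotone decrease of \ell.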
Theorem 1.7.

Let \operatorname{CR^{ext}}(A)\subset\exp\operatorname{D}(0,p), 0<p<\pi. Then

(6) \|\log A\|_{2}\leq J(p)=\int_{t=\ell(p)}^{\pi-\ell(p)}\underbrace{\frac{p+\sin\left(p\sin t\right)-\cos\left(p\sin t\right)p\sin t}{2\sin\left(p\sin t\right)}}_{JJ(p,t)}\,\mathrm{d}t. ∎
Theorem 1.8.

(a) As p\searrow 0,

(7) J(p)=p+\frac{1}{6}p^{3}-\frac{1}{72}p^{5}+\frac{211}{15120}p^{7}+O(p^{9}).

(b) As p\nearrow\pi,

(8) J(p)=\pi\sqrt{\frac{\pi+p}{\pi-p}}+J_{\pi}+O(\pi-p),

where

J_{\pi}=-4\tan\ell(\pi)+\int_{t=\ell(\pi)}^{\pi-\ell(\pi)}\left(JJ(\pi,t)-\frac{2}{\cos^{2}t}\right)\,\mathrm{d}t

(the integrand extends to a smooth function of t). Numerically, J_{\pi}=-3.0222\ldots\,.
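A quadrature of the integral in (6) can be compared with the series of (7) for small p. A minimal numerical sketch (the helper names ell, JJ, J and the Simpson-rule step count are ours):

```python
import math

def ell(p):
    # bisection for ell + p*sin(ell) = pi/2 on (0, pi/2)
    lo, hi = 0.0, math.pi / 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid + p * math.sin(mid) < math.pi / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def JJ(p, t):
    s = p * math.sin(t)
    return (p + math.sin(s) - math.cos(s) * s) / (2 * math.sin(s))

def J(p, n=20000):
    # composite Simpson rule on [ell(p), pi - ell(p)]; n must be even
    a, b = ell(p), math.pi - ell(p)
    h = (b - a) / n
    acc = JJ(p, a) + JJ(p, b)
    for k in range(1, n):
        acc += (4 if k % 2 else 2) * JJ(p, a + k * h)
    return acc * h / 3

p = 0.5
series = p + p**3 / 6 - p**5 / 72 + 211 * p**7 / 15120
print(J(p), series)  # the two values agree to several digits
```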

The function J(p) is not very particular; it can be improved. An example we have considered to test the effectiveness of (6)–(8) is

Example 1.9.

(The analytical expansion of the Magnus critical case.)

On the interval [0,\pi], we consider the measure \Phi such that

\Phi(\theta)=\begin{bmatrix}-\sin 2\theta&\cos 2\theta\\ \cos 2\theta&\sin 2\theta\end{bmatrix}\,\mathrm{d}\theta|_{[0,\pi]}.

Thus, for p\in(0,\pi),

\int\|p/\pi\cdot\Phi\|_{2}=p.

Then,

\mu_{\mathrm{L}}(p/\pi\cdot\Phi)=\log\operatorname{exp_{L}}(p/\pi\cdot\Phi)=\pi\left(\frac{1}{\sqrt{1-(p/\pi)^{2}}}-1\right)\begin{bmatrix}&-p/\pi-1\\ -p/\pi+1&\end{bmatrix}.

Consequently,

\|\mu_{\mathrm{L}}(p/\pi\cdot\Phi)\|_{2}=\pi\left(\frac{1}{\sqrt{1-(p/\pi)^{2}}}-1\right)(1+p/\pi)=\pi\sqrt{\frac{\pi+p}{\pi-p}}-\pi-p

(9) =\sqrt{2}\pi^{3/2}(\pi-p)^{-1/2}-2\pi-\frac{\sqrt{2}}{4}\pi^{1/2}(\pi-p)^{1/2}+O(\pi-p),

as p\nearrow\pi. ∎
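Example 1.9 can be verified numerically by comparing a discretized time-ordered exponential with the exponential of the claimed logarithm. The sketch below assumes the convention Y'(\theta)=(p/\pi)\Phi(\theta)Y(\theta), Y(0)=\operatorname{Id}_{2} (our reading of \operatorname{exp_{L}}; the step count and helper names are ours). Each step uses the exact exponential of the midpoint generator, exploiting \Phi(\theta)^{2}=\operatorname{Id}_{2}.

```python
import math

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def Phi(theta):
    return [[-math.sin(2 * theta), math.cos(2 * theta)],
            [ math.cos(2 * theta), math.sin(2 * theta)]]

def expL(q, n=20000):
    # product of step exponentials exp(h*q*Phi(midpoint)), later factors on the left
    Y = [[1.0, 0.0], [0.0, 1.0]]
    h = math.pi / n
    for k in range(n):
        M = Phi((k + 0.5) * h)
        ch, sh = math.cosh(h * q), math.sinh(h * q)  # exp(aM) = cosh(a)Id + sinh(a)M
        E = [[ch + sh * M[0][0], sh * M[0][1]],
             [sh * M[1][0], ch + sh * M[1][1]]]
        Y = mul(E, Y)
    return Y

q = 0.6                    # q = p/pi
w = math.sqrt(1 - q * q)
s = math.pi * (w - 1)
# exp(mu_L) computed in closed form from the claimed mu_L above
claimed = [[math.cos(s), math.sin(s) / w * (q + 1)],
           [math.sin(s) / w * (q - 1), math.cos(s)]]
Y = expL(q)
err = max(abs(Y[i][j] - claimed[i][j]) for i in range(2) for j in range(2))
print(err)  # small discretization error
```

The operator norm formula above can be checked against the claimed \mu_{\mathrm{L}} in the same run.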

(We will see, however, that \Phi|_{[0,p]} is a better candidate to test (6)–(8).)

2. Computational background for 2×2 matrices

For the sake of reference, here we review some facts and conventions related to 2×2 matrices which we have already considered in Part II. However, we also take the opportunity to augment this review by further observations of an analytical nature.

2.A. The skew-quaternionic form (review)

One can write the 2×2 matrix A in skew-quaternionic form

(10) A=\tilde{a}\operatorname{Id}_{2}+\tilde{b}\tilde{I}+\tilde{c}\tilde{J}+\tilde{d}\tilde{K}\equiv\tilde{a}\begin{bmatrix}1&\\ &1\end{bmatrix}+\tilde{b}\begin{bmatrix}&-1\\ 1&\end{bmatrix}+\tilde{c}\begin{bmatrix}1&\\ &-1\end{bmatrix}+\tilde{d}\begin{bmatrix}&1\\ 1&\end{bmatrix}.

Then,

\frac{\operatorname{tr}A}{2}=\tilde{a},

and

\det A=\tilde{a}^{2}+\tilde{b}^{2}-\tilde{c}^{2}-\tilde{d}^{2}.
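The decomposition (10) and these trace and determinant formulas are immediate to check in code; a minimal sketch (the helper name skew_quaternionic and the sample matrix are ours):

```python
def skew_quaternionic(A):
    """Coefficients (a, b, c, d) with A = a*Id2 + b*I~ + c*J~ + d*K~."""
    (a11, a12), (a21, a22) = A
    return ((a11 + a22) / 2, (a21 - a12) / 2,
            (a11 - a22) / 2, (a12 + a21) / 2)

A = [[1.0, 2.0], [3.0, 4.0]]
a, b, c, d = skew_quaternionic(A)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(a, (A[0][0] + A[1][1]) / 2)   # tr A / 2 equals the coefficient a
print(a*a + b*b - c*c - d*d, det)   # both equal det A
```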

2.B. Spectral type (review)

Let us use the notation

D_{A}=\det\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)=(\det A)-\frac{(\operatorname{tr}A)^{2}}{4}.

It is essentially the discriminant of A, as the eigenvalues of A are \frac{1}{2}\operatorname{tr}A\pm\sqrt{-D_{A}}.

In form (10),

(11) D_{A}=\tilde{b}^{2}-\tilde{c}^{2}-\tilde{d}^{2}.

In the special case of real 2×2 matrices, we use the classification

\bullet elliptic case: two conjugate strictly complex eigenvalues,

\bullet parabolic case: two equal real eigenvalues,

\bullet hyperbolic case: two distinct real eigenvalues.

Then, for real 2×2 matrices, D_{A} measures ‘ellipticity/parabolicity/hyperbolicity’: if D_{A}>0, then A is elliptic; if D_{A}=0, then A is parabolic; if D_{A}<0, then A is hyperbolic.

In the general complex case, there are two main categories: parabolic (D_{A}=0) and non-parabolic (D_{A}\neq 0).
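The trichotomy above is easy to test; in the sketch below (the sample matrices are ours), a rotation is elliptic, a shear is parabolic, and a diagonal stretch is hyperbolic:

```python
import math

def D(A):
    # discriminant-type quantity D_A = det A - (tr A)^2 / 4
    (a11, a12), (a21, a22) = A
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    return det - tr * tr / 4

rotation = [[math.cos(1.0), -math.sin(1.0)],
            [math.sin(1.0),  math.cos(1.0)]]
shear    = [[1.0, 1.0], [0.0, 1.0]]
stretch  = [[2.0, 0.0], [0.0, 0.5]]
print(D(rotation) > 0, D(shear) == 0, D(stretch) < 0)  # True True True
```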

2.C. Principal and chiral disks (review)

For 2×2 real matrices we can refine the spectral data as follows. Assume that A=\begin{bmatrix}a&b\\ c&d\end{bmatrix}=\tilde{a}\operatorname{Id}_{2}+\tilde{b}\tilde{I}+\tilde{c}\tilde{J}+\tilde{d}\tilde{K}. Its principal disk is

\operatorname{PD}(A):=\operatorname{D}\left(\frac{a+d}{2}+\frac{|c-b|}{2}\mathrm{i},\sqrt{\left(\frac{a-d}{2}\right)^{2}+\left(\frac{b+c}{2}\right)^{2}}\right)=\operatorname{D}\left(\tilde{a}+|\tilde{b}|\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right).

This is refined further by the chiral disk

\operatorname{CD}(A):=\operatorname{D}\left(\frac{a+d}{2}+\frac{c-b}{2}\mathrm{i},\sqrt{\left(\frac{a-d}{2}\right)^{2}+\left(\frac{b+c}{2}\right)^{2}}\right)=\operatorname{D}\left(\tilde{a}+\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right).

The additional datum in the chiral disk is the chirality, which is the sign of the twisted trace, \operatorname{sgn}\operatorname{tr}\left(\begin{bmatrix}&1\\ -1&\end{bmatrix}A\right)=\operatorname{sgn}(c-b)=\operatorname{sgn}\tilde{b}. This chirality is, in fact, understood with respect to a fixed orientation of \mathbb{R}^{2}. It does not change if we conjugate A by a rotation, but it changes sign if we conjugate A by a reflection. From the properties of the twisted trace, it is also easy to see that \log respects chirality.

One can read off many data from the disks. For example, if \operatorname{PD}(A)=\operatorname{D}((\tilde{a},\tilde{b}),r), then \det A=\tilde{a}^{2}+\tilde{b}^{2}-r^{2}. This is not surprising in the light of

Lemma 2.1.

\operatorname{CD} makes a bijective correspondence between possibly degenerate disks in \mathbb{C} and the orbits of \mathrm{M}_{2}(\mathbb{R}) with respect to conjugacy by special orthogonal matrices (i. e. rotations).

\operatorname{PD} makes a bijective correspondence between possibly degenerate disks with center in \mathbb{C}^{+} and the orbits of \mathrm{M}_{2}(\mathbb{R}) with respect to conjugacy by orthogonal matrices. ∎

The principal / chiral disk is a point if A has the effect of a complex multiplication (that is, if A is a quasicomplex matrix). In general, matrices A fall into three categories: elliptic, parabolic, hyperbolic, such that the principal / chiral disk is disjoint from, tangent to, or secant to the real axis, respectively.
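The relation \det A=\tilde{a}^{2}+\tilde{b}^{2}-r^{2} and the tangency classification can be spot-checked together; a minimal sketch (the helper name principal_disk and the sample matrix are ours):

```python
import math

def principal_disk(M):
    """Center (as a complex number) and radius of PD(M), M = [[a,b],[c,d]]."""
    a, b = M[0]
    c, d = M[1]
    center = complex((a + d) / 2, abs(c - b) / 2)
    radius = math.hypot((a - d) / 2, (b + c) / 2)
    return center, radius

M = [[1.0, 2.0], [-1.0, 3.0]]
z, r = principal_disk(M)
detM = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(abs(z) ** 2 - r ** 2, detM)   # both approximately 5.0
# elliptic case: the disk misses the real axis, i.e. Im z > r
print(z.imag > r)                   # True; indeed D_M = 5 - 4 = 1 > 0
```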

2.D. Conformal range of 2×2 real matrices (review)

The principal / conformal disks turn out to be closely related to the conformal range:

Lemma 2.2.

Consider the real matrix A=\tilde{a}\operatorname{Id}_{2}+\tilde{b}\tilde{I}+\tilde{c}\tilde{J}+\tilde{d}\tilde{K}. We claim:

(a) For A acting on \mathbb{R}^{2},

\operatorname{CR^{ext}}(A^{\mathbb{R}})=\partial\operatorname{D}\left(\tilde{a}+\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right)\cup\partial\operatorname{D}\left(\tilde{a}-\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right).

(b) For A acting on \mathbb{C}^{2},

\operatorname{CR^{ext}}(A^{\mathbb{C}})=\operatorname{D}\left(\tilde{a}+\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right)\setminus\operatorname{\mathring{D}}\left(\tilde{a}-\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right)\cup\operatorname{D}\left(\tilde{a}-\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right)\setminus\operatorname{\mathring{D}}\left(\tilde{a}+\tilde{b}\mathrm{i},\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right).

This is \operatorname{CR^{ext}}(A^{\mathbb{R}}) but with the components of \mathbb{C}\setminus\operatorname{CR^{ext}}(A^{\mathbb{R}}) disjoint from \mathbb{R} filled in. ∎

Thus \operatorname{CR}(A^{\mathbb{R}}) is the boundary of the principal (or chiral) disk factored by conjugation. In terms of hyperbolic geometry (in the Poincaré half-space), \operatorname{CR}(A^{\mathbb{R}}) may yield points or circles in the elliptic case; asymptotic points or corresponding horocycles (paracycles) in the parabolic case; lines or pairs of distance lines (hypercycles) in the hyperbolic case. (In the normal case, it yields points, asymptotic points or lines; in the non-normal case, it yields circles, asymptotically closed horocycles, or asymptotically closed pairs of distance lines.) \operatorname{CR}(A^{\mathbb{C}}) is \operatorname{CR}(A^{\mathbb{R}}) but with the h-convex closure taken.

2.E. Recognizing log-ability

For finite matrices, \operatorname{sp}(A)\cap\mathbb{R}=\operatorname{CR^{ext}}(A^{(\mathrm{real})})\cap\mathbb{R}. Consequently, A is \log-able if and only if \operatorname{CR^{ext}}(A^{(\mathrm{real})})\cap(-\infty,0]=\emptyset. (Here \operatorname{CR^{ext}} can be replaced by \operatorname{CR}.) Or, for 2×2 real matrices, in terms of the principal disk: A is \log-able if and only if \operatorname{PD}(A)\cap(-\infty,0]=\emptyset. (\operatorname{PD} can be replaced by \operatorname{CD}.)

2.F. Canonical forms of 2×22\times 2 matrices in skew-quaternionic form

In \mathrm{M}_{2}(\mathbb{R}), the effect of conjugation by a rotation matrix \operatorname{Id}_{2}\cos\theta+\tilde{I}\sin\theta is given by

(12) a\operatorname{Id}_{2}+b\tilde{I}+c\tilde{J}+d\tilde{K}\mapsto a\operatorname{Id}_{2}+b\tilde{I}+((\cos 2\theta)\operatorname{Id}_{2}+(\sin 2\theta)\tilde{I})(c\operatorname{Id}_{2}+d\tilde{I})\tilde{K}.

Thus, through conjugation by a rotation matrix, any real 2×2 matrix can be brought into the shape

a\operatorname{Id}_{2}+b\tilde{I}+c\tilde{J}

with

a\in\mathbb{R}, b\in\mathbb{R}, c\geq 0.

If we also allow conjugation by \tilde{J}, then

b\geq 0

can be achieved. (\operatorname{sgn}b is the chirality class of the matrix.)

In \mathrm{M}_{2}(\mathbb{C}), using conjugation by rotation matrices, the coefficient of \tilde{K} can be eliminated. Then, using conjugation by diagonal unitary matrices, the phase of the coefficient of \tilde{I} can be adjusted. In that way, the form

(13) a\operatorname{Id}_{2}+\mathrm{e}^{\mathrm{i}\beta}(s_{1}\tilde{I}+s_{2}\tilde{J})

with

a\in\mathbb{C}, s_{1}\geq 0, s_{2}\geq 0, \beta\in\mathbb{R}

can be achieved. Conjugating by \tilde{K}, \beta\in[0,\pi) or a similar restriction can be assumed. Using the definite abuse of notation \sqrt{\tilde{K}}=\frac{1+\mathrm{i}}{2}\operatorname{Id}_{2}+\frac{1-\mathrm{i}}{2}\tilde{K}, we find \sqrt{\tilde{K}}\tilde{I}\sqrt{\tilde{K}}^{-1}=-\mathrm{i}\tilde{J} and \sqrt{\tilde{K}}\tilde{J}\sqrt{\tilde{K}}^{-1}=-\mathrm{i}\tilde{I}; and \sqrt{\tilde{K}} is unitary. Thus, by a further unitary conjugation,

\beta\in[0,\pi/2)

or a similar restriction can be assumed. (We could also base a canonical form on \tilde{J} and \tilde{K}, but the present form is conveniently close to the real case.)

Here, normal matrices are characterized by the property that s_{1}=0 or s_{2}=0.

2.G. The visualization of certain subsets of \mathrm{M}_{2}(\mathbb{R})

In what follows, we will consider certain subsets S of \mathrm{M}_{2}(\mathbb{R}). By the rotational effect (12), an invariant subset S is best visualized through the image of the mapping

\Xi^{\mathrm{PH}}:\qquad a\operatorname{Id}_{2}+b\tilde{I}+c\tilde{J}+d\tilde{K}\mapsto(a,b,\sqrt{c^{2}+d^{2}}).

(\Xi^{\mathrm{PH}}(S) can be considered as a subset of the asymptotically closed Poincaré half-space.)

In particular, the image of the 0-centered sphere in \mathrm{M}_{2}(\mathbb{R}) of radius \frac{\pi}{2} is

\Xi^{\mathrm{PH}}\left(\partial\mathcal{B}_{\frac{\pi}{2}}\right)=\left\{(a,b,r)\,:\,\sqrt{a^{2}+b^{2}}+r=\frac{\pi}{2}\right\},

a “conical hat”.

2.H. Arithmetic consequences of dimension 2

For 2×2 complex matrices A, the Cayley–Hamilton equation reads as

(14) A^{2}=(\operatorname{tr}A)A-(\det A)\operatorname{Id}_{2}.

Obvious consequences are as follows: For the trace,

\operatorname{tr}A^{2}=(\operatorname{tr}A)^{2}-2(\det A);

i. e.,

\det A=\frac{(\operatorname{tr}A)^{2}-\operatorname{tr}A^{2}}{2};

in the invertible case,

A^{-1}=\frac{1}{\det A}\left((\operatorname{tr}A)\operatorname{Id}_{2}-A\right);

and, more generally, for the adjugate,

(15) \operatorname{adj}A=(\operatorname{tr}A)\operatorname{Id}_{2}-A.

Applying (15) to the identity (\operatorname{adj}\mathbf{v})(\operatorname{adj}A)=\operatorname{adj}(A\mathbf{v}), and expanding, we obtain

(16) A\mathbf{v}+\mathbf{v}A=(\operatorname{tr}A)\mathbf{v}+(\operatorname{tr}\mathbf{v})A+\operatorname{tr}(A\mathbf{v})\operatorname{Id}_{2}-(\operatorname{tr}A)(\operatorname{tr}\mathbf{v})\operatorname{Id}_{2}.

Multiplying by AA, and applying (14), it is easy to deduce that

(17) A\mathbf{v}A=\det(A)\mathbf{v}+\operatorname{tr}(A\mathbf{v})A-(\det A)(\operatorname{tr}\mathbf{v})\operatorname{Id}_{2}.
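Identities (16) and (17) are easily confirmed on random matrices; a minimal sketch (the helper names and the random seed are ours):

```python
import random

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lin(*terms):  # linear combination of (scalar, matrix) pairs
    return [[sum(s * M[i][j] for s, M in terms) for j in range(2)]
            for i in range(2)]

tr  = lambda X: X[0][0] + X[1][1]
det = lambda X: X[0][0] * X[1][1] - X[0][1] * X[1][0]
I2  = [[1.0, 0.0], [0.0, 1.0]]

random.seed(1)
A = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
v = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

# (16): Av + vA = (tr A)v + (tr v)A + (tr(Av) - (tr A)(tr v)) Id
lhs16 = lin((1, mul(A, v)), (1, mul(v, A)))
rhs16 = lin((tr(A), v), (tr(v), A), (tr(mul(A, v)) - tr(A) * tr(v), I2))
# (17): AvA = det(A)v + tr(Av)A - det(A)(tr v) Id
lhs17 = mul(A, mul(v, A))
rhs17 = lin((det(A), v), (tr(mul(A, v)), A), (-det(A) * tr(v), I2))

err = max(abs(L[i][j] - R[i][j])
          for L, R in [(lhs16, rhs16), (lhs17, rhs17)]
          for i in range(2) for j in range(2))
print(err)  # at rounding level
```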

This leads to

Lemma 2.3.

(o) Any non-commutative (real) polynomial of A,\mathbf{v} can be written as a linear combination of

(18) \operatorname{Id}_{2},\,A,\,\mathbf{v},\,[A,\mathbf{v}]

with coefficients which are (real) polynomials of

(19) \operatorname{tr}A, \operatorname{tr}\mathbf{v}, \det A, \det\mathbf{v}, \operatorname{tr}(A\mathbf{v}).

(a) Any scalar expression built up from \operatorname{Id}_{2},A,\mathbf{v} and \operatorname{tr}, \det, \operatorname{adj} using algebra operations can be written as a polynomial of (19).

(b) Any matrix expression built up from \operatorname{Id}_{2},A,\mathbf{v} and \operatorname{tr}, \det, \operatorname{adj} using algebra operations can be written as a linear combination of (18) with coefficients which are polynomials of (19).

Proof.

(o) Using (14) and (17), and also their versions with the roles of A and \mathbf{v} interchanged, repeatedly, we arrive at linear combinations of \operatorname{Id}_{2},A,\mathbf{v},A\mathbf{v},\mathbf{v}A (with appropriate coefficients). Due to (16), however, the last two terms can be traded for [A,\mathbf{v}]. (a–b) are best proven by simultaneous induction on the complexity of the algebraic expressions. In view of (o), the only nontrivial step is when we take the determinant of a matrix expression. This, however, can be resolved by using \det X=\dfrac{(\operatorname{tr}X)^{2}-\operatorname{tr}X^{2}}{2}. ∎

If we examine the proof of Lemma 2.3, we see that it also gives an algorithm for the reduction to standard form.

Example 2.4.

If we apply Lemma 2.3 with A and \mathbf{v}=A^{*}, then (19) yields

\operatorname{tr}A,\,\overline{\operatorname{tr}A},\,\det A,\,\overline{\det A},\,\operatorname{tr}(A^{*}A).

(Although X and its conjugate \overline{X} contain the same information, they are arithmetically different objects.) By taking complex linear combinations, we obtain

\operatorname{Re}\operatorname{tr}A,\,\operatorname{Im}\operatorname{tr}A,\,\operatorname{Re}\det A,\,\operatorname{Im}\det A,\,\operatorname{tr}(A^{*}A),

which are real quantities. These are the “five data” associated to A (cf. [13]). ∎

In general, if we allow adjoints in our expressions, then their complexity increases. We only note the identity

(20) \frac{\operatorname{tr}(A^{*}A)\operatorname{tr}(A^{*}\mathbf{v})}{2}-(\det A^{*})\left((\operatorname{tr}A)(\operatorname{tr}\mathbf{v})-\operatorname{tr}(A\mathbf{v})\right)=\operatorname{tr}\left(\left(A^{*}A-\frac{\operatorname{tr}(A^{*}A)}{2}\operatorname{Id}_{2}\right)\left(A^{*}\mathbf{v}-\frac{\operatorname{tr}(A^{*}\mathbf{v})}{2}\operatorname{Id}_{2}\right)\right),

which can be checked by direct computation. (Matrix expressions with “more than two variables” can also be dealt with systematically, but the picture is more complicated.)

Rearranging (14) into a full square, we obtain

(21) \left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)^{2}=(-D_{A})\operatorname{Id}_{2}.

It is useful to introduce the notation

T_{A,\mathbf{v}}=\frac{1}{2}\operatorname{tr}\left(\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)\left(\mathbf{v}-\frac{\operatorname{tr}\mathbf{v}}{2}\operatorname{Id}_{2}\right)\right).

By (21), D_{A}=-T_{A,A}; thus T_{A,\mathbf{v}} can be thought of as the polarization of D_{A} (times -1). Then one can prove

(22) [A,[A,[A,\mathbf{v}]]]=-4D_{A}[A,\mathbf{v}];
(23) [A,[\mathbf{v},[A,\mathbf{v}]]]=[\mathbf{v},[A,[A,\mathbf{v}]]]=4T_{A,\mathbf{v}}[A,\mathbf{v}];
(24) [\mathbf{v},[\mathbf{v},[A,\mathbf{v}]]]=-4D_{\mathbf{v}}[A,\mathbf{v}].

Indeed, in (22), A can be replaced by A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}, in which form (21) implies (22). Then (24) follows by interchange of variables, and (23) follows by polarization in the first two commutator variables.
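The absorption rules (22)–(24) can likewise be confirmed numerically; a minimal sketch (helper names and seed ours), using D_{A}=\det A-(\operatorname{tr}A)^{2}/4 and the polarized T_{A,\mathbf{v}}:

```python
import random

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def comm(X, Y):
    P, Q = mul(X, Y), mul(Y, X)
    return [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]

def smul(s, X):
    return [[s * X[i][j] for j in range(2)] for i in range(2)]

tr  = lambda X: X[0][0] + X[1][1]
det = lambda X: X[0][0] * X[1][1] - X[0][1] * X[1][0]
D   = lambda X: det(X) - tr(X) ** 2 / 4

def T(X, Y):  # T_{X,Y} = (1/2) tr((X - trX/2 Id)(Y - trY/2 Id))
    X0 = [[X[i][j] - (tr(X) / 2 if i == j else 0) for j in range(2)] for i in range(2)]
    Y0 = [[Y[i][j] - (tr(Y) / 2 if i == j else 0) for j in range(2)] for i in range(2)]
    return tr(mul(X0, Y0)) / 2

random.seed(2)
A = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
v = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
w = comm(A, v)

def gap(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

e22 = gap(comm(A, comm(A, w)), smul(-4 * D(A), w))     # (22)
e23 = gap(comm(A, comm(v, w)), smul(4 * T(A, v), w))   # (23)
e24 = gap(comm(v, comm(v, w)), smul(-4 * D(v), w))     # (24)
print(e22, e23, e24)  # all at rounding level
```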

The commutator analogue of Lemma 2.3 is

Lemma 2.5.

Any (real) commutator polynomial of A,\mathbf{v} with no uncommutatored terms can be written as a linear combination of

(25) [A,\mathbf{v}],\,[A,[A,\mathbf{v}]],\,[\mathbf{v},[\mathbf{v},A]]

with coefficients which are (real) polynomials of

(26) D_{A}, T_{A,\mathbf{v}}, D_{\mathbf{v}}.
Proof.

Commutator expressions can be reduced to linear combinations of iterated left commutators, whose ends of length four are either trivial or can be reduced by (22) or (23) (essentially). After the reduction, only commutators of length 2 or 3 are left, with appropriate coefficients. Conversely, by (22), (23) or (24), the multipliers D_{A}, T_{A,\mathbf{v}}, D_{\mathbf{v}} can be absorbed into commutators. ∎

Lemma 2.6.

We find:

(a) In Lemma 2.3, the entries (18) and (19) are formally algebraically independent.

(b) In Lemma 2.5, the entries (25) and (26) are formally algebraically independent.

Proof.

(a) Even the particular case of Example 2.4 demonstrates that the entries of (18) and (19) are algebraically independent over the reals. This also implies independence in general case over the reals. As all the algebraic rules are inherently real, this also implies independence over the complex numbers.

(b) It is easy to see that in the statement of Lemma 2.3, detA\det A, tr(A𝐯)\operatorname{tr}(A\mathbf{v}), det𝐯\det\mathbf{v} can be replaced by DAD_{A}, TA,𝐯T_{A,\mathbf{v}}, D𝐯D_{\mathbf{v}}, respectively. Regarding the base terms, the identities

(27) [A,[A,𝐯]]=4DA(𝐯tr𝐯2Id2)4TA,𝐯(AtrA2Id2)[A,[A,\mathbf{v}]]=-4D_{A}\left(\mathbf{v}-\frac{\operatorname{tr}\mathbf{v}}{2}\operatorname{Id}_{2}\right)-4T_{A,\mathbf{v}}\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)

and

(28) [𝐯,[𝐯,𝐀]]=4TA,𝐯(𝐯tr𝐯2Id2)4D𝐯(AtrA2Id2)[\mathbf{v},[\mathbf{v},\mathbf{A}]]=-4T_{A,\mathbf{v}}\left(\mathbf{v}-\frac{\operatorname{tr}\mathbf{v}}{2}\operatorname{Id}_{2}\right)-4D_{\mathbf{v}}\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)

show the independence of the new base terms as |DATA,𝐯TA,𝐯D𝐯|0\left|\begin{matrix}D_{A}&T_{A,\mathbf{v}}\\ T_{A,\mathbf{v}}&D_{\mathbf{v}}\end{matrix}\right|\not\equiv 0. ∎

As a consequence, we have obtained a normal form for (formal) commutator expressions in the case of 2×22\times 2 matrices. As the absorption rules (22–24) are quite simple, this normal form is quite practical.

We also mention the identities

(29) TA,[A,𝐯]=0;T_{A,[A,\mathbf{v}]}=0;
(30) TA,[A,[A,𝐯]]=0;T_{A,[A,[A,\mathbf{v}]]}=0;
(31) TA,[𝐯,[𝐯,A]]=4(DAD𝐯TA,𝐯2);T_{A,[\mathbf{v},[\mathbf{v},A]]}=4(D_{A}D_{\mathbf{v}}-T_{A,\mathbf{v}}^{2});

which are easy to check by direct computation.
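As an informal numerical check (not part of the formal development), the identities (29)–(31) can be tested on random matrices. The sketch below is in Python; the normalizations DX=det(X(trX/2)Id2)D_{X}=\det(X-(\operatorname{tr}X/2)\operatorname{Id}_{2}) and TX,Y=12tr((X(trX/2)Id2)(Y(trY/2)Id2))T_{X,Y}=\tfrac{1}{2}\operatorname{tr}((X-(\operatorname{tr}X/2)\operatorname{Id}_{2})(Y-(\operatorname{tr}Y/2)\operatorname{Id}_{2})) are assumptions, inferred as the normalizations consistent with (27) and (28).

```python
import random

# 2x2 real matrices as nested lists [[a, b], [c, d]]
def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def comm(X, Y):  # commutator [X, Y]
    P, Q = mul(X, Y), mul(Y, X)
    return [[P[i][j] - Q[i][j] for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def traceless(X):  # X - (tr X / 2) Id_2
    t = tr(X) / 2
    return [[X[0][0] - t, X[0][1]], [X[1][0], X[1][1] - t]]

def D(X):  # assumed normalization: D_X = det(X - (tr X / 2) Id_2)
    X0 = traceless(X)
    return X0[0][0]*X0[1][1] - X0[0][1]*X0[1][0]

def T(X, Y):  # assumed normalization: T_{X,Y} = (1/2) tr(X0 Y0)
    return tr(mul(traceless(X), traceless(Y))) / 2

random.seed(1)
A = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
v = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

assert abs(T(A, comm(A, v))) < 1e-12                                  # (29)
assert abs(T(A, comm(A, comm(A, v)))) < 1e-12                         # (30)
assert abs(T(A, comm(v, comm(v, A)))
           - 4*(D(A)*D(v) - T(A, v)**2)) < 1e-12                      # (31)
```

With these normalizations, (27) and (28) also hold entrywise for the same random data; only (29)–(31) are asserted here.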

2.I. Self-adjointness, conform-unitarity, normality

Note that if BB is a 2×22\times 2 complex self-adjoint matrix, then it is of real hyperbolic or parabolic type. Thus

DB0ifB=B,-D_{B}\geq 0\qquad\text{if}\qquad B=B^{*},

with equality if and only if BB is a (real) scalar matrix.

In particular, for any 2×22\times 2 complex matrix AA,

DAA=(tr(AA)2)2|detA|2=12tr(AAtr(AA)2Id2)20.-D_{A^{*}A}=\left(\frac{\operatorname{tr}(A^{*}A)}{2}\right)^{2}-|\det A|^{2}=\frac{1}{2}\operatorname{tr}\left(A^{*}A-\frac{\operatorname{tr}(A^{*}A)}{2}\operatorname{Id}_{2}\right)^{2}\quad\geq\quad 0.

Equality holds if and only if AA is normal and its eigenvalues have equal absolute values, i.e. AA is conform-unitary (i.e. it is a unitary matrix times a positive scalar) or AA is the 0 matrix. (This follows by considering the unitary-conjugated triangular form of the matrices.) Note that the 0 matrix is excluded from being conform-unitary. In the real case, we can speak about conform-orthogonal matrices. Then, 2×22\times 2 conform-orthogonal matrices are either conform-rotations or conform-reflections.

For 2×22\times 2 complex matrices, in terms of the Frobenius norm,

[A,A]Frob2=tr([A,A][A,A])=tr[A,A]2=2det[A,A]=2D[A,A].\|[A,A^{*}]\|_{\mathrm{Frob}}^{2}=\operatorname{tr}\left([A,A^{*}]^{*}[A,A^{*}]\right)=\operatorname{tr}\,[A,A^{*}]^{2}=-2\det[A,A^{*}]=-2D_{[A,A^{*}]}.

Thus, tr[A,A]2=0\operatorname{tr}\,[A,A^{*}]^{2}=0, det[A,A]=0\det[A,A^{*}]=0, D[A,A]=0D_{[A,A^{*}]}=0 are all equivalent to the normality of AA. Actually, one can see, in terms of the operator norm, [A,A]Frob2=2[A,A]22\|[A,A^{*}]\|_{\mathrm{Frob}}^{2}=2\|[A,A^{*}]\|_{2}^{2}.

A somewhat strange quantity is D(trA)¯A+(trA)A2-D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}}. One can check that it vanishes if and only if trA=0\operatorname{tr}A=0 or AA is conform-unitary.

Lemma 2.7.

If AA is a complex 2×22\times 2 matrix, then

DAA=(tr(AA)2)2|detA|2;-D_{A^{*}A}=\left(\frac{\operatorname{tr}(A^{*}A)}{2}\right)^{2}-|\det A|^{2};
D(trA)¯A+(trA)A2=tr(AA)|trA|2det(A)tr(A)2det(A)tr(A)24;-D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}}=\dfrac{\operatorname{tr}(A^{*}A)|\operatorname{tr}A|^{2}-\det(A^{*})\operatorname{tr}(A)^{2}-\det(A)\operatorname{tr}(A^{*})^{2}}{4};
D12[A,A]=(trAA2|trA|24)2|(detA)(trA)24|2.-D_{\frac{1}{2}[A,A^{*}]}=\left(\frac{\operatorname{tr}A^{*}A}{2}-\frac{|\operatorname{tr}A|^{2}}{4}\right)^{2}-\left|(\det A)-\frac{(\operatorname{tr}A)^{2}}{4}\right|^{2}.
Proof.

Straightforward computation. ∎

In the case of real matrices the computations are typically much simpler, especially in skew-quaternionic form. For example,

Lemma 2.8.

If A=a~Id2+b~I~+c~J~+d~K~A=\tilde{a}\operatorname{Id}_{2}+\tilde{b}\tilde{I}+\tilde{c}\tilde{J}+\tilde{d}\tilde{K} is a real 2×22\times 2 matrix, then

DAA=4(a~2+b~2)(c~2+d~2),-D_{A^{*}A}=4(\tilde{a}^{2}+\tilde{b}^{2})(\tilde{c}^{2}+\tilde{d}^{2}),
D(trA)¯A+(trA)A2=4a~2(c~2+d~2),-D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}}=4\tilde{a}^{2}(\tilde{c}^{2}+\tilde{d}^{2}),
D12[A,A]=4b~2(c~2+d~2).-D_{\frac{1}{2}[A,A^{*}]}=4\tilde{b}^{2}(\tilde{c}^{2}+\tilde{d}^{2}).
Proof.

Simple computation. ∎

2.J. Exponentials (review, alternative)

Recall that Cos\operatorname{\not{\mathrm{C}}os} and Sin\operatorname{\not{\mathrm{S}}in} are entire functions on the complex plane such that

Cos(z)=n=0(1)nzn(2n)! and Sin(z)=n=0(1)nzn(2n+1)!.\operatorname{\not{\mathrm{C}}os}(z)=\sum_{n=0}^{\infty}(-1)^{n}\frac{z^{n}}{(2n)!}\text{\qquad and \qquad}\operatorname{\not{\mathrm{S}}in}(z)=\sum_{n=0}^{\infty}(-1)^{n}\frac{z^{n}}{(2n+1)!}.

For xx\in\mathbb{R},

Cos(x)={cosxif x>01if x=0coshxif x<0, and Sin(x)={sinxxif x>01if x=0sinhxxif x<0.\operatorname{\not{\mathrm{C}}os}(x)=\begin{cases}\cos\sqrt{x}&\text{if }x>0\\ 1&\text{if }x=0\\ \cosh\sqrt{-x}&\text{if }x<0,\end{cases}\text{\qquad and \qquad}\operatorname{\not{\mathrm{S}}in}(x)=\begin{cases}\dfrac{\sin\sqrt{x}}{\sqrt{x}}&\text{if }x>0\\ 1&\text{if }x=0\\ \dfrac{\sinh\sqrt{-x}}{\sqrt{-x}}&\text{if }x<0.\end{cases}
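The series definitions and the piecewise elementary descriptions above are easy to cross-check numerically; a minimal Python sketch (truncating the series at 40 terms, which is ample for moderate arguments):

```python
import math

def Cos(x, terms=40):  # power series definition of the slashed cosine
    return sum((-1)**n * x**n / math.factorial(2*n) for n in range(terms))

def Sin(x, terms=40):  # power series definition of the slashed sine
    return sum((-1)**n * x**n / math.factorial(2*n + 1) for n in range(terms))

assert abs(Cos(0.0) - 1.0) < 1e-15 and abs(Sin(0.0) - 1.0) < 1e-15

for x in [0.3, 2.5, 7.0]:       # positive branch: trigonometric
    r = math.sqrt(x)
    assert abs(Cos(x) - math.cos(r)) < 1e-12
    assert abs(Sin(x) - math.sin(r) / r) < 1e-12

for x in [-0.3, -2.5, -7.0]:    # negative branch: hyperbolic
    r = math.sqrt(-x)
    assert abs(Cos(x) - math.cosh(r)) < 1e-12
    assert abs(Sin(x) - math.sinh(r) / r) < 1e-12
```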
Lemma 2.9.

Let AA be a complex 2×22\times 2 matrix. Then

expA=(exptrA2)(Cos(DA)Id2+Sin(DA)(AtrA2Id2)).\exp A=\left(\exp\frac{\operatorname{tr}A}{2}\right)\cdot\left(\operatorname{\not{\mathrm{C}}os}\left(D_{A}\right)\operatorname{Id}_{2}+\operatorname{\not{\mathrm{S}}in}\left(D_{A}\right)\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)\right).
Proof.

A=trA2Id2+(AtrA2Id2)A=\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}+\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right) gives a decomposition to commuting operators for which, by multiplicativity, the exponential can be computed separately. In the case of the first summand this is trivial. In the case of the second summand, the identity (21) and the comparison of the power series implies the statement. ∎
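Lemma 2.9 can also be tested numerically against the Taylor series of the exponential. In the Python sketch below, the normalization DA=det(A(trA/2)Id2)D_{A}=\det(A-(\operatorname{tr}A/2)\operatorname{Id}_{2}) is an assumption (the one under which the closed form reproduces the series):

```python
import math, cmath

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm_series(A, terms=60):  # reference value: truncated Taylor series of exp
    R = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        Q = mul(P, A)
        P = [[Q[i][j] / k for j in range(2)] for i in range(2)]
        R = [[R[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return R

def Cos(z, terms=40):
    return sum((-1)**n * z**n / math.factorial(2*n) for n in range(terms))

def Sin(z, terms=40):
    return sum((-1)**n * z**n / math.factorial(2*n + 1) for n in range(terms))

def expm_closed(A):  # Lemma 2.9, assuming D_A = det(A - (tr A / 2) Id_2)
    t = (A[0][0] + A[1][1]) / 2
    A0 = [[A[0][0] - t, A[0][1]], [A[1][0], A[1][1] - t]]
    DA = A0[0][0]*A0[1][1] - A0[0][1]*A0[1][0]
    c, s, e = Cos(DA), Sin(DA), cmath.exp(t)
    return [[e * (c*(i == j) + s*A0[i][j]) for j in range(2)] for i in range(2)]

A = [[0.4 + 0.3j, 1.1], [-0.7, 0.2 - 0.5j]]
E1, E2 = expm_series(A), expm_closed(A)
assert max(abs(E1[i][j] - E2[i][j]) for i in range(2) for j in range(2)) < 1e-10
```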

2.K. The differential calculus of Cos\operatorname{\not{\mathrm{C}}os} and Sin\operatorname{\not{\mathrm{S}}in} (review)

First of all, it is useful to notice that

z=1Cos(z)2Sin(z)2z=\frac{1-\operatorname{\not{\mathrm{C}}os}(z)^{2}}{\operatorname{\not{\mathrm{S}}in}(z)^{2}}

(as entire analytic functions). Then one can easily see that

Cos(z)=12Sin(z)\operatorname{\not{\mathrm{C}}os}^{\prime}(z)=-\frac{1}{2}\operatorname{\not{\mathrm{S}}in}(z)

and

Sin(z)=Cos(z)Sin(z)2z=12Sin(z)2(Cos(z)Sin(z))1Cos(z)2\operatorname{\not{\mathrm{S}}in}^{\prime}(z)=\frac{\operatorname{\not{\mathrm{C}}os}(z)-\operatorname{\not{\mathrm{S}}in}(z)}{2z}\\ =\frac{1}{2}\frac{\operatorname{\not{\mathrm{S}}in}(z)^{2}\cdot(\operatorname{\not{\mathrm{C}}os}(z)-\operatorname{\not{\mathrm{S}}in}(z))}{1-\operatorname{\not{\mathrm{C}}os}(z)^{2}}

(as entire analytic functions). In particular, differentiation will not lead out of the rational field generated by Cos\operatorname{\not{\mathrm{C}}os} and Sin\operatorname{\not{\mathrm{S}}in}.  
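The derivative rules and the algebraic identity above can be confirmed by central finite differences; a brief Python sketch:

```python
import math

def Cos(x, terms=40):  # series definition
    return sum((-1)**n * x**n / math.factorial(2*n) for n in range(terms))

def Sin(x, terms=40):
    return sum((-1)**n * x**n / math.factorial(2*n + 1) for n in range(terms))

h = 1e-6
for x in [1.2, -2.5]:
    dCos = (Cos(x + h) - Cos(x - h)) / (2*h)            # central difference
    dSin = (Sin(x + h) - Sin(x - h)) / (2*h)
    assert abs(dCos - (-0.5)*Sin(x)) < 1e-8             # Cos' = -(1/2) Sin
    assert abs(dSin - (Cos(x) - Sin(x)) / (2*x)) < 1e-8
    assert abs(x - (1 - Cos(x)**2) / Sin(x)**2) < 1e-10  # z = (1-Cos^2)/Sin^2
```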

2.L. A notable differential equation (review)

Lemma 2.10.

The solution of the ordinary differential equation

dA(θ)dθA(θ)1=a[11]+b[sin2cθcos2cθcos2cθsin2cθ]aI~+exp(cθI~)bK~exp(cθI~),\frac{\mathrm{d}A(\theta)}{\mathrm{d}\theta}A(\theta)^{-1}=a\begin{bmatrix}&-1\\ 1&\end{bmatrix}+b\begin{bmatrix}-\sin 2c\theta&\cos 2c\theta\\ \cos 2c\theta&\sin 2c\theta\end{bmatrix}\equiv a\tilde{I}+\exp(c\theta\tilde{I})b\tilde{K}\exp(-c\theta\tilde{I}),

with initial data

A(0)=[11]Id2,A(0)=\begin{bmatrix}1&\\ &1\end{bmatrix}\equiv\operatorname{Id}_{2},

is given by

A(θ)=F(aθ,bθ,cθ);A(\theta)=F(a\theta,b\theta,c\theta);

where

F(s,p,w)=exp(wI~)exp((ws)I~+pK~).F(s,p,w)=\exp(w\tilde{I})\exp(-(w-s)\tilde{I}+p\tilde{K}).\qed

In particular,

expL((a[11]+b[sin2cθcos2cθcos2cθsin2cθ])dθ|[0,π])=F(aπ,bπ,cπ).\operatorname{exp_{L}}\left(\left(a\begin{bmatrix}&-1\\ 1&\end{bmatrix}+b\begin{bmatrix}-\sin 2c\theta&\cos 2c\theta\\ \cos 2c\theta&\sin 2c\theta\end{bmatrix}\right)d\theta|_{[0,\pi]}\right)=F(a\pi,b\pi,c\pi).

We will also use the special notation

W(p,w)=F(0,p,w)exp(wI~)exp(wI~+pK~).W(p,w)=F(0,p,w)\equiv\exp(w\tilde{I})\exp(-w\tilde{I}+p\tilde{K}).
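Lemma 2.10 can be checked by integrating the differential equation numerically. The Python sketch below uses a classical Runge–Kutta scheme (our choice) and evaluates the matrix exponentials of traceless matrices through Lemma 2.9, with DX=detXD_{X}=\det X assumed as the normalization there:

```python
import math

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def scale(c, X):
    return [[c * X[i][j] for j in range(2)] for i in range(2)]

def Cos(x, terms=40):
    return sum((-1)**n * x**n / math.factorial(2*n) for n in range(terms))

def Sin(x, terms=40):
    return sum((-1)**n * x**n / math.factorial(2*n + 1) for n in range(terms))

def exp2(X):  # exp of a traceless 2x2 matrix via Lemma 2.9 (D_X = det X assumed)
    D = X[0][0]*X[1][1] - X[0][1]*X[1][0]
    return add(scale(Cos(D), [[1.0, 0.0], [0.0, 1.0]]), scale(Sin(D), X))

Itil = [[0.0, -1.0], [1.0, 0.0]]  # I~ and K~ as displayed in the lemma
Ktil = [[0.0, 1.0], [1.0, 0.0]]

def F(s, p, w):  # F(s,p,w) = exp(w I~) exp(-(w-s) I~ + p K~)
    return mul(exp2(scale(w, Itil)),
               exp2(add(scale(-(w - s), Itil), scale(p, Ktil))))

a, b, c = 0.7, 0.4, 0.9

def M(theta):  # right-hand side: a I~ + exp(c theta I~) b K~ exp(-c theta I~)
    s, co = math.sin(2*c*theta), math.cos(2*c*theta)
    return [[-b*s, -a + b*co], [a + b*co, b*s]]

# classical RK4 for A'(theta) = M(theta) A(theta), A(0) = Id
A = [[1.0, 0.0], [0.0, 1.0]]
N, T = 2000, 1.3
h = T / N
for i in range(N):
    t = i * h
    k1 = mul(M(t), A)
    k2 = mul(M(t + h/2), add(A, scale(h/2, k1)))
    k3 = mul(M(t + h/2), add(A, scale(h/2, k2)))
    k4 = mul(M(t + h), add(A, scale(h, k3)))
    incr = add(add(k1, scale(2.0, k2)), add(scale(2.0, k3), k4))
    A = add(A, scale(h/6, incr))

B = F(a*T, b*T, c*T)
assert max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2)) < 1e-6
```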

2.M. The calculus of Cot\operatorname{\not{\mathrm{C}}ot}

In what follows, we also use the meromorphic function

Cot(z)=Cos(z)Sin(z).\operatorname{\not{\mathrm{C}}ot}(z)=\frac{\operatorname{\not{\mathrm{C}}os}(z)}{\operatorname{\not{\mathrm{S}}in}(z)}.

Cot\operatorname{\not{\mathrm{C}}ot} has poles at z=k2π2z=k^{2}\pi^{2}, where kk is a positive integer. It is easy to see that

Cot(z)=12(1Cot(z)z+Cot(z)2z).\operatorname{\not{\mathrm{C}}ot}^{\prime}(z)=-\frac{1}{2}\left(1-\frac{\operatorname{\not{\mathrm{C}}ot}(z)}{z}+\frac{\operatorname{\not{\mathrm{C}}ot}(z)^{2}}{z}\right).

Consequently, (z,Cot(z))\mathbb{R}\boldsymbol{(}z,\operatorname{\not{\mathrm{C}}ot}(z)\boldsymbol{)} is a differential field (induced from, say, (,π2)(-\infty,\pi^{2})), which is a subset of the differential field (Sin(z),Cos(z))\mathbb{R}\boldsymbol{(}\operatorname{\not{\mathrm{S}}in}(z),\operatorname{\not{\mathrm{C}}os}(z)\boldsymbol{)}. It is not much smaller: As

Sin(z)2=1Cot(z)2+z,\operatorname{\not{\mathrm{S}}in}(z)^{2}=\frac{1}{\operatorname{\not{\mathrm{C}}ot}(z)^{2}+z},

it is not hard to show that the degree of the field extension is

(Sin(z),Cos(z)):(z,Cot(z))=2.\mathbb{R}\boldsymbol{(}\operatorname{\not{\mathrm{S}}in}(z),\operatorname{\not{\mathrm{C}}os}(z)\boldsymbol{)}:\mathbb{R}\boldsymbol{(}z,\operatorname{\not{\mathrm{C}}ot}(z)\boldsymbol{)}=2.

2.N. A collection of auxiliary functions

Lemma 2.11.

The expressions

(a)

(z)=2Sin(z)Sin(z)=Cot(z)+1z\operatorname{\not{\mathbf{C}}}(z)=-2\frac{\operatorname{\not{\mathrm{S}}in}^{\prime}(z)}{\operatorname{\not{\mathrm{S}}in}(z)}=\frac{-\operatorname{\not{\mathrm{C}}ot}(z)+1}{z}

(b)

(z)=2z(z)+(z)=z(z)22(z)+1=Cot(z)2+z1z\operatorname{\not{\mathbf{D}}}(z)=2z\operatorname{\not{\mathbf{C}}}^{\prime}(z)+\operatorname{\not{\mathbf{C}}}(z)=z\operatorname{\not{\mathbf{C}}}(z)^{2}-2\operatorname{\not{\mathbf{C}}}(z)+1=\frac{\operatorname{\not{\mathrm{C}}ot}(z)^{2}+z-1}{z}

(c)

(z)=2(z)=Cot(z)2+Cot(z)+z2z2\operatorname{\not{\mathbf{W}}}(z)=2\operatorname{\not{\mathbf{C}}}^{\prime}(z)=\frac{\operatorname{\not{\mathrm{C}}ot}(z)^{2}+\operatorname{\not{\mathrm{C}}ot}(z)+z-2}{z^{2}}

(d)

(z)=(z)22(z)=3Cot(z)z+3z2\operatorname{\not{\mathbf{P}}}(z)=\operatorname{\not{\mathbf{C}}}(z)^{2}-2\operatorname{\not{\mathbf{C}}}^{\prime}(z)=\frac{-3\operatorname{\not{\mathrm{C}}ot}(z)-z+3}{z^{2}}

(e)

(z)=(z)=2z′′(z)+3(z)=z(z)32(z)22(z)+(z)\operatorname{\not{\mathbf{G}}}(z)=\operatorname{\not{\mathbf{D}}}^{\prime}(z)=2z\operatorname{\not{\mathbf{C}}}^{\prime\prime}(z)+3\operatorname{\not{\mathbf{C}}}^{\prime}(z)=z\operatorname{\not{\mathbf{C}}}(z)^{3}-2\operatorname{\not{\mathbf{C}}}(z)^{2}-2\operatorname{\not{\mathbf{C}}}^{\prime}(z)+\operatorname{\not{\mathbf{C}}}(z)
=Cot(z)3zCot(z)+1z2=\frac{-\operatorname{\not{\mathrm{C}}ot}(z)^{3}-z\operatorname{\not{\mathrm{C}}ot}(z)+1}{z^{2}}

(f)

𝐋(z)=(2)(z)(z)3=(z)3+2(z)(z)2(z)\operatorname{\not{\mathbf{L}}}(z)=\left(\frac{\operatorname{\not{\mathbf{D}}}}{\operatorname{\not{\mathbf{C}}}^{2}}\right)^{\prime}(z)\cdot\operatorname{\not{\mathbf{C}}}(z)^{3}=\operatorname{\not{\mathbf{C}}}(z)^{3}+2\operatorname{\not{\mathbf{C}}}^{\prime}(z)\operatorname{\not{\mathbf{C}}}(z)-2\operatorname{\not{\mathbf{C}}}(z)
=2Cot(z)3zCot(z)2+3Cot(z)22zCot(z)z2+3z1z3=\frac{-2\operatorname{\not{\mathrm{C}}ot}(z)^{3}-z\operatorname{\not{\mathrm{C}}ot}(z)^{2}+3\operatorname{\not{\mathrm{C}}ot}(z)^{2}-2z\operatorname{\not{\mathrm{C}}ot}(z)-z^{2}+3z-1}{z^{3}}

(g)

(z)=43′′(z)=2Cot(z)33Cot(z)22zCot(z)3Cot(z)3z+83z3\operatorname{\not{\mathbf{X}}}(z)=\frac{4}{3}\operatorname{\not{\mathbf{C}}}^{\prime\prime}(z)={\frac{-2\,{\operatorname{\not{\mathrm{C}}ot}(z)}^{3}-3\,{\operatorname{\not{\mathrm{C}}ot}(z)}^{2}-2\,z\,{\operatorname{\not{\mathrm{C}}ot}(z)}-3\,{\operatorname{\not{\mathrm{C}}ot}(z)}-3\,z+8}{3{z}^{3}}}

define meromorphic functions ,,,,,𝐋,\operatorname{\not{\mathbf{C}}},\operatorname{\not{\mathbf{D}}},\operatorname{\not{\mathbf{W}}},\operatorname{\not{\mathbf{P}}},\operatorname{\not{\mathbf{G}}},\operatorname{\not{\mathbf{L}}},\operatorname{\not{\mathbf{X}}} on the complex plane such that

(i) They are holomorphic on \mathbb{C} except at z=(kπ)2z=(k\pi)^{2} where kk is a positive integer. In those latter points they have poles of order

 12213𝐋33 .\text{ $\operatorname{\not{\mathbf{C}}}\mapsto 1$, $\operatorname{\not{\mathbf{D}}}\mapsto 2$, $\operatorname{\not{\mathbf{W}}}\mapsto 2$, $\operatorname{\not{\mathbf{P}}}\mapsto 1$, $\operatorname{\not{\mathbf{G}}}\mapsto 3$, $\operatorname{\not{\mathbf{L}}}\mapsto 3$, $\operatorname{\not{\mathbf{X}}}\mapsto 3$ }.

(ii) They are strictly monotone increasing on the interval (,π2)(-\infty,\pi^{2}) with range (0,)(0,\infty).

(iii) In particular, they are strictly positive on the interval (,π2)(-\infty,\pi^{2}).

Proof.

These facts can be derived using various methods of (complex) analysis. ∎

The following table contains easily recoverable information:

functionasymptotics as xvalue at x=0 asymptotics at zk2π2  (k{0})|x|1/21321zk2π2|x|1134π2k2(zk2π2)2|x|3/224541(zk2π2)2|x|11156π21k2(zk2π2)|x|21158π2k2(zk2π2)3𝐋|x|31135161(zk2π2)3|x|5/21628351631(zk2π2)3\begin{array}[]{|c|c|c|c|}\hline\cr&&&\\ \text{function}&\text{asymptotics as $x\searrow-\infty$}&\text{value at $x=0$}&\text{ asymptotics at $z\sim k^{2}\pi^{2}$ }\\ &&&\text{ ($k\in\mathbb{N}\setminus\{0\}$)}\\ &&&\\ \operatorname{\not{\mathbf{C}}}&|x|^{-1/2}&\dfrac{1}{3}&-2\cdot\dfrac{1}{z-k^{2}\pi^{2}}\\ &&&\\ \operatorname{\not{\mathbf{D}}}&|x|^{-1}&\dfrac{1}{3}&4\pi^{2}\cdot\dfrac{k^{2}}{(z-k^{2}\pi^{2})^{2}}\\ &&&\\ \operatorname{\not{\mathbf{W}}}&|x|^{-3/2}&\dfrac{2}{45}&4\cdot\dfrac{1}{(z-k^{2}\pi^{2})^{2}}\\ &&&\\ \operatorname{\not{\mathbf{P}}}&|x|^{-1}&\dfrac{1}{15}&-\dfrac{6}{\pi^{2}}\cdot\dfrac{1}{k^{2}(z-k^{2}\pi^{2})}\\ &&&\\ \operatorname{\not{\mathbf{G}}}&|x|^{-2}&\dfrac{1}{15}&-8\pi^{2}\cdot\dfrac{k^{2}}{(z-k^{2}\pi^{2})^{3}}\\ &&&\\ \operatorname{\not{\mathbf{L}}}&|x|^{-3}&\dfrac{1}{135}&-16\cdot\dfrac{1}{(z-k^{2}\pi^{2})^{3}}\\ &&&\\ \operatorname{\not{\mathbf{X}}}&|x|^{-5/2}&\dfrac{16}{2835}&-\dfrac{16}{3}\cdot\dfrac{1}{(z-k^{2}\pi^{2})^{3}}\\ &&&\\ \hline\cr\end{array}

Furthermore, it is also easy to check that the identities

(z)2=(z)+(z)\operatorname{\not{\mathbf{C}}}(z)^{2}=\operatorname{\not{\mathbf{W}}}(z)+\operatorname{\not{\mathbf{P}}}(z)
(z)(z)=(z)+(z)\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{D}}}(z)=\operatorname{\not{\mathbf{W}}}(z)+\operatorname{\not{\mathbf{G}}}(z)
(z)(z)=(z)(z)+𝐋(z)\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{G}}}(z)=\operatorname{\not{\mathbf{D}}}(z)\operatorname{\not{\mathbf{W}}}(z)+\operatorname{\not{\mathbf{L}}}(z)
(z)(z)=(z)(z)+𝐋(z)\operatorname{\not{\mathbf{P}}}(z)\operatorname{\not{\mathbf{D}}}(z)=\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{W}}}(z)+\operatorname{\not{\mathbf{L}}}(z)
(z)(z)=(z)𝐋(z)+(z)2\operatorname{\not{\mathbf{G}}}(z)\operatorname{\not{\mathbf{P}}}(z)=\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{L}}}(z)+\operatorname{\not{\mathbf{W}}}(z)^{2}
(z)2(z)=(z)(z)+(z)(z)+𝐋(z)\operatorname{\not{\mathbf{C}}}(z)^{2}\operatorname{\not{\mathbf{D}}}(z)=\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{W}}}(z)+\operatorname{\not{\mathbf{D}}}(z)\operatorname{\not{\mathbf{W}}}(z)+\operatorname{\not{\mathbf{L}}}(z)
(z)(z)+2(z)(z)=𝐋(z)+(z)\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{P}}}(z)+2\operatorname{\not{\mathbf{C}}}(z)\operatorname{\not{\mathbf{W}}}(z)=\operatorname{\not{\mathbf{L}}}(z)+\operatorname{\not{\mathbf{W}}}(z)

hold.
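These identities, together with the positivity claim of Lemma 2.11(iii), are straightforward to verify numerically from the Cot-representations given in Lemma 2.11; a Python sketch for real arguments in (−∞, π²):

```python
import math

def Cot(x):  # Cot(x) = Cos(x)/Sin(x), elementary piecewise form on (-inf, pi^2)
    if x > 0:
        r = math.sqrt(x)
        return r * math.cos(r) / math.sin(r)
    if x == 0:
        return 1.0
    r = math.sqrt(-x)
    return r * math.cosh(r) / math.sinh(r)

def aux(x):  # the seven auxiliary functions, via their Cot-representations
    z, ct = x, Cot(x)
    C = (1 - ct) / z
    D = (ct**2 + z - 1) / z
    W = (ct**2 + ct + z - 2) / z**2
    P = (-3*ct - z + 3) / z**2
    G = (-ct**3 - z*ct + 1) / z**2
    L = (-2*ct**3 - z*ct**2 + 3*ct**2 - 2*z*ct - z**2 + 3*z - 1) / z**3
    X = (-2*ct**3 - 3*ct**2 - 2*z*ct - 3*ct - 3*z + 8) / (3 * z**3)
    return C, D, W, P, G, L, X

for x in [1.5, -2.0, 5.0]:            # sample points in (-inf, pi^2), x != 0
    C, D, W, P, G, L, X = aux(x)
    assert abs(C*C - (W + P)) < 1e-10
    assert abs(C*D - (W + G)) < 1e-10
    assert abs(C*G - (D*W + L)) < 1e-10
    assert abs(P*D - (C*W + L)) < 1e-10
    assert abs(G*P - (C*L + W*W)) < 1e-10
    assert abs(C*C*D - (C*W + D*W + L)) < 1e-10
    assert abs(C*P + 2*C*W - (L + W)) < 1e-10
    assert min(C, D, W, P, G, L, X) > 0   # Lemma 2.11(iii)
```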

Lemma 2.12.

(a) The function x1(x)|x(,π2)\left.x\mapsto\frac{1}{\sqrt{\operatorname{\not{\mathbf{D}}}(x)}}\right|_{x\in(-\infty,\pi^{2})} extends to an analytic function \mathcal{E} on (a neighbourhood of) \mathbb{R}. Special values are (0)=3\mathcal{E}(0)=\sqrt{3}, (π2)=0\mathcal{E}(\pi^{2})=0.

(b) The function x(x)(x)|x(,π2)\left.x\mapsto\frac{\operatorname{\not{\mathbf{C}}}(x)}{\sqrt{\operatorname{\not{\mathbf{D}}}(x)}}\right|_{x\in(-\infty,\pi^{2})} extends to an analytic function \mathcal{F} on (a neighbourhood of) \mathbb{R}. Special values are (0)=13\mathcal{F}(0)=\frac{1}{\sqrt{3}}, (π2)=1π\mathcal{F}(\pi^{2})=\frac{1}{\pi}.

(c) As analytic functions,

(z)=12(z)(z)3,and(z)=12𝐋(z)(z)3.\mathcal{E}^{\prime}(z)=-\frac{1}{2}\operatorname{\not{\mathbf{G}}}(z)\mathcal{E}(z)^{3},\qquad\text{and}\qquad\mathcal{F}^{\prime}(z)=-\frac{1}{2}\operatorname{\not{\mathbf{L}}}(z)\mathcal{E}(z)^{3}.

In particular, |(,π2]\mathcal{E}|_{(-\infty,\pi^{2}]} and |(,π2]\mathcal{F}|_{(-\infty,\pi^{2}]} are strictly decreasing.

Proof.

(a) It is sufficient to find a compatible extension to (0,+)(0,+\infty). One can check that, on the real line, (x)\operatorname{\not{\mathbf{D}}}(x) has poles with asymptotics 4π2k2(xk2π2)2\sim\frac{4\pi^{2}k^{2}}{(x-k^{2}\pi^{2})^{2}} at x=k2π2x=k^{2}\pi^{2} for positive integers kk; and otherwise (x)\operatorname{\not{\mathbf{D}}}(x) is positive on the real line. This implies that

(x)=sgnsinx(x)for x>0\mathcal{E}(x)=\frac{\operatorname{sgn}\sin\sqrt{x}}{\sqrt{\operatorname{\not{\mathbf{D}}}(x)}}\qquad\text{for $x>0$}

defines an appropriate extension. The special values are straightforward.

(b) As \operatorname{\not{\mathbf{C}}} has only simple poles with asymptotics 2xk2π2\sim\frac{-2}{x-k^{2}\pi^{2}}, these are cancelled out once multiplied by (x)\mathcal{E}(x). The special values are straightforward.

(c) This is straightforward. ∎

2.O. The function AC\operatorname{AC} (review, alternative)

If z(,1]z\in\mathbb{C}\setminus(-\infty,-1], and t[0,1]t\in[0,1], then it is easy to see that 1+2t(1t)(z1)1+2t(1-t)(z-1) is invertible. (For a fixed zz, the possible values yield the segment [1,1+z2]e\left[1,\frac{1+z}{2}\right]_{\mathrm{e}}.) Then one can define, for z(,1]z\in\mathbb{C}\setminus(-\infty,-1],

(32) AC(z)=t=0111+2t(1t)(z1)dt.\operatorname{AC}(z)=\int_{t=0}^{1}\frac{1}{1+2t(1-t)(z-1)}\,\mathrm{d}t.

This gives an analytic function in z(,1]z\in\mathbb{C}\setminus(-\infty,-1].

Lemma 2.13.

(a) In terms of the real domain,

AC(x)={arccosx1x2if 1<x<11if x=1arcoshxx21if 1<x.\operatorname{AC}(x)=\begin{cases}\dfrac{\arccos x}{\sqrt{1-x^{2}}}&\text{if }-1<x<1\\[8.53581pt] 1&\text{if }x=1\\[2.84526pt] \dfrac{\operatorname{arcosh}x}{\sqrt{x^{2}-1}}\qquad&\text{if }1<x.\\ \end{cases}

Note, this can be rewritten as AC(x)=1Sin(Cos1(x))\operatorname{AC}(x)=\dfrac{1}{\operatorname{\not{\mathrm{S}}in}(\operatorname{\not{\mathrm{C}}os}^{-1}(x))} for x(1,+)x\in(-1,+\infty).

(b) As an analytic function,

(33) AC(z)=zAC(z)11z2.\operatorname{AC}^{\prime}(z)=\frac{z\operatorname{AC}(z)-1}{1-z^{2}}.

(c) AC\operatorname{AC} vanishes nowhere on (,1]\mathbb{C}\setminus(-\infty,-1]. AC(z)=±1\operatorname{AC}(z)=\pm 1 holds only for z=1z=1 with AC(1)=1\operatorname{AC}(1)=1.

(d) AC(z)=113(z1)+O((z1)2)\operatorname{AC}(z)=1-\frac{1}{3}(z-1)+O((z-1)^{2}) at z1z\sim 1.

(e) AC\operatorname{AC} is strictly monotone decreasing on (1,+)(-1,+\infty) with range (+,0)e(+\infty,0)_{\mathrm{e}}.

Proof.

(b) For z±1z\neq\pm 1, differentiating under the integral sign, we find

AC(z)=\displaystyle\operatorname{AC}^{\prime}(z)= t=012t(1t)(1+2t(1t)(z1))2dt\displaystyle\int_{t=0}^{1}\frac{-2t(1-t)}{(1+2t(1-t)(z-1))^{2}}\,\mathrm{d}t
=\displaystyle= t=01z11+2t(1t)(z1)11z2dt+t=0116t(1t)4t2(1t)2(z1)(1+z)(1+2t(1t)(z1))2dt\displaystyle\int_{t=0}^{1}\frac{z\dfrac{1}{1+2t(1-t)(z-1)}-1}{1-z^{2}}\,\mathrm{d}t+\int_{t=0}^{1}\frac{1-6t(1-t)-4t^{2}(1-t)^{2}(z-1)}{(1+z)(1+2t(1-t)(z-1))^{2}}\,\mathrm{d}t
=\displaystyle= zt=0111+2t(1t)(z1)dt11z2+[t(1t)(12t)(1+z)(1+2t(1t)(z1))]t=01\displaystyle\frac{\displaystyle{z\int_{t=0}^{1}\dfrac{1}{1+2t(1-t)(z-1)}\,\mathrm{d}t-1}}{1-z^{2}}+\left[\frac{t(1-t)(1-2t)}{(1+z)(1+2t(1-t)(z-1))}\right]_{t=0}^{1}
=\displaystyle= zAC(z)11z2+0.\displaystyle\frac{z\operatorname{AC}(z)-1}{1-z^{2}}+0.

(a) The special value AC(1)=1\operatorname{AC}(1)=1 is trivial. Restricted to x(1,1)x\in(-1,1), using (33), we find ddx(AC(x)1x2)=11x2\dfrac{\mathrm{d}}{\mathrm{d}x}\left(\operatorname{AC}(x)\sqrt{1-x^{2}}\right)=-\dfrac{1}{\sqrt{1-x^{2}}}. Considering the primitive functions, we find AC(x)=(arccosx)+c1x2\operatorname{AC}(x)=\frac{(\arccos x)+c}{\sqrt{1-x^{2}}} for x(1,1)x\in(-1,1), with an appropriate cc\in\mathbb{C}. As AC\operatorname{AC} limits to a finite value for x1x\nearrow 1, only the case c=0c=0 is possible. Restriction to (1,+)(1,+\infty) is similar.

(e) is immediate from (32).

(c) From (32), sgnImAC(z)=sgnImz\operatorname{sgn}\operatorname{Im}\operatorname{AC}(z)=-\operatorname{sgn}\operatorname{Im}z is immediate. Then it is sufficient to restrict to (1,+)(-1,+\infty). Then the statement follows from (d).

(d) Elementary analysis. ∎

From Lemma 2.13(a), it is easy to see that, for x(,π2)x\in(-\infty,\pi^{2}),

(34) AC(Cos(x))=1Sin(x)\operatorname{AC}(\operatorname{\not{\mathrm{C}}os}(x))=\frac{1}{\operatorname{\not{\mathrm{S}}in}(x)}

holds. By analytic continuation, it also holds in an open neighbourhood of (,π2)(-\infty,\pi^{2}).
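The integral definition (32), the elementary piecewise form of Lemma 2.13(a), and the identity (34) can all be cross-checked numerically; a Python sketch (using a midpoint rule for the integral, our choice):

```python
import math

def AC_integral(z, n=50000):  # midpoint rule for the defining integral (32)
    h = 1.0 / n
    return h * sum(1.0 / (1.0 + 2*t*(1 - t)*(z - 1))
                   for t in ((k + 0.5) * h for k in range(n)))

def AC_closed(x):  # the elementary piecewise form of Lemma 2.13(a)
    if -1 < x < 1:
        return math.acos(x) / math.sqrt(1 - x*x)
    if x == 1:
        return 1.0
    return math.acosh(x) / math.sqrt(x*x - 1)

for x in [-0.6, 0.2, 1.0, 3.0]:
    assert abs(AC_integral(x) - AC_closed(x)) < 1e-6

# identity (34): AC(Cos(x)) = 1/Sin(x) for x in (-inf, pi^2)
for x in [2.0, -3.0]:
    if x > 0:
        r = math.sqrt(x)
        c, s = math.cos(r), math.sin(r) / r
    else:
        r = math.sqrt(-x)
        c, s = math.cosh(r), math.sinh(r) / r
    assert abs(AC_closed(c) - 1.0 / s) < 1e-12
```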

2.P. Logarithm (review, alternative)

Recall that, in a Banach algebra, AA is log\log-able if and only if the spectrum of AA is disjoint from (,0](-\infty,0]. In that case, the logarithm of AA is defined as

(35) logA=λ=01A1λ+(1λ)Adλ=t=01A1(1t)+tAdt.\log A=\int_{\lambda=0}^{1}\frac{A-1}{\lambda+(1-\lambda)A}\,\mathrm{d}\lambda=\int_{t=0}^{1}\frac{A-1}{(1-t)+tA}\,\mathrm{d}t.
Lemma 2.14.

Let AA be a real 2×22\times 2 matrix.

(a) Then AA is log\log-able if and only if detA>0\det A>0 and trA2detA>1\dfrac{\operatorname{tr}A}{2\sqrt{\det A}}>-1.

(b) In the log\log-able case

(36) logA=(logdetA)Id2+AC(trA2detA)detA(AtrA2Id2).\log A=(\log\sqrt{\det A})\operatorname{Id}_{2}+\frac{\operatorname{AC}\left(\dfrac{\operatorname{tr}A}{2\sqrt{\det A}}\right)}{\sqrt{\det A}}\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right).

Suppose that AA is an n×nn\times n complex matrix which is log\log-able. Let

𝐝𝐞𝐭A:=detA=exptrlogA2=exp12t=01trd((1t)Id+tA)(1t)Id+tA.\sqrt{\operatorname{\mathbf{det}}A}:=\det\sqrt{A}=\exp\frac{\operatorname{tr}\log A}{2}=\exp\frac{1}{2}\int_{t=0}^{1}\operatorname{tr}\frac{\mathrm{d}((1-t)\operatorname{Id}+tA)}{(1-t)\operatorname{Id}+tA}.

One can see that this is the product of the square roots of the eigenvalues (with multiplicity).

Lemma 2.15.

Suppose that AA is a 2×22\times 2 complex matrix which is log\log-able.

(a) Then 𝐝𝐞𝐭A(,0]\sqrt{\operatorname{\mathbf{det}}A}\in\mathbb{C}\setminus(-\infty,0], trA2𝐝𝐞𝐭A(,1]\dfrac{\operatorname{tr}A}{2\sqrt{\operatorname{\mathbf{det}}A}}\in\mathbb{C}\setminus(-\infty,-1].

(b) Furthermore, the extended form of (36) holds:

(37) logA=(log𝐝𝐞𝐭A)Id2+AC(trA2𝐝𝐞𝐭A)𝐝𝐞𝐭A(AtrA2Id2).\log A=(\log\sqrt{\operatorname{\mathbf{det}}A})\operatorname{Id}_{2}+\frac{\operatorname{AC}\left(\dfrac{\operatorname{tr}A}{2\sqrt{\operatorname{\mathbf{det}}A}}\right)}{\sqrt{\operatorname{\mathbf{det}}A}}\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right).\qed
Proofs.

Lemma 2.14(a) and Lemma 2.15(a) can be obtained from the examination of the eigenvalues in a relatively straightforward manner. For the rest, it is sufficient to prove 2.15(b). The proof is almost “tautological”: Using the differentiation rule (33), by a long but straightforward symbolic computation, one obtains, for t[0,1]t\in[0,1],

AId2(1t)Id2+tA=\frac{A-\operatorname{Id}_{2}}{(1-t)\operatorname{Id}_{2}+tA}=
=ddt((log𝐝𝐞𝐭((1t)Id2+tA))Id2+AC(tr((1t)Id2+tA)2𝐝𝐞𝐭((1t)Id2+tA))𝐝𝐞𝐭((1t)Id2+tA)t(AtrA2Id2))=\frac{\mathrm{d}}{\mathrm{d}t}\left(\left(\log\sqrt{\operatorname{\mathbf{det}}((1-t)\operatorname{Id}_{2}+tA)}\right)\operatorname{Id}_{2}+\frac{\operatorname{AC}\left(\dfrac{\operatorname{tr}((1-t)\operatorname{Id}_{2}+tA)}{2\sqrt{\operatorname{\mathbf{det}}((1-t)\operatorname{Id}_{2}+tA)}}\right)}{\sqrt{\operatorname{\mathbf{det}}((1-t)\operatorname{Id}_{2}+tA)}}t\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)\right)

(Indeed, the symbolic computation is perfectly valid for, say, t1t\sim 1, AexpJ~A\sim\exp\tilde{J}, where tr((1t)Id2+tA)2𝐝𝐞𝐭((1t)Id2+tA)cosh1±1\dfrac{\operatorname{tr}((1-t)\operatorname{Id}_{2}+tA)}{2\sqrt{\operatorname{\mathbf{det}}((1-t)\operatorname{Id}_{2}+tA)}}\sim\cosh 1\neq\pm 1; and even 𝐝𝐞𝐭\sqrt{\operatorname{\mathbf{det}}\cdot} can be replaced by det\sqrt{\det\cdot}. Then, by analyticity in tt and analyticity in AA, the general identity is valid.) Integrated, (37) is obtained. ∎
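As an informal sanity check of formula (36) in the real case, one can compute log A by the closed form and verify exp(log A) = A against the Taylor series of the exponential; a Python sketch:

```python
import math

def mul(X, Y):
    return [[sum(X[i][k]*Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def expm(A, terms=60):  # reference exponential: truncated Taylor series
    R = [[1.0, 0.0], [0.0, 1.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        Q = mul(P, A)
        P = [[Q[i][j] / k for j in range(2)] for i in range(2)]
        R = [[R[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return R

def AC(x):  # Lemma 2.13(a), real x > -1
    if -1 < x < 1:
        return math.acos(x) / math.sqrt(1 - x*x)
    if x == 1:
        return 1.0
    return math.acosh(x) / math.sqrt(x*x - 1)

def logm(A):  # formula (36) for a log-able real 2x2 matrix
    d = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    t = A[0][0] + A[1][1]
    r = math.sqrt(d)
    assert d > 0 and t / (2*r) > -1  # log-ability, Lemma 2.14(a)
    coef = AC(t / (2*r)) / r
    return [[math.log(r)*(i == j) + coef*(A[i][j] - (t/2)*(i == j))
             for j in range(2)] for i in range(2)]

A = [[1.2, 0.7], [-0.4, 0.9]]  # det = 1.36 > 0, elliptic-type example
B = expm(logm(A))
assert max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2)) < 1e-10
```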

2.Q. AC\operatorname{AC} near 1-1, some asymptotics

From Lemma 2.13(a) (and analytic continuation), it is easy to see that

AC(y1)=πy2yAC(1y)\operatorname{AC}(y-1)=\frac{\pi}{\sqrt{y}\sqrt{2-y}}-\operatorname{AC}(1-y)

is valid for y((,0][2,))y\in\mathbb{C}\setminus((-\infty,0]\cup[2,\infty)). This is informative regarding what kind of analytic reparametrizations are useful for AC\operatorname{AC} near 1-1. More in terms of real analysis, one has

Lemma 2.16.

(a) For y(0,+)y\in(0,+\infty), the function

yAC(y1)π2yy\mapsto\frac{\operatorname{AC}(y-1)}{\dfrac{\pi}{\sqrt{2}\sqrt{y}}}

is monotone decreasing with range (1,0)e(1,0)_{\mathrm{e}}.

(b) In particular, for 0<y1<y20<y_{1}<y_{2},

AC(y21)AC(y11)<y1y2.\frac{\operatorname{AC}(y_{2}-1)}{\operatorname{AC}(y_{1}-1)}<\frac{\sqrt{y_{1}}}{\sqrt{y_{2}}}.
Proof.

Part (a) can be addressed by elementary analysis. Part (a) implies part (b). ∎

2.R. Norms (review)

Lemma 2.17.

Let AA be a real or complex 2×22\times 2 matrix. Then, for the norm,

(38) A2\displaystyle\left\|A\right\|_{2} =tr(AA)2+DAA\displaystyle=\sqrt{\frac{\operatorname{tr}(A^{*}A)}{2}+\sqrt{-D_{A^{*}A}}}
=tr(AA)2+(tr(AA))24|detA|2\displaystyle=\sqrt{\frac{\operatorname{tr}(A^{*}A)}{2}+\sqrt{\frac{(\operatorname{tr}(A^{*}A))^{2}}{4}-|\det A|^{2}}}
=tr(AA)+2|detA|+tr(AA)2|detA|2;\displaystyle=\frac{\sqrt{\operatorname{tr}(A^{*}A)+2|\det A|}+\sqrt{\operatorname{tr}(A^{*}A)-2|\det A|}}{2};

and, for the co-norm,

(39) A2=A121\displaystyle\|A\|_{2}^{-}=\left\|A^{-1}\right\|_{2}^{-1} =tr(AA)2DAA\displaystyle=\sqrt{\frac{\operatorname{tr}(A^{*}A)}{2}-\sqrt{-D_{A^{*}A}}}
=tr(AA)2(tr(AA))24|detA|2\displaystyle=\sqrt{\frac{\operatorname{tr}(A^{*}A)}{2}-\sqrt{\frac{(\operatorname{tr}(A^{*}A))^{2}}{4}-|\det A|^{2}}}
=tr(AA)+2|detA|tr(AA)2|detA|2.\displaystyle=\frac{\sqrt{\operatorname{tr}(A^{*}A)+2|\det A|}-\sqrt{\operatorname{tr}(A^{*}A)-2|\det A|}}{2}.

In particular,

(40) A2A2=|detA|.\|A\|_{2}\cdot\|A\|_{2}^{-}=|\det A|.

In the case of real matrices, the results are the same for the Hilbert spaces 2\mathbb{R}^{2} and 2\mathbb{C}^{2}. ∎

For 2×22\times 2 matrices, we define the signed co-norm as

(41) A2={0if A=0detAA2if A0.\lfloor A\rfloor_{2}=\begin{cases}0&\text{if }A=0\\ \\ \dfrac{\det A}{\|A\|_{2}}&\text{if }A\neq 0.\\ \end{cases}

Then,

(42) |A2|=A2,\left|\left\lfloor A\right\rfloor_{2}\right|=\|A\|_{2}^{-},

and

(43) A2A2=detA.\|A\|_{2}\cdot\left\lfloor A\right\rfloor_{2}=\det A.

However, we will essentially consider the signed co-norm only for real matrices.

Lemma 2.18.

Let A=[abcd]=a~Id2+b~I~+c~J~+d~K~A=\begin{bmatrix}a&b\\ c&d\end{bmatrix}=\tilde{a}\operatorname{Id}_{2}+\tilde{b}\tilde{I}+\tilde{c}\tilde{J}+\tilde{d}\tilde{K} be a real matrix. Then

(44) A2\displaystyle\left\|A\right\|_{2} =tr(AA)+2detA+tr(AA)2detA2\displaystyle=\frac{\sqrt{\operatorname{tr}(A^{*}A)+2\det A}+\sqrt{\operatorname{tr}(A^{*}A)-2\det A}}{2}
=(a+d)2+(cb)2+(ad)2+(b+c)22\displaystyle=\frac{\sqrt{(a+d)^{2}+(c-b)^{2}}+\sqrt{(a-d)^{2}+(b+c)^{2}}}{2}
(45) =a~2+b~2+c~2+d~2.\displaystyle=\sqrt{\tilde{a}^{2}+\tilde{b}^{2}}+\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}.

On the other hand,

(46) A2=A121\displaystyle\|A\|_{2}^{-}=\left\|A^{-1}\right\|_{2}^{-1} =|tr(AA)+2detAtr(AA)2detA2|\displaystyle=\left|\frac{\sqrt{\operatorname{tr}(A^{*}A)+2\det A}-\sqrt{\operatorname{tr}(A^{*}A)-2\det A}}{2}\right|
=|(a+d)2+(cb)2(ad)2+(b+c)22|\displaystyle=\left|\frac{\sqrt{(a+d)^{2}+(c-b)^{2}}-\sqrt{(a-d)^{2}+(b+c)^{2}}}{2}\right|
=|a~2+b~2c~2+d~2|.\displaystyle=\left|\sqrt{\tilde{a}^{2}+\tilde{b}^{2}}-\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right|.

It is true that

(47) sgndetA\displaystyle\operatorname{sgn}\det A =sgntr(AA)+2detAtr(AA)2detA2\displaystyle=\operatorname{sgn}\frac{\sqrt{\operatorname{tr}(A^{*}A)+2\det A}-\sqrt{\operatorname{tr}(A^{*}A)-2\det A}}{2}
=sgn(a+d)2+(cb)2(ad)2+(b+c)22\displaystyle=\operatorname{sgn}\frac{\sqrt{(a+d)^{2}+(c-b)^{2}}-\sqrt{(a-d)^{2}+(b+c)^{2}}}{2}
=sgn(a~2+b~2c~2+d~2).\displaystyle=\operatorname{sgn}\left(\sqrt{\tilde{a}^{2}+\tilde{b}^{2}}-\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\right).

Furthermore,

(48) A2=sgn(detA)A2\displaystyle\left\lfloor A\right\rfloor_{2}=\operatorname{sgn}(\det A)\left\|A\right\|_{2}^{-} =tr(AA)+2detAtr(AA)2detA2\displaystyle=\frac{\sqrt{\operatorname{tr}(A^{*}A)+2\det A}-\sqrt{\operatorname{tr}(A^{*}A)-2\det A}}{2}
=(a+d)2+(cb)2(ad)2+(b+c)22\displaystyle=\frac{\sqrt{(a+d)^{2}+(c-b)^{2}}-\sqrt{(a-d)^{2}+(b+c)^{2}}}{2}
=a~2+b~2c~2+d~2.\displaystyle=\sqrt{\tilde{a}^{2}+\tilde{b}^{2}}-\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}.\qed
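Formulas (45) and (48) can be verified against the singular values of A. The sketch below (Python) assumes the concrete skew-quaternionic basis I~ = [[0,−1],[1,0]], J~ = [[1,0],[0,−1]], K~ = [[0,1],[1,0]], which is the one consistent with the displays of Section 2.L:

```python
import math

# assumed concrete skew-quaternionic basis (consistent with Section 2.L)
Id = [[1.0, 0.0], [0.0, 1.0]]
It = [[0.0, -1.0], [1.0, 0.0]]
Jt = [[1.0, 0.0], [0.0, -1.0]]
Kt = [[0.0, 1.0], [1.0, 0.0]]

def build(at, bt, ct, dt):  # A = at*Id + bt*I~ + ct*J~ + dt*K~
    return [[at*Id[i][j] + bt*It[i][j] + ct*Jt[i][j] + dt*Kt[i][j]
             for j in range(2)] for i in range(2)]

def opnorm(A):  # largest singular value, via the eigenvalues of A^T A
    a, b, c, d = A[0][0], A[0][1], A[1][0], A[1][1]
    g11, g12, g22 = a*a + c*c, a*b + c*d, b*b + d*d
    t, det = g11 + g22, g11*g22 - g12*g12
    return math.sqrt(t/2 + math.sqrt(max(t*t/4 - det, 0.0)))

for (at, bt, ct, dt) in [(1.0, -0.5, 0.3, 2.0), (0.2, 0.1, -0.7, 0.4)]:
    A = build(at, bt, ct, dt)
    r1, r2 = math.hypot(at, bt), math.hypot(ct, dt)
    assert abs(opnorm(A) - (r1 + r2)) < 1e-12          # (45)
    detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    assert abs(detA / opnorm(A) - (r1 - r2)) < 1e-12   # (41) and (48)
```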

2.S. Directional derivatives

Whenever f:Mf(M)f:M\mapsto f(M) is a function on an open domain of matrices, we define the derivative of ff at AA in direction 𝐯\mathbf{v}, along smooth curves, as

D𝐯 at M=A(f(M))=ddtf(γ(t))|t=0,\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(f(M)\right)=\left.\frac{\mathrm{d}}{\mathrm{d}t}f(\gamma(t))\right|_{t=0},

whenever it gives the same value for all γ\gamma such that γ\gamma is smooth, γ(0)=A\gamma(0)=A, γ(0)=𝐯\gamma^{\prime}(0)=\mathbf{v}. This is a sufficiently flexible notion to deal with some mildly singular ff. If ff is smooth, then the directional derivative agrees with the usual multidimensional differential. E.g.,

D𝐯 at M=A(M1)=A1𝐯A1,\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(M^{-1}\right)=-A^{-1}\mathbf{v}A^{-1},

or

(49) D𝐯 at M=A(trM)=tr𝐯,\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\operatorname{tr}M\right)=\operatorname{tr}\mathbf{v},
(50) D𝐯 at M=A(detM)=tr(𝐯adjA),\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\det M\right)=\operatorname{tr}\left(\mathbf{v}\operatorname{adj}A\right),
(51) D𝐯 at M=A(tr(MM))=2Retr(A𝐯).\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\operatorname{tr}(M^{*}M)\right)=2\operatorname{Re}\operatorname{tr}(A^{*}\mathbf{v}).

Note, for 2×22\times 2 matrices, (50) reads as

(52) D𝐯 at M=A(detM)=(trA)(tr𝐯)tr(A𝐯).\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\det M\right)=(\operatorname{tr}A)(\operatorname{tr}\mathbf{v})-\operatorname{tr}(A\mathbf{v}).

Moreover, also for 2×22\times 2 matrices,

(53) D𝐯 at M=A(DM)=2TA,𝐯.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(D_{M}\right)=-2T_{A,\mathbf{v}}.

Furthermore,

(54) D𝐯 at M=A(TM,B)=T𝐯,B.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(T_{M,B}\right)=T_{\mathbf{v},B}.

2.T. Smoothness of norm

Now we examine the smoothness properties of the norm of complex 2×22\times 2 matrices. For a complex 2×22\times 2 matrix 𝐯\mathbf{v}, we set

S(𝐯):=12tr𝐯+𝐯2+D𝐯+𝐯2.S(\mathbf{v}):=\frac{1}{2}\operatorname{tr}\frac{\mathbf{v}+\mathbf{v}^{*}}{2}+\sqrt{-D^{\phantom{M}}_{\frac{\mathbf{v}+\mathbf{v}^{*}}{2}}}.

(This is the higher eigenvalue of 𝐯+𝐯2\frac{\mathbf{v}+\mathbf{v}^{*}}{2}.) It is easy to see that for

𝐯=vaId+vbI~+vcJ~+vdK~\mathbf{v}=v_{a}\operatorname{Id}+v_{b}\tilde{I}+v_{c}\tilde{J}+v_{d}\tilde{K}

(complex coefficients), it yields

S(𝐯)=(Reva)+(Imvb)2+(Revc)2+(Revd)2.S(\mathbf{v})=(\operatorname{Re}v_{a})+\sqrt{(\operatorname{Im}v_{b})^{2}+(\operatorname{Re}v_{c})^{2}+(\operatorname{Re}v_{d})^{2}}.
Lemma 2.19.

(o) On the space of complex 2×22\times 2 matrices, the norm operation is smooth except at matrices AA such that the “norm discriminant”

DAA=0.-D_{A^{*}A}=0.

(a) The directional derivative of the function MM2M\mapsto\|M\|_{2} at M=0M=0, along smooth curves, is just

D𝐯 at M=0(M2)=𝐯2.\mathrm{D}_{\mathbf{v}\text{ at }M=0}\left(\|M\|_{2}\right)=\|\mathbf{v}\|_{2}.

(b) The directional derivative of the function MM2M\mapsto\|M\|_{2} at M=IdM=\operatorname{Id}, along smooth curves, is

D𝐯 at M=Id(M2)\displaystyle\mathrm{D}_{\mathbf{v}\text{ at }M=\operatorname{Id}}\left(\|M\|_{2}\right) =S(𝐯).\displaystyle=S(\mathbf{v}).

(c) The directional derivative of the function MM2M\mapsto\|M\|_{2} at any conform-unitary AA, along smooth curves, is

D𝐯 at M=A(M2)=A2S(A1𝐯)=A2S(𝐯A1)=S(A𝐯)A2=S(𝐯A)A2.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)=\|A\|_{2}\cdot S(A^{-1}\mathbf{v})=\|A\|_{2}\cdot S(\mathbf{v}A^{-1})=\frac{S(A^{*}\mathbf{v})}{\|A\|_{2}}=\frac{S(\mathbf{v}A^{*})}{\|A\|_{2}}.

(d) The directional derivative of the function MM2M\mapsto\|M\|_{2} at any A0A\neq 0 that is not conform-unitary is

D𝐯 at M=A(M2)=\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)=
=1A2Re(tr(A𝐯)2+12tr((AAtr(AA)2Id2)(A𝐯tr(A𝐯)2Id2))DAA)=\frac{1}{\|A\|_{2}}\operatorname{Re}\left(\frac{\operatorname{tr}(A^{*}\mathbf{v})}{2}+\frac{\dfrac{1}{2}\operatorname{tr}\left(\left(A^{*}A-\dfrac{\operatorname{tr}(A^{*}A)}{2}\operatorname{Id}_{2}\right)\left(A^{*}\mathbf{v}-\dfrac{\operatorname{tr}(A^{*}\mathbf{v})}{2}\operatorname{Id}_{2}\right)\right)}{\sqrt{-D_{A^{*}A}}}\right)
=A2Re(tr(A𝐯))A21Re(det(A)(tr(A)tr(𝐯)tr(A𝐯)))2DAA.=\frac{\|A\|_{2}\operatorname{Re}\left(\operatorname{tr}(A^{*}\mathbf{v})\right)-\|A\|_{2}^{-1}\operatorname{Re}\left(\det(A^{*})\Bigl{(}\operatorname{tr}(A)\operatorname{tr}(\mathbf{v})-\operatorname{tr}(A\mathbf{v})\Bigr{)}\right)}{2\sqrt{-D_{A^{*}A}}}.
Proof.

(o) The smoothness part follows from the norm expressions (38/1–2). The non-smoothness part will follow from the explicit expressions for the directional derivatives. Part (a) is trivial. Part (b) is best done using Taylor expansions of smooth curves. Part (c) follows from (b) by the conform-unitary displacement argument: neighbourhoods of conform-unitary matrices are related to each other by multiplication by conform-unitary matrices (left or right, alike). Such a multiplication, however, simply scales the norm. Part (d) follows from the basic observations (49–51) and standard composition rules. ∎
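The closed form in part (d) can also be sanity-checked by finite differences. The following pure-Python sketch compares the last display of (d) with a central difference quotient of the operator norm; the sample matrices are arbitrary, and D_M abbreviates det M − (tr M/2)², consistently with the norm expression used above.

```python
import math

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adj(X):  # conjugate transpose
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def opnorm(A):
    # ||A||_2 = sqrt(tr(A*A)/2 + sqrt(-D_{A*A})), with D_M = det M - (tr M / 2)^2
    M = mul(adj(A), A)
    t, d = tr(M).real, det(M).real
    return math.sqrt(t / 2 + math.sqrt(max((t / 2) ** 2 - d, 0.0)))

A = [[1.2 + 0.3j, 0.4 - 0.1j], [0.2 + 0.5j, -0.7 + 0.6j]]
v = [[0.3 - 0.2j, 0.1 + 0.4j], [-0.5 + 0.1j, 0.2 + 0.2j]]

M = mul(adj(A), A)
neg_D = (tr(M).real / 2) ** 2 - det(M).real      # -D_{A*A} > 0 in the generic case
nA = opnorm(A)
closed = (nA * tr(mul(adj(A), v)).real
          - (det(adj(A)) * (tr(A) * tr(v) - tr(mul(A, v)))).real / nA) / (2 * math.sqrt(neg_D))

h = 1e-5
def shift(t):
    return [[A[i][j] + t * v[i][j] for j in range(2)] for i in range(2)]
fd = (opnorm(shift(h)) - opnorm(shift(-h))) / (2 * h)
err_d = abs(closed - fd)
```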

Remark 2.20.

If we apply the notation

{A,B}=AB+BA\boldsymbol{\{}A,B\boldsymbol{\}}=A^{*}B+B^{*}A

and

A,B=12tr((AtrA2Id2)(BtrB2Id2)),\boldsymbol{\langle}A,B\boldsymbol{\rangle}=\frac{1}{2}\operatorname{tr}\left(\left(A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right)\left(B-\frac{\operatorname{tr}B}{2}\operatorname{Id}_{2}\right)\right),

then we find

A2=12tr12{A,A}+12{A,A},12{A,A},\|A\|_{2}=\sqrt{\frac{1}{2}\operatorname{tr}\tfrac{1}{2}\boldsymbol{\{}A,A\boldsymbol{\}}+\sqrt{\boldsymbol{\langle}\tfrac{1}{2}\boldsymbol{\{}A,A\boldsymbol{\}},\tfrac{1}{2}\boldsymbol{\{}A,A\boldsymbol{\}}\boldsymbol{\rangle}}},

and in case Lemma 2.19 (c),

D𝐯 at M=A(M2)=12tr12{A,𝐯}+12{A,𝐯},12{A,𝐯}A2;\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)=\frac{\dfrac{1}{2}\operatorname{tr}\tfrac{1}{2}\boldsymbol{\{}A,\mathbf{v}\boldsymbol{\}}+\sqrt{\boldsymbol{\langle}\tfrac{1}{2}\boldsymbol{\{}A,\mathbf{v}\boldsymbol{\}},\tfrac{1}{2}\boldsymbol{\{}A,\mathbf{v}\boldsymbol{\}}\boldsymbol{\rangle}}}{\|A\|_{2}};

and in case Lemma 2.19 (d),

D𝐯 at M=A(M2)=12tr12{A,𝐯}+12{A,A},12{A,𝐯}12{A,A},12{A,A}A2;\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)=\frac{\dfrac{1}{2}\operatorname{tr}\tfrac{1}{2}\boldsymbol{\{}A,\mathbf{v}\boldsymbol{\}}+\dfrac{{\boldsymbol{\langle}\tfrac{1}{2}\boldsymbol{\{}A,A\boldsymbol{\}},\tfrac{1}{2}\boldsymbol{\{}A,\mathbf{v}\boldsymbol{\}}\boldsymbol{\rangle}}}{\sqrt{\boldsymbol{\langle}\tfrac{1}{2}\boldsymbol{\{}A,A\boldsymbol{\}},\tfrac{1}{2}\boldsymbol{\{}A,A\boldsymbol{\}}\boldsymbol{\rangle}}}}{\|A\|_{2}};

making the computation more suggestive. ∎

For the sake of curiosity, we include the corresponding statement for co-norms. Here the situation is analogous but slightly more complicated, as there is an additional case of non-smoothness. Indeed, the signed co-norm (41) has smoothness properties similar to those of the norm. However, when the absolute value is taken to obtain the co-norm, there is an additional source of non-smoothness: the case when the determinant vanishes.

For a complex 2×22\times 2 matrix 𝐯\mathbf{v}, we set

S(𝐯):=12tr𝐯+𝐯2D𝐯+𝐯2.S^{-}(\mathbf{v}):=\frac{1}{2}\operatorname{tr}\frac{\mathbf{v}+\mathbf{v}^{*}}{2}-\sqrt{-D^{\phantom{M}}_{\frac{\mathbf{v}+\mathbf{v}^{*}}{2}}}.

(This is the lower eigenvalue of 𝐯+𝐯2\frac{\mathbf{v}+\mathbf{v}^{*}}{2}.) It is easy to see that for

𝐯=vaId+vbI~+vcJ~+vdK~\mathbf{v}=v_{a}\operatorname{Id}+v_{b}\tilde{I}+v_{c}\tilde{J}+v_{d}\tilde{K}

(with complex coefficients), one has

S(𝐯)=(Reva)(Imvb)2+(Revc)2+(Revd)2.S^{-}(\mathbf{v})=(\operatorname{Re}v_{a})-\sqrt{(\operatorname{Im}v_{b})^{2}+(\operatorname{Re}v_{c})^{2}+(\operatorname{Re}v_{d})^{2}}.
Lemma 2.21.

(o) On the space of complex 2×22\times 2 matrices, the co-norm operation is smooth except at matrices AA such that

detA=0\det A=0

or

DAA=0.-D_{A^{*}A}=0.

(a) The directional derivative of the function MM2M\mapsto\|M\|_{2}^{-} at M=0M=0, along smooth curves, is simply

D𝐯 at M=0(M2)=𝐯2.\mathrm{D}_{\mathbf{v}\text{ at }M=0}\left(\|M\|_{2}^{-}\right)=\|\mathbf{v}\|_{2}^{-}.

(á) The directional derivative of the function MM2M\mapsto\|M\|_{2}^{-} at AA with detA=0\det A=0, A0A\neq 0, along smooth curves, is

D𝐯 at M=A(M2)=|(trA)(tr𝐯)tr(A𝐯)|tr(AA).\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}^{-}\right)=\frac{|(\operatorname{tr}A)(\operatorname{tr}\mathbf{v})-\operatorname{tr}(A\mathbf{v})|}{\sqrt{\operatorname{tr}(A^{*}A)}}.

(In this case, A2=tr(AA)\|A\|_{2}=\sqrt{\operatorname{tr}(A^{*}A)} and A2=0\|A\|_{2}^{-}=0.)
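Case (á) can be illustrated numerically: for singular A, the co-norm of A + t𝐯 grows linearly in t, with the slope given above. A pure-Python sketch with arbitrary (real, for simplicity) sample matrices follows; the lower singular value is computed through |det|/‖·‖₂ to avoid cancellation.

```python
import math

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def tra(X):  # transpose (real matrices)
    return [[X[j][i] for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def opnorm(A):
    M = mul(tra(A), A)
    t, d = tr(M), det(M)
    return math.sqrt(t / 2 + math.sqrt(max((t / 2) ** 2 - d, 0.0)))

def conorm(A):
    # lower singular value, via sigma_min * sigma_max = |det A| (avoids cancellation)
    return abs(det(A)) / opnorm(A)

A = [[1.0, 2.0], [0.5, 1.0]]           # det A = 0, A != 0
v = [[0.3, -0.4], [0.7, 0.2]]

closed = abs(tr(A) * tr(v) - tr(mul(A, v))) / math.sqrt(tr(mul(tra(A), A)))

h = 1e-6
Ah = [[A[i][j] + h * v[i][j] for j in range(2)] for i in range(2)]
fd = conorm(Ah) / h                    # one-sided quotient; conorm(A) itself is 0
err_conorm = abs(closed - fd)
```

For these sample matrices the closed-form slope evaluates to 0.28.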

(b) The directional derivative of the function MM2M\mapsto\|M\|_{2}^{-} at M=IdM=\operatorname{Id}, along smooth curves, is

D𝐯 at M=Id(M2)\displaystyle\mathrm{D}_{\mathbf{v}\text{ at }M=\operatorname{Id}}\left(\|M\|_{2}^{-}\right) =S(𝐯).\displaystyle=S^{-}(\mathbf{v}).

(c) The directional derivative of the function MM2M\mapsto\|M\|_{2}^{-} at any conform-unitary AA, along smooth curves, is

D𝐯 at M=A(M2)=A2S(A1𝐯)=A2S(𝐯A1)=S(A𝐯)A2=S(𝐯A)A2.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}^{-}\right)=\|A\|_{2}^{-}\cdot S^{-}(A^{-1}\mathbf{v})=\|A\|_{2}^{-}\cdot S^{-}(\mathbf{v}A^{-1})=\frac{S^{-}(A^{*}\mathbf{v})}{\|A\|_{2}^{-}}=\frac{S^{-}(\mathbf{v}A^{*})}{\|A\|_{2}^{-}}.

(In this case, A2=A2=tr(AA)/2\|A\|_{2}=\|A\|_{2}^{-}=\sqrt{\operatorname{tr}(A^{*}A)/2}.)

(d) The directional derivative of the function MM2M\mapsto\|M\|_{2}^{-} at any A0A\neq 0 that is not conform-unitary is

D𝐯 at M=A(M2)=\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}^{-}\right)=
=1A2Re(tr(A𝐯)212tr((AAtr(AA)2Id2)(A𝐯tr(A𝐯)2Id2))DAA)=\frac{1}{\|A\|_{2}^{-}}\operatorname{Re}\left(\frac{\operatorname{tr}(A^{*}\mathbf{v})}{2}-\frac{\dfrac{1}{2}\operatorname{tr}\left(\left(A^{*}A-\dfrac{\operatorname{tr}(A^{*}A)}{2}\operatorname{Id}_{2}\right)\left(A^{*}\mathbf{v}-\dfrac{\operatorname{tr}(A^{*}\mathbf{v})}{2}\operatorname{Id}_{2}\right)\right)}{\sqrt{-D_{A^{*}A}}}\right)
=(A2)1Re(det(A)(tr(A)tr(𝐯)tr(A𝐯)))A2Re(tr(A𝐯))2DAA.=\frac{(\|A\|_{2}^{-})^{-1}\operatorname{Re}\left(\det(A^{*})\Bigl{(}\operatorname{tr}(A)\operatorname{tr}(\mathbf{v})-\operatorname{tr}(A\mathbf{v})\Bigr{)}\right)-\|A\|_{2}^{-}\operatorname{Re}\left(\operatorname{tr}(A^{*}\mathbf{v})\right)}{2\sqrt{-D_{A^{*}A}}}.
Proof.

Similar to the previous statement. ∎

For the signed co-norm, the corresponding statement to Lemma 2.19 is easy to recover from

D𝐯 at M=A(M2)=(trA)(tr𝐯)tr(A𝐯)A2detAA22D𝐯 at M=A(M2)\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\lfloor M\rfloor_{2}\right)=\frac{(\operatorname{tr}A)(\operatorname{tr}\mathbf{v})-\operatorname{tr}(A\mathbf{v})}{\|A\|_{2}}-\frac{\det A}{\|A\|^{2}_{2}}\cdot\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)

(for A0A\neq 0). But this counterpart to Lemma 2.19 is not particularly simple.

However, things are manageable for real 2×22\times 2 matrices; this case is even simpler, as the use of Re\operatorname{Re} and Im\operatorname{Im} can be avoided:

Lemma 2.22.

(o) On the space of real 2×22\times 2 matrices, the norm and signed co-norm operations are smooth except at matrices AA such that

DAA=0.-D_{A^{*}A}=0.

(a) The directional derivative of the function MM2M\mapsto\|M\|_{2} at M=0M=0, along smooth curves, is

D𝐯 at M=0(M2)=𝐯2.\mathrm{D}_{\mathbf{v}\text{ at }M=0}\left(\|M\|_{2}\right)=\|\mathbf{v}\|_{2}.

The directional derivative of the function MM2M\mapsto\lfloor M\rfloor_{2} at M=0M=0, along smooth curves, is

D𝐯 at M=0(M2)=𝐯2.\mathrm{D}_{\mathbf{v}\text{ at }M=0}\left(\lfloor M\rfloor_{2}\right)=\lfloor\mathbf{v}\rfloor_{2}.

(b) The directional derivative of the function MM2M\mapsto\|M\|_{2} at M=IdM=\operatorname{Id}, along smooth curves, is

D𝐯 at M=Id(M2)\displaystyle\mathrm{D}_{\mathbf{v}\text{ at }M=\operatorname{Id}}\left(\|M\|_{2}\right) =S(𝐯).\displaystyle=S(\mathbf{v}).

Here, for 𝐯=vaId+vbI~+vcJ~+vdK~\mathbf{v}=v_{a}\operatorname{Id}+v_{b}\tilde{I}+v_{c}\tilde{J}+v_{d}\tilde{K} (with real coefficients),

S(𝐯)=va+(vc)2+(vd)2.S(\mathbf{v})=v_{a}+\sqrt{(v_{c})^{2}+(v_{d})^{2}}.

The directional derivative of the function MM2M\mapsto\lfloor M\rfloor_{2} at M=IdM=\operatorname{Id}, along smooth curves, is

D𝐯 at M=Id(M2)\displaystyle\mathrm{D}_{\mathbf{v}\text{ at }M=\operatorname{Id}}\left(\lfloor M\rfloor_{2}\right) =S(𝐯).\displaystyle=S^{-}(\mathbf{v}).

Here, for 𝐯=vaId+vbI~+vcJ~+vdK~\mathbf{v}=v_{a}\operatorname{Id}+v_{b}\tilde{I}+v_{c}\tilde{J}+v_{d}\tilde{K} (with real coefficients),

S(𝐯)=va(vc)2+(vd)2.S^{-}(\mathbf{v})=v_{a}-\sqrt{(v_{c})^{2}+(v_{d})^{2}}.

(c) The directional derivative of the function MM2M\mapsto\|M\|_{2} at any conform-unitary AA, along smooth curves, is

D𝐯 at M=A(M2)=A2S(A1𝐯)=A2S(𝐯A1)=S(A𝐯)A2=S(𝐯A)A2.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)=\|A\|_{2}\cdot S(A^{-1}\mathbf{v})=\|A\|_{2}\cdot S(\mathbf{v}A^{-1})=\frac{S(A^{*}\mathbf{v})}{\|A\|_{2}}=\frac{S(\mathbf{v}A^{*})}{\|A\|_{2}}.

The directional derivative of the function MM2M\mapsto\lfloor M\rfloor_{2} at any conform-unitary AA, along smooth curves, is

D𝐯 at M=A(M2)=A2S(A1𝐯)=A2S(𝐯A1)=S(A𝐯)A2=S(𝐯A)A2.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\lfloor M\rfloor_{2}\right)=\lfloor A\rfloor_{2}\cdot S^{-}(A^{-1}\mathbf{v})=\lfloor A\rfloor_{2}\cdot S^{-}(\mathbf{v}A^{-1})=\frac{S^{-}(A^{*}\mathbf{v})}{\lfloor A\rfloor_{2}}=\frac{S^{-}(\mathbf{v}A^{*})}{\lfloor A\rfloor_{2}}.

(In this case, A2=A2=tr(AA)/2\|A\|_{2}=\|A\|_{2}^{-}=\sqrt{\operatorname{tr}(A^{*}A)/2}.)

(d) The directional derivative of the function MM2M\mapsto\|M\|_{2} at any A0A\neq 0 that is not conform-unitary is

D𝐯 at M=A(M2)=A2tr(A𝐯)A2(tr(A)tr(𝐯)tr(A𝐯))2DAA.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right)=\frac{\|A\|_{2}\operatorname{tr}(A^{*}\mathbf{v})-\lfloor A\rfloor_{2}\Bigl{(}\operatorname{tr}(A)\operatorname{tr}(\mathbf{v})-\operatorname{tr}(A\mathbf{v})\Bigr{)}}{2\sqrt{-D_{A^{*}A}}}.

The directional derivative of the function MM2M\mapsto\lfloor M\rfloor_{2} at any A0A\neq 0 that is not conform-unitary is

D𝐯 at M=A(M2)=A2(tr(A)tr(𝐯)tr(A𝐯))A2tr(A𝐯)2DAA.\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\lfloor M\rfloor_{2}\right)=\frac{\|A\|_{2}\Bigl{(}\operatorname{tr}(A)\operatorname{tr}(\mathbf{v})-\operatorname{tr}(A\mathbf{v})\Bigr{)}-\lfloor A\rfloor_{2}\operatorname{tr}(A^{*}\mathbf{v})}{2\sqrt{-D_{A^{*}A}}}.
Proof.

It follows from the complex picture. ∎

Next, we concentrate on a special case regarding complex 2×22\times 2 matrices.

Lemma 2.23.

(a) Let AA be a complex 2×22\times 2 matrix. Then

DAA=D(trA)¯A+(trA)A2+D12[A,A].D_{A^{*}A}=D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}}+D_{\frac{1}{2}[A,A^{*}]}.

(b) With some abuse of notation, for λ[0,1]\lambda\in[0,1], let

A(λ)=AλtrA2Id2.A(\lambda)=A-\lambda\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}.

Then,

D(trA(λ))¯A(λ)+(trA(λ))A(λ)2=(1λ)2D(trA)¯A+(trA)A2,D_{\frac{\overline{(\operatorname{tr}A(\lambda))}A(\lambda)+(\operatorname{tr}A(\lambda))A(\lambda)^{*}}{2}}=(1-\lambda)^{2}D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}},

and

D12[A(λ),A(λ)]=D12[A,A].D_{\frac{1}{2}[A(\lambda),A(\lambda)^{*}]}=D_{\frac{1}{2}[A,A^{*}]}.

(c) The map

λ[0,1]A(λ)2\lambda\in[0,1]\mapsto\|A(\lambda)\|_{2}

is smooth, with the possible exception of λ=1\lambda=1. However, if AA is non-normal (or trA=0\operatorname{tr}A=0), then smoothness also holds at λ=1\lambda=1.

Proof.

(a), (b) are straightforward computations. (c) follows from (a), (b) and the concrete form A2=tr(AA)2+DAA\left\|A\right\|_{2}=\sqrt{\frac{\operatorname{tr}(A^{*}A)}{2}+\sqrt{-D_{A^{*}A}}} of the norm. Note that the vanishing of D12[A,A]D_{\frac{1}{2}[A,A^{*}]} is equivalent to normality. ∎
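Part (a) is a quadratic identity in A and can be spot-checked numerically. In the sketch below (arbitrary complex sample matrix), D_M abbreviates det M − (tr M/2)², consistently with the norm expression quoted in the proof.

```python
def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adj(X):  # conjugate transpose
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def D(M):  # D_M = det(M - (tr M / 2) Id_2) = det M - (tr M / 2)^2
    return det(M) - (tr(M) / 2) ** 2

A = [[0.8 + 0.4j, -0.3 + 0.9j], [0.6 - 0.2j, 0.1 + 0.7j]]
tA, As = tr(A), adj(A)

# H = (conj(tr A) A + (tr A) A*) / 2,  N = (1/2)[A, A*]
H = [[(tA.conjugate() * A[i][j] + tA * As[i][j]) / 2 for j in range(2)] for i in range(2)]
AAs, AsA = mul(A, As), mul(As, A)
N = [[(AAs[i][j] - AsA[i][j]) / 2 for j in range(2)] for i in range(2)]

err_disc = abs(D(AsA) - (D(H) + D(N)))
```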

Lemma 2.24.

Suppose that AA is a complex 2×22\times 2 matrix.

(a) If AA is conform-orthogonal, then

D(trA)Id2 at M=A(M2)=|trA|22A2.\mathrm{D}_{(\operatorname{tr}A)\operatorname{Id}_{2}\text{ at }M=A}\left(\|M\|_{2}\right)=\frac{\dfrac{|\operatorname{tr}A|^{2}}{2}}{\|A\|_{2}}.

(b) In the generic case DAA>0-D_{A^{*}A}>0,

D(trA)Id2 at M=A(M2)=|trA|22+D(trA)¯A+(trA)A2DAAA2.\mathrm{D}_{(\operatorname{tr}A)\operatorname{Id}_{2}\text{ at }M=A}\left(\|M\|_{2}\right)=\frac{\dfrac{|\operatorname{tr}A|^{2}}{2}+\dfrac{-D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}}}{\sqrt{-D_{A^{*}A}}}}{\|A\|_{2}}.

(c) Consequently, if trA0\operatorname{tr}A\neq 0, then

D(trA)Id2 at M=A(M2)>0.\mathrm{D}_{(\operatorname{tr}A)\operatorname{Id}_{2}\text{ at }M=A}\left(\|M\|_{2}\right)>0.

(d) Consequently, if A0A\neq 0 is normal, then

D(trA)Id2 at M=A(M2)=|trA|22+D(trA)¯A+(trA)A2A2.\mathrm{D}_{(\operatorname{tr}A)\operatorname{Id}_{2}\text{ at }M=A}\left(\|M\|_{2}\right)=\frac{\dfrac{|\operatorname{tr}A|^{2}}{2}+\sqrt{-D_{\frac{\overline{(\operatorname{tr}A)}A+(\operatorname{tr}A)A^{*}}{2}}}}{\|A\|_{2}}.
Proof.

(a) and (b) are straightforward consequences of Lemma 2.19. (c) is a trivial consequence of (a) and (b). So is (d), after considering Lemma 2.23(a). ∎

2.U. Some inequalities for norms

Corollary 2.25.

(a) If AA is a real 2×22\times 2 matrix, then

AtrA2Id22A2\left\|A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right\|_{2}\leq\|A\|_{2}

and

AtrA2Id22A2.\left\lfloor A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right\rfloor_{2}\leq\lfloor A\rfloor_{2}.

The inequalities are strict if trA0\operatorname{tr}A\neq 0.

(b) In fact, if trA0\operatorname{tr}A\neq 0, then the maps

tAttrA2Id22t\mapsto\left\|A-t\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right\|_{2}

and

tAttrA2Id22t\mapsto\left\lfloor A-t\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right\rfloor_{2}

are strictly decreasing on the interval t[0,1]t\in[0,1].

Proof.

(a) If AA is written as in (10), then the inequalities trivialize as b~2+c~2+d~2a~2+b~2+c~2+d~2\sqrt{\tilde{b}^{2}}+\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\leq\sqrt{\tilde{a}^{2}+\tilde{b}^{2}}+\sqrt{\tilde{c}^{2}+\tilde{d}^{2}} and b~2c~2+d~2a~2+b~2c~2+d~2\sqrt{\tilde{b}^{2}}-\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}\leq\sqrt{\tilde{a}^{2}+\tilde{b}^{2}}-\sqrt{\tilde{c}^{2}+\tilde{d}^{2}}.

(b) follows along similar lines. ∎

Corollary 2.26.

(a) If AA is a complex 2×22\times 2 matrix, then

AtrA2Id22A2.\left\|A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right\|_{2}\leq\|A\|_{2}.

The inequality is strict if trA0\operatorname{tr}A\neq 0.

(b) In fact, if trA0\operatorname{tr}A\neq 0, then the map

tAttrA2Id22t\mapsto\left\|A-t\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}\right\|_{2}

is strictly decreasing on the interval t[0,1]t\in[0,1].

Proof.

It is sufficient to prove (b), which follows from Lemma 2.24(c). ∎

(A similar statement is not true for the unsigned co-norm, not even in the real case.)
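Corollary 2.26(b) can also be illustrated numerically: the following pure-Python sketch evaluates ‖A − t(tr A/2)Id₂‖₂ on a grid of t-values in [0,1] for an arbitrary complex A with tr A ≠ 0 and checks strict decrease.

```python
import math

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adj(X):
    return [[X[j][i].conjugate() for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def opnorm(A):
    M = mul(adj(A), A)
    t, d = tr(M).real, det(M).real
    return math.sqrt(t / 2 + math.sqrt(max((t / 2) ** 2 - d, 0.0)))

A = [[1.5 + 0.5j, 0.7 + 0j], [-0.3j, 0.4 - 0.2j]]   # tr A != 0
half_tr = tr(A) / 2
Id = [[1, 0], [0, 1]]

vals = [opnorm([[A[i][j] - (k / 10) * half_tr * Id[i][j] for j in range(2)]
                for i in range(2)]) for k in range(11)]
strictly_decreasing = all(vals[k + 1] < vals[k] for k in range(10))
```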

2.V. Moments of linear maps

If :M2()\ell:\mathrm{M}_{2}(\mathbb{C})\rightarrow\mathbb{R} is a (real) linear map, then it can be represented uniquely by a complex 2×22\times 2 matrix AA such that

(𝐯)=12Retr(A𝐯).\ell(\mathbf{v})=\frac{1}{2}\operatorname{Re}\operatorname{tr}(A^{*}\mathbf{v}).

This matrix AA is the moment associated to \ell.

Lemma 2.27.

If DAA0-D_{A^{*}A}\neq 0, then the moment associated to 𝐯D𝐯 at M=A(M2)\mathbf{v}\mapsto\mathrm{D}_{\mathbf{v}\text{ at }M=A}\left(\|M\|_{2}\right) is

MN(A):=\displaystyle\operatorname{MN}(A):= 1A2(A+AAAtr(AA)2ADAA)\displaystyle\frac{1}{\|A\|_{2}}\left(A+\frac{AA^{*}A-\dfrac{\operatorname{tr}(A^{*}A)}{2}A}{\sqrt{-D_{A^{*}A}}}\right)
=\displaystyle= 1A2(A+det(A)(A(trA)Id2)+tr(AA)2ADAA).\displaystyle\frac{1}{\|A\|_{2}}\left(A+\frac{\det(A)\left(A^{*}-(\operatorname{tr}A^{*})\operatorname{Id}_{2}\right)+\dfrac{\operatorname{tr}(A^{*}A)}{2}A}{\sqrt{-D_{A^{*}A}}}\right).
Proof.

This follows from Lemma 2.19(d). ∎

2.W. Possible exponentials from M2()\mathrm{M}_{2}(\mathbb{R}) (review)

Lemma 2.28.

For AM2()A\in\mathrm{M}_{2}(\mathbb{R}), let us consider the set

LogA={MM2():exp(M)=A}.\operatorname{Log}^{\mathbb{R}}A=\{M\in\mathrm{M}_{2}(\mathbb{R})\,:\,\exp(M)=A\}.

Then the following hold:

(a) If AA is hyperbolic (DA<0D_{A}<0) with two positive eigenvalues,

LogA={logA};\operatorname{Log}^{\mathbb{R}}A=\{\log A\};

where logA\log A is hyperbolic, with sp(logA)\operatorname{sp}(\log A)\subset\mathbb{R}.

(b) If AA is hyperbolic (DA<0D_{A}<0) and its eigenvalues are not both positive, then

LogA=.\operatorname{Log}^{\mathbb{R}}A=\emptyset.

(c) If A=aId2A=a\operatorname{Id}_{2} is a positive scalar matrix, then

LogA={(loga)Id2+2kπL:k,L2=Id2}.\operatorname{Log}^{\mathbb{R}}A=\{(\log a)\operatorname{Id}_{2}+2k\pi L\,:\,k\in\mathbb{Z},\,L^{2}=-\operatorname{Id}_{2}\}.

Among its elements, the one with the lowest norm is logA=(loga)Id2\log A=(\log a)\operatorname{Id}_{2}.

(d) If A=aId2A=a\operatorname{Id}_{2} is a negative scalar matrix, then

LogA={(log(−a))Id2+(2k+1)πL:k,L2=−Id2}.\operatorname{Log}^{\mathbb{R}}A=\{(\log(-a))\operatorname{Id}_{2}+(2k+1)\pi L\,:\,k\in\mathbb{Z},\,L^{2}=-\operatorname{Id}_{2}\}.

Among its elements, the two elements with the lowest norm are (log(−a))Id2+πI~(\log(-a))\operatorname{Id}_{2}+\pi\tilde{I} and (log(−a))Id2−πI~(\log(-a))\operatorname{Id}_{2}-\pi\tilde{I}. (AA is not log-able.)

(e) If AA is parabolic (DA=0D_{A}=0) with non-positive double eigenvalue trA2\frac{\operatorname{tr}A}{2}, but not a negative scalar matrix, then

LogA=.\operatorname{Log}^{\mathbb{R}}A=\emptyset.

(If AA is parabolic with positive double eigenvalue a=trA2a=\frac{\operatorname{tr}A}{2}, but not a scalar matrix, then LogA={logA}\operatorname{Log}^{\mathbb{R}}A=\{\log A\}, where logA=(loga)Id2+a1(AaId2)\log A=(\log a)\operatorname{Id}_{2}+a^{-1}(A-a\operatorname{Id}_{2}).)

(f) If AA is elliptic (DA>0D_{A}>0), then there are unique r>0r>0 and ϕ(0,π)\phi\in(0,\pi) and IAM2()I_{A}\in\mathrm{M}_{2}(\mathbb{R}) such that (IA)2=Id2(I_{A})^{2}=-\operatorname{Id}_{2} and

A=r(cosϕ)Id2+r(sinϕ)IA.A=r(\cos\phi)\operatorname{Id}_{2}+r(\sin\phi)I_{A}.

In that case,

LogA={(logr)Id2+ϕIA+2kπIA:k}.\operatorname{Log}^{\mathbb{R}}A=\{(\log r)\operatorname{Id}_{2}+\phi I_{A}+2k\pi I_{A}\,:\,k\in\mathbb{Z}\}.

Among its elements, the one with the lowest norm is logA=(logr)Id2+ϕIA\log A=(\log r)\operatorname{Id}_{2}+\phi I_{A}.

Proof.

LogA\operatorname{Log}^{\mathbb{R}}A can be recovered by elementary linear algebra. The comments about the norms follow from the simple norm expression (44/3). (Note that the skew-involutions are all traceless.) ∎
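The elliptic case (f) can be illustrated in code. The sketch below (arbitrary elliptic sample matrix; D_A abbreviates det A − (tr A/2)²) recovers r, φ and I_A, forms log A, and checks exp(log A) = A against a Taylor-series exponential.

```python
import math

def expm(M, terms=60):
    # plain Taylor series; adequate for the moderate norms used here
    R = [[1.0, 0.0], [0.0, 1.0]]
    T = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        T = [[sum(T[i][m] * M[m][j] for m in range(2)) / k for j in range(2)]
             for i in range(2)]
        R = [[R[i][j] + T[i][j] for j in range(2)] for i in range(2)]
    return R

A = [[0.5, -1.2], [0.9, 0.1]]
a = (A[0][0] + A[1][1]) / 2                                   # tr A / 2
DA = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) - a * a          # D_A = det A - (tr A / 2)^2
assert DA > 0                                                 # elliptic

r = math.sqrt(a * a + DA)
phi = math.atan2(math.sqrt(DA), a)                            # in (0, pi)
IA = [[(A[i][j] - (a if i == j else 0)) / math.sqrt(DA) for j in range(2)]
      for i in range(2)]                                      # I_A^2 = -Id_2

logA = [[math.log(r) * (1 if i == j else 0) + phi * IA[i][j] for j in range(2)]
        for i in range(2)]
E = expm(logA)
err_log = max(abs(E[i][j] - A[i][j]) for i in range(2) for j in range(2))
```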

2.X. Further particularities in the real case

(I) If A=a~Id2+b~I~+c~J~+d~K~A=\tilde{a}\operatorname{Id}_{2}+\tilde{b}\tilde{I}+\tilde{c}\tilde{J}+\tilde{d}\tilde{K} is a real 2×22\times 2 matrix, then we set

𝑿+(A):=a~b~Id2+(c~2+d~2a~2)I~b~c~J~b~d~K~.\boldsymbol{X}^{+}(A):=\tilde{a}\tilde{b}\operatorname{Id}_{2}+(\tilde{c}^{2}+\tilde{d}^{2}-\tilde{a}^{2})\tilde{I}-\tilde{b}\tilde{c}\tilde{J}-\tilde{b}\tilde{d}\tilde{K}.

This operation is chiral; if UU is orthogonal, then 𝑿+(UAU1)=(detU)U𝑿+(A)U1\boldsymbol{X}^{+}(UAU^{-1})=(\det U)\cdot U\boldsymbol{X}^{+}(A)U^{-1}. One can check that

tr(A𝑿+(A))=0.\operatorname{tr}\left(A^{*}\boldsymbol{X}^{+}(A)\right)=0.

(Another reasonable choice would be 𝑿(A):=𝑿+(A)\boldsymbol{X}^{-}(A):=-\boldsymbol{X}^{+}(A).)

(II) Assume that A=aId2+bI~+cJ~+dK~A=a\operatorname{Id}_{2}+b\tilde{I}+c\tilde{J}+d\tilde{K} is a real 2×22\times 2 matrix such that detA>0\det A>0. We define the mm-distortion (multiplicative distortion) of AA as

mdistA=tr(AA)2detA2detA=c2+d2a2+b2c2d2.\mathrm{mdist}\,\,A=\frac{\sqrt{\operatorname{tr}(A^{*}A)-2\det A}}{2\sqrt{\det A}}=\frac{\sqrt{c^{2}+d^{2}}}{\sqrt{a^{2}+b^{2}-c^{2}-d^{2}}}.
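The equality of the two expressions for mdist is easy to confirm. The sketch below uses an assumed matrix realization of the basis (Ĩ² = −Id₂, J̃² = K̃² = Id₂) and arbitrary coefficients with det A > 0.

```python
import math

a, b, c, d = 1.3, 0.8, 0.6, 0.4
# A = a Id + b I~ + c J~ + d K~ in the assumed realization
A = [[a + d, -b + c], [b + c, a - d]]

trAsA = sum(A[i][j] ** 2 for i in range(2) for j in range(2))   # tr(A* A), real case
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
assert detA > 0

mdist_matrix = math.sqrt(trAsA - 2 * detA) / (2 * math.sqrt(detA))
mdist_coords = math.sqrt(c * c + d * d) / math.sqrt(a * a + b * b - c * c - d * d)
err_mdist = abs(mdist_matrix - mdist_coords)
```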

2.Y. The flattened hyperboloid model HP and some relatives

(Recall, a review of hyperbolic geometry can be found in Berger [2]; for our conventions, see [12].)

The flattened hyperboloid model, sometimes also popularized as the “Gans model” (cf. [7]), is the “vertical” projection of the usual hyperboloid model. The transcription from the CKB model is

(xHP,yHP)=(xCKB1(xCKB)2(yCKB)2,yCKB1(xCKB)2(yCKB)2);\left(x_{\mathrm{HP}},y_{\mathrm{HP}}\right)=\left(\frac{x_{\mathrm{CKB}}}{\sqrt{1-(x_{\mathrm{CKB}})^{2}-(y_{\mathrm{CKB}})^{2}}},\frac{y_{\mathrm{CKB}}}{\sqrt{1-(x_{\mathrm{CKB}})^{2}-(y_{\mathrm{CKB}})^{2}}}\right);

and the transcription to the CKB model is

(xCKB,yCKB)=(xHP1+(xHP)2+(yHP)2,yHP1+(xHP)2+(yHP)2).\left(x_{\mathrm{CKB}},y_{\mathrm{CKB}}\right)=\left(\frac{x_{\mathrm{HP}}}{\sqrt{1+(x_{\mathrm{HP}})^{2}+(y_{\mathrm{HP}})^{2}}},\frac{y_{\mathrm{HP}}}{\sqrt{1+(x_{\mathrm{HP}})^{2}+(y_{\mathrm{HP}})^{2}}}\right).

An advantage of HP is that its points are represented by points of 2\mathbb{R}^{2}; its disadvantage is that the asymptotic points of the hyperbolic plane are represented only asymptotically, at infinity, in 2\mathbb{R}^{2}.
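The two transcriptions are mutually inverse, which a minimal round-trip check confirms:

```python
import math

def ckb_to_hp(x, y):
    s = math.sqrt(1 - x * x - y * y)
    return x / s, y / s

def hp_to_ckb(x, y):
    s = math.sqrt(1 + x * x + y * y)
    return x / s, y / s

x0, y0 = 0.3, -0.5                      # a point inside the CKB unit disk
x1, y1 = hp_to_ckb(*ckb_to_hp(x0, y0))
err_rt = max(abs(x1 - x0), abs(y1 - y0))
```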

For technical reasons, we will also consider the arctan\arctan-transformed HP model with

(xAHP,yAHP):=(arctanxHP,yHP).\left(x_{\mathrm{AHP}},y_{\mathrm{AHP}}\right):=\left(\arctan x_{\mathrm{HP}},y_{\mathrm{HP}}\right).

The problem with this construction is that it is not well-adapted to asymptotic points, except at (±1,0)CKB(\pm 1,0)^{\mathrm{CKB}}, where it (or, rather, its inverse) realizes blow-ups.

For this reason, a similar but better construction can be derived from the CKB model: we let

(xACKB,yACKB):=\displaystyle\left(x_{\mathrm{ACKB}},y_{\mathrm{ACKB}}\right):= (arcsinxCKB,yCKB1(xCKB)2)\displaystyle\left(\arcsin x_{\mathrm{CKB}},\frac{y_{\mathrm{CKB}}}{\sqrt{1-(x_{\mathrm{CKB}})^{2}}}\right)
=\displaystyle= (arctanxHP1(yHP)2,yHP1(yHP)2).\displaystyle\left(\arctan\frac{x_{\mathrm{HP}}}{\sqrt{1-(y_{\mathrm{HP}})^{2}}},\frac{y_{\mathrm{HP}}}{\sqrt{1-(y_{\mathrm{HP}})^{2}}}\right).

It (or, rather, its inverse) realizes a blow-up of the unit disk of CKB to [π2,π2]×[1,1][-\frac{\pi}{2},\frac{\pi}{2}]\times[-1,1]. The points (±1,0)CKB(\pm 1,0)^{\mathrm{CKB}} are blown up.

3. Schur’s formulae for 2×22\times 2 matrices

3.A. BCH and Schur’s formulae (review)

Recall that, for formal variables XX and YY,

(55) BCH(X,Y)=log((expX)(expY)).\operatorname{BCH}(X,Y)=\log((\exp X)(\exp Y)).

Whenever AA and BB are elements of a Banach algebra 𝔄\mathfrak{A}, the convergence of BCH(A,B)\operatorname{BCH}(A,B) can be considered. Natural notions of convergence are (a) absolute convergence of terms grouped by joint homogeneity in AA and BB; (b) absolute convergence of terms grouped by separate homogeneity in AA and BB; or (c) absolute convergence of terms grouped by non-commutative monomials in AA and BB. We adopt the first viewpoint as it is equivalent to the convergence of the corresponding Magnus series; (b) is stricter, and (c) is even stricter and less natural; but even case (c) makes relatively little difference to (a), as discussed in Part I. Now, if BCH(A,B)\operatorname{BCH}(A,B) is absolutely convergent, then expBCH(A,B)=(expA)(expB)\exp\operatorname{BCH}(A,B)=(\exp A)(\exp B). Another issue is whether BCH(A,B)=log((expA)(expB))\operatorname{BCH}(A,B)=\log((\exp A)(\exp B)) holds. This latter question is basically about the spectral properties of (expA)(expB)(\exp A)(\exp B). As was already discussed in Part I, if |A|<π|A|<\pi or just sp(A){z:|Imz|<π}\operatorname{sp}(A)\subset\{z\,:\,|\operatorname{Im}z|<\pi\}, and tt is sufficiently small, then BCH(A,tM)=log((expA)(exptM))\operatorname{BCH}(A,tM)=\log((\exp A)(\exp tM)) and

(56) ddtlog(exp(A)exp(tM))|t=0=β(adA)M;\left.\frac{\mathrm{d}}{\mathrm{d}t}\log(\exp(A)\exp(tM))\right|_{t=0}=\beta(-\operatorname{ad}A)M;

and similarly, BCH(tM,A)=log((exptM)(expA))\operatorname{BCH}(tM,A)=\log((\exp tM)(\exp A)) and

(57) ddtlog(exp(tM)exp(A))|t=0=β(adA)M;\left.\frac{\mathrm{d}}{\mathrm{d}t}\log(\exp(tM)\exp(A))\right|_{t=0}=\beta(\operatorname{ad}A)M;

where ad(X)Y=[X,Y]=XYYX\operatorname{ad}(X)Y=[X,Y]=XY-YX and

β(x)=xex1=j=0βjxj=112x+112x21720x4+130240x6+.\beta(x)=\frac{x}{\mathrm{e}^{x}-1}=\sum_{j=0}^{\infty}\beta_{j}x^{j}=1-{\frac{1}{2}}x+{\frac{1}{12}}{x}^{2}-{\frac{1}{720}}{x}^{4}+\frac{1}{30240}x^{6}+\ldots\qquad.

(56) and (57) are F. Schur’s formulae and they embody partial but practical information about the BCH series.

3.B. Schur’s formulae in the 2×22\times 2 matrix case

In the 2×22\times 2 setting, Schur’s formulae can be written in closed form:

Lemma 3.1.

If A,𝐯A,\mathbf{v} are complex matrices, and sp(A){z:|Imz|<π}\operatorname{sp}(A)\subset\{z\,:\,|\operatorname{Im}z|<\pi\}, then

(58) D𝐯 at M=0(log(exp(A)exp(M)))==ddtlog(exp(A)exp(t𝐯))|t=0=𝐯+12[A,𝐯]+(DA)14[A,[A,𝐯]]\mathrm{D}_{\mathbf{v}\text{ at }M=0}\left(\log(\exp(A)\exp(M))\right)=\\ =\left.\frac{\mathrm{d}}{\mathrm{d}t}\log(\exp(A)\exp(t\mathbf{v}))\right|_{t=0}=\mathbf{v}+\frac{1}{2}[A,\mathbf{v}]+\operatorname{\not{\mathbf{C}}}(D_{A})\cdot\frac{1}{4}[A,[A,\mathbf{v}]]

and

(59) D𝐯 at M=0(log(exp(M)exp(A)))==ddtlog(exp(t𝐯)exp(A))|t=0=𝐯12[A,𝐯]+(DA)14[A,[A,𝐯]]\mathrm{D}_{\mathbf{v}\text{ at }M=0}\left(\log(\exp(M)\exp(A))\right)=\\ =\left.\frac{\mathrm{d}}{\mathrm{d}t}\log(\exp(t\mathbf{v})\exp(A))\right|_{t=0}=\mathbf{v}-\frac{1}{2}[A,\mathbf{v}]+\operatorname{\not{\mathbf{C}}}(D_{A})\cdot\frac{1}{4}[A,[A,\mathbf{v}]]

hold.

Proof.

This is a direct computation based on the explicit formulas in Lemma 2.9 and Lemma 2.15, and the rules of derivation. (The key identity for simplifying back is (34), which holds only if xx is in a certain neighborhood of (,π2)(-\infty,\pi^{2}), but then one can use analytic continuation.) ∎

Alternative proof.

For A0A\sim 0, it follows from the traditional form of Schur's formulae combined with (22). Then the formulas extend by analytic continuation. ∎
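Formula (58) can be verified numerically. In the sketch below, the scalar function C̸ is taken to be C̸(D) = (1 − √D cot √D)/D (with value 1/3 at D = 0); this concrete form is an assumption here, inferred from matching the power series of β, and not a quotation from Part II. The logarithm is computed through the (distinct) eigenvalues, and the sample matrices are arbitrary small complex matrices.

```python
import cmath

I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def lin(a, X, b, Y):  # a*X + b*Y
    return [[a * X[i][j] + b * Y[i][j] for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def comm(X, Y):
    return lin(1, mul(X, Y), -1, mul(Y, X))

def expm(M, terms=40):
    R, T = I2, I2
    for k in range(1, terms):
        T = [[sum(T[i][m] * M[m][j] for m in range(2)) / k for j in range(2)]
             for i in range(2)]
        R = lin(1, R, 1, T)
    return R

def logm(M):
    # principal logarithm via the two (distinct) eigenvalues
    m = tr(M) / 2
    s = cmath.sqrt(m * m - det(M))
    mu1, mu2 = m + s, m - s
    P1 = lin(1 / (mu1 - mu2), M, -mu2 / (mu1 - mu2), I2)   # projection onto mu1
    P2 = lin(-1 / (mu1 - mu2), M, mu1 / (mu1 - mu2), I2)   # projection onto mu2
    return lin(cmath.log(mu1), P1, cmath.log(mu2), P2)

A = [[0.2 + 0.1j, 0.1 + 0j], [-0.1 + 0j, 0.3 - 0.2j]]
v = [[0.1 + 0j, 0.2 - 0.1j], [0.3j, -0.1 + 0j]]

DA = det(A) - (tr(A) / 2) ** 2
s = cmath.sqrt(DA)
C = (1 - s * cmath.cos(s) / cmath.sin(s)) / DA   # assumed closed form of C-slash

rhs = lin(1, v, 0.5, comm(A, v))
rhs = lin(1, rhs, C / 4, comm(A, comm(A, v)))

h = 1e-5
def F(t):
    return logm(mul(expm(A), expm(lin(t, v, 0, I2))))
Fp, Fm = F(h), F(-h)
err_schur = max(abs((Fp[i][j] - Fm[i][j]) / (2 * h) - rhs[i][j])
                for i in range(2) for j in range(2))
```

Note that logm is insensitive to the branch of the square root, since swapping the two eigenvalues leaves the result unchanged.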

Simple consequences are

Lemma 3.2.

If DA{k2π2:k{0}}D_{A}\notin\{k^{2}\pi^{2}\,:\,k\in\mathbb{N}\setminus\{0\}\}, then there exists a unique local analytic branch logloc\log^{\mathrm{loc}} of exp1\exp^{-1} near expA\exp A such that logloc(expA)=A\log^{\mathrm{loc}}(\exp A)=A.

Proof.

The Schur map provides an inverse to the differential of exp\exp, thus the inverse function theorem can be used. ∎

Lemma 3.3.

Assume that AA is a complex 2×22\times 2 matrix such that DAA0-D_{A^{*}A}\neq 0 and

sp(A){z:|Imz|<π}.\operatorname{sp}(A)\subset\{z\in\mathbb{C}\,:\,|\operatorname{Im}z|<\pi\}.

(a) Then the moment associated to 𝐯D𝐯 at M=Id2(log(exp(A)exp(M))2)\mathbf{v}\mapsto\mathrm{D}_{\mathbf{v}\text{ at }M=\operatorname{Id}_{2}}\left(\|\log(\exp(A)\exp(M))\|_{2}\right) is

MR(A):=MN(A)+12[A,MN(A)]+(DA)¯[A,[A,MN(A)]].\mathrm{MR}(A):=\operatorname{MN}(A)+\frac{1}{2}[A^{*},\operatorname{MN}(A)]+\overline{\operatorname{\not{\mathbf{C}}}(D_{A})}\cdot[A^{*},[A^{*},\operatorname{MN}(A)]].

(b) Then the moment associated to 𝐯D𝐯 at M=Id2(log(exp(M)exp(A))2)\mathbf{v}\mapsto\mathrm{D}_{\mathbf{v}\text{ at }M=\operatorname{Id}_{2}}\left(\|\log(\exp(M)\exp(A))\|_{2}\right) is

ML(A)=MN(A)12[A,MN(A)]+(DA)¯[A,[A,MN(A)]].\mathrm{ML}(A)=\operatorname{MN}(A)-\frac{1}{2}[A^{*},\operatorname{MN}(A)]+\overline{\operatorname{\not{\mathbf{C}}}(D_{A})}\cdot[A^{*},[A^{*},\operatorname{MN}(A)]].
Proof.

This follows from Lemma 3.1. ∎

Using Lemma 3.1, we can obtain higher terms in the BCH expansion. Let AR(t)=log(exp(A)exp(t𝐯))A^{R}(t)=\log(\exp(A)\exp(t\mathbf{v})). Then

ddtAR(t)=𝐯+12[AR(t),𝐯]+(DAR(t))14[AR(t),[AR(t),𝐯]]\frac{\mathrm{d}}{\mathrm{d}t}A^{R}(t)=\mathbf{v}+\frac{1}{2}[A^{R}(t),\mathbf{v}]+\operatorname{\not{\mathbf{C}}}(D_{A^{R}(t)})\cdot\frac{1}{4}[A^{R}(t),[A^{R}(t),\mathbf{v}]]

holds. Taking further derivatives, we can compute dndtnAR(t)\frac{\mathrm{d}^{n}}{\mathrm{d}t^{n}}A^{R}(t). Due to the rules of derivation and equations of type (29–31), we obtain commutator expressions in AR(t)A^{R}(t) and 𝐯\mathbf{v}, with coefficients which are polynomials in (n)(DAR(t))\operatorname{\not{\mathbf{C}}}^{(n)}(D_{A^{R}(t)}) (higher derivatives of \operatorname{\not{\mathbf{C}}} as functions) and DAR(t)D_{A^{R}(t)}, TAR(t),𝐯T_{A^{R}(t),\mathbf{v}}, D𝐯D_{\mathbf{v}}. Specifying t=0t=0, we obtain the corresponding higher terms in the BCH expression. A similar argument applies with AL(t)=log(exp(t𝐯)exp(A))A^{L}(t)=\log(\exp(t\mathbf{v})\exp(A)). In particular, we obtain

Lemma 3.4.

If A,𝐯A,\mathbf{v} are complex matrices, and sp(A){z:|Imz|<π}\operatorname{sp}(A)\subset\{z\,:\,|\operatorname{Im}z|<\pi\}, then

(60) d2d2tlog(exp(A)exp(t𝐯))|t=0=(DA)+(DA)4[𝐯,[𝐯,A]]+(DA)4[A,[𝐯,[𝐯,A]]]=4TA,𝐯[A,𝐯]+2(DA)+3(DA)16[A,[A,[𝐯,[𝐯,A]]]]=4TA,𝐯[A,[A,𝐯]]\frac{\mathrm{d}^{2}}{\mathrm{d}^{2}t}\log(\exp(A)\exp(t\mathbf{v}))\Bigr{|}_{t=0}=\frac{\operatorname{\not{\mathbf{C}}}(D_{A})+\operatorname{\not{\mathbf{D}}}(D_{A})}{4}\,[\mathbf{v},[\mathbf{v},A]]\\ +\frac{\operatorname{\not{\mathbf{C}}}(D_{A})}{4}\,\underbrace{[A,[\mathbf{v},[\mathbf{v},A]]]}_{=-4T_{A,\mathbf{v}}[A,\mathbf{v}]}+\frac{2\operatorname{\not{\mathbf{P}}}(D_{A})+3\operatorname{\not{\mathbf{W}}}(D_{A})}{16}\,\underbrace{[A,[A,[\mathbf{v},[\mathbf{v},A]]]]}_{=-4T_{A,\mathbf{v}}[A,[A,\mathbf{v}]]}

and

(61) d2d2tlog(exp(t𝐯)exp(A))|t=0=(DA)+(DA)4[𝐯,[𝐯,A]](DA)4[A,[𝐯,[𝐯,A]]]+2(DA)+3(DA)16[A,[A,[𝐯,[𝐯,A]]]].\frac{\mathrm{d}^{2}}{\mathrm{d}^{2}t}\log(\exp(t\mathbf{v})\exp(A))\Bigr{|}_{t=0}=\frac{\operatorname{\not{\mathbf{C}}}(D_{A})+\operatorname{\not{\mathbf{D}}}(D_{A})}{4}\,[\mathbf{v},[\mathbf{v},A]]\\ -\frac{\operatorname{\not{\mathbf{C}}}(D_{A})}{4}\,[A,[\mathbf{v},[\mathbf{v},A]]]+\frac{2\operatorname{\not{\mathbf{P}}}(D_{A})+3\operatorname{\not{\mathbf{W}}}(D_{A})}{16}\,[A,[A,[\mathbf{v},[\mathbf{v},A]]]].

hold.

Proof.

Direct computation. ∎

(Although generating functions for the higher terms are known, cf. Goldberg [8], computing with them is messier.)

Due to the nature of the recursion, one can see that dndntlog(exp(A)exp(t𝐯))|t=0\left.\frac{\mathrm{d}^{n}}{\mathrm{d}^{n}t}\log(\exp(A)\exp(t\mathbf{v}))\right|_{t=0} and dndntlog(exp(t𝐯)exp(A))|t=0\left.\frac{\mathrm{d}^{n}}{\mathrm{d}^{n}t}\log(\exp(t\mathbf{v})\exp(A))\right|_{t=0}, for n2n\geq 2, will be linear combinations of [A,𝐯][A,\mathbf{v}], [A,[A,𝐯]][A,[A,\mathbf{v}]], [𝐯,[𝐯,A]][\mathbf{v},[\mathbf{v},A]], with coefficients which are functions of DAD_{A}, TA,𝐯T_{A,\mathbf{v}}, D𝐯D_{\mathbf{v}}.

Taking this formally (cf. Lemma 2.5), all this implies the qualitative statement of the Baker–Campbell–Hausdorff formula, i.e. that the terms in the BCH expansion, grouped by (bi)degree, are commutator polynomials, for 2×22\times 2 matrices. This was achieved here as a special case of the argument using Schur's formulae. (This is a typical argument; see Bonfiglioli, Fulci [3] for a review of the topic, or [10] for some additional viewpoints.) The difference from the general case is that the terms in the expansion (grouped by degree in 𝐯\mathbf{v}) can be kept in finite form (meaning the kind of polynomials we have described).

4. The BCH formula for 2×22\times 2 matrices

Let us now take another viewpoint on the BCH expansion. Recall that, in the case of 2×22\times 2 matrices, as demonstrated by Lemma 2.5 and Lemma 2.6, commutator polynomials can be represented quite simply, allowing formal calculations. It is also a natural question whether the commutator terms of the BCH expansion can be obtained in other efficient ways. The following formal calculations will address these issues.

Let us use the notation

(62) A^=AtrA2Id2\hat{A}=A-\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}

and

(63) 𝐯^=𝐯tr𝐯2Id2.\hat{\mathbf{v}}=\mathbf{v}-\frac{\operatorname{tr}\mathbf{v}}{2}\operatorname{Id}_{2}.

Then it is easy to see that

(64) BCH(A,𝐯)=trA2Id2+tr𝐯2Id2+BCH(A^,𝐯^).\operatorname{BCH}(A,\mathbf{v})=\frac{\operatorname{tr}A}{2}\operatorname{Id}_{2}+\frac{\operatorname{tr}\mathbf{v}}{2}\operatorname{Id}_{2}+\operatorname{BCH}(\hat{A},\hat{\mathbf{v}}).
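(Parenthetically, for small matrices, where the series is realized by log((exp A)(exp 𝐯)), identity (64) admits a quick numerical check; the sample matrices below are arbitrary small complex matrices, and the matrix logarithm is computed through eigenvalues.)

```python
import cmath

I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]

def mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def lin(a, X, b, Y):  # a*X + b*Y
    return [[a * X[i][j] + b * Y[i][j] for j in range(2)] for i in range(2)]

def tr(X):
    return X[0][0] + X[1][1]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def expm(M, terms=40):
    R, T = I2, I2
    for k in range(1, terms):
        T = [[sum(T[i][m] * M[m][j] for m in range(2)) / k for j in range(2)]
             for i in range(2)]
        R = lin(1, R, 1, T)
    return R

def logm(M):
    # principal logarithm via the two (distinct) eigenvalues
    m = tr(M) / 2
    s = cmath.sqrt(m * m - det(M))
    mu1, mu2 = m + s, m - s
    P1 = lin(1 / (mu1 - mu2), M, -mu2 / (mu1 - mu2), I2)
    P2 = lin(-1 / (mu1 - mu2), M, mu1 / (mu1 - mu2), I2)
    return lin(cmath.log(mu1), P1, cmath.log(mu2), P2)

A = [[0.25 + 0.1j, 0.15 + 0j], [0.05 - 0.1j, -0.05 + 0.2j]]
v = [[0.1 - 0.05j, 0.2 + 0j], [0.1 + 0.1j, 0.15 + 0.05j]]

Ah = lin(1, A, -tr(A) / 2, I2)   # A - (tr A / 2) Id_2
vh = lin(1, v, -tr(v) / 2, I2)

lhs = logm(mul(expm(A), expm(v)))
rhs = lin(1, logm(mul(expm(Ah), expm(vh))), (tr(A) + tr(v)) / 2, I2)
err_bch = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```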

Then Lemma 2.3 can be applied to BCH(A^,𝐯^)\operatorname{BCH}(\hat{A},\hat{\mathbf{v}}). Due to the tracelessness of A^\hat{A}, 𝐯^\hat{\mathbf{v}}, only the scalar terms detA^=DA\det\hat{A}=D_{A}, det𝐯^=D𝐯\det\hat{\mathbf{v}}=D_{\mathbf{v}}, and tr(A^𝐯^)=2TA,𝐯\operatorname{tr}(\hat{A}\hat{\mathbf{v}})=2T_{A,\mathbf{v}} remain. In the vector terms, [A^,𝐯^]=[A,𝐯][\hat{A},\hat{\mathbf{v}}]=[A,\mathbf{v}]. Thus we arrive at

(65) BCH(A^,𝐯^)=\displaystyle\operatorname{BCH}(\hat{A},\hat{\mathbf{v}})= g^0(DA,TA,𝐯,D𝐯)Id2\displaystyle\hat{g}_{0}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,\operatorname{Id}_{2}
+g^1(DA,TA,𝐯,D𝐯)A^\displaystyle+\hat{g}_{1}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,\hat{A}
+g^2(DA,TA,𝐯,D𝐯)𝐯^\displaystyle+\hat{g}_{2}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,\hat{\mathbf{v}}
+g^3(DA,TA,𝐯,D𝐯)[A,𝐯].\displaystyle+\hat{g}_{3}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,[A,\mathbf{v}].

(This is valid for any formal expression of A^,𝐯^\hat{A},\hat{\mathbf{v}}.)

From the (formal) determinant one can deduce that g^00\hat{g}_{0}\equiv 0. Considering (64), we obtain

(66) BCH(A,𝐯)=\displaystyle\operatorname{BCH}(A,\mathbf{v})= A+𝐯+\displaystyle A+\mathbf{v}+
+g1(DA,TA,𝐯,D𝐯)A^\displaystyle+g_{1}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,\hat{A}
+g2(DA,TA,𝐯,D𝐯)𝐯^\displaystyle+g_{2}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,\hat{\mathbf{v}}
+g3(DA,TA,𝐯,D𝐯)[A,𝐯].\displaystyle+g_{3}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,[A,\mathbf{v}].

where

g1(DA,TA,𝐯,D𝐯)=g^1(DA,TA,𝐯,D𝐯)1,g_{1}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})=\hat{g}_{1}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})-1,
g2(DA,TA,𝐯,D𝐯)=g^2(DA,TA,𝐯,D𝐯)1,g_{2}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})=\hat{g}_{2}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})-1,
g3(DA,TA,𝐯,D𝐯)=g^3(DA,TA,𝐯,D𝐯).g_{3}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})=\hat{g}_{3}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}}).

This is very close to a commutator expansion. Using (27) and (28), we obtain

(67) BCH(A,𝐯)=A+𝐯\displaystyle\operatorname{BCH}(A,\mathbf{v})=A+\mathbf{v} +f1(DA,TA,𝐯,D𝐯)[A,𝐯]\displaystyle+f_{1}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,[A,\mathbf{v}]
+f2(DA,TA,𝐯,D𝐯)[A,[A,𝐯]]\displaystyle+f_{2}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,[A,[A,\mathbf{v}]]
+f3(DA,TA,𝐯,D𝐯)[𝐯,[𝐯,A]].\displaystyle+f_{3}(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}})\,[\mathbf{v},[\mathbf{v},A]].

where

f1=g3,f_{1}=g_{3},

and

(68) f2=TA,𝐯g1D𝐯g24(DAD𝐯TA,𝐯2),f_{2}=\frac{T_{A,\mathbf{v}}g_{1}-D_{\mathbf{v}}g_{2}}{4(D_{A}D_{\mathbf{v}}-T_{A,\mathbf{v}}^{2})},
(69) f3=DAg1+TA,𝐯g24(DAD𝐯TA,𝐯2).f_{3}=\frac{-D_{A}g_{1}+T_{A,\mathbf{v}}g_{2}}{4(D_{A}D_{\mathbf{v}}-T_{A,\mathbf{v}}^{2})}.

From this, we can see that the qualitative version of the BCH theorem is a very weak but not entirely trivial statement: it is equivalent to the divisibility of the numerators by the denominator in (68) and (69). Therefore, in the setting of 2×22\times 2 matrices, an alternative approach to the BCH theorem is simply to compute the coefficients f2f_{2} and f3f_{3}. Due to the simplicity of 2×22\times 2 matrices, this, indeed, can be done by direct computation:

Lemma 4.1.

Consider the traceless 2×22\times 2 matrices A^\hat{A}, 𝐯^\hat{\mathbf{v}} as before.

(a) Then

det((1t)Id2+texp(A^)exp(𝐯^))==1+2t(1t)(Sin(DA)Sin(D𝐯)TA,𝐯+Cos(DA)Cos(D𝐯)1)δ(DA,TA,𝐯,D𝐯,t):=.\det\left((1-t)\operatorname{Id}_{2}+t\exp(\hat{A})\exp(\hat{\mathbf{v}})\right)=\\ =\underbrace{1+2t(1-t)\Bigr{(}\operatorname{\not{\mathrm{S}}in}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})T_{A,\mathbf{v}}+\operatorname{\not{\mathrm{C}}os}(D_{A})\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})-1\Bigr{)}}_{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t):=}.

(b) Let

C(A^,𝐯):=Cos(D𝐯)Sin(DA)A^+Cos(DA)Sin(D𝐯)𝐯^+12Sin(DA)Sin(D𝐯)[A,𝐯]C(\hat{A},\mathbf{v}):=\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})\operatorname{\not{\mathrm{S}}in}(D_{A})\hat{A}+\operatorname{\not{\mathrm{C}}os}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})\hat{\mathbf{v}}+\frac{1}{2}\operatorname{\not{\mathrm{S}}in}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})\,[A,\mathbf{v}]

Then, as long as exp(A^)exp(𝐯^)\exp(\hat{A})\exp(\hat{\mathbf{v}}) is log-able,

(70) BCH(A^,𝐯^)\displaystyle\operatorname{BCH}(\hat{A},\hat{\mathbf{v}}) =AC(Cos(DA)Cos(D𝐯)+Sin(DA)Sin(D𝐯)TA,𝐯)C(A^,𝐯)\displaystyle=\operatorname{AC}\bigl{(}\operatorname{\not{\mathrm{C}}os}(D_{A})\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})+\operatorname{\not{\mathrm{S}}in}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})T_{A,\mathbf{v}}\bigr{)}\cdot C(\hat{A},\mathbf{v})
=(t=01dtδ(DA,TA,𝐯,D𝐯,t))C(A^,𝐯).\displaystyle=\left(\int_{t=0}^{1}\frac{\mathrm{d}t}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\right)\cdot C(\hat{A},\mathbf{v}).
Proof.

This follows from our formulas concerning the exponential and the logarithm. ∎

We could have formulated the previous statement in the general case, but it would have been more complicated due to the trace terms. Taken formally, we note that the determinant term δ(DA,TA,𝐯,D𝐯,t)\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t) is a formal perturbation of 11. (Substituting DA=0D_{A}=0, TA,𝐯=0T_{A,\mathbf{v}}=0, D𝐯=0D_{\mathbf{v}}=0 gives 11.)

Theorem 4.2.

Let A,𝐯,A^,𝐯^A,\mathbf{v},\hat{A},\hat{\mathbf{v}} as before. Let

η(DA,D𝐯,t)==14t(1t)(1(tCos(DA)+(1t)Cos(D𝐯))(tCos(D𝐯)+(1t)Cos(DA)));\eta(D_{A},D_{\mathbf{v}},t)=\\ =1-4t(1-t)\Bigr{(}1-\bigl{(}t\operatorname{\not{\mathrm{C}}os}(D_{A})+(1-t)\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})\bigr{)}\bigl{(}t\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})+(1-t)\operatorname{\not{\mathrm{C}}os}(D_{A})\bigr{)}\Bigr{)};
ϰ2(DA,D𝐯,t)=Cos(D𝐯)+2t(1t)(Cos(DA)Cos(D𝐯));\varkappa_{2}(D_{A},D_{\mathbf{v}},t)=\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})+2t(1-t)(\operatorname{\not{\mathrm{C}}os}(D_{A})-\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}}));
ξ2(DA,D𝐯,t)==t(1t)ϰ2(DA,D𝐯,t)12Sin(DA)2Sin(D𝐯);\xi_{2}(D_{A},D_{\mathbf{v}},t)=\\ =t(1-t)\varkappa_{2}(D_{A},D_{\mathbf{v}},t)\cdot\frac{1}{2}\operatorname{\not{\mathrm{S}}in}(D_{A})^{2}\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}});
ϰ3(DA,D𝐯,t)=Cos(DA)+2t(1t)(Cos(D𝐯)Cos(DA));\varkappa_{3}(D_{A},D_{\mathbf{v}},t)=\operatorname{\not{\mathrm{C}}os}(D_{A})+2t(1-t)(\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})-\operatorname{\not{\mathrm{C}}os}(D_{A}));
ξ3(DA,D𝐯,t)==t(1t)ϰ3(DA,D𝐯,t)12Sin(D𝐯)2Sin(DA).\xi_{3}(D_{A},D_{\mathbf{v}},t)=\\ =t(1-t)\varkappa_{3}(D_{A},D_{\mathbf{v}},t)\cdot\frac{1}{2}\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})^{2}\operatorname{\not{\mathrm{S}}in}(D_{A}).

Then

(71) BCH(A,𝐯)=\displaystyle\operatorname{BCH}(A,\mathbf{v})= A+𝐯\displaystyle A+\mathbf{v}
+(t=01dtδ(DA,TA,𝐯,D𝐯,t))12Sin(DA)Sin(D𝐯)[A,𝐯]\displaystyle+\left(\int_{t=0}^{1}\frac{\mathrm{d}t}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\right)\cdot\frac{1}{2}\operatorname{\not{\mathrm{S}}in}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})\,[A,\mathbf{v}]
+(t=01ξ2(DA,D𝐯,t)η(DA,D𝐯,t)dtδ(DA,TA,𝐯,D𝐯,t))[A,[A,𝐯]]\displaystyle+\left(\int_{t=0}^{1}\frac{\xi_{2}(D_{A},D_{\mathbf{v}},t)}{\eta(D_{A},D_{\mathbf{v}},t)}\cdot\frac{\mathrm{d}t}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\right)\cdot[A,[A,\mathbf{v}]]
+(t=01ξ3(DA,D𝐯,t)η(DA,D𝐯,t)dtδ(DA,TA,𝐯,D𝐯,t))[𝐯,[𝐯,A]]\displaystyle+\left(\int_{t=0}^{1}\frac{\xi_{3}(D_{A},D_{\mathbf{v}},t)}{\eta(D_{A},D_{\mathbf{v}},t)}\cdot\frac{\mathrm{d}t}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\right)\cdot[\mathbf{v},[\mathbf{v},A]]

is valid as long as A,𝐯0A,\mathbf{v}\sim 0.

(Note that, formally, δ(DA,TA,𝐯,D𝐯,t)\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t), η(DA,D𝐯,t)\eta(D_{A},D_{\mathbf{v}},t), Cos(DA)\operatorname{\not{\mathrm{C}}os}(D_{A}), Cos(D𝐯)\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}}), Sin(DA)\operatorname{\not{\mathrm{S}}in}(D_{A}), Sin(D𝐯)\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}}) are all perturbations of 11, while ξ2(DA,D𝐯,t)\xi_{2}(D_{A},D_{\mathbf{v}},t) and ξ3(DA,D𝐯,t)\xi_{3}(D_{A},D_{\mathbf{v}},t) are perturbations of t(1t)/2t(1-t)/2, for DA,TA,𝐯,D𝐯0D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}}\sim 0.)
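As a consistency check, substituting the lowest-order values δ ≈ 1, η ≈ 1, Ċos ≈ 1, Ṡin ≈ 1, ξ₂ ≈ ξ₃ ≈ t(1−t)/2 into (71) recovers the classical BCH coefficients:

```latex
% lowest-order integrals in (71):
\int_{0}^{1}\tfrac12\,\mathrm{d}t=\tfrac12,
\qquad
\int_{0}^{1}t(1-t)\cdot\tfrac12\,\mathrm{d}t=\tfrac1{12},
% hence, to third order,
\operatorname{BCH}(A,\mathbf{v})
 = A+\mathbf{v}+\tfrac12[A,\mathbf{v}]
   +\tfrac1{12}[A,[A,\mathbf{v}]]
   +\tfrac1{12}[\mathbf{v},[\mathbf{v},A]]+\dotsb
```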

Proof.

As a preparation, let us set

ρ(u,x,y)=arctan(utan(x+y2))+arctan(utan(xy2))x.\rho(u,x,y)={\frac{\arctan\left(u\tan\left(\frac{\sqrt{x}+\sqrt{y}}{2}\right)\right)+\arctan\left(u\tan\left(\frac{\sqrt{x}-\sqrt{y}}{2}\right)\right)}{\sqrt{x}}}.

In this form, this is a formal series in x\sqrt{x} and y\sqrt{y}, with coefficients which are polynomials of uu. By consideration of parities, we see that it gives an analytic power series convergent for x,y0x,y\sim 0 uniformly in u[1,1]u\in[-1,1]. (Using the addition rule arctan(z)+arctan(w)=arctan(w+z1wz)\arctan(z)+\arctan(w)=\arctan\left(\frac{w+z}{1-wz}\right) for z,w0z,w\sim 0, and utilizing the functions Cos,Sin,AC\operatorname{\not{\mathrm{C}}os},\operatorname{\not{\mathrm{S}}in},\operatorname{AC}, we can express it as a composite analytic function for x,y0x,y\sim 0 uniformly in u[1,1]u\in[-1,1]. In fact, it yields

ρ(u,x,y)==2uSin(x)AC(Cos(x)+Cos(y)+u2Cos(x)u2Cos(y)(Cos(x)+Cos(y)+u2Cos(x)u2Cos(y))2+4u2(1Cos(x)2))(Cos(x)+Cos(y)+u2Cos(x)u2Cos(y))2+4u2(1Cos(x)2).\rho(u,x,y)=\\ =\frac{2u\operatorname{\not{\mathrm{S}}in}(x)\operatorname{AC}\left(\dfrac{\operatorname{\not{\mathrm{C}}os}(x)+\operatorname{\not{\mathrm{C}}os}(y)+u^{2}\operatorname{\not{\mathrm{C}}os}(x)-u^{2}\operatorname{\not{\mathrm{C}}os}(y)}{\sqrt{(\operatorname{\not{\mathrm{C}}os}(x)+\operatorname{\not{\mathrm{C}}os}(y)+u^{2}\operatorname{\not{\mathrm{C}}os}(x)-u^{2}\operatorname{\not{\mathrm{C}}os}(y))^{2}+4u^{2}(1-\operatorname{\not{\mathrm{C}}os}(x)^{2})}}\right)}{\sqrt{(\operatorname{\not{\mathrm{C}}os}(x)+\operatorname{\not{\mathrm{C}}os}(y)+u^{2}\operatorname{\not{\mathrm{C}}os}(x)-u^{2}\operatorname{\not{\mathrm{C}}os}(y))^{2}+4u^{2}(1-\operatorname{\not{\mathrm{C}}os}(x)^{2})}}.

But this is not particularly enlightening.) Note, for x,y0x,y\sim 0,

ρ(1,x,y)=1andρ(1,x,y)=1.\rho(1,x,y)=1\qquad\text{and}\qquad\rho(-1,x,y)=-1.

Now we start the proof proper. By Lemma 4.1 and the discussion in this section, we know that

(72) BCH\displaystyle\operatorname{BCH} (A,𝐯)=A+𝐯\displaystyle(A,\mathbf{v})=A+\mathbf{v}
+(t=01dtδ(DA,TA,𝐯,D𝐯,t))12Sin(DA)Sin(D𝐯)[A,𝐯]\displaystyle+\left(\int_{t=0}^{1}\frac{\mathrm{d}t}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\right)\cdot\frac{1}{2}\operatorname{\not{\mathrm{S}}in}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})\,[A,\mathbf{v}]
+(t=01Cos(D𝐯)Sin(DA)δ(DA,TA,𝐯,D𝐯,t)dt)A^(t=01ddt(ρ(2t1,DA,D𝐯)2)dt)=1A^\displaystyle+\left(\int_{t=0}^{1}\frac{\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})\operatorname{\not{\mathrm{S}}in}(D_{A})}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\mathrm{d}t\right)\cdot\hat{A}-\underbrace{\left(\int_{t=0}^{1}\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\rho(2t-1,D_{A},D_{\mathbf{v}})}{2}\right)\mathrm{d}t\right)}_{=1}\cdot\hat{A}
+(t=01Cos(DA)Sin(D𝐯)δ(DA,TA,𝐯,D𝐯,t)dt)𝐯^(t=01ddt(ρ(2t1,D𝐯,DA)2)dt)=1𝐯^.\displaystyle+\left(\int_{t=0}^{1}\frac{\operatorname{\not{\mathrm{C}}os}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}\mathrm{d}t\right)\cdot\hat{\mathbf{v}}-\underbrace{\left(\int_{t=0}^{1}\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\rho(2t-1,D_{\mathbf{v}},D_{A})}{2}\right)\mathrm{d}t\right)}_{=1}\cdot\hat{\mathbf{v}}.

Thus, we have to prove

(73) RHS(71/3)+RHS(71/4)=RHS(72/3)+RHS(72/4).\text{RHS(\ref{eq:ewa}/3)}+\text{RHS(\ref{eq:ewa}/4)}=\text{RHS(\ref{eq:owa}/3)}+\text{RHS(\ref{eq:owa}/4)}.

As the integrands can be expanded as power series for DA,TA,𝐯,D𝐯0D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}}\sim 0 (uniformly in t[0,1]t\in[0,1]), it is even sufficient to prove this as an identity of formal power series in DA,TA,𝐯,D𝐯D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}}.

Now, it is easy to check that

ddt(ρ(2t1,DA,D𝐯)2)=Sin(DA)ϰ2(DA,D𝐯,t)η(DA,D𝐯,t)\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\rho(2t-1,D_{A},D_{\mathbf{v}})}{2}\right)=\frac{\operatorname{\not{\mathrm{S}}in}(D_{A})\varkappa_{2}(D_{A},D_{\mathbf{v}},t)}{\eta(D_{A},D_{\mathbf{v}},t)}

and

ddt(ρ(2t1,D𝐯,DA)2)=Sin(D𝐯)ϰ3(DA,D𝐯,t)η(DA,D𝐯,t).\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\rho(2t-1,D_{\mathbf{v}},D_{A})}{2}\right)=\frac{\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})\varkappa_{3}(D_{A},D_{\mathbf{v}},t)}{\eta(D_{A},D_{\mathbf{v}},t)}.

Thus, using (27) and (28), one finds that

(Cos(D𝐯)Sin(DA)δ(DA,TA,𝐯,D𝐯,t)ddt(ρ(2t1,DA,D𝐯)2))A^\left(\frac{\operatorname{\not{\mathrm{C}}os}(D_{\mathbf{v}})\operatorname{\not{\mathrm{S}}in}(D_{A})}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}-\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\rho(2t-1,D_{A},D_{\mathbf{v}})}{2}\right)\right)\cdot\hat{A}
+(Cos(DA)Sin(D𝐯)δ(DA,TA,𝐯,D𝐯,t)ddt(ρ(2t1,D𝐯,DA)2))𝐯^+\left(\frac{\operatorname{\not{\mathrm{C}}os}(D_{A})\operatorname{\not{\mathrm{S}}in}(D_{\mathbf{v}})}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}-\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\rho(2t-1,D_{\mathbf{v}},D_{A})}{2}\right)\right)\cdot\hat{\mathbf{v}}
=ξ2(DA,D𝐯,t)η(DA,D𝐯,t)1δ(DA,TA,𝐯,D𝐯,t)[A,[A,𝐯]]=\frac{\xi_{2}(D_{A},D_{\mathbf{v}},t)}{\eta(D_{A},D_{\mathbf{v}},t)}\cdot\frac{1}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}[A,[A,\mathbf{v}]]
+ξ3(DA,D𝐯,t)η(DA,D𝐯,t)1δ(DA,TA,𝐯,D𝐯,t)[𝐯,[𝐯,A]].+\frac{\xi_{3}(D_{A},D_{\mathbf{v}},t)}{\eta(D_{A},D_{\mathbf{v}},t)}\cdot\frac{1}{\delta(D_{A},T_{A,\mathbf{v}},D_{\mathbf{v}},t)}[\mathbf{v},[\mathbf{v},A]].

Integrated over t[0,1]t\in[0,1], this yields (73). ∎

Taken formally, the statement and the proof of the previous theorem give an alternative demonstration of the qualitative (commutator expansion) statement of the BCH theorem (for 2×22\times 2 matrices). This is, however, more for the sake of curiosity; in practice, it is simpler to use (70) in order to expand as in (66), and obtain (67) with (68), (69).

5. Magnus minimality of mBCH expansions in the 2×22\times 2 case

Recall (from Part I) that in a Banach algebra 𝔄\mathfrak{A}, for an element AA we define its Magnus exponent as

𝔄(A)=inf{|ϕ|:expL(ϕ)=A}\mathcal{M}_{\mathfrak{A}}(A)=\inf\{\smallint|\phi|\,:\,\operatorname{exp_{L}}(\phi)=A\}

(giving ++\infty for an empty set). As ordered measures can be replaced by piecewise constant measures with arbitrarily small increase in the cumulative norm, the infimum can be taken for mBCH measures. We will say that ϕ\phi is Magnus-minimal if 𝔄(expL(ϕ))=|ϕ|\mathcal{M}_{\mathfrak{A}}(\operatorname{exp_{L}}(\phi))=\smallint|\phi|. Now, the natural generalization of Theorem 1.6.(b) would be the following:

(X) “In the setting of finite dimensional Hilbert spaces, if ψ\psi is a reduced BCH measure, then ()(expL(ψ))<ψ2\mathcal{M}_{\mathcal{B}(\mathfrak{H})}(\operatorname{exp_{L}}(\psi))<\smallint\|\psi\|_{2} (i. e. ψ\psi is not Magnus minimal).”

The objective of this section is to check (X) for 2×22\times 2 matrices by direct computation. (In Section 13, we will see a more informative approach in the real case.)

Theorem 5.1.

(a) Suppose that AA is a real 2×22\times 2 matrix which is not normal. Then A𝟏A\mathbf{1} is not Magnus-minimal, i. e.

2×2real(A𝟏)<A2.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A\mathbf{1})<\|A\|_{2}.

(b) Similar statement holds in the complex case.

Proof.

We can assume that AA is sufficiently small, in particular, sp(A/2){z:|Imz|<π}\operatorname{sp}(A/2)\subset\{z\in\mathbb{C}\,:\,|\operatorname{Im}z|<\pi\}. For t0t\sim 0, tt\in\mathbb{R}, let

A1(t)=log(exp(A/2)exp(t[A,A]));A_{1}(t)=\log\left(\exp(A/2)\exp(t[A,A^{*}])\right);
A2(t)=log(exp(t[A,A])exp(A/2)).A_{2}(t)=\log\left(\exp(-t[A,A^{*}])\exp(A/2)\right).

Then,

expA=exp(A1(t))exp(A2(t));\exp A=\exp(A_{1}(t))\exp(A_{2}(t));

this yields a (m)BCH expansion for expA\exp A. It is sufficient to prove that

(74) dA1(t)2dt|t=0++dA2(t)2dt|t=0+<0\left.\frac{\mathrm{d}\|A_{1}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}+\left.\frac{\mathrm{d}\|A_{2}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}<0

in order to demonstrate non-Magnus minimality. Using (58) and (22), we find

D[A,A]atM=0(log(exp(A/2)exp(M)))𝐯1:==Cot(DA/2)[A,A]+14[A,[A,A]];\underbrace{\mathrm{D}_{[A,A^{*}]\,\,\mathrm{at}\,\,M=0}\left(\log\left(\exp(A/2)\exp(M)\right)\right)}_{\mathbf{v}_{1}:=}=\operatorname{\not{\mathrm{C}}ot}\left(D_{A/2}\right)[A,A^{*}]+\frac{1}{4}[A,[A,A^{*}]];

and, using Lemma 2.19(d),

dA1(t)2dt|t=0+=D𝐯1atM=A/2M2=14A2(D[A,A])DAA<0.\left.\frac{\mathrm{d}\|A_{1}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}=\mathrm{D}_{\mathbf{v}_{1}\,\,\mathrm{at}\,\,M=A/2}\,\|M\|_{2}=-\frac{1}{4}\frac{\|A_{2}\|(-D_{[A,A^{*}]})}{\sqrt{-D_{A*A}}}<0.

(Note that A/2A/2 being non-normal implies smoothness of the norm.) Similarly,

D[A,A]atM=0(log(exp(M)exp(A/2)))𝐯2:==Cot(DA/2)[A,A]+14[A,[A,A]];\underbrace{\mathrm{D}_{-[A,A^{*}]\,\,\mathrm{at}\,\,M=0}\left(\log\left(\exp(M)\exp(A/2)\right)\right)}_{\mathbf{v}_{2}:=}=-\operatorname{\not{\mathrm{C}}ot}\left(D_{A/2}\right)[A,A^{*}]+\frac{1}{4}[A,[A,A^{*}]];

and,

dA2(t)2dt|t=0+=D𝐯2atM=A/2M2=14A2(D[A,A])DAA<0.\left.\frac{\mathrm{d}\|A_{2}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}=\mathrm{D}_{\mathbf{v}_{2}\,\,\mathrm{at}\,\,M=A/2}\,\|M\|_{2}=-\frac{1}{4}\frac{\|A_{2}\|(-D_{[A,A^{*}]})}{\sqrt{-D_{A*A}}}<0.

Then (74) holds, and so does the statement. (Remark: it is useful to follow through this computation for (13) with s1,s2>0s_{1},s_{2}>0.) ∎

As a consequence, we see that any Magnus-minimal mBCH presentation must contain only normal matrices.

Theorem 5.2.

(a) Suppose that AA and BB are real n×nn\times n matrices such that A+B2<A2+B2\|A+B\|_{2}<\|A\|_{2}+\|B\|_{2}. Then A𝟏.B𝟏A\mathbf{1}\boldsymbol{.}B\mathbf{1} is not Magnus-minimal, i. e.

n×nreal(expR(A𝟏.B𝟏))<A2+B2.\mathcal{M}_{n\times n\,\,\mathrm{real}}(\operatorname{exp_{R}}({A\mathbf{1}\boldsymbol{.}B\mathbf{1})})<\|A\|_{2}+\|B\|_{2}.

(b) Similar statement holds in the complex case.

Proof.

Applying the BCH formula, we see that

limt0log(expR(tA,tB))2tA2+tB2=A+B2A2+B2.\lim_{t\searrow 0}\frac{\|\log(\operatorname{exp_{R}}(tA,tB))\|_{2}}{\|tA\|_{2}+\|tB\|_{2}}=\frac{\|A+B\|_{2}}{\|A\|_{2}+\|B\|_{2}}.

This implies that for sufficiently small tt, expR((1t)A𝟏.tC.(1t)B𝟏)\operatorname{exp_{R}}({(1-t)A\mathbf{1}\boldsymbol{.}tC\boldsymbol{.}(1-t)B\mathbf{1})}, with tC=log(expR(tA,tB))tC=\log(\operatorname{exp_{R}}(tA,tB)), will be an alternative presentation with smaller cumulative norm. ∎
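The mechanism of the proof can be illustrated numerically. The sketch below is a hypothetical example (the ordering convention exp_R(tA, tB) = exp(tA) exp(tB) is assumed here): splitting off the recombined middle piece tC strictly lowers the cumulative norm.

```python
import numpy as np
from scipy.linalg import expm, logm

op = lambda M: np.linalg.norm(M, 2)   # operator (spectral) norm

# Nilpotent pair with ||A + B||_2 = 1 < 2 = ||A||_2 + ||B||_2:
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])

t = 0.1
# tC = log(exp_R(tA, tB)); ordering convention assumed as above
tC = logm(expm(t * A) @ expm(t * B)).real

# cumulative norms of the two presentations of the same product:
old_cost = op(A) + op(B)                                  # = 2
new_cost = (1 - t) * op(A) + op(tC) + (1 - t) * op(B)     # ~ 1.90
```

As t ↘ 0, ‖tC‖₂/t → ‖A+B‖₂, reproducing the limit computed in the proof.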

Definition 5.3.

(a) We say that the complex normal matrices AA and BB are aligned if, up to (simultaneous) unitary conjugation, they are of shape

A=r1eiη[1t1]andB=r2eiη[1t2],A=r_{1}{\mathrm{e}}^{\mathrm{i}\eta}\begin{bmatrix}1&\\ &t_{1}\end{bmatrix}\qquad\text{and}\qquad B=r_{2}{\mathrm{e}}^{\mathrm{i}\eta}\begin{bmatrix}1&\\ &t_{2}\end{bmatrix},

where r1,r2(0,+)r_{1},r_{2}\in(0,+\infty), η[0,2π)\eta\in[0,2\pi), t1,t2D(0,1)t_{1},t_{2}\in\operatorname{D}(0,1).

(b) We say that the complex normal matrices AA and BB are skew-aligned if, up to simultaneous unitary conjugation, they are of shape

A=r1[eiηcosteiθsinteiηsinteiθcost]andB=r2[eiηcosteiϕsinteiηsinteiϕcost],A=r_{1}\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}\cos t&-\mathrm{e}^{\mathrm{i}\theta}\sin t\\ \mathrm{e}^{\mathrm{i}\eta}\sin t&\mathrm{e}^{\mathrm{i}\theta}\cos t\end{bmatrix}\qquad\text{and}\qquad B=r_{2}\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}\cos t&-\mathrm{e}^{\mathrm{i}\phi}\sin t\\ \mathrm{e}^{\mathrm{i}\eta}\sin t&\mathrm{e}^{\mathrm{i}\phi}\cos t\end{bmatrix},

where r1,r2(0,+)r_{1},r_{2}\in(0,+\infty), t(0,π)t\in(0,\pi) and η,θ,ϕ[0,2π)\eta,\theta,\phi\in[0,2\pi) but ϕθ\phi\neq\theta. (Note that in this case AA and BB are conform-unitary. We also remark that tπtt\rightsquigarrow\pi-t, ηη+π\eta\rightsquigarrow\eta+\pi, ϕϕ+π\phi\rightsquigarrow\phi+\pi, θθ+π\theta\rightsquigarrow\theta+\pi is a symmetry by conjugation with [11]\begin{bmatrix}1&\\ &-1\end{bmatrix}. )

Lemma 5.4.

Suppose that A,BA,B are nonzero complex 2×22\times 2 matrices such that AA and BB are normal. Then

A+B2=A2+B2\|A+B\|_{2}=\|A\|_{2}+\|B\|_{2}

holds if and only if AA and BB are aligned or skew-aligned. The aligned and skew-aligned cases are mutually exclusive; for example, aligned pairs commute, while skew-aligned pairs do not.

Proof.

If AA (or BB) is not conform-unitary, then its norm is realized at an eigenvector. Due to the additive restriction, it must be a common eigenvector of AA and BB. It can be assumed that this common eigenvector is [10]\begin{bmatrix}1\\ 0\end{bmatrix}; then, due to normality (the orthogonality of eigenspaces), the matrices are aligned. If AA and BB are conform-unitary, then it can be assumed that the norm of the sum is taken at [10]\begin{bmatrix}1\\ 0\end{bmatrix} again, and due to the restrictions we have a configuration which is a skew-alignment, except that t=0t=0 and ϕ=θ\phi=\theta are allowed. The excess cases can be incorporated into the aligned case; what remains is the skew-aligned case. The commutation statement is easy to check. ∎
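For illustration, here is a hypothetical skew-aligned pair (an instance of Definition 5.3(b) in the real case, with η = θ = 0, ϕ = π, t = 0.7): the operator norms add up, yet the pair does not commute, in accordance with Lemma 5.4.

```python
import numpy as np

op = lambda M: np.linalg.norm(M, 2)   # operator (spectral) norm

r1, r2, t = 1.0, 2.0, 0.7
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
A = r1 * R                            # conform-orthogonal (rotation)
B = r2 * (R @ np.diag([1.0, -1.0]))   # conform-orthogonal (reflection)

gap = op(A) + op(B) - op(A + B)       # = 0: the norms add up
comm = op(A @ B - B @ A)              # > 0: the pair does not commute
```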

Lemma 5.5.

Suppose that AA and BB are nonzero, normal, 2×22\times 2 real matrices.

(a) AA and BB are aligned if and only if up to simultaneous conjugation by orthogonal matrices they are of shape

A=r1eiη[1t1]andB=r2eiη[1t2]A=r_{1}{\mathrm{e}}^{\mathrm{i}\eta}\begin{bmatrix}1&\\ &t_{1}\end{bmatrix}\qquad\text{and}\qquad B=r_{2}{\mathrm{e}}^{\mathrm{i}\eta}\begin{bmatrix}1&\\ &t_{2}\end{bmatrix}

where r1,r2(0,+)r_{1},r_{2}\in(0,+\infty), η{0,π}\eta\in\{0,\pi\}, t1,t2[1,1]t_{1},t_{2}\in[-1,1] [hyperbolically aligned case], or up to simultaneous conjugation by orthogonal matrices they are of shape

A=r1[costsintsintcost]andB=r2[costsintsintcost]A=r_{1}\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}\qquad\text{and}\qquad B=r_{2}\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}

with r1,r2(0,+)r_{1},r_{2}\in(0,+\infty), t[0,2π)t\in[0,2\pi) [elliptically aligned case].

(The hyperbolic and elliptic cases are not mutually exclusive, but the common case involves only scalar matrices.)

(b) AA and BB are skew-aligned if and only if up to simultaneous conjugation by orthogonal matrices they are of shape

A=r1[eiηcosteiθsinteiηsinteiθcost]andB=r2[eiηcosteiϕsinteiηsinteiϕcost]A=r_{1}\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}\cos t&-\mathrm{e}^{\mathrm{i}\theta}\sin t\\ \mathrm{e}^{\mathrm{i}\eta}\sin t&\mathrm{e}^{\mathrm{i}\theta}\cos t\end{bmatrix}\qquad\text{and}\qquad B=r_{2}\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}\cos t&-\mathrm{e}^{\mathrm{i}\phi}\sin t\\ \mathrm{e}^{\mathrm{i}\eta}\sin t&\mathrm{e}^{\mathrm{i}\phi}\cos t\end{bmatrix}

where r1,r2(0,+)r_{1},r_{2}\in(0,+\infty), t(0,π)t\in(0,\pi) and η,θ,ϕ{0,π}\eta,\theta,\phi\in\{0,\pi\}, ϕθ\phi\neq\theta.

Proof.

This follows from the standard properties of normal (in this case: symmetric and conform-orthogonal) matrices. ∎

Theorem 5.6.

(a) Suppose that A,BA,B are real normal 2×22\times 2 matrices which are skew-aligned. Then A𝟏.B𝟏A\mathbf{1}\boldsymbol{.}B\mathbf{1} is not Magnus-minimal, i. e.

2×2real(expR(A𝟏.B𝟏))<A2+B2.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{R}}({A\mathbf{1}\boldsymbol{.}B\mathbf{1})})<\|A\|_{2}+\|B\|_{2}.

(b) Similar statement holds in the complex case.

Proof.

Assume that A2,B2<π\|A\|_{2},\|B\|_{2}<\pi and CC is an arbitrary matrix. Then

A~(t)=log(exp(A)exp(tC))\tilde{A}(t)=\log(\exp(A)\exp(tC))

and

B~(t)=log(exp(Ct)exp(B))\tilde{B}(t)=\log(\exp(-Ct)\exp(B))

make sense and are analytic for t0t\sim 0. Furthermore, (expA)(expB)=(expA~(t))(expB~(t))(\exp A)(\exp B)=(\exp\tilde{A}(t))(\exp\tilde{B}(t)).

If

dA~(t)2dt|t=0++dB~(t)2dt|t=0+<0,\left.\frac{\mathrm{d}\|\tilde{A}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}+\left.\frac{\mathrm{d}\|\tilde{B}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}<0,

then A𝟏.B𝟏A\mathbf{1}\boldsymbol{.}B\mathbf{1} cannot be Magnus-minimal, as it can be replaced by A~(t)𝟏.B~(t)𝟏\tilde{A}(t)\mathbf{1}\boldsymbol{.}\tilde{B}(t)\mathbf{1}, which is of smaller cumulative norm for small t>0t>0. We are going to use this idea.

By Schur’s formulae,

A~(0)=ddtlog(exp(A)exp(tC))|t=0=β(adA)C,\tilde{A}^{\prime}(0)=\left.\frac{\mathrm{d}}{\mathrm{d}t}\log(\exp(A)\exp(tC))\right|_{t=0}=\beta(-\operatorname{ad}A)C,
B~(0)=ddtlog(exp(tC)exp(B))|t=0=β(adB)C.\tilde{B}^{\prime}(0)=\left.\frac{\mathrm{d}}{\mathrm{d}t}\log(\exp(-tC)\exp(B))\right|_{t=0}=-\beta(\operatorname{ad}B)C.

(This holds as A2,B2<π\|A\|_{2},\|B\|_{2}<\pi was assumed.)

In our situation, the matrices AA and BB are conform-unitary, thus

dA~(t)2dt|t=0+=A~(0)2S(A~(0)A~(0)1);\left.\frac{\mathrm{d}\|\tilde{A}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}=\|\tilde{A}(0)\|_{2}\cdot S(\tilde{A}^{\prime}(0)\tilde{A}(0)^{-1});

cf. Lemma 2.19(c); a similar formula holds for B~(t)\tilde{B}(t).

First, we treat the complex case. Now, assume that

A=r[eiηcosteiθsinteiηsinteiθcost]r[costsintsintcost][eiηeiθ]A=r\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}\cos t&-\mathrm{e}^{\mathrm{i}\theta}\sin t\\ \mathrm{e}^{\mathrm{i}\eta}\sin t&\mathrm{e}^{\mathrm{i}\theta}\cos t\end{bmatrix}\equiv r\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}&\\ &\mathrm{e}^{\mathrm{i}\theta}\end{bmatrix}

and

B=r[eiηcosteiϕsinteiηsinteiϕcost]r[costsintsintcost][eiηeiϕ]B=r\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}\cos t&-\mathrm{e}^{\mathrm{i}\phi}\sin t\\ \mathrm{e}^{\mathrm{i}\eta}\sin t&\mathrm{e}^{\mathrm{i}\phi}\cos t\end{bmatrix}\equiv r\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}\begin{bmatrix}\mathrm{e}^{\mathrm{i}\eta}&\\ &\mathrm{e}^{\mathrm{i}\phi}\end{bmatrix}

but ABA\neq B. Note that, in this case, ϕθmod2π\phi\not\equiv\theta\operatorname{\,mod\,}2\pi and

1cos(ϕθ)>0.1-\cos(\phi-\theta)>0.

In our computation θ,ϕ,η\theta,\phi,\eta will be fixed, but rr can be taken “sufficiently small”.

Let

C=r112(1cos(ϕθ))[costsintsintcost][(𝔞1+i𝔞2)reiη𝔟r𝔟reiϕeiθ],C=r\frac{1}{\frac{1}{2}(1-\cos(\phi-\theta))}\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}\begin{bmatrix}(\mathfrak{a}_{1}+\mathrm{i}\mathfrak{a}_{2})r\mathrm{e}^{\mathrm{i}\eta}&\mathfrak{b}r\\ \mathfrak{b}r&\mathrm{e}^{\mathrm{i}\phi}-\mathrm{e}^{\mathrm{i}\theta}\end{bmatrix},

where

𝔟=12(2ei(θ+η)ei(ϕ+η))sint,\mathfrak{b}=\frac{1}{2}(2-\mathrm{e}^{\mathrm{i}(\theta+\eta)}-\mathrm{e}^{\mathrm{i}(\phi+\eta)})\sin t,

and

𝔞1=cos(η+θ)cos(η+ϕ),\mathfrak{a}_{1}=\cos\left(\eta+\theta\right)-\cos\left(\eta+\phi\right),

and

𝔞2=112(1cos(ϕθ))(\displaystyle\mathfrak{a}_{2}=\frac{1}{\frac{1}{2}(1-\cos(\phi-\theta))}\Bigl{(} +sin(η+θ)(cos(η+ϕ)1)(cos(η+ϕ)2)\displaystyle+\sin\left(\eta+\theta\right)\left(\cos\left(\eta+\phi\right)-1\right)\left(\cos\left(\eta+\phi\right)-2\right)
sin(η+ϕ)(cos(η+θ)1)(cos(η+θ)2)\displaystyle-\sin\left(\eta+\phi\right)\left(\cos\left(\eta+\theta\right)-1\right)\left(\cos\left(\eta+\theta\right)-2\right)
+12(cos(η+θ)cos(η+ϕ))sin(2η+θ+ϕ)).\displaystyle+\frac{1}{2}\,\left(\cos\left(\eta+\theta\right)-\cos\left(\eta+\phi\right)\right)\sin\left(2\,\eta+\theta+\phi\right)\Bigr{)}.

In this case,

dA~(t)2dt|t=0+\displaystyle\left.\frac{\mathrm{d}\|\tilde{A}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+} +dB~(t)2dt|t=0+=\displaystyle+\left.\frac{\mathrm{d}\|\tilde{B}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}=
=A2S(β(adA)CA1𝐯1:=)+B2S(β(adB)CB1𝐯2:=)\displaystyle=\|A\|_{2}\cdot S(\overbrace{\beta(-\operatorname{ad}A)C\cdot A^{-1}}^{\mathbf{v}_{1}:=})+\|B\|_{2}\cdot S(\overbrace{-\beta(\operatorname{ad}B)C\cdot B^{-1}}^{\mathbf{v}_{2}:=})
=r4sin2t6112(1cos(ϕθ))(𝔞1𝔞~1+𝔞2𝔞~2)+O(r5),\displaystyle=-\frac{r^{4}\sin^{2}t}{6}\frac{1}{\frac{1}{2}(1-\cos(\phi-\theta))}(\mathfrak{a}_{1}\tilde{\mathfrak{a}}_{1}+\mathfrak{a}_{2}\tilde{\mathfrak{a}}_{2})+O(r^{5}),

where 𝔞1=𝔞~1\mathfrak{a}_{1}=\tilde{\mathfrak{a}}_{1} and 𝔞2=𝔞~2\mathfrak{a}_{2}=\tilde{\mathfrak{a}}_{2}. (In fact, 𝔞1\mathfrak{a}_{1} and 𝔞2\mathfrak{a}_{2} were chosen accordingly.) Regarding the previous computation, we note that D𝐯i+𝐯i*2=1+O(r)D_{\frac{\mathbf{v}_{i}+\mathbf{v}_{i}^{*}}{2}}=1+O(r) as r0r\searrow 0 (for fixed θ,ϕ,η\theta,\phi,\eta). Thus D𝐯i+𝐯i*2=1+O(r)\sqrt{D_{\frac{\mathbf{v}_{i}+\mathbf{v}_{i}^{*}}{2}}}=1+O(r) also. This makes the computation valid for small rr.

Our next observation is that (𝔞1,𝔞2)(0,0)(\mathfrak{a}_{1},\mathfrak{a}_{2})\neq(0,0). Indeed, suppose that 𝔞1=0\mathfrak{a}_{1}=0. This means that

cos(η+ϕ)=cos(η+θ).\cos\left(\eta+\phi\right)=\cos\left(\eta+\theta\right).

As we know

η+ϕη+θmod2π,\eta+\phi\not\equiv\eta+\theta\operatorname{\,mod\,}2\pi,

this implies

sin(η+ϕ)=sin(η+θ)0\sin\left(\eta+\phi\right)=-\sin\left(\eta+\theta\right)\neq 0

and

cos(η+ϕ)=cos(η+θ)±1.\cos\left(\eta+\phi\right)=\cos\left(\eta+\theta\right)\neq\pm 1.

Substituting with these yields

𝔞2=112(1cos(ϕθ))(2sin(η+θ)(cos(η+θ)1)(cos(η+θ)2)).\mathfrak{a}_{2}=\frac{1}{\frac{1}{2}(1-\cos(\phi-\theta))}\Bigl{(}2\sin\left(\eta+\theta\right)\left(\cos\left(\eta+\theta\right)-1\right)\left(\cos\left(\eta+\theta\right)-2\right)\Bigr{)}.

Then 𝔞2\mathfrak{a}_{2} is nonzero, as any multiplicative component is nonzero.

By this, we have shown that

dA~(t)2dt|t=0++dB~(t)2dt|t=0+<0\left.\frac{\mathrm{d}\|\tilde{A}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}+\left.\frac{\mathrm{d}\|\tilde{B}(t)\|_{2}}{\mathrm{d}t}\right|_{t=0+}<0

if rr is sufficiently small (for fixed θ,ϕ,η\theta,\phi,\eta). This already contradicts Magnus minimality, as we can always restrict to a neighbourhood of the join between AA and BB.

The real case is no different, except that we can choose only η,θ,ϕ{0,π}\eta,\theta,\phi\in\{0,\pi\}. The resulting matrices are all real (as sin(θ+η)\sin(\theta+\eta), etc., are all 00). ∎

Theorem 5.7.

(a) Assume ϕ=A1𝟏[0,t1)..Ak𝟏[tk1,tk)\phi=A_{1}\mathbf{1}_{[0,t_{1})}\boldsymbol{.}\ldots\boldsymbol{.}A_{k}\mathbf{1}_{[t_{k-1},t_{k})} is a mass-normalized mBCH measure of real 2×22\times 2 matrices, which is Magnus minimal, i. e. ,

2×2real(expR(ϕ))=ϕ2.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{R}}(\phi))=\smallint\|\phi\|_{2}.

Then there are two possible cases:

(i) Up to conjugation by orthogonal matrices, ϕ\phi is of shape ϕ1ϕ2\phi_{1}\oplus\phi_{2} such that one of the components is of shape 1𝟏[0,tk)1\cdot\mathbf{1}_{[0,t_{k})} or 1𝟏[0,tk)-1\cdot\mathbf{1}_{[0,t_{k})} (1×11\times 1 matrix) [hyperbolic case]. Or,

(ii) ϕ\phi is constant orthogonal, i. e. it is of shape U𝟏[0,tk)U\mathbf{1}_{[0,t_{k})} where UU is orthogonal [elliptic case].

(b) Assume ϕ=A1𝟏[0,t1)..Ak𝟏[tk1,tk)\phi=A_{1}\mathbf{1}_{[0,t_{1})}\boldsymbol{.}\ldots\boldsymbol{.}A_{k}\mathbf{1}_{[t_{k-1},t_{k})} is a mass-normalized mBCH measure of complex 2×22\times 2 matrices, which is Magnus minimal, i. e. ,

2×2complex(expR(ϕ))=ϕ2.\mathcal{M}_{2\times 2\,\,\mathrm{complex}}(\operatorname{exp_{R}}(\phi))=\smallint\|\phi\|_{2}.

Then, up to conjugation by unitary matrices, ϕ\phi is of shape ϕ1ϕ2\phi_{1}\oplus\phi_{2} such that one of the components is of shape u𝟏[0,tk)u\cdot\mathbf{1}_{[0,t_{k})} where uD(0,1)u\in\partial\operatorname{D}(0,1).

As a consequence, we see that reduced mass-normalized mBCH measures of 2×22\times 2 matrices cannot be Magnus-minimal (valid in the real and complex cases alike).

Proof.

(a) From the previous lemmas we know that the matrices must be normal, with each one aligned to the next. If one of the matrices is of elliptic type, then its neighbour must be equal to it. Assume now that all AiA_{i} are of decomposable (parabolic or hyperbolic) type. Note that scalar matrices are freely movable in the decomposition, so we can temporarily assume that the AiA_{i} of hyperbolic type are next to each other. Then their eigenspace decompositions are the same. Beyond that we have only scalar matrices, thus decomposability follows. Minimality implies that at least one of the components is minimal, thus we have the special shape. (b) is similar, but simpler. ∎

Remark.

Minimality may lead to further restrictions on the orthogonal UU or the unit uu. However, if ϕ2π\smallint\|\phi\|_{2}\leq\pi, then all the indicated shapes are Magnus-minimal. ∎

6. Asymptotics of some BCH expansions from SL2()\operatorname{SL}_{2}(\mathbb{R})

For the purposes of this section, we consider some auxiliary functions.

Lemma 6.1.

(a) The function

AS(x)=AC(x)211x2\operatorname{AS}(x)=\sqrt{\frac{\operatorname{AC}(x)^{2}-1}{1-x^{2}}}

extends from (1,1)(1,+)(-1,1)\cup(1,+\infty) to an analytic function on (,1]\mathbb{C}\setminus(-\infty,-1]. AS(1)=33\operatorname{AS}(1)=\frac{\sqrt{3}}{3}; and AS\operatorname{AS} is nowhere vanishing on (,1]\mathbb{C}\setminus(-\infty,-1].

(b) AS\operatorname{AS} is monotone decreasing on (1,)(-1,\infty) with range (+,0)e(+\infty,0)_{\mathrm{e}}.

Proof.

(a) (,1]\mathbb{C}\setminus(-\infty,-1] is simply connected, and the branchings (or vanishing) for the square roots can occur at z=±1z=\pm 1 or AC(z)=±1\operatorname{AC}(z)=\pm 1, thus ultimately only for z=1z=1. Thus it is sufficient to check the statement at z=1z=1 with power series.

(b) Elementary function calculus. ∎

Lemma 6.2.

(a) The function

At(z)=AC(z)1AS(z)\operatorname{At}(z)=\frac{\operatorname{AC}(z)-1}{\operatorname{AS}(z)}

is analytic on (,1]\mathbb{C}\setminus(-\infty,-1]. At(1)=0\operatorname{At}(1)=0 and At(1)=33\operatorname{At}^{\prime}(1)=-\frac{\sqrt{3}}{3}; At(z)\operatorname{At}(z) vanishes only at z=1z=1 for z(,1]z\in\mathbb{C}\setminus(-\infty,-1].

(b) The function xx+At(x)x\mapsto x+\operatorname{At}(x) is monotone increasing on (1,+)(-1,+\infty) (bijectively); moreover, it also yields a bijection from (1,1](-1,1] to itself.

Proof.

(a) This is a consequence of Lemma 6.1(a).

(b) Elementary function calculus. ∎

Remark 6.3.

Consequently, for z(,1]z\in\mathbb{C}\setminus(-\infty,-1],

AC(z)=1z2+At(z)21z2At(z)2\operatorname{AC}(z)=\frac{1-z^{2}+\operatorname{At}(z)^{2}}{1-z^{2}-\operatorname{At}(z)^{2}}

and

AS(z)=2At(z)1z2At(z)2\operatorname{AS}(z)=\frac{2\operatorname{At}(z)}{1-z^{2}-\operatorname{At}(z)^{2}}

as analytic functions. But, in fact, the denominators vanish only at z=1z=1. ∎
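These identities, and the special values claimed in Lemmas 6.1 and 6.2, can be sanity-checked numerically. The sketch below is ours; it assumes the elliptic branch AC(x)=arccos(x)/√(1−x²) (i.e. AC(cos t)=t/sin t) on (−1,1), as used in the elliptic computations below, and the helper names are ad hoc transcriptions.

```python
import math

def AC(x):  # assumed elliptic branch on (-1, 1): AC(cos t) = t / sin t
    return math.acos(x) / math.sqrt(1.0 - x * x)

def AS(x):  # Lemma 6.1(a)
    return math.sqrt((AC(x) ** 2 - 1.0) / (1.0 - x * x))

def At(x):  # Lemma 6.2(a)
    return (AC(x) - 1.0) / AS(x)

x = 0.3
den = 1.0 - x * x - At(x) ** 2
ac_check = (1.0 - x * x + At(x) ** 2) / den   # should reproduce AC(x)
as_check = 2.0 * At(x) / den                  # should reproduce AS(x)
limit_AS = AS(0.9999)                         # should be close to sqrt(3)/3
```

At the removable singularity x = 1 one indeed observes AS tending to √3/3 and At tending to 0.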

Example 6.4.

Consider the matrices

J~=[11],I~=[0110].\tilde{J}=\begin{bmatrix}1&\\ &-1\end{bmatrix},\qquad\tilde{I}=\begin{bmatrix}0&-1\\ 1&0\end{bmatrix}.

For α,β\alpha,\beta\in\mathbb{C}, let

Υα,β=αJ~𝟏.βI~𝟏.\Upsilon_{\alpha,\beta}=\alpha\tilde{J}\mathbf{1}\boldsymbol{.}\beta\tilde{I}\mathbf{1}.

Then

Υα,β2=|α|+|β|.\int\|\Upsilon_{\alpha,\beta}\|_{2}=|\alpha|+|\beta|.

For |α|+|β|<π|\alpha|+|\beta|<\pi, we can consider

μL(Υα,β)\displaystyle\mu_{\mathrm{L}}(\Upsilon_{\alpha,\beta}) =log(expL(Υα,β))\displaystyle=\log(\exp_{\mathrm{L}}(\Upsilon_{\alpha,\beta}))
=log(exp(βI~)exp(αJ~))\displaystyle=\log(\exp(\beta\tilde{I})\exp(\alpha\tilde{J}))
=log[eαcosβeαsinβeαsinβeαcosβ]\displaystyle=\log\begin{bmatrix}\mathrm{e}^{\alpha}\cos\beta&-\mathrm{e}^{-\alpha}\sin\beta\\ \mathrm{e}^{\alpha}\sin\beta&\mathrm{e}^{-\alpha}\cos\beta\end{bmatrix}
=AC(coshαcosβ)[sinhαcosβeαsinβeαsinβsinhαcosβ].\displaystyle=\operatorname{AC}(\cosh\alpha\cos\beta)\begin{bmatrix}\sinh\alpha\cos\beta&-\mathrm{e}^{-\alpha}\sin\beta\\ \mathrm{e}^{\alpha}\sin\beta&-\sinh\alpha\cos\beta\end{bmatrix}.

If α,β0\alpha,\beta\geq 0, then

μL(Υα,β)2=AC(coshαcosβ)(sinhα+coshαsinβ).\|\mu_{\mathrm{L}}(\Upsilon_{\alpha,\beta})\|_{2}=\operatorname{AC}(\cosh\alpha\cos\beta)\cdot(\sinh\alpha+\cosh\alpha\sin\beta).
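The closed form above can be tested against a directly computed logarithm. The following sketch is ours; it assumes the elliptic branch AC(x)=arccos(x)/√(1−x²) for |x|<1 (valid here since cosh α cos β ∈ (−1,1)), builds exp(βĨ)exp(αJ̃) explicitly, and recovers its principal logarithm as AC(m)(M − m Id₂) with m = tr(M)/2.

```python
import numpy as np

def AC(x):  # elliptic branch, |x| < 1
    return np.arccos(x) / np.sqrt(1.0 - x * x)

alpha, beta = 0.3, 2.0
R = np.array([[np.cos(beta), -np.sin(beta)],
              [np.sin(beta),  np.cos(beta)]])     # exp(beta * I~)
D = np.diag([np.exp(alpha), np.exp(-alpha)])      # exp(alpha * J~)
M = R @ D
m = np.cosh(alpha) * np.cos(beta)                 # = tr(M)/2, lies in (-1, 1)
L = AC(m) * (M - m * np.eye(2))                   # principal logarithm (det M = 1)
norm_L = np.linalg.norm(L, 2)                     # operator norm
claimed = AC(m) * (np.sinh(alpha) + np.cosh(alpha) * np.sin(beta))
```

The two quantities agree to machine precision, and the entries of L match the displayed matrix.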

(a) Now, for p[0,π)p\in[0,\pi), let

α~(p)=pπ+π2(πp)3,\tilde{\alpha}(p)=p-\pi+\sqrt[3]{\pi^{2}(\pi-p)},
β~(p)=ππ2(πp)3.\tilde{\beta}(p)=\pi-\sqrt[3]{\pi^{2}(\pi-p)}.

Then α~(p),β~(p)0\tilde{\alpha}(p),\tilde{\beta}(p)\geq 0, and

α~(p)+β~(p)=p.\tilde{\alpha}(p)+\tilde{\beta}(p)=p.

Thus,

Υα~(p),β~(p)2=p.\int\|\Upsilon_{\tilde{\alpha}(p),\tilde{\beta}(p)}\|_{2}=p.

As pπp\nearrow\pi, we see that α~(p)0\tilde{\alpha}(p)\searrow 0 (eventually) and β~(p)π\tilde{\beta}(p)\nearrow\pi. Consequently

limpπcoshα~(p)cosβ~(p)=1.\lim_{p\rightarrow\pi}\cosh\tilde{\alpha}(p)\cos\tilde{\beta}(p)=-1.

In that (elliptic) domain AC\operatorname{AC} is computed by arccos\arccos. Now, elementary function calculus shows that as pπp\nearrow\pi,

μL(Υα~(p),β~(p))2\displaystyle\|\mu_{\mathrm{L}}(\Upsilon_{\tilde{\alpha}(p),\tilde{\beta}(p)})\|_{2} =arccos(coshα~(p)cosβ~(p))1cosh2α~(p)cos2β~(p)(sinhα~(p)+coshα~(p)sinβ~(p))\displaystyle\stackrel{{\scriptstyle\rightarrow}}{{=}}\frac{\arccos(\cosh\tilde{\alpha}(p)\cos\tilde{\beta}(p))}{\sqrt{1-\cosh^{2}\tilde{\alpha}(p)\cos^{2}\tilde{\beta}(p)}}(\sinh\tilde{\alpha}(p)+\cosh\tilde{\alpha}(p)\sin\tilde{\beta}(p))
=12π8/3π2+6(πp)1/3+O((πp)1/3).\displaystyle=\sqrt{\frac{12\pi^{8/3}}{\pi^{2}+6}}(\pi-p)^{-1/3}+O((\pi-p)^{1/3}).

We see that in the Baker–Campbell–Hausdorff setting we can produce the asymptotics O((πp)1/3)O((\pi-p)^{-1/3}), although having exponent 1/3-1/3 instead of 1/2-1/2 is strange.

(b) It is interesting to see that in the setting of the present example, one cannot do much better:

If we try to optimize μL(Υα,β)2\|\mu_{\mathrm{L}}(\Upsilon_{\alpha,\beta})\|_{2} for α+β\alpha+\beta (α,β0)(\alpha,\beta\geq 0), then, after some computation, it turns out that the best approach is along a well-defined ridge. This ridge starts hyperbolic but turns elliptic. Its elliptic part is parametrized by x(1,1]x\in(-1,1], and

α^(x)=arcosh(AC(x)+AC(x)24x(1xAS(x))AS(x)2(1xAS(x)));\hat{\alpha}(x)=\operatorname{arcosh}\left(\frac{\operatorname{AC}(x)+\sqrt{\operatorname{AC}(x)^{2}-4x(1-x\operatorname{AS}(x))\operatorname{AS}(x)}}{2(1-x\operatorname{AS}(x))}\right);
β^(x)=arccos(AC(x)AC(x)24x(1xAS(x))AS(x)2AS(x)).\hat{\beta}(x)=\arccos\left(\frac{\operatorname{AC}(x)-\sqrt{\operatorname{AC}(x)^{2}-4x(1-x\operatorname{AS}(x))\operatorname{AS}(x)}}{2\operatorname{AS}(x)}\right).

Then

coshα^(x)cosβ^(x)=x.\cosh\hat{\alpha}(x)\cos\hat{\beta}(x)=x.

Actually, x=1x=1 gives a parabolic expL(Υα^(x),β^(x))\operatorname{exp_{L}}(\Upsilon_{\hat{\alpha}(x),\hat{\beta}(x)}), but for x(1,1)x\in(-1,1) it is elliptic. Then α^(x),β^(x)0\hat{\alpha}(x),\hat{\beta}(x)\geq 0. As x1x\searrow-1, one can see that α^(x)0\hat{\alpha}(x)\searrow 0 (eventually) and β^(x)π\hat{\beta}(x)\nearrow\pi; and, more importantly,

α^(x)+β^(x)π.\hat{\alpha}(x)+\hat{\beta}(x)\nearrow\pi.

Now, as x1x\searrow-1,

arccosx1x2(sinhα^(x)+coshα^(x)sinβ^(x))=π23/4(x+1)1/4+O((x+1)1/4),\frac{\arccos x}{\sqrt{1-x^{2}}}(\sinh\hat{\alpha}(x)+\cosh\hat{\alpha}(x)\sin\hat{\beta}(x))=\pi 2^{3/4}(x+1)^{-1/4}+O((x+1)^{1/4}),

and

πα^(x)β^(x)=1323/4(x+1)3/4+O((x+1)5/4).\pi-\hat{\alpha}(x)-\hat{\beta}(x)=\frac{1}{3}2^{3/4}(x+1)^{3/4}+O((x+1)^{5/4}).

Hence, using the notation p^(x)=α^(x)+β^(x)\hat{p}(x)=\hat{\alpha}(x)+\hat{\beta}(x), we find

μL(Υα^(x),β^(x))2=2π31/3(πp^(x))1/3+O((πp^(x))1/3).\|\mu_{\mathrm{L}}(\Upsilon_{\hat{\alpha}(x),\hat{\beta}(x)})\|_{2}=2\pi 3^{-1/3}(\pi-\hat{p}(x))^{-1/3}+O((\pi-\hat{p}(x))^{1/3}).

This 2π31/3=4.3562\pi 3^{-1/3}=4.356\ldots is just slightly better than 12π8/3π2+6=4.001\sqrt{\frac{12\pi^{8/3}}{\pi^{2}+6}}=4.001\ldots.
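The two leading coefficients can be checked by direct evaluation (the variable names below are ours):

```python
import math

c_ridge  = 2.0 * math.pi * 3.0 ** (-1.0 / 3.0)    # ridge approach of (b)
c_direct = math.sqrt(12.0 * math.pi ** (8.0 / 3.0) / (math.pi ** 2 + 6.0))  # path of (a)
print(c_ridge, c_direct)
```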

(c) In the previous example let α=πp,β=p\alpha=\pi-p,\beta=p with p[π/2,π)p\in[\pi/2,\pi). Let

(75) G(p):=μL(Υπp,p)2=AC(cosh(πp)cosp)(sinh(πp)+cosh(πp)sinp).G(p):=\|\mu_{\mathrm{L}}(\Upsilon_{\pi-p,p})\|_{2}=\operatorname{AC}(\cosh(\pi-p)\cos p)\cdot(\sinh(\pi-p)+\cosh(\pi-p)\sin p).

Then

μL(Υπp,p)2=2π3(πp)1+(π232)(πp)+O((πp)3).\|\mu_{\mathrm{L}}(\Upsilon_{\pi-p,p})\|_{2}=2\pi\sqrt{3}(\pi-p)^{-1}+\left(\frac{\pi}{2}\sqrt{3}-2\right)(\pi-p)+O((\pi-p)^{3}).

In fact, a special value is

G(π2)=π2expπ2.G\left(\frac{\pi}{2}\right)=\frac{\pi}{2}\exp\frac{\pi}{2}.

Using elementary analysis, one can see that G(p)G(p) is strictly monotone increasing on p[π/2,π)p\in[\pi/2,\pi). ∎
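Both the special value and the monotonicity of G can be probed numerically. The sketch below is ours and again assumes the elliptic branch AC(x)=arccos(x)/√(1−x²); note that cosh(π−p)cos p ∈ (−1,0] for p ∈ [π/2, π), so this branch applies throughout.

```python
import math

def AC(x):  # elliptic branch, |x| < 1
    return math.acos(x) / math.sqrt(1.0 - x * x)

def G(p):   # formula (75)
    a = math.pi - p
    m = math.cosh(a) * math.cos(p)   # lies in (-1, 0] for p in [pi/2, pi)
    return AC(m) * (math.sinh(a) + math.cosh(a) * math.sin(p))

special = G(math.pi / 2)             # should equal (pi/2) * exp(pi/2)
ps = [math.pi / 2 + k * (math.pi / 2 - 0.01) / 200 for k in range(201)]
vals = [G(p) for p in ps]            # sampled values on [pi/2, pi - 0.01]
```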

Example 6.5.

Consider the matrices

P~=[010],I~=[11].\tilde{P}=\begin{bmatrix}0&-1\\ &0\end{bmatrix},\qquad\tilde{I}=\begin{bmatrix}&-1\\ 1&\end{bmatrix}.

For α,β\alpha,\beta\in\mathbb{C}, let

Υ~α,β=αP~𝟏.βI~𝟏.\tilde{\Upsilon}_{\alpha,\beta}=\alpha\tilde{P}\mathbf{1}\boldsymbol{.}\beta\tilde{I}\mathbf{1}.

Then

Υ~α,β2=|α|+|β|.\int\|\tilde{\Upsilon}_{\alpha,\beta}\|_{2}=|\alpha|+|\beta|.

For |α|+|β|<π|\alpha|+|\beta|<\pi, we can consider

μL(Υ~α,β)\displaystyle\mu_{\mathrm{L}}(\tilde{\Upsilon}_{\alpha,\beta}) =log(expL(Υ~α,β))\displaystyle=\log(\exp_{\mathrm{L}}(\tilde{\Upsilon}_{\alpha,\beta}))
=log(exp(βI~)exp(αP~))\displaystyle=\log(\exp(\beta\tilde{I})\exp(\alpha\tilde{P}))
=log[cosβαcosβsinβsinβαsinβ+cosβ]\displaystyle=\log\begin{bmatrix}\cos\beta&-\alpha\cos\beta-\sin\beta\\ \sin\beta&-\alpha\sin\beta+\cos\beta\end{bmatrix}
=AC(cosβα2sinβ)[α2sinβαcosβsinβsinβα2sinβ].\displaystyle=\operatorname{AC}\left(\cos\beta-\frac{\alpha}{2}\sin\beta\right)\begin{bmatrix}\frac{\alpha}{2}\sin\beta&-\alpha\cos\beta-\sin\beta\\ \sin\beta&-\frac{\alpha}{2}\sin\beta\end{bmatrix}.

If α,β0\alpha,\beta\geq 0, then

μL(Υ~α,β)2=AC(cosβα2sinβ)(sinβ+α2cosβ+α2).\|\mu_{\mathrm{L}}(\tilde{\Upsilon}_{\alpha,\beta})\|_{2}=\operatorname{AC}\left(\cos\beta-\frac{\alpha}{2}\sin\beta\right)\cdot\left(\sin\beta+\frac{\alpha}{2}\cos\beta+\frac{\alpha}{2}\right).
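As in Example 6.4, this norm formula can be checked numerically. The sketch is ours; it uses exp(αP̃)=Id₂+αP̃ (P̃ being nilpotent) and the elliptic branch AC(x)=arccos(x)/√(1−x²).

```python
import numpy as np

def AC(x):  # elliptic branch, |x| < 1
    return np.arccos(x) / np.sqrt(1.0 - x * x)

alpha, beta = 0.4, 2.2
R = np.array([[np.cos(beta), -np.sin(beta)],
              [np.sin(beta),  np.cos(beta)]])   # exp(beta * I~)
U = np.array([[1.0, -alpha],
              [0.0,  1.0]])                     # exp(alpha * P~), P~ nilpotent
M = R @ U
m = np.trace(M) / 2.0                           # = cos(beta) - (alpha/2) sin(beta)
L = AC(m) * (M - m * np.eye(2))                 # principal logarithm (det M = 1)
norm_L = np.linalg.norm(L, 2)
claimed = AC(m) * (np.sin(beta) + alpha / 2.0 * np.cos(beta) + alpha / 2.0)
```

Here the (1,2) entry of L equals AC(m)(−α cos β − sin β), in accordance with the traceless part of the product matrix.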

For optimal approach, consider x(1,1]x\in(-1,1], and let

α^(x)=2At(x)1(x+At(x))2;β^(x)=arccos(x+At(x)).\hat{\alpha}(x)=\frac{2\operatorname{At}(x)}{\sqrt{1-(x+\operatorname{At}(x))^{2}}};\qquad\hat{\beta}(x)=\arccos\left(x+\operatorname{At}(x)\right).

Then

cosβ^(x)α^(x)2sinβ^(x)=x.\cos\hat{\beta}(x)-\frac{\hat{\alpha}(x)}{2}\sin\hat{\beta}(x)=x.

As x1x\searrow-1, we have α^(x)0\hat{\alpha}(x)\searrow 0 (eventually) and β^(x)π\hat{\beta}(x)\nearrow\pi; and, α^(x)+β^(x)π.\hat{\alpha}(x)+\hat{\beta}(x)\nearrow\pi. Now, as x1x\searrow-1,

μL(Υ~α^(x),β^(x))2\displaystyle\|\mu_{\mathrm{L}}(\tilde{\Upsilon}_{\hat{\alpha}(x),\hat{\beta}(x)})\|_{2} =arccosx1x2(sinβ^(x)+α^(x)2cosβ^(x)+α^(x)2)\displaystyle=\frac{\arccos x}{\sqrt{1-x^{2}}}\left(\sin\hat{\beta}(x)+\frac{\hat{\alpha}(x)}{2}\cos\hat{\beta}(x)+\frac{\hat{\alpha}(x)}{2}\right)
=21/4π(x+1)1/4+O((x+1)1/4),\displaystyle=2^{1/4}\pi(x+1)^{-1/4}+O((x+1)^{1/4}),

and

πα^(x)β^(x)=2321/4(x+1)3/4+O((x+1)5/4).\pi-\hat{\alpha}(x)-\hat{\beta}(x)=\frac{2}{3}2^{1/4}(x+1)^{3/4}+O((x+1)^{5/4}).

Hence, using the notation p^(x)=α^(x)+β^(x)\hat{p}(x)=\hat{\alpha}(x)+\hat{\beta}(x), we find

μL(Υ~α^(x),β^(x))2=π(4/3)1/3(πp^(x))1/3+O((πp^(x))1/3).\|\mu_{\mathrm{L}}(\tilde{\Upsilon}_{\hat{\alpha}(x),\hat{\beta}(x)})\|_{2}=\pi(4/3)^{1/3}(\pi-\hat{p}(x))^{-1/3}+O((\pi-\hat{p}(x))^{1/3}).

This leading coefficient π(4/3)1/3=3.457\pi(4/3)^{1/3}=3.457\ldots is worse than the corresponding one in the previous example. ∎

The previous two examples suggest, at least in the regime of 2×22\times 2 real matrices, two ideas. First, that for BCH expansions substantially stronger asymptotic estimates apply than for general Magnus expansions. Second, that for larger norms in BCH expansions normal matrices are preferred. These issues will be addressed subsequently, with more machinery.

7. Critical singular behaviour of BCH expansions in the 2×22\times 2 real case

For the purposes of the next statement let

U={AM2():A21}U=\{A\in\mathrm{M}_{2}(\mathbb{R})\,:\,\|A\|_{2}\leq 1\}

and

X=(π,π)×U×U.X=(-\pi,\pi)\times U\times U.

According to Theorem 1.4, the map

:(α,A,B)μR(πα2A𝟏.π+α2B𝟏)\mathcal{B}:(\alpha,A,B)\mapsto\mu_{\mathrm{R}}\left(\frac{\pi-\alpha}{2}A\mathbf{1}\boldsymbol{.}\frac{\pi+\alpha}{2}B\mathbf{1}\right)

is defined everywhere. In particular, the subset

X0=(π,π)×{I~}×{I~}(π,π)×{I~}×{I~}X_{0}=(-\pi,\pi)\times\left\{\tilde{I}\right\}\times\left\{\tilde{I}\right\}\cup(-\pi,\pi)\times\left\{-\tilde{I}\right\}\times\left\{-\tilde{I}\right\}

is included, where \mathcal{B} takes values πI~\pi\tilde{I} or πI~-\pi\tilde{I}.

Theorem 7.1.

(a) For (α,A,B)XX0(\alpha,A,B)\in X\setminus X_{0},

(α,A,B)=log(exp(πα2A)exp(π+α2B));\mathcal{B}(\alpha,A,B)=\log\left(\exp\left(\frac{\pi-\alpha}{2}A\right)\exp\left(\frac{\pi+\alpha}{2}B\right)\right);

and |XX0\mathcal{B}|_{X\setminus X_{0}} is analytic (meaning that it is the restriction of an analytic function defined on an open subset of (π,π)×M2()×M2()(-\pi,\pi)\times\mathrm{M}_{2}(\mathbb{R})\times\mathrm{M}_{2}(\mathbb{R}) containing XX0X\setminus X_{0}).

(b) Let (α0,I0,I0)X0(\alpha_{0},I_{0},I_{0})\in X_{0}. Then

π\displaystyle\pi =lim inf(α,A,B)(α0,I0,I0)(α,A,B)2<\displaystyle=\liminf_{(\alpha,A,B)\rightarrow(\alpha_{0},I_{0},I_{0})}\|\mathcal{B}(\alpha,A,B)\|_{2}<
<lim sup(α,A,B)(α0,I0,I0)(α,A,B)2=ππ|α0|+2cosα02π|α0|2cosα02.\displaystyle<\limsup_{(\alpha,A,B)\rightarrow(\alpha_{0},I_{0},I_{0})}\|\mathcal{B}(\alpha,A,B)\|_{2}=\pi\sqrt{\frac{\pi-|\alpha_{0}|+2\cos\frac{\alpha_{0}}{2}}{\pi-|\alpha_{0}|-2\cos\frac{\alpha_{0}}{2}}}.

(Here lim inf\liminf and lim sup\limsup can be understood either as for (α,A,B)XX0(\alpha,A,B)\in X\setminus X_{0} or (α,A,B)X(\alpha,A,B)\in X.)

Proof.

(a) This follows from Theorem 1.6 and the analyticity of log\log.

(b) In these circumstances, trA,trB0\operatorname{tr}A,\operatorname{tr}B\rightarrow 0; as the trace can be factored out, and as detracing does not increase the norm (Corollary 2.25), we can restrict to the case trA=trB=0\operatorname{tr}A=\operatorname{tr}B=0. By conjugation invariance we can also assume I0=I~I_{0}=\tilde{I}. Taking (45) into consideration, we can restrict to matrices

A=(11+t2ξ)I~+1+t2ξ(s1J~+s2K~)A=\left(1-\frac{1+t}{2}\xi\right)\tilde{I}+\frac{1+t}{2}\xi\left(s_{1}\tilde{J}+s_{2}\tilde{K}\right)

and

B=(11t2ξ)I~+1t2ξ(r1J~+r2K~),B=\left(1-\frac{1-t}{2}\xi\right)\tilde{I}+\frac{1-t}{2}\xi\left(r_{1}\tilde{J}+r_{2}\tilde{K}\right),

where 0<ξ00<\xi\sim 0, and |t|,s12+s22,r12+r221|t|,\sqrt{s_{1}^{2}+s_{2}^{2}},\sqrt{r_{1}^{2}+r_{2}^{2}}\leq 1.

Essentially, we have to consider ξ0\xi\searrow 0 while αα0\alpha\rightarrow\alpha_{0}. (Thus all parameters depend on indices λΛ\lambda\in\Lambda which we omit.) Going through the computations, we find

log(exp(πα2A)exp(π+α2B))2=(π+O(ξ))ξ2(πtα2)2+O(ξ3)+ξ2(cosα2)2((s11+t2r11t2)2+(s21+t2r21t2)2)+O(ξ3)ξ2((πtα2)2(cosα2)2((s11+t2r11t2)2+(s21+t2r21t2)2))+O(ξ3).\left\|\log\left(\exp\left(\frac{\pi-\alpha}{2}A\right)\exp\left(\frac{\pi+\alpha}{2}B\right)\right)\right\|_{2}=(\pi+O(\xi))\cdot\\ \cdot\frac{\sqrt{\xi^{2}(\frac{\pi-t\alpha}{2})^{2}+O(\xi^{3})}+\sqrt{\xi^{2}\left(\cos\frac{\alpha}{2}\right)^{2}\left(\left(s_{1}\frac{1+t}{2}-r_{1}\frac{1-t}{2}\right)^{2}+\left(s_{2}\frac{1+t}{2}-r_{2}\frac{1-t}{2}\right)^{2}\right)+O(\xi^{3})}}{\sqrt{\xi^{2}\left((\frac{\pi-t\alpha}{2})^{2}-\left(\cos\frac{\alpha}{2}\right)^{2}\left(\left(s_{1}\frac{1+t}{2}-r_{1}\frac{1-t}{2}\right)^{2}+\left(s_{2}\frac{1+t}{2}-r_{2}\frac{1-t}{2}\right)^{2}\right)\right)+O(\xi^{3})}}.

Note that due to the convexity of the unit disk,

0(s11+t2r11t2)2+(s21+t2r21t2)2E2:=1.0\leq\underbrace{\left(s_{1}\frac{1+t}{2}-r_{1}\frac{1-t}{2}\right)^{2}+\left(s_{2}\frac{1+t}{2}-r_{2}\frac{1-t}{2}\right)^{2}}_{E^{2}:=}\leq 1.

Furthermore, every value from [0,1][0,1] can be realized as EE (even if we fix tt). Thus

log(exp(πα2A)exp(π+α2B))2=π(πtα)+(cosα2)E(πtα)(cosα2)E+O(ξ).\left\|\log\left(\exp\left(\frac{\pi-\alpha}{2}A\right)\exp\left(\frac{\pi+\alpha}{2}B\right)\right)\right\|_{2}=\pi\sqrt{\frac{(\pi-t\alpha)+\left(\cos\frac{\alpha}{2}\right)E}{(\pi-t\alpha)-\left(\cos\frac{\alpha}{2}\right)E}}+O(\xi).

From this, the statement follows. ∎

Remark 7.2.

Using the notation p=π+|α|2=max{πα2,π+α2}p=\frac{\pi+|\alpha|}{2}=\max\left\{\frac{\pi-\alpha}{2},\frac{\pi+\alpha}{2}\right\}, we find

ππ|α|+2cosα2π|α|2cosα2=ππp+sinpπpsinp=2π3πpπ330(πp)+O((πp)3)\pi\sqrt{\frac{\pi-|\alpha|+2\cos\frac{\alpha}{2}}{\pi-|\alpha|-2\cos\frac{\alpha}{2}}}=\pi\sqrt{{\frac{\pi-p+\sin p}{\pi-p-\sin p}}}={\frac{2\pi\,\sqrt{3}}{\pi-p}}-\,\frac{\pi\,\sqrt{3}}{30}\left(\pi-p\right)+O((\pi-p)^{3})

as pπp\nearrow\pi. As for crude estimates, one can check that

ππp+sinpπpsinp<2π3πp<G(p)\pi\sqrt{{\frac{\pi-p+\sin p}{\pi-p-\sin p}}}<{\frac{2\pi\,\sqrt{3}}{\pi-p}}<G(p)

holds for p[π2,π)p\in[\frac{\pi}{2},\pi), cf. (75). ∎
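Both the inequality and the stated expansion can be probed numerically; the sketch and its tolerances are ours.

```python
import math

def lower(p):  # pi * sqrt((pi - p + sin p) / (pi - p - sin p))
    return math.pi * math.sqrt((math.pi - p + math.sin(p)) / (math.pi - p - math.sin(p)))

def crude(p):  # 2 * pi * sqrt(3) / (pi - p)
    return 2.0 * math.pi * math.sqrt(3.0) / (math.pi - p)

# sample the inequality on [pi/2, pi - 0.05]
ps = [math.pi / 2 + k * (math.pi / 2 - 0.05) / 100 for k in range(101)]
ok = all(lower(p) < crude(p) for p in ps)

# compare with the two-term expansion near p = pi
p0 = math.pi - 0.1
series = crude(p0) - math.pi * math.sqrt(3.0) / 30.0 * (math.pi - p0)
```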

Theorem 7.3.

Let α0(π,π)\alpha_{0}\in(-\pi,\pi), p=π+|α0|2=max{πα02,π+α02}p=\frac{\pi+|\alpha_{0}|}{2}=\max\left\{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}\right\}. Cf. Example 6.4(c) for the definition of G(p)G(p). Then

G(p)(sup(α,A,B)X|α||α0|(α,A,B)2)<+.G(p)\leq\left(\sup_{\begin{subarray}{c}(\alpha,A,B)\in X\\ |\alpha|\leq|\alpha_{0}|\end{subarray}}\|\mathcal{B}(\alpha,A,B)\|_{2}\right)<+\infty.
Proof.

The lower estimate is obvious from (75). As for the upper estimate, let MM be such that M>ππp+sinpπpsinpM>\pi\sqrt{{\frac{\pi-p+\sin p}{\pi-p-\sin p}}}. Then Theorem 7.1 shows that

{(α,A,B)X:|α||α0|,(α,A,B)2M}\{(\alpha,A,B)\in X\,:\,|\alpha|\leq|\alpha_{0}|,\,\|\mathcal{B}(\alpha,A,B)\|_{2}\geq M\}

is a compact subset of XX, on which \mathcal{B} is continuous. From this compactness, the boundedness of (α,A,B)2\|\mathcal{B}(\alpha,A,B)\|_{2} follows. ∎

Thus for 2×22\times 2 real matrices, sufficient balancedness implies uniform boundedness for BCH expansions. The most interesting case is p=π/2p=\pi/2 (i. e. α0=0\alpha_{0}=0). In this case the lower bound is sharp; using a slightly different formulation, see Theorem 10.5.

8. Moments associated to Schur’s formulae

Let

𝒮={AM2():sp(A){z:|Imz|<π}DA<π2 in M2(),DAA0}.\mathcal{S}=\{A\in\mathrm{M}_{2}(\mathbb{R})\,:\,\underbrace{\operatorname{sp}(A)\subset\{z\in\mathbb{C}\,:\,|\operatorname{Im}z|<\pi\}}_{\equiv D_{A}<\pi^{2}\text{ in }\mathrm{M}_{2}(\mathbb{R})},\,D_{A^{*}A}\neq 0\}.

We can decompose 𝒮\mathcal{S} further; so that

𝒮nn={A𝒮:[A,A]0}\mathcal{S}^{\mathrm{nn}}=\{A\in\mathcal{S}\,:\,[A,A^{*}]\neq 0\}

is the regular or non-normal (here: non-selfadjoint) interior of 𝒮\mathcal{S}; and

ðpar𝒮={A𝒮:A=A}\eth^{\mathrm{par}}\mathcal{S}=\{A\in\mathcal{S}\,:\,A=A^{*}\}

is the Schur-parabolic or normal (here: self-adjoint) pseudoboundary. Then

𝒮=𝒮nn˙ðpar𝒮.\mathcal{S}=\mathcal{S}^{\mathrm{nn}}\,\,\dot{\cup}\,\,\eth^{\mathrm{par}}\mathcal{S}.

Note that 𝒮\mathcal{S} is open in M2()\mathrm{M}_{2}(\mathbb{R}), consequently, for the closure in M2()\mathrm{M}_{2}(\mathbb{R}),

𝒮¯=𝒮˙𝒮.\overline{\mathcal{S}}=\mathcal{S}\,\,\dot{\cup}\,\,\partial\mathcal{S}.

We can decompose 𝒮M2()\partial\mathcal{S}\subset\mathrm{M}_{2}(\mathbb{R}) so that

0𝒮={0}\partial^{0}\mathcal{S}=\{0\}

is the zero-boundary;

hyp𝒮={AM2():sp(A){z:|Imz|<π}, A is a conform-reflexion}\partial^{\mathrm{hyp}}\mathcal{S}=\{A\in\mathrm{M}_{2}(\mathbb{R})\,:\,\operatorname{sp}(A)\subset\{z\in\mathbb{C}\,:\,|\operatorname{Im}z|<\pi\},\,\text{ $A$ is a conform-reflexion}\}

is the Schur-hyperbolic or conform-reflexional boundary;

ell𝒮={AM2():sp(A){z:|Imz|<π}, A is a conform-rotation}\partial^{\mathrm{ell}}\mathcal{S}=\{A\in\mathrm{M}_{2}(\mathbb{R})\,:\,\operatorname{sp}(A)\subset\{z\in\mathbb{C}\,:\,|\operatorname{Im}z|<\pi\},\,\text{ $A$ is a conform-rotation}\}

is the Schur-elliptic or conform-rotational boundary;

dell𝒮={AM2():sp(A)(iπ+)(iπ+), A is a conform-rotation}\partial^{\mathrm{dell}}\mathcal{S}=\{A\in\mathrm{M}_{2}(\mathbb{R})\,:\,\operatorname{sp}(A)\subset(\mathrm{i}\pi+\mathbb{R})\cup(-\mathrm{i}\pi+\mathbb{R}),\,\text{ $A$ is a conform-rotation}\}

is the Schur-elliptic degenerate boundary;

dnn𝒮={AM2():sp(A)(iπ+)(iπ+), A is not a conform-rotation}\partial^{\mathrm{dnn}}\mathcal{S}=\{A\in\mathrm{M}_{2}(\mathbb{R})\,:\,\operatorname{sp}(A)\subset(\mathrm{i}\pi+\mathbb{R})\cup(-\mathrm{i}\pi+\mathbb{R}),\,\text{ $A$ is not a conform-rotation}\}

is the non-normal degenerate boundary. Then

𝒮=0𝒮˙hyp𝒮˙ell𝒮˙dell𝒮˙dnn𝒮.\partial\mathcal{S}=\partial^{0}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{hyp}}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{ell}}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{dell}}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{dnn}}\mathcal{S}.

Some components here can be decomposed further naturally: ell𝒮\partial^{\mathrm{ell}}\mathcal{S} can be decomposed to conform-identity part ell1𝒮\partial^{\mathrm{ell1}}\mathcal{S} and the generic part ell𝒮\partial^{\mathrm{ell*}}\mathcal{S}; dell𝒮\partial^{\mathrm{dell}}\mathcal{S} can be decomposed to traceless part dell0𝒮\partial^{\mathrm{dell0}}\mathcal{S} and the generic part dell𝒮\partial^{\mathrm{dell*}}\mathcal{S}; and similarly dnn𝒮\partial^{\mathrm{dnn}}\mathcal{S} can be decomposed to traceless part dnn0𝒮\partial^{\mathrm{dnn0}}\mathcal{S} and the generic part dnn𝒮\partial^{\mathrm{dnn*}}\mathcal{S}.

We let

𝒮ext=M2()𝒮¯,\mathcal{S}^{\mathrm{ext}}=\mathrm{M}_{2}(\mathbb{R})\setminus\overline{\mathcal{S}},

the external set of 𝒮\mathcal{S}. This set can be decomposed to the degenerate or normal exterior 𝒮ellext\mathcal{S}^{\mathrm{ellext}} and the non-normal exterior 𝒮nnext\mathcal{S}^{\mathrm{nnext}}. It is reasonable to discriminate 𝒮ext\mathcal{S}^{\mathrm{ext}} further by whether DA{k2π2:k{0}}D_{A}\in\{k^{2}\pi^{2}\,:\,k\in\mathbb{N}\setminus\{0\}\} holds or not. However, 𝒮ext\mathcal{S}^{\mathrm{ext}} will not, ultimately, be of much interest for us:

Let

𝒮acc=𝒮nn˙ðpar𝒮˙0𝒮˙hyp𝒮˙ell𝒮˙dell𝒮\mathcal{S}^{\mathrm{acc}}=\mathcal{S}^{\mathrm{nn}}\,\,\dot{\cup}\,\,\eth^{\mathrm{par}}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{0}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{hyp}}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{ell}}\mathcal{S}\,\,\dot{\cup}\,\,\partial^{\mathrm{dell}}\mathcal{S}

(where dnn𝒮\partial^{\mathrm{dnn}}\mathcal{S} and 𝒮ext\mathcal{S}^{\mathrm{ext}} are not included). Here the point is that elements of dnn𝒮\partial^{\mathrm{dnn}}\mathcal{S} and 𝒮ext\mathcal{S}^{\mathrm{ext}} are not that useful to exponentiate, as they can easily be replaced by elements of smaller norm:

Lemma 8.1.

(a) If AM2()𝒮accA\in\mathrm{M}_{2}(\mathbb{R})\setminus{\mathcal{S}^{\mathrm{acc}}}, then there exists B𝒮accB\in{\mathcal{S}^{\mathrm{acc}}} such that expB=expA\exp B=\exp A and B2<A2\|B\|_{2}<\|A\|_{2}.

(b) If A𝒮accdell𝒮A\in{\mathcal{S}^{\mathrm{acc}}}\setminus\partial^{\mathrm{dell}}\mathcal{S}, then there is only one B𝒮accB\in{\mathcal{S}^{\mathrm{acc}}} such that expB=expA\exp B=\exp A; namely B=AB=A.

(c) If Adell𝒮A\in\partial^{\mathrm{dell}}\mathcal{S}, there are two possible B𝒮accB\in{\mathcal{S}^{\mathrm{acc}}} such that expB=expA\exp B=\exp A. If A=aId2+πI~A=a\operatorname{Id}_{2}+\pi\tilde{I} or A=aId2πI~A=a\operatorname{Id}_{2}-\pi\tilde{I}, then B=aId2±πI~B=a\operatorname{Id}_{2}\pm\pi\tilde{I}.

Proof.

This is straightforward from Lemma 2.28. ∎

Note that A𝒮A\in\mathcal{S} if and only if logexpA=A\log\exp A=A and the norm is smooth at AA. (This is the reason for the notation.) Then

(76) MRA(𝐯)=D𝐯 at M=0(MM2()log((expA)(expM))2)\mathrm{MR}_{A}(\mathbf{v})=D_{\mathbf{v}\text{ at }M=0}\left(M\in\mathrm{M}_{2}(\mathbb{R})\mapsto\|\log((\exp A)(\exp M))\|_{2}\right)

yields a linear operator MRA\mathrm{MR}_{A}. This can be represented by MR(A)M2()\mathrm{MR}(A)\in\mathrm{M}_{2}(\mathbb{R}) such that

MRA(𝐯)=12tr(𝐯MR(A)).\mathrm{MR}_{A}(\mathbf{v})=\frac{1}{2}\operatorname{tr}\left(\mathbf{v}\mathrm{MR}(A)^{*}\right).

Similarly, we can define

(77) MLA(𝐯)=D𝐯 at M=0(MM2()log((expM)(expA))2)\mathrm{ML}_{A}(\mathbf{v})=D_{\mathbf{v}\text{ at }M=0}\left(M\in\mathrm{M}_{2}(\mathbb{R})\mapsto\|\log((\exp M)(\exp A))\|_{2}\right)

and ML(A)M2()\mathrm{ML}(A)\in\mathrm{M}_{2}(\mathbb{R}) such that

MLA(𝐯)=12tr(𝐯ML(A)).\mathrm{ML}_{A}(\mathbf{v})=\frac{1}{2}\operatorname{tr}\left(\mathbf{v}\mathrm{ML}(A)^{*}\right).

As log\log, exp\exp, 2\|\cdot\|_{2} are locally open at these expressions, we know that ML(A),MR(A)0\mathrm{ML}(A),\mathrm{MR}(A)\neq 0 (for A𝒮A\in\mathcal{S}). It is easy to see that if UU is orthogonal, then

(78) MRUAU1(U𝐯U1)=MRA(𝐯)\mathrm{MR}_{UAU^{-1}}(U\mathbf{v}U^{-1})=\mathrm{MR}_{A}(\mathbf{v})

and

MR(UAU1)=UMR(A)U1.\mathrm{MR}(UAU^{-1})=U\mathrm{MR}(A)U^{-1}.

Similar statements hold for ML\mathrm{ML}. Furthermore,

(79) MRA(𝐯)=MLA(𝐯)\mathrm{MR}_{A}(\mathbf{v})=\mathrm{ML}_{A^{*}}(\mathbf{v}^{*})

and

MR(A)=(ML(A)).\mathrm{MR}(A)=(\mathrm{ML}(A^{*}))^{*}.

By Lemma 3.2, the condition A𝒮A\in\mathcal{S} (i. e. DAA0D_{A^{*}A}\neq 0 and DA<π2D_{A}<\pi^{2}) can be relaxed to DAA0D_{A^{*}A}\neq 0 and DA{k2π2:k{0}}D_{A}\notin\{k^{2}\pi^{2}\,:\,k\in\mathbb{N}\setminus\{0\}\} in the previous discussion. (Thus it extends to most of 𝒮ext\mathcal{S}^{\mathrm{ext}}.)

If A0𝒮hyp𝒮ell𝒮A\in\partial^{0}\mathcal{S}\cup\partial^{\mathrm{hyp}}\mathcal{S}\cup\partial^{\mathrm{ell}}\mathcal{S}, then (76) still makes sense, although it is not linear in 𝐯\mathbf{v}. Thus, the (non-linear) forms MRA(𝐯)\mathrm{MR}_{A}(\mathbf{v}) and MLA(𝐯)\mathrm{ML}_{A}(\mathbf{v}) are defined, but we leave MR(A)\mathrm{MR}(A) and ML(A)\mathrm{ML}(A) undefined. Nevertheless, (78) and (79) still hold.

If Adell𝒮A\in\partial^{\mathrm{dell}}\mathcal{S}, then we will be content to define MRA(𝐯)=MLA(𝐯)\mathrm{MR}_{A}(\mathbf{v})=\mathrm{ML}_{A}(\mathbf{v}) for 𝐯Id2+I~\mathbf{v}\in\mathbb{R}\operatorname{Id}_{2}+\mathbb{R}\tilde{I}. This is done as follows: We extend log\log as log\log^{*} such that for negative scalar matrices B=λId2B=-\lambda\operatorname{Id}_{2} it yields logB=(logλ)Id2±πI~\log^{*}B=(\log\lambda)\operatorname{Id}_{2}\pm\pi\tilde{I} (two-valued). Then logB2\|\log^{*}B\|_{2} still makes sense uniquely. Now, for 𝐯Id2+I~\mathbf{v}\in\mathbb{R}\operatorname{Id}_{2}+\mathbb{R}\tilde{I}, we define

MRA(𝐯)=MLA(𝐯)=limt0log(exp(𝐯t)exp(A))2A2t\mathrm{MR}_{A}(\mathbf{v})=\mathrm{ML}_{A}(\mathbf{v})=\lim_{t\searrow 0}\frac{\|\log^{*}(\exp(\mathbf{v}t)\exp(A))\|_{2}-\|A\|_{2}}{t}

(expA\exp A is central). This argument applies more generally, if DA{k2π2:k{0}}D_{A}\in\{k^{2}\pi^{2}\,:\,k\in\mathbb{N}\setminus\{0\}\}, with respect to an appropriate skew-involution IAI_{A}; but it will be of no interest for us.

Another observation is that MRA()\mathrm{MR}_{A}(\cdot) and MLA()\mathrm{ML}_{A}(\cdot) are never trivial (i. e. identically zero). Indeed, in each case, the direction 𝐯=A\mathbf{v}=A yields a differentiable increase or decrease (the latter for Adell𝒮A\in\partial^{\mathrm{dell}}\mathcal{S}) in the norm. We will see concrete expressions for MRA(𝐯)\mathrm{MR}_{A}(\mathbf{v}) and MLA(𝐯)\mathrm{ML}_{A}(\mathbf{v}) later.

Lemma 8.2.

Suppose that A=aId2+bI~+(rcosψ)J~+(rsinψ)K~A=a\operatorname{Id}_{2}+b\tilde{I}+(r\cos\psi)\tilde{J}+(r\sin\psi)\tilde{K} such that

(80) a2+b2>0,r>0,a^{2}+b^{2}>0,\qquad r>0,

and

(81) b2r2{k2π2:k{0}}.b^{2}-r^{2}\notin\{k^{2}\pi^{2}\,:\,k\in\mathbb{N}\setminus\{0\}\}.

(Note A𝒮A\in\mathcal{S} if and only if (80) and b2r2<π2b^{2}-r^{2}<\pi^{2} hold.)

(o) Then

MR(A)=\displaystyle\mathrm{MR}(A)= a^Id2+b^I~+c^RJ~+d^RK~\displaystyle\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}+\hat{c}_{\mathrm{R}}\tilde{J}+\hat{d}_{\mathrm{R}}\tilde{K}
=\displaystyle= a^Id2+b^I~+(cosψId2+sinψI~)(c˘J~+d˘K~),\displaystyle\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}+(\cos\psi\operatorname{Id}_{2}+\sin\psi\tilde{I})(\breve{c}\tilde{J}+\breve{d}\tilde{K}),

and

ML(A)=\displaystyle\mathrm{ML}(A)= a^Id2+b^I~+c^LJ~+d^LK~\displaystyle\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}+\hat{c}_{\mathrm{L}}\tilde{J}+\hat{d}_{\mathrm{L}}\tilde{K}
=\displaystyle= a^Id2+b^I~+(cosψId2+sinψI~)(c˘J~d˘K~),\displaystyle\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}+(\cos\psi\operatorname{Id}_{2}+\sin\psi\tilde{I})(\breve{c}\tilde{J}-\breve{d}\tilde{K}),

where

(82) a^\displaystyle\hat{a} =aa2+b2\displaystyle=\frac{a}{\sqrt{a^{2}+b^{2}}}
b^\displaystyle\hat{b} =ba2+b2(1+(a2+b2+r)r(b2r2))\displaystyle=\frac{b}{\sqrt{a^{2}+b^{2}}}\left(1+\left(\sqrt{a^{2}+b^{2}}+r\right)\cdot r\operatorname{\not{\mathbf{C}}}(b^{2}-r^{2})\right)
c˘\displaystyle\breve{c} =1ba2+b2(a2+b2+r)b(b2r2)\displaystyle=1-\frac{b}{\sqrt{a^{2}+b^{2}}}\left(\sqrt{a^{2}+b^{2}}+r\right)\cdot b\operatorname{\not{\mathbf{C}}}(b^{2}-r^{2})
d˘\displaystyle\breve{d} =ba2+b2(a2+b2+r).\displaystyle=-\frac{b}{\sqrt{a^{2}+b^{2}}}\left(\sqrt{a^{2}+b^{2}}+r\right).

(a) Consequently,

a^2+b^2c˘2d˘2\displaystyle\hat{a}^{2}+\hat{b}^{2}-\breve{c}^{2}-\breve{d}^{2} =a^2+b^2c^R2d^R2=a^2+b^2c^L2d^L2=\displaystyle=\hat{a}^{2}+\hat{b}^{2}-\hat{c}_{\mathrm{R}}^{2}-\hat{d}_{\mathrm{R}}^{2}=\hat{a}^{2}+\hat{b}^{2}-\hat{c}_{\mathrm{L}}^{2}-\hat{d}_{\mathrm{L}}^{2}=
(83) =(ba2+b2(a2+b2+r)(b2r2))20,\displaystyle=-\left(\frac{b}{\sqrt{a^{2}+b^{2}}}\left(\sqrt{a^{2}+b^{2}}+r\right)\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}\right)^{2}\leq 0,

with equality only if b=0b=0. In particular,

(84) c˘2+d˘2=c^R2+d^R2=c^L2+d^L2>0.\breve{c}^{2}+\breve{d}^{2}=\hat{c}_{\mathrm{R}}^{2}+\hat{d}_{\mathrm{R}}^{2}=\hat{c}_{\mathrm{L}}^{2}+\hat{d}_{\mathrm{L}}^{2}>0.

(b) If a>0a>0 and b=0b=0, then (a^,b^,c˘,d˘)=(1,0,cosψ,sinψ)(\hat{a},\hat{b},\breve{c},\breve{d})=(1,0,\cos\psi,\sin\psi), thus

MR(A)=ML(A)=Id2+cosψJ~+sinψK~.\mathrm{MR}(A)=\mathrm{ML}(A)=\operatorname{Id}_{2}+\cos\psi\tilde{J}+\sin\psi\tilde{K}.

If a<0a<0 and b=0b=0, then (a^,b^,c˘,d˘)=(1,0,cosψ,sinψ)(\hat{a},\hat{b},\breve{c},\breve{d})=(-1,0,\cos\psi,\sin\psi), thus

MR(A)=ML(A)=Id2+cosψJ~+sinψK~.\mathrm{MR}(A)=\mathrm{ML}(A)=-\operatorname{Id}_{2}+\cos\psi\tilde{J}+\sin\psi\tilde{K}.

(c) Restricted to A𝒮A\in\mathcal{S}, and b0b\neq 0, and to the level set =a2+b2+r>0\not{N}=\sqrt{a^{2}+b^{2}}+r>0, the maps MR(A)\mathrm{MR}(A) and ML(A)\mathrm{ML}(A) are injective.

(d) We can also consider the correspondence induced by conjugation by orthogonal matrices, that is the map

M˘:(a,b,r)(a^,b^,c˘2+d˘2),\breve{\mathrm{M}}:(a,b,r)\mapsto\left(\hat{a},\hat{b},\sqrt{\breve{c}^{2}+\breve{d}^{2}}\right),

where the domain is determined by the restrictions (80) and (81).

Restricted to b2r2<π2b^{2}-r^{2}<\pi^{2}, and b0b\neq 0, and to the level set =a2+b2+r>0\not{N}=\sqrt{a^{2}+b^{2}}+r>0, the map M˘\breve{\mathrm{M}} is injective.

Proof.

(o), (a), and (b) are straightforward computations, cf. Lemma 3.3. Only (84) requires a particular argument: By (83), we see that c˘=d˘=0\breve{c}=\breve{d}=0 implies a^=b^=0\hat{a}=\hat{b}=0; however, a^=d˘=0\hat{a}=\breve{d}=0 is in contradiction to a2+b2>0a^{2}+b^{2}>0. Due to the fibration property with respect to conjugation by rotation matrices, (d) will imply (c). Regarding (d): Assume we are given (a^,b^,c˘2+d˘2)(\hat{a},\hat{b},\sqrt{\breve{c}^{2}+\breve{d}^{2}}) and \not{N}. Note that, due to b2r2<π2b^{2}-r^{2}<\pi^{2}, we know not only sgna=sgna^\operatorname{sgn}a=\operatorname{sgn}\hat{a}, but also sgnb=sgnb^\operatorname{sgn}b=\operatorname{sgn}\hat{b}. Hence, using a^=aa2+b2\hat{a}=\frac{a}{\sqrt{a^{2}+b^{2}}}, the value of ba2+b2\frac{b}{\sqrt{a^{2}+b^{2}}} can be recovered. Using (83), we can compute (b2r2)\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}. As b2r2<π2b^{2}-r^{2}<\pi^{2}, we can recover b2r2b^{2}-r^{2}. Then, from b^\hat{b}, we can recover rr. Thus we also know b2b^{2}. Using sgnb\operatorname{sgn}b, we can deduce the value of bb. As the ratio aa2+b2:ba2+b2\frac{a}{\sqrt{a^{2}+b^{2}}}:\frac{b}{\sqrt{a^{2}+b^{2}}} is already known, the value of aa can also be recovered. ∎

Motivated by Lemma 9.4 later, in the context of Lemma 8.2, we also define the normalized expressions

(85) MRCKB(A)=1c^R2+d^R2MR(A),MLCKB(A)=1c^L2+d^L2ML(A)\mathrm{MR}^{\mathrm{CKB}}(A)=\frac{1}{\hat{c}_{\mathrm{R}}^{2}+\hat{d}_{\mathrm{R}}^{2}}\mathrm{MR}(A),\qquad\mathrm{ML}^{\mathrm{CKB}}(A)=\frac{1}{\hat{c}_{\mathrm{L}}^{2}+\hat{d}_{\mathrm{L}}^{2}}\mathrm{ML}(A)

and

(86) M˘CKB(a,b,r)(a^CKB,b^CKB)=(a^c˘2+d˘2,b^c˘2+d˘2).\breve{\mathrm{M}}^{\mathrm{CKB}}(a,b,r)\equiv(\hat{a}^{\mathrm{CKB}},\hat{b}^{\mathrm{CKB}})=\left(\frac{\hat{a}}{\sqrt{\breve{c}^{2}+\breve{d}^{2}}},\frac{\hat{b}}{\sqrt{\breve{c}^{2}+\breve{d}^{2}}}\right).

We are looking for a generalization of Lemma 8.2(c)(d) to the normalized expressions. In that matter, M˘CKB\breve{\mathrm{M}}^{\mathrm{CKB}} will play a crucial role. The notation refers to the (asymptotically closed) CKB model, as M˘CKB\breve{\mathrm{M}}^{\mathrm{CKB}} is supposed to take values in it. (Indeed, by (83), we obtain interior points of the CKB model for b0b\neq 0, and asymptotical points for b=0b=0.)

For a moment, let us consider the variant which uses the flattened hyperboloid model given by

(87) M˘HP(a,b,r)(a^HP,b^HP)=(a^a^2b^2+c˘2+d˘2,b^a^2b^2+c˘2+d˘2).\breve{\mathrm{M}}^{\mathrm{HP}}(a,b,r)\equiv(\hat{a}^{\mathrm{HP}},\hat{b}^{\mathrm{HP}})=\left(\frac{\hat{a}}{\sqrt{-\hat{a}^{2}-\hat{b}^{2}+\breve{c}^{2}+\breve{d}^{2}}},\frac{\hat{b}}{\sqrt{-\hat{a}^{2}-\hat{b}^{2}+\breve{c}^{2}+\breve{d}^{2}}}\right).

It is not defined for b=0b=0 (where the denominators vanish), but this is only a minor annoyance. Using (82), we can write

(88) M˘HP(a,b,r)=(a|b|1(b2r2)a2+b2+r,(sgnb)(1(b2r2)a2+b2+r+r(b2r2)(b2r2))).\breve{\mathrm{M}}^{\mathrm{HP}}(a,b,r)=\left(\frac{a}{|b|}\cdot\frac{\dfrac{1}{\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}}}{\sqrt{a^{2}+b^{2}}+r},(\operatorname{sgn}b)\cdot\left(\frac{\dfrac{1}{\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}}}{\sqrt{a^{2}+b^{2}}+r}+r\frac{\operatorname{\not{\mathbf{C}}}(b^{2}-r^{2})}{\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}}\right)\right).

Using Lemma 2.12, we can write it as

M˘HP(a,b,r)=(a|b|(b2r2)a2+b2+r,(sgnb)((b2r2)a2+b2+r+r(b2r2))).\breve{\mathrm{M}}^{\mathrm{HP}}(a,b,r)=\left(\frac{a}{|b|}\cdot\frac{\mathcal{E}(b^{2}-r^{2})}{\sqrt{a^{2}+b^{2}}+r},(\operatorname{sgn}b)\cdot\left(\frac{\mathcal{E}(b^{2}-r^{2})}{\sqrt{a^{2}+b^{2}}+r}+r\mathcal{F}(b^{2}-r^{2})\right)\right).

This form shows that the expression above is unexpectedly well-defined for (a,b,r)××[0,+)(a,b,r)\in\mathbb{R}\times\mathbb{R}\times[0,+\infty) as long as b0b\neq 0. This also applies to M˘CKB\breve{\mathrm{M}}^{\mathrm{CKB}}, as HP can simply be written back to CKB.

Now, we show that M˘CKB\breve{\mathrm{M}}^{\mathrm{CKB}} extends to the case b=0b=0 if we perform a suitable blow-up at b=0b=0. This means that instead of using the domain

˘={(a,b,r)××[0,+)},\breve{\mathcal{M}}=\{(a,b,r)\in\mathbb{R}\times\mathbb{R}\times[0,+\infty)\},

we will consider the domain

^={(s,r,θ)[0,+)×[0,+)×(mod2π)}\widehat{\mathcal{M}}=\{(s,r,\theta)\in[0,+\infty)\times[0,+\infty)\times(\mathbb{R}\operatorname{\,mod\,}2\pi)\}

where the canonical correspondence is given by

a=scosθ,b=ssinθ,r=r.a=s\cos\theta,\qquad b=s\sin\theta,\qquad r=r.

(This, somewhat colloquially, describes the map ^˘\widehat{\mathcal{M}}\rightarrow\breve{\mathcal{M}}.)
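The correspondence is simply polar coordinates in the (a,b)(a,b)-plane: away from s=0s=0 it is invertible, while the whole circle of directions θ\theta over s=0s=0 is blown down to a single point. A minimal computational sketch (Python; the function names are ours):

```python
import math

def blowup_to_breve(s, theta, r):
    """The map from the blown-up domain M-hat down to M-breve:
    polar coordinates a = s cos(theta), b = s sin(theta) in the (a, b)-plane."""
    return (s * math.cos(theta), s * math.sin(theta), r)

def breve_to_blowup(a, b, r):
    """Partial inverse, defined away from the blown-up fiber s = 0."""
    return (math.hypot(a, b), math.atan2(b, a) % (2 * math.pi), r)

# Away from s = 0 the two maps invert each other:
a, b, r = blowup_to_breve(2.0, 0.7, 1.5)
s, theta, _ = breve_to_blowup(a, b, r)
assert abs(s - 2.0) < 1e-12 and abs(theta - 0.7) < 1e-12

# Over s = 0 the whole circle of directions theta collapses to one point:
assert blowup_to_breve(0.0, 0.3, 1.5) == blowup_to_breve(0.0, 2.9, 1.5)
```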

Thus, we can define M^CKB(s,r,θ)=M˘CKB(scosθ,ssinθ,r)\widehat{\mathrm{M}}{}^{\mathrm{CKB}}(s,r,\theta)=\breve{\mathrm{M}}^{\mathrm{CKB}}(s\cos\theta,s\sin\theta,r) as long as ssinθ0s\sin\theta\neq 0.

Lemma 8.3.

(a) M^CKB\widehat{\mathrm{M}}^{\mathrm{CKB}} extends by the formula

M^CKB(s,r,θ)=(𝒜𝒜2+2+𝒢2,𝒜2+2+𝒢2)\widehat{\mathrm{M}}^{\mathrm{CKB}}(s,r,\theta)=\left(\frac{\mathcal{A}}{\sqrt{\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}}},\frac{\mathcal{B}}{\sqrt{\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}}}\right)

with

𝒜(cosθ)((ssinθ)2r2),\mathcal{A}\equiv(\cos\theta)\cdot\mathcal{E}((s\sin\theta)^{2}-r^{2}),

and

(sinθ)(((ssinθ)2r2)+r(s+r)((ssinθ)2r2))0,\mathcal{B}\equiv(\sin\theta)\cdot\underbrace{\left(\mathcal{E}((s\sin\theta)^{2}-r^{2})+r(s+r)\mathcal{F}((s\sin\theta)^{2}-r^{2})\right)}_{\mathcal{B}_{0}\equiv},

and

𝒢(sinθ)(s+r)𝒢0,\mathcal{G}\equiv(\sin\theta)\cdot\underbrace{(s+r)}_{\mathcal{G}_{0}\equiv},

smoothly to the domain ^\widehat{\mathcal{M}}, i. e. to the domain subject to the conditions

(89) 0s,0r,θmod2π.0\leq s,\qquad 0\leq r,\qquad\theta\in\mathbb{R}\operatorname{\,mod\,}2\pi.

(b) If sinθ=0\sin\theta=0, then M^CKB(s,r,θ)=(cosθ,0)\widehat{\mathrm{M}}^{\mathrm{CKB}}(s,r,\theta)=(\cos\theta,0) (which is (1,0)(1,0) or (1,0)(-1,0)).

If s=r=0s=r=0, then M^CKB(0,0,θ)=(cosθ,sinθ)\widehat{\mathrm{M}}^{\mathrm{CKB}}(0,0,\theta)=(\cos\theta,\sin\theta).

However, if sinθ0\sin\theta\neq 0 and s+r>0s+r>0, then M^CKB(s,r,θ)\widehat{\mathrm{M}}^{\mathrm{CKB}}(s,r,\theta) is in the open unit disk.

(c) M^HP\widehat{\mathrm{M}}^{\mathrm{HP}} extends by the formula

M^HP(s,r,θ)=(𝒜|𝒢|,|𝒢|)=(𝒜|𝒢|,(sgnsinθ)0𝒢0),\widehat{\mathrm{M}}^{\mathrm{HP}}(s,r,\theta)=\left(\frac{\mathcal{A}}{|\mathcal{G}|},\frac{\mathcal{B}}{|\mathcal{G}|}\right)=\left(\frac{\mathcal{A}}{|\mathcal{G}|},(\operatorname{sgn}\sin\theta)\cdot\frac{\mathcal{B}_{0}}{\mathcal{G}_{0}}\right),

except it is formally not defined for s+r=0s+r=0 or sinθ=0\sin\theta=0.

Proof.

(a) It is sufficient to prove that 𝒜2+2+𝒢2\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2} never vanishes. If s=r=0s=r=0, then 𝒜2+2+𝒢2=(0)2=3\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}=\mathcal{E}(0)^{2}=3. If s+r>0s+r>0, then vanishing would require sinθ=0\sin\theta=0; thus (cosθ)2=1(\cos\theta)^{2}=1; consequently 𝒜2+2+𝒢2=(r2)2(0)2=3\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}=\mathcal{E}(-r^{2})^{2}\geq\mathcal{E}(0)^{2}=3, a contradiction.

(b), (c): Direct computation. ∎

Note that the decomposition M2()=𝒮nnðpar𝒮\mathrm{M}_{2}(\mathbb{R})=\mathcal{S}^{\mathrm{nn}}\cup\eth^{\mathrm{par}}\mathcal{S}\cup\ldots is invariant under conjugation by rotation matrices; thus it descends to a decomposition ˘=XnnðparX\breve{\mathcal{M}}=X^{\mathrm{nn}}\cup\eth^{\mathrm{par}}X\cup\ldots. Via the blow-up map, this induces a decomposition ^=X^nnðparX^\widehat{\mathcal{M}}=\widehat{X}^{\mathrm{nn}}\cup\eth^{\mathrm{par}}\widehat{X}\cup\ldots. We recapitulate the situation for ^\widehat{\mathcal{M}}. We have

\bullet the subset X^nn\widehat{X}^{\mathrm{nn}} with 0<r0<r, 0<s0<s, sinθ0\sin\theta\neq 0, (ssinθ)2r2<π2(s\sin\theta)^{2}-r^{2}<\pi^{2} as the regular (or non-normal) interior;

\bullet the subset ðparX^\eth^{\mathrm{par}}\widehat{X} with 0<r0<r, 0<s0<s, sinθ=0\sin\theta=0, (ssinθ)2r2<π2(s\sin\theta)^{2}-r^{2}<\pi^{2} as the Schur-parabolic (or self-adjoint) pseudo-boundary;

\bullet the subset 0X^\partial^{0}\widehat{X} with r=0r=0, s=0s=0 as the zero boundary;

\bullet the subset hypX^\partial^{\mathrm{hyp}}\widehat{X} with s=0s=0, r>0r>0 as the Schur-hyperbolic (or conform-reflexion) boundary;

\bullet the subset ellX^\partial^{\mathrm{ell}}\widehat{X} with s>0s>0, r=0r=0, (ssinθ)2r2<π2(s\sin\theta)^{2}-r^{2}<\pi^{2} as the Schur-elliptic (or conform-rotational) boundary;

\bullet the subset dellX^\partial^{\mathrm{dell}}\widehat{X} with r=0r=0, (ssinθ)2r2=π2(s\sin\theta)^{2}-r^{2}=\pi^{2} as the Schur-elliptic degenerate boundary;

\bullet the subset dnnX^\partial^{\mathrm{dnn}}\widehat{X} with r>0r>0, (ssinθ)2r2=π2(s\sin\theta)^{2}-r^{2}=\pi^{2} as the non-normal degenerate boundary;

\bullet the subset X^ext\widehat{X}^{\mathrm{ext}} with (ssinθ)2r2>π2(s\sin\theta)^{2}-r^{2}>\pi^{2} as the exterior set.

Thus

^=X^nn˙ðparX^˙0X^˙hypX^˙ellX^˙dellX^˙dnnX^˙X^ext.\widehat{\mathcal{M}}=\widehat{X}^{\mathrm{nn}}\,\,\dot{\cup}\,\,\eth^{\mathrm{par}}\widehat{X}\,\,\dot{\cup}\,\,\partial^{0}\widehat{X}\,\,\dot{\cup}\,\,\partial^{\mathrm{hyp}}\widehat{X}\,\,\dot{\cup}\,\,\partial^{\mathrm{ell}}\widehat{X}\,\,\dot{\cup}\,\,\partial^{\mathrm{dell}}\widehat{X}\,\,\dot{\cup}\,\,\partial^{\mathrm{dnn}}\widehat{X}\,\,\dot{\cup}\,\,\widehat{X}^{\mathrm{ext}}.

Again, certain components can be decomposed naturally:

\bullet hypX^\partial^{\mathrm{hyp}}\widehat{X} can be decomposed into hyp1X^\partial^{\mathrm{hyp1}}\widehat{X} with sinθ=0\sin\theta=0, and into hypX^\partial^{\mathrm{hyp*}}\widehat{X} with sinθ0\sin\theta\neq 0;

\bullet ellX^\partial^{\mathrm{ell}}\widehat{X} can be decomposed into ell1X^\partial^{\mathrm{ell1}}\widehat{X} with sinθ=0\sin\theta=0, and into ellX^\partial^{\mathrm{ell*}}\widehat{X} with sinθ0\sin\theta\neq 0;

\bullet dellX^\partial^{\mathrm{dell}}\widehat{X} can be decomposed into dell0X^\partial^{\mathrm{dell0}}\widehat{X} with cosθ=0\cos\theta=0, and into dellX^\partial^{\mathrm{dell*}}\widehat{X} with cosθ0\cos\theta\neq 0;

\bullet dnnX^\partial^{\mathrm{dnn}}\widehat{X} can be decomposed into dnn0X^\partial^{\mathrm{dnn0}}\widehat{X} with cosθ=0\cos\theta=0, and into dnnX^\partial^{\mathrm{dnn*}}\widehat{X} with cosθ0\cos\theta\neq 0;

\bullet X^ext\widehat{X}^{\mathrm{ext}} can be decomposed into the Schur-elliptic part X^ellext\widehat{X}^{\mathrm{ellext}} with r=0r=0, and into the non-normal part X^nnext\widehat{X}^{\mathrm{nnext}} with r>0r>0.

(We could further discriminate X^ext\widehat{X}^{\mathrm{ext}} by whether (ssinθ)2r2{k2π2:k{0}}(s\sin\theta)^{2}-r^{2}\in\{k^{2}\pi^{2}\,:\,k\in\mathbb{N}\setminus\{0\}\} holds or not.)

Note that hypX^\partial^{\mathrm{hyp}}\widehat{X} is nontrivially blown up from hypX\partial^{\mathrm{hyp}}X, but otherwise ^\widehat{\mathcal{M}} and ˘\breve{\mathcal{M}} are quite similar. As long as we avoid (±1,0)CKB(\pm 1,0)^{\mathrm{CKB}}, we can pass from CKB to ACKB without trouble. In the context of Lemma 8.3, the critical case is when sinθ=0\sin\theta=0. Technically, however, this requires another (very simple) blow-up of the domain.

Let us consider the domain

~={(s,r,θ,σ)[0,+)×[0,+)×(([0,π]mod2π)×{+1}([π,2π]mod2π)×{1})}\widetilde{\mathcal{M}}=\{(s,r,\theta,\sigma)\in[0,+\infty)\times[0,+\infty)\times\left(([0,\pi]\operatorname{\,mod\,}2\pi)\times\{+1\}\cup([\pi,2\pi]\operatorname{\,mod\,}2\pi)\times\{-1\}\right)\}

where the canonical map ~^\widetilde{\mathcal{M}}\rightarrow\widehat{\mathcal{M}} simply forgets σ\sigma. Technically, the blow-up simply duplicates (cuts along) hyp1X^ðparX^ell1X^\partial^{\mathrm{hyp1}}\widehat{X}\cup\eth^{\mathrm{par}}\widehat{X}\cup\partial^{\mathrm{ell1}}\widehat{X}. This separates the connected ^\widehat{\mathcal{M}} into the two components of ~\widetilde{\mathcal{M}}.

Now, M~ACKB(s,r,θ,σ)\widetilde{\mathrm{M}}^{\mathrm{ACKB}}(s,r,\theta,\sigma) is well-defined as long as sinθ0\sin\theta\neq 0. The analogous statement to Lemma 8.3 is

Lemma 8.4.

(a) M~ACKB\widetilde{\mathrm{M}}^{\mathrm{ACKB}} extends by the formula

M~ACKB(s,r,θ,σ)=(arcsin𝒜𝒜2+2+𝒢2,σ0(0)2+(𝒢0)2)\widetilde{\mathrm{M}}^{\mathrm{ACKB}}(s,r,\theta,\sigma)=\left(\arcsin\frac{\mathcal{A}}{\sqrt{\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}}},\sigma\cdot\frac{\mathcal{B}_{0}}{\sqrt{(\mathcal{B}_{0})^{2}+(\mathcal{G}_{0})^{2}}}\right)

(b) If s=r=0s=r=0, then M~ACKB(0,0,θ,σ)=(π2θ,σ)\widetilde{\mathrm{M}}^{\mathrm{ACKB}}(0,0,\theta,\sigma)=(\frac{\pi}{2}-\theta,\sigma).

However, if s+r>0s+r>0, then M~ACKB(s,r,θ,σ)\widetilde{\mathrm{M}}^{\mathrm{ACKB}}(s,r,\theta,\sigma) is in [π2,π2]×(1,1)[-\frac{\pi}{2},\frac{\pi}{2}]\times(-1,1).

(c) M~AHP\widetilde{\mathrm{M}}^{\mathrm{AHP}} extends by the formula

M~AHP(s,r,θ,σ)=(arctan𝒜|𝒢|,σ0𝒢0),\widetilde{\mathrm{M}}^{\mathrm{AHP}}(s,r,\theta,\sigma)=\left(\arctan\frac{\mathcal{A}}{|\mathcal{G}|},\sigma\cdot\frac{\mathcal{B}_{0}}{\mathcal{G}_{0}}\right),

with values in [π2,π2]×[-\frac{\pi}{2},\frac{\pi}{2}]\times\mathbb{R}, except it is formally not defined for s+r=0s+r=0.

Proof.

Direct computation. ∎

Let us use the notation =s+r\not{N}=s+r (which refers to the norm). Ultimately, we will be interested in the case when >0\not{N}>0 and \not{N} is fixed.

For the sake of visualization, in Figure 8, we show a cross-section of ~|σ=+1\widetilde{\mathcal{M}}|_{\sigma=+1} for =5π/3\not{N}=5\pi/3.

Refer to caption
Fig. 8. Cross-section of ~|σ=+1\widetilde{\mathcal{M}}|_{\sigma=+1} for =5π/3\not{N}=5\pi/3

In general, the situation is similar for >π\not{N}>\pi. For =π\not{N}=\pi, the degenerate boundary shrinks to dell0X~\partial^{\mathrm{dell0}}\widetilde{X} only; and there is no degenerate boundary for <π\not{N}<\pi. Now, ~\widetilde{\mathcal{M}} contains two such bands, and ^\widehat{\mathcal{M}} is two such bands glued together (all meant for a fixed >0\not{N}>0). In this setting 0~\partial^{0}\widetilde{\mathcal{M}} contains two segments, and 0^\partial^{0}\widehat{\mathcal{M}} is a circle.

Another sort of blow-up which affects the variables s,rs,r is given by passing to the coordinates ,t\not{N},t such that t[0,1]t\in[0,1], and

(90) s=t,andr=(1t).s=\not{N}t,\qquad\text{and}\qquad r=\not{N}(1-t).

In essence, as a blow-up, this affects only the case s=r=0s=r=0 of the zero-boundary (where it is slightly advantageous). As such, its use is marginal for us. However, (90) is retained as a practical change of coordinates, which is particularly useful if \not{N} is kept fixed.

Lemma 8.5.

In the context of Lemma 8.3 and Lemma 8.4, the extended actions on some boundary pieces (restricted to a fixed >0\not{N}>0, with =s+r\not{N}=s+r) are given as follows:

(a) If s=0s=0 (so r=r=\not{N}), i. e. we consider the Schur-hyperbolic boundary hypX^\partial^{\mathrm{hyp}}\widehat{X}, then

(a^,b^,c˘,d˘)=(cosθ,(coth)sinθ,1,sinθ).(\hat{a},\hat{b},\breve{c},\breve{d})=\left(\cos\theta,(\coth\not{N})\not{N}\sin\theta,1,-\not{N}\sin\theta\right).
c˘2+d˘2=1+(sinθ)2.\breve{c}^{2}+\breve{d}^{2}=1+\left(\not{N}\sin\theta\right)^{2}.
a^2+b^2c˘2d˘2=(sinθ)2((sinh)21).\hat{a}^{2}+\hat{b}^{2}-\breve{c}^{2}-\breve{d}^{2}=(\sin\theta)^{2}\left(\left(\frac{\not{N}}{\sinh\not{N}}\right)^{2}-1\right).

Here θ\theta is arbitrary.

The normalized expressions extend as

M^CKB(0,,θ)=(cosθ1+(sinθ)2,(coth)sinθ1+(sinθ)2)\widehat{\mathrm{M}}^{\mathrm{CKB}}(0,\not{N},\theta)=\left(\frac{\cos\theta}{\sqrt{1+\left(\not{N}\sin\theta\right)^{2}}},\frac{(\coth\not{N})\not{N}\sin\theta}{\sqrt{1+\left(\not{N}\sin\theta\right)^{2}}}\right)

and (if sinθ0\sin\theta\neq 0)

M^HP(0,,θ)=(|cotθ|1(sinh)2sgncosθ,(coth)1(sinh)2sgnsinθ).\widehat{\mathrm{M}}^{\mathrm{HP}}(0,\not{N},\theta)=\left(\frac{|\cot\theta|}{\sqrt{1-\left(\dfrac{\not{N}}{\sinh\not{N}}\right)^{2}}}\operatorname{sgn}\cos\theta,\frac{(\coth\not{N})\not{N}}{\sqrt{1-\left(\dfrac{\not{N}}{\sinh\not{N}}\right)^{2}}}\operatorname{sgn}\sin\theta\right).

These curves (in θ\theta) are two-sided hh-hypercycles (hh-distance lines).

(b) If r=0r=0 (so s=s=\not{N}), i. e. we consider the purely elliptic boundary ellX^\partial^{\mathrm{ell}}\widehat{X}, then

(a^,b^,c˘,d˘)=(cosθ,sinθ,(sinθ)cot(sinθ),sinθ),(\hat{a},\hat{b},\breve{c},\breve{d})=\left(\cos\theta,\sin\theta,(\not{N}\sin\theta)\cot(\not{N}\sin\theta),-\not{N}\sin\theta\right),
c˘2+d˘2=(sinθsin(sinθ))2,\breve{c}^{2}+\breve{d}^{2}=\left(\frac{\not{N}\sin\theta}{\sin\left(\not{N}\sin\theta\right)}\right)^{2},
a^2+b^2c˘2d˘2=1(sinθsin(sinθ))2.\hat{a}^{2}+\hat{b}^{2}-\breve{c}^{2}-\breve{d}^{2}=1-\left(\frac{\not{N}\sin\theta}{\sin\left(\not{N}\sin\theta\right)}\right)^{2}.

Here the domain restriction can be expressed as

|sinθ|<π.|\sin\theta|<\frac{\pi}{\not{N}}.

The normalized expressions extend as

M^CKB(,0,θ)=(sin(sinθ)cotθ,sin(sinθ)),\widehat{\mathrm{M}}^{\mathrm{CKB}}(\not{N},0,\theta)=\left(\frac{\sin\left(\not{N}\sin\theta\right)}{\not{N}}\cot\theta,\frac{\sin\left(\not{N}\sin\theta\right)}{\not{N}}\right),

and (if sinθ0\sin\theta\neq 0)

M^HP(,0,θ)=(a^HP,b^HP)=(cosθ(sinθsin(sinθ))21,sinθ(sinθsin(sinθ))21).\widehat{\mathrm{M}}^{\mathrm{HP}}(\not{N},0,\theta)=\left(\hat{a}^{\mathrm{HP}},\hat{b}^{\mathrm{HP}}\right)=\left(\frac{\cos\theta}{\sqrt{\left(\dfrac{\not{N}\sin\theta}{\sin\left(\not{N}\sin\theta\right)}\right)^{2}-1}},\frac{\sin\theta}{\sqrt{\left(\dfrac{\not{N}\sin\theta}{\sin\left(\not{N}\sin\theta\right)}\right)^{2}-1}}\right).

These curves (in θ\theta) in the CKB model are radially contracted images of (possibly two open pieces of) the unit circle. (They give quasi Cassini ovals.)

(c) If (ssinθ)2(s)2=π2(s\sin\theta)^{2}-(\not{N}-s)^{2}=\pi^{2}, i. e. we consider dellX^dnnX^\partial^{\mathrm{dell}}\widehat{X}\cup\partial^{\mathrm{dnn}}\widehat{X} (in particular π\not{N}\geq\pi), then

|sinθ|=π2+(s)2s.|\sin\theta|=\frac{\sqrt{\pi^{2}+(\not{N}-s)^{2}}}{s}.

The domain restrictions are expressed as

|sinθ|π,ors2+π22,orr2π22.|\sin\theta|\geq\frac{\pi}{\not{N}},\qquad\text{or}\qquad s\geq\frac{\not{N}^{2}+\pi^{2}}{2\not{N}},\qquad\text{or}\qquad r\leq\frac{\not{N}^{2}-\pi^{2}}{2\not{N}}.

The extended actions are given by

M^CKB(s,r,θ)=(0,sπ1+(sπ)2sgnsinθ)=(0,rπ1+(rπ)2sgnsinθ),\widehat{\mathrm{M}}^{\mathrm{CKB}}(s,r,\theta)=\left(0,\frac{\frac{\not{N}-s}{\pi}}{\sqrt{1+\left(\frac{\not{N}-s}{\pi}\right)^{2}}}\operatorname{sgn}\sin\theta\right)=\left(0,\frac{\frac{r}{\pi}}{\sqrt{1+\left(\frac{r}{\pi}\right)^{2}}}\operatorname{sgn}\sin\theta\right),

and

M^HP(s,r,θ)=(0,sπsgnsinθ)=(0,rπsgnsinθ).\widehat{\mathrm{M}}^{\mathrm{HP}}(s,r,\theta)=\left(0,\frac{\not{N}-s}{\pi}\operatorname{sgn}\sin\theta\right)=\left(0,\frac{r}{\pi}\operatorname{sgn}\sin\theta\right).

In particular, the image of dellX^dnnX^\partial^{\mathrm{dell}}\widehat{X}\cup\partial^{\mathrm{dnn}}\widehat{X} is

the segment connecting (0,2π22+π2) and (0,2π22+π2) in the CKB model,\text{the segment connecting $\left(0,\frac{\not{N}^{2}-\pi^{2}}{\not{N}^{2}+\pi^{2}}\right)$ and $\left(0,-\frac{\not{N}^{2}-\pi^{2}}{\not{N}^{2}+\pi^{2}}\right)$ in the CKB model},

and

the segment connecting (0,2π22π) and (0,2π22π) in the HP model.\text{the segment connecting $\left(0,\frac{\not{N}^{2}-\pi^{2}}{2\not{N}\pi}\right)$ and $\left(0,-\frac{\not{N}^{2}-\pi^{2}}{2\not{N}\pi}\right)$ in the HP model}.

(For =π\not{N}=\pi, this is just the origin, which comes from dell0X^\partial^{\mathrm{dell0}}\widehat{X}. For >π\not{N}>\pi, the endpoints come from dnn0X^\partial^{\mathrm{dnn0}}\widehat{X}, the origin comes from dellX^\partial^{\mathrm{dell*}}\widehat{X}, the intermediate points come from dnnX^\partial^{\mathrm{dnn*}}\widehat{X}.)

(d) If sinθ=0\sin\theta=0, i. e. we consider hyp1X^ðparX^ell1X^\partial^{\mathrm{hyp1}}\widehat{X}\cup\eth^{\mathrm{par}}\widehat{X}\cup\partial^{\mathrm{ell1}}\widehat{X}, or rather hyp1X~ðparX~ell1X~\partial^{\mathrm{hyp1}}\widetilde{X}\cup\eth^{\mathrm{par}}\widetilde{X}\cup\partial^{\mathrm{ell1}}\widetilde{X}, then

M~ACKB(r,r,θ,σ)=(π2sgncosθ,1(r2)+r(r2)1+(1(r2)+r(r2))2σ),\widetilde{\mathrm{M}}^{\mathrm{ACKB}}(\not{N}-r,r,\theta,\sigma)=\left(\frac{\pi}{2}\cdot\operatorname{sgn}\cos\theta,\frac{\frac{1}{\not{N}}\mathcal{E}(-r^{2})+r\mathcal{F}(-r^{2})}{\sqrt{1+\left(\frac{1}{\not{N}}\mathcal{E}(-r^{2})+r\mathcal{F}(-r^{2})\right)^{2}}}\cdot\sigma\right),

and

M~AHP(r,r,θ,σ)=(π2sgncosθ,(1(r2)+r(r2))σ).\widetilde{\mathrm{M}}^{\mathrm{AHP}}(\not{N}-r,r,\theta,\sigma)=\left(\frac{\pi}{2}\cdot\operatorname{sgn}\cos\theta,\left(\frac{1}{\not{N}}\mathcal{E}(-r^{2})+r\mathcal{F}(-r^{2})\right)\cdot\sigma\right).

The absolute values of the second coordinates are strictly monotone increasing in rr,

with range [33+2,coth1+2] in the ACKB model,\text{with range $\left[\frac{\sqrt{3}}{\sqrt{3+\not{N}^{2}}},\frac{\not{N}\coth\not{N}}{\sqrt{1+\not{N}^{2}}}\right]$ in the ACKB model},

and

with range [3,(coth)1(sinh)2] in the AHP model.\text{with range $\left[\frac{\sqrt{3}}{\not{N}},\frac{(\coth\not{N})\not{N}}{\sqrt{1-\left(\dfrac{\not{N}}{\sinh\not{N}}\right)^{2}}}\right]$ in the AHP model}.

(The lower limits come from ell1X~\partial^{\mathrm{ell1}}\widetilde{X}, the upper limits come from hyp1X~\partial^{\mathrm{hyp1}}\widetilde{X}, the intermediate values come from ðparX~\eth^{\mathrm{par}}\widetilde{X}.)

Proof.

Direct computation. ∎
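The closed forms of parts (a) and (b) can be spot-checked numerically. The sketch below (Python) takes the HP normalization from (87) and, as our reading consistent with the displayed formulas, the CKB normalization (a^,b^)/c˘2+d˘2(\hat{a},\hat{b})/\sqrt{\breve{c}^{2}+\breve{d}^{2}}; the sample values =2\not{N}=2, θ=0.7\theta=0.7 (resp. θ=0.5\theta=0.5) are arbitrary.

```python
import math

def hp(ah, bh, cb, db):
    """HP coordinates of a quadruple (a-hat, b-hat, c-breve, d-breve) via (87)."""
    den = math.sqrt(-ah**2 - bh**2 + cb**2 + db**2)
    return ah / den, bh / den

def ckb(ah, bh, cb, db):
    """CKB coordinates; normalization by sqrt(c^2 + d^2) is our assumption,
    consistent with the closed forms of Lemma 8.5."""
    n = math.hypot(cb, db)
    return ah / n, bh / n

N, th = 2.0, 0.7

# (a) Schur-hyperbolic boundary: s = 0, r = N.
ah, bh = math.cos(th), (N / math.tanh(N)) * math.sin(th)
cb, db = 1.0, -N * math.sin(th)
assert abs(cb**2 + db**2 - (1 + (N * math.sin(th))**2)) < 1e-9
assert abs(ah**2 + bh**2 - cb**2 - db**2
           - math.sin(th)**2 * ((N / math.sinh(N))**2 - 1)) < 1e-9
q = math.sqrt(1 - (N / math.sinh(N))**2)
aHP, bHP = hp(ah, bh, cb, db)
assert abs(aHP - (1 / math.tan(th)) / q) < 1e-9   # sgn cos, sgn sin are +1 here
assert abs(bHP - (N / math.tanh(N)) / q) < 1e-9
aC, bC = ckb(ah, bh, cb, db)
assert abs(aC - math.cos(th) / math.sqrt(1 + (N * math.sin(th))**2)) < 1e-9

# (b) Schur-elliptic boundary: r = 0, s = N, with 0 < N sin(theta) < pi.
th = 0.5
u = N * math.sin(th)
ah, bh = math.cos(th), math.sin(th)
cb, db = u / math.tan(u), -u
assert abs(cb**2 + db**2 - (u / math.sin(u))**2) < 1e-9
aC, bC = ckb(ah, bh, cb, db)
assert abs(aC - (math.sin(u) / N) / math.tan(th)) < 1e-9
assert abs(bC - math.sin(u) / N) < 1e-9
aHP2, bHP2 = hp(ah, bh, cb, db)
assert abs(aHP2 - math.cos(th) / math.sqrt((u / math.sin(u))**2 - 1)) < 1e-9
```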

For the sake of visualization, in Figure 8, we show pictures of the range of M^CKB(s,r,θ)\widehat{\mathrm{M}}^{\mathrm{CKB}}(s,r,\theta) restricted to X¯X^nnðparX^hypX^ellX^dellX^dnnX^\overline{X}\equiv\widehat{X}^{\mathrm{nn}}\cup\eth^{\mathrm{par}}\widehat{X}\cup\partial^{\mathrm{hyp}}\widehat{X}\cup\partial^{\mathrm{ell}}\widehat{X}\cup\partial^{\mathrm{dell}}\widehat{X}\cup\partial^{\mathrm{dnn}}\widehat{X} and to a fixed >0\not{N}>0, with parameter lines induced by ss and θ\theta, for the cases =π2\not{N}=\frac{\pi}{2}, =5π6\not{N}=\frac{5\pi}{6}, =π\not{N}=\pi, =5π3\not{N}=\frac{5\pi}{3}; and the image of the relevant boundary pieces in the cases =5π3\not{N}=\frac{5\pi}{3}, =5π\not{N}=5\pi. Furthermore, in Figure 8, we also include a picture of the corresponding range of M~ACKB(s,r,θ,σ)\widetilde{\mathrm{M}}^{\mathrm{ACKB}}(s,r,\theta,\sigma) in the case =5π/6\not{N}=5\pi/6.

Refer to caption
Fig. 8(a) =π/2\not{N}=\pi/2
Refer to caption
8(b) =5π/6\not{N}=5\pi/6
Refer to caption
8(c) =π\not{N}=\pi
Refer to caption
8(d) =5π/3\not{N}=5\pi/3
Refer to caption
8(e) =5π/3\not{N}=5\pi/3 (only boundary)
Refer to caption
8(f) =5π\not{N}=5\pi (only boundary)
Refer to caption
Fig. 8. =5π/6\not{N}=5\pi/6 in ACKB

We see the following: Through M^CKB\widehat{\mathrm{M}}^{\mathrm{CKB}}, the various components of X¯\overline{X} map as follows:

\bullet The Schur-hyperbolic boundary hypX^\partial^{\mathrm{hyp}}\widehat{X} (i. e. s=0s=0, t=0t=0) maps injectively to the outer rim.

\bullet The Schur-elliptic boundary ellX^\partial^{\mathrm{ell}}\widehat{X} (i. e. r=0r=0, t=1t=1) maps injectively to the inner rim, except for the origin.

\bullet The closure of the pseudoboundary hyp1X^ðparX^ell1X^\partial^{\mathrm{hyp1}}\widehat{X}\cup\eth^{\mathrm{par}}\widehat{X}\cup\partial^{\mathrm{ell1}}\widehat{X} (i. e. sinθ=0\sin\theta=0) gives two pinches on the sides. (This is improved by M~ACKB\widetilde{\mathrm{M}}^{\mathrm{ACKB}}, which is injective on hyp1X~ðparX~ell1X~\partial^{\mathrm{hyp1}}\widetilde{X}\cup\eth^{\mathrm{par}}\widetilde{X}\cup\partial^{\mathrm{ell1}}\widetilde{X}.)

\bullet The degenerate boundary maps to the inner slit. More precisely:

\quad\circ For <π\not{N}<\pi, there is no degenerate boundary.

\quad\circ For =π\not{N}=\pi, the degenerate boundary is dell0X^\partial^{\mathrm{dell0}}\widehat{X}, it maps to the origin.

\quad\circ For >π\not{N}>\pi, the degenerate boundary is dellX^dnn0X^dnnX^\partial^{\mathrm{dell}}\widehat{X}\cup\partial^{\mathrm{dnn0}}\widehat{X}\cup\partial^{\mathrm{dnn*}}\widehat{X} (restricted to norm \not{N}). Then dnn0X^\partial^{\mathrm{dnn0}}\widehat{X} maps to the upper and lower tips of the central slit, dellX^\partial^{\mathrm{dell*}}\widehat{X} maps to the origin, and dnnX^\partial^{\mathrm{dnn*}}\widehat{X} maps onto the parts joining them.
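The two descriptions of the slit are consistent: substituting the extremal value r=(2π2)/(2)r=(\not{N}^{2}-\pi^{2})/(2\not{N}) of Lemma 8.5(c) into the extended actions recovers the stated endpoints. A quick numeric spot-check (Python; the sample =5π/3\not{N}=5\pi/3 matches the figures):

```python
import math

# Sample N > pi (the value used in the figures).
N = 5 * math.pi / 3
r_max = (N**2 - math.pi**2) / (2 * N)   # extremal r on the degenerate boundary
x = r_max / math.pi

# CKB endpoint: x / sqrt(1 + x^2) = (N^2 - pi^2) / (N^2 + pi^2)
assert abs(x / math.sqrt(1 + x**2) - (N**2 - math.pi**2) / (N**2 + math.pi**2)) < 1e-12

# HP endpoint: x itself equals (N^2 - pi^2) / (2 N pi)
assert abs(x - (N**2 - math.pi**2) / (2 * N * math.pi)) < 1e-12

# For N = pi the admissible r shrinks to 0, so the slit degenerates to the origin.
assert (math.pi**2 - math.pi**2) / (2 * math.pi) == 0.0
```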

\bullet The red lines: θ\theta varies, ss (or rr) is fixed. The blue lines: ss (or rr) varies, θ\theta is fixed. It is quite suggestive that X^nn\widehat{X}^{\mathrm{nn}} maps injectively into two simply connected domains, but this requires some justification, which is as follows.

In fact, the proof of the next lemma will introduce some important computational techniques:

Lemma 8.6.

Suppose that >0\not{N}>0.

(a) Let 𝒮nnN\mathcal{S}^{\mathrm{nn}}_{\not{N}} be the subset of 𝒮nn\mathcal{S}^{\mathrm{nn}} which contains the elements of norm \not{N}. Then the operations MRCKB\mathrm{MR}^{\mathrm{CKB}} and MLCKB\mathrm{ML}^{\mathrm{CKB}} (cf. (85)) are injective on 𝒮nnN\mathcal{S}^{\mathrm{nn}}_{\not{N}}.

(b) Let XnnNX^{\mathrm{nn}}_{\not{N}} be the subset of XnnX^{\mathrm{nn}} which is the restriction to =r+s\not{N}=r+s. Then the operation MCKB\mathrm{M}^{\mathrm{CKB}} (cf. (86)) is injective on XnnNX^{\mathrm{nn}}_{\not{N}}.

In fact, its image is the union of two simply connected domains bounded by the images of the boundary pieces of XnnNX^{\mathrm{nn}}_{\not{N}}, lying in the upper and lower half-planes, respectively.

Proof.

Due to the circular fibration property, it is sufficient to prove (b). By the monotonicity properties of the boundary pieces (cf. Lemma 8.5) one can characterize the simply connected domains. Then one can check that M^CKB\widehat{\mathrm{M}}^{\mathrm{CKB}} (or M^HP\widehat{\mathrm{M}}^{\mathrm{HP}}) starts to map into the two simply connected domains near the boundary pieces; this is tedious but doable. (It requires special arguments only in the case of dell0X^dnn0X^\partial^{\mathrm{dell0}}\widehat{X}\cup\partial^{\mathrm{dnn0}}\widehat{X}; see comments later.) Then, for topological reasons, it is sufficient to demonstrate that the Jacobian of M^CKB\widehat{\mathrm{M}}^{\mathrm{CKB}} (or M^HP\widehat{\mathrm{M}}^{\mathrm{HP}}) on XnnNX^{\mathrm{nn}}_{\not{N}} never vanishes.

For this, it is sufficient to consider the HP\mathrm{HP} model. We have to compute the Jacobian of

(a^HP,b^HP)=(ab1(b2r2),1(b2r2)+r(b2r2)(b2r2))(\hat{a}^{\mathrm{HP}},\hat{b}^{\mathrm{HP}})=\left(\frac{a}{b}\cdot\frac{1}{\not{N}\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}},\frac{1}{\not{N}\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}}+r\frac{\operatorname{\not{\mathbf{C}}}(b^{2}-r^{2})}{\sqrt{\operatorname{\not{\mathbf{D}}}(b^{2}-r^{2})}}\right)

with

a=tcosθ,b=tsinθ,r=ta=t\not{N}\cos\theta,\qquad b=t\not{N}\sin\theta,\qquad r=\not{N}-t\not{N}

with respect to t,θt,\theta (\not{N} is fixed) subject to 0<t<10<t<1, θ(0,π)(π,2π)\theta\in(0,\pi)\cup(\pi,2\pi). We find

(91) (a^HP,b^HP)(t,θ)=\displaystyle\frac{\partial(\hat{a}^{\mathrm{HP}},\hat{b}^{\mathrm{HP}})}{\partial(t,\theta)}=\, 1(sinθ)2\displaystyle-\frac{1}{\left(\sin\theta\right)^{2}}\,\frac{\operatorname{\not{\mathbf{C}}}}{\operatorname{\not{\mathbf{D}}}}
1t(cosθ)2(sinθ)22\displaystyle-\,{\frac{1-t\left(\cos\theta\right)^{2}}{(\sin\theta)^{2}}}\,\frac{\operatorname{\not{\mathbf{G}}}}{\operatorname{\not{\mathbf{D}}}^{2}}
(1t(cosθ)2)(sinθ)2(1t)22\displaystyle-\,{\frac{\left(1-t\left(\cos\theta\right)^{2}\right)}{(\sin\theta)^{2}}}\left(1-t\right)\not{N}^{2}\frac{\operatorname{\not{\mathbf{L}}}}{\operatorname{\not{\mathbf{D}}}^{2}}
2t2(cosθ)22\displaystyle-\,\not{N}^{2}{t}^{2}\left(\cos\theta\right)^{2}\frac{\operatorname{\not{\mathbf{C}}}\operatorname{\not{\mathbf{G}}}}{\operatorname{\not{\mathbf{D}}}^{2}}

where the arguments of ,,,𝐋\operatorname{\not{\mathbf{C}}},\operatorname{\not{\mathbf{D}}},\operatorname{\not{\mathbf{G}}},\operatorname{\not{\mathbf{L}}} should be b2r2=(tsinθ)2(1t)22b^{2}-r^{2}=(t\not{N}\sin\theta)^{2}-(1-t)^{2}\not{N}^{2}. Each summand is non-positive, and in fact strictly negative, with the possible exception of the last one. Thus the Jacobian is strictly negative.

Moreover, this method with the Jacobian applies around hypX\partial^{\mathrm{hyp*}}X (i. e. t=0t=0, sinθ0\sin\theta\neq 0) and ellX\partial^{\mathrm{ell*}}X (i. e. t=1t=1, sinθ0\sin\theta\neq 0), as (91rhs/1) is always negative. Moreover, it also applies to dellX^\partial^{\mathrm{dell*}}\widehat{X} and dnnX^\partial^{\mathrm{dnn*}}\widehat{X}, where (91rhs/1)–(91rhs/3) extend to zero smoothly but (91rhs/4) gives a nonzero contribution.

In fact, if we pass to (a^AHP,b^AHP)(\hat{a}^{\mathrm{AHP}},\hat{b}^{\mathrm{AHP}}), then the map becomes regular even near the asymptotical points: it yields

(92) (a^AHP,b^AHP)(t,θ)=112(cosθ)2+(sinθ)2(+(1t(cosθ)2)2+(1t(cosθ)2)(1t)22+2t2(cosθ)2(sinθ)22).\frac{\partial(\hat{a}^{\mathrm{AHP}},\hat{b}^{\mathrm{AHP}})}{\partial(t,\theta)}=-\frac{1}{\frac{1}{\not{N}^{2}\operatorname{\not{\mathbf{D}}}}(\cos\theta)^{2}+(\sin\theta)^{2}}\Biggl{(}\,\,\frac{\operatorname{\not{\mathbf{C}}}}{\operatorname{\not{\mathbf{D}}}}+\,(1-t\left(\cos\theta\right)^{2})\,\frac{\operatorname{\not{\mathbf{G}}}}{\operatorname{\not{\mathbf{D}}}^{2}}\\ +\,{\left(1-t\left(\cos\theta\right)^{2}\right)}\left(1-t\right)\not{N}^{2}\frac{\operatorname{\not{\mathbf{L}}}}{\operatorname{\not{\mathbf{D}}}^{2}}+\,\not{N}^{2}{t}^{2}\left(\cos\theta\right)^{2}\left(\sin\theta\right)^{2}\frac{\operatorname{\not{\mathbf{C}}}\operatorname{\not{\mathbf{G}}}}{\operatorname{\not{\mathbf{D}}}^{2}}\Biggr{)}.

This extends smoothly to the case sinθ=0\sin\theta=0, i. e. to ðparX^hyp1X^ell1X^\eth^{\mathrm{par}}\widehat{X}\cup\partial^{\mathrm{hyp1}}\widehat{X}\cup\partial^{\mathrm{ell1}}\widehat{X}. It is easy to see that, on this set, the Jacobian is strictly negative again. (Note that in this case b2r2=r20b^{2}-r^{2}=-r^{2}\leq 0, thus \frac{\operatorname{\not{\mathbf{C}}}}{\operatorname{\not{\mathbf{D}}}} and 1\frac{1}{\operatorname{\not{\mathbf{D}}}} are positive.)

This still leaves the uneasy case of dell0X^dnn0X^\partial^{\mathrm{dell0}}\widehat{X}\cup\partial^{\mathrm{dnn0}}\widehat{X} regarding boundary behaviour. However, even those cases can be handled, as failure of the indicated behaviour at those points (one point for each simply connected domain, for a fixed π\not{N}\geq\pi) would cause irregular behaviour at other points. ∎

The following discussion will be important for us. For A𝒮nnA\in\mathcal{S}^{\mathrm{nn}}, we let

NR(A):=𝑿+(MR(A))detMR(A)b^a^2+b^2c˘2d˘2(a^Id2+c˘2+d˘2a^2b^I~c˘J~d˘K~).\mathrm{NR}(A):=\frac{\boldsymbol{X}^{+}(\mathrm{MR}(A))}{\det\mathrm{MR}(A)}\equiv\frac{\hat{b}}{\hat{a}^{2}+\hat{b}^{2}-\breve{c}^{2}-\breve{d}^{2}}\left(\hat{a}\operatorname{Id}_{2}+\frac{\breve{c}^{2}+\breve{d}^{2}-\hat{a}^{2}}{\hat{b}}\tilde{I}-\breve{c}\tilde{J}-\breve{d}\tilde{K}\right).

One can immediately see that

MRA(NR(A))=12tr(MR(A)NR(A))=0,\mathrm{MR}_{A}(\mathrm{NR}(A))=\frac{1}{2}\operatorname{tr}\left(\mathrm{MR}(A)^{*}\mathrm{NR}(A)\right)=0,

that is

ddtlog((expA)(exp(tNR(A))))2|t=0=0.\left.\frac{\mathrm{d}}{\mathrm{d}t}\|\log((\exp A)(\exp(t\mathrm{NR}(A))))\|_{2}\right|_{t=0}=0.
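The orthogonality relation above uses only the explicit form of NR(A)\mathrm{NR}(A). It can be checked with any concrete realization of the basis Id2,I~,J~,K~\operatorname{Id}_{2},\tilde{I},\tilde{J},\tilde{K}; the matrices below (rotation generator and symmetric traceless units) are one such choice, which is our assumption:

```python
# One concrete choice of the basis Id, I~, J~, K~ of M_2(R) (an assumption:
# I~ is the rotation generator, J~ and K~ are the symmetric traceless units).
Id = ((1.0, 0.0), (0.0, 1.0))
It = ((0.0, -1.0), (1.0, 0.0))
Jt = ((1.0, 0.0), (0.0, -1.0))
Kt = ((0.0, 1.0), (1.0, 0.0))

def comb(x1, x2, x3, x4):
    """The matrix x1*Id + x2*I~ + x3*J~ + x4*K~."""
    return tuple(tuple(x1 * Id[i][j] + x2 * It[i][j] + x3 * Jt[i][j] + x4 * Kt[i][j]
                       for j in range(2)) for i in range(2))

def half_tr_tT(X, Y):
    """(1/2) tr(X^T Y), the real inner product used in the text."""
    return 0.5 * sum(X[i][j] * Y[i][j] for i in range(2) for j in range(2))

# A sample non-normal quadruple (a-hat, b-hat, c-breve, d-breve):
ah, bh, cb, db = 0.8, 1.1, 0.5, 0.3
MR = comb(ah, bh, cb, db)
k = bh / (ah**2 + bh**2 - cb**2 - db**2)
NR = comb(k * ah, k * (cb**2 + db**2 - ah**2) / bh, -k * cb, -k * db)

# The orthogonality (1/2) tr(MR^T NR) = 0:
assert abs(half_tr_tT(MR, NR)) < 1e-12
```

The cancellation is visible coefficient-wise: the basis is orthonormal for this inner product, so the pairing reduces to a^2+(c˘2+d˘2a^2)c˘2d˘2=0\hat{a}^{2}+(\breve{c}^{2}+\breve{d}^{2}-\hat{a}^{2})-\breve{c}^{2}-\breve{d}^{2}=0.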

On the other hand,

Lemma 8.7.
d2dt2log((expA)(exp(tNR(A))))2|t=0<1<0.\left.\frac{\mathrm{d}^{2}}{\mathrm{d}t^{2}}\|\log((\exp A)(\exp(t\mathrm{NR}(A))))\|_{2}\right|_{t=0}<-\frac{1}{\not{N}}<0.
Proof.

By direct computation one can check that

LHS=\displaystyle\mathrm{LHS}= 1\displaystyle-\frac{1}{\not{N}}
1(1t(cosθ)2)2\displaystyle-\frac{1}{\not{N}}(1-t(\cos\theta)^{2})\frac{\operatorname{\not{\mathbf{G}}}}{\operatorname{\not{\mathbf{D}}}^{2}}
2(1t)(1t(cosθ)2)2\displaystyle-\not{N}2(1-t)(1-t(\cos\theta)^{2})\frac{\operatorname{\not{\mathbf{G}}}\operatorname{\not{\mathbf{C}}}}{\operatorname{\not{\mathbf{D}}}^{2}}
2t(sinθ)2\displaystyle-\not{N}2t(\sin\theta)^{2}\frac{\operatorname{\not{\mathbf{G}}}}{\operatorname{\not{\mathbf{D}}}}
(1t)2\displaystyle-\not{N}(1-t)\frac{\operatorname{\not{\mathbf{C}}}^{2}}{\operatorname{\not{\mathbf{D}}}}
3(1t)2(1t(cosθ)2)2\displaystyle-\not{N}^{3}(1-t)^{2}(1-t(\cos\theta)^{2})\frac{\operatorname{\not{\mathbf{C}}}\operatorname{\not{\mathbf{L}}}}{\operatorname{\not{\mathbf{D}}}^{2}}
3t(1t)(sinθ)2.\displaystyle-\not{N}^{3}t(1-t)(\sin\theta)^{2}\frac{\operatorname{\not{\mathbf{L}}}}{\operatorname{\not{\mathbf{D}}}}.

(The arguments of ,,,𝐋\operatorname{\not{\mathbf{C}}},\operatorname{\not{\mathbf{D}}},\operatorname{\not{\mathbf{G}}},\operatorname{\not{\mathbf{L}}} should be b2r22t2(sinθ)22(1t)2b^{2}-r^{2}\equiv\not{N}^{2}t^{2}(\sin\theta)^{2}-\not{N}^{2}(1-t)^{2}.)

Then the statement follows from the observation that every term in the sum is strictly negative on the specified domain. ∎

Here an important observation is that NR(A)\mathrm{NR}(A) depends solely on MR(A)\mathrm{MR}(A), and, by homogeneity, only on MRCKB(A)\mathrm{MR}^{\mathrm{CKB}}(A). Similar definitions, statements, and observations can be made regarding ML(A)\mathrm{ML}(A), and NL(A)\mathrm{NL}(A).

Remark 8.8.

Let us address the question whether MRCKB(A)\mathrm{MR}^{\mathrm{CKB}}(A) and MLCKB(A)\mathrm{ML}^{\mathrm{CKB}}(A) can be extended formally beyond the already discussed cases.

(o) If A=aId2+bI~+(rcosψ)J~+(rsinψ)K~A=a\operatorname{Id}_{2}+b\tilde{I}+(r\cos\psi)\tilde{J}+(r\sin\psi)\tilde{K} such that

(93) a2+b2>0,r>0,a^{2}+b^{2}>0,\qquad r>0,

then MRCKB\mathrm{MR}^{\mathrm{CKB}} extends by the formula

MRCKB(A)=𝒜Id2+I~+((cosψ)Id2+(sinψ)I~)(𝒞J~+𝒟K~)𝒜2+2+𝒢2,\mathrm{MR}^{\mathrm{CKB}}(A)=\frac{\mathcal{A}\operatorname{Id}_{2}+\mathcal{B}\tilde{I}+((\cos\psi)\operatorname{Id}_{2}+(\sin\psi)\tilde{I})(\mathcal{C}\tilde{J}+\mathcal{D}\tilde{K})}{\sqrt{\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}}},

where

𝒜(cosθ)((ssinθ)2r2),\mathcal{A}\equiv(\cos\theta)\cdot\mathcal{E}((s\sin\theta)^{2}-r^{2}),

and

(sinθ)(((ssinθ)2r2)+r(s+r)((ssinθ)2r2)),\mathcal{B}\equiv(\sin\theta)\cdot\left(\mathcal{E}((s\sin\theta)^{2}-r^{2})+r(s+r)\mathcal{F}((s\sin\theta)^{2}-r^{2})\right),

and

𝒞((ssinθ)2r2)+(sinθ)2s(s+r)((ssinθ)2r2),\mathcal{C}\equiv\mathcal{E}((s\sin\theta)^{2}-r^{2})+(\sin\theta)^{2}\cdot s(s+r)\mathcal{F}((s\sin\theta)^{2}-r^{2}),

and

𝒟(sinθ)(r+s)((ssinθ)2r2),\mathcal{D}\equiv(\sin\theta)\cdot(r+s)\mathcal{E}((s\sin\theta)^{2}-r^{2}),

and

𝒢(sinθ)(s+r).\mathcal{G}\equiv(\sin\theta)\cdot(s+r).

This dispenses with the condition (81).

(a) If s>0s>0, r=0r=0, then (𝒞J~+𝒟K~)𝒜2+2+𝒢2\frac{(\mathcal{C}\tilde{J}+\mathcal{D}\tilde{K})}{\sqrt{\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}}} can be replaced by an arbitrary (cosψ)J~+(sinψ)K~(\cos\psi)\tilde{J}+(\sin\psi)\tilde{K}. Formally, this leads to

MRCKB(A)=𝒜Id2+I~𝒜2+2+𝒢2+((cosψ)J~+(sinψ)K~)\mathrm{MR}^{\mathrm{CKB}}(A)=\frac{\mathcal{A}\operatorname{Id}_{2}+\mathcal{B}\tilde{I}}{\sqrt{\mathcal{A}^{2}+\mathcal{B}^{2}+\mathcal{G}^{2}}}+((\cos\psi)\tilde{J}+(\sin\psi)\tilde{K})

where ψ\psi is undecided. Here

𝒜(cosθ)((ssinθ)2),\mathcal{A}\equiv(\cos\theta)\cdot\mathcal{E}((s\sin\theta)^{2}),

and

(sinθ)(((ssinθ)2)),\mathcal{B}\equiv(\sin\theta)\cdot\left(\mathcal{E}((s\sin\theta)^{2})\right),

and

𝒞((ssinθ)2)+(ssinθ)2((ssinθ)2),\mathcal{C}\equiv\mathcal{E}((s\sin\theta)^{2})+(s\sin\theta)^{2}\mathcal{F}((s\sin\theta)^{2}),

and

𝒟(ssinθ)((ssinθ)2),\mathcal{D}\equiv(s\sin\theta)\cdot\mathcal{E}((s\sin\theta)^{2}),

and

𝒢ssinθ.\mathcal{G}\equiv s\sin\theta.

(The undecidedness of ψ\psi does not appear in M^CKB\widehat{\mathrm{M}}^{\mathrm{CKB}}.)

(b) If s=0s=0, r>0r>0, then θ\theta becomes undecided. Here

𝒜(cosθ)(r2),\mathcal{A}\equiv(\cos\theta)\cdot\mathcal{E}(-r^{2}),

and

(sinθ)((r2)+r2(r2)),\mathcal{B}\equiv(\sin\theta)\cdot\left(\mathcal{E}(-r^{2})+r^{2}\mathcal{F}(-r^{2})\right),

and

𝒞(r2),\mathcal{C}\equiv\mathcal{E}(-r^{2}),

and

𝒟(sinθ)r(r2),\mathcal{D}\equiv(\sin\theta)\cdot r\mathcal{E}(-r^{2}),

and

𝒢rsinθ.\mathcal{G}\equiv r\sin\theta.

(The undecidedness of θ\theta does appear in M^CKB\widehat{\mathrm{M}}^{\mathrm{CKB}}.)

These calculations were formal. The real geometrical picture is computed in what follows. “Undecidedness” will appear as conical degeneracy. ∎

Lemma 8.9.

Suppose A=aId2+bI~A=a\operatorname{Id}_{2}+b\tilde{I} with π<b<π-\pi<b<\pi (thus Aell𝒮A\in\partial^{\mathrm{ell}}\mathcal{S}) and 𝐯=v1Id2+v2I~+v3J~+v4K~\mathbf{v}=v_{1}\operatorname{Id}_{2}+v_{2}\tilde{I}+v_{3}\tilde{J}+v_{4}\tilde{K}. Then

MRA(𝐯)=MLA(𝐯)=aa2+b2v1+ba2+b2v2+bsinb(v3)2+(v4)2.\mathrm{MR}_{A}(\mathbf{v})=\mathrm{ML}_{A}(\mathbf{v})=\frac{a}{\sqrt{a^{2}+b^{2}}}v_{1}+\frac{b}{\sqrt{a^{2}+b^{2}}}v_{2}+\frac{b}{\sin b}\sqrt{(v_{3})^{2}+(v_{4})^{2}}.

In particular, if

𝐯=v^1aId2+bI~a2+b2+v^2bId2+aI~a2+b2+v^3J~bsinb+v^4K~bsinb\mathbf{v}=\hat{v}_{1}\frac{a\operatorname{Id}_{2}+b\tilde{I}}{\sqrt{a^{2}+b^{2}}}+\hat{v}_{2}\frac{-b\operatorname{Id}_{2}+a\tilde{I}}{\sqrt{a^{2}+b^{2}}}+\hat{v}_{3}\frac{\tilde{J}}{\frac{b}{\sin b}}+\hat{v}_{4}\frac{\tilde{K}}{\frac{b}{\sin b}}

then

MRA(𝐯)=MLA(𝐯)=v^1+(v^3)2+(v^4)2.\mathrm{MR}_{A}(\mathbf{v})=\mathrm{ML}_{A}(\mathbf{v})=\hat{v}_{1}+\sqrt{(\hat{v}_{3})^{2}+(\hat{v}_{4})^{2}}.

(For b=0b=0, bsinb\frac{b}{\sin b} should be understood as 11.)

Proof.

Direct computation. ∎

Lemma 8.10.

Suppose A=cJ~+dK~A=c\tilde{J}+d\tilde{K}, c2+d2>0c^{2}+d^{2}>0, and 𝐯=v1Id2+v2I~+v3J~+v4K~\mathbf{v}=v_{1}\operatorname{Id}_{2}+v_{2}\tilde{I}+v_{3}\tilde{J}+v_{4}\tilde{K}. Then

MRA(𝐯)=cv3+dv4c2+d2+v12+(c2+d2coshc2+d2sinhc2+d2v2cv4+dv3)2.\mathrm{MR}_{A}(\mathbf{v})=\frac{cv_{3}+dv_{4}}{\sqrt{c^{2}+d^{2}}}+\sqrt{v_{1}^{2}+\left(\sqrt{c^{2}+d^{2}}\frac{\cosh\sqrt{c^{2}+d^{2}}}{\sinh\sqrt{c^{2}+d^{2}}}v_{2}-cv_{4}+dv_{3}\right)^{2}}.

In particular, if

𝐯=\displaystyle\mathbf{v}= v^1cJ~+dK~c2+d2+v^2(sinhc2+d2)I~+(coshc2+d2)dJ~+cK~c2+d2c2+d2sinhc2+d2\displaystyle\hat{v}_{1}\frac{c\tilde{J}+d\tilde{K}}{\sqrt{c^{2}+d^{2}}}+\hat{v}_{2}\frac{(\sinh\sqrt{c^{2}+d^{2}})\tilde{I}+(\cosh\sqrt{c^{2}+d^{2}})\frac{-d\tilde{J}+c\tilde{K}}{\sqrt{c^{2}+d^{2}}}}{\dfrac{\sqrt{c^{2}+d^{2}}}{\sinh\sqrt{c^{2}+d^{2}}}}
+v^3Id2+v^4(coshc2+d2)I~+(sinhc2+d2)dJ~+cK~c2+d2c2+d2sinhc2+d2,\displaystyle+\hat{v}_{3}\operatorname{Id}_{2}+\hat{v}_{4}\frac{(\cosh\sqrt{c^{2}+d^{2}})\tilde{I}+(\sinh\sqrt{c^{2}+d^{2}})\frac{-d\tilde{J}+c\tilde{K}}{\sqrt{c^{2}+d^{2}}}}{\dfrac{\sqrt{c^{2}+d^{2}}}{\sinh\sqrt{c^{2}+d^{2}}}},

then

MRA(𝐯)=v^1+(v^3)2+(v^4)2.\mathrm{MR}_{A}(\mathbf{v})=\hat{v}_{1}+\sqrt{(\hat{v}_{3})^{2}+(\hat{v}_{4})^{2}}.

Similarly,

MLA(𝐯)=cv3+dv4c2+d2+v12+(c2+d2coshc2+d2sinhc2+d2v2+cv4dv3)2.\mathrm{ML}_{A}(\mathbf{v})=\frac{cv_{3}+dv_{4}}{\sqrt{c^{2}+d^{2}}}+\sqrt{v_{1}^{2}+\left(\sqrt{c^{2}+d^{2}}\frac{\cosh\sqrt{c^{2}+d^{2}}}{\sinh\sqrt{c^{2}+d^{2}}}v_{2}+cv_{4}-dv_{3}\right)^{2}}.

In particular, if

𝐯=\displaystyle\mathbf{v}= v^1cJ~+dK~c2+d2+v^2(sinhc2+d2)I~+(coshc2+d2)dJ~cK~c2+d2c2+d2sinhc2+d2\displaystyle\hat{v}_{1}\frac{c\tilde{J}+d\tilde{K}}{\sqrt{c^{2}+d^{2}}}+\hat{v}_{2}\frac{(\sinh\sqrt{c^{2}+d^{2}})\tilde{I}+(\cosh\sqrt{c^{2}+d^{2}})\frac{d\tilde{J}-c\tilde{K}}{\sqrt{c^{2}+d^{2}}}}{\dfrac{\sqrt{c^{2}+d^{2}}}{\sinh\sqrt{c^{2}+d^{2}}}}
+v^3Id2+v^4(coshc2+d2)I~+(sinhc2+d2)dJ~cK~c2+d2c2+d2sinhc2+d2,\displaystyle+\hat{v}_{3}\operatorname{Id}_{2}+\hat{v}_{4}\frac{(\cosh\sqrt{c^{2}+d^{2}})\tilde{I}+(\sinh\sqrt{c^{2}+d^{2}})\frac{d\tilde{J}-c\tilde{K}}{\sqrt{c^{2}+d^{2}}}}{\dfrac{\sqrt{c^{2}+d^{2}}}{\sinh\sqrt{c^{2}+d^{2}}}},

then

MLA(𝐯)=v^1+(v^3)2+(v^4)2.\mathrm{ML}_{A}(\mathbf{v})=\hat{v}_{1}+\sqrt{(\hat{v}_{3})^{2}+(\hat{v}_{4})^{2}}.
Proof.

Direct computation. ∎
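Interpreting MRA(𝐯)\mathrm{MR}_{A}(\mathbf{v}) as the one-sided derivative of t↦log((expA)(expt𝐯))2t\mapsto\|\log((\exp A)(\exp t\mathbf{v}))\|_{2} at t=0+t=0^{+} (this is how it is used in Lemma 9.3 below), the closed forms of Lemmas 8.9 and 8.10 can be checked numerically. The following Python sketch is illustrative only: the helper names are ad hoc, and it assumes the basis I~=[[0,−1],[1,0]], J~=[[1,0],[0,−1]], K~=[[0,1],[1,0]] together with the operator-norm formula ‖p Id₂+q I~+r J~+s K~‖₂ = √(p²+q²)+√(r²+s²).

```python
import math

# 2x2 real matrices encoded as (p, q, r, s), meaning p*Id2 + q*I~ + r*J~ + s*K~,
# with I~ = [[0,-1],[1,0]], J~ = [[1,0],[0,-1]], K~ = [[0,1],[1,0]].

def to_mat(p, q, r, s):
    return [[p + r, -q + s], [q + s, p - r]]

def to_comp(m):
    return ((m[0][0] + m[1][1]) / 2, (m[1][0] - m[0][1]) / 2,
            (m[0][0] - m[1][1]) / 2, (m[1][0] + m[0][1]) / 2)

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def opnorm(p, q, r, s):
    # assumed operator-norm formula for this basis
    return math.hypot(p, q) + math.hypot(r, s)

def expm(p, q, r, s):
    # X = q*I~ + r*J~ + s*K~ satisfies X^2 = (r^2 + s^2 - q^2)*Id2
    d = r * r + s * s - q * q
    if d > 0:
        u = math.sqrt(d); c, f = math.cosh(u), math.sinh(u) / u
    elif d < 0:
        u = math.sqrt(-d); c, f = math.cos(u), math.sin(u) / u
    else:
        c, f = 1.0, 1.0
    e = math.exp(p)
    return to_mat(e * c, e * f * q, e * f * r, e * f * s)

def logm(m):
    # principal logarithm (spectrum off the closed negative half-axis)
    P, Q, R, S = to_comp(m)
    d = R * R + S * S - Q * Q
    if d > 0:
        u = math.sqrt(d); f = math.atanh(u / P) / u
    elif d < 0:
        u = math.sqrt(-d); f = math.atan2(u, P) / u
    else:
        f = 1.0 / P
    return (0.5 * math.log(P * P - d), f * Q, f * R, f * S)

def MR(A, v, t=1e-6):
    # one-sided difference quotient for s |-> ||log(exp(A)exp(s*v))|| at s = 0+
    M = mul(expm(*A), expm(*[t * x for x in v]))
    return (opnorm(*logm(M)) - opnorm(*A)) / t

v = (0.2, -0.5, 0.4, 0.1)

# Lemma 8.9: A = a*Id2 + b*I~
a, b = 0.3, 1.0
mr89 = (a * v[0] + b * v[1]) / math.hypot(a, b) \
       + (b / math.sin(b)) * math.hypot(v[2], v[3])
assert abs(MR((a, b, 0.0, 0.0), v) - mr89) < 1e-4

# Lemma 8.10: A = c*J~ + d*K~
c, d = 0.8, 0.3
m = math.hypot(c, d)
mr810 = (c * v[2] + d * v[3]) / m + math.hypot(
    v[0], m * (math.cosh(m) / math.sinh(m)) * v[1] - c * v[3] + d * v[2])
assert abs(MR((0.0, 0.0, c, d), v) - mr810) < 1e-4
```

The finite-difference quotient agrees with both closed forms to about the step size, as expected for a one-sided derivative.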

Lemma 8.11.

Suppose that A1dell𝒮A_{1}\in\partial^{\mathrm{dell}}\mathcal{S}, and, in particular, A1=aId2+πI~A_{1}=a\operatorname{Id}_{2}+\pi\tilde{I} or A1=aId2πI~A_{1}=a\operatorname{Id}_{2}-\pi\tilde{I}. Let 𝐯=v1Id2+v2I~\mathbf{v}=v_{1}\operatorname{Id}_{2}+v_{2}\tilde{I}. Then

MRA(𝐯)=MLA(𝐯)=av1π|v2|a2+π2.\mathrm{MR}_{A}(\mathbf{v})=\mathrm{ML}_{A}(\mathbf{v})=\frac{av_{1}-\pi|v_{2}|}{\sqrt{a^{2}+\pi^{2}}}.
Proof.

Direct computation. ∎
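A sketch of the direct computation, under the stated restriction 𝐯∈ℝId₂+ℝI~ (using that exp(a Id₂±π I~)=−eᵃ Id₂ and that the principal logarithm pulls the elliptic angle back into (−π,π)):

```latex
% For small t>0 the elliptic angle \pi \pm t v_2 re-enters (-\pi,\pi) as \pi - t|v_2|:
\log\bigl(\exp(A)\exp(t\mathbf{v})\bigr)
  = (a+tv_1)\operatorname{Id}_2 \pm (\pi - t|v_2|)\tilde{I},
\qquad
\bigl\|\log\bigl(\exp(A)\exp(t\mathbf{v})\bigr)\bigr\|_2
  = \sqrt{(a+tv_1)^2 + (\pi - t|v_2|)^2};
% differentiating at t = 0^+ gives
\frac{\mathrm{d}}{\mathrm{d}t}\Big|_{t=0^+}
  \sqrt{(a+tv_1)^2 + (\pi - t|v_2|)^2}
  = \frac{av_1 - \pi|v_2|}{\sqrt{a^2+\pi^2}}.
```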

9. BCH minimality

Definition 9.1.

We say that (A,B)M2()×M2()(A,B)\in\mathrm{M}_{2}(\mathbb{R})\times\mathrm{M}_{2}(\mathbb{R}) is a BCH minimal pair (for CC), if for any (A~,B~)M2()×M2()(\tilde{A},\tilde{B})\in\mathrm{M}_{2}(\mathbb{R})\times\mathrm{M}_{2}(\mathbb{R}) such that (expA~)(expB~)=(expA)(expB)(\exp\tilde{A})(\exp\tilde{B})=(\exp A)(\exp B) (=C)(=C) it holds that A~2>A2\|\tilde{A}\|_{2}>\|A\|_{2} or B~2>B2\|\tilde{B}\|_{2}>\|B\|_{2}.

In this section we seek restrictions for BCH minimal pairs.

Any element C\in\operatorname{GL}^{+}_{2}(\mathbb{R}) can easily be perturbed into the product of two \log-able elements (for example, one close to C and one close to \operatorname{Id}_{2}). By a simple compactness argument we find: if C=(\exp A)(\exp B), then C allows a minimal pair (\tilde{A},\tilde{B}) such that \|\tilde{A}\|_{2}\leq\|A\|_{2} and \|\tilde{B}\|_{2}\leq\|B\|_{2}. In minimal pairs, however, we can immediately restrict our attention to some special matrices:

By Lemma 8.1, all minimal pairs are from 𝒮acc×𝒮acc{\mathcal{S}^{\mathrm{acc}}}\times{\mathcal{S}^{\mathrm{acc}}}. By the same lemma, we also see that the elements of 𝒮acc×0𝒮{\mathcal{S}^{\mathrm{acc}}}\times\partial^{0}\mathcal{S} and 0𝒮×𝒮acc\partial^{0}\mathcal{S}\times{\mathcal{S}^{\mathrm{acc}}} are all minimal pairs.

Definition 9.2.

Assume that (A,B)𝒮acc×𝒮acc(A,B)\in{\mathcal{S}^{\mathrm{acc}}}\times{\mathcal{S}^{\mathrm{acc}}}. We say that (A,B)(A,B) is an infinitesimally minimal BCH pair, if we cannot find 𝐯M2()\mathbf{v}\in\mathrm{M}_{2}(\mathbb{R}) such that

(94) MRA(𝐯)<0andMLB(𝐯)<0.\mathrm{MR}_{A}(\mathbf{v})<0\qquad\text{and}\qquad\mathrm{ML}_{B}(-\mathbf{v})<0.
Lemma 9.3.

Assume that (A,B)𝒮acc×𝒮acc(A,B)\in{\mathcal{S}^{\mathrm{acc}}}\times{\mathcal{S}^{\mathrm{acc}}}, and (A,B)(A,B) is not infinitesimally minimal. Then one can find A~,B~𝒮\tilde{A},\tilde{B}\in\mathcal{S} such that

A~2<A2,andB~2<B2,\|\tilde{A}\|_{2}<\|A\|_{2},\qquad\text{and}\qquad\|\tilde{B}\|_{2}<\|B\|_{2},

yet

(expA~)(expB~)=(expA)(expB).(\exp\tilde{A})(\exp\tilde{B})=(\exp A)(\exp B).

In particular, infinitesimal minimality is a necessary condition for minimality.

Proof.

If (94) holds, then let \tilde{A}=\log((\exp A)(\exp t\mathbf{v})) and \tilde{B}=\log((\exp(-t\mathbf{v}))(\exp B)) with a sufficiently small t>0. ∎
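A minimal worked instance of this construction (with illustrative choices, everything chosen to commute so that the logarithms are exact sums): take A=0.5 I~, B=0.4 Id₂ and 𝐯=Id₂−I~, where I~=[[0,−1],[1,0]]. By Lemma 8.9, MR_A(𝐯)=ML_B(−𝐯)=−1<0, so (94) holds; with t=0.01 both norms strictly decrease while the product is unchanged:

```python
import math

# A = 0.5*I~, B = 0.4*Id2, v = Id2 - I~; all three commute, so
# log(exp(A)exp(t v)) = A + t v and log(exp(-t v)exp(B)) = B - t v exactly.
def rotscale(p, q):
    # exp(p*Id2 + q*I~) is the scaled rotation e^p * R(q)
    e = math.exp(p)
    return [[e * math.cos(q), -e * math.sin(q)],
            [e * math.sin(q),  e * math.cos(q)]]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

norm = lambda p, q: math.hypot(p, q)   # operator norm of p*Id2 + q*I~

A, B, v, t = (0.0, 0.5), (0.4, 0.0), (1.0, -1.0), 0.01
At = (A[0] + t * v[0], A[1] + t * v[1])
Bt = (B[0] - t * v[0], B[1] - t * v[1])

assert norm(*At) < norm(*A) and norm(*Bt) < norm(*B)   # strict decrease
P1 = mul(rotscale(*A), rotscale(*B))
P2 = mul(rotscale(*At), rotscale(*Bt))
assert max(abs(P1[i][j] - P2[i][j]) for i in range(2) for j in range(2)) < 1e-12
```

The example is degenerate (a commuting pair), but it exhibits exactly the descent mechanism of the proof.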

Lemma 9.4.

Suppose that A,B\in\mathcal{S} are such that \mathrm{MR}(A) and \mathrm{ML}(B) are not positive multiples of each other. Then the pair (A,B) is not infinitesimally minimal; in particular, (A,B) is not minimal.

Proof.

The nonzero linear functionals \mathrm{MR}_{A}(\cdot) and \mathrm{ML}_{B}(\cdot) are not positive multiples of each other; thus one can find \mathbf{v} such that (94) holds. ∎

Lemma 9.5.

If (A,B)𝒮×𝒮(A,B)\in\mathcal{S}\times\mathcal{S} is BCH minimal, then (A,B)ðpar𝒮×ðpar𝒮(A,B)\in\eth^{\mathrm{par}}\mathcal{S}\times\eth^{\mathrm{par}}\mathcal{S}.

Proof.

For A\in\mathcal{S}^{\mathrm{nn}}, \mathrm{MR}^{\mathrm{CKB}}(A) and \mathrm{ML}^{\mathrm{CKB}}(A) are inner points of CKB; for A\in\eth^{\mathrm{par}}\mathcal{S}, \mathrm{MR}^{\mathrm{CKB}}(A) and \mathrm{ML}^{\mathrm{CKB}}(A) are asymptotic points of CKB. From this and the previous lemma, if (A,B) is infinitesimally minimal, then (A,B)\in\eth^{\mathrm{par}}\mathcal{S}\times\eth^{\mathrm{par}}\mathcal{S} or (A,B)\in\mathcal{S}^{\mathrm{nn}}\times\mathcal{S}^{\mathrm{nn}}. Regarding the second case, infinitesimal minimality requires \mathrm{MR}^{\mathrm{CKB}}(A)=\mathrm{ML}^{\mathrm{CKB}}(B). Then we can take \mathbf{v}=\mathrm{NR}(A)=\mathrm{NL}(B), and, by Lemma 8.7, \tilde{A}=\log((\exp A)(\exp t\mathbf{v})) and \tilde{B}=\log((\exp(-t\mathbf{v}))(\exp B)) will give a counterexample to BCH minimality with a sufficiently small t>0. ∎

Lemma 9.6.

Suppose that (A,B)Sacc×Sacc(A,B)\in S^{\mathrm{acc}}\times S^{\mathrm{acc}} is an infinitesimally minimal BCH pair. We claim:

(a) If Adell𝒮A\in\partial^{\mathrm{dell}}\mathcal{S}, then Bhyp𝒮B\in\partial^{\mathrm{hyp}}\mathcal{S}.

(b) If Aell𝒮A\in\partial^{\mathrm{ell*}}\mathcal{S}, then Bell𝒮B\in\partial^{\mathrm{ell*}}\mathcal{S} or Bhyp𝒮B\in\partial^{\mathrm{hyp}}\mathcal{S} or B𝒮nnB\in\mathcal{S}^{\mathrm{nn}}.

(c) If Aell0𝒮A\in\partial^{\mathrm{ell0}}\mathcal{S}, then Bell0𝒮B\in\partial^{\mathrm{ell0}}\mathcal{S} or Bhyp𝒮B\in\partial^{\mathrm{hyp}}\mathcal{S} or Bðpar𝒮B\in\eth^{\mathrm{par}}\mathcal{S}.

Proof.

(a) By Lemma 8.11, for \mathbf{v}\sim\pm\tilde{I} the relation \mathrm{MR}_{A}(\mathbf{v})<0 holds. (‘\sim’ means ‘approximately’.) Then the statement is a consequence of Lemma 9.3 and, furthermore, of Lemma 8.9.

(b) and (c) follow from the observation that the maps MRA|Id2+I~\mathrm{MR}_{A}|_{\mathbb{R}\operatorname{Id}_{2}+\mathbb{R}\tilde{I}} and MLB|Id2+I~\mathrm{ML}_{B}|_{\mathbb{R}\operatorname{Id}_{2}+\mathbb{R}\tilde{I}}, if they are linear functionals, should be proportional to each other (cf. Lemma 9.3). ∎

By taking adjoints, the relation described in the previous lemma is symmetric in AA and BB. This leads to

Theorem 9.7.

For minimal BCH pairs, where all factors are of positive norm, one has the following incidence possibilities:

𝒮nnðpar𝒮ell𝒮ell0𝒮dell𝒮hyp𝒮𝒮nn××××ðpar𝒮×××ell𝒮×××ell0𝒮×××dell𝒮×××××hyp𝒮\begin{array}[]{c|ccccccc}&\mathcal{S}^{\mathrm{nn}}&\eth^{\mathrm{par}}\mathcal{S}&\partial^{\mathrm{ell*}}\mathcal{S}&\partial^{\mathrm{ell0}}\mathcal{S}&\partial^{\mathrm{dell}}\mathcal{S}&\partial^{\mathrm{hyp}}\mathcal{S}\\ \hline\cr\mathcal{S}^{\mathrm{nn}}&\times&\times&\checkmark&\times&\times&\checkmark\\ \eth^{\mathrm{par}}\mathcal{S}&\times&\checkmark&\times&\checkmark&\times&\checkmark\\ \partial^{\mathrm{ell*}}\mathcal{S}&\checkmark&\times&\checkmark&\times&\times&\checkmark\\ \partial^{\mathrm{ell0}}\mathcal{S}&\times&\checkmark&\times&\checkmark&\times&\checkmark\\ \partial^{\mathrm{dell}}\mathcal{S}&\times&\times&\times&\times&\times&\checkmark\\ \partial^{\mathrm{hyp}}\mathcal{S}&\checkmark&\checkmark&\checkmark&\checkmark&\checkmark&\checkmark\end{array}

Here \checkmark means ‘perhaps possible’, and ×\times means ‘not possible’.

Proof.

This is a consequence of the previous statements. ∎

Next, we obtain more quantitative restrictions.

Lemma 9.8.

(a) For 1,2>0\not{N}_{1},\not{N}_{2}>0 consider the pairs

(A,B)=\left(\begin{bmatrix}\not{N}_{1}&\\ &t_{1}\not{N}_{1}\end{bmatrix},\begin{bmatrix}\not{N}_{2}&\\ &t_{2}\not{N}_{2}\end{bmatrix}\right)

or

(A,B)=\left(-\begin{bmatrix}\not{N}_{1}&\\ &t_{1}\not{N}_{1}\end{bmatrix},-\begin{bmatrix}\not{N}_{2}&\\ &t_{2}\not{N}_{2}\end{bmatrix}\right)

where t1,t2(1,1)t_{1},t_{2}\in(-1,1). Then the pairs (A,B)(A,B) will produce, up to conjugation by rotation matrices, every (infinitesimally) BCH minimal pair from ðpar𝒮×ðpar𝒮\eth^{\mathrm{par}}\mathcal{S}\times\eth^{\mathrm{par}}\mathcal{S}. In these cases

\|\log((\exp A)(\exp B))\|_{2}=\not{N}_{1}+\not{N}_{2}.

(b) A similar statement holds for t_{1},t_{2}\in(-1,1] with respect to (\eth^{\mathrm{par}}\mathcal{S}\cup\eth^{\mathrm{ell0}}\mathcal{S})\times(\eth^{\mathrm{par}}\mathcal{S}\cup\eth^{\mathrm{ell0}}\mathcal{S}).

Proof.

(a) By infinitesimal minimality, the matrices should be aligned. BCH minimality follows from Magnus minimality. (b) This is a trivial extension of (a). ∎
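In the aligned case, everything is diagonal and commuting, so the displayed norm identity reduces to arithmetic on the diagonal entries; a quick numerical sketch with illustrative sample values:

```python
import math

N1, t1, N2, t2 = 0.7, 0.3, 0.9, -0.5     # sample values with |t1|, |t2| < 1

# exp of a diagonal matrix acts entrywise, and the product stays diagonal
top = math.exp(N1) * math.exp(N2)
bot = math.exp(t1 * N1) * math.exp(t2 * N2)
log_diag = (math.log(top), math.log(bot))  # = (N1 + N2, t1*N1 + t2*N2)

# the operator norm of a diagonal matrix is its largest |entry|, and
# |t1*N1 + t2*N2| < N1 + N2 because |t1|, |t2| < 1
opn = max(abs(log_diag[0]), abs(log_diag[1]))
assert abs(opn - (N1 + N2)) < 1e-12
```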

Lemma 9.9.

Consider the function given by

α(t1U1+t2U2+t3U3+t4U4)=t1+t32+t42\alpha(t_{1}U_{1}+t_{2}U_{2}+t_{3}U_{3}+t_{4}U_{4})=t_{1}+\sqrt{t_{3}^{2}+t_{4}^{2}}

and the linear functional given by

β(t1U1+t2U2+t3U3+t4U4)=β1t1+β2t2+β3t3+β4t4.\beta(t_{1}U_{1}+t_{2}U_{2}+t_{3}U_{3}+t_{4}U_{4})=\beta_{1}t_{1}+\beta_{2}t_{2}+\beta_{3}t_{3}+\beta_{4}t_{4}.

Then we can choose 𝐯=t1U1+t2U2+t3U3+t4U4\mathbf{v}=t_{1}U_{1}+t_{2}U_{2}+t_{3}U_{3}+t_{4}U_{4} such that α(𝐯)<0\alpha(\mathbf{v})<0 and β(𝐯)<0\beta(-\mathbf{v})<0 unless

(95) β2=0andβ1β32+β42\beta_{2}=0\qquad\text{and}\qquad\beta_{1}\geq\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}

holds, in which case finding such a 𝐯\mathbf{v} is impossible.

Proof.

If β20\beta_{2}\neq 0, or β1<0\beta_{1}<0, or 0=β1<β32+β420=\beta_{1}<\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}, or 0<β1<β32+β420<\beta_{1}<\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}, then

𝐯=U1+β1+1β2U2\mathbf{v}=-U_{1}+\frac{\beta_{1}+1}{\beta_{2}}U_{2}

or

𝐯=U1,\mathbf{v}=-U_{1},

or

𝐯=2U1+β3β32+β42U3+β4β32+β42U4\mathbf{v}=-2U_{1}+\frac{\beta_{3}}{\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}}U_{3}+\frac{\beta_{4}}{\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}}U_{4}

or

𝐯=β1+β32+β422β1U1+β3β32+β42U3+β4β32+β42U4\mathbf{v}=-\frac{\beta_{1}+\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}}{2\beta_{1}}U_{1}+\frac{\beta_{3}}{\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}}U_{3}+\frac{\beta_{4}}{\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}}U_{4}

respectively, are good choices.

Conversely, assume that (95) holds and α(𝐯)<0\alpha(\mathbf{v})<0, i.e., t1<t32+t42t_{1}<-\sqrt{t_{3}^{2}+t_{4}^{2}}, also holds. Then,

β1t1+β2t2+β3t3+β4t4β32+β42t32+t42+0+β32+β42t32+t42=0\beta_{1}t_{1}+\beta_{2}t_{2}+\beta_{3}t_{3}+\beta_{4}t_{4}\leq-\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}\sqrt{t_{3}^{2}+t_{4}^{2}}+0+\sqrt{\beta_{3}^{2}+\beta_{4}^{2}}\sqrt{t_{3}^{2}+t_{4}^{2}}=0

shows that β(𝐯)0\beta(-\mathbf{v})\geq 0.

(In essence, this is just a simple geometric statement about a half-cone.) ∎
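The case analysis can be verified mechanically; a minimal Python sketch (the helper names and sample β vectors are ad hoc) checks that each of the four displayed choices of 𝐯 works in its case, and that under (95) no randomly sampled 𝐯 succeeds:

```python
import math
import random

def alpha(t):
    return t[0] + math.hypot(t[2], t[3])

def beta_val(b, t):
    return sum(bi * ti for bi, ti in zip(b, t))

def witness(b):
    # the four explicit choices of v from the proof, case by case
    n = math.hypot(b[2], b[3])
    if b[1] != 0:
        return (-1.0, (b[0] + 1.0) / b[1], 0.0, 0.0)
    if b[0] < 0:
        return (-1.0, 0.0, 0.0, 0.0)
    if b[0] == 0:                        # 0 = beta1 < ||(beta3, beta4)||
        return (-2.0, 0.0, b[2] / n, b[3] / n)
    return (-(b[0] + n) / (2 * b[0]),    # 0 < beta1 < ||(beta3, beta4)||
            0.0, b[2] / n, b[3] / n)

for b in [(1.0, 0.5, 0.3, 0.4), (-2.0, 0.0, 1.0, 0.0),
          (0.0, 0.0, 0.6, 0.8), (0.5, 0.0, 3.0, 4.0)]:
    w = witness(b)
    assert alpha(w) < 0 and beta_val(b, tuple(-x for x in w)) < 0

# conversely, if (95) holds, no such v exists; sample randomly
random.seed(0)
b95 = (5.0, 0.0, 3.0, 4.0)               # beta2 = 0 and beta1 = 5 >= ||(3,4)||
for _ in range(10000):
    w = tuple(random.uniform(-1.0, 1.0) for _ in range(4))
    assert not (alpha(w) < 0 and beta_val(b95, tuple(-x for x in w)) < 0)
```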

Lemma 9.10.

Suppose that A1=aId2+bI~+cJ~𝒮A_{1}=a\operatorname{Id}_{2}+b\tilde{I}+c\tilde{J}\in\mathcal{S}, c>0c>0, and B1=a´Id2+b´I~ell𝒮B_{1}=\acute{a}\operatorname{Id}_{2}+\acute{b}\tilde{I}\in\partial^{\mathrm{ell}}\mathcal{S}, where ´N=a´2+b´2\acute{\not}{N}=\sqrt{\acute{a}^{2}+\acute{b}^{2}}. Suppose that (A1,B1)(A_{1},B_{1}) is infinitesimally BCH minimal. Then

B1=a^Id2+b^I~a^2+b^2´B_{1}=\frac{\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\,\acute{\not{N}}

and

(96) b^´sinb^´a^2+b^2c˘2+d˘2.\frac{\hat{b}\acute{\not{N}}}{\sin\dfrac{\hat{b}\acute{\not{N}}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}}\geq\sqrt{\breve{c}^{2}+\breve{d}^{2}}.

(For b^=0\hat{b}=0, the LHS is understood as a^2+b^2\sqrt{\hat{a}^{2}+\hat{b}^{2}}, but then the condition is vacuous anyway. Also note that the condition |b^´a^2+b^2|<π\left|\dfrac{\hat{b}\acute{\not{N}}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\right|<\pi is imposed by B1ell𝒮B_{1}\in\partial^{\mathrm{ell}}\mathcal{S}.)

Proof.

This is a consequence of Lemma 9.9 and Lemma 8.9. By (95/cond1),

B1=±a^Id2+b^I~a^2+b^2´.B_{1}=\pm\frac{\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\,\acute{\not{N}}.

By (95/cond2), only the sign choice + is valid, and (96) holds. ∎

Lemma 9.11.

Suppose that A1=aId2+bI~+cJ~𝒮A_{1}=a\operatorname{Id}_{2}+b\tilde{I}+c\tilde{J}\in\mathcal{S}, c>0c>0, and B1=c´J~+d´K~hyp𝒮B_{1}=\acute{c}\tilde{J}+\acute{d}\tilde{K}\in\partial^{\mathrm{hyp}}\mathcal{S}, where ´N=c´2+d´2\acute{\not}{N}=\sqrt{\acute{c}^{2}+\acute{d}^{2}}. If (A1,B1)(A_{1},B_{1}) is infinitesimally BCH minimal, then

B1=(c˘2+d˘2(sinh´Ncosh´Nb^)2Id2sinh´Ncosh´Nb^I~)c˘Id~2+d˘I~c˘2+d˘2N´J~B_{1}=\left(\sqrt{\breve{c}^{2}+\breve{d}^{2}-\left(\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\right)^{2}}\operatorname{Id}_{2}-\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\tilde{I}\right)\frac{\breve{c}\tilde{\operatorname{Id}}_{2}+\breve{d}\tilde{I}}{\breve{c}^{2}+\breve{d}^{2}}\acute{N}\tilde{J}

and

(97) c˘2+d˘2a^2+(sinh´Ncosh´Nb^)2´N2+1´N2.\breve{c}^{2}+\breve{d}^{2}\geq\hat{a}^{2}+\left(\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\right)^{2}\frac{\acute{\not}{N}^{2}+1}{\acute{\not}{N}^{2}}.
Proof.

This is a consequence of Lemma 9.9 and Lemma 8.10. By (95/cond1),

B1=(±c˘2+d˘2(sinh´Ncosh´Nb^)2Id2sinh´Ncosh´Nb^I~)c˘Id~2+d˘I~c˘2+d˘2N´J~.B_{1}=\left(\pm\sqrt{\breve{c}^{2}+\breve{d}^{2}-\left(\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\right)^{2}}\operatorname{Id}_{2}-\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\tilde{I}\right)\frac{\breve{c}\tilde{\operatorname{Id}}_{2}+\breve{d}\tilde{I}}{\breve{c}^{2}+\breve{d}^{2}}\acute{N}\tilde{J}.

By (95/cond2), only the sign choice + is valid, and (97) holds. ∎

Lemma 9.12.

Suppose that A1=c1J~+d1K~A_{1}=c_{1}\tilde{J}+d_{1}\tilde{K} and A2=c2J~+d2K~A_{2}=c_{2}\tilde{J}+d_{2}\tilde{K}. Assume that 1=c12+d12>0\not{N}_{1}=\sqrt{c_{1}^{2}+d_{1}^{2}}>0 and 2=c22+d22>0\not{N}_{2}=\sqrt{c_{2}^{2}+d_{2}^{2}}>0. Let ψ\psi denote the angle between the vectors (c1,d1)(c_{1},d_{1}) and (c2,d2)(c_{2},d_{2}). We claim that if (A1,A2)(A_{1},A_{2}) is an infinitesimally BCH minimal pair, then ψ<π\psi<\pi, and

(98) tanψ2((coth1)+(coth2))121(coth1)+2(coth2).\tan\frac{\psi}{2}\leq{\frac{\left(\left(\coth\not{N}_{1}\right)+\left(\coth\not{N}_{2}\right)\right)\not{N}_{1}\,\not{N}_{2}}{\not{N}_{1}\,\left(\coth\not{N}_{1}\right)+\not{N}_{2}\,\left(\coth\not{N}_{2}\right)}}.
Proof.

ψ<π\psi<\pi is immediate. If (98) does not hold, then with the choice

\mathbf{v}=\frac{A_{2}}{\not{N}_{2}}-\frac{A_{1}}{\not{N}_{1}}+\frac{\not{N}_{2}-\not{N}_{1}}{\not{N}_{1}(\coth\not{N}_{1})+\not{N}_{2}(\coth\not{N}_{2})}\frac{A_{1}A_{2}-A_{2}A_{1}}{2\not{N}_{1}\not{N}_{2}},

one finds

\mathrm{MR}_{A_{1}}(\mathbf{v})=\mathrm{ML}_{A_{2}}(-\mathbf{v})=\left({\frac{\left(\left(\coth\not{N}_{1}\right)+\left(\coth\not{N}_{2}\right)\right)\not{N}_{1}\,\not{N}_{2}}{\not{N}_{1}\,\left(\coth\not{N}_{1}\right)+\not{N}_{2}\,\left(\coth\not{N}_{2}\right)}}-\tan\frac{\psi}{2}\right)\sin\psi<0.

(By conjugation we can assume that

c1=1cosψ2,d1=1sinψ2,c2=2cosψ2,d2=2sinψ2.c_{1}=\not{N}_{1}\cos\frac{\psi}{2},\qquad d_{1}=\not{N}_{1}\sin\frac{\psi}{2},\qquad c_{2}=\not{N}_{2}\cos\frac{\psi}{2},\qquad d_{2}=-\not{N}_{2}\sin\frac{\psi}{2}.

Then direct computation yields the statement.) ∎
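Under the normalization in the parenthetical remark, (A₁A₂−A₂A₁)/(2N̸₁N̸₂)=(sin ψ) I~, so both directional derivatives can be evaluated purely from the closed forms of Lemma 8.10. A sketch with illustrative sample values; the sign convention taken here is 𝐯=A₂/N̸₂−A₁/N̸₁+((N̸₂−N̸₁)/(N̸₁coth N̸₁+N̸₂coth N̸₂))·(A₁A₂−A₂A₁)/(2N̸₁N̸₂), which makes both derivatives negative when (98) fails:

```python
import math

def coth(x):
    return math.cosh(x) / math.sinh(x)

# Lemma 8.10 closed forms for A = c*J~ + d*K~ and v = (v1, v2, v3, v4)
def MR(c, d, v):
    m = math.hypot(c, d)
    return (c * v[2] + d * v[3]) / m + math.hypot(
        v[0], m * coth(m) * v[1] - c * v[3] + d * v[2])

def ML(c, d, v):
    m = math.hypot(c, d)
    return (c * v[2] + d * v[3]) / m + math.hypot(
        v[0], m * coth(m) * v[1] + c * v[3] - d * v[2])

N1, N2, psi = 0.7, 1.2, 2.0              # sample data; here tan(psi/2) > RHS of (98)
c1, d1 = N1 * math.cos(psi / 2), N1 * math.sin(psi / 2)
c2, d2 = N2 * math.cos(psi / 2), -N2 * math.sin(psi / 2)
D = N1 * coth(N1) + N2 * coth(N2)

# v = A2/N2 - A1/N1 + (N2 - N1)/D * (A1 A2 - A2 A1)/(2 N1 N2);
# in these coordinates (A1 A2 - A2 A1)/(2 N1 N2) = sin(psi) * I~
v = (0.0, (N2 - N1) * math.sin(psi) / D,
     c2 / N2 - c1 / N1, d2 / N2 - d1 / N1)

rhs98 = (coth(N1) + coth(N2)) * N1 * N2 / D
val = (rhs98 - math.tan(psi / 2)) * math.sin(psi)
neg_v = tuple(-x for x in v)
assert abs(MR(c1, d1, v) - val) < 1e-9
assert abs(ML(c2, d2, neg_v) - val) < 1e-9
assert val < 0                           # so (94) holds and the pair descends
```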

Assume that U is a subset of \mathrm{M}_{2}(\mathbb{R})\times\mathrm{M}_{2}(\mathbb{R}) closed under conjugation by rotation matrices. Then the set \exp(U)=\{(\exp A)(\exp B)\,:\,(A,B)\in U\} is also closed under conjugation by rotation matrices. Thus it can be represented accurately by its image through \Xi^{\mathrm{PH}}. We say that the set \exp(U) has factor dimension p if the map \Xi^{\mathrm{PH}} applied to it can be factorized smoothly through a smooth manifold of dimension p.

Theorem 9.13.

Factor dimensions of the BCH minimal subsets of the sets U1×U2U_{1}\times U_{2} restricted to (A,B)U1×U2(A,B)\in U_{1}\times U_{2}, A2=1\|A\|_{2}=\not{N}_{1}, B2=2\|B\|_{2}=\not{N}_{2} are

𝒮nnðpar𝒮ell𝒮ell1𝒮dell𝒮hyp𝒮𝒮nn22ðpar𝒮111ell𝒮211ell1𝒮100dell𝒮0hyp𝒮211001\begin{array}[]{c|ccccccc}&\mathcal{S}^{\mathrm{nn}}&\eth^{\mathrm{par}}\mathcal{S}&\partial^{\mathrm{ell*}}\mathcal{S}&\partial^{\mathrm{ell1}}\mathcal{S}&\partial^{\mathrm{dell}}\mathcal{S}&\partial^{\mathrm{hyp}}\mathcal{S}\\ \hline\cr\mathcal{S}^{\mathrm{nn}}&&&2&&&2\\ \eth^{\mathrm{par}}\mathcal{S}&&1&&1&&1\\ \partial^{\mathrm{ell*}}\mathcal{S}&2&&1&&&1\\ \partial^{\mathrm{ell1}}\mathcal{S}&&1&&0&&0\\ \partial^{\mathrm{dell}}\mathcal{S}&&&&&&0\\ \partial^{\mathrm{hyp}}\mathcal{S}&2&1&1&0&0&1\end{array}
Remark.

The numbers given are upper estimates of sorts, but one can see that they cannot be reduced generically. ∎

Proof.

The previous statements and simple arguments show this. E.g., Lemma 9.10 takes care of \mathcal{S}^{\mathrm{nn}}\times\partial^{\mathrm{ell*}}\mathcal{S}, etc. ∎

Furthermore, note that the image of the restriction of ðpar𝒮×ðpar𝒮\eth^{\mathrm{par}}\mathcal{S}\times\eth^{\mathrm{par}}\mathcal{S} contains the images of the restrictions of ðpar𝒮×ell1𝒮\eth^{\mathrm{par}}\mathcal{S}\times\partial^{\mathrm{ell1}}\mathcal{S} and hyp𝒮×ðpar𝒮\partial^{\mathrm{hyp}}\mathcal{S}\times\eth^{\mathrm{par}}\mathcal{S}, which contain the image of the restriction of hyp𝒮×ell1𝒮\partial^{\mathrm{hyp}}\mathcal{S}\times\partial^{\mathrm{ell1}}\mathcal{S} (and also in reverse order).

Let us define the map

SE,´:XnnGL2()\operatorname{SE}_{\not{N},\acute{\not{N}}}:X^{\mathrm{nn}}_{\not{N}}\rightarrow\operatorname{GL}_{2}(\mathbb{R})

by

(a,b,r)exp(aId2+bI~+rJ~)exp(a^Id2+b^I~a^2+b^2´),(a,b,r)\mapsto\exp\left(a\operatorname{Id}_{2}+b\tilde{I}+r\tilde{J}\right)\exp\left(\frac{\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\,\acute{\not{N}}\right),

where a^\hat{a} and b^\hat{b} are as in Lemma 8.2 and ´N0\acute{\not}{N}\geq 0.

Note that logSE,´\log\operatorname{SE}_{\not{N},\acute{\not{N}}} is surely well-defined if 0<0<\not{N} and +´Nπ\not{N}+\acute{\not}{N}\leq\pi (as the BCH presentation is not Magnus-minimal).

Lemma 9.14.

Consider the map SE,´\operatorname{SE}_{\not{N},\acute{\not{N}}} with a=tcosθa=\not{N}t\cos\theta, b=tsinθb=\not{N}t\sin\theta, r=(1t)r=\not{N}(1-t). (Thus 0<t<10<t<1 and sinθ0\sin\theta\neq 0.) We claim:

(a)

mdisSE,´(a,b,r)=rSin(b2r2).\mathrm{mdis}\,\operatorname{SE}_{\not{N},\acute{\not{N}}}(a,b,r)=r\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2}).

(b) SE,´(a,b,r)\operatorname{SE}_{\not{N},\acute{\not{N}}}(a,b,r) is log-able if and only if

Cos(b2r2)cosb^N´a^2+b^2bSin(b2r2)sinb^N´a^2+b^2>1.\operatorname{\not{\mathrm{C}}os}(b^{2}-r^{2})\cos\frac{\hat{b}\acute{N}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}-b\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\sin\frac{\hat{b}\acute{N}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}>-1.

(c) In the log-able case: Let π23\pi_{23} denote the projection to the second and third coordinates. Then, for the Jacobian,

(π23ΞPHlogSE,´(tcosθ,tsinθ,(1t)))(t,θ)=\frac{\partial\left(\pi_{23}\circ\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SE}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial(t,\theta)}=
(cosθ)AC(Cos(b2r2)cosb^N´a^2+b^2bSin(b2r2)sinb^N´a^2+b^2)Sin(b2r2)(\cos\theta)\cdot\operatorname{AC}\left(\operatorname{\not{\mathrm{C}}os}(b^{2}-r^{2})\cos\frac{\hat{b}\acute{N}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}-b\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\sin\frac{\hat{b}\acute{N}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\right)\cdot\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\cdot
(2t+´N(a^2+b^2)3/2(1+(1t)(2t(cosθ)2)2+t2(1t)(sinθ)2(cosθ)24\Biggl{(}\not{N}^{2}t+\frac{\not{N}\acute{\not}{N}}{(\hat{a}^{2}+\hat{b}^{2})^{3/2}}\Biggl{(}1+(1-t)(2-t(\cos\theta)^{2})\not{N}^{2}\operatorname{\not{\mathbf{C}}}+t^{2}(1-t)(\sin\theta)^{2}(\cos\theta)^{2}\not{N}^{4}\operatorname{\not{\mathbf{W}}}
+(1t)((1t)(sinθ)2+(1t)2(cosθ)2+t2(sinθ)2(cosθ)2)42)),+(1-t)((1-t)(\sin\theta)^{2}+(1-t)^{2}(\cos\theta)^{2}+t^{2}(\sin\theta)^{2}(\cos\theta)^{2})\not{N}^{4}\operatorname{\not{\mathbf{C}}}^{2}\Biggr{)}\Biggr{)},

where the arguments of \operatorname{\not{\mathbf{C}}} and \operatorname{\not{\mathbf{W}}} should be b2r2b^{2}-r^{2}.

In particular, this is non-vanishing if cosθ0\cos\theta\neq 0.

Proof.

(a) This is direct computation. (b) This is a consequence of Lemma 2.14(a).

(c) The formula for \log can be applied; then direct computation yields the result. Regarding non-vanishing, we can see that, apart from the factor \cos\theta, all the remaining multiplicative factors are positive. ∎

Let us define the map

SH,´:XnnGL2()\operatorname{SH}_{\not{N},\acute{\not{N}}}:X^{\mathrm{nn}}_{\not{N}}\rightarrow\operatorname{GL}_{2}(\mathbb{R})

by

(a,b,r)exp(aId2+bI~+rJ~)exp((c˘2+d˘2(sinh´Ncosh´Nb^)2Id2sinh´Ncosh´Nb^I~)c˘Id~2+d˘I~c˘2+d˘2N´J~),(a,b,r)\mapsto\exp\left(a\operatorname{Id}_{2}+b\tilde{I}+r\tilde{J}\right)\cdot\\ \cdot\exp\left(\left(\sqrt{\breve{c}^{2}+\breve{d}^{2}-\left(\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\right)^{2}}\operatorname{Id}_{2}-\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\tilde{I}\right)\frac{\breve{c}\tilde{\operatorname{Id}}_{2}+\breve{d}\tilde{I}}{\breve{c}^{2}+\breve{d}^{2}}\acute{N}\tilde{J}\right),

where a^\hat{a} and b^\hat{b} are as in Lemma 8.2 and ´N0\acute{\not}{N}\geq 0.

Note that logSH,´\log\operatorname{SH}_{\not{N},\acute{\not{N}}} is surely well-defined if 0<0<\not{N} and +´Nπ\not{N}+\acute{\not}{N}\leq\pi (as the BCH presentation is not Magnus-minimal).

Lemma 9.15.

Consider the map SH,´\operatorname{SH}_{\not{N},\acute{\not{N}}} with a=tcosθa=\not{N}t\cos\theta, b=tsinθb=\not{N}t\sin\theta, r=(1t)r=\not{N}(1-t). (Thus 0<t<10<t<1 and sinθ0\sin\theta\neq 0.) We claim:

(a)

mdisSH,´(a,b,r)=(cosh´N)Sin(b2r2)rZ+SHP\mathrm{mdis}\,\operatorname{SH}_{\not{N},\acute{\not{N}}}(a,b,r)=(\cosh\acute{\not}{N})\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\frac{rZ+SH}{\sqrt{P}}

where

Ssinh´Ncosh´N,S\equiv\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}},
Zc˘2+d˘2(sinh´Ncosh´Nb^)2,Z\equiv\sqrt{\breve{c}^{2}+\breve{d}^{2}-\left(\frac{\sinh\acute{\not}{N}}{\cosh\acute{\not}{N}}\hat{b}\right)^{2}},
Pc˘2+d˘2=1+2(1t)(sinθ)22+(sinθ)22+(1t)2(sinθ)242>0,P\equiv\breve{c}^{2}+\breve{d}^{2}=1+2(1-t)(\sin\theta)^{2}\not{N}^{2}\operatorname{\not{\mathbf{C}}}+(\sin\theta)^{2}\not{N}^{2}\operatorname{\not{\mathbf{D}}}+(1-t)^{2}(\sin\theta)^{2}\not{N}^{4}\operatorname{\not{\mathbf{C}}}^{2}>0,
Hb^c˘sinθ(1t)+(c˘2+d˘2)t=1+(1t)(1t(cosθ)2)2+t(sinθ)22>0,H\equiv\frac{\hat{b}\,\breve{c}}{\sin\theta}(1-t)+(\breve{c}^{2}+\breve{d}^{2})t=1+(1-t)(1-t(\cos\theta)^{2})\not{N}^{2}\operatorname{\not{\mathbf{C}}}+t(\sin\theta)^{2}\not{N}^{2}\operatorname{\not{\mathbf{D}}}>0,

such that the arguments of \operatorname{\not{\mathbf{C}}} and \operatorname{\not{\mathbf{D}}} should be b2r2b^{2}-r^{2}.

(b) SH,´(a,b,r)\operatorname{SH}_{\not{N},\acute{\not{N}}}(a,b,r) is log-able if and only if

(cosh´N)Cos(b2r2)+(sinh´N)rSin(b2r2)b^d˘S+c˘ZP2>1.(\cosh\acute{\not}{N})\operatorname{\not{\mathrm{C}}os}(b^{2}-r^{2})+(\sinh\acute{\not}{N})r\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\frac{\hat{b}\breve{d}S+\breve{c}Z}{P^{2}}>-1.

where S,Z,PS,Z,P are as before.

(c) In the log-able case: Let π23\pi_{23} denote the projection to the second and third coordinates. Then, for the Jacobian,

(π23ΞPHlogSH,´(tcosθ,tsinθ,(1t)))(t,θ)=\frac{\partial\left(\pi_{23}\circ\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SH}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial(t,\theta)}=
(cosθ)AC((cosh´N)Cos(b2r2)+(sinh´N)rSin(b2r2)b^d˘S+c˘ZP2)Sin(b2r2)(\cos\theta)\cdot\operatorname{AC}\left((\cosh\acute{\not}{N})\operatorname{\not{\mathrm{C}}os}(b^{2}-r^{2})+(\sinh\acute{\not}{N})r\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\frac{\hat{b}\breve{d}S+\breve{c}Z}{P^{2}}\right)\cdot\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\cdot
2(cosh´N)PP(rSF+S2FH+t(1S2)P2Z),\frac{\not{N}^{2}(\cosh\acute{\not}{N})}{P\sqrt{P}}\Biggl{(}rSF+\frac{S^{2}FH+t(1-S^{2})P^{2}}{Z}\Biggr{)},

where S,Z,P,HS,Z,P,H are as before, and

F(cosθ)2+t(sinθ)22+t2(sinθ)44,F\equiv(\cos\theta)^{2}+t(\sin\theta)^{2}\not{N}^{2}\operatorname{\not{\mathbf{C}}}+t^{2}(\sin\theta)^{4}\not{N}^{4}\operatorname{\not{\mathbf{W}}},

and the arguments of \operatorname{\not{\mathbf{C}}} and \operatorname{\not{\mathbf{W}}} should be b2r2b^{2}-r^{2}.

In particular, this is non-vanishing if cosθ0\cos\theta\neq 0.

Proof.

(a) This is direct computation. (b) This is a consequence of Lemma 2.14(a). (c) The formula for \log can be applied; then direct computation yields the result. Regarding non-vanishing, we can see that, apart from the factor \cos\theta, all the remaining multiplicative factors are positive. ∎

Let SEH,´\operatorname{SEH}_{\not{N},\acute{\not{N}}} denote either SE,´\operatorname{SE}_{\not{N},\acute{\not{N}}} or SH,´\operatorname{SH}_{\not{N},\acute{\not{N}}}. It would be desirable to show that

(SEHC1) “If 0<,´0<\not{N},\acute{\not{N}} and +´π\not{N}+\acute{\not{N}}\leq\pi then

logSEH,´2:Xnn[0,+)\|\log\circ\operatorname{SEH}_{\not{N},\acute{\not{N}}}\|_{2}:X^{\mathrm{nn}}_{\not{N}}\rightarrow[0,+\infty)

has no local extremum.”

A statement which would provide this can be formulated as follows. For (a,b,r)\in\mathbb{R}^{2}\times[0,+\infty), \sqrt{a^{2}+b^{2}},r>0, let \nabla(a,b,r)=\left(\frac{a}{\sqrt{a^{2}+b^{2}}},\frac{b}{\sqrt{a^{2}+b^{2}}},1\right). One can recognize it as a sort of gradient of the norm. Then a stronger statement is

(SEHC2) “If 0<,´0<\not{N},\acute{\not{N}} and +´π\not{N}+\acute{\not{N}}\leq\pi then

(ΞPHlogSEH,´(tcosθ,tsinθ,(1t)))\nabla\left(\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SEH}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)

and

(99) (ΞPHlogSEH,´(tcosθ,tsinθ,(1t)))θ××(ΞPHlogSEH,´(tcosθ,tsinθ,(1t)))t\frac{\partial\left(\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SEH}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial\theta}\times\\ \times\frac{\partial\left(\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SEH}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial t}

(the standard vectorial product is meant) are not nonnegatively proportional.”

An even stronger possible statement is as follows. For (a,b,r)\in\mathbb{R}^{2}\times[0,+\infty), let \pi_{(12)3} be defined by \pi_{(12)3}(a,b,r)=(\sqrt{a^{2}+b^{2}},r).

(SEHC3) “If 0<,´0<\not{N},\acute{\not{N}} and +´π\not{N}+\acute{\not{N}}\leq\pi, then the Jacobian

(π(12)3ΞPHlogSEH,´(tcosθ,tsinθ,(1t)))(t,θ)\frac{\partial\left(\pi_{(12)3}\circ\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SEH}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial(t,\theta)}

is nonvanishing.”

This latter statement can be established in particular cases (for \not{N} and ´\acute{\not{N}}) but I do not know a general argument.

Remark 9.16.

For SE\operatorname{SE},

0(π(12)3ΞPHlogSE,´(tcosθ,tsinθ,(1t)))(t,θ)(sinθ)(cosθ)(1t)AC(Cos(b2r2)cosb^N´a^2+b^2bSin(b2r2)sinb^N´a^2+b^2)2+0\ll\frac{\dfrac{\partial\left(\pi_{(12)3}\circ\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SE}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial(t,\theta)}}{(\sin\theta)(\cos\theta)(1-t)\operatorname{AC}\left(\operatorname{\not{\mathrm{C}}os}(b^{2}-r^{2})\cos\frac{\hat{b}\acute{N}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}-b\operatorname{\not{\mathrm{S}}in}(b^{2}-r^{2})\sin\frac{\hat{b}\acute{N}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\right)^{2}}\ll+\infty

seems to be the case (for \not{N} and ´\acute{\not{N}} fixed). For SH\operatorname{SH},

0\ll\frac{\dfrac{\partial\left(\pi_{(12)3}\circ\Xi^{\mathrm{PH}}\circ\log\circ\operatorname{SH}_{\not{N},\acute{\not{N}}}(\not{N}t\cos\theta,\not{N}t\sin\theta,\not{N}(1-t))\right)}{\partial(t,\theta)}}{(\sin\theta)(\cos\theta)(t^{2}+(\cos\theta)^{2})}\cdot\sqrt{t^{2}+(\sin\theta)^{2}}\ll+\infty

seems to hold. These estimates can be established in several special cases. ∎

10. The balanced critical BCH case

Let α0(π,π)\alpha_{0}\in(-\pi,\pi). Consider

πα02,π+α02={log(exp(A)exp(B)):A,BM2(),A2πα02,B2π+α02}.\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}=\left\{\log(\exp(A)\exp(B))\,:\,A,B\in\mathrm{M}_{2}(\mathbb{R}),\|A\|_{2}\leq\frac{\pi-\alpha_{0}}{2},\|B\|_{2}\leq\frac{\pi+\alpha_{0}}{2}\right\}.

This is not a closed set. The reason is that log\log is not defined at Id-\operatorname{Id}, thus the usual compactness argument does not apply. This failure in well-definedness affects only two cases, (A,B)=(πα02I~,π+α02I~)(A,B)=\left(\frac{\pi-\alpha_{0}}{2}\tilde{I},\frac{\pi+\alpha_{0}}{2}\tilde{I}\right) and (A,B)=(πα02I~,π+α02I~)(A,B)=\left(-\frac{\pi-\alpha_{0}}{2}\tilde{I},-\frac{\pi+\alpha_{0}}{2}\tilde{I}\right). Yet, the closure πα02,π+α02¯\overline{\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}} is larger than πα02,π+α02\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}} by several quasi log\log-s of Id2-\operatorname{Id}_{2}, as Theorem 7.1 shows. (In this setting, log(exp(A)exp(B))\log(\exp(A)\exp(B)) is the same as BCH(A,B)\operatorname{BCH}(A,B) except the latter one is well-defined everywhere but still non-continuous at the critical points. Thus, a quite legitimate version of the set above is its union with {πI~,πI~}\{\pi\tilde{I},-\pi\tilde{I}\}.)

Theorem 10.1.

(a) The elements of

πα02,π+α02¯πα02,π+α02=πα02,π+α02πα02,π+α02\overline{\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}}\setminus\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}=\partial{\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}}\setminus\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}

are exactly the elements BB of the form

B=bI~+cJ~+dK~B=b\tilde{I}+c\tilde{J}+d\tilde{K}

with

b2c2d2=π2b^{2}-c^{2}-d^{2}=\pi^{2}

and

π|b|+c2+d2ππ|α0|+2cosα02π|α0|2cosα02.\pi\leq|b|+\sqrt{c^{2}+d^{2}}\leq\pi\sqrt{\frac{\pi-|\alpha_{0}|+2\cos\frac{\alpha_{0}}{2}}{\pi-|\alpha_{0}|-2\cos\frac{\alpha_{0}}{2}}}.

(b) The elements of

πα02,π+α02πα02,π+α02\partial{\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}}\cap\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}

are all of the form log((expA)(expB))\log((\exp A)(\exp B)) with A2=πα02\|A\|_{2}=\frac{\pi-\alpha_{0}}{2} and B2=π+α02\|B\|_{2}=\frac{\pi+\alpha_{0}}{2} such that (A,B)(A,B) is a BCH minimal pair.

(c) The interior of πα02,π+α02\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}} is connected, containing 0.

Proof.

(a) The set in question is closed under conjugation by orthogonal matrices, and, by continuity of exp\exp, the exponentials of its elements are Id2-\operatorname{Id}_{2}. Then only the norms are in question, but Theorem 7.1 takes care of that. (b), (c) These follow from the openness of exp\exp as long as the domain is restricted to matrices with spectrum in {z:|Imz|<π}\{z\in\mathbb{C}\,:\,|\operatorname{Im}z|<\pi\}. ∎

The set πα02,π+α02\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}} is closed under conjugation by orthogonal rotations, thus it can be visualized through ΞPH\Xi^{\mathrm{PH}}. Then ΞPH(πα02,π+α02)\partial\Xi^{\mathrm{PH}}\left(\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}\right), expectedly a 22-dimensional object, describes (πα02,π+α02)\partial\left(\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}\right). In the previous section we have devised several restrictions for (infinitesimally) BCH minimal pairs. In fact, we find that ΞPH(πα02,π+α02)\partial\Xi^{\mathrm{PH}}\left(\mathcal{B}_{\frac{\pi-\alpha_{0}}{2},\frac{\pi+\alpha_{0}}{2}}\right) must be contained in the union of continuous images of finitely many, at most 22 dimensional, manifolds, which can be described explicitly. Despite this, computation with these objects is tedious.

Therefore we restrict our attention to the case π2,π2\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}} (i. e. α0=0\alpha_{0}=0), which is the most important for us. Before giving any argument, let us show, in Figure 10 below, what we will obtain:

Refer to caption
Fig. 10(a) ΞPHπ2,π2\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}}

\phantomcaption
Refer to caption
10(b) xzxz-projection
Refer to caption
10(c) yzyz-projection
\phantomcaption

ΞPHπ2,π2\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}} is a “wedge cap”. What we see is the following (note that π/2\not{N}\equiv\pi/2 here):

\bullet In the middle top, we see the Schur bihyperbolic ridge, the image of the (infinitesimally) minimal pairs from hyp𝒮N×hyp𝒮N\partial^{\mathrm{hyp}}\mathcal{S}_{\not{N}}\times\partial^{\mathrm{hyp}}\mathcal{S}_{\not{N}}.

\bullet Joining it, we see the Schur parabolic segments (in the xzxz-plane), the image of the (infinitesimally) minimal pairs from ðpar𝒮N×ðpar𝒮N\eth^{\mathrm{par}}\mathcal{S}_{\not{N}}\times\eth^{\mathrm{par}}\mathcal{S}_{\not{N}}.

\bullet The middle ridge is the Schur elliptic-hyperbolic ridge, the image of the (infinitesimally) minimal pairs from ell𝒮N×hyp𝒮N\partial^{\mathrm{ell}}\mathcal{S}_{\not{N}}\times\partial^{\mathrm{hyp}}\mathcal{S}_{\not{N}} or hyp𝒮N×ell𝒮N\partial^{\mathrm{hyp}}\mathcal{S}_{\not{N}}\times\partial^{\mathrm{ell}}\mathcal{S}_{\not{N}} (transposition invariance shows equality).

\bullet In the front and back, we see the closure singularity segments (in the yzyz-plane), the images coming from limiting to ±(π2I~,π2I~)\pm(\frac{\pi}{2}\tilde{I},\frac{\pi}{2}\tilde{I}).

\bullet In the very bottom, we see the Schur bielliptic rim, the image of the (infinitesimally) minimal pairs from ell1𝒮N×ell1𝒮N\partial^{\mathrm{ell1}}\mathcal{S}_{\not{N}}\times\partial^{\mathrm{ell1}}\mathcal{S}_{\not{N}} and ell𝒮N×ell𝒮N\partial^{\mathrm{ell*}}\mathcal{S}_{\not{N}}\times\partial^{\mathrm{ell*}}\mathcal{S}_{\not{N}}, except for ±(π2I~,π2I~)\pm(\frac{\pi}{2}\tilde{I},\frac{\pi}{2}\tilde{I}), which are exceptional (that is, the former case is eliminated there).

\bullet The upper, blue area is the Schur smooth-hyperbolic area, the image of the (infinitesimally) minimal pairs from 𝒮nnN×hyp𝒮N\mathcal{S}^{\mathrm{nn}}_{\not{N}}\times\partial^{\mathrm{hyp}}\mathcal{S}_{\not{N}} or hyp𝒮N×𝒮nnN\partial^{\mathrm{hyp}}\mathcal{S}_{\not{N}}\times\mathcal{S}^{\mathrm{nn}}_{\not{N}}. (Again, note transposition invariance.)

\bullet The lower, red area is the Schur smooth-elliptic area, the image of the (infinitesimally) minimal pairs from 𝒮nnN×ell𝒮N\mathcal{S}^{\mathrm{nn}}_{\not{N}}\times\partial^{\mathrm{ell}}\mathcal{S}_{\not{N}} or ell𝒮N×𝒮nnN\partial^{\mathrm{ell}}\mathcal{S}_{\not{N}}\times\mathcal{S}^{\mathrm{nn}}_{\not{N}}.

Remark 10.2.

The situation with ΞPH,\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\not{N},\not{N}} for 0<<π20<\not{N}<\frac{\pi}{2} is similar, cf. Figure 10.2,

Refer to caption
Fig. 10.2(a) ΞPHπ3,π3\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\frac{\pi}{3},\frac{\pi}{3}}
\phantomcaption
Refer to caption
10.2(b) xzxz-projection
Refer to caption
10.2(c) yzyz-projection
\phantomcaption

except the “closure” singularity does not develop. ∎

Let us now make the “statement” of Figure 10 more precise. Let us fix the choice =π/2\not{N}=\pi/2.

\bullet We define the Schur bihyperbolic parametrization as the map

ψ[2arctan,2arctan]ΞPHlog(exp(J~)exp(((cosψ)J~+(sinψ)K~))).\psi\in\left[-2\arctan\not{N},2\arctan\not{N}\right]\mapsto\Xi^{\mathrm{PH}}\log\left(\exp\left(\not{N}\tilde{J}\right)\exp\left(\not{N}((\cos\psi)\tilde{J}+(\sin\psi)\tilde{K})\right)\right).

\bullet We define the Schur parabolic parametrization as the map

(σ,t){1,1}×(1,1)ΞPH(σ[1t]).(\sigma,t)\in\{1,-1\}\times(-1,1)\mapsto\Xi^{\mathrm{PH}}\left(\sigma\cdot\not{N}\cdot\begin{bmatrix}1&\\ &t\end{bmatrix}\right).

\bullet We define the Schur elliptic*-hyperbolic parametrization as the map

ψ(0,π)(π,2π)ΞPHlog(exp(((cosψ)Id~2+(sinψ)I~))exp(J~)).\psi\in(0,\pi)\cup(\pi,2\pi)\mapsto\Xi^{\mathrm{PH}}\log\left(\exp\left(\not{N}((\cos\psi)\tilde{\operatorname{Id}}_{2}+(\sin\psi)\tilde{I})\right)\exp\left(\not{N}\tilde{J}\right)\right).

\bullet We define the closure parametrization as the map

(σ,r){1,1}×[0,2ππ24](0,σπ2+r2,r).(\sigma,r)\in\{-1,1\}\times\left[0,\frac{2\pi}{\sqrt{\pi^{2}-4}}\right]\mapsto\left(0,\sigma\cdot\sqrt{\pi^{2}+r^{2}},r\right).

\bullet We define the Schur bielliptic parametrization as the map

ψ(π2,π2)(π2,3π2)ΞPH(((cosψ)Id~2+(sinψ)I~)).\psi\in\left(-\frac{\pi}{2},\frac{\pi}{2}\right)\cup\left(\frac{\pi}{2},\frac{3\pi}{2}\right)\mapsto\Xi^{\mathrm{PH}}\left(\not{N}((\cos\psi)\tilde{\operatorname{Id}}_{2}+(\sin\psi)\tilde{I})\right).

\bullet We define the Schur smooth-hyperbolic parametrization as the map

(a,b,r)XNΞPHlog(exp(aId2+bI~+rJ~)exp((c˘2+d˘2(sinhcoshb^)2Id2sinhcoshb^I~)c˘Id~2+d˘I~c˘2+d˘2J~)).(a,b,r)\in X_{\not}{N}\mapsto\Xi^{\mathrm{PH}}\log\Biggl{(}\exp\left(a\operatorname{Id}_{2}+b\tilde{I}+r\tilde{J}\right)\cdot\\ \cdot\exp\left(\left(\sqrt{\breve{c}^{2}+\breve{d}^{2}-\left(\frac{\sinh\not{N}}{\cosh\not{N}}\hat{b}\right)^{2}}\operatorname{Id}_{2}-\frac{\sinh\not{N}}{\cosh\not{N}}\hat{b}\tilde{I}\right)\frac{\breve{c}\tilde{\operatorname{Id}}_{2}+\breve{d}\tilde{I}}{\breve{c}^{2}+\breve{d}^{2}}\not{N}\tilde{J}\right)\Biggr{)}.

(Here we have used the abbreviations of Lemma 8.2.)

\bullet Similarly, we define the Schur smooth-elliptic parametrization as the map

(a,b,r)XNΞPHlog(exp(aId2+bI~+rJ~)exp(a^Id2+b^I~a^2+b^2)).(a,b,r)\in X_{\not}{N}\mapsto\Xi^{\mathrm{PH}}\log\left(\exp\left(a\operatorname{Id}_{2}+b\tilde{I}+r\tilde{J}\right)\exp\left(\frac{\hat{a}\operatorname{Id}_{2}+\hat{b}\tilde{I}}{\sqrt{\hat{a}^{2}+\hat{b}^{2}}}\not{N}\right)\right).

We will call these maps the canonical parametrizations in the (π2,π2)\left(\frac{\pi}{2},\frac{\pi}{2}\right) case.

Remark 10.3.

For 0<<π20<\not{N}<\frac{\pi}{2}, the canonical parametrizations can be defined similarly in the (,)\left(\not{N},\not{N}\right) case, except the situation is simpler: there is no closure parametrization, but the Schur bielliptic parametrization can be defined fully for [0,2π]mod2π[0,2\pi]\operatorname{\,mod\,}2\pi. ∎

Theorem 10.4.

(a) Every element of ΞPHπ2,π2\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}} occurs as the image of a BCH minimal pair with respect to the norms 1=2=π/2\not{N}_{1}=\not{N}_{2}=\pi/2, with the exception of the closure singularities. Conversely, every image of such a BCH minimal pair or a point of the closure singularity is in ΞPHπ2,π2\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}}.

(b) The canonical parametrizations taken together map to ΞPHπ2,π2\partial\Xi^{\mathrm{PH}}\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}} bijectively, and the images fit together topologically as suggested by Figure 10.

Proof.

First we prove that non BCH minimal points map to the interior. Indeed, at least one component can be replaced by an element of norm less than π/2\pi/2. As the exponential map and the logarithm are open in these circumstances, perturbing that entry yields an open neighborhood in the image. This proves the first part of (a). The second part, and, in fact, the rest of the statement, follows by topological reasons if we prove that the images of the canonical maps fit together topologically as a half-sphere. For the 1-dimensional canonical parametrizations, injectivity and the topological incidences are easy to check. Next, one can prove that the 2-dimensional parametrizations limit on their boundary in the expected manner. In that, blow-ups in ACKB or AHP are instrumental; however, at the pairs ±(π2I~,π2I~)\pm(\frac{\pi}{2}\tilde{I},\frac{\pi}{2}\tilde{I}), continuity (in fact, well-definedness) breaks down. Yet, we know that the limits exponentiate to ±πI~\pm\pi\tilde{I}, and we know the range of the norm by Theorem 7.1. By this, the closure singularities can be recovered. Also, the sign relations in (the trace) coordinate xx are easy due to the BCH formula. To the 1-dimensional canonical parametrizations we can add the a=0a=0 cases of the smooth-hyperbolic and smooth-elliptic canonical parametrizations. By the sign relations in xx, it is then sufficient to prove that the yzyz-projections of the a0a\neq 0 parts of the smooth-hyperbolic and smooth-elliptic canonical parametrizations have non-vanishing Jacobians. Thus they fill out the corresponding regions in Figure 10(c) as they should. (That also shows that they fit to the closure singularities properly.) ∎

Theorem 10.5.
G(π2)π2expπ2=(supA,BM2(),A2=B2π/2log((expA)(expB))2).G\left(\frac{\pi}{2}\right)\equiv\frac{\pi}{2}\exp\frac{\pi}{2}=\left(\sup_{\begin{subarray}{c}A,B\in\mathrm{M}_{2}(\mathbb{R}),\\ \|A\|_{2}=\|B\|_{2}\leq\pi/2\end{subarray}}\|\log((\exp A)(\exp B))\|_{2}\right).

(Here the exceptional cases A=B=π2I~A=B=\frac{\pi}{2}\tilde{I} and A=B=π2I~A=B=-\frac{\pi}{2}\tilde{I} do not participate in the supremum. But, writing BCH(A,B)\operatorname{BCH}(A,B), they could.)

Indication of proof.

Thus, one has to prove

sup{S2:Sπ2,π2}=π2expπ2.\sup\{\|S\|_{2}\,:\,S\in\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}}\}=\frac{\pi}{2}\exp\frac{\pi}{2}.

According to Theorem 10.1, we can replace π2,π2\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}} by π2,π2\partial\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}}, in fact, by π2,π2π2,π2\partial\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}}\cap\mathcal{B}_{\frac{\pi}{2},\frac{\pi}{2}}. By various estimates, we can localize the maxima near the tips of the horns of Figure 10(a). By further estimates, one shows that the directions of the gradients for the Schur smooth-elliptic parts and Schur smooth-hyperbolic parts near the tips are inconsistent with maxima there. Thus it remains to optimize on the Schur elliptic*-hyperbolic part. There, ultimately, we obtain the tips, as expected, for the maxima. ∎
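The extremal value can be probed numerically. A sketch, assuming the representatives J~=diag(1,1)\tilde{J}=\operatorname{diag}(1,-1) and I~\tilde{I} the rotation generator (an assumption on the basis conventions), with SciPy's expm and principal logm standing in for exp\exp and log\log; the traceless pair (π2J~,π2I~)(\frac{\pi}{2}\tilde{J},\frac{\pi}{2}\tilde{I}) attains the value, while random admissible pairs stay below it:

```python
import numpy as np
from scipy.linalg import expm, logm

I_t = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotation generator
J_t = np.array([[1.0, 0.0], [0.0, -1.0]])  # symmetric traceless involution

bound = np.pi / 2 * np.exp(np.pi / 2)      # (pi/2) e^{pi/2}

# a pair attaining the supremum (elliptic - hyperbolic combination)
P = expm(np.pi / 2 * J_t) @ expm(np.pi / 2 * I_t)
attained = np.linalg.norm(np.real(logm(P)), 2)
print(attained, bound)

# random pairs with ||A||_2 = ||B||_2 = pi/2 stay below the bound
rng = np.random.default_rng(0)
worst = 0.0
for _ in range(500):
    A = rng.standard_normal((2, 2)); A *= (np.pi / 2) / np.linalg.norm(A, 2)
    B = rng.standard_normal((2, 2)); B *= (np.pi / 2) / np.linalg.norm(B, 2)
    L = np.real(logm(expm(A) @ expm(B)))
    worst = max(worst, np.linalg.norm(L, 2))
print(worst)
```

Here PP squares to Id-\operatorname{Id}, so logP=π2P\log P=\frac{\pi}{2}P and logP2=π2eπ/2\|\log P\|_{2}=\frac{\pi}{2}\mathrm{e}^{\pi/2}; the random search is only exploratory evidence for the other direction.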

Remark 10.6.

In the previous proof, the critical part is the reduction to the 1-dimensional boundary pieces. In a less ad hoc way, this can also be achieved by showing, say, (SECH3) for =´N=π2\not{N}=\acute{\not{N}}=\frac{\pi}{2}, which is quite doable as a special case. For the 1-dimensional optimizations, several methods are available. ∎

It seems reasonable to expect that

(SM0) “If 0<,´0<\not{N},\acute{\not{N}} and +´π\not{N}+\acute{\not{N}}\leq\pi, then the maximum of E2\|E\|_{2} for E,´E\in\mathcal{B}_{\not{N},\acute{\not{N}}} is taken for a traceless pair of matrices.”

Indeed, large norm seems to come from non-commutativity, and thus including tracial parts seems to be quite pointless. Again, this is quite provable in several special cases, but I do not know a general proof. It may be possible, however, that a quite simple argument will suffice.

Remark 10.7.

The localizations of the maxima, however, are not trivial even in the case =´N\not{N}=\acute{\not{N}}. Indeed, in this setting, there is a critical value

C0=π0.392744C_{0}=\pi\cdot 0.392744\ldots

such that for 0<=´N<C00<\not{N}=\acute{\not{N}}<C_{0}, the maxima are taken on the Schur bihyperbolic ridge (cf. Figure 10.2); for C0<=´Nπ/2C_{0}<\not{N}=\acute{\not{N}}\leq\pi/2, the maxima are taken on the Schur elliptic*-hyperbolic ridge; and for =´N=C0\not{N}=\acute{\not{N}}=C_{0}, on both. ∎

Thus one can expect that

(SM+) “If 0<,´0<\not{N},\acute{\not{N}} and +´π\not{N}+\acute{\not{N}}\leq\pi but +´\not{N}+\acute{\not{N}} is sufficiently large, then the maximum of E2\|E\|_{2} for E,´E\in\mathcal{B}_{\not{N},\acute{\not{N}}} is taken for a traceless Schur elliptic-hyperbolic pair, where the norm of the conform-involution is less than or equal to the norm of the conform-skew-involution.”

However, this needs further exploration.

Remark 10.8.

If this is true, then Example 6.4 seems to be a sharp case. ∎

11. Principal disks and logarithm

Lemma 11.1.

Suppose that AA is a log\log-able real 2×22\times 2 matrix with principal disk

PD(A)=D(a+ib,r).\operatorname{PD}(A)=\operatorname{D}(a+\mathrm{i}b,r).

In that case,

(100) logA2=fCA(a,b,r)+fRD(a,b,r),\|\log A\|_{2}=f_{\mathrm{CA}}(a,b,r)+f_{\mathrm{RD}}(a,b,r),

and

(101) logA2=fCA(a,b,r)fRD(a,b,r),\left\lfloor\log A\right\rfloor_{2}=f_{\mathrm{CA}}(a,b,r)-f_{\mathrm{RD}}(a,b,r),

where

fCA(a,b,r)=(loga2+b2r2)2+(bAC(aa2+b2r2)a2+b2r2)2f_{\mathrm{CA}}(a,b,r)=\sqrt{\Bigl{(}\log\sqrt{a^{2}+b^{2}-r^{2}}\Bigr{)}^{2}+\left(\frac{b\operatorname{AC}\left(\dfrac{a}{\sqrt{a^{2}+b^{2}-r^{2}}}\right)}{\sqrt{a^{2}+b^{2}-r^{2}}}\right)^{2}}

and

fRD(a,b,r)=rAC(aa2+b2r2)a2+b2r2.f_{\mathrm{RD}}(a,b,r)=\frac{r\operatorname{AC}\left(\dfrac{a}{\sqrt{a^{2}+b^{2}-r^{2}}}\right)}{\sqrt{a^{2}+b^{2}-r^{2}}}.

In particular, if detA=1\det A=1, then a2+b2r2=1a^{2}+b^{2}-r^{2}=1, and fCA(a,b,r)=AC(a)bf_{\mathrm{CA}}(a,b,r)=\operatorname{AC}(a)b, fRD(a,b,r)=AC(a)rf_{\mathrm{RD}}(a,b,r)=\operatorname{AC}(a)r.

Proof.

This is just the combination of (36) and (44)–(48), computed explicitly. ∎

Theorem 11.2.

Suppose that A1,A2A_{1},A_{2} are log\log-able real 2×22\times 2 matrices such that

PD(A1)PD(A2).\operatorname{PD}(A_{1})\subset\operatorname{PD}(A_{2}).

Then

(102) logA12logA22.\|\log A_{1}\|_{2}\leq\|\log A_{2}\|_{2}.

and

(103) logA12logA22.\lfloor\log A_{1}\rfloor_{2}\geq\lfloor\log A_{2}\rfloor_{2}.
Remark.

The monotonicity of 2\|\cdot\|_{2} is strict, except if PD(A1)\operatorname{PD}(A_{1}) and PD(A2)\operatorname{PD}(A_{2}) are centered on the real line and sup{|logx|:xPD(A1)}=sup{|logx|:xPD(A2)}\sup\{|\log x|\,:\,x\in\mathbb{R}\cap\operatorname{PD}(A_{1})\}=\sup\{|\log x|\,:\,x\in\mathbb{R}\cap\operatorname{PD}(A_{2})\}.

The monotonicity of 2\lfloor\cdot\rfloor_{2} is strict, except if PD(A1)\operatorname{PD}(A_{1}) and PD(A2)\operatorname{PD}(A_{2}) are centered on the real line and inf{|logx|:xPD(A1)}=inf{|logx|:xPD(A2)}\inf\{|\log x|\,:\,x\in\mathbb{R}\cap\operatorname{PD}(A_{1})\}=\inf\{|\log x|\,:\,x\in\mathbb{R}\cap\operatorname{PD}(A_{2})\}. ∎

Proof.

Let f(a,b,r)f(a,b,r) denote the functional expression on the right side of (100). Then it is a straightforward but long computation to check the identity

(104) (f(a,b,r)r)2(f(a,b,r)a)2(f(a,b,r)b)2=(f(a,b,r)fCA(a,b,r)bAS(aa2+b2r2)a2+b2r2)2.\left(\frac{\partial f(a,b,r)}{\partial r}\right)^{2}-\left(\frac{\partial f(a,b,r)}{\partial a}\right)^{2}-\left(\frac{\partial f(a,b,r)}{\partial b}\right)^{2}=\left(\frac{f(a,b,r)}{f_{\mathrm{CA}}(a,b,r)}\frac{b\operatorname{AS}\left(\dfrac{a}{\sqrt{a^{2}+b^{2}-r^{2}}}\right)}{{a^{2}+b^{2}-r^{2}}}\right)^{2}.

This is valid, except if b=0b=0 and a=1+r2a=\sqrt{1+r^{2}}, the exceptional configurations. In particular, if b>0b>0, then

(f(a,b,r)r)2(f(a,b,r)a)2(f(a,b,r)b)2>0.\left(\frac{\partial f(a,b,r)}{\partial r}\right)^{2}-\left(\frac{\partial f(a,b,r)}{\partial a}\right)^{2}-\left(\frac{\partial f(a,b,r)}{\partial b}\right)^{2}>0.

The principal disks with b>0b>0 form a connected set, consequently

(105) f(a,b,r)r>(f(a,b,r)a)2+(f(a,b,r)b)2\frac{\partial f(a,b,r)}{\partial r}>\sqrt{\left(\frac{\partial f(a,b,r)}{\partial a}\right)^{2}+\left(\frac{\partial f(a,b,r)}{\partial b}\right)^{2}}

or

f(a,b,r)r<(f(a,b,r)a)2+(f(a,b,r)b)2\frac{\partial f(a,b,r)}{\partial r}<-\sqrt{\left(\frac{\partial f(a,b,r)}{\partial a}\right)^{2}+\left(\frac{\partial f(a,b,r)}{\partial b}\right)^{2}}

should hold globally for b>0b>0. The question is: which one? It is sufficient to check the sign of f(a,b,r)r\frac{\partial f(a,b,r)}{\partial r} at a single place. Now, it is not hard to check that

f(a,b,r)r|r=0=AC(aa2+b2)a2+b2\frac{\partial f(a,b,r)}{\partial r}\Bigl{|}_{r=0}=\frac{\operatorname{AC}\left(\frac{a}{\sqrt{a^{2}+b^{2}}}\right)}{\sqrt{a^{2}+b^{2}}}

(except if a=1,b=0a=1,b=0), which shows that (105) holds. The meaning of (105) is that expanding principal disks smoothly with non-real centers leads to growth in the norm of the logarithm.

Let us return to the principal disks Di=PD(Ai)D_{i}=\operatorname{PD}(A_{i}) in the statement. If b1,b2>0b_{1},b_{2}>0, then we can expand the smaller one to the bigger one through non-real centers. (Indeed, magnify D1D_{1} from its lowest point until the perimeters touch, and then magnify from the touching point.) This proves (102) for b1,b2>0b_{1},b_{2}>0. The general statement follows from the continuity of the norm of the logarithm. Notice that the norm grows if we can expand through b>0b>0.

Regarding (103): Let fco(a,b,r)f_{\mathrm{co}}(a,b,r) denote the functional expression on the right side of (101). It satisfies the very same equation (104) but with f(a,b,r)f(a,b,r) replaced by fco(a,b,r)f_{\mathrm{co}}(a,b,r) throughout. However,

fco(a,b,r)r|r=0=AC(aa2+b2)a2+b2.\frac{\partial f_{\mathrm{co}}(a,b,r)}{\partial r}\Bigl{|}_{r=0}=-\frac{\operatorname{AC}\left(\frac{a}{\sqrt{a^{2}+b^{2}}}\right)}{\sqrt{a^{2}+b^{2}}}.

The rest is analogous. ∎
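Identity (104) is stated without the computation; at least inequality (105) can be spot-checked by finite differences. A sketch, assuming AC(x)=arccos(x)/1x2\operatorname{AC}(x)=\arccos(x)/\sqrt{1-x^{2}}, extended across x=1x=1 as arcosh(x)/x21\operatorname{arcosh}(x)/\sqrt{x^{2}-1} (an assumed convention, consistent with the detA=1\det A=1 case of Lemma 11.1); the helpers AC and f are illustrative:

```python
import numpy as np

def AC(x):
    # arccos(x)/sqrt(1-x^2); for x > 1 continued as arccosh(x)/sqrt(x^2-1)
    if x < 1.0:
        return np.arccos(x) / np.sqrt(1.0 - x * x)
    if x == 1.0:
        return 1.0
    return np.arccosh(x) / np.sqrt(x * x - 1.0)

def f(a, b, r):
    # right-hand side of (100): f_CA + f_RD
    q = np.sqrt(a * a + b * b - r * r)
    f_ca = np.sqrt(np.log(q) ** 2 + (b * AC(a / q) / q) ** 2)
    f_rd = r * AC(a / q) / q
    return f_ca + f_rd

h = 1e-6
checks = []
for (a, b, r) in [(0.3, 0.8, 0.4), (1.2, 0.5, 0.6), (0.9, 1.1, 0.2)]:
    fr = (f(a, b, r + h) - f(a, b, r - h)) / (2 * h)
    fa = (f(a + h, b, r) - f(a - h, b, r)) / (2 * h)
    fb = (f(a, b + h, r) - f(a, b - h, r)) / (2 * h)
    checks.append(fr > np.hypot(fa, fb))   # inequality (105), here b > 0
print(checks)
```

The sample points are chosen with b>0b>0 and a2+b2r2>0a^{2}+b^{2}-r^{2}>0, inside the log\log-able range.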

Lemma 11.3.

Suppose that A1,A2A_{1},A_{2} are real 2×22\times 2 matrices. Then

PD(A1)PD(A2)\operatorname{PD}(A_{1})\subset\operatorname{PD}(A_{2})

holds if and only if

A1+λId2A2+λId2 for all λ\|A_{1}+\lambda\operatorname{Id}\|_{2}\leq\|A_{2}+\lambda\operatorname{Id}\|_{2}\qquad\text{ for all }\lambda\in\mathbb{R}

and

A1+λId2A2+λId2 for all λ.\lfloor A_{1}+\lambda\operatorname{Id}\rfloor_{2}\geq\lfloor A_{2}+\lambda\operatorname{Id}\rfloor_{2}\qquad\text{ for all }\lambda\in\mathbb{R}.
Proof.

The norms and co-norms can be read off from the principal disk immediately. Hence the statement is simple geometry. ∎

Theorem 11.4.

Suppose that A1,A2A_{1},A_{2} are log\log-able real 2×22\times 2 matrices. If

PD(A1)PD(A2),\operatorname{PD}(A_{1})\subset\operatorname{PD}(A_{2}),

then

PD(logA1)PD(logA2).\operatorname{PD}(\log A_{1})\subset\operatorname{PD}(\log A_{2}).

The monotonicity is strict. A similar statement applies to CD\operatorname{CD}.

Proof.

In this case, the matrices eλAi\mathrm{e}^{\lambda}A_{i} will also be log\log-able. Moreover, PD(eλA1)PD(eλA2)\operatorname{PD}(\mathrm{e}^{\lambda}A_{1})\subset\operatorname{PD}(\mathrm{e}^{\lambda}A_{2}) holds. Now, log(eλAi)=logAi+λId\log(\mathrm{e}^{\lambda}A_{i})=\log A_{i}+\lambda\operatorname{Id}. By the previous theorem, logA1+λId2logA2+λId2\|\log A_{1}+\lambda\operatorname{Id}\|_{2}\leq\|\log A_{2}+\lambda\operatorname{Id}\|_{2} and logA1+λId2logA2+λId2\lfloor\log A_{1}+\lambda\operatorname{Id}\rfloor_{2}\geq\lfloor\log A_{2}+\lambda\operatorname{Id}\rfloor_{2} hold for every λ\lambda\in\mathbb{R}. According to the previous lemma, this implies the main statement. The strictness of the monotonicity is transparent in this case, as both log\log and exp\exp are compatible with conjugation by orthogonal matrices, hence the orbit correspondence is one-to-one. Since log\log respects chirality, the statement can also be transferred to chiral disks. ∎

12. Examples: The canonical Magnus developments in SL2()\operatorname{SL}_{2}(\mathbb{R})

Example 12.1.

(Magnus parabolic development.) On the interval [0,π][0,\pi], consider again the measure Φ\Phi, such that

Φ(θ)=[sin2θcos2θcos2θsin2θ]dθ|[0,π].\Phi(\theta)=\begin{bmatrix}-\sin 2\theta&\cos 2\theta\\ \cos 2\theta&\sin 2\theta\end{bmatrix}\,\mathrm{d}\theta|_{[0,\pi]}.

Then, for p[0,π)p\in[0,\pi),

Φ|[0,p]2=p.\int\|\Phi|_{[0,p]}\|_{2}=p.

Here

expL(Φ|[0,p])=W(p,p)=[cosp2pcospsinpsinp2psinp+cosp]=(cospId+sinpI~)(Id2pI~+pK~).\operatorname{exp_{L}}(\Phi|_{[0,p]})=W(p,p)=\begin{bmatrix}\cos p&2p\cos p-\sin p\\ \sin p&2p\sin p+\cos p\end{bmatrix}=(\cos p\operatorname{Id}+\sin p\tilde{I})(\operatorname{Id}_{2}-p\tilde{I}+p\tilde{K}).

Thus

μL(Φ|[0,p])=logexpL(Φ|[0,p])=AC(cosp+psinp)[psinp2pcospsinpsinppsinp].\mu_{\mathrm{L}}(\Phi|_{[0,p]})=\log\operatorname{exp_{L}}(\Phi|_{[0,p]})=\operatorname{AC}(\cos p+p\sin p)\begin{bmatrix}-p\sin p&2p\cos p-\sin p\\ \sin p&p\sin p\end{bmatrix}.

Consequently,

μL(Φ|[0,p])2=AC(cosp+psinp)(sinppcosp+p).\|\mu_{\mathrm{L}}(\Phi|_{[0,p]})\|_{2}=\operatorname{AC}(\cos p+p\sin p)\cdot(\sin p-p\cos p+p).

As p0p\searrow 0

(106) μL(Φ|[0,p])2=p+16p3172p5+173024p7+O(p9).\|\mu_{\mathrm{L}}(\Phi|_{[0,p]})\|_{2}=p+{\frac{1}{6}}{p}^{3}-{\frac{1}{72}}{p}^{5}+{\frac{17}{3024}}{p}^{7}+O\left({p}^{9}\right).

As pπp\nearrow\pi,

(107) μL(Φ|[0,p])2=2π3/2(πp)1/22π+2π(π21)4(πp)1/2+O(πp).\|\mu_{\mathrm{L}}(\Phi|_{[0,p]})\|_{2}=\sqrt{2}\pi^{3/2}(\pi-p)^{-1/2}-2\pi+\frac{\sqrt{2\pi}(\pi^{2}-1)}{4}(\pi-p)^{1/2}+O(\pi-p).

This is not only better than (9), but it has the advantage that it can be interpreted in terms of the solution of a differential equation blowing up. ∎
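The closed form for μL(Φ|[0,p])2\|\mu_{\mathrm{L}}(\Phi|_{[0,p]})\|_{2} can be compared against a direct matrix logarithm. A sketch, using SciPy's principal logm and assuming the same AC\operatorname{AC} convention as above (namely AC(x)=arccos(x)/1x2\operatorname{AC}(x)=\arccos(x)/\sqrt{1-x^{2}}, continued analytically past x=1x=1):

```python
import numpy as np
from scipy.linalg import logm

def AC(x):
    # arccos(x)/sqrt(1-x^2); for x > 1 continued as arccosh(x)/sqrt(x^2-1)
    if x < 1.0:
        return np.arccos(x) / np.sqrt(1.0 - x * x)
    if x == 1.0:
        return 1.0
    return np.arccosh(x) / np.sqrt(x * x - 1.0)

max_err = 0.0
for p in [0.4, 1.0, 1.8, 2.7]:
    W = np.array([[np.cos(p), 2*p*np.cos(p) - np.sin(p)],
                  [np.sin(p), 2*p*np.sin(p) + np.cos(p)]])
    lhs = np.linalg.norm(np.real(logm(W)), 2)
    rhs = AC(np.cos(p) + p*np.sin(p)) * (np.sin(p) - p*np.cos(p) + p)
    max_err = max(max_err, abs(lhs - rhs))
print(max_err)
```

The chosen values of pp cover both the hyperbolic regime (cosp+psinp>1\cos p+p\sin p>1) and the elliptic one.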

Example 12.2.

(Magnus elliptic development.) Let h[0,1]h\in[0,1] be a parameter. On the interval [0,π][0,\pi], consider the measure Φ^h\widehat{\Phi}_{h} such that

Φ^h(θ)=(1h)[11]+h[sin2θcos2θcos2θsin2θ]dθ|[0,π].\widehat{\Phi}_{h}(\theta)=(1-h)\begin{bmatrix}&-1\\ 1&\end{bmatrix}+h\begin{bmatrix}-\sin 2\theta&\cos 2\theta\\ \cos 2\theta&\sin 2\theta\end{bmatrix}\,\mathrm{d}\theta|_{[0,\pi]}.

Then, for p[0,π)p\in[0,\pi)

Φ^h|[0,p]2=p.\int\|\widehat{\Phi}_{h}|_{[0,p]}\|_{2}=p.

It is easy to see that

expL(Φ^h|[0,p])=F((1h)p,hp,p)==[cosp2wcospsinpsinp2wsinp+cosp]=(cospId+sinpI~)(Id2wI~+wK~).\operatorname{exp_{L}}(\widehat{\Phi}_{h}|_{[0,p]})=F((1-h)p,hp,p)=\\ =\begin{bmatrix}\cos p&2w\cos p-\sin p\\ \sin p&2w\sin p+\cos p\end{bmatrix}=(\cos p\operatorname{Id}+\sin p\tilde{I})(\operatorname{Id}_{2}-w\tilde{I}+w\tilde{K}).

Here w=hpw=hp in the matrix above, and Φ^1=Φ\widehat{\Phi}_{1}=\Phi. We find that

μL(Φ^h|[0,p])2=logexpL(Φ^h|[0,p])2=AC(cosp+hpsinp)(sinphpcosp+hp).\|\mu_{\mathrm{L}}(\widehat{\Phi}_{h}|_{[0,p]})\|_{2}=\|\log\operatorname{exp_{L}}(\widehat{\Phi}_{h}|_{[0,p]})\|_{2}=\operatorname{AC}(\cos p+hp\sin p)\cdot(\sin p-hp\cos p+hp).

Thus, if h0h\neq 0, then

limpπμL(Φ^h|[0,p])2=+.\lim_{p\nearrow\pi}\|\mu_{\mathrm{L}}(\widehat{\Phi}_{h}|_{[0,p]})\|_{2}=+\infty.

It is notable that

CD(expL(Φ^h|[0,p]))=D(eipieipph,ph),\operatorname{CD}(\operatorname{exp_{L}}(\widehat{\Phi}_{h}|_{[0,p]}))=\operatorname{D}(\mathrm{e}^{\mathrm{i}p}-\mathrm{i}\mathrm{e}^{\mathrm{i}p}ph,ph),

which is CD(expL(Φ|[0,p]))\operatorname{CD}(\operatorname{exp_{L}}(\Phi|_{[0,p]})) contracted from the boundary point eip\mathrm{e}^{\mathrm{i}p} by factor hh. ∎

Example 12.3.

(Magnus hyperbolic development.) More generally, let tt be a real parameter. On the interval [0,π][0,\pi] consider the measure Φsint\Phi_{\sin t}, such that

Φsint(θ)=[sin2(θsint)cos2(θsint)cos2(θsint)sin2(θsint)]dθ|[0,π].\Phi_{\sin t}(\theta)=\begin{bmatrix}-\sin 2(\theta\sin t)&\cos 2(\theta\sin t)\\ \cos 2(\theta\sin t)&\sin 2(\theta\sin t)\end{bmatrix}\,\mathrm{d}\theta|_{[0,\pi]}.

Then, for p[0,π)p\in[0,\pi)

Φsint|[0,p]2=p.\int\|\Phi_{\sin t}|_{[0,p]}\|_{2}=p.

Φ1\Phi_{1} is the same as Φ\Phi, and Φ1=K~Φ1K~\Phi_{-1}=\tilde{K}\cdot\Phi_{1}\cdot\tilde{K}. If t(π/2,π/2)t\in(-\pi/2,\pi/2), then

(108) expL(Φsint|[0,p])=W(p,psint)=(cos(psint)Id+sin(psint)I~)(cosh(pcost)Id2+sinh(pcost)cost(sintI~+K~)).\operatorname{exp_{L}}(\Phi_{\sin t}|_{[0,p]})=W(p,p\sin t)\\ =(\cos(p\sin t)\operatorname{Id}+\sin(p\sin t)\tilde{I})\cdot\left(\cosh(p\cos t)\operatorname{Id}_{2}+\frac{\sinh(p\cos t)}{\cos t}\Bigl{(}-\sin t\tilde{I}+\tilde{K}\Bigr{)}\right).

Consequently,

(109) μL(Φsint|[0,p])2=AC(cosh(pcost)cos(psint)+sinh(pcost)costsin(psint)sint)(|cosh(pcost)sin(psint)sinh(pcost)costcos(psint)sint|+sinh(pcost)cost).\|\mu_{\mathrm{L}}(\Phi_{\sin t}|_{[0,p]})\|_{2}=\operatorname{AC}\left({\cosh\left(p\cos t\right)\cos\left(p\sin t\right)+\frac{\sinh\left(p\cos t\right)}{\cos t}\sin\left(p\sin t\right)\sin t}\right)\\ \cdot\left(\left|{\cosh\left(p\cos t\right)\sin\left(p\sin t\right)-\frac{\sinh\left(p\cos t\right)}{\cos t}\cos\left(p\sin t\right)\sin t}\right|+\frac{\sinh\left(p\cos t\right)}{\cos t}\right).

Now, in the special case p/π=sintp/\pi=\sin t, we see that

Φp/π|[0,p]2=p,\int\|\Phi_{p/\pi}|_{[0,p]}\|_{2}=p,

and

(110) μL(Φp/π|[0,p])2=2π3/2(πp)1/22π+2π(4π23)12(πp)1/24π233(πp)1+2(368π2840π245)1440π(πp)3/2+O((πp)2).\|\mu_{\mathrm{L}}(\Phi_{p/\pi}|_{[0,p]})\|_{2}=\sqrt{2}\pi^{3/2}(\pi-p)^{-1/2}-2\pi+\frac{\sqrt{2\pi}(4\pi^{2}-3)}{12}(\pi-p)^{1/2}\\ -\frac{4\pi^{2}-3}{3}(\pi-p)^{1}+\frac{\sqrt{2}(368\pi^{2}-840\pi^{2}-45)}{1440\sqrt{\pi}}(\pi-p)^{3/2}+O((\pi-p)^{2}).

This shows that (107) is not optimal, either. ∎
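Formula (109) can be tested the same way. A sketch, assuming the basis representatives I~=\tilde{I}= [[0,-1],[1,0]] and K~=\tilde{K}= [[0,1],[1,0]] (a convention consistent with the closed product form (108)), and the same AC\operatorname{AC} convention as before:

```python
import numpy as np
from scipy.linalg import logm

Id  = np.eye(2)
I_t = np.array([[0., -1.], [1., 0.]])   # assumed representative of I~
K_t = np.array([[0.,  1.], [1., 0.]])   # assumed representative of K~

def AC(x):
    if x < 1.0:
        return np.arccos(x) / np.sqrt(1.0 - x * x)
    if x == 1.0:
        return 1.0
    return np.arccosh(x) / np.sqrt(x * x - 1.0)

max_err = 0.0
for p in [0.7, 1.5, 2.9]:
    for t in [0.2, 0.9, 1.3]:
        c, s = np.cos(t), np.sin(t)
        # right-hand side of (108)
        M = (np.cos(p*s)*Id + np.sin(p*s)*I_t) @ \
            (np.cosh(p*c)*Id + np.sinh(p*c)/c * (-s*I_t + K_t))
        lhs = np.linalg.norm(np.real(logm(M)), 2)
        sigma = np.cosh(p*c)*np.cos(p*s) + np.sinh(p*c)/c * np.sin(p*s)*s
        rhs = AC(sigma) * (abs(np.cosh(p*c)*np.sin(p*s)
                               - np.sinh(p*c)/c * np.cos(p*s)*s)
                           + np.sinh(p*c)/c)
        max_err = max(max_err, abs(lhs - rhs))
print(max_err)
```

Both factors of (108) have determinant 11, so MSL2()M\in\operatorname{SL}_{2}(\mathbb{R}) and the principal logm applies throughout the sampled range.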

Remark 12.4.

For t[0,π/2]t\in[0,\pi/2] and p[0,π]p\in[0,\pi],

cosh(pcost)sin(psint)sinh(pcost)costcos(psint)sint0{\cosh\left(p\cos t\right)\sin\left(p\sin t\right)-\frac{\sinh\left(p\cos t\right)}{\cos t}\cos\left(p\sin t\right)\sin t}\geq 0

holds; for negative tt, the expression is odd in tt, hence it changes sign. (It is understood as =0=0 for t=0t=0; it is also =0=0 for p=0p=0.) Thus, under these assumptions, the absolute value in (109) is unnecessary. ∎
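The nonnegativity can be grid-checked over t[0,π/2]t\in[0,\pi/2]; the expression is odd in tt, so this subrange determines the sign in general. A sketch (the helper g is illustrative):

```python
import numpy as np

def g(p, t):
    c, s = np.cos(t), np.sin(t)
    sc = p if c == 0.0 else np.sinh(p * c) / c   # sinh(pc)/c -> p as c -> 0
    return np.cosh(p * c) * np.sin(p * s) - sc * np.cos(p * s) * s

vals = [g(p, t)
        for p in np.linspace(0.0, np.pi, 60)
        for t in np.linspace(0.0, np.pi / 2, 60)]
print(min(vals))
```

The minimum over the grid stays at (numerical) zero, attained along p=0p=0 and t=0t=0.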

13. Magnus minimality in the GL2+()\operatorname{GL}_{2}^{+}(\mathbb{R}) case

Theorem 13.1.

Let p(0,π)p\in(0,\pi). Consider the family of disks parameterized by t[π/2,π/2]t\in[-\pi/2,\pi/2], such that the centers and radii are

Ωp(t)=eipsint(cosh(pcost)isinh(pcost)sintcost),\Omega_{p}(t)=\mathrm{e}^{\mathrm{i}p\sin t}\left(\cosh(p\cos t)-\mathrm{i}\frac{\sinh(p\cos t)\sin t}{\cos t}\right),
ωp(t)=sinh(pcost)cost,\omega_{p}(t)=\frac{\sinh(p\cos t)}{\cos t},

for t±π/2t\neq\pm\pi/2; and

Ωp(±π/2)=(cosp+psinp)±i(sinppcosp),\Omega_{p}(\pm\pi/2)=(\cos p+p\sin p)\pm\mathrm{i}(\sin p-p\cos p),
ωp(±π/2)=p.\omega_{p}(\pm\pi/2)=p.

(a) The circle D(Ωp(t),ωp(t))\partial\operatorname{D}(\Omega_{p}(t),\omega_{p}(t)) is tangent to expD(0,p)\partial\exp\operatorname{D}(0,p) at

γp(t)=epcost+ipsintandγp(πtmod2π)=epcost+ipsint.\gamma_{p}(t)=\mathrm{e}^{p\cos t+\mathrm{i}p\sin t}\qquad\text{and}\qquad\gamma_{p}(\pi-t\operatorname{\,mod\,}2\pi)=\mathrm{e}^{-p\cos t+\mathrm{i}p\sin t}.

These points are inverses of each other relative to the unit circle. If the points are equal (t=±π/2t=\pm\pi/2), then the disk is the osculating disk at γp(t)\gamma_{p}(t).

The disks themselves are orthogonal to the unit circle. The disks are distinct from each other. Extending t[π,π]t\in[-\pi,\pi], we have Ωp(t)=Ωp(πtmod2π)\Omega_{p}(t)=\Omega_{p}(\pi-t\operatorname{\,mod\,}2\pi), ωp(t)=ωp(πtmod2π)\omega_{p}(t)=\omega_{p}(\pi-t\operatorname{\,mod\,}2\pi).

(b)

CD(expL(Φsint|[0,p]))=CD(W(p,psint))=D(Ωp(t),ωp(t)).\operatorname{CD}(\operatorname{exp_{L}}(\Phi_{\sin t}|_{[0,p]}))=\operatorname{CD}(W(p,p\sin t))=\operatorname{D}(\Omega_{p}(t),\omega_{p}(t)).

(c) The disks D(Ωp(t),ωp(t))\operatorname{D}(\Omega_{p}(t),\omega_{p}(t)) are the maximal disks in expD(0,p)\exp\operatorname{D}(0,p). The maximal disk D(Ωp(t),ωp(t))\operatorname{D}(\Omega_{p}(t),\omega_{p}(t)) touches expD(0,p)\partial\exp\operatorname{D}(0,p) only at γp(t)\gamma_{p}(t), γp(πtmod2π)\gamma_{p}(\pi-t\operatorname{\,mod\,}2\pi).

Proof.

(a) The disks are distinct because the centers are distinct: for t(π/2,π/2)t\in(-\pi/2,\pi/2),

dargΩp(t)dt=ImdlogΩp(t)dt=(pcostcosh(pcost)sinh(pcost))cosh(pcost)cosh(pcost)2sin2t>0.\frac{\mathrm{d}\arg\Omega_{p}(t)}{\mathrm{d}t}=\operatorname{Im}\frac{\mathrm{d}\log\Omega_{p}(t)}{\mathrm{d}t}=\frac{(p\cos t\cosh(p\cos t)-\sinh(p\cos t))\cosh(p\cos t)}{\cosh(p\cos t)^{2}-\sin^{2}t}>0.

(Cf. 0xysinhydy=xcoshxsinhx.\int_{0}^{x}y\sinh y\mathrm{d}y=x\cosh x-\sinh x.) The rest can easily be checked using the observation

Ωp(t)=epcost+ipsintsinh(pcost)costei(t+psint)=epcost+ipsint+sinh(pcost)costei(t+psint).\Omega_{p}(t)=\mathrm{e}^{p\cos t+\mathrm{i}p\sin t}-\frac{\sinh(p\cos t)}{\cos t}\mathrm{e}^{\mathrm{i}(t+p\sin t)}=\mathrm{e}^{-p\cos t+\mathrm{i}p\sin t}+\frac{\sinh(p\cos t)}{\cos t}\mathrm{e}^{\mathrm{i}(-t+p\sin t)}.

(b) This is direct computation.

(c) In general, maximal disks touch the boundary curve γp\gamma_{p}, and any such touching point determines the maximal disk. (But a maximal disk might belong to different points.) Due to the double tangent / osculating property, the given disks are surely the maximal disks, once we prove that they are indeed contained in expD(0,p)\exp\operatorname{D}(0,p). However, CD(expL(Φsint|[0,p]))=D(Ωp(t),ωp(t))\operatorname{CD}(\operatorname{exp_{L}}(\Phi_{\sin t}|_{[0,p]}))=\operatorname{D}(\Omega_{p}(t),\omega_{p}(t)) together with Theorem 1.2 implies that D(Ωp(t),ωp(t))expD(0,p)\operatorname{D}(\Omega_{p}(t),\omega_{p}(t))\subset\exp\operatorname{D}(0,p). The distinctness of the circles implies that they touch the boundary only at the indicated points. ∎
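The tangency and orthogonality claims of part (a) reduce to the displayed identities for Ωp(t)\Omega_{p}(t); they can be spot-checked numerically. A sketch (the helper disk is illustrative):

```python
import numpy as np

def disk(p, t):
    # center and radius from Theorem 13.1, for t != +-pi/2
    c, s = np.cos(t), np.sin(t)
    Omega = np.exp(1j * p * s) * (np.cosh(p * c) - 1j * np.sinh(p * c) * s / c)
    omega = np.sinh(p * c) / c
    return Omega, omega

checks = []
for p in [0.5, 1.3, 2.8]:
    for t in [-1.2, -0.4, 0.0, 0.7, 1.4]:
        Om, om = disk(p, t)
        g1 = np.exp( p * np.cos(t) + 1j * p * np.sin(t))   # gamma_p(t)
        g2 = np.exp(-p * np.cos(t) + 1j * p * np.sin(t))   # the inverse point
        checks.append(abs(abs(g1 - Om) - om) < 1e-9)       # circle meets gamma_p(t)
        checks.append(abs(abs(g2 - Om) - om) < 1e-9)       # ... and the inverse point
        checks.append(abs(abs(Om) ** 2 - (1 + om ** 2)) < 1e-9)  # orthogonal to unit circle
print(all(checks))
```

The last check encodes orthogonality to the unit circle: a circle with center Ω\Omega and radius ω\omega is orthogonal to it exactly when |Ω|2=1+ω2|\Omega|^{2}=1+\omega^{2}.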

Alternative proof for D(Ωp(t),ωp(t))expD(0,p)\operatorname{D}(\Omega_{p}(t),\omega_{p}(t))\subset\exp\operatorname{D}(0,p).

Here we give a purely differential geometric argument.

One can see that the given disks D(Ωp(t),ωp(t))\operatorname{D}(\Omega_{p}(t),\omega_{p}(t)) are characterized by the following properties:

(α\alpha) If γp(t)γp(πtmod2π)\gamma_{p}(t)\neq\gamma_{p}(\pi-t\operatorname{\,mod\,}2\pi), then the disk is tangent to γp\gamma_{p} at these points.

(β\beta) If γp(t)=γp(πtmod2π)\gamma_{p}(t)=\gamma_{p}(\pi-t\operatorname{\,mod\,}2\pi), i. e. t=±π/2t=\pm\pi/2, then the disk is the osculating disk at γp(±π/2)\gamma_{p}(\pm\pi/2).

Now, we prove that D(Ωp(t),ωp(t))expD(0,p)\operatorname{D}(\Omega_{p}(t),\omega_{p}(t))\subset\exp\operatorname{D}(0,p). First, we show that D(Ωp(0),ωp(0))expD(0,p)\operatorname{D}(\Omega_{p}(0),\omega_{p}(0))\subset\exp\operatorname{D}(0,p). Indeed,

D(Ωp(0),ωp(0))=PD([epep]);\operatorname{D}(\Omega_{p}(0),\omega_{p}(0))=\operatorname{PD}\left(\begin{bmatrix}\mathrm{e}^{p}&\\ &\mathrm{e}^{-p}\end{bmatrix}\right);

hence, by Theorem 1.2, the log\log of any element of D(Ωp(0),ωp(0))\operatorname{D}(\Omega_{p}(0),\omega_{p}(0)) is contained in

PD(log[epep])=PD([pp])=D(0,p).\operatorname{PD}\left(\log\begin{bmatrix}\mathrm{e}^{p}&\\ &\mathrm{e}^{-p}\end{bmatrix}\right)=\operatorname{PD}\left(\begin{bmatrix}p&\\ &{-p}\end{bmatrix}\right)=\operatorname{D}(0,p).

Let LL be the maximal real number such that D(Ωp(t),ωp(t))expD(0,p)\operatorname{D}(\Omega_{p}(t),\omega_{p}(t))\subset\exp\operatorname{D}(0,p) for any t[L,L]t\in[-L,L]; here Lπ/2L\leq\pi/2. (Due to continuity, there is such a maximal LL.) Indirectly, assume that L<π/2L<\pi/2. Then one of the following should happen:

(i) Besides γp(L)\gamma_{p}(L) and γp(πLmod2π)\gamma_{p}(\pi-L\operatorname{\,mod\,}2\pi) there is another pair (due to inversion symmetry) of distinct points γp(L~)\gamma_{p}(\tilde{L}) and γp(πL~mod2π)\gamma_{p}(\pi-\tilde{L}\operatorname{\,mod\,}2\pi), where D(Ωp(L),ωp(L))\operatorname{D}(\Omega_{p}(L),\omega_{p}(L)) touches the boundary of expD(0,p)\exp\operatorname{D}(0,p).

(ii) D(Ωp(L),ωp(L))\operatorname{D}(\Omega_{p}(L),\omega_{p}(L)) touches the boundary at γp(π/2)\gamma_{p}(\pi/2) or γp(π/2)\gamma_{p}(-\pi/2).

(iii) D(Ωp(L),ωp(L))\operatorname{D}(\Omega_{p}(L),\omega_{p}(L)) is osculating at γp(L)\gamma_{p}(L) or at γp(πLmod2π)\gamma_{p}(\pi-L\operatorname{\,mod\,}2\pi).

(Symmetry implies that t=±Lt=\pm L are equally bad.) Case (i) is impossible, because the given circles are distinct and the characterising properties hold. Case (ii) is impossible, because, due to ωp(L)>p\omega_{p}(L)>p and the extremality of argγp(±π/2)\arg\gamma_{p}(\pm\pi/2), the situation would imply that D(Ωp(L),ωp(L))\operatorname{D}(\Omega_{p}(L),\omega_{p}(L)) strictly contains the osculating disk at γp(π/2)\gamma_{p}(\pi/2) or γp(π/2)\gamma_{p}(-\pi/2), which is a contradiction to D(Ωp(L),ωp(L))expD(0,p)\operatorname{D}(\Omega_{p}(L),\omega_{p}(L))\subset\exp\operatorname{D}(0,p). Case (iii) is impossible, because for the oriented plane curvature of γp\gamma_{p},

ϰγp(t)=1+pcostpepcost<1ωp(t)=costsinh(pcost)\varkappa_{\gamma_{p}}(t)=\frac{1+p\cos t}{p\mathrm{e}^{p\cos t}}<\frac{1}{\omega_{p}(t)}=\frac{\cos t}{\sinh(p\cos t)}

holds if cost0\cos t\neq 0. (In general, 1+xex<xsinhx\frac{1+x}{\mathrm{e}^{x}}<\frac{x}{\sinh x} for x0x\neq 0.) This implies L=π/2L=\pi/2, proving the statement. ∎
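The elementary inequality invoked above can be spot-checked numerically; the following sketch samples both sides of (1+x)/e^x < x/sinh(x) away from x = 0.

```python
import math

# Spot-check (a sketch) of the elementary inequality (1+x)/e^x < x/sinh(x)
# for x != 0, which drives the curvature comparison above via x = p cos t.
def lhs(x):
    return (1 + x) * math.exp(-x)

def rhs(x):
    return x / math.sinh(x)

samples = [-2.0, -0.5, -0.01, 0.01, 0.5, 2.0]
ok = all(lhs(x) < rhs(x) for x in samples)
```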

In what follows, we will not make an issue of expressions like sinhpxx\frac{\sinh px}{x} when x=0x=0; we simply take them to be equal to pp, in the spirit of continuity.

Theorem 13.2.

Suppose that p(0,π)p\in(0,\pi). Suppose that DD is a disk in expD(0,p)\exp\operatorname{D}(0,p), which touches expD(0,p)\partial\exp\operatorname{D}(0,p) at γp(t)=epcost+ipsint\gamma_{p}(t)=\mathrm{e}^{p\cos t+\mathrm{i}p\sin t}. Then for an appropriate nonnegative decomposition p=p1+p2p=p_{1}+p_{2},

D=CD(exp(p1(Idcost+I~sint))W(p2,p2sint)).D=\operatorname{CD}\left(\exp(p_{1}(\operatorname{Id}\cos t+\tilde{I}\sin t))\cdot W(p_{2},p_{2}\sin t)\right).

The bigger p2p_{2} is, the bigger the corresponding disk is. Here p2=pp_{2}=p corresponds to the maximal disk, and p2=0p_{2}=0 corresponds to the point disk.

Proof.

Let Wp1,p2,tW_{p_{1},p_{2},t} denote the argument of CD\operatorname{CD}. Its first factor is Magnus exponentiable with norm p1p_{1}, and its second factor is Magnus exponentiable with norm p2p_{2}. Thus the principal disk must lie in expD(0,p)\exp\operatorname{D}(0,p). One can compute the center and the radius of the chiral disk (cf. the Remark below), and find that γp(t)\gamma_{p}(t) is on the boundary of the disk. So, CD(Wp1,p2,t)\operatorname{CD}(W_{p_{1},p_{2},t}) must be the maximal CD(W0,p1+p2,t)\operatorname{CD}(W_{0,p_{1}+p_{2},t}) contracted from γp(t)\gamma_{p}(t). In particular, one finds that the radius of CD(Wp1,p2,t)\operatorname{CD}(W_{p_{1},p_{2},t}) is

ep1+p2ep1p22cost=ep2cost(1e2p2).\frac{\mathrm{e}^{p_{1}+p_{2}}-\mathrm{e}^{p_{1}-p_{2}}}{2\cos t}=\frac{\mathrm{e}^{p}}{2\cos t}(1-\mathrm{e}^{-2p_{2}}).

This shows that a bigger p2p_{2} leads to a bigger disk. ∎
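The identity and the monotonicity claim can be sanity-checked numerically; the following sketch verifies the numerator identity e^{p1+p2} − e^{p1−p2} = e^p (1 − e^{−2 p2}) for p = p1 + p2, and that the radius grows strictly with p2 (for fixed t, the positive prefactor involving cos t does not affect monotonicity).

```python
import math

# Sketch: check the algebraic identity behind the radius formula,
#   e^(p1+p2) - e^(p1-p2) = e^p * (1 - e^(-2*p2))   for p = p1 + p2,
# and that, for fixed p, the (unnormalized) radius grows strictly with p2.
p = 1.3
radii = []
for p2 in (0.0, 0.4, 0.9, 1.3):
    p1 = p - p2
    lhs = math.exp(p1 + p2) - math.exp(p1 - p2)
    rhs = math.exp(p) * (1 - math.exp(-2 * p2))
    assert abs(lhs - rhs) < 1e-12
    radii.append(lhs)
```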

Remark.

It is easy to see that, for p=p1+p2p=p_{1}+p_{2},

exp(p1(Idcost+I~sint))W(p2,p2sint)=\displaystyle\exp(p_{1}(\operatorname{Id}\cos t+\tilde{I}\sin t))\cdot W(p_{2},p_{2}\sin t)=
=ep1costexp((p1+p2)sintI~)(cosh(p2cost)Id2+sinh(p2cost)cost(sintI~+K~))\displaystyle=\mathrm{e}^{p_{1}\cos t}\exp((p_{1}+p_{2})\sin t\tilde{I})\cdot\left(\cosh(p_{2}\cos t)\operatorname{Id}_{2}+\frac{\sinh(p_{2}\cos t)}{\cos t}\Bigl{(}-\sin t\tilde{I}+\tilde{K}\Bigr{)}\right)
=expL(p1p[costsintsintcost]+p2p[sin(2θsint)cos(2θsint)cos(2θsint)sin(2θsint)]dθ|[0,p])\displaystyle=\operatorname{exp_{L}}\left(\frac{p_{1}}{p}\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}+\frac{p_{2}}{p}\begin{bmatrix}-\sin(2\theta\sin t)&\cos(2\theta\sin t)\\ \cos(2\theta\sin t)&\sin(2\theta\sin t)\end{bmatrix}\mathrm{d}\theta|_{[0,p]}\right)
=expL(p1[costsintsintcost]+p2[sin(2pθsint)cos(2pθsint)cos(2pθsint)sin(2pθsint)]dθ|[0,1]).\displaystyle=\operatorname{exp_{L}}\left(p_{1}\begin{bmatrix}\cos t&-\sin t\\ \sin t&\cos t\end{bmatrix}+p_{2}\begin{bmatrix}-\sin(2p\theta\sin t)&\cos(2p\theta\sin t)\\ \cos(2p\theta\sin t)&\sin(2p\theta\sin t)\end{bmatrix}\mathrm{d}\theta|_{[0,1]}\right).\qed

This immediately implies the existence of a certain normal form. For the sake of compact notation, let

𝕂~:={sinβJ~+cosβK~:β[0,2π)},\tilde{\mathbb{K}}:=\{-\sin\beta\tilde{J}+\cos\beta\tilde{K}\,:\,\beta\in[0,2\pi)\},

which is the set of the conjugates of K~\tilde{K} by orthogonal matrices.

Theorem 13.3.

Suppose that AM2()A\in\mathrm{M}_{2}(\mathbb{R}) is such that CD(A)expD̊(0,π)\operatorname{CD}(A)\subset\exp\operatorname{\mathring{D}}(0,\pi). Assume that pp is the smallest real number such that CD(A)expD(0,p)\operatorname{CD}(A)\subset\exp\operatorname{D}(0,p), and CD(A)\operatorname{CD}(A) touches expD(0,p)\exp\partial\operatorname{D}(0,p) at ep(cost+isint)\mathrm{e}^{p(\cos t+\mathrm{i}\sin t)}. Then there is a nonnegative decomposition p=p1+p2p=p_{1}+p_{2}, and a matrix F~𝕂~\tilde{F}\in\tilde{\mathbb{K}}, such that

(111) A=ep1cost\displaystyle A=\mathrm{e}^{p_{1}\cos t} exp(psintI~)(cosh(p2cost)Id2sinh(p2cost)costsintI~)+sinh(p2cost)costF~\displaystyle\exp(p\sin t\tilde{I})\cdot\left(\cosh(p_{2}\cos t)\operatorname{Id}_{2}-\frac{\sinh(p_{2}\cos t)}{\cos t}\sin t\tilde{I}\right)+\frac{\sinh(p_{2}\cos t)}{\cos t}\tilde{F}
(112) =expL(p1exp(tI~)+p2exp(2pθsintI~)F~dθ|[1/2,1/2])\displaystyle=\operatorname{exp_{L}}\left(p_{1}\exp(t\tilde{I})+p_{2}\exp(2p\theta\sin t\tilde{I})\cdot\tilde{F}\,\,\mathrm{d}\theta|_{[-1/2,1/2]}\right)
(113) =expL(exp(tI~)dθ|[0,p1])expL(exp((2θp1p2)sintI~)F~dθ|[0,p2])\displaystyle=\operatorname{exp_{L}}(\exp(t\tilde{I})\,\mathrm{d}\theta|_{[0,p_{1}]})\operatorname{exp_{L}}\left(\exp((2\theta-p_{1}-p_{2})\sin t\tilde{I})\tilde{F}\,\mathrm{d}\theta|_{[0,p_{2}]}\right)
(114) =expL(exp((2θ+p1p2)sintI~)F~dθ|[0,p2])expL(exp(tI~)dθ|[0,p1]).\displaystyle=\operatorname{exp_{L}}\left(\exp((2\theta+p_{1}-p_{2})\sin t\tilde{I})\tilde{F}\,\mathrm{d}\theta|_{[0,p_{2}]}\right)\operatorname{exp_{L}}(\exp(t\tilde{I})\,\mathrm{d}\theta|_{[0,p_{1}]}).

The case p1=p2=0p_{1}=p_{2}=0 corresponds to A=Id2A=\operatorname{Id}_{2}.

The case p1>0,p2=0p_{1}>0,p_{2}=0 corresponds to the point disk case; the expression does not depend on F~\tilde{F}.

The case p1=0,p2>0p_{1}=0,p_{2}>0 corresponds to the maximal disk case; it has the degeneracy tπtmod2πt\leftrightarrow\pi-t\operatorname{\,mod\,}2\pi.

In the general case p1,p2>0p_{1},p_{2}>0, the presentation is unique in terms of p1,p2,tmod2π,F~p_{1},p_{2},t\operatorname{\,mod\,}2\pi,\tilde{F}.

Proof.

This is an immediate consequence of the previous statement and the observation (cosα+I~sinα)K~(cosα+I~sinα)1=(cos2α+I~sin2α)K~=J~sin2α+K~cos2α.(\cos\alpha+\tilde{I}\sin\alpha)\tilde{K}(\cos\alpha+\tilde{I}\sin\alpha)^{-1}=(\cos 2\alpha+\tilde{I}\sin 2\alpha)\tilde{K}=-\tilde{J}\sin 2\alpha+\tilde{K}\cos 2\alpha.
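The conjugation identity above can be verified numerically; the sketch below assumes the standard conventions I~ = [[0,−1],[1,0]], J~ = diag(1,−1), K~ = [[0,1],[1,0]] (so that I~ K~ = −J~), which match the formulas used in this section.

```python
import numpy as np

# Check (sketch) of (cos a + I~ sin a) K~ (cos a + I~ sin a)^(-1)
#   = -J~ sin 2a + K~ cos 2a, assuming the matrix conventions above.
I = np.array([[0.0, -1.0], [1.0, 0.0]])
J = np.array([[1.0, 0.0], [0.0, -1.0]])
K = np.array([[0.0, 1.0], [1.0, 0.0]])
ok = True
for alpha in (0.3, 1.1, 2.5):
    R = np.cos(alpha) * np.eye(2) + np.sin(alpha) * I  # cos a + I~ sin a
    lhs = R @ K @ np.linalg.inv(R)
    rhs = -np.sin(2 * alpha) * J + np.cos(2 * alpha) * K
    ok = ok and np.allclose(lhs, rhs)
```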

In what follows, we use the notation

N(p1,p2,t,F~)\operatorname{N}(p_{1},p_{2},t,\tilde{F})

to denote the arithmetic expression on the RHS of (111). In itself, it is just a matrix value, but the statement above offers three particularly convenient ways (normal forms) to present it as a left-exponential: (112) is sufficiently nice and compact, with norm density pp on an interval of unit length. (113) and (114) are concatenations of intervals of length p1p_{1} and p2p_{2} with norm density 11. One part is essentially a complex exponential, relatively uninteresting; the other part is the Magnus parabolic or hyperbolic development of Examples 12.1 and 12.3, but up to conjugation by a special orthogonal matrix, which is the same as saying ‘up to phase’.

Theorem 13.4.

Suppose that AM2()A\in\mathrm{M}_{2}(\mathbb{R}) is such that CD(A)expD̊(0,π)\operatorname{CD}(A)\subset\exp\operatorname{\mathring{D}}(0,\pi). Then

2×2real(A)=inf{λ[0,π):CD(A)expD(0,λ)}.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)=\inf\{\lambda\in[0,\pi)\,:\,\operatorname{CD}(A)\subset\exp\operatorname{D}(0,\lambda)\}.

Or, in other words,

2×2real(A)=sup{|logz|:zCD(A)}.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)=\sup\{|\log z|\,:\,z\in\operatorname{CD}(A)\}.
Proof.

Assume that pp is the smallest real number such that CD(A)expD(0,p)\operatorname{CD}(A)\subset\exp\operatorname{D}(0,p). By Theorem 1.2, 2×2real(A)\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A) is at least pp, while the left-exponentials of Theorem 13.3 do indeed Magnus-exponentiate AA with norm pp. ∎

Suppose that AM2()A\in\mathrm{M}_{2}(\mathbb{R}) is such that CD(A)expD̊(0,π)\operatorname{CD}(A)\subset\exp\operatorname{\mathring{D}}(0,\pi), AId2A\neq\operatorname{Id}_{2}, p=2×2real(A)p=\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A). If detA=1\det A=1, then AA can be of three kinds: Magnus elliptic, when CD(A)\operatorname{CD}(A) touches expD(0,p)\exp\partial\operatorname{D}(0,p) at eip\mathrm{e}^{\mathrm{i}p} or eip\mathrm{e}^{-\mathrm{i}p}, but it is not an osculating disk; Magnus parabolic, when CD(A)\operatorname{CD}(A) touches expD(0,p)\exp\partial\operatorname{D}(0,p) at eip\mathrm{e}^{\mathrm{i}p} or eip\mathrm{e}^{-\mathrm{i}p}, and it is an osculating disk; or Magnus hyperbolic, when CD(A)\operatorname{CD}(A) touches expD(0,p)\exp\partial\operatorname{D}(0,p) at two distinct points. If detA1\det A\neq 1, then CD(A)\operatorname{CD}(A) touches expD(0,p)\exp\partial\operatorname{D}(0,p) at a single point, asymmetrically; we can call these Magnus loxodromic. We see that Examples 12.1, 12.2, and 12.3 cover all the Magnus parabolic, hyperbolic and elliptic cases up to conjugation by an orthogonal matrix. In general, if AA is not Magnus hyperbolic, then it determines a unique Magnus direction cost+isint\cos t+\mathrm{i}\sin t (in the notation of Theorem 13.3). It is the direction of the farthest point of {logz:zCD(A)}\{\log z\,:\,z\in\operatorname{CD}(A)\} from the origin. If AA is Magnus hyperbolic, then this direction is determined only up to sign in the real part.

Lemma 13.5.

Suppose AM2()A\in\mathrm{M}_{2}(\mathbb{R}) is such that CD(A)expD̊(0,π)\operatorname{CD}(A)\subset\exp\operatorname{\mathring{D}}(0,\pi), AId2A\neq\operatorname{Id}_{2}, detA=1\det A=1, and CD(A)=D((a,b),r)\operatorname{CD}(A)=\operatorname{D}((a,b),r). Then a2+b2=r2+1a^{2}+b^{2}=r^{2}+1 and a+1>0a+1>0.

We claim that AA is Magnus hyperbolic or parabolic if and only if

2arctanr+|b|a+1r.2\arctan\frac{r+|b|}{a+1}\leq r.

If AA is Magnus elliptic or parabolic, then

2×2real(A)=2arctanr+|b|a+1.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)=2\arctan\frac{r+|b|}{a+1}.
Proof.

D((a,b),r)\partial\operatorname{D}((a,b),r) intersects the unit circle at

(cosφ±,sinφ±):=(a±bra2+b2,bara2+b2),(\cos\varphi_{\pm},\sin\varphi_{\pm}):=\left(\dfrac{a\pm br}{a^{2}+b^{2}},\dfrac{b\mp ar}{a^{2}+b^{2}}\right),

φ±(π,π)\varphi_{\pm}\in(-\pi,\pi). In particular, a±bra2+b2+1>0\dfrac{a\pm br}{a^{2}+b^{2}}+1>0; multiplying these two inequalities, we get a+1>0a+1>0. Then φ±=2arctanr±ba+1\varphi_{\pm}=2\arctan\frac{r\pm b}{a+1}. If one of them is equal to rr, then it is a Magnus parabolic case; if both are smaller than rr, then it is a Magnus hyperbolic case; if one of them is bigger than rr, then it must be a Magnus elliptic case. (Cf. the size of the chiral disk in Theorem 13.2.) ∎
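The intersection points and the arctan formula from the proof can be spot-checked numerically. In the sketch below, the sample values a, b are chosen arbitrarily (with r > |b|, so that both angle sizes are nonnegative), and the angles are compared by absolute value, since the formula is used only up to orientation.

```python
import math

# Sketch: with a^2 + b^2 = r^2 + 1, the two intersection points lie on both
# circles, and their angles have sizes 2*arctan((r + b)/(a+1)) and
# 2*arctan((r - b)/(a+1)).
a, b = 1.2, 0.5
r = math.sqrt(a * a + b * b - 1)
pts = [((a + s * b * r) / (a * a + b * b), (b - s * a * r) / (a * a + b * b))
       for s in (+1.0, -1.0)]
for c, snt in pts:
    assert abs(c * c + snt * snt - 1) < 1e-12                  # on the unit circle
    assert abs((c - a) ** 2 + (snt - b) ** 2 - r * r) < 1e-12  # on bd D((a,b),r)
angles = sorted(abs(math.atan2(snt, c)) for c, snt in pts)
target = sorted(2 * math.atan((r + s * b) / (a + 1)) for s in (+1.0, -1.0))
ok = all(abs(u - v) < 1e-9 for u, v in zip(angles, target))
```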

Recall that we say that the measure ϕ\phi is a minimal Magnus presentation for AA, if expL(ϕ)=A\operatorname{exp_{L}}(\phi)=A and ϕ2=2×2real(A)\int\|\phi\|_{2}=\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A).

Lemma 13.6.

Any element AGL2+()A\in\operatorname{GL}^{+}_{2}(\mathbb{R}) has at least one minimal Magnus presentation.

Proof.

GL2+()\operatorname{GL}^{+}_{2}(\mathbb{R}) is connected, which implies that any element AA has at least one Magnus presentation ϕ\phi. If ϕ2\int\|\phi\|_{2} is sufficiently close to 2×2real(A)\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A), then we can divide the supporting interval of ϕ\phi into 2×2real(A)/π+1\lfloor\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)/\pi\rfloor+1 many subintervals, such that the variation of ϕ\phi on any of them is less than π\pi. Replace ϕ\phi by a normal form on every such subinterval. By this we have managed to get a presentation of variation at most ϕ2\int\|\phi\|_{2} given by data from ([0,π]×[0,π]×[0,2π]×𝕂)2×2real(A)/π+1([0,\pi]\times[0,\pi]\times[0,2\pi]\times\mathbb{K})^{\lfloor\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)/\pi\rfloor+1}. Conversely, such data always give a presentation, whose expL\operatorname{exp_{L}} depends continuously on the data. Then the statement follows from a standard compactness argument. ∎

Lemma 13.7.

Suppose that AλIdA_{\lambda}\rightarrow\operatorname{Id}, such that AλA_{\lambda} is Magnus hyperbolic, but AλIdA_{\lambda}\neq\operatorname{Id} for any λ\lambda. Suppose that CD(Aλ)=D((1+aλ,bλ),rλ)\operatorname{CD}(A_{\lambda})=\operatorname{D}((1+a_{\lambda},b_{\lambda}),r_{\lambda}).

Then, as the sequence converges,

2×2real(Aλ)2=2aλ+O(itself2);\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A_{\lambda})^{2}=2a_{\lambda}+O(\mathrm{itself}^{2});

or more precisely,

2×2real(Aλ)2=2aλ13aλ2+32bλ2aλ+O(itself3).\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A_{\lambda})^{2}=2a_{\lambda}-\frac{1}{3}a_{\lambda}^{2}+\frac{3}{2}\frac{b_{\lambda}^{2}}{a_{\lambda}}+O(\mathrm{itself}^{3}).
Proof.

We can assume that Aλ=W(pλ,pλsintλ)A_{\lambda}=W(p_{\lambda},p_{\lambda}\sin t_{\lambda}). From the formula of W(p,psint)W(p,p\sin t) one can see that CD(W(p,psint))\operatorname{CD}(W(p,p\sin t)) is an entire function of x=pcost,y=psintx=p\cos t,y=p\sin t. One actually finds that the center is

(1+a^(x,y),b^(x,y))=\displaystyle(1+\hat{a}(x,y),\hat{b}(x,y))= (1+x2+y22+(x2y2)(x2+y2)24\displaystyle\biggl{(}1+\frac{x^{2}+y^{2}}{2}+\frac{(x^{2}-y^{2})(x^{2}+y^{2})}{24}
+(x410x2y2+5y4)(x2+y2)720+O(x,y)8,\displaystyle+\frac{(x^{4}-10x^{2}y^{2}+5y^{4})(x^{2}+y^{2})}{720}+O(x,y)^{8},
y(x2+y2)3+y(x2+y2)(x2y2)30+O(x,y)7).\displaystyle\frac{y(x^{2}+y^{2})}{3}+\frac{y(x^{2}+y^{2})(x^{2}-y^{2})}{30}+O(x,y)^{7}\biggr{)}.

(One can check that in the expansion of a^(x,y)\hat{a}(x,y), every term is divisible by (x2+y2)(x^{2}+y^{2}); in the expansion of b^(x,y)\hat{b}(x,y), every term is divisible by y(x2+y2)y(x^{2}+y^{2}).) Eventually, one finds that

p2=x2+y2=2a^(x,y)+O(x,y)4p^{2}=x^{2}+y^{2}=2\hat{a}(x,y)+O(x,y)^{4}

and

p2=x2+y2=2a^(x,y)13a^(x,y)2+32b^(x,y)2a^(x,y)+O(x,y)6.p^{2}=x^{2}+y^{2}=2\hat{a}(x,y)-\frac{1}{3}\hat{a}(x,y)^{2}+\frac{3}{2}\frac{\hat{b}(x,y)^{2}}{\hat{a}(x,y)}+O(x,y)^{6}.\qed
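As a numerical sanity check (a sketch) of the first of the displayed relations: the defect 2â(x,y) − (x²+y²) is of fourth order, so halving (x,y) should divide it by roughly 16. The helper below uses the truncated series for â given above.

```python
# Sketch: confirm p^2 = x^2 + y^2 = 2*a_hat(x,y) + O(x,y)^4 numerically,
# using the truncated series for a_hat displayed above.
def a_hat(x, y):
    S = x * x + y * y
    D = x * x - y * y
    return S / 2 + D * S / 24 + (x ** 4 - 10 * x * x * y * y + 5 * y ** 4) * S / 720

x0, y0 = 0.3, 0.2
errs = []
for s in (1.0, 0.5, 0.25):
    x, y = s * x0, s * y0
    errs.append(abs(2 * a_hat(x, y) - (x * x + y * y)))
# each halving of (x, y) shrinks the defect by a factor of about 16
```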

The hyperbolic developments pW(p,psint)p\mapsto W(p,p\sin t) are uniform motions in the sense that the increments W((p+ε),(p+ε)sint)W(p,psint)1W((p+\varepsilon),(p+\varepsilon)\sin t)W(p,p\sin t)^{-1} differ from each other by conjugation by orthogonal matrices as pp changes. In fact, they are locally characterized by the speed sint\sin t, and a phase, i. e. conjugation by rotations.

Lemma 13.8.

Assume that 0<p1,p20<p_{1},p_{2}; p1+p2<πp_{1}+p_{2}<\pi; t1,t2[π/2,π/2]t_{1},t_{2}\in[-\pi/2,\pi/2]; ε(π/2,π/2]\varepsilon\in(-\pi/2,\pi/2]. On the interval [p1,p2][-p_{1},p_{2}], consider the measure ϕ\phi given by

ϕ(θ)=η(θ)dθ|[p1,p2],\phi(\theta)=\eta(\theta)\,\mathrm{d}\theta|_{[-p_{1},p_{2}]},

where

η(θ)={[sin2(θsint2)cos2(θsint2)cos2(θsint2)sin2(θsint2)]if θ0[cosεsinεsinεcosε][sin2(θsint1)cos2(θsint1)cos2(θsint1)sin2(θsint1)][cosεsinεsinεcosε]if θ0.\eta(\theta)=\begin{cases}\begin{bmatrix}-\sin 2(\theta\sin t_{2})&\cos 2(\theta\sin t_{2})\\ \cos 2(\theta\sin t_{2})&\sin 2(\theta\sin t_{2})\end{bmatrix}&\text{if }\theta\geq 0\\ \begin{bmatrix}\cos\varepsilon&-\sin\varepsilon\\ \sin\varepsilon&\cos\varepsilon\end{bmatrix}\begin{bmatrix}-\sin 2(\theta\sin t_{1})&\cos 2(\theta\sin t_{1})\\ \cos 2(\theta\sin t_{1})&\sin 2(\theta\sin t_{1})\end{bmatrix}\begin{bmatrix}\cos\varepsilon&\sin\varepsilon\\ -\sin\varepsilon&\cos\varepsilon\end{bmatrix}&\text{if }\theta\leq 0.\end{cases}

Then

2×2real(expL(ϕ))<p1+p2\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi))<p_{1}+p_{2}

unless ε=0\varepsilon=0 and t1=t2t_{1}=t_{2}.

Proof.

It is sufficient to prove this for a small subinterval around 0. So let us take the choice p1=p2=p/2p_{1}=p_{2}=p/2, p0p\searrow 0. Then

expL(ϕ|[p/2,p/2])=W(p2,p2sint2)[cosεsinεsinεcosε]W(p2,p2sint1)1[cosεsinεsinεcosε].\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]})=W\left(\frac{p}{2},\frac{p}{2}\sin t_{2}\right)\begin{bmatrix}\cos\varepsilon&-\sin\varepsilon\\ \sin\varepsilon&\cos\varepsilon\end{bmatrix}W\left(-\frac{p}{2},-\frac{p}{2}\sin t_{1}\right)^{-1}\begin{bmatrix}\cos\varepsilon&\sin\varepsilon\\ -\sin\varepsilon&\cos\varepsilon\end{bmatrix}.

Let

D((ap,bp),rp)=CD(expL(ϕ|[p/2,p/2])).\operatorname{D}((a_{p},b_{p}),r_{p})=\operatorname{CD}(\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]})).

(i) If ε(π/2,0)(0,π/2)\varepsilon\in(-\pi/2,0)\cup(0,\pi/2), then

2arctanrp±bpap+1rp=14sin(2ε)p2+O(p3).2\arctan\frac{r_{p}\pm b_{p}}{a_{p}+1}-r_{p}=\mp\frac{1}{4}\sin(2\varepsilon)p^{2}+O(p^{3}).

This shows that expL(ϕ|[p/2,p/2])\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]}) becomes Magnus elliptic. However,

2×2real(expL(ϕ|[p/2,p/2]))=2arctanrp±bpap+1=pcos(ε)+O(p2)\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]}))=2\arctan\frac{r_{p}\pm b_{p}}{a_{p}+1}=p\cos(\varepsilon)+O(p^{2})

shows Magnus non-minimality.

(ii) If ε=π/2\varepsilon=\pi/2, sint1+sint20\sin t_{1}+\sin t_{2}\neq 0, then

2arctanrp±bpap+1rp=112(sint1+sint2)p3+O(p4).2\arctan\frac{r_{p}\pm b_{p}}{a_{p}+1}-r_{p}=\mp\frac{1}{12}(\sin t_{1}+\sin t_{2})p^{3}+O(p^{4}).

This also shows Magnus ellipticity, and

2arctanrp±bpap+1=14|sint1+sint2|p2+O(p3)2\arctan\frac{r_{p}\pm b_{p}}{a_{p}+1}=\frac{1}{4}|\sin t_{1}+\sin t_{2}|p^{2}+O(p^{3})

shows Magnus non-minimality.

(iii) If ε=π/2\varepsilon=\pi/2, sint1+sint2=0\sin t_{1}+\sin t_{2}=0, then expL(ϕ|[p/2,p/2])=Id2\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]})=\operatorname{Id}_{2}. Hence, full cancellation occurs; this is not Magnus minimal.

(iv) If ε=0\varepsilon=0, sint1sint2\sin t_{1}\neq\sin t_{2}, then sint1+sint2<2\sin t_{1}+\sin t_{2}<2, and

2arctanrp±bpap+1rp=16(±(sint1+sint2)2)p3+O(p4).2\arctan\frac{r_{p}\pm b_{p}}{a_{p}+1}-r_{p}=\frac{1}{6}(\pm(\sin t_{1}+\sin t_{2})-2)p^{3}+O(p^{4}).

This shows that expL(ϕ|[p/2,p/2])\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]}) becomes Magnus hyperbolic. Then, assuming Magnus minimality and using the previous lemma, we get a contradiction by

2×2real(expL(ϕ|[p/2,p/2]))2=p2148p4(sint2sint1)2+O(itself3)<p2.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi|_{[-p/2,p/2]}))^{2}=p^{2}-\frac{1}{48}p^{4}(\sin t_{2}-\sin t_{1})^{2}+O(\text{itself}^{3})<p^{2}.

This proves the statement. ∎

Lemma 13.9.

Assume that 0<p1,p20<p_{1},p_{2}; p1+p2<πp_{1}+p_{2}<\pi; t[π/2,π/2)t\in[-\pi/2,\pi/2). On the interval [p1,p2][-p_{1},p_{2}], let us consider the measure ϕ\phi given by

ϕ(θ)=η(θ)dθ,\phi(\theta)=\eta(\theta)\,\mathrm{d}\theta,

where

η(θ)={I~=[11]if θ0[sin2(θsint)cos2(θsint)cos2(θsint)sin2(θsint)]if θ0.\eta(\theta)=\begin{cases}\tilde{I}=\begin{bmatrix}&-1\\ 1&\end{bmatrix}&\text{if }\theta\geq 0\\ \begin{bmatrix}-\sin 2(\theta\sin t)&\cos 2(\theta\sin t)\\ \cos 2(\theta\sin t)&\sin 2(\theta\sin t)\end{bmatrix}&\text{if }\theta\leq 0.\end{cases}

Then

2×2real(expL(ϕ))<p1+p2.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi))<p_{1}+p_{2}.
Proof.

Again, it is sufficient to show it for a small subinterval around 0.

(i) Suppose t(π/2,π/2)t\in(-\pi/2,\pi/2). As p0p\searrow 0, restrict to the interval

p=[p,sinh(pcost)costp].\mathcal{I}_{p}=\left[-p,\frac{\sinh(p\cos t)}{\cos t}-p\right].

Then

expL(ϕ|p)=exp(I~(sinh(pcost)costp))W(p,psint)1.\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{p}})=\exp\left(\tilde{I}\left(\frac{\sinh(p\cos t)}{\cos t}-p\right)\right)W(-p,-p\sin t)^{-1}.

Let

D((ap,bp),rp)=CD(expL(ϕ|p)).\operatorname{D}((a_{p},b_{p}),r_{p})=\operatorname{CD}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{p}})).

If we assume Magnus minimality, then

2×2real(expL(ϕ|p))=sinh(pcost)cost=rp.\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{p}}))=\frac{\sinh(p\cos t)}{\cos t}=r_{p}.

Thus, expL(ϕ|p)\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{p}}) is Magnus parabolic. By direct computation, we find

2arctanrp+|bp|ap+1=p+13p3max(cos2t+sint1,1sint)+O(p4),2\arctan\frac{r_{p}+|b_{p}|}{a_{p}+1}=p+\frac{1}{3}p^{3}\max(\cos^{2}t+\sin t-1,-1-\sin t)+O(p^{4}),

in contradiction to

sinh(pcost)cost=p+16p3(cos2t)+O(p4),\frac{\sinh(p\cos t)}{\cos t}=p+\frac{1}{6}p^{3}(\cos^{2}t)+O(p^{4}),

which is another way to express 2×2real(expL(ϕ|p))\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{p}})) from the density. (The coefficients of p3p^{3} differ for t(π/2,π/2)t\in(-\pi/2,\pi/2).)

(ii) Consider now the case t=π/2t=-\pi/2.

2arctanrp±bpap+1=±12p+O(p2)2\arctan\frac{r_{p}\pm b_{p}}{a_{p}+1}=\pm\frac{1}{2}p+O(p^{2})

shows Magnus ellipticity, and

2arctanrp+|bp|ap+1=p112p3+O(p4)2\arctan\frac{r_{p}+|b_{p}|}{a_{p}+1}=p-\frac{1}{12}p^{3}+O(p^{4})

shows non-minimality. ∎
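The Taylor expansion of sinh(p cos t)/cos t used in part (i) of the proof above can be spot-checked numerically; the remainder is of order p⁵, so halving p should shrink it by roughly a factor of 32.

```python
import math

# Sketch: check sinh(p cos t)/cos t = p + (1/6) p^3 cos^2 t + O(p^4);
# the actual remainder is p^5 cos^4 t / 120 + ..., hence the ~32x drop.
t = 0.7
errs = []
for p in (0.2, 0.1, 0.05):
    exact = math.sinh(p * math.cos(t)) / math.cos(t)
    approx = p + (p ** 3) * math.cos(t) ** 2 / 6
    errs.append(abs(exact - approx))
```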

Now we deal with the uniqueness of the normal forms as left-exponentials. In the context of Theorem 13.3 we call ell(A):=p1(cost+I~sint)\operatorname{ell}(A):=p_{1}(\cos t+\tilde{I}\sin t) the elliptic component of AA, and we call hyp(A):=p2\operatorname{hyp}(A):=p_{2} the hyperbolic length of AA.

Theorem 13.10.

Suppose that AM2()A\in\mathrm{M}_{2}(\mathbb{R}) is such that CD(A)expD̊(0,π)\operatorname{CD}(A)\subset\exp\operatorname{\mathring{D}}(0,\pi), and ϕ\phi is a minimal Magnus presentation for AA supported on [a,b][a,b].

Then, restricted to any subinterval \mathcal{I}, the value ell(expL(ϕ|))\operatorname{ell}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}})) is a nonnegative real multiple of ell(A)\operatorname{ell}(A). Furthermore, the interval functions

2×2real(expL(ϕ|))=ϕ|2,\mathcal{I}\mapsto\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}}))=\smallint\|\phi|_{\mathcal{I}}\|_{2},
ell(expL(ϕ|)),\mathcal{I}\mapsto\operatorname{ell}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}})),
hyp(expL(ϕ|))\mathcal{I}\mapsto\operatorname{hyp}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}}))

are additive. In particular, if AA is Magnus hyperbolic or parabolic, then ell(expL(ϕ|))\operatorname{ell}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}})) is always 0.

Proof.

Let us divide the supporting interval of ϕ\phi into smaller intervals 1,,s\mathcal{I}_{1},\ldots,\mathcal{I}_{s}. On these intervals let us replace ϕ|k\phi|_{\mathcal{I}_{k}} by a left-complex normal form. Thus we obtain

ϕ=Φ𝒦1(1).(cost1+I~sint1)𝟏𝒥1..Φ𝒦s(s).(costs+I~sints)𝟏𝒥s,\phi^{\prime}=\Phi^{(1)}_{\mathcal{K}_{1}}\boldsymbol{.}(\cos t_{1}+\tilde{I}\sin t_{1})\mathbf{1}_{\mathcal{J}_{1}}\boldsymbol{.}\ldots\boldsymbol{.}\Phi^{(s)}_{\mathcal{K}_{s}}\boldsymbol{.}(\cos t_{s}+\tilde{I}\sin t_{s})\mathbf{1}_{\mathcal{J}_{s}},

where 𝒥j\mathcal{J}_{j} and 𝒦j\mathcal{K}_{j} are some intervals, and Φ𝒦j(j)\Phi^{(j)}_{\mathcal{K}_{j}} are hyperbolic developments (up to conjugation). (They can be parabolic, but for the sake of simplicity let us call them hyperbolic.) Further, rearrange this as

ϕ′′=Φ𝒦1(1)..Φ𝒦s(s).(cost1+I~sint1)𝟏𝒥1..(costs+I~sints)𝟏𝒥s,\phi^{\prime\prime}=\Phi^{\prime(1)}_{\mathcal{K}_{1}}\boldsymbol{.}\ldots\boldsymbol{.}\Phi^{\prime(s)}_{\mathcal{K}_{s}}\boldsymbol{.}(\cos t_{1}+\tilde{I}\sin t_{1})\mathbf{1}_{\mathcal{J}_{1}}\boldsymbol{.}\ldots\boldsymbol{.}(\cos t_{s}+\tilde{I}\sin t_{s})\mathbf{1}_{\mathcal{J}_{s}},

where the hyperbolic developments suffer some special orthogonal conjugation but they remain hyperbolic developments. Now, the elliptic parts

ell(expL(ϕ|j))=|𝒥j|(costj+I~sintj)\operatorname{ell}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{j}}))=|\mathcal{J}_{j}|(\cos t_{j}+\tilde{I}\sin t_{j})

must be nonnegatively proportional to each other, otherwise cancellation would occur when the elliptic parts are contracted, in contradiction to the minimality of the presentation. By this, we have proved that in a minimal presentation the elliptic parts of disjoint intervals are nonnegatively proportional to each other.

Suppose that in a division |𝒥j|costj0|\mathcal{J}_{j}|\cos t_{j}\neq 0 occurs. Contract the elliptic parts in ϕ′′\phi^{\prime\prime} but immediately divide them into two equal parts:

ϕ′′′=Φ𝒦1(1)..Φ𝒦s(s).(costj+I~sintj)𝟏𝒥.(costj+I~sintj)𝟏𝒥.\phi^{\prime\prime\prime}=\Phi^{\prime(1)}_{\mathcal{K}_{1}}\boldsymbol{.}\ldots\boldsymbol{.}\Phi^{\prime(s)}_{\mathcal{K}_{s}}\boldsymbol{.}(\cos t_{j}+\tilde{I}\sin t_{j})\mathbf{1}_{\mathcal{J}}\boldsymbol{.}(\cos t_{j}+\tilde{I}\sin t_{j})\mathbf{1}_{\mathcal{J}}\boldsymbol{.}

Now replace everything but the last term by a normal form

ϕ′′′′=Φ𝒦0(0).(cost0+I~sint0)𝟏𝒥0.(costj+I~sintj)𝟏𝒥.\phi^{\prime\prime\prime\prime}=\Phi^{\prime(0)}_{\mathcal{K}_{0}}\boldsymbol{.}(\cos t_{0}+\tilde{I}\sin t_{0})\mathbf{1}_{\mathcal{J}_{0}}\boldsymbol{.}(\cos t_{j}+\tilde{I}\sin t_{j})\mathbf{1}_{\mathcal{J}}\boldsymbol{.}

Taking the determinants of the various left-exponential terms, we find

e|𝒥0|cost0+|𝒥|costj=e2|𝒥|costj.\mathrm{e}^{|\mathcal{J}_{0}|\cos t_{0}+|\mathcal{J}|\cos t_{j}}=\mathrm{e}^{2|\mathcal{J}|\cos t_{j}}.

Thus |𝒥0|cost00|\mathcal{J}_{0}|\cos t_{0}\neq 0; hence, by minimality, tj=t0mod2πt_{j}=t_{0}\operatorname{\,mod\,}2\pi, and moreover |𝒥0|=|𝒥||\mathcal{J}_{0}|=|\mathcal{J}|. However, ϕ′′′\phi^{\prime\prime\prime} constitutes a normal form (prolonged in the elliptic part), which in this form is unique; thus, eventually

(115) ell(expL(ϕ))=j=1sell(expL(ϕ|j))\operatorname{ell}(\operatorname{exp_{L}}(\phi))=\sum_{j=1}^{s}\operatorname{ell}(\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{j}}))

must hold.

Suppose now that sintk=1\sin t_{k}=1 or sintk=1\sin t_{k}=-1 occurs with |𝒥k|0|\mathcal{J}_{k}|\neq 0. Consider ϕ′′\phi^{\prime\prime}. By Magnus minimality and Lemma 13.8, the hyperbolic developments must fit into a single hyperbolic development Ψ𝒦\Psi_{\mathcal{K}} (without phase or speed change). Furthermore, by Lemma 13.9, Ψ𝒦\Psi_{\mathcal{K}} must be parabolic, fitting properly to the elliptic parts. Thus ϕ′′\phi^{\prime\prime}, in fact, yields a normal form Ψ𝒦.(sintk)𝟏𝒥\Psi_{\mathcal{K}}\boldsymbol{.}(\sin t_{k})\mathbf{1}_{\mathcal{J}}. Then (115) holds.

The third possibility in ϕ′′\phi^{\prime\prime} is that all the intervals 𝒥j\mathcal{J}_{j} are of zero length. Then the hyperbolic developments fit into a single development Ψ𝒦\Psi_{\mathcal{K}}, and (115) also holds.

Thus (115) is proven. It implies nonnegative proportionality relative to the total ell(expL(ϕ))\operatorname{ell}(\operatorname{exp_{L}}(\phi)). Now, subintervals of minimal presentations also yield minimal presentations; therefore additivity holds in full generality. Regarding the interval functions, the additivity of 2×2real\mathcal{M}_{2\times 2\,\,\mathrm{real}} is trivial; the additivity of ell\operatorname{ell} is just demonstrated; and hyp\operatorname{hyp} is just 2×2real\mathcal{M}_{2\times 2\,\,\mathrm{real}} minus the absolute value (norm) of ell\operatorname{ell}. ∎

Remark 13.11.

Suppose that ϕ:()\phi:\mathcal{I}\rightarrow\mathcal{B}(\mathfrak{H}) is a measure. Assume that 1\mathcal{I}_{1}\subset\mathcal{I} is a subinterval such that ϕ|12<π\smallint\|\phi|_{\mathcal{I}_{1}}\|_{2}<\pi. Let us replace ϕ|1\phi|_{\mathcal{I}_{1}} by a Magnus minimal presentation of expL(ϕ|1)\operatorname{exp_{L}}(\phi|_{\mathcal{I}_{1}}), in order to obtain another measure ϕ1\phi_{1}. Then we call ϕ1\phi_{1} a semilocal contraction of ϕ\phi.

We call ϕ\phi semilocally Magnus minimal, if finitely many applications of semilocal contractions cannot decrease ϕ2\smallint\|\phi\|_{2}. (In this case, the semilocal contractions will not really be contractions, as they are reversible.) We call ϕ\phi locally Magnus minimal, if no single application of a semilocal contraction can decrease ϕ2\smallint\|\phi\|_{2}. It is easy to see that

(Magnus minimal) (semilocally Magnus minimal) (locally Magnus minimal).\text{(Magnus minimal) $\Rightarrow$(semilocally Magnus minimal) $\Rightarrow$(locally Magnus minimal)}.

The implications do not hold in the other direction. For example, I~𝟏[0,2π]\tilde{I}\mathbf{1}_{[0,2\pi]} is semilocally Magnus minimal but not Magnus minimal. Also, (𝟏[0,1]).Ψ0.𝟏[0,1](-\mathbf{1}_{[0,1]})\boldsymbol{.}\Psi_{0}\boldsymbol{.}\mathbf{1}_{[0,1]} is locally Magnus minimal but not semilocally Magnus minimal: using semilocal contractions, we can move (𝟏[0,1])(-\mathbf{1}_{[0,1]}) and 𝟏[0,1]\mathbf{1}_{[0,1]} beside each other, and then there is a proper cancellation.

The proper local generalization of Magnus minimality is semilocal Magnus minimality. If ϕ\phi is locally Magnus minimal, then we can define ell(ϕ)\operatorname{ell}(\phi) and hyp(ϕ)\operatorname{hyp}(\phi) by taking a finite division {j}\{\mathcal{I}_{j}\} of \mathcal{I} into intervals of variation less than π\pi, and simply adding the ell(ϕj)\operatorname{ell}(\phi_{j}) and hyp(ϕj)\operatorname{hyp}(\phi_{j}). What semilocality is needed for is to show that ell(ϕ)\operatorname{ell}(\phi_{\mathcal{I}}) is nonnegatively proportional to ell(ϕ)\operatorname{ell}(\phi), and to give a proper definition of the Magnus direction of ϕ\phi.

Having that, semilocally Magnus minimal presentations up to semilocal contractions behave like Magnus minimal presentations. They can also be classified as Magnus elliptic, parabolic, hyperbolic, or loxodromic. (But they are not elements of GL2+()\operatorname{GL}_{2}^{+}(\mathbb{R}) anymore but presentations.) In fact, semilocally Magnus minimal presentations up to semilocal contractions have a very geometrical interpretation, cf. Remark 13.15. (Interpreted as elements of GL2+()~\widetilde{\operatorname{GL}_{2}^{+}(\mathbb{R})}.) ∎

As Theorem 13.3 suggests, hyperbolic and parabolic developments are rather rigid, while in other cases there is some wiggling of elliptic parts.

Theorem 13.12.

Suppose that AId2A\neq\operatorname{Id}_{2}, p=2×2real(A)<πp=\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)<\pi, and ϕ\phi is a minimal presentation for AA supported on the interval [a,b][a,b].

(a) Suppose that AA is Magnus hyperbolic or parabolic. Then there are unique elements t[π/2,π/2]t\in[-\pi/2,\pi/2] and F~𝕂~\tilde{F}\in\tilde{\mathbb{K}} such that

expL(ϕ|[a,x])=N(0,ϕ|[a,x]2,t,F~).\operatorname{exp_{L}}(\phi|_{[a,x]})=\operatorname{N}\left(0,\int\|\phi|_{[a,x]}\|_{2},t,\tilde{F}\right).

Thus, minimal presentations for Magnus hyperbolic and parabolic matrices are unique, up to reparametrization of the measure.

(b) Suppose that CD(A)\operatorname{CD}(A) is a point disk. Then there is a unique element t[0,2π)t\in[0,2\pi) such that

expL(ϕ|[a,x])=exp((Id2cost+I~sint)ϕ|[a,x]2).\operatorname{exp_{L}}(\phi|_{[a,x]})=\exp\left((\operatorname{Id}_{2}\cos t+\tilde{I}\sin t)\int\|\phi|_{[a,x]}\|_{2}\right).

Thus, minimal presentations for quasicomplex matrices are unique, up to reparametrization of the measure.

(c) Suppose that AA falls under none of the cases above. Then there are unique elements t[0,2π)t\in[0,2\pi), p1p_{1}, p2>0p_{2}>0, F~𝕂~\tilde{F}\in\tilde{\mathbb{K}}, and surjective monotone increasing functions ϖi:[a,b][0,pi]\varpi_{i}:[a,b]\rightarrow[0,p_{i}] such that

ϖ1(x)+ϖ2(x)=xa\varpi_{1}(x)+\varpi_{2}(x)=x-a

and

expL(ϕ|[a,x])=N(ϖ1(ϕ|[a,x]2),ϖ2(ϕ|[a,x]2),t,F~).\operatorname{exp_{L}}(\phi|_{[a,x]})=\operatorname{N}\left(\varpi_{1}\left(\int\|\phi|_{[a,x]}\|_{2}\right),\varpi_{2}\left(\int\|\phi|_{[a,x]}\|_{2}\right),t,\tilde{F}\right).

Thus, minimal presentations in the general case are unique, up to displacement of elliptic parts.

Proof.

Divide [a,b][a,b] into [a,x][a,x] and [x,b][x,b], and replace the minimal presentation by normal forms. They must fit together, in accordance with minimality. ∎

Remark.

The statement can easily be generalized to semilocally Magnus minimal presentations. ∎

Theorem 13.12 says that certain minimal Magnus presentations are essentially unique. Theorems 13.13 and 13.14 will help explain why it is not easy to give examples of the Magnus expansion blowing up in the critical case $\smallint\|\phi\|_{2}=\pi$.

Theorem 13.13.

Suppose that $A\neq\operatorname{Id}_{2}$, $p=\mathcal{M}_{2\times 2\,\,\mathrm{real}}(A)<\pi$, and $\phi$ is a minimal presentation to $A$ supported on the interval $[a,b]$. If $\phi$ is of shape

\[\operatorname{exp_{L}}(\phi|_{[a,x]})=\exp\left(S\int\|\phi|_{[a,x]}\|_{2}\right)\]

with some matrix $S$ (i.e., it is essentially an exponential), then $S$ is a normal matrix with norm $\|S\|_{2}=1$.

Proof.

Due to homogeneity, $\operatorname{ell}(\Phi|_{\mathcal{I}})$ and $\operatorname{hyp}(\Phi|_{\mathcal{I}})$ must be proportional to $\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\Phi|_{\mathcal{I}})$. But it is easy to see that (up to parametrization) only the homogeneous normal densities (112) have this property, and they are locally constant only if the Magnus non-elliptic component vanishes with $p_{2}=0$, or when they are of special hyperbolic type with $\sin t=0$. ∎

(This redevelops Theorem 5.1 in the real case.)

Theorem 13.14.

Suppose that $\phi$ is a measure,

\[\int\|\phi\|_{2}=\pi,\]

but $\log\operatorname{exp_{L}}(\phi)$ does not exist. Then there are uniquely determined elements $t\in\{-\pi,\pi\}$ and $\tilde{F}\in\tilde{\mathbb{K}}$, a nonnegative decomposition $\pi=p_{1}+p_{2}$ with $p_{2}>0$, and surjective monotone increasing functions $\varpi_{i}:[a,b]\rightarrow[0,p_{i}]$ such that

\[\varpi_{1}(x)+\varpi_{2}(x)=x-a\]

and

\[\operatorname{exp_{L}}(\phi|_{[a,x]})=\operatorname{N}\left(\varpi_{1}\left(\int\|\phi|_{[a,x]}\|_{2}\right),\varpi_{2}\left(\int\|\phi|_{[a,x]}\|_{2}\right),t,\tilde{F}\right).\]

Thus, the critical cases with $\log$ blowing up are the Magnus elliptic and parabolic (but not quasicomplex) developments, up to reparametrization and rearrangement of elliptic parts.

Proof.

The presentation must be Magnus minimal, otherwise the $\log$ would exist. Divide $[a,b]$ into $[a,x]$ and $[x,b]$, and replace the minimal presentation by normal forms. They must fit together in accordance with minimality. It is easy to see that in the Magnus hyperbolic / loxodromic cases $\operatorname{CD}(\operatorname{exp_{L}}(\phi|_{[a,x]}))$ has no chance to reach $(-\infty,0]$. The disks are the largest in the Magnus hyperbolic cases, and the chiral disks $\operatorname{CD}(W(\pi,\pi\sin t))$ of Magnus strictly hyperbolic developments do not reach the negative axis. So the Magnus elliptic and parabolic cases remain, but the quasicomplex one is ruled out. ∎

Thus, even critical cases with $\int\|\phi\|_{2}=\pi$ are scarce.

Remark 13.15.

We started this section by investigating matrices $A$ with $\operatorname{CD}(A)\subset\operatorname{\mathring{D}}(0,\pi)$. It is a natural question to ask whether the treatment extends to matrices $A$ with, say, $\operatorname{CD}(A)\cap(-\infty,0]=\emptyset$. The answer is affirmative. However, if we consider this question, then it is advisable to take an even bolder step:

We can extend the statements to $A\in\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$, the universal cover of $\operatorname{GL}^{+}_{2}(\mathbb{R})$. This, of course, implies that we have to use the covering exponential $\widetilde{\exp}:\mathrm{M}_{2}(\mathbb{R})\rightarrow\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$, and $\operatorname{exp_{L}}$ should also be replaced by $\widetilde{\operatorname{exp_{L}}}$. Now the chiral disks of elements of $\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$ live in $\widetilde{\mathbb{C}}$, the universal cover of $\mathbb{C}\setminus\{0\}$.

Mutatis mutandis, Theorems 13.1, 13.2, 13.3 extend in a straightforward manner. Remarkably, Theorems 1.1 and 1.2 have versions in this case (the group acting on the universal cover of $\mathbb{C}\setminus\{0\}$); however, we do not really need them that much, because chiral disks can be traced directly to prove a variant of Theorem 13.4. Elements of $\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$ also have minimal Magnus presentations. In our previous terminology, they are semilocally Magnus minimal presentations. In fact, semilocally Magnus minimal presentations up to semilocal contractions will correspond to elements of $\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$. Their classification into Magnus hyperbolic, elliptic, parabolic, loxodromic, quasicomplex elements extends to $\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$. This picture of $\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$ helps to understand $\operatorname{GL}^{+}_{2}(\mathbb{R})$. Indeed, we see that every element of $\operatorname{GL}^{+}_{2}(\mathbb{R})$ has countably many semilocally Magnus minimal presentations up to semilocal contractions, and among those one or two (conjugates) are minimal. The Magnus exponent of an element of $\operatorname{GL}^{+}_{2}(\mathbb{R})$ is the minimal Magnus exponent of its lifts to $\widetilde{\operatorname{GL}^{+}_{2}}(\mathbb{R})$. ∎

Example 13.16.

Let $\boldsymbol{z}=4.493\ldots$ be the solution of $\tan z=z$ on the interval $[\pi,2\pi]$. Consider

\[\boldsymbol{Z}=\begin{bmatrix}-\sqrt{1+\boldsymbol{z}^{2}}-\boldsymbol{z}&\\ &-\sqrt{1+\boldsymbol{z}^{2}}+\boldsymbol{z}\end{bmatrix}.\]

The determinant of the matrix is $1$; we want to compute its real Magnus exponent. The optimistic suggestion is $\mathcal{M}_{2\times 2\,\,\mathrm{complex}}(\boldsymbol{Z})=\sqrt{\pi^{2}+\log(\boldsymbol{z}+\sqrt{1+\boldsymbol{z}^{2}})^{2}}=3.839\ldots$. Indeed, in the complex case, or in the doubled real case, this is realizable from

\[\boldsymbol{Z}=\exp\begin{bmatrix}\log(\boldsymbol{z}+\sqrt{1+\boldsymbol{z}^{2}})+\pi\mathrm{i}&\\ &-\log(\boldsymbol{z}+\sqrt{1+\boldsymbol{z}^{2}})+\pi\mathrm{i}\end{bmatrix}.\]

However, in the real case, there is “not enough space” to do this. A pessimistic suggestion is $\pi+|\log(\boldsymbol{z}+\sqrt{1+\boldsymbol{z}^{2}})|=5.349\ldots$. Indeed, we can change sign by an elliptic exponential, and then continue by a hyperbolic exponential. This, we know, cannot be optimal. In reality, the answer is $\mathcal{M}_{2\times 2\,\,\mathrm{real}}(\boldsymbol{Z})=\boldsymbol{z}=4.493\ldots$. In fact, $\boldsymbol{Z}$ is Magnus parabolic; one can check that $\boldsymbol{Z}\sim W(\boldsymbol{z},\boldsymbol{z})$. This is easy to see from the chiral disk.

In this case there are two Magnus minimal presentations, because of the conjugational symmetry. ∎
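The numerical values quoted in this example can be verified directly. A stdlib-only sketch (the bisection bracket and iteration count are our ad hoc choices; the root of $\tan z=z$ in $[\pi,2\pi]$ lies before the pole at $3\pi/2$):

```python
import math

# Solve tan z = z on (pi, 3pi/2) by bisection; tan z - z is increasing there.
def f(z):
    return math.tan(z) - z

lo, hi = math.pi + 0.1, 1.5 * math.pi - 1e-6
for _ in range(100):
    mid = (lo + hi) / 2
    if f(mid) < 0:
        lo = mid
    else:
        hi = mid
z = (lo + hi) / 2            # 4.4934..., the real Magnus exponent of Z

r = math.sqrt(1 + z * z)
print((-r - z) * (-r + z))   # det Z = (1 + z^2) - z^2 = 1 up to roundoff

L = math.log(z + r)                      # log(z + sqrt(1 + z^2)) = 2.2079...
print(math.sqrt(math.pi ** 2 + L ** 2))  # complex Magnus exponent, 3.8398...
print(math.pi + abs(L))                  # pessimistic bound, 5.3495...
```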

14. Optimal asymptotic norm estimate for $\mathrm{M}_{2}(\mathbb{R})$

If $\psi$ is an ordered $\mathrm{M}_{2}(\mathbb{R})$ valued measure with cumulative norm $p=\int\|\psi\|_{2}$ such that $0<p<\pi$, then, according to the previous sections, the maximal possible norm of its Magnus expansion $\mu_{\mathrm{L}}(\psi)$ is realized via maximal disks, through canonical Magnus parabolic or hyperbolic developments. Thus this maximal norm is of shape $\|W(p,p\sin t)\|_{2}$, where $0\leq\sin t\leq 1$. Hence we have to optimize in $\sin t=s(p)$, although at this point it is not clear that $s(p)$ is uniquely determined by $p$.

Using the implicit function theorem, however, it is easy to see that, for $p\searrow 0$, the optimal ridge is defined with $\sin t=s(p)$, where

\[s(p)=1-\frac{1}{6}p^{2}+\frac{47}{360}p^{4}+O(p^{6}).\]

Then

(116)\[\|\mu_{\mathrm{L}}(\Phi_{s(p)}|_{[0,p]})\|_{2}=p+\frac{1}{6}p^{3}-\frac{1}{72}p^{5}+\frac{31}{3024}p^{7}+O(p^{9}).\]

We see that this is somewhere in the middle, below the upper estimate (7) and above the parabolic case (106); but all deviations are of $O(p^{7})$.

Using an appropriate reparametrization and the implicit function theorem, one can see that, for $p\nearrow\pi$, the optimal ridge is defined with $\sin t=s(p)$, where

\[s(p)=\underbrace{1-\frac{1}{\pi}(\pi-p)^{1}}_{p/\pi}+\frac{2^{3/2}}{\pi^{3/2}}(\pi-p)^{3/2}-\frac{4}{3}(\pi-p)^{2}+O\left((\pi-p)^{5/2}\right).\]

Then

(117)\[\|\mu_{\mathrm{L}}(\Phi_{s(p)}|_{[0,p]})\|_{2}=\sqrt{2}\pi^{3/2}(\pi-p)^{-1/2}-2\pi+\frac{\sqrt{2\pi}(4\pi^{2}-3)}{12}(\pi-p)^{1/2}-\frac{4\pi^{2}-3}{3}(\pi-p)^{1}+\frac{\sqrt{2}(368\pi^{2}-120\pi^{2}-45)}{1440\sqrt{\pi}}(\pi-p)^{3/2}+O((\pi-p)^{2}).\]

This is below the upper estimate (8) by $O(1)$, above the parabolic case (107) by $O((\pi-p)^{1/2})$, and above the lower estimate (110) by $O((\pi-p)^{3/2})$.

It is natural to ask whether the general upper estimate can be improved to, say,

\[\pi\sqrt{\frac{\pi+p}{\pi-p}}-2\pi+o(1),\]

as $p\nearrow\pi$.

Remark 14.1.

One can show that $s(p)$ yields a well-defined analytic function for $0<p<\pi$. This is, however, complicated, as it requires global estimates. ∎

15. A counterexample for $2\times 2$ complex matrices

According to Theorem 13.4, $A\in\mathrm{M}_{2}(\mathbb{R})$, $0\leq p<\pi$, and $\operatorname{CR}(A)\subset\exp\operatorname{D}(0,p)$ imply that $A=\operatorname{exp_{L}}\phi$ with some appropriate $\phi$ such that $\int\|\phi\|_{2}\leq p$. Here we demonstrate that the corresponding statement is not valid for $2\times 2$ complex matrices.

Let $0<p<\pi$. In the CKB model, we will consider the Magnus range

\[S_{p}=\frac{\mathrm{CKB}}{\mathrm{PH}}\circ\exp((\mathbb{C}^{+})\cap\operatorname{D}(0,p)).\]

Then it is easy to see that its hyperbolic boundary

\[\partial S_{p}=\frac{\mathrm{CKB}}{\mathrm{PH}}\circ\exp((\mathbb{C}^{+})\cap\partial\operatorname{D}(0,p))\]

is parametrized by the curve

\[t\in(0,\pi)\mapsto s_{p}(t)=\left(\frac{\cos(p\sin t)}{\cosh(p\cos t)},\tanh(p\cos t)\right).\]

The tangent line at $t$ is given by the equation

\[0=\underbrace{(\sin t)}_{A_{p}(t):=}x+\underbrace{(\sin t\sinh(p\cos t)\cos(p\sin t)-\cos t\cosh(p\cos t)\sin(p\sin t))}_{B_{p}(t):=}y+\underbrace{(-\sin t\cosh(p\cos t)\cos(p\sin t)+\cos t\sinh(p\cos t)\sin(p\sin t))}_{C_{p}(t):=}.\]

(In this section, $x,y$ are understood as $x_{\mathrm{CKB}},y_{\mathrm{CKB}}$.) Note that $A_{p}(\pi-t)=A_{p}(t)$, $B_{p}(\pi-t)=-B_{p}(t)$, $C_{p}(\pi-t)=-C_{p}(t)$.

It is not hard to see that the equation of the ellipse tangent to $s_{p}$ at the three parameter points $t<\frac{\pi}{2}<\pi-t$ is given by

\[E_{p,t}(x,y)\equiv(\cos(p\sin t)-\cosh(p\cos t)\,x)^{2}(A_{p}(t)\cos(p)+C_{p}(t))^{2}-(\cos(p\sin t)-\cosh(p\cos t)\cos(p))^{2}\left((A_{p}(t)x+C_{p}(t))^{2}-(B_{p}(t)y)^{2}\right)=0\]

(the coefficient of $y^{2}$ is nonnegative).
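That $E_{p,t}$ vanishes at all three tangency points can be probed numerically; a sketch (the helper names are ours, and the sample $(p,t)$ is arbitrary with $0<t<\pi/2$):

```python
import math

def coeffs(p, t):
    # A_p(t), B_p(t), C_p(t) from the tangent-line equation above.
    g, d = p * math.sin(t), p * math.cos(t)
    return (math.sin(t),
            math.sin(t) * math.sinh(d) * math.cos(g) - math.cos(t) * math.cosh(d) * math.sin(g),
            -math.sin(t) * math.cosh(d) * math.cos(g) + math.cos(t) * math.sinh(d) * math.sin(g))

def s(p, t):
    # Boundary parametrization s_p(t).
    g, d = p * math.sin(t), p * math.cos(t)
    return math.cos(g) / math.cosh(d), math.tanh(d)

def E(p, t, x, y):
    # The quadratic form E_{p,t}(x, y) as displayed above.
    A, B, C = coeffs(p, t)
    g, d = p * math.sin(t), p * math.cos(t)
    K = A * math.cos(p) + C
    L = math.cos(g) - math.cosh(d) * math.cos(p)
    return (math.cos(g) - math.cosh(d) * x) ** 2 * K ** 2 \
        - L ** 2 * ((A * x + C) ** 2 - (B * y) ** 2)

p, t = 0.9, 1.2
for u in (t, math.pi / 2, math.pi - t):
    print(E(p, t, *s(p, u)))   # ~0 at all three tangency parameters
```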

Example (Proposition) 15.1.

For the choice

\[(p_{0},t_{0})=\left(\frac{14}{15}\pi,\frac{7}{15}\pi\right),\]

the elliptical disk

\[E_{p_{0},t_{0}}(x,y)\leq 0\]

is realized as the conformal range of a $2\times 2$ complex matrix $X$ in CKB (i.e. as $\operatorname{DW}_{\mathrm{CKB}}(X)$). Regarding this $X$ (4 different cases up to unitary conjugation),

\[\operatorname{CR}(X)\subset\exp\operatorname{D}(0,p_{0})\]

but

(118)\[\mathcal{M}_{2\times 2\,\,\mathrm{complex}}(X)>p_{0}.\]
Proof.

One can check that, for the given choice, the ellipse $E_{p_{0},t_{0}}(x,y)=0$ lies in the interior of the unit circle, thus it can be realized as the boundary of the conformal range of a (purely complex) $2\times 2$ complex matrix $X$. (Cf. Davis [5], [6], Lins, Spitkovsky, Zhong [14], or, in greater detail, [13].) It is harder to see that this ellipse lies in $S_{p_{0}}$, intersecting (tangent to) $\partial S_{p_{0}}$ at $s_{p_{0}}(t_{0}),s_{p_{0}}(\pi/2),s_{p_{0}}(\pi-t_{0})$. (This requires checking the second derivatives near the critical points; for the rest, numerical considerations are sufficient.)

Assume now that $X=\operatorname{exp_{L}}\phi$ with $\int\|\phi\|_{2}=p_{0}$. Then $\phi$ allows an initial restriction $\phi|_{I}$ such that $\int\|\phi|_{I}\|_{2}=p_{0}/2$. Let $X_{1/2}=\operatorname{exp_{L}}(\phi|_{I})$. Then, however, $s_{p_{0}}(t_{0}),s_{p_{0}}(\pi/2),s_{p_{0}}(\pi-t_{0})\in\operatorname{DW}_{\mathrm{CKB}}(X)$ implies that $s_{p_{0}/2}(t_{0}),s_{p_{0}/2}(\pi/2),s_{p_{0}/2}(\pi-t_{0})\in\operatorname{DW}_{\mathrm{CKB}}(X_{1/2})$. In this case $\operatorname{DW}_{\mathrm{CKB}}(X_{1/2})$ should be the elliptical disk with boundary $E_{p_{0}/2,t_{0}}(x,y)=0$. One can, on the other hand, check that this ellipse is not in the unit circle, which is a contradiction.

In fact, if we allow $X=\operatorname{exp_{L}}\phi$ with $\int\|\phi\|_{2}=p_{0}+\varepsilon$ (with small $\varepsilon$), then the boundary of $\operatorname{DW}_{\mathrm{CKB}}(X_{1/2})$ should still pass through certain points near $s_{p_{0}/2}(t_{0})$, $s_{p_{0}/2}(\pi/2)$, $s_{p_{0}/2}(\pi-t_{0})$ but contained in $S_{p_{0}/2}$ (making the tangents also close at the given points). This still makes the boundary of $\operatorname{DW}_{\mathrm{CKB}}(X_{1/2})$ close to $E_{p_{0}/2,t_{0}}(x,y)=0$, yielding a contradiction. (Thus (118) could be quantified further.) ∎
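The two numerical claims of the proof (the conic $E_{p_{0},t_{0}}=0$ lies inside the unit circle, while $E_{p_{0}/2,t_{0}}=0$ does not) can be probed by brute-force sampling. In this sketch, solving $E_{p,t}(x,y)=0$ for $y^{2}$ and the wide sampling window are our implementation choices; the window is deliberately wide, as the halved conic turns out to be strongly elongated:

```python
import math

def coeffs(p, t):
    # A_p(t), B_p(t), C_p(t) from the tangent-line equation.
    g, d = p * math.sin(t), p * math.cos(t)
    A = math.sin(t)
    B = math.sin(t) * math.sinh(d) * math.cos(g) - math.cos(t) * math.cosh(d) * math.sin(g)
    C = -math.sin(t) * math.cosh(d) * math.cos(g) + math.cos(t) * math.sinh(d) * math.sin(g)
    return A, B, C

def max_radius(p, t, n=28001):
    # Solve E_{p,t}(x, y) = 0 for y^2 and scan x over an ad hoc window [-12, 2];
    # returns the maximal sqrt(x^2 + y^2) found on the conic.
    A, B, C = coeffs(p, t)
    g, d = p * math.sin(t), p * math.cos(t)
    K = A * math.cos(p) + C
    L = math.cos(g) - math.cosh(d) * math.cos(p)
    best = 0.0
    for i in range(n):
        x = -12.0 + 14.0 * i / (n - 1)
        y2 = ((L * (A * x + C)) ** 2 - ((math.cos(g) - math.cosh(d) * x) * K) ** 2) / (L * B) ** 2
        if y2 >= 0.0:
            best = max(best, math.hypot(x, math.sqrt(y2)))
    return best

p0, t0 = 14 * math.pi / 15, 7 * math.pi / 15
r_full = max_radius(p0, t0)        # stays inside the unit circle
r_half = max_radius(p0 / 2, t0)    # escapes the unit circle
print(r_full, r_half)
```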

References

  • [2] Berger, Marcel: Geometry I, II. Universitext. Springer-Verlag, Berlin, 1987.
  • [3] Bonfiglioli, Andrea; Fulci, Roberta: Topics in noncommutative algebra. The theorem of Campbell, Baker, Hausdorff and Dynkin. Lecture Notes in Mathematics, 2034. Springer, Heidelberg, 2012.
  • [4] Casas, Fernando: Sufficient conditions for the convergence of the Magnus expansion. J. Phys. A 40 (2007), 15001–15017.
  • [5] Davis, Chandler: The shell of a Hilbert-space operator. Acta Sci. Math. (Szeged) 29 (1968), 69–86.
  • [6] Davis, Chandler: The shell of a Hilbert-space operator. II. Acta Sci. Math. (Szeged) 31 (1970) 301–318.
  • [7] Gans, David: A new model of the hyperbolic plane. American Math. Monthly 73 (1966), 291–295.
  • [8] Goldberg, Karl: The formal power series for $\log\mathrm{e}^{x}\mathrm{e}^{y}$. Duke Math. J. 23 (1956), 13–21.
  • [9] Kreĭn, M. G.: The angular localization of the spectrum of a multiplicative integral in Hilbert space (In Russian). Funkcional. Anal. i Priložen. 3 (1969), 89–90.
  • [10] Lakos, Gyula: Some proofs of the Poincaré–Birkhoff–Witt theorem and related matters. arXiv:1812.04896
  • [11] Lakos, Gyula: Convergence estimates for the Magnus expansion I. Banach algebras. arXiv:1709.01791
  • [12] Lakos, Gyula: Convergence estimates for the Magnus expansion II. $C^{*}$-algebras. arXiv:1910.03328
  • [13] Lakos, Gyula: On the elliptical range theorems for the Davis–Wielandt shell, the numerical range, and the conformal range. arXiv:2211.13145
  • [14] Lins, Brian; Spitkovsky, Ilya M.; Zhong, Siyu: The normalized numerical range and the Davis–Wielandt shell. Linear Algebra Appl. 546 (2018), 187–209.
  • [15] Magnus, Wilhelm: On the exponential solution of differential equations for a linear operator. Comm. Pure Appl. Math. 7 (1954), 649–673.
  • [16] Michel, Jean: Bases des algèbres de Lie et série de Hausdorff. Séminaire Dubreil. Algèbre, 27 n.1 (1973-1974), exp. n.6, 1–9 (1974).
  • [17] Mityagin, B. S.: Unpublished notes, 1990.
  • [18] Moan, Per Christian; Niesen, Jitse: Convergence of the Magnus series. Found. Comput. Math. 8 (2008), 291–301.
  • [19] Schäffer, Juan Jorge: On Floquet’s theorem in Hilbert spaces. Bull. Amer. Math. Soc. 70 (1964), 243–245.
  • [20] Wielandt, H.: Inclusion theorems for eigenvalues. In: Simultaneous linear equations and the determination of eigenvalues, pp. 75–78. National Bureau of Standards Applied Mathematics Series, No. 29. U. S. Government Printing Office, Washington, D. C., 1953.