
Generalizing Frobenius Inversion to Quaternion Matrices

Qiyuan Chen KLMM, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China [email protected] J. Uhlmann Department of Electrical Engineering and Computer Science, University of Missouri - Columbia [email protected]  and  Ke Ye KLMM, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China [email protected]
Abstract.

In this paper we derive and analyze an algorithm for inverting quaternion matrices. The algorithm is an analogue of the Frobenius algorithm for complex matrix inversion. On the theoretical side, we prove that our algorithm is more efficient than existing methods; moreover, it is optimal in the sense that it uses the least possible number of complex matrix inversions. On the practical side, our algorithm outperforms existing algorithms on randomly generated matrices. We argue that this algorithm can be used to improve the practical utility of recursive Strassen-type algorithms by providing the fastest possible base case for the recursive decomposition process when applied to quaternion matrices.

1. Introduction

Quaternions are a typical example of a hypercomplex number system; they were first discovered by Hamilton in 1843 [18]. Since then, quaternions have been extensively studied and applied in various fields of mathematics including contact and spin geometry [3, 26], function theory [38, 11], non-commutative algebra [24, 6, 7], linear algebra [47] and algebraic topology [20]. On the other hand, since the group of unit quaternions is isomorphic to the group SU(2) of 2×2 special unitary matrices, quaternions provide a natural way to represent spatial rotations. (SU(2) is a double cover of the group SO(3) of all spatial rotations.) Because of this connection, quaternions are widely applied in areas related to rigid body mechanics and quantum mechanics [21, 17, 4, 13, 30, 29, 34]. Quaternions are also useful for combining related variables into a single algebraic entity that can be manipulated collectively in a natural way, e.g., representing RGB pixel values using the i, j, and k components of a single quaternion. During the past decade, novel quaternion-based methods and models have emerged in a wide variety of signal processing applications [41, 16, 44, 40].

The increasing prevalence of quaternions in science and engineering has led to increased focus on the development of high-efficiency implementations of core utilities such as the quaternion LU decomposition [42], the quaternion SVD [25], low rank approximation of quaternion matrices [27], and high performance quaternion matrix multiplication [43].

The focus of this paper is on arguably the most fundamental quaternion matrix operation after multiplication: quaternion matrix inversion. Presently there are only four prominent algorithms for matrix inversion over quaternions in the literature [42, 47, 39, 33], and their use includes applications such as computing or estimating the inverse of a covariance matrix in statistics and machine learning [2, 40, 46]; quaternion batch normalization [14, 31]; construction of orthogonal polynomials [12]; the computation of numerical integrals [19]; and hardware design of MIMO radios [37].

The structure of this paper is as follows. We begin with background on algorithms for matrix inversion and how the most efficient algorithm depends on the type of matrix (e.g., real, complex, or quaternion). We then derive an algorithm that is optimized for quaternion matrix inversion. After that, we discuss the computational complexity of our algorithm from two different points of view. Finally, we provide empirical results demonstrating its efficiency advantage over prior methods presented in the literature.

2. Background

The development of practically efficient methods for manually inverting matrices dates back to the early 20th Century. An important early result due to Frobenius in 1910 was a method for decomposing the inverse of a complex matrix into a form requiring only the evaluation of inverses of real matrices. Interest in methods of this type was renewed in the 1950s and 1960s as a means for inverting complex matrices in contexts in which complex variables were not natively supported, e.g., when numerical libraries contained only optimized routines for real matrices.

In 1969, Strassen showed that a special recursive block decomposition of the matrix multiplication problem could achieve better complexity than the conventional O(n³)-time algorithm. Moreover, he showed that this implies related recursive algorithms with the same subcubic complexity for computing the determinant and inverse of a matrix. Refinements of the approach have led to progressively improving computational complexities over the decades [8], with O(n^{2.37286}) being the current best complexity for matrix multiplication [1] and, consequently, matrix inversion.

While Strassen-type algorithms provide better asymptotic computational complexities, their practical utility depends on the absolute efficiency offered for the problem at hand, i.e., whether it is faster than the best cubic-complexity algorithm for the matrix size of interest. More specifically, the practical efficiency of a generic Strassen-type recursion depends on the base-case efficiency of the fastest available cubic-complexity algorithm. In other words, there exists a fixed breakeven matrix size at which it is faster to apply a cubic-complexity algorithm than it is to continue the recursive block decomposition process down to the scalar base case. In the case of real matrix multiplication, the best base-case algorithm is the conventional cubic-complexity matrix multiplication algorithm. In the case of matrix inversion, however, the situation is much more subtle.

In [9], the authors established a surprising result that the Frobenius algorithm for inverting a complex matrix incurs less numerical overhead, and is demonstrably faster, than state-of-the-art methods based on LU and QR matrix decompositions. More specifically, they examined the Frobenius inversion method for a complex matrix Z = A + iB:

Z^{-1} = (A + BA^{-1}B)^{-1} - iA^{-1}B(A + BA^{-1}B)^{-1},    (1)

which requires only the evaluation of inverses of real matrices (namely A and A + BA^{-1}B). The expression (1) requires only the A component of nonsingular Z to be nonsingular, but it can be reformulated to require only B to be nonsingular. (Randomization, e.g., as suggested by Zielinski [48], can reduce the assumptions to the minimum of nonsingular Z, but at the expense of additional overhead. In this paper we focus only on generalizations of (1), with the understanding that our results can be trivially reformulated to satisfy different assumptions about the structure of Z.) We now show that a similar approach can be applied to obtain improved computational efficiency for the case of quaternion matrices.
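As a quick illustration, the following is a minimal NumPy sketch of (1); the function name and test harness are ours, and we assume the generic case in which both A and A + BA^{-1}B are nonsingular.

import numpy as np

def frobenius_inv_complex(A, B):
    # invert Z = A + iB using only real matrix inversions, per (1)
    U = np.linalg.solve(A, B)        # A^{-1} B
    S = np.linalg.inv(A + B @ U)     # (A + B A^{-1} B)^{-1}
    return S - 1j * (U @ S)          # real part S, imaginary part -A^{-1}B S

rng = np.random.default_rng(0)
A, B = rng.standard_normal((2, 5, 5))
print(np.allclose(frobenius_inv_complex(A, B), np.linalg.inv(A + 1j * B)))  # True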

3. Quaternion Matrices

The space of quaternions, denoted ℍ, generalizes the complex numbers ℂ by introducing two additional imaginary units, denoted j and k. Thus, whereas a complex number can be represented with two real scalar parameters a and b as a + ib, a quaternion has four real parameters and has the form a + ib + jc + kd, i.e., it has one real and three imaginary components. Quaternions are algebraically important because they form a linear space with properties analogous to matrices, e.g., they do not commute, but retain many valuable properties of complex numbers, e.g., ℍ is a normed division algebra.

The imaginary components of a quaternion are analogous to the complex imaginary component in that

i² = -1,  j² = -1,  and  k² = -1,    (2)

with component cross products

ij=k,jk=i,andki=j,ij=k,~{}jk=i,~{}\mbox{and}~{}ki=j, (3)

and reversing the order of any such product multiplies the result by -1, i.e., the imaginary units anticommute:

ji = -k,  kj = -i,  and  ik = -j.    (4)
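Rules (2)–(4) determine the full Hamilton product of two quaternions. The following minimal Python sketch (ours) writes it out directly, with quaternions modelled as 4-tuples (a, b, c, d) standing for a + ib + jc + kd:

def qmul(p, q):
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1*a2 - b1*b2 - c1*c2 - d1*d2,   # real part: i^2 = j^2 = k^2 = -1
            a1*b2 + b1*a2 + c1*d2 - d1*c2,   # i part:    jk = i,  kj = -i
            a1*c2 - b1*d2 + c1*a2 + d1*b2,   # j part:    ki = j,  ik = -j
            a1*d2 + b1*c2 - c1*b2 + d1*a2)   # k part:    ij = k,  ji = -k

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))  # (0, 0, 0, 1) and (0, 0, 0, -1): ij = k, ji = -k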

While less familiar than complex numbers, quaternions have attracted steadily increasing practical interest since the 1990s: they are now widely used in computer graphics and GIS systems to perform rotations around a specified vector, as well as in automatic control and system analysis applications involving state evolution on a sphere, e.g., circular orbits around the earth. More recently, quaternion matrices, i.e., matrices with quaternion elements, have increased in prominence in applications ranging from representations of color information for image processing to uses in neural networks.

A quaternion matrix can be represented in the following form:

H = A + iB + jC + kD,    (5)

for real-valued matrices A, B, C, and D. A natural question, then, is whether the computation of the inverse of a given quaternion matrix can be made more efficient by using a decomposition inspired by that of Frobenius for complex matrices.

4. Inverting Quaternion Matrices

Our first observation is that the Frobenius method can be applied directly in the case that any two of the matrices B, C, and D are zero. For example, if B and C are zero, then H = A + kD and

H^{-1} = (A + DA^{-1}D)^{-1} - kA^{-1}D(A + DA^{-1}D)^{-1},    (6)

where the inverse is computed in the complex plane spanned by 1 and k. From this observation, it can be concluded that any generalized Frobenius-type inverse must satisfy the above with regard to each of i, j, and k when the matrix coefficients on the other two are zero.

We observe that there is an inclusion of ℝ-algebras:

ℝ ⊆ ℂ ⊆ ℍ.

This implies that

M_n(ℝ) ⊆ M_n(ℂ) ⊆ M_n(ℍ).

We remark that M_n(ℂ) is an M_n(ℝ)-bimodule, since M_n(ℂ) ≃ M_n(ℝ) ⊗_ℝ ℂ. Moreover, ℂ is a quadratic field extension of ℝ, over which matrix inversion is thoroughly discussed in [9]. By comparison, we notice that ℍ is not an algebra over ℂ, by the Gelfand-Mazur theorem [15, 28]. Thus the method developed in [9] does not apply directly to inversion in M_n(ℍ) over M_n(ℂ). However, we still have

M_n(ℍ) ≃ M_n(ℂ) ⊗_ℂ ℍ.

This implies that given any Z, W ∈ M_n(ℍ), we may write

Z = A + iB + jC + kD = (A + iB) + j(C - iD),
W = E + iF + jG + kH = (E + iF) + j(G - iH),

where A, B, C, D, E, F, G, H ∈ M_n(ℝ). We observe that

ZW = [(A + iB) + j(C - iD)][(E + iF) + j(G - iH)]
   = [(A + iB)(E + iF) - (C + iD)(G - iH)] + j[(C - iD)(E + iF) + (A - iB)(G - iH)].

Thus W is the inverse of Z if and only if

(A + iB)(E + iF) - (C + iD)(G - iH) = I_n,    (7)
(C - iD)(E + iF) + (A - iB)(G - iH) = 0.    (8)

Solving (7) and (8) for E, F, G and H, we obtain the two parts of the lemma that follows.

Lemma 4.1.

Let Z = A + iB + jC + kD ∈ M_n(ℍ).

(a) If A + iB is invertible in M_n(ℂ), then (A + iB) + (C + iD)(A - iB)^{-1}(C - iD) is also invertible and

Z^{-1} = [(A + iB) + (C + iD)(A - iB)^{-1}(C - iD)]^{-1} - j(A - iB)^{-1}(C - iD)[(A + iB) + (C + iD)(A - iB)^{-1}(C - iD)]^{-1}.    (9)
(b) If C + iD is invertible in M_n(ℂ), then (A + iB)(C - iD)^{-1}(A - iB) + (C + iD) is invertible and

Z^{-1} = (C - iD)^{-1}(A - iB)[(A + iB)(C - iD)^{-1}(A - iB) + (C + iD)]^{-1} + j[-(A + iB)(C - iD)^{-1}(A - iB) - (C + iD)]^{-1}.    (10)
Proof.

It is sufficient to prove (a). To that end, we recall that a quaternion z = a + ib + jc + kd ∈ ℍ can be represented as the 2×2 complex matrix [[a+ib, c+id], [-c+id, a-ib]], where a, b, c, d ∈ ℝ. Therefore, we may also represent Z = A + iB + jC + kD ∈ M_n(ℍ) as a complex 2×2 block matrix

Ẑ = [[A+iB, C+iD], [-C+iD, A-iB]] ∈ M_{2n}(ℂ),

where A, B, C, D ∈ M_n(ℝ). As a consequence, inverting Z is equivalent to inverting Ẑ. If A - iB is invertible (equivalently, if A + iB is invertible, since the two matrices are entrywise conjugate), then we may invert Ẑ via the Schur complement, i.e.,

Ẑ^{-1} = [[W_1, -W̄_2], [W_2, W̄_1]],

where X̄ denotes the conjugate of a matrix X and

W_1 = [(A + iB) + (C + iD)(A - iB)^{-1}(C - iD)]^{-1},
W_2 = (A - iB)^{-1}(C - iD)[(A + iB) + (C + iD)(A - iB)^{-1}(C - iD)]^{-1}.

Here the invertibility of (A + iB) + (C + iD)(A - iB)^{-1}(C - iD) is ensured by that of A + iB and equations (7) and (8). Hence the inverse of Z is

Z^{-1} = W_1 - jW_2,

and this completes the proof. ∎

Clearly, one can permute the roles of A, B, C, D in Lemma 4.1 to obtain further inversion formulae in M_n(ℍ).

4.1. The Frobenius algorithm over M_n(ℂ) and its optimality

For notational simplicity, we define

ℰ_1 := {A + iB + jC + kD ∈ M_n(ℍ) : A + iB ∈ GL_n(ℂ)},
ℰ_2 := {A + iB + jC + kD ∈ M_n(ℍ) : C + iD ∈ GL_n(ℂ)}.

We present the formulae of Lemma 4.1 (a)–(b) as pseudocode in Algorithm 1.

Algorithm 1 Complex Frobenius inversion
Input: Z = A + iB + jC + kD ∈ ℰ_1 ∪ ℰ_2
Output: inverse of Z
1: if Z ∈ ℰ_1 then
2:   compute X_1 = (A - iB)^{-1}, X_2 = X_1(C - iD), X_3 = (C + iD)X_2,
3:   compute X_4 = ((A + iB) + X_3)^{-1}, X_5 = -X_2X_4.
4:   return Z^{-1} = X_4 + jX_5.
5: else if Z ∈ ℰ_2 then
6:   compute X_1 = (C - iD)^{-1}, X_2 = X_1(A - iB), X_3 = (A + iB)X_2,
7:   compute X_4 = (X_3 + (C + iD))^{-1}, X_5 = X_2X_4.
8:   return Z^{-1} = X_5 - jX_4.
9: end if
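For concreteness, the following is a minimal NumPy sketch of the ℰ_1 branch of Algorithm 1 (the function name and test harness are ours). The quaternion inverse is returned as the complex pair (X_4, X_5) with Z^{-1} = X_4 + jX_5, and the result is checked through the 2n×2n complex representation from the proof of Lemma 4.1.

import numpy as np

def frobenius_inv_quaternion(A, B, C, D):
    # Algorithm 1, case Z in E_1: assumes A + iB is invertible
    X1 = np.linalg.inv(A - 1j * B)
    X2 = X1 @ (C - 1j * D)
    X3 = (C + 1j * D) @ X2
    X4 = np.linalg.inv((A + 1j * B) + X3)
    X5 = -X2 @ X4
    return X4, X5  # Z^{-1} = X4 + j X5

rng = np.random.default_rng(1)
A, B, C, D = rng.standard_normal((4, 4, 4))
P, Q = frobenius_inv_quaternion(A, B, C, D)
# the embedding of P + jQ has blocks [[P, conj(Q)], [-Q, conj(P)]]
Zhat = np.block([[A + 1j*B, C + 1j*D], [-C + 1j*D, A - 1j*B]])
W = np.block([[P, Q.conj()], [-Q, P.conj()]])
print(np.allclose(Zhat @ W, np.eye(8)))  # True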

It is clear that Algorithm 1 costs 2 inversions and 3 multiplications in M_n(ℂ). Next we analyze the optimality of Algorithm 1. To that end, we need the following two lemmas.

Lemma 4.2.

[9] For any field 𝕜 and A, B ∈ M_n(𝕜) such that both A and A + BA^{-1}B are invertible, it requires at least two matrix inversions to compute (A + BA^{-1}B)^{-1}, no matter how many matrix additions, matrix multiplications or scalar multiplications in 𝕜 are used.

Lemma 4.3.

Let p_1, p_2, q_1, q_2 be polynomials over ℂ in n variables. Assume that

p_1(a)q_2(a) = p_2(a)q_1(a)

for any a = (a_1, ..., a_n) ∈ ℝ^n such that q_1(a) ≠ 0 and q_2(a) ≠ 0. Then p_1q_2 = p_2q_1.

Proof.

We observe that any polynomial f over ℂ can be uniquely written as f = Re(f) + i Im(f), where Re(f) and Im(f) are polynomials with real coefficients. Suppose, for contradiction, that p_1q_2 - p_2q_1 ≠ 0. Then either Re(p_1q_2 - p_2q_1) ≠ 0 or Im(p_1q_2 - p_2q_1) ≠ 0. Without loss of generality, we may assume that Re(p_1q_2 - p_2q_1) ≠ 0. Similarly, we may also assume Re(q_1) ≠ 0 and Re(q_2) ≠ 0. Since ℝ is an infinite field, there exists a = (a_1, ..., a_n) ∈ ℝ^n such that

(Re(p_1q_2 - p_2q_1) Re(q_1) Re(q_2))(a) ≠ 0.

Hence (p_1/q_1 - p_2/q_2)(a) ≠ 0 while q_1(a) ≠ 0 and q_2(a) ≠ 0, which contradicts the assumption. ∎

Now we are ready to prove the optimality of Algorithm 1. To simplify the notation, we define

Z_1 = A + iB,  Z_2 = C + iD.

Thus Z = A + iB + jC + kD = Z_1 + jZ̄_2. According to Algorithm 1, we may also assume that Z_1 is invertible, so that Z^{-1} = W_1 - jW_2, where

W_1 = (Z_1 + Z_2 Z̄_1^{-1} Z̄_2)^{-1},  W_2 = Z̄_1^{-1} Z̄_2 (Z_1 + Z_2 Z̄_1^{-1} Z̄_2)^{-1}.

Here X̄ denotes the complex conjugate of a complex matrix X.

Theorem 4.4.

It requires at least 1 addition and 2 inversions to compute (Z_1 + Z_2 Z̄_1^{-1} Z̄_2)^{-1}. In particular, Algorithm 1 is optimal in the sense of using the least number of complex matrix inversions.

We remark that Theorem 4.4 should be understood as follows: 2 is the minimum number of complex matrix inversions required to compute Z^{-1}, regardless of the number of additions, matrix multiplications, scalar multiplications and conjugations used.

Proof.

One addition is obviously necessary, so it suffices to prove the necessity of 2 inversions. We proceed by contradiction: assume that there exists an algorithm which costs only one inversion. Then we may find complex noncommutative polynomials f and g such that, writing F = f(Z_1, Z_2, Z̄_1, Z̄_2),

(Z_1 + Z_2 Z̄_1^{-1} Z̄_2)^{-1} = g(Z_1, Z̄_1, Z_2, Z̄_2, F^{-1}, F̄^{-1}).    (11)

Clearly, one can further find complex noncommutative polynomials h and φ so that (11) may be simplified as follows:

(Z_1 + Z_2 Z̄_1^{-1} Z̄_2)^{-1} = g(Z_1, Z̄_1, Z_2, Z̄_2, F^{-1}, F̄^{-1})
  = h(Z_1, Z̄_1, Z_2, Z̄_2, (F F̄)^{-1})
  = h(Z_1, Z̄_1, Z_2, Z̄_2, φ(Z_1, Z_2, Z̄_1, Z̄_2)^{-1}).    (12)

Next we consider the rational map Φ : M_n(ℂ) × M_n(ℂ) ⇢ M_n(ℂ) defined by

Φ(Z_1, Z_2) = (Z_1 + Z_2 Z_1^{-1} Z_2)^{-1} - h(Z_1, Z_1, Z_2, Z_2, φ(Z_1, Z_2, Z_1, Z_2)^{-1}).

In particular, the entries of the matrix Φ(Z_1, Z_2) are rational functions on M_n(ℂ) × M_n(ℂ). By (12), we see immediately that Φ ≡ 0 on M_n(ℝ) × M_n(ℝ) wherever it is defined. Therefore Lemma 4.3 implies that the entries of Φ must vanish on M_n(ℂ) × M_n(ℂ) wherever they are defined. In particular, Φ(Z_1, Z_2) ≡ 0 on M_n(ℂ) × M_n(ℂ) wherever it is defined, from which we may conclude that (Z_1 + Z_2 Z_1^{-1} Z_2)^{-1} can be computed with one complex matrix inversion. This contradicts Lemma 4.2. ∎

4.2. The Frobenius algorithm over M_n(ℝ) and its complexity

According to [9], an inversion of a generic element in M_n(ℂ) costs 2 inversions and 3 multiplications in M_n(ℝ), and a multiplication in M_n(ℂ) costs 3 multiplications in M_n(ℝ). Therefore, the total cost of inverting a generic Z ∈ M_n(ℍ) by Algorithm 1, together with the algorithms in [9] for multiplication and inversion in M_n(ℂ), is at most 4 inversions and 15 multiplications in M_n(ℝ). However, we observe that two of these multiplications are repeated, and thus we obtain Algorithm 2 and the theorem that follows.

Algorithm 2 Real Frobenius inversion
Input: generic Z = A + iB + jC + kD ∈ M_n(ℍ)
Output: inverse of Z
1: compute K = C + D.
2: compute U_1 = A^{-1}B, U_2 = (A + BU_1)^{-1}, U_3 = U_1U_2, U_4 = U_3C, U_5 = U_2D.
3: compute V_1 = (U_2 + U_3)K - U_4 - U_5, V_2 = U_4 - U_5, V_3 = V_1 - V_2, V_4 = CV_2, V_5 = DV_1.
4: compute W_1 = KV_3 + A + V_4 - V_5, W_2 = B + V_4 + V_5, W_3 = W_1^{-1}W_2.
5: compute E = (W_1 + W_2W_3)^{-1}, E_1 = V_2E.
6: compute F = -W_3E, F_1 = V_1F.
7: compute G = F_1 - E_1 - V_3(E + F), H = F_1 + E_1.
8: return Z^{-1} = E + iF + jG + kH.
Proposition 4.5.

Algorithm 2 computes Z^{-1} using 4 inversions and 13 multiplications in M_n(ℝ).
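As with Algorithm 1, a direct NumPy transcription of Algorithm 2 is straightforward (a sketch; the function name and check are ours, and the generic invertibility of A, A + BU_1, W_1 and W_1 + W_2W_3 is assumed):

import numpy as np

def real_frobenius_inv(A, B, C, D):
    # Algorithm 2: invert Z = A + iB + jC + kD using only real arithmetic
    K = C + D
    U1 = np.linalg.solve(A, B)
    U2 = np.linalg.inv(A + B @ U1)
    U3 = U1 @ U2
    U4 = U3 @ C
    U5 = U2 @ D
    V1 = (U2 + U3) @ K - U4 - U5
    V2 = U4 - U5
    V3 = V1 - V2
    V4 = C @ V2
    V5 = D @ V1
    W1 = K @ V3 + A + V4 - V5
    W2 = B + V4 + V5
    W3 = np.linalg.solve(W1, W2)
    E = np.linalg.inv(W1 + W2 @ W3)
    E1 = V2 @ E
    F = -W3 @ E
    F1 = V1 @ F
    G = F1 - E1 - V3 @ (E + F)
    H = F1 + E1
    return E, F, G, H  # Z^{-1} = E + iF + jG + kH

rng = np.random.default_rng(2)
A, B, C, D = rng.standard_normal((4, 5, 5))
E, F, G, H = real_frobenius_inv(A, B, C, D)
# check through the complex embedding: Z^{-1} = (E + iF) + j(G - iH)
Zhat = np.block([[A + 1j*B, C + 1j*D], [-C + 1j*D, A - 1j*B]])
P, Q = E + 1j*F, G - 1j*H
W = np.block([[P, Q.conj()], [-Q, P.conj()]])
print(np.allclose(Zhat @ W, np.eye(10)))  # True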

5. Comparison with existing methods

In this section, we compare the computational overhead of the Generalized Frobenius Inversion (Algorithms 1 and 2) with existing methods for quaternion matrix inversion.

5.1. Four Alternative Methods

As far as we are aware, there have been only four methods for quaternion matrix inversion previously discussed in the literature. Before we proceed further, we briefly summarize these methods.

5.1.1. Complex Method

Let φ_1 : M_n(ℍ) → M_{2n}(ℂ) be the injective map defined by

φ_1(A + iB + jC + kD) = [[A+iB, C+iD], [-C+iD, A-iB]],  A, B, C, D ∈ M_n(ℝ).

Clearly φ_1 is a homomorphism of algebras over ℝ which sends a quaternion matrix to a complex matrix of twice the size. Converting quaternion matrices into complex matrices is a standard technique [47, 42], from which we obtain Algorithm 3.

Algorithm 3 Complex Method
Input: Z = A + iB + jC + kD ∈ M_n(ℍ)
Output: inverse of Z
1: compute Y = φ_1(Z).
2: compute V = Y^{-1}.
3: return Z^{-1} = φ_1^{-1}(V).
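In NumPy, the Complex Method amounts to a few lines (a sketch; the function name is ours):

import numpy as np

def complex_method_inv(A, B, C, D):
    # embed Z via phi_1, invert, and read the components off the blocks:
    # phi_1(E + iF + jG + kH) has top block row [E + iF, G + iH]
    n = A.shape[0]
    Y = np.block([[A + 1j*B, C + 1j*D], [-C + 1j*D, A - 1j*B]])
    V = np.linalg.inv(Y)  # equals phi_1(Z^{-1}) since phi_1 is a homomorphism
    P, Q = V[:n, :n], V[:n, n:]
    return P.real, P.imag, Q.real, Q.imag  # E, F, G, H

rng = np.random.default_rng(3)
A, B, C, D = rng.standard_normal((4, 3, 3))
E, F, G, H = complex_method_inv(A, B, C, D)
Y = np.block([[A + 1j*B, C + 1j*D], [-C + 1j*D, A - 1j*B]])
W = np.block([[E + 1j*F, G + 1j*H], [-G + 1j*H, E - 1j*F]])
print(np.allclose(Y @ W, np.eye(6)))  # True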

5.1.2. Real Method

We may also convert a quaternion matrix into a real matrix of four times the size [47, 42] via the injective map φ_2 : M_n(ℍ) → M_{4n}(ℝ) defined by

φ_2(A + iB + jC + kD) = [[A, B, C, D], [-B, A, -D, C], [-C, D, A, -B], [-D, -C, B, A]],  A, B, C, D ∈ M_n(ℝ).

Again, φ_2 is an ℝ-algebra homomorphism, which leads to Algorithm 4.

Algorithm 4 Real Method
Input: Z = A + iB + jC + kD ∈ M_n(ℍ)
Output: inverse of Z
1: compute Y = φ_2(Z).
2: compute V = Y^{-1}.
3: return Z^{-1} = φ_2^{-1}(V).
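The Real Method is equally short in NumPy (a sketch, ours; the first block row of φ_2(Z^{-1}) carries the four real components):

import numpy as np

def real_method_inv(A, B, C, D):
    # embed Z via phi_2 into M_{4n}(R), invert, and read off E, F, G, H
    n = A.shape[0]
    Y = np.block([[ A,  B,  C,  D],
                  [-B,  A, -D,  C],
                  [-C,  D,  A, -B],
                  [-D, -C,  B,  A]])
    V = np.linalg.inv(Y)  # equals phi_2(Z^{-1})
    return V[:n, :n], V[:n, n:2*n], V[:n, 2*n:3*n], V[:n, 3*n:]

rng = np.random.default_rng(3)
A, B, C, D = rng.standard_normal((4, 3, 3))
E, F, G, H = real_method_inv(A, B, C, D)
Y = np.block([[A, B, C, D], [-B, A, -D, C], [-C, D, A, -B], [-D, -C, B, A]])
W = np.block([[E, F, G, H], [-F, E, -H, G], [-G, H, E, -F], [-H, -G, F, E]])
print(np.allclose(Y @ W, np.eye(12)))  # True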

5.1.3. Skew Real Method

By exploiting the block skew-symmetric structure of φ_2(A + iB + jC + kD), a more efficient approach was proposed in [39]. For ease of reference, we record the method as Algorithm 5. We remark that the original approach in [39] is presented in terms of block matrices, while Algorithm 5 is a simplified and more efficient version.

Algorithm 5 Skew Real Method
Input: Z = A + iB + jC + kD ∈ M_n(ℍ)
Output: inverse of Z
1: compute U_1 = A^{-1}B, U_2 = A + BU_1, U_3 = U_2^{-1}, U_4 = U_1U_3.
2: compute V_1 = U_3C + U_4D, V_2 = -U_4C + U_3D.
3: compute W_1 = A + CV_1 + DV_2, W_2 = B + DV_1 - CV_2, W_3 = W_1^{-1}W_2.
4: compute X = (W_1 + W_2W_3)^{-1}, Y = -W_3X, G = -V_1X - V_2Y, H = V_1Y - V_2X.
5: return Z^{-1} = X + iY + jG + kH.
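A NumPy transcription of Algorithm 5 (a sketch with names of ours; generic invertibility is assumed) can be checked against the φ_1 embedding in the same way as before:

import numpy as np

def skew_real_inv(A, B, C, D):
    # Algorithm 5: all arithmetic over R
    U1 = np.linalg.solve(A, B)
    U3 = np.linalg.inv(A + B @ U1)
    U4 = U1 @ U3
    V1 = U3 @ C + U4 @ D
    V2 = -U4 @ C + U3 @ D
    W1 = A + C @ V1 + D @ V2
    W2 = B + D @ V1 - C @ V2
    W3 = np.linalg.solve(W1, W2)
    X = np.linalg.inv(W1 + W2 @ W3)
    Y = -W3 @ X
    return X, Y, -V1 @ X - V2 @ Y, V1 @ Y - V2 @ X

rng = np.random.default_rng(4)
A, B, C, D = rng.standard_normal((4, 5, 5))
X, Y, G, H = skew_real_inv(A, B, C, D)
Zhat = np.block([[A + 1j*B, C + 1j*D], [-C + 1j*D, A - 1j*B]])
W = np.block([[X + 1j*Y, G + 1j*H], [-G + 1j*H, X - 1j*Y]])
print(np.allclose(Zhat @ W, np.eye(10)))  # True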

5.1.4. Quaternion Toolbox for MATLAB (QTFM)

An algorithm for quaternion matrix inversion is implemented in QTFM [33]. The main idea behind this approach is to partition a matrix Z ∈ M_n(ℍ) into

Z = [[Z_{11}, Z_{12}], [Z_{21}, Z_{22}]]

for some Z_{11} ∈ M_{⌊n/2⌋}(ℍ), Z_{12} ∈ ℍ^{⌊n/2⌋×⌈n/2⌉}, Z_{21} ∈ ℍ^{⌈n/2⌉×⌊n/2⌋} and Z_{22} ∈ M_{⌈n/2⌉}(ℍ). Then the inverse of Z can be computed via the Schur complement [22], and one can repeat the procedure recursively to obtain Algorithm 6 for Z^{-1}. The key feature of Algorithm 6 that distinguishes it from Algorithms 1–5 is that the operations in Algorithm 6 are all over ℍ, rather than ℂ or ℝ.

Algorithm 6 Toolbox
Input: Z ∈ M_n(ℍ)
Output: inverse of Z
1: partition Z = [[Z_{11}, Z_{12}], [Z_{21}, Z_{22}]].
2: compute U_1 = Z_{11}^{-1}, U_2 = U_1Z_{12}, U_3 = Z_{21}U_1.  ▷ recursive inversion in M_{⌊n/2⌋}(ℍ)
3: compute U_4 = (Z_{22} - U_3Z_{12})^{-1}, U_5 = U_4U_3.  ▷ recursive inversion in M_{⌈n/2⌉}(ℍ)
4: compute R = [[U_1 + U_2U_5, -U_2U_4], [-U_5, U_4]].
5: return Z^{-1} = R.
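The recursion of Algorithm 6 is illustrated below on complex matrices, since NumPy has no native quaternion type; over ℍ the same steps apply verbatim, minding the order of the non-commutative products (a sketch, ours):

import numpy as np

def schur_inv(Z):
    # Algorithm 6: recursive inversion via the Schur complement
    n = Z.shape[0]
    if n == 1:
        return 1.0 / Z
    m = n // 2
    Z11, Z12, Z21, Z22 = Z[:m, :m], Z[:m, m:], Z[m:, :m], Z[m:, m:]
    U1 = schur_inv(Z11)                # recurse on the leading block
    U2 = U1 @ Z12
    U3 = Z21 @ U1
    U4 = schur_inv(Z22 - U3 @ Z12)     # inverse of the Schur complement
    U5 = U4 @ U3
    return np.block([[U1 + U2 @ U5, -U2 @ U4],
                     [-U5, U4]])

rng = np.random.default_rng(5)
Z = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
print(np.allclose(schur_inv(Z) @ Z, np.eye(8)))  # True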

5.2. Comparison

In this subsection, we compare the computational complexities of Algorithms 1–6. We denote by inv_{n,𝔽} (resp. mult_{n,𝔽}, add_{n,𝔽}) the cost of an inversion (resp. multiplication, addition) in M_n(𝔽), where 𝔽 = ℝ, ℂ or ℍ. In Table 1, we record the operations required by each of Algorithms 1–6 in the column labelled “operations”.

Let 𝒜 be an algorithm for matrix multiplication over ℍ. We denote by c_{n,ℍ}, c_{n,ℂ} and c_{n,ℝ} the complexity of 𝒜 restricted to M_n(ℍ), M_n(ℂ) and M_n(ℝ), respectively.

Theorem 5.1.

If we compute mult_{n,ℍ} by 𝒜 and compute inv_{n,ℂ} by the LU-decomposition method, then the complexity (ignoring lower order terms) of Algorithms 1–6 is as shown in the column of Table 1 labelled “complexity”. In particular, if 𝒜 is the usual algorithm for matrix multiplication (the algorithm obtained directly from the definition of matrix multiplication), and if operations in ℍ and ℂ are computed by the usual methods over ℝ (the algorithms obtained directly from the definitions of multiplication and addition in ℍ and ℂ), then among Algorithms 1–6, Algorithm 2 is optimal.

Proof.

We recall that inverting an n×n matrix by LU decomposition over ℂ (resp. ℝ) costs, ignoring lower order terms, (4n³/3)(mult_{1,ℂ} + add_{1,ℂ}) (resp. (4n³/3)(mult_{1,ℝ} + add_{1,ℝ})). If 𝒜 is the usual algorithm for matrix multiplication, then we have

c_{n,𝔽} = n³(mult_{1,𝔽} + add_{1,𝔽}),

where 𝔽 = ℍ, ℂ or ℝ. Moreover, if we compute operations in ℍ and ℂ in the usual way, then we also have

mult_{1,ℍ} = 16 mult_{1,ℝ} + 12 add_{1,ℝ},  add_{1,ℍ} = 4 add_{1,ℝ},
mult_{1,ℂ} = 4 mult_{1,ℝ} + 2 add_{1,ℝ},  add_{1,ℂ} = 2 add_{1,ℝ}.

This implies that Algorithms 1–6 respectively cost 136n³/3, 110n³/3, 256n³/3, 512n³/3, 128n³/3 and 128n³ flops in ℝ, from which we may conclude the optimality of Algorithm 2. For instance, Algorithm 2 costs (16n³/3)(mult_{1,ℝ} + add_{1,ℝ}) for its 4 inversions plus 13n³(mult_{1,ℝ} + add_{1,ℝ}) for its 13 multiplications, i.e., (55n³/3)(mult_{1,ℝ} + add_{1,ℝ}) = 110n³/3 flops. ∎

Algorithm | operations | complexity
1 | 2 inv_{n,ℂ} + 3 mult_{n,ℂ} + add_{n,ℂ} | (8n³/3)(mult_{1,ℂ} + add_{1,ℂ}) + 3c_{n,ℂ}
2 | 4 inv_{n,ℝ} + 13 mult_{n,ℝ} + 17 add_{n,ℝ} | (16n³/3)(mult_{1,ℝ} + add_{1,ℝ}) + 13c_{n,ℝ}
3 | inv_{2n,ℂ} | (32n³/3)(mult_{1,ℂ} + add_{1,ℂ})
4 | inv_{4n,ℝ} | (256n³/3)(mult_{1,ℝ} + add_{1,ℝ})
5 | 4 inv_{n,ℝ} + 16 mult_{n,ℝ} + 10 add_{n,ℝ} | (16n³/3)(mult_{1,ℝ} + add_{1,ℝ}) + 16c_{n,ℝ}
6 | n inv_{1,ℍ} + Σ_{j=1}^{log₂ n} 2^j(3 mult_{n/2^j,ℍ} + add_{n/2^j,ℍ}) | 3 Σ_{j=1}^{log₂ n} 2^j c_{n/2^j,ℍ}
Table 1. Comparison of complexities
Corollary 5.2.

Let 𝒜 be an algorithm for matrix multiplication that is faster than the usual algorithm. If we compute operations in ℍ and ℂ by the usual methods over ℝ, then Algorithm 2 is still optimal among Algorithms 1–6.

We conclude this subsection with a remark on the computation of operations in ℍ and ℂ over ℝ. It is easy to prove that the usual algorithms for add_{1,ℍ} and add_{1,ℂ} are already optimal. It is known [23, 10] that any non-commutative algorithm for mult_{1,ℍ} costs at least 8 multiplications in ℝ, yet there is no known algorithm for mult_{1,ℍ} that costs 8 multiplications and fewer than 32 additions in ℝ. Thus the usual algorithm is in fact the optimal one (in the sense of the total cost of multiplications and additions) among all existing algorithms for mult_{1,ℍ}. (We reiterate that we are assessing optimality in terms of the set of cubic-complexity quaternion algorithms currently available. Those algorithms, and ours, can of course be applied as the base case within the generic recursions used by the best available Strassen-type algorithm, with the optimal among that set clearly providing the best coefficient on the overall subcubic complexity.)

Similarly, it is well known [35, 45, 36, 32, 5] that any non-commutative algorithm for mult_{1,ℂ} requires at least 3 multiplications in ℝ, and this optimum is achieved by the Gauss algorithm. However, the Gauss algorithm costs 5 additions in ℝ. Therefore, the usual algorithm is the most efficient among all known algorithms for mult_{1,ℂ}.

5.3. Experiments

For each integer 1 ≤ m ≤ 50, we take n = 100m and apply Algorithms 1–6 to invert random matrices in M_n(ℍ). To be more precise, we generate n×n real matrices A, B, C and D whose elements are uniformly drawn from the interval (-1, 1). Then, with probability one, Algorithms 1–6 are all applicable to Z = A + iB + jC + kD ∈ M_n(ℍ). For each m, we repeat the above procedure 50 times to obtain 50 samples for our test.
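A miniature version of this protocol in NumPy (a sketch; the sizes and trial counts are illustrative placeholders, far smaller than in our experiments, and only the two embedding-based methods are timed here):

import numpy as np, time

def complex_embed(A, B, C, D):
    return np.block([[A + 1j*B, C + 1j*D], [-C + 1j*D, A - 1j*B]])

def real_embed(A, B, C, D):
    return np.block([[A, B, C, D], [-B, A, -D, C],
                     [-C, D, A, -B], [-D, -C, B, A]])

rng = np.random.default_rng()
for n in (100, 200, 400):
    totals = {"complex": 0.0, "real": 0.0}
    for _ in range(5):
        A, B, C, D = rng.uniform(-1.0, 1.0, size=(4, n, n))  # entries in (-1, 1)
        for name, embed in (("complex", complex_embed), ("real", real_embed)):
            t0 = time.perf_counter()
            np.linalg.inv(embed(A, B, C, D))
            totals[name] += time.perf_counter() - t0
    print(n, {name: round(t / 5, 4) for name, t in totals.items()})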

We record in Figure 1 the average running time of each of Algorithms 1–6. For better comparison, we also compute the ratio

r_{n,s} = t_{n,5}/t_{n,s},  s ∈ {1, 2, 3, 4, 6}.

Here t_{n,p} denotes the average running time of Algorithm p on instances of size n×n. By definition, r_{n,s} > r_{n,s'} indicates that Algorithm s is more efficient than Algorithm s'. Figure 2 exhibits how r_{n,s} varies as n grows for each s ∈ {1, 2, 3, 4, 6}. It is clear from Figures 1 and 2 that Algorithm 2 is significantly faster than the other five algorithms.

Figure 1. Algorithm 2 (blue) incurs significantly less computational overhead than the other five algorithms.
Figure 2. The ratio (blue) for Algorithm 2 is significantly greater than the ratios for the other algorithms.

Let Ẑ be the inverse of Z obtained by one of Algorithms 1–6. We measure the quality of Ẑ by the mean right residual:

res_n = ‖ZẐ - I_n‖_F / n².

Here ‖·‖_F is the Frobenius norm on M_n(ℍ). We record in Figure 3 the average of the mean right residual over the 50 samples for each of Algorithms 1–6. From Figure 3, we may conclude that the mean right residual for Algorithms 2, 5 and 6 is less than 5×10⁻¹³, while the other three algorithms have somewhat smaller mean right residuals.
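In code, the residual can be evaluated through the φ_1 embedding, which scales the Frobenius norm by a constant factor of √2 that does not affect the comparison between algorithms (a sketch, ours):

import numpy as np

def mean_right_residual(Z_embedded, Zhat_embedded, n):
    # res_n = ||Z Zhat - I_n||_F / n^2, computed in the phi_1 picture;
    # the embedding doubles ||.||_F^2, hence the sqrt(2) correction
    R = Z_embedded @ Zhat_embedded - np.eye(2 * n)
    return np.linalg.norm(R, 'fro') / (np.sqrt(2) * n**2)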

Figure 3. The mean right residual of Algorithm 1 is at most 5×10⁻¹³ for all but one of the tested dimensions.

6. Discussion

In this paper we have presented a new algorithm for efficiently inverting quaternion matrices, and we have provided empirical results showing that it is the fastest among the quaternion inversion algorithms found in the literature. Consequently, it provides the optimal recursive base case for Strassen-type inversion algorithms, achieving the best combination of computational complexity and practical performance for inverting quaternion matrices.

References

  • [1] J. Alman and V. V. Williams. A refined laser method and faster matrix multiplication. In D. Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 522–539. SIAM, 2021.
  • [2] T. W. Anderson. An introduction to multivariate statistical analysis. John Wiley & Sons Inc., 1958.
  • [3] V. I. Arnold. The geometry of spherical curves and the algebra of quaternions. Russian Mathematical Surveys, 50:1–68, 1995.
  • [4] M. Arribas, A. Elipe, and M. Palacios. Quaternions and the rotation of a rigid body. Celestial Mechanics and Dynamical Astronomy, 96(3):239–251, 2006.
  • [5] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1997. With the collaboration of Thomas Lickteig.
  • [6] C. Chevalley. The Algebraic Theory of Spinors and Clifford Algebras. Collected Works, Vol. 2, edited by P. Cartier and C. Chevalley. Springer, 1996.
  • [7] P. M. Cohn. Skew Fields: Theory of General Division Rings. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1995.
  • [8] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. J. Symb. Comput., 9(3):251–280, 1990.
  • [9] Z. Dai, L.-H. Lim, and K. Ye. Inverting a complex matrix. arXiv, preprint, 2022.
  • [10] H. F. de Groote. On the complexity of quaternion multiplication. Information Processing Letters, 3(6):177–179, 1975.
  • [11] C. A. Deavours. The quaternion calculus. The American Mathematical Monthly, 80(9):995–1008, 1973.
  • [12] B. A. Dent and A. Newhouse. Polynomials orthogonal over discrete domains. SIAM Review, 1(1):55–59, 1959.
  • [13] O. D. Faugeras and M. Hebert. The representation, recognition, and locating of 3-D objects. The International Journal of Robotics Research, 5(3):27–52, 1986.
  • [14] C. J. Gaudet and A. Maida. Deep quaternion networks. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2018.
  • [15] I. Gelfand. Normierte Ringe. Mat. Sb., 9:43–54, 1941.
  • [16] L. Ghouti. Robust perceptual color image hashing using quaternion singular value decomposition. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3794–3798, 2014.
  • [17] C. Grubin. Derivation of the quaternion scheme via the Euler axis and angle. Journal of Spacecraft and Rockets, 7(10):1261–1263, 1970.
  • [18] W. R. Hamilton. Elements of Quaternions. Cambridge Library Collection - Mathematics. Cambridge University Press, 2010.
  • [19] M. T. Heath, G. A. Geist, and J. B. Drake. Early experience with the intel ipsc/860 at oak ridge national laboratory. The International Journal of Supercomputing Applications, 5(2):10–26, 1991.
  • [20] H. Hopf. Über die Abbildungen der dreidimensionalen Sphäre auf die Kugelfläche. Mathematische Annalen, 104(1):637–665, 1931.
  • [21] B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions. Journal of The Optical Society of America A-optics Image Science and Vision, 4:629–642, 1987.
  • [22] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 2 edition, 2012.
  • [23] T. D. Howell and J. P. J. Lafon. The complexity of the quaternion product. Technical report, Cornell University, 1975.
  • [24] A. Hurwitz. Über die Komposition der quadratischen Formen. Mathematische Annalen, 88(1):1–25, 1922.
  • [25] Z. Jia, M. K. Ng, and G.-J. Song. Lanczos method for large-scale quaternion singular value decomposition. Numerical Algorithms, 82(2):699–717, 2019.
  • [26] H. B. Lawson and M.-L. Michelsohn. Spin Geometry (PMS-38). Princeton University Press, 1989.
  • [27] Q. Liu, S. Ling, and Z. Jia. Randomized quaternion singular value decomposition for low-rank matrix approximation. SIAM J. Sci. Comput., 44(2):A870–A900, jan 2022.
  • [28] A. Mallios. Topological algebras. Selected topics, volume 124 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam, 1986. Notas de Matemática [Mathematical Notes], 109.
  • [29] M. Nakano, J. Seino, and H. Nakai. Development of spin-dependent relativistic open-shell hartree–fock theory with time-reversal symmetry (i): The unrestricted approach. International Journal of Quantum Chemistry, 117(10):e25356, 2017.
  • [30] J. Paldus, T. Sako, X. Li, and G. H. F. Diercksen. Symmetry-breaking in the independent particle model: nature of the singular behavior of hartree–fock potentials. Journal of Mathematical Chemistry, 51(2):427–450, 2013.
  • [31] T. Parcollet, M. Morchid, and G. Linarès. A survey of quaternion neural networks. Artificial Intelligence Review, 53:2957–2982, 2019.
  • [32] K. Ramachandra and V. Strassen. Vermeidung von Divisionen. Journal für die reine und angewandte Mathematik, 264:184–202, 1973.
  • [33] S. J. Sangwine and N. L. Bihan. Quaternion toolbox for matlab, version 3.4, 2022. available online: https://qtfm.sourceforge.net.
  • [34] K. Shoemake. Animating rotation with quaternion curves. In Proceedings of SIGGRAPH '85. Association for Computing Machinery, New York, NY, 1985.
  • [35] V. Strassen. Rank and optimal computation of generic tensors. Linear Algebra and its Applications, 52-53:645–685, 1983.
  • [36] V. Strassen. Relative bilinear complexity and matrix multiplication. Journal für die reine und angewandte Mathematik, 375/376:406–443, 1987.
  • [37] C. Studer, S. Fateh, and D. Seethaler. Asic implementation of soft-input soft-output mimo detection using mmse parallel interference cancellation. IEEE Journal of Solid-State Circuits, 46:1754–1765, 2011.
  • [38] A. Sudbery. Quaternionic analysis. Mathematical Proceedings of the Cambridge Philosophical Society, 85(2):199–225, 1979.
  • [39] J. D. Turner. Skew-symmetric algorithm for inverting n×n quaternion equations using n×n real variable matrix arithmetic. The Journal of the Astronautical Sciences, 57(1-2):275–287, 2009.
  • [40] J. Vía, D. P. Palomar, L. Vielva, and I. Santamaría. Maximum likelihood ica of quaternion gaussian vectors. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4260–4263, 2011.
  • [41] J. Vía, D. Ramírez, and I. Santamaría. Properness and widely linear processing of quaternion random vectors. IEEE Transactions on Information Theory, 56(7):3502–3515, 2010.
  • [42] M. Wei, Y. Li, F. Zhang, and J. Zhao. Quaternion matrix computations. Nova Science Publishers, Incorporated, 2018.
  • [43] D. B. Williams-Young and X. Li. On the efficacy and high-performance implementation of quaternion matrix multiplication. ArXiv, abs/1903.05575, 2019.
  • [44] Y. Xu, L. Yu, H. Xu, H. Zhang, and T. Nguyen. Vector sparse representation of color image using quaternion matrix analysis. IEEE Transactions on Image Processing, 24(4), 2015.
  • [45] K. Ye and L.-H. Lim. Fast structured matrix computations: Tensor rank and cohn–umans method. Foundations of Computational Mathematics, 18(1):45–95, 2018.
  • [46] M. Yuan. High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res., 11:2261–2286, aug 2010.
  • [47] F. Zhang. Quaternions and matrices of quaternions. Linear Algebra and its Applications, 251:21–57, 1997.
  • [48] A. Zielinski. On inversion of complex matrices. Internat. J. Numer. Methods Engrg., 14(10):1563–1566, 1979.