
CP decomposition and low-rank approximation of antisymmetric tensors

Erna Begović Kovač  and  Lana Periša
Abstract.

For antisymmetric tensors, the paper examines a low-rank approximation that is represented via only three vectors. We describe a suitable low-rank format and propose an alternating least squares structure-preserving algorithm for finding such an approximation. Moreover, we show that this approximation problem is equivalent to the problem of finding the best multilinear low-rank antisymmetric approximation and, consequently, to the problem of finding the best unstructured rank-1 approximation. The case of partial antisymmetry is also discussed. The algorithms are implemented in the Julia programming language and their numerical performance is discussed.

Key words and phrases:
CP decomposition, antisymmetric tensors, low-rank approximation, structure-preserving algorithm, Julia
Mathematics Subject Classification:
15A69
Erna Begović Kovač, University of Zagreb Faculty of Chemical Engineering and Technology, Marulićev trg 19, 10000 Zagreb, Croatia, [email protected]
Lana Periša, Visage Technologies, Ivana Lučića 2a, 10000 Zagreb, Croatia, [email protected]
This work has been supported in part by Croatian Science Foundation under the project UIP-2019-04-5200.

1. Introduction

Tensor decompositions have been extensively studied in recent years [2, 9, 10, 19, 22]. However, the research has mostly focused on either unstructured or symmetric [7, 21] tensors. In this paper we explore antisymmetric tensors, their CP decomposition, and algorithms for their low-rank approximation.

The idea of the CP decomposition is to write a tensor as a sum of rank-one components. It was first introduced by Hitchcock [17, 18] in 1927, but it only became popular in the 1970s as CANDECOMP (canonical decomposition) [5] and PARAFAC (parallel factors) [16]. This decomposition is closely related to the tensor rank $R$, which is defined as the minimal number of rank-1 summands in the exact CP decomposition. Contrary to the matrix case, the rank of a tensor can exceed its dimension, and it can differ over $\mathbb{R}$ and over $\mathbb{C}$. It is known that the problem of finding the rank of a given tensor is NP-hard.

When computing a CP approximation, the main question is the choice of the number of rank-one components. Given the antisymmetric structure of the tensors in question, we impose an additional constraint on the CP decomposition. This constraint ensures that the resulting tensor is indeed antisymmetric, and it gives a bound on the minimal number of rank-one components.

We focus on tensors of order three. For a given antisymmetric tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ our goal is to find a low-rank antisymmetric approximation represented via only three vectors. In particular, we are looking for an approximation $\widetilde{\mathcal{A}}$ of $\mathcal{A}$ such that $\operatorname{rank}(\widetilde{\mathcal{A}})\leq 6$ for any $n$, and

\widetilde{\mathcal{A}}=\frac{1}{6}(x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x),

where $x,y,z\in\mathbb{R}^{n}$. We propose an alternating least squares structure-preserving algorithm for solving this problem. The algorithm is based on solving a minimization problem in each tensor mode. We compare our algorithm with a “naive” approach that uses a posteriori antisymmetrization. Further on, we show that our approximation problem is equivalent to the problem of the best multilinear rank-3 structure-preserving antisymmetric tensor approximation from [3] and, consequently, to the problem of the best unstructured rank-1 approximation. This establishes the equivalence between our algorithm and the higher-order power method (HOPM). Therefore, the corresponding convergence result for HOPM from [30] can be applied.

Additionally, we study tensors with partial antisymmetry, that is, antisymmetry in only two modes. Similarly to what we do for tensors that are antisymmetric in all modes, we first determine a suitable format of the CP decomposition, which turns out to be simpler in the partially antisymmetric case. Based on this format, for a given tensor $\mathcal{C}\in\mathbb{R}^{n\times n\times m}$ antisymmetric in two modes, we are looking for an approximation $\widetilde{\mathcal{C}}$ of the same structure such that $\widetilde{\mathcal{C}}$ is represented by three vectors and $\operatorname{rank}(\widetilde{\mathcal{C}})=2$.

In Section 2 we introduce the notation and preliminaries. Our problem of antisymmetric tensor approximation is described in Section 3. In Section 4 we describe the approach with a posteriori antisymmetrization, while in Section 5 we propose the algorithm for solving the minimization problem from Section 3. Section 6 deals with the case of partial antisymmetry. In Section 7 we discuss our numerical results obtained in the Julia programming language, and the conclusion is given in Section 8.

2. Notation and preliminaries

Throughout the paper we denote tensors by calligraphic letters, e.g., $\mathcal{A}$. We refer to the tensor dimension as its order. For $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ we say that $\mathcal{A}$ is a tensor of order $d$. A tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is cubical if $n_{1}=n_{2}=\cdots=n_{d}$. Vectors obtained from a tensor by fixing all indices but the $m$th one are called mode-$m$ fibers. Fibers of an order-3 tensor are columns (mode-1 fibers), rows (mode-2 fibers), and tubes (mode-3 fibers). Matrices obtained from a tensor by fixing all indices but two are called slices. The matrix representation of a tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is called the mode-$m$ matricization and is denoted by $A_{(m)}$. It is obtained by arranging the mode-$m$ fibers of $\mathcal{A}$ as columns of $A_{(m)}$. The mode-$m$ product of a tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ with a matrix $M\in\mathbb{R}^{p\times n_{m}}$ is a tensor $\mathcal{B}\in\mathbb{R}^{n_{1}\times\cdots\times n_{m-1}\times p\times n_{m+1}\times\cdots\times n_{d}}$,

\mathcal{B}=\mathcal{A}\times_{m}M,\quad\text{such that}\quad B_{(m)}=MA_{(m)}.

The tensor norm is a generalization of the matrix Frobenius norm. For $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ we have

\|\mathcal{A}\|=\sqrt{\sum_{i_{1}=1}^{n_{1}}\sum_{i_{2}=1}^{n_{2}}\cdots\sum_{i_{d}=1}^{n_{d}}a_{i_{1}i_{2}\ldots i_{d}}^{2}}.

The inner product of two tensors $\mathcal{A},\mathcal{B}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is given by

\langle\mathcal{A},\mathcal{B}\rangle=\sum_{i_{1}=1}^{n_{1}}\sum_{i_{2}=1}^{n_{2}}\cdots\sum_{i_{d}=1}^{n_{d}}a_{i_{1}i_{2}\ldots i_{d}}b_{i_{1}i_{2}\ldots i_{d}}.

The vector outer product is denoted by $\circ$. A tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is a rank-1 tensor if it can be written as the outer product of $d$ vectors,

\mathcal{A}=v^{(1)}\circ v^{(2)}\circ\cdots\circ v^{(d)}.

Then

a_{i_{1}i_{2}\cdots i_{d}}=v^{(1)}_{i_{1}}v^{(2)}_{i_{2}}\cdots v^{(d)}_{i_{d}},\quad 1\leq i_{k}\leq n_{k},\ 1\leq k\leq d.

The Khatri-Rao product of two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{p\times n}$ is defined as

A\odot B=\begin{bmatrix}a_{1}\otimes b_{1}&a_{2}\otimes b_{2}&\cdots&a_{n}\otimes b_{n}\end{bmatrix}\in\mathbb{R}^{(mp)\times n},

where $a_{k}$ and $b_{k}$ denote the $k$th columns of $A$ and $B$, respectively. The Hadamard (element-wise) product of two matrices $A,B\in\mathbb{R}^{m\times n}$ is defined as

A\ast B=\begin{bmatrix}a_{11}b_{11}&a_{12}b_{12}&\cdots&a_{1n}b_{1n}\\ a_{21}b_{21}&a_{22}b_{22}&\cdots&a_{2n}b_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}b_{m1}&a_{m2}b_{m2}&\cdots&a_{mn}b_{mn}\end{bmatrix}\in\mathbb{R}^{m\times n}.

The Moore-Penrose inverse of $A$ is denoted by $A^{+}$.
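The notions above are easy to make concrete in code. The paper's experiments are in Julia; the following NumPy sketch (all helper names are our own) implements the mode-$m$ matricization, the mode-$m$ product, and the Khatri-Rao product, and checks the defining identity $B_{(m)}=MA_{(m)}$; the Hadamard product is plain element-wise `*` on arrays of equal shape.

```python
import numpy as np

def matricize(T, mode):
    # Mode-m matricization: mode-`mode` fibers become columns (0-based mode,
    # remaining indices flattened in Fortran order).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def mode_product(T, M, mode):
    # Mode-m product B = T x_m M, defined by B_(m) = M @ T_(m).
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def khatri_rao(A, B):
    # Column-wise Kronecker (Khatri-Rao) product of A (m x n) and B (p x n).
    m, n = A.shape
    p, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(m * p, n)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5))
M = rng.standard_normal((6, 4))
B = mode_product(T, M, 1)
# Defining identity of the mode product:
identity_err = np.linalg.norm(matricize(B, 1) - M @ matricize(T, 1))
```

The same helpers are reused below when sketching the algorithms of the paper.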

For a tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$, its CP approximation takes the form

\mathcal{A}\approx\sum_{i=1}^{r}(x_{i}\circ y_{i}\circ z_{i}), (2.1)

where $x_{i}\in\mathbb{R}^{n_{1}}$, $y_{i}\in\mathbb{R}^{n_{2}}$, $z_{i}\in\mathbb{R}^{n_{3}}$. If we arrange the vectors $x_{i},y_{i},z_{i}$, $i=1,\ldots,r$, into matrices

X=\begin{bmatrix}x_{1}&x_{2}&\cdots&x_{r}\end{bmatrix},\quad Y=\begin{bmatrix}y_{1}&y_{2}&\cdots&y_{r}\end{bmatrix},\quad Z=\begin{bmatrix}z_{1}&z_{2}&\cdots&z_{r}\end{bmatrix},

relation (2.1) can be written as

\mathcal{A}\approx[[X,Y,Z]]=\sum_{i=1}^{r}(x_{i}\circ y_{i}\circ z_{i}). (2.2)

The smallest number $r$ in the exact CP decomposition (2.2) is called the tensor rank. We write $\operatorname{rank}(\mathcal{A})=r$.

The most commonly used algorithm for computing the CP approximation is the alternating least squares (ALS) algorithm (see, e.g., [22]). In Algorithm 2.1 we give the CP-ALS algorithm for order-3 tensors.

Algorithm 2.1.

  CP-ALS  

Input: $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$, $r\in\mathbb{N}$
Output: $X,Y,Z\in\mathbb{R}^{n\times r}$
Initialize $X,Y,Z$ as the leading $r$ left singular vectors of $A_{(i)}$, $i=1,2,3$, respectively.
repeat
     $X=A_{(1)}(Z\odot Y)(Y^{T}Y\ast Z^{T}Z)^{+}$
     $Y=A_{(2)}(Z\odot X)(X^{T}X\ast Z^{T}Z)^{+}$
     $Z=A_{(3)}(Y\odot X)(X^{T}X\ast Y^{T}Y)^{+}$
until convergence or maximum number of iterations

 

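Algorithm 2.1 can be sketched in a few lines. This is a minimal NumPy illustration (the paper's implementation is in Julia; the function names and the fixed iteration count are ours), following the updates above literally under the matricization convention for which $A_{(1)}=X(Z\odot Y)^{T}$.

```python
import numpy as np

def matricize(T, mode):
    # Mode-m matricization (columns ordered so that A_(1) = X (Z ⊙ Y)^T).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def khatri_rao(A, B):
    m, n = A.shape
    p, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(m * p, n)

def cp_als(T, r, iters=50):
    # Initialize X, Y, Z with the leading r left singular vectors of A_(i).
    X, Y, Z = (np.linalg.svd(matricize(T, m))[0][:, :r] for m in range(3))
    for _ in range(iters):
        X = matricize(T, 0) @ khatri_rao(Z, Y) @ np.linalg.pinv((Y.T @ Y) * (Z.T @ Z))
        Y = matricize(T, 1) @ khatri_rao(Z, X) @ np.linalg.pinv((X.T @ X) * (Z.T @ Z))
        Z = matricize(T, 2) @ khatri_rao(Y, X) @ np.linalg.pinv((X.T @ X) * (Y.T @ Y))
    return X, Y, Z

# Sanity check on an exactly rank-1 tensor: CP-ALS recovers it.
rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 4))
T = np.einsum("i,j,k->ijk", x, y, z)
X, Y, Z = cp_als(T, 1)
T_hat = np.einsum("ir,jr,kr->ijk", X, Y, Z)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

Note that the Hadamard products $Y^{T}Y\ast Z^{T}Z$ are small $r\times r$ matrices, so the pseudoinverse in each update is cheap.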

3. Problem description

A cubical tensor is symmetric (sometimes also called supersymmetric) if its elements are invariant under any permutation of indices. On the contrary, a cubical tensor is antisymmetric if its elements change sign when a pair of indices is permuted. In particular, an order-3 tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ is antisymmetric if

a_{ijk}=a_{jki}=a_{kij}=-a_{ikj}=-a_{jik}=-a_{kji},\quad 1\leq i,j,k\leq n. (3.1)

Such tensors are also called alternating 3-tensors, $\Lambda^{3}(\mathbb{R}^{n})$ [23], or 3-vectors [14]. Antisymmetric tensors appear in applications such as quantum chemistry [27] and electromagnetism [26]. Besides, they are interesting from the mathematical point of view [3, 15]. From the definition of an antisymmetric tensor $\mathcal{A}$ it obviously follows that:

  • (i) In all modes, all slices of $\mathcal{A}$ are antisymmetric matrices.

  • (ii) In all modes, all slices have one null column and one null row.

  • (iii) Antisymmetric tensors are data-sparse in the sense that many of their non-zero elements are equal, up to the sign.

These facts are useful when it comes to the implementation of specific algorithms.

We define the antisymmetrizer “anti” as the orthogonal projection of a general cubical order-$d$ tensor $\mathcal{B}$ onto the subspace of antisymmetric tensors. Then $\mathcal{A}=\text{anti}(\mathcal{B})$ is an order-$d$ tensor given by

\mathcal{A}(i_{1},i_{2},\ldots,i_{d})\coloneqq\frac{1}{d!}\sum_{p\in\pi(d)}\text{sign}(p)\,\mathcal{B}(p(i_{1}),p(i_{2}),\ldots,p(i_{d})),

where $\pi(d)$ denotes the set of all permutations of length $d$. Hence, for $d=3$, $\mathcal{B}\in\mathbb{R}^{n\times n\times n}$, and $\mathcal{A}=\text{anti}(\mathcal{B})$ we have

a_{ijk}=\frac{1}{6}(b_{ijk}+b_{jki}+b_{kij}-b_{ikj}-b_{jik}-b_{kji}). (3.2)
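The projection can be implemented by summing signed axis permutations of the array. A NumPy sketch for a general cubical order-$d$ tensor (helper names are ours; for $d=3$ this reproduces (3.2) exactly):

```python
import math
import numpy as np
from itertools import permutations

def perm_sign(p):
    # Sign of a permutation, computed by counting inversions.
    p = list(p)
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def anti(B):
    # Orthogonal projection of a cubical order-d tensor onto the subspace of
    # antisymmetric tensors: (1/d!) * sum of signed index permutations.
    d = B.ndim
    return sum(perm_sign(p) * np.transpose(B, p) for p in permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4, 4))
A = anti(B)
```

Since anti is a projection, applying it twice changes nothing, which gives a convenient correctness check.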

Let $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ be an antisymmetric tensor of order three. Take a triplet of indices $(i,j,k)$, $1\leq i<j<k\leq n$. It follows from (3.1) that the subtensor $\hat{\mathcal{A}}$ of $\mathcal{A}$ obtained at the intersection of the $i$th, $j$th, and $k$th columns, rows, and tubes is of the form

\hat{\mathcal{A}}=\alpha\mathcal{E},

where $\alpha\in\mathbb{R}$ and $\mathcal{E}$ is a $3\times 3\times 3$ tensor such that

\mathcal{E}(i_{1},i_{2},i_{3})=\begin{cases}1,&\text{if the indices form an even permutation of }(1,2,3),\\ -1,&\text{if the indices form an odd permutation of }(1,2,3),\\ 0,&\text{if two or more indices are equal.}\end{cases}

The tensor $\mathcal{E}$ is called the Levi-Civita tensor [12]. We can also write $\mathcal{E}$ using its mode-1 matricization

E_{(1)}=\left[\begin{array}{ccc|ccc|ccc}0&0&0&0&0&-1&0&1&0\\ 0&0&1&0&0&0&-1&0&0\\ 0&-1&0&1&0&0&0&0&0\end{array}\right]. (3.3)

Obviously, $\mathcal{E}$ is the simplest possible non-zero antisymmetric order-3 tensor.

For three given vectors $x,y,z\in\mathbb{R}^{n}$ we define an $n\times n\times n$ antisymmetric tensor associated with these vectors as

\mathcal{A}_{6}(x,y,z)\coloneqq\frac{1}{6}(x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x). (3.4)

Note that the tensor $\mathcal{E}$ is a special case of the antisymmetric tensor $\mathcal{A}_{6}(x,y,z)$: for $x=[6,0,0]^{T}$, $y=[0,1,0]^{T}$, $z=[0,0,1]^{T}$, we get $\mathcal{A}_{6}(x,y,z)=\mathcal{E}$. Moreover, for a rank-1 tensor $\mathcal{T}=[[x,y,z]]$, we have $\mathcal{A}_{6}(x,y,z)=\text{anti}(\mathcal{T})$. The tensor format (3.4) can be favourable because it represents an antisymmetric tensor via only three vectors, that is, $3n$ entries. On the other hand, the standard form of an $n\times n\times n$ antisymmetric tensor contains $\binom{n}{3}$ different entries. Besides, $\mathcal{A}_{6}(x,y,z)$ is a low-rank tensor: for any size $n$, we have $\operatorname{rank}(\mathcal{A}_{6}(x,y,z))\leq 6$.
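These claims are easy to check numerically. The sketch below (the `A6` helper and the checks are ours) builds $\mathcal{A}_{6}(x,y,z)$ from (3.4), recovers $\mathcal{E}$ from the stated special case, and confirms the antisymmetry of the result.

```python
import numpy as np

def outer3(a, b, c):
    return np.einsum("i,j,k->ijk", a, b, c)

def A6(x, y, z):
    # The antisymmetric tensor (3.4) associated with x, y, z.
    return (outer3(x, y, z) + outer3(y, z, x) + outer3(z, x, y)
            - outer3(x, z, y) - outer3(y, x, z) - outer3(z, y, x)) / 6.0

# Special case: x = 6*e1, y = e2, z = e3 gives the Levi-Civita tensor E.
e1, e2, e3 = np.eye(3)
E = A6(6 * e1, e2, e3)

# A generic instance for arbitrary vectors:
rng = np.random.default_rng(3)
x, y, z = rng.standard_normal((3, 5))
A = A6(x, y, z)
```

The storage comparison in the text is visible here as well: `x`, `y`, `z` hold $3n$ numbers, while `A` holds $n^{3}$ entries.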

Our goal is to approximate a given antisymmetric tensor $\mathcal{A}$ by a low-rank antisymmetric tensor of the form (3.4). We demonstrate two approaches. The “naive” one is given in Section 4. Then, in Section 5 we formulate this problem as a minimization problem. For a given non-zero antisymmetric tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$, we are looking for a tensor $\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$, i.e., vectors $x,y,z\in\mathbb{R}^{n}$, such that

\|\mathcal{A}-\widetilde{\mathcal{A}}\|^{2}\to\min.

4. CP-ALS with a posteriori antisymmetrization

First we describe the naive approach. The process consists of two steps. Step 1: Using the CP-ALS Algorithm 2.1, which ignores the tensor structure, we find a rank-1 approximation $\bar{\mathcal{A}}$ of $\mathcal{A}$,

\bar{\mathcal{A}}=[[x,y,z]],\quad\operatorname{rank}(\bar{\mathcal{A}})=1.

Step 2: We apply the antisymmetrizer (3.2) to $\bar{\mathcal{A}}$ to obtain $\widetilde{\mathcal{A}}$ in the form (3.4),

\widetilde{\mathcal{A}}=\text{anti}(\bar{\mathcal{A}}),

that is,

\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z).

This procedure is given in Algorithm 4.1. We do not need to form the tensor $\bar{\mathcal{A}}$ explicitly.

Algorithm 4.1.

  CP with a posteriori antisymmetrization  

Input: $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ antisymmetric
Output: $\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$
Apply Algorithm 2.1 to $\mathcal{A}$ with $r=1$ to obtain $x,y,z\in\mathbb{R}^{n}$
$\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$

 


Obviously, using a rank-1 intermediate tensor produces an unnecessarily large approximation error. However, it can easily be shown that if the error of the rank-1 approximation is bounded by some $\epsilon>0$, the resulting error is also bounded by $\epsilon$.
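The bound follows because “anti” is an orthogonal projection and $\mathcal{A}$ is already antisymmetric, so $\|\mathcal{A}-\text{anti}(\bar{\mathcal{A}})\|=\|\text{anti}(\mathcal{A}-\bar{\mathcal{A}})\|\leq\|\mathcal{A}-\bar{\mathcal{A}}\|$. A quick numerical illustration (helper names are ours; any rank-1 tensor serves as the intermediate $\bar{\mathcal{A}}$):

```python
import math
import numpy as np
from itertools import permutations

def perm_sign(p):
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def anti(B):
    # Antisymmetrizer (3.2) via signed axis permutations.
    d = B.ndim
    return sum(perm_sign(p) * np.transpose(B, p) for p in permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(4)
n = 6
A = anti(rng.standard_normal((n, n, n)))     # a random antisymmetric tensor
x, y, z = rng.standard_normal((3, n))
A_bar = np.einsum("i,j,k->ijk", x, y, z)     # a rank-1 intermediate tensor
eps = np.linalg.norm(A - A_bar)              # error before antisymmetrization
err = np.linalg.norm(A - anti(A_bar))        # error after antisymmetrization
```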

5. Antisymmetry-preserving CP algorithm

For a given antisymmetric tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ we are looking for vectors $x,y,z\in\mathbb{R}^{n}$ such that

\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|^{2}\to\min. (5.1)

In contrast to Algorithm 2.1, here we develop a new structure-preserving low-rank approximation algorithm. Our algorithm uses the ALS approach, that is, we solve an optimization problem in each mode. It results in a tensor of the form (3.4), so there is no need to apply the antisymmetrizer. ALS algorithms are widely used to address different multilinear minimization problems [8, 29, 11, 13], including those regarding the CP approximation [1, 24, 20]. There is also a very recent extension to the antisymmetric case [28], but both the problem and the algorithm are different from ours.

Set

a=\begin{bmatrix}x\\ y\\ z\end{bmatrix}\in\mathbb{R}^{3n}.

Then, similarly to what was done in [1], we define the objective function $f\colon\mathbb{R}^{3n}\to\mathbb{R}$,

f(a)=6\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|^{2}. (5.2)

We consider three partial minimization problems:

\min_{x}f(a),\quad\min_{y}f(a),\quad\min_{z}f(a). (5.3)

Before we formulate the algorithm, we prove Theorem 5.1. It gives three reformulations of the objective function $f$ that we use to find the solutions of the problems (5.3).

Observe that, since $\mathcal{A}_{6}(x,y,z)$ is linear in each of $x$, $y$, and $z$, the objective function is quadratic in each of $x$, $y$, and $z$, and the approximation problem becomes a quadratic optimization problem. Here we derive the quadratic forms explicitly. It is worth mentioning that the underlying linearity opens the possibility of extensions to more general settings.

In order to simplify the statement of the theorem, we define the following objects: matrices $Q^{(1)}=Q^{(1)}(y,z)$, $Q^{(2)}=Q^{(2)}(x,z)$, $Q^{(3)}=Q^{(3)}(x,y)\in\mathbb{R}^{n\times n}$,

Q^{(1)} = 2\left(\left(\|y\|_{2}^{2}\|z\|_{2}^{2}-\langle y,z\rangle^{2}\right)I_{n}+(yz^{T}-zy^{T})^{2}\right), (5.4)
Q^{(2)} = 2\left(\left(\|z\|_{2}^{2}\|x\|_{2}^{2}-\langle z,x\rangle^{2}\right)I_{n}+(zx^{T}-xz^{T})^{2}\right), (5.5)
Q^{(3)} = 2\left(\left(\|x\|_{2}^{2}\|y\|_{2}^{2}-\langle x,y\rangle^{2}\right)I_{n}+(xy^{T}-yx^{T})^{2}\right); (5.6)

vectors $c^{(1)}=c^{(1)}(y,z)$, $c^{(2)}=c^{(2)}(x,z)$, $c^{(3)}=c^{(3)}(x,y)\in\mathbb{R}^{n}$,

c^{(1)} = -12\,\mathcal{A}\times_{2}y^{T}\times_{3}z^{T}, (5.7)
c^{(2)} = -12\,\mathcal{A}\times_{2}z^{T}\times_{3}x^{T}, (5.8)
c^{(3)} = -12\,\mathcal{A}\times_{2}x^{T}\times_{3}y^{T}; (5.9)

and a real number

d=6\|\mathcal{A}\|^{2}. (5.10)
Theorem 5.1.

The function $f$ defined by (5.2) can be written as

f(a) = d+(c^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x (5.11)
= d+(c^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y (5.12)
= d+(c^{(3)})^{T}z+\frac{1}{2}z^{T}Q^{(3)}z, (5.13)

for $Q^{(1)},Q^{(2)},Q^{(3)}\in\mathbb{R}^{n\times n}$, $c^{(1)},c^{(2)},c^{(3)}\in\mathbb{R}^{n}$, and $d\in\mathbb{R}$ defined by the relations (5.4)–(5.10).

Proof.

First, we can write the function $f$ from (5.2) as

f(a) = 6\|\mathcal{A}\|^{2}-2\langle\mathcal{A},6\mathcal{A}_{6}(x,y,z)\rangle+\frac{1}{6}\|6\mathcal{A}_{6}(x,y,z)\|^{2}
= 6f_{1}(a)-2f_{2}(a)+\frac{1}{6}f_{3}(a), (5.14)

where

f_{1}(a) = \|\mathcal{A}\|^{2},
f_{2}(a) = \langle\mathcal{A},\,x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x\rangle, (5.15)
f_{3}(a) = \|x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x\|^{2}. (5.16)

For the function f2f_{2} we have

f_{2}(a) = \sum_{i,j,k=1}^{n}a_{ijk}(x_{i}y_{j}z_{k}+y_{i}z_{j}x_{k}+z_{i}x_{j}y_{k}-x_{i}z_{j}y_{k}-y_{i}x_{j}z_{k}-z_{i}y_{j}x_{k})
= \sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k}+\sum_{k=1}^{n}x_{k}\sum_{i,j=1}^{n}a_{ijk}y_{i}z_{j}+\sum_{j=1}^{n}x_{j}\sum_{i,k=1}^{n}a_{ijk}z_{i}y_{k}
\quad+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{ijk})z_{j}y_{k}+\sum_{k=1}^{n}x_{k}\sum_{i,j=1}^{n}(-a_{ijk})z_{i}y_{j}+\sum_{j=1}^{n}x_{j}\sum_{i,k=1}^{n}(-a_{ijk})y_{i}z_{k}.

We rename the indices in the above expression and use the fact that $\mathcal{A}$ is antisymmetric. We get

f_{2}(a) = \sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{jki}y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{kij}y_{j}z_{k}
\quad+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{ikj})y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{kji})y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{jik})y_{j}z_{k}
= 6\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k}. (5.17)

Further on, we write the function $f_{3}$ as

f_{3}(a)=\sum_{i,j,k=1}^{n}(x_{i}y_{j}z_{k}+y_{i}z_{j}x_{k}+z_{i}x_{j}y_{k}-x_{i}z_{j}y_{k}-y_{i}x_{j}z_{k}-z_{i}y_{j}x_{k})^{2}. (5.18)

After regrouping the summands and renaming the indices, as we did for $f_{2}$, it follows from (5.18) that

f_{3}(a) = 6\sum_{i=1}^{n}x_{i}^{2}\left(\sum_{j,k=1}^{n}y_{j}^{2}z_{k}^{2}\right)-6\sum_{i=1}^{n}x_{i}^{2}\left(\sum_{j,k=1}^{n}y_{j}y_{k}z_{j}z_{k}\right)
\quad+12\sum_{i,j=1}^{n}x_{i}x_{j}\left(\sum_{k=1}^{n}y_{i}y_{k}z_{j}z_{k}\right)-6\sum_{i,j=1}^{n}x_{i}x_{j}\left(\sum_{k=1}^{n}y_{i}y_{j}z_{k}^{2}\right)-6\sum_{i,j=1}^{n}x_{i}x_{j}\left(\sum_{k=1}^{n}y_{k}^{2}z_{i}z_{j}\right)
= \sum_{i=1}^{n}x_{i}^{2}\left(6\sum_{j=1}^{n}y_{j}^{2}\sum_{k=1}^{n}z_{k}^{2}-6\Big(\sum_{j=1}^{n}y_{j}z_{j}\Big)^{2}\right)+\sum_{i,j=1}^{n}x_{i}x_{j}\left(12\sum_{k=1}^{n}y_{i}y_{k}z_{j}z_{k}-6\sum_{k=1}^{n}y_{i}y_{j}z_{k}^{2}-6\sum_{k=1}^{n}y_{k}^{2}z_{i}z_{j}\right)
= \sum_{i=1}^{n}x_{i}^{2}\left(6\sum_{j=1}^{n}y_{j}^{2}\sum_{k=1}^{n}z_{k}^{2}-6\Big(\sum_{j=1}^{n}y_{j}z_{j}\Big)^{2}+12y_{i}z_{i}\sum_{k=1}^{n}y_{k}z_{k}-6y_{i}^{2}\sum_{k=1}^{n}z_{k}^{2}-6z_{i}^{2}\sum_{k=1}^{n}y_{k}^{2}\right)
\quad+\sum_{\substack{i,j=1\\ i<j}}^{n}x_{i}x_{j}\left(12y_{i}z_{j}\sum_{k=1}^{n}y_{k}z_{k}+12z_{i}y_{j}\sum_{k=1}^{n}y_{k}z_{k}-12y_{i}y_{j}\sum_{k=1}^{n}z_{k}^{2}-12z_{i}z_{j}\sum_{k=1}^{n}y_{k}^{2}\right).

That is,

f_{3}(a) = \sum_{i=1}^{n}x_{i}^{2}\left(6\|y\|_{2}^{2}\|z\|_{2}^{2}-6\langle y,z\rangle^{2}+12y_{i}z_{i}\langle y,z\rangle-6y_{i}^{2}\|z\|_{2}^{2}-6z_{i}^{2}\|y\|_{2}^{2}\right)
\quad+\sum_{\substack{i,j=1\\ i<j}}^{n}x_{i}x_{j}\left(12(y_{i}z_{j}+z_{i}y_{j})\langle y,z\rangle-12y_{i}y_{j}\|z\|_{2}^{2}-12z_{i}z_{j}\|y\|_{2}^{2}\right). (5.19)

Then, we can set

d = 6f_{1}(a),
(c^{(1)})^{T}x = -2f_{2}(a),
\frac{1}{2}x^{T}Q^{(1)}x = \frac{1}{6}f_{3}(a).

From the relations (5.14), (5.17), and (5.19) we get the assertion (5.11) where

c_{i}^{(1)} = -12\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k},\quad 1\leq i\leq n, (5.20)
q_{ii}^{(1)} = 2\|y\|_{2}^{2}\|z\|_{2}^{2}-2\langle y,z\rangle^{2}+4y_{i}z_{i}\langle y,z\rangle-2y_{i}^{2}\|z\|_{2}^{2}-2z_{i}^{2}\|y\|_{2}^{2},
q_{ij}^{(1)} = 2(y_{i}z_{j}+z_{i}y_{j})\langle y,z\rangle-2y_{i}y_{j}\|z\|_{2}^{2}-2z_{i}z_{j}\|y\|_{2}^{2},\quad 1\leq i,j\leq n,\ i\neq j, (5.21)

and $d$ is as given in (5.10). It follows from the expressions in (5.21) that

Q^{(1)} = 2\left(\|y\|_{2}^{2}\|z\|_{2}^{2}-\langle y,z\rangle^{2}\right)I_{n}+2\left((yz^{T}+zy^{T})\langle y,z\rangle-yy^{T}\|z\|_{2}^{2}-zz^{T}\|y\|_{2}^{2}\right)
= 2\left(\left(\|y\|_{2}^{2}\|z\|_{2}^{2}-\langle y,z\rangle^{2}\right)I_{n}+(yz^{T}-zy^{T})^{2}\right),

while the vector given element-wise by (5.20) is equal to the one from the relation (5.7).

Similarly, using a different regrouping of the summands in the equations (5.15) and (5.16), we obtain the assertions (5.12) and (5.13). We get $Q^{(2)}$ and $c^{(2)}$ as in the relations (5.5) and (5.8), respectively, as well as $Q^{(3)}$ and $c^{(3)}$ as in (5.6) and (5.9), respectively. ∎
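Theorem 5.1 is straightforward to verify numerically. The sketch below (helper names are ours) compares $f(a)=6\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|^{2}$ with the quadratic form $d+(c^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x$ built from (5.4), (5.7), and (5.10), for a random antisymmetric tensor and random vectors.

```python
import numpy as np

def A6(x, y, z):
    o = lambda a, b, c: np.einsum("i,j,k->ijk", a, b, c)
    return (o(x, y, z) + o(y, z, x) + o(z, x, y)
            - o(x, z, y) - o(y, x, z) - o(z, y, x)) / 6.0

rng = np.random.default_rng(5)
n = 5
# A random antisymmetric tensor, built by antisymmetrizing a random one (cf. (3.2)):
B = rng.standard_normal((n, n, n))
A = (B + np.transpose(B, (1, 2, 0)) + np.transpose(B, (2, 0, 1))
     - np.transpose(B, (0, 2, 1)) - np.transpose(B, (1, 0, 2)) - np.transpose(B, (2, 1, 0))) / 6.0
x, y, z = rng.standard_normal((3, n))

f_direct = 6 * np.linalg.norm(A - A6(x, y, z)) ** 2          # (5.2)
G = np.outer(y, z) - np.outer(z, y)
Q1 = 2 * (((y @ y) * (z @ z) - (y @ z) ** 2) * np.eye(n) + G @ G)   # (5.4)
c1 = -12 * np.einsum("ijk,j,k->i", A, y, z)                         # (5.7)
d = 6 * np.linalg.norm(A) ** 2                                      # (5.10)
f_quadratic = d + c1 @ x + 0.5 * x @ Q1 @ x                         # (5.11)
```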

A minimization problem of the form

\min_{v}\left\{d+c^{T}v+\frac{1}{2}v^{T}Qv\right\},

is an unconstrained quadratic programming problem. Its solution $v$ is given by the linear system

Qv=-c.

Therefore, in order to find the solutions of the minimization problems

\left.\begin{array}{c}\displaystyle\min_{x}\ d+(c^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x\\ \displaystyle\min_{y}\ d+(c^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y\\ \displaystyle\min_{z}\ d+(c^{(3)})^{T}z+\frac{1}{2}z^{T}Q^{(3)}z\end{array}\right\} (5.22)

we need to solve the linear systems

\left.\begin{array}{c}Q^{(1)}x=-c^{(1)}\\ Q^{(2)}y=-c^{(2)}\\ Q^{(3)}z=-c^{(3)}\end{array}\right\}

respectively.

Here we come to an obstacle because the matrices $Q^{(1)},Q^{(2)},Q^{(3)}$ are singular. Take $Q^{(1)}$. From the relation (5.4) we see that $Q^{(1)}$ is defined by the two vectors $y$ and $z$, and we have

Q^{(1)}y=0,\quad Q^{(1)}z=0.

Assuming that $y$ and $z$ are linearly independent, this means that $\operatorname{rank}(Q^{(1)})\leq n-2$. On the other hand, $Q^{(1)}$ is a scalar multiple of the identity matrix minus a rank-2 matrix, which implies that $\operatorname{rank}(Q^{(1)})=n-2$. However, the linear system $Q^{(1)}x=-c^{(1)}$ is consistent because $\operatorname{rank}([Q^{(1)}\ c^{(1)}])=\operatorname{rank}(Q^{(1)})$, which can be seen from the relations (5.4) and (5.7). Hence, the linear system $Q^{(1)}x=-c^{(1)}$ can be solved using the Moore-Penrose inverse,

x=-\left(Q^{(1)}\right)^{+}c^{(1)},

with the additional property that $x$ is orthogonal to $y$ and $z$. Analogous reasoning holds for the linear systems for $y$ and $z$.

Now, we can write the algorithm for solving the minimization problem (5.1). The algorithm is based on solving three minimization problems (5.22).

Algorithm 5.2.

  Antisymmetry-preserving CP  

Input: $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ antisymmetric
Output: $\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$
Initialize $x,y,z\in\mathbb{R}^{n}$ as random vectors.
repeat
     For $c^{(1)}$ as in (5.7) and $Q^{(1)}$ as in (5.4), set $x=-(Q^{(1)})^{+}c^{(1)}$.
     For $c^{(2)}$ as in (5.8) and $Q^{(2)}$ as in (5.5), set $y=-(Q^{(2)})^{+}c^{(2)}$.
     For $c^{(3)}$ as in (5.9) and $Q^{(3)}$ as in (5.6), set $z=-(Q^{(3)})^{+}c^{(3)}$.
until convergence or maximum number of iterations
$\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$

 


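A compact NumPy sketch of Algorithm 5.2 (the paper's implementation is in Julia; the names, seed, and fixed iteration count here are ours). Each update solves its singular but consistent system via the pseudoinverse, which returns the minimum-norm solution and is therefore automatically orthogonal to the other two vectors. Since each partial update exactly minimizes the quadratic objective in its block, the objective cannot increase.

```python
import numpy as np

def A6(x, y, z):
    o = lambda a, b, c: np.einsum("i,j,k->ijk", a, b, c)
    return (o(x, y, z) + o(y, z, x) + o(z, x, y)
            - o(x, z, y) - o(y, x, z) - o(z, y, x)) / 6.0

def als_update(A, u, v):
    # Solve Q w = -c with Q as in (5.4)-(5.6) and c as in (5.7)-(5.9), where
    # (u, v) = (y, z), (z, x), (x, y) for the x-, y-, and z-updates respectively.
    n = A.shape[0]
    G = np.outer(u, v) - np.outer(v, u)
    Q = 2 * (((u @ u) * (v @ v) - (u @ v) ** 2) * np.eye(n) + G @ G)
    c = -12 * np.einsum("ijk,j,k->i", A, u, v)
    return -np.linalg.pinv(Q) @ c

def antisym_cp(A, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    x, y, z = rng.standard_normal((3, A.shape[0]))
    for _ in range(iters):
        x = als_update(A, y, z)
        y = als_update(A, z, x)
        z = als_update(A, x, y)
    return x, y, z

rng = np.random.default_rng(6)
n = 6
u, v, w = np.linalg.qr(rng.standard_normal((n, 3)))[0].T   # an orthonormal triple
A = A6(u, v, w)                                            # an exactly representable target
x, y, z = antisym_cp(A)
residual = np.linalg.norm(A - A6(x, y, z)) / np.linalg.norm(A)
x0, y0, z0 = np.random.default_rng(0).standard_normal((3, n))   # same init as seed=0
init_residual = np.linalg.norm(A - A6(x0, y0, z0)) / np.linalg.norm(A)
```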
Note that Algorithm 5.2 results in mutually orthogonal vectors $x$, $y$, and $z$. While this may seem restrictive, the restriction to orthogonal vectors is justified, as we now explain.

First, note that if $x$, $y$, and $z$ are linearly dependent, then $\mathcal{A}_{6}(x,y,z)=0$. Thus, any linearly independent triplet of vectors gives a smaller value of the objective function $f$ defined by the relation (5.2).

The objective function $f$ is invariant under very general transformations. Due to multilinearity, for $\alpha,\beta\in\mathbb{R}$ we have

\mathcal{A}_{6}(x+\alpha y+\beta z,y,z)=\mathcal{A}_{6}(\alpha y+\beta z,y,z)+\mathcal{A}_{6}(x,y,z)=\mathcal{A}_{6}(x,y,z),

where $\mathcal{A}_{6}(\alpha y+\beta z,y,z)=0$ by antisymmetry. Let us consider an arbitrary matrix $B=(b_{ij})\in\mathbb{R}^{3\times 3}$. Using the arguments of multilinearity and antisymmetry as above, we have

\mathcal{A}_{6}(b_{11}x+b_{12}y+b_{13}z,\,b_{21}x+b_{22}y+b_{23}z,\,b_{31}x+b_{32}y+b_{33}z)
= b_{11}b_{22}b_{33}\mathcal{A}_{6}(x,y,z)+b_{11}b_{23}b_{32}\mathcal{A}_{6}(x,z,y)+b_{12}b_{21}b_{33}\mathcal{A}_{6}(y,x,z)
\quad+b_{12}b_{23}b_{31}\mathcal{A}_{6}(y,z,x)+b_{13}b_{21}b_{32}\mathcal{A}_{6}(z,x,y)+b_{13}b_{22}b_{31}\mathcal{A}_{6}(z,y,x)
= (b_{11}b_{22}b_{33}-b_{11}b_{23}b_{32}-b_{12}b_{21}b_{33}+b_{12}b_{23}b_{31}+b_{13}b_{21}b_{32}-b_{13}b_{22}b_{31})\mathcal{A}_{6}(x,y,z)
= \det(B)\,\mathcal{A}_{6}(x,y,z).

Therefore, if $\det(B)=1$,

\|\mathcal{A}-\mathcal{A}_{6}(b_{11}x+b_{12}y+b_{13}z,\,b_{21}x+b_{22}y+b_{23}z,\,b_{31}x+b_{32}y+b_{33}z)\|=\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|,

and the value of the objective function $f$ stays the same.

Set $V=[x,y,z]\in\mathbb{R}^{n\times 3}$. Take the thin QR decomposition $V=\widetilde{V}R$, scaled such that $\det(R)=1$ and $\widetilde{V}=[\tilde{x},\tilde{y},\tilde{z}]\in\mathbb{R}^{n\times 3}$ has orthogonal columns. Then, following the same reasoning, we have

\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})=\mathcal{A}_{6}(x,y,z)

and

\min_{x,y,z}\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|=\min_{\tilde{x},\tilde{y},\tilde{z}\text{ orthogonal}}\|\mathcal{A}-\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|. (5.23)

This justifies the choice of orthogonal vectors.
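This invariance is easy to confirm numerically (a sketch with our own names): after a thin QR factorization $V=QR$, rescaling one orthonormal column by $\det(R)$ makes the effective change of basis have determinant one, and $\mathcal{A}_{6}$ is unchanged.

```python
import numpy as np

def A6(x, y, z):
    o = lambda a, b, c: np.einsum("i,j,k->ijk", a, b, c)
    return (o(x, y, z) + o(y, z, x) + o(z, x, y)
            - o(x, z, y) - o(y, x, z) - o(z, y, x)) / 6.0

rng = np.random.default_rng(7)
n = 6
x, y, z = rng.standard_normal((3, n))
V = np.column_stack([x, y, z])
Q, R = np.linalg.qr(V)            # thin QR; Q has orthonormal columns
Vt = Q.copy()
Vt[:, 0] *= np.linalg.det(R)      # absorb det(R) so the transformation has det 1
diff = np.linalg.norm(A6(x, y, z) - A6(Vt[:, 0], Vt[:, 1], Vt[:, 2]))
```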

5.1. Equivalence of Algorithm 5.2 to HOPM

Here we show that Algorithm 5.2 is equivalent to the higher-order power method (HOPM) for the unstructured rank-1 approximation, and discuss the implications of this equivalence.

Due to multilinearity, the minimization problem (5.23) can be transformed into a minimization problem over orthonormal vectors and a scalar λ,

minx~,y~,z~ orthonormalλ𝒜λ𝒜6(x~,y~,z~)2,\min_{\begin{subarray}{c}\tilde{x},\tilde{y},\tilde{z}\text{ orthonormal}\\ \lambda\in\mathbb{R}\end{subarray}}\|\mathcal{A}-\lambda\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|^{2}, (5.24)

so it becomes a minimization problem on the Stiefel manifold. Since the expression in (5.24) does not depend on the basis, it is a minimization problem on the Grassmann manifold, which makes sense as the antisymmetric tensors are connected to the Grassmannians, see, e.g., [23].

We can rewrite (5.24) as

minx~,y~,z~ orthonormalλ{𝒜22λ𝒜,𝒜6(x~,y~,z~)+λ2𝒜6(x~,y~,z~)2}.\min_{\begin{subarray}{c}\tilde{x},\tilde{y},\tilde{z}\text{ orthonormal}\\ \lambda\in\mathbb{R}\end{subarray}}\left\{\|\mathcal{A}\|^{2}-2\lambda\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle+\lambda^{2}\|\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|^{2}\right\}. (5.25)

Set

V=[x~y~z~].V=\left[\begin{array}[]{ccc}\tilde{x}&\tilde{y}&\tilde{z}\\ \end{array}\right]. (5.26)

Observe that

𝒜6(x~,y~,z~)=×1V×2V×3V,\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})=\mathcal{E}\times_{1}V\times_{2}V\times_{3}V,

where \mathcal{E} is given by the relation (3.3). Then,

𝒜6(x~,y~,z~)2=2=6,\|\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|^{2}=\|\mathcal{E}\|^{2}=6,

because x~\tilde{x}, y~\tilde{y}, z~\tilde{z} are orthonormal and the Frobenius norm is unitary invariant. This way, minimization problem (5.25) is simplified to

minx~,y~,z~ orthonormalλ{𝒜22λ𝒜,𝒜6(x~,y~,z~)+6λ2}.\min_{\begin{subarray}{c}\tilde{x},\tilde{y},\tilde{z}\text{ orthonormal}\\ \lambda\in\mathbb{R}\end{subarray}}\left\{\|\mathcal{A}\|^{2}-2\lambda\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle+6\lambda^{2}\right\}.

Take the objective function

f~(λ,x~,y~,z~)=𝒜22λ𝒜,𝒜6(x~,y~,z~)+6λ2.\tilde{f}(\lambda,\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{A}\|^{2}-2\lambda\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle+6\lambda^{2}. (5.27)

In order to find the optimal λ for f̃, we set the partial derivative of f̃ with respect to λ to zero. We have

λf~(λ,x~,y~,z~)=12λ2𝒜,𝒜6(x~,y~,z~)=0,\frac{\partial}{\partial\lambda}\tilde{f}(\lambda,\tilde{x},\tilde{y},\tilde{z})=12\lambda-2\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle=0,

that is,

λ=𝒜,𝒜6(x~,y~,z~)6.\lambda_{*}=\frac{\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle}{6}.

It follows from (5.27) that

f~(λ,x~,y~,z~)=𝒜216𝒜,𝒜6(x~,y~,z~)2.\tilde{f}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{A}\|^{2}-\frac{1}{6}\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle^{2}.

Thus, minimizing f~(λ,x~,y~,z~)\tilde{f}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z}) is equivalent to maximizing |𝒜,𝒜6(x~,y~,z~)||\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle| over the Stiefel manifold.

Define the compressed tensor

𝒜c(V)𝒜×1VT×2VT×3VT,\mathcal{A}_{c}(V)\coloneqq\mathcal{A}\times_{1}V^{T}\times_{2}V^{T}\times_{3}V^{T}, (5.28)

where V is as in the relation (5.26). This is a 3×3×3 tensor. It is very similar to the tensor ℰ, except that in place of 1 and −1 it has (𝒜c(V))123 and −(𝒜c(V))123, respectively. Using this tensor we obtain

|𝒜,𝒜6(x~,y~,z~)|\displaystyle|\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle| =|𝒜,×1V×2V×3V|=|𝒜c(V),|\displaystyle=|\langle\mathcal{A},\mathcal{E}\times_{1}V\times_{2}V\times_{3}V\rangle|=|\langle\mathcal{A}_{c}(V),\mathcal{E}\rangle|
=6|(𝒜c(V))123|=6𝒜c(V).\displaystyle=6|(\mathcal{A}_{c}(V))_{123}|=\sqrt{6}\|\mathcal{A}_{c}(V)\|.

In the last equality we used the norm of the compressed tensor, ‖𝒜c(V)‖² = 6((𝒜c(V))123)². Therefore, maximization of |⟨𝒜,𝒜6(x̃,ỹ,z̃)⟩| is equivalent to maximization of ‖𝒜c(V)‖. This corresponds to the best structure-preserving multilinear rank-r approximation from [3] for r=3.
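These identities can be checked numerically. In the sketch below (ours, in Python/NumPy for illustration; the paper's code is in Julia) we antisymmetrize a random tensor, compress it with an orthonormal V, and verify that 𝒜c(V) has the Levi-Civita pattern and that |⟨𝒜,𝒜6(x̃,ỹ,z̃)⟩| = √6 ‖𝒜c(V)‖.

```python
import numpy as np
from itertools import permutations

# Signs of the permutations of (0, 1, 2): cyclic permutations are even.
SGN = {p: (1 if p in [(0, 1, 2), (1, 2, 0), (2, 0, 1)] else -1)
       for p in permutations(range(3))}

# Levi-Civita tensor (3.3).
E = np.zeros((3, 3, 3))
for p, s in SGN.items():
    E[p] = s

rng = np.random.default_rng(1)
n = 6
# Antisymmetrize a random tensor, cf. the antisymmetrizer (3.2).
T = rng.standard_normal((n, n, n))
A = sum(s * np.transpose(T, p) for p, s in SGN.items()) / 6

V, _ = np.linalg.qr(rng.standard_normal((n, 3)))   # orthonormal x~, y~, z~
Ac = np.einsum('ijk,ia,jb,kc->abc', A, V, V, V)    # compressed tensor (5.28)
A6 = np.einsum('ijk,ai,bj,ck->abc', E, V, V, V)    # A6(x~, y~, z~)

# Ac is the Levi-Civita pattern scaled by (Ac)_123 ...
assert np.allclose(Ac, Ac[0, 1, 2] * E)
# ... hence |<A, A6>| = 6 |(Ac)_123| = sqrt(6) ||Ac||.
assert np.isclose(abs(np.sum(A * A6)), np.sqrt(6) * np.linalg.norm(Ac))
```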

The problem of finding the best antisymmetric multilinear rank-r approximation is equivalent to the problem of finding the best unstructured rank-1 approximation of an antisymmetric tensor, see [3, Theorem 4.2]. This implies the equivalence between our Algorithm 5.2 and HOPM used for finding the best unstructured rank-1 approximation. Finally, the global convergence result for HOPM given in [30], namely that the iterates of the ALS algorithm for HOPM converge to a stationary point of the corresponding objective function, applies to our algorithm as well.

6. Partial antisymmetry

Having discussed antisymmetric tensors, we can ask what happens when a tensor has only partial antisymmetry. We consider order-3 tensors. Note that partially antisymmetric tensors need not be cubical.

A tensor 𝒞 ∈ ℝ^{n×n×m} is antisymmetric in modes one and two if all its frontal slices are antisymmetric matrices. Without loss of generality, we assume that 𝒞 is antisymmetric in the first two modes. That is,

cijk=cjik,1i,jn, 1km.c_{ijk}=-c_{jik},\quad 1\leq i,j\leq n,\ 1\leq k\leq m. (6.1)

Tensors that are antisymmetric in modes two and three, or in modes one and three, are defined correspondingly. The partial antisymmetrizer producing antisymmetry in modes one and two can be defined as the operator anti1,2 such that for ℬ ∈ ℝ^{n×n×m} and 𝒞 = anti1,2(ℬ) we have

cijk=12(bijkbjik).c_{ijk}=\frac{1}{2}(b_{ijk}-b_{jik}).
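In code, anti1,2 amounts to subtracting a transpose of the first two modes; a minimal NumPy sketch (our illustration, not the paper's Julia implementation):

```python
import numpy as np

def anti12(B):
    """Partial antisymmetrizer: c_ijk = (b_ijk - b_jik) / 2."""
    return 0.5 * (B - np.transpose(B, (1, 0, 2)))

rng = np.random.default_rng(2)
C = anti12(rng.standard_normal((4, 4, 3)))

# Every frontal slice C[:, :, k] is skew-symmetric, i.e. c_ijk = -c_jik.
assert np.allclose(C, -np.transpose(C, (1, 0, 2)))
# Applying the antisymmetrizer twice changes nothing (it is a projection).
assert np.allclose(anti12(C), C)
```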

For a pair of indices (i,j)(i,j), 1i<jn1\leq i<j\leq n, subtensor 𝒢\mathcal{G} of 𝒞\mathcal{C} obtained at the intersection of the iith and jjth column, row, and tube is a 2×2×22\times 2\times 2 tensor of the form

𝒢(i1,i2,i3)={α,if (i1,i2,i3)=(1,2,1),α,if (i1,i2,i3)=(2,1,1),β,if (i1,i2,i3)=(1,2,2),β,if (i1,i2,i3)=(2,1,2),0,if i1=i2,\mathcal{G}(i_{1},i_{2},i_{3})=\left\{\begin{array}[]{rl}\alpha,&\text{if $(i_{1},i_{2},i_{3})=(1,2,1)$,}\\ -\alpha,&\text{if $(i_{1},i_{2},i_{3})=(2,1,1)$,}\\ \beta,&\text{if $(i_{1},i_{2},i_{3})=(1,2,2)$,}\\ -\beta,&\text{if $(i_{1},i_{2},i_{3})=(2,1,2)$,}\\ 0,&\text{if $i_{1}=i_{2}$,}\\ \end{array}\right.

for α,β\alpha,\beta\in\mathbb{R}. Its mode-11 matricization is given by

G(1)=[0α0βα0β0].G_{(1)}=\left[\begin{array}[]{cc|cc}0&\alpha&0&\beta\\ -\alpha&0&-\beta&0\\ \end{array}\right].

Here, the tensor 𝒢 plays a role analogous to that of the Levi-Civita tensor (3.3) in Section 3.

Analogously to the tensor format (3.4), for three vectors x,y ∈ ℝⁿ and z ∈ ℝᵐ, we can define an n×n×m tensor

𝒞2(x,y,z):=12(xyzyxz).\mathcal{C}_{2}(x,y,z):=\frac{1}{2}(x\circ y\circ z-y\circ x\circ z). (6.2)

If we take x=[1,0]ᵀ, y=[0,1]ᵀ, z=[α,β]ᵀ, then 𝒞2(x,y,z)=𝒢. Moreover, if 𝒯=[[x,y,z]] is a rank-1 tensor, then 𝒞2(x,y,z)=anti1,2(𝒯). Obviously, rank(𝒞2(x,y,z)) ≤ 2. For each fixed third index, the corresponding frontal slice of 𝒞2(x,y,z) is a skew-symmetric matrix and therefore has even rank. Hence, for linearly independent x and y and non-zero z,

rank(𝒞2(x,y,z))=2.\text{rank}(\mathcal{C}_{2}(x,y,z))=2.
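The rank claim can be checked directly; a short sketch (ours) that forms 𝒞2(x,y,z) from (6.2) and verifies that its mode-1 matricization has rank 2:

```python
import numpy as np

def C2(x, y, z):
    """C2(x,y,z) = (x o y o z - y o x o z) / 2, see (6.2)."""
    return 0.5 * (np.einsum('i,j,k->ijk', x, y, z)
                  - np.einsum('i,j,k->ijk', y, x, z))

rng = np.random.default_rng(3)
n, m = 5, 4
x, y, z = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

T = C2(x, y, z)
# Mode-1 matricization is n x nm; the column ordering does not affect rank.
assert np.linalg.matrix_rank(T.reshape(n, -1)) == 2
# Each frontal slice is skew-symmetric, hence of even rank.
assert np.allclose(T[:, :, 0], -T[:, :, 0].T)
```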

With all this in mind, for a given non-zero tensor 𝒞 ∈ ℝ^{n×n×m} that is antisymmetric in the first two modes, we are looking for its rank-2 approximation 𝒞̃ of the same structure. Again, we examine two approaches. The first one is analogous to the approach in Section 4. In the second approach we find a tensor 𝒞̃ = 𝒞2(x,y,z), defined by the vectors x,y ∈ ℝⁿ and z ∈ ℝᵐ, such that

𝒞𝒞~2min.\|\mathcal{C}-\widetilde{\mathcal{C}}\|^{2}\to\min. (6.3)

6.1. Ignoring the structure

Let 𝒞 ∈ ℝ^{n×n×m} be a tensor with partial antisymmetry. We first approximate 𝒞 with a rank-1 tensor 𝒞̄ by using the CP-ALS algorithm (Algorithm 2.1) with r=1. Then, we apply the operator anti1,2 to 𝒞̄ to get a rank-2 tensor 𝒞̃ that is antisymmetric in modes one and two. We have

𝒞¯\displaystyle\bar{\mathcal{C}} =[[x,y,z]],x,yn,zm,\displaystyle=[[x,y,z]],\quad x,y\in\mathbb{R}^{n},\ z\in\mathbb{R}^{m},
𝒞~\displaystyle\widetilde{\mathcal{C}} =anti1,2(𝒞¯),\displaystyle=\text{anti}_{1,2}(\bar{\mathcal{C}}),

or equivalently, 𝒞̃ = 𝒞2(x,y,z). The algorithm with partial a posteriori antisymmetrization is a simple modification of Algorithm 4.1.

Algorithm 6.1.

  CP with partial a posteriori antisymmetrization  

Input: 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} antisymmetric in modes 11 and 22
Output: 𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)
Apply Algorithm 2.1 on 𝒞 with r=1 to obtain x,y ∈ ℝⁿ, z ∈ ℝᵐ
𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)

 


6.2. Preserving the structure

Now we are going to construct an iterative structure-preserving minimization algorithm. Again, let 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} be a tensor with partial antisymmetry. We are looking for tensor 𝒞~n×n×m\widetilde{\mathcal{C}}\in\mathbb{R}^{n\times n\times m} which is a solution of the minimization problem (6.3). In particular, we are looking for vectors x,ynx,y\in\mathbb{R}^{n} and zmz\in\mathbb{R}^{m} such that

𝒞𝒞2(x,y,z)2min.\|\mathcal{C}-\mathcal{C}_{2}(x,y,z)\|^{2}\to\min. (6.4)

We set

v=[xyz]2n+m,v=\left[\begin{array}[]{c}x\\ y\\ z\\ \end{array}\right]\in\mathbb{R}^{2n+m},

and define the objective function g:2n+mg\colon\mathbb{R}^{2n+m}\to\mathbb{R},

g(v)=2𝒞𝒞2(x,y,z)2.g(v)=2\|\mathcal{C}-\mathcal{C}_{2}(x,y,z)\|^{2}. (6.5)

We formulate the ALS algorithm based on three minimization problems:

minxg(v),minyg(v),minzg(v).\min_{x}g(v),\quad\min_{y}g(v),\quad\min_{z}g(v).

To this end, we need Theorem 6.2. Before the statement of the theorem, we define the appropriate objects: matrices Q(1)=Q(1)(y,z),Q(2)=Q(2)(x,z)n×nQ^{(1)}=Q^{(1)}(y,z),Q^{(2)}=Q^{(2)}(x,z)\in\mathbb{R}^{n\times n},

Q(1)\displaystyle Q^{(1)} =2y22z22In2yyTz22,\displaystyle=2\|y\|_{2}^{2}\|z\|_{2}^{2}I_{n}-2yy^{T}\|z\|_{2}^{2}, (6.6)
Q(2)\displaystyle Q^{(2)} =2x22z22In2xxTz22;\displaystyle=2\|x\|_{2}^{2}\|z\|_{2}^{2}I_{n}-2xx^{T}\|z\|_{2}^{2}; (6.7)

vectors b(1)=b(1)(y,z),b(2)=b(2)(x,z)nb^{(1)}=b^{(1)}(y,z),b^{(2)}=b^{(2)}(x,z)\in\mathbb{R}^{n}, b(3)=b(3)(x,y)mb^{(3)}=b^{(3)}(x,y)\in\mathbb{R}^{m},

b^{(1)}=-4\mathcal{C}\times_{2}y^{T}\times_{3}z^{T}, (6.8)
b^{(2)}=-4\mathcal{C}\times_{1}x^{T}\times_{3}z^{T}, (6.9)
b^{(3)}=-2(\mathcal{C}\times_{1}x^{T}\times_{2}y^{T}-\mathcal{C}\times_{1}y^{T}\times_{2}x^{T}), (6.10)

and numbers q(3)=q(3)(x,y),dq^{(3)}=q^{(3)}(x,y),d\in\mathbb{R},

q(3)\displaystyle q^{(3)} =xyTyxT22,\displaystyle=\|xy^{T}-yx^{T}\|_{2}^{2}, (6.11)
d\displaystyle d =2𝒞2.\displaystyle=2\|\mathcal{C}\|^{2}. (6.12)
Theorem 6.2.

The function gg defined by (6.5) can be written as

g(v)\displaystyle g(v) =d+(b(1))Tx+12xTQ(1)x\displaystyle=d+(b^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x (6.13)
=d+(b(2))Ty+12yTQ(2)y\displaystyle=d+(b^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y (6.14)
=d+(b(3))Tz+12q(3)zTz,\displaystyle=d+(b^{(3)})^{T}z+\frac{1}{2}q^{(3)}z^{T}z, (6.15)

for Q(1),Q(2)n×nQ^{(1)},Q^{(2)}\in\mathbb{R}^{n\times n}, b(1),b(2)nb^{(1)},b^{(2)}\in\mathbb{R}^{n}, b(3)mb^{(3)}\in\mathbb{R}^{m}, q(3)q^{(3)}\in\mathbb{R} defined by the relations (6.6)-(6.12).

Proof.

We start by writing the function gg as

g(v)\displaystyle g(v) =2𝒞22𝒞,xyzyxz+12xyzyxz2\displaystyle=2\|\mathcal{C}\|^{2}-2\left\langle\mathcal{C},x\circ y\circ z-y\circ x\circ z\right\rangle+\frac{1}{2}\|x\circ y\circ z-y\circ x\circ z\|^{2}
=2g1(v)2g2(v)+12g3(v),\displaystyle=2g_{1}(v)-2g_{2}(v)+\frac{1}{2}g_{3}(v),

for

g1(v)\displaystyle g_{1}(v) =𝒞2,\displaystyle=\|\mathcal{C}\|^{2},
g2(v)\displaystyle g_{2}(v) =𝒞,xyzyxz,\displaystyle=\left\langle\mathcal{C},x\circ y\circ z-y\circ x\circ z\right\rangle, (6.16)
g3(v)\displaystyle g_{3}(v) =xyzyxz2.\displaystyle=\|x\circ y\circ z-y\circ x\circ z\|^{2}. (6.17)

Function g2g_{2} can be written as

g2(v)\displaystyle g_{2}(v) =i,j=1nk=1mcijk(xiyjzkyixjzk)\displaystyle=\sum_{i,j=1}^{n}\sum_{k=1}^{m}c_{ijk}(x_{i}y_{j}z_{k}-y_{i}x_{j}z_{k})
=i=1nxi(j=1nk=1mcijkyjzk)+j=1nxj(i=1nk=1m(cijk)yizk).\displaystyle=\sum_{i=1}^{n}x_{i}\left(\sum_{j=1}^{n}\sum_{k=1}^{m}c_{ijk}y_{j}z_{k}\right)+\sum_{j=1}^{n}x_{j}\left(\sum_{i=1}^{n}\sum_{k=1}^{m}(-c_{ijk})y_{i}z_{k}\right).

Using the partial antisymmetry property (6.1), after renaming the indices we get

g2(v)=2i=1nxi(j=1nk=1mcijkyjzk).g_{2}(v)=2\sum_{i=1}^{n}x_{i}\left(\sum_{j=1}^{n}\sum_{k=1}^{m}c_{ijk}y_{j}z_{k}\right).

For the function g3g_{3} we have

g3(v)\displaystyle g_{3}(v) =i,j=1nk=1m(xiyjzkxjyizk)2\displaystyle=\sum_{i,j=1}^{n}\sum_{k=1}^{m}(x_{i}y_{j}z_{k}-x_{j}y_{i}z_{k})^{2}
=i=1nxi2(j=1nk=1myj2zk2)2i,j=1nxixjyiyj(k=1mzk2)+j=1nxj2(i=1nk=1myi2zk2)\displaystyle=\sum_{i=1}^{n}x_{i}^{2}\left(\sum_{j=1}^{n}\sum_{k=1}^{m}y_{j}^{2}z_{k}^{2}\right)-2\sum_{i,j=1}^{n}x_{i}x_{j}y_{i}y_{j}\left(\sum_{k=1}^{m}z_{k}^{2}\right)+\sum_{j=1}^{n}x_{j}^{2}\left(\sum_{i=1}^{n}\sum_{k=1}^{m}y_{i}^{2}z_{k}^{2}\right)
=2i=1nxi2y22z222i,j=1nxixjyiyjz22\displaystyle=2\sum_{i=1}^{n}x_{i}^{2}\|y\|_{2}^{2}\|z\|_{2}^{2}-2\sum_{i,j=1}^{n}x_{i}x_{j}y_{i}y_{j}\|z\|_{2}^{2}
=i=1nxi2(2y22z222yi2z22)+i,j=1ijnxixj(2yiyjz22).\displaystyle=\sum_{i=1}^{n}x_{i}^{2}(2\|y\|_{2}^{2}\|z\|_{2}^{2}-2y_{i}^{2}\|z\|_{2}^{2})+\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}x_{i}x_{j}(-2y_{i}y_{j}\|z\|_{2}^{2}).

The same way as in the proof of Theorem 5.1, we set

d\displaystyle d =2g1(v),\displaystyle=2g_{1}(v),
(b(1))Tx\displaystyle(b^{(1)})^{T}x =2g2(v),\displaystyle=-2g_{2}(v),
12xTQ(1)x\displaystyle\frac{1}{2}x^{T}Q^{(1)}x =12g3(v),\displaystyle=\frac{1}{2}g_{3}(v),

where

bi(1)=4j=1nk=1mcijkyjzk,1in,b_{i}^{(1)}=-4\sum_{j=1}^{n}\sum_{k=1}^{m}c_{ijk}y_{j}z_{k},\quad 1\leq i\leq n, (6.18)

and

qii(1)\displaystyle q_{ii}^{(1)} =2y2z22yi2z2,\displaystyle=2\|y\|^{2}\|z\|^{2}-2y_{i}^{2}\|z\|^{2},
qij(1)\displaystyle q_{ij}^{(1)} =2yiyjz2,1i,jn,ij.\displaystyle=-2y_{i}y_{j}\|z\|^{2},\quad 1\leq i,j\leq n,\ i\neq j. (6.19)

Vector b(1) from (6.18) can be written in the more compact form (6.8), matrix Q(1) from (6.19) is equivalent to (6.6), and d is as in the relation (6.12). This gives the assertion (6.13).

With different regrouping of the summands in the relations (6.16) and (6.17) we get the equation (6.14) with b(2)b^{(2)} and Q(2)Q^{(2)} as in (6.9) and (6.7), respectively.

To get the equation (6.15) we write

g2(v)=k=1mzk(i,j=1ncijk(xiyjyixj))g_{2}(v)=\sum_{k=1}^{m}z_{k}\left(\sum_{i,j=1}^{n}c_{ijk}(x_{i}y_{j}-y_{i}x_{j})\right)

and

g3(v)=k=1mzk2(i,j=1n(xiyjxjyi)2).g_{3}(v)=\sum_{k=1}^{m}z_{k}^{2}\left(\sum_{i,j=1}^{n}(x_{i}y_{j}-x_{j}y_{i})^{2}\right).

Then, we set

bk(3)\displaystyle b_{k}^{(3)} =2i,j=1ncijk(xiyjyixj),1km,\displaystyle=-2\sum_{i,j=1}^{n}c_{ijk}(x_{i}y_{j}-y_{i}x_{j}),\quad 1\leq k\leq m,
q(3)\displaystyle q^{(3)} =i,j=1n(xiyjxjyi)2=xyTyxT22.\displaystyle=\sum_{i,j=1}^{n}(x_{i}y_{j}-x_{j}y_{i})^{2}=\|xy^{T}-yx^{T}\|_{2}^{2}.

Compact form of the vector b(3)b^{(3)} corresponds to (6.10). ∎
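The identities of Theorem 6.2 can be validated numerically. The sketch below (ours, in Python/NumPy for illustration) compares g(v) = 2‖𝒞 − 𝒞2(x,y,z)‖² with the quadratic forms (6.13)–(6.15) on a random partially antisymmetric tensor; note that in the linear term of the y-update the vector x is contracted over mode one.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 4
C = rng.standard_normal((n, n, m))
C = 0.5 * (C - np.transpose(C, (1, 0, 2)))      # partial antisymmetry (6.1)
x, y, z = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

C2 = 0.5 * (np.einsum('i,j,k->ijk', x, y, z) - np.einsum('i,j,k->ijk', y, x, z))
g = 2 * np.sum((C - C2) ** 2)                   # objective (6.5)

d = 2 * np.sum(C ** 2)                                                  # (6.12)
b1 = -4 * np.einsum('ijk,j,k->i', C, y, z)                              # (6.8)
Q1 = 2 * (y @ y) * (z @ z) * np.eye(n) - 2 * np.outer(y, y) * (z @ z)   # (6.6)
b2 = -4 * np.einsum('ijk,i,k->j', C, x, z)      # x contracted over mode 1
Q2 = 2 * (x @ x) * (z @ z) * np.eye(n) - 2 * np.outer(x, x) * (z @ z)   # (6.7)
b3 = -2 * (np.einsum('ijk,i,j->k', C, x, y)
           - np.einsum('ijk,i,j->k', C, y, x))                          # (6.10)
q3 = np.sum((np.outer(x, y) - np.outer(y, x)) ** 2)                     # (6.11)

assert np.isclose(g, d + b1 @ x + 0.5 * x @ Q1 @ x)    # (6.13)
assert np.isclose(g, d + b2 @ y + 0.5 * y @ Q2 @ y)    # (6.14)
assert np.isclose(g, d + b3 @ z + 0.5 * q3 * (z @ z))  # (6.15)
```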

Therefore, like in the Section 5, our algorithm is based on finding the solutions of the minimization problems

minxd+(b(1))Tx+12xTQ(1)xminyd+(b(2))Ty+12yTQ(2)yminzd+(b(3))Tz+12q(3)zTz.}\left.\begin{array}[]{c}\displaystyle\min_{x}d+(b^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x\\ \displaystyle\min_{y}d+(b^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y\\ \displaystyle\min_{z}d+(b^{(3)})^{T}z+\frac{1}{2}q^{(3)}z^{T}z.\\ \end{array}\right\}

Those solutions are obtained, respectively, from the following equations:

Q(1)x=b(1)Q(2)y=b(2)z=1q(3)b(3).}\left.\begin{array}[]{c}Q^{(1)}x=-b^{(1)}\\ Q^{(2)}y=-b^{(2)}\\ z=-\frac{1}{q^{(3)}}b^{(3)}.\\ \end{array}\right\}

The situation regarding these linear systems is similar to the one in the fully antisymmetric case. Matrices Q(1) and Q(2) do not have full rank. From their definitions (6.6) and (6.7) we see that both are a scalar multiple of the identity minus a rank-1 matrix, and

Q^{(1)}y=0,\quad Q^{(2)}x=0.

Thus, rank(Q(1)) = rank(Q(2)) = n−1. Still, rank([Q(1) b(1)]) = rank(Q(1)) and rank([Q(2) b(2)]) = rank(Q(2)), so the linear systems are consistent and can be solved using the Moore-Penrose inverse. Since the minimum-norm solutions are orthogonal to the corresponding null spaces, we additionally get that the vectors x and y must be orthogonal. Note that q(3) ≠ 0 for linearly independent x and y, so z is well-defined.

The algorithm for solving the minimization problem (6.4) is very similar to Algorithm 5.2.

Algorithm 6.3.

  CP preserving partial antisymmetry  

Input: 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} antisymmetric in modes 11 and 22
Output: 𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)
Initialize x,ynx,y\in\mathbb{R}^{n},zmz\in\mathbb{R}^{m} as random vectors.
repeat
     For b(1)b^{(1)} as in (6.8) and Q(1)Q^{(1)} as in (6.6), x=(Q(1))+b(1)x=-(Q^{(1)})^{+}b^{(1)}.
     For b(2)b^{(2)} as in (6.9) and Q(2)Q^{(2)} as in (6.7), y=(Q(2))+b(2)y=-(Q^{(2)})^{+}b^{(2)}.
     For b(3)b^{(3)} as in (6.10) and q(3)q^{(3)} as in (6.11), z=b(3)q(3)z=-\frac{b^{(3)}}{q^{(3)}}.
until convergence or maximum number of iterations
𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)

 
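Algorithm 6.3 can be sketched compactly; the following Python/NumPy version is our illustration (the paper's implementation is in Julia, and the initialization, iteration cap, and convergence test here are our own choices). On a tensor that exactly has the structure (6.2), the iteration reaches a small relative error.

```python
import numpy as np

def C2(x, y, z):
    """C2(x,y,z) = (x o y o z - y o x o z) / 2, see (6.2)."""
    return 0.5 * (np.einsum('i,j,k->ijk', x, y, z)
                  - np.einsum('i,j,k->ijk', y, x, z))

def partial_antisym_cp(C, iters=1000, tol=1e-10):
    """ALS sketch of Algorithm 6.3 for C antisymmetric in modes 1 and 2."""
    n, m = C.shape[0], C.shape[2]
    rng = np.random.default_rng(5)
    x, y, z = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
    nrm = np.linalg.norm(C)
    for _ in range(iters):
        zz = z @ z
        # x-update: solve Q1 x = -b1 with the pseudoinverse, see (6.6), (6.8).
        Q1 = 2 * (y @ y) * zz * np.eye(n) - 2 * np.outer(y, y) * zz
        b1 = -4 * np.einsum('ijk,j,k->i', C, y, z)
        x = -np.linalg.pinv(Q1) @ b1
        # y-update: same structure, with x contracted over mode one.
        Q2 = 2 * (x @ x) * zz * np.eye(n) - 2 * np.outer(x, x) * zz
        b2 = -4 * np.einsum('ijk,i,k->j', C, x, z)
        y = -np.linalg.pinv(Q2) @ b2
        # z-update: z = -b3 / q3, see (6.10), (6.11).
        b3 = -2 * (np.einsum('ijk,i,j->k', C, x, y)
                   - np.einsum('ijk,i,j->k', C, y, x))
        q3 = np.sum((np.outer(x, y) - np.outer(y, x)) ** 2)
        z = -b3 / q3
        if np.linalg.norm(C - C2(x, y, z)) / nrm < tol:
            break
    return x, y, z

rng = np.random.default_rng(6)
# A target with exact structure: C = 2 C2(a, b, c) for random a, b, c.
C = 2 * C2(rng.standard_normal(6), rng.standard_normal(6), rng.standard_normal(4))
x, y, z = partial_antisym_cp(C)
assert np.linalg.norm(C - C2(x, y, z)) / np.linalg.norm(C) < 1e-6
```

As a by-product of the pseudoinverse solves, the returned x and y are orthogonal, as discussed above.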


As in the fully antisymmetric case, we can additionally observe that 𝒞2(x,y,z) = 0 if x and y are linearly dependent, and

𝒞2(b11x+b12y,b21x+b22y,z)=detB𝒞2(x,y,z),B=[b11b12b21b22].\mathcal{C}_{2}(b_{11}x+b_{12}y,b_{21}x+b_{22}y,z)=\det B\mathcal{C}_{2}(x,y,z),\quad B=\left[\begin{array}[]{cc}b_{11}&b_{12}\\ b_{21}&b_{22}\\ \end{array}\right].

Then we can rescale our optimization problem so that we are looking for

\min_{\begin{subarray}{c}\|\tilde{x}\|=\|\tilde{y}\|=\|\tilde{z}\|=1\\ \tilde{x}\perp\tilde{y},\ \lambda\in\mathbb{R}\end{subarray}}\left\{\|\mathcal{C}-\lambda\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\|^{2}\right\}.

Set

g~(λ,x~,y~,z~)=𝒞22λ𝒞,𝒞2(x~,y~,z~)+λ2𝒞2(x~,y~,z~)2.\tilde{g}(\lambda,\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{C}\|^{2}-2\lambda\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle+\lambda^{2}\|\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\|^{2}.

From the shape of 𝒞2 and the facts that ‖x̃‖=‖ỹ‖=‖z̃‖=1 and x̃ ⊥ ỹ, a short calculation gives ‖𝒞2(x̃,ỹ,z̃)‖² = 1/2. Thus,

g~(λ,x~,y~,z~)=𝒞22λ𝒞,𝒞2(x~,y~,z~)+12λ2.\tilde{g}(\lambda,\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{C}\|^{2}-2\lambda\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle+\frac{1}{2}\lambda^{2}.

The optimal λ\lambda for g~\tilde{g} is

λ=2𝒞,𝒞2(x~,y~,z~)\lambda_{*}=2\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle

and

\tilde{g}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{C}\|^{2}-2\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle^{2}.

Therefore, minimizing \tilde{g}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z}) is equivalent to maximizing |\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle|.

Now we can set

W=[x~y~]W=\left[\begin{array}[]{cc}\tilde{x}&\tilde{y}\\ \end{array}\right]

and define the compressed matrix

Cc(W,z~)𝒞×1WT×2WT×3z~T,C_{c}(W,\tilde{z})\coloneqq\mathcal{C}\times_{1}W^{T}\times_{2}W^{T}\times_{3}\tilde{z}^{T}, (6.20)

analogous to the compressed tensor (5.28). The matrix Cc(W,z̃) is a 2×2 skew-symmetric matrix

[0(Cc(W,z~))12(Cc(W,z~))120],\left[\begin{array}[]{cc}0&(C_{c}(W,\tilde{z}))_{12}\\ -(C_{c}(W,\tilde{z}))_{12}&0\\ \end{array}\right],

where

|(Cc(W,z~))12|=|𝒞×1x~T×2y~T×3z~T|.|(C_{c}(W,\tilde{z}))_{12}|=|\mathcal{C}\times_{1}\tilde{x}^{T}\times_{2}\tilde{y}^{T}\times_{3}\tilde{z}^{T}|. (6.21)

Moreover, we can write

𝒞2(x~,y~,z~)=M×1W×2W×3z~,\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})=M\times_{1}W\times_{2}W\times_{3}\tilde{z},

for M=[012120]M=\left[\begin{array}[]{cc}0&\frac{1}{2}\\ -\frac{1}{2}&0\\ \end{array}\right]. It follows that

|𝒞,𝒞2(x~,y~,z~)|=|Cc(W,z~),M|=22Cc(W,z~)F|\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle|=|\langle C_{c}(W,\tilde{z}),M\rangle|=\frac{\sqrt{2}}{2}\|C_{c}(W,\tilde{z})\|_{F}

and we conclude that maximization of |𝒞,𝒞2(x~,y~,z~)||\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle| is equivalent to maximization of Cc(W,z~)F\|C_{c}(W,\tilde{z})\|_{F}.
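A quick numerical check of this identity (our sketch, in Python/NumPy): for orthonormal x̃, ỹ and a unit vector z̃, the compressed matrix (6.20) is skew-symmetric and |⟨𝒞,𝒞2(x̃,ỹ,z̃)⟩| = (√2/2) ‖Cc(W,z̃)‖F.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 6, 4
C = rng.standard_normal((n, n, m))
C = 0.5 * (C - np.transpose(C, (1, 0, 2)))        # partial antisymmetry (6.1)

W, _ = np.linalg.qr(rng.standard_normal((n, 2)))  # orthonormal [x~ y~]
zt = rng.standard_normal(m)
zt /= np.linalg.norm(zt)
xt, yt = W[:, 0], W[:, 1]

# Compressed matrix (6.20): Cc = C x1 W^T x2 W^T x3 z~^T.
Cc = np.einsum('ijk,ia,jb,k->ab', C, W, W, zt)
assert np.allclose(Cc, -Cc.T)                     # 2x2 skew-symmetric

C2 = 0.5 * (np.einsum('i,j,k->ijk', xt, yt, zt)
            - np.einsum('i,j,k->ijk', yt, xt, zt))
# |<C, C2(x~,y~,z~)>| = (sqrt(2)/2) ||Cc||_F.
assert np.isclose(abs(np.sum(C * C2)), np.sqrt(2) / 2 * np.linalg.norm(Cc))
```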

Maximization of ‖Cc(W,z̃)‖F corresponds to the multilinear rank-(2,2,m) structure-preserving approximation of 𝒞. Similarly to what was done in [3] for the best antisymmetric multilinear rank-r approximation, we can establish an equivalence between the best partially antisymmetric multilinear rank-(2,2,m) approximation and the best unstructured rank-1 approximation of a partially antisymmetric tensor.

Proposition 6.4.

Let 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} be a partially antisymmetric tensor. Then

max{𝒞×1UT×2UT×3zT:Un×2,UTU=I2,z2=1}\displaystyle\max\left\{\|\mathcal{C}\times_{1}U^{T}\times_{2}U^{T}\times_{3}z^{T}\|\ :\ U\in\mathbb{R}^{n\times 2},U^{T}U=I_{2},\|z\|_{2}=1\right\}
=2max{|𝒞×1u1T×2u2T×3zT|:u12=u22=z2=1,[u1u2]T[u1u2]=I2}\displaystyle=\sqrt{2}\max\left\{|\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}|\ :\ \|u_{1}\|_{2}=\|u_{2}\|_{2}=\|z\|_{2}=1,[u_{1}u_{2}]^{T}[u_{1}u_{2}]=I_{2}\right\} (6.22)
=2max{|𝒞×1v1T×2v2T×3zT|:v12=v22=z2=1}.\displaystyle=\sqrt{2}\max\left\{|\mathcal{C}\times_{1}v_{1}^{T}\times_{2}v_{2}^{T}\times_{3}z^{T}|\ :\ \|v_{1}\|_{2}=\|v_{2}\|_{2}=\|z\|_{2}=1\right\}. (6.23)
Proof.

Take α=𝒞×1u1T×2u2T×3zT\alpha=\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}. From the relations (6.20) and (6.21) we see that, for every partially antisymmetric tensor 𝒞\mathcal{C} and for U=[u1u2]U=[u_{1}u_{2}],

𝒞×1UT×2UT×3zT2=[0αα0]F2=2α2,\|\mathcal{C}\times_{1}U^{T}\times_{2}U^{T}\times_{3}z^{T}\|^{2}=\left\|\left[\begin{array}[]{cc}0&\alpha\\ -\alpha&0\\ \end{array}\right]\right\|_{F}^{2}=2\alpha^{2},

which proves (6.22).

Obviously, expression (6.22) is less than or equal to (6.23). Take the vectors v1, v2, z that maximize (6.23). There is an upper-triangular 2×2 matrix R such that |r11| ≤ 1, |r22| ≤ 1 and

[v1v2]=[u1u2]R\left[\begin{array}[]{cc}v_{1}&v_{2}\\ \end{array}\right]=\left[\begin{array}[]{cc}u_{1}&u_{2}\\ \end{array}\right]R

is a thin QR decomposition of [v1 v2]. Using the antisymmetry in the first two modes we have

|𝒞×1v1T×2v2T×3zT|\displaystyle|\mathcal{C}\times_{1}v_{1}^{T}\times_{2}v_{2}^{T}\times_{3}z^{T}| =|𝒞×1r11u1T×2(r12u1T+r22u2T)×3zT|\displaystyle=|\mathcal{C}\times_{1}r_{11}u_{1}^{T}\times_{2}(r_{12}u_{1}^{T}+r_{22}u_{2}^{T})\times_{3}z^{T}|
=|𝒞×1r11u1T×2r22u2T×3zT|\displaystyle=|\mathcal{C}\times_{1}r_{11}u_{1}^{T}\times_{2}r_{22}u_{2}^{T}\times_{3}z^{T}|
\displaystyle=|r_{11}r_{22}||\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}|\leq|\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}|.

This proves that the value of (6.22) is equal to the value of (6.23). ∎

Therefore, following the previous discussion and the result of Proposition 6.4, we obtain the equivalence between our Algorithm 6.3 and HOPM for finding the best unstructured rank-1 approximation. Then, as in the fully antisymmetric case, the convergence result for HOPM from [30] holds.

7. Numerical experiments

We provide numerical examples comparing the CP rank-1 approximation with a posteriori antisymmetrization (Algorithm 4.1) and the antisymmetry preserving CP (Algorithm 5.2). Additionally, for the sake of completeness, we compare these algorithms with the CP-ALS algorithm (Algorithm 2.1) with r=6, which does not preserve antisymmetry. As we will show, antisymmetry preserving CP outperforms CP with a posteriori antisymmetrization in terms of accuracy, which was expected, but also in execution time, while CP-ALS proved to be much slower than the other two algorithms and completely destroys the antisymmetric property.

All algorithms are implemented and tested in the Julia programming language [4], version 1.8.1, on a personal computer, with the BenchmarkTools [6] package used for determining the execution times of the algorithms (function @btime) and the TensorToolbox [25] package used for tensor calculations.

For a given tensor 𝒜 and an approximation 𝒜̃, we look at the relative error ‖𝒜−𝒜̃‖/‖𝒜‖. We run the CP-ALS algorithm, both on its own and within CP with a posteriori antisymmetrization, with tolerance 10⁻⁸, and we stop the antisymmetry preserving CP algorithm when either the relative error or the difference between the relative errors in two consecutive iterations falls below 10⁻⁸.

7.1. Example 1

First we generate an antisymmetric tensor 𝒜 of size n×n×n and rank 6 by randomly selecting three vectors x,y,z of size n and defining 𝒜 = 6𝒜6(x,y,z), where 𝒜6(x,y,z) is defined in (3.4). In this example we know that 𝒜 has the proposed structure. We evaluate and compare the accuracy and the speed of our algorithms. The results for different n are presented in Table 1. The best result in each column is bolded.

             n = 10                        n = 25                        n = 50
             err              time         err              time         err              time
CP+antisym   0.8333           224 μs       0.8333           905.9 μs     0.8333           **3.983 ms**
antisymCP    **8.21×10⁻¹⁶**   **69.9 μs**  **1.34×10⁻¹⁵**   **502.5 μs** **1.66×10⁻¹⁵**   8.283 ms
CP-ALS       5.27×10⁻⁶        8.472 ms     1.998×10⁻⁷       26.282 ms    8.43×10⁻⁸        187.625 ms

Table 1. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

Even though the execution time of CP with a posteriori antisymmetrization is comparable to antisymmetry preserving CP, by first approximating with non-antisymmetric tensor of rank 1, CP with a posteriori antisymmetrization loses the underlying structure, and results in an approximation with large error. CP-ALS manages to find a good non-antisymmetric approximation, but it requires much more time, so disregarding the antisymmetric property did not help either with accuracy or with execution times. Overall, antisymmetry preserving CP achieves best results.

Here, same as in the following examples, the initial vectors in Algorithm 5.2 are taken as random vectors. If we initialize the algorithm using the HOSVD, the number of iterations decreases, but the execution time increases because of the additional time needed to perform the HOSVD.

7.2. Example 2

Now we construct an antisymmetric tensor element-wise as

\mathcal{A}(i,j,k)=\sin(x_{i})\sin(y_{j})\sin(z_{k})+\sin(y_{i})\sin(z_{j})\sin(x_{k})+\sin(z_{i})\sin(x_{j})\sin(y_{k})
\quad-\sin(y_{i})\sin(x_{j})\sin(z_{k})-\sin(x_{i})\sin(z_{j})\sin(y_{k})-\sin(z_{i})\sin(y_{j})\sin(x_{k}),

where xi, yj and zk are sets of n equidistant points on the intervals [0,1], [2,10] and [1,3], respectively. Tensors of this type appear in signal processing applications. The accuracy and speed of our algorithms for different n are presented in Table 2. The best result in each column is bolded.

             n = 10                        n = 25                        n = 50
             err              time         err              time         err              time
CP+antisym   0.8333           220.9 μs     0.8333           912.7 μs     0.8333           **3.95 ms**
antisymCP    **7.55×10⁻¹⁶**   **111.5 μs** **9.2×10⁻¹⁶**    **3.439 μs** **1.575×10⁻¹⁵**  26.898 ms
CP-ALS       4.02×10⁻⁷        7.659 ms     3.91×10⁻⁹        29.453 ms    8.492×10⁻⁷       86.045 ms

Table 2. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

Similarly as in Example 7.1, antisymmetry preserving CP beats the other two methods in terms of accuracy and the speed of reaching an accurate solution.
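For reference, the tensor of this example can be generated in a few lines. This NumPy sketch (ours; the experiments in the paper are implemented in Julia) builds the sine-based tensor, with every odd permutation of the three factors entering with a minus sign, and checks that it is antisymmetric in all modes.

```python
import numpy as np

n = 10
x = np.linspace(0, 1, n)    # n equidistant points on [0, 1]
y = np.linspace(2, 10, n)   # on [2, 10]
z = np.linspace(1, 3, n)    # on [1, 3]
a, b, c = np.sin(x), np.sin(y), np.sin(z)

def outer3(u, v, w):
    return np.einsum('i,j,k->ijk', u, v, w)

# Signed sum over all permutations of (a, b, c): even +, odd -.
A = (outer3(a, b, c) + outer3(b, c, a) + outer3(c, a, b)
     - outer3(b, a, c) - outer3(a, c, b) - outer3(c, b, a))

# A is antisymmetric: swapping any two modes flips the sign.
assert np.allclose(A, -np.transpose(A, (1, 0, 2)))
assert np.allclose(A, -np.transpose(A, (0, 2, 1)))
assert np.allclose(A, -np.transpose(A, (2, 1, 0)))
```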

In the next two examples we use tensors of smaller sizes, because the ranks of those tensors increase with the size and, since we are approximating by a rank-6 tensor, we want tensors for which this type of approximation makes sense.

7.3. Example 3

Now we generate an antisymmetric tensor that does not necessarily have the structure (3.4) by discretizing the function f(x,y,z) = exp(x²+2y²+3z²) on a grid ξi = (i−1)/(n−1), i = 1,…,n, and then applying the antisymmetrizer (3.2). We test for different values of n and show the results in Table 3. Again, the best result in each column is bolded.

             n = 3                         n = 5                         n = 7
             err              time         err              time         err              time
CP+antisym   0.8333           185 μs       0.8339           260.2 μs     0.8345           276.6 μs
antisymCP    **1.61×10⁻¹⁴**   **17.1 μs**  **0.0557**       **87.4 μs**  **0.0802**       **130.4 μs**
CP-ALS       2.55×10⁻⁵        9.875 ms     **0.0557**       26.282 ms    **0.0802**       4.538 ms

Table 3. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

Antisymmetry preserving CP achieves the best execution times. When the tensor can be well approximated by a CP approximation of the form (3.4), it also achieves the best accuracy (n=3). Otherwise, it results in the same error as CP-ALS but with much lower execution times (n=5,7).

7.4. Example 4

We generate a random tensor of size n×n×n and antisymmetrize it with (3.2). We compare the three algorithms and present the results in Table 4. The best result in each column is bolded.

             n = 3                         n = 5                         n = 7
             err              time         err              time         err              time
CP+antisym   0.8333           184.6 μs     0.8546           336.5 μs     0.9242           743.3 μs
antisymCP    **3.616×10⁻¹⁶**  **17 μs**    0.3432           **139.9 μs** 0.723            **493.2 μs**
CP-ALS       8.11×10⁻⁸        14.364 ms    **0.2716**       20.172 ms    **0.6393**       421.051 ms

Table 4. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

As in the previous example, when a tensor can be well approximated by a CP decomposition with six summands (here for n = 3), the antisymmetry-preserving CP achieves the best results. For n = 5, 7 it gives somewhat worse accuracy than CP-ALS, but returns the approximation in much shorter times, and, unlike CP-ALS, it preserves the antisymmetry. Note that the rank of a random antisymmetric tensor is much higher than six, which is why all approximations produce a high relative error.

7.5. Example 5. Partial antisymmetry

For the partial antisymmetry, we compare Algorithm 6.1 (CP with partial a posteriori antisymmetrization) and Algorithm 6.3 (CP preserving partial antisymmetry) with the standard CP-ALS (Algorithm 2.1) using r = 2, which ignores the structure.

Here, regardless of how we construct the tensor 𝒜, all methods give approximately the same error. Again, the CP preserving partial antisymmetry stands out in terms of execution times. We present the results in Table 5, with tensors 𝒜₁, 𝒜₂ and 𝒜₃ defined as follows:

  • 𝒜₁ is an 8×8×10 tensor constructed by randomly selecting vectors x, y, z of sizes 8, 8, 10, respectively, and setting 𝒜₁ = 2𝒞₂, where 𝒞₂ is defined in (6.2).

  • 𝒜₂ is a 5×5×7 tensor constructed from the function in the same way as in Example 3 (Section 7.3).

  • 𝒜₃ is a 5×5×4 tensor generated by partially antisymmetrizing a tensor with randomly selected elements, using the anti₁,₂ operator.
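The anti₁,₂ operator used for 𝒜₃ antisymmetrizes only the first two modes. A minimal NumPy sketch (illustrative; the paper's code is in Julia):

```python
import numpy as np

def anti_12(C):
    # project onto tensors antisymmetric in modes 1 and 2; mode 3 untouched
    return 0.5 * (C - np.transpose(C, (1, 0, 2)))

rng = np.random.default_rng(0)
A3 = anti_12(rng.standard_normal((5, 5, 4)))  # the shape used for A3 above

# antisymmetric in the first two modes, and anti_12 is a projection
assert np.allclose(A3, -np.transpose(A3, (1, 0, 2)))
assert np.allclose(anti_12(A3), A3)
```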

| method      | err (𝒜₁)    | time (𝒜₁)   | err (𝒜₂) | time (𝒜₂)    | err (𝒜₃) | time (𝒜₃)    |
|-------------|-------------|-------------|----------|--------------|----------|--------------|
| CP+pantisym | 1.88×10⁻¹⁶  | 202.7 μs    | 0.1001   | 269.4 μs     | 0.7175   | 569.2 μs     |
| pantisymCP  | 5.832×10⁻¹⁶ | **30.2 μs** | 0.1001   | **53.70 μs** | 0.7175   | **106.6 μs** |
| CP-ALS      | 6.774×10⁻¹⁶ | 1.051 ms    | 0.1001   | 1.282 ms     | 0.7175   | 3.026 ms     |

Table 5. Evaluation of the CP algorithm with partial a posteriori antisymmetrization (CP+pantisym, Algorithm 6.1), the CP preserving partial antisymmetry (pantisymCP, Algorithm 6.3), and CP-ALS with r = 2 (Algorithm 2.1), in terms of relative error ‖𝒜 − 𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

8. Conclusion

We described an antisymmetric tensor format 𝒜₆(x, y, z) determined by only three vectors, x, y, z ∈ ℝⁿ. For any n, tensors of the form 𝒜₆(x, y, z) have rank at most six. We developed an ALS algorithm for the structure-preserving low-rank approximation of an antisymmetric tensor 𝒜 by a tensor of the form 𝒜̃ = 𝒜₆(x, y, z). To derive the algorithm, we wrote the objective function as three explicitly given quadratic forms, one for each mode; in each microiteration, the algorithm solves the quadratic optimization problem for the corresponding tensor mode.
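Concretely, 𝒜₆(x, y, z) can be formed as the six signed outer products of x, y, z, i.e. six times the antisymmetrized rank-1 tensor x∘y∘z. A NumPy sketch under that reading (illustrative, not the paper's Julia code):

```python
import numpy as np
from itertools import permutations

def A6(x, y, z):
    # x o y o z - x o z o y - y o x o z + y o z o x + z o x o y - z o y o x:
    # an antisymmetric tensor of rank at most six, built from three vectors
    signs = (1, -1, -1, 1, 1, -1)  # parities of the six permutations
    return sum(s * np.einsum('i,j,k->ijk', a, b, c)
               for (a, b, c), s in zip(permutations((x, y, z)), signs))

x, y, z = np.random.default_rng(1).standard_normal((3, 6))
T = A6(x, y, z)

assert np.allclose(T, -np.transpose(T, (1, 0, 2)))   # antisymmetric
assert np.linalg.matrix_rank(T.reshape(6, -1)) <= 3  # mode-1 unfolding rank <= 3
```

The last assertion reflects that every mode-1 fiber of T lies in span{x, y, z}, so the multilinear rank of this format is at most (3, 3, 3).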

We showed that our minimization problem

‖𝒜 − 𝒜₆(x, y, z)‖ → min

over x, y, z ∈ ℝⁿ can be viewed as a minimization problem over orthonormal vectors x̃, ỹ, z̃ ∈ ℝⁿ,

‖𝒜 − 𝒜₆(x̃, ỹ, z̃)‖ → min.

Further on, we demonstrated that this minimization problem is equivalent to the maximization problem

‖𝒜 ×₁ Vᵀ ×₂ Vᵀ ×₃ Vᵀ‖ → max,

where V ∈ ℝ^{n×3} is a matrix with orthonormal columns. This maximization problem corresponds to the problem of the best multilinear low-rank approximation of antisymmetric tensors. Since the antisymmetric multilinear low-rank approximation is equivalent to the best unstructured rank-1 approximation, we were able to relate our algorithm to the higher-order power method (HOPM). Therefore, the global convergence results for HOPM apply here.
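For reference, HOPM is the ALS iteration for the best unstructured rank-1 approximation; a generic NumPy sketch (illustrative only, not the implementation the paper relates to):

```python
import numpy as np

def hopm(A, iters=100, seed=0):
    # higher-order power method: best rank-1 approximation lam * u o v o w
    # of an order-3 tensor A, by alternating contractions on unit vectors
    rng = np.random.default_rng(seed)
    u, v, w = (rng.standard_normal(m) for m in A.shape)
    for _ in range(iters):
        u = np.einsum('ijk,j,k->i', A, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', A, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', A, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', A, u, v, w)
    return lam, u, v, w

# sanity check: HOPM recovers an exact rank-1 tensor
rng = np.random.default_rng(2)
a, b, c = (t / np.linalg.norm(t) for t in rng.standard_normal((3, 5)))
T = 3.0 * np.einsum('i,j,k->ijk', a, b, c)
lam, u, v, w = hopm(T)
approx = lam * np.einsum('i,j,k->ijk', u, v, w)
assert np.linalg.norm(T - approx) / np.linalg.norm(T) < 1e-8
```

A random start is used on purpose: for an antisymmetric tensor, identical initial vectors in two modes would contract to zero.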

For tensors with partial antisymmetry we established a partially antisymmetric tensor format 𝒞₂(x, y, z) determined by three vectors, x, y ∈ ℝⁿ and z ∈ ℝᵐ. Tensors of the form 𝒞₂(x, y, z) have rank two. We created a similar ALS algorithm for the structure-preserving rank-2 approximation of a partially antisymmetric tensor 𝒞 by a tensor of the form 𝒞̃ = 𝒞₂(x, y, z). Analogously to the fully antisymmetric case, we verified that this algorithm is equivalent to HOPM.
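Under a natural two-term reading of (6.2) (an assumption here, since (6.2) itself is outside this excerpt), 𝒞₂(x, y, z) = x∘y∘z − y∘x∘z. A NumPy sketch:

```python
import numpy as np

def C2(x, y, z):
    # assumed rank-2 partially antisymmetric format: x o y o z - y o x o z;
    # each mode-3 slice C[:, :, k] equals z_k * (x y^T - y x^T)
    return (np.einsum('i,j,k->ijk', x, y, z)
            - np.einsum('i,j,k->ijk', y, x, z))

rng = np.random.default_rng(3)
x, y = rng.standard_normal((2, 5))
z = rng.standard_normal(7)
C = C2(x, y, z)

assert np.allclose(C, -np.transpose(C, (1, 0, 2)))  # antisymmetric in modes 1, 2
```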

The method described in this paper can be generalized to the approximation problem for other antisymmetric structures. As long as the target format can be written as a sum of multilinear terms, the linearity in each mode again leads to quadratic optimization problems that can be handled in the same way, only with different coefficient matrices and vectors. For example, instead of the antisymmetric rank-6 approximation, one could in this way find an antisymmetric rank-6r approximation represented by 3r vectors. The paper limited its scope to order-3 tensors; for antisymmetric order-d tensors, the analogous rank-d!r approximation would be represented by dr vectors.

Acknowledgments

The authors would like to thank the anonymous referees, whose insightful remarks improved the paper, in particular the comments regarding the orthogonality of the resulting vectors and the equivalence to HOPM.

References

  • [1] E. Acar, D. M. Dunlavy, and T. G. Kolda, A scalable optimization approach for fitting canonical tensor decompositions, Journal of Chemometrics, 25 (2011), pp. 67–86.
  • [2] E. Acar and B. Yener, Unsupervised multiway data analysis: A literature survey, IEEE Trans. Knowledge Data Engrg., 21 (2009), pp. 6–20.
  • [3] E. Begović Kovač and D. Kressner, Structure-preserving low multilinear rank approximation of antisymmetric tensors, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 967–983.
  • [4] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: a fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65–98.
  • [5] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition, Psychometrika, 35 (1970), pp. 283–319.
  • [6] J. Chen and J. Revels, Robust benchmarking in noisy environments, ArXiv, 1608.04295 (2016).
  • [7] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, Symmetric tensors and symmetric tensor rank, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1254–1279.
  • [8] P. Comon, X. Luciani, and A. L. F. de Almeida, Tensor decompositions, alternating least squares and other tales, Journal of Chemometrics, 23 (2009), pp. 393–405.
  • [9] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
  • [10] L. De Lathauwer, B. De Moor, and J. Vandewalle, Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition, SIAM J. Matrix Anal. Appl., 26 (2004/05), pp. 295–327.
  • [11] M. Espig, W. Hackbusch, and A. Khachatryan, On the convergence of alternating least squares optimisation in tensor format representations, ArXiv, 1503.05431 (2015).
  • [12] H. Goldstein, C. P. Poole, and J. L. Safko, Classical Mechanics, Pearson, 3rd ed., 2001.
  • [13] L. Grasedyck, M. Kluge, and S. Krämer, Variants of alternating least squares tensor completion in the tensor train format, SIAM J. Sci. Comput., 37 (2015), pp. A2424–A2450.
  • [14] W. H. Greub, Multilinear algebra, vol. Band 136 of Die Grundlehren der mathematischen Wissenschaften, Springer-Verlag New York, Inc., New York, 1967.
  • [15] W. Hackbusch, On the representation of symmetric and antisymmetric tensors, in Contemporary computational mathematics—a celebration of the 80th birthday of Ian Sloan. Vol. 1, 2, Springer, Cham, 2018, pp. 483–515.
  • [16] R. A. Harshman, Foundations of the parafac procedure: Models and conditions for an “explanatory” multi-modal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
  • [17] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys., 6 (1927), pp. 164–189.
  • [18] F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, J. Math. Phys., 7 (1927), pp. 39–79.
  • [19] D. Hong, T. G. Kolda, and J. A. Duersch, Generalized canonical polyadic tensor decomposition, SIAM Rev., 62 (2020), pp. 133–163.
  • [20] F. Hummel, T. Tsatsoulis, and A. Grüneis, Low rank factorization of the Coulomb integrals for periodic coupled cluster theory, The Journal of Chemical Physics, 146 (2017), p. 124105.
  • [21] T. G. Kolda, Numerical optimization for symmetric tensor decomposition, Math. Program., 151 (2015), pp. 225–248.
  • [22] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
  • [23] J. M. Landsberg, Tensors: geometry and applications, vol. 128 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 2012.
  • [24] R. Minster, I. Viviano, X. Liu, and G. Ballard, CP decomposition for tensors via alternating least squares with QR decomposition, Numer. Linear Algebra Appl., 30 (2023), Paper No. e2511.
  • [25] L. Periša, Julia tensor toolbox, (2019).
  • [26] K. F. Riley, M. P. Hobson, and S. J. Bence, Mathematical Methods for Physics and Engineering, Cambridge University Press, 3rd ed., 2006.
  • [27] S. Szalay, M. Pfeffer, V. Murg, G. Barcza, F. Verstraete, R. Schneider, and O. Legeza, Tensor product methods and entanglement optimization for ab initio quantum chemistry, International Journal of Quantum Chemistry, 115 (2015), pp. 1342–1391.
  • [28] P.-H. Tsai, P. Fischer, and E. Solomonik, Accelerating the galerkin reduced-order model with the tensor decomposition for turbulent flows, ArXiv, 2311.03694 (2023).
  • [29] A. Uschmajew, Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 639–652.
  • [30] A. Uschmajew, A new convergence proof for the higher-order power method and generalizations, Pac. J. Optim., 11 (2015), pp. 309–321.