
CP decomposition and low-rank approximation of antisymmetric tensors

Erna Begović Kovač  and  Lana Periša
Abstract.

For antisymmetric tensors, the paper examines a low-rank approximation that is represented via only three vectors. We describe a suitable low-rank format and propose an alternating least squares structure-preserving algorithm for finding such an approximation. Moreover, we show that this approximation problem is equivalent to the problem of finding the best multilinear low-rank antisymmetric approximation and, consequently, to the problem of finding the best unstructured rank-1 approximation. The case of partial antisymmetry is also discussed. The algorithms are implemented in the Julia programming language and their numerical performance is discussed.

Key words and phrases:
CP decomposition, antisymmetric tensors, low-rank approximation, structure-preserving algorithm, Julia
Mathematics Subject Classification:
15A69
Erna Begović Kovač, University of Zagreb Faculty of Chemical Engineering and Technology, Marulićev trg 19, 10000 Zagreb, Croatia, [email protected]
Lana Periša, Visage Technologies, Ivana Lučića 2a, 10000 Zagreb, Croatia, [email protected]
This work has been supported in part by Croatian Science Foundation under the project UIP-2019-04-5200.

1. Introduction

Tensor decompositions have been extensively studied in recent years [2, 9, 10, 19, 22]. However, the research has mostly focused on either unstructured or symmetric [7, 21] tensors. In this paper we explore antisymmetric tensors, their CP decomposition, and algorithms for their low-rank approximation.

The idea of the CP decomposition is to write a tensor as a sum of rank-one components. It was first introduced by Hitchcock [17, 18] in 1927, but it only became popular in the 1970s as CANDECOMP (canonical decomposition) [5] and PARAFAC (parallel factors) [16]. This decomposition is closely related to the tensor rank $R$, which is defined as the minimal number of rank-1 summands in the exact CP decomposition. Contrary to the matrix case, the rank of a tensor can exceed its dimension, and it can differ over $\mathbb{R}$ and over $\mathbb{C}$. It is known that the problem of finding the rank of a given tensor is NP-hard.

When computing a CP approximation, the main question is the choice of the number of rank-one components. Given the antisymmetric structure of the tensors in question, we impose an additional constraint on the CP decomposition. This constraint ensures that the resulting tensor is indeed antisymmetric, and it gives a bound on the minimal number of rank-one components.

We focus on tensors of order three. For a given antisymmetric tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ our goal is to find a low-rank antisymmetric approximation represented via only three vectors. In particular, we are looking for an approximation $\widetilde{\mathcal{A}}$ of $\mathcal{A}$ such that $\operatorname{rank}(\widetilde{\mathcal{A}})\leq 6$ for any $n$, and

\widetilde{\mathcal{A}}=\frac{1}{6}(x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x),

where $x,y,z\in\mathbb{R}^{n}$. We propose an alternating least squares structure-preserving algorithm for solving this problem. The algorithm is based on solving a minimization problem in each tensor mode. We compare our algorithm with a “naive” approach that uses a posteriori antisymmetrization. Further on, we show that our approximation problem is equivalent to the problem of the best multilinear rank-3 structure-preserving antisymmetric tensor approximation from [3] and, consequently, to the problem of the best unstructured rank-1 approximation. This establishes the equivalence between our algorithm and the higher-order power method (HOPM). Therefore, the corresponding convergence result for HOPM from [30] can be applied.

Additionally, we study tensors with partial antisymmetry, that is, antisymmetry in only two modes. Similarly to what we do for tensors that are antisymmetric in all modes, we first determine a suitable format of the CP decomposition, which turns out to be simpler in the partially antisymmetric case. Based on this format, for a given tensor $\mathcal{C}\in\mathbb{R}^{n\times n\times m}$ antisymmetric in two modes, we are looking for an approximation $\widetilde{\mathcal{C}}$ of the same structure such that $\widetilde{\mathcal{C}}$ is represented by three vectors and $\operatorname{rank}(\widetilde{\mathcal{C}})=2$.

In Section 2 we introduce the notation and preliminaries. Our problem of antisymmetric tensor approximation is described in Section 3. In Section 4 we describe the approach with a posteriori antisymmetrization, while in Section 5 we propose the algorithm for solving the minimization problem from Section 3. Section 6 deals with the case of partial antisymmetry. In Section 7 we discuss our numerical results obtained in the Julia programming language, and the conclusion is given in Section 8.

2. Notation and preliminaries

Throughout the paper we denote tensors by calligraphic letters, e.g., $\mathcal{A}$. We refer to the tensor dimension as its order. For $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ we say that $\mathcal{A}$ is a tensor of order $d$. A tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is cubical if $n_{1}=n_{2}=\cdots=n_{d}$. Vectors obtained from a tensor by fixing all indices but the $m$th one are called mode-$m$ fibers. Fibers of an order-3 tensor are columns (mode-1 fibers), rows (mode-2 fibers), and tubes (mode-3 fibers). Matrices obtained from a tensor by fixing all indices but two are called slices. The matrix representation of a tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is called the mode-$m$ matricization and is denoted by $A_{(m)}$. It is obtained by arranging the mode-$m$ fibers of $\mathcal{A}$ as columns of $A_{(m)}$. The mode-$m$ product of a tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ with a matrix $M\in\mathbb{R}^{p\times n_{m}}$ is a tensor $\mathcal{B}\in\mathbb{R}^{n_{1}\times\cdots\times n_{m-1}\times p\times n_{m+1}\times\cdots\times n_{d}}$,

\mathcal{B}=\mathcal{A}\times_{m}M,\quad\text{such that}\quad B_{(m)}=MA_{(m)}.

The tensor norm is a generalization of the matrix Frobenius norm. For $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ we have

\|\mathcal{A}\|=\sqrt{\sum_{i_{1}=1}^{n_{1}}\sum_{i_{2}=1}^{n_{2}}\cdots\sum_{i_{d}=1}^{n_{d}}a_{i_{1}i_{2}\ldots i_{d}}^{2}}.

The inner product of two tensors $\mathcal{A},\mathcal{B}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is given by

\langle\mathcal{A},\mathcal{B}\rangle=\sum_{i_{1}=1}^{n_{1}}\sum_{i_{2}=1}^{n_{2}}\cdots\sum_{i_{d}=1}^{n_{d}}a_{i_{1}i_{2}\ldots i_{d}}b_{i_{1}i_{2}\ldots i_{d}}.

The vector outer product is denoted by $\circ$. A tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times\cdots\times n_{d}}$ is a rank-1 tensor if it can be written as the outer product of $d$ vectors,

\mathcal{A}=v^{(1)}\circ v^{(2)}\circ\cdots\circ v^{(d)}.

Then

a_{i_{1}i_{2}\cdots i_{d}}=v^{(1)}_{i_{1}}v^{(2)}_{i_{2}}\cdots v^{(d)}_{i_{d}},\quad 1\leq i_{k}\leq n_{k},\ 1\leq k\leq d.

The Khatri-Rao product of two matrices $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{p\times n}$ is defined as

A\odot B=\begin{bmatrix}a_{1}\otimes b_{1}&a_{2}\otimes b_{2}&\cdots&a_{n}\otimes b_{n}\end{bmatrix}\in\mathbb{R}^{(mp)\times n},

where $a_{k}$ and $b_{k}$ denote the $k$th columns of $A$ and $B$, respectively. The Hadamard (element-wise) product of two matrices $A,B\in\mathbb{R}^{m\times n}$ is defined as

A\ast B=\begin{bmatrix}a_{11}b_{11}&a_{12}b_{12}&\cdots&a_{1n}b_{1n}\\ a_{21}b_{21}&a_{22}b_{22}&\cdots&a_{2n}b_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ a_{m1}b_{m1}&a_{m2}b_{m2}&\cdots&a_{mn}b_{mn}\end{bmatrix}\in\mathbb{R}^{m\times n}.

The Moore-Penrose inverse of $A$ is denoted by $A^{+}$.
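The notions above are easy to make concrete in code. The paper's experiments are in Julia; the following NumPy sketch (all helper names are our own) implements the mode-$m$ matricization, the mode-$m$ product, and the Khatri-Rao product, and checks the defining identity $B_{(m)}=MA_{(m)}$; the Hadamard product is plain element-wise `*` on arrays of equal shape.

```python
import numpy as np

def matricize(T, mode):
    # Mode-m matricization: mode-`mode` fibers become columns (0-based mode,
    # remaining indices flattened in Fortran order).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def mode_product(T, M, mode):
    # Mode-m product B = T x_m M, defined by B_(m) = M @ T_(m).
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def khatri_rao(A, B):
    # Column-wise Kronecker (Khatri-Rao) product of A (m x n) and B (p x n).
    m, n = A.shape
    p, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(m * p, n)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 5))
M = rng.standard_normal((6, 4))
B = mode_product(T, M, 1)
# Defining identity of the mode product:
identity_err = np.linalg.norm(matricize(B, 1) - M @ matricize(T, 1))
```

The same helpers are reused below when sketching the algorithms of the paper.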

For a tensor $\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$, its CP approximation takes the form

\mathcal{A}\approx\sum_{i=1}^{r}(x_{i}\circ y_{i}\circ z_{i}), (2.1)

where $x_{i}\in\mathbb{R}^{n_{1}}$, $y_{i}\in\mathbb{R}^{n_{2}}$, $z_{i}\in\mathbb{R}^{n_{3}}$. If we arrange the vectors $x_{i},y_{i},z_{i}$, $i=1,\ldots,r$, into matrices

X=\begin{bmatrix}x_{1}&x_{2}&\cdots&x_{r}\end{bmatrix},\quad Y=\begin{bmatrix}y_{1}&y_{2}&\cdots&y_{r}\end{bmatrix},\quad Z=\begin{bmatrix}z_{1}&z_{2}&\cdots&z_{r}\end{bmatrix},

relation (2.1) can be written as

\mathcal{A}\approx[[X,Y,Z]]=\sum_{i=1}^{r}(x_{i}\circ y_{i}\circ z_{i}). (2.2)

The smallest number $r$ in the exact CP decomposition (2.2) is called the tensor rank. We write $\operatorname{rank}(\mathcal{A})=r$.

The most commonly used algorithm for computing the CP approximation is the alternating least squares (ALS) algorithm (see, e.g., [22]). In Algorithm 2.1 we give the CP-ALS algorithm for order-3 tensors.

Algorithm 2.1.

  CP-ALS  

Input: $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$, $r\in\mathbb{N}$
Output: $X,Y,Z\in\mathbb{R}^{n\times r}$
Initialize $X,Y,Z$ as the leading $r$ left singular vectors of $A_{(i)}$, $i=1,2,3$, respectively.
repeat
     $X=A_{(1)}(Z\odot Y)(Y^{T}Y\ast Z^{T}Z)^{+}$
     $Y=A_{(2)}(Z\odot X)(X^{T}X\ast Z^{T}Z)^{+}$
     $Z=A_{(3)}(Y\odot X)(X^{T}X\ast Y^{T}Y)^{+}$
until convergence or maximum number of iterations

 

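Algorithm 2.1 can be sketched in a few lines. This is a minimal NumPy illustration (the paper's implementation is in Julia; the function names and the fixed iteration count are ours), following the updates above literally under the matricization convention for which $A_{(1)}=X(Z\odot Y)^{T}$.

```python
import numpy as np

def matricize(T, mode):
    # Mode-m matricization (columns ordered so that A_(1) = X (Z ⊙ Y)^T).
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def khatri_rao(A, B):
    m, n = A.shape
    p, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(m * p, n)

def cp_als(T, r, iters=50):
    # Initialize X, Y, Z with the leading r left singular vectors of A_(i).
    X, Y, Z = (np.linalg.svd(matricize(T, m))[0][:, :r] for m in range(3))
    for _ in range(iters):
        X = matricize(T, 0) @ khatri_rao(Z, Y) @ np.linalg.pinv((Y.T @ Y) * (Z.T @ Z))
        Y = matricize(T, 1) @ khatri_rao(Z, X) @ np.linalg.pinv((X.T @ X) * (Z.T @ Z))
        Z = matricize(T, 2) @ khatri_rao(Y, X) @ np.linalg.pinv((X.T @ X) * (Y.T @ Y))
    return X, Y, Z

# Sanity check on an exactly rank-1 tensor: CP-ALS recovers it.
rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 4))
T = np.einsum("i,j,k->ijk", x, y, z)
X, Y, Z = cp_als(T, 1)
T_hat = np.einsum("ir,jr,kr->ijk", X, Y, Z)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

Note that the Hadamard products $Y^{T}Y\ast Z^{T}Z$ are small $r\times r$ matrices, so the pseudoinverse in each update is cheap.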

3. Problem description

A cubical tensor is symmetric (sometimes also called supersymmetric) if its elements are invariant under any permutation of indices. On the contrary, a cubical tensor is antisymmetric if its elements change sign when a pair of indices is permuted. In particular, an order-3 tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ is antisymmetric if

a_{ijk}=a_{jki}=a_{kij}=-a_{ikj}=-a_{jik}=-a_{kji},\quad 1\leq i,j,k\leq n. (3.1)

Such tensors are also called alternating 3-tensors, $\Lambda^{3}(\mathbb{R}^{n})$ [23], or 3-vectors [14]. Antisymmetric tensors appear in applications such as quantum chemistry [27] and electromagnetism [26]. Besides, they are interesting from the mathematical point of view [3, 15]. From the definition of an antisymmetric tensor $\mathcal{A}$ it obviously follows that:

  • (i) In all modes, all slices of $\mathcal{A}$ are antisymmetric matrices.

  • (ii) In all modes, all slices have one null column and one null row.

  • (iii) Antisymmetric tensors are data-sparse in the sense that many of their non-zero elements are equal, up to the sign.

These facts are useful when it comes to the implementation of specific algorithms.

We define the antisymmetrizer “anti” as the orthogonal projection of a general cubical order-$d$ tensor $\mathcal{B}$ onto the subspace of antisymmetric tensors. Then $\mathcal{A}=\text{anti}(\mathcal{B})$ is an order-$d$ tensor given by

\mathcal{A}(i_{1},i_{2},\ldots,i_{d})\coloneqq\frac{1}{d!}\sum_{p\in\pi(d)}\text{sign}(p)\,\mathcal{B}(p(i_{1}),p(i_{2}),\ldots,p(i_{d})),

where $\pi(d)$ denotes the set of all permutations of length $d$. Hence, for $d=3$, $\mathcal{B}\in\mathbb{R}^{n\times n\times n}$, and $\mathcal{A}=\text{anti}(\mathcal{B})$ we have

a_{ijk}=\frac{1}{6}(b_{ijk}+b_{jki}+b_{kij}-b_{ikj}-b_{jik}-b_{kji}). (3.2)
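The projection can be implemented by summing signed axis permutations of the array. A NumPy sketch for a general cubical order-$d$ tensor (helper names are ours; for $d=3$ this reproduces (3.2) exactly):

```python
import math
import numpy as np
from itertools import permutations

def perm_sign(p):
    # Sign of a permutation, computed by counting inversions.
    p = list(p)
    inversions = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inversions % 2 else 1

def anti(B):
    # Orthogonal projection of a cubical order-d tensor onto the subspace of
    # antisymmetric tensors: (1/d!) * sum of signed index permutations.
    d = B.ndim
    return sum(perm_sign(p) * np.transpose(B, p) for p in permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4, 4))
A = anti(B)
```

Since anti is a projection, applying it twice changes nothing, which gives a convenient correctness check.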

Let $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ be an antisymmetric tensor of order three. Take a triplet of indices $(i,j,k)$, $1\leq i<j<k\leq n$. It follows from (3.1) that the subtensor $\hat{\mathcal{A}}$ of $\mathcal{A}$ obtained at the intersection of the $i$th, $j$th, and $k$th columns, rows, and tubes is of the form

\hat{\mathcal{A}}=\alpha\mathcal{E},

where $\alpha\in\mathbb{R}$ and $\mathcal{E}$ is a $3\times 3\times 3$ tensor such that

\mathcal{E}(i_{1},i_{2},i_{3})=\begin{cases}1,&\text{if the indices form an even permutation of }(1,2,3),\\ -1,&\text{if the indices form an odd permutation of }(1,2,3),\\ 0,&\text{if two or more indices are equal.}\end{cases}

The tensor $\mathcal{E}$ is called the Levi-Civita tensor [12]. We can also write $\mathcal{E}$ using its mode-1 matricization

E_{(1)}=\left[\begin{array}{ccc|ccc|ccc}0&0&0&0&0&-1&0&1&0\\ 0&0&1&0&0&0&-1&0&0\\ 0&-1&0&1&0&0&0&0&0\end{array}\right]. (3.3)

Obviously, $\mathcal{E}$ is the simplest possible non-zero antisymmetric order-3 tensor.

For three given vectors $x,y,z\in\mathbb{R}^{n}$ we define an $n\times n\times n$ antisymmetric tensor associated with these vectors as

\mathcal{A}_{6}(x,y,z)\coloneqq\frac{1}{6}(x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x). (3.4)

Note that the tensor $\mathcal{E}$ is a special case of the antisymmetric tensor $\mathcal{A}_{6}(x,y,z)$: for $x=[6,0,0]^{T}$, $y=[0,1,0]^{T}$, $z=[0,0,1]^{T}$, we get $\mathcal{A}_{6}(x,y,z)=\mathcal{E}$. Moreover, for a rank-1 tensor $\mathcal{T}=[[x,y,z]]$, we have $\mathcal{A}_{6}(x,y,z)=\text{anti}(\mathcal{T})$. The tensor format (3.4) can be favourable because it represents an antisymmetric tensor via only three vectors, that is, $3n$ entries. On the other hand, the standard form of an $n\times n\times n$ antisymmetric tensor contains $\binom{n}{3}$ different entries. Besides, $\mathcal{A}_{6}(x,y,z)$ is a low-rank tensor: for any size $n$, we have $\operatorname{rank}(\mathcal{A}_{6}(x,y,z))\leq 6$.
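These claims are easy to check numerically. The sketch below (the `A6` helper and the checks are ours) builds $\mathcal{A}_{6}(x,y,z)$ from (3.4), recovers $\mathcal{E}$ from the stated special case, and confirms the antisymmetry of the result.

```python
import numpy as np

def outer3(a, b, c):
    return np.einsum("i,j,k->ijk", a, b, c)

def A6(x, y, z):
    # The antisymmetric tensor (3.4) associated with x, y, z.
    return (outer3(x, y, z) + outer3(y, z, x) + outer3(z, x, y)
            - outer3(x, z, y) - outer3(y, x, z) - outer3(z, y, x)) / 6.0

# Special case: x = 6*e1, y = e2, z = e3 gives the Levi-Civita tensor E.
e1, e2, e3 = np.eye(3)
E = A6(6 * e1, e2, e3)

# A generic instance for arbitrary vectors:
rng = np.random.default_rng(3)
x, y, z = rng.standard_normal((3, 5))
A = A6(x, y, z)
```

The storage comparison in the text is visible here as well: `x`, `y`, `z` hold $3n$ numbers, while `A` holds $n^{3}$ entries.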

Our goal is to approximate a given antisymmetric tensor $\mathcal{A}$ by a low-rank antisymmetric tensor of the form (3.4). We demonstrate two approaches. The “naive” one is given in Section 4. Then, in Section 5 we formulate this problem as a minimization problem. For a given non-zero antisymmetric tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$, we are looking for a tensor $\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$, i.e., vectors $x,y,z\in\mathbb{R}^{n}$, such that

\|\mathcal{A}-\widetilde{\mathcal{A}}\|^{2}\to\min.

4. CP-ALS with a posteriori antisymmetrization

First we describe the naive approach. The process consists of two steps. Step 1: Using the CP-ALS Algorithm 2.1, which ignores the tensor structure, we find a rank-1 approximation $\bar{\mathcal{A}}$ of $\mathcal{A}$,

\bar{\mathcal{A}}=[[x,y,z]],\quad\operatorname{rank}(\bar{\mathcal{A}})=1.

Step 2: We apply the antisymmetrizer (3.2) to $\bar{\mathcal{A}}$ to obtain $\widetilde{\mathcal{A}}$ in the form (3.4),

\widetilde{\mathcal{A}}=\text{anti}(\bar{\mathcal{A}}),

that is,

\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z).

This procedure is given in Algorithm 4.1. We do not need to form the tensor $\bar{\mathcal{A}}$ explicitly.

Algorithm 4.1.

  CP with a posteriori antisymmetrization  

Input: $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ antisymmetric
Output: $\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$
Apply Algorithm 2.1 to $\mathcal{A}$ with $r=1$ to obtain $x,y,z\in\mathbb{R}^{n}$
$\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$

 


Obviously, using a rank-1 intermediate tensor produces an unnecessarily large approximation error. However, it can easily be shown that if the error of the rank-1 approximation is bounded by some $\epsilon>0$, the resulting error is also bounded by $\epsilon$.
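The bound follows because “anti” is an orthogonal projection and $\mathcal{A}$ is already antisymmetric, so $\|\mathcal{A}-\text{anti}(\bar{\mathcal{A}})\|=\|\text{anti}(\mathcal{A}-\bar{\mathcal{A}})\|\leq\|\mathcal{A}-\bar{\mathcal{A}}\|$. A quick numerical illustration (helper names are ours; any rank-1 tensor serves as the intermediate $\bar{\mathcal{A}}$):

```python
import math
import numpy as np
from itertools import permutations

def perm_sign(p):
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def anti(B):
    # Antisymmetrizer (3.2) via signed axis permutations.
    d = B.ndim
    return sum(perm_sign(p) * np.transpose(B, p) for p in permutations(range(d))) / math.factorial(d)

rng = np.random.default_rng(4)
n = 6
A = anti(rng.standard_normal((n, n, n)))     # a random antisymmetric tensor
x, y, z = rng.standard_normal((3, n))
A_bar = np.einsum("i,j,k->ijk", x, y, z)     # a rank-1 intermediate tensor
eps = np.linalg.norm(A - A_bar)              # error before antisymmetrization
err = np.linalg.norm(A - anti(A_bar))        # error after antisymmetrization
```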

5. Antisymmetry-preserving CP algorithm

For a given antisymmetric tensor $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ we are looking for vectors $x,y,z\in\mathbb{R}^{n}$ such that

\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|^{2}\to\min. (5.1)

In contrast to Algorithm 2.1, here we develop a new structure-preserving low-rank approximation algorithm. Our algorithm uses the ALS approach, that is, we solve an optimization problem in each mode. It results in a tensor of the form (3.4), so there is no need to apply the antisymmetrizer. ALS algorithms are widely used to address different multilinear minimization problems [8, 29, 11, 13], including those regarding the CP approximation [1, 24, 20]. There is also a very recent extension to the antisymmetric case [28], but both the problem and the algorithm are different from ours.

Set

a=\begin{bmatrix}x\\ y\\ z\end{bmatrix}\in\mathbb{R}^{3n}.

Then, similarly to what was done in [1], we define the objective function $f\colon\mathbb{R}^{3n}\to\mathbb{R}$,

f(a)=6\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|^{2}. (5.2)

We consider three partial minimization problems:

\min_{x}f(a),\quad\min_{y}f(a),\quad\min_{z}f(a). (5.3)

Before we formulate the algorithm, we prove Theorem 5.1. It gives three reformulations of the objective function $f$ that we use to find the solutions of the problems (5.3).

Observe that, since $\mathcal{A}_{6}(x,y,z)$ is linear in each of $x$, $y$, and $z$, the objective function is quadratic in each of $x$, $y$, and $z$, and the approximation problem becomes a quadratic optimization problem. Here we derive the quadratic forms explicitly. It is worth mentioning that the underlying linearity opens the possibility of extensions to more general settings.

In order to simplify the statement of the theorem, we define the following objects: matrices $Q^{(1)}=Q^{(1)}(y,z)$, $Q^{(2)}=Q^{(2)}(x,z)$, $Q^{(3)}=Q^{(3)}(x,y)\in\mathbb{R}^{n\times n}$,

Q^{(1)} = 2\left(\left(\|y\|_{2}^{2}\|z\|_{2}^{2}-\langle y,z\rangle^{2}\right)I_{n}+(yz^{T}-zy^{T})^{2}\right), (5.4)
Q^{(2)} = 2\left(\left(\|z\|_{2}^{2}\|x\|_{2}^{2}-\langle z,x\rangle^{2}\right)I_{n}+(zx^{T}-xz^{T})^{2}\right), (5.5)
Q^{(3)} = 2\left(\left(\|x\|_{2}^{2}\|y\|_{2}^{2}-\langle x,y\rangle^{2}\right)I_{n}+(xy^{T}-yx^{T})^{2}\right); (5.6)

vectors $c^{(1)}=c^{(1)}(y,z)$, $c^{(2)}=c^{(2)}(x,z)$, $c^{(3)}=c^{(3)}(x,y)\in\mathbb{R}^{n}$,

c^{(1)} = -12\,\mathcal{A}\times_{2}y^{T}\times_{3}z^{T}, (5.7)
c^{(2)} = -12\,\mathcal{A}\times_{2}z^{T}\times_{3}x^{T}, (5.8)
c^{(3)} = -12\,\mathcal{A}\times_{2}x^{T}\times_{3}y^{T}; (5.9)

and a real number

d=6\|\mathcal{A}\|^{2}. (5.10)
Theorem 5.1.

The function $f$ defined by (5.2) can be written as

f(a) = d+(c^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x (5.11)
= d+(c^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y (5.12)
= d+(c^{(3)})^{T}z+\frac{1}{2}z^{T}Q^{(3)}z, (5.13)

for $Q^{(1)},Q^{(2)},Q^{(3)}\in\mathbb{R}^{n\times n}$, $c^{(1)},c^{(2)},c^{(3)}\in\mathbb{R}^{n}$, and $d\in\mathbb{R}$ defined by the relations (5.4)–(5.10).

Proof.

First, we can write the function $f$ from (5.2) as

f(a) = 6\|\mathcal{A}\|^{2}-2\langle\mathcal{A},6\mathcal{A}_{6}(x,y,z)\rangle+\frac{1}{6}\|6\mathcal{A}_{6}(x,y,z)\|^{2}
= 6f_{1}(a)-2f_{2}(a)+\frac{1}{6}f_{3}(a), (5.14)

where

f_{1}(a) = \|\mathcal{A}\|^{2},
f_{2}(a) = \langle\mathcal{A},\,x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x\rangle, (5.15)
f_{3}(a) = \|x\circ y\circ z+y\circ z\circ x+z\circ x\circ y-x\circ z\circ y-y\circ x\circ z-z\circ y\circ x\|^{2}. (5.16)

For the function f2f_{2} we have

f_{2}(a) = \sum_{i,j,k=1}^{n}a_{ijk}(x_{i}y_{j}z_{k}+y_{i}z_{j}x_{k}+z_{i}x_{j}y_{k}-x_{i}z_{j}y_{k}-y_{i}x_{j}z_{k}-z_{i}y_{j}x_{k})
= \sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k}+\sum_{k=1}^{n}x_{k}\sum_{i,j=1}^{n}a_{ijk}y_{i}z_{j}+\sum_{j=1}^{n}x_{j}\sum_{i,k=1}^{n}a_{ijk}z_{i}y_{k}
\quad+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{ijk})z_{j}y_{k}+\sum_{k=1}^{n}x_{k}\sum_{i,j=1}^{n}(-a_{ijk})z_{i}y_{j}+\sum_{j=1}^{n}x_{j}\sum_{i,k=1}^{n}(-a_{ijk})y_{i}z_{k}.

We rename the indices in the above expression and use the fact that $\mathcal{A}$ is antisymmetric. We get

f_{2}(a) = \sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{jki}y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{kij}y_{j}z_{k}
\quad+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{ikj})y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{kji})y_{j}z_{k}+\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}(-a_{jik})y_{j}z_{k}
= 6\sum_{i=1}^{n}x_{i}\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k}. (5.17)

Further on, we write the function $f_{3}$ as

f_{3}(a)=\sum_{i,j,k=1}^{n}(x_{i}y_{j}z_{k}+y_{i}z_{j}x_{k}+z_{i}x_{j}y_{k}-x_{i}z_{j}y_{k}-y_{i}x_{j}z_{k}-z_{i}y_{j}x_{k})^{2}. (5.18)

After regrouping the summands and renaming the indices, as we did for $f_{2}$, it follows from (5.18) that

f_{3}(a) = 6\sum_{i=1}^{n}x_{i}^{2}\left(\sum_{j,k=1}^{n}y_{j}^{2}z_{k}^{2}\right)-6\sum_{i=1}^{n}x_{i}^{2}\left(\sum_{j,k=1}^{n}y_{j}y_{k}z_{j}z_{k}\right)
\quad+12\sum_{i,j=1}^{n}x_{i}x_{j}\left(\sum_{k=1}^{n}y_{i}y_{k}z_{j}z_{k}\right)-6\sum_{i,j=1}^{n}x_{i}x_{j}\left(\sum_{k=1}^{n}y_{i}y_{j}z_{k}^{2}\right)-6\sum_{i,j=1}^{n}x_{i}x_{j}\left(\sum_{k=1}^{n}y_{k}^{2}z_{i}z_{j}\right)
= \sum_{i=1}^{n}x_{i}^{2}\left(6\sum_{j=1}^{n}y_{j}^{2}\sum_{k=1}^{n}z_{k}^{2}-6\Big(\sum_{j=1}^{n}y_{j}z_{j}\Big)^{2}\right)+\sum_{i,j=1}^{n}x_{i}x_{j}\left(12\sum_{k=1}^{n}y_{i}y_{k}z_{j}z_{k}-6\sum_{k=1}^{n}y_{i}y_{j}z_{k}^{2}-6\sum_{k=1}^{n}y_{k}^{2}z_{i}z_{j}\right)
= \sum_{i=1}^{n}x_{i}^{2}\left(6\sum_{j=1}^{n}y_{j}^{2}\sum_{k=1}^{n}z_{k}^{2}-6\Big(\sum_{j=1}^{n}y_{j}z_{j}\Big)^{2}+12y_{i}z_{i}\sum_{k=1}^{n}y_{k}z_{k}-6y_{i}^{2}\sum_{k=1}^{n}z_{k}^{2}-6z_{i}^{2}\sum_{k=1}^{n}y_{k}^{2}\right)
\quad+\sum_{\substack{i,j=1\\ i<j}}^{n}x_{i}x_{j}\left(12y_{i}z_{j}\sum_{k=1}^{n}y_{k}z_{k}+12z_{i}y_{j}\sum_{k=1}^{n}y_{k}z_{k}-12y_{i}y_{j}\sum_{k=1}^{n}z_{k}^{2}-12z_{i}z_{j}\sum_{k=1}^{n}y_{k}^{2}\right).

That is,

f_{3}(a) = \sum_{i=1}^{n}x_{i}^{2}\left(6\|y\|_{2}^{2}\|z\|_{2}^{2}-6\langle y,z\rangle^{2}+12y_{i}z_{i}\langle y,z\rangle-6y_{i}^{2}\|z\|_{2}^{2}-6z_{i}^{2}\|y\|_{2}^{2}\right)
\quad+\sum_{\substack{i,j=1\\ i<j}}^{n}x_{i}x_{j}\left(12(y_{i}z_{j}+z_{i}y_{j})\langle y,z\rangle-12y_{i}y_{j}\|z\|_{2}^{2}-12z_{i}z_{j}\|y\|_{2}^{2}\right). (5.19)

Then, we can set

d = 6f_{1}(a),
(c^{(1)})^{T}x = -2f_{2}(a),
\frac{1}{2}x^{T}Q^{(1)}x = \frac{1}{6}f_{3}(a).

From the relations (5.14), (5.17), and (5.19) we get the assertion (5.11) where

c_{i}^{(1)} = -12\sum_{j,k=1}^{n}a_{ijk}y_{j}z_{k},\quad 1\leq i\leq n, (5.20)
q_{ii}^{(1)} = 2\|y\|_{2}^{2}\|z\|_{2}^{2}-2\langle y,z\rangle^{2}+4y_{i}z_{i}\langle y,z\rangle-2y_{i}^{2}\|z\|_{2}^{2}-2z_{i}^{2}\|y\|_{2}^{2},
q_{ij}^{(1)} = 2(y_{i}z_{j}+z_{i}y_{j})\langle y,z\rangle-2y_{i}y_{j}\|z\|_{2}^{2}-2z_{i}z_{j}\|y\|_{2}^{2},\quad 1\leq i,j\leq n,\ i\neq j, (5.21)

and $d$ is as given in (5.10). It follows from the expressions in (5.21) that

Q^{(1)} = 2\left(\|y\|_{2}^{2}\|z\|_{2}^{2}-\langle y,z\rangle^{2}\right)I_{n}+2\left((yz^{T}+zy^{T})\langle y,z\rangle-yy^{T}\|z\|_{2}^{2}-zz^{T}\|y\|_{2}^{2}\right)
= 2\left(\left(\|y\|_{2}^{2}\|z\|_{2}^{2}-\langle y,z\rangle^{2}\right)I_{n}+(yz^{T}-zy^{T})^{2}\right),

while the vector given element-wise by (5.20) is equal to the one from the relation (5.7).

Similarly, using a different regrouping of the summands in the equations (5.15) and (5.16), we obtain the assertions (5.12) and (5.13). We get $Q^{(2)}$ and $c^{(2)}$ as in the relations (5.5) and (5.8), respectively, as well as $Q^{(3)}$ and $c^{(3)}$ as in (5.6) and (5.9), respectively. ∎
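Theorem 5.1 is straightforward to verify numerically. The sketch below (helper names are ours) compares $f(a)=6\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|^{2}$ with the quadratic form $d+(c^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x$ built from (5.4), (5.7), and (5.10), for a random antisymmetric tensor and random vectors.

```python
import numpy as np

def A6(x, y, z):
    o = lambda a, b, c: np.einsum("i,j,k->ijk", a, b, c)
    return (o(x, y, z) + o(y, z, x) + o(z, x, y)
            - o(x, z, y) - o(y, x, z) - o(z, y, x)) / 6.0

rng = np.random.default_rng(5)
n = 5
# A random antisymmetric tensor, built by antisymmetrizing a random one (cf. (3.2)):
B = rng.standard_normal((n, n, n))
A = (B + np.transpose(B, (1, 2, 0)) + np.transpose(B, (2, 0, 1))
     - np.transpose(B, (0, 2, 1)) - np.transpose(B, (1, 0, 2)) - np.transpose(B, (2, 1, 0))) / 6.0
x, y, z = rng.standard_normal((3, n))

f_direct = 6 * np.linalg.norm(A - A6(x, y, z)) ** 2          # (5.2)
G = np.outer(y, z) - np.outer(z, y)
Q1 = 2 * (((y @ y) * (z @ z) - (y @ z) ** 2) * np.eye(n) + G @ G)   # (5.4)
c1 = -12 * np.einsum("ijk,j,k->i", A, y, z)                         # (5.7)
d = 6 * np.linalg.norm(A) ** 2                                      # (5.10)
f_quadratic = d + c1 @ x + 0.5 * x @ Q1 @ x                         # (5.11)
```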

A minimization problem of the form

\min_{v}\left\{d+c^{T}v+\frac{1}{2}v^{T}Qv\right\},

is an unconstrained quadratic programming problem. Its solution $v$ is given by the linear system

Qv=-c.

Therefore, in order to find the solutions of the minimization problems

\left.\begin{array}{c}\displaystyle\min_{x}\ d+(c^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x\\ \displaystyle\min_{y}\ d+(c^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y\\ \displaystyle\min_{z}\ d+(c^{(3)})^{T}z+\frac{1}{2}z^{T}Q^{(3)}z\end{array}\right\} (5.22)

we need to solve the linear systems

\left.\begin{array}{c}Q^{(1)}x=-c^{(1)}\\ Q^{(2)}y=-c^{(2)}\\ Q^{(3)}z=-c^{(3)}\end{array}\right\}

respectively.

Here we come to an obstacle because the matrices $Q^{(1)},Q^{(2)},Q^{(3)}$ are singular. Take $Q^{(1)}$. From the relation (5.4) we see that $Q^{(1)}$ is defined by the two vectors $y$ and $z$, and we have

Q^{(1)}y=0,\quad Q^{(1)}z=0.

Assuming that $y$ and $z$ are linearly independent, this means that $\operatorname{rank}(Q^{(1)})\leq n-2$. On the other hand, $Q^{(1)}$ is a scalar multiple of the identity matrix minus a rank-2 matrix, which implies that $\operatorname{rank}(Q^{(1)})=n-2$. However, the linear system $Q^{(1)}x=-c^{(1)}$ is consistent because $\operatorname{rank}([Q^{(1)}\ c^{(1)}])=\operatorname{rank}(Q^{(1)})$, which can be seen from the relations (5.4) and (5.7). Hence, the linear system $Q^{(1)}x=-c^{(1)}$ can be solved using the Moore-Penrose inverse,

x=-\left(Q^{(1)}\right)^{+}c^{(1)},

with the additional property that $x$ is orthogonal to $y$ and $z$. Analogous reasoning holds for the linear systems for $y$ and $z$.

Now, we can write the algorithm for solving the minimization problem (5.1). The algorithm is based on solving three minimization problems (5.22).

Algorithm 5.2.

  Antisymmetry-preserving CP  

Input: $\mathcal{A}\in\mathbb{R}^{n\times n\times n}$ antisymmetric
Output: $\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$
Initialize $x,y,z\in\mathbb{R}^{n}$ as random vectors.
repeat
     For $c^{(1)}$ as in (5.7) and $Q^{(1)}$ as in (5.4), set $x=-(Q^{(1)})^{+}c^{(1)}$.
     For $c^{(2)}$ as in (5.8) and $Q^{(2)}$ as in (5.5), set $y=-(Q^{(2)})^{+}c^{(2)}$.
     For $c^{(3)}$ as in (5.9) and $Q^{(3)}$ as in (5.6), set $z=-(Q^{(3)})^{+}c^{(3)}$.
until convergence or maximum number of iterations
$\widetilde{\mathcal{A}}=\mathcal{A}_{6}(x,y,z)$

 


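A compact NumPy sketch of Algorithm 5.2 (the paper's implementation is in Julia; the names, seed, and fixed iteration count here are ours). Each update solves its singular but consistent system via the pseudoinverse, which returns the minimum-norm solution and is therefore automatically orthogonal to the other two vectors. Since each partial update exactly minimizes the quadratic objective in its block, the objective cannot increase.

```python
import numpy as np

def A6(x, y, z):
    o = lambda a, b, c: np.einsum("i,j,k->ijk", a, b, c)
    return (o(x, y, z) + o(y, z, x) + o(z, x, y)
            - o(x, z, y) - o(y, x, z) - o(z, y, x)) / 6.0

def als_update(A, u, v):
    # Solve Q w = -c with Q as in (5.4)-(5.6) and c as in (5.7)-(5.9), where
    # (u, v) = (y, z), (z, x), (x, y) for the x-, y-, and z-updates respectively.
    n = A.shape[0]
    G = np.outer(u, v) - np.outer(v, u)
    Q = 2 * (((u @ u) * (v @ v) - (u @ v) ** 2) * np.eye(n) + G @ G)
    c = -12 * np.einsum("ijk,j,k->i", A, u, v)
    return -np.linalg.pinv(Q) @ c

def antisym_cp(A, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    x, y, z = rng.standard_normal((3, A.shape[0]))
    for _ in range(iters):
        x = als_update(A, y, z)
        y = als_update(A, z, x)
        z = als_update(A, x, y)
    return x, y, z

rng = np.random.default_rng(6)
n = 6
u, v, w = np.linalg.qr(rng.standard_normal((n, 3)))[0].T   # an orthonormal triple
A = A6(u, v, w)                                            # an exactly representable target
x, y, z = antisym_cp(A)
residual = np.linalg.norm(A - A6(x, y, z)) / np.linalg.norm(A)
x0, y0, z0 = np.random.default_rng(0).standard_normal((3, n))   # same init as seed=0
init_residual = np.linalg.norm(A - A6(x0, y0, z0)) / np.linalg.norm(A)
```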
Note that Algorithm 5.2 results in mutually orthogonal vectors $x$, $y$, and $z$. While this may seem restrictive, the restriction to orthogonal vectors is justified, as we now explain.

First, note that if $x$, $y$, and $z$ are linearly dependent, then $\mathcal{A}_{6}(x,y,z)=0$. Thus, any linearly independent triplet of vectors gives a smaller value of the objective function $f$ defined by the relation (5.2).

The objective function $f$ is invariant under very general transformations. Due to multilinearity, for $\alpha,\beta\in\mathbb{R}$ we have

\mathcal{A}_{6}(x+\alpha y+\beta z,y,z)=\mathcal{A}_{6}(\alpha y+\beta z,y,z)+\mathcal{A}_{6}(x,y,z)=\mathcal{A}_{6}(x,y,z),

where $\mathcal{A}_{6}(\alpha y+\beta z,y,z)=0$ by antisymmetry. Let us consider an arbitrary matrix $B=(b_{ij})\in\mathbb{R}^{3\times 3}$. Using the arguments of multilinearity and antisymmetry as above, we have

\mathcal{A}_{6}(b_{11}x+b_{12}y+b_{13}z,\,b_{21}x+b_{22}y+b_{23}z,\,b_{31}x+b_{32}y+b_{33}z)
= b_{11}b_{22}b_{33}\mathcal{A}_{6}(x,y,z)+b_{11}b_{23}b_{32}\mathcal{A}_{6}(x,z,y)+b_{12}b_{21}b_{33}\mathcal{A}_{6}(y,x,z)
\quad+b_{12}b_{23}b_{31}\mathcal{A}_{6}(y,z,x)+b_{13}b_{21}b_{32}\mathcal{A}_{6}(z,x,y)+b_{13}b_{22}b_{31}\mathcal{A}_{6}(z,y,x)
= (b_{11}b_{22}b_{33}-b_{11}b_{23}b_{32}-b_{12}b_{21}b_{33}+b_{12}b_{23}b_{31}+b_{13}b_{21}b_{32}-b_{13}b_{22}b_{31})\mathcal{A}_{6}(x,y,z)
= \det(B)\,\mathcal{A}_{6}(x,y,z).

Therefore, if $\det(B)=1$,

\|\mathcal{A}-\mathcal{A}_{6}(b_{11}x+b_{12}y+b_{13}z,\,b_{21}x+b_{22}y+b_{23}z,\,b_{31}x+b_{32}y+b_{33}z)\|=\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|,

and the value of the objective function $f$ stays the same.

Set $V=[x,y,z]\in\mathbb{R}^{n\times 3}$. Take the thin QR decomposition $V=\widetilde{V}R$, scaled such that $\det(R)=1$ and $\widetilde{V}=[\tilde{x},\tilde{y},\tilde{z}]\in\mathbb{R}^{n\times 3}$ has orthogonal columns. Then, following the same reasoning, we have

\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})=\mathcal{A}_{6}(x,y,z)

and

\min_{x,y,z}\|\mathcal{A}-\mathcal{A}_{6}(x,y,z)\|=\min_{\tilde{x},\tilde{y},\tilde{z}\text{ orthogonal}}\|\mathcal{A}-\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|. (5.23)

This justifies the choice of orthogonal vectors.
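This invariance is easy to confirm numerically (a sketch with our own names): after a thin QR factorization $V=QR$, rescaling one orthonormal column by $\det(R)$ makes the effective change of basis have determinant one, and $\mathcal{A}_{6}$ is unchanged.

```python
import numpy as np

def A6(x, y, z):
    o = lambda a, b, c: np.einsum("i,j,k->ijk", a, b, c)
    return (o(x, y, z) + o(y, z, x) + o(z, x, y)
            - o(x, z, y) - o(y, x, z) - o(z, y, x)) / 6.0

rng = np.random.default_rng(7)
n = 6
x, y, z = rng.standard_normal((3, n))
V = np.column_stack([x, y, z])
Q, R = np.linalg.qr(V)            # thin QR; Q has orthonormal columns
Vt = Q.copy()
Vt[:, 0] *= np.linalg.det(R)      # absorb det(R) so the transformation has det 1
diff = np.linalg.norm(A6(x, y, z) - A6(Vt[:, 0], Vt[:, 1], Vt[:, 2]))
```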

5.1. Equivalence of Algorithm 5.2 to HOPM

Here we show that Algorithm 5.2 is equivalent to the higher-order power method (HOPM) for the unstructured rank-1 approximation, and discuss the implications of this equivalence.

Due to multilinearity, the minimization problem (5.23) can be transformed into a minimization problem over orthonormal vectors and a scalar λ,

minx~,y~,z~ orthonormalλ𝒜λ𝒜6(x~,y~,z~)2,\min_{\begin{subarray}{c}\tilde{x},\tilde{y},\tilde{z}\text{ orthonormal}\\ \lambda\in\mathbb{R}\end{subarray}}\|\mathcal{A}-\lambda\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|^{2}, (5.24)

so it becomes a minimization problem on the Stiefel manifold. Since the expression in (5.24) does not depend on the basis, it is a minimization problem on the Grassmann manifold, which makes sense as the antisymmetric tensors are connected to the Grassmannians, see, e.g., [23].

We can rewrite (5.24) as

minx~,y~,z~ orthonormalλ{𝒜22λ𝒜,𝒜6(x~,y~,z~)+λ2𝒜6(x~,y~,z~)2}.\min_{\begin{subarray}{c}\tilde{x},\tilde{y},\tilde{z}\text{ orthonormal}\\ \lambda\in\mathbb{R}\end{subarray}}\left\{\|\mathcal{A}\|^{2}-2\lambda\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle+\lambda^{2}\|\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|^{2}\right\}. (5.25)

Set

V=[x~y~z~].V=\left[\begin{array}[]{ccc}\tilde{x}&\tilde{y}&\tilde{z}\\ \end{array}\right]. (5.26)

Observe that

𝒜6(x~,y~,z~)=×1V×2V×3V,\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})=\mathcal{E}\times_{1}V\times_{2}V\times_{3}V,

where \mathcal{E} is given by the relation (3.3). Then,

𝒜6(x~,y~,z~)2=2=6,\|\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\|^{2}=\|\mathcal{E}\|^{2}=6,

because x~\tilde{x}, y~\tilde{y}, z~\tilde{z} are orthonormal and the Frobenius norm is unitary invariant. This way, minimization problem (5.25) is simplified to

minx~,y~,z~ orthonormalλ{𝒜22λ𝒜,𝒜6(x~,y~,z~)+6λ2}.\min_{\begin{subarray}{c}\tilde{x},\tilde{y},\tilde{z}\text{ orthonormal}\\ \lambda\in\mathbb{R}\end{subarray}}\left\{\|\mathcal{A}\|^{2}-2\lambda\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle+6\lambda^{2}\right\}.

Take the objective function

f~(λ,x~,y~,z~)=𝒜22λ𝒜,𝒜6(x~,y~,z~)+6λ2.\tilde{f}(\lambda,\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{A}\|^{2}-2\lambda\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle+6\lambda^{2}. (5.27)

In order to find the optimal λ for f̃, we set the partial derivative of f̃ with respect to λ to zero. We have

λf~(λ,x~,y~,z~)=12λ2𝒜,𝒜6(x~,y~,z~)=0,\frac{\partial}{\partial\lambda}\tilde{f}(\lambda,\tilde{x},\tilde{y},\tilde{z})=12\lambda-2\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle=0,

that is,

λ=𝒜,𝒜6(x~,y~,z~)6.\lambda_{*}=\frac{\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle}{6}.

It follows from (5.27) that

f~(λ,x~,y~,z~)=𝒜216𝒜,𝒜6(x~,y~,z~)2.\tilde{f}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{A}\|^{2}-\frac{1}{6}\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle^{2}.

Thus, minimizing f~(λ,x~,y~,z~)\tilde{f}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z}) is equivalent to maximizing |𝒜,𝒜6(x~,y~,z~)||\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle| over the Stiefel manifold.

Define the compressed tensor

𝒜c(V)𝒜×1VT×2VT×3VT,\mathcal{A}_{c}(V)\coloneqq\mathcal{A}\times_{1}V^{T}\times_{2}V^{T}\times_{3}V^{T}, (5.28)

where V is as in the relation (5.26). This is a 3×3×3 tensor. It is very similar to the tensor ℰ, except that in place of 1 and −1 it has (𝒜c(V))123 and −(𝒜c(V))123, respectively. Using this tensor we obtain

|𝒜,𝒜6(x~,y~,z~)|\displaystyle|\langle\mathcal{A},\mathcal{A}_{6}(\tilde{x},\tilde{y},\tilde{z})\rangle| =|𝒜,×1V×2V×3V|=|𝒜c(V),|\displaystyle=|\langle\mathcal{A},\mathcal{E}\times_{1}V\times_{2}V\times_{3}V\rangle|=|\langle\mathcal{A}_{c}(V),\mathcal{E}\rangle|
=6|(𝒜c(V))123|=6𝒜c(V).\displaystyle=6|(\mathcal{A}_{c}(V))_{123}|=\sqrt{6}\|\mathcal{A}_{c}(V)\|.

In the last equality we used the norm of the compressed tensor, ‖𝒜c(V)‖² = 6((𝒜c(V))123)². Therefore, maximization of |⟨𝒜,𝒜6(x̃,ỹ,z̃)⟩| is equivalent to maximization of ‖𝒜c(V)‖. This corresponds to the best structure-preserving multilinear rank-r approximation from [3] for r=3.
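These identities can be checked numerically. In the sketch below (ours, in Python/NumPy for illustration; the paper's code is in Julia) we antisymmetrize a random tensor, compress it with an orthonormal V, and verify that 𝒜c(V) has the Levi-Civita pattern and that |⟨𝒜,𝒜6(x̃,ỹ,z̃)⟩| = √6 ‖𝒜c(V)‖.

```python
import numpy as np
from itertools import permutations

# Signs of the permutations of (0, 1, 2): cyclic permutations are even.
SGN = {p: (1 if p in [(0, 1, 2), (1, 2, 0), (2, 0, 1)] else -1)
       for p in permutations(range(3))}

# Levi-Civita tensor (3.3).
E = np.zeros((3, 3, 3))
for p, s in SGN.items():
    E[p] = s

rng = np.random.default_rng(1)
n = 6
# Antisymmetrize a random tensor, cf. the antisymmetrizer (3.2).
T = rng.standard_normal((n, n, n))
A = sum(s * np.transpose(T, p) for p, s in SGN.items()) / 6

V, _ = np.linalg.qr(rng.standard_normal((n, 3)))   # orthonormal x~, y~, z~
Ac = np.einsum('ijk,ia,jb,kc->abc', A, V, V, V)    # compressed tensor (5.28)
A6 = np.einsum('ijk,ai,bj,ck->abc', E, V, V, V)    # A6(x~, y~, z~)

# Ac is the Levi-Civita pattern scaled by (Ac)_123 ...
assert np.allclose(Ac, Ac[0, 1, 2] * E)
# ... hence |<A, A6>| = 6 |(Ac)_123| = sqrt(6) ||Ac||.
assert np.isclose(abs(np.sum(A * A6)), np.sqrt(6) * np.linalg.norm(Ac))
```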

The problem of finding the best antisymmetric multilinear rank-r approximation is equivalent to the problem of finding the best unstructured rank-1 approximation of an antisymmetric tensor, see [3, Theorem 4.2]. This implies the equivalence between our Algorithm 5.2 and HOPM used for finding the best unstructured rank-1 approximation. Finally, the global convergence result for HOPM given in [30], namely that the iterates of the ALS algorithm for HOPM converge to a stationary point of the corresponding objective function, applies to our algorithm as well.

6. Partial antisymmetry

Having discussed antisymmetric tensors, we can ask what happens when a tensor has only partial antisymmetry. We consider order-3 tensors. Note that partially antisymmetric tensors need not be cubical.

A tensor 𝒞 ∈ ℝ^{n×n×m} is antisymmetric in modes one and two if all its frontal slices are antisymmetric matrices. Without loss of generality, we assume that 𝒞 is antisymmetric in the first two modes. That is,

cijk=cjik,1i,jn, 1km.c_{ijk}=-c_{jik},\quad 1\leq i,j\leq n,\ 1\leq k\leq m. (6.1)

Tensors that are antisymmetric in modes two and three, or in modes one and three, are defined correspondingly. The partial antisymmetrizer producing antisymmetry in modes one and two can be defined as the operator anti1,2 such that for ℬ ∈ ℝ^{n×n×m} and 𝒞 = anti1,2(ℬ) we have

cijk=12(bijkbjik).c_{ijk}=\frac{1}{2}(b_{ijk}-b_{jik}).
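In code, anti1,2 amounts to subtracting a transpose of the first two modes; a minimal NumPy sketch (our illustration, not the paper's Julia implementation):

```python
import numpy as np

def anti12(B):
    """Partial antisymmetrizer: c_ijk = (b_ijk - b_jik) / 2."""
    return 0.5 * (B - np.transpose(B, (1, 0, 2)))

rng = np.random.default_rng(2)
C = anti12(rng.standard_normal((4, 4, 3)))

# Every frontal slice C[:, :, k] is skew-symmetric, i.e. c_ijk = -c_jik.
assert np.allclose(C, -np.transpose(C, (1, 0, 2)))
# Applying the antisymmetrizer twice changes nothing (it is a projection).
assert np.allclose(anti12(C), C)
```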

For a pair of indices (i,j)(i,j), 1i<jn1\leq i<j\leq n, subtensor 𝒢\mathcal{G} of 𝒞\mathcal{C} obtained at the intersection of the iith and jjth column, row, and tube is a 2×2×22\times 2\times 2 tensor of the form

𝒢(i1,i2,i3)={α,if (i1,i2,i3)=(1,2,1),α,if (i1,i2,i3)=(2,1,1),β,if (i1,i2,i3)=(1,2,2),β,if (i1,i2,i3)=(2,1,2),0,if i1=i2,\mathcal{G}(i_{1},i_{2},i_{3})=\left\{\begin{array}[]{rl}\alpha,&\text{if $(i_{1},i_{2},i_{3})=(1,2,1)$,}\\ -\alpha,&\text{if $(i_{1},i_{2},i_{3})=(2,1,1)$,}\\ \beta,&\text{if $(i_{1},i_{2},i_{3})=(1,2,2)$,}\\ -\beta,&\text{if $(i_{1},i_{2},i_{3})=(2,1,2)$,}\\ 0,&\text{if $i_{1}=i_{2}$,}\\ \end{array}\right.

for α,β\alpha,\beta\in\mathbb{R}. Its mode-11 matricization is given by

G(1)=[0α0βα0β0].G_{(1)}=\left[\begin{array}[]{cc|cc}0&\alpha&0&\beta\\ -\alpha&0&-\beta&0\\ \end{array}\right].

Here, the tensor 𝒢 plays a role analogous to that of the Levi-Civita tensor (3.3) in Section 3.

Analogously to the tensor format (3.4), for three vectors x,y ∈ ℝⁿ and z ∈ ℝᵐ, we can define an n×n×m tensor

𝒞2(x,y,z):=12(xyzyxz).\mathcal{C}_{2}(x,y,z):=\frac{1}{2}(x\circ y\circ z-y\circ x\circ z). (6.2)

If we take x=[1,0]ᵀ, y=[0,1]ᵀ, z=[α,β]ᵀ, then 𝒞2(x,y,z)=𝒢. Moreover, if 𝒯=[[x,y,z]] is a rank-1 tensor, then 𝒞2(x,y,z)=anti1,2(𝒯). Obviously, rank(𝒞2(x,y,z)) ≤ 2. For each fixed third index, the corresponding frontal slice of 𝒞2(x,y,z) is a skew-symmetric matrix and therefore has even rank. Hence, for linearly independent x and y and non-zero z,

rank(𝒞2(x,y,z))=2.\text{rank}(\mathcal{C}_{2}(x,y,z))=2.
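The rank claim can be checked directly; a short sketch (ours) that forms 𝒞2(x,y,z) from (6.2) and verifies that its mode-1 matricization has rank 2:

```python
import numpy as np

def C2(x, y, z):
    """C2(x,y,z) = (x o y o z - y o x o z) / 2, see (6.2)."""
    return 0.5 * (np.einsum('i,j,k->ijk', x, y, z)
                  - np.einsum('i,j,k->ijk', y, x, z))

rng = np.random.default_rng(3)
n, m = 5, 4
x, y, z = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

T = C2(x, y, z)
# Mode-1 matricization is n x nm; the column ordering does not affect rank.
assert np.linalg.matrix_rank(T.reshape(n, -1)) == 2
# Each frontal slice is skew-symmetric, hence of even rank.
assert np.allclose(T[:, :, 0], -T[:, :, 0].T)
```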

With all this in mind, for a given non-zero tensor 𝒞 ∈ ℝ^{n×n×m} that is antisymmetric in the first two modes, we are looking for its rank-2 approximation 𝒞̃ of the same structure. Again, we examine two approaches. The first one is analogous to the approach in Section 4. In the second approach we find a tensor 𝒞̃ = 𝒞2(x,y,z), defined by the vectors x,y ∈ ℝⁿ and z ∈ ℝᵐ, such that

𝒞𝒞~2min.\|\mathcal{C}-\widetilde{\mathcal{C}}\|^{2}\to\min. (6.3)

6.1. Ignoring the structure

Let 𝒞 ∈ ℝ^{n×n×m} be a tensor with partial antisymmetry. We first approximate 𝒞 with a rank-1 tensor 𝒞̄ by using the CP-ALS algorithm (Algorithm 2.1) with r=1. Then, we apply the operator anti1,2 to 𝒞̄ to get a rank-2 tensor 𝒞̃ that is antisymmetric in modes one and two. We have

𝒞¯\displaystyle\bar{\mathcal{C}} =[[x,y,z]],x,yn,zm,\displaystyle=[[x,y,z]],\quad x,y\in\mathbb{R}^{n},\ z\in\mathbb{R}^{m},
𝒞~\displaystyle\widetilde{\mathcal{C}} =anti1,2(𝒞¯),\displaystyle=\text{anti}_{1,2}(\bar{\mathcal{C}}),

or equivalently, 𝒞̃ = 𝒞2(x,y,z). The algorithm with partial a posteriori antisymmetrization is a simple modification of Algorithm 4.1.

Algorithm 6.1.

  CP with partial a posteriori antisymmetrization  

Input: 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} antisymmetric in modes 11 and 22
Output: 𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)
Apply Algorithm 2.1 on 𝒞 with r=1 to obtain x,y ∈ ℝⁿ, z ∈ ℝᵐ
𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)

 


6.2. Preserving the structure

Now we are going to construct an iterative structure-preserving minimization algorithm. Again, let 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} be a tensor with partial antisymmetry. We are looking for tensor 𝒞~n×n×m\widetilde{\mathcal{C}}\in\mathbb{R}^{n\times n\times m} which is a solution of the minimization problem (6.3). In particular, we are looking for vectors x,ynx,y\in\mathbb{R}^{n} and zmz\in\mathbb{R}^{m} such that

𝒞𝒞2(x,y,z)2min.\|\mathcal{C}-\mathcal{C}_{2}(x,y,z)\|^{2}\to\min. (6.4)

We set

v=[xyz]2n+m,v=\left[\begin{array}[]{c}x\\ y\\ z\\ \end{array}\right]\in\mathbb{R}^{2n+m},

and define the objective function g:2n+mg\colon\mathbb{R}^{2n+m}\to\mathbb{R},

g(v)=2𝒞𝒞2(x,y,z)2.g(v)=2\|\mathcal{C}-\mathcal{C}_{2}(x,y,z)\|^{2}. (6.5)

We formulate the ALS algorithm based on three minimization problems:

minxg(v),minyg(v),minzg(v).\min_{x}g(v),\quad\min_{y}g(v),\quad\min_{z}g(v).

To this end, we need Theorem 6.2. Before the statement of the theorem, we define the appropriate objects: matrices Q(1)=Q(1)(y,z),Q(2)=Q(2)(x,z)n×nQ^{(1)}=Q^{(1)}(y,z),Q^{(2)}=Q^{(2)}(x,z)\in\mathbb{R}^{n\times n},

Q(1)\displaystyle Q^{(1)} =2y22z22In2yyTz22,\displaystyle=2\|y\|_{2}^{2}\|z\|_{2}^{2}I_{n}-2yy^{T}\|z\|_{2}^{2}, (6.6)
Q(2)\displaystyle Q^{(2)} =2x22z22In2xxTz22;\displaystyle=2\|x\|_{2}^{2}\|z\|_{2}^{2}I_{n}-2xx^{T}\|z\|_{2}^{2}; (6.7)

vectors b(1)=b(1)(y,z),b(2)=b(2)(x,z)nb^{(1)}=b^{(1)}(y,z),b^{(2)}=b^{(2)}(x,z)\in\mathbb{R}^{n}, b(3)=b(3)(x,y)mb^{(3)}=b^{(3)}(x,y)\in\mathbb{R}^{m},

b^{(1)}=-4\mathcal{C}\times_{2}y^{T}\times_{3}z^{T}, (6.8)
b^{(2)}=-4\mathcal{C}\times_{1}x^{T}\times_{3}z^{T}, (6.9)
b^{(3)}=-2(\mathcal{C}\times_{1}x^{T}\times_{2}y^{T}-\mathcal{C}\times_{1}y^{T}\times_{2}x^{T}), (6.10)

and numbers q(3)=q(3)(x,y),dq^{(3)}=q^{(3)}(x,y),d\in\mathbb{R},

q(3)\displaystyle q^{(3)} =xyTyxT22,\displaystyle=\|xy^{T}-yx^{T}\|_{2}^{2}, (6.11)
d\displaystyle d =2𝒞2.\displaystyle=2\|\mathcal{C}\|^{2}. (6.12)
Theorem 6.2.

The function gg defined by (6.5) can be written as

g(v)\displaystyle g(v) =d+(b(1))Tx+12xTQ(1)x\displaystyle=d+(b^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x (6.13)
=d+(b(2))Ty+12yTQ(2)y\displaystyle=d+(b^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y (6.14)
=d+(b(3))Tz+12q(3)zTz,\displaystyle=d+(b^{(3)})^{T}z+\frac{1}{2}q^{(3)}z^{T}z, (6.15)

for Q(1),Q(2)n×nQ^{(1)},Q^{(2)}\in\mathbb{R}^{n\times n}, b(1),b(2)nb^{(1)},b^{(2)}\in\mathbb{R}^{n}, b(3)mb^{(3)}\in\mathbb{R}^{m}, q(3)q^{(3)}\in\mathbb{R} defined by the relations (6.6)-(6.12).

Proof.

We start by writing the function gg as

g(v)\displaystyle g(v) =2𝒞22𝒞,xyzyxz+12xyzyxz2\displaystyle=2\|\mathcal{C}\|^{2}-2\left\langle\mathcal{C},x\circ y\circ z-y\circ x\circ z\right\rangle+\frac{1}{2}\|x\circ y\circ z-y\circ x\circ z\|^{2}
=2g1(v)2g2(v)+12g3(v),\displaystyle=2g_{1}(v)-2g_{2}(v)+\frac{1}{2}g_{3}(v),

for

g1(v)\displaystyle g_{1}(v) =𝒞2,\displaystyle=\|\mathcal{C}\|^{2},
g2(v)\displaystyle g_{2}(v) =𝒞,xyzyxz,\displaystyle=\left\langle\mathcal{C},x\circ y\circ z-y\circ x\circ z\right\rangle, (6.16)
g3(v)\displaystyle g_{3}(v) =xyzyxz2.\displaystyle=\|x\circ y\circ z-y\circ x\circ z\|^{2}. (6.17)

Function g2g_{2} can be written as

g2(v)\displaystyle g_{2}(v) =i,j=1nk=1mcijk(xiyjzkyixjzk)\displaystyle=\sum_{i,j=1}^{n}\sum_{k=1}^{m}c_{ijk}(x_{i}y_{j}z_{k}-y_{i}x_{j}z_{k})
=i=1nxi(j=1nk=1mcijkyjzk)+j=1nxj(i=1nk=1m(cijk)yizk).\displaystyle=\sum_{i=1}^{n}x_{i}\left(\sum_{j=1}^{n}\sum_{k=1}^{m}c_{ijk}y_{j}z_{k}\right)+\sum_{j=1}^{n}x_{j}\left(\sum_{i=1}^{n}\sum_{k=1}^{m}(-c_{ijk})y_{i}z_{k}\right).

Using the partial antisymmetry property (6.1), after renaming the indices we get

g2(v)=2i=1nxi(j=1nk=1mcijkyjzk).g_{2}(v)=2\sum_{i=1}^{n}x_{i}\left(\sum_{j=1}^{n}\sum_{k=1}^{m}c_{ijk}y_{j}z_{k}\right).

For the function g3g_{3} we have

g3(v)\displaystyle g_{3}(v) =i,j=1nk=1m(xiyjzkxjyizk)2\displaystyle=\sum_{i,j=1}^{n}\sum_{k=1}^{m}(x_{i}y_{j}z_{k}-x_{j}y_{i}z_{k})^{2}
=i=1nxi2(j=1nk=1myj2zk2)2i,j=1nxixjyiyj(k=1mzk2)+j=1nxj2(i=1nk=1myi2zk2)\displaystyle=\sum_{i=1}^{n}x_{i}^{2}\left(\sum_{j=1}^{n}\sum_{k=1}^{m}y_{j}^{2}z_{k}^{2}\right)-2\sum_{i,j=1}^{n}x_{i}x_{j}y_{i}y_{j}\left(\sum_{k=1}^{m}z_{k}^{2}\right)+\sum_{j=1}^{n}x_{j}^{2}\left(\sum_{i=1}^{n}\sum_{k=1}^{m}y_{i}^{2}z_{k}^{2}\right)
=2i=1nxi2y22z222i,j=1nxixjyiyjz22\displaystyle=2\sum_{i=1}^{n}x_{i}^{2}\|y\|_{2}^{2}\|z\|_{2}^{2}-2\sum_{i,j=1}^{n}x_{i}x_{j}y_{i}y_{j}\|z\|_{2}^{2}
=i=1nxi2(2y22z222yi2z22)+i,j=1ijnxixj(2yiyjz22).\displaystyle=\sum_{i=1}^{n}x_{i}^{2}(2\|y\|_{2}^{2}\|z\|_{2}^{2}-2y_{i}^{2}\|z\|_{2}^{2})+\sum_{\begin{subarray}{c}i,j=1\\ i\neq j\end{subarray}}^{n}x_{i}x_{j}(-2y_{i}y_{j}\|z\|_{2}^{2}).

The same way as in the proof of Theorem 5.1, we set

d\displaystyle d =2g1(v),\displaystyle=2g_{1}(v),
(b(1))Tx\displaystyle(b^{(1)})^{T}x =2g2(v),\displaystyle=-2g_{2}(v),
12xTQ(1)x\displaystyle\frac{1}{2}x^{T}Q^{(1)}x =12g3(v),\displaystyle=\frac{1}{2}g_{3}(v),

where

bi(1)=4j=1nk=1mcijkyjzk,1in,b_{i}^{(1)}=-4\sum_{j=1}^{n}\sum_{k=1}^{m}c_{ijk}y_{j}z_{k},\quad 1\leq i\leq n, (6.18)

and

qii(1)\displaystyle q_{ii}^{(1)} =2y2z22yi2z2,\displaystyle=2\|y\|^{2}\|z\|^{2}-2y_{i}^{2}\|z\|^{2},
qij(1)\displaystyle q_{ij}^{(1)} =2yiyjz2,1i,jn,ij.\displaystyle=-2y_{i}y_{j}\|z\|^{2},\quad 1\leq i,j\leq n,\ i\neq j. (6.19)

Vector b(1) from (6.18) can be written in the more compact form (6.8), matrix Q(1) from (6.19) is equivalent to (6.6), and d is as in the relation (6.12). This gives the assertion (6.13).

With different regrouping of the summands in the relations (6.16) and (6.17) we get the equation (6.14) with b(2)b^{(2)} and Q(2)Q^{(2)} as in (6.9) and (6.7), respectively.

To get the equation (6.15) we write

g2(v)=k=1mzk(i,j=1ncijk(xiyjyixj))g_{2}(v)=\sum_{k=1}^{m}z_{k}\left(\sum_{i,j=1}^{n}c_{ijk}(x_{i}y_{j}-y_{i}x_{j})\right)

and

g3(v)=k=1mzk2(i,j=1n(xiyjxjyi)2).g_{3}(v)=\sum_{k=1}^{m}z_{k}^{2}\left(\sum_{i,j=1}^{n}(x_{i}y_{j}-x_{j}y_{i})^{2}\right).

Then, we set

bk(3)\displaystyle b_{k}^{(3)} =2i,j=1ncijk(xiyjyixj),1km,\displaystyle=-2\sum_{i,j=1}^{n}c_{ijk}(x_{i}y_{j}-y_{i}x_{j}),\quad 1\leq k\leq m,
q(3)\displaystyle q^{(3)} =i,j=1n(xiyjxjyi)2=xyTyxT22.\displaystyle=\sum_{i,j=1}^{n}(x_{i}y_{j}-x_{j}y_{i})^{2}=\|xy^{T}-yx^{T}\|_{2}^{2}.

Compact form of the vector b(3)b^{(3)} corresponds to (6.10). ∎
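The identities of Theorem 6.2 can be validated numerically. The sketch below (ours, in Python/NumPy for illustration) compares g(v) = 2‖𝒞 − 𝒞2(x,y,z)‖² with the quadratic forms (6.13)–(6.15) on a random partially antisymmetric tensor; note that in the linear term of the y-update the vector x is contracted over mode one.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 5, 4
C = rng.standard_normal((n, n, m))
C = 0.5 * (C - np.transpose(C, (1, 0, 2)))      # partial antisymmetry (6.1)
x, y, z = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

C2 = 0.5 * (np.einsum('i,j,k->ijk', x, y, z) - np.einsum('i,j,k->ijk', y, x, z))
g = 2 * np.sum((C - C2) ** 2)                   # objective (6.5)

d = 2 * np.sum(C ** 2)                                                  # (6.12)
b1 = -4 * np.einsum('ijk,j,k->i', C, y, z)                              # (6.8)
Q1 = 2 * (y @ y) * (z @ z) * np.eye(n) - 2 * np.outer(y, y) * (z @ z)   # (6.6)
b2 = -4 * np.einsum('ijk,i,k->j', C, x, z)      # x contracted over mode 1
Q2 = 2 * (x @ x) * (z @ z) * np.eye(n) - 2 * np.outer(x, x) * (z @ z)   # (6.7)
b3 = -2 * (np.einsum('ijk,i,j->k', C, x, y)
           - np.einsum('ijk,i,j->k', C, y, x))                          # (6.10)
q3 = np.sum((np.outer(x, y) - np.outer(y, x)) ** 2)                     # (6.11)

assert np.isclose(g, d + b1 @ x + 0.5 * x @ Q1 @ x)    # (6.13)
assert np.isclose(g, d + b2 @ y + 0.5 * y @ Q2 @ y)    # (6.14)
assert np.isclose(g, d + b3 @ z + 0.5 * q3 * (z @ z))  # (6.15)
```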

Therefore, like in the Section 5, our algorithm is based on finding the solutions of the minimization problems

minxd+(b(1))Tx+12xTQ(1)xminyd+(b(2))Ty+12yTQ(2)yminzd+(b(3))Tz+12q(3)zTz.}\left.\begin{array}[]{c}\displaystyle\min_{x}d+(b^{(1)})^{T}x+\frac{1}{2}x^{T}Q^{(1)}x\\ \displaystyle\min_{y}d+(b^{(2)})^{T}y+\frac{1}{2}y^{T}Q^{(2)}y\\ \displaystyle\min_{z}d+(b^{(3)})^{T}z+\frac{1}{2}q^{(3)}z^{T}z.\\ \end{array}\right\}

Those solutions are obtained, respectively, from the following equations:

Q(1)x=b(1)Q(2)y=b(2)z=1q(3)b(3).}\left.\begin{array}[]{c}Q^{(1)}x=-b^{(1)}\\ Q^{(2)}y=-b^{(2)}\\ z=-\frac{1}{q^{(3)}}b^{(3)}.\\ \end{array}\right\}

The situation regarding these linear systems is similar to the one in the fully antisymmetric case. Matrices Q(1) and Q(2) do not have full rank. From their definitions (6.6) and (6.7) we see that both are a scalar multiple of the identity minus a rank-1 matrix, and

Q^{(1)}y=0,\quad Q^{(2)}x=0.

Thus, rank(Q(1)) = rank(Q(2)) = n−1. Still, rank([Q(1) b(1)]) = rank(Q(1)) and rank([Q(2) b(2)]) = rank(Q(2)), so the linear systems are consistent and can be solved using the Moore-Penrose inverse. Since the minimum-norm solutions are orthogonal to the corresponding null spaces, we additionally get that the vectors x and y must be orthogonal. Note that q(3) ≠ 0 for linearly independent x and y, so z is well-defined.

The algorithm for solving the minimization problem (6.4) is very similar to Algorithm 5.2.

Algorithm 6.3.

  CP preserving partial antisymmetry  

Input: 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} antisymmetric in modes 11 and 22
Output: 𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)
Initialize x,ynx,y\in\mathbb{R}^{n},zmz\in\mathbb{R}^{m} as random vectors.
repeat
     For b(1)b^{(1)} as in (6.8) and Q(1)Q^{(1)} as in (6.6), x=(Q(1))+b(1)x=-(Q^{(1)})^{+}b^{(1)}.
     For b(2)b^{(2)} as in (6.9) and Q(2)Q^{(2)} as in (6.7), y=(Q(2))+b(2)y=-(Q^{(2)})^{+}b^{(2)}.
     For b(3)b^{(3)} as in (6.10) and q(3)q^{(3)} as in (6.11), z=b(3)q(3)z=-\frac{b^{(3)}}{q^{(3)}}.
until convergence or maximum number of iterations
𝒞~=𝒞2(x,y,z)\widetilde{\mathcal{C}}=\mathcal{C}_{2}(x,y,z)

 
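Algorithm 6.3 can be sketched compactly; the following Python/NumPy version is our illustration (the paper's implementation is in Julia, and the initialization, iteration cap, and convergence test here are our own choices). On a tensor that exactly has the structure (6.2), the iteration reaches a small relative error.

```python
import numpy as np

def C2(x, y, z):
    """C2(x,y,z) = (x o y o z - y o x o z) / 2, see (6.2)."""
    return 0.5 * (np.einsum('i,j,k->ijk', x, y, z)
                  - np.einsum('i,j,k->ijk', y, x, z))

def partial_antisym_cp(C, iters=1000, tol=1e-10):
    """ALS sketch of Algorithm 6.3 for C antisymmetric in modes 1 and 2."""
    n, m = C.shape[0], C.shape[2]
    rng = np.random.default_rng(5)
    x, y, z = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
    nrm = np.linalg.norm(C)
    for _ in range(iters):
        zz = z @ z
        # x-update: solve Q1 x = -b1 with the pseudoinverse, see (6.6), (6.8).
        Q1 = 2 * (y @ y) * zz * np.eye(n) - 2 * np.outer(y, y) * zz
        b1 = -4 * np.einsum('ijk,j,k->i', C, y, z)
        x = -np.linalg.pinv(Q1) @ b1
        # y-update: same structure, with x contracted over mode one.
        Q2 = 2 * (x @ x) * zz * np.eye(n) - 2 * np.outer(x, x) * zz
        b2 = -4 * np.einsum('ijk,i,k->j', C, x, z)
        y = -np.linalg.pinv(Q2) @ b2
        # z-update: z = -b3 / q3, see (6.10), (6.11).
        b3 = -2 * (np.einsum('ijk,i,j->k', C, x, y)
                   - np.einsum('ijk,i,j->k', C, y, x))
        q3 = np.sum((np.outer(x, y) - np.outer(y, x)) ** 2)
        z = -b3 / q3
        if np.linalg.norm(C - C2(x, y, z)) / nrm < tol:
            break
    return x, y, z

rng = np.random.default_rng(6)
# A target with exact structure: C = 2 C2(a, b, c) for random a, b, c.
C = 2 * C2(rng.standard_normal(6), rng.standard_normal(6), rng.standard_normal(4))
x, y, z = partial_antisym_cp(C)
assert np.linalg.norm(C - C2(x, y, z)) / np.linalg.norm(C) < 1e-6
```

As a by-product of the pseudoinverse solves, the returned x and y are orthogonal, as discussed above.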


As in the fully antisymmetric case, we can additionally observe that 𝒞2(x,y,z) = 0 if x and y are linearly dependent, and

𝒞2(b11x+b12y,b21x+b22y,z)=detB𝒞2(x,y,z),B=[b11b12b21b22].\mathcal{C}_{2}(b_{11}x+b_{12}y,b_{21}x+b_{22}y,z)=\det B\mathcal{C}_{2}(x,y,z),\quad B=\left[\begin{array}[]{cc}b_{11}&b_{12}\\ b_{21}&b_{22}\\ \end{array}\right].

Then we can rescale our optimization problem so that we are looking for

\min_{\begin{subarray}{c}\|\tilde{x}\|=\|\tilde{y}\|=\|\tilde{z}\|=1\\ \tilde{x}\perp\tilde{y},\ \lambda\in\mathbb{R}\end{subarray}}\left\{\|\mathcal{C}-\lambda\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\|^{2}\right\}.

Set

g~(λ,x~,y~,z~)=𝒞22λ𝒞,𝒞2(x~,y~,z~)+λ2𝒞2(x~,y~,z~)2.\tilde{g}(\lambda,\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{C}\|^{2}-2\lambda\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle+\lambda^{2}\|\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\|^{2}.

From the shape of 𝒞2 and the facts that ‖x̃‖=‖ỹ‖=‖z̃‖=1 and x̃ ⊥ ỹ, a short calculation gives ‖𝒞2(x̃,ỹ,z̃)‖² = 1/2. Thus,

g~(λ,x~,y~,z~)=𝒞22λ𝒞,𝒞2(x~,y~,z~)+12λ2.\tilde{g}(\lambda,\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{C}\|^{2}-2\lambda\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle+\frac{1}{2}\lambda^{2}.

The optimal λ\lambda for g~\tilde{g} is

λ=2𝒞,𝒞2(x~,y~,z~)\lambda_{*}=2\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle

and

\tilde{g}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z})=\|\mathcal{C}\|^{2}-2\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle^{2}.

Therefore, minimizing \tilde{g}(\lambda_{*},\tilde{x},\tilde{y},\tilde{z}) is equivalent to maximizing |\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle|.

Now we can set

W=[x~y~]W=\left[\begin{array}[]{cc}\tilde{x}&\tilde{y}\\ \end{array}\right]

and define the compressed matrix

Cc(W,z~)𝒞×1WT×2WT×3z~T,C_{c}(W,\tilde{z})\coloneqq\mathcal{C}\times_{1}W^{T}\times_{2}W^{T}\times_{3}\tilde{z}^{T}, (6.20)

analogous to the compressed tensor (5.28). The matrix Cc(W,z̃) is a 2×2 skew-symmetric matrix

[0(Cc(W,z~))12(Cc(W,z~))120],\left[\begin{array}[]{cc}0&(C_{c}(W,\tilde{z}))_{12}\\ -(C_{c}(W,\tilde{z}))_{12}&0\\ \end{array}\right],

where

|(Cc(W,z~))12|=|𝒞×1x~T×2y~T×3z~T|.|(C_{c}(W,\tilde{z}))_{12}|=|\mathcal{C}\times_{1}\tilde{x}^{T}\times_{2}\tilde{y}^{T}\times_{3}\tilde{z}^{T}|. (6.21)

Moreover, we can write

𝒞2(x~,y~,z~)=M×1W×2W×3z~,\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})=M\times_{1}W\times_{2}W\times_{3}\tilde{z},

for M=[012120]M=\left[\begin{array}[]{cc}0&\frac{1}{2}\\ -\frac{1}{2}&0\\ \end{array}\right]. It follows that

|𝒞,𝒞2(x~,y~,z~)|=|Cc(W,z~),M|=22Cc(W,z~)F|\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle|=|\langle C_{c}(W,\tilde{z}),M\rangle|=\frac{\sqrt{2}}{2}\|C_{c}(W,\tilde{z})\|_{F}

and we conclude that maximization of |𝒞,𝒞2(x~,y~,z~)||\langle\mathcal{C},\mathcal{C}_{2}(\tilde{x},\tilde{y},\tilde{z})\rangle| is equivalent to maximization of Cc(W,z~)F\|C_{c}(W,\tilde{z})\|_{F}.
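A quick numerical check of this identity (our sketch, in Python/NumPy): for orthonormal x̃, ỹ and a unit vector z̃, the compressed matrix (6.20) is skew-symmetric and |⟨𝒞,𝒞2(x̃,ỹ,z̃)⟩| = (√2/2) ‖Cc(W,z̃)‖F.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 6, 4
C = rng.standard_normal((n, n, m))
C = 0.5 * (C - np.transpose(C, (1, 0, 2)))        # partial antisymmetry (6.1)

W, _ = np.linalg.qr(rng.standard_normal((n, 2)))  # orthonormal [x~ y~]
zt = rng.standard_normal(m)
zt /= np.linalg.norm(zt)
xt, yt = W[:, 0], W[:, 1]

# Compressed matrix (6.20): Cc = C x1 W^T x2 W^T x3 z~^T.
Cc = np.einsum('ijk,ia,jb,k->ab', C, W, W, zt)
assert np.allclose(Cc, -Cc.T)                     # 2x2 skew-symmetric

C2 = 0.5 * (np.einsum('i,j,k->ijk', xt, yt, zt)
            - np.einsum('i,j,k->ijk', yt, xt, zt))
# |<C, C2(x~,y~,z~)>| = (sqrt(2)/2) ||Cc||_F.
assert np.isclose(abs(np.sum(C * C2)), np.sqrt(2) / 2 * np.linalg.norm(Cc))
```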

Maximization of ‖Cc(W,z̃)‖F corresponds to the multilinear rank-(2,2,m) structure-preserving approximation of 𝒞. Similarly to what was done in [3] for the best antisymmetric multilinear rank-r approximation, we can establish an equivalence between the best partially antisymmetric multilinear rank-(2,2,m) approximation and the best unstructured rank-1 approximation of a partially antisymmetric tensor.

Proposition 6.4.

Let 𝒞n×n×m\mathcal{C}\in\mathbb{R}^{n\times n\times m} be a partially antisymmetric tensor. Then

max{𝒞×1UT×2UT×3zT:Un×2,UTU=I2,z2=1}\displaystyle\max\left\{\|\mathcal{C}\times_{1}U^{T}\times_{2}U^{T}\times_{3}z^{T}\|\ :\ U\in\mathbb{R}^{n\times 2},U^{T}U=I_{2},\|z\|_{2}=1\right\}
=2max{|𝒞×1u1T×2u2T×3zT|:u12=u22=z2=1,[u1u2]T[u1u2]=I2}\displaystyle=\sqrt{2}\max\left\{|\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}|\ :\ \|u_{1}\|_{2}=\|u_{2}\|_{2}=\|z\|_{2}=1,[u_{1}u_{2}]^{T}[u_{1}u_{2}]=I_{2}\right\} (6.22)
=2max{|𝒞×1v1T×2v2T×3zT|:v12=v22=z2=1}.\displaystyle=\sqrt{2}\max\left\{|\mathcal{C}\times_{1}v_{1}^{T}\times_{2}v_{2}^{T}\times_{3}z^{T}|\ :\ \|v_{1}\|_{2}=\|v_{2}\|_{2}=\|z\|_{2}=1\right\}. (6.23)
Proof.

Take α=𝒞×1u1T×2u2T×3zT\alpha=\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}. From the relations (6.20) and (6.21) we see that, for every partially antisymmetric tensor 𝒞\mathcal{C} and for U=[u1u2]U=[u_{1}u_{2}],

𝒞×1UT×2UT×3zT2=[0αα0]F2=2α2,\|\mathcal{C}\times_{1}U^{T}\times_{2}U^{T}\times_{3}z^{T}\|^{2}=\left\|\left[\begin{array}[]{cc}0&\alpha\\ -\alpha&0\\ \end{array}\right]\right\|_{F}^{2}=2\alpha^{2},

which proves (6.22).

Obviously, expression (6.22) is less than or equal to (6.23). Take the vectors v1, v2, z that maximize (6.23). There is an upper-triangular 2×2 matrix R such that |r11| ≤ 1, |r22| ≤ 1 and

[v1v2]=[u1u2]R\left[\begin{array}[]{cc}v_{1}&v_{2}\\ \end{array}\right]=\left[\begin{array}[]{cc}u_{1}&u_{2}\\ \end{array}\right]R

is a thin QR decomposition of [v1 v2]. Using the antisymmetry in the first two modes we have

|𝒞×1v1T×2v2T×3zT|\displaystyle|\mathcal{C}\times_{1}v_{1}^{T}\times_{2}v_{2}^{T}\times_{3}z^{T}| =|𝒞×1r11u1T×2(r12u1T+r22u2T)×3zT|\displaystyle=|\mathcal{C}\times_{1}r_{11}u_{1}^{T}\times_{2}(r_{12}u_{1}^{T}+r_{22}u_{2}^{T})\times_{3}z^{T}|
=|𝒞×1r11u1T×2r22u2T×3zT|\displaystyle=|\mathcal{C}\times_{1}r_{11}u_{1}^{T}\times_{2}r_{22}u_{2}^{T}\times_{3}z^{T}|
\displaystyle=|r_{11}r_{22}||\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}|\leq|\mathcal{C}\times_{1}u_{1}^{T}\times_{2}u_{2}^{T}\times_{3}z^{T}|.

This proves that the value of (6.22) is equal to the value of (6.23). ∎

Therefore, following the previous discussion and the result of Proposition 6.4, we obtain the equivalence between our Algorithm 6.3 and HOPM for finding the best unstructured rank-1 approximation. Then, as in the fully antisymmetric case, the convergence result for HOPM from [30] holds.

7. Numerical experiments

We provide numerical examples comparing the CP rank-1 approximation with a posteriori antisymmetrization (Algorithm 4.1) and the antisymmetry preserving CP (Algorithm 5.2). Additionally, for the sake of completeness, we compare these algorithms with the CP-ALS algorithm (Algorithm 2.1) with r=6, which does not preserve antisymmetry. As we will show, antisymmetry preserving CP outperforms CP with a posteriori antisymmetrization in terms of accuracy, which was expected, but also in execution time, while CP-ALS proved to be much slower than the other two algorithms and completely destroys the antisymmetric property.

All algorithms are implemented and tested in the Julia programming language [4], version 1.8.1, on a personal computer, with the BenchmarkTools [6] package used for determining the execution times of the algorithms (function @btime) and the TensorToolbox [25] package used for tensor calculations.

For a given tensor 𝒜 and an approximation 𝒜̃, we look at the relative error ‖𝒜−𝒜̃‖/‖𝒜‖. We run the CP-ALS algorithm, both on its own and within CP with a posteriori antisymmetrization, with tolerance 10⁻⁸, and we stop the antisymmetry preserving CP algorithm when either the relative error or the difference between the relative errors in two consecutive iterations falls below 10⁻⁸.

7.1. Example 1

First we generate an antisymmetric tensor 𝒜 of size n×n×n and rank 6 by randomly selecting three vectors x,y,z of size n and defining 𝒜 = 6𝒜6(x,y,z), where 𝒜6(x,y,z) is defined in (3.4). In this example we know that 𝒜 has the proposed structure. We evaluate and compare the accuracy and the speed of our algorithms. The results for different n are presented in Table 1. The best result in each column is bolded.

             n = 10                        n = 25                        n = 50
             err              time         err              time         err              time
CP+antisym   0.8333           224 μs       0.8333           905.9 μs     0.8333           **3.983 ms**
antisymCP    **8.21×10⁻¹⁶**   **69.9 μs**  **1.34×10⁻¹⁵**   **502.5 μs** **1.66×10⁻¹⁵**   8.283 ms
CP-ALS       5.27×10⁻⁶        8.472 ms     1.998×10⁻⁷       26.282 ms    8.43×10⁻⁸        187.625 ms

Table 1. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

Even though the execution time of CP with a posteriori antisymmetrization is comparable to antisymmetry preserving CP, by first approximating with non-antisymmetric tensor of rank 1, CP with a posteriori antisymmetrization loses the underlying structure, and results in an approximation with large error. CP-ALS manages to find a good non-antisymmetric approximation, but it requires much more time, so disregarding the antisymmetric property did not help either with accuracy or with execution times. Overall, antisymmetry preserving CP achieves best results.

Here, same as in the following examples, the initial vectors in Algorithm 5.2 are taken as random vectors. If we initialize the algorithm using the HOSVD, the number of iterations decreases, but the execution time increases because of the additional time needed to perform the HOSVD.

7.2. Example 2

Now we construct an antisymmetric tensor element-wise as

\mathcal{A}(i,j,k)=\sin(x_{i})\sin(y_{j})\sin(z_{k})+\sin(y_{i})\sin(z_{j})\sin(x_{k})+\sin(z_{i})\sin(x_{j})\sin(y_{k})
\quad-\sin(y_{i})\sin(x_{j})\sin(z_{k})-\sin(x_{i})\sin(z_{j})\sin(y_{k})-\sin(z_{i})\sin(y_{j})\sin(x_{k}),

where xi, yj and zk are sets of n equidistant points on the intervals [0,1], [2,10] and [1,3], respectively. Tensors of this type appear in signal processing applications. The accuracy and speed of our algorithms for different n are presented in Table 2. The best result in each column is bolded.

             n = 10                        n = 25                        n = 50
             err              time         err              time         err              time
CP+antisym   0.8333           220.9 μs     0.8333           912.7 μs     0.8333           **3.95 ms**
antisymCP    **7.55×10⁻¹⁶**   **111.5 μs** **9.2×10⁻¹⁶**    **3.439 μs** **1.575×10⁻¹⁵**  26.898 ms
CP-ALS       4.02×10⁻⁷        7.659 ms     3.91×10⁻⁹        29.453 ms    8.492×10⁻⁷       86.045 ms

Table 2. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

Similarly as in Example 7.1, antisymmetry preserving CP beats the other two methods in terms of accuracy and the speed of reaching an accurate solution.
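For reference, the tensor of this example can be generated in a few lines. This NumPy sketch (ours; the experiments in the paper are implemented in Julia) builds the sine-based tensor, with every odd permutation of the three factors entering with a minus sign, and checks that it is antisymmetric in all modes.

```python
import numpy as np

n = 10
x = np.linspace(0, 1, n)    # n equidistant points on [0, 1]
y = np.linspace(2, 10, n)   # on [2, 10]
z = np.linspace(1, 3, n)    # on [1, 3]
a, b, c = np.sin(x), np.sin(y), np.sin(z)

def outer3(u, v, w):
    return np.einsum('i,j,k->ijk', u, v, w)

# Signed sum over all permutations of (a, b, c): even +, odd -.
A = (outer3(a, b, c) + outer3(b, c, a) + outer3(c, a, b)
     - outer3(b, a, c) - outer3(a, c, b) - outer3(c, b, a))

# A is antisymmetric: swapping any two modes flips the sign.
assert np.allclose(A, -np.transpose(A, (1, 0, 2)))
assert np.allclose(A, -np.transpose(A, (0, 2, 1)))
assert np.allclose(A, -np.transpose(A, (2, 1, 0)))
```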

In the next two examples we use tensors of smaller sizes, because the ranks of those tensors increase with the size and, since we are approximating by a rank-6 tensor, we want tensors for which this type of approximation makes sense.

7.3. Example 3

Now we generate an antisymmetric tensor that does not necessarily have the structure (3.4) by discretizing the function f(x,y,z) = exp(x²+2y²+3z²) on a grid ξi = (i−1)/(n−1), i = 1,…,n, and then applying the antisymmetrizer (3.2). We test for different values of n and show the results in Table 3. Again, the best result in each column is bolded.

             n = 3                         n = 5                         n = 7
             err              time         err              time         err              time
CP+antisym   0.8333           185 μs       0.8339           260.2 μs     0.8345           276.6 μs
antisymCP    **1.61×10⁻¹⁴**   **17.1 μs**  **0.0557**       **87.4 μs**  **0.0802**       **130.4 μs**
CP-ALS       2.55×10⁻⁵        9.875 ms     **0.0557**       26.282 ms    **0.0802**       4.538 ms

Table 3. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

Antisymmetry preserving CP achieves the best execution times. When the tensor can be well approximated by a CP approximation of the form (3.4), it also achieves the best accuracy (n=3). Otherwise, it results in the same error as CP-ALS but with much lower execution times (n=5,7).

7.4. Example 4

We generate a random tensor of size n×n×n and antisymmetrize it with (3.2). We compare the three algorithms and present the results in Table 4. The best result in each column is bolded.

             n = 3                         n = 5                         n = 7
             err              time         err              time         err              time
CP+antisym   0.8333           184.6 μs     0.8546           336.5 μs     0.9242           743.3 μs
antisymCP    **3.616×10⁻¹⁶**  **17 μs**    0.3432           **139.9 μs** 0.723            **493.2 μs**
CP-ALS       8.11×10⁻⁸        14.364 ms    **0.2716**       20.172 ms    **0.6393**       421.051 ms

Table 4. Evaluation of the CP algorithm with a posteriori antisymmetrization (CP+antisym, Algorithm 4.1), antisymmetry preserving CP (antisymCP, Algorithm 5.2) and CP-ALS with r=6 (Algorithm 2.1) in terms of the relative error ‖𝒜−𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

As in the previous example, when a tensor can be well approximated by a CP decomposition with six summands (here for n = 3), the antisymmetry-preserving CP achieves the best results. For n = 5, 7 it gives somewhat worse accuracy than CP-ALS, but returns the approximation in much shorter times, and, unlike CP-ALS, it preserves the antisymmetry. Note that the rank of a random antisymmetric tensor is much higher than six, which is why all approximations produce a high relative error.

7.5. Example 5. Partial antisymmetry

For the partial antisymmetry, we compare Algorithm 6.1 (CP with partial a posteriori antisymmetrization) and Algorithm 6.3 (CP preserving partial antisymmetry) with the standard CP-ALS (Algorithm 2.1) using r = 2, which ignores the structure.

Here, regardless of how we construct the tensor 𝒜, all methods give approximately the same error. Again, the CP preserving partial antisymmetry stands out in terms of execution times. We present the results in Table 5, with tensors 𝒜₁, 𝒜₂ and 𝒜₃ defined as follows:

  • 𝒜₁ is an 8×8×10 tensor constructed by randomly selecting vectors x, y, z of sizes 8, 8, 10, respectively, and setting 𝒜₁ = 2𝒞₂, where 𝒞₂ is defined in (6.2).

  • 𝒜₂ is a 5×5×7 tensor constructed from the function in the same way as in Example 3 (Section 7.3).

  • 𝒜₃ is a 5×5×4 tensor generated by partially antisymmetrizing a tensor with randomly selected elements, using the anti₁,₂ operator.
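The anti₁,₂ operator used for 𝒜₃ antisymmetrizes only the first two modes. A minimal NumPy sketch (illustrative; the paper's code is in Julia):

```python
import numpy as np

def anti_12(C):
    # project onto tensors antisymmetric in modes 1 and 2; mode 3 untouched
    return 0.5 * (C - np.transpose(C, (1, 0, 2)))

rng = np.random.default_rng(0)
A3 = anti_12(rng.standard_normal((5, 5, 4)))  # the shape used for A3 above

# antisymmetric in the first two modes, and anti_12 is a projection
assert np.allclose(A3, -np.transpose(A3, (1, 0, 2)))
assert np.allclose(anti_12(A3), A3)
```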

| method      | err (𝒜₁)    | time (𝒜₁)   | err (𝒜₂) | time (𝒜₂)    | err (𝒜₃) | time (𝒜₃)    |
|-------------|-------------|-------------|----------|--------------|----------|--------------|
| CP+pantisym | 1.88×10⁻¹⁶  | 202.7 μs    | 0.1001   | 269.4 μs     | 0.7175   | 569.2 μs     |
| pantisymCP  | 5.832×10⁻¹⁶ | **30.2 μs** | 0.1001   | **53.70 μs** | 0.7175   | **106.6 μs** |
| CP-ALS      | 6.774×10⁻¹⁶ | 1.051 ms    | 0.1001   | 1.282 ms     | 0.7175   | 3.026 ms     |

Table 5. Evaluation of the CP algorithm with partial a posteriori antisymmetrization (CP+pantisym, Algorithm 6.1), the CP preserving partial antisymmetry (pantisymCP, Algorithm 6.3), and CP-ALS with r = 2 (Algorithm 2.1), in terms of relative error ‖𝒜 − 𝒜̃‖/‖𝒜‖ and execution times obtained by the function @btime.

8. Conclusion

We described an antisymmetric tensor format 𝒜₆(x, y, z) determined by only three vectors, x, y, z ∈ ℝⁿ. For any n, tensors of the form 𝒜₆(x, y, z) have rank at most six. We developed an ALS algorithm for the structure-preserving low-rank approximation of an antisymmetric tensor 𝒜 by a tensor of the form 𝒜̃ = 𝒜₆(x, y, z). To derive the algorithm, we wrote the objective function as three explicitly given quadratic forms, one for each mode; in each microiteration, the algorithm solves the quadratic optimization problem for the corresponding tensor mode.
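Concretely, 𝒜₆(x, y, z) can be formed as the six signed outer products of x, y, z, i.e. six times the antisymmetrized rank-1 tensor x∘y∘z. A NumPy sketch under that reading (illustrative, not the paper's Julia code):

```python
import numpy as np
from itertools import permutations

def A6(x, y, z):
    # x o y o z - x o z o y - y o x o z + y o z o x + z o x o y - z o y o x:
    # an antisymmetric tensor of rank at most six, built from three vectors
    signs = (1, -1, -1, 1, 1, -1)  # parities of the six permutations
    return sum(s * np.einsum('i,j,k->ijk', a, b, c)
               for (a, b, c), s in zip(permutations((x, y, z)), signs))

x, y, z = np.random.default_rng(1).standard_normal((3, 6))
T = A6(x, y, z)

assert np.allclose(T, -np.transpose(T, (1, 0, 2)))   # antisymmetric
assert np.linalg.matrix_rank(T.reshape(6, -1)) <= 3  # mode-1 unfolding rank <= 3
```

The last assertion reflects that every mode-1 fiber of T lies in span{x, y, z}, so the multilinear rank of this format is at most (3, 3, 3).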

We showed that our minimization problem

‖𝒜 − 𝒜₆(x, y, z)‖ → min

over x, y, z ∈ ℝⁿ can be viewed as a minimization problem over orthonormal vectors x̃, ỹ, z̃ ∈ ℝⁿ,

‖𝒜 − 𝒜₆(x̃, ỹ, z̃)‖ → min.

Further on, we demonstrated that this minimization problem is equivalent to the maximization problem

‖𝒜 ×₁ Vᵀ ×₂ Vᵀ ×₃ Vᵀ‖ → max,

where V ∈ ℝ^{n×3} is a matrix with orthonormal columns. This maximization problem corresponds to the problem of the best multilinear low-rank approximation of antisymmetric tensors. Since the antisymmetric multilinear low-rank approximation is equivalent to the best unstructured rank-1 approximation, we were able to relate our algorithm to the higher-order power method (HOPM). Therefore, the global convergence results for HOPM apply here.
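For reference, HOPM is the ALS iteration for the best unstructured rank-1 approximation; a generic NumPy sketch (illustrative only, not the implementation the paper relates to):

```python
import numpy as np

def hopm(A, iters=100, seed=0):
    # higher-order power method: best rank-1 approximation lam * u o v o w
    # of an order-3 tensor A, by alternating contractions on unit vectors
    rng = np.random.default_rng(seed)
    u, v, w = (rng.standard_normal(m) for m in A.shape)
    for _ in range(iters):
        u = np.einsum('ijk,j,k->i', A, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', A, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', A, u, v); w /= np.linalg.norm(w)
    lam = np.einsum('ijk,i,j,k->', A, u, v, w)
    return lam, u, v, w

# sanity check: HOPM recovers an exact rank-1 tensor
rng = np.random.default_rng(2)
a, b, c = (t / np.linalg.norm(t) for t in rng.standard_normal((3, 5)))
T = 3.0 * np.einsum('i,j,k->ijk', a, b, c)
lam, u, v, w = hopm(T)
approx = lam * np.einsum('i,j,k->ijk', u, v, w)
assert np.linalg.norm(T - approx) / np.linalg.norm(T) < 1e-8
```

A random start is used on purpose: for an antisymmetric tensor, identical initial vectors in two modes would contract to zero.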

For tensors with partial antisymmetry we established a partially antisymmetric tensor format 𝒞₂(x, y, z) determined by three vectors, x, y ∈ ℝⁿ and z ∈ ℝᵐ. Tensors of the form 𝒞₂(x, y, z) have rank two. We created a similar ALS algorithm for the structure-preserving rank-2 approximation of a partially antisymmetric tensor 𝒞 by a tensor of the form 𝒞̃ = 𝒞₂(x, y, z). Analogously to the fully antisymmetric case, we verified that this algorithm is equivalent to HOPM.
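Under a natural two-term reading of (6.2) (an assumption here, since (6.2) itself is outside this excerpt), 𝒞₂(x, y, z) = x∘y∘z − y∘x∘z. A NumPy sketch:

```python
import numpy as np

def C2(x, y, z):
    # assumed rank-2 partially antisymmetric format: x o y o z - y o x o z;
    # each mode-3 slice C[:, :, k] equals z_k * (x y^T - y x^T)
    return (np.einsum('i,j,k->ijk', x, y, z)
            - np.einsum('i,j,k->ijk', y, x, z))

rng = np.random.default_rng(3)
x, y = rng.standard_normal((2, 5))
z = rng.standard_normal(7)
C = C2(x, y, z)

assert np.allclose(C, -np.transpose(C, (1, 0, 2)))  # antisymmetric in modes 1, 2
```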

The method described in this paper can be generalized to the approximation problem for other antisymmetric structures. As long as the target format can be written as a sum of multilinear terms, the linearity in each mode again leads to quadratic optimization problems that can be handled in the same way, only with different coefficient matrices and vectors. For example, instead of the antisymmetric rank-6 approximation, one could in this way find an antisymmetric rank-6r approximation represented by 3r vectors. The paper limited its scope to order-3 tensors; for antisymmetric order-d tensors, the analogous rank-d!r approximation would be represented by dr vectors.

Acknowledgments

The authors would like to thank the anonymous referees, whose insightful remarks improved the paper, in particular the comments regarding the orthogonality of the resulting vectors and the equivalence to HOPM.

References

  • [1] E. Acar, D. M. Dunlavy, and T. G. Kolda, A scalable optimization approach for fitting canonical tensor decompositions, Journal of Chemometrics, 25 (2011), pp. 67–86.
  • [2] E. Acar and B. Yener, Unsupervised multiway data analysis: A literature survey, IEEE Trans. Knowledge Data Engrg., 21 (2009), pp. 6–20.
  • [3] E. Begović Kovač and D. Kressner, Structure-preserving low multilinear rank approximation of antisymmetric tensors, SIAM J. Matrix Anal. Appl., 38 (2017), pp. 967–983.
  • [4] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah, Julia: a fresh approach to numerical computing, SIAM Rev., 59 (2017), pp. 65–98.
  • [5] J. D. Carroll and J. J. Chang, Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition, Psychometrika, 35 (1970), pp. 283–319.
  • [6] J. Chen and J. Revels, Robust benchmarking in noisy environments, ArXiv, 1608.04295 (2016).
  • [7] P. Comon, G. Golub, L.-H. Lim, and B. Mourrain, Symmetric tensors and symmetric tensor rank, SIAM J. Matrix Anal. Appl., 30 (2008), pp. 1254–1279.
  • [8] P. Comon, X. Luciani, and A. L. F. de Almeida, Tensor decompositions, alternating least squares and other tales, Journal of Chemometrics, 23 (2009), pp. 393–405.
  • [9] L. De Lathauwer, B. De Moor, and J. Vandewalle, A multilinear singular value decomposition, SIAM J. Matrix Anal. Appl., 21 (2000), pp. 1253–1278.
  • [10] L. De Lathauwer, B. De Moor, and J. Vandewalle, Computation of the canonical decomposition by means of a simultaneous generalized Schur decomposition, SIAM J. Matrix Anal. Appl., 26 (2004/05), pp. 295–327.
  • [11] M. Espig, W. Hackbusch, and A. Khachatryan, On the convergence of alternating least squares optimisation in tensor format representations, ArXiv, 1503.05431 (2015).
  • [12] H. Goldstein, C. P. Poole, and J. L. Safko, Classical Mechanics, Pearson, 3rd ed., 2001.
  • [13] L. Grasedyck, M. Kluge, and S. Krämer, Variants of alternating least squares tensor completion in the tensor train format, SIAM J. Sci. Comput., 37 (2015), pp. A2424–A2450.
  • [14] W. H. Greub, Multilinear algebra, vol. Band 136 of Die Grundlehren der mathematischen Wissenschaften, Springer-Verlag New York, Inc., New York, 1967.
  • [15] W. Hackbusch, On the representation of symmetric and antisymmetric tensors, in Contemporary computational mathematics—a celebration of the 80th birthday of Ian Sloan. Vol. 1, 2, Springer, Cham, 2018, pp. 483–515.
  • [16] R. A. Harshman, Foundations of the parafac procedure: Models and conditions for an “explanatory” multi-modal factor analysis, UCLA Working Papers in Phonetics, 16 (1970), pp. 1–84.
  • [17] F. L. Hitchcock, The expression of a tensor or a polyadic as a sum of products, J. Math. Phys., 6 (1927), pp. 164–189.
  • [18] F. L. Hitchcock, Multiple invariants and generalized rank of a p-way matrix or tensor, J. Math. Phys., 7 (1927), pp. 39–79.
  • [19] D. Hong, T. G. Kolda, and J. A. Duersch, Generalized canonical polyadic tensor decomposition, SIAM Rev., 62 (2020), pp. 133–163.
  • [20] F. Hummel, T. Tsatsoulis, and A. Grüneis, Low rank factorization of the Coulomb integrals for periodic coupled cluster theory, The Journal of Chemical Physics, 146 (2017), p. 124105.
  • [21] T. G. Kolda, Numerical optimization for symmetric tensor decomposition, Math. Program., 151 (2015), pp. 225–248.
  • [22] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., 51 (2009), pp. 455–500.
  • [23] J. M. Landsberg, Tensors: geometry and applications, vol. 128 of Graduate Studies in Mathematics, American Mathematical Society, Providence, RI, 2012.
  • [24] R. Minster, I. Viviano, X. Liu, and G. Ballard, CP decomposition for tensors via alternating least squares with QR decomposition, Numer. Linear Algebra Appl., 30 (2023), Paper No. e2511.
  • [25] L. Periša, Julia tensor toolbox, (2019).
  • [26] K. F. Riley, M. P. Hobson, and S. J. Bence, Mathematical Methods for Physics and Engineering, Cambridge University Press, 3rd ed., 2006.
  • [27] S. Szalay, M. Pfeffer, V. Murg, G. Barcza, F. Verstraete, R. Schneider, and O. Legeza, Tensor product methods and entanglement optimization for ab initio quantum chemistry, International Journal of Quantum Chemistry, 115 (2015), pp. 1342–1391.
  • [28] P.-H. Tsai, P. Fischer, and E. Solomonik, Accelerating the galerkin reduced-order model with the tensor decomposition for turbulent flows, ArXiv, 2311.03694 (2023).
  • [29] A. Uschmajew, Local convergence of the alternating least squares algorithm for canonical tensor approximation, SIAM J. Matrix Anal. Appl., 33 (2012), pp. 639–652.
  • [30] A. Uschmajew, A new convergence proof for the higher-order power method and generalizations, Pac. J. Optim., 11 (2015), pp. 309–321.