
Tensor Completion through Total Variation with Initialization from Weighted HOSVD

Zehan Chao, Longxiu Huang and Deanna Needell
Abstract

In this paper, we study the tensor completion problem when the sampling pattern is deterministic. We first propose a simple but efficient weighted HOSVD algorithm for recovery from noisy observations. We then use the weighted HOSVD result as an initialization for total variation minimization. We establish the accuracy of the weighted HOSVD algorithm both theoretically and numerically. The numerical simulations also show that, with the proposed initialization, the total variation algorithm efficiently fills in the missing data for images and videos.

I Introduction

Tensors, high-dimensional arrays that extend matrices, play an important role in a wide range of real-world applications [3, 12]. Owing to their high-dimensional structure, tensors can preserve more information than their unfolded matrices. For instance, a $k$-frame, $m\times n$ video stored as an $m\times n\times k$ tensor keeps the connection between frames, whereas splitting the frames or unfolding the tensor may lose some of this conjunctional information. A brain MRI likewise benefits from the 3D structure when stored as a tensor instead of as an arbitrary arrangement of 2D snapshot matrices. On the other hand, most real-world datasets are partially observed and incomplete, which can lead to extremely low performance in downstream applications. The linear dependency and redundancy between missing and observed data can be leveraged to recover the unavailable entries and improve the quality and scale of the incomplete dataset. The task of recovering missing elements from a partially observed tensor is called tensor completion, and it has attracted widespread attention in many applications, e.g., image/video inpainting [23, 25] and recommendation systems [26]. The matrix completion problem [1, 2, 4, 5, 6, 7, 10, 15, 17, 18], a special case of tensor completion, has been well studied over the past few decades and has guided the development of tensor completion algorithms. Among the different types of data matrices, image data is commonly studied and widely used as a performance indicator. One traditional way to address image denoising is to minimize the total variation norm, which rests on the assumption that the data are locally smooth. In recent decades, thanks to algorithmic developments in non-negative matrix factorization (NMF) and nuclear norm minimization (NNM), the low-rank structure assumption has become increasingly popular and is extensively applied in related studies. In both matrix and tensor completion, researchers try to utilize and balance both assumptions in order to improve performance on image and video recovery tasks.

In this work, we improve the total variation minimization approach by providing a proper initialization. To obtain the initialization, we design a simple but efficient algorithm, which we call the weighted HOSVD algorithm, for low-rank tensor completion from a deterministic sampling pattern; it is motivated by [11, 15].

II Tensor Completion Problem

In this section, we give a formal definition of the tensor completion problem. First, we introduce notation, basic operations, and definitions for tensors.

II-A Preliminaries and Notations

Tensors, matrices, vectors and scalars are denoted in different typefaces for clarity. Throughout this paper, calligraphic boldface capital letters are used for tensors, capital letters for matrices, lower-case boldface letters for vectors, and regular letters for scalars. The set of the first $d$ natural numbers is denoted by $[d]:=\{1,\cdots,d\}$. For $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ and $\alpha\in\mathbb{R}$, $\mathcal{T}^{(\alpha)}$ denotes the pointwise power, i.e., $(\mathcal{T}^{(\alpha)})_{i_{1}\cdots i_{n}}=(\mathcal{T}_{i_{1}\cdots i_{n}})^{\alpha}$. We write $\mathcal{T}\succ 0$ for a tensor with $\mathcal{T}_{i_{1}\cdots i_{n}}>0$ for all $i_{1},\cdots,i_{n}$. $\boldsymbol{1}_{\Omega}$ denotes the tensor whose entries equal $1$ on $\Omega$ and $0$ otherwise.

Definition 1 (Tensor).

A tensor is a multidimensional array. The number of dimensions of a tensor is called its order (the dimensions are also called modes). The space of real tensors of order $n$ and size $d_{1}\times\cdots\times d_{n}$ is denoted by $\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$. The elements of a tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ are denoted by $\mathcal{T}_{i_{1}\cdots i_{n}}$.

An order-$n$ tensor $\mathcal{T}$ can be matricized in $n$ ways by unfolding it along each of its $n$ modes. Next, we give the definition of the matricization of a given tensor.

Definition 2 (Matricization of a tensor).

The mode-$k$ matricization of a tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$, denoted $\mathcal{T}_{(k)}\in\mathbb{R}^{d_{k}\times\prod_{j\neq k}d_{j}}$, is the matrix whose columns are the vectors obtained from $\mathcal{T}$ by fixing all indices except the $k$-th.

In order to illustrate the matricization of a tensor, let us consider the following example.

Example 1.

Let $\mathcal{T}\in\mathbb{R}^{3\times 4\times 2}$ have the following frontal slices:

T_{1}=\begin{bmatrix}1&4&7&10\\ 2&5&8&11\\ 3&6&9&12\end{bmatrix},\qquad T_{2}=\begin{bmatrix}13&16&19&22\\ 14&17&20&23\\ 15&18&21&24\end{bmatrix},

then the three mode-$k$ matricizations, $k=1,2,3$, are

\mathcal{T}_{(1)} = \begin{bmatrix}1&4&7&10&13&16&19&22\\ 2&5&8&11&14&17&20&23\\ 3&6&9&12&15&18&21&24\end{bmatrix},
\mathcal{T}_{(2)} = \begin{bmatrix}1&2&3&13&14&15\\ 4&5&6&16&17&18\\ 7&8&9&19&20&21\\ 10&11&12&22&23&24\end{bmatrix},
\mathcal{T}_{(3)} = \begin{bmatrix}1&2&3&\cdots&10&11&12\\ 13&14&15&\cdots&22&23&24\end{bmatrix}.
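To make the matricization concrete, the following is a minimal NumPy sketch (our own illustration, not code from the paper; unfold and fold are hypothetical helper names) that reproduces the unfoldings in Example 1 and implements the inverse folding operator of Definition 3 below.

```python
import numpy as np

def unfold(T, k):
    """Mode-k matricization: a (d_k x prod_{j != k} d_j) matrix (modes are 0-indexed)."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1, order="F")

def fold(M, k, shape):
    """Inverse of unfold: rebuild the tensor of the given shape from its mode-k unfolding."""
    moved = [shape[k]] + [d for j, d in enumerate(shape) if j != k]
    return np.moveaxis(M.reshape(moved, order="F"), 0, k)

T = np.zeros((3, 4, 2))
T[:, :, 0] = np.arange(1, 13).reshape(4, 3).T    # frontal slice T_1
T[:, :, 1] = np.arange(13, 25).reshape(4, 3).T   # frontal slice T_2
print(unfold(T, 0))   # reproduces T_(1) from Example 1
print(unfold(T, 2))   # reproduces T_(3)
assert np.allclose(fold(unfold(T, 1), 1, T.shape), T)
```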
Definition 3 (Folding Operator).

Let $\mathcal{T}$ be a tensor. The mode-$k$ folding operator applied to a matrix $M=\mathcal{T}_{(k)}$, denoted $\text{fold}_{k}(M)$, is the inverse of the mode-$k$ unfolding operator, i.e., $\text{fold}_{k}(\mathcal{T}_{(k)})=\mathcal{T}$.

Definition 4 (\infty norm).

Let $\mathcal{T}\in\mathbb{R}^{d_{1}\times d_{2}\times\cdots\times d_{n}}$. The norm $\|\mathcal{T}\|_{\infty}$ is defined as

\|\mathcal{T}\|_{\infty}=\max_{i_{1},i_{2},\cdots,i_{n}}|\mathcal{T}_{i_{1}i_{2}\cdots i_{n}}|.

The unit ball under the $\infty$ norm is denoted by $\boldsymbol{B}_{\infty}$.

Definition 5 (Frobenius norm).

The Frobenius norm of a tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times d_{2}\times\cdots\times d_{n}}$ is defined as

\|\mathcal{T}\|_{F}=\sqrt{\sum_{i_{1},i_{2},\cdots,i_{n}}\mathcal{T}_{i_{1}i_{2}\cdots i_{n}}^{2}}.
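As a quick illustration (a NumPy sketch under our own conventions, not from the paper), the norms in Definitions 4 and 5 can be computed as follows.

```python
import numpy as np

T = np.random.randn(3, 4, 2)
inf_norm  = np.max(np.abs(T))        # ||T||_inf: largest entry in absolute value
frob_norm = np.sqrt(np.sum(T ** 2))  # ||T||_F: square root of the sum of squared entries
```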
Definition 6 (Product Operations).
  • Mode-$k$ product: The mode-$k$ product of a tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ and a matrix $A\in\mathbb{R}^{d\times d_{k}}$ is defined by

    \mathcal{T}\times_{k}A=\text{fold}_{k}(A\mathcal{T}_{(k)}),

    i.e.,

    (\mathcal{T}\times_{k}A)_{i_{1}\cdots i_{k-1}ji_{k+1}\cdots i_{n}}=\sum_{i_{k}=1}^{d_{k}}\mathcal{T}_{i_{1}i_{2}\cdots i_{n}}A_{ji_{k}}.

  • Outer product: Let $\boldsymbol{a_{1}}\in\mathbb{R}^{d_{1}},\cdots,\boldsymbol{a_{n}}\in\mathbb{R}^{d_{n}}$. The outer product of these $n$ vectors is the tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ defined by

    \mathcal{T}=\boldsymbol{a_{1}}\boldsymbol{\otimes}\cdots\boldsymbol{\otimes}\boldsymbol{a_{n}},\quad\mathcal{T}_{i_{1},\cdots,i_{n}}=\prod_{k=1}^{n}\boldsymbol{a_{k}}(i_{k}).

  • Kronecker product of matrices: The Kronecker product of $A\in\mathbb{R}^{I\times J}$ and $B\in\mathbb{R}^{K\times L}$ is denoted by $A\otimes B$. The result is a matrix of size $(KI)\times(JL)$ defined by

    A\otimes B=\begin{bmatrix}A_{11}B&A_{12}B&\cdots&A_{1J}B\\ A_{21}B&A_{22}B&\cdots&A_{2J}B\\ \vdots&\vdots&\ddots&\vdots\\ A_{I1}B&A_{I2}B&\cdots&A_{IJ}B\end{bmatrix}.

  • Khatri-Rao product: Given matrices $A\in\mathbb{R}^{d_{1}\times r}$ and $B\in\mathbb{R}^{d_{2}\times r}$, their Khatri-Rao product is denoted by $A\odot B$. The result is a matrix of size $(d_{1}d_{2})\times r$ defined by

    A\odot B=\begin{bmatrix}\boldsymbol{a_{1}}\otimes\boldsymbol{b_{1}}&\cdots&\boldsymbol{a_{r}}\otimes\boldsymbol{b_{r}}\end{bmatrix},

    where $\boldsymbol{a_{i}}$ and $\boldsymbol{b_{i}}$ denote the $i$-th columns of $A$ and $B$, respectively.

  • Hadamard product: Given two tensors $\mathcal{X}$ and $\mathcal{Y}$, both of size $d_{1}\times\cdots\times d_{n}$, their Hadamard product is denoted by $\mathcal{X}\boxdot\mathcal{Y}$. The result is also of size $d_{1}\times d_{2}\times\cdots\times d_{n}$, with elements given by the elementwise product, i.e.,

    (\mathcal{X}\boxdot\mathcal{Y})_{i_{1}i_{2}\cdots i_{n}}=\mathcal{X}_{i_{1}i_{2}\cdots i_{n}}\mathcal{Y}_{i_{1}i_{2}\cdots i_{n}}.
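The following NumPy sketch illustrates the products in Definition 6 (our own illustration with assumed conventions; it reuses the hypothetical unfold and fold helpers from the earlier sketch).

```python
import numpy as np

def mode_k_product(T, A, k):
    """T x_k A = fold_k(A @ T_(k)) for A of size (d x d_k)."""
    new_shape = list(T.shape)
    new_shape[k] = A.shape[0]
    return fold(A @ unfold(T, k), k, tuple(new_shape))

# Outer product of n vectors gives a rank-one tensor (in the sense of Definition 7 below).
a1, a2, a3 = np.random.randn(3), np.random.randn(4), np.random.randn(2)
rank_one = np.einsum("i,j,k->ijk", a1, a2, a3)

# Kronecker, Khatri-Rao (columnwise Kronecker), and Hadamard products.
A, B = np.random.randn(3, 5), np.random.randn(4, 5)
kron = np.kron(A, B)                                       # size (12, 25)
khatri_rao = np.einsum("ir,jr->ijr", A, B).reshape(-1, 5)  # size (12, 5)
X, Y = np.random.randn(3, 4, 2), np.random.randn(3, 4, 2)
hadamard = X * Y                                           # elementwise product
```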
Definition 7 (Rank-one Tensors).

An $n$-order tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times d_{2}\times\cdots\times d_{n}}$ is rank one if it can be written as the outer product of $n$ vectors, i.e.,

\mathcal{T}=\boldsymbol{a_{1}}\boldsymbol{\otimes}\cdots\boldsymbol{\otimes}\boldsymbol{a_{n}}.
Definition 8 (Tensor (CP) rank [20, 21]).

The rank of a tensor $\mathcal{X}$, denoted $\text{rank}(\mathcal{X})$, is defined as the smallest number of rank-one tensors that generate $\mathcal{X}$ as their sum. We use $K_{r}$ to denote the cone of rank-$r$ tensors.

Unlike the matrix case, the rank of a tensor is not yet well understood, and computing the rank of a tensor is NP-hard. Next, we introduce another notion of rank for tensors.

Definition 9 (Tucker rank [21]).

Let $\mathcal{X}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$. The tuple $(r_{1},\cdots,r_{n})\in\mathbb{N}^{n}$, where $r_{k}=\text{rank}(\mathcal{X}_{(k)})$, is called the Tucker rank of $\mathcal{X}$. We use $K_{\boldsymbol{r}}$ to denote the cone of tensors of Tucker rank $\boldsymbol{r}$.

II-B Problem Statement

In order to find a good initialization for the total variation method, we would like to address the following questions.

Question 1.

Given a deterministic sampling pattern Ω\Omega and corresponding (possibly noisy) observations from the tensor, what type of recovery error can we expect, in what metric, and how may we efficiently implement this recovery?

Question 2.

Given a sampling pattern $\Omega$ and noisy observations $\mathcal{T}_{\Omega}+\mathcal{Z}_{\Omega}$, for what rank-one weight tensor $\mathcal{H}$ can we efficiently find a tensor $\widehat{\mathcal{T}}$ so that $\|\mathcal{H}\boxdot(\widehat{\mathcal{T}}-\mathcal{T})\|_{F}$ is small compared to $\|\mathcal{H}\|_{F}$? And how can we efficiently find such a weight tensor $\mathcal{H}$, or certify that a fixed $\mathcal{H}$ has this property?

In order to find the weight tensor, we consider the optimization problem

\mathcal{W}:=\underset{\mathcal{X}\succ 0,\ \text{rank}(\mathcal{X})=1}{\text{argmin}}\|\mathcal{X}-\boldsymbol{1}_{\Omega}\|_{F}.

$\mathcal{W}$ can be estimated using the least-squares CP algorithm [8, 19]. Once $\mathcal{W}$ is found, we consider the following optimization problem to estimate $\mathcal{T}$:

\widehat{\mathcal{T}}=\mathcal{W}^{(-1/2)}\boxdot\underset{\text{Tucker\_rank}(\mathcal{T})=\boldsymbol{r}}{\text{argmin}}\|\mathcal{T}-\mathcal{W}^{(-1/2)}\boxdot\mathcal{Y}_{\Omega}\|_{F},\qquad(1)

where $\mathcal{Y}_{\Omega}=\mathcal{T}_{\Omega}+\mathcal{Z}_{\Omega}$. Solving problem (1) exactly is NP-hard. In order to solve (1) approximately in polynomial time, we consider the HOSVD process [13]. Assume that $\mathcal{T}$ has Tucker rank $\boldsymbol{r}=[r_{1},\cdots,r_{n}]$. Let

\widehat{A}_{i}=\underset{\text{rank}(A)=r_{i}}{\text{argmin}}\|A-(\mathcal{W}^{(-1/2)}\boxdot\mathcal{Y}_{\Omega})_{(i)}\|_{2},

and set $\widehat{U}_{i}$ to be the matrix of left singular vectors of $\widehat{A}_{i}$. Then the estimated tensor is of the form

\widehat{\mathcal{T}}=\mathcal{W}^{(-1/2)}\boxdot\left((\mathcal{W}^{(-1/2)}\boxdot\mathcal{Y}_{\Omega})\times_{1}\widehat{U}_{1}\widehat{U}_{1}^{T}\times_{2}\cdots\times_{n}\widehat{U}_{n}\widehat{U}_{n}^{T}\right).

In the following, we refer to this procedure as the weighted HOSVD algorithm.
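A minimal sketch of this procedure is given below (our own NumPy illustration, not the authors' implementation). It replaces the least-squares CP routine of [8, 19] with a simple rank-one alternating least-squares fit of $\mathcal{W}$ to $\boldsymbol{1}_{\Omega}$, and reuses the hypothetical unfold and mode_k_product helpers from the earlier sketches.

```python
import numpy as np
from functools import reduce

def rank_one_weight(mask, iters=100, eps=1e-12):
    """Fit a strictly positive rank-one tensor W ~ 1_Omega by alternating least squares."""
    ws = [np.ones(d) for d in mask.shape]
    M = mask.astype(float)
    for _ in range(iters):
        for k in range(M.ndim):
            num = M
            for j in range(M.ndim - 1, -1, -1):   # contract every mode except k
                if j != k:
                    num = np.tensordot(num, ws[j], axes=([j], [0]))
            den = np.prod([ws[j] @ ws[j] for j in range(M.ndim) if j != k])
            ws[k] = np.maximum(num / den, eps)    # keep the factors (and hence W) positive
    return reduce(np.multiply.outer, ws)

def weighted_hosvd(Y_omega, mask, ranks):
    """T_hat = W^(-1/2) . [ (W^(-1/2) . Y_Omega) x_1 U1 U1^T ... x_n Un Un^T ]."""
    W = rank_one_weight(mask)
    G = W ** (-0.5) * Y_omega                     # W^(-1/2) . Y_Omega
    proj = G
    for k, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(G, k), full_matrices=False)
        Uk = U[:, :r]                             # top-r_k left singular vectors of the unfolding
        proj = mode_k_product(proj, Uk @ Uk.T, k) # project mode k onto the span of Uk
    return W ** (-0.5) * proj
```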

With the output $\widehat{\mathcal{T}}$ from the weighted HOSVD, we can solve the following total variation problem with $\widehat{\mathcal{T}}$ as the initialization:

\min_{\mathcal{X}}\quad \|\mathcal{X}\|_{TV}
\text{s.t.}\quad \mathcal{X}_{\Omega}=\mathcal{Y}_{\Omega}.

This total variation minimization problem is solved by an iterative method. We discuss the details of the algorithm in Section IV and its numerical performance in Section VI.

III Related Work

In this section, we will briefly step through the history of matrix completion and introduce several relevant studies on tensor completion.

Suppose we have a partially observed matrix $M$ under the low-rank assumption with a deterministic sampling pattern $\Omega$. The most intuitive optimization problem is:

\min_{X}\quad \text{rank}(X),
\text{s.t.}\quad X_{\Omega}=M_{\Omega}.

However, since this minimization problem is NP-hard, researchers developed surrogates that can be solved in polynomial time. Two prominent substitutes are nuclear norm minimization (NNM) and low-rank matrix factorization (LRMF):

\min_{X}\quad \|X\|_{*}
\text{s.t.}\quad X_{\Omega}=M_{\Omega},

where \|\cdot\|_{*} stands for the sum of singular values.

\min_{A,B}\quad \|(M-AB)_{\Omega}\|_{F}
\text{s.t.}\quad A\in\mathbb{R}^{m\times r},\ B\in\mathbb{R}^{r\times n},

where $A\in\mathbb{R}^{m\times r}$ and $B\in\mathbb{R}^{r\times n}$ are the low-rank factors.

When moving from matrix to tensor completion, the low-rank assumption becomes even harder to exploit. Some recent studies applied matrix completion algorithms to the unfolded tensor and obtained considerable results. For example, [25] introduced the nuclear norms of the unfolded tensors and took a weighted average as the loss function. They proposed several algorithms, such as FaLRTC and SiLRTC, to solve the minimization problem:

\min_{\mathcal{X}}\quad \sum_{i=1}^{n}\alpha_{i}\|\mathcal{X}_{(i)}\|_{*}
\text{s.t.}\quad \mathcal{X}_{\Omega}=\mathcal{T}_{\Omega}.

[28] applied low-rank matrix factorization (LRMF) to the unfoldings along all modes and defined the minimization problem as follows:

\min_{\mathcal{X},\mathbf{A},\mathbf{B}}\quad \sum_{i=1}^{n}\alpha_{i}\|\mathcal{X}_{(i)}-A_{i}B_{i}\|_{F}^{2}
\text{s.t.}\quad \mathcal{X}_{\Omega}=\mathcal{T}_{\Omega},

where $\mathbf{A}=\{A_{1},...,A_{n}\}$ and $\mathbf{B}=\{B_{1},...,B_{n}\}$ are sets of low-rank matrices whose sizes are determined by the corresponding unfoldings. This method, called TMac, can be solved using alternating minimization.

Since researchers often test the performance of their tensor completion algorithms on image/video/MRI data, they have begun to combine NNM and LRMF with total variation norm minimization for these recovery tasks. For example, [22] introduced TV regularization into the tensor completion problem:

\min_{\mathcal{X},A,B}\quad \sum_{i=1}^{n}\alpha_{i}\|\mathcal{X}_{(i)}-A_{i}B_{i}\|_{F}^{2}+\mu\|B_{3}\|_{TV}
\text{s.t.}\quad \mathcal{X}_{\Omega}=\mathcal{T}_{\Omega}.

Note that the term $\|B_{3}\|_{TV}$ only measures the TV norm along the first two modes. For example, if $\mathcal{X}$ is a video treated as a 3-order tensor, this TV norm only counts the variation within each frame, not the variation between frames.

For a specific type of tensor, RGB image data, [24] unfolded the tensor along two modes (the horizontal and vertical dimensions) and minimized the TV and nuclear norms of each unfolded matrix:

\min_{\mathcal{X}}\quad \sum_{i=1}^{2}\left(\alpha_{i}\|\mathcal{X}_{(i)}\|_{*}+\mu\|\mathcal{X}_{(i)}\|_{TV}\right),
\text{s.t.}\quad \mathcal{X}_{\Omega}=\mathcal{T}_{\Omega}.

In our experiments, we noticed that when only a small percentage of entries is observed, for instance when 50% or more of the entries are missing, TV-minimization recovery produces a structure similar to that of the original tensor in terms of singular values, whereas NNM forces a large portion of the smaller singular values to zero, even though they are not negligible in the original tensor. Therefore, one should be careful with the choice of minimization problem when performing completion tasks on a specific dataset. We discuss the details in the experiment section.

IV TV Minimization Algorithm

IV-A Matrix Denoising Algorithm

The total variation norm is often discretized as [16]

\|u\|_{TV}\approx\sum_{i,j}\sqrt{(\nabla_{x}u)^{2}_{i,j}+(\nabla_{y}u)^{2}_{i,j}}.

Hence the image denoising problem is defined as:

\min_{X}\sum_{i,j}\sqrt{(\nabla_{x}X_{i,j})^{2}+(\nabla_{y}X_{i,j})^{2}}+\lambda\|M-X\|_{F}.

This can be solved by implementing the algorithm in [9].
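For reference, a small NumPy sketch of this discretization for a 2-D array (our own forward-difference convention with zeros at the boundary; not the solver of [9]) is:

```python
import numpy as np

def tv_norm_2d(U):
    """Isotropic discrete TV norm of a 2-D array using forward differences."""
    dx = np.zeros_like(U)
    dy = np.zeros_like(U)
    dx[:-1, :] = U[1:, :] - U[:-1, :]   # differences along the first axis (zero at the last row)
    dy[:, :-1] = U[:, 1:] - U[:, :-1]   # differences along the second axis (zero at the last column)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))
```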

IV-B Tensor Completion with TV

Similar to the image denoising algorithm, we compute the divergence at each entry and move the entry along the divergence direction. To keep the observed entries unchanged, we project them back to their original values at each step. We consider the following minimization problem:

\min_{\mathcal{X}}\quad \|\mathcal{X}\|_{TV},
\text{s.t.}\quad \mathcal{X}_{\Omega}=\mathcal{T}_{\Omega}.

The related algorithm is summarized in Algorithm 1.

Input: Incomplete tensor $\mathcal{T}\in\mathbb{R}^{d_{1}\times\dots\times d_{n}}$; sampling pattern $\Omega\in\{0,1\}^{d_{1}\times\dots\times d_{n}}$; step size $h_{k}$; threshold $\lambda$; initialization $\mathcal{X}^{0}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$.
Set $\mathcal{X}^{0}=\mathcal{X}^{0}+(\mathcal{T}_{\Omega}-\mathcal{X}^{0}_{\Omega})$.
for $k=0:K$ do
    for $i=1:n$ do
        $\nabla_{i}(\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}})=\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{i}+1,...,\alpha_{n}}-\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{i},...,\alpha_{n}}$ for $\alpha_{i}=1,2,...,d_{i}-1$ ($\nabla_{i}(\cdot)=0$ when $\alpha_{i}=d_{i}$)
        $\Delta_{i}(\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}})=\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{i}-1,...,\alpha_{n}}+\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{i}+1,...,\alpha_{n}}-2\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{i},...,\alpha_{n}}$ for $\alpha_{i}=2,3,...,d_{i}-1$ ($\Delta_{i}(\cdot)=0$ when $\alpha_{i}=1$ or $d_{i}$)
    end for
    $\Delta(\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}})=\sum_{i}\Delta_{i}(\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}})$
    $\mathcal{X}^{k+1}_{\alpha_{1},...,\alpha_{n}}=\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}}+h_{k}\cdot\text{shrink}\!\left(\Delta(\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}})\big/\sqrt{\textstyle\sum_{i}\nabla_{i}^{2}(\mathcal{X}^{k}_{\alpha_{1},...,\alpha_{n}})},\ \lambda\right)$
    $\mathcal{X}_{\Omega}^{k+1}=\mathcal{T}_{\Omega}$
end for
Output: $\mathcal{X}^{K}$
Algorithm 1: Tensor Completion through TV minimization

In Algorithm 1, the Laplacian operator computes the divergence of the gradients along all modes at each entry. The shrink operator moves the input towards 0 by a distance $\lambda$; formally,

\text{shrink}(x,\lambda)=\text{sign}(x)\cdot\max(|x|-\lambda,0).

For the initialization $\mathcal{X}^{0}$, the plain tensor completion with total variation (TVTC) method sets $\mathcal{X}^{0}$ to the zero tensor, i.e., $\mathcal{X}^{0}=\mathbf{0}\in\mathbb{R}^{d_{1}\times\dots\times d_{n}}$, whereas our proposed method sets $\mathcal{X}^{0}$ to the result of the weighted HOSVD (w-HOSVD). We show the theoretical and experimental advantages of w-HOSVD in the following sections.
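Below is a compact NumPy sketch of Algorithm 1 (an illustrative translation under our own conventions; the names mask, step, lam, iters and the small eps guarding the division are ours, not the paper's). Passing a zero tensor as X0 gives the plain TVTC scheme, while passing the weighted HOSVD output gives wHOSVD-TV.

```python
import numpy as np

def shrink(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def forward_diff(X, axis):
    """Forward difference along one mode, zero at the last index (nabla_i in Algorithm 1)."""
    D = np.zeros_like(X)
    idx = [slice(None)] * X.ndim
    idx[axis] = slice(0, -1)
    D[tuple(idx)] = (np.roll(X, -1, axis=axis) - X)[tuple(idx)]
    return D

def laplacian(X, axis):
    """Second difference along one mode, zero at both boundaries (Delta_i in Algorithm 1)."""
    L = np.roll(X, -1, axis=axis) + np.roll(X, 1, axis=axis) - 2 * X
    first, last = [slice(None)] * X.ndim, [slice(None)] * X.ndim
    first[axis], last[axis] = 0, -1
    L[tuple(first)] = 0.0
    L[tuple(last)] = 0.0
    return L

def tv_completion(T, mask, X0, step=0.1, lam=0.05, iters=500, tol=1e-4, eps=1e-8):
    X = X0.copy()
    X[mask] = T[mask]                                    # start from the observed entries
    for _ in range(iters):
        grads = [forward_diff(X, i) for i in range(X.ndim)]
        lap = sum(laplacian(X, i) for i in range(X.ndim))
        mag = np.sqrt(sum(g ** 2 for g in grads)) + eps  # eps avoids division by zero
        X_new = X + step * shrink(lap / mag, lam)
        X_new[mask] = T[mask]                            # project back onto the observations
        done = np.linalg.norm(X_new - X) < tol
        X = X_new
        if done:
            break
    return X
```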

V Main Results

In order to show the efficiency of $\widehat{\mathcal{T}}$ as the initialization for the total variation algorithm, we only need to show that $\widehat{\mathcal{T}}$ is close to $\mathcal{T}$. In the following, a bound on $\|\mathcal{W}^{(1/2)}\boxdot(\mathcal{T}-\widehat{\mathcal{T}})\|_{F}$ is established.

Theorem 1.

Let $\mathcal{W}=\boldsymbol{w}_{1}\boldsymbol{\otimes}\cdots\boldsymbol{\otimes}\boldsymbol{w}_{n}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ have strictly positive entries, and fix $\Omega\subseteq[d_{1}]\times\cdots\times[d_{n}]$. Suppose that $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ has Tucker rank $\boldsymbol{r}=[r_{1},\cdots,r_{n}]$ in problem (1), and that $\mathcal{Z}_{i_{1}\cdots i_{n}}\sim\mathcal{N}(0,\sigma^{2})$. Then with probability at least $1-2^{-|\Omega|/2}$ over the choice of $\mathcal{Z}$,

\|\mathcal{W}^{(1/2)}\boxdot(\mathcal{T}-\widehat{\mathcal{T}})\|_{F}\leq 4\sigma\mu\sqrt{|\Omega|\ln(2)}+2\|\mathcal{T}\|_{\infty}\|\mathcal{W}^{(1/2)}-\mathcal{W}^{(-1/2)}\boxdot 1_{\Omega}\|_{F},

where $\mu^{2}=\max_{(i_{1},\cdots,i_{n})\in\Omega}\frac{1}{\mathcal{W}_{i_{1}\cdots i_{n}}}$.

Notice that the upper bound in Theorem 1 holds for the optimal output $\widehat{\mathcal{T}}$ of problem (1), which is general, but it contains no rank information about the underlying tensor $\mathcal{T}$. To introduce the rank information of $\mathcal{T}$, we restrict our analysis of problem (1) to the HOSVD process and obtain the following result.

Theorem 2 (General upper bound for Tucker rank $\boldsymbol{r}$ tensors).

Let $\mathcal{W}=\boldsymbol{w_{1}}\boldsymbol{\otimes}\cdots\boldsymbol{\otimes}\boldsymbol{w_{n}}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ have strictly positive entries, and fix $\Omega\subseteq[d_{1}]\times\cdots\times[d_{n}]$. Suppose that $\mathcal{T}\in\mathbb{R}^{d_{1}\times\cdots\times d_{n}}$ has Tucker rank $\boldsymbol{r}=[r_{1}\ \cdots\ r_{n}]$, that $\mathcal{Z}_{i_{1}\cdots i_{n}}\sim\mathcal{N}(0,\sigma^{2})$, and let

\widehat{\mathcal{T}}=\mathcal{W}^{(-1/2)}\boxdot((\mathcal{W}^{(-1/2)}\boxdot\mathcal{Y}_{\Omega})\times_{1}\widehat{U}_{1}\widehat{U}_{1}^{T}\times_{2}\cdots\times_{n}\widehat{U}_{n}\widehat{U}_{n}^{T}),

where $\widehat{U}_{1},\cdots,\widehat{U}_{n}$ are obtained from the HOSVD approximation process. Then with probability at least

1-\sum_{i=1}^{n}\frac{1}{d_{i}+\prod_{j\neq i}d_{j}}

over the choice of 𝒵\mathcal{Z}, we have

\|\mathcal{W}^{(1/2)}\boxdot(\mathcal{T}-\widehat{\mathcal{T}})\|_{F}\lesssim\left(\sum_{k=1}^{n}\sqrt{r_{k}\log\big(d_{k}+\prod_{j\neq k}d_{j}\big)}\,\mu_{k}\right)\sigma+\left(\sum_{k=1}^{n}r_{k}\|(\mathcal{W}^{(-\frac{1}{2})}\boxdot 1_{\Omega}-\mathcal{W}^{(\frac{1}{2})})_{(k)}\|_{2}\right)\|\mathcal{T}\|_{\infty},

where

\mu_{k}^{2}=\max_{i_{1},\cdots,i_{n}}\left\{\sum_{i_{1},\cdots,i_{k-1},i_{k+1},\cdots,i_{n}}\frac{1_{(i_{1},i_{2},\cdots,i_{n})\in\Omega}}{\mathcal{W}_{i_{1}i_{2}\cdots i_{n}}},\ \sum_{i_{k}}\frac{1_{(i_{1},i_{2},\cdots,i_{n})\in\Omega}}{\mathcal{W}_{i_{1}i_{2}\cdots i_{n}}}\right\}.

VI Simulations

In this section, we first conduct numerical simulations to show the efficiency of the proposed weighted HOSVD algorithm. We then present experimental results showing that using the weighted HOSVD result as the initialization of the TV minimization algorithm accelerates the convergence of the original TV minimization.

VI-A Simulations for Weighted HOSVD

In this simulation, we test our weighted HOSVD algorithm on 3-order tensors of the form $\mathcal{T}=\mathcal{C}\times_{1}U_{1}\times_{2}U_{2}\times_{3}U_{3}$ under uniform and nonuniform sampling patterns, where $U_{i}\in\mathbb{R}^{d_{i}\times r_{i}}$ and $\mathcal{C}\in\mathbb{R}^{r_{1}\times r_{2}\times r_{3}}$ with $r_{i}<d_{i}$. First, we generate $\mathcal{T}$ of size $100\times 100\times 100$ with Tucker rank $\boldsymbol{r}=[r,r,r]$, where $r$ varies from $2$ to $10$. Then we add Gaussian random noise with $\sigma=10^{-2}$ to $\mathcal{T}$. Next, we generate a sampling pattern $\Omega$ that subsamples $10\%$ of the entries. We consider the estimates $\widehat{\mathcal{T}}_{o}$, $\widehat{\mathcal{T}}_{p}$ and $\widehat{\mathcal{T}}_{w}$ defined by

\widehat{\mathcal{T}}_{o} = \underset{\text{Tucker\_rank}(\mathcal{X})=\boldsymbol{r}}{\text{argmin}}\|\mathcal{X}-\mathcal{Y}_{\Omega}\|_{F},
\widehat{\mathcal{T}}_{p} = \underset{\text{Tucker\_rank}(\mathcal{X})=\boldsymbol{r}}{\text{argmin}}\|\mathcal{X}-\tfrac{1}{p}\mathcal{Y}_{\Omega}\|_{F},\quad p=\tfrac{|\Omega|}{d_{1}d_{2}d_{3}},
\widehat{\mathcal{T}}_{w} = \mathcal{W}^{(-1/2)}\boxdot\underset{\text{Tucker\_rank}(\mathcal{X})=\boldsymbol{r}}{\text{argmin}}\|\mathcal{X}-\mathcal{W}^{(-1/2)}\boxdot\mathcal{Y}_{\Omega}\|_{F},

respectively, each computed with the truncated HOSVD algorithm. We refer to $\widehat{\mathcal{T}}_{o}$ as HOSVD, $\widehat{\mathcal{T}}_{p}$ as HOSVD_p, and $\widehat{\mathcal{T}}_{w}$ as w_HOSVD. For an estimate $\widehat{\mathcal{T}}$, we consider both the weighted and unweighted relative errors:

\frac{\|\mathcal{W}^{(1/2)}\boxdot(\widehat{\mathcal{T}}-\mathcal{T})\|_{F}}{\|\mathcal{W}^{(1/2)}\boxdot\mathcal{T}\|_{F}}\ \text{ and }\ \frac{\|\widehat{\mathcal{T}}-\mathcal{T}\|_{F}}{\|\mathcal{T}\|_{F}}.

We average each error over 20 experiments.
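For concreteness, a short NumPy sketch of these two error metrics (assumed variable names, not code from the paper):

```python
import numpy as np

def relative_errors(T_hat, T, W):
    """Weighted and unweighted relative errors of an estimate T_hat."""
    w_half = np.sqrt(W)
    weighted = np.linalg.norm(w_half * (T_hat - T)) / np.linalg.norm(w_half * T)
    unweighted = np.linalg.norm(T_hat - T) / np.linalg.norm(T)
    return weighted, unweighted
```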

When each entry is sampled uniformly at random with probability $p$, we have $\mathbb{E}(\mathcal{Y}_{\Omega})=p\mathcal{T}$, which suggests that the estimate $\widehat{\mathcal{T}}_{p}$ should perform better than $\widehat{\mathcal{T}}_{o}$. In Figure 1, the sampling pattern is uniform at random, and the estimates $\widehat{\mathcal{T}}_{p}$ and $\widehat{\mathcal{T}}_{w}$ perform significantly better than $\widehat{\mathcal{T}}_{o}$, as expected.

In Figure 2, the setup is the same as in Figure 1 except that the sampling pattern is non-uniform at random. In this case $\frac{1}{p}$ is not a good weight, and as shown in Figure 2, $\widehat{\mathcal{T}}_{p}$ performs poorly, while $\widehat{\mathcal{T}}_{w}$ still performs better than $\widehat{\mathcal{T}}_{o}$.

Figure 1: Uniformly subsampled.
Figure 2: Non-uniformly subsampled.

VI-B Simulations for TV with Initialization from Weighted HOSVD (wHOSVD-TV)

VI-B1 Experimental Setup

In order to show the advantage of the weighted HOSVD initialization, we test our proposed algorithm, the plain TV minimization method, and a baseline algorithm on video data. We mask a specified ratio of entries and run each completion algorithm to obtain the completed tensor $\widehat{\mathcal{T}}$. The tested sampling rates (SR) are $0.1, 0.3, 0.5, 0.8$. We then compute the relative square error (RSE):

RSE=\frac{\|\widehat{\mathcal{T}}-\mathcal{T}\|_{F}}{\|\mathcal{T}\|_{F}}

for each method to evaluate its performance. We also compare the average running time until each algorithm converges to a preset threshold.

VI-B2 Data

Figure 3: The first frames of the tested videos.

In this part, we test our algorithm on three videos; see [14]. The videos are tennis-serve color videos from the Olympic Sports Dataset. In our simulation, we use the same setup as in [14]: we pick 30 frames evenly from each video, and each frame is scaled to size $360\times 480\times 3$, so each video is transformed into a 4-D tensor of size $360\times 480\times 3\times 30$. The first frame of each preprocessed video is shown in Figure 3.

VI-B3 Numerical Results

Table I: The relative square error (RSE) and running time for different algorithms on the video data.

Video     Method       SR     RSE      Time (s)
Video 1   wHOSVD-TV    10%    0.2080   82.26
                       30%    0.1418   50.40
                       50%    0.1045   41.31
                       80%    0.0566   33.21
          ST-HOSVD     10%    N/A      N/A
                       30%    0.1941   521.94
                       50%    0.1381   175.82
                       80%    0.0667   128.68
Video 2   wHOSVD-TV    10%    0.2694   35.21
                       30%    0.1888   21.08
                       50%    0.1411   16.55
                       80%    0.0767   12.88
          ST-HOSVD     10%    N/A      N/A
                       30%    0.2249   1130.77
                       50%    0.1480   1304.31
                       80%    0.0749   976.65
Video 3   wHOSVD-TV    10%    0.2198   156.5
                       30%    0.1394   87.89
                       50%    0.0955   72.15
                       80%    0.0470   18.44
          ST-HOSVD     10%    N/A      N/A
                       30%    0.1734   560.97
                       50%    0.1105   158.74
                       80%    0.0594   52.09
Figure 4: Comparison between TVTC and wHOSVD-TV on video 1 with SR $=50\%$.

The simulation results on the videos are reported in Table I and Figure 4. Existing studies have performed the same completion task on the same dataset (see [14]); among the several low-rank based tensor completion algorithms compared in [14], ST-HOSVD [27] had the best performance.

We record the completion error and running time for each completion task and compare them with those of the previous low-rank based algorithms. One can observe that the TV-based algorithm is better suited to video data most of the time.

On the other hand, we implement total variation minimization with zero-filling initialization for the unobserved entries and with the tensor obtained from the weighted HOSVD, termed TVTC and wHOSVD-TV, respectively. The iterative results in Figure 4 show that using the weighted HOSVD result as initialization notably reduces the number of TV-minimization iterations needed to reach the convergence threshold ($\|\mathcal{X}^{k}-\mathcal{X}^{k-1}\|_{F}<10^{-4}$).

VI-C Discussion

Figure 5: Nuclear norm comparison for different recovery patterns.

The relation between the smoothness pattern and the low-rank pattern is not well understood. When studying image-related data, both patterns are usually assumed simultaneously and combined into a mixed optimization problem. Through experiments, we find that, with uniformly random missing entries, the result of total variation minimization on image-like data has singular values closer to those of the original image.

We randomly mask 70% of the entries of several grey-scale images and perform both nuclear norm minimization and total variation minimization on the images. The nuclear norms of the original images, the masked images, the TV estimates, and the NNM estimates are then computed; see Figure 5. From this figure, we can see that the image recovered by TV minimization already has a smaller nuclear norm than the original image, and NNM pushes the nuclear norm even further away. By comparing the singular values of the TV-recovered and NNM-recovered matrices, we see that TV minimization better captures the smaller singular values and hence better preserves the overall structure of the original matrix.

Figure 6: Comparison of singular values between the TV-recovered image and the original image (the original image is the same as in Figure 5).

Since both $\|\cdot\|_{*}$ and $\|\cdot\|_{TV}$ are convex functions, the mixed minimization problem with the observed entries constrained,

\widehat{\mathcal{X}}_{\text{mix}}=\underset{\mathcal{X}}{\text{argmin}}\sum_{i}\alpha_{i}\|\mathcal{X}_{(i)}\|_{*}+\lambda\|\mathcal{X}\|_{TV},

produces a result whose nuclear norm lies between $\|\widehat{\mathcal{X}}_{\text{NNM}}\|_{*}$ and $\|\widehat{\mathcal{X}}_{\text{TV}}\|_{*}$, where $\widehat{\mathcal{X}}_{\text{NNM}}$ and $\widehat{\mathcal{X}}_{\text{TV}}$ are the results of the individual minimization problems (with the same constraint):

\widehat{\mathcal{X}}_{\text{NNM}}=\underset{\mathcal{X}}{\text{argmin}}\sum_{i}\alpha_{i}\|\mathcal{X}_{(i)}\|_{*},
\widehat{\mathcal{X}}_{\text{TV}}=\underset{\mathcal{X}}{\text{argmin}}\|\mathcal{X}\|_{TV}.

Unlike user-rating data and synthetic low-rank tensors, image-like data tend to have a non-trivial tail of singular values. Figure 6 shows the similarity between the original image and the TV-recovered image, which hints at how TV recovery compares with low-rank recovery.

Acknowledgement

Needell was partially supported by NSF CAREER #1348721 and NSF BIGDATA #1740325.

References

  • [1] Jacob Abernethy, Francis Bach, Theodoros Evgeniou, and Jean-Philippe Vert. Low-rank matrix factorization with attributes. arXiv preprint cs/0611124, 2006.
  • [2] Yonatan Amit, Michael Fink, Nathan Srebro, and Shimon Ullman. Uncovering shared structures in multiclass classification. In Proceedings of the 24th international conference on Machine learning, pages 17–24. ACM, 2007.
  • [3] Carl J Appellof and Ernest R Davidson. Strategies for analyzing data from video fluorometric monitoring of liquid chromatographic effluents. Analytical Chemistry, 53(13):2053–2056, 1981.
  • [4] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization, 20(4):1956–1982, 2010.
  • [5] T Tony Cai, Wen-Xin Zhou, et al. Matrix completion via max-norm constrained optimization. Electronic Journal of Statistics, 10(1):1493–1525, 2016.
  • [6] Emmanuel J Candes and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.
  • [7] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational mathematics, 9(6):717, 2009.
  • [8] J Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition. Psychometrika, 35(3):283–319, 1970.
  • [9] Antonin Chambolle, Vicent Caselles, Daniel Cremers, Matteo Novaga, and Thomas Pock. An introduction to total variation for image analysis. Theoretical foundations and numerical methods for sparse recovery, 9(263-340):227, 2010.
  • [10] Pei Chen and David Suter. Recovering the missing components in a large noisy low-rank matrix: Application to sfm. IEEE transactions on pattern analysis and machine intelligence, 26(8):1051–1063, 2004.
  • [11] Miaomiao Cheng, Liping Jing, and Michael Kwok-Po Ng. A weighted tensor factorization method for low-rank tensor completion. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), pages 30–38. IEEE, 2019.
  • [12] Pierre Comon. Tensor decompositions: state of the art and applications. In Mathematics in Signal Processing V (Coventry, 2000), pages 1–24.
  • [13] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
  • [14] Zisen Fang, Xiaowei Yang, Le Han, and Xiaolan Liu. A sequentially truncated higher order singular value decomposition-based algorithm for tensor completion. IEEE transactions on cybernetics, 49(5):1956–1967, 2018.
  • [15] Simon Foucart, Deanna Needell, Reese Pathak, Yaniv Plan, and Mary Wootters. Weighted matrix completion from non-random, non-uniform sampling patterns. arXiv preprint arXiv:1910.13986, 2019.
  • [16] Pascal Getreuer. Rudin-osher-fatemi total variation denoising using split bregman. Image Processing On Line, 2:74–95, 2012.
  • [17] David F Gleich and Lek-heng Lim. Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 60–68. ACM, 2011.
  • [18] David Goldberg, David Nichols, Brian M Oki, and Douglas Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–71, 1992.
  • [19] Richard A Harshman et al. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. 1970.
  • [20] Frank L Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6(1-4):164–189, 1927.
  • [21] Frank L Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. Journal of Mathematics and Physics, 7(1-4):39–79, 1928.
  • [22] Teng-Yu Ji, Ting-Zhu Huang, Xi-Le Zhao, Tian-Hui Ma, and Gang Liu. Tensor completion using total variation and low-rank matrix factorization. Information Sciences, 326:243–257, 2016.
  • [23] Daniel Kressner, Michael Steinlechner, and Bart Vandereycken. Low-rank tensor completion by riemannian optimization. BIT Numerical Mathematics, 54(2):447–468, 2014.
  • [24] Lingwei Li, Fei Jiang, and Ruimin Shen. Total variation regularized reweighted low-rank tensor completion for color image inpainting. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 2152–2156. IEEE, 2018.
  • [25] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data. IEEE transactions on pattern analysis and machine intelligence, 35(1):208–220, 2012.
  • [26] Panagiotis Symeonidis, Alexandros Nanopoulos, and Yannis Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the 2008 ACM conference on Recommender systems, pages 43–50. ACM, 2008.
  • [27] Nick Vannieuwenhoven, Raf Vandebril, and Karl Meerbergen. A new truncation strategy for the higher-order singular value decomposition. SIAM Journal on Scientific Computing, 34(2):A1027–A1052, 2012.
  • [28] Yangyang Xu, Ruru Hao, Wotao Yin, and Zhixun Su. Parallel matrix factorization for low-rank tensor completion. Inverse Problems and Imaging, 9(2):601–624, 2015.