Tensor Completion through Total Variation with Initialization from Weighted HOSVD
Abstract
In this paper, we study the tensor completion problem when the sampling pattern is deterministic. We first propose a simple but efficient weighted HOSVD algorithm for recovery from noisy observations, and then use the weighted HOSVD result as an initialization for a total variation (TV) minimization algorithm. We demonstrate the accuracy of the weighted HOSVD algorithm from both theoretical and numerical perspectives. In the numerical simulations, we also show that, with the proposed initialization, the total variation algorithm can efficiently fill in missing data for images and videos.
I Introduction
Tensors, high-dimensional arrays that extend matrices, play an important role in a wide range of real-world applications [3, 12]. Owing to their high-dimensional structure, tensors can preserve more information than their unfolded matrix counterparts. For instance, a video stored as a tensor keeps the connections between frames, whereas splitting the frames or unfolding the tensor may lose some of this joint information; a brain MRI likewise benefits from being stored as a 3D tensor rather than as an arbitrary arrangement of snapshot matrices. On the other hand, most real-world datasets are only partially observed, and incomplete data can severely degrade the performance of downstream applications. The linear dependency and redundancy between missing and existing data can be leveraged to recover unavailable entries and improve the quality and scale of an incomplete dataset. The task of recovering missing elements from a partially observed tensor is called tensor completion, and it has attracted widespread attention in many applications, e.g., image/video inpainting [23, 25] and recommendation systems [26]. The matrix completion problem [1, 2, 4, 5, 6, 7, 10, 15, 17, 18], a special case of tensor completion, has been well studied in the past few decades and has guided the development of tensor completion algorithms. Among the different types of data matrices, image data is commonly studied and widely used as a performance indicator. One traditional way to address the image denoising problem is to minimize the total variation norm; this approach relies on the assumption that the data are locally smooth. In recent decades, thanks to algorithmic developments in non-negative matrix factorization (NMF) and nuclear norm minimization (NNM), the low-rank structure assumption has become increasingly popular and is extensively applied in related studies. In both matrix completion and tensor completion, researchers try to utilize and balance both assumptions in order to improve the performance of image and video recovery tasks.
In this work, we provide an improved version of the total variation minimization approach by supplying a proper initialization. To implement the initialization, we design a simple but efficient algorithm, which we call the weighted HOSVD algorithm, for low-rank tensor completion from a deterministic sampling pattern; it is motivated by [11, 15].
II Tensor Completion Problem
In this section, we provide a formal definition of the tensor completion problem. First, we introduce notation, basic operations, and definitions for tensors.
II-A Preliminaries and Notations
Tensors, matrices, vectors, and scalars are denoted in different typefaces for clarity below. Throughout this paper, calligraphic boldface capital letters are used for tensors, capital letters for matrices, lowercase boldface letters for vectors, and regular letters for scalars. The set of the first $n$ natural numbers is denoted by $[n] = \{1, \ldots, n\}$. Let $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ and $\alpha \in \mathbb{R}$; then $\mathcal{T}^{\alpha}$ represents the pointwise power operator, i.e., $(\mathcal{T}^{\alpha})_{i_1 \cdots i_n} = (\mathcal{T}_{i_1 \cdots i_n})^{\alpha}$. We use $\mathcal{T}_{\Omega}$ to denote the tensor with $(\mathcal{T}_{\Omega})_{i_1 \cdots i_n} = \mathcal{T}_{i_1 \cdots i_n}$ for all $(i_1, \ldots, i_n) \in \Omega$ and $0$ otherwise. $\mathbf{1}_{\Omega}$ denotes the tensor with all entries equal to $1$ on $\Omega$ and $0$ otherwise.
Definition 1 (Tensor).
A tensor is a multidimensional array. The number of dimensions of a tensor is called the order (the dimensions are also called modes). The space of real tensors of order $n$ and size $d_1 \times \cdots \times d_n$ is denoted by $\mathbb{R}^{d_1 \times \cdots \times d_n}$. The elements of a tensor $\mathcal{T}$ are denoted by $\mathcal{T}_{i_1 \cdots i_n}$.
An order-$n$ tensor can be matricized in $n$ ways by unfolding it along each of its modes. Next we give the definition of the matricization of a given tensor.
Definition 2 (Matricization of a tensor).
The mode-$k$ matricization of a tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ is the matrix, denoted $\mathcal{T}_{(k)} \in \mathbb{R}^{d_k \times \prod_{j \neq k} d_j}$, whose columns are composed of all the vectors obtained from $\mathcal{T}$ by fixing all indices but the $k$th.
In order to illustrate the matricization of a tensor, let us consider the following example.
Example 1.
Let $\mathcal{T}$ be a third-order tensor with given frontal slices. Its three mode-$k$ matricizations $\mathcal{T}_{(1)}$, $\mathcal{T}_{(2)}$, and $\mathcal{T}_{(3)}$ are then obtained by arranging the corresponding mode-$k$ fibers as columns.
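To make the mode-$k$ matricization concrete, here is a small NumPy sketch (our own illustration, using a hypothetical $3 \times 4 \times 2$ tensor; the column ordering follows NumPy's C order, which may differ from other conventions):

```python
import numpy as np

def unfold(T, k):
    # Mode-k matricization: the mode-k fibers of T become the columns of the result.
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

T = np.arange(24.0).reshape(3, 4, 2)                     # a hypothetical 3 x 4 x 2 tensor
T1, T2, T3 = unfold(T, 0), unfold(T, 1), unfold(T, 2)    # 3 x 8, 4 x 6, and 2 x 12 matrices
```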
Definition 3 (Folding Operator).
Suppose $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ is a tensor. The mode-$k$ folding operator applied to a matrix $M \in \mathbb{R}^{d_k \times \prod_{j \neq k} d_j}$, denoted $\operatorname{fold}_k(M)$, is the inverse of the mode-$k$ unfolding operator, so that $\operatorname{fold}_k(\mathcal{T}_{(k)}) = \mathcal{T}$.
Definition 4 ($\infty$ norm).
Let $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$; the $\infty$-norm of $\mathcal{T}$ is defined as
$$\|\mathcal{T}\|_{\infty} = \max_{i_1, \ldots, i_n} |\mathcal{T}_{i_1 \cdots i_n}|.$$
The unit ball under the $\infty$-norm is denoted by $B_{\infty}$.
Definition 5 (Frobenius norm).
The Frobenius norm of a tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ is defined as
$$\|\mathcal{T}\|_F = \sqrt{\sum_{i_1=1}^{d_1} \cdots \sum_{i_n=1}^{d_n} \mathcal{T}_{i_1 \cdots i_n}^2}.$$
Definition 6 (Product Operations).
- Mode-$k$ product: the mode-$k$ product of a tensor $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ and a matrix $A \in \mathbb{R}^{J \times d_k}$ is defined by
$$(\mathcal{T} \times_k A)_{i_1 \cdots i_{k-1}\, j\, i_{k+1} \cdots i_n} = \sum_{i_k = 1}^{d_k} \mathcal{T}_{i_1 \cdots i_n} A_{j i_k},$$
i.e., $(\mathcal{T} \times_k A)_{(k)} = A \, \mathcal{T}_{(k)}$.
- Outer product: let $\mathbf{a}_1 \in \mathbb{R}^{d_1}, \ldots, \mathbf{a}_n \in \mathbb{R}^{d_n}$. The outer product among these $n$ vectors is the tensor $\mathbf{a}_1 \circ \cdots \circ \mathbf{a}_n \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ defined as
$$(\mathbf{a}_1 \circ \cdots \circ \mathbf{a}_n)_{i_1 \cdots i_n} = (\mathbf{a}_1)_{i_1} \cdots (\mathbf{a}_n)_{i_n}.$$
- Kronecker product of matrices: the Kronecker product of $A \in \mathbb{R}^{I \times J}$ and $B \in \mathbb{R}^{K \times L}$ is denoted by $A \otimes B$. The result is a matrix of size $IK \times JL$ defined by
$$A \otimes B = \begin{bmatrix} A_{11} B & \cdots & A_{1J} B \\ \vdots & \ddots & \vdots \\ A_{I1} B & \cdots & A_{IJ} B \end{bmatrix}.$$
- Khatri-Rao product: given matrices $A \in \mathbb{R}^{I \times K}$ and $B \in \mathbb{R}^{J \times K}$, their Khatri-Rao product is denoted by $A \odot B$. The result is a matrix of size $IJ \times K$ defined by
$$A \odot B = \begin{bmatrix} \mathbf{a}_1 \otimes \mathbf{b}_1 & \cdots & \mathbf{a}_K \otimes \mathbf{b}_K \end{bmatrix},$$
where $\mathbf{a}_k$ and $\mathbf{b}_k$ stand for the $k$-th columns of $A$ and $B$, respectively.
- Hadamard product: given two tensors $\mathcal{T}$ and $\mathcal{Y}$, both of size $d_1 \times \cdots \times d_n$, their Hadamard product is denoted by $\mathcal{T} \ast \mathcal{Y}$. The result is also of size $d_1 \times \cdots \times d_n$, and its elements are given by the elementwise product, i.e., $(\mathcal{T} \ast \mathcal{Y})_{i_1 \cdots i_n} = \mathcal{T}_{i_1 \cdots i_n} \, \mathcal{Y}_{i_1 \cdots i_n}$. A short code illustration of these products follows this list.
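As a quick illustration of these product operations, the following NumPy sketch (our own; the helper names are not from the paper) computes each of them on small arrays:

```python
import numpy as np

def mode_k_product(T, A, k):
    # Mode-k product T x_k A: multiply A with the mode-k unfolding, then refold.
    Tk = np.moveaxis(T, k, 0).reshape(T.shape[k], -1)
    Y = (A @ Tk).reshape([A.shape[0]] + [T.shape[i] for i in range(T.ndim) if i != k])
    return np.moveaxis(Y, 0, k)

def khatri_rao(A, B):
    # Columnwise Kronecker product of A (I x K) and B (J x K), giving an IJ x K matrix.
    return np.column_stack([np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

T = np.random.rand(3, 4, 2)
A = np.random.rand(5, 4)
Y = mode_k_product(T, A, 1)                                        # shape (3, 5, 2)
K = np.kron(np.eye(2), np.ones((2, 2)))                            # Kronecker product of matrices
H = T * T                                                          # Hadamard (elementwise) product
R1 = np.einsum('i,j,k->ijk', np.ones(3), np.ones(4), np.ones(2))   # rank-one outer product
```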
Definition 7 (Rank-one Tensors).
An $n$-order tensor $\mathcal{X} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$ is rank one if it can be written as the outer product of $n$ vectors, i.e.,
$$\mathcal{X} = \mathbf{a}_1 \circ \mathbf{a}_2 \circ \cdots \circ \mathbf{a}_n.$$
Definition 8 (Tensor (CP) rank [20, 21]).
The rank of a tensor $\mathcal{X}$, denoted $\operatorname{rank}(\mathcal{X})$, is defined as the smallest number of rank-one tensors that generate $\mathcal{X}$ as their sum. We also make use of the cone of rank-$r$ tensors.
Unlike the matrix case, the rank of a tensor is presently not well understood, and the problem of computing the rank of a tensor is NP-hard. Next we introduce another notion of rank for tensors.
Definition 9 (Tucker rank [21]).
Let $\mathcal{T} \in \mathbb{R}^{d_1 \times \cdots \times d_n}$. The tuple $(r_1, \ldots, r_n)$, where $r_k = \operatorname{rank}(\mathcal{T}_{(k)})$, is called the Tucker rank of $\mathcal{T}$. We also make use of the cone of tensors of Tucker rank $(r_1, \ldots, r_n)$.
II-B Problem Statement
In order to find a good initialization for the total variation method, we would like to answer the following questions.
Question 1.
Given a deterministic sampling pattern $\Omega$ and corresponding (possibly noisy) observations from the tensor, what type of recovery error can we expect, in what metric, and how may we efficiently implement this recovery?
Question 2.
Given a sampling pattern $\Omega$ and noisy observations $\mathcal{T}_{\Omega} + \mathcal{Z}_{\Omega}$, for what rank-one weight tensor $\mathcal{W}$ can we efficiently find a tensor $\widehat{\mathcal{T}}$ so that the $\mathcal{W}$-weighted recovery error of $\widehat{\mathcal{T}}$ is small compared to the corresponding weighted norm of $\mathcal{T}$? And how can we efficiently find such a weight tensor $\mathcal{W}$, or certify that a fixed $\mathcal{W}$ has this property?
In order to find the weight tensor, we consider the optimization problem
$$\widehat{\mathcal{W}} = \operatorname*{argmin}_{\mathcal{W} \text{ rank one}} \|\mathcal{W} - \mathbf{1}_{\Omega}\|_F,$$
which can be solved approximately using the least squares CP algorithm [8, 19]. After we find $\widehat{\mathcal{W}}$, we consider the following optimization problem to estimate $\mathcal{T}$:
$$\widehat{\mathcal{T}} = \operatorname*{argmin}_{\text{Tucker rank}(\mathcal{X}) = \mathbf{r}} \big\| \mathcal{W}^{1/2} \ast \mathcal{X} - \mathcal{W}^{-1/2} \ast \mathcal{Y}_{\Omega} \big\|_F \qquad (1)
where $\mathcal{Y}_{\Omega} = (\mathcal{T} + \mathcal{Z})_{\Omega}$ denotes the noisy observations on $\Omega$. As is well known, solving problem (1) exactly is NP-hard. In order to approximate the solution of (1) in polynomial time, we consider the HOSVD process [13]. Assume that $\mathcal{T}$ has Tucker rank $\mathbf{r} = (r_1, \ldots, r_n)$. Let
$$\widehat{\mathcal{Y}} = \mathcal{W}^{-1/2} \ast \mathcal{Y}_{\Omega}$$
and set $U_i$ to be the matrix of the top $r_i$ left singular vectors of $\widehat{\mathcal{Y}}_{(i)}$. Then the estimated tensor is of the form
$$\widehat{\mathcal{T}}_w = \mathcal{W}^{-1/2} \ast \big( \widehat{\mathcal{Y}} \times_1 U_1 U_1^{T} \times_2 \cdots \times_n U_n U_n^{T} \big).$$
In the following, we refer to this procedure as the weighted HOSVD algorithm.
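As a rough illustration of the two steps above (the rank-one fit of the weight tensor and the weighted HOSVD truncation), the following NumPy sketch gives one possible implementation; the function names, the alternating-least-squares scheme for the rank-one fit, and all default parameters are our own choices rather than the authors' exact algorithm:

```python
import numpy as np

def unfold(T, k):
    # Mode-k matricization: mode-k fibers become columns (NumPy C-order convention).
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_k_product(T, A, k):
    # Mode-k product T x_k A, computed through the mode-k unfolding.
    Y = A @ unfold(T, k)
    new_shape = [A.shape[0]] + [T.shape[i] for i in range(T.ndim) if i != k]
    return np.moveaxis(Y.reshape(new_shape), 0, k)

def rank_one_weight(mask, n_iter=50, floor=1e-6):
    """Least-squares rank-one (CP) fit of the sampling indicator 1_Omega via
    alternating least squares; returns an entrywise-positive rank-one tensor."""
    M = mask.astype(float)
    factors = [np.ones(d) for d in M.shape]
    for _ in range(n_iter):
        for k in range(M.ndim):
            v = np.ones(1)
            for j in range(M.ndim):
                if j != k:
                    v = np.kron(v, factors[j])   # matches the column order of unfold(M, k)
            factors[k] = unfold(M, k) @ v / (v @ v)
    W = factors[0]
    for f in factors[1:]:
        W = np.multiply.outer(W, f)              # assemble w_1 o w_2 o ... o w_n
    return np.clip(W, floor, None)               # keep the weights strictly positive

def weighted_hosvd(Y_omega, W, ranks):
    """Weighted HOSVD sketch: reweight the zero-filled observations elementwise by
    W^(-1/2), project onto the top singular subspaces of each unfolding, then undo
    the reweighting."""
    Z = Y_omega / np.sqrt(W)
    out = Z
    for k, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(Z, k), full_matrices=False)
        out = mode_k_product(out, U[:, :r] @ U[:, :r].T, k)
    return out / np.sqrt(W)

# Example usage: mask is the boolean sampling pattern, Y_omega the zero-filled
# noisy observations, and (r1, r2, r3) the assumed Tucker rank.
# W = rank_one_weight(mask); T_hat = weighted_hosvd(Y_omega, W, (r1, r2, r3))
```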
With the output $\widehat{\mathcal{T}}_w$ from the weighted HOSVD, we can solve the following total variation problem with $\widehat{\mathcal{T}}_w$ as the initialization:
$$\min_{\mathcal{X}} \ \|\mathcal{X}\|_{TV} \quad \text{subject to} \quad \mathcal{X}_{\Omega} = \mathcal{T}_{\Omega}.$$
This total variation minimization problem is solved by an iterative method. We discuss the details of the algorithm and its numerical performance in Sections IV and VI.
III Related Work
In this section, we will briefly step through the history of matrix completion and introduce several relevant studies on tensor completion.
Suppose we have a partially observed matrix $M$ under the low-rank assumption, together with a deterministic sampling pattern $\Omega$. The most intuitive optimization problem here is
$$\min_{X} \ \operatorname{rank}(X) \quad \text{subject to} \quad X_{\Omega} = M_{\Omega}.$$
However, due to the computational complexity (NP-hardness) of this minimization problem, researchers have developed workarounds by defining new optimization problems that can be solved in polynomial time. Two prominent substitutions are nuclear norm minimization (NNM) and low-rank matrix factorization (LRMF):
$$\min_{X} \ \|X\|_* \quad \text{subject to} \quad X_{\Omega} = M_{\Omega},$$
where $\|X\|_*$ stands for the sum of the singular values of $X$, and
$$\min_{U,\, V} \ \|(UV - M)_{\Omega}\|_F^2,$$
where $U$ and $V$ are restricted to be low-rank factors.
As the task moves up to tensor completion, the low-rank assumption becomes even harder to work with directly. Some recent studies apply matrix completion algorithms to the unfolded tensor and obtain considerable results. For example, [25] introduced the nuclear norm for unfolded tensors and took a weighted average as the loss function, proposing algorithms such as FaLRTC and SiLRTC to solve the minimization problem
$$\min_{\mathcal{X}} \ \sum_{i=1}^{n} \alpha_i \|\mathcal{X}_{(i)}\|_* \quad \text{subject to} \quad \mathcal{X}_{\Omega} = \mathcal{T}_{\Omega}.$$
[28] applied low-rank matrix factorization (LRMF) to all mode unfoldings of the tensor and defined the minimization problem as
$$\min_{\mathcal{X},\, U_i,\, V_i} \ \sum_{i=1}^{n} \alpha_i \|U_i V_i - \mathcal{X}_{(i)}\|_F^2 \quad \text{subject to} \quad \mathcal{X}_{\Omega} = \mathcal{T}_{\Omega},$$
where the $U_i$ and $V_i$ are low-rank factor matrices whose sizes depend on the corresponding unfolding. This method is called TMac and can be solved using alternating minimization.
Since researchers often test the performance of their tensor completion algorithms on image/video/MRI data, they have started to combine NNM and LRMF with total variation norm minimization when dealing with such recovery tasks. For example, [22] introduced TV regularization into the tensor completion problem, combining a low-rank matrix factorization term with a TV penalty. Note that the TV norm there is only computed over the first two modes. For example, if the data is a video treated as a 3-order tensor, then this TV norm only counts the variation within each frame and ignores the variation between frames.
For a specific kind of tensor, RGB image data, [24] unfolded the tensor in two ways (along the horizontal and vertical dimensions) and minimized the TV and nuclear norms of each unfolded matrix.
In our experiments, we noticed that with a small percentage of observations (for instance, when 50% or more of the entries are missing), TV-minimization recovery produces a structure similar to that of the original tensor in the sense of singular values, whereas NNM forces a large portion of the smaller singular values to zero even though they are not negligible in the original tensor. Therefore, one should be careful with the choice of minimization problem when performing completion tasks on a specific dataset. We discuss the details in the experiment section.
IV TV Minimization Algorithm
IV-A Matrix Denoising Algorithm
IV-B Tensor Completion with TV
Similar to the image denoising algorithm, we first compute the divergence at each entry and move each entry along the divergence direction. To keep the existing entries unchanged, we project the observed entries back to their original values at each step. We consider the following minimization problem:
$$\min_{\mathcal{X}} \ \|\mathcal{X}\|_{TV} \quad \text{subject to} \quad \mathcal{X}_{\Omega} = \mathcal{T}_{\Omega}.$$
The related algorithm is summarized in Algorithm 1.
In Algorithm 1, the Laplacian operator computes the divergence of the gradients along all dimensions at each point. The shrink operator simply moves the input towards 0 by a distance $\lambda$; formally,
$$\operatorname{shrink}(x, \lambda) = \operatorname{sign}(x) \max(|x| - \lambda, 0).$$
For initialization, the simple tensor completion with total variation (TVTC) method sets the unobserved entries of the initial tensor to zero, while our proposed method sets the initial tensor to be the result of w-HOSVD. We will show the theoretical and experimental advantages of w-HOSVD in the following sections.
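Algorithm 1 itself is not reproduced above, so the following NumPy sketch only illustrates one simple iteration of this type: a projected subgradient variant of our own, in which the shrink operator is defined as stated, the TV descent direction is built from forward differences, and the observed entries are re-projected at every step. The step size, iteration count, and exact update order in Algorithm 1 may differ.

```python
import numpy as np

def shrink(x, lam):
    # Soft-thresholding: move the input towards 0 by a distance lam.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def tv_subgradient(X):
    """A subgradient of the anisotropic TV norm: the sum over modes of the adjoint
    forward-difference operator applied to sign(forward differences)."""
    g = np.zeros_like(X)
    for k in range(X.ndim):
        d = np.sign(np.diff(X, axis=k))          # sign of the mode-k forward differences
        pad = [(0, 0)] * X.ndim
        pad[k] = (1, 1)
        g -= np.diff(np.pad(d, pad), axis=k)     # adjoint of the forward difference
    return g

def tv_complete(Y, mask, X0, step=0.05, n_iter=500):
    """Projected subgradient sketch for  min ||X||_TV  s.t.  X agrees with Y on the
    observed entries.  X0 is the initialization (zeros for TVTC, the weighted HOSVD
    output for wHOSVD-TV)."""
    X = X0.copy()
    for _ in range(n_iter):
        X = X - step * tv_subgradient(X)         # move entries along a TV descent direction
        X[mask] = Y[mask]                        # project observed entries back to their values
    return X
```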
V Main Results
In order to show its efficiency as the initialization for the total variation algorithm, we only need to show that the weighted HOSVD estimate $\widehat{\mathcal{T}}_w$ is close to $\mathcal{T}$. In the following, a bound on this recovery error is estimated.
Theorem 1.
Let the weight tensor $\mathcal{W}$ have strictly positive entries, and fix the sampling pattern $\Omega$. Suppose that $\mathcal{T}$ has Tucker rank $\mathbf{r}$, and let $\widehat{\mathcal{T}}$ be the optimal solution of problem (1). Suppose that the entries of the noise tensor $\mathcal{Z}$ are i.i.d. Gaussian. Then, with high probability over the choice of $\mathcal{Z}$,
where .
Notice that the upper bound in Theorem 1 applies to the optimal output of problem (1), so it is general; however, it contains no rank information about the underlying tensor $\mathcal{T}$. In order to bring in the rank information of the underlying tensor $\mathcal{T}$, we restrict our analysis of problem (1) to the HOSVD process and obtain the following result.
Theorem 2 (General upper bound for Tucker rank tensor).
Let the weight tensor $\mathcal{W}$ have strictly positive entries, and fix the sampling pattern $\Omega$. Suppose that $\mathcal{T}$ has Tucker rank $\mathbf{r} = (r_1, \ldots, r_n)$. Suppose that the entries of the noise tensor $\mathcal{Z}$ are i.i.d. Gaussian, and let
where the factor matrices are obtained by considering the HOSVD approximation process. Then with probability at least
over the choice of $\mathcal{Z}$, we have
where
VI Simulations
In this section, we first conduct numerical simulations to show the efficiency of the proposed weighted HOSVD algorithm. We then include experimental results showing that using the weighted HOSVD output as an initialization of the TV minimization algorithm accelerates the convergence of the original TV minimization.
VI-A Simulations for Weighted HOSVD
In this simulation, we test our weighted HOSVD algorithm on 3-order tensors under uniform and nonuniform sampling patterns. First, we generate a tensor $\mathcal{T}$ with a prescribed Tucker rank, varying the rank across runs. Then we add Gaussian random noise $\mathcal{Z}$ to $\mathcal{T}$. Next we generate a sampling pattern $\Omega$ that subsamples a portion of the entries. We consider three estimates, obtained by applying the truncated HOSVD algorithm to the unweighted, probability-weighted, and weighted formulations of the recovery problem, respectively.
respectively by using truncated HOSVD algorithm. We give names HOSVD associated with , HOSVD_p associated with , and w_HOSVD associated with . For an estimate , we consider both the weighted and unweighted relative errors:
We average each error over 20 experiments.
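For reference, the synthetic setup and error metrics just described can be sketched as follows; the tensor size, rank, noise level, sampling rate, and the particular weighted norm below are placeholders rather than the exact values used to produce the figures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder problem sizes (not the values used for the paper's figures).
d, r, sigma, p = 50, 5, 0.1, 0.3

# Low-Tucker-rank ground truth: core x_1 U1 x_2 U2 x_3 U3.
core = rng.standard_normal((r, r, r))
U = [rng.standard_normal((d, r)) for _ in range(3)]
T = np.einsum('abc,ia,jb,kc->ijk', core, U[0], U[1], U[2])

Z = sigma * rng.standard_normal(T.shape)        # Gaussian noise
mask = rng.random(T.shape) < p                  # uniform sampling pattern Omega
Y_omega = np.where(mask, T + Z, 0.0)            # zero-filled noisy observations

def relative_error(T_hat, T):
    # Unweighted relative error ||T_hat - T||_F / ||T||_F.
    return np.linalg.norm(T_hat - T) / np.linalg.norm(T)

def weighted_relative_error(T_hat, T, W):
    # One natural weighted counterpart, using the elementwise W^(1/2) reweighting.
    sw = np.sqrt(W)
    return np.linalg.norm(sw * (T_hat - T)) / np.linalg.norm(sw * T)
```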
When each entry is sampled uniformly at random with probability $p$, we have $\widehat{\mathcal{W}} \approx p\,\mathbf{1}$, which implies that the weighted estimates should perform better than the unweighted HOSVD estimate. In Figure 1, we take the sampling pattern to be uniform at random; the estimates HOSVD_p and w_HOSVD perform significantly better than HOSVD, as expected.
In Figure 2, the set-up is the same as in Figure 1 except that the sampling pattern is non-uniform at random. In this case $p\,\mathbf{1}$ is no longer a good weight tensor and, as shown in Figure 2, HOSVD_p performs poorly, while w_HOSVD still works better than HOSVD.
[Figure 1: recovery errors of HOSVD, HOSVD_p, and w_HOSVD under uniform sampling.]
[Figure 2: recovery errors of HOSVD, HOSVD_p, and w_HOSVD under non-uniform sampling.]
VI-B Simulations for TV with Initialization from Weighted HOSVD (wHOSVD-TV)
VI-B1 Experimental Setup
In order to show the advantage of weighted HOSVD, we test our proposed algorithm and the simple TV minimization method, along with a baseline algorithm, on video data. We mask a specific ratio of the entries and run each completion algorithm to obtain the completed tensor $\widehat{\mathcal{T}}$. The tested sampling rates (SR) are 10%, 30%, 50%, and 80%. We then compute the relative root mean square error (RSE)
$$\mathrm{RSE} = \frac{\|\widehat{\mathcal{T}} - \mathcal{T}\|_F}{\|\mathcal{T}\|_F}$$
for each method to evaluate its performance. Meanwhile, we compare the average running time until each algorithm converges to a preset threshold.
VI-B2 Data
[Figure 3: the first frame of each preprocessed video.]
In this part, we test our algorithm on three videos; see [14]. The videos are tennis-serve clips from an Olympic Sports Dataset, and all three are color videos. In our simulation, we use the same set-up as in [14]: we pick 30 frames evenly from each video and rescale every frame to a common size, so that each video is transformed into a 4-D tensor of size height × width × 3 × 30. The first frame of each video after preprocessing is illustrated in Figure 3.
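A minimal sketch of this preprocessing step is shown below; the frame-selection rule, the nearest-neighbour resizing, and the target frame size are placeholders of ours, not the exact pipeline of [14]:

```python
import numpy as np

def resize_nn(frame, h, w):
    # Nearest-neighbour resize of an (H, W, 3) frame to (h, w, 3) via index sampling.
    H, W, _ = frame.shape
    rows = np.linspace(0, H - 1, h).round().astype(int)
    cols = np.linspace(0, W - 1, w).round().astype(int)
    return frame[np.ix_(rows, cols)]

def video_to_tensor(frames, n_frames=30, h=64, w=64):
    """Pick n_frames evenly from a list of (H, W, 3) frame arrays and stack them
    into an h x w x 3 x n_frames tensor."""
    idx = np.linspace(0, len(frames) - 1, n_frames).round().astype(int)
    picked = [resize_nn(frames[i], h, w) for i in idx]
    return np.stack(picked, axis=-1)          # shape (h, w, 3, n_frames)
```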
VI-B3 Numerical Results
Table I: RSE and running time for each video, method, and sampling rate (SR).

| Video | Method | SR | RSE | Time (s) |
|---|---|---|---|---|
| Video 1 | wHOSVD-TV | 10% | 0.2080 | 82.26 |
| Video 1 | wHOSVD-TV | 30% | 0.1418 | 50.40 |
| Video 1 | wHOSVD-TV | 50% | 0.1045 | 41.31 |
| Video 1 | wHOSVD-TV | 80% | 0.0566 | 33.21 |
| Video 1 | ST-HOSVD | 10% | N/A | N/A |
| Video 1 | ST-HOSVD | 30% | 0.1941 | 521.94 |
| Video 1 | ST-HOSVD | 50% | 0.1381 | 175.82 |
| Video 1 | ST-HOSVD | 80% | 0.0667 | 128.68 |
| Video 2 | wHOSVD-TV | 10% | 0.2694 | 35.21 |
| Video 2 | wHOSVD-TV | 30% | 0.1888 | 21.08 |
| Video 2 | wHOSVD-TV | 50% | 0.1411 | 16.55 |
| Video 2 | wHOSVD-TV | 80% | 0.0767 | 12.88 |
| Video 2 | ST-HOSVD | 10% | N/A | N/A |
| Video 2 | ST-HOSVD | 30% | 0.2249 | 1130.77 |
| Video 2 | ST-HOSVD | 50% | 0.1480 | 1304.31 |
| Video 2 | ST-HOSVD | 80% | 0.0749 | 976.65 |
| Video 3 | wHOSVD-TV | 10% | 0.2198 | 156.5 |
| Video 3 | wHOSVD-TV | 30% | 0.1394 | 87.89 |
| Video 3 | wHOSVD-TV | 50% | 0.0955 | 72.15 |
| Video 3 | wHOSVD-TV | 80% | 0.0470 | 18.44 |
| Video 3 | ST-HOSVD | 10% | N/A | N/A |
| Video 3 | ST-HOSVD | 30% | 0.1734 | 560.97 |
| Video 3 | ST-HOSVD | 50% | 0.1105 | 158.74 |
| Video 3 | ST-HOSVD | 80% | 0.0594 | 52.09 |
[Figure 4: convergence of TVTC and wHOSVD-TV on the three videos.]
The simulation results on the videos are reported in Table I and Figure 4. Previous studies have performed the same completion task on the same dataset (see [14]); in [14], ST-HOSVD [27] had the best performance among several low-rank-based tensor completion algorithms.
We record the completion error and running time for each completion task and compare them with the previous low-rank-based algorithms. One can observe that the TV-based algorithm is better suited to video data most of the time.
On the other hand, we have implemented total variation with two initializations: zero filling for the unobserved entries, and the tensor obtained from weighted HOSVD; these are termed TVTC and wHOSVD-TV, respectively. The iterative results are shown in Figure 4, which shows that using the result from weighted HOSVD as the initialization notably reduces the number of TV-minimization iterations needed to reach the convergence threshold.
VI-C Discussion
The relation between the smoothness pattern and the low-rank pattern is not well understood. When studying image-related data, both assumptions are usually imposed at the same time and combined into a mixed optimization problem. Through experiments, we find that, with uniformly random missing entries, the result of total variation minimization on image-like data has singular values closer to those of the original image.
We randomly mask 70% of the entries of several grey-scale images and perform both nuclear norm minimization and total variation minimization on them. Then the nuclear norms of the original image, the masked image, the TV estimate, and the NNM estimate are computed; see Figure 5. From this figure, we can see that the image recovered from TV minimization already has a smaller nuclear norm than the original image, while NNM pushes the nuclear norm even further away. By observing the singular values of the TV-recovered matrix and of the NNM-recovered matrix, we see that TV minimization better captures the smaller singular values and hence better preserves the overall structure of the original matrix.
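For concreteness, the nuclear-norm comparison in Figure 5 can be computed along the following lines; the image, the masking, and the TV/NNM recoveries are stand-ins here:

```python
import numpy as np

def nuclear_norm(X):
    # Nuclear norm: the sum of the singular values of X.
    return np.linalg.svd(X, compute_uv=False).sum()

rng = np.random.default_rng(0)
img = rng.random((128, 128))                 # stand-in for a grey-scale image
mask = rng.random(img.shape) >= 0.70         # mask 70% of the entries
masked = np.where(mask, img, 0.0)

print(nuclear_norm(img), nuclear_norm(masked))
# Given recoveries img_tv and img_nnm from the TV and NNM solvers, the same
# comparison would be nuclear_norm(img_tv) versus nuclear_norm(img_nnm).
```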
[Figure 5: nuclear norms of the original, masked, TV-recovered, and NNM-recovered images.]
Since both the nuclear norm and the TV norm are convex functions, the mixed minimization problem with the observed entries constrained to their given values produces a result whose nuclear norm lies between those of the two individual minimizers, where the individual results are obtained from the nuclear-norm-only and TV-only minimization problems under the same constraint.
Unlike user-rating data and synthetic low-rank tensors, image-like data tends to have a non-trivial tail of singular values. Figure 6 shows the similarity between the original image and the TV-recovered image, which gives us hints about the performance comparison between TV recovery and low-rank recovery.
Acknowledgement
Needell was partially supported by NSF CAREER #1348721 and NSF BIGDATA #1740325.
References
- [1] Jacob Abernethy, Francis Bach, Theodoros Evgeniou, and Jean-Philippe Vert. Low-rank matrix factorization with attributes. arXiv preprint cs/0611124, 2006.
- [2] Yonatan Amit, Michael Fink, Nathan Srebro, and Shimon Ullman. Uncovering shared structures in multiclass classification. In Proceedings of the 24th international conference on Machine learning, pages 17–24. ACM, 2007.
- [3] Carl J Appellof and Ernest R Davidson. Strategies for analyzing data from video fluorometric monitoring of liquid chromatographic effluents. Analytical Chemistry, 53(13):2053–2056, 1981.
- [4] Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on optimization, 20(4):1956–1982, 2010.
- [5] T Tony Cai, Wen-Xin Zhou, et al. Matrix completion via max-norm constrained optimization. Electronic Journal of Statistics, 10(1):1493–1525, 2016.
- [6] Emmanuel J Candes and Yaniv Plan. Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936, 2010.
- [7] Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational mathematics, 9(6):717, 2009.
- [8] J Douglas Carroll and Jih-Jie Chang. Analysis of individual differences in multidimensional scaling via an n-way generalization of “eckart-young” decomposition. Psychometrika, 35(3):283–319, 1970.
- [9] Antonin Chambolle, Vicent Caselles, Daniel Cremers, Matteo Novaga, and Thomas Pock. An introduction to total variation for image analysis. Theoretical foundations and numerical methods for sparse recovery, 9(263-340):227, 2010.
- [10] Pei Chen and David Suter. Recovering the missing components in a large noisy low-rank matrix: Application to sfm. IEEE transactions on pattern analysis and machine intelligence, 26(8):1051–1063, 2004.
- [11] Miaomiao Cheng, Liping Jing, and Michael Kwok-Po Ng. A weighted tensor factorization method for low-rank tensor completion. In 2019 IEEE Fifth International Conference on Multimedia Big Data (BigMM), pages 30–38. IEEE, 2019.
- [12] P Comon. Tensor decompositions: state of the art and applications. In Mathematics in Signal Processing V (Coventry, 2000), pages 1–24.
- [13] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular value decomposition. SIAM journal on Matrix Analysis and Applications, 21(4):1253–1278, 2000.
- [14] Zisen Fang, Xiaowei Yang, Le Han, and Xiaolan Liu. A sequentially truncated higher order singular value decomposition-based algorithm for tensor completion. IEEE transactions on cybernetics, 49(5):1956–1967, 2018.
- [15] Simon Foucart, Deanna Needell, Reese Pathak, Yaniv Plan, and Mary Wootters. Weighted matrix completion from non-random, non-uniform sampling patterns. arXiv preprint arXiv:1910.13986, 2019.
- [16] Pascal Getreuer. Rudin-osher-fatemi total variation denoising using split bregman. Image Processing On Line, 2:74–95, 2012.
- [17] David F Gleich and Lek-heng Lim. Rank aggregation via nuclear norm minimization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 60–68. ACM, 2011.
- [18] David Goldberg, David Nichols, Brian M Oki, and Douglas Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–71, 1992.
- [19] Richard A Harshman et al. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multimodal factor analysis. 1970.
- [20] Frank L Hitchcock. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6(1-4):164–189, 1927.
- [21] Frank L Hitchcock. Multiple invariants and generalized rank of a p-way matrix or tensor. Journal of Mathematics and Physics, 7(1-4):39–79, 1928.
- [22] Teng-Yu Ji, Ting-Zhu Huang, Xi-Le Zhao, Tian-Hui Ma, and Gang Liu. Tensor completion using total variation and low-rank matrix factorization. Information Sciences, 326:243–257, 2016.
- [23] Daniel Kressner, Michael Steinlechner, and Bart Vandereycken. Low-rank tensor completion by riemannian optimization. BIT Numerical Mathematics, 54(2):447–468, 2014.
- [24] Lingwei Li, Fei Jiang, and Ruimin Shen. Total variation regularized reweighted low-rank tensor completion for color image inpainting. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 2152–2156. IEEE, 2018.
- [25] Ji Liu, Przemyslaw Musialski, Peter Wonka, and Jieping Ye. Tensor completion for estimating missing values in visual data. IEEE transactions on pattern analysis and machine intelligence, 35(1):208–220, 2012.
- [26] Panagiotis Symeonidis, Alexandros Nanopoulos, and Yannis Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the 2008 ACM conference on Recommender systems, pages 43–50. ACM, 2008.
- [27] Nick Vannieuwenhoven, Raf Vandebril, and Karl Meerbergen. A new truncation strategy for the higher-order singular value decomposition. SIAM Journal on Scientific Computing, 34(2):A1027–A1052, 2012.
- [28] Yangyang Xu, Ruru Hao, Wotao Yin, and Zhixun Su. Parallel matrix factorization for low-rank tensor completion. Inverse Problems and Imaging, 9(2):601–624, 2015.