
Hyperspectral super-resolution via low rank tensor triple decomposition

Abstract.

Hyperspectral image (HSI) and multispectral image (MSI) fusion aims at producing a super-resolution image (SRI). In this paper, we establish a nonconvex optimization model for the image fusion problem through low rank tensor triple decomposition. Using the L-BFGS approach, we develop a first-order optimization algorithm, termed TTDSR, for obtaining the desired super-resolution image. Furthermore, two detailed methods are provided for calculating the gradient of the objective function. With the aid of the Kurdyka-Łojasiewicz property, the iterative sequence is proved to converge to a stationary point. Finally, experimental results on different datasets show the effectiveness of our proposed approach.

Key words and phrases:
Tensor triple decomposition, nonconvex programming, Kurdyka-Łojasiewicz property, image fusion, hyperspectral image, multispectral image
1991 Mathematics Subject Classification:
Primary: 90C30, 90C90, 90-08, 90-10, 49M37.
This work is supported by National Natural Science Foundation of China (grant No. 11901118 and 62073087).
Corresponding author: Jingya Chang

Xiaofei Cui1 and Jingya Chang*1

1School of Mathematics and Statistics, Guangdong University of Technology, China


1. Introduction

Hyperspectral imaging equipment uses a specialized sensor to produce images with high spectral resolution. Each pixel records the electromagnetic spectrum sampled at hundreds of contiguous wavelength intervals. Because hyperspectral images have hundreds of bands, their spectral information is abundant [32]. Hyperspectral images are now at the core of a large and growing number of remote sensing applications, such as environmental monitoring [4, 25], target classification [41, 8] and anomaly detection [7, 37].

With optical sensors, there is a fundamental trade-off among spectral resolution, spatial resolution and signal-to-noise ratio. Hyperspectral images therefore have high spectral resolution but low spatial resolution. In contrast, multispectral sensors, which have only a few spectral bands, produce images with higher spatial resolution but poorer spectral resolution [5].

The approaches for addressing the hyperspectral and multispectral image fusion problem can be categorized into four classes [42]. The pan-sharpening approach falls within the first category, which includes component substitution [27, 1] and multi-resolution analysis [6]. Selva [30] and Liu [22] proposed multi-resolution analysis methods that extract spatial details and inject them into the interpolated hyperspectral image. Chen et al. [9] divided all bands of the images into several regions and then fused the images of each region by the pan-sharpening approach. Unfortunately, these methods may lead to spectral distortion. The second category consists of subspace-based formulations. The main idea is to exploit a low-dimensional representation of hyperspectral images, which are regarded as linear combinations of a set of basis vectors or spectral features [18, 38, 34]. Hardie et al. [15] employed a Bayesian formulation to solve the image fusion problem via maximum a posteriori estimation. Methods based on matrix factorization fall within the third category. In this setting, the HSI and MSI are regarded as degraded versions of the reference SRI. Yokoya et al. [40] proposed a coupled nonnegative matrix factorization method, which blindly estimates the basis vectors and their coefficients and then computes the endmember and abundance matrices by an alternating optimization algorithm. In addition, there are methods combining sparse regularization and spatial regularization [19, 31], such as the low rank matrix decomposition model of [12, 42] that decomposes a hyperspectral image into a super-resolution image and a difference image.

Tensor decomposition methods fall within the fourth category. A tensor, as a higher-order extension of a matrix, can capture the correlation between the spatial and spectral domains of hyperspectral images simultaneously. Representing hyperspectral and multispectral images as third-order tensors has been widely used in hyperspectral image denoising and super-resolution problems. The hyperspectral image is a three-dimensional data cube with two spatial dimensions (height and width) and one spectral dimension [17, 33, 23]. Both the spatial details and the spectral structures of the SRI recovered in this way are therefore more accurate.

By utilizing low rank tensor decomposition techniques to exploit the low rank properties of hyperspectral images, high spatial resolution and accurate restoration guarantees are obtained. Kanatsoulis et al. [20] proposed treating the super-resolution problem as a coupled tensor approximation, where the super-resolution image satisfies a low rank Canonical Polyadic (CP) decomposition. Later, Li et al. [28, 24] considered a novel approach based on Tucker decomposition. Xu et al. [39] first assumed that the target SRI exhibits both sparsity and piecewise smoothness along its three modes and then used Tucker decomposition to construct the model. Dian et al. [13] applied the tensor nuclear norm to exploit the low rank property of the SRI. He et al. [16] proposed a tensor ring decomposition model for addressing the image fusion problem.

In this paper, we propose a novel model based on low rank tensor triple decomposition to address the hyperspectral and multispectral image fusion problem. Triple decomposition factorizes a third-order tensor into three smaller tensors with predetermined dimensions. It has been effectively applied in tensor completion and image restoration [29, 11]. The structures of image fusion and triple decomposition are presented in Figure 1. The optimization model is solved by the L-BFGS algorithm. Convergence results are established with the help of the Kurdyka-Łojasiewicz property of the objective function. Numerical experiments demonstrate that the proposed TTDSR method performs well, and the convergence conclusion is also validated numerically.

The rest of this paper is organized as follows. In section 2, we provide the definitions used in the paper and present the basic algebraic operations of tensors. In section 3, we introduce our motivation and the model for hyperspectral and multispectral image fusion. In section 4, the L-BFGS algorithm and the gradient calculation are presented in detail. Convergence analysis of the algorithm is given in section 5. In section 6, experimental results on two datasets are reported. The conclusion is given in section 7.

2. Preliminary

Let \mathbb{R} be the real field. We use lowercase letters and boldface lowercase letters to represent scalars and vectors respectively, while capital letters and calligraphic letters stand for matrices and tensors respectively.

An mmth order nn dimensional tensor 𝒜\mathcal{A} is a multidimensional array with its entry being

𝒜i1,i2,,im=ai1,i2,,im,forij=1,,n,j=1,,m.\mathcal{A}_{i_{1},i_{2},\ldots,i_{m}}=a_{i_{1},i_{2},\ldots,i_{m}},\ \text{for}\ i_{j}=1,\ldots,n,j=1,\ldots,m.

Similarly, the iith entry in a vector a is symbolized by (𝐚)i=ai(\mathbf{a})_{i}=a_{i}, and the (i,j)(i,j)th entry of a matrix AA is (A)ij=aij.(A)_{ij}=a_{ij}. Unless otherwise specified, the order of tensor is set to three hereafter in this paper. Without loss of generality, we denote 𝒜n1×n2×n3.\mathcal{A}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}. A fiber of 𝒜\mathcal{A} is a vector produced by fixing two indices of 𝒜,\mathcal{A}, such as 𝒜(i,j,:)\mathcal{A}(i,j,:) for any i[1,,n1],j[1,,n2]i\in[1,\ldots,n_{1}],j\in[1,\ldots,n_{2}]. A slice of 𝒜\mathcal{A} is a matrix by varying two of its indices while fixing another one, such as 𝒜(:,:,k)\mathcal{A}(:,:,k) for any k[1,,n3].k\in[1,\ldots,n_{3}]. The geometric structures of the tensor expansion are shown in Figure 2.
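As a quick illustration of these indexing conventions, the short NumPy sketch below (array names and sizes are our own, purely illustrative) extracts one fiber and one frontal slice from a random third-order tensor.

import numpy as np

n1, n2, n3 = 4, 5, 6
A = np.random.rand(n1, n2, n3)   # third-order tensor of size n1 x n2 x n3

# A mode-3 fiber A(i, j, :): fix the first two indices, vary the third.
fiber = A[1, 2, :]               # shape (n3,)

# A frontal slice A(:, :, k): fix the third index, vary the first two.
frontal_slice = A[:, :, 3]       # shape (n1, n2)

print(fiber.shape, frontal_slice.shape)   # (6,) (4, 5)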

The mode-kk product of the tensor 𝒜\mathcal{A} and a matrix XX is an extension of matrix product. Suppose matrices Fl×n1F\in\mathbb{R}^{l\times n_{1}}, Gm×n2G\in\mathbb{R}^{m\times n_{2}}, and Hn×n3H\in\mathbb{R}^{n\times n_{3}}. The mode-11 product of 𝒜\mathcal{A} and FF is denoted as 𝒜×1F\mathcal{A}\times_{1}F with its elements being

(𝒜×1F)tjk=i=1n1aijkfti,fort=1,l,j=1,,n2,k=1,n3.\left(\mathcal{A}\times_{1}{F}\right)_{tjk}=\sum_{i=1}^{n_{1}}a_{ijk}f_{ti},\quad\text{for}\ t=1,\ldots l,j=1,\ldots,n_{2},k=1,\ldots n_{3}.

Also we have

(𝒜×2G)ipk=j=1n2aijkgpj,(𝒜×3H)ijq=k=1n3aijkhqk,\left(\mathcal{A}\times_{2}{G}\right)_{ipk}=\sum_{j=1}^{n_{2}}a_{ijk}g_{pj},\quad\left(\mathcal{A}\times_{3}{H}\right)_{ijq}=\sum_{k=1}^{n_{3}}a_{ijk}h_{qk},\quad (1)

for p=1,m,p=1,\ldots m, q=1,,n.q=1,\ldots,n. It is easy to verify that

𝒜×pF×qG=𝒜×qG×pF,p,q=1,2,3andpq,𝒜×kF×kG=𝒜×k(GF),k=1,2,3.\begin{split}\mathcal{A}\times_{p}{F}\times_{q}{G}=&\mathcal{A}\times_{q}{G}\times_{p}{F},\quad\forall\ p,q=1,2,3\ \text{and}\ p\neq q,\\ \mathcal{A}\times_{k}{F}\times_{k}{G}=&\mathcal{A}\times_{k}({GF}),\quad\forall\ k=1,2,3.\end{split} (2)
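The mode-k products above are convenient to realize with einsum; the following sketch (our own illustration, not code from the paper) also checks the two identities in (2) numerically.

import numpy as np

def mode1(A, F):  # (A x_1 F)_{tjk} = sum_i a_{ijk} f_{ti}
    return np.einsum('ijk,ti->tjk', A, F)

def mode2(A, G):  # (A x_2 G)_{ipk} = sum_j a_{ijk} g_{pj}
    return np.einsum('ijk,pj->ipk', A, G)

def mode3(A, H):  # (A x_3 H)_{ijq} = sum_k a_{ijk} h_{qk}
    return np.einsum('ijk,qk->ijq', A, H)

n1, n2, n3, l, m = 4, 5, 6, 3, 2
A = np.random.rand(n1, n2, n3)
F = np.random.rand(l, n1)
G = np.random.rand(m, n2)

# Products along different modes commute: A x_1 F x_2 G = A x_2 G x_1 F.
print(np.allclose(mode2(mode1(A, F), G), mode1(mode2(A, G), F)))   # True

# Two products along the same mode collapse: A x_1 F x_1 F2 = A x_1 (F2 F).
F2 = np.random.rand(2, l)
print(np.allclose(mode1(mode1(A, F), F2), mode1(A, F2 @ F)))       # True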

The mode-kk matricization of 𝒜\mathcal{A} denoted by A(k){A}_{(k)} arranges its nkn_{k} mode-kk fibers as the columns of A(k)A_{(k)} in order. The Frobenius norm of tensor 𝒜\mathcal{A} is given by

𝒜F=(i=1n1j=1n2k=1n3aijk2)1/2.\|\mathcal{A}\|_{F}=\left(\sum_{i=1}^{n_{1}}\sum_{j=1}^{n_{2}}\sum_{k=1}^{n_{3}}a_{ijk}^{2}\right)^{1/2}. (3)

Throughout the following, unfoldings of tensors, matrices and vectors are carried out under the convention that the left index varies more rapidly than the right one. Subscripts indicate dimensions. We note that vec(B_{m\times n}) creates a column vector \mathbf{b}_{mn} by stacking the columns of B_{m\times n} below one another, i.e., (\mathbf{b}_{mn})_{i+(j-1)m}=b_{ij}. In addition, the Kronecker product of matrices B_{m\times n} and C_{p\times q} is

Bm×nCp×q=(b11Cp×qb12Cp×qb1nCp×qb21Cp×qb22Cp×qb2nCp×qbm1Cp×qbm2Cp×qbmnCp×q),B_{m\times n}\otimes C_{p\times q}=\left(\begin{array}[]{cccc}b_{11}C_{p\times q}&b_{12}C_{p\times q}&\cdots&b_{1n}C_{p\times q}\\ b_{21}C_{p\times q}&b_{22}C_{p\times q}&\cdots&b_{2n}C_{p\times q}\\ \cdots&\cdots&\cdots&\cdots\\ b_{m1}C_{p\times q}&b_{m2}C_{p\times q}&\cdots&b_{mn}C_{p\times q}\end{array}\right),\quad

It generates a large matrix D_{pm\times qn}=B_{m\times n}\otimes C_{p\times q}, whose entries are d_{gh}=b_{ij}c_{st} with g=s+(i-1)p and h=t+(j-1)q, for s=1,\ldots,p and t=1,\ldots,q. By this rule, the unfolding matrices of the tensor \mathcal{A} can be represented by

(An1×n2n3)i,j+(k1)n2=aijk,(An1×n3n2)i,k+(j1)n3=aijk,(An2×n3n1)j,k+(i1)n3=aijk,(An2×n1n3)j,i+(k1)n1=aijk,(An3×n1n2)k,i+(j1)n1=aijk,(An3×n2n1)k,j+(i1)n2=aijk.\begin{split}\left(A_{n_{1}\times n_{2}n_{3}}\right)_{i,j+(k-1)n_{2}}=a_{ijk},\quad&\left(A_{n_{1}\times n_{3}n_{2}}\right)_{i,k+(j-1)n_{3}}=a_{ijk},\\ \left(A_{n_{2}\times n_{3}n_{1}}\right)_{j,k+(i-1)n_{3}}=a_{ijk},\quad&\left(A_{n_{2}\times n_{1}n_{3}}\right)_{j,i+(k-1)n_{1}}=a_{ijk},\\ \left(A_{n_{3}\times n_{1}n_{2}}\right)_{k,i+(j-1)n_{1}}=a_{ijk},\quad&\left(A_{n_{3}\times n_{2}n_{1}}\right)_{k,j+(i-1)n_{2}}=a_{ijk}.\end{split} (4)

Moreover, An1n2×n3=(An3×n1n2).A_{n_{1}n_{2}\times n_{3}}=\left(A_{n_{3}\times n_{1}n_{2}}\right)^{\top}.
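Under this column-major ("left index varies fastest") convention, vec, the Kronecker product and the unfoldings in (4) correspond to standard column-major reshapes; the sketch below is a minimal numerical check of two of these relations, with illustrative sizes.

import numpy as np

n1, n2, n3 = 3, 4, 5
A = np.random.rand(n1, n2, n3)

# vec(B) stacks columns: (b)_{i+(j-1)m} = b_{ij}  -> column-major flattening.
B = np.random.rand(3, 2)
b = B.flatten(order='F')
print(np.isclose(b[0 + 1 * 3], B[0, 1]))        # True

# Unfolding A_{n1 x n2 n3}: (A_{n1 x n2 n3})_{i, j+(k-1)n2} = a_{ijk}.
A1 = A.reshape(n1, n2 * n3, order='F')
i, j, k = 1, 2, 3
print(np.isclose(A1[i, j + k * n2], A[i, j, k]))   # True

# A_{n1 n2 x n3} is the transpose of A_{n3 x n1 n2}.
A12 = A.reshape(n1 * n2, n3, order='F')
A3 = np.moveaxis(A, 2, 0).reshape(n3, n1 * n2, order='F')
print(np.allclose(A12, A3.T))                    # True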

Refer to caption
Refer to caption
Figure 1. The image fusion is displayed in the left image, and the tensor triple decomposition structure is shown in the right.
Refer to caption
Figure 2. Slices and fibers of tensor 𝒜\mathcal{A}. From left to right are horizontal slices 𝒜i,:,:\mathcal{A}_{i,:,:}, frontal slices 𝒜:,:,k\mathcal{A}_{:,:,k}, column fibers 𝒜i,:,k\mathcal{A}_{i,:,k}, and row fibers 𝒜:,j,k\mathcal{A}_{:,j,k}.

3. The tensor triple decomposition (TTD) model for HSI-MSI fusion

Tensors can describe high dimensional relationships. Hyperspectral and multispectral images are naturally third-order tensors. Methods based on low rank matrix decomposition first reshape the 3-D tensor data into a 2-D matrix, which destroys the original 3-D structure and may negatively affect the fusion results. Therefore, hyperspectral and multispectral image fusion problems have been investigated with the aid of various tensor decompositions in recent years. The tensor triple decomposition model has certain advantages over the classical CP and Tucker decompositions. The triple rank is less than or equal to the CP rank, which has been demonstrated both theoretically and numerically. As described in Theorem 2.1 of [29], low rank triple decomposition and triple ranks are well-defined. Triple decomposition has low rank properties and performs well in image restoration. For large-scale problems, a lower rank means that we transform the problem into a lower dimensional subspace, which is relatively easy to solve. Thus, we employ the tensor triple decomposition framework to establish the hyperspectral super-resolution model.

Define 𝒴_h ∈ ℝ^{n_1×n_2×n_3}, 𝒴_m ∈ ℝ^{m_1×m_2×m_3} and 𝒵 ∈ ℝ^{m_1×m_2×n_3} as the hyperspectral image (HSI), the multispectral image (MSI) and the super-resolution image (SRI), respectively. Here the first two indices, n_1 and n_2 or m_1 and m_2, denote the spatial dimensions, and the third index, n_3 or m_3, indicates the number of spectral bands. Usually, the MSI contains more spatial information than the HSI, that is, m_1 ≫ n_1 and m_2 ≫ n_2, while the HSI has more spectral bands than the MSI, which means n_3 ≫ m_3. Our goal is to find the SRI, which possesses the high spectral resolution of the HSI and the high spatial resolution of the MSI.

3.1. The links among SRI, HSI and MSI

Let the mode-33 matricization of 𝒴h\mathcal{Y}_{h}, 𝒴m\mathcal{Y}_{m} and 𝒵\mathcal{Z} be (Yh)(3)n1n2×n3,(Ym)(3)m1m2×m3(Y_{h})_{(3)}\in\mathbb{R}^{n_{1}n_{2}\times n_{3}},(Y_{m})_{(3)}\in\mathbb{R}^{m_{1}m_{2}\times m_{3}} and Z(3)m1m2×n3.Z_{(3)}\in\mathbb{R}^{m_{1}m_{2}\times n_{3}}. The key point which helps us to construct the relationship among SRI, HSI and MSI is that there exist two linear operators Phn1n2×m1m2P_{h}\in\mathbb{R}^{n_{1}n_{2}\times m_{1}m_{2}} and P3m3×n3P_{3}\in\mathbb{R}^{m_{3}\times n_{3}} such that (Yh)(3)=PhZ(3)(Y_{h})_{(3)}=P_{h}Z_{(3)} and (Ym)(3)=Z(3)P3T(Y_{m})_{(3)}=Z_{(3)}P_{3}^{T} [20]. Thus we have

𝒴h(:,:,k)=P1𝒵(:,:,k)P2T,fork=1,,n3,\mathcal{Y}_{h}(:,:,k)=P_{1}\mathcal{Z}(:,:,k)P_{2}^{T},\quad\text{for}\ k=1,\ldots,n_{3}, (5)

where P1n1×m1P_{1}\in\mathbb{R}^{n_{1}\times m_{1}}, P2n2×m2P_{2}\in\mathbb{R}^{n_{2}\times m_{2}} and P2P1=Ph.P_{2}\otimes P_{1}=P_{h}. We also have

𝒴m(i,j,:)=P3𝒵(i,j,:),fori=1,,m1,j=1,,m2,\mathcal{Y}_{m}(i,j,:)={P}_{3}\mathcal{Z}(i,j,:),\quad\text{for}\ i=1,\ldots,m_{1},\ j=1,\ldots,m_{2}, (6)

where 𝒵(i,j,:)n3\mathcal{Z}(i,j,:)\in\mathbb{R}^{n_{3}} represents a fiber of 𝒵\mathcal{Z} and 𝒴m(i,j,:)m3\mathcal{Y}_{m}(i,j,:)\in\mathbb{R}^{m_{3}} is a fiber of 𝒴m\mathcal{Y}_{m} respectively. Therefore, 𝒴h\mathcal{Y}_{h}, 𝒴m\mathcal{Y}_{m} can be rewritten as

𝒴h=𝒵×1P1×2P2,𝒴m=𝒵×3P3.\mathcal{Y}_{h}=\mathcal{Z}\times_{1}{P}_{1}\times_{2}{P}_{2},\quad\mathcal{Y}_{m}=\mathcal{Z}\times_{3}{P}_{3}. (7)

Equations (5) and (6) are also known as the spatial and spectral degradation processes of 𝒵. The matrices P_1 and P_2 describe the blurring and downsampling of the spatial degradation process along the two spatial dimensions. Downsampling is regarded as a linear compression, while blurring describes a linear mixing of neighbouring pixels under a specific kernel in both the row and column dimensions. The matrix P_3 is usually modeled as a band-selection matrix that selects the common spectral bands of the SRI and MSI.
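For concreteness, the sketch below builds toy versions of P_1, P_2 and P_3 (a Gaussian-blur-then-downsample matrix along each spatial dimension and a band-selection matrix) and applies the degradations (5)-(6). The kernel, border handling and band selection are illustrative assumptions, not the exact operators used in the experiments of section 6.

import numpy as np

def blur_downsample_matrix(m, d, q, sigma=1.0):
    """P in R^{(m//d) x m}: 1-D Gaussian blur with kernel size q, then keep every d-th row."""
    half = q // 2
    offsets = np.arange(-half, half + 1)
    kernel = np.exp(-offsets**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    B = np.zeros((m, m))
    for row in range(m):
        cols = np.clip(row + offsets, 0, m - 1)   # replicate border pixels
        np.add.at(B[row], cols, kernel)
    return B[::d]                                 # downsampling: keep every d-th row

def band_selection_matrix(n3, m3):
    """P3 in R^{m3 x n3}: select m3 roughly evenly spaced bands out of n3."""
    idx = np.linspace(0, n3 - 1, m3).astype(int)
    P3 = np.zeros((m3, n3))
    P3[np.arange(m3), idx] = 1.0
    return P3

m1, m2, n3, d, q, m3 = 16, 16, 20, 4, 9, 4
P1 = blur_downsample_matrix(m1, d, q)   # shape (4, 16)
P2 = blur_downsample_matrix(m2, d, q)
P3 = band_selection_matrix(n3, m3)      # shape (4, 20)

Z = np.random.rand(m1, m2, n3)                    # toy SRI
Yh = np.einsum('abk,ia,jb->ijk', Z, P1, P2)       # spatial degradation (5)
Ym = np.einsum('ijc,kc->ijk', Z, P3)              # spectral degradation (6)
print(Yh.shape, Ym.shape)                         # (4, 4, 20) (16, 16, 4)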

Based on the correlation of spatial and spectral information in hyperspectral and multispectral images, various low rank tensor decomposition models are established to study the problem. For example, suppose 𝒵\mathcal{Z} is decomposed by Canonical Polyadic decomposition into the sum of several rank-11 tensors, which is represented as 𝒵=A,B,C.\mathcal{Z}=\llbracket{A},{B},{C}\rrbracket. The HSI-MSI fusion model [20] is denoted as 𝒴h=P1A,P2B,C\mathcal{Y}_{h}=\llbracket{P}_{1}{A},{P}_{2}{B},{C}\rrbracket and 𝒴m=A,B,P3C\mathcal{Y}_{m}=\llbracket{A},{B},{P}_{3}{C}\rrbracket. Besides, tucker decomposition is considered [28, 44] and it is expressed as 𝒴h=𝒞×1(P1A)×2(P2B)×3C\mathcal{Y}_{h}=\mathcal{C}\times_{1}\left({P}_{1}{A}\right)\times_{2}\left({P}_{2}{B}\right)\times_{3}{C} and 𝒴m=𝒞×1A×2B×3(P3C)\mathcal{Y}_{m}=\mathcal{C}\times_{1}{A}\times_{2}{B}\times_{3}\left({P}_{3}{C}\right).

3.2. The new model for HSI-MSI fusion

In [29], Qi et al. proposed a new tensor decomposition, named tensor triple decomposition, which can effectively express the original tensor information by three low rank tensors. Therefore, we establish the hyperspectral and multispectral image fusion model by low rank tensor triple decomposition.

First, we introduce the tensor triple decomposition in detail. The tensor triple decomposition of a third-order tensor 𝒵 ∈ ℝ^{m_1×m_2×n_3} representing the target SRI takes the following form

𝒵=𝒜𝒞,\mathcal{Z}=\mathcal{A}\mathcal{B}\mathcal{C}, (8)

where 𝒜 ∈ ℝ^{m_1×m×n}, ℬ ∈ ℝ^{l×m_2×n} and 𝒞 ∈ ℝ^{l×m×n_3}. It can also be denoted as 𝒵 = ⟦𝒜, ℬ, 𝒞⟧ with its entries

(𝒜,,𝒞)ijk=t=1lp=1mq=1naipqbtjqctpk,(\llbracket\mathcal{A},\mathcal{B},\mathcal{C}\rrbracket)_{ijk}=\sum_{t=1}^{l}\sum_{p=1}^{m}\sum_{q=1}^{n}a_{ipq}b_{tjq}c_{tpk}, (9)

for i=1,…,m_1, j=1,…,m_2, and k=1,…,n_3. If l=m=n=r, the smallest value of r such that (8) holds is called the triple rank of 𝒵, denoted rank(𝒵)=r, and (8) is called a low rank triple decomposition of 𝒵, where r ≤ mid{m_1,m_2,n_3} and mid denotes the middle value of the three dimensions [29, 11]. In particular, according to [11, Theorem 2.2], the triple decomposition satisfies the following equations

Zm1×m2n3=Am1×nm(EmBn×m2l)(Clm×n3Em2),Zm2×n3m1=Bm2×ln(EnCl×n3m)(Amn×m1En3),Zn3×m2m1=Cn3×lm(EmBl×m2n)(Anm×m1Em2),\begin{split}Z_{m_{1}\times m_{2}n_{3}}&=A_{m_{1}\times nm}\left(E_{m}\otimes B_{n\times m_{2}l}\right)\left(C_{lm\times n_{3}}\otimes E_{m_{2}}\right),\\ Z_{m_{2}\times n_{3}m_{1}}&=B_{m_{2}\times ln}\left(E_{n}\otimes C_{l\times n_{3}m}\right)\left(A_{mn\times m_{1}}\otimes E_{n_{3}}\right),\\ Z_{n_{3}\times m_{2}m_{1}}&=C_{n_{3}\times lm}\left(E_{m}\otimes B_{l\times m_{2}n}\right)\left(A_{nm\times m_{1}}\otimes E_{m_{2}}\right),\end{split} (10)

where EE is an identity matrix with a proper size.
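Definition (9) and the matricization identities (10) are easy to verify numerically; the sketch below (with illustrative sizes and l=m=n=r) forms ⟦𝒜, ℬ, 𝒞⟧ via einsum and checks the first identity of (10) under the column-major unfolding convention of section 2.

import numpy as np

def triple(A, B, C):
    """(A, B, C)_{ijk} = sum_{t,p,q} a_{ipq} b_{tjq} c_{tpk}, cf. (9)."""
    return np.einsum('ipq,tjq,tpk->ijk', A, B, C)

m1, m2, n3, r = 6, 7, 8, 3           # low rank case with l = m = n = r
A = np.random.rand(m1, r, r)         # A in R^{m1 x m x n}
B = np.random.rand(r, m2, r)         # B in R^{l x m2 x n}
C = np.random.rand(r, r, n3)         # C in R^{l x m x n3}
Z = triple(A, B, C)

# First identity of (10), with column-major ("left index fastest") unfoldings.
Z1 = Z.reshape(m1, m2 * n3, order='F')                         # Z_{m1 x m2 n3}
A1 = A.transpose(0, 2, 1).reshape(m1, r * r, order='F')        # A_{m1 x nm}
B1 = B.transpose(2, 1, 0).reshape(r, m2 * r, order='F')        # B_{n x m2 l}
C1 = C.reshape(r * r, n3, order='F')                           # C_{lm x n3}
rhs = A1 @ np.kron(np.eye(r), B1) @ np.kron(C1, np.eye(m2))
print(np.allclose(Z1, rhs))   # True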

Next, we propose a new model with the help of low rank tensor triple decomposition. Assume 𝒜\mathcal{A}, \mathcal{B} and 𝒞\mathcal{C} are the low rank triple decomposition tensors of tensor 𝒵.\mathcal{Z}. For the connection (7), we have

𝒴h=𝒜×1P1,×2P2,𝒞and𝒴m=𝒜,,𝒞×3P3.\mathcal{Y}_{h}=\llbracket\mathcal{A}\times_{1}{P_{1}},\mathcal{B}\times_{2}{P_{2}},\mathcal{C}\rrbracket\quad\text{and}\quad\mathcal{Y}_{m}=\llbracket\mathcal{A},\mathcal{B},\mathcal{C}\times_{3}{P_{3}}\rrbracket. (11)

Since the best low rank approximation problem of tensors may be ill-posed, we add the Tikhonov regularization term and get the following optimization model

min𝒜,,𝒞f(𝒜,,𝒞):=𝒴h𝒜×1P1,×2P2,𝒞F2+𝒴m𝒜,,𝒞×3P3F2+μ(𝒜F2+F2+𝒞F2),\begin{split}\min_{\mathcal{A},\mathcal{B},\mathcal{C}}f(\mathcal{A},\mathcal{B},\mathcal{C}):&=\left\|\mathcal{Y}_{h}-\llbracket\mathcal{A}\times_{1}{P_{1}},\mathcal{B}\times_{2}{P_{2}},\mathcal{C}\rrbracket\right\|_{F}^{2}+\left\|\mathcal{Y}_{m}-\llbracket\mathcal{A},\mathcal{B},\mathcal{C}\times_{3}{P_{3}}\rrbracket\right\|_{F}^{2}\\ &\quad+\mu\left(\|\mathcal{A}\|_{F}^{2}+\|\mathcal{B}\|_{F}^{2}+\|\mathcal{C}\|_{F}^{2}\right),\end{split} (12)

where μ is the regularization parameter. Thus, we solve the optimization model (12) to obtain the triple decomposition tensors 𝒜, ℬ and 𝒞, and then produce the super-resolution image SRI by 𝒵 = ⟦𝒜, ℬ, 𝒞⟧.
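As a sanity check of the model, the objective in (12) can be evaluated directly by combining the triple product (9) with the degraded factors in (11); the sketch below is one possible implementation with toy sizes (all names and dimensions are illustrative).

import numpy as np

def triple(A, B, C):
    # (A, B, C)_{ijk} = sum_{t,p,q} a_{ipq} b_{tjq} c_{tpk}
    return np.einsum('ipq,tjq,tpk->ijk', A, B, C)

def objective(A, B, C, Yh, Ym, P1, P2, P3, mu):
    """Value of f(A, B, C) in (12)."""
    A1 = np.einsum('ipq,ai->apq', A, P1)     # A x_1 P1, spatial degradation in (11)
    B2 = np.einsum('tjq,bj->tbq', B, P2)     # B x_2 P2
    C3 = np.einsum('tpk,ck->tpc', C, P3)     # C x_3 P3, spectral degradation in (11)
    res_h = Yh - triple(A1, B2, C)
    res_m = Ym - triple(A, B, C3)
    reg = mu * (np.sum(A**2) + np.sum(B**2) + np.sum(C**2))
    return np.sum(res_h**2) + np.sum(res_m**2) + reg

# Tiny illustrative sizes (not the experimental settings).
m1, m2, n3, n1, n2, m3, r = 8, 8, 10, 4, 4, 3, 2
P1, P2, P3 = np.random.rand(n1, m1), np.random.rand(n2, m2), np.random.rand(m3, n3)
A, B, C = np.random.rand(m1, r, r), np.random.rand(r, m2, r), np.random.rand(r, r, n3)
Yh = triple(np.einsum('ipq,ai->apq', A, P1), np.einsum('tjq,bj->tbq', B, P2), C)
Ym = triple(A, B, np.einsum('tpk,ck->tpc', C, P3))
print(objective(A, B, C, Yh, Ym, P1, P2, P3, mu=1.0))   # equals mu*(||A||^2+||B||^2+||C||^2)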

4. The L-BFGS algorithm for solving the model

In this section, we focus on numerical approaches for computing a first-order stationary point of the optimization problem (12). Since the limited memory BFGS (L-BFGS) algorithm [21] is powerful for large scale nonlinear unconstrained optimization, we apply it to produce a search direction and use an inexact line search technique to update the iteration step size. In the computation process, we either matricize or vectorize the tensor variable to obtain the gradient of the objective function. For convenience, we demonstrate the algorithm and its convergence analysis for

𝐱:=(𝐚m1rr𝐛m2rr𝐜n3rr)=(Am1rr×1Bm2rr×1Cn3rr×1).\mathbf{x}:=\left(\begin{array}[]{l}\mathbf{a}_{m_{1}rr}\\ \mathbf{b}_{m_{2}rr}\\ \mathbf{c}_{n_{3}rr}\end{array}\right)=\left(\begin{array}[]{c}A_{m_{1}rr\times 1}\\ B_{m_{2}rr\times 1}\\ C_{n_{3}rr\times 1}\end{array}\right). (13)
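The vectorization in (13) is simply a column-major flattening of the three factor tensors stacked into one long vector; the small pack/unpack pair below (helper names are our own) makes the correspondence explicit.

import numpy as np

def pack(A, B, C):
    """Stack vec(A), vec(B), vec(C) (column-major) into one vector x, cf. (13)."""
    return np.concatenate([A.flatten(order='F'), B.flatten(order='F'), C.flatten(order='F')])

def unpack(x, m1, m2, n3, r):
    """Recover the factor tensors A, B, C from x."""
    sA, sB = m1 * r * r, m2 * r * r
    A = x[:sA].reshape(m1, r, r, order='F')
    B = x[sA:sA + sB].reshape(r, m2, r, order='F')
    C = x[sA + sB:].reshape(r, r, n3, order='F')
    return A, B, C

m1, m2, n3, r = 8, 9, 10, 2
A, B, C = np.random.rand(m1, r, r), np.random.rand(r, m2, r), np.random.rand(r, r, n3)
x = pack(A, B, C)
A2, B2, C2 = unpack(x, m1, m2, n3, r)
print(x.size, np.allclose(A, A2) and np.allclose(B, B2) and np.allclose(C, C2))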

4.1. Limited memory BFGS algorithm

BFGS is a quasi-Newton method which iteratively updates an approximation of the inverse of the Hessian. At iteration k, it constructs an approximation matrix H_k to estimate the inverse Hessian of f(𝐱). The gradient of f(𝐱) is denoted by 𝐠(𝐱). We first introduce the basic BFGS update. Let

𝐲k=f(𝐱k+1)f(𝐱k),𝐬k=𝐱k+1𝐱k,Vk=Iρk𝐲k𝐬k,\quad\mathbf{y}_{k}=\nabla f\left(\mathbf{x}_{k+1}\right)-\nabla f\left(\mathbf{x}_{k}\right),\quad\mathbf{s}_{k}=\mathbf{x}_{k+1}-\mathbf{x}_{k},\quad V_{k}=I-\rho_{k}\mathbf{y}_{k}\mathbf{s}_{k}^{\top}, (14)

and

ρk={1𝐲k𝐬k, if 𝐲k𝐬kϵ0, otherwise ,\begin{aligned} \hfil\displaystyle\begin{split}\rho_{k}=\left\{\begin{array}[]{ll}\frac{1}{\mathbf{y}_{k}^{\top}\mathbf{s}_{k}},&\text{ if }\mathbf{y}_{k}^{\top}\mathbf{s}_{k}\geq\epsilon\\ 0,&\text{ otherwise }\end{array}\right.\end{split}\end{aligned}, (15)

where II is an identity matrix, \top represents transposition and ϵ(0,1)\epsilon\in(0,1) is a small positive constant. The matrix HkH_{k} is updated by

Hk+1=VkHkVk+ρk𝐬k𝐬k.H_{k+1}=V_{k}^{\top}H_{k}V_{k}+\rho_{k}\mathbf{s}_{k}\mathbf{s}_{k}^{\top}. (16)

To deal with large-scale optimization problems, Liu and Nocedal [21] proposed the L-BFGS algorithm, which implements BFGS in an economical manner. Given a constant l, when k ≥ l, only the information from the most recent l iterations is used to calculate the matrix H_k by applying the following update recursively

Hk=VkmHkmVkm+ρkm𝐬km𝐬km, for m=1,2,,l.{H_{k}}=V_{k-m}^{\top}H_{k-m}V_{k-m}+\rho_{k-m}\mathbf{s}_{k-m}\mathbf{s}_{k-m}^{\top},\quad\text{ for }m=1,2,\ldots,l. (17)

In order to save memory, the initial matrix HklH_{k-l} is replaced by

Hk(0)=γkI.H_{k}^{(0)}=\gamma_{k}{I}. (18)

Here γk>0\gamma_{k}>0 is usually determined by the Barzilai-Borwein method [3] as follows

γkBB1=𝐲k𝐬k𝐲k2 and γkBB2=𝐬k2𝐲k𝐬k.\gamma_{k}^{\mathrm{BB}1}=\frac{\mathbf{y}_{k}^{\top}\mathbf{s}_{k}}{\left\|\mathbf{y}_{k}\right\|^{2}}\quad\text{ and }\quad\gamma_{k}^{\mathrm{BB}2}=\frac{\left\|\mathbf{s}_{k}\right\|^{2}}{\mathbf{y}_{k}^{\top}\mathbf{s}_{k}}. (19)

If lkl\geq k, HkH_{k} is generated by the traditional BFGS method. The L-BFGS method can be implemented in an inexpensive two-loop recursion way which is shown in Algorithm 1.

The direction 𝐩_k = −H_k 𝐠(𝐱_k) generated by L-BFGS is a gradient-related descent direction since H_k is a positive definite matrix; the proof is presented in Lemma B.2. We then find a proper step size along the direction 𝐩_k that satisfies the Armijo condition. The resulting computational method for the model (12) is summarized in Algorithm 2.

Algorithm 1 The two-loop recursion for L-BFGS [10, 21].
1:  𝐪𝐠(𝐱k)\mathbf{q}\leftarrow-\mathbf{g}\left(\mathbf{x}_{k}\right)
2:  for  i=k1,k2,,kli=k-1,k-2,\ldots,k-l  do
3:     αiρi𝐬i𝐪\quad\alpha_{i}\leftarrow\rho_{i}\mathbf{s}_{i}^{\top}\mathbf{q}
4:     𝐪𝐪αi𝐲i\quad\mathbf{q}\leftarrow\mathbf{q}-\alpha_{i}\mathbf{y}_{i}
5:  end for
6:  Set 𝐩 ← γ_k 𝐪, i.e., 𝐩 ← H_k^{(0)} 𝐪 with H_k^{(0)} given in (18).
7:  for i = k-l, k-l+1, …, k-1 do
8:     β ← ρ_i 𝐲_i^⊤ 𝐩
9:     𝐩 ← 𝐩 + 𝐬_i (α_i − β)
10:  end for
11:  Stop with result 𝐩 = −H_k 𝐠(𝐱_k).
Algorithm 2 Low rank triple decomposition for obtaining super-resolution image (TTDSR).
1:  Choose a constant r and an initial iterate 𝐱_0 ∈ ℝ^{m_1rr+m_2rr+n_3rr}. Select parameters l>0, β∈(0,1), and σ∈(0,1). Compute f_0=f(𝐱_0) and 𝐠_0=∇f(𝐱_0). Set k←0.
2:  while the sequence of iterates does not converge do
3:     Generate 𝐩k=Hk𝐠(𝐱k)\mathbf{p}_{k}=-H_{k}\mathbf{g}\left(\mathbf{x}_{k}\right) by Algorithm 1.
4:     Choose the smallest nonnegative integer ω\omega such that the step size α=βω\alpha=\beta^{\omega} satisfies
f(𝐱k+α𝐩k)f(𝐱k)+σα𝐩k𝐠k.f\left(\mathbf{x}_{k}+\alpha\mathbf{p}_{k}\right)\leq f\left(\mathbf{x}_{k}\right)+\sigma\alpha\mathbf{p}_{k}^{\top}\mathbf{g}_{k}. (20)
5:     Let αk=βω\alpha_{k}=\beta^{\omega} and update the new iterate 𝐱k+1=𝐱k+αk𝐩k\mathbf{x}_{k+1}=\mathbf{x}_{k}+\alpha_{k}\mathbf{p}_{k}.
6:     Compute fk+1=f(𝐱k+1)f_{k+1}=f\left(\mathbf{x}_{k+1}\right) and 𝐠k+1=f(𝐱k+1)\mathbf{g}_{k+1}=\nabla f\left(\mathbf{x}_{k+1}\right).
7:     Compute 𝐲k\mathbf{y}_{k}, 𝐬k\mathbf{s}_{k} and ρk\mathbf{\rho}_{k} by (14) and (15), respectively.
8:     kk+1.k\leftarrow k+1.
9:  end while
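For readers who prefer code, a compact sketch of Algorithms 1-2 for a generic objective f with gradient ∇f is given below; it is our own illustrative implementation (not the authors' released code), using the BB1 scaling from (19) for H_k^{(0)} and the Armijo backtracking in (20).

import numpy as np

def lbfgs_direction(g, s_list, y_list, rho_list, gamma):
    """Two-loop recursion (Algorithm 1): returns p = -H_k g."""
    q = -g.copy()
    alphas = []
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rho_list)):
        a = rho * s.dot(q)
        q -= a * y
        alphas.append(a)
    p = gamma * q                                   # p <- H_k^(0) q with H_k^(0) = gamma * I
    for (s, y, rho), a in zip(zip(s_list, y_list, rho_list), reversed(alphas)):
        b = rho * y.dot(p)
        p += s * (a - b)
    return p

def ttdsr_lbfgs(f, grad, x0, l=5, beta=0.5, sigma=0.01, eps=1e-8, max_iter=400):
    """Armijo-backtracked L-BFGS iteration in the spirit of Algorithm 2."""
    x, g = x0.copy(), grad(x0)
    s_list, y_list, rho_list, gamma = [], [], [], 1.0
    for _ in range(max_iter):
        p = lbfgs_direction(g, s_list, y_list, rho_list, gamma)
        alpha, fx = 1.0, f(x)
        while alpha > 1e-12 and f(x + alpha * p) > fx + sigma * alpha * p.dot(g):
            alpha *= beta                           # Armijo condition (20)
        x_new = x + alpha * p
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y.dot(s) >= eps:                         # curvature safeguard, cf. (15)
            rho = 1.0 / y.dot(s)
            gamma = y.dot(s) / y.dot(y)             # BB1 scaling (19)
            s_list.append(s); y_list.append(y); rho_list.append(rho)
            if len(s_list) > l:                     # keep only the most recent l pairs
                s_list.pop(0); y_list.pop(0); rho_list.pop(0)
        x, g = x_new, g_new
        if np.max(np.abs(g)) < 1e-10:
            break
    return x

# Usage on a toy quadratic: minimize ||x||^2 (minimum at 0).
x_star = ttdsr_lbfgs(lambda x: x.dot(x), lambda x: 2 * x, np.ones(10))
print(np.linalg.norm(x_star))   # close to 0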

4.2. Gradient calculation

At each iteration of Algorithm 2, we need to compute the gradient of the objective function. The gradient can be calculated by reshaping the tensor variable into a vector or a matrix. For small scale problems, the gradient is easier to obtain when the tensor variable is turned into a vector rather than a matrix, while for large scale problems the vector form of the gradient often leads to insufficient memory in the computation process. In this subsection, we derive the matrix form of the gradient; the vector form of ∇f is given in Appendix A.

Denote ℋ = 𝒜 ×_1 P_1 ∈ ℝ^{n_1×m×n}, 𝒦 = ℬ ×_2 P_2 ∈ ℝ^{l×n_2×n}, and 𝒢 = 𝒞 ×_3 P_3 ∈ ℝ^{l×m×m_3}. According to (10) and (11), we get the following equations

(Yh)n1×n2n3\displaystyle\left(Y_{h}\right)_{n_{1}\times{n_{2}{n_{3}}}} =Pn1×m1Am1×nm(EmKn×n2l)(Clm×n3En2),\displaystyle=P_{n_{1}\times m_{1}}A_{m_{1}\times nm}\left(E_{m}\otimes K_{n\times n_{2}l}\right)\left(C_{lm\times n_{3}}\otimes E_{n_{2}}\right), (21)
(Yh)n2×n3n1\displaystyle\left(Y_{h}\right)_{n_{2}\times{{n_{3}n_{1}}}} =Pn2×m2Bm2×ln(EnCl×n3m)(Hmn×n1En3),\displaystyle=P_{n_{2}\times m_{2}}B_{m_{2}\times ln}\left(E_{n}\otimes C_{l\times n_{3}m}\right)\left(H_{mn\times n_{1}}\otimes E_{n_{3}}\right), (22)
(Yh)n3×n2n1\displaystyle\left(Y_{h}\right)_{n_{3}\times{{n_{2}n_{1}}}} =Cn3×lm(EmKl×n2n)(Hnm×n1En2),\displaystyle=C_{n_{3}\times lm}\left(E_{m}\otimes K_{l\times n_{2}n}\right)\left(H_{nm\times n_{1}}\otimes E_{n_{2}}\right), (23)
(Ym)m1×m2m3\displaystyle\left(Y_{m}\right)_{m_{1}\times{{m_{2}m_{3}}}} =Am1×nm(EmBn×m2l)(Glm×m3Em2),\displaystyle=A_{m_{1}\times nm}\left(E_{m}\otimes B_{n\times m_{2}l}\right)\left(G_{lm\times m_{3}}\otimes E_{m_{2}}\right), (24)
(Ym)m2×m3m1\displaystyle\left(Y_{m}\right)_{m_{2}\times{{m_{3}m_{1}}}} =Bm2×ln(EnGl×m3m)(Amn×m1Em3),\displaystyle=B_{m_{2}\times ln}\left(E_{n}\otimes G_{l\times m_{3}m}\right)\left(A_{mn\times m_{1}}\otimes E_{m_{3}}\right), (25)
(Ym)m3×m2m1\displaystyle\left(Y_{m}\right)_{m_{3}\times{{m_{2}m_{1}}}} =Pm3×n3Cn3×lm(EmBl×m2n)(Anm×m1Em2).\displaystyle=P_{m_{3}\times n_{3}}C_{n_{3}\times lm}\left(E_{m}\otimes B_{l\times m_{2}n}\right)\left(A_{nm\times m_{1}}\otimes E_{m_{2}}\right). (26)

Define P2Bm2×ln=Kn2×ln{P}_{2}{B}_{{m}_{2}\times{ln}}={K}_{{n}_{2}\times ln} and P3Cn3×lm=Gm3×lm.{P}_{3}{C}_{{n}_{3}\times lm}={G}_{{m}_{3}\times{lm}}. The function f(𝐱)f(\mathbf{x}) in (12) is transformed into the objective function of matrix variables Am1×nm{A_{m_{1}\times nm}}, Bm2×ln{B_{m_{2}\times ln}} and Cn3×lm{C_{n_{3}\times lm}} as follows

f(A,B,C)=P1Am1×nm(EmKn×n2l)(Cn3×lmTEn2)(Yh)n1×n2n3F2+Am1×nm(EmBn×m2L)((P3Cn3×lm)TEm2)(Ym)m1×m2m3F2+μ(Am1×nmF2+Bm2×lnF2+Cn3×lmF2),\begin{split}f(A,B,C)&=\left\|P_{1}A_{m_{1}\times nm}\left(E_{m}\otimes K_{n\times n_{2}l}\right)\left(C_{n_{3}\times lm}^{T}\otimes E_{n_{2}}\right)-\left(Y_{h}\right)_{n_{1}\times n_{2}n_{3}}\right\|_{F}^{2}\\ &+\left\|A_{m_{1}\times nm}\left(E_{m}\otimes B_{n\times m_{2}L}\right)\left(\left(P_{3}C_{n_{3}\times lm}\right)^{T}\otimes E_{m_{2}}\right)-\left(Y_{m}\right)_{m_{1}\times m_{2}m_{3}}\right\|_{F}^{2}\\ &+\mu\left(\left\|{A}_{{m}_{1}\times{nm}}\right\|_{{F}}^{2}+\left\|{B}_{{m}_{2}\times{ln}}\right\|_{{F}}^{2}+\left\|{C}_{{n}_{3}\times{lm}}\right\|_{{F}}^{2}\right),\end{split} (27)
f(A,B,C)=P2Bm2×ln(EnCl×n3m)((P1Am1×mn)TEn3)(Yh)n2×n3n1F2+Bm2×ln(EnGl×m3m)((Am1×mn)TEm3)(Ym)m2×m3m1F2+μ(Am1×nmF2+Bm2×lnF2+Cn3×lmF2),\begin{split}f(A,B,C)&=\left\|P_{2}B_{m_{2}\times ln}\left(E_{n}\otimes C_{l\times n_{3}m}\right)\left(\left(P_{1}{A_{m_{1}\times mn}}\right)^{T}\otimes E_{n_{3}}\right)-\left(Y_{h}\right)_{n_{2}\times n_{3}n_{1}}\right\|_{F}^{2}\\ &+\left\|B_{m_{2}\times ln}\left(E_{n}\otimes G_{l\times m_{3}m}\right)\left(\left(A_{m_{1}\times mn}\right)^{T}\otimes E_{m_{3}}\right)-\left(Y_{m}\right)_{m_{2}\times m_{3}m_{1}}\right\|_{F}^{2}\\ &+\mu\left(\left\|{A}_{{m}_{1}\times{nm}}\right\|_{{F}}^{2}+\left\|{B}_{{m}_{2}\times{ln}}\right\|_{{F}}^{2}+\left\|{C}_{{n}_{3}\times{lm}}\right\|_{{F}}^{2}\right),\end{split} (28)
f(A,B,C)=Cn3×lm(EmKl×n2n)((P1Am1×mn)TEn2)(Yh)n3×n2n1F2+P3Cn3×lm(EmBl×m2n)((Am1×nm)TEm2)(Ym)m3×m2m1F2+μ(Am1×nmF2+Bm2×lnF2+Cn3×lmF2).\begin{split}f(A,B,C)&=\left\|C_{n_{3}\times lm}\left(E_{m}\otimes K_{l\times n_{2}n}\right)\left(\left(P_{1}{A_{m_{1}\times mn}}\right)^{T}\otimes E_{n_{2}}\right)-\left(Y_{h}\right)_{n_{3}\times n_{2}n_{1}}\right\|_{F}^{2}\\ &+\left\|P_{3}C_{n_{3}\times lm}\left(E_{m}\otimes B_{l\times m_{2}n}\right)\left(\left(A_{m_{1}\times nm}\right)^{T}\otimes E_{m_{2}}\right)-\left(Y_{m}\right)_{m_{3}\times m_{2}m_{1}}\right\|_{F}^{2}\\ &+\mu\left(\left\|{A}_{{m}_{1}\times{nm}}\right\|_{F}^{2}+\left\|{B}_{{m}_{2}\times{ln}}\right\|_{{F}}^{2}+\left\|{C}_{{n}_{3}\times{lm}}\right\|_{{F}}^{2}\right).\end{split} (29)

For any matrix X,X, we have XF2=tr(XX).\|X\|_{F}^{2}=tr(X^{\top}X). Therefore we review the derivatives of some useful trace functions with respect to X:X:

tr(AXBXTC)X=ATCTXBT+CAXB,tr(AXTB)X=BA,tr(ABAT)A=A(B+BT),tr(BAT)A=B,tr(AXB)X=ATBT,tr(AB)A=BT.\normalsize\begin{split}&\dfrac{\partial tr(AXBX^{T}C)}{\partial X}=A^{T}C^{T}XB^{T}+CAXB,\quad\quad\dfrac{\partial tr(AX^{T}B)}{\partial X}=BA,\quad\\ &\dfrac{\partial tr(ABA^{T})}{\partial A}=A(B+B^{T}),\quad\quad\quad\quad\quad\quad\quad\quad\quad\dfrac{\partial tr(BA^{T})}{\partial A}=B,\quad\\ &\dfrac{\partial tr(AXB)}{\partial X}=A^{T}B^{T},\qquad\qquad\qquad\qquad\qquad\quad\quad\dfrac{\partial tr(AB)}{\partial A}=B^{T}.\quad\end{split} (30)

For simplicity, we denote

E1\displaystyle{E_{1}} =(EmBn×m2l)((P3Cn3×lm)TEm2),\displaystyle=\left(E_{m}\otimes B_{n\times m_{2}l}\right)\left(\left(P_{3}C_{n_{3}\times lm}\right)^{T}\otimes E_{m_{2}}\right), D1\displaystyle{D_{1}} =(EmKn×n2l)(Cn3×lmTEn2),\displaystyle=\left(E_{m}\otimes K_{n\times n_{2}l}\right)\left(C_{n_{3}\times lm}^{T}\otimes E_{n_{2}}\right),
F1\displaystyle{F_{1}} =(EnCl×n3m)((P1Am1×mn)TEn3),\displaystyle=\left(E_{n}\otimes C_{l\times n_{3}m}\right)\left(\left(P_{1}{A_{m_{1}\times mn}}\right)^{T}\otimes E_{n_{3}}\right), G1\displaystyle{G_{1}} =(EnGl×m3m)(Am1×mnTEm3),\displaystyle=\left(E_{n}\otimes G_{l\times m_{3}m}\right)\left(A_{m_{1}\times mn}^{T}\otimes E_{m_{3}}\right),
H1\displaystyle H_{1} =(EmKl×n2n)((P1Am1×mn)TEn2),\displaystyle=\left(E_{m}\otimes K_{l\times n_{2}n}\right)\left(\left(P_{1}{A_{m_{1}\times mn}}\right)^{T}\otimes E_{n_{2}}\right), K1\displaystyle K_{1} =(EmBl×m2n)(Am1×nmTEm2).\displaystyle=\left(E_{m}\otimes B_{l\times m_{2}n}\right)\left(A_{m_{1}\times nm}^{T}\otimes E_{m_{2}}\right).

Thus, the partial derivatives with respect to A,A, B,B, and CC are

f(A,B,C)A=(P1AD1YhF2+AE1YmF2+μ(AF2+BF2+CF2))A=tr(P1AD1Yh,P1AD1Yh+AE1Ym,AE1Ym)A+2μA=tr(P1AD1D1TATP1TP1AD1YhTYhD1TATP1TYhYhT)A+tr(AE1E1TATAE1YmTYmE1TAT+YmTYm)A+2μA=2(P1TP1AD1D1TP1TYhD1T+AE1E1TYmE1T+μA),\begin{split}\dfrac{\partial f({A},{B},{C})}{\partial{A}}=&\dfrac{\partial\left(\left\|{P_{1}}{A}{D_{1}}-{Y_{h}}\right\|_{F}^{2}+\left\|{A}{E_{1}}-{Y_{m}}\right\|_{F}^{2}+\mu\left(\left\|{A}\right\|_{{F}}^{2}+\left\|{B}\right\|_{{F}}^{2}+\left\|{C}\right\|_{{F}}^{2}\right)\right)}{\partial{A}}\\ \vspace{1.5ex}=&\dfrac{\partial tr\left(\langle{P_{1}}{A}{D_{1}}-{Y_{h}},{P_{1}}{A}{D_{1}}-{Y_{h}}\rangle+\langle{A}{E_{1}}-{Y_{m}},{A}{E_{1}}-{Y_{m}}\rangle\right)}{\partial{A}}+2\mu{A}\\ =&\dfrac{\partial tr\left({P_{1}}{A}{D_{1}}{D_{1}^{T}}{A^{T}}{P_{1}^{T}}-{P_{1}}{A}{D_{1}}{Y_{h}^{T}}-{Y_{h}}{D_{1}^{T}}{A^{T}}{P_{1}^{T}}-{Y_{h}}{Y_{h}^{T}}\right)}{\partial{A}}\\ +&\dfrac{\partial tr\left({A}{E_{1}}{E_{1}^{T}}{A^{T}}-{A}{E_{1}}{Y_{m}^{T}}-{Y_{m}}{E_{1}^{T}}{A^{T}}+{Y_{m}^{T}}{Y_{m}}\right)}{{\partial{A}}}+2\mu{A}\\ =&2\left({P_{1}^{T}}{P_{1}}A{D_{1}}{D_{1}^{T}}-{P_{1}^{T}}{Y_{h}}{D_{1}^{T}}+{A}{E_{1}}{E_{1}^{T}}-{Y_{m}}{E_{1}^{T}}+\mu{A}\right),\end{split} (31)
f(A,B,C)B=(P2BF1YhF2+BG1YmF2+μ(AF2+BF2+CF2))B=tr(P2BF1Yh,P2BF1Yh+BG1Ym,BG1Ym)B+2μB=tr(P2BF1F1TBTP2TP2BF1YhTYhF1TBTP2TYhYhT)B+tr(BG1G1TBTBG1YmTYmG1TBT+YmTYm)B+2μB=2(P2TP2BF1F1TP2TYhF1T+BG1G1TYmG1T+μB),\begin{split}\dfrac{\partial f({A},{B},{C})}{\partial{B}}=&\dfrac{\partial\left(\left\|{P_{2}}{B}{F_{1}}-{Y_{h}}\right\|_{F}^{2}+\left\|{B}{G_{1}}-{Y_{m}}\right\|_{F}^{2}+\mu\left(\left\|{A}\right\|_{{F}}^{2}+\left\|{B}\right\|_{{F}}^{2}+\left\|{C}\right\|_{{F}}^{2}\right)\right)}{\partial{B}}\\ =&\dfrac{\partial tr\left(\langle{P_{2}}{B}{F_{1}}-{Y_{h}},{P_{2}}{B}{F_{1}}-{Y_{h}}\rangle+\langle{B}{G_{1}}-{Y_{m}},{B}{G_{1}}-{Y_{m}}\rangle\right)}{\partial{B}}+2\mu{B}\\ =&\dfrac{\partial tr\left({P_{2}}{B}{F_{1}}{F_{1}^{T}}{B^{T}}{P_{2}^{T}}-{P_{2}}{B}{F_{1}}{Y_{h}^{T}}-{Y_{h}}{F_{1}^{T}}{B^{T}}{P_{2}^{T}}-{Y_{h}}{Y_{h}^{T}}\right)}{\partial{B}}\\ +&\dfrac{\partial tr\left({B}{G_{1}}{G_{1}^{T}}{B^{T}}-{B}{G_{1}}{Y_{m}^{T}}-{Y_{m}}{G_{1}^{T}}{B^{T}}+{Y_{m}^{T}}{Y_{m}}\right)}{{\partial{B}}}+2\mu{B}\\ =&2\left({P_{2}^{T}}{P_{2}}B{F_{1}}{F_{1}^{T}}-{P_{2}^{T}}{Y_{h}}{F_{1}^{T}}+{B}{G_{1}}{G_{1}^{T}}-{Y_{m}}{G_{1}^{T}}+\mu{B}\right),\end{split} (32)
f(A,B,C)C=(CH1YhF2+P3CK1YmF2+μ(AF2+BF2+CF2))C=tr(CH1Yh,CH1Yh+P3CK1Ym,P3CK1Ym)C+2μC=tr(CH1H1TCTCH1YhTYhH1TCT+YhTYh)C+tr(P3CK1K1TCTP3TP3CK1YmTYmK1TCTP3TYmYmT)C+2μC=2(CH1H1TYhH1T+P3TP3CK1K1TP3TYmK1T+μC).\begin{split}\dfrac{\partial f({A},{B},{C})}{\partial{C}}=&\dfrac{\partial\left(\left\|{C}{H_{1}}-{Y_{h}}\right\|_{F}^{2}+\left\|{P_{3}}{C}{K_{1}}-{Y_{m}}\right\|_{F}^{2}+\mu\left(\left\|{A}\right\|_{{F}}^{2}+\left\|{B}\right\|_{{F}}^{2}+\left\|{C}\right\|_{{F}}^{2}\right)\right)}{\partial{C}}\\ =&\dfrac{\partial tr\left(\langle{C}{H_{1}}-{Y_{h}},{C}{H_{1}}-{Y_{h}}\rangle+\langle{P_{3}}{C}{K_{1}}-{Y_{m}},P_{3}{C}{K_{1}}-{Y_{m}}\rangle\right)}{\partial{C}}+2\mu{C}\\ =&\dfrac{\partial tr\left({C}{H_{1}}{H_{1}^{T}}{C^{T}}-{C}{H_{1}}{Y_{h}^{T}}-{Y_{h}}{H_{1}^{T}}{C^{T}}+{Y_{h}^{T}}{Y_{h}}\right)}{\partial{C}}\\ +&\dfrac{\partial tr\left({P_{3}}{C}{K_{1}}{K_{1}^{T}}{C^{T}}{P_{3}^{T}}-{P_{3}}{C}{K_{1}}{Y_{m}^{T}}-{Y_{m}}{K_{1}^{T}}{C^{T}}{P_{3}^{T}}-{Y_{m}}{Y_{m}^{T}}\right)}{\partial{C}}+2\mu{C}\\ =&2\left({C}{H_{1}}{H_{1}^{T}}-{Y_{h}}{H_{1}^{T}}+{P_{3}^{T}}{P_{3}}C{K_{1}}{K_{1}^{T}}-{P_{3}^{T}}{Y_{m}}{K_{1}^{T}}+\mu{C}\right).\end{split} (33)
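Whichever form of the gradient is used (the matrix form above or the vectorized form of Appendix A), it can be validated against finite differences once the variables are packed into a single vector as in (13); the sketch below shows such a check on a toy function (the function and helper names are illustrative).

import numpy as np

def gradient_check(f, grad, x, h=1e-6, n_probe=20, seed=0):
    """Compare grad(x) with central finite differences along random coordinates."""
    rng = np.random.default_rng(seed)
    g = grad(x)
    max_err = 0.0
    for idx in rng.choice(x.size, size=min(n_probe, x.size), replace=False):
        e = np.zeros_like(x)
        e[idx] = h
        fd = (f(x + e) - f(x - e)) / (2 * h)       # central difference approximation
        max_err = max(max_err, abs(fd - g[idx]))
    return max_err

# Toy example: f(x) = ||x||^4 has gradient 4 ||x||^2 x.
f = lambda x: np.sum(x**2) ** 2
grad = lambda x: 4 * np.sum(x**2) * x
print(gradient_check(f, grad, np.random.rand(50)))   # small, roughly 1e-7 or less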

5. Convergence analysis

In this section, we analyze the convergence of 𝐠(𝐱)\|\mathbf{g}(\mathbf{x})\| and show that our proposed method produces a globally convergent iteration.

Lemma 5.1.

There exists a positive number MM such that

|f(𝐱)|M,𝐠(𝐱)M,H(𝐱)M.\lvert f(\mathbf{x})\rvert\leq M,\|\mathbf{g}(\mathbf{x})\|\leq M,\|H(\mathbf{x})\|\leq M. (34)
Proof.

Since 𝐩_k = −H_k 𝐠(𝐱_k) generated by L-BFGS is a gradient-related descent direction, there exists a positive step size satisfying the Armijo condition in (20), and the sequence of objective function values {f(𝐱_k)} decreases monotonically. Since the objective function f(𝐱) in (12) is nonnegative, f(𝐱_k) is bounded, i.e., 0 ≤ f(𝐱_k) ≤ M holds.

The regularization term μ(‖𝒜‖_F² + ‖ℬ‖_F² + ‖𝒞‖_F²) in the objective function implies that the sequence {𝐱_k} is bounded. Because 𝐠(𝐱) is a polynomial, and hence continuous, function of 𝐱, ‖𝐠(𝐱_k)‖ is also bounded. The proof of the boundedness of H(𝐱) is given in Appendix B. Thus, we obtain Lemma 5.1. ∎

Furthermore, we have the following Lemma 5.2.

Lemma 5.2.

There exist constants C_m and C_M with 0 < C_m ≤ 1 ≤ C_M such that

𝐩k𝐠(𝐱k)Cm𝐠(𝐱k)2 and 𝐩kCM𝐠(𝐱k).\mathbf{p}_{k}^{\top}\mathbf{g}\left(\mathbf{x}_{k}\right)\leq-C_{m}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|^{2}\quad\text{ and }\quad\left\|\mathbf{p}_{k}\right\|\leq C_{M}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|. (35)
Proof.

The proof can be found in the Appendix B. ∎

Because of the boundedness and monotonicity of {f(𝐱_k)}, the sequence of function values converges. The conclusion is given in Theorem 5.3.

Theorem 5.3.

Assume that Algorithm 2 generates an infinite sequence of function values {f(𝐱k)}\left\{f\left(\mathbf{x}_{k}\right)\right\}. Then, there exists a constant f{f}_{*} such that

limkf(𝐱k)=f.\lim_{k\rightarrow\infty}f\left(\mathbf{x}_{k}\right)=f_{*}. (36)

Next, we prove that every accumulation point of the iterates {𝐱_k} is a first-order stationary point. Finally, by utilizing the Kurdyka-Łojasiewicz property [2], we show that the sequence of iterates {𝐱_k} itself is convergent. The following lemma shows that the step size is bounded below.

Lemma 5.4.

There exists a constant αmin>0\alpha_{\min}>0 such that

αkαmin>0,k𝐍+.\alpha_{k}\geq\alpha_{\min}>0,\quad\quad\forall k\in\mathbf{N}^{+}. (37)
Proof.

Let 0<αα~=(1σ)Cm12MCM20<\alpha\leq\tilde{\alpha}=\frac{(1-\sigma)C_{m}}{\frac{1}{2}MC_{M}^{2}}. From Lemma 5.2, for α(0,α~]\alpha\in(0,\tilde{\alpha}], it yields that

α𝐩kT𝐠k+12Mα2𝐩k2σα𝐩kT𝐠k=(1σ)α𝐩kT𝐠k+12Mα2𝐩k2(1σ)α(Cm𝐠(𝐱k)2)+12Mα2CM2𝐠k20.\begin{split}\alpha\mathbf{p}_{k}^{T}\mathbf{g}_{k}+\frac{1}{2}M\alpha^{2}\left\|\mathbf{p}_{k}\right\|^{2}-\sigma\alpha\mathbf{p}_{k}^{T}\mathbf{g}_{k}&=(1-\sigma)\alpha\mathbf{p}_{k}^{T}\mathbf{g}_{k}+\frac{1}{2}M\alpha^{2}\left\|\mathbf{p}_{k}\right\|^{2}\\ &\leq(1-\sigma)\alpha(-C_{m}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|^{2})+\frac{1}{2}M\alpha^{2}C_{M}^{2}\left\|\mathbf{g}_{k}\right\|^{2}\\ &\leq 0.\end{split} (38)

From Taylor’s mean value theorem and 𝐱k+1(α)=𝐱k+α𝐩k\mathbf{x}_{k+1}(\alpha)=\mathbf{x}_{k}+\alpha\mathbf{p}_{k}, we have

f(𝐱k+1(α))f(𝐱k)\displaystyle f\left(\mathbf{x}_{k+1}(\alpha)\right)-f\left(\mathbf{x}_{k}\right) 𝐠kT(𝐱k+1(α)𝐱k)+12M𝐱k+1(α)𝐱k2\displaystyle\leq\mathbf{g}_{k}^{T}\left(\mathbf{x}_{k+1}(\alpha)-\mathbf{x}_{k}\right)+\frac{1}{2}M\left\|\mathbf{x}_{k+1}(\alpha)-\mathbf{x}_{k}\right\|^{2} (39)
=α𝐩kT𝐠k+12Mα2𝐩k2\displaystyle=\alpha\mathbf{p}_{k}^{T}\mathbf{g}_{k}+\frac{1}{2}M\alpha^{2}\left\|\mathbf{p}_{k}\right\|^{2}
σα𝐩kT𝐠k,\displaystyle\leq\sigma\alpha\mathbf{p}_{k}^{T}\mathbf{g}_{k},

where the last inequality is valid owing to (38). The rule of inexact line search indicates αkαmin=α~β.\alpha_{k}\geq\alpha_{\min}=\tilde{\alpha}\beta. Hence, we find out a lower bound on the step size α\alpha. ∎

The following theorem proves that every accumulation point of iterates {𝐱k}\left\{\mathbf{x}_{k}\right\} is a first-order stationary point.

Theorem 5.5.

Suppose that Algorithm 2 generates an infinite sequence of iterates {𝐱k}\left\{\mathbf{x}_{k}\right\}. Then,

limk𝐠(𝐱k)=0.\lim_{k\rightarrow\infty}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|=0. (40)
Proof.

From Lemma 5.2 and (20), we get

f(𝐱k)f(𝐱k+1)σαk𝐩k𝐠(𝐱k)σαkCm𝐠(𝐱k)2.f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)\geq-\sigma\alpha_{k}\mathbf{p}_{k}^{\top}\mathbf{g}\left(\mathbf{x}_{k}\right)\geq\sigma\alpha_{k}C_{m}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|^{2}. (41)

The functional series k=1[f(𝐱k)f(𝐱k+1)]\sum_{k=1}^{\infty}\left[f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)\right] satisfies

2Mf(𝐱1)f=k=1[f(𝐱k)f(𝐱k+1)]k=1σαkCm𝐠(𝐱k)2k=1σαminCm𝐠(𝐱k)2.\begin{split}2M\geq f\left(\mathbf{x}_{1}\right)-f_{*}&=\sum_{k=1}^{\infty}\left[f\left(\mathbf{x}_{k}\right)-f\left(\mathbf{x}_{k+1}\right)\right]\\ &\geq\sum_{k=1}^{\infty}\sigma\alpha_{k}C_{m}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|^{2}\\ &\geq\sum_{k=1}^{\infty}\sigma\alpha_{\min}C_{m}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|^{2}.\end{split} (42)

That is to say,

k=1𝐠(𝐱k)22MσαminCm<+.\sum_{k=1}^{\infty}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|^{2}\leq\frac{2M}{\sigma\alpha_{\min}C_{m}}<+\infty. (43)

Hence, 𝐠(𝐱k)\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\| converges to zero. ∎

Analysis of proximal methods for nonconvex and nonsmooth optimization frequently uses the Kurdyka-Łojasiewicz (KL) property [2]. Since the objective function f(𝐱) is a polynomial, the KL property below holds, and we use it to prove the convergence of the algorithm.

Proposition 5.6.

(Kurdyka-Łojasiewicz (KL) property) Suppose that 𝐱_* is a stationary point of f(𝐱). There is a neighborhood 𝒰 of 𝐱_*, an exponent θ ∈ [0,1) and a positive constant K such that for all 𝐱 ∈ 𝒰, the following inequality holds:

|f(𝐱)f(𝐱)|θK𝐠(𝐱).\left|f(\mathbf{x})-f\left(\mathbf{x}_{*}\right)\right|^{\theta}\leq K\|\mathbf{g}(\mathbf{x})\|. (44)

In particular, we define 0000^{0}\equiv 0.

Lemma 5.7.

Suppose that 𝐱\mathbf{x}_{*} is a stationary point of f(𝐱)f(\mathbf{x}) and 𝒜(𝐱,δ)={𝐱n:𝐱𝐱δ}𝒰\mathscr{A}\left(\mathbf{x}_{*},\delta\right)=\{\mathbf{x}\in\left.\mathbb{R}^{n}:\left\|\mathbf{x}-\mathbf{x}_{*}\right\|\leq\delta\right\}\subseteq\mathscr{U} is a neighborhood of 𝐱.\mathbf{x}_{*}. Let 𝐱1\mathbf{x}_{1} be an initial point satisfying

δ>CMKσCm(1θ)|f(𝐱1)f(𝐱)|1θ+𝐱1𝐱.\delta>\frac{C_{M}K}{\sigma C_{m}(1-\theta)}\left|f\left(\mathbf{x}_{1}\right)-f\left(\mathbf{x}_{*}\right)\right|^{1-\theta}+\left\|\mathbf{x}_{1}-\mathbf{x}_{*}\right\|. (45)

Then, the following assertions hold:

𝐱k𝒜(𝐱,δ),k=1,2,\mathbf{x}_{k}\in\mathscr{A}\left(\mathbf{x}_{*},\delta\right),\quad k=1,2,\ldots (46)

and

k=1𝐱k+1𝐱kCMKσCm(1θ)|f(𝐱1)f(𝐱)|1θ.\sum_{k=1}^{\infty}\left\|\mathbf{x}_{k+1}-\mathbf{x}_{k}\right\|\leq\frac{C_{M}K}{\sigma C_{m}(1-\theta)}\left|f\left(\mathbf{x}_{1}\right)-f\left(\mathbf{x}_{*}\right)\right|^{1-\theta}. (47)
Proof.

The lemma is proved by induction. Obviously, 𝐱_1 ∈ 𝒜(𝐱_*, δ). Now, we assume 𝐱_i ∈ 𝒜(𝐱_*, δ) for all i = 1,…,k and that the KL property holds at these points. Define a concave function

\phi(q)\equiv\frac{K}{1-\theta}\left|q-f\left(\mathbf{x}_{*}\right)\right|^{1-\theta}\quad\text{for}\quad q>f\left(\mathbf{x}_{*}\right). (48)

Its derivative function is

ϕ(q)=K|qf(𝐱)|θ.\phi^{\prime}(q)=\frac{K}{\left|q-f\left(\mathbf{x}_{*}\right)\right|^{\theta}}. (49)

For i = 1,…,k, the first-order condition of the concave function ϕ(q) at f(𝐱_i) gives

ϕ(f(𝐱i))ϕ(f(𝐱i+1))\displaystyle\phi\left(f\left(\mathbf{x}_{i}\right)\right)-\phi\left(f\left(\mathbf{x}_{i+1}\right)\right) ϕ(f(𝐱i))(f(𝐱i)f(𝐱i+1))\displaystyle\geq\phi^{\prime}\left(f\left(\mathbf{x}_{i}\right)\right)\left(f\left(\mathbf{x}_{i}\right)-f\left(\mathbf{x}_{i+1}\right)\right) (50)
=K|f(𝐱i)f(𝐱)|θ(f(𝐱i)f(𝐱i+1)).\displaystyle=\frac{K}{\left|f\left(\mathbf{x}_{i}\right)-f\left(\mathbf{x}_{*}\right)\right|^{\theta}}\left(f\left(\mathbf{x}_{i}\right)-f\left(\mathbf{x}_{i+1}\right)\right).

The equation (50) and KL property mean

ϕ(f(𝐱i))ϕ(f(𝐱i+1))1𝐠(𝐱i)(f(𝐱i)f(𝐱i+1)).\phi\left(f\left(\mathbf{x}_{i}\right)\right)-\phi\left(f\left(\mathbf{x}_{i+1}\right)\right)\geq\frac{1}{\left\|\mathbf{g}\left(\mathbf{x}_{i}\right)\right\|}\left(f\left(\mathbf{x}_{i}\right)-f\left(\mathbf{x}_{i+1}\right)\right). (51)

By Lemma 5.2 and (41), we have

ϕ(f(𝐱i))ϕ(f(𝐱i+1))σαiCm𝐠(𝐱i)σCmCM𝐱i+1𝐱i,\phi\left(f\left(\mathbf{x}_{i}\right)\right)-\phi\left(f\left(\mathbf{x}_{i+1}\right)\right)\geq\sigma\alpha_{i}C_{m}\left\|\mathbf{g}\left(\mathbf{x}_{i}\right)\right\|\geq\frac{\sigma C_{m}}{C_{M}}\left\|\mathbf{x}_{i+1}-\mathbf{x}_{i}\right\|, (52)

where the last inequality is valid because

𝐱k+1𝐱k=αk𝐩kαkCM𝐠(𝐱k).\left\|\mathbf{x}_{k+1}-\mathbf{x}_{k}\right\|=\alpha_{k}\left\|\mathbf{p}_{k}\right\|\leq\alpha_{k}C_{M}\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|. (53)

The upper bound of 𝐱k+1𝐱\|\mathbf{x}_{k+1}-\mathbf{x}_{*}\| is

𝐱k+1𝐱\displaystyle\left\|\mathbf{x}_{k+1}-\mathbf{x}_{*}\right\| i=1k𝐱i+1𝐱i+𝐱1𝐱\displaystyle\leq\sum_{i=1}^{k}\left\|\mathbf{x}_{i+1}-\mathbf{x}_{i}\right\|+\left\|\mathbf{x}_{1}-\mathbf{x}_{*}\right\| (54)
CMσCmi=1kϕ(f(𝐱i))ϕ(f(𝐱i+1))+𝐱1𝐱\displaystyle\leq\frac{C_{M}}{\sigma C_{m}}\sum_{i=1}^{k}\phi\left(f\left(\mathbf{x}_{i}\right)\right)-\phi\left(f\left(\mathbf{x}_{i+1}\right)\right)+\left\|\mathbf{x}_{1}-\mathbf{x}_{*}\right\|
CMσCm(ϕ(f(𝐱1))ϕ(f(𝐱k+1)))+𝐱1𝐱\displaystyle\leq\frac{C_{M}}{\sigma C_{m}}\left(\phi\left(f\left(\mathbf{x}_{1}\right)\right)-\phi\left(f\left(\mathbf{x}_{k+1}\right)\right)\right)+\left\|\mathbf{x}_{1}-\mathbf{x}_{*}\right\|
CMσCmϕ(f(𝐱1))+𝐱1𝐱\displaystyle\leq\frac{C_{M}}{\sigma C_{m}}\phi\left(f\left(\mathbf{x}_{1}\right)\right)+\left\|\mathbf{x}_{1}-\mathbf{x}_{*}\right\|
<δ,\displaystyle<\delta,

which means 𝐱k+1𝒜(𝐱,δ)\mathbf{x}_{k+1}\in\mathscr{A}\left(\mathbf{x}_{*},\delta\right) and (46) holds. Moreover, according to (52), we obtain

k=1𝐱k+1𝐱kCMσCmk=1ϕ(f(𝐱k))ϕ(f(𝐱k+1))CMσCmϕ(f(𝐱1)).\sum_{k=1}^{\infty}\left\|\mathbf{x}_{k+1}-\mathbf{x}_{k}\right\|\leq\frac{C_{M}}{\sigma C_{m}}\sum_{k=1}^{\infty}\phi\left(f\left(\mathbf{x}_{k}\right)\right)-\phi\left(f\left(\mathbf{x}_{k+1}\right)\right)\leq\frac{C_{M}}{\sigma C_{m}}\phi\left(f\left(\mathbf{x}_{1}\right)\right). (55)

Thus, the proof of (47) is complete. ∎

Next, we show that the sequence of iterates {𝐱_k} converges to a unique accumulation point.

Theorem 5.8.

Suppose that Algorithm 2 generates an infinite sequence of iterates {𝐱k}\left\{\mathbf{x}_{k}\right\}. {𝐱k}\left\{\mathbf{x}_{k}\right\} converges to a unique first-order stationary point 𝐱\mathbf{x}_{*}, i.e.

limk𝐱k=𝐱.\lim_{k\rightarrow\infty}\mathbf{x}_{k}=\mathbf{x}_{*}. (56)
Proof.

Clearly, according to (55), the sequence {𝐱k}\{\mathbf{x}_{k}\} satisfies

k=1𝐱k+1𝐱k<+\sum_{k=1}^{\infty}\left\|\mathbf{x}_{k+1}-\mathbf{x}_{k}\right\|<+\infty (57)

and is therefore a Cauchy sequence. Owing to the boundedness of {𝐱_k}, there exists an accumulation point 𝐱_* of the iterates {𝐱_k}. Thus (56) holds. ∎

6. Numerical experiments

In this section we demonstrate the performance of our proposed TTDSR method on two datasets. The method is implemented with parameters l=5, σ=0.01, β=0.5, μ=1. The stopping criterion is

𝐠(𝐱k)<1010\left\|\mathbf{g}\left(\mathbf{x}_{k}\right)\right\|_{\infty}<10^{-10}

or

𝐱k+1𝐱k<1016 and |f(𝐱k+1)f(𝐱k)|<102.\left\|\mathbf{x}_{k+1}-\mathbf{x}_{k}\right\|_{\infty}<10^{-16}\quad\text{ and }\quad{\left|f\left(\mathbf{x}_{k+1}\right)-f\left(\mathbf{x}_{k}\right)\right|}<10^{-2}.

The maximum number of iterations is set to 400. All simulations are run on an HP notebook with a 2.5 GHz Intel Core i5 and 4 GB RAM. We use Tensorlab 3.0 [26] for basic tensor operations.

In the numerical experiments, the groundtruth image SRI is artificially degraded to an HSI and an MSI based on P_1, P_2 and P_3. For the spatial degradation matrices P_1 and P_2, we follow Wald's protocol [36]: the degradation process from SRI to HSI is a combination of spatial blurring with a Gaussian kernel and downsampling. The downsampling and the Gaussian kernel have parameters d and q, respectively. It is common to set the downsampling ratio d=4 and q=9 for the Gaussian kernel. In the following experiments, we also conduct simulations for d=6, q=9 and d=4, q=5. In order to obtain the MSI from the SRI, we generate the spectral degradation matrix P_3 from spectral specifications taken from the LANDSAT or QuickBird multispectral sensors. The Indian pines and Salinas-A scene datasets are available online at [14]. For Indian pines, the groundtruth SRI is degraded with the former sensor, while the SRI of the Salinas-A scene is degraded with the latter, as in [28]. The dimensions of the HSI, MSI and SRI are given in Table 1.

Image name SRI HSI MSI
Indian pines, d=4 144×144×200144\times 144\times 200 36×36×20036\times 36\times 200 144×144×6144\times 144\times 6
Indian pines, d=6 144×144×200144\times 144\times 200 24×24×20024\times 24\times 200 144×144×6144\times 144\times 6
Salinas-A scene 80×80×20080\times 80\times 200 20×20×20020\times 20\times 200 80×80×480\times 80\times 4
Table 1. Image size for the SRI experiments.

6.1. Comparison with other algorithms

In this subsection we compare the proposed algorithm with state-of-the-art approaches, including HySure [31] and FUSE [35], which are based on matrix decompositions. Furthermore, tensor CP [20] and Tucker decomposition [28] methods are also considered. The HySure method is a convex formulation for the SRI via subspace-based regularization proposed by Simões et al., while FUSE performs fast fusion of multiband images by solving a Sylvester equation, as proposed by Wei et al. We calculate the following metrics to evaluate the quality of image fusion: the reconstructed signal-to-noise ratio (R-SNR), the correlation coefficient (CC), the spectral angle mapper (SAM) and the relative dimensionless global error (ERGAS) used in [28]. R-SNR and CC are given by

R-SNR=10log10(𝒵F2𝒵^𝒵F2)\text{R-SNR}=10\log_{10}\left(\frac{\|\mathcal{Z}\|_{F}^{2}}{\|\widehat{\mathcal{Z}}-\mathcal{Z}\|_{F}^{2}}\right) (58)

and

CC=1n3(k=1n3ρ(𝒵:,:,k,𝒵^:,:,k)),\text{CC}=\frac{1}{n_{3}}\left(\sum_{k=1}^{n_{3}}\rho\left(\mathcal{Z}_{:,:,k},\hat{\mathcal{Z}}_{:,:,k}\right)\right), (59)

where ρ(⋅,⋅) is the Pearson correlation coefficient between the original and estimated spectral slices. The metric SAM is

SAM=1m1m2n=1m1m2arccos(Zn,:(3)Z^n,:(3)Zn,:(3)2Z^n,:(3)2)\text{SAM}=\frac{1}{m_{1}m_{2}}\sum_{n=1}^{m_{1}m_{2}}\arccos\left(\frac{{Z}_{n,:}^{(3)^{\top}}\widehat{{Z}}_{n,:}^{(3)}}{\left\|{Z}_{n,:}^{(3)}\right\|_{2}\left\|\hat{{Z}}_{n,:}^{(3)}\right\|_{2}}\right) (60)

and calculates the angle between the original and estimated spectral fiber. The performance measurement ERGAS is

 ERGAS =100d1m1m2n3k=1n3𝒵^:,:,k𝒵:,:,kF2μk2,\text{ ERGAS }=\frac{100}{d}\sqrt{\frac{1}{m_{1}m_{2}n_{3}}\sum_{k=1}^{n_{3}}\frac{\left\|\widehat{\mathcal{Z}}_{:,:,k}-\mathcal{Z}_{:,:,k}\right\|_{F}^{2}}{\mu_{k}^{2}}}, (61)

where μ_k is the mean value of the k-th estimated band \widehat{\mathcal{Z}}_{:,:,k}. ERGAS represents the relative dimensionless global error between the SRI and its estimate; it is a root mean-square error averaged over the size of the SRI.
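For reference, the four metrics (58)-(61) can be computed directly from the reference and estimated SRI tensors; the sketch below is our own implementation and assumes, as stated above, that μ_k is the mean of the k-th estimated band.

import numpy as np

def fusion_metrics(Z, Z_hat, d):
    """R-SNR, CC, SAM (radians) and ERGAS between reference Z and estimate Z_hat."""
    m1, m2, n3 = Z.shape
    r_snr = 10 * np.log10(np.sum(Z**2) / np.sum((Z_hat - Z)**2))
    # CC: mean Pearson correlation over spectral slices, cf. (59).
    cc = np.mean([np.corrcoef(Z[:, :, k].ravel(), Z_hat[:, :, k].ravel())[0, 1]
                  for k in range(n3)])
    # SAM: mean angle between original and estimated spectral fibers, cf. (60).
    Zm = Z.reshape(m1 * m2, n3)
    Zh = Z_hat.reshape(m1 * m2, n3)
    cos = np.sum(Zm * Zh, axis=1) / (np.linalg.norm(Zm, axis=1) * np.linalg.norm(Zh, axis=1))
    sam = np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))
    # ERGAS, cf. (61), with mu_k the mean of the k-th estimated band.
    mse_k = np.array([np.sum((Z_hat[:, :, k] - Z[:, :, k])**2) for k in range(n3)])
    mu_k = np.array([np.mean(Z_hat[:, :, k]) for k in range(n3)])
    ergas = (100.0 / d) * np.sqrt(np.mean(mse_k / (m1 * m2 * mu_k**2)))
    return r_snr, cc, sam, ergas

Z = np.random.rand(16, 16, 8) + 0.5
Z_hat = Z + 0.01 * np.random.randn(*Z.shape)
print(fusion_metrics(Z, Z_hat, d=4))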

Algorithm quality evaluation metrics
              R-SNR     CC       SAM      ERGAS    TIME(s)
best value    +∞        1        0        0        -
d=4, q=9
   STEREO     24.8691   0.8335   2.8220   1.2812   2.3025
   SCOTT      16.4046   0.7617   7.2446   2.3651   0.5865
   HySure     18.9055   0.6971   5.7052   2.3045   39.9998
   FUSE       10.3359   0.6126   13.8561  4.5692   0.3635
   TTDSR      17.0350   0.6712   6.7452   3.1165   2.3624
d=6, q=9
   STEREO     23.7512   0.7874   3.2923   0.9875   2.1344
   SCOTT      17.1569   0.7414   6.7436   1.5801   0.8730
   HySure     17.8228   0.6879   6.4205   1.6793   38.6143
   FUSE       11.9157   0.6134   11.8385  2.8082   0.3006
   TTDSR      17.0350   0.6712   6.7452   2.0777   2.1966
d=4, q=5
   STEREO     24.8723   0.8338   2.8213   1.2791   1.7986
   SCOTT      15.5634   0.7218   7.6436   2.9810   0.5197
   HySure     16.8333   0.6729   6.9514   2.8192   37.9540
   FUSE       9.5533    0.5678   14.6419  5.8727   0.3169
   TTDSR      17.0350   0.6712   6.7452   3.1165   2.386
Table 2. Comparison of performance of different algorithms on Indian pines.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Figure 3. Spectral slice 120 of the SRI, Indian pines.

In practical applications, the hyperspectral and multispectral images generated by the imaging sensors are usually noisy. Therefore, in the experiments we add white Gaussian noise to the HSI and MSI. The first experiment is performed on the Indian pines dataset from the hyperspectral remote sensing data platform [14]. White Gaussian noise is added to 𝒴_h with an SNR of 21 dB and to 𝒴_m with an SNR of 25 dB. The results are presented in Table 2 and Figure 3. The Tucker rank used in SCOTT is [70, 70, 6]. According to [20], the tensor rank F=50 of the STEREO method often yields good performance. For the HySure method, the parameter 'E' represents the groundtruth number of materials and is set to 16, chosen as the number of endmembers as in [20]. In Table 2, when d=4 and q=9, the STEREO method performs the best, and our proposed algorithm has advantages over the FUSE method in terms of the metrics R-SNR, CC, SAM, and ERGAS. The HySure method has comparable performance to our algorithm, but requires more computational time. Moreover, our proposed algorithm achieves a higher R-SNR value and a lower SAM value than SCOTT. Figure 3 also provides an intuitive display of the super-resolution images.

In addition, Table 2 also reports the results given by the different algorithms on Indian pines under the conditions d=6, q=9 and d=4, q=5. The values of d and q affect the performance of all methods to some degree. However, the ranking of the different methods under each evaluation metric changes very little when d and q vary.

Algorithm quality evaluation metrics
              R-SNR     CC       SAM      ERGAS    TIME(s)
best value    +∞        1        0        0        -
d=4, q=9
   STEREO     17.1952   0.987    0.4548   4.3075   1.1343
   SCOTT      18.8878   0.9903   0.3651   3.8203   0.1726
   HySure     18.3815   0.9890   0.3519   4.0037   9.0540
   FUSE       9.5258    0.8919   0.3769   12.2427  0.1248
   TTDSR      17.1458   0.9858   0.1089   4.5147   1.5402
Table 3. Comparison of performance of different algorithms on Salinas-A scene.
Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Refer to caption
(e)
Figure 4. Spectral slice 120 of the SRI, Salinas-A scene.

In the second example, the Salinas-A scene dataset also comes from the hyperspectral remote sensing data platform and is used as in [28]. Similarly, white Gaussian noise is added to 𝒴_h and 𝒴_m with an input SNR of 30 dB. We conduct experiments on this dataset under the conditions d=4, q=9, and the results are shown in Table 3. When d=6, q=9 or d=4, q=5, the metrics of the listed algorithms are almost consistent with the results for d=4, q=9, so we omit the results under these two conditions. In Table 3, compared with STEREO, our proposed algorithm has a comparable signal-to-noise ratio and running time. Compared to the other algorithms, our method achieves the lowest SAM value. Furthermore, it is evident that the TTDSR algorithm performs better than FUSE. In Figure 4, the super-resolution images obtained by the different algorithms are shown. Owing to the low SAM value, the image recovered by the TTDSR algorithm is relatively clearer.

6.2. Further numerical results of TTDSR

In this subsection we present further numerical results of TTDSR on the Indian pines dataset. The curve in Figure 5(a) displays the objective function value at each iteration, which verifies the theoretical conclusion that the sequence \{f(\mathbf{x}_{k})\} is decreasing.

Figure 5. Further experiments on Indian pines: (a) objective function value at each iteration; (b) R-SNR for different triple ranks.

In theory, the SRI is a low rank tensor. In the numerical experiments, however, we have no prior knowledge of the triple rank of the SRI tensor, so the rank is set manually. For the Indian pines dataset, we run the TTDSR algorithm ten times with the triple rank ranging from 1 to 10. Figure 5(b) reports the R-SNR values corresponding to the different triple ranks; in this example, the best R-SNR is attained when the triple rank is 1.
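A minimal sketch of this rank scan is given below; ttdsr_fuse is a hypothetical placeholder for the TTDSR solver and is named here only for illustration.

    import numpy as np

    def ttdsr_fuse(Yh, Ym, triple_rank):
        # Hypothetical placeholder for the TTDSR solver; it is expected to
        # return the estimated super-resolution image for the given triple rank.
        raise NotImplementedError

    def scan_triple_rank(Yh, Ym, ref, ranks=range(1, 11)):
        # Run TTDSR for each candidate triple rank and keep the rank with the best R-SNR.
        best_rank, best_snr = None, -np.inf
        for r in ranks:
            est = ttdsr_fuse(Yh, Ym, r)
            snr = 10.0 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))
            if snr > best_snr:
                best_rank, best_snr = r, snr
        return best_rank, best_snr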

From the above analysis, we see that among all the algorithms the HySure method performs the best but costs much more time than the others. This is because it solves a convex optimization problem with a vector total-variation regularizer, which penalizes the discrete differences of the image in the horizontal and vertical directions. Our method has a notable advantage over SCOTT in that we only need to choose a single triple rank, whereas the rank of the Tucker decomposition [28] is a tuple of mode ranks.

7. Conclusion

In this work, we propose a novel tensor triple decomposition model for hyperspectral super-resolution. Firstly, in order to capture the global interdependence among the different modes of the hyperspectral data, we use the triple rank to characterize its low-rank structure. Then we propose an optimization algorithm, TTDSR, to obtain the desired hyperspectral super-resolution image. Using the triple decomposition theorem, we derive the gradient of the objective function of the model in closed form, which greatly facilitates solving the problem. Owing to the algebraic nature of the objective function f(\mathbf{x}), we apply the Kurdyka-Łojasiewicz property to analyze the convergence of the sequence of iterates generated by TTDSR. In addition, experiments on two datasets show the feasibility and effectiveness of TTDSR. This work opens up a new prospect for realizing hyperspectral super-resolution by means of various tensor decompositions.

Appendix A Vectorization of the variable

The gradient of the objective function in (12) is calculated by vectorization of the variable as

𝐱:=(𝐚m1rr𝐛m2rr𝐜n3rr)=(Am1rr×1Bm2rr×1Cn3rr×1).\mathbf{x}:=\left(\begin{array}[]{l}\mathbf{a}_{m_{1}rr}\\ \mathbf{b}_{m_{2}rr}\\ \mathbf{c}_{n_{3}rr}\end{array}\right)=\left(\begin{array}[]{c}A_{m_{1}rr\times 1}\\ B_{m_{2}rr\times 1}\\ C_{n_{3}rr\times 1}\end{array}\right). (62)
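In implementation terms, the stacking in (62) is just a concatenation of the column-major vectorizations of the three factor arrays. A minimal NumPy sketch is given below; the helper names pack and unpack and the generic factor shapes are ours, for illustration only.

    import numpy as np

    def pack(A, B, C):
        # x = [vec(A); vec(B); vec(C)] with column-major (Fortran-order) vectorization,
        # matching the usual vec operator.
        return np.concatenate([A.ravel(order='F'), B.ravel(order='F'), C.ravel(order='F')])

    def unpack(x, shapes):
        # shapes lists the three factor shapes, e.g. [(m1, r, r), (m2, r, r), (n3, r, r)].
        out, pos = [], 0
        for shp in shapes:
            size = int(np.prod(shp))
            out.append(x[pos:pos + size].reshape(shp, order='F'))
            pos += size
        return out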

For small-scale data, we vectorize the optimization variables of (12) directly according to a fixed ordering. Firstly, note that f(\mathbf{x})=f_{1}(\mathbf{x})+f_{2}(\mathbf{x})+f_{3}(\mathbf{x}). Secondly, vectorizing the known tensors \mathcal{Y}_{h} and \mathcal{Y}_{m}, we get

vec(𝒴h)=𝐝handvec(𝒴m)=𝐝m.\displaystyle vec(\mathcal{Y}_{h})=\mathbf{d}_{h}\quad\quad and\quad\quad vec(\mathcal{Y}_{m})=\mathbf{d}_{m}.

The symbol vec denotes the vectorization operator. It follows from (21) that

(𝐲h)n1n2n3\displaystyle{\left(\mathbf{y}_{h}\right)}_{n_{1}n_{2}n_{3}} =(((Clm×n3En2)Pn1×m1)((EmKn×n2l)Em1))𝐚m1nm\displaystyle=\left(\left(\left(C_{lm\times n_{3}}\otimes E_{n_{2}}\right)^{\top}\otimes P_{n_{1}\times{m_{1}}}\right)\left(\left(E_{m}\otimes K_{n\times n_{2}l}\right)^{\top}\otimes E_{m_{1}}\right)\right)\mathbf{a}_{m_{1}nm}
=(Cn3×lmEn2Pn1×m1)(EmKn2l×nEm1)𝐚m1nm\displaystyle=\left(C_{n_{3}\times lm}\otimes E_{n_{2}}\otimes P_{n_{1}\times m_{1}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}
=(Cn3×lmPn1n2×m1n2)(EmKn2l×nEm1)𝐚m1nm.\displaystyle=\left(C_{n_{3}\times lm}\otimes P_{n_{1}n_{2}\times m_{1}n_{2}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}.

Hence,

f1(𝐱)=(Cn3×lmPn1n2×m1n2)(EmKn2l×nEm1)𝐚m1nm𝐝h22.f_{1}\left(\mathbf{x}\right)=\left\|\left(C_{n_{3}\times lm}\otimes P_{n_{1}n_{2}\times m_{1}n_{2}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{h}\right\|_{2}^{2}.

It is easy to see

f1(𝐱)𝐚m1nm=2\displaystyle\frac{\partial f_{1}(\mathbf{x})}{\partial\mathbf{a}_{m_{1}nm}}=2 ((Cn3×lmPn1n2×m1n2)(EmKn2l×nEm1))\displaystyle\left(\left(C_{n_{3}\times lm}\otimes P_{n_{1}n_{2}\times m_{1}n_{2}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\right)^{\top}
((Cn3×lmPn1n2×m1n2)(EmKn2l×nEm1)𝐚m1nm𝐝h)\displaystyle\cdot\left(\left(C_{n_{3}\times lm}\otimes P_{n_{1}n_{2}\times m_{1}n_{2}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{h}\right)
=\displaystyle= 2(EmKn×n2lEm1)(Clm×n3Pm1n2×n1n2)\displaystyle 2\left(E_{m}\otimes K_{n\times n_{2}l}\otimes E_{m_{1}}\right)\left(C_{lm\times n_{3}}\otimes P_{m_{1}n_{2}\times n_{1}n_{2}}\right)
((Cn3×lmPn1n2×m1n2)(EmKn2l×nEm1)𝐚m1nm𝐝h).\displaystyle\cdot\left(\left(C_{n_{3}\times lm}\otimes P_{n_{1}n_{2}\times m_{1}n_{2}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{h}\right).
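Note that this is the generic least-squares gradient \nabla_{\mathbf{a}}\left\|M\mathbf{a}-\mathbf{d}\right\|_{2}^{2}=2M^{\top}(M\mathbf{a}-\mathbf{d}), where M stands for the Kronecker-structured matrix in front of \mathbf{a}_{m_{1}nm}. The small NumPy sketch below (random sizes, not the actual Kronecker factors) checks this identity against central finite differences.

    import numpy as np

    rng = np.random.default_rng(0)
    M = rng.standard_normal((30, 12))      # stands in for the Kronecker-structured matrix
    d = rng.standard_normal(30)            # stands in for the data vector d_h
    a = rng.standard_normal(12)

    grad = 2.0 * M.T @ (M @ a - d)         # closed-form gradient, as above

    f = lambda v: np.sum((M @ v - d) ** 2) # f_1 restricted to a
    h, fd = 1e-6, np.zeros_like(a)
    for i in range(a.size):
        e = np.zeros_like(a)
        e[i] = h
        fd[i] = (f(a + e) - f(a - e)) / (2.0 * h)   # central difference

    # The objective is quadratic, so the discrepancy comes only from rounding.
    print(np.max(np.abs(grad - fd)))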

Similarly, it follows from (24) that

(𝐲m)m1m2m3\displaystyle{\left(\mathbf{y}_{m}\right)}_{m_{1}m_{2}m_{3}} =(((EmBn×m2l)(Glm×m3Em2))Em1)𝐚m1nm\displaystyle=\left(\left(\left(E_{m}\otimes B_{n\times m_{2}l}\right)\left(G_{lm\times m_{3}}\otimes E_{m_{2}}\right)\right)^{\top}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}
=((Gm3×lmEm1m2)(EmBm2l×nEm1))𝐚m1nm.\displaystyle=\left(\left(G_{m_{3}\times lm}\otimes E_{m_{1}m_{2}}\right)\left(E_{m}\otimes B_{m_{2}l\times n}\otimes E_{m_{1}}\right)\right)\mathbf{a}_{m_{1}nm}.

Hence,

f2(𝐱)=((Gm3×lmEm1m2)(EmBm2l×nEm1))𝐚m1nm𝐝m22,f_{2}\left(\mathbf{x}\right)=\left\|\left(\left(G_{m_{3}\times lm}\otimes E_{m_{1}m_{2}}\right)\left(E_{m}\otimes B_{m_{2}l\times n}\otimes E_{m_{1}}\right)\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{m}\right\|_{2}^{2},

and

f2(𝐱)𝐚m1nm=2\displaystyle\frac{\partial f_{2}(\mathbf{x})}{\partial\mathbf{a}_{m_{1}nm}}=2 ((Gm3×lmEm1m2)(EmBm2l×nEm1))\displaystyle\left(\left(G_{m_{3}\times lm}\otimes E_{m_{1}m_{2}}\right)\left(E_{m}\otimes B_{m_{2}l\times n}\otimes E_{m_{1}}\right)\right)^{\top}
((Gm3×lmEm1m2)(EmBm2l×nEm1)𝐚m1nm𝐝m)\displaystyle\cdot\left(\left(G_{m_{3}\times lm}\otimes E_{m_{1}m_{2}}\right)\left(E_{m}\otimes B_{m_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{m}\right)
=2\displaystyle=2 (EmBn×m2lEm1)(Glm×m3Em1m2)\displaystyle\left(E_{m}\otimes B_{n\times m_{2}l}\otimes E_{m_{1}}\right)\left(G_{lm\times m_{3}}\otimes E_{m_{1}m_{2}}\right)
((Gm3×lmEm1m2)(EmBm2l×nEm1)𝐚m1nm𝐝m).\displaystyle\cdot\left(\left(G_{m_{3}\times lm}\otimes E_{m_{1}m_{2}}\right)\left(E_{m}\otimes B_{m_{2}l\times n}\otimes E_{m_{1}}\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{m}\right).

For f3(𝐱)=μ𝐱22f_{3}\left(\mathbf{x}\right)=\mu\left\|\mathbf{x}\right\|_{2}^{2}, we have

f3(𝐱)𝐚m1nm=2μ𝐚m1nm.\frac{\partial f_{3}(\mathbf{x})}{\partial\mathbf{a}_{m_{1}nm}}=2\mu\mathbf{a}_{m_{1}nm}.

In summary,

f(𝐱)𝐚m1nm=2(EmKn×n2lEm1)(Clm×n3Pm1n2×n1n2)(((Cn3×lmPn1n2×m1n2)(EmKn2l×nEm1))𝐚m1nm𝐝h)+2(EmBn×m2lEm1)(Glm×m3Em1m2)(((Gm3×lmEm1m2)(EmBm2l×nEm1))𝐚m1nm𝐝m)+2μ𝐚m1nm.\begin{split}\frac{\partial f(\mathbf{x})}{\partial\mathbf{a}_{m_{1}nm}}&=2\left(E_{m}\otimes K_{n\times n_{2}l}\otimes E_{m_{1}}\right)\left(C_{lm\times n_{3}}\otimes P_{m_{1}n_{2}\times n_{1}n_{2}}\right)\\ &\cdot\left(\left(\left(C_{n_{3}\times lm}\otimes P_{n_{1}n_{2}\times m_{1}n_{2}}\right)\left(E_{m}\otimes K_{n_{2}l\times n}\otimes E_{m_{1}}\right)\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{h}\right)\\ &+2\left(E_{m}\otimes B_{n\times m_{2}l}\otimes E_{m_{1}}\right)\left(G_{lm\times m_{3}}\otimes E_{m_{1}m_{2}}\right)\\ &\cdot\left(\left(\left(G_{m_{3}\times lm}\otimes E_{m_{1}m_{2}}\right)\left(E_{m}\otimes B_{m_{2}l\times n}\otimes E_{m_{1}}\right)\right)\mathbf{a}_{m_{1}nm}-\mathbf{d}_{m}\right)\\ &+2\mu\mathbf{a}_{m_{1}nm}.\end{split} (63)

It follows from (22) and (25) that

f1(𝐱)=((Hn1×mnPn2n3×m2n3)(EnCn3m×lEm2))𝐛m2ln𝐝h22,f_{1}\left(\mathbf{x}\right)=\left\|\left(\left(H_{n_{1}\times mn}\otimes P_{n_{2}n_{3}\times m_{2}n_{3}}\right)\left(E_{n}\otimes C_{n_{3}m\times l}\otimes E_{m_{2}}\right)\right)\mathbf{b}_{m_{2}ln}-\mathbf{d}_{h}\right\|_{2}^{2},
f2(𝐱)=((Am1×mnEm2m3)(EnGm3m×lEm2))𝐛m2ln𝐝m22,f_{2}\left(\mathbf{x}\right)=\left\|\left(\left(A_{m_{1}\times mn}\otimes E_{m_{2}m_{3}}\right)\left(E_{n}\otimes G_{m_{3}m\times l}\otimes E_{m_{2}}\right)\right)\mathbf{b}_{m_{2}ln}-\mathbf{d}_{m}\right\|_{2}^{2},

and

f3(𝐱)=μ𝐱22.f_{3}\left(\mathbf{x}\right)=\mu\left\|\mathbf{x}\right\|_{2}^{2}.

Therefore,

\begin{split}\frac{\partial f(\mathbf{x})}{\partial\mathbf{b}_{m_{2}ln}}&=2\left(E_{n}\otimes C_{l\times n_{3}m}\otimes E_{m_{2}}\right)\left(H_{mn\times n_{1}}\otimes P_{m_{2}n_{3}\times n_{2}n_{3}}\right)\\&\cdot\left(\left(\left(H_{n_{1}\times mn}\otimes P_{n_{2}n_{3}\times m_{2}n_{3}}\right)\left(E_{n}\otimes C_{n_{3}m\times l}\otimes E_{m_{2}}\right)\right)\mathbf{b}_{m_{2}ln}-\mathbf{d}_{h}\right)\\&+2\left(E_{n}\otimes G_{l\times m_{3}m}\otimes E_{m_{2}}\right)\left(A_{mn\times m_{1}}\otimes E_{m_{2}m_{3}}\right)\\&\cdot\left(\left(\left(A_{m_{1}\times mn}\otimes E_{m_{2}m_{3}}\right)\left(E_{n}\otimes G_{m_{3}m\times l}\otimes E_{m_{2}}\right)\right)\mathbf{b}_{m_{2}ln}-\mathbf{d}_{m}\right)\\&+2\mu\mathbf{b}_{m_{2}ln}.\end{split} (64)

Similarly, it follows from (23) and (26) that

f1(𝐱)=((Hn1×nmEn2n3)(EmKn2n×lEn3))𝐜n3lm𝐝h22,f_{1}\left(\mathbf{x}\right)=\left\|\left(\left(H_{n_{1}\times nm}\otimes E_{n_{2}n_{3}}\right)\left(E_{m}\otimes K_{n_{2}n\times l}\otimes E_{n_{3}}\right)\right)\mathbf{c}_{n_{3}lm}-\mathbf{d}_{h}\right\|_{2}^{2},
f2(𝐱)=((Am1×nmPm2m3×m2n3)(EmBm2n×lEn3))𝐜n3lm𝐝m22,f_{2}\left(\mathbf{x}\right)=\left\|\left(\left(A_{m_{1}\times nm}\otimes P_{{m_{2}m_{3}}\times m_{2}n_{3}}\right)\left(E_{m}\otimes B_{m_{2}n\times l}\otimes E_{n_{3}}\right)\right)\mathbf{c}_{n_{3}lm}-\mathbf{d}_{m}\right\|_{2}^{2},

and

f3(𝐱)=μ𝐱22.f_{3}\left(\mathbf{x}\right)=\mu\left\|\mathbf{x}\right\|_{2}^{2}.

Thus,

f(𝐱)𝐜n3lm=2(EmKl×n2nEn3)(Hnm×n1En2×n3)(((Hn1×nmEn2×n3)(EmKn2n×lEn3))𝐜n3lm𝐝h)+2(EmBl×m2nEn3)(Anm×m1Pm2n3×m2m3)(((Am1×nmPm2m3×m2n3)(EmBm2n×lEn3))𝐜n3lm𝐝m)+2μ𝐜n3lm.\begin{split}\frac{\partial f(\mathbf{x})}{\partial\mathbf{c}_{n_{3}lm}}&=2\left(E_{m}\otimes K_{l\times n_{2}n}\otimes E_{n_{3}}\right)\left(H_{nm\times n_{1}}\otimes E_{n_{2}\times n_{3}}\right)\\ &\cdot\left(\left(\left(H_{n_{1}\times nm}\otimes E_{n_{2}\times n_{3}}\right)\left(E_{m}\otimes K_{n_{2}n\times l}\otimes E_{n_{3}}\right)\right)\mathbf{c}_{n_{3}lm}-\mathbf{d}_{h}\right)\\ &+2\left(E_{m}\otimes B_{l\times m_{2}n}\otimes E_{n_{3}}\right)\left(A_{nm\times m_{1}}\otimes P_{m_{2}n_{3}\times m_{2}m_{3}}\right)\\ &\cdot\left(\left(\left(A_{m_{1}\times nm}\otimes P_{m_{2}m_{3}\times m_{2}n_{3}}\right)\left(E_{m}\otimes B_{m_{2}n\times l}\otimes E_{n_{3}}\right)\right)\mathbf{c}_{n_{3}lm}-\mathbf{d}_{m}\right)\\ &+2\mu\mathbf{c}_{n_{3}lm}.\end{split} (65)

Then, the full gradient is obtained by stacking the three partial derivatives:

\nabla f(\mathbf{x})=\left(\begin{array}{c}\dfrac{\partial f(\mathbf{x})}{\partial\mathbf{a}_{m_{1}nm}}\\[6pt]\dfrac{\partial f(\mathbf{x})}{\partial\mathbf{b}_{m_{2}ln}}\\[6pt]\dfrac{\partial f(\mathbf{x})}{\partial\mathbf{c}_{n_{3}lm}}\end{array}\right), (66)

where the three blocks are given explicitly by (63), (64) and (65).

Appendix B Five lemmas for the proof of Lemma 5.2

First, we consider the BFGS update (14), (15) and (16).

Lemma B.1.

Suppose that H_{k+1} is generated by (16). Then, we have

Hk+1Hk(1+2MNϵ)2+N2ϵ.\left\|H_{k+1}\right\|\leq\left\|H_{k}\right\|\left(1+\frac{2MN}{\epsilon}\right)^{2}+\frac{N^{2}}{\epsilon}. (67)
Proof.

If \mathbf{y}_{k}^{\top}\mathbf{s}_{k}<\epsilon, we get \rho_{k}=0 and H_{k+1}=H_{k}, so the inequality (67) holds. Next, we consider the case \mathbf{y}_{k}^{\top}\mathbf{s}_{k}\geq\epsilon. Obviously, \rho_{k}\leq\frac{1}{\epsilon}. Since \left\{\mathbf{x}_{k}\right\} is bounded, there exists a positive number N such that \left\|\mathbf{s}_{k}\right\|\leq N, and from Lemma 5.1 we get

𝐲k2M.\left\|\mathbf{y}_{k}\right\|\leq 2M. (68)

Since

Vk1+ρk𝐲k𝐬k1+2MNϵ and ρk𝐬k𝐬kρk𝐬k2N2ϵ,\left\|V_{k}\right\|\leq 1+\rho_{k}\left\|\mathbf{y}_{k}\right\|\left\|\mathbf{s}_{k}\right\|\leq 1+\frac{2MN}{\epsilon}\quad\text{ and }\quad\left\|\rho_{k}\mathbf{s}_{k}\mathbf{s}_{k}^{\top}\right\|\leq\rho_{k}\left\|\mathbf{s}_{k}\right\|^{2}\leq\frac{N^{2}}{\epsilon}, (69)

we have

Hk+1HkVk2+ρk𝐬k𝐬kHk(1+2MNϵ)2+N2ϵ.\left\|H_{k+1}\right\|\leq\left\|H_{k}\right\|\left\|V_{k}\right\|^{2}+\left\|\rho_{k}\mathbf{s}_{k}\mathbf{s}_{k}^{\top}\right\|\leq\left\|H_{k}\right\|\left(1+\frac{2MN}{\epsilon}\right)^{2}+\frac{N^{2}}{\epsilon}. (70)

Hence, the inequality (67) is valid. ∎
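For concreteness, a minimal NumPy sketch of the safeguarded inverse-Hessian update used above is given below; eps plays the role of \epsilon, and the dense-matrix form is for illustration only, since the algorithm itself uses the limited-memory recursion (17)-(18).

    import numpy as np

    def bfgs_update(H, s, y, eps=1e-8):
        # Safeguarded inverse-Hessian update: skip the update when y's < eps;
        # otherwise H_{k+1} = V' H V + rho * s s' with V = I - rho * y s', rho = 1/(y's).
        ys = float(y @ s)
        if ys < eps:
            return H
        rho = 1.0 / ys
        V = np.eye(s.size) - rho * np.outer(y, s)
        return V.T @ H @ V + rho * np.outer(s, s)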

Lemma B.2.

Suppose that H_{k} is positive definite and H_{k+1} is generated by the BFGS update (14), (15), (16). Let \lambda_{\min}(H) be the smallest eigenvalue of a symmetric matrix H. Then H_{k+1} is positive definite and

λmin(Hk+1)ϵϵ+4M2Hkλmin(Hk).\lambda_{\min}\left(H_{k+1}\right)\geq\frac{\epsilon}{\epsilon+4M^{2}\left\|H_{k}\right\|}\lambda_{\min}\left(H_{k}\right). (71)
Proof.

For any unit vector 𝐯\mathbf{v}, we have

\mathbf{v}^{\top}H_{k+1}\mathbf{v}=\left(\mathbf{v}-\rho_{k}\left(\mathbf{s}_{k}^{\top}\mathbf{v}\right)\mathbf{y}_{k}\right)^{\top}H_{k}\left(\mathbf{v}-\rho_{k}\left(\mathbf{s}_{k}^{\top}\mathbf{v}\right)\mathbf{y}_{k}\right)+\rho_{k}\left(\mathbf{s}_{k}^{\top}\mathbf{v}\right)^{2}. (72)

Let t𝐬k𝐯t\equiv\mathbf{s}_{k}^{\top}\mathbf{v} and

ϕ(t)(𝐯tρk𝐲k)Hk(𝐯tρk𝐲k)+ρkt2.\phi(t)\equiv\left(\mathbf{v}-t\rho_{k}\mathbf{y}_{k}\right)^{\top}H_{k}\left(\mathbf{v}-t\rho_{k}\mathbf{y}_{k}\right)+\rho_{k}t^{2}. (73)

Because H_{k} is positive definite, \phi(t) is convex and attains its minimum at t_{*}=\frac{\rho_{k}\mathbf{y}_{k}^{\top}H_{k}\mathbf{v}}{\rho_{k}+\rho_{k}^{2}\mathbf{y}_{k}^{\top}H_{k}\mathbf{y}_{k}}. Hence,

𝐯Hk+1𝐯\displaystyle\mathbf{v}^{\top}H_{k+1}\mathbf{v} ϕ(t)\displaystyle\geq\phi\left(t_{*}\right) (74)
=𝐯Hk𝐯tρk𝐲kHk𝐯\displaystyle=\mathbf{v}^{\top}H_{k}\mathbf{v}-t_{*}\rho_{k}\mathbf{y}_{k}^{\top}H_{k}\mathbf{v}
=ρk𝐯Hk𝐯+ρk2(𝐲kHk𝐲k𝐯Hk𝐯(𝐲kHk𝐯)2)ρk+ρk2𝐲kHk𝐲k\displaystyle=\frac{\rho_{k}\mathbf{v}^{\top}H_{k}\mathbf{v}+\rho_{k}^{2}\left(\mathbf{y}_{k}^{\top}H_{k}\mathbf{y}_{k}\mathbf{v}^{\top}H_{k}\mathbf{v}-\left(\mathbf{y}_{k}^{\top}H_{k}\mathbf{v}\right)^{2}\right)}{\rho_{k}+\rho_{k}^{2}\mathbf{y}_{k}^{\top}H_{k}\mathbf{y}_{k}}
𝐯Hk𝐯1+ρk𝐲kHk𝐲k>0,\displaystyle\geq\frac{\mathbf{v}^{\top}H_{k}\mathbf{v}}{1+\rho_{k}\mathbf{y}_{k}^{\top}H_{k}\mathbf{y}_{k}}>0,

where the penultimate inequality holds because the Cauchy-Schwarz inequality is valid in the inner product induced by the positive definite matrix H_{k}, namely

𝐲kHk𝐯Hk𝐲kHk𝐯.\left\|\mathbf{y}_{k}\right\|_{H_{k}}\|\mathbf{v}\|_{H_{k}}\geq\mathbf{y}_{k}^{\top}H_{k}\mathbf{v}. (75)

So Hk+1H_{k+1} is also positive definite. From (68), it is easy to verify that

1+ρk𝐲kHk𝐲k1+4M2Hkϵ.1+\rho_{k}\mathbf{y}_{k}^{\top}H_{k}\mathbf{y}_{k}\leq 1+\frac{4M^{2}\left\|H_{k}\right\|}{\epsilon}. (76)

Thus, we have

𝐯Hk+1𝐯ϵϵ+4M2Hkλmin(Hk).\mathbf{v}^{\top}H_{k+1}\mathbf{v}\geq\frac{\epsilon}{{\epsilon}+4M^{2}\left\|H_{k}\right\|}\lambda_{\min}\left(H_{k}\right). (77)

Hence, (71) is valid. ∎
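As an illustration, the bound (71) can be checked numerically on a toy example; in the NumPy snippet below the concrete choices of eps and M are made only so that the assumptions \mathbf{y}_{k}^{\top}\mathbf{s}_{k}\geq\epsilon and \left\|\mathbf{y}_{k}\right\|\leq 2M hold.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 8
    H = np.eye(n) + 0.1 * np.ones((n, n))        # a positive definite H_k
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)
    if y @ s <= 0:
        y = -y                                   # enforce the curvature condition y's > 0
    eps = 0.5 * float(y @ s)                     # so that y's >= eps holds
    M = 0.5 * np.linalg.norm(y)                  # so that ||y|| <= 2M, as in (68)
    rho = 1.0 / float(y @ s)
    V = np.eye(n) - rho * np.outer(y, s)
    H_next = V.T @ H @ V + rho * np.outer(s, s)  # the BFGS update of H_k
    ratio = eps / (eps + 4.0 * M ** 2 * np.linalg.norm(H, 2))
    # Lemma B.2: the smallest eigenvalue cannot shrink below ratio * lambda_min(H_k).
    print(np.min(np.linalg.eigvalsh(H_next)) >= ratio * np.min(np.linalg.eigvalsh(H)))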

Second, we turn to L-BFGS. Whichever choice of \gamma_{k} is taken in (19), the following lemma holds.

Lemma B.3.

Suppose that the parameter \gamma_{k} is given by the Barzilai-Borwein steps in (19). Then, we have

ϵ4M2γkN2ϵ.\frac{\epsilon}{4M^{2}}\leq\gamma_{k}\leq\frac{N^{2}}{\epsilon}. (78)
Proof.

If \mathbf{y}_{k}^{\top}\mathbf{s}_{k}<\epsilon, we get \gamma_{k}=1, which satisfies the bounds in (78). Otherwise, we have

ϵ𝐲k𝐬k𝐲k𝐬k.\epsilon\leq\mathbf{y}_{k}^{\top}\mathbf{s}_{k}\leq\left\|\mathbf{y}_{k}\right\|\left\|\mathbf{s}_{k}\right\|. (79)

Recalling (68), we get

ϵN𝐲k2M and ϵ2M𝐬kN.\frac{\epsilon}{N}\leq\left\|\mathbf{y}_{k}\right\|\leq 2M\quad\text{ and }\quad\frac{\epsilon}{2M}\leq\left\|\mathbf{s}_{k}\right\|\leq N. (80)

Hence, we have

ϵ4M2𝐲k𝐬k𝐲k2𝐲k𝐬k𝐲k2=𝐬k𝐲k=𝐬k2𝐲k𝐬k𝐬k2𝐲k𝐬kN2ϵ.\frac{\epsilon}{4M^{2}}\leq\frac{\mathbf{y}_{k}^{\top}\mathbf{s}_{k}}{\left\|\mathbf{y}_{k}\right\|^{2}}\leq\frac{\left\|\mathbf{y}_{k}\right\|\left\|\mathbf{s}_{k}\right\|}{\left\|\mathbf{y}_{k}\right\|^{2}}=\frac{\left\|\mathbf{s}_{k}\right\|}{\left\|\mathbf{y}_{k}\right\|}=\frac{\left\|\mathbf{s}_{k}\right\|^{2}}{\left\|\mathbf{y}_{k}\right\|\left\|\mathbf{s}_{k}\right\|}\leq\frac{\left\|\mathbf{s}_{k}\right\|^{2}}{\mathbf{y}_{k}^{\top}\mathbf{s}_{k}}\leq\frac{N^{2}}{\epsilon}. (81)

This means that \gamma_{k}^{\mathrm{BB}1} and \gamma_{k}^{\mathrm{BB}2} both satisfy (78). ∎
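A minimal sketch of the two Barzilai-Borwein choices, with the same safeguard \gamma_{k}=1 when \mathbf{y}_{k}^{\top}\mathbf{s}_{k}<\epsilon, is given below; it assumes, as is standard, that (18) sets H_{k}^{(0)}=\gamma_{k}E, and the function name bb_step is ours.

    import numpy as np

    def bb_step(s, y, eps=1e-8, variant=1):
        # gamma^{BB1} = y's / ||y||^2 and gamma^{BB2} = ||s||^2 / y's, the two
        # quantities bounded in (81); fall back to gamma_k = 1 when y's < eps.
        ys = float(y @ s)
        if ys < eps:
            return 1.0
        return ys / float(y @ y) if variant == 1 else float(s @ s) / ys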

Third, based on Lemmas B.1, B.2 and B.3, we obtain the following two lemmas.

Lemma B.4.

Suppose that the inverse Hessian approximation H_{k} is generated by the L-BFGS updates (17) and (18). Then, there exists a positive constant C_{M}\geq 1 such that

HkCM.\left\|H_{k}\right\|\leq C_{M}. (82)
Proof.

From Lemma B.3 and (18), we have \left\|H_{k}^{(0)}\right\|\leq\frac{N^{2}}{\epsilon}. Then, by (17) and Lemma B.1, we get

Hk\displaystyle\left\|H_{k}\right\| Hk1(1+2MNϵ)2+N2ϵ\displaystyle\leq\left\|H_{k-1}\right\|\left(1+\frac{2MN}{\epsilon}\right)^{2}+\frac{N^{2}}{\epsilon} (83)
\displaystyle\leq\cdots
Hk(0)(1+2MNϵ)2l+N2ϵm=1l1(1+2MNϵ)2m\displaystyle\leq\left\|H_{k}^{(0)}\right\|\left(1+\frac{2MN}{\epsilon}\right)^{2l}+\frac{N^{2}}{\epsilon}\sum_{m=1}^{l-1}\left(1+\frac{2MN}{\epsilon}\right)^{2m}
N2ϵm=1l(1+2MNϵ)2mCM.\displaystyle\leq\frac{N^{2}}{\epsilon}\sum_{m=1}^{l}\left(1+\frac{2MN}{\epsilon}\right)^{2m}\equiv C_{M}.

Thus (82) holds. ∎

Lemma B.5.

Suppose that the inverse Hessian approximation H_{k} is generated by the L-BFGS updates (17) and (18). Then, there exists a constant 0<C_{m}<1 such that

λmin(Hk)Cm.\lambda_{\min}\left(H_{k}\right)\geq C_{m}. (84)
Proof.

From Lemma B.3 and (18), we have λmin(Hk(0))ϵ4M2.\lambda_{\min}\left(H_{k}^{(0)}\right)\geq\frac{\epsilon}{4M^{2}}. Moreover, Lemma B.4 means HkmCM\left\|H_{k-m}\right\|\leq C_{M} for all m=1,,lm=1,\ldots,l. Hence, Lemma B.2 implies

λmin(Hkm+1)ϵϵ+4M2CMλmin(Hkm).\lambda_{\min}\left(H_{k-m+1}\right)\geq\frac{\epsilon}{{\epsilon}+4M^{2}C_{M}}\lambda_{\min}\left(H_{k-m}\right). (85)

Then, we obtain

λmin(Hk)\displaystyle\lambda_{\min}\left(H_{k}\right) ϵϵ+4M2CMλmin(Hk1)\displaystyle\geq\frac{\epsilon}{{\epsilon}+4M^{2}C_{M}}\lambda_{\min}\left(H_{k-1}\right) (86)
\displaystyle\geq\cdots
(ϵϵ+4M2CM)lλmin(Hk(0))\displaystyle\geq\left(\frac{\epsilon}{{\epsilon}+4M^{2}C_{M}}\right)^{l}\lambda_{\min}\left(H_{k}^{(0)}\right)
ϵ4M2(ϵϵ+4M2CM)lCm.\displaystyle\geq\frac{\epsilon}{4M^{2}}\left(\frac{\epsilon}{{\epsilon}+4M^{2}C_{M}}\right)^{l}\equiv C_{m}.

Hence, (84) holds. ∎

Finally, the proof of Lemma 5.2 follows directly from Lemmas B.4 and B.5.

References