
Hyperspectral Band Selection based on Generalized 3DTV and Tensor CUR Decomposition

This research is partially supported by the NSF grant DMS-1941197.

Katherine Henneberger and Jing Qin, Department of Mathematics, University of Kentucky, Lexington, KY 40511
Abstract

Hyperspectral Imaging (HSI) serves as an important technique in remote sensing. However, its high dimensionality and data volume typically pose significant computational challenges. Band selection is essential for reducing spectral redundancy in hyperspectral imagery while retaining critical intrinsic information. In this work, we propose a novel hyperspectral band selection model by decomposing the data into a low-rank, smooth component and a sparse one. In particular, we develop a generalized 3D total variation (G3DTV) regularization by applying the $\ell_1^p$-norm to derivatives to preserve spatial-spectral smoothness. By employing the alternating direction method of multipliers (ADMM), we derive an efficient algorithm, where the tensor low-rankness is imposed via the tensor CUR decomposition. We demonstrate the effectiveness of the proposed approach through comparisons with various state-of-the-art band selection techniques on two benchmark real-world datasets. In addition, we provide practical guidelines for parameter selection in both noise-free and noisy scenarios.

Keywords— Hyperspectral band selection, total variation, tensor CUR decomposition, classification

1 Introduction

Hyperspectral imaging (HSI) captures spectral information from ground objects across hundreds of narrow spectral bands. This method yields richer visual data than traditional RGB imagery and has found diverse applications in fields like remote sensing, biomedicine, agriculture, and art conservation. However, HSI data is characterized by its high dimensionality, leading to issues like spectral redundancy and increased computational demands. To address these challenges, it is crucial to apply dimensionality reduction and redundancy elimination strategies. The process of hyperspectral band selection is designed to mitigate these issues by choosing a subset of the original spectral bands, ensuring the retention of critical spectral information.

In general, HSI band selection can be implemented in either a supervised or an unsupervised manner, depending on the availability of label information. In this work, we focus on unsupervised band selection, which does not require prior knowledge of labeled data. Many methods have been developed based on local or global similarity comparison. For example, the enhanced fast density-peak-based clustering (E-FDPC) algorithm [1] aims to identify density peaks to select the most discriminative bands, and the fast neighborhood grouping (FNGBS) method [2] utilizes spatial proximity and spectral similarity to group and select representative bands efficiently. Another notable approach is the similarity-based ranking strategy with structural similarity (SR-SSIM) [3], which prioritizes bands based on their contribution to the overall structural similarity, thus preserving the integrity of the spectral-spatial structures in HSI data.

Graph-based unsupervised band selection methods are gaining popularity for their ability to capture underlying graph similarities. Recently, the Marginalized Graph Self-Representation (MGSR) method [4] partitions the HSI through superpixel segmentation, thereby capturing the spatial information across different homogeneous regions and encoding this information into a structural graph. To accelerate the processing, the matrix CUR decomposition-based approach (MHBSCUR) [5] has been developed. This graph-based method stands out for leveraging matrix CUR decomposition, which enhances computational efficiency. However, graph-based strategies entail extra computational overhead to construct graphs. Furthermore, most band selection approaches necessitate converting the data into a matrix rather than directly operating on the HSI tensor. To address these issues, we propose a tensor-based HSI band selection algorithm that employs the tensor CUR decomposition alongside a generalized 3D total variation (G3DTV), which enhances the smoothness of the traditional 3DTV by applying the $\ell_1^p$-norm to derivatives.

Matrix CUR decompositions are efficient low-rank approximation methods that use a small number of actual columns and rows of the original matrix. Recent literature has extended the matrix CUR decomposition to the tensor setting. Initially, the CUR decomposition was extended to tensors through the single-mode unfolding of 3-mode tensors, as discussed in early works [6]. Subsequent research [7] introduced a more comprehensive tensor CUR variant that encompasses all tensor modes. More recently, the terminology has been refined to include specific names for these decompositions, with [8, 9] introducing the terms Fiber and Chidori CUR decompositions. While these initial tensor CUR decomposition methods utilize the mode-$k$ product, a t-CUR decomposition has also been developed under the t-product framework [10].

In this paper, we focus on the t-product based CUR decomposition, which will be utilized to expedite a crucial phase of our proposed algorithm. This approach preserves multi-dimensional structural integrity while providing a computationally efficient representation of the original tensor. The combination of t-CUR and G3DTV in our algorithm leverages the inherent multidimensional structure of HSI data, making our method particularly suitable for large-scale datasets while ensuring robust band selection performance.

The organization of the remainder of this paper is as follows: Fundamental concepts and definitions in tensor algebra, together with G3DTV, are introduced in Section 2. The proposed band selection method is presented in Section 3. Section 4 demonstrates the performance of the proposed method through several numerical experiments on benchmark remote sensing datasets. Section 5 discusses parameter selection, and conclusions and future work are given in Section 6.

2 Preliminaries

In this section, we introduce fundamental definitions and notation for third-order tensors. Consider $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $\mathcal{B}\in\mathbb{R}^{n_2\times l\times n_3}$, and denote the $k$th frontal slice of $\mathcal{A}$ by $\mathcal{A}_k=\mathcal{A}(:,:,k)$. We will use $\mathcal{O}$ to denote the zero tensor.

Definition 2.1

The block circulant operator is defined as follows

$$\text{bcirc}(\mathcal{A}):=\begin{bmatrix}\mathcal{A}_1&\mathcal{A}_{n_3}&\cdots&\mathcal{A}_2\\ \mathcal{A}_2&\mathcal{A}_1&\cdots&\mathcal{A}_3\\ \vdots&\vdots&\ddots&\vdots\\ \mathcal{A}_{n_3}&\mathcal{A}_{n_3-1}&\cdots&\mathcal{A}_1\end{bmatrix}\in\mathbb{R}^{n_1n_3\times n_2n_3}.$$
Definition 2.2

The operator $\text{unfold}(\cdot)$ and its inverse $\text{fold}(\cdot)$ for the conversion between tensors and matrices are defined as

$$\text{unfold}(\mathcal{A})=\begin{bmatrix}\mathcal{A}_1\\ \mathcal{A}_2\\ \vdots\\ \mathcal{A}_{n_3}\end{bmatrix}\in\mathbb{R}^{n_1n_3\times n_2},\qquad \text{fold}\left(\begin{bmatrix}\mathcal{A}_1\\ \mathcal{A}_2\\ \vdots\\ \mathcal{A}_{n_3}\end{bmatrix}\right)=\mathcal{A}.$$
Definition 2.3

The block diagonal matrix form of $\mathcal{A}$ is defined as the block diagonal matrix with diagonal blocks $\mathcal{A}_1,\ldots,\mathcal{A}_{n_3}$, i.e., $\text{bdiag}(\mathcal{A})=\text{diag}(\mathcal{A}_1,\mathcal{A}_2,\ldots,\mathcal{A}_{n_3})$.

Definition 2.4

The t-product of the tensors $\mathcal{A}$ and $\mathcal{B}$ is defined as $\mathcal{A}*\mathcal{B}:=\text{fold}(\text{bcirc}(\mathcal{A})\cdot\text{unfold}(\mathcal{B}))$. The t-product can also be computed via matrix-matrix multiplication in the Fourier domain: $\mathcal{C}=\mathcal{A}*\mathcal{B}$ is equivalent to $\text{bdiag}(\mathcal{C}_f)=\text{bdiag}(\mathcal{A}_f)\,\text{bdiag}(\mathcal{B}_f)$, where $\mathcal{A}_f=\texttt{fft}(\mathcal{A},[\,],3)$ is the fast Fourier transform of $\mathcal{A}$ along the third dimension.
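For concreteness, the following is a minimal MATLAB sketch of this Fourier-domain computation; the function name tprod and the restriction to real-valued inputs are our own illustrative choices.

% Minimal sketch of the t-product (Definition 2.4) computed in the
% Fourier domain; the function name tprod is illustrative.
function C = tprod(A, B)
    % A is n1 x n2 x n3 and B is n2 x l x n3
    n3 = size(A, 3);
    Af = fft(A, [], 3);                      % FFT along the tubes
    Bf = fft(B, [], 3);
    Cf = zeros(size(A, 1), size(B, 2), n3);
    for k = 1:n3
        Cf(:,:,k) = Af(:,:,k) * Bf(:,:,k);   % facewise matrix products
    end
    C = real(ifft(Cf, [], 3));               % real part for real inputs
end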

Definition 2.5

A tensor $\mathcal{A}$ is orthogonal if $\mathcal{A}^*\!*\mathcal{A}=\mathcal{A}*\mathcal{A}^*={\mathcal{J}}$, where $\mathcal{J}$ is the identity tensor such that $\mathcal{J}_f(:,:,i_3)=I_n$ for all $i_3\in[n_3]$.

Definition 2.6

The tensor $\mathcal{A}^{\dagger}$ is the Moore-Penrose pseudoinverse, computed by taking the pseudoinverse of each frontal slice in the transform domain:

$$\mathcal{A}^{\dagger}=\texttt{ifft}\left(\text{fold}\begin{bmatrix}(\mathcal{A}_f)_1^{\dagger}\\ (\mathcal{A}_f)_2^{\dagger}\\ \vdots\\ (\mathcal{A}_f)_{n_3}^{\dagger}\end{bmatrix},[\,],3\right).$$
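A corresponding MATLAB sketch, computing the facewise pseudoinverses in the transform domain, might look as follows; the name tpinv is ours.

% Sketch of the tensor Moore-Penrose pseudoinverse (Definition 2.6);
% the function name tpinv is illustrative.
function Ad = tpinv(A)
    [~, n2, n3] = size(A);
    Af = fft(A, [], 3);
    Adf = zeros(n2, size(A, 1), n3);    % pseudoinverse swaps n1 and n2
    for k = 1:n3
        Adf(:,:,k) = pinv(Af(:,:,k));   % facewise Moore-Penrose pinv
    end
    Ad = real(ifft(Adf, [], 3));
end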
Definition 2.7

A tensor $\mathcal{A}$ is f-diagonal if each frontal slice $\mathcal{A}(:,:,i_3)$ is diagonal for all $i_3\in[n_3]$.

Definition 2.8

The tensor Singular Value Decomposition (t-SVD) of a tensor induced by the t-product is

$$\mathcal{A}=\mathcal{U}*\mathcal{S}*\mathcal{V}^*,$$

where $\mathcal{U}\in\mathbb{R}^{n_1\times n_1\times n_3}$ and $\mathcal{V}\in\mathbb{R}^{n_2\times n_2\times n_3}$ are orthogonal and the core tensor $\mathcal{S}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is f-diagonal. Moreover, the t-SVD rank of the tensor $\mathcal{A}$ is defined as

$$\text{rank}_t(\mathcal{A})=\#\{i:\mathcal{S}(i,i,1)\neq 0\},$$

where $\#$ denotes the cardinality of a set.

Definition 2.9

[10] Consider a tensor $\mathcal{Y}\in\mathbb{R}^{n_1\times n_2\times n_3}$. Let $I\subset\{1,\ldots,n_1\}$ and $J\subset\{1,\ldots,n_2\}$ be index subsets and define $\mathcal{R}=\mathcal{Y}(I,:,:)$, $\mathcal{C}=\mathcal{Y}(:,J,:)$, and $\mathcal{U}=\mathcal{Y}(I,J,:)$. The t-CUR decomposition of $\mathcal{Y}$ is $\hat{\mathcal{Y}}=\mathcal{C}*\mathcal{U}^{\dagger}*\mathcal{R}$, where $\mathcal{U}^{\dagger}$ is the pseudoinverse of $\mathcal{U}$ as defined in Definition 2.6 above.
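Using the tprod and tpinv sketches above, the t-CUR approximation can be assembled in a few lines; uniform sampling of the index sets $I$ and $J$ is an assumption made here for illustration.

% Sketch of the t-CUR decomposition (Definition 2.9), assuming the
% tprod and tpinv helpers above; uniform index sampling is illustrative.
function [C, U, R, Yhat] = tcur(Y, sr, sc)
    [n1, n2, ~] = size(Y);
    I = randperm(n1, sr);                  % sampled horizontal indices
    J = randperm(n2, sc);                  % sampled lateral indices
    R = Y(I, :, :);
    C = Y(:, J, :);
    U = Y(I, J, :);
    Yhat = tprod(tprod(C, tpinv(U)), R);   % Yhat = C * U^dagger * R
end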

Definition 2.10

The proximal operator of $\|\cdot\|_1^p$ is defined as

$$\operatorname{prox}_{\lambda\|\cdot\|_1^p}(\mathcal{Z})=\operatorname*{argmin}_{\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}}\frac{1}{2}\|\mathcal{X}-\mathcal{Z}\|_F^2+\lambda\|\mathcal{X}\|_1^p. \quad (1)$$

Note that the proximal operator $\operatorname{prox}_{\lambda\|\cdot\|_1^p}(\mathcal{Z})$ for $p=1,2,3,4$ can be computed using Algorithms 1 and 2 in [11].
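For $p=1$, the operator decouples entrywise and reduces to the familiar soft thresholding, as the following minimal MATLAB sketch shows; the cases $p=2,3,4$ require the algorithms of [11].

% For p = 1, (1) reduces to entrywise soft thresholding; for
% p = 2, 3, 4 see Algorithms 1 and 2 of [11].
function X = prox_l1(Z, lambda)
    X = sign(Z) .* max(abs(Z) - lambda, 0);
end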

In addition, to describe the high-order smoothness of 3D images, we propose a novel 3D total variation regularization.

Definition 2.11

Consider a function $u:\Omega\subset\mathbb{R}^3\to\mathbb{R}$, where $\Omega$ is open and bounded. The generalized 3D total variation (G3DTV) regularization of $u$ is defined as

$$\|u\|_{G3DTV}=\sum_{i=1}^3\|\nabla_i u\|_1^p,$$

where $\nabla_i u$ is the derivative along the $i$th axis and $p\geq 1$ is an integer. If $\Omega$ is discretized as a grid, the discretized version $\mathcal{U}$ of $u$ becomes a third-order tensor, and the definition extends accordingly to the discrete case.
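In the discrete setting, the G3DTV value can be evaluated directly from finite differences. Below is a minimal MATLAB sketch; the use of forward differences with periodic boundary conditions is our own assumed discretization, not specified in the definition.

% Sketch of the discrete G3DTV value of a third-order tensor U; forward
% differences with periodic boundaries are an assumed discretization.
function v = g3dtv(U, p)
    v = 0;
    for i = 1:3
        D = U - circshift(U, -1, i);   % forward difference along axis i
        v = v + sum(abs(D(:)))^p;      % ell_1 norm raised to the power p
    end
end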

3 Proposed method

Consider a hyperspectral data tensor $\mathcal{Y}\in\mathbb{R}^{n_1\times n_2\times n_3}$ with $n_3$ spectral bands, each composed of $n_1\times n_2$ spatial pixels. To select bands, we aim to decompose $\mathcal{Y}$ into the sum of a low-rank, spatial-spectrally smooth tensor $\mathcal{B}$ and a sparse tensor $\mathcal{S}$ via the following model:

$$\min_{\text{rank}_t(\mathcal{B})\leq r,\,\mathcal{S}}\ \frac{1}{2}\|\mathcal{B}+\mathcal{S}-\mathcal{Y}\|_F^2+\lambda_1\|\mathcal{S}\|_1+\lambda_2\|\mathcal{B}\|_{G3DTV}. \quad (2)$$

Here the third term is the G3DTV of $\mathcal{B}$ as defined in Definition 2.11, i.e., $\|\mathcal{B}\|_{G3DTV}=\sum_{i=1}^3\|\nabla_i\mathcal{B}\|_1^p$. The parameter $\lambda_1$ controls the sparsity of the outlier tensor $\mathcal{S}$, and $\lambda_2>0$ controls the spatial-spectral smoothness of $\mathcal{B}$. The parameter $p$ in G3DTV is an integer, set to 2 throughout our experiments. In order to apply the ADMM framework to minimize (2), we introduce the auxiliary variables $\mathcal{X}_i$ and rewrite (2) in the equivalent form

$$\min_{\substack{\text{rank}_t(\mathcal{B})\leq r\\ \mathcal{S},\,\mathcal{X}_i}}\ \frac{1}{2}\|\mathcal{B}+\mathcal{S}-\mathcal{Y}\|_F^2+\lambda_1\|\mathcal{S}\|_1+\lambda_2\sum_{i=1}^3\|\mathcal{X}_i\|_1^p \quad\text{s.t.}\quad \mathcal{X}_i=\nabla_i\mathcal{B},\ i=1,2,3. \quad (3)$$

We introduce an indicator function to handle the rank constraint. Let $\Pi=\{\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}\,|\,\text{rank}_t(\mathcal{X})\leq r\}$. The indicator function $\chi_\Pi$ is defined as $\chi_\Pi(\mathcal{X})=0$ if $\mathcal{X}\in\Pi$ and $\infty$ otherwise. Then the augmented Lagrangian reads

$$\mathcal{L}=\frac{1}{2}\|\mathcal{B}+\mathcal{S}-\mathcal{Y}\|_F^2+\lambda_1\|\mathcal{S}\|_1+\chi_\Pi(\mathcal{B})+\lambda_2\sum_{i=1}^3\|\mathcal{X}_i\|_1^p+\frac{\beta}{2}\sum_{i=1}^3\|\nabla_i\mathcal{B}-\mathcal{X}_i+\widetilde{\mathcal{X}}_i\|_F^2,$$

where $\widetilde{\mathcal{X}}_i$ is a dual variable and $\beta>0$ is the penalty parameter. The resulting algorithm can be described as follows:

$$\left\{\begin{aligned}\mathcal{B}&\leftarrow\operatorname*{argmin}_{\text{rank}_t(\mathcal{B})\leq r}\frac{1}{2}\|\mathcal{B}+\mathcal{S}-\mathcal{Y}\|_F^2+\frac{\beta}{2}\sum_{i=1}^3\|\nabla_i\mathcal{B}-\mathcal{X}_i+\widetilde{\mathcal{X}}_i\|_F^2,\\ \mathcal{S}&\leftarrow\operatorname*{argmin}_{\mathcal{S}}\frac{1}{2}\|\mathcal{B}+\mathcal{S}-\mathcal{Y}\|_F^2+\lambda_1\|\mathcal{S}\|_1,\\ \mathcal{X}_i&\leftarrow\operatorname*{argmin}_{\mathcal{X}_i}\lambda_2\|\mathcal{X}_i\|_1^p+\frac{\beta}{2}\|\nabla_i\mathcal{B}-\mathcal{X}_i+\widetilde{\mathcal{X}}_i\|_F^2,\\ \widetilde{\mathcal{X}}_i&\leftarrow\widetilde{\mathcal{X}}_i+\nabla_i\mathcal{B}-\mathcal{X}_i,\end{aligned}\right.$$

where $\mathcal{X}_i$ and $\widetilde{\mathcal{X}}_i$ are updated for $i=1,2,3$. Applying the ADMM algorithm requires solving three subproblems at each iteration; specifically, we minimize $\mathcal{L}$ with respect to $\mathcal{B}$, $\mathcal{S}$, and $\mathcal{X}_i$. A common approach for updating $\mathcal{B}$ is the skinny t-SVD; however, this can be costly when the tensor is large. To address this issue, we utilize the t-CUR decomposition and update $\mathcal{B}$ via

$$\mathcal{B}^{j+1}=\operatorname*{argmin}_{\text{rank}_t(\mathcal{B})\leq r} f(\mathcal{B}),\qquad f(\mathcal{B}):=\frac{1}{2}\|\mathcal{B}+\mathcal{S}^j-\mathcal{Y}\|_F^2+\frac{\beta}{2}\sum_{i=1}^3\|\nabla_i\mathcal{B}-\mathcal{X}_i^j+\widetilde{\mathcal{X}}_i^j\|_F^2.$$

By applying gradient descent with step size $\tau>0$, we obtain the updating scheme for $\mathcal{B}$ as $\mathcal{B}^{j+1}=\mathcal{B}^j-\tau\nabla f(\mathcal{B}^j)$, where the gradient is calculated as $\nabla f(\mathcal{B}^j)=\mathcal{B}^j+\mathcal{S}^j-\mathcal{Y}+\beta\sum_{i=1}^3\nabla_i^T(\nabla_i\mathcal{B}^j-\mathcal{X}_i^j+\widetilde{\mathcal{X}}_i^j)$. Then the t-CUR components are updated as

$$\left\{\begin{aligned}\mathcal{C}&\leftarrow\mathcal{C}-\tau\nabla f(\mathcal{B}^j)(:,J,:),\\ \mathcal{R}&\leftarrow\mathcal{R}-\tau\nabla f(\mathcal{B}^j)(I,:,:),\\ \mathcal{U}&\leftarrow\tfrac{1}{2}(\mathcal{C}+\mathcal{R}),\end{aligned}\right.$$

where $I$ and $J$ are the respective row and column index sets. Next we calculate $\mathcal{U}^{\dagger}$ using Definition 2.6 and set

$$\mathcal{B}^{j+1}=\mathcal{C}*\mathcal{U}^{\dagger}*\mathcal{R}. \quad (4)$$
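A MATLAB sketch of one complete $\mathcal{B}$-update, combining the gradient step with the t-CUR reconstruction, is given below. The periodic finite differences and the restriction of $\mathcal{C}$ and $\mathcal{R}$ to the sampled indices in the $\mathcal{U}$-update are our own assumptions; X and Xt are assumed cell arrays holding the auxiliary and dual variables.

% Sketch of one B-update (4): form the gradient of f at B^j, take a
% gradient step on the sampled fibers, then reconstruct via t-CUR.
% Periodic differences and the intersection form of the U-update are
% assumptions made for illustration.
gradF = B + S - Y;
for i = 1:3
    Di = B - circshift(B, -1, i);                        % nabla_i B
    Ri = Di - X{i} + Xt{i};
    gradF = gradF + beta * (Ri - circshift(Ri, 1, i));   % adjoint difference
end
C = C - tau * gradF(:, J, :);
R = R - tau * gradF(I, :, :);
U = 0.5 * (C(I, :, :) + R(:, J, :));   % assumed restriction to I x J
B = tprod(tprod(C, tpinv(U)), R);      % B^{j+1} = C * U^dagger * R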

By fixing the other variables, we update $\mathcal{S}$ and $\mathcal{X}_i$ as

$$\mathcal{S}^{j+1}=\operatorname{prox}_{\frac{\lambda_1}{\beta}\|\cdot\|_1}(\mathcal{Y}-\mathcal{B}^{j+1}), \quad (5)$$
$$\mathcal{X}_i^{j+1}=\operatorname{prox}_{\frac{\lambda_2}{\beta}\|\cdot\|_1^p}(\nabla_i\mathcal{B}^{j+1}+\widetilde{\mathcal{X}}_i^j),\quad i=1,2,3. \quad (6)$$

Here $\operatorname{prox}_{\frac{\lambda_1}{\beta}\|\cdot\|_1}$ is the soft thresholding operator and $\operatorname{prox}_{\frac{\lambda_2}{\beta}\|\cdot\|_1^p}$ is the $\ell_1^p$ proximal operator defined in (1). The convergence condition is $\|\mathcal{B}^{j+1}-\mathcal{B}^j\|_\infty<\varepsilon$, where $\varepsilon$ is a predefined tolerance. Finally, we apply a clustering method such as k-means to $\mathcal{B}^{j+1}$ to find the desired $k$ clusters. The indices of the bands closest to the cluster centroids are stored in the set $Q$; the corresponding bands of the original tensor $\mathcal{Y}$ then form the desired band subset.
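This final clustering step can be implemented in a few lines of MATLAB; the reshaping of bands into feature vectors and the variable names below are illustrative.

% Sketch of Step 2 of Algorithm 1: cluster the n3 bands (frontal slices
% of the recovered tensor Bt) with k-means and keep the band nearest to
% each centroid. Variable names are illustrative.
[n1, n2, n3] = size(Bt);
F = reshape(Bt, n1*n2, n3)';                % one row per spectral band
[labels, centers] = kmeans(F, k);
Q = zeros(k, 1);
for c = 1:k
    idx = find(labels == c);
    [~, m] = min(vecnorm(F(idx,:) - centers(c,:), 2, 2));
    Q(c) = idx(m);                          % band closest to centroid c
end
bandSubset = Y(:, :, Q);                    % selected bands of the data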

The main computational cost of Algorithm 1 arises from updating $\mathcal{B}$ using the t-product. The per-iteration complexity is $O(2n_3\log(n_3)+n_1n_2\max(s_r,s_c)n_3)$. Thus a computational trade-off exists when applying the t-CUR decomposition.

Algorithm 1 Hyperspectral Band Selection Based on Tensor CUR Decomposition
Input: $\mathcal{Y}\in\mathbb{R}^{n_1\times n_2\times n_3}$, maximum number of iterations $T$, numbers of sampled rows and columns $s_r$ and $s_c$, number of desired bands $k$, parameters $\lambda_1,\lambda_2,\beta,\tau$, and tolerance $\varepsilon$
Output: The index set $Q$ of the desired bands.
1. Optimize the model in (2) using ADMM:
Initialize: $\mathcal{B}^0,\mathcal{S}^0,\mathcal{X}_i^0=\mathcal{O}$
for $j=0,1,2,\ldots,T$ do
    Update $\mathcal{B}^{j+1}$ as in (4)
    Update $\mathcal{S}^{j+1}$ by solving (5)
    Update $\mathcal{X}_i^{j+1}$ by solving (6)
    Update $\widetilde{\mathcal{X}}_i^{j+1}=\widetilde{\mathcal{X}}_i^j+\nabla_i\mathcal{B}^{j+1}-\mathcal{X}_i^{j+1}$, $i=1,2,3$
    Check the convergence condition
    if converged then
        Exit and set $\widetilde{\mathcal{B}}=\mathcal{B}^{j+1}$
    end if
end for
Set $\widetilde{\mathcal{B}}=\mathcal{B}^{T+1}$
2. Cluster the frontal slices of $\widetilde{\mathcal{B}}$ using k-means and find the index set $Q$ of the bands closest to the $k$ cluster centroids.

4 Experimental data and results

4.1 Experimental Setup

In our numerical experiments, we evaluate the proposed method using two publicly available HSI datasets: Indian Pines and Salinas-A. To assess the effectiveness of our approach, we compare it with several state-of-the-art band selection methods, including E-FDPC [1], SR-SSIM [3], FNGBS [2], MGSR [4], and MHBSCUR [5]. To assess the methods' performance, we conduct classification tests using a support vector machine (SVM), $k$-nearest neighbors (KNN), and a convolutional neural network (CNN) as classifiers, with overall accuracy (OA) as the classification metric. In our experiments, SVM and KNN yield better classification accuracy and run faster than CNN across all band selection methods; therefore, we only report the SVM and KNN overall accuracy results in this paper. For the classification setup, we randomly select 90% of the samples from each dataset for training and the remaining 10% for testing. Each classification test is repeated 50 times with different training data to reduce the effect of randomness. We test numbers of selected bands ranging from 3 to 30 in increments of three.

For Algorithm 1, the selection of specific rows and columns is fixed across runs, with the actual choices made through a random permutation initialized by setting the Mersenne Twister generator's seed to 1. For each dataset, classification approach, and targeted number of bands, we fine-tune the parameters $\lambda_1,\lambda_2,\beta$, and $\tau$ using a grid search. The parameters $\lambda_1$ and $\lambda_2$ are adjusted over the range $\{10^{-4},10^{-3},10^{-2},10^{-1},1,10,100,1000\}$, while $\beta$ and $\tau$ are each tuned over the set $\{10^{-1},1,10,100\}$.
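A plain grid search over these ranges could be organized as in the sketch below, where evaluateOA is a hypothetical stand-in for one full run of Algorithm 1 followed by SVM or KNN classification.

% Illustrative grid search over the stated ranges; evaluateOA is a
% hypothetical helper that runs Algorithm 1 plus a classifier and
% returns the overall accuracy.
lams = [1e-4 1e-3 1e-2 1e-1 1 10 100 1000];
bts  = [1e-1 1 10 100];
best = struct('oa', -inf);
for l1 = lams
  for l2 = lams
    for b = bts
      for t = bts
        oa = evaluateOA(Y, l1, l2, b, t);   % hypothetical evaluation
        if oa > best.oa
          best = struct('oa', oa, 'lambda1', l1, 'lambda2', l2, ...
                        'beta', b, 'tau', t);
        end
      end
    end
  end
end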

All the numerical experiments are conducted using MATLAB 2023b on a desktop computer equipped with an Intel i7-1065G7 CPU, 12GB RAM, and running Windows 11.

4.2 Experiment 1: Indian Pines

In the first experiment, we test the Indian Pines dataset (https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes#Indian_Pines). The dataset, acquired by the AVIRIS sensor over Northwestern Indiana, contains $145\times 145$ pixels, 200 spectral bands, and 16 distinct classes. Fig. 1 and Fig. 2 plot the OA curves produced by SVM and KNN, respectively, for this dataset. The proposed method (THBSCUR) outperforms the state-of-the-art methods in terms of OA when SVM or KNN is used and a large number of bands is selected. In this case, SVM also produces slightly higher classification accuracy than KNN. The running times in seconds, averaged over the number of selected bands $k$, are 0.0382 for E-FDPC, 0.1417 for FNGBS, 30.1334 for SR-SSIM, 15.3760 for MHBSCUR, 19.7766 for MGSR, and 169.5353 for our proposed method. While our method requires a longer running time, it consistently achieves superior performance, particularly with a larger number of bands.

Figure 1: Overall Accuracy with SVM for Indian Pines
Figure 2: Overall Accuracy with KNN for Indian Pines

4.3 Experiment 2: Salinas-A

In the second experiment, we test the Salinas-A dataset, a subset of a larger image captured by the AVIRIS sensor in California, consisting of $86\times 83$ pixels, 204 bands, and 6 classes. Fig. 3 and Fig. 4 plot the OA curves produced by SVM and KNN, respectively, for the Salinas-A dataset. The running times in seconds, averaged over the number of selected bands $k$, are 0.0107 for E-FDPC, 0.0502 for FNGBS, 22.673 for SR-SSIM, 6.1636 for MHBSCUR, 5.9491 for MGSR, and 75.1455 for our proposed method. Despite its longer running time, the proposed method exhibits consistently superior performance with both the SVM and KNN classifiers.

Figure 3: Overall Accuracy by SVM for Salinas-A
Figure 4: Overall Accuracy by KNN for Salinas-A

5 Discussion

In this section, we discuss parameter selection for each test dataset and provide general guidelines. Table 1 lists the optimal range of each parameter for each dataset. As the number of selected bands increases, the optimal choices for $\lambda_1$ and $\lambda_2$ remain stable, whereas the parameters $\beta$ and $\tau$ must be tuned more carefully. Importantly, multiple combinations of parameter choices may lead to optimal overall accuracy.

Table 1: Optimal parameter ranges for each dataset
Parameter     Indian Pines                        Salinas-A
$\lambda_1$   $\{10^{-4},10^{-3}\}$               $\{10^{-3}\}$
$\lambda_2$   $\{10^{-4},10^{-3},10^{-2}\}$       $\{10^{-3}\}$
$\beta$       $\{0.1,1,10\}$                      $\{0.1,1,10,100\}$
$\tau$        $\{1,10,100\}$                      $\{0.1,1\}$

In addition to the optimal parameter ranges outlined in Table 1, it is important to consider the effect of noise on parameter selection. Hyperspectral data often contain noise, commonly modeled as Gaussian, which can greatly impact the performance of band selection algorithms. To account for the noise, the parameters may need to be adjusted to accommodate the deviation from the original signal that the noise introduces.

Considering a scenario where Gaussian noise with various standard deviation levels is introduced into the dataset, we recommend the following guidelines for selecting 15 bands from the Indian Pines dataset. The parameters $\lambda_1$ and $\lambda_2$, which control the trade-off between the sparsity term and the G3DTV regularization term, may require adjustment to counteract the noise. In the noisy case, $\beta=1$ and $\tau=10^{-4}$ consistently yield optimal results across noise levels. Typically, a higher noise level may necessitate a larger value of $\lambda_2$, as it enforces stricter smoothness in $\mathcal{B}$. The parameter $\lambda_1$ may be chosen slightly higher than in noise-free settings. The optimal parameters for $\sigma=1,3,5$ are presented in Table 2.

Table 2: Optimal parameters for Indian Pines with Gaussian noise
Parameter     $\sigma=1$   $\sigma=3$   $\sigma=5$
$\lambda_1$   $10^{-2}$    $10^{-3}$    $10^{-2}$
$\lambda_2$   $0.1$        $1$          $0.1$
$\beta$       $1$          $1$          $1$
$\tau$        $10^{-4}$    $10^{-4}$    $10^{-4}$

6 Conclusion

The inherent high dimensionality and redundancy of HSI data necessitate effective strategies for band selection to ensure computational efficiency and data interpretability. In this work, we propose a novel tensor-based band selection approach that utilizes G3DTV regularization to preserve high-order spatial-spectral smoothness and the tensor CUR decomposition to improve processing efficiency while preserving low-rankness. Our method distinguishes itself by maintaining the tensor structure of HSI data, avoiding the often inefficient conversion to matrix form and directly addressing the spectral redundancy problem in the tensor domain. This approach not only preserves the multidimensional structure of the data but also enhances computational scalability, making it well suited to large-scale datasets in practical applications. In future work, we aim to further improve the efficiency of our algorithm by developing an accelerated version. Furthermore, we plan to explore various importance sampling schemes in the t-CUR decomposition and study the impact of gradient tensor sparsity on band selection accuracy.

References

  • [1] Sen Jia, Guihua Tang, Jiasong Zhu, and Qingquan Li. A novel ranking-based clustering approach for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 54(1):88–102, 2015.
  • [2] Qi Wang, Qiang Li, and Xuelong Li. A fast neighborhood grouping method for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 59(6):5028–5039, 2021.
  • [3] Buyun Xu, Xihai Li, Weijun Hou, Yiting Wang, and Yiwei Wei. A similarity-based ranking method for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 59(11):9585–9599, 2021.
  • [4] Yongshan Zhang, Xinxin Wang, Xinwei Jiang, and Yicong Zhou. Marginalized graph self-representation for unsupervised hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 60:1–12, 2022.
  • [5] Katherine Henneberger, Longxiu Huang, and Jing Qin. Hyperspectral band selection based on matrix cur decomposition. In IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pages 7380–7383. IEEE, 2023.
  • [6] Michael W Mahoney, Mauro Maggioni, and Petros Drineas. Tensor-CUR decompositions for tensor-based data. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 327–336, 2006.
  • [7] Cesar F Caiafa and Andrzej Cichocki. Generalizing the column–row matrix decomposition to multi-way arrays. Linear Algebra and its Applications, 433(3):557–573, 2010.
  • [8] HanQin Cai, Keaton Hamm, Longxiu Huang, and Deanna Needell. Mode-wise tensor decompositions: Multi-dimensional generalizations of CUR decompositions. Journal of machine learning research, 22(185):1–36, 2021.
  • [9] HanQin Cai, Zehan Chao, Longxiu Huang, and Deanna Needell. Fast robust tensor principal component analysis via fiber CUR decomposition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 189–197, 2021.
  • [10] Juefei Chen, Yimin Wei, and Yanwei Xu. Tensor CUR decomposition under t-product and its perturbation. Numerical Functional Analysis and Optimization, 43(6):698–722, 2022.
  • [11] Ashley Prater-Bennette, Lixin Shen, and Erin E Tripp. A constructive approach for computing the proximity operator of the p-th power of the $\ell_1$ norm. Applied and Computational Harmonic Analysis, 67:101572, 2023.