Hyperspectral Band Selection based on Generalized 3DTV and Tensor CUR Decomposition
This research is partially supported by the NSF grant DMS-1941197.
Abstract
Hyperspectral imaging (HSI) serves as an important technique in remote sensing. However, its high dimensionality and data volume typically pose significant computational challenges. Band selection is essential for reducing spectral redundancy in hyperspectral imagery while retaining the intrinsic critical information. In this work, we propose a novel hyperspectral band selection model that decomposes the data into a low-rank and smooth component and a sparse one. In particular, we develop a generalized 3D total variation (G3DTV) by applying the $\ell_1^p$-norm to derivatives to preserve spatial-spectral smoothness. By employing the alternating direction method of multipliers (ADMM), we derive an efficient algorithm, where the tensor low-rankness is implied by the tensor CUR decomposition. We demonstrate the effectiveness of the proposed approach through comparisons with various state-of-the-art band selection techniques on two benchmark real-world datasets. In addition, we provide practical guidelines for parameter selection in both noise-free and noisy scenarios.
Keywords— Hyperspectral band selection, total variation, tensor CUR decomposition, classification
1 Introduction
Hyperspectral imaging (HSI) captures spectral information from ground objects across hundreds of narrow spectral bands. This method yields richer visual data than traditional RGB imagery and has found diverse applications in fields like remote sensing, biomedicine, agriculture, and art conservation. However, HSI data is characterized by its high dimensionality, leading to issues like spectral redundancy and increased computational demands. To address these challenges, it is crucial to apply dimensionality reduction and redundancy elimination strategies. The process of hyperspectral band selection is designed to mitigate these issues by choosing a subset of the original spectral bands, ensuring the retention of critical spectral information.
In general, HSI band selection can be implemented in either a supervised or an unsupervised manner, depending on the availability of label information. In this work, we focus on unsupervised band selection, which does not require prior knowledge of labeled data. Many methods have been developed based on local or global similarity comparison. For example, the enhanced fast density-peak-based clustering (E-FDPC) algorithm [1] aims to identify density peaks to select the most discriminative bands, and the fast neighborhood grouping (FNGBS) method [2] utilizes spatial proximity and spectral similarity to group and select representative bands efficiently. Another notable approach is the similarity-based ranking strategy with structural similarity (SR-SSIM) [3], which prioritizes bands based on their contribution to the overall structural similarity, thus preserving the integrity of the spectral-spatial structures in HSI data.
Graph-based unsupervised band selection methods are gaining popularity for their ability to capture underlying graph similarities. Recently, the marginalized graph self-representation (MGSR) method [4] generates a segmentation of the HSI through superpixel segmentation, thereby capturing the spatial information across different homogeneous regions and encoding it into a structural graph. To accelerate the processing, a matrix CUR decomposition-based approach (MHBSCUR) [5] has been developed; this graph-based method stands out for leveraging matrix CUR decomposition, which enhances computational efficiency. However, graph-based strategies entail extra computational overhead to construct graphs. Furthermore, most band selection approaches necessitate converting the data into a matrix instead of directly applying algorithms on the HSI tensor. To address these issues, we propose a tensor-based HSI band selection algorithm that employs the tensor CUR decomposition alongside a generalized 3D total variation (G3DTV), which enhances the smoothness prior of the traditional 3DTV by applying the $\ell_1^p$-norm to derivatives.
Matrix CUR decompositions are efficient low-rank approximation methods that use a small number of actual columns and rows of the original matrix. Recent literature has extended the matrix CUR decomposition to the tensor setting. Initially, the CUR decomposition was extended to tensors through the single-mode unfolding of 3-mode tensors, as discussed in early works [6]. Subsequent research [7] introduced a more comprehensive tensor CUR variant that encompasses all tensor modes. More recently, the terminology has been refined to include specific names for these decompositions, with [8, 9] introducing the terms Fiber and Chidori CUR decompositions. While these initial tensor CUR decomposition methods utilize the mode-$k$ product, a t-CUR decomposition has also been developed under the t-product framework [10].
In this paper, we focus on the t-product based CUR decomposition, which is utilized to expedite a crucial step in our proposed algorithm. This approach preserves the multi-dimensional structural integrity of the data while providing a computationally efficient representation of the original tensor. The combination of t-CUR and G3DTV in our algorithm leverages the inherent multidimensional structure of HSI data, making our method particularly suitable for large-scale datasets while ensuring robust band selection performance.
The remainder of this paper is organized as follows: fundamental concepts and definitions in tensor algebra, together with G3DTV, are introduced in Section 2. The proposed band selection method is presented in Section 3. Section 4 demonstrates the performance of the proposed method through several numerical experiments on benchmark remote sensing datasets. Parameter selection is discussed in Section 5, and conclusions and future work are given in Section 6.
2 Preliminaries
In this section, we introduce fundamental definitions and notation for third-order tensors. Consider $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $\mathcal{B}\in\mathbb{R}^{n_2\times n_4\times n_3}$, and denote the $i$th frontal slice of $\mathcal{A}$ by $A^{(i)}=\mathcal{A}(:,:,i)$. We will use $\mathbf{0}$ to denote the zero tensor.
Definition 2.1
The block circulant operator applied to $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is defined as
$$\operatorname{bcirc}(\mathcal{A})=\begin{bmatrix} A^{(1)} & A^{(n_3)} & \cdots & A^{(2)}\\ A^{(2)} & A^{(1)} & \cdots & A^{(3)}\\ \vdots & \vdots & \ddots & \vdots\\ A^{(n_3)} & A^{(n_3-1)} & \cdots & A^{(1)} \end{bmatrix}\in\mathbb{R}^{n_1 n_3\times n_2 n_3}.$$
Definition 2.2
The operator unfold($\cdot$) and its inverse fold($\cdot$) for the conversion between tensors and matrices are defined as
$$\operatorname{unfold}(\mathcal{A})=\begin{bmatrix} A^{(1)}\\ A^{(2)}\\ \vdots\\ A^{(n_3)} \end{bmatrix}\in\mathbb{R}^{n_1 n_3\times n_2},\qquad \operatorname{fold}\big(\operatorname{unfold}(\mathcal{A})\big)=\mathcal{A}.$$
Definition 2.3
The block diagonal matrix form of $\mathcal{A}$ is defined as a block diagonal matrix with diagonal blocks $A^{(1)},\ldots,A^{(n_3)}$, i.e.,
$$\operatorname{bdiag}(\mathcal{A})=\begin{bmatrix} A^{(1)} & & \\ & \ddots & \\ & & A^{(n_3)} \end{bmatrix}\in\mathbb{R}^{n_1 n_3\times n_2 n_3}.$$
Definition 2.4
The t-product of the tensors $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ and $\mathcal{B}\in\mathbb{R}^{n_2\times n_4\times n_3}$ is defined as $\mathcal{A}*\mathcal{B}=\operatorname{fold}\big(\operatorname{bcirc}(\mathcal{A})\operatorname{unfold}(\mathcal{B})\big)\in\mathbb{R}^{n_1\times n_4\times n_3}$. The t-product can also be converted to matrix-matrix multiplication in the Fourier domain such that $\mathcal{C}=\mathcal{A}*\mathcal{B}$ is equivalent to $\operatorname{bdiag}(\widehat{\mathcal{C}})=\operatorname{bdiag}(\widehat{\mathcal{A}})\operatorname{bdiag}(\widehat{\mathcal{B}})$, where $\widehat{\mathcal{A}}=\operatorname{fft}(\mathcal{A},[\,],3)$, the fast Fourier transform of $\mathcal{A}$ along the third dimension.
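For illustration, the Fourier-domain formulation of the t-product can be implemented in a few lines; the following NumPy sketch (not part of the paper) multiplies the frontal slices of $\widehat{\mathcal{A}}$ and $\widehat{\mathcal{B}}$ and applies the inverse FFT.

```python
import numpy as np

def t_product(A, B):
    """t-product of A (n1 x n2 x n3) and B (n2 x n4 x n3) computed in the Fourier domain."""
    n3 = A.shape[2]
    A_hat = np.fft.fft(A, axis=2)                  # FFT along the third mode
    B_hat = np.fft.fft(B, axis=2)
    C_hat = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for i in range(n3):                            # slice-wise matrix products
        C_hat[:, :, i] = A_hat[:, :, i] @ B_hat[:, :, i]
    return np.real(np.fft.ifft(C_hat, axis=2))     # real-valued for real inputs

A = np.random.rand(4, 5, 6)
B = np.random.rand(5, 3, 6)
C = t_product(A, B)                                # C has shape (4, 3, 6)
```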
Definition 2.5
A tensor $\mathcal{Q}\in\mathbb{R}^{n\times n\times n_3}$ is orthogonal if $\mathcal{Q}^{\top}*\mathcal{Q}=\mathcal{Q}*\mathcal{Q}^{\top}=\mathcal{I}$, where $\mathcal{I}$ is the identity tensor such that $\mathcal{I}*\mathcal{A}=\mathcal{A}$ and $\mathcal{B}*\mathcal{I}=\mathcal{B}$ for all tensors $\mathcal{A},\mathcal{B}$ of compatible sizes. Here $\mathcal{Q}^{\top}$ denotes the tensor transpose obtained by transposing each frontal slice of $\mathcal{Q}$ and reversing the order of the transposed slices $2$ through $n_3$.
Definition 2.6
The tensor $\mathcal{A}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathcal{A}$, calculated by finding the pseudoinverse of each frontal slice of $\widehat{\mathcal{A}}$ in the transform domain and then applying the inverse FFT along the third dimension.
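A minimal sketch of this computation (an illustration, with arbitrary naming) pseudoinverts each Fourier-domain frontal slice and transforms back:

```python
import numpy as np

def t_pinv(A):
    """Moore-Penrose pseudoinverse of a third-order tensor under the t-product:
    pseudoinvert each frontal slice in the Fourier domain, then invert the FFT."""
    n1, n2, n3 = A.shape
    A_hat = np.fft.fft(A, axis=2)
    P_hat = np.empty((n2, n1, n3), dtype=complex)
    for i in range(n3):
        P_hat[:, :, i] = np.linalg.pinv(A_hat[:, :, i])
    return np.real(np.fft.ifft(P_hat, axis=2))
```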
Definition 2.7
A tensor $\mathcal{A}$ is f-diagonal if each frontal slice $A^{(i)}$ is a diagonal matrix for all $i=1,\ldots,n_3$.
Definition 2.8
The tensor Singular Value Decomposition (t-SVD) of a tensor $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$ induced by the t-product is
$$\mathcal{A}=\mathcal{U}*\mathcal{S}*\mathcal{V}^{\top},$$
where $\mathcal{U}\in\mathbb{R}^{n_1\times n_1\times n_3}$ and $\mathcal{V}\in\mathbb{R}^{n_2\times n_2\times n_3}$ are orthogonal and the core tensor $\mathcal{S}\in\mathbb{R}^{n_1\times n_2\times n_3}$ is f-diagonal. Moreover, the t-SVD rank of the tensor $\mathcal{A}$ is defined as
$$\operatorname{rank}_t(\mathcal{A})=\#\{i:\ \mathcal{S}(i,i,:)\neq\mathbf{0}\},$$
where $\#$ denotes the cardinality of a set.
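The t-SVD rank can be read off from the singular tubes; the small sketch below (with an arbitrary numerical tolerance) computes the frontal-slice SVDs in the Fourier domain and counts the tubes whose singular values are not all numerically zero.

```python
import numpy as np

def tsvd_rank(A, tol=1e-10):
    """t-SVD (tubal) rank: the number of nonzero singular tubes of the core tensor,
    obtained from the singular values of the frontal slices in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    n3 = A.shape[2]
    r = min(A.shape[0], A.shape[1])
    sing = np.zeros((r, n3))
    for i in range(n3):
        sing[:, i] = np.linalg.svd(A_hat[:, :, i], compute_uv=False)
    # tube j is counted if any of its Fourier-domain singular values exceeds tol
    return int(np.sum(np.any(sing > tol, axis=1)))
```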
Definition 2.9
Let $\mathcal{A}\in\mathbb{R}^{n_1\times n_2\times n_3}$, and let $I\subseteq\{1,\ldots,n_1\}$ and $J\subseteq\{1,\ldots,n_2\}$ be row and column index sets. With $\mathcal{C}=\mathcal{A}(:,J,:)$, $\mathcal{R}=\mathcal{A}(I,:,:)$, and $\mathcal{U}=\mathcal{A}(I,J,:)$, the t-CUR decomposition of $\mathcal{A}$ is given by $\mathcal{A}\approx\mathcal{C}*\mathcal{U}^{\dagger}*\mathcal{R}$, with equality when $\operatorname{rank}_t(\mathcal{U})=\operatorname{rank}_t(\mathcal{A})$ [10].
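The following minimal NumPy sketch illustrates a t-CUR approximation by forming the sampled factors slice-wise in the Fourier domain (equivalent to combining the t-product with Definition 2.6); the index-set sizes and the toy example are arbitrary choices for illustration only.

```python
import numpy as np

def t_cur(A, I, J):
    """t-CUR approximation A ~ C * U^dagger * R, computed frontal-slice-wise
    in the Fourier domain."""
    A_hat = np.fft.fft(A, axis=2)
    L_hat = np.empty_like(A_hat)
    for k in range(A.shape[2]):
        Sk = A_hat[:, :, k]
        L_hat[:, :, k] = Sk[:, J] @ np.linalg.pinv(Sk[np.ix_(I, J)]) @ Sk[I, :]
    return np.real(np.fft.ifft(L_hat, axis=2))

# toy check: a tensor of tubal rank 3 (identical low-rank frontal slices)
rng = np.random.default_rng(1)
X, Y = rng.random((30, 3)), rng.random((3, 20))
A = np.repeat((X @ Y)[:, :, None], 8, axis=2)
I = rng.choice(30, size=6, replace=False)
J = rng.choice(20, size=6, replace=False)
rel_err = np.linalg.norm(t_cur(A, I, J) - A) / np.linalg.norm(A)   # ~ 0 up to round-off
```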
Definition 2.10
The proximal operator of a function $f$ is defined as
$$\operatorname{prox}_{f}(\mathcal{V})=\operatorname*{arg\,min}_{\mathcal{X}}\ f(\mathcal{X})+\frac{1}{2}\|\mathcal{X}-\mathcal{V}\|_F^2.\qquad(1)$$
Note that the proximal operator when $f=\|\cdot\|_1^{p}$ can be computed using Algorithms 1 and 2 in [11].
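For intuition, when $p=1$ the proximal operator of $\lambda\|\cdot\|_1$ has the closed form of entrywise soft thresholding, sketched below; the general case $p\ge 1$ relies on the algorithms of [11], which we do not reproduce here.

```python
import numpy as np

def soft_threshold(V, lam):
    """Proximal operator of lam * ||.||_1 (the p = 1 case): entrywise soft thresholding."""
    return np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)
```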
In addition, to describe the high order smoothness of 3D images, we propose a novel 3D total variation regularization.
Definition 2.11
Consider a function $u:\Omega\to\mathbb{R}$ where $\Omega\subset\mathbb{R}^3$ is open and bounded. The generalized 3D total variation (G3DTV) regularization of $u$ is defined as
$$\|u\|_{\mathrm{G3DTV}}=\sum_{i=1}^{3}\|\nabla_i u\|_1^{p},$$
where $\nabla_i$ is the derivative along the $i$th axis and $p\ge 1$ is an integer. If $\mathcal{U}$ is a discretized version of $u$ when $\Omega$ is discretized as a grid, then $\mathcal{U}$ becomes a third-order tensor and thereby the definition can be extended to the discrete case, with $\nabla_i$ replaced by finite differences along the $i$th mode.
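In the discrete setting, a direct reading of this definition with forward finite differences gives the following sketch; the boundary handling and the function name are illustrative choices rather than the paper's implementation.

```python
import numpy as np

def g3dtv(U, p=2):
    """Discrete G3DTV: sum over the three axes of the p-th power of the l1 norm
    of forward finite differences."""
    value = 0.0
    for axis in range(3):
        D = np.diff(U, axis=axis)            # forward differences along one axis
        value += np.sum(np.abs(D)) ** p      # (|| grad_i U ||_1)^p
    return value
```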
3 Proposed method
Consider a hyperspectral data tensor $\mathcal{X}\in\mathbb{R}^{n_1\times n_2\times n_3}$ with $n_3$ spectral bands, each composed of $n_1\times n_2$ spatial pixels. To select $k$ bands, we aim to decompose $\mathcal{X}$ into a sum of a low-rank and spatial-spectral smooth tensor $\mathcal{L}$ and a sparse tensor $\mathcal{S}$ via the following model
$$\min_{\mathcal{L},\,\mathcal{S}}\ \frac{1}{2}\|\mathcal{X}-\mathcal{L}-\mathcal{S}\|_F^2+\lambda\|\mathcal{S}\|_1+\beta\,\|\mathcal{L}\|_{\mathrm{G3DTV}}\quad\text{s.t.}\ \operatorname{rank}_t(\mathcal{L})\le r.\qquad(2)$$
Here the third term is the G3DTV of $\mathcal{L}$ as defined in Definition 2.11, i.e., $\|\mathcal{L}\|_{\mathrm{G3DTV}}=\sum_{i=1}^{3}\|\nabla_i\mathcal{L}\|_1^{p}$. The parameter $\lambda>0$ is the regularization parameter for controlling the sparsity of the outlier tensor $\mathcal{S}$, and $\beta>0$ controls the spatial-spectral smoothness of $\mathcal{L}$. The parameter $p$ in G3DTV is an integer, which is set as 2 throughout our experiments. In order to apply the ADMM framework to minimize (2), we introduce the auxiliary variables $\mathcal{T}_i=\nabla_i\mathcal{L}$, $i=1,2,3$, and rewrite (2) as an equivalent form
$$\min_{\mathcal{L},\,\mathcal{S},\,\{\mathcal{T}_i\}}\ \frac{1}{2}\|\mathcal{X}-\mathcal{L}-\mathcal{S}\|_F^2+\lambda\|\mathcal{S}\|_1+\beta\sum_{i=1}^{3}\|\mathcal{T}_i\|_1^{p}\quad\text{s.t.}\ \mathcal{T}_i=\nabla_i\mathcal{L},\ i=1,2,3,\ \operatorname{rank}_t(\mathcal{L})\le r.\qquad(3)$$
We introduce an indicator function to take care of the rank constraint. Let $\Omega=\{\mathcal{L}\in\mathbb{R}^{n_1\times n_2\times n_3}:\operatorname{rank}_t(\mathcal{L})\le r\}$. The indicator function is defined as $\iota_{\Omega}(\mathcal{L})=0$ if $\mathcal{L}\in\Omega$ and $+\infty$ otherwise. Then the augmented Lagrangian reads as
$$\mathcal{F}_{\rho}(\mathcal{L},\mathcal{S},\{\mathcal{T}_i\};\{\mathcal{W}_i\})=\frac{1}{2}\|\mathcal{X}-\mathcal{L}-\mathcal{S}\|_F^2+\lambda\|\mathcal{S}\|_1+\beta\sum_{i=1}^{3}\|\mathcal{T}_i\|_1^{p}+\iota_{\Omega}(\mathcal{L})+\sum_{i=1}^{3}\Big(\langle\mathcal{W}_i,\nabla_i\mathcal{L}-\mathcal{T}_i\rangle+\frac{\rho}{2}\|\nabla_i\mathcal{L}-\mathcal{T}_i\|_F^2\Big),$$
where each $\mathcal{W}_i$ is a dual variable and $\rho>0$ is the penalty parameter. The resulting algorithm can be described as follows:
$$\mathcal{L}^{k+1}=\operatorname*{arg\,min}_{\mathcal{L}}\ \mathcal{F}_{\rho}(\mathcal{L},\mathcal{S}^{k},\{\mathcal{T}_i^{k}\};\{\mathcal{W}_i^{k}\}),\qquad
\mathcal{S}^{k+1}=\operatorname*{arg\,min}_{\mathcal{S}}\ \mathcal{F}_{\rho}(\mathcal{L}^{k+1},\mathcal{S},\{\mathcal{T}_i^{k}\};\{\mathcal{W}_i^{k}\}),$$
$$\mathcal{T}_i^{k+1}=\operatorname*{arg\,min}_{\mathcal{T}_i}\ \mathcal{F}_{\rho}(\mathcal{L}^{k+1},\mathcal{S}^{k+1},\{\mathcal{T}_i\};\{\mathcal{W}_i^{k}\}),\qquad
\mathcal{W}_i^{k+1}=\mathcal{W}_i^{k}+\rho\big(\nabla_i\mathcal{L}^{k+1}-\mathcal{T}_i^{k+1}\big),$$
where $\mathcal{T}_i$ and $\mathcal{W}_i$ are updated for $i=1,2,3$. Applying the ADMM algorithm requires solving three subproblems at each iteration. Specifically, we aim to minimize $\mathcal{F}_{\rho}$ with respect to $\mathcal{L}$, $\mathcal{S}$, and $\{\mathcal{T}_i\}$. A common approach for updating $\mathcal{L}$ is to use the skinny t-SVD; however, this can be costly when the size of the tensor is large. To address this issue, we utilize the t-CUR decomposition and update $\mathcal{L}$ as follows.
By applying gradient descent with step size $\eta>0$, we obtain an intermediate update $\widetilde{\mathcal{L}}^{k+1}=\mathcal{L}^{k}-\eta\,\mathcal{G}^{k}$, where the gradient of the smooth part of $\mathcal{F}_{\rho}$ with respect to $\mathcal{L}$ is calculated as $\mathcal{G}^{k}=-(\mathcal{X}-\mathcal{L}^{k}-\mathcal{S}^{k})+\sum_{i=1}^{3}\nabla_i^{\top}\big(\mathcal{W}_i^{k}+\rho(\nabla_i\mathcal{L}^{k}-\mathcal{T}_i^{k})\big)$. Then the factors of the t-CUR decomposition are updated as
$$\mathcal{C}^{k+1}=\widetilde{\mathcal{L}}^{k+1}(:,J,:),\qquad \mathcal{R}^{k+1}=\widetilde{\mathcal{L}}^{k+1}(I,:,:),\qquad \mathcal{U}^{k+1}=\widetilde{\mathcal{L}}^{k+1}(I,J,:),$$
where $I$ and $J$ are the respective row and column index sets. Next we calculate $(\mathcal{U}^{k+1})^{\dagger}$ using Definition 2.6 and obtain the low-rank update
$$\mathcal{L}^{k+1}=\mathcal{C}^{k+1}*(\mathcal{U}^{k+1})^{\dagger}*\mathcal{R}^{k+1}.\qquad(4)$$
By fixing the other variables, we update $\mathcal{S}$ and $\mathcal{T}_i$ as
$$\mathcal{S}^{k+1}=\operatorname{shrink}\big(\mathcal{X}-\mathcal{L}^{k+1},\,\lambda\big),\qquad(5)$$
$$\mathcal{T}_i^{k+1}=\operatorname{prox}_{(\beta/\rho)\|\cdot\|_1^{p}}\big(\nabla_i\mathcal{L}^{k+1}+\mathcal{W}_i^{k}/\rho\big),\quad i=1,2,3.\qquad(6)$$
Here $\operatorname{shrink}(\cdot,\lambda)$ is the soft thresholding operator and $\operatorname{prox}$ is the proximal operator defined in (1). The convergence condition is defined as $\|\mathcal{L}^{k+1}-\mathcal{L}^{k}\|_F/\|\mathcal{L}^{k}\|_F<\epsilon$, where $\epsilon>0$ is a predefined tolerance. Finally, we apply a clustering method such as k-means to the band signatures of the recovered low-rank tensor $\mathcal{L}$ to find the desired clusters. The fiber indices of the bands closest to the cluster centroids are stored in the set $\Lambda$. Thus the corresponding bands from the original tensor $\mathcal{X}$ represent the desired band subset.
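To make the overall procedure concrete, the following NumPy sketch implements one possible realization of the ADMM scheme above. It is illustrative rather than a faithful reproduction of Algorithm 1: the periodic finite differences, step size, index-set sampling, and stopping rule are simplifying assumptions, and the G3DTV prox is shown for $p=1$ (soft thresholding) rather than the general case handled by [11].

```python
import numpy as np
from sklearn.cluster import KMeans

def soft(V, lam):                              # prox of lam * ||.||_1
    return np.sign(V) * np.maximum(np.abs(V) - lam, 0.0)

def D(L, ax):                                  # periodic forward difference along axis ax
    return np.roll(L, -1, axis=ax) - L

def Dt(T, ax):                                 # adjoint of D
    return np.roll(T, 1, axis=ax) - T

def t_cur_reproject(L, I, J):
    """Impose (approximate) low t-SVD rank on L via L(:,J,:) * L(I,J,:)^dagger * L(I,:,:)."""
    L_hat = np.fft.fft(L, axis=2)
    out = np.empty_like(L_hat)
    for k in range(L.shape[2]):
        Sk = L_hat[:, :, k]
        out[:, :, k] = Sk[:, J] @ np.linalg.pinv(Sk[np.ix_(I, J)]) @ Sk[I, :]
    return np.real(np.fft.ifft(out, axis=2))

def band_select(X, num_bands, rank, lam=0.1, beta=0.1, rho=1.0, eta=0.5,
                max_iter=100, tol=1e-4, seed=1):
    """Illustrative ADMM-style loop: update (L, S, T_i, W_i), then cluster the bands."""
    n1, n2, n3 = X.shape
    rng = np.random.RandomState(seed)                       # Mersenne Twister, seed 1
    I = rng.permutation(n1)[:min(n1, 3 * rank)]             # sampled row indices
    J = rng.permutation(n2)[:min(n2, 3 * rank)]             # sampled column indices
    L, S = X.copy(), np.zeros_like(X)
    T = [D(L, ax) for ax in range(3)]
    W = [np.zeros_like(X) for _ in range(3)]
    for _ in range(max_iter):
        L_old = L
        # gradient step on the smooth part of the augmented Lagrangian w.r.t. L
        G = -(X - L - S) + sum(Dt(W[ax] + rho * (D(L, ax) - T[ax]), ax) for ax in range(3))
        L = t_cur_reproject(L - eta * G, I, J)              # low-rank step via t-CUR
        S = soft(X - L, lam)                                # sparse component, cf. (5)
        T = [soft(D(L, ax) + W[ax] / rho, beta / rho)       # cf. (6), with p = 1
             for ax in range(3)]
        W = [W[ax] + rho * (D(L, ax) - T[ax]) for ax in range(3)]
        if np.linalg.norm(L - L_old) <= tol * (np.linalg.norm(L_old) + 1e-12):
            break
    # k-means on band signatures of L; keep the band nearest each centroid
    bands = L.reshape(-1, n3).T                             # one row per spectral band
    km = KMeans(n_clusters=num_bands, n_init=10, random_state=seed).fit(bands)
    selected = sorted({int(np.argmin(np.linalg.norm(bands - c, axis=1)))
                       for c in km.cluster_centers_})
    return selected, L, S
```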
The main computational cost of Algorithm 1 arises from updating $\mathcal{L}$ using the t-product, which dominates the per-iteration complexity through the FFTs along the third mode and the frontal-slice matrix products of the sampled factors. Thus a computational trade-off exists when applying the t-CUR decomposition: sampling fewer fibers lowers the cost of the low-rank update at the price of a rougher approximation.
4 Experimental data and results
4.1 Experimental Setup
In our numerical experiments, we evaluate the proposed method using two publicly available HSI datasets: Indian Pines and Salinas-A. To assess the effectiveness of our approach, we compare it with several other state-of-the-art band selection methods, including E-FDPC [1], SR-SSIM [3], FNGBS [2], MGSR [4], and MHBSCUR [5]. To assess these methods' performance, we conduct classification tests using the support vector machine (SVM), k-nearest neighbors (KNN), and convolutional neural network (CNN) classifiers, and use the overall accuracy (OA) as the classification metric. In our experiments, SVM and KNN yield higher classification accuracy and run faster than CNN across all the band selection methods; therefore, we only report the SVM and KNN overall accuracy results in this paper. For the classification setup, we randomly select a fixed percentage of the samples from each dataset for training and use the remaining samples for testing. Each classification test is repeated 50 times with different training data to reduce the randomness effect. We test the number of bands ranging from 3 to 30, increasing in increments of three.
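As an illustration of this evaluation protocol (not the authors' scripts), overall accuracy for a given band subset can be estimated with repeated random splits; here `X_sel` (labeled pixels restricted to the selected bands), `y` (ground-truth labels), and the training fraction are assumed placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def mean_overall_accuracy(X_sel, y, clf_name="svm", train_frac=0.1, repeats=50, seed=0):
    """Average overall accuracy (OA) over repeated random train/test splits.
    X_sel: (num_labeled_pixels, num_selected_bands); y: class labels.
    train_frac is a placeholder for the split percentage used in the paper."""
    accs = []
    for r in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_sel, y, train_size=train_frac, stratify=y, random_state=seed + r)
        clf = SVC(kernel="rbf") if clf_name == "svm" else KNeighborsClassifier(n_neighbors=5)
        clf.fit(X_tr, y_tr)
        accs.append(accuracy_score(y_te, clf.predict(X_te)))
    return float(np.mean(accs))
```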
For Algorithm 1, the selection of specific rows and columns is fixed, with the actual choices made through a random permutation initialized by setting the Mersenne Twister generator's seed to 1. For each dataset, classification approach, and targeted number of bands, we fine-tune the regularization parameters $\lambda$ and $\beta$, the penalty parameter $\rho$, and the step size $\eta$ using a grid search, with each parameter adjusted over a preset range of candidate values.
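The snippet below mirrors this setup for illustration: NumPy's legacy `RandomState` generator is also a Mersenne Twister, so seeding it with 1 fixes the permutation from which the index sets are drawn; the index-set sizes and candidate grids shown are placeholders rather than the values used in the paper.

```python
import numpy as np
from itertools import product

# Reproducible row/column index sets from a Mersenne Twister seeded with 1
# (a NumPy analogue of MATLAB's rng(1, 'twister'); the sizes below are illustrative).
rng = np.random.RandomState(1)
I = np.sort(rng.permutation(145)[:20])       # fixed row index set
J = np.sort(rng.permutation(145)[:20])       # fixed column index set

# Grid-search skeleton; the candidate values are placeholders, not the paper's ranges.
lam_grid  = [1e-3, 1e-2, 1e-1]
beta_grid = [1e-3, 1e-2, 1e-1]
rho_grid  = [1e-1, 1e0, 1e1]
for lam, beta, rho in product(lam_grid, beta_grid, rho_grid):
    pass   # run band selection with (lam, beta, rho, I, J) and record the resulting mean OA
```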
All the numerical experiments are conducted using MATLAB 2023b on a desktop computer equipped with an Intel i7-1065G7 CPU, 12GB RAM, and running Windows 11.
4.2 Experiment 1: Indian Pines
In the first experiment, we test the Indian Pines dataset (https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes#Indian_Pines). The dataset, acquired by the AVIRIS sensor over Northwestern Indiana, contains 145×145 pixels, 200 spectral bands, and 16 distinct classes. Fig. 1 and Fig. 2 plot the OA curves produced by SVM and KNN, respectively, for this dataset. The proposed method (THBSCUR) outperforms the state-of-the-art methods in terms of OA when SVM or KNN is used to classify the results for a high number of bands. In this case, SVM also produces slightly higher classification accuracy than KNN. In terms of running time averaged over the numbers of selected bands, our method is slower than E-FDPC, FNGBS, SR-SSIM, MHBSCUR, and MGSR; however, it consistently achieves superior performance, particularly with a larger number of bands.
[Fig. 1: OA of SVM versus number of selected bands on the Indian Pines dataset.]
[Fig. 2: OA of KNN versus number of selected bands on the Indian Pines dataset.]
4.3 Experiment 2: Salinas-A
In the second experiment, we test the Salinas-A dataset, a small sub-scene of a larger image captured by the AVIRIS sensor over the Salinas Valley, California, consisting of 86×83 pixels and 6 classes. Fig. 3 and Fig. 4 plot the OA curves produced by SVM and KNN, respectively, for the Salinas-A dataset. As on Indian Pines, our method requires a longer average running time than the competing methods. Despite the longer running time, the proposed method exhibits consistently superior performance with both the SVM and KNN classifiers.
[Fig. 3: OA of SVM versus number of selected bands on the Salinas-A dataset.]
[Fig. 4: OA of KNN versus number of selected bands on the Salinas-A dataset.]
5 Discussion
In this section, we discuss parameter selection for each test dataset and provide general guidelines. Table 1 lists the optimal range for each parameter and dataset. As the number of selected bands increases, the optimal choices for two of the parameters remain stable, whereas the other two must be tuned more carefully. Importantly, multiple combinations of parameter choices may lead to the optimal overall accuracy.
[Table 1: Optimal parameter ranges for the Indian Pines and Salinas-A datasets.]
In addition to the optimal parameter ranges outlined in Table 1, it is important to consider the effect of noise on parameter selection. Hyperspectral data often contains noise, commonly modeled as Gaussian, which can greatly impact the performance of band selection algorithms. To account for it, the parameters may need to be adjusted to accommodate the deviation from the original signal introduced by the noise.
Considering a scenario where Gaussian noise with various standard deviation levels is introduced into the dataset, we recommend the following guidelines for selecting 15 bands from the Indian Pines dataset. The parameters $\lambda$ and $\beta$, which control the trade-off between the sparsity term and the G3DTV regularization term, may require adjustment to counteract the introduction of noise; in the noisy case, fixed choices of $\lambda$ and $\beta$ consistently yield optimal results across noise levels. Typically, a higher level of noise may necessitate a larger value of $\beta$, as it enforces stricter smoothness in $\mathcal{L}$. The remaining parameters may be chosen slightly higher than typical settings in noise-free environments. The optimal parameters for this setting are presented in Table 2.
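For reference, noise of a prescribed standard deviation can be injected as in the following snippet before re-tuning $\lambda$ and $\beta$ as discussed above; the rescaling to $[0,1]$ is one common convention, not necessarily the one used here.

```python
import numpy as np

def add_gaussian_noise(X, sigma, seed=0):
    """Inject i.i.d. Gaussian noise with standard deviation sigma into an HSI tensor.
    Rescaling to [0, 1] first keeps sigma comparable across datasets."""
    rng = np.random.default_rng(seed)
    Xs = (X - X.min()) / (X.max() - X.min() + 1e-12)
    return Xs + sigma * rng.standard_normal(X.shape)
```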
[Table 2: Optimal parameter choices for selecting 15 bands from the Indian Pines dataset under different noise levels.]
6 Conclusion
The inherent high dimensionality and redundancy of HSI data necessitate effective strategies for band selection to ensure computational efficiency and data interpretability. In this work, we propose a novel tensor-based band selection approach that utilizes the G3DTV regularization to preserve high-order spatial-spectral smoothness and the tensor CUR decomposition to improve processing efficiency while preserving low-rankness. Our method distinguishes itself by maintaining the tensor structure of HSI data, avoiding the often inefficient conversion to matrix form and directly addressing the spectral redundancy problem in the tensor domain. This approach not only preserves the multidimensional structure of the data but also enhances computational scalability, making it well suited for handling large-scale datasets in practical applications. In future work, we aim to further improve the efficiency of our algorithm by developing an accelerated version. Furthermore, we plan to explore various importance sampling schemes in the t-CUR decomposition and study the impact of gradient tensor sparsity on the band selection accuracy.
References
- [1] Sen Jia, Guihua Tang, Jiasong Zhu, and Qingquan Li. A novel ranking-based clustering approach for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 54(1):88–102, 2015.
- [2] Qi Wang, Qiang Li, and Xuelong Li. A fast neighborhood grouping method for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 59(6):5028–5039, 2021.
- [3] Buyun Xu, Xihai Li, Weijun Hou, Yiting Wang, and Yiwei Wei. A similarity-based ranking method for hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 59(11):9585–9599, 2021.
- [4] Yongshan Zhang, Xinxin Wang, Xinwei Jiang, and Yicong Zhou. Marginalized graph self-representation for unsupervised hyperspectral band selection. IEEE Transactions on Geoscience and Remote Sensing, 60:1–12, 2022.
- [5] Katherine Henneberger, Longxiu Huang, and Jing Qin. Hyperspectral band selection based on matrix cur decomposition. In IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pages 7380–7383. IEEE, 2023.
- [6] Michael W Mahoney, Mauro Maggioni, and Petros Drineas. Tensor-CUR decompositions for tensor-based data. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 327–336, 2006.
- [7] Cesar F Caiafa and Andrzej Cichocki. Generalizing the column–row matrix decomposition to multi-way arrays. Linear Algebra and its Applications, 433(3):557–573, 2010.
- [8] HanQin Cai, Keaton Hamm, Longxiu Huang, and Deanna Needell. Mode-wise tensor decompositions: Multi-dimensional generalizations of CUR decompositions. Journal of machine learning research, 22(185):1–36, 2021.
- [9] HanQin Cai, Zehan Chao, Longxiu Huang, and Deanna Needell. Fast robust tensor principal component analysis via fiber CUR decomposition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 189–197, 2021.
- [10] Juefei Chen, Yimin Wei, and Yanwei Xu. Tensor CUR decomposition under t-product and its perturbation. Numerical Functional Analysis and Optimization, 43(6):698–722, 2022.
- [11] Ashley Prater-Bennette, Lixin Shen, and Erin E Tripp. A constructive approach for computing the proximity operator of the p-th power of the $\ell_1$ norm. Applied and Computational Harmonic Analysis, 67:101572, 2023.