email: {aas12as12as12tw, henry918888, a36492183, cjkang0601, zuw408421476}@gmail.com
Self-supervised Fusarium Head Blight Detection with Hyperspectral Image and Feature Mining
Abstract
Fusarium Head Blight (FHB) is a serious fungal disease affecting wheat (including durum), barley, oats, other small cereal grains, and corn. Effective monitoring and accurate detection of FHB are crucial to ensuring stable and reliable food security. Traditionally, trained agronomists and surveyors identify the disease manually, a method that is labor-intensive, impractical, and difficult to scale. With advances in deep learning, Hyperspectral Imaging (HSI), and Remote Sensing (RS), deep learning methods, particularly Convolutional Neural Networks (CNNs), have emerged as a promising solution. Notably, wheat with serious FHB can exhibit significant spectral differences from wheat with mild FHB, which is particularly advantageous for hyperspectral image-based methods. In this study, we propose a self-supervised classification method based on an HSI endmember extraction strategy and top-K bands selection, designed to analyze material signatures in HSIs and derive discriminative feature representations. This approach requires neither expensive devices nor complicated algorithm design, making it more suitable for practical use. Our method has been validated in the Beyond Visible Spectrum: AI for Agriculture Challenge 2024. The source code is easy to reproduce and available at https://github.com/VanLinLin/Automated-Crop-Disease-Diagnosis-from-Hyperspectral-Imagery-3rd.
Keywords:
Remote sensing · Hyperspectral image · Crop disease · Self-supervised learning · Fusarium head blight · Feature Mining
1 Introduction
With the continuous increase in the global population, food supply is a critical issue [9, 10, 11], especially for developing countries. Every year, a significant amount of crops is lost worldwide due to pests and diseases [8]. Accurate and timely detection of crop diseases is crucial for global agricultural production.
Effective diagnosis and control rely on symptom identification and severity assessment. Traditionally, these tasks are performed through human observation and estimation, which is costly and therefore impractical for large-scale monitoring. Prior-based approaches commonly rely on handcrafted spectral indices, such as the Normalized Difference Vegetation Index (NDVI) [16] and the Normalized Green-Red Difference Index (NGRDI) [17], which are calculated from the reflectance of specific spectral bands.
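Both indices are simple normalized band differences: NDVI = (NIR - Red) / (NIR + Red) and NGRDI = (Green - Red) / (Green + Red). The sketch below computes them per pixel from a hyperspectral cube by picking the band nearest to a target wavelength. The band centers (550 nm green, 670 nm red, 800 nm NIR), the helper names and the synthetic cube are illustrative assumptions, not values used in this study.

```python
import numpy as np

def nearest_band(wavelengths: np.ndarray, target_nm: float) -> int:
    """Index of the band whose center wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(wavelengths - target_nm)))

def normalized_difference(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Generic (a - b) / (a + b) index; eps guards against division by zero."""
    return (a - b) / (a + b + eps)

def ndvi_ngrdi(cube: np.ndarray, wavelengths: np.ndarray,
               green_nm: float = 550.0, red_nm: float = 670.0, nir_nm: float = 800.0):
    """Per-pixel NDVI and NGRDI for a (H, W, B) reflectance cube.

    The default band centers are assumptions, not values from the paper.
    """
    green = cube[..., nearest_band(wavelengths, green_nm)]
    red = cube[..., nearest_band(wavelengths, red_nm)]
    nir = cube[..., nearest_band(wavelengths, nir_nm)]
    return normalized_difference(nir, red), normalized_difference(green, red)

if __name__ == "__main__":
    # Hypothetical uniform 4 nm sampling starting at 450 nm (125 bands).
    wl = 450.0 + 4.0 * np.arange(125)
    cube = np.random.default_rng(0).uniform(0.05, 0.6, size=(8, 8, 125))
    ndvi, ngrdi = ndvi_ngrdi(cube, wl)
    print(ndvi.shape, float(ndvi.mean()), float(ngrdi.mean()))
```

Because each index reads only two bands, everything else in the cube is ignored, which is the limitation discussed next.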
Although these indices can distinguish vegetation features to some extent, they may fail in certain circumstances because they rely on only a few bands of RS images or multispectral images (MSIs). Relying on so few bands does not always yield satisfactory results, because spatial-spectral correlations and information are not fully exploited.
Fortunately, recent advances in hyperspectral imaging have made it possible to develop automated image-based crop disease diagnosis methods. Endmember extraction is a crucial process in HSI and RS analysis that identifies pure spectral signatures from mixed pixel data [13, 14]. It is essential for tasks such as material identification, classification, and abundance estimation in applications like environmental monitoring, agriculture, and mineral exploration [10]. With the development of HSI analysis, several methods for material recognition have been introduced, such as endmember extraction, HSI unmixing, and non-negative matrix factorization. These methods are mainly optimization-based and effectively integrate the low-rank and sparsity priors of HSI.
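For contrast with the approach proposed later, the classical linear mixing model writes each pixel spectrum as a non-negative combination of endmember spectra, X ≈ A E, and non-negative matrix factorization recovers the abundances A and endmembers E by optimization. The sketch below uses scikit-learn's NMF on synthetic pixels; the number of endmembers and the synthetic spectra are assumptions, and practical unmixing methods usually add sum-to-one and sparsity constraints that this sketch omits.

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_unmix(pixels: np.ndarray, n_endmembers: int, seed: int = 0):
    """Factorize (n_pixels, n_bands) reflectance into abundances and endmembers.

    pixels ~= abundances @ endmembers, with both factors non-negative.
    """
    model = NMF(n_components=n_endmembers, init="nndsvda", max_iter=1000, random_state=seed)
    abundances = model.fit_transform(pixels)   # (n_pixels, n_endmembers)
    endmembers = model.components_             # (n_endmembers, n_bands)
    return abundances, endmembers

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_bands = 101
    # Two hypothetical "pure" spectra mixed with random proportions.
    true_E = np.vstack([np.linspace(0.1, 0.6, n_bands), np.linspace(0.5, 0.2, n_bands)])
    true_A = rng.dirichlet([1.0, 1.0], size=500)
    X = true_A @ true_E
    A, E = nmf_unmix(X, n_endmembers=2)
    rel_err = np.linalg.norm(X - A @ E) / np.linalg.norm(X)
    print(A.shape, E.shape, round(float(rel_err), 4))
```

Even this bare version is an iterative optimization over every pixel, which illustrates the computational cost that the proposed method avoids.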
Moreover, with the development of deep learning paradigms such as CNNs and Vision Transformers [12], using deep neural networks for detection is a promising direction [15]. Devadas et al. [2] evaluated 10 widely used vegetation indices (VIs) for identifying different types of rust on wheat leaves, and the results showed that some indices are effective. Shi et al. [3] applied wavelet techniques to HSI to detect yellow rust on wheat leaves [4]. Learning-based methods are gradually coming to dominate HSI analysis for crop disease recognition. As our experiments and analysis show, the spectra of serious-FHB images differ from those of mild-FHB images. Endmember extraction should therefore boost model performance and robustness, because the extracted features are likely to be discriminative.
In this report, we propose a self-supervised method for FHB detection. First, we analyze the intensity of each band in the HSIs to reduce data complexity. Second, top-K bands selection, guided by pseudo-labels generated by K-means clustering, extracts the discriminative features. Finally, because these key features are already effective, an arbitrary classifier can be used for FHB detection. Compared with CNN-based detection methods, our method is more suitable for large-scale general applications because it requires neither GPUs nor the collection of a large number of expensive HSIs. We believe this study will provide valuable insights for future FHB detection.
2 Methodology

In this section, we introduce a strategy that exploits the spectral information of image-level HSIs through a simple endmember extraction strategy with top-K bands selection. A simple classifier then determines whether a given HSI shows mild or serious FHB, as shown in Figure 1.
2.1 Top-K Bands Selection for Endmember Extraction
Given the unique characteristics of HSIs, where different materials exhibit distinct reflectance values at the same bands, we propose an efficient approach for FHB detection that bypasses the need for complex HSI unmixing techniques traditionally used in endmember extraction.
Before endmember extraction, normalization and spectral averaging are applied to reduce data complexity and to remove noise and redundancy within each HSI. Because FHB detection can be regarded as a coarse-level HSI classification task, there is no need to preserve fine-grained features. Normalization therefore benefits both the performance and the robustness of FHB detection.
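The text does not specify the exact normalization or which axes are averaged. One plausible reading, assumed here, is per-image min-max scaling followed by averaging every band over all spatial pixels, which turns an (H, W, B) cube into a single B-dimensional mean spectrum per image. The function names and the min-max choice are hypothetical.

```python
import numpy as np

def mean_spectrum(cube: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max normalize a (H, W, B) cube, then average each band spatially.

    Assumed preprocessing: scaling uses the cube's global min and max,
    so the relative differences between bands are preserved.
    """
    cube = cube.astype(np.float64)
    lo, hi = cube.min(), cube.max()
    normalized = (cube - lo) / (hi - lo + eps)   # values in [0, 1]
    return normalized.mean(axis=(0, 1))          # shape (B,)

def build_feature_matrix(cubes) -> np.ndarray:
    """Stack one mean spectrum per image into an (n_images, B) matrix."""
    return np.stack([mean_spectrum(c) for c in cubes])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cubes = [rng.uniform(0, 4000, size=(64, 64, 101)) for _ in range(5)]
    X = build_feature_matrix(cubes)
    print(X.shape)  # (5, 101)
```

Spatial averaging discards within-image detail on purpose, matching the coarse-level view of the task described above.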
As mentioned in the previous section, traditional metrics for diseased crop recognition, such as NDVI and NGRDI, are insufficient to handle diverse scenes and the complexity of hyperspectral imagery (HSI). To reduce this complexity and streamline the recognition procedure, we introduce a top-K band selection method for key band extraction. We begin by using K-means clustering to generate pseudo-labels that guide the selection; normalization and spectral averaging allow K-means clustering to achieve excellent clustering results.
To determine the optimal number of clusters (the K of K-means, distinct from the number of selected bands), we employed two complementary methods, the Elbow Method and Silhouette Analysis, as illustrated in Figure 2. The Elbow Method (left graph) plots the inertia (within-cluster sum of squares) against the number of clusters; the optimal K lies at the "elbow" point where the rate of decrease in inertia begins to level off, and this point determines the K selected in our analysis. Silhouette Analysis (right graph) measures how similar each sample is to its own cluster compared with other clusters, and the highest Silhouette score occurs at the same K. The agreement between the two methods reinforces our confidence in the selected number of clusters.
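Both criteria take only a few lines with scikit-learn. The sketch below assumes the per-image mean spectra as input and a candidate range of 2 to 8 clusters; it records the inertia for an elbow plot and the silhouette score for each candidate, and returns the silhouette-maximizing K. The function name, the candidate range and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def scan_cluster_counts(X: np.ndarray, k_values=range(2, 9), seed: int = 0):
    """Return {k: (inertia, silhouette)} and the k with the best silhouette."""
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_: within-cluster sum of squared distances (elbow method)
        # silhouette_score: mean silhouette coefficient over all samples
        results[k] = (km.inertia_, silhouette_score(X, km.labels_))
    best_k = max(results, key=lambda k: results[k][1])
    return results, best_k

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic groups of 101-band spectra that differ in a few bands.
    X = rng.normal(0.3, 0.02, size=(200, 101))
    X[100:, 40:45] += 0.3
    scores, best = scan_cluster_counts(X)
    for k, (inertia, sil) in scores.items():
        print(k, round(inertia, 3), round(sil, 3))
    print("best K by silhouette:", best)
```

The elbow itself is usually read off a plot of the returned inertias, while the silhouette criterion can be applied automatically as above.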
With this number of clusters, K-means clustering produces the pseudo-labels that guide our analysis, and these pseudo-labels are then used for feature importance mining. Based on this analysis, we select the top-K most important bands to serve as endmembers for mild- or serious-FHB recognition. Unlike traditional endmember extraction methods in HSI analysis, which rely primarily on optimization with heavy mathematical computation, our method is simpler: it extracts discriminative features directly, which streamlines the process while maintaining effectiveness.
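The paper does not name the importance measure used for feature importance mining. A minimal sketch, assuming a tree ensemble fitted to the K-means pseudo-labels, ranks bands by impurity-based importance and keeps the top k. scikit-learn's RandomForestClassifier is a stand-in here (a LightGBM model exposes a comparable `feature_importances_` attribute), and the helper name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

def select_top_k_bands(X: np.ndarray, n_clusters: int, k: int, seed: int = 0) -> np.ndarray:
    """Self-supervised band ranking: K-means pseudo-labels -> importances -> top-k.

    X holds one normalized mean spectrum per image, shape (n_images, n_bands).
    Returns the indices of the k most important bands, most important first.
    """
    # 1) Pseudo-labels from unsupervised clustering (no ground truth used).
    pseudo = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    # 2) Fit a tree ensemble to the pseudo-labels (stand-in importance model).
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, pseudo)
    # 3) Rank bands by impurity-based importance and keep the top k.
    order = np.argsort(forest.feature_importances_)[::-1]
    return order[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.3, 0.02, size=(200, 101))
    X[100:, 40:45] += 0.3          # only bands 40-44 separate the two groups
    print(sorted(select_top_k_bands(X, n_clusters=2, k=5).tolist()))
```

No ground-truth label enters this step, which is what makes the band selection self-supervised; the true labels are needed only to train the final classifier.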

2.2 Simple Classifier for Fusarium Head Blight Detection
Because the extracted features are robust and compact, the choice of classifier is flexible. We use LightGBM [6] as our primary classifier because it handles high-dimensional data efficiently and robustly. This is especially beneficial for detecting FHB infection from HSIs, which are high-dimensional data cubes, and such detection may help reduce the need for chemical treatment of wheat in the future. For a more comprehensive analysis, we also evaluate a Support Vector Machine (SVM) [7], as discussed in the next section.
3 Experiment Results
Data Description. The dataset we used is from [15]. The HSIs were acquired with a DJI M600 Pro UAV system equipped with an S185 snapshot hyperspectral sensor, capturing reflectance from 450 to 950 nm with a spectral resolution of 4 nm. The raw data included a panchromatic image and a hyperspectral image with 125 bands. Because of noise interference, the first 10 and last 14 bands were excluded, leaving 101 bands. All images were captured at an altitude of 60 m, giving a spatial resolution of approximately 4 cm per pixel. For training, the data were divided into blocks and labeled. The dataset consists of 1,696 HSIs, of which 1,006 are labeled mild-FHB and 690 serious-FHB. We randomly selected 1,611 HSIs for training and kept the remaining 85 for validation.
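The band and sample bookkeeping can be made explicit: 125 bands minus the first 10 and the last 14 leaves 101, and 1,696 images split into 1,611 for training and 85 for validation. The sketch below assumes band centers spaced uniformly at 4 nm starting at 450 nm (so the retained bands would span roughly 490 to 890 nm) and a seeded random permutation for the split; both are assumptions, not details reported for the dataset.

```python
import numpy as np

N_BANDS_RAW, DROP_HEAD, DROP_TAIL = 125, 10, 14

def trim_noisy_bands(cube: np.ndarray) -> np.ndarray:
    """Drop the first 10 and last 14 spectral bands of a (H, W, 125) cube."""
    assert cube.shape[-1] == N_BANDS_RAW
    return cube[..., DROP_HEAD:N_BANDS_RAW - DROP_TAIL]

def retained_wavelengths(start_nm: float = 450.0, step_nm: float = 4.0) -> np.ndarray:
    """Band centers of the kept bands, assuming uniform spacing (an assumption)."""
    all_wl = start_nm + step_nm * np.arange(N_BANDS_RAW)
    return all_wl[DROP_HEAD:N_BANDS_RAW - DROP_TAIL]

def random_split(n_images: int = 1696, n_val: int = 85, seed: int = 0):
    """Random train/validation split of image indices."""
    perm = np.random.default_rng(seed).permutation(n_images)
    return perm[n_val:], perm[:n_val]

if __name__ == "__main__":
    print(trim_noisy_bands(np.zeros((32, 32, N_BANDS_RAW))).shape)  # (32, 32, 101)
    wl = retained_wavelengths()
    print(wl[0], wl[-1])                       # 490.0 890.0
    train_idx, val_idx = random_split()
    print(len(train_idx), len(val_idx))        # 1611 85
```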


Experiment Settings. We used two different classifiers, LightGBM and SVM. As discussed above, the classifier can be arbitrary because the proposed self-supervised endmember extraction strategy yields robust and discriminative representations. All hyperparameters are kept at their default values. The input HSIs are normalized to stabilize training and inference, and common augmentation strategies, such as random rotation and random cropping, are employed. We use accuracy, the ratio of correctly classified samples to the total number of samples, to evaluate classification performance for FHB detection. Equivalently, accuracy can be computed from the numbers of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
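Equation (1) can be checked directly against the confusion counts. A minimal sketch, assuming binary labels with serious-FHB encoded as the positive class 1 (the paper does not state its encoding):

```python
import numpy as np

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary problem."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_pred == positive) & (y_true == positive)))
    tn = int(np.sum((y_pred != positive) & (y_true != positive)))
    fp = int(np.sum((y_pred == positive) & (y_true != positive)))
    fn = int(np.sum((y_pred != positive) & (y_true == positive)))
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), as in Eq. (1)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
    print(confusion_counts(y_true, y_pred))   # (3, 3, 1, 1)
    print(accuracy(y_true, y_pred))           # 0.75
```

With 85 validation images, each misclassified image lowers accuracy by about 1.2 percentage points.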
Top-K bands selection. The number of selected bands matters, because it changes the extracted features and hence the predictions. Our experiments revealed that setting k to approximately 30 yielded perfect accuracy on the validation set. Figure 4 shows that even with k set to 3, accuracy approaches 90%, and that accuracy increases as more bands are selected. This emphasizes the critical role of choosing important bands as features for model performance.
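An experiment of this kind can be sketched as a loop over k: rank the bands once, then retrain the classifier on the top-k bands and score it on held-out data. In the sketch below the importance ranking is passed in as an array; the SVC classifier, the candidate k values, the function name and the synthetic data are illustrative assumptions, and no accuracy figures from the paper are reproduced.

```python
import numpy as np
from sklearn.svm import SVC

def accuracy_vs_k(X_train, y_train, X_val, y_val, ranked_bands, k_values=(3, 10, 30)):
    """Validation accuracy of a default SVC trained on the top-k ranked bands."""
    scores = {}
    for k in k_values:
        bands = np.asarray(ranked_bands)[:k]
        model = SVC().fit(X_train[:, bands], y_train)
        scores[k] = model.score(X_val[:, bands], y_val)
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(0.3, 0.02, size=(400, 101))
    y = rng.integers(0, 2, size=400)
    X[y == 1, 40:70] += 0.05                  # weak per-band signal on 30 bands
    ranked = np.r_[np.arange(40, 70), np.arange(0, 40), np.arange(70, 101)]
    print(accuracy_vs_k(X[:300], y[:300], X[300:], y[300:], ranked))
```

Ranking once and slicing the ranking keeps the comparison fair: every k uses a prefix of the same ordered band list.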
Evaluation and Analysis. The performance of our method is shown in Figure 4. The proposed method achieves outstanding performance while significantly reducing complexity. Ultimately, the top-30 bands are used in both stages of our method, striking a balance between computational efficiency and model performance.
To improve the explainability of our method, we further analyzed the important bands. Figure 3 illustrates the spectral signatures of mild-FHB and serious-FHB crops and highlights the important features identified by our method across the wavelength range. Interestingly, as depicted in Figure 4, these critical bands are not uniformly distributed but cluster within specific wavelength ranges. In particular, we observed significant concentrations in the 700-750 nm and 800-875 nm ranges, which correspond to the red-edge and near-infrared regions, respectively. This clustering suggests that a large portion of the hyperspectral data may be redundant for FHB detection, and that effectively leveraging information from these two key spectral regions is crucial for accurately identifying FHB.
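Checking whether selected bands cluster in a few wavelength ranges amounts to mapping band indices to wavelengths and merging adjacent ones. A small sketch, assuming the center wavelength of each retained band is known (here passed in explicitly) and treating bands at most one index apart as contiguous; the helper name and the example indices are hypothetical.

```python
import numpy as np

def contiguous_ranges(selected_bands, wavelengths, max_gap_bands=1):
    """Group selected band indices into (start_nm, end_nm) wavelength ranges.

    Bands whose indices differ by at most max_gap_bands are merged into one range.
    """
    idx = np.sort(np.asarray(selected_bands, dtype=int))
    if idx.size == 0:
        return []
    ranges, start = [], idx[0]
    for prev, cur in zip(idx[:-1], idx[1:]):
        if cur - prev > max_gap_bands:          # gap found: close the current run
            ranges.append((float(wavelengths[start]), float(wavelengths[prev])))
            start = cur
    ranges.append((float(wavelengths[start]), float(wavelengths[idx[-1]])))
    return ranges

if __name__ == "__main__":
    wl = 490.0 + 4.0 * np.arange(101)          # assumed 4 nm spacing of retained bands
    picked = [53, 55, 54, 56, 79, 80, 81, 82]  # hypothetical selected band indices
    print(contiguous_ranges(picked, wl))       # [(702.0, 714.0), (806.0, 818.0)]
```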
These findings not only validate the effectiveness of our feature selection approach but also offer valuable insights into the spectral characteristics most relevant to FHB detection. The clear difference between the spectral patterns of mildly and seriously diseased crops, together with the top-K bands selection for endmember extraction, demonstrates the robustness and discriminative power of our method.
4 Conclusion
In this paper, we proposed a self-supervised method for FHB detection based on hyperspectral imagery. The core of our method is endmember extraction based on a novel top-K bands selection. With the selected significant bands, we effectively reduce the dimensionality of a given HSI without sacrificing detection performance. Because the method is self-supervised and requires neither expensive storage hardware nor deep neural networks, we believe it is easier to adopt in practical applications. The efficacy of the proposed method has been demonstrated in our experiments and in the Beyond Visible Spectrum: AI for Agriculture Challenge 2024 [1].

References
- [1] Beyond Visible Spectrum: AI for Agriculture 2024, https://www.kaggle.com/competitions/beyond-visible-spectrum-ai-for-agriculture-2024/overview.
- [2] R. Devadas, et al., "Evaluating ten spectral vegetation indices for identifying rust infection in individual wheat leaves", Precision Agric, 459–470 (2009)
- [3] Y. Shi, et al., "Wavelet-Based Rust Spectral Feature Set (WRSFs): A Novel Spectral Feature Set Based on Continuous Wavelet Transformation for Tracking Progressive Host–Pathogen Interaction of Yellow Rust on Wheat", Remote Sensing, 10 (2018)
- [4] M.D. Farrell and R.M. Mersereau, "On the Impact of PCA Dimension Reduction for Hyperspectral Detection of Difficult Targets", IEEE Geoscience and Remote Sensing Letters, 192–195 (2005)
- [5] T.G. Whiteside, et al., "Comparing object-based and pixel-based classifications for mapping savannas", International Journal of Applied Earth Observation and Geoinformation, 884–893 (2011)
- [6] G. Ke, et al., "LightGBM: A Highly Efficient Gradient Boosting Decision Tree", Advances in Neural Information Processing Systems (NeurIPS), 2017
- [7] C. Cortes and V. Vapnik, "Support-vector networks", Machine Learning, 273–297 (1995)
- [8] J.M. Beddow, et al., "Research investment implications of shifts in the global geography of wheat stripe rust", Nature Plants, 2015
- [9] R.P. Singh, et al., "Wheat Rust in Asia: Meeting the Challenges with Old and New Technologies", In Proceedings of the 4th International Crop Science Congress, 2004
- [10] S. Savary, et al., "The global burden of pathogens and pests on major food crops", Nature Ecology and Evolution, 2019
- [11] S. Sankaran, et al., "A review of advanced techniques for detecting plant diseases", Computers and Electronics in Agriculture, 1–13 (2010)
- [12] A. Dosovitskiy, et al., "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", arXiv 2010.11929, 2021
- [13] W. Sun, M. Jiang and L. Zhang, "Pure endmember extraction using SSR for Hyperspectral imagery", 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2016
- [14] K. Mantripragada, et al., "An Iterative Method for Hyperspectral Pixel Unmixing Leveraging Latent Dirichlet Variational Autoencoder", 2023 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2023
- [15] X. Zhang, et al., "A Deep Learning-Based Approach for Automated Yellow Rust Disease Detection from High-Resolution Hyperspectral UAV Images", Remote Sensing, 2019
- [16] J.W. Rouse, et al., "Monitoring vegetation systems in the Great Plains with ERTS", Third Earth Resources Technology Satellite-1 Symposium, 1, 309–317 (1974)
- [17] C. J. Tucker, "Red and photographic infrared linear combinations for monitoring vegetation", Remote Sensing of Environment, 8(2), 127–150 (1979)