Multi-Fake Evolutionary Generative Adversarial Networks for Imbalance Hyperspectral Image Classification
Abstract
This paper presents a novel multi-fake evolutionary generative adversarial network (MFEGAN) for imbalanced hyperspectral image classification. It is an end-to-end approach in which different generative objective losses are considered in the generator network to improve the classification performance of the discriminator network. The same discriminator network also serves as a standard classifier, since a classifier network is embedded on top of the discriminating function. The effectiveness of the proposed method is validated on two hyperspectral spatial-spectral data sets. For a fair comparison with the proposed method, the same generator and discriminator architectures are used with two other GAN objectives. The experimental validations show that the proposed method achieves better classification performance than the state-of-the-art methods.
Index Terms:
Imbalance multi-class classification, minority-oversampling, evolutionary GAN, hyperspectral datasets.
I Introduction
In hyperspectral imaging, instruments acquire images in many narrow, contiguous spectral bands across the spectrum, which allows fine discrimination between different features on the Earth’s surface. These data serve several uses, such as land change detection, urban planning, classification, agriculture, identification, and surveillance [1]. New applications and technological challenges in data analysis arise from these earth observation data sources. Hyperspectral images (HSI) are characterised by high spectral resolution, many bands and abundant information, which makes classification the main challenge.
HSI classification has many applications, including land cover detection, urban development planning, and resource management in turbulent situations [2]. However, the main challenges associated with the HSI classification task are (i) high dimensionality and (ii) an imbalance in the number of training samples. A high degree of data redundancy and dimensionality can impede data analysis. Compared with other widely used supervised techniques such as k-nearest neighbors (KNN), neural networks, and logistic regression, SVM-based classifiers usually perform well with limited training samples [3].
With advances in sensors and imaging systems, the spatial resolution of hyperspectral data is becoming increasingly fine. As a result, classification performance can be significantly enhanced by using spatial information extracted through filtering or segmentation approaches. Multiple kernel learning is commonly used for hyperspectral data classification because it handles heterogeneous spectral and spatial features efficiently [3].
The spectral-spatial classification of hyperspectral data has been the subject of much research in recent years, as HSI classification focuses largely on spatial-spectral approaches [4]. Sparse representations combined with Markov random fields (MRF) have been employed to explore spatial correlation, which helps to improve classification performance [5]. Discriminant analysis has been applied to learn a representative subspace from the spectral-spatial domain, with good classification outcomes [5, 3, 6]. Deep models, which typically contain two or more hidden layers, tend to extract invariant and discriminative features of the input data [7]. Previously [3, 6, 8], the spectral characteristics of HSI were extracted using deep convolutional neural networks (CNN), which produced promising classification performance. For example, a recently introduced method extracts HSI features based on Gabor filtering and a CNN, which leads to performance improvements [9]. Another framework for classifying HSIs combines principal component analysis (PCA), a CNN and logistic regression. However, deep CNN methods require many balanced samples to train their large number of parameters, which is why, in spite of the great progress achieved by deep learning models in HSI classification, they face the problem of over-fitting [10]. In addition, if the training data is not uniformly distributed (the imbalance problem), the baseline classifier (CNN) fails to reach adequate performance because it is biased towards the majority classes. The generative adversarial network (GAN) is a popular method owing to its ability to generate synthetic data. For imbalanced learning, an auxiliary classifier GAN (ACGAN) based method may improve the classification performance, but it suffers from mode collapse due to the majority-class bias [11].
To overcome this, a domain-constrained three-player adversarial game has been suggested, in which a class-dependent mixture of generators, a discriminator and a classifier play an adversarial game to improve the classifier performance [2, 12]. However, this process requires independent class-dependent generators, which force the output to stay within the data domain. The classifier always selects the generated samples that are relevant to it. This method was developed for spectral information rather than spatial information, although generative models based on spatial information have obtained better accuracy [3].
In recent times, to tackle the majority-class bias, the evolutionary annealing genetic GAN (AGGAN) has been introduced for imbalance problems; it uses simulated annealing (SA) to improve the classification performance obtained with generated samples [11]. The selection procedure is based on multiple generative losses, of which the best survive during learning with the help of SA. To tackle the class-imbalance issue, minority oversampling is used to balance the dataset. The effectiveness of AGGAN has been validated on identically distributed image datasets. However, hyperspectral data contain spatial-spectral information. Hence, AGGAN may not be directly applicable to spatial-spectral HSI datasets with adequate classification performance. To handle spatial-spectral HSI datasets, we propose a minority over-sample MFEGAN method. The main contributions of this letter are as follows:
- We propose a minority over-sample MFEGAN for handling imbalance hyperspectral spatial-spectral image classification task.
- The proposed method has been validated on two hyperspectral datasets and compared with state-of-the-art methods including evolving ACGAN.
- To verify the statistical significance of the discriminator embedded classifier, McNemar’s test is carried out on all the methods.
II Background
The supervised learning paradigm broadly comprises two approaches: discriminative and adversarial. Discriminative methods learn discriminative features directly from the data. A generative adversarial network is based on an adversarial game between two players (networks). The generative network learns the parameters of the real data distribution and generates real-like samples from the learned model. The real distribution is learned through either explicit or implicit methods under certain conditional assumptions. Finally, the learned parameters of the generative network can generate samples that mimic the actual distribution.
The GAN is built on this adversarial game principle [13]. The game is played by two networks, a generator (G) and a discriminator (D). The G model tries to capture all the distinctive modes of the actual data distribution, starting from a known distribution (e.g., normal). The D network works as a binary classifier between actual and generated samples: it assigns a fake value to samples coming from the G network and a real value to actual data. The objective function of the unsupervised GAN is defined as,
$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{x\sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1 - D(G(z))\big)\big] \tag{1}$$
where $p_{data}$ is the real data distribution and $z$ is the latent noise variable, which is drawn from a known prior $p_{z}$ and mapped by G to a real-like data distribution.
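The two-player objective of the standard GAN [13] can be checked numerically on a handful of discriminator outputs. The sketch below is only an illustration: the function name `gan_value` and the sample probabilities are hypothetical and do not come from the experiments in this paper.

```python
import math


def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates the two sources obtains a larger value.
confident = gan_value([0.9, 0.8], [0.1, 0.2])
```

D tries to increase this value while G tries to decrease it, so a fully confused discriminator sits at the value $-2\log 2$.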
II-A Proposed Approach
The traditional GAN game can be extended to the conditional GAN (CGAN) by considering class information in the G and D networks: the class information is embedded with the latent prior (z) to generate class-specific samples [14]. Similarly, the D network also considers class information to control the generated space. By incorporating class conditionals, the generated samples are more likely to belong to one of the actual class distributions. The ACGAN was introduced by placing an auxiliary classifier on top of the D network [14]. The D network outputs two probability distributions: one over the source of the data (real/fake) and one over the class labels. Accordingly, the final objective function combines two losses: source loss and classification loss.
II-A1 MFEGAN
The proposed MFEGAN architecture is depicted in Fig. 1. MFEGAN extends the ACGAN family by using an evolutionary strategy to select the generator G. As in ACGAN, the discriminator classifies the real data (X) according to its conditional class information. The samples generated by the evolving generator, however, are assigned to multiple fake classes, whereas in ACGAN the generated samples are assigned to the real classes.
The overall discriminator loss combines two losses, the source loss and the classification loss. It is defined as follows,
(2) |
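As a rough illustration of the multi-fake idea, the sketch below assumes K real classes and a classifier head with 2K log-probability outputs (matching the `y(real) + y(fake)` output size in Table I), and maps a generated sample of class c to index c + K. The function names, the index layout and the unweighted sum of the two losses are assumptions made for illustration, not the exact formulation of the discriminator loss used here.

```python
import math


def multi_fake_target(label, is_fake, num_real_classes):
    """Index of the target class among the 2K classifier outputs (assumed layout)."""
    return label + num_real_classes if is_fake else label


def discriminator_loss(source_prob, class_log_probs, label, is_fake, num_real_classes):
    """Source loss (binary cross-entropy) plus classification loss (NLL), one sample.

    source_prob: D's probability that the sample is real, in (0, 1).
    class_log_probs: list of 2K log-probabilities from the classifier head.
    """
    if is_fake:
        source_loss = -math.log(1.0 - source_prob)
    else:
        source_loss = -math.log(source_prob)
    target = multi_fake_target(label, is_fake, num_real_classes)
    class_loss = -class_log_probs[target]
    return source_loss + class_loss


K = 2  # hypothetical number of real classes
uniform = [math.log(0.25)] * (2 * K)  # classifier with no preference
real_loss = discriminator_loss(0.5, uniform, label=1, is_fake=False, num_real_classes=K)
fake_index = multi_fake_target(1, True, K)
```

With this layout a generated sample of class 1 is pushed towards output index 3 rather than towards the real class 1, which is the behaviour that distinguishes the multi-fake setting from ACGAN in the description above.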
II-A2 Evolutionary strategy and Generator Updates
In the traditional ACGAN, G is updated through gradients from a fixed D. In contrast, the G network here is updated through an evolutionary framework, which is more suitable for improving the diversity seen by the classifier. As a result, the classifier moves towards a better optimum. The evolutionary process has two sub-stages: mutation and evolution (or selection). We follow the mutation process described in [15, 11], which comprises min-max mutation, heuristic mutation and least-square mutation. Each mutation is simply the generator loss of a different GAN method. These mutation losses are defined as follows:
(3) |
(4) |
(5) |
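For orientation, the true/fake part of the three mutations can be sketched as in the evolutionary GAN of [15]. This is a minimal sketch under assumptions: only the source term is shown, the class-conditional term used by MFEGAN is omitted, and the function names are hypothetical.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def minimax_mutation(d_fake):
    """0.5 * E[log(1 - D(G(z)))], the original minimax generator loss."""
    return 0.5 * _mean([math.log(1.0 - p) for p in d_fake])


def heuristic_mutation(d_fake):
    """-0.5 * E[log D(G(z))], the non-saturating generator loss."""
    return -0.5 * _mean([math.log(p) for p in d_fake])


def least_squares_mutation(d_fake):
    """E[(D(G(z)) - 1)^2], the least-squares generator loss."""
    return _mean([(p - 1.0) ** 2 for p in d_fake])


# d_fake holds D(G(z)) for a batch of generated samples (hypothetical values).
weak_batch = [0.5]    # D is undecided about the generated samples
strong_batch = [0.9]  # D is mostly fooled by the generated samples
losses_weak = [f(weak_batch) for f in
               (minimax_mutation, heuristic_mutation, least_squares_mutation)]
losses_strong = [f(strong_batch) for f in
                 (minimax_mutation, heuristic_mutation, least_squares_mutation)]
```

All three losses decrease as the discriminator is fooled more often, but they differ in how strongly they penalise samples that D rejects, which is what gives the evolutionary step distinct candidates to choose from.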
All three mutation losses consist of two parts: true/fake and multi-class classification. In the multi-class classification part, the generator is updated only through the real class score.
Fitness measure: To select the best generator loss during learning, a fitness measure is required on which the population (the candidate losses) is evaluated. The fitness measure is based on two criteria: “generated samples belong to the real class” and “diversity between generated samples and the real class data”. The first criterion is represented by the quality measure as follows,
(6) |
The diversity measure is represented by the gradient flow between real samples and generated samples through the D network and is defined as follows,
(7) |
The final fitness measure is defined as follows,
(8) |

The regulating factor in the fitness measure is chosen as 0.5. Finally, among the three candidate generator losses, the one with the maximum fitness value survives.
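The selection step can be sketched as follows. The additive combination of quality and diversity mirrors the evolutionary GAN of [15] and is an assumption here, as are the function names and the example scores; only the regulating factor of 0.5 and the rule that the maximum fitness survives are taken from the text.

```python
def fitness(quality, diversity, gamma=0.5):
    """Combined fitness: quality plus gamma times diversity (assumed additive form)."""
    return quality + gamma * diversity


def select_survivor(candidates, gamma=0.5):
    """Return the name of the candidate with the highest fitness.

    candidates: dict mapping a mutation name to a (quality, diversity) pair.
    """
    return max(candidates, key=lambda name: fitness(*candidates[name], gamma=gamma))


# Hypothetical scores for the three mutated generators in one evolution step.
scores = {
    "minimax": (0.60, 0.10),
    "heuristic": (0.55, 0.30),
    "least_squares": (0.58, 0.05),
}
survivor = select_survivor(scores)
```

In this toy step the heuristic mutation survives: its quality is slightly lower, but its diversity term lifts the combined fitness above the other two candidates.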
III Experimental Validations
III-A Design of G and D Architectures
The architectures of the G and D networks are listed in Table I. The G network consists of three deconvolutional (DeConv) layers with batch normalisation (BN), and ReLU activations provide the non-linearity. G takes a class-conditional uniform noise prior and maps it to the corresponding real class distribution. Both datasets are normalised within the limits of to . A sigmoid activation function is used in the last layer of the G network. Similarly, the D network uses three convolutional (Conv) layers, each followed by a Leaky ReLU activation function. In addition, the D network gives two outputs: one is associated with true/fake and the other with the class-conditional output. The class-conditional output covers two groups of classes: the first group is associated with the true classes and the second with the fake classes.
Table I: Architectures of the G and D networks.
Networks | Layer | Linear/DeConv/Conv | BN | Stride | Padding | Activation | Dropout |
---|---|---|---|---|---|---|---|
G | 1 | nn.Linear(100 + y, 512) | No | NA | NA | nn.ReLU() | No |
G | 2 | DeConv(512, 256, ) | Yes | 1 | 0 | nn.ReLU() | No |
G | 3 | DeConv(256, 128, 4 × 4) | Yes | 2 | 1 | nn.ReLU() | No |
G | 4 | DeConv(128, 3, 4 × 4) | No | 2 | 1 | nn.Sigmoid() | No |
D | 1 | Conv(3, 128, 4 × 4) | Yes | 2 | 1 | LeakyReLU(0.2) | 0.5 |
D | 2 | Conv(128, 256, 4 × 4) | Yes | 2 | 1 | LeakyReLU(0.2) | 0.5 |
D | 3 | Conv(256, 512, ) | Yes | 1 | 0 | LeakyReLU(0.2) | 0.5 |
D | 4 | nn.Linear(512, 1) | NA | NA | NA | nn.Sigmoid() | No |
D | 4 | nn.Linear(512, y(real) + y(fake)) | NA | NA | NA | nn.LogSoftmax(dim=1) | No |
NA: Not Applicable, SP: Spatial Patches
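The stride and padding columns determine how the spatial size changes from layer to layer. The sketch below applies the standard output-size formulas for transposed and ordinary convolutions to the listed layers. The kernel size of the two stride-1 layers is not shown in the table, so it is passed in as a parameter; the patch size of 8 and kernel size of 2 are purely hypothetical values, and the output of the linear layer is assumed to be reshaped to a 512 × 1 × 1 feature map.

```python
def deconv_out(size, kernel, stride, padding):
    """Output size of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


def conv_out(size, kernel, stride, padding):
    """Output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def generator_sizes(first_kernel):
    """Spatial sizes after each DeConv layer of G, starting from a 1 x 1 map."""
    size, sizes = 1, [1]
    for kernel, stride, padding in [(first_kernel, 1, 0), (4, 2, 1), (4, 2, 1)]:
        size = deconv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


def discriminator_sizes(patch, last_kernel):
    """Spatial sizes after each Conv layer of D, starting from the input patch."""
    size, sizes = patch, [patch]
    for kernel, stride, padding in [(4, 2, 1), (4, 2, 1), (last_kernel, 1, 0)]:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


g_sizes = generator_sizes(first_kernel=2)               # hypothetical kernel
d_sizes = discriminator_sizes(patch=8, last_kernel=2)   # hypothetical patch
```

Under these assumptions each 4 × 4, stride-2 layer doubles (G) or halves (D) the spatial size, so the stride-1 layer has to account for one quarter of the patch size if D is to end at a 1 × 1 map for the linear heads.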
III-B Data Details and Experimental Validations
Two popular hyperspectral datasets are used to check the effectiveness of the proposed method. The Indian Pines (IN) data was collected by the Airborne Visible Infrared Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site in north-western Indiana. IN dataset contains bands of pixels of images in which water-absorbent bands were corrupted. After removal of water-absorbent bands, the final data consists of spectral bands. IN consists of 16 landcover classes (Table II) that have spectral coverages from to with a spatial resolution of . The KSC dataset was collected at the Kennedy Space Center (KSC), Florida, using the NASA airborne AVIRIS instrument. The KSC dataset was collected with spatial resolutions of throughout kms. The captured dataset has a pixel size of with 13 classes (Table II). Initially, the captured data has bands. Due to water absorption, the fraction of bands with low SNR values is discarded from the original spectral bands. Finally, we have used bands with high SNR values for checking the classification performance. Both datasets are used with spatial-spectral features, which can lead to better classification performance [3]. For the spectral features, we take the first three principal components, in which most of the data information is stored.
Each dataset is divided into training and testing parts; the numbers of samples per class are listed in Table II. From the training counts in Table II, the imbalance ratio between the largest and the smallest class is 123 for IN and 50 for KSC. The total numbers of training and testing samples are 512 and 9737 for IN and 461 and 4750 for KSC, respectively. Since the IN dataset is already highly imbalanced, we do not need to drop samples from any class to make it imbalanced. For the IN dataset, we randomly chose the training samples (512 of 10249, about 5%) and used the remaining samples to validate the model. The class distribution in the KSC dataset is almost balanced. To check the effectiveness of the proposed method, we therefore have to make this dataset imbalanced. Therefore, we have taken more samples from the class with a factor of minor class (Swamp).
Table II: Training and testing samples per class for the IN and KSC datasets.
Class | IN Training | IN Testing | KSC Training | KSC Testing |
---|---|---|---|---|
0 | 2 | 44 | 33 | 728 |
1 | 71 | 1357 | 23 | 220 |
2 | 42 | 788 | 24 | 232 |
3 | 12 | 225 | 24 | 228 |
4 | 24 | 459 | 15 | 146 |
5 | 36 | 694 | 22 | 207 |
6 | 1 | 27 | 2 | 103 |
7 | 24 | 454 | 38 | 393 |
8 | 1 | 19 | 51 | 469 |
9 | 49 | 923 | 39 | 365 |
10 | 123 | 2332 | 41 | 378 |
11 | 30 | 563 | 49 | 454 |
12 | 10 | 195 | 100 | 827 |
13 | 63 | 1202 | | |
14 | 19 | 367 | | |
15 | 5 | 88 | | |
Sum | 512 | 9737 | 461 | 4750 |
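The degree of imbalance in the training split can be read directly from the training columns of the table. The short sketch below computes the ratio between the largest and the smallest class; the helper name `imbalance_ratio` is illustrative only.

```python
# Training samples per class, copied from the IN and KSC training columns.
in_train = [2, 71, 42, 12, 24, 36, 1, 24, 1, 49, 123, 30, 10, 63, 19, 5]
ksc_train = [33, 23, 24, 24, 15, 22, 2, 38, 51, 39, 41, 49, 100]


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class."""
    return max(counts) / min(counts)


in_ratio = imbalance_ratio(in_train)
ksc_ratio = imbalance_ratio(ksc_train)
```

The smallest IN classes have a single training sample each, which is why the baseline classifier has so little to learn from for those classes.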
The multi-class classification performance on the spatial-spectral hyperspectral datasets is compared with two standard machine learning methods, KNN and RF, and with deep generative frameworks. For a fair comparison with the proposed method, we take two popular auxiliary-classifier-based generative models, ACGAN [14] and AGGAN [11]. We also compare with the baseline CNN, which is simply the D network working as a multi-class classification network. Since ACGAN and AGGAN rely on minority random oversampling (RO), we also compare MFEGAN with the baseline D trained on randomly oversampled data (RO+CNN). The classification performance is measured with three popular measures in the hyperspectral classification domain: overall accuracy (OA), average accuracy (AA) and the Kappa coefficient [2]. In addition, the per-class classification performance is reported for both datasets. Table III presents the classification performance for the IN dataset. It shows that MFEGAN obtains the best OA, AA and Kappa among the seven methods. This performance is based on spatial resolutions patches from the spatial domain data. Compared with the second-best method, AGGAN, MFEGAN improves OA by 1.20, AA by 4.85 and Kappa by 1.35 points (Table III). Compared with the baseline CNN, the improvement is 8.89 points in OA, 10.88 in AA and 10.02 in Kappa. When dealing with imbalanced classification problems, it is necessary to balance the dataset. Therefore, we randomly over-sample (RO) the minority classes to balance the training set and train the baseline on it. Table III shows that RO+CNN improves on ACGAN by 0.34 points in OA and 0.46 in Kappa, but not in AA.
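Random oversampling of the minority classes can be sketched as below. This is a minimal illustration that duplicates randomly chosen samples until every class matches the largest one; the function name, the fixed seed and the toy labels are assumptions, and the exact oversampling procedure used for RO+CNN may differ.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: class 0 has four samples, classes 1 and 2 have one each.
toy_samples = list(range(6))
toy_labels = [0, 0, 0, 0, 1, 2]
balanced_samples, balanced_labels = random_oversample(toy_samples, toy_labels)
balanced_counts = Counter(balanced_labels)
```

Because the added samples are exact copies, this balances the class counts without adding new information, which is the limitation that generative oversampling tries to address.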
Table III: Classification performance on the IN dataset.
Class | RF | KNN | CNN | RO+CNN | ACGAN | AGGAN | MFEGAN |
---|---|---|---|---|---|---|---|
Alfalfa | 100.00 | 66.67 | 81.81 | 48.10 | 90.47 | 58.73 | 83.78 |
Corn-notill | 55.73 | 76.72 | 93.42 | 92.55 | 91.72 | 89.85 | 92.25 |
Corn-mintill | 86.97 | 80.58 | 85.93 | 86.41 | 84.59 | 89.56 | 89.59 |
Corn | 100.00 | 100.00 | 46.40 | 80.76 | 66.79 | 91.59 | 97.10 |
Grass-pasture | 98.63 | 81.44 | 76.71 | 93.13 | 92.41 | 97.42 | 98.00 |
Grass-trees | 71.59 | 59.05 | 92.89 | 90.84 | 88.74 | 86.61 | 86.55 |
Grass-pasture-mowed | 0.00 | 0.00 | 78.57 | 91.66 | 80.00 | 83.33 | 85.71 |
Hay-windrowed | 96.59 | 96.05 | 85.17 | 98.48 | 98.05 | 97.63 | 99.12 |
Oats | 0.00 | 0.00 | 69.23 | 65.00 | 56.52 | 57.14 | 81.81 |
Soybean-notill | 97.00 | 73.14 | 73.96 | 87.75 | 74.82 | 90.89 | 94.41 |
Soybean-mintil | 56.49 | 63.90 | 92.43 | 97.05 | 97.06 | 96.92 | 94.47 |
Soybean-clean | 96.73 | 93.01 | 62.79 | 74.30 | 82.27 | 83.74 | 88.78 |
Wheat | 99.48 | 91.14 | 85.22 | 94.85 | 90.67 | 94.58 | 94.14 |
Woods | 80.20 | 91.54 | 97.68 | 97.41 | 97.03 | 96.67 | 96.03 |
Buildings-Grass-Trees | 94.01 | 97.66 | 83.16 | 77.60 | 77.07 | 82.78 | 93.83 |
Stone-Steel-Towers | 100.00 | 96.00 | 80.64 | 80.23 | 93.33 | 85.18 | 84.70 |
OA | 69.61 | 74.70 | 84.33 | 90.69 | 90.35 | 92.02 | 93.22 |
AA | 77.09 | 72.96 | 80.38 | 84.76 | 86.72 | 86.41 | 91.26 |
Kappa | 64.04 | 70.57 | 82.24 | 89.43 | 88.97 | 90.91 | 92.26 |
Time(epoch/sec) | 1 | 1 | 0.98 | 1.16 | 2.55 | 8.03 | 6.96 |
Best obtained results marked as bold
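The OA, AA and Kappa rows summarise a confusion matrix. The sketch below uses the usual definitions, with AA taken as the mean per-class recall; this is an assumption about how the table was computed, and the 2 × 2 matrix is a made-up example rather than data from the experiments.

```python
def classification_scores(confusion):
    """Return (OA, AA, Kappa) for confusion[i][j] = true class i predicted as j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    # Overall accuracy: fraction of all samples on the diagonal.
    oa = sum(confusion[i][i] for i in range(n)) / total
    # Average accuracy: mean of the per-class recalls.
    aa = sum(confusion[i][i] / row_sums[i] for i in range(n)) / n
    # Kappa: agreement corrected for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - expected) / (1.0 - expected)
    return oa, aa, kappa


toy_confusion = [[8, 2],
                 [1, 9]]
oa, aa, kappa = classification_scores(toy_confusion)
```

OA is dominated by the large classes, whereas AA weights every class equally, which is why the two can move in different directions on imbalanced data.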
To obtain a better understanding of how the spatial resolution of the patches affects overall performance, we consider a set of three different spatial resolutions and compare with AGGAN. The comparison is depicted in Fig. 2, in which all three metrics are considered. Fig. 2 shows that the proposed method outperforms AGGAN in all the test cases; in particular, the -spatial resolution patches obtained the best performance among the three patches.

Fig. 3 shows the classification results on the complete datasets using RF, KNN, CNN, RO+CNN, ACGAN, AGGAN, and MFEGAN. The figure shows that MFEGAN performs well on the complete datasets.
Furthermore, we also use the KSC dataset to check the effectiveness of MFEGAN. Table IV reports the performance in terms of the three measures. As for the IN dataset, the proposed method obtains the best overall results among the seven methods. It is to be noted that the performance measures in Table IV are for spatial resolution patches since, with this spatial size, we obtained the best performance for the IN dataset. Compared with AGGAN, MFEGAN improves OA by 1.36, AA by 1.99 and Kappa by 1.52 points (Table IV). AGGAN, in turn, performs best among the remaining state-of-the-art methods. The baseline CNN gives the worst performance of all the methods because of the highly imbalanced data distribution in the KSC dataset, and it does not reach the performance of standard machine learning methods such as RF and KNN. The reason could be over-fitting of the CNN parameters. Once the minority classes are over-sampled (RO+CNN), the performance improves considerably: RO+CNN obtains better results than ACGAN, but not than AGGAN. Consistently across both datasets, MFEGAN obtains the best overall results, followed by AGGAN. In both cases, AGGAN also takes more time than MFEGAN because of its SA block.

Table IV: Classification performance on the KSC dataset.
Classes | RF | KNN | CNN | RO+CNN | ACGAN | AGGAN | MFEGAN |
---|---|---|---|---|---|---|---|
Scrub | 75.83 | 75.40 | 76.02 | 91.83 | 93.52 | 93.89 | 96.30 |
Willow swamp | 90.79 | 90.80 | 8.63 | 89.68 | 93.33 | 92.76 | 98.59 |
CP hammock | 83.98 | 77.98 | 79.20 | 96.17 | 93.30 | 92.55 | 99.53 |
Slash pine | 87.38 | 98.90 | 81.48 | 79.77 | 85.12 | 82.60 | 85.83 |
Oak/Broadleaf | 99.20 | 100.00 | 79.38 | 95.94 | 93.37 | 99.23 | 100.00 |
Hardwood | 94.70 | 99.33 | 46.71 | 92.27 | 72.76 | 88.55 | 99.51 |
Swamp | 0.00 | 0.00 | 80.00 | 95.87 | 100.00 | 95.09 | 95.29 |
Graminoid marsh | 96.72 | 98.81 | 61.68 | 92.57 | 94.69 | 98.83 | 96.06 |
Spartina marsh | 99.15 | 51.70 | 87.10 | 98.06 | 90.23 | 90.32 | 89.29 |
Cattail marsh | 95.25 | 81.40 | 80.63 | 97.58 | 95.77 | 99.45 | 98.38 |
Salt marsh | 100.00 | 100.00 | 96.89 | 100.00 | 100.00 | 100.00 | 100.00 |
Mud flats | 99.78 | 100.00 | 42.97 | 98.21 | 99.52 | 100.00 | 100.00 |
Water | 100.00 | 97.63 | 4.76 | 99.63 | 96.61 | 97.64 | 98.10 |
OA | 92.54 | 82.63 | 58.80 | 95.15 | 93.49 | 95.22 | 96.58 |
AA | 86.37 | 82.46 | 63.50 | 94.43 | 92.94 | 94.69 | 96.68 |
Kappa | 91.65 | 80.47 | 55.30 | 94.60 | 92.75 | 94.67 | 96.19 |
Time(epoch/sec) | 1 | 1 | 0.98 | 1.24 | 5.13 | 17.56 | 15.71 |
Best obtained results marked as bold
MFEGAN | vs RF | vs KNN | vs CNN | vs RO+CNN | vs ACGAN | vs AGGAN |
---|---|---|---|---|---|---|
IN | 38.20 | 32.25 | 18.49 | 6.21 | 6.97 | 3.08 |
KSC | 12.13 | 30.27 | 55.95 | 4.91 | 9.71 | 4.69 |
The statistical significance of the results on both datasets is assessed with McNemar’s test between the proposed method and each of the state-of-the-art methods. The results are provided in Table V. Since the value of (), it indicates better statistical significance compared to the other methods.
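For readers unfamiliar with the test, a common form of McNemar's statistic in classifier comparisons is the z score computed from the two disagreement counts. The sketch below assumes this form; whether Table V reports exactly this statistic is an assumption, and the counts used are hypothetical. The value 1.96 is the two-sided 5% critical value of the standard normal distribution.

```python
import math


def mcnemar_z(f12, f21):
    """McNemar z statistic from the two disagreement counts.

    f12: test samples classified correctly by method 1 but wrongly by method 2.
    f21: test samples classified wrongly by method 1 but correctly by method 2.
    """
    return (f12 - f21) / math.sqrt(f12 + f21)


# Hypothetical disagreement counts between two classifiers on one test set.
z = mcnemar_z(f12=60, f21=24)
significant_at_5_percent = abs(z) > 1.96
```

Only the samples on which the two classifiers disagree enter the statistic, so two methods with similar overall accuracy can still differ significantly if their errors fall on different samples.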
IV Conclusion
We have proposed a multi-fake evolutionary auxiliary-classifier-based GAN to improve the classification performance on the IN and KSC datasets. For a fair comparison with the proposed method, similar methods such as ACGAN, AGGAN and an oversampled baseline have been considered. The experimental validations show that the proposed method outperforms all these methods when dealing with imbalanced datasets. Our future work will consider a mixture of two spatial-spectral generators to learn better feature representations for multi-class hyperspectral image classification.
References
- [1] Y. Guo, G. Ding, L. Liu, J. Han, and L. Shao, “Learning to hash with optimized anchor embedding for scalable retrieval,” IEEE Transactions on Image Processing, vol. 26, no. 3, pp. 1344–1354, 2017.
- [2] T. Dam, S. G. Anavatti, and H. A. Abbass, “Mixture of spectral generative adversarial networks for imbalanced hyperspectral image classification,” IEEE Geoscience and Remote Sensing Letters, 2020.
- [3] P. Ghamisi, J. Plaza, Y. Chen, J. Li, and A. J. Plaza, “Advanced spectral classifiers for hyperspectral images: A review,” IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 1, pp. 8–32, 2017.
- [4] I. Makki, R. Younes, C. Francis, T. Bianchi, and M. Zucchetti, “A survey of landmine detection using hyperspectral imaging,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 124, pp. 40–53, 2017.
- [5] Y. Yuan, J. Lin, and Q. Wang, “Hyperspectral image classification via multitask joint sparse representation and stepwise MRF optimization,” IEEE Transactions on Cybernetics, vol. 46, no. 12, pp. 2966–2977, 2015.
- [6] J. Feng, H. Yu, L. Wang, X. Cao, X. Zhang, and L. Jiao, “Classification of hyperspectral images based on multiclass spatial–spectral generative adversarial networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 8, pp. 5329–5343, 2019.
- [7] Z. Zhong, J. Li, D. A. Clausi, and A. Wong, “Generative adversarial networks and conditional random fields for hyperspectral image classification,” IEEE Transactions on Cybernetics, 2019.
- [8] W. Li, G. Wu, F. Zhang, and Q. Du, “Hyperspectral image classification using deep pixel-pair features,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 844–853, 2016.
- [9] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, “Deep convolutional neural networks for hyperspectral image classification,” Journal of Sensors, vol. 2015, 2015.
- [10] Y. Chen, L. Zhu, P. Ghamisi, X. Jia, G. Li, and L. Tang, “Hyperspectral images classification with Gabor filtering and convolutional neural network,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 12, pp. 2355–2359, 2017.
- [11] J. Hao, C. Wang, H. Zhang, and G. Yang, “Annealing genetic GAN for minority oversampling,” arXiv preprint arXiv:2008.01967, 2020.
- [12] T. Dam, M. M. Ferdaus, S. G. Anavatti, S. Jayavelu, and H. A. Abbass, “Does adversarial oversampling help us?” arXiv preprint arXiv:2108.10697, 2021.
- [13] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
- [14] A. Odena, C. Olah, and J. Shlens, “Conditional image synthesis with auxiliary classifier GANs,” in International Conference on Machine Learning. PMLR, 2017, pp. 2642–2651.
- [15] C. Wang, C. Xu, X. Yao, and D. Tao, “Evolutionary generative adversarial networks,” IEEE Transactions on Evolutionary Computation, vol. 23, no. 6, pp. 921–934, 2019.