
Exhaustive Exploitation of Nature-inspired Computation for Cancer Screening in an Ensemble Manner

Xubin Wang, Yunhe Wang, Zhiqiang Ma, Ka-Chun Wong, and Xiangtao Li

Xubin Wang and Xiangtao Li are with the School of Artificial Intelligence, Jilin University, Changchun, Jilin 130012, China (e-mail: [email protected]; [email protected]). Corresponding authors: [email protected]; [email protected]. Yunhe Wang is with the School of Artificial Intelligence, Hebei University of Technology, Tianjin, China (e-mail: [email protected]). Zhiqiang Ma is with the School of Information Science and Technology, Northeast Normal University, Changchun, Jilin 130117, China (e-mail: [email protected]). Ka-Chun Wong is with the Department of Computer Science, City University of Hong Kong, Hong Kong SAR (e-mail: [email protected]).
Abstract

Accurate screening of cancer types is crucial for effective cancer detection and precise treatment selection. However, the association between gene expression profiles and tumors is often limited to a small number of biomarker genes. While computational methods using nature-inspired algorithms have shown promise in selecting predictive genes, existing techniques are limited by inefficient search and poor generalization across diverse datasets. This study presents a framework termed Evolutionary Optimized Diverse Ensemble Learning (EODE) to improve ensemble learning for cancer classification from gene expression data. The EODE methodology combines an intelligent grey wolf optimization algorithm for selective feature space reduction, guided random injection modeling for ensemble diversity enhancement, and subset model optimization for synergistic classifier combinations. Extensive experiments were conducted across 35 gene expression benchmark datasets encompassing varied cancer types. Results demonstrated that EODE obtained significantly improved screening accuracy over individual and conventionally aggregated models. The integrated optimization of advanced feature selection, directed specialized modeling, and cooperative classifier ensembles helps address key challenges in current nature-inspired approaches. This provides an effective framework for robust and generalized ensemble learning with gene expression biomarkers. The EODE source code is openly available on GitHub at https://github.com/wangxb96/EODE.

Index Terms:
Feature selection, Clustering, Ensemble learning, Grey wolf optimizer, Classification

1 Introduction

Cancer has become one of the leading causes of mortality worldwide, resulting in over 10 million deaths in 2020 alone [1]. The heterogeneity and complexity of various cancer types pose significant challenges for timely and accurate diagnosis, prognosis, and treatment planning [2, 3]. Precision oncology aims to overcome these difficulties by leveraging molecular biomarkers and omics data to guide personalized therapeutic decisions [4]. In particular, analysis of cancer gene expression data enables identification of discriminative genes and pathways involved in pathogenesis, which can inform diagnostic tests, prognostic indicators, and drug targets [5, 6].

However, several analytical difficulties impose barriers to identifying robust molecular biomarkers from gene expression data. Small sample sizes coupled with extremely high dimensionality and sparsity of the data make computational analysis statistically underpowered [7]. Technical noise, batch effects, tumor heterogeneity, and variability between patients also confound analyses [8, 9]. Effective and robust computational methods are therefore urgently needed to overcome these challenges and accurately detect differentially expressed genes from such complex high-dimensional datasets across diverse cancer types. This can support development of gene expression-based biomarkers for precision oncology applications.

A variety of computational approaches have been applied for cancer gene expression analysis and biomarker identification, including machine learning, deep learning, and nature-inspired optimization algorithms [10, 11, 12]. In particular, swarm intelligence and evolutionary algorithms like particle swarm optimization (PSO) [13], ant colony optimization (ACO) [14], genetic algorithms [15], and enhanced optimizer variants [16, 17, 18, 19] have shown promise. While these methods have achieved promising results, further improvements in accuracy, robustness, and generalization ability are still possible. A key limitation is that most methods rely on a single learner algorithm, which makes it difficult to determine the universally optimal learner across diverse cancer types and datasets. Different algorithms have distinct strengths and weaknesses, so their performance varies. Relying on just one also reduces robustness.

Ensemble learning methods which combine multiple diverse base learner models can help address these pitfalls [20]. Strategies like bagging [21] and boosting [22] train multiple base models on randomized or reweighted data versions, then aggregate predictions to reduce variance and bias. Such ensembles have proven effective for tasks ranging from cancer subtype classification [23, 24] to drug response modeling [25]. However, naively combining all base learner models can limit diversity, leading to redundant representations and suboptimal performance [26]. Recent studies have explored intelligent optimizer-guided selection of ensemble subsets to promote specialization and synergy among members [27, 28, 29, 30, 31]. For instance, genetic algorithms have been applied to search the space of model combinations, selecting only classifiers that maximize validation accuracy through cooperative interactions [32]. While showing promise, these approaches generally utilize the full, high-dimensional feature space, which can retain irrelevant variables that confuse models and constrain diversity. Advanced feature selection is needed to derive maximally informative biomarker subsets tailored for ensemble learning [33]. Furthermore, diversity enhancement techniques like bagging and boosting are insufficient to fully overcome representation redundancies during model training [34]. Novel forms of controlled randomness injection could better promote specialization by guiding different models to focus on distinct explanatory data facets [35, 27]. Overall there remains great opportunity to advance ensemble classifier performance by integrating intelligent feature selection, guided diversity induction, and metaheuristic optimization of cooperative model combinations [26, 36]. This can further evolve the state-of-the-art in ensemble methods for precision medicine applications.

In this work, we propose a novel nature-inspired feature selection algorithm, optimized ensemble classifier, and diversity-enhancing ensemble strategy by integrating the grey wolf optimizer (GWO). Our approach, called Evolutionary Optimized Diverse Ensemble learning (EODE), synergistically combines GWO-based wrapper feature selection, diversity injection via randomized model training, and evolutionary optimization for constructing optimal ensemble classifiers. Specifically, GWO efficiently searches the high-dimensional gene expression space to identify an informative subset of discriminative features for cancer diagnosis. Multiple diverse base classifiers (e.g., SVM, KNN) are trained on these selected features while introducing randomness to increase diversity. Finally, GWO optimizes the selection and integration of ensemble members to maximize performance on validation data. EODE enhances generalization ability by leveraging GWO's feature selection, controlled randomness injection, and metaheuristic ensemble optimization. We evaluate EODE on cancer gene expression datasets in terms of subtype classification accuracy, outcome prediction, and the size of the selected feature subset. Results demonstrate that EODE significantly improves accuracy and robustness over 23 state-of-the-art methods on 35 cancer gene expression datasets. The integrated strategy advances biomarker discovery and precision oncology by evolving high-performance diverse ensemble classifiers. The main steps of the EODE approach are as follows:

  1. Base classifiers: The diversity among the base classifiers is crucial to the effectiveness of the ensemble. The base classifiers can be any suitable classification algorithms, such as decision trees, support vector machines, or neural networks. In this study, six base classifiers including Discriminant Analysis (DISCR), Decision Tree (DT), K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN), Support Vector Machine (SVM), and Naive Bayes (NB) are used.

  2. Classifier selection: To mitigate the high computational cost of using ensemble methods in the feature selection training process, all base classifiers are initially trained with five-fold cross-validation using the original training data. The best-performing base classifier is then selected to participate in the feature selection stage. This helps ensure that a suitable learner is used for each dataset.

  3. Feature selection: GWO is employed to search for an optimal subset of genes that are most relevant to cancer diagnosis. The fitness function evaluates the quality of each feature subset based on classification performance and the size of the feature subset. GWO optimizes the feature subset by iteratively updating the positions of grey wolves based on their fitness values.

  4. Ensemble diversity enhancement: To increase ensemble diversity, techniques such as bagging, boosting, or the random subspace method can be employed. Here, we generate multiple random subspaces through K-means clustering to increase the diversity of the ensemble. We use these data clusters to train base classifiers, resulting in a pool of models.

  5. Model pool optimization: Directly fusing all models in the pool can lower inference efficiency, and the presence of low-quality models may degrade the overall performance. Therefore, before final model evaluation, we optimize the model pool. We first perform a pre-optimization step, discarding models that perform below average on the validation set. The remaining models are then further optimized using the GWO algorithm to select a near-optimal combination of models.

  6. Evaluation and validation: The performance of the EODE model is evaluated using appropriate metrics such as accuracy, average performance, and the size of the feature subset. The predictions of the selected models are combined using plurality voting. The combined predictions provide the final classification result. Moreover, cross-validation and independent validation datasets are used to assess the generalization ability of the model.

2 Methods

2.1 Methodology Overview of EODE

In this study, we present a novel nature-inspired method called EODE for rapid identification of biomarker genes for multiple cancer types in multiple cancer gene expression datasets. A schematic overview of the algorithm is provided in Figure 1. The original input gene expression data $\mathcal{D}_{or}=\{(x_{1},y_{1}),...,(x_{n},y_{n})\}$ is considered, where $x_{i}=(x_{i,1},x_{i,2},...,x_{i,dim})$ represents a sample with $dim$ genes, $y$ belongs to the set $\{1,2,...,c\}$ indicating the consensus molecular subtypes, and $n$ is the total number of samples.

Figure 1: Overview of the proposed EODE algorithm: In the GWO feature selection phase, the original cancer gene expression training data is utilized to train all base classifiers, and the classifier with the highest performance is selected as the evaluation classifier. The processed data is then optimized to construct an ensemble model. Specifically, the training data is incrementally clustered using the K-means method to form subspace clusters. These clusters are used to train individual base classifiers, which are then added to the model pool. Any classifiers in the pool with below-average performance are filtered out. Next, the GWO is applied to optimize the classifier pool and determine the best possible ensemble combination. Finally, the optimized ensemble model is evaluated on the independent test dataset using a plurality voting strategy to generate the final cancer type predictions.

In the feature selection step, we employ the GWO to extract relevant biomarker genes after training our model on the training gene expression matrix $\mathcal{D}_{tr}$. Each base classifier from the pool $\mathcal{B}$ (including Discriminant Analysis (DISCR), Decision Tree (DT), K-Nearest Neighbor (KNN), Artificial Neural Networks (ANN), Support Vector Machine (SVM), and Naive Bayes (NB)) is initially trained using the input data. The best-performing classifier is then chosen as the evaluation classifier for feature selection.

The processed data is subsequently utilized to train and optimize a diverse ensemble model. Specifically, the data undergoes five-fold cross-validation to construct the final model $\Psi$. Initially, the data is partitioned into progressive subspaces using the K-means method to form clusters. These clusters are then utilized to train base classifiers, which are subsequently incorporated into the model pool. Models in the pool with below-average performance are filtered out. After that, the GWO approach is applied to optimize the model pool and identify the best possible combination. Finally, the model $\Psi$ is evaluated on the test data using a plurality voting strategy. The overall framework of EODE is summarized in Algorithm 1.

Algorithm 1 Pseudo Code of EODE Algorithm
1: Training Data: $\mathcal{D}_{tr}=\{(x_{tr,1},y_{tr,1}),...,(x_{tr,n},y_{tr,n})\}$, $x\in R^{d}$, $y\in\{1,2,...,c\}$, Test Data: $\mathcal{D}_{te}=\{(x_{te,1},y_{te,1}),...,(x_{te,n},y_{te,n})\}$, a set of base classifiers $\mathcal{B}$, upper bound of clustering $K$, population of GWO $\vec{X}$, the feature selection function $f_{1}$, the classifier optimization function $f_{2}$
2: Use training data $\mathcal{D}_{tr}$ to train each base classifier in $\mathcal{B}$
3: The classifier $b$ with the best performance is selected for feature selection
4: Initialize a population of $|\vec{X}|$ individuals
5: while $t<$ max iterations $T$ do
6:     $\vec{X}\leftarrow$ use Algorithm 2 to optimize $f_{1}(\vec{X})$
7:     $\vec{X_{i}}\leftarrow$ best individual
8:     $t$++
9: end while
10: $bf\leftarrow$ best features selected by wolf $\vec{X_{i}}$
11: $fnum\leftarrow$ the number of features selected by wolf $\vec{X_{i}}$
12: $\mathcal{D}_{tr}\leftarrow\mathcal{D}_{tr}(bf)$
13: $\mathcal{D}_{te}\leftarrow\mathcal{D}_{te}(bf)$
14: $\mathcal{D}_{tr}=\mathcal{D}_{tr,1}\cup...\cup\mathcal{D}_{tr,5}$, $\mathcal{D}_{tr,i}\cap\mathcal{D}_{tr,j}=\emptyset$ $(i\neq j)$
15: for each $\mathcal{D}_{tr,i}$ in $\mathcal{D}_{tr}$ do
16:     $\mathcal{D}_{tr}^{-i}=\mathcal{D}_{tr}-\mathcal{D}_{tr,i}$
17:     for $k=1\rightarrow K$ do
18:         $C^{S}\leftarrow$ partition $\mathcal{D}_{tr}^{-i}$ into $k$ clusters
19:         $S=S+1$
20:     end for
21:     $\mathcal{MP}\leftarrow$ models obtained by training each base classifier in $\mathcal{B}$ on each cluster in $C^{S}$
22:     for each $mp_{i}$ in $\mathcal{MP}$ do
23:         $Acc(i)\leftarrow$ calculate each $mp_{i}$'s validation accuracy on $\mathcal{D}_{tr,i}$
24:     end for
25:     $\mathcal{MP}\leftarrow$ ($mp_{i}$ if $Acc(mp_{i})>$ mean($Acc$))
26:     Initialize a population of $|\vec{X}|$ individuals
27:     while $t<$ max iterations $T$ do
28:         $\vec{X}\leftarrow$ use Algorithm 2 to optimize $f_{2}(\vec{X})$
29:         $\vec{X_{i}}\leftarrow$ best individual
30:         $t$++
31:     end while
32:     $\psi\leftarrow$ best models in $\mathcal{MP}$ selected by wolf $\vec{X_{i}}$
33:     Optimized classifier $\Psi\leftarrow\Psi+\psi$
34: end for
35: $testAcc\leftarrow$ classify samples of $\mathcal{D}_{te}$ by $\Psi$
36: Output: The optimized ensemble classifier $\Psi$, the number of selected features $fnum$, and the test accuracy $testAcc$

2.2 Nature-inspired Feature Selection

Consider a training cancer gene expression dataset $\mathcal{D}_{tr}=\{(x_{1},y_{1}),...,(x_{n},y_{n})\}$, where $x_{i}=(x_{i,1},x_{i,2},...,x_{i,dim})$ represents the feature vector and $dim$ denotes the number of features, $y$ belongs to the set $\{1,2,...,c\}$ representing the class, and $n$ is the number of samples. It is important to note that the high-dimensional nature of the gene expression data may include many irrelevant genes, which can negatively impact identification accuracy while increasing computational time [7]. Therefore, performing feature selection is crucial to preprocess the data effectively.

The Grey Wolf Optimizer (GWO), initially proposed by Mirjalili [37], is a swarm intelligence algorithm inspired by the social hierarchy and hunting behavior of grey wolves in nature. GWO offers advantages such as good convergence, minimal parameter tuning, and ease of implementation [38]. The core concept of GWO revolves around three primary predation behaviors: encircling prey, hunting, and attacking prey, which are performed based on the social hierarchy among the wolves. The social hierarchy in GWO consists of four levels: $\alpha$, $\beta$, $\delta$, and $\omega$, with $\alpha$ being the dominant wolf, followed by $\beta$ and $\delta$, while the remaining wolves are labeled as $\omega$. Wolves at higher ranks exert dominance over those at lower ranks, and $\alpha$, $\beta$, and $\delta$ play key roles in the algorithm, with $\alpha$ being the wolf king and $\beta$ and $\delta$ serving as potential successors. The $\alpha$ wolf represents the fittest solution and guides the pack towards promising search areas. The second and third best fit solutions are modeled as the $\beta$ and $\delta$ wolves, respectively. The $\omega$ wolves represent the remaining weaker candidate solutions that follow the guidance of the $\alpha$, $\beta$, and $\delta$ wolves. During optimization, the candidate solutions iteratively update their positions towards the best three solutions until convergence upon the global optimal value. Specifically, a schematic representation of GWO is depicted in Fig. 2.

Building upon these foundations, we propose a nature-inspired feature selection method based on GWO, which comprises six essential components: classifier selection, population initialization, encircling prey phase, hunting phase, attacking phase, and feature selection objective function.

2.2.1 Classifier Selection

To evaluate the feature selection results, we consider six base classifiers in a classifier pool $\mathcal{B}$: Discriminant Analysis (DISCR), Decision Tree (DT), K-Nearest Neighbor (KNN), Artificial Neural Networks (ANNs), Support Vector Machine (SVM), and Naive Bayes (NB). However, incorporating all these classifiers into the ensemble method during the feature selection phase would be computationally expensive. Therefore, we adopt a pre-training approach to select the best-performing classifier from the pool $\mathcal{B}$. The cancer gene expression data $\mathcal{D}$ is subjected to five-fold cross-validation on each base classifier, and the classifier with the highest performance is chosen as the evaluation classifier for the feature selection phase. This approach allows us to efficiently select the most suitable classifier for the subsequent feature selection process.
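As a concrete illustration, the classifier-selection step can be sketched with scikit-learn analogues of the six base learners; the released implementation is in MATLAB, so the estimators and settings below are indicative rather than exact:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

def select_evaluation_classifier(X_train, y_train):
    """Pre-train all six base classifiers with five-fold cross-validation
    and return the best-performing one as the evaluation classifier."""
    pool = {
        "DISCR": LinearDiscriminantAnalysis(),
        "DT": DecisionTreeClassifier(),
        "KNN": KNeighborsClassifier(n_neighbors=3),
        "ANN": MLPClassifier(max_iter=500),
        "SVM": SVC(kernel="rbf"),
        "NB": GaussianNB(),
    }
    scores = {name: cross_val_score(clf, X_train, y_train, cv=5).mean()
              for name, clf in pool.items()}
    best = max(scores, key=scores.get)
    return best, pool[best]
```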

2.2.2 Population Initialization

In the beginning, the population $\vec{X}$ is randomly created and represented as real numbers. Each individual, denoted as $\vec{X_{i}}$, is a set of genes: $\vec{X_{i}}=\{g_{1},g_{2},...,g_{dim}\}$, where $g_{dim}$ represents the $dim$th gene and $dim$ is the total number of genes.

To convert these real numbers into a binary form, we use a threshold value $\theta$. If a feature value ($g_{n}$) is greater than or equal to $\theta$, it is set to 1, indicating that the corresponding feature is selected. On the other hand, if $g_{n}$ is less than $\theta$, it is set to 0, indicating that the feature is not selected. The conversion is defined as follows:

$g_{n}=\begin{dcases}1,&g_{n}\geq\theta\\ 0,&g_{n}<\theta\end{dcases}$ (1)

After that, the position of each individual is represented by a binary (0/1) string.
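A minimal sketch of this binarization step (the function name is ours; $\theta=0.5$ follows the parameter settings in Section 3.3):

```python
import numpy as np

def binarize_position(position, theta=0.5):
    """Eq. (1): genes whose position value is >= theta are selected (1),
    all others are discarded (0)."""
    return (np.asarray(position) >= theta).astype(int)

# Example: a wolf encoding five genes
print(binarize_position([0.71, 0.12, 0.50, 0.33, 0.98]))  # [1 0 1 0 1]
```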

2.2.3 Encircling Prey Phase

The "encircling prey" behavior is a strategy employed by the grey wolf pack to search for feature subsets. This behavior is mathematically modeled to simulate how the grey wolf gradually approaches its prey and surrounds it. The distance $\vec{D}$ between the grey wolf and the prey is determined by the equation:

$\vec{D}=|2\cdot r_{2}\cdot\vec{X}_{p}(t)-\vec{X}_{i}(t)|$, (2)

where $\vec{D}$ represents the distance between them. During the search process, the current iteration is denoted by $t$, and $\vec{X}_{p}(t)$ and $\vec{X}_{i}(t)$ represent the position vectors of the prey and the grey wolf, respectively.

To update the position of the grey wolf, we utilize the formula $\vec{X}_{i}(t+1)=\vec{X}_{p}(t)-(2\vec{a}\cdot r_{1}-\vec{a})\cdot\vec{D}$. Here, $\vec{a}$ is the convergence factor that decreases linearly from 2 to 0 as the iterations progress. The convergence factor is calculated as $\vec{a}=2-2t/max_{t}$, where $t$ represents the current iteration, and $max_{t}$ is the maximum number of iterations defined for the search process. Additionally, $r_{1}$ and $r_{2}$ are random numbers between 0 and 1.

By applying this position update formula, the grey wolf adjusts its position towards the prey. The term $(2\vec{a}\cdot r_{1}-\vec{a})$ determines the magnitude and direction of the movement, while the distance $\vec{D}$ guides the grey wolf's movement in narrowing the gap with the prey. The process continues iteratively until the desired maximum number of iterations $max_{t}$ is reached. Ultimately, the grey wolf is expected to encircle the prey, indicating the discovery of a promising feature subset.

Algorithm 2 Pseudo Code of Grey Wolf Optimizer (GWO)
1: Initialize a population $\vec{X}$ of wolves randomly within the solution space
2: Evaluate the fitness of each wolf $\vec{X_{i}}$ using a fitness function $f$
3: Set the initial values for $\alpha$, $\beta$, and $\delta$ as the wolves with the highest, second highest, and third highest fitness, respectively
4: while $t<$ max iterations $T$ do
5:     for each wolf $\vec{X_{i}}$ in the population do
6:         Update the position of the wolf based on the positions of $\alpha$, $\beta$, and $\delta$ using the following formulas:
7:         $\vec{D}_{\alpha}=|\vec{C}_{1}\cdot\vec{X}_{\alpha}-\vec{X}_{i}|$ // distance from $\alpha$
8:         $\vec{D}_{\beta}=|\vec{C}_{2}\cdot\vec{X}_{\beta}-\vec{X}_{i}|$ // distance from $\beta$
9:         $\vec{D}_{\delta}=|\vec{C}_{3}\cdot\vec{X}_{\delta}-\vec{X}_{i}|$ // distance from $\delta$
10:        $\vec{X}^{\prime}=\vec{X}_{\alpha}-A_{1}\cdot\vec{D}_{\alpha}$ // encircling $\alpha$
11:        $\vec{Y}^{\prime}=\vec{X}_{\beta}-A_{2}\cdot\vec{D}_{\beta}$ // encircling $\beta$
12:        $\vec{Z}^{\prime}=\vec{X}_{\delta}-A_{3}\cdot\vec{D}_{\delta}$ // encircling $\delta$
13:        Update the position of the wolf using:
14:        $\vec{X_{i}}(t+1)=(\vec{X}^{\prime}+\vec{Y}^{\prime}+\vec{Z}^{\prime})/3$
15:        Apply boundary constraints to ensure the new position is within the solution space
16:     end for
17:     Update the fitness of each wolf $\vec{X_{i}}$ using the fitness function $f$
18:     Update $\alpha$, $\beta$, and $\delta$ based on the updated fitness values
19:     $t$++
20: end while
21: Output: The position of the $\alpha$ wolf represents the best solution found by the GWO algorithm
Figure 2: The GWO algorithm is illustrated in a schematic representation, highlighting the process of updating the positions of the wolves. Initially, the positions of the wolves are randomly initialized within the solution space. The fitness of each wolf is evaluated based on a fitness function. In each iteration, the positions of the wolves are updated using mathematical formulas that consider the social hierarchy, with the $\alpha$ wolf having the greatest influence. The update process involves attracting other wolves towards the positions of the $\alpha$, $\beta$, and $\delta$ wolves. This iterative position updating continues until a termination condition is met. Ultimately, the position of the $\alpha$ wolf represents the best solution found by the GWO algorithm.

2.2.4 Hunting Phase

Grey wolves possess the ability to identify the general location of their prey and work together to surround it. However, in many unknown situations, they may not have precise knowledge of the exact location of the target. In our study, we simulate the behavior of grey wolves by introducing three key individuals: $\alpha$, $\beta$, and $\delta$. These individuals help guide the entire wolf pack in surrounding the prey and searching for the optimal solution.

To track the position of the prey, each individual grey wolf calculates its distance to the prey using the following equations:

$\vec{D}_{\alpha}=|\vec{C}_{1}\cdot\vec{X}_{\alpha}-\vec{X}_{i}|$, (3)
$\vec{D}_{\beta}=|\vec{C}_{2}\cdot\vec{X}_{\beta}-\vec{X}_{i}|$, (4)
$\vec{D}_{\delta}=|\vec{C}_{3}\cdot\vec{X}_{\delta}-\vec{X}_{i}|$. (5)

Here, $\vec{D}_{\alpha}$, $\vec{D}_{\beta}$, and $\vec{D}_{\delta}$ represent the distances between the grey wolves $\alpha$, $\beta$, $\delta$ and the prey, respectively. $\vec{X}_{\alpha}$, $\vec{X}_{\beta}$, and $\vec{X}_{\delta}$ denote the positions of $\alpha$, $\beta$, and $\delta$, while $\vec{X}_{i}$ represents the current position of the grey wolf. Additionally, $\vec{C}_{1}$, $\vec{C}_{2}$, and $\vec{C}_{3}$ are random vectors used to calculate these distances.

Each grey wolf updates its position based on these distance calculations:

$\vec{X}^{\prime}=\vec{X}_{\alpha}-A_{1}\cdot\vec{D}_{\alpha}$, (6)
$\vec{Y}^{\prime}=\vec{X}_{\beta}-A_{2}\cdot\vec{D}_{\beta}$, (7)
$\vec{Z}^{\prime}=\vec{X}_{\delta}-A_{3}\cdot\vec{D}_{\delta}$. (8)

Here, $\vec{X}^{\prime}$, $\vec{Y}^{\prime}$, and $\vec{Z}^{\prime}$ represent the new positions of the grey wolves moving towards $\alpha$, $\beta$, and $\delta$, respectively. The coefficients $A_{1}$, $A_{2}$, and $A_{3}$ control the magnitude of the movement towards the prey.

Finally, the position of the grey wolf at the next time step, $\vec{X_{i}}(t+1)$, is determined as the average of the positions $\vec{X}^{\prime}$, $\vec{Y}^{\prime}$, and $\vec{Z}^{\prime}$:

$\vec{X_{i}}(t+1)=\frac{\vec{X}^{\prime}+\vec{Y}^{\prime}+\vec{Z}^{\prime}}{3}$. (9)

In this way, the entire wolf pack moves together towards the positions of $\alpha$, $\beta$, and $\delta$, and the new position of each individual is updated accordingly.

2.2.5 Attacking Phase

The final stage of the hunting process is the attack, during which the grey wolves aim to capture their prey and obtain the optimal solution. This phase involves adjusting certain parameters to strike a balance between global exploration and local exploitation.

To achieve this balance, two key parameters are considered: $a$ and $A$. The value of $a$ is progressively decreased from 2 to 0 in a linear manner. Simultaneously, the range of fluctuations in $A$ is reduced. The parameter $A$ takes on values within the range $[-a,a]$. The behavior of the grey wolves is influenced by the magnitude of $A$. When the absolute value of $A$ is greater than 1, the grey wolves tend to spread out across different areas, enabling a global search for prey. Conversely, when the absolute value of $A$ is less than 1, the grey wolves exhibit a more focused, local search.

In addition to these parameters, the influence of the grey wolves' positions on the prey is governed by a random weight, denoted as $C$. This weight, which ranges between 0 and 2, determines the random influence of the grey wolf's location on the prey. A value of $C$ greater than 1 indicates a higher weight, emphasizing the significance of the grey wolf's position in guiding the search. Conversely, a value of $C$ less than 1 assigns a lower weight, reducing the impact of the grey wolf's location. This random weight, $C$, helps prevent the algorithm from converging too early and becoming trapped in a local optimum.

By dynamically adjusting the values of $a$, $A$, and $C$ during the attacking phase, the grey wolves strike a balance between exploration and exploitation, allowing them to efficiently search for and capture the optimal solution while avoiding premature convergence and local optima.
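Putting the hunting and attacking phases together, a single GWO position update (Eqs. (3)-(9), with the linearly decaying $a$ and the random coefficients $A$ and $C$) can be sketched as follows; this is a minimal NumPy illustration under our own naming, not the released MATLAB code:

```python
import numpy as np

rng = np.random.default_rng(0)

def gwo_update(positions, alpha_pos, beta_pos, delta_pos, t, max_t):
    """One GWO iteration: move every wolf towards alpha, beta and delta.
    `positions` has shape (pop_size, dim); the leaders have shape (dim,)."""
    a = 2 - 2 * t / max_t                        # decays linearly from 2 to 0
    new_positions = np.empty_like(positions)
    for i, X in enumerate(positions):
        moves = np.empty((3, X.size))
        for j, leader in enumerate((alpha_pos, beta_pos, delta_pos)):
            r1, r2 = rng.random(X.size), rng.random(X.size)
            A = 2 * a * r1 - a                   # |A| > 1 explores, |A| < 1 exploits
            C = 2 * r2                           # random weight in [0, 2]
            D = np.abs(C * leader - X)           # distance to a leader, Eqs. (3)-(5)
            moves[j] = leader - A * D            # move towards the leader, Eqs. (6)-(8)
        new_positions[i] = moves.mean(axis=0)    # average of the three moves, Eq. (9)
    return np.clip(new_positions, 0.0, 1.0)      # bounds lb = 0, ub = 1 (TABLE III)
```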

2.2.6 Feature Selection Objective Function

During each iteration of the GWO algorithm, the classification label for each candidate solution $\vec{X_{i}}$ is predicted using the evaluation classifier selected from the classifier selection phase. Specifically, the evaluation classifier is initially trained on the original training gene expression dataset $\mathcal{D}_{tr}$ with all features using five-fold cross-validation. For each $\vec{X_{i}}$ containing a subset of selected features, the evaluation classifier generates predicted labels $y_{i}^{\prime}$ by classifying the corresponding data points from $\mathcal{D}_{tr}$ using only the selected features in $\vec{X_{i}}$. The performance of $y_{i}^{\prime}$ on $\mathcal{D}_{tr}$ determines the fitness value assigned to solution $\vec{X_{i}}$. This allows the GWO algorithm to determine the $\alpha$, $\beta$, and $\delta$ solutions representing the current best feature subsets for classification.

In the feature selection stage, the primary objective is to identify and select relevant features while filtering out redundant ones for subsequent identification purposes in cancer gene expression data. Traditional studies often focus solely on classification accuracy, disregarding the resource costs associated with redundant features. In our study, we address this limitation by considering both classification accuracy and the size of the feature subsets as part of our feature selection objective function [39].

The objective function, denoted as $f_{1}$, is defined as follows:

$f_{1}=\alpha*\text{error}+\beta*\frac{f_{\text{num}}}{\text{dim}}$. (10)

Here, $f_{\text{num}}$ represents the number of selected features during the evolutionary process, and $\text{dim}$ represents the total number of features in the dataset. To strike a balance between the two objectives, we introduce weight coefficients to control their relative importance. In our study, we assign a weight of 0.9 to $\alpha$ to emphasize the significance of classification accuracy, while $\beta$ is set to 0.1 to underscore the importance of the feature subset size. These weight coefficients were determined based on the findings in reference [40], where classification accuracy was identified as the primary objective.

The classification error (error) is a key component of the objective function. It is calculated as the difference between 1 and the accuracy (acc), which is defined as:

$\text{error}=1-\text{acc}$, (11)
$\text{acc}=\frac{\sum_{s=1}^{n}{I(y^{\prime}_{s},y_{s})}}{n}$. (12)

In the above equations, $n$ represents the total number of instances, $y^{\prime}_{s}$ represents the predicted class label for instance $s$, and $y_{s}$ represents the true class label for instance $s$. The function $I(y^{\prime}_{s},y_{s})$ evaluates to 1 if the predicted and true class labels match, and 0 otherwise.
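The complete wrapper fitness evaluation can then be sketched as below. The exact evaluation protocol during the search is described only at a high level in the text, so this sketch assumes five-fold cross-validation accuracy of the evaluation classifier on the training data; the guard for an empty feature mask is our own convention:

```python
import numpy as np
from sklearn.model_selection import cross_val_score

def feature_selection_fitness(wolf, X_train, y_train, evaluator,
                              alpha=0.9, beta=0.1, theta=0.5):
    """Fitness f1 (Eq. 10): weighted classification error (Eq. 11) plus a
    penalty on the relative size of the selected subset. Lower is better."""
    mask = np.asarray(wolf) >= theta
    if not mask.any():
        return 1.0                        # empty subset: worst possible fitness
    acc = cross_val_score(evaluator, X_train[:, mask], y_train, cv=5).mean()
    error = 1.0 - acc                     # Eq. (11)
    return alpha * error + beta * mask.sum() / mask.size
```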

2.3 Nature-inspired Diverse Ensemble Learning

In this section, we propose a novel nature-inspired diverse ensemble learning method to improve the performance of cancer identification using selected features obtained through nature-inspired feature selection. Our method comprises diverse subspace generation, model pool generation, and model pool optimization.

2.3.1 Diverse Subspace Generation

Given the gene expression data after feature selection, denoted as $\mathcal{D}_{tr}^{\prime}=\{(x_{1},y_{1}),...,(x_{m},y_{m})\}$, where $x_{i}=(x_{i,1},x_{i,2},...,x_{i,dim})$ represents the feature vector with $dim$ denoting the number of features, $y\in\{1,2,...,c\}$ represents the classification label, and $m$ represents the number of input samples, we employ the K-means method [41] to cluster the input cancer gene expression data into multiple clusters. The clustering process is performed iteratively from 1 to $t$, generating $K$ clusters in each iteration, where $t$ denotes the total number of iterations. The clusters are obtained by minimizing the following function:

$\mathop{argmin}\limits_{S}\sum_{i=1}^{K}\sum_{x\in S_{i}}{||x-\mu_{i}||^{2}}$, (13)

where $x$ represents the feature vector and $\mu_{i}$ is the centroid of cluster $S_{i}$. This clustering process generates a set of diverse subspaces composed of all the obtained clusters.
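Following the corresponding loop in Algorithm 1 (for $k=1\rightarrow K$, partition the data into $k$ clusters and keep every cluster), the subspace generation can be sketched as follows; the function name and use of scikit-learn are our own:

```python
import numpy as np
from sklearn.cluster import KMeans

def diverse_subspaces(X, y, K, random_state=0):
    """Run K-means for k = 1..K and collect every resulting cluster as one
    subspace, yielding a set of diverse (X_c, y_c) training subsets."""
    subspaces = []
    for k in range(1, K + 1):
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            subspaces.append((X[idx], y[idx]))
    return subspaces
```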

2.3.2 Model Pool Generation

Each cluster in the diverse subspace is used to train six classifiers (DISCR, DT, KNN, ANN, SVM, and NB) to create a model pool. The base classifiers used in this step are independent. The resulting models are then added to the base model pool $\mathcal{MP}$, which consists of $l*|\mathcal{B}|$ models, where $l$ represents the number of clusters and $|\mathcal{B}|$ represents the number of base classifiers. Finally, we employ nature-inspired optimization techniques to refine the base models in the ensemble. Here, any combination of classifiers can be utilized.
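A sketch of this step is given below; the guard against single-class clusters is our own practical assumption (most discriminative classifiers cannot be fitted on a cluster containing only one class), not a detail stated in the paper:

```python
import numpy as np
from sklearn.base import clone

def build_model_pool(subspaces, base_classifiers):
    """Train every base classifier on every cluster, yielding a pool MP of
    up to l * |B| models (l clusters, |B| base classifiers)."""
    pool = []
    for X_c, y_c in subspaces:
        if len(np.unique(y_c)) < 2:      # skip clusters with a single class
            continue
        for clf in base_classifiers:
            pool.append(clone(clf).fit(X_c, y_c))
    return pool
```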

2.3.3 Model Pool Optimization

After obtaining the diverse base model pool $\mathcal{MP}$, we propose a pre-optimization step to refine $\mathcal{MP}$ by removing models with below-average performance. Subsequently, we incorporate a nature-inspired optimization method, namely GWO, to further optimize the pre-optimized base model pool $\mathcal{MP}$.

Population Initialization: The population is randomly initialized, and each individual is represented as follows:

$\vec{X_{i}}=\{mp_{1},mp_{2},...,mp_{r}\}$. (14)

Here, $mp_{r}$ represents a classifier in the model pool $\mathcal{MP}$, and $r$ is the total number of models in $\mathcal{MP}$. Similar to nature-inspired feature selection, the selection or non-selection of models is indicated by binary values: "1" indicates that a model is selected, while "0" indicates that the model is not selected. To convert the continuous search space of GWO into a binary search space, we introduce a threshold $\theta$. The conversion from a continuous position to discrete binary values is defined as follows:

$mp_{r}=\begin{dcases}1,&mp_{r}\geq\theta\\ 0,&mp_{r}<\theta\end{dcases}$ (15)

Nature-inspired Optimization Process: In this phase, our aim is to discover optimal model subsets by optimizing the base model pool $\mathcal{MP}$. The population is used to explore optimal model subsets in the encircling phase, identify potential optimal solutions in the hunting phase, and ultimately obtain the optimal solution in the attacking phase.

Ensemble Optimizing Objective Function: Our objective is to achieve the highest identification performance with the smallest ensemble size. After clustering the data following feature selection and training the base classifiers to create a model pool, we aim to optimize the model pool to obtain the optimal ensemble model with the smallest size. The optimized model ensemble is then evaluated using the test data. The objective function in the model pool optimization stage, denoted as $f_{2}$, is defined as follows:

$f_{2}=\alpha*\text{error}+\beta*\frac{|\psi|}{r}$. (16)

Here, error represents the identification error rate described in Equation (11), $|\psi|$ is the total number of selected models, and $r$ is the number of models in $\mathcal{MP}$. The settings of $\alpha$ and $\beta$ are identical to those in Section 2.2.6, with $\alpha$ accounting for 90% of the importance and $\beta$ for 10%.

However, unlike in Section 2.2.6, where the predicted label $y^{\prime}_{s}$ is produced by a single classifier, we consider the ensemble of multiple models. We employ a plurality voting method to combine the predictions of multiple models, which has been proven to be a simple and effective ensemble fusion technique in many studies [42, 43].
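A sketch of plurality voting and the resulting ensemble fitness $f_{2}$ follows; class labels are assumed to be integer-encoded, and the empty-selection guard is again our own convention:

```python
import numpy as np
from scipy import stats

def plurality_vote(models, X):
    """Fuse member predictions by plurality voting (most frequent label)."""
    votes = np.stack([m.predict(X) for m in models])   # (n_models, n_samples)
    return stats.mode(votes, axis=0, keepdims=False).mode

def ensemble_fitness(wolf, pool, X_val, y_val, alpha=0.9, beta=0.1, theta=0.5):
    """Fitness f2 (Eq. 16): voting error of the selected model subset plus a
    penalty on the relative ensemble size |psi| / r. Lower is better."""
    mask = np.asarray(wolf) >= theta
    if not mask.any():
        return 1.0                                     # no model selected
    selected = [m for m, keep in zip(pool, mask) if keep]
    error = 1.0 - np.mean(plurality_vote(selected, X_val) == y_val)
    return alpha * error + beta * mask.sum() / len(pool)
```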

2.3.4 Ensemble Classifier Prediction

During the training process, we obtain a set of selected models $\psi$ that together constitute the final ensemble $\Psi$. All models in $\Psi$ are utilized to predict the test set, and their predicted class labels $y^{\prime}_{s}$ are fused using the plurality voting method. The identification accuracy can be calculated using Equation (12).

2.4 Time Complexity Analysis

Here, we analyze the time complexity of our proposed EODE algorithm. The detailed analysis is outlined as follows:

  • Feature Selection: The time complexity of the feature selection process depends on the algorithm used. Since we used GWO for feature selection, the time complexity is typically $O(T\times P\times F\times C)$, where $T$ is the number of generations, $P$ is the population size, $F$ is the number of features, and $C$ is the complexity of the fitness evaluation function. Generally, the feature selection process has a polynomial time complexity.

  • Diverse Subspace Generation: The time complexity of the diverse subspace generation mainly depends on the clustering algorithm used. Here, we applied the K-means algorithm, whose time complexity is usually $O(K\times N\times I\times d)$, where $K$ is the number of clusters, $N$ is the number of data points, $I$ is the number of iterations, and $d$ is the dimensionality of the data. The diverse subspace generation process has a polynomial time complexity.

  • Model Pool Generation: The model pool generation involves training multiple base classifiers on each cluster. The time complexity depends on the complexity of the base classifiers and the number of clusters. Assuming the time complexity of training a base classifier on a single cluster is $O(N\times F\times C)$, where $N$ is the number of data points, $F$ is the number of selected features, and $C$ is the complexity of the training algorithm, the overall time complexity of model pool generation is $O(L\times N\times F\times C)$, where $L$ is the number of clusters. This process also has a polynomial time complexity.

  • Model Pool Optimization: The time complexity of the model pool optimization stage depends on the optimization algorithm used. We employed the nature-inspired GWO algorithm, whose time complexity is typically $O(T\times P\times C)$, where $T$ is the number of generations, $P$ is the population size, and $C$ is the complexity of the fitness evaluation function. Similar to the feature selection process, the model pool optimization stage generally has a polynomial time complexity.

  • Ensemble Classifier Prediction: The time complexity of the ensemble classifier prediction depends on the number of models in the ensemble and the complexity of combining their predictions. Assuming we have $M$ models in the ensemble and the complexity of combining predictions is $O(M)$, the overall time complexity is $O(M)$. This process has a linear time complexity.

In summary, the overall time complexity is: Overall Time Complexity = Feature Selection + Diverse Subspace Generation + Model Pool Generation + Model Pool Optimization + Ensemble Classifier Prediction = $O(T\times P\times F\times C)+O(K\times N\times I\times d)+O(L\times N\times F\times C)+O(T\times P\times C)+O(M)$.

Since all these time complexities are polynomial, we can express the overall time complexity as the highest-order term in the sum. Therefore, the overall time complexity of the EODE algorithm is:

Overall Time Complexity = $O(\max\{T\times P\times F\times C,\ K\times N\times I\times d,\ L\times N\times F\times C,\ T\times P\times C,\ M\})$.

3 Implementation

3.1 Datasets

The cancer gene expression datasets were collected from [44] and can be downloaded from https://schlieplab.org/Static/Supplements/CompCancer/datasets.htm. The 35 datasets cover multiple types of cancers and are high-dimensional, with most exceeding 1,000 features, while having relatively small sample sizes (as shown in TABLE I). This poses the "Curse of Dimensionality" challenge, necessitating the development of a computational model with high robustness and good generalization capabilities to address the different cancers.

To enable rigorous evaluation, the collected raw datasets were randomly split into disjoint training and testing sets in an 80:20 ratio prior to conducting experiments. The training sets, comprising 80% of the data, were used for model training and hyperparameter tuning; five-fold cross-validation was applied on the training data only for model selection and hyperparameter optimization. The testing sets, comprising the held-out 20% of the data, were used only for the final evaluation of the fully trained model's performance, ensuring an unbiased estimate of generalization capability. The splits were done randomly while preserving class balance in each set. Specifically, the training and testing datasets can be downloaded from the following links: https://github.com/wangxb96/EODE/tree/master/TrainData and https://github.com/wangxb96/EODE/tree/master/TestData. By segregating the training and testing data, we prevent information leakage and overfitting to the test set. This rigorous methodology allows us to evaluate true generalization error and robustness across multiple cancer types.
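For reference, such a class-balanced 80:20 split can be reproduced with a stratified split; the snippet below is a generic sketch for one dataset (X: expression matrix, y: cancer-type labels), not the exact script used to create the released splits:

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)  # stratify keeps class balance
```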

TABLE I: 35 different gene expression datasets; each dataset showing the tissue type, number of samples, features, and classes.
Dataset Tissue Samples Features Classes Dataset Tissue Samples Features Classes
Alizadeh-2000-v1 Blood 42 1095 2 Alizadeh-2000-v2 Blood 62 2093 3
Alizadeh-2000-v3 Blood 62 2093 4 Armstrong-2002-v1 Blood 72 1081 2
Armstrong-2002-v2 Blood 72 2194 3 Bhattacharjee-2001 Lung 203 1543 5
Bittner-2000 Skin 38 2201 2 Bredel-2005 Brain 50 1739 3
Chen-2002 Liver 179 85 2 Chowdary-2006 Breast, Colon 104 182 2
Dyrskjot-2003 Bladder 40 1203 3 Garber-2001 Lung 66 4553 4
Golub-1999-v1 Bone Marrow 72 1877 2 Golub-1999-v2 Bone Marrow 72 1877 3
Gordon-2002 Lung 181 1626 2 Khan-2001 Multi-tissue 83 1069 4
Laiho-2007 Colon 37 2202 2 Lapointe-2004-v1 Prostate 69 1625 3
Lapointe-2004-v2 Prostate 110 2496 4 Liang-2005 Brain 37 1411 3
Nutt-2003-v1 Brain 50 1377 4 Nutt-2003-v2 Brain 28 1070 2
Nutt-2003-v3 Brain 22 1152 2 Pomeroy-2002-v1 Brain 34 857 2
Pomeroy-2002-v2 Brain 42 1379 5 Ramaswamy-2001 Multi-tissue 190 1363 14
Risinger-2003 Endometrium 42 1771 4 Shipp-2002-v1 Blood 77 798 2
Singh-2002 Prostate 102 339 2 Su-2001 Multi-tissue 174 1571 10
Tomlins-2006-v1 Prostate 104 2315 5 Tomlins-2006-v2 Prostate 92 1288 4
West-2001 Breast 49 1198 2 Yeoh-2002-v1 Bone Marrow 248 2526 2
Yeoh-2002-v2 Bone Marrow 248 2526 6 - - - - -

3.2 Baselines

To evaluate the effectiveness of our proposed method, we compared it against several existing classifiers and ensemble algorithms widely used in the literature. Firstly, we compared our model with six base classifiers: DISCR (Discriminant Analysis) [45], DT (Decision Tree) [46], KNN (K-Nearest Neighbor) [47], ANN (Artificial Neural Networks) [48], SVM (Support Vector Machine) [49], and NB (Naive Bayes) [50]. These classifiers serve as the baseline for performance comparison.

Next, we compared our approach with seven evolutionary algorithms: ACO [51], CS [52], DE [53], GA [54], GWO [37], PSO [55], and ABC [56]. These algorithms are widely used for optimization problems. Furthermore, we evaluated our approach against four novel ensemble methods: PSOEL [27], EAEL [57], FESM [58], and GA-Bagging-SVM [59]. These methods were selected to demonstrate the effectiveness of our proposed approach in comparison to recent advancements in ensemble learning.

In addition, we compared our ensemble algorithm with six state-of-the-art ensemble classifiers: Random Forests (RF) [60], ADABOOST [22], RUSBOOST [61], SUBSPACE [62], TOTALBOOST [63], and LPBOOST [64]. Random Forests is a well-known bagging method [60], while ADABOOST is a popular boosting method [22]. RUSBOOST is a random undersampling boosting method designed to address class imbalance [61]. SUBSPACE trains random feature subsets to reduce estimator correlation [62]. TOTALBOOST and LPBOOST aim to maximize the minimal margin of learned ensembles and have the ability to self-terminate [63, 64].

By comparing our method against these diverse algorithms, we aim to showcase its superiority and effectiveness in addressing the cancer gene expression data classification problem. Moreover, all comparison models are publicly available at https://github.com/wangxb96/EODE/tree/master/ComparisonAlgorithms.

3.3 Parameter Settings

Our experiments were conducted on a desktop computer with the following specifications: an Intel(R) Core(TM) i7-10700KF CPU @3.80GHz, 32GB of RAM, and a 64-bit Windows 10 operating system using Matlab 2021a. We utilized six base classifiers, namely DISCR, DT, KNN, ANN, SVM, and NB, to construct the ensemble. The parameters for DISCR, KNN, SVM, and NB are summarized in TABLE II, while the rest of the classifiers were used with their default settings. Additionally, Random Forest (RF) [60], ADABOOST [22], RUSBOOST [61], SUBSPACE [62], TOTALBOOST [63], and LPBOOST [64] were employed with their default parameter values. Furthermore, the parameters for four novel ensemble classifier methods, namely PSOEL [27], EAEL [57], FESM [58], and GA-Bagging-SVM [59], were set to be consistent with the original papers.

In our experiments, the original data was randomly divided into training and test datasets in an 8:2 ratio, and five-fold cross-validation was used on the training data. For the GWO algorithm in feature selection and ensemble optimization, the population size $P$ was set to 100, the number of iterations was set to 50, and the threshold $\theta$ was set to 0.5. Note that the threshold $\theta$ is used as a criterion within the Grey Wolf Optimizer to determine feature selection and is not directly related to actual gene expression values. In the clustering phase, the parameter $t$ was set to $\sqrt[5]{m}$. The detailed parameters of the seven classical evolutionary algorithms, including ACO [51], CS [52], DE [53], GA [54], GWO [37], PSO [55], and ABC [56], are summarized in TABLE III, where the population size $P$ and the maximum iteration $max_{t}$ are set to the same values.

TABLE II: Parameters of Different Machine Learning Methods
Methods Parameters
DISCR discrimtype = diaglinear
KNN K = 3
SVM 'KernelFunction' = 'rbf', 'IterationLimit' = 50000, 'Standardize' = true
NB distribution = kernel
TABLE III: Parameters of Different Evolutionary Algorithms
Methods Parameters
ACO tau = 1, eta = 1, alpha = 1, beta = 0.1, rho = 0.2, Pop = 100, $max_{t}$ = 50.
CS lb = 0, ub = 1, $\theta=0.5$, Pa = 0.25, alpha = 1, beta = 1.5, Pop = 100, $max_{t}$ = 50.
DE lb = 0, ub = 1, $\theta=0.5$, CR = 0.9, F = 0.5, Pop = 100, $max_{t}$ = 50.
GA CR = 0.8, MR = 0.01, Pop = 100, $max_{t}$ = 50.
PSO lb = 0, ub = 1, $\theta=0.5$, c1 = 2, c2 = 2, w = 0.9, Vmax = (ub - lb)/2, Pop = 100, $max_{t}$ = 50.
ABC lb = 0, ub = 1, $\theta=0.5$, maxlimit = 5, Pop = 100, $max_{t}$ = 50.
GWO lb = 0, ub = 1, $\theta=0.5$, Pop = 100, $max_{t}$ = 50.

For ACO, "tau" denotes the pheromone value, "eta" denotes the heuristic desirability, "alpha" denotes the control pheromone, "beta" denotes the control heuristic, and "rho" denotes the pheromone trail decay coefficient, which is set to 0.2. For CS, "Pa" denotes the discovery rate, "alpha" denotes the constant, and "beta" denotes the Levy component. For DE, "CR" denotes the crossover rate, and "F" denotes the scale factor. For GA, "CR" denotes the crossover rate, and "MR" denotes the mutation rate. For PSO, "c1" denotes the cognitive factor, "c2" denotes the social factor, "w" denotes the inertia weight, and "Vmax" denotes the maximum velocity.
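For convenience, the settings shared by the GWO runs in both optimization stages (TABLE III) can be captured in a single configuration, sketched here as a plain dictionary (our own packaging, not part of the released code):

```python
# Shared GWO settings used for both feature selection and ensemble
# optimization, as listed in TABLE III.
GWO_CONFIG = {
    "lb": 0.0,        # lower bound of the continuous search space
    "ub": 1.0,        # upper bound of the continuous search space
    "theta": 0.5,     # binarization threshold, Eqs. (1) and (15)
    "pop_size": 100,  # population size P
    "max_iter": 50,   # maximum number of iterations max_t
}
```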

4 Results and Analysis

4.1 Performance Comparisons with Other Nature-inspired Ensemble Learning Algorithms

In our study, we conducted performance comparisons of EODE with several other nature-inspired ensemble learning algorithms, namely PSOEL, EAEL, FESM, and GA-Bagging-SVM. The experimental results are summarized in Figure 3, where Figure 3(A) presents detailed classification results, Figure 3(B) illustrates the performance comparisons of EODE against the other ensemble methods, and Figure 3(C) showcases the average performance values of these methods.

As shown in Figure 3(A), EODE achieved the best results among all methods on 26 out of the 35 datasets. Specifically, EODE attained 100% classification accuracy on 7 datasets and achieved over 90% accuracy on more than half of the datasets. These results highlight the robustness of EODE in handling various types of cancers and its ability to provide highly accurate classifications. From Figure 3(B), it is evident that EODE outperformed the other nature-inspired ensemble learning algorithms. The performance comparisons clearly demonstrate the superiority of EODE in terms of test accuracy. To provide a comprehensive performance overview, we present the average performance across all 35 cancer gene expression datasets in Figure 3(C). The results indicate that EODE outperformed PSOEL by 6% and exhibited more than a 10% improvement compared to the other methods. These findings strongly support the conclusion that EODE performs better than other nature-inspired ensemble methods in the context of cancer gene expression classification.

Figure 3: Performance comparison to the other nature-inspired ensemble learning algorithms. (A) Test classification results of EODE and four other nature-inspired ensemble methods across the 35 cancer gene expression datasets. (B) Comparison graphs of EODE and the other four nature-inspired ensemble methods. (C) The average performance of EODE and the other four nature-inspired ensemble methods across the 35 cancer gene expression datasets.

4.2 Performance Comparisons of Different Machine Learning Algorithms

In our study, we conducted a comprehensive analysis and comparison of the performance between our proposed ensemble approach, EODE, and single classifier approaches. The experimental results, as shown in Figure 4, clearly demonstrate the superiority of EODE in terms of classification accuracy for cancer gene expression datasets. EODE achieved the best classification accuracy for over 55% of the datasets, surpassing all single classifiers. This indicates the effectiveness and robustness of our ensemble approach in handling cancer gene expression classification tasks. Moreover, when considering the average performance across all 35 cancer gene expression datasets, EODE consistently outperformed all single classifiers. Specifically, our ensemble approach exhibited remarkable improvements compared to the worst classifier, with an increase in performance of nearly 33%. Furthermore, EODE consistently achieved performance improvements of more than 10% compared to the majority of the base classifiers.

These findings clearly highlight the advantages of our ensemble approach over traditional single classifier methods. By leveraging the collective wisdom of multiple classifiers, EODE effectively addresses the challenges posed by cancer gene expression classification, resulting in superior classification accuracy and overall performance. Figure 4 provides a visual representation of the experimental results, further supporting the conclusions drawn from our performance comparisons. The results validate the effectiveness of our proposed ensemble approach, highlighting its potential as a valuable tool in the field of cancer gene expression analysis.

Figure 4: Performance comparisons of the different machine learning algorithms. The first 7 graphs represent the test classification accuracy on the different cancer gene expression datasets, and the last graph indicates the average performance of the seven methods on the 35 datasets.

4.3 Performance Comparisons of the Different Evolutionary Algorithms

To further evaluate the performance of the proposed EODE method, we compared it against other state-of-the-art evolutionary algorithms, including: Ant Colony Optimization (ACO), Cuckoo Search (CS), Differential Evolution (DE), Genetic Algorithm (GA), Grey Wolf Optimizer (GWO), Particle Swarm Optimization (PSO), and Artificial Bee Colony (ABC).

The experimental results are summarized in supplementary Figures 1 and 2. Supplementary Figure 1 shows the classification accuracy of different methods on each of the 35 cancer gene expression datasets, with the first 7 sub-figures presenting results on individual datasets and the last sub-figure reporting the average performance across all datasets. As seen in supplementary Figure 1, EODE obtains the best classification accuracy on over 60% of the datasets. Notably, there is an improvement of 5-8% in average classification accuracy achieved by EODE compared to other evolutionary algorithms. Supplementary Figure 2 depicts the number of features (i.e. biomarker genes) selected by each method on each dataset. We can observe that EODE selects the smallest feature subset in nearly 60% of datasets, indicating its ability to identify the most informative genes.

Overall, from both supplementary Figures 1 and 2, we can deduce that EODE consistently demonstrates the best average performance across all 35 cancer gene expression datasets, outperforming other state-of-the-art evolutionary methods. This validates the effectiveness and robustness of the proposed EODE approach in discovering critical biomarker genes for cancer classification.

TABLE IV: Performance on Training and Testing Sets with and without Ensemble Learning (WEL) Method.
Datasets Training Accuracy Test Accuracy Datasets Training Accuracy Test Accuracy
WEL EODE WEL EODE WEL EODE WEL EODE
Alizadeh-2000-v1 1.0000 1.0000 0.5114 1.0000 Lapointe-2004-v2 0.8737 0.8634 0.4339 0.9091
Alizadeh-2000-v2 1.0000 1.0000 0.6591 1.0000 Liang-2005 0.9515 0.9667 0.6494 0.8571
Alizadeh-2000-v3 0.9745 0.9600 0.4318 0.9167 Nutt-2003-v1 0.8614 0.9250 0.4273 0.9000
Armstrong-2002-v1 1.0000 0.9818 0.6558 0.9286 Nutt-2003-v2 1.0000 1.0000 0.5455 0.6000
Armstrong-2002-v2 1.0000 1.0000 0.4416 0.9286 Nutt-2003-v3 1.0000 1.0000 0.7955 1.0000
Bhattacharjee-2001 0.9944 0.9938 0.6750 0.8750 Pomeroy-2002-v1 0.9697 0.9667 0.8333 0.8333
Bittner-2000 0.9614 0.9667 0.5857 0.7143 Pomeroy-2002-v2 0.9532 0.9381 0.2841 1.0000
Bredel-2005 0.9114 0.9250 0.5727 0.9000 Ramaswamy-2001 0.7251 0.7966 0.1794 0.7632
Chen-2002 0.9918 0.9931 0.6649 0.9714 Risinger-2003 0.8649 0.8238 0.4545 0.7500
Chowdary-2006 0.9904 0.9875 0.8682 0.9000 Shipp-2002-v1 0.9800 0.9833 0.7212 0.8667
Dyrskjot-2003 0.9948 0.9381 0.5568 0.7500 Singh-2002 0.9778 0.9750 0.5955 0.9000
Garber-2001 0.8883 0.8873 0.5455 0.6154 Su-2001 0.9565 0.9643 0.1925 0.9118
Golub-1999-v1 0.9970 1.0000 0.6234 1.0000 Tomlins-2006-v1 0.8787 0.8809 0.3364 0.8000
Golub-1999-v2 0.9939 1.0000 0.5065 0.9286 Tomlins-2006-v2 0.8721 0.8648 0.3889 0.9444
Gordon-2002 1.0000 1.0000 0.8409 0.9444 West-2001 1.0000 1.0000 0.4646 0.5556
Khan-2001 1.0000 1.0000 0.3466 1.0000 Yeoh-2002-v1 0.9950 0.9950 0.8108 1.0000
Laiho-2007 1.0000 1.0000 0.7143 0.7143 Yeoh-2002-v2 0.8448 0.8995 0.2430 0.8776
Lapointe-2004-v1 0.8423 0.8591 0.5385 0.6154 Average 0.9498 0.9524 0.5456 0.8620

4.4 Performance Comparisons of the Different Ensemble Learning Algorithms

Figure 5: Performance comparisons of the different ensemble learning algorithms. (A) Test classification results of EODE and six other ensemble methods across the 35 cancer gene expression datasets; (B) The average performance of EODE and the six other ensemble classifiers on the 35 datasets; (C) Graphs of EODE versus the other ensemble classifiers, where RF denotes Random Forest.

To further validate the effectiveness of the proposed EODE method, we conducted experiments comparing its performance to other state-of-the-art ensemble learning classifiers on the 35 cancer gene expression datasets. The methods considered for comparison include: Random Forest (RF) [60], ADABOOST [22], RUSBOOST [61], SUBSPACE [62], TOTALBOOST [63] and LPBOOST [64].

The results are shown in Figure 5. Figure 5(A) depicts a heat map of the classification accuracy of each method on the 35 datasets, where darker colors indicate better performance; this visualization allows a qualitative comparison of model performance across the various cancer types. Figure 5(B) summarizes the mean classification accuracy of each method averaged over the 35 datasets. The proposed EODE method achieves 6-32% better performance than the other ensemble learning classifiers, demonstrating its superior predictive ability. Figure 5(C) presents box plots comparing the distribution of classification accuracies obtained by each method across the datasets. The median accuracy of EODE is higher than that of all other methods, indicating stable and robust performance, and its box plot is narrower than the others, showing the consistency of its results.

Overall, these quantitative and qualitative comparisons presented in Figure 5 validate that the proposed EODE method achieves the best classification accuracy on over 70% of the cancer gene expression datasets, outperforming other state-of-the-art ensemble classifiers. This clearly demonstrates the effectiveness and robustness of the EODE approach for cancer classification using gene expression data.

4.5 Ablation Study

4.5.1 Performance of EODE without Ensemble Learning

TABLE IV presents a comprehensive evaluation of training and testing performance across the 35 datasets for the proposed EODE approach against EODE without ensemble learning (WEL), i.e., with the nature-inspired diverse ensemble learning stage removed. The training accuracies of the two variants are comparable, with EODE achieving a slightly higher average of 0.9524 versus 0.9498 for WEL, indicating that their capacity to fit the training data is similar. However, EODE demonstrates a substantial test accuracy advantage over WEL, with average test accuracies of 0.8620 and 0.5456, respectively. This translates to an absolute improvement of more than 30 percentage points in generalization performance from leveraging ensemble learning.

The key insight is that while ensemble learning does not markedly improve training fit, it provides superior generalization through effectively preventing overfitting. Single models are prone to overfitting the noise in small datasets. Ensemble learning creates multiple diverse models and aggregates their predictions, avoiding these spurious patterns. Across multiple datasets, EODE consistently exhibits stronger generalization, evidenced by the significantly higher test accuracies. This gap is particularly prominent in smaller datasets where individual models tend to overfit more. By reducing variance via ensembling, the proposed approach demonstrates more robust predictions on unseen test data. The results validate the effectiveness of ensemble learning in enhancing model generalization capability and tackling the overfitting challenge.
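This train/test behavior can be reproduced in miniature. The toy sketch below (an illustration on assumed synthetic data, not the EODE pipeline) fits a single decision tree and a bagged ensemble of trees to a small, high-dimensional problem: both fit the training split almost perfectly, while the variance-reduced ensemble typically scores markedly higher on the held-out split.

```python
# Toy illustration of the ablation's insight: ensembling barely changes the
# training fit but improves test accuracy on small, high-dimensional data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=120, n_features=2000, n_informative=15,
                           n_redundant=0, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=1)

single = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                           random_state=1).fit(X_tr, y_tr)

for name, model in [("single tree", single), ("bagged trees", bagged)]:
    print(f"{name}: train = {model.score(X_tr, y_tr):.4f}, "
          f"test = {model.score(X_te, y_te):.4f}")
```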

In conclusion, the ensemble framework shows considerable promise in boosting test performance over single-model baselines across a wide range of conditions. This has important implications for real-world applications such as gene expression-based cancer screening, where avoiding overfitting is critical. The analysis provides strong empirical evidence and rationale for adopting ensemble techniques.

4.5.2 Performance of EODE Ensemble versus Individual Classifiers

Unlike the analysis in Section 4.2, this study does not evaluate each base classifier model in isolation. Rather, this section investigates the impact of using a single base classifier within the nature-inspired diverse ensemble learning phase of the proposed approach, instead of aggregating multiple heterogeneous classifiers concurrently as intended in the ensemble methodology. By focusing on the ensemble learning stage, this analysis provides targeted insight into the benefits of leveraging diversity in the classifier combinations compared to relying on any individual modeling paradigm alone during this critical step.
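For intuition, the sketch below mirrors this ablation with scikit-learn stand-ins for the six base learners (linear discriminant analysis for DISCR and an MLP for the ANN are our assumptions), scoring each classifier alone and then a hard-voting, i.e., plurality, ensemble of all six; the paper's full nature-inspired ensemble construction is not reproduced here.

```python
# Rough analogue of the comparison in TABLE V: each base learner alone versus
# a plurality-vote ensemble of all six, on synthetic high-dimensional data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=120, n_features=500, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                          random_state=2)

pool = [
    ("DISCR", LinearDiscriminantAnalysis()),
    ("DT", DecisionTreeClassifier(random_state=2)),
    ("KNN", KNeighborsClassifier(n_neighbors=3)),
    ("ANN", MLPClassifier(max_iter=2000, random_state=2)),
    ("SVM", SVC(random_state=2)),
    ("NB", GaussianNB()),
]

# Each base classifier in isolation.
for name, clf in pool:
    print(f"{name}: test = {clf.fit(X_tr, y_tr).score(X_te, y_te):.4f}")

# Hard voting = plurality vote over the pool's predicted class labels.
ensemble = VotingClassifier(estimators=pool, voting="hard").fit(X_tr, y_tr)
print(f"plurality ensemble: test = {ensemble.score(X_te, y_te):.4f}")
```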

TABLE V: Performance of EODE Ensemble versus Individual Base Classifiers
Dataset DISCR DT KNN ANN SVM NB EODE
Alizadeh-2000-v1 0.8333 0.6875 0.7750 0.6875 0.5000 0.7500 1.0000
Alizadeh-2000-v2 1.0000 0.8452 0.9833 1.0000 0.6667 0.7333 1.0000
Alizadeh-2000-v3 0.9306 0.7381 0.8667 0.9271 0.3333 0.8000 0.9167
Armstrong-2002-v1 0.8929 0.9490 0.9143 0.8929 0.6429 0.8429 0.9286
Armstrong-2002-v2 0.8690 0.7959 0.8000 0.8482 0.4286 0.6714 0.9286
Bhattacharjee-2001 0.9292 0.8321 0.8350 0.9063 0.6750 0.7750 0.8750
Bittner-2000 0.6905 0.6531 0.7143 0.6786 0.4286 0.5714 0.7143
Bredel-2005 0.8667 0.6857 0.7800 0.8500 0.7000 0.8400 0.9000
Chen-2002 0.8619 0.7837 0.8171 0.8786 0.5829 0.8229 0.9714
Chowdary-2006 0.9000 0.9214 0.9300 0.9438 0.8900 0.8700 0.9000
Dyrskjot-2003 0.7500 0.6429 0.6750 0.6719 0.6250 0.6500 0.7500
Garber-2001 0.7051 0.6923 0.6769 0.6538 0.6154 0.6769 0.6154
Golub-1999-v1 0.9286 0.9184 0.9143 0.8482 0.6429 0.8000 1.0000
Golub-1999-v2 0.8810 0.8367 0.8857 0.7500 0.5714 0.6714 0.9286
Gordon-2002 0.9861 0.9643 0.9722 0.9757 0.8333 0.9500 0.9444
Khan-2001_database 0.8646 0.8571 0.9125 0.9297 0.3125 0.7000 1.0000
Laiho-2007_database 0.7381 0.7551 0.7143 0.6786 0.7143 0.7429 0.7143
Lapointe-2004-v1 0.7538 0.6264 0.8154 0.6923 0.5385 0.6000 0.6154
Lapointe-2004-v2 0.7386 0.5195 0.5909 0.7330 0.3636 0.7091 0.9091
Liang-2005 0.6786 0.7347 0.8000 0.8750 0.7143 0.7143 0.8571
Nutt-2003-v1 0.7000 0.3857 0.5800 0.5500 0.2800 0.3800 0.9000
Nutt-2003-v2 0.8000 0.4286 0.6800 0.6500 0.4400 0.4000 0.6000
Nutt-2003-v3 1.0000 0.8214 0.8500 1.0000 0.7500 0.8500 1.0000
Pomeroy-2002-v1 0.6250 0.6429 0.6333 0.6250 0.1667 0.6000 0.8333
Pomeroy-2002-v2 0.8750 0.4107 0.5500 0.5156 0.2500 0.2500 1.0000
Ramaswamy-2001 0.6250 0.5752 0.6263 0.5329 0.1579 0.2526 0.7632
Risinger-2003 0.5938 0.5000 0.3750 0.5625 0.3750 0.5000 0.7500
Shipp-2002-v1 0.7667 0.7619 0.8000 0.8667 0.7333 0.7733 0.8667
Singh-2002 0.8125 0.7833 0.7600 0.8188 0.5000 0.8500 0.9000
Su-2001 0.8676 0.7549 0.7588 0.8750 0.1588 0.4471 0.9118
Tomlins-2006-v1 0.7875 0.5250 0.8200 0.8000 0.3000 0.6800 0.8000
Tomlins-2006-v2 0.7778 0.6000 0.7000 0.7986 0.3889 0.6556 0.9444
West-2001 0.5833 0.7778 0.6444 0.5972 0.4667 0.6000 0.5556
Yeoh-2002-v1 0.9796 0.9714 0.9878 0.9566 0.6694 0.8327 1.0000
Yeoh-2002-v2 0.6735 0.6408 0.7673 0.5357 0.3265 0.3347 0.8776
Average 0.8076 0.7148 0.7687 0.7744 0.5069 0.6656 0.8620

Across the 35 gene expression datasets analyzed, the best single classifier achieved an average accuracy of 0.8076 using a DISCR model. In contrast, the proposed EODE ensemble approach attained a significantly higher accuracy of 0.8620 by leveraging an integrated combination of diverse classifiers including DISCR, DT, KNN, ANN, SVM and NB. The results highlight that relying on any individual base classifier is suboptimal compared to the ensemble approach. No single modeling paradigm consistently dominates the performance across all datasets, due to the complexity of the classification problem. Different datasets exhibit variability in terms of which individual classifier achieves the best performance when used alone. However, EODE provides equal or higher accuracy relative to the top stand-alone model on 23 out of 35 datasets. The results empirically demonstrate that integrating multiple complementary base classifiers simultaneously is essential to maximize the potential of the ensemble framework and attain optimal classification performance on gene expression data. Reliance on any single constituent classifier within the ensemble learning process fails to harness the full synergistic advantages of the diverse ensemble.

5 Conclusion

Cancer type identification is a critical aspect of cancer research, as it enables early diagnosis and tailored treatment for patients. One key challenge in this field is identifying the highly sensitive biomarker genes that are indicative of specific cancer types. In this study, we propose a novel approach called EODE to address the classification of cancer types, particularly in scenarios where the gene expression profiles are high-dimensional and the sample size is small. EODE leverages the grey wolf optimizer (GWO) to optimize feature subsets and collaboratively builds an optimized ensemble classifier. By combining nature-inspired feature selection and ensemble learning, EODE significantly improves the model’s identification capability.

We conducted experiments on 35 datasets encompassing various cancer types, and the results demonstrate the effectiveness of our algorithm compared to four nature-inspired ensemble methods (PSOEL, EAEL, FESM, and GA-Bagging-SVM), six benchmark machine learning algorithms (KNN, DT, ANN, SVM, DISCR, and NB), six state-of-the-art ensemble algorithms (RF, ADABOOST, RUSBOOST, SUBSPACE, TOTALBOOST, and LPBOOST), and seven nature-inspired methods (ACO, CS, DE, GA, GWO, PSO, and ABC). Our algorithm outperformed these methods in terms of classification accuracy.

In future work, we aim to enhance the efficiency of the algorithm by improving the screening of redundant and invalid features. Additionally, as biomedical data often exhibit class imbalance, we plan to extend the framework to remain robust on class-imbalanced data. Beyond computational refinements, we intend to evaluate the proposed methodology on expanded gene expression datasets from diverse clinical cohorts. As cancer subtyping using gene expression data holds great promise for guiding individualized treatment decisions, we hope to transition this computational pipeline into real-world clinical settings.

Acknowledgments

The work described in this paper was substantially supported by the National Natural Science Foundation of China under Grant No. 62076109, and funded by the Natural Science Foundation of Jilin Province under Grant No. 20190103006JH, the Natural Science Funds of Jilin Province under Grant No. 20200201158JC. The work described in this paper was supported by the grant from the Health and Medical Research Fund, the Food and Health Bureau, The Government of the Hong Kong Special Administrative Region [07181426], and the funding from Hong Kong Institute for Data Science (HKIDS) at City University of Hong Kong. The work described in this paper was partially supported by two grants from City University of Hong Kong (CityU 11202219, CityU 11203520). This research is also supported by the National Natural Science Foundation of China under Grant No. 32000464.

References

  • [1] Wei Cao, Hong-Da Chen, Yi-Wen Yu, Ni Li, and Wan-Qing Chen. Changing profiles of cancer burden worldwide and in China: a secondary analysis of the global cancer statistics 2020. Chinese Medical Journal, 134(07):783–791, 2021.
  • [2] Kyle Swanson, Eric Wu, Angela Zhang, Ash A Alizadeh, and James Zou. From patterns to patients: Advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell, 2023.
  • [3] Wenya Linda Bi, Ahmed Hosny, Matthew B Schabath, Maryellen L Giger, Nicolai J Birkbak, Alireza Mehrtash, Tavis Allison, Omar Arnaout, Christopher Abbosh, Ian F Dunn, et al. Artificial intelligence in cancer imaging: clinical challenges and applications. CA: A Cancer Journal for Clinicians, 69(2):127–157, 2019.
  • [4] Joaquin Mateo, Lotte Steuten, Philippe Aftimos, Fabrice André, Mark Davies, Elena Garralda, Jan Geissler, Don Husereau, Iciar Martinez-Lopez, Nicola Normanno, et al. Delivering precision oncology to patients with cancer. Nature Medicine, 28(4):658–665, 2022.
  • [5] De-Shuang Huang and Chun-Hou Zheng. Independent component analysis-based penalized discriminant method for tumor classification using gene expression data. Bioinformatics, 22(15):1855–1862, 2006.
  • [6] Ran Su, Jiahang Zhang, Xiaofeng Liu, and Leyi Wei. Identification of expression signatures for non-small-cell lung carcinoma subtype classification. Bioinformatics, 36(2):339–346, 2020.
  • [7] Chiwen Qu, Lupeng Zhang, Jinlong Li, Fang Deng, Yifan Tang, Xiaomin Zeng, and Xiaoning Peng. Improving feature selection performance for classification of gene expression data using Harris hawks optimizer with variable neighborhood learning. Briefings in Bioinformatics, 2021.
  • [8] Hilary S Parker, Jeffrey T Leek, Alexander V Favorov, Michael Considine, Xiaoxin Xia, Sameer Chavan, Christine H Chung, and Elana J Fertig. Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction. Bioinformatics, 30(19):2757–2763, 2014.
  • [9] Florian Schmidt, Markus List, Engin Cukuroglu, Sebastian Köhler, Jonathan Göke, and Marcel H Schulz. An ontology-based method for assessing batch effect adjustment approaches in heterogeneous datasets. Bioinformatics, 34(17):i908–i916, 2018.
  • [10] Ting Jin, Nam D Nguyen, Flaminia Talos, and Daifeng Wang. ECMarker: interpretable machine learning model identifies gene expression biomarkers predicting clinical outcomes and reveals molecular mechanisms of human disease in early stages. Bioinformatics, 37(8):1115–1124, 2021.
  • [11] Bryan He, Ludvig Bergenstråhle, Linnea Stenbeck, Abubakar Abid, Alma Andersson, Åke Borg, Jonas Maaskola, Joakim Lundeberg, and James Zou. Integrating spatial gene expression and breast tumour morphology via deep learning. Nature Biomedical Engineering, 4(8):827–834, 2020.
  • [12] Huimin Gao, Chuang Bian, Xubin Wang, Xiangtao Li, and Yunhe Wang. Exploring cancer biomarker genes from gene expression data via nature-inspired multiobjective optimization. In 2022 34th Chinese Control and Decision Conference (CCDC), pages 5000–5007. IEEE, 2022.
  • [13] Xubin Wang and Weijia Jia. A feature weighting particle swarm optimization method to identify biomarker genes. In 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 830–834. IEEE, 2022.
  • [14] Shahla Nemati, Mohammad Ehsan Basiri, Nasser Ghasem-Aghaee, and Mehdi Hosseinzadeh Aghdam. A novel ACO–GA hybrid algorithm for feature selection in protein function prediction. Expert Systems with Applications, 36(10):12086–12094, 2009.
  • [15] Negar Maleki, Yasser Zeinali, and Seyed Taghi Akhavan Niaki. A k-NN method for lung cancer prognosis with the use of a genetic algorithm for feature selection. Expert Systems with Applications, 164:113981, 2021.
  • [16] Rodrigo Clemente Thom de Souza, Camila Andrade de Macedo, Leandro dos Santos Coelho, Juliano Pierezan, and Viviana Cocco Mariani. Binary coyote optimization algorithm for feature selection. Pattern Recognition, 107:107470, 2020.
  • [17] Gaurav Dhiman, Diego Oliva, Amandeep Kaur, Krishna Kant Singh, S Vimal, Ashutosh Sharma, and Korhan Cengiz. BEPO: a novel binary emperor penguin optimizer for automatic feature selection. Knowledge-Based Systems, 211:106560, 2021.
  • [18] Abdelaziz I Hammouri, Majdi Mafarja, Mohammed Azmi Al-Betar, Mohammed A Awadallah, and Iyad Abu-Doush. An improved dragonfly algorithm for feature selection. Knowledge-Based Systems, 203:106131, 2020.
  • [19] Nabil Neggaz, Essam H Houssein, and Kashif Hussain. An efficient Henry gas solubility optimization for feature selection. Expert Systems with Applications, 152:113364, 2020.
  • [20] Mahardhika Pratama, Witold Pedrycz, and Edwin Lughofer. Evolving ensemble fuzzy classifier. IEEE Transactions on Fuzzy Systems, 26(5):2552–2567, 2018.
  • [21] Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
  • [22] Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
  • [23] Ronglai Shen, Adam B Olshen, and Marc Ladanyi. Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis. Bioinformatics, 25(22):2906–2912, 2009.
  • [24] Zhen Cao, Xiaoyong Pan, Yang Yang, Yan Huang, and Hong-Bin Shen. The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier. Bioinformatics, 34(13):2185–2194, 2018.
  • [25] Ran Su, Xinyi Liu, Guobao Xiao, and Leyi Wei. Meta-gdbp: a high-level stacked regression model to improve anticancer drug response prediction. Briefings in Bioinformatics, 21(3):996–1005, 2020.
  • [26] Gavin Brown, Jeremy Wyatt, Rachel Harris, and Xin Yao. Diversity creation methods: a survey and categorisation. Information Fusion, 6(1):5–20, 2005.
  • [27] Muhammad Zohaib Jan, Juan Carloz Munoz, and Muhammad Asim Ali. A novel method for creating an optimized ensemble classifier by introducing cluster size reduction and diversity. IEEE Transactions on Knowledge and Data Engineering, 2020.
  • [28] Tien Thanh Nguyen, Anh Vu Luong, Manh Truong Dang, Alan Wee-Chung Liew, and John McCall. Ensemble selection based on classifier prediction confidence. Pattern Recognition, 100:107104, 2020.
  • [29] Yijun Chen, Man-Leung Wong, and Haibing Li. Applying ant colony optimization to configuring stacking ensembles for data mining. Expert Systems with Applications, 41(6):2688–2702, 2014.
  • [30] Asit Kumar Das, Soumen Kumar Pati, and Arka Ghosh. Relevant feature selection and ensemble classifier design using bi-objective genetic algorithm. Knowledge and Information Systems, 62(2):423–455, 2020.
  • [31] Xiangtao Li, Shixiong Zhang, and Ka-Chun Wong. Single-cell rna-seq interpretations using evolutionary multiobjective ensemble pruning. Bioinformatics, 35(16):2809–2817, 2019.
  • [32] Sujie Zhu, Weikaixin Kong, Jie Zhu, Liting Huang, Shixin Wang, Suzhen Bi, and Zhengwei Xie. The genetic algorithm-aided three-stage ensemble learning method identified a robust survival risk score in patients with glioma. Briefings in Bioinformatics, 23(5):bbac344, 2022.
  • [33] Girish Chandrashekar and Ferat Sahin. A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16–28, 2014.
  • [34] Ludmila I Kuncheva and Christopher J Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning, 51:181–207, 2003.
  • [35] Yi Zhang, Samuel Burer, W Nick Street, Kristin P Bennett, and Emilio Parrado-Hernández. Ensemble pruning via semi-definite programming. Journal of Machine Learning Research, 7(7), 2006.
  • [36] Lior Rokach. Ensemble-based classifiers. Artificial Intelligence Review, 33:1–39, 2010.
  • [37] Seyedali Mirjalili, Seyed Mohammad Mirjalili, and Andrew Lewis. Grey wolf optimizer. Advances in Engineering Software, 69:46–61, 2014.
  • [38] Hossam Faris, Ibrahim Aljarah, Mohammed Azmi Al-Betar, and Seyedali Mirjalili. Grey wolf optimizer: a review of recent variants and applications. Neural Computing and Applications, 30(2):413–435, 2018.
  • [39] Bing Xue, Mengjie Zhang, and Will N Browne. Particle swarm optimization for feature selection in classification: A multi-objective approach. IEEE Transactions on Cybernetics, 43(6):1656–1671, 2012.
  • [40] Xiangyang Wang, Jie Yang, Xiaolong Teng, Weijun Xia, and Richard Jensen. Feature selection based on rough sets and particle swarm optimization. Pattern Recognition Letters, 28(4):459–471, 2007.
  • [41] David JC MacKay and David JC Mac Kay. Information Theory, Inference and Learning Algorithms. Cambridge university press, 2003.
  • [42] Reshef Meir, Maria Polukarov, Jeffrey Rosenschein, and Nicholas Jennings. Convergence to equilibria in plurality voting. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 24, 2010.
  • [43] Reshef Meir. Plurality voting under uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.
  • [44] Marcilio Cp De Souto, Ivan G Costa, Daniel Sa De Araujo, Teresa B Ludermir, and Alexander Schliep. Clustering cancer gene expression data: a comparative study. BMC Bioinformatics, 9(1):1–14, 2008.
  • [45] Peter A Lachenbruch and M Goldstein. Discriminant analysis. Biometrics, pages 69–85, 1979.
  • [46] S Rasoul Safavian and David Landgrebe. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, 21(3):660–674, 1991.
  • [47] Naomi S Altman. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3):175–185, 1992.
  • [48] Bayya Yegnanarayana. Artificial Neural Networks. PHI Learning Pvt. Ltd., 2009.
  • [49] William S Noble. What is a support vector machine? Nature Biotechnology, 24(12):1565–1567, 2006.
  • [50] Kevin P Murphy et al. Naive Bayes classifiers. University of British Columbia, 18(60):1–8, 2006.
  • [51] Marco Dorigo, Mauro Birattari, and Thomas Stutzle. Ant colony optimization. IEEE Computational Intelligence Magazine, 1(4):28–39, 2006.
  • [52] Xin-She Yang and Suash Deb. Cuckoo search via Lévy flights. In 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), pages 210–214. IEEE, 2009.
  • [53] Swagatam Das and Ponnuthurai Nagaratnam Suganthan. Differential evolution: A survey of the state-of-the-art. IEEE Transactions on Evolutionary Computation, 15(1):4–31, 2010.
  • [54] Darrell Whitley. A genetic algorithm tutorial. Statistics and Computing, 4(2):65–85, 1994.
  • [55] James Kennedy and Russell Eberhart. Particle swarm optimization. In Proceedings of ICNN’95-international Conference on Neural Networks, volume 4, pages 1942–1948. IEEE, 1995.
  • [56] Dervis Karaboga and Bahriye Basturk. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. Journal of Global Optimization, 39(3):459–471, 2007.
  • [57] Zohaib Md. Jan and Brijesh Verma. Evolutionary classifier and cluster selection approach for ensemble classification. ACM Transactions on Knowledge Discovery from Data (TKDD), 14(1):1–18, 2019.
  • [58] Muhammad Zohaib Jan. A Novel Framework for Optimised Ensemble Classifiers. PhD thesis, Central Queensland University, 2020.
  • [59] Jianying Lin, Hui Chen, Shan Li, Yushuang Liu, Xuan Li, and Bin Yu. Accurate prediction of potential druggable proteins based on genetic algorithm and bagging-SVM ensemble classifier. Artificial Intelligence in Medicine, 98:35–47, 2019.
  • [60] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
  • [61] Chris Seiffert, Taghi M Khoshgoftaar, Jason Van Hulse, and Amri Napolitano. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 40(1):185–197, 2009.
  • [62] Tin Kam Ho. The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(8):832–844, 1998.
  • [63] Manfred K Warmuth, Jun Liao, and Gunnar Rätsch. Totally corrective boosting algorithms that maximize the margin. In Proceedings of the 23rd International Conference on Machine Learning, pages 1001–1008, 2006.
  • [64] Adam J Grove and Dale Schuurmans. Boosting in the limit: Maximizing the margin of learned ensembles. In AAAI/IAAI, pages 692–699, 1998.