Acceleration of Subspace Learning Machine via Particle Swarm Optimization and Parallel Processing
Abstract
Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has recently been proposed to offer higher performance in general classification and regression tasks. Its performance improvement is achieved at the expense of higher computational complexity. In this work, we investigate two ways to accelerate SLM. First, we adopt the particle swarm optimization (PSO) algorithm to speed up the search for a discriminant dimension that is expressed as a linear combination of current dimensions. The search for the optimal weights in the linear combination, which is accomplished by probabilistic search in the original SLM, is computationally heavy. The acceleration of SLM by PSO requires 10-20 times fewer iterations. Second, we leverage parallel processing in the SLM implementation. Experimental results show that the accelerated SLM method achieves a speed-up factor of 577 in training time while maintaining comparable classification/regression performance to the original SLM.
1 Introduction
Feature-based classification and regression methods, where feature extraction and decision-making are two separate modules, have been well studied for general classification and regression tasks for decades in the field of machine learning. In recent years, deep learning (DL) methods have handled feature extraction and decision-making jointly by training deep neural networks (DNNs) through backpropagation (BP). In this work, we adopt the traditional machine learning pipeline and aim to improve the high-performance decision-making (i.e., classification and regression) module with features as the input.
Built upon the classical decision tree (DT) [1] idea, SLM was recently proposed in [2]. SLM offers a high-performance machine learning solution to general classification and regression tasks. Both DT and SLM convert the high-dimensional feature space splitting problem into a 1D feature space splitting problem. DT examines each feature in the raw feature space, chooses the one that best meets a certain criterion, and searches for the optimal split point (i.e., the threshold value). In contrast, SLM forms a linear combination of existing features to obtain a new discriminant 1D space and finds the optimal split point accordingly. This demands finding the optimal weights of the linear combination in the training stage, which is computationally heavy.
It was shown in [2] that SLM achieves better performance than the support vector machine (SVM) [3], DT [1], and multilayer perceptron (MLP) [4, 5] with fewer parameters on several benchmark machine learning datasets. The performance improvement is achieved at the expense of higher computational complexity. In this work, we investigate two ideas to accelerate SLM: 1) a more efficient weight learning strategy via particle swarm optimization (PSO) [6], and 2) software implementation optimization with parallel processing on CPU and GPU.
For the first idea, the search for optimal weights is accomplished by probabilistic search in the original SLM. No optimization is involved. The probabilistic search demands a large number of iterations to achieve the desired results and is difficult to scale to a high-dimensional feature space. In contrast, PSO simulates the behavior of a flock of birds, where each bird is modeled by a particle. PSO finds the optimal solution in the search space by updating particles jointly and iteratively. The acceleration of SLM by PSO requires 10-20 times fewer iterations than the original SLM.
For the second idea, SLM's training time can be significantly reduced through software implementation optimization. In the original SLM development stage, its implementation primarily relied on public Python packages such as scikit-learn [7], NumPy, and SciPy [8]. After the algorithmic details were settled, we follow the development pathway of XGBoost [9] and LightGBM [10], implement a C++ package, and integrate it with Cython. Both multithreaded CPU and CUDA/GPU execution are supported.
By combining the above two ideas, the accelerated SLM achieves a speed-up factor of 577 in training as compared to the original SLM while maintaining comparable (or even better) classification/regression performance. The rest of this paper is organized as follows. The SLM method is reviewed in Sec. 2. The acceleration of SLM through PSO is introduced in Sec. 3. The optimization of the software implementation is presented in Sec. 4. Experimental results are shown in Sec. 5. Finally, concluding remarks are given in Sec. 6.
2 Subspace Learning Machine (SLM)
2.1 Methodology
Inspired by the feedforward multilayer perceptron (FF-MLP) [5], decision tree (DT) [1] and extreme learning machine (ELM) [11], a new machine learning model named SLM was proposed by Fu et al. in [2] recently. Its main idea is subspace partitioning, where an input feature space is partitioned into multiple discriminant sets in a hierarchical manner. The whole input feature space is treated as a parent node and it is partitioned by a hyperplane into two subsets, where each subset has a lower entropy value. Each subset is denoted by a child node. This partitioning process is conducted recursively until samples in leaf nodes are pure enough or the sample number is small enough to avoid overfitting. Majority vote is adopted at each leaf node where all test samples in the node are predicted as the majority class. Different from traditional DT methods, SLM allows multiple splitting of a parent node at one level, which makes the SLM tree wider and shallower to prevent overfitting.
Instead of partitioning a node based on the original input feature space, SLM searches for discriminant projections for each parent node and performs its partitioning in a projected 1D space. The projected 1D space obtained from the input feature vector $\mathbf{x} \in \mathbb{R}^D$ can be expressed as

$f(\mathbf{x}) = \mathbf{a}^T \mathbf{x}, \quad (1)$

where

$\mathbf{a} = (a_1, a_2, \cdots, a_D)^T, \quad \|\mathbf{a}\|_2 = 1, \quad (2)$

is the projection vector on the unit hypersphere in $\mathbb{R}^D$ and $D$ is the dimension of the input feature space. The 1D subspace can be partitioned into two disjoint sets based on a threshold, $t$, selected by the discriminant feature test (DFT) in [12]. They are written as

$S_{+} = \{\mathbf{x} \mid f(\mathbf{x}) \ge t\}, \quad (3)$
$S_{-} = \{\mathbf{x} \mid f(\mathbf{x}) < t\}. \quad (4)$
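To make the 1D partitioning concrete, the following minimal Python sketch projects the samples of a node onto a candidate direction (Eq. (1)) and scans candidate thresholds to find the split of Eqs. (3)-(4) with the lowest loss. The helper names are illustrative, and the DFT loss of [12] is approximated here by the weighted entropy of the two child sets.

```python
import numpy as np

def weighted_entropy(y_left, y_right):
    """Weighted sum of the entropies of the two child sets (a stand-in for the DFT loss)."""
    def entropy(y):
        if len(y) == 0:
            return 0.0
        p = np.bincount(y) / len(y)          # class labels assumed to be non-negative integers
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    n = len(y_left) + len(y_right)
    return (len(y_left) * entropy(y_left) + len(y_right) * entropy(y_right)) / n

def best_split_1d(X, y, a, num_bins=32):
    """Project samples onto direction `a` (Eq. (1)) and scan thresholds (Eqs. (3)-(4))."""
    f = X @ a                                            # f(x) = a^T x
    candidates = np.linspace(f.min(), f.max(), num_bins + 2)[1:-1]
    best_t, best_loss = None, np.inf
    for t in candidates:
        mask = f >= t                                    # S_+ = {x | f(x) >= t}
        loss = weighted_entropy(y[mask], y[~mask])
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t, best_loss
```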
We summarize the SLM tree construction as follows (a minimal code sketch of this recursion is given after the list).

1. A discriminant subspace is selected from the original input feature space.
2. A set of projection vectors is generated to project the selected subspace to 1D subspaces. The discriminability of each projected subspace is evaluated using the DFT loss in [12], and the most discriminant 1D subspaces are kept. Correlations among them are considered to ensure diversity.
3. The node is split based on each of the selected projected subspaces. This completes the node splitting at one level of the SLM tree.
4. Node splitting is conducted recursively until the stopping criteria are met.
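The sketch below illustrates this recursion under simplifying assumptions. It reuses `best_split_1d()` from the earlier sketch; `select_discriminant_subspace()` and `generate_projections()` are simplified stand-ins (variance ranking and random unit vectors) for the DFT-based selection and the probabilistic search of Sec. 2.2, and, unlike the original SLM, only the single best projection is kept per node to keep the sketch short.

```python
import numpy as np
from collections import Counter

def select_discriminant_subspace(X, y, k=8):
    # Placeholder criterion: keep the k features with the largest variance.
    return np.argsort(-X.var(axis=0))[:min(k, X.shape[1])]

def generate_projections(dim, num_proj, rng):
    # Placeholder candidates: random unit vectors in the selected subspace.
    P = rng.normal(size=(num_proj, dim))
    return P / np.linalg.norm(P, axis=1, keepdims=True)

def build_slm_node(X, y, depth=0, max_depth=4, min_samples=20, num_proj=8, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    # Step 4: stop when the node is pure, too small, or too deep.
    if len(np.unique(y)) <= 1 or len(y) < min_samples or depth >= max_depth:
        return {"leaf": True, "label": Counter(y.tolist()).most_common(1)[0][0]}
    dims = select_discriminant_subspace(X, y)                    # Step 1
    cands = generate_projections(len(dims), num_proj, rng)       # Step 2
    scored = [(best_split_1d(X[:, dims], y, a), a) for a in cands]
    (t, _), a = min(scored, key=lambda s: s[0][1])               # keep only the best projection here
    mask = X[:, dims] @ a >= t                                   # Step 3: split the node
    if mask.all() or not mask.any():                             # degenerate split -> make a leaf
        return {"leaf": True, "label": Counter(y.tolist()).most_common(1)[0][0]}
    return {"leaf": False, "dims": dims, "a": a, "t": t,
            "left": build_slm_node(X[mask], y[mask], depth + 1, max_depth, min_samples, num_proj, rng),
            "right": build_slm_node(X[~mask], y[~mask], depth + 1, max_depth, min_samples, num_proj, rng)}
```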
2.2 Probabilistic Selection in SLM
The most important and challenging step in SLM is the selection of discriminant projection vectors that can separate samples into disjoint sets of lower entropy. Each projection vector $\mathbf{a}$ is a linear combination of one-hot basis vectors, $\mathbf{e}_d$, $d = 1, \cdots, D$, in the form of

$\mathbf{a} = \sum_{d=1}^{D} a_d \, \mathbf{e}_d, \quad (5)$

where $a_d$ represents the coefficient of the corresponding basis vector $\mathbf{e}_d$.
A probabilistic search scheme was proposed in [2] to achieve this task. It first orders the basis vectors by their discriminant power, evaluated using the DFT loss, from the highest to the lowest. Then, Eq. (5) can be rewritten as

$\mathbf{a} = \sum_{i=1}^{D} \tilde{a}_i \, \tilde{\mathbf{e}}_i, \quad (6)$

where the basis vectors $\tilde{\mathbf{e}}_1, \cdots, \tilde{\mathbf{e}}_D$ are ordered from the most to the least discriminant one. Different sets of coefficients then form different projection vectors. Although purely random selection of coefficient sets could be performed, the search space is too large for this to be practical. Three hyper-parameters are introduced to guide the probabilistic search, as described below.
- $P_i$: the probability of selecting coefficient $\tilde{a}_i$.
  Since the basis vectors are rank-ordered in decreasing discriminant power in Eq. (6), we let $P_i$ decrease exponentially as $i$ increases. In other words, the more discriminant a basis, the higher the likelihood of it being selected. The resulting projection vector tends to be more discriminant as well.
- $R_i$: the dynamic range of coefficient $\tilde{a}_i$.
  $\tilde{a}_i$ is randomly selected among integer values, $\tilde{a}_i \in \{0, \pm 1, \cdots, \pm R_i\}$ (Eqs. (7)-(8)), where $R_i$ decreases as $i$ increases. The more discriminant a basis, the larger the dynamic range of its coefficient.
- The number of selected coefficients.
  Only a subset of the basis vectors is considered in the linear combination in Eq. (6); the remaining coefficients are set to zero. This further reduces the search space when the feature dimension $D$ is large.
Given a set of projection vectors selected by the probabilistic search scheme stated above, the cosine similarity between each pair is measured. To ensure the diversity of the projected 1D spaces, highly correlated projection vectors are pruned from the final selected set.
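A minimal sketch of this probabilistic selection is given below, under assumed schedules: the selection probabilities decay exponentially with the basis index, coefficients are drawn from integer ranges that shrink with the index, and highly correlated projections are pruned by cosine similarity. The exact decay schedules and the function names are illustrative, not those of [2].

```python
import numpy as np

def sample_projection(D, num_selected, p_decay=0.8, r_max=5, rng=None):
    """Sample one projection vector over basis vectors ordered by discriminant power.

    Index 0 is the most discriminant basis.  The selection probability P_i and the
    integer coefficient range R_i both shrink as i grows (illustrative schedules).
    """
    rng = np.random.default_rng() if rng is None else rng
    a = np.zeros(D)
    probs = p_decay ** np.arange(D)                        # P_i decays exponentially with i
    chosen = rng.choice(D, size=min(num_selected, D), replace=False, p=probs / probs.sum())
    for i in chosen:
        R_i = max(1, int(round(r_max * p_decay ** i)))     # larger range for more discriminant bases
        a[i] = rng.integers(-R_i, R_i + 1)                 # integer coefficient in [-R_i, R_i]
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a

def prune_correlated(projections, max_cos=0.9):
    """Greedily keep unit projection vectors whose pairwise |cosine similarity| stays below max_cos."""
    kept = []
    for a in projections:
        if all(abs(np.dot(a, b)) < max_cos for b in kept):
            kept.append(a)
    return kept
```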
3 Acceleration via Adaptive Particle Swarm Optimization (APSO)
As described in Sec. 2, probabilistic search is used in the original SLM to obtain good projection vectors. As the feature dimension increases, the search space grows exponentially and the probabilistic search scheme does not scale well. In this section, we adopt a metaheuristic algorithm, particle swarm optimization (PSO) [13], to address the same problem with higher efficiency.
3.1 Introduction to PSO
PSO is an evolutionary algorithm proposed by Kennedy and Eberhart in 1995 [6]. It finds the optimal solution in the search space by simulating the behavior of a bird flock using a swarm of particles. Each particle in the swarm has a position vector and a velocity vector. Mathematically, the $i$th particle has the following position and velocity vectors:

$\mathbf{x}_i = (x_{i,1}, x_{i,2}, \cdots, x_{i,n}), \quad (9)$
$\mathbf{v}_i = (v_{i,1}, v_{i,2}, \cdots, v_{i,n}), \quad (10)$

where $n$ is the dimension of the search space. The velocity and position vectors of the $i$th particle are updated at iteration $t$ via

$\mathbf{v}_i^{t+1} = w \, \mathbf{v}_i^{t} + c_1 r_1 (\mathbf{p}_i^{t} - \mathbf{x}_i^{t}) + c_2 r_2 (\mathbf{g}^{t} - \mathbf{x}_i^{t}), \quad (11)$
$\mathbf{x}_i^{t+1} = \mathbf{x}_i^{t} + \mathbf{v}_i^{t+1}. \quad (12)$
Eq. (11) has the following parameters:
- $w$: the inertia weight, used to control the influence of the particle's current velocity on the update.
- $\mathbf{g}^{t}$: the position vector of the best particle in the population (the global best).
- $\mathbf{p}_i^{t}$: the best position of the $i$th particle in the history of the update process up to iteration $t$.
- $c_1$ and $c_2$: two learning factors.
- $r_1$ and $r_2$: two uniformly distributed random values between 0 and 1 used to simulate the stochastic behavior of particles.
The initial position and velocity vectors of all particles are randomly selected within a preset range. At each iteration, the velocity and position of each particle are updated according to Eqs. (11) and (12), respectively. Furthermore, we calculate the current loss and update $\mathbf{p}_i^{t}$ and $\mathbf{g}^{t}$ accordingly. When the algorithm converges, the position vector of the globally optimal particle is returned as the output. The pseudocode of PSO is provided in Algorithm 1.
Algorithm 1: PSO.
Input: Population, Dimension, and MaxIteration.
Output: the best particle's position vector and its loss.
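A minimal Python sketch of standard PSO that follows the interface of Algorithm 1 and the updates of Eqs. (11) and (12) is given below. The inertia weight, learning factors, and initialization range are illustrative defaults rather than the values used in SLM.

```python
import numpy as np

def pso(loss_fn, population, dimension, max_iteration,
        w=0.7, c1=2.0, c2=2.0, x_range=1.0, rng=None):
    """Standard PSO (Algorithm 1 interface): returns the best position found and its loss."""
    rng = np.random.default_rng() if rng is None else rng
    # Random initialization of positions and velocities within a preset range.
    x = rng.uniform(-x_range, x_range, size=(population, dimension))
    v = rng.uniform(-x_range, x_range, size=(population, dimension))
    p_best = x.copy()                                   # personal best positions
    p_loss = np.array([loss_fn(xi) for xi in x])        # personal best losses
    g_idx = int(np.argmin(p_loss))
    g_best, g_loss = p_best[g_idx].copy(), p_loss[g_idx]

    for _ in range(max_iteration):
        r1 = rng.random((population, dimension))
        r2 = rng.random((population, dimension))
        # Eq. (11): velocity update with inertia, cognitive, and social terms.
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        # Eq. (12): position update.
        x = x + v
        losses = np.array([loss_fn(xi) for xi in x])
        improved = losses < p_loss
        p_best[improved], p_loss[improved] = x[improved], losses[improved]
        g_idx = int(np.argmin(p_loss))
        if p_loss[g_idx] < g_loss:
            g_best, g_loss = p_best[g_idx].copy(), p_loss[g_idx]
    return g_best, g_loss
```

For SLM, `loss_fn` would evaluate the DFT loss of the 1D subspace obtained by projecting a node's samples onto the particle's position vector.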
3.2 Adaptive PSO (APSO) in SLM
The PSO algorithm has several variants [14, 15, 16]. The search space of the projection vector in our problem is complex and contains quite a few local minima, and standard PSO can easily get stuck in a local minimum. One variant of PSO, adaptive particle swarm optimization (APSO) [17], often yields better performance for multimodal problems, so it is adopted as an alternative choice in the SLM implementation.
Before each iteration, APSO estimates the distribution of particles in the search space and identifies it as one of the following four evolutionary states: exploration, exploitation, convergence, and jumping-out. APSO then chooses the parameters $c_1$ and $c_2$ dynamically according to the current state so as to make the particles more effective under the current distribution. The adaptation rules are summarized below (a compact sketch of these rules follows the list).
- In the exploration state, APSO increases the $c_1$ value and decreases the $c_2$ value. This allows particles to explore the whole search space more freely and individually to reach better individual optimal positions.
- In the exploitation state, APSO increases the $c_1$ value slightly and decreases the $c_2$ value. As a result, particles can leverage local information more to find the global optimum.
- In the convergence state, APSO increases both $c_1$ and $c_2$ slightly. An elite learning strategy is also introduced to promote the convergence of the globally optimal particle, which makes the search more efficient.
- In the jumping-out state, APSO decreases the $c_1$ value and increases the $c_2$ value. This allows particles to move away from their current cluster more easily, so the particle swarm can jump out of the current local optimum.
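The rules above can be written compactly as follows; the step size and clipping bounds are illustrative placeholders rather than the exact schedule of [17].

```python
# Illustrative state-dependent adjustment of the learning factors (c1, c2).
# delta is a small acceleration step; lo/hi keep c1 and c2 within a reasonable range.
def adapt_learning_factors(state, c1, c2, delta=0.05, lo=1.5, hi=2.5):
    if state == "exploration":          # explore individually: raise c1, lower c2
        c1, c2 = c1 + delta, c2 - delta
    elif state == "exploitation":       # exploit local information: raise c1 slightly, lower c2
        c1, c2 = c1 + 0.5 * delta, c2 - 0.5 * delta
    elif state == "convergence":        # converge: raise both slightly
        c1, c2 = c1 + 0.5 * delta, c2 + 0.5 * delta
    elif state == "jumping-out":        # escape a local optimum: lower c1, raise c2
        c1, c2 = c1 - delta, c2 + delta
    return min(max(c1, lo), hi), min(max(c2, lo), hi)
```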
For classification tasks in SLM, a particle's position vector is the projection vector and the DFT loss is used as the loss function. In the original SLM, samples are split into several child nodes at each tree level. Since APSO searches for a single global optimum with the lowest loss, only one globally optimal projection is selected per node in the accelerated version. As a result, the resulting SLM tree is a binary tree. Furthermore, in the accelerated SLM algorithm, the top discriminant features at each node are re-computed and serve as the search dimensions of the APSO algorithm. We will show in the experimental section that APSO demands fewer iterations than probabilistic search to find the global optimum. Thus, the overall search speed can be improved. The pseudocode of the APSO-based SLM tree building is given in Algorithm 2.
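Since Algorithm 2 is not reproduced here, the sketch below only illustrates how a single node of the accelerated (binary) SLM tree could be split with the PSO search. It reuses `pso()`, `best_split_1d()`, and `select_discriminant_subspace()` from the earlier sketches; the population size and iteration count are illustrative.

```python
import numpy as np

def split_node_with_apso(X, y, num_dims=8, population=30, max_iteration=110):
    """Search one (approximately) optimal projection and threshold for a node."""
    dims = select_discriminant_subspace(X, y, num_dims)   # re-rank features at this node
    Xd = X[:, dims]

    def loss_fn(a):
        # Particle position = projection vector; loss = best split loss in the projected 1D space.
        norm = np.linalg.norm(a)
        if norm == 0.0:
            return np.inf
        return best_split_1d(Xd, y, a / norm)[1]

    a_best, _ = pso(loss_fn, population, len(dims), max_iteration)
    a_best = a_best / np.linalg.norm(a_best)
    t_best, _ = best_split_1d(Xd, y, a_best)
    mask = Xd @ a_best >= t_best                           # one global optimum -> binary split
    return dims, a_best, t_best, mask
```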
4 Acceleration via Parallel Processing
The SLM algorithm in [2] was implemented in pure Python. While Python code is easy to write, its running speed is slow. Cython is a superset of Python that bridges the gap between Python and C: it allows static typing and compiles Cython code to C, while remaining compatible with commonly used Python math libraries such as SciPy and scikit-learn.
The first step toward run-time acceleration is to convert the computational bottleneck of the SLM implementation from Python to Cython. The bottleneck is the search for the optimal projection vector. For a projection vector at a node, we calculate the loss at each split point so as to find the minimum loss in the 1D space. Since these computations do not depend on each other, they can in principle be computed in parallel. However, Python does not support parallel multithreaded execution due to the Global Interpreter Lock (GIL), and multi-processing was deemed unsuitable due to its higher overhead in the current context.
Instead, we implement a multithreaded C++ extension for the optimal projection vector search and integrate it with the remaining Cython code. In the C++ extension, we spawn multiple threads, where each thread covers the computation of a fixed number of splits per selected projection vector. The number of launched threads is capped at the number of physical cores to prevent excessive context switching. A task is defined as calculating the loss on a single split of a projected 1D subspace. These tasks are evenly allocated to the threads, and each thread processes its tasks sequentially.
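The actual parallel search is implemented in the C++ extension; the Python sketch below only illustrates the task-partitioning scheme, where a task is the loss evaluation of one candidate split and tasks are chunked evenly across a capped number of workers (under CPython's GIL, the pure-Python version shown here would not actually run the chunks in parallel). The function names and the `loss_fn(f, y, t)` signature are assumptions for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def split_tasks_evenly(num_tasks, num_workers):
    """Chunk task indices [0, num_tasks) into num_workers nearly equal, contiguous ranges."""
    base, extra = divmod(num_tasks, num_workers)
    chunks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks

def parallel_split_losses(f, y, thresholds, loss_fn):
    """Evaluate loss_fn on every candidate split of the projected 1D feature f."""
    # Cap the worker count by the available cores (the C++ extension uses physical cores).
    num_workers = max(1, min(os.cpu_count() or 1, len(thresholds)))
    chunks = split_tasks_evenly(len(thresholds), num_workers)

    def run_chunk(idx_range):
        # Each worker processes its contiguous block of split tasks sequentially.
        return [(i, loss_fn(f, y, thresholds[i])) for i in idx_range]

    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        results = [pair for part in ex.map(run_chunk, chunks) for pair in part]
    best_i, best_loss = min(results, key=lambda p: p[1])
    return thresholds[best_i], best_loss
```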
Compared to CPUs, modern GPUs have thousands of cores and context switching occurs significantly less frequently. Thus, for GPU processing, each thread is responsible for a single task in our design. Once its task is completed, the GPU core moves on to the next available task, if any. Once all computations are completed, the loss values are returned to the main process, where the minimum-loss projection and split point are determined.
Table 1: Training time of SLM, SLM Forest, and SLM Boost under settings [A]-[F] on the nine classification datasets.

| Method | Setting | Circle-and-ring | 2-new-moons | 4-new-moons | Iris | Wine | B.C.W | Pima | Ionosphere | Banknote |
| SLM | A | 7.026 | 4.015 | 6.019 | 82.939 | 65.652 | 234.597 | 339.077 | 248.410 | 118.000 |
| SLM | B | 0.287 | 0.188 | 0.200 | 0.465 | 0.808 | 2.931 | 3.920 | 5.217 | 0.697 |
| SLM | C | 0.346 | 0.0943 | 0.236 | 0.295 | 0.700 | 2.657 | 3.263 | 3.838 | 0.439 |
| SLM | D | 4.071 | 3.159 | 3.245 | 3.205 | 3.572 | 3.956 | 4.660 | 5.021 | 4.370 |
| SLM | E | 7.597 | 3.146 | 7.200 | 4.188 | 3.106 | 11.605 | 13.892 | 8.123 | 8.588 |
| SLM | F | 0.270 | 0.0887 | 0.278 | 0.192 | 0.831 | 2.837 | 2.738 | 1.371 | 0.619 |
| SLM Forest | A | 160.264 | 107.695 | 157.307 | 2333.514 | 2605.288 | 5067.356 | 10024.238 | 6582.104 | 3020.587 |
| SLM Forest | B | 8.507 | 5.272 | 8.471 | 17.142 | 26.254 | 134.301 | 101.631 | 152.625 | 40.365 |
| SLM Forest | C | 7.910 | 4.714 | 8.170 | 13.068 | 22.176 | 116.843 | 88.972 | 137.390 | 16.341 |
| SLM Forest | D | 7.708 | 4.954 | 7.678 | 5.946 | 15.066 | 98.969 | 48.987 | 80.547 | 16.511 |
| SLM Forest | E | 220.021 | 96.712 | 219.002 | 105.768 | 94.788 | 360.089 | 396.989 | 247.069 | 259.000 |
| SLM Forest | F | 8.75 | 4.058 | 9.02 | 5.822 | 20.081 | 98.490 | 31.684 | 75.964 | 14.054 |
| SLM Boost | A | 143.332 | 116.156 | 151.904 | 2347.904 | 2591.919 | 7118.338 | 9887.949 | 6395.744 | 3412.654 |
| SLM Boost | B | 7.459 | 5.185 | 7.533 | 16.087 | 21.873 | 117.01 | 104.47 | 148.586 | 23.443 |
| SLM Boost | C | 7.987 | 3.434 | 8.293 | 12.144 | 15.624 | 115.226 | 80.273 | 131.712 | 14.207 |
| SLM Boost | D | 7.741 | 4.894 | 8.182 | 6.093 | 15.573 | 97.599 | 46.057 | 76.589 | 15.588 |
| SLM Boost | E | 233.071 | 97.541 | 233.347 | 107.709 | 95.534 | 362.295 | 396.989 | 247.069 | 262.546 |
| SLM Boost | F | 6.837 | 3.581 | 8.425 | 5.915 | 20.159 | 94.413 | 40.413 | 79.012 | 15.516 |
Table 2: Training time of SLR, SLR Forest, and SLR Boost under settings [A]-[F] on the six regression datasets.

| Method | Setting | Make Friedman1 | Make Friedman2 | Make Friedman3 | Boston | California housing | Diabetes |
| SLR | A | 152.291 | 60.232 | 186.535 | 153.173 | 756.868 | 147.933 |
| SLR | B | 2.136 | 0.653 | 2.137 | 1.255 | 26.417 | 1.136 |
| SLR | C | 0.937 | 0.331 | 2.100 | 0.780 | 5.625 | 0.608 |
| SLR | D | 1.577 | 0.448 | 3.832 | 1.31 | 5.508 | 0.722 |
| SLR | E | 29.742 | 28.294 | 44.511 | 30.608 | 96.319 | 27.162 |
| SLR | F | 0.639 | 0.332 | 0.755 | 0.675 | 5.891 | 0.322 |
| SLR Forest | A | 5406.271 | 3346.595 | 7459.372 | 6454.979 | 21092.217 | 4413.089 |
| SLR Forest | B | 62.527 | 19.244 | 57.117 | 55.204 | 1049.859 | 46.622 |
| SLR Forest | C | 35.576 | 13.533 | 63.551 | 30.943 | 254.415 | 29.606 |
| SLR Forest | D | 19.408 | 6.48 | 42.182 | 18.04 | 125.104 | 17.242 |
| SLR Forest | E | 1025.801 | 940.359 | 1051.498 | 792.33 | 3190.336 | 775.153 |
| SLR Forest | F | 17.993 | 13.718 | 19.898 | 11.178 | 193.741 | 10.879 |
| SLR Boost | A | 4678.86 | 1955.517 | 6770.808 | 4097.66 | 21612.672 | 4200.828 |
| SLR Boost | B | 62.156 | 19.724 | 59.435 | 38.154 | 817.688 | 34.767 |
| SLR Boost | C | 26.704 | 10.546 | 70.716 | 29.149 | 159.434 | 20.528 |
| SLR Boost | D | 19.293 | 6.551 | 51.222 | 19.817 | 124.702 | 38.897 |
| SLR Boost | E | 919.874 | 920.619 | 1834.926 | 910.271 | 3336.018 | 901.092 |
| SLR Boost | F | 14.170 | 13.114 | 24.759 | 12.186 | 195.459 | 10.594 |
5 Experiments
Datasets. We follow the experimental setup in [2] and conduct performance benchmarking on 9 classification datasets (i.e., Circle-and-ring, 2-new-moons, 4-new-moons, Iris, Wine, B.C.W, Pima, Ionosphere, and Banknote) and 6 regression datasets (i.e., Make Friedman1, Make Friedman2, Make Friedman3, Boston, California housing, and Diabetes). For details of these 15 datasets, we refer readers to [2].
Benchmarking Algorithms and Acceleration Settings. Ensemble methods are commonly used in the machine learning field to boost performance. An ensemble model aims to obtain better performance than each constituent model alone. Inspired by the random forest (RF) [18] and XGBoost for DT, SLM Forest and SLM Boost are the bagging and boosting methods for SLM, respectively. The counterpart of SLM for the regression task is called the subspace learning regressor (SLR). Again, SLR has two ensemble methods, namely SLR Forest and SLR Boost. They are detailed in [2].
We compare the run-time of six settings for the three classification methods (SLM, SLM Forest, and SLM Boost) in Table 1 and for the three regression methods (SLR, SLR Forest, and SLR Boost) in Table 2. The six settings are:

- [A] Probabilistic search in pure Python,
- [B] Probabilistic search in Cython with single-threaded C++,
- [C] Probabilistic search in Cython with multithreaded C++,
- [D] Probabilistic search in CUDA/GPU,
- [E] APSO in pure Python,
- [F] APSO in Cython with multithreaded C++.
For a given dataset, settings [A]-[D] provide the same classification (or regression) performance in terms of classification accuracy (or regression MSE), while settings [E] and [F] offer the same classification accuracy (or regression MSE). Their main difference lies in the model training time. All settings are run on an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40 GHz with 12 cores and 24 threads and an Nvidia Quadro M6000 GPU. All hyperparameters are kept the same. The number of trees for the two SLM/SLR ensemble methods (i.e., SLM/SLR Forest and SLM/SLR Boost) is set to 30.
Acceleration by Parallel Processing. We compare the run-time of SLM, SLM Forest, and SLM Boost on the 9 classification datasets in Table 1 and that of SLR, SLR Forest, and SLR Boost on the 6 regression datasets in Table 2. We see from the tables that simply changing the implementation from Python to C++ yields a speed-up factor of 40x to 100x across all datasets. Beyond that, the additional speed-up is incremental. Multithreading should increase performance in theory, yet in practice it is greatly influenced by the state of the machine, for example caching, task scheduling, and whether cores are occupied by other processes. Moreover, additional overhead occurs in spawning, managing, and destroying threads. Typically, workloads with longer per-thread computation benefit more from multithreading. Thus, the performance improvement on smaller datasets, where computations take less time, is less pronounced. Similarly, for training a single SLM tree, the training time on GPU is notably worse due to the overhead of moving data to GPU memory. As the training scales (e.g., with ensembles of SLM/SLR trees such as SLM/SLR Forest and SLM/SLR Boost), CUDA/GPU generally outperforms multithreading.
Table 3: Classification accuracy (%) of probabilistic search (settings [A]-[D]) and APSO (settings [E]-[F]) on the nine classification datasets, together with the number of iterations.

| Method | Circle-and-ring | 2-new-moons | 4-new-moons | Iris | Wine | B.C.W | Pima | Ionosphere | Banknote | Number of Iterations |
| SLM [A,B,C,D] | 96.50 | 100 | 99.75 | 98.33 | 98.61 | 97.23 | 77.71 | 90.07 | 99.09 | 1000 |
| SLM [E,F] | 96.50 | 100 | 99.75 | 98.33 | 100 | 98.68 | 77.71 | 92.19 | 99.64 | 110 |
| SLM Forest [A,B,C,D] | 100 | 100 | 100 | 98.33 | 100 | 97.36 | 79.00 | 95.71 | 99.81 | 1000 |
| SLM Forest [E,F] | 100 | 100 | 100 | 98.33 | 98.61 | 97.80 | 77.71 | 95.03 | 100 | 110 |
| SLM Boost [A,B,C,D] | 100 | 100 | 100 | 98.33 | 100 | 98.83 | 77.71 | 94.33 | 100 | 1000 |
| SLM Boost [E,F] | 100 | 100 | 100 | 98.33 | 100 | 97.80 | 78.34 | 94.33 | 100 | 110 |
Table 4: Regression MSE of probabilistic search (settings [A]-[D]) and APSO (settings [E]-[F]) on the six regression datasets, together with the number of iterations.

| Method | Make Friedman1 | Make Friedman2 | Make Friedman3 | Boston | California housing | Diabetes | Number of Iterations |
| SLR [A,B,C,D] | 20.46 | 125830 | 0.078 | 65.68 | 1.172 | 5238.8 | 2000 |
| SLR [E,F] | 13.52 | 12001 | 0.022 | 27.88 | 0.575 | 4018.7 | 110 |
| SLR Forest [A,B,C,D] | 5.46 | 1641 | 0.007 | 13.18 | 0.407 | 2624.4 | 2000 |
| SLR Forest [E,F] | 4.68 | 2144 | 0.012 | 13.48 | 0.419 | 2640.7 | 110 |
| SLR Boost [A,B,C,D] | 3.85 | 670 | 0.008 | 13.07 | 0.365 | 2710.0 | 2000 |
| SLR Boost [E,F] | 3.95 | 662 | 0.005 | 12.91 | 0.341 | 2711.2 | 110 |
Acceleration by APSO. To demonstrate the effectiveness of APSO, we compare the classification and regression performance of probabilistic search and APSO in Tables 3 and 4, respectively. Their performance in terms of classification accuracy and regression MSE is comparable. As shown in the last column of both tables, the iteration number of APSO is about 10% (for SLM) and 5% (for SLR) of that of probabilistic search.
Joint Acceleration by Parallel Processing and APSO. The effect of the joint acceleration by parallel processing and APSO can be seen from setting [F] in Tables 1 and 2. We observe that the single-threaded C++ implementation of APSO achieves a speed comparable to that of the GPU version of probabilistic search. We have not yet implemented a GPU version of the APSO acceleration, but we expect it to provide another level of boost in shortening the training run time.
In summary, the APSO acceleration reduces the time complexity of SLM/SLR by cutting the number of iterations to 5-10% of that of probabilistic search. The C++ implementation and parallel processing reduce the training time of each iteration by a factor of 40x to 100x. With the combination of both, APSO-based SLM/SLR with the C++ implementation gives the best overall performance in training time, classification accuracy, and regression error. As compared to the original SLM/SLR, its training time can be accelerated by a factor of up to 577.
6 Conclusion and Future Work
Two ways to accelerate SLM were proposed in this work. First, we applied the particle swarm optimization (PSO) algorithm to speed up the search for a discriminant dimension, which was accomplished by probabilistic search in the original SLM. The acceleration of SLM by PSO requires 10-20 times fewer iterations. Second, we leveraged parallel processing in the implementation. Experimental results showed that, as compared to the original SLM/SLR, the accelerated SLM/SLR can achieve a speed-up factor of 577 in training time while maintaining comparable classification and regression performance.
The datasets considered in [2] and this work are of lower dimensions. Recently, unsupervised representation learning for images and videos has been intensively studied, e.g., [19, 20, 21, 22, 23, 24, 25]. The learned representations and their labels can be fed into a classifier/regressor for supervised classification and regression. This leads to the traditional pattern recognition paradigm, where feature extraction and classification modules are cascaded for various tasks. The feature dimensions for these problems are on the order of hundreds. It is important to develop SLM for high-dimensional image and video data to explore its full potential.
References
- [1] L. Breiman, J. Friedman, C. Stone, and R. Olshen, Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
- [2] H. Fu, Y. Yang, V. K. Mishra, and C.-C. J. Kuo, “Subspace learning machine (slm): Methodology and performance,” arXiv preprint arXiv:2205.05296, 2022.
- [3] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, no. 3, pp. 273–297, 1995.
- [4] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain.,” Psychological review, vol. 65, no. 6, p. 386, 1958.
- [5] R. Lin, Z. Zhou, S. You, R. Rao, and C.-C. J. Kuo, “From two-class linear discriminant analysis to interpretable multilayer perceptron design,” arXiv preprint arXiv:2009.04442, 2020.
- [6] J. Kennedy and R. Eberhart, “Particle swarm optimization,” in Proceedings of ICNN’95-international conference on neural networks, vol. 4, pp. 1942–1948, IEEE, 1995.
- [7] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al., “Scikit-learn: Machine learning in python,” the Journal of machine Learning research, vol. 12, pp. 2825–2830, 2011.
- [8] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, et al., “Scipy 1.0: fundamental algorithms for scientific computing in python,” Nature methods, vol. 17, no. 3, pp. 261–272, 2020.
- [9] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794, 2016.
- [10] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu, “Lightgbm: A highly efficient gradient boosting decision tree,” Advances in neural information processing systems, vol. 30, 2017.
- [11] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, “Extreme learning machine: theory and applications,” Neurocomputing, vol. 70, no. 1-3, pp. 489–501, 2006.
- [12] Y. Yang, W. Wang, H. Fu, and C.-C. J. Kuo, “On supervised feature selection from high dimensional feature spaces,” arXiv preprint arXiv:2203.11924, 2022.
- [13] R. C. Eberhart, Y. Shi, and J. Kennedy, Swarm intelligence. Elsevier, 2001.
- [14] R. A. Krohling and L. dos Santos Coelho, “Coevolutionary particle swarm optimization using gaussian distribution for solving constrained optimization problems,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 36, no. 6, pp. 1407–1416, 2006.
- [15] J. J. Liang, A. K. Qin, P. N. Suganthan, and S. Baskar, “Comprehensive learning particle swarm optimizer for global optimization of multimodal functions,” IEEE transactions on evolutionary computation, vol. 10, no. 3, pp. 281–295, 2006.
- [16] Y. Shi and R. C. Eberhart, “Fuzzy adaptive particle swarm optimization,” in Proceedings of the 2001 congress on evolutionary computation (IEEE Cat. No. 01TH8546), vol. 1, pp. 101–106, IEEE, 2001.
- [17] Z.-H. Zhan, J. Zhang, Y. Li, and H. S.-H. Chung, “Adaptive particle swarm optimization,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 39, no. 6, pp. 1362–1381, 2009.
- [18] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
- [19] Y. Chen and C.-C. J. Kuo, “Pixelhop: A successive subspace learning (ssl) method for object recognition,” Journal of Visual Communication and Image Representation, vol. 70, p. 102749, 2020.
- [20] Y. Yang, H. Fu, and C.-C. J. Kuo, “Design of supervision-scalable learning systems: Methodology and performance benchmarking,” arXiv preprint arXiv:2206.09061, 2022.
- [21] C.-C. J. Kuo, M. Zhang, S. Li, J. Duan, and Y. Chen, “Interpretable convolutional neural networks via feedforward design,” Journal of Visual Communication and Image Representation, 2019.
- [22] Z. Zhou, H. Fu, S. You, C. C. Borel-Donohue, and C.-C. J. Kuo, “Uhp-sot: An unsupervised high-performance single object tracker,” in 2021 International Conference on Visual Communications and Image Processing (VCIP), pp. 1–5, IEEE, 2021.
- [23] Z. Zhou, H. Fu, S. You, and C.-C. J. Kuo, “Unsupervised lightweight single object tracking with uhp-sot++,” arXiv preprint arXiv:2111.07548, 2021.
- [24] Z. Zhou, H. Fu, S. You, and C.-C. J. Kuo, “Gusot: Green and unsupervised single object tracking for long video sequences,” arXiv preprint arXiv:2207.07629, 2022.
- [25] Y. Zhu, X. Wang, H.-S. Chen, R. Salloum, and C.-C. J. Kuo, “A-pixelhop: A green, robust and explainable fake-image detector,” in ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8947–8951, IEEE, 2022.