
Deep Forest with Hashing Screening and Window Screening

Pengfei Ma School of Artificial Intelligence, Hebei University of Technology, Tianjin, 300401, China Youxi Wu [email protected] Hebei Key Laboratory of Big Data Computing, Tianjin, 300401, China Yan Li School of Economics and Management, Hebei University of Technology, Tianjin, 300401, China Lei Guo State Key Laboratory of Reliability and Intelligence of Electrical Equipment, Hebei University of Technology, Tianjin, 300401, China He Jiang School of Software, Dalian University of Technology, Dalian, 116023, China Xingquan Zhu Department of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, FL, 33431, USA Xindong Wu Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), Hefei University of Technology, Hefei, 230009, China Mininglamp Academy of Sciences, Mininglamp Technology, Beijing, 100084, China
Abstract

As a novel deep learning model, gcForest has been widely used in various applications. However, the current multi-grained scanning of gcForest produces many redundant feature vectors, which increases the time cost of the model. To screen out these redundant feature vectors, we introduce a hashing screening mechanism for multi-grained scanning and propose a model called HW-Forest, which adopts two strategies: hashing screening and window screening. In its hashing screening strategy, HW-Forest employs a perceptual hashing algorithm to calculate the similarity between feature vectors, which is used to remove the redundant feature vectors produced by multi-grained scanning and can significantly decrease the time cost and memory consumption. Furthermore, we adopt a self-adaptive instance screening strategy, called window screening, to improve the performance of our approach; it can achieve higher accuracy without hyperparameter tuning on different datasets. Our experimental results show that HW-Forest achieves higher accuracy than other models, with a reduced time cost.

keywords:
deep learning; deep forest; perceptual hashing; hashing screening strategy; window screening; self-adaptive mechanism

1 Introduction

Deep learning [1] can obtain better performance than classical machine learning methods [2, 3, 4, 5] and has been widely applied in various fields, such as image processing [6], speech recognition [7], and text processing [8]. However, deep learning models based on neural networks involve large numbers of hyperparameters and depend on a parameter tuning process. More importantly, theoretical analyses of neural network models are extremely difficult. There are therefore several reasons for exploring the development of a novel deep learning model. Based on the recognition that layer-by-layer processing and in-model feature transformation are key aspects of neural network models, Zhou and Feng [9] proposed a novel deep learning model called gcForest.

Inspired by neural network models, gcForest adopts a cascade structure to process data, and contains many random forests at each level of the cascade structure. Multi-grained scanning is employed as an in-model feature transformation mechanism, in which a sliding window is used to scan high-dimensional raw data. The main advantage of gcForest is that it does not need backpropagation at the training stage, which makes it much easier to train than a neural network model. Another advantage is that the complexity of gcForest can be automatically determined and adapted to different training datasets. Recently, gcForest has achieved excellent performance on a broad range of tasks, such as cancer detection [10] and remote sensing [11]. However, although existing experimental results show that larger models may provide higher accuracy, gcForest is difficult to scale to large models. The reason for this is that multi-grained scanning transforms a raw instance of high-dimensional data into hundreds or thousands of new instances and inputs them into all levels of the cascade structure. Thus, gcForest incurs a high time cost and memory consumption during learning.

To reduce the time cost and memory requirements of gcForest, Pang et al. [12] proposed gcForestcs, in which a confidence screening mechanism is introduced into the general framework of gcForest. The confidence screening mechanism selects instances according to their confidence, which is derived from their prediction vectors. More specifically, all instances are divided into two subsets: high-confidence and low-confidence. Instances in the high-confidence set are used to produce a final prediction at the current level, while the instances in the low-confidence set are passed to the next level. The key aspect of gcForestcs is confidence screening, and its threshold is produced by a simple and effective rule. However, this rule cannot avoid the mispartitioning of instances, in which some low-accuracy instances are classified into the high-confidence set. To tackle this issue, we previously proposed DBC-Forest [13], in which binning confidence screening is used to replace confidence screening. More concretely, DBC-Forest packs all instances into bins based on their confidence values, and adopts the instance accuracy of the bins to avoid mispartitioning of instances. Experimental results showed that DBC-Forest achieved better performance than gcForestcs.

However, current deep forest models have two issues. First, multi-grained scanning can produce redundant feature vectors, which increase the time cost; these arise because some instances share the same features. Second, in the cascade structure, the thresholds of the confidence screening mechanisms depend on tuning hyperparameters; in other words, they cannot achieve the best performance when the hyperparameters are not suitable for some datasets. To improve the performance of deep forest models, we propose HW-Forest, which adopts two screening mechanisms: hashing screening and window screening. Hashing screening reduces the number of redundant feature vectors based on their perceptual hashing values, and can also be applied to other deep forest models, since it improves the multi-grained scanning stage of gcForest. Furthermore, inspired by the binning confidence screening process used in DBC-Forest, a self-adaptive mechanism called window screening is proposed, in which a window is adopted to screen instances. Unlike DBC-Forest, window screening produces thresholds automatically for different datasets without requiring hyperparameter tuning.

The contributions of this paper can be summarized as follows.

  1. To improve the performance and efficiency of the deep forest algorithm, we propose a novel deep forest model called HW-Forest which uses two screening mechanisms: hashing screening and window screening.

  2. In HW-Forest, hashing screening is used to remove the redundant feature vectors produced by multi-grained scanning, which significantly decreases the time cost and memory consumption.

  3. A self-adaptive confidence screening mechanism is used to realize window screening, which can achieve higher accuracy without hyperparameter tuning on different datasets.

  4. Our experimental results show that the proposed hashing screening mechanism reduces the running time not only of HW-Forest but also of other deep forest methods. More importantly, HW-Forest yields better performance than other state-of-the-art approaches.

The remainder of the paper is organized as follows. Section 2 introduces related work on the deep forest algorithm. Section 3 describes perceptual hashing and DBC-Forest. Section 4 introduces HW-Forest and its two screening mechanisms. Section 5 presents experimental results. Finally, conclusions are drawn in Section 6.

2 Related Work

As a novel deep learning model, deep forest gives higher accuracy than deep neural networks, and it has therefore been applied in many fields, such as medicine, electricity, emotion recognition, agriculture, topic classification for tweets, image retrieval, and economics [14]. In medicine, Sun et al. [15] proposed AFS-DF, in which a feature selection mechanism was adopted to classify cases of COVID-19. To improve the accuracy of classification of cancer subtypes, Guo et al. [16] and Dong et al. [17] proposed the boosting cascade deep forest and multi-weighted gcForest algorithms, respectively. In electricity, Wang et al. [18] proposed a novel voltage-current trajectory classification model based on deep forest. In emotion recognition, Cheng et al. [19] applied deep forest to emotion recognition based on electroencephalography, and Fang et al. [20] proposed a multi-feature deep forest approach in which a multi-feature structure was employed to extract power spectral density and differential entropy features. In agriculture, Zhang et al. [21] combined hyperspectral imaging technology with deep forest to identify frost-damaged rice seeds. In tweet topic classification, a semantic deep forest technique was developed [22] in which a semantic layer was introduced into the cascade structure of deep forest. In image retrieval, Zhou et al. [23] proposed a deep forest hashing approach to learn shorter binary codes. In economics, Ma et al. [24] proposed a cost-sensitive deep forest algorithm that assigned a specific cost to each misclassification, to improve the accuracy of deep forest in the area of price prediction.

However, the performance of deep forest needs to be further improved, since it is inefficient on datasets with larger numbers of instances. The most representative of the improved deep forest models is gcForestcs [12], in which confidence screening was adopted to improve efficiency. Inspired by gcForestcs, many other approaches have been proposed, such as DBC-Forest [13], AWDF [25], and MLDF [26]. In the binning confidence screening deep forest algorithm [13], a binning confidence mechanism was adopted to improve the accuracy of gcForestcs. In the adaptive weighted deep forest approach [25], a special weighting scheme was utilized that could adaptively weight each training instance at each cascade level. The multi-label deep forest algorithm [26] employed a measure-aware feature reuse mechanism to retain the valuable representation from the previous layer, and a measure-aware layer growth strategy was adopted to limit the complexity of the model under different performance measures.

In the models described above, less attention is paid to the redundant feature vectors, which not only increase the time cost but also decrease the accuracy of the model. To further improve the performance of deep forest, we propose a novel model called HW-Forest, in which hashing screening and window screening are applied. Our experimental results show that HW-Forest achieves better performance than other state-of-the-art models.

3 Preliminaries

Section 3.1 introduces perceptual hashing algorithms. Section 3.2 reviews the principles underlying DBC-Forest, including the use of binning confidence screening.

3.1 Perceptual Hashing

Perceptual hashing [27] is an effective approach for image retrieval in which images are converted into binary sequences for retrieval. The most representative perceptual hashing algorithms are aHash [28], pHash [29], and dHash [30]. To process data efficiently, we employ aHash to screen out redundant feature vectors, using a hashing screening mechanism. The main steps of aHash are as follows:

Step 1: Resize the image (this is a fast way to match an image of any size).

Step 2: Convert the resized image to grayscale.

Step 3: Compute the mean value of the grayed image pixels.

Step 4: Compare each pixel value with the mean value, and set any pixels larger than the mean to one, and the other pixels to zero.

Step 5: Combine the binarization results into binary sequences.

Finally, aHash retrieves images by comparing their binary sequences. The process used by the aHash algorithm is illustrated in Figure 1.
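
To make these steps concrete, the following is a minimal Python sketch of aHash. The 8×8 hash size, the Pillow/NumPy libraries, and the function names are illustrative assumptions rather than the exact implementation used in HW-Forest.

```python
# Minimal aHash sketch; the 8x8 hash size and the use of Pillow/NumPy
# are illustrative assumptions, not necessarily the paper's implementation.
import numpy as np
from PIL import Image

def ahash(image_path, hash_size=8):
    # Step 1: resize, so an image of any size maps to hash_size x hash_size.
    # Step 2: convert the resized image to grayscale ("L" mode).
    img = Image.open(image_path).convert("L").resize((hash_size, hash_size))
    pixels = np.asarray(img, dtype=np.float64)
    # Step 3: compute the mean value of the grayscale pixels.
    mean = pixels.mean()
    # Step 4: set pixels larger than the mean to one and the others to zero.
    bits = (pixels > mean).astype(np.uint8)
    # Step 5: combine the binarization results into a binary sequence.
    return bits.flatten()

def hamming_distance(code1, code2):
    # Retrieval compares images by the Hamming distance of their binary sequences.
    return int(np.count_nonzero(code1 != code2))
```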

Figure 1: Processing of the image with aHash

3.2 DBC-Forest

As an improved version of gcForestcs, DBC-Forest adopts a binning confidence screening mechanism to avoid the mispartitioning of instances with low accuracy into high-confidence regions. More specifically, the binning confidence screening mechanism packs all instances into bins, and DBC-Forest then calculates the accuracy of all bins to locate mispartitioned instances. This mechanism can be summarized as follows.

Step 1: Calculate the confidence of each instance at the current level. We utilize confidence as the criterion for selecting instances, and this is generated by the estimated class vector. For example, if the estimated class vector for instance x is (0.5, 0.3, 0.2), its confidence is 0.5.

Step 2: Rank instances based on their confidences. We then obtain instances $(\bm{x}_{s1},\bm{x}_{s2},\cdots,\bm{x}_{sm})$, where $\bm{x}_{s1}$ is the highest-confidence instance, and put these instances into $k$ bins. Each bin $\bm{b}_{t}$ ($1\leq t\leq k$) can be obtained according to Equation (1):

$(\bm{x}_{s(\frac{m(t-1)}{k}+1)},\bm{x}_{s(\frac{m(t-1)}{k}+2)},\dots,\bm{x}_{s(\frac{mt}{k})})\in\bm{b}_{t}$ (1)

Step 3: Calculate the accuracy of each bin. The accuracy $\bm{P}_{t}$ of $\bm{b}_{t}$ is calculated according to Equation (2):

$\bm{P}_{t}=\frac{\sum_{i=\frac{m}{k}\times(t-1)+1}^{m\times\frac{t}{k}}\mathds{1}[p(\bm{x}_{si})=\bm{y}_{si}]}{m/k}$ (2)

where $p(\bm{x}_{si})$ is the prediction of $\bm{x}_{si}$ at the current level, and $\bm{y}_{si}$ is the label of $\bm{x}_{si}$.

Step 4: Determine the threshold. In DBC-Forest, the accuracy of each bin is compared with the target accuracy $TA$, in order from bin 1 to bin $k$. If $\bm{P}_{j+1}$ is the first bin whose accuracy is lower than $TA$, then we utilize the confidence of $\bm{x}_{s(m\times\frac{j}{k})}$ as the threshold.

Step 5: Compare the confidences of the instances with the threshold. If the confidence of the instance is larger than the threshold, then the instance is used to produce a final prediction at the current level; otherwise, the instance is passed to the next level.
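
A minimal NumPy sketch of Steps 1 to 5 is given below. The function name, the assumption that $m$ is divisible by $k$, and the handling of the edge case where the first bin already misses the target accuracy are our own choices, not the authors' released code.

```python
import numpy as np

def binning_threshold(proba, y_true, k, target_acc):
    """Confidence threshold produced by binning confidence screening (sketch).

    proba: (m, n_classes) class-probability estimates at the current level.
    y_true: (m,) ground-truth labels; k: number of bins; target_acc: TA.
    Assumes m is divisible by k, as in Equations (1) and (2).
    """
    m = len(y_true)
    confidence = proba.max(axis=1)              # Step 1: confidence of each instance
    pred = proba.argmax(axis=1)
    order = np.argsort(-confidence)             # Step 2: rank by descending confidence
    correct = (pred[order] == y_true[order]).astype(float)
    bin_size = m // k
    for t in range(k):                          # Steps 3 and 4: per-bin accuracy vs. TA
        if correct[t * bin_size:(t + 1) * bin_size].mean() < target_acc:
            if t == 0:
                return None                     # even the first bin fails: pass everything on
            return confidence[order[t * bin_size - 1]]
    return confidence[order[-1]]                # every bin reaches TA: keep all instances
```

Step 5 (comparing confidences with the threshold) is then a simple mask such as `confidence > threshold`.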

4 Proposed Method

Section 4.1 introduces the hashing screening strategy, and Section 4.2 presents the window screening strategy. Section 4.3 describes HW-Forest and illustrates its framework.

4.1 Hashing Screening

Multi-grained scanning is a structure for processing high-dimensional data that can achieve excellent performance on image data. However, the time cost of multi-grained scanning is increased by the presence of redundant feature vectors, which are produced by sampling the same features from instances. Figure 2 shows the production of a redundant feature vector by a sliding window.

Figure 2: The production of redundant feature vectors. We select two instances from the MNIST dataset and assume the size of the sliding window is 2×2.

To reduce the number of redundant feature vectors, we apply the aHash algorithm to evaluate the similarity between feature vectors and eliminate the high similarity feature vectors. This process can be summarized as follows.

Step 1: Group feature vectors according to their locations, which gives $(\bm{v}_{1},\bm{v}_{2},\cdots,\bm{v}_{r})$ according to Equation (3):

$(\bm{f}_{r,1},\bm{f}_{r,2},\dots,\bm{f}_{r,m})\in\bm{v}_{r}$ (3)

The relationships between these vectors are illustrated in Figure 3. For example, $\bm{f}_{1,1}$ and $\bm{f}_{1,2}$ are members of group $\bm{v}_{1}$, and are the first feature vectors of instances $\bm{x}_{1}$ and $\bm{x}_{2}$, respectively.

Figure 3: The relationships among $(\bm{x}_{1},\bm{x}_{2},\cdots,\bm{x}_{m})$, $(\bm{v}_{1},\bm{v}_{2},\cdots,\bm{v}_{r})$, and $(\bm{f}_{1,1},\bm{f}_{1,2},\cdots,\bm{f}_{r,m})$. Feature vectors $\bm{f}_{1,1}$ and $\bm{f}_{1,2}$ are in group $\bm{v}_{1}$, since they are the first feature vectors of different instances.

Step 2: Obtain binary sequences for all feature vectors using aHash. For example, $(b_{r,i}(k_{1}),b_{r,i}(k_{2}),\dots,b_{r,i}(k_{c}))$ is the binary sequence of $\bm{f}_{r,i}$, where $c$ is the length of the binary sequence and $b_{r,i}(k_{j})$ is obtained by aHash.

Step 3: Calculate the distances between the features in each group. In group $\bm{v}_{r}$, the $j$-th feature distance $\bm{v}_{r}(k_{j})$ can be calculated using Equation (4):

$\bm{v}_{r}(k_{j})=\begin{cases}\frac{\sum_{i=1}^{m}b_{r,i}(k_{j})}{m}, & \text{if } \frac{\sum_{i=1}^{m}b_{r,i}(k_{j})}{m}<0.5\\ 1-\frac{\sum_{i=1}^{m}b_{r,i}(k_{j})}{m}, & \text{if } \frac{\sum_{i=1}^{m}b_{r,i}(k_{j})}{m}\geq 0.5\end{cases}$ (4)

Step 4: Calculate the distance of each group. This distance represents the differences between the feature vectors in the group: the larger the distance, the greater the difference between the feature vectors. The distance $d_{r}$ of group $\bm{v}_{r}$ can be calculated using Equation (5):

$d_{r}=\frac{\sum_{j=1}^{c}\bm{v}_{r}(k_{j})}{c}$ (5)

Step 5: Rank the groups in descending order based on their distances. We then obtain a new group order $(\bm{w}_{1},\bm{w}_{2},\cdots,\bm{w}_{r})$ and the corresponding distances $(e_{1},e_{2},\cdots,e_{r})$, where $e_{1}$ is the largest.

Step 6: Calculate the number of groups contained in each percentage of $D$, where $D$ is the sum of all distances, i.e., $D=\sum_{i=1}^{r}e_{i}$. We then obtain $(N(100),N(99),\cdots,N(1))$ based on $(e_{1},e_{2},\cdots,e_{r})$. The calculation of $N(g)$ is carried out using Equation (6):

$N(g)=\min\{m\,|\,\sum_{i=1}^{m}e_{i}\geq\frac{Dg}{100}\}$ (6)

Step 7: Determine the threshold $HT$. We compare $r/50$ with $N(100)-N(99),N(99)-N(98),\cdots,N(2)-N(1)$ in order. If $N(p)-N(p-1)$ is the first term that is lower than or equal to $r/50$, we utilize $e_{h}$ as the threshold $HT$, where $h$ is the value of $N(p)$ and $p\in\{100,99,\cdots,2\}$. $HT$ is calculated using Equation (7):

$HT=e_{N(p)}$ (7)

where $p=\max\{u\,|\,N(u)-N(u-1)\leq\frac{r}{50},u=100,99,\dots,2\}$.

Step 8: Compare the distances with $HT$. If the distance of a group is lower than the threshold, we eliminate the feature vectors in the group; otherwise, the feature vectors in the group are converted to class vectors and input into the cascade structure.

To illustrate the above steps, we present an example below.

Example 1. Suppose we have four groups $(\bm{v}_{1},\bm{v}_{2},\bm{v}_{3},\bm{v}_{4})$, for which the feature vectors are $(\bm{f}_{1,1}=(0,1)$, $\bm{f}_{1,2}=(0,1))$, $(\bm{f}_{2,1}=(1,1)$, $\bm{f}_{2,2}=(1,1))$, $(\bm{f}_{3,1}=(1,0)$, $\bm{f}_{3,2}=(0,1))$, and $(\bm{f}_{4,1}=(0,1)$, $\bm{f}_{4,2}=(1,0))$. We know that the values of $\bm{f}_{1,1}(k_{1})$ and $\bm{f}_{1,2}(k_{1})$ are zero. Hence, according to Equation (4), the value of $\bm{v}_{1}(k_{1})=\frac{0+0}{2}=0$. Similarly, the values of $(\bm{v}_{1}(k_{1})$, $\bm{v}_{1}(k_{2}))$, $(\bm{v}_{2}(k_{1})$, $\bm{v}_{2}(k_{2}))$, $(\bm{v}_{3}(k_{1})$, $\bm{v}_{3}(k_{2}))$, and $(\bm{v}_{4}(k_{1})$, $\bm{v}_{4}(k_{2}))$ are (0, 0), (0, 0), (0.5, 0.5), and (0.5, 0.5), respectively. Since $\bm{v}_{4}(k_{1})=0.5$ and $\bm{v}_{4}(k_{2})=0.5$, the distance of $\bm{v}_{4}$ is 0.5, i.e., $d_{4}=\frac{0.5+0.5}{2}=0.5$. Thus, $(d_{1},d_{2},d_{3},d_{4})$ are (0, 0, 0.5, 0.5). We then rank the groups according to their distances. Finally, we obtain $(\bm{w}_{1}$, $\bm{w}_{2}$, $\bm{w}_{3}$, $\bm{w}_{4})$, for which the distances are $(e_{1}=0.5,e_{2}=0.5,e_{3}=0,e_{4}=0)$. Hence, $D=\sum_{i=1}^{4}e_{i}=0.5+0.5+0+0=1$. We now obtain the values of $(N(100),N(99),\cdots,N(1))$. According to Equation (6), the values of $N(100),N(99),\dots,N(51)$ are 2, since $e_{1}+e_{2}\geq 100\times\frac{D}{100}=1$ and $e_{1}+e_{2}\geq 51\times\frac{D}{100}=0.51$, whereas $e_{1}$ alone is less than 0.51. Similarly, the values of $N(50),N(49),\cdots,N(1)$ are 1, since $e_{1}\geq 50\times\frac{D}{100}=0.50$ and $e_{1}\geq 1\times\frac{D}{100}=0.01$. We know that there are four groups, meaning that $r=4$ and $\frac{r}{50}=\frac{4}{50}=0.08$. According to Step 7, $p=100$, since $N(100)-N(99)$ is 0, which is the first value lower than 0.08. Since $p=100$ and $N(100)=2$, according to Equation (7), we select the second group distance $e_{2}$ as the threshold, i.e., $h=2$. Finally, we eliminate $\bm{w}_{3}$ and $\bm{w}_{4}$, since their distances $e_{3}$ and $e_{4}$ are lower than $e_{2}$. $\bm{w}_{1}$ and $\bm{w}_{2}$ are converted to class vectors and input into the cascade structure. Figure 4 illustrates the hashing screening mechanism.

Figure 4: Illustration of hashing screening. We select the distance of group 2 as the threshold, since $N(100)-N(99)$ is zero and is therefore lower than $\frac{4}{50}$.

Example 1 verifies that hashing screening can effectively eliminate the redundant feature vectors.
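
As a companion to Steps 1 to 8 and Example 1, the sketch below computes the group distances of Equations (4) and (5) and the threshold of Equations (6) and (7) with NumPy. The array layout, function name, and fallback when no percentage satisfies the Step 7 condition are our own assumptions rather than the paper's implementation.

```python
import numpy as np

def hashing_screening(binary_codes):
    """Indices of the feature-vector groups kept by hashing screening (sketch).

    binary_codes: array of shape (r, m, c): r groups, m instances, and a
    binary sequence of length c (aHash output) per feature vector.
    """
    r = binary_codes.shape[0]
    ratio = binary_codes.mean(axis=1)               # fraction of ones per bit, shape (r, c)
    v = np.where(ratio < 0.5, ratio, 1.0 - ratio)   # Equation (4): per-bit distance
    d = v.mean(axis=1)                              # Equation (5): group distance d_r
    order = np.argsort(-d)                          # Step 5: rank groups by distance
    e = d[order]
    cumulative = np.cumsum(e)
    D = cumulative[-1]                              # Step 6: D is the sum of all distances
    # N(g): smallest number of top-ranked groups whose distances cover g% of D.
    N = {g: int(np.searchsorted(cumulative, D * g / 100.0) + 1) for g in range(1, 101)}
    # Step 7: first g (scanning 100, 99, ...) with N(g) - N(g-1) <= r/50; fallback p = 2.
    p = next((g for g in range(100, 1, -1) if N[g] - N[g - 1] <= r / 50.0), 2)
    HT = e[N[p] - 1]                                # Equation (7): threshold HT
    return order[:np.count_nonzero(e >= HT)]        # Step 8: keep groups with distance >= HT
```

On the four groups of Example 1 (encoded as a 4×2×2 array), this sketch keeps exactly the two groups whose distance is 0.5.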

4.2 Window Screening

As an improved version of gcForestcs, DBC-Forest avoids the mispartitioning of instances and thus improves the performance. However, its performance depends on the hyperparameter $k$, which represents the number of bins used to partition instances. Specifically, different values of $k$ can lead to very different thresholds. Example 2 illustrates this phenomenon.

Example 2. Suppose we have 16 instances sorted by confidence, as illustrated in Figure 5. Of these, instances 1, 2, 4, 5, 6, 7, 8, and 11 are correct predictions, and the others are incorrect. We set two cases with $k$ = 8 and $k$ = 4 under the condition of the hyperparameter $TA$ = 70%. In the case of $k$ = 8, instance 2 is selected as the threshold in DBC-Forest, since the accuracy of bin II is 50%. In the case of $k$ = 4, instance 8 is selected, since the accuracy of bin III is 25%. This example shows that the performance of DBC-Forest depends on the hyperparameter $k$. In other words, DBC-Forest cannot achieve high accuracy if the hyperparameter $k$ is not suitable for the dataset.

Figure 5: Comparison of the different thresholds of DBC-Forest for different values of the hyperparameter $k$. Suppose $TA$ is 70%. DBC-Forest selects instance 2 as the threshold when $k=8$, since the accuracy of bin II is 0.5, which is lower than $TA$, and selects instance 8 as the threshold when $k=4$, since the accuracy of bin III is 0.25, which is lower than $TA$.

To address this drawback of DBC-Forest, we propose a window screening mechanism, in which a self-adaptive approach is used to calculate the threshold. Unlike the fixed window used in DBC-Forest, this approach uses windows with variable size. The mechanism can be summarized as follows.

Step 1: Rank the instances based on their confidences, and obtain $(\bm{x}_{1},\bm{x}_{2},\cdots,\bm{x}_{m})$.

Step 2: Establish a window $\bm{W}$ with parameters $c$, $u$, and $l$, where $c$ is the size of the window, and $u$ and $l$ are the starting and ending points of the window, respectively. Initially, the values of $c$, $u$, and $l$ are $m/2$, 1, and $m/2$, respectively. The window $\bm{W}$ can be defined according to Equation (8):

$(\bm{x}_{u},\bm{x}_{u+1},\dots,\bm{x}_{l})\in\bm{W}$ (8)

Step 3: Slide the window $\bm{W}$ using a step size of one (i.e., increase the values of $u$ and $l$), and calculate the accuracy of the window $P_{\bm{w}}$ using Equation (9):

$P_{\bm{w}}=\frac{\sum_{i=u}^{l}\mathds{1}[p(\bm{x}_{i})=\bm{y}_{i}]}{l-u+1}$ (9)

where $p(\bm{x}_{i})$ is the prediction of $\bm{x}_{i}$, and $\bm{y}_{i}$ is the label of $\bm{x}_{i}$. If the accuracy is lower than the target accuracy $TA$ and the value of $c$ is not two, the window stops sliding.

Step 4: Reset the window parameters: the value of $c$ is reduced by half, $u$ does not change, and $l=u+c-1$.

Step 5: Iterate Steps 3 and 4. When $c$ is equal to or lower than two, set the confidence of the first instance in the window as the threshold $WT$.

Step 6: Compare the confidence values of the instances with the threshold. If the confidence of the instance is larger than the threshold, then the instance produces a final prediction at the current level; otherwise, the instance is passed to the next level.

Unlike in the binning confidence screening approach, the threshold in window screening is obtained by a self-adaptive method. Moreover, the performance of this approach is better than that of DBC-Forest, since it does not rely on a bin-size hyperparameter that may be unsuitable for the dataset. An illustrative example is given below.

Example 3. In this example, we set the target accuracy $TA$ to 70%, and employ the same 16 instances as in Example 2. The process of window screening is shown in Figure 6. In stage 1, we set the initial values to $c=8$, $u=1$, and $l=8$. The accuracy of the window is 0.875, which is greater than $TA$, so we slide the window $\bm{W}$ using a step size of one. When $u=2$ and $l=9$, the accuracy of the window is also 0.875. However, when $u=3$ and $l=10$, the accuracy of the window is 0.625, which is less than $TA$. We then enter stage 2, and reset the values to $c=4$, $u=3$, and $l=6$. The accuracy of this window is 0.75, which is greater than $TA$. We then iterate this process until $c=4$, $u=7$, and $l=10$. In this case, the accuracy of the window is 0.50, which is less than $TA$. We then enter stage 3, and reset the values to $c=2$, $u=7$, and $l=8$. In this case, the accuracy of the window is 1.0, which is greater than $TA$. We iterate this process until $c=2$, $u=8$, and $l=9$. In this case, the accuracy of the window is 0.50, which is less than $TA$. Since $c=2$, the window does not shrink further. Hence, the threshold is the confidence of instance 8, which is the same as for the case of $k=4$ in Example 2.

Figure 6: Illustration of window screening. The size of the window $c$ is decreased when the accuracy of the window is lower than $TA$. When the size of the window is two and the accuracy is lower than $TA$, the confidence of the first instance is selected as the threshold.
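
The window screening procedure of Steps 1 to 6 can be sketched as follows. The zero-based indexing, the behaviour when the window reaches the end of the ranked list, and the function name are our own reading of the description rather than the authors' code; applied to the correctness pattern of Example 3 with $TA$ = 70%, it ends with the confidence of instance 8 as the threshold.

```python
import numpy as np

def window_threshold(proba, y_true, target_acc):
    """Confidence threshold WT found by window screening (sketch)."""
    m = len(y_true)
    confidence = proba.max(axis=1)
    pred = proba.argmax(axis=1)
    order = np.argsort(-confidence)              # Step 1: rank by descending confidence
    correct = (pred[order] == y_true[order]).astype(float)

    c, u = m // 2, 0                             # Step 2: initial window of size m/2
    while True:
        # Step 3: slide the window one instance at a time while its accuracy stays >= TA.
        while u + c <= m and correct[u:u + c].mean() >= target_acc:
            u += 1
        if c <= 2:                               # Step 5: the window cannot shrink further
            break
        c //= 2                                  # Step 4: halve the window, keep its start
    # The confidence of the first instance in the final window is the threshold WT.
    return confidence[order[min(u, m - 1)]]
```

Step 6 then keeps every instance whose confidence is larger than the returned threshold at the current level.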

4.3 HW-Forest

HW-Forest is an improved version of gcForest which consists of two parts: multi-grained scanning with hashing screening, and a cascade structure with window screening. Figure 7 illustrates the overall procedure of HW-Forest. In the first part, the instances are input into the multi-grained scanning stage. We then apply hashing screening to eliminate the redundant feature vectors, and the remaining feature vectors are converted into class vectors using a random forest algorithm. The second part is a cascade forest structure with a window screening mechanism that reduces the time cost and memory requirements. The class vectors produced from the original data are input into the first level. The window screening mechanism divides all instances into high-confidence and low-confidence instances at each level. High-confidence instances are used to produce the final prediction at the current level, while low-confidence instances are passed to the next level.

Figure 7: Framework of HW-Forest. A hashing screening strategy is adopted to eliminate redundant feature vectors, and window screening is applied to select high-confidence instances in the cascade forest.

Algorithm 1 shows pseudocode for HW-Forest.

Algorithm 1 HW-Forest
Input:  Training set $\bm{S}$, validation set $\bm{S}_{v}$, and the maximal number of cascade levels $T$.
Output:  HW-Forest model $f$.
1:  Get $(\bm{v}_{1},\bm{v}_{2},\cdots,\bm{v}_{r})$ by scanning $\bm{S}$;
2:  Calculate $HT$ using the hashing screening mechanism;
3:  Eliminate redundant feature vectors and obtain a new training set $\bm{S}'$;
4:  $t=1$, $l_{0}=1$, $\bm{S}_{1}=\bm{S}'$;
5:  while $t\leq T$ do
6:     $c=card(\bm{S}_{t})/2$, $u=1$, $l=c$;
7:     Calculate $WT_{t}$ using the window screening mechanism;
8:     $f_{t}=prediction(\bm{S}_{t})$;
9:     Compute the validation error $l_{t}=f_{t}(\bm{S}_{v})$;
10:     if $l_{t}>l_{t-1}$ then
11:        return  $f$;
12:     end if
13:     $f=f_{t}$;
14:     $\bm{S}_{t+1}=\bm{S}_{t}\backslash\{x\,|\,x>WT_{t},x\in\bm{S}_{t}\}$;
15:     $t=t+1$;
16:  end while
17:  return  $f$;
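
To make the control flow of Algorithm 1 concrete, the sketch below organizes the level-wise training and screening in Python. The helpers `train_level` and `window_threshold` (the function sketched in Section 4.2), the validation-based early stopping, and the omission of gcForest's feature re-representation between levels are all assumptions of this sketch, not the released implementation.

```python
import numpy as np

def train_hw_cascade(X, y, X_val, y_val, train_level, window_threshold,
                     target_acc, max_levels):
    """Level-wise cascade training with window screening (sketch of Algorithm 1).

    train_level(X, y) is assumed to return a fitted level exposing predict_proba().
    """
    model, prev_err = [], 1.0                   # l_0 = 1, so the first level is always kept
    idx = np.arange(len(y))                     # instances still passed through the cascade
    for t in range(max_levels):
        level = train_level(X[idx], y[idx])
        proba = level.predict_proba(X[idx])
        val_pred = level.predict_proba(X_val).argmax(axis=1)
        val_err = 1.0 - (val_pred == y_val).mean()
        if val_err > prev_err:                  # validation error stops improving: return f
            break
        model.append(level)
        prev_err = val_err
        wt = window_threshold(proba, y[idx], target_acc)   # WT_t at this level
        confidence = proba.max(axis=1)
        idx = idx[confidence <= wt]             # low-confidence instances go to the next level
        if len(idx) == 0:
            break
    return model
```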

5 Experiments

In this section, we report the results of some experiments carried out to assess the performance of HW-Forest. We also demonstrate the effects of the hashing screening and window screening processes. This section is organized as follows. Section 5.1 introduces the experimental setting, including the parameters, experimental environment, and baseline methods. Section 5.2 discusses the effect of hashing screening, and Section 5.3 illustrates the process of hashing screening by showing instances processed with this method. Section 5.4 discusses the influence of window screening, and Section 5.5 compares the thresholds used in DBC-Forest and HW-Forest to validate the effect of window screening. Section 5.6 compares HW-Forest with other state-of-the-art models on benchmark datasets.

5.1 Experimental Setup

Parameter Settings. In all of our experiments, the models used the same parameters, which were as follows. For the cascade structure, each level was produced by five-fold cross-validation and had a random forest and a completely-random forest. Each forest contained 50 decision trees. The number of cascade levels stopped increasing when the current level did not give an improvement in accuracy over the previous level. For multi-grained scanning, we established three windows to scan the high-dimensional data, with sizes of 4×4, 6×6, and 8×8. Each window had a random forest and a completely-random forest, and the number of decision trees in these two forests was 30. $TA$ was set to decrease the error rate by 1/2.

Datasets. We used nine benchmark datasets to verify the performance of HW-Forest. Table 1 summarizes the statistics of these datasets. Of these, MNIST, EMNIST, FASHION-MNIST, and QMNIST are commonly used high-dimensional datasets [31, 32, 33], while ADULT, BANK, YEAST, LETTER, and IMDB are low-dimensional datasets.

Table 1: Summary of datasets
Name Training Testing Features Labels
MNIST 56,000 14,000 784 10
EMNIST 105,280 26,320 784 10
FASHION-MNIST 56,000 14,000 784 10
QMNIST 96,000 24,000 784 10
ADULT 39,074 9,768 14 2
BANK (BANK MARKETING) 32,950 8,238 20 2
YEAST 1,187 297 8 10
LETTER 16,000 4,000 16 26
IMDB 40,000 10,000 5,000 2

Evaluation metrics and methodology. In all of our experiments, we used the predictive accuracy as a measure of the classification performance, as it is suitable for balanced datasets. The training time was used to evaluate the efficiency. Statistical tests were also adopted as measurement criteria.

Hardware. In these experiments, we used a machine with 2×2.20 GHz CPUs (10 cores) and 128 GB main memory.

Baseline methods. To validate the performance of HW-Forest, we selected four state-of-the-art algorithms as competitive models: gcForest [9], gcForestcs [12], DBC-Forest [13], and AWDF [25]. To verify the effect of our hashing screening mechanism, we introduced this mechanism into gcForest, gcForestcs, DBC-Forest, and AWDF, to give H-gcForest, H-gcForestcs, HDBC-Forest, and H-AWDF, respectively.

gcForest [9]: This is a deep learning model that is inspired by deep neural networks, and adopts multi-grained scanning and a cascade structure to process the data.

gcForestcs [12]: This is an improved gcForest model in which a confidence screening mechanism is introduced to screen the instances at each level.

DBC-Forest [13]: This is an improved gcForestcs model. To increase the accuracy of gcForestcs, binning confidence screening is used to avoid the mispartitioning of instances at each level.

AWDF [25]: This is an adaptive weighted model in which adaptive weighting is used for every training instance at each level.

H-gcForest, H-gcForestcs, HDBC-Forest, and H-AWDF: To verify the influence of the hashing screening mechanism, we introduced it into gcForest, gcForestcs, DBC-Forest, and AWDF, to give H-gcForest, H-gcForestcs, HDBC-Forest, and H-AWDF, respectively.

H-Forest (H-gcForest) and W-Forest: To validate the influence of two screening mechanisms, we developed H-Forest (H-gcForest) and W-Forest. These are versions of HW-Forest without the window and hashing screening mechanisms, respectively.

To validate the performance of HW-Forest, we developed the following five research questions (RQ).

RQ 1: What is the effect of hashing screening, and can it be adopted in other models?

RQ 2: What is the effect of hashing screening with different window sizes?

RQ 3: Compared with DBC-Forest, what is the effect of HW-Forest at each level?

RQ 4: What is the difference between the thresholds in DBC-Forest and HW-Forest?

RQ 5: Compared with state-of-the-art alternatives, what is the performance of HW-Forest?

To answer RQ 1, we introduced hashing screening into gcForest, gcForestcs, DBC-Forest, and AWDF, and compared their accuracies with the original models to verify the influence of hashing screening. To address RQ 2, we selected 10 instances from FASHION-MNIST to investigate the effect of hashing screening with different window sizes, and to explore the performance of HW-Forest with different window sizes. In response to RQ 3, we compared the accuracy and the number of instances between DBC-Forest and HW-Forest at each level. Specifically, we used MNIST, EMNIST, FASHION-MNIST, QMNIST, and IMDB in this experiment. To answer RQ 4, we explored the thresholds used in the window screening and binning confidence screening mechanisms to show the difference. For RQ 5, we compared the performance of HW-Forest with gcForest, gcForestcs, DBC-Forest, AWDF, H-Forest, and W-Forest.

5.2 Influence of Hashing Screening

To analyze the effect of the hashing screening mechanism, we conducted experiments to compare the performance between pairs of models: gcForest and H-gcForest, gcForestcs and H-gcForestcs, DBC-Forest and HDBC-Forest, AWDF and H-AWDF, and W-Forest and HW-Forest. Experiments were conducted on the MNIST, EMNIST, FASHION-MNIST, and QMNIST datasets, since these need to be processed by multi-grained scanning. The results of this comparison are shown in Tables 2 to 6.

Table 2: Comparison of performance between gcForest and H-gcForest
Accuracy (%) Time(s)
Dataset gcForest H-gcForest Difference gcForest H-gcForest Difference
MNIST 98.77±0.17 98.75±0.12 -0.02 1972.27±6.39 1464.86±11.54 507.41 (34.70%)
EMNIST 86.18±0.24 86.13±0.28 -0.05 8774.77±108.08 6920.12±51.39 1854.65 (26.80%)
FASHION-MNIST 89.94±0.29 90.05±0.29 0.09 2411.10±10.28 2272.63±25.18 138.47 (6.09%)
QMNIST 98.93±0.06 98.92±0.06 -0.01 2332.79±31.80 1790.10±17.71 542.69 (30.32%)
Table 3: Comparison of performance between gcForestcs and H-gcForestcs
Accuracy (%) Time(s)
Dataset gcForestcs H-gcForestcs Difference gcForestcs H-gcForestcs Difference
MNIST 98.20±0.09 98.19±0.12 -0.01 1636.00±13.47 1153.50±8.65 482.5 (41.83%)
EMNIST 87.11±0.23 87.13±0.30 0.02 5945.37±159.68 5313.32±128.73 532.05 (11.90%)
FASHION-MNIST 89.94±0.29 90.03±0.30 0.07 1943.09±6.50 1850.41±8.41 92.47 (5.01%)
QMNIST 98.40±0.12 98.41±0.12 0.01 1991.11±17.71 1454.08±14.77 537.03 (36.93%)
Table 4: Comparison of performance between DBC-Forest and HDBC-Forest
Accuracy (%) Time(s)
Dataset DBC-Forest HDBC-Forest Difference DBC-Forest HDBC-Forest Difference
MNIST 99.03±0.07 99.01±0.06 -0.02 1636.10±17.79 1159.99±9.90 479.11 (41.04%)
EMNIST 86.82±0.26 86.83±0.16 0.01 7226.23±112.34 6349.11±162.52 877.12 (13.81%)
FASHION-MNIST 90.57±0.29 90.70±0.20 0.13 2056.95±13.38 1928.22±13.91 128.73 (6.68%)
QMNIST 99.08±0.12 99.05±0.06 -0.03 1979.00±22.14 1467.99±21.23 511.01 (34.81%)
Table 5: Comparison of performance between AWDF and H-AWDF
Accuracy (%) Time(s)
Dataset AWDF H-AWDF Difference AWDF H-AWDF Difference
MNIST 98.80±0.11 98.80±0.11 0.00 1697.79±18.36 1192.30±11.95 505.49 (42.40%)
EMNIST 86.74±0.18 86.70±0.25 -0.04 7385.44±103.03 6664.62±201.76 720.82 (10.82%)
FASHION-MNIST 89.89±0.27 90.05±0.14 0.06 1821.65±15.30 1734.06±16.83 87.59 (5.05%)
QMNIST 98.93±0.08 98.95±0.10 0.02 1988.55±66.69 1458.53±14.06 530.02 (36.34%)
Table 6: Comparison of performance between W-Forest and HW-Forest
Accuracy (%) Time(s)
Dataset W-Forest HW-Forest Difference W-Forest HW-Forest Difference
MNIST 99.05±0.08 99.07±0.09 -0.02 1648.92±20.56 1135.42±9.32 513.5 (45.23%)
EMNIST 87.24±0.18 87.22±0.17 -0.02 6856.64±80.16 6408.14±65.33 448.5 (7.00%)
FASHION-MNIST 90.77±0.34 90.79±0.24 0.02 1990.47±13.91 1904.17±22.82 86.30 (4.53%)
QMNIST 99.13±0.05 99.13±0.05 0.00 1965.98±34.68 1473.09±12.00 489.89 (33.46%)

The experimental results give rise to the following observations.

1. Hashing screening effectively decreases the time cost for all datasets. For example, from Table 3, it can be seen that H-gcForestcs has a time cost that is 482.5 s (41.83%) lower than for gcForestcs on the MNIST dataset. This phenomenon can also be seen in the other tables. The reason for this is as follows. We know that the greater the number of feature vectors, the higher the time cost. Hashing screening can effectively eliminate the redundant feature vectors, and hence the time cost is reduced.

2. Hashing screening does not influence the accuracy of the models. The results of 20 experiments are presented in Tables 2 to 6, and it can be seen that the models with hashing screening have better accuracy than the original models in 11 experiments, worse accuracy than the original models in seven experiments, and the same as the original models in two experiments. For example, Table 3 shows that H-gcForestcs has better accuracy than gcForestcs on three datasets (EMNIST, FASHION-MNIST, and QMNIST) and worse accuracy than gcForestcs on MNIST. The reason for this is as follows. Since the important feature vectors have larger distances than the redundant feature vectors, hashing screening effectively screens for important feature vectors based on their distances. More importantly, these redundant feature vectors make lower contributions to the classification accuracy. Hence, the models using hashing screening have almost the same accuracy as the original models.

To further evaluate the influence of hashing screening, we applied a $t$-test for statistical hypothesis testing.

To measure the influence of the hashing screening mechanism, we applied a $t$-test to the time cost and accuracy. More specifically, we compared the accuracy and time cost of the models at each fold, and obtained the difference values $(D_{1},D_{2},\cdots,D_{k})$. We then calculated the statistic according to $\left|\sqrt{k}\mu/\sigma\right|$, where $k$ is the number of cross-validation folds, $\mu$ is the mean of $(D_{1},D_{2},\cdots,D_{k})$, and $\sigma$ is the standard deviation of $(D_{1},D_{2},\cdots,D_{k})$. The results for the time cost and accuracy are shown in Tables 7 and 8, respectively.

Table 7: Significance testing for time cost
DATASET H-gcForest H-gcForestcs HDBC-Forest H-AWDF HW-Forest
MNIST R (192.15) R (52.08) R (42.58) R (44.3) R (44.30)
EMNIST R (28.02) R (6.56) R (6.83) R (5.99) R (5.36)
FASHION-MNIST R (12.58) R (35.44) R (11.49) R (7.71) R (7.71)
QMNIST R (25.45) R (44.35) R (34.31) R (28.76) R (28.76)
Table 8: Significance testing for accuracy
DATASET H-gcForest H-gcForestcs HDBC-Forest H-AWDF HW-Forest
MNIST A (0.88) A (1.52) A (0.47) A (1.18) A (0.60)
EMNIST A (0.08) A (1.99) A (0.13) A (0.98) A (0.75)
FASHION-MNIST A (1.15) A (1.89) A (0.80) A (1.17) A (0.09)
QMNIST A (0.87) A (0.33) A (1.13) A (0.40) A (0.84)

We can reject (R) or accept (A) the null hypothesis based on these statistics. For example, for the MNIST dataset in Table 8, the accuracy of H-gcForest is not significantly different from that of gcForest, since $T_{0.05,4}$ is 2.13, which is larger than the $t$-test statistic of 0.88. Furthermore, for the MNIST dataset, Table 7 shows that the time costs of gcForest and H-gcForest are significantly different, since the $t$-test statistic is 192.15, which is much larger than 2.13. The results of these statistical hypothesis tests show that hashing screening effectively reduces the time cost by eliminating redundant feature vectors, without influencing the accuracy. More importantly, when used with other deep forest models, hashing screening achieves the same accuracy as the original models.

Based on the above experimental results and statistical analysis, we can conclude that hashing screening can reduce the time cost without influencing the accuracy.
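
For reference, the statistic $\left|\sqrt{k}\mu/\sigma\right|$ used above can be computed as in the short sketch below; the sample standard deviation (ddof = 1) and the per-fold numbers in the usage comment are illustrative assumptions.

```python
import numpy as np

def paired_t_statistic(scores_a, scores_b):
    """|sqrt(k) * mean(D) / std(D)| for per-fold differences D_i = a_i - b_i."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    k = len(d)
    return abs(np.sqrt(k) * d.mean() / d.std(ddof=1))  # sample standard deviation assumed

# Illustrative usage with made-up per-fold accuracies (k = 5 folds):
# stat = paired_t_statistic([0.990, 0.991, 0.989, 0.990, 0.992],
#                           [0.989, 0.990, 0.990, 0.989, 0.991])
# The null hypothesis is rejected when stat exceeds the critical value (2.13 here).
```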

5.3 Effect of Hashing Screening

In this section, we compared raw instances with instances processed by hashing screening, to explore the effect of the hashing screening mechanism. We conducted experiments with windows of size 2×2, 4×4, 6×6, and 8×8, and selected 10 instances from the FASHION-MNIST dataset. The results are shown in Figure 8.

(a) Ten raw instances.
(b) Processing by hashing screening with a window of size 2×2
(c) Processing by hashing screening with a window of size 4×4
(d) Processing by hashing screening with a window of size 6×6
(e) Processing by hashing screening with a window of size 8×8
Figure 8: Comparison of raw and processed instances

Two observations can be made from these results.

1. The number of redundant feature vectors produced by a large window is significantly lower than for a small window. For example, the number of redundant feature vectors in Figure 8(e) is two, for a window size of 8×8; however, the number of redundant feature vectors in Figure 8(b) is 55, for a window size of 2×2. The reason for this is that on the FASHION-MNIST dataset, the difference between the numbers of redundant and important feature vectors is small. As the window size increases, the probability of containing important feature vectors increases, and thus the difference becomes smaller. Hence, with an increase in the window size, the number of redundant feature vectors decreases.

2. Hashing screening reduces the number of redundant feature vectors while retaining important information. For example, as shown in Figure 8(b), hashing screening retains the important features of instances, such as the toes of shoes and the sleeves of coats. The same phenomenon can be seen in the other figures. The reason for this is as follows. Since the important feature vectors are significantly different from each other, the distance between them is larger than for the redundant feature vectors. The redundant feature vectors are therefore eliminated and the important feature vectors are retained based on their distances.

To further explore the effect of the hashing screening mechanism, we conducted four experiments with windows of size 2×2, 4×4, 6×6, and 8×8. In each experiment, we compared the accuracy and time cost of W-Forest and HW-Forest for different window sizes. The results are shown in Table 9.

Table 9: Comparison of performance between W-Forest and HW-Forest for different window sizes
Accuracy (%) Time(s)
Window size W-Forest HW-Forest Difference W-Forest HW-Forest Difference
2×2 89.98±0.27 90.05±0.18 0.07 989.19±9.25 762.11±10.32 227.08 (29.80%)
4×4 90.57±0.27 90.66±0.10 0.09 1,049.75±8.36 953.53±9.68 96.22 (10.09%)
6×6 90.28±0.28 90.28±0.28 0.00 1,077.05±11.31 1,020.22±11.81 56.83 (5.57%)
8×8 98.98±0.33 98.98±0.33 0.00 1,192.44±11.48 1,184.81±13.19 7.63 (0.64%)

Based on these results, two observations can be made.

1. With an increase in the window size, the time saved by hashing screening decreases. As shown in Table 9, the time cost is reduced by 227.08 s (29.80%) and 56.83 s (5.57%) when the window sizes are 2×2 and 6×6, respectively. The reason for this is as follows. As in the previous analysis, with an increase in the window size, the number of redundant feature vectors decreases, and hence the time saved by eliminating them decreases.

2. Hashing screening does not affect the accuracy of the model with different window sizes. For example, HW-Forest has the same accuracy as W-Forest for window sizes of 8×8 and 6×6, and has better accuracy than W-Forest when the window sizes are 4×4 and 2×2. This phenomenon indicates that using the distance of the feature vector as the criterion for eliminating redundant feature vectors can effectively identify redundant feature vectors with different window sizes. In other words, hashing screening is robust under different window sizes.

5.4 Influence of Window Screening

To validate the effect of window screening, we observed the accuracy and the number of instances at each level. We selected DBC-Forest as the competitive algorithm and used the MNIST, EMNIST, FASHION-MNIST, QMNIST, and IMDB datasets, since our model and the competitive algorithm produce more levels on these datasets. Comparisons of the accuracy and numbers of instances for MNIST, EMNIST, FASHION-MNIST, QMNIST, and IMDB are shown in Figures 9 to 13, respectively.

Figure 9: Comparison of accuracy and number of instances for the MNIST dataset
Figure 10: Comparison of accuracy and number of instances for the EMNIST dataset
Figure 11: Comparison of accuracy and number of instances for the FASHION-MNIST dataset
Figure 12: Comparison of accuracy and number of instances for the QMNIST dataset
Figure 13: Comparison of accuracy and number of instances for the IMDB dataset

These results give rise to the following observations.

1. HW-Forest achieves higher accuracy than DBC-Forest at each level. For example, it can be seen from Figure 13 that for the IMDB dataset, the accuracy of HW-Forest is 0.09% higher than DBC-Forest at the second level, and 0.19% higher than DBC-Forest at the last level. The reason for this is that window screening is a self-adaptive mechanism, and produces more reasonable thresholds than the binning screening mechanism without the need for hyperparameter tuning. Hence, the instances screened by HW-Forest have higher accuracy than those screened by DBC-Forest at each level.

2. HW-Forest has a lower number of instances than DBC-Forest at each level. For example, on the MNIST dataset, HW-Forest has 1,213 fewer instances than DBC-Forest at the second level, and 146 fewer instances than DBC-Forest at the last level. The reason for this is that since the performance of DBC-Forest depends on the hyperparameters, it is not easy to find high-confidence instances. However, HW-Forest adopts a self-adaptive mechanism to precisely select high-confidence instances.

5.5 Effect of Window Screening

In this section, we explore the difference between window screening and binning confidence screening. To demonstrate the performance of our approach on different types of datasets, we used MNIST, IMDB, LETTER, and BANK. Comparisons of the thresholds for the MNIST, IMDB, LETTER, and BANK datasets are shown in Figures 14 to 17, respectively.

Figure 14: Comparison of thresholds for the MNIST dataset
Figure 15: Comparison of thresholds for the IMDB dataset
Figure 16: Comparison of thresholds for the LETTER dataset
Figure 17: Comparison of thresholds for the BANK dataset

As shown in Figures 14 to 17, HW-Forest gives a more reasonable threshold than DBC-Forest. For example, on the MNIST dataset, DBC-Forest cannot avoid accuracy fluctuations, and some high-confidence instances screened with this threshold have low accuracy. Using HW-Forest, the screened high-confidence instances have significantly higher accuracy. The reason for this is that unlike the dependence of DBC-Forest on hyperparameter tuning, HW-Forest selects instances at each level without the need for hyperparameter tuning. Hence, HW-Forest gives a more reasonable threshold than DBC-Forest.

5.6 Performance of HW-Forest

In this section, to validate the accuracy of HW-Forest, we used gcForest, gcForestcs, DBC-Forest, AWDF, H-Forest, and W-Forest as competitive algorithms. Experiments were conducted on nine benchmark datasets, and comparisons of the results for accuracy and time cost are shown in Tables 10 and 11, respectively.

Table 10: Comparison of accuracy (%)
DATASET gcForest gcForestcs DBC-Forest AWDF H-Forest W-Forest HW-Forest
MNIST 98.77±0.12 98.20±0.09 99.03±0.07 98.80±0.11 98.75±0.12 99.05±0.08 99.07±0.09
EMNIST 86.18±0.24 87.11±0.23 86.82±0.26 86.74±0.18 86.13±0.28 87.24±0.18 87.22±0.17
FASHION-MNIST 89.94±0.29 89.94±0.29 90.57±0.22 89.89±0.27 90.05±0.29 90.77±0.34 90.79±0.24
QMNIST 98.93±0.06 98.40±0.12 99.08±0.08 98.93±0.08 98.92±0.06 99.13±0.05 99.13±0.05
ADULT 85.99±0.37 86.04±0.35 86.11±0.29 85.86±0.15 85.99±0.37 86.18±0.65 86.18±0.65
BANK 91.43±0.23 91.58±0.19 91.62±0.16 91.45±0.23 91.43±0.23 91.60±0.22 91.60±0.22
YEAST 62.06±3.77 62.00±3.50 62.13±3.70 62.53±4.10 62.06±3.77 62.67±2.07 62.67±2.07
IMDB 88.81±0.10 88.88±0.14 89.39±0.32 88.94±0.23 88.81±0.10 89.58±0.26 89.58±0.26
LETTER 97.02±0.22 96.83±0.27 97.07±0.23 96.65±0.18 97.02±0.22 97.19±0.16 97.19±0.16
Table 11: Comparison of time cost (s)
DATASET gcForest gcForestcs DBC-Forest AWDF H-Forest W-Forest HW-Forest
MNIST 1972.27±6.39 1636.00±13.47 1636.10±17.79 1697.79±18.36 1464.86±11.54 1648.92±20.56 1135.42±9.32
EMNIST 8774.77±108.08 5945.37±159.68 7926.23±112.34 7385.44±103.03 6920.12±51.39 6856.64±80.16 6408.14±65.33
FASHION-MNIST 2411.10±10.28 1943.09±6.50 2056.95±13.38 1821.65±15.30 2272.63±25.18 1990.47±13.91 1904.17±22.82
QMNIST 2332.79±31.80 1991.11±17.71 1979.00±22.14 1988.55±66.69 1790.10±17.71 1965.98±34.68 1473.09±12.00
ADULT 10.63±0.18 9.88±0.08 10.18±0.23 10.21±0.12 10.63±0.18 10.32±0.18 10.15±0.08
BANK 12.68±0.13 9.83±0.26 10.84±0.07 11.64±0.06 12.77±0.16 11.86±0.11 12.056±0.12
YEAST 8.52±0.04 8.47±0.07 8.56±0.09 10.65±0.23 8.52±0.04 8.69±0.3 8.69±0.33
IMDB 1,257.24±6.28 475.65±10.53 753.52±12.18 1,109.72±16.22 1,257.24±6.28 551.96±8.41 551.96±8.41
LETTER 11.89±0.18 9.02±0.10 9.02±0.10 10.68±0.12 11.89±0.18 9.63±0.14 9.63±0.14

From these results, we can make the following observations.

1. As shown in Table 10, HW-Forest achieves the highest accuracy on most datasets. For example, on the MNIST dataset, HW-Forest achieves the highest accuracy of 99.07%. The same phenomenon can be found for the other datasets. The reason for this is that HW-Forest uses window screening to improve the accuracy of DBC-Forest, and this produces more reasonable thresholds than the binning screening mechanism. Hence, HW-Forest achieves higher accuracy than DBC-Forest, and also achieves higher accuracy than the other models. As shown in Table 11, for high-dimensional datasets, HW-Forest runs faster than the competitive models in most cases; for low-dimensional datasets, HW-Forest runs slower than gcForestcs, but faster than the other competitive models. For example, on the MNIST dataset, the time cost of HW-Forest is 1135.42 s, which is faster than all other models. On the ADULT dataset, the time cost of HW-Forest is 10.15 s, which is slower than gcForestcs but faster than the other competitive models. The reason for this is that for high-dimensional datasets, deep forest models take a long time to build the classification models, and if we can reduce the number of feature vectors, the running time can be significantly reduced. HW-Forest employs hashing screening to effectively eliminate redundant feature vectors, and therefore runs faster than the competitive models on high-dimensional datasets. However, for low-dimensional datasets, the deep forest models take only a short time to build the classification models, and these datasets do not need to be processed by multi-grained scanning. Hence, the running time of precisely selecting instances at each level cannot be neglected, and HW-Forest runs slower than gcForestcs.

2. HW-Forest and W-Forest achieve the same accuracy on most datasets. For example, from Table 10, we can see that for the QMNIST dataset, HW-Forest and W-Forest achieve the same accuracy of 99.13%. The same effect can be seen for the ADULT, BANK, YEAST, IMDB, and LETTER datasets. For the MNIST, FASHION-MNIST, and EMNIST datasets, the accuracy results are close. The reason for this is that low-dimensional datasets do not need to be processed by multi-grained scanning; the results of HW-Forest are therefore the same as those of W-Forest, since W-Forest is simply the HW-Forest model without the hashing screening mechanism. For high-dimensional datasets, as demonstrated in our analysis of the effect of hashing screening in Section 5.3, the hashing screening mechanism does not influence the accuracy of the model. Hence, the accuracies of HW-Forest and W-Forest are almost the same.

To further evaluate the performance of HW-Forest, we applied Friedman and Nemenyi tests [34] for statistical hypothesis testing.

1. Friedman test: This test ranks the models according to their accuracy, where the highest is given a score of one, the second highest a score of two, and so on. The mean ranks of the gcForest, gcForestcs, DBC-Forest, AWDF, HW-Forest, H-Forest, and W-Forest models were 5.89, 5.33, 3, 5.11, 1.61, 5.44, and 1.61, respectively. We then calculated the Friedman test statistic according to $(N-1)\times\tau_{\chi^{2}}/(N\times(k-1)-\tau_{\chi^{2}})$, where $\tau_{\chi^{2}}=12\times N\times(\sum_{i=1}^{k}r_{i}^{2}-k\times(k+1)^{2}/4)/(k\times(k+1))$, and $N$ and $k$ are the numbers of datasets and models, respectively. The Friedman statistic was 24.371, which was larger than $T_{0.05,4}$ = 2.132. Hence, we reject the null hypothesis.

2. Nemenyi test: To further compare the models, we adopted the Nemenyi test to calculate the critical difference value. The result was 2.742, according to the formula $q_{0.1,7}\sqrt{k(k+1)/6N}$, where $q_{0.1,7}$ = 2.693. As shown in Figure 18, the accuracy of HW-Forest is the same as that of W-Forest, which indicates that HW-Forest is statistically consistent with W-Forest. More importantly, the accuracy of HW-Forest is significantly higher than that of the other models except W-Forest, meaning that HW-Forest statistically outperforms the other competitive models.
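
The two statistics above can be reproduced from the reported mean ranks as in the sketch below; the constants ($N$ = 9 datasets, $k$ = 7 models, $q_{0.1,7}$ = 2.693) are taken from the text, and small numerical differences arise because the printed mean ranks are rounded.

```python
import numpy as np

def friedman_statistic(mean_ranks, n_datasets):
    """Friedman statistic with the F-distribution correction used in the paper."""
    r = np.asarray(mean_ranks, dtype=float)
    k = len(r)
    chi2 = 12.0 * n_datasets * (np.sum(r ** 2) - k * (k + 1) ** 2 / 4.0) / (k * (k + 1))
    return (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)

def nemenyi_cd(q_alpha, k, n_datasets):
    """Critical difference of the Nemenyi post-hoc test."""
    return q_alpha * np.sqrt(k * (k + 1) / (6.0 * n_datasets))

# Mean ranks for gcForest, gcForestcs, DBC-Forest, AWDF, HW-Forest, H-Forest, W-Forest:
ranks = [5.89, 5.33, 3.00, 5.11, 1.61, 5.44, 1.61]
print(friedman_statistic(ranks, n_datasets=9))       # ~24 with these rounded ranks
print(nemenyi_cd(q_alpha=2.693, k=7, n_datasets=9))  # ~2.742
```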

Figure 18: Results of a Nemenyi test, where the dots show the mean value for each model, and the horizontal bar across each dot denotes the critical difference

6 Conclusion

To improve the performance and efficiency of the deep forest model, we propose a novel deep forest model called HW-Forest in which two mechanisms are applied to improve the performance: hashing screening and window screening. Hashing screening employs the feature distance, calculated with a perceptual hashing algorithm, as the criterion for selecting feature vectors; the number of redundant feature vectors can therefore be reduced. We also propose a window screening mechanism which produces the threshold through a self-adaptive process over the length of the window. This method can effectively find the threshold without relying on a bin-size hyperparameter that may be unsuitable for the dataset. Our experimental results demonstrate that hashing screening effectively reduces the time cost of multi-grained scanning, and that the window screening mechanism outperforms the binning confidence screening mechanism. More importantly, our experimental results indicate that HW-Forest gives better performance than other competitive models, and the improvement is statistically significant.

Acknowledgement

This work was supported by National Natural Science Foundation of China (61976240, 52077056, 91746209), National Key Research and Development Program of China (2016YFB1000901), Natural Science Foundation of Hebei Province, China (Nos. F2020202013, E2020202033), and Graduate Student Innovation Program of Hebei Province (CXZZSS2021026).

References

  • [1] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553, 436-444.
  • [2] Shuhui Cheng, Youxi Wu, Yan Li, Fang Yao, and Fan Min. 2021. TWD-SFNN: Three-way decision with single hidden layer feedforward neural network. Information Sciences 579, 15-32.
  • [3] Youxi Wu, Lanfang Luo, Yan Li, Lei Guo, Philippe Fournier-Viger, Xingquan Zhu, and Xindong Wu. 2021. NTP-Miner: Nonoverlapping three-way sequential pattern mining. ACM Transactions on Knowledge Discovery from Data. doi:10.1145/3480245.
  • [4] Dong Liu, YouXi Wu, and He Jiang. 2016. FP-ELM: An online sequential learning algorithm for dealing with concept drift. Neurocomputing 207, 322-334.
  • [5] YouXi Wu, Dong Liu, and He Jiang. 2017. Length-changeable incremental extreme learning machine. Journal of Computer Science and Technology 32, 3, 630-643.
  • [6] Ziyang Zhang, Chenrui Duan, Tao Lin, Shoujun Zhou, Yuanquan Wang, and Xuedong Gao. 2020. GVFOM: a novel external force for active contour based image segmentation. Information Sciences 506, 1-18.
  • [7] Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, and Björn Schuller. 2018. Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments. ACM Transactions on Intelligent Systems and Technology 9, 5, 49.
  • [8] Jipeng Qiang, Zhenyu Qian, Yun Li, Yunhao Yuan, Xindong Wu. 2020. Short text topic modeling techniques, applications, and performance: A survey. IEEE Transactions on Knowledge and Data Engineering. DOI:10.1109/TKDE.2020.2992485.
  • [9] Zhihua Zhou and Ji Feng. 2019. Deep forest. National Science Review 6, 1, 74-86.
  • [10] Ran Su, Xinyi Liu, Leyi Wei, and Quan Zou. 2019. Deep-Resp-Forest: A deep forest model to predict anti-cancer drug response. Methods 166, 91-102
  • [11] Feng Yang, Qizhi Xu, Bo Li, and Yan Ji. 2018. Ship detection from thermal remote sensing imagery through region-based deep forest. IEEE Geoscience and Remote Sensing Letters, 449-453.
  • [12] M. Pang, K. Ting, P. Zhao, and Z. Zhou. 2020. Improving deep forest by screening, IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2020.3038799.
  • [13] Pengfei Ma, Youxi Wu, Yan Li, Lei Guo, and Zhao Li. 2022. DBC-Forest: Deep forest with binning confidence screening. Neurocomputing 475, 112-122.
  • [14] Youxi Wu, Meng Geng, Yan Li, Lei Guo, Zhao Li, Philippe Fournier-Viger, Xingquan Zhu, and Xindong Wu. 2021. HANP-Miner: High average utility nonoverlapping sequential pattern mining. Knowledge-Based Systems 229. doi:10.1016/j.knosys.2021.107361.
  • [15] Liang Sun, Zhanhao Mo, Fuhua Yan, Liming Xia, Fei Shan, Zhongxiang Ding, Bin Song, Wanchun Gao, Wei Shao, Feng Shi, Huan Yuan, Huiting Jiang, Dijia Wu, Ying Wei, Yaozong Gao, He Sui, Daoqiang Zhang, and Dinggang Shen. 2020. Adaptive feature selection guided deep forest for COVID-19 classification with chest CT. IEEE Journal of Biomedical and Health Informatics 24, 10, 2798-2805.
  • [16] Yang Guo, Shuhui Liu, Zhanhuai Li, and Xuequn Shang. 2018. BCDForest: A boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data. BMC Bioinformatics 19, S5, 1-13.
  • [17] Yunyun Dong, Wenkai Yang, Jiawen Wang, Juanjuan Zhao, and Yan Qiang. 2019. MLW-gcForest: A multi-weighted gcForest model for cancer subtype classification by methylation data. Applied Sciences 9, 17, 3589.
  • [18] Shouxiang Wang, Haiwen Chen, Luyang Guo, and Di Xu. 2021. Non-intrusive load identification based on the improved voltage-current trajectory with discrete color encoding background and deep-forest classifier. Energy and Buildings 244, 2, 111043.
  • [19] Juan Cheng, Meiyao Chen, Chang Li, Yu Liu, Rencheng Song, Aiping Liu, and Xun Chen. 2020. Emotion recognition from multi-channel eeg via deep forest. IEEE Journal of Biomedical and Health Informatics 25, 2, 453-464.
  • [20] Yinfeng Fang, Haiyang Yang, Xuguang Zhang, Han Liu, and Bo Tao. 2020. Multi-feature input deep forest for EEG-based emotion recognition. Frontiers in Neurorobotics 14, 617531.
  • [21] Liu Zhang, Heng Sun, Zhenhong Rao, and Haiyan Ji. 2020. Hyperspectral imaging technology combined with deep forest model to identify frost-damaged rice seeds. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 229, 117973.
  • [22] Kheir Eddine Daouadi, Rim Zghal Rebaï, and Ikram Amous. 2021. Optimizing semantic deep forest for tweet topic classification. Information Systems 101, 101801.
  • [23] Meng Zhou, Xianhua Zeng, and Aozhu Chen. 2019. Deep forest hashing for image retrieval. Pattern Recognition 95, 114-127.
  • [24] Chao Ma, Zhenbing Liu, Zhiguang Cao, Wen Song, Jie Zhang, and Weiliang Zeng. 2020. Cost-sensitive deep forest for price prediction. Pattern Recognition 107, 107499.
  • [25] Lev V Utkin, Andrei V Konstantinov, Viacheslav S Chukanov, and Anna A Meldo. 2020. A new adaptive weighted deep forest and its modifications. International Journal of Information Technology & Decision Making 19, 04, 963-986.
  • [26] Liang Yang, Xizhu Wu, Yuan Jiang, and Zhihua Zhou. 2020. Multi-label learning with deep forest. in: European Conference on Artificial Intelligence 325, 1634-1641.
  • [27] Zheng Zhang, Xiaofeng Zhu, Guangming Lu, and Yudong Zhang. 2021. Probability ordinal-preserving semantic hashing for large-scale image retrieval. ACM Transactions on Knowledge Discovery from Data. doi:10.1145/3442204.
  • [28] Huifang Yang, Chenghao Tu, and Chusong Chen. 2021. Learning binary hash codes based on adaptable label representations. IEEE Transactions on Neural Networks and Learning Systems. doi:10.1109/TNNLS.2021.3095399.
  • [29] Yi Xu, Xianglong Liu, Binshuai Wang, Renshuai Tao, Ke Xia, and Xianbin Cao. 2021. Fast nearest subspace search via random angular hashing. IEEE Transactions on Multimedia 23, 342-352.
  • [30] Wen Fang, Haimiao Hu, Zihao Hu, Shengcai Liao, and Bo Li. 2018. Perceptual hash-based feature description for person re-identification. Neurocomputing 272, 520-531.
  • [31] Kafeng Wang, Haoyi Xiong, Jiang Bian, Zhanxing Zhu, Qian Gao, and Zhishan Guo. 2021. Sampling sparse representations with randomized measurement langevin dynamics. ACM Transactions on Knowledge Discovery from Data. doi:10.1145/3427585.
  • [32] Yong Liu, Shizhong Liao, Shali Jiang, Lizhong Ding, Hailun Lin and Weiping Wang. Fast cross-validation for kernel-based algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence. 42(5):1083-1096, 2020.
  • [33] Shaoning Zeng, Bob Zhang, Jianping Gou, Yong Xu, and Wei Huang. 2021. Fast and robust dictionary-based classification for image data. ACM Transactions on Knowledge Discovery from Data. doi:10.1145/3449360.
  • [34] Youxi Wu, Yuehua Wang, Yan Li, Xingquan Zhu, and Xindong Wu. 2021. Top-k self-adaptive contrast sequential pattern mining. IEEE Transactions on Cybernetics. doi:10.1109/TCYB.2021.3082114.