
Classification of White Blood Cell Leukemia with Low Number of Interpretable and Explainable Features

William Franz Lamberti Center for Public Health Genomics
Department of Biomedical Engineering
University of Virginia
Charlottesville, VA, United States
[email protected]
Abstract

White Blood Cell (WBC) Leukemia is detected through image-based classification. Convolutional Neural Networks are used to learn the features needed to classify images of cells as malignant or healthy. However, this type of model requires learning a large number of parameters and is difficult to interpret and explain. Explainable AI (XAI) attempts to alleviate this issue by providing insight into how models make decisions. Therefore, we present an XAI model which uses only 24 explainable and interpretable features and is highly competitive with other approaches, outperforming them by about 4.38%. Further, our approach provides insight into which variables are the most important for the classification of the cells. This insight provides evidence that when labs treat the WBCs differently, the importance of various metrics changes substantially. Understanding the important features for classification is vital in medical imaging diagnosis and, by extension, in understanding the AI models built in scientific pursuits.

Index Terms:
White blood cells, explainable artificial intelligence, random forest, image classification

I Introduction

Acute Lymphoblastic Leukemia (ALL) is a disease that affects white blood cells (WBCs). It is particularly harmful for children and the elderly. Further, if ALL remains unchecked, it can spread very quickly throughout a patient’s body by traveling through the bloodstream [1]. Thus, being able to accurately classify healthy WBCs against those with ALL is crucially important. Further, understanding how AI systems make decisions is crucial for understanding the biological differences between cancerous and healthy WBCs.

A popular method for the analysis and classification of images is the convolutional neural network (CNN) [2]. This modeling approach is also popular in medical imaging analysis [1]. One downside of a CNN-based approach is that it is difficult to explain and interpret the model’s parameters, since CNN models can have millions of them. While some are attempting to make CNNs more explainable and interpretable [3], these approaches still provide little insight into the underlying function estimated and little ability to assign physical meaning to the millions of individual parameters [4]. The explainability and interpretability of a model are important because communities increasingly require models used for patients to be interpretable and explainable. For instance, the European Union (EU) passed the General Data Protection Regulation (GDPR) in 2016, which took effect in 2018 [5]. Further, using models with interpretable and explainable features enables analysts to better understand the phenomena of interest and how the features interact with one another. Understanding how these features are used by the model is a key aspect of many questions in science.

Our proposed approach aims to classify WBCs as healthy or ALL, achieve high classification performance, provide insight into the important features, and use a relatively low number of parameters. An overview of our approach is provided in Figure 1. We first processed our images in 1). Since we used two separate data sources that were collected using different processes, we applied a unified custom segmentation algorithm to treat both datasets equitably. We then extracted the useful shape, color, and texture features from the processed images in 2). Lastly, a random forest (RF) was built using 5-fold cross validation (CV) and evaluated using the validation data in 3). Our RF model uses interpretable and explainable shape, color, and texture features. Further, we can describe the importance of each of the variables in the model, which provides insight into what is useful for determining the type of cell. We extend the individual variable importance by summarizing the most important group of features. This process was repeated for both datasets and the models were compared. Furthermore, our approach outperforms leading approaches by about 4.38%. Our GitHub repository provides the code for our analysis under separate branches for each respective dataset: https://github.com/billyl320/wbc_luke

Figure 1: Overview of our workflow. The labels “ALL-IDB” and “C-NMC” were added to the images for visualization purposes.

II Methods

II-A Dataset description

The two sources of data used in this manuscript were the publicly available and well regarded ALL-IDB [6] and C-NMC [7, 8, 9] datasets. The ALL-IDB data was provided by the Department of Information Technology - Università degli Studi di Milano. The images were captured with a microscope and a Canon PowerShot G5 camera and are retained in JPG format with 24 bit color depth [6]. However, the files we received were in TIF format. The ALL-IDB source has two datasets, but we constrain ourselves to the ALL-IDB2 data. The ALL-IDB2 data contains cropped areas of interest of WBCs that are malignant and healthy. These crops retain the background and any nearby cells. There are 130 malignant and 130 healthy cells, for a total of 260 images. An example from the ALL-IDB2 dataset is provided before the first step in Figure 1.

The C-NMC dataset was also used to help broaden the scope of our approach. This dataset comes from the ISBI 2019 challenge. The images are saved in the BMP format. There are 7,272 malignant cells and 3,389 healthy cells, for a total of 10,661 images. The cells were extracted by an expert oncologist. An example from the C-NMC dataset is provided before the first step in Figure 1.

An important distinction between the ALL-IDB2 and C-NMC datasets is that they have differing backgrounds. This makes it noteworthy that the segmentation algorithm we developed can be applied to both datasets without any changes, which allows our method to be applied to a much larger variety of data. Examples of the resulting segmentations are provided in Figure 1 after the first step.

II-B Image Preprocessing and Segmentation

We need to extract the object of interest to make appropriate inferences, which requires some image preprocessing. Since we are using two different datasets that were collected differently, creating a universal segmentation algorithm is non-trivial. We provide a brief overview of the image operators here; the exact image operators applied to the image data are provided in the Appendix at the following link: https://github.com/billyl320/wbc_luke/blob/master/appendix.pdf.

We first converted all of the black pixels to yellow and then converted the image to CMYK space. Next, we convolved the image with a local maximum filter. These steps extracted the object of interest from the background. We then created a contrast-stretched image and a histogram-equalized image, combined these two images, and extracted the object closest to the center of the image. The resulting binary image was used to extract the features for the analysis. This algorithm is noteworthy because it is able to properly segment two sets of images that were treated very differently during the data collection stage.
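To make these steps concrete, below is a minimal Python sketch of the pipeline just described, assuming an RGB input array with values in [0, 255]. The library choices (NumPy, SciPy, scikit-image), the naive RGB-to-CMYK conversion, and the helper name `segment_wbc` are illustrative assumptions rather than the released code; the exact operators are given in the Appendix.

```python
# Minimal sketch of the segmentation pipeline described above (not the released code).
import numpy as np
from scipy import ndimage as ndi
from skimage import exposure

def segment_wbc(rgb):
    rgb = rgb.astype(float)
    # 1) Recolor pure-black pixels as yellow so they do not dominate the Y channel.
    black = np.all(rgb == 0, axis=-1)
    rgb[black] = [255.0, 255.0, 0.0]
    # 2) Naive RGB -> CMYK conversion; keep only the Y (yellow) channel.
    rgbn = rgb / 255.0
    k = 1.0 - rgbn.max(axis=-1)
    y = (1.0 - rgbn[..., 2] - k) / np.where(k < 1.0, 1.0 - k, 1.0)
    # 3) 5x5 local-maximum filter to suppress small dark speckles.
    y = ndi.maximum_filter(y, size=5)
    # 4) Contrast stretching between the 2nd and 98th percentiles.
    lo, hi = np.percentile(y, (2, 98))
    s = np.clip((y - lo) / (hi - lo), 0.0, 1.0)
    # 5) Histogram equalization, then the weighted combination j = 2*s + e.
    e = exposure.equalize_hist(y)
    j = 2.0 * s + e
    # 6) Threshold near the minimum of j and keep the labeled region whose
    #    centroid is closest to the image center.
    mask = j < (j.min() + 0.01)
    labels, n = ndi.label(mask)
    if n == 0:
        return mask
    centroids = ndi.center_of_mass(mask, labels, list(range(1, n + 1)))
    center = np.array(mask.shape) / 2.0
    dists = [np.hypot(*(np.array(c) - center)) for c in centroids]
    return labels == (int(np.argmin(dists)) + 1)
```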

II-C Interpretable Features

The code to collect the metrics is provided at our GitHub link: https://github.com/billyl320/wbc_luke. Each dataset has its own respective branch with the associated code and output. We selected the 8 shape metrics that had past evidence of outperforming CNNs on other problems, such as classifying satellite imagery [10]. We then collected 14 color metrics from the RGB and CMYK color channels. We used these color channels because they were useful for isolating the cells, as shown by Ghane et al. [11]. From our extracted object, we also measured the mean and SD of the co-occurrence matrix, $P$. These two features capture relevant information regarding texture. We used the co-occurrence matrix because it is a well known approach for collecting texture information [12]. The metrics we used are summarized in Table I. The exact manner in which we collected these metrics and extended details on them are provided in the Appendix. However, a shorter summary and description of each of the features is provided in the following sections.

TABLE I: Table provides the metrics used in this analysis on a given image, $i$. The first column is the $q^{\text{th}}$ metric, where $q \in \{1, 2, \ldots, 24\}$.
$\vec{m}_{q,i}$ Metric Feature Type
1 White EI Shape
2 Black EI Shape
3 SP value Shape
4 $1^{\text{st}}$ Eigenvalue Shape
5 $2^{\text{nd}}$ Eigenvalue Shape
6 Eccentricity Shape
7 Circularity Shape
8 Number of Corners Shape
9 Mean R Color
10 SD R Color
11 Mean G Color
12 SD G Color
13 Mean B Color
14 SD B Color
15 Mean C Color
16 SD C Color
17 Mean M Color
18 SD M Color
19 Mean Y Color
20 SD Y Color
21 Mean K Color
22 SD K Color
23 Mean P Texture
24 SD P Texture

II-C1 Shape Features

The first metrics were the SPs and EIs [13]. The EI consists of the black and white pixel counts of the cell after the cell is placed in its minimum encompassing circle and then the minimum encompassing square [13]. In other words, these are the area of the cell and the relevant area surrounding the cell. The SP value is the proportion of the area of the shape relative to the sum of the EIs. The eigenvalues measure the major and minor axes of the shape, and eccentricity is the ratio of the major axis over the minor one. These are calculated using the $1^{\text{st}}$ and $2^{\text{nd}}$ eigenvalues of the cells, respectively [12]. Circularity measures how circular a given cell is [12, 14, 15]. The number of corners counts the corners in the extracted cell [12]. This results in a total of 8 shape metrics used by our RF model.
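As an illustration, the Python sketch below computes two of these shape metrics, eccentricity and circularity, from a binary cell mask. It follows the formulas given in the Appendix, but it is an assumed reimplementation rather than the authors' released code.

```python
# Illustrative shape metrics from a binary mask (True = cell pixel).
import numpy as np
from scipy import ndimage as ndi

def eccentricity(mask):
    # Eigenvalues of the covariance of the foreground coordinates act as
    # major/minor axis measures; their ratio is the eccentricity metric.
    coords = np.argwhere(mask)
    evals = np.sort(np.linalg.eigvalsh(np.cov(coords.T)))[::-1]
    return evals[0] / evals[1]

def circularity(mask):
    # Perimeter estimated as the pixels gained by one binary dilation,
    # matching the area / (4*pi*perimeter) form used in the Appendix.
    area = mask.sum()
    perimeter = ndi.binary_dilation(mask).sum() - area
    return area / (4.0 * np.pi * perimeter)
```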

II-C2 Color Features

The metrics for color were the mean and standard deviation (SD) of 7 different channels. The mean describes the average value of a channel within the cell, while the SD describes how much that color varies within the cell. Large mean values correspond to large amounts of the given color being present in the cell, and small mean values correspond to small amounts of that color. Large SD values indicate that the given color varies throughout the object, while small values indicate that the color is fairly consistent within the object. The color channels analyzed came from the RGB and CMYK spaces. The RGB color space describes an object’s color in terms of red (R), green (G), and blue (B). The CMYK color space describes the object’s color in terms of cyan (C), magenta (M), yellow (Y), and black (K). The CMYK space is particularly useful for segmenting the WBC from the background [11]. This corresponds to a total of 14 color features.
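A minimal sketch of these color features is given below, assuming an RGB image in [0, 255] and a binary cell mask; the naive RGB-to-CMYK conversion and the function name `color_features` are illustrative assumptions.

```python
# Masked mean and SD of each RGB and CMYK channel (14 features in total).
import numpy as np

def color_features(rgb, mask):
    rgbn = rgb.astype(float) / 255.0
    k = 1.0 - rgbn.max(axis=-1)
    denom = np.where(k < 1.0, 1.0 - k, 1.0)
    channels = {
        "R": rgbn[..., 0], "G": rgbn[..., 1], "B": rgbn[..., 2],
        "C": (1.0 - rgbn[..., 0] - k) / denom,
        "M": (1.0 - rgbn[..., 1] - k) / denom,
        "Y": (1.0 - rgbn[..., 2] - k) / denom,
        "K": k,
    }
    feats = {}
    for name, chan in channels.items():
        vals = chan[mask]              # only pixels inside the segmented cell
        feats["mean_" + name] = float(vals.mean())
        feats["sd_" + name] = float(vals.std())
    return feats
```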

II-C3 Texture Features

The metrics for texture were the mean and SD of the co-occurrence matrix of a grayscale representation of the cell. The co-occurrence matrix, $P$, describes how often certain intensity values are next to one another [12]. A large mean indicates that grayscale values tend to be adjacent to a large number of different grayscale values, while a small mean indicates the opposite. A large SD indicates that grayscale intensities are adjacent to many different intensities, while a small SD indicates that grayscale intensities tend to be adjacent to only a few other intensities. In other words, smoother images have a small SD while rougher images have a large SD.
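The sketch below illustrates the two texture features using scikit-image's `graycomatrix`; using one-pixel offsets in the horizontal and vertical directions is an assumption standing in for the right-and-bottom shift described in the Appendix.

```python
# Mean and SD of the gray-level co-occurrence matrix of the masked cell.
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import graycomatrix
from skimage.util import img_as_ubyte

def texture_features(rgb, mask):
    gray = img_as_ubyte(rgb2gray(rgb)) * mask          # zero out the background
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=False, normed=False)
    P = glcm.sum(axis=(2, 3))                          # collapse the two offsets
    return float(P.mean()), float(P.std())
```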

II-D Modeling

We used a random forest (RF) model to classify the healthy and ALL WBCs. The RF was built using the well established caret and randomForest packages in R [16, 17].

Since various communities define validation, training, and testing data differently, we define these terms here for clarity. The training data is the data used to build the model. The testing data is used to evaluate the model built. The validation data is never used to build the model. In a random split, we have the testing and validation data. In cross validation (CV), we also have the testing and validation data; however, the testing data is split into $k$ folds, where each fold takes a turn as the testing data while the remaining $k-1$ folds are the training data.

We first used stratified random sampling to preserve the class proportions from the original data when we performed a random split. The strata are the malignant and healthy WBC images. The random split between the testing and validation data was 80% and 20%, respectively. We chose these proportions to keep our experimental setup similar to experiments done by others. We then used 5-fold CV on the larger split to determine the number of variables to check when building the RF and to estimate the overall accuracy. For parameter tuning, we used a grid search ranging from 1 to 10 for the number of variables to consider at each split. After the number of variables to check was tuned, we used the entirety of the larger split to build a final model. We then used the smaller 20% split, the validation data, to confirm that our model produces reasonable results.
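Our models were fit with R's caret and randomForest packages; the scikit-learn sketch below mirrors the same protocol (stratified 80/20 split, 5-fold CV, and a grid over 1 to 10 candidate variables per split) purely for illustration and is not the original R code. Here `X` is the n-by-24 feature matrix, `y` holds the malignant/healthy labels, and the number of trees is an assumed setting.

```python
# Illustrative scikit-learn analogue of the caret/randomForest workflow.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

def fit_rf(X, y, seed=0):
    # Stratified 80/20 split: 80% for CV and final fitting, 20% held out.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    grid = GridSearchCV(
        RandomForestClassifier(n_estimators=500, random_state=seed),
        param_grid={"max_features": list(range(1, 11))},  # analogue of mtry
        cv=5, scoring="accuracy")
    grid.fit(X_train, y_train)                 # 5-fold CV over the 80% split
    best = grid.best_estimator_                # refit on the entire 80% split
    return best, grid.best_params_, best.score(X_val, y_val)
```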

III Results

III-A Accuracy and CI for Our RF Models

Table II provides the 95% confidence intervals (CIs) for the models on the training and validation data. Our models provide highly accurate and consistent results on the ALL-IDB2 data. While our RF model achieved a high level of overall accuracy, the accuracy did drop from the C-NMC training to validation data. Further, the validation accuracy was not encompassed by the training 95% CI, suggesting that our RF model slightly overfit the training data.

Source Accuracy 95% CI Data
ALL-IDB2 1.00 (0.9824, 1.0000) Training
C-NMC 1.00 (0.9996, 1.0000) Training
ALL-IDB2 1.00 (0.9315, 1.000) Validation
C-NMC 0.8645 (0.8493, 0.8788) Validation
TABLE II: The accuracy and 95% confidence interval (CI) for the various datasets from our RF model.

III-B Performance comparison with other approaches

Tables III and IV provide the results for our approach compared to other state of the art approaches for the ALL-IDB2 and C-NMC datasets. Both of these models used the same preprocessing steps to ensure a fair treatment of the data. Our model outperformed the other state of the art approaches by about 4.70% and 3.81% on average for the ALL-IDB2 and C-NMC datasets, respectively.

Reference Number of Parameters Model Accuracy
[18] 256 SVM 89.72%
[19] 4096 kNN 93.84%
[20] 8 kNN 95.24%
[18] 45 kNN 95.67%
[1] 1087 CNN & SVM 96.11%
[21] 142 Ensemble 99.2%
[22] $6.1\times 10^{7}$ CNN 99.5%
Ours 24 RF 100.00%
TABLE III: Comparison with related works for the ALL-IDB2 data. The best approaches for the smallest number of parameters and highest accuracy are shown in bold.
Reference Number of Parameters Model F1
[23] $1.1\times 10^{7}$ CNN 86.9%
[24] $8.7\times 10^{7}$ CNN 86.7%
[25] $2.5\times 10^{7}$ CNN 85.7%
[1] 1,115 CNN 87.9%
Ours 24 RF 90.1%
TABLE IV: Comparison with related works for the C-NMC data. The best approaches for the smallest number of parameters and highest F1 score are shown in bold.

Our approach was able to outperform the other advanced methods by about 4.38% on average. Furthermore, our approach used the smallest number of parameters compared to the other approaches, except for one. This is particularly impressive since the methods to extract the features were the same on both datasets. Thus, the segmentation and metrics used in our approach are applicable to a large variety of images that are collected differently.

III-C Number of parameters comparison

Our approach used fewer features than the other approaches: it was the second smallest for the ALL-IDB2 data and the smallest for the C-NMC data. For both datasets, the best prior solutions used a large number of parameters. Sahlol et al.’s approach used a variable selection approach to select useful features from a CNN [1]. This approach selected over 1,000 variables to use for classifying the WBCs. Our approach only required 24 features and outperformed their approach on both datasets. This shows that our approach describes the WBCs in a more explainable and interpretable manner than the other approaches. These results are also summarized in Tables III and IV.

III-D Interpreting our models’ results

Table V provides the relative variable importance (VI) by category. The two most important categories are color and shape, while texture is the least important. This differs from Bhattacharjee and Saini, who used only shape features [20]. Sahlol et al. built a model using only texture features [26]. Singhal et al. built two different models: one that used only shape features and one that used only texture features; their final results suggested that the model using only texture features was better [18]. Rawat et al. built an ensemble of different models using color, shape, and texture features [21]. However, they provide no insight into which features were most important in their analysis. Thus, our approach incorporates a smaller number of different types of features and outperforms all reported approaches. Further, our models are the first to characterize the importance of color for classifying WBCs as malignant or healthy. Our results suggest that color is a major factor for classifying WBCs as malignant or healthy.

Relative Importance ALL-IDB2 C-NMC
Most Color: 1.000 Shape: 1.000
Secondarily Shape: 0.357 Color: 0.776
Least Texture: 0.052 Texture: 0.152
TABLE V: The relative VI of the categories for the ALL-IDB2 and C-NMC data.

Since we used interpretable features, we have greater insight into what is important for classifying malignant and healthy WBCs. For the ALL-IDB2 data, color features are the most important, followed by shape and then texture. However, if the WBC is extracted by an oncologist, as in the C-NMC data, shape features turn out to be the most important for classifying the WBCs. If the WBC must instead be extracted from the background, other features such as color tend to be more important. Thus, the manner in which labs treat WBCs dramatically changes which features are useful for classification, which in turn changes what the model learns.

This also showcases that the RF model values different variables when trained on different datasets, which is strong evidence of batch effects. Deploying an RF model without considering batch effects is inadvisable. Observing how the model performs alongside current systems is strongly encouraged before fully adopting any machine learning or artificial intelligence solution.

IV Conclusion

We provide a competitive and novel solution for classifying Leukemia and healthy WBCs. Our approach outperforms other leading approaches by about 4.38% while also using substantially fewer features than other proposed solutions. Further, our features are all explainable, which provides greater insight into which characteristics are useful for classifying malignant and healthy WBCs. Thus, we provide a pathway toward a truly interpretable and explainable solution for classifying Leukemia and healthy WBCs. Further, we were able to show that the variable importance differed for the different types of features. This allowed us to show that if labs treat the WBCs differently, there is a direct impact on the classification models built, and the importance of the variables used can easily change depending on the labs’ treatment of their respective data.

Acknowledgment

The Office of the Provost at George Mason University funded this research.

We would like to thank the Zang Lab for Computational Biology at the University of Virginia for their support.

References

  • [1] A. T. Sahlol, P. Kollmannsberger, and A. A. Ewees, “Efficient Classification of White Blood Cell Leukemia with Improved Swarm Optimization of Deep Features,” Scientific Reports, vol. 10, no. 1, p. 2536, Feb. 2020, number: 1 Publisher: Nature Publishing Group. [Online]. Available: http://www.nature.com/articles/s41598-020-59215-9
  • [2] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, “Recent advances in convolutional neural networks,” Pattern Recognition, vol. 77, pp. 354–377, May 2018.
  • [3] W. Samek, T. Wiegand, and K.-R. Müller, “Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models,” ITU Journal: ICT Discoveries - Special Issue 1 - The Impact of Artificial Intelligence (AI) on Communication Networks and Services, vol. 1, Aug. 2017, arXiv: 1708.08296. [Online]. Available: http://arxiv.org/abs/1708.08296
  • [4] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, “Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI,” Information Fusion, vol. 58, pp. 82–115, Jun. 2020. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S1566253519308103
  • [5] “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance),” May 2016, legislative Body: EP, CONSIL. [Online]. Available: http://data.europa.eu/eli/reg/2016/679/oj/eng
  • [6] R. D. Labati, V. Piuri, and F. Scotti, “All-IDB: The acute lymphoblastic leukemia image database for image processing,” in 2011 18th IEEE International Conference on Image Processing.   Brussels, Belgium: IEEE, Sep. 2011, pp. 2045–2048. [Online]. Available: http://ieeexplore.ieee.org/document/6115881/
  • [7] R. Gupta, P. Mallick, R. Duggal, A. Gupta, and O. Sharma, “Stain Color Normalization and Segmentation of Plasma Cells in Microscopic Images as a Prelude to Development of Computer Assisted Automated Disease Diagnostic Tool in Multiple Myeloma,” Clinical Lymphoma Myeloma and Leukemia, vol. 17, no. 1, Supplement, p. e99, Feb. 2017.
  • [8] R. Duggal, A. Gupta, R. Gupta, M. Wadhwa, and C. Ahuja, “Overlapping cell nuclei segmentation in microscopic images using deep belief networks,” in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing, ser. ICVGIP ’16.   New York, NY, USA: Association for Computing Machinery, Dec. 2016, pp. 1–8. [Online]. Available: https://doi.org/10.1145/3009977.3010043
  • [9] R. Duggal, A. Gupta, R. Gupta, and P. Mallick, “SD-Layer: Stain Deconvolutional Layer for CNNs in Medical Microscopic Imaging,” in Medical Image Computing and Computer Assisted Intervention MICCAI 2017, ser. Lecture Notes in Computer Science, M. Descoteaux, L. Maier-Hein, A. Franz, P. Jannin, D. L. Collins, and S. Duchesne, Eds.   Cham: Springer International Publishing, 2017, pp. 435–443.
  • [10] W. F. Lamberti, “Classification of Synthetic Aperture Radar Images of Icebergs and Ships Using Random Forests Outperforms Convolutional Neural Networks,” in 2020 IEEE Radar Conference (RadarConf20), Sep. 2020, pp. 1–6, iSSN: 2375-5318.
  • [11] N. Ghane, A. Vard, A. Talebi, and P. Nematollahy, “Segmentation of White Blood Cells From Microscopic Images Using a Novel Combination of K-Means Clustering and Modified Watershed Algorithm,” Journal of Medical Signals and Sensors, vol. 7, no. 2, pp. 92–101, 2017. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5437768/
  • [12] J. M. Kinser, Image Operators: Image Processing in Python, 1st ed.   Boca Raton, FL: CRC Press, Oct. 2018.
  • [13] W. F. Lamberti, “Algorithms to Improve Analysis and Classification for Small Data,” Ph.D., George Mason University, United States – Virginia, 2020, iSBN: 9798557033350.
  • [14] A. Rosenfeld, “Compact Figures in Digital Pictures,” IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-4, no. 2, pp. 221–223, Mar. 1974.
  • [15] D. Proffitt, “The measurement of circularity and ellipticity on a digital grid,” Pattern Recognition, vol. 15, no. 5, pp. 383–387, Jan. 1982.
  • [16] M. Kuhn, J. Wing, S. Weston, A. Williams, C. Keefer, A. Engelhardt, T. Cooper, Z. Mayer, B. Kenkel, R. C. Team, M. Benesty, R. Lescarbeau, A. Ziem, L. Scrucca, Y. Tang, C. Candan, and T. Hunt, “caret: Classification and Regression Training,” Mar. 2020, http://topepo.github.io/caret/index.html. [Online]. Available: https://CRAN.R-project.org/package=caret
  • [17] L. Breiman, A. Cutler, A. Liaw, and M. Wiener, “randomForest: Breiman and Cutler’s Random Forests for Classification and Regression,” Mar. 2018. [Online]. Available: https://CRAN.R-project.org/package=randomForest
  • [18] V. Singhal and P. Singh, “Local Binary Pattern for automatic detection of Acute Lymphoblastic Leukemia,” in 2014 Twentieth National Conference on Communications (NCC), Feb. 2014, pp. 1–5.
  • [19] ——, “Texture Features for the Detection of Acute Lymphoblastic Leukemia,” Feb. 2016, pp. 535–543.
  • [20] R. Bhattacharjee and L. M. Saini, “Robust technique for the detection of Acute Lymphoblastic Leukemia,” in 2015 IEEE Power, Communication and Information Technology Conference (PCITC), Oct. 2015, pp. 657–662.
  • [21] J. Rawat, A. Singh, H. S. Bhadauria, J. Virmani, and J. S. Devgun, “Classification of acute lymphoblastic leukaemia using hybrid hierarchical classifiers,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 19 057–19 085, Sep. 2017. [Online]. Available: https://doi.org/10.1007/s11042-017-4478-3
  • [22] S. Shafique and S. Tehsin, “Acute Lymphoblastic Leukemia Detection and Classification of Its Subtypes Using Pretrained Deep Convolutional Neural Networks,” Technology in Cancer Research & Treatment, vol. 17, p. 1533033818802789, Sep. 2018. [Online]. Available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6161200/
  • [23] C. Marzahl, M. Aubreville, J. Voigt, and A. Maier, “Classification of Leukemic B-Lymphoblast Cells from Blood Smear Microscopic Images with an Attention-Based Deep Learning Method and Advanced Augmentation Techniques,” in ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging, ser. Lecture Notes in Bioengineering, A. Gupta and R. Gupta, Eds.   Singapore: Springer, 2019, pp. 13–22.
  • [24] Y. Ding, Y. Yang, and Y. Cui, “Deep Learning for Classifying of White Blood Cancer,” in ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging, ser. Lecture Notes in Bioengineering, A. Gupta and R. Gupta, Eds.   Singapore: Springer, 2019, pp. 33–41.
  • [25] R. Kulhalli, C. Savadikar, and B. Garware, “Toward Automated Classification of B-Acute Lymphoblastic Leukemia,” in ISBI 2019 C-NMC Challenge: Classification in Cancer Cell Imaging, ser. Lecture Notes in Bioengineering, A. Gupta and R. Gupta, Eds.   Singapore: Springer, 2019, pp. 63–72.
  • [26] A. T. Sahlol, A. M. Abdeldaim, and A. E. Hassanien, “Automatic acute lymphoblastic leukemia classification model using social spider optimization algorithm,” Soft Computing, vol. 23, no. 15, pp. 6345–6360, Aug. 2019. [Online]. Available: https://doi.org/10.1007/s00500-018-3288-5
  • [27] A. J. Izenman, Modern Multivariate Statistical Techniques Regression, Classification, and Manifold Learning, ser. Springer Texts in Statistics.   New York, NY: Springer New York, 2008.
  • [28] C. Harris and M. Stephens, “A Combined Corner and Edge Detector,” in Procedings of the Alvey Vision Conference 1988.   Manchester: Alvey Vision Club, 1988, pp. 23.1–23.6. [Online]. Available: http://www.bmva.org/bmvc/1988/avc-88-023.html

Segmentation

The first step is to extract the cell from our images. This is crucially important and challenging since the ALL-IDB2 and C-NMC datasets were collected using different methods. Furthermore, the ALL-IDB2 data includes the background surrounding the WBC. Conversely, the C-NMC images have already been segmented by an expert oncologist. Examples of these images are found in Figure 1.

Our image segmentation algorithm was partially inspired by Ghane et al. in the use of the CMYK color space, histogram equalization, and contrast stretching [11]. For a given image, $\textbf{a}[\vec{x}]$, we first extract the red, green, and blue channels, respectively, by

\textbf{r}[\vec{x}] = \left\{\begin{array}{c} 1 \\ \emptyset \\ \emptyset \end{array}\right\}\textbf{a}[\vec{x}], \quad (1)
\textbf{g}[\vec{x}] = \left\{\begin{array}{c} \emptyset \\ 1 \\ \emptyset \end{array}\right\}\textbf{a}[\vec{x}], \quad (2)
\textbf{b}[\vec{x}] = \left\{\begin{array}{c} \emptyset \\ \emptyset \\ 1 \end{array}\right\}\textbf{a}[\vec{x}]. \quad (3)

We need to then convert all of the black pixels to yellow. This will allow us to extract our cell from the image more easily in the CMYK color space. Let $v$ and $h$ be the width and height of image $\textbf{a}[\vec{x}]$. Then $\forall i \in \{1, \ldots, v\}$ and $\forall j \in \{1, \ldots, h\}$, we convert the black pixels to yellow by

\textbf{r}^{*}_{i,j}[\vec{x}] = \left\{\begin{array}{ll} 255, & \textbf{r}_{i,j}[\vec{x}] \stackrel{?}{=} \textbf{g}_{i,j}[\vec{x}] \stackrel{?}{=} \textbf{b}_{i,j}[\vec{x}] \stackrel{?}{=} 0 \\ \textbf{r}_{i,j}[\vec{x}], & \text{else} \end{array}\right., \quad (4)
\textbf{g}^{*}_{i,j}[\vec{x}] = \left\{\begin{array}{ll} 255, & \textbf{r}_{i,j}[\vec{x}] \stackrel{?}{=} \textbf{g}_{i,j}[\vec{x}] \stackrel{?}{=} \textbf{b}_{i,j}[\vec{x}] \stackrel{?}{=} 0 \\ \textbf{g}_{i,j}[\vec{x}], & \text{else} \end{array}\right., \quad (5)
\textbf{b}^{*}_{i,j}[\vec{x}] = \left\{\begin{array}{ll} 0, & \textbf{r}_{i,j}[\vec{x}] \stackrel{?}{=} \textbf{g}_{i,j}[\vec{x}] \stackrel{?}{=} \textbf{b}_{i,j}[\vec{x}] \stackrel{?}{=} 0 \\ \textbf{b}_{i,j}[\vec{x}], & \text{else} \end{array}\right., \quad (6)
\textbf{d}[\vec{x}] = \left\{\begin{array}{c} \textbf{r}^{*}[\vec{x}] \\ \textbf{g}^{*}[\vec{x}] \\ \textbf{b}^{*}[\vec{x}] \end{array}\right\}. \quad (7)

Now that we have converted all of the black pixels to yellow, we can convert our input image to the CMYK color space and extract the Y channel by

\textbf{f}[\vec{x}] = \left(\left\{\begin{array}{c} \emptyset \\ \emptyset \\ 1 \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right) \otimes \textbf{g}[\vec{p}] \quad (8)

where $\mathscr{L}_{CMYK}$ converts the input image to the CMYK color space, $\left\{\begin{array}{c} \emptyset \\ \emptyset \\ 1 \\ \emptyset \end{array}\right\}$ extracts the Y channel, $\otimes$ is the convolution operator, and $\textbf{g}[\vec{p}]$ is the local maximum filter using a $5\times 5$ kernel. This process converts the image into a more useful color space, where we use the helpful channel for extracting our object of interest. We also apply a local maximum operation to help remove some noise from the images.

We then perform histogram equalization and contrast stretching to obtain two new images. We obtain the contrast stretched image by

\textbf{s}[\vec{x}] = (\textbf{f}[\vec{x}] - \epsilon)\left(\frac{255}{\delta - \epsilon}\right), \quad (9)

where $\epsilon$ and $\delta$ are the $2^{\text{nd}}$ and $98^{\text{th}}$ percentile values of image $\textbf{f}[\vec{x}]$, respectively. Next, to obtain the image with histogram equalization, we performed

\textbf{e}[\vec{x}] = \mathbb{E}\,\textbf{f}[\vec{p}], \quad (10)

where $\mathbb{E}$ is the histogram equalization operator. We then combined the two images as performed by Ghane et al. [11]. Explicitly, we combined the images by

\textbf{j}[\vec{x}] = (2 \times \textbf{s}[\vec{x}]) + \textbf{e}[\vec{x}]. \quad (11)

We then performed a simple thresholding and extracted the object closest to the center of the image. This was performed by

\textbf{u}[\vec{x}] = \mathcal{I}_{(v/2,\,h/2)}\,\Gamma_{(\bigwedge\textbf{j}[\vec{x}]+0.01)<}\,\textbf{j}[\vec{x}], \quad (12)

where $\Gamma$ is the threshold operator and retains all of the pixel values that are less than the minimum value of image $\textbf{j}[\vec{x}]$ plus 0.01, $\bigwedge$ is the minimum value operator, $\mathcal{I}_{(v/2,\,h/2)}$ isolates the object closest to the center of the image, and $v$ and $h$ are the first and second values from $Z\textbf{j}[\vec{x}]$, respectively. The size operator is $Z$. The resulting image $\textbf{u}[\vec{x}]$ is a binary image where the object of interest is white (1’s) and the background is black (0’s). This provides us with our extracted shapes, as observed in Figure 1.

Appendix A Feature Extraction

Shape

We collected the encircled image-histograms (EIs) using the algorithm described by Lamberti [13]. This algorithm results in a vector $\vec{c}_{EI}$ which contains the white and black pixel counts from image $\textbf{u}[\vec{x}]$. These counts are the first two metrics, $\vec{m}_{1}$ and $\vec{m}_{2}$, respectively. The shape proportion (SP) value for a given image, $i$, is merely

\vec{m}_{3,i} = \frac{\vec{m}_{1,i}}{\vec{m}_{1,i} + \vec{m}_{2,i}}. \quad (13)

The SP value is essentially the proportion of white pixels after applying the SPEI algorithm [13]. The SPEI algorithm places a shape in its minimum encompassing circle; the circle is then placed in its minimum encompassing square. Figure 2 provides an illustrative example of the SPEI algorithm’s result. The EIs are the white and black pixel counts after applying SPEIs [13].

Figure 2: Example of the SPEI algorithm result with a reference circle. The SPEI algorithm puts the oval in its minimum encompassing circle and then puts that region in its minimum encompassing square. The reference circle is only for presentation purposes and is not part of the actual final image.
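A rough sketch of the EI and SP computations is shown below, using OpenCV's minimum enclosing circle. Treating the black EI as the continuous area of the encompassing square minus the shape's pixel count is an approximation of the pixel-counting procedure described above, and the function name is hypothetical.

```python
# Approximate EI and SP values from a binary mask (True = shape pixel).
import numpy as np
import cv2

def ei_and_sp(mask):
    pts = np.argwhere(mask).astype(np.float32)   # (row, col); radius is unaffected
    _center, radius = cv2.minEnclosingCircle(pts)
    side = 2.0 * radius                          # minimum encompassing square side
    white = float(mask.sum())                    # shape pixels ("white EI")
    black = side * side - white                  # remaining square area ("black EI")
    sp = white / (white + black)
    return white, black, sp
```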

We calculated eccentricity by finding the ratio of the first and second eigenvalues. To obtain the eigenvalues, we performed

\vec{e}_{i} = \mathbb{E}_{(1,2)}\,V\,\textbf{u}_{i}[\vec{x}], \quad (14)

where $V$ collects the covariance matrix of the shape matrix and $\mathbb{E}_{(1,2)}$ calculates the first and second eigenvalues of the resulting covariance matrix. The $j^{\text{th}}$ eigenvalue of image $i$ is $\vec{e}_{i,(j)}$. To obtain $\vec{m}_{6}$, eccentricity, we perform, on a given image $i$,

\vec{m}_{6,i} = \frac{\vec{e}_{i,(1)}}{\vec{e}_{i,(2)}}. \quad (15)

The first and second eigenvalues are metrics $\vec{m}_{4}$ and $\vec{m}_{5}$, respectively. It is well known that each eigenvalue of a covariance matrix corresponds to the variance captured by the linear combination of the data that maximizes the variance along its respective dimension [27]. For instance, the first eigenvalue is the variance of the linear combination of the data with maximal variance [27]. We also know that these linear projections, or eigenvectors, are orthogonal to one another [27]. Thus, the eigenvalues are measures of the major and minor axes of our given shape. Using the ratio of the major and minor axes provides some insight into how a given shape exists as a 2D digital image [12]. A value of 1 corresponds to a shape with the same major and minor axis lengths. A value greater than 1 corresponds to the case where the major axis’ length is larger than the minor axis’ length. The limit of eccentricity would correspond to the case where the major axis’ length is infinitely larger than the minor axis’ length.

Circularity, $\vec{m}_{7}$, was collected on image $i$ by

\vec{m}_{7,i} = \frac{\sum \textbf{u}_{i}[\vec{x}]}{4\pi \times \big(\sum \triangleleft\textbf{u}_{i}[\vec{x}] - \sum \textbf{u}_{i}[\vec{x}]\big)}, \quad (16)

where $\sum$ sums the pixel intensity values. The $\sum$ operator computes the area of the object since we are restricted to binary images. The denominator of this metric is the perimeter of the binary image multiplied by $4\pi$. Circularity provides a measure of how circular a shape is as a 2D digital image [12, 14, 15]. A value of 1 corresponds to a perfect circle [12, 14, 15].

The number of corners, $\vec{m}_{8}$, counts the number of corners on an object [28]. This metric was measured on image $i$ by

\textbf{v}_{i}[\vec{x}] = G_{2}E_{x}\,\textbf{u}_{i}[\vec{x}], \quad (17)
\textbf{h}_{i}[\vec{x}] = G_{2}E_{y}\,\textbf{u}_{i}[\vec{x}], \quad (18)
\textbf{A}_{i}[[\vec{x}]] = \begin{bmatrix} \textbf{v}_{i}[\vec{x}]^{2} & \textbf{v}_{i}[\vec{x}]\textbf{h}_{i}[\vec{x}] \\ \textbf{v}_{i}[\vec{x}]\textbf{h}_{i}[\vec{x}] & \textbf{h}_{i}[\vec{x}]^{2} \end{bmatrix}, \quad (19)
\textbf{r}_{i}[\vec{x}] = \frac{det(\textbf{A}_{i}[[\vec{x}]]) - k}{trace(\textbf{A}_{i}[[\vec{x}]])^{2}}, \quad (20)
\vec{m}_{8,i} = \mathbb{C}\,\textbf{r}_{i}[\vec{x}], \quad (21)

where $E$ is the edge operator whose subscript, $x$ or $y$, indicates vertical or horizontal edges, respectively, $G_{2}$ is a Gaussian smoothing operator with a standard deviation of 2, $\textbf{A}_{i}[[\vec{x}]]$ is an image tensor, $\textbf{r}_{i}[\vec{x}]$ contains the resulting corners extracted from the image, and $\mathbb{C}$ counts the number of corners in the input image. Here we first extract the edges from the image to help isolate the parts of the image that could contain corners. Next, we smooth the image to remove spurious responses that would not qualify as corners. We then create the tensor and compute image $\textbf{r}_{i}[\vec{x}]$, which contains all of the corners in the image. Counting the corners is the last operation and retrieves the desired metric.
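As a compact stand-in for Eqs. (17)-(21), the sketch below uses scikit-image's Harris detector, which likewise smooths the image gradients with a Gaussian and thresholds a corner response; the sigma, k, and peak-threshold values are illustrative choices rather than the paper's settings.

```python
# Approximate corner count for a binary mask via the Harris response.
from skimage.feature import corner_harris, corner_peaks

def corner_count(mask, sigma=2.0, k=0.05):
    response = corner_harris(mask.astype(float), k=k, sigma=sigma)
    peaks = corner_peaks(response, min_distance=3, threshold_rel=0.1)
    return len(peaks)
```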

Note that all of the variables used in our analysis have a very interpretable meaning. The EIs are the white and black pixel counts after putting the shape in the minimum encompassing circle and then the minimum encompassing square. The SP value is the proportion of white pixels relative to the sum of the EIs. The $1^{\text{st}}$ and $2^{\text{nd}}$ Eigenvalues are associated with the major and minor axes, respectively, of the shape. Eccentricity is the ratio of the major to minor axis. Circularity measures how circular a shape is, where a value of 1 indicates a perfect circle. The number of corners estimates the number of corners the shape has.

Color

The features for color were the mean and standard deviation (SD) of the object’s pixel intensity values. The object’s pixels were extracted using the segmented shape. Seven different color channels were used from two different color representations: RGB and CMYK. Explicitly, for a given image $i$, metrics $\vec{m}_{9,i}$ to $\vec{m}_{22,i}$ are calculated by

\vec{m}_{9,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\,\textbf{r}_{i}[\vec{x}], \quad (22)
\vec{m}_{10,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\,\textbf{r}_{i}[\vec{x}], \quad (23)
\vec{m}_{11,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\,\textbf{g}_{i}[\vec{x}], \quad (24)
\vec{m}_{12,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\,\textbf{g}_{i}[\vec{x}], \quad (25)
\vec{m}_{13,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\,\textbf{b}_{i}[\vec{x}], \quad (26)
\vec{m}_{14,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\,\textbf{b}_{i}[\vec{x}], \quad (27)
\vec{m}_{15,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} 1 \\ \emptyset \\ \emptyset \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (28)
\vec{m}_{16,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} 1 \\ \emptyset \\ \emptyset \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (29)
\vec{m}_{17,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} \emptyset \\ 1 \\ \emptyset \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (30)
\vec{m}_{18,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} \emptyset \\ 1 \\ \emptyset \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (31)
\vec{m}_{19,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} \emptyset \\ \emptyset \\ 1 \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (32)
\vec{m}_{20,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} \emptyset \\ \emptyset \\ 1 \\ \emptyset \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (33)
\vec{m}_{21,i} = \mathcal{M}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} \emptyset \\ \emptyset \\ \emptyset \\ 1 \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (34)
\vec{m}_{22,i} = \mathcal{T}_{\textbf{u}_{i}[\vec{x}]}\left(\left\{\begin{array}{c} \emptyset \\ \emptyset \\ \emptyset \\ 1 \end{array}\right\}\mathscr{L}_{CMYK}\,\textbf{d}[\vec{x}]\right), \quad (35)

where $\mathcal{M}$ and $\mathcal{T}$ are the mean and standard deviation operators, respectively. Both operators are applied only over the 1’s of image $\textbf{u}[\vec{x}]$, which ensures that only the colors of the object are considered for a given color channel. The mean value for a given channel indicates how much of that color is present in the object. Large values indicate that a particular color is very present in the image, while small values correspond to the image having very little of that color. The SD indicates how much the presence of a color changes throughout the image. A large SD indicates that the intensity of a color changes dramatically, while a small SD corresponds to a fairly uniform intensity of that color. Thus, our color features have an interpretable and explainable meaning.

Texture

The features for texture were the mean and SD of the co-occurrence matrix of the masked grayscale image. These were calculated by

\vec{m}_{23,i} = \mathcal{M}\,\mathfrak{C}_{1,256}\big((\mathscr{L}_{L}\,\textbf{a}_{i}[\vec{x}]) \times \textbf{u}_{i}[\vec{x}]\big), \quad (36)
\vec{m}_{24,i} = \mathcal{T}\,\mathfrak{C}_{1,256}\big((\mathscr{L}_{L}\,\textbf{a}_{i}[\vec{x}]) \times \textbf{u}_{i}[\vec{x}]\big), \quad (37)

where $\mathscr{L}_{L}$ converts the given input image to a grayscale image and $\mathfrak{C}$ calculates the co-occurrence matrix with a shift of 1 to the right and bottom for 256 intensities. We multiply the grayscale image by the extracted shape to remove the background from the image and then calculate the co-occurrence matrix. This matrix provides the counts of the number of times a certain pixel intensity is next to another intensity under a single shift. This $256\times 256$ matrix is useful for texture measures [12]. We then extracted the mean and SD from this matrix. The mean of this matrix indicates how often an average intensity is near another value. Large means indicate that intensities are often adjacent to other intensity values, which translates to a very rough image. Small means indicate that intensities are adjacent to very few other values, which translates to a smoother image. The SD indicates how much the intensities change over an image, with large values indicating a large amount of change and small values indicating a small amount of change. Thus, our texture features have an interpretable and explainable meaning.

Extended Results

Confusion matrices of the ALL-IDB2 training and validation data are provided in Tables VI and VII, respectively. Confusion matrices of the C-NMC training and validation data are provided in Tables VIII and IX. In Tables VI, VII, and VIII, we were able to obtain perfect classification. Table IX shows that our RF model for the C-NMC data did overfit, but it was still able to perform well, as the table has large values on the diagonal. Future work would include producing a model that does not overfit as much to the training data while also improving on the validation data.

Table X provides the relative VI of the individual features for the two datasets. Table XI compares the VI for the two datasets. Tables X and XI show that the VI changes between the two datasets. This provides strong evidence of batch effects between these two datasets. Thus, analysts must consider batch effects when using models in deployment.

Prediction/Truth Healthy (Negative) Malignant (Positive)
Healthy (Negative) 104 0
Malignant (Positive) 0 104
TABLE VI: The ALL-IDB2 confusion matrix for the training data.
Prediction/Truth Healthy (Negative) Malignant (Positive)
Healthy (Negative) 26 0
Malignant (Positive) 0 26
TABLE VII: The ALL-IDB2 confusion matrix for the validation data.
Prediction/Truth Healthy (Negative) Malignant (Positive)
Healthy (Negative) 2711 0
Malignant (Positive) 0 5817
TABLE VIII: The C-NMC confusion matrix for the training data.
Prediction/Truth Healthy (Negative) Malignant (Positive)
Healthy (Negative) 522 133
Malignant (Positive) 156 1322
TABLE IX: The C-NMC confusion matrix for the validation data.
Feature Relative VI Feature Relative VI
Mean M 1.000 White EI 1.000
SD K 0.791 $2^{\text{nd}}$ Eigenvalue 0.664
SD B 0.660 $1^{\text{st}}$ Eigenvalue 0.614
$2^{\text{nd}}$ Eigenvalue 0.539 Mean B 0.436
White EI 0.458 Black EI 0.427
Mean G 0.393 Mean K 0.414
SD M 0.253 Mean R 0.265
Mean K 0.229 Mean P 0.250
Mean B 0.214 Mean G 0.248
SD G 0.209 SD M 0.247
Circularity 0.207 SD P 0.245
SD C 0.195 Number of Corners 0.204
$1^{\text{st}}$ Eigenvalue 0.146 SD K 0.176
SD R 0.139 Mean M 0.175
Mean P 0.124 SD B 0.165
Number of Corners 0.105 Circularity 0.162
SD P 0.097 SD G 0.154
Mean R 0.087 SP 0.129
SD C 0.066 Mean C 0.103
SP 0.057 SD R 0.088
Black EI 0.052 Eccentricity 0.060
Eccentricity 0.023 SD C 0.056
Mean Y 0.001 Mean Y >0.000
SD Y >0.000 SD Y >0.000
TABLE X: The table provides the relative VI of the features for the ALL-IDB2 data in the first two columns and the C-NMC data in the second two columns. The most important feature for the ALL-IDB2 data was the mean of the magenta channel, followed by the SD of the black channel and then the SD of the blue channel. The most important feature for the C-NMC data was the White EI value, followed by the $2^{\text{nd}}$ and $1^{\text{st}}$ Eigenvalues. Each of these variables is defined in the Interpretable Features subsection of the Methods section.
Feature Mean Decrease in Accuracy for ALL-IDB2 Mean Decrease in Accuracy for C-NMC
White EI 0.039 0.083
Black EI 0.004 0.035
SP 0.005 0.011
Mean R 0.007 0.022
SD R 0.012 0.007
Mean G 0.033 0.021
SD G 0.018 0.013
Mean B 0.018 0.036
SD B 0.056 0.014
Mean C 0.017 0.009
SD C 0.006 0.005
Mean M 0.085 0.015
SD M 0.021 0.021
Mean Y >0.000 >0.000
SD Y >0.000 >0.000
Mean K 0.019 0.034
SD K 0.067 0.015
Mean P 0.010 0.021
SD P 0.008 0.020
Circularity 0.018 0.013
Eccentricity 0.002 0.005
$1^{\text{st}}$ Eigenvalue 0.012 0.051
$2^{\text{nd}}$ Eigenvalue 0.046 0.055
Number of Corners 0.009 0.017
TABLE XI: The VI of the features for the ALL-IDB2 and C-NMC data.