An Investigation into Glomeruli Detection
in Kidney H&E and PAS Images using YOLO
Abstract
Context – Drawing diagnostic conclusions from digital pathology images requires analyzing tissue patterns and cellular morphology. However, manual evaluation can be time-consuming, expensive, and prone to inter- and intra-observer variability.
Objective – To assist pathologists with computerized solutions, automated methods for tissue structure detection and segmentation are needed. However, generating pixel-level object annotations for histopathology images is expensive and time-consuming, so detection models that require only bounding box labels may be a feasible solution.
Design – This paper studies YOLO-v4 (You-Only-Look-Once), a real-time object detector, for microscopic images. YOLO uses a single neural network to predict several bounding boxes and class probabilities for objects of interest, and its detection performance can be enhanced by training on whole slide images. YOLO-v4 is applied here to glomeruli detection in human kidney images. Multiple experiments were designed and conducted with different training data drawn from two public datasets and a private dataset from the University of Michigan used for fine-tuning. The model was tested on the private dataset from the University of Michigan, serving as an external validation on two different stains, namely hematoxylin and eosin (H&E) and periodic acid–Schiff (PAS).
Results – Average specificity and sensitivity are reported for all experiments, along with a comparison against existing segmentation methods on the same datasets.
Conclusions – Automated glomeruli detection in human kidney images is possible using modern AI models. The design and validation for different stains still depend on the variability of public multi-stain datasets.
I Introduction
For investigations of tissue morphology and, consequently, for drawing diagnostic conclusions, computational pathology approaches may offer fast and reliable solutions compared to conventional microscopy-based workflows. Manual evaluation of tissue samples, in contrast, can be time-consuming, costly, and subject to both inter- and intra-observer variability [1]. Consequently, researchers have recently focused on automated solutions to detect and segment tissue structures in digital pathology whole slide images (WSIs). Many tasks, such as determining tissue types, rely on the accuracy of tissue pattern segmentation, which is regarded as the foundation of automated image analysis. However, because tissue structures such as glands and organelles cluster and overlap with each other, establishing precise segmentation is not a simple operation; distinguishing these patterns from the tissue background, and especially from each other, is a challenge. In addition, histopathological images may contain noise and artifacts introduced during image acquisition, as well as low contrast between foreground and background [1]. Segmentation models have been widely used in digital pathology to segment cells and other regions of interest [2]. However, training these models requires pixel-level object annotations made by an expert, and such detailed (pixel-level) labels for histopathology images are expensive, time-consuming, and hard to obtain [3]. Moreover, in some histopathology applications, detecting the position of a specific tissue pattern without precisely outlining its borders may be sufficient [1]. Such techniques are called tissue pattern detection and are usually faster than segmentation methods. The main advantage of detection models is that they construct a bounding box around the tissue of interest rather than requiring pixel-level labels, which makes their training much more convenient.
Deep object detectors typically consist of two parts: a backbone trained on ImageNet and a head used to predict object classes and bounding boxes. One-stage and two-stage object detectors are the most common head types [4]. The regions with convolutional neural networks (R-CNN) series [5] is a good example of the two-stage category, while YOLO (you-only-look-once) is an example of a one-stage object detector [4], which has been studied and explored in this paper on two different applications for detecting specific tissue patterns.
YOLO is a simple concept with several advantages. First, YOLO is very fast, as it frames detection as a single regression problem and does not require a complicated pipeline. Furthermore, its mean average precision is higher than that of comparable real-time systems, so the network can be considered a real-time object detector. Second, unlike sliding-window and region-proposal-based approaches, YOLO encodes contextual information about object classes as well as their appearance during training and testing [6]. Fast R-CNN [7], a popular object detection approach, may misidentify background patches in an image as objects; in comparison, YOLO makes about half as many background errors [6]. Third, YOLO learns a highly generalizable representation of objects: when trained on natural images and evaluated on artwork, YOLO surpasses detection algorithms such as deformable parts models (DPM) and R-CNN. Because of this generalizability, YOLO is less likely to fail when applied to new domains or unexpected inputs [6].
In this paper, YOLO-v4 has been employed to find particular tissue patterns in WSIs, and comparisons with segmentation approaches are performed on the same datasets. The histological evaluation of glomeruli is critical for identifying whether a kidney is transplantable [8]. The Karpinski score, which includes the ratio of sclerotic glomeruli to the total number of glomeruli in a kidney segment, is critical for determining the necessity of a single or dual kidney transplant [9]. Clinical symptoms, immunopathology, and morphological abnormalities are all factors in classifying glomerular disorders, and these anatomic structures must first be detected before glomerular diseases can be classified. Automated glomeruli identification frameworks can be quite helpful for pathologists examining kidney biopsies, because manual examination of kidney samples is time-consuming and error-prone [9, 8]. Several segmentation methods exist to detect glomeruli in kidney images [10]; however, these methods require pixel-level annotation of the images. In detection methods, only determining the location of a given tissue pattern, the glomerulus, is required, without the need to precisely delineate its borders.
In the field of histopathology, the lack of image data, annotations, and labels has always been a problem [11]. Hence, it is important to validate the generalization capability of deep networks. By training a network on public datasets and then fine-tuning it with only limited data from a specific hospital or resource, we may be able to significantly improve the accuracy of the network on a validation set from that same resource.
In the application studied here, YOLO-v4 has been trained as a detection network to recognize all glomeruli in a given kidney image. Multiple experiments were designed and carried out with different training data from two public datasets to fine-tune the model, and tested on the private dataset from the University of Michigan as an external validation on two differently stained tissues, namely periodic acid–Schiff (PAS) staining and hematoxylin and eosin (H&E) staining.
The first dataset is a public collection of 31 tiled TIFF (SVS) WSIs, with bounding-box annotations performed by collaborating pathologists. This data is part of the WSI datasets generated within the European project AIDPATH (source: http://aidpath.eu/). The second dataset was used for the HubMap competition (source: https://www.kaggle.com/c/hubmap-kidney-segmentation/overview); it consists of TIFF files ranging in size from 500MB to 5GB, with 8 WSIs for training and 5 WSIs for subsequent testing, and segmentation annotations provided for each WSI. The generalization of the network has been tested by training on these two public datasets, followed by external validation on the private dataset from the University of Michigan. A third, private dataset used for training and fine-tuning the models consists of 7 PAS stained WSIs collected from the University of Michigan and annotated by an expert pathologist. Figure 1 shows three samples of the training dataset for the network.

The three datasets, two public and one private, have been used to design and conduct 14 experiments, trained on different combinations of the public and private datasets. Results have been validated on the private dataset from the University of Michigan with two stains: 20 PAS stained WSIs and 16 H&E stained WSIs. YOLO served as the detector network and is described in detail in the following sections.
Figure 2 shows two samples of the private validation dataset along with the annotated bounding boxes: on the top, a tissue sample derived from an H&E stained WSI, and on the bottom, a tissue sample from a PAS stained WSI. The results, namely average specificity and sensitivity for all experiments, together with a comparison against existing segmentation methods on the same datasets, are discussed in the results section. In general, the average specificity and sensitivity are higher on the PAS validation set, because all of the images in the training dataset are PAS stained. There is also an improvement in average specificity and sensitivity when fine-tuning the network with only 7 PAS WSIs from the University of Michigan.

II Literature Review
A significant step in determining whether a kidney is transplantable is the histological examination of renal samples by experienced pathologists [9, 8]. The histopathological evaluation of the number of globally sclerosed glomeruli in relation to the overall number of glomeruli is essential for accepting or rejecting a donor's kidneys [9]. Multiple glomerulocentric pathology classification systems are employed for native kidney diseases [12, 13, 14], emphasizing the central role of glomerular injury. Figure 3 shows samples of glomeruli in kidney images.

Glomeruli are the clusters of capillaries through which waste and excess fluids are filtered out of the blood. Glomerular disorders can be grouped according to their clinical symptoms, etiology, immunopathology, or morphological changes [9, 8]. A condition known as "glomerulosclerosis" results when a kidney lesion changes the glomerular morphology; this sclerosis can impact the kidney in different ways, depending on whether it is global or partial [10]. In daily practice, the number of glomeruli detected in each kidney biopsy should be counted; about 20 to 30 cuts are made per kidney biopsy [10]. Additionally, glomeruli that are completely sclerosed (the entire glomerulus) must be noted, and detection of localized sclerosis provides further information about the patient's condition. Each pathology report should include this information, because the number of glomeruli assessed must be representative enough to support a diagnosis [15]. On the other hand, if the sample contains numerous sclerosed glomeruli, this may suggest that the patient has chronic kidney disease with dead glomeruli; as a result, the patient may not be suited to certain medications, which helps to define adequate treatment [15]. This information is also entered into the national register for glomerulonephritis [10]. Counting glomeruli is a painstaking, time-consuming, and tiresome process; because of this, image processing methods that can identify and classify glomeruli are needed.
With the emergence of deep learning networks, various options for computer vision tasks such as glomeruli object detection, semantic segmentation, and instance segmentation became available [10]. For instance, some works provide a detailed assessment of object detection and instance segmentation algorithms [16], while others provide a complete review of semantic segmentation [17]. Several recent research efforts in digital pathology have used deep neural networks for glomeruli detection and segmentation [18, 19, 20, 21, 22, 8, 23, 24, 25, 10].
For glomeruli detection, YOLO is applied to kidney images for the first time in this paper and compared with an existing segmentation method, U-Net, using the same validation dataset, which contains two different tissue stains.
II-A Tissue Staining
Staining is used to emphasize essential characteristics of the tissue and to improve contrast. Hematoxylin is a common dye that gives nuclei a bluish hue, whereas eosin (another dye used in histology) gives the cell's cytoplasm a pinkish tint [26].
II-A1 Periodic Acid-Schiff (PAS)
PAS is a staining technique used in histochemistry to demonstrate the presence of carbohydrates and carbohydrate compounds, such as polysaccharides, mucin, glycogen, and fungal cell wall components, in cells. PAS has been used to look for glycogen in tissues such as skeletal muscle, liver, and heart muscle. PAS staining works with both formalin-fixed, paraffin-embedded (FFPE) and frozen tissue sections [27]. In renal pathology, the PAS stain is particularly useful for highlighting basement membranes.
II-A2 Hematoxylin and Eosin (H&E) Staining
H&E combines two histological stains: hematoxylin and eosin. Hematoxylin stains cell nuclei purple, and eosin stains the extracellular matrix and the cytoplasm pink; other anatomic tissue structures take on different shades and hues of these two colors [28]. Pathologists can easily distinguish the nucleus from the cytoplasm of a cell, and the overall patterns of coloration from the stain show the general layout and distribution of cells and give an overall impression of tissue morphology [27].
III Materials & Methods
III-A Method
An object detection task consists of predicting one or more object locations, determining their classes, and drawing a bounding box around each object. In many existing detection systems, multiple classifiers are applied to an image at many locations and scales, and the high-scoring regions of the image are taken as detections of a region of interest. In this paper, the YOLO approach (You Only Look Once) [6] has been trained for both applications of detecting tissue patterns in WSIs: the first is artifact and manual ink-marker detection, and the second is glomeruli detection in kidney images.
One of the essential advantages of YOLO over classifier-based systems is speed: YOLO is more than 1000 times faster than R-CNN [5] and 100 times faster than Fast R-CNN [7]. YOLO makes predictions with a single network evaluation, whereas R-CNN requires many network evaluations for a single image. More importantly, as an object detector, YOLO does not require detailed pixel-level annotation; its labels are just bounding boxes around the target objects.
III-A1 Network Architecture of YOLO
By combining separate components of other object detection networks, such as those using sliding windows or region-based techniques, YOLO can predict all objects in an image, for all classes, based on information from the whole image in a single pass [6]. In other words, the network models the entire image at once along with all of its individual objects. The YOLO architecture enables end-to-end training and real-time speeds while maintaining high average precision [6]. An S × S grid is placed over the input image, and the grid cell containing the center of an object is responsible for detecting that object. Each grid cell predicts bounding boxes and confidence scores for those boxes; if the model is certain that a box contains an object, it assigns the box a high confidence score [6]. This confidence score is calculated as
$$\text{Confidence} = \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} \qquad (1)$$
The confidence score should be 0 if no object is present in a cell. If at least one object is predicted in that cell, the confidence score should equal the intersection over union (IOU) between the predicted box and the ground truth. Each grid cell also predicts the conditional class probabilities, $\Pr(\text{Class}_i \mid \text{Object})$, conditioned on the grid cell containing an object. Regardless of how many boxes a grid cell predicts, the network only predicts one set of class probabilities per cell. At test time, the network computes a class-specific confidence score for each box based on
$$\Pr(\text{Class}_i \mid \text{Object}) \cdot \Pr(\text{Object}) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} = \Pr(\text{Class}_i) \cdot \mathrm{IOU}^{\text{truth}}_{\text{pred}} \qquad (2)$$
These scores represent both the probability that class $i$ appears in the box and how well the predicted box fits the object [6].
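As a numeric illustration of Equations 1 and 2, the short sketch below evaluates both scores for a single predicted box; all values are hypothetical and chosen only for demonstration.

```python
# Illustrative computation of YOLO confidence scores (Eqs. 1 and 2).
# All numbers are made up for demonstration.

p_object = 0.9           # predicted probability that the box contains an object
iou_pred_truth = 0.75    # IOU between the predicted box and the ground truth
p_class_given_obj = 0.8  # conditional class probability Pr(Class_i | Object)

# Eq. 1: box confidence score
confidence = p_object * iou_pred_truth             # 0.675

# Eq. 2: class-specific confidence score at test time
class_confidence = p_class_given_obj * confidence  # 0.54

print(f"box confidence: {confidence:.3f}")
print(f"class-specific confidence: {class_confidence:.3f}")
```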
YOLO was inspired by the GoogLeNet model for image classification [29]: the detection network consists of 24 convolutional layers followed by two fully connected layers. The full network is shown in Figure 4.

III-A2 YOLO-v4
The YOLO-v4 [4] model has been trained for both applications in this paper. The code for all steps involved is available on GitHub (source: https://github.com/AlexeyAB/darknet) and has been modified in this work to train on a custom dataset. The new backbone architecture in YOLO-v4, compared to YOLO-v3, improved mAP (mean average precision) and FPS (frames per second) by 10% and 12%, respectively, when trained and tested on the COCO dataset (source: https://cocodataset.org/). The backbone is a deep neural network composed mainly of convolutional layers, and its main objective is to extract features. Backbone selection is a key step that can improve object detection performance; mostly pre-trained neural networks are used for the backbone [4].
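As a point of reference, below is a minimal sketch of the metadata files the AlexeyAB/darknet code expects for a one-class custom dataset. The file contents follow the conventions documented in that repository, while the specific paths and the class name are assumptions for illustration.

```python
from pathlib import Path

# Sketch of the metadata files darknet expects for a one-class
# ("glomerulus") custom dataset; all paths are hypothetical.
data_dir = Path("data")
data_dir.mkdir(exist_ok=True)

(data_dir / "obj.names").write_text("glomerulus\n")
(data_dir / "obj.data").write_text(
    "classes = 1\n"
    "train = data/train.txt\n"
    "valid = data/valid.txt\n"
    "names = data/obj.names\n"
    "backup = backup/\n"
)

# train.txt / valid.txt list one image path per line; each image has a
# matching .txt label file with YOLO-format boxes next to it.
# Training is then launched with, e.g.:
#   ./darknet detector train data/obj.data cfg/yolov4-custom.cfg yolov4.conv.137
```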
III-B Dataset
Kimia Lab and the pathology department of the University of Michigan are collaborating on a project to develop a computational kidney disease diagnosis model. As part of this project, Kimia Lab has received a glomeruli dataset with bounding-box annotations created by nephropathologists. To expand the training data, two public datasets plus a private dataset from the University of Michigan have been used in this study.
Public Dataset 1
The first public dataset consists of 31 WSIs in SVS format, with varying image sizes, acquired at 20x magnification to preserve image quality and information while requiring significantly less computational time than images taken at other magnifications [30]. At lower resolution and poor image quality, a glomerulus may lose structural information; conversely, employing magnifications such as 40x would increase the model size and slow down training [10]. This data is part of the WSI datasets generated within the European project AIDPATH (source: http://aidpath.eu/). Tissue samples were obtained with a biopsy needle with an outside diameter between 100 nm and 300 nm; once the paraffin blocks were ready, the tissue portions were cut into 4 µm sections and stained with PAS [30]. The PAS stain is commonly employed to color polysaccharides found in kidney tissue and to highlight glomerular basement membranes because of its effectiveness [31]. These images contain different types of glomeruli, labeled using the approach of Bueno et al. [10]. This dataset has two parts: DATASET A, which contains the raw 31 WSIs, and DATASET B, which contains 2340 glomeruli images (1170 normal and 1170 sclerosed glomeruli). Because the exact coordinates of the extracted glomeruli were not provided, the coordinates of the glomeruli bounding boxes were extracted by a pathologist. An annotated WSI sample of the first public dataset is shown in Figure 5.
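Training YOLO on these annotations requires converting each pathologist-provided bounding box from pixel coordinates to the normalized label format darknet expects. A minimal sketch is shown below; the function name and example numbers are hypothetical.

```python
def to_yolo_label(x_min, y_min, x_max, y_max, img_w, img_h, class_id=0):
    """Convert a pixel-space bounding box to a YOLO-format label line:
    '<class> <x_center> <y_center> <width> <height>', all normalized to [0, 1]."""
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a glomerulus bounding box in a 2000 x 1500 pixel patch.
print(to_yolo_label(420, 310, 780, 650, img_w=2000, img_h=1500))
# -> "0 0.300000 0.320000 0.180000 0.226667"
```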

Public Dataset 2
This dataset was used for the HubMap competition (source: https://www.kaggle.com/c/hubmap-kidney-segmentation/overview). TIFF files ranging in size from 500MB to 5GB make up the dataset, with 8 images for training and 5 images for testing. RLE-encoded and unencoded (JSON) annotations are included in the training set; the annotations identify glomeruli that have been divided into sections, and anatomical structure segmentations are included in both the training and public test sets. To use these annotations with the YOLO object detector, bounding boxes of the anatomical structures were created from the manual contours. Figure 6 shows an example of the procedure for generating a bounding box from a manual delineation: the box is found by taking the upper-left-most and lower-right-most coordinates of the delineation. An annotated WSI of the second dataset is shown in Figure 7.
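A minimal sketch of this bounding-box construction is shown below; it simply takes the minimum and maximum coordinates over the contour points, and the contour itself is hypothetical.

```python
import numpy as np

def contour_to_bbox(points):
    """Enclose a manual delineation (list of (x, y) vertices) with the
    smallest axis-aligned rectangle, i.e., the upper-left-most and
    lower-right-most coordinates of the contour."""
    pts = np.asarray(points)
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    return int(x_min), int(y_min), int(x_max), int(y_max)

# Hypothetical glomerulus contour (pixel coordinates).
contour = [(120, 85), (160, 70), (210, 95), (225, 150), (180, 190), (130, 160)]
print(contour_to_bbox(contour))  # -> (120, 70, 225, 190)
```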


University of Michigan Data
This private dataset was collected from the University of Michigan and annotated by an expert pathologist. The training portion consists of 7 PAS stained WSIs used for fine-tuning the models trained on the public datasets described above; an annotated WSI sample of this dataset is shown in Figure 8. Besides these 7 PAS stained training WSIs, 20 PAS stained WSIs and 16 H&E stained WSIs were used for validation. The University of Michigan data is of the same type as the first public dataset, namely needle biopsy images, unlike the second public dataset, which consists of surgical excisions; this difference affects the results obtained with each public dataset.

III-C Experiments
A total of 7 different combinations of datasets (drawn from the two public datasets and the private dataset) were selected for training the YOLO object detector, resulting in 7 different models. Each of the 7 training configurations was evaluated on two validation datasets with different stains from the University of Michigan: one containing 20 PAS stained images and the other containing 16 H&E stained images. All experiments, along with their training and validation datasets, are reported in Table 2, and the network configuration, shared by all 7 training runs, is described in Table 1. In this table,
- Batch is the number of images used in the forward pass to compute a gradient and update the weights via back-propagation;
- Subdivisions is the number of blocks into which the batch is subdivided;
- Policy means using the steps and scales parameters below to adjust the learning rate during training;
- Steps means adjusting the learning rate after 4800 and 5400 batches;
- Scales means re-scaling the current learning rate by the corresponding factor once each step count is reached;
- Max batches is the maximum number of training iterations;
- Filters is the number of convolutional kernels in a layer; and
- Activation defines the activation function.
Table 1: Network configuration used for all training runs.

| learning rate | batch | subdivisions | policy | steps | scales | max batches | filters | activation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.001 | 40 | 16 | steps | 4800, 5400 | 0.1, 0.1 | 6000 | 18 | linear |
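The values in Table 1 are consistent with the custom-training rules documented in the AlexeyAB/darknet README (it is an assumption that the authors derived them this way): max batches of at least 6000, learning-rate steps at 80% and 90% of max batches, and filters = (classes + 5) × 3 in the convolutional layers preceding the YOLO layers. A minimal sketch of that arithmetic for the single glomerulus class:

```python
# Reproduce the Table 1 values from darknet's documented custom-training
# rules (AlexeyAB/darknet README); one class: glomerulus.
classes = 1

max_batches = max(6000, classes * 2000)                   # at least 6000 iterations
steps = (int(0.8 * max_batches), int(0.9 * max_batches))  # (4800, 5400)
filters = (classes + 5) * 3                               # 18 kernels before each YOLO layer

print(max_batches, steps, filters)  # 6000 (4800, 5400) 18
```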
Table 2: Experiments with their training and test datasets.

| Experiment | Training Dataset | Test Dataset |
| --- | --- | --- |
| 1 | 31 WSIs from public dataset 1 | 20 PAS and 16 H&E WSIs from UMICH dataset |
| 2 | 31 WSIs from public dataset 1, fine-tuned with 7 PAS WSIs from UMICH dataset | 20 PAS and 16 H&E WSIs from UMICH dataset |
| 3 | 8 WSIs from public dataset 2 | 20 PAS and 16 H&E WSIs from UMICH dataset |
| 4 | 8 WSIs from public dataset 2, fine-tuned with 7 PAS WSIs from UMICH dataset | 20 PAS and 16 H&E WSIs from UMICH dataset |
| 5 | 31 WSIs from public dataset 1, and 8 WSIs from public dataset 2 | 20 PAS and 16 H&E WSIs from UMICH dataset |
| 6 | 31 WSIs from public dataset 1, and 8 WSIs from public dataset 2, fine-tuned with 7 PAS WSIs from UMICH dataset | 20 PAS and 16 H&E WSIs from UMICH dataset |
| 7 | 7 PAS stained WSIs from UMICH dataset | 20 PAS and 16 H&E WSIs from UMICH dataset |
Many studies have been performed to identify glomeruli functional tissue units in human kidneys. Recently, a Kaggle competition, Hacking the Kidney, was launched to segment glomeruli in kidney images (source: https://www.kaggle.com/competitions/hubmap-kidney-segmentation). The dataset provided for the competition was public dataset 2, discussed in the dataset section: TIFF files ranging in size from 500MB to 5GB, with eight images for training and five images for testing, and RLE-encoded and unencoded (JSON) annotations included in the training and validation sets. The authors of [32] compare the five winning algorithms among the more than one thousand teams that participated in the competition, assessing the accuracy and performance of the five top algorithms; the code is available online (source: https://github.com/cns-iu/ccf-research-kaggle-2021/). To compare a segmentation model with the detection model in this paper, the first team's algorithm was chosen as the benchmark, and its accuracy on the same validation dataset, i.e., 20 PAS stained images and 16 H&E stained images from the University of Michigan, was calculated following the description of the winning proposal (source: https://www.kaggle.com/c/hubmap-kidney-segmentation/discussion/238198). They used a single U-Net with a SeResNext101 backbone, a Convolutional Block Attention Module (CBAM), hypercolumns, and deep supervision. Their network reads 1024 × 1024 pixel patches and downsamples them to 320 × 320 patches; SGD is the optimizer, and the model is trained with binary cross-entropy.
Training is performed for 20 epochs, with a scheduled learning rate and a batch size of 8 images. Their final weights, trained on the whole training dataset, were used to validate and compare their network on the University of Michigan dataset, which contains 20 PAS stained images and 16 H&E stained images; the results are provided in Section IV. Note that it is not possible to fine-tune this segmentation model on the external validation set (University of Michigan WSIs), as the external WSIs do not have pixel-level annotations. To compare the segmentation model with YOLO, each segmented area is enclosed with the smallest possible rectangle (from the upper-left-most and lower-right-most coordinates), and these rectangles are used as the segmentation model's output. Figure 6 depicts the process visually.
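A minimal sketch of this post-processing is shown below, assuming the segmentation output has already been binarized into a mask; it is an illustration using connected-component labeling, not the competition team's actual code.

```python
import numpy as np
from scipy import ndimage

def mask_to_boxes(mask):
    """Convert a binary segmentation mask to the smallest enclosing
    rectangle per connected component, so segmentation output can be
    scored with the same box-based metrics as the detector."""
    labeled, n = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        y, x = sl  # (row slice, column slice) for one component
        boxes.append((x.start, y.start, x.stop - 1, y.stop - 1))
    return boxes

# Toy mask with two predicted glomeruli.
mask = np.zeros((10, 10), dtype=np.uint8)
mask[1:4, 2:6] = 1
mask[6:9, 5:9] = 1
print(mask_to_boxes(mask))  # -> [(2, 1, 5, 3), (5, 6, 8, 8)]
```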
IV Experiments & Results
Immunopathology, clinical symptoms, and morphological abnormalities are all factors in classifying glomerular disorders [33]. To classify glomerular diseases, these objects must first be detected; therefore, the average sensitivity and specificity of the detection matter.
With true positives denoted $TP$, false positives $FP$, false negatives $FN$, and true negatives $TN$, the sensitivity and specificity metrics [34] are defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \qquad (3)$$

$$\text{Specificity} = \frac{TN}{TN + FP} \qquad (4)$$
To compute true positives, false positives, false negatives, and true negatives, the intersection over union (IoU) measure, the overlap between two boundaries divided by the area of their union, has been used. A pre-defined IoU threshold (0.5) classifies whether a prediction is a true positive or a false positive. False negatives are those ground-truth glomeruli not covered by any predicted bounding box. True negatives were calculated based on the area of the whole-slide tissue minus the predicted areas, i.e., the remaining tissue regions that contain no glomeruli.
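A minimal sketch of this IoU-based matching follows. The box format (x_min, y_min, x_max, y_max) is assumed, and the greedy one-to-one matching policy is one straightforward choice rather than necessarily the exact procedure used here.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_detections(preds, truths, threshold=0.5):
    """Count TP/FP/FN at the given IoU threshold: a prediction is a true
    positive if it overlaps a not-yet-matched ground-truth box with
    IoU >= threshold; ground truths left unmatched are false negatives."""
    matched, tp, fp = set(), 0, 0
    for p in preds:
        hit = next((i for i, t in enumerate(truths)
                    if i not in matched and iou(p, t) >= threshold), None)
        if hit is None:
            fp += 1
        else:
            matched.add(hit)
            tp += 1
    fn = len(truths) - len(matched)
    return tp, fp, fn

# Example: one correct detection, one spurious one, one missed glomerulus.
preds = [(100, 100, 200, 200), (400, 400, 450, 450)]
truths = [(110, 105, 205, 210), (600, 600, 700, 700)]
print(match_detections(preds, truths))  # -> (1, 1, 1)
```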
As mentioned in Section III, 7 training datasets built from two public datasets and one private dataset were created and validated on two datasets with different stains from the University of Michigan. In this section, the average sensitivity and specificity are calculated for all of these experiments over all images, along with a comparison with the existing segmentation method.
IV-A PAS Validation Set
Two public datasets and a private dataset from the University of Michigan, all PAS stained, were used to train YOLO, with validation on 20 PAS stained images. Different experiments were designed and evaluated on these images. Average sensitivity and specificity values for each experiment are shown in Table 3, along with the comparison to the segmentation method explained in Section III (used for the Hacking the Kidney competition).
The ROC (receiver operating characteristic) curves [35] for all experiments on these 20 PAS stained images are shown in Figure 9. As reported in Table 3, the segmentation results have high average specificity with lower sensitivity, which means the network produces few false positives but detects only about half of the ground-truth glomeruli. Furthermore, using the external validation set (University of Michigan WSIs) to fine-tune this segmentation model is not feasible, since the external WSIs do not contain pixel-level annotation.
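The paper does not detail how the ROC points were generated; one plausible construction, assuming each predicted box carries a confidence score, is to sweep the acceptance threshold and recompute the counts at each setting, reusing a matcher such as the `match_detections` sketch above. Specificity additionally needs the area-based true-negative estimate described earlier, so this sketch returns the raw counts only.

```python
import numpy as np

def roc_counts(scored_preds, truths, match_fn, thresholds=None):
    """Sweep the detector's confidence threshold and collect (TP, FP, FN)
    at each setting; scored_preds is a list of (box, confidence) pairs and
    match_fn is a TP/FP/FN counter such as match_detections above.
    Sensitivity follows directly as TP / (TP + FN); specificity also
    requires the area-based TN estimate described in the text."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 21)
    counts = []
    for thr in thresholds:
        kept = [box for box, score in scored_preds if score >= thr]
        counts.append((float(thr), match_fn(kept, truths)))
    return counts
```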
Among the YOLO experiments, one was trained only on the 7 PAS stained images from the University of Michigan and reached an average sensitivity and specificity of 85% and 80%, respectively, showing that the network performs well on validation data from the same source even with limited training data. Three other experiments were trained only on the public datasets and evaluated on the external validation data from the University of Michigan.
Examining the network on an external dataset assesses its generalization. It is also evident that after fine-tuning with only 7 PAS stained images from the University of Michigan, the average sensitivity on the same dataset improves considerably; for example, the average sensitivity and specificity changed from 45% and 98% to 74% and 94%, respectively. The results may improve further given more data from the same source as the validation dataset for fine-tuning the network.
Another important point is the difference between the results of the experiments trained on the first public dataset and on the second. Combining both datasets can reduce accuracy compared to training only on the first public dataset; the reason may be the difference between the images of the second public dataset and those from the University of Michigan. The images from the first public dataset and the University of Michigan are needle biopsies, whereas the second public dataset consists of surgical excision tissue samples. "Needle biopsy" refers to a procedure in which a needle is inserted into a suspicious region to collect cells, while during a "surgical biopsy" a surgeon makes an incision in the skin to reach the suspicious cells. As shown in Figures 7 and 8, the number of glomeruli and their size relative to the whole image are among the differences between needle biopsies and surgical biopsies.
Table 3: Average sensitivity and specificity on the 20 PAS stained validation WSIs.

| Dataset | Average Sensitivity | Average Specificity |
| --- | --- | --- |
| 31 WSIs from public dataset 1 | 82% | 95% |
| 31 WSIs from public dataset 1, fine-tuned with 7 PAS stained WSIs from UMICH dataset | 85% | 89% |
| 8 WSIs from public dataset 2 | 45% | 98% |
| 8 WSIs from public dataset 2, fine-tuned with 7 PAS stained WSIs from UMICH dataset | 74% | 94% |
| 31 WSIs from public dataset 1, and 8 WSIs from public dataset 2 | 75% | 95% |
| 31 WSIs from public dataset 1, and 8 WSIs from public dataset 2, fine-tuned with 7 PAS stained WSIs from UMICH dataset | 83% | 96% |
| 7 PAS stained WSIs from UMICH dataset | 85% | 80% |
| Segmentation Method (HubMap Competition) | 48% | 99% |

IV-B H&E Validation Set
A total of 16 H&E stained images from the University of Michigan were used as a validation dataset for all training configurations described in the previous section. Table 4 compares the average sensitivity and specificity of all seven YOLO experiments, trained on the two public datasets and the private University of Michigan dataset, against the segmentation method from Section III used in the Hacking the Kidney competition. ROC curves for all experiments on these 16 H&E stained images are shown in Figure 10. There is a considerable difference between the validation results on PAS stained and H&E stained images; this substantial difference is explained by the difference in tissue staining between the training and validation datasets.
As in Table 3, the high average specificity and low sensitivity shown in Table 4 indicate that the segmentation method produces very few false positives but detects only about half of the ground-truth glomeruli.
Also as in Table 3, the results improved after fine-tuning the training with only seven images from the University of Michigan. The difference between the experiments trained on the first public dataset and on the second is still significant: because the second dataset consists of surgical biopsy images while the University of Michigan images are needle biopsies, combining both datasets can reduce accuracy.
Table 4: Average sensitivity and specificity on the 16 H&E stained validation WSIs.

| Dataset | Average Sensitivity | Average Specificity |
| --- | --- | --- |
| 31 WSIs from public dataset 1 | 51% | 95% |
| 31 WSIs from public dataset 1, fine-tuned with 7 PAS stained WSIs from UMICH dataset | 67% | 89% |
| 8 WSIs from public dataset 2 | 30% | 85% |
| 8 WSIs from public dataset 2, fine-tuned with 7 PAS stained WSIs from UMICH dataset | 59% | 90% |
| 31 WSIs from public dataset 1, and 8 WSIs from public dataset 2 | 58% | 94% |
| 31 WSIs from public dataset 1, and 8 WSIs from public dataset 2, fine-tuned with 7 PAS stained WSIs from UMICH dataset | 70% | 96% |
| 7 PAS stained WSIs from UMICH dataset | 70% | 86% |
| Segmentation Method (HubMap Competition) | 47% | 99% |

V Conclusions
There have been several technological advances across health care and digital pathology in recent years. Automated segmentation and pixel analysis of digital pathology images may identify diagnostic patterns and visual cues, leading to more reliable and consistent diagnostic categorization.
Glomeruli detection, the first step in classifying glomerular diseases and subsequently diagnosing different kidney diseases, is essential in digital pathology. Because of the large number of these objects in the kidney, computerized quantification of glomeruli could save pathologists considerable time. This paper trained YOLO-v4 with seven different training datasets built from two public datasets and a private dataset from the University of Michigan, and evaluated the networks on 20 PAS stained and 16 H&E stained images from the University of Michigan. By training YOLO-v4 on the first public dataset and fine-tuning with only 7 PAS stained images from the University of Michigan, the experiments achieved 85% average sensitivity and 89% average specificity on the 20 PAS stained validation images, the best result among the training configurations. On the H&E stained images, 70% average sensitivity and 96% average specificity were obtained by training on both public datasets followed by fine-tuning on the 7 PAS stained images. The final weights of a U-Net-based segmentation method were also evaluated on the same validation datasets; that model achieved high specificity but lower sensitivity, making it rather unreliable compared to YOLO with its higher sensitivity. Moreover, obtaining pixel-level WSI annotations is time-consuming, which makes fine-tuning such a model with limited data harder than for detection methods like YOLO, which only require a bounding box around the target objects.
References
- [1] F. Xing and L. Yang, “Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: a comprehensive review,” IEEE reviews in biomedical engineering, vol. 9, pp. 234–263, 2016.
- [2] H. Irshad, A. Veillard, L. Roux, and D. Racoceanu, “Methods for nuclei detection, segmentation, and classification in digital histopathology: a review—current status and future potential,” IEEE reviews in biomedical engineering, vol. 7, pp. 97–114, 2013.
- [3] M. Khened, A. Kori, H. Rajkumar, G. Krishnamurthi, and B. Srinivasan, “A generalized deep learning framework for whole-slide image segmentation and analysis,” Scientific reports, vol. 11, no. 1, pp. 1–14, 2021.
- [4] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.
- [5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587, 2014.
- [6] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788, 2016.
- [7] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE international conference on computer vision, pp. 1440–1448, 2015.
- [8] J. N. Marsh, M. K. Matlock, S. Kudose, T.-C. Liu, T. S. Stappenbeck, J. P. Gaut, and S. J. Swamidass, “Deep learning global glomerulosclerosis in transplant kidney frozen sections,” IEEE transactions on medical imaging, vol. 37, no. 12, pp. 2718–2728, 2018.
- [9] N. Altini, G. D. Cascarano, A. Brunetti, I. De Feudis, D. Buongiorno, M. Rossini, F. Pesce, L. Gesualdo, and V. Bevilacqua, “A deep learning instance segmentation approach for global glomerulosclerosis assessment in donor kidney biopsies,” Electronics, vol. 9, no. 11, p. 1768, 2020.
- [10] G. Bueno, M. M. Fernandez-Carrobles, L. Gonzalez-Lopez, and O. Deniz, “Glomerulosclerosis identification in whole slide images using semantic segmentation,” Computer methods and programs in biomedicine, vol. 184, p. 105273, 2020.
- [11] P. Sudharshan, C. Petitjean, F. Spanhol, L. E. Oliveira, L. Heutte, and P. Honeine, “Multiple instance learning for histopathological breast cancer image classification,” Expert Systems with Applications, vol. 117, pp. 103–111, 2019.
- [12] M. B. Stokes and V. D. D’Agati, “Morphologic variants of focal segmental glomerulosclerosis and their significance,” Advances in chronic kidney disease, vol. 21, no. 5, pp. 400–407, 2014.
- [13] I. M. Bajema, S. Wilhelmus, C. E. Alpers, J. A. Bruijn, R. B. Colvin, H. T. Cook, V. D. D’Agati, F. Ferrario, M. Haas, J. C. Jennette, et al., “Revision of the International Society of Nephrology/Renal Pathology Society classification for lupus nephritis: clarification of definitions, and modified National Institutes of Health activity and chronicity indices,” Kidney international, vol. 93, no. 4, pp. 789–796, 2018.
- [14] H. Trimarchi, J. Barratt, D. C. Cattran, H. T. Cook, R. Coppo, M. Haas, Z.-H. Liu, I. S. Roberts, Y. Yuzawa, H. Zhang, et al., “Oxford classification of IgA nephropathy 2016: an update from the IgA nephropathy classification working group,” Kidney international, vol. 91, no. 5, pp. 1014–1021, 2017.
- [15] N. J. Vickers, “Animal communication: when i’m calling you, will you answer too?,” Current biology, vol. 27, no. 14, pp. R713–R715, 2017.
- [16] Z.-Q. Zhao, P. Zheng, S.-t. Xu, and X. Wu, “Object detection with deep learning: A review,” IEEE transactions on neural networks and learning systems, vol. 30, no. 11, pp. 3212–3232, 2019.
- [17] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, “A review on deep learning techniques applied to semantic segmentation,” arXiv preprint arXiv:1704.06857, 2017.
- [18] D. Ledbetter, L. Ho, and K. V. Lemley, “Prediction of kidney function from biopsy images using convolutional neural networks,” arXiv preprint arXiv:1702.01816, 2017.
- [19] Y. Kawazoe, K. Shimamoto, R. Yamaguchi, Y. Shintani-Domoto, H. Uozaki, M. Fukayama, and K. Ohe, “Faster R-CNN-based glomerular detection in multistained human whole slide images,” Journal of Imaging, vol. 4, no. 7, p. 91, 2018.
- [20] G. D. Cascarano, F. S. Debitonto, R. Lemma, A. Brunetti, D. Buongiorno, I. De Feudis, A. Guerriero, M. Rossini, F. Pesce, L. Gesualdo, et al., “An innovative neural network framework for glomerulus classification based on morphological and texture features evaluated in histological images of kidney biopsy,” in International Conference on Intelligent Computing, pp. 727–738, Springer, 2019.
- [21] N. Altini, G. D. Cascarano, A. Brunetti, F. Marino, M. T. Rocchetti, S. Matino, U. Venere, M. Rossini, F. Pesce, L. Gesualdo, et al., “Semantic segmentation framework for glomeruli detection and classification in kidney histological sections,” Electronics, vol. 9, no. 3, p. 503, 2020.
- [22] J. Gallego, A. Pedraza, S. Lopez, G. Steiner, L. Gonzalez, A. Laurinavicius, and G. Bueno, “Glomerulus classification and detection based on convolutional neural networks,” Journal of Imaging, vol. 4, no. 1, p. 20, 2018.
- [23] T. Kato, R. Relator, H. Ngouv, Y. Hirohashi, O. Takaki, T. Kakimoto, and K. Okada, “Segmental HOG: new descriptor for glomerulus detection in kidney microscopy image,” BMC Bioinformatics, vol. 16, no. 1, pp. 1–16, 2015.
- [24] O. Simon, R. Yacoub, S. Jain, J. E. Tomaszewski, and P. Sarder, “Multi-radial LBP features as a tool for rapid glomerular detection and assessment in whole slide histopathology images,” Scientific reports, vol. 8, no. 1, pp. 1–11, 2018.
- [25] M. Temerinac-Ott, G. Forestier, J. Schmitz, M. Hermsen, J. Bräsen, F. Feuerhake, and C. Wemmert, “Detection of glomeruli in renal pathology by mutual comparison of multiple staining modalities,” in Proceedings of the 10th International Symposium on Image and Signal Processing and Analysis, pp. 19–24, IEEE, 2017.
- [26] H. A. Alturkistani, F. M. Tashkandi, and Z. M. Mohammedsaleh, “Histological stains: a literature review and case study,” Global journal of health science, vol. 8, no. 3, p. 72, 2016.
- [27] D. Wittekind, “Traditional staining for routine diagnostic pathology including the role of tannic acid. 1. value and limitations of the hematoxylin-eosin stain,” Biotechnic & histochemistry, vol. 78, no. 5, pp. 261–270, 2003.
- [28] J. K. Chan, “The wonderful colors of the hematoxylin–eosin stain in diagnostic surgical pathology,” International journal of surgical pathology, vol. 22, no. 1, pp. 12–32, 2014.
- [29] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” arXiv preprint arXiv:1409.4842, 2014.
- [30] G. Bueno, L. Gonzalez-Lopez, M. Garcia-Rojo, A. Laurinavicius, and O. Deniz, “Data for glomeruli characterization in histopathological images,” Data in brief, vol. 29, p. 105314, 2020.
- [31] R. R. Robinson, Renal disease in children: Clinical evaluation and diagnosis. Springer Science & Business Media, 2012.
- [32] L. L. Godwin, Y. Ju, N. Sood, Y. Jain, E. M. Quardokus, A. Bueckle, T. Longacre, A. Horning, Y. Lin, E. D. Esplin, et al., “Robust and generalizable segmentation of human functional tissue units,” bioRxiv, 2021.
- [33] A. Mastrangelo, J. Serafinelli, M. Giani, and G. Montini, “Clinical and pathophysiological insights into immunological mediated glomerular diseases in childhood,” Frontiers in Pediatrics, vol. 8, p. 205, 2020.
- [34] A. G. Lalkhen and A. McCluskey, “Clinical tests: sensitivity and specificity,” Continuing education in anaesthesia critical care & pain, vol. 8, no. 6, pp. 221–223, 2008.
- [35] Z. H. Hoo, J. Candlish, and D. Teare, “What is an ROC curve?,” 2017.