A three in one bottom-up framework for simultaneous semantic segmentation, instance segmentation and classification of multi-organ nuclei in digital cancer histology

Ibtihaj Ahmad Syed Muhammad Israr Zain Ul Islam

Abstract

Simultaneous segmentation and classification of nuclei in digital histology play an essential role in computer-assisted cancer diagnosis; however, the task remains challenging. The highest reported binary and multi-class Panoptic Quality (PQ) scores remain as low as 0.68 bPQ and 0.49 mPQ, respectively. This is due to high staining variability, variability across tissues, harsh clinical conditions, overlapping nuclei, and nuclear class imbalance. Generic deep-learning methods usually rely on end-to-end models, which fail to address these problems that are specific to digital histology. In our previous work, DAN-NucNet, we resolved these issues for semantic segmentation with an end-to-end model. This work extends our previous model to simultaneous instance segmentation and classification. We introduce additional decoder heads with independent weighted losses, which produce semantic segmentation, edge proposals, and classification maps. We then apply post-processing to the outputs of the three-head model to produce the final segmentation and classification. Our multi-stage approach utilizes edge proposals and semantic segmentations, in contrast to the direct segmentation and classification strategies followed by most state-of-the-art methods. As a result, we demonstrate a significant performance improvement in producing high-quality instance segmentation and nuclei classification. We achieve a Dice score of 0.841 for semantic segmentation, 0.713 bPQ for instance segmentation, and 0.633 mPQ for nuclei classification. Our proposed framework generalizes across 19 types of tissues. Furthermore, the framework is less complex than the state-of-the-art.

keywords:

Cancer Diagnosis in Digital Histology , Nuclei Segmentation , Nuclei Classification , Simultaneous Segmentation and Classification , Spatial Channel Attention , Histopathology

journal: Journal of ABC

[inst1] School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, Shaanxi, PR China, e-mail: [email protected]

[inst2] School of Information Science and Technology, University of Science and Technology of China, Hefei 230000, Anhui, PR China, e-mail: [email protected]

[inst3] ICT Division, College of Science & Engineering, Hamad Bin Khalifa University, Education City 34110, Doha, Qatar, e-mail: [email protected]

1 Introduction

Digital histology plays a significant role in the analysis and diagnosis of cancers. Digital histology takes the form of a Whole Slide Image (WSI), usually of size 100,000 x 100,000 pixels. A single WSI may contain thousands of nuclei of various classes. The analysis of these nuclei provides information about the tumor. The appearance, morphological shape, and patterns of the nuclei act as essential markers. These markers give information about the type of cancer, grade of cancer [1], metastases [2], and survival prediction [3]. In current medical practice, pathologists analyze WSIs using Computer Assisted Diagnosis (CAD). Although these semi-automatic CAD systems have significantly reduced the time required, the analysis remains a tiresome job and requires an expert pathologist. An automated analysis would greatly reduce the pathologist’s workload and shorten the diagnosis time.

Generally, nuclei segmentation and classification in digital histology are achieved using three different approaches, i.e., the top-down, the bottom-up, and the keypoint-based approaches [4]. Top-down and keypoint-based approaches follow “detect and classify” strategies. In contrast, bottom-up approaches use “segment and classify” strategies. The drawback of the keypoint-based methods is that they achieve lower Segmentation Quality (SQ) since they only produce high-confidence instances. The top-down approaches are unsuitable for histology segmentation and classification since they perform poorly for closely packed nuclei. Bottom-up approaches, such as [5, 6, 7, 8], produce high-resolution semantic segmentation, group the pixels into instances, and finally classify the nuclei. The bottom-up approach is followed primarily due to its superior performance, specifically for digital histology. For nuclei classification, the bottom-up “segment and classify” strategy performs better than the “detect and classify” strategy [9]. However, the segmentation performance of the bottom-up approaches depends significantly on the quality of the semantic segmentation. Although the bottom-up approaches are better than the rest, they still face issues. These arise from overlapping nuclei, variation across organs, high staining variability, extreme clinical conditions, poor image quality, artifacts, variable nuclear density, variable magnification levels, and nuclei class imbalance [10, 5, 4]. Although current methods [5, 6, 7, 11, 12] have solved some of these problems, issues such as variation in tissue, poor staining quality, variable nuclear density, and nuclei class imbalance still exist.

In our previous work, DAN-NucNet [13], we resolved some of these issues. However, DAN-NucNet is limited to semantic segmentation. Considering the superiority of the bottom-up approaches, we extend our previous model to a three-head dual attention model embedded in the bottom-up fashion with the post-processing methods. We extend our single decoder to a three-head decoder model. The decoder heads produce semantic segmentation, edge proposals, and pixel-level class proposals. Then we incorporate the three-head model with the post-processing methods. Our post-processing approach uses the nuclei edges to apply a controlled watershed to generate nuclei instances. These instances and the outputs from the classification decoder head are used to generate a clean version of the classified nuclei. The generic workflow of the proposed framework is shown in Fig. 1.

Figure 1: The figure provides an overview of our bottom-up approach framework. The three-head Network produces semantic segmentation, edge proposals, and pixel-wise classes, then using post-processing, instance segmentation and classification are obtained.

The redesigned model has the following benefits: (1) Using the DAN-NucNet baseline helps improve the tissue generalization capability and the semantic segmentation quality. (2) Spatial attention addresses the morphological variability problem, while channel attention addresses the staining variability problem. (3) Instead of direct instance segmentation, we utilize the edge proposals produced by the three-head model together with the semantic segmentation. This significantly improves the instance segmentation compared to the direct approach used by the state-of-the-art. (4) The use of a different loss function for each output, together with the weighted loss strategy, greatly improves the performance of each individual output and, in particular, mitigates the problems associated with class imbalance. The primary contributions of this work are as follows:

1. An all-in-one novel bottom-up framework for the simultaneous semantic segmentation, instance segmentation, and classification of nuclei in multi-organ cancer histology images is proposed in this work. We modify our previous model by adding additional decoder heads and post-processing methods. The baseline model is equipped with two more losses, including a weighted loss function.

2. We utilize the semantic segmentation and the edges proposed by one of the decoder heads to perform instance segmentation, in contrast to the state-of-the-art, which mostly relies on direct approaches. This helps the model to predict the nuclei boundaries even in challenging conditions, such as overlapping nuclei and missing staining.

3. The instance segmentation and class maps are then used to assign classes to the nuclei instances. Due to this, a large improvement in classifying imbalanced nuclear categories, i.e., connective and dead nuclei, is achieved.

4. The proposed framework remains less computationally expensive than the state-of-the-art. Furthermore, our framework generalizes across 19 cancer sites.

2 Related Work

This section presents a detailed overview of the relevant literature. The related work is organized into three sub-sections based on the earlier approaches.

2.1 Keypoint Based Approaches

Keypoint-based approaches regress the location of the nuclei using pre-defined key points. The keypoint approaches are classified into two categories, i.e., group-based and group-free keypoint approaches. Group-based keypoint approaches produce key points for each nucleus and then group them using post-processing methods to form bounding boxes. CornerNet [14], CentripetalNet [15], and RepPoints [16] are examples of group-based keypoint approaches. Group-based keypoint approaches have similar drawbacks to traditional image-processing methods since they depend on traditional post-processing methods. Contrary to the group-based keypoint approaches, the group-free keypoint approaches detect the nuclei centers directly. Group-free keypoint approaches include CenterNet [17], SaccadeNet [18], and PointNu-Net [4]. The drawback of the keypoint-based methods is that they achieve lower Segmentation Quality (SQ) since they output the high-confidence instances only, thus reducing the classification performance. Keypoint-based methods also produce a large number of incorrect bounding boxes [17]. Furthermore, keypoint-based methods have reduced inference speed [19].

2.2 Top-down Approaches

Top-down approaches produce candidate regions and use non-maximum suppression to obtain bounding boxes. Segmentation and classification of nuclei are then performed independently. Examples of top-down approaches include Mask-RCNN [20], YOLACT [21], CenterMask [22], PolarMask [23], TensorMask [24], and MaskLab [25]. Although the top-down approaches are commonly used for generic applications, these methods are rarely used for nuclei segmentation and classification for several reasons. One reason is that these approaches have shown lower performance for segmenting nuclei boundaries due to densely populated and overlapping nuclei. These approaches may propose tens of overlapping bounding boxes in a small region of a histology image. Furthermore, due to non-maximum suppression, a bounding box may not cover the whole nucleus.

2.3 Bottom-up Approaches

Bottom-up approaches produce high-resolution semantic segmentation and then utilize post-processing methods to group and clean the segmented and classified nuclei. Some top-performing bottom-up approaches include Kumar et al. [26], DIST [6], CIA-Net [7], Triple U-Net [8], HoVer-Net [5], and [11]. Kumar et al. [26] predict the background, nuclei, and edges of the nuclei and then cluster the pixels into instances based on region growing from seeds. CIA-Net [7] uses contour supervision to generate edges. DIST [6] regresses distance maps to group pixels. HoVer-Net [5] uses separate vertical and horizontal distance maps of the nuclei pixels from their centers to achieve simultaneous instance segmentation and classification. Although these bottom-up approaches are commonly used for nuclei segmentation and classification due to their superior performance, they still have drawbacks. These include problems like variation in tissue, poor staining quality, variable nuclear density (nuclei overlapping), and nuclei class imbalance. These methods still show a significant performance gap in extreme clinical conditions.

3 Proposed Framework

The idea of the suggested framework is based on the bottom-up approach, i.e., we design a model that produces basic features, which are then utilized by the post-processing methods to produce improved segmentations and classifications. We redesign our previous model by adding additional decoder heads. These decoder heads produce edge proposals and pixel-level class proposals. It is to be noted that, theoretically, we could produce instance segmentation and classifications directly with these additional decoders, just like some of the state-of-the-art models. However, this approach has some drawbacks. The model can easily mis-segment overlapping and partially stained nuclei. This also results in a higher false classification rate. Therefore, using additional information is necessary. In our case, we propose edges, combine them with semantic segmentation and other hand-crafted features, and then use post-processing to achieve instance segmentation and classification. Generally, our strategy is as follows. First, a histology image is given as input to our pre-trained three-head network. The three-head network produces three outputs, i.e., semantic segmentation, edge proposals, and pixel-wise classification. Next, we apply a controlled watershed algorithm using semantic segmentation and edge proposals to generate instance segmentation. Finally, we group the pixel-wise classes from the three-head network using the instance segmentation to obtain a cleaned version of the classified nuclei. The detailed working of each block is explained in the following subsections.

3.1 Three Head Network

Encoder-decoder-based networks have improved segmentation performance in digital histology compared to other networks. However, these networks have limitations. Usually, the encoder and decoder features are directly fused. Some of the features may carry redundant information, which increases complexity and may even reduce performance. Digital histology is H&E stained, where most information lies in the channels. Furthermore, some information is also spatially present; for example, the inflammatory nuclei usually surround the tumor nuclei. Therefore, considering these features may increase the network performance. The second problem is the design of the encoder. The encoder is composed of consecutive convolution layers, which produce discontinuity between lower-level and higher-level features. It also introduces the vanishing gradient problem. Inspired by these observations, we introduce spatial and channel attention mechanisms. We place our attention mechanism between the encoder and decoder as an intermediate layer. We call it the attention decoder block. Transposed convolutions follow the attention decoder blocks. We have replaced the simple convolution blocks in the encoder with ResNet blocks. Fig. 2 shows the three-head network and building blocks in detail.

The three-head network has 16 filters at the first stage (represented by ‘F’ in Fig. 2), and then at each subsequent stage of the encoder, the number of filters is doubled. At each subsequent decoder stage, the number of filters is halved.

3.1.1 Attention Mechanism

With some modifications, we deploy the spatial-channel attention as in our previous work. We place a transposed convolution layer at the end of each decoder instead of a naive up-sampling layer, enabling the network to learn features during up-sampling. This increases performance for poorly stained images and incomplete/missing nuclei. The results of the transposed convolution are used in the successive attention decoders, except for the first decoder, where we use the output of the intermediate convolution layers. The channel and spatial attention mechanisms are mathematically expressed as Eq. 1 and Eq. 2, respectively,

\begin{matrix}I=\sigma_{ReLU}\left(C_{1,1}^{F/8}\left(AvgPool\left(F_{enc}\right)\right)\right)\oplus\sigma_{ReLU}\left(C_{1,1}^{F/8}\left(MaxPool\left(F_{enc}\right)\right)\right)\\ J=\sigma_{ReLU}\left(C_{1,1}^{F/8}\left(AvgPool\left(F_{dec}\right)\right)\right)\oplus\sigma_{ReLU}\left(C_{1,1}^{F/8}\left(MaxPool\left(F_{dec}\right)\right)\right)\\ A_{cha}=\sigma_{sig}\left(\sigma_{ReLU}\left(C_{1,1}^{F/8}\left(I\oplus J\right)\right)\right)\end{matrix}

(1)

\begin{matrix}K=\sigma_{ReLU}\left(C_{1,1}^{1}\left(MaxPool\left(AvgPool\left(F_{enc}\right)\right)\right)\right)\\ L=\sigma_{ReLU}\left(C_{1,1}^{1}\left(MaxPool\left(AvgPool\left(F_{dec}\right)\right)\right)\right)\\ A_{spa}=\sigma\left(K\oplus L\right)\end{matrix}

(2)

In the above equations, $C_{k,k}^{f}$ represents a convolution with kernel size $k \times k$ and $f$ filters, $F$ is the number of filters at the current stage, $\sigma_{ReLU}$ is the ReLU activation function, $\sigma$ (also written $\sigma_{sig}$) is the sigmoid activation function, and $\oplus$ represents concatenation. $A_{cha}$ and $A_{spa}$ represent the outputs of the channel attention and spatial attention blocks, respectively.

The spatial attention maps $A_{spa}$ and channel attention maps $A_{cha}$ obtained above are finally multiplied with the encoder features $F_{enc}$, which results in the refined features $F_{r}$. $F_{r}$ is mathematically expressed as Eq. 3,

F_{r}=F_{enc}\otimes A_{cha}\otimes A_{spa}

(3)
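As a shape-level illustration of Eq. 3, the following NumPy sketch applies a channel attention vector and a spatial attention map to an encoder feature tensor through broadcasting. The tensor layouts (H, W, C), (1, 1, C), and (H, W, 1), the sizes, and the random values are assumptions made only for illustration; the function name `refine_features` is hypothetical, and the sketch does not reproduce the convolutions and pooling of Eq. 1 and Eq. 2.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function; attention maps are assumed to lie in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def refine_features(f_enc, a_cha, a_spa):
    """Eq. 3: F_r = F_enc (x) A_cha (x) A_spa as element-wise products with broadcasting."""
    return f_enc * a_cha * a_spa

rng = np.random.default_rng(0)
H, W, C = 4, 4, 16                               # assumed feature-map size and channel count
f_enc = rng.normal(size=(H, W, C))               # stand-in for encoder features
a_cha = sigmoid(rng.normal(size=(1, 1, C)))      # one weight per channel (assumed shape)
a_spa = sigmoid(rng.normal(size=(H, W, 1)))      # one weight per spatial location (assumed shape)

f_r = refine_features(f_enc, a_cha, a_spa)
print(f_r.shape)
```

Because both attention maps are sigmoid outputs, each refined feature is a scaled-down copy of the corresponding encoder feature, so the attention can only attenuate, never amplify, an activation.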

3.1.2 ResNet Block

The proposed ResNet block extracts useful semantic features at all encoder levels. The working of the ResNet block is represented in Fig. 2 and explained as follows. First, the input to the ResNet block is passed through a 1x1 convolution + ReLU layer, then a 3x3 convolution + ReLU layer, and finally a 1x1 convolution + ReLU layer. This is represented in Eq. 4. Simultaneously, the input is also passed through a 1x1 convolution layer, as represented in Eq. 5. The outputs of these two branches are fused, and a 3x3 convolution + ReLU layer + batch normalization is applied, which is represented as Eq. 6.

\left.\begin{matrix}T_{i,0}=\sigma_{ReLU}\left(C_{1,1}^{F}\left(in_{i}\right)\right)\\ T_{i,1}=BN\left(\sigma_{ReLU}\left(C_{3,3}^{F}\left(T_{i,0}\right)\right)\right)\\ X_{i,1}=\sigma_{ReLU}\left(C_{1,1}^{F}\left(T_{i,1}\right)\right)\\ \end{matrix}\right\}

(4)

X_{i,0}=\sigma_{ReLU}\left(C_{1,1}^{F}\left(in_{i}\right)\right)

(5)

out_{i}=BN\left(\sigma_{ReLU}\left(C_{3,3}^{F}\left\{X_{i,0}\oplus X_{i,1}\right\}\right)\right)

(6)

In the above equations, $in_{i}$ represents the input and $out_{i}$ represents the output of the ResNet block at layer $i$. $T_{i,j}$ and $X_{i,j}$ represent intermediate features, $C_{k,k}^{F}$ represents convolution, and $BN$ represents batch normalization.

3.2 Instance Segmentation and Classification

We use a controlled watershed algorithm to obtain the nuclei instance segmentation. The instance segmentation framework is shown in Fig. 3. First, we remove the edge proposals having less than ten percent probability. This limit is chosen such that the framework provides optimal performance. This step removes some of the false edges. In the next step, we subtract these edge proposals from the semantic segmentation maps. This separates the connected nuclei and gives us the sure foreground region (foreground markers) of the nuclei. The background markers in the semantic segmentation image are shown in black. Ideally, we do not want these background markers to be too close to the edges of the nuclei. This is achieved by computing the watershed transform of the Euclidean distance transform of the sure foreground markers and then identifying the watershed ridgelines. We combine all these, i.e., sure foreground, ridgelines, and background. Finally, we apply a marker-controlled watershed algorithm to obtain the nuclei instances.
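The core of this procedure can be sketched with SciPy and scikit-image as below. This is a simplified illustration under stated assumptions, not the exact pipeline: the semantic map is binarized at an assumed threshold of 0.5, the ridgeline-based background markers are replaced by restricting the flooding to the semantic mask, and the function name `instances_from_heads` is hypothetical. Only the ten percent edge threshold comes from the text.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

def instances_from_heads(sem_prob, edge_prob, edge_thr=0.10, sem_thr=0.5):
    """Marker-controlled watershed from semantic and edge probability maps."""
    sem_mask = sem_prob > sem_thr                 # binary nuclei mask (threshold is an assumption)
    edges = edge_prob >= edge_thr                 # discard edge proposals below 10% probability
    sure_fg = sem_mask & ~edges                   # subtracting edges separates touching nuclei
    markers, _ = ndi.label(sure_fg)               # one integer marker per sure-foreground region
    dist = ndi.distance_transform_edt(sem_mask)   # distance of each nucleus pixel to background
    # Flood the inverted distance map from the markers, restricted to the semantic mask.
    return watershed(-dist, markers, mask=sem_mask)

# Toy example: one elongated blob that an edge proposal splits into two nuclei.
sem_prob = np.zeros((10, 16))
sem_prob[2:8, 2:14] = 1.0
edge_prob = np.full((10, 16), 0.05)               # weak edge responses, below the threshold
edge_prob[2:8, 8] = 0.9                           # strong edge between the two nuclei
labels = instances_from_heads(sem_prob, edge_prob)
print(np.unique(labels))
```

In this sketch the edge pixels themselves are reassigned to a neighboring instance during flooding, so the union of the instances covers the whole semantic mask.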

For classification, we take the previously segmented instances and extract the corresponding region in the pixel-wise class image obtained from the three-head network. Then we separately count the pixels of each class within the extracted region. Finally, we assign to the selected instance the class with the highest frequency within the region. Mathematically, the process is represented as Eq. 7.

y_{c}^{ins}=\operatorname{argmax}\left(\sum x_{c1}\left(i,j\right),\sum x_{c2}\left(i,j\right),\cdots,\sum x_{cn}\left(i,j\right)\right)

(7)

where $y_{c}^{ins}$ is the class ($c$) assigned to the instance ($ins$), $x_{cn}\left(i,j\right)$ is the pixel class at row $i$, column $j$ predicted by the three-head network, and $n$ is the total number of classes. The sums run over the pixels belonging to the instance. An illustration of the pixel grouping is shown in Fig. 4.
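The pixel grouping of Eq. 7 amounts to a majority vote inside each instance. The NumPy sketch below illustrates it under assumed encodings: instance label 0 and class label 0 both denote background, class labels are integers 1 to n, and background votes are ignored. The function name `assign_instance_classes` is hypothetical.

```python
import numpy as np

def assign_instance_classes(inst_map, class_map, n_classes):
    """Assign to each instance the most frequent pixel-wise class inside it."""
    assigned = {}
    for inst_id in np.unique(inst_map):
        if inst_id == 0:                       # 0 is assumed to be background
            continue
        pixels = class_map[inst_map == inst_id]
        counts = np.bincount(pixels, minlength=n_classes + 1)
        counts[0] = 0                          # ignore background votes (assumption)
        assigned[int(inst_id)] = int(np.argmax(counts))
    return assigned

# Two instances; the pixel-wise head is noisy inside both of them.
inst_map = np.array([[1, 1, 0, 2],
                     [1, 1, 0, 2],
                     [0, 0, 0, 2]])
class_map = np.array([[1, 1, 0, 3],
                      [2, 1, 0, 3],
                      [0, 0, 0, 5]])
result = assign_instance_classes(inst_map, class_map, n_classes=5)
print(result)
```

Instance 1 receives three votes for class 1 and one for class 2, and instance 2 receives two votes for class 3 and one for class 5, so each instance ends up with a single class even though the pixel-wise map disagrees with itself.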

3.3 Loss Function

The proposed three-head framework has three losses ($l_{a}$, $l_{b}$, and $l_{c}$) with their respective weights ($\lambda_{a}$, $\lambda_{b}$, and $\lambda_{c}$), combined as in Eq. 8,

l=\underbrace{\lambda_{a}l_{a}}_{\text{Semantic Segmentation}}+\underbrace{\lambda_{b}l_{b}}_{\text{Edge Proposal}}+\underbrace{\lambda_{c}l_{c}}_{\text{Pixel-Wise Class}}

(8)

For semantic segmentation and edge proposals, we retain our previous loss function [27, 28], since it has proven to be the most effective for semantic segmentation. This loss function, used for both $l_{a}$ and $l_{b}$, is represented by Eq. 9,

l_{a}=\frac{Dice\;Loss*Jaccard\;Loss}{Dice\;Loss+Jaccard\;Loss}

(9)

We have used this specific loss function for semantic segmentation and edge proposals since it has shown slightly improved performance compared to the other widely used loss functions. For the classification head, we have used a weighted categorical cross-entropy loss, represented by Eq. 10,

l_{c}=-\sum_{d}w_{d}\,t_{c,d}\log\left(p_{c,d}\right)

(10)

where, $p$ are the predictions, $t$ are the targets, $c$ denotes the data point, $d$ denotes the nuclei class, and $w_{d}$ denotes individual class weights. It is to be noted that we have used separate weights ( $w_{d}$ ) within the categorical cross-entropy as well in order to tackle the high nuclear category imbalance. These weights are assigned based on the percentage of the nuclear category, reported by PanNuke [9]. The other weights, i.e., $\lambda_{a}$ , $\lambda_{b}$ , and $\lambda_{c}$ are, specifically, set to 1, 5, and 4, respectively, for optimal performance. The improvement in performance due to the addition of these weights is verified and reported in the ablation study (Section 5.2) of this work.
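The loss terms of this section can be illustrated numerically with NumPy as follows. This is a minimal sketch, not the training code: the soft Dice and Jaccard formulations, the smoothing constant `EPS`, the averaging of the cross-entropy over data points, and all function names are assumptions; only the combination of Dice and Jaccard losses, the class-weighted cross-entropy, and the head weights 1, 5, and 4 come from the text.

```python
import numpy as np

EPS = 1e-7  # smoothing constant (assumption) to avoid division by zero and log(0)

def dice_loss(t, p):
    inter = np.sum(t * p)
    return 1.0 - (2.0 * inter + EPS) / (np.sum(t) + np.sum(p) + EPS)

def jaccard_loss(t, p):
    inter = np.sum(t * p)
    union = np.sum(t) + np.sum(p) - inter
    return 1.0 - (inter + EPS) / (union + EPS)

def dice_jaccard_loss(t, p):
    """Dice * Jaccard / (Dice + Jaccard), used for the semantic and edge heads."""
    d, j = dice_loss(t, p), jaccard_loss(t, p)
    return d * j / (d + j + EPS)

def weighted_cce(t, p, w):
    """Class-weighted categorical cross-entropy.

    t, p: (N, D) one-hot targets and predicted probabilities; w: (D,) class weights.
    The mean over the N data points is an assumption.
    """
    per_point = -np.sum(w * t * np.log(np.clip(p, EPS, 1.0)), axis=-1)
    return per_point.mean()

def total_loss(l_a, l_b, l_c, lam=(1.0, 5.0, 4.0)):
    """Weighted sum of the three head losses with the weights stated in the text."""
    return lam[0] * l_a + lam[1] * l_b + lam[2] * l_c

t_seg = np.array([1.0, 1.0, 0.0, 0.0])
p_seg = np.array([0.9, 0.6, 0.2, 0.1])
t_cls = np.array([[1.0, 0.0], [0.0, 1.0]])
p_cls = np.array([[0.7, 0.3], [0.4, 0.6]])
w_cls = np.array([1.0, 3.0])                   # hypothetical class weights
l_total = total_loss(dice_jaccard_loss(t_seg, p_seg),
                     dice_jaccard_loss(t_seg, p_seg),
                     weighted_cce(t_cls, p_cls, w_cls))
print(l_total)
```

A larger class weight makes every misclassified pixel of a rare category, such as dead nuclei, contribute more to the loss than a misclassified pixel of a frequent category.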

4 Experiments and Results

4.1 Datasets

PanNuke is the largest publicly available cross-tissue nuclei cancer histology dataset, with thousands of images. The PanNuke dataset is obtained by combining four different datasets, i.e., CPM2017 [29], Kumar [26], visual fields from TCGA [30], and bone marrow [31]. The images of size 256 × 256 are extracted from more than 20,000 Whole Slide Images (WSIs). It contains a total of around forty-seven thousand labeled nuclei. The nuclei are labeled as instances and assigned to five classes, i.e., neoplastic, non-neoplastic or epithelial, inflammatory, connective, and dead nuclei. The dataset is split into training, validation, and testing splits. We follow the same splits as the earlier literature in order to obtain reproducible results. We apply limited pre-processing to the data. We remove a few blank images present in the dataset. We extract the Canny edges from the labeled instances and use them to train the network head that predicts the edges of the nuclei.

4.2 Implementation Details

We developed the proposed network using Keras, an open-source Python library for deep learning. The model is trained on an NVIDIA RTX 2060 graphics processing unit (GPU), accelerated by CUDA 11.0. We minimize the losses using the Adam optimizer with a mini-batch size of five. The learning rate is set to 0.004 at the start of the training. When the model stops improving for five consecutive epochs, the learning rate is decayed by a factor of 0.3. The training is considered complete if the validation loss stops improving for fifteen successive epochs.
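The learning-rate decay and stopping rules can be made concrete with the following pure-Python simulation. It is a sketch under assumptions: the monitored quantity is taken to be the validation loss for both rules, "decayed by a factor of 0.3" is read as multiplying the learning rate by 0.3, and the function name `simulate_schedule` is hypothetical. The behavior is similar in spirit to the Keras `ReduceLROnPlateau` and `EarlyStopping` callbacks, although the text does not state which mechanism was used.

```python
def simulate_schedule(val_losses, lr=0.004, factor=0.3, lr_patience=5, stop_patience=15):
    """Return the learning rate after each epoch and the epoch at which training stops."""
    best = float("inf")
    since_best = 0   # epochs since the validation loss last improved (stopping rule)
    since_lr = 0     # stagnant epochs since the last improvement or learning-rate decay
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr = 0
        else:
            since_best += 1
            since_lr += 1
        if since_lr >= lr_patience:
            lr *= factor                 # decay after five stagnant epochs
            since_lr = 0
        history.append(lr)
        if since_best >= stop_patience:
            return history, epoch        # stop after fifteen stagnant epochs
    return history, None                 # never triggered within the given losses

# Hypothetical validation curve: improves twice, then plateaus.
val_losses = [1.0, 0.9] + [0.95] * 20
history, stop_epoch = simulate_schedule(val_losses)
print(stop_epoch, history[-1])
```

With these settings a plateau triggers up to three decays before training stops, so the final learning rate is at most 0.3 cubed times the initial one lower.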

4.3 Results And Comparison

We utilize PQ for comparative analysis, as discussed by Graham et al. [5]. We use binary PQ (bPQ) and multi-class PQ (mPQ), as suggested by Gamper et al. [9]. bPQ treats all the nuclei as belonging to a single class, while mPQ calculates PQ for each class of nuclei individually and then averages the per-class PQ values. One of the reasons for using mPQ is that it is not affected by class imbalance. To assess how our method performs on each tissue, we calculate bPQ and mPQ for each of the 19 tissues in the dataset individually. We report tissue-wise results and the average across the tissues in Table 1. Table 1 also shows the comparison of the suggested framework against top-performing methods, i.e., HoVer-Net [5], DIST [6], Micro-Net [32], Mask-RCNN [20], and PointNu-Net [4].
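For reference, PQ is commonly defined as the sum of the IoU values of the matched (true positive) instance pairs divided by TP + 0.5 FP + 0.5 FN, where a ground-truth and a predicted instance are matched when their IoU exceeds 0.5. The NumPy sketch below computes this quantity for a pair of instance maps. It is a simplified illustration, not the official PanNuke evaluation code: label 0 is assumed to be background, the function names are hypothetical, and `multiclass_pq` simply averages PQ over per-class instance maps supplied by the caller.

```python
import numpy as np

def panoptic_quality(gt, pred, iou_thr=0.5):
    """PQ = sum of matched IoUs / (TP + 0.5 * FP + 0.5 * FN) for two instance maps."""
    gt_ids = [int(i) for i in np.unique(gt) if i != 0]
    pred_ids = [int(i) for i in np.unique(pred) if i != 0]
    matched_pred = set()
    iou_sum, tp = 0.0, 0
    for g in gt_ids:
        g_mask = gt == g
        for p in np.unique(pred[g_mask]):        # only predictions that overlap g
            p = int(p)
            if p == 0 or p in matched_pred:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            iou = inter / union
            if iou > iou_thr:                    # IoU > 0.5 guarantees a unique match
                iou_sum += iou
                tp += 1
                matched_pred.add(p)
                break
    fp = len(pred_ids) - tp
    fn = len(gt_ids) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom > 0 else 0.0

def multiclass_pq(gt_per_class, pred_per_class):
    """Average PQ over classes, given one instance map per class for gt and prediction."""
    return float(np.mean([panoptic_quality(g, p)
                          for g, p in zip(gt_per_class, pred_per_class)]))

gt = np.zeros((6, 6), dtype=int)
gt[0:2, 0:4] = 1                                 # ground-truth nucleus 1 (8 pixels)
gt[4:6, 0:2] = 2                                 # ground-truth nucleus 2 (missed below)
pred = np.zeros((6, 6), dtype=int)
pred[0:2, 0:3] = 1                               # overlaps nucleus 1 with IoU 6/8
pred[4:6, 4:6] = 2                               # spurious detection
pq = panoptic_quality(gt, pred)
print(pq)
```

In the toy example there is one match with IoU 0.75, one false positive, and one false negative, so both detection errors and imprecise boundaries lower the score.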

Table 1: The table reports tissue-wise instance segmentation and classification results compared with the top-performing methods in the literature. The last row shows the average across the tissue splits.

Methods $\rightarrow$	DIST		Micro-Net		Mask-RCNN		HoVer-Net		PointNu-Net		Proposed
Organs $\downarrow$	mPQ	bPQ	mPQ	bPQ	mPQ	bPQ	mPQ	bPQ	mPQ	bPQ	mPQ	bPQ
AG	0.3442	0.5603	0.4153	0.6440	0.3470	0.5546	0.4812	0.6962	0.5115	0.7134	0.6386	0.7031
Bile Duct	0.3614	0.5384	0.4124	0.6232	0.3536	0.5567	0.4714	0.6696	0.4868	0.6814	0.6477	0.7557
Breast	0.3790	0.5466	0.4407	0.6029	0.3882	0.5574	0.4902	0.6470	0.5147	0.6709	0.6472	0.7325
Bladder	0.4463	0.5625	0.5357	0.6488	0.5065	0.6049	0.5792	0.7031	0.6065	0.7226	0.6654	0.7548
Colon	0.2989	0.4508	0.3414	0.4972	0.3122	0.4603	0.4095	0.5575	0.4509	0.5945	0.6402	0.7164
Cervix	0.3371	0.5309	0.3795	0.6101	0.3402	0.5483	0.4438	0.6652	0.5014	0.6899	0.6627	0.6917
Esophagus	0.3942	0.5295	0.4668	0.6011	0.4311	0.5691	0.5085	0.6427	0.5504	0.6766	0.6426	0.7179
H & N	0.3177	0.4764	0.3668	0.5242	0.3946	0.5457	0.4530	0.6331	0.4838	0.6546	0.6229	0.6913
Lung	0.2809	0.4978	0.3370	0.5588	0.3182	0.5134	0.4004	0.6302	0.4048	0.6352	0.6272	0.6977
Kidney	0.3339	0.5727	0.4165	0.6321	0.3553	0.5092	0.4424	0.6836	0.5066	0.6912	0.6147	0.7074
Liver	0.3441	0.5818	0.4365	0.6666	0.4103	0.6085	0.4974	0.7248	0.5174	0.7314	0.6050	0.7027
Ovarian	0.3789	0.5289	0.4387	0.6013	0.4337	0.5784	0.4863	0.6309	0.5484	0.6863	0.6029	0.6859
Prostate	0.3810	0.5442	0.4341	0.6049	0.3959	0.5789	0.5101	0.6615	0.5127	0.6854	0.6507	0.6821
Pancreatic	0.3395	0.5343	0.4041	0.6074	0.3624	0.5460	0.4600	0.6491	0.4804	0.6791	0.6382	0.7246
Stomach	0.3369	0.5553	0.3872	0.6293	0.3684	0.5976	0.4726	0.6886	0.4517	0.7010	0.6235	0.7399
Skin	0.2627	0.5080	0.3223	0.5817	0.2665	0.5021	0.3429	0.6234	0.4011	0.6494	0.6309	0.6895
Thyroid	0.2574	0.5596	0.3712	0.6555	0.3037	0.5712	0.4315	0.6983	0.4508	0.7076	0.6202	0.6995
Testis	0.3278	0.5548	0.4088	0.6300	0.3512	0.5420	0.4754	0.6890	0.5334	0.7058	0.6666	0.7512
Uterus	0.3487	0.5246	0.3965	0.5821	0.3683	0.5589	0.4393	0.6393	0.4846	0.6634	0.5926	0.7139
Average	0.3406	0.5346	0.4059	0.6053	0.3688	0.5528	0.4629	0.6596	0.4957	0.6808	0.6337	0.7135

The suggested framework achieves state-of-the-art performance compared to the top performers for both instance segmentation and classification (refer to Table 1). We report the average PQ for each nuclei class in Table 2. The suggested framework significantly improves nuclei classification, especially for the connective and dead nuclei.

Table 2: Average PQ for each nuclei class across the three data splits of the PanNuke dataset.

	Neoplastic	Inflammatory	Epithelium	Dead	Connective
DAN-NucNet [13]	0.410	0.314	0.390	0.000	0.359
Micro-Net [32]	0.504	0.333	0.442	0.051	0.334
DIST [6]	0.439	0.343	0.439	0.000	0.275
Mask-RCNN [20]	0.472	0.290	0.472	0.069	0.300
HoVer-Net [5]	0.551	0.417	0.551	0.139	0.388
PointNu-Net [4]	0.578	0.433	0.578	0.154	0.409
Proposed	0.596	0.459	0.619	0.251	0.477

We observe that the suggested framework produces improved semantic segmentation compared to the top performers. Since bottom-up approaches rely on semantic segmentation, we have compared the proposed framework specifically with bottom-up approaches. The comparison is made using the Dice metric and is reported in Table 3. The table also supports the observation that better semantic segmentation in bottom-up approaches improves their instance segmentation and classification, as demonstrated by [4].

Table 3: The table reports the comparison of the suggested framework with the top performer bottom-up approaches.

	Semantic Segmentation	Instance Segmentation	Classification
	(Dice)	(bPQ)	(mPQ)
U-Net [33]	0.715	–	–
NucleiSegNet [34]	0.825	–	–
DIST [6]	0.717	0.5346	0.3406
Micro-Net [32]	0.813	0.6053	0.4059
HoVer-Net [5]	0.819	0.6596	0.4629
Proposed	0.841	0.7135	0.6337

We present visual results of our semantic and instance segmentation under variable conditions in Fig. 5, and classification results in Fig. 6. We also present our visual results compared to the top performers in Fig. 7. The images in Fig. 7 also represent harsh conditions. We further compare our visual results with the top two performers, PointNu-Net [4] and HoVer-Net [5] (reported in Fig. 8). The suggested framework successfully segments and classifies partial or smaller nuclei. In comparison, the existing methods fail to segment smaller or partially present nuclei.

4.4 Significance Analysis of the Results

Although the significance of the results of our framework can be observed in Table 1, we perform an in-depth analysis to examine the framework's performance in further detail. Based on the detailed results on individual tissue sites, we draw box plots (Fig. 9). The left boxplot in the figure is drawn for classification, while the boxplot on the right side is drawn for instance segmentation. The boxplots help us to analyze various aspects of the framework. 1) The colored dots represent outliers among the results. We can observe that the proposed framework produces the fewest outliers, demonstrating the framework’s generalization capability across multiple organs. 2) The colored boxes show the interquartile range of the frameworks. The suggested framework has a narrower interquartile range than the state-of-the-art, indicating the framework’s stability.

5 Framework Analysis

5.1 Complexity Analysis of the Three-Head Model

The complexity of the network is an important factor in histology since a single whole slide may be of the size of 100,000x100,000 pixels. Although it is an essential comparison, it is rarely mentioned in the literature. We have compared the complexity of the suggested model with the state-of-the-art models. The comparison is in terms of the number of trainable parameters, which reflects the training and inference complexity of the models. The proposed three-head model has a total of 2.74 million parameters. However, if we use a separate feature extractor for improved nuclei classification, the total number of parameters increases to 4.19 million. This is far fewer than in the state-of-the-art models. The nearest competitors are DIST [6] and HoVer-Net [5]. A graphical comparison of the parameters is shown in Fig. 10. It is to be noted that the complexity of the post-processing used by the suggested framework and the state-of-the-art is not reported, because the resources utilized by the post-processing are negligible compared to the models.

5.2 Model Related Ablation Studies

For an in-depth analysis of the proposed model, we performed ablation studies related to the model’s components. These studies help us to validate different aspects of the proposed method. We specifically perform four studies. First, we use a standard (uncontrolled) watershed algorithm instead of a controlled watershed. We observe that the uncontrolled watershed algorithm leads to poor instance segmentation. The use of controlled watersheds over uncontrolled watershed methods is encouraged in the previous literature; thus, this result re-verifies the previous literature. Second, we evaluate the framework without pixel grouping. If we do not use pixel grouping, we get multiple class suggestions for each nucleus instance since one of the heads of our three-head network produces pixel-wise classes. This problem is shared by other bottom-up approaches as well. This makes end-to-end networks for nuclei classification in histology less feasible. It also hints at the superiority of bottom-up approaches over the existing end-to-end state-of-the-art networks. Third, we use the same class weights for each nuclear category. This reduces performance because the histology dataset has a high nuclei class imbalance. Because of the class imbalance, the existing methods perform worst on dead nuclei. Therefore, we suggest the use of a weighted loss function for classification. Finally, we evaluate the framework using the same feature extractor for semantic segmentation, edge proposals, and pixel-wise classes. This has almost no effect on semantic or instance segmentation. However, using the same feature extractor reduces the classification performance. The use of a separate feature extractor is also suggested by previous works, such as Graham et al. [5]. However, using a separate feature extractor increases the number of parameters. The results of the ablation studies are summarized in Table 4.

Table 4: Results of the ablation studies carried out on the proposed framework.

Ablation studies	bPQ	mPQ
Uncontrolled Watershed	0.4659	0.3252
Without Pixel Grouping	–	0.4786
Equal Class Weights	0.7126	0.5213
Same Feature Extractor for Class	0.7121	0.5937
Proposed	0.7135	0.6337
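
The "Equal Class Weights" row motivates a class-weighted classification loss. The sketch below shows one common weighting scheme, inverse class frequency, applied to a per-pixel cross-entropy term. The weighting scheme, the function names, and the pixel counts are assumptions made for illustration; the weights actually used by the framework are not reproduced here.

```python
import math


def inverse_frequency_weights(pixel_counts):
    """Weight each class by total / (n_classes * count), so rare classes weigh more."""
    total = sum(pixel_counts.values())
    n_classes = len(pixel_counts)
    return {cls: total / (n_classes * count)
            for cls, count in pixel_counts.items()}


def weighted_cross_entropy(probs, target, weights):
    """Cross-entropy for one pixel, scaled by the weight of its true class."""
    return -weights[target] * math.log(probs[target])


# Hypothetical pixel counts for an imbalanced dataset (illustrative only).
pixel_counts = {"neoplastic": 800, "connective": 150, "dead": 50}
weights = inverse_frequency_weights(pixel_counts)

# The same prediction confidence is penalized more for the rare class.
probs = {"neoplastic": 0.5, "connective": 0.25, "dead": 0.25}
loss_common = weighted_cross_entropy(probs, "neoplastic", weights)
loss_rare = weighted_cross_entropy(probs, "dead", weights)
print(weights, loss_common, loss_rare)
```

With equal weights, errors on the abundant class dominate the loss; scaling by inverse frequency increases the contribution of rare categories such as dead nuclei.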

6 Conclusion

This work presents an all-in-one framework for simultaneous semantic segmentation, instance segmentation, and nuclei classification in digital cancer histology. We use our previous DAN-NucNet model as a baseline to develop a three-head model. The three-head model has additional decoder heads that generate semantic segmentation, edge proposals, and pixel-wise class maps. We use the estimated edges and the semantic segmentation produced by two of the decoder heads to perform instance segmentation via a controlled watershed. The edge maps help the model improve the delineation of nuclei boundaries even in challenging conditions, such as overlapping nuclei and missing staining. We then use the instance segmentation and the class maps to assign classes to the nuclei instances via pixel grouping. Owing to the third decoder head, pixel grouping, and the weighted loss function, we observe a large improvement in classifying imbalanced nuclear categories, i.e., connective and dead nuclei. Overall, we demonstrate a considerable improvement in nuclei segmentation and classification compared to the state-of-the-art, achieving a 0.841 Dice score, 0.713 bPQ, and 0.633 mPQ.
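
The interplay between the semantic mask and the edge proposals can be sketched as follows. This is a deliberately simplified stand-in for a controlled watershed: markers are the connected regions of the mask after removing predicted edge pixels, and they are grown back into the mask by plain breadth-first region growing rather than by priority-based flooding. All names and the toy inputs are hypothetical.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def find_markers(mask, edges):
    """Label connected foreground regions after removing edge pixels."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or edges[r][c] or labels[r][c]:
                continue
            count += 1  # start a new marker and flood-fill it
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                            and not edges[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def grow_markers(labels, mask):
    """Grow markers into the remaining foreground (the removed edge pixels)."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in labels]
    queue = deque((r, c) for r in range(h) for c in range(w) if out[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not out[ny][nx]:
                out[ny][nx] = out[y][x]
                queue.append((ny, nx))
    return out


# Toy input: two touching nuclei form one blob in the semantic mask;
# the edge head marks the column where they meet.
mask = [[0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0]]
edges = [[0, 0, 0, 1, 0, 0, 0],
         [0, 0, 0, 1, 0, 0, 0]]

markers, n_instances = find_markers(mask, edges)
instances = grow_markers(markers, mask)
print(n_instances, instances)
```

Without the edge map, the same mask yields a single marker, so the two touching nuclei would be merged into one instance. Which neighbour claims a shared edge pixel depends on the traversal order in this sketch; a real watershed resolves it with an energy or distance map.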

Although the proposed framework demonstrates a significant improvement in the segmentation and classification of nuclei, nuclei classification still has considerable room for improvement. This is due to the resemblance between non-neoplastic and neoplastic nuclei. Furthermore, because of the high nuclei class imbalance, the dead nuclear category still reaches only 0.261 mPQ. We leave these problems to future work.

References

[1] C. Lu, D. Romo-Bucheli, X. Wang, A. Janowczyk, S. Ganesan, H. Gilmore, D. Rimm, A. Madabhushi, Nuclear shape and orientation features from h&e images predict survival in early-stage estrogen receptor-positive breast cancers, Laboratory Investigation 98 (11) (2018) 1438–1448. doi:10.1038/s41374-018-0095-7.
[2] Y. Liu, K. Gadepalli, M. Norouzi, G. E. Dahl, T. Kohlberger, A. Boyko, S. Venugopalan, A. Timofeev, P. Q. Nelson, G. S. Corrado, J. D. Hipp, L. Peng, M. C. Stumpe, Detecting cancer metastases on gigapixel pathology images, MICCAI Tutorial (2017) (Mar. 2017). arXiv:1703.02442.
[3] N. Alsubaie, K. Sirinukunwattana, S. E. A. Raza, D. Snead, N. M. Rajpoot, A bottom-up approach for tumour differentiation in whole slide images of lung adenocarcinoma, in: M. N. Gurcan, J. E. Tomaszewski (Eds.), Medical Imaging 2018: Digital Pathology, SPIE, 2018. doi:10.1117/12.2293316.
[4] K. Yao, K. Huang, J. Sun, A. Hussain, C. Jude, Pointnu-net: Simultaneous multi-tissue histology nuclei segmentation and classification in the clinical wild, IEEE Transactions on Emerging Topics in Computational Intelligence (2023).
[5] S. Graham, Q. D. Vu, S. E. A. Raza, A. Azam, Y. W. Tsang, J. T. Kwak, N. Rajpoot, Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images, Medical Image Analysis 58 (2019) 101563. doi:10.1016/j.media.2019.101563.
[6] P. Naylor, M. Lae, F. Reyal, T. Walter, Segmentation of nuclei in histopathology images by deep regression of the distance map, IEEE Transactions on Medical Imaging 38 (2) (2019) 448–459. doi:10.1109/tmi.2018.2865709.
[7] Y. Zhou, O. F. Onder, Q. Dou, E. Tsougenis, H. Chen, P.-A. Heng, CIA-Net: Robust nuclei instance segmentation with contour-aware information aggregation, in: Lecture Notes in Computer Science, Springer International Publishing, 2019, pp. 682–693.
[8] B. Zhao, X. Chen, Z. Li, Z. Yu, S. Yao, L. Yan, Y. Wang, Z. Liu, C. Liang, C. Han, Triple u-net: Hematoxylin-aware nuclei segmentation with progressive dense feature aggregation, Medical Image Analysis 65 (2020) 101786. doi:10.1016/j.media.2020.101786.
[9] J. Gamper, N. A. Koohbanani, S. Graham, M. Jahanifar, S. A. Khurram, A. Azam, K. Hewitt, N. Rajpoot, Pannuke dataset extension, insights and baselines, arXiv preprint arXiv:2003.10778 (2020).
[10] F. Xing, L. Yang, Robust nucleus/cell detection and segmentation in digital pathology and microscopy images: A comprehensive review, IEEE Reviews in Biomedical Engineering 9 (2016) 234–263. doi:10.1109/rbme.2016.2515127.
[11] S. Graham, Q. D. Vu, M. Jahanifar, S. E. A. Raza, F. Minhas, D. Snead, N. Rajpoot, One model is all you need: Multi-task learning enables simultaneous histology image segmentation and classification, Medical Image Analysis 83 (2023) 102685. doi:10.1016/j.media.2022.102685.
[12] M. Swerdlow, O. Guler, R. Yaakov, D. G. Armstrong, Simultaneous segmentation and classification of pressure injury image data using mask-r-CNN, Computational and Mathematical Methods in Medicine 2023 (2023) 1–7. doi:10.1155/2023/3858997.
[13] I. Ahmad, Y. Xia, H. Cui, Z. U. Islam, DAN-NucNet: A dual attention based framework for nuclei segmentation in cancer histology images under wild clinical conditions, Expert Systems with Applications (2022) 118945. doi:10.1016/j.eswa.2022.118945.
[14] H. Law, J. Deng, Cornernet: Detecting objects as paired keypoints, in: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 734–750.
[15] Z. Dong, G. Li, Y. Liao, F. Wang, P. Ren, C. Qian, Centripetalnet: Pursuing high-quality keypoint pairs for object detection, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020. arXiv:2003.09119.
[16] Z. Yang, S. Liu, H. Hu, L. Wang, S. Lin, Reppoints: Point set representation for object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[17] K. Duan, S. Bai, L. Xie, H. Qi, Q. Huang, Q. Tian, Centernet: Keypoint triplets for object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[18] S. Lan, Z. Ren, Y. Wu, L. S. Davis, G. Hua, Saccadenet: A fast and accurate object detector, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 10397–10406.
[19] J. Dong, J. Yuan, L. Li, X. Zhong, A lightweight high-resolution representation backbone for real-time keypoint-based object detection, in: 2020 IEEE International Conference on Multimedia and Expo (ICME), IEEE, 2020, pp. 1–6. doi:10.1109/icme46284.2020.9102749.
[20] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask r-cnn, in: Proceedings of the IEEE international conference on computer vision, 2017. arXiv:1703.06870.
[21] D. Bolya, C. Zhou, F. Xiao, Y. J. Lee, Yolact: Real-time instance segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[22] Y. Lee, J. Park, Centermask: Real-time anchor-free instance segmentation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019. arXiv:1911.06667.
[23] E. Xie, P. Sun, X. Song, W. Wang, D. Liang, C. Shen, P. Luo, Polarmask: Single shot instance segmentation with polar representation, in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019. arXiv:1909.13226.
[24] X. Chen, R. Girshick, K. He, P. Dollar, Tensormask: A foundation for dense object segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[25] L.-C. Chen, A. Hermans, G. Papandreou, F. Schroff, P. Wang, H. Adam, Masklab: Instance segmentation by refining object detection with semantic and direction features, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017. arXiv:1712.04837.
[26] N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, A. Sethi, A dataset and a technique for generalized nuclear segmentation for computational pathology, IEEE Transactions on Medical Imaging 36 (7) (2017) 1550–1560. doi:10.1109/tmi.2017.2677499.
[27] I. Ahmad, M. A. Ahmad, S. J. Anwar, Transfer learning and dual attention network based nuclei segmentation in head and neck digital cancer histology images, in: 2023 15th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), IEEE, 2023. doi:10.1109/ecai58194.2023.10193937.
[28] I. Ahmad, Y. Xia, H. Cui, Z. U. Islam, AATSN: Anatomy aware tumor segmentation network for PET-CT volumes and images using a lightweight fusion-attention mechanism, Computers in Biology and Medicine 157 (2023) 106748. doi:10.1016/j.compbiomed.2023.106748.
[29] Q. D. Vu, S. Graham, T. Kurc, M. N. N. To, M. Shaban, T. Qaiser, N. A. Koohbanani, S. A. Khurram, J. Kalpathy-Cramer, T. Zhao, et al., Methods for segmentation and classification of digital microscopy tissue images, Frontiers in bioengineering and biotechnology (2019) 53.
[30] J. Liu, T. Lichtenberg, K. A. Hoadley, L. M. Poisson, A. J. Lazar, A. D. Cherniack, A. J. Kovatich, C. C. Benz, D. A. Levine, A. V. Lee, L. Omberg, D. M. Wolf, C. D. Shriver, V. Thorsson, An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics, Cell 173 (2) (2018) 400–416.e11. doi:10.1016/j.cell.2018.02.052.
[31] P. Kainz, M. Urschler, S. Schulter, P. Wohlhart, V. Lepetit, You should use regression to detect cells, in: Lecture Notes in Computer Science, Springer International Publishing, 2015, pp. 276–283. doi:10.1007/978-3-319-24574-4_33.
[32] S. E. A. Raza, L. Cheung, M. Shaban, S. Graham, D. Epstein, S. Pelengaris, M. Khan, N. M. Rajpoot, Micro-net: A unified model for segmentation of various objects in microscopy images, Medical Image Analysis 52 (2019) 160–173. doi:10.1016/j.media.2018.12.003.
[33] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, in: Lecture Notes in Computer Science, Springer International Publishing, 2015, pp. 234–241.
[34] S. Lal, D. Das, K. Alabhya, A. Kanfade, A. Kumar, J. Kini, NucleiSegNet: Robust deep learning architecture for the nuclei segmentation of liver cancer histopathology images, Computers in Biology and Medicine 128 (2021) 104075. doi:10.1016/j.compbiomed.2020.104075.