
1 Computer Vision Institute, School of Computer Science and Software Engineering, Shenzhen University
[email protected], [email protected]
2 National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, 430074, Wuhan, China
[email protected]

HI-Net: Hyperdense Inception 3D UNet for Brain Tumor Segmentation

Saqib Qamar*1 (0000-0002-5980-5976), Parvez Ahmad*2 (0000-0003-1409-3175), Linlin Shen1
* Equal contribution
Abstract

The brain tumor segmentation task aims to classify tissue into the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) classes using multi-modal MRI images. Quantitative analysis of brain tumors is critical for clinical decision making. Manual segmentation is tedious, time-consuming, and subjective, yet the task remains very challenging for automatic segmentation methods. Thanks to their powerful learning ability, convolutional neural networks (CNNs), mainly fully convolutional networks, have shown promising results on brain tumor segmentation. This paper further boosts segmentation performance by proposing the hyperdense inception 3D UNet (HI-Net), which captures multi-scale information by stacking factorized 3D weighted convolutional layers in the residual inception block. We use hyperdense connections among the factorized convolutional layers to extract more contextual information through feature reuse, and a dice loss function to cope with class imbalance. We validate the proposed architecture on the multi-modal brain tumor segmentation challenge (BRATS) 2020 testing dataset. Preliminary results on the BRATS 2020 testing set show that our approach achieves dice (DSC) scores of 0.79457, 0.87494, and 0.83712 for ET, WT, and TC, respectively.

Keywords:
Brain tumor · 3D UNet · Dense connections · Factorized convolution · Deep learning

1 Introduction

Primary and secondary are the two types of brain tumors. Primary brain tumors originate from brain cells, whereas secondary tumors metastasize into the brain from other organs. Gliomas are primary brain tumors, and can be further sub-divided into two grades: low-grade (LGG) and high-grade (HGG). High-grade gliomas are an aggressive type of malignant brain tumor that proliferates rapidly, usually requires surgery and radiotherapy, and has a poor survival prognosis. Magnetic resonance imaging (MRI) is a critical diagnostic tool for brain tumor analysis, monitoring, and surgery planning. Usually, several complementary 3D MRI modalities - such as T1, T1 with contrast agent (T1c), T2, and fluid-attenuated inversion recovery (FLAIR) - are required to emphasize different tissue properties and areas of tumor spread. For example, the contrast agent, usually gadolinium, emphasizes hyperactive tumor subregions in the T1c modality.

Deep learning techniques, especially CNNs, are prevalent for the automatic segmentation of brain tumors. CNNs can learn from examples and demonstrate state-of-the-art segmentation accuracy both on 2D natural images and on 3D medical image modalities. The segmentation output provides an accurate, reproducible solution for further tumor analysis and monitoring. The multi-modal brain tumor segmentation challenge (BRATS) aims to evaluate state-of-the-art methods for the segmentation of brain tumors by providing a 3D MRI dataset with ground truth labels annotated by physicians [1], [2], [3], [4], [14]. The 3D UNet is a popular CNN architecture for automatic brain tumor segmentation [8]; the multi-scale contextual information of its encoder-decoder sub-networks is effective for accurate brain tumor segmentation. Several variations of encoder-decoder architectures were proposed for the MICCAI BRATS 2018 and 2019 competitions. The potential of several deep architectures [12, 13, 17] and their ensembling for brain tumor segmentation was discussed by a top-performing method [11] in the MICCAI BRATS 2017 competition. Wang et al. [18] proposed architectures with factorized weighted layers to save GPU memory and computational time. At the same time, the majority of these architectures relied on bigger input sizes [16], cascaded training [10], or novel pre-processing [7] and post-processing strategies [9] to improve segmentation accuracy. In contrast, few architectures address the substantial memory consumption of 3D convolutional layers. Chen et al. [5] split each weighted layer into three parallel branches, each with a different orthogonal view, namely axial, sagittal, and coronal. However, more complex combinations exist between features within and in-between different orthogonal views, which can significantly enrich the learned representation [6]. Inspired by the S3D UNet architecture [5, 19], we propose a variant encoder-decoder based architecture for brain tumor segmentation. The key contributions of our study are as follows:

  • A novel hyperdense inception 3D UNet (HI-Net) architecture is proposed by stacking factorized 3D weighted convolutional layers in the residual inception block.

  • In each residual inception block, hyperdense connections are used in-between different orthogonal views to learn a more complex feature representation (see the sketch after this list).

  • Our network achieves state-of-the-art performance compared with other recent methods.
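To make the factorized, hyper-densely connected block concrete, the following minimal Keras sketch shows one way such a block could be wired; the kernel shapes, filter counts, and exact connection order are our illustrative assumptions, not the authors' released code.

```python
# A minimal sketch (assumed wiring, not the authors' code) of a residual
# inception block whose 3D convolution is factorized into axial, sagittal,
# and coronal branches with hyperdense connections between the views.
import tensorflow as tf
from tensorflow.keras import layers

def hyperdense_factorized_conv(x, filters):
    # Axial view: convolve in the H x W plane.
    axial = layers.Conv3D(filters, (3, 3, 1), padding="same", activation="relu")(x)
    # Sagittal view also sees the axial features (dense inter-connection).
    sagittal = layers.Conv3D(filters, (3, 1, 3), padding="same", activation="relu")(
        layers.Concatenate()([x, axial]))
    # Coronal view sees the input and both earlier views.
    coronal = layers.Conv3D(filters, (1, 3, 3), padding="same", activation="relu")(
        layers.Concatenate()([x, axial, sagittal]))
    # Fuse the three views and project back to `filters` channels.
    fused = layers.Conv3D(filters, 1, padding="same")(
        layers.Concatenate()([axial, sagittal, coronal]))
    # Residual shortcut: the element-wise addition shown in Fig. 1.
    shortcut = layers.Conv3D(filters, 1, padding="same")(x)
    return layers.Add()([fused, shortcut])
```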

2 Proposed Method

Figure 1 shows the proposed HI-Net architecture for brain tumor segmentation. The left side of the network works as an encoder that extracts features at different levels, and the right side acts as a decoder that aggregates the features and produces the segmentation mask. The modified residual inception blocks of the encoder-decoder sub-networks have two 3D convolutional layers, each following the structure of Fig. 2(b); traditional residual inception blocks are shown in Fig. 2(a). This study employs dense inter-connections within and in-between different orthogonal views to learn a more complex feature representation. During encoding, the encoder extracts features at multiple scales and creates fine-to-coarse feature maps. Fine feature maps contain low-level features but more spatial information, while coarse feature maps provide the opposite. Skip connections combine coarse and fine feature maps for accurate segmentation. Unlike the standard residual UNet, the encoder sub-network uses a self-repetition procedure on multiple levels to generate semantic maps for the fine feature maps, and thus selects relevant regions in the fine feature maps to concatenate with the coarse feature maps.

Refer to caption
Figure 1: Proposed HI-Net architecture. Element-wise addition operations (+ symbol within an oval) are employed throughout the architecture. The modified residual inception blocks (violet), known as hyperdense residual inception blocks, are used in the encoder-decoder paths. The encoder path is longer than the decoder path because blocks are repeated on several levels; the maximum repetition is 4 at the last level of the encoder, to draw semantic information from the lowest input resolution. Finally, softmax activation produces the outcomes.
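The overall assembly suggested by Fig. 1 could look as follows. This is a hedged sketch reusing hyperdense_factorized_conv from the previous listing; the filter counts and level depths are assumed for illustration, with only the maximum repetition of 4 at the deepest encoder level taken from the caption.

```python
# A hedged sketch of the HI-Net assembly suggested by Fig. 1 (filter counts
# and level depths are assumed); reuses hyperdense_factorized_conv above.
import tensorflow as tf
from tensorflow.keras import layers

def hi_net(input_shape=(128, 128, 128, 4), n_classes=4):
    inputs = layers.Input(input_shape)
    x, skips = inputs, []
    # Encoder: block repetitions grow with depth; the caption states a
    # maximum of 4 repetitions at the deepest encoder level.
    for filters, reps in [(16, 1), (32, 2), (64, 3), (128, 4)]:
        for _ in range(reps):
            x = hyperdense_factorized_conv(x, filters)
        skips.append(x)                      # fine feature map for the skip
        x = layers.MaxPooling3D(2)(x)        # halve the spatial resolution
    x = hyperdense_factorized_conv(x, 256)   # bottleneck
    # Decoder: upsample, fuse the matching fine map, refine once per level.
    for filters, skip in zip([128, 64, 32, 16], reversed(skips)):
        x = layers.Conv3DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = hyperdense_factorized_conv(x, filters)
    outputs = layers.Conv3D(n_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```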

3 Implementation Details

3.1 Dataset

The BRATS 2020 [1], [2], [3], [4], [14] training dataset includes 369 cases (293 HGG and 76 LGG), each with four rigidly aligned 3D MRI modalities (T1, T1c, T2, and FLAIR), resampled to $1\times 1\times 1$ mm isotropic resolution and skull-stripped. The input image size is $240\times 240\times 155$. The data were collected from 19 institutions using various MRI scanners. Annotations include 3 tumor subregions: WT, TC, and ET. Two additional datasets without ground truth labels are provided for validation and testing; participants upload their segmentation masks to the organizers' server for evaluation. The validation (125 cases) and testing (166 cases) datasets include the same four MRI modalities per subject but no ground truth. In our experiments, the training set is used to optimize the trainable parameters of the network, while the validation and testing sets are used to evaluate the trained network.
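As an illustration of how training inputs could be prepared from these volumes, the sketch below draws a random $128\times 128\times 128$ patch (the training input size reported in Sect. 3.2); the per-modality z-score normalization is our assumption, as the paper does not specify a normalization scheme.

```python
# Illustrative patch sampling for one BRATS volume; the z-score
# normalization over brain voxels is our assumption (not stated in the paper).
import numpy as np

def sample_patch(volume, label, patch=128, rng=np.random):
    # volume: (240, 240, 155, 4) stacked T1/T1c/T2/FLAIR; label: (240, 240, 155).
    norm = np.zeros_like(volume, dtype=np.float32)
    for m in range(volume.shape[-1]):
        brain = volume[..., m][volume[..., m] > 0]   # skull-stripped foreground
        norm[..., m] = (volume[..., m] - brain.mean()) / (brain.std() + 1e-8)
    # Random 128^3 crop (155 > 128, so no padding is needed along any axis).
    starts = [rng.randint(0, s - patch + 1) for s in volume.shape[:3]]
    region = tuple(slice(s, s + patch) for s in starts)
    return norm[region], label[region]
```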

3.2 Experiments

The network is implemented in Keras and trained on a Tesla V100-SXM2 32 GB GPU with a batch size of 1. The Adam optimizer with an initial learning rate of $3\times 10^{-5}$ is employed to optimize the parameters. The learning rate is halved every 30 epochs, and the network is trained for 350 epochs. During training, augmentation techniques such as random rotations and mirroring are employed. The input size during training is $128\times 128\times 128$. The multi-label dice loss function [15] addresses the class imbalance problem; Equation 1 gives its mathematical form.
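The stated recipe translates into a Keras training setup along these lines; model and train_generator are assumed to exist, and multilabel_dice_loss is the Eq. (1) loss sketched after the equation below.

```python
# The stated recipe as a Keras setup; `model` and `train_generator` (which
# yields batches of one 128^3 patch) are assumed, and multilabel_dice_loss
# is the Eq. (1) loss sketched after the equation below.
import tensorflow as tf

def halve_every_30(epoch, lr):
    # Initial learning rate 3e-5, halved every 30 epochs.
    return 3e-5 * (0.5 ** (epoch // 30))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
              loss=multilabel_dice_loss)
model.fit(train_generator, epochs=350,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(halve_every_30)])
```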

Loss = -\frac{2}{D}\sum_{d\in D}\frac{\sum_{j}P_{(j,d)}T_{(j,d)}+r}{\sum_{j}P_{(j,d)}+\sum_{j}T_{(j,d)}+r}   (1)

where $P_{(j,d)}$ and $T_{(j,d)}$ are the softmax prediction and the ground truth at voxel $j$ for class $d$, respectively, $D$ is the total number of classes, and $r$ is a small smoothing constant that avoids division by zero.
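A direct reading of Eq. (1) as a Keras-compatible loss might look as follows; the tensor layout (channels-last, one channel per class) is an assumption.

```python
# Eq. (1) as a Keras-compatible loss: a soft Dice score per class, averaged
# over the D classes with the 2/D factor, then negated for minimization.
import tensorflow as tf

def multilabel_dice_loss(y_true, y_pred, r=1e-5):
    # y_true, y_pred: (batch, x, y, z, D) one-hot targets and softmax outputs.
    voxel_axes = (1, 2, 3)                    # the sum over voxels j
    num = tf.reduce_sum(y_true * y_pred, axis=voxel_axes) + r
    den = (tf.reduce_sum(y_pred, axis=voxel_axes)
           + tf.reduce_sum(y_true, axis=voxel_axes) + r)
    # reduce_mean over classes supplies the 1/D factor; the leading 2 and
    # the minus sign follow Eq. (1).
    return -2.0 * tf.reduce_mean(num / den, axis=-1)
```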

Refer to caption
Figure 2: Difference between the baseline and modified residual inception blocks. (a) shows a baseline residual inception block with a separable 3D convolutional layer, while (b) shows the proposed block with inter-connected dense connections.

3.3 Evaluation Metrics

Multiple criteria are computed to quantify the segmentation results. The Dice coefficient (DSC) is the most frequently used metric for evaluating medical image segmentation; it measures the overlap between the segmentation and the ground truth, with a value between 0 and 1, and higher scores indicate better segmentation. Sensitivity and specificity are also commonly used statistical measures. Sensitivity, also called the true positive rate, is the proportion of positives that are correctly predicted: it measures the portion of tumor regions in the ground truth that the segmentation method also predicts as tumor. Specificity, also called the true negative rate, is the proportion of correctly predicted negatives: it measures the portion of normal tissue regions in the ground truth that the method also predicts as normal tissue.
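For reference, these three metrics can be computed from binary masks as in the illustrative helper below; this is not the official BRATS evaluation code, which runs on the organizers' server.

```python
# Illustrative per-region metrics from binary masks (not the official BRATS
# evaluation code).
import numpy as np

def segmentation_metrics(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)     # tumor voxels correctly predicted
    fp = np.sum(pred & ~truth)    # normal tissue predicted as tumor
    fn = np.sum(~pred & truth)    # tumor missed by the method
    tn = np.sum(~pred & ~truth)   # normal tissue correctly predicted
    dsc = 2 * tp / (2 * tp + fp + fn + 1e-8)
    sensitivity = tp / (tp + fn + 1e-8)   # true positive rate
    specificity = tn / (tn + fp + 1e-8)   # true negative rate
    return dsc, sensitivity, specificity
```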

3.4 Results

The performance of the proposed architecture is evaluated on the training, validation, and testing sets provided by BRATS 2020. Table 1 presents the quantitative analysis. We obtain mean DSC scores for ET, WT, and TC of 0.74191, 0.90673, and 0.84293, respectively, on the validation dataset, and 0.80009, 0.92967, and 0.90963 on the training dataset. On the testing dataset, our approach obtains mean DSC scores for ET, WT, and TC of 0.79457, 0.87494, and 0.83712, respectively. Table 1 also reports sensitivity and specificity on all three datasets. Table 2 compares the proposed work with the baseline work [5]: our HI-Net achieves higher scores for each tumor subregion. Furthermore, ablation studies assess the influence of the modified residual inception blocks with and without the inter-connected dense connections; the effect of these connections on the DSCs of ET, WT, and TC is shown in Table 2. To provide qualitative results, three segmented images from the training data are shown in Fig. 3. In summary, the modified inception blocks significantly improve the DSCs of ET, WT, and TC over the baseline inception blocks.

Refer to caption
Figure 3: Segmentation results on the training dataset of BRATS 2020. From left to right: ground truth and predicted results on the FLAIR modality; WT (brown), TC (red), and ET (blue).
Table 1: BRATS 2020 training, validation, and testing results. Mean scores (in %) for the different metrics.

Dataset                  Metric        WT      TC      ET
BRATS 2020 Training      DSC           92.967  90.963  80.009
                         Sensitivity   93.004  91.282  80.751
                         Specificity   99.932  99.960  99.977
BRATS 2020 Validation    DSC           90.673  84.293  74.191
                         Sensitivity   90.485  80.572  73.516
                         Specificity   99.929  99.974  99.977
BRATS 2020 Testing       DSC           87.494  83.712  79.457
                         Sensitivity   91.628  85.257  82.409
                         Specificity   99.883  99.962  99.965
Table 2: Performance evaluation of different methods on the BRATS 2020 validation dataset. For comparison, only DSC scores (in %) are shown. All scores are evaluated online.

Methods          ET      WT      TC
Baseline Work    70.616  90.670  82.136
Proposed Work    74.191  90.673  84.293

4 Conclusion

We proposed the HI-Net architecture for brain tumor segmentation. Each 3D convolution is split into three parallel branches in the residual inception block, each with a different orthogonal view, namely axial, sagittal, and coronal. We also proposed hyperdense connections among the factorized convolutional layers to extract more contextual information. HI-Net secures high DSC scores for all tumor subregions. The network was evaluated on the BRATS 2020 challenge testing dataset and achieved average DSC scores of 0.79457, 0.87494, and 0.83712 for the segmentation of ET, WT, and TC, respectively. Compared with the validation dataset, the ET score on the testing set is notably higher. In the future, we will work to enhance the robustness of the network and improve segmentation performance with post-processing methods such as a fully connected conditional random field (CRF).

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant No. 91959108.

References

  • [1] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. The Cancer Imaging Archive (2017) (2017)
  • [2] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. The Cancer Imaging Archive 286 (2017)
  • [3] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4, 170117 (Sep 2017), https://doi.org/10.1038/sdata.2017.117
  • [4] Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., Prastawa, M., Alberts, E., Lipková, J., Freymann, J.B., Kirby, J.S., Bilello, M., Fathallah-Shaykh, H.M., Wiest, R., Kirschke, J., Wiestler, B., Colen, R.R., Kotrotsou, A., LaMontagne, P., Marcus, D.S., Milchenko, M., Nazeri, A., Weber, M.A., Mahajan, A., Baid, U., Kwon, D., Agarwal, M., Alam, M., Albiol, A., Albiol, A., Varghese, A., Tuan, T.A., Arbel, T., Avery, A., B., P., Banerjee, S., Batchelder, T., Batmanghelich, K.N., Battistella, E., Bendszus, M., Benson, E., Bernal, J., Biros, G., Cabezas, M., Chandra, S., Chang, Y.J., et al.: Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge. CoRR abs/1811.02629 (2018), http://arxiv.org/abs/1811.02629
  • [5] Chen, W., Liu, B., Peng, S., Sun, J., Qiao, X.: S3D-UNet: Separable 3D U-Net for Brain Tumor Segmentation. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 358–368. Springer International Publishing, Cham (2019)
  • [6] Dolz, J., Gopinath, K., Yuan, J., Lombaert, H., Desrosiers, C., Ayed, I.B.: HyperDense-Net: A hyper-densely connected CNN for multi-modal image segmentation. CoRR abs/1804.0 (2018), http://arxiv.org/abs/1804.02967
  • [7] Feng, X., Tustison, N., Meyer, C.: Brain Tumor Segmentation Using an Ensemble of 3D U-Nets and Overall Survival Prediction Using Radiomic Features. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 279–288. Springer International Publishing, Cham (2019)
  • [8] Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge. CoRR abs/1802.10508 (2018), http://arxiv.org/abs/1802.10508
  • [9] Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: No New-Net. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 234–244. Springer International Publishing (2019)
  • [10] Jiang, Z., Ding, C., Liu, M., Tao, D.: Two-Stage Cascaded U-Net: 1st Place Solution to BraTS Challenge 2019 Segmentation Task. In: Crimi, A., Bakas, S. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 231–241. Springer International Publishing, Cham (2020)
  • [11] Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S.G., Sinclair, M., Pawlowski, N., Rajchl, M., Lee, M.C.H., Kainz, B., Rueckert, D., Glocker, B.: Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation. CoRR abs/1711.0 (2017), http://arxiv.org/abs/1711.01468
  • [12] Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis 36, 61–78 (2017)
  • [13] Long, J., Shelhamer, E., Darrell, T.: Fully Convolutional Networks for Semantic Segmentation. CoRR abs/1411.4 (2014), http://arxiv.org/abs/1411.4038
  • [14] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., Lanczi, L., Gerstner, E., Weber, M., Arbel, T., Avants, B.B., Ayache, N., Buendia, P., Collins, D.L., Criminisi, A.: The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (oct 2015). https://doi.org/10.1109/TMI.2014.2377694
  • [15] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. CoRR abs/1606.0 (2016), http://arxiv.org/abs/1606.04797
  • [16] Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regularization. CoRR abs/1810.11654 (2018), http://arxiv.org/abs/1810.11654
  • [17] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. CoRR abs/1505.0 (2015), http://arxiv.org/abs/1505.04597
  • [18] Wang, G., Li, W., Ourselin, S., Vercauteren, T.: Automatic Brain Tumor Segmentation using Cascaded Anisotropic Convolutional Neural Networks. CoRR abs/1709.0 (2017), http://arxiv.org/abs/1709.00382
  • [19] Xie, S., Sun, C., Huang, J., Tu, Z., Murphy, K.: Rethinking Spatiotemporal Feature Learning For Video Understanding. CoRR abs/1712.0 (2017), http://arxiv.org/abs/1712.04851