
1 Computer Vision Institute, School of Computer Science and Software Engineering, Shenzhen University
[email protected], [email protected]
2 National Engineering Research Center for Big Data Technology and System, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, 430074, Wuhan, China
[email protected]

HI-Net: Hyperdense Inception 3D UNet for Brain Tumor Segmentation

Saqib Qamar*1 (0000-0002-5980-5976), Parvez Ahmad*2 (0000-0003-1409-3175), Linlin Shen1
* Equal contribution
Abstract

The brain tumor segmentation task aims to classify tissue into the whole tumor (WT), tumor core (TC), and enhancing tumor (ET) classes using multi-modal MRI images. Quantitative analysis of brain tumors is critical for clinical decision making. Manual segmentation is tedious, time-consuming, and subjective, yet the task remains very challenging for automatic segmentation methods. Thanks to their powerful learning ability, convolutional neural networks (CNNs), mainly fully convolutional networks, have shown promising results on brain tumor segmentation. This paper further boosts segmentation performance by proposing the hyperdense inception 3D UNet (HI-Net), which captures multi-scale information by stacking factorized 3D weighted convolutional layers in the residual inception block. We use hyperdense connections among the factorized convolutional layers to extract more contextual information through feature reuse, and a dice loss function to cope with class imbalance. We validate the proposed architecture on the multi-modal brain tumor segmentation challenge (BRATS) 2020 testing dataset. Preliminary results on the BRATS 2020 testing set show that our approach achieves dice (DSC) scores of 0.79457, 0.87494, and 0.83712 for ET, WT, and TC, respectively.

Keywords:
Brain tumor · 3D UNet · Dense connections · Factorized convolution · Deep learning

1 Introduction

Primary and secondary are the two types of brain tumors. Primary brain tumors originate from brain cells, whereas secondary tumors metastasize into the brain from other organs. Gliomas are primary brain tumors, and can be further sub-divided into two grades: low-grade (LGG) and high-grade (HGG). High-grade gliomas are an aggressive type of malignant brain tumor that proliferates rapidly, usually requires surgery and radiotherapy, and has a poor survival prognosis. Magnetic resonance imaging (MRI) is a critical diagnostic tool for brain tumor analysis, monitoring, and surgery planning. Usually, several complementary 3D MRI modalities - such as T1, T1 with contrast agent (T1c), T2, and fluid-attenuated inversion recovery (FLAIR) - are required to emphasize different tissue properties and areas of tumor spread. For example, the contrast agent, usually gadolinium, emphasizes hyperactive tumor subregions in the T1c modality.

Deep learning techniques, especially CNNs, are prevalent for the automatic segmentation of brain tumors. CNNs can learn from examples and demonstrate state-of-the-art segmentation accuracy both on 2D natural images and on 3D medical image modalities. The segmentation output provides an accurate, reproducible solution for further tumor analysis and monitoring. The multi-modal brain tumor segmentation challenge (BRATS) aims to evaluate state-of-the-art methods for the segmentation of brain tumors by providing a 3D MRI dataset with ground truth labels annotated by physicians [1], [2], [3], [4], [14]. The 3D UNet is a popular CNN architecture for automatic brain tumor segmentation [8]; the multi-scale contextual information of its encoder-decoder sub-networks is effective for accurate brain tumor segmentation. Several variations of encoder-decoder architectures were proposed for the MICCAI BRATS 2018 and 2019 competitions. The potential of several deep architectures [12, 13, 17] and their ensembling for brain tumor segmentation was discussed by a top-performing method [11] in the MICCAI BRATS 2017 competition. Wang et al. [18] proposed architectures with factorized weighted layers to save GPU memory and computational time. At the same time, the majority of these architectures relied on bigger input sizes [16], cascaded training [10], or novel pre-processing [7] and post-processing strategies [9] to improve segmentation accuracy. In contrast, few architectures address the substantial memory consumption of 3D convolutional layers. Chen et al. [5] split each weighted layer into three parallel branches, each with a different orthogonal view, namely axial, sagittal, and coronal. However, more complex combinations exist between features within and in-between different orthogonal views, which can significantly enrich the learned representation [6]. Inspired by the S3D UNet architecture [5, 19], we propose a variant encoder-decoder based architecture for brain tumor segmentation. The key contributions of our study are as follows:

  • A novel hyperdense inception 3D UNet (HI-Net) architecture is proposed by stacking factorized 3D weighted convolutional layers in the residual inception block.

  • In each residual inception block, hyperdense connections are used in-between different orthogonal views to learn a more complex feature representation (see the sketch after this list).

  • Our network achieves state-of-the-art performance compared with other recent methods.
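To make the factorized, hyper-densely connected block concrete, the following minimal Keras sketch shows one way such a block could be wired; the kernel shapes, filter counts, and exact connection order are our illustrative assumptions, not the authors' released code.

```python
# A minimal sketch (assumed wiring, not the authors' code) of a residual
# inception block whose 3D convolution is factorized into axial, sagittal,
# and coronal branches with hyperdense connections between the views.
import tensorflow as tf
from tensorflow.keras import layers

def hyperdense_factorized_conv(x, filters):
    # Axial view: convolve in the H x W plane.
    axial = layers.Conv3D(filters, (3, 3, 1), padding="same", activation="relu")(x)
    # Sagittal view also sees the axial features (dense inter-connection).
    sagittal = layers.Conv3D(filters, (3, 1, 3), padding="same", activation="relu")(
        layers.Concatenate()([x, axial]))
    # Coronal view sees the input and both earlier views.
    coronal = layers.Conv3D(filters, (1, 3, 3), padding="same", activation="relu")(
        layers.Concatenate()([x, axial, sagittal]))
    # Fuse the three views and project back to `filters` channels.
    fused = layers.Conv3D(filters, 1, padding="same")(
        layers.Concatenate()([axial, sagittal, coronal]))
    # Residual shortcut: the element-wise addition shown in Fig. 1.
    shortcut = layers.Conv3D(filters, 1, padding="same")(x)
    return layers.Add()([fused, shortcut])
```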

2 Proposed Method

Figure 1 shows the proposed HI-Net architecture for brain tumor segmentation. The left side of the network works as an encoder that extracts features at different levels, and the right side acts as a decoder that aggregates the features and produces the segmentation mask. The modified residual inception blocks of the encoder-decoder sub-networks have two 3D convolutional layers, each following the structure of Fig. 2(b); traditional residual inception blocks are shown in Fig. 2(a). This study employs dense inter-connections within and in-between different orthogonal views to learn a more complex feature representation. During encoding, the encoder extracts features at multiple scales and creates fine-to-coarse feature maps. Fine feature maps contain low-level features but more spatial information, while coarse feature maps provide the opposite. Skip connections combine coarse and fine feature maps for accurate segmentation. Unlike the standard residual UNet, the encoder sub-network uses a self-repetition procedure on multiple levels to generate semantic maps for the fine feature maps, and thus selects relevant regions in the fine feature maps to concatenate with the coarse feature maps.

Refer to caption
Figure 1: Proposed HI-Net architecture. Element-wise addition operations (+ symbol within an oval) are employed throughout the architecture. The modified residual inception blocks (violet), known as hyperdense residual inception blocks, are used in the encoder-decoder paths. The encoder path is longer than the decoder path because blocks are repeated on several levels; the maximum repetition is 4 at the last level of the encoder, to draw semantic information from the lowest input resolution. Finally, softmax activation produces the outcomes.
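The overall assembly suggested by Fig. 1 could look as follows. This is a hedged sketch reusing hyperdense_factorized_conv from the previous listing; the filter counts and level depths are assumed for illustration, with only the maximum repetition of 4 at the deepest encoder level taken from the caption.

```python
# A hedged sketch of the HI-Net assembly suggested by Fig. 1 (filter counts
# and level depths are assumed); reuses hyperdense_factorized_conv above.
import tensorflow as tf
from tensorflow.keras import layers

def hi_net(input_shape=(128, 128, 128, 4), n_classes=4):
    inputs = layers.Input(input_shape)
    x, skips = inputs, []
    # Encoder: block repetitions grow with depth; the caption states a
    # maximum of 4 repetitions at the deepest encoder level.
    for filters, reps in [(16, 1), (32, 2), (64, 3), (128, 4)]:
        for _ in range(reps):
            x = hyperdense_factorized_conv(x, filters)
        skips.append(x)                      # fine feature map for the skip
        x = layers.MaxPooling3D(2)(x)        # halve the spatial resolution
    x = hyperdense_factorized_conv(x, 256)   # bottleneck
    # Decoder: upsample, fuse the matching fine map, refine once per level.
    for filters, skip in zip([128, 64, 32, 16], reversed(skips)):
        x = layers.Conv3DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = hyperdense_factorized_conv(x, filters)
    outputs = layers.Conv3D(n_classes, 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```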

3 Implementation Details

3.1 Dataset

The BRATS 2020 [1], [2], [3], [4], [14] training dataset includes 369 cases (293 HGG and 76 LGG), each with four rigidly aligned 3D MRI modalities (T1, T1c, T2, and FLAIR), resampled to $1\times 1\times 1$ mm isotropic resolution and skull-stripped. The input image size is $240\times 240\times 155$. The data were collected from 19 institutions using various MRI scanners. Annotations include 3 tumor subregions: WT, TC, and ET. Two additional datasets without ground truth labels are provided for validation and testing; participants upload their segmentation masks to the organizers' server for evaluation. The validation (125 cases) and testing (166 cases) datasets include the same four MRI modalities per subject but no ground truth. In our experiments, the training set is used to optimize the trainable parameters of the network, while the validation and testing sets are used to evaluate the trained network.
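As an illustration of how training inputs could be prepared from these volumes, the sketch below draws a random $128\times 128\times 128$ patch (the training input size reported in Sect. 3.2); the per-modality z-score normalization is our assumption, as the paper does not specify a normalization scheme.

```python
# Illustrative patch sampling for one BRATS volume; the z-score
# normalization over brain voxels is our assumption (not stated in the paper).
import numpy as np

def sample_patch(volume, label, patch=128, rng=np.random):
    # volume: (240, 240, 155, 4) stacked T1/T1c/T2/FLAIR; label: (240, 240, 155).
    norm = np.zeros_like(volume, dtype=np.float32)
    for m in range(volume.shape[-1]):
        brain = volume[..., m][volume[..., m] > 0]   # skull-stripped foreground
        norm[..., m] = (volume[..., m] - brain.mean()) / (brain.std() + 1e-8)
    # Random 128^3 crop (155 > 128, so no padding is needed along any axis).
    starts = [rng.randint(0, s - patch + 1) for s in volume.shape[:3]]
    region = tuple(slice(s, s + patch) for s in starts)
    return norm[region], label[region]
```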

3.2 Experiments

The network is implemented in Keras and trained on a Tesla V100-SXM2 32 GB GPU with a batch size of 1. The Adam optimizer with an initial learning rate of $3\times 10^{-5}$ is employed to optimize the parameters. The learning rate is halved every 30 epochs, and the network is trained for 350 epochs. During training, augmentation techniques such as random rotations and mirroring are employed. The input size during training is $128\times 128\times 128$. The multi-label dice loss function [15] addresses the class imbalance problem; Equation 1 gives its mathematical form.
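The stated recipe translates into a Keras training setup along these lines; model and train_generator are assumed to exist, and multilabel_dice_loss is the Eq. (1) loss sketched after the equation below.

```python
# The stated recipe as a Keras setup; `model` and `train_generator` (which
# yields batches of one 128^3 patch) are assumed, and multilabel_dice_loss
# is the Eq. (1) loss sketched after the equation below.
import tensorflow as tf

def halve_every_30(epoch, lr):
    # Initial learning rate 3e-5, halved every 30 epochs.
    return 3e-5 * (0.5 ** (epoch // 30))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
              loss=multilabel_dice_loss)
model.fit(train_generator, epochs=350,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(halve_every_30)])
```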

Loss = -\frac{2}{D}\sum_{d\in D}\frac{\sum_{j}P_{(j,d)}T_{(j,d)}+r}{\sum_{j}P_{(j,d)}+\sum_{j}T_{(j,d)}+r}   (1)

where $P_{(j,d)}$ and $T_{(j,d)}$ are the softmax prediction and the ground truth at voxel $j$ for class $d$, respectively, $D$ is the total number of classes, and $r$ is a small smoothing constant that avoids division by zero.
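A direct reading of Eq. (1) as a Keras-compatible loss might look as follows; the tensor layout (channels-last, one channel per class) is an assumption.

```python
# Eq. (1) as a Keras-compatible loss: a soft Dice score per class, averaged
# over the D classes with the 2/D factor, then negated for minimization.
import tensorflow as tf

def multilabel_dice_loss(y_true, y_pred, r=1e-5):
    # y_true, y_pred: (batch, x, y, z, D) one-hot targets and softmax outputs.
    voxel_axes = (1, 2, 3)                    # the sum over voxels j
    num = tf.reduce_sum(y_true * y_pred, axis=voxel_axes) + r
    den = (tf.reduce_sum(y_pred, axis=voxel_axes)
           + tf.reduce_sum(y_true, axis=voxel_axes) + r)
    # reduce_mean over classes supplies the 1/D factor; the leading 2 and
    # the minus sign follow Eq. (1).
    return -2.0 * tf.reduce_mean(num / den, axis=-1)
```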

Refer to caption
Figure 2: Difference between the baseline and modified residual inception blocks. (a) shows a baseline residual inception block with a separable 3D convolutional layer, while (b) shows the proposed block with inter-connected dense connections.

3.3 Evaluation Metrics

Multiple criteria are computed to quantify the segmentation results. The Dice coefficient (DSC) is the most frequently used metric for evaluating medical image segmentation; it measures the overlap between the segmentation and the ground truth, with a value between 0 and 1, and higher scores indicate better segmentation. Sensitivity and specificity are also commonly used statistical measures. Sensitivity, also called the true positive rate, is the proportion of positives that are correctly predicted: it measures the portion of tumor regions in the ground truth that the segmentation method also predicts as tumor. Specificity, also called the true negative rate, is the proportion of correctly predicted negatives: it measures the portion of normal tissue regions in the ground truth that the method also predicts as normal tissue.
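For reference, these three metrics can be computed from binary masks as in the illustrative helper below; this is not the official BRATS evaluation code, which runs on the organizers' server.

```python
# Illustrative per-region metrics from binary masks (not the official BRATS
# evaluation code).
import numpy as np

def segmentation_metrics(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)     # tumor voxels correctly predicted
    fp = np.sum(pred & ~truth)    # normal tissue predicted as tumor
    fn = np.sum(~pred & truth)    # tumor missed by the method
    tn = np.sum(~pred & ~truth)   # normal tissue correctly predicted
    dsc = 2 * tp / (2 * tp + fp + fn + 1e-8)
    sensitivity = tp / (tp + fn + 1e-8)   # true positive rate
    specificity = tn / (tn + fp + 1e-8)   # true negative rate
    return dsc, sensitivity, specificity
```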

3.4 Results

The performance of the proposed architecture is evaluated on the training, validation, and testing sets provided by BRATS 2020. Table 1 presents the quantitative analysis. We obtain mean DSC scores for ET, WT, and TC of 0.74191, 0.90673, and 0.84293, respectively, on the validation dataset, and 0.80009, 0.92967, and 0.90963 on the training dataset. On the testing dataset, our approach obtains mean DSC scores for ET, WT, and TC of 0.79457, 0.87494, and 0.83712, respectively. Table 1 also reports sensitivity and specificity on all three datasets. Table 2 compares the proposed work with the baseline work [5]: our HI-Net achieves higher scores for each tumor subregion. Furthermore, ablation studies assess the influence of the modified residual inception blocks with and without the inter-connected dense connections; the effect of these connections on the DSCs of ET, WT, and TC is shown in Table 2. To provide qualitative results, three segmented images from the training data are shown in Fig. 3. In summary, the modified inception blocks significantly improve the DSCs of ET, WT, and TC over the baseline inception blocks.

Refer to caption
Figure 3: Segmentation results on the training dataset of BRATS 2020. From left to right: ground truth and predicted results on the FLAIR modality; WT (brown), TC (red), and ET (blue).
Table 1: BRATS 2020 training, validation, and testing results. Mean scores (in %) for the different metrics.

Dataset                  Metric        WT      TC      ET
BRATS 2020 Training      DSC           92.967  90.963  80.009
                         Sensitivity   93.004  91.282  80.751
                         Specificity   99.932  99.960  99.977
BRATS 2020 Validation    DSC           90.673  84.293  74.191
                         Sensitivity   90.485  80.572  73.516
                         Specificity   99.929  99.974  99.977
BRATS 2020 Testing       DSC           87.494  83.712  79.457
                         Sensitivity   91.628  85.257  82.409
                         Specificity   99.883  99.962  99.965
Table 2: Performance evaluation of different methods on the BRATS 2020 validation dataset. For comparison, only DSC scores (in %) are shown. All scores are evaluated online.

Methods          ET      WT      TC
Baseline Work    70.616  90.670  82.136
Proposed Work    74.191  90.673  84.293

4 Conclusion

We proposed the HI-Net architecture for brain tumor segmentation. Each 3D convolution is split into three parallel branches in the residual inception block, each with a different orthogonal view, namely axial, sagittal, and coronal. We also proposed hyperdense connections among the factorized convolutional layers to extract more contextual information. HI-Net secures high DSC scores for all tumor subregions. The network was evaluated on the BRATS 2020 challenge testing dataset and achieved average DSC scores of 0.79457, 0.87494, and 0.83712 for the segmentation of ET, WT, and TC, respectively. Compared with the validation dataset, the ET score on the testing set is notably higher. In the future, we will work to enhance the robustness of the network and improve segmentation performance with post-processing methods such as a fully connected conditional random field (CRF).

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grant No. 91959108.

References

  • [1] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. The Cancer Imaging Archive (2017) (2017)
  • [2] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J., Freymann, J., Farahani, K., Davatzikos, C.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. The Cancer Imaging Archive 286 (2017)
  • [3] Bakas, S., Akbari, H., Sotiras, A., Bilello, M., Rozycki, M., Kirby, J.S., Freymann, J.B., Farahani, K., Davatzikos, C.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Scientific Data 4, 170117 (Sep 2017), https://doi.org/10.1038/sdata.2017.117
  • [4] Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., Prastawa, M., Alberts, E., Lipková, J., Freymann, J.B., Kirby, J.S., Bilello, M., Fathallah-Shaykh, H.M., Wiest, R., Kirschke, J., Wiestler, B., Colen, R.R., Kotrotsou, A., LaMontagne, P., Marcus, D.S., Milchenko, M., Nazeri, A., Weber, M.A., Mahajan, A., Baid, U., Kwon, D., Agarwal, M., Alam, M., Albiol, A., Albiol, A., Varghese, A., Tuan, T.A., Arbel, T., Avery, A., B., P., Banerjee, S., Batchelder, T., Batmanghelich, K.N., Battistella, E., Bendszus, M., Benson, E., Bernal, J., Biros, G., Cabezas, M., Chandra, S., Chang, Y.J., et al.: Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge. CoRR abs/1811.02629 (2018), http://arxiv.org/abs/1811.02629
  • [5] Chen, W., Liu, B., Peng, S., Sun, J., Qiao, X.: S3D-UNet: Separable 3D U-Net for Brain Tumor Segmentation. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 358–368. Springer International Publishing, Cham (2019)
  • [6] Dolz, J., Gopinath, K., Yuan, J., Lombaert, H., Desrosiers, C., Ayed, I.B.: HyperDense-Net: A hyper-densely connected CNN for multi-modal image segmentation. CoRR abs/1804.0 (2018), http://arxiv.org/abs/1804.02967
  • [7] Feng, X., Tustison, N., Meyer, C.: Brain Tumor Segmentation Using an Ensemble of 3D U-Nets and Overall Survival Prediction Using Radiomic Features. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 279–288. Springer International Publishing, Cham (2019)
  • [8] Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: Brain Tumor Segmentation and Radiomics Survival Prediction: Contribution to the BRATS 2017 Challenge. CoRR abs/1802.10508 (2018), http://arxiv.org/abs/1802.10508
  • [9] Isensee, F., Kickingereder, P., Wick, W., Bendszus, M., Maier-Hein, K.H.: No New-Net. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 234–244. Springer International Publishing (2019)
  • [10] Jiang, Z., Ding, C., Liu, M., Tao, D.: Two-Stage Cascaded U-Net: 1st Place Solution to BraTS Challenge 2019 Segmentation Task. In: Crimi, A., Bakas, S. (eds.) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. pp. 231–241. Springer International Publishing, Cham (2020)
  • [11] Kamnitsas, K., Bai, W., Ferrante, E., McDonagh, S.G., Sinclair, M., Pawlowski, N., Rajchl, M., Lee, M.C.H., Kainz, B., Rueckert, D., Glocker, B.: Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation. CoRR abs/1711.0 (2017), http://arxiv.org/abs/1711.01468
  • [12] Kamnitsas, K., Ledig, C., Newcombe, V.F.J., Simpson, J.P., Kane, A.D., Menon, D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical image analysis 36, 61–78 (2017)
  • [13] Long, J., Shelhamer, E., Darrell, T.: Fully Convolutional Networks for Semantic Segmentation. CoRR abs/1411.4 (2014), http://arxiv.org/abs/1411.4038
  • [14] Menze, B.H., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., Burren, Y., Porz, N., Slotboom, J., Wiest, R., Lanczi, L., Gerstner, E., Weber, M., Arbel, T., Avants, B.B., Ayache, N., Buendia, P., Collins, D.L., Criminisi, A.: The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Transactions on Medical Imaging 34(10), 1993–2024 (oct 2015). https://doi.org/10.1109/TMI.2014.2377694
  • [15] Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. CoRR abs/1606.0 (2016), http://arxiv.org/abs/1606.04797
  • [16] Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regularization. CoRR abs/1810.11654 (2018), http://arxiv.org/abs/1810.11654
  • [17] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional Networks for Biomedical Image Segmentation. CoRR abs/1505.0 (2015), http://arxiv.org/abs/1505.04597
  • [18] Wang, G., Li, W., Ourselin, S., Vercauteren, T.: Automatic Brain Tumor Segmentation using Cascaded Anisotropic Convolutional Neural Networks. CoRR abs/1709.0 (2017), http://arxiv.org/abs/1709.00382
  • [19] Xie, S., Sun, C., Huang, J., Tu, Z., Murphy, K.: Rethinking Spatiotemporal Feature Learning For Video Understanding. CoRR abs/1712.0 (2017), http://arxiv.org/abs/1712.04851