Learning Directional Feature Maps for Cardiac MRI Segmentation
Abstract
Cardiac MRI segmentation plays a crucial role in clinical diagnosis for evaluating personalized cardiac performance parameters. Due to the indistinct boundaries and heterogeneous intensity distributions in cardiac MRI, most existing methods still suffer from two challenges: inter-class indistinction and intra-class inconsistency. To tackle these two problems, we propose a novel method that exploits directional feature maps, which can simultaneously strengthen the differences between classes and the similarities within classes. Specifically, we perform cardiac segmentation and, via a direction field (DF) module, learn a direction field that points from the nearest cardiac tissue boundary to each pixel. Based on the learned direction field, we then propose a feature rectification and fusion (FRF) module to improve the original segmentation features and obtain the final segmentation. The proposed modules are simple yet effective and can be flexibly added to any existing segmentation network without excessively increasing time and space complexity. We evaluate the proposed method on the 2017 MICCAI Automated Cardiac Diagnosis Challenge (ACDC) dataset and a large-scale self-collected dataset, showing good segmentation performance and robust generalization ability of the proposed method. The code is publicly available at https://github.com/c-feng/DirectionalFeature.
Keywords:
Cardiac Segmentation · Deep Learning · Direction Field
1 Introduction
Cardiac cine Magnetic Resonance Imaging (MRI) segmentation is of great importance in disease diagnosis and surgical planning. Given the segmentation results, doctors can more efficiently obtain cardiac diagnostic indices such as myocardial mass and thickness, ejection fraction, and ventricle volumes. Manual segmentation is the gold standard approach; however, it is not only time-consuming but also suffers from inter-observer variations. Hence, automatic cardiac cine MRI segmentation is desirable in the clinic.
In the past decade, methods based on deep convolutional neural networks (CNNs) have achieved great success in both natural and medical image segmentation. U-Net [11] is one of the most successful and influential methods in medical image segmentation. Recent works typically leverage U-shaped networks and can be roughly divided into 2D and 3D methods: 2D methods take a single 2D slice as input, while 3D methods utilize the entire volume. nnU-Net [5] adopts a model fusion strategy combining 2D U-Net and 3D U-Net and achieves the current state-of-the-art performance in cardiac segmentation. However, its applicability is somewhat limited by its high memory and computation cost.
MRI artifacts such as intensity inhomogeneity and fuzziness may make pixels near the boundary indistinguishable, leading to the problem of inter-class indistinction. As depicted in Fig. 1(a), we observe that the cardiac MRI segmentation accuracy drops dramatically for pixels close to the boundary. Meanwhile, due to the lack of restrictions on the spatial relationship between pixels, the segmentation model may produce anatomically implausible errors (see Fig. 1(b) for an example). In this paper, we propose a novel method to improve the segmentation feature maps with directional information, which can significantly alleviate the inter-class indistinction as well as cope with the intra-class inconsistency. Extensive experiments demonstrate that the proposed method achieves good performance and is robust under cross-dataset validation.
Recent approaches in semantic segmentation have been devoted to handling inter-class indistinction and intra-class inconsistency. Ke et al. [6] define the concept of adaptive affinity fields (AAF) to capture and match the semantic relations between neighboring pixels in the label space. Cheng et al. [2] explore boundary and segmentation mask information to alleviate the inter-class indistinction problem for instance segmentation. Dangi et al. [3] propose a multi-task learning framework that performs segmentation along with pixel-wise distance map regression; this regularization takes the distance from each pixel to the boundary as auxiliary information to handle inter-class indistinction. Painchaud et al. [9] propose an adversarial variational auto-encoder to ensure anatomically plausible segmentations, whose latent space encodes a smooth manifold covering a large spectrum of valid cardiac shapes, thereby indirectly addressing the intra-class inconsistency problem.
Directional information has recently been explored in different vision tasks. For instance, TextField [14] and DeepFlux [12] learn similar direction fields on text areas and skeleton context for scene text detection and skeleton extraction, respectively, and directly construct text instances or recover skeletons from the direction field. However, medical images are inherently different from natural images, and segmentation results obtained directly from the direction field are not accurate enough for the MRI segmentation task. In this paper, we propose to improve the original segmentation features guided by directional information for better cardiac MRI segmentation.
2 Method
Inter-class indistinction and intra-class inconsistency are commonly found in both natural and medical image segmentation. Meanwhile, segmentation models usually learn individual pixel representations and thus lack restrictions on the relationship between pixels. We propose a simple yet effective method to exploit the directional relationship between pixels, which can simultaneously strengthen the differences between classes and the similarities within classes. The pipeline of the proposed method, termed DFM, is depicted in Fig. 2. We adopt U-Net [11] as our base segmentation framework. Given an input image, the network produces the initial segmentation map. Meanwhile, we apply a direction field (DF) module to learn the direction field from the shared U-Net features. A feature rectification and fusion (FRF) module then combines the initial segmentation features with the learned direction field to generate the final, improved segmentation result.
2.1 DF Module to Learn a Direction Field
We first detail the definition of the direction field. As shown in Fig. 3(a-b), for each foreground pixel p, we find its nearest pixel b lying on the cardiac tissue boundary, and then normalize the direction vector pointing from b to p by the distance between p and b. The direction field of background pixels is set to (0, 0). Formally, the direction field DF(p) for each pixel p in the image domain Ω is given by:

DF(p) = \begin{cases} \dfrac{p - b}{\|p - b\|} & \text{if } p \text{ is a foreground pixel,} \\ (0, 0) & \text{otherwise.} \end{cases} \qquad (1)
We propose a simple yet effective DF module to learn the above direction field, which is depicted in Fig. 3. This module consists of a convolution layer whose input is the 64-channel feature map extracted by U-Net and whose output is the two-channel direction field. It is noteworthy that the ground truth of the direction field can be obtained easily from the annotation by a distance transform algorithm.
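To make the ground-truth construction concrete, the following sketch (assuming a 2D integer label map and SciPy's Euclidean distance transform; the function name and class indices are illustrative, not taken from the released code) computes such a direction field per tissue class:

```python
import numpy as np
from scipy import ndimage

def direction_field_gt(label, fg_classes=(1, 2, 3)):
    """Sketch: ground-truth direction field for a 2D label map.

    For every foreground pixel p, the field stores the unit vector pointing
    from its nearest boundary pixel b towards p; background pixels are (0, 0).
    """
    h, w = label.shape
    field = np.zeros((2, h, w), dtype=np.float32)
    for c in fg_classes:
        mask = label == c
        if not mask.any():
            continue
        # Distance to, and index of, the nearest pixel outside this class,
        # i.e. (approximately) the nearest point on the tissue boundary.
        dist, (ny, nx) = ndimage.distance_transform_edt(mask, return_indices=True)
        ys, xs = np.nonzero(mask)
        d = np.maximum(dist[ys, xs], 1e-6)
        field[0, ys, xs] = (ys - ny[ys, xs]) / d  # y-component of DF(p)
        field[1, ys, xs] = (xs - nx[ys, xs]) / d  # x-component of DF(p)
    return field
```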
2.2 FRF Module for Feature Rectification and Fusion
The direction field predicted by the DF module reveals the directional relationship between pixels and provides, for each pixel, a unique direction vector that points from the boundary towards the central area. Guided by these direction vectors, we propose a Feature Rectification and Fusion (FRF) module that uses the characteristics of the central area to rectify errors in the initial segmentation feature maps step by step. As illustrated in Fig. 4, the N-step improved feature maps F^(N) are obtained from the initial feature maps F^(0) and the predicted direction field \widehat{DF}. Concretely, the improved feature of a pixel p is updated iteratively with the feature at the position that p points to, which is computed by bilinear interpolation. In other words, the feature of p is gradually rectified by the features of the central area. The whole procedure is formalized as follows:

F^{(k)}(p_x, p_y) = F^{(k-1)}\big(p_x + \widehat{DF}(p)_x,\; p_y + \widehat{DF}(p)_y\big), \quad k = 1, \ldots, N, \qquad (2)

where k denotes the current step, N is the total number of steps (set to 5 if not stated otherwise), and p_x (resp. p_y) represents the x (resp. y) coordinate of the pixel p.
After performing the above rectification process, we concatenate F^(N) with F^(0), and then apply the final classifier to the concatenated feature maps to predict the final cardiac segmentation.
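A minimal PyTorch sketch of the rectification in Eq. (2) is given below, assuming feature maps of shape (B, C, H, W) and a direction field stored as (dy, dx) offsets in pixel units; the function name and conventions are ours, not necessarily those of the released implementation:

```python
import torch
import torch.nn.functional as F

def rectify_features(feat, df, n_steps=5):
    """Sketch of the FRF rectification (Eq. 2).

    feat: (B, C, H, W) initial segmentation features F^(0)
    df:   (B, 2, H, W) predicted direction field, stored as (dy, dx) offsets
    Each step re-samples the features at p + DF(p) with bilinear interpolation,
    so every pixel is gradually overwritten by features from the central area.
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    # Target positions p + DF(p); the sampling grid is fixed across all N steps.
    new_x = xs.unsqueeze(0) + df[:, 1]
    new_y = ys.unsqueeze(0) + df[:, 0]
    # Normalise coordinates to [-1, 1] as required by grid_sample (x first).
    grid = torch.stack(
        (2.0 * new_x / (w - 1) - 1.0, 2.0 * new_y / (h - 1) - 1.0), dim=-1
    )
    out = feat
    for _ in range(n_steps):
        out = F.grid_sample(out, grid, mode="bilinear", align_corners=True)
    # The final classifier is then applied to torch.cat([out, feat], dim=1).
    return out
```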
2.3 Training Objective
The proposed method involves loss functions on the initial segmentation, the final segmentation, and the direction field. We adopt the general cross-entropy loss as the segmentation loss L_seg to encourage class-separate features, as is common in semantic segmentation. Formally, L_seg is given by L_{seg}(\hat{Y}, Y) = -\sum_{p \in \Omega} \sum_{c=1}^{C} Y_c(p) \log \hat{Y}_c(p), where Y and \hat{Y} denote the ground truth and the prediction, respectively. For the loss supervising the direction field learning, we choose a combination of the L2-norm distance and the angle distance as the training objective:
L_{DF}(\widehat{DF}, DF) = \sum_{p \in \Omega} w(p) \left( \big\| \widehat{DF}(p) - DF(p) \big\|_2 + \alpha \,\angle\big(\widehat{DF}(p), DF(p)\big) \right), \qquad (3)

where \widehat{DF} and DF denote the predicted direction field and the corresponding ground truth, respectively, \angle(\cdot, \cdot) denotes the angle between the two direction vectors, \alpha is a hyperparameter balancing the L2-norm distance and the angle distance (set to 1 in all experiments), and w(p) represents the weight on pixel p, which is calculated by:

w(p) = \frac{\sum_{i=1}^{C} N_i}{C \cdot N_{l_p}}, \qquad (4)

where N_{l_p} denotes the total number of pixels with label l_p (the label of pixel p) and C is the number of classes. The overall loss L combines L_seg and L_DF with a balance factor \lambda:

L = L_{seg}(\hat{S}_{init}, Y) + L_{seg}(\hat{S}_{final}, Y) + \lambda \, L_{DF}(\widehat{DF}, DF), \qquad (5)

where \hat{S}_{init} and \hat{S}_{final} denote the initial and final segmentation predictions.
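For illustration, a possible PyTorch realisation of the direction-field loss in Eqs. (3)-(4) is sketched below; as simplifying assumptions of ours, the angle term is evaluated only on foreground pixels (where the ground-truth vector is non-zero), the class weights are computed over the current batch, and the sum is replaced by a mean:

```python
import torch
import torch.nn.functional as F

def direction_field_loss(df_pred, df_gt, label, n_classes=4, alpha=1.0):
    """Sketch of L_DF: class-balanced L2-norm distance plus angle distance.

    df_pred, df_gt: (B, 2, H, W) predicted / ground-truth direction fields
    label:          (B, H, W) integer ground-truth segmentation
    """
    # w(p): inverse-frequency weight of the label of pixel p (cf. Eq. 4).
    weight = torch.zeros(label.shape, dtype=df_pred.dtype, device=df_pred.device)
    total = float(label.numel())
    for c in range(n_classes):
        mask = label == c
        n_c = mask.sum().clamp(min=1).to(df_pred.dtype)
        weight[mask] = total / (n_classes * n_c)

    # L2-norm distance between predicted and ground-truth direction vectors.
    l2 = torch.norm(df_pred - df_gt, dim=1)

    # Angle distance, evaluated only where the ground-truth vector is defined.
    fg = (df_gt.abs().sum(dim=1) > 0).to(df_pred.dtype)
    pred_n = F.normalize(df_pred, dim=1, eps=1e-6)
    gt_n = F.normalize(df_gt, dim=1, eps=1e-6)
    cos = (pred_n * gt_n).sum(dim=1).clamp(-1 + 1e-6, 1 - 1e-6)
    angle = torch.acos(cos)

    return (weight * (l2 + alpha * fg * angle)).mean()
```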
3 Experiments
3.1 Datasets and Evaluation Metrics
Automatic Cardiac Diagnosis Challenge (ACDC) Dataset contains cine-MR images of 150 patients, split into 100 training images and 50 test images, available as part of the STACOM 2017 ACDC challenge. The patients are divided into 5 evenly distributed subgroups: normal, myocardial infarction, dilated cardiomyopathy, hypertrophic cardiomyopathy, and abnormal right ventricle. The annotations for the 50 test images are held by the challenge organizers. We further divide the 100 training images into 80% training and 20% validation with five non-overlapping folds to perform extensive experiments.
Self-collected Dataset consists of more than 100k 2D images that we collected from 531 patient cases. All the data were labeled by a team of medical experts. The patients are divided into the same 5 subgroups as ACDC. A series of short-axis slices covers the LV from the base to the apex, with a slice thickness of 6 mm and a flip angle of 80°. The magnetic field strength is 1.5 T and the spatial resolution is 1.328 mm²/pixel. We also split all the patients into 80% training and 20% test.
Evaluation metrics: We adopt the widely used 3D Dice coefficient and Hausdorff distance to benchmark the proposed method.
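For reference, the two metrics can be computed along the following lines (a simplified sketch of ours: the Hausdorff distance is taken between the full voxel point sets of a class rather than between extracted surfaces, which may differ slightly from the challenge evaluation script):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice_3d(pred, gt, cls):
    """3D Dice coefficient of one class for integer label volumes."""
    p, g = pred == cls, gt == cls
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom > 0 else 1.0

def hausdorff_3d(pred, gt, cls, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance (in mm, given the voxel spacing)."""
    p = np.argwhere(pred == cls) * np.asarray(spacing)
    g = np.argwhere(gt == cls) * np.asarray(spacing)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```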
3.2 Implementation Details
The network is trained by minimizing the proposed loss function in Eq. (5) using the ADAM optimizer [8] with the learning rate set to 10⁻³. The network weights are initialized following [4] and trained for 200 epochs. Data augmentation is applied to prevent over-fitting, including: 1) random translation with the maximum absolute fraction for horizontal and vertical translations both set to 0.125; 2) random rotation with a random angle between −180° and 180°. The batch size is set to 32 with resized inputs.
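These optimisation and augmentation settings can be assembled roughly as follows (a sketch assuming a PyTorch model; the helper names and the use of torchvision's RandomAffine are our own choices, not necessarily those of the released code):

```python
import torch
import torch.nn as nn
from torchvision import transforms

def he_init(m):
    # He initialization [4] for every convolution layer of the network.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
        if m.bias is not None:
            nn.init.zeros_(m.bias)

def build_training(model: nn.Module):
    """Optimizer and augmentation matching the settings described above."""
    model.apply(he_init)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # ADAM, lr = 1e-3
    # Random rotation in [-180, 180] degrees and translation up to 12.5%.
    augment = transforms.RandomAffine(degrees=180, translate=(0.125, 0.125))
    return optimizer, augment
```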
Table 1: Comparison with the baseline U-Net on the ACDC dataset (train/val split) and the self-collected dataset.

| Dataset | Method | Dice LV | Dice RV | Dice MYO | Dice Mean | HD LV (mm) | HD RV (mm) | HD MYO (mm) | HD Mean (mm) |
|---|---|---|---|---|---|---|---|---|---|
| ACDC | U-Net | 0.931 | 0.856 | 0.872 | 0.886 | 24.609 | 30.006 | 14.416 | 23.009 |
| ACDC | Ours | 0.949 | 0.888 | 0.911 | 0.916 | 3.761 | 6.037 | 10.282 | 6.693 |
| Self-collected | U-Net | 0.948 | 0.854 | 0.906 | 0.903 | 2.823 | 3.691 | 2.951 | 3.155 |
| Self-collected | Ours | 0.949 | 0.859 | 0.909 | 0.906 | 2.814 | 3.409 | 2.683 | 2.957 |
Table 2: Comparison with state-of-the-art methods on the ACDC leaderboard (test set).

| Rank | User | Mean Dice | Mean HD (mm) |
|---|---|---|---|
| 1 | Fabian Isensee [5] | 0.927 | 7.8 |
| 2 | Clement Zotti [15] | 0.9138 | 9.51 |
| 3 | Ours | 0.911 | 9.92 |
| 4 | Nathan Painchaud [9] | 0.911 | 9.93 |
| 5 | Christian Baumgartner [1] | 0.9046 | 10.4 |
| 6 | Jelmer Wolterink [13] | 0.908 | 10.7 |
| 7 | Mahendra Khened [7] | 0.9136 | 11.23 |
| 8 | Shubham Jain [10] | 0.8915 | 12.07 |
Table 3: Comparison with other methods addressing inter-class indistinction and intra-class inconsistency on the ACDC dataset.

| Method | Dice LV | Dice MYO | Dice RV | Dice Mean | HD LV (mm) | HD MYO (mm) | HD RV (mm) | HD Mean (mm) |
|---|---|---|---|---|---|---|---|---|
| U-Net | 0.931 | 0.856 | 0.872 | 0.886 | 24.609 | 30.006 | 14.416 | 23.009 |
| AAF [6] | 0.928 | 0.853 | 0.891 | 0.891 | 13.306 | 14.255 | 13.969 | 13.844 |
| DMR [3] | 0.937 | 0.880 | 0.892 | 0.903 | 7.520 | 9.870 | 12.385 | 9.925 |
| U-Net+DFM | 0.949 | 0.888 | 0.911 | 0.916 | 3.761 | 6.037 | 10.282 | 6.693 |
3.3 In-dataset Results
We first evaluate the proposed method on the ACDC dataset (train/val split described in Sec. 3.1) and the self-collected dataset. Tab. 1 presents the performance of the proposed DFM on the two datasets, where LV, RV and MYO represent the left ventricle, right ventricle and myocardium, respectively. Our approach consistently improves the baseline (U-Net), demonstrating its effectiveness. To further compare with the current state-of-the-art methods, we submit the results to the ACDC leaderboard. As shown in Tab. 2, compared with methods that rely on well-designed networks (e.g., the 3D network in [5] and the grid-like CNN in [15]) or multi-model fusion, the proposed DFM achieves competitive performance with only two simple yet effective modules added to the baseline U-Net. We also compare our approach with other methods dedicated to alleviating inter-class indistinction and intra-class inconsistency. As depicted in Tab. 3, the proposed DFM significantly outperforms these methods. Some qualitative comparisons are given in Fig. 5.
Table 4: Cross-dataset evaluation: models trained on one dataset and tested on the other.

| Method | ACDC → self-collected: Mean Dice | ACDC → self-collected: Mean HD (mm) | Self-collected → ACDC: Mean Dice | Self-collected → ACDC: Mean HD (mm) |
|---|---|---|---|---|
| U-Net | 0.832 | 25.553 | 0.803 | 4.896 |
| U-Net+DFM | 0.841 | 17.870 | 0.820 | 4.453 |
Table 5: Ablation study on the number of steps N in the FRF module (ACDC dataset).

| Number of steps N | 0 | 1 | 3 | 5 | 7 |
|---|---|---|---|---|---|
| Mean Dice | 0.910 | 0.913 | 0.914 | 0.916 | 0.910 |
| Mean HD (mm) | 10.498 | 17.846 | 9.026 | 6.693 | 13.880 |
3.4 Cross Dataset Evaluation and Ablation Study
To analyze the generalization ability of the proposed DFM, we perform a cross-dataset segmentation evaluation. Results listed in Tab. 4 show that the proposed DFM consistently improves the cross-dataset performance compared with the original U-Net, validating its generalization ability and robustness. We also conduct an ablation study on the ACDC dataset to explore how the number of steps N in the FRF module influences the performance. As shown in Tab. 5, the setting of N = 5 gives the best performance.
4 Conclusion
In this paper, we explore the importance of directional information and present a simple yet effective method for cardiac MRI segmentation. We propose to learn a direction field, which characterizes the directional relationship between pixels and implicitly restricts the shape of the segmentation result. Guided by the directional information, we improve the segmentation feature maps and thus achieve better segmentation accuracy. Experimental results demonstrate the effectiveness and the robust generalization ability of the proposed method.
Acknowledgement
This work was supported in part by the Major Project for New Generation of AI under Grant no. 2018AAA0100400, NSFC 61703171, and NSF of Hubei Province of China under Grant 2018CFB199. Dr. Yongchao Xu was supported by the Young Elite Scientists Sponsorship Program by CAST.
References
- [1] Baumgartner, C.F., Koch, L.M., Pollefeys, M., Konukoglu, E.: An exploration of 2d and 3d deep learning techniques for cardiac MR image segmentation. In: International Workshop on Statistical Atlases and Computational Models of the Heart. pp. 111–119 (2017)
- [2] Cheng, T., Wang, X., Huang, L., Liu, W.: Boundary-preserving Mask R-CNN. In: Proc. of European Conference on Computer Vision (2020)
- [3] Dangi, S., Linte, C.A., Yaniv, Z.: A distance map regularized CNN for cardiac cine MR image segmentation. Medical Physics 46(12), 5637–5651 (2019)
- [4] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proc. of IEEE Intl. Conf. on Computer Vision. pp. 1026–1034 (2015)
- [5] Isensee, F., Petersen, J., Kohl, S.A., Jäger, P.F., Maier-Hein, K.H.: nnU-Net: Breaking the spell on successful medical image segmentation. arXiv preprint arXiv:1904.08128 (2019)
- [6] Ke, T.W., Hwang, J.J., Liu, Z., Yu, S.X.: Adaptive affinity fields for semantic segmentation. In: Proc. of European Conference on Computer Vision. pp. 587–602 (2018)
- [7] Khened, M., Kollerathu, V.A., Krishnamurthi, G.: Fully convolutional multi-scale residual DenseNets for cardiac segmentation and automated cardiac diagnosis using ensemble of classifiers. Medical Image Analysis 51, 21–45 (2019)
- [8] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
- [9] Painchaud, N., Skandarani, Y., Judge, T., Bernard, O., Lalande, A., Jodoin, P.M.: Cardiac MRI segmentation with strong anatomical guarantees. In: Proc. of Intl. Conf. on Medical Image Computing and Computer Assisted Intervention. pp. 632–640 (2019)
- [10] Patravali, J., Jain, S., Chilamkurthy, S.: 2d-3d fully convolutional neural networks for cardiac MR segmentation. In: International Workshop on Statistical Atlases and Computational Models of the Heart. pp. 130–139 (2017)
- [11] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: Proc. of Intl. Conf. on Medical Image Computing and Computer Assisted Intervention. pp. 234–241 (2015)
- [12] Wang, Y., Xu, Y., Tsogkas, S., Bai, X., Dickinson, S., Siddiqi, K.: DeepFlux for skeletons in the wild. In: Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition. pp. 5287–5296 (2019)
- [13] Wolterink, J.M., Leiner, T., Viergever, M.A., Išgum, I.: Automatic segmentation and disease classification using cardiac cine MR images. In: International Workshop on Statistical Atlases and Computational Models of the Heart. pp. 101–110 (2017)
- [14] Xu, Y., Wang, Y., Zhou, W., Wang, Y., Yang, Z., Bai, X.: TextField: Learning a deep direction field for irregular scene text detection. IEEE Trans. on Image Processing 28(11), 5566–5579 (2019)
- [15] Zotti, C., Luo, Z., Lalande, A., Jodoin, P.M.: Convolutional neural network with shape prior applied to cardiac MRI segmentation. IEEE Journal of Biomedical and Health Informatics 23(3), 1119–1128 (2018)