

Send correspondence to Ipek Oguz ([email protected])

Deep Angiogram: Trivializing Retinal Vessel Segmentation

Dewei Hu, Vanderbilt University, Dept. of Electrical and Computer Engineering; Xing Yao, Vanderbilt University, Dept. of Computer Science; Jiacheng Wang, Vanderbilt University, Dept. of Computer Science; Yuankai K. Tao, Vanderbilt University, Dept. of Biomedical Engineering, Nashville, TN, USA; Ipek Oguz, Vanderbilt University, Dept. of Computer Science
Abstract

Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages and drusen. Since the source domain may not include all possible types of pathological cases, a model that can robustly recognize vessels on unseen domains is desirable but remains elusive, despite many proposed segmentation networks of ever-increasing complexity. In this work, we propose a contrastive variational auto-encoder that can filter out irrelevant features and synthesize a latent image, named deep angiogram, representing only the retinal vessels. Then segmentation can be readily accomplished by thresholding the deep angiogram. The generalizability of the synthetic network is improved by the contrastive loss that makes the model less sensitive to variations of image contrast and noisy features. Compared to baseline deep segmentation networks, our model achieves higher segmentation performance via simple thresholding. Our experiments show that the model can generate stable angiograms on different target domains, providing excellent visualization of vessels and a non-invasive, safe alternative to fluorescein angiography.

keywords:
deep learning, vessel enhancement, vessel segmentation, domain generalization

1 INTRODUCTION

Retinal fundus photography is a cheap, fast and non-invasive modality that reveals essential anatomical features including the optic disc, optic cup, macula, fovea and vessels, as well as lesions such as hemorrhages and exudates [1]. Therefore, it is widely used for the diagnosis of diseases such as diabetic retinopathy [2], glaucoma [3] and age-related macular degeneration [4]. While fundus photography is broadly used as a low-cost screening tool, it does not provide sufficient contrast to resolve clinically relevant vascular features, and exogenous indocyanine green angiography (ICG) and fluorescein angiography (FA) remain the standard of care for visualizing and quantifying retinal vasculopathies. An algorithm that can provide accurate vessel segmentation from these fundus images would therefore have a profound impact on future clinical practice. In recent years, deep learning models [5] have achieved remarkable success in this task. Nevertheless, the domain shift induced by variations in image contrast and the presence of unseen pathological features in testing data can dramatically degrade the performance of deep models.

Recent research has explored three main types of domain generalization methods [6]: domain randomization, representation learning and general learning strategies. Domain randomization augments the training data to extend the source domain [7], improving the likelihood that an unseen target domain overlaps with the training distribution. Representation learning refers to disentangling features that are invariant across domains [8]. A typical general learning strategy is meta-learning: for example, Li et al. simulate the domain shift by splitting the source domain into meta-train and meta-test sets [9].

In this work, we leverage both domain randomization and representation learning to train a model with superior generalizability across domains. We augment the source domain with contrast limited adaptive histogram equalization (CLAHE) [10], using a clip limit $\epsilon$ drawn from a normal distribution $\mathcal{N}$. In addition to well-enhanced vessel contrast, the augmented images also have exaggerated irrelevant structures, including noise and lesions. Inspired by the idea of disentangling the shared features of two images presented in our previous work [11, 12], we leverage a variational auto-encoder (VAE) to extract the representation of vessels. However, as we showed in [11], this latent image may have an arbitrary style that contains unwanted features. We tackle this challenge by introducing a contrastive loss so that vessels are the only features in the synthetic image. We name the result a deep angiogram. The segmentation task is then simply reduced to Otsu thresholding [13]. Without the irrelevant features, the visibility of the vasculature in the deep angiogram is drastically improved compared to other vessel enhancement approaches [14]. We evaluate the generalizability of our model by its segmentation performance on the target domains. As baselines, we train two segmentation networks on the source domain that take the green-channel fundus image and the principal component analysis (PCA) image as input, respectively. The results indicate that the proposed method generalizes better to target domains and, by simple thresholding, achieves higher segmentation performance than the deep segmentation networks.

2 METHODS

Figure 1: (a) The deep angiogram model structure. $x$ is the input fundus image, $x^{\prime}=C_{\epsilon}(x)$ is the CLAHE-enhanced image with clip limit $\epsilon$, and $y$ is the ground truth. $E_{\theta}$ is a residual U-Net that serves as the encoder of the VAE, and $D_{\varphi}$ is the corresponding decoder. $\mathcal{L}_{cont}$ and $\mathcal{L}_{seg}$ represent the contrastive loss and the segmentation loss. The dashed line on $D_{\varphi}$ indicates that it is not applied at test time. (b) The source and target domains: (a) DRIVE, (b) HRF, (c) STARE, (d) ARIA.

2.1 Causal Feature Extraction

Fig. 1(a) shows our VAE model, composed of the encoder $E_{\theta}$ and the decoder $D_{\varphi}$. The input image is $x$ and supervision is provided by the label $y$. As we have previously shown [11, 12], when the latent manifold of the VAE has the same dimension as the input $x$, the encoder is able to enhance the features shared by $x$ and $y$. Intuitively, if an image is regarded as a collection of representations, then $(x\cap y)\subseteq E_{\theta}(x)$ should hold to guarantee that no essential information is missing in the output $\hat{y}$. In the context of causal learning, $x\cap y$ is the set of causal features for the final prediction. In this implementation, the fundus image $x$ includes information about many anatomical structures such as the optic disc, vessels, macula and lesions, whereas the causal features for the segmentation task are just the vessels, so ideally the latent image should be a vessel map without any irrelevant features, i.e., $(x\cap y)=E_{\theta}(x)$.
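To make this composition concrete, the sketch below illustrates the latent-image idea, not the paper's exact implementation: the residual U-Nets of Fig. 1 are replaced by a tiny hypothetical stand-in (`TinyNet`), and the latent image is taken to be single-channel with the same spatial size as the input.

```python
# Minimal sketch of the encoder/decoder composition; TinyNet is a hypothetical
# stand-in for the residual U-Nets used as E_theta and D_phi in the paper.
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a residual U-Net."""
    def __init__(self, in_ch, out_ch, width=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

encoder = TinyNet(in_ch=3, out_ch=1)   # E_theta: fundus image -> latent image
decoder = TinyNet(in_ch=1, out_ch=1)   # D_phi: latent image -> vessel map (training only)

x = torch.rand(1, 3, 256, 256)                 # fundus image
angiogram = torch.sigmoid(encoder(x))          # latent image, same H x W as x
y_hat = torch.sigmoid(decoder(angiogram))      # supervised by the vessel label y
```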

As suggested in Fig. 1, since we want to put most of the workload on the encoder $E_{\theta}$, it is designed to have more learnable parameters than the decoder $D_{\varphi}$. Both $E_{\theta}$ and $D_{\varphi}$ have a residual U-Net architecture. Note that the decoder $D_{\varphi}$ is not applied at test time, since its purpose is simply to provide supervision to $E_{\theta}$ during training. The segmentation loss for the decoder is a combination of cross-entropy and Dice loss:

$$\mathcal{L}_{seg}=-\frac{1}{N}\sum_{n=1}^{N}y_{n}\log\hat{y}_{n}+\left(1-\frac{2\sum_{n=1}^{N}y_{n}\hat{y}_{n}}{\sum_{n=1}^{N}\left(y_{n}^{2}+\hat{y}_{n}^{2}\right)}\right) \qquad (1)$$
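A minimal PyTorch sketch of Eq. (1), assuming `y_hat` holds per-pixel vessel probabilities in (0, 1) and `y` the binary label; the small `eps` for numerical stability is our addition.

```python
# Sketch of the combined cross-entropy + Dice segmentation loss of Eq. (1).
import torch

def segmentation_loss(y_hat, y, eps=1e-7):
    # Cross-entropy term, averaged over the N pixels (written as in Eq. (1)).
    ce = -(y * torch.log(y_hat + eps)).mean()
    # Soft Dice term.
    dice = 1.0 - 2.0 * (y * y_hat).sum() / ((y ** 2).sum() + (y_hat ** 2).sum() + eps)
    return ce + dice
```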
Figure 2: Test examples from the target domains, ARIA (top) and STARE (bottom); columns show the fundus image, the deep angiogram and the manual label. Below each image, close-up panels show two highlighted areas (red and yellow boxes) for easier comparison. Deep angiograms provide excellent vessel clarity.

2.2 Domain Randomization

There are two major causes for distribution shift of fundus images. First, within a well-curated dataset (e.g., DRIVE [15]), the image contrast is usually consistent. A model trained on such a dataset may struggle with a poor-contrast test image. Second, since a given dataset is unlikely to exhaustively provide samples of all possible pathologies, unseen features such as drusen and hemorrhages can be problematic during testing.

To improve the robustness of the model, we randomize the source domain data with CLAHE [10] in addition to other commonly used augmentation methods (e.g., rotation). For an input image $x$, we apply CLAHE $C_{\epsilon}$ to all the color channels with a random clip limit $\epsilon$ sampled from $\mathcal{N}(5,1)$. In the resulting image $x^{\prime}$, the contrast of the vessels is strongly enhanced, but so is the background noise. Then, as in Fig. 1, we introduce a contrastive loss $\mathcal{L}_{cont}$ on the latent image to guarantee that the model is not distracted by this exaggerated noise and provides stable visualization for inputs with varying contrast. The loss function is defined as the sum of the $L_{2}$ loss and the structural similarity (SSIM) loss:

$$\mathcal{L}_{cont}=\|E_{\theta}(x)-E_{\theta}(x^{\prime})\|_{2}+SSIM\left(E_{\theta}(x),E_{\theta}(x^{\prime})\right) \qquad (2)$$

The SSIM loss is defined as $SSIM(x,y)=\frac{(2\mu_{x}\mu_{y}+c_{1})(2\sigma_{xy}+c_{2})}{(\mu_{x}^{2}+\mu_{y}^{2}+c_{1})(\sigma_{x}^{2}+\sigma_{y}^{2}+c_{2})}$, where $\mu$ and $\sigma$ represent the mean and standard deviation of an image, $\sigma_{xy}$ is the covariance between the two images, and $c_{1}$ and $c_{2}$ are constants.
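Below is a minimal sketch of the CLAHE-based randomization and the contrastive term, assuming 8-bit RGB inputs and OpenCV's CLAHE. The 8x8 tile grid, the standard SSIM constants and the global-statistics SSIM are assumptions rather than details taken from the paper.

```python
# Sketch of CLAHE domain randomization and the contrastive loss of Eq. (2);
# tile size and SSIM constants are assumptions, not values from the paper.
import cv2
import numpy as np
import torch

def clahe_augment(img_rgb, mean=5.0, std=1.0):
    """Apply CLAHE to every color channel with a clip limit drawn from N(5, 1)."""
    clip = float(np.clip(np.random.normal(mean, std), 1.0, None))
    clahe = cv2.createCLAHE(clipLimit=clip, tileGridSize=(8, 8))
    return np.stack([clahe.apply(img_rgb[..., c]) for c in range(3)], axis=-1)

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM from global image statistics, following the definition above."""
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def contrastive_loss(z, z_prime):
    """Eq. (2) as written: L2 distance plus the SSIM term between latent images."""
    return torch.norm(z - z_prime, p=2) + ssim_global(z, z_prime)
```

Note that the SSIM term is added exactly as written in Eq. (2); a dissimilarity form such as $1-SSIM$ is a common alternative in practice.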

2.3 Experiments

Baseline Methods. Since the color image is more sensitive to domain shift, it is common to convert the fundus image to grayscale as a pre-processing step, typically by extracting the green channel or applying principal component analysis (PCA). We train a segmentation network with the same architecture as $E_{\theta}$, taking either the green channel or the PCA image as input. We compare these two networks to Otsu thresholding of deep angiograms.
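The sketch below illustrates the two baseline grayscale conversions and the Otsu thresholding applied to a deep angiogram; the helper names and the SVD-based PCA projection are our own illustrative choices, not the paper's code.

```python
# Sketch of the baseline pre-processing routes and Otsu binarization.
import numpy as np
from skimage.filters import threshold_otsu

def green_channel(img_rgb):
    """Baseline 1: keep only the green channel of the fundus image."""
    return img_rgb[..., 1]

def pca_grayscale(img_rgb):
    """Baseline 2: project RGB pixels onto their first principal component."""
    pixels = img_rgb.reshape(-1, 3).astype(np.float64)
    pixels -= pixels.mean(axis=0)
    _, _, vt = np.linalg.svd(pixels, full_matrices=False)
    return (pixels @ vt[0]).reshape(img_rgb.shape[:2])

def segment_angiogram(angiogram):
    """Proposed route: binarize the deep angiogram with Otsu's threshold."""
    return angiogram > threshold_otsu(angiogram)
```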

Datasets. We use four publicly available fundus datasets, as shown in Fig. 1(b). The DRIVE dataset [15] consists of 20 labelled images of size $565\times 584$. The HRF dataset [16] contains 45 labelled images of size $3504\times 2336$. The STARE dataset [17] includes 20 labelled images of size $700\times 605$. The ARIA dataset [18] includes 138 labelled images of size $768\times 576$. DRIVE and HRF serve as the source domains, whereas STARE and ARIA are used for testing.

Implementation Details. All networks are trained and tested on an NVIDIA RTX 2080 Ti GPU with 11 GB of memory. We use a batch size of 4 and train for 300 epochs. We use the Adam optimizer with an initial learning rate of $5\times 10^{-4}$ for the proposed VAE and $1\times 10^{-3}$ for the baseline segmentation networks. The learning rate for both networks decays by a factor of 0.5 every 3 epochs.
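A minimal sketch of these optimization settings in PyTorch; a single convolution stands in for the actual networks and the training pass itself is elided.

```python
# Sketch of the stated optimizer and learning-rate schedule.
import torch

model = torch.nn.Conv2d(3, 1, 3, padding=1)                  # stand-in for the actual networks
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)    # 1e-3 for the baseline networks
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.5)

for epoch in range(300):
    # ... one pass over the training set with batch size 4 goes here ...
    scheduler.step()    # halve the learning rate every 3 epochs
```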

3 RESULTS AND CONCLUSION

Figure 3: Quantitative evaluation of the segmentation results on the two target domains. From left to right: Dice coefficient, accuracy, sensitivity, specificity. Blue: segmentation network trained on the green channel. Orange: segmentation network trained on the PCA image. Green: segmentation obtained by thresholding the deep angiogram.

Fig. 2 shows a test example from each of the target domains. We observe that for different datasets, the manual annotations include varying amounts of detail: the label for the STARE dataset contains many more small vessels than ARIA. In the ARIA example, the deep angiogram is able to enhance the thin vessels with very poor contrast. This is also evident from the large vessels in the bottom left quadrant of the image, where the illumination is low. Moreover, the angiogram filters out the circular artifacts seen within the red box. In the STARE example, our model extracts most of the vasculature, including the faintly visible fine vessels. These tiny vessels have relatively lower intensity in the deep angiogram, which suggests lower confidence. Compared to the manual label, the deep angiogram can also delineate the vessel diameter more precisely.

We quantitatively evaluate the vessel segmentation performance in Fig. 3. By simply thresholding the deep angiogram, we obtain better vessel maps than the segmentation networks that use the green channel and the PCA image as inputs.

The proposed method can effectively extract a specific type of feature from a complex context. For retinal vessels in particular, our model generates stable deep angiograms that dramatically enhance small, poor-contrast vessels in color fundus images from unseen domains. Hence, the deep angiogram is a low-cost method that can be performed using standard fundus photography technologies, including portable handheld systems. The ability to resolve vascular features without the need for exogenous contrast injections significantly reduces the clinical expertise, equipment and cost required for retinal angiography. Integration of these technologies with recent demonstrations of cellphone-based fundus photography and remote diagnostic technologies can move retinal disease screening out of the clinic and dramatically expand the impact of color fundus photography.

4 ACKNOWLEDGEMENTS

This work is supported by the Vanderbilt University Discovery Grant Program.

References

  • [1] Li, T., Bo, W., Hu, C., Kang, H., Liu, H., Wang, K., and Fu, H., “Applications of deep learning in fundus images: A review,” Medical Image Analysis 69, 101971 (2021).
  • [2] Islam, M. M., Yang, H.-C., Poly, T. N., Jian, W.-S., and Li, Y.-C. J., “Deep learning algorithms for detection of diabetic retinopathy in retinal fundus photographs: A systematic review and meta-analysis,” Computer Methods and Programs in Biomedicine 191, 105320 (2020).
  • [3] Zhang, Z., Yin, F. S., Liu, J., Wong, W. K., Tan, N. M., Lee, B. H., Cheng, J., and Wong, T. Y., “Origa-light: An online retinal fundus image database for glaucoma analysis and research,” in [2010 Annual international conference of the IEEE engineering in medicine and biology ], 3065–3068, IEEE (2010).
  • [4] Spaide, R. F., “Fundus autofluorescence and age-related macular degeneration,” Ophthalmology 110(2), 392–399 (2003).
  • [5] Chen, C., Chuah, J. H., Raza, A., and Wang, Y., “Retinal vessel segmentation using deep learning: a review,” IEEE Access (2021).
  • [6] Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., and Yu, P., “Generalizing to unseen domains: A survey on domain generalization,” IEEE Transactions on Knowledge and Data Engineering (2022).
  • [7] Zhang, R., Xu, Q., Huang, C., Zhang, Y., and Wang, Y., “Semi-supervised domain generalization for medical image analysis,” in [2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI) ], 1–5, IEEE (2022).
  • [8] Jiang, J. and Veeraraghavan, H., “Unified cross-modality feature disentangler for unsupervised multi-domain mri abdomen organs segmentation,” in [International Conference on Medical Image Computing and Computer-Assisted Intervention ], 347–358, Springer (2020).
  • [9] Li, D., Yang, Y., Song, Y.-Z., and Hospedales, T., “Learning to generalize: Meta-learning for domain generalization,” in [Proceedings of the AAAI conference on artificial intelligence ], 32(1) (2018).
  • [10] Reza, A. M., “Realization of the contrast limited adaptive histogram equalization (clahe) for real-time image enhancement,” Journal of VLSI signal processing systems for signal, image and video technology 38(1), 35–44 (2004).
  • [11] Hu, D., Li, H., Liu, H., and Oguz, I., “Domain generalization for retinal vessel segmentation with vector field transformer,” in [Medical Imaging with Deep Learning ], (2021).
  • [12] Hu, D., Cui, C., Li, H., Larson, K. E., Tao, Y. K., and Oguz, I., “Life: a generalizable autodidactic pipeline for 3d oct-a vessel segmentation,” in [International Conference on Medical Image Computing and Computer-Assisted Intervention ], 514–524, Springer (2021).
  • [13] Otsu, N., “A threshold selection method from gray-level histograms,” IEEE transactions on systems, man, and cybernetics 9(1), 62–66 (1979).
  • [14] Subramaniam, A., Douglass, M., Orge, F., Can, B., Monteoliva, G., Fried, E., Schbib, V., Saidman, G., Peña, B., Ulacia, S., et al., “Vessel enhancement in smartphone fundus images to aid retinopathy of prematurity and plus disease diagnosis and classification,” in [Medical Imaging 2022: Imaging Informatics for Healthcare, Research, and Applications ], 12037, 35–42, SPIE (2022).
  • [15] Staal, J., Abràmoff, M. D., Niemeijer, M., Viergever, M. A., and Van Ginneken, B., “Ridge-based vessel segmentation in color images of the retina,” IEEE transactions on medical imaging 23(4), 501–509 (2004).
  • [16] Budai, A., Bock, R., Maier, A., Hornegger, J., and Michelson, G., “Robust vessel segmentation in fundus images,” International journal of biomedical imaging 2013 (2013).
  • [17] Hoover, A., Kouznetsova, V., and Goldbaum, M., “Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response,” IEEE Transactions on Medical imaging 19(3), 203–210 (2000).
  • [18] Farnell, D. J., Hatfield, F. N., Knox, P., Reakes, M., Spencer, S., Parry, D., and Harding, S. P., “Enhancement of blood vessels in digital fundus photographs via the application of multiscale line operators,” Journal of the Franklin institute 345(7), 748–765 (2008).