
Dual-Branch Network with Dual-Sampling Modulated Dice Loss
for Hard Exudate Segmentation from Colour Fundus Images

Qing Liu1, Haotian Liu1, Yixiong Liang1
Abstract

Automated segmentation of hard exudates in colour fundus images is a challenging task due to extreme class imbalance and enormous size variation. This paper aims to tackle these issues and proposes a dual-branch network with a dual-sampling modulated Dice loss. It consists of two branches: a large hard exudate biased segmentation branch and a small hard exudate biased segmentation branch, each responsible for its own duty. Furthermore, we propose a dual-sampling modulated Dice loss for training such that the dual-branch network is able to segment hard exudates of different sizes. In detail, for the first branch, we use a uniform sampler to sample pixels from the predicted segmentation mask for Dice loss calculation; this branch is naturally biased in favour of large hard exudates, as Dice loss incurs a larger cost for misidentifying large hard exudates than small ones. For the second branch, we use a re-balanced sampler to oversample hard exudate pixels and undersample background pixels for the loss calculation. In this way, the cost of misidentifying small hard exudates is enlarged, which forces the parameters of the second branch to fit small hard exudates well. Considering that large hard exudates are much easier to identify correctly than small ones, we propose an easy-to-difficult learning strategy that adaptively modulates the losses of the two branches. We evaluate the proposed method on two public datasets and the results demonstrate that our method achieves state-of-the-art performance.

Introduction

Hard exudate is one of the most significant manifestations of diabetic retinopathy (DR) (Klein et al. 1987). Automated and accurate segmentation of hard exudates in colour fundus images has several potential clinical applications, such as large-scale automated DR screening, computer-aided diagnosis and DR severity assessment (Sasaki et al. 2013).

The segmentation of hard exudates can be formulated as a dense classification problem. In the era of deep learning, the natural first choice is fully convolutional networks (FCNs). The goal is to optimise the parameters of a designed FCN to best fit the exudate ground truth by minimising a specified loss function. However, achieving this goal is challenging due to two issues:

  • Extreme class imbalance. To illustrate how extreme the class imbalance is, we compute the ratio of negative samples (i.e. background pixels) to positive samples (i.e. hard exudate pixels) in two public datasets for hard exudate segmentation, DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018) (see Table 1). The ratio is 512 in DDR (Li et al. 2019) and 110 in IDRiD (Porwal et al. 2018). With such extremely imbalanced data, how to design the loss function and train the segmentation model so as to alleviate the bias towards the majority class becomes critical.

  • Enormous variation in size across connected components of hard exudate regions. Most connected hard exudate regions are small. In particular, we calculate the relative area of connected hard exudate regions and find that almost 90% of hard exudate pixels belong to connected regions whose relative area with respect to the whole fundus image is less than $9.7\times 10^{-5}$ in DDR (Li et al. 2019) and $1.1\times 10^{-4}$ in IDRiD (Porwal et al. 2018). More seriously, 10% of hard exudate pixels belong to connected regions with relative areas less than $2.0\times 10^{-6}$ in DDR (Li et al. 2019) and $5.0\times 10^{-6}$ in IDRiD (Porwal et al. 2018), respectively. The size ratios between the largest 10% and the smallest 10% in DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018) are almost 48 and 22, respectively. This size variation, which the FCN model needs to handle, is enormous and poses a huge challenge for representation and classifier learning (a computation sketch follows Table 1).

Table 1: Class imbalance and exudate region size variation in the exudate segmentation datasets DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018). $Ratio_{neg/pos}$ denotes the ratio of background pixels to hard exudate pixels. $Size_{large}$ and $Size_{small}$ are the relative sizes (with respect to the image) of the top 10% largest and top 10% smallest exudate regions in the whole dataset.
Dataset $Ratio_{neg/pos}$ $Size_{large}$/$Size_{small}$
DDR 512 $9.7\times 10^{-5}$/$2.0\times 10^{-6}$
IDRiD 110 $1.1\times 10^{-4}$/$5.0\times 10^{-6}$
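The statistics in Table 1 can be reproduced directly from the binary ground-truth masks. The sketch below is our own illustration (the function name and the pixel-weighted percentile convention are assumptions), using SciPy's connected-component labelling.

```python
import numpy as np
from scipy import ndimage

def imbalance_and_size_stats(masks):
    """Class-imbalance ratio and pixel-weighted relative region sizes.

    masks: iterable of binary ground-truth arrays (1 = hard exudate).
    Returns the background/exudate pixel ratio and the 90th/10th percentiles
    of relative connected-region area, weighted by the number of exudate
    pixels each region contributes.
    """
    pos = neg = 0
    rel_areas = []
    for m in masks:
        m = m.astype(bool)
        pos += int(m.sum())
        neg += int(m.size - m.sum())
        labelled, n_regions = ndimage.label(m)        # connected exudate regions
        for r in range(1, n_regions + 1):
            area = int((labelled == r).sum())
            rel_areas.extend([area / m.size] * area)  # one entry per exudate pixel
    rel_areas = np.asarray(rel_areas)
    return neg / pos, np.percentile(rel_areas, 90), np.percentile(rel_areas, 10)
```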

An intuitive way to segment hard exudates is to fine-tune semantic segmentation networks, such as HED (Xie and Tu 2015), PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and Deeplabv3+ (Chen et al. 2018), which were originally designed for dense classification tasks on natural scene images. These methods handle the class imbalance issue by using a class balanced cross-entropy (CBCE) loss rather than the traditional cross-entropy loss. Inspired by HED (Xie and Tu 2015), Guo et al. (Guo et al. 2019a, 2020) propose a variant of CBCE loss called bin loss and fine-tune the parameters of HED (Xie and Tu 2015) for hard exudate segmentation. Bin loss (Guo et al. 2019a, 2020) considers not only the class imbalance problem but also how hard background pixels are to classify correctly. FCNs trained with CBCE loss successfully avoid the background bias. However, directly up-weighting the loss of the minority class and down-weighting the loss of the majority class according to the inverse class frequency is too crude; it makes the segmentation model biased in favour of exudate pixels. Taking PSPNet (Zhao et al. 2017) as an example, when trained with CBCE loss, the model frequently misidentifies confusing structures and background regions around hard exudates, as shown in Fig. 1 (b) and (f).

Refer to caption
Figure 1: Segmentation results of PSPNet (Zhao et al. 2017) trained with CBCE loss and Dice loss, and of our proposed dual-branch network. For better visualisation, we show the segmentation results of the entire image in (a) in the first row and a zoomed-in view of the white solid window in (a) in the second row. Pixels in yellow are exudate pixels that are correctly classified. Pixels in red are background pixels that are wrongly classified as exudate pixels. Pixels in green are exudate pixels that are wrongly classified as background pixels. The dashed cyan box highlights background regions wrongly identified as exudate and the dashed magenta box highlights exudate regions wrongly identified as background. From this figure we can see that PSPNet (Zhao et al. 2017) trained with CBCE loss tends to classify background as exudate, while trained with Dice loss it tends to misidentify small hard exudates. In contrast, our dual-branch network with DSM loss built on PSPNet (Zhao et al. 2017) performs better than the single-branch ones.

An alternative strategy is to train FCNs with Dice loss (Milletari, Navab, and Ahmadi 2016). Dice loss (Milletari, Navab, and Ahmadi 2016) is a regional loss which measures the overlap error between the prediction and the ground truth. It works better than CBCE loss when the class imbalance is severe. However, because the Dice-loss cost of errors on small exudate regions is slight compared with that on large hard exudate regions, FCNs trained with Dice loss are biased towards large hard exudate regions and misidentify small hard exudates. The large variation in size across connected components of hard exudates makes this bias more serious. Fig. 1 (c) and (g) show the results of PSPNet (Zhao et al. 2017) trained with Dice loss, which tends to misclassify small hard exudate regions as background.

In this paper, we propose a dual-branch network with a dual-sampling modulated (DSM) Dice loss to take care of both large and small connected hard exudate regions. As shown in Fig. 2, our dual-branch network consists of two branches: a large hard exudate biased segmentation branch and a small hard exudate biased segmentation branch. It is trained with the DSM Dice loss. Each branch separately performs its own duty of representation and classifier learning for hard exudates of a different size. The large hard exudate biased segmentation branch learns a segmentation model biased towards large hard exudates, while the small hard exudate biased segmentation branch is biased towards small hard exudates. The bias of both branches is achieved by the proposed DSM loss. For the large hard exudate biased segmentation branch, Dice loss with a uniform pixel sampler is used. For the small hard exudate biased segmentation branch, a re-balanced pixel sampler is used to oversample hard exudate pixels and undersample background pixels. Hard exudate pixels are thus sampled multiple times, which increases the penalty on misidentifying small hard exudate regions and compensates the large hard exudate biased branch well. The biases of the two branches are shifted by a modulation that depends on the training epoch. We evaluate the effectiveness of the proposed dual-branch network on DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018), and the results show that it outperforms existing hard exudate segmentation methods. Furthermore, to demonstrate the generality of the underlying idea, we combine the dual-branch design with several dense classification networks. The results show that dual-branch networks trained with our dual-sampling modulated Dice loss achieve superior performance to single-branch networks trained with Dice loss.

In summary, the contributions of this paper are as follows:

  • We propose a novel framework, named dual-branch network, to handle the issues of extreme class imbalance and enormous variation in size that exist in automated hard exudate segmentation from colour fundus images.

  • We propose a dual-sampling modulated Dice loss to guide the learning of the dual-branch network. It implements an easy-to-difficult learning strategy that adaptively modulates the losses of the two branches so that the network gradually shifts its attention from the easy task of large hard exudate segmentation to the hard task of small hard exudate segmentation.

  • We conduct extensive experiments on two public datasets, DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018), and demonstrate that the dual-branch network achieves state-of-the-art performance on hard exudate segmentation.

Refer to caption
Figure 2: Illustration of the dual-branch network. The left part is the dual-branch network, which is constructed from two identical segmentation branches with partial weight sharing. We note that arbitrary segmentation models such as PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015) can be used. The right part illustrates the proposed dual-sampling modulated Dice loss, which uses a uniform pixel sampler and a re-balanced pixel sampler to sample the pixels involved in loss calculation. The two losses are adaptively modulated by a hyper-parameter set according to the training epoch.

Related Work

Unsupervised Hard Exudate Segmentation Methods. Earlier methods such as (Walter et al. 2002; Sopharak et al. 2008; Ravishankar, Jain, and Mittal 2009; Welfer, Scharcanski, and Marinho 2010) adopt morphological operations to enhance exudates, then use simple thresholding to separate exudates from the background. Similarly, Pereira et al. (Pereira, Gonçalves, and Ferreira 2015) propose to use median filtering and normalisation on the green plane of the fundus image for enhancement. J. Kaur and D. Mittal (Kaur and Mittal 2018) first remove the vessels and optic disc, then enhance the image by adaptive image quantization and normalisation; finally, adaptive thresholding is used to identify exudates. These methods are simple and do not need expensive annotations by ophthalmologists, but they often fail on confusing structures which have high contrast to the background.

Coarse-to-fine Supervised Hard Exudate Segmentation Methods. These methods are data-driven and require expert annotation. They commonly involve two stages: (1) a coarse detection stage for candidate extraction and (2) a fine segmentation stage that determines whether each candidate is a hard exudate region. For example, in (Zhang et al. 2014; Wang et al. 2020), candidates are extracted by mathematical morphology first, then a random forest is trained for classification. Rather than learning to determine whether a candidate is a hard exudate, other researchers focus on learning high-quality candidate extraction. For example, Liu et al. (Liu et al. 2017) first learn to extract multiscale hard exudate candidate patches, discarding numerous background regions, then identify hard exudate regions from the candidate patches according to characteristics such as intensity contrast to the background and area. Kusakunniran et al. (Kusakunniran et al. 2018) first learn a multilayer perceptron to detect candidate hard exudate seeds; with the clusters of initial seeds, iterative graph cut is used for segmentation. Additionally, Khojasteh et al. (Khojasteh, Aliahmad, and Kumar 2019; Khojasteh et al. 2019) propose to distinguish exudate patches from non-exudate patches by either training a lightweight deep network on candidate patches or training a support vector machine on features extracted by a pre-trained ResNet-50 (He et al. 2016). However, how to extract hard exudate candidate patches from whole images during the testing phase remains unsolved.

End-to-end Hard Exudate Segmentation Methods. Recent hard exudate segmentation methods train an FCN with a loss function in an end-to-end manner. Mo et al. (Mo, Zhang, and Feng 2018) design a fully convolutional residual network named FCRN for exudate segmentation, while Guo et al. (Guo et al. 2019b) design a lightweight neural network named LWENet. In (Guo et al. 2019a), L-seg is proposed for multi-lesion segmentation. All of these methods take the class imbalance problem into consideration and train the networks with CBCE loss. In (Guo et al. 2020), an improved CBCE loss incorporating hard negative mining is proposed for hard exudate segmentation. Both CBCE and bin loss avoid the background bias by increasing the cost of wrongly classifying exudate pixels. However, due to the imbalanced cost weights, FCNs trained with CBCE loss and bin loss tend to suffer from exudate bias.

It is noteworthy that dual networks and dual sampling have been used for class-imbalanced classification. In (Zhou et al. 2020), a bilateral branch network (BBN) equipped with two samplers is proposed for class-imbalanced image classification. In (Ouyang et al. 2020), a dual-sampling network (DSN), which consists of two separate branches with two samplers, is proposed for diagnosis of COVID-19. Our method is inspired by BBN (Zhou et al. 2020) and DSN (Ouyang et al. 2020). Although all of these methods contain dual branches with dual samplers, ours differs from BBN (Zhou et al. 2020) and DSN (Ouyang et al. 2020) in three aspects: (1) our dual-branch network is designed for dense classification, while BBN (Zhou et al. 2020) and DSN (Ouyang et al. 2020) are for image-level classification; (2) images fed into our dual-branch network are randomly sampled from the training set, while images fed into BBN (Zhou et al. 2020) and DSN (Ouyang et al. 2020) are sampled according to pre-defined samplers; (3) the samplers in our dual-branch network operate on the predicted segmentation masks and sample the pixels involved in the Dice loss calculation, while the samplers in BBN (Zhou et al. 2020) and DSN (Ouyang et al. 2020) operate at the input layer and sample images for representation and classifier learning.

Method

Overall Framework

Our goal is to pursue deep network parameters that best fit hard exudate ground truth of different sizes given the training images. For hard exudates of different sizes, learning representations and classifiers in different manners is desirable. To this end, we adopt two branches to learn representations and classifiers separately. One branch, named the large hard exudate biased segmentation branch, is mainly responsible for large hard exudates. The other, named the small hard exudate biased segmentation branch, is responsible for small hard exudates. To achieve the size bias of each branch adaptively, we design a dual-sampling modulated Dice loss, termed DSM loss. Fig. 2 illustrates our proposed dual-branch network.

Formally, let $X\in\mathcal{R}^{H\times W\times 3}$ denote a training colour fundus image of size $H\times W$ and $Y\in\mathcal{R}^{H\times W}$ the corresponding ground truth, which is a binary map in the context of hard exudate segmentation. From the training set, we randomly fetch two images $\{X_{L},Y_{L}\}$ and $\{X_{S},Y_{S}\}$ and feed them into the large hard exudate biased segmentation branch and the small hard exudate biased segmentation branch respectively to obtain the final predictions $\hat{Y}_{L}$ and $\hat{Y}_{S}$. Next we elaborate the architecture of our dual-branch network and the training details with our dual-sampling modulated Dice loss.

Dual-Branch Segmentation Network

We let both branches economically share the same segmentation network structure, as illustrated in Fig. 2. Our dual-branch segmentation network can adopt an arbitrary segmentation network. In this paper, we take three state-of-the-art segmentation networks, i.e. PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015), as examples. PSPNet (Zhao et al. 2017) and Deeplabv3 (Chen et al. 2017b) adopt ResNet50 (He et al. 2016) equipped with dilated convolution (Chen et al. 2017a) in the last two stages as the backbone, while HED (Xie and Tu 2015) adopts the five-stage VGG16 (Simonyan and Zisserman 2015) as its backbone. Both ResNet50 (He et al. 2016) and VGG16 (Simonyan and Zisserman 2015) contain five stages of convolutions. Additionally, in PSPNet (Zhao et al. 2017), a pyramid pooling module is attached to the last convolutional stage, followed by a classifier that makes dense predictions; an auxiliary classifier is attached to the second convolutional stage and its auxiliary loss helps optimise the learning process. In Deeplabv3 (Chen et al. 2017b), an atrous spatial pyramid pooling module is attached to the last convolutional stage to generate multiscale feature maps. In HED (Xie and Tu 2015), five side-output layers are attached to the five convolutional stages and a fusion layer aggregates the side-output predictions. To reduce model complexity and speed up inference, when PSPNet (Zhao et al. 2017) or Deeplabv3 (Chen et al. 2017b) is used as the segmentation branch, the weights of the first four backbone stages are shared while the remaining weights are learned separately; for HED (Xie and Tu 2015), only the first three backbone stages are shared. In this way, the representations for the final classifiers are specific to hard exudates of different sizes. The loss terms in PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015) are replaced with our proposed dual-sampling modulated Dice loss.
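A minimal PyTorch sketch of this partial weight sharing follows; it is an illustration only, using a generic five-stage backbone split rather than the exact PSPNet/Deeplabv3/HED heads, and all module arguments are placeholders.

```python
import torch
import torch.nn as nn

class DualBranchSegNet(nn.Module):
    """Two segmentation branches sharing the early backbone stages.

    `shared_stages`, `stage5_*` and `head_*` are placeholder modules standing
    in for the shared ResNet50/VGG16 stages and the branch-specific heads.
    """
    def __init__(self, shared_stages, stage5_large, head_large, stage5_small, head_small):
        super().__init__()
        self.shared = shared_stages                                   # e.g. backbone stages 1-4
        self.branch_large = nn.Sequential(stage5_large, head_large)   # large-exudate-biased branch
        self.branch_small = nn.Sequential(stage5_small, head_small)   # small-exudate-biased branch

    def forward(self, x_large, x_small=None):
        if x_small is None:            # inference: one image goes through both branches
            feat = self.shared(x_large)
            return torch.sigmoid(self.branch_large(feat)), torch.sigmoid(self.branch_small(feat))
        # training: each branch receives its own randomly fetched image
        y_large = torch.sigmoid(self.branch_large(self.shared(x_large)))
        y_small = torch.sigmoid(self.branch_small(self.shared(x_small)))
        return y_large, y_small
```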

Loss Function

In each training iteration, two pairs of fundus images and their ground truths, denoted by $\{X_{L},Y_{L}\}$ and $\{X_{S},Y_{S}\}$, are randomly fetched. $X_{L}$ is fed into the large hard exudate biased segmentation branch to obtain the prediction $\hat{Y}_{L}$. Similarly, $X_{S}$ is fed into the small hard exudate biased segmentation branch to obtain the prediction $\hat{Y}_{S}$. As hard exudate segmentation suffers from extreme class imbalance and large variation in size, rather than using the class balanced cross-entropy loss (Guo et al. 2019a, 2020), we propose the dual-sampling modulated Dice loss (DSM loss). As shown in Fig. 2, the total loss can be expressed as:

\mathcal{L}_{total}=\mathcal{L}_{DSM}\;. (1)

Dual-Sampling Modulated Dice Loss. In our design of the DSM loss, two different samplers are used to sample pixels from the predictions of the two branches separately. Then Dice loss is used to measure the dissimilarity between the set of sampled pixels and their ground truths.

For the large hard exudate biased segmentation branch, given the predicted segmentation mask $\hat{Y}_{L}$, we use a uniform pixel sampler which samples hard exudate pixels and background pixels with equal probability. We denote the sampled pixel set by $\hat{\mathcal{S}}_{L}=\{\hat{y}_{L,n}\}_{n=1}^{N}$, where $N=H\times W$, and the corresponding ground-truth set by $\mathcal{S}_{L}=\{y_{L,n}\}_{n=1}^{N}$, where $y_{L,n}$ is the ground truth of $\hat{y}_{L,n}$. We use Dice loss (Milletari, Navab, and Ahmadi 2016) to calculate the dissimilarity between the sampled pixel set and the corresponding ground-truth labels:

\mathcal{L}_{L}(\hat{\mathcal{S}}_{L},\mathcal{S}_{L})=1-\mathcal{D}(\hat{\mathcal{S}}_{L},\mathcal{S}_{L})\;, (2)

where $\mathcal{D}(\hat{\mathcal{S}}_{L},\mathcal{S}_{L})$ measures the degree of overlap between the two sets:

\mathcal{D}(\hat{\mathcal{S}}_{L},\mathcal{S}_{L})=\frac{\sum_{\hat{y}_{L,n}\in\hat{\mathcal{S}}_{L}}2\hat{y}_{L,n}y_{L,n}}{\sum_{\hat{y}_{L,n}\in\hat{\mathcal{S}}_{L}}\hat{y}_{L,n}+\sum_{y_{L,n}\in\mathcal{S}_{L}}y_{L,n}}\;. (3)

The loss defined by Eq. (2) and (3) focuses on the overlap error, which greatly alleviates both the bias towards the majority class seen with cross-entropy loss and the bias towards the minority class seen with CBCE loss. However, Dice loss imposes a heavy penalty on misidentification of large hard exudate regions but only a slight penalty on misidentification of small ones, which results in a bias towards large hard exudate regions.
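With a uniform sampler every pixel of the predicted mask enters Eq. (2)-(3), so this branch loss reduces to the standard soft Dice loss. A minimal PyTorch sketch follows; the small smoothing constant is our addition for numerical stability.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss of Eq. (2)-(3) over all sampled pixels.

    pred:   predicted exudate probabilities, values in [0, 1]
    target: binary ground truth of the same shape
    """
    pred, target = pred.reshape(-1), target.reshape(-1)
    inter = (pred * target).sum()
    dice = (2 * inter) / (pred.sum() + target.sum() + eps)  # Eq. (3)
    return 1 - dice                                         # Eq. (2)
```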

To remedy the misidentification of small hard exudate regions, for the prediction of the small hard exudate biased segmentation branch we use a re-balanced pixel sampler to select the pixels involved in the loss calculation. In particular, we oversample exudate pixels and undersample background pixels. Hard exudate pixels in small regions are thus sampled multiple times with high probability, which increases the penalty of misidentifying small hard exudate regions. As a result, the learning focus is shifted to small hard exudate regions. Formally, we randomly sample $N_{1}$ hard exudate pixels and $N-N_{1}$ background pixels with replacement. We denote the sampled pixel set by $\hat{\mathcal{S}}_{S}=\{\hat{y}_{S,n}\}_{n=1}^{N}$ and the corresponding ground-truth set by $\mathcal{S}_{S}=\{y_{S,n}\}_{n=1}^{N}$, where $y_{S,n}$ is the ground truth of $\hat{y}_{S,n}$. Similarly, the loss for the small hard exudate biased segmentation branch is calculated as:

\mathcal{L}_{S}(\hat{\mathcal{S}}_{S},\mathcal{S}_{S})=1-\mathcal{D}(\hat{\mathcal{S}}_{S},\mathcal{S}_{S})\;, (4)

where $\mathcal{D}(\hat{\mathcal{S}}_{S},\mathcal{S}_{S})$ is the Dice coefficient, computed in the same way as in Eq. (3).
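A sketch of the re-balanced pixel sampler of Eq. (4), reusing the dice_loss sketch above: $N_{1}$ exudate pixels and $N-N_{1}$ background pixels are drawn with replacement before the Dice loss is computed. The fallback for images without exudate pixels is our assumption.

```python
import torch

def rebalanced_dice_loss(pred, target, sample_rate=0.5, eps=1e-6):
    """Dice loss on a re-balanced pixel set: oversample exudate pixels,
    undersample background pixels (Eq. (4)). sample_rate is N1/N."""
    pred, target = pred.reshape(-1), target.reshape(-1)
    pos_idx = torch.nonzero(target > 0.5, as_tuple=False).squeeze(1)
    neg_idx = torch.nonzero(target <= 0.5, as_tuple=False).squeeze(1)
    if pos_idx.numel() == 0:                   # no exudate in this image (our fallback)
        return dice_loss(pred, target, eps)
    n = target.numel()
    n_pos = int(sample_rate * n)
    # draw with replacement: N1 exudate pixels and N - N1 background pixels
    pos_sample = pos_idx[torch.randint(pos_idx.numel(), (n_pos,))]
    neg_sample = neg_idx[torch.randint(neg_idx.numel(), (n - n_pos,))]
    idx = torch.cat([pos_sample, neg_sample])
    return dice_loss(pred[idx], target[idx], eps)
```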

As the segmentation of large hard exudates is much easier than that of small ones, we propose an easy-to-difficult learning strategy. We adaptively modulate the losses of the two branches such that the dual-branch network first learns to handle the easy task and then focuses on the difficult one. At the beginning of training, we multiply the loss of the large hard exudate biased segmentation branch by a large weight $\alpha$ and the loss of the small hard exudate biased segmentation branch by a small weight $1-\alpha$ to enforce the dual-branch network to learn to segment large hard exudates. As the large hard exudate biased segmentation branch becomes increasingly proficient at segmenting large hard exudates, we gradually decrease the weight $\alpha$ and increase $1-\alpha$. In this way, the focus of the dual-branch network gradually shifts to the segmentation of small hard exudates. Formally, we express our proposed dual-sampling modulated Dice loss as

\mathcal{L}_{DSM}=\alpha\cdot\mathcal{L}_{L}(\hat{\mathcal{S}}_{L},\mathcal{S}_{L})+(1-\alpha)\cdot\mathcal{L}_{S}(\hat{\mathcal{S}}_{S},\mathcal{S}_{S})\;, (5)

where $\alpha$ is determined by the training epoch:

\alpha=1-\left(\frac{epoch}{epoch_{max}}\right)^{2}\;. (6)
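Combining the two branch losses, the epoch-dependent modulation of Eq. (5)-(6) can be sketched as follows, reusing the loss sketches above.

```python
def dsm_loss(pred_L, target_L, pred_S, target_S, epoch, epoch_max, sample_rate=0.5):
    """Dual-sampling modulated Dice loss (Eq. (5)-(6)).

    alpha starts near 1 (focus on the large-exudate-biased branch) and decays
    towards 0 (focus on the small-exudate-biased branch) as training proceeds.
    """
    alpha = 1.0 - (epoch / epoch_max) ** 2                        # Eq. (6)
    loss_L = dice_loss(pred_L, target_L)                          # uniform sampler, Eq. (2)
    loss_S = rebalanced_dice_loss(pred_S, target_S, sample_rate)  # re-balanced sampler, Eq. (4)
    return alpha * loss_L + (1.0 - alpha) * loss_S                # Eq. (5)
```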

Inference. In the inference phase, the test fundus image is fed into both branches and two predictions are obtained. As both branches are equally important, we simply take the element-wise average of the two predictions as the final prediction.
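For completeness, a sketch of the inference step, assuming the dual-branch model sketch above; the 0.5 binarisation threshold is our assumption.

```python
import torch

def predict(model, image, threshold=0.5):
    """Inference: average the two branch predictions element-wise and binarise."""
    with torch.no_grad():
        y_large, y_small = model(image)      # same test image through both branches
        prob = 0.5 * (y_large + y_small)     # element-wise average
    return (prob > threshold).float()
```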

Experiments

In this section, we first introduce the data and evaluation metrics for hard exudate segmentation and present the implementation details, then give an ablation analysis and finally compare with the state of the art.

Data and Evaluation Metrics

Data. In our experiments, we validate our method on two public datasets for hard exudate segmentation. The first is the DDR dataset (Li et al. 2019), which was made public for diabetic retinopathy classification, lesion segmentation and detection in 2019. To the best of our knowledge, DDR (Li et al. 2019) is the largest dataset for hard exudate segmentation. Fundus images in DDR (Li et al. 2019) were collected from 147 hospitals covering 23 provinces in China, and their sizes range from $1088\times 1920$ to $3456\times 5184$. The large variation in image size and the large domain gap make classification and segmentation on DDR (Li et al. 2019) more challenging. Specific to lesion segmentation, DDR (Li et al. 2019) provides 757 fundus images with pixel-level annotations, among which 383 images are for training, 149 for validation and 225 for testing. The other dataset is IDRiD (Porwal et al. 2018, 2020), provided by the grand challenge on “Diabetic Retinopathy – Segmentation and Grading” in 2018. It provides 81 fundus images of size $4288\times 2848$ with pixel-level hard exudate annotations, among which 54 images are used for training and 27 for testing. All of these images were acquired at an eye clinic in India.

Evaluation Metrics. We evaluate segmentation methods at both the pixel level and the region level. For pixel-level metrics, following (Li et al. 2019) and (Porwal et al. 2020), Intersection over Union (IoU) and Area Under the Precision-Recall Curve (AUPR) are adopted. We also adopt the F-score, which is the harmonic mean of sensitivity ($SN$) and positive predictive value ($PPV$).

For region-level metrics, we follow (Zhang et al. 2014; Liu et al. 2017; Guo et al. 2020) and re-define true positives (TP), false positives (FP) and false negatives (FN). We denote the set of predicted hard exudate connected components by $\hat{C}=\{\hat{C}_{1},\cdots,\hat{C}_{N}\}$ and the set of ground-truth hard exudate connected components by $C=\{C_{1},\cdots,C_{M}\}$. A pixel is defined as a TP if, and only if, it belongs to

\{\hat{C}\cap C\}\cup\left\{\hat{C}_{i}\,\bigg{|}\,\frac{|\hat{C}_{i}\cap C|}{|\hat{C}_{i}|}>\sigma\right\}\cup\left\{C_{i}\,\bigg{|}\,\frac{|C_{i}\cap\hat{C}|}{|C_{i}|}>\sigma\right\}\;, (7)

where $|\cdot|$ denotes cardinality and $\sigma\in[0,1]$ is the overlap ratio threshold. The larger $\sigma$ is, the more rigorous the condition for a pixel to be treated as a TP. A pixel is considered an FP if, and only if, it belongs to

\{\hat{C}_{i}\,|\,\hat{C}_{i}\cap C=\emptyset\}\cup\left\{\hat{C}_{i}\cap\bar{C}\,\bigg{|}\,\frac{|\hat{C}_{i}\cap C|}{|\hat{C}_{i}|}\leq\sigma\right\}\;, (8)

where $\bar{C}$ is the complement of $C$. A pixel is considered an FN if, and only if, it belongs to

\{C_{i}\,|\,\hat{C}\cap C_{i}=\emptyset\}\cup\left\{C_{i}\cap\bar{\hat{C}}\,\bigg{|}\,\frac{|C_{i}\cap\hat{C}|}{|C_{i}|}\leq\sigma\right\}\;, (9)

where $\bar{\hat{C}}$ is the complement of $\hat{C}$. The remaining pixels are considered TNs. The region-level F-score is defined as

F_{\sigma}=\frac{2\times SN_{\sigma}\times PPV_{\sigma}}{SN_{\sigma}+PPV_{\sigma}}\;, (10)

where $SN_{\sigma}$ is the sensitivity, defined as $SN_{\sigma}=\frac{TP}{TP+FN}$, and $PPV_{\sigma}$ is the positive predictive value, defined as $PPV_{\sigma}=\frac{TP}{TP+FP}$. In our experiments, $F_{0.2}$, $F_{0.35}$, $F_{0.5}$, $F_{0.65}$ and $F_{0.8}$ are reported.
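The region-level metrics of Eq. (7)-(10) can be computed per image with connected-component labelling. The sketch below is our interpretation (overlap ratios are taken between each component and the other mask, mirroring Eq. (8)) and aggregates TP/FP/FN for a single prediction/ground-truth pair.

```python
import numpy as np
from scipy import ndimage

def region_level_fscore(pred, gt, sigma=0.5):
    """Region-level F-score of Eq. (7)-(10) for one binary prediction/ground-truth pair."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp_mask = pred & gt                                   # first term of Eq. (7)

    pred_lab, n_pred = ndimage.label(pred)
    gt_lab, n_gt = ndimage.label(gt)

    fp_mask = np.zeros_like(pred)
    for i in range(1, n_pred + 1):
        comp = pred_lab == i
        if (comp & gt).sum() / comp.sum() > sigma:
            tp_mask |= comp                               # well-overlapping prediction: TP
        else:
            fp_mask |= comp & ~gt                         # Eq. (8): FP outside the ground truth

    fn_mask = np.zeros_like(gt)
    for j in range(1, n_gt + 1):
        comp = gt_lab == j
        if (comp & pred).sum() / comp.sum() > sigma:
            tp_mask |= comp                               # well-covered ground truth: TP
        else:
            fn_mask |= comp & ~pred                       # Eq. (9): missed ground-truth pixels

    tp, fp, fn = tp_mask.sum(), fp_mask.sum(), fn_mask.sum()
    sn = tp / (tp + fn + 1e-8)
    ppv = tp / (tp + fp + 1e-8)
    return 2 * sn * ppv / (sn + ppv + 1e-8)               # Eq. (10)
```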

Implementation Details

Data Preprocessing and Augmentation. For images in DDR (Li et al. 2019), we first crop the bounding box of the field of view; then, for cropped boxes whose short side is less than 1024, we enlarge the short side to the length of the long side via zero padding. Finally, we resize them to $1024\times 1024$. For images in IDRiD (Porwal et al. 2018, 2020), we directly resize images to $1440\times 960$. Following (Guo et al. 2019a, 2020), on both datasets, two tricks are adopted to augment the training data: rotation (90°, 180° and 270°) and flipping (horizontal and vertical).
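A minimal sketch of the DDR preprocessing steps (field-of-view cropping, zero padding to a square, resizing). The threshold-based field-of-view detection and the always-pad-to-square simplification are our assumptions; the paper only pads when the short side is below 1024.

```python
import cv2
import numpy as np

def preprocess_ddr(img, out_size=1024, fov_thresh=10):
    """Crop the field of view, zero-pad to a square, and resize (DDR setting)."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ys, xs = np.where(gray > fov_thresh)                 # rough field-of-view mask (assumed)
    img = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = img.shape[:2]
    side = max(h, w)
    padded = np.zeros((side, side, 3), dtype=img.dtype)  # zero padding to square
    padded[:h, :w] = img
    return cv2.resize(padded, (out_size, out_size))
```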

Experimental Setting. We build three variants of the dual-branch segmentation network based on PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015), called dual-PSPNet, dual-Deeplabv3 and dual-HED respectively, and implement them in the PyTorch framework. We initialise the backbone parameters with models pre-trained on ImageNet (Russakovsky et al. 2015) and the classifier parameters with a zero-mean Gaussian distribution with standard deviation 0.01. SGD is used for parameter optimisation. Hyper-parameters include: initial learning rate (0.03, poly policy with power 0.9), weight decay (0.0005), momentum (0.9), batch size (2) and number of epochs (100 on DDR (Li et al. 2019) and 40 on IDRiD (Porwal et al. 2018)). The sample rate $N_{1}/N$ of the re-balanced pixel sampler is set to 0.5. The models are trained on two GeForce RTX 2080 Ti GPUs.
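The poly learning-rate policy mentioned above is commonly implemented as below; the per-iteration update pattern is a hypothetical usage sketch, not necessarily the authors' exact schedule.

```python
def poly_lr(base_lr, iteration, max_iterations, power=0.9):
    """Poly learning-rate policy: base_lr * (1 - iter / max_iter) ** power."""
    return base_lr * (1.0 - iteration / max_iterations) ** power

# Hypothetical per-iteration usage with the stated hyper-parameters:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.03,
#                             momentum=0.9, weight_decay=0.0005)
# for group in optimizer.param_groups:
#     group["lr"] = poly_lr(0.03, it, max_iter)
```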

Ablation Study

Effect of Sample Rate. We take dual-PSPNet as an example and first explore the effect of the sample rate $N_{1}/N$ of the re-balanced pixel sampler. We set it to 0.25, 0.5, 0.75 and the reversed class frequency, and train dual-PSPNet on the DDR (Li et al. 2019) training set separately for each setting. Results on the test set are reported in Table 2. They show that dual-PSPNet performs best when the sample rate $N_{1}/N$ is set to 0.5. In what follows, unless otherwise stated, $N_{1}/N=0.5$ is the default setting.

Table 2: Effect of the sample rate of the re-balanced sampler in our dual-PSPNet on the DDR (Li et al. 2019) test set. $SN_{pixel}$, $PPV_{pixel}$ and $F_{pixel}$ are pixel-level sensitivity, positive predictive value and F-score.
$N_{1}/N$ $SN_{pixel}$ $PPV_{pixel}$ $F_{pixel}$ $IoU$ $AUPR$
0.25 0.5863 0.5706 0.5784 0.4069 0.5468
0.5 0.6077 0.5582 0.5819 0.4103 0.5587
0.75 0.5947 0.5628 0.5783 0.4068 0.5491
reverse class frequency 0.5673 0.5547 0.5609 0.3898 0.5202
Table 3: Comparison of single-branch PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015) with CBCE loss and Dice loss, and our dual-branch variants with DSM loss, on the DDR (Li et al. 2019) test set.
Method $SN_{pixel}$ $PPV_{pixel}$ $F_{pixel}$ $IoU$ $AUPR$ $F_{\sigma=0.2}$ $F_{\sigma=0.35}$ $F_{\sigma=0.5}$ $F_{\sigma=0.65}$ $F_{\sigma=0.8}$
PSPNet + CBCE 0.7014 0.2934 0.4137 0.2608 0.4906 0.7441 0.5880 0.4917 0.4587 0.4257
PSPNet + Dice 0.4425 0.7372 0.5530 0.3822 0.4730 0.8640 0.7790 0.6954 0.6467 0.5902
Dual PSPNet + DSM 0.6077 0.5582 0.5819 0.4103 0.5587 0.8887 0.8454 0.7543 0.6428 0.6034
Deeplabv3 + CBCE 0.6557 0.3431 0.4505 0.2907 0.4881 0.7968 0.6652 0.5564 0.4856 0.4617
Deeplabv3 + Dice 0.4210 0.7395 0.5366 0.3667 0.4495 0.8310 0.7341 0.6768 0.6334 0.5754
Dual Deeplabv3+DSM 0.5170 0.6569 0.5786 0.4071 0.5701 0.8811 0.8372 0.7542 0.6562 0.6053
HED + CBCE 0.7302 0.2617 0.3853 0.2386 0.5342 0.6823 0.5672 0.4705 0.4317 0.3959
HED + Dice 0.4899 0.7032 0.5775 0.4060 0.4775 0.8451 0.7807 0.7107 0.6776 0.6164
Dual HED + DSM 0.6006 0.5714 0.5856 0.4140 0.5294 0.8809 0.8402 0.7205 0.6419 0.6099
Table 4: Comparison of single-branch PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015) with CBCE loss and Dice loss, and our dual-branch variants with DSM loss, on the IDRiD (Porwal et al. 2018) test set.
Method $SN_{pixel}$ $PPV_{pixel}$ $F_{pixel}$ $IoU$ $AUPR$ $F_{\sigma=0.2}$ $F_{\sigma=0.35}$ $F_{\sigma=0.5}$ $F_{\sigma=0.65}$ $F_{\sigma=0.8}$
PSPNet + CBCE 0.9632 0.3361 0.4983 0.3318 0.7475 0.8254 0.7036 0.6268 0.5463 0.5024
PSPNet + Dice 0.7358 0.7819 0.7582 0.6106 0.7878 0.9388 0.9297 0.9160 0.8924 0.8304
Dual PSPNet + DSM 0.7748 0.7572 0.7659 0.6206 0.7977 0.9469 0.9386 0.9241 0.8946 0.8351
Deeplabv3 + CBCE 0.9492 0.3518 0.5134 0.3453 0.7335 0.8532 0.7410 0.6467 0.5244 0.5190
Deeplabv3 + Dice 0.7527 0.7690 0.7607 0.6139 0.7891 0.9462 0.9344 0.9198 0.8925 0.8320
Dual Deeplabv3+DSM 0.7686 0.7621 0.7653 0.6198 0.7890 0.9478 0.9374 0.9240 0.8973 0.8347
HED + CBCE 0.9019 0.4065 0.5604 0.3893 0.7740 0.9013 0.8219 0.7419 0.5967 0.5664
HED + Dice 0.7211 0.7640 0.7419 0.5897 0.7894 0.9429 0.9325 0.9202 0.8698 0.8038
Dual HED + DSM 0.7630 0.7739 0.7684 0.6239 0.8296 0.9446 0.9348 0.9184 0.8971 0.8353
Refer to caption
Figure 3: Visual comparisons of proposed dual-branch networks with DSM loss to single branch networks with CBCE loss and Dice loss on DDR (Li et al. 2019). The single branch networks are PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015). Pixels in yellow are exudate pixels that are correctly classified. Pixels in red are background pixels that are wrongly classified as exudate pixels. Pixels in green are exudate pixels that are wrongly classified as background pixels. Dashed magenta boxes highlight hard exudates that are misidentified.
Refer to caption
Figure 4: Visual comparisons of proposed dual-branch networks with DSM loss to single branch networks with CBCE loss and Dice loss on IDRiD (Porwal et al. 2018). The single branch networks are PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015). Pixels in yellow are exudate pixels that are correctly classified. Pixels in red are background pixels that are wrongly classified as exudate pixels. Pixels in green are exudate pixels that are wrongly classified as background pixels. Dashed magenta boxes highlight hard exudates that are misidentified.

Ablation Study on Different Losses. To evaluate the dual-branch network, we conduct experiments with three state-of-the-art dense classification methods, PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015), under several settings: single-branch network with CBCE loss, single-branch network with Dice loss (Milletari, Navab, and Ahmadi 2016) and our proposed dual-branch network with DSM loss, on both DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018).

Table 3 reports the results on DDR (Li et al. 2019). Clearly, single-branch networks trained with Dice loss (Milletari, Navab, and Ahmadi 2016) segment hard exudates much better than those trained with CBCE loss. This is because CBCE loss re-weights costs according to the inverse class frequency: it over-weights the cost on hard exudate pixels and under-weights the cost on background pixels, which biases the network in favour of hard exudates. Thus the pixel-level sensitivity is very high while the PPV is very low, and the harmonic mean, i.e. the pixel-level F-score, is low. In contrast, single-branch networks trained with Dice loss improve the pixel-level PPV significantly at the cost of inferior sensitivity; nevertheless, the final pixel-level F-score is greatly improved. Compared with single-branch networks with Dice loss, our dual-branch networks with the proposed DSM loss further improve the performance. In terms of IoU, our dual-branch variants with DSM loss are superior to the single-branch ones. In terms of AUPR, our dual-branch variants outperform the single-branch ones except for dual HED. In terms of region-level metrics, Table 3 shows that dual Deeplabv3 (Chen et al. 2017b) with DSM loss consistently performs better than the single-branch ones. Our dual PSPNet with DSM outperforms single-branch PSPNet (Zhao et al. 2017) with CBCE loss and Dice loss except for $\sigma=0.65$. Our dual HED with DSM outperforms single-branch HED (Xie and Tu 2015) with Dice loss when $\sigma$ is small; when $\sigma$ is larger than 0.5, dual HED with DSM loss achieves an inferior region-level F-score to the single-branch network trained with Dice loss. A possible reason is that dual HED with DSM segments small hard exudates coarsely, so background pixels around small hard exudates are misidentified. When $\sigma$ is small, those misidentified background pixels are treated as true positives, so dual HED with DSM loss achieves higher region-level F-scores than the single-branch network with Dice loss; when $\sigma$ increases, they are counted as false positives, so our region-level F-scores become lower than those of the single-branch model with Dice loss. On IDRiD (Porwal et al. 2018), Table 4 shows that our dual-branch networks with DSM outperform single-branch networks with CBCE and Dice loss, except for dual HED with DSM when $\sigma$ is 0.5.

In terms of the number of parameters, the two branches based on PSPNet (Zhao et al. 2017) and Deeplabv3 (Chen et al. 2017b) contain almost 1.5 times the parameters of the corresponding single branch, while the two branches based on HED (Xie and Tu 2015) contain 1.9 times the parameters of a single branch.

Visual comparisons on DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018) of our dual-branch networks with DSM loss against single-branch networks with CBCE and Dice losses are provided in Fig. 3 and Fig. 4, respectively. We can see that single networks with CBCE loss are prone to misidentifying background pixels around hard exudates, while single networks with Dice loss are prone to misidentifying small hard exudates. Our dual-branch networks work better than the single-branch networks.

Comparison with State-of-the-arts

On DDR (Li et al. 2019), we compare our three dual-branch networks with three deep learning based methods: Deeplabv3+ (Chen et al. 2018), DNL (Yin et al. 2020) and SPNet (Hou et al. 2020). For DNL (Yin et al. 2020) and SPNet (Hou et al. 2020), we provide results with two losses: CBCE and Dice loss. All these methods were originally designed for natural scene image segmentation. Table 5 reports the results. We note that the results of Deeplabv3+ (Chen et al. 2018) in the first row are provided by (Li et al. 2019), and the rest are obtained by fine-tuning on DDR (Li et al. 2019). As shown in Table 5, our dual-branch networks achieve superior performance in terms of both pixel-level and region-level metrics.

On IDRiD (Porwal et al. 2018), we compare our dual-branch networks with five deep learning based methods: DNL (Yin et al. 2020), SPNet (Hou et al. 2020), L-seg (Guo et al. 2019a), LWENet (Guo et al. 2019b) and Bin loss (Guo et al. 2020). For LWENet (Guo et al. 2019b) and L-seg (Guo et al. 2019a), the predicted binary masks are provided by the authors, from which $IoU$, $F_{pixel}$ and the region-level metrics are computed. For the remaining methods, results are obtained by fine-tuning. Table 6 reports the results. We can see that our dual-branch networks achieve superior results to the compared methods.

Table 5: Comparison with other segmentation methods on the DDR (Li et al. 2019) test set. Results of Deeplabv3+ (Chen et al. 2018) are directly borrowed from (Li et al. 2019).
Method $IoU$ $F_{pixel}$ $AUPR$ $F_{\sigma=0.2}$ $F_{\sigma=0.35}$ $F_{\sigma=0.5}$ $F_{\sigma=0.65}$ $F_{\sigma=0.8}$
Deeplabv3+ (Chen et al. 2018) 0.3118 - - - - - - -
DNL (Yin et al. 2020) + CBCE 0.1643 0.2822 0.5125 0.4923 0.3617 0.3308 0.3186 0.2957
DNL (Yin et al. 2020) + Dice 0.3862 0.5572 0.4854 0.8683 0.8116 0.7026 0.5970 0.5611
SPNet (Hou et al. 2020) + CBCE 0.1034 0.1874 0.5034 0.2856 0.2395 0.2264 0.2073 0.1942
SPNet (Hou et al. 2020) + Dice 0.3089 0.4720 0.3748 0.7469 0.7151 0.6291 0.5395 0.4948
Dual PSPNet + DSM (Ours) 0.3822 0.5530 0.4730 0.8640 0.7790 0.6954 0.6467 0.5902
Dual Deeplabv3 + DSM (Ours) 0.4071 0.5786 0.5701 0.8811 0.8372 0.7542 0.6562 0.6053
Dual HED + DSM (Ours) 0.4140 0.5856 0.5294 0.8809 0.8402 0.7205 0.6419 0.6099
Table 6: Comparison with other segmentation methods on the IDRiD (Porwal et al. 2018) test set.
Method $IoU$ $F_{pixel}$ $AUPR$ $F_{\sigma=0.2}$ $F_{\sigma=0.35}$ $F_{\sigma=0.5}$ $F_{\sigma=0.65}$ $F_{\sigma=0.8}$
DNL(Yin et al. 2020) + CBCE 0.2304 0.3745 0.6807 0.6709 0.5753 0.4671 0.3771 0.3760
DNL(Yin et al. 2020) + Dice 0.5715 0.7273 0.6891 0.9413 0.9283 0.9062 0.8698 0.8041
SPNet (Hou et al. 2020) + CBCE 0.2143 0.3530 0.6223 0.6412 0.5384 0.4420 0.3568 0.3552
SPNet (Hou et al. 2020) + Dice 0.4923 0.6598 0.6064 0.9258 0.8883 0.8642 0.8145 0.7147
L-seg (Guo et al. 2019a) 0.5909 0.7429 - 0.9508 0.9417 0.9271 0.8754 0.8038
LWENet (Guo et al. 2019b) 0.5226 0.6865 - 0.9191 0.8942 0.8582 0.8179 0.7160
Bin loss (Guo et al. 2020) 0.3582 0.5275 0.7047 0.9070 0.8056 0.6890 0.5400 0.5319
Dual PSPNet + DSM (Ours) 0.6206 0.7659 0.7977 0.9469 0.9386 0.9241 0.8946 0.8351
Dual Deeplabv3 + DSM (Ours) 0.6198 0.7653 0.7890 0.9478 0.9374 0.9240 0.8973 0.8347
Dual HED + DSM (Ours) 0.6239 0.7684 0.8296 0.9446 0.9348 0.9184 0.8971 0.8353

Visual comparisons between previous methods and ours on DDR (Li et al. 2019) and IDRiD (Porwal et al. 2018) are also performed. Fig. 5 shows the segmentation results of our dual-branch networks based on PSPNet (Zhao et al. 2017), Deeplabv3 (Chen et al. 2017b) and HED (Xie and Tu 2015), and of DNL (Yin et al. 2020) and SPNet (Hou et al. 2020) with CBCE loss and Dice loss. We can see that (1) DNL (Yin et al. 2020) and SPNet (Hou et al. 2020) with CBCE loss are hard exudate biased and prone to misidentifying background pixels as hard exudate pixels; (2) SPNet (Hou et al. 2020) with Dice loss is background biased and prone to misidentifying hard exudate pixels as background pixels; (3) our dual-branch networks perform better than both. Fig. 6 shows the segmentation results of ours, DNL (Yin et al. 2020), SPNet (Hou et al. 2020), L-seg (Guo et al. 2019a), LWENet (Guo et al. 2019b) and Bin loss (Guo et al. 2020). Similarly, we can see that (1) DNL (Yin et al. 2020) and SPNet (Hou et al. 2020) with CBCE loss are hard exudate biased, while with Dice loss they are background biased; (2) L-seg (Guo et al. 2019a) is background biased, while LWENet (Guo et al. 2019b) and Bin loss (Guo et al. 2020) are hard exudate biased; (3) ours performs better than the compared methods.

Refer to caption
Figure 5: Visual comparisons with DNL (Yin et al. 2020), SPNet (Hou et al. 2020) with CBCE loss and Dice loss on DDR (Li et al. 2019). Pixels in yellow are exudate pixels that are correctly classified. Pixels in red are background pixels that are wrongly classified as exudate pixels. Pixels in green are exudate pixels that are wrongly classified as background pixels.
Refer to caption
Figure 6: Visual comparisons with DNL (Yin et al. 2020), SPNet (Hou et al. 2020) with CBCE loss and Dice loss, L-seg (Guo et al. 2019a), LWENet (Guo et al. 2019b) and Bin loss (Guo et al. 2020) on IDRiD (Porwal et al. 2018). Pixels in yellow are exudate pixels that are correctly classified. Pixels in red are background pixels that are wrongly classified as exudate pixels. Pixels in green are exudate pixels that are wrongly classified as background pixels.

Conclusion

In this paper, we propose a dual-branch network to address the issues of extreme class imbalance and enormous variation in the size of target regions in the segmentation of hard exudates from colour fundus images. Our dual-branch network uses two branches with partial weight sharing to learn representations and classifiers for hard exudates of different sizes. It is trained with the proposed dual-sampling modulated Dice loss, which enables the network to first learn to segment large hard exudates and then small hard exudates. Experimental results on two public datasets for hard exudate segmentation demonstrate that our dual-branch network outperforms existing segmentation networks trained with either CBCE loss or Dice loss.

References

  • Chen et al. (2017a) Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; and Yuille, A. L. 2017a. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 40(4): 834–848.
  • Chen et al. (2017b) Chen, L.-C.; Papandreou, G.; Schroff, F.; and Adam, H. 2017b. Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 .
  • Chen et al. (2018) Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; and Adam, H. 2018. Encoder-decoder with atrous separable convolution for semantic image segmentation. In ECCV, 801–818.
  • Guo et al. (2019a) Guo, S.; Li, T.; Kang, H.; Li, N.; Zhang, Y.; and Wang, K. 2019a. L-Seg: An end-to-end unified framework for multi-lesion segmentation of fundus images. Neurocomputing 349: 52–63.
  • Guo et al. (2019b) Guo, S.; Li, T.; Wang, K.; Zhang, C.; and Kang, H. 2019b. A Lightweight Neural Network for Hard Exudate Segmentation of Fundus Image. In International Conference on Artificial Neural Networks, 189–199. Springer.
  • Guo et al. (2020) Guo, S.; Wang, K.; Kang, H.; Liu, T.; Gao, Y.; and Li, T. 2020. Bin loss for hard exudates segmentation in fundus images. Neurocomputing 392: 314–324.
  • He et al. (2016) He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In CVPR, 770–778.
  • Hou et al. (2020) Hou, Q.; Zhang, L.; Cheng, M.-M.; and Feng, J. 2020. Strip pooling: Rethinking spatial pooling for scene parsing. In CVPR, 4003–4012.
  • Kaur and Mittal (2018) Kaur, J.; and Mittal, D. 2018. A generalized method for the segmentation of exudates from pathological retinal fundus images. Biocybernetics and Biomedical Engineering 38(1): 27–53.
  • Khojasteh, Aliahmad, and Kumar (2019) Khojasteh, P.; Aliahmad, B.; and Kumar, D. K. 2019. A novel color space of fundus images for automatic exudates detection. Biomedical Signal Processing and Control 49: 240–249.
  • Khojasteh et al. (2019) Khojasteh, P.; Júnior, L. A. P.; Carvalho, T.; Rezende, E.; Aliahmad, B.; Papa, J. P.; and Kumar, D. K. 2019. Exudate detection in fundus images using deeply-learnable features. Computers in biology and medicine 104: 62–69.
  • Klein et al. (1987) Klein, R.; Klein, B. E.; Moss, S. E.; Davis, M. D.; and DeMets, D. L. 1987. The Wisconsin Epidemiologic Study of Diabetic Retinopathy: VII. Diabetic nonproliferative retinal lesions. Ophthalmology 94(11): 1389–1400.
  • Kusakunniran et al. (2018) Kusakunniran, W.; Wu, Q.; Ritthipravat, P.; and Zhang, J. 2018. Hard exudates segmentation based on learned initial seeds and iterative graph cut. Computer methods and programs in biomedicine 158: 173–183.
  • Li et al. (2019) Li, T.; Gao, Y.; Wang, K.; Guo, S.; Liu, H.; and Kang, H. 2019. Diagnostic Assessment of Deep Learning Algorithms for Diabetic Retinopathy Screening. Information Sciences 501: 511 – 522. ISSN 0020-0255.
  • Liu et al. (2017) Liu, Q.; Zou, B.; Chen, J.; Ke, W.; Yue, K.; Chen, Z.; and Zhao, G. 2017. A location-to-segmentation strategy for automatic exudate segmentation in colour retinal fundus images. Computerized medical imaging and graphics 55: 78–86.
  • Milletari, Navab, and Ahmadi (2016) Milletari, F.; Navab, N.; and Ahmadi, S.-A. 2016. V-net: Fully convolutional neural networks for volumetric medical image segmentation. In 2016 fourth international conference on 3D vision (3DV), 565–571. IEEE.
  • Mo, Zhang, and Feng (2018) Mo, J.; Zhang, L.; and Feng, Y. 2018. Exudate-based diabetic macular edema recognition in retinal images using cascaded deep residual networks. Neurocomputing 290: 161–171.
  • Ouyang et al. (2020) Ouyang, X.; Huo, J.; Xia, L.; Shan, F.; Liu, J.; Mo, Z.; Yan, F.; Ding, Z.; Yang, Q.; Song, B.; et al. 2020. Dual-Sampling Attention Network for Diagnosis of COVID-19 from Community Acquired Pneumonia. IEEE Transactions on Medical Imaging .
  • Pereira, Gonçalves, and Ferreira (2015) Pereira, C.; Gonçalves, L.; and Ferreira, M. 2015. Exudate segmentation in fundus images using an ant colony optimization approach. Information Sciences 296: 14–24.
  • Porwal et al. (2018) Porwal, P.; Pachade, S.; Kamble, R.; Kokare, M.; Deshmukh, G.; Sahasrabuddhe, V.; and Meriaudeau, F. 2018. Indian diabetic retinopathy image dataset (idrid): A database for diabetic retinopathy screening research. Data 3(3): 25.
  • Porwal et al. (2020) Porwal, P.; Pachade, S.; Kokare, M.; Deshmukh, G.; Son, J.; Bae, W.; Liu, L.; Wang, J.; Liu, X.; Gao, L.; et al. 2020. IDRiD: Diabetic Retinopathy–Segmentation and Grading Challenge. Medical image analysis 59: 101561.
  • Ravishankar, Jain, and Mittal (2009) Ravishankar, S.; Jain, A.; and Mittal, A. 2009. Automated feature extraction for early detection of diabetic retinopathy in fundus images. In CVPR, 210–217. IEEE.
  • Russakovsky et al. (2015) Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. 2015. Imagenet large scale visual recognition challenge. International journal of computer vision 115(3): 211–252.
  • Sasaki et al. (2013) Sasaki, M.; Kawasaki, R.; Noonan, J. E.; Wong, T. Y.; Lamoureux, E.; and Wang, J. J. 2013. Quantitative measurement of hard exudates in patients with diabetes and their associations with serum lipid levels. Investigative ophthalmology & visual science 54(8): 5544–5550.
  • Simonyan and Zisserman (2015) Simonyan, K.; and Zisserman, A. 2015. Very Deep Convolutional Networks for Large-scale Image Recognition. In International Conference on Learning Representations.
  • Sopharak et al. (2008) Sopharak, A.; Uyyanonvara, B.; Barman, S.; and Williamson, T. H. 2008. Automatic detection of diabetic retinopathy exudates from non-dilated retinal images using mathematical morphology methods. Computerized medical imaging and graphics 32(8): 720–727.
  • Walter et al. (2002) Walter, T.; Klein, J.-C.; Massin, P.; and Erginay, A. 2002. A contribution of image processing to the diagnosis of diabetic retinopathy-detection of exudates in color fundus images of the human retina. IEEE transactions on medical imaging 21(10): 1236–1243.
  • Wang et al. (2020) Wang, H.; Yuan, G.; Zhao, X.; Peng, L.; Wang, Z.; He, Y.; Qu, C.; and Peng, Z. 2020. Hard exudate detection based on deep model learned information and multi-feature joint representation for diabetic retinopathy screening. Computer Methods and Programs in Biomedicine 191: 105398.
  • Welfer, Scharcanski, and Marinho (2010) Welfer, D.; Scharcanski, J.; and Marinho, D. R. 2010. A coarse-to-fine strategy for automatically detecting exudates in color eye fundus images. computerized medical imaging and graphics 34(3): 228–235.
  • Xie and Tu (2015) Xie, S.; and Tu, Z. 2015. Holistically-nested edge detection. In ICCV, 1395–1403.
  • Yin et al. (2020) Yin, M.; Yao, Z.; Cao, Y.; Li, X.; Zhang, Z.; Lin, S.; and Hu, H. 2020. Disentangled non-local neural networks. In ECCV, 191–207. Springer.
  • Zhang et al. (2014) Zhang, X.; Thibault, G.; Decencière, E.; Marcotegui, B.; Laÿ, B.; Danno, R.; Cazuguel, G.; Quellec, G.; Lamard, M.; Massin, P.; et al. 2014. Exudate detection in color retinal images for mass screening of diabetic retinopathy. Medical image analysis 18(7): 1026–1043.
  • Zhao et al. (2017) Zhao, H.; Shi, J.; Qi, X.; Wang, X.; and Jia, J. 2017. Pyramid scene parsing network. In CVPR, 2881–2890.
  • Zhou et al. (2020) Zhou, B.; Cui, Q.; Wei, X.-S.; and Chen, Z.-M. 2020. BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition. In CVPR, 9719–9728.