A New Approach for Evaluating and Improving the Performance of Segmentation Algorithms on Hard-to-Detect Blood Vessels

João P. Parella Department of Computer Science, Federal University of São Carlos, São Carlos, SP, Brazil Matheus V. da Silva Department of Computer Science, Federal University of São Carlos, São Carlos, SP, Brazil Cesar H. Comin [email protected] Department of Computer Science, Federal University of São Carlos, São Carlos, SP, Brazil

Abstract

Background and Objective: Many studies regarding the vasculature of biological tissues involve the segmentation of the blood vessels in a sample followed by the creation of a graph structure to model the vasculature. The graph is then used to extract relevant vascular properties. Small segmentation errors can lead to largely distinct connectivity patterns and a high degree of variability of the extracted properties. Nevertheless, global metrics such as Dice, precision, and recall are commonly applied for measuring the performance of blood vessel segmentation algorithms. These metrics might conceal important information about the accuracy at specific regions of a sample. To tackle this issue, we propose a local vessel salience (LVS) index to quantify the expected difficulty in segmenting specific blood vessel segments. Methods: The LVS index is calculated for each vessel pixel by comparing the local intensity of the vessel with the image background around the pixel. The index is then used for defining a new accuracy metric called low-salience recall (LSRecall), which quantifies the performance of segmentation algorithms on blood vessel segments having low salience. The perspective provided by the LVS index is used to define a data augmentation procedure that can be used to improve the segmentation performance of convolutional neural networks. Results: We show that segmentation algorithms having high Dice and recall values can display very low LSRecall values, which reveals systematic errors of these algorithms for vessels having low salience. The proposed data augmentation procedure is able to improve the LSRecall of some samples by as much as 25%. Conclusions: The developed methodology opens up new possibilities for comparing the performance of segmentation algorithms regarding hard-to-detect blood vessels as well as their capabilities for vascular topology preservation.

Blood vessel segmentation, Salience metric, Segmentation performance, Data augmentation

^†^†preprint: 1

I Introduction

Identifying blood vessels in digital images is important for diagnosis Mookiah et al. (2021); Eladawi et al. (2018); Sangeethaa and Uma Maheswari (2018); Li et al. (2021); Almotiri et al. (2018); Nair and Muthuvel (2020) as well as for obtaining accurate measurements for supporting current studies on the vascular system Roda et al. (2021); Ouellette et al. (2020); Wong et al. (2019); Dolati et al. (2015). Blood vessels can have very different shapes and appearances, and they form a complex interconnected structure that usually spans a whole tissue sample. It is often desirable to identify whole vascular structures for downstream analysis such as branching point characterization, tortuosity quantification as well as blood flow simulation Li et al. (2022); Ouellette et al. (2020); Fraz et al. (2012). The typical pipeline involves the segmentation of the vasculature followed by the calculation of the medial axes of the segmented vessels and the creation of a graph representing the vasculature Paetzold et al. (2021); Freitas-Andrade et al. (2022).

Many methodologies have been developed for blood vessel identification on digital images Mookiah et al. (2021); Cervantes et al. (2023). Recent approaches use Convolutional Neural Networks (CNN) optimized on manually annotated samples to segment all blood vessels in a dataset Chen et al. (2021). Common to most of these approaches is the evaluation of the results using the Dice score, which quantifies the overall quality of the segmentation. Boundary distance metrics are also sometimes used Schaap et al. (2009); van Walsum et al. (2008). The recently developed clDice metric has also been used by many studies to quantify the degree of topology preservation of segmentation methods Shit et al. (2021).

Given the complex changes in shape and appearance that the vasculature can display on a given sample as well as on different samples of a dataset, global metrics such as the Dice and clDice measures usually do not tell the whole story about the quality of the segmentation. For instance, it is common to observe significant contrast variations between the vessels and the background, as shown in Figure 1. If the objective is to construct a graph representing the vasculature, small discontinuities along segmented blood vessels at low-contrast regions might significantly change the topology of the system. Such discontinuities might have little impact on the value of global quality metrics, while other segmentation errors, such as those related to the vessel caliber, which do not change the topology, might dominate the metrics.

Refer to caption — Figure 1: Two fluorescence microscopy images of the mouse cortex showing blood vessels with varying appearances Freitas-Andrade *et al.* (2022).

We propose a simple methodology to calculate the local vessel salience (LVS) index for each pixel belonging to the vasculature¹¹1The code of the developed methodologies is available at https://github.com/jpparella/vessel_algorithm. The index quantifies the expected difficulty in segmenting a given blood vessel region. The calculation of the index involves a careful definition of sampling regions to compare the local intensity of the vessel and its immediate background. The LVS index is then used to define a metric quantifying the accuracy of segmentation algorithms at low-salience regions, which we call low-salience recall (LSRecall). The metric corresponds to the recall of an algorithm at difficult regions to segment.

The LSRecall metric can provide important information regarding segmentation accuracy for downstream tasks involving blood vessel connectivity analyses. For instance, we show that samples having similar Dice and recall scores for a segmentation algorithm can have largely distinct LSRecall. We also show that CNNs trained using different techniques can result in similar Dice and recall but different LSRecall values.

Furthermore, we also propose a salience augmentation technique that randomly reduces the LVS index of blood vessel segments when training CNNs, creating samples that are more challenging to segment. The technique also allows creating discontinuities on blood vessel segments, that is, regions with LVS index equal to zero. An adjustable parameter can be used to set the length of the discontinuity. We show that the proposed augmentation can significantly improve the LSRecall of neural networks.

The remainder of the text is organized as follows. Previous studies related to the proposed methodology are discussed in Section II. The definition of the LVS index is provided in Section III.1. The LSRecall is presented in Section III.2 and the proposed augmentation procedure is defined in Section III.3. Section IV contains the results of the respective methodologies and Section V presents the conclusions of the study.

II Related Works

Segmenting interconnected curvilinear structures while preserving the topology of the overall system is a challenging task. Many approaches have been developed for extracting road networks from aerial images Mosinska et al. (2018); Citraro et al. (2020); Vasu et al. (2020), identifying neurons and respective dendrites and axons Hu et al. (2019); Funke et al. (2018); Scheffer et al. (2020), and for reconstructing vascular networks Gupta et al. (2024); Hu et al. (2021). We do not aim to define a new segmentation or graph creation algorithm, but to develop a new metric and augmentation technique for improving current blood vessel segmentation methods.

Regarding segmentation quality, many metrics have been developed to assess blood vessel identification performance Li et al. (2022); Moccia et al. (2018). Besides the usual Dice and Jaccard scores, precision, recall and related pixel-wise metrics are usually reported. The Hausdorff or average curve/surface distance are sometimes used Schaap et al. (2009); Moccia et al. (2018). Metrics that are more related to connectivity preservation, but are occasionally used, include the percentage of correctly identified branches, total vessel length, and segment fragmentation Li et al. (2022); Moccia et al. (2018). Betti numbers, from algebraic topology, have also been used Menten et al. (2023). The recently introduced clDice metric Shit et al. (2021) aims at quantifying connectivity preservation by calculating the degree that the skeleton of the ground truth mask lies inside the predicted segmentation, and vice-versa. The metric can be applied not only for quantifying performance but can also be directly optimized as a loss function.

These metrics focus on quantifying the global accuracy of the results, usually by comparing the output of an algorithm with a given reference annotation. However, given the complexity of vascular structures, it is common to have blood vessels that are easy to segment while others that are much more challenging. For instance, on eye fundus images it is common for methods with high segmentation accuracy to underperform on thin vessels Mookiah et al. (2021). Thin vessels tend to have smaller salience from the background due to the finite resolution of the capture device. Similarly, blood vessels captured using fluorescence microscopy can have large changes in contrast with respect to the background (see Figure 1).

Recent works Reinke et al. (2021); Maier-Hein et al. (2024) have drawn attention to the flaws and biases of using inappropriate standard performance metrics. The domain where the metrics are being applied as well as the kind of object that is being detected should be considered. Please refer to Reinke et al. (2021) for an extensive discussion about common metric biases. We argue that changes in blood vessel salience in a sample or among different samples should be taken into account when evaluating segmentation performance. If 90% of the vessel segments in a sample are easy to identify, one can argue that a method reaching a Dice score of 0.9 is just doing the bare minimum.

Regarding data augmentation for neural network training, to our knowledge, no method has been developed specifically for blood vessels. It is common to use standard augmentation approaches such as noise addition, blurring, random crops, and color jittering. An augmentation that is specifically useful for blood vessels is the elastic transformation da Silva et al. (2022); Lin et al. (2022). It allows locally changing the shape of the blood vessels, which usually become more tortuous due to random displacements. We develop an augmentation approach that is specifically tailored for curvilinear structures that have no significant local background variations.

III Methodology

III.1 Local Vessel Salience

We aim to characterize the local salience of blood vessel segments for each pixel belonging to the vasculature. We assume that each image has a respective ground truth manual annotation mask indicating the pixels that belong to the vessel. The first step of the methodology is to represent the ground truth annotation as a graph, where nodes indicate bifurcations and terminations of the vasculature and edges represent vessel segments. For this task, we use the Pyvane framework²²2https://github.com/chcomin/pyvane. Briefly, the framework first applies a skeletonization algorithm to calculate the medial axes of the vessels. Next, skeleton pixels having one neighboring pixel or three or more neighboring pixels become a node in the graph. A pruning procedure is applied to remove small spurious branches caused by the skeletonization algorithm. A merging strategy is also applied to merge neighboring graph nodes into a single node. The final graph contains the connectivity of the vasculature as well as the positions of the nodes. More importantly for our method, each edge of the graph contains the pixels of the medial axis of the respective blood vessel segment. We henceforth use MAS to refer to the medial axis segment of a vessel. Please refer to Freitas-Andrade et al. (2022) for a complete description of the method.

The aim is to calculate the salience of the vessel at each MAS pixel and expand the calculated values to nearby pixels. The salience is calculated as the relative difference in intensity between the vessel and the background around the pixel. More details are given below. For the calculation, it is first necessary to define the cross-section of the vessel at the position of the pixel. A possible approach is to define a normal vector with respect to the MAS. We found that this approach is unstable at sharp changes of direction of the vessel as well as close to bifurcations and terminations. Thus, a simple and robust method, that is guaranteed to find relevant vessel and background values for all MAS pixels, was developed.

First, the borders of the vessels are obtained using the ground truth mask. A parametric contour tracing algorithm is used Suzuki et al. (1985). Given a MAS pixel $p$ , the closest vessel contour pixel $p_{c1}$ is identified using the Euclidean distance. A vector representing the direction from $p$ to $p_{c1}$ is calculated as $v_{c1}=p_{c1}-p$ . Next, the second closest contour pixel $p_{ci}$ is identified and a respective vector $v_{ci}=p_{ci}-p$ is defined. The dot product between vectors $v_{c1}$ and $v_{ci}$ is calculated. If it is negative, it means that the vectors have opposite directions, and thus two opposite contour points were found. A positive dot product means that $p_{ci}$ is at the same side of the contour as $p_{c1}$ , and thus should be discarded. In such a case, the third closest contour pixel is identified and the dot product calculated, and so on until a contour pixel having a negative dot product with $v_{c1}$ is found. The final opposite pixel is represented as $p_{c2}$ .

Given $p$ , $p_{c1}$ and $p_{c2}$ , two straight lines representing a cross-section of the vessel can be defined. The first line, $l_{1}$ , goes from $p_{c1}$ to $p$ while the second line, $l_{2}$ , goes from $p$ to $p_{c2}$ . Figure 2 shows an example of a MAS pixel and the respective shortest lines to nearby contour pixels. The pixels belonging to both lines are henceforth represented as $S_{v}$ . The values of these pixels represent the local vessel signal that will be compared with the background.

Points $p_{c1}$ and $p_{c2}$ are used to identify nearby background pixels. All pixels with a Euclidean distance smaller than or equal to $r_{b}$ from $p_{c1}$ or from $p_{c2}$ are identified. The ground truth mask is used to discard pixels belonging to the blood vessel. Figure 2 shows an illustration of the background pixels identified at this step. These pixels are represented as $S_{b}$ .

Pixels $S_{v}$ and $S_{b}$ tend to form a dumbbell-like shape representing local intensities associated with $p$ . $r_{b}$ is a free parameter of the method defining the scale where background pixels will be searched, but using a value leading to a similar number of pixels in $S_{v}$ and $S_{b}$ tends to work well in practice.

Different approaches can be used to calculate the local vessel salience from pixels $S_{v}$ and $S_{b}$ . We chose a simple calculation to define a value that can be easily interpreted. First, the average intensity values $I_{v}$ and $I_{b}$ of, respectively, $S_{v}$ and $S_{b}$ are calculated. Next, the relative difference in intensity is obtained as

\Delta I=\frac{I_{v}-I_{b}}{max(I_{v},I_{b})}.

(1)

We found that $\Delta I$ can have sharp changes in value, which hinders the definition of continuous regions for the recall measure presented in the next section. Thus, it is beneficial to smooth the values along the MAS. Since the pixels of the MAS are ordered parametrically along the segment, the final LVS index of a pixel $p_{i}$ is defined as

\mathrm{LVS}(p_{i})=\frac{1}{2k+1}\sum_{j=i-k}^{j=i+k}\Delta I(p_{j})

(2)

where $\Delta I(p_{j})$ represents the local difference for pixel $p_{j}$ of the MAS. Parameter $k$ sets the degree of smoothing.

The final step of the calculation is to expand the values calculated for the MAS to other vessel pixels. For each vessel pixel, the closest MAS pixel is identified, and the respective LVS index is attributed to the pixel. The LVS index of each vessel pixel is stored as an image so that it can be easily used on downstream calculations.

III.2 Low-Salience Recall

Given the LVS index of each vessel pixel, it is possible to identify regions with different vessel salience. Pixels with large LVS are easier to segment. Pixels with low LVS are more challenging. But notice that it is possible to define a third class of pixels having zero or negative LVS. These pixels represent regions of vessel discontinuities in the image. Such pixels are very challenging to segment. Interestingly, recent results show that humans tend to have a strong shape bias while CNNs tend to have a strong texture bias Geirhos et al. (2019). Thus, it is expected that humans perform better than CNNs at identifying vessel discontinuities, which can be located using the shape prior that two nearby and aligned, but discontinuous, segments are likely the same segment. This means that ground truth annotations might have discontinuities correctly annotated as being blood vessels, but CNNs will struggle to identify such regions.

Given the result of a segmentation algorithm to be compared with a ground truth annotation, we define as $G$ the set of ground truth vessel pixels and as $R$ the set of vessel pixels identified by the algorithm. From these two sets the number of true positive (TP) and false negative (FN) pixels can be calculated as

TP=|G\cap R|

(3)

FN=|G-R|

(4)

Then, the traditional recall metric can be defined as

\mathrm{Recall}=\frac{TP}{TP+FN}.

(5)

Since the number of annotated blood vessel pixels is $P=TP+FN$ , the recall metric quantifies the fraction of blood vessel pixels that were successfully identified by the algorithm.

The LSRecall is defined in a similar way to the usual recall metric. Given the LVS index of all vessel pixels, a threshold value $t$ is defined, and only pixels having an LVS index equal to or smaller than the threshold are considered. This defines a binary image containing challenging vessel pixels that can be used in a usual recall calculation. We represent as $G_{t}$ the set of such pixels. Notice that $G_{t}$ is a subset of $G$ . Then, the respective metrics are calculated as

TP_{t}=|G_{t}\cap R|

(6)

FN_{t}=|G_{t}-R|

(7)

The respective recall metric can be defined as

\mathrm{LSRecall}=\frac{TP_{t}}{TP_{t}+FN_{t}}.

(8)

The LSRecall quantifies how successful the algorithm was at identifying challenging blood vessel pixels. Notice that $G_{t}$ can be small. Thus, large changes in LSRecall might have respective insignificant changes on the traditional recall metric. The threshold $t$ can be set to a specific value to quantify if a segmentation method is identifying pre-defined challenging regions or a family of LSRecall values can be calculated using different values of $t$ . When $t=1$ the LSRecall becomes the usual recall metric.

It is relevant to point out that, similarly to the usual recall metric, the LSRecall quantifies one aspect of an algorithm’s performance related to false negatives. Other metrics can be used to quantify the amount of false positives in the result.

III.3 Augmenting Blood Vessel Salience

It will be shown in the results section that segmentation methods reaching recall values of $\approx 0.9$ can have respective LSRecall scores as low as $0.2$ . Thus, it is clear that, as expected, many methods tend to struggle in regions having low salience. Thus, we also propose a data augmentation methodology to improve the robustness of methods regarding blood vessels with low salience or discontinuities. In general terms, given the MAS pixels of a blood vessel, as defined in the previous section, the salience of a subset of the pixels is systematically reduced, with an optional discontinuity region added. That is, from a given initial point of a blood vessel segment, the intensity of the vessel is slowly reduced along the segment until it becomes equal to the local background intensity. The details of the method follow below.

Two parameters of the method define the length of the salience modification for a given blood vessel segment. Parameter $l$ sets the total salience modification length, that is, only a region of length $l$ of a segment is modified. Parameter $l_{d}<l$ sets the length of the discontinuity region. Thus, a region of length $l_{d}$ will have pixels with intensity identical to the local background intensity of the segment. We note that $l$ and $l_{d}$ as well as all distances of the procedure described below are calculated along the MAS, that is, they are calculated as path lengths, not as shortest distances between two points.

Given the MAS pixels of a specific blood vessel, a reference pixel $p_{c}$ is randomly selected. This pixel will be the center of the salience reduction region. Thus, the random selection only considers pixels having a path-length distance larger than $l/2$ from both endpoints of the segment. From $p_{c}$ , the two MAS pixels that are a distance $l/2$ from $p_{c}$ are selected. These pixels, represented as $p_{1}$ and $p_{2}$ , define the beginning and end of the salience reduction region. Two other pixels, $p_{d1}$ and $p_{d2}$ , with path-length distance $l_{d}/2$ from $p_{c}$ are also identified. These pixels define the beginning and end of the discontinuity. Figure 3(a) illustrates the positions and lengths defined.

An intensity preservation factor $f_{1}$ is defined from $p_{1}$ to $p_{d1}$ as

f_{1}(p_{i})=\frac{d(p_{i},p_{d1})}{d(p_{1},p_{d1})}

(9)

where $p_{i}$ represents a MAS pixel between $p_{1}$ and $p_{d1}$ and $d(p_{x},p_{y})$ represents the path-length distance between two points. Intuitively, the preservation factor is one for pixel $p_{1}$ and linearly decreases along the segment until reaching a value of zero at $p_{d1}$ . A preservation factor $f_{2}$ is similarly defined for pixels between $p_{2}$ and $p_{d2}$ . Between pixels $p_{d1}$ and $p_{d2}$ , a preservation factor of zero is set. A final preservation factor $f$ combining all calculated values is then defined. Figure 3(b) shows the preservation factor for the MAS illustrated in Figure 3(a).

The preservation factor is calculated only for the MAS pixels. Thus, they need to be expanded to the remaining vessel pixels. For each vessel pixel, the closest MAS pixel is identified (using the Euclidean distance) and the respective factor is associated with the vessel pixel. The final result is a preservation factor for every vessel pixel between points $p_{1}$ and $p_{2}$ . A preservation factor of one is set to all other pixels of the vessel when necessary.

Next, the intensities of the vessel segment are transformed. The minimum value needs to be similar to the local background intensity of the sample. This is done by taking the average value of all background pixels that are inside a rectangular region defined by the upper-left point $(p_{c}(x)-l,p_{c}(y)-l)$ and the lower-right point $(p_{c}(x)+l,p_{c}(y)+l)$ on image coordinates. $(p_{c}(x),p_{c}(y))$ are the coordinates of point $p_{c}$ . The calculated value is represented as $I_{m}$ . Next, the intensity $I(p)$ of each vessel pixel is transformed as

\hat{I}(p)=f(I(p)-I_{m})+I_{m}

(10)

Thus, pixels close to $p_{1}$ and $p_{2}$ have little change in intensity, while pixels close to $p_{d1}$ and $p_{d2}$ become very similar to the background. Pixels between $p_{d1}$ and $p_{d2}$ become equal to $I_{m}$ since the preservation factor is zero in this region.

A specific procedure was developed to make the discontinuous region as similar as possible to the background. Vessel pixels having a preservation factor of zero define a continuous image region $S_{a}$ that must have the appearance of the image background. A procedure was developed to replace the intensities at $S_{a}$ with a background region having the same shape as $S_{a}$ . This was done by inverting the ground truth annotation and applying a binary erosion using $S_{a}$ as a structuring element with central point $p_{a}$ . The remaining pixels are all centers of candidate background regions that can replace the intensities in $S_{a}$ . However, the new background region might be significantly distinct from the local background of the discontinuous region $S_{a}$ . Thus, a criterion was used to find good background candidates for replacement.

Given $S_{a}$ , the outer and inner contours of the set are obtained, as shown in Figure 4. The outer contour $C_{o}$ is defined as all background pixels having at least one 8-neighbor belonging to $S_{a}$ . The inner contour $C_{i}$ is defined as all pixels in $S_{a}$ having at least one 8-neighbor in $C_{o}$ . Thus, $C_{o}$ and $C_{i}$ define the pixels at the interface between the vessel and the background at the region where a vessel discontinuity is desired. The intensities $I_{o}$ of the outer contour are obtained and stored. The positions in $C_{i}$ are translated to a candidate background region found with the procedure described above, and the respective intensities $I_{i}$ are calculated. The average absolute difference between intensities of the outer and translated inner contour is obtained. The procedure is repeated to all candidate background regions.

A randomly selected background region having an absolute intensity difference smaller than a given threshold $t_{b}$ is identified, and the intensities are copied to the respective pixels of $S_{a}$ . Thus, vessel pixels between $p_{d1}$ and $p_{d2}$ acquire new intensities copied from a background region having similar intensity to the local background intensity of pixels between $p_{d1}$ and $p_{d2}$ . Threshold $t_{b}$ sets the largest acceptable difference between the local background and the background region selected to replace $S_{a}$ .

For augmenting the images using the procedure described, the values $l$ and $l_{d}$ are randomly selected for each segment from a range of possible values that are parameters of the augmentation. The number of vessel segments to be augmented in an image is also randomly selected from a pre-specified range.

IV Results and Discussion

IV.1 Dataset

A dataset containing 2641 fluorescence microscopy images of the mouse brain vasculature is used to analyze the potential of the proposed methodologies. The samples were acquired under different experimental conditions and from different animals. Details about the image acquisition procedure and the characteristics of the samples can be found in Freitas-Andrade et al. (2022); da Silva et al. (2024). The dataset is interesting because the blood vessels contained in the samples have many different characteristics. Interesting variations include samples with different amounts of noise, contrast, and vessel caliber as well as samples containing imaging artifacts and significant intensity variation along blood vessel segments. Two examples of the dataset are shown in Figure 1.

IV.2 Local Vessel Salience Analysis

To investigate if the LVS index can successfully identify challenging regions to segment, the LVS indices of all blood vessels in the dataset were calculated. For all experiments, a value of $r_{b}=4$ was used for identifying background pixels for the LVS calculation, and $k=15$ was used for smoothing the values (Equation 2). Figure 5 shows example results for six images. The index tends to correlate with the intensity of the blood vessels, but in regions where the intensity of the background is higher, or the background is noisier, the LVS index tends to result in smaller values, as expected. The Pearson correlation coefficient between the intensities of the pixels and respective LVS values was calculated for each sample and averaged among all samples of the dataset. A value of 0.43 was obtained, which indicates that the two quantities contain distinct information about the vessels. The regions indicated in red on the last row of plots in Figure 5 are challenging regions to segment identified using the index.

Different threshold values can be used to identify specific hard-to-segment blood vessels to focus on when evaluating segmentation algorithms. Figure 6 shows examples of thresholded indices using different threshold values. The threshold sets the expected difficulty in identifying vessel segments. Very low thresholds can be used to define regions where the blood vessels are almost indistinguishable from the background. In some cases, continuity criteria need to be used to identify regions connecting pairs of aligned, but discontinuous, vessel segments.

For the sample shown in Figure 6, segments having an LVS larger than 0.5 are easy to identify. Most segmentation methods should be able to identify these segments. Thus, as discussed above, it is more relevant to focus the performance analysis on vessels having low LVS.

IV.3 Quantifying CNN Segmentation Quality Using LSRecall

To verify if the LSRecall metric can aid in measuring the performance of segmentation algorithms at difficult-to-segment regions, a CNN was trained on the blood vessel dataset and the LSRecall metric was calculated on the results. The neural network used is based on the U-net architecture Ronneberger et al. (2015) using residual blocks. Figure 7 illustrates the architecture. The network was trained on a subset of 45 randomly selected images from the dataset and validated on another 5 randomly selected images. The trained network was then applied to all images of the dataset, including those used for training and validation. This split using a small set of images for training aims at reproducing real-world scenarios where only a few samples are annotated. The training set was included in the final results because we are not interested in absolute performance values, but only in comparing different performance metrics.

The network was trained for 500 epochs using a learning rate of 0.01 with a polynomial learning rate scheduler and a batch size of 10 images. The AdamW optimizer Loshchilov and Hutter (2019) was used with a momentum of 0.9. The model with the lowest validation loss found during training was used. To avoid class imbalances, the background and vessel classes were weighted according to the inverse frequency of the pixels for loss function calculation. The only augmentation used consisted of random crops with a size of $256\times 256$ pixels. The images were normalized using z-score for each image independently (i.e., each image was normalized by its mean and standard deviation values). The network reached an average Dice score of 0.82 on the whole dataset.

Figure 8(a) shows a comparison between the LSRecall and the recall metrics calculated for all images. A threshold of 0.2 was used for the LSRecall. It is clear that the segmentation is much worse for vessels having low salience, with some images displaying LSRecall of zero. Interestingly, many samples having recall as high as 0.9 display a low LSRecall of around 0.2.

The LSRecall threshold parameter sets the largest salience to be considered when measuring the segmentation performance. Figure 8(b) shows the average LSRecall of all samples in the dataset for different threshold values. There is a clear trend of the CNN missing more and more vessel pixels as the threshold gets lower. When the threshold is one, the LSRecall is equal to the recall metric.

Having a metric that quantifies segmentation performance at low salience regions allows studying the influence of hyperparameter changes for improving the results at such regions. One possible change is the inclusion of specific data augmentations to make the method more robust to blood vessel intensity changes. This approach is studied next.

IV.4 Salience Augmentation

The augmentation procedure described in Section III.3 was applied to verify if the CNN can be made more robust to low-salience blood vessels. The training protocol was the same as presented in the previous section, with the addition of the salience augmentation. Specifically, for each image the number of blood vessel segments to be augmented, $n$ , is randomly selected with uniform probability in the range $[50,100]$ . Next, parameters $l$ and $l_{d}$ are randomly selected with uniform probability in, respectively, the ranges $[20,100]$ and $[0,30]$ . A segment with length larger than $l$ is then randomly selected and the procedure described in Section III.3 is applied. The process is repeated until $n$ segments have been augmented. A value of $t_{b}=5$ was used for finding suitable background regions to generate vessel discontinuities.

Figure 9(a) shows a comparison between the LSRecall of the CNNs trained with and without augmentation. A threshold of 0.5 was used for the LSRecall calculation. The result shows that the augmentation procedure indeed tends to make the CNN more robust to blood vessels having low salience. The absolute difference between LSRecall values tends to be small, but this difference tends to be more relevant for samples having low LSRecall. This can be verified using an alternative approach to compare the values. Figure 9(b) shows the relative improvement in LSRecall using the augmentation plotted as a function of LSRecall with no augmentation. The relative improvement is calculated as the difference between LSRecall values with and without augmentation divided by the LSRecall without augmentation. The figure shows that the augmentation procedure significantly improved the accuracy for samples having low LSRecall.

The average precision (positive predictive value) of the CNNs trained with and without augmentation were, respectively, 0.72 and 0.73. Thus, the precision did not significantly change between the two training procedures.

V Conclusion

Semantic segmentation of biomedical images is a challenging task since the objects of interest cannot have a pre-specified bounding box. Thus, a segmentation algorithm must identify relevant pixels taking into account that the object might span the whole image. In such cases, global image-wise metrics might hide important mistakes of the algorithm. In the case of blood vessels, many downstream tasks involve modeling the vessels as a graph and extracting relevant metrics such as total vessel length and ramification density Wu et al. (2022); Freitas-Andrade et al. (2023); Lugo-Hernandez et al. (2017). Small errors at specific regions of the vasculature can lead to highly distinct graphs and respective downstream features. In this study, we developed approaches for quantifying and improving the segmentation of blood vessels focusing on specific, difficult-to-segment, regions.

The proposed LVS index allows an intuitive quantification of the difficulty in segmenting blood vessels at different regions of an image. The index is defined for every vessel pixel and can be used to analyze the degree of variability of blood vessel salience in a dataset. Interestingly, if all vessels in a dataset have low LVS indices, that is, they are very similar to the background, a CNN with enough expressivity could likely be successfully optimized to segment the blood vessels. The problem arises when most of the pixels have high LVS and only a small fraction of the pixels have low or zero LVS. A neural network will tend to focus mostly on easy-to-segment vessels and ignore hard-to-segment, topology-breaking, vessels.

A variation of the recall metric, LSRecall, was defined to focus on quantifying the segmentation quality of difficult-to-segment vessels. The results showed that segmentations having high recall scores might have very low LSRecall. For instance, the CNN used in the experiments obtained an average Dice score of 0.82 and an average recall of 0.89, but an average LSRecall of 0.54 for a salience threshold of 0.1. Focusing on regions that are difficult to segment allows better quantification of the impact of hyperparameters of the segmentation algorithm. For instance, changing the network architecture might lead to little improvement of the Dice score, but to a large improvement of LSRecall. In such a case, the new architecture might be better suited to avoid blood vessel discontinuities. A fact that might not be revealed by the Dice or recall scores.

Taking into account the potential local mistakes of segmentation algorithms, a data augmentation procedure was also defined to improve robustness regarding variations of blood vessel salience along a given image as well as along different images of a dataset. The results showed that for many samples the augmentation procedure improved the recall of difficult-to-segment regions by as much as 25%, with some samples having even a 75% improvement.

Interesting prospects include defining a differentiable LSRecall metric, similarly to the clDice score Shit et al. (2021). This would allow the direct optimization of the metric using neural networks. The ability to systematically generate discontinuous, realistic, vessels can also be used to study the connectivity preservation performance of different segmentation methods.

Funding

C. H. Comin thanks FAPESP (grant no. 21/12354-8) for financial support. M. V. da Silva thanks FAPESP (grant no. 23/03975-4) and the Google PhD Fellowship Program for financial support.

CRediT Author Statement

João Pedro Parella: Methodology, Software, Data curation. Matheus Viana da Silva: Software, Validation, Writing – review and editing. Cesar Henrique Comin: Conceptualization, Methodology, Supervision, Writing - original draft, review and editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

Mookiah et al. (2021) Muthu Rama Krishnan Mookiah, Stephen Hogg, Tom J MacGillivray, Vijayaraghavan Prathiba, Rajendra Pradeepa, Viswanathan Mohan, Ranjit Mohan Anjana, Alexander S Doney, Colin NA Palmer, and Emanuele Trucco, “A review of machine learning methods for retinal blood vessel segmentation and artery/vein classification,” Medical Image Analysis 68, 101905 (2021).
Eladawi et al. (2018) Nabila Eladawi, Mohammed Elmogy, Fahmi Khalifa, Mohammed Ghazal, Nicola Ghazi, Ahmed Aboelfetouh, Alaa Riad, Harpal Sandhu, Shlomit Schaal, and Ayman El-Baz, “Early diabetic retinopathy diagnosis based on local retinal blood vessel analysis in optical coherence tomography angiography (octa) images,” Medical physics 45, 4582–4599 (2018).
Sangeethaa and Uma Maheswari (2018) SN Sangeethaa and P Uma Maheswari, “An intelligent model for blood vessel segmentation in diagnosing DR using cnn,” Journal of medical systems 42, 175 (2018).
Li et al. (2021) Zhenwei Li, Mengli Jia, Xiaoli Yang, and Mengying Xu, “Blood vessel segmentation of retinal image based on dense-u-net network,” Micromachines 12, 1478 (2021).
Almotiri et al. (2018) Jasem Almotiri, Khaled Elleithy, and Abdelrahman Elleithy, “A multi-anatomical retinal structure segmentation system for automatic eye screening using morphological adaptive fuzzy thresholding,” IEEE Journal of Translational Engineering in Health and Medicine 6, 1–23 (2018).
Nair and Muthuvel (2020) Arun T Nair and K Muthuvel, “Blood vessel segmentation and diabetic retinopathy recognition: an intelligent approach,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization 8, 169–181 (2020).
Roda et al. (2021) Niccolò Roda, Giada Blandano, and Pier Giuseppe Pelicci, “Blood vessels and peripheral nerves as key players in cancer progression and therapy resistance,” Cancers 13, 4471 (2021).
Ouellette et al. (2020) Julie Ouellette, Xavier Toussay, Cesar H Comin, Luciano da F Costa, Mirabelle Ho, María Lacalle-Aurioles, Moises Freitas-Andrade, Qing Yan Liu, Sonia Leclerc, Youlian Pan, et al., “Vascular contributions to 16p11. 2 deletion autism syndrome modeled in mice,” Nature Neuroscience 23, 1090–1101 (2020).
Wong et al. (2019) Sau May Wong, Jacobus FA Jansen, C Eleana Zhang, Erik I Hoff, Julie Staals, Robert J van Oostenbrugge, and Walter H Backes, “Blood-brain barrier impairment and hypoperfusion are linked in cerebral small vessel disease,” Neurology 92, e1669–e1677 (2019).
Dolati et al. (2015) Parviz Dolati, Alexandra Golby, Daniel Eichberg, Mohamad Abolfotoh, Ian F Dunn, Srinivasan Mukundan, Mohamed M Hulou, and Ossama Al-Mefty, “Pre-operative image-based segmentation of the cranial nerves and blood vessels in microvascular decompression: can we prevent unnecessary explorations?” Clinical neurology and neurosurgery 139, 159–165 (2015).
Li et al. (2022) Hao Li, Zeyu Tang, Yang Nan, and Guang Yang, “Human treelike tubular structure segmentation: A comprehensive review and future perspectives,” Computers in Biology and Medicine 151, 106241 (2022).
Fraz et al. (2012) Muhammad Moazam Fraz, Paolo Remagnino, Andreas Hoppe, Bunyarit Uyyanonvara, Alicja R Rudnicka, Christopher G Owen, and Sarah A Barman, “Blood vessel segmentation methodologies in retinal images–a survey,” Computer methods and programs in biomedicine 108, 407–433 (2012).
Paetzold et al. (2021) Johannes C Paetzold, Julian McGinnis, Suprosanna Shit, Ivan Ezhov, Paul Büschl, Chinmay Prabhakar, Anjany Sekuboyina, Mihail Todorov, Georgios Kaissis, Ali Ertürk, et al., “Whole brain vessel graphs: A dataset and benchmark for graph learning and neuroscience,” in Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) (2021).
Freitas-Andrade et al. (2022) Moises Freitas-Andrade, Cesar H Comin, Matheus Viana da Silva, Luciano da F Costa, and Baptiste Lacoste, “Unbiased analysis of mouse brain endothelial networks from two-or three-dimensional fluorescence images,” Neurophotonics 9, 031916 (2022).
Cervantes et al. (2023) Jair Cervantes, Jared Cervantes, Farid García-Lamont, Arturo Yee-Rendon, Josué Espejel Cabrera, and Laura Domínguez Jalili, “A comprehensive survey on segmentation techniques for retinal vessel segmentation,” Neurocomputing 556, 126626 (2023).
Chen et al. (2021) Chunhui Chen, Joon Huang Chuah, Raza Ali, and Yizhou Wang, “Retinal vessel segmentation using deep learning: a review,” IEEE Access 9, 111985–112004 (2021).
Schaap et al. (2009) Michiel Schaap, Coert T Metz, Theo van Walsum, Alina G van der Giessen, Annick C Weustink, Nico R Mollet, Christian Bauer, Hrvoje Bogunović, Carlos Castro, Xiang Deng, et al., “Standardized evaluation methodology and reference database for evaluating coronary artery centerline extraction algorithms,” Medical image analysis 13, 701–714 (2009).
van Walsum et al. (2008) Theo van Walsum, Michiel Schaap, Coert T Metz, Alina G van der Giessen, and Wiro J Niessen, “Averaging centerlines: mean shift on paths,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2008) pp. 900–907.
Shit et al. (2021) Suprosanna Shit, Johannes C Paetzold, Anjany Sekuboyina, Ivan Ezhov, Alexander Unger, Andrey Zhylka, Josien PW Pluim, Ulrich Bauer, and Bjoern H Menze, “cldice-a novel topology-preserving loss function for tubular structure segmentation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (2021) pp. 16560–16569.
Mosinska et al. (2018) Agata Mosinska, Pablo Marquez-Neila, Mateusz Koziński, and Pascal Fua, “Beyond the pixel-wise loss for topology-aware delineation,” in Proceedings of the IEEE conference on computer vision and pattern recognition (2018) pp. 3136–3145.
Citraro et al. (2020) Leonardo Citraro, Mateusz Koziński, and Pascal Fua, “Towards reliable evaluation of algorithms for road network reconstruction from aerial images,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVIII 16 (Springer, 2020) pp. 703–719.
Vasu et al. (2020) Subeesh Vasu, Mateusz Kozinski, Leonardo Citraro, and Pascal Fua, “Topoal: An adversarial learning approach for topology-aware road segmentation,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII 16 (Springer, 2020) pp. 224–240.
Hu et al. (2019) Xiaoling Hu, Fuxin Li, Dimitris Samaras, and Chao Chen, “Topology-preserving deep image segmentation,” Advances in neural information processing systems 32 (2019).
Funke et al. (2018) Jan Funke, Fabian Tschopp, William Grisaitis, Arlo Sheridan, Chandan Singh, Stephan Saalfeld, and Srinivas C Turaga, “Large scale image segmentation with structured loss based deep learning for connectome reconstruction,” IEEE transactions on pattern analysis and machine intelligence 41, 1669–1680 (2018).
Scheffer et al. (2020) Louis K Scheffer, C Shan Xu, Michal Januszewski, Zhiyuan Lu, Shin-ya Takemura, Kenneth J Hayworth, Gary B Huang, Kazunori Shinomiya, Jeremy Maitlin-Shepard, Stuart Berg, et al., “A connectome and analysis of the adult drosophila central brain,” elife 9, e57443 (2020).
Gupta et al. (2024) Saumya Gupta, Yikai Zhang, Xiaoling Hu, Prateek Prasanna, and Chao Chen, “Topology-aware uncertainty for image segmentation,” Advances in Neural Information Processing Systems 36 (2024).
Hu et al. (2021) Xiaoling Hu, Yusu Wang, Li Fuxin, Dimitris Samaras, and Chao Chen, “Topology-aware segmentation using discrete morse theory,” arXiv preprint arXiv:2103.09992 (2021).
Moccia et al. (2018) Sara Moccia, Elena De Momi, Sara El Hadji, and Leonardo S Mattos, “Blood vessel segmentation algorithms—review of methods, datasets and evaluation metrics,” Computer methods and programs in biomedicine 158, 71–91 (2018).
Menten et al. (2023) Martin J Menten, Johannes C Paetzold, Veronika A Zimmer, Suprosanna Shit, Ivan Ezhov, Robbie Holland, Monika Probst, Julia A Schnabel, and Daniel Rueckert, “A skeletonization algorithm for gradient-based optimization,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (2023) pp. 21394–21403.
Reinke et al. (2021) Annika Reinke, Minu D Tizabi, Carole H Sudre, Matthias Eisenmann, Tim Rädsch, Michael Baumgartner, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, et al., “Common limitations of image processing metrics: A picture story,” arXiv preprint arXiv:2104.05642 (2021).
Maier-Hein et al. (2024) Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D Tizabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, et al., “Metrics reloaded: recommendations for image analysis validation,” Nature methods , 1–18 (2024).
da Silva et al. (2022) Matheus V da Silva, Julie Ouellette, Baptiste Lacoste, and Cesar H Comin, “An analysis of the influence of transfer learning when measuring the tortuosity of blood vessels,” Computer Methods and Programs in Biomedicine 225, 107021 (2022).
Lin et al. (2022) Guoye Lin, Hanhua Bai, Jie Zhao, Zhaoqiang Yun, Yangfan Chen, Shumao Pang, and Qianjin Feng, “Improving sensitivity and connectivity of retinal vessel segmentation via error discrimination network,” Medical Physics 49, 4494–4507 (2022).
Suzuki et al. (1985) Satoshi Suzuki et al., “Topological structural analysis of digitized binary images by border following,” Computer vision, graphics, and image processing 30, 32–46 (1985).
Geirhos et al. (2019) Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel, “Imagenet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness.” in International Conference on Learning Representations (2019).
da Silva et al. (2024) Matheus Viana da Silva, Natália de Carvalho Santos, Julie Ouellette, Baptiste Lacoste, and Cesar Henrique Comin, “A new dataset for measuring the performance of blood vessel segmentation methods under distribution shifts,” arXiv preprint arXiv:2301.04517 (2024).
Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, edited by Nassir Navab, Joachim Hornegger, William M. Wells, and Alejandro F. Frangi (Springer International Publishing, Cham, 2015) pp. 234–241.
Loshchilov and Hutter (2019) Ilya Loshchilov and Frank Hutter, “Decoupled weight decay regularization,” in International Conference on Learning Representations (2019).
Wu et al. (2022) Yuan-ting Wu, Hannah C Bennett, Uree Chon, Daniel J Vanselow, Qingguang Zhang, Rodrigo Muñoz-Castañeda, Keith C Cheng, Pavel Osten, Patrick J Drew, and Yongsoo Kim, “Quantitative relationship between cerebrovascular network and neuronal cell types in mice,” Cell reports 39 (2022).
Freitas-Andrade et al. (2023) Moises Freitas-Andrade, Cesar H Comin, Peter Van Dyken, Julie Ouellette, Joanna Raman-Nair, Nicole Blakeley, Qing Yan Liu, Sonia Leclerc, Youlian Pan, Ziying Liu, et al., “Astroglial hmgb1 regulates postnatal astrocyte morphogenesis and cerebrovascular maturation,” Nature Communications 14, 4965 (2023).
Lugo-Hernandez et al. (2017) Erlen Lugo-Hernandez, Anthony Squire, Nina Hagemann, Alexandra Brenzel, Maryam Sardari, Jana Schlechter, Eduardo H Sanchez-Mendoza, Matthias Gunzer, Andreas Faissner, and Dirk M Hermann, “3d visualization and quantification of microvessels in the whole ischemic mouse brain using solvent-based clearing and light sheet microscopy,” Journal of Cerebral Blood Flow & Metabolism 37, 3355–3367 (2017).