Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation
Abstract
Shape compactness is a key geometrical property for describing regions of interest in many image segmentation tasks. In this paper, we propose two novel algorithms to solve the introduced image segmentation problem that incorporates a shape-compactness prior. Existing algorithms for such a problem often suffer from computational inefficiency, difficulty in reaching a local minimum, and the need to fine-tune many hyperparameters. To address these issues, we propose a novel optimization model along with its equivalent primal-dual model and introduce a new optimization algorithm based on primal-dual threshold dynamics (PD-TD). Additionally, we relax the solution constraint and propose another novel primal-dual soft threshold-dynamics algorithm (PD-STD) to achieve superior performance. Based on a variational interpretation of the sigmoid layer, the proposed PD-STD algorithm can be integrated into Deep Neural Networks (DNNs) to enforce compact regions as image segmentation results. Extensive experiments demonstrate that the proposed algorithms outperform state-of-the-art methods in numerical efficiency and effectiveness, especially when applied to the popular DeepLabV3 and IrisParseNet networks, yielding higher IoU, Dice, and compactness metrics on noisy Iris datasets. In particular, the proposed algorithms significantly improve IoU when training on a highly noisy image dataset.
keywords:
Shape compactness, Image segmentation, Deep neural networks, Threshold dynamics
Affiliations:
[inst1] Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
[inst2] Hong Kong Center for Cerebro-Cardiovascular Health Engineering, Sha Tin, Hong Kong, China
[inst3] College of Mathematical Medicine, Zhejiang Normal University, Jinhua, Zhejiang, China
[inst4] Norwegian Research Center, Bergen, Norway
1 Introduction
Image segmentation is a fundamental task in computer vision that partitions a digital image into distinct non-overlapping subregions, with a wide spectrum of real-world applications such as autonomous driving [1], medical image segmentation [2; 3; 4], and satellite image processing [5; 6]. In this respect, variational methods provide a popular and mathematically explainable approach by formulating image segmentation as a minimization model whose energy function usually contains a data fidelity term and a boundary regularization term. Often-used variational models include the Mumford-Shah model [7], the Chan-Vese model [8], and the Potts model [9]. However, when the image is corrupted by noise or the object of interest is partially occluded, such variational models may not produce desirable segmentation results; hence, task-dependent prior knowledge is incorporated into the variational model by adding extra penalty functions or hard constraints, so as to improve the final results. Many shape priors have been studied in the literature, such as the convexity prior [10], the shape volume prior [11], the star-shape prior [12], and the compactness prior [13; 14].
In this work, we study the variational image segmentation model with a shape-compactness prior. The shape-compactness prior helps to obtain a more compact segmentation mask with smooth boundaries by introducing an additional penalty term that geometrically describes a compact region [13; 14]. Previous studies have explored different definitions of shape compactness. The perimeter-to-area ratio was proposed in [15] to describe the compactness of segmented regions. The authors of [13] introduced a new compactness term that is invariant to scale, rotation, and translation. The authors of [14] first integrated compactness, in terms of the ratio of the squared perimeter to the area, into image segmentation to ensure scale invariance. In addition, it is well known that the ratio of squared perimeter to area attains its minimum when the shape is circular, which implies a preference for circular segments in the result. However, adding this ratio leads to a challenging non-convex optimization problem. The authors of [14] proposed an algorithm based on the alternating direction method of multipliers (ADMM), which is, however, computationally expensive and requires fine-tuning of many hyper-parameters.
Given the great success of deep learning-based (DL-based) methods in many applications during the past decades, many neural network architectures have been proposed for image processing, such as LeNet [16], AlexNet [17], and U-Net [3]. In a series of recent works [18; 19; 20], dilated convolutions, which enlarge the receptive field without a significant increase in computational cost, are used to achieve state-of-the-art segmentation performance. Specifically, U-Net [3] employs an encoder-decoder structure that concatenates multiscale features, allowing effective image segmentation. Such an architecture has inspired several works, such as SegNet [21] and U-Net++ [22]. In [23], a novel neural network framework, the so-called PottsMGNet, is proposed to solve the variational Potts model for image segmentation by leveraging operator-splitting methods, multigrid methods, and control variables. Moreover, PottsMGNet shows that most encoder-decoder-based networks are equivalent to optimal control problems for certain initial value problems using operator-splitting methods as solvers. Although DL-based methods have achieved remarkable performance in image segmentation, integrating spatial priors into such methods remains a challenge. Recently, the combination of shape priors with deep learning models was explored such that semantic features were automatically extracted from large datasets using neural networks. Together with the Potts model, operator-splitting methods, and a double-well potential, the U-Net architecture is used in [24] to segment images with length penalties. In [11], a method is proposed to incorporate spatial priors by introducing a variational explanation of the sigmoid function, which allows the integration of shape priors, such as convexity and star shape, into deep learning-based approaches. Motivated by this, this paper aims to incorporate compactness priors into DL-based methods.
In this paper, we consider a non-convex optimization model for image segmentation with the shape-compactness prior, for which we introduce two primal-dual-based (PD-based) optimization algorithms using threshold dynamics (TD) and soft threshold dynamics (STD), respectively. The proposed algorithms enjoy advantages in both theory and numerical solvers. Moreover, the introduced PD-STD algorithm can be easily integrated into many popular neural networks, such as DeepLabV3 and IrisParseNet, and significantly improves performance when segmenting noisy images. The main contributions of this work are as follows:
- We study the Potts model jointly with a shape-compactness prior for image segmentation, and introduce its novel equivalent primal-dual and dual models.
- Two novel optimization algorithms, namely the PD-TD (Primal-Dual Threshold Dynamics) and PD-STD (Primal-Dual Soft Threshold Dynamics) algorithms, are proposed based on the introduced primal-dual and dual models; they outperform previous methods in efficiency and accuracy.
- Motivated by [11], the PD-STD algorithm can be integrated into the structures of deep neural networks. The new networks demonstrate better robustness for segmenting objects of circular shape, especially on noisy images.
This work is organized as follows. In Sec. 2, we discuss related work on classical variational segmentation models, especially models with a shape-compactness prior, as well as recent deep learning-based image segmentation methods. We introduce a new variational model in Sec. 3, and propose its novel equivalent models and corresponding algorithms in Sec. 4. In Sec. 5, experimental results are presented to verify the efficiency, accuracy, and robustness of our proposed algorithms. Conclusions are given in Sec. 6.
2 Preliminaries
2.1 Variational segmentation models
Suppose $\Omega \subset \mathbb{R}^2$ is a compact image domain. The objective of binary image segmentation is to divide the input image into two distinct regions, i.e. the foreground region $\Omega_1$ and the background region $\Omega \setminus \Omega_1$, by assigning each pixel to either the foreground or the background. One of the most popular image segmentation models is the Potts model [9], which originated from statistical mechanics to represent interactions between spins on a crystalline lattice and was well studied in a discrete optimization setting by Geman [25] and Boykov et al. [26; 27]. During recent decades, the extension of the Potts model to the continuous optimization setting has gained significant attention in image segmentation; it aims to minimize the following energy function:
$E(\Omega_1) = \int_{\Omega_1} f(x)\,dx + \alpha\,|\partial \Omega_1|,$ (1)
where $f(x)$ is the data fidelity term at each pixel $x \in \Omega$, depending on the given image information, and the second term $|\partial \Omega_1|$, named the edge force term, measures the boundary length of the foreground region $\Omega_1$. Clearly, the second term serves as the regularization term promoting smooth and well-defined boundaries.
Let $u:\Omega \to \{0,1\}$ be the discrete-valued labeling function such that
$u(x) = 1$ for $x \in \Omega_1$, and $u(x) = 0$ for $x \in \Omega \setminus \Omega_1$. (2)
The optimization model (1) can, therefore, be rewritten as
$\min_{u(x) \in \{0,1\}} \; \int_\Omega f(x)\,u(x)\,dx + \alpha\,TV(u),$ (3)
where the total-variation function $TV(u) = \int_\Omega |\nabla u|\,dx$ gives an estimate of the perimeter of the foreground region, i.e. $|\partial \Omega_1| = TV(u)$ for $u = \mathbb{1}_{\Omega_1}$, in a continuous space setting.
Given the discrete-valued labeling function $u(x) \in \{0,1\}$, the optimization problem (3) is non-convex and thus difficult to solve directly. To address this, its convex relaxation is studied:
$\min_{u \in U} \; \int_\Omega f(x)\,u(x)\,dx + \alpha\,TV(u),$ (4)
where
$U := \{\, u \in BV(\Omega) : u(x) \in [0,1],\ \forall x \in \Omega \,\}.$ (5)
It can be proved that the global optimum of the non-convex discrete optimization problem (3) can be obtained by simply thresholding the optimum of its convex relaxation (4) [28; 29]. Multiple efficient algorithms have been developed to solve the convex optimization problem (4), based on primal-dual optimization [30; 31; 32] and dual optimization [29], respectively. More details about algorithms proposed in the literature for solving the Potts model, and relevant references, can be found in a recent survey [33].
2.2 Shape compactness prior
In practice, shape prior information is often incorporated into variational image segmentation models so as to further constrain the solution space and improve the accuracy and robustness of the computed results. Various types of shape priors, such as convexity and star shape, have been proposed and explored in variational image segmentation models [11; 10]. Segmenting compact target regions from given images is of great interest in many real-world segmentation tasks. The key challenges lie in how to represent such region compactness in a mathematically effective and concise way and how to solve the resulting problem efficiently in numerics. This is one of the main research topics of this study.
Given the isoperimetric inequality [34] for the region $\Omega_1 \subset \mathbb{R}^2$, we have
$|\partial \Omega_1|^2 \;\ge\; 4\pi\,|\Omega_1|,$
and the equality holds if and only if $\Omega_1$ is a ball. Clearly, the ratio of $|\partial \Omega_1|^2$ and $|\Omega_1|$ is always larger than or equal to $4\pi$, and the minimum is attained if and only if $\Omega_1$ is a ball. Its minimization simply enforces the segmentation result to be a single compact region. Therefore, we define the following formulation as the measure of shape compactness for the region $\Omega_1$:
$C(\Omega_1) := \frac{|\partial \Omega_1|^2}{|\Omega_1|}.$
Let $u = \mathbb{1}_{\Omega_1}$ be the indicator function of the region $\Omega_1$:
$u(x) = 1$ for $x \in \Omega_1$, and $u(x) = 0$ otherwise.
It is well known that, for the perimeter of $\Omega_1$, we have $|\partial \Omega_1| = TV(u)$ [35], where $TV(u)$ is the total variation of $u$ such that
$TV(u) = \sup_{\xi \in \mathcal{D}} \int_\Omega u(x)\,\mathrm{div}\,\xi(x)\,dx,$
where
$\mathcal{D} := \{\, \xi \in C_c^1(\Omega;\mathbb{R}^2) : |\xi(x)| \le 1,\ \forall x \in \Omega \,\}.$
Thus, the shape compactness of $\Omega_1$ can be properly reformulated as
$C(\Omega_1) = \frac{TV(u)^2}{\langle u, 1 \rangle},$ (6)
where $\langle u, 1 \rangle = \int_\Omega u(x)\,dx$ gives the area of $\Omega_1$. The area of $\Omega_1$ is assumed to be non-zero in this paper.
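To make the measure concrete, the following minimal NumPy sketch evaluates a discrete analogue of (6) for binary masks; the forward-difference estimate of the total variation and the function name are our own illustrative choices, not necessarily the exact discretization used in the experiments below.

```python
import numpy as np

def shape_compactness(mask):
    """Discrete analogue of TV(u)^2 / <u, 1> for a binary mask u, with the
    perimeter estimated by an isotropic forward-difference total variation."""
    u = mask.astype(np.float64)
    dx = np.diff(u, axis=1, append=u[:, -1:])   # forward differences with
    dy = np.diff(u, axis=0, append=u[-1:, :])   # replicated boundary values
    perimeter = np.sqrt(dx**2 + dy**2).sum()    # TV(u), i.e. boundary length
    area = u.sum()                              # <u, 1>, i.e. the region area
    return perimeter**2 / area

# A disk stays close to the isoperimetric lower bound 4*pi ~ 12.57 (slightly
# above it due to discretization), a square gives 16, irregular shapes more.
yy, xx = np.mgrid[:256, :256]
disk = (xx - 128.0)**2 + (yy - 128.0)**2 <= 60.0**2
square = np.zeros((256, 256), dtype=bool)
square[64:192, 64:192] = True
print(shape_compactness(disk), shape_compactness(square), 4 * np.pi)
```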
2.3 Deep learning-based segmentation models
This work also introduces the incorporation of the shape-compactness prior, as defined in (6), into modern deep neural networks (DNNs). DNNs have emerged as a widely used tool in artificial intelligence and have revolutionized numerous domains ranging from computer vision to natural language processing. In particular, with their ability to learn hierarchical representations of data through multiple layers of interconnected neurons, DNNs capture complex patterns and obtain state-of-the-art performance in various challenging tasks. We review the often-used structure of DNNs and discuss the motivation for integrating shape prior information into DNNs.
A DNN for image segmentation can be expressed as a parametrized nonlinear operator $\mathcal{N}_\Theta$ of a multilayer neural network, where $v^0 = I$ denotes the input image to the network and the output of each network layer is denoted by $v^t$, $t = 1, \ldots, T$. The structures of neural networks are typically compositions of such layers, e.g. DeepLabV3 [36]:
$v^t = \mathcal{A}\big(\mathcal{T}^t(v^0, \ldots, v^{t-1})\big), \quad t = 1, \ldots, T.$ (7)
In (7), the activation function $\mathcal{A}$ can be chosen as many different operators, such as ReLU, sigmoid, and tanh; it can also include downsampling and upsampling operators and their compositions. In the final layer, $\mathcal{A}$ is typically a soft classification activation function, i.e. sigmoid or softmax. The operator $\mathcal{T}^t$ describes the connections between the $t$-th layer and its previous layers $v^0, \ldots, v^{t-1}$. For example, in a simple convolutional network, $\mathcal{T}^t$ is only associated with its previous layer and $\mathcal{T}^t(v^{t-1}) = W_t \ast v^{t-1} + b_t$, where $W_t$ and $b_t$ are the convolution kernel and the bias of an affine transformation, respectively. Let $\Theta$ be the collection of learnable parameters:
$\Theta = \{\, W_t,\, b_t \,\}_{t=1}^{T}.$
For the problem of binary image segmentation, the sigmoid function is often chosen as the activation function of the final layer in DNNs:
$v^T(x) = \mathrm{Sig}(o(x)) := \frac{1}{1 + e^{-o(x)}},$ (8)
where $o = \mathcal{T}^T(v^0, \ldots, v^{T-1})$ is the logits output of the neural network; the sigmoid maps the logits from $(-\infty, +\infty)$ to $(0,1)$ and enforces output probabilities.
However, integrating high-level spatial priors into the often-used DNNs is still a challenge. In this respect, Liu et al. [10] proposed an interesting approach to naturally enforce a classical spatial regularization term in DNNs by replacing the final sigmoid layer with the variational layer
$v^T = \arg\min_{u(x) \in [0,1]} \; \langle u, -o \rangle + \varepsilon \big( \langle u, \log u \rangle + \langle 1-u, \log(1-u) \rangle \big) + \alpha\, TV(u).$ (9)
It is motivated by the fact that the sigmoid function can be regarded as the minimizer of a dual function of the entropy-regularized Rectified Linear Unit (ReLU):
$\arg\min_{u(x) \in [0,1]} \; \langle u, -o \rangle + \varepsilon \big( \langle u, \log u \rangle + \langle 1-u, \log(1-u) \rangle \big) \;=\; \frac{1}{1 + e^{-o/\varepsilon}},$ (10)
which reduces to the sigmoid function (8) when the entropic regularization parameter $\varepsilon = 1$. Clearly, the last layer of (9) just represents a total-variation regularized sigmoid function.
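The following short NumPy check illustrates this variational interpretation of the sigmoid as in (10): for a fixed logit value, a brute-force minimization of the entropy-regularized linear objective over $[0,1]$ recovers the sigmoid value. The chosen logit and regularization weight are arbitrary test values.

```python
import numpy as np

def entropy_objective(u, o, eps):
    """Pointwise objective  -o*u + eps*(u*log u + (1-u)*log(1-u))  from (10)."""
    return -o * u + eps * (u * np.log(u) + (1 - u) * np.log(1 - u))

o, eps = 1.3, 0.5                                   # arbitrary logit and entropy weight
u_grid = np.linspace(1e-6, 1 - 1e-6, 100001)        # brute-force search over [0, 1]
u_star = u_grid[np.argmin(entropy_objective(u_grid, o, eps))]
print(u_star, 1.0 / (1.0 + np.exp(-o / eps)))       # numerical minimizer vs. sigmoid(o/eps)
```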
Inspired by [10], we propose to incorporate the studied shape-compactness prior into DNNs, so as to promote the spatial compactness of the segmentation results produced by DNNs.
3 Image segmentation with compactness prior
In this work, let
$E_c(u) := \frac{TV(u)^2}{\langle u, 1 \rangle}$
be the spatial compactness regularization of a binary labeling function $u$, where $TV(u)$ is defined in Sec. 2.1. Given the classical image segmentation model (3), we consider replacing its length-minimization term, i.e. the total-variation function, by the above shape-compactness prior, and propose the new image segmentation model as follows:
$\min_{u \in \mathbb{B}} \; E_\lambda(u) := \lambda \langle u, f \rangle + E_c(u),$ (11)
where $\mathbb{B} := \{\, u \in BV(\Omega) : u(x) \in \{0,1\}\ \text{a.e.} \,\}$ is the binary constraint set, $f$ is the region force, and $\langle \cdot, \cdot \rangle$ denotes the inner product on $\Omega$.
3.1 Analysis of the optimization model (11)
We first show that a minimizer of the novel shape-compactness regularized image segmentation model (11) exists, and that the minimizer tends to a circular region when the parameter $\lambda$ in (11) becomes small enough.
Proposition 1
Suppose $\{u_n\}$ is a sequence of functions in $\mathbb{B}$ and $u_n \to u^*$ in $L^1(\Omega)$. Then $E_\lambda(u^*) \le \liminf_{n\to\infty} E_\lambda(u_n)$ for any $\lambda \ge 0$.
Proof 1
Since $\mathbb{B}$ is closed in $L^1(\Omega)$, $u^*$ is also in $\mathbb{B}$. If $u^* = 0$, then the inequality follows directly from the isoperimetric inequality. Otherwise, by the lower semi-continuity of the total variation [37], we have $TV(u^*) \le \liminf_{n\to\infty} TV(u_n)$ and $\langle u_n, 1 \rangle \to \langle u^*, 1 \rangle > 0$. Thus, $E_\lambda(u^*) \le \liminf_{n\to\infty} E_\lambda(u_n)$.
Lemma 1 (Existence of minimizer)
For any $\lambda > 0$, there exists a minimizer of $E_\lambda$ in $\mathbb{B}$.
Proof 2
Suppose $\{u_n\}$ is a minimizing sequence of $E_\lambda$ in $\mathbb{B}$ such that $\lim_{n\to\infty} E_\lambda(u_n) = \inf_{u \in \mathbb{B}} E_\lambda(u)$. Then, there exists a constant $C > 0$ such that
$E_\lambda(u_n) \le C, \quad \forall n.$
In addition, we have $\|u_n\|_{L^1(\Omega)} \le |\Omega|$. Thus, the BV norm of $u_n$ is uniformly bounded for all $n$:
$\|u_n\|_{BV(\Omega)} \;\le\; |\Omega| + \sqrt{\big(C + \lambda \|f\|_{L^\infty} |\Omega|\big)\,|\Omega|} \;<\; +\infty.$
By Theorem 2.5 of [37], there exists a sub-sequence $\{u_{n_k}\}$ converging to a function $u^*$ in $L^1(\Omega)$. Since $\mathbb{B}$ is closed, $u^*$ also belongs to $\mathbb{B}$. By Proposition 1, we have $E_\lambda(u^*) \le \liminf_{k\to\infty} E_\lambda(u_{n_k})$ for any $\lambda > 0$. Then,
$E_\lambda(u^*) \;\le\; \liminf_{k\to\infty} E_\lambda(u_{n_k}) \;=\; \inf_{u \in \mathbb{B}} E_\lambda(u).$
Thus, $E_\lambda(u^*) = \inf_{u \in \mathbb{B}} E_\lambda(u)$ and $u^*$ is a minimizer.
Lemma 2 (Convergence of minimizers)
Denote $u_\lambda := \arg\min_{u \in \mathbb{B}} E_\lambda(u)$; then every cluster point of the sequence $\{u_\lambda\}$ as $\lambda \to 0$ is the indicator function of a ball in $\Omega$.
Proof 3
Let $B$ be the indicator function of a ball in $\Omega$; then
$E_\lambda(u_\lambda) \;\le\; E_\lambda(B) \;=\; \lambda \langle B, f \rangle + 4\pi$
for any $\lambda > 0$. Thus, $\limsup_{\lambda \to 0} E_c(u_\lambda) \le 4\pi$. Using a similar argument as in Lemma 1, we can show that $\{u_\lambda\}$ has uniformly bounded BV norm and thus converges to some $u^* \in \mathbb{B}$ in $L^1(\Omega)$ up to a sub-sequence as $\lambda \to 0$. Then, $E_c(u^*) \le \liminf_{\lambda \to 0} E_c(u_\lambda) \le 4\pi$. By the isoperimetric inequality, $E_c(u^*) = 4\pi$ and $u^*$ must be the indicator function of a ball.
3.2 Dolz-Ayed-Desrosiers algorithm
Dolz et al. [14] proposed an algorithm to solve the model (11) based on the alternating direction method of multipliers (ADMM). By introducing two auxiliary variables, the studied optimization model (11) can be written as a linear-equality constrained optimization problem, and its corresponding augmented Lagrangian function couples the labeling function, the two auxiliary variables, and their multipliers. At each iteration, each variable is updated separately while fixing the two multipliers, and the multipliers are then updated as in the classical augmented Lagrangian method. Particularly, the anisotropic total-variation function is applied, so the update of the labeling function is solved by graph cut, the update of one auxiliary variable is computed by solving a large-scale linear system, and the update of the other auxiliary variable reduces to finding the root of a cubic equation.
4 Novel equivalent models and efficient algorithms
The algorithm proposed in [14] involves a series of hyper-parameters, which require fine-tuning and impact its numerical performance in practice. In this work, we reformulate the image segmentation model (11) with the shape-compactness prior and propose new efficient algorithms.
4.1 Novel equivalent optimization models
As shown in [11; 23], the total-variation function $TV(u)$, i.e. the boundary length term, can be well approximated by
$TV(u) \;\approx\; L_\sigma(u) := \sqrt{\tfrac{\pi}{\sigma}}\, \langle u,\, G_\sigma \ast (1-u) \rangle,$ (12)
where $G_\sigma$ is the two-dimensional Gaussian kernel at scale $\sigma$. Such an approximation $\Gamma$-converges to the exact boundary length as $\sigma \to 0$.
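As a sanity check of this approximation, the sketch below evaluates the threshold-dynamics estimate of (12) for a disk whose exact perimeter is known. It assumes one common convention for $G_\sigma$, namely the two-dimensional heat kernel at time $\sigma$ (standard deviation $\sqrt{2\sigma}$); the scale values are arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def td_perimeter(u, sigma):
    """Threshold-dynamics estimate sqrt(pi/sigma) * <u, G_sigma * (1 - u)>,
    with G_sigma taken as the heat kernel at time sigma (std = sqrt(2*sigma))."""
    blurred = gaussian_filter(1.0 - u, sigma=np.sqrt(2.0 * sigma), mode='nearest')
    return np.sqrt(np.pi / sigma) * np.sum(u * blurred)

yy, xx = np.mgrid[:512, :512]
disk = ((xx - 256.0)**2 + (yy - 256.0)**2 <= 100.0**2).astype(np.float64)
for sigma in (16.0, 4.0, 1.0):              # the estimate sharpens as sigma decreases
    print(sigma, td_perimeter(disk, sigma), 2 * np.pi * 100)  # vs. exact perimeter
```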
Hence, the approximated model of (11) can be essentially formulated as
$\min_{u \in \mathbb{B}} \; \lambda \langle u, f \rangle + \frac{L_\sigma(u)^2}{\langle u, 1 \rangle}.$ (13)
Let $A(u) := \langle u, 1 \rangle$ denote the area term. In view of the following dual representation of the convex function $s^2/t$ for $t > 0$ such that
$\frac{s^2}{t} = \max_{p \in \mathbb{R}} \; \big\{\, 2ps - p^2 t \,\big\},$
the optimization model (13) can be equally rewritten as the primal-dual model
$\min_{u \in \mathbb{B}} \max_{p \in \mathbb{R}} \; E(u, p) := \lambda \langle u, f \rangle + 2p\, L_\sigma(u) - p^2 \langle u, 1 \rangle.$ (14)
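A quick numerical check of the dual representation behind (14): for fixed $s$ and $t > 0$, maximizing $2ps - p^2 t$ over $p$ on a fine grid recovers $s^2/t$, with the maximizer at $p = s/t$. The test values below are arbitrary.

```python
import numpy as np

s, t = 3.7, 1.9                              # arbitrary test values with t > 0
p = np.linspace(-10.0, 10.0, 200001)         # fine grid over the dual variable
vals = 2.0 * p * s - p**2 * t
print(vals.max(), s**2 / t)                  # both approximately 7.205
print(p[vals.argmax()], s / t)               # maximizer approximately 1.947
```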
4.2 Primal-dual algorithms using threshold dynamics (PD-TD)
Given the primal-dual formulation (14), we propose the first primal-dual algorithm based on threshold dynamics. Given $u^k$ and $p^k$ at the $k$-th step, the new algorithm involves two steps to update $u$ and $p$:
First, we fix $p = p^k$ and update $u$ by
$u^{k+1} = \arg\min_{u \in \mathbb{B}} \; \lambda \langle u, f \rangle + 2p^k L_\sigma(u) - (p^k)^2 \langle u, 1 \rangle,$ (15)
which can be solved approximately by linearizing $L_\sigma(u)$ at $u^k$:
$L_\sigma(u) \;\approx\; L_\sigma(u^k) + \sqrt{\tfrac{\pi}{\sigma}}\, \langle G_\sigma \ast (1 - 2u^k),\, u - u^k \rangle.$
The optimum can, therefore, be efficiently computed by thresholding $\varphi^k$, i.e.
$u^{k+1}(x) = 1$ if $\varphi^k(x) \le 0$, and $u^{k+1}(x) = 0$ otherwise, (16)
where
$\varphi^k := \lambda f + 2p^k \sqrt{\tfrac{\pi}{\sigma}}\, G_\sigma \ast (1 - 2u^k) - (p^k)^2.$
Second, we fix $u = u^{k+1}$ and update $p$ by maximizing a quadratic function explicitly:
$p^{k+1} = \arg\max_{p \in \mathbb{R}} \; 2p\, L_\sigma(u^{k+1}) - p^2 \langle u^{k+1}, 1 \rangle \;=\; \frac{L_\sigma(u^{k+1})}{\langle u^{k+1}, 1 \rangle}.$ (17)
We list the details of the proposed primal-dual algorithm using threshold dynamics (PD-TD) in Alg. 1.
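A minimal NumPy sketch of the two alternating updates (15)–(17) is given below; the toy region force, the Gaussian scale, the fixed iteration count, and the kernel convention (heat kernel at time $\sigma$) are illustrative assumptions rather than the exact settings of Alg. 1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pd_td(f, lam, sigma=4.0, iters=100):
    """PD-TD sketch: alternate the hard-threshold u-update (16) with the
    closed-form dual update (17) for the primal-dual model (14)."""
    scale = np.sqrt(np.pi / sigma)
    std = np.sqrt(2.0 * sigma)                  # heat-kernel convention for G_sigma
    u = (f < 0).astype(np.float64)              # initial guess from the region force
    p = 0.0
    for _ in range(iters):                      # fixed count keeps the sketch short
        blur = gaussian_filter(1.0 - 2.0 * u, sigma=std, mode='nearest')
        phi = lam * f + 2.0 * p * scale * blur - p**2
        u = (phi <= 0).astype(np.float64)       # hard threshold (16)
        L = scale * np.sum(u * gaussian_filter(1.0 - u, sigma=std, mode='nearest'))
        p = L / max(u.sum(), 1.0)               # dual update (17)
    return u

# Toy example: recover a disk from a noisy region force (f < 0 inside the object).
yy, xx = np.mgrid[:128, :128]
clean = ((xx - 64.0)**2 + (yy - 64.0)**2 <= 30.0**2).astype(np.float64)
f = (0.5 - clean) + 0.3 * np.random.default_rng(0).normal(size=(128, 128))
mask = pd_td(f, lam=1.0)
```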
4.3 Primal-dual algorithm using soft threshold dynamics (PD-STD)
In this section, we propose another primal-dual-based algorithm, similar to the introduced PD-TD algorithm (Alg. 1), which also updates $u$ and $p$ in two steps at each iteration. Different from the PD-TD algorithm of Alg. 1, we apply soft thresholding to $\varphi^k$, instead of the hard thresholding in (16). This allows the result $u$ to stay within $[0,1]$, and the whole procedure is differentiable and can thus be integrated into DNNs as a variational sigmoid layer.
Given $(u^k, p^k)$ at the $k$-th iteration, we first fix $p = p^k$ and compute $u^{k+1}$ by
$u^{k+1} = \arg\min_{u(x) \in [0,1]} \; \langle u, \varphi^k \rangle + \varepsilon \big( \langle u, \log u \rangle + \langle 1-u, \log(1-u) \rangle \big) \;=\; \frac{1}{1 + e^{\varphi^k/\varepsilon}},$ (18)
where an entropy regularization with weight $\varepsilon > 0$ is introduced to the loss function [11].
Second, we fix $u = u^{k+1}$ and introduce a proximal-point step to update $p$:
$p^{k+1} = \arg\max_{p \in \mathbb{R}} \; 2p\, L_\sigma(u^{k+1}) - p^2 \langle u^{k+1}, 1 \rangle - \frac{c}{2}(p - p^k)^2 \;=\; \frac{2 L_\sigma(u^{k+1}) + c\, p^k}{2 \langle u^{k+1}, 1 \rangle + c},$ (19)
with proximal weight $c \ge 0$. The update (19) becomes (17) if we choose $c = 0$. After getting $u^*$, we can obtain a binary solution in $\mathbb{B}$ by thresholding $u^*$. The details of the primal-dual algorithm using soft threshold dynamics (PD-STD) are listed in Alg. 2. Notice that the PD-TD algorithm (Alg. 1) can be taken as a special case of the PD-STD algorithm (Alg. 2) when $\varepsilon \to 0$ and $c = 0$.
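The corresponding PD-STD sketch differs from the PD-TD sketch above only in the two updates: the hard threshold is replaced by the sigmoid soft threshold (18), and the dual step carries a proximal weight as in (19). All hyper-parameters and the toy region force are again illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pd_std(f, lam, sigma=4.0, eps=0.1, c=1.0, iters=100):
    """PD-STD sketch: sigmoid (soft-threshold) u-update and proximal p-update;
    u stays in [0, 1] throughout and every operation is differentiable."""
    scale = np.sqrt(np.pi / sigma)
    std = np.sqrt(2.0 * sigma)                              # heat-kernel convention
    u = 1.0 / (1.0 + np.exp(np.clip(lam * f / eps, -50, 50)))
    p = 0.0
    for _ in range(iters):
        blur = gaussian_filter(1.0 - 2.0 * u, sigma=std, mode='nearest')
        phi = lam * f + 2.0 * p * scale * blur - p**2
        u = 1.0 / (1.0 + np.exp(np.clip(phi / eps, -50, 50)))   # soft threshold (18)
        L = scale * np.sum(u * gaussian_filter(1.0 - u, sigma=std, mode='nearest'))
        p = (2.0 * L + c * p) / (2.0 * u.sum() + c)             # proximal update (19)
    return u

yy, xx = np.mgrid[:128, :128]
clean = ((xx - 64.0)**2 + (yy - 64.0)**2 <= 30.0**2).astype(np.float64)
f = (0.5 - clean) + 0.3 * np.random.default_rng(0).normal(size=(128, 128))
soft = pd_std(f, lam=1.0)
mask = (soft > 0.5).astype(np.float64)      # final binary segmentation by thresholding
```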
4.4 New PD-STD Neural Network
Utilizing the variational explanation of the sigmoid function (10) and the neural network with spatial regularization (9), we introduce a novel neural network which properly integrates the shape-compactness prior as follows:
$v^T = \arg\min_{u(x) \in [0,1]} \; \lambda \langle u, -o \rangle + \varepsilon \big( \langle u, \log u \rangle + \langle 1-u, \log(1-u) \rangle \big) + \frac{L_\sigma(u)^2}{\langle u, 1 \rangle}.$ (20)
In the last layer of the neural network above, we set the regularizer to the approximated shape-compactness prior $L_\sigma(u)^2 / \langle u, 1 \rangle$ as defined in (13).
The proposed PD-STD algorithm helps obtain smooth and compact segmentation regions by unrolling its iterations as network sublayers: for the last layer (20), each iteration of the PD-STD algorithm can be regarded as a layer in a DNN, and the algorithm thus forms the new PD-STD layers shown in Fig. 1, which are used as the last block of our neural network, as shown in Fig. 2. In Fig. 1, we show the layers resulting from Algorithm 2; only the $u$ and $p$ updates are illustrated, and the remaining intermediate computations are part of these two updates and are not shown separately.
Our new network architecture based on (20) is shown in Fig. 2. The network producing the logits in (20) is referred to as the backbone network. The backbone can be any network that produces the input values for Algorithm 2, which is used as the last block of our new network; it can be chosen from various image segmentation networks, including U-Net [3] and SegNet [21].
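The following schematic PyTorch module illustrates one way to realize the PD-STD block of Figs. 1–2 as an unrolled last layer on top of single-channel backbone logits. The kernel size, number of unrolled iterations, scaling constants, and all hyper-parameters are illustrative assumptions, not the exact configuration used in our experiments.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class PDSTDBlock(nn.Module):
    """Unrolled PD-STD iterations used in place of a final sigmoid layer.
    Input: backbone logits of shape (B, 1, H, W); output: soft mask in [0, 1]."""
    def __init__(self, iters=10, lam=1.0, eps=0.1, c=1.0, sigma=2.0, ksize=9):
        super().__init__()
        self.iters, self.lam, self.eps, self.c = iters, lam, eps, c
        self.scale = math.sqrt(math.pi / sigma)
        ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2.0
        g = torch.exp(-(ax[:, None]**2 + ax[None, :]**2) / (4.0 * sigma))
        self.register_buffer("kernel", (g / g.sum()).view(1, 1, ksize, ksize))

    def blur(self, x):
        # Convolution with the normalized discrete Gaussian kernel, zero padding.
        return F.conv2d(x, self.kernel, padding=self.kernel.shape[-1] // 2)

    def forward(self, logits):
        f = -logits                                    # region force from the logits
        u = torch.sigmoid(-self.lam * f / self.eps)    # soft initialization
        p = logits.new_zeros(logits.shape[0], 1, 1, 1)
        for _ in range(self.iters):                    # unrolled PD-STD layers
            phi = self.lam * f + 2 * p * self.scale * self.blur(1 - 2 * u) - p**2
            u = torch.sigmoid(-phi / self.eps)         # soft-threshold update (18)
            L = self.scale * (u * self.blur(1 - u)).sum(dim=(1, 2, 3), keepdim=True)
            A = u.sum(dim=(1, 2, 3), keepdim=True)
            p = (2 * L + self.c * p) / (2 * A + self.c)  # proximal dual update (19)
        return u

# Usage sketch: probs = PDSTDBlock()(backbone_logits), logits of shape (B, 1, H, W).
```

Because every operation in the block is differentiable, gradients flow through the unrolled iterations back into the backbone during training.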
5 Numerical experiments
In this section, experiments on both synthetic and real-world images are conducted to evaluate the performance of our proposed algorithms, which are implemented in Matlab and Python: the PD-TD and PD-STD algorithms in Matlab and the PD-STD-based DNNs in PyTorch. Implementation of the proposed PD-TD and PD-STD algorithms is straightforward, following Alg. 1 and Alg. 2. The two popular neural networks DeepLabV3 [36] and IrisParseNet [39] are taken as backbone networks, to which the introduced PD-STD block is attached to build the new PD-STD-based DNNs as stated in Section 4.4. We compare the new networks with their original counterparts over different datasets to demonstrate the advantages of the PD-STD block, especially on noisy image datasets. Training is conducted on a computer equipped with 4 x NVIDIA Tesla V100-32G GPUs. The convolution with the Gaussian kernel is performed using a classical 2-D discrete Gaussian kernel with zero padding. In this work, the gray value of each image pixel is normalized between 0 and 1. Two types of noise, i.e. Gaussian and salt-and-pepper, are applied in the related experiments. Adding Gaussian noise of a given level to an image means adding zero-mean Gaussian noise with a standard deviation (SD) equal to that level to each pixel, and the noise level of salt-and-pepper noise represents the probability of noise presence in the image.
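The noise conventions above can be reproduced with the small utilities below; the helper names are illustrative, and pixel values are assumed to lie in [0, 1].

```python
import numpy as np

def add_gaussian_noise(img, sd, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sd` to every pixel."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, sd, img.shape), 0.0, 1.0)

def add_salt_and_pepper(img, level, seed=0):
    """Corrupt each pixel with probability `level`: half set to 0, half set to 1."""
    rng = np.random.default_rng(seed)
    r = rng.random(img.shape)
    out = img.copy()
    out[r < level / 2] = 0.0
    out[(r >= level / 2) & (r < level)] = 1.0
    return out
```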
We also compare the proposed algorithms with state-of-the-art algorithms in efficiency, accuracy, and robustness. For the accuracy evaluation of image segmentation, the two metrics of Dice and IoU are applied:
$\mathrm{Dice} = \frac{2\,|A \cap B|}{|A| + |B|}, \qquad \mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|},$
where $A$ is the ground-truth label, $B$ is the computed segmentation region, and $A \cap B$ and $A \cup B$ are their intersection and union, respectively. The following compactness metric is introduced to measure the shape compactness of a segmentation result $S$:
$\mathrm{Compactness}(S) = \frac{|\partial S|^2}{|S|}.$ (21)
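A minimal implementation of the two accuracy metrics for binary masks is given below; the compactness metric (21) can be computed with the same routine as in the sketch of Sec. 2.2, whose perimeter discretization is our own illustrative choice.

```python
import numpy as np

def iou_dice(pred, gt):
    """IoU = |A∩B| / |A∪B| and Dice = 2|A∩B| / (|A| + |B|) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union, 2.0 * inter / (pred.sum() + gt.sum())
```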
5.1 Experiments of the proposed PD-TD and PD-STD algorithms
The experimental results of the proposed PD-TD and PD-STD algorithms, on both synthetic and real-world images, are shown in this section and compared to the ADMM algorithm introduced by Dolz et al. [14] (see Sec. 3.2 for implementation details). As shown in Fig. 3, the segmentation result tends to be more circular as the weight parameter $\lambda$ in (11) becomes smaller, which means the relative weight of the introduced shape-compactness regularization becomes bigger. This is consistent with Lemma 2 and confirms that the introduced shape-compactness regularization works properly for all three algorithms, i.e. ADMM, PD-TD, and PD-STD. Additionally, both the PD-TD and PD-STD algorithms demonstrate the ability to produce more compact segmentation results, with the PD-STD algorithm achieving the best performance.
Fig. 4 illustrates the comparison on real images. Clearly, our proposed algorithms yield more compact segmentation results with smoother boundaries. To quantify the performance, we compare the averaged Dice score, shape compactness, and computational time in Tab. 1. The results clearly indicate the superiority of our proposed methods: the PD-TD and PD-STD algorithms significantly outperform the ADMM algorithm in both compactness and efficiency, and the ADMM algorithm even fails to obtain a compact segmentation region in some cases. In particular, the PD-STD algorithm outperforms the other algorithms by producing results with the highest Dice scores, which directly indicates more accurate segmentation results. Additionally, the PD-STD algorithm exhibits significantly faster computation than the ADMM algorithm. While the PD-TD algorithm has the shortest computational time, its compactness performance does not reach the competitive level achieved by the PD-STD algorithm.
Table 1: Averaged Dice score, shape compactness, and computation time of the compared algorithms on real images.

| Method | Dice | Compactness | Time (s) |
|---|---|---|---|
| PD-TD | 0.8863 | 14.0240 | 0.3936 |
| PD-STD | 0.9631 | 14.0086 | 4.2269 |
| ADMM | 0.9246 | 22.2291 | 35.7028 |
5.2 Experiments for PD-STD-Based DNNs
In this work, the two popular DNNs DeepLabV3 [36] and IrisParseNet [39] are taken as backbone networks, and the PD-STD block is introduced to replace their sigmoid layers, as shown in Fig. 2; this encodes the compact shape prior into the proposed DNNs during training. Experiments show the strong performance of the proposed PD-STD-based DNNs in extracting compact regions from images, especially under high noise.
5.2.1 Implementation Details
We reimplemented the DeepLabV3 and IrisParseNet networks as described in [36] and [39] using PyTorch and strictly adhered to the hyper-parameter settings specified in those papers. We conducted multiple runs with randomly initialized network parameters and selected the best results from each run for both networks. The model deeplabv3_resnet50 from torchvision.models.segmentation is imported with pretrained_backbone=True to load the pre-trained backbone parameters for all networks, so as to accelerate training. In our experiments, we use the ADAM optimizer and the learning rate is set to 0.0001. Both the weight parameter in (18) and the iteration number in Alg. 2 affect the segmentation results: increasing the iteration number enhances the shape compactness of the segmentation results, albeit at the cost of more running time, while the weight parameter influences the speed at which the segmentation result reaches the optimum, with higher values often speeding up the optimization process. Consequently, we select an optimal pair of these two hyper-parameters to achieve rapid convergence and shape compactness jointly in our experiments.
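A schematic training setup consistent with the description above is sketched below. It reuses the PDSTDBlock sketch from Sec. 4.4, assumes binary masks stored as float tensors of shape (B, 1, H, W), and uses a binary cross-entropy loss as an illustrative choice (the loss is not specified here). In newer torchvision releases the pre-trained backbone flag is spelled weights_backbone instead of pretrained_backbone.

```python
import torch
from torchvision.models.segmentation import deeplabv3_resnet50

# Backbone with a single output channel and a pre-trained ResNet-50 backbone.
backbone = deeplabv3_resnet50(pretrained=False, pretrained_backbone=True, num_classes=1)
pd_std = PDSTDBlock(iters=10)                      # unrolled PD-STD block (Sec. 4.4 sketch)
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def training_step(images, masks):
    """One illustrative step: backbone logits -> PD-STD block -> BCE loss."""
    logits = backbone(images)['out']               # shape (B, 1, H, W)
    probs = pd_std(logits)                         # soft segmentation in [0, 1]
    loss = torch.nn.functional.binary_cross_entropy(
        probs.clamp(1e-6, 1 - 1e-6), masks)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```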
5.2.2 Experiments on Iris Dataset
In this section, the Iris dataset UBIRIS.v2, consisting of 483 training images and 436 testing images, is used to show the performance of our proposed PD-STD-based network. This dataset is publicly available on GitHub: https://github.com/xiamenwcy/IrisParseNet. We add different degrees of Gaussian noise and salt-and-pepper noise to the test image dataset to further evaluate the robustness of our proposed networks.
The two networks, DeepLabV3 and our proposed PD-STD + DeepLabV3, are first trained on the clean Iris dataset (without noise), with a fixed iteration number for the PD-STD block. The two networks are then evaluated on both clean and noisy image datasets, using Gaussian noise with standard deviations of 0.01, 0.05, 0.07, and 0.1 as well as salt-and-pepper noise of level 0.01. Experimental results for the proposed method versus the original DeepLabV3 network are shown in Tab. 2, where higher values of the IoU and Dice metrics indicate more accurate results and a Compactness value approaching its minimum ($4\pi \approx 12.57$ for a disk) signifies more compact segmentation regions. Clearly, our proposed PD-STD + DeepLabV3 outperforms DeepLabV3 in all metrics, particularly when segmenting images with higher noise levels. Fig. 5 shows several examples that illustrate the effectiveness of the proposed PD-STD + DeepLabV3 in mitigating the rough boundaries caused by the introduced noise. It is obvious that PD-STD + DeepLabV3 yields more preferable segmentation outcomes with compact shapes. When the Gaussian noise level is relatively large (such as 0.1), DeepLabV3 fails to produce reasonable segmentation results.
Table 2: IoU, Dice, and Compactness of DeepLabV3 and PD-STD + DeepLabV3 trained on the clean Iris dataset and tested under different noise levels.

| Metric | Method | Clean | Gaussian | | | | Salt & Pepper |
|---|---|---|---|---|---|---|---|
| | Noise level | 0 | 0.01 | 0.05 | 0.07 | 0.1 | 0.01 |
| IoU | DeepLabV3 | 0.9096 | 0.9089 | 0.8572 | 0.7315 | 0.3817 | 0.8330 |
| | PD-STD + DeepLabV3 | | | | | | |
| Dice | DeepLabV3 | 0.9439 | 0.9435 | 0.9086 | 0.7979 | 0.4347 | 0.8904 |
| | PD-STD + DeepLabV3 | | | | | | |
| Compactness | DeepLabV3 | 14.1590 | 14.1299 | 15.1426 | 18.7855 | 19.4549 | 16.4107 |
| | PD-STD + DeepLabV3 | | | | | | |
Table 3: IoU, Dice, and Compactness of IrisParseNet and PD-STD + IrisParseNet trained on the clean Iris dataset and tested under different noise levels.

| Metric | Method | Clean | Gaussian | | | | Salt & Pepper |
|---|---|---|---|---|---|---|---|
| | Noise level | 0 | 0.1 | 0.15 | 0.17 | 0.2 | 0.05 |
| IoU | IrisParseNet | 0.8209 | 0.4363 | 0.2769 | 0.1121 | 0.5822 | |
| | PD-STD + IrisParseNet | | | | | | |
| Dice | IrisParseNet | 0.8803 | 0.5357 | 0.3682 | 0.1649 | 0.6817 | |
| | PD-STD + IrisParseNet | | | | | | |
| Compactness | IrisParseNet | 14.8976 | 24.9740 | 44.1741 | 45.6170 | 35.8653 | 61.6093 |
| | PD-STD + IrisParseNet | | | | | | |
Now we attach the proposed PD-STD block to the state-of-the-art IrisParseNet network [39], yielding PD-STD + IrisParseNet, to further show its effectiveness in enforcing the compactness shape prior. In contrast to the general-purpose DeepLabV3 network, IrisParseNet is purposely designed to tackle iris images affected by severe noise. Similar experiments are conducted to compare IrisParseNet [39] and our proposed PD-STD + IrisParseNet, with Gaussian noise levels ranging from 0.1 to 0.2 and a salt-and-pepper noise level of 0.05. Tab. 3 demonstrates that our proposed PD-STD + IrisParseNet exhibits comparable performance in segmenting clean images and significantly outperforms the original IrisParseNet in segmenting noisy images, as illustrated by the examples in Fig. 6.
(Figure: example segmentations on noisy images; columns show the noisy image, ground truth, IrisParseNet, DeepLabV3, IrisParseNet + PD-STD, and DeepLabV3 + PD-STD.)
On the other hand, we also build a training image dataset corrupted by Gaussian noise and compare the performance of the four networks, DeepLabV3, IrisParseNet, and our proposed PD-STD-based variants, trained on the noisy image dataset with that of the same networks trained on the clean image dataset. Tab. 4 presents the segmentation results on clean images and on images with different noise levels, including Gaussian noise levels of 0.01, 0.05, and 0.1, as well as salt-and-pepper noise levels of 0.01 and 0.05. As shown in Tab. 4 and Fig. 7, the performance of DeepLabV3 and IrisParseNet is largely improved when they are trained on the noisy image dataset. However, our proposed PD-STD-based networks still outperform them in all cases, especially in the compactness metric when segmenting images with salt-and-pepper noise.
Table 4: IoU, Dice, and Compactness of the four networks trained on the noisy Iris dataset and tested under different noise levels.

| Metric | Method | Clean | Gaussian | | | Salt & Pepper | |
|---|---|---|---|---|---|---|---|
| | Noise level | 0 | 0.01 | 0.05 | 0.1 | 0.01 | 0.05 |
| IoU | IrisParseNet | 0.8618 | 0.8943 | 0.8995 | 0.9017 | 0.8629 | 0.8491 |
| | DeepLabV3 | | | | | | |
| | PD-STD + IrisParseNet | 0.8880 | 0.8827 | | | | |
| | PD-STD + DeepLabV3 | 0.8959 | 0.6784 | | | | |
| Dice | IrisParseNet | 0.9097 | 0.9330 | 0.9370 | 0.9388 | 0.9108 | 0.9032 |
| | DeepLabV3 | 0.9120 | 0.9186 | 0.9373 | 0.9371 | 0.9215 | 0.7531 |
| | PD-STD + IrisParseNet | 0.9293 | 0.9397 | 0.9263 | | | |
| | PD-STD + DeepLabV3 | 0.9350 | 0.9397 | 0.7617 | | | |
| Compactness | IrisParseNet | 19.1554 | 15.7192 | 15.1328 | 14.7285 | 19.1122 | 19.3267 |
| | DeepLabV3 | 21.0197 | 19.6417 | 15.6534 | 14.6295 | 18.5644 | 19.6172 |
| | PD-STD + IrisParseNet | 15.0566 | 14.5395 | 14.3723 | 14.2044 | 15.3038 | 14.9607 |
| | PD-STD + DeepLabV3 | | | | | | |
5.2.3 Experiments on Fundus dataset
More experiments on the Fundus image dataset [40], which comprises separate training and testing images, are presented in this section. Segmenting the optic disc region is essential for fundus image analysis; however, the segmentation result is often affected by blood vessels and noisy imaging conditions. In our experiments, the training images are cropped to a fixed size, and the training batch size, the number of training epochs, and the iteration number of the PD-STD block are fixed accordingly. Both DeepLabV3 and the proposed PD-STD + DeepLabV3 are trained on the clean Fundus dataset and tested on images with different noise types and levels, namely Gaussian noise levels of 0.01, 0.05, 0.07, and 0.1 and a salt-and-pepper noise level of 0.001. The experimental results obtained by the baseline DeepLabV3 and our proposed PD-STD + DeepLabV3 are presented in Tab. 5. The proposed PD-STD + DeepLabV3 consistently outperforms DeepLabV3 in terms of the IoU, Dice, and Compactness metrics across all noise levels. When tested on images with high Gaussian noise levels, the proposed PD-STD + DeepLabV3 significantly improves the accuracy of the segmentation results in terms of IoU and Dice. Fig. 8 visually exhibits the superior performance of our proposed PD-STD + DeepLabV3.
Table 5: IoU, Dice, and Compactness of DeepLabV3 and PD-STD + DeepLabV3 trained on the clean Fundus dataset and tested under different noise levels.

| Metric | Method | Clean | Gaussian | | | | Salt & Pepper |
|---|---|---|---|---|---|---|---|
| | Noise level | 0 | 0.01 | 0.05 | 0.07 | 0.1 | 0.001 |
| IoU | DeepLabV3 | 0.9028 | 0.9028 | 0.8648 | 0.8254 | 0.7681 | 0.8220 |
| | PD-STD + DeepLabV3 | | | | | | |
| Dice | DeepLabV3 | 0.9482 | 0.9483 | 0.9263 | 0.9028 | 0.8673 | 0.9008 |
| | PD-STD + DeepLabV3 | | | | | | |
| Compactness | DeepLabV3 | 14.3698 | 14.3200 | 14.7309 | 15.5674 | 17.2642 | 15.6184 |
| | PD-STD + DeepLabV3 | | | | | | |
Noise level of test image dataset | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|
Methods | clean | |||||||||
IoU | DeepLabV3 | |||||||||
PD-STD+DeepLabV3 | ||||||||||
DeepLabV3* | ||||||||||
PD-STD+DeepLabV3* | ||||||||||
Dice | DeepLabV3 | |||||||||
PD-STD+DeepLabV3 | ||||||||||
DeepLabV3* | ||||||||||
PD-STD+DeepLabV3* | ||||||||||
Comp. | DeepLabV3 | |||||||||
PD-STD+DeepLabV3 | ||||||||||
DeepLabV3* | ||||||||||
PD-STD+DeepLabV3* |
5.3 Progressive Training Strategy
In this work, a progressive training strategy is also used to train the neural networks: their parameters are updated progressively from an image dataset with a low noise level to one with a high noise level. As shown in [23], such a progressive training strategy improves the robustness and reliability of the trained neural networks, compared to the direct training strategy that trains neural networks directly on an image dataset with a high noise level. A comprehensive comparison of the direct training strategy and the introduced progressive training strategy is shown in Tab. 6: the networks DeepLabV3 and our proposed PD-STD+DeepLabV3 are trained by the two training strategies at a high Gaussian noise level, respectively; the networks DeepLabV3 and PD-STD+DeepLabV3 trained by the progressive strategy are denoted by DeepLabV3* and PD-STD+DeepLabV3*, respectively.
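The progressive strategy can be sketched as a plain outer loop over increasing noise levels, reusing the noise utility and training step from the earlier sketches; the noise schedule, the epochs per level, and the train_loader are illustrative assumptions.

```python
import torch

noise_levels = [0.0, 0.1, 0.2, 0.3]         # illustrative increasing schedule
epochs_per_level = 20

for sd in noise_levels:                      # parameters carry over between levels
    for epoch in range(epochs_per_level):
        for images, masks in train_loader:   # assumed loader of clean images + masks
            noisy = torch.from_numpy(add_gaussian_noise(images.numpy(), sd)).float()
            loss = training_step(noisy, masks)
```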
(Figure: example segmentations; columns show the noisy image, ground truth, DeepLabV3, PD-STD+DeepLabV3, DeepLabV3*, and PD-STD+DeepLabV3*.)
It is easy to see that the progressive training strategy does significantly improve the performance of the trained networks in terms of IoU, Dice, and Compactness, and that the networks DeepLabV3* and PD-STD+DeepLabV3*, which are trained in the progressive way, perform much more robustly and reliably on test images of different noise levels. Fig. 9 clearly illustrates some examples: the networks trained by the direct training strategy at a high noise level perform much worse on test images with a low noise level.
clean | DeepLabV3 | ||||||||
---|---|---|---|---|---|---|---|---|---|
PD-STD+DeepLabV3 | |||||||||
0.1 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.2 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.3 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.4 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.5 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.6 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.7 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 | |||||||||
0.8 | DeepLabV3 | ||||||||
PD-STD+DeepLabV3 |
The progressive training strategy is employed for the Iris dataset from the clean images gradually up to a noise level of 0.8 with a step size of 0.1, and its extensive results are shown in Tab. 7. Given the circular shape of the iris, the compactness of the optimal segmentation result tends to $4\pi \approx 12.57$, so a compactness value closer to $4\pi$ indicates a more circular, and hence better, segmentation region. In view of this, our proposed PD-STD+DeepLabV3 performs better than DeepLabV3 in both accuracy and robustness when trained at a fixed image noise level and tested at different noise levels, and likewise when trained at different noise levels and tested at a fixed noise level; in all cases, our proposed PD-STD+DeepLabV3 achieves better results than DeepLabV3. Fig. 10 (a) and (b) show the performance, in terms of IoU, of two such experiments in which the networks are trained at a fixed image noise level and tested at different noise levels. With the help of the introduced PD-STD block, i.e. by incorporating the compact shape prior, our proposed PD-STD+DeepLabV3 obtains more accurate results than DeepLabV3 across the test noise levels. Moreover, as shown in (a), when both networks are trained at the same image noise level, DeepLabV3 achieves results close to those of our proposed PD-STD+DeepLabV3 at low test noise levels, but performs much worse than PD-STD+DeepLabV3 at high test noise levels. The proposed PD-STD+DeepLabV3 exhibits better robustness than DeepLabV3 in both (a) and (b), which enables PD-STD-based networks to properly reduce the influence of high noise levels and data variability in real-world images. Fig. 11 provides three illustrative examples showing that the proposed PD-STD network block can effectively incorporate proper shape information, thus ensuring more reasonable segmentation results in the presence of noise and data variability.
As the above experiments show, by training the networks on image datasets with progressively increasing noise levels, the networks gradually adapt to different noisy images and retain representations of images at different noise levels, which enhances their ability to handle image noise and improves their performance on datasets with various noise levels. The progressive training strategy is therefore essential for improving the trained networks' accuracy and robustness across different image noise levels.
(Fig. 10: IoU of DeepLabV3 and PD-STD+DeepLabV3 versus test image noise level; panels (a) and (b) correspond to two fixed training noise levels.)
6 Conclusion
In this paper, we proposed two novel algorithms, PD-TD and PD-STD, to solve the challenging image segmentation problem with a high-order shape-compactness prior, which essentially penalizes the ratio of the squared perimeter to the area. The new algorithms, based on a novel primal-dual model equivalent to the studied optimization problem, outperform existing methods in numerical simplicity and effectiveness. Meanwhile, a new PD-STD block is introduced to replace the often-used sigmoid layer of backbone DNNs; it properly integrates the shape-compactness information into the neural network and enforces compact regions in the segmentation results. Extensive experiments, especially on highly noisy image datasets, show that the proposed PD-STD-based neural networks significantly outperform state-of-the-art DNNs in both robustness and accuracy. The PD-STD block can also be applied to many other DNN models besides the DeepLabV3 and IrisParseNet networks used in this work.
Acknowledgements
We acknowledge the following funding sources that supported this work. Dr. Hao Liu was partially supported by the Natural Science Foundation of China (Grant No. 12201530) and the HKRGC ECS (Grant No. 22302123). Professor Jing Yuan acknowledges the support of the National Natural Science Foundation of China (NSFC) under Grant No. 61877047, as well as the Distinguished Professorship Start-up Funding of Zhejiang Normal University (No. YS304022908). Professor Xue-cheng Tai was partially supported by the NORCE Kompetanseoppbygging program.
References
- [1] G. Máttyus, W. Luo, R. Urtasun, Deeproadmapper: Extracting road topology from aerial images, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3438–3446.
- [2] K. Kamnitsas, E. Ferrante, S. Parisot, C. Ledig, A. V. Nori, A. Criminisi, D. Rueckert, B. Glocker, Deepmedic for brain tumor segmentation, in: A. Crimi, B. Menze, O. Maier, M. Reyes, S. Winzeck, H. Handels (Eds.), Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Springer International Publishing, Cham, 2016, pp. 138–149.
- [3] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]). URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a
- [4] F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571.
- [5] M. Kampffmeyer, A.-B. Salberg, R. Jenssen, Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 1–9.
- [6] Z. Pan, J. Xu, Y. Guo, Y. Hu, G. Wang, Deep learning segmentation and classification for urban village using a worldview satellite image based on u-net, Remote Sensing 12 (10) (2020) 1574.
- [7] D. B. Mumford, J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Communications on Pure and Applied Mathematics (1989).
- [8] T. F. Chan, L. A. Vese, Active contours without edges, IEEE Transactions on Image Processing 10 (2) (2001) 266–277.
- [9] R. B. Potts, Some generalized order-disorder transformations, in: Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 48, Cambridge University Press, 1952, pp. 106–109.
- [10] J. Liu, X.-C. Tai, S. Luo, Convex shape prior for deep neural convolution network based eye fundus images segmentation, arXiv preprint arXiv:2005.07476 (2020).
- [11] J. Liu, X. Wang, X.-c. Tai, Deep convolutional neural networks with spatial regularization, volume and star-shape priors for image segmentation, Journal of Mathematical Imaging and Vision (2022) 1–21.
- [12] O. Veksler, Star shape prior for graph-cut image segmentation, in: Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part III 10, Springer, 2008, pp. 454–467.
- [13] I. Ben Ayed, M. Wang, B. Miles, G. J. Garvin, Tric: Trust region for invariant compactness and its application to abdominal aorta segmentation, in: P. Golland, N. Hata, C. Barillot, J. Hornegger, R. Howe (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Springer International Publishing, Cham, 2014, pp. 381–388.
- [14] J. Dolz, I. Ben Ayed, C. Desrosiers, Unbiased shape compactness for segmentation, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2017, Springer International Publishing, 2017, pp. 755–763.
- [15] O. Veksler, Stereo matching by compact windows via minimum ratio cycle, in: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 1, 2001, pp. 540–547 vol.1. doi:10.1109/ICCV.2001.937563.
- [16] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel, Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems 2 (1989).
- [17] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (2012).
- [18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs (2016). arXiv:1412.7062.
- [19] M. Yang, K. Yu, C. Zhang, Z. Li, K. Yang, Denseaspp for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3684–3692.
- [20] A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, Enet: A deep neural network architecture for real-time semantic segmentation, arXiv preprint arXiv:1606.02147 (2016).
- [21] V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.
- [22] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, Unet++: Redesigning skip connections to exploit multiscale features in image segmentation, IEEE transactions on medical imaging 39 (6) (2019) 1856–1867.
- [23] X.-C. Tai, H. Liu, R. Chan, PottsMGNet: A mathematical explanation of encoder-decoder based neural networks, SIAM Journal on Imaging Sciences 17 (1) (2024) 540–594.
- [24] H. Liu, J. Liu, R. Chan, X.-C. Tai, Double-well net for image segmentation, arXiv preprint arXiv:2401.00456 (2023).
- [25] S. Geman, D. Geman, Stochastic relaxation, gibbs distributions, and the bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 721–741.
- [26] Y. Boykov, O. Veksler, R. Zabih, Markov random fields with efficient approximations, in: Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. 98CB36231), IEEE, 1998, pp. 648–655.
- [27] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 1222 – 1239.
- [28] T. F. Chan, S. Esedoglu, M. Nikolova, Algorithms for finding global minimizers of image segmentation and denoising models, SIAM journal on applied mathematics 66 (5) (2006) 1632–1648.
- [29] J. Yuan, E. Bae, X.-C. Tai, Y. Boykov, A continuous max-flow approach to potts model, in: European Conference on Computer Vision, Springer, 2010, pp. 379–392.
- [30] A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision 40 (1) (2011) 120–145.
- [31] E. Esser, X. Zhang, T. F. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM Journal on Imaging Sciences 3 (4) (2010) 1015–1046.
- [32] C. Wu, X.-C. Tai, Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models, SIAM Journal on Imaging Sciences 3 (2010) 300–339.
- [33] X.-C. Tai, L. Li, E. Bae, The Potts model with different piecewise constant representations and fast algorithms: a survey, Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision (2021) 1–41.
- [34] R. Osserman, The isoperimetric inequality, Bulletin of the American Mathematical Society 84 (6) (1978) 1182–1238.
- [35] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock, An introduction to total variation for image analysis, Theoretical Foundations and Numerical Methods for Sparse Recovery 9 (263-340) (2010) 227.
- [36] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).
- [37] R. Acar, C. R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems, Inverse Problems 10 (6) (1994) 1217.
- [38] D. Wang, H. Li, X. Wei, X.-P. Wang, An efficient iterative thresholding method for image segmentation, Journal of Computational Physics 350 (2017) 657–667.
- [39] C. Wang, J. Muhammad, Y. Wang, Z. He, Z. Sun, Towards complete and accurate iris segmentation using deep multi-task attention network for non-cooperative iris recognition, IEEE Transactions on Information Forensics and Security 15 (2020) 2944–2959. doi:10.1109/TIFS.2020.2980791.
- [40] K. Jin, X. Huang, J. Zhou, Y. Li, Y. Yan, Y. Sun, Q. Zhang, Y. Wang, J. Ye, Fives: A fundus image dataset for artificial intelligence based vessel segmentation, Scientific Data 9 (1) (2022) 475.