
Deep Convolutional Neural Networks Meet Variational Shape Compactness Priors for Image Segmentation

Kehui Zhang, Lingfeng Li, Hao Liu, Jing Yuan, Xue-Cheng Tai
Abstract

Shape compactness is a key geometrical property for describing regions of interest in many image segmentation tasks. In this paper, we propose two novel algorithms to solve image segmentation problems that incorporate a shape-compactness prior. Existing algorithms for such problems often suffer from computational inefficiency, difficulty in reaching a good local minimum, and the need to fine-tune many hyperparameters. To address these issues, we propose a novel optimization model along with its equivalent primal-dual model and introduce a new optimization algorithm based on primal-dual threshold dynamics (PD-TD). Additionally, we relax the solution constraint and propose another novel primal-dual soft threshold-dynamics algorithm (PD-STD) to achieve superior performance. Based on the variational explanation of the sigmoid layer, the proposed PD-STD algorithm can be integrated into Deep Neural Networks (DNNs) to enforce compact regions as image segmentation results. Extensive experiments demonstrate that the proposed algorithms outperform state-of-the-art methods in numerical efficiency and effectiveness, especially when applied to the popular networks DeepLabV3 and IrisParseNet, with higher IoU, Dice, and compactness metrics on noisy Iris datasets. In particular, the proposed algorithms significantly improve IoU by 20% when training on a highly noisy image dataset.

Keywords: Shape compactness, Image segmentation, Deep neural networks, Threshold dynamics
journal: Neural Networks
Affiliations:
[inst1] Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
[inst2] Hong Kong Center for Cerebro-Cardiovascular Health Engineering, Sha Tin, Hong Kong, China
[inst3] College of Mathematical Medicine, Zhejiang Normal University, Jinhua, Zhejiang, China
[inst4] Norwegian Research Center, Bergen, Norway

1 Introduction

Image segmentation is a fundamental task in computer vision that partitions a digital image into distinct non-overlapping subregions, with a wide spectrum of real-world applications such as autonomous driving [1], medical image segmentation [2; 3; 4], and satellite image processing [5; 6]. In this respect, the variational method often provides a popular and mathematically explainable approach by formulating image segmentation as a minimization model whose energy function usually contains a data fidelity term and a boundary regularization term. Often-used variational models include the Mumford-Shah model [7], the Chan-Vese model [8], and the Potts model [9]. However, when the image is corrupted by noise or the object of interest is partially occluded, such variational models may not produce desirable segmentation results; hence, task-dependent prior knowledge is incorporated into the variational model by adding extra penalization functions or hard constraints, so as to improve the final results. Many shape priors have been studied in the literature, such as the convexity prior [10], the shape volume prior [11], the star-shape prior [12], and the compactness prior [13; 14].

In this work, we study the variational image segmentation model with a shape-compactness prior. The shape-compactness prior helps to obtain a more compact segmentation mask with smooth boundaries by introducing an additional penalization term that geometrically describes a compact region [13; 14]. Previous studies have explored different definitions of shape compactness. The perimeter-to-area ratio was proposed in [15] to describe the compactness of segmented regions. The authors of [13] introduced a new compactness term that is invariant to scale, rotation, and translation. The authors of [14] first integrated compactness, in terms of the ratio of the squared perimeter to the area, into image segmentation to ensure size invariance. In addition, it is well known that the ratio of squared perimeter to area achieves its minimum when the shape is a disk, which implies a preference for circular segments in the result. However, adding this ratio leads to a challenging non-convex optimization problem. The authors of [14] proposed an algorithm based on the alternating direction method of multipliers (ADMM), which is, however, computationally expensive and requires fine-tuning of many hyper-parameters.

Given the great success of deep learning-based (DL-based) methods in many different applications during the past decades, many neural network architectures have been proposed for image processing, such as LeNet [16], AlexNet [17], and U-Net [3]. In a series of recent works [18; 19; 20], dilated convolutions, which increase the receptive field without a significant increase in computational cost, are used to achieve state-of-the-art segmentation performance. Specifically, U-Net, as proposed in [3], employs an encoder-decoder structure that concatenates multiscale features, allowing effective image segmentation. Such an architecture has inspired several works, such as SegNet [21] and U-Net++ [22]. In [23], a novel neural network framework, the so-called PottsMGNet, is proposed to solve the variational Potts model of image segmentation by leveraging operator-splitting methods, multigrid methods, and control variables. Moreover, PottsMGNet shows that most encoder-decoder-based networks are equivalent to optimal control problems for certain initial value problems using operator-splitting methods as solvers. Although DL-based methods have achieved remarkable performance for image segmentation, integrating spatial prior information into such methods remains a challenge. Recently, the combination of shape priors with deep learning models was explored such that semantic features are automatically extracted from large datasets using neural networks. Together with the Potts model, operator-splitting methods, and a double-well potential, the U-Net architecture is used in [24] to segment images with length penalties. In [11], a method is proposed to incorporate spatial priors by introducing the variational explanation of the sigmoid function, which allows the integration of shape priors, such as convexity and star shape, into deep learning-based approaches. Motivated by this, this paper aims to incorporate compactness priors into DL-based methods.

In this paper, we consider a non-convex optimization model for image segmentation with the shape-compactness prior, for which we introduce two primal-dual-based (PD-based) optimization algorithms using threshold dynamics (TD) and soft threshold dynamics (STD), respectively. The proposed algorithms enjoy advantages in both theory and numerical solvers. Moreover, the introduced PD-STD algorithm can be easily integrated into many popular neural networks, such as DeepLabV3 and IrisParseNet, and significantly improves their performance when segmenting noisy images. The main contributions of this work are as follows:

  • We study the Potts model jointly with a shape-compactness prior for image segmentation, and introduce its novel equivalent primal-dual and dual models.

  • Two novel optimization algorithms, namely the PD-TD (Primal-Dual Threshold Dynamics) and the PD-STD (Primal-Dual Soft Threshold Dynamics) algorithms, are proposed based on the introduced primal-dual and dual models; they outperform previous methods in efficiency and accuracy.

  • Motivated by [11], the PD-STD algorithm can be integrated into the structures of deep neural networks. The new networks demonstrate better robustness when segmenting circularly shaped objects, especially in noisy images.

This work is organized as follows. In Sec. 2, we review classical variational segmentation models, especially those with a shape-compactness prior, as well as recent deep learning-based image segmentation methods. We introduce a new variational model in Sec. 3, and propose its novel equivalent models and corresponding algorithms in Sec. 4. In Sec. 5, experimental results are presented to verify the efficiency, accuracy, and robustness of our proposed algorithms. Conclusions are given in Sec. 6.

2 Preliminaries

2.1 Variational segmentation models

Suppose $\Omega$ is a compact image domain in $\mathbb{R}^2$. The objective of binary image segmentation is to divide the input image $\mathcal{P}(x):\Omega\to\mathbb{R}$ into two distinct regions, i.e., the foreground region $\Omega_0$ and the background region $\Omega\setminus\Omega_0$; that is, each pixel $x\in\Omega$ is assigned to either the foreground or the background. One of the most popular image segmentation models is the Potts model [9], which originated from statistical mechanics as a representation of interactions between spins on a crystalline lattice, and was well studied in the discrete optimization setting by Geman [25] and Boykov et al. [26; 27]. During recent decades, the extension of the Potts model to the continuous optimization setting has gained significant attention in image segmentation; it aims to minimize the following energy function:

$$\min_{\Omega_0}E_{\mathrm{Potts}}(\Omega_0):=\int_{\Omega_0}f(x)\,dx+\alpha\,|\partial\Omega_0|,\qquad(1)$$

where $f(x)$ is the data fidelity term at each pixel $x\in\Omega$, depending on the given image information $\mathcal{P}(x)$, and the second term $|\partial\Omega_0|$, named the edge force term, measures the boundary length of the foreground region $\Omega_0$. Clearly, the second term serves as the regularization term promoting smooth and well-defined boundaries.

Let $u(x)$ be the discrete-valued labeling function $u:\Omega\to\{0,1\}$ in the set $S$ such that

$$S=\left\{u\in L^1(\Omega)\;\middle|\;u(x)\in\{0,1\},\ \forall x\in\Omega\right\},\quad\text{where}\quad u(x)=\begin{cases}1,& x\in\Omega_0,\\ 0,& x\notin\Omega_0.\end{cases}\qquad(2)$$

The optimization model (1) can, therefore, be rewritten as

$$\min_{u\in S}\int_{\Omega}f(x)u(x)\,dx+\alpha\int_{\Omega}|\nabla u(x)|\,dx,\qquad(3)$$

where the total-variation term gives the estimation of the perimeter of the foreground region, i.e., $|\partial\Omega_0|$, in a continuous space setting.

Since the labeling function $u(x)$ is discrete-valued, the optimization problem (3) is non-convex and thus difficult to solve directly. To address this, its convex relaxation is studied:

$$\min_{u\in\mathcal{K}}\int_{\Omega}f(x)u(x)\,dx+\alpha\int_{\Omega}|\nabla u(x)|\,dx,\qquad(4)$$

where

$$\mathcal{K}=\left\{u\in L^1(\Omega)\;\middle|\;u(x)\in[0,1],\ \forall x\in\Omega\right\}.\qquad(5)$$

It can be proved that a global optimum of the non-convex discrete optimization problem can be easily obtained by simply thresholding the optimum of its convex relaxation [28; 29]. Multiple efficient algorithms have been developed to address the convex optimization problem (4), based on primal-dual optimization [30; 31; 32] and dual optimization [29], respectively. More details about the proposed algorithms for solving the Potts model, and relevant references, can be found in a recent survey paper [33].

2.2 Shape compactness prior

In practice, shape prior information is often incorporated into variational image segmentation models so as to further constrain the solution space and improve the accuracy and robustness of the computed results. Various types of shape priors, such as convexity and star shape, have been proposed and explored in variational image segmentation models [11; 10]. In many real-world image segmentation tasks, it is of great interest to segment compact target regions from the given images. The key challenges lie in how to represent such region compactness in a mathematically effective and concise way and how to solve the resulting model efficiently in numerics. This is one of the main research topics of this study.

Given the isoperimetric inequality [34] for the region $\Omega_0$, we have
$$\mathrm{Per}(\Omega_0)\geq 2\sqrt{\pi\,\mathrm{Area}(\Omega_0)},$$
and the equality holds if and only if $\Omega_0$ is a ball. Clearly, the ratio of $\mathrm{Per}^2(\Omega_0)$ to $\mathrm{Area}(\Omega_0)$ is always larger than or equal to $4\pi$, and the minimum is attained if and only if $\Omega_0$ is a ball. Its minimization thus enforces the segmentation result to be a single compact region. Therefore, we define the following measure of shape compactness for the region $\Omega_0$:
$$\frac{\mathrm{Per}^2(\Omega_0)}{\mathrm{Area}(\Omega_0)}.$$

Let $u(x):\Omega\to\mathbb{R}$ be the indicator function of the region $\Omega_0\subset\mathbb{R}^2$:
$$u(x)=\begin{cases}1,& x\in\Omega_0,\\ 0,& x\notin\Omega_0.\end{cases}$$
It is well known that, for the perimeter of $\Omega_0$, we have $\mathrm{Per}(\Omega_0)=|u|_{TV}$ [35], where $|u|_{TV}$ is the total variation of $u(x)$ such that
$$|u|_{TV}=\sup_{v\in\mathcal{V}}\,-\int_{\Omega}u\,\mathrm{div}(v)\,dx,$$
where
$$\mathcal{V}=\left\{v\in\left(C_0^1(\Omega)\right)^2,\ |v(x)|\leq 1\ \ \forall x\in\Omega\right\}.$$
Thus, the shape compactness of $\Omega_0$ can be properly reformulated as
$$\frac{(|u|_{TV})^2}{\int_{\Omega}u(x)\,dx}.\qquad(6)$$

The area of $\Omega_0$ is assumed to be non-zero in this paper.
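To make this measure concrete, the following minimal NumPy sketch evaluates (6) on a discrete binary mask; the forward-difference isotropic total variation used here is one common discretization and an illustrative choice, not the paper's prescribed one:

```python
import numpy as np

def compactness(u):
    """Shape-compactness measure (6): TV(u)^2 / area(u) for a binary mask u."""
    area = u.sum()
    if area == 0:
        return 4.0 * np.pi  # convention C(u) = 4*pi for u = 0, cf. Sec. 3
    # isotropic total variation via forward differences
    gx = np.zeros_like(u, dtype=float)
    gy = np.zeros_like(u, dtype=float)
    gx[:, :-1] = np.diff(u, axis=1)
    gy[:-1, :] = np.diff(u, axis=0)
    tv = np.sqrt(gx ** 2 + gy ** 2).sum()
    return tv ** 2 / area

# A filled disk scores close to the isoperimetric minimum 4*pi ~ 12.57,
# while elongated or fragmented shapes score much higher.
yy, xx = np.mgrid[:256, :256]
disk = ((yy - 128.0) ** 2 + (xx - 128.0) ** 2 <= 80.0 ** 2).astype(float)
print(compactness(disk))
```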

2.3 Deep learning-based segmentation models

This work also incorporates the shape-compactness prior, as defined in (6), into modern neural networks such as DNNs. With their great ability to learn hierarchical representations of data through multiple layers of interconnected neurons, DNNs have achieved remarkable success in capturing complex patterns, obtained state-of-the-art performance in various challenging tasks ranging from computer vision to natural language processing, and emerged as a widely used tool in the field of artificial intelligence. We review the often-used structure of DNNs and discuss the motivation for integrating shape prior information into DNNs.

DNNs for image segmentation can be expressed as a parametrized nonlinear operator $\mathcal{N}_{\boldsymbol{\Theta}}(\boldsymbol{v}^{0})$ for a multilayer neural network, where $\boldsymbol{v}^{0}$ denotes the input image to the network and the output of each network layer is denoted by $\boldsymbol{v}^{t}$, $t=1,\dots,T$. The structures of neural networks are typically compositions of such layers, e.g., DeepLabV3 [36]:

$$\begin{cases}\boldsymbol{o}^{t}=\mathcal{T}_{\boldsymbol{\Theta}^{t-1}}(\boldsymbol{v}^{t-1},\boldsymbol{v}^{t-2},\dots,\boldsymbol{v}^{0}),& t=1,2,\dots,T,\\ \boldsymbol{v}^{t}=\mathcal{A}^{t}(\boldsymbol{o}^{t}),& t=1,2,\dots,T.\end{cases}\qquad(7)$$

In (7), the activation function $\mathcal{A}^{t}$ can be chosen as many different operators, such as ReLU, sigmoid, and tanh; it can also include downsampling and upsampling operators and their compositions. In the final layer, $\mathcal{A}^{T}$ is typically a soft classification activation function, i.e., sigmoid or softmax. The operator $\mathcal{T}_{\boldsymbol{\Theta}^{t-1}}(\boldsymbol{v}^{t-1},\boldsymbol{v}^{t-2},\dots,\boldsymbol{v}^{0})$ describes the connections between the $t$-th layer $\boldsymbol{v}^{t}$ and its previous layers $\boldsymbol{v}^{t-1},\boldsymbol{v}^{t-2},\dots,\boldsymbol{v}^{0}$. For example, in a simple convolutional network, $\boldsymbol{v}^{t}$ is associated only with its previous layer $\boldsymbol{v}^{t-1}$ and $\mathcal{T}_{\boldsymbol{\Theta}^{t-1}}(\boldsymbol{v}^{t-1})=\boldsymbol{\omega}^{t-1}*\boldsymbol{v}^{t-1}+\boldsymbol{b}^{t-1}$, where $\boldsymbol{\omega}^{t-1}$ and $\boldsymbol{b}^{t-1}$ are the convolution kernel and the bias of an affine transformation, respectively. Let $\boldsymbol{\Theta}$ be the collection of learnable parameters:
$$\boldsymbol{\Theta}=\left\{\boldsymbol{\Theta}^{t}=(\boldsymbol{\omega}^{t},\boldsymbol{b}^{t})\;\middle|\;t=0,1,\dots,T-1\right\}.$$

For the problem of binary image segmentation, the sigmoid function is often chosen as the activation function $\mathcal{A}^{T}$ of the final layer in DNNs:
$$S(o^{T})(x)=\frac{1}{1+e^{-o^{T}(x)}},\qquad(8)$$
where $o^{T}:\Omega\to\mathbb{R}$ is the logit output of the neural network; $S(o^{T})$ maps the logits from $(-\infty,+\infty)$ to $[0,1]$ and thus yields output probabilities.

However, integrating high-level spatial priors into the often-used DNNs is still a challenge. In this respect, Liu et al. [10] proposed an interesting approach that naturally enforces a classical spatial regularization term within DNNs:

$$\begin{cases}o^{t}=\mathcal{T}_{\boldsymbol{\Theta}^{t-1}}(v^{t-1},v^{t-2},\dots,v^{0}),& t=1,2,\dots,T,\\ v^{t}=\mathcal{A}^{t}(o^{t}),& t=1,2,\dots,T-1,\\ v^{T}=\arg\min_{u\in\mathcal{K}}\left\{\lambda\langle -o^{T},u\rangle+\epsilon\langle u,\log(u)\rangle+\epsilon\langle 1-u,\log(1-u)\rangle+\mathcal{R}(u)\right\}.\end{cases}\qquad(9)$$

It is motivated by the fact that the sigmoid function can be regarded as the minimizer of a dual function of the entropy-regularized Rectified Linear Unit (ReLU):
$$\arg\min_{u\in\mathcal{K}}\left\{\langle -o,u\rangle+\epsilon\langle u,\log(u)\rangle+\epsilon\langle 1-u,\log(1-u)\rangle\right\},\qquad(10)$$
when the entropic regularization parameter $\epsilon=1$. Clearly, the last layer of (9) just represents a total-variation regularized sigmoid function.
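This variational view of the sigmoid is easy to check numerically. The short sketch below minimizes the pointwise objective of (10) for a single logit with $\epsilon=1$ and compares the result with the sigmoid value; the logit value is arbitrary:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def entropic_energy(u, o, eps=1.0):
    # pointwise objective of (10): <-o,u> + eps<u,log u> + eps<1-u,log(1-u)>
    return -o * u + eps * (u * np.log(u) + (1.0 - u) * np.log(1.0 - u))

o = 1.7  # an arbitrary logit
u_star = minimize_scalar(entropic_energy, bounds=(1e-9, 1.0 - 1e-9),
                         args=(o,), method="bounded").x
print(u_star, 1.0 / (1.0 + np.exp(-o)))  # both ~= 0.8455
```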

Inspired by [10], we propose to incorporate the studied shape-compactness prior into DNNs, so as to promote the spatial compactness of the segmentation results of DNNs.

3 Image segmentation with compactness prior

In this work, let $C(u)$ be the spatial compactness regularization of a function $u(x)\in S$, where the binary set $S$ is defined in Sec. 2.1, as below:
$$C(u)=\begin{cases}4\pi,& u(x)\equiv 0,\\[2pt] (|u|_{TV})^{2}\Big/\int_{\Omega}u(x)\,dx,&\text{otherwise}.\end{cases}$$

Given the classical image segmentation model (3), we consider replacing its length-minimization term, i.e., the total-variation function, by the above shape-compactness prior, and propose the new image segmentation model as follows:
$$\min_{u\in S}E_{\lambda}(u):=\lambda\langle f,u\rangle+C(u),\qquad(11)$$
where the region force $f(x)\in L^{\infty}(\Omega)$ and $\langle\cdot,\cdot\rangle$ denotes the $L^{2}$ inner product.

3.1 Analysis of the optimization model (11)

We first show that a minimizer of the novel shape-compactness regularized image segmentation model (11) does exist, and that the minimizer tends to a circular region when the parameter $\lambda$ in (11) becomes small enough.

Proposition 1

Suppose $\{u_n\}_{n=1}^{\infty}$ is a sequence of functions in $S$ and $u_n\rightarrow u^*$ in $L^1(\Omega)$. Then $C(u^*)\leq\liminf_{n\rightarrow\infty}C(u_n)$.

Proof 1

Since $S$ is closed, $u^*$ is also in $S$. If $u^*=0$, then $C(u^*)=4\pi\leq\liminf_{n\rightarrow\infty}C(u_n)$ follows directly from the isoperimetric inequality. If $u^*\neq 0$, then $L^1$ convergence gives $\lim_{n\rightarrow\infty}\int_{\Omega}u_n\,dx=\int_{\Omega}u^*\,dx$, and the lower semi-continuity of the total variation [37] gives $|u^*|_{TV}\leq\liminf_{n\rightarrow\infty}|u_n|_{TV}$. Thus, $C(u^*)\leq\liminf_{n\rightarrow\infty}C(u_n)$. $\square$

Lemma 1 (Existence of minimizer)

For any $\lambda>0$, there exists a minimizer of $E_{\lambda}(u)$ in $S$.

Proof 2

Suppose $\{u_n\}_{n=1}^{\infty}$ is a minimizing sequence of $E_{\lambda}$ in $S$ such that $\lim_{n\rightarrow\infty}E_{\lambda}(u_n)=\inf_{v\in S}E_{\lambda}(v)$. Then, there exists a constant $M$ such that
$$M\geq E_{\lambda}(u_n)\geq-\lambda\|f\|_{\infty}|\Omega|+\frac{(|u_n|_{TV})^{2}}{|\Omega|}\quad\Longrightarrow\quad|u_n|_{TV}\leq\sqrt{M|\Omega|+\lambda\|f\|_{\infty}|\Omega|^{2}}.$$
In addition, we have $\|u_n\|_{L^1}\leq|\Omega|$. Thus, the BV norm of $u_n$,
$$\|u_n\|_{BV}=\|u_n\|_{L^1}+|u_n|_{TV},$$
is uniformly bounded for $n=1,2,\dots$.

By Theorem 2.5 of [37], there exists a subsequence $\{u_{n_k}\}_{k=1}^{\infty}$ converging to a function $u^*$ in $L^1(\Omega)$. Since $S$ is closed, $u^*$ also belongs to $S$. By Proposition 1, we have $C(u^*)\leq\liminf_{k\rightarrow\infty}C(u_{n_k})$. Then,
$$\inf_{v\in S}E_{\lambda}(v)\leq E_{\lambda}(u^*)\leq\liminf_{k\rightarrow\infty}E_{\lambda}(u_{n_k})=\inf_{v\in S}E_{\lambda}(v).$$
Thus, $\inf_{v\in S}E_{\lambda}(v)=E_{\lambda}(u^*)$ and $u^*\in S$ is a minimizer. $\square$

Lemma 2 (Convergence of minimizers)

Denote $u_n:=\arg\min_{v\in S}E_{1/n}(v)$; then every cluster point of the sequence $\{u_n\}$ is the indicator function of a ball in $\Omega$.

Proof 3

Let $\tilde{u}$ be the indicator function of a ball in $\Omega$; then
$$-\frac{1}{n}\|f\|_{\infty}|\Omega|+4\pi\leq E_{1/n}(u_n)\leq E_{1/n}(\tilde{u})\leq\frac{1}{n}\|f\|_{\infty}|\Omega|+4\pi$$
for $n=1,2,\dots$. Thus, $\lim_{n\rightarrow\infty}E_{1/n}(u_n)=\lim_{n\rightarrow\infty}C(u_n)=4\pi$. Using a similar argument as in Lemma 1, we can show that $\{u_n\}$ has uniformly bounded BV norm and thus converges, up to a subsequence $\{u_{n_k}\}$, to some $u^*\in S$ in $L^1(\Omega)$. Then, $4\pi\leq C(u^*)\leq\liminf_{k\rightarrow\infty}C(u_{n_k})=4\pi$. Therefore, $C(u^*)=4\pi$ and $u^*$ must be the indicator function of a ball. $\square$

By Lemma 2, the optimum of the proposed image segmentation problem (11) is expected to tend to a circular region when the penalty parameter $\lambda$ is small enough.

3.2 Dolz-Ayed-Desrosiers algorithm

Dolz et al. [14] proposed an algorithm to solve model (11) based on the alternating direction method of multipliers (ADMM). Introducing the two auxiliary variables $s=\int_{\Omega}u(x)\,dx$ and $z=u$, the studied optimization model (11) can be written as the following linear-equality constrained optimization problem:

$$\min_{u\in S,\,z,\,s}\left\{\lambda\langle f,u\rangle+\frac{|u|_{TV}\,|z|_{TV}}{s}\right\},\quad\text{s.t.}\quad s=\langle z,\mathbbm{1}_{\Omega}\rangle,\quad u=z,$$
where $\mathbbm{1}_{\Omega}$ is the constant function on $\Omega$ with all values equal to $1$. Its corresponding augmented Lagrangian function is
$$L(u,z,s,\nu_1,\nu_2)=\lambda\langle f,u\rangle+\frac{|u|_{TV}\,|z|_{TV}}{s}+\frac{\mu_1}{2}\|u-z+\nu_1\|_2^{2}+\frac{\mu_2}{2}\left(s-\langle z,\mathbbm{1}_{\Omega}\rangle+\nu_2\right)^{2}.$$

At each iteration, each of the variables $(u,z,s)$ is updated separately while fixing the two multipliers $(\nu_1,\nu_2)$, and the multipliers $(\nu_1,\nu_2)$ are then updated as in the classical augmented Lagrangian method. Particularly, the anisotropic total-variation functions $|u|_{TV}$ and $|z|_{TV}$ are applied, so the $u$-update is solved by graph cuts and the $z$-update is computed by solving a large-scale linear system; the $s$-update simply amounts to finding the root of a cubic equation.

4 Novel equivalent models and efficient algorithms

The algorithm proposed in [14] has a series of hyper-parameters, which require fine-tuning and impact its numerical performance in practice. In this work, we reformulate the image segmentation model (11) with the shape-compactness prior and propose new efficient algorithms.

4.1 Novel equivalent optimization models

As shown in [11; 23], the total-variation function $|u|_{TV}$, i.e., the boundary length term, can be well approximated by
$$\langle u,\,G_{\sigma}*(1-u)\rangle,\qquad(12)$$
where $G_{\sigma}(x)=\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{|x|^{2}}{2\sigma^{2}}\right)$ is the Gaussian kernel. Such an approximation $\Gamma$-converges to the exact boundary length $|\partial\Omega_0|$ as $\sigma\to 0^{+}$.
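A small sketch of the boundary term (12) is given below, with scipy's normalized Gaussian filter standing in for $G_{\sigma}*$. With this normalization, the standard heat-content asymptotics suggest $\mathrm{Per}(\Omega_0)\approx(\sqrt{2\pi}/\sigma)\,\langle u,G_{\sigma}*(1-u)\rangle$ for small $\sigma$; this scaling constant is our assumption here, and the algorithms below absorb such constants into their parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def td_length_term(u, sigma):
    """Threshold-dynamics boundary term <u, G_sigma * (1 - u)> of (12)."""
    return (u * gaussian_filter(1.0 - u, sigma)).sum()

yy, xx = np.mgrid[:256, :256]
disk = ((yy - 128.0) ** 2 + (xx - 128.0) ** 2 <= 60.0 ** 2).astype(float)

sigma = 3.0
perimeter_estimate = np.sqrt(2.0 * np.pi) / sigma * td_length_term(disk, sigma)
print(perimeter_estimate, 2.0 * np.pi * 60.0)  # roughly equal, ~377
```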

Hence, the approximated model of (11) can be essentially formulated as
$$\min_{u\in S}\left\{\lambda\langle f,u\rangle+\frac{\langle u,G_{\sigma}*(1-u)\rangle^{2}}{\langle u,\mathbbm{1}_{\Omega}\rangle}\right\}.\qquad(13)$$

Let $q=\langle u,G_{\sigma}*(1-u)\rangle$. In view of the following dual representation of the convex function $\frac{q^{2}}{\langle u,\mathbbm{1}_{\Omega}\rangle}$,
$$\frac{q^{2}}{\langle u,\mathbbm{1}_{\Omega}\rangle}=\max_{p\in\mathbb{R}}\left\{p\,q-\frac{p^{2}\langle u,\mathbbm{1}_{\Omega}\rangle}{4}\right\},$$
where the maximum over $p$ is attained at $p^{*}=2q/\langle u,\mathbbm{1}_{\Omega}\rangle$, the optimization model (13) can be equally rewritten as
$$\min_{u\in S}\max_{p\in\mathbb{R}}\left\{\lambda\langle f,u\rangle+p\,\langle u,G_{\sigma}*(1-u)\rangle-\frac{p^{2}\langle u,\mathbbm{1}_{\Omega}\rangle}{4}\right\}.$$
By a simple calculation, the above optimization problem becomes
$$\min_{u\in S}\max_{p\in\mathbb{R}}\;L(u,p):=\left\langle\lambda f-\frac{p^{2}}{4},\,u\right\rangle+p\,\langle u,G_{\sigma}*(1-u)\rangle.\qquad(14)$$

In this work, the min-max problem (14) is called the equivalent primal-dual model of (13).

4.2 Primal-dual algorithms using threshold dynamics (PD-TD)

Given the primal-dual formulation (14), we propose the first primal-dual algorithm based on threshold dynamics. Given $u^{k}$ and $p^{k}$ at the $(k+1)$-th step, the new algorithm involves two steps to update $u^{k+1}$ and $p^{k+1}$:

First, we fix $p^{k}$ and update $u$ by
$$u^{k+1}=\arg\min_{u\in S}L(u,p^{k}),\qquad(15)$$
which can be solved approximately by linearizing $\langle u,G_{\sigma}*(1-u)\rangle$ at $u^{k}$:
$$\langle u,G_{\sigma}*(1-u)\rangle\approx\langle u^{k},G_{\sigma}*(1-u^{k})\rangle+\langle u-u^{k},\,G_{\sigma}*(1-2u^{k})\rangle.$$

The optimum $u^{k+1}$ can, therefore, be efficiently computed by thresholding $\phi^{k}(x)$, i.e.,
$$u^{k+1}=\mathbbm{1}_{\phi^{k}<0}=\begin{cases}1,&\phi^{k}(x)<0,\\ 0,&\phi^{k}(x)\geq 0,\end{cases}\qquad(16)$$
where
$$\phi^{k}(x)=\lambda f(x)-\frac{(p^{k})^{2}}{4}+p^{k}\,G_{\sigma}*(1-2u^{k})(x).$$

Second, we fix $u^{k+1}$ and update $p$ by explicitly maximizing the concave quadratic function of $p$:
$$p^{k+1}=\arg\max_{p\in\mathbb{R}}L(u^{k+1},p)=\frac{2\langle u^{k+1},G_{\sigma}*(1-u^{k+1})\rangle}{\langle u^{k+1},\mathbbm{1}_{\Omega}\rangle}.\qquad(17)$$

We list the details of the proposed primal-dual algorithm using threshold dynamics (PD-TD) in Alg. 1.

Algorithm 1 Primal-dual algorithm using threshold dynamics (PD-TD)
Input: the region force term $f(x)$
Choose the parameter $\lambda$ and the maximum number of iterations $I$
Initialize $u$ and $p$ to get $u^{0}(x)$ and $p^{0}$
for $k$ from $0$ to $I-1$ do
     $\phi^{k}=\lambda f-(p^{k})^{2}/4+p^{k}\,G_{\sigma}*(1-2u^{k})$
     $u^{k+1}=\mathbbm{1}_{\phi^{k}<0}$
     $p^{k+1}=2\langle u^{k+1},G_{\sigma}*(1-u^{k+1})\rangle\,/\,\langle u^{k+1},\mathbbm{1}_{\Omega}\rangle$
     if the stopping criterion is met then
          stop iterations
     end if
end for
return $u^{k+1}$
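A compact NumPy sketch of Alg. 1 follows. The region force $f$, the initialization, the dual safeguard, and the fixed-point stopping test are illustrative choices not prescribed by the algorithm; for a two-phase Chan-Vese-type data term, $f$ could for instance be $(I-c_1)^2-(I-c_0)^2$ with foreground/background means $c_1$ and $c_0$:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pd_td(f, lam, sigma=3.0, max_iters=300):
    """Minimal sketch of the PD-TD iterations (Alg. 1).

    f   : region force (H, W), negative where foreground is likely
    lam : weight of the data term in model (11)
    """
    u = (f < 0).astype(float)  # illustrative initialization of u^0
    p = 1.0                    # illustrative initialization of p^0
    for _ in range(max_iters):
        phi = lam * f - p ** 2 / 4.0 + p * gaussian_filter(1.0 - 2.0 * u, sigma)
        u_new = (phi < 0).astype(float)        # hard thresholding (16)
        q = (u_new * gaussian_filter(1.0 - u_new, sigma)).sum()
        p = 2.0 * q / max(u_new.sum(), 1e-8)   # dual update (17)
        if np.array_equal(u_new, u):           # u reached a fixed point
            break
        u = u_new
    return u
```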

4.3 Primal-dual algorithm using soft threshold dynamics (PD-STD)

In this section, we propose another primal-dual-based algorithm that is similar to the introduced PD-TD algorithm (Alg. 1), with two steps updating $u$ and $p$ at each iteration. Different from Alg. 1, we apply soft thresholding to $\phi^{k}$ instead of the hard thresholding (16). This keeps the result within $[0,1]$, and the whole procedure is differentiable and can thus be integrated into DNNs as a variational sigmoid layer.

Given $u^{k},p^{k}$ at the $(k+1)$-th iteration, we first fix $p^{k}$ and compute $u^{k+1}$ by
$$u^{k+1}=\arg\min_{u\in\mathcal{K}}\left\{L(u,p^{k})+\epsilon\langle u,\log(u)\rangle+\epsilon\langle 1-u,\log(1-u)\rangle\right\},\qquad(18)$$
where the entropy regularization $\epsilon\left(\langle u,\log(u)\rangle+\langle 1-u,\log(1-u)\rangle\right)$ is introduced into the objective $L(u,p^{k})$ for some $\epsilon>0$ [11]. Linearizing $\langle u,G_{\sigma}*(1-u)\rangle$ at $u^{k}$ as in Sec. 4.2, the minimizer of (18) is given in closed form by the sigmoid function $u^{k+1}=1/(1+\exp(\phi^{k}/\epsilon))$, which is exactly the $u$-update of Alg. 2. When $\epsilon\to 0^{+}$, it has been proved [38] that the solution to (18) becomes binary.

Second, we fix $u^{k+1}$ and introduce a proximal-point step to update $p$:
$$p^{k+1}=\arg\max_{p\in\mathbb{R}}\left\{L(u^{k+1},p)-\frac{1}{2\tau}|p-p^{k}|^{2}\right\}=\frac{p^{k}+\tau\langle u^{k+1},G_{\sigma}*(1-u^{k+1})\rangle}{1+\tau\langle u^{k+1},\mathbbm{1}_{\Omega}\rangle/2}.\qquad(19)$$
The update (19) reduces to (17) as $\tau\to+\infty$. After getting $u^{k+1}\in\mathcal{K}$, we can obtain a binary solution in $S$ by thresholding $u^{k+1}$. The details of the primal-dual algorithm using soft threshold dynamics (PD-STD) are listed in Alg. 2. Notice that the PD-TD algorithm (Alg. 1) can be taken as a special case of the PD-STD algorithm (Alg. 2) when $\epsilon\to 0^{+}$ and $\tau\to+\infty$.

Algorithm 2 Primal-dual algorithm using soft threshold dynamics (PD-STD)
Input: the region force term $f(x)$
Choose the parameters $\lambda$, $\epsilon$, $\tau$ and the maximum number of iterations $I$
Initialize $u$ and $p$ to get $u^{0}(x)$ and $p^{0}$
for $k$ from $0$ to $I-1$ do
     $\phi^{k}=\lambda f-(p^{k})^{2}/4+p^{k}\,G_{\sigma}*(1-2u^{k})$
     $u^{k+1}=\exp(-\phi^{k}/\epsilon)\,/\,\left(1+\exp(-\phi^{k}/\epsilon)\right)$
     $p^{k+1}=\left(p^{k}+\tau\langle u^{k+1},G_{\sigma}*(1-u^{k+1})\rangle\right)/\left(1+\tau\langle u^{k+1},\mathbbm{1}_{\Omega}\rangle/2\right)$
     if the stopping criterion is met then
          stop iterations
     end if
end for
return $\mathbbm{1}_{u^{k+1}>0.5}$
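Analogously, a minimal NumPy sketch of Alg. 2 is given below; the hyperparameter defaults and the omission of an early-stopping test are illustrative choices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import expit  # numerically stable sigmoid

def pd_std(f, lam, sigma=3.0, eps=0.05, tau=1.0, max_iters=300):
    """Minimal sketch of the PD-STD iterations (Alg. 2)."""
    u = np.full(f.shape, 0.5)  # relaxed initialization in [0, 1]
    p = 1.0
    for _ in range(max_iters):
        phi = lam * f - p ** 2 / 4.0 + p * gaussian_filter(1.0 - 2.0 * u, sigma)
        u = expit(-phi / eps)  # soft thresholding: exp(-phi/eps)/(1+exp(-phi/eps))
        q = (u * gaussian_filter(1.0 - u, sigma)).sum()
        p = (p + tau * q) / (1.0 + tau * u.sum() / 2.0)  # proximal update (19)
    return (u > 0.5).astype(float)  # final thresholding back to a binary mask
```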

4.4 New PD-STD Neural Network

Utilizing the variational explanation of the sigmoid function (10) and the spatially regularized neural network (9), we introduce a novel neural network that properly integrates the shape-compactness prior as follows:
$$\begin{cases}o^{t}=\mathcal{T}_{\Theta^{t-1}}(v^{t-1},v^{t-2},\dots,v^{0}),& t=1,2,\dots,T,\\ v^{t}=\mathcal{A}^{t}(o^{t}),& t=1,2,\dots,T-1,\\ v^{T}=\arg\min_{u\in\mathcal{K}}\left\{\lambda\langle -o^{T},u\rangle+\epsilon\langle u,\log(u)\rangle+\epsilon\langle 1-u,\log(1-u)\rangle+\dfrac{\langle u,G_{\sigma}*(1-u)\rangle^{2}}{\langle u,\mathbbm{1}_{\Omega}\rangle}\right\}.\end{cases}\qquad(20)$$
That is, in the last layer of the neural network above, the regularizer $\mathcal{R}(u)$ of (9) is set to the approximated shape-compactness prior defined in (13).

The proposed PD-STD algorithm helps obtain smooth and compact segmentation regions by unrolling its iterations as network sublayers: for the last layer $v^{T}$, each iteration of the PD-STD algorithm can be regarded as a layer in a DNN, and the algorithm thus forms the new PD-STD layers shown in Fig. 1, which are used as the last block of our neural network, as shown in Fig. 2. Fig. 1 shows the layers resulting from Alg. 2; only the $p$- and $u$-updates are illustrated, since the $\phi$-update is part of the $u$-update.

Our new network architecture based on (20) is shown in Fig. 2. The part of the network in (20) for $t=1,2,\dots,T-1$ is referred to as the backbone network, which can be any network that produces the values $o^{T}$ for Alg. 2, the latter serving as the last layer of our new network; a sketch of this last block is given below. The backbone can be chosen among various image segmentation networks, including U-Net [3] and SegNet [21].
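The following PyTorch sketch shows how such a PD-STD block could be realized as a differentiable last layer; the kernel size, iteration count, and hyperparameter defaults are illustrative, and, following (20), the region force is taken as $f=-o^{T}$. This is a sketch of the construction, not the authors' exact implementation:

```python
import torch
import torch.nn.functional as F

class PDSTDLayer(torch.nn.Module):
    """Unrolled PD-STD iterations (Alg. 2) as a differentiable last block,
    replacing the plain sigmoid layer of the backbone."""

    def __init__(self, lam=1.0, eps=0.1, tau=1.0, sigma=3.0, iters=10, ksize=13):
        super().__init__()
        self.lam, self.eps, self.tau, self.iters = lam, eps, tau, iters
        # fixed (non-learnable) Gaussian kernel implementing G_sigma *
        x = (torch.arange(ksize) - ksize // 2).float()
        g1d = torch.exp(-x ** 2 / (2.0 * sigma ** 2))
        g2d = torch.outer(g1d, g1d)
        self.register_buffer("kernel", (g2d / g2d.sum())[None, None])

    def blur(self, x):
        return F.conv2d(x, self.kernel, padding=self.kernel.shape[-1] // 2)

    def forward(self, logits):            # backbone logits o^T: (B, 1, H, W)
        f = -logits                       # region force from (20)
        u = torch.sigmoid(logits)         # warm start from the plain sigmoid
        p = torch.ones(logits.shape[0], 1, 1, 1, device=logits.device)
        for _ in range(self.iters):
            phi = self.lam * f - p ** 2 / 4.0 + p * self.blur(1.0 - 2.0 * u)
            u = torch.sigmoid(-phi / self.eps)            # soft thresholding
            q = (u * self.blur(1.0 - u)).sum(dim=(1, 2, 3), keepdim=True)
            area = u.sum(dim=(1, 2, 3), keepdim=True)
            p = (p + self.tau * q) / (1.0 + self.tau * area / 2.0)  # (19)
        return u
```

Since every operation above is differentiable, gradients flow through the unrolled iterations back into the backbone during training.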

Figure 1: The network architecture of the proposed PD-STD Layers used for our proposed network as shown in Fig. 2. The red rectangle denotes the dual variables in the regularization space. The blue rectangle denotes the dual variables in a compact-shaped space.
Figure 2: The architecture of a segmentation network with PD-STD layer.

5 Numerical experiments

In this section, experiments on both synthetic and real-world images are conducted to evaluate the performance of our proposed algorithms, which are implemented in Matlab and Python: the PD-TD and PD-STD algorithms in Matlab and the PD-STD-based DNNs in PyTorch. Implementation of the proposed PD-TD and PD-STD algorithms is straightforward, following Alg. 1 and Alg. 2. The two popular neural networks DeepLabV3 [36] and IrisParseNet [39] are taken as backbone networks, to which the introduced PD-STD block is applied to build the new PD-STD-based DNNs as stated in Sec. 4.4. We compare the new networks with their original counterparts on different datasets to demonstrate the advantages of the PD-STD block, especially on noisy image datasets. Training is conducted on a computer equipped with 4 NVIDIA Tesla V100-32G GPUs. The convolution is performed using the classical 2-D discrete Gaussian kernel $G_{\sigma}$ of size $(2n+1)\times(2n+1)$ (with $n\in\mathbb{Z}^{+}$), with zero padding. In this work, the gray value of each image pixel is normalized to $[0,1]$. Two types of noise, i.e., Gaussian and salt-and-pepper, are applied in the related experiments: adding Gaussian noise of level $0.1$ to an image means introducing noise with a standard deviation (SD) of $\rho=0.1$ at each pixel, while the noise level of salt-and-pepper noise represents the probability of noise presence in the image.
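The exact noise-injection code is not given here; the following NumPy sketch matches the description above (clipping the Gaussian-corrupted values back to $[0,1]$ is our assumption):

```python
import numpy as np

def add_gaussian_noise(img, rho, seed=0):
    """Zero-mean Gaussian noise with standard deviation rho; img in [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(img + rng.normal(0.0, rho, img.shape), 0.0, 1.0)

def add_salt_and_pepper(img, level, seed=0):
    """Each pixel is replaced by 0 or 1 with probability `level`."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    mask = rng.random(img.shape) < level
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(img.dtype)
    return out
```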

We also compare the proposed algorithms with state-of-the-art algorithms in terms of efficiency, accuracy, and robustness. For the accuracy evaluation of image segmentation, the two metrics of Dice and IoU are applied:
$$\text{Dice}=\frac{2\,|\Omega_0\cap\Omega_r|}{|\Omega_0|+|\Omega_r|},\qquad\text{IoU}=\frac{|\Omega_0\cap\Omega_r|}{|\Omega_0\cup\Omega_r|},$$
where $\Omega_0$ is the ground-truth region, $\Omega_r$ is the computed segmentation region, and $|\Omega_0\cap\Omega_r|$ and $|\Omega_0\cup\Omega_r|$ are the areas of their intersection and union, respectively. The following compactness metric is introduced to measure the shape compactness of the segmentation result $\Omega_r$:
$$\text{Compactness}=\frac{|\partial\Omega_r|^{2}}{|\Omega_r|}=\frac{(|u|_{TV})^{2}}{\int_{\Omega}u(x)\,dx}.\qquad(21)$$
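For reference, both accuracy metrics reduce to a few lines for binary masks (assuming non-empty masks); the compactness metric (21) coincides with the measure of Sec. 2.2, for which a discrete sketch was given there:

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice and IoU between binary masks pred (Omega_r) and gt (Omega_0)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum()), inter / union
```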

5.1 Experiments of the proposed PD-TD and PD-STD algorithms

The experimental results of the proposed PD-TD and PD-STD algorithms, on both synthetic and real-world images, are shown in this section and compared to the ADMM algorithm introduced by Dolz et al. [14] (see Sec. 3.2 for implementation details). As shown in Fig. 3, the segmentation result tends to be more circular as the weight parameter $\lambda$ in (11) becomes smaller, i.e., as the relative weight of the introduced shape-compactness regularization $C(u)$ becomes bigger. This is consistent with Lemma 2, and confirms that the introduced shape-compactness regularization $C(u)$ works properly for all three algorithms: ADMM, PD-TD, and PD-STD. Additionally, both the PD-TD and PD-STD algorithms demonstrate the ability to produce more compact segmentation results, with the PD-STD algorithm achieving the best performance.

Fig. 4 illustrates the comparison on real images. Clearly, our proposed algorithms yield more compact segmentation results with smoother boundaries. To quantify the performance, we compare the averaged Dice score, shape compactness, and computational time in Tab. 1. The results clearly indicate the superiority of our proposed methods: the PD-TD and PD-STD algorithms significantly outperform the ADMM algorithm in both compactness and efficiency, and the compared ADMM algorithm even fails to obtain a compact segmentation region in some cases. In particular, the PD-STD algorithm outperforms the other algorithms by producing results with the highest Dice scores, which directly indicates more accurate segmentations. Additionally, the PD-STD algorithm exhibits significantly faster computation than the ADMM algorithm. While the PD-TD algorithm demonstrates the shortest computational time, its compactness performance does not match the competitive level achieved by the PD-STD algorithm.

Figure 3: Experimental results of different algorithms with different $\lambda$ for comparison: the segmentation region tends to be more circular as the weight parameter $\lambda$ in (11) becomes smaller (which means the relative weight of the introduced shape-compactness regularization $C(u)$ is bigger). This shows that the introduced shape-compactness regularization $C(u)$ works properly for all three algorithms: ADMM, PD-TD, and PD-STD.
Figure 4: Segmentation results by different algorithms on some real images. The proposed PD-TD and PD-STD algorithms generate more compact regions with smoother boundaries; in contrast, the compared ADMM algorithm sometimes fails to obtain a compact segmentation region.
Method    Dice     Compactness   Time (s)
PD-TD     0.8863   14.0240        0.3936
PD-STD    0.9631   14.0086        4.2269
ADMM      0.9246   22.2291       35.7028
Table 1: Averaged Dice, compactness, and computation time (s) for the PD-TD, PD-STD, and ADMM algorithms on the real images shown in Fig. 4.

5.2 Experiments for PD-STD-Based DNNs

In this work, the two popular DNNs DeepLabV3 [36] and IrisParseNet [39] are taken as backbone networks, whose sigmoid layers are replaced by the introduced PD-STD block as shown in Fig. 2; this encodes the compact shape prior into the proposed DNNs for training. Experiments show the great performance of the proposed PD-STD-based DNNs in extracting compact regions from images, especially when there is high noise.

5.2.1 Implementation Details

We reimplemented the DeepLabV3 and IrisParseNet networks as described in [36] and [39] using PyTorch, and strictly adhered to the hyper-parameter settings specified in those papers. We conducted multiple runs with randomly initialized network parameters and selected the best result from each run for both networks. 'deeplabv3_resnet50' from 'torchvision.models.segmentation' is imported, and we set 'pretrained_backbone=True' to load the pre-trained parameters for all networks, so as to accelerate training. In our experiments, we use the ADAM optimizer with a learning rate of 0.0001. Both the weight parameter $\epsilon$ in (18) and the iteration number $I$ in Alg. 2 impact the segmentation results: increasing the iteration number $I$ enhances the shape compactness of the segmentation results, albeit at the cost of more running time, while $\epsilon$ influences the speed at which the segmentation result reaches the optimum, with higher values of $\epsilon$ often speeding up the optimization. Consequently, we aim to select an optimal pair of $\epsilon$ and $I$ to jointly achieve rapid convergence and shape compactness in our experiments.

5.2.2 Experiments on Iris Dataset

In this section, the UBIRIS.v2 Iris dataset, consisting of 483 training images and 436 testing images (all of size 400×300), is used for experiments to show the performance of our proposed PD-STD-based networks. This dataset is publicly available on GitHub: https://github.com/xiamenwcy/IrisParseNet. We add different degrees of Gaussian noise and salt-and-pepper noise to the test images to further evaluate the robustness of our proposed networks.

The two networks, DeepLabV3 and our proposed PD-STD + DeepLabV3, are first trained on the clean Iris dataset (without noise), with the iteration number $I$ of the PD-STD block set to 50. The two networks are then evaluated on both clean and noisy image datasets: Gaussian noise with standard deviations of 0.1, 0.07, 0.05, and 0.01, and salt-and-pepper noise of level 0.01 are applied for the tests. Experimental results for the proposed method versus the original DeepLabV3 network are shown in Tab. 2, where higher IoU and Dice values indicate more accurate results and a Compactness value approaching $4\pi\approx 12.56$ signifies more compact segmentation regions. Clearly, our proposed PD-STD + DeepLabV3 outperforms DeepLabV3 in all metrics, particularly when segmenting images with higher noise levels. Fig. 5 shows several examples that illustrate the effectiveness of the proposed PD-STD + DeepLabV3 in mitigating the rough boundaries caused by the introduced noise; PD-STD + DeepLabV3 yields segmentation outcomes that are more preferable, with compact shapes. When the Gaussian noise level is relatively large (such as 0.1), DeepLabV3 fails to produce reasonable segmentation results.

Method                      Clean    Gaussian                              Salt & Pepper
Noise level                 0        0.01     0.05     0.07     0.1       0.01
IoU
  DeepLabV3                 0.9096   0.9089   0.8572   0.7315   0.3817    0.8330
  PD-STD + DeepLabV3        0.9102   0.9096   0.8651   0.7793   0.5511    0.8555
Dice
  DeepLabV3                 0.9439   0.9435   0.9086   0.7979   0.4347    0.8904
  PD-STD + DeepLabV3        0.9442   0.9439   0.9106   0.8389   0.6168    0.9055
Compactness
  DeepLabV3                 14.1590  14.1299  15.1426  18.7855  19.4549   16.4107
  PD-STD + DeepLabV3        13.7665  13.8243  14.3454  15.0243  13.2463   14.6427
Table 2: Comparison between DeepLabV3 and our proposed PD-STD + DeepLabV3 when training on a clean image dataset. Our proposed PD-STD + DeepLabV3 obtains better results in all metrics of IoU, Dice and Compactness. Especially, when the Gaussian noise level is relatively high (0.1), DeepLabV3 fails to get reasonable segmentation results, but our proposed PD-STD + DeepLabV3 still yields meaningful results. Please see Fig. 5 for illustrations.
Figure 5: Segmentation results predicted by DeepLabV3 and our method PD-STD + DeepLabV3, trained on a clean image dataset. Noise levels of the test images from top to bottom: Gaussian noise of 0.01 and 0.07, and salt-and-pepper noise of 0.01.
Method                      Clean    Gaussian                              Salt & Pepper
Noise level                 0        0.1      0.15     0.17     0.2       0.05
IoU
  IrisParseNet              0.9054   0.8209   0.4363   0.2769   0.1121    0.5822
  PD-STD + IrisParseNet     0.9030   0.8794   0.7987   0.7239   0.5836    0.7554
Dice
  IrisParseNet              0.9415   0.8803   0.5357   0.3682   0.1649    0.6817
  PD-STD + IrisParseNet     0.9400   0.9249   0.8675   0.8070   0.6788    0.8341
Compactness
  IrisParseNet              14.8976  24.9740  44.1741  45.6170  35.8653   61.6093
  PD-STD + IrisParseNet     13.9134  13.9971  14.0876  14.3608  13.8360   13.9030
Table 3: Comparison of IrisParseNet and our proposed PD-STD + IrisParseNet when training on a clean image dataset. Our proposed PD-STD + IrisParseNet performs similarly to IrisParseNet on clean test images, but significantly outperforms IrisParseNet on noisy ones. Especially, as the Gaussian noise level increases, IrisParseNet fails to get reasonable segmentation results, while our proposed PD-STD + IrisParseNet still works properly. Please see Fig. 6 for illustrations.
Figure 6: Segmentation results predicted by IrisParseNet and our proposed PD-STD + IrisParseNet, trained on a clean image dataset. Noise levels from top to bottom: Gaussian noise of 0.1, 0.15, 0.17, and 0.2, and salt-and-pepper noise of 0.05. It is visually obvious that our proposed PD-STD + IrisParseNet works much more robustly than the popular IrisParseNet in segmenting noisy images.

Now we introduce the proposed PD-STD block into the state-of-the-art IrisParseNet network [39], obtaining PD-STD + IrisParseNet, to further show its effectiveness in exploiting the compactness shape prior. In contrast to the general-purpose DeepLabV3 network, IrisParseNet is purposely designed to tackle the challenge of iris images affected by severe noise. Similar experiments are conducted to compare IrisParseNet [39] and our proposed PD-STD + IrisParseNet, with Gaussian noise levels ranging from 0.1 to 0.2 and salt-and-pepper noise of 0.05. Tab. 3 demonstrates that our proposed PD-STD + IrisParseNet exhibits comparable performance in segmenting clean images, and significantly outperforms the original IrisParseNet in segmenting noisy images, as the examples illustrated in Fig. 6 show.

[Figure 7: panels (a)-(d); images in each panel, from left to right: Noisy Image, Ground Truth, IrisParseNet, DeepLabV3, IrisParseNet + PD-STD, DeepLabV3 + PD-STD.]
Figure 7: Some illustrative examples: the four networks DeepLabV3, IrisParseNet, PD-STD + DeepLabV3, and PD-STD + IrisParseNet are trained on the image dataset with noise level 0.1, and their performance on test images with no noise, Gaussian noise of 0.05 and 0.1, and salt & pepper noise of 0.05 is shown in panels (a)-(d), respectively. The trained DeepLabV3 and IrisParseNet perform worse on the test images with no noise and Gaussian noise of 0.05; in contrast, the proposed PD-STD + DeepLabV3 and PD-STD + IrisParseNet work reliably on test images of various noise levels.

On the other hand, we also build a training image dataset with a Gaussian noise level of 0.1, and compare the performance of the four networks (DeepLabV3, IrisParseNet, and our proposed PD-STD-based networks) trained on the noisy image dataset with those trained on the clean image dataset. Tab. 4 presents the segmentation results on clean images and on images with different noise levels, including Gaussian noise levels of 0.1, 0.05, and 0.01, as well as salt-and-pepper noise levels of 0.05 and 0.01. As shown in Tab. 4 and Fig. 7, the performance of DeepLabV3 and IrisParseNet is largely improved when trained on the noisy image dataset. However, our proposed PD-STD-based networks still outperform them in all cases, especially in the compactness metric when segmenting images with salt & pepper noise.

Method                        Clean    Gaussian                     Salt & Pepper
Noise level                   0        0.01     0.05     0.1        0.01     0.05
IoU
  IrisParseNet                0.8618   0.8943   0.8995   0.9017     0.8629   0.8491
  DeepLabV3                   0.8642   0.8737   0.8998   0.8999     0.8763   0.6718
  PD-STD + IrisParseNet       0.8880   0.9002   0.9042   0.9044     0.8827   0.8654
  PD-STD + DeepLabV3          0.8933   0.8959   0.9051   0.9030     0.8942   0.6784
Dice
  IrisParseNet                0.9097   0.9330   0.9370   0.9388     0.9108   0.9032
  DeepLabV3                   0.9120   0.9186   0.9373   0.9371     0.9215   0.7531
  PD-STD + IrisParseNet       0.9293   0.9371   0.9397   0.9408     0.9263   0.9156
  PD-STD + DeepLabV3          0.9333   0.9350   0.9411   0.9397     0.9339   0.7617
Compactness
  IrisParseNet                19.1554  15.7192  15.1328  14.7285    19.1122  19.3267
  DeepLabV3                   21.0197  19.6417  15.6534  14.6295    18.5644  19.6172
  PD-STD + IrisParseNet       15.0566  14.5395  14.3723  14.2044    15.3038  14.9607
  PD-STD + DeepLabV3          14.3125  14.2262  13.9769  13.8321    14.1051  14.4573
Table 4: Results of the four networks (DeepLabV3, IrisParseNet, and our proposed PD-STD-based variants), all trained on images with Gaussian noise level 0.1. Although the performance of DeepLabV3 and IrisParseNet improves considerably when trained on the noisy image dataset, our proposed PD-STD-based networks still perform better in all cases, especially in the compactness metric when segmenting images with salt & pepper noise.

5.2.3 Experiments on Fundus dataset

More experiments on the Fundus image dataset [40], which comprises 311 training images and 174 testing images, are given in this section. Segmenting the optic disc region is essential for fundus image analysis; however, the segmentation result is often affected by blood vessels and noisy imaging conditions. In our experiments, the training images are cropped to the size of 300×300, and we set the training batch size to 20, the number of training epochs to 200, ε = 0.01, and the iteration number to 50. Both DeepLabV3 and the proposed PD-STD+DeepLabV3 are trained on the clean Fundus dataset and tested on images with different noise types and levels: Gaussian noise levels of 0.01, 0.05, 0.07, and 0.1, and a salt & pepper noise level of 0.001. The experimental results obtained by the baseline DeepLabV3 and our proposed PD-STD+DeepLabV3 are presented in Tab. 5. The proposed PD-STD+DeepLabV3 consistently outperforms DeepLabV3 in terms of the IoU, Dice, and compactness metrics across all noise levels. When tested on images with a Gaussian noise level of 0.1, the proposed PD-STD+DeepLabV3 significantly improves the accuracy of the segmentation results in terms of IoU and Dice. Fig. 8 visually exhibits the superior performance of our proposed PD-STD+DeepLabV3.
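The three metrics reported in the tables can be computed from binary masks as sketched below. The IoU and Dice formulas are standard; for compactness, the paper defines it as the squared perimeter divided by the area, but the anisotropic finite-difference perimeter estimate used here is our assumption, not the paper's exact discretization.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union

def dice(pred, gt):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def compactness(mask):
    """Squared perimeter over area; equals 4*pi (~12.57) for a perfect
    disc. The perimeter is estimated by counting horizontal and vertical
    foreground/background transitions (an assumed discretization)."""
    u = mask.astype(float)
    perimeter = np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()
    return perimeter ** 2 / u.sum()
```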

Method                        Clean    Gaussian                               Salt & Pepper
Noise level                   0        0.01     0.05     0.07     0.1         0.001
IoU
  DeepLabV3                   0.9028   0.9028   0.8648   0.8254   0.7681      0.8220
  PD-STD + DeepLabV3          0.9061   0.9066   0.8764   0.8540   0.8264      0.8329
Dice
  DeepLabV3                   0.9482   0.9483   0.9263   0.9028   0.8673      0.9008
  PD-STD + DeepLabV3          0.9500   0.9503   0.9331   0.9200   0.9035      0.9076
Compactness
  DeepLabV3                   14.3698  14.3200  14.7309  15.5674  17.2642     15.6184
  PD-STD + DeepLabV3          14.0200  14.0228  14.1052  14.2661  14.4979     14.5199
Table 5: DeepLabV3 and the proposed PD-STD+DeepLabV3 are trained on the clean Fundus dataset. Clearly, the segmentation accuracy of DeepLabV3 degrades quickly as the noise level of the test images increases; in contrast, the proposed PD-STD+DeepLabV3 is affected much less by image noise.
Figure 8: Test results obtained by DeepLabV3 and PD-STD+DeepLabV3 on images with a Gaussian noise level of 0.05. The proposed PD-STD+DeepLabV3 obtains better segmentation results than DeepLabV3, recovering optic-disc shapes that closely match the ground truth.
Noise level of test image dataset
Methods                   clean    0.1      0.2      0.3      0.4      0.5      0.6      0.7      0.8
IoU
  DeepLabV3               0.5389   0.7936   0.8545   0.8639   0.8689   0.8666   0.8603   0.8561   0.8323
  PD-STD+DeepLabV3        0.7159   0.8133   0.8493   0.8656   0.8702   0.8793   0.8728   0.8633   0.8403
  DeepLabV3*              0.8706   0.8744   0.8772   0.8837   0.8867   0.8876   0.8874   0.8835   0.8739
  PD-STD+DeepLabV3*       0.8800   0.8836   0.8866   0.8903   0.8917   0.8894   0.8898   0.8868   0.8763
Dice
  DeepLabV3               0.6671   0.8660   0.9081   0.9141   0.9179   0.9153   0.9099   0.9098   0.8897
  PD-STD+DeepLabV3        0.8052   0.8767   0.9024   0.9142   0.9190   0.9248   0.9212   0.9151   0.8970
  DeepLabV3*              0.9186   0.9209   0.9233   0.9274   0.9295   0.9300   0.9304   0.9279   0.9215
  PD-STD+DeepLabV3*       0.9241   0.9269   0.9289   0.9312   0.9322   0.9309   0.9314   0.9299   0.9231
Compactness
  DeepLabV3               50.8273  24.4920  17.9213  16.7283  15.6044  15.3893  15.4148  15.7572  17.7720
  PD-STD+DeepLabV3        16.4577  14.6549  14.2005  14.1541  14.1028  13.9716  14.0589  14.0268  13.9119
  DeepLabV3*              16.2839  16.7775  16.4507  15.6798  15.2507  14.8319  14.8209  14.9177  15.0247
  PD-STD+DeepLabV3*       14.4574  14.3944  14.2638  14.0775  14.0686  13.9879  13.9769  13.9992  13.9606
Table 6: Comparison between the direct training strategy and the progressive training strategy. The DeepLabV3 and PD-STD+DeepLabV3 networks are trained directly on the image dataset with a high Gaussian noise level of 0.8; the corresponding networks trained with the progressive training strategy are denoted DeepLabV3* and PD-STD+DeepLabV3*. Clearly, the progressive training strategy significantly improves the performance of the trained networks in terms of IoU, Dice, and compactness, and the progressively trained DeepLabV3* and PD-STD+DeepLabV3* perform much more reliably on test images across different noise levels.

5.3 Progressive Training Strategy

In this work, a progressive training strategy is also used to train the neural networks, updating their parameters progressively from an image dataset with a low noise level to one with a high noise level. As shown in [23], such a progressive training strategy improves the robustness and reliability of the trained neural networks compared to the direct training strategy, which trains the networks directly on the image dataset with a high noise level. A comprehensive comparison of the two strategies is shown in Tab. 6: the DeepLabV3 and PD-STD+DeepLabV3 networks are trained by both strategies at a high Gaussian noise level of 0.8, and the networks trained with the progressive strategy are denoted DeepLabV3* and PD-STD+DeepLabV3*, respectively.
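A minimal sketch of the progressive training loop described above is given below; the helper names `build_dataset` and `train_one_stage`, as well as the noise schedule, are hypothetical placeholders for the paper's actual pipeline.

```python
import numpy as np

def progressive_training(model, build_dataset, train_one_stage,
                         levels=np.arange(0.1, 0.9, 0.1)):
    """Re-train the SAME weights on increasingly noisy data.
    `build_dataset(noise_level)` and `train_one_stage(model, dataset)`
    are hypothetical stand-ins for the paper's data pipeline and its
    per-stage supervised training loop."""
    for level in levels:
        dataset = build_dataset(noise_level=float(level))
        train_one_stage(model, dataset)  # weights carry over between stages
    return model
```

By contrast, the direct strategy corresponds to calling the per-stage training routine once, at the final noise level, with freshly initialized weights.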

[Figure 9: panels (a)-(c), each showing, from left to right: Noisy Image, Ground Truth, DeepLabV3, PD-STD+DeepLabV3, DeepLabV3*, PD-STD+DeepLabV3*.]
Figure 9: Three segmentation examples demonstrating that the progressive training strategy largely improves the performance of the trained networks compared to the direct training strategy. The Gaussian noise levels of the test images in (a), (b), and (c) are 0.1, 0.5, and 0.8, respectively. Obviously, the networks trained with the direct training strategy at a high noise level of 0.8 perform much worse on test images with low noise levels.

It is easy to see that the progressive training strategy significantly improves the performance of the trained networks in terms of IoU, Dice, and compactness, and that DeepLabV3* and PD-STD+DeepLabV3*, trained in the progressive way, perform much more robustly and reliably on test images of different noise levels. Fig. 9 clearly illustrates some examples: the networks trained with the direct training strategy at a high noise level of 0.8 perform much worse on test images with low noise levels.

Compactness results; rows are training noise levels, columns are test noise levels.
Training \ Test               0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8
clean  DeepLabV3              14.57   15.26   15.96   17.30   17.67   16.52   16.99   16.29
       PD-STD+DeepLabV3       13.89   14.10   14.06   13.96   14.13   14.23   14.56   14.47
0.1    DeepLabV3              14.38   15.00   15.79   16.95   17.65   16.85   18.44   16.79
       PD-STD+DeepLabV3       13.87   14.05   14.16   14.03   14.16   14.30   14.32   14.40
0.2    DeepLabV3              23.11   14.64   14.67   15.74   16.55   16.69   17.17   16.46
       PD-STD+DeepLabV3       15.71   13.81   14.01   13.97   14.05   14.09   14.09   14.27
0.3    DeepLabV3              -       22.34   14.55   14.88   15.44   15.68   15.72   15.69
       PD-STD+DeepLabV3       -       14.29   13.93   13.89   13.96   14.02   14.10   14.09
0.4    DeepLabV3              -       43.80   16.10   14.51   14.84   14.87   15.07   15.26
       PD-STD+DeepLabV3       -       13.64   14.01   13.83   13.95   13.95   13.92   14.14
0.5    DeepLabV3              -       -       25.04   15.92   14.57   14.99   14.88   14.84
       PD-STD+DeepLabV3       -       -       14.35   13.81   13.92   13.86   13.92   14.00
0.6    DeepLabV3              -       -       -       23.84   16.50   15.27   14.70   14.83
       PD-STD+DeepLabV3       -       -       -       13.37   14.05   13.99   14.01   13.99
0.7    DeepLabV3              -       -       -       -       22.38   15.32   14.81   14.93
       PD-STD+DeepLabV3       -       -       -       -       14.02   13.84   13.91   14.01
0.8    DeepLabV3              -       -       -       -       32.61   20.58   15.91   15.03
       PD-STD+DeepLabV3       -       -       -       -       13.77   13.88   13.91   13.97
Table 7: Compactness results of the progressive training strategy applied to the Iris dataset, with the image noise level increased gradually from 0.1 to 0.8 at a step size of 0.1. Given the circular shape of the iris, the compactness of the optimal segmentation tends to 4π ≈ 12.56; a compactness value closer to 12.56 therefore indicates a more circular, i.e., better, segmentation region. Clearly, our proposed PD-STD+DeepLabV3 performs more accurately and robustly than DeepLabV3, both when trained at a fixed noise level and tested across different noise levels, and when trained at different noise levels and tested at a fixed one. In all cases, our proposed PD-STD+DeepLabV3 achieves better results than DeepLabV3.

The progressive training strategy is employed for the Iris dataset, increasing the image noise level gradually from 0.1 to 0.8 with a step size of 0.1; extensive results are shown in Tab. 7. Given the circular shape of the iris, the compactness of the optimal segmentation tends to 4π ≈ 12.56, so a compactness value closer to 12.56 indicates a more circular, and hence better, segmentation region. In view of this, our proposed PD-STD+DeepLabV3 performs better than DeepLabV3 in both accuracy and robustness, whether trained at a fixed noise level and tested across different noise levels, or trained at different noise levels and tested at a fixed one. In all cases, PD-STD+DeepLabV3 achieves better results than DeepLabV3. Fig. 10 (a) and (b) plot the IoU of the two networks trained at fixed noise levels of 0.5 and 0.7, respectively, and tested across different noise levels. With the help of the introduced PD-STD block, i.e., the incorporated compact shape prior, PD-STD+DeepLabV3 obtains more accurate results than DeepLabV3 in both settings. Moreover, as shown in (a), when both networks are trained at noise level 0.5, DeepLabV3 comes close to PD-STD+DeepLabV3 at the matching test noise level of 0.5, but performs much worse at the high test noise level of 0.8. The proposed PD-STD+DeepLabV3 thus exhibits better robustness in both (a) and (b), which enables PD-STD-based networks to mitigate the influence of high noise levels and data variability in real-world settings. Fig. 11 provides three illustrative examples showing that the proposed PD-STD block effectively incorporates the shape prior, yielding more reasonable segmentation results in the presence of noise and data variability.
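As a quick sanity check of the 4π figure quoted above, using the definition of compactness as squared perimeter over area:

```latex
C(\Omega) \,=\, \frac{|\partial\Omega|^{2}}{|\Omega|},
\qquad
C\big(\text{disc of radius } r\big)
  \,=\, \frac{(2\pi r)^{2}}{\pi r^{2}} \,=\, 4\pi \approx 12.566,
```

and, by the isoperimetric inequality [34], no planar region attains a smaller value, so the compactness values in the tables are bounded below by 4π.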

As the above experiments show, by training the networks on image datasets with progressively increasing noise levels, the networks gradually adapt to differently corrupted inputs while retaining the image appearances learned at lower noise levels, which enhances their ability to handle image noise and improves their performance on datasets with various noise levels. The progressive training strategy is therefore essential to improving the accuracy and robustness of the trained networks across different image noise levels.

[Figure 10: two IoU-versus-test-noise-level plots, panels (a) and (b).]
Figure 10: The two graphs (a) and (b) show the performance of DeepLabV3 and our proposed PD-STD+DeepLabV3 on test images with different noise levels, when the networks are trained progressively up to noise levels of 0.5 and 0.7, respectively. Our proposed PD-STD+DeepLabV3 outperforms DeepLabV3 in terms of IoU in all cases, and is notably more robust when there is a large difference in noise level between the training and test image data.
[Figure 11: segmentation examples, panels (a)-(d), each with three images.]
Figure 11: Segmentation results of progressive training with noise levels varying from 0.1 to 0.8. Models were trained and tested at different noise levels: from top to bottom, the training images have noise levels of 0.5, 0.5, and 0.7, and the segmentation results for DeepLabV3 and our method correspond to test images with noise levels of 0.2, 0.7, and 0.8, respectively.

6 Conclusion

In this paper, we proposed two novel algorithms, PD-TD and PD-STD, to solve the challenging image segmentation problem with a high-order shape-compactness prior, which essentially evaluates the ratio of squared perimeter to area. The new algorithms are based on a novel primal-dual model, equivalent to the studied optimization problem, and outperform existing methods in numerical simplicity and effectiveness. Meanwhile, a new PD-STD block is introduced to replace the often-used sigmoid layer of backbone DNNs; it properly integrates the shape-compactness information into the neural network and enforces compact regions in the segmentation results. Extensive experiments, especially on highly noisy image datasets, show that the proposed PD-STD-based neural networks significantly outperform state-of-the-art DNNs in both robustness and accuracy. The PD-STD block can also be applied to many other DNN models beyond the DeepLabV3 and IrisParseNet used in this work.

Acknowledgements

We acknowledge the following funding sources that supported this work. Dr. Hao Liu was partially supported by the Natural Science Foundation of China (Grant No. 12201530) and the HKRGC ECS (Grant No. 22302123). Professor Jing Yuan acknowledges the support from the National Natural Science Foundation of China (NSFC) under Grant No. 61877047, as well as the Distinguished Professorship Start-up Funding of Zhejiang Normal University (No. YS304022908). Professor Xue-Cheng Tai was partially supported by the NORCE Kompetanseoppbygging program.

References

  • [1] G. Máttyus, W. Luo, R. Urtasun, Deeproadmapper: Extracting road topology from aerial images, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3438–3446.
  • [2] K. Kamnitsas, E. Ferrante, S. Parisot, C. Ledig, A. V. Nori, A. Criminisi, D. Rueckert, B. Glocker, Deepmedic for brain tumor segmentation, in: A. Crimi, B. Menze, O. Maier, M. Reyes, S. Winzeck, H. Handels (Eds.), Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Springer International Publishing, Cham, 2016, pp. 138–149.
  • [3] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS, Springer, 2015, pp. 234–241, (available on arXiv:1505.04597 [cs.CV]).
    URL http://lmb.informatik.uni-freiburg.de/Publications/2015/RFB15a
  • [4] F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmentation, in: 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571.
  • [5] M. Kampffmeyer, A.-B. Salberg, R. Jenssen, Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 1–9.
  • [6] Z. Pan, J. Xu, Y. Guo, Y. Hu, G. Wang, Deep learning segmentation and classification for urban village using a worldview satellite image based on u-net, Remote Sensing 12 (10) (2020) 1574.
  • [7] D. B. Mumford, J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Communications on Pure and Applied Mathematics (1989).
  • [8] T. F. Chan, L. A. Vese, Active contours without edges, IEEE Transactions on Image Processing 10 (2) (2001) 266–277.
  • [9] R. B. Potts, Some generalized order-disorder transformations, in: Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 48, Cambridge University Press, 1952, pp. 106–109.
  • [10] J. Liu, X.-C. Tai, S. Luo, Convex shape prior for deep neural convolution network based eye fundus images segmentation, arXiv preprint arXiv:2005.07476 (2020).
  • [11] J. Liu, X. Wang, X.-c. Tai, Deep convolutional neural networks with spatial regularization, volume and star-shape priors for image segmentation, Journal of Mathematical Imaging and Vision (2022) 1–21.
  • [12] O. Veksler, Star shape prior for graph-cut image segmentation, in: Computer Vision–ECCV 2008: 10th European Conference on Computer Vision, Marseille, France, October 12-18, 2008, Proceedings, Part III 10, Springer, 2008, pp. 454–467.
  • [13] I. Ben Ayed, M. Wang, B. Miles, G. J. Garvin, Tric: Trust region for invariant compactness and its application to abdominal aorta segmentation, in: P. Golland, N. Hata, C. Barillot, J. Hornegger, R. Howe (Eds.), Medical Image Computing and Computer-Assisted Intervention – MICCAI 2014, Springer International Publishing, Cham, 2014, pp. 381–388.
  • [14] J. Dolz, I. Ben Ayed, C. Desrosiers, Unbiased shape compactness for segmentation, in: Medical Image Computing and Computer Assisted Intervention – MICCAI 2017, Springer International Publishing, 2017, pp. 755–763.
  • [15] O. Veksler, Stereo matching by compact windows via minimum ratio cycle, in: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 1, 2001, pp. 540–547 vol.1. doi:10.1109/ICCV.2001.937563.
  • [16] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, L. Jackel, Handwritten digit recognition with a back-propagation network, Advances in Neural Information Processing Systems 2 (1989).
  • [17] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25 (2012).
  • [18] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, Semantic image segmentation with deep convolutional nets and fully connected crfs (2016). arXiv:1412.7062.
  • [19] M. Yang, K. Yu, C. Zhang, Z. Li, K. Yang, Denseaspp for semantic segmentation in street scenes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3684–3692.
  • [20] A. Paszke, A. Chaurasia, S. Kim, E. Culurciello, Enet: A deep neural network architecture for real-time semantic segmentation, arXiv preprint arXiv:1606.02147 (2016).
  • [21] V. Badrinarayanan, A. Kendall, R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (12) (2017) 2481–2495.
  • [22] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, Unet++: Redesigning skip connections to exploit multiscale features in image segmentation, IEEE transactions on medical imaging 39 (6) (2019) 1856–1867.
  • [23] X.-C. Tai, H. Liu, R. Chan, PottsMGNet: A mathematical explanation of encoder-decoder based neural networks, SIAM Journal on Imaging Sciences 17 (1) (2024) 540–594.
  • [24] H. Liu, J. Liu, R. Chan, X.-C. Tai, Double-well net for image segmentation, arXiv preprint arXiv:2401.00456 (2023).
  • [25] S. Geman, D. Geman, Stochastic relaxation, gibbs distributions, and the bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6 (1984) 721–741.
  • [26] Y. Boykov, O. Veksler, R. Zabih, Markov random fields with efficient approximations, in: Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No. 98CB36231), IEEE, 1998, pp. 648–655.
  • [27] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001) 1222 – 1239.
  • [28] T. F. Chan, S. Esedoglu, M. Nikolova, Algorithms for finding global minimizers of image segmentation and denoising models, SIAM journal on applied mathematics 66 (5) (2006) 1632–1648.
  • [29] J. Yuan, E. Bae, X.-C. Tai, Y. Boykov, A continuous max-flow approach to potts model, in: European Conference on Computer Vision, Springer, 2010, pp. 379–392.
  • [30] A. Chambolle, T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision 40 (1) (2011) 120–145.
  • [31] E. Esser, X. Zhang, T. F. Chan, A general framework for a class of first order primal-dual algorithms for convex optimization in imaging science, SIAM Journal on Imaging Sciences 3 (4) (2010) 1015–1046.
  • [32] C. Wu, X.-C. Tai, Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models, SIAM Journal on Imaging Sciences 3 (2010) 300–339.
  • [33] X.-C. Tai, L. Li, E. Bae, The Potts model with different piecewise constant representations and fast algorithms: a survey, Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging: Mathematical Imaging and Vision (2021) 1–41.
  • [34] R. Osserman, The isoperimetric inequality, Bulletin of the American Mathematical Society 84 (6) (1978) 1182–1238.
  • [35] A. Chambolle, V. Caselles, D. Cremers, M. Novaga, T. Pock, An introduction to total variation for image analysis, Theoretical Foundations and Numerical Methods for Sparse Recovery 9 (263-340) (2010) 227.
  • [36] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint arXiv:1706.05587 (2017).
  • [37] R. Acar, C. R. Vogel, Analysis of bounded variation penalty methods for ill-posed problems, Inverse Problems 10 (6) (1994) 1217.
  • [38] D. Wang, H. Li, X. Wei, X.-P. Wang, An efficient iterative thresholding method for image segmentation, Journal of Computational Physics 350 (2017) 657–667.
  • [39] C. Wang, J. Muhammad, Y. Wang, Z. He, Z. Sun, Towards complete and accurate iris segmentation using deep multi-task attention network for non-cooperative iris recognition, IEEE Transactions on Information Forensics and Security 15 (2020) 2944–2959. doi:10.1109/TIFS.2020.2980791.
  • [40] K. Jin, X. Huang, J. Zhou, Y. Li, Y. Yan, Y. Sun, Q. Zhang, Y. Wang, J. Ye, Fives: A fundus image dataset for artificial intelligence based vessel segmentation, Scientific Data 9 (1) (2022) 475.