

A Variational Approach for Joint Image Recovery and Feature Extraction Based on Spatially Varying Generalised Gaussian Models

Gabriele Scrivanti, Émilie Chouzenoux, Marie-Caroline Corbineau, Jean-Christophe Pesquet

1. Université Paris-Saclay, Inria, CentraleSupélec, CVN, France
2. Research department, Preligens, France

Corresponding author: [email protected]
Abstract

The joint problem of reconstruction/feature extraction is a challenging task in image processing. It consists in performing, in a joint manner, the restoration of an image and the extraction of its features. In this work, we firstly propose a novel non-smooth and non-convex variational formulation of the problem. For this purpose, we introduce a versatile generalised Gaussian prior whose parameters, including its exponent, are space-variant. Secondly, we design an alternating proximal-based optimisation algorithm that efficiently exploits the structure of the proposed non-convex objective function. We also analyse the convergence of this algorithm. As shown in numerical experiments conducted on joint deblurring/segmentation tasks, the proposed method provides high-quality results.

keywords:
Image recovery ; Space-variant regularisation ; Alternating minimization ; Proximal algorithm ; Block coordinate descent ; Image segmentation

1 Introduction

Variational regularisation of ill-posed inverse problems in imaging relies on the idea of searching for a solution in a well-suited space. A central role in this context is played by \ell_{p} spaces with p\in(0,\infty), and the power p of the corresponding norms when p\geq 1 [1, 2, 3, 4, 5] or seminorms when p\in(0,1) [6, 7, 8]. For every vector u=(u_{i})_{1\leq i\leq n}\in\mathbb{R}^{n} and p\in(0,+\infty), the \ell_{p} (semi-)norm is denoted by \|u\|_{p}=\big(\sum_{i=1}^{n}|u_{i}|^{p}\big)^{1/p}. We usually omit p when p=2, so that \|\cdot\|=\|\cdot\|_{2}. The case p\in(0,1) has gained rising credit, especially in the field of sparse regularisation. An extensive literature has focused on the challenging numerical tasks raised by the non-convexity of the seminorms and on the possibility of combining them with linear operators to extract salient features of the sought images [9, 10]. In [11], the more general notion of F-norm is introduced in order to establish functional analysis results on products of \ell_{p_{i}}-spaces with p_{i}\in(0,2]. For some x=(x_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, this amounts to studying the properties of penalties of the form \sum_{i=1}^{n}|x_{i}|^{p_{i}}, for some positive exponents (p_{i})_{1\leq i\leq n}. This approach offers a more flexible framework by considering a wider range of exponents than the standard \ell_{p}-based regularisation. However, it extends the problem of choosing a suitable exponent p to that of choosing a whole sequence of exponents (p_{i})_{1\leq i\leq n}. In [12], the authors proposed a non-convex regulariser of the form \sum_{i=1}^{n}|x_{i}|^{\varpi(|x_{i}|)}, where each exponent is expressed as a function of the absolute magnitude of the data and the function \varpi(\cdot) is a rescaled version of the sigmoid function, taking values in the interval [0,1]. In image restoration, a similar approach consists in adopting space-variant regularisation models built around a Total Variation-like functional with a variable exponent \sum_{i=1}^{n}\|(\nabla x)_{i}\|^{p_{i}}, where \nabla is a discrete 2D gradient operator. The rationale is to select the set of parameters (p_{i})_{1\leq i\leq n} so as to promote either edge enhancement (p_{i}=1) or smoothing (p_{i}>1) depending on the spatial location encoded by index i. This model was introduced in [13] and then put into practice first for p_{i}\in[1,2] in [14] and then for p_{i}\in(0,2] in [15]. Finally, in a recent work [16], the authors proposed a modular-proximal gradient algorithm to find solutions to ill-posed inverse problems in variable-exponent Lebesgue spaces L^{p(\cdot)}(\Omega) with \Omega\subseteq\mathbb{R}^{n}, rather than in L^{2}(\Omega). In all of these works, the so-called space-variant p-map (i.e., (p_{i})_{1\leq i\leq n}) is estimated offline in a preliminary step and then kept fixed throughout the optimisation procedure.
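For illustration, the two space-variant penalties mentioned above can be evaluated with a few lines of NumPy; the following sketch (ours, not part of the cited works) uses forward differences with replicated boundaries as one possible discretisation of the gradient operator \nabla.

import numpy as np

def lp_variable_penalty(x, p):
    # sum_i |x_i|^{p_i}: one exponent per coefficient
    return np.sum(np.abs(x) ** p)

def tv_variable_penalty(img, p):
    # sum_i ||(grad img)_i||^{p_i}: one exponent per pixel,
    # forward differences with replicated boundary
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.sum(np.sqrt(dx ** 2 + dy ** 2) ** p)

# toy p-map: exponent 1 on the left half (edge-preserving), 2 on the right half (smoothing)
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
p_map = np.full(img.shape, 2.0)
p_map[:, :4] = 1.0
print(lp_variable_penalty(img.ravel(), p_map.ravel()), tv_variable_penalty(img, p_map))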

In this paper, we address the problem of joint image recovery and feature extraction. Image recovery amounts to retrieving an estimate of an original image from a degraded version of it. The degradation usually corresponds to the application of a linear operator (e.g., blur, projection matrix) to the image and the addition of a noise. Feature extraction problems arise when one wants to assign to an image a small set of parameters which can describe or identify the image itself. Image segmentation can be viewed as an example of feature extraction, which consists of defining a label field on the image domain so that pixels are partitioned into a predefined number of homogeneous regions according to some specific characteristics. A second example, similar to segmentation, is edge detection, where one aims at identifying the contour changes within different regions of the image. Texture retrieval is a third example. This task relies on the idea of assigning a set of parameters to each coefficient of the image – possibly in some transformed space – so that the combination of all parameters defines a "signature" that represents the content of various spatial regions. Joint image recovery and feature extraction consists in performing, in a joint manner, the image recovery and the extraction of features in the sought image.

A powerful and versatile approach for feature extraction, which we propose to adopt here, assumes that the data follow a mixture of generalised Gaussian probability distributions (\mathcal{GGD}) [17, 18, 19]. The \mathcal{GGD} model results in a sum of weighted \ell_{p_{i}}-based terms in the criterion, of general form \sum_{i=1}^{n}\vartheta_{i}|x_{i}|^{p_{i}} with \{\vartheta_{i}\}_{1\leq i\leq n}\subset[0,+\infty). We thus aim at jointly estimating an optimal configuration for (\vartheta_{i},p_{i})_{1\leq i\leq n} and retrieving the image. Under an assumption of consistency between the exponent values within a given region of the feature space, we then obtain the desired feature extraction from the estimated p-map. This joint estimation amounts to minimising a non-smooth and non-convex cost function.

This specific structure of the proposed objective function suggests the use of an alternating minimisation procedure. In such an approach, one sequentially updates a subset of parameters through the resolution of an inner minimisation problem, while the other parameters are kept fixed. This approach takes its standard form in the Block Coordinate Descent (BCD) method (also known as the Gauss-Seidel algorithm) [20]. In the context of non-smooth and non-convex problems, the simple BCD may, however, exhibit instabilities [21], which has motivated the development of alternative methods that efficiently exploit the characteristics of the functions, introduce powerful tools to improve the convergence guarantees of BCD, or overcome difficulties arising in some formulations. In this respect, a central role is played by proximal methods [22, 23]: a proximally regularised BCD (PAM) for non-convex problems was studied in [24]; a proximal linearised method (PALM) and its inertial and stochastic versions were then proposed in [25], [26] and [27], respectively; in [28], the authors investigated the advantage of a hybrid semi-linearised scheme (SL-PAM) for the joint task of image restoration and edge detection based on a discrete version of the Mumford–Shah model. A structure-adapted version of PALM (ASAP) was designed in [29, 30] to exploit the block-convexity of the coupling terms and the regularity of the block-separable terms arising in some practical applications such as image colorisation and blind source separation. The extension to proximal mappings defined with respect to a variable metric was first introduced in [31], leading to the so-called Block Coordinate Variable Metric Forward-Backward algorithm. An inexact version and a line-search-based version of it were presented in [32] and [33], respectively. In [34], the authors introduced a Majorisation-Minimisation (MM) strategy within a Variable Metric Forward-Backward algorithm to tackle the challenging task of computing the proximity operator of composite functions. We refer to [35] for an in-depth analysis of how to introduce a variable metric into first-order methods. In [36], the authors introduced a family of block-coordinate majorisation-minimisation methods named TITAN. Various majorisation strategies can be encompassed by this framework, such as proximal surrogates, Lipschitz gradient surrogates, or Bregman surrogate functions. Convergence of the algorithm iterates is shown in [36, 32] under mild assumptions that include the challenging non-convex setting. These studies emphasised the prominent role played by the Kurdyka-Łojasiewicz (KL) inequality [37].

In the proposed problem formulation, the objective function includes several non-smooth terms, as well as a quadratic term – hence Lipschitz differentiable – that is restricted to a single block of variables. This feature makes the related subproblem well suited for a splitting procedure that involves an explicit gradient step with respect to this term, combined with implicit proximal steps on the remaining blocks of variables. Variable metrics within the gradient/proximal steps are also desirable for convergence speed purposes. As we will show, the TITAN framework from [36] allows such an algorithm to be built and analysed. Unfortunately, the theoretical convergence properties of TITAN assume exact proximal computations at each step, which cannot be ensured in practice in our context. To circumvent this, we propose and prove the convergence of an inexact version of a TITAN-based optimisation scheme. Inexact rules in the form of those studied in [32] are considered. We refer to the proposed method as a Preconditioned Semi-Linearised Structure Adapted Proximal Alternating Minimisation (P-SASL-PAM) scheme. We investigate the convergence properties of this algorithm by relying on the KŁ property, following the analysis framework of [37]. Under analytical assumptions on the objective function, we show the global convergence toward a critical point of any sequence generated by the proposed method. We then detail the use of this method for our problem of image recovery and feature extraction. The performance of the approach is illustrated by means of examples in the field of image processing, in which we also show quantitative comparisons with state-of-the-art methods.

In a nutshell, the contributions of this work are (i) the proposition of an original non-convex variational model for the joint image recovery and feature extraction problem; (ii) the design of an inexact block coordinate descent algorithm to address the resulting minimisation problem; (iii) the convergence analysis of this scheme; (iv) the illustration of the performance of the proposed method through a numerical example in the field of ultrasound image processing.

The paper is organised as follows. In Section 2 we introduce the degradation model and report our derivation of the objective function for image recovery and feature extraction, starting from statistical assumptions on the data. In Section 3, we first describe the proposed P-SASL-PAM method to address a general non-smooth non-convex optimisation problem; we then show that the proposed method converges globally, in the sense that the whole generated sequence converges to a (local) minimum. The application of the P-SASL-PAM method to the joint reconstruction/segmentation problem is described in Section 4. Some illustrative numerical results are shown in Section 5. Conclusions are drawn in Section 6.

2 Model Formulation

In this section, we describe the construction of the objective function associated with the joint reconstruction/feature extraction problem. After defining the degradation model, we present a Bayesian model that is reminiscent of the one considered in [17, 19] in the context of ultrasound imaging. Then, we describe the procedure that leads to the formulation of the optimisation problem we address.

2.1 Observation Model

Let x\in\mathbb{R}^{n} and y\in\mathbb{R}^{m} be respectively the vectorised sought-for solution and the observed data, which are assumed to be related according to the following model

y=Kx+\omega, (1)

where K\in\mathbb{R}^{m\times n} is a linear operator and \omega\sim\mathcal{N}(0,\sigma^{2}\mathbb{I}_{m}) follows the normal distribution with zero mean and covariance matrix \sigma^{2}\mathbb{I}_{m}, with \sigma>0, where \mathbb{I}_{m} stands for the m\times m identity matrix. We further assume that x can be characterised by a finite set of k features that are defined in a suitable space, where the data are described by a simple model relying on a small number of parameters. The Generalised Gaussian Distribution (\mathcal{GGD})

(\forall t\in\mathbb{R})\quad\mathsf{p}(t;p,\alpha)=\frac{1}{2\alpha^{1/p}\Gamma\left(1+\frac{1}{p}\right)}\exp\left(-\frac{|t|^{p}}{\alpha}\right) (2)

with (p,\alpha)\in(0,+\infty)^{2} has been shown to be a suitable and flexible tool for this purpose [17, 18, 19]. Each feature can be identified by a pair (p_{j},\alpha_{j}) for j\in\{1,\dots,k\}, where parameter p is proportional to the decay rate of the tail of the probability density function (PDF) and parameter \alpha models the width of the peak of the PDF. Taking into account the roles that p and \alpha play in the definition of the PDF profile, these two parameters are generally referred to as the shape and scale parameters.
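As a quick numerical illustration of (2) (our own sketch, not part of the model derivation), the density can be evaluated with SciPy; for p=2 and \alpha=2\sigma^{2} it reduces to the Gaussian density, while p=1 gives a Laplacian-type profile with heavier tails.

import numpy as np
from scipy.special import gamma

def ggd_pdf(t, p, alpha):
    # generalised Gaussian density of eq. (2) with shape p and scale alpha
    norm = 2.0 * alpha ** (1.0 / p) * gamma(1.0 + 1.0 / p)
    return np.exp(-np.abs(t) ** p / alpha) / norm

t = np.linspace(-3.0, 3.0, 7)
print(ggd_pdf(t, p=1.0, alpha=1.0))   # Laplacian-like, heavy tails
print(ggd_pdf(t, p=2.0, alpha=2.0))   # standard Gaussian (alpha = 2*sigma^2 with sigma = 1)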

Assuming that K and \sigma are known, the task we address in this work is to jointly retrieve x (reconstruction) and obtain a good representation of its features through an estimation of the underlying model parameters (p_{j},\alpha_{j}) for j\in\{1,\dots,k\} (feature extraction). Starting from a similar statistical model to the one considered in [17, 19], we infer a continuous variational framework which does not rely on the a priori knowledge of the exact number of features k. We derive this model by performing a Maximum a Posteriori estimation, which allows us to formulate the joint image reconstruction and feature extraction task as a non-smooth and non-convex optimisation problem involving a coupling term and a block-coordinate separable one.

2.2 Bayesian Model

From (1), we derive the following likelihood

\mathsf{p}(y|x,\sigma^{2})=\frac{1}{(2\pi\sigma^{2})^{n/2}}\exp\left(-\frac{\|y-Kx\|^{2}}{2\sigma^{2}}\right). (3)

Assuming then that the components of x are independent conditionally on the knowledge of their feature class, the prior distribution of x is a mixture of \mathcal{GGD}s

\mathsf{p}(x|p,\alpha)=\prod_{j=1}^{k}\frac{1}{\left(2\alpha^{1/p_{j}}_{j}\Gamma\left(1+\frac{1}{p_{j}}\right)\right)^{N_{j}}}\exp\left(-\frac{\|\overline{x}_{j}\|^{p_{j}}_{p_{j}}}{\alpha_{j}}\right). (4)

Hereabove, for every x\in\mathbb{R}^{n} and every feature label j\in\{1,\dots,k\}, we define \overline{x}_{j}\in\mathbb{R}^{N_{j}} as the vector containing only the N_{j} components of x that belong to the j-th feature. Following the discussion in [38], for the shape parameter we choose a uniform distribution on a certain interval [a,b]\subset[0,+\infty):

\mathsf{p}(p)=\prod_{j=1}^{k}\frac{1}{b-a}\mathsf{I}_{[a,b]}(p_{j}). (5)

This choice stems from the fact that setting a=0 and b=3 allows covering all possible values of the shape parameter encountered in practical applications, while no additional information about this parameter is available. For the scale parameter, we adopt the Jeffreys distribution to reflect the lack of knowledge about this parameter:

\mathsf{p}(\alpha)=\prod_{j=1}^{k}\frac{1}{\alpha_{j}}\mathsf{I}_{[0,+\infty)}(\alpha_{j}). (6)

Note that this kind of prior is often used for scale parameters [39]. Hereabove, \mathsf{I}_{S} represents the characteristic function of a subset S\subset\mathbb{R}, which is equal to 1 over S, and 0 elsewhere.

2.3 Variational Model

In order to avoid defining the number of features a priori, we regularise the problem by considering the 2D Total Variation (TV) of the \mathcal{GGD} parameters (p,\alpha)\in(0,+\infty)^{n}\times(0,+\infty)^{n}. The idea of using Total Variation to define a segmentation procedure is studied in [40, 41, 42, 43, 44, 45] by virtue of the co-area formula: the authors propose to replace the boundary information term of the Mumford-Shah (MS) functional [46] with the TV convex integral term. This choice yields a non-tight convexification of the MS model that does not require setting the number of segments in advance. The overall segmentation procedure is then built upon two steps: the first one consists of obtaining a smooth version of the given image that is adapted to segmentation by minimising the proposed functional with convex methods; the second step consists of partitioning the obtained solution into the desired number of segments, by defining the thresholds with, e.g., Otsu's method [47] or the k-means algorithm (a sketch of this second step is given below). The strength of our approach is that the second step (i.e. the actual segmentation step) is independent of the first one; hence it is possible to set the number of segments (i.e., labels) without solving the optimisation problem again.
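As an illustration of this second, post-hoc labelling step, the following sketch (ours, not taken from the paper) partitions an estimated p-map into a chosen number of segments with scikit-learn's k-means; Otsu's method could be used instead when only two classes are needed.

import numpy as np
from sklearn.cluster import KMeans

def segment_p_map(p_map, n_segments):
    # cluster the exponent values only, then reshape the labels back to the image grid
    h, w = p_map.shape
    km = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
    labels = km.fit_predict(p_map.reshape(-1, 1))
    return labels.reshape(h, w), km.cluster_centers_.ravel()

# toy p-map: two regions with exponents around 1 and 2
rng = np.random.default_rng(0)
p_map = np.where(np.arange(64).reshape(8, 8) % 8 < 4, 1.0, 2.0)
p_map = p_map + 0.05 * rng.standard_normal((8, 8))
labels, centers = segment_p_map(p_map, n_segments=2)
print(labels)
print(centers)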

In the considered model, the introduction of a TV prior leads to a minimisation problem that is non-convex with respect to \alpha. To circumvent this issue, a possible strategy would be to apply the change of variable \beta=1/\alpha, which leads to a convex problem with respect to \beta. However, after performing some tests, we noticed that this choice tends to promote the extreme values 0 or +\infty. We therefore opted for the following reparameterisation of the scale parameter: let \beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n} be such that, for every i\in\{1,\ldots,n\},

\beta_{i}=\frac{1}{p_{i}}\ln\alpha_{i}, (7)

and let us choose for this new variable a non-informative Gaussian prior defined on the whole space of configurations, with mean \mu_{\beta}\geq 0 and standard deviation \sigma_{\beta}>0. The choice of a not necessarily zero-mean distribution stems from the idea of having a more flexible prior to represent our reparameterised scale parameter.

Thus, replacing \alpha with \beta and further introducing TV regularisation potentials (weighted by the regularisation parameters \lambda>0 and \zeta>0) leads to the following reformulation of distributions (4)-(6):

\mathsf{p}(x|p,\beta)=\prod_{i=1}^{n}\frac{1}{2\exp(\beta_{i})\Gamma\left(1+\frac{1}{p_{i}}\right)}\exp\left(-|x_{i}|^{p_{i}}\exp(-p_{i}\beta_{i})\right) (8)
\mathsf{p}(p)=c_{p}\exp(-\lambda\mathrm{TV}(p))\prod_{i=1}^{n}\frac{1}{b-a}\mathsf{I}_{[a,b]}(p_{i}) (9)
\mathsf{p}(\beta)=c_{\beta}\exp(-\zeta\mathrm{TV}(\beta))\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma_{\beta}}\exp\Big(-\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}\Big), (10)

where (c_{p},c_{\beta})\in(0,+\infty)^{2} are normalisation constants. In Figure 1, we depict the probabilistic dependence graph defining the relations between variables and hyperparameters in our model.

Figure 1: Probabilistic dependence graph of our model. Hyperparameters are represented as diamonds, and variables as ellipses: a and b are the lower and upper bounds of the interval appearing in the uniform distribution of the shape parameter, p is the shape parameter, \alpha is the original scale parameter, \beta is the reparameterised scale parameter with mean \mu_{\beta} and standard deviation \sigma_{\beta}, x is the sought signal, y is the observed one, K is the linear operator, and \omega is the additive Gaussian noise with standard deviation \sigma.

The joint posterior distribution is determined as follows:

\mathsf{p}(x,p,\beta|y)\propto\mathsf{p}(y|x,p,\beta)\,\mathsf{p}(x,p,\beta)\propto\mathsf{p}(y|x,p,\beta)\,\mathsf{p}(x|p,\beta)\,\mathsf{p}(p)\,\mathsf{p}(\beta). (11)

Taking the negative logarithm of (11), computing the Maximum a Posteriori estimate (i.e., maximising the joint posterior distribution) is equivalent to solving the following optimisation problem, which we refer to as the joint image reconstruction and feature extraction problem

\underset{(x,p,\beta)\in\mathbb{R}^{n}\times\mathbb{R}^{n}\times\mathbb{R}^{n}}{\mathrm{minimize}}\;\;\Theta(x,p,\beta)=\frac{1}{2\sigma^{2}}\|y-Kx\|^{2}+\sum_{i=1}^{n}\Big(|x_{i}|^{p_{i}}e^{-p_{i}\beta_{i}}+\ln\Gamma\Big(1+\frac{1}{p_{i}}\Big)+\iota_{[a,b]}(p_{i})+\beta_{i}+\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}\Big)+\lambda\mathrm{TV}(p)+\zeta\mathrm{TV}(\beta). (12)

Hereabove, \iota_{S} represents the indicator function of a subset S\subset\mathbb{R}, which is equal to 0 over S, and +\infty elsewhere.

In [28], the authors proposed a generalised discrete Mumford-Shah variational model that is specifically designed for the joint image reconstruction and edge detection problem. In contrast, the model we propose in (12) is well suited to encompass a wider class of problems. In Section 5, we present two applications, namely in the context of wavelet-based image restoration and in the context of joint deblurring/segmentation of ultrasound images. In particular, we notice that, when restricted to variable x for a given set of parameters (p,\beta), formulation (12) boils down to the flexible sparse regularisation model

\underset{x\in\mathbb{R}^{n}}{\mathrm{minimize}}\;\;\frac{1}{2\sigma^{2}}\|y-Kx\|^{2}+\sum_{i=1}^{n}|x_{i}|^{p_{i}}e^{-p_{i}\beta_{i}}, (13)

where the contribution of the \ell_{p_{i}} regularisation term is itself weighted in a space-varying fashion.

Function \Theta in (12) is non-smooth and non-convex. It reads as the sum of a coupling term and three block-separable terms. In particular, the block-separable data-fit term relative to x is quadratic and hence has a Lipschitz continuous gradient. Our proposed algorithm aims to leverage this property, which is generally not accounted for by other BCD methods. To this aim, we exploit a hybrid scheme that involves both standard and linearised proximal steps. The details about the proposed method are presented in the next section.
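To fix ideas, the objective \Theta of (12) can be evaluated numerically as follows; this is a sketch of ours in which the TV discretisation (isotropic, replicated boundaries), the dense operator K and all helper names are illustrative choices rather than the exact setting used in the experiments.

import numpy as np
from scipy.special import gammaln

def tv(u, shape):
    # isotropic discrete TV of a vectorised image u with the given 2D shape
    img = u.reshape(shape)
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

def objective(x, p, beta, y, K, sigma, a, b, mu_b, sigma_b, lam, zeta, shape):
    # Theta(x, p, beta) of eq. (12); the box constraint on p is enforced by returning +inf
    if np.any(p < a) or np.any(p > b):
        return np.inf
    data_fit = 0.5 * np.sum((y - K @ x) ** 2) / sigma ** 2
    coupling = np.sum(np.abs(x) ** p * np.exp(-p * beta))
    separable = np.sum(gammaln(1.0 + 1.0 / p) + beta + (beta - mu_b) ** 2 / (2.0 * sigma_b ** 2))
    return data_fit + coupling + separable + lam * tv(p, shape) + zeta * tv(beta, shape)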

3 Preconditioned Structure Adapted Semi-Linearised Proximal Alternating Minimisation (P-SASL-PAM)

In this section, we introduce a BCD-based method to address a class of sophisticated optimisation problems that includes (12) as a special case. We start the section with useful preliminaries on subdifferential calculus. Then, we present the Kurdyka-Łojasiewicz property, which plays a prominent role in the convergence analysis of BCD methods in a non-convex setting. Finally, we define problem (20), which generalises (12), for which we derive our proposed BCD-based algorithm and show its convergence properties. The so-called Preconditioned Structure Adapted Semi-Linearised Proximal Alternating Minimisation (P-SASL-PAM) approach mixes both standard and preconditioned linearised proximal regularisation on the different coordinate blocks of the criterion.

3.1 Subdifferential Calculus

Let us now recall some definitions and elements of subdifferential calculus that will be useful in the upcoming sections. For a proper and lower semicontinuous function h:\mathbb{R}^{n}\rightarrow(-\infty,\infty], the domain of h is defined as

\operatorname{dom}h=\left\{u\in\mathbb{R}^{n}\mid h(u)<+\infty\right\}.

Firstly, we recall the notion of subgradients and subdifferential for convex functions.

Definition 1 (Subgradient of a convex function).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,\infty] be a proper convex lower semicontinuous function. The subdifferential \partial h(u^{+}) of h at u^{+}\in\mathbb{R}^{n} is the set of all vectors r\in\mathbb{R}^{n}, called subgradients of h at u^{+}, such that

(\forall u\in\mathbb{R}^{n})\quad h(u)\geq h(u^{+})+\langle r,u-u^{+}\rangle.

If u^{+}\notin\operatorname{dom}h, then \partial h(u^{+})=\varnothing.

Secondly, we consider the more general notion of (limiting) subdifferential for not necessarily convex functions, as proposed in [48, Definition 8.3].

Definition 2 (Limiting Subdifferential).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function. For a vector u^{+}\in\mathbb{R}^{n},

  • the Fréchet subdifferential of h at u^{+}, written as \hat{\partial}h(u^{+}), is the set of all vectors r\in\mathbb{R}^{n} such that

    h(u)\geq h(u^{+})+\langle r,u-u^{+}\rangle+o(\|u-u^{+}\|);

    if u^{+}\notin\operatorname{dom}h, then \hat{\partial}h(u^{+})=\varnothing;

  • the limiting subdifferential of h at u^{+}, denoted by \partial h(u^{+}), is defined as

    \partial h(u^{+})=\{r\in\mathbb{R}^{n}\;|\;\exists\,u^{k}\rightarrow u^{+},\;h(u^{k})\rightarrow h(u^{+}),\;r^{k}\rightarrow r,\;r^{k}\in\hat{\partial}h(u^{k})\}.

If h is lower semicontinuous and convex, then the three previous notions of subdifferential coincide, i.e. \hat{\partial}h(u^{+})=\partial h(u^{+}). If h is differentiable, then \partial h(u^{+})=\{\nabla h(u^{+})\}. It is now possible to formalise the notion of critical point for a general function:

Definition 3 (Critical point).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper function. A point u^{*}\in\mathbb{R}^{n} is said to be a critical (or stationary) point of h if 0\in\partial h(u^{*}).

Finally, we define the notion of proximal map relative to the norm induced by a positive definite matrix.

Definition 4.

Let \mathcal{S}_{n} be the set of symmetric positive definite matrices in \mathbb{R}^{n\times n}. For a matrix A\in\mathcal{S}_{n}, the weighted \ell_{2}-norm induced by A is defined as

(\forall u\in\mathbb{R}^{n})\quad\|u\|_{A}=(u^{\top}Au)^{1/2}. (14)

Definition 5.

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function, let A\in\mathcal{S}_{n} and u^{+}\in\mathbb{R}^{n}. The proximity operator of function h at u^{+} with respect to the norm induced by A is defined as

\text{prox}_{h}^{A}(u^{+})=\underset{u\in\mathbb{R}^{n}}{\text{argmin}}\left(\frac{1}{2}\|u-u^{+}\|^{2}_{A}+h(u)\right). (15)

Note that \text{prox}_{h}^{A}(u^{+}), as defined above, can be the empty set. It is nonempty for every u^{+}\in\mathbb{R}^{n} if h is lower bounded by an affine function. In addition, it reduces to a single-valued operator when h is convex.
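As a simple illustration (ours, not part of the paper), when h=\gamma\|\cdot\|_{1} and A is diagonal, the minimisation in (15) separates across coordinates and the proximity operator reduces to coordinate-wise soft-thresholding with thresholds \gamma/A_{ii}:

import numpy as np

def prox_l1_diag_metric(u_plus, gamma, a_diag):
    # prox of gamma*||.||_1 at u_plus in the metric induced by A = diag(a_diag):
    # argmin_u 0.5*sum_i a_i*(u_i - u_plus_i)^2 + gamma*|u_i|, solved coordinate-wise
    thr = gamma / a_diag
    return np.sign(u_plus) * np.maximum(np.abs(u_plus) - thr, 0.0)

u_plus = np.array([1.5, -0.2, 0.7])
print(prox_l1_diag_metric(u_plus, gamma=0.5, a_diag=np.array([1.0, 1.0, 5.0])))
# per-coordinate thresholds [0.5, 0.5, 0.1] give [1.0, 0.0, 0.6]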

In order to deal with the situation where no closed-form proximal formulas are available (as may be the case for non-trivial preconditioning metrics A), we consider an inexact notion of proximal computation in the sense of [37, Theorems 4.2 and 5.2] and [32]:

Definition 6.

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function, let A\in\mathcal{S}_{n}, \tau>0 and u^{+}\in\mathbb{R}^{n}. Then u^{*}\in\mathbb{R}^{n} is an inexact proximal point for h at u^{+} if the following relative error conditions are satisfied:

  1. (i)

    Sufficient Decrease Condition:

    h(u^{*})+\frac{1}{2}\|u^{+}-u^{*}\|_{A}^{2}\leq h(u^{+}) (16)

  2. (ii)

    Inexact Optimality: there exists r\in\partial h(u^{*}) such that

    \|r\|\leq\tau\|u^{+}-u^{*}\|. (17)

In this case we write u^{*}\in\operatorname{prox}_{h}^{A,\tau}(u^{+}).

Remark 1.

We highlight that when exact proximal points are considered, the optimality condition reads

0\in\partial h(u^{*})+A(u^{*}-u^{+}), (18)

implying that there exists r\in\partial h(u^{*}) such that r=A(u^{+}-u^{*}).

3.2 The KŁ-Property

Most works related to BCD-based algorithms rely on the framework developed by Attouch, Bolte, and Svaiter in their seminal paper [37] in order to prove the convergence of block alternating strategies for non-smooth and non-convex problems. A fundamental assumption in [37] is that the objective function satisfies the Kurdyka-Łojasiewicz (KŁ) property [49, 50, 51]. We recall the definition of this property as given in [25]. Let \eta\in(0,+\infty] and denote by \Phi_{\eta} the class of concave continuous functions \varphi:[0,+\infty)\rightarrow[0,+\infty) satisfying the following conditions:

  1. (i)

    \varphi(0)=0;

  2. (ii)

    \varphi is \mathcal{C}^{1} on (0,\eta) and continuous at 0;

  3. (iii)

    for every s\in(0,\eta), \varphi^{\prime}(s)>0.

For any subset S\subset\mathbb{R}^{n} and any point u^{+}\in\mathbb{R}^{n}, the distance from u^{+} to S is defined by

\operatorname{dist}(u^{+},S)=\inf_{u\in S}\|u^{+}-u\|,

with \operatorname{dist}(u^{+},\varnothing)=+\infty.

Definition 7 (KŁ property).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function.

  1. (i)

    Function h is said to satisfy the Kurdyka-Łojasiewicz property at u^{+}\in\operatorname{dom}\partial h if there exist \eta\in(0,+\infty], a neighbourhood U of u^{+} and a function \varphi\in\Phi_{\eta} such that, for every u\in U,

    h(u^{+})<h(u)<h(u^{+})+\eta\quad\Rightarrow\quad\varphi^{\prime}(h(u)-h(u^{+}))\operatorname{dist}(0,\partial h(u))\geq 1. (19)

  2. (ii)

    Function h is said to be a KŁ function if it satisfies the KŁ property at each point of \operatorname{dom}\partial h.

3.3 Proposed Algorithm

Let us consider that every element \zeta\in\mathbb{R}^{N} is block-decomposed as \zeta=(\zeta_{0},\dots,\zeta_{d}), where, for every i\in\{0,\ldots,d\}, \zeta_{i}\in\mathbb{R}^{n_{i}}, with \sum_{i=0}^{d}n_{i}=N. As we will show in Subsection 4.1, Problem (12) is a special instance of the general class of problems of the form

\underset{\zeta\in\mathbb{R}^{N}}{\mathrm{minimize}}\;\;\theta(\zeta)=q(\zeta)+f(\zeta_{0})+\sum_{i=1}^{d}g_{i}(\zeta_{i}), (20)

under the following assumption:

Assumption 1.


  1. 1.

    Function q:\mathbb{R}^{N}\to\mathbb{R} is bounded from below and differentiable with Lipschitz continuous gradient on bounded subsets of \mathbb{R}^{N}.

  2. 2.

    Function f:\mathbb{R}^{n_{0}}\to\mathbb{R} is differentiable with globally Lipschitz continuous gradient of constant L_{f}>0, and is bounded from below.

  3. 3.

    For every i\in\{1,\dots,d\}, function g_{i}:\mathbb{R}^{n_{i}}\to(-\infty,+\infty] is proper, lower semicontinuous and bounded from below, and its restriction to its domain is continuous.

  4. 4.

    \theta is a KŁ function.

Remark 2.

The continuity assumption in Assumption 1.3 is standard in the context of inexact minimisation algorithms (see the assumptions in [37, Theorem 4.1, Theorem 5.2]).

Throughout the paper we will use the following notation: for every (\zeta_{i^{\prime}})_{0\leq i^{\prime}\leq d}\in\mathbb{R}^{n_{0}}\times\cdots\times\mathbb{R}^{n_{d}} and every i\in\{0,\dots,d\}, \zeta_{\neq i}=(\zeta_{0},\dots,\zeta_{i-1},\zeta_{i+1},\dots,\zeta_{d}) and

(\forall z\in\mathbb{R}^{n_{i}})\quad(z;\zeta_{\neq i})=(\zeta_{0},\dots,\zeta_{i-1},z,\zeta_{i+1},\dots,\zeta_{d}). (21)

In order to proceed with the algorithm construction and analysis, let us recall the notion of partial subdifferentiation for a function \theta:\mathbb{R}^{N}\rightarrow\mathbb{R} as in (20). For every i\in\{0,\dots,d\} and a fixed \zeta_{\neq i}, the subdifferential of the partial function \theta(\cdot\,;\zeta_{\neq i}) with respect to the i-th block is denoted by \partial_{i}\theta(\cdot\,;\zeta_{\neq i}). Given these definitions, we have the following differential calculus property (see [48, Exercise 8.8(c), Proposition 10.5]):

Proposition 1.

Let function \theta be defined as in (20). Under Assumption 1, the following equality holds: for every \zeta\in\mathbb{R}^{N},

\partial\theta(\zeta)=\{\nabla_{0}q(\zeta)+\nabla f(\zeta_{0})\}\times\bigtimes_{i=1}^{d}\left(\nabla_{i}q(\zeta)+\partial g_{i}(\zeta_{i})\right)=\bigtimes_{i=0}^{d}\partial_{i}\theta(\zeta). (22)

We are now ready to introduce our block alternating algorithm P-SASL-PAM, outlined in Algorithm 1, to solve problem (20). Throughout the paper, we use the following notation: for every \ell\in\mathbb{N} and i\in\{1,\dots,d\},

\zeta^{\ell+1,0}=\zeta^{\ell};
\zeta^{\ell+1,i}=(\zeta^{\ell+1}_{0},\dots,\zeta^{\ell+1}_{i-1},\zeta^{\ell}_{i},\zeta^{\ell}_{i+1},\dots,\zeta^{\ell}_{d});
\zeta^{\ell+1,d+1}=\zeta^{\ell+1}.

Initialize \zeta_{0}^{0}\in\operatorname{dom}f, \zeta_{i}^{0}\in\operatorname{dom}g_{i} for i\in\{1,\dots,d\}
Set A_{\ell}\in\mathcal{S}_{n_{0}} for every \ell\in\mathbb{N}
Set \gamma_{0}\in(0,1), \gamma_{i}>0 for i\in\{1,\dots,d\}, and \tau_{i}>0 for i\in\{0,\dots,d\}
For \ell=0,1,\ldots until convergence

    \zeta_{0}^{\ell+1}\in\text{prox}^{A_{\ell},\tau_{0}}_{\gamma_{0}q(\cdot\,;\zeta_{\neq 0}^{\ell})}\big(\zeta_{0}^{\ell}-\gamma_{0}A_{\ell}^{-1}\nabla f(\zeta_{0}^{\ell})\big) (23)

    For i=1,\ldots,d

        \zeta_{i}^{\ell+1}\in\text{prox}^{\tau_{i}}_{\gamma_{i}\theta(\cdot\,;\,\zeta^{\ell+1,i}_{\neq i})}(\zeta_{i}^{\ell}) (24)

    end for
end for

Algorithm 1 P-SASL-PAM to solve (20).
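To make the structure of Algorithm 1 concrete, here is a schematic Python sketch of the main loop (ours); grad_f, prox_q_block0, prox_theta_block and metric are placeholders that would have to be supplied for the problem at hand, and no inexactness check is implemented.

import numpy as np

def p_sasl_pam(zeta0, blocks, grad_f, prox_q_block0, prox_theta_block,
               metric, gamma, max_iter=100, tol=1e-6):
    # preconditioned linearised (forward-backward) step on block 0, eq. (23),
    # followed by plain proximal steps on blocks 1..d, eq. (24)
    zeta = [z.copy() for z in [zeta0] + list(blocks)]
    for _ in range(max_iter):
        zeta_old = [z.copy() for z in zeta]
        A = metric(zeta)                                     # SPD preconditioner A_l
        forward = zeta[0] - gamma[0] * np.linalg.solve(A, grad_f(zeta[0]))
        zeta[0] = prox_q_block0(forward, zeta, gamma[0], A)  # (inexact) prox of gamma_0*q(.; rest)
        for i in range(1, len(zeta)):
            zeta[i] = prox_theta_block(i, zeta[i], zeta, gamma[i])
        if sum(np.linalg.norm(z - zo) for z, zo in zip(zeta, zeta_old)) < tol:
            break
    return zeta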

The proposed method sequentially updates the coordinate blocks (\zeta_{0},\dots,\zeta_{d}) involved in function \theta, through proximal and gradient steps. Our algorithm P-SASL-PAM, summarised in Algorithm 1, mixes both standard and linearised proximal regularisation on the coordinate blocks as in SL-PAM [28], while inverting the splitting in order to obtain more efficient proximal computations as in ASAP [29, 30]. On the one hand, the lack of global Lipschitz continuity of \nabla q prevents us from adopting BCVMFB [32]. On the other hand, the lack of differentiability of the whole set of block-separable functions prevents us from adopting ASAP [29, 30]. Our approach takes full advantage of the Lipschitz differentiability assumption on f to perform a linearised step for the update of variable \zeta_{0}, while the remaining \zeta_{i}'s are updated sequentially, according to a standard proximal step. In addition, in order to accelerate the convergence, a preconditioned version of the linearised step is used, which relies on the MM-based variable metric forward-backward strategy introduced in [52]. The latter relies on the following technical assumptions:

Assumption 2.

We choose a sequence of SPD matrices (A_{\ell})_{\ell\in\mathbb{N}} such that there exists (\underline{\nu},\overline{\nu})\in(0,+\infty)^{2} for which, for every \ell\in\mathbb{N},

\underline{\nu}\mathbb{I}_{n_{0}}\preceq A_{\ell}\preceq\overline{\nu}\mathbb{I}_{n_{0}}. (25)

Assumption 3.

The quadratic function defined, for every \zeta_{0}^{+}\in\mathbb{R}^{n_{0}} and every SPD matrix A\in\mathcal{S}_{n_{0}} satisfying Assumption 2, as

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad\phi(\zeta_{0},\zeta_{0}^{+})=f(\zeta_{0}^{+})+\langle\zeta_{0}-\zeta_{0}^{+},\nabla f(\zeta_{0}^{+})\rangle+\frac{1}{2}\|\zeta_{0}^{+}-\zeta_{0}\|_{A}^{2} (26)

is a majorant function of f at \zeta_{0}^{+}, i.e.

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad f(\zeta_{0})\leq\phi(\zeta_{0},\zeta_{0}^{+}). (27)
Remark 3.

Since f satisfies Assumption 1, the Descent Lemma [53, Proposition A.24] applies, yielding

(\forall(\zeta_{0},\zeta_{0}^{+})\in\mathbb{R}^{n_{0}}\times\mathbb{R}^{n_{0}})\quad f(\zeta_{0})\leq f(\zeta_{0}^{+})+\langle\zeta_{0}-\zeta_{0}^{+},\nabla f(\zeta_{0}^{+})\rangle+\frac{L_{f}}{2}\|\zeta_{0}^{+}-\zeta_{0}\|^{2}.

This guarantees that the preconditioning matrix A=L_{f}\mathbb{I}_{n_{0}} satisfies Assumptions 2 and 3, with \underline{\nu}=\overline{\nu}=L_{f}. Apart from this simple choice for matrix A, more sophisticated construction strategies have been studied in the literature [52, 54, 55]. Practical choices of metrics for Problem (12) will be discussed in Section 5, which is dedicated to numerical experiments.
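For the data-fit term f(x)=\frac{1}{2\sigma^{2}}\|y-Kx\|^{2} appearing in (12), the Lipschitz constant of \nabla f is \|K\|^{2}/\sigma^{2}; the following sketch (ours) estimates it by power iteration on K^{\top}K, which can then be used to build the simple metric A=L_{f}\mathbb{I}.

import numpy as np

def lipschitz_constant_data_fit(K, sigma, n_iter=50, seed=0):
    # estimate L_f = ||K||^2 / sigma^2 for f(x) = ||y - Kx||^2 / (2 sigma^2)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(K.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = K.T @ (K @ v)
        v = w / np.linalg.norm(w)
    spectral_norm_sq = v @ (K.T @ (K @ v))   # Rayleigh quotient ~ ||K||^2
    return spectral_norm_sq / sigma ** 2

K = np.array([[1.0, 0.2], [0.0, 0.5]])
print(lipschitz_constant_data_fit(K, sigma=0.1))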

Remark 4.

Alternative approaches to deal with the lack of global Lipschitz continuity of \nabla q could involve a backtracking strategy as in [56] or adaptive step sizes based on an estimate of the local smoothness of the function as in [57, 58].

Remark 5.

Essentially cyclic update rule. Even though Algorithm 1 relies on a sequential update rule for the blocks of coordinates i\in\{0,\dots,d\}, an extension to a quasi-cyclic rule with interval \overline{d}\geq d is possible. In this case, at each iteration, the index i\in\{0,\dots,d\} of the updated block is chosen in such a way that each block is updated at least once every \overline{d} steps.

P-SASL-PAM involves the computation of three proximal operators at each iteration \ell\in\mathbb{N}. As we will show in Subsection 3.3.1, if these operators are computed exactly, P-SASL-PAM fits within the general algorithmic framework TITAN [36] and, as such, inherits its convergence properties. The link between the exact and the inexact forms of Algorithm 1 is discussed in Subsection 3.3.2. The convergence of the inexact form of Algorithm 1 is shown in Subsection 3.4.

3.3.1 Links between P-SASL-PAM and TITAN

Let us show that the exact version of Algorithm 1 is a special instance of the TITAN algorithm from [36]. The scheme of TITAN relies on an MM strategy that, at each iteration and for each block of coordinates, minimises a block surrogate function, i.e. a majorising approximation of the restriction of the objective function to this block. Let us formally define the notion of block surrogate function in the case of Problem (20).

Definition 8 (Block surrogate function).

Consider a function h:\mathbb{R}^{N}\rightarrow\mathbb{R}. For every i\in\{0,\dots,d\}, a function h_{i}:\mathbb{R}^{n_{i}}\times\mathbb{R}^{N}\to\mathbb{R} is called a block surrogate function of h at block i if (\zeta_{i},\xi)\mapsto h_{i}(\zeta_{i};\xi) is continuous in \xi, lower semicontinuous in \zeta_{i}, and the following conditions are satisfied:

  1. (i)

    h_{i}(\xi_{i};\xi)=h(\xi) for every \xi\in\mathbb{R}^{N};

  2. (ii)

    h_{i}(\zeta_{i};\xi)\geq h(\zeta_{i};\xi_{\neq i}) for all \zeta_{i}\in\mathbb{R}^{n_{i}} and \xi\in\mathbb{R}^{N}.

Function h_{i}(\cdot;\xi) is said to be a block surrogate function of h at block i in \xi. The block approximation error for block i at a point (\zeta_{i},\xi)\in\mathbb{R}^{n_{i}}\times\mathbb{R}^{N} is then defined for every i\in\{0,\dots,d\} as

\mathsf{e}_{i}(\zeta_{i};\xi):=h_{i}(\zeta_{i};\xi)-h(\zeta_{i};\xi_{\neq i}).

Let us now show that each step of Algorithm 1 is actually equivalent to minimising an objective function involving a block surrogate function of the differentiable terms of \theta for block i\in\{0,\dots,d\} at the current iterate.

Solving (23) in Algorithm 1 is equivalent to solving

\underset{\zeta_{0}\in\mathbb{R}^{n_{0}}}{\mathrm{argmin}}\;\;h_{0}(\zeta_{0};\zeta^{\ell}) (28)

where

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad h_{0}(\zeta_{0};\zeta^{\ell})=q(\zeta_{0};\zeta^{\ell}_{\neq 0})+f(\zeta_{0}^{\ell})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell}-\zeta_{0}\|^{2}_{A_{\ell}}

is a block surrogate function of q(\cdot\,;\zeta^{\ell}_{\neq 0})+f by virtue of Assumption 3. Notice that function h_{0} is continuous in (\zeta_{0};\zeta)\in\mathbb{R}^{n_{0}}\times\mathbb{R}^{N}.

Solving (24) for a given block index i\in\{1,\dots,d\} in Algorithm 1 is equivalent to solving

\underset{\zeta_{i}\in\mathbb{R}^{n_{i}}}{\mathrm{argmin}}\;\;h_{i}(\zeta_{i};\zeta^{\ell+1,i})+g_{i}(\zeta_{i}) (29)

where the function

(\forall\zeta_{i}\in\mathbb{R}^{n_{i}})\qquad h_{i}(\zeta_{i};\zeta^{\ell+1,i})=q(\zeta_{i};\zeta^{\ell+1,i}_{\neq i})+\frac{1}{2\gamma_{i}}\|\zeta_{i}-\zeta_{i}^{\ell}\|^{2}

is a proximal surrogate of function q(\cdot\,;\zeta^{\ell+1,i}_{\neq i}) at its i-th block in \zeta^{\ell+1,i}. Note that function h_{i} is continuous on \mathbb{R}^{N}.

In a nutshell, Algorithm 1 alternates between minimisation subproblems involving block surrogates for the differentiable terms of function \theta and, as such, can be viewed as a special instance of TITAN [36]. This allows us to state the following convergence result for a sequence generated by Algorithm 1.

Theorem 2.

Let Assumptions 1-3 be satisfied. Assume also that the sequence (\zeta^{\ell})_{\ell\in\mathbb{N}} generated by Algorithm 1 is bounded. Then,

  1. i)

    \sum_{\ell=0}^{+\infty}\|\zeta^{\ell+1}-\zeta^{\ell}\|<+\infty;

  2. ii)

    (\zeta^{\ell})_{\ell\in\mathbb{N}} converges to a critical point \zeta^{*} of function \theta in (20).

Proof.

We start the proof by identifying the block approximation errors for the block surrogate functions at an iteration \ell\in\mathbb{N}:

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad\mathsf{e}_{0}(\zeta_{0};\zeta^{\ell})=f(\zeta_{0}^{\ell})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell}-\zeta_{0}\|^{2}_{A_{\ell}}-f(\zeta_{0})

and, for i\in\{1,\dots,d\},

(\forall\zeta_{i}\in\mathbb{R}^{n_{i}})\quad\mathsf{e}_{i}(\zeta_{i};\zeta^{\ell+1,i})=\frac{1}{2\gamma_{i}}\|\zeta_{i}-\zeta_{i}^{\ell}\|^{2}.

Clearly \mathsf{e}_{0}(\cdot\,;\zeta^{\ell}) (resp. \mathsf{e}_{i}(\cdot\,;\zeta^{\ell+1,i}) for i\in\{1,\dots,d\}) is differentiable at \zeta_{0}^{\ell} (resp. at \zeta_{i}^{\ell} for i\in\{1,\dots,d\}) and the following holds for every i\in\{0,\dots,d\}:

\mathsf{e}_{i}(\zeta_{i}^{\ell};\zeta^{\ell+1,i})=0,\qquad\nabla_{i}\mathsf{e}_{i}(\zeta_{i}^{\ell};\zeta^{\ell+1,i})=0.

This shows that [36, Assumption 2] is satisfied.

From (23) and Assumptions 2-3, we deduce that

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}),

which implies that the Nearly Sufficient Descending Property [36, (NSDP)] is satisfied for the first block of coordinates with constant \frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}. On the other hand, for every i\in\{1,\dots,d\}, function \mathsf{e}_{i}(\cdot\,;\zeta^{\ell+1,i}) satisfies [36, Condition 2], which implies that [36, (NSDP)] also holds for the i-th block of coordinates with the corresponding constant 1/\gamma_{i}.

Moreover, [36, Condition 4.(ii)] is satisfied by Algorithm 1 with

\overline{l}=\max\left\{\frac{\underline{\nu}}{2}\left(\frac{1}{\gamma_{0}}-1\right),\frac{1}{\gamma_{1}},\dots,\frac{1}{\gamma_{d}}\right\}

and this constant fulfills the requirements of [36, Theorem 8]. In addition, by virtue of Proposition 1, [36, Assumption 3.(i)] holds, while the requirement in [36, Assumption 3.(ii)] is guaranteed by the fact that all the block surrogate functions are continuously differentiable.

Finally, since Algorithm 1 does not include any extrapolation step, we do not need to verify [36, Condition 1], whereas [36, Condition 4.(i)] is always satisfied.

In conclusion, we have proved that all the requirements of [36, Proposition 5, Theorems 6 and 8] are satisfied. [36, Proposition 5] guarantees that the sequence has the finite-length property expressed by i), while [36, Theorems 6 and 8] state that the sequence converges to a critical point \zeta^{*} of (20), which concludes the proof. ∎

3.3.2 Well-definition of Algorithm 1

Now, we show that the inexact updates involved in Algorithm 1 are well-defined. To do so, we prove that the P-SASL-PAM algorithm with exact proximal computations can be recovered as a special case of Algorithm 1.

By the variational definition of the proximity operator, for every \ell\in\mathbb{N}, the iterates of Algorithm 1 satisfy, for every i\in\{1,\dots,d\},

\zeta_{0}^{\ell+1}\in\underset{u_{0}\in\mathbb{R}^{n_{0}}}{\mathrm{argmin}}\;\;\Big\{q(u_{0}\,;\zeta_{\neq 0}^{\ell})+\langle\nabla f(\zeta_{0}^{\ell}),u_{0}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|u_{0}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\Big\} (30)
\zeta_{i}^{\ell+1}\in\underset{u_{i}\in\mathbb{R}^{n_{i}}}{\mathrm{argmin}}\;\;\Big\{q(u_{i}\,;\zeta_{\neq i}^{\ell+1,i})+g_{i}(u_{i})+\frac{1}{2\gamma_{i}}\|u_{i}-\zeta_{i}^{\ell}\|^{2}\Big\} (31)

so that

q(\zeta^{\ell+1,1})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell}) (32)
q(\zeta^{\ell+1,i+1})+g_{i}(\zeta_{i}^{\ell+1})+\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,i})+g_{i}(\zeta_{i}^{\ell}), (33)

which implies that the sufficient decrease condition (16) is satisfied for every i\in\{0,\dots,d\}.

Fermat's rule implies that, for every \ell\in\mathbb{N}, the iterates of P-SASL-PAM are such that, for every i\in\{1,\dots,d\}, there exists r_{i}\in\partial g_{i}(\zeta_{i}^{\ell+1}) for which the following equalities are satisfied:

0=\nabla f(\zeta_{0}^{\ell})+\nabla_{0}q(\zeta^{\ell+1,1})+\gamma_{0}^{-1}A_{\ell}(\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}) (34)
0=r_{i}+\nabla_{i}q(\zeta^{\ell+1,i+1})+\gamma_{i}^{-1}(\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}). (35)

Hence, for every i\in\{1,\dots,d\},

\|\nabla f(\zeta_{0}^{\ell})+\nabla_{0}q(\zeta^{\ell+1,1})\|\leq\gamma_{0}^{-1}\overline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\| (36)
\|r_{i}+\nabla_{i}q(\zeta^{\ell+1,i+1})\|=\gamma_{i}^{-1}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|, (37)

which implies that the inexact optimality condition (17) is satisfied with \tau_{0}=\gamma_{0}^{-1}\overline{\nu} for the first block of coordinates and \tau_{i}=\gamma_{i}^{-1} for the remaining ones. In a nutshell, Algorithm 1 is well defined, since its inexact rules hold when the proximity operators are computed exactly, in which case the scheme reduces to TITAN.

3.4 Convergence analysis in the inexact case

Let us now present our main result, that is the convergence analysis for Algorithm 1.

Lemma 1.

Let (\zeta^{\ell})_{\ell\in\mathbb{N}} be the sequence generated by Algorithm 1. Then, under Assumptions 1 and 2,

  1. i)

    there exists \mu\in(0,+\infty) such that for every \ell\in\mathbb{N},

    \theta(\zeta^{\ell+1})\leq\theta(\zeta^{\ell})-\frac{\mu}{2}\|\zeta^{\ell+1}-\zeta^{\ell}\|^{2}. (38)

  2. ii)

    \sum_{\ell=0}^{+\infty}\|\zeta^{\ell+1}-\zeta^{\ell}\|^{2}<+\infty.

Proof.

Let us start by considering the sufficient decrease inequality related to the first block:

q(\zeta^{\ell+1,1})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell}). (39)

Adding f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2} on both sides of (39) yields

q(\zeta^{\ell+1,1})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}+f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}. (40)

By applying (26) and (27) with \zeta_{0}^{+}=\zeta_{0}^{\ell} and \zeta_{0}=\zeta_{0}^{\ell+1} we obtain

f(\zeta_{0}^{\ell+1})\leq f(\zeta_{0}^{\ell})+\langle\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell},\nabla f(\zeta_{0}^{\ell})\rangle+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}, (41)

hence the LHS in (40) can be further lower bounded, yielding

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}, (42)

hence

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}). (43)

To conclude, by Assumption 2, we get

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}). (44)

The sufficient decrease inequality for the remaining blocks of index i\in\{1,\ldots,d\} can be expressed as

q(\zeta^{\ell+1,i+1})+g_{i}(\zeta_{i}^{\ell+1})-g_{i}(\zeta_{i}^{\ell})+\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,i}). (45)

The first term in the LHS of (45) for the i-th block can be similarly bounded from below using the sufficient decrease inequality for the (i+1)-th block, yielding

q(\zeta^{\ell+1,i+2})+g_{i+1}(\zeta_{i+1}^{\ell+1})-g_{i+1}(\zeta_{i+1}^{\ell})+\frac{1}{2\gamma_{i+1}}\|\zeta_{i+1}^{\ell+1}-\zeta_{i+1}^{\ell}\|^{2}+g_{i}(\zeta_{i}^{\ell+1})-g_{i}(\zeta_{i}^{\ell})+\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,i}). (46)

By applying this reasoning recursively from i=1 to i=d, we obtain

q(\zeta^{\ell+1,d+1})+\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell+1})-\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell})+\sum_{i=1}^{d}\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,1}), (47)

where we recall that q(\zeta^{\ell+1,d+1})=q(\zeta^{\ell+1}).

Exploiting now (47), we can lower bound the first term in the LHS of (44), which yields

q(\zeta^{\ell+1})+\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell+1})-\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell})+\sum_{i=1}^{d}\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}). (48)

By setting \mu=\min\left\{\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu},\frac{1}{\gamma_{1}},\dots,\frac{1}{\gamma_{d}}\right\}, we deduce (38).

From (38), it follows that the sequence (\theta(\zeta^{\ell}))_{\ell\in\mathbb{N}} is non-increasing. Since function \theta is assumed to be bounded from below, this sequence converges to some real number \underline{\theta}. We then have, for every integer K,

\sum_{\ell=0}^{K}\|\zeta^{\ell}-\zeta^{\ell+1}\|^{2}\leq\frac{1}{\mu}\sum_{\ell=0}^{K}\left(\theta(\zeta^{\ell})-\theta(\zeta^{\ell+1})\right)=\frac{1}{\mu}(\theta(\zeta^{0})-\theta(\zeta^{K+1}))\leq\frac{1}{\mu}(\theta(\zeta^{0})-\underline{\theta}). (49)

Taking the limit as K\rightarrow+\infty yields the desired summability property. ∎

Lemma 2.

Assume that the sequence (\zeta^{\ell})_{\ell\in\mathbb{N}} generated by Algorithm 1 is bounded. Then, for every \ell\in\mathbb{N}, there exists s^{\ell+1}\in\partial\theta(\zeta^{\ell+1}) such that

\|s^{\ell+1}\|\leq\rho\|\zeta^{\ell+1}-\zeta^{\ell}\|, (50)

where \rho\in(0,+\infty).

Proof.

The assumed boundedness implies that there exists a bounded subset SS of N\mathbb{R}^{N} such that for every i{0,,d}i\in\{0,\dots,d\} and \ell\in\mathbb{N}, ζ+1,iS\zeta^{\ell+1,i}\in S. For every \ell\in\mathbb{N}, we define

s0+1=f(ζ0+1)+0q(ζ+1)s_{0}^{\ell+1}=\nabla f(\zeta_{0}^{\ell+1})+\nabla_{0}q(\zeta^{\ell+1}) (51)

for which the following holds by virtue of Proposition 22

s0+10θ(ζ+1)={0θ(ζ+1)}.s^{\ell+1}_{0}\in\partial_{0}\theta(\zeta^{\ell+1})=\{\nabla_{0}\theta(\zeta^{\ell+1})\}. (52)

Then

s0+1\displaystyle\|s^{\ell+1}_{0}\| f(ζ0+1)f(ζ0)\displaystyle\leq\|\nabla f(\zeta_{0}^{\ell+1})-\nabla f(\zeta_{0}^{\ell})\|
+f(ζ0)+0q(ζ+1,1)\displaystyle+\|\nabla f(\zeta_{0}^{\ell})+\nabla_{0}q(\zeta^{\ell+1,1})\|
+0q(ζ+1)0q(ζ+1,1).\displaystyle+\|\nabla_{0}q(\zeta^{\ell+1})-\nabla_{0}q(\zeta^{\ell+1,1})\|.

From the Lipschitz continuity of f\nabla f and q\nabla q on SS and the inexact optimality inequality for the first block, we conclude that

s0+1(Lf+τ0+Lq)ζ+1ζ.\|s^{\ell+1}_{0}\|\leq\left(L_{f}+\tau_{0}+L_{q}\right)\|\zeta^{\ell+1}-\zeta^{\ell}\|. (53)

In the same spirit, for every i{1,,d}i\in\{1,\dots,d\} we consider ri+1g(ζi+1)r_{i}^{\ell+1}\in\partial g(\zeta_{i}^{\ell+1}) satisfying the inexact optimality inequality with the corresponding τi\tau_{i}. We then define

si+1\displaystyle s_{i}^{\ell+1} =iq(ζ+1)+ri+1÷\displaystyle=\nabla_{i}q(\zeta^{\ell+1})+r_{i}^{\ell+1}\textdiv
iq(ζ+1)+gi(ζi+1)=iθ(ζ+1).\displaystyle\in\nabla_{i}q(\zeta^{\ell+1})+\partial g_{i}(\zeta_{i}^{\ell+1})=\partial_{i}\theta(\zeta^{\ell+1}). (54)

For i=di=d, by virtue of the inexact optimality inequality,

sd+1τdζ+1ζ.\|s_{d}^{\ell+1}\|\leq\tau_{d}\|\zeta^{\ell+1}-\zeta^{\ell}\|. (55)

On the other hand, for i=1,\dots,d-1

si+1\displaystyle\|s_{i}^{\ell+1}\| =iq(ζ+1)+ri+1\displaystyle=\|\nabla_{i}q(\zeta^{\ell+1})+r_{i}^{\ell+1}\|
iq(ζ+1)iq(ζ+1,i+1)\displaystyle\leq\|\nabla_{i}q(\zeta^{\ell+1})-\nabla_{i}q(\zeta^{\ell+1,i+1})\|
+ri+1+iq(ζ+1,i+1)\displaystyle\qquad\qquad+\|r_{i}^{\ell+1}+\nabla_{i}q(\zeta^{\ell+1,i+1})\|
Lqζ+1ζ+τiζi+1ζi,\displaystyle\leq L_{q}\|\zeta^{\ell+1}-\zeta^{\ell}\|+\tau_{i}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|,

where the last estimate stems from the inexact optimality inequality for the i-th block. This yields

si+1(Lq+τi)ζ+1ζ.\|s_{i}^{\ell+1}\|\leq(L_{q}+\tau_{i})\|\zeta^{\ell+1}-\zeta^{\ell}\|. (56)

To conclude, setting

s+1=(s0+1,,sd+1)θ(ζ+1)s^{\ell+1}=(s_{0}^{\ell+1},\dots,s_{d}^{\ell+1})\in\partial\theta(\zeta^{\ell+1}) (57)

and ρ=Lf+i=0dτi+dLq\rho=L_{f}+\sum_{i=0}^{d}\tau_{i}+dL_{q} yields (50). ∎

We now report a first convergence result for a sequence generated by the proposed algorithm, which is reminiscent of [24, Proposition 6]:

Proposition 3 (Properties of the cluster points set).

Suppose that Assumptions 1 and 2 hold. Let (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} be a sequence generated by Algorithm 1. Denote by ω(ζ0)\omega(\zeta^{0}) the (possibly empty) set of its cluster points. Then

  1. i)

    if (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} is bounded, then ω(ζ0)\omega(\zeta^{0}) is a nonempty compact connected set and

    dist(ζ,ω(ζ0))0 as +;\operatorname{dist}(\zeta^{\ell},\omega(\zeta^{0}))\rightarrow 0\quad\text{\;as\;}\quad\ell\rightarrow+\infty;
  2. ii)

    ω(ζ0)critθ\omega(\zeta^{0})\subset\operatorname{crit}\,\theta, where critθ\operatorname{crit}\,\theta is the set of critical points of function θ\theta;

  3. iii)

    θ\theta is finite valued and constant on ω(ζ0)\omega(\zeta^{0}), and it is equal to

    infθ(ζ)=lim+θ(ζ).\inf_{\ell\in\mathbb{N}}\theta(\zeta^{\ell})=\lim_{\ell\rightarrow+\infty}\theta(\zeta^{\ell}).
Proof.

The proof of the above results for the proposed algorithm is essentially identical to that of [24, Proposition 6] for the PAM algorithm. In addition, we highlight that, according to Assumption 1, our objective function \theta is continuous on its domain. ∎

In conclusion, we have proved that, under Assumptions 1-3, a bounded sequence generated by the proposed method satisfies the assumptions in [37, Theorem 2.9]. Consequently, we can state the following result:

Theorem 4.

Let Assumptions 1-3 be satisfied and let (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} be a sequence generated by Algorithm 1 that is assumed to be bounded. Then,

  1. i)

    =1+ζ+1ζ<+\sum_{\ell=1}^{+\infty}\|\zeta^{\ell+1}-\zeta^{\ell}\|<+\infty;

  2. ii)

    (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} converges to a critical point ζ\zeta^{*} of θ\theta.

We have thus shown that both the exact and the inexact versions of Algorithm 1 share the same convergence guarantees under Assumptions 1-3. One of the main differences between the two settings, as highlighted in [37], is that the convergence guarantees of the exact version hold for a merely lower semicontinuous objective function, whereas the inexact version requires the objective to be continuous on its domain. However, as will be shown in the next section, this does not represent an obstacle to the use of Algorithm 1 in image processing applications.

4 Application of P-SASL-PAM

4.1 Smoothing of the coupling term

The application of Algorithm 1 to Problem (12) requires the involved functions to fulfil the requirements listed in Assumption 1. This section is devoted to this analysis, by first defining d=2d=2, n0=n1=n2=nn_{0}=n_{1}=n_{2}=n, N=3nN=3n and the following functions, for every x=(xi)1innx=(x_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, p=(pi)1innp=(p_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, and β=(βi)1inn\beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n},

q~(x,p,β)\displaystyle\tilde{q}(x,p,\beta) =i=1n|xi|pieβipi,\displaystyle=\sum_{i=1}^{n}|x_{i}|^{p_{i}}e^{-\beta_{i}p_{i}}, (58)
f(x)\displaystyle f(x) =12σ2yKx22,\displaystyle=\frac{1}{2\sigma^{2}}\|y-Kx\|_{2}^{2}, (59)
g1(p)\displaystyle g_{1}(p) =i=1n(lnΓ(1+1pi)+ι[a,b](pi))\displaystyle=\sum_{i=1}^{n}\left(\ln\Gamma(1+\frac{1}{p_{i}})+\iota_{[a,b]}(p_{i})\right) (60)
+λTV(p),\displaystyle\qquad\qquad+\lambda\operatorname{TV}(p),
g2(β)\displaystyle g_{2}(\beta) =i=1n(βi+(βiμβ)22σβ2)+ζTV(β).\displaystyle=\sum_{i=1}^{n}\left(\beta_{i}+{\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}}\right)+\zeta\operatorname{TV}(\beta). (61)

The first item in Assumption 1 regarding the regularity of the coupling term is not satisfied by (58). To circumvent this difficulty, we introduce the pseudo-Huber loss function [59] depending on a pair of parameters δ=(δ1,δ2)(0,+)2\delta=(\delta_{1},\delta_{2})\in(0,+\infty)^{2} such that δ2<δ1\delta_{2}<\delta_{1}:

(t)Cδ(t)=Hδ1(t)δ2,(\forall t\in\mathbb{R})\quad C_{\delta}(t)=H_{\delta_{1}}(t)-\delta_{2}, (62)

where Hδ1H_{\delta_{1}} is the hyperbolic function defined, for every tt\in\mathbb{R}, by Hδ1(t)=t2+δ12H_{\delta_{1}}(t)=\sqrt{t^{2}+\delta^{2}_{1}}. Function (62) is used as a smooth approximation of the absolute value involved in (58). We then replace (58) with

q(x,p,β)=i=1n(Cδ(xi))pieβipi.q(x,p,\beta)=\sum_{i=1}^{n}\left(C_{\delta}(x_{i})\right)^{p_{i}}e^{-\beta_{i}p_{i}}. (63)

Function CδC_{\delta} is infinitely differentiable. Thus function (63) satisfies Assumption 1.

Function (59) is convex and quadratic, hence it clearly satisfies Assumption 1(ii). Function (60) is a sum of functions that are proper, lower semicontinuous, and either non-negative or bounded from below. The same applies to function (61), which is moreover strongly convex. It follows that (60) and (61) satisfy Assumption 1(iii).

Now, we must show that Θ\Theta is a KŁ function. To do so, let us consider the notion of o-minimal structure [60], which is a particular family 𝒪={𝒪n}n\mathcal{O}=\{\mathcal{O}_{n}\}_{n\in\mathbb{N}} where each 𝒪n\mathcal{O}_{n} is a collection of subsets of n\mathbb{R}^{n}, satisfying a series of axioms (we refer to [24, Definition 13], for a more complete description). We present hereafter the definition of definable set and definable function in an o-minimal structure:

Definition 9 (Definable sets and definable functions).

Given an o-minimal structure \mathcal{O}, a set \mathcal{A}\subset\mathbb{R}^{n} such that \mathcal{A}\in\mathcal{O}_{n} is said to be definable in \mathcal{O}. An extended real valued function f\,:\,\mathbb{R}^{n}\rightarrow(-\infty,+\infty] is said to be definable in \mathcal{O} if its graph is a definable subset of \mathbb{R}^{n}\times\mathbb{R}.

The importance of these concepts in mathematical optimisation is related to the following key result concerning the KŁ property [61]:

Theorem 5.

Any proper lower semicontinuous function f:n(,+]f:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] which is definable in an o-minimal structure 𝒪\mathcal{O} has the KŁ property at each point of domf\operatorname{dom}\,\partial f.

Let us identify a structure in which all the functions involved in the definition of Θ\Theta are definable. This will be sufficient, as definability is a closed property with respect to several operations, including finite sum and composition of functions. Before that, we provide a couple of examples of o-minimal structure. The first is represented by the structure of globally subanalytic sets an\mathbb{R}_{\rm an} [62], which contains all the sets of the form {(u,t)[1,1]n×f(u)=t}\{(u,t)\in[-1,1]^{n}\times\mathbb{R}\mid f(u)=t\} where f:[1,1]nf\,:\,[-1,1]^{n}\rightarrow\mathbb{R} is an analytic function that can be analytically extended on a neighbourhood of [1,1]n[-1,1]^{n}. The second example is the log\log-exp\exp structure (an,exp)(\mathbb{R}_{\rm an},\text{exp}) [60, 63], which includes an\mathbb{R}_{\rm an} and the graph of the exponential function. Even though this second structure is a common setting for many optimisation problems, it does not meet the requirements for ours: as shown in [64], Γ>0\Gamma^{>0} (i.e. , the restriction of the Gamma function to (0,+)(0,+\infty)) is not definable on (an,exp)(\mathbb{R}_{\rm an},\exp). We thus consider the larger structure (𝒢,exp)(\mathbb{R_{\mathcal{G}}},\exp), where Γ>0\Gamma^{>0} has been proved to be definable [65]. 𝒢\mathbb{R_{\mathcal{G}}} is an o-minimal structure that extends an\mathbb{R}_{\rm an} and is generated by the class 𝒢\mathcal{G} of Gevrey functions from [66].

We end this section with the following result, which will be useful subsequently.

Proposition 6.

The function t\mapsto\ln\Gamma(1+\frac{1}{t}) defined on (0,+\infty) is \mu-weakly convex for every \mu>\mu_{0}\approx 0.1136.

Proof.

Let us show that there exists μ>0\mu>0 such that function tlnΓ(1+1t)+μt2/2t\mapsto\ln\Gamma(1+\frac{1}{t})+\mu t^{2}/2 is convex on (0,+)(0,+\infty). The second-order derivative of this function on the positive real axis reads

d2dt2(lnΓ(1+1t)+μ2t2)=1t3(2Dig(1+1t)+1tDig(1+1t)+μt3),\frac{d^{2}}{dt^{2}}\left(\ln\Gamma\left(1+\frac{1}{t}\right)+\frac{\mu}{2}t^{2}\right)=\\ \frac{1}{t^{3}}\left(2\text{Dig}\left(1+\frac{1}{t}\right)+\frac{1}{t}\text{Dig}^{\prime}\left(1+\frac{1}{t}\right)+\mu t^{3}\right), (64)

where the Digamma function \operatorname{Dig} is the logarithmic derivative of the Gamma function. In order to show the convexity of the considered function, we need to ensure that (64) is positive for every t\in(0,+\infty). By virtue of Bohr–Möllerup's theorem [67, Theorem 2.1], among all functions extending the factorial to the positive real numbers, only the Gamma function is log-convex; more precisely, its natural logarithm is (strictly) convex on the positive real axis. This implies that t\mapsto\text{Dig}^{\prime}(t) is positive. It follows that the only sign-changing term in (64) is the function t\mapsto 2\,\text{Dig}\left(1+\frac{1}{t}\right), as t\mapsto\text{Dig}(t) vanishes at a point t_{0}>1 (t_{0}\approx 1.46163), which corresponds to the minimum point of the Gamma function, and therefore also of its natural logarithm [68]. As a consequence, the Digamma function is strictly positive for t\in(t_{0},+\infty), implying that t\mapsto\text{Dig}\left(1+\frac{1}{t}\right) is strictly positive for all t\in(0,\frac{1}{t_{0}-1}). Furthermore, t\mapsto\text{Dig}\left(1+\frac{1}{t}\right) is strictly decreasing and bounded from below, as shown by the negativity of its first derivative

ddtDig(1+1t)=1t2Dig(1+1t)\frac{d}{dt}\text{Dig}\left(1+\frac{1}{t}\right)=-\frac{1}{t^{2}}\text{Dig}^{\prime}\left(1+\frac{1}{t}\right)

and by the following limit

limt+Dig(1+1t)=Dig(1)=\lim_{t\rightarrow+\infty}\text{Dig}\left(1+\frac{1}{t}\right)=\text{Dig}(1)=-\mathcal{E}

where the last equality holds by virtue of the Gauss Digamma theorem, and \mathcal{E} is Euler-Mascheroni’s constant 0.57721\mathcal{E}\approx 0.57721 [69]. In conclusion, for t[1t01,+)t\in[\frac{1}{t_{0}-1},+\infty), we need to ensure that the positive terms in (64) manage to balance the negative contribution of function t2Dig(1+1t)>2t\mapsto 2\text{Dig}\left(1+\frac{1}{t}\right)>-2\mathcal{E}. This leads to a condition on parameter μ>0\mu>0, since we can impose that

0<μt32,0<\mu t^{3}-2\mathcal{E},

where the right-hand side expression has a lower bound μ/(t01)32\mu/(t_{0}-1)^{3}-2\mathcal{E} that is positive when

μ>2(t01)3=μ00.1136.\mu>2{\mathcal{E}}(t_{0}-1)^{3}=\mu_{0}\approx 0.1136.

This shows that the function t\mapsto\ln\Gamma(1+\frac{1}{t}) is \mu-weakly convex for every \mu>\mu_{0}. ∎
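The constants appearing in this proof are easy to check numerically. The following minimal sketch (assuming SciPy is available; all variable names are ours and purely illustrative) locates the zero t_0 of the Digamma function, evaluates \mu_{0}=2\mathcal{E}(t_{0}-1)^{3}, and verifies on a grid that the second derivative (64) stays positive for a value of \mu slightly above \mu_{0}.

import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, polygamma

# Zero of the Digamma function (minimum point of the Gamma function), t0 ~ 1.46163.
t0 = brentq(digamma, 1.0, 2.0)
euler = -digamma(1.0)                      # Euler-Mascheroni constant, ~0.57721
mu0 = 2.0 * euler * (t0 - 1.0) ** 3        # ~0.1136

# Second derivative (64) of ln Gamma(1 + 1/t) + (mu/2) t^2, evaluated on a grid.
mu = 1.01 * mu0
t = np.linspace(1e-2, 1e3, 200000)
d2 = (2.0 * digamma(1.0 + 1.0 / t) + polygamma(1, 1.0 + 1.0 / t) / t + mu * t ** 3) / t ** 3
print(t0, mu0, bool(np.all(d2 > 0)))       # expected: 1.46163..., 0.1136..., True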

4.2 Proximal computations

Let us now discuss the practical implementation of the proximal computations involved in Algorithm 1. Specifically, as we will show, none of these operators has a closed-form expression, so we need to resort to the inexact version of the algorithm. To ease the description, we summarise in Algorithm 2 the application of Algorithm 1 to the resolution of (12). As pointed out in [70] and in [35], the role of the relative error conditions (16) and (17) is more of theoretical interest than of practical use. In the following, we will illustrate optimisation procedures ensuring that condition (16) is satisfied for every block of variables at every iteration.

Initialize x0x^{0}, p0p^{0} and β0\beta^{0}
Set γ0(0,1)\gamma_{0}\in(0,1), γ1(0,1/μ0)\gamma_{1}\in(0,1/\mu_{0}), γ2>0\gamma_{2}>0
For =0,1,\,\ell=0,1,\ldots
Set A𝒮n{A_{\ell}}\in\mathcal{S}_{n}
Find

x+1\displaystyle x^{\ell+1} proxγ0q(,p,β)A(xγ0A1f(x))\displaystyle\approx\text{prox}^{A}_{\gamma_{0}q{(\cdot,p^{\ell},\beta^{\ell})}}(x^{\ell}-\gamma_{0}A^{-1}\nabla f(x^{\ell})) (65)
 (with Algorithm 3)
p+1\displaystyle p^{{\ell}+1} proxγ1θ(x+1,,β)(p)\displaystyle\approx\text{prox}_{\gamma_{1}\theta{(x^{{\ell}+1},\cdot,\beta^{\ell})}}(p^{{\ell}}) (66)
 (with Algorithm 5)
β+1\displaystyle\beta^{{\ell}+1} proxγ2θ(x+1,p+1,)(β)\displaystyle\approx\text{prox}_{\gamma_{2}\theta{(x^{{\ell}+1},p^{\ell+1},\cdot)}}(\beta^{\ell}) (67)
 (with Algorithm 6)
Algorithm 2 P-SASL-PAM to solve (12)
Proximal computation with respect to xx.

Subproblem (65) in Algorithm 2 requires the computation of the proximity operator of the following separable function

q(,p,β):xi=1n(Cδ(xi))pieβipi,q(\cdot,p^{\ell},\beta^{\ell})\,:\,x\,\mapsto\,\sum_{i=1}^{n}\left(C_{\delta}(x_{i})\right)^{p_{i}^{\ell}}e^{-\beta_{i}^{\ell}p_{i}^{\ell}},

within a weighted Euclidean metric induced by some matrix A𝒮nA\in\mathcal{S}_{n}. We notice that xi(Cδ(xi))pix_{i}\mapsto\left(C_{\delta}(x_{i})\right)^{p_{i}^{\ell}} is non-convex whenever pi(0,1)p^{\ell}_{i}\in(0,1), for some i{1,,n}i\in\{1,\dots,n\}. In order to overcome this issue, we apply a majorisation principle [71]. Let us introduce function σ\sigma defined, for every u[δ1,+)u\in[\delta_{1},+\infty), as σ(u)=(uδ2)p\sigma(u)=(u-\delta_{2})^{p} with p(0,1]p\in(0,1], and vector δ=(δ1,δ2)(0,+)2\delta=(\delta_{1},\delta_{2})\in(0,+\infty)^{2} such that δ2<δ1\delta_{2}<\delta_{1}. Since this function is concave, it can be majorised by its first-order expansion around any point w>δ2w>\delta_{2}:

(u>δ2)(uδ2)p(wδ2)p+p(wδ2)p1(uw)=(1p)(wδ2)p+p(wδ2)p1(uδ2).(\forall u>\delta_{2})\quad(u-\delta_{2})^{p}\leq(w-\delta_{2})^{p}\\ +p(w-\delta_{2})^{p-1}(u-w)\\ =(1-{p})(w-\delta_{2})^{p}+{p}(w-\delta_{2})^{p-1}(u-\delta_{2}). (68)

Setting, for every (t,t)2(t,t^{\prime})\in{\mathbb{R}^{2}}, u=Hδ1(t)δ1u=H_{\delta_{1}}(t)\geq\delta_{1}, w=Hδ1(t)δ1w=H_{\delta_{1}}(t^{\prime})\geq\delta_{1} allows us to deduce the following majorisation:

(Cδ(t))p\displaystyle(C_{\delta}(t))^{p}\leq (1p)(Cδ(t))p+p(Cδ(t))p1Cδ(t).\displaystyle(1-p)(C_{\delta}(t^{\prime}))^{p}+p(C_{\delta}(t^{\prime}))^{p-1}C_{\delta}(t). (69)
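As a quick sanity check of the tangent inequality (69), one can compare the two sides numerically for randomly drawn points; the values chosen below for \delta and p are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
delta1, delta2, p = 1.0, 1e-2, 0.5                  # illustrative values with delta2 < delta1, p in (0,1]

def C(t):                                           # pseudo-Huber loss (62)
    return np.sqrt(t ** 2 + delta1 ** 2) - delta2

t, t_prime = rng.normal(size=10000), rng.normal(size=10000)
lhs = C(t) ** p
rhs = (1.0 - p) * C(t_prime) ** p + p * C(t_prime) ** (p - 1.0) * C(t)
print(bool(np.all(lhs <= rhs + 1e-12)))             # True: the majorisation (69) holds at every sample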

Let us now define ={i{1,,n}|pi1}\mathcal{I}^{\ell}=\{i\in\{1,\dots,n\}\;|\;p^{\ell}_{i}\geq 1\} and 𝒥={1,,n}\mathcal{J}^{\ell}=\{1,\dots,n\}\setminus\mathcal{I}^{\ell}. Given v=(vi)1innv=(v_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, we deduce from (69) that

(x=(xi)1inn)\displaystyle(\forall x=(x_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})
q(x,p,β)=i(Cδ(xi))pieβipi\displaystyle\quad q(x,p^{\ell},\beta^{\ell})=\sum_{i\in\mathcal{I}^{\ell}}\left(C_{\delta}(x_{i})\right)^{p^{\ell}_{i}}e^{-\beta^{\ell}_{i}p^{\ell}_{i}} (70)
+i𝒥(Cδ(xi))pieβipi\displaystyle\qquad\qquad+\sum_{i\in\mathcal{J}^{\ell}}\left(C_{\delta}(x_{i})\right)^{p^{\ell}_{i}}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}
q¯(x,v,p,β),\displaystyle\leq\overline{q}(x,v,p^{\ell},\beta^{\ell}), (71)

where the resulting majorant function is separable, i.e. 

\overline{q}(x,v,p^{\ell},\beta^{\ell})=\sum_{i=1}^{n}\overline{q}_{i}(x_{i},v_{i},p^{\ell}_{i},\beta^{\ell}_{i}), (72)

with, for every i{1,,n}i\in\{1,\ldots,n\} and xix_{i}\in\mathbb{R},

q¯i(xi,vi,pi,βi)\displaystyle\overline{q}_{i}(x_{i},v_{i},p^{\ell}_{i},\beta^{\ell}_{i}) (73)
\displaystyle=\begin{cases}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}\left(C_{\delta}(x_{i})\right)^{p^{\ell}_{i}},&\mbox{if $p_{i}^{\ell}\geq 1$}\\ e^{-\beta^{\ell}_{i}p^{\ell}_{i}}\Big{(}(C_{\delta}(v_{i}))^{p^{\ell}_{i}}(1-p^{\ell}_{i})\\ \qquad+p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1}C_{\delta}(x_{i})\Big{)}&\mbox{otherwise.}\end{cases}

In a nutshell, each term of index i\in\{1,\dots,n\} in (72) coincides either with the i-th term of q(\cdot,p^{\ell},\beta^{\ell}) when i\in\mathcal{I}^{\ell}, or with a convex majorant of this i-th term, tangent at v_{i}, when i\in\mathcal{J}^{\ell}. We thus propose to adopt an MM procedure that builds a sequence of convex surrogate problems for the non-convex minimisation problem involved in the computation of \text{prox}_{\gamma_{0}q(\cdot,p^{\ell},\beta^{\ell})}^{A}. At the \kappa-th iteration of this procedure, following the MM principle, the next iterate x^{\kappa+1} is determined by setting v=x^{\kappa}. We summarise the strategy in Algorithm 3.

Initialize x0nx^{0}\in\mathbb{R}^{n}
For κ=0,1,\kappa=0,1,\dots until convergence

xκ+1\displaystyle x^{\kappa+1} =proxγ0q¯(,xκ,p,β)A(x+)\displaystyle=\text{prox}_{\gamma_{0}\overline{q}(\cdot,x^{\kappa},p^{\ell},\beta^{\ell})}^{A}(x^{+}) (74)
 (with Algorithm 4)
Algorithm 3 MM algorithm to approximate proxγ0q(,p,β)A(x+)\text{prox}_{\gamma_{0}q(\cdot,p^{\ell},\beta^{\ell})}^{A}(x^{+}) with x+nx^{+}\in\mathbb{R}^{n}

Since function q¯(,v,p,β)\overline{q}(\cdot,v,p^{\ell},\beta^{\ell}) is convex, proper, and lower semicontinuous, its proximity operator in the weighted Euclidean metric induced by matrix AA is guaranteed to be uniquely defined. It can be computed efficiently using the Dual Forward-Backward (DFB) method [72], outlined in Algorithm 4.

Initialize dual variable w0nw^{0}\in\mathbb{R}^{n}
Set η(0,2|A|1)\eta\in(0,2|||A|||^{-1})
For κ=0,1,\kappa^{\prime}=0,1,\dots until convergence

uκ\displaystyle u^{\kappa^{\prime}} =x+Awκ,\displaystyle=x^{+}-Aw^{\kappa^{\prime}}, (75)
wκ+1\displaystyle w^{\kappa^{\prime}+1} =wκ+ηuκ\displaystyle=w^{\kappa^{\prime}}+\eta u^{\kappa^{\prime}} (76)
ηproxη1γ0q¯(,v,p,β)(η1wκ+uκ).\displaystyle\;\;-\eta\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}(\cdot,v,p^{\ell},\beta^{\ell})}(\eta^{-1}w^{\kappa^{\prime}}+u^{\kappa^{\prime}}).

Return uκnu^{\kappa^{\prime}}\in\mathbb{R}^{n}

Algorithm 4 DFB algorithm to compute proxγ0q¯(,v;p,β)A(x+)\text{prox}_{\gamma_{0}\overline{q}(\cdot,v;p^{\ell},\beta^{\ell})}^{A}(x^{+}) with x+nx^{+}\in\mathbb{R}^{n}

The update (76) in Algorithm 4 can be performed componentwise since function \overline{q}(\cdot,v,p^{\ell},\beta^{\ell}) is separable. Thanks to this separability property, computing \text{prox}_{\eta^{-1}\gamma_{0}\overline{q}(\cdot,v,p^{\ell},\beta^{\ell})} boils down to solving n one-dimensional optimisation problems, that is

(u+\displaystyle(\forall u^{+} =(ui+)1inn)\displaystyle=(u^{+}_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})
proxη1γ0q¯(,v,p,β)(u+)\displaystyle\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}(\cdot,v,p^{\ell},\beta^{\ell})}(u^{+})
=(proxη1γ0q¯i(,vi,pi,βi)(ui+))1in.\displaystyle=\left(\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}_{i}(\cdot,v_{i},p^{\ell}_{i},\beta^{\ell}_{i})}(u^{+}_{i})\right)_{1\leq i\leq n}. (77)

More precisely,

  • for every i{1,,n}i\in\{1,\ldots,n\}, such that pi1p^{\ell}_{i}\leq 1,

    proxη1γ0q¯i(,vi,pi,βi)(ui+)\displaystyle\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}_{i}(\cdot,v_{i},p^{\ell}_{i},\beta^{\ell}_{i})}(u^{+}_{i})
\displaystyle=\text{prox}_{\eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1}C_{\delta}}(u^{+}_{i})
    =proxη1γ0eβipipi(Cδ(vi))pi1Hδ1(ui+).\displaystyle=\text{prox}_{\eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1}H_{\delta_{1}}}(u^{+}_{i}). (78)

The proximity operator of this scaled version of function H_{\delta_{1}} can be determined by solving a quartic polynomial equation (see http://proximity-operator.net/scalarfunctions.html).

  • For every i{1,,n}i\in\{1,\ldots,n\} such that pi>1p^{\ell}_{i}>1,

    proxη1γ0q¯i(,vi,pi,βi)(ui+)=proxη1γ0eβipi(Cδ)pi(ui+).\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}_{i}(\cdot,v_{i},p^{\ell}_{i},\beta^{\ell}_{i})}(u^{+}_{i})\\ =\text{prox}_{\eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}\left(C_{\delta}\right)^{p^{\ell}_{i}}}(u^{+}_{i}). (79)

The latter quantity can be evaluated through a bisection search finding the root of the derivative of the involved proximally regularised function; a minimal numerical sketch is given after this list.
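As an illustration of this one-dimensional search, the sketch below computes prox_{c(C_{\delta})^{p}}(u) for p\geq 1 and a positive scaling c (standing for the factor \eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}} of the text) by bracketing the root of the derivative of the proximal objective. It relies on SciPy's brentq routine, a bracketing root-finder refining bisection, and is only a sketch of one possible implementation, not the one used in the paper.

import numpy as np
from scipy.optimize import brentq

def prox_c_Cdelta_pow(u, c, p, delta1, delta2):
    # Sketch of prox_{c (C_delta(.))^p}(u) for p >= 1, c > 0, via root-finding on the
    # derivative of the (strongly convex) proximal objective.
    def dphi(t):
        H = np.sqrt(t ** 2 + delta1 ** 2)
        return c * p * (H - delta2) ** (p - 1.0) * t / H + (t - u)
    if u == 0.0:
        return 0.0                                   # the derivative vanishes at 0 when u = 0
    lo, hi = (0.0, u) if u > 0 else (u, 0.0)         # the minimiser is a shrinkage of u towards 0
    return brentq(dphi, lo, hi)

print(prox_c_Cdelta_pow(2.0, c=0.5, p=1.3, delta1=1.0, delta2=1e-2))

The same routine also covers the first bullet above: for p^{\ell}_{i}\leq 1 the surrogate reduces to a constant plus a scaled C_{\delta}, so it suffices to call it with p=1 and the scaling e^{-\beta^{\ell}_{i}p^{\ell}_{i}}p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1} absorbed into c.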

Remark 6.

Due to the non-convexity of q(,p,β)q(\cdot,p^{\ell},\beta^{\ell}), there is no guarantee that the point estimated by Algorithm 4 coincides with the exact proximity point. However, we did not notice any numerical issues in our implementation.

Proximal computation with respect to pp.

Subproblem (66) requires to compute the proximity operator of γ1(q(x+1,,β)+g)\gamma_{1}\big{(}q(x^{\ell+1},\cdot,\beta^{\ell})+g\big{)}, which is equivalent to solving the following minimization problem

minimizep[a,b]nψ(p)+λ1,2(Dp),\underset{\begin{subarray}{c}{p\in{[a,b]^{n}}}\end{subarray}}{\mathrm{minimize}}\;\;\psi^{\ell}(p)+\lambda\ell_{1,2}(Dp), (80)

where, for every pnp\in\mathbb{R}^{n}, ψ(p)=i=1nψi(pi)\psi^{\ell}(p)=\sum_{i=1}^{n}\psi_{i}^{\ell}(p_{i}) with

(i{1,,n})(pi)ψi(pi)={(Cδ(xi+1))pieβipi+lnΓ(1+1pi)+12γ1(pipi)2if pi>0+otherwise.(\forall i\in\{1,\ldots,n\})(\forall p_{i}\in\mathbb{R})\\ \psi_{i}^{\ell}(p_{i})=\begin{cases}\left(C_{\delta}(x_{i}^{\ell+1})\right)^{p_{i}}e^{-\beta_{i}^{\ell}p_{i}}+\ln\Gamma(1+\frac{1}{p_{i}})\\ \quad+\frac{1}{2\gamma_{1}}(p_{i}-p_{i}^{\ell})^{2}\quad\mbox{if $p_{i}>0$}\\ +\infty\qquad\qquad\qquad\;\;\mbox{otherwise.}\end{cases} (81)

Moreover, D=[Dh,Dv]D=[D_{\rm h},D_{\rm v}] where (Dh,Dv)(n×n)2(D_{\rm h},D_{\rm v})\in(\mathbb{R}^{n\times n})^{2} are the discrete horizontal and vertical 2D gradient operators, and the 1,2\ell_{1,2}-norm is defined as

(pn)1,2(Dp)=i=1n([Dhp]i,[Dvp]i)2.(\forall p\in\mathbb{R}^{n})\quad\ell_{1,2}(Dp)=\sum_{i=1}^{n}\|([D_{\rm h}p]_{i},[D_{\rm v}p]_{i})\|_{2}.

Problem (80) is equivalent to minimizing the sum of the indicator function of a hypercube, a separable component and a non-separable term involving the linear operator DD. According to Proposition 6, we can ensure the convexity of each term (ψi)1in(\psi_{i}^{\ell})_{1\leq i\leq n} by setting γ1<1μ08.805\gamma_{1}<\frac{1}{\mu_{0}}\approx 8.805. In order to solve (80), it is then possible to implement a Primal-Dual (PD) algorithm [73, 74, 75] as outlined in Algorithm 5.

Initialise the dual variables v10n×2v^{0}_{1}\in\mathbb{R}^{n\times 2},v20nv^{0}_{2}\in\mathbb{R}^{n}.
Set τ>0\tau>0 and σ>0\sigma>0 such that τσ(|D|2+1)<1\tau\sigma(|||D|||^{2}+1)<1.
for κ=0,1,\kappa=0,1,\dots until convergence

uκ\displaystyle u^{\kappa} =pκτ(Dv1κ+v2κ),\displaystyle=p^{\kappa}-\tau(D^{*}v^{\kappa}_{1}+v^{\kappa}_{2}), (82)
pκ+1\displaystyle{p}^{\kappa+1} =proj[a,b]n(uκ),\displaystyle=\text{proj}_{[a,b]^{n}}(u^{\kappa}), (83)
w1κ\displaystyle w^{\kappa}_{1} =v1κ+σD(2pκ+1pκ),\displaystyle=v^{\kappa}_{1}+\sigma D(2{p}^{\kappa+1}-{p}^{\kappa}), (84)
\displaystyle v^{\kappa+1}_{1}=w^{\kappa}_{1}-\sigma\text{prox}_{\frac{\lambda\ell_{1,2}}{\sigma}}(\frac{w^{\kappa}_{1}}{\sigma}), (85)
w2κ\displaystyle w^{\kappa}_{2} =v2κ+σ(2pκ+1pκ),\displaystyle=v^{\kappa}_{2}+\sigma(2{p}^{\kappa+1}-{p}^{\kappa}), (86)
\displaystyle v^{\kappa+1}_{2}=w^{\kappa}_{2}-\sigma\text{prox}_{\frac{\psi^{\ell}}{\sigma}}(\frac{w^{\kappa}_{2}}{\sigma}). (87)

Return pκ+1[a,b]np^{\kappa+1}\in[a,b]^{n}

Algorithm 5 Primal Dual Algorithm for solving (80)

The proximity operator of the involved 1,2\ell_{1,2} norm has a closed-form expression. For every w1=([w1]i,1,[w1]i,2)1inn×2w_{1}=([w_{1}]_{i,1},[w_{1}]_{i,2})_{1\leq i\leq n}\in\mathbb{R}^{n\times 2} and λ>0\lambda>0, we have

proxλ1,2(w1)\displaystyle\text{prox}_{\lambda\ell_{1,2}}(w_{1})
=(proxλ2(([w1]i,1,[w1]i,2)))1in\displaystyle\quad=\left(\text{prox}_{\lambda\|\cdot\|_{2}}\Big{(}([w_{1}]_{i,1},[w_{1}]_{i,2})\Big{)}\right)_{1\leq i\leq n}
=(([w1]i,1,[w1]i,2)\displaystyle=\Bigg{(}([w_{1}]_{i,1},[w_{1}]_{i,2})
λ([w1]i,1,[w1]i,2)max{λ,([w1]i,1,[w1]i,2)2})1in.\displaystyle\qquad-\frac{\lambda([w_{1}]_{i,1},[w_{1}]_{i,2})}{\max\{\lambda,\|([w_{1}]_{i,1},[w_{1}]_{i,2})\|_{2}\}}\Bigg{)}_{1\leq i\leq n}.
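For completeness, this block soft-thresholding can be written in a few lines; the sketch below (illustrative only, with NumPy) applies it row-wise to a dual variable of shape n x 2.

import numpy as np

def prox_l12(w, lam):
    # prox of lam * sum_i ||([w]_{i,1}, [w]_{i,2})||_2, applied row-wise to w of shape (n, 2)
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w - lam * w / np.maximum(lam, norms)

w = np.array([[3.0, 4.0], [0.1, 0.2]])
print(prox_l12(w, lam=1.0))                 # rows with norm <= lam are set to zero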

The proximal point at {w_{2}^{\kappa}}/{\sigma}=\left({[w_{2}^{\kappa}]_{i}}/{\sigma}\right)_{1\leq i\leq n}\in\mathbb{R}^{n} of the separable term \psi^{\ell} with respect to a step size 1/\sigma can be found by minimizing, for every i\in\{1,\ldots,n\}, the following smooth function

(t(0,+))𝗀1,i(t)=ψi(t)+σ2(t[w2κ]iσ)2.(\forall t\in(0,+\infty))\quad\mathsf{g}_{1,i}(t)=\psi_{i}^{\ell}(t)+\frac{\sigma}{2}\Big{(}t-\frac{[w_{2}^{\kappa}]_{i}}{\sigma}\Big{)}^{2}.

The update in (87) then reads

v_{2}^{\kappa+1}=\left([w_{2}^{\kappa}]_{i}-\sigma[w_{2}^{\kappa}]_{i}^{*}\right)_{1\leq i\leq n}

where, for every i{1,,n}i\in\{1,\dots,n\}, [w2κ]i[w_{2}^{\kappa}]_{i}^{*} corresponds to the unique zero of the derivative of 𝗀1,i\mathsf{g}_{1,i}. This zero is found by applying Newton’s method initialised with

\bar{w}=\left(\max\Big{\{}10^{-3},\frac{[w_{2}^{\kappa}]_{i}}{\sigma}\Big{\}}\right)_{1\leq i\leq n}.
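A possible componentwise implementation of this Newton search is sketched below (illustrative only; the function and variable names are ours). It uses the first and second derivatives of \mathsf{g}_{1,i}, the latter obtained from (64), and the initialisation \max\{10^{-3},[w_{2}^{\kappa}]_{i}/\sigma\} given above; c_val stands for C_{\delta}(x_{i}^{\ell+1}), beta_val for \beta_{i}^{\ell}, and p_prev for p_{i}^{\ell}.

import numpy as np
from scipy.special import digamma, polygamma

def newton_prox_psi(w2_over_sigma, sigma, c_val, beta_val, p_prev, gamma1,
                    n_iter=50, tol=1e-10):
    # Zero of the derivative of g_{1,i}(t) = c^t e^{-beta t} + ln Gamma(1 + 1/t)
    #   + (t - p_prev)^2 / (2 gamma1) + (sigma/2) (t - w2/sigma)^2, for t > 0.
    a = np.log(c_val) - beta_val                     # c^t e^{-beta t} = exp(a t)
    t = max(1e-3, w2_over_sigma)                     # initialisation used in the text
    for _ in range(n_iter):
        e = np.exp(a * t)
        g1 = a * e - digamma(1.0 + 1.0 / t) / t ** 2 + (t - p_prev) / gamma1 \
             + sigma * t - sigma * w2_over_sigma
        g2 = a ** 2 * e + (2.0 * digamma(1.0 + 1.0 / t) + polygamma(1, 1.0 + 1.0 / t) / t) / t ** 3 \
             + 1.0 / gamma1 + sigma
        step = g1 / g2
        t = max(t - step, 1e-8)                      # keep the iterate in the domain t > 0
        if abs(step) < tol:
            break
    return t

print(newton_prox_psi(w2_over_sigma=1.0, sigma=0.05, c_val=1.8, beta_val=0.5, p_prev=1.0, gamma1=1.0))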
Proximal computation with respect to β\beta.

Subproblem (67) requires the solution of the following minimisation problem:

minimizeβnφ(β)+ζ1,2(Dβ)\underset{\begin{subarray}{c}{\beta\in\mathbb{R}^{n}}\end{subarray}}{\mathrm{minimize}}\;\;\varphi^{\ell}(\beta)+\zeta\ell_{1,2}(D\beta) (88)

where DD and 1,2\ell_{1,2} have been defined previously and

(β=(βi)1inn)φ(β)=i=1nφi(βi)(\forall\beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})\quad\varphi^{\ell}(\beta)=\sum_{i=1}^{n}\varphi_{i}^{\ell}(\beta_{i})

with, for every i{1,,n}i\in\{1,\ldots,n\},

φi(βi)=(Cδ(xi+1))pi+1eβipi+1+βi+(βiμβ)22σβ2+12γ2(βiβi)2\varphi_{i}^{\ell}(\beta_{i})=\left(C_{\delta}(x_{i}^{\ell+1})\right)^{p_{i}^{\ell+1}}e^{-\beta_{i}p_{i}^{\ell+1}}+\beta_{i}\\ +{\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}}+\frac{1}{2\gamma_{2}}(\beta_{i}-\beta_{i}^{\ell})^{2} (89)

The above problem shares a structure similar to the one studied in the previous case since the objective function is the sum of the smooth convex term φ\varphi^{\ell} and the non-smooth convex one ζTV=ζ1,2(D)\zeta\operatorname{TV}=\zeta\ell_{1,2}(D\cdot), and it can be solved by the primal-dual procedure outlined in Algorithm 6.

Set τ>0\tau>0 and σ>0\sigma>0 such that τσ|D|21\tau\sigma|||D|||^{2}\leq 1.
Initialise the dual variable v0n×2v^{0}\in\mathbb{R}^{n\times 2}.
for κ=0,1,\kappa=0,1,\dots until convergence

uκ\displaystyle u^{\kappa} =βκτ(Dvκ),\displaystyle=\beta^{\kappa}-\tau(D^{*}v^{\kappa}), (90)
βκ+1\displaystyle{\beta}^{\kappa+1} =proxτφ(uκ),\displaystyle=\text{prox}_{\tau\varphi^{\ell}}(u^{\kappa}), (91)
wκ\displaystyle w^{\kappa} =vκ+σD(2βκ+1βκ),\displaystyle=v^{\kappa}+\sigma D(2{\beta}^{\kappa+1}-{\beta}^{\kappa}), (92)
\displaystyle v^{\kappa+1}=w^{\kappa}-\sigma\text{prox}_{\frac{\zeta\ell_{1,2}}{\sigma}}(\frac{w^{\kappa}}{\sigma}). (93)

Return βκ+1n\beta^{\kappa+1}\in\mathbb{R}^{n}

Algorithm 6 Primal Dual Algorithm for minimizing (88)

At each iteration κ\kappa of Algorithm 6, the proximity operator of φ\varphi^{\ell} is expressed as

(β=(βi)1inn)proxτφ(β)=(proxτφi(βi))1in.(\forall\beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})\\ \text{prox}_{\tau\varphi^{\ell}}(\beta)=\left(\text{prox}_{\tau\varphi_{i}^{\ell}}(\beta_{i})\right)_{1\leq i\leq n}. (94)

For every i{1,,n}i\in\{1,\ldots,n\}, proxτφi(βi)\text{prox}_{\tau\varphi_{i}^{\ell}}(\beta_{i}) is the minimizer of function

(βi)𝗀2,i(βi)=φi(βi)+12τ(βiuiκ)2.\displaystyle(\forall\beta_{i}\in\mathbb{R})\quad\mathsf{g}_{2,i}(\beta_{i})=\varphi_{i}^{\ell}(\beta_{i})+\frac{1}{2\tau}(\beta_{i}-u_{i}^{\kappa})^{2}. (95)

The nonlinear equation defining the unique zero of the derivative of 𝗀2,i\mathsf{g}_{2,i} admits a closed-form solution that involves the Lambert WW-function [76]. Indeed, let us introduce the following notation:

a1,i\displaystyle a_{1,i} =pi+1(Cδ(xi+1))pi+1,\displaystyle=p_{i}^{\ell+1}\left(C_{\delta}(x_{i}^{\ell+1})\right)^{p_{i}^{\ell+1}}, (96)
a2\displaystyle a_{2} =(1σβ2+1γ2+1τ)1,\displaystyle=\left(\frac{1}{\sigma^{2}_{\beta}}+\frac{1}{\gamma_{2}}+\frac{1}{\tau}\right)^{-1}, (97)
a3,i\displaystyle a_{3,i} =1μβσβ2βiγ2uiκτ.\displaystyle=1{-\frac{\mu_{\beta}}{\sigma^{2}_{\beta}}}-\frac{\beta_{i}^{\ell}}{\gamma_{2}}-\frac{u_{i}^{\kappa}}{\tau}. (98)

Then

𝗀2,i(βi)=0\displaystyle\mathsf{g}_{2,i}^{\prime}(\beta_{i})=0
a1,iexp(pi+1βi)+βia2+a3,i=0\displaystyle\iff-a_{1,i}\exp(-p_{i}^{\ell+1}\beta_{i})+\frac{\beta_{i}}{a_{2}}+a_{3,i}=0
pi+1(βi+a2a3,i)exp(pi+1(βi+a2a3,i))\displaystyle\iff p_{i}^{\ell+1}(\beta_{i}+a_{2}a_{3,i})\exp(p_{i}^{\ell+1}(\beta_{i}+a_{2}a_{3,i}))
=pi+1a1,ia2exp(pi+1a2a3,i)\displaystyle\qquad\qquad=p_{i}^{\ell+1}a_{1,i}a_{2}\exp(p_{i}^{\ell+1}a_{2}a_{3,i})
βi\displaystyle\iff\beta_{i}
=1pi+1W(pi+1a1,ia2exp(pi+1a2a3,i))a2a3,i,\displaystyle=\frac{1}{p_{i}^{\ell+1}}W(p_{i}^{\ell+1}a_{1,i}a_{2}\exp(p_{i}^{\ell+1}a_{2}a_{3,i}))-a_{2}a_{3,i}, (99)

where the last equivalence comes from the fact that the principal branch of the Lambert W-function is single-valued and satisfies the following identity for every pair (X,Y)\in[-1,+\infty)\times\left[-\frac{1}{e},+\infty\right):

Xexp(X)=YX=W(Y).X\exp(X)=Y\iff X=W(Y). (100)

Notice that the expression in (99) is well defined since the argument of the Lambert function is always positive.

In conclusion, the update in (91) reads as βκ+1=(βiκ+1)1in\beta^{\kappa+1}=\left(\beta_{i}^{\kappa+1}\right)_{1\leq i\leq n} where each component of this vector is calculated according to (99).
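For illustration, the closed-form update (99) can be implemented componentwise with the principal branch of the Lambert W-function available in SciPy; the sketch below (names and test values are ours) follows (96)-(99) literally.

import numpy as np
from scipy.special import lambertw

def beta_update(u, beta_prev, x_val, p_val, mu_beta, sigma_beta, gamma2, tau,
                delta1=1.0, delta2=1e-2):
    C = np.sqrt(x_val ** 2 + delta1 ** 2) - delta2                       # C_delta(x_i^{l+1})
    a1 = p_val * C ** p_val                                              # (96)
    a2 = 1.0 / (1.0 / sigma_beta ** 2 + 1.0 / gamma2 + 1.0 / tau)        # (97)
    a3 = 1.0 - mu_beta / sigma_beta ** 2 - beta_prev / gamma2 - u / tau  # (98)
    arg = p_val * a1 * a2 * np.exp(p_val * a2 * a3)                      # always positive
    return np.real(lambertw(arg)) / p_val - a2 * a3                      # (99), principal branch

print(beta_update(u=0.2, beta_prev=0.0, x_val=1.5, p_val=1.2,
                  mu_beta=0.0, sigma_beta=1.0, gamma2=1.0, tau=0.5))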

5 Numerical Experiments

In this section, we illustrate the performance of our approach on a problem of joint deblurring/segmentation of realistically simulated ultrasound images. We consider images with two regions (Simu1) and three regions (Simu2) extracted from [17]. Both images have dimension 256×256256\times 256 pixels. The shape parameters pp and the reparameterised scale parameters β\beta are set in each region following the choices for pp and α\alpha in [17], itself based on the experimental setting in [19]. This strategy allows us to have a reference configuration for β\beta, which led us to choose a non-necessarily zero-mean Gaussian distribution as a prior for this parameter. In our experiments, we will treat μβ\mu_{\beta} as an unknown parameter, along with the regularisation parameters for the Total Variation priors. The pixel values in each region of the original image xnx\in\mathbb{R}^{n} are obtained as a realisation of a random variable following a 𝒢𝒢𝒟\mathcal{GGD} with the corresponding shape and scale parameters pp and α\alpha. We define KK as the linear operator modelling the convolution with the point spread function of a 3.5 MHz linear probe obtained with the Field II ultrasound simulator [77]. To reproduce the same setting as in [17], we obtain the observed degraded images yny\in\mathbb{R}^{n} from the original images xnx\in\mathbb{R}^{n} by applying the observation model (1), where we set the additive noise variance (which will be assumed to be known) to σ2=0.013\sigma^{2}=0.013 for Simu1 and σ2=33\sigma^{2}=33 for Simu2. For the preconditioner, we consider a regularised version of the inverse of the Hessian of the data fidelity function in (59), given by

A=σ2(KK+μ𝕀m)1A=\sigma^{2}(K^{\top}K+\mu\mathbb{I}_{m})^{-1}

where μ=0.1\mu=0.1, so that AA is well defined and constant throughout the iterations. Following the procedure outlined in [17], we initialise x0n{x}^{0}\in\mathbb{R}^{n} using a pre-deconvolved image obtained with a Wiener filter applied to the observed data yy, (pi0)1in({p}^{0}_{i})_{1\leq i\leq n} is drawn from an i.i.d. uniform distribution in the range [0.5,1.5][0.5,1.5], while (βi0)1in({\beta}^{0}_{i})_{1\leq i\leq n} is drawn from an i.i.d. Gaussian distribution with mean μβ\mu_{\beta} and unit standard deviation. We set μβ=0\mu_{\beta}=0 for Simu 1 and μβ=4\mu_{\beta}=4 for Simu 2, for arguments discussed in SM 2. We adopt the recovery strategy described in Section 4 and describe hereafter the setting of the model/algorithm hyperparameters.
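Before discussing those hyperparameters, note that the preconditioner A above can be applied efficiently. The paper does not detail how this is done; one convenient option, sketched below under the additional assumption of periodic boundary conditions for the convolution operator K, is to diagonalise K^{\top}K with the 2D discrete Fourier transform so that A can be applied in O(n log n) operations. All names and the toy PSF are illustrative.

import numpy as np

def make_preconditioner(psf, shape, sigma2, mu=0.1):
    # A = sigma^2 (K^T K + mu I)^{-1}, with K a convolution assumed periodic,
    # so that K^T K has eigenvalues |k_hat|^2 in the Fourier domain.
    k_hat = np.fft.fft2(psf, s=shape)
    denom = np.abs(k_hat) ** 2 + mu
    def apply_A(v):
        return np.real(np.fft.ifft2(sigma2 * np.fft.fft2(v) / denom))
    return apply_A

A = make_preconditioner(psf=np.ones((5, 5)) / 25.0, shape=(256, 256), sigma2=0.013)
out = A(np.random.default_rng(0).normal(size=(256, 256)))   # applies the preconditioner to a test image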

The model parameters that need to be tuned are the δ1>0\delta_{1}>0 and δ2>0\delta_{2}>0 values for the pseudo-Huber function, the mean μβ\mu_{\beta}\in\mathbb{R} and the standard deviation σβ>0\sigma_{\beta}>0 for the reparameterised scale parameter, and finally the regularisation parameters (λ,ζ)(0,+)2(\lambda,\zeta)\in(0,+\infty)^{2} for the TV terms. For parameter δ=(δ1,δ2)\delta=(\delta_{1},\delta_{2}), we applied the following choice, resulting from a rough empirical search,: δ1=1\delta_{1}=1 while δ2=δ1×102\delta_{2}=\delta_{1}\times 10^{-2}. For what concerns the Gaussian parameters of the reparameterised scale variable (μβ,σβ)(\mu_{\beta},\sigma_{\beta}), the mean μβ\mu_{\beta} is the most influential on the estimated solution, so we dedicated an in-depth analysis for its choice in combination with the TV regularisation parameters (λ,ζ)(\lambda,\zeta). More precisely, we tested different values of μβ\mu_{\beta} in the range [10,10][-10,10] in combination with a grid search for (λ,ζ){102,101,1,10,102,103}2(\lambda,\zeta)\in\{10^{-2},10^{-1},1,10,10^{2},10^{3}\}^{2} with respect to different quality metrics and identified an optimal choice for μβ\mu_{\beta}. The standard deviation appeared less influential and is set to σβ=1\sigma_{\beta}=1 in all our experiments. The details of the analysis are illustrated in the annexed SM 2.

The algorithmic hyperparameters include the step sizes of the proximal steps, as well as the preconditioning matrix involved in the preconditioned proximal gradient step. We set (γ0,γ1,γ2)=(0.99,1,1)(\gamma_{0},\gamma_{1},\gamma_{2})=(0.99,1,1) in order to meet the convergence assumptions in Algorithm 2. In particular, the choice for γ0\gamma_{0} approximates the highest value allowed for the step size of the preconditioned inexact FB scheme in (65), while γ1\gamma_{1} satisfies the condition γ1<8.805\gamma_{1}<8.805 for the convexity of the function in (80).

In order to obtain the labelling of a segmented image from our estimated shape parameter (denoted by p^\widehat{p}) we use a quantisation procedure based on Matlab functions multithresh and imquantize. The former defines a desired number of quantisation levels using Otsu’s method, while the latter performs a truncation of the data values according to the provided quantisation levels. We remark here that the number of labels does not need to be defined throughout the proposed optimisation procedure, but only at the final segmentation step. This step can thus be considered as a post-processing that is performed on the estimated solution.
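The post-processing itself relies on Matlab's multithresh and imquantize; a rough Python analogue (assuming scikit-image is available, and given here only as an illustration) combines multi-Otsu thresholding with a digitisation step.

import numpy as np
from skimage.filters import threshold_multiotsu   # analogue of Matlab's multithresh

def segment_shape_map(p_hat, n_labels=2):
    # Otsu-style thresholds on the estimated shape map, then label assignment (analogue of imquantize).
    thresholds = threshold_multiotsu(p_hat, classes=n_labels)
    return np.digitize(p_hat, bins=thresholds)

p_hat = np.random.default_rng(0).uniform(0.5, 1.5, size=(256, 256))   # toy shape map
labels = segment_shape_map(p_hat, n_labels=2)                          # values in {0, ..., n_labels-1}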

In order to evaluate the quality of the solution, we consider the following metrics: for the estimated image, we make use of the peak signal-to-noise ratio (PSNR) defined as follows, xx being the original signal and x^\widehat{x} the estimated one:

\text{PSNR}=10\log_{10}\big{(}n\,\max_{i\in\{1,\dots,n\}}\{x_{i},\hat{x}_{i}\}^{2}/\|x-\hat{x}\|^{2}\big{)},

and of the structural similarity measure (SSIM) [78]. For the segmentation task, we compute the overall accuracy (OA), i.e., the percentage of correctly predicted labels.
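For reference, a direct transcription of the PSNR definition above (with the peak taken over both the original and the estimated signal, as written) reads as follows; this is only our own small helper, not code from the paper.

import numpy as np

def psnr(x, x_hat):
    peak = max(np.max(x), np.max(x_hat))                       # max over {x_i, x_hat_i}
    return 10.0 * np.log10(x.size * peak ** 2 / np.sum((x - x_hat) ** 2))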

The stopping criteria for Algorithm 2 outer and inner loops are set by defining a threshold level on the relative change between two consecutive iterates of the involved variables, the relative change of the objective values of two consecutive iterates and a maximum number of iterations. The outer loop in Algorithm 2 stops whenever =10000\ell=10000 or when both ζ+1ζ/ζ<104\|\zeta^{\ell+1}-\zeta^{\ell}\|/\|\zeta^{\ell}\|<10^{-4} and |θ(ζ+1)θ(ζ)|/|θ(ζ)|<104|\theta(\zeta^{\ell+1})-\theta(\zeta^{\ell})|/|\theta(\zeta^{\ell})|<10^{-4}. The MM procedure to compute x+1x^{\ell+1} in Algorithm 3 is stopped after 300 iterations or when xκ+1xκ/xκ<103\|x^{\kappa+1}-x^{\kappa}\|/\|x^{\kappa}\|<10^{-3}. The DFB procedure in Algorithm 4 to compute uκ+1u^{\kappa+1} is stopped after 300 iterations or when uκ+1uκ/uκ<103\|u^{\kappa+1}-u^{\kappa}\|/\|u^{\kappa}\|<10^{-3}. The PD procedure in Algorithm 5, and Algorithm 6 computing p+1p^{\ell+1} (resp. β+1\beta^{\ell+1}) terminates after 200 iteration or when pκ+1pκ/pκ<103\|p^{\kappa+1}-p^{\kappa}\|/\|p^{\kappa}\|<10^{-3} (resp. βκ+1βκ/βκ<103\|\beta^{\kappa+1}-\beta^{\kappa}\|/\|\beta^{\kappa}\|<10^{-3}).

Figure 2 illustrates in its first and second lines the B–mode image of the original x, of the degraded y, and of the reconstructed image \hat{x} for both examples. The B–mode image is the most common representation of an ultrasound image, displaying the acoustic impedance of a 2-dimensional cross section of the considered tissue. The reconstructed results in Figure 2 (right) show clearly reduced blur and sharper region contours. We then report in the third and fourth lines of Figure 2 the estimated shape parameter and the segmentation obtained via the aforementioned quantisation procedure, which confirms its good performance. We notice that our estimated \hat{p}_{i} values are consistent with the original ones. The fact that the results for Simu2 are slightly less accurate than those for Simu1 is in line with the results presented in [17, Table III] for P-ULA, HMC and PP-ULA, suggesting that the parameter configuration for Simu2 is quite challenging.

Table 1 proposes a quantitative comparison of our results against those of the methods considered in [17]: a combination of Wiener deconvolution and Otsu's segmentation [47], a combination of LASSO deconvolution and SLaT segmentation [40], the adjusted Hamiltonian Monte Carlo (HMC) method [79], the Proximal Unadjusted Langevin Algorithm (P-ULA) [80] and its preconditioned version (PP-ULA) [17] for joint deconvolution and segmentation. From this table, we can conclude that the proposed variational method is able to compete with state-of-the-art Markov Chain Monte Carlo techniques in terms of both segmentation and deconvolution performance. Regarding the computational time, the average time (over 10 runs of the algorithm) required by P-SASL-PAM to meet the stopping criteria \|\zeta^{\ell+1}-\zeta^{\ell}\|/\|\zeta^{\ell}\|<10^{-4} and |\theta(\zeta^{\ell+1})-\theta(\zeta^{\ell})|/|\theta(\zeta^{\ell})|<10^{-4} is 493.2 seconds (approximately 8^{\prime}13^{\prime\prime}) for Simu1 and 536.4 seconds (approximately 8^{\prime}56^{\prime\prime}) for Simu2. Simulations were run on Matlab 2021b on an Intel Xeon Gold 6230 CPU 2.10GHz. In Table 1, we report the computational times for P-ULA, HMC and PP-ULA from [17, Table II], which were obtained on Matlab 2018b on an Intel Xeon CPU E5-1650 3.20GHz.

Simu1 Simu2
METHOD PSNR SSIM OA TIME PSNR SSIM OA TIME
Wiener-Otsu 37.1 0.57 99.5 – 35.4 0.63 96.0 –
LASSO-SLaT 39.2 0.60 99.6 – 37.8 0.70 98.3 –
P-ULA 38.9 0.45 98.7 2 h 27 min 37.1 0.57 94.9 3 h 06 min
HMC 40.0 0.62 99.7 1 h 08 min 36.4 0.64 98.5 4 h 14 min
PP-ULA 40.3 0.62 99.7 12 min 38.6 0.71 98.7 39 min
OURS 40.2 0.61 99.9 8 min 38.1 0.70 97.7 9 min
Table 1: PSNR, SSIM, OA scores and Computational time for Simu1 and Simu2 from [17]. The symbol "–" means the result was not available in the reference paper.
[Figure 2 panels, Simu1 and Simu2: ORIGINAL / DEGRADED / RECONSTRUCTED (first and second lines) and REFERENCE / ESTIMATED / QUANTISED (third and fourth lines).]
Figure 2: First and Second lines: B–mode of Simu1 and Simu2. The B–mode image is the most common type of ultrasound image, displaying the acoustic impedance of a 2-dimensional cross section of the considered tissue. All images are presented in the same scale [0,1]. Third and Fourth lines: Segmentation of the shape parameter for Simu1 and Simu2: reference pp, estimated p^\hat{p} and quantised p¯\bar{p}.
Figure 3: Decay of the objective value along 500 iterations for Simu1 (a) and Simu2 (b). We considered ten random samplings of p^{0} and \beta^{0}. The continuous line in the plot represents the mean objective value at each iteration, and the shaded area, highlighted in the zoomed region at the centre spanning 20 iterations, corresponds to the confidence interval related to the standard deviation. Logarithmic plot of the relative distance from the iterates \zeta^{\ell} to the solution \zeta^{\infty} over 1500 iterations for Simu1 (c) and Simu2 (d).

Finally, Figure 3 (a)-(b) shows the evolution of the mean value of the cost function for both Simu1 (a) and Simu2 (b) along 500 iterations for ten different samplings of p^{0} and \beta^{0}, while Figure 3 (c)-(d) illustrates on a logarithmic scale the relative distance from the iterates to the solution, \|\zeta^{\ell}-\zeta^{\infty}\|/\|\zeta^{\infty}\|, for Simu1 (c) and Simu2 (d), showing the convergence of our algorithm.

Additional experiments can be found in the Supplementary Material (SM 1), showing that, for standard wavelet-based image restoration problems, the proposed regularisation outperforms other sparsity measures.

6 Conclusions

We investigated a novel approach for the joint reconstruction/feature extraction problem. The novelty of this work lies both in the problem formulation and in the resolution procedure. Firstly, we proposed a new variational model in which we introduced a flexible sparse regularisation term for the reconstruction task. Secondly, we designed an inexact version of a TITAN-based block alternating optimisation scheme, whose aim is to exploit the structure of the problem and the properties of the functions involved in it. We established convergence results for the proposed algorithm, whose scope goes beyond the image processing problems considered in our work. We illustrated the validity of the approach on numerical examples for a joint deconvolution-segmentation problem. We also included comparisons with state-of-the-art methods, against which our proposal achieves comparable qualitative and quantitative performance. An attractive aspect of the proposed work is that the space-variant parameters defining the flexible sparse regularisation do not need to be defined in advance, but are inherently estimated by the iterative optimisation procedure. Concerning the tuning of the hyperparameters of the model, the design of an automatic strategy, for instance through supervised learning, could be an interesting development of this work.

Acknowledgments This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 861137. The authors thank Ségolène Martin for her careful reading of the initial version of this manuscript.

References

  • Daubechies et al. [2004] Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics 57 (2004)
  • Grasmair et al. [2008] Grasmair, M., Haltmeier, M., Scherzer, O.: Sparse regularization with lq penalty term. Inverse Problems 24, 055020 (2008)
  • Lorenz [2008] Lorenz, D.: Convergence rates and source conditions for Tikhonov regularization with sparsity constraints. Journal of Inverse and Ill-posed Problems 16(5), 463–478 (2008)
  • Ramlau and Resmerita [2010] Ramlau, R., Resmerita, E.: Convergence rates for regularization with sparsity constraints. Electronic transactions on numerical analysis ETNA 37, 87–104 (2010)
  • Tibshirani [1996] Tibshirani, R.: Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288 (1996)
  • Chartrand [2007] Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters 14, 707–710 (2007)
  • Grasmair [2009] Grasmair, M.: Well-posedness and convergence rates for sparse regularization with sublinear q\ell^{q} penalty term. Inverse Problems &\& Imaging 3(3), 383–387 (2009)
  • Zarzer [2009] Zarzer, C.: On Tikhonov regularization with non-convex sparsity constraints. Inverse Problems 25, 025006 (2009)
  • Ghilli and Kunisch [2019] Ghilli, D., Kunisch, K.: On monotone and primal-dual active set schemes for p\ell_{p}-type problems, p(0,1]p\in(0,1]. Computational Optimization and Applications 72, 45–85 (2019)
  • Hintermüller and Wu [2013] Hintermüller, M., Wu, T.: Nonconvex TVq-models in image restoration: analysis and a Trust-Region regularization-based superlinearly convergent solver. SIAM Journal on Imaging Sciences 6, 1385–1415 (2013)
  • Lorenz and Resmerita [2016] Lorenz, D., Resmerita, E.: Flexible sparse regularization. Inverse Problems 33 (2016)
  • Afonso and Sanches [2017] Afonso, M., Sanches, J.M.: Adaptive order non-convex lp-norm regularization in image restoration. Journal of Physics: Conference Series 904(1), 012016 (2017)
  • Blomgren et al. [1997] Blomgren, P., Chan, T.F., Mulet, P., Wong, C.K.: Total variation image restoration: Numerical methods and extensions. IEEE International Conference on Image Processing (1997)
  • Chen et al. [2006] Chen, Y., Levine, S., Rao, M.: Variable exponent, linear growth functionals in image restoration. SIAM Journal on Applied Mathematics 66, 1383–1406 (2006)
  • Lanza et al. [2018] Lanza, A., Morigi, S., Pragliola, M., Sgallari, F.: Space-variant generalised Gaussian regularisation for image restoration. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization 7, 1–14 (2018)
  • Lazzaretti et al. [2022] Lazzaretti, M., Calatroni, L., Estatico, C.: Modular-proximal gradient algorithms in variable exponent lebesgue spaces. SIAM Journal on Scientific Computing 44(6), 3463–3489 (2022)
  • Corbineau et al. [2019] Corbineau, M.-C., Kouamé, D., Chouzenoux, E., Tourneret, J.-Y., Pesquet, J.-C.: Preconditioned P-ULA for joint deconvolution-segmentation of ultrasound images. IEEE Signal Processing Letters 26(10), 1456–1460 (2019)
  • Do and Vetterli [2002] Do, M.N., Vetterli, M.: Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Transactions on Image Processing 11(2), 146–158 (2002)
  • Zhao et al. [2016] Zhao, N., Basarab, A., Kouamé, D., Tourneret, J.-Y.: Joint segmentation and deconvolution of ultrasound images using a hierarchical Bayesian model based on generalized Gaussian priors. IEEE Transactions on Image Processing 25(8), 3736–3750 (2016)
  • Hildreth [1957] Hildreth, C.: A quadratic programming procedure. Naval Research Logistics Quarterly 4(1), 79–85 (1957)
  • Tseng [2001] Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications 109, 475–494 (2001)
  • Combettes and Pesquet [2011] Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing, pp. 185–212. Springer, New York, NY (2011)
  • Combettes and Pesquet [2021] Combettes, P.L., Pesquet, J.-C.: Fixed point strategies in data science. IEEE Transactions on Signal Processing 69, 3878–3905 (2021)
  • Attouch et al. [2010] Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research 35(2), 438–457 (2010)
  • Bolte et al. [2014] Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1-2), 459–494 (2014)
  • Pock and Sabach [2016] Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM Journal on Imaging Sciences 9(4), 1756–1787 (2016)
  • Hertrich and Steidl [2022] Hertrich, J., Steidl, G.: Inertial stochastic palm and applications in machine learning. Sampling Theory, Signal Processing, and Data Analysis 20(1), 4 (2022)
  • Foare et al. [2020] Foare, M., Pustelnik, N., Condat, L.: Semi-Linearized proximal alternating minimization for a discrete Mumford–Shah model. IEEE Transactions on Image Processing 29, 2176–2189 (2020)
  • Nikolova and Tan [2017] Nikolova, M., Tan, P.: Alternating proximal gradient descent for nonconvex regularised problems with multiconvex coupling terms (2017). https://hal.archives-ouvertes.fr/hal-01492846
  • Tan et al. [2019] Tan, P., Pierre, F., Nikolova, M.: Inertial alternating generalized forward–backward splitting for image colorization. Journal of Mathematical Imaging and Vision 61, 672–690 (2019)
  • Censor and Lent [1987] Censor, Y., Lent, A.: Optimization of “log x" entropy over linear equality constraints. Siam Journal on Control and Optimization 25, 921–933 (1987)
  • Chouzenoux et al. [2016] Chouzenoux, E., Pesquet, J.-C., Repetti, A.: A block coordinate variable metric forward–backward algorithm. Journal of Global Optimization, 1–29 (2016)
  • Bonettini et al. [2018] Bonettini, S., Prato, M., Rebegoldi, S.: A block coordinate variable metric linesearch based proximal gradient method. Computational Optimization and Applications (2018)
  • Repetti and Wiaux [2021] Repetti, A., Wiaux, Y.: Variable metric forward-backward algorithm for composite minimization problems. SIAM Journal on Optimization 31(2), 1215–1241 (2021)
  • Bonettini et al. [2019] Bonettini, S., Porta, F., Prato, M., Rebegoldi, S., Ruggiero, V., Zanni, L.: Recent Advances in Variable Metric First-Order Methods, pp. 1–31. Springer, Cham (2019)
  • Hien et al. [2020] Hien, L.T.K., Phan, D.N., Gillis, N.: An inertial block majorization minimization framework for nonsmooth nonconvex optimization. J. Mach. Learn. Res. 24, 18–11841 (2020)
  • Attouch et al. [2011] Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, Series A 137(1), 91–124 (2011)
  • Chaâri et al. [2010] Chaâri, L., Pesquet, J.-C., Tourneret, J.-Y., Ciuciu, P., Benazza-Benyahia, A.: A hierarchical bayesian model for frame representation. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4086–4089 (2010). https://doi.org/10.1109/ICASSP.2010.5495737
  • Jeffreys [1946] Jeffreys, H.: An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 186(1007), 453–461 (1946). Accessed 2024-02-07
  • Cai et al. [2017] Cai, X., Chan, R., Nikolova, M., Zeng, T.: A three-stage approach for segmenting degraded color images: Smoothing, Lifting and Thresholding (SLaT). Journal of Scientific Computing 72, 1313–1332 (2017)
  • Cai et al. [2019] Cai, X., Chan, R., Schönlieb, C.-B., Steidl, G., Zeng, T.: Linkage between piecewise constant Mumford–Shah model and Rudin–Osher–Fatemi model and its virtue in image segmentation. SIAM Journal on Scientific Computing 41(6), 1310–1340 (2019)
  • Cai et al. [2013] Cai, X., Chan, R., Zeng, T.: A two-stage image segmentation method using a convex variant of the Mumford–Shah model and thresholding. SIAM Journal on Imaging Sciences 6(1), 368–390 (2013)
  • Chambolle et al. [2012] Chambolle, A., Cremers, D., Pock, T.: A convex approach to minimal partitions. SIAM Journal on Imaging Sciences 5, 1113–1158 (2012)
  • Chan et al. [2014] Chan, R., Yang, H., Zeng, T.: A two-stage image segmentation method for blurry images with Poisson or multiplicative Gamma noise. SIAM Journal on Imaging Sciences 7(1), 98–127 (2014)
  • Pascal et al. [2021] Pascal, B., Vaiter, S., Pustelnik, N., Abry, P.: Automated data-driven selection of the hyperparameters for total-variation based texture segmentation. Journal of Mathematical Imaging and Vision 63, 923–952 (2021)
  • Mumford and Shah [1989] Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
  • Otsu [1979] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62–66 (1979)
  • Rockafellar and Wets [2009] Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der mathematischen Wissenschaften. Springer, Heidelberg (2009)
  • Kurdyka [1998] Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’Institut Fourier 48(3), 769–783 (1998)
  • Łojasiewicz [1963] Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Equ. Derivees partielles, Paris 1962, Colloques internat. Centre nat. Rech. sci. 117, 87-89 (1963). (1963)
  • Łojasiewicz [1993] Łojasiewicz, S.: Sur la géométrie semi- et sous- analytique. Annales de l’Institut Fourier 43(5), 1575–1595 (1993)
  • Chouzenoux et al. [2014] Chouzenoux, E., Pesquet, J.-C., Repetti, A.: Variable metric Forward-Backward algorithm for minimizing the sum of a differentiable function and a convex function. Journal of Optimization Theory and Applications 162, 107–132 (2014)
  • Bertsekas [1999] Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
  • Erdogan and Fessler [1999] Erdogan, H., Fessler, J.A.: Monotonic algorithms for transmission tomography. IEEE Transactions on Medical Imaging 18(9), 801–814 (1999)
  • Hunter and Lange [2004] Hunter, D., Lange, K.: A Tutorial on MM Algorithms. The American Statistician 58, 30–37 (2004)
  • Salzo [2017] Salzo, S.: The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM Journal on Optimization 27(4), 2153–2181 (2017)
  • Malitsky and Mishchenko [2020] Malitsky, Y., Mishchenko, K.: Adaptive gradient descent without descent. In: III, H.D., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 6702–6712 (2020)
  • Latafat et al. [2023] Latafat, P., Themelis, A., Stella, L., Patrinos, P.: Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient (2023)
  • Charbonnier et al. [1997] Charbonnier, P., Blanc-Féraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Transactions on Image Processing 6 2, 298–311 (1997)
  • Van Den Dries [1998] Van Den Dries, L.: Tame Topology and O-minimal Structures. London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge (1998)
  • Bolte et al. [2007] Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM Journal on Optimization 18(2), 556–572 (2007)
  • Gabrielov [1996] Gabrielov, A.: Complements of subanalytic sets and existential formulas for analytic functions. Inventiones mathematicae 125, 1–12 (1996)
  • Wilkie [1996] Wilkie, A.: Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. Journal of the American Mathematical Society 9, 1051–1094 (1996)
  • Van Den Dries et al. [1997] Van Den Dries, L., Macintyre, A., Marker, D.: Logarithmic-exponential power series. Journal of the London Mathematical Society 56(3), 417–434 (1997)
  • Van Den Dries and Speissegger [2000] Van Den Dries, L., Speissegger, P.: The field of reals with multisummable series and the exponential function. Proceedings of The London Mathematical Society 81, 513–565 (2000)
  • Tougeron [1994] Tougeron, J.: Sur les ensembles semi-analytiques avec conditions gevrey au bord. Annales Scientifiques De L Ecole Normale Superieure 27, 173–208 (1994)
  • Artin [2015] Artin, E.: The Gamma Function. Courier Dover Publications, New York (2015)
  • Wrench [1968] Wrench, J.W.: Concerning two series for the Gamma function. Mathematics of Computation 22(103), 617–626 (1968)
  • Andrews et al. [1999] Andrews, G.E., Askey, R., Roy, R.: Special Functions. Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge (1999)
  • Repetti and Wiaux [2021] Repetti, A., Wiaux, Y.: Variable metric forward-backward algorithm for composite minimization problems. SIAM Journal on Optimization 31(2), 1215–1241 (2021)
  • Schifano et al. [2010] Schifano, E.D., Strawderman, R.L., Wells, M.T.: Majorization-Minimization algorithms for nonsmoothly penalized objective functions. Electronic Journal of Statistics 4(none), 1258–1299 (2010)
  • Combettes et al. [2011] Combettes, P.L., Dũng, Đ., Vũ, B.C.: Proximity for sums of composite functions. Journal of Mathematical Analysis and Applications 380(2), 680–688 (2011)
  • Condat [2013] Condat, L.: A Primal–Dual splitting method for convex optimization involving lipschitzian, proximable and linear composite terms. Journal of Optimization Theory and Applications 158 (2013)
  • Komodakis and Pesquet [2015] Komodakis, N., Pesquet, J.-C.: Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine 32(6), 31–54 (2015)
  • Vũ [2013] Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics 38, 667–681 (2013)
  • Corless et al. [1996] Corless, R., Gonnet, G., Hare, D., Jeffrey, D., Knuth, D.: On the Lambert W function. Advances in Computational Mathematics 5, 329–359 (1996)
  • Jensen [2004] Jensen, J.A.: Simulation of advanced ultrasound systems using Field II. In: 2004 2nd IEEE International Symposium on Biomedical Imaging: Nano to Macro, pp. 636–639 (2004)
  • Wang et al. [2004] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  • Robert et al. [2018] Robert, C., Elvira, V., Tawn, N., Wu, C.: Accelerating MCMC algorithms. Wiley Interdisciplinary Reviews: Computational Statistics 10 (2018)
  • Pereyra [2013] Pereyra, M.: Proximal Markov chain Monte Carlo algorithms. Statistics and Computing 26 (2013)