

A Variational Approach for Joint Image Recovery and Feature Extraction Based on Spatially Varying Generalised Gaussian Models

Gabriele Scrivanti, Émilie Chouzenoux, Marie-Caroline Corbineau, Jean-Christophe Pesquet

1. Université Paris-Saclay, Inria, CentraleSupélec, CVN, France
2. Research department, Preligens, France

Corresponding author: [email protected]
Abstract

The joint problem of reconstruction/feature extraction is a challenging task in image processing. It consists in performing, in a joint manner, the restoration of an image and the extraction of its features. In this work, we firstly propose a novel non-smooth and non-convex variational formulation of the problem. For this purpose, we introduce a versatile generalised Gaussian prior whose parameters, including its exponent, are space-variant. Secondly, we design an alternating proximal-based optimisation algorithm that efficiently exploits the structure of the proposed non-convex objective function. We also analyse the convergence of this algorithm. As shown in numerical experiments conducted on joint deblurring/segmentation tasks, the proposed method provides high-quality results.

keywords:
Image recovery ; Space-variant regularisation ; Alternating minimization ; Proximal algorithm ; Block coordinate descent ; Image segmentation

1 Introduction

Variational regularisation of ill-posed inverse problems in imaging relies on the idea of searching for a solution in a well-suited space. A central role in this context is played by \ell_{p} spaces with p\in(0,\infty), and the power p of the corresponding norms when p\geq 1 [1, 2, 3, 4, 5] or seminorms when p\in(0,1) [6, 7, 8]. For every vector u=(u_{i})_{1\leq i\leq n}\in\mathbb{R}^{n} and p\in(0,+\infty), the \ell_{p} (semi-)norm is denoted by \|u\|_{p}=\big(\sum_{i=1}^{n}|u_{i}|^{p}\big)^{1/p}. We usually omit p when p=2, so that \|\cdot\|=\|\cdot\|_{2}. The case p\in(0,1) has gained rising credit, especially in the field of sparse regularisation. An extensive literature has focused on the challenging numerical tasks raised by the non-convexity of the seminorms and on the possibility of combining them with linear operators to extract salient features of the sought images [9, 10]. In [11], the more general notion of F-norm is introduced in order to establish functional analysis results on products of \ell_{p_{i}}-spaces with p_{i}\in(0,2]. For some x=(x_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, this amounts to studying the properties of penalties of the form \sum_{i=1}^{n}|x_{i}|^{p_{i}}, for some positive exponents (p_{i})_{1\leq i\leq n}. This approach offers a more flexible framework by considering a wider range of exponents than the standard \ell_{p}-based regularisation. However, it extends the problem of choosing a suitable exponent p to that of choosing a whole sequence of exponents (p_{i})_{1\leq i\leq n}. In [12], the authors proposed a non-convex regulariser of the form \sum_{i=1}^{n}|x_{i}|^{\varpi(|x_{i}|)}, where each exponent is expressed as a function of the absolute magnitude of the data and the function \varpi(\cdot) is a rescaled version of the sigmoid function, taking values in the interval [0,1]. In image restoration, a similar approach consists in adopting space-variant regularisation models built around a Total Variation-like functional with a variable exponent \sum_{i=1}^{n}\|(\nabla x)_{i}\|^{p_{i}}, where \nabla is a discrete 2D gradient operator. The rationale is to select the set of parameters (p_{i})_{1\leq i\leq n} so as to promote either edge enhancement (p_{i}=1) or smoothing (p_{i}>1) depending on the spatial location encoded by index i. This model was introduced in [13] and then put into practice first for p_{i}\in[1,2] in [14] and then for p_{i}\in(0,2] in [15]. Finally, in a recent work [16], the authors proposed a modular-proximal gradient algorithm to find solutions to ill-posed inverse problems in variable-exponent Lebesgue spaces L^{p(\cdot)}(\Omega) with \Omega\subseteq\mathbb{R}^{n}, rather than in L^{2}(\Omega). In all of these works, the so-called space-variant p-map (i.e., (p_{i})_{1\leq i\leq n}) is estimated offline in a preliminary step and then kept fixed throughout the optimisation procedure.
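For illustration, the two space-variant penalties mentioned above can be evaluated with a few lines of NumPy; the following sketch (ours, not part of the cited works) uses forward differences with replicated boundaries as one possible discretisation of the gradient operator \nabla.

import numpy as np

def lp_variable_penalty(x, p):
    # sum_i |x_i|^{p_i}: one exponent per coefficient
    return np.sum(np.abs(x) ** p)

def tv_variable_penalty(img, p):
    # sum_i ||(grad img)_i||^{p_i}: one exponent per pixel,
    # forward differences with replicated boundary
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.sum(np.sqrt(dx ** 2 + dy ** 2) ** p)

# toy p-map: exponent 1 on the left half (edge-preserving), 2 on the right half (smoothing)
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
p_map = np.full(img.shape, 2.0)
p_map[:, :4] = 1.0
print(lp_variable_penalty(img.ravel(), p_map.ravel()), tv_variable_penalty(img, p_map))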

In this paper, we address the problem of joint image recovery and feature extraction. Image recovery amounts to retrieving an estimate of an original image from a degraded version of it. The degradation usually corresponds to the application of a linear operator (e.g., blur, projection matrix) to the image and the addition of a noise. Feature extraction problems arise when one wants to assign to an image a small set of parameters which can describe or identify the image itself. Image segmentation can be viewed as an example of feature extraction, which consists of defining a label field on the image domain so that pixels are partitioned into a predefined number of homogeneous regions according to some specific characteristics. A second example, similar to segmentation, is edge detection, where one aims at identifying the contour changes within different regions of the image. Texture retrieval is a third example. This task relies on the idea of assigning a set of parameters to each coefficient of the image – possibly in some transformed space – so that the combination of all parameters defines a "signature" that represents the content of various spatial regions. Joint image recovery and feature extraction consists in performing, in a joint manner, the image recovery and the extraction of features in the sought image.

A powerful and versatile approach for feature extraction, which we propose to adopt here, assumes that the data follow a mixture of generalised Gaussian probability distributions (\mathcal{GGD}) [17, 18, 19]. The \mathcal{GGD} model results in a sum of weighted \ell_{p_{i}}-based terms in the criterion, of general form \sum_{i=1}^{n}\vartheta_{i}|x_{i}|^{p_{i}} with \{\vartheta_{i}\}_{1\leq i\leq n}\subset[0,+\infty). We thus aim at jointly estimating an optimal configuration for (\vartheta_{i},p_{i})_{1\leq i\leq n} and retrieving the image. Under an assumption of consistency between the exponent values within a given region of the feature space, we then obtain the desired feature extraction from the estimated p-map. This joint estimation amounts to minimising a non-smooth and non-convex cost function.

This specific structure of the proposed objective function suggests the use of an alternating minimisation procedure. In such an approach, one sequentially updates a subset of parameters through the resolution of an inner minimisation problem, while the other parameters are kept fixed. This approach takes its standard form in the Block Coordinate Descent (BCD) method (also known as the Gauss-Seidel algorithm) [20]. In the context of non-smooth and non-convex problems, the simple BCD may, however, exhibit instabilities [21], which has motivated the development of alternative methods that efficiently exploit the characteristics of the functions, introduce powerful tools to improve the convergence guarantees of BCD, or overcome difficulties arising in some formulations. In this respect, a central role is played by proximal methods [22, 23]: a proximally regularised BCD (PAM) for non-convex problems was studied in [24]; a proximal linearised method (PALM) and its inertial and stochastic versions were then proposed in [25], [26] and [27], respectively; in [28], the authors investigated the advantage of a hybrid semi-linearised scheme (SL-PAM) for the joint task of image restoration and edge detection based on a discrete version of the Mumford–Shah model. A structure-adapted version of PALM (ASAP) was designed in [29, 30] to exploit the block-convexity of the coupling terms and the regularity of the block-separable terms arising in some practical applications such as image colorisation and blind source separation. The extension to proximal mappings defined with respect to a variable metric was first introduced in [31], leading to the so-called Block Coordinate Variable Metric Forward-Backward algorithm. An inexact version and a line-search-based version of it were presented in [32] and [33], respectively. In [34], the authors introduced a Majorisation-Minimisation (MM) strategy within a Variable Metric Forward-Backward algorithm to tackle the challenging task of computing the proximity operator of composite functions. We refer to [35] for an in-depth analysis of how to introduce a variable metric into first-order methods. In [36], the authors introduced a family of block-coordinate majorisation-minimisation methods named TITAN. Various majorisation strategies can be encompassed by this framework, such as proximal surrogates, Lipschitz gradient surrogates, or Bregman surrogate functions. Convergence of the algorithm iterates is shown in [36, 32] under mild assumptions that include the challenging non-convex setting. These studies emphasised the prominent role played by the Kurdyka-Łojasiewicz (KL) inequality [37].

In the proposed problem formulation, the objective function includes several non-smooth terms, as well as a quadratic term – hence Lipschitz differentiable – that is restricted to a single block of variables. This feature makes the related subproblem well suited for a splitting procedure that involves an explicit gradient step with respect to this term, combined with implicit proximal steps on the remaining blocks of variables. Variable metrics within the gradient/proximal steps are also desirable for convergence speed purposes. As we will show, the TITAN framework from [36] allows such an algorithm to be built and analysed. Unfortunately, the theoretical convergence properties of TITAN assume exact proximal computations at each step, which cannot be ensured in practice in our context. To circumvent this, we propose and prove the convergence of an inexact version of a TITAN-based optimisation scheme. Inexact rules in the form of those studied in [32] are considered. We refer to the proposed method as a Preconditioned Semi-Linearised Structure Adapted Proximal Alternating Minimisation (P-SASL-PAM) scheme. We investigate the convergence properties of this algorithm by relying on the KŁ property, following the analysis framework of [37]. Under analytical assumptions on the objective function, we show the global convergence toward a critical point of any sequence generated by the proposed method. We then detail the use of this method for our problem of image recovery and feature extraction. The performance of the approach is illustrated by means of examples in the field of image processing, in which we also show quantitative comparisons with state-of-the-art methods.

In a nutshell, the contributions of this work are (i) the proposition of an original non-convex variational model for the joint image recovery and feature extraction problem; (ii) the design of an inexact block coordinate descent algorithm to address the resulting minimisation problem; (iii) the convergence analysis of this scheme; (iv) the illustration of the performance of the proposed method through a numerical example in the field of ultrasound image processing.

The paper is organised as follows. In Section 2 we introduce the degradation model and report our derivation of the objective function for image recovery and feature extraction, starting from statistical assumptions on the data. In Section 3, we first describe the proposed P-SASL-PAM method to address a general non-smooth non-convex optimisation problem; we then show that the proposed method converges globally, in the sense that the whole generated sequence converges to a (local) minimum. The application of the P-SASL-PAM method to the joint reconstruction/segmentation problem is described in Section 4. Some illustrative numerical results are shown in Section 5. Conclusions are drawn in Section 6.

2 Model Formulation

In this section, we describe the construction of the objective function associated with the joint reconstruction/feature extraction problem. After defining the degradation model, we present a Bayesian model that is reminiscent of the one considered in [17, 19] in the context of ultrasound imaging. Then, we describe the procedure that leads to the formulation of the optimisation problem we address.

2.1 Observation Model

Let x\in\mathbb{R}^{n} and y\in\mathbb{R}^{m} be respectively the vectorised sought-for solution and the observed data, which are assumed to be related according to the following model

y=Kx+\omega, (1)

where K\in\mathbb{R}^{m\times n} is a linear operator and \omega\sim\mathcal{N}(0,\sigma^{2}\mathbb{I}_{m}) follows the normal distribution with zero mean and covariance matrix \sigma^{2}\mathbb{I}_{m}, with \sigma>0, where \mathbb{I}_{m} stands for the m\times m identity matrix. We further assume that x can be characterised by a finite set of k features that are defined in a suitable space, where the data are described by a simple model relying on a small number of parameters. The Generalised Gaussian Distribution (\mathcal{GGD})

(\forall t\in\mathbb{R})\quad\mathsf{p}(t;p,\alpha)=\frac{1}{2\alpha^{1/p}\Gamma\left(1+\frac{1}{p}\right)}\exp\left(-\frac{|t|^{p}}{\alpha}\right) (2)

with (p,\alpha)\in(0,+\infty)^{2} has been shown to be a suitable and flexible tool for this purpose [17, 18, 19]. Each feature can be identified by a pair (p_{j},\alpha_{j}) for j\in\{1,\dots,k\}, where parameter p is proportional to the decay rate of the tail of the probability density function (PDF) and parameter \alpha models the width of the peak of the PDF. Taking into account the roles that p and \alpha play in the definition of the PDF profile, these two parameters are generally referred to as the shape and scale parameters.
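As a quick numerical illustration of (2) (our own sketch, not part of the model derivation), the density can be evaluated with SciPy; for p=2 and \alpha=2\sigma^{2} it reduces to the Gaussian density, while p=1 gives a Laplacian-type profile with heavier tails.

import numpy as np
from scipy.special import gamma

def ggd_pdf(t, p, alpha):
    # generalised Gaussian density of eq. (2) with shape p and scale alpha
    norm = 2.0 * alpha ** (1.0 / p) * gamma(1.0 + 1.0 / p)
    return np.exp(-np.abs(t) ** p / alpha) / norm

t = np.linspace(-3.0, 3.0, 7)
print(ggd_pdf(t, p=1.0, alpha=1.0))   # Laplacian-like, heavy tails
print(ggd_pdf(t, p=2.0, alpha=2.0))   # standard Gaussian (alpha = 2*sigma^2 with sigma = 1)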

Assuming that K and \sigma are known, the task we address in this work is to jointly retrieve x (reconstruction) and obtain a good representation of its features through an estimation of the underlying model parameters (p_{j},\alpha_{j}) for j\in\{1,\dots,k\} (feature extraction). Starting from a similar statistical model to the one considered in [17, 19], we infer a continuous variational framework which does not rely on the a priori knowledge of the exact number of features k. We derive this model by performing a Maximum a Posteriori estimation, which allows us to formulate the joint image reconstruction and feature extraction task as a non-smooth and non-convex optimisation problem involving a coupling term and a block-coordinate separable one.

2.2 Bayesian Model

From (1), we derive the following likelihood

\mathsf{p}(y|x,\sigma^{2})=\frac{1}{(2\pi\sigma^{2})^{n/2}}\exp\left(-\frac{\|y-Kx\|^{2}}{2\sigma^{2}}\right). (3)

Assuming then that the components of x are independent conditionally on the knowledge of their feature class, the prior distribution of x is a mixture of \mathcal{GGD}s

\mathsf{p}(x|p,\alpha)=\prod_{j=1}^{k}\frac{1}{\left(2\alpha^{1/p_{j}}_{j}\Gamma\left(1+\frac{1}{p_{j}}\right)\right)^{N_{j}}}\exp\left(-\frac{\|\overline{x}_{j}\|^{p_{j}}_{p_{j}}}{\alpha_{j}}\right). (4)

Hereabove, for every x\in\mathbb{R}^{n} and every feature label j\in\{1,\dots,k\}, we define \overline{x}_{j}\in\mathbb{R}^{N_{j}} as the vector containing only the N_{j} components of x that belong to the j-th feature. Following the discussion in [38], for the shape parameter we choose a uniform distribution on a certain interval [a,b]\subset[0,+\infty):

\mathsf{p}(p)=\prod_{j=1}^{k}\frac{1}{b-a}\mathsf{I}_{[a,b]}(p_{j}). (5)

This choice stems from the fact that setting a=0 and b=3 allows covering all possible values of the shape parameter encountered in practical applications, while no additional information about this parameter is available. For the scale parameter, we adopt the Jeffreys distribution to reflect the lack of knowledge about this parameter:

\mathsf{p}(\alpha)=\prod_{j=1}^{k}\frac{1}{\alpha_{j}}\mathsf{I}_{[0,+\infty)}(\alpha_{j}). (6)

Note that this kind of prior is often used for scale parameters [39]. Hereabove, \mathsf{I}_{S} represents the characteristic function of a subset S\subset\mathbb{R}, which is equal to 1 over S, and 0 elsewhere.

2.3 Variational Model

In order to avoid defining the number of features a priori, we regularise the problem by considering the 2D Total Variation (TV) of the \mathcal{GGD} parameters (p,\alpha)\in(0,+\infty)^{n}\times(0,+\infty)^{n}. The idea of using Total Variation to define a segmentation procedure is studied in [40, 41, 42, 43, 44, 45] by virtue of the co-area formula: the authors propose to replace the boundary information term of the Mumford-Shah (MS) functional [46] with the TV convex integral term. This choice yields a non-tight convexification of the MS model that does not require setting the number of segments in advance. The overall segmentation procedure is then built upon two steps: the first one consists of obtaining a smooth version of the given image that is adapted to segmentation by minimising the proposed functional with convex methods; the second step consists of partitioning the obtained solution into the desired number of segments, by defining the thresholds with, e.g., Otsu's method [47] or the k-means algorithm (a sketch of this second step is given below). The strength of our approach is that the second step (i.e. the actual segmentation step) is independent of the first one; hence it is possible to set the number of segments (i.e., labels) without solving the optimisation problem again.
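As an illustration of this second, post-hoc labelling step, the following sketch (ours, not taken from the paper) partitions an estimated p-map into a chosen number of segments with scikit-learn's k-means; Otsu's method could be used instead when only two classes are needed.

import numpy as np
from sklearn.cluster import KMeans

def segment_p_map(p_map, n_segments):
    # cluster the exponent values only, then reshape the labels back to the image grid
    h, w = p_map.shape
    km = KMeans(n_clusters=n_segments, n_init=10, random_state=0)
    labels = km.fit_predict(p_map.reshape(-1, 1))
    return labels.reshape(h, w), km.cluster_centers_.ravel()

# toy p-map: two regions with exponents around 1 and 2
rng = np.random.default_rng(0)
p_map = np.where(np.arange(64).reshape(8, 8) % 8 < 4, 1.0, 2.0)
p_map = p_map + 0.05 * rng.standard_normal((8, 8))
labels, centers = segment_p_map(p_map, n_segments=2)
print(labels)
print(centers)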

In the considered model, the introduction of a TV prior leads to a minimisation problem that is non-convex with respect to \alpha. To circumvent this issue, a possible strategy would be to apply the change of variable \beta=1/\alpha, which leads to a convex problem with respect to \beta. However, after performing some tests, we noticed that this choice tends to promote the extreme values 0 or +\infty. We therefore opted for the following reparameterisation of the scale parameter: let \beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n} be such that, for every i\in\{1,\ldots,n\},

\beta_{i}=\frac{1}{p_{i}}\ln\alpha_{i}, (7)

and let us choose for this new variable a non-informative Gaussian prior defined on the whole space of configurations, with mean \mu_{\beta}\geq 0 and standard deviation \sigma_{\beta}>0. The choice of a not necessarily zero-mean distribution stems from the idea of having a more flexible prior to represent our reparameterised scale parameter.

Thus, replacing \alpha with \beta and further introducing TV regularisation potentials (weighted by the regularisation parameters \lambda>0 and \zeta>0) leads to the following reformulation of distributions (4)-(6):

\mathsf{p}(x|p,\beta)=\prod_{i=1}^{n}\frac{1}{2\exp(\beta_{i})\Gamma\left(1+\frac{1}{p_{i}}\right)}\exp\left(-|x_{i}|^{p_{i}}\exp(-p_{i}\beta_{i})\right) (8)
\mathsf{p}(p)=c_{p}\exp(-\lambda\mathrm{TV}(p))\prod_{i=1}^{n}\frac{1}{b-a}\mathsf{I}_{[a,b]}(p_{i}) (9)
\mathsf{p}(\beta)=c_{\beta}\exp(-\zeta\mathrm{TV}(\beta))\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\sigma_{\beta}}\exp\Big(-\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}\Big), (10)

where (c_{p},c_{\beta})\in(0,+\infty)^{2} are normalisation constants. In Figure 1, we depict the probabilistic dependence graph defining the relations between variables and hyperparameters in our model.

Figure 1: Probabilistic dependence graph of our model. Hyperparameters are represented as diamonds, and variables as ellipses: a and b are the lower and upper bounds of the interval appearing in the uniform distribution of the shape parameter, p is the shape parameter, \alpha is the original scale parameter, \beta is the reparameterised scale parameter with mean \mu_{\beta} and standard deviation \sigma_{\beta}, x is the sought signal, y is the observed one, K is the linear operator, and \omega is the additive Gaussian noise with standard deviation \sigma.

The joint posterior distribution is determined as follows:

\mathsf{p}(x,p,\beta|y)\propto\mathsf{p}(y|x,p,\beta)\,\mathsf{p}(x,p,\beta)\propto\mathsf{p}(y|x,p,\beta)\,\mathsf{p}(x|p,\beta)\,\mathsf{p}(p)\,\mathsf{p}(\beta). (11)

Taking the negative logarithm of (11), computing the Maximum a Posteriori estimate (i.e., maximising the joint posterior distribution) is equivalent to solving the following optimisation problem, which we refer to as the joint image reconstruction and feature extraction problem

\underset{(x,p,\beta)\in\mathbb{R}^{n}\times\mathbb{R}^{n}\times\mathbb{R}^{n}}{\mathrm{minimize}}\;\;\Theta(x,p,\beta)=\frac{1}{2\sigma^{2}}\|y-Kx\|^{2}+\sum_{i=1}^{n}\Big(|x_{i}|^{p_{i}}e^{-p_{i}\beta_{i}}+\ln\Gamma\Big(1+\frac{1}{p_{i}}\Big)+\iota_{[a,b]}(p_{i})+\beta_{i}+\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}\Big)+\lambda\mathrm{TV}(p)+\zeta\mathrm{TV}(\beta). (12)

Hereabove, \iota_{S} represents the indicator function of a subset S\subset\mathbb{R}, which is equal to 0 over S, and +\infty elsewhere.

In [28], the authors proposed a generalised discrete Mumford-Shah variational model that is specifically designed for the joint image reconstruction and edge detection problem. In contrast, the model we propose in (12) is well suited to encompass a wider class of problems. In Section 5, we present two applications, namely in the context of wavelet-based image restoration and in the context of joint deblurring/segmentation of ultrasound images. In particular, we notice that, when restricted to variable x for a given set of parameters (p,\beta), formulation (12) boils down to the flexible sparse regularisation model

\underset{x\in\mathbb{R}^{n}}{\mathrm{minimize}}\;\;\frac{1}{2\sigma^{2}}\|y-Kx\|^{2}+\sum_{i=1}^{n}|x_{i}|^{p_{i}}e^{-p_{i}\beta_{i}}, (13)

where the contribution of the \ell_{p_{i}} regularisation term is itself weighted in a space-varying fashion.

Function \Theta in (12) is non-smooth and non-convex. It reads as the sum of a coupling term and three block-separable terms. In particular, the block-separable data-fit term relative to x is quadratic and hence has a Lipschitz continuous gradient. Our proposed algorithm aims to leverage this property, which is generally not accounted for by other BCD methods. To this aim, we exploit a hybrid scheme that involves both standard and linearised proximal steps. The details about the proposed method are presented in the next section.
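To fix ideas, the objective \Theta of (12) can be evaluated numerically as follows; this is a sketch of ours in which the TV discretisation (isotropic, replicated boundaries), the dense operator K and all helper names are illustrative choices rather than the exact setting used in the experiments.

import numpy as np
from scipy.special import gammaln

def tv(u, shape):
    # isotropic discrete TV of a vectorised image u with the given 2D shape
    img = u.reshape(shape)
    dx = np.diff(img, axis=1, append=img[:, -1:])
    dy = np.diff(img, axis=0, append=img[-1:, :])
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

def objective(x, p, beta, y, K, sigma, a, b, mu_b, sigma_b, lam, zeta, shape):
    # Theta(x, p, beta) of eq. (12); the box constraint on p is enforced by returning +inf
    if np.any(p < a) or np.any(p > b):
        return np.inf
    data_fit = 0.5 * np.sum((y - K @ x) ** 2) / sigma ** 2
    coupling = np.sum(np.abs(x) ** p * np.exp(-p * beta))
    separable = np.sum(gammaln(1.0 + 1.0 / p) + beta + (beta - mu_b) ** 2 / (2.0 * sigma_b ** 2))
    return data_fit + coupling + separable + lam * tv(p, shape) + zeta * tv(beta, shape)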

3 Preconditioned Structure Adapted Semi-Linearised Proximal Alternating Minimisation (P-SASL-PAM)

In this section, we introduce a BCD-based method to address a class of sophisticated optimisation problems that includes (12) as a special case. We start the section with useful preliminaries on subdifferential calculus. Then, we present the Kurdyka-Łojasiewicz property, which plays a prominent role in the convergence analysis of BCD methods in a non-convex setting. Finally, we define problem (20), which generalises (12), for which we derive our proposed BCD-based algorithm and show its convergence properties. The so-called Preconditioned Structure Adapted Semi-Linearised Proximal Alternating Minimisation (P-SASL-PAM) approach mixes both standard and preconditioned linearised proximal regularisation on the different coordinate blocks of the criterion.

3.1 Subdifferential Calculus

Let us now recall some definitions and elements of subdifferential calculus that will be useful in the upcoming sections. For a proper and lower semicontinuous function h:\mathbb{R}^{n}\rightarrow(-\infty,\infty], the domain of h is defined as

\operatorname{dom}h=\left\{u\in\mathbb{R}^{n}\mid h(u)<+\infty\right\}.

Firstly, we recall the notion of subgradients and subdifferential for convex functions.

Definition 1 (Subgradient of a convex function).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,\infty] be a proper convex lower semicontinuous function. The subdifferential \partial h(u^{+}) of h at u^{+}\in\mathbb{R}^{n} is the set of all vectors r\in\mathbb{R}^{n}, called subgradients of h at u^{+}, such that

(\forall u\in\mathbb{R}^{n})\quad h(u)\geq h(u^{+})+\langle r,u-u^{+}\rangle.

If u^{+}\notin\operatorname{dom}h, then \partial h(u^{+})=\varnothing.

Secondly, we consider the more general notion of (limiting) subdifferential for not necessarily convex functions, as proposed in [48, Definition 8.3].

Definition 2 (Limiting Subdifferential).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function. For a vector u^{+}\in\mathbb{R}^{n},

  • the Fréchet subdifferential of h at u^{+}, written as \hat{\partial}h(u^{+}), is the set of all vectors r\in\mathbb{R}^{n} such that

    h(u)\geq h(u^{+})+\langle r,u-u^{+}\rangle+o(\|u-u^{+}\|);

    if u^{+}\notin\operatorname{dom}h, then \hat{\partial}h(u^{+})=\varnothing;

  • the limiting subdifferential of h at u^{+}, denoted by \partial h(u^{+}), is defined as

    \partial h(u^{+})=\{r\in\mathbb{R}^{n}\;|\;\exists\,u^{k}\rightarrow u^{+},\;h(u^{k})\rightarrow h(u^{+}),\;r^{k}\rightarrow r,\;r^{k}\in\hat{\partial}h(u^{k})\}.

If h is lower semicontinuous and convex, then the three previous notions of subdifferential coincide, i.e. \hat{\partial}h(u^{+})=\partial h(u^{+}). If h is differentiable, then \partial h(u^{+})=\{\nabla h(u^{+})\}. It is now possible to formalise the notion of critical point for a general function:

Definition 3 (Critical point).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper function. A point u^{*}\in\mathbb{R}^{n} is said to be a critical (or stationary) point of h if 0\in\partial h(u^{*}).

Finally, we define the notion of proximal map relative to the norm induced by a positive definite matrix.

Definition 4.

Let \mathcal{S}_{n} be the set of symmetric positive definite matrices in \mathbb{R}^{n\times n}. For a matrix A\in\mathcal{S}_{n}, the weighted \ell_{2}-norm induced by A is defined as

(\forall u\in\mathbb{R}^{n})\quad\|u\|_{A}=(u^{\top}Au)^{1/2}. (14)

Definition 5.

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function, let A\in\mathcal{S}_{n} and u^{+}\in\mathbb{R}^{n}. The proximity operator of function h at u^{+} with respect to the norm induced by A is defined as

\text{prox}_{h}^{A}(u^{+})=\underset{u\in\mathbb{R}^{n}}{\text{argmin}}\left(\frac{1}{2}\|u-u^{+}\|^{2}_{A}+h(u)\right). (15)

Note that \text{prox}_{h}^{A}(u^{+}), as defined above, can be the empty set. It is nonempty for every u^{+}\in\mathbb{R}^{n} if h is lower bounded by an affine function. In addition, it reduces to a single-valued operator when h is convex.
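As a simple illustration (ours, not part of the paper), when h=\gamma\|\cdot\|_{1} and A is diagonal, the minimisation in (15) separates across coordinates and the proximity operator reduces to coordinate-wise soft-thresholding with thresholds \gamma/A_{ii}:

import numpy as np

def prox_l1_diag_metric(u_plus, gamma, a_diag):
    # prox of gamma*||.||_1 at u_plus in the metric induced by A = diag(a_diag):
    # argmin_u 0.5*sum_i a_i*(u_i - u_plus_i)^2 + gamma*|u_i|, solved coordinate-wise
    thr = gamma / a_diag
    return np.sign(u_plus) * np.maximum(np.abs(u_plus) - thr, 0.0)

u_plus = np.array([1.5, -0.2, 0.7])
print(prox_l1_diag_metric(u_plus, gamma=0.5, a_diag=np.array([1.0, 1.0, 5.0])))
# per-coordinate thresholds [0.5, 0.5, 0.1] give [1.0, 0.0, 0.6]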

In order to deal with the situation where no closed-form proximal formulas are available (as may be the case for non-trivial preconditioning metrics A), we consider an inexact notion of proximal computation in the sense of [37, Theorems 4.2 and 5.2] and [32]:

Definition 6.

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function, let A\in\mathcal{S}_{n}, \tau>0 and u^{+}\in\mathbb{R}^{n}. Then u^{*}\in\mathbb{R}^{n} is an inexact proximal point for h at u^{+} if the following relative error conditions are satisfied:

  1. (i)

    Sufficient Decrease Condition:

    h(u^{*})+\frac{1}{2}\|u^{+}-u^{*}\|_{A}^{2}\leq h(u^{+}) (16)

  2. (ii)

    Inexact Optimality: there exists r\in\partial h(u^{*}) such that

    \|r\|\leq\tau\|u^{+}-u^{*}\|. (17)

In this case we write u^{*}\in\operatorname{prox}_{h}^{A,\tau}(u^{+}).

Remark 1.

We highlight that when exact proximal points are considered, the optimality condition reads

0\in\partial h(u^{*})+A(u^{*}-u^{+}), (18)

implying that there exists r\in\partial h(u^{*}) such that r=A(u^{+}-u^{*}).

3.2 The KŁ-Property

Most works related to BCD-based algorithms rely on the framework developed by Attouch, Bolte, and Svaiter in their seminal paper [37] in order to prove the convergence of block alternating strategies for non-smooth and non-convex problems. A fundamental assumption in [37] is that the objective function satisfies the Kurdyka-Łojasiewicz (KŁ) property [49, 50, 51]. We recall the definition of this property as given in [25]. Let \eta\in(0,+\infty] and denote by \Phi_{\eta} the class of concave continuous functions \varphi:[0,+\infty)\rightarrow[0,+\infty) satisfying the following conditions:

  1. (i)

    \varphi(0)=0;

  2. (ii)

    \varphi is \mathcal{C}^{1} on (0,\eta) and continuous at 0;

  3. (iii)

    for every s\in(0,\eta), \varphi^{\prime}(s)>0.

For any subset S\subset\mathbb{R}^{n} and any point u^{+}\in\mathbb{R}^{n}, the distance from u^{+} to S is defined by

\operatorname{dist}(u^{+},S)=\inf_{u\in S}\|u^{+}-u\|,

with \operatorname{dist}(u^{+},\varnothing)=+\infty.

Definition 7 (KŁ property).

Let h:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] be a proper and lower semicontinuous function.

  1. (i)

    Function h is said to satisfy the Kurdyka-Łojasiewicz property at u^{+}\in\operatorname{dom}\partial h if there exist \eta\in(0,+\infty], a neighbourhood U of u^{+} and a function \varphi\in\Phi_{\eta} such that, for every u\in U,

    h(u^{+})<h(u)<h(u^{+})+\eta\quad\Rightarrow\quad\varphi^{\prime}(h(u)-h(u^{+}))\operatorname{dist}(0,\partial h(u))\geq 1. (19)

  2. (ii)

    Function h is said to be a KŁ function if it satisfies the KŁ property at each point of \operatorname{dom}\partial h.

3.3 Proposed Algorithm

Let us consider that every element \zeta\in\mathbb{R}^{N} is block-decomposed as \zeta=(\zeta_{0},\dots,\zeta_{d}), where, for every i\in\{0,\ldots,d\}, \zeta_{i}\in\mathbb{R}^{n_{i}}, with \sum_{i=0}^{d}n_{i}=N. As we will show in Subsection 4.1, Problem (12) is a special instance of the general class of problems of the form

\underset{\zeta\in\mathbb{R}^{N}}{\mathrm{minimize}}\;\;\theta(\zeta)=q(\zeta)+f(\zeta_{0})+\sum_{i=1}^{d}g_{i}(\zeta_{i}), (20)

under the following assumption:

Assumption 1.


  1. 1.

    Function q:\mathbb{R}^{N}\to\mathbb{R} is bounded from below and differentiable with Lipschitz continuous gradient on bounded subsets of \mathbb{R}^{N}.

  2. 2.

    Function f:\mathbb{R}^{n_{0}}\to\mathbb{R} is differentiable with globally Lipschitz continuous gradient of constant L_{f}>0, and is bounded from below.

  3. 3.

    For every i\in\{1,\dots,d\}, function g_{i}:\mathbb{R}^{n_{i}}\to(-\infty,+\infty] is proper, lower semicontinuous and bounded from below, and its restriction to its domain is continuous.

  4. 4.

    \theta is a KŁ function.

Remark 2.

The continuity assumption in Assumption 1.3 is standard in the context of inexact minimisation algorithms (see the assumptions in [37, Theorem 4.1, Theorem 5.2]).

Throughout the paper we will use the following notation: for every (\zeta_{i^{\prime}})_{0\leq i^{\prime}\leq d}\in\mathbb{R}^{n_{0}}\times\cdots\times\mathbb{R}^{n_{d}} and every i\in\{0,\dots,d\}, \zeta_{\neq i}=(\zeta_{0},\dots,\zeta_{i-1},\zeta_{i+1},\dots,\zeta_{d}) and

(\forall z\in\mathbb{R}^{n_{i}})\quad(z;\zeta_{\neq i})=(\zeta_{0},\dots,\zeta_{i-1},z,\zeta_{i+1},\dots,\zeta_{d}). (21)

In order to proceed with the algorithm construction and analysis, let us recall the notion of partial subdifferentiation for a function \theta:\mathbb{R}^{N}\rightarrow\mathbb{R} as in (20). For every i\in\{0,\dots,d\} and a fixed \zeta_{\neq i}, the subdifferential of the partial function \theta(\cdot\,;\zeta_{\neq i}) with respect to the i-th block is denoted by \partial_{i}\theta(\cdot\,;\zeta_{\neq i}). Given these definitions, we have the following differential calculus property (see [48, Exercise 8.8(c), Proposition 10.5]):

Proposition 1.

Let function \theta be defined as in (20). Under Assumption 1, the following equality holds: for every \zeta\in\mathbb{R}^{N},

\partial\theta(\zeta)=\{\nabla_{0}q(\zeta)+\nabla f(\zeta_{0})\}\times\bigtimes_{i=1}^{d}\left(\nabla_{i}q(\zeta)+\partial g_{i}(\zeta_{i})\right)=\bigtimes_{i=0}^{d}\partial_{i}\theta(\zeta). (22)

We are now ready to introduce our block alternating algorithm P-SASL-PAM, outlined in Algorithm 1, to solve problem (20). Throughout the paper, we use the following notation: for every \ell\in\mathbb{N} and i\in\{1,\dots,d\},

\zeta^{\ell+1,0}=\zeta^{\ell};
\zeta^{\ell+1,i}=(\zeta^{\ell+1}_{0},\dots,\zeta^{\ell+1}_{i-1},\zeta^{\ell}_{i},\zeta^{\ell}_{i+1},\dots,\zeta^{\ell}_{d});
\zeta^{\ell+1,d+1}=\zeta^{\ell+1}.

Initialize \zeta_{0}^{0}\in\operatorname{dom}f, \zeta_{i}^{0}\in\operatorname{dom}g_{i} for i\in\{1,\dots,d\}
Set A_{\ell}\in\mathcal{S}_{n_{0}} for every \ell\in\mathbb{N}
Set \gamma_{0}\in(0,1), \gamma_{i}>0 for i\in\{1,\dots,d\}, and \tau_{i}>0 for i\in\{0,\dots,d\}
For \ell=0,1,\ldots until convergence

    \zeta_{0}^{\ell+1}\in\text{prox}^{A_{\ell},\tau_{0}}_{\gamma_{0}q(\cdot\,;\zeta_{\neq 0}^{\ell})}\big(\zeta_{0}^{\ell}-\gamma_{0}A_{\ell}^{-1}\nabla f(\zeta_{0}^{\ell})\big) (23)

    For i=1,\ldots,d

        \zeta_{i}^{\ell+1}\in\text{prox}^{\tau_{i}}_{\gamma_{i}\theta(\cdot\,;\,\zeta^{\ell+1,i}_{\neq i})}(\zeta_{i}^{\ell}) (24)

    end for
end for

Algorithm 1 P-SASL-PAM to solve (20).
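To make the structure of Algorithm 1 concrete, here is a schematic Python sketch of the main loop (ours); grad_f, prox_q_block0, prox_theta_block and metric are placeholders that would have to be supplied for the problem at hand, and no inexactness check is implemented.

import numpy as np

def p_sasl_pam(zeta0, blocks, grad_f, prox_q_block0, prox_theta_block,
               metric, gamma, max_iter=100, tol=1e-6):
    # preconditioned linearised (forward-backward) step on block 0, eq. (23),
    # followed by plain proximal steps on blocks 1..d, eq. (24)
    zeta = [z.copy() for z in [zeta0] + list(blocks)]
    for _ in range(max_iter):
        zeta_old = [z.copy() for z in zeta]
        A = metric(zeta)                                     # SPD preconditioner A_l
        forward = zeta[0] - gamma[0] * np.linalg.solve(A, grad_f(zeta[0]))
        zeta[0] = prox_q_block0(forward, zeta, gamma[0], A)  # (inexact) prox of gamma_0*q(.; rest)
        for i in range(1, len(zeta)):
            zeta[i] = prox_theta_block(i, zeta[i], zeta, gamma[i])
        if sum(np.linalg.norm(z - zo) for z, zo in zip(zeta, zeta_old)) < tol:
            break
    return zeta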

The proposed method sequentially updates the coordinate blocks (\zeta_{0},\dots,\zeta_{d}) involved in function \theta, through proximal and gradient steps. Our algorithm P-SASL-PAM, summarised in Algorithm 1, mixes both standard and linearised proximal regularisation on the coordinate blocks as in SL-PAM [28], while inverting the splitting in order to obtain more efficient proximal computations as in ASAP [29, 30]. On the one hand, the lack of global Lipschitz continuity of \nabla q prevents us from adopting BCVMFB [32]. On the other hand, the lack of differentiability of the whole set of block-separable functions prevents us from adopting ASAP [29, 30]. Our approach takes full advantage of the Lipschitz differentiability assumption on f to perform a linearised step for the update of variable \zeta_{0}, while the remaining \zeta_{i}'s are updated sequentially, according to a standard proximal step. In addition, in order to accelerate the convergence, a preconditioned version of the linearised step is used, which relies on the MM-based variable metric forward-backward strategy introduced in [52]. The latter relies on the following technical assumptions:

Assumption 2.

We choose a sequence of SPD matrices (A_{\ell})_{\ell\in\mathbb{N}} such that there exists (\underline{\nu},\overline{\nu})\in(0,+\infty)^{2} for which, for every \ell\in\mathbb{N},

\underline{\nu}\mathbb{I}_{n_{0}}\preceq A_{\ell}\preceq\overline{\nu}\mathbb{I}_{n_{0}}. (25)

Assumption 3.

The quadratic function defined, for every \zeta_{0}^{+}\in\mathbb{R}^{n_{0}} and every SPD matrix A\in\mathcal{S}_{n_{0}} satisfying Assumption 2, as

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad\phi(\zeta_{0},\zeta_{0}^{+})=f(\zeta_{0}^{+})+\langle\zeta_{0}-\zeta_{0}^{+},\nabla f(\zeta_{0}^{+})\rangle+\frac{1}{2}\|\zeta_{0}^{+}-\zeta_{0}\|_{A}^{2} (26)

is a majorant function of f at \zeta_{0}^{+}, i.e.

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad f(\zeta_{0})\leq\phi(\zeta_{0},\zeta_{0}^{+}). (27)
Remark 3.

Since f satisfies Assumption 1, the Descent Lemma [53, Proposition A.24] applies, yielding

(\forall(\zeta_{0},\zeta_{0}^{+})\in\mathbb{R}^{n_{0}}\times\mathbb{R}^{n_{0}})\quad f(\zeta_{0})\leq f(\zeta_{0}^{+})+\langle\zeta_{0}-\zeta_{0}^{+},\nabla f(\zeta_{0}^{+})\rangle+\frac{L_{f}}{2}\|\zeta_{0}^{+}-\zeta_{0}\|^{2}.

This guarantees that the preconditioning matrix A=L_{f}\mathbb{I}_{n_{0}} satisfies Assumptions 2 and 3, with \underline{\nu}=\overline{\nu}=L_{f}. Apart from this simple choice for matrix A, more sophisticated construction strategies have been studied in the literature [52, 54, 55]. Practical choices of metrics for Problem (12) will be discussed in Section 5, which is dedicated to numerical experiments.
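For the data-fit term f(x)=\frac{1}{2\sigma^{2}}\|y-Kx\|^{2} appearing in (12), the Lipschitz constant of \nabla f is \|K\|^{2}/\sigma^{2}; the following sketch (ours) estimates it by power iteration on K^{\top}K, which can then be used to build the simple metric A=L_{f}\mathbb{I}.

import numpy as np

def lipschitz_constant_data_fit(K, sigma, n_iter=50, seed=0):
    # estimate L_f = ||K||^2 / sigma^2 for f(x) = ||y - Kx||^2 / (2 sigma^2)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(K.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = K.T @ (K @ v)
        v = w / np.linalg.norm(w)
    spectral_norm_sq = v @ (K.T @ (K @ v))   # Rayleigh quotient ~ ||K||^2
    return spectral_norm_sq / sigma ** 2

K = np.array([[1.0, 0.2], [0.0, 0.5]])
print(lipschitz_constant_data_fit(K, sigma=0.1))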

Remark 4.

Alternative approaches to deal with the lack of global Lipschitz continuity of \nabla q could involve a backtracking strategy as in [56] or adaptive step sizes based on an estimate of the local smoothness of the function as in [57, 58].

Remark 5.

Essentially cyclic update rule. Even though Algorithm 1 relies on a sequential update rule for the blocks of coordinates i\in\{0,\dots,d\}, an extension to a quasi-cyclic rule with interval \overline{d}\geq d is possible. In this case, at each iteration, the index i\in\{0,\dots,d\} of the updated block is chosen in such a way that each block is updated at least once every \overline{d} steps.

P-SASL-PAM involves the computation of three proximal operators at each iteration \ell\in\mathbb{N}. As we will show in Subsection 3.3.1, if these operators are computed exactly, P-SASL-PAM fits within the general algorithmic framework TITAN [36] and, as such, inherits its convergence properties. The link between the exact and the inexact forms of Algorithm 1 is discussed in Subsection 3.3.2. The convergence of the inexact form of Algorithm 1 is shown in Subsection 3.4.

3.3.1 Links between P-SASL-PAM and TITAN

Let us show that the exact version of Algorithm 1 is a special instance of the TITAN algorithm from [36]. The scheme of TITAN relies on an MM strategy that, at each iteration and for each block of coordinates, minimises a block surrogate function, i.e. a majorising approximation of the restriction of the objective function to this block. Let us formally define the notion of block surrogate function in the case of Problem (20).

Definition 8 (Block surrogate function).

Consider a function h:\mathbb{R}^{N}\rightarrow\mathbb{R}. For every i\in\{0,\dots,d\}, a function h_{i}:\mathbb{R}^{n_{i}}\times\mathbb{R}^{N}\to\mathbb{R} is called a block surrogate function of h at block i if (\zeta_{i},\xi)\mapsto h_{i}(\zeta_{i};\xi) is continuous in \xi, lower semicontinuous in \zeta_{i}, and the following conditions are satisfied:

  1. (i)

    h_{i}(\xi_{i};\xi)=h(\xi) for every \xi\in\mathbb{R}^{N};

  2. (ii)

    h_{i}(\zeta_{i};\xi)\geq h(\zeta_{i};\xi_{\neq i}) for all \zeta_{i}\in\mathbb{R}^{n_{i}} and \xi\in\mathbb{R}^{N}.

Function h_{i}(\cdot;\xi) is said to be a block surrogate function of h at block i in \xi. The block approximation error for block i at a point (\zeta_{i},\xi)\in\mathbb{R}^{n_{i}}\times\mathbb{R}^{N} is then defined for every i\in\{0,\dots,d\} as

\mathsf{e}_{i}(\zeta_{i};\xi):=h_{i}(\zeta_{i};\xi)-h(\zeta_{i};\xi_{\neq i}).

Let us now show that each step of Algorithm 1 is actually equivalent to minimising an objective function involving a block surrogate function of the differentiable terms of \theta for block i\in\{0,\dots,d\} at the current iterate.

Solving (23) in Algorithm 1 is equivalent to solving

\underset{\zeta_{0}\in\mathbb{R}^{n_{0}}}{\mathrm{argmin}}\;\;h_{0}(\zeta_{0};\zeta^{\ell}) (28)

where

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad h_{0}(\zeta_{0};\zeta^{\ell})=q(\zeta_{0};\zeta^{\ell}_{\neq 0})+f(\zeta_{0}^{\ell})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell}-\zeta_{0}\|^{2}_{A_{\ell}}

is a block surrogate function of q(\cdot\,;\zeta^{\ell}_{\neq 0})+f by virtue of Assumption 3. Notice that function h_{0} is continuous in (\zeta_{0};\zeta)\in\mathbb{R}^{n_{0}}\times\mathbb{R}^{N}.

Solving (24) for a given block index i\in\{1,\dots,d\} in Algorithm 1 is equivalent to solving

\underset{\zeta_{i}\in\mathbb{R}^{n_{i}}}{\mathrm{argmin}}\;\;h_{i}(\zeta_{i};\zeta^{\ell+1,i})+g_{i}(\zeta_{i}) (29)

where the function

(\forall\zeta_{i}\in\mathbb{R}^{n_{i}})\qquad h_{i}(\zeta_{i};\zeta^{\ell+1,i})=q(\zeta_{i};\zeta^{\ell+1,i}_{\neq i})+\frac{1}{2\gamma_{i}}\|\zeta_{i}-\zeta_{i}^{\ell}\|^{2}

is a proximal surrogate of function q(\cdot\,;\zeta^{\ell+1,i}_{\neq i}) at its i-th block in \zeta^{\ell+1,i}. Note that function h_{i} is continuous on \mathbb{R}^{N}.

In a nutshell, Algorithm 1 alternates between minimisation subproblems involving block surrogates for the differentiable terms of function \theta and, as such, can be viewed as a special instance of TITAN [36]. This allows us to state the following convergence result for a sequence generated by Algorithm 1.

Theorem 2.

Let Assumptions 1-3 be satisfied. Assume also that the sequence (\zeta^{\ell})_{\ell\in\mathbb{N}} generated by Algorithm 1 is bounded. Then,

  1. i)

    \sum_{\ell=0}^{+\infty}\|\zeta^{\ell+1}-\zeta^{\ell}\|<+\infty;

  2. ii)

    (\zeta^{\ell})_{\ell\in\mathbb{N}} converges to a critical point \zeta^{*} of function \theta in (20).

Proof.

We start the proof by identifying the block approximation errors for the block surrogate functions at an iteration \ell\in\mathbb{N}:

(\forall\zeta_{0}\in\mathbb{R}^{n_{0}})\quad\mathsf{e}_{0}(\zeta_{0};\zeta^{\ell})=f(\zeta_{0}^{\ell})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell}-\zeta_{0}\|^{2}_{A_{\ell}}-f(\zeta_{0})

and, for i\in\{1,\dots,d\},

(\forall\zeta_{i}\in\mathbb{R}^{n_{i}})\quad\mathsf{e}_{i}(\zeta_{i};\zeta^{\ell+1,i})=\frac{1}{2\gamma_{i}}\|\zeta_{i}-\zeta_{i}^{\ell}\|^{2}.

Clearly \mathsf{e}_{0}(\cdot\,;\zeta^{\ell}) (resp. \mathsf{e}_{i}(\cdot\,;\zeta^{\ell+1,i}) for i\in\{1,\dots,d\}) is differentiable at \zeta_{0}^{\ell} (resp. at \zeta_{i}^{\ell} for i\in\{1,\dots,d\}) and the following holds for every i\in\{0,\dots,d\}:

\mathsf{e}_{i}(\zeta_{i}^{\ell};\zeta^{\ell+1,i})=0,\qquad\nabla_{i}\mathsf{e}_{i}(\zeta_{i}^{\ell};\zeta^{\ell+1,i})=0.

This shows that [36, Assumption 2] is satisfied.

From (23) and Assumptions 2-3, we deduce that

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}),

which implies that the Nearly Sufficient Descending Property [36, (NSDP)] is satisfied for the first block of coordinates with constant \frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}. On the other hand, for every i\in\{1,\dots,d\}, function \mathsf{e}_{i}(\cdot\,;\zeta^{\ell+1,i}) satisfies [36, Condition 2], which implies that [36, (NSDP)] also holds for the i-th block of coordinates with the corresponding constant 1/\gamma_{i}.

Moreover, [36, Condition 4.(ii)] is satisfied by Algorithm 1 with

\overline{l}=\max\left\{\frac{\underline{\nu}}{2}\left(\frac{1}{\gamma_{0}}-1\right),\frac{1}{\gamma_{1}},\dots,\frac{1}{\gamma_{d}}\right\}

and this constant fulfills the requirements of [36, Theorem 8]. In addition, by virtue of Proposition 1, [36, Assumption 3.(i)] holds, while the requirement in [36, Assumption 3.(ii)] is guaranteed by the fact that all the block surrogate functions are continuously differentiable.

Finally, since Algorithm 1 does not include any extrapolation step, we do not need to verify [36, Condition 1], whereas [36, Condition 4.(i)] is always satisfied.

In conclusion, we have proved that all the requirements of [36, Proposition 5, Theorems 6 and 8] are satisfied. [36, Proposition 5] guarantees that the sequence has the finite-length property expressed by i), while [36, Theorems 6 and 8] state that the sequence converges to a critical point \zeta^{*} of (20), which concludes the proof. ∎

3.3.2 Well-definition of Algorithm 1

Now, we show that the inexact updates involved in Algorithm 1 are well-defined. To do so, we prove that the P-SASL-PAM algorithm with exact proximal computations can be recovered as a special case of Algorithm 1.

By the variational definition of the proximity operator, for every \ell\in\mathbb{N}, the iterates of Algorithm 1 satisfy, for every i\in\{1,\dots,d\},

\zeta_{0}^{\ell+1}\in\underset{u_{0}\in\mathbb{R}^{n_{0}}}{\mathrm{argmin}}\;\;\Big\{q(u_{0}\,;\zeta_{\neq 0}^{\ell})+\langle\nabla f(\zeta_{0}^{\ell}),u_{0}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|u_{0}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\Big\} (30)
\zeta_{i}^{\ell+1}\in\underset{u_{i}\in\mathbb{R}^{n_{i}}}{\mathrm{argmin}}\;\;\Big\{q(u_{i}\,;\zeta_{\neq i}^{\ell+1,i})+g_{i}(u_{i})+\frac{1}{2\gamma_{i}}\|u_{i}-\zeta_{i}^{\ell}\|^{2}\Big\} (31)

so that

q(\zeta^{\ell+1,1})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell}) (32)
q(\zeta^{\ell+1,i+1})+g_{i}(\zeta_{i}^{\ell+1})+\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,i})+g_{i}(\zeta_{i}^{\ell}), (33)

which implies that the sufficient decrease condition (16) is satisfied for every i\in\{0,\dots,d\}.

Fermat's rule implies that, for every \ell\in\mathbb{N}, the iterates of P-SASL-PAM are such that, for every i\in\{1,\dots,d\}, there exists r_{i}\in\partial g_{i}(\zeta_{i}^{\ell+1}) for which the following equalities are satisfied:

0=\nabla f(\zeta_{0}^{\ell})+\nabla_{0}q(\zeta^{\ell+1,1})+\gamma_{0}^{-1}A_{\ell}(\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}) (34)
0=r_{i}+\nabla_{i}q(\zeta^{\ell+1,i+1})+\gamma_{i}^{-1}(\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}). (35)

Hence, for every i\in\{1,\dots,d\},

\|\nabla f(\zeta_{0}^{\ell})+\nabla_{0}q(\zeta^{\ell+1,1})\|\leq\gamma_{0}^{-1}\overline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\| (36)
\|r_{i}+\nabla_{i}q(\zeta^{\ell+1,i+1})\|=\gamma_{i}^{-1}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|, (37)

which implies that the inexact optimality condition (17) is satisfied with \tau_{0}=\gamma_{0}^{-1}\overline{\nu} for the first block of coordinates and \tau_{i}=\gamma_{i}^{-1} for the remaining ones. In a nutshell, Algorithm 1 is well defined, since its inexact rules hold when the proximity operators are computed exactly, in which case the scheme reduces to TITAN.

3.4 Convergence analysis in the inexact case

Let us now present our main result, that is the convergence analysis for Algorithm 1.

Lemma 1.

Let (\zeta^{\ell})_{\ell\in\mathbb{N}} be the sequence generated by Algorithm 1. Then, under Assumptions 1 and 2,

  1. i)

    there exists \mu\in(0,+\infty) such that for every \ell\in\mathbb{N},

    \theta(\zeta^{\ell+1})\leq\theta(\zeta^{\ell})-\frac{\mu}{2}\|\zeta^{\ell+1}-\zeta^{\ell}\|^{2}. (38)

  2. ii)

    \sum_{\ell=0}^{+\infty}\|\zeta^{\ell+1}-\zeta^{\ell}\|^{2}<+\infty.

Proof.

Let us start by considering the sufficient decrease inequality related to the first block:

q(\zeta^{\ell+1,1})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell}). (39)

Adding f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2} on both sides of (39) yields

q(\zeta^{\ell+1,1})+\langle\nabla f(\zeta_{0}^{\ell}),\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\rangle+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}+f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}. (40)

By applying (26) and (27) with \zeta_{0}^{+}=\zeta_{0}^{\ell} and \zeta_{0}=\zeta_{0}^{\ell+1} we obtain

f(\zeta_{0}^{\ell+1})\leq f(\zeta_{0}^{\ell})+\langle\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell},\nabla f(\zeta_{0}^{\ell})\rangle+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}, (41)

hence the LHS in (40) can be further lower bounded, yielding

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2\gamma_{0}}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell})+\frac{1}{2}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}, (42)

hence

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|_{A_{\ell}}^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}). (43)

To conclude, by Assumption 2, we get

q(\zeta^{\ell+1,1})+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}). (44)

The sufficient decrease inequality for the remaining blocks of index i\in\{1,\ldots,d\} can be expressed as

q(\zeta^{\ell+1,i+1})+g_{i}(\zeta_{i}^{\ell+1})-g_{i}(\zeta_{i}^{\ell})+\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,i}). (45)

The first term in the LHS of (45) for the i-th block can be similarly bounded from below using the sufficient decrease inequality for the (i+1)-th block, yielding

q(\zeta^{\ell+1,i+2})+g_{i+1}(\zeta_{i+1}^{\ell+1})-g_{i+1}(\zeta_{i+1}^{\ell})+\frac{1}{2\gamma_{i+1}}\|\zeta_{i+1}^{\ell+1}-\zeta_{i+1}^{\ell}\|^{2}+g_{i}(\zeta_{i}^{\ell+1})-g_{i}(\zeta_{i}^{\ell})+\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,i}). (46)

By applying this reasoning recursively from i=1 to i=d, we obtain

q(\zeta^{\ell+1,d+1})+\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell+1})-\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell})+\sum_{i=1}^{d}\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}\leq q(\zeta^{\ell+1,1}), (47)

where we recall that q(\zeta^{\ell+1,d+1})=q(\zeta^{\ell+1}).

Exploiting now (47), we can lower bound the first term in the LHS of (44), which yields

q(\zeta^{\ell+1})+\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell+1})-\sum_{i=1}^{d}g_{i}(\zeta_{i}^{\ell})+\sum_{i=1}^{d}\frac{1}{2\gamma_{i}}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|^{2}+f(\zeta_{0}^{\ell+1})+\frac{1}{2}\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu}\|\zeta_{0}^{\ell+1}-\zeta_{0}^{\ell}\|^{2}\leq q(\zeta^{\ell})+f(\zeta_{0}^{\ell}). (48)

By setting \mu=\min\left\{\left(\frac{1}{\gamma_{0}}-1\right)\underline{\nu},\frac{1}{\gamma_{1}},\dots,\frac{1}{\gamma_{d}}\right\}, we deduce (38).

From (38), it follows that the sequence (\theta(\zeta^{\ell}))_{\ell\in\mathbb{N}} is non-increasing. Since function \theta is assumed to be bounded from below, this sequence converges to some real number \underline{\theta}. We then have, for every integer K,

\sum_{\ell=0}^{K}\|\zeta^{\ell}-\zeta^{\ell+1}\|^{2}\leq\frac{1}{\mu}\sum_{\ell=0}^{K}\left(\theta(\zeta^{\ell})-\theta(\zeta^{\ell+1})\right)=\frac{1}{\mu}(\theta(\zeta^{0})-\theta(\zeta^{K+1}))\leq\frac{1}{\mu}(\theta(\zeta^{0})-\underline{\theta}). (49)

Taking the limit as K\rightarrow+\infty yields the desired summability property. ∎

Lemma 2.

Assume that the sequence (\zeta^{\ell})_{\ell\in\mathbb{N}} generated by Algorithm 1 is bounded. Then, for every \ell\in\mathbb{N}, there exists s^{\ell+1}\in\partial\theta(\zeta^{\ell+1}) such that

\|s^{\ell+1}\|\leq\rho\|\zeta^{\ell+1}-\zeta^{\ell}\|, (50)

where \rho\in(0,+\infty).

Proof.

The assumed boundedness implies that there exists a bounded subset SS of N\mathbb{R}^{N} such that for every i{0,,d}i\in\{0,\dots,d\} and \ell\in\mathbb{N}, ζ+1,iS\zeta^{\ell+1,i}\in S. For every \ell\in\mathbb{N}, we define

s0+1=f(ζ0+1)+0q(ζ+1)s_{0}^{\ell+1}=\nabla f(\zeta_{0}^{\ell+1})+\nabla_{0}q(\zeta^{\ell+1}) (51)

for which the following holds by virtue of Proposition 22

s0+10θ(ζ+1)={0θ(ζ+1)}.s^{\ell+1}_{0}\in\partial_{0}\theta(\zeta^{\ell+1})=\{\nabla_{0}\theta(\zeta^{\ell+1})\}. (52)

Then

s0+1\displaystyle\|s^{\ell+1}_{0}\| f(ζ0+1)f(ζ0)\displaystyle\leq\|\nabla f(\zeta_{0}^{\ell+1})-\nabla f(\zeta_{0}^{\ell})\|
+f(ζ0)+0q(ζ+1,1)\displaystyle+\|\nabla f(\zeta_{0}^{\ell})+\nabla_{0}q(\zeta^{\ell+1,1})\|
+0q(ζ+1)0q(ζ+1,1).\displaystyle+\|\nabla_{0}q(\zeta^{\ell+1})-\nabla_{0}q(\zeta^{\ell+1,1})\|.

From the Lipschitz continuity of f\nabla f and q\nabla q on SS and the inexact optimality inequality for the first block, we conclude that

s0+1(Lf+τ0+Lq)ζ+1ζ.\|s^{\ell+1}_{0}\|\leq\left(L_{f}+\tau_{0}+L_{q}\right)\|\zeta^{\ell+1}-\zeta^{\ell}\|. (53)

In the same spirit, for every i{1,,d}i\in\{1,\dots,d\} we consider ri+1g(ζi+1)r_{i}^{\ell+1}\in\partial g(\zeta_{i}^{\ell+1}) satisfying the inexact optimality inequality with the corresponding τi\tau_{i}. We then define

si+1\displaystyle s_{i}^{\ell+1} =iq(ζ+1)+ri+1÷\displaystyle=\nabla_{i}q(\zeta^{\ell+1})+r_{i}^{\ell+1}\textdiv
iq(ζ+1)+gi(ζi+1)=iθ(ζ+1).\displaystyle\in\nabla_{i}q(\zeta^{\ell+1})+\partial g_{i}(\zeta_{i}^{\ell+1})=\partial_{i}\theta(\zeta^{\ell+1}). (54)

For i=di=d, by virtue of the inexact optimality inequality,

sd+1τdζ+1ζ.\|s_{d}^{\ell+1}\|\leq\tau_{d}\|\zeta^{\ell+1}-\zeta^{\ell}\|. (55)

On the other hand, for i=1,\dots,d-1

si+1\displaystyle\|s_{i}^{\ell+1}\| =iq(ζ+1)+ri+1\displaystyle=\|\nabla_{i}q(\zeta^{\ell+1})+r_{i}^{\ell+1}\|
iq(ζ+1)iq(ζ+1,i+1)\displaystyle\leq\|\nabla_{i}q(\zeta^{\ell+1})-\nabla_{i}q(\zeta^{\ell+1,i+1})\|
+ri+1+iq(ζ+1,i+1)\displaystyle\qquad\qquad+\|r_{i}^{\ell+1}+\nabla_{i}q(\zeta^{\ell+1,i+1})\|
Lqζ+1ζ+τiζi+1ζi,\displaystyle\leq L_{q}\|\zeta^{\ell+1}-\zeta^{\ell}\|+\tau_{i}\|\zeta_{i}^{\ell+1}-\zeta_{i}^{\ell}\|,

where the last estimate stems from the inexact optimality inequality for the i-th block. This yields

si+1(Lq+τi)ζ+1ζ.\|s_{i}^{\ell+1}\|\leq(L_{q}+\tau_{i})\|\zeta^{\ell+1}-\zeta^{\ell}\|. (56)

To conclude, setting

s+1=(s0+1,,sd+1)θ(ζ+1)s^{\ell+1}=(s_{0}^{\ell+1},\dots,s_{d}^{\ell+1})\in\partial\theta(\zeta^{\ell+1}) (57)

and ρ=Lf+i=0dτi+dLq\rho=L_{f}+\sum_{i=0}^{d}\tau_{i}+dL_{q} yields (50). ∎

We now report a first convergence result for a sequence generated by the proposed algorithm, which is reminiscent of [24, Proposition 6]:

Proposition 3 (Properties of the cluster points set).

Suppose that Assumptions 1 and 2 hold. Let (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} be a sequence generated by Algorithm 1. Denote by ω(ζ0)\omega(\zeta^{0}) the (possibly empty) set of its cluster points. Then

  1. i)

    if (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} is bounded, then ω(ζ0)\omega(\zeta^{0}) is a nonempty compact connected set and

    dist(ζ,ω(ζ0))0 as +;\operatorname{dist}(\zeta^{\ell},\omega(\zeta^{0}))\rightarrow 0\quad\text{\;as\;}\quad\ell\rightarrow+\infty;
  2. ii)

    ω(ζ0)critθ\omega(\zeta^{0})\subset\operatorname{crit}\,\theta, where critθ\operatorname{crit}\,\theta is the set of critical points of function θ\theta;

  3. iii)

    θ\theta is finite valued and constant on ω(ζ0)\omega(\zeta^{0}), and it is equal to

    infθ(ζ)=lim+θ(ζ).\inf_{\ell\in\mathbb{N}}\theta(\zeta^{\ell})=\lim_{\ell\rightarrow+\infty}\theta(\zeta^{\ell}).
Proof.

The proof of the above results for the proposed algorithm is essentially identical to that of [24, Proposition 6] for the PAM algorithm. In addition, we highlight that, according to Assumption 1, our objective function \theta is continuous on its domain. ∎

In conclusion, we have proved that, under Assumptions 1-3, a bounded sequence generated by the proposed method satisfies the assumptions in [37, Theorem 2.9]. Consequently, we can state the following result:

Theorem 4.

Let Assumptions 1-3 be satisfied and let (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} be a sequence generated by Algorithm 1 that is assumed to be bounded. Then,

  1. i)

    =1+ζ+1ζ<+\sum_{\ell=1}^{+\infty}\|\zeta^{\ell+1}-\zeta^{\ell}\|<+\infty;

  2. ii)

    (ζ)(\zeta^{\ell})_{\ell\in\mathbb{N}} converges to a critical point ζ\zeta^{*} of θ\theta.

We have thus shown that both the exact and the inexact versions of Algorithm 1 share the same convergence guarantees under Assumptions 1-3. One of the main differences between the two settings, as highlighted in [37], is that the convergence guarantees of the exact version hold for a merely lower semicontinuous objective function, whereas the inexact version requires the objective to be continuous on its domain. However, as will be shown in the next section, this does not represent an obstacle to the use of Algorithm 1 in image processing applications.

4 Application of P-SASL-PAM

4.1 Smoothing of the coupling term

The application of Algorithm 1 to Problem (12) requires the involved functions to fulfil the requirements listed in Assumption 1. This section is devoted to this analysis, by first defining d=2d=2, n0=n1=n2=nn_{0}=n_{1}=n_{2}=n, N=3nN=3n and the following functions, for every x=(xi)1innx=(x_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, p=(pi)1innp=(p_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, and β=(βi)1inn\beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n},

q~(x,p,β)\displaystyle\tilde{q}(x,p,\beta) =i=1n|xi|pieβipi,\displaystyle=\sum_{i=1}^{n}|x_{i}|^{p_{i}}e^{-\beta_{i}p_{i}}, (58)
f(x)\displaystyle f(x) =12σ2yKx22,\displaystyle=\frac{1}{2\sigma^{2}}\|y-Kx\|_{2}^{2}, (59)
g1(p)\displaystyle g_{1}(p) =i=1n(lnΓ(1+1pi)+ι[a,b](pi))\displaystyle=\sum_{i=1}^{n}\left(\ln\Gamma(1+\frac{1}{p_{i}})+\iota_{[a,b]}(p_{i})\right) (60)
+λTV(p),\displaystyle\qquad\qquad+\lambda\operatorname{TV}(p),
g2(β)\displaystyle g_{2}(\beta) =i=1n(βi+(βiμβ)22σβ2)+ζTV(β).\displaystyle=\sum_{i=1}^{n}\left(\beta_{i}+{\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}}\right)+\zeta\operatorname{TV}(\beta). (61)

The first item in Assumption 1 regarding the regularity of the coupling term is not satisfied by (58). To circumvent this difficulty, we introduce the pseudo-Huber loss function [59] depending on a pair of parameters δ=(δ1,δ2)(0,+)2\delta=(\delta_{1},\delta_{2})\in(0,+\infty)^{2} such that δ2<δ1\delta_{2}<\delta_{1}:

(t)Cδ(t)=Hδ1(t)δ2,(\forall t\in\mathbb{R})\quad C_{\delta}(t)=H_{\delta_{1}}(t)-\delta_{2}, (62)

where Hδ1H_{\delta_{1}} is the hyperbolic function defined, for every tt\in\mathbb{R}, by Hδ1(t)=t2+δ12H_{\delta_{1}}(t)=\sqrt{t^{2}+\delta^{2}_{1}}. Function (62) is used as a smooth approximation of the absolute value involved in (58). We then replace (58) with

q(x,p,β)=i=1n(Cδ(xi))pieβipi.q(x,p,\beta)=\sum_{i=1}^{n}\left(C_{\delta}(x_{i})\right)^{p_{i}}e^{-\beta_{i}p_{i}}. (63)

Function CδC_{\delta} is infinitely differentiable. Thus function (63) satisfies Assumption 1.

Function (59) is convex and quadratic, hence it clearly satisfies Assumption 1(ii). Function (60) is a sum of functions that are proper, lower semicontinuous, and either non-negative or bounded from below. The same applies to function (61), which is moreover strongly convex. It follows that (60) and (61) satisfy Assumption 1(iii).

Now, we must show that Θ\Theta is a KŁ function. To do so, let us consider the notion of o-minimal structure [60], which is a particular family 𝒪={𝒪n}n\mathcal{O}=\{\mathcal{O}_{n}\}_{n\in\mathbb{N}} where each 𝒪n\mathcal{O}_{n} is a collection of subsets of n\mathbb{R}^{n}, satisfying a series of axioms (we refer to [24, Definition 13], for a more complete description). We present hereafter the definition of definable set and definable function in an o-minimal structure:

Definition 9 (Definable sets and definable functions).

Given an o-minimal structure \mathcal{O}, a set \mathcal{A}\subset\mathbb{R}^{n} such that \mathcal{A}\in\mathcal{O}_{n} is said to be definable in \mathcal{O}. An extended real valued function f\,:\,\mathbb{R}^{n}\rightarrow(-\infty,+\infty] is said to be definable in \mathcal{O} if its graph is a definable subset of \mathbb{R}^{n}\times\mathbb{R}.

The importance of these concepts in mathematical optimisation is related to the following key result concerning the KŁ property [61]:

Theorem 5.

Any proper lower semicontinuous function f:n(,+]f:\mathbb{R}^{n}\rightarrow(-\infty,+\infty] which is definable in an o-minimal structure 𝒪\mathcal{O} has the KŁ property at each point of domf\operatorname{dom}\,\partial f.

Let us identify a structure in which all the functions involved in the definition of Θ\Theta are definable. This will be sufficient, as definability is a closed property with respect to several operations, including finite sum and composition of functions. Before that, we provide a couple of examples of o-minimal structure. The first is represented by the structure of globally subanalytic sets an\mathbb{R}_{\rm an} [62], which contains all the sets of the form {(u,t)[1,1]n×f(u)=t}\{(u,t)\in[-1,1]^{n}\times\mathbb{R}\mid f(u)=t\} where f:[1,1]nf\,:\,[-1,1]^{n}\rightarrow\mathbb{R} is an analytic function that can be analytically extended on a neighbourhood of [1,1]n[-1,1]^{n}. The second example is the log\log-exp\exp structure (an,exp)(\mathbb{R}_{\rm an},\text{exp}) [60, 63], which includes an\mathbb{R}_{\rm an} and the graph of the exponential function. Even though this second structure is a common setting for many optimisation problems, it does not meet the requirements for ours: as shown in [64], Γ>0\Gamma^{>0} (i.e. , the restriction of the Gamma function to (0,+)(0,+\infty)) is not definable on (an,exp)(\mathbb{R}_{\rm an},\exp). We thus consider the larger structure (𝒢,exp)(\mathbb{R_{\mathcal{G}}},\exp), where Γ>0\Gamma^{>0} has been proved to be definable [65]. 𝒢\mathbb{R_{\mathcal{G}}} is an o-minimal structure that extends an\mathbb{R}_{\rm an} and is generated by the class 𝒢\mathcal{G} of Gevrey functions from [66].

We end this section with the following result, which will be useful subsequently.

Proposition 6.

The function t\mapsto\ln\Gamma(1+\frac{1}{t}) defined on (0,+\infty) is \mu-weakly convex for every \mu>\mu_{0}\approx 0.1136.

Proof.

Let us show that there exists μ>0\mu>0 such that function tlnΓ(1+1t)+μt2/2t\mapsto\ln\Gamma(1+\frac{1}{t})+\mu t^{2}/2 is convex on (0,+)(0,+\infty). The second-order derivative of this function on the positive real axis reads

d2dt2(lnΓ(1+1t)+μ2t2)=1t3(2Dig(1+1t)+1tDig(1+1t)+μt3),\frac{d^{2}}{dt^{2}}\left(\ln\Gamma\left(1+\frac{1}{t}\right)+\frac{\mu}{2}t^{2}\right)=\\ \frac{1}{t^{3}}\left(2\text{Dig}\left(1+\frac{1}{t}\right)+\frac{1}{t}\text{Dig}^{\prime}\left(1+\frac{1}{t}\right)+\mu t^{3}\right), (64)

where the Digamma function \operatorname{Dig} is the logarithmic derivative of the Gamma function. In order to show the convexity of the considered function, we need to ensure that (64) is positive for every t\in(0,+\infty). By virtue of Bohr–Möllerup's theorem [67, Theorem 2.1], among all functions extending the factorial to the positive real numbers, only the Gamma function is log-convex; more precisely, its natural logarithm is (strictly) convex on the positive real axis. This implies that t\mapsto\text{Dig}^{\prime}(t) is positive. It follows that the only sign-changing term in (64) is the function t\mapsto 2\,\text{Dig}\left(1+\frac{1}{t}\right), as t\mapsto\text{Dig}(t) vanishes at a point t_{0}>1 (t_{0}\approx 1.46163), which corresponds to the minimum point of the Gamma function, and therefore also of its natural logarithm [68]. As a consequence, the Digamma function is strictly positive for t\in(t_{0},+\infty), implying that t\mapsto\text{Dig}\left(1+\frac{1}{t}\right) is strictly positive for all t\in(0,\frac{1}{t_{0}-1}). Furthermore, t\mapsto\text{Dig}\left(1+\frac{1}{t}\right) is strictly decreasing and bounded from below, as shown by the negativity of its first derivative

ddtDig(1+1t)=1t2Dig(1+1t)\frac{d}{dt}\text{Dig}\left(1+\frac{1}{t}\right)=-\frac{1}{t^{2}}\text{Dig}^{\prime}\left(1+\frac{1}{t}\right)

and by the following limit

limt+Dig(1+1t)=Dig(1)=\lim_{t\rightarrow+\infty}\text{Dig}\left(1+\frac{1}{t}\right)=\text{Dig}(1)=-\mathcal{E}

where the last equality holds by virtue of the Gauss Digamma theorem, and \mathcal{E} is Euler-Mascheroni’s constant 0.57721\mathcal{E}\approx 0.57721 [69]. In conclusion, for t[1t01,+)t\in[\frac{1}{t_{0}-1},+\infty), we need to ensure that the positive terms in (64) manage to balance the negative contribution of function t2Dig(1+1t)>2t\mapsto 2\text{Dig}\left(1+\frac{1}{t}\right)>-2\mathcal{E}. This leads to a condition on parameter μ>0\mu>0, since we can impose that

0<μt32,0<\mu t^{3}-2\mathcal{E},

where the right-hand side expression has a lower bound μ/(t01)32\mu/(t_{0}-1)^{3}-2\mathcal{E} that is positive when

μ>2(t01)3=μ00.1136.\mu>2{\mathcal{E}}(t_{0}-1)^{3}=\mu_{0}\approx 0.1136.

This shows that the function t\mapsto\ln\Gamma(1+\frac{1}{t}) is \mu-weakly convex for every \mu>\mu_{0}. ∎
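The constants appearing in this proof are easy to check numerically. The following minimal sketch (assuming SciPy is available; all variable names are ours and purely illustrative) locates the zero t_0 of the Digamma function, evaluates \mu_{0}=2\mathcal{E}(t_{0}-1)^{3}, and verifies on a grid that the second derivative (64) stays positive for a value of \mu slightly above \mu_{0}.

import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma, polygamma

# Zero of the Digamma function (minimum point of the Gamma function), t0 ~ 1.46163.
t0 = brentq(digamma, 1.0, 2.0)
euler = -digamma(1.0)                      # Euler-Mascheroni constant, ~0.57721
mu0 = 2.0 * euler * (t0 - 1.0) ** 3        # ~0.1136

# Second derivative (64) of ln Gamma(1 + 1/t) + (mu/2) t^2, evaluated on a grid.
mu = 1.01 * mu0
t = np.linspace(1e-2, 1e3, 200000)
d2 = (2.0 * digamma(1.0 + 1.0 / t) + polygamma(1, 1.0 + 1.0 / t) / t + mu * t ** 3) / t ** 3
print(t0, mu0, bool(np.all(d2 > 0)))       # expected: 1.46163..., 0.1136..., True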

4.2 Proximal computations

Let us now discuss the practical implementation of the proximal computations involved in Algorithm 1. Specifically, as we will show, none of these operators has a closed-form expression, so we need to resort to the inexact version of the algorithm. To ease the description, we summarise in Algorithm 2 the application of Algorithm 1 to the resolution of (12). As pointed out in [70] and in [35], the role of the relative error conditions (16) and (17) is more of theoretical interest than of practical use. In the following, we will illustrate optimisation procedures ensuring that condition (16) is satisfied for every block of variables at every iteration.

Initialize x0x^{0}, p0p^{0} and β0\beta^{0}
Set γ0(0,1)\gamma_{0}\in(0,1), γ1(0,1/μ0)\gamma_{1}\in(0,1/\mu_{0}), γ2>0\gamma_{2}>0
For =0,1,\,\ell=0,1,\ldots
Set A𝒮n{A_{\ell}}\in\mathcal{S}_{n}
Find

x+1\displaystyle x^{\ell+1} proxγ0q(,p,β)A(xγ0A1f(x))\displaystyle\approx\text{prox}^{A}_{\gamma_{0}q{(\cdot,p^{\ell},\beta^{\ell})}}(x^{\ell}-\gamma_{0}A^{-1}\nabla f(x^{\ell})) (65)
 (with Algorithm 3)
p+1\displaystyle p^{{\ell}+1} proxγ1θ(x+1,,β)(p)\displaystyle\approx\text{prox}_{\gamma_{1}\theta{(x^{{\ell}+1},\cdot,\beta^{\ell})}}(p^{{\ell}}) (66)
 (with Algorithm 5)
β+1\displaystyle\beta^{{\ell}+1} proxγ2θ(x+1,p+1,)(β)\displaystyle\approx\text{prox}_{\gamma_{2}\theta{(x^{{\ell}+1},p^{\ell+1},\cdot)}}(\beta^{\ell}) (67)
 (with Algorithm 6)
Algorithm 2 P-SASL-PAM to solve (12)
Proximal computation with respect to xx.

Subproblem (65) in Algorithm 2 requires the computation of the proximity operator of the following separable function

q(,p,β):xi=1n(Cδ(xi))pieβipi,q(\cdot,p^{\ell},\beta^{\ell})\,:\,x\,\mapsto\,\sum_{i=1}^{n}\left(C_{\delta}(x_{i})\right)^{p_{i}^{\ell}}e^{-\beta_{i}^{\ell}p_{i}^{\ell}},

within a weighted Euclidean metric induced by some matrix A𝒮nA\in\mathcal{S}_{n}. We notice that xi(Cδ(xi))pix_{i}\mapsto\left(C_{\delta}(x_{i})\right)^{p_{i}^{\ell}} is non-convex whenever pi(0,1)p^{\ell}_{i}\in(0,1), for some i{1,,n}i\in\{1,\dots,n\}. In order to overcome this issue, we apply a majorisation principle [71]. Let us introduce function σ\sigma defined, for every u[δ1,+)u\in[\delta_{1},+\infty), as σ(u)=(uδ2)p\sigma(u)=(u-\delta_{2})^{p} with p(0,1]p\in(0,1], and vector δ=(δ1,δ2)(0,+)2\delta=(\delta_{1},\delta_{2})\in(0,+\infty)^{2} such that δ2<δ1\delta_{2}<\delta_{1}. Since this function is concave, it can be majorised by its first-order expansion around any point w>δ2w>\delta_{2}:

(u>δ2)(uδ2)p(wδ2)p+p(wδ2)p1(uw)=(1p)(wδ2)p+p(wδ2)p1(uδ2).(\forall u>\delta_{2})\quad(u-\delta_{2})^{p}\leq(w-\delta_{2})^{p}\\ +p(w-\delta_{2})^{p-1}(u-w)\\ =(1-{p})(w-\delta_{2})^{p}+{p}(w-\delta_{2})^{p-1}(u-\delta_{2}). (68)

Setting, for every (t,t)2(t,t^{\prime})\in{\mathbb{R}^{2}}, u=Hδ1(t)δ1u=H_{\delta_{1}}(t)\geq\delta_{1}, w=Hδ1(t)δ1w=H_{\delta_{1}}(t^{\prime})\geq\delta_{1} allows us to deduce the following majorisation:

(Cδ(t))p\displaystyle(C_{\delta}(t))^{p}\leq (1p)(Cδ(t))p+p(Cδ(t))p1Cδ(t).\displaystyle(1-p)(C_{\delta}(t^{\prime}))^{p}+p(C_{\delta}(t^{\prime}))^{p-1}C_{\delta}(t). (69)
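As a quick sanity check of the tangent inequality (69), one can compare the two sides numerically for randomly drawn points; the values chosen below for \delta and p are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
delta1, delta2, p = 1.0, 1e-2, 0.5                  # illustrative values with delta2 < delta1, p in (0,1]

def C(t):                                           # pseudo-Huber loss (62)
    return np.sqrt(t ** 2 + delta1 ** 2) - delta2

t, t_prime = rng.normal(size=10000), rng.normal(size=10000)
lhs = C(t) ** p
rhs = (1.0 - p) * C(t_prime) ** p + p * C(t_prime) ** (p - 1.0) * C(t)
print(bool(np.all(lhs <= rhs + 1e-12)))             # True: the majorisation (69) holds at every sample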

Let us now define ={i{1,,n}|pi1}\mathcal{I}^{\ell}=\{i\in\{1,\dots,n\}\;|\;p^{\ell}_{i}\geq 1\} and 𝒥={1,,n}\mathcal{J}^{\ell}=\{1,\dots,n\}\setminus\mathcal{I}^{\ell}. Given v=(vi)1innv=(v_{i})_{1\leq i\leq n}\in\mathbb{R}^{n}, we deduce from (69) that

(x=(xi)1inn)\displaystyle(\forall x=(x_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})
q(x,p,β)=i(Cδ(xi))pieβipi\displaystyle\quad q(x,p^{\ell},\beta^{\ell})=\sum_{i\in\mathcal{I}^{\ell}}\left(C_{\delta}(x_{i})\right)^{p^{\ell}_{i}}e^{-\beta^{\ell}_{i}p^{\ell}_{i}} (70)
+i𝒥(Cδ(xi))pieβipi\displaystyle\qquad\qquad+\sum_{i\in\mathcal{J}^{\ell}}\left(C_{\delta}(x_{i})\right)^{p^{\ell}_{i}}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}
q¯(x,v,p,β),\displaystyle\leq\overline{q}(x,v,p^{\ell},\beta^{\ell}), (71)

where the resulting majorant function is separable, i.e. 

\overline{q}(x,v,p^{\ell},\beta^{\ell})=\sum_{i=1}^{n}\overline{q}_{i}(x_{i},v_{i},p^{\ell}_{i},\beta^{\ell}_{i}), (72)

with, for every i{1,,n}i\in\{1,\ldots,n\} and xix_{i}\in\mathbb{R},

q¯i(xi,vi,pi,βi)\displaystyle\overline{q}_{i}(x_{i},v_{i},p^{\ell}_{i},\beta^{\ell}_{i}) (73)
\displaystyle=\begin{cases}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}\left(C_{\delta}(x_{i})\right)^{p^{\ell}_{i}},&\mbox{if $p_{i}^{\ell}\geq 1$}\\ e^{-\beta^{\ell}_{i}p^{\ell}_{i}}\Big{(}(C_{\delta}(v_{i}))^{p^{\ell}_{i}}(1-p^{\ell}_{i})\\ \qquad+p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1}C_{\delta}(x_{i})\Big{)}&\mbox{otherwise.}\end{cases}

In a nutshell, each term of index i\in\{1,\dots,n\} in (72) coincides either with the i-th term of q(\cdot,p^{\ell},\beta^{\ell}) when i\in\mathcal{I}^{\ell}, or with a convex majorant of this i-th term, tangent at v_{i}, when i\in\mathcal{J}^{\ell}. We thus propose to adopt an MM procedure that builds a sequence of convex surrogate problems for the non-convex minimisation problem involved in the computation of \text{prox}_{\gamma_{0}q(\cdot,p^{\ell},\beta^{\ell})}^{A}. At the \kappa-th iteration of this procedure, following the MM principle, the next iterate x^{\kappa+1} is determined by setting v=x^{\kappa}. We summarise the strategy in Algorithm 3.

Initialize x0nx^{0}\in\mathbb{R}^{n}
For κ=0,1,\kappa=0,1,\dots until convergence

xκ+1\displaystyle x^{\kappa+1} =proxγ0q¯(,xκ,p,β)A(x+)\displaystyle=\text{prox}_{\gamma_{0}\overline{q}(\cdot,x^{\kappa},p^{\ell},\beta^{\ell})}^{A}(x^{+}) (74)
 (with Algorithm 4)
Algorithm 3 MM algorithm to approximate proxγ0q(,p,β)A(x+)\text{prox}_{\gamma_{0}q(\cdot,p^{\ell},\beta^{\ell})}^{A}(x^{+}) with x+nx^{+}\in\mathbb{R}^{n}

Since function q¯(,v,p,β)\overline{q}(\cdot,v,p^{\ell},\beta^{\ell}) is convex, proper, and lower semicontinuous, its proximity operator in the weighted Euclidean metric induced by matrix AA is guaranteed to be uniquely defined. It can be computed efficiently using the Dual Forward-Backward (DFB) method [72], outlined in Algorithm 4.

Initialize dual variable w0nw^{0}\in\mathbb{R}^{n}
Set η(0,2|A|1)\eta\in(0,2|||A|||^{-1})
For κ=0,1,\kappa^{\prime}=0,1,\dots until convergence

uκ\displaystyle u^{\kappa^{\prime}} =x+Awκ,\displaystyle=x^{+}-Aw^{\kappa^{\prime}}, (75)
wκ+1\displaystyle w^{\kappa^{\prime}+1} =wκ+ηuκ\displaystyle=w^{\kappa^{\prime}}+\eta u^{\kappa^{\prime}} (76)
ηproxη1γ0q¯(,v,p,β)(η1wκ+uκ).\displaystyle\;\;-\eta\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}(\cdot,v,p^{\ell},\beta^{\ell})}(\eta^{-1}w^{\kappa^{\prime}}+u^{\kappa^{\prime}}).

Return uκnu^{\kappa^{\prime}}\in\mathbb{R}^{n}

Algorithm 4 DFB algorithm to compute proxγ0q¯(,v;p,β)A(x+)\text{prox}_{\gamma_{0}\overline{q}(\cdot,v;p^{\ell},\beta^{\ell})}^{A}(x^{+}) with x+nx^{+}\in\mathbb{R}^{n}

The update (76) in Algorithm 4 can be performed componentwise since function \overline{q}(\cdot,v,p^{\ell},\beta^{\ell}) is separable. Thanks to this separability property, computing \text{prox}_{\eta^{-1}\gamma_{0}\overline{q}(\cdot,v,p^{\ell},\beta^{\ell})} boils down to solving n one-dimensional optimisation problems, that is

(u+\displaystyle(\forall u^{+} =(ui+)1inn)\displaystyle=(u^{+}_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})
proxη1γ0q¯(,v,p,β)(u+)\displaystyle\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}(\cdot,v,p^{\ell},\beta^{\ell})}(u^{+})
=(proxη1γ0q¯i(,vi,pi,βi)(ui+))1in.\displaystyle=\left(\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}_{i}(\cdot,v_{i},p^{\ell}_{i},\beta^{\ell}_{i})}(u^{+}_{i})\right)_{1\leq i\leq n}. (77)

More precisely,

  • for every i{1,,n}i\in\{1,\ldots,n\}, such that pi1p^{\ell}_{i}\leq 1,

    proxη1γ0q¯i(,vi,pi,βi)(ui+)\displaystyle\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}_{i}(\cdot,v_{i},p^{\ell}_{i},\beta^{\ell}_{i})}(u^{+}_{i})
\displaystyle=\text{prox}_{\eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1}C_{\delta}}(u^{+}_{i})
    =proxη1γ0eβipipi(Cδ(vi))pi1Hδ1(ui+).\displaystyle=\text{prox}_{\eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1}H_{\delta_{1}}}(u^{+}_{i}). (78)

The proximity operator of this scaled version of function H_{\delta_{1}} can be determined by solving a quartic polynomial equation (see http://proximity-operator.net/scalarfunctions.html).

  • For every i{1,,n}i\in\{1,\ldots,n\} such that pi>1p^{\ell}_{i}>1,

    proxη1γ0q¯i(,vi,pi,βi)(ui+)=proxη1γ0eβipi(Cδ)pi(ui+).\text{prox}_{\eta^{-1}\gamma_{0}\overline{q}_{i}(\cdot,v_{i},p^{\ell}_{i},\beta^{\ell}_{i})}(u^{+}_{i})\\ =\text{prox}_{\eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}}\left(C_{\delta}\right)^{p^{\ell}_{i}}}(u^{+}_{i}). (79)

The latter quantity can be evaluated through a bisection search finding the root of the derivative of the involved proximally regularised function; a minimal numerical sketch is given after this list.
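As an illustration of this one-dimensional search, the sketch below computes prox_{c(C_{\delta})^{p}}(u) for p\geq 1 and a positive scaling c (standing for the factor \eta^{-1}\gamma_{0}e^{-\beta^{\ell}_{i}p^{\ell}_{i}} of the text) by bracketing the root of the derivative of the proximal objective. It relies on SciPy's brentq routine, a bracketing root-finder refining bisection, and is only a sketch of one possible implementation, not the one used in the paper.

import numpy as np
from scipy.optimize import brentq

def prox_c_Cdelta_pow(u, c, p, delta1, delta2):
    # Sketch of prox_{c (C_delta(.))^p}(u) for p >= 1, c > 0, via root-finding on the
    # derivative of the (strongly convex) proximal objective.
    def dphi(t):
        H = np.sqrt(t ** 2 + delta1 ** 2)
        return c * p * (H - delta2) ** (p - 1.0) * t / H + (t - u)
    if u == 0.0:
        return 0.0                                   # the derivative vanishes at 0 when u = 0
    lo, hi = (0.0, u) if u > 0 else (u, 0.0)         # the minimiser is a shrinkage of u towards 0
    return brentq(dphi, lo, hi)

print(prox_c_Cdelta_pow(2.0, c=0.5, p=1.3, delta1=1.0, delta2=1e-2))

The same routine also covers the first bullet above: for p^{\ell}_{i}\leq 1 the surrogate reduces to a constant plus a scaled C_{\delta}, so it suffices to call it with p=1 and the scaling e^{-\beta^{\ell}_{i}p^{\ell}_{i}}p^{\ell}_{i}(C_{\delta}(v_{i}))^{p^{\ell}_{i}-1} absorbed into c.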

Remark 6.

Due to the non-convexity of q(,p,β)q(\cdot,p^{\ell},\beta^{\ell}), there is no guarantee that the point estimated by Algorithm 4 coincides with the exact proximity point. However, we did not notice any numerical issues in our implementation.

Proximal computation with respect to pp.

Subproblem (66) requires to compute the proximity operator of γ1(q(x+1,,β)+g)\gamma_{1}\big{(}q(x^{\ell+1},\cdot,\beta^{\ell})+g\big{)}, which is equivalent to solving the following minimization problem

minimizep[a,b]nψ(p)+λ1,2(Dp),\underset{\begin{subarray}{c}{p\in{[a,b]^{n}}}\end{subarray}}{\mathrm{minimize}}\;\;\psi^{\ell}(p)+\lambda\ell_{1,2}(Dp), (80)

where, for every pnp\in\mathbb{R}^{n}, ψ(p)=i=1nψi(pi)\psi^{\ell}(p)=\sum_{i=1}^{n}\psi_{i}^{\ell}(p_{i}) with

(i{1,,n})(pi)ψi(pi)={(Cδ(xi+1))pieβipi+lnΓ(1+1pi)+12γ1(pipi)2if pi>0+otherwise.(\forall i\in\{1,\ldots,n\})(\forall p_{i}\in\mathbb{R})\\ \psi_{i}^{\ell}(p_{i})=\begin{cases}\left(C_{\delta}(x_{i}^{\ell+1})\right)^{p_{i}}e^{-\beta_{i}^{\ell}p_{i}}+\ln\Gamma(1+\frac{1}{p_{i}})\\ \quad+\frac{1}{2\gamma_{1}}(p_{i}-p_{i}^{\ell})^{2}\quad\mbox{if $p_{i}>0$}\\ +\infty\qquad\qquad\qquad\;\;\mbox{otherwise.}\end{cases} (81)

Moreover, D=[Dh,Dv]D=[D_{\rm h},D_{\rm v}] where (Dh,Dv)(n×n)2(D_{\rm h},D_{\rm v})\in(\mathbb{R}^{n\times n})^{2} are the discrete horizontal and vertical 2D gradient operators, and the 1,2\ell_{1,2}-norm is defined as

(pn)1,2(Dp)=i=1n([Dhp]i,[Dvp]i)2.(\forall p\in\mathbb{R}^{n})\quad\ell_{1,2}(Dp)=\sum_{i=1}^{n}\|([D_{\rm h}p]_{i},[D_{\rm v}p]_{i})\|_{2}.

Problem (80) is equivalent to minimizing the sum of the indicator function of a hypercube, a separable component and a non-separable term involving the linear operator DD. According to Proposition 6, we can ensure the convexity of each term (ψi)1in(\psi_{i}^{\ell})_{1\leq i\leq n} by setting γ1<1μ08.805\gamma_{1}<\frac{1}{\mu_{0}}\approx 8.805. In order to solve (80), it is then possible to implement a Primal-Dual (PD) algorithm [73, 74, 75] as outlined in Algorithm 5.

Initialise the dual variables v10n×2v^{0}_{1}\in\mathbb{R}^{n\times 2},v20nv^{0}_{2}\in\mathbb{R}^{n}.
Set τ>0\tau>0 and σ>0\sigma>0 such that τσ(|D|2+1)<1\tau\sigma(|||D|||^{2}+1)<1.
for κ=0,1,\kappa=0,1,\dots until convergence

uκ\displaystyle u^{\kappa} =pκτ(Dv1κ+v2κ),\displaystyle=p^{\kappa}-\tau(D^{*}v^{\kappa}_{1}+v^{\kappa}_{2}), (82)
pκ+1\displaystyle{p}^{\kappa+1} =proj[a,b]n(uκ),\displaystyle=\text{proj}_{[a,b]^{n}}(u^{\kappa}), (83)
w1κ\displaystyle w^{\kappa}_{1} =v1κ+σD(2pκ+1pκ),\displaystyle=v^{\kappa}_{1}+\sigma D(2{p}^{\kappa+1}-{p}^{\kappa}), (84)
\displaystyle v^{\kappa+1}_{1}=w^{\kappa}_{1}-\sigma\text{prox}_{\frac{\lambda\ell_{1,2}}{\sigma}}(\frac{w^{\kappa}_{1}}{\sigma}), (85)
w2κ\displaystyle w^{\kappa}_{2} =v2κ+σ(2pκ+1pκ),\displaystyle=v^{\kappa}_{2}+\sigma(2{p}^{\kappa+1}-{p}^{\kappa}), (86)
\displaystyle v^{\kappa+1}_{2}=w^{\kappa}_{2}-\sigma\text{prox}_{\frac{\psi^{\ell}}{\sigma}}(\frac{w^{\kappa}_{2}}{\sigma}). (87)

Return pκ+1[a,b]np^{\kappa+1}\in[a,b]^{n}

Algorithm 5 Primal Dual Algorithm for solving (80)

The proximity operator of the involved 1,2\ell_{1,2} norm has a closed-form expression. For every w1=([w1]i,1,[w1]i,2)1inn×2w_{1}=([w_{1}]_{i,1},[w_{1}]_{i,2})_{1\leq i\leq n}\in\mathbb{R}^{n\times 2} and λ>0\lambda>0, we have

proxλ1,2(w1)\displaystyle\text{prox}_{\lambda\ell_{1,2}}(w_{1})
=(proxλ2(([w1]i,1,[w1]i,2)))1in\displaystyle\quad=\left(\text{prox}_{\lambda\|\cdot\|_{2}}\Big{(}([w_{1}]_{i,1},[w_{1}]_{i,2})\Big{)}\right)_{1\leq i\leq n}
=(([w1]i,1,[w1]i,2)\displaystyle=\Bigg{(}([w_{1}]_{i,1},[w_{1}]_{i,2})
λ([w1]i,1,[w1]i,2)max{λ,([w1]i,1,[w1]i,2)2})1in.\displaystyle\qquad-\frac{\lambda([w_{1}]_{i,1},[w_{1}]_{i,2})}{\max\{\lambda,\|([w_{1}]_{i,1},[w_{1}]_{i,2})\|_{2}\}}\Bigg{)}_{1\leq i\leq n}.
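For completeness, this block soft-thresholding can be written in a few lines; the sketch below (illustrative only, with NumPy) applies it row-wise to a dual variable of shape n x 2.

import numpy as np

def prox_l12(w, lam):
    # prox of lam * sum_i ||([w]_{i,1}, [w]_{i,2})||_2, applied row-wise to w of shape (n, 2)
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w - lam * w / np.maximum(lam, norms)

w = np.array([[3.0, 4.0], [0.1, 0.2]])
print(prox_l12(w, lam=1.0))                 # rows with norm <= lam are set to zero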

The proximal point at {w_{2}^{\kappa}}/{\sigma}=\left({[w_{2}^{\kappa}]_{i}}/{\sigma}\right)_{1\leq i\leq n}\in\mathbb{R}^{n} of the separable term \psi^{\ell} with respect to a step size 1/\sigma can be found by minimizing, for every i\in\{1,\ldots,n\}, the following smooth function

(t(0,+))𝗀1,i(t)=ψi(t)+σ2(t[w2κ]iσ)2.(\forall t\in(0,+\infty))\quad\mathsf{g}_{1,i}(t)=\psi_{i}^{\ell}(t)+\frac{\sigma}{2}\Big{(}t-\frac{[w_{2}^{\kappa}]_{i}}{\sigma}\Big{)}^{2}.

The update in (87) then reads

v_{2}^{\kappa+1}=\left([w_{2}^{\kappa}]_{i}-\sigma[w_{2}^{\kappa}]_{i}^{*}\right)_{1\leq i\leq n}

where, for every i{1,,n}i\in\{1,\dots,n\}, [w2κ]i[w_{2}^{\kappa}]_{i}^{*} corresponds to the unique zero of the derivative of 𝗀1,i\mathsf{g}_{1,i}. This zero is found by applying Newton’s method initialised with

\bar{w}=\left(\max\Big{\{}10^{-3},\frac{[w_{2}^{\kappa}]_{i}}{\sigma}\Big{\}}\right)_{1\leq i\leq n}.
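A possible componentwise implementation of this Newton search is sketched below (illustrative only; the function and variable names are ours). It uses the first and second derivatives of \mathsf{g}_{1,i}, the latter obtained from (64), and the initialisation \max\{10^{-3},[w_{2}^{\kappa}]_{i}/\sigma\} given above; c_val stands for C_{\delta}(x_{i}^{\ell+1}), beta_val for \beta_{i}^{\ell}, and p_prev for p_{i}^{\ell}.

import numpy as np
from scipy.special import digamma, polygamma

def newton_prox_psi(w2_over_sigma, sigma, c_val, beta_val, p_prev, gamma1,
                    n_iter=50, tol=1e-10):
    # Zero of the derivative of g_{1,i}(t) = c^t e^{-beta t} + ln Gamma(1 + 1/t)
    #   + (t - p_prev)^2 / (2 gamma1) + (sigma/2) (t - w2/sigma)^2, for t > 0.
    a = np.log(c_val) - beta_val                     # c^t e^{-beta t} = exp(a t)
    t = max(1e-3, w2_over_sigma)                     # initialisation used in the text
    for _ in range(n_iter):
        e = np.exp(a * t)
        g1 = a * e - digamma(1.0 + 1.0 / t) / t ** 2 + (t - p_prev) / gamma1 \
             + sigma * t - sigma * w2_over_sigma
        g2 = a ** 2 * e + (2.0 * digamma(1.0 + 1.0 / t) + polygamma(1, 1.0 + 1.0 / t) / t) / t ** 3 \
             + 1.0 / gamma1 + sigma
        step = g1 / g2
        t = max(t - step, 1e-8)                      # keep the iterate in the domain t > 0
        if abs(step) < tol:
            break
    return t

print(newton_prox_psi(w2_over_sigma=1.0, sigma=0.05, c_val=1.8, beta_val=0.5, p_prev=1.0, gamma1=1.0))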
Proximal computation with respect to β\beta.

Subproblem (67) requires the solution of the following minimisation problem:

minimizeβnφ(β)+ζ1,2(Dβ)\underset{\begin{subarray}{c}{\beta\in\mathbb{R}^{n}}\end{subarray}}{\mathrm{minimize}}\;\;\varphi^{\ell}(\beta)+\zeta\ell_{1,2}(D\beta) (88)

where DD and 1,2\ell_{1,2} have been defined previously and

(β=(βi)1inn)φ(β)=i=1nφi(βi)(\forall\beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})\quad\varphi^{\ell}(\beta)=\sum_{i=1}^{n}\varphi_{i}^{\ell}(\beta_{i})

with, for every i{1,,n}i\in\{1,\ldots,n\},

φi(βi)=(Cδ(xi+1))pi+1eβipi+1+βi+(βiμβ)22σβ2+12γ2(βiβi)2\varphi_{i}^{\ell}(\beta_{i})=\left(C_{\delta}(x_{i}^{\ell+1})\right)^{p_{i}^{\ell+1}}e^{-\beta_{i}p_{i}^{\ell+1}}+\beta_{i}\\ +{\frac{(\beta_{i}-\mu_{\beta})^{2}}{2\sigma_{\beta}^{2}}}+\frac{1}{2\gamma_{2}}(\beta_{i}-\beta_{i}^{\ell})^{2} (89)

The above problem shares a structure similar to the one studied in the previous case since the objective function is the sum of the smooth convex term φ\varphi^{\ell} and the non-smooth convex one ζTV=ζ1,2(D)\zeta\operatorname{TV}=\zeta\ell_{1,2}(D\cdot), and it can be solved by the primal-dual procedure outlined in Algorithm 6.

Set τ>0\tau>0 and σ>0\sigma>0 such that τσ|D|21\tau\sigma|||D|||^{2}\leq 1.
Initialise the dual variable v0n×2v^{0}\in\mathbb{R}^{n\times 2}.
for κ=0,1,\kappa=0,1,\dots until convergence

uκ\displaystyle u^{\kappa} =βκτ(Dvκ),\displaystyle=\beta^{\kappa}-\tau(D^{*}v^{\kappa}), (90)
βκ+1\displaystyle{\beta}^{\kappa+1} =proxτφ(uκ),\displaystyle=\text{prox}_{\tau\varphi^{\ell}}(u^{\kappa}), (91)
wκ\displaystyle w^{\kappa} =vκ+σD(2βκ+1βκ),\displaystyle=v^{\kappa}+\sigma D(2{\beta}^{\kappa+1}-{\beta}^{\kappa}), (92)
\displaystyle v^{\kappa+1}=w^{\kappa}-\sigma\text{prox}_{\frac{\zeta\ell_{1,2}}{\sigma}}(\frac{w^{\kappa}}{\sigma}). (93)

Return βκ+1n\beta^{\kappa+1}\in\mathbb{R}^{n}

Algorithm 6 Primal Dual Algorithm for minimizing (88)

At each iteration κ\kappa of Algorithm 6, the proximity operator of φ\varphi^{\ell} is expressed as

(β=(βi)1inn)proxτφ(β)=(proxτφi(βi))1in.(\forall\beta=(\beta_{i})_{1\leq i\leq n}\in\mathbb{R}^{n})\\ \text{prox}_{\tau\varphi^{\ell}}(\beta)=\left(\text{prox}_{\tau\varphi_{i}^{\ell}}(\beta_{i})\right)_{1\leq i\leq n}. (94)

For every i{1,,n}i\in\{1,\ldots,n\}, proxτφi(βi)\text{prox}_{\tau\varphi_{i}^{\ell}}(\beta_{i}) is the minimizer of function

(βi)𝗀2,i(βi)=φi(βi)+12τ(βiuiκ)2.\displaystyle(\forall\beta_{i}\in\mathbb{R})\quad\mathsf{g}_{2,i}(\beta_{i})=\varphi_{i}^{\ell}(\beta_{i})+\frac{1}{2\tau}(\beta_{i}-u_{i}^{\kappa})^{2}. (95)

The nonlinear equation defining the unique zero of the derivative of 𝗀2,i\mathsf{g}_{2,i} admits a closed-form solution that involves the Lambert WW-function [76]. Indeed, let us introduce the following notation:

a1,i\displaystyle a_{1,i} =pi+1(Cδ(xi+1))pi+1,\displaystyle=p_{i}^{\ell+1}\left(C_{\delta}(x_{i}^{\ell+1})\right)^{p_{i}^{\ell+1}}, (96)
a2\displaystyle a_{2} =(1σβ2+1γ2+1τ)1,\displaystyle=\left(\frac{1}{\sigma^{2}_{\beta}}+\frac{1}{\gamma_{2}}+\frac{1}{\tau}\right)^{-1}, (97)
a3,i\displaystyle a_{3,i} =1μβσβ2βiγ2uiκτ.\displaystyle=1{-\frac{\mu_{\beta}}{\sigma^{2}_{\beta}}}-\frac{\beta_{i}^{\ell}}{\gamma_{2}}-\frac{u_{i}^{\kappa}}{\tau}. (98)

Then

𝗀2,i(βi)=0\displaystyle\mathsf{g}_{2,i}^{\prime}(\beta_{i})=0
a1,iexp(pi+1βi)+βia2+a3,i=0\displaystyle\iff-a_{1,i}\exp(-p_{i}^{\ell+1}\beta_{i})+\frac{\beta_{i}}{a_{2}}+a_{3,i}=0
pi+1(βi+a2a3,i)exp(pi+1(βi+a2a3,i))\displaystyle\iff p_{i}^{\ell+1}(\beta_{i}+a_{2}a_{3,i})\exp(p_{i}^{\ell+1}(\beta_{i}+a_{2}a_{3,i}))
=pi+1a1,ia2exp(pi+1a2a3,i)\displaystyle\qquad\qquad=p_{i}^{\ell+1}a_{1,i}a_{2}\exp(p_{i}^{\ell+1}a_{2}a_{3,i})
βi\displaystyle\iff\beta_{i}
=1pi+1W(pi+1a1,ia2exp(pi+1a2a3,i))a2a3,i,\displaystyle=\frac{1}{p_{i}^{\ell+1}}W(p_{i}^{\ell+1}a_{1,i}a_{2}\exp(p_{i}^{\ell+1}a_{2}a_{3,i}))-a_{2}a_{3,i}, (99)

where the last equivalence comes from the fact that the principal branch of the Lambert W-function is single-valued and satisfies the following identity for every pair (X,Y)\in[-1,+\infty)\times\left[-\frac{1}{e},+\infty\right):

Xexp(X)=YX=W(Y).X\exp(X)=Y\iff X=W(Y). (100)

Notice that the expression in (99) is well defined since the argument of the Lambert function is always positive.

In conclusion, the update in (91) reads as βκ+1=(βiκ+1)1in\beta^{\kappa+1}=\left(\beta_{i}^{\kappa+1}\right)_{1\leq i\leq n} where each component of this vector is calculated according to (99).
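For illustration, the closed-form update (99) can be implemented componentwise with the principal branch of the Lambert W-function available in SciPy; the sketch below (names and test values are ours) follows (96)-(99) literally.

import numpy as np
from scipy.special import lambertw

def beta_update(u, beta_prev, x_val, p_val, mu_beta, sigma_beta, gamma2, tau,
                delta1=1.0, delta2=1e-2):
    C = np.sqrt(x_val ** 2 + delta1 ** 2) - delta2                       # C_delta(x_i^{l+1})
    a1 = p_val * C ** p_val                                              # (96)
    a2 = 1.0 / (1.0 / sigma_beta ** 2 + 1.0 / gamma2 + 1.0 / tau)        # (97)
    a3 = 1.0 - mu_beta / sigma_beta ** 2 - beta_prev / gamma2 - u / tau  # (98)
    arg = p_val * a1 * a2 * np.exp(p_val * a2 * a3)                      # always positive
    return np.real(lambertw(arg)) / p_val - a2 * a3                      # (99), principal branch

print(beta_update(u=0.2, beta_prev=0.0, x_val=1.5, p_val=1.2,
                  mu_beta=0.0, sigma_beta=1.0, gamma2=1.0, tau=0.5))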

5 Numerical Experiments

In this section, we illustrate the performance of our approach on a problem of joint deblurring/segmentation of realistically simulated ultrasound images. We consider images with two regions (Simu1) and three regions (Simu2) extracted from [17]. Both images have dimension 256×256256\times 256 pixels. The shape parameters pp and the reparameterised scale parameters β\beta are set in each region following the choices for pp and α\alpha in [17], itself based on the experimental setting in [19]. This strategy allows us to have a reference configuration for β\beta, which led us to choose a non-necessarily zero-mean Gaussian distribution as a prior for this parameter. In our experiments, we will treat μβ\mu_{\beta} as an unknown parameter, along with the regularisation parameters for the Total Variation priors. The pixel values in each region of the original image xnx\in\mathbb{R}^{n} are obtained as a realisation of a random variable following a 𝒢𝒢𝒟\mathcal{GGD} with the corresponding shape and scale parameters pp and α\alpha. We define KK as the linear operator modelling the convolution with the point spread function of a 3.5 MHz linear probe obtained with the Field II ultrasound simulator [77]. To reproduce the same setting as in [17], we obtain the observed degraded images yny\in\mathbb{R}^{n} from the original images xnx\in\mathbb{R}^{n} by applying the observation model (1), where we set the additive noise variance (which will be assumed to be known) to σ2=0.013\sigma^{2}=0.013 for Simu1 and σ2=33\sigma^{2}=33 for Simu2. For the preconditioner, we consider a regularised version of the inverse of the Hessian of the data fidelity function in (59), given by

A=σ2(KK+μ𝕀m)1A=\sigma^{2}(K^{\top}K+\mu\mathbb{I}_{m})^{-1}

where μ=0.1\mu=0.1, so that AA is well defined and constant throughout the iterations. Following the procedure outlined in [17], we initialise x0n{x}^{0}\in\mathbb{R}^{n} using a pre-deconvolved image obtained with a Wiener filter applied to the observed data yy, (pi0)1in({p}^{0}_{i})_{1\leq i\leq n} is drawn from an i.i.d. uniform distribution in the range [0.5,1.5][0.5,1.5], while (βi0)1in({\beta}^{0}_{i})_{1\leq i\leq n} is drawn from an i.i.d. Gaussian distribution with mean μβ\mu_{\beta} and unit standard deviation. We set μβ=0\mu_{\beta}=0 for Simu 1 and μβ=4\mu_{\beta}=4 for Simu 2, for arguments discussed in SM 2. We adopt the recovery strategy described in Section 4 and describe hereafter the setting of the model/algorithm hyperparameters.
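Before discussing those hyperparameters, note that the preconditioner A above can be applied efficiently. The paper does not detail how this is done; one convenient option, sketched below under the additional assumption of periodic boundary conditions for the convolution operator K, is to diagonalise K^{\top}K with the 2D discrete Fourier transform so that A can be applied in O(n log n) operations. All names and the toy PSF are illustrative.

import numpy as np

def make_preconditioner(psf, shape, sigma2, mu=0.1):
    # A = sigma^2 (K^T K + mu I)^{-1}, with K a convolution assumed periodic,
    # so that K^T K has eigenvalues |k_hat|^2 in the Fourier domain.
    k_hat = np.fft.fft2(psf, s=shape)
    denom = np.abs(k_hat) ** 2 + mu
    def apply_A(v):
        return np.real(np.fft.ifft2(sigma2 * np.fft.fft2(v) / denom))
    return apply_A

A = make_preconditioner(psf=np.ones((5, 5)) / 25.0, shape=(256, 256), sigma2=0.013)
out = A(np.random.default_rng(0).normal(size=(256, 256)))   # applies the preconditioner to a test image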

The model parameters that need to be tuned are the δ1>0\delta_{1}>0 and δ2>0\delta_{2}>0 values for the pseudo-Huber function, the mean μβ\mu_{\beta}\in\mathbb{R} and the standard deviation σβ>0\sigma_{\beta}>0 for the reparameterised scale parameter, and finally the regularisation parameters (λ,ζ)(0,+)2(\lambda,\zeta)\in(0,+\infty)^{2} for the TV terms. For parameter δ=(δ1,δ2)\delta=(\delta_{1},\delta_{2}), we applied the following choice, resulting from a rough empirical search,: δ1=1\delta_{1}=1 while δ2=δ1×102\delta_{2}=\delta_{1}\times 10^{-2}. For what concerns the Gaussian parameters of the reparameterised scale variable (μβ,σβ)(\mu_{\beta},\sigma_{\beta}), the mean μβ\mu_{\beta} is the most influential on the estimated solution, so we dedicated an in-depth analysis for its choice in combination with the TV regularisation parameters (λ,ζ)(\lambda,\zeta). More precisely, we tested different values of μβ\mu_{\beta} in the range [10,10][-10,10] in combination with a grid search for (λ,ζ){102,101,1,10,102,103}2(\lambda,\zeta)\in\{10^{-2},10^{-1},1,10,10^{2},10^{3}\}^{2} with respect to different quality metrics and identified an optimal choice for μβ\mu_{\beta}. The standard deviation appeared less influential and is set to σβ=1\sigma_{\beta}=1 in all our experiments. The details of the analysis are illustrated in the annexed SM 2.

The algorithmic hyperparameters include the step sizes of the proximal steps, as well as the preconditioning matrix involved in the preconditioned proximal gradient step. We set (γ0,γ1,γ2)=(0.99,1,1)(\gamma_{0},\gamma_{1},\gamma_{2})=(0.99,1,1) in order to meet the convergence assumptions in Algorithm 2. In particular, the choice for γ0\gamma_{0} approximates the highest value allowed for the step size of the preconditioned inexact FB scheme in (65), while γ1\gamma_{1} satisfies the condition γ1<8.805\gamma_{1}<8.805 for the convexity of the function in (80).

In order to obtain the labelling of a segmented image from our estimated shape parameter (denoted by p^\widehat{p}) we use a quantisation procedure based on Matlab functions multithresh and imquantize. The former defines a desired number of quantisation levels using Otsu’s method, while the latter performs a truncation of the data values according to the provided quantisation levels. We remark here that the number of labels does not need to be defined throughout the proposed optimisation procedure, but only at the final segmentation step. This step can thus be considered as a post-processing that is performed on the estimated solution.
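The post-processing itself relies on Matlab's multithresh and imquantize; a rough Python analogue (assuming scikit-image is available, and given here only as an illustration) combines multi-Otsu thresholding with a digitisation step.

import numpy as np
from skimage.filters import threshold_multiotsu   # analogue of Matlab's multithresh

def segment_shape_map(p_hat, n_labels=2):
    # Otsu-style thresholds on the estimated shape map, then label assignment (analogue of imquantize).
    thresholds = threshold_multiotsu(p_hat, classes=n_labels)
    return np.digitize(p_hat, bins=thresholds)

p_hat = np.random.default_rng(0).uniform(0.5, 1.5, size=(256, 256))   # toy shape map
labels = segment_shape_map(p_hat, n_labels=2)                          # values in {0, ..., n_labels-1}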

In order to evaluate the quality of the solution, we consider the following metrics: for the estimated image, we make use of the peak signal-to-noise ratio (PSNR) defined as follows, xx being the original signal and x^\widehat{x} the estimated one:

\text{PSNR}=10\log_{10}\big{(}n\,\max_{i\in\{1,\dots,n\}}\{x_{i},\hat{x}_{i}\}^{2}/\|x-\hat{x}\|^{2}\big{)},

and of the structural similarity measure (SSIM) [78]. For the segmentation task, we compute the overall accuracy (OA), i.e., the percentage of correctly predicted labels.
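For reference, a direct transcription of the PSNR definition above (with the peak taken over both the original and the estimated signal, as written) reads as follows; this is only our own small helper, not code from the paper.

import numpy as np

def psnr(x, x_hat):
    peak = max(np.max(x), np.max(x_hat))                       # max over {x_i, x_hat_i}
    return 10.0 * np.log10(x.size * peak ** 2 / np.sum((x - x_hat) ** 2))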

The stopping criteria for Algorithm 2 outer and inner loops are set by defining a threshold level on the relative change between two consecutive iterates of the involved variables, the relative change of the objective values of two consecutive iterates and a maximum number of iterations. The outer loop in Algorithm 2 stops whenever =10000\ell=10000 or when both ζ+1ζ/ζ<104\|\zeta^{\ell+1}-\zeta^{\ell}\|/\|\zeta^{\ell}\|<10^{-4} and |θ(ζ+1)θ(ζ)|/|θ(ζ)|<104|\theta(\zeta^{\ell+1})-\theta(\zeta^{\ell})|/|\theta(\zeta^{\ell})|<10^{-4}. The MM procedure to compute x+1x^{\ell+1} in Algorithm 3 is stopped after 300 iterations or when xκ+1xκ/xκ<103\|x^{\kappa+1}-x^{\kappa}\|/\|x^{\kappa}\|<10^{-3}. The DFB procedure in Algorithm 4 to compute uκ+1u^{\kappa+1} is stopped after 300 iterations or when uκ+1uκ/uκ<103\|u^{\kappa+1}-u^{\kappa}\|/\|u^{\kappa}\|<10^{-3}. The PD procedure in Algorithm 5, and Algorithm 6 computing p+1p^{\ell+1} (resp. β+1\beta^{\ell+1}) terminates after 200 iteration or when pκ+1pκ/pκ<103\|p^{\kappa+1}-p^{\kappa}\|/\|p^{\kappa}\|<10^{-3} (resp. βκ+1βκ/βκ<103\|\beta^{\kappa+1}-\beta^{\kappa}\|/\|\beta^{\kappa}\|<10^{-3}).

Figure 2 illustrates in its first and second lines the B–mode image of the original x, of the degraded y, and of the reconstructed image \hat{x} for both examples. The B–mode image is the most common representation of an ultrasound image, displaying the acoustic impedance of a 2-dimensional cross section of the considered tissue. The reconstructed results in Figure 2 (right) show clearly reduced blur and sharper region contours. We then report in the third and fourth lines of Figure 2 the estimated shape parameter and the segmentation obtained via the aforementioned quantisation procedure, which confirms its good performance. We notice that our estimated \hat{p}_{i} values are consistent with the original ones. The fact that the results for Simu2 are slightly less accurate than those for Simu1 is in line with the results presented in [17, Table III] for P-ULA, HMC and PP-ULA, suggesting that the parameter configuration for Simu2 is quite challenging.

Table 1 proposes a quantitative comparison of our results against those of the methods considered in [17]: a combination of Wiener deconvolution and Otsu's segmentation [47], a combination of LASSO deconvolution and SLaT segmentation [40], the adjusted Hamiltonian Monte Carlo (HMC) method [79], the Proximal Unadjusted Langevin Algorithm (P-ULA) [80] and its preconditioned version (PP-ULA) [17] for joint deconvolution and segmentation. From this table, we can conclude that the proposed variational method is able to compete with state-of-the-art Markov Chain Monte Carlo techniques in terms of both segmentation and deconvolution performance. Regarding the computational time, the average time (over 10 runs of the algorithm) required by P-SASL-PAM to meet the stopping criteria \|\zeta^{\ell+1}-\zeta^{\ell}\|/\|\zeta^{\ell}\|<10^{-4} and |\theta(\zeta^{\ell+1})-\theta(\zeta^{\ell})|/|\theta(\zeta^{\ell})|<10^{-4} is 493.2 seconds (approximately 8^{\prime}13^{\prime\prime}) for Simu1 and 536.4 seconds (approximately 8^{\prime}56^{\prime\prime}) for Simu2. Simulations were run on Matlab 2021b on an Intel Xeon Gold 6230 CPU 2.10GHz. In Table 1, we report the computational times for P-ULA, HMC and PP-ULA from [17, Table II], which were obtained on Matlab 2018b on an Intel Xeon CPU E5-1650 3.20GHz.

Simu1 Simu2
METHOD PSNR SSIM OA TIME PSNR SSIM OA TIME
Wiener-Otsu 37.1 0.57 99.5 – 35.4 0.63 96.0 –
LASSO-SLaT 39.2 0.60 99.6 – 37.8 0.70 98.3 –
P-ULA 38.9 0.45 98.7 2 h 27 min 37.1 0.57 94.9 3 h 06 min
HMC 40.0 0.62 99.7 1 h 08 min 36.4 0.64 98.5 4 h 14 min
PP-ULA 40.3 0.62 99.7 12 min 38.6 0.71 98.7 39 min
OURS 40.2 0.61 99.9 8 min 38.1 0.70 97.7 9 min
Table 1: PSNR, SSIM, OA scores and Computational time for Simu1 and Simu2 from [17]. The symbol "–" means the result was not available in the reference paper.
[Figure 2 panels, Simu1 and Simu2: ORIGINAL / DEGRADED / RECONSTRUCTED (first and second lines) and REFERENCE / ESTIMATED / QUANTISED (third and fourth lines).]
Figure 2: First and Second lines: B–mode of Simu1 and Simu2. The B–mode image is the most common type of ultrasound image, displaying the acoustic impedance of a 2-dimensional cross section of the considered tissue. All images are presented in the same scale [0,1]. Third and Fourth lines: Segmentation of the shape parameter for Simu1 and Simu2: reference pp, estimated p^\hat{p} and quantised p¯\bar{p}.
Figure 3: Decay of the objective value along 500 iterations for Simu1 (a) and Simu2 (b). We considered ten random samplings of p^{0} and \beta^{0}. The continuous line in the plot represents the mean objective value at each iteration, and the shaded area, highlighted in the zoomed region at the centre spanning 20 iterations, corresponds to the confidence interval related to the standard deviation. Logarithmic plot of the relative distance from the iterates \zeta^{\ell} to the solution \zeta^{\infty} over 1500 iterations for Simu1 (c) and Simu2 (d).

Finally, Figure 3 (a)-(b) shows the evolution of the mean value of the cost function for both Simu1 (a) and Simu2 (b) along 500 iterations for ten different samplings of p^{0} and \beta^{0}, while Figure 3 (c)-(d) illustrates on a logarithmic scale the relative distance from the iterates to the solution, \|\zeta^{\ell}-\zeta^{\infty}\|/\|\zeta^{\infty}\|, for Simu1 (c) and Simu2 (d), showing the convergence of our algorithm.

Additional experiments can be found in the Supplementary Material (SM 1), showing that, for standard wavelet-based image restoration problems, the proposed regularisation outperforms other sparsity measures.

6 Conclusions

We investigated a novel approach for the joint reconstruction/feature extraction problem. The novelty of this work lies both in the problem formulation and in the resolution procedure. Firstly, we proposed a new variational model in which we introduced a flexible sparse regularisation term for the reconstruction task. Secondly, we designed an inexact version of a TITAN-based block alternating optimisation scheme, whose aim is to exploit the structure of the problem and the properties of the functions involved in it. We established convergence results for the proposed algorithm, whose scope goes beyond the image processing problems considered in our work. We illustrated the validity of the approach on numerical examples for a joint deconvolution-segmentation problem. We also included comparisons with state-of-the-art methods, against which our proposal achieves comparable qualitative and quantitative performance. An attractive aspect of the proposed work is that the space-variant parameters defining the flexible sparse regularisation do not need to be defined in advance, but are inherently estimated by the iterative optimisation procedure. Concerning the tuning of the hyperparameters of the model, the design of an automatic strategy, for instance through supervised learning, could be an interesting development of this work.

Acknowledgments This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 861137. The authors thank Ségolène Martin for her careful reading of the initial version of this manuscript.

References

  • Daubechies et al. [2004] Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics 57 (2004)
  • Grasmair et al. [2008] Grasmair, M., Haltmeier, M., Scherzer, O.: Sparse regularization with lq penalty term. Inverse Problems 24, 055020 (2008)
  • Lorenz [2008] Lorenz, D.: Convergence rates and source conditions for Tikhonov regularization with sparsity constraints. Journal of Inverse and Ill-posed Problems 16(5), 463–478 (2008)
  • Ramlau and Resmerita [2010] Ramlau, R., Resmerita, E.: Convergence rates for regularization with sparsity constraints. Electronic transactions on numerical analysis ETNA 37, 87–104 (2010)
  • Tibshirani [1996] Tibshirani, R.: Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288 (1996)
  • Chartrand [2007] Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Signal Processing Letters 14, 707–710 (2007)
  • Grasmair [2009] Grasmair, M.: Well-posedness and convergence rates for sparse regularization with sublinear q\ell^{q} penalty term. Inverse Problems &\& Imaging 3(3), 383–387 (2009)
  • Zarzer [2009] Zarzer, C.: On Tikhonov regularization with non-convex sparsity constraints. Inverse Problems 25, 025006 (2009)
  • Ghilli and Kunisch [2019] Ghilli, D., Kunisch, K.: On monotone and primal-dual active set schemes for p\ell_{p}-type problems, p(0,1]p\in(0,1]. Computational Optimization and Applications 72, 45–85 (2019)
  • Hintermüller and Wu [2013] Hintermüller, M., Wu, T.: Nonconvex TVq-models in image restoration: analysis and a Trust-Region regularization-based superlinearly convergent solver. SIAM Journal on Imaging Sciences 6, 1385–1415 (2013)
  • Lorenz and Resmerita [2016] Lorenz, D., Resmerita, E.: Flexible sparse regularization. Inverse Problems 33 (2016)
  • Afonso and Sanches [2017] Afonso, M., Sanches, J.M.: Adaptive order non-convex lp-norm regularization in image restoration. Journal of Physics: Conference Series 904(1), 012016 (2017)
  • Blomgren et al. [1997] Blomgren, P., Chan, T.F., Mulet, P., Wong, C.K.: Total variation image restoration: Numerical methods and extensions. IEEE International Conference on Image Processing (1997)
  • Chen et al. [2006] Chen, Y., Levine, S., Rao, M.: Variable exponent, linear growth functionals in image restoration. SIAM Journal on Applied Mathematics 66, 1383–1406 (2006)
  • Lanza et al. [2018] Lanza, A., Morigi, S., Pragliola, M., Sgallari, F.: Space-variant generalised Gaussian regularisation for image restoration. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization 7, 1–14 (2018)
  • Lazzaretti et al. [2022] Lazzaretti, M., Calatroni, L., Estatico, C.: Modular-proximal gradient algorithms in variable exponent lebesgue spaces. SIAM Journal on Scientific Computing 44(6), 3463–3489 (2022)
  • Corbineau et al. [2019] Corbineau, M.-C., Kouamé, D., Chouzenoux, E., Tourneret, J.-Y., Pesquet, J.-C.: Preconditioned P-ULA for joint deconvolution-segmentation of ultrasound images. IEEE Signal Processing Letters 26(10), 1456–1460 (2019)
  • Do and Vetterli [2002] Do, M.N., Vetterli, M.: Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Transactions on Image Processing 11(2), 146–158 (2002)
  • Zhao et al. [2016] Zhao, N., Basarab, A., Kouamé, D., Tourneret, J.-Y.: Joint segmentation and deconvolution of ultrasound images using a hierarchical Bayesian model based on generalized Gaussian priors. IEEE Transactions on Image Processing 25(8), 3736–3750 (2016)
  • Hildreth [1957] Hildreth, C.: A quadratic programming procedure. Naval Research Logistics Quarterly 4(1), 79–85 (1957)
  • Tseng [2001] Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications 109, 475–494 (2001)
  • Combettes and Pesquet [2011] Combettes, P.L., Pesquet, J.-C.: Proximal splitting methods in signal processing, pp. 185–212. Springer, New York, NY (2011)
  • Combettes and Pesquet [2021] Combettes, P.L., Pesquet, J.-C.: Fixed point strategies in data science. IEEE Transactions on Signal Processing 69, 3878–3905 (2021)
  • Attouch et al. [2010] Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Mathematics of Operations Research 35(2), 438–457 (2010)
  • Bolte et al. [2014] Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming 146(1-2), 459–494 (2014)
  • Pock and Sabach [2016] Pock, T., Sabach, S.: Inertial proximal alternating linearized minimization (iPALM) for nonconvex and nonsmooth problems. SIAM Journal on Imaging Sciences 9(4), 1756–1787 (2016)
  • Hertrich and Steidl [2022] Hertrich, J., Steidl, G.: Inertial stochastic palm and applications in machine learning. Sampling Theory, Signal Processing, and Data Analysis 20(1), 4 (2022)
  • Foare et al. [2020] Foare, M., Pustelnik, N., Condat, L.: Semi-Linearized proximal alternating minimization for a discrete Mumford–Shah model. IEEE Transactions on Image Processing 29, 2176–2189 (2020)
  • Nikolova and Tan [2017] Nikolova, M., Tan, P.: Alternating proximal gradient descent for nonconvex regularised problems with multiconvex coupling terms (2017). https://hal.archives-ouvertes.fr/hal-01492846
  • Tan et al. [2019] Tan, P., Pierre, F., Nikolova, M.: Inertial alternating generalized forward–backward splitting for image colorization. Journal of Mathematical Imaging and Vision 61, 672–690 (2019)
  • Censor and Lent [1987] Censor, Y., Lent, A.: Optimization of “log x" entropy over linear equality constraints. Siam Journal on Control and Optimization 25, 921–933 (1987)
  • Chouzenoux et al. [2016] Chouzenoux, E., Pesquet, J.-C., Repetti, A.: A block coordinate variable metric forward–backward algorithm. Journal of Global Optimization, 1–29 (2016)
  • Bonettini et al. [2018] Bonettini, S., Prato, M., Rebegoldi, S.: A block coordinate variable metric linesearch based proximal gradient method. Computational Optimization and Applications (2018)
  • Repetti and Wiaux [2021] Repetti, A., Wiaux, Y.: Variable metric forward-backward algorithm for composite minimization problems. SIAM Journal on Optimization 31(2), 1215–1241 (2021)
  • Bonettini et al. [2019] Bonettini, S., Porta, F., Prato, M., Rebegoldi, S., Ruggiero, V., Zanni, L.: Recent Advances in Variable Metric First-Order Methods, pp. 1–31. Springer, Cham (2019)
  • Hien et al. [2020] Hien, L.T.K., Phan, D.N., Gillis, N.: An inertial block majorization minimization framework for nonsmooth nonconvex optimization. J. Mach. Learn. Res. 24, 18–11841 (2020)
  • Attouch et al. [2011] Attouch, H., Bolte, J., Svaiter, B.F.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, Series A 137(1), 91–124 (2011)
  • Chaâri et al. [2010] Chaâri, L., Pesquet, J.-C., Tourneret, J.-Y., Ciuciu, P., Benazza-Benyahia, A.: A hierarchical bayesian model for frame representation. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4086–4089 (2010). https://doi.org/10.1109/ICASSP.2010.5495737
  • Jeffreys [1946] Jeffreys, H.: An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences 186(1007), 453–461 (1946). Accessed 2024-02-07
  • Cai et al. [2017] Cai, X., Chan, R., Nikolova, M., Zeng, T.: A three-stage approach for segmenting degraded color images: Smoothing, Lifting and Thresholding (SLaT). Journal of Scientific Computing 72, 1313–1332 (2017)
  • Cai et al. [2019] Cai, X., Chan, R., Schönlieb, C.-B., Steidl, G., Zeng, T.: Linkage between piecewise constant Mumford–Shah model and Rudin–Osher–Fatemi model and its virtue in image segmentation. SIAM Journal on Scientific Computing 41(6), 1310–1340 (2019)
  • Cai et al. [2013] Cai, X., Chan, R., Zeng, T.: A two-stage image segmentation method using a convex variant of the Mumford–Shah model and thresholding. SIAM Journal on Imaging Sciences 6(1), 368–390 (2013)
  • Chambolle et al. [2012] Chambolle, A., Cremers, D., Pock, T.: A convex approach to minimal partitions. SIAM Journal on Imaging Sciences 5, 1113–1158 (2012)
  • Chan et al. [2014] Chan, R., Yang, H., Zeng, T.: A two-stage image segmentation method for blurry images with Poisson or multiplicative Gamma noise. SIAM Journal on Imaging Sciences 7(1), 98–127 (2014)
  • Pascal et al. [2021] Pascal, B., Vaiter, S., Pustelnik, N., Abry, P.: Automated data-driven selection of the hyperparameters for total-variation based texture segmentation. Journal of Mathematical Imaging and Vision 63, 923–952 (2021)
  • Mumford and Shah [1989] Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
  • Otsu [1979] Otsu, N.: A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9(1), 62–66 (1979)
  • Rockafellar and Wets [2009] Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Grundlehren der mathematischen Wissenschaften. Springer, Heidelberg (2009)
  • Kurdyka [1998] Kurdyka, K.: On gradients of functions definable in o-minimal structures. Annales de l’Institut Fourier 48(3), 769–783 (1998)
  • Łojasiewicz [1963] Łojasiewicz, S.: Une propriété topologique des sous-ensembles analytiques réels. Equ. Derivees partielles, Paris 1962, Colloques internat. Centre nat. Rech. sci. 117, 87-89 (1963). (1963)
  • Łojasiewicz [1993] Łojasiewicz, S.: Sur la géométrie semi- et sous- analytique. Annales de l’Institut Fourier 43(5), 1575–1595 (1993)
  • Chouzenoux et al. [2014] Chouzenoux, E., Pesquet, J.-C., Repetti, A.: Variable metric Forward-Backward algorithm for minimizing the sum of a differentiable function and a convex function. Journal of Optimization Theory and Applications 162, 107–132 (2014)
  • Bertsekas [1999] Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Nashua (1999)
  • Erdogan and Fessler [1999] Erdogan, H., Fessler, J.A.: Monotonic algorithms for transmission tomography. IEEE Transactions on Medical Imaging 18(9), 801–814 (1999)
  • Hunter and Lange [2004] Hunter, D., Lange, K.: A Tutorial on MM Algorithms. The American Statistician 58, 30–37 (2004)
  • Salzo [2017] Salzo, S.: The variable metric forward-backward splitting algorithm under mild differentiability assumptions. SIAM Journal on Optimization 27(4), 2153–2181 (2017)
  • Malitsky and Mishchenko [2020] Malitsky, Y., Mishchenko, K.: Adaptive gradient descent without descent. In: III, H.D., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 6702–6712 (2020)
  • Latafat et al. [2023] Latafat, P., Themelis, A., Stella, L., Patrinos, P.: Adaptive proximal algorithms for convex optimization under local Lipschitz continuity of the gradient (2023)
  • Charbonnier et al. [1997] Charbonnier, P., Blanc-Féraud, L., Aubert, G., Barlaud, M.: Deterministic edge-preserving regularization in computed imaging. IEEE Transactions on Image Processing 6 2, 298–311 (1997)
  • Van Den Dries [1998] Van Den Dries, L.: Tame Topology and O-minimal Structures. London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge (1998)
  • Bolte et al. [2007] Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM Journal on Optimization 18(2), 556–572 (2007)
  • Gabrielov [1996] Gabrielov, A.: Complements of subanalytic sets and existential formulas for analytic functions. Inventiones mathematicae 125, 1–12 (1996)
  • Wilkie [1996] Wilkie, A.: Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. Journal of the American Mathematical Society 9, 1051–1094 (1996)
  • Van Den Dries et al. [1997] Van Den Dries, L., Macintyre, A., Marker, D.: Logarithmic-exponential power series. Journal of the London Mathematical Society 56(3), 417–434 (1997)
  • Van Den Dries and Speissegger [2000] Van Den Dries, L., Speissegger, P.: The field of reals with multisummable series and the exponential function. Proceedings of The London Mathematical Society 81, 513–565 (2000)
  • Tougeron [1994] Tougeron, J.: Sur les ensembles semi-analytiques avec conditions gevrey au bord. Annales Scientifiques De L Ecole Normale Superieure 27, 173–208 (1994)
  • Artin [2015] Artin, E.: The Gamma Function. Courier Dover Publications, New York (2015)
  • Wrench [1968] Wrench, J.W.: Concerning two series for the Gamma function. Mathematics of Computation 22(103), 617–626 (1968)
  • Andrews et al. [1999] Andrews, G.E., Askey, R., Roy, R.: Special Functions. Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge (1999)
  • Repetti and Wiaux [2021] Repetti, A., Wiaux, Y.: Variable metric forward-backward algorithm for composite minimization problems. SIAM Journal on Optimization 31(2), 1215–1241 (2021)
  • Schifano et al. [2010] Schifano, E.D., Strawderman, R.L., Wells, M.T.: Majorization-Minimization algorithms for nonsmoothly penalized objective functions. Electronic Journal of Statistics 4(none), 1258–1299 (2010)
  • Combettes et al. [2011] Combettes, P.L., Dũng, Đ., Vũ, B.C.: Proximity for sums of composite functions. Journal of Mathematical Analysis and Applications 380(2), 680–688 (2011)
  • Condat [2013] Condat, L.: A Primal–Dual splitting method for convex optimization involving lipschitzian, proximable and linear composite terms. Journal of Optimization Theory and Applications 158 (2013)
  • Komodakis and Pesquet [2015] Komodakis, N., Pesquet, J.-C.: Playing with duality: An overview of recent primal-dual approaches for solving large-scale optimization problems. IEEE Signal Processing Magazine 32(6), 31–54 (2015)
  • Vũ [2013] Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Advances in Computational Mathematics 38, 667–681 (2013)
  • Corless et al. [1996] Corless, R., Gonnet, G., Hare, D., Jeffrey, D., Knuth, D.: On the Lambert W function. Advances in Computational Mathematics 5, 329–359 (1996)
  • Jensen [2004] Jensen, J.A.: Simulation of advanced ultrasound systems using Field II. In: 2004 2nd IEEE International Symposium on Biomedical Imaging: Nano to Macro, pp. 636–639 (2004)
  • Wang et al. [2004] Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13(4), 600–612 (2004)
  • Robert et al. [2018] Robert, C., Elvira, V., Tawn, N., Wu, C.: Accelerating MCMC algorithms. Wiley Interdisciplinary Reviews: Computational Statistics 10 (2018)
  • Pereyra [2013] Pereyra, M.: Proximal Markov chain Monte Carlo algorithms. Statistics and Computing 26 (2013)