
On Choosing Initial Values of Iteratively Reweighted $\ell_1$ Algorithms for the Piece-wise Exponential Penalty

Rongrong Lin, Shimin Li, and Yulan Liu (corresponding author). School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, P.R. China. Email: [email protected].
Abstract

Computing the proximal operator of the sparsity-promoting piece-wise exponential (PiE) penalty $1-e^{-|x|/\sigma}$ with a given shape parameter $\sigma>0$, a popular nonconvex surrogate of the $\ell_0$-norm, is fundamental in feature selection via support vector machines, image reconstruction, zero-one programming problems, compressed sensing, and related applications. Due to the nonconvexity of PiE, its proximal operator has long been evaluated via an iteratively reweighted $\ell_1$ algorithm, which substitutes PiE with its first-order approximation; however, the solutions obtained in this way are only critical points. Based on the exact characterization of the proximal operator of PiE, we explore how the iteratively reweighted $\ell_1$ solution deviates from the true proximal operator in certain regions, which can be explicitly identified in terms of $\sigma$, the initial value, and the regularization parameter in the definition of the proximal operator. Moreover, the initial value can be adaptively and simply chosen to ensure that the iteratively reweighted $\ell_1$ solution belongs to the proximal operator of PiE.

Keywords: Iteratively reweighted $\ell_1$ algorithms; piece-wise exponential penalty; proximal operator; Lambert W function; initial values.

1 Introduction

Sparse optimization problems arise in a wide range of fields, such as compressed sensing, image processing, statistics, and machine learning [32, 33]. The so-called $\ell_0$-norm, which counts the nonzero components of a vector, is a natural penalty function to promote sparsity. Sparse solutions are more easily interpretable and generally lead to better generalization performance of the model. Optimization problems with an $\ell_0$-norm penalty have been widely investigated in the literature [2, 8, 27, 33]. However, such a nonconvex problem is NP-hard [2].

To circumvent this challenge, a great many $\ell_0$-norm surrogates have been proposed in the literature [16, 32, 42]. The $\ell_1$-norm regularizer has received a great deal of attention for its continuity and convexity. Although it comes close to the $\ell_0$-norm, the $\ell_1$-norm frequently over-penalizes large entries. To remedy this issue, nonconvex sparsity-inducing penalties have been employed to better approximate the $\ell_0$-norm and enhance sparsity, and hence have received considerable attention in sparse learning. Recent theoretical studies have shown their superiority to the convex counterparts in a variety of sparse learning settings, including the bridge $\ell_p$-norm penalty [9, 15], the capped $\ell_1$ penalty [13, 40], the transformed $\ell_1$ penalty [38, 39], the log-sum penalty [5], the minimax concave penalty [37], the smoothly clipped absolute deviation [7], the difference of the $\ell_1$- and $\ell_2$-norms [17, 36], the ratio of the $\ell_1$- and $\ell_2$-norms [28, 35], the Weibull penalty [41], generalized error functions [12, 42], the $p$-th power of the $\ell_1$-norm [24], the piece-wise exponential function (PiE) [3, 14, 20], among others. To address such nonconvex and possibly nonsmooth problems, proximal algorithms are commonly used [1]. The proximal operator [1] of a function $\varphi:\mathbb{R}\to\mathbb{R}$ at $\tau\in\mathbb{R}$ with regularization parameter $\lambda>0$ is defined by

$\mathrm{Prox}_{\lambda\varphi}(\tau):=\arg\min_{x\in\mathbb{R}}\Big\{\lambda\varphi(x)+\frac{1}{2}(x-\tau)^{2}\Big\}.$

Characterizing the proximal operator of a function is crucial to the proximal algorithm. However, such a proximal operator does not always have a closed form, or it is computationally challenging to evaluate due to the nonconvex and nonsmooth nature of the sparsity-inducing penalty. A popular method for handling this issue is the iteratively reweighted algorithm, which approximates the nonconvex and nonsmooth problem by a sequence of tractable convex subproblems. Zou and Li [43] devised a local linear approximation, which can be treated as a special case of the iteratively reweighted $\ell_1$ (IRL1) minimization method proposed by Candès, Wakin, and Boyd [5]. The IRL1 algorithm can be unified under a majorization-minimization framework [22]. Later, the IRL1 algorithm for optimization problems with general nonconvex and nonsmooth sparsity-inducing terms was explored in [32], and its global and local convergence for the $\ell_p$-norm regularized model were studied in [31] and [30], respectively.

In this paper, we focus on the PiE function $f_{\sigma}:\mathbb{R}\to\mathbb{R}$ with shape parameter $\sigma>0$, defined by

$f_{\sigma}(x)=1-e^{-\frac{|x|}{\sigma}},\ \text{ for any }x\in\mathbb{R},$   (1)

which is one of the nonconvex surrogates of the $\ell_0$-norm. It is also called an exponential-type penalty [11, 14, 32, 41] or a Laplacian function [29, (16)], and it has been successfully applied in support vector machines [3, 10], zero-one programming problems [18, 25], image reconstruction [29, 39], compressed sensing [6, 14, 19], low-rank matrix completion [34], etc. Due to the nonconvexity of PiE, the IRL1 algorithm has long been adopted in a large volume of references to approximate the proximal operator of PiE [3, 34, 41, 42]. Recently, the IRL1 algorithm for computing the proximal operator of PiE was adopted in [34, (3.19)] for matrix completion. However, the expression of the proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$ for PiE was originally and partially studied by Malek-Mohammadi et al. [19] in 2016, and then systematically explored by Liu, Zhou, and Lin [16] using the Lambert W function. Motivated by the analysis of the relation between the IRL1 solution for the log-sum penalty and its proximal operator in [23], we explore the relation between the IRL1 solution and the proximal operator for PiE, and then show how to select a suitable initial point in the IRL1 algorithm so that the IRL1 solution is consistent with the proximal operator of PiE.

The remainder of the paper is outlined as follows. In Section 2, we recall the existing characterizations of $\mathrm{Prox}_{\lambda f_{\sigma}}$ in terms of the Lambert W function. With this, we show in Theorems 3.3 and 3.4 of Section 3 that the iteratively reweighted $\ell_1$ solution does not belong to the proximal operator of PiE in certain regions, which can be explicitly determined in terms of $\sigma$, the initial value, and the regularization parameter $\lambda$, as shown in Fig. 2 later. To remedy this issue, the initial value is set adaptively, as in Theorems 3.6 and 3.8, to ensure that the IRL1 solution belongs to the proximal operator of PiE. The necessary lemmas and the proofs of Theorems 3.3 and 3.4 are presented in Section 4. Conclusions are drawn in the final section.

2 Existing characterizations for $\mathrm{Prox}_{\lambda f_{\sigma}}$

Let us recall the expression of the proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$ of PiE (1), which was systematically explored in [16] by means of the Lambert W function. The Lambert W function $W(x)$ is the set of solutions of the equation

$x=W(x)e^{W(x)},\ \text{ for any }x\in\big[-\tfrac{1}{e},+\infty\big).$

The function $W(x)$ is single-valued for $x\geq 0$ or $x=-\frac{1}{e}$, and is double-valued for $-\frac{1}{e}<x<0$ (see Fig. 1). To discriminate between the two branches when $-\frac{1}{e}<x<0$, we use the same notation as in [21, Section 1.5] and denote the branch satisfying $W(x)\geq -1$ by $W_{0}(x)$ and the branch satisfying $W(x)\leq -1$ by $W_{-1}(x)$. The Lambert W function is available as a built-in function in Python (https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.lambertw.html). Lemma 2.1 below gives the monotonicity of the two branches. The reader can refer to the recent monograph [21, Section 1.5] for more details on the Lambert W function.
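As a quick illustration, the following sketch (assuming SciPy is available) evaluates both real branches via scipy.special.lambertw and checks the defining equation; the test point $x=-0.2$ is an arbitrary choice in $(-\frac{1}{e},0)$.

```python
import numpy as np
from scipy.special import lambertw

# Evaluate both real branches of the Lambert W function at a point in (-1/e, 0).
x = -0.2                       # arbitrary test point in (-1/e, 0)
w0 = lambertw(x, k=0).real     # principal branch W_0(x) >= -1
wm1 = lambertw(x, k=-1).real   # lower branch W_{-1}(x) <= -1

# Both branches satisfy the defining equation w * exp(w) = x.
assert np.isclose(w0 * np.exp(w0), x)
assert np.isclose(wm1 * np.exp(wm1), x)
print(w0, wm1)                 # approximately -0.2592 and -2.5426
```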

Lemma 2.1

[21, Section 1.6] The Lambert W function $W_{0}(x)$ is strictly increasing on $[-\frac{1}{e},0)$, while $W_{-1}(x)$ is strictly decreasing on $[-\frac{1}{e},0)$.

Figure 1: Two main branches of the Lambert W function.

The characterizations of the proximal operator of PiE (1) were presented in [16, Section 2], split into two cases: $\lambda\leq\sigma^{2}$ and $\lambda>\sigma^{2}$. For the sake of completeness, we list those characterizations as follows.

Lemma 2.2

Let $\lambda\leq\sigma^{2}$ and $\tau\in\mathbb{R}$. It holds that

$\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)=\begin{cases}\{0\},&\text{ if }|\tau|\leq\frac{\lambda}{\sigma},\\ \{\mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ otherwise},\end{cases}$

where $x_{1}(\tau):=\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{|\tau|}{\sigma}}\big)+|\tau|$.
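Since $\mathrm{Prox}_{\lambda f_{\sigma}}$ is single-valued in this regime, Lemma 2.2 translates directly into code. A minimal sketch, assuming SciPy's lambertw; the function name prox_pie and the grid-search sanity check with $\lambda=1$, $\sigma=2$, $\tau=1.2$ are our own illustrative choices.

```python
import numpy as np
from scipy.special import lambertw

def prox_pie(tau, lam, sigma):
    """Proximal operator of lam * (1 - exp(-|x|/sigma)) for lam <= sigma**2,
    following Lemma 2.2 (single-valued in this regime)."""
    assert lam <= sigma**2
    if abs(tau) <= lam / sigma:
        return 0.0
    # x_1(tau) = sigma * W_0(-(lam/sigma^2) * exp(-|tau|/sigma)) + |tau|
    x1 = sigma * lambertw(-(lam / sigma**2) * np.exp(-abs(tau) / sigma), k=0).real + abs(tau)
    return np.sign(tau) * x1

# Sanity check against a direct grid search on the one-dimensional objective.
tau, lam, sigma = 1.2, 1.0, 2.0
xs = np.linspace(-5.0, 5.0, 200001)
obj = lam * (1 - np.exp(-np.abs(xs) / sigma)) + 0.5 * (xs - tau)**2
print(prox_pie(tau, lam, sigma), xs[np.argmin(obj)])  # both approximately 0.878
```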

Lemma 2.3

Let $\lambda>\sigma^{2}$ and $\tau\in\mathbb{R}$. It holds that

$\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)=\begin{cases}\{0\},&\text{ if }|\tau|<\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\\ \mathrm{sign}(\tau)\arg\min_{x\in\{0,\,x_{1}(\tau)\}}\widehat{L}(x,\tau),&\text{ if }\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq|\tau|\leq\frac{\lambda}{\sigma},\\ \{\mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ otherwise},\end{cases}$

where $\widehat{L}(x,\tau):=\lambda(1-e^{-\frac{x}{\sigma}})+\frac{1}{2}(x-|\tau|)^{2}$ and $x_{1}(\tau)$ is defined as in Lemma 2.2.

Lemma 2.3 can be further reduced to the following result, which shows that $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ is single-valued except at a point $\bar{\tau}_{\lambda,\sigma}$ depending only on $\lambda$ and $\sigma$. This conclusion will be used in the proof of Theorem 3.4.

Lemma 2.4

Let $\lambda>\sigma^{2}$ and $\tau\in\mathbb{R}$. Then

$\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)=\begin{cases}\{0\},&\text{ if }|\tau|<\bar{\tau}_{\lambda,\sigma},\\ \{0,\ \mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ if }|\tau|=\bar{\tau}_{\lambda,\sigma},\\ \{\mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ otherwise},\end{cases}$

where $\bar{\tau}_{\lambda,\sigma}=x^{*}+\frac{\lambda}{\sigma}e^{-\frac{x^{*}}{\sigma}}$, with $x^{*}\in(0,\sqrt{2\lambda})$ the solution on $(0,\infty)$ of the equation $\frac{1}{2}+\lambda\,\frac{(\frac{x}{\sigma}+1)e^{-\frac{x}{\sigma}}-1}{x^{2}}=0$, and $x_{1}(\tau)$ is defined as in Lemma 2.2.

Obviously, according to Lemmas 2.3 and 2.4, the threshold $\bar{\tau}_{\lambda,\sigma}$ satisfies

$\sigma(1+\ln\tfrac{\lambda}{\sigma^{2}})\leq\bar{\tau}_{\lambda,\sigma}\leq\tfrac{\lambda}{\sigma}.$

These three thresholds will be used frequently when we explore the iteratively reweighted $\ell_1$ algorithm for computing $\mathrm{Prox}_{\lambda f_{\sigma}}$ in the next section.

3 Analysis of IRL1 for computing $\mathrm{Prox}_{\lambda f_{\sigma}}$

In this section, we analyze the IRL1 algorithm for solving the following problem:

$\min_{x\in\mathbb{R}}\Big\{\lambda f_{\sigma}(x)+\frac{1}{2}(x-\tau)^{2}\Big\}.$   (2)

To solve problem (2), the nonconvex function $f_{\sigma}$ in the IRL1 algorithm is locally approximated by its linear expansion, namely,

$f_{\sigma}(x)\approx f_{\sigma}(x^{(k)})+\frac{1}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}(|x|-|x^{(k)}|),$

where $x^{(k)}$ denotes the $k$-th iterate. With it, the next iterate $x^{(k+1)}$ for a given $\tau$ is computed by

$x^{(k+1)}:=\arg\min_{x\in\mathbb{R}}\Big\{\frac{1}{2}(x-\tau)^{2}+\lambda\Big(f_{\sigma}(x^{(k)})+\frac{1}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}(|x|-|x^{(k)}|)\Big)\Big\}.$

By removing the terms that do not depend on the variable $x$ in the above expression, we obtain

$x^{(k+1)}=\arg\min_{x\in\mathbb{R}}\Big\{\frac{1}{2}(x-\tau)^{2}+\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}|x|\Big\}=\mathrm{Prox}_{\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}|\cdot|}(\tau),$

that is,

$x^{(k+1)}=\mathrm{sign}(\tau)\Big(|\tau|-\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}\Big)_{+},$

where $(t)_{+}:=\max\{0,t\}$.
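In other words, each IRL1 step solves a weighted $\ell_1$ proximal subproblem by soft-thresholding with weight $w_{k}=\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}$. A minimal sketch of this subproblem solver (the function name soft_threshold is ours):

```python
import numpy as np

def soft_threshold(tau, w):
    """Proximal operator of x -> w * |x| (soft-thresholding): the convex
    subproblem solved at every IRL1 step, with w = (lam/sigma) * exp(-|x_k|/sigma)."""
    return np.sign(tau) * max(abs(tau) - w, 0.0)
```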

It suffices to restrict our discussion to $\tau>0$, since $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ is symmetric about the origin [16, Lemma 2.1] and $\mathrm{Prox}_{\lambda f_{\sigma}}(0)=\{0\}$. More precisely, the IRL1 algorithm for PiE with $\tau>0$ is described in Algorithm 1.

Algorithm 1 Iteratively Reweighted $\ell_1$ Algorithm (IRL1)

Input: Fix $\lambda>0$ and $\sigma>0$. Given $x^{(0)}\geq 0$ and $\tau>0$.

  • for $k=0,1,\dots$ do

    $x^{(k+1)}=\Big(\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}\Big)_{+}$   (3)

  • end for

Output: $x^{(\infty)}$
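For reference, a direct Python sketch of Algorithm 1 follows; the stopping tolerance and the iteration cap are practical choices of ours rather than part of the algorithm, whose formal output is the limit $x^{(\infty)}$.

```python
import numpy as np

def irl1_pie(tau, lam, sigma, x0, tol=1e-12, max_iter=100000):
    """IRL1 iteration (3) for tau > 0:
    x_{k+1} = (tau - (lam/sigma) * exp(-x_k / sigma))_+ .
    Returns an approximation of the limit x^(infinity)."""
    x = x0
    for _ in range(max_iter):
        x_new = max(tau - (lam / sigma) * np.exp(-x / sigma), 0.0)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```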

Denote $F(x):=\lambda f_{\sigma}(x)+\frac{1}{2}(x-\tau)^{2}$. We call $x$ a critical point of the function $F$ if $0\in\partial F(x)$, where $\partial F(x)$ denotes the subdifferential of $F$ at $x$ [26, Definition 8.3]. Ochs et al. [22] pointed out that the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to a critical point of the function $F$. We go one step further and show that not only is the sequence $\{x^{(k)}\}$ convergent, but also its limit $x^{(\infty)}$ depends on the initialization $x^{(0)}$ and on the relationship of $\tau$ with the parameters $\lambda$ and $\sigma$. The convergence behavior of (3) is described by Lemmas 4.1–4.7 in Section 4. This is then compared to the true solution set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ in Theorems 3.3 and 3.4. In particular, we identify the intervals where (3) will not achieve the true solution. These intervals are explicitly determined in terms of the initial value $x^{(0)}$ and the parameters $\lambda$ and $\sigma$.

Notice that $x^{(\infty)}$ satisfies the equation $x=(\tau-\frac{\lambda}{\sigma}e^{-\frac{x}{\sigma}})_{+}$ by (3). To further investigate the properties of $x^{(\infty)}$, for a given $\tau\in\mathbb{R}$ we define a function $\phi:\mathbb{R}\to\mathbb{R}$ by

$\phi(x):=\tau-x-\frac{\lambda}{\sigma}e^{-\frac{x}{\sigma}},\ \text{ for any }x\in\mathbb{R},$   (4)

and its main properties used later are listed in the following lemma.

Lemma 3.1

Let $\phi$ be defined by (4). Write $x_{2}(\tau):=\sigma W_{-1}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$, and let $x_{1}(\tau)$ be defined as in Lemma 2.2. Then the following statements hold.

  • (i)

    The function $\phi$ is strictly increasing on $(-\infty,\sigma\ln\frac{\lambda}{\sigma^{2}}]$ and strictly decreasing on $(\sigma\ln\frac{\lambda}{\sigma^{2}},+\infty)$. Moreover, $\phi(x)\leq\phi(\sigma\ln\frac{\lambda}{\sigma^{2}})=\tau-\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ for any $x\in\mathbb{R}$.

  • (ii)

    If $\tau\in(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$ with

    $\begin{cases}0<x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau),&\text{ if }\lambda>\sigma^{2},\\ x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)<0,&\text{ if }\lambda\leq\sigma^{2}.\end{cases}$   (7)
  • (iii)

    If $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, the equation $\phi(x)=0$ has a unique solution, that is, $x_{1}(\tau)=x_{2}(\tau)=\sigma\ln\frac{\lambda}{\sigma^{2}}$.

  • (iv)

    If $\tau>\frac{\lambda}{\sigma}$, the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$ satisfying $x_{2}(\tau)<0<x_{1}(\tau)$.

  • (v)

    If $\tau=\frac{\lambda}{\sigma}$, the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$ with

    $\begin{cases}0=x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau),&\text{ if }\lambda>\sigma^{2},\\ x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)=0,&\text{ if }\lambda<\sigma^{2},\\ x_{1}(\tau)=x_{2}(\tau)=0,&\text{ if }\lambda=\sigma^{2}.\end{cases}$   (11)

Proof: A simple calculation gives $\phi^{\prime}(x)=\frac{\lambda}{\sigma^{2}}e^{-\frac{x}{\sigma}}-1$ and $\phi^{\prime\prime}(x)=-\frac{\lambda}{\sigma^{3}}e^{-\frac{x}{\sigma}}$. Clearly, statement (i) holds. The equation $\phi(x)=0$ is equivalent to $x=\tau-\frac{\lambda}{\sigma}e^{-\frac{x}{\sigma}}$, namely,

$\frac{x-\tau}{\sigma}e^{\frac{x-\tau}{\sigma}}=-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}.$   (12)

If $\tau>\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, then $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$. By the definition of the Lambert W function and equation (12), the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$. Together with (i) and the fact $\phi(0)=\tau-\frac{\lambda}{\sigma}$, we know that statements (ii) and (iv) hold. When $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}=-\frac{1}{e}$. Hence, we obtain

$W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W\big(-\tfrac{1}{e}\big)=-1,$

which implies $x_{1}(\tau)=x_{2}(\tau)=\tau-\sigma=\sigma\ln\frac{\lambda}{\sigma^{2}}$. Statement (iii) holds. In the following, we argue statement (v). Notice that for $\lambda\neq\sigma^{2}$, $\tau=\frac{\lambda}{\sigma}>\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ and hence $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$. So the equation $\phi(x)=0$ has solutions $x_{1}(\tau)$ and $x_{2}(\tau)$. From (i), it follows that

$x_{2}(\tau)<\sigma\ln\tfrac{\lambda}{\sigma^{2}}<x_{1}(\tau).$   (13)

We will proceed in two cases.

Case 1: $\lambda\neq\sigma^{2}$. If $\lambda>\sigma^{2}$, then $-\frac{\lambda}{\sigma^{2}}<-1$. With $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$, we know that $W_{-1}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=-\frac{\lambda}{\sigma^{2}}$. Together with $\tau=\frac{\lambda}{\sigma}$, this yields

$x_{2}(\tau)=\sigma W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau=\tau+\sigma W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=\tau-\tfrac{\lambda}{\sigma}=0.$

Again from (13), it follows that $0=x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)$.

If $\lambda<\sigma^{2}$, then $-\frac{\lambda}{\sigma^{2}}>-1$. With $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$, we know that $W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=-\frac{\lambda}{\sigma^{2}}$. Together with $\tau=\frac{\lambda}{\sigma}$, this yields

$x_{1}(\tau)=\sigma W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau=\tau+\sigma W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=\tau-\tfrac{\lambda}{\sigma}=0.$

Again from (13), it follows that $x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)=0$.

Case 2: $\lambda=\sigma^{2}$. Now $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}=-\frac{1}{e}$ by $\tau=\frac{\lambda}{\sigma}$. Hence,

$W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W\big(-\tfrac{1}{e}\big)=-1,$

which implies that $\phi(x)=0$ has a unique solution $x_{1}(\tau)=x_{2}(\tau)=0$ by (12).   $\Box$

Proposition 3.2

Given $\tau>0$ and an initial value $x^{(0)}\geq 0$. Suppose that the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $x^{(\infty)}$. Then the following statements hold.

  • (i)

    $x^{(\infty)}=0$ implies that $\tau\leq\frac{\lambda}{\sigma}$.

  • (ii)

    If $\tau>\frac{\lambda}{\sigma}$, then $x^{(\infty)}=\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$.

Proof: By the continuity of the function $(\cdot)_{+}$, the convergence $x^{(k)}\to x^{(\infty)}$, and equation (3) for each $k$, we know that

$x^{(\infty)}=\big(\tau-\tfrac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}\big)_{+}.$   (14)

If $x^{(\infty)}=0$, then $(\tau-\frac{\lambda}{\sigma})_{+}=0$ from (14), which implies that $\tau\leq\frac{\lambda}{\sigma}$. Hence, statement (i) holds. If $\tau>\frac{\lambda}{\sigma}$, then $x^{(\infty)}>0$ from (i), and $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$ from (14), namely, $\phi(x^{(\infty)})=0$, where $\phi$ is defined by (4). So $x^{(\infty)}=\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$ from Lemma 3.1 (iv).   $\Box$

3.1 Comparing the IRL1 solution with $\mathrm{Prox}_{\lambda f_{\sigma}}$

In this subsection, we identify when the limit $x^{(\infty)}$ of the sequence $\{x^{(k)}\}$ belongs to the set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ and when it does not. We recall from Lemmas 2.2 and 2.4 that the set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ has a unique element except for $|\tau|=\bar{\tau}_{\lambda,\sigma}$ with $\lambda>\sigma^{2}$.

The following two theorems summarize our main results. Our results for PiE are mainly inspired by the ideas presented in [23, Section 4] for the iteratively reweighted algorithm for computing the proximal operator of the log-sum penalty. The proofs, as well as the relevant technical lemmas, are given in Section 4. From now on, we say that a sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ provided that the limit of $\{x^{(k)}\}$ belongs to the set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$.

Theorem 3.3

Given $\tau>0$ and an initial value $x^{(0)}\geq 0$. Let $\lambda\leq\sigma^{2}$. Then the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$.

If $\lambda>\sigma^{2}$, the sequence $\{x^{(k)}\}$ generated by Algorithm 1 may not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for some given $x^{(0)}\geq 0$. The regions where the algorithm fails depend on the threshold $\bar{\tau}_{\lambda,\sigma}$ given as in Lemma 2.4 and on $x_{2}(\tau)$ defined in Lemma 3.1, as shown in Fig. 2. The value $\bar{\tau}_{\lambda,\sigma}$ can be computed by the bisection method. Since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$ by Lemma 2.1, we denote the inverse function of $x_{2}(\tau)$ by $x_{2}^{-1}$ on this interval.
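A bisection sketch for $\bar{\tau}_{\lambda,\sigma}$ following Lemma 2.4 is given below; the left endpoint, tolerance, and function name tau_bar are our own choices, and for $\lambda$ only slightly larger than $\sigma^{2}$ the evaluation of $g$ near $0$ may need a series expansion to avoid cancellation.

```python
import numpy as np

def tau_bar(lam, sigma, tol=1e-12):
    """Threshold tau_bar of Lemma 2.4 (requires lam > sigma**2), via bisection on
    g(x) = 1/2 + lam * ((x/sigma + 1) * exp(-x/sigma) - 1) / x**2, which is
    negative near 0 and positive at sqrt(2 * lam)."""
    assert lam > sigma**2
    g = lambda x: 0.5 + lam * ((x / sigma + 1) * np.exp(-x / sigma) - 1) / x**2
    lo, hi = 1e-8, np.sqrt(2 * lam)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    return x_star + (lam / sigma) * np.exp(-x_star / sigma)

print(tau_bar(2.0, 1.0))   # approximately 1.7638
```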

Theorem 3.4

Given $\tau>0$ and an initial value $x^{(0)}\geq 0$. Let $\lambda>\sigma^{2}$, let $\bar{\tau}_{\lambda,\sigma}$ be defined as in Lemma 2.4, let $x_{i}(\tau)$ ($i=1,2$) be defined as in Lemma 3.1, and let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then the following statements hold.

  • (i)

The sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))\cup(\frac{\lambda}{\sigma},+\infty)$.

  • (ii)

If $x^{(0)}\geq\sigma\ln\frac{\lambda}{\sigma^{2}}$, then $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$. Consequently, $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$; however, $\{x^{(k)}\}$ does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma})$.

  • (iii)

If $x_{2}(\bar{\tau}_{\lambda,\sigma})<x^{(0)}<\sigma\ln\frac{\lambda}{\sigma^{2}}$, the sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)}))\cup[\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$, but does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[x_{2}^{-1}(x^{(0)}),\bar{\tau}_{\lambda,\sigma})$.

  • (iv)

If $x^{(0)}=x_{2}(\bar{\tau}_{\lambda,\sigma})$, the sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma})\cup(\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$; however, the sequence $\{x^{(k)}\}$ does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ when $\tau=\bar{\tau}_{\lambda,\sigma}$.

  • (v)

If $0\leq x^{(0)}<x_{2}(\bar{\tau}_{\lambda,\sigma})$, the sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma}]\cup(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma}]$, but does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in(\bar{\tau}_{\lambda,\sigma},x_{2}^{-1}(x^{(0)})]$.

Remark 3.5

The initial value for IRL1 is usually simply set to $1$ for compressed sensing [5, Subsection 2.2], to a random feasible value for support vector machines [4, Subsection 2.1], and to the identity matrix for a low-rank matrix completion problem [41, Algorithm 1]. By Theorem 3.4, such choices may result in a deviation between the IRL1 solution and the proximal operator of PiE.

Fig. 2 illustrates the results (i)–(v) of Theorem 3.4 with $\tau>0$. Only when $\tau$ lies in a subset of the interval $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$ does the deviation occur. The colored regions indicate where the IRL1 solution differs from the proximal operator of PiE. For example, let $\lambda=2$ and $\sigma=1$. Then $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})=1+\ln 2$, $\bar{\tau}_{\lambda,\sigma}=1.7638$, $\frac{\lambda}{\sigma}=2$, $\sigma\ln\frac{\lambda}{\sigma^{2}}=\ln 2$, $x_{2}(\bar{\tau}_{\lambda,\sigma})=0.3393$, and $x_{1}(\bar{\tau}_{\lambda,\sigma})=1.094$. In this case, given an initial value $x^{(0)}=1>\sigma\ln\frac{\lambda}{\sigma^{2}}$, the IRL1 solution (red dash-dot) and the true proximal operator (black dashed) are illustrated in Fig. 3, which corresponds to the case of Theorem 3.4 (ii). Clearly, the IRL1 solution disagrees with the true proximal operator for any given $\tau\in[1+\ln 2,1.7638)$.
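This worked example can be reproduced with the sketches above (assuming irl1_pie and tau_bar have been defined); the test point $\tau=1.72$ is an arbitrary choice inside the failure interval $[1+\ln 2,\bar{\tau}_{\lambda,\sigma})$.

```python
# Reproducing the worked example with the sketches above.
lam, sigma = 2.0, 1.0
print(tau_bar(lam, sigma))                  # approximately 1.7638
tau = 1.72                                  # lies in [1 + ln 2, tau_bar) ~ [1.6931, 1.7638)
print(irl1_pie(tau, lam, sigma, x0=1.0))    # converges to x_1(tau), approximately 0.94 > 0,
# although the true proximal operator at this tau is {0} by Lemma 2.4.
```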

Figure 2: Illustration of Theorem 3.4 with $\tau>0$. The specific regions where the IRL1 solution differs from the proximal operator of PiE in (ii)–(v) are marked in blue, red, yellow, and green, respectively.

Figure 3: The proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$ and the IRL1 solution in Theorem 3.4 with $\lambda=2$, $\sigma=1$, and the initial value $x^{(0)}=1$.

3.2 Choices of initial values

We are devoted to adaptively selecting an initial value in a simple way that guarantees fast convergence of the IRL1 solution to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for all $\tau>0$. The discussion is divided into two cases: $\lambda\leq\sigma^{2}$ and $\lambda>\sigma^{2}$.

Theorem 3.6

Given $\tau>0$. Suppose that $\lambda\leq\sigma^{2}$ and the sequence $\{x^{(k)}\}$ is generated by Algorithm 1 with the initial value

$x^{(0)}:=\begin{cases}0,&\text{ if }\tau\leq\frac{\lambda}{\sigma},\\ \tau,&\text{ otherwise}.\end{cases}$   (15)

Then, the following statements hold.

  • (i)

If $0<\tau\leq\frac{\lambda}{\sigma}$, then $x^{(k)}=0$ for each $k\in\mathbb{N}$.

  • (ii)

If $\tau>\frac{\lambda}{\sigma}$, it holds that

    $\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau))<x^{(k)}-x_{1}(\tau)<\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau)),$   (16)

    where $x_{1}(\tau)$ is defined as in Lemma 2.2.

Proof: If $0<\tau\leq\frac{\lambda}{\sigma}$, then $x^{(0)}=0$ and statement (i) is trivial by Lemma 4.1 (i). Now suppose $\tau>\frac{\lambda}{\sigma}$. Then $x^{(0)}=\tau>0$. Therefore, by Lemma 4.3, $x^{(k)}>0$ for each $k\in\mathbb{N}$ and $\{x^{(k)}\}$ converges to $x_{1}(\tau)$. Notice that $\tau=x_{1}(\tau)+\frac{\lambda}{\sigma}e^{-\frac{x_{1}(\tau)}{\sigma}}$ by Lemma 3.1 (iv). With (3) and the Lagrange mean value theorem, we arrive at

$x^{(k+1)}-x_{1}(\tau)=\frac{\lambda}{\sigma}\Big(e^{-\frac{x_{1}(\tau)}{\sigma}}-e^{-\frac{x^{(k)}}{\sigma}}\Big)=\frac{\lambda}{\sigma^{2}}e^{-\frac{\xi}{\sigma}}(x^{(k)}-x_{1}(\tau)),$   (17)

for some $\xi\in(x_{1}(\tau),x^{(k)})\subseteq(x_{1}(\tau),\tau)$. Note that $e^{-\frac{t}{\sigma}}$ is strictly decreasing in $t$. By (17), it follows that

$\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}(x^{(k)}-x_{1}(\tau))<x^{(k+1)}-x_{1}(\tau)<\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}(x^{(k)}-x_{1}(\tau)).$

Repeating this process yields (16). The proof is hence complete.   $\Box$

Remark 3.7

A sequence $\{y^{(k)}\}\subseteq\mathbb{R}$ is said to converge Q-linearly to a point $\bar{y}$ if there exists $c\in(0,1)$ such that $\lim_{k\to+\infty}|y^{(k+1)}-\bar{y}|/|y^{(k)}-\bar{y}|=c$. Equation (17) in Theorem 3.6 shows that $x^{(k)}$ converges Q-linearly to $x_{1}(\tau)$ for each $\tau>\frac{\lambda}{\sigma}$ when $\lambda\leq\sigma^{2}$.

For (16), note that $\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}<\frac{\lambda}{\sigma^{2}}e^{0}\leq 1$ by Lemma 3.1 (iv). Moreover, $x_{1}(\tau)$ is increasing on $(\frac{\lambda}{\sigma},+\infty)$ by Lemma 2.1, and $x_{1}(\tau)\to 0$ as $\tau\to\frac{\lambda}{\sigma}$. Let $\lambda=1$ and $\sigma=2$ in Theorem 3.6. For fixed $k\in\{1,2,3,4\}$, the iterate $x^{(k)}$, the proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$, and the corresponding error function $x^{(k)}-\mathrm{Prox}_{\lambda f_{\sigma}}$ for $\tau>0$ are illustrated in Fig. 4.

Figure 4: Illustration of Theorem 3.6 with $\lambda=1$ and $\sigma=2$ for any $\tau\in(0,1.5]$. (a) The true proximal operator and the IRL1 iterates $x^{(k)}$ with $k=1,2,3,4$; (b) the error function.

Next, we consider the case $\lambda>\sigma^{2}$. As a direct consequence of Theorem 3.4, if we fix the initial value $x^{(0)}\geq 0$ (for example, $x^{(0)}=\tau$ or $x^{(0)}=0$) for all $\tau>0$, then $\{x^{(k)}\}$ generated by Algorithm 1 fails to converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for at least one $\tau>0$. To resolve this, the initial value $x^{(0)}\geq 0$ is chosen depending on $\bar{\tau}_{\lambda,\sigma}$. A simple choice of $x^{(0)}$ is suggested below.

Theorem 3.8

Given $\tau>0$. Let $\lambda>\sigma^{2}$ and let the initial value $x^{(0)}$ be given by

$x^{(0)}:=\begin{cases}0,&\text{ if }\tau\leq\bar{\tau}_{\lambda,\sigma},\\ \tau,&\text{ otherwise}.\end{cases}$   (18)

Then, the following statements hold.

  • (i)

The sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau>0$.

  • (ii)

If $0<\tau\leq\bar{\tau}_{\lambda,\sigma}$, then $x^{(k)}=0$ for each $k$.

  • (iii)

If $\tau>\bar{\tau}_{\lambda,\sigma}$, it holds that

    $\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau))<x^{(k)}-x_{1}(\tau)<\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau)),$

    where $x_{1}(\tau)$ is defined as in Lemma 2.2.

Proof: Suppose $\tau\leq\bar{\tau}_{\lambda,\sigma}$. Then $x^{(0)}=0$. By Theorem 3.4 (i) and (v), $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\leq\bar{\tau}_{\lambda,\sigma}$. If $\tau>\bar{\tau}_{\lambda,\sigma}$, then $\tau>\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ and further $x^{(0)}=\tau>\sigma\ln\frac{\lambda}{\sigma^{2}}$. By Theorem 3.4 (i) and (ii), $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau>\bar{\tau}_{\lambda,\sigma}$. So, statement (i) holds. Statement (ii) follows from Lemma 4.1 (i) and the fact $\bar{\tau}_{\lambda,\sigma}\leq\frac{\lambda}{\sigma}$. Now suppose that $\tau>\bar{\tau}_{\lambda,\sigma}$. Then $x^{(0)}=\tau$. Notice that $\phi(\tau)<0$, where $\phi$ is defined by (4). Then $x_{1}(\tau)<\tau$ by Lemma 3.1 and $\lambda>\sigma^{2}$. Combining the proof of Lemma 4.7 (iii) for $x^{(0)}>x_{1}(\tau)$ with Lemma 4.3, we know that $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}$ and $x^{(k)}>x^{(k+1)}>x_{1}(\tau)$ for each $k$. The rest of the proof is similar to the last part of Theorem 3.6 and is omitted.   $\Box$
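Combining (15) and (18), the adaptive initialization reduces to a thin wrapper around the iteration; a sketch reusing irl1_pie and tau_bar from above:

```python
def irl1_prox_pie(tau, lam, sigma):
    """IRL1 with the adaptive initial value of Theorems 3.6 and 3.8:
    x0 = 0 at or below the threshold and x0 = tau above it, so that the
    limit belongs to Prox_{lam f_sigma}(tau) for every tau > 0."""
    threshold = lam / sigma if lam <= sigma**2 else tau_bar(lam, sigma)
    x0 = 0.0 if tau <= threshold else tau
    return irl1_pie(tau, lam, sigma, x0)
```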

Recall that $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq\bar{\tau}_{\lambda,\sigma}\leq\frac{\lambda}{\sigma}$ and that $x_{1}(\tau)$ is strictly increasing for $\tau\geq\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$. Observe that $x_{1}(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))=\sigma\ln\frac{\lambda}{\sigma^{2}}$. It follows that for any $\tau>\bar{\tau}_{\lambda,\sigma}$,

$\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}<\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\bar{\tau}_{\lambda,\sigma})}{\sigma}}\leq\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))}{\sigma}}=\frac{\lambda}{\sigma^{2}}e^{-\ln\frac{\lambda}{\sigma^{2}}}=1.$

Given the initial value $x^{(0)}$ as in Theorem 3.8, let $\lambda=2$ and $\sigma=1$. For fixed $k\in\{2,4,6,8\}$, the iterate $x^{(k)}$, $\mathrm{Prox}_{\lambda f_{\sigma}}$, and the corresponding error function $x^{(k)}-\mathrm{Prox}_{\lambda f_{\sigma}}$ for $\tau>0$ are given in Fig. 5.

Figure 5: Illustration of Theorem 3.8 with $\lambda=2$ and $\sigma=1$ for any $\tau\in(0,3]$. (a) The true proximal operator and the IRL1 iterates $x^{(k)}$ with $k=2,4,6,8$; (b) the error function.

4 Proof of Theorems 3.3 and 3.4

To start with, we present several technical lemmas describing the convergence of Algorithm 1. Then the limit of the sequence generated by this algorithm is compared directly to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ with $\tau>0$.

Lemma 4.1

Given $\tau>0$ and $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Suppose that there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$. Then

  • (i)

    If $\tau\leq\frac{\lambda}{\sigma}$, then $x^{(k)}=0$ for any $k\geq k_{0}$.

  • (ii)

    If $\tau>\frac{\lambda}{\sigma}$, then $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\geq k_{0}$. Moreover, $x^{(k+1)}>x^{(k)}$ for any $k\geq k_{0}$.

  • (iii)

    If $\tau>\frac{\lambda}{\sigma}$, the sequence $\{x^{(k)}\}$ converges to $\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$.

Proof: From (3), it holds that

$x^{(k_{0}+1)}=\Big(\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k_{0})}}{\sigma}}\Big)_{+}=\Big(\tau-\frac{\lambda}{\sigma}\Big)_{+}.$   (19)

Clearly, if $\tau\leq\frac{\lambda}{\sigma}$, then $x^{(k_{0}+1)}=0$ by (19), yielding $x^{(k)}=0$ for any $k\geq k_{0}+1$. If $\tau>\frac{\lambda}{\sigma}$, then $x^{(k_{0}+1)}=\tau-\frac{\lambda}{\sigma}>0$ by (19). With $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k_{0}+1)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}e^{0}>0$ and (3), one has

$x^{(k_{0}+2)}=\big(\tau-\tfrac{\lambda}{\sigma}e^{-\frac{x^{(k_{0}+1)}}{\sigma}}\big)_{+}>0,$

yielding $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\geq k_{0}$. Moreover, notice that $x^{(k_{0}+2)}>x^{(k_{0}+1)}$. Together with $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\geq k_{0}$ and the monotonic increase of the function $h(t):=\tau-\frac{\lambda}{\sigma}e^{-t}$, we know that $x^{(k+1)}>x^{(k)}$ for any $k\geq k_{0}$. Hence, statements (i) and (ii) hold.

By (ii) and $\tau-\frac{\lambda}{\sigma}<x^{(k)}<\tau$ for each $k>k_{0}$, the sequence $\{x^{(k)}\}$ is increasing and bounded, and hence converges. Moreover, it converges to $\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$ by Proposition 3.2 (ii).   $\Box$

Lemma 4.2

Given $\tau>0$ and $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. If $x^{(k)}>0$ for any $k\in\mathbb{N}$, then

  • (i)

    $\{x^{(k)}\}$ is strictly increasing and convergent if $x^{(1)}>x^{(0)}$.

  • (ii)

    $\{x^{(k)}\}$ is strictly decreasing and convergent if $x^{(1)}<x^{(0)}$.

  • (iii)

    $\{x^{(k)}\}$ is constant if $x^{(1)}=x^{(0)}$.

Proof: Since $x^{(k)}>0$ for each $k$, (3) gives $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\in\mathbb{N}$. If $x^{(1)}>x^{(0)}$, then together with the monotonic increase of the function $h(t):=\tau-\frac{\lambda}{\sigma}e^{-t}$, we know that $x^{(k+1)}>x^{(k)}$ for any $k$. Obviously, $0<x^{(k)}<\tau$ for each $k$. Hence, $\{x^{(k)}\}$ is strictly increasing and convergent, and statement (i) holds. The rest of the proof is similar to (i).   $\Box$

Lemma 4.3

Given $\tau\in[\frac{\lambda}{\sigma},+\infty)$ and $x^{(0)}>0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then $x^{(k)}>0$ for all $k\geq 0$, and the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ defined as in Lemma 2.2.

Proof: Since $\tau\geq\frac{\lambda}{\sigma}$ and $x^{(0)}>0$, it holds that $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}\geq 0$. With (3), we have $x^{(1)}=\big(\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}\big)_{+}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>0$, which yields

$x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0,\ \text{ for any }k\in\mathbb{N}.$

By Lemma 4.2, it suffices to argue that the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$. Notice that $x^{(1)}-x^{(0)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}-x^{(0)}=\phi(x^{(0)})$, where $\phi$ is defined by (4). Obviously, $x_{1}(\tau)\geq 0$ by Lemma 3.1 (iv) and (v). We proceed in three cases.

Case 1: $x^{(0)}=x_{1}(\tau)>0$. Then $\phi(x^{(0)})=0$ by Lemma 3.1 (iv), namely, $x^{(1)}=x^{(0)}$. Hence, $\{x^{(k)}\}$ is constant by Lemma 4.2 (iii), and the desired result obviously holds.

Case 2: $0<x^{(0)}<x_{1}(\tau)$. Now $x_{1}(\tau)>0$. Then $\phi(x^{(0)})>0$ by Lemma 3.1 (i) and the fact $\phi(0)\geq 0$, which implies that $x^{(1)}>x^{(0)}$. Hence, $\{x^{(k)}\}$ is strictly increasing and convergent by Lemma 4.2 (i).

Case 3: $x^{(0)}>x_{1}(\tau)$. Then $\phi(x^{(0)})<0$ by Lemma 3.1 (i), namely, $x^{(1)}<x^{(0)}$. Hence, $\{x^{(k)}\}$ is strictly decreasing and convergent by Lemma 4.2 (ii).

In summary, the sequence $\{x^{(k)}\}$ is convergent; denote its limit by $x^{(\infty)}$. Then $x^{(\infty)}\geq 0$ and $\phi(x^{(\infty)})=0$. So $x^{(\infty)}=x_{1}(\tau)$ by Lemma 3.1 (iv) and (v).   $\Box$

By Lemma 4.1 (iii) and Lemma 4.3, we have the following conclusion.

Corollary 4.4

Given $\tau>\frac{\lambda}{\sigma}$ and $x^{(0)}\geq 0$, the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $x_{1}(\tau)$.

The following lemma proves that $\{x^{(k)}\}$ always converges to $0$ for all $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$ if $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})>0$, namely, if $\frac{\lambda}{\sigma^{2}}>\frac{1}{e}$.

Lemma 4.5

Suppose $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})>0$. Given $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$ and an initial value $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then $\{x^{(k)}\}$ converges to $0$.

Proof: Firstly, we argue that there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$. If not, $x^{(k)}>0$ for all $k\geq 0$. Then $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3) and $x^{(1)}-x^{(0)}=\phi(x^{(0)})$, where $\phi$ is defined by (4). Again from Lemma 3.1 (i) and $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$, it holds that $\phi(x)\leq\phi(\sigma\ln\frac{\lambda}{\sigma^{2}})<0$ for any $x\in\mathbb{R}$. Consequently, $\phi(x^{(0)})<0$, namely, $x^{(1)}<x^{(0)}$. So $\{x^{(k)}\}$ is decreasing and convergent by Lemma 4.2 (ii). Now suppose that $\lim_{k\to\infty}x^{(k)}=x^{(\infty)}$. Then $x^{(\infty)}\geq 0$ and $\phi(x^{(\infty)})=0$, which contradicts $\phi(x^{(\infty)})<0$. Hence, there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$, and then the sequence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i) and the fact $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq\frac{\lambda}{\sigma}$.   $\Box$

The next two lemmas study the convergence of $\{x^{(k)}\}$ for $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$.

Lemma 4.6

Suppose $\lambda\leq\sigma^{2}$. Given $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ and an initial value $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then $\{x^{(k)}\}$ converges to $0$.

Proof: Firstly, we argue that there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$. If not, $x^{(k)}>0$ for all $k\geq 0$. Then $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3) and $x^{(1)}-x^{(0)}=\phi(x^{(0)})$, where $\phi$ is defined by (4). Since $\lambda\leq\sigma^{2}$, $\sigma\ln\frac{\lambda}{\sigma^{2}}\leq 0$. Again from Lemma 3.1 (i), it holds that $\phi(x)\leq\phi(0)=\tau-\frac{\lambda}{\sigma}<0$ for any $x\geq 0$. Consequently, $\phi(x^{(0)})<0$, namely, $x^{(1)}<x^{(0)}$. So $\{x^{(k)}\}$ is decreasing and convergent by Lemma 4.2 (ii). Now suppose that $\lim_{k\to\infty}x^{(k)}=x^{(\infty)}$. Then $x^{(\infty)}\geq 0$ and $\phi(x^{(\infty)})=0$, which contradicts $\phi(x^{(\infty)})<0$. Hence, there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$, and then the sequence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i).   $\Box$

Lemma 4.7

Suppose $\lambda>\sigma^{2}$. Given $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ and an initial value $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1, and let $x_{1}(\tau)$ and $x_{2}(\tau)$ be defined as in Lemma 3.1. Then the following statements hold.

  • (i)

    If $x^{(0)}\in(0,x_{2}(\tau))$, the sequence $\{x^{(k)}\}$ converges to $0$.

  • (ii)

    If $x^{(0)}=x_{2}(\tau)$, the sequence $\{x^{(k)}\}$ converges to $x_{2}(\tau)$.

  • (iii)

    If $x^{(0)}\in(x_{2}(\tau),+\infty)$, the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$.

Proof: Since $\lambda>\sigma^{2}$ and $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, we have $\phi(\sigma\ln\frac{\lambda}{\sigma\tau})=-\sigma\ln\frac{\lambda}{\sigma\tau}<0$, where $\phi$ is defined by (4). By Lemma 3.1 (i) and (ii), it holds that

$0<\sigma\ln\frac{\lambda}{\sigma\tau}<x_{2}(\tau)\leq\sigma\ln\frac{\lambda}{\sigma^{2}}\leq x_{1}(\tau),$   (20)

for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$.

(i) The proof is divided into two cases: $x^{(0)}\leq\sigma\ln\frac{\lambda}{\sigma\tau}$ and $\sigma\ln\frac{\lambda}{\sigma\tau}<x^{(0)}<x_{2}(\tau)$. If $x^{(0)}\leq\sigma\ln\frac{\lambda}{\sigma\tau}$, then

$x^{(1)}=\big(\tau-\tfrac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}\big)_{+}\leq\big(\tau-\tfrac{\lambda}{\sigma}e^{-\ln\frac{\lambda}{\sigma\tau}}\big)_{+}=(\tau-\tau)_{+}=0,$

and hence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i).

If $\sigma\ln\frac{\lambda}{\sigma\tau}<x^{(0)}<x_{2}(\tau)$, then $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}e^{-\ln\frac{\lambda}{\sigma\tau}}=0$. Hence, $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3), and then $0<x^{(1)}<x^{(0)}$ since $x^{(1)}-x^{(0)}=\phi(x^{(0)})<\phi(x_{2}(\tau))=0$ by Lemma 3.1 (i)–(iii). If there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$, then $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i). Otherwise, $x^{(k)}>0$ for all $k\geq 0$. Notice that $x^{(1)}<x^{(0)}$. Thus, the sequence $\{x^{(k)}\}$ is decreasing and convergent by Lemma 4.2 (ii). Moreover, its limit, denoted by $x^{(\infty)}$, satisfies $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$, namely, $\phi(x^{(\infty)})=0$, and $x^{(\infty)}<x^{(0)}<x_{2}(\tau)$, which implies $\phi(x^{(\infty)})<\phi(x_{2}(\tau))=0$, a contradiction. In summary, the sequence $\{x^{(k)}\}$ converges to $0$.

(ii) If $x^{(0)}=x_{2}(\tau)$, then $x^{(0)}>\sigma\ln\frac{\lambda}{\sigma\tau}$ from (20), and $x^{(1)}=x^{(0)}$ as $x^{(1)}-x^{(0)}=\phi(x^{(0)})=\phi(x_{2}(\tau))=0$. In this scenario, the sequence $\{x^{(k)}\}$ is constant and its limit is $x_{2}(\tau)$.

(iii) Let $x^{(0)}\in(x_{2}(\tau),+\infty)$. We know that $x_{2}(\tau)\leq x_{1}(\tau)$ from Lemma 3.1 (ii) and (iii). Moreover, $x^{(0)}>\sigma\ln\frac{\lambda}{\sigma\tau}$ by (20), and $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}e^{-\ln\frac{\lambda}{\sigma\tau}}=0$. Hence, $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3). If $x^{(0)}<x_{1}(\tau)$, then $x^{(1)}>x^{(0)}$ since $x^{(1)}-x^{(0)}=\phi(x^{(0)})>\phi(x_{2}(\tau))=0$ by Lemma 3.1 (i) and (ii), which yields

$x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0,\ \text{ for any }k\in\mathbb{N}.$

Hence, $\{x^{(k)}\}$ is increasing and convergent by Lemma 4.2 (i); its limit satisfies $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$ and must be $x_{1}(\tau)$. If $x^{(0)}=x_{1}(\tau)$, then $x^{(1)}=x^{(0)}$ since $x^{(1)}-x^{(0)}=\phi(x^{(0)})=\phi(x_{1}(\tau))=0$. In this scenario, the sequence $\{x^{(k)}\}$ is constant and its limit is $x_{1}(\tau)$. If $x^{(0)}>x_{1}(\tau)$, then $x^{(1)}<x^{(0)}$ as $\phi(x^{(0)})<\phi(x_{1}(\tau))=0$ by Lemma 3.1 (i) and (ii). We estimate

$x^{(1)}-x_{1}(\tau)=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}-x_{1}(\tau)>\tau-\frac{\lambda}{\sigma}e^{-\frac{x_{1}(\tau)}{\sigma}}-x_{1}(\tau)=\phi(x_{1}(\tau))=0,$

which implies $x^{(1)}>x_{1}(\tau)$, and then $x^{(k)}>x^{(k+1)}>x_{1}(\tau)$ for each $k$. Therefore, $\{x^{(k)}\}$ is decreasing and convergent. Its limit satisfies $x^{(\infty)}\geq x_{1}(\tau)$ and $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$. Therefore, $x^{(\infty)}$ must be $x_{1}(\tau)$ by Lemma 3.1 (ii) and (iii).   $\Box$

By Lemma 4.7 (ii), (iii) and Lemma 3.1 (iii), we can obtain the following claim.

Corollary 4.8

When $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ and $\lambda>\sigma^{2}$, the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ for any $x^{(0)}\in[x_{1}(\tau),+\infty)$, where $x_{1}(\tau)=\sigma\ln\frac{\lambda}{\sigma^{2}}$.

Now, we are ready to prove Theorems 3.3 and 3.4.

Proof of Theorem 3.3. We only argue the case $\tau>0$. In the following, we divide the arguments into two cases.

Case 1: $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})>0$. When $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$, $\{x^{(k)}\}$ converges to $0$ by Lemma 4.5. When $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, $\{x^{(k)}\}$ converges to $0$ by Lemma 4.6. When $\tau=\frac{\lambda}{\sigma}$, $\{x^{(k)}\}$ converges to $x_{1}(\tau)=0$ by Lemma 4.3 and Lemma 3.1 if $x^{(0)}>0$, and converges to $0$ by Lemma 4.1 (i) if $x^{(0)}=0$. In short, for any $\tau\in(0,\frac{\lambda}{\sigma}]$, $\{x^{(k)}\}$ converges to $0$. When $\tau\in(\frac{\lambda}{\sigma},+\infty)$, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ by Lemma 4.3 if $x^{(0)}>0$, and by Lemma 4.1 (iii) if $x^{(0)}=0$. Hence, for any $\tau\in(\frac{\lambda}{\sigma},+\infty)$, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$.

Case 2: $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq 0$. In this case, $\tau\in(0,\frac{\lambda}{\sigma})\subseteq[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, so the sequence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.6. When $\tau\in[\frac{\lambda}{\sigma},+\infty)$, the proof is the same as in Case 1.

Based on the above arguments, $\{x^{(k)}\}$ converges to the exact solution of $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ by Lemma 2.2. The proof is hence complete.   $\Box$


Proof of Theorem 3.4. We only argue the case $\tau>0$. By Corollary 4.4, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ for any $\tau\in(\frac{\lambda}{\sigma},+\infty)$. By Lemma 4.5, $\{x^{(k)}\}$ converges to $0$ for any $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$. Hence, statement (i) holds with Lemma 2.3. The rest of the proof focuses on $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$. Now suppose that $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$.

(ii) Let $x^{(0)}\geq\sigma\ln\frac{\lambda}{\sigma^{2}}$. If $\tau\in(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$, then $x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)$ by Lemma 3.1 (ii) and (v). Hence, $x^{(0)}>x_{2}(\tau)$ from the assumption that $x^{(0)}\geq\sigma\ln\frac{\lambda}{\sigma^{2}}$. By Lemma 4.7 (iii) and Corollary 4.4, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$. If $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, then $x_{1}(\tau)=x_{2}(\tau)=\sigma\ln\frac{\lambda}{\sigma^{2}}$ from Lemma 3.1 (iii), and the desired result follows from Corollary 4.8. Thus, with Lemma 2.4, statement (ii) holds.

(iii) Let $x_{2}(\bar{\tau}_{\lambda,\sigma})<x^{(0)}<\sigma\ln\frac{\lambda}{\sigma^{2}}$. Since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ by Lemma 2.1 and $\sigma\ln\frac{\lambda}{\sigma^{2}}=x_{2}\big(\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\big)$ by Lemma 3.1 (iii), we can derive that $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})<x_{2}^{-1}(x^{(0)})<\bar{\tau}_{\lambda,\sigma}$, that $x^{(0)}<x_{2}(\tau)$ for each $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)}))$, and that $x^{(0)}>x_{2}(\tau)$ for each $\tau\in(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma})$. Together with Lemma 4.7, the limit of $\{x^{(k)}\}$, denoted by $x^{(\infty)}$, satisfies

$x^{(\infty)}=\begin{cases}0,&\text{ if }\tau\in(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)})),\\ x^{(0)}=x_{2}(\tau),&\text{ if }\tau=x_{2}^{-1}(x^{(0)}),\\ x_{1}(\tau),&\text{ if }\tau\in(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma}].\end{cases}$   (21)

Comparing (21) with Lemma 2.4 gives the desired conclusion.

(iv) Let $x^{(0)}=x_{2}(\bar{\tau}_{\lambda,\sigma})$. When $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma})$, $x_{2}(\tau)>x_{2}(\bar{\tau}_{\lambda,\sigma})=x^{(0)}$ since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ by Lemma 2.1, and then $\{x^{(k)}\}$ converges to $0$ by Lemma 4.7 (i). When $\tau\in(\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$, $x_{2}(\tau)<x_{2}(\bar{\tau}_{\lambda,\sigma})=x^{(0)}$, and then $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ by Lemma 4.7 (iii). When $\tau=\bar{\tau}_{\lambda,\sigma}$, $x_{2}(\tau)=x^{(0)}$, and then $\{x^{(k)}\}$ converges to $x_{2}(\bar{\tau}_{\lambda,\sigma})$ by Lemma 4.7 (ii). Hence, the desired result is obtained by Lemma 2.4.

(v) Let $0\leq x^{(0)}<x_{2}(\bar{\tau}_{\lambda,\sigma})$. The proof is similar to that of (iii). Since $0\leq x^{(0)}<x_{2}(\bar{\tau}_{\lambda,\sigma})$, we have $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq\bar{\tau}_{\lambda,\sigma}<x_{2}^{-1}(x^{(0)})\leq\frac{\lambda}{\sigma}$. Suppose first that $x^{(0)}>0$. Since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$ by Lemma 2.1, it holds that $x^{(0)}<x_{2}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)}))$ and $x^{(0)}>x_{2}(\tau)$ for any $\tau\in(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma}]$. By Lemma 4.7, the limit of $\{x^{(k)}\}$ is again given by (21). Comparing (21) with Lemma 2.4, $x^{(\infty)}$ does not belong to ${\rm Prox}_{\lambda f_{\sigma}}(\tau)$ for $\tau\in(\bar{\tau}_{\lambda,\sigma},x_{2}^{-1}(x^{(0)})]$. The same conclusion holds for $x^{(0)}=0$ by Lemma 4.1(i), together with the facts that $x_{2}(\tau)=0$ if and only if $\tau=\frac{\lambda}{\sigma}$ (shown in the proof of case (i) of Lemma 3.1) and that $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$. This completes the proof.
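As an informal numerical check on the trichotomy (21), one can run the reweighted iteration directly. The following sketch assumes the IRL1 update $x^{(k+1)}=\max\{0,\,\tau-\frac{\lambda}{\sigma}e^{-x^{(k)}/\sigma}\}$ for $\tau>0$, i.e., soft thresholding with the reweighted weight $\frac{1}{\sigma}e^{-|x^{(k)}|/\sigma}$; the values of $\lambda$, $\sigma$, and $\tau$ are hypothetical.

```python
# Illustrative sketch (not from the paper): the limit of the reweighted
# iteration as a function of the initial value x0, matching (21).
import numpy as np
from scipy.special import lambertw

def irl1_limit(tau, lam, sigma, x0, iters=2000):
    """Iterate x <- max(0, tau - (lam/sigma)*exp(-x/sigma)) starting at x0."""
    x = x0
    for _ in range(iters):
        x = max(0.0, tau - (lam / sigma) * np.exp(-x / sigma))
    return x

lam, sigma = 2.0, 1.0                      # hypothetical values, lam > sigma^2
tau = 1.85                                 # in [sigma(1+ln(lam/sigma^2)), lam/sigma]
c = -(lam / sigma**2) * np.exp(-tau / sigma)
x1 = tau + sigma * lambertw(c, k=0).real   # attracting fixed point x1(tau)
x2 = tau + sigma * lambertw(c, k=-1).real  # repelling fixed point x2(tau)

print(irl1_limit(tau, lam, sigma, x0=x2 - 1e-3))  # x0 < x2(tau): limit is 0
print(irl1_limit(tau, lam, sigma, x0=x2 + 1e-3))  # x0 > x2(tau): limit is x1(tau)
# The middle case x0 = x2(tau) is invariant in exact arithmetic, but x2(tau)
# is numerically repelling, so finite-precision runs eventually drift away.
```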

5 Conclusions

The relation between the IRL1 solution and the true proximal operator of PiE (1) has been clarified in Theorems 3.3 and 3.4; the regions where the two differ depend explicitly on $\sigma$, the initial value $x^{(0)}$, and the regularization parameter $\lambda$. Furthermore, to close this gap, the initial value can be adaptively selected as in Theorems 3.6 and 3.8 so as to guarantee that the IRL1 solution belongs to the proximal operator of PiE. These results justify the use of IRL1 for PiE whenever the initial value is chosen appropriately. Finally, our arguments may be applied to other sparsity-promoting penalties, especially those whose proximal operators cannot be derived explicitly.
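To make the closing remark concrete, the sketch below illustrates one initialization rule consistent with the case analysis of Theorem 3.4: take $x^{(0)}=0$ when $\tau\leq\bar{\tau}_{\lambda,\sigma}$ and $x^{(0)}=\sigma\ln\frac{\lambda}{\sigma^{2}}$ otherwise. It is only a sketch under two assumptions of ours, namely that $\bar{\tau}_{\lambda,\sigma}$ is the tie threshold at which $0$ and $x_{1}(\tau)$ attain the same value of the objective $\frac{1}{2}(x-\tau)^{2}+\lambda(1-e^{-x/\sigma})$, and that $\lambda>\sigma^{2}$; the precise prescriptions are those of Theorems 3.6 and 3.8, which we do not restate here.

```python
# Illustrative sketch (not from the paper): an adaptive initial value for
# IRL1, assuming tau_bar is the tie threshold where 0 and x1(tau) give the
# same value of h(x) = 0.5*(x - tau)^2 + lam*(1 - exp(-x/sigma)).
import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

def x1_root(tau, lam, sigma):
    """Larger fixed point x1(tau) via the principal Lambert W branch."""
    c = -(lam / sigma**2) * np.exp(-tau / sigma)
    return tau + sigma * lambertw(c, k=0).real

def tie_gap(tau, lam, sigma):
    """h(x1(tau)) - h(0); this gap vanishes at tau = tau_bar."""
    x1 = x1_root(tau, lam, sigma)
    h = lambda x: 0.5 * (x - tau) ** 2 + lam * (1.0 - np.exp(-x / sigma))
    return h(x1) - h(0.0)

def adaptive_x0(tau, lam, sigma):
    """One initialization rule consistent with Theorem 3.4(ii) and (v)."""
    lo = sigma * (1 + np.log(lam / sigma**2))   # the two roots appear here
    hi = lam / sigma
    tau_bar = brentq(tie_gap, lo + 1e-12, hi, args=(lam, sigma))
    # Below tau_bar the prox is 0, so start below x2(tau); above tau_bar
    # the prox is x1(tau), so start above x2(tau).
    return 0.0 if tau <= tau_bar else sigma * np.log(lam / sigma**2)
```

With this choice, the iteration of the earlier sketch converges to an element of ${\rm Prox}_{\lambda f_{\sigma}}(\tau)$ in our informal experiments; this mirrors, but does not replace, the guarantees of Theorems 3.6 and 3.8.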

References

  • [1] A. Beck, First-Order Methods in Optimization, SIAM, Philadelphia, 2017.
  • [2] T. Blumensath and M. E. Davies, Iterative thresholding for sparse approximations, Journal of Fourier Analysis and Applications, 14 (2008), pp. 629–654.
  • [3] P. S. Bradley and O. L. Mangasarian, Feature selection via concave minimization and support vector machines, in Proceedings of the 15th International Conference on Machine Learning, vol. 98, 1998, pp. 82–90.
  • [4] P. S. Bradley, O. L. Mangasarian, and W. N. Street, Feature selection via mathematical programming, INFORMS Journal on Computing, 10 (1998), pp. 209–217.
  • [5] E. J. Candes, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted $\ell_{1}$ minimization, Journal of Fourier Analysis and Applications, 14 (2008), pp. 877–905.
  • [6] L. Chen and Y. Gu, The convergence guarantees of a non-convex approach for sparse recovery, IEEE Transactions on Signal Processing, 62 (2014), pp. 3754–3767.
  • [7] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96 (2001), pp. 1348–1360.
  • [8] J. Fan, R. Li, C.-H. Zhang, and H. Zou, Statistical Foundations of Data Science, Chapman and Hall/CRC, 2020.
  • [9] S. Foucart and M.-J. Lai, Sparsest solutions of underdetermined linear systems via $\ell_{q}$-minimization for $0<q\leq 1$, Applied and Computational Harmonic Analysis, 26 (2009), pp. 395–407.
  • [10] G. M. Fung, O. L. Mangasarian, and A. J. Smola, Minimal kernel classifiers, Journal of Machine Learning Research, 3 (2002), pp. 303–321.
  • [11] C. Gao, N. Wang, Q. Yu, and Z. Zhang, A feasible nonconvex relaxation approach to feature selection, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 25, 2011, pp. 356–361.
  • [12] W. Guo, Y. Lou, J. Qin, and M. Yan, A novel regularization based on the error function for sparse recovery, Journal of Scientific Computing, 87 (2021), Art. 31.
  • [13] W. Jiang, F. Nie, and H. Huang, Robust dictionary learning with capped $\ell_{1}$-norm, in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • [14] H. A. Le Thi, T. P. Dinh, H. M. Le, and X. T. Vo, DC approximation approaches for sparse optimization, European Journal of Operational Research, 244 (2015), pp. 26–46.
  • [15] Y. Liu and R. Lin, A bisection method for computing the proximal operator of the $\ell_{p}$-norm with $0<p<1$, Journal of Computational and Applied Mathematics.
  • [16] Y. Liu, Y. Zhou, and R. Lin, The proximal operator of the piece-wise exponential function and its application in compressed sensing, arXiv:2306.13425.
  • [17] Y. Lou and M. Yan, Fast L1–L2 minimization via a proximal operator, Journal of Scientific Computing, 74 (2018), pp. 767–785.
  • [18] S. Lucidi and F. Rinaldi, Exact penalty functions for nonlinear integer programming problems, Journal of Optimization Theory and Applications, 145 (2010), pp. 479–488.
  • [19] M. Malek-Mohammadi, A. Koochakzadeh, M. Babaie-Zadeh, M. Jansson, and C. R. Rojas, Successive concave sparsity approximation for compressed sensing, IEEE Transactions on Signal Processing, 64 (2016), pp. 5657–5671.
  • [20] O. Mangasarian, Machine learning via polyhedral concave minimization, in Applied Mathematics and Parallel Computing, Springer, 1996, pp. 175–188.
  • [21] I. Mező, The Lambert W Function: Its Generalizations and Applications, CRC Press, 2022.
  • [22] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision, SIAM Journal on Imaging Sciences, 8 (2015), pp. 331–372.
  • [23] A. Prater-Bennette, L. Shen, and E. E. Tripp, The proximity operator of the log-sum penalty, Journal of Scientific Computing, 93 (2022), Art. 67.
  • [24] A. Prater-Bennette, L. Shen, and E. E. Tripp, A constructive approach for computing the proximity operator of the $p$-th power of the $\ell_{1}$ norm, Applied and Computational Harmonic Analysis, 67 (2023), Art. 101572.
  • [25] F. Rinaldi, New results on the equivalence between zero-one programming and continuous concave programming, Optimization Letters, 3 (2009), pp. 377–386.
  • [26] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, vol. 317, Springer Science & Business Media, 2009.
  • [27] L. Shen, Y. Xu, and X. Zeng, Wavelet inpainting with the $\ell_{0}$ sparse regularization, Applied and Computational Harmonic Analysis, 41 (2016), pp. 26–53.
  • [28] M. Tao, Minimization of $L_{1}$ over $L_{2}$ for sparse signal recovery with convergence guarantee, SIAM Journal on Scientific Computing, 44 (2022), pp. A770–A797.
  • [29] J. Trzasko and A. Manduca, Highly undersampled magnetic resonance image reconstruction via homotopic $\ell_{0}$-minimization, IEEE Transactions on Medical Imaging, 28 (2008), pp. 106–121.
  • [30] H. Wang, H. Zeng, and J. Wang, Convergence rate analysis of proximal iteratively reweighted $\ell_{1}$ methods for $\ell_{p}$ regularization problems, Optimization Letters, 17 (2023), pp. 413–435.
  • [31] H. Wang, H. Zeng, J. Wang, and Q. Wu, Relating $\ell_{p}$ regularization and reweighted $\ell_{1}$ regularization, Optimization Letters, 15 (2021), pp. 2639–2660.
  • [32] H. Wang, F. Zhang, Y. Shi, and Y. Hu, Nonconvex and nonsmooth sparse optimization via adaptively iterative reweighted methods, Journal of Global Optimization, 81 (2021), pp. 717–748.
  • [33] J. Wright and Y. Ma, High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications, Cambridge University Press, 2022.
  • [34] J. Yan, X. Meng, F. Cao, and H. Ye, A universal rank approximation method for matrix completion, International Journal of Wavelets, Multiresolution and Information Processing, 20 (2022), Art. 2250016.
  • [35] P. Yin, E. Esser, and J. Xin, Ratio and difference of $\ell_{1}$ and $\ell_{2}$ norms and sparse representation with coherent dictionaries, Communications in Information and Systems, 14 (2014), pp. 87–109.
  • [36] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of $\ell_{1-2}$ for compressed sensing, SIAM Journal on Scientific Computing, 37 (2015), pp. A536–A563.
  • [37] C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Annals of Statistics, 38 (2010), pp. 894–942.
  • [38] S. Zhang and J. Xin, Minimization of transformed $l_{1}$ penalty: closed form representation and iterative thresholding algorithms, Communications in Mathematical Sciences, 15 (2017), pp. 511–537.
  • [39] S. Zhang and J. Xin, Minimization of transformed $L_{1}$ penalty: theory, difference of convex function algorithm, and robust application in compressed sensing, Mathematical Programming, 169 (2018), pp. 307–336.
  • [40] T. Zhang, Analysis of multi-stage convex relaxation for sparse regularization, Journal of Machine Learning Research, 11 (2010), pp. 1081–1107.
  • [41] Z. Zhou, A unified framework for constructing nonconvex regularizations, IEEE Signal Processing Letters, 29 (2022), pp. 479–483.
  • [42] Z. Zhou, Sparse recovery based on the generalized error function, Journal of Computational Mathematics, doi:10.4208/jcm.2204-m2021-0288, 2023.
  • [43] H. Zou and R. Li, One-step sparse estimates in nonconcave penalized likelihood models, Annals of Statistics, 36 (2008), pp. 1509–1533.