
On Choosing Initial Values of Iteratively Reweighted $\ell_1$ Algorithms for the Piece-wise Exponential Penalty

Rongrong Lin, Shimin Li, and Yulan Liu (corresponding author). School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou 510520, P.R. China. Email: [email protected].
Abstract

Computing the proximal operator of the sparsity-promoting piece-wise exponential (PiE) penalty $1-e^{-|x|/\sigma}$ with a given shape parameter $\sigma>0$, a popular nonconvex surrogate of the $\ell_0$-norm, is fundamental in feature selection via support vector machines, image reconstruction, zero-one programming problems, compressed sensing, and related applications. Due to the nonconvexity of PiE, its proximal operator has long been evaluated via an iteratively reweighted $\ell_1$ algorithm, which substitutes PiE with its first-order approximation; however, the solutions obtained in this way are only critical points. Based on the exact characterization of the proximal operator of PiE, we explore how the iteratively reweighted $\ell_1$ solution deviates from the true proximal operator in certain regions, which can be explicitly identified in terms of $\sigma$, the initial value, and the regularization parameter in the definition of the proximal operator. Moreover, the initial value can be adaptively and simply chosen to ensure that the iteratively reweighted $\ell_1$ solution belongs to the proximal operator of PiE.

Keywords: Iteratively reweighted $\ell_1$ algorithms; piece-wise exponential penalty; proximal operator; Lambert W function; initial values.

1 Introduction

Sparse optimization problems arise in a wide range of fields, such as compressed sensing, image processing, statistics, and machine learning [32, 33]. The so-called $\ell_0$-norm, which counts the nonzero components of a vector, is a natural penalty function to promote sparsity. Sparse solutions are more easily interpretable and generally lead to better generalization performance of the model. Optimization problems with an $\ell_0$-norm penalty have been widely investigated in the literature [2, 8, 27, 33]. However, such a nonconvex problem is NP-hard [2].

To circumvent this challenge, a great many $\ell_0$-norm surrogates have been proposed in the literature [16, 32, 42]. The $\ell_1$-norm regularizer has received a great deal of attention for its continuity and convexity. Although it comes close to the $\ell_0$-norm, the $\ell_1$-norm frequently over-penalizes large entries. To remedy this issue, nonconvex sparsity-inducing penalties have been employed to better approximate the $\ell_0$-norm and enhance sparsity, and hence have received considerable attention in sparse learning. Recent theoretical studies have shown their superiority to the convex counterparts in a variety of sparse learning settings, including the bridge $\ell_p$-norm penalty [9, 15], the capped $\ell_1$ penalty [13, 40], the transformed $\ell_1$ penalty [38, 39], the log-sum penalty [5], the minimax concave penalty [37], the smoothly clipped absolute deviation [7], the difference of the $\ell_1$- and $\ell_2$-norms [17, 36], the ratio of the $\ell_1$- and $\ell_2$-norms [28, 35], the Weibull penalty [41], generalized error functions [12, 42], the $p$-th power of the $\ell_1$-norm [24], the piece-wise exponential function (PiE) [3, 14, 20], among others. To address such nonconvex and possibly nonsmooth problems, proximal algorithms are commonly used [1]. The proximal operator [1] of a function $\varphi:\mathbb{R}\to\mathbb{R}$ at $\tau\in\mathbb{R}$ with regularization parameter $\lambda>0$ is defined by

$\mathrm{Prox}_{\lambda\varphi}(\tau):=\arg\min_{x\in\mathbb{R}}\Big\{\lambda\varphi(x)+\frac{1}{2}(x-\tau)^{2}\Big\}.$

Characterizing the proximal operator of a function is crucial to the proximal algorithm. However, such a proximal operator does not always have a closed form, or it is computationally challenging to evaluate due to the nonconvex and nonsmooth nature of the sparsity-inducing penalty. A popular method for handling this issue is the iteratively reweighted algorithm, which approximates the nonconvex and nonsmooth problem by a sequence of tractable convex subproblems. Zou and Li [43] devised a local linear approximation, which can be treated as a special case of the iteratively reweighted $\ell_1$ (IRL1) minimization method proposed by Candès, Wakin, and Boyd [5]. The IRL1 algorithm can be unified under a majorization-minimization framework [22]. Later, the IRL1 algorithm for optimization problems with general nonconvex and nonsmooth sparsity-inducing terms was explored in [32], and its global and local convergence for the $\ell_p$-norm regularized model were studied in [31] and [30], respectively.

In this paper, we focus on the PiE function $f_{\sigma}:\mathbb{R}\to\mathbb{R}$ with shape parameter $\sigma>0$, defined by

$f_{\sigma}(x)=1-e^{-\frac{|x|}{\sigma}},\ \text{ for any }x\in\mathbb{R},$   (1)

which is one of the nonconvex surrogates of the $\ell_0$-norm. It is also called an exponential-type penalty [11, 14, 32, 41] or a Laplacian function [29, (16)], and it has been successfully applied in support vector machines [3, 10], zero-one programming problems [18, 25], image reconstruction [29, 39], compressed sensing [6, 14, 19], low-rank matrix completion [34], etc. Due to the nonconvexity of PiE, the IRL1 algorithm has long been adopted in a large volume of references to approximate the proximal operator of PiE [3, 34, 41, 42]. Recently, the IRL1 algorithm for computing the proximal operator of PiE was adopted in [34, (3.19)] for matrix completion. However, the expression of the proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$ for PiE was originally and partially studied by Malek-Mohammadi et al. [19] in 2016, and then systematically explored by Liu, Zhou, and Lin [16] using the Lambert W function. Motivated by the analysis of the relation between the IRL1 solution for the log-sum penalty and its proximal operator in [23], we explore the relation between the IRL1 solution and the proximal operator for PiE, and then show how to select a suitable initial point in the IRL1 algorithm so that the IRL1 solution is consistent with the proximal operator of PiE.

The remainder of the paper is outlined as follows. In Section 2, we recall the existing characterizations of $\mathrm{Prox}_{\lambda f_{\sigma}}$ in terms of the Lambert W function. With this, we show in Theorems 3.3 and 3.4 of Section 3 that the iteratively reweighted $\ell_1$ solution does not belong to the proximal operator of PiE in certain regions, which can be explicitly determined in terms of $\sigma$, the initial value, and the regularization parameter $\lambda$, as shown in Fig. 2 later. To remedy this issue, the initial value is set adaptively, as in Theorems 3.6 and 3.8, to ensure that the IRL1 solution belongs to the proximal operator of PiE. The necessary lemmas and the proofs of Theorems 3.3 and 3.4 are presented in Section 4. Conclusions are drawn in the final section.

2 Existing characterizations for $\mathrm{Prox}_{\lambda f_{\sigma}}$

Let us recall the expression of the proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$ of PiE (1), which was systematically explored in [16] by means of the Lambert W function. The Lambert W function $W(x)$ is the set of solutions of the equation

$x=W(x)e^{W(x)},\ \text{ for any }x\in\big[-\tfrac{1}{e},+\infty\big).$

The function $W(x)$ is single-valued for $x\geq 0$ or $x=-\frac{1}{e}$, and is double-valued for $-\frac{1}{e}<x<0$ (see Fig. 1). To discriminate between the two branches when $-\frac{1}{e}<x<0$, we use the same notation as in [21, Section 1.5] and denote the branch satisfying $W(x)\geq -1$ by $W_{0}(x)$ and the branch satisfying $W(x)\leq -1$ by $W_{-1}(x)$. The Lambert W function is available as a built-in function in Python (https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.lambertw.html). Lemma 2.1 below gives the monotonicity of the two branches. The reader can refer to the recent monograph [21, Section 1.5] for more details on the Lambert W function.
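As a quick illustration, the following sketch (assuming SciPy is available) evaluates both real branches via scipy.special.lambertw and checks the defining equation; the test point $x=-0.2$ is an arbitrary choice in $(-\frac{1}{e},0)$.

```python
import numpy as np
from scipy.special import lambertw

# Evaluate both real branches of the Lambert W function at a point in (-1/e, 0).
x = -0.2                       # arbitrary test point in (-1/e, 0)
w0 = lambertw(x, k=0).real     # principal branch W_0(x) >= -1
wm1 = lambertw(x, k=-1).real   # lower branch W_{-1}(x) <= -1

# Both branches satisfy the defining equation w * exp(w) = x.
assert np.isclose(w0 * np.exp(w0), x)
assert np.isclose(wm1 * np.exp(wm1), x)
print(w0, wm1)                 # approximately -0.2592 and -2.5426
```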

Lemma 2.1

[21, Section 1.6] The Lambert W function $W_{0}(x)$ is strictly increasing on $[-\frac{1}{e},0)$, while $W_{-1}(x)$ is strictly decreasing on $[-\frac{1}{e},0)$.

Figure 1: Two main branches of the Lambert W function.

The characterizations of the proximal operator of PiE (1) were presented in [16, Section 2], split into two cases: $\lambda\leq\sigma^{2}$ and $\lambda>\sigma^{2}$. For the sake of completeness, we list those characterizations as follows.

Lemma 2.2

Let $\lambda\leq\sigma^{2}$ and $\tau\in\mathbb{R}$. It holds that

$\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)=\begin{cases}\{0\},&\text{ if }|\tau|\leq\frac{\lambda}{\sigma},\\ \{\mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ otherwise},\end{cases}$

where $x_{1}(\tau):=\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{|\tau|}{\sigma}}\big)+|\tau|$.
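Since $\mathrm{Prox}_{\lambda f_{\sigma}}$ is single-valued in this regime, Lemma 2.2 translates directly into code. A minimal sketch, assuming SciPy's lambertw; the function name prox_pie and the grid-search sanity check with $\lambda=1$, $\sigma=2$, $\tau=1.2$ are our own illustrative choices.

```python
import numpy as np
from scipy.special import lambertw

def prox_pie(tau, lam, sigma):
    """Proximal operator of lam * (1 - exp(-|x|/sigma)) for lam <= sigma**2,
    following Lemma 2.2 (single-valued in this regime)."""
    assert lam <= sigma**2
    if abs(tau) <= lam / sigma:
        return 0.0
    # x_1(tau) = sigma * W_0(-(lam/sigma^2) * exp(-|tau|/sigma)) + |tau|
    x1 = sigma * lambertw(-(lam / sigma**2) * np.exp(-abs(tau) / sigma), k=0).real + abs(tau)
    return np.sign(tau) * x1

# Sanity check against a direct grid search on the one-dimensional objective.
tau, lam, sigma = 1.2, 1.0, 2.0
xs = np.linspace(-5.0, 5.0, 200001)
obj = lam * (1 - np.exp(-np.abs(xs) / sigma)) + 0.5 * (xs - tau)**2
print(prox_pie(tau, lam, sigma), xs[np.argmin(obj)])  # both approximately 0.878
```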

Lemma 2.3

Let $\lambda>\sigma^{2}$ and $\tau\in\mathbb{R}$. It holds that

$\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)=\begin{cases}\{0\},&\text{ if }|\tau|<\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\\ \mathrm{sign}(\tau)\arg\min_{x\in\{0,\,x_{1}(\tau)\}}\widehat{L}(x,\tau),&\text{ if }\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq|\tau|\leq\frac{\lambda}{\sigma},\\ \{\mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ otherwise},\end{cases}$

where $\widehat{L}(x,\tau):=\lambda(1-e^{-\frac{x}{\sigma}})+\frac{1}{2}(x-|\tau|)^{2}$ and $x_{1}(\tau)$ is defined as in Lemma 2.2.

Lemma 2.3 can be further reduced to the following result, which shows that $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ is single-valued except at a point $\bar{\tau}_{\lambda,\sigma}$ depending only on $\lambda$ and $\sigma$. This conclusion will be used in the proof of Theorem 3.4.

Lemma 2.4

Let $\lambda>\sigma^{2}$ and $\tau\in\mathbb{R}$. Then

$\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)=\begin{cases}\{0\},&\text{ if }|\tau|<\bar{\tau}_{\lambda,\sigma},\\ \{0,\ \mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ if }|\tau|=\bar{\tau}_{\lambda,\sigma},\\ \{\mathrm{sign}(\tau)\,x_{1}(\tau)\},&\text{ otherwise},\end{cases}$

where $\bar{\tau}_{\lambda,\sigma}=x^{*}+\frac{\lambda}{\sigma}e^{-\frac{x^{*}}{\sigma}}$, with $x^{*}\in(0,\sqrt{2\lambda})$ the solution on $(0,\infty)$ of the equation $\frac{1}{2}+\lambda\,\frac{(\frac{x}{\sigma}+1)e^{-\frac{x}{\sigma}}-1}{x^{2}}=0$, and $x_{1}(\tau)$ is defined as in Lemma 2.2.

Obviously, according to Lemmas 2.3 and 2.4, the threshold $\bar{\tau}_{\lambda,\sigma}$ satisfies

$\sigma(1+\ln\tfrac{\lambda}{\sigma^{2}})\leq\bar{\tau}_{\lambda,\sigma}\leq\tfrac{\lambda}{\sigma}.$

These three thresholds will be used frequently when we explore the iteratively reweighted $\ell_1$ algorithm for computing $\mathrm{Prox}_{\lambda f_{\sigma}}$ in the next section.

3 Analysis of IRL1 for computing $\mathrm{Prox}_{\lambda f_{\sigma}}$

In this section, we analyze the IRL1 algorithm for solving the following problem:

$\min_{x\in\mathbb{R}}\Big\{\lambda f_{\sigma}(x)+\frac{1}{2}(x-\tau)^{2}\Big\}.$   (2)

To solve problem (2), the nonconvex function $f_{\sigma}$ in the IRL1 algorithm is locally approximated by its linear expansion, namely,

$f_{\sigma}(x)\approx f_{\sigma}(x^{(k)})+\frac{1}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}(|x|-|x^{(k)}|),$

where $x^{(k)}$ denotes the $k$-th iterate. With it, the next iterate $x^{(k+1)}$ for a given $\tau$ is computed by

$x^{(k+1)}:=\arg\min_{x\in\mathbb{R}}\Big\{\frac{1}{2}(x-\tau)^{2}+\lambda\Big(f_{\sigma}(x^{(k)})+\frac{1}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}(|x|-|x^{(k)}|)\Big)\Big\}.$

By removing the terms that do not depend on the variable $x$ in the above expression, we obtain

$x^{(k+1)}=\arg\min_{x\in\mathbb{R}}\Big\{\frac{1}{2}(x-\tau)^{2}+\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}|x|\Big\}=\mathrm{Prox}_{\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}|\cdot|}(\tau),$

that is,

$x^{(k+1)}=\mathrm{sign}(\tau)\Big(|\tau|-\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}\Big)_{+},$

where $(t)_{+}:=\max\{0,t\}$.
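In other words, each IRL1 step solves a weighted $\ell_1$ proximal subproblem by soft-thresholding with weight $w_{k}=\frac{\lambda}{\sigma}e^{-\frac{|x^{(k)}|}{\sigma}}$. A minimal sketch of this subproblem solver (the function name soft_threshold is ours):

```python
import numpy as np

def soft_threshold(tau, w):
    """Proximal operator of x -> w * |x| (soft-thresholding): the convex
    subproblem solved at every IRL1 step, with w = (lam/sigma) * exp(-|x_k|/sigma)."""
    return np.sign(tau) * max(abs(tau) - w, 0.0)
```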

It suffices to restrict our discussion to $\tau>0$, since $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ is symmetric about the origin [16, Lemma 2.1] and $\mathrm{Prox}_{\lambda f_{\sigma}}(0)=\{0\}$. More precisely, the IRL1 algorithm for PiE with $\tau>0$ is described in Algorithm 1.

Algorithm 1 Iteratively Reweighted $\ell_1$ Algorithm (IRL1)

Input: Fix $\lambda>0$ and $\sigma>0$. Given $x^{(0)}\geq 0$ and $\tau>0$.

  • for $k=0,1,\dots$ do

    $x^{(k+1)}=\Big(\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}\Big)_{+}$   (3)

  • end for

Output: $x^{(\infty)}$
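For reference, a direct Python sketch of Algorithm 1 follows; the stopping tolerance and the iteration cap are practical choices of ours rather than part of the algorithm, whose formal output is the limit $x^{(\infty)}$.

```python
import numpy as np

def irl1_pie(tau, lam, sigma, x0, tol=1e-12, max_iter=100000):
    """IRL1 iteration (3) for tau > 0:
    x_{k+1} = (tau - (lam/sigma) * exp(-x_k / sigma))_+ .
    Returns an approximation of the limit x^(infinity)."""
    x = x0
    for _ in range(max_iter):
        x_new = max(tau - (lam / sigma) * np.exp(-x / sigma), 0.0)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```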

Denote $F(x):=\lambda f_{\sigma}(x)+\frac{1}{2}(x-\tau)^{2}$. We call $x$ a critical point of the function $F$ if $0\in\partial F(x)$, where $\partial F(x)$ denotes the subdifferential of $F$ at $x$ [26, Definition 8.3]. Ochs et al. [22] pointed out that the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to a critical point of the function $F$. We go one step further and show that not only is the sequence $\{x^{(k)}\}$ convergent, but also its limit $x^{(\infty)}$ depends on the initialization $x^{(0)}$ and on the relationship of $\tau$ with the parameters $\lambda$ and $\sigma$. The convergence behavior of (3) is described by Lemmas 4.1–4.7 in Section 4. This is then compared to the true solution set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ in Theorems 3.3 and 3.4. In particular, we identify the intervals where (3) will not achieve the true solution. These intervals are explicitly determined in terms of the initial value $x^{(0)}$ and the parameters $\lambda$ and $\sigma$.

Notice that $x^{(\infty)}$ satisfies the equation $x=(\tau-\frac{\lambda}{\sigma}e^{-\frac{x}{\sigma}})_{+}$ by (3). To further investigate the properties of $x^{(\infty)}$, for a given $\tau\in\mathbb{R}$ we define a function $\phi:\mathbb{R}\to\mathbb{R}$ by

$\phi(x):=\tau-x-\frac{\lambda}{\sigma}e^{-\frac{x}{\sigma}},\ \text{ for any }x\in\mathbb{R},$   (4)

and its main properties used later are listed in the following lemma.

Lemma 3.1

Let $\phi$ be defined by (4). Write $x_{2}(\tau):=\sigma W_{-1}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$, and let $x_{1}(\tau)$ be defined as in Lemma 2.2. Then the following statements hold.

  • (i)

    The function $\phi$ is strictly increasing on $(-\infty,\sigma\ln\frac{\lambda}{\sigma^{2}}]$ and strictly decreasing on $(\sigma\ln\frac{\lambda}{\sigma^{2}},+\infty)$. Moreover, $\phi(x)\leq\phi(\sigma\ln\frac{\lambda}{\sigma^{2}})=\tau-\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ for any $x\in\mathbb{R}$.

  • (ii)

    If $\tau\in(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$ with

    $\begin{cases}0<x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau),&\text{ if }\lambda>\sigma^{2},\\ x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)<0,&\text{ if }\lambda\leq\sigma^{2}.\end{cases}$   (7)
  • (iii)

    If $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, the equation $\phi(x)=0$ has a unique solution, that is, $x_{1}(\tau)=x_{2}(\tau)=\sigma\ln\frac{\lambda}{\sigma^{2}}$.

  • (iv)

    If $\tau>\frac{\lambda}{\sigma}$, the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$ satisfying $x_{2}(\tau)<0<x_{1}(\tau)$.

  • (v)

    If $\tau=\frac{\lambda}{\sigma}$, the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$ with

    $\begin{cases}0=x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau),&\text{ if }\lambda>\sigma^{2},\\ x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)=0,&\text{ if }\lambda<\sigma^{2},\\ x_{1}(\tau)=x_{2}(\tau)=0,&\text{ if }\lambda=\sigma^{2}.\end{cases}$   (11)

Proof: A simple calculation gives $\phi^{\prime}(x)=\frac{\lambda}{\sigma^{2}}e^{-\frac{x}{\sigma}}-1$ and $\phi^{\prime\prime}(x)=-\frac{\lambda}{\sigma^{3}}e^{-\frac{x}{\sigma}}$. Clearly, statement (i) holds. The equation $\phi(x)=0$ is equivalent to $x=\tau-\frac{\lambda}{\sigma}e^{-\frac{x}{\sigma}}$, namely,

$\frac{x-\tau}{\sigma}e^{\frac{x-\tau}{\sigma}}=-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}.$   (12)

If $\tau>\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, then $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$. By the definition of the Lambert W function and equation (12), the equation $\phi(x)=0$ has two solutions $x_{1}(\tau)$ and $x_{2}(\tau)$. Together with (i) and the fact $\phi(0)=\tau-\frac{\lambda}{\sigma}$, we know that statements (ii) and (iv) hold. When $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}=-\frac{1}{e}$. Hence, we obtain

$W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W\big(-\tfrac{1}{e}\big)=-1,$

which implies $x_{1}(\tau)=x_{2}(\tau)=\tau-\sigma=\sigma\ln\frac{\lambda}{\sigma^{2}}$. Statement (iii) holds. In the following, we argue statement (v). Notice that for $\lambda\neq\sigma^{2}$, $\tau=\frac{\lambda}{\sigma}>\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ and hence $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$. So the equation $\phi(x)=0$ has solutions $x_{1}(\tau)$ and $x_{2}(\tau)$. From (i), it follows that

$x_{2}(\tau)<\sigma\ln\tfrac{\lambda}{\sigma^{2}}<x_{1}(\tau).$   (13)

We will proceed in two cases.

Case 1: $\lambda\neq\sigma^{2}$. If $\lambda>\sigma^{2}$, then $-\frac{\lambda}{\sigma^{2}}<-1$. With $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$, we know that $W_{-1}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=-\frac{\lambda}{\sigma^{2}}$. Together with $\tau=\frac{\lambda}{\sigma}$, this yields

$x_{2}(\tau)=\sigma W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau=\tau+\sigma W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=\tau-\tfrac{\lambda}{\sigma}=0.$

Again from (13), it follows that $0=x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)$.

If $\lambda<\sigma^{2}$, then $-\frac{\lambda}{\sigma^{2}}>-1$. With $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\in(-\frac{1}{e},0)$, we know that $W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=-\frac{\lambda}{\sigma^{2}}$. Together with $\tau=\frac{\lambda}{\sigma}$, this yields

$x_{1}(\tau)=\sigma W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau=\tau+\sigma W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\lambda}{\sigma^{2}}}\big)=\tau-\tfrac{\lambda}{\sigma}=0.$

Again from (13), it follows that $x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)=0$.

Case 2: $\lambda=\sigma^{2}$. Now $-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}=-\frac{1}{e}$ by $\tau=\frac{\lambda}{\sigma}$. Hence,

$W_{-1}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W_{0}\big(-\tfrac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)=W\big(-\tfrac{1}{e}\big)=-1,$

which implies that $\phi(x)=0$ has a unique solution $x_{1}(\tau)=x_{2}(\tau)=0$ by (12).   $\Box$

Proposition 3.2

Given $\tau>0$ and an initial value $x^{(0)}\geq 0$. Suppose that the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $x^{(\infty)}$. Then the following statements hold.

  • (i)

    $x^{(\infty)}=0$ implies that $\tau\leq\frac{\lambda}{\sigma}$.

  • (ii)

    If $\tau>\frac{\lambda}{\sigma}$, then $x^{(\infty)}=\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$.

Proof: By the continuity of the function $(\cdot)_{+}$, the convergence $x^{(k)}\to x^{(\infty)}$, and equation (3) for each $k$, we know that

$x^{(\infty)}=\big(\tau-\tfrac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}\big)_{+}.$   (14)

If $x^{(\infty)}=0$, then $(\tau-\frac{\lambda}{\sigma})_{+}=0$ from (14), which implies that $\tau\leq\frac{\lambda}{\sigma}$. Hence, statement (i) holds. If $\tau>\frac{\lambda}{\sigma}$, then $x^{(\infty)}>0$ from (i), and $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$ from (14), namely, $\phi(x^{(\infty)})=0$, where $\phi$ is defined by (4). So $x^{(\infty)}=\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$ from Lemma 3.1 (iv).   $\Box$

3.1 Comparing the IRL1 solution with $\mathrm{Prox}_{\lambda f_{\sigma}}$

In this subsection, we identify when the limit $x^{(\infty)}$ of the sequence $\{x^{(k)}\}$ belongs to the set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ and when it does not. We recall from Lemmas 2.2 and 2.4 that the set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ has a unique element except for $|\tau|=\bar{\tau}_{\lambda,\sigma}$ with $\lambda>\sigma^{2}$.

The following two theorems summarize our main results. Our results for PiE are mainly inspired by the ideas presented in [23, Section 4] for the iteratively reweighted algorithm for computing the proximal operator of the log-sum penalty. The proofs, as well as the relevant technical lemmas, are given in Section 4. From now on, we say that a sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ provided that the limit of $\{x^{(k)}\}$ belongs to the set $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$.

Theorem 3.3

Given $\tau>0$ and an initial value $x^{(0)}\geq 0$. Let $\lambda\leq\sigma^{2}$. Then the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$.

If $\lambda>\sigma^{2}$, the sequence $\{x^{(k)}\}$ generated by Algorithm 1 may not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for some given $x^{(0)}\geq 0$. The regions where the algorithm fails depend on the threshold $\bar{\tau}_{\lambda,\sigma}$ given as in Lemma 2.4 and on $x_{2}(\tau)$ defined in Lemma 3.1, as shown in Fig. 2. The value $\bar{\tau}_{\lambda,\sigma}$ can be computed by the bisection method. Since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$ by Lemma 2.1, we denote the inverse function of $x_{2}(\tau)$ by $x_{2}^{-1}$ on this interval.
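A bisection sketch for $\bar{\tau}_{\lambda,\sigma}$ following Lemma 2.4 is given below; the left endpoint, tolerance, and function name tau_bar are our own choices, and for $\lambda$ only slightly larger than $\sigma^{2}$ the evaluation of $g$ near $0$ may need a series expansion to avoid cancellation.

```python
import numpy as np

def tau_bar(lam, sigma, tol=1e-12):
    """Threshold tau_bar of Lemma 2.4 (requires lam > sigma**2), via bisection on
    g(x) = 1/2 + lam * ((x/sigma + 1) * exp(-x/sigma) - 1) / x**2, which is
    negative near 0 and positive at sqrt(2 * lam)."""
    assert lam > sigma**2
    g = lambda x: 0.5 + lam * ((x / sigma + 1) * np.exp(-x / sigma) - 1) / x**2
    lo, hi = 1e-8, np.sqrt(2 * lam)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    return x_star + (lam / sigma) * np.exp(-x_star / sigma)

print(tau_bar(2.0, 1.0))   # approximately 1.7638
```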

Theorem 3.4

Given $\tau>0$ and an initial value $x^{(0)}\geq 0$. Let $\lambda>\sigma^{2}$, let $\bar{\tau}_{\lambda,\sigma}$ be defined as in Lemma 2.4, let $x_{i}(\tau)$ ($i=1,2$) be defined as in Lemma 3.1, and let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then the following statements hold.

  • (i)

The sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))\cup(\frac{\lambda}{\sigma},+\infty)$.

  • (ii)

If $x^{(0)}\geq\sigma\ln\frac{\lambda}{\sigma^{2}}$, then $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$. Consequently, $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$; however, $\{x^{(k)}\}$ does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma})$.

  • (iii)

If $x_{2}(\bar{\tau}_{\lambda,\sigma})<x^{(0)}<\sigma\ln\frac{\lambda}{\sigma^{2}}$, the sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)}))\cup[\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$, but does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[x_{2}^{-1}(x^{(0)}),\bar{\tau}_{\lambda,\sigma})$.

  • (iv)

If $x^{(0)}=x_{2}(\bar{\tau}_{\lambda,\sigma})$, the sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma})\cup(\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$; however, the sequence $\{x^{(k)}\}$ does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ when $\tau=\bar{\tau}_{\lambda,\sigma}$.

  • (v)

If $0\leq x^{(0)}<x_{2}(\bar{\tau}_{\lambda,\sigma})$, the sequence $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma}]\cup(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma}]$, but does not converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\in(\bar{\tau}_{\lambda,\sigma},x_{2}^{-1}(x^{(0)})]$.

Remark 3.5

The initial value for IRL1 is usually simply set to $1$ for compressed sensing [5, Subsection 2.2], to a random feasible value for support vector machines [4, Subsection 2.1], and to the identity matrix for a low-rank matrix completion problem [41, Algorithm 1]. By Theorem 3.4, such choices may result in a deviation between the IRL1 solution and the proximal operator of PiE.

Fig. 2 illustrates the results (i)–(v) of Theorem 3.4 with $\tau>0$. Only when $\tau$ lies in a subset of the interval $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$ does the deviation occur. The colored regions indicate where the IRL1 solution differs from the proximal operator of PiE. For example, let $\lambda=2$ and $\sigma=1$. Then $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})=1+\ln 2$, $\bar{\tau}_{\lambda,\sigma}=1.7638$, $\frac{\lambda}{\sigma}=2$, $\sigma\ln\frac{\lambda}{\sigma^{2}}=\ln 2$, $x_{2}(\bar{\tau}_{\lambda,\sigma})=0.3393$, and $x_{1}(\bar{\tau}_{\lambda,\sigma})=1.094$. In this case, given an initial value $x^{(0)}=1>\sigma\ln\frac{\lambda}{\sigma^{2}}$, the IRL1 solution (red dash-dot) and the true proximal operator (black dashed) are illustrated in Fig. 3, which corresponds to the case of Theorem 3.4 (ii). Clearly, the IRL1 solution disagrees with the true proximal operator for any given $\tau\in[1+\ln 2,1.7638)$.
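This worked example can be reproduced with the sketches above (assuming irl1_pie and tau_bar have been defined); the test point $\tau=1.72$ is an arbitrary choice inside the failure interval $[1+\ln 2,\bar{\tau}_{\lambda,\sigma})$.

```python
# Reproducing the worked example with the sketches above.
lam, sigma = 2.0, 1.0
print(tau_bar(lam, sigma))                  # approximately 1.7638
tau = 1.72                                  # lies in [1 + ln 2, tau_bar) ~ [1.6931, 1.7638)
print(irl1_pie(tau, lam, sigma, x0=1.0))    # converges to x_1(tau), approximately 0.94 > 0,
# although the true proximal operator at this tau is {0} by Lemma 2.4.
```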

Figure 2: Illustration of Theorem 3.4 with $\tau>0$. The specific regions where the IRL1 solution differs from the proximal operator of PiE in (ii)–(v) are marked in blue, red, yellow, and green, respectively.

Figure 3: The proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$ and the IRL1 solution in Theorem 3.4 with $\lambda=2$, $\sigma=1$, and the initial value $x^{(0)}=1$.

3.2 Choices of initial values

We are devoted to adaptively selecting an initial value in a simple way that guarantees fast convergence of the IRL1 solution to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for all $\tau>0$. The discussion is divided into two cases: $\lambda\leq\sigma^{2}$ and $\lambda>\sigma^{2}$.

Theorem 3.6

Given $\tau>0$. Suppose that $\lambda\leq\sigma^{2}$ and the sequence $\{x^{(k)}\}$ is generated by Algorithm 1 with the initial value

$x^{(0)}:=\begin{cases}0,&\text{ if }\tau\leq\frac{\lambda}{\sigma},\\ \tau,&\text{ otherwise}.\end{cases}$   (15)

Then, the following statements hold.

  • (i)

If $0<\tau\leq\frac{\lambda}{\sigma}$, then $x^{(k)}=0$ for each $k\in\mathbb{N}$.

  • (ii)

If $\tau>\frac{\lambda}{\sigma}$, it holds that

    $\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau))<x^{(k)}-x_{1}(\tau)<\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau)),$   (16)

    where $x_{1}(\tau)$ is defined as in Lemma 2.2.

Proof: If $0<\tau\leq\frac{\lambda}{\sigma}$, then $x^{(0)}=0$ and statement (i) is trivial by Lemma 4.1 (i). Now suppose $\tau>\frac{\lambda}{\sigma}$. Then $x^{(0)}=\tau>0$. Therefore, by Lemma 4.3, $x^{(k)}>0$ for each $k\in\mathbb{N}$ and $\{x^{(k)}\}$ converges to $x_{1}(\tau)$. Notice that $\tau=x_{1}(\tau)+\frac{\lambda}{\sigma}e^{-\frac{x_{1}(\tau)}{\sigma}}$ by Lemma 3.1 (iv). With (3) and the Lagrange mean value theorem, we arrive at

$x^{(k+1)}-x_{1}(\tau)=\frac{\lambda}{\sigma}\Big(e^{-\frac{x_{1}(\tau)}{\sigma}}-e^{-\frac{x^{(k)}}{\sigma}}\Big)=\frac{\lambda}{\sigma^{2}}e^{-\frac{\xi}{\sigma}}(x^{(k)}-x_{1}(\tau)),$   (17)

for some $\xi\in(x_{1}(\tau),x^{(k)})\subseteq(x_{1}(\tau),\tau)$. Note that $e^{-\frac{t}{\sigma}}$ is strictly decreasing in $t$. By (17), it follows that

$\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}(x^{(k)}-x_{1}(\tau))<x^{(k+1)}-x_{1}(\tau)<\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}(x^{(k)}-x_{1}(\tau)).$

Repeating this process yields (16). The proof is hence complete.   $\Box$

Remark 3.7

A sequence $\{y^{(k)}\}\subseteq\mathbb{R}$ is said to converge Q-linearly to a point $\bar{y}$ if there exists $c\in(0,1)$ such that $\lim_{k\to+\infty}|y^{(k+1)}-\bar{y}|/|y^{(k)}-\bar{y}|=c$. Equation (17) in Theorem 3.6 shows that $x^{(k)}$ converges Q-linearly to $x_{1}(\tau)$ for each $\tau>\frac{\lambda}{\sigma}$ when $\lambda\leq\sigma^{2}$.

For (16), note that $\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}<\frac{\lambda}{\sigma^{2}}e^{0}\leq 1$ by Lemma 3.1 (iv). Moreover, $x_{1}(\tau)$ is increasing on $(\frac{\lambda}{\sigma},+\infty)$ by Lemma 2.1, and $x_{1}(\tau)\to 0$ as $\tau\to\frac{\lambda}{\sigma}$. Let $\lambda=1$ and $\sigma=2$ in Theorem 3.6. For fixed $k\in\{1,2,3,4\}$, the iterate $x^{(k)}$, the proximal operator $\mathrm{Prox}_{\lambda f_{\sigma}}$, and the corresponding error function $x^{(k)}-\mathrm{Prox}_{\lambda f_{\sigma}}$ for $\tau>0$ are illustrated in Fig. 4.

Figure 4: Illustration of Theorem 3.6 with $\lambda=1$ and $\sigma=2$ for any $\tau\in(0,1.5]$. (a) The true proximal operator and the IRL1 iterates $x^{(k)}$ with $k=1,2,3,4$; (b) the error function.

Next, we consider the case $\lambda>\sigma^{2}$. As a direct consequence of Theorem 3.4, if we fix the initial value $x^{(0)}\geq 0$ (for example, $x^{(0)}=\tau$ or $x^{(0)}=0$) for all $\tau>0$, then $\{x^{(k)}\}$ generated by Algorithm 1 fails to converge to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for at least one $\tau>0$. To resolve this, the initial value $x^{(0)}\geq 0$ is chosen depending on $\bar{\tau}_{\lambda,\sigma}$. A simple choice of $x^{(0)}$ is suggested below.

Theorem 3.8

Given $\tau>0$. Let $\lambda>\sigma^{2}$ and let the initial value $x^{(0)}$ be given by

$x^{(0)}:=\begin{cases}0,&\text{ if }\tau\leq\bar{\tau}_{\lambda,\sigma},\\ \tau,&\text{ otherwise}.\end{cases}$   (18)

Then, the following statements hold.

  • (i)

The sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau>0$.

  • (ii)

If $0<\tau\leq\bar{\tau}_{\lambda,\sigma}$, then $x^{(k)}=0$ for each $k$.

  • (iii)

If $\tau>\bar{\tau}_{\lambda,\sigma}$, it holds that

    $\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau))<x^{(k)}-x_{1}(\tau)<\Big(\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}\Big)^{k}(\tau-x_{1}(\tau)),$

    where $x_{1}(\tau)$ is defined as in Lemma 2.2.

Proof: Suppose $\tau\leq\bar{\tau}_{\lambda,\sigma}$. Then $x^{(0)}=0$. By Theorem 3.4 (i) and (v), $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau\leq\bar{\tau}_{\lambda,\sigma}$. If $\tau>\bar{\tau}_{\lambda,\sigma}$, then $\tau>\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ and further $x^{(0)}=\tau>\sigma\ln\frac{\lambda}{\sigma^{2}}$. By Theorem 3.4 (i) and (ii), $\{x^{(k)}\}$ converges to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ for any $\tau>\bar{\tau}_{\lambda,\sigma}$. So, statement (i) holds. Statement (ii) follows from Lemma 4.1 (i) and the fact $\bar{\tau}_{\lambda,\sigma}\leq\frac{\lambda}{\sigma}$. Now suppose that $\tau>\bar{\tau}_{\lambda,\sigma}$. Then $x^{(0)}=\tau$. Notice that $\phi(\tau)<0$, where $\phi$ is defined by (4). Then $x_{1}(\tau)<\tau$ by Lemma 3.1 and $\lambda>\sigma^{2}$. Combining the proof of Lemma 4.7 (iii) for $x^{(0)}>x_{1}(\tau)$ with Lemma 4.3, we know that $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}$ and $x^{(k)}>x^{(k+1)}>x_{1}(\tau)$ for each $k$. The rest of the proof is similar to the last part of Theorem 3.6 and is omitted.   $\Box$
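Combining (15) and (18), the adaptive initialization reduces to a thin wrapper around the iteration; a sketch reusing irl1_pie and tau_bar from above:

```python
def irl1_prox_pie(tau, lam, sigma):
    """IRL1 with the adaptive initial value of Theorems 3.6 and 3.8:
    x0 = 0 at or below the threshold and x0 = tau above it, so that the
    limit belongs to Prox_{lam f_sigma}(tau) for every tau > 0."""
    threshold = lam / sigma if lam <= sigma**2 else tau_bar(lam, sigma)
    x0 = 0.0 if tau <= threshold else tau
    return irl1_pie(tau, lam, sigma, x0)
```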

Recall that $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq\bar{\tau}_{\lambda,\sigma}\leq\frac{\lambda}{\sigma}$ and that $x_{1}(\tau)$ is strictly increasing for $\tau\geq\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$. Observe that $x_{1}(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))=\sigma\ln\frac{\lambda}{\sigma^{2}}$. It follows that for any $\tau>\bar{\tau}_{\lambda,\sigma}$,

$\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\tau)}{\sigma}}<\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\bar{\tau}_{\lambda,\sigma})}{\sigma}}\leq\frac{\lambda}{\sigma^{2}}e^{-\frac{x_{1}(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))}{\sigma}}=\frac{\lambda}{\sigma^{2}}e^{-\ln\frac{\lambda}{\sigma^{2}}}=1.$

Given the initial value $x^{(0)}$ as in Theorem 3.8, let $\lambda=2$ and $\sigma=1$. For fixed $k\in\{2,4,6,8\}$, the iterate $x^{(k)}$, $\mathrm{Prox}_{\lambda f_{\sigma}}$, and the corresponding error function $x^{(k)}-\mathrm{Prox}_{\lambda f_{\sigma}}$ for $\tau>0$ are given in Fig. 5.

Figure 5: Illustration of Theorem 3.8 with $\lambda=2$ and $\sigma=1$ for any $\tau\in(0,3]$. (a) The true proximal operator and the IRL1 iterates $x^{(k)}$ with $k=2,4,6,8$; (b) the error function.

4 Proof of Theorems 3.3 and 3.4

To start with, we present several technical lemmas describing the convergence of Algorithm 1. Then the limit of the sequence generated by this algorithm is compared directly to $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ with $\tau>0$.

Lemma 4.1

Given $\tau>0$ and $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Suppose that there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$. Then

  • (i)

    If $\tau\leq\frac{\lambda}{\sigma}$, then $x^{(k)}=0$ for any $k\geq k_{0}$.

  • (ii)

    If $\tau>\frac{\lambda}{\sigma}$, then $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\geq k_{0}$. Moreover, $x^{(k+1)}>x^{(k)}$ for any $k\geq k_{0}$.

  • (iii)

    If $\tau>\frac{\lambda}{\sigma}$, the sequence $\{x^{(k)}\}$ converges to $\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$.

Proof: From (3), it holds that

$x^{(k_{0}+1)}=\Big(\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k_{0})}}{\sigma}}\Big)_{+}=\Big(\tau-\frac{\lambda}{\sigma}\Big)_{+}.$   (19)

Clearly, if $\tau\leq\frac{\lambda}{\sigma}$, then $x^{(k_{0}+1)}=0$ by (19), yielding $x^{(k)}=0$ for any $k\geq k_{0}+1$. If $\tau>\frac{\lambda}{\sigma}$, then $x^{(k_{0}+1)}=\tau-\frac{\lambda}{\sigma}>0$ by (19). With $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k_{0}+1)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}e^{0}>0$ and (3), one has

$x^{(k_{0}+2)}=\big(\tau-\tfrac{\lambda}{\sigma}e^{-\frac{x^{(k_{0}+1)}}{\sigma}}\big)_{+}>0,$

yielding $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\geq k_{0}$. Moreover, notice that $x^{(k_{0}+2)}>x^{(k_{0}+1)}$. Together with $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\geq k_{0}$ and the monotonic increase of the function $h(t):=\tau-\frac{\lambda}{\sigma}e^{-t}$, we know that $x^{(k+1)}>x^{(k)}$ for any $k\geq k_{0}$. Hence, statements (i) and (ii) hold.

By (ii) and $\tau-\frac{\lambda}{\sigma}<x^{(k)}<\tau$ for each $k>k_{0}$, the sequence $\{x^{(k)}\}$ is increasing and bounded, and hence converges. Moreover, it converges to $\sigma W_{0}\big(-\frac{\lambda}{\sigma^{2}}e^{-\frac{\tau}{\sigma}}\big)+\tau$ by Proposition 3.2 (ii).   $\Box$

Lemma 4.2

Given $\tau>0$ and $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. If $x^{(k)}>0$ for any $k\in\mathbb{N}$, then

  • (i)

    $\{x^{(k)}\}$ is strictly increasing and convergent if $x^{(1)}>x^{(0)}$.

  • (ii)

    $\{x^{(k)}\}$ is strictly decreasing and convergent if $x^{(1)}<x^{(0)}$.

  • (iii)

    $\{x^{(k)}\}$ is constant if $x^{(1)}=x^{(0)}$.

Proof: Since $x^{(k)}>0$ for each $k$, (3) gives $x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0$ for any $k\in\mathbb{N}$. If $x^{(1)}>x^{(0)}$, then together with the monotonic increase of the function $h(t):=\tau-\frac{\lambda}{\sigma}e^{-t}$, we know that $x^{(k+1)}>x^{(k)}$ for any $k$. Obviously, $0<x^{(k)}<\tau$ for each $k$. Hence, $\{x^{(k)}\}$ is strictly increasing and convergent, and statement (i) holds. The rest of the proof is similar to (i).   $\Box$

Lemma 4.3

Given $\tau\in[\frac{\lambda}{\sigma},+\infty)$ and $x^{(0)}>0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then $x^{(k)}>0$ for all $k\geq 0$, and the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ defined as in Lemma 2.2.

Proof: Since $\tau\geq\frac{\lambda}{\sigma}$ and $x^{(0)}>0$, it holds that $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}\geq 0$. With (3), we have $x^{(1)}=\big(\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}\big)_{+}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>0$, which yields

$x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0,\ \text{ for any }k\in\mathbb{N}.$

By Lemma 4.2, it suffices to argue that the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$. Notice that $x^{(1)}-x^{(0)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}-x^{(0)}=\phi(x^{(0)})$, where $\phi$ is defined by (4). Obviously, $x_{1}(\tau)\geq 0$ by Lemma 3.1 (iv) and (v). We proceed in three cases.

Case 1: $x^{(0)}=x_{1}(\tau)>0$. Then $\phi(x^{(0)})=0$ by Lemma 3.1 (iv), namely, $x^{(1)}=x^{(0)}$. Hence, $\{x^{(k)}\}$ is constant by Lemma 4.2 (iii), and the desired result obviously holds.

Case 2: $0<x^{(0)}<x_{1}(\tau)$. Now $x_{1}(\tau)>0$. Then $\phi(x^{(0)})>0$ by Lemma 3.1 (i) and the fact $\phi(0)\geq 0$, which implies that $x^{(1)}>x^{(0)}$. Hence, $\{x^{(k)}\}$ is strictly increasing and convergent by Lemma 4.2 (i).

Case 3: $x^{(0)}>x_{1}(\tau)$. Then $\phi(x^{(0)})<0$ by Lemma 3.1 (i), namely, $x^{(1)}<x^{(0)}$. Hence, $\{x^{(k)}\}$ is strictly decreasing and convergent by Lemma 4.2 (ii).

In summary, the sequence $\{x^{(k)}\}$ is convergent; denote its limit by $x^{(\infty)}$. Then $x^{(\infty)}\geq 0$ and $\phi(x^{(\infty)})=0$. So $x^{(\infty)}=x_{1}(\tau)$ by Lemma 3.1 (iv) and (v).   $\Box$

By Lemma 4.1 (iii) and Lemma 4.3, we have the following conclusion.

Corollary 4.4

Given $\tau>\frac{\lambda}{\sigma}$ and $x^{(0)}\geq 0$, the sequence $\{x^{(k)}\}$ generated by Algorithm 1 converges to $x_{1}(\tau)$.

The following lemma proves that $\{x^{(k)}\}$ always converges to $0$ for all $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$ if $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})>0$, namely, if $\frac{\lambda}{\sigma^{2}}>\frac{1}{e}$.

Lemma 4.5

Suppose $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})>0$. Given $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$ and an initial value $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then $\{x^{(k)}\}$ converges to $0$.

Proof: Firstly, we argue that there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$. If not, $x^{(k)}>0$ for all $k\geq 0$. Then $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3) and $x^{(1)}-x^{(0)}=\phi(x^{(0)})$, where $\phi$ is defined by (4). Again from Lemma 3.1 (i) and $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$, it holds that $\phi(x)\leq\phi(\sigma\ln\frac{\lambda}{\sigma^{2}})<0$ for any $x\in\mathbb{R}$. Consequently, $\phi(x^{(0)})<0$, namely, $x^{(1)}<x^{(0)}$. So $\{x^{(k)}\}$ is decreasing and convergent by Lemma 4.2 (ii). Now suppose that $\lim_{k\to\infty}x^{(k)}=x^{(\infty)}$. Then $x^{(\infty)}\geq 0$ and $\phi(x^{(\infty)})=0$, which contradicts $\phi(x^{(\infty)})<0$. Hence, there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$, and then the sequence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i) and the fact $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq\frac{\lambda}{\sigma}$.   $\Box$

The next two lemmas study the convergence of $\{x^{(k)}\}$ for $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$.

Lemma 4.6

Suppose $\lambda\leq\sigma^{2}$. Given $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ and an initial value $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1. Then $\{x^{(k)}\}$ converges to $0$.

Proof: Firstly, we argue that there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$. If not, $x^{(k)}>0$ for all $k\geq 0$. Then $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3) and $x^{(1)}-x^{(0)}=\phi(x^{(0)})$, where $\phi$ is defined by (4). Since $\lambda\leq\sigma^{2}$, $\sigma\ln\frac{\lambda}{\sigma^{2}}\leq 0$. Again from Lemma 3.1 (i), it holds that $\phi(x)\leq\phi(0)=\tau-\frac{\lambda}{\sigma}<0$ for any $x\geq 0$. Consequently, $\phi(x^{(0)})<0$, namely, $x^{(1)}<x^{(0)}$. So $\{x^{(k)}\}$ is decreasing and convergent by Lemma 4.2 (ii). Now suppose that $\lim_{k\to\infty}x^{(k)}=x^{(\infty)}$. Then $x^{(\infty)}\geq 0$ and $\phi(x^{(\infty)})=0$, which contradicts $\phi(x^{(\infty)})<0$. Hence, there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$, and then the sequence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i).   $\Box$

Lemma 4.7

Suppose $\lambda>\sigma^{2}$. Given $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ and an initial value $x^{(0)}\geq 0$. Let the sequence $\{x^{(k)}\}$ be generated by Algorithm 1, and let $x_{1}(\tau)$ and $x_{2}(\tau)$ be defined as in Lemma 3.1. Then the following statements hold.

  • (i)

    If $x^{(0)}\in(0,x_{2}(\tau))$, the sequence $\{x^{(k)}\}$ converges to $0$.

  • (ii)

    If $x^{(0)}=x_{2}(\tau)$, the sequence $\{x^{(k)}\}$ converges to $x_{2}(\tau)$.

  • (iii)

    If $x^{(0)}\in(x_{2}(\tau),+\infty)$, the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$.

Proof: Since $\lambda>\sigma^{2}$ and $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, we have $\phi(\sigma\ln\frac{\lambda}{\sigma\tau})=-\sigma\ln\frac{\lambda}{\sigma\tau}<0$, where $\phi$ is defined by (4). By Lemma 3.1 (i) and (ii), it holds that

$0<\sigma\ln\frac{\lambda}{\sigma\tau}<x_{2}(\tau)\leq\sigma\ln\frac{\lambda}{\sigma^{2}}\leq x_{1}(\tau),$   (20)

for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$.

(i) The proof is divided into two cases: $x^{(0)}\leq\sigma\ln\frac{\lambda}{\sigma\tau}$ and $\sigma\ln\frac{\lambda}{\sigma\tau}<x^{(0)}<x_{2}(\tau)$. If $x^{(0)}\leq\sigma\ln\frac{\lambda}{\sigma\tau}$, then

$x^{(1)}=\big(\tau-\tfrac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}\big)_{+}\leq\big(\tau-\tfrac{\lambda}{\sigma}e^{-\ln\frac{\lambda}{\sigma\tau}}\big)_{+}=(\tau-\tau)_{+}=0,$

and hence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i).

If $\sigma\ln\frac{\lambda}{\sigma\tau}<x^{(0)}<x_{2}(\tau)$, then $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}e^{-\ln\frac{\lambda}{\sigma\tau}}=0$. Hence, $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3), and then $0<x^{(1)}<x^{(0)}$ since $x^{(1)}-x^{(0)}=\phi(x^{(0)})<\phi(x_{2}(\tau))=0$ by Lemma 3.1 (i)–(iii). If there exists $k_{0}\geq 0$ such that $x^{(k_{0})}=0$, then $\{x^{(k)}\}$ converges to $0$ by Lemma 4.1 (i). Otherwise, $x^{(k)}>0$ for all $k\geq 0$. Notice that $x^{(1)}<x^{(0)}$. Thus, the sequence $\{x^{(k)}\}$ is decreasing and convergent by Lemma 4.2 (ii). Moreover, its limit, denoted by $x^{(\infty)}$, satisfies $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$, namely, $\phi(x^{(\infty)})=0$, and $x^{(\infty)}<x^{(0)}<x_{2}(\tau)$, which implies $\phi(x^{(\infty)})<\phi(x_{2}(\tau))=0$, a contradiction. In summary, the sequence $\{x^{(k)}\}$ converges to $0$.

(ii) If $x^{(0)}=x_{2}(\tau)$, then $x^{(0)}>\sigma\ln\frac{\lambda}{\sigma\tau}$ from (20), and $x^{(1)}=x^{(0)}$ as $x^{(1)}-x^{(0)}=\phi(x^{(0)})=\phi(x_{2}(\tau))=0$. In this scenario, the sequence $\{x^{(k)}\}$ is constant and its limit is $x_{2}(\tau)$.

(iii) Let $x^{(0)}\in(x_{2}(\tau),+\infty)$. We know that $x_{2}(\tau)\leq x_{1}(\tau)$ from Lemma 3.1 (ii) and (iii). Moreover, $x^{(0)}>\sigma\ln\frac{\lambda}{\sigma\tau}$ by (20), and $\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}>\tau-\frac{\lambda}{\sigma}e^{-\ln\frac{\lambda}{\sigma\tau}}=0$. Hence, $x^{(1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}$ from (3). If $x^{(0)}<x_{1}(\tau)$, then $x^{(1)}>x^{(0)}$ since $x^{(1)}-x^{(0)}=\phi(x^{(0)})>\phi(x_{2}(\tau))=0$ by Lemma 3.1 (i) and (ii), which yields

$x^{(k+1)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(k)}}{\sigma}}>0,\ \text{ for any }k\in\mathbb{N}.$

Hence, $\{x^{(k)}\}$ is increasing and convergent by Lemma 4.2 (i); its limit satisfies $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$ and must be $x_{1}(\tau)$. If $x^{(0)}=x_{1}(\tau)$, then $x^{(1)}=x^{(0)}$ since $x^{(1)}-x^{(0)}=\phi(x^{(0)})=\phi(x_{1}(\tau))=0$. In this scenario, the sequence $\{x^{(k)}\}$ is constant and its limit is $x_{1}(\tau)$. If $x^{(0)}>x_{1}(\tau)$, then $x^{(1)}<x^{(0)}$ as $\phi(x^{(0)})<\phi(x_{1}(\tau))=0$ by Lemma 3.1 (i) and (ii). We estimate

$x^{(1)}-x_{1}(\tau)=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(0)}}{\sigma}}-x_{1}(\tau)>\tau-\frac{\lambda}{\sigma}e^{-\frac{x_{1}(\tau)}{\sigma}}-x_{1}(\tau)=\phi(x_{1}(\tau))=0,$

which implies $x^{(1)}>x_{1}(\tau)$, and then $x^{(k)}>x^{(k+1)}>x_{1}(\tau)$ for each $k$. Therefore, $\{x^{(k)}\}$ is decreasing and convergent. Its limit satisfies $x^{(\infty)}\geq x_{1}(\tau)$ and $x^{(\infty)}=\tau-\frac{\lambda}{\sigma}e^{-\frac{x^{(\infty)}}{\sigma}}$. Therefore, $x^{(\infty)}$ must be $x_{1}(\tau)$ by Lemma 3.1 (ii) and (iii).   $\Box$

By Lemma 4.7 (ii), (iii) and Lemma 3.1 (iii), we can obtain the following claim.

Corollary 4.8

When $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$ and $\lambda>\sigma^{2}$, the sequence $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ for any $x^{(0)}\in[x_{1}(\tau),+\infty)$, where $x_{1}(\tau)=\sigma\ln\frac{\lambda}{\sigma^{2}}$.

Now, we are ready to prove Theorems 3.3 and 3.4.

Proof of Theorem 3.3. We only argue the case $\tau>0$. In the following, we divide the arguments into two cases.

Case 1: $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})>0$. When $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$, $\{x^{(k)}\}$ converges to $0$ by Lemma 4.5. When $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, $\{x^{(k)}\}$ converges to $0$ by Lemma 4.6. When $\tau=\frac{\lambda}{\sigma}$, $\{x^{(k)}\}$ converges to $x_{1}(\tau)=0$ by Lemma 4.3 and Lemma 3.1 if $x^{(0)}>0$, and converges to $0$ by Lemma 4.1 (i) if $x^{(0)}=0$. In short, for any $\tau\in(0,\frac{\lambda}{\sigma}]$, $\{x^{(k)}\}$ converges to $0$. When $\tau\in(\frac{\lambda}{\sigma},+\infty)$, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ by Lemma 4.3 if $x^{(0)}>0$, and by Lemma 4.1 (iii) if $x^{(0)}=0$. Hence, for any $\tau\in(\frac{\lambda}{\sigma},+\infty)$, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$.

Case 2: $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq 0$. In this case, $\tau\in(0,\frac{\lambda}{\sigma})\subseteq[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$, so the sequence $\{x^{(k)}\}$ converges to $0$ by Lemma 4.6. When $\tau\in[\frac{\lambda}{\sigma},+\infty)$, the proof is the same as in Case 1.

Based on the above arguments, $\{x^{(k)}\}$ converges to the exact solution of $\mathrm{Prox}_{\lambda f_{\sigma}}(\tau)$ by Lemma 2.2. The proof is hence complete.   $\Box$


Proof of Theorem 3.4. We only argue the case $\tau>0$. By Corollary 4.4, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ for any $\tau\in(\frac{\lambda}{\sigma},+\infty)$. By Lemma 4.5, $\{x^{(k)}\}$ converges to $0$ for any $\tau\in(0,\sigma(1+\ln\frac{\lambda}{\sigma^{2}}))$. Hence, statement (i) holds with Lemma 2.3. The rest of the proof focuses on $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$. Now suppose that $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$.

(ii) Let $x^{(0)}\geq\sigma\ln\frac{\lambda}{\sigma^{2}}$. If $\tau\in(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$, then $x_{2}(\tau)<\sigma\ln\frac{\lambda}{\sigma^{2}}<x_{1}(\tau)$ by Lemma 3.1 (ii) and (v). Hence, $x^{(0)}>x_{2}(\tau)$ from the assumption that $x^{(0)}\geq\sigma\ln\frac{\lambda}{\sigma^{2}}$. By Lemma 4.7 (iii) and Corollary 4.4, $\{x^{(k)}\}$ converges to $x_{1}(\tau)$. If $\tau=\sigma(1+\ln\frac{\lambda}{\sigma^{2}})$, then $x_{1}(\tau)=x_{2}(\tau)=\sigma\ln\frac{\lambda}{\sigma^{2}}$ from Lemma 3.1 (iii), and the desired result follows from Corollary 4.8. Thus, with Lemma 2.4, statement (ii) holds.

(iii) Let $x_{2}(\bar{\tau}_{\lambda,\sigma})<x^{(0)}<\sigma\ln\frac{\lambda}{\sigma^{2}}$. Since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ by Lemma 2.1 and $\sigma\ln\frac{\lambda}{\sigma^{2}}=x_{2}\big(\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\big)$ by Lemma 3.1 (iii), we can derive that $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})<x_{2}^{-1}(x^{(0)})<\bar{\tau}_{\lambda,\sigma}$, that $x^{(0)}<x_{2}(\tau)$ for each $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)}))$, and that $x^{(0)}>x_{2}(\tau)$ for each $\tau\in(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma})$. Together with Lemma 4.7, the limit of $\{x^{(k)}\}$, denoted by $x^{(\infty)}$, satisfies

$x^{(\infty)}=\begin{cases}0,&\text{ if }\tau\in(\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)})),\\ x^{(0)}=x_{2}(\tau),&\text{ if }\tau=x_{2}^{-1}(x^{(0)}),\\ x_{1}(\tau),&\text{ if }\tau\in(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma}].\end{cases}$   (21)

Comparing (21) with Lemma 2.4 gives the desired conclusion.

(iv) Let $x^{(0)}=x_{2}(\bar{\tau}_{\lambda,\sigma})$. When $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\bar{\tau}_{\lambda,\sigma})$, $x_{2}(\tau)>x_{2}(\bar{\tau}_{\lambda,\sigma})=x^{(0)}$ since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma})$ by Lemma 2.1, and then $\{x^{(k)}\}$ converges to $0$ by Lemma 4.7 (i). When $\tau\in(\bar{\tau}_{\lambda,\sigma},\frac{\lambda}{\sigma}]$, $x_{2}(\tau)<x_{2}(\bar{\tau}_{\lambda,\sigma})=x^{(0)}$, and then $\{x^{(k)}\}$ converges to $x_{1}(\tau)$ by Lemma 4.7 (iii). When $\tau=\bar{\tau}_{\lambda,\sigma}$, $x_{2}(\tau)=x^{(0)}$, and then $\{x^{(k)}\}$ converges to $x_{2}(\bar{\tau}_{\lambda,\sigma})$ by Lemma 4.7 (ii). Hence, the desired result is obtained by Lemma 2.4.

(v) Let $0\leq x^{(0)}<x_{2}(\bar{\tau}_{\lambda,\sigma})$. The proof is similar to that of (iii). Since $0\leq x^{(0)}<x_{2}(\bar{\tau}_{\lambda,\sigma})$, we have $\sigma(1+\ln\frac{\lambda}{\sigma^{2}})\leq\bar{\tau}_{\lambda,\sigma}<x_{2}^{-1}(x^{(0)})\leq\frac{\lambda}{\sigma}$. Suppose first that $x^{(0)}>0$. Since $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$ by Lemma 2.1, it holds that $x^{(0)}<x_{2}(\tau)$ for any $\tau\in[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),x_{2}^{-1}(x^{(0)}))$ and $x^{(0)}>x_{2}(\tau)$ for any $\tau\in(x_{2}^{-1}(x^{(0)}),\frac{\lambda}{\sigma}]$. By Lemma 4.7, the limit of $\{x^{(k)}\}$ is again given by (21). Comparing (21) with Lemma 2.4, $x^{(\infty)}$ does not belong to ${\rm Prox}_{\lambda f_{\sigma}}(\tau)$ for $\tau\in(\bar{\tau}_{\lambda,\sigma},x_{2}^{-1}(x^{(0)})]$. The same conclusion holds for $x^{(0)}=0$ by Lemma 4.1(i), together with the facts that $x_{2}(\tau)=0$ if and only if $\tau=\frac{\lambda}{\sigma}$ (shown in the proof of case (i) of Lemma 3.1) and that $x_{2}(\tau)$ is strictly decreasing on $[\sigma(1+\ln\frac{\lambda}{\sigma^{2}}),\frac{\lambda}{\sigma}]$. This completes the proof.
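As an informal numerical check on the trichotomy (21), one can run the reweighted iteration directly. The following sketch assumes the IRL1 update $x^{(k+1)}=\max\{0,\,\tau-\frac{\lambda}{\sigma}e^{-x^{(k)}/\sigma}\}$ for $\tau>0$, i.e., soft thresholding with the reweighted weight $\frac{1}{\sigma}e^{-|x^{(k)}|/\sigma}$; the values of $\lambda$, $\sigma$, and $\tau$ are hypothetical.

```python
# Illustrative sketch (not from the paper): the limit of the reweighted
# iteration as a function of the initial value x0, matching (21).
import numpy as np
from scipy.special import lambertw

def irl1_limit(tau, lam, sigma, x0, iters=2000):
    """Iterate x <- max(0, tau - (lam/sigma)*exp(-x/sigma)) starting at x0."""
    x = x0
    for _ in range(iters):
        x = max(0.0, tau - (lam / sigma) * np.exp(-x / sigma))
    return x

lam, sigma = 2.0, 1.0                      # hypothetical values, lam > sigma^2
tau = 1.85                                 # in [sigma(1+ln(lam/sigma^2)), lam/sigma]
c = -(lam / sigma**2) * np.exp(-tau / sigma)
x1 = tau + sigma * lambertw(c, k=0).real   # attracting fixed point x1(tau)
x2 = tau + sigma * lambertw(c, k=-1).real  # repelling fixed point x2(tau)

print(irl1_limit(tau, lam, sigma, x0=x2 - 1e-3))  # x0 < x2(tau): limit is 0
print(irl1_limit(tau, lam, sigma, x0=x2 + 1e-3))  # x0 > x2(tau): limit is x1(tau)
# The middle case x0 = x2(tau) is invariant in exact arithmetic, but x2(tau)
# is numerically repelling, so finite-precision runs eventually drift away.
```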

5 Conclusions

The relation between the IRL1 solution and the true proximal operator of PiE (1) has been clarified in Theorems 3.3 and 3.4; the regions where the two differ depend explicitly on $\sigma$, the initial value $x^{(0)}$, and the regularization parameter $\lambda$. Furthermore, to close this gap, the initial value can be adaptively selected as in Theorems 3.6 and 3.8 so as to guarantee that the IRL1 solution belongs to the proximal operator of PiE. These results justify the use of IRL1 for PiE whenever the initial value is chosen appropriately. Finally, our arguments may be applied to other sparsity-promoting penalties, especially those whose proximal operators cannot be derived explicitly.
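To make the closing remark concrete, the sketch below illustrates one initialization rule consistent with the case analysis of Theorem 3.4: take $x^{(0)}=0$ when $\tau\leq\bar{\tau}_{\lambda,\sigma}$ and $x^{(0)}=\sigma\ln\frac{\lambda}{\sigma^{2}}$ otherwise. It is only a sketch under two assumptions of ours, namely that $\bar{\tau}_{\lambda,\sigma}$ is the tie threshold at which $0$ and $x_{1}(\tau)$ attain the same value of the objective $\frac{1}{2}(x-\tau)^{2}+\lambda(1-e^{-x/\sigma})$, and that $\lambda>\sigma^{2}$; the precise prescriptions are those of Theorems 3.6 and 3.8, which we do not restate here.

```python
# Illustrative sketch (not from the paper): an adaptive initial value for
# IRL1, assuming tau_bar is the tie threshold where 0 and x1(tau) give the
# same value of h(x) = 0.5*(x - tau)^2 + lam*(1 - exp(-x/sigma)).
import numpy as np
from scipy.special import lambertw
from scipy.optimize import brentq

def x1_root(tau, lam, sigma):
    """Larger fixed point x1(tau) via the principal Lambert W branch."""
    c = -(lam / sigma**2) * np.exp(-tau / sigma)
    return tau + sigma * lambertw(c, k=0).real

def tie_gap(tau, lam, sigma):
    """h(x1(tau)) - h(0); this gap vanishes at tau = tau_bar."""
    x1 = x1_root(tau, lam, sigma)
    h = lambda x: 0.5 * (x - tau) ** 2 + lam * (1.0 - np.exp(-x / sigma))
    return h(x1) - h(0.0)

def adaptive_x0(tau, lam, sigma):
    """One initialization rule consistent with Theorem 3.4(ii) and (v)."""
    lo = sigma * (1 + np.log(lam / sigma**2))   # the two roots appear here
    hi = lam / sigma
    tau_bar = brentq(tie_gap, lo + 1e-12, hi, args=(lam, sigma))
    # Below tau_bar the prox is 0, so start below x2(tau); above tau_bar
    # the prox is x1(tau), so start above x2(tau).
    return 0.0 if tau <= tau_bar else sigma * np.log(lam / sigma**2)
```

With this choice, the iteration of the earlier sketch converges to an element of ${\rm Prox}_{\lambda f_{\sigma}}(\tau)$ in our informal experiments; this mirrors, but does not replace, the guarantees of Theorems 3.6 and 3.8.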

References

  • [1] A. Beck, First-Order Methods in Optimization, SIAM, Philadelphia, 2017.
  • [2] T. Blumensath and M. E. Davies, Iterative thresholding for sparse approximations, Journal of Fourier Analysis and Applications, 14 (2008), pp. 629–654.
  • [3] P. S. Bradley and O. L. Mangasarian, Feature selection via concave minimization and support vector machines, in Proceedings of the 15th International Conference on Machine Learning, vol. 98, 1998, pp. 82–90.
  • [4] P. S. Bradley, O. L. Mangasarian, and W. N. Street, Feature selection via mathematical programming, INFORMS Journal on Computing, 10 (1998), pp. 209–217.
  • [5] E. J. Candes, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted $\ell_{1}$ minimization, Journal of Fourier Analysis and Applications, 14 (2008), pp. 877–905.
  • [6] L. Chen and Y. Gu, The convergence guarantees of a non-convex approach for sparse recovery, IEEE Transactions on Signal Processing, 62 (2014), pp. 3754–3767.
  • [7] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96 (2001), pp. 1348–1360.
  • [8] J. Fan, R. Li, C.-H. Zhang, and H. Zou, Statistical Foundations of Data Science, Chapman and Hall/CRC, 2020.
  • [9] S. Foucart and M.-J. Lai, Sparsest solutions of underdetermined linear systems via $\ell_{q}$-minimization for $0<q\leq 1$, Applied and Computational Harmonic Analysis, 26 (2009), pp. 395–407.
  • [10] G. M. Fung, O. L. Mangasarian, and A. J. Smola, Minimal kernel classifiers, Journal of Machine Learning Research, 3 (2002), pp. 303–321.
  • [11] C. Gao, N. Wang, Q. Yu, and Z. Zhang, A feasible nonconvex relaxation approach to feature selection, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 25, 2011, pp. 356–361.
  • [12] W. Guo, Y. Lou, J. Qin, and M. Yan, A novel regularization based on the error function for sparse recovery, Journal of Scientific Computing, 87 (2021), Art. 31.
  • [13] W. Jiang, F. Nie, and H. Huang, Robust dictionary learning with capped $\ell_{1}$-norm, in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
  • [14] H. A. Le Thi, T. P. Dinh, H. M. Le, and X. T. Vo, DC approximation approaches for sparse optimization, European Journal of Operational Research, 244 (2015), pp. 26–46.
  • [15] Y. Liu and R. Lin, A bisection method for computing the proximal operator of the $\ell_{p}$-norm with $0<p<1$, Journal of Computational and Applied Mathematics.
  • [16] Y. Liu, Y. Zhou, and R. Lin, The proximal operator of the piece-wise exponential function and its application in compressed sensing, arXiv:2306.13425.
  • [17] Y. Lou and M. Yan, Fast L1–L2 minimization via a proximal operator, Journal of Scientific Computing, 74 (2018), pp. 767–785.
  • [18] S. Lucidi and F. Rinaldi, Exact penalty functions for nonlinear integer programming problems, Journal of Optimization Theory and Applications, 145 (2010), pp. 479–488.
  • [19] M. Malek-Mohammadi, A. Koochakzadeh, M. Babaie-Zadeh, M. Jansson, and C. R. Rojas, Successive concave sparsity approximation for compressed sensing, IEEE Transactions on Signal Processing, 64 (2016), pp. 5657–5671.
  • [20] O. Mangasarian, Machine learning via polyhedral concave minimization, in Applied Mathematics and Parallel Computing, Springer, 1996, pp. 175–188.
  • [21] I. Mező, The Lambert W Function: Its Generalizations and Applications, CRC Press, 2022.
  • [22] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision, SIAM Journal on Imaging Sciences, 8 (2015), pp. 331–372.
  • [23] A. Prater-Bennette, L. Shen, and E. E. Tripp, The proximity operator of the log-sum penalty, Journal of Scientific Computing, 93 (2022), Art. 67.
  • [24] A. Prater-Bennette, L. Shen, and E. E. Tripp, A constructive approach for computing the proximity operator of the $p$-th power of the $\ell_{1}$ norm, Applied and Computational Harmonic Analysis, 67 (2023), Art. 101572.
  • [25] F. Rinaldi, New results on the equivalence between zero-one programming and continuous concave programming, Optimization Letters, 3 (2009), pp. 377–386.
  • [26] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, vol. 317, Springer Science & Business Media, 2009.
  • [27] L. Shen, Y. Xu, and X. Zeng, Wavelet inpainting with the $\ell_{0}$ sparse regularization, Applied and Computational Harmonic Analysis, 41 (2016), pp. 26–53.
  • [28] M. Tao, Minimization of $L_{1}$ over $L_{2}$ for sparse signal recovery with convergence guarantee, SIAM Journal on Scientific Computing, 44 (2022), pp. A770–A797.
  • [29] J. Trzasko and A. Manduca, Highly undersampled magnetic resonance image reconstruction via homotopic $\ell_{0}$-minimization, IEEE Transactions on Medical Imaging, 28 (2008), pp. 106–121.
  • [30] H. Wang, H. Zeng, and J. Wang, Convergence rate analysis of proximal iteratively reweighted $\ell_{1}$ methods for $\ell_{p}$ regularization problems, Optimization Letters, 17 (2023), pp. 413–435.
  • [31] H. Wang, H. Zeng, J. Wang, and Q. Wu, Relating $\ell_{p}$ regularization and reweighted $\ell_{1}$ regularization, Optimization Letters, 15 (2021), pp. 2639–2660.
  • [32] H. Wang, F. Zhang, Y. Shi, and Y. Hu, Nonconvex and nonsmooth sparse optimization via adaptively iterative reweighted methods, Journal of Global Optimization, 81 (2021), pp. 717–748.
  • [33] J. Wright and Y. Ma, High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications, Cambridge University Press, 2022.
  • [34] J. Yan, X. Meng, F. Cao, and H. Ye, A universal rank approximation method for matrix completion, International Journal of Wavelets, Multiresolution and Information Processing, 20 (2022), Art. 2250016.
  • [35] P. Yin, E. Esser, and J. Xin, Ratio and difference of $\ell_{1}$ and $\ell_{2}$ norms and sparse representation with coherent dictionaries, Communications in Information and Systems, 14 (2014), pp. 87–109.
  • [36] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of $\ell_{1-2}$ for compressed sensing, SIAM Journal on Scientific Computing, 37 (2015), pp. A536–A563.
  • [37] C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Annals of Statistics, 38 (2010), pp. 894–942.
  • [38] S. Zhang and J. Xin, Minimization of transformed $l_{1}$ penalty: closed form representation and iterative thresholding algorithms, Communications in Mathematical Sciences, 15 (2017), pp. 511–537.
  • [39] S. Zhang and J. Xin, Minimization of transformed $L_{1}$ penalty: theory, difference of convex function algorithm, and robust application in compressed sensing, Mathematical Programming, 169 (2018), pp. 307–336.
  • [40] T. Zhang, Analysis of multi-stage convex relaxation for sparse regularization, Journal of Machine Learning Research, 11 (2010), pp. 1081–1107.
  • [41] Z. Zhou, A unified framework for constructing nonconvex regularizations, IEEE Signal Processing Letters, 29 (2022), pp. 479–483.
  • [42] Z. Zhou, Sparse recovery based on the generalized error function, Journal of Computational Mathematics, doi:10.4208/jcm.2204-m2021-0288, 2023.
  • [43] H. Zou and R. Li, One-step sparse estimates in nonconcave penalized likelihood models, Annals of Statistics, 36 (2008), pp. 1509–1533.