
A Unified Framework for Constructing Nonconvex Regularizations

Zhiyong Zhou The author is with the Department of Statistics, Zhejiang University City College, 310015, Hangzhou, China (e-mail: [email protected]).
Abstract

Over the past decades, many individual nonconvex methods have been proposed to achieve better sparse recovery performance in various scenarios. However, how to construct a valid nonconvex regularization function remains open in practice. In this paper, we fill this gap by presenting a unified framework for constructing nonconvex regularizations based on the probability density function. In addition, a new nonconvex sparse recovery method constructed via the Weibull distribution is studied.

Index Terms:
Nonconvex regularization; Probability density function; Cumulative distribution function; Iteratively reweighted algorithms.

I Introduction

Sparse recovery has attracted tremendous research interest in various areas, including statistical learning [1] and compressive sensing [2]. Its goal is to recover an unknown sparse signal \mathbf{x}\in\mathbb{R}^{N} from under-determined noisy measurements \mathbf{y}=A\mathbf{x}+\boldsymbol{\varepsilon}\in\mathbb{R}^{m}, where m\ll N and \boldsymbol{\varepsilon} is the noise vector. As is well known, this sparse signal of interest can be well recovered by solving the Lasso problem [3] \min_{\mathbf{x}\in\mathbb{R}^{N}}\frac{1}{2}\lVert\mathbf{y}-A\mathbf{x}\rVert_{2}^{2}+\lambda\lVert\mathbf{x}\rVert_{1} (where \lambda>0 is a tuning parameter) or the constrained \ell_{1}-minimization [4]. However, since the \ell_{1} norm is only a loose approximation of the \ell_{0} norm, the Lasso is biased and does not have the oracle property [5].

To overcome this issue, a large number of nonconvex methods have been proposed to better approximate the 0\ell_{0} norm and promote sparsity. Generally, a nonconvex regularization method possesses a form of

\min_{\mathbf{x}\in\mathbb{R}^{N}}\frac{1}{2}\lVert\mathbf{y}-A\mathbf{x}\rVert_{2}^{2}+\lambda J_{\theta}(\mathbf{x}), \quad (1)

where J_{\theta}(\cdot) denotes a nonconvex regularization or penalty function depending on a parameter or vector of parameters \theta. In particular, a separable regularization function has the form \sum_{j=1}^{N}F_{\theta}(|x_{j}|). This separable framework includes many popular nonconvex methods as special cases, such as \ell_{p}\,(0<p<1) [6, 7], Capped-L1 [8], transformed \ell_{1} (TL1) [9], smoothly clipped absolute deviation (SCAD) [5], minimax concave penalty (MCP) [10], the three-order polynomial (TOP) method [11], the exponential-type penalty (ETP) [12, 13], the error function (ERF) method [14], and the very recent generalized error function (GERF) method [15], to name a few. It was called F-minimization in [16], where its exact and robust reconstruction conditions were investigated under the assumption that F_{\theta}(\cdot) satisfies some desirable properties such as subadditivity.

These nonconvex regularization methods enjoy attractive theoretical properties and have achieved great success in various scenarios. In practice, however, it is still unclear how to construct a valid and effective nonconvex regularization function; more specifically, how to construct the function F_{\theta}(\cdot) in the separable nonconvex regularization term remains unanswered. In this paper, we set out to fill this gap between theory and practice by proposing a unified framework for constructing nonconvex regularizations based on the probability density function. The framework is fairly general, containing many existing nonconvex regularization methods as special cases. More importantly, many new nonconvex regularization functions can be constructed within this framework.

The paper is organized as follows. In Section II, we present the unified framework. In Section III, we discuss the theoretical analysis results. In Section IV, we study a new nonconvex method based on the Weibull penalty. Finally, conclusions are included in Section V.

II A Unified Framework

We propose to construct the following nonconvex regularization term for sparse recovery:

J_{\theta}(\mathbf{x})=\sum_{j=1}^{N}F_{\theta}(|x_{j}|), \quad (2)

where F_{\theta}(x)=\int_{0}^{x}f_{\theta}(\tau)\,d\tau is the cumulative distribution function (CDF) of a probability density function (PDF) f_{\theta}(\cdot) defined on [0,\infty).
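To make the construction concrete, the following minimal Python sketch (an illustration assuming NumPy and SciPy are available; the helper name make_penalty and the parameter values are ours) builds J_{\theta} directly from the CDF of a distribution supported on [0,\infty) and evaluates it for the Exponential, Rayleigh and Weibull distributions discussed below.

    import numpy as np
    from scipy import stats

    def make_penalty(cdf):
        """Build J_theta(x) = sum_j F_theta(|x_j|) from a CDF F_theta on [0, inf)."""
        return lambda x: float(np.sum(cdf(np.abs(x))))

    # Illustrative choices of F_theta (parameters are arbitrary):
    J_exp = make_penalty(stats.expon(scale=1.0).cdf)             # Exponential, sigma = 1
    J_ray = make_penalty(stats.rayleigh(scale=1.0).cdf)          # Rayleigh, sigma = 1
    J_wbl = make_penalty(stats.weibull_min(1.5, scale=1.0).cdf)  # Weibull, k = 1.5, sigma = 1

    x = np.array([3.0, -0.5, 0.0, 0.01])
    print(J_exp(x), J_ray(x), J_wbl(x))   # each value lies between 0 and ||x||_0 = 3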

In what follows, we first give some examples of distributions supported on [0,\infty) (or on a bounded interval) and their corresponding nonconvex sparse recovery methods. As Table I shows, we can identify a corresponding probability distribution for almost all of the existing popular separable nonconvex methods. In addition, new nonconvex methods can be constructed from the listed distributions, for instance from the Weibull distribution and the Chi distribution.

Distribution | PDF | Method
Dirac delta | \delta(x) | \ell_{0}-minimization
Uniform | \frac{1}{\gamma},\ x\in[0,\gamma] | Capped-L1
Piecewise linear | \frac{2}{\lambda(\gamma+1)}\left(1\wedge\big(1-\frac{x-\lambda}{\lambda(\gamma-1)}\big)_{+}\right),\ \gamma>1 | SCAD
Piecewise linear | \frac{2}{\lambda\gamma}\big(1-\frac{x}{\lambda\gamma}\big)_{+},\ \gamma>0 | MCP
U-quadratic | \alpha(x-\beta)^{2},\ x\in[a,b] | TOP
Exponential | \frac{1}{\sigma}e^{-x/\sigma},\ \sigma>0 | ETP
Rayleigh | \frac{x}{\sigma^{2}}e^{-x^{2}/(2\sigma^{2})},\ \sigma>0 |
Weibull | \frac{k}{\sigma}(x/\sigma)^{k-1}e^{-(x/\sigma)^{k}},\ k>0,\sigma>0 |
Chi | \frac{x^{k-1}e^{-x^{2}/2}}{2^{k/2-1}\Gamma(k/2)},\ k>0 |
Generalized Gamma | \frac{(p/a^{d})x^{d-1}e^{-(x/a)^{p}}}{\Gamma(d/p)},\ a>0,d>0,p>0 | GERF (d=1)
Generalized Beta Prime | \frac{p(x/q)^{\alpha p-1}(1+(x/q)^{p})^{-\alpha-\beta}}{qB(\alpha,\beta)},\ p,q,\alpha,\beta>0 | TL1 (\alpha=\beta=p=1)
TABLE I: Examples of nonconvex methods and their corresponding probability distributions defined on [0,\infty). We did not fill in the corresponding methods for some distributions because they appear to be new.

If a probability distribution is supported on the whole real line, then we can instead use its folded version supported on [0,\infty) to construct the corresponding regularization function. Some examples are listed as follows:

  • Folded Normal distribution: f_{\sigma}(x)=\frac{2}{\sqrt{2\pi}\sigma}e^{-\frac{x^{2}}{2\sigma^{2}}} with F_{\sigma}(x)=\mathrm{erf}\big(\frac{x}{\sqrt{2}\sigma}\big). In this case, it corresponds to the ERF method proposed in [14] (a brief numerical check of this identity is given after the list).

  • Folded Student’s t-distribution: f_{\nu}(x)=\frac{2\Gamma(\frac{\nu+1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}\big(1+\frac{x^{2}}{\nu}\big)^{-\frac{\nu+1}{2}}, where \nu is the degrees of freedom and \Gamma(\cdot) is the gamma function. In the special case \nu=1, it reduces to the Folded Cauchy distribution with f(x)=\frac{2}{\pi(1+x^{2})} and F(x)=\frac{2}{\pi}\arctan(x), which corresponds to the “arctan” method proposed in [17].

  • The Folded Laplace distribution is exactly the Exponential distribution.
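As a quick numerical check of the first item (a sketch assuming SciPy, where the folded normal centered at zero is available as halfnorm), one can verify that the folded normal CDF coincides with \mathrm{erf}(x/(\sqrt{2}\sigma)):

    import numpy as np
    from scipy import stats, special

    sigma = 1.0
    x = np.linspace(0.0, 5.0, 201)

    F_folded = stats.halfnorm(scale=sigma).cdf(x)      # CDF of |Z| with Z ~ N(0, sigma^2)
    F_erf = special.erf(x / (np.sqrt(2.0) * sigma))    # erf(x / (sqrt(2) * sigma))

    print(np.max(np.abs(F_folded - F_erf)))            # ~1e-16: the two expressions agree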

We display plots of the PDFs and CDFs of some well-known distributions in Figure 1. As shown, all the CDFs are continuous, non-decreasing, and satisfy F_{\theta}(0)=0 and F_{\theta}(+\infty)=1. The PDFs of the Exponential, the Folded Normal and the Uniform distributions are non-increasing, so their CDFs are concave. In contrast, the PDFs of the Rayleigh and the Weibull with k=1.5 are unimodal with modes away from zero, so their CDFs are not concave. These properties are closely related to the theoretical analysis and the solving algorithms for the corresponding nonconvex regularization methods.

Figure 1: The PDFs and CDFs of some well-known distributions including the Exponential (\sigma=1), the Folded Normal (\sigma=1), the Uniform ([0,2]), the Rayleigh (\sigma=1) and the Weibull (k=1.5,\sigma=1).

Next, we provide some useful properties for the regularization functions J_{\theta}(\cdot) constructed based on the probability density functions. We start with the following proposition, which shows the distribution of J_{\theta}(\mathbf{x}) when the entries of \mathbf{x} are assumed to be random. Its proof is straightforward and so is omitted.

Proposition 1

If we assume that the entries |x_{j}| are random and independent and identically distributed (i.i.d.) with PDF f_{\theta}(\cdot), and let p_{j}=F_{\theta}(|x_{j}|) for each j\in[N], then the penalty J_{\theta}(\mathbf{x})=\sum_{j=1}^{N}p_{j} with p_{j}\sim U[0,1] is the sum of N i.i.d. U[0,1] random variables, which has an Irwin–Hall (IH) distribution. Moreover, when N is large, J_{\theta}(\mathbf{x}) has a normal limiting distribution with mean N/2 and variance N/12 by the central limit theorem.
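A small Monte Carlo experiment (a sketch assuming NumPy; the choice of the Weibull distribution and of its parameters is arbitrary) illustrates Proposition 1: the empirical mean and variance of J_{\theta}(\mathbf{x}) match N/2 and N/12.

    import numpy as np

    rng = np.random.default_rng(0)
    N, trials = 50, 100_000
    k, sigma = 1.5, 1.0

    # Draw |x_j| i.i.d. from Weibull(k, sigma) and evaluate J_theta(x) = sum_j F_theta(|x_j|).
    abs_x = sigma * rng.weibull(k, size=(trials, N))
    J = np.sum(1.0 - np.exp(-(abs_x / sigma) ** k), axis=1)   # Weibull CDF at |x_j|

    print(J.mean(), J.var())   # close to N/2 = 25 and N/12 ~ 4.17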

In addition, the constructed regularization or penalty term J_{\theta}(\cdot):\mathbb{R}^{N}\rightarrow[0,N] can also be viewed as a sparsity measure, which possesses the following appealing properties:

  • J_{\theta}(\cdot) is continuous in \mathbb{R}^{N}, so that it is stable with respect to small perturbations of the signal.

  • J_{\theta}(\mathbf{x})=0 if and only if \mathbf{x}=\mathbf{0}.

  • J_{\theta}(\mathbf{x})=J_{\theta}(-\mathbf{x}) for any \mathbf{x}\in\mathbb{R}^{N}.

  • 0\leq J_{\theta}(\mathbf{x})\leq\lVert\mathbf{x}\rVert_{0}\leq N.

  • If |\mathbf{x}|\preceq|\mathbf{y}| (i.e., |x_{j}|\leq|y_{j}| for all j\in[N]), then J_{\theta}(\mathbf{x})\leq J_{\theta}(\mathbf{y}), due to the non-decreasing property of the CDF F_{\theta}(\cdot).

  • If f_{\theta}(\cdot) is non-increasing (i.e., f_{\theta}^{\prime}(\cdot)\leq 0), then both F_{\theta}(\cdot) and J_{\theta}(\cdot) are concave. In this case, J_{\theta}(\cdot) is also subadditive, that is, J_{\theta}(\mathbf{x}+\mathbf{y})\leq J_{\theta}(\mathbf{x})+J_{\theta}(\mathbf{y}) for any \mathbf{x},\mathbf{y}\in\mathbb{R}^{N}, with equality when \mathbf{x} and \mathbf{y} have disjoint supports.

To illustrate the penalty function J_{\theta}(\cdot) as a sparsity measure, Figure 2 shows the sparsity levels computed via J_{\theta}(\cdot) constructed from the Exponential, the Rayleigh and the Weibull distributions for a compressible signal of length 50 whose entries decay as j^{-2} for j\in\{1,2,\cdots,50\}. As we can see, for a compressible signal with very small but non-zero entries, all the J_{\theta}(\cdot) with proper values of \theta provide better sparsity measures than the traditional \ell_{0} norm, which equals 50 here.

Figure 2: J_{\theta}(\mathbf{x}) for a compressible signal \mathbf{x} while varying \theta, where \theta=1/\sigma for the Exponential distribution, \theta=\sigma for the Rayleigh distribution and \theta=\sigma for the Weibull distribution with k=1.5. We generate \mathbf{x}\in\mathbb{R}^{N} with entries decaying as j^{-2} for j\in\{1,2,\cdots,50\}.

III Theoretical Analysis

In this section, we discuss the recovery analysis results for the following constrained noiseless J_{\theta}-minimization:

\min_{\mathbf{z}\in\mathbb{R}^{N}} J_{\theta}(\mathbf{z}) \quad \text{subject to} \quad A\mathbf{z}=A\mathbf{x}, \quad (3)

where J_{\theta}(\cdot) is constructed as in (2). Throughout this section, we assume that the function J_{\theta}(\cdot) is subadditive.

We start with the definition of a generalized version of the null space property (NSP) [18], which guarantees exact sparse recovery for our proposed J_{\theta}-minimization (3).

Definition 1

We say a matrix A\in\mathbb{R}^{m\times N} satisfies a generalized null space property (gNSP) relative to J_{\theta}(\cdot) and S\subset[N] if

J_{\theta}(\mathbf{v}_{S})<J_{\theta}(\mathbf{v}_{S^{c}}) \quad \text{for all } \mathbf{v}\in\mathrm{Ker}(A)\setminus\{0\}. \quad (4)

It satisfies the gNSP of order s relative to J_{\theta}(\cdot) if it satisfies the gNSP relative to J_{\theta}(\cdot) and any S\subset[N] with |S|\leq s.

Under the subadditivity of J_{\theta}(\cdot), it is straightforward to obtain the following necessary and sufficient condition for exact sparse recovery.

Theorem 1

For any given measurement matrix A\in\mathbb{R}^{m\times N}, every s-sparse vector \mathbf{x}\in\mathbb{R}^{N} is the unique solution of the problem (3) if and only if A satisfies the gNSP of order s relative to J_{\theta}(\cdot).

Moreover, we are able to show that, by choosing a proper \theta, one can obtain a solution arbitrarily close to the unique solution of \ell_{0}-minimization via (3), based on the \Delta_{q}-spherical section property given in [15]. The proofs of the following results are very similar to those given in [15] and so are not reproduced here. The results established in the present paper extend those given in [15] for the GERF method to any subadditive J_{\theta}-minimization method constructed as in (2).

Definition 2

([15]) For any q\in(1,\infty], the measurement matrix A is said to possess the \Delta_{q}-spherical section property if \Delta_{q}(A)\leq(\lVert\mathbf{v}\rVert_{1}/\lVert\mathbf{v}\rVert_{q})^{\frac{q}{q-1}} holds for all \mathbf{v}\in\mathrm{Ker}(A)\setminus\{0\}. In other words, \Delta_{q}(A)=\inf_{\mathbf{v}\in\mathrm{Ker}(A)\setminus\{0\}}\left(\frac{\lVert\mathbf{v}\rVert_{1}}{\lVert\mathbf{v}\rVert_{q}}\right)^{\frac{q}{q-1}}.

Proposition 2

Assume A\in\mathbb{R}^{m\times N} has the \Delta_{q}-spherical section property for some q\in(1,\infty] and \lVert\mathbf{x}\rVert_{0}=s<2^{\frac{q}{1-q}}\Delta_{q}(A). Then for any \hat{\mathbf{x}}\in\{\mathbf{z}\in\mathbb{R}^{N}:A\mathbf{z}=A\mathbf{x}\} and \alpha_{\theta}=F_{\theta}^{-1}(1-\frac{1}{N}), we have

\lVert\hat{\mathbf{x}}-\mathbf{x}\rVert_{q}\leq\frac{N\alpha_{\theta}}{\Delta_{q}(A)^{1-1/q}-(\lceil\Delta_{q}(A)-1\rceil)^{1-1/q}}, \quad (5)

whenever J_{\theta}(\hat{\mathbf{x}})\leq\lceil\Delta_{q}(A)-1\rceil-s.

Proposition 3

If \hat{\mathbf{x}} is the solution of (3) and the conditions in Proposition 2 hold, then it satisfies

J_{\theta}(\hat{\mathbf{x}})\leq\lceil\Delta_{q}(A)-1\rceil-s.
Theorem 2

Let \hat{\mathbf{x}} be the solution of (3) and assume the conditions in Proposition 2 hold. Then \hat{\mathbf{x}} approaches the unique solution \mathbf{x} of \ell_{0}-minimization as \alpha_{\theta}=F_{\theta}^{-1}(1-\frac{1}{N}) approaches 0.

IV Weibull Penalty

In this section, we study a new sparse recovery method constructed via the Weibull distribution, whose PDF is f(x;k,\sigma)=\frac{k}{\sigma}(x/\sigma)^{k-1}e^{-(x/\sigma)^{k}},\ x\geq 0, where k>0 is the shape parameter and \sigma>0 is the scale parameter, and whose CDF is F(x;k,\sigma)=1-e^{-(x/\sigma)^{k}}. Hereafter, we denote the Weibull penalty (WBP) by

WBP(\mathbf{x}) = \sum_{j=1}^{N}\int_{0}^{|x_{j}|}\frac{k}{\sigma}(\tau/\sigma)^{k-1}e^{-(\tau/\sigma)^{k}}\,d\tau = \sum_{j=1}^{N}\left(1-e^{-|x_{j}/\sigma|^{k}}\right). \quad (6)

We hide the dependence on the parameters in WBP(\mathbf{x}) to keep the notation simple. The parameters k and \sigma control the degree of concavity of the penalty function WBP(\mathbf{x}). We refer to this new sparse recovery method based on the Weibull penalty as “WBP”. It is very flexible and includes the ETP method in [12, 13] as the special case k=1.

We can then obtain the following proposition, which characterizes the asymptotic behavior of WBP(\mathbf{x}).

Proposition 4

For any nonzero \mathbf{x}\in\mathbb{R}^{N} and k>0,

  (a) \sigma^{k}\,WBP(\mathbf{x})\rightarrow\lVert\mathbf{x}\rVert_{k}^{k} as \sigma\rightarrow+\infty.

  (b) WBP(\mathbf{x})\rightarrow\lVert\mathbf{x}\rVert_{0} as \sigma\rightarrow 0^{+}.

Proof. As for (a), for x>0 it holds that

\lim_{\sigma\rightarrow+\infty}\frac{\sigma^{k}\int_{0}^{x}\frac{k}{\sigma^{k}}\tau^{k-1}e^{-(\tau/\sigma)^{k}}\,d\tau}{x^{k}} = \lim_{\sigma\rightarrow+\infty}\frac{\int_{0}^{x/\sigma}ku^{k-1}e^{-u^{k}}\,du}{(x/\sigma)^{k}} = \lim_{t\rightarrow 0}\frac{\int_{0}^{t}ku^{k-1}e^{-u^{k}}\,du}{t^{k}}=1,

where we substitute u=\tau/\sigma and let t=x/\sigma. When x=0, both \sigma^{k}(1-e^{-|x/\sigma|^{k}}) and |x|^{k} equal 0.

To prove (b), we have 1-e^{-|0/\sigma|^{k}}=0, and for every x>0,

\lim_{\sigma\rightarrow 0^{+}}\int_{0}^{x}\frac{k}{\sigma}(\tau/\sigma)^{k-1}e^{-(\tau/\sigma)^{k}}\,d\tau = \lim_{\sigma\rightarrow 0^{+}}\int_{0}^{x/\sigma}ku^{k-1}e^{-u^{k}}\,du = \int_{0}^{+\infty}ku^{k-1}e^{-u^{k}}\,du=1.

To illustrate this proposition, Figure 3 plots the objective functions of the Weibull penalty WBP(\cdot) in \mathbb{R} for various values of k and \sigma. All the functions are scaled to pass through the point (1,1) for better comparison. As expected, up to a constant factor, the Weibull penalty with shape parameter k interpolates between the \ell_{0} and \ell_{k} penalties: it approaches the \ell_{0} penalty as \sigma approaches 0, and the \ell_{k} penalty as \sigma approaches +\infty.

Figure 3: The objective functions of WBP(\cdot) in \mathbb{R} for various values of k and \sigma, compared to the \ell_{1} and \ell_{1/2} penalties; k=0.1,0.5,1 (left panel) and k=2 (right panel), with \sigma varying in \{0.1,1,100\}.

IV-A Algorithms

Since f^{\prime}(x;k,\sigma)=\left(\frac{k(k-1)}{\sigma^{k}}x^{k-2}-\frac{k^{2}}{\sigma^{2k}}x^{2(k-1)}\right)e^{-(x/\sigma)^{k}}, the CDF F(x;k,\sigma) is concave on \mathbb{R}_{+} whenever k\leq 1. Therefore, according to [19], an iteratively reweighted \ell_{1} (IRL1) algorithm (summarized in Algorithm 1) can be employed to solve the following Weibull-penalized problem:

\min_{\mathbf{x}\in\mathbb{R}^{N}}\frac{1}{2}\lVert\mathbf{y}-A\mathbf{x}\rVert_{2}^{2}+\lambda\, WBP(\mathbf{x}). \quad (7)
Algorithm 1 IRL1 for solving (7)
1:  Initialization: Set w_{j}^{(0)}=1 for j\in[N], and l=0.
2:  Iteration: Repeat until the stopping rule is met,
\mathbf{x}^{(l+1)}=\mathop{\arg\min}_{\mathbf{x}\in\mathbb{R}^{N}}\frac{1}{2}\lVert\mathbf{y}-A\mathbf{x}\rVert_{2}^{2}+\lambda\sum_{j=1}^{N}w_{j}^{(l)}|x_{j}|, \quad (8)
where w_{j}^{(l)}=\frac{k}{\sigma^{k}}|x_{j}^{(l)}|^{k-1}e^{-|x_{j}^{(l)}/\sigma|^{k}} for j\in[N].
3:  Update iteration: l=l+1.
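A compact Python sketch of Algorithm 1 is given below. It is an illustration under several assumptions: NumPy is the only dependency, the weighted-\ell_{1} subproblem (8) is solved by a plain proximal-gradient (ISTA) loop rather than the ADMM solver used in our experiments, a small \epsilon is added to |x_{j}^{(l)}| to keep the weights finite when k<1, and the function names are ours.

    import numpy as np

    def weighted_l1_ista(A, y, w, lam, x0, n_iter=500):
        """Approximately solve (8): min_x 0.5 * ||y - A x||_2^2 + lam * sum_j w_j |x_j|."""
        L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the smooth part
        x = x0.copy()
        for _ in range(n_iter):
            z = x - A.T @ (A @ x - y) / L           # gradient step on the quadratic term
            x = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # soft-thresholding
        return x

    def irl1_wbp(A, y, k, sigma, lam, n_outer=20, eps=1e-8):
        """IRL1 (Algorithm 1) for the Weibull-penalized problem (7); intended for k <= 1."""
        N = A.shape[1]
        x, w = np.zeros(N), np.ones(N)
        for _ in range(n_outer):
            x = weighted_l1_ista(A, y, w, lam, x)
            a = np.abs(x) + eps                     # smoothing to avoid 0^(k-1) when k < 1
            w = (k / sigma ** k) * a ** (k - 1) * np.exp(-(a / sigma) ** k)
        return x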

For the case k>1, F(x;k,\sigma) is no longer concave on \mathbb{R}_{+}, so IRL1 is not directly applicable. However, note that WBP(\cdot) with k>1 belongs to the class \mathcal{F}_{cc} (additively separable, convex on the convexity region and concave on the concavity region) discussed in [19], so an iteratively reweighted tight convex algorithm (see Algorithm 4 in [19]) can be adopted here. Moreover, we can rewrite WBP(\cdot) as a difference of convex functions (DC). For any k\geq 1, the identity 1-e^{-|x/\sigma|^{k}}=\frac{1}{\sigma^{k}}|x|^{k}-\frac{1}{\sigma^{k}}\int_{0}^{|x|}k\tau^{k-1}(1-e^{-(\tau/\sigma)^{k}})\,d\tau yields the following DC decomposition:

WBP(\mathbf{x})=\frac{1}{\sigma^{k}}\lVert\mathbf{x}\rVert_{k}^{k}-\frac{1}{\sigma^{k}}\sum_{j=1}^{N}\int_{0}^{|x_{j}|}k\tau^{k-1}(1-e^{-(\tau/\sigma)^{k}})\,d\tau.

As a result, the problem (7) can be expressed as a DC program:

\min_{\mathbf{x}\in\mathbb{R}^{N}}\frac{1}{2}\lVert\mathbf{y}-A\mathbf{x}\rVert_{2}^{2}+\lambda\sum_{j=1}^{N}(1-e^{-|x_{j}/\sigma|^{k}}) = \min_{\mathbf{x}\in\mathbb{R}^{N}}\underbrace{\frac{1}{2}\lVert\mathbf{y}-A\mathbf{x}\rVert_{2}^{2}+\frac{\lambda}{\sigma^{k}}\lVert\mathbf{x}\rVert_{k}^{k}}_{g(\mathbf{x})}-\underbrace{\frac{\lambda}{\sigma^{k}}\sum_{j=1}^{N}\int_{0}^{|x_{j}|}k\tau^{k-1}(1-e^{-(\tau/\sigma)^{k}})\,d\tau}_{h(\mathbf{x})},

which can be solved by the Difference of Convex functions Algorithm (DCA) [20]. A detailed discussion is beyond the scope of this paper and is left for future work.
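For completeness, a standard DCA iteration for this decomposition (a generic sketch rather than a scheme developed in this paper) linearizes the concave part -h(\mathbf{x}) at the current iterate \mathbf{x}^{(l)} and solves the resulting convex subproblem:

\mathbf{x}^{(l+1)}\in\mathop{\arg\min}_{\mathbf{x}\in\mathbb{R}^{N}}\; g(\mathbf{x})-\langle\nabla h(\mathbf{x}^{(l)}),\mathbf{x}\rangle, \quad\text{where}\quad \frac{\partial h}{\partial x_{j}}(\mathbf{x}^{(l)})=\frac{\lambda k}{\sigma^{k}}|x_{j}^{(l)}|^{k-1}\left(1-e^{-(|x_{j}^{(l)}|/\sigma)^{k}}\right)\mathrm{sign}(x_{j}^{(l)}).

For k\geq 1, h is differentiable (with gradient 0 at x_{j}^{(l)}=0), so this linearization is well defined.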

IV-B A Simulation Study

Finally, we carry out a simple simulation study of the newly constructed WBP method given in (7) with k\in(0,1] for sparse recovery. We solve it using the IRL1 algorithm, with an alternating direction method of multipliers (ADMM) algorithm [21] used to solve the subproblem (8). The true signal is of length 256 and simulated as s-sparse with s in the set \{6,8,\cdots,32\}. The measurement matrix A is a 64\times 256 Gaussian random matrix. For each k and \sigma, we replicate the noiseless experiments 100 times with \lambda=10^{-7}. A trial is recorded as a success if the relative error \lVert\hat{\mathbf{x}}-\mathbf{x}\rVert_{2}/\lVert\mathbf{x}\rVert_{2}\leq 10^{-3}. We report the success rate over the 100 replicates for various values of the parameters k and \sigma, while varying the sparsity level s.
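For reference, one replicate of this noiseless experiment can be sketched as follows (assuming NumPy and the irl1_wbp sketch from Section IV-A; the column scaling of A by 1/\sqrt{m} and the inner solver are our illustrative choices, whereas the reported results were obtained with the ADMM subproblem solver):

    import numpy as np

    rng = np.random.default_rng(1)
    m, N, s = 64, 256, 20
    k, sigma, lam = 0.5, 1.0, 1e-7

    A = rng.standard_normal((m, N)) / np.sqrt(m)     # Gaussian measurement matrix
    x_true = np.zeros(N)
    support = rng.choice(N, size=s, replace=False)
    x_true[support] = rng.standard_normal(s)         # s-sparse ground truth
    y = A @ x_true                                   # noiseless measurements

    x_hat = irl1_wbp(A, y, k, sigma, lam)            # sketch from Section IV-A
    rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
    print(rel_err <= 1e-3)                           # success criterion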

As we can observe from Figure 4, the proposed WBP method outperforms \ell_{1}-minimization (denoted L_{1}) for all tested values of k and \sigma. Overall, the tested values of \sigma (i.e., \sigma=0.01,1,10,100) have very limited effect on the reconstruction performance when k is less than 1 (i.e., k=0.01,0.2,0.5,0.8), but a much bigger effect for the case k=1 (where \sigma=1 is clearly the best choice). When \sigma is fixed to 0.01, 1 or 10, k=1 performs better than the other values of k. However, when \sigma takes a large value (e.g., \sigma=100), the WBP with k=1 behaves almost the same as \ell_{1}-minimization and performs worse than the WBP with other values of k.

Figure 4: Reconstruction performance of the WBP method with different choices of k for Gaussian random measurements when \sigma is fixed to 0.01, 1, 10, 100. Panels: (a) \sigma=0.01, (b) \sigma=1, (c) \sigma=10, (d) \sigma=100.

V Conclusion

In this paper, we provide a unified framework for constructing nonconvex separable penalties from probability distributions. Theoretical recovery results in terms of the null space property and the spherical section property are presented. In addition, a new nonconvex method based on the Weibull penalty is studied for sparse recovery.

Acknowledgment

We would like to acknowledge support from the Zhejiang Provincial Natural Science Foundation of China under Grant No. LQ21A010003.

References

  • [1] T. Hastie, R. Tibshirani, and M. Wainwright, Statistical learning with sparsity: the lasso and generalizations.   CRC press, 2015.
  • [2] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing.   Birkhäuser Basel, 2013, vol. 1.
  • [3] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.
  • [4] D. L. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
  • [5] J. Fan and R. Li, “Variable selection via nonconcave penalized likelihood and its oracle properties,” Journal of the American Statistical Association, vol. 96, no. 456, pp. 1348–1360, 2001.
  • [6] R. Chartrand and W. Yin, “Iteratively reweighted algorithms for compressive sensing,” in Acoustics, speech and signal processing, 2008. ICASSP 2008. IEEE international conference on.   IEEE, 2008, pp. 3869–3872.
  • [7] S. Foucart and M.-J. Lai, “Sparsest solutions of underdetermined linear systems via \ell_{q}-minimization for 0<q\leq 1,” Applied and Computational Harmonic Analysis, vol. 26, no. 3, pp. 395–407, 2009.
  • [8] T. Zhang, “Analysis of multi-stage convex relaxation for sparse regularization.” Journal of Machine Learning Research, vol. 11, no. 3, 2010.
  • [9] S. Zhang and J. Xin, “Minimization of transformed L_{1} penalty: theory, difference of convex function algorithm, and robust application in compressed sensing,” Mathematical Programming, vol. 169, no. 1, pp. 307–336, 2018.
  • [10] C.-H. Zhang, “Nearly unbiased variable selection under minimax concave penalty,” Annals of Statistics, vol. 38, no. 2, pp. 894–942, 2010.
  • [11] Y. Yu and J. Peng, “A unified smooth framework for nonconvex penalized least squares problems,” 2020.
  • [12] C. Gao, N. Wang, Q. Yu, and Z. Zhang, “A feasible nonconvex relaxation approach to feature selection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 25, no. 1, 2011.
  • [13] M. Malek-Mohammadi, A. Koochakzadeh, M. Babaie-Zadeh, M. Jansson, and C. R. Rojas, “Successive concave sparsity approximation for compressed sensing,” IEEE Transactions on Signal Processing, vol. 64, no. 21, pp. 5657–5671, 2016.
  • [14] W. Guo, Y. Lou, J. Qin, and M. Yan, “A novel regularization based on the error function for sparse recovery,” arXiv preprint arXiv:2007.02784, 2020.
  • [15] Z. Zhou, “Sparse recovery based on the generalized error function,” arXiv preprint arXiv:2105.13189, 2021.
  • [16] J. Liu, J. Jin, and Y. Gu, “Robustness of sparse recovery via F-minimization: A topological viewpoint,” IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 3996–4014, 2015.
  • [17] E. J. Candes, M. B. Wakin, S. P. Boyd et al., “Enhancing sparsity by reweighted \ell_{1} minimization,” Journal of Fourier Analysis and Applications, vol. 14, no. 5, pp. 877–905, 2008.
  • [18] A. Cohen, W. Dahmen, and R. DeVore, “Compressed sensing and best k-term approximation,” Journal of the American Mathematical Society, vol. 22, no. 1, pp. 211–231, 2009.
  • [19] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, “On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision,” SIAM Journal on Imaging Sciences, vol. 8, no. 1, pp. 331–372, 2015.
  • [20] P. D. Tao and L. T. H. An, “Convex analysis approach to dc programming: theory, algorithms and applications,” Acta Mathematica Vietnamica, vol. 22, no. 1, pp. 289–355, 1997.
  • [21] S. Boyd, N. Parikh, and E. Chu, Distributed optimization and statistical learning via the alternating direction method of multipliers.   Now Publishers Inc, 2011.