On Choosing Initial Values of Iteratively Reweighted Algorithms for the Piece-wise Exponential Penalty
Abstract
Computing the proximal operator of the sparsity-promoting piece-wise exponential (PiE) penalty with a given shape parameter, which is a popular nonconvex surrogate of the $\ell_0$-norm, is fundamental in feature selection via support vector machines, image reconstruction, zero-one programming problems, compressed sensing, and beyond. Due to the nonconvexity of PiE, its proximal operator has long been evaluated via an iteratively reweighted $\ell_1$ algorithm, which substitutes PiE with its first-order approximation; however, the solutions obtained in this way are only critical points. Based on the exact characterization of the proximal operator of PiE, we explore how the iteratively reweighted solution deviates from the true proximal operator in certain regions, which can be explicitly identified in terms of the shape parameter, the initial value, and the regularization parameter in the definition of the proximal operator. Moreover, the initial value can be adaptively and simply chosen to ensure that the iteratively reweighted solution belongs to the proximal operator of PiE.
Keywords: Iteratively reweighted algorithms; piece-wise exponential penalty; proximal operator; Lambert W function; initial values.
1 Introduction
Sparse optimization problems arise in a wide range of fields, such as compressed sensing, image processing, statistics, and machine learning [32, 33]. The so-called $\ell_0$-norm, which counts the nonzero components of a vector, is a natural penalty function to promote sparsity. Sparse solutions are more easily interpretable and generally lead to better model generalization. Optimization problems with an $\ell_0$-norm penalty have been widely investigated in the literature [2, 8, 27, 33]. However, such a nonconvex problem is NP-hard [2].
To circumvent this challenge, a great many surrogates of the $\ell_0$-norm have been proposed in the literature [16, 32, 42]. The $\ell_1$-norm regularizer has received a great deal of attention for its continuity and convexity. Although it comes close to the $\ell_0$-norm, the $\ell_1$-norm frequently over-penalizes large entries. To remedy this issue, nonconvex sparsity-inducing penalties have been employed to better approximate the $\ell_0$-norm and enhance sparsity, and hence have received considerable attention in sparse learning. Recent theoretical studies have shown their superiority to the convex counterparts in a variety of sparse learning settings, including the bridge $\ell_p$-norm penalty [9, 15], the capped $\ell_1$ penalty [13, 40], the transformed $\ell_1$ penalty [38, 39], the log-sum penalty [5], the minimax concave penalty [37], the smoothly clipped absolute deviation [7], the difference of the $\ell_1$- and $\ell_2$-norms [17, 36], the ratio of the $\ell_1$- and $\ell_2$-norms [28, 35], the Weibull penalty [41], generalized error functions [12, 42], the $p$-th power of the $\ell_p$-norm [24], the piece-wise exponential function (PiE) [3, 14, 20], and others. To address such nonconvex and possibly nonsmooth problems, a proximal algorithm is commonly used [1]. The proximal operator [1] of a function $f$ at $y$ with regularization parameter $\lambda>0$ is defined by
\[
\operatorname{prox}_{\lambda f}(y) := \operatorname*{arg\,min}_{x} \Big\{ \tfrac{1}{2}\|x-y\|^2 + \lambda f(x) \Big\}.
\]
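For illustration, the definition above can be checked numerically by a brute-force grid search. The following minimal Python sketch (the name `prox_brute_force` and the grid bounds are illustrative choices, not part of any referenced method) returns one grid point attaining the minimal objective; this is a convenient reference when the penalty is nonconvex and the proximal operator may be set-valued.

```python
import numpy as np

def prox_brute_force(f, y, lam, lo=-10.0, hi=10.0, n=200001):
    """Approximate one element of prox_{lam*f}(y) = argmin_x { 0.5*(x - y)**2 + lam*f(x) }
    by a dense grid search; a crude but convenient reference for nonconvex penalties."""
    x = np.linspace(lo, hi, n)
    obj = 0.5 * (x - y) ** 2 + lam * f(x)
    return x[np.argmin(obj)]

# sanity check with f(x) = |x|: the prox is soft-thresholding, so prox_{1*|.|}(2.0) = 1.0
print(prox_brute_force(np.abs, 2.0, 1.0))
```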
Characterizing the proximal operator of a function is crucial to the proximal algorithm. However, such a proximal operator does not always have a closed form, or it is computationally challenging to evaluate due to the nonconvex and nonsmooth nature of the sparsity-inducing penalty. A popular method for handling this issue is the iteratively reweighted algorithm, which approximates the nonconvex and nonsmooth problem by a sequence of tractable convex subproblems. Zou and Li [43] devised a local linear approximation, which can be treated as a special case of the iteratively reweighted $\ell_1$ (IRL1) minimization method proposed by Candès, Wakin, and Boyd [5]. The IRL1 algorithm can be unified under a majorization-minimization framework [22]. Later, the IRL1 algorithm for optimization problems with general nonconvex and nonsmooth sparsity-inducing terms was explored in [32], and its global and local convergence for the $\ell_p$-norm regularized model were studied in [31] and [30], respectively.
In this paper, we focus on the PiE function. The PiE function with a shape parameter $\sigma>0$, defined by
\[
\phi_\sigma(t) = 1 - e^{-|t|/\sigma}, \qquad t\in\mathbb{R}, \tag{1}
\]
is one of the nonconvex surrogates of the $\ell_0$-norm. It is also called an exponential-type penalty [11, 14, 32, 41] or a Laplacian function [29, (16)], and it has been successfully applied in support vector machines [3, 10], zero-one programming problems [18, 25], image reconstruction [29, 39], compressed sensing [6, 14, 19], low-rank matrix completion [34], and so on. Due to the nonconvexity of PiE, the IRL1 algorithm has long been adopted in a large body of references to approximate the proximal operator of PiE [3, 34, 41, 42]. Recently, the IRL1 algorithm for computing the proximal operator of PiE was adopted in [34, (3.19)] for matrix completion. However, the expression of the proximal operator of PiE was first partially studied by Malek-Mohammadi et al. [19] in 2016 and then systematically explored by Liu, Zhou, and Lin [16] using the Lambert W function. Motivated by the analysis in [23] relating the IRL1 solution for the log-sum penalty to its proximal operator, we explore the relation between the IRL1 solution and the proximal operator of PiE, and we show how to select a suitable initial point in the IRL1 algorithm so that the IRL1 solution coincides with the proximal operator of PiE.
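To indicate in what sense PiE approximates the $\ell_0$-norm, note that, under the exponential form in (1), for each fixed $t$ one has
\[
\lim_{\sigma\to 0^{+}} \bigl(1 - e^{-|t|/\sigma}\bigr) =
\begin{cases}
0, & t = 0,\\
1, & t \neq 0,
\end{cases}
\]
so $\phi_\sigma$ tends pointwise to the $\ell_0$-norm of the scalar $t$ as $\sigma\to 0^{+}$.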
The remainder of the paper is outlined as follows. In Section 2, we recall the existing characterizations of the proximal operator of PiE by means of the Lambert W function. With these, we show in Theorems 3.3 and 3.4 of Section 3 that the iteratively reweighted solution does not belong to the proximal operator of PiE in certain regions, which can be explicitly determined in terms of the shape parameter $\sigma$, the initial value, and the regularization parameter $\lambda$, as shown in Fig. 2 later. To remedy this issue, the initial value is set adaptively, as in Theorems 3.6 and 3.8, to ensure that the IRL1 solution belongs to the proximal operator of PiE. The necessary lemmas and the proofs of Theorems 3.3 and 3.4 are presented in Section 4. Conclusions are drawn in the final section.
2 Existing characterizations of the proximal operator of PiE
Let us recall the expression of the proximal operator of PiE (1), which was systematically explored in [16] by means of the Lambert W function. The Lambert W function is defined as the set of solutions $w$ of the equation $we^{w} = x$.
The function is single-valued for $x\ge 0$ or $x=-1/e$, and double-valued for $x\in(-1/e, 0)$ (see Fig. 1). To distinguish between the two branches when $x\in(-1/e,0)$, we use the same notation as in [21, Section 1.5] and denote the branch satisfying $W(x)\ge -1$ by $W_0$ and the branch satisfying $W(x)\le -1$ by $W_{-1}$. The Lambert W function is built into Python (https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.lambertw.html). Lemma 2.1 below gives the monotonicity of the two branches. Readers may refer to the recent monograph [21, Section 1.5] for more details on the Lambert W function.
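As a quick illustration of the SciPy routine mentioned above, the two real branches on $(-1/e, 0)$ can be evaluated by passing the branch index $k$ to scipy.special.lambertw; the printed values below are approximate.

```python
import numpy as np
from scipy.special import lambertw

x = -0.2                       # a point in (-1/e, 0), where both real branches exist
w0 = lambertw(x, k=0).real     # principal branch W_0,  satisfies W_0(x)  >= -1
wm1 = lambertw(x, k=-1).real   # lower branch   W_{-1}, satisfies W_{-1}(x) <= -1

# both branches solve w * exp(w) = x
print(w0, w0 * np.exp(w0))     # approx -0.2592  -0.2
print(wm1, wm1 * np.exp(wm1))  # approx -2.5426  -0.2
```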
Lemma 2.1
[21, Section 1.6] The branch $W_0$ of the Lambert W function is strictly increasing on $[-1/e, +\infty)$; however, $W_{-1}$ is strictly decreasing on $[-1/e, 0)$.

The characterizations of the proximal operator of PiE (1) were presented in [16, Section 2] and are split into two cases depending on $\lambda$ and $\sigma$. For the sake of completeness, we list those characterizations below.
Lemma 2.2
Let and . It holds that
where
Lemma 2.3
Lemma 2.3 can be further reduced to the following result, which shows that the proximal operator is single-valued except at one point depending only upon $\lambda$ and $\sigma$. This conclusion will be used in the proof of Theorem 3.4.
Lemma 2.4
3 Analysis of IRL1 for computing the proximal operator of PiE
In this section, we analyze the IRL1 algorithm for computing
\[
\operatorname{prox}_{\lambda\phi_\sigma}(y) = \operatorname*{arg\,min}_{x\in\mathbb{R}} \Big\{ \tfrac{1}{2}(x-y)^2 + \lambda\phi_\sigma(x) \Big\}. \tag{2}
\]
To solve problem (2), the nonconvex function $\phi_\sigma$ in the IRL1 algorithm is locally approximated by its linear expansion, namely,
\[
\phi_\sigma(x) \approx \phi_\sigma(x_k) + \frac{1}{\sigma}e^{-|x_k|/\sigma}\big(|x| - |x_k|\big),
\]
where $x_k$ denotes the $k$-th iterate. With it, the next iterate for a given $y$ is computed by
\[
x_{k+1} \in \operatorname*{arg\,min}_{x\in\mathbb{R}} \Big\{ \tfrac{1}{2}(x-y)^2 + \lambda\phi_\sigma(x_k) + \frac{\lambda}{\sigma}e^{-|x_k|/\sigma}\big(|x| - |x_k|\big) \Big\}.
\]
By removing the terms that do not depend on the variable $x$ in the above expression, we obtain
\[
x_{k+1} \in \operatorname*{arg\,min}_{x\in\mathbb{R}} \Big\{ \tfrac{1}{2}(x-y)^2 + w_k|x| \Big\},
\]
that is,
\[
x_{k+1} = \operatorname{sign}(y)\max\big\{|y| - w_k,\, 0\big\},
\]
where $w_k = \frac{\lambda}{\sigma}e^{-|x_k|/\sigma}$.
It suffices to restrict our discussion to $y\ge 0$, since $\phi_\sigma$ is symmetric about the origin [16, Lemma 2.1] and the proximal operator is correspondingly odd in $y$. To be more precise, the IRL1 algorithm for PiE with $y\ge 0$ is described in Algorithm 1.
Input: Fix $\sigma>0$ and $\lambda>0$. Given $y\ge 0$ and an initial value $x_0\ge 0$.
for $k = 0, 1, 2, \ldots$ do
    $x_{k+1} = \max\big\{\, y - \tfrac{\lambda}{\sigma} e^{-x_k/\sigma},\ 0 \,\big\}$   (3)
end for
Output: $x_{k+1}$.
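For concreteness, the update (3) can be implemented in a few lines of Python. This is a minimal sketch under the PiE form (1); the function name `irl1_prox_pie`, the iteration cap, and the stopping tolerance are illustrative choices rather than part of the algorithm description above.

```python
import numpy as np

def irl1_prox_pie(y, lam, sigma, x0, max_iter=1000, tol=1e-12):
    """IRL1 sketch for the proximal operator of the PiE penalty at y >= 0.

    Each step linearizes phi_sigma(t) = 1 - exp(-|t|/sigma) at the current
    iterate, so the subproblem is a weighted soft-thresholding whose closed
    form is the update (3).  The returned limit may depend on x0.
    """
    x = x0
    for _ in range(max_iter):
        w = (lam / sigma) * np.exp(-x / sigma)   # weight from the linearization of phi_sigma
        x_new = max(y - w, 0.0)                  # update (3): weighted soft-thresholding, y >= 0
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x
```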
Denote . We call a critical point of the function if is satisfied, where denotes the subdifferential of at [26, Definition 8.3]. Ochs et al. [22] pointed out that the sequence generated by Algorithm 1 converges to a critical point of the function . We go one step further and show that not only is the sequence convergent, but its limit also depends on the initialization and on its relationship with the parameters and . The convergence behavior of (3) is described by Lemmas 4.1–4.7 in Section 4. This is then compared to the true solution set in Theorems 3.3 and 3.4. In particular, we identify the intervals where (3) will not achieve the true solution; these intervals are explicitly determined in terms of the initial value and the parameters and .
Notice that the limit of the sequence generated by (3) satisfies the corresponding fixed-point equation. To further investigate its properties, for a given $y$ we define a function by
(4)
and its main properties used later are listed in the following lemma.
Lemma 3.1
Let be defined by (4). Write , and is defined as in Lemma 2.2. Then, the following statements hold.
(i) The function is strictly increasing on and strictly decreasing on . Moreover, for any .
(ii) If , the equation has two solutions and with (7).
(iii) If , the equation has a unique solution, that is, .
(iv) If , the equation has two solutions and satisfying .
(v) If , the equation has two solutions and with (11).
Proof: After simple calculation, , . Clearly, the statement (i) holds. The equation is equivalent to , namely,
(12) |
If , . By the definition of the Lambert W function and equation (12), the equation has two solutions and . Together with (i) and the fact , we know that statements (ii) and (iv) hold. When , . Hence, we obtain
which implies . Statement (iii) holds. In the following, we argue statement (v). Notice that , . So, the equation has solutions and . From (i), it follows that
(13) |
We will proceed in two cases.
Case 1: . If , then . With , we know that . Together with , this yields
Again from (13), it follows that .
Proposition 3.2
Given and an initial value . Suppose that the sequence generated by Algorithm 1 converges to . Then, the following statements hold.
(i) implies that .
(ii) If , .
Proof: By the continuity of the function , and the equation (3) for each , we know that
(14) |
If , then from (14), which implies that . Hence, statement (i) holds. If , then from (i), and from (14), namely, , where is defined by (4). So, from Lemma 3.1 (iv).
3.1 Comparing IRL1 solution with
In this subsection, we identify when the limit of the sequence does or does not belong to the set . We recall from Lemmas 2.2 and 2.4 that the set has a unique element except for with .
The following two theorems summarize our main results. Our results for PiE are mainly inspired by the ideas presented in [23, Section 4] for the iteratively reweighted algorithm for computing the proximal operator of the log-sum penalty. The proofs, as well as the relevant technical lemmas, are given in Section 4. From now on, we say that a sequence converges to provided that the limit of belongs to the set .
Theorem 3.3
Given and an initial value . Let . Then the sequence generated by Algorithm 1 converges to .
If , we see that generated by Algorithm 1 may not always converge to for some given . The regions where the algorithm fails depend on the threshold given as in Lemma 2.4 and defined in Lemma 3.1, as shown in Fig. 2. The value can be computed by the bisection method. Notice that is strictly decreasing on by Lemma 2.1; we denote the inverse function of by for each .
Theorem 3.4
Given and an initial value . Let , be defined as in Lemma 2.4, be defined as in Lemma 3.1 and the sequence be generated by Algorithm 1. Then the following statements hold.
(i) The sequence converges to for any .
(ii) If , converges to for any . Consequently, converges to for any ; however, does not converge to for any .
(iii) If , the sequence converges to for any , but the sequence does not converge to for any .
(iv) If , the sequence converges to for any ; however, the sequence does not converge to when .
(v) If , the sequence converges to for any , but the sequence does not converge to for any .
Remark 3.5
The initial value for IRL1 is usually and simply set to be [5, Subsection 2.2] for compressed sensing, to a random feasible value for support vector machines [4, Subsection 2.1], and to the identity matrix for low-rank matrix completion problems [41, Algorithm 1]. By Theorem 3.4, such choices may result in a deviation between the IRL1 solution and the proximal operator of PiE.
Fig. 2 illustrates the results (i)-(v) in Theorem 3.4 with . Only when lies in a subset of the interval does the deviation occur. The colored regions indicate where the IRL1 solution differs from the proximal operator of PiE. For example, let and . Then , , , , , and . In this case, given an initial value , the IRL1 solution (red dashdot) and the true proximal operator (black dashed) are illustrated in Fig. 3, which corresponds to the case of Theorem 3.4 (ii). Clearly, the IRL1 solution disagrees with the true proximal operator for any given .
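The deviation can also be examined numerically by comparing the IRL1 limit with a brute-force evaluation of the proximal operator. The snippet below reuses the illustrative sketches `irl1_prox_pie` and `prox_brute_force` given earlier, and its parameter values are arbitrary rather than those of Fig. 3.

```python
import numpy as np
# reuses irl1_prox_pie and prox_brute_force from the earlier illustrative sketches

sigma, lam = 1.0, 2.0                      # arbitrary illustrative parameters
phi = lambda t: 1.0 - np.exp(-np.abs(t) / sigma)

for y in np.linspace(0.0, 3.0, 13):
    x_irl1 = irl1_prox_pie(y, lam, sigma, x0=y)             # common initialization x0 = |y|
    x_prox = prox_brute_force(phi, y, lam, lo=0.0, hi=y + 1.0)
    print(f"y = {y:.2f}   IRL1 limit = {x_irl1:.4f}   brute-force prox = {x_prox:.4f}")
```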


3.2 Choices of initial values
In this subsection, we adaptively select an initial value in a simple way so as to guarantee fast convergence of the IRL1 solution to for all . The discussion is divided into two cases: and .
Theorem 3.6
Proof: If , , the statement (i) is trivial by Lemma 4.1 (i). Now suppose . Then . Therefore, by Lemma 4.3, for each and converges to . Notice that by Lemma 3.1(iv). With (3) and the Lagrange mean value theorem, we arrive at
(17) |
for some . Note that is strictly decreasing for any . By (17), it follows that
Repeating this process yields (16). The proof is hence complete.
Remark 3.7
For (16), by Lemma 3.1 (iv). Moreover, is increasing on by Lemma 2.1 and as . Let and in Theorem 3.6. Fix , , , and the corresponding error function for any are illustrated in Fig. 4.


Next, we consider the case that . As a direct consequence of Theorem 3.4, if we fix the initial value (for example, or ) for all , then generated by Algorithm 1 fails to converge to the solution of for at least one . To resolve this, the initial value will be chosen depending on . A simple choice for is suggested below.
Theorem 3.8
Proof: Suppose . Then . By Theorem 3.4 (i) and (v), converges to for any . If , then and further . By Theorem 3.4 (i) and (ii), converges to for any . So, statement (i) holds. Statement (ii) follows from Lemma 4.1 (i) and the fact . Now suppose that . Then . Notice that , where is defined in (4). Then by Lemma 3.1 and . Combining the proof of Lemma 4.7 (iii) for the case with Lemma 4.3, we know that and for each . The rest of the proof is similar to the last part of Theorem 3.6 and is omitted.
Recall that and is strictly increasing for . Observe that . It follows that for any ,
Given the initial value as in Theorem 3.8. Let and . Fix , , , and the corresponding error function for any are given in Fig. 5.


4 Proof of Theorems 3.3 and 3.4
To start with, we present several technical lemmas describing the convergence of Algorithm 1. Then, the limit of the sequence generated by this algorithm is compared directly with .
Lemma 4.1
Given and . Let the sequence be generated by Algorithm 1. Suppose that there exists such that . Then
(i) If , for any .
(ii) If , for any . Moreover, for any .
(iii) If , the sequence converges to .
Proof: From (3), it holds that
(19) |
Clearly, if , by (19), yielding for any . If , by (19). With and (3), one has
yielding for any . Moreover, notice that . Together with for any and the monotonic increase of the function , we know that for any . Hence, the statements (i) and (ii) hold.
By (ii) and for each , the sequence converges. Moreover, it converges to by Proposition 3.2 (ii).
Lemma 4.2
Given and . Let the sequence be generated by Algorithm 1. If for any , then
(i) is strictly increasing and convergent if ;
(ii) is strictly decreasing and convergent if ;
(iii) is constant if .
Proof: Since for each and (3), for any . If , together with the monotonic increase of the function , we know that for any . Obviously, for each . Hence, is strictly increasing and convergent, and statement (i) holds. The rest of the proof is similar to (i).
Lemma 4.3
Proof: Since and , it holds that . With (3), we have which yields
By Lemma 4.2, it suffices to argue that the sequence converges to . Notice that , where is defined by (4). Obviously, by Lemma 3.1 (iv) and (v). We will proceed in three cases.
Case 1: . Then by Lemma 3.1 (iv), namely, . Hence, is constant from Lemma 4.2 (iii). The desired result obviously holds.
Case 2: . Now, . Then by Lemma 3.1 (i) and the fact , which implies that . Hence, is strictly increasing and convergent from Lemma 4.2 (i).
Case 3: . Then by Lemma 3.1 (i), namely, . Hence, is strictly decreasing and convergent from Lemma 4.2 (ii).
In summary, the sequence is convergent and its limit is denoted by . Then and . So, by Lemma 3.1 (iv) and (v).
Corollary 4.4
Given and , the sequence generated by Algorithm 1 converges to .
The following lemma proves that always converges to for all if , namely, .
Lemma 4.5
Suppose . Given and an initial value . Let the sequence be generated by Algorithm 1. Then converges to .
Proof: First, we argue that there exists such that . If not, for all . Then from (3) and , where is defined by (4). Again from Lemma 3.1 (i) and , it holds that for any . Consequently, , namely, . So, is decreasing and convergent by Lemma 4.2 (ii). Now suppose that . Then and , which contradicts . Hence, there exists such that , and then the sequence converges to by Lemma 4.1 (i) and the fact .
The next two lemmas study the convergence of for .
Lemma 4.6
Suppose . Given and an initial value . Let the sequence be generated by Algorithm 1. Then converges to .
Proof: First, we argue that there exists such that . If not, for all . Then from (3) and , where is defined by (4). Since , . Again from Lemma 3.1 (i), it holds that for any . Consequently, , namely, . So, is decreasing and convergent by Lemma 4.2 (ii). Now suppose that . Then and , which contradicts . Hence, there exists such that , and then the sequence converges to by Lemma 4.1 (i).
Lemma 4.7
(i) The proof can be divided into two cases: and . If , then
and hence converges to by Lemma 4.1 (i).
If , it follows from (3) that , and then as by Lemma 3.1 (i)–(iii). If there exists such that , converges to by Lemma 4.1 (i). Otherwise, for all . Notice that . Thus, the sequence is decreasing and convergent by Lemma 4.2 (ii). Moreover, its limit, denoted by , satisfies , namely, , and , which implies . This is a contradiction. In summary, the sequence converges to .
(ii) If , then from (20), and as . In this scenario, the sequence is a constant sequence and its limit is .
(iii) Let . We know that from Lemma 3.1 (ii) and (iii). by (20) and Hence, it follows that from (3). If , since by Lemma 3.1 (i) and (ii), which yields
Hence, is increasing and convergent by Lemma 4.2 (i), and its limit satisfies and must be . If , since . In this scenario, the sequence is a constant sequence and its limit is . If , as with Lemma 3.1 (i) and (ii). We estimate
which implies and then for each . Therefore, is decreasing and convergent. Its limit satisfies and . Therefore, must be by Lemma 3.1 (ii) and (iii).
Corollary 4.8
When and , the sequence converges to for any with .
Proof of Theorem 3.3 We only argue when . In the following, we will divide the arguments into two cases.
Case 1: . When , converges to by Lemma 4.5. When , converges to by Lemma 4.6. When , converges to by Lemma 4.3 and Lemma 3.1 if , and converges to by Lemma 4.1 (i) if . In short, for any , converges to . When , converges to by Lemma 4.3 if , and converges to by Lemma 4.1 (iii) if . Hence, for any , converges to .
Case 2: . In this case, the sequence converges to by Lemma 4.6. When , the proof is the same as in Case 1.
Based on the above arguments, converges to the exact solution to by Lemma 2.2. The proof is hence complete.
Proof of Theorem 3.4 We only argue that . By Corollary 4.4, converges to for any . By Lemma 4.5, converges to for any . Hence, the statement (i) holds with Lemma 2.3. The rest of the proof will focus on . Now suppose that .
(ii) Let . If , then by Lemma 3.1(ii) and (v). Hence, from the assumption that . By Lemma 4.7 (iii) and Corollary 4.4, converges to . If , from Lemma 3.1(iii), and then the desired result is obtained by Corollary 4.8. Thus, with Lemma 2.4, the statement (ii) holds.
(iii) Let . Since is strictly decreasing on by Lemma 2.1 and by Lemma 3.1(iii), we can derive that and that for each and for each . Together with Lemma 4.7, the limit of , denoted by , satisfies
(21) |
(iv) Let . When , since is strictly decreasing on by Lemma 2.1, and then converges to by Lemma 4.7 (i). When , , and then converges to by Lemma 4.7 (iii). When , and then converges to by Lemma 4.7 (ii). Hence, the desired result is obtained by Lemma 2.4.
(v) Let . The proof is similar to (iii). Since , we have . Suppose that . Since is strictly decreasing on by Lemma 2.1, it holds that for any and for any . By Lemma 4.7, the limit of is given as in (21). Comparing (21) with Lemma 2.4, does not belong to for . The same is also true for by Lemma 4.1 (i) and the facts that if and only if (from the proof of case (i) in Lemma 3.1) and that is strictly decreasing on .
5 Conclusions
The relation between the IRL1 solution and the true proximal operator of PiE (1) has been clarified in Theorems 3.3 and 3.4; the regions of deviation depend explicitly upon the shape parameter $\sigma$, the initial value, and the regularization parameter $\lambda$. Furthermore, to close this gap, the initial value was adaptively selected as in Theorems 3.6 and 3.8 to guarantee that the IRL1 solution belongs to the proximal operator of PiE. These results justify the use of IRL1 for PiE whenever the initial value is appropriately chosen. Finally, our arguments can be applied to other sparsity-promoting penalties, especially those whose proximal operators cannot be derived explicitly.
References
- [1] A. Beck, First-order Methods in Optimization, SIAM Publisher, Philadelphia, 2017.
- [2] T. Blumensath and M. E. Davies, Iterative thresholding for sparse approximations, Journal of Fourier Analysis and Applications, 14 (2008), pp. 629–654.
- [3] P. S. Bradley and O. L. Mangasarian, Feature selection via concave minimization and support vector machines, in Proceedings of the 15th International Conference on Machine Learning, vol. 98, 1998, pp. 82–90.
- [4] P. S. Bradley, O. L. Mangasarian, and W. N. Street, Feature selection via mathematical programming, INFORMS Journal on Computing, 10 (1998), pp. 209–217.
- [5] E. J. Candès, M. B. Wakin, and S. P. Boyd, Enhancing sparsity by reweighted $\ell_1$ minimization, Journal of Fourier Analysis and Applications, 14 (2008), pp. 877–905.
- [6] L. Chen and Y. Gu, The convergence guarantees of a non-convex approach for sparse recovery, IEEE Transactions on Signal Processing, 62 (2014), pp. 3754–3767.
- [7] J. Fan and R. Li, Variable selection via nonconcave penalized likelihood and its oracle properties, Journal of the American Statistical Association, 96 (2001), pp. 1348–1360.
- [8] J. Fan, R. Li, C.-H. Zhang, and H. Zou, Statistical Foundations of Data Science, Chapman and Hall/CRC, 2020.
- [9] S. Foucart and M.-J. Lai, Sparsest solutions of underdetermined linear systems via $\ell_q$-minimization for $0<q\le 1$, Applied and Computational Harmonic Analysis, 26 (2009), pp. 395–407.
- [10] G. M. Fung, O. L. Mangasarian, and A. J. Smola, Minimal kernel classifiers, Journal of Machine Learning Research, 3 (2002), pp. 303–321.
- [11] C. Gao, N. Wang, Q. Yu, and Z. Zhang, A feasible nonconvex relaxation approach to feature selection, in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 25, 2011, pp. 356–361.
- [12] W. Guo, Y. Lou, J. Qin, and M. Yan, A novel regularization based on the error function for sparse recovery, Journal of Scientific Computing, 87 (2021), p. No. 31.
- [13] W. Jiang, F. Nie, and H. Huang, Robust dictionary learning with capped $\ell_1$-norm, in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
- [14] H. A. Le Thi, T. P. Dinh, H. M. Le, and X. T. Vo, DC approximation approaches for sparse optimization, European Journal of Operational Research, 244 (2015), pp. 26–46.
- [15] Y. Liu and R. Lin, A bisection method for computing the proximal operator of the $\ell_p$-norm with $0<p<1$, Journal of Computational and Applied Mathematics.
- [16] Y. Liu, Y. Zhou, and R. Lin, The proximal operator of the piece-wise exponential function and its application in compressed sensing. arXiv:2306.13425.
- [17] Y. Lou and M. Yan, Fast L1–L2 minimization via a proximal operator, Journal of Scientific Computing, 74 (2018), pp. 767–785.
- [18] S. Lucidi and F. Rinaldi, Exact penalty functions for nonlinear integer programming problems, Journal of Optimization Theory and Applications, 145 (2010), pp. 479–488.
- [19] M. Malek-Mohammadi, A. Koochakzadeh, M. Babaie-Zadeh, M. Jansson, and C. R. Rojas, Successive concave sparsity approximation for compressed sensing, IEEE Transactions on Signal Processing, 64 (2016), pp. 5657–5671.
- [20] O. Mangasarian, Machine learning via polyhedral concave minimization, in Applied Mathematics and Parallel Computing, Springer, 1996, pp. 175–188.
- [21] I. Mezo, The Lambert W Function: Its Generalizations and Applications, CRC Press, 2022.
- [22] P. Ochs, A. Dosovitskiy, T. Brox, and T. Pock, On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision, SIAM Journal on Imaging Sciences, 8 (2015), pp. 331–372.
- [23] A. Prater-Bennette, L. Shen, and E. E. Tripp, The proximity operator of the log-sum penalty, Journal of Scientific Computing, 93 (2022), p. No. 67.
- [24] A. Prater-Bennette, L. Shen, and E. E. Tripp, A constructive approach for computing the proximity operator of the p-th power of the norm, Applied and Computational Harmonic Analysis, 67 (2023), p. 101572.
- [25] F. Rinaldi, New results on the equivalence between zero-one programming and continuous concave programming, Optimization Letters, 3 (2009), pp. 377–386.
- [26] R. T. Rockafellar and R. J.-B. Wets, Variational analysis, vol. 317, Springer Science & Business Media, 2009.
- [27] L. Shen, Y. Xu, and X. Zeng, Wavelet inpainting with the $\ell_0$ sparse regularization, Applied and Computational Harmonic Analysis, 41 (2016), pp. 26–53.
- [28] M. Tao, Minimization of $\ell_1$ over $\ell_2$ for sparse signal recovery with convergence guarantee, SIAM Journal on Scientific Computing, 44 (2022), pp. A770–A797.
- [29] J. Trzasko and A. Manduca, Highly undersampled magnetic resonance image reconstruction via homotopic $\ell_0$-minimization, IEEE Transactions on Medical Imaging, 28 (2008), pp. 106–121.
- [30] H. Wang, H. Zeng, and J. Wang, Convergence rate analysis of proximal iteratively reweighted $\ell_1$ methods for $\ell_p$ regularization problems, Optimization Letters, 17 (2023), pp. 413–435.
- [31] H. Wang, H. Zeng, J. Wang, and Q. Wu, Relating $\ell_p$ regularization and reweighted $\ell_1$ regularization, Optimization Letters, 15 (2021), pp. 2639–2660.
- [32] H. Wang, F. Zhang, Y. Shi, and Y. Hu, Nonconvex and nonsmooth sparse optimization via adaptively iterative reweighted methods, Journal of Global Optimization, 81 (2021), pp. 717–748.
- [33] J. Wright and Y. Ma, High-dimensional Data Analysis with Low-dimensional Models: Principles, Computation, and Applications, Cambridge University Press, 2022.
- [34] J. Yan, X. Meng, F. Cao, and H. Ye, A universal rank approximation method for matrix completion, International Journal of Wavelets, Multiresolution and Information Processing, 20 (2022), p. 2250016.
- [35] P. Yin, E. Esser, and J. Xin, Ratio and difference of $\ell_1$ and $\ell_2$ norms and sparse representation with coherent dictionaries, Communications in Information and Systems, 14 (2014), pp. 87–109.
- [36] P. Yin, Y. Lou, Q. He, and J. Xin, Minimization of $\ell_{1-2}$ for compressed sensing, SIAM Journal on Scientific Computing, 37 (2015), pp. A536–A563.
- [37] C.-H. Zhang, Nearly unbiased variable selection under minimax concave penalty, Annals of Statistics, 38 (2010), pp. 894–942.
- [38] S. Zhang and J. Xin, Minimization of transformed $\ell_1$ penalty: Closed form representation and iterative thresholding algorithms, Communications in Mathematical Sciences, 15 (2017), pp. 511–537.
- [39] S. Zhang and J. Xin, Minimization of transformed $\ell_1$ penalty: theory, difference of convex function algorithm, and robust application in compressed sensing, Mathematical Programming, 169 (2018), pp. 307–336.
- [40] T. Zhang, Analysis of multi-stage convex relaxation for sparse regularization, Journal of Machine Learning Research, 11 (2010), pp. 1081–1107.
- [41] Z. Zhou, A unified framework for constructing nonconvex regularizations, IEEE Signal Processing Letters, 29 (2022), pp. 479–483.
- [42] Z. Zhou, Sparse recovery based on the generalized error function, Journal of Computational Mathematics, doi:10.4208/jcm.2204-m2021-0288, 2023.
- [43] H. Zou and R. Li, One-step sparse estimates in nonconcave penalized likelihood models, Annals of Statistics, 36 (2008), pp. 1509–1533.