Proximal Linearized Method for Sparse Equity Portfolio Optimization with Minimum Transaction Cost
Abstract
In this paper, we propose a sparse equity portfolio optimization (SEPO) model based on the mean-variance portfolio selection framework. Aimed at minimizing transaction cost by avoiding small investments, this new model includes $\ell_0$-norm regularization of the asset weights to promote sparsity, hence the acronym SEPO-$\ell_0$. The selection model is also subject to a minimum expected return. The complexity of the model calls for a proximal method, which allows us to handle the objective terms separately via the corresponding proximal operators. We develop an efficient ADMM-like algorithm to find the optimal portfolio and prove its global convergence. The efficiency of the algorithm is demonstrated using real stock data, and the model is promising in portfolio selection in terms of generating higher expected return while maintaining a good level of sparsity, thus minimizing transaction cost.
Keywords: Portfolio optimization, sparse portfolio, minimum transaction cost, mean-variance model, proximal method
1 Introduction
Introduced by Markowitz [20] in 1952, mean-variance optimization (MVO) has been widely used in the selection of optimal investment portfolios. The success of MVO is attributed to the simplicity of its quadratic objective function, which can be solved by the quadratic programming (QP) solvers that are widely available. However, MVO has flaws of its own, and its implementation in portfolio optimization has been heavily criticized by academics and professionals [22]. One of its flaws, as pointed out by Michaud [21], is its sensitivity towards input parameters, which maximizes the errors associated with these inputs. This was proven theoretically and computationally by Best and Grauer [3], where a slight change in the assets' expected returns or correlations results in large changes in portfolio weights. Despite that, MVO remains one of the most successful frameworks due to the absence of models that are simple enough to be cast as a QP problem.
Over the past decade or so, the success of robust optimization techniques has allowed researchers to consider non-quadratic objective functions and regularization for portfolio optimization. Consequently, the work by Daubechies et al. [12] showed that the usual quadratic regularizing penalties can be replaced by weighted $\ell_p$-norm penalties with $1 \le p \le 2$. Two specific cases in portfolio optimization, namely lasso when $p = 1$ and ridge regression when $p = 2$, were considered by Brodie et al. [9] and DeMiguel et al. [14], respectively. While ridge regularization minimizes the sample variance subject to an $\ell_2$-norm constraint, which leads to diversification, lasso regularization encourages sparse portfolios, which in turn leads to the minimization of transaction cost. Such regularizations have been studied notably by Chen et al. [10], De Mol [13] and Fastrich et al. [15].
In reality, financial institutions charge their customers transaction fees for trading in the stock market. The two most common charging schemes are a fixed transaction fee and/or a proportion of the investment amount, whichever is higher. In general, a large number of transactions results in higher transaction cost, typically driven by small investments that incur the fixed transaction fee. Transaction cost, in this sense, affects both the portfolio optimization and the frequency of rebalancing the portfolio. On the other hand, diversification is the practice of spreading the investments around so that the exposure to any one type of asset is limited. This practice can help to mitigate the risk and volatility of the portfolio, but it potentially enlarges the number of investment components and thus increases the number of transactions. Therefore, a more realistic model is needed to strike a balance between diversification and minimizing transaction cost for optimal portfolio selection.
Due to the complexity of the objective function and the regularization involved, much of the existing literature employs the alternating direction method of multipliers (ADMM), first introduced by Gabay and Mercier [17] in 1976. It was not until the recent decade that ADMM received much attention in machine learning problems. The essence of ADMM is that it allows one to handle the objective terms separately when they can only be approximated using proximal operators. Its appealing features in large-scale convex optimization problems include ease of implementation and relatively good performance (see, for instance, Boyd et al. [8], Fazel et al. [16] and Perrin and Roncalli [22]). Examples of ADMM-like algorithms in portfolio optimization can be found in Chen et al. [10], Dai and Wen [11] and Lai et al. [18], where they are used to solve $\ell_1$-regularized problems. Though the $\ell_0$-norm is ideal for sparsity problems, its regularization results in a discontinuous and nonconvex problem, whose computation turns out to be complicated.
In this paper, we propose a new algorithmic framework to maximize the sparsity of the portfolio while promoting diversification, i.e. to minimize the $\ell_0$-norm and the $\ell_2$-norm of the asset weights, respectively, subject to a minimum expected return via MVO. We first transform the constrained problem into an unconstrained one, which yields a non-smooth and non-convex objective term. The technique of ADMM allows us to handle these terms separately, and the resulting algorithm nevertheless converges to an optimal solution. Numerical results using real data are also provided to illustrate the reliability of the proposed model and its efficiency in generating higher expected return while minimizing transaction cost when compared to the standard MVO.
This paper is organized as follows: In Section 2, we present a model for sparse equity portfolio optimization with minimum transaction cost and establish the proximal linearized method for $\ell_0$-norm minimization. Subsequently, in Section 3, we present an ADMM algorithm to find the optimal portfolio of the proposed model, together with its convergence analysis. To illustrate the reliability and efficiency of our method, we present the numerical results using real stock data in Section 4. Finally, the conclusion of the paper is presented in Section 5.
2 Proximal Linearized Method for $\ell_0$-norm Minimization
We begin with a universe of $n$ assets under consideration, with mean return vector $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$. Let $w = (w_1, \ldots, w_n)^T \in \mathbb{R}^n$ be the vector of asset weights in the portfolio. Our objective is to maximize the portfolio return $\mu^T w$ and minimize the variance of portfolio return $w^T \Sigma w$, while maintaining a certain level of diversification and minimizing transaction cost. The variance of the portfolio return is the measure of risk inherent in investing in a portfolio, and we shall denote this as variance risk throughout this paper. The portfolio is said to be pure concentrated if there exists $i$ such that $w_i = 1$, and equally-weighted if $w_i = 1/n$ for all $i$. Assume that the capital is fully invested, thus $\mathbf{1}^T w = 1$, where $\mathbf{1} \in \mathbb{R}^n$ is an all-one vector. The sparse equity portfolio optimization model with minimum transaction cost (SEPO-$\ell_0$) goes as follows:
$\min_{w \in \mathbb{R}^n} \ \dfrac{\rho}{2}\, w^T \Sigma w \;-\; \mu^T w \;+\; \dfrac{\gamma}{2}\,\|w\|_2^2 \;+\; \|w\|_0$ (2.1)
s.t. $\ \mu^T w \ge r, \quad \mathbf{1}^T w = 1,$
where $\rho > 0$ is a parameter for leveraging the portfolio variance risk, $\gamma > 0$ is a parameter for leveraging portfolio diversification, and $r > 0$ is the minimum guaranteed return ratio.
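As a quick illustration of the quantities appearing in (2.1), the following minimal sketch evaluates the portfolio return, variance risk and sparsity for a toy three-asset portfolio (the numbers are made up and not taken from the paper):

```python
import numpy as np

# Toy three-asset portfolio (illustrative numbers only, not from the paper)
mu = np.array([0.08, 0.12, 0.10])              # mean return vector
Sigma = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.15, 0.03],
                  [0.01, 0.03, 0.12]])          # covariance matrix
w = np.array([0.5, 0.5, 0.0])                   # fully invested: w sums to 1

portfolio_return = mu @ w                       # mu^T w
variance_risk = w @ Sigma @ w                   # w^T Sigma w
sparsity = np.count_nonzero(w)                  # ||w||_0, the number of positions held
print(portfolio_return, variance_risk, sparsity)
```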
In a standard MVO, diversification is of general importance to reduce portfolio risk without necessarily reducing portfolio return. While diversification does not mean that we add more money into our investment, it certainly reduces our investment value, as the investment in each equity incurs transaction cost. Our proposed method takes into consideration having diversified investments, while at the same time avoiding small investments that might incur unnecessary transaction costs due to this diversification. Note that the sparsity measure of the vector $w$ is given by its $\ell_0$-norm,
$\|w\|_0 = \#\{\, i : w_i \ne 0 \,\}.$
Minimizing the $\ell_0$-norm in (2.1) promotes sparsity within the portfolio, since the weights $w_i$ are forced to be zero except for the large ones, thus minimizing the transaction cost.
Our model (2.1) poses computational difficulties due to the non-convexity and discontinuity of the $\ell_0$-norm and the inequality constraint on the minimum expected return. Instead of dealing with the problem in its entirety, we employ the alternating direction method of multipliers (ADMM) so that the smooth and non-smooth terms can be handled separately. This calls for a brief introduction of proximal operators and the Moreau envelope [23]:
Definition 2.1.
Let $f : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function and $\lambda > 0$ be a parameter. The proximal operator of $f$ is defined as
$\operatorname{prox}_{\lambda f}(v) = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \left\{ f(x) + \dfrac{1}{2\lambda}\|x - v\|^2 \right\}.$ (2.2)
Its Moreau envelope (or Moreau-Yosida regularization) is defined by
$M_{\lambda f}(v) = \min_{x \in \mathbb{R}^n} \left\{ f(x) + \dfrac{1}{2\lambda}\|x - v\|^2 \right\}.$ (2.3)
The parameter $\lambda$ can be interpreted as a trade-off between minimizing $f$ and being close to $v$. The Moreau envelope, specifically, is a way to smooth a non-smooth function, and it can be shown that the optimal value of $M_{\lambda f}$ is also the optimal value of $f$.
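As a simple numerical illustration (not from the paper), the proximal operator and Moreau envelope of the absolute value can be computed both in closed form and by direct minimization:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def prox_numeric(f, v, lam):
    # prox_{lam f}(v): minimize f(x) + (x - v)^2 / (2*lam) numerically
    return minimize_scalar(lambda x: f(x) + (x - v) ** 2 / (2 * lam)).x

def moreau_numeric(f, v, lam):
    # Moreau envelope M_{lam f}(v): the minimum value of the same objective
    return minimize_scalar(lambda x: f(x) + (x - v) ** 2 / (2 * lam)).fun

v, lam = 1.7, 0.5
soft = np.sign(v) * max(abs(v) - lam, 0.0)      # closed-form prox of |.| (soft thresholding)
print(soft, prox_numeric(abs, v, lam))          # both ~1.2
print(moreau_numeric(abs, v, lam))              # ~1.45 = |v| - lam/2 (the Huber value)
```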
Suppose now we are given a problem
$\min_{x, z} \ f(x) + g(z) \quad \text{s.t.} \quad x = z,$
where $f$ and $g$ are closed proper functions, both of which can be nonsmooth. Under the ADMM algorithm, each iteration takes on an alternating nature, with the proximal operators of $f$ and $g$ being evaluated separately:
$x^{k+1} = \operatorname{prox}_{\lambda f}(z^k - u^k), \quad z^{k+1} = \operatorname{prox}_{\lambda g}(x^{k+1} + u^k), \quad u^{k+1} = u^k + x^{k+1} - z^{k+1}.$
Viewing the above as a fixed-point iteration, the ADMM scheme results in a point $(x^\star, z^\star, u^\star)$ such that
$x^\star = \operatorname{prox}_{\lambda f}(z^\star - u^\star), \quad z^\star = \operatorname{prox}_{\lambda g}(x^\star + u^\star), \quad x^\star = z^\star.$
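To make the alternating structure concrete, here is a minimal, self-contained sketch of scaled-form ADMM applied to a lasso-type problem; the problem, data and parameter values are illustrative choices rather than anything used in this paper:

```python
import numpy as np

def soft_threshold(v, kappa):
    # prox of kappa * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, alpha=0.5, rho=1.0, n_iter=200):
    """Scaled-form ADMM for (1/2)||Ax - b||^2 + alpha*||z||_1 subject to x = z."""
    m, n = A.shape
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    M = np.linalg.inv(A.T @ A + rho * np.eye(n))   # cached for the x-update
    Atb = A.T @ b
    for _ in range(n_iter):
        x = M @ (Atb + rho * (z - u))              # prox step for the smooth term
        z = soft_threshold(x + u, alpha / rho)     # prox step for the l1 term
        u = u + x - z                              # scaled dual (multiplier) update
    return z

# tiny usage example with a sparse ground truth
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10); x_true[[1, 4]] = [1.5, -2.0]
b = A @ x_true + 0.01 * rng.standard_normal(20)
print(np.round(admm_lasso(A, b), 2))
```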
Turning our attention back to our problem (2.1), we first denote the set associated with the inequality constraint in (2.1) by
$C = \{\, w \in \mathbb{R}^n : \mu^T w \ge r \,\},$ (2.4)
and the indicator function of $C$ by
$\delta_C(w) = \begin{cases} 0, & w \in C,\\ +\infty, & w \notin C. \end{cases}$ (2.5)
We now define the augmented Lagrangian corresponding to problem (2.1):
$L_\beta(w, y) = \dfrac{\rho}{2}\, w^T \Sigma w - \mu^T w + \dfrac{\gamma}{2}\|w\|_2^2 + \|w\|_0 + \delta_C(w) + y\,(\mathbf{1}^T w - 1) + \dfrac{\beta}{2}\,(\mathbf{1}^T w - 1)^2,$ (2.6)
where $y$ is the usual Lagrange multiplier and $\beta > 0$ is the penalty parameter for the equality constraint $\mathbf{1}^T w = 1$. We may set $\beta$ to be a constant with value greater than 4 [2], leading to our problem (2.6) being rewritten, up to an additive constant, as
$L_\beta(w, u) = \dfrac{\rho}{2}\, w^T \Sigma w - \mu^T w + \dfrac{\gamma}{2}\|w\|_2^2 + \|w\|_0 + \delta_C(w) + \dfrac{\beta}{2}\,(\mathbf{1}^T w - 1 + u)^2,$ (2.7)
where $u = y/\beta$ is the scaled multiplier; $w$ and $u$ are updated alternately via the scheme presented in Section 3.
Problem (2.7) can now be viewed as the following minimization problem:
$\min_{w \in \mathbb{R}^n} \ H(w) + G(w),$ (2.8)
where $H$ consists of the smooth terms, given by
$H(w) = \dfrac{\rho}{2}\, w^T \Sigma w - \mu^T w + \dfrac{\gamma}{2}\|w\|_2^2 + \dfrac{\beta}{2}\,(\mathbf{1}^T w - 1 + u)^2,$ (2.9)
and $G$ consists of the non-smooth terms, given by
$G(w) = \|w\|_0 + \delta_C(w).$ (2.10)
For the purpose of our discussion on the proximal method, we let $u$ be a fixed value, say $u = u^k$, so that we deal with the following minimization problem:
$\min_{w \in \mathbb{R}^n} \ H(w;\, u^k) + G(w).$ (2.11)
Our proximal method, inspired by Beck and Teboulle [1], for minimizing the objective function in (2.11) can be viewed as the proximal regularization of $H$ linearized at a given point $w^k$:
$w^{k+1} = \operatorname*{arg\,min}_{w} \left\{ H(w^k) + \langle w - w^k, \nabla H(w^k)\rangle + \dfrac{1}{2\lambda}\|w - w^k\|^2 + G(w) \right\},$ (2.12)
where $\lambda > 0$ and $\nabla$ denotes the derivative operator. Invoking simple algebra and ignoring the constant terms, (2.12) can be written as
$w^{k+1} = \operatorname*{arg\,min}_{w} \left\{ \dfrac{1}{2\lambda}\left\| w - \big(w^k - \lambda \nabla H(w^k)\big)\right\|^2 + G(w) \right\}.$ (2.13)
Using Definition 2.1, the iterative scheme consists of a proximal step at a resulting gradient point, which gives us the proximal gradient method:
$w^{k+1} = \operatorname{prox}_{\lambda G}\big(w^k - \lambda \nabla H(w^k)\big),$ (2.14)
where $\lambda > 0$ is a suitable step size. Note that if $\nabla H$ is Lipschitz continuous with constant $L_H$, then the proximal gradient method is known to converge at a rate of $O(1/k)$ with fixed step size $\lambda \le 1/L_H$ (Boyd et al. [8]). In the case when $L_H$ is not known, the step sizes can be chosen via line search methods (see, for example, Beck and Teboulle [1]). In the context of line search methods, the largest possible step size is more desirable. Therefore, proximal gradient methods usually have the fixed step size $\lambda = 1/L_H$. In our case, the Lipschitz continuity of $\nabla H$ gives
$\|\nabla H(w_1) - \nabla H(w_2)\| \le \big\|\rho\,\Sigma + \gamma I + \beta\,\mathbf{1}\mathbf{1}^T\big\|_F\, \|w_1 - w_2\|$ (2.15)
for all $w_1, w_2 \in \mathbb{R}^n$, where $I$ denotes the identity matrix and $\|\cdot\|_F$ denotes the Frobenius norm. Since the Lipschitz constant in (2.15) is not easily accessible, we can estimate it in the following way:
$L_H \le \big\|\rho\,\Sigma + \gamma I + \beta\,\mathbf{1}\mathbf{1}^T\big\|_F \le \rho\,\operatorname{tr}(\Sigma) + \gamma n + \beta n,$ (2.16)
where $\operatorname{tr}$ denotes the matrix trace. Since this bound is readily computable from the problem data, we shall henceforth fix our step size as $\lambda = 1/L_H$ with $L_H$ given by the estimate (2.16). Our choice of step size follows from the well-known descent property below:
Lemma 2.1 (Descent property [1]).
Let $h : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function with gradient $\nabla h$ assumed to be Lipschitz continuous with constant $L_h$. Then, for any $x, y \in \mathbb{R}^n$,
$h(y) \le h(x) + \langle y - x, \nabla h(x)\rangle + \dfrac{L_h}{2}\|y - x\|^2.$ (2.17)
Using the proximal operator defined in Definition 2.1, the minimization of (2.12) is equivalent to the following step:
(2.18) |
where . The choice of also guarantees the sufficient decrease of our objective function under the proximal methods:
Lemma 2.2 (Sufficient decrease property [7]).
Let $h : \mathbb{R}^n \to \mathbb{R}$ be a continuously differentiable function with gradient $\nabla h$ Lipschitz continuous with moduli $L_h$. Let $\sigma : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function with $\inf_{\mathbb{R}^n} \sigma > -\infty$. Suppose $t$ is chosen such that $t > L_h$. Then, for any $x \in \operatorname{dom} \sigma$ and any $x^+ \in \mathbb{R}^n$ defined by
$x^+ \in \operatorname{prox}_{\frac{1}{t}\sigma}\Big(x - \dfrac{1}{t}\nabla h(x)\Big),$ (2.19)
we have
$h(x^+) + \sigma(x^+) \le h(x) + \sigma(x) - \dfrac{t - L_h}{2}\,\|x^+ - x\|^2.$ (2.20)
Note that $\operatorname{dom} \sigma$ in Lemma 2.2 denotes the set of points for which the proper and lower semicontinuous function $\sigma$ takes on a finite value:
$\operatorname{dom} \sigma = \{\, x \in \mathbb{R}^n : \sigma(x) < +\infty \,\}.$
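A quick numerical sanity check of Lemma 2.2 on a one-dimensional toy example (our own choice of $h$, $\sigma$ and $t$, not from the paper) is:

```python
import numpy as np

# Check the sufficient decrease property with h(x) = x^2 (so L_h = 2) and sigma(x) = |x|.
def prox_abs(v, lam):
    return np.sign(v) * max(abs(v) - lam, 0.0)   # prox of |.|: soft thresholding

L_h, t = 2.0, 3.0                                # t > L_h, as required
x = 1.0
x_plus = prox_abs(x - (1.0 / t) * (2.0 * x), 1.0 / t)   # the step (2.19)
lhs = x_plus ** 2 + abs(x_plus)
rhs = x ** 2 + abs(x) - 0.5 * (t - L_h) * (x_plus - x) ** 2
print(x_plus, lhs, rhs, lhs <= rhs)              # 0.0, 0.0, 1.5, True
```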
In view of Lemma 2.2, we turn to our non-smooth term $G$, which can be written as the following unconstrained problem:
$\min_{w \in \mathbb{R}^n} \ \|w\|_0 + \delta_C(w).$ (2.21)
It follows from the definition of the Moreau envelope that our unconstrained optimization problem (2.21) becomes
$\min_{w, v \in \mathbb{R}^n} \ \|v\|_0 + \delta_C(w) + \dfrac{1}{2\lambda}\|w - v\|^2$ (2.22)
for some $\lambda > 0$. Solutions of (2.22) are closely tied to those of (2.21): in particular, if $(w^*, w^*)$ is a solution to (2.22), then $w^*$ is a solution of problem (2.21). The proximal problem (2.13) now becomes
$\min_{w, v \in \mathbb{R}^n}\ \dfrac{1}{2\lambda}\left\|w - \big(w^k - \lambda \nabla H(w^k)\big)\right\|^2 + \|v\|_0 + \delta_C(w) + \dfrac{1}{2\lambda}\|w - v\|^2,$ (2.23)
which is handled by evaluating the proximal operators of $\delta_C$ and $\|\cdot\|_0$ alternately.
In particular, the proximal operator of the indicator function $\delta_C$ reduces to the Euclidean projection onto $C$:
$\operatorname{prox}_{\lambda \delta_C}(v) = \Pi_C(v) = \operatorname*{arg\,min}_{x \in C} \|x - v\|^2.$ (2.24)
Meanwhile, the proximal operator of the $\ell_0$-norm can be expressed in its component-wise form:
$\big[\operatorname{prox}_{\lambda \|\cdot\|_0}(v)\big]_i = \begin{cases} v_i, & |v_i| > \sqrt{2\lambda},\\ 0, & |v_i| \le \sqrt{2\lambda}. \end{cases}$ (2.25)
Note that $\operatorname{prox}_{\lambda \|\cdot\|_0}$ is known as a hard thresholding operator, since it forces the entries of $v$ to zero except for the large ones [23]. In other words, a larger $\lambda$ results in higher sparsity and less penalization for moving away from $v$. Doing so ensures that our portfolio selection avoids small investments.
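For illustration, a minimal sketch of the hard thresholding operator (2.25) shows how increasing $\lambda$ drops small positions (the vector below is a made-up example):

```python
import numpy as np

def hard_threshold(v, lam):
    # Component-wise prox of the l0-norm (2.25): zero out entries with |v_i| <= sqrt(2*lam)
    w = v.copy()
    w[np.abs(v) <= np.sqrt(2.0 * lam)] = 0.0
    return w

v = np.array([0.30, -0.02, 0.08, 0.00, -0.15, 0.01])
for lam in (0.0005, 0.005, 0.02):
    w = hard_threshold(v, lam)
    print(lam, w, "nonzero positions:", np.count_nonzero(w))
# A larger lam gives a sparser result: small positions are dropped, large ones are kept unchanged.
```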
In the next section, we will see how the proximal operators are evaluated alternately to give us the optimal solution for problem (2.1).
3 Alternating proximal algorithm and its convergence
In this section, we present an ADMM algorithm to find the optimal portfolio of the proposed SEPO-$\ell_0$ model (2.1) and establish its global convergence.
SEPO-$\ell_0$ Algorithm

Step 0. Given the model parameters, an initial point $w^0$ and a convergence tolerance $\epsilon > 0$. Set $k = 0$.

Step 1. Compute $w^{k+1}$ via the proximal gradient step (2.18), using the projection (2.24) onto $C$.

Step 2. Compute $v^{k+1}$ via the hard thresholding operator (2.25).

Step 3. Compute the multiplier update for the equality constraint $\mathbf{1}^T w = 1$.

Step 4. If $\|w^{k+1} - w^k\| \le \epsilon$ or the maximum number of iterations is reached, stop. Else, set $k = k + 1$ and go to Step 1.
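For illustration, the following minimal Python sketch implements one plausible reading of the scheme above: a projected proximal-gradient step for the smooth part (2.9), hard thresholding (2.25) for the $\ell_0$ term, and a scalar multiplier update for the budget constraint. The half-space form of $C$, the update order and all parameter values are assumptions rather than the paper's exact specification.

```python
import numpy as np

def hard_threshold(v, lam):
    # Prox of the l0-norm (2.25): keep entries with |v_i| > sqrt(2*lam)
    w = v.copy()
    w[np.abs(v) <= np.sqrt(2.0 * lam)] = 0.0
    return w

def project_C(v, mu, r):
    # Euclidean projection (2.24) onto C = {w : mu^T w >= r}, assumed half-space form
    gap = r - mu @ v
    return v if gap <= 0 else v + (gap / (mu @ mu)) * mu

def sepo_l0_sketch(mu, Sigma, rho=1.0, gamma=0.5, beta=5.0, r=0.1,
                   lam=1e-3, tol=1e-6, max_iter=10_000):
    """Schematic SEPO-l0 solver; parameter values are placeholders."""
    n = len(mu)
    ones = np.ones(n)
    L = rho * np.trace(Sigma) + gamma * n + beta * n   # trace-based estimate (2.16)
    step = 1.0 / L
    w = ones / n        # Step 0: equally weighted start
    u = 0.0             # scaled multiplier for 1^T w = 1
    for _ in range(max_iter):
        # Step 1: gradient of the smooth part H (2.9), then projected proximal step
        grad = rho * (Sigma @ w) - mu + gamma * w + beta * (ones @ w - 1.0 + u) * ones
        w_new = project_C(w - step * grad, mu, r)
        # Step 2: hard thresholding enforces sparsity
        w_new = hard_threshold(w_new, lam)
        # Step 3: multiplier update for the budget constraint
        u += ones @ w_new - 1.0
        # Step 4: stopping criterion
        if np.linalg.norm(w_new - w) <= tol:
            w = w_new
            break
        w = w_new
    return w
```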
We have seen in Section 2 how the proposed proximal method guarantees the descent of the solution. To proceed with the convergence of the SEPO-$\ell_0$ algorithm, we begin with Assumption A on an objective function of the form $\Psi = h + \sigma$, where $h$ and $\sigma$ satisfy the following:
Assumption A

(i) $h : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function whose gradient $\nabla h$ is Lipschitz continuous with moduli $L_h$.

(ii) $\sigma : \mathbb{R}^n \to (-\infty, +\infty]$ is a proper and lower semicontinuous function.

(iii) $\inf_{\mathbb{R}^n} h > -\infty$ and $\inf_{\mathbb{R}^n} \sigma > -\infty$.
The SEPO-$\ell_0$ algorithm also results in nice convergence properties for problem (2.7):
Lemma 3.1 (Convergence properties [7]).
Suppose that Assumption A holds. Let $\{w^k\}_{k \ge 0}$ be a sequence generated by the SEPO-$\ell_0$ algorithm. Then, the sequence $\{\Psi(w^k)\}_{k \ge 0}$ is nonincreasing and in particular
$\dfrac{t - L_H}{2}\,\|w^{k+1} - w^k\|^2 \le \Psi(w^k) - \Psi(w^{k+1}).$ (3.1)
Moreover,
$\sum_{k=0}^{\infty} \|w^{k+1} - w^k\|^2 < \infty,$ (3.2)
and hence
$\lim_{k \to \infty} \|w^{k+1} - w^k\| = 0.$ (3.3)
Proof.
Without loss of generality, we let $u$ be a fixed constant and work with $\Psi = H + G$ in place of $L_\beta$, where $H$ is given by (2.9) and $G$ is given by (2.10). Note that $H$ is differentiable and its gradient is Lipschitz continuous with moduli $L_H$. Invoking the SEPO-$\ell_0$ algorithm and by Lemma 2.2, we have
$H(w^{k+1}) + G(w^{k+1}) \le H(w^k) + G(w^k) - \dfrac{t - L_H}{2}\,\|w^{k+1} - w^k\|^2,$ (3.4)
where $L_H$ is given by (2.16). Writing $\Psi = H + G$ in (3.4) and rearranging it lead to (3.1), which asserts that the sequence $\{\Psi(w^k)\}_{k \ge 0}$ is nonincreasing.
Before we present the result that sums up the properties of the sequence $\{w^k\}_{k \ge 0}$ generated by the SEPO-$\ell_0$ algorithm starting from the initial point $w^0$, we first give some basic notations. We denote by $\operatorname{crit} \Psi$ the set of critical points of $\Psi$ and by $\omega(w^0)$ the set of all limit points, where
$\omega(w^0) = \big\{\, \bar{w} \in \mathbb{R}^n : \text{there exists an increasing sequence of integers } \{k_j\}_{j \ge 1} \text{ such that } w^{k_j} \to \bar{w} \text{ as } j \to \infty \,\big\}.$
Given any set $\Omega \subset \mathbb{R}^n$ and any point $w \in \mathbb{R}^n$, the distance from $w$ to $\Omega$ is denoted and defined by
$\operatorname{dist}(w, \Omega) = \inf\{\, \|v - w\| : v \in \Omega \,\}.$
When $\Omega = \emptyset$, we invoke the usual convention that $\inf \emptyset = +\infty$ and hence $\operatorname{dist}(w, \Omega) = +\infty$ for all $w$.
Lemma 3.2 (Properties of limit points [7]).
Suppose that Assumption A holds. Let $\{w^k\}_{k \ge 0}$ be a bounded sequence generated by the SEPO-$\ell_0$ algorithm. Then, the following hold:

(a) $\omega(w^0)$ is a nonempty, compact and connected set.

(b) $\omega(w^0) \subset \operatorname{crit} \Psi$.

(c) $\lim_{k \to \infty} \operatorname{dist}\big(w^k, \omega(w^0)\big) = 0$.

(d) The objective function $\Psi$ is finite and constant on $\omega(w^0)$.
Proof.
See Bolte et al. [7]. ∎
What remains is its global convergence, which we shall establish by means of the Kurdyka-Łojasiewicz (KL) property [7], an extension of the Łojasiewicz gradient inequality [19] to non-smooth functions. We first show that the objective function (2.7) is semi-algebraic and therefore is a KL function. This, in turn, is crucial in giving us the convergence property of the sequences generated via the SEPO-$\ell_0$ algorithm. We begin by recalling notations and definitions concerning the subdifferential (see, for instance, [7, 23]) and the KL property.
Definition 3.1.
Let $\sigma : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function. The (limiting) subdifferential of $\sigma$ at $x \in \operatorname{dom} \sigma$, denoted by $\partial \sigma(x)$, is defined by
$\partial \sigma(x) = \Big\{\, v \in \mathbb{R}^n : \exists\, x^j \to x,\ \sigma(x^j) \to \sigma(x),\ v^j \to v \ \text{with}\ \liminf_{y \to x^j} \dfrac{\sigma(y) - \sigma(x^j) - \langle v^j, y - x^j\rangle}{\|y - x^j\|} \ge 0 \,\Big\}.$ (3.5)
The point $x$ is called a (limiting) critical point of $\sigma$ if $0 \in \partial \sigma(x)$.
It follows that $0 \in \partial \sigma(x)$ if $x$ is a local minimizer of $\sigma$. For continuously differentiable $\sigma$, we have $\partial \sigma(x) = \{\nabla \sigma(x)\}$ and hence the usual gradient mapping from $\mathbb{R}^n$ to $\mathbb{R}^n$. If $\sigma$ is convex, the subdifferential (3.5) turns out to be the classical Fréchet subdifferential (see [23]).
Let $\eta \in (0, +\infty]$ and denote by $\Phi_\eta$ the class of all concave and continuous functions $\varphi : [0, \eta) \to [0, +\infty)$ that are continuously differentiable on $(0, \eta)$ and continuous at $0$, with $\varphi(0) = 0$ and $\varphi'(s) > 0$ for all $s \in (0, \eta)$.
Definition 3.2 (Kurdyka-Łojasiewicz (KL) property).
Let $\sigma : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function. The function $\sigma$ is said to have the Kurdyka-Łojasiewicz (KL) property at $\bar{x} \in \operatorname{dom} \partial \sigma$ if there exist $\eta \in (0, +\infty]$, a neighbourhood $U$ of $\bar{x}$ and a function $\varphi \in \Phi_\eta$, such that for all $x \in U \cap [\sigma(\bar{x}) < \sigma < \sigma(\bar{x}) + \eta]$, the following inequality holds:
$\varphi'\big(\sigma(x) - \sigma(\bar{x})\big)\, \operatorname{dist}\big(0, \partial \sigma(x)\big) \ge 1.$ (3.6)
Moreover, $\sigma$ is called a KL function if it satisfies the KL property at each point of $\operatorname{dom} \partial \sigma$.
The definition above uses the sublevel sets: Given $\eta_1 < \eta_2$, the sublevel sets of a function $\sigma$ are denoted and defined by
$[\eta_1 < \sigma < \eta_2] = \{\, x \in \mathbb{R}^n : \eta_1 < \sigma(x) < \eta_2 \,\}.$
A similar definition holds for $[\eta_1 \le \sigma \le \eta_2]$. The level sets of $\sigma$ are denoted and defined by
$[\sigma = \eta] = \{\, x \in \mathbb{R}^n : \sigma(x) = \eta \,\}.$
Closely related to the KL function is the semi-algebraic function, which is crucial in the proof of the convergence property of our proposed method.
Definition 3.3 (Semi-algebraic sets and functions).
(i) A subset $S \subset \mathbb{R}^n$ is called a semi-algebraic set if there exists a finite number of real polynomial functions $p_{ij}, q_{ij} : \mathbb{R}^n \to \mathbb{R}$ such that
$S = \bigcup_{j=1}^{r} \bigcap_{i=1}^{s} \{\, x \in \mathbb{R}^n : p_{ij}(x) = 0,\ q_{ij}(x) < 0 \,\}.$ (3.7)

(ii) A function $\sigma : \mathbb{R}^n \to (-\infty, +\infty]$ is called a semi-algebraic function if its graph
$\operatorname{graph} \sigma = \{\, (x, t) \in \mathbb{R}^{n+1} : \sigma(x) = t \,\}$ (3.8)
is a semi-algebraic subset of $\mathbb{R}^{n+1}$.
It follows that semi-algebraic functions are indeed KL functions; the result below is a non-smooth version of the Łojasiewicz gradient inequality.
Theorem 3.4 ([5, 6]).
Let $\sigma : \mathbb{R}^n \to (-\infty, +\infty]$ be a proper and lower semicontinuous function. If $\sigma$ is semi-algebraic, then it is a KL function.
Theorem 3.4 allows us to avoid the technicality in proving the KL property, owing to the broad range of functions and sets that are indeed semi-algebraic (see, for instance, [4, 7]). Examples of semi-algebraic functions include real polynomial functions and indicator functions of semi-algebraic sets. Apart from that, finite sums and products of semi-algebraic functions, as well as scalar products, are all semi-algebraic.
We are now ready to give the global convergence result of the proposed model (2.1).
Theorem 3.5 (Global convergence).
Suppose the objective function $\Psi = H + G$ is a KL function such that Assumption A holds. Then the sequence $\{w^k\}_{k \ge 0}$ generated by the SEPO-$\ell_0$ algorithm converges to a critical point $w^*$ of $\Psi$.
Proof.
See Bolte et al. [7]. ∎
By virtue of Theorem 3.5, we now show that each term in (2.7) is semi-algebraic, since a finite sum of semi-algebraic functions is also semi-algebraic. It is obvious that the function (2.7) is a sum of the smooth function $H$, the $\ell_0$-norm and an indicator function. The function $H$ given by (2.9) is a linear combination of linear and quadratic functions, and hence a real polynomial function, which in turn is semi-algebraic.
As a specific example given by Bolte et al. [7], the $\ell_0$-norm is nothing but the sparsity measure of the vector $w$, which is indeed semi-algebraic. In particular, the graph of $\|\cdot\|_0$ is given by a finite union of product sets:
$\operatorname{graph} \|\cdot\|_0 = \bigcup_{I \subset \{1, \ldots, n\}} J_I \times \{|I|\},$ (3.9)
where for any given $I \subset \{1, \ldots, n\}$, $|I|$ denotes the cardinality of $I$ and
$J_I = \{\, x \in \mathbb{R}^n : x_i \ne 0 \text{ for } i \in I \text{ and } x_i = 0 \text{ for } i \notin I \,\}.$
It is obvious that (3.9) is a piecewise linear set, hence the claim. Lastly, the indicator function $\delta_C$ defined by (2.5) is also semi-algebraic, since the feasible set (2.4) is a polyhedral, and hence semi-algebraic, set.
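For instance, in the scalar case $n = 1$ (a worked example added here for illustration), the decomposition (3.9) reads
$\operatorname{graph} \|\cdot\|_0 = \big(\{0\} \times \{0\}\big) \cup \big((\mathbb{R} \setminus \{0\}) \times \{1\}\big),$
a finite union of piecewise linear, and hence semi-algebraic, sets.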
4 Numerical experiments and results
In this section, we study the efficiency of the proposed portfolio optimization model, SEPO-$\ell_0$, in maximizing portfolio return and minimizing transaction cost. We test our algorithm on real data of stock prices and returns of 100 companies across 10 different sectors in China, collected from January 2019 to June 2019. These data are in turn used to generate the covariance matrix, which gives us the portfolio variance in our problem (2.1). We start with the equally-weighted portfolio, i.e. $w_i = 1/n$ for all $i$. We stop the algorithm when the convergence tolerance is met or the maximum number of iterations is reached, as in Step 4. All computational results are obtained by running Matlab R2021a on Windows 10 (Intel Core i7-1065G7 CPU @ 1.30 GHz, 16 GB RAM).
For testing purposes, we fix the penalty parameter $\beta$ and the tuning parameter $\gamma$. The latter means that we set our weight on portfolio diversification as constant, and its value is chosen to be relatively small compared with $\beta$. For illustration, we present our results for the minimum guaranteed return ratios $r = 0.1$ and $r = 0.2$.
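Although the exact data pipeline is not described here, the inputs $\mu$ and $\Sigma$ can be estimated from a price history along the following lines (the file name and layout are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical input file of daily closing prices (one column per stock);
# the actual data format is not specified, so this layout is an assumption.
prices = pd.read_csv("stock_prices_jan_jun_2019.csv", index_col=0)

returns = prices.pct_change().dropna()     # simple daily returns
mu = returns.mean().to_numpy()             # mean return vector
Sigma = returns.cov().to_numpy()           # covariance matrix used in (2.1)

n = len(mu)
w0 = np.ones(n) / n                        # equally weighted starting portfolio
```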
In Table 4.1, we present the computational results for the expected return, variance risk and sparsity under the proposed SEPO-$\ell_0$ model and the standard MVO model for different values of $\rho$, with the minimum guaranteed return ratio set to 0.1 and 0.2, respectively. Note that even though we leveraged on the variance risk when $\rho = 1$, the portfolio selection under SEPO-$\ell_0$ manages to generate expected returns of 0.3455 and 0.4014 when $r = 0.1$ and $r = 0.2$, respectively. Meanwhile, the standard MVO is only able to generate an expected return of 0.1560 when we set $\rho = 1$. The variance risks, however, are higher under the proposed model due to the sparsity, as compared to the maximum diversification of the standard MVO. From the table, we can see that our model offers a good level of sparsity, between 30% and 61% when $r = 0.1$ and between 52% and 72% when $r = 0.2$. This simply means that, out of the 100 stocks considered under the minimum expected return ratio $r = 0.1$, one only needs to invest in the selected 39 to 70 stocks for which the algorithm returns nonzero weights $w_i$. Despite the sparse portfolio selection and the increased risk, we can see that the proposed model is more promising in terms of higher expected return.
Table 4.1: Expected return, variance risk and sparsity of SEPO-$\ell_0$ and the standard MVO for different values of $\rho$.
| SEPO-$\ell_0$ ($r = 0.1$) | Standard MVO | SEPO-$\ell_0$ ($r = 0.2$) |
$\rho$ | E.R. | V.R. | Spar | E.R. | V.R. | Spar | E.R. | V.R. | Spar |
0.1 | 0.6355 | 3.2835 | 58% | 0.6889 | 2.3603 | 0% | 0.7441 | 4.3108 | 72% |
0.2 | 0.6279 | 2.9577 | 61% | 0.5735 | 1.5138 | 0% | 0.6732 | 3.2822 | 66% |
0.3 | 0.5050 | 2.1304 | 47% | 0.4555 | 1.0320 | 0% | 0.5829 | 2.4655 | 58% |
0.4 | 0.5180 | 2.0288 | 53% | 0.3760 | 0.8114 | 0% | 0.5796 | 2.2333 | 64% |
0.5 | 0.4865 | 1.7976 | 51% | 0.3003 | 0.6689 | 0% | 0.5374 | 1.9056 | 64% |
0.6 | 0.4237 | 1.5684 | 39% | 0.2646 | 0.5785 | 0% | 0.4675 | 1.6193 | 52% |
0.7 | 0.3677 | 1.4070 | 30% | 0.2223 | 0.5324 | 0% | 0.4655 | 1.4800 | 59% |
0.8 | 0.3581 | 1.3248 | 31% | 0.2057 | 0.4930 | 0% | 0.4521 | 1.3289 | 63% |
0.9 | 0.3787 | 1.2635 | 44% | 0.1750 | 0.4704 | 0% | 0.4182 | 1.2149 | 56% |
1 | 0.3455 | 1.1802 | 40% | 0.1560 | 0.4501 | 0% | 0.4014 | 1.1204 | 56% |
E.R. = Expected return, V.R. = Variance risk, Spar = Sparsity
[Figure 4.1: Scatterplots of expected return and variance risk for SEPO-$\ell_0$ and the standard MVO for $\rho \in [0.1, 1]$.]
We also compare the expected return and variance risk of SEPO-$\ell_0$ and the standard MVO using scatterplots, as seen in Figure 4.1. The downward trend of the portfolio expected return and risk mimics that of the standard MVO as $\rho$ increases. Note that a higher value of $\rho$ reflects our leverage on the variance risk over the expected return. At the same time, higher expected return means higher risk, as shown in Table 4.1. In general, the standard MVO model gives a lower measure of risk due to maximum diversification, as we can see from Table 4.1 and Figure 4.1. The proposed SEPO-$\ell_0$, on the other hand, can lead to higher expected return and lower total transaction cost due to a sparse portfolio. This shows that the SEPO-$\ell_0$ model is able to provide a good combination of portfolio selection under sparsity.
To illustrate the reliability of our model, we present the output of the proposed model using a scatterplot of the variables, as shown in Figure 4.2, with $\rho$ as the independent variable on the $x$-axis, expected return and sparsity (in decimal) on the left $y$-axis, and risk on the right $y$-axis. We can observe a similar trend for the three lines, which clearly reflects the consistency of our model in obtaining the optimal portfolio selection.
[Figure 4.2: Expected return, sparsity and variance risk of SEPO-$\ell_0$ against $\rho$.]
The relationship between the independent variable $\rho$ and the response variables is further examined using a multivariate linear regression model, as presented in Table 4.2. As we can see from the table, the estimated coefficients of $\rho$ are all negative, which means the response values decrease as $\rho$ increases. Since the $p$-values for all response variables are approximately zero, it is clear that these three relationships are significant. In particular, $\rho$ has a significant negative relationship with expected return, risk and sparsity.
Table 4.2: Estimates of the linear regressions of expected return, risk and sparsity on $\rho$.
Response variable | Estimate for intercept | Estimate for $\rho$ | Standard error for $\rho$ | $p$-value for $\rho$ | R-squared
Expected return | 0.6514 | -0.3396 | 0.0383 | 2.0737e-05 | 0.9076
Risk | 3.1246 | -2.2371 | 0.2992 | 7.0894e-05 | 0.8748
Sparsity | 0.6013 | -0.2679 | 0.0796 | 9.8657e-05 | 0.5859
The significance of $\rho$ for these three dependent variables is supported by the R-squared values of the univariate regressions, standing at 90.76%, 87.48% and 58.59% for expected return, risk and sparsity, respectively. Since R-squared is the percentage of total variation explained by the predictor variable, the high R-squared values of greater than 80% for expected return and risk mean that $\rho$ explains a high percentage of the variance in these two response variables. It is slightly lower for sparsity; however, any R-squared value greater than 50% can be considered moderately high.
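As a check, the first row of Table 4.2 can be reproduced directly from the expected-return column of Table 4.1 (the SEPO-$\ell_0$ column with $r = 0.1$, which matches the reported estimates):

```python
import numpy as np

# Expected-return column of Table 4.1 (SEPO-l0, r = 0.1) regressed on rho
rho = np.arange(0.1, 1.01, 0.1)
er = np.array([0.6355, 0.6279, 0.5050, 0.5180, 0.4865,
               0.4237, 0.3677, 0.3581, 0.3787, 0.3455])

slope, intercept = np.polyfit(rho, er, 1)
pred = intercept + slope * rho
r2 = 1.0 - np.sum((er - pred) ** 2) / np.sum((er - np.mean(er)) ** 2)
print(intercept, slope, r2)   # approximately 0.6514, -0.3396, 0.908
```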
5 Conclusion
The classical Markowitz portfolio scheme, or mean-variance optimization (MVO), is one of the most successful frameworks due to its simplicity of implementation; in particular, it can be solved by quadratic programming, which is widely available. However, it is very sensitive to its input parameters, and obtaining acceptable solutions requires the right weight constraints. Over the past decade, there has been renewed attention to non-quadratic portfolio selection models, due to the advancement of optimization algorithms for solving more general classes of functions. Here we proposed a new algorithmic framework that allows portfolio managers to strike a balance between diversifying investments and minimizing transaction cost, where the latter is achieved by means of minimizing the $\ell_0$-norm. This simply means that the model maximizes sparsity within the portfolio, since the weights are forced to be zero except for the large ones. In practice, the $\ell_0$ regularization results in a discontinuous and nonconvex problem, and hence is often approximated via the $\ell_1$-norm. In this study, we employed proximal methods so that the function can be 'smoothed', by linearizing part of the objective function at a given point and regularizing with a quadratic proximal term that acts as a measure of the 'local error' in the approximation. Writing our problem in the form of an augmented Lagrangian, the unconstrained problem can be divided into two parts, namely the smooth and non-smooth terms. These terms are then handled separately through their proximal operators via the ADMM method. The global convergence of the proposed SEPO-$\ell_0$ algorithm for sparse equity portfolios has been established. The efficiency of our model in maximizing portfolio expected return while striking a balance between minimizing transaction cost and diversification has been analyzed using actual data of 100 companies. Empirically, the implementation of our model leads to higher expected return and lower transaction cost. This shows that, despite its higher risk as compared to the standard MVO, the SEPO-$\ell_0$ model is promising in generating a good combination for an optimal investment portfolio.
References
- [1] A. Beck and M. Teboulle. Convex Optimization in Signal Processing and Communications, chapter Gradient-based algorithms with applications to signal recovery problems, pages 42–88. Cambridge University Press, 2009.
- [2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 3rd edition, 2016.
- [3] M. J. Best and R. R. Grauer. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. The Review of Financial Studies, 4(2):315–342, 1991.
- [4] J. Bochnak, M. Coste, and M. F. Roy. Real algebraic geometry, volume 36. Springer Science & Business Media, 2013.
- [5] J. Bolte, A. Daniilidis, and A. Lewis. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17(4):1205–1223, 2007.
- [6] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization, 18(2):556–572, 2007.
- [7] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1):459–494, 2014.
- [8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
- [9] J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272, 2009.
- [10] J. Chen, G. Dai, and N. Zhang. An application of sparse-group lasso regularization to equity portfolio optimization and sector selection. Annals of Operations Research, 284(1):243–262, 2020.
- [11] Z. Dai and F. Wen. A generalized approach to sparse and stable portfolio optimization problem. Journal of Industrial & Management Optimization, 14(4):1651–1666, 2018.
- [12] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57(11):1413–1457, 2004.
- [13] C. De Mol. Financial Signal Processing and Machine Learning, chapter Sparse Markowitz Portfolios, pages 11–22. Wiley Online Library, 2016.
- [14] V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal. A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5):798–812, 2009.
- [15] B. Fastrich, S. Paterlini, and P. Winker. Constructing optimal sparse portfolios using regularization methods. Computational Management Science, 12(3):417–434, 2015.
- [16] M. Fazel, T. K. Pong, D. Sun, and P. Tseng. Hankel matrix rank minimization with applications to system identification and realization. SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013.
- [17] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
- [18] Z. R. Lai, P. Y. Yang, L. Fang, and X. Wu. Short-term sparse portfolio optimization based on alternating direction method of multipliers. The Journal of Machine Learning Research, 19:1–28, 2018.
- [19] S. Łojasiewicz. Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles, 117:87–89, 1963.
- [20] H. Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952.
- [21] R. O. Michaud. The Markowitz optimization enigma: Is 'optimized' optimal? Financial Analysts Journal, 45(1):31–42, 1989.
- [22] S. Perrin and T. Roncalli. Machine learning optimization algorithms & portfolio allocation. Machine Learning for Asset Management: New Developments and Financial Applications, pages 261–328, 2020.
- [23] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, New York, 1998.