
Proximal Linearized Method for Sparse Equity Portfolio Optimization with Minimum Transaction Cost

Hong Seng Sim
Centre for Mathematical Sciences, Universiti Tunku Abdul Rahman, 43000 Kajang, Selangor, MALAYSIA.

Wendy Shin Yie Ling
Department of Mathematics and Statistics, Faculty of Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, MALAYSIA.

Wah June Leong
Department of Mathematics and Statistics, Faculty of Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, MALAYSIA. Institute for Mathematical Research, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, MALAYSIA.

Chuei Yee Chen (Corresponding author: [email protected])
Department of Mathematics and Statistics, Faculty of Science, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, MALAYSIA.
Abstract

In this paper, we propose a sparse equity portfolio optimization (SEPO) model based on the mean-variance portfolio selection model. Aimed at minimizing transaction cost by avoiding small investments, this new model includes $\ell_{0}$-norm regularization of the asset weights to promote sparsity, hence the acronym SEPO-$\ell_{0}$. The selection model is also subject to a minimum expected return. The complexity of the model calls for a proximal method, which allows us to handle the objective terms separately via the corresponding proximal operators. We develop an efficient ADMM-like algorithm to find the optimal portfolio and prove its global convergence. The efficiency of the algorithm is demonstrated using real stock data, and the model is promising in portfolio selection in terms of generating higher expected return while maintaining a good level of sparsity, thus minimizing transaction cost.

Keywords: Portfolio optimization, sparse portfolio, minimum transaction cost, mean-variance model, proximal method

1 Introduction

Introduced by Markowitz [20] in 1952, mean-variance optimization (MVO) has been widely used in the selection of optimal investment portfolios. The success of MVO is attributed to the simplicity of its quadratic objective function, which in turn can be solved by quadratic programming (QP) solvers that are widely available. However, MVO has its own flaws, and its implementation in portfolio optimization has been heavily criticized by academics and professionals [22]. One of these flaws, as pointed out by Michaud [21], is its sensitivity towards input parameters, which maximizes the errors associated with these inputs. This was proven theoretically and computationally by Best and Grauer [3], where a slight change in the assets' expected returns or correlations results in large changes in portfolio weights. Despite that, MVO remains one of the most successful frameworks, owing to the absence of alternative models that are simple enough to be cast as a QP problem.

Over the past decade or so, the success of robust optimization techniques has allowed researchers to consider non-quadratic objective functions and regularization for portfolio optimization. Consequently, the work by Daubechies et al. [12] showed that the usual quadratic regularizing penalties can be replaced by weighted $\ell_{p}$-norm penalties with $p\in[1,2]$. Two specific cases in portfolio optimization, namely the lasso when $p=1$ and ridge regression when $p=2$, were considered by Brodie et al. [9] and DeMiguel et al. [14], respectively. While ridge regression regularization minimizes the sample variance subject to a constraint that leads to diversification, lasso regularization encourages sparse portfolios, which in turn leads to the minimization of transaction cost. Such regularizations have been studied notably by Chen et al. [10], De Mol [13] and Fastrich et al. [15].

In reality, financial institutions charge their customers transaction fees for trading over the stock market. The two most common charging schemes are a fixed transaction fee and/or a proportion of the investment amount, whichever is higher. In general, a large number of transactions results in higher transaction cost, often driven by small investments that incur the fixed transaction fee. Transaction cost, in this sense, affects both the portfolio optimization and the frequency of portfolio rebalancing. On the other hand, diversification is the practice of spreading investments around so that exposure to any one type of asset is limited. This practice can help mitigate the risk and volatility in the portfolio, but it potentially increases the number of investment components and thus the number of transactions. Therefore, a more realistic model is needed to strike a balance between diversification and minimizing transaction cost for optimal portfolio selection.

Due to the complexity of the objective function and the regularization involved, much of the existing literature employs the alternating direction method of multipliers (ADMM), first introduced by Gabay and Mercier [17] in 1976. It was not until the recent decade that ADMM received much attention in machine learning problems. The essence of ADMM is that it allows one to handle the objective terms separately when they can only be approximated using proximal operators. Its appealing features in large-scale convex optimization problems include ease of implementation and relatively good performance (see, for instance, Boyd et al. [8], Fazel et al. [16] and Perrin and Roncalli [22]). Examples of ADMM-like algorithms in portfolio optimization can be found in Chen et al. [10], Dai and Wen [11] and Lai et al. [18], where they are used to solve $\ell_{p}$-regularized problems with $p\in[1,2]$. Though the $\ell_{0}$-norm is ideal for sparsity problems, its regularization results in a discontinuous and nonconvex problem, whose computation turns out to be complicated.

In this paper, we propose a new algorithmic framework to maximize the sparsity of the portfolio while promoting diversification, i.e. to minimize the $\ell_{0}$-norm and $\ell_{2}$-norm of the asset weights, respectively, subject to a minimum expected return via MVO. We first transform the constrained problem into an unconstrained one, whose objective contains non-smooth and non-convex terms. The technique of ADMM allows us to handle these terms separately, while the resulting algorithm nevertheless converges to its optimal solution. Numerical results using real data are also provided to illustrate the reliability of the proposed model and its efficiency in generating higher expected return while minimizing transaction cost when compared to the standard MVO.

This paper is organized as follows: In Section 2, we present a model for sparse equity portfolio optimization with minimum transaction cost and establish the proximal linearized method for $\ell_{0}$-norm minimization. Subsequently, in Section 3, we present an ADMM algorithm to find the optimal portfolio of the proposed model, together with its convergence analysis. To illustrate the reliability and efficiency of our method, we present the numerical results using real stock data in Section 4. Finally, the conclusion of the paper is presented in Section 5.

2 Proximal Linearized Method for $\ell_{0}$-norm Minimization

We begin with a universe of $n$ assets under consideration, with mean return vector $\mu\in\mathbb{R}^{n}$ and covariance matrix $V\in\mathbb{R}^{n\times n}$. Let $x\in\mathbb{R}^{n}$ be the vector of asset weights in the portfolio. Our objective is to maximize the portfolio return $\mu^{T}x$ and minimize the variance of portfolio return $x^{T}Vx$, while maintaining a certain level of diversification $\|x\|^{2}_{2}$ and minimizing transaction cost $\|x\|_{0}$. The variance of the portfolio return is the measure of risk inherent in investing in a portfolio, and we shall refer to it as variance risk throughout this paper. The portfolio is said to be purely concentrated if there exists $i$ such that $x_{i}=1$, and equally weighted if $x_{i}=\frac{1}{n}$ for all $i$. Assume that the capital is fully invested, so that $e^{T}x=1$ where $e\in\mathbb{R}^{n}$ is the all-one vector. The sparse equity portfolio optimization model with minimum transaction cost (SEPO-$\ell_{0}$) is as follows:

$$\begin{aligned}
\min_{x\in\mathbb{R}^{n}}\quad & \frac{\beta_{1}}{2}x^{T}Vx-\mu^{T}x+\frac{\beta_{2}}{2}\|x\|^{2}_{2}+\|x\|_{0} \quad (2.1)\\
\text{s.t.}\quad & \mu^{T}x\geq r,\\
& e^{T}x=1,\\
& x\geq 0,
\end{aligned}$$

where $\beta_{1}>0$ is a parameter for leveraging the portfolio variance risk, $\beta_{2}>0$ is a parameter for leveraging portfolio diversification, and $r\geq 0$ is the minimum guaranteed return ratio with $r\leq\max\{\mu_{i}\}$.
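Before turning to the algorithm, it may help to see the objective of (2.1) evaluated numerically. The following is a minimal sketch in Python; the function name and inputs are illustrative, and the constraints of (2.1) are handled separately later in this section:

```python
import numpy as np

# A small sketch evaluating the SEPO-l0 objective in (2.1) for a candidate
# weight vector x; V, mu, beta1, beta2 follow the notation above.
def sepo_objective(x, V, mu, beta1, beta2):
    variance_risk = 0.5 * beta1 * (x @ V @ x)   # (beta1/2) x^T V x
    diversification = 0.5 * beta2 * (x @ x)     # (beta2/2) ||x||_2^2
    sparsity = np.count_nonzero(x)              # ||x||_0
    return variance_risk - mu @ x + diversification + sparsity
```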

In a standard MVO, diversification is of general importance to reduce portfolio risk without necessarily reducing portfolio return. While diversification does not mean that we add more money into our investment, it certainly does reduce our investment value, as the investment in each equity incurs transaction cost. Our proposed method takes diversified investments into consideration while avoiding the small investments that diversification might otherwise induce, which incur unnecessary transaction costs. Note that the sparsity measure of the vector $x\in\mathbb{R}^{n}$ is given by

$$\|x\|_{0}:=\text{number of nonzero components of }x.$$

Minimizing the $\ell_{0}$-norm in (2.1) promotes sparsity within the portfolio, since the values of $x_{i}$ are forced to be zero except for the large ones, thus minimizing the transaction cost.

Our model (2.1) poses computational difficulties due to the non-convexity and discontinuity of the $\ell_{0}$-norm and the inequality constraint $\mu^{T}x\geq r$. Instead of dealing with the problem in its entirety, we employ the alternating direction method of multipliers (ADMM) so that the smooth and non-smooth terms can be handled separately. This calls for a brief introduction of proximal operators and the Moreau envelope [23]:

Definition 2.1.

Let $\psi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function and $\sigma>0$ be a parameter. The proximal operator of $\psi$ is defined as

$$\text{prox}_{\sigma\psi}(x)=\mathop{\arg\min}\limits_{y\in\mathbb{R}^{n}}\left\{\psi(y)+\frac{1}{2\sigma}\|y-x\|^{2}_{2}\right\}. \quad (2.2)$$

Its Moreau envelope (or Moreau-Yosida regularization) is defined by

$$\text{env}_{\sigma\psi}(x)=\inf_{y\in\mathbb{R}^{n}}\left\{\psi(y)+\frac{1}{2\sigma}\|y-x\|^{2}_{2}\right\}. \quad (2.3)$$

The parameter $\sigma$ can be interpreted as a trade-off between minimizing $\psi$ and staying close to $x$. The Moreau envelope, specifically, is a way to smooth a non-smooth function, and it can be shown that the infimum defining $\text{env}_{\sigma\psi}(x)$ is attained at the points of $\text{prox}_{\sigma\psi}(x)$.

Suppose now we are given a problem

$$\min\quad\psi(x)+\phi(x)$$

where $\psi,\phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ are closed proper functions, both of which may be nonsmooth. Under the ADMM algorithm, each iteration $k$ takes on an alternating nature, with the proximal operators of $\psi$ and $\phi$ evaluated separately:

$$\begin{aligned}
x^{k+1} &\in\text{prox}_{\sigma\psi}(z^{k}-u^{k});\\
z^{k+1} &\in\text{prox}_{\sigma\phi}(x^{k+1}+u^{k});\\
u^{k+1} &:=u^{k}+x^{k+1}-z^{k+1}.
\end{aligned}$$

Viewing the above as a fixed-point iteration, the ADMM scheme results in $x=z$ such that

$$\begin{aligned}
x &=\text{prox}_{\sigma\psi}(x-u),\\
x &=\text{prox}_{\sigma\phi}(x+u).
\end{aligned}$$
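For illustration, the scheme above takes only a few lines of code. This is a generic sketch, assuming the two proximal operators are supplied as functions of a point and the parameter $\sigma$ (the operators specific to our model are derived below); the names are illustrative, not from the paper:

```python
import numpy as np

# A minimal sketch of the ADMM iteration above, with the proximal operators
# prox_psi and prox_phi passed in as functions.
def admm(prox_psi, prox_phi, x0, sigma=1.0, max_iter=500, tol=1e-8):
    x = z = x0.copy()
    u = np.zeros_like(x0)              # scaled dual variable
    for _ in range(max_iter):
        x = prox_psi(z - u, sigma)     # x-update: prox of psi
        z = prox_phi(x + u, sigma)     # z-update: prox of phi
        u = u + x - z                  # dual update
        if np.linalg.norm(x - z) < tol:  # x = z at the fixed point
            break
    return x
```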

Turning our attention back to our problem (2.1), we first denote the set $R$ associated with the inequality constraint in (2.1) by

$$R=\left\{x\in\mathbb{R}^{n}\colon\mu^{T}x\geq r\right\}, \quad (2.4)$$

and the indicator function of $R$ by

$$I_{R}(x)=\left\{\begin{array}{ll}0,&x\in R,\\ \infty,&x\notin R.\end{array}\right. \quad (2.5)$$

We now define the augmented Lagrangian corresponding to problem (2.1):

$$\begin{aligned}\mathcal{L}(x,\lambda,\rho)=&\ \frac{\beta_{1}}{2}x^{T}Vx-\mu^{T}x+\frac{\beta_{2}}{2}\|x\|^{2}_{2}+\|x\|_{0}+I_{R}(x)\\
&+\lambda\left(e^{T}x-1\right)+\frac{\rho}{2}\left(e^{T}x-1\right)^{2},\end{aligned} \quad (2.6)$$

where $\lambda$ is the usual Lagrange multiplier and $\rho>0$ is the penalty parameter for the equality constraint $e^{T}x=1$. We may set $\rho$ to be a constant with value greater than 4 [2], so that problem (2.6) can be rewritten as

$$\begin{aligned}\mathcal{L}(x,\lambda)=&\ \frac{\beta_{1}}{2}x^{T}Vx-\mu^{T}x+\frac{\beta_{2}}{2}\|x\|^{2}_{2}+\|x\|_{0}+I_{R}(x)\\
&+\lambda\left(e^{T}x-1\right)+\frac{\rho}{2}\left(e^{T}x-1\right)^{2},\end{aligned} \quad (2.7)$$

where $x$ and $\lambda$ are updated via

$$\begin{aligned}
x^{k+1} &=\mathop{\arg\min}\limits_{x}\mathcal{L}\left(x,\lambda^{k}\right),\\
\lambda^{k+1} &=\lambda^{k}+\rho(e^{T}x^{k+1}-1).
\end{aligned}$$

Problem (2.7) can now be viewed as the following minimization problem:

$$\min_{x,\lambda}\quad P(x,\lambda)+Q(x), \quad (2.8)$$

where $P(x,\lambda)$ consists of the smooth terms given by

$$P(x,\lambda)=\frac{\beta_{1}}{2}x^{T}Vx-\mu^{T}x+\frac{\beta_{2}}{2}\|x\|^{2}_{2}+\lambda\left(e^{T}x-1\right)+\frac{\rho}{2}\left(e^{T}x-1\right)^{2}, \quad (2.9)$$

and $Q(x)$ the non-smooth terms given by

$$Q(x)=\|x\|_{0}+I_{R}(x). \quad (2.10)$$

For the purpose of our discussion on the proximal method, we let $\lambda$ be a fixed value, say $\hat{\lambda}$, and deal with the following minimization problem:

$$\min_{x}\quad P(x,\hat{\lambda})+Q(x). \quad (2.11)$$

Our proximal method, inspired by Beck and Teboulle [1], for minimizing the objective function in (2.11) can be viewed as the proximal regularization of $P$ linearized at a given point $x^{k}$:

$$x^{k+1}\in\mathop{\arg\min}\limits_{x}\left\{Q(x)+(x-x^{k})^{T}\nabla P(x^{k})+\frac{1}{2t}\|x-x^{k}\|^{2}\right\}, \quad (2.12)$$

where $t>0$ and $\nabla$ denotes the derivative operator. Invoking simple algebra and ignoring the constant terms, (2.12) can be written as

$$x^{k+1}\in\mathop{\arg\min}\limits_{x}\left\{Q(x)+\frac{1}{2t^{k}}\left\|x-(x^{k}-t^{k}\nabla P(x^{k}))\right\|^{2}\right\}. \quad (2.13)$$

Using Definition 2.1, the iterative scheme consists of a proximal step at a resulting gradient point, which gives us the proximal gradient method:

$$x^{k+1}\in\text{prox}_{\alpha^{k}Q}\left(x^{k}-\alpha^{k}\nabla P\left(x^{k}\right)\right), \quad (2.14)$$

where $\alpha^{k}>0$ is a suitable step size. Note that if $\nabla P$ is Lipschitz continuous with constant $L_{c}$, then the proximal gradient method is known to converge at a rate of $\mathcal{O}(1/k)$ with fixed step size $\alpha\in(0,1/L_{c}]$ (Boyd et al. [8]). When $L_{c}$ is not known, the step sizes can be chosen via line search methods (see, for example, Beck and Teboulle [1]). In the context of line search methods, the largest possible step size $\alpha=1$ is more desirable. Therefore, proximal gradient methods usually take the fixed step size $\alpha=\min\{1,1/L_{c}\}$. In our case, the Lipschitz continuity of $\nabla P$ gives

$$\begin{aligned}\left\|\nabla P(x)-\nabla P(y)\right\|_{2} &=\left\|\beta_{1}V(x-y)+\beta_{2}(x-y)+\rho ee^{T}(x-y)\right\|_{2}\\
&\leq\|\beta_{1}V+\beta_{2}I+\rho ee^{T}\|_{F}\,\|x-y\|_{2}\end{aligned} \quad (2.15)$$

for all $x,y\in\mathbb{R}^{n}$, where $I$ denotes the identity matrix and $\|\cdot\|_{F}$ denotes the Frobenius norm. Since the Lipschitz constant in (2.15) is not easily accessible, we can estimate it in the following way:

$$L_{c}\leq\beta_{1}\|V\|_{F}+\beta_{2}\|I\|_{F}+\rho\|ee^{T}\|_{F}=\beta_{1}\sqrt{\text{tr}\left(VV^{T}\right)}+\beta_{2}\sqrt{n}+\rho n=:\tilde{L}_{c}, \quad (2.16)$$

where tr denotes the matrix trace. Since $\tilde{L}_{c}>1$, it is clear that $\min\{1,1/\tilde{L}_{c}\}$ will always return the value $1/\tilde{L}_{c}$. We shall henceforth fix our step size $\alpha=1/\tilde{L}_{c}$. Our choice of step size follows from the well-known descent property below:
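The bound (2.16) is straightforward to compute. A minimal sketch in Python, with $V$, $\beta_{1}$, $\beta_{2}$ and $\rho$ following the notation of problem (2.1):

```python
import numpy as np

# Step-size rule from (2.16): an upper bound L_tilde on the Lipschitz
# constant of grad P, used to fix alpha = 1 / L_tilde.
def step_size(V, beta1, beta2, rho):
    n = V.shape[0]
    L_tilde = (beta1 * np.sqrt(np.trace(V @ V.T))   # beta1 * ||V||_F
               + beta2 * np.sqrt(n)                 # beta2 * ||I||_F
               + rho * n)                           # rho * ||e e^T||_F
    return 1.0 / L_tilde
```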

Lemma 2.1 (Descent property [1]).

Let $\psi:\mathbb{R}^{n}\to\mathbb{R}$ be a continuously differentiable function with gradient $\nabla\psi$ assumed to be $L_{c}$-Lipschitz continuous. Then, for any $\tilde{L}_{c}\geq L_{c}$,

$$\psi(x)\leq\psi(y)+(x-y)^{T}\nabla\psi(y)+\frac{\tilde{L}_{c}}{2}\|x-y\|^{2},\quad\forall x,y\in\mathbb{R}^{n}. \quad (2.17)$$

Using the proximal operator defined in Definition 2.1, the minimization of (2.12) is equivalent to the following step:

$$x^{k+1}\in\text{prox}_{\alpha Q}\left(x^{k}-\alpha\nabla P\left(x^{k}\right)\right), \quad (2.18)$$

where $\alpha=\frac{1}{\tilde{L}_{c}}$. The choice of $\tilde{L}_{c}$ also guarantees the sufficient decrease of our objective function under the proximal method:

Lemma 2.2 (Sufficient decrease property [7]).

Let $\psi:\mathbb{R}^{n}\to\mathbb{R}$ be a $C^{1}$ function whose gradient $\nabla\psi$ is Lipschitz continuous with modulus $L_{c}$. Let $\phi:\mathbb{R}^{n}\to(-\infty,+\infty]$ be a proper and lower semicontinuous function with $\inf_{\mathbb{R}^{n}}\phi>-\infty$. Suppose $\tilde{L}_{c}$ is chosen such that $\tilde{L}_{c}>L_{c}$. Then, for any $x\in\text{dom }\phi$ and any $\hat{x}\in\mathbb{R}^{n}$ defined by

$$\hat{x}\in\text{prox}_{\alpha\phi}\left(x-\alpha\nabla\psi\left(x\right)\right),\quad\alpha=\frac{1}{\tilde{L}_{c}}, \quad (2.19)$$

we have

$$\psi\left(\hat{x}\right)+\phi\left(\hat{x}\right)\leq\psi(x)+\phi(x)-\frac{1}{2}\left(\tilde{L}_{c}-L_{c}\right)\|\hat{x}-x\|^{2}. \quad (2.20)$$

Note that $\text{dom }\phi$ in Lemma 2.2 denotes the set of points at which the proper and lower semicontinuous function $\phi:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ takes a finite value:

$$\text{dom }\phi=\{x\in\mathbb{R}^{n}\colon\phi(x)<+\infty\}.$$

In view of Lemma 2.2, we turn to our non-smooth term $Q(x)$, which can be written as the following unconstrained problem:

$$\min_{x\in\mathbb{R}^{n}}\ \ \|x\|_{0}+I_{R}(x). \quad (2.21)$$

It follows from the definition of the Moreau envelope that our unconstrained optimization problem (2.21) becomes

$$\min_{x,y\in\mathbb{R}^{n}}\ \ \|y\|_{0}+\frac{1}{2\sigma}\|y-x\|_{2}^{2}+I_{R}(x), \quad (2.22)$$

for some $\sigma>0$. It is known that if $(x^{*},y^{*})$ is a solution of (2.22) for any $\sigma>0$, then $y^{*}\in\text{prox}_{\sigma\|\cdot\|_{0}}(x^{*})$. In addition, $x^{*}$ is a solution of problem (2.21) if and only if $(x^{*},y^{*})$ is a solution to (2.22). The proximal problem (2.13) now becomes

$$\left\{\begin{aligned} &y^{k+1}\in\text{prox}_{\sigma\|\cdot\|_{0}}\left(y^{k}-\frac{1}{\tilde{L}_{c}}\nabla P\left(y^{k}\right)\right),\\ &x^{k+1}=\text{prox}_{I_{R}}\left(y^{k+1}\right).\end{aligned}\right. \quad (2.23)$$

In particular, the proximal operator of the indicator function $I_{R}$ reduces to a projection onto $R$:

$$\text{prox}_{I_{R}}(x)=\left\{\begin{array}{ll}x,&\text{if }\mu^{T}x\geq r,\\ \frac{r}{\mu^{T}x}x,&\text{if }\mu^{T}x<r.\end{array}\right. \quad (2.24)$$

Meanwhile, the proximal operator of the $\ell_{0}$-norm can be expressed in its component-wise form:

$$\left(\text{prox}_{\sigma\|\cdot\|_{0}}(x)\right)_{i}=\left\{\begin{array}{cl}\{0\},&\text{if }x_{i}<\sqrt{2\sigma},\\ \{0,x_{i}\},&\text{if }x_{i}=\sqrt{2\sigma},\\ \{x_{i}\},&\text{if }x_{i}>\sqrt{2\sigma}.\end{array}\right. \quad (2.25)$$

Note that $\text{prox}_{\sigma\|\cdot\|_{0}}(x)$ is known as a hard-thresholding operator, since it forces all components $x_{i}$ except the large ones to be zero [23]. In other words, a larger $\sigma$ results in higher sparsity and less penalization for moving away from $x$. Doing so ensures that our portfolio selection avoids small investments.
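The threshold $\sqrt{2\sigma}$ in (2.25) can be checked directly from Definition 2.1 in the scalar case: for a component $x_{i}\geq 0$, any nonzero $y$ incurs cost $1+(y-x_{i})^{2}/(2\sigma)$, minimized at $y=x_{i}$ with value $1$, while $y=0$ costs $x_{i}^{2}/(2\sigma)$; hence

$$y=0\ \text{ is optimal }\iff\ \frac{x_{i}^{2}}{2\sigma}<1\iff x_{i}<\sqrt{2\sigma},$$

with a tie at $x_{i}=\sqrt{2\sigma}$. Both proximal maps (2.24) and (2.25) are cheap to evaluate. Below is a minimal sketch in Python, assuming nonnegative weights as in (2.1) and breaking the tie in (2.25) by keeping $x_{i}$:

```python
import numpy as np

def prox_l0(x, sigma):
    """Hard thresholding (2.25): zero out entries below sqrt(2*sigma)."""
    tau = np.sqrt(2.0 * sigma)
    return np.where(x >= tau, x, 0.0)

def prox_indicator_R(x, mu, r):
    """Map onto R = {x : mu^T x >= r} via the scaling in (2.24)."""
    m = mu @ x
    return x if m >= r else (r / m) * x  # assumes mu^T x > 0 when scaling
```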

In the next section, we will see how the proximal operators are evaluated alternately to give us the optimal solution for problem (2.1).

3 Alternating proximal algorithm and its convergence

In this section, we present an ADMM algorithm to find the optimal portfolio of the proposed SEPO-$\ell_{0}$ model (2.1) and establish its global convergence.

SEPO-$\ell_{0}$ Algorithm

Step 0. Given $\beta_{1},\beta_{2},\sigma,r,V,\mu,\rho,\alpha$, an initial point $(x^{0},\lambda^{0})$ and a convergence tolerance $\varepsilon$. Set $k:=0$.

Step 1. Compute $\hat{x}^{k+1}\in\text{prox}_{\sigma\|\cdot\|_{0}}\left(x^{k}-\alpha\nabla P(x^{k},\lambda^{k})\right)$.

Step 2. Compute $x^{k+1}=\text{prox}_{I_{R}}\left(\hat{x}^{k+1}\right)$.

Step 3. Compute $\lambda^{k+1}=\lambda^{k}+\rho(e^{T}x^{k+1}-1)$.

Step 4. If $\left\|\nabla P\left(x^{k+1},\lambda^{k+1}\right)\right\|<\varepsilon$ or $k>10000$, stop. Otherwise, set $k:=k+1$ and go to Step 1.
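Putting the pieces together, a compact sketch of the algorithm in Python is given below. It reuses the helper functions step_size, prox_l0 and prox_indicator_R sketched earlier, starts from the equally-weighted portfolio used in Section 4, and assumes $\lambda^{0}=0$; it is an illustration of the iteration, not the authors' reference implementation:

```python
import numpy as np

def grad_P(x, lam, V, mu, beta1, beta2, rho):
    """Gradient of the smooth part P(x, lambda) in (2.9)."""
    e = np.ones_like(x)
    return beta1 * (V @ x) - mu + beta2 * x + lam * e + rho * (e @ x - 1.0) * e

def sepo_l0(V, mu, beta1, beta2, sigma, r, rho=5.0, eps=1e-7, max_iter=10000):
    n = len(mu)
    x, lam = np.full(n, 1.0 / n), 0.0        # x^0 equally weighted; lambda^0 = 0 (assumed)
    alpha = step_size(V, beta1, beta2, rho)  # alpha = 1 / L_tilde from (2.16)
    for _ in range(max_iter):
        g = grad_P(x, lam, V, mu, beta1, beta2, rho)
        x_hat = prox_l0(x - alpha * g, sigma)     # Step 1: hard thresholding
        x = prox_indicator_R(x_hat, mu, r)        # Step 2: map onto R
        lam = lam + rho * (x.sum() - 1.0)         # Step 3: multiplier update
        if np.linalg.norm(grad_P(x, lam, V, mu, beta1, beta2, rho)) < eps:
            break                                 # Step 4: stopping rule
    return x
```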

We have seen in Section 2 how the proposed proximal method guarantees the descent of the solution. To proceed with the convergence of the SEPO-$\ell_{0}$ algorithm, we begin with Assumption A for any objective function $\mathcal{L}\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ where $\mathcal{L}=\psi+\phi$:

Assumption A

(i) $\psi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a continuously differentiable function whose gradient $\nabla\psi$ is Lipschitz continuous with modulus $L_{c}$.

(ii) $\phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ is a proper and lower semicontinuous function.

(iii) $\inf_{\mathbb{R}^{n}}\psi>-\infty$ and $\inf_{\mathbb{R}^{n}}\phi>-\infty$.

The SEPO-$\ell_{0}$ algorithm also enjoys the following convergence properties for (2.7):

Lemma 3.1 (Convergence properties [7]).

Suppose that Assumption A holds. Let $\{x^{k}\}_{k\in\mathbb{N}}$ be a sequence generated by the SEPO-$\ell_{0}$ algorithm. Then, the sequence $\{\mathcal{L}(x^{k},\lambda^{k}):k\in\mathbb{N}\}$ is nonincreasing and in particular

$$\mathcal{L}\left(x^{k}\right)-\mathcal{L}\left(x^{k+1}\right)\geq\frac{1}{2}\left(\tilde{L}_{c}-L_{c}\right)\|x^{k+1}-x^{k}\|^{2}. \quad (3.1)$$

Moreover,

$$\sum^{\infty}_{k=1}\left\|x^{k+1}-x^{k}\right\|^{2}<\infty, \quad (3.2)$$

and hence

$$\lim_{k\rightarrow\infty}\ \left\|x^{k+1}-x^{k}\right\|=0. \quad (3.3)$$
Proof.

Without loss of generality, we let $\lambda$ be a fixed constant and work with $\mathcal{L}(x)=P(x)+Q(x)$ in place of $\mathcal{L}(x,\lambda)$, where $P(x)$ is given by (2.9) and $Q(x)$ is given by (2.10). Note that $P(x)$ is differentiable and its gradient is Lipschitz continuous with modulus $L_{c}$. Invoking the SEPO-$\ell_{0}$ algorithm and Lemma 2.2, we have

$$P\left(x^{k+1}\right)+Q\left(x^{k+1}\right)\leq P\left(x^{k}\right)+Q\left(x^{k}\right)-\frac{1}{2}\left(\tilde{L}_{c}-L_{c}\right)\left\|x^{k+1}-x^{k}\right\|^{2}, \quad (3.4)$$

where $\tilde{L}_{c}$ is given by (2.16). Writing $\mathcal{L}(x^{k})=P(x^{k})+Q(x^{k})$ in (3.4) and rearranging leads to (3.1), which asserts that the sequence $\{\mathcal{L}(x^{k},\lambda^{k}):k\in\mathbb{N}\}$ is nonincreasing.

Note that $P$ and $Q$ are bounded below (see Assumption A), and hence $\mathcal{L}$ converges to some $\underline{\mathcal{L}}$. Let $N\in\mathbb{N}_{+}$. We sum up (3.1) from $k=0$ to $k=N-1$ to get

$$\begin{aligned}\sum^{N-1}_{k=0}\left\|x^{k+1}-x^{k}\right\|^{2} &\leq\frac{2}{\tilde{L}_{c}-L_{c}}\sum^{N-1}_{k=0}\left(\mathcal{L}\left(x^{k}\right)-\mathcal{L}\left(x^{k+1}\right)\right)\\
&=\frac{2}{\tilde{L}_{c}-L_{c}}\left(\mathcal{L}\left(x^{0}\right)-\mathcal{L}\left(x^{N}\right)\right)\\
&\leq\frac{2}{\tilde{L}_{c}-L_{c}}\left(\mathcal{L}\left(x^{0}\right)-\underline{\mathcal{L}}\right).\end{aligned}$$

It follows that (3.2) and (3.3) hold when we take the limit as $N\rightarrow\infty$. ∎

Before we present the result that sums up the properties of the sequence $\{x^{k}\}_{k\in\mathbb{N}}$ generated by the SEPO-$\ell_{0}$ algorithm starting from the initial point $x^{0}$, we first give some basic notations. We denote by $\text{crit }\mathcal{L}$ the set of critical points of $\mathcal{L}$ and by $\omega\left(x^{0}\right)$ the set of all limit points, where

$$\omega\left(x^{0}\right)=\left\{\overline{x}\in\mathbb{R}^{n}:\exists\text{ an increasing sequence of integers }\{k_{j}\}_{j\in\mathbb{N}}\text{ such that }x^{k_{j}}\rightarrow\overline{x}\text{ as }j\rightarrow\infty\right\}.$$

Given any set $\Omega\subset\mathbb{R}^{n}$ and any point $x\in\mathbb{R}^{n}$, the distance from $x$ to $\Omega$ is denoted and defined by

$$\text{dist}(x,\Omega):=\inf\left\{\|y-x\|\colon y\in\Omega\right\}.$$

When $\Omega=\emptyset$, we invoke the usual convention that $\inf\emptyset=\infty$ and hence $\text{dist}(x,\Omega)=\infty$ for all $x$.

Lemma 3.2 (Properties of limit points [7]).

Suppose that Assumption A holds. Let $\{x^{k}\}_{k\in\mathbb{N}}$ be a bounded sequence generated by the SEPO-$\ell_{0}$ algorithm. Then, the following hold:

(a) $\omega\left(x^{0}\right)$ is a nonempty, compact and connected set.

(b) $\omega\left(x^{0}\right)\subset\text{crit }\mathcal{L}$.

(c) $\lim_{k\rightarrow\infty}\text{dist}\left(x^{k},\omega\left(x^{0}\right)\right)=0$.

(d) The objective function $\mathcal{L}$ is finite and constant on $\omega\left(x^{0}\right)$.

Proof.

See Bolte et al. [7]. ∎

What remains is its global convergence, which we shall establish by means of the Kurdyka-Łojasiewicz (KL) property [7], an extension of the Łojasiewicz gradient inequality [19] to non-smooth functions. We first show that the objective function (2.7) is semi-algebraic and therefore a KL function. This, in turn, is crucial in giving us the convergence property of the sequences generated via the SEPO-$\ell_{0}$ algorithm. We begin by recalling notations and definitions concerning the subdifferential (see, for instance, [7, 23]) and the KL property.

Definition 3.1.

Let $\phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. The (limiting) subdifferential of $\phi$ at $x\in\text{dom }\phi$ is denoted and defined by

$$\partial\phi(x)=\left\{u\in\mathbb{R}^{n}\colon\exists\,x^{k}\rightarrow x,\ \phi(x^{k})\rightarrow\phi(x),\ u^{k}\rightarrow u,\ \liminf_{y\rightarrow x^{k}}\ \frac{\phi(y)-\phi(x^{k})-\langle u^{k},y-x^{k}\rangle}{\|y-x^{k}\|}\geq 0\right\}. \quad (3.5)$$

The point $x$ is called a (limiting) critical point of $\phi$ if $0\in\partial\phi(x)$.

It follows that $0\in\partial\phi(x)$ if $x\in\mathbb{R}^{n}$ is a local minimizer of $\phi$. If $\phi$ is continuously differentiable, then $\partial\phi(x)=\{\nabla\phi(x)\}$ and hence we have the usual gradient mapping $\nabla\phi$ from $x\in\text{dom }\phi$ to $\nabla\phi(x)$. If $\phi$ is convex, the subdifferential (3.5) turns out to be the classical Fréchet subdifferential (see [23]).

Let $\eta\in(0,\infty]$ and denote by $\Phi_{\eta}$ the class of all concave and continuous functions $\varphi\colon[0,\eta)\rightarrow\mathbb{R}_{+}$ that are continuously differentiable on $(0,\eta)$ and continuous at $0$, with $\varphi(0)=0$ and $\varphi^{\prime}(s)>0$ for all $s\in(0,\eta)$.

Definition 3.2 (Kurdyka-Łojasiewicz (KL) property).

Let $\phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. The function $\phi$ is said to have the Kurdyka-Łojasiewicz (KL) property at $\bar{u}\in\text{dom }\partial\phi:=\left\{u\in\mathbb{R}^{n}\colon\partial\phi(u)\neq\emptyset\right\}$ if there exist $\eta\in(0,+\infty]$, a neighbourhood $U$ of $\bar{u}$ and a function $\varphi\in\Phi_{\eta}$ such that for all $u\in U\cap\left[\phi(\bar{u})<\phi<\phi(\bar{u})+\eta\right]$, the following inequality holds:

$$\varphi^{\prime}\left(\phi(u)-\phi(\bar{u})\right)\ \text{dist}\left(0,\partial\phi\left(u\right)\right)\geq 1. \quad (3.6)$$

Moreover, $\phi$ is called a KL function if it satisfies the KL property at each point of $\text{dom }\phi$.

The definition above uses sublevel sets: given $a,b\in\mathbb{R}$, the sublevel sets of a function $\phi$ are denoted and defined by

$$[a\leq\phi\leq b]:=\left\{x\in\mathbb{R}^{n}\colon a\leq\phi(x)\leq b\right\}.$$

A similar definition holds for $[a<\phi<b]$. The level sets of $\phi$ are denoted and defined by

$$[\phi=a]:=\left\{x\in\mathbb{R}^{n}\colon\phi(x)=a\right\}.$$

Closely related to the KL function is the semi-algebraic function, which is crucial in the proof of the convergence property of our proposed method.

Definition 3.3 (Semi-algebraic sets and functions).
(i) A subset $\Omega\subset\mathbb{R}^{n}$ is called a semi-algebraic set if there exists a finite number of real polynomial functions $p_{ij}$ and $q_{ij}$ such that

$$\Omega=\bigcup^{p}_{j=1}\bigcap^{q}_{i=1}\left\{u\in\mathbb{R}^{n}\colon p_{ij}(u)=0\text{ and }q_{ij}(u)<0\right\}. \quad (3.7)$$

(ii) A function $\phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ is called a semi-algebraic function if its graph

$$\left\{(u,t)\in\mathbb{R}^{n+1}\colon\phi(u)=t\right\} \quad (3.8)$$

is a semi-algebraic subset of $\mathbb{R}^{n+1}$.

It follows that semi-algebraic functions are indeed KL functions, as asserted by the result below, which is a non-smooth version of the Łojasiewicz gradient inequality.

Theorem 3.4 ([5, 6]).

Let $\phi\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ be a proper and lower semicontinuous function. If $\phi$ is semi-algebraic, then it is a KL function.

Theorem 3.4 allows us to avoid the technicalities of proving the KL property directly, owing to the broad range of functions and sets that are indeed semi-algebraic (see, for instance, [4, 7]). Examples of semi-algebraic functions include real polynomial functions and indicator functions of semi-algebraic sets. Moreover, finite sums and products of semi-algebraic functions, as well as their scalar products, are all semi-algebraic.

We are now ready to give the global convergence result of the proposed model (2.1).

Theorem 3.5 (Global convergence).

Suppose the objective function $\mathcal{L}\colon\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\{+\infty\}$ is a KL function such that Assumption A holds. Then the sequence $\{x^{k}\}_{k\in\mathbb{N}}$ generated by the SEPO-$\ell_{0}$ algorithm converges to a critical point $x^{*}$.

Proof.

See Bolte et al. [7]. ∎

In view of Theorem 3.5, it remains to show that (2.7) is a KL function; by Theorem 3.4, it suffices to show that each term in (2.7) is semi-algebraic, since a finite sum of semi-algebraic functions is also semi-algebraic. The function (2.7) is a sum of a smooth function $P(x)$, the $\ell_{0}$-norm and an indicator function. The function $P(x)$ given by (2.9) is a linear combination of linear and quadratic functions, and hence $P(x)$ is a real polynomial function, which in turn is semi-algebraic.

As a specific example given by Bolte et al. [7], the $\ell_{0}$-norm is nothing but the sparsity measure of the vector $x\in\mathbb{R}^{n}$, which is indeed semi-algebraic. In particular, the graph of $\|\cdot\|_{0}$ is given by a finite union of product sets:

$$\text{graph }\|\cdot\|_{0}=\bigcup_{I\subset\{1,\dots,n\}}\left(\prod^{n}_{i=1}J^{I}_{i}\right)\times\{n-|I|\}, \quad (3.9)$$

where for any given $I\subset\{1,\dots,n\}$, $|I|$ denotes the cardinality of $I$ and

$$J^{I}_{i}=\left\{\begin{array}{ll}\{0\},&\text{if }i\in I,\\ \mathbb{R}\setminus\{0\},&\text{otherwise}.\end{array}\right.$$

It is obvious that (3.9) is a piecewise linear set, hence the claim. Lastly, the indicator function $I_{R}(x)$ defined by (2.5) is also semi-algebraic, since the feasible set (2.4) is a convex set described by a linear (polynomial) inequality and is therefore semi-algebraic.

4 Numerical experiments and results

In this section, we study the efficiency of the proposed portfolio optimization model, SEPO-$\ell_{0}$, in maximizing portfolio return and minimizing transaction cost. We test our algorithm on real data of stock prices and returns of 100 companies across 10 different sectors in China, collected from January 2019 to June 2019. These data are in turn used to generate the covariance matrix, which gives us the portfolio variance in our problem (2.1). We start with the equally-weighted portfolio, i.e. $x_{i}^{0}=\frac{1}{n}$ for all $i$. We set $\varepsilon=10^{-7}$ and stop the algorithm when $\left\|\nabla P\left(x^{k+1},\lambda^{k+1}\right)\right\|<\varepsilon$ or $k>10000$. All computational results are obtained by running Matlab R2021a on Windows 10 (Intel Core i7-1065G7 CPU @ 1.30 GHz–1.50 GHz, 16 GB RAM).
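For readers who wish to reproduce this setup, the preparation step amounts to estimating $\mu$ and $V$ from a matrix of observed returns. A minimal sketch in Python (the paper's experiments use Matlab; the file name and return matrix here are illustrative assumptions):

```python
import numpy as np

# Assumes `returns` is a T x n matrix of observed asset returns
# (T periods, n = 100 stocks).
returns = np.loadtxt("stock_returns.csv", delimiter=",")  # hypothetical input file
mu = returns.mean(axis=0)           # mean return vector
V = np.cov(returns, rowvar=False)   # n x n covariance matrix
n = len(mu)
x0 = np.full(n, 1.0 / n)            # equally-weighted starting portfolio
```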

For testing purposes, we set the penalty parameter $\rho=5$ and the tuning parameter $\beta_{2}=1$. The latter means that we keep the weight on portfolio diversification constant. Meanwhile, the value of $\beta_{1}$ is chosen to be relatively smaller than $\beta_{2}$. For illustration, we present our results for minimum guaranteed return ratios $r=0.1$ and $r=0.2$.

In Table 4.1, we present the computational results for the expected return, variance risk and sparsity under the proposed SEPO-$\ell_{0}$ model and the standard MVO model for different values of $\beta_{1}$, with the minimum guaranteed return ratio set to 0.1 and 0.2, respectively. Note that though we leverage on the variance risk when $\beta_{1}=1$, the portfolio selection under SEPO-$\ell_{0}$ manages to generate expected returns of 0.3455 and 0.4014 for $r=0.1$ and $r=0.2$, respectively. Meanwhile, the standard MVO is only able to generate an expected return of 0.1560 when we set $r=0.1$. The variance risks, however, are higher under the proposed model due to sparsity, as compared to the maximum diversification of the standard MVO. From the table, we can see that our model offers a good level of sparsity, between 30% and 61% when $r=0.1$ and between 52% and 72% when $r=0.2$. This simply means that out of the 100 stocks considered under minimum expected return ratio $r=0.1$, one only needs to invest in the selected 39 to 70 stocks for which the algorithm returns nonzero $x_{i}$'s. Despite the sparse portfolio selection and increased risk, the proposed model is more promising in terms of higher expected return.

Columns 2–4: SEPO-$\ell_{0}$ with $r=0.1$; columns 5–7: standard MVO with $r=0.1$; columns 8–10: SEPO-$\ell_{0}$ with $r=0.2$.

| $\beta_{1}$ | E.R. | V.R. | Spar | E.R. | V.R. | Spar | E.R. | V.R. | Spar |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.6355 | 3.2835 | 58% | 0.6889 | 2.3603 | 0% | 0.7441 | 4.3108 | 72% |
| 0.2 | 0.6279 | 2.9577 | 61% | 0.5735 | 1.5138 | 0% | 0.6732 | 3.2822 | 66% |
| 0.3 | 0.5050 | 2.1304 | 47% | 0.4555 | 1.0320 | 0% | 0.5829 | 2.4655 | 58% |
| 0.4 | 0.5180 | 2.0288 | 53% | 0.3760 | 0.8114 | 0% | 0.5796 | 2.2333 | 64% |
| 0.5 | 0.4865 | 1.7976 | 51% | 0.3003 | 0.6689 | 0% | 0.5374 | 1.9056 | 64% |
| 0.6 | 0.4237 | 1.5684 | 39% | 0.2646 | 0.5785 | 0% | 0.4675 | 1.6193 | 52% |
| 0.7 | 0.3677 | 1.4070 | 30% | 0.2223 | 0.5324 | 0% | 0.4655 | 1.4800 | 59% |
| 0.8 | 0.3581 | 1.3248 | 31% | 0.2057 | 0.4930 | 0% | 0.4521 | 1.3289 | 63% |
| 0.9 | 0.3787 | 1.2635 | 44% | 0.1750 | 0.4704 | 0% | 0.4182 | 1.2149 | 56% |
| 1.0 | 0.3455 | 1.1802 | 40% | 0.1560 | 0.4501 | 0% | 0.4014 | 1.1204 | 56% |

Note: E.R. = Expected return, V.R. = Variance risk, Spar = Sparsity.

Table 4.1: Portfolio expected return, risk and sparsity for different $\beta_{1}$ with minimum guaranteed return ratios $r=0.1$ and $r=0.2$ under SEPO-$\ell_{0}$ and the standard MVO.
[Figure 4.1: SEPO-$\ell_{0}$ vs standard MVO with minimum guaranteed return ratio $r=0.1$; panel (a) shows expected return, panel (b) shows variance risk.]

We also compare the expected return and variance risk of SEPO-$\ell_{0}$ and the standard MVO for $r=0.1$ using a scatter plot, as seen in Figure 4.1. The downward trends of the portfolio expected return and risk mimic the standard MVO as $\beta_{1}\rightarrow 1$. Note that a higher value of $\beta_{1}$ reflects our leverage on the variance risk over expected return. At the same time, higher expected return comes with higher risk, as shown in Table 4.1. In general, the standard MVO model gives a lower measure of risk due to maximum diversification, as we can see from Table 4.1 and Figure 4.1. The proposed SEPO-$\ell_{0}$, on the other hand, can lead to higher expected return and lower total transaction cost due to a sparse portfolio. This shows that the SEPO-$\ell_{0}$ model is able to provide a good combination of portfolio selection under sparsity.

To illustrate the reliability of our model, we present the output of the proposed model for $r=0.1$ using a scatter plot of the variables, as shown in Figure 4.2, with $\beta_{1}$ as the independent variable on the $x$-axis, expected return and sparsity (in decimal) on the left $y$-axis, and risk on the right $y$-axis. We can observe a similar trend for the three lines, which clearly reflects the consistency of our model in obtaining optimal portfolio selections.

[Figure 4.2: Portfolio expected return, risk and sparsity subject to minimum guaranteed return ratio $r=0.1$ under SEPO-$\ell_{0}$, for different values of $\beta_{1}$.]

The relationship between the independent variable $\beta_{1}$ and the response variables is further examined using a multivariate linear regression model, as presented in Table 4.2. As we can see from the table, the slope estimates for the response variables $y$ are all negative, which means their values decrease as $\beta_{1}$ increases. Since the $p$-values of all response variables are approximately zero, it is clear that these three variables are significant. In particular, $\beta_{1}$ has a significant negative relationship with expected return, risk and sparsity.

| Response variable | Intercept estimate | Slope estimate | Standard error | $p$-value | R-squared |
|---|---|---|---|---|---|
| Expected return, $y_{1}$ | 0.6514 | -0.3396 | 0.0383 | 2.0737e-05 | 0.9076 |
| Risk, $y_{2}$ | 3.1246 | -2.2371 | 0.2992 | 7.0894e-05 | 0.8748 |
| Sparsity, $y_{3}$ | 0.6013 | -0.2679 | 0.0796 | 9.8657e-05 | 0.5859 |

Table 4.2: Relationship between the independent variable $\beta_{1}$ and the response variables using a multivariate linear regression model for SEPO-$\ell_{0}$ with $r=0.1$. The slope estimate, standard error and $p$-value refer to the coefficient on $\beta_{1}$ in each regression.

The significance of $\beta_{1}$ for these three dependent variables is supported by the R-squared values of the univariate regressions, standing at 90.76%, 87.48% and 58.59% for expected return, risk and sparsity, respectively. Since R-squared is the percentage of total variation explained by the predictor variable, the high R-squared values of greater than 80% for expected return and risk mean that $\beta_{1}$ explains a high percentage of the variance in these two response variables. It is slightly lower for sparsity; however, any R-squared value greater than 50% can be considered moderately high.
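As a quick check, the expected-return row of Table 4.2 can be reproduced from the $r=0.1$ SEPO-$\ell_{0}$ columns of Table 4.1 with an ordinary least-squares fit; the sketch below uses Python and SciPy for illustration (risk and sparsity are analogous):

```python
import numpy as np
from scipy import stats

# Univariate regression of expected return on beta1, using the r = 0.1
# SEPO-l0 column of Table 4.1.
beta1 = np.arange(0.1, 1.01, 0.1)
exp_ret = np.array([0.6355, 0.6279, 0.5050, 0.5180, 0.4865,
                    0.4237, 0.3677, 0.3581, 0.3787, 0.3455])
fit = stats.linregress(beta1, exp_ret)
print(fit.intercept, fit.slope, fit.pvalue, fit.rvalue**2)
# roughly 0.6514, -0.3396, 2.07e-05, 0.9076, matching Table 4.2
```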

5 Conclusion

The classical Markowitz portfolio scheme, or mean-variance optimization (MVO), is one of the most successful frameworks due to its simplicity of implementation; in particular, it can be solved by quadratic programming, which is widely available. However, it is very sensitive to input parameters, and obtaining acceptable solutions requires the right weight constraints. Over the past decade, there has been renewed attention on non-quadratic portfolio selection models, due to advancements in optimization algorithms for solving more general classes of functions. Here we proposed a new algorithmic framework that allows portfolio managers to strike a balance between diversifying investments and minimizing transaction cost, the latter achieved by minimizing the $\ell_{0}$-norm. This simply means that the model maximizes sparsity within the portfolio, since the weights $x_{i}$ are forced to be zero except for large ones. In practice, $\ell_{0}$ regularization results in a discontinuous and nonconvex problem, and hence is often approximated via the $\ell_{1}$-norm. In this study, we employed proximal methods so that the function can be 'smoothed', by linearizing part of the objective function at some given point and regularizing by a quadratic proximal term that acts as a measure of the 'local error' in the approximation. Writing our problem in the form of an augmented Lagrangian, the unconstrained problem can be divided into two parts, namely the smooth and non-smooth terms. These terms are then handled separately through their proximal operators via the ADMM method. The global convergence of the proposed SEPO-$\ell_{0}$ algorithm for sparse equity portfolios has been established. The efficiency of our model in maximizing portfolio expected return while striking a balance between minimizing transaction cost and diversification has been analyzed using actual data of 100 companies. Empirically, the implementation of our model leads to higher expected return and lower transaction cost. This shows that, despite its higher risk compared to the standard MVO, the SEPO-$\ell_{0}$ model is promising in generating a good combination for an optimal investment portfolio.

References

  • [1] A. Beck and M. Teboulle. Convex Optimization in Signal Processing and Communications, chapter Gradient-based algorithms with applications to signal recovery problems, pages 42–88. Cambridge University Press, 2009.
  • [2] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 3rd edition, 2016.
  • [3] M. J. Best and R. R. Grauer. On the sensitivity of mean-variance-efficient portfolios to changes in asset means: some analytical and computational results. The Review of Financial Studies, 4(2):315–342, 1991.
  • [4] J. Bochnak, M. Coste, and M. F. Roy. Real algebraic geometry, volume 36. Springer Science & Business Media, 2013.
  • [5] J. Bolte, A. Daniilidis, and A. Lewis. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization, 17(4):1205–1223, 2007.
  • [6] J. Bolte, A. Daniilidis, A. Lewis, and M. Shiota. Clarke subgradients of stratifiable functions. SIAM Journal on Optimization, 18(2):556–572, 2007.
  • [7] J. Bolte, S. Sabach, and M. Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146(1):459–494, 2014.
  • [8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
  • [9] J. Brodie, I. Daubechies, C. De Mol, D. Giannone, and I. Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences, 106(30):12267–12272, 2009.
  • [10] J. Chen, G. Dai, and N. Zhang. An application of sparse-group lasso regularization to equity portfolio optimization and sector selection. Annals of Operations Research, 284(1):243–262, 2020.
  • [11] Z. Dai and F. Wen. A generalized approach to sparse and stable portfolio optimization problem. Journal of Industrial & Management Optimization, 14(4):1651–1666, 2018.
  • [12] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences, 57(11):1413–1457, 2004.
  • [13] C. De Mol. Financial Signal Processing and Machine Learning, chapter Sparse Markowitz Portfolios, pages 11–22. Wiley Online Library, 2016.
  • [14] V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal. A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5):798–812, 2009.
  • [15] B. Fastrich, S. Paterlini, and P. Winker. Constructing optimal sparse portfolios using regularization methods. Computational Management Science, 12(3):417–434, 2015.
  • [16] M. Fazel, T. K. Pong, D. Sun, and P. Tseng. Hankel matrix rank minimization with applications to system identification and realization. SIAM Journal on Matrix Analysis and Applications, 34(3):946–977, 2013.
  • [17] D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17–40, 1976.
  • [18] Z. R. Lai, P. Y. Yang, L. Fang, and X. Wu. Short-term sparse portfolio optimization based on alternating direction method of multipliers. The Journal of Machine Learning Research, 19:1–28, 2018.
  • [19] S. Łojasiewicz. Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles, 117:87–89, 1963.
  • [20] H. Markowitz. Portfolio selection. Journal of Finance, 7(1):77–91, 1952.
  • [21] R. O. Michaud. The Markowitz optimization enigma: Is ‘optimized’ optimal? Financial Analysts Journal, 45(1):31–42, 1989.
  • [22] S. Perrin and T. Roncalli. Machine learning optimization algorithms & portfolio allocation. Machine Learning for Asset Management: New Developments and Financial Applications, pages 261–328, 2020.
  • [23] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis. Springer-Verlag, New York, 1998.