
Statistical Proxy based Mean-Reverting Portfolios with Sparsity and Volatility Constraints

Ahmad Mousavi, Department of Mathematics and Statistics, American University (Emails: [email protected] and [email protected]); George Michailidis, Department of Statistics, University of Florida
Abstract

Mean-reverting portfolios with volatility and sparsity constraints are of prime interest to practitioners in finance, since they are both profitable and well-diversified, while also managing risk and minimizing transaction costs. Three main measures serve as statistical proxies to capture the mean-reversion property: predictability, the portmanteau criterion, and crossing statistics. If, in addition, reasonable volatility and sparsity are desired for the portfolio, a convex quadratic or quartic objective function must be minimized subject to nonconvex quadratic and cardinality constraints. In this paper, we introduce and investigate a comprehensive modeling framework that incorporates all the proxies previously proposed in the literature, and we develop an effective unifying algorithm that obtains a Karush–Kuhn–Tucker (KKT) point under mild regularity conditions. Specifically, we present a tailored penalty decomposition method that approximately solves a sequence of penalized subproblems by a block coordinate descent algorithm. To the best of our knowledge, the proposed algorithm is the first method for directly solving volatile, sparse, and mean-reverting portfolio problems based on the portmanteau criterion and crossing statistics proxies. Further, we establish that the convergence analysis extends to the nonconvex objective case, provided that the starting penalty parameter exceeds a finite bound and the objective function has a bounded level set. Numerical experiments on the S&P 500 data set demonstrate the efficiency of the proposed algorithm in comparison to a semidefinite relaxation-based approach and suggest that the crossing statistics proxy yields more desirable portfolios.

Index Terms—Penalty Decomposition Methods, Sparse Optimization, Mean-reverting Portfolios, Predictability Proxy, Portmanteau Criterion, Crossing Statistics.

1 Introduction

Portfolios exhibiting mean-reverting behavior are of great interest to practitioners due to their predictability, which creates arbitrage opportunities. The task is to construct a portfolio that tends to return to its average value over time and to use its performance to make trading decisions. Traditionally, such portfolios are obtained by considering a linear combination of assets that is stationary, through co-integration methods. However, this approach often results in a large number of assets and may not exhibit reasonable volatility, making such portfolios practically inapplicable [12, 28]. Therefore, along with mean reversion, reasonable volatility is a mandatory property in realistic situations. Moreover, trading costs can be notably reduced by limiting the number of assets, so sparse portfolios offer advantages (the notion of sparsity is widely used in signal processing and machine learning applications; e.g., [19] and references therein). Hence, constructing a mean-reverting portfolio that exhibits reasonable volatility and comprises a relatively small number of assets has attracted attention in the literature [20, 8]. More broadly, the concept of sparsity is highly relevant across many applications in contemporary science [18, 24, 17, 1].

Three standard proxies that reflect the mean-reversion property are the predictability measure, the portmanteau criterion, and crossing statistics [9]. The predictability measure, initially defined for stationary processes in [4] and later generalized to non-stationary processes by [3], assesses how similar a time series is to a noise process. Let A_{i} for i=0,1,\dots,q be the empirical autocovariance matrices of a multivariate time series (see (16) for details); then the mean-reversion property via the predictability proxy can be captured using the following problem [9]:

\min_{x\in\mathbb{R}^{n}}\quad x^{T}A_{1}x\qquad\textrm{subject to}\qquad x^{T}x=1. (1)
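Without the volatility and sparsity constraints added later, (1) is a Rayleigh-quotient minimization, so its solution is a unit eigenvector of A_1 associated with its smallest eigenvalue. A minimal sketch, assuming a symmetric A_1 supplied as a NumPy array (the function name is ours):

```python
import numpy as np

def predictability_portfolio(A1):
    """Solve (1): minimize x^T A_1 x over unit vectors x.
    By the Rayleigh-quotient characterization, the minimizer is a
    unit eigenvector of A_1 for its smallest eigenvalue."""
    w, V = np.linalg.eigh((A1 + A1.T) / 2)  # eigenvalues in ascending order
    return V[:, 0]
```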

The portmanteau statistic, proposed by [16], is used to determine whether a process is white noise. In particular, mean reversion is obtained via the portmanteau criterion proxy as follows [9]:

\min_{x\in\mathbb{R}^{n}}\quad\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)^{2}\qquad\textrm{subject to}\qquad x^{T}x=1. (2)

The choice of q depends on factors such as data availability, portfolio size, risk tolerance, and computational complexity; hence, in applications, sensitivity analysis is recommended for selecting an appropriate q [11, 14]. The crossing statistic calculates the likelihood of a process crossing its mean per unit of time [26]. Precisely, the crossing statistics proxy yields mean reversion via [9]:

\min_{x\in\mathbb{R}^{n}}\quad x^{T}A_{1}x+\gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)^{2}\qquad\textrm{subject to}\qquad x^{T}x=1, (3)

where \gamma serves as a hyperparameter governing the influence of the lag-i autocovariance matrices (see (16)) on the mean-reversion characteristic of the target portfolio. On the other hand, recall that sufficient volatility can be obtained by imposing a constraint on the variance, that is, x^{T}A_{0}x\geq\phi for some positive threshold \phi, where A_{0} is the covariance matrix. Hence, the statistical proxy-based problems (1), (2), and (3), together with volatility and sparsity constraints, can be formulated into the following comprehensive model:

\min_{x\in\mathbb{R}^{n}}\quad f(x):=\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)^{2} (P)
subject to\quad x^{T}A_{0}x\geq\phi,\quad x^{T}x=1,\quad\text{and}\quad\|x\|_{0}\leq k,

where \alpha\in\{0,1\}, 2\leq q\in\mathbb{N}, A_{0}\succ 0, A_{i}\succeq 0 for i=1,\dots,q, \gamma\geq 0, \phi>0, and k\leq n (for further details see (17)). We emphasize that autocovariance matrices provide information about the temporal dependence structure among variables in a time series data set and are commonly used in time series analysis and econometrics for forecasting, model estimation, and inference; see Section 3 for details. Lastly, it is important to emphasize that \alpha is not a hyperparameter; instead, it is a parameter determined immediately by the model choice, as detailed in equation (17). Also, the hyperparameter \gamma matters only for the crossing statistics-based model and plays no role in the predictability- and portmanteau criterion-based models.
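To make the ingredients of (P) concrete, the following sketch forms symmetrized sample lag autocovariance matrices from a T x n series and evaluates f(x). The paper's exact estimator is given in (16), so the normalization and symmetrization used here are our assumptions:

```python
import numpy as np

def autocovariance_matrices(S, q):
    """Sample lag-i autocovariance matrices A_0, ..., A_q of a T x n
    multivariate series S, symmetrized so that each A_i = A_i^T.
    (The paper's exact definition is (16); this is a standard estimate.)"""
    T, n = S.shape
    Sc = S - S.mean(axis=0)                   # center each series
    A = []
    for i in range(q + 1):
        C = Sc[:T - i].T @ Sc[i:] / (T - i)   # lag-i cross-products
        A.append((C + C.T) / 2)
    return A

def f_objective(x, A, alpha, gamma):
    """Objective of (P): alpha x^T A_1 x + gamma sum_{i>=2} (x^T A_i x)^2."""
    q = len(A) - 1
    return alpha * (x @ A[1] @ x) + gamma * sum(
        (x @ A[i] @ x) ** 2 for i in range(2, q + 1))
```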

Despite the extensive literature on solving related problems in portfolio optimization [7, 13, 15, 23], the only existing approach for handling (P) is to apply the following semidefinite programming (SDP) relaxation [8]:

\min_{X\in\mathbb{R}^{n\times n}}\quad\alpha\mbox{Tr}(A_{1}X)+\gamma\sum_{i=2}^{q}\mbox{Tr}(A_{i}X)^{2}+\beta\|X\|_{1} (SDP-P)
subject to\quad\mbox{Tr}(A_{0}X)\geq\phi,\quad\mbox{Tr}(X)=1,\quad\text{and}\quad X\succeq 0,

with \beta>0 and \|X\|_{1}:=\sum_{i,j}|X_{i,j}|. However, this method exhibits several drawbacks: (i) the optimal values of (P) and (SDP-P) may differ, (ii) a solution of (SDP-P) is not necessarily rank-one, and (iii) even if it is rank-one, the common remedy is to apply sparse principal component analysis to it, yet the qualitative properties of the resulting solution with respect to the original problem are not well understood. Therefore, the effectiveness of applying an SDP relaxation to (P) lacks rigorous theoretical support. The main reason is that the presence of \|X\|_{1} in the objective function of (SDP-P) breaks down the standard techniques that relate a QCQP (without a cardinality constraint) to its SDP relaxation. More specifically, there are no known theoretical results relating the optimal values and solution points of a QCQP with a cardinality constraint to those of its SDP relaxation. Further, a body of literature addresses general cardinality problems through other relaxation techniques, such as mixed-integer nonlinear programming [2] and continuous relaxations of mixed-integer programs [6], but such methods are less promising because they do not exploit the particular structure of (P) in their design.

Therefore, we propose an efficient tailored algorithm that exploits the structure of (P) and utilizes a penalty decomposition method to directly solve (P), obtaining a KKT point of it. To the best of our knowledge, the proposed algorithm is the first specialized approach designed to directly tackle problem (P) based on the portmanteau criterion and crossing statistics proxies. This unifying algorithm is not only applicable to all three proxies mentioned above but also works for nonconvex objective functions. In the proposed methodology, we use the variable splitting technique to simplify our complex problem (P), which involves highly nonlinearly coupled constraints. Generally, the splitting approach introduces additional variables to the problem, which helps loosen the coupling between the original variables [27]. This transformation enables the problem to be tackled more efficiently. After splitting and penalizing the related equality constraints, we obtain a sequence of penalized subproblems that can be approximately solved. Each subproblem is efficiently tackled by a block coordinate algorithm that outputs a saddle point of the corresponding subproblem. Each step of the block coordinate algorithm either has a closed-form solution or is shown to be tractable; see problems (P_{x}), (P_{y}), and (P_{z}), respectively, in Section 2. In addition, the sequence of objective values in the block coordinate algorithm is either strictly decreasing or reaches equality at a finite step, which also yields a saddle point. We establish that our tailored penalty decomposition algorithm finds a KKT point of the original problem (P).
Moreover, in the nonconvex case, the analysis extends readily to show that our algorithm still succeeds, provided that the starting penalty parameter is larger than a finite positive bound and the objective function has a bounded level set; see Theorem 2.3 and its proof in the Appendix.

We mention that [20] introduces a penalty decomposition method that exclusively addresses the sparse, volatile, and mean-reverting problem associated with the predictability proxy, i.e., the case \alpha=1 and \gamma=0 in (P). In this work, we extend this methodology to efficiently tackle problems arising from the other proxies as well. We emphasize that, to the best of our knowledge, the portmanteau criterion- and crossing statistics-based problems with volatility and sparsity constraints have not been previously considered in the literature, even though they could produce more profitable portfolios since they leverage more information. We numerically demonstrate that this is indeed the case for crossing statistics, which highlights the importance of designing an effective algorithm for this intractable problem; see Section 3. Further, our empirical results reveal that the quadratic term in the objective function is crucial for capturing mean reversion.

We demonstrate the effectiveness of the algorithm by conducting a numerical comparison with the SDP relaxation approach using the S&P 500 data set. Our method outperforms the SDP relaxation in terms of practical performance measures. Further, we compare the performance of the portfolios generated from the predictability, portmanteau criterion, and crossing statistics proxies when our algorithm is applied. The results show that after tuning the parameters q and \gamma, the crossing statistics-based portfolio not only effectively captures the mean-reversion property but also achieves superior performance in terms of Sharpe ratio and cumulative return and loss, compared to the predictability- and portmanteau criterion-based portfolios; see Section 3. This is consistent with the fact that its corresponding problem exploits more information from the data. However, the portmanteau criterion-based portfolio may fail in practice to retain mean reversion, underscoring the significance of the quadratic term in the objective function for capturing this desirable property.

Organization. The remainder of the paper is organized as follows. Section 2 presents the proposed penalty decomposition algorithm for directly solving (P), together with all technical details. Extensive numerical experiments are provided in Section 3, while Section 4 draws some concluding remarks. The proofs of the convergence analysis are presented in the Appendix.

Notation. The complement of a set S is denoted by S^{c} and its cardinality by |S|. For a natural number n, let [n]:=\{1,2,\dots,n\}. For S=\{i_{1},i_{2},\dots,i_{|S|}\}\subseteq[n] and x\in\mathbb{R}^{n}, x_{S}\in\mathbb{R}^{n} is the coordinate projection of x with respect to the indices in S, that is, (x_{S})_{i}=x_{i} for i\in S and (x_{S})_{i}=0 for i\in S^{c}. By \|\cdot\| we mean the standard Euclidean norm. We write A\succeq 0 and A\succ 0 when A is positive semidefinite or positive definite, respectively. We denote the smallest and largest eigenvalues of A by \lambda_{\min}(A) and \lambda_{\max}(A), respectively.

2 A Penalty Decomposition Method for Statistical Proxy-based Portfolios with Sparsity and Volatility Constraints

An algorithm for directly minimizing (P) that obtains a KKT point is presented next, with all proofs delegated to the Appendix. The PPC (predictability, portmanteau criterion, and crossing statistics) algorithm leverages a tailored penalty decomposition method in which a sequence of penalty subproblems is approximately solved. Each penalty subproblem is handled by a block coordinate descent (BCD) algorithm that produces a saddle point of that subproblem. The limit point of a suitable subsequence of such saddle points is shown to be a KKT point of (P).

2.1 Details of PPC and BCD Algorithms

By introducing two new variables y and z, we can equivalently reformulate (P) as follows:

\min_{x,y,z\in\mathbb{R}^{n}}\quad\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)(x^{T}A_{i}x) (P^{\prime})
subject to\quad x^{T}A_{0}x\geq\phi,\quad y^{T}y=1,\quad\|y\|_{0}\leq k,
x-y=0,\quad\text{and}\quad x-z=0.

The reason for such splitting is to effectively handle the highly nonlinearly coupled constraints of the original problem. Specifically, we aim for the subproblems of Algorithm 1 to be as simple as possible. Let

q_{\rho}(x,y,z):=\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)(x^{T}A_{i}x)+\rho(\|x-y\|_{2}^{2}+\|x-z\|_{2}^{2}), (4)

and

\mathcal{X}:=\{x\in\mathbb{R}^{n}\,|\,x^{T}A_{0}x\geq\phi\},\qquad\mathcal{Y}:=\{y\in\mathbb{R}^{n}\,|\,y^{T}y=1\text{ and }\|y\|_{0}\leq k\},\qquad\text{and}\qquad\mathcal{Z}:=\mathbb{R}^{n}.

By penalizing the last two constraints in (P^{\prime}), we tackle this problem via a sequence of penalty subproblems as follows:

\min_{x,y,z}\quad q_{\rho}(x,y,z) (P_{x,y,z})
subject to\quad x\in\mathcal{X},\quad y\in\mathcal{Y},\quad\text{and}\quad z\in\mathcal{Z},

with \rho going to infinity incrementally (such techniques have demonstrated their efficacy in the existing literature [21, 20]). The auxiliary variables y and z are introduced so that simpler subproblems are obtained in Algorithm 1. We establish in the sequel that this method efficiently finds a saddle point (x_{*},y_{*},z_{*}) of (P_{x,y,z}), which means x_{*}\in\mbox{Argmin}_{x\in\mathcal{X}}\,q_{\rho}(x,y_{*},z_{*}), y_{*}\in\mbox{Argmin}_{y\in\mathcal{Y}}\,q_{\rho}(x_{*},y,z_{*}), and z_{*}\in\mbox{Argmin}_{z\in\mathcal{Z}}\,q_{\rho}(x_{*},y_{*},z).

Algorithm 1 BCD Algorithm for Solving (P_{x,y,z})
1:  Input: Select arbitrary y_{0}\in\mathcal{Y} and z_{0}\in\mathcal{Z}.
2:  Set l=0.
3:  Solve x_{l+1}\in\mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y_{l},z_{l}).
4:  Solve y_{l+1}=\mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x_{l+1},y,z_{l}).
5:  Solve z_{l+1}=\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x_{l+1},y_{l+1},z).
6:  Set l\leftarrow l+1 and go to step 3.

Next, we discuss how to efficiently solve the restricted subproblems in Algorithm 1.
\bullet\ \mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y,z): This subproblem becomes the following nonconvex quadratic optimization problem:

\min_{x\in\mathbb{R}^{n}}\quad\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}\left(z^{T}A_{i}z\right)\left(x^{T}A_{i}x\right)+\rho\left(\|x-y\|^{2}+\|x-z\|^{2}\right) (P_{x})
subject to\quad x^{T}A_{0}x\geq\phi.

Clearly, this problem has a solution x_{*} and its KKT conditions are as follows:

\alpha A_{1}x_{*}+\gamma\sum_{i=2}^{q}\left(z^{T}A_{i}z\right)A_{i}x_{*}+\rho(2x_{*}-y-z)-\lambda A_{0}x_{*}=0,\quad\mbox{and}\quad 0\leq\lambda\perp x_{*}^{T}A_{0}x_{*}-\phi\geq 0. (5)

In general, this nonconvex program does not have a closed-form solution; nevertheless, we discuss how to find its global minimizer by exploiting its SDP relaxation. Recall that the SDP relaxation of a general quadratic program with exactly one quadratic constraint attains the same optimal value provided that the program is strictly feasible [5]. It is easy to see that (P_{x}) is strictly feasible because A_{0}\succ 0. Hence, we can find the optimal value of this nonconvex problem exactly by solving its convex SDP relaxation given next:

\min_{X\in\mathbb{R}^{(n+1)\times(n+1)}}\quad\mbox{Tr}\left(H_{1}X\right) (SDP-P_{x})
subject to\quad\mbox{Tr}\left(H_{2}X\right)\geq 0,\quad X_{11}=1,\quad\text{and}\quad X\succeq 0,

where

H_{1}=\begin{bmatrix}\rho\left(\|y\|^{2}+\|z\|^{2}\right)&-\rho(y+z)^{T}\\ -\rho(y+z)&\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I\end{bmatrix}\qquad\text{ and }\qquad H_{2}=\begin{bmatrix}-\phi&0\\ 0&A_{0}\end{bmatrix}.
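For illustration, the block matrices H_1 and H_2 above can be assembled directly; a sketch assuming A is the list A_0, ..., A_q of NumPy arrays (the helper name is ours):

```python
import numpy as np

def sdp_px_data(A, y, z, rho, phi, alpha, gamma):
    """Assemble the (n+1) x (n+1) matrices H_1 and H_2 defining (SDP-P_x)."""
    n = len(y)
    q = len(A) - 1
    # bottom-right block: alpha A_1 + gamma sum_i (z^T A_i z) A_i + 2 rho I
    Q = alpha * A[1] + 2 * rho * np.eye(n)
    for i in range(2, q + 1):
        Q = Q + gamma * (z @ A[i] @ z) * A[i]
    H1 = np.zeros((n + 1, n + 1))
    H1[0, 0] = rho * (y @ y + z @ z)
    H1[0, 1:] = -rho * (y + z)
    H1[1:, 0] = -rho * (y + z)
    H1[1:, 1:] = Q
    H2 = np.zeros((n + 1, n + 1))
    H2[0, 0] = -phi
    H2[1:, 1:] = A[0]
    return H1, H2
```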

Its dual problem is given by

\max_{w_{1}\in\mathbb{R},\,w_{2}\in\mathbb{R},\,Z\in\mathbb{R}^{(n+1)\times(n+1)}}\quad w_{2}
subject to\quad w_{1}H_{2}+w_{2}\begin{bmatrix}1&0\\ 0&0\end{bmatrix}+Z=H_{1},
w_{1}\geq 0,\quad\text{and}\quad Z\succeq 0.

We first show that both problems are strictly feasible. Clearly, since A_{0}\succ 0, we see that X=\begin{bmatrix}1&0\\ 0&tI\end{bmatrix} with t>\phi/\mbox{Tr}(A_{0})>0 is a strictly feasible point of the primal problem. To see that the dual problem is strictly feasible, it is enough to show that there exist a positive w_{1} and a real w_{2} such that Z\succ 0. This block matrix is positive definite if and only if (i) \alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I-w_{1}A_{0}\succ 0 and (ii) \rho(\|y\|^{2}+\|z\|^{2})+w_{1}\phi-w_{2}-\rho^{2}(y+z)^{T}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I-w_{1}A_{0})^{-1}(y+z)>0. To guarantee inequality (i), it is enough to select w_{1}=\epsilon>0 small enough such that

\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})+2\rho>\epsilon\lambda_{\max}(A_{0}), (6)

which is doable because \lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})\geq 0 in view of A_{i}\succeq 0 for i\in[q], \gamma\geq 0, and \rho>0. Inequality (ii) can easily be guaranteed by selecting a sufficiently large negative number for w_{2}.

Given that both the primal and the dual problems have strictly feasible points, their optimal values coincide. Let (w_{1}^{*},w_{2}^{*},Z^{*}) and X^{*} be their optimal solutions. If X^{*} of (SDP-P_{x}) has rank one, then the solution of (P_{x}) is trivially obtained. If not, using the procedure in [25, Lemma 2.2], we can form the rank-one decomposition X^{*}=\sum_{i=1}^{r}u_{i}u_{i}^{T} with r=\mbox{rank}(X^{*}) and 0\neq u_{i}\in\mathbb{R}^{n+1} for all i\in[r] such that ru_{i}^{T}H_{2}u_{i}=\mbox{Tr}\left(H_{2}X^{*}\right)\geq 0 for all i\in[r]. Since X^{*}_{11}=1, there exists j\in[r] such that u_{j}=[\alpha;u]\in\mathbb{R}^{n+1} with \alpha\neq 0. Further, the KKT conditions imply that 0=\mbox{Tr}\left(X^{*}Z^{*}\right)=\mbox{Tr}\left(\sum_{i=1}^{r}u_{i}u_{i}^{T}Z^{*}\right)=\sum_{i=1}^{r}u_{i}^{T}Z^{*}u_{i}, and since Z^{*}\succeq 0, we have u_{j}^{T}Z^{*}u_{j}=0. Thus, u_{j}u_{j}^{T} and (w_{1}^{*},w_{2}^{*},Z^{*}) satisfy the KKT conditions, and consequently x_{*}:=u/\alpha yields a global solution to (P_{x}). Thus, x_{*} indeed satisfies (5).
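In the case where the volatility constraint turns out to be inactive at the minimizer (\lambda=0 in (5)), the x-update needs no SDP at all: stationarity reduces to a linear system. A hedged sketch with a helper name of our own; when the returned point violates the constraint, the SDP route above is required:

```python
import numpy as np

def solve_px_if_inactive(A, y, z, rho, phi, alpha, gamma):
    """Try the lambda = 0 branch of the KKT system (5):
    (alpha A_1 + gamma sum_i (z^T A_i z) A_i + 2 rho I) x = rho (y + z).
    Returns (x, True) if x also satisfies x^T A_0 x >= phi; otherwise
    (x, False), in which case (SDP-P_x) must be solved instead."""
    n = len(y)
    q = len(A) - 1
    M = alpha * A[1] + 2 * rho * np.eye(n)
    for i in range(2, q + 1):
        M = M + gamma * (z @ A[i] @ z) * A[i]
    x = np.linalg.solve(M, rho * (y + z))
    return x, bool(x @ A[0] @ x >= phi)
```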

\bullet\ \mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x,y,z): This subproblem is as follows:

\min_{y\in\mathbb{R}^{n}}\ \|x-y\|_{2}^{2}\qquad\mbox{subject to}\qquad y^{T}y=1\quad\textrm{and}\quad\|y\|_{0}\leq k. (P_{y})

Thus, due to Lemma 3.2 in [20], the closed-form solution of this problem is given by

y_{*}=\mathcal{T}_{k}\left(x\right), (7)

where \mathcal{T}_{k} is defined in (9).
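Concretely, \mathcal{T}_{k} keeps the k largest-magnitude entries of x and renormalizes (see (9)), so the y-update (7) is a one-liner. A sketch:

```python
import numpy as np

def T_k(x, k):
    """Sparsifying operator (9): zero all but the k largest-magnitude
    entries of x, then scale the result to unit Euclidean norm."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x_i|
    y[keep] = x[keep]
    return y / np.linalg.norm(y)

# y-update (7) of the BCD algorithm: y_star = T_k(x, k)
```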

\bullet\ \mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x,y,z): This subproblem becomes the following:

\min_{z\in\mathbb{R}^{n}}\ \gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)\left(z^{T}A_{i}z\right)+\rho\|x-z\|^{2}, (P_{z})

which has the closed-form solution

z_{*}=\rho\left(\gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)A_{i}+\rho I\right)^{-1}x. (8)
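Since the matrix inverted in (8) is positive definite (it is \rho I plus a nonnegative combination of the A_{i}, because x^{T}A_{i}x\geq 0 when A_{i}\succeq 0), the z-update is a single linear solve. A sketch with a helper name of our own:

```python
import numpy as np

def solve_pz(A, x, rho, gamma):
    """Closed-form z-update (8):
    z* = rho (gamma sum_{i=2}^q (x^T A_i x) A_i + rho I)^{-1} x."""
    n = len(x)
    q = len(A) - 1
    M = rho * np.eye(n)
    for i in range(2, q + 1):
        M = M + gamma * (x @ A[i] @ x) * A[i]
    return np.linalg.solve(M, rho * x)
```

Setting the gradient of (P_z) to zero gives (gamma sum_i (x^T A_i x) A_i + rho I) z = rho x, which is exactly what this solve computes.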

Next, we present our proposed PPC Algorithm 2, which starts from an initial positive penalty parameter and gradually increases it toward infinity. For each penalty parameter, Algorithm 1 is utilized to solve the corresponding subproblem. We assume that (P) is feasible and a feasible point x^{\text{feas}} is available. To find such a point, one can solve the following sparse principal component analysis problem:

\max_{x\in\mathbb{R}^{n}}\ x^{T}A_{0}x\qquad\mbox{subject to}\qquad x^{T}x=1\quad\text{and}\quad\|x\|_{0}\leq k.

A stationary point of this problem can be obtained by the power method x_{l}:=\mathcal{T}_{k}(A_{0}x_{l-1}) for l\in\mathbb{N} (with x_{0}\in\mathbb{R}^{n} arbitrary) suggested in [22]. We can run this cheap method starting from different x_{0}'s until the resulting stationary point satisfies the volatility constraint. The sparsifying operator \mathcal{T}_{k} is defined as

\mathcal{T}_{k}(x):=\frac{x_{\mathcal{L}}}{\left\|x_{\mathcal{L}}\right\|_{2}}, (9)

where \mathcal{L} is the index set of the k largest components of x in absolute value. To present the algorithm and its later analysis, we let

\Upsilon\geq\max\{f(x^{\text{feas}}),\min_{x\in\mathcal{X}}q_{\rho^{(0)}}(x,y^{(0)}_{0},z^{(0)}_{0})\}>0\qquad\text{ and }\qquad X_{\Upsilon}:=\{x\in\mathbb{R}^{n}\,|\,f(x)\leq\Upsilon\}, (10)

where f(x) is defined in the objective function of (P).
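The feasible-point search above can be sketched as a restarted truncated power iteration, assuming the \mathcal{T}_{k} of (9); the iteration counts and seed are our choices:

```python
import numpy as np

def T_k(x, k):
    """Sparsifying operator (9)."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    y[keep] = x[keep]
    return y / np.linalg.norm(y)

def feasible_point(A0, k, phi, restarts=20, iters=200, seed=0):
    """Restarted power method x_l = T_k(A0 x_{l-1}) of [22]: run from
    random starts until a stationary point meets the volatility
    constraint x^T A0 x >= phi; returns None if every restart fails."""
    rng = np.random.default_rng(seed)
    n = A0.shape[0]
    for _ in range(restarts):
        x = T_k(rng.standard_normal(n), k)
        for _ in range(iters):
            x = T_k(A0 @ x, k)
        if x @ A0 @ x >= phi:
            return x
    return None
```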

Algorithm 2 PPC Penalty Decomposition Algorithm for Solving (P)
1:  Inputs: r>1, \rho^{(0)}>0, and y^{(0)}_{0}\in\mathcal{Y} and z^{(0)}_{0}\in\mathcal{Z} with \|z_{0}^{(0)}\|\leq 1.
2:  Set j=0.
3:  repeat
4:     Set l=0.
5:     repeat
6:        Solve x^{(j)}_{l+1}\in\mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y^{(j)}_{l},z^{(j)}_{l}).
7:        Solve y^{(j)}_{l+1}=\mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x^{(j)}_{l+1},y,z^{(j)}_{l}).
8:        Solve z^{(j)}_{l+1}=\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x^{(j)}_{l+1},y^{(j)}_{l+1},z).
9:        Set l\leftarrow l+1.
10:     until stopping criterion (11) is met.
11:     Set (x^{(j)},y^{(j)},z^{(j)}):=(x^{(j)}_{l},y^{(j)}_{l},z^{(j)}_{l}).
12:     Set \rho^{(j+1)}=r\cdot\rho^{(j)}.
13:     If \min_{x\in\mathcal{X}}q_{\rho^{(j+1)}}(x,y^{(j)},z^{(j)})>\Upsilon, then set y_{0}^{(j+1)}=x^{\text{feas}} and z_{0}^{(j+1)}=x^{\text{feas}}. Otherwise, set y^{(j+1)}_{0}=y^{(j)} and z^{(j+1)}_{0}=z^{(j)}.
14:     Set j\leftarrow j+1.
15:  until stopping criterion (12) is met.

Note that Algorithm 2 consists of two nested loops. We stop the inner loop, lines 5 to 10, if (dropping the superscript (j) for simplicity):

\max\left\{\frac{\|x_{l}-x_{l-1}\|_{\infty}}{\max\left(\|x_{l}\|_{\infty},1\right)},\frac{\|y_{l}-y_{l-1}\|_{\infty}}{\max\left(\|y_{l}\|_{\infty},1\right)},\frac{\|z_{l}-z_{l-1}\|_{\infty}}{\max\left(\|z_{l}\|_{\infty},1\right)}\right\}\leq\epsilon_{I}, (11)

and the outer loop, lines 3 to 15, is stopped when the following convergence criterion is met:

\|x^{(j)}-y^{(j)}\|_{\infty}+\|x^{(j)}-z^{(j)}\|_{\infty}\leq\epsilon_{O}. (12)
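Putting the pieces together, the method can be sketched end-to-end. The x-step below takes the \lambda=0 branch of (5) (a linear solve) rather than the SDP route, which is valid only while the volatility constraint stays inactive, so this is a simplified sketch of Algorithms 1 and 2 rather than the full method; the stopping rules follow (11) and (12), and the default constants are our choices:

```python
import numpy as np

def T_k(x, k):
    """Sparsifying operator (9)."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    y[keep] = x[keep]
    return y / np.linalg.norm(y)

def ppc(A, k, alpha=1, gamma=1.0, rho0=1.0, r=4.0,
        eps_i=1e-8, eps_o=1e-8, max_outer=40, max_inner=500):
    """Simplified PPC sketch: inner BCD loop (Algorithm 1) inside the
    penalty loop (Algorithm 2) with rho(j+1) = r * rho(j).  The x-step
    assumes the volatility constraint is inactive; the paper's full
    x-step solves (SDP-P_x) instead."""
    n = A[0].shape[0]
    q = len(A) - 1
    rho = rho0
    y = T_k(np.ones(n), k)
    z = y.copy()
    x = y.copy()
    for _ in range(max_outer):
        for _ in range(max_inner):
            xp, yp, zp = x, y, z
            # x-step: lambda = 0 branch of the KKT system (5)
            M = alpha * A[1] + 2 * rho * np.eye(n)
            for i in range(2, q + 1):
                M = M + gamma * (z @ A[i] @ z) * A[i]
            x = np.linalg.solve(M, rho * (y + z))
            y = T_k(x, k)                        # y-step (7)
            Mz = rho * np.eye(n)                 # z-step (8)
            for i in range(2, q + 1):
                Mz = Mz + gamma * (x @ A[i] @ x) * A[i]
            z = np.linalg.solve(Mz, rho * x)
            rel = max(                           # inner rule (11)
                np.max(np.abs(x - xp)) / max(np.max(np.abs(x)), 1.0),
                np.max(np.abs(y - yp)) / max(np.max(np.abs(y)), 1.0),
                np.max(np.abs(z - zp)) / max(np.max(np.abs(z)), 1.0))
            if rel <= eps_i:
                break
        if np.max(np.abs(x - y)) + np.max(np.abs(x - z)) <= eps_o:
            break                                # outer rule (12)
        rho *= r
    return y  # unit-norm, k-sparse iterate
```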

2.2 Convergence Analysis

We first investigate Algorithm 1 and then establish the convergence of Algorithm 2.

Analysis of Algorithm 1. We analyze a sequence \{(x_{l},y_{l},z_{l})\} generated by Algorithm 1 and provide a tailored proof that any accumulation point of such a sequence is a saddle point of (P_{x,y,z}). This justifies the use of Algorithm 1 for this nonconvex problem.

Lemma 2.1.

Let \alpha\in\{0,1\}, \gamma\geq 0, \rho>0, A_{0}\succ 0, and A_{i}\succeq 0 for i\in[q]. Consider the iterates of Algorithm 1, that is, x_{l}\in\mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y_{l-1},z_{l-1}), y_{l}\in\mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x_{l},y,z_{l-1}), and z_{l}\in\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x_{l},y_{l},z). Then,

\max\{\|x_{l}\|,\|y_{l}\|,\|z_{l}\|\}\leq\max\{\sqrt{\phi/\lambda_{\min}(A_{0})},1\}. (13)

This lemma establishes that the sequence created by Algorithm 1 is bounded. Therefore, every sequence generated by this algorithm possesses an accumulation point. The subsequent theorem establishes that every accumulation point is indeed a saddle point of (P_{x,y,z}). Additionally, the sequence (q_{\rho}(x_{l},y_{l},z_{l})) is either strictly decreasing, or two consecutive terms attain the same value, in which case a saddle point is reached. Essentially, Algorithm 1 produces a saddle point either in finitely many steps or in the limit.

Theorem 2.1.

Let \{(x_{l},y_{l},z_{l})\} be a sequence generated by Algorithm 1 for solving (P_{x,y,z}). Suppose that (x_{*},y_{*},z_{*}) is an accumulation point of this sequence; then (x_{*},y_{*},z_{*}) is a saddle point of the nonconvex problem (P_{x,y,z}). Moreover, \{q_{\rho}(x_{l},y_{l},z_{l})\} is a non-increasing sequence. If q_{\rho}(x_{r},y_{r},z_{r})=q_{\rho}(x_{r+1},y_{r+1},z_{r+1}) for some r\in\mathbb{N}, then (x_{r},y_{r},z_{r}) is a saddle point of (P_{x,y,z}).

Analysis of Algorithm 2. Suppose that x^{*} is a local minimizer of (P); then there exists an index set \mathcal{L} with |\mathcal{L}|=k and x^{*}_{\mathcal{L}^{c}}=0 such that x^{*} is also a local minimizer of the following problem:

\min_{x\in\mathbb{R}^{n}}\quad f(x)
subject to\quad x^{T}A_{0}x\geq\phi,\quad x^{T}x=1,\quad\text{and}\quad x_{\mathcal{L}^{c}}=0,

where f(x) is defined in the objective function of (P). Recall that Robinson's constraint qualification conditions for a local minimizer x^{*} of the above problem are as follows [20]:

(i) If (x^{*})^{T}A_{0}x^{*}>\phi, then \bigg{\{}\begin{bmatrix}-2d^{T}A_{0}x^{*}-v\\ 2d^{T}x^{*}\\ d_{\mathcal{L}^{c}}\end{bmatrix}\ \Big{|}\ d\in\mathbb{R}^{n},\ v\in\mathbb{R}\bigg{\}}=\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{|\mathcal{L}^{c}|};
(ii) If (x^{*})^{T}A_{0}x^{*}=\phi, then \bigg{\{}\begin{bmatrix}-2d^{T}A_{0}x^{*}-v\\ 2d^{T}x^{*}\\ d_{\mathcal{L}^{c}}\end{bmatrix}\ \Big{|}\ d\in\mathbb{R}^{n},\ v\in\mathbb{R}_{-}\bigg{\}}=\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{|\mathcal{L}^{c}|}. (14)

It can be seen that x^{*}_{\mathcal{L}}\neq 0, due to \|x^{*}\|_{2}=1 and x^{*}_{\mathcal{L}^{c}}=0, and thus Robinson's conditions in case (i) always hold. In case (ii), Robinson's conditions hold if and only if \{(A_{0}x^{*})_{\mathcal{L}},x^{*}_{\mathcal{L}}\} is linearly independent. It is easy to see that this set is linearly independent almost always. This implies that, except on a set of measure zero, Robinson's conditions are satisfied for (P); that is, such an assumption on x^{*} is essentially always the case in practice.

Under Robinson’s conditions mentioned above, the KKT conditions for a local minimizer xx^{*} of (PP) are the existence of Lagrangian multipliers (λ,μ,w)××n(\lambda,\mu,w)\in\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{n} and [n]\mathcal{L}\subseteq[n] with ||=k|\mathcal{L}|=k such that the following holds:

(\alpha A_{1}+2\gamma\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})x-\lambda A_{0}x+\mu x+w=0, (15)
x_{\mathcal{L}^{c}}=0,\ 0\leq\lambda\perp x^{T}A_{0}x-\phi\geq 0,\ \|x\|_{2}=1,\text{ and }w_{\mathcal{L}}=0.

The next result states that an arbitrary sequence \{(x^{(j)},y^{(j)},z^{(j)})\} generated by Algorithm 2 has a convergent subsequence whose limit point is a KKT point of (P).

Theorem 2.2.

Suppose that \alpha\in\{0,1\}, \gamma\geq 0, \phi>0, k\in[n], A_{0}\succ 0, and A_{i}\succeq 0 for i\in[q]. Let \{(x^{(j)},y^{(j)},z^{(j)})\} be a sequence generated by Algorithm 2 for solving (P). Then, the following hold:

  • (i)

    {(x(j),y(j),z(j))}\left\{\left(x^{(j)},y^{(j)},z^{(j)}\right)\right\} has a convergent subsequence whose accumulation point (x,y,z)(x^{*},y^{*},z^{*}) satisfies x=y=zx^{*}=y^{*}=z^{*}. Further, there exists an index subset [n]\mathcal{L}\subseteq[n] with ||=k|\mathcal{L}|=k such that xc=0x^{*}_{\mathcal{L}^{c}}=0.

  • (ii)

Suppose that Robinson's condition given in (2.2) holds at $x^{*}$ with the index subset $\mathcal{L}$ indicated above. Then, $x^{*}$ is a KKT point of ($P$) satisfying (15).

Extension to Nonconvex Objective Functions. The focus so far has been on constructing portfolios based on the predictability, portmanteau criterion, and crossing statistics proxies, for which $f(x)$ as defined in (10) is convex. Nevertheless, from a technical viewpoint, Algorithm 2 can be employed to solve ($P$) even when the objective function is nonconvex. In this case, we require the level set $X_{\Upsilon}$ defined in (10) to be bounded and the initial penalty parameter $\rho^{(0)}$ to be sufficiently large.

Theorem 2.3.

Suppose that $\alpha\in\{0,1\}$, $\gamma\geq 0$, $\phi>0$, $A_{0}\succ 0$, $A_{i}$ is symmetric for $i\in[q]$, $k\in[n]$, $\rho^{(0)}>|\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})|$, and the level set $X_{\Upsilon}$ in (10) is bounded. Then all steps of Algorithm 2 are well-defined, and the algorithm finds a KKT point of ($P$).
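The lower bound on $\rho^{(0)}$ in Theorem 2.3 is directly computable from the spectra of $A_{0}$ and $A_{1}$. A minimal sketch, assuming numpy and symmetric inputs (the function name is ours):

```python
import numpy as np

def min_initial_penalty(A0, A1, alpha=1):
    """Bound |lambda_max(A0) - alpha*lambda_min(A1)| from Theorem 2.3;
    any rho^(0) strictly above this value is admissible."""
    lam_max_A0 = np.linalg.eigvalsh(A0)[-1]   # eigvalsh sorts eigenvalues ascending
    lam_min_A1 = np.linalg.eigvalsh(A1)[0]
    return abs(lam_max_A0 - alpha * lam_min_A1)

A0 = np.diag([1.0, 2.0, 3.0])
A1 = np.diag([-1.0, 0.0, 5.0])
print(min_initial_penalty(A0, A1))  # |3 - (-1)| = 4.0
```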

3 Numerical Experiments

Here, we examine the performance of Algorithm 2 in finding sparse, volatile, statistical proxy-based mean-reverting portfolios on real data from the US stock market Standard and Poor's (S&P 500) Index. This data set is commonly used to demonstrate the practical effectiveness of an algorithm or to compare the numerical performance of competing approaches.

Recall that a statistical arbitrage strategy typically involves four main steps, namely constructing an appropriate asset pool, designing a mean-reverting portfolio, verifying the mean-reverting property through a unit-root test, and finally trading the favorable portfolio. The construction of an asset pool can be achieved using a method described in [8], which utilizes the smallest eigenvalue of the covariance matrix. In this paper, we focus on the crucial second step of developing a mean-reverting portfolio by minimizing the predictability, portmanteau criterion, or crossing statistics measures, while incorporating volatility and sparsity constraints to ensure the profitability of the portfolio in practice. The mean-reversion property can be verified using a unit-root test such as the Dickey-Fuller test [10]. A comprehensive discussion of the trading strategy for a mean-reverting portfolio, based on observed/normalized spread values, is given in [30, 29]. Normalizing the spread values makes the strategy less sensitive to absolute price levels. Table 1 in [29] provides examples of how the strategy works for different trading positions and normalized spread values. The strategy involves closing long positions and taking short positions when the normalized spread value exceeds a threshold, and taking no action when the normalized spread value is negative. It is a simple yet effective strategy, but careful testing and analysis are necessary before adopting it as a long-term trading strategy.
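As a concrete illustration of the unit-root verification step, the simplest Dickey-Fuller regression $\Delta s_{t}=\gamma s_{t-1}+\varepsilon_{t}$ (no drift) can be run with plain least squares; a strongly negative $t$-statistic on $\gamma$ supports mean reversion. This is only a sketch of the test from [10] under simplifying assumptions (no lag augmentation, no intercept, informal critical values), not the exact procedure used in the paper:

```python
import numpy as np

def dickey_fuller_tstat(s):
    """t-statistic of gamma in the regression ds_t = gamma * s_{t-1} + eps_t."""
    ds, lag = np.diff(s), np.asarray(s)[:-1]
    gamma = (lag @ ds) / (lag @ lag)              # OLS slope estimate
    resid = ds - gamma * lag
    sigma2 = (resid @ resid) / (len(ds) - 1)      # residual variance
    return gamma / np.sqrt(sigma2 / (lag @ lag))  # slope / standard error

rng = np.random.default_rng(0)
eps = rng.standard_normal(500)
mean_rev = np.zeros(500)                          # AR(1) spread with phi = 0.5
for t in range(1, 500):
    mean_rev[t] = 0.5 * mean_rev[t - 1] + eps[t]
random_walk = np.cumsum(eps)                      # unit-root (non-reverting) series
print(dickey_fuller_tstat(mean_rev) < dickey_fuller_tstat(random_walk))  # True
```

The mean-reverting series yields a large negative statistic, while the random walk does not, matching the intended use of the test as a post-hoc check on the designed spread.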

In our numerical experiments, several standard performance metrics are used to measure the quality of the portfolios, which in our case are obtained from the limit point of Algorithm 2 and from applying sparse PCA to the solution of (SDP-P). The first metric, cumulative profit and loss (P&L), measures the overall return of a mean-reverting portfolio within a single trading period, from $t_{1}$ to $t_{2}$, and is calculated as

\[\text{Cumulative P\&L}(t_{1},t_{2})=\sum_{t=t_{1}}^{t_{2}}P\&L_{t},\]

where $P\&L_{t}=x^{T}r_{t}(t-t_{o})-x^{T}r_{t-1}(t-1-t_{o})$ when a long position is opened at time $t_{o}$, and $P\&L_{t}=-\big(x^{T}r_{t}(t-t_{o})-x^{T}r_{t-1}(t-1-t_{o})\big)$ when a short position is opened at time $t_{o}$. For a particular asset, $r_{t}(\tau)=\frac{p_{t}-p_{t-\tau}}{p_{t-\tau}}\approx\ln(p_{t})-\ln(p_{t-\tau})$, where $p_{t}$ represents the price of the asset at time $t$. Table 1 in [29] is used for this purpose, with $d$ being the standard deviation of the portfolio. The return on investment (ROI) is another metric used to evaluate the investment return of a mean-reverting portfolio. It is calculated as $\text{ROI}_{t}=\frac{P\&L_{t}}{\|x\|_{1}}$, where $P\&L_{t}$ is the profit and loss of the portfolio at time $t$, and $\|x\|_{1}$ is the $\ell_{1}$-norm of the portfolio weights. The last metric, used to evaluate the performance of a portfolio over a period from $t_{1}$ to $t_{2}$, is the Sharpe ratio ($SR$), defined as

\[SR_{ROI}(t_{1},t_{2})=\mu_{ROI}/\sigma_{ROI},\]

where $\mu_{ROI}=\frac{1}{t_{2}-t_{1}}\sum_{t=t_{1}}^{t_{2}}ROI_{t}$ and $\sigma^{2}_{ROI}=\frac{1}{t_{2}-t_{1}}\sum_{t=t_{1}}^{t_{2}}(ROI_{t}-\mu_{ROI})^{2}$. A portfolio with a higher $SR$ is considered more profitable.
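For reference, the three metrics above can be computed from a series of per-period P&L values in a few lines. A minimal sketch assuming numpy (the function name is ours; the population-variance convention follows the formulas above):

```python
import numpy as np

def portfolio_metrics(pnl, x):
    """Cumulative P&L, ROI_t = P&L_t / ||x||_1, and SR = mu_ROI / sigma_ROI."""
    pnl = np.asarray(pnl, dtype=float)
    roi = pnl / np.abs(x).sum()          # ROI_t series
    cum_pnl = pnl.sum()                  # cumulative P&L over the window
    sr = roi.mean() / roi.std()          # Sharpe ratio of the ROI series
    return cum_pnl, roi, sr

cum, roi, sr = portfolio_metrics([1.0, -1.0, 2.0], x=np.array([1.5, -0.5]))
print(cum)  # 2.0
```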

To conduct the numerical experiments, we first explain how the matrices $A_{i}$, $i\in\{0,1,\dots,q\}$, are selected. Consider a multivariate stochastic process $\mathbf{x}=(\mathbf{x}_{t})_{t\in\mathbb{N}}$ with values in $\mathbb{R}^{n}$. For a nonnegative integer $s$, the lag-$s$ empirical autocovariance matrix of a sample path $x=(x_{1},x_{2},\dots,x_{T})$ of $\mathbf{x}$ is defined as

\[\Gamma_{s}:=\frac{1}{T-s-1}\sum_{t=1}^{T-s}\tilde{x}_{t}\tilde{x}_{t+s}^{T}\qquad\text{with}\qquad\tilde{x}_{t}:=x_{t}-\frac{1}{T}\sum_{t=1}^{T}x_{t}. \tag{16}\]

In particular, we have

\[\left\{\begin{array}{ll}\alpha=1,\ \gamma=0,\ \text{and}\ A_{0}=\Gamma_{0}&\quad\text{for the predictability-},\\ \alpha=0,\ \gamma=1,\ \text{and}\ A_{i}=\Gamma_{i}\ \forall i\in\{0,2,3,\dots,q\}&\quad\text{for the portmanteau criterion-, and}\\ \alpha=1,\ \gamma>0,\ \text{and}\ A_{i}=\Gamma_{i}\ \forall i\in\{0,1,2,3,\dots,q\}&\quad\text{for the crossing statistics-based problems.}\end{array}\right. \tag{17}\]
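All matrices in (17) are of the form (16) and can be computed directly from a $T\times n$ sample path. A minimal numpy sketch (function name ours), checked against `np.cov` at lag zero, where the $1/(T-1)$ normalization coincides with (16):

```python
import numpy as np

def autocov(X, s):
    """Lag-s empirical autocovariance Gamma_s of a T x n sample path, eq. (16)."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)                        # center: x~_t = x_t - sample mean
    return Xc[: T - s].T @ Xc[s:] / (T - s - 1)    # sum_t x~_t x~_{t+s}^T / (T-s-1)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
Gammas = [autocov(X, s) for s in range(4)]         # Gamma_0, ..., Gamma_3 for q = 3
print(np.allclose(Gammas[0], np.cov(X, rowvar=False)))  # lag 0 matches np.cov: True
```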

We construct the asset pool by combining the pools suggested in optimization portfolio studies [8, 30, 29, 28], resulting in a pool of n=30n=30 assets. The trading time period is from February 1st, 2012 to June 30th, 2014.

We run Algorithm 2 with $r=\sqrt{10}$, $\epsilon_{I}=10^{-3}$ (defined in (11)), and $\epsilon_{O}=10^{-3}$ (defined in (12)), and the volatility threshold $\phi$ is selected to be larger than thirty percent of the median variance of all assets in the pool (see [8]). To obtain reasonable results and compare the performance of the portmanteau criterion- and crossing statistics-based portfolios, we must fix a specific $q$. To this end, we select the best $q$ from the candidate set $\{2,3,4\}$ when applying our algorithm to the portmanteau criterion case, as follows. We measure the average Sharpe ratios for a small, medium, and large sparsity level, $k=5$, $10$, and $17$. Based on this procedure (results not reported due to space considerations), $q=3$ produces better results than $q=2$, while the differences between $q=4$ and $q=3$ are rather marginal. Hence, in our experiments, we set $q=3$. After fixing $q=3$, the next step is selecting an appropriate $\gamma$. For this aim, we concentrate exclusively on portfolio optimization based on crossing statistics. We consider values within the interval $(0.0001,1]$, incrementally adjusting $\gamma$ with a step size of $0.0009$. Applying Algorithm 2 to the crossing statistics-based model across the sparsity levels $k=5$, $10$, and $17$, we evaluate the average Sharpe ratios; this process identifies $\gamma=0.001$ as the best candidate. Furthermore, for (SDP-P), we also need to determine $\beta$. To select this hyperparameter, we apply the methodology presented in [8] (that is, we solve this convex problem and apply sparse principal component analysis to its solution to obtain the corresponding portfolio) for all three cases reported in (17) (with $q=3$ and $\gamma=0.001$) and values from 0.001 to 1.999, with 0.009 increments.
This grid search revealed that the range between 0.8 and 1.2 was the most promising for $\beta$. We then narrowed our selection within this range, using a 0.01 step. Through this iterative approach, we found that $\beta=1$ yielded the highest Sharpe ratio; as such, $\beta=1$ is used in our comparisons.
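The $\gamma$ selection above is a one-dimensional grid search over average Sharpe ratios across sparsity levels. A schematic sketch, where `sharpe_of(gamma, k)` stands in for a full backtest of Algorithm 2 (a hypothetical callable, not part of the paper's code):

```python
import numpy as np

def select_gamma(candidates, sharpe_of, ks=(5, 10, 17)):
    """Return the gamma maximizing the Sharpe ratio averaged over sparsity levels."""
    avg = {g: np.mean([sharpe_of(g, k) for k in ks]) for g in candidates}
    return max(avg, key=avg.get)

# Toy backtest peaked at gamma = 0.001, for illustration only:
toy = lambda g, k: 1.0 - (g - 0.001) ** 2
grid = np.arange(0.0001, 1.0, 0.0009)  # step 0.0009, as in the text
print(select_gamma(grid, toy))
```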

The spread plots in Figures 1, 2, and 3 show that the proposed method achieves better mean reversion overall, and the second and third plots in these figures show that our method surpasses the SDP relaxation technique in both standard cumulative P&L and Sharpe ratio assessments. This holds across a range of sparsity levels (small, medium, and large), as well as across all three portfolio models: predictability-, portmanteau criterion-, and crossing statistics-based portfolios. Nevertheless, we emphasize that the numerical performance of the SDP relaxation raises intriguing theoretical questions that require thorough investigation. For example, under what conditions on the problem data in ($P$) does its SDP relaxation (SDP-P) have a rank-one solution? And, if one is available, how can a rank-one solution of (SDP-P) be used to construct a solution of ($P$)?

Next, we compare the performance of portfolios generated from the predictability, portmanteau criterion, and crossing statistics proxies, based on Algorithm 2. Figure 4 demonstrates that, after careful tuning of the parameters $q$ and $\gamma$, the crossing statistics-based portfolio not only accurately captures the mean-reversion property, but also achieves superior performance in terms of Sharpe ratio and cumulative P&L, surpassing the predictability- and portmanteau criterion-based portfolios. This observation aligns with the idea that Algorithm 2 leverages more information from the data in this case. In addition, the portmanteau criterion proxy seems to fail to retain mean reversion, underscoring the critical role of the quadratic term in the objective function for capturing this desirable property.

Figure 1: Statistical proxy- and SDP-based portfolios for k=5k=5.
Figure 2: Statistical proxy- and SDP-based portfolios for k=10k=10.
Figure 3: Statistical proxy- and SDP-based portfolios for k=17k=17.
Figure 4: Statistical proxy-based portfolios for k=5,10,17k=5,10,17.

4 Conclusion

This paper developed an algorithm that solves the optimization problem underlying the construction of asset portfolios that exhibit the mean-reverting property while also being sparse and showing reasonable volatility. Three main proxies for capturing mean reversion are predictability, the portmanteau criterion, and crossing statistics, and a comprehensive optimization model was developed to accommodate all of them. We leverage a variable splitting technique to decouple the highly coupled nonconvex constraints of this model and propose a tailored penalty decomposition method that solves a sequence of penalty subproblems via an efficient block coordinate descent method. We establish that the proposed algorithm not only finds a KKT point when the objective function is convex, but that its analysis also extends to a nonconvex objective function (with partial modifications). We numerically examine the performance of the algorithm on an S&P 500 data set and show that it outperforms the SDP relaxation approach in terms of standard measures. Further, we compare the statistical proxy-based portfolios and demonstrate that, after tuning hyperparameters, the crossing statistics surrogate captures mean reversion well and outperforms the predictability- and portmanteau criterion-based portfolios. Our method has significant potential applications in finance, economics, and other related fields.

References

  • [1] Ramin Ayanzadeh, Ahmad Mousavi, Milton Halem, and Tim Finin. Quantum annealing based binary compressive sensing with matrix uncertainty. arXiv preprint arXiv:1901.00088, 2019.
  • [2] Dimitris Bertsimas and Romy Shioda. Algorithm for cardinality-constrained quadratic optimization. Computational Optimization and Applications, 43(1):1–22, 2009.
  • [3] Ronald Bewley, David Orden, Minxian Yang, and Lance A Fisher. Comparison of Box–Tiao and Johansen canonical estimators of cointegrating vectors in VEC(1) models. Journal of Econometrics, 64(1-2):3–27, 1994.
  • [4] George EP Box and George C Tiao. A canonical analysis of multiple time series. Biometrika, 64(2):355–365, 1977.
  • [5] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [6] Oleg P Burdakov, Christian Kanzow, and Alexandra Schwartz. Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM Journal on Optimization, 26(1):397–425, 2016.
  • [7] Catarina Coelho, José Luis Santos, and Pedro Júdice. The performance of bank portfolio optimization. International Transactions in Operational Research, 2023.
  • [8] Marco Cuturi and Alexandre d’Aspremont. Mean-reverting portfolios: Tradeoffs between sparsity and volatility. Financial Signal Processing and Machine Learning, pages 23–40, 2016.
  • [9] Marco Cuturi and Alexandre d’Aspremont. Mean reversion with a variance threshold. In International Conference on Machine Learning, pages 271–279. PMLR, 2013.
  • [10] David A Dickey and Wayne A Fuller. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366a):427–431, 1979.
  • [11] Jin-Hong Du, Yifeng Guo, and Xueqin Wang. High-dimensional portfolio selection with cardinality constraints. Journal of the American Statistical Association, pages 1–13, 2022.
  • [12] Norbert Fogarasi and Janos Levendovszky. Sparse, mean reverting portfolio selection using simulated annealing. Algorithmic Finance, 2(3-4):197–211, 2013.
  • [13] F Hooshmand, Z Anoushirvani, and SA MirHassani. Model and efficient algorithm for the portfolio selection problem with real-world constraints under value-at-risk measure. International Transactions in Operational Research, 2023.
  • [14] Takahiro Imai and Kei Nakagawa. Statistical arbitrage strategy in multi-asset market using time series analysis. Journal of Mathematical Finance, 10(2):334–344, 2020.
  • [15] Kui Jing, Fengmin Xu, and Xuepeng Li. A bi-level programming framework for identifying optimal parameters in portfolio selection. International Transactions in Operational Research, 29(1):87–112, 2022.
  • [16] Greta M Ljung and George EP Box. On a measure of lack of fit in time series models. Biometrika, 65(2):297–303, 1978.
  • [17] Hossein Moosaei, Ahmad Mousavi, Milan Hladík, and Zheming Gao. Sparse l1-norm quadratic surface support vector machine with universum data. Soft Computing, 27(9):5567–5586, 2023.
  • [18] Ahmad Mousavi and George Michailidis. Cardinality constrained mean-variance portfolios: A penalty decomposition algorithm. arXiv preprint arXiv:2309.16004, 2023.
  • [19] Ahmad Mousavi, Mehdi Rezaee, and Ramin Ayanzadeh. A survey on compressive sensing: classical results and recent advancements. Journal of Mathematical Modeling, 8(3):309–344, 2020.
  • [20] Ahmad Mousavi and Jinglai Shen. A penalty decomposition algorithm with greedy improvement for mean-reverting portfolios with sparsity and volatility constraints. International Transactions in Operational Research, 2022.
  • [21] Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, and George Michailidis. A penalty based method for communication-efficient decentralized bilevel programming. arXiv preprint arXiv:2211.04088, 2022.
  • [22] Andrzej Ruszczynski. Nonlinear Optimization. Princeton University Press, 2011.
  • [23] Ruchika Sehgal and Aparna Mehra. Robust reward–risk ratio portfolio optimization. International Transactions in Operational Research, 28(4):2169–2190, 2021.
  • [24] Jinglai Shen and Ahmad Mousavi. Least sparsity of $p$-norm based optimization problems with $p>1$. SIAM Journal on Optimization, 28(3):2721–2751, 2018.
  • [25] Yinyu Ye and Shuzhong Zhang. New results on quadratic minimization. SIAM Journal on Optimization, 14(1):245–267, 2003.
  • [26] N Donald Ylvisaker. The expected number of zeros of a stationary gaussian process. The Annals of Mathematical Statistics, 36(3):1043–1046, 1965.
  • [27] Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, and Yuan Yao. Global convergence of block coordinate descent in deep learning. In International conference on machine learning, pages 7313–7323. PMLR, 2019.
  • [28] Jize Zhang, Tim Leung, and Aleksandr Aravkin. Sparse mean-reverting portfolios via penalized likelihood optimization. Automatica, 111:108651, 2020.
  • [29] Ziping Zhao and Daniel P Palomar. Mean-reverting portfolio with budget constraint. IEEE Transactions on Signal Processing, 66(9):2342–2357, 2018.
  • [30] Ziping Zhao, Rui Zhou, Zhongju Wang, and Daniel P Palomar. Optimal portfolio design for statistical arbitrage in finance. In 2018 IEEE Statistical Signal Processing Workshop (SSP), pages 801–805. IEEE, 2018.

5 Appendix

All technical proofs are given next.

5.1 Proof of Lemma 2.1.

Proof.

We drop the subscripts for simplicity. If $x^{T}A_{0}x>\phi$, the Lagrange multiplier in ($P_{x}$) must be zero, so that $\left(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I\right)x=\rho(y+z)$, and consequently $x=\rho(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I)^{-1}(y+z)$. This leads to

x\displaystyle\|x\| ρ(αA1+γi=2q(zTAiz)Ai+2ρI)12y+z2\displaystyle\leq\rho\|(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I)^{-1}\|_{2}\|y+z\|_{2} (18)
ρλmin(αA1+γi=2q(zTAiz)Ai)+2ρy+z2\displaystyle\leq\frac{\rho}{\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})+2\rho}\|y+z\|_{2}
12y+z2\displaystyle\leq\frac{1}{2}\|y+z\|_{2}
12(y+z)\displaystyle\leq\frac{1}{2}(\|y\|+\|z\|)
\[=\frac{1}{2}(1+\|z\|)\qquad\text{as}\ \ y\ \ \text{solves}\ \ (P_{y}),\]

where we used that λmin(αA1+γi=2q(zTAiz)Ai)0\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})\geq 0 in the third inequality. Moreover, since zz solves (PzP_{z}) and λmin(γi=2q(xTAix)Ai)0\lambda_{\min}\left(\gamma\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i}\right)\geq 0, the equation (8) implies

z\displaystyle\|z\| ρ(γi=2q(xTAix)Ai+ρI)1x\displaystyle\leq\rho\|(\gamma\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i}+\rho I)^{-1}\|\|x\| (19)
ργλmin(i=2q(xTAix)Ai)+ρx\displaystyle\leq\frac{\rho}{\gamma\lambda_{\min}(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})+\rho}\|x\|
x.\displaystyle\leq\|x\|.

Through (18), we see x12(1+z)12(1+x)\|x\|\leq\frac{1}{2}(1+\|z\|)\leq\frac{1}{2}(1+\|x\|); leading to x1\|x\|\leq 1 in this case. If xTA0x=ϕx^{T}A_{0}x=\phi, we have xϕ/λmin(A0)\|x\|\leq\sqrt{\phi/{\lambda_{\min}(A_{0})}}. By (19), we have zxmax{ϕ/λmin(A0),1}.\|z\|\leq\|x\|\leq\max\{\sqrt{\phi/{\lambda_{\min}(A_{0})}},1\}.

5.2 Proof of Theorem 2.1.

Proof.

By the definitions of $x_{l+1}$, $y_{l+1}$, and $z_{l+1}$ in steps 3-5 of Algorithm 1, we get

qρ(xl+1,yl+1,zl+1)\displaystyle q_{\rho}(x_{l+1},y_{l+1},z_{l+1}) \displaystyle\leq qρ(xl+1,yl+1,z);z𝒵,\displaystyle q_{\rho}(x_{l+1},y_{l+1},z);\qquad\forall z\in\mathcal{Z},
qρ(xl+1,yl+1,zl)\displaystyle q_{\rho}(x_{l+1},y_{l+1},z_{l}) \displaystyle\leq qρ(xl+1,y,zl);y𝒴,\displaystyle q_{\rho}(x_{l+1},y,z_{l});\qquad\forall y\in\mathcal{Y},
qρ(xl+1,yl,zl)\displaystyle q_{\rho}(x_{l+1},y_{l},z_{l}) \displaystyle\leq qρ(x,yl,zl);x𝒳.\displaystyle q_{\rho}(x,y_{l},z_{l});\qquad\forall x\in\mathcal{X}. (20)

This simply leads to the following:

0\displaystyle 0 qρ(xl+1,yl+1,zl+1)\displaystyle\leq q_{\rho}(x_{l+1},y_{l+1},z_{l+1}) (21)
qρ(xl+1,yl+1,zl)\displaystyle\leq q_{\rho}(x_{l+1},y_{l+1},z_{l})
qρ(xl+1,yl,zl)\displaystyle\leq q_{\rho}(x_{l+1},y_{l},z_{l})
qρ(xl,yl,zl);l.\displaystyle\leq q_{\rho}(x_{l},y_{l},z_{l});\quad\forall l\in\mathbb{N}.

Thus, $q_{\rho}(x_{l},y_{l},z_{l})$ is bounded below and non-increasing, and hence convergent. On the other hand, since $(x_{*},y_{*},z_{*})$ is an accumulation point of $\{(x_{l},y_{l},z_{l})\}$, there exists a subsequence $L$ such that $\lim_{l\in L\to\infty}(x_{l},y_{l},z_{l})=(x_{*},y_{*},z_{*})$. The continuity of $q_{\rho}$ yields

limlqρ(xl+1,yl+1,zl)\displaystyle\lim_{l\to\infty}q_{\rho}(x_{l+1},y_{l+1},z_{l}) =limlqρ(xl+1,yl,zl)\displaystyle=\lim_{l\to\infty}q_{\rho}(x_{l+1},y_{l},z_{l})
=limlqρ(xl,yl,zl)\displaystyle=\lim_{l\to\infty}q_{\rho}(x_{l},y_{l},z_{l})
=limlLqρ(xl,yl,zl)\displaystyle=\lim_{l\in L\to\infty}q_{\rho}(x_{l},y_{l},z_{l})
=qρ(x,y,z).\displaystyle=q_{\rho}(x_{*},y_{*},z_{*}).

By the continuity of qρ(xl,yl,zl)q_{\rho}(x_{l},y_{l},z_{l}) and taking the limit of both sides of (5.2) as lLl\in L\to\infty , we have

qρ(x,y,z)\displaystyle q_{\rho}(x_{*},y_{*},z_{*}) \displaystyle\leq qρ(x,y,z);x𝒳,\displaystyle q_{\rho}(x,y_{*},z_{*});\qquad\forall x\in\mathcal{X},
qρ(x,y,z)\displaystyle q_{\rho}(x_{*},y_{*},z_{*}) \displaystyle\leq qρ(x,y,z);y𝒴,\displaystyle q_{\rho}(x_{*},y,z_{*});\qquad\forall y\in\mathcal{Y},
qρ(x,y,z)\displaystyle q_{\rho}(x_{*},y_{*},z_{*}) \displaystyle\leq qρ(x,y,z);z𝒵.\displaystyle q_{\rho}(x_{*},y_{*},z);\qquad\forall z\in\mathcal{Z}.

Further, it is clear from (21) that $\{q_{\rho}(x_{l},y_{l},z_{l})\}$ is non-increasing.
Now suppose $q_{\rho}(x_{l},y_{l},z_{l})=q_{\rho}(x_{l+1},y_{l+1},z_{l+1})$ for some $l\in\mathbb{N}$. Then, by the last inequality in (21), $q_{\rho}(x_{l+1},y_{l},z_{l})=q_{\rho}(x_{l},y_{l},z_{l})$. Since $q_{\rho}(x_{l+1},y_{l},z_{l})=\min_{x\in\mathcal{X}}q_{\rho}(x,y_{l},z_{l})$ and $x_{l}\in\mathcal{X}$, we see $x_{l}\in\mbox{Argmin}_{x\in\mathcal{X}}q_{\rho}(x,y_{l},z_{l})$. Similarly, by the third inequality in (21), $q_{\rho}(x_{l+1},y_{l+1},z_{l})=q_{\rho}(x_{l+1},y_{l},z_{l})$. Since $q_{\rho}(x_{l+1},y_{l+1},z_{l})=\min_{y\in\mathcal{Y}}q_{\rho}(x_{l+1},y,z_{l})$ and $y_{l}\in\mathcal{Y}$, we see $y_{l}\in\mbox{Argmin}_{y\in\mathcal{Y}}q_{\rho}(x_{l+1},y,z_{l})$; combined with $q_{\rho}(x_{l+1},y_{l},z_{l})=q_{\rho}(x_{l},y_{l},z_{l})$, this gives $y_{l}\in\mbox{Argmin}_{y\in\mathcal{Y}}q_{\rho}(x_{l},y,z_{l})$. Recall that $z_{l}\in\mathcal{Z}$ satisfies $z_{l}\in\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x_{l},y_{l},z)$ by definition. Hence, $(x_{l},y_{l},z_{l})$ is a saddle point of ($P_{x,y,z}$). ∎

5.3 Proof of Theorem 2.2.

Proof.

(i) From (13), we know that $\max\{\|x^{(j)}\|,\|y^{(j)}\|,\|z^{(j)}\|\}\leq\max\{\sqrt{\phi/\lambda_{\min}(A_{0})},1\}$ for every $j\in\mathbb{N}$. Thus, $\{(x^{(j)},y^{(j)},z^{(j)})\}$ is bounded and therefore has a convergent subsequence; without loss of generality, we suppose the sequence itself is convergent. Let $(x^{*},y^{*},z^{*})$ be its accumulation point. First, let us show that $x^{*}=y^{*}=z^{*}$. Since $\alpha\in\{0,1\}$, $A_{i}\succeq 0$ for $i\in[q]$, and $\gamma\geq 0$, we see $\alpha(x^{(j)})^{T}A_{1}x^{(j)}+\gamma\sum_{i=2}^{q}(z^{(j)})^{T}A_{i}z^{(j)}(x^{(j)})^{T}A_{i}x^{(j)}\geq 0$, and thus, in view of (4), (21), and step 13 of Algorithm 2, we have $\rho^{(j)}\left(\|x^{(j)}-y^{(j)}\|^{2}+\|x^{(j)}-z^{(j)}\|^{2}\right)\leq\Upsilon$. The latter leads to

max{x(j)y(j),x(j)z(j)}Υ/ρ(j),\max\{\|x^{(j)}-y^{(j)}\|,\|x^{(j)}-z^{(j)}\|\}\leq\sqrt{{\Upsilon}/{\rho^{(j)}}},

and thus max{x(j)y(j),x(j)z(j)}0\max\{\|x^{(j)}-y^{(j)}\|,\|x^{(j)}-z^{(j)}\|\}\to 0 when jj\to\infty as ρ(j)\rho^{(j)}\to\infty; implying that x=y=zx^{*}=y^{*}=z^{*}.

Next, let (j)[n]\mathcal{I}^{(j)}\subseteq[n] be such that |(j)|=k|\mathcal{I}^{(j)}|=k and (y(j)c)i=0(y_{(\mathcal{I}^{j})^{c}})_{i}=0 for every jj\in\mathbb{N} and i(j)ci\in(\mathcal{I}^{j})^{c}. Then, since {(j)}\{\mathcal{I}^{(j)}\} is a bounded sequence of indices, it has a convergent subsequence, which means that there exists an index subset [n]\mathcal{L}\subseteq[n] with ||=k|\mathcal{L}|=k and a subsequence {(x(j),y(j),z(j))},\{(x^{(j_{\ell})},y^{(j_{\ell})},z^{(j_{\ell})})\}, of the above convergent subsequence such that (j)=\mathcal{I}^{(j_{\ell})}=\mathcal{L} for all large jj_{\ell}’s. Therefore, since x=yx^{*}=y^{*} and yc=0y^{*}_{\mathcal{L}^{c}}=0, we see xc=0x^{*}_{\mathcal{L}^{c}}=0, which further implies x=1\|x^{*}_{\mathcal{L}}\|=1.

(ii) For each jj, (x(j),y(j),z(j))(x^{(j)},y^{(j)},z^{(j)}) is a saddle point of (Px,y,zP_{x,y,z}) with ρ=ρ(j)>0\rho=\rho^{(j)}>0 such that (5),(7) and (8) give

{αA1x(j)+γi=2q(z(j))TAiz(j)Aix(j)++ρ(x(j)y(j)+x(j)z(j))λA0x(j)=0,y(j)=(x(j))/(x(j)),γi=2q(x(j))TAix(j)Aiz(j)=ρ(x(j)z(j)),0λ(x(j))TA0x(j)ϕ0,y(j)=1,and(y(j))c=0.\left\{\begin{aligned} &\alpha A_{1}x^{(j_{\ell})}+\gamma\sum_{i=2}^{q}(z^{(j_{\ell})})^{T}A_{i}z^{(j_{\ell})}A_{i}x^{(j_{\ell})}+\dots\\ &\qquad\dots+\rho(x^{(j_{\ell})}-y^{(j_{\ell})}+x^{(j_{\ell})}-z^{(j_{\ell})})-\lambda A_{0}x^{(j_{\ell})}=0,\\ &y^{(j_{\ell})}={(x^{(j_{\ell})})_{\mathcal{L}}}/{\|{(x^{(j_{\ell})})_{\mathcal{L}}}\|},\\ &\gamma\sum_{i=2}^{q}(x^{(j_{\ell})})^{T}A_{i}x^{(j_{\ell})}A_{i}z^{(j_{\ell})}=\rho(x^{(j_{\ell})}-z^{(j_{\ell})}),\\ &0\leq\lambda\perp(x^{(j_{\ell})})^{T}A_{0}x^{(j_{\ell})}-\phi\geq 0,\\ &\quad\|y^{(j_{\ell})}\|=1,\quad\mbox{and}\quad(y^{(j_{\ell})})_{\mathcal{L}^{c}}=0.\end{aligned}\right. (22)

Substituting the third relation of (22) into the first, we get

αA1x(j)+γi=2q[(z(j))TAiz(j)Aix(j)+(x(j))TAix(j)Aiz(j)]+ρ(j)(x(j)y(j))λ(j)A0x(j)=0.\displaystyle\alpha A_{1}x^{(j_{\ell})}+\gamma\sum_{i=2}^{q}[(z^{(j_{\ell})})^{T}A_{i}z^{(j_{\ell})}A_{i}x^{(j_{\ell})}+(x^{(j_{\ell})})^{T}A_{i}x^{(j_{\ell})}A_{i}z^{(j_{\ell})}]+\rho^{(j_{\ell})}(x^{(j_{\ell})}-y^{(j_{\ell})})-\lambda^{(j_{\ell})}A_{0}x^{(j_{\ell})}=0.

By observing (x(j))(y(j))=(x(j)){\|(x^{(j_{\ell})})_{\mathcal{L}}\|}\left(y^{(j_{\ell})}\right)_{\mathcal{L}}=\left(x^{(j_{\ell})}\right)_{\mathcal{L}} that implies (x(j)y(j))=(x(j)1):=μ(j)(y(j)),\left(x^{(j_{\ell})}-y^{(j_{\ell})}\right)_{\mathcal{L}}=\underbrace{(\|x^{(j_{\ell})}\|-1)}_{:=\mu^{(j_{\ell})}}(y^{(j_{\ell})})_{\mathcal{L}}, and letting

M(j):=αA1x(j)+γi=2q[(z(j))TAiz(j)Aix(j)+(x(j))TAix(j)Aiz(j)],\displaystyle M^{(j_{\ell})}:=\alpha A_{1}x^{(j_{\ell})}+\gamma\sum_{i=2}^{q}[(z^{(j_{\ell})})^{T}A_{i}z^{(j_{\ell})}A_{i}x^{(j_{\ell})}+(x^{(j_{\ell})})^{T}A_{i}x^{(j_{\ell})}A_{i}z^{(j_{\ell})}],

we get

M(j)+μ(j)[(y(j))0]+[0ρ(j)(x(j))c]:=w(j)λA0x(j)=0,M^{(j_{\ell})}+\mu^{(j_{\ell})}\begin{bmatrix}(y^{(j_{\ell})})_{\mathcal{L}}\\ 0\end{bmatrix}+\underbrace{\begin{bmatrix}0\\ \rho^{(j_{\ell})}(x^{(j_{\ell})})_{\mathcal{L}^{c}}\end{bmatrix}}_{:=w^{(j_{\ell})}}-\lambda A_{0}x^{(j_{\ell})}=0,

where $(w^{(j_{\ell})})_{\mathcal{L}}=0$ for each $j_{\ell}$, and also $0\leq\lambda^{(j_{\ell})}\perp(x^{(j_{\ell})})^{T}A_{0}x^{(j_{\ell})}-\phi\geq 0$, $\|y^{(j_{\ell})}\|_{2}=1$, and $(y^{(j_{\ell})})_{\mathcal{L}^{c}}=0$. We next prove that $\{(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\}$ is bounded under Robinson's condition at $x^{*}$. Suppose not, and consider the normalized sequence

(λ~(j),μ~(j),w~(j)):=(λ(j),μ(j),w(j))(λ(j),μ(j),w(j))2,j.(\widetilde{\lambda}^{(j_{\ell})},\widetilde{\mu}^{(j_{\ell})},\widetilde{w}^{(j_{\ell})}):=\frac{(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})}{\|(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\|_{2}},\qquad\forall\ {j_{\ell}}.

By boundedness, there exists a convergent subsequence of $(\widetilde{\lambda}^{(j_{\ell})},\widetilde{\mu}^{(j_{\ell})},\widetilde{w}^{(j_{\ell})})$ with limit $(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})$ satisfying $\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|_{2}=1$. In view of $x^{*}=y^{*}$, $y^{*}_{\mathcal{L}^{c}}=0$, and the boundedness of $(M^{(j_{\ell})})$, passing to the limit yields

λ~A0x+μ~x+w~=0,-\widetilde{\lambda}_{*}A_{0}x^{*}+\widetilde{\mu}_{*}x^{*}+\widetilde{w}^{*}=0, (23)

where λ~0\widetilde{\lambda}_{*}\geq 0, xc=0x^{*}_{\mathcal{L}^{c}}=0, and w~=0\widetilde{w}^{*}_{\mathcal{L}}=0. Let us consider two cases: (x)TA0x=ϕ(x^{*})^{T}A_{0}x^{*}=\phi, and (x)TA0x>ϕ(x^{*})^{T}A_{0}x^{*}>\phi as follows. When (x)TA0x=ϕ(x^{*})^{T}A_{0}x^{*}=\phi, by the Robinson’s conditions at xx^{*}, there exist a vector dnd\in\mathbb{R}^{n} and a constant vv\in\mathbb{R}_{-} such that 2dTA0xv=2λ~-2d^{T}A_{0}x^{*}-v=-2\widetilde{\lambda}_{*}, 2dTx=2μ~2d^{T}x^{*}=-2\widetilde{\mu}_{*}, and dc=w~cd_{\mathcal{L}^{c}}=-\widetilde{w}^{*}_{\mathcal{L}^{c}}. Since dc=w~cd_{\mathcal{L}^{c}}=-\widetilde{w}^{*}_{\mathcal{L}^{c}} and w~=0\widetilde{w}^{*}_{\mathcal{L}}=0, we see that dTw~=w~22d^{T}\widetilde{w}^{*}=-\|\widetilde{w}^{*}\|^{2}_{2}. Therefore, 0=λ~dTA0x+μ~dTx+dTw~=(λ~)2+λ~v2(μ~)2w~22=(λ~,μ~,w~)22+λ~v2,0=-\widetilde{\lambda}_{*}d^{T}A_{0}x^{*}+\widetilde{\mu}d^{T}x^{*}+d^{T}\widetilde{w}^{*}=-(\widetilde{\lambda}_{*})^{2}+\frac{\widetilde{\lambda}_{*}v}{2}-(\widetilde{\mu}_{*})^{2}-\|\widetilde{w}^{*}\|^{2}_{2}=-\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}+\frac{\widetilde{\lambda}_{*}v}{2}, which implies that (λ~,μ~,w~)22=λ~v2\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}=\frac{\widetilde{\lambda}_{*}v}{2}. Since λ~0\widetilde{\lambda}_{*}\geq 0 and v0v\leq 0, we have (λ~,μ~,w~)22=0\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}=0, which is a contradiction. Therefore, the sequence ((λ(j),μ(j),w(j)))\big{(}(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\big{)} is bounded. Now, suppose (x)TA0x>ϕ(x^{*})^{T}A_{0}x^{*}>\phi. In this case, since (xj)(x^{j_{\ell}}) converges to xx^{*}, we have (xj)TA0xj>ϕ(x^{j_{\ell}})^{T}A_{0}x^{j_{\ell}}>\phi for all jj_{\ell} sufficiently large. Hence, λ(j)=0\lambda^{(j_{\ell})}=0 for all large jj_{\ell}. This shows that λ~=0\widetilde{\lambda}_{*}=0. 
In view of Robinson's condition at $x^{*}$ given in (2.2), there exist a vector $d\in\mathbb{R}^{n}$ and a constant $v\in\mathbb{R}$ such that $-2d^{T}A_{0}x^{*}-v=-2\widetilde{\lambda}_{*}$, $2d^{T}x^{*}=-2\widetilde{\mu}_{*}$, and $d_{\mathcal{L}^{c}}=-\widetilde{w}^{*}_{\mathcal{L}^{c}}$. Using these equations together with (23) and $\widetilde{\lambda}_{*}=0$, we see that $0=-\widetilde{\lambda}_{*}d^{T}A_{0}x^{*}+\widetilde{\mu}_{*}d^{T}x^{*}+d^{T}\widetilde{w}^{*}=-\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}+\frac{\widetilde{\lambda}_{*}v}{2}$; implying that $\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}=0$, which is again a contradiction.

Hence, $\{(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\}$ is bounded and has a convergent subsequence with limit $(\lambda,\mu,w)$. Thus, passing to the limit in (5.3) and applying the results of part (i), we have:
$$\alpha A_{1}x^{*}+2\gamma\sum_{i=2}^{q}(x^{*})^{T}A_{i}x^{*}\,A_{i}x^{*}-\lambda A_{0}x^{*}+\mu x^{*}+w=0,\qquad \|x^{*}\|=1,\qquad (x^{*})_{\mathcal{L}^{c}}=0,\qquad w_{\mathcal{L}}=0,$$
where $0\leq\lambda\perp(x^{*})^{T}A_{0}x^{*}-\phi\geq 0$. This means that $x^{*}$ satisfies the first-order optimality conditions of $(P)$ given in (15). ∎

5.4 Proof of Theorem 2.3.

Proof.

Only a few of the arguments given in the above proofs must be partially modified to ensure that all the subproblems remain well-defined and that the convergence analysis carries over to the nonconvex case. To avoid repetition, we briefly discuss the arguments that must be revisited. Specifically, we must impose the following conditions on $\rho$.
(1) $2\rho>\lambda_{\max}(A_{0})-\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})$, which guarantees the existence of a strictly feasible point in the dual of the SDP relaxation of $(P_{x})$; namely, inequality (6).
(2) $\rho>-\gamma\lambda_{\min}(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})$, which ensures that $z_{*}$ in (8) is well-defined.
(3) $2\rho>-\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})$, which is needed for inequality (18).

Since $A_{0}\succ 0$, condition (3) is satisfied whenever (1) holds. It is easy to show that conditions (1) and (2) are satisfied if $\rho>\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})$. Recall that for a symmetric matrix $Q_{1}$, we have $x^{T}Q_{1}x\geq\lambda_{\min}(Q_{1})\|x\|^{2}$, and that for two symmetric matrices $Q_{1}$ and $Q_{2}$, $\lambda_{\min}(Q_{1}+Q_{2})\geq\lambda_{\min}(Q_{1})+\lambda_{\min}(Q_{2})$. Thus,
$$\lambda_{\min}\Big(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}\Big)\geq\alpha\lambda_{\min}(A_{1})+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)\lambda_{\min}(A_{i})\geq\alpha\lambda_{\min}(A_{1})+\gamma\sum_{i=2}^{q}\|z\|^{2}\lambda^{2}_{\min}(A_{i})\geq\alpha\lambda_{\min}(A_{1}),$$
implying that $\lambda_{\max}(A_{0})-\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})\leq\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})$. In the same manner,
$$\lambda_{\min}\Big(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i}\Big)\geq\sum_{i=2}^{q}(x^{T}A_{i}x)\lambda_{\min}(A_{i})\geq\sum_{i=2}^{q}\|x\|^{2}\lambda^{2}_{\min}(A_{i})\geq 0,$$
so that $-\gamma\lambda_{\min}(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})\leq 0$.
Therefore, conditions (1) and (2) are both satisfied whenever $\rho>|\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})|$.
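The eigenvalue chain above can be checked numerically. The sketch below is illustrative only: the matrices are randomly generated symmetric positive semidefinite stand-ins (with $A_0 \succ 0$, as in the paper), and the dimensions and weights are hypothetical.

```python
import numpy as np

# Sanity check of the bound used above:
#   lambda_min(alpha*A1 + gamma * sum_i (z^T A_i z) A_i) >= alpha * lambda_min(A1),
# so that rho > |lambda_max(A0) - alpha*lambda_min(A1)| implies condition (1):
#   2*rho > lambda_max(A0) - lambda_min(alpha*A1 + gamma * sum_i (z^T A_i z) A_i).

rng = np.random.default_rng(0)

def random_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T  # symmetric positive semidefinite

n, q, alpha, gamma = 5, 4, 0.7, 0.3
A0 = random_psd(n) + np.eye(n)              # A0 positive definite
A1 = random_psd(n)
As = [random_psd(n) for _ in range(q - 1)]  # stand-ins for A_2, ..., A_q
z = rng.standard_normal(n)

weighted = alpha * A1 + gamma * sum((z @ A @ z) * A for A in As)
lam_min_weighted = np.linalg.eigvalsh(weighted)[0]   # smallest eigenvalue

# The chain of inequalities: lambda_min(weighted) >= alpha * lambda_min(A1).
assert lam_min_weighted >= alpha * np.linalg.eigvalsh(A1)[0] - 1e-9

# Any rho exceeding |lambda_max(A0) - alpha*lambda_min(A1)| satisfies condition (1).
rho = abs(np.linalg.eigvalsh(A0)[-1] - alpha * np.linalg.eigvalsh(A1)[0]) + 1e-6
assert 2 * rho > np.linalg.eigvalsh(A0)[-1] - lam_min_weighted
print("eigenvalue bound verified")
```

Note that the nonnegativity of each weight $z^{T}A_{i}z$ (used when dropping the sum) relies on the $A_i$ being positive semidefinite, which the random construction above guarantees.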
Moreover, for part (i) of Theorem 2.2, we have
$$\alpha(x^{(j)})^{T}A_{1}x^{(j)}+\gamma\sum_{i=2}^{q}(z^{(j)})^{T}A_{i}z^{(j)}\,(x^{(j)})^{T}A_{i}x^{(j)}+\rho^{(j)}\left(\|x^{(j)}-y^{(j)}\|^{2}+\|x^{(j)}-z^{(j)}\|^{2}\right)\leq\Upsilon,$$
which, by the new assumption that the level set $X_{\Upsilon}$ is bounded, yields
$$\rho^{(j)}\left(\|x^{(j)}-y^{(j)}\|^{2}+\|x^{(j)}-z^{(j)}\|^{2}\right)\leq\Upsilon-\min_{x\in X_{\Upsilon}}f(x),$$
which leads to $x^{*}=y^{*}=z^{*}$. Consequently, (i) and (ii) in Theorem 2.2 hold in this case as well. ∎
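The mechanism by which the increasing penalty parameter forces the splitting variables to coincide can be illustrated on a toy problem. The sketch below is a deliberate simplification of the penalty decomposition idea, not the paper's algorithm: it keeps only the $\alpha\, x^{T}A_{1}x$ term, uses a single splitting variable $y$ carrying the sparsity and unit-norm constraints, and all names and parameter values are hypothetical.

```python
import numpy as np

# Toy penalty decomposition sketch: minimize alpha * x^T A1 x subject to
# ||x||_0 <= k and ||x|| = 1, via block coordinate descent on the penalized
# objective  alpha * x^T A1 x + rho * ||x - y||^2,  with rho increased
# geometrically so the splitting gap ||x - y|| is driven to zero.
def penalty_decomposition_toy(A1, k, alpha=1.0, rho0=1.0, growth=2.0,
                              outer=25, inner=20):
    n = A1.shape[0]
    y = np.ones(n) / np.sqrt(n)   # feasible start: unit norm
    rho = rho0
    for _ in range(outer):
        for _ in range(inner):
            # x-block: unconstrained strongly convex quadratic, closed form
            x = np.linalg.solve(alpha * A1 + rho * np.eye(n), rho * y)
            # y-block: project x onto {||y||_0 <= k, ||y|| = 1}:
            # keep the k largest entries in magnitude, then renormalize
            y = np.zeros(n)
            idx = np.argsort(-np.abs(x))[:k]
            y[idx] = x[idx]
            y /= np.linalg.norm(y)
        rho *= growth             # increase the penalty parameter
    return x, y, rho

A1 = np.array([[2.0, 0.3, 0.0],
               [0.3, 1.0, 0.2],
               [0.0, 0.2, 3.0]])
x, y, _ = penalty_decomposition_toy(A1, k=2)
print(np.linalg.norm(x - y))      # splitting gap (should be tiny)
```

As the proof indicates, the bound $\rho^{(j)}\|x^{(j)}-y^{(j)}\|^{2}\leq\Upsilon-\min f$ with $\rho^{(j)}\to\infty$ is exactly what makes the printed gap vanish; $y$ remains feasible ($k$-sparse, unit norm) throughout.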