
Statistical Proxy based Mean-Reverting Portfolios with Sparsity and Volatility Constraints

Ahmad Mousavi, Department of Mathematics and Statistics, American University (Emails: [email protected] and [email protected]); George Michailidis, Department of Statistics, University of Florida
Abstract

Mean-reverting portfolios with volatility and sparsity constraints are of prime interest to practitioners in finance, since they are both profitable and well-diversified, while also managing risk and minimizing transaction costs. Three main measures serve as statistical proxies to capture the mean-reversion property: predictability, the portmanteau criterion, and crossing statistics. If, in addition, reasonable volatility and sparsity are desired for the portfolio, a convex quadratic or quartic objective function must be minimized subject to nonconvex quadratic and cardinality constraints. In this paper, we introduce and investigate a comprehensive modeling framework that incorporates all the proxies previously proposed in the literature, and we develop an effective unifying algorithm that obtains a Karush–Kuhn–Tucker (KKT) point under mild regularity conditions. Specifically, we present a tailored penalty decomposition method that approximately solves a sequence of penalized subproblems by a block coordinate descent algorithm. To the best of our knowledge, the proposed algorithm is the first method for directly solving volatile, sparse, and mean-reverting portfolio problems based on the portmanteau criterion and crossing statistics proxies. Further, we establish that the convergence analysis extends to the nonconvex objective case, provided that the starting penalty parameter exceeds a finite bound and the objective function has a bounded level set. Numerical experiments on the S&P 500 data set demonstrate the efficiency of the proposed algorithm in comparison to a semidefinite relaxation-based approach and suggest that the crossing statistics proxy yields more desirable portfolios.

Index Terms—Penalty Decomposition Methods, Sparse Optimization, Mean-reverting Portfolios, Predictability Proxy, Portmanteau Criterion, Crossing Statistics.

1 Introduction

Portfolios exhibiting mean-reverting behavior are of great interest to practitioners due to their predictability, which creates arbitrage opportunities. The task is to construct a portfolio that tends to return to its average value over time and to use its performance to make trading decisions. Traditionally, such portfolios are obtained by considering a linear combination of assets that is stationary, through co-integration methods. However, this approach often results in a large number of assets and may not exhibit reasonable volatility, making such portfolios practically inapplicable [12, 28]. Therefore, along with mean reversion, reasonable volatility is a mandatory property in realistic situations. Moreover, trading costs can be notably reduced by limiting the number of assets, so sparse portfolios offer advantages (the notion of sparsity is widely used in signal processing and machine learning applications; e.g., [19] and references therein). Hence, constructing a mean-reverting portfolio that exhibits reasonable volatility and comprises a relatively small number of assets has attracted attention in the literature [20, 8]. More broadly, the concept of sparsity is highly relevant across many applications in contemporary science [18, 24, 17, 1].

Three standard proxies that reflect the mean-reversion property are the predictability measure, the portmanteau criterion, and crossing statistics [9]. The predictability measure, initially defined for stationary processes in [4] and later generalized to non-stationary processes by [3], assesses how similar a time series is to a noise process. Let A_{i} for i=0,1,\dots,q be the empirical autocovariance matrices of a multivariate time series (see (16) for details); then the mean-reversion property via the predictability proxy can be captured using the following problem [9]:

\min_{x\in\mathbb{R}^{n}}\quad x^{T}A_{1}x\qquad\textrm{subject to}\qquad x^{T}x=1. (1)
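Without the volatility and sparsity constraints added later, (1) is a Rayleigh-quotient minimization, so its solution is a unit eigenvector of A_1 associated with its smallest eigenvalue. A minimal sketch, assuming a symmetric A_1 supplied as a NumPy array (the function name is ours):

```python
import numpy as np

def predictability_portfolio(A1):
    """Solve (1): minimize x^T A_1 x over unit vectors x.
    By the Rayleigh-quotient characterization, the minimizer is a
    unit eigenvector of A_1 for its smallest eigenvalue."""
    w, V = np.linalg.eigh((A1 + A1.T) / 2)  # eigenvalues in ascending order
    return V[:, 0]
```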

The portmanteau statistic, proposed by [16], is used to determine whether a process is white noise. In particular, mean reversion is obtained via the portmanteau criterion proxy as follows [9]:

\min_{x\in\mathbb{R}^{n}}\quad\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)^{2}\qquad\textrm{subject to}\qquad x^{T}x=1. (2)

The choice of q depends on factors such as data availability, portfolio size, risk tolerance, and computational complexity; hence, in applications, sensitivity analysis is recommended for selecting an appropriate q [11, 14]. The crossing statistic calculates the likelihood of a process crossing its mean per unit of time [26]. Precisely, the crossing statistics proxy yields mean reversion via [9]:

\min_{x\in\mathbb{R}^{n}}\quad x^{T}A_{1}x+\gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)^{2}\qquad\textrm{subject to}\qquad x^{T}x=1, (3)

where \gamma serves as a hyperparameter governing the influence of the lag-i autocovariance matrices (see (16)) on the mean-reversion characteristic of the target portfolio. On the other hand, recall that sufficient volatility can be obtained by imposing a constraint on the variance, that is, x^{T}A_{0}x\geq\phi for some positive threshold \phi, where A_{0} is the covariance matrix. Hence, the statistical proxy-based problems (1), (2), and (3), together with volatility and sparsity constraints, can be formulated into the following comprehensive model:

\min_{x\in\mathbb{R}^{n}}\quad f(x):=\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)^{2} (P)
subject to\quad x^{T}A_{0}x\geq\phi,\quad x^{T}x=1,\quad\text{and}\quad\|x\|_{0}\leq k,

where \alpha\in\{0,1\}, 2\leq q\in\mathbb{N}, A_{0}\succ 0, A_{i}\succeq 0 for i=1,\dots,q, \gamma\geq 0, \phi>0, and k\leq n (for further details see (17)). We emphasize that autocovariance matrices provide information about the temporal dependence structure among variables in a time series data set and are commonly used in time series analysis and econometrics for forecasting, model estimation, and inference; see Section 3 for details. Lastly, it is important to emphasize that \alpha is not a hyperparameter; instead, it is a parameter determined immediately by the model choice, as detailed in equation (17). Also, the hyperparameter \gamma matters only for the crossing statistics-based model and plays no role in the predictability- and portmanteau criterion-based models.
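To make the ingredients of (P) concrete, the following sketch forms symmetrized sample lag autocovariance matrices from a T x n series and evaluates f(x). The paper's exact estimator is given in (16), so the normalization and symmetrization used here are our assumptions:

```python
import numpy as np

def autocovariance_matrices(S, q):
    """Sample lag-i autocovariance matrices A_0, ..., A_q of a T x n
    multivariate series S, symmetrized so that each A_i = A_i^T.
    (The paper's exact definition is (16); this is a standard estimate.)"""
    T, n = S.shape
    Sc = S - S.mean(axis=0)                   # center each series
    A = []
    for i in range(q + 1):
        C = Sc[:T - i].T @ Sc[i:] / (T - i)   # lag-i cross-products
        A.append((C + C.T) / 2)
    return A

def f_objective(x, A, alpha, gamma):
    """Objective of (P): alpha x^T A_1 x + gamma sum_{i>=2} (x^T A_i x)^2."""
    q = len(A) - 1
    return alpha * (x @ A[1] @ x) + gamma * sum(
        (x @ A[i] @ x) ** 2 for i in range(2, q + 1))
```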

Despite the extensive literature on solving related problems in portfolio optimization [7, 13, 15, 23], the only existing approach for handling (P) is to apply the following semidefinite programming (SDP) relaxation [8]:

\min_{X\in\mathbb{R}^{n\times n}}\quad\alpha\mbox{Tr}(A_{1}X)+\gamma\sum_{i=2}^{q}\mbox{Tr}(A_{i}X)^{2}+\beta\|X\|_{1} (SDP-P)
subject to\quad\mbox{Tr}(A_{0}X)\geq\phi,\quad\mbox{Tr}(X)=1,\quad\text{and}\quad X\succeq 0,

with \beta>0 and \|X\|_{1}:=\sum_{i,j}|X_{i,j}|. However, this method exhibits several drawbacks: (i) the optimal values of (P) and (SDP-P) may differ, (ii) a solution of (SDP-P) is not necessarily rank-one, and (iii) even if it is rank-one, the common remedy is to apply sparse principal component analysis to it, yet the qualitative properties of the resulting solution with respect to the original problem are not well understood. Therefore, the effectiveness of applying an SDP relaxation to (P) lacks rigorous theoretical support. The main reason is that the presence of \|X\|_{1} in the objective function of (SDP-P) breaks down the standard techniques that relate a QCQP (without a cardinality constraint) to its SDP relaxation. More specifically, there are no known theoretical results relating the optimal values and solution points of a QCQP with a cardinality constraint to those of its SDP relaxation. Further, a body of literature addresses general cardinality problems through other relaxation techniques, such as mixed-integer nonlinear programming [2] and continuous relaxations of mixed-integer programs [6], but such methods are less promising because they do not exploit the particular structure of (P) in their design.

Therefore, we propose an efficient tailored algorithm that exploits the structure of (P) and utilizes a penalty decomposition method to directly solve (P), obtaining a KKT point of it. To the best of our knowledge, the proposed algorithm is the first specialized approach designed to directly tackle problem (P) based on the portmanteau criterion and crossing statistics proxies. This unifying algorithm is not only applicable to all three proxies mentioned above but also works for nonconvex objective functions. In the proposed methodology, we use the variable splitting technique to simplify our complex problem (P), which involves highly nonlinearly coupled constraints. Generally, the splitting approach introduces additional variables to the problem, which helps loosen the coupling between the original variables [27]. This transformation enables the problem to be tackled more efficiently. After splitting and penalizing the related equality constraints, we obtain a sequence of penalized subproblems that can be approximately solved. Each subproblem is efficiently tackled by a block coordinate algorithm that outputs a saddle point of the corresponding subproblem. Each step of the block coordinate algorithm either has a closed-form solution or is shown to be tractable; see problems (P_{x}), (P_{y}), and (P_{z}), respectively, in Section 2. In addition, the sequence of objective values in the block coordinate algorithm is either strictly decreasing or reaches equality at a finite step, which also yields a saddle point. We establish that our tailored penalty decomposition algorithm finds a KKT point of the original problem (P).
Moreover, in the nonconvex case, the analysis extends readily to show that our algorithm still succeeds, provided that the starting penalty parameter is larger than a finite positive bound and the objective function has a bounded level set; see Theorem 2.3 and its proof in the Appendix.

We mention that [20] introduces a penalty decomposition method that exclusively addresses the sparse, volatile, and mean-reverting problem associated with the predictability proxy, i.e., the case \alpha=1 and \gamma=0 in (P). In this work, we extend this methodology to efficiently tackle problems arising from the other proxies as well. We emphasize that, to the best of our knowledge, the portmanteau criterion- and crossing statistics-based problems with volatility and sparsity constraints have not been previously considered in the literature, even though they could produce more profitable portfolios since they leverage more information. We numerically demonstrate that this is indeed the case for crossing statistics, which highlights the importance of designing an effective algorithm for this intractable problem; see Section 3. Further, our empirical results reveal that the quadratic term in the objective function is crucial for capturing mean reversion.

We demonstrate the effectiveness of the algorithm by conducting a numerical comparison with the SDP relaxation approach using the S&P 500 data set. Our method outperforms the SDP relaxation in terms of practical performance measures. Further, we compare the performance of the portfolios generated from the predictability, portmanteau criterion, and crossing statistics proxies when our algorithm is applied. The results show that after tuning the parameters q and \gamma, the crossing statistics-based portfolio not only effectively captures the mean-reversion property but also achieves superior performance in terms of Sharpe ratio and cumulative return and loss, compared to the predictability- and portmanteau criterion-based portfolios; see Section 3. This is consistent with the fact that its corresponding problem exploits more information from the data. However, the portmanteau criterion-based portfolio may fail in practice to retain mean reversion, underscoring the significance of the quadratic term in the objective function for capturing this desirable property.

Organization. The remainder of the paper is organized as follows. Section 2 presents the proposed penalty decomposition algorithm for directly solving (P), together with all technical details. Extensive numerical experiments are provided in Section 3, while Section 4 draws some concluding remarks. The proofs of the convergence analysis are presented in the Appendix.

Notation. The complement of a set S is denoted by S^{c} and its cardinality by |S|. For a natural number n, let [n]:=\{1,2,\dots,n\}. For S=\{i_{1},i_{2},\dots,i_{|S|}\}\subseteq[n] and x\in\mathbb{R}^{n}, x_{S}\in\mathbb{R}^{n} is the coordinate projection of x with respect to the indices in S, that is, (x_{S})_{i}=x_{i} for i\in S and (x_{S})_{i}=0 for i\in S^{c}. By \|\cdot\| we mean the standard Euclidean norm. We write A\succeq 0 and A\succ 0 when A is positive semidefinite or positive definite, respectively. We denote the smallest and largest eigenvalues of A by \lambda_{\min}(A) and \lambda_{\max}(A), respectively.

2 A Penalty Decomposition Method for Statistical Proxy-based Portfolios with Sparsity and Volatility Constraints

An algorithm for directly minimizing (P) that obtains a KKT point is presented next, with all proofs delegated to the Appendix. The PPC (predictability, portmanteau criterion, and crossing statistics) algorithm leverages a tailored penalty decomposition method in which a sequence of penalty subproblems is approximately solved. Each penalty subproblem is handled by a block coordinate descent (BCD) algorithm that produces a saddle point of that subproblem. The limit point of a suitable subsequence of such saddle points is shown to be a KKT point of (P).

2.1 Details of PPC and BCD Algorithms

By introducing two new variables y and z, we can equivalently reformulate (P) as follows:

\min_{x,y,z\in\mathbb{R}^{n}}\quad\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)(x^{T}A_{i}x) (P^{\prime})
subject to\quad x^{T}A_{0}x\geq\phi,\quad y^{T}y=1,\quad\|y\|_{0}\leq k,
x-y=0,\quad\text{and}\quad x-z=0.

The reason for such splitting is to effectively handle the highly nonlinearly coupled constraints of the original problem. Specifically, we aim for the subproblems of Algorithm 1 to be as simple as possible. Let

q_{\rho}(x,y,z):=\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)(x^{T}A_{i}x)+\rho(\|x-y\|_{2}^{2}+\|x-z\|_{2}^{2}), (4)

and

\mathcal{X}:=\{x\in\mathbb{R}^{n}\,|\,x^{T}A_{0}x\geq\phi\},\qquad\mathcal{Y}:=\{y\in\mathbb{R}^{n}\,|\,y^{T}y=1\text{ and }\|y\|_{0}\leq k\},\qquad\text{and}\qquad\mathcal{Z}:=\mathbb{R}^{n}.

By penalizing the last two constraints in (P^{\prime}), we tackle this problem via a sequence of penalty subproblems as follows:

\min_{x,y,z}\quad q_{\rho}(x,y,z) (P_{x,y,z})
subject to\quad x\in\mathcal{X},\quad y\in\mathcal{Y},\quad\text{and}\quad z\in\mathcal{Z},

with \rho going to infinity incrementally (such techniques have demonstrated their efficacy in the existing literature [21, 20]). The auxiliary variables y and z are introduced so that simpler subproblems are obtained in Algorithm 1. We establish in the sequel that this method efficiently finds a saddle point (x_{*},y_{*},z_{*}) of (P_{x,y,z}), which means x_{*}\in\mbox{Argmin}_{x\in\mathcal{X}}\,q_{\rho}(x,y_{*},z_{*}), y_{*}\in\mbox{Argmin}_{y\in\mathcal{Y}}\,q_{\rho}(x_{*},y,z_{*}), and z_{*}\in\mbox{Argmin}_{z\in\mathcal{Z}}\,q_{\rho}(x_{*},y_{*},z).

Algorithm 1 BCD Algorithm for Solving (P_{x,y,z})
1:  Input: Select arbitrary y_{0}\in\mathcal{Y} and z_{0}\in\mathcal{Z}.
2:  Set l=0.
3:  Solve x_{l+1}\in\mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y_{l},z_{l}).
4:  Solve y_{l+1}=\mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x_{l+1},y,z_{l}).
5:  Solve z_{l+1}=\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x_{l+1},y_{l+1},z).
6:  Set l\leftarrow l+1 and go to step 3.

Next, we discuss how to efficiently solve the restricted subproblems in Algorithm 1.
\bullet\ \mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y,z): This subproblem becomes the following nonconvex quadratic optimization problem:

\min_{x\in\mathbb{R}^{n}}\quad\alpha x^{T}A_{1}x+\gamma\sum_{i=2}^{q}\left(z^{T}A_{i}z\right)\left(x^{T}A_{i}x\right)+\rho\left(\|x-y\|^{2}+\|x-z\|^{2}\right) (P_{x})
subject to\quad x^{T}A_{0}x\geq\phi.

Clearly, this problem has a solution x_{*} and its KKT conditions are as follows:

\alpha A_{1}x_{*}+\gamma\sum_{i=2}^{q}\left(z^{T}A_{i}z\right)A_{i}x_{*}+\rho(2x_{*}-y-z)-\lambda A_{0}x_{*}=0,\quad\mbox{and}\quad 0\leq\lambda\perp x_{*}^{T}A_{0}x_{*}-\phi\geq 0. (5)

In general, this nonconvex program does not have a closed-form solution; nevertheless, we discuss how to find its global minimizer by exploiting its SDP relaxation. Recall that the SDP relaxation of a general quadratic program with exactly one quadratic constraint attains the same optimal value provided that the program is strictly feasible [5]. It is easy to see that (P_{x}) is strictly feasible because A_{0}\succ 0. Hence, we can find the optimal value of this nonconvex problem exactly by solving its convex SDP relaxation given next:

\min_{X\in\mathbb{R}^{(n+1)\times(n+1)}}\quad\mbox{Tr}\left(H_{1}X\right) (SDP-P_{x})
subject to\quad\mbox{Tr}\left(H_{2}X\right)\geq 0,\quad X_{11}=1,\quad\text{and}\quad X\succeq 0,

where

H_{1}=\begin{bmatrix}\rho\left(\|y\|^{2}+\|z\|^{2}\right)&-\rho(y+z)^{T}\\ -\rho(y+z)&\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I\end{bmatrix}\qquad\text{ and }\qquad H_{2}=\begin{bmatrix}-\phi&0\\ 0&A_{0}\end{bmatrix}.
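For illustration, the block matrices H_1 and H_2 above can be assembled directly; a sketch assuming A is the list A_0, ..., A_q of NumPy arrays (the helper name is ours):

```python
import numpy as np

def sdp_px_data(A, y, z, rho, phi, alpha, gamma):
    """Assemble the (n+1) x (n+1) matrices H_1 and H_2 defining (SDP-P_x)."""
    n = len(y)
    q = len(A) - 1
    # bottom-right block: alpha A_1 + gamma sum_i (z^T A_i z) A_i + 2 rho I
    Q = alpha * A[1] + 2 * rho * np.eye(n)
    for i in range(2, q + 1):
        Q = Q + gamma * (z @ A[i] @ z) * A[i]
    H1 = np.zeros((n + 1, n + 1))
    H1[0, 0] = rho * (y @ y + z @ z)
    H1[0, 1:] = -rho * (y + z)
    H1[1:, 0] = -rho * (y + z)
    H1[1:, 1:] = Q
    H2 = np.zeros((n + 1, n + 1))
    H2[0, 0] = -phi
    H2[1:, 1:] = A[0]
    return H1, H2
```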

Its dual problem is given by

\max_{w_{1}\in\mathbb{R},\,w_{2}\in\mathbb{R},\,Z\in\mathbb{R}^{(n+1)\times(n+1)}}\quad w_{2}
subject to\quad w_{1}H_{2}+w_{2}\begin{bmatrix}1&0\\ 0&0\end{bmatrix}+Z=H_{1},
w_{1}\geq 0,\quad\text{and}\quad Z\succeq 0.

We first show that both problems are strictly feasible. Clearly, since A_{0}\succ 0, we see that X=\begin{bmatrix}1&0\\ 0&tI\end{bmatrix} with t>\phi/\mbox{Tr}(A_{0})>0 is a strictly feasible point of the primal problem. To see that the dual problem is strictly feasible, it is enough to show that there exist a positive w_{1} and a real w_{2} such that Z\succ 0. This block matrix is positive definite if and only if (i) \alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I-w_{1}A_{0}\succ 0 and (ii) \rho(\|y\|^{2}+\|z\|^{2})+w_{1}\phi-w_{2}-\rho^{2}(y+z)^{T}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I-w_{1}A_{0})^{-1}(y+z)>0. To guarantee inequality (i), it is enough to select w_{1}=\epsilon>0 small enough such that

\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})+2\rho>\epsilon\lambda_{\max}(A_{0}), (6)

which is doable because \lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})\geq 0 in view of A_{i}\succeq 0 for i\in[q], \gamma\geq 0, and \rho>0. Inequality (ii) can easily be guaranteed by selecting a sufficiently large negative number for w_{2}.

Given that both the primal and the dual problems have strictly feasible points, their optimal values coincide. Let (w_{1}^{*},w_{2}^{*},Z^{*}) and X^{*} be their optimal solutions. If X^{*} of (SDP-P_{x}) has rank one, then the solution of (P_{x}) is trivially obtained. If not, using the procedure in [25, Lemma 2.2], we can form the rank-one decomposition X^{*}=\sum_{i=1}^{r}u_{i}u_{i}^{T} with r=\mbox{rank}(X^{*}) and 0\neq u_{i}\in\mathbb{R}^{n+1} for all i\in[r] such that ru_{i}^{T}H_{2}u_{i}=\mbox{Tr}\left(H_{2}X^{*}\right)\geq 0 for all i\in[r]. Since X^{*}_{11}=1, there exists j\in[r] such that u_{j}=[\alpha;u]\in\mathbb{R}^{n+1} with \alpha\neq 0. Further, the KKT conditions imply that 0=\mbox{Tr}\left(X^{*}Z^{*}\right)=\mbox{Tr}\left(\sum_{i=1}^{r}u_{i}u_{i}^{T}Z^{*}\right)=\sum_{i=1}^{r}u_{i}^{T}Z^{*}u_{i}, and since Z^{*}\succeq 0, we have u_{j}^{T}Z^{*}u_{j}=0. Thus, u_{j}u_{j}^{T} and (w_{1}^{*},w_{2}^{*},Z^{*}) satisfy the KKT conditions, and consequently x_{*}:=u/\alpha yields a global solution to (P_{x}). Thus, x_{*} indeed satisfies (5).
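In the case where the volatility constraint turns out to be inactive at the minimizer (\lambda=0 in (5)), the x-update needs no SDP at all: stationarity reduces to a linear system. A hedged sketch with a helper name of our own; when the returned point violates the constraint, the SDP route above is required:

```python
import numpy as np

def solve_px_if_inactive(A, y, z, rho, phi, alpha, gamma):
    """Try the lambda = 0 branch of the KKT system (5):
    (alpha A_1 + gamma sum_i (z^T A_i z) A_i + 2 rho I) x = rho (y + z).
    Returns (x, True) if x also satisfies x^T A_0 x >= phi; otherwise
    (x, False), in which case (SDP-P_x) must be solved instead."""
    n = len(y)
    q = len(A) - 1
    M = alpha * A[1] + 2 * rho * np.eye(n)
    for i in range(2, q + 1):
        M = M + gamma * (z @ A[i] @ z) * A[i]
    x = np.linalg.solve(M, rho * (y + z))
    return x, bool(x @ A[0] @ x >= phi)
```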

\bullet\ \mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x,y,z): This subproblem is as follows:

\min_{y\in\mathbb{R}^{n}}\ \|x-y\|_{2}^{2}\qquad\mbox{subject to}\qquad y^{T}y=1\quad\textrm{and}\quad\|y\|_{0}\leq k. (P_{y})

Thus, due to Lemma 3.2 in [20], the closed-form solution of this problem is given by

y_{*}=\mathcal{T}_{k}\left(x\right), (7)

where \mathcal{T}_{k} is defined in (9).
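Concretely, \mathcal{T}_{k} keeps the k largest-magnitude entries of x and renormalizes (see (9)), so the y-update (7) is a one-liner. A sketch:

```python
import numpy as np

def T_k(x, k):
    """Sparsifying operator (9): zero all but the k largest-magnitude
    entries of x, then scale the result to unit Euclidean norm."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]   # indices of the k largest |x_i|
    y[keep] = x[keep]
    return y / np.linalg.norm(y)

# y-update (7) of the BCD algorithm: y_star = T_k(x, k)
```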

\bullet\ \mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x,y,z): This subproblem becomes the following:

\min_{z\in\mathbb{R}^{n}}\ \gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)\left(z^{T}A_{i}z\right)+\rho\|x-z\|^{2}, (P_{z})

which has the closed-form solution

z_{*}=\rho\left(\gamma\sum_{i=2}^{q}\left(x^{T}A_{i}x\right)A_{i}+\rho I\right)^{-1}x. (8)
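Since the matrix inverted in (8) is positive definite (it is \rho I plus a nonnegative combination of the A_{i}, because x^{T}A_{i}x\geq 0 when A_{i}\succeq 0), the z-update is a single linear solve. A sketch with a helper name of our own:

```python
import numpy as np

def solve_pz(A, x, rho, gamma):
    """Closed-form z-update (8):
    z* = rho (gamma sum_{i=2}^q (x^T A_i x) A_i + rho I)^{-1} x."""
    n = len(x)
    q = len(A) - 1
    M = rho * np.eye(n)
    for i in range(2, q + 1):
        M = M + gamma * (x @ A[i] @ x) * A[i]
    return np.linalg.solve(M, rho * x)
```

Setting the gradient of (P_z) to zero gives (gamma sum_i (x^T A_i x) A_i + rho I) z = rho x, which is exactly what this solve computes.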

Next, we present our proposed PPC Algorithm 2, which starts from an initial positive penalty parameter and gradually increases it toward infinity. For each penalty parameter, Algorithm 1 is utilized to solve the corresponding subproblem. We assume that (P) is feasible and a feasible point x^{\text{feas}} is available. To find such a point, one can solve the following sparse principal component analysis problem:

\max_{x\in\mathbb{R}^{n}}\ x^{T}A_{0}x\qquad\mbox{subject to}\qquad x^{T}x=1\quad\text{and}\quad\|x\|_{0}\leq k.

A stationary point of this problem can be obtained by the power method x_{l}:=\mathcal{T}_{k}(A_{0}x_{l-1}) for l\in\mathbb{N} (with x_{0}\in\mathbb{R}^{n} arbitrary) suggested in [22]. We can run this cheap method starting from different x_{0}'s until the resulting stationary point satisfies the volatility constraint. The sparsifying operator \mathcal{T}_{k} is defined as

\mathcal{T}_{k}(x):=\frac{x_{\mathcal{L}}}{\left\|x_{\mathcal{L}}\right\|_{2}}, (9)

where \mathcal{L} is the index set of the k largest components of x in absolute value. To present the algorithm and its later analysis, we let

\Upsilon\geq\max\{f(x^{\text{feas}}),\min_{x\in\mathcal{X}}q_{\rho^{(0)}}(x,y^{(0)}_{0},z^{(0)}_{0})\}>0\qquad\text{ and }\qquad X_{\Upsilon}:=\{x\in\mathbb{R}^{n}\,|\,f(x)\leq\Upsilon\}, (10)

where f(x) is defined in the objective function of (P).
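The feasible-point search above can be sketched as a restarted truncated power iteration, assuming the \mathcal{T}_{k} of (9); the iteration counts and seed are our choices:

```python
import numpy as np

def T_k(x, k):
    """Sparsifying operator (9)."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    y[keep] = x[keep]
    return y / np.linalg.norm(y)

def feasible_point(A0, k, phi, restarts=20, iters=200, seed=0):
    """Restarted power method x_l = T_k(A0 x_{l-1}) of [22]: run from
    random starts until a stationary point meets the volatility
    constraint x^T A0 x >= phi; returns None if every restart fails."""
    rng = np.random.default_rng(seed)
    n = A0.shape[0]
    for _ in range(restarts):
        x = T_k(rng.standard_normal(n), k)
        for _ in range(iters):
            x = T_k(A0 @ x, k)
        if x @ A0 @ x >= phi:
            return x
    return None
```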

Algorithm 2 PPC Penalty Decomposition Algorithm for Solving (P)
1:  Inputs: r>1, \rho^{(0)}>0, and y^{(0)}_{0}\in\mathcal{Y} and z^{(0)}_{0}\in\mathcal{Z} with \|z_{0}^{(0)}\|\leq 1.
2:  Set j=0.
3:  repeat
4:     Set l=0.
5:     repeat
6:        Solve x^{(j)}_{l+1}\in\mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y^{(j)}_{l},z^{(j)}_{l}).
7:        Solve y^{(j)}_{l+1}=\mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x^{(j)}_{l+1},y,z^{(j)}_{l}).
8:        Solve z^{(j)}_{l+1}=\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x^{(j)}_{l+1},y^{(j)}_{l+1},z).
9:        Set l\leftarrow l+1.
10:     until stopping criterion (11) is met.
11:     Set (x^{(j)},y^{(j)},z^{(j)}):=(x^{(j)}_{l},y^{(j)}_{l},z^{(j)}_{l}).
12:     Set \rho^{(j+1)}=r\cdot\rho^{(j)}.
13:     If \min_{x\in\mathcal{X}}q_{\rho^{(j+1)}}(x,y^{(j)},z^{(j)})>\Upsilon, then set y_{0}^{(j+1)}=x^{\text{feas}} and z_{0}^{(j+1)}=x^{\text{feas}}. Otherwise, set y^{(j+1)}_{0}=y^{(j)} and z^{(j+1)}_{0}=z^{(j)}.
14:     Set j\leftarrow j+1.
15:  until stopping criterion (12) is met.

Note that Algorithm 2 consists of two nested loops. We stop the inner loop, lines 5 to 10, if (dropping the superscript (j) for simplicity):

\max\left\{\frac{\|x_{l}-x_{l-1}\|_{\infty}}{\max\left(\|x_{l}\|_{\infty},1\right)},\frac{\|y_{l}-y_{l-1}\|_{\infty}}{\max\left(\|y_{l}\|_{\infty},1\right)},\frac{\|z_{l}-z_{l-1}\|_{\infty}}{\max\left(\|z_{l}\|_{\infty},1\right)}\right\}\leq\epsilon_{I}, (11)

and the outer loop, lines 3 to 15, is stopped when the following convergence criterion is met:

\|x^{(j)}-y^{(j)}\|_{\infty}+\|x^{(j)}-z^{(j)}\|_{\infty}\leq\epsilon_{O}. (12)
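Putting the pieces together, the method can be sketched end-to-end. The x-step below takes the \lambda=0 branch of (5) (a linear solve) rather than the SDP route, which is valid only while the volatility constraint stays inactive, so this is a simplified sketch of Algorithms 1 and 2 rather than the full method; the stopping rules follow (11) and (12), and the default constants are our choices:

```python
import numpy as np

def T_k(x, k):
    """Sparsifying operator (9)."""
    y = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    y[keep] = x[keep]
    return y / np.linalg.norm(y)

def ppc(A, k, alpha=1, gamma=1.0, rho0=1.0, r=4.0,
        eps_i=1e-8, eps_o=1e-8, max_outer=40, max_inner=500):
    """Simplified PPC sketch: inner BCD loop (Algorithm 1) inside the
    penalty loop (Algorithm 2) with rho(j+1) = r * rho(j).  The x-step
    assumes the volatility constraint is inactive; the paper's full
    x-step solves (SDP-P_x) instead."""
    n = A[0].shape[0]
    q = len(A) - 1
    rho = rho0
    y = T_k(np.ones(n), k)
    z = y.copy()
    x = y.copy()
    for _ in range(max_outer):
        for _ in range(max_inner):
            xp, yp, zp = x, y, z
            # x-step: lambda = 0 branch of the KKT system (5)
            M = alpha * A[1] + 2 * rho * np.eye(n)
            for i in range(2, q + 1):
                M = M + gamma * (z @ A[i] @ z) * A[i]
            x = np.linalg.solve(M, rho * (y + z))
            y = T_k(x, k)                        # y-step (7)
            Mz = rho * np.eye(n)                 # z-step (8)
            for i in range(2, q + 1):
                Mz = Mz + gamma * (x @ A[i] @ x) * A[i]
            z = np.linalg.solve(Mz, rho * x)
            rel = max(                           # inner rule (11)
                np.max(np.abs(x - xp)) / max(np.max(np.abs(x)), 1.0),
                np.max(np.abs(y - yp)) / max(np.max(np.abs(y)), 1.0),
                np.max(np.abs(z - zp)) / max(np.max(np.abs(z)), 1.0))
            if rel <= eps_i:
                break
        if np.max(np.abs(x - y)) + np.max(np.abs(x - z)) <= eps_o:
            break                                # outer rule (12)
        rho *= r
    return y  # unit-norm, k-sparse iterate
```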

2.2 Convergence Analysis

We first investigate Algorithm 1 and then establish the convergence of Algorithm 2.

Analysis of Algorithm 1. We analyze a sequence \{(x_{l},y_{l},z_{l})\} generated by Algorithm 1 and provide a tailored proof that any accumulation point of such a sequence is a saddle point of (P_{x,y,z}). This justifies the use of Algorithm 1 for this nonconvex problem.

Lemma 2.1.

Let \alpha\in\{0,1\}, \gamma\geq 0, \rho>0, A_{0}\succ 0, and A_{i}\succeq 0 for i\in[q]. Consider the iterates of Algorithm 1, that is, x_{l}\in\mbox{Argmin}_{x\in\mathcal{X}}\ q_{\rho}(x,y_{l-1},z_{l-1}), y_{l}\in\mbox{Argmin}_{y\in\mathcal{Y}}\ q_{\rho}(x_{l},y,z_{l-1}), and z_{l}\in\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x_{l},y_{l},z). Then,

\max\{\|x_{l}\|,\|y_{l}\|,\|z_{l}\|\}\leq\max\{\sqrt{\phi/\lambda_{\min}(A_{0})},1\}. (13)

This lemma establishes that the sequence created by Algorithm 1 is bounded. Therefore, every sequence generated by this algorithm possesses an accumulation point. The subsequent theorem establishes that every accumulation point is indeed a saddle point of (P_{x,y,z}). Additionally, the sequence (q_{\rho}(x_{l},y_{l},z_{l})) is either strictly decreasing, or two consecutive terms attain the same value, in which case a saddle point is reached. Essentially, Algorithm 1 produces a saddle point either in finitely many steps or in the limit.

Theorem 2.1.

Let \{(x_{l},y_{l},z_{l})\} be a sequence generated by Algorithm 1 for solving (P_{x,y,z}). Suppose that (x_{*},y_{*},z_{*}) is an accumulation point of this sequence; then (x_{*},y_{*},z_{*}) is a saddle point of the nonconvex problem (P_{x,y,z}). Moreover, \{q_{\rho}(x_{l},y_{l},z_{l})\} is a non-increasing sequence. If q_{\rho}(x_{r},y_{r},z_{r})=q_{\rho}(x_{r+1},y_{r+1},z_{r+1}) for some r\in\mathbb{N}, then (x_{r},y_{r},z_{r}) is a saddle point of (P_{x,y,z}).

Analysis of Algorithm 2. Suppose that x^{*} is a local minimizer of (P); then there exists an index set \mathcal{L} with |\mathcal{L}|=k and x^{*}_{\mathcal{L}^{c}}=0 such that x^{*} is also a local minimizer of the following problem:

\min_{x\in\mathbb{R}^{n}}\quad f(x)
subject to\quad x^{T}A_{0}x\geq\phi,\quad x^{T}x=1,\quad\text{and}\quad x_{\mathcal{L}^{c}}=0,

where f(x) is defined in the objective function of (P). Recall that Robinson's constraint qualification conditions for a local minimizer x^{*} of the above problem are as follows [20]:

(i) If (x^{*})^{T}A_{0}x^{*}>\phi, then \bigg{\{}\begin{bmatrix}-2d^{T}A_{0}x^{*}-v\\ 2d^{T}x^{*}\\ d_{\mathcal{L}^{c}}\end{bmatrix}\ \Big{|}\ d\in\mathbb{R}^{n},\ v\in\mathbb{R}\bigg{\}}=\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{|\mathcal{L}^{c}|};
(ii) If (x^{*})^{T}A_{0}x^{*}=\phi, then \bigg{\{}\begin{bmatrix}-2d^{T}A_{0}x^{*}-v\\ 2d^{T}x^{*}\\ d_{\mathcal{L}^{c}}\end{bmatrix}\ \Big{|}\ d\in\mathbb{R}^{n},\ v\in\mathbb{R}_{-}\bigg{\}}=\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{|\mathcal{L}^{c}|}. (14)

It can be seen that x^{*}_{\mathcal{L}}\neq 0, due to \|x^{*}\|_{2}=1 and x^{*}_{\mathcal{L}^{c}}=0, and thus Robinson's conditions in case (i) always hold. In case (ii), Robinson's conditions hold if and only if \{(A_{0}x^{*})_{\mathcal{L}},x^{*}_{\mathcal{L}}\} is linearly independent. It is easy to see that this set is linearly independent almost always. This implies that, except on a set of measure zero, Robinson's conditions are satisfied for (P); that is, such an assumption on x^{*} is essentially always the case in practice.

Under Robinson’s conditions mentioned above, the KKT conditions for a local minimizer xx^{*} of (PP) are the existence of Lagrangian multipliers (λ,μ,w)××n(\lambda,\mu,w)\in\mathbb{R}\times\mathbb{R}\times\mathbb{R}^{n} and [n]\mathcal{L}\subseteq[n] with ||=k|\mathcal{L}|=k such that the following holds:

(\alpha A_{1}+2\gamma\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})x-\lambda A_{0}x+\mu x+w=0, (15)
x_{\mathcal{L}^{c}}=0,\ 0\leq\lambda\perp x^{T}A_{0}x-\phi\geq 0,\ \|x\|_{2}=1,\text{ and }w_{\mathcal{L}}=0.

The next result states that an arbitrary sequence \{(x^{(j)},y^{(j)},z^{(j)})\} generated by Algorithm 2 has a convergent subsequence whose limit point is a KKT point of (P).

Theorem 2.2.

Suppose that \alpha\in\{0,1\}, \gamma\geq 0, \phi>0, k\in[n], A_{0}\succ 0, and A_{i}\succeq 0 for i\in[q]. Let \{(x^{(j)},y^{(j)},z^{(j)})\} be a sequence generated by Algorithm 2 for solving (P). Then, the following hold:

  • (i)

    {(x(j),y(j),z(j))}\left\{\left(x^{(j)},y^{(j)},z^{(j)}\right)\right\} has a convergent subsequence whose accumulation point (x,y,z)(x^{*},y^{*},z^{*}) satisfies x=y=zx^{*}=y^{*}=z^{*}. Further, there exists an index subset [n]\mathcal{L}\subseteq[n] with ||=k|\mathcal{L}|=k such that xc=0x^{*}_{\mathcal{L}^{c}}=0.

  • (ii)

Suppose that Robinson's condition given in (2.2) holds at $x^{*}$ with the index subset $\mathcal{L}$ indicated above. Then, $x^{*}$ is a KKT point of ($P$) satisfying (15).

Extension to Nonconvex Objective Functions. The focus so far has been on constructing portfolios based on the predictability, portmanteau criterion, and crossing statistics proxies, for which $f(x)$ as defined in (10) is convex. Nevertheless, from a technical viewpoint, Algorithm 2 can be employed to solve ($P$) even when the objective function is nonconvex. In this case, we require the level set $X_{\Upsilon}$ defined in (10) to be bounded and the initial penalty parameter $\rho^{(0)}$ to be sufficiently large.

Theorem 2.3.

Suppose that $\alpha\in\{0,1\}$, $\gamma\geq 0$, $\phi>0$, $A_{0}\succ 0$, $A_{i}$ is symmetric for $i\in[q]$, $k\in[n]$, $\rho^{(0)}>|\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})|$, and the level set $X_{\Upsilon}$ in (10) is bounded. Then all steps of Algorithm 2 are well-defined, and the algorithm finds a KKT point of ($P$).
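The lower bound on $\rho^{(0)}$ in Theorem 2.3 is directly computable from the spectra of $A_{0}$ and $A_{1}$. A minimal sketch, assuming numpy and symmetric inputs (the function name is ours):

```python
import numpy as np

def min_initial_penalty(A0, A1, alpha=1):
    """Bound |lambda_max(A0) - alpha*lambda_min(A1)| from Theorem 2.3;
    any rho^(0) strictly above this value is admissible."""
    lam_max_A0 = np.linalg.eigvalsh(A0)[-1]   # eigvalsh sorts eigenvalues ascending
    lam_min_A1 = np.linalg.eigvalsh(A1)[0]
    return abs(lam_max_A0 - alpha * lam_min_A1)

A0 = np.diag([1.0, 2.0, 3.0])
A1 = np.diag([-1.0, 0.0, 5.0])
print(min_initial_penalty(A0, A1))  # |3 - (-1)| = 4.0
```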

3 Numerical Experiments

Here, we examine the performance of Algorithm 2 in finding sparse, volatile, statistical proxy-based mean-reverting portfolios on real data from the US stock market Standard and Poor's (S&P 500) Index. This data set is commonly used to demonstrate the practical effectiveness of an algorithm or to compare the numerical performance of competing approaches.

Recall that a statistical arbitrage strategy typically involves four main steps, namely constructing an appropriate asset pool, designing a mean-reverting portfolio, verifying the mean-reverting property through a unit-root test, and finally trading the favorable portfolio. The construction of an asset pool can be achieved using a method described in [8], which utilizes the smallest eigenvalue of the covariance matrix. In this paper, we focus on the crucial second step of developing a mean-reverting portfolio by minimizing the predictability, portmanteau criterion, or crossing statistics measures, while incorporating volatility and sparsity constraints to ensure the profitability of the portfolio in practice. The mean-reversion property can be verified using a unit-root test such as the Dickey-Fuller test [10]. A comprehensive discussion of the trading strategy for a mean-reverting portfolio, based on observed/normalized spread values, is given in [30, 29]. Normalizing the spread values makes the strategy less sensitive to absolute price levels. Table 1 in [29] provides examples of how the strategy works for different trading positions and normalized spread values. The strategy involves closing long positions and taking short positions when the normalized spread value exceeds a threshold, and taking no action when the normalized spread value is negative. It is a simple yet effective strategy, but careful testing and analysis are necessary before adopting it as a long-term trading strategy.
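As a concrete illustration of the unit-root verification step, the simplest Dickey-Fuller regression $\Delta s_{t}=\gamma s_{t-1}+\varepsilon_{t}$ (no drift) can be run with plain least squares; a strongly negative $t$-statistic on $\gamma$ supports mean reversion. This is only a sketch of the test from [10] under simplifying assumptions (no lag augmentation, no intercept, informal critical values), not the exact procedure used in the paper:

```python
import numpy as np

def dickey_fuller_tstat(s):
    """t-statistic of gamma in the regression ds_t = gamma * s_{t-1} + eps_t."""
    ds, lag = np.diff(s), np.asarray(s)[:-1]
    gamma = (lag @ ds) / (lag @ lag)              # OLS slope estimate
    resid = ds - gamma * lag
    sigma2 = (resid @ resid) / (len(ds) - 1)      # residual variance
    return gamma / np.sqrt(sigma2 / (lag @ lag))  # slope / standard error

rng = np.random.default_rng(0)
eps = rng.standard_normal(500)
mean_rev = np.zeros(500)                          # AR(1) spread with phi = 0.5
for t in range(1, 500):
    mean_rev[t] = 0.5 * mean_rev[t - 1] + eps[t]
random_walk = np.cumsum(eps)                      # unit-root (non-reverting) series
print(dickey_fuller_tstat(mean_rev) < dickey_fuller_tstat(random_walk))  # True
```

The mean-reverting series yields a large negative statistic, while the random walk does not, matching the intended use of the test as a post-hoc check on the designed spread.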

In our numerical experiments, several standard performance metrics are used to measure the quality of the portfolios, which in our case are obtained from the limit point of Algorithm 2 and from applying sparse PCA to the solution of (SDP-P). The first metric, cumulative profit and loss (P&L), measures the overall return of a mean-reverting portfolio within a single trading period, from $t_{1}$ to $t_{2}$, and is calculated as

\[\text{Cumulative P\&L}(t_{1},t_{2})=\sum_{t=t_{1}}^{t_{2}}P\&L_{t},\]

where $P\&L_{t}=x^{T}r_{t}(t-t_{o})-x^{T}r_{t-1}(t-1-t_{o})$ when a long position is opened at time $t_{o}$, and $P\&L_{t}=-\big(x^{T}r_{t}(t-t_{o})-x^{T}r_{t-1}(t-1-t_{o})\big)$ when a short position is opened at time $t_{o}$. For a particular asset, $r_{t}(\tau)=\frac{p_{t}-p_{t-\tau}}{p_{t-\tau}}\approx\ln(p_{t})-\ln(p_{t-\tau})$, where $p_{t}$ represents the price of the asset at time $t$. Table 1 in [29] is used for this purpose, with $d$ being the standard deviation of the portfolio. The return on investment (ROI) is another metric used to evaluate the investment return of a mean-reverting portfolio. It is calculated as $\text{ROI}_{t}=\frac{P\&L_{t}}{\|x\|_{1}}$, where $P\&L_{t}$ is the profit and loss of the portfolio at time $t$, and $\|x\|_{1}$ is the $\ell_{1}$-norm of the portfolio weights. The last metric, used to evaluate the performance of a portfolio over a period from $t_{1}$ to $t_{2}$, is the Sharpe ratio ($SR$), defined as

\[SR_{ROI}(t_{1},t_{2})=\mu_{ROI}/\sigma_{ROI},\]

where $\mu_{ROI}=\frac{1}{t_{2}-t_{1}}\sum_{t=t_{1}}^{t_{2}}ROI_{t}$ and $\sigma^{2}_{ROI}=\frac{1}{t_{2}-t_{1}}\sum_{t=t_{1}}^{t_{2}}(ROI_{t}-\mu_{ROI})^{2}$. A portfolio with a higher $SR$ is considered more profitable.
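For reference, the three metrics above can be computed from a series of per-period P&L values in a few lines. A minimal sketch assuming numpy (the function name is ours; the population-variance convention follows the formulas above):

```python
import numpy as np

def portfolio_metrics(pnl, x):
    """Cumulative P&L, ROI_t = P&L_t / ||x||_1, and SR = mu_ROI / sigma_ROI."""
    pnl = np.asarray(pnl, dtype=float)
    roi = pnl / np.abs(x).sum()          # ROI_t series
    cum_pnl = pnl.sum()                  # cumulative P&L over the window
    sr = roi.mean() / roi.std()          # Sharpe ratio of the ROI series
    return cum_pnl, roi, sr

cum, roi, sr = portfolio_metrics([1.0, -1.0, 2.0], x=np.array([1.5, -0.5]))
print(cum)  # 2.0
```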

To conduct the numerical experiments, we first explain how the matrices $A_{i}$, $i\in\{0,1,\dots,q\}$, are selected. Consider a multivariate stochastic process $\mathbf{x}=(\mathbf{x}_{t})_{t\in\mathbb{N}}$ with values in $\mathbb{R}^{n}$. For a nonnegative integer $s$, the lag-$s$ empirical autocovariance matrix of a sample path $x=(x_{1},x_{2},\dots,x_{T})$ of $\mathbf{x}$ is defined as

\[\Gamma_{s}:=\frac{1}{T-s-1}\sum_{t=1}^{T-s}\tilde{x}_{t}\tilde{x}_{t+s}^{T}\qquad\text{with}\qquad\tilde{x}_{t}:=x_{t}-\frac{1}{T}\sum_{t=1}^{T}x_{t}. \tag{16}\]

In particular, we have

\[\left\{\begin{array}{ll}\alpha=1,\ \gamma=0,\ \text{and}\ A_{0}=\Gamma_{0}&\quad\text{for the predictability-},\\ \alpha=0,\ \gamma=1,\ \text{and}\ A_{i}=\Gamma_{i}\ \forall i\in\{0,2,3,\dots,q\}&\quad\text{for the portmanteau criterion-, and}\\ \alpha=1,\ \gamma>0,\ \text{and}\ A_{i}=\Gamma_{i}\ \forall i\in\{0,1,2,3,\dots,q\}&\quad\text{for the crossing statistics-based problems.}\end{array}\right. \tag{17}\]
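All matrices in (17) are of the form (16) and can be computed directly from a $T\times n$ sample path. A minimal numpy sketch (function name ours), checked against `np.cov` at lag zero, where the $1/(T-1)$ normalization coincides with (16):

```python
import numpy as np

def autocov(X, s):
    """Lag-s empirical autocovariance Gamma_s of a T x n sample path, eq. (16)."""
    T = X.shape[0]
    Xc = X - X.mean(axis=0)                        # center: x~_t = x_t - sample mean
    return Xc[: T - s].T @ Xc[s:] / (T - s - 1)    # sum_t x~_t x~_{t+s}^T / (T-s-1)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
Gammas = [autocov(X, s) for s in range(4)]         # Gamma_0, ..., Gamma_3 for q = 3
print(np.allclose(Gammas[0], np.cov(X, rowvar=False)))  # lag 0 matches np.cov: True
```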

We construct the asset pool by combining the pools suggested in optimization portfolio studies [8, 30, 29, 28], resulting in a pool of n=30n=30 assets. The trading time period is from February 1st, 2012 to June 30th, 2014.

We run Algorithm 2 with $r=\sqrt{10}$, $\epsilon_{I}=10^{-3}$ (defined in (11)), and $\epsilon_{O}=10^{-3}$ (defined in (12)), and the volatility threshold $\phi$ is selected to be larger than thirty percent of the median variance of all assets in the pool (see [8]). To obtain reasonable results and compare the performance of the portmanteau criterion- and crossing statistics-based portfolios, we must fix a specific $q$. To this end, we select the best $q$ from the candidate set $\{2,3,4\}$ when applying our algorithm to the portmanteau criterion case, as follows. We measure the average Sharpe ratios for a small, medium, and large sparsity level, $k=5$, $10$, and $17$. Based on this procedure (results not reported due to space considerations), $q=3$ produces better results than $q=2$, while the differences between $q=4$ and $q=3$ are rather marginal. Hence, in our experiments, we set $q=3$. After fixing $q=3$, the next step is selecting an appropriate $\gamma$. For this aim, we concentrate exclusively on portfolio optimization based on crossing statistics. We consider values within the interval $(0.0001,1]$, incrementally adjusting $\gamma$ with a step size of $0.0009$. Applying Algorithm 2 to the crossing statistics-based model across the sparsity levels $k=5$, $10$, and $17$, we evaluate the average Sharpe ratios; this process identifies $\gamma=0.001$ as the best candidate. Furthermore, for (SDP-P), we also need to determine $\beta$. To select this hyperparameter, we apply the methodology presented in [8] (that is, we solve this convex problem and apply sparse principal component analysis to its solution to obtain the corresponding portfolio) for all three cases reported in (17) (with $q=3$ and $\gamma=0.001$) and values from 0.001 to 1.999, with 0.009 increments.
This grid search revealed that the range between 0.8 and 1.2 was the most promising for $\beta$. We then narrowed our selection within this range, using a 0.01 step. Through this iterative approach, we found that $\beta=1$ yielded the highest Sharpe ratio; as such, $\beta=1$ is used in our comparisons.
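The $\gamma$ selection above is a one-dimensional grid search over average Sharpe ratios across sparsity levels. A schematic sketch, where `sharpe_of(gamma, k)` stands in for a full backtest of Algorithm 2 (a hypothetical callable, not part of the paper's code):

```python
import numpy as np

def select_gamma(candidates, sharpe_of, ks=(5, 10, 17)):
    """Return the gamma maximizing the Sharpe ratio averaged over sparsity levels."""
    avg = {g: np.mean([sharpe_of(g, k) for k in ks]) for g in candidates}
    return max(avg, key=avg.get)

# Toy backtest peaked at gamma = 0.001, for illustration only:
toy = lambda g, k: 1.0 - (g - 0.001) ** 2
grid = np.arange(0.0001, 1.0, 0.0009)  # step 0.0009, as in the text
print(select_gamma(grid, toy))
```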

The spread plots in Figures 1, 2, and 3 show that the proposed method achieves better mean reversion overall, and the second and third plots in these figures show that our method surpasses the SDP relaxation technique in both standard cumulative P&L and Sharpe ratio assessments. This holds across a range of sparsity levels (small, medium, and large), as well as across all three portfolio models: predictability-, portmanteau criterion-, and crossing statistics-based portfolios. Nevertheless, we emphasize that the numerical performance of the SDP relaxation raises intriguing theoretical questions that require thorough investigation. For example, under what conditions on the problem data in ($P$) does its SDP relaxation (SDP-P) have a rank-one solution? And, if one is available, how can a rank-one solution of (SDP-P) be used to construct a solution of ($P$)?

Next, we compare the performance of portfolios generated from the predictability, portmanteau criterion, and crossing statistics proxies, based on Algorithm 2. Figure 4 demonstrates that, after careful tuning of the parameters $q$ and $\gamma$, the crossing statistics-based portfolio not only accurately captures the mean-reversion property, but also achieves superior performance in terms of Sharpe ratio and cumulative P&L, surpassing the predictability- and portmanteau criterion-based portfolios. This observation aligns with the idea that Algorithm 2 leverages more information from the data in this case. In addition, the portmanteau criterion proxy seems to fail to retain mean reversion, underscoring the critical role of the quadratic term in the objective function for capturing this desirable property.

Figure 1: Statistical proxy- and SDP-based portfolios for k=5k=5.
Figure 2: Statistical proxy- and SDP-based portfolios for k=10k=10.
Figure 3: Statistical proxy- and SDP-based portfolios for k=17k=17.
Figure 4: Statistical proxy-based portfolios for k=5,10,17k=5,10,17.

4 Conclusion

This paper developed an algorithm that solves the optimization problem underlying the construction of asset portfolios that exhibit the mean-reverting property while also being sparse and showing reasonable volatility. Three main proxies for capturing mean reversion are predictability, the portmanteau criterion, and crossing statistics, and a comprehensive optimization model was developed to accommodate all of them. We leverage a variable splitting technique to decouple the highly coupled nonconvex constraints of this model and propose a tailored penalty decomposition method that solves a sequence of penalty subproblems via an efficient block coordinate descent method. We establish that the proposed algorithm not only finds a KKT point when the objective function is convex, but that its analysis also extends to a nonconvex objective function (with partial modifications). We numerically examine the performance of the algorithm on an S&P 500 data set and show that it outperforms the SDP relaxation approach in terms of standard measures. Further, we compare the statistical proxy-based portfolios and demonstrate that, after tuning hyperparameters, the crossing statistics surrogate captures mean reversion well and outperforms the predictability- and portmanteau criterion-based portfolios. Our method has significant potential applications in finance, economics, and other related fields.

References

  • [1] Ramin Ayanzadeh, Ahmad Mousavi, Milton Halem, and Tim Finin. Quantum annealing based binary compressive sensing with matrix uncertainty. arXiv preprint arXiv:1901.00088, 2019.
  • [2] Dimitris Bertsimas and Romy Shioda. Algorithm for cardinality-constrained quadratic optimization. Computational Optimization and Applications, 43(1):1–22, 2009.
  • [3] Ronald Bewley, David Orden, Minxian Yang, and Lance A Fisher. Comparison of Box–Tiao and Johansen canonical estimators of cointegrating vectors in VEC(1) models. Journal of Econometrics, 64(1-2):3–27, 1994.
  • [4] George EP Box and George C Tiao. A canonical analysis of multiple time series. Biometrika, 64(2):355–365, 1977.
  • [5] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
  • [6] Oleg P Burdakov, Christian Kanzow, and Alexandra Schwartz. Mathematical programs with cardinality constraints: reformulation by complementarity-type conditions and a regularization method. SIAM Journal on Optimization, 26(1):397–425, 2016.
  • [7] Catarina Coelho, José Luis Santos, and Pedro Júdice. The performance of bank portfolio optimization. International Transactions in Operational Research, 2023.
  • [8] Marco Cuturi and Alexandre d’Aspremont. Mean-reverting portfolios: Tradeoffs between sparsity and volatility. Financial Signal Processing and Machine Learning, pages 23–40, 2016.
  • [9] Marco Cuturi and Alexandre d’Aspremont. Mean reversion with a variance threshold. In International Conference on Machine Learning, pages 271–279. PMLR, 2013.
  • [10] David A Dickey and Wayne A Fuller. Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366a):427–431, 1979.
  • [11] Jin-Hong Du, Yifeng Guo, and Xueqin Wang. High-dimensional portfolio selection with cardinality constraints. Journal of the American Statistical Association, pages 1–13, 2022.
  • [12] Norbert Fogarasi and Janos Levendovszky. Sparse, mean reverting portfolio selection using simulated annealing. Algorithmic Finance, 2(3-4):197–211, 2013.
  • [13] F Hooshmand, Z Anoushirvani, and SA MirHassani. Model and efficient algorithm for the portfolio selection problem with real-world constraints under value-at-risk measure. International Transactions in Operational Research, 2023.
  • [14] Takahiro Imai and Kei Nakagawa. Statistical arbitrage strategy in multi-asset market using time series analysis. Journal of Mathematical Finance, 10(2):334–344, 2020.
  • [15] Kui Jing, Fengmin Xu, and Xuepeng Li. A bi-level programming framework for identifying optimal parameters in portfolio selection. International Transactions in Operational Research, 29(1):87–112, 2022.
  • [16] Greta M Ljung and George EP Box. On a measure of lack of fit in time series models. Biometrika, 65(2):297–303, 1978.
  • [17] Hossein Moosaei, Ahmad Mousavi, Milan Hladík, and Zheming Gao. Sparse l1-norm quadratic surface support vector machine with universum data. Soft Computing, 27(9):5567–5586, 2023.
  • [18] Ahmad Mousavi and George Michailidis. Cardinality constrained mean-variance portfolios: A penalty decomposition algorithm. arXiv preprint arXiv:2309.16004, 2023.
  • [19] Ahmad Mousavi, Mehdi Rezaee, and Ramin Ayanzadeh. A survey on compressive sensing: classical results and recent advancements. Journal of Mathematical Modeling, 8(3):309–344, 2020.
  • [20] Ahmad Mousavi and Jinglai Shen. A penalty decomposition algorithm with greedy improvement for mean-reverting portfolios with sparsity and volatility constraints. International Transactions in Operational Research, 2022.
  • [21] Parvin Nazari, Ahmad Mousavi, Davoud Ataee Tarzanagh, and George Michailidis. A penalty based method for communication-efficient decentralized bilevel programming. arXiv preprint arXiv:2211.04088, 2022.
  • [22] Andrzej Ruszczynski. Nonlinear Optimization. Princeton University Press, 2011.
  • [23] Ruchika Sehgal and Aparna Mehra. Robust reward–risk ratio portfolio optimization. International Transactions in Operational Research, 28(4):2169–2190, 2021.
  • [24] Jinglai Shen and Ahmad Mousavi. Least sparsity of $p$-norm based optimization problems with $p>1$. SIAM Journal on Optimization, 28(3):2721–2751, 2018.
  • [25] Yinyu Ye and Shuzhong Zhang. New results on quadratic minimization. SIAM Journal on Optimization, 14(1):245–267, 2003.
  • [26] N Donald Ylvisaker. The expected number of zeros of a stationary gaussian process. The Annals of Mathematical Statistics, 36(3):1043–1046, 1965.
  • [27] Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, and Yuan Yao. Global convergence of block coordinate descent in deep learning. In International conference on machine learning, pages 7313–7323. PMLR, 2019.
  • [28] Jize Zhang, Tim Leung, and Aleksandr Aravkin. Sparse mean-reverting portfolios via penalized likelihood optimization. Automatica, 111:108651, 2020.
  • [29] Ziping Zhao and Daniel P Palomar. Mean-reverting portfolio with budget constraint. IEEE Transactions on Signal Processing, 66(9):2342–2357, 2018.
  • [30] Ziping Zhao, Rui Zhou, Zhongju Wang, and Daniel P Palomar. Optimal portfolio design for statistical arbitrage in finance. In 2018 IEEE Statistical Signal Processing Workshop (SSP), pages 801–805. IEEE, 2018.

5 Appendix

All technical proofs are given next.

5.1 Proof of Lemma 2.1.

Proof.

We drop the subscripts for simplicity. If $x^{T}A_{0}x>\phi$, the Lagrange multiplier in ($P_{x}$) must be zero, so that $\left(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I\right)x=\rho(y+z)$, and consequently $x=\rho(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I)^{-1}(y+z)$. This leads to

x\displaystyle\|x\| ρ(αA1+γi=2q(zTAiz)Ai+2ρI)12y+z2\displaystyle\leq\rho\|(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}+2\rho I)^{-1}\|_{2}\|y+z\|_{2} (18)
ρλmin(αA1+γi=2q(zTAiz)Ai)+2ρy+z2\displaystyle\leq\frac{\rho}{\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})+2\rho}\|y+z\|_{2}
12y+z2\displaystyle\leq\frac{1}{2}\|y+z\|_{2}
12(y+z)\displaystyle\leq\frac{1}{2}(\|y\|+\|z\|)
\[=\frac{1}{2}(1+\|z\|)\qquad\text{as}\ \ y\ \ \text{solves}\ \ (P_{y}),\]

where we used that λmin(αA1+γi=2q(zTAiz)Ai)0\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})\geq 0 in the third inequality. Moreover, since zz solves (PzP_{z}) and λmin(γi=2q(xTAix)Ai)0\lambda_{\min}\left(\gamma\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i}\right)\geq 0, the equation (8) implies

z\displaystyle\|z\| ρ(γi=2q(xTAix)Ai+ρI)1x\displaystyle\leq\rho\|(\gamma\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i}+\rho I)^{-1}\|\|x\| (19)
ργλmin(i=2q(xTAix)Ai)+ρx\displaystyle\leq\frac{\rho}{\gamma\lambda_{\min}(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})+\rho}\|x\|
x.\displaystyle\leq\|x\|.

Through (18), we see x12(1+z)12(1+x)\|x\|\leq\frac{1}{2}(1+\|z\|)\leq\frac{1}{2}(1+\|x\|); leading to x1\|x\|\leq 1 in this case. If xTA0x=ϕx^{T}A_{0}x=\phi, we have xϕ/λmin(A0)\|x\|\leq\sqrt{\phi/{\lambda_{\min}(A_{0})}}. By (19), we have zxmax{ϕ/λmin(A0),1}.\|z\|\leq\|x\|\leq\max\{\sqrt{\phi/{\lambda_{\min}(A_{0})}},1\}.

5.2 Proof of Theorem 2.1.

Proof.

By the definitions of $x_{l+1}$, $y_{l+1}$, and $z_{l+1}$ in steps 3-5 of Algorithm 1, we get

qρ(xl+1,yl+1,zl+1)\displaystyle q_{\rho}(x_{l+1},y_{l+1},z_{l+1}) \displaystyle\leq qρ(xl+1,yl+1,z);z𝒵,\displaystyle q_{\rho}(x_{l+1},y_{l+1},z);\qquad\forall z\in\mathcal{Z},
qρ(xl+1,yl+1,zl)\displaystyle q_{\rho}(x_{l+1},y_{l+1},z_{l}) \displaystyle\leq qρ(xl+1,y,zl);y𝒴,\displaystyle q_{\rho}(x_{l+1},y,z_{l});\qquad\forall y\in\mathcal{Y},
qρ(xl+1,yl,zl)\displaystyle q_{\rho}(x_{l+1},y_{l},z_{l}) \displaystyle\leq qρ(x,yl,zl);x𝒳.\displaystyle q_{\rho}(x,y_{l},z_{l});\qquad\forall x\in\mathcal{X}. (20)

This simply leads to the following:

0\displaystyle 0 qρ(xl+1,yl+1,zl+1)\displaystyle\leq q_{\rho}(x_{l+1},y_{l+1},z_{l+1}) (21)
qρ(xl+1,yl+1,zl)\displaystyle\leq q_{\rho}(x_{l+1},y_{l+1},z_{l})
qρ(xl+1,yl,zl)\displaystyle\leq q_{\rho}(x_{l+1},y_{l},z_{l})
qρ(xl,yl,zl);l.\displaystyle\leq q_{\rho}(x_{l},y_{l},z_{l});\quad\forall l\in\mathbb{N}.

Thus, $q_{\rho}(x_{l},y_{l},z_{l})$ is bounded below and non-increasing, and hence convergent. On the other hand, since $(x_{*},y_{*},z_{*})$ is an accumulation point of $\{(x_{l},y_{l},z_{l})\}$, there exists a subsequence $L$ such that $\lim_{l\in L\to\infty}(x_{l},y_{l},z_{l})=(x_{*},y_{*},z_{*})$. The continuity of $q_{\rho}$ yields

limlqρ(xl+1,yl+1,zl)\displaystyle\lim_{l\to\infty}q_{\rho}(x_{l+1},y_{l+1},z_{l}) =limlqρ(xl+1,yl,zl)\displaystyle=\lim_{l\to\infty}q_{\rho}(x_{l+1},y_{l},z_{l})
=limlqρ(xl,yl,zl)\displaystyle=\lim_{l\to\infty}q_{\rho}(x_{l},y_{l},z_{l})
=limlLqρ(xl,yl,zl)\displaystyle=\lim_{l\in L\to\infty}q_{\rho}(x_{l},y_{l},z_{l})
=qρ(x,y,z).\displaystyle=q_{\rho}(x_{*},y_{*},z_{*}).

By the continuity of qρ(xl,yl,zl)q_{\rho}(x_{l},y_{l},z_{l}) and taking the limit of both sides of (5.2) as lLl\in L\to\infty , we have

qρ(x,y,z)\displaystyle q_{\rho}(x_{*},y_{*},z_{*}) \displaystyle\leq qρ(x,y,z);x𝒳,\displaystyle q_{\rho}(x,y_{*},z_{*});\qquad\forall x\in\mathcal{X},
qρ(x,y,z)\displaystyle q_{\rho}(x_{*},y_{*},z_{*}) \displaystyle\leq qρ(x,y,z);y𝒴,\displaystyle q_{\rho}(x_{*},y,z_{*});\qquad\forall y\in\mathcal{Y},
qρ(x,y,z)\displaystyle q_{\rho}(x_{*},y_{*},z_{*}) \displaystyle\leq qρ(x,y,z);z𝒵.\displaystyle q_{\rho}(x_{*},y_{*},z);\qquad\forall z\in\mathcal{Z}.

Further, it is clear from (21) that $\{q_{\rho}(x_{l},y_{l},z_{l})\}$ is non-increasing.
Now suppose $q_{\rho}(x_{l},y_{l},z_{l})=q_{\rho}(x_{l+1},y_{l+1},z_{l+1})$ for some $l\in\mathbb{N}$. Then, by the last inequality in (21), $q_{\rho}(x_{l+1},y_{l},z_{l})=q_{\rho}(x_{l},y_{l},z_{l})$. Since $q_{\rho}(x_{l+1},y_{l},z_{l})=\min_{x\in\mathcal{X}}q_{\rho}(x,y_{l},z_{l})$ and $x_{l}\in\mathcal{X}$, we see $x_{l}\in\mbox{Argmin}_{x\in\mathcal{X}}q_{\rho}(x,y_{l},z_{l})$. Similarly, by the third inequality in (21), $q_{\rho}(x_{l+1},y_{l+1},z_{l})=q_{\rho}(x_{l+1},y_{l},z_{l})$. Since $q_{\rho}(x_{l+1},y_{l+1},z_{l})=\min_{y\in\mathcal{Y}}q_{\rho}(x_{l+1},y,z_{l})$ and $y_{l}\in\mathcal{Y}$, we see $y_{l}\in\mbox{Argmin}_{y\in\mathcal{Y}}q_{\rho}(x_{l+1},y,z_{l})$; combined with $q_{\rho}(x_{l+1},y_{l},z_{l})=q_{\rho}(x_{l},y_{l},z_{l})$, this gives $y_{l}\in\mbox{Argmin}_{y\in\mathcal{Y}}q_{\rho}(x_{l},y,z_{l})$. Recall that $z_{l}\in\mathcal{Z}$ satisfies $z_{l}\in\mbox{Argmin}_{z\in\mathcal{Z}}\ q_{\rho}(x_{l},y_{l},z)$ by definition. Hence, $(x_{l},y_{l},z_{l})$ is a saddle point of ($P_{x,y,z}$). ∎

5.3 Proof of Theorem 2.2.

Proof.

(i) From (13), we know that $\max\{\|x^{(j)}\|,\|y^{(j)}\|,\|z^{(j)}\|\}\leq\max\{\sqrt{\phi/\lambda_{\min}(A_{0})},1\}$ for every $j\in\mathbb{N}$. Thus, $\{(x^{(j)},y^{(j)},z^{(j)})\}$ is bounded and therefore has a convergent subsequence; without loss of generality, we suppose the sequence itself is convergent. Let $(x^{*},y^{*},z^{*})$ be its accumulation point. First, let us show that $x^{*}=y^{*}=z^{*}$. Since $\alpha\in\{0,1\}$, $A_{i}\succeq 0$ for $i\in[q]$, and $\gamma\geq 0$, we see $\alpha(x^{(j)})^{T}A_{1}x^{(j)}+\gamma\sum_{i=2}^{q}(z^{(j)})^{T}A_{i}z^{(j)}(x^{(j)})^{T}A_{i}x^{(j)}\geq 0$, and thus, in view of (4), (21), and step 13 of Algorithm 2, we have $\rho^{(j)}\left(\|x^{(j)}-y^{(j)}\|^{2}+\|x^{(j)}-z^{(j)}\|^{2}\right)\leq\Upsilon$. The latter leads to

max{x(j)y(j),x(j)z(j)}Υ/ρ(j),\max\{\|x^{(j)}-y^{(j)}\|,\|x^{(j)}-z^{(j)}\|\}\leq\sqrt{{\Upsilon}/{\rho^{(j)}}},

and thus max{x(j)y(j),x(j)z(j)}0\max\{\|x^{(j)}-y^{(j)}\|,\|x^{(j)}-z^{(j)}\|\}\to 0 when jj\to\infty as ρ(j)\rho^{(j)}\to\infty; implying that x=y=zx^{*}=y^{*}=z^{*}.

Next, let (j)[n]\mathcal{I}^{(j)}\subseteq[n] be such that |(j)|=k|\mathcal{I}^{(j)}|=k and (y(j)c)i=0(y_{(\mathcal{I}^{j})^{c}})_{i}=0 for every jj\in\mathbb{N} and i(j)ci\in(\mathcal{I}^{j})^{c}. Then, since {(j)}\{\mathcal{I}^{(j)}\} is a bounded sequence of indices, it has a convergent subsequence, which means that there exists an index subset [n]\mathcal{L}\subseteq[n] with ||=k|\mathcal{L}|=k and a subsequence {(x(j),y(j),z(j))},\{(x^{(j_{\ell})},y^{(j_{\ell})},z^{(j_{\ell})})\}, of the above convergent subsequence such that (j)=\mathcal{I}^{(j_{\ell})}=\mathcal{L} for all large jj_{\ell}’s. Therefore, since x=yx^{*}=y^{*} and yc=0y^{*}_{\mathcal{L}^{c}}=0, we see xc=0x^{*}_{\mathcal{L}^{c}}=0, which further implies x=1\|x^{*}_{\mathcal{L}}\|=1.

(ii) For each jj, (x(j),y(j),z(j))(x^{(j)},y^{(j)},z^{(j)}) is a saddle point of (Px,y,zP_{x,y,z}) with ρ=ρ(j)>0\rho=\rho^{(j)}>0 such that (5),(7) and (8) give

{αA1x(j)+γi=2q(z(j))TAiz(j)Aix(j)++ρ(x(j)y(j)+x(j)z(j))λA0x(j)=0,y(j)=(x(j))/(x(j)),γi=2q(x(j))TAix(j)Aiz(j)=ρ(x(j)z(j)),0λ(x(j))TA0x(j)ϕ0,y(j)=1,and(y(j))c=0.\left\{\begin{aligned} &\alpha A_{1}x^{(j_{\ell})}+\gamma\sum_{i=2}^{q}(z^{(j_{\ell})})^{T}A_{i}z^{(j_{\ell})}A_{i}x^{(j_{\ell})}+\dots\\ &\qquad\dots+\rho(x^{(j_{\ell})}-y^{(j_{\ell})}+x^{(j_{\ell})}-z^{(j_{\ell})})-\lambda A_{0}x^{(j_{\ell})}=0,\\ &y^{(j_{\ell})}={(x^{(j_{\ell})})_{\mathcal{L}}}/{\|{(x^{(j_{\ell})})_{\mathcal{L}}}\|},\\ &\gamma\sum_{i=2}^{q}(x^{(j_{\ell})})^{T}A_{i}x^{(j_{\ell})}A_{i}z^{(j_{\ell})}=\rho(x^{(j_{\ell})}-z^{(j_{\ell})}),\\ &0\leq\lambda\perp(x^{(j_{\ell})})^{T}A_{0}x^{(j_{\ell})}-\phi\geq 0,\\ &\quad\|y^{(j_{\ell})}\|=1,\quad\mbox{and}\quad(y^{(j_{\ell})})_{\mathcal{L}^{c}}=0.\end{aligned}\right. (22)

Substituting the third relation of (22) into the first, we get

αA1x(j)+γi=2q[(z(j))TAiz(j)Aix(j)+(x(j))TAix(j)Aiz(j)]+ρ(j)(x(j)y(j))λ(j)A0x(j)=0.\displaystyle\alpha A_{1}x^{(j_{\ell})}+\gamma\sum_{i=2}^{q}[(z^{(j_{\ell})})^{T}A_{i}z^{(j_{\ell})}A_{i}x^{(j_{\ell})}+(x^{(j_{\ell})})^{T}A_{i}x^{(j_{\ell})}A_{i}z^{(j_{\ell})}]+\rho^{(j_{\ell})}(x^{(j_{\ell})}-y^{(j_{\ell})})-\lambda^{(j_{\ell})}A_{0}x^{(j_{\ell})}=0.

By observing (x(j))(y(j))=(x(j)){\|(x^{(j_{\ell})})_{\mathcal{L}}\|}\left(y^{(j_{\ell})}\right)_{\mathcal{L}}=\left(x^{(j_{\ell})}\right)_{\mathcal{L}} that implies (x(j)y(j))=(x(j)1):=μ(j)(y(j)),\left(x^{(j_{\ell})}-y^{(j_{\ell})}\right)_{\mathcal{L}}=\underbrace{(\|x^{(j_{\ell})}\|-1)}_{:=\mu^{(j_{\ell})}}(y^{(j_{\ell})})_{\mathcal{L}}, and letting

M(j):=αA1x(j)+γi=2q[(z(j))TAiz(j)Aix(j)+(x(j))TAix(j)Aiz(j)],\displaystyle M^{(j_{\ell})}:=\alpha A_{1}x^{(j_{\ell})}+\gamma\sum_{i=2}^{q}[(z^{(j_{\ell})})^{T}A_{i}z^{(j_{\ell})}A_{i}x^{(j_{\ell})}+(x^{(j_{\ell})})^{T}A_{i}x^{(j_{\ell})}A_{i}z^{(j_{\ell})}],

we get

M(j)+μ(j)[(y(j))0]+[0ρ(j)(x(j))c]:=w(j)λA0x(j)=0,M^{(j_{\ell})}+\mu^{(j_{\ell})}\begin{bmatrix}(y^{(j_{\ell})})_{\mathcal{L}}\\ 0\end{bmatrix}+\underbrace{\begin{bmatrix}0\\ \rho^{(j_{\ell})}(x^{(j_{\ell})})_{\mathcal{L}^{c}}\end{bmatrix}}_{:=w^{(j_{\ell})}}-\lambda A_{0}x^{(j_{\ell})}=0,

where $(w^{(j_{\ell})})_{\mathcal{L}}=0$ for each $j_{\ell}$, and also $0\leq\lambda^{(j_{\ell})}\perp(x^{(j_{\ell})})^{T}A_{0}x^{(j_{\ell})}-\phi\geq 0$, $\|y^{(j_{\ell})}\|_{2}=1$, and $(y^{(j_{\ell})})_{\mathcal{L}^{c}}=0$. We next prove that $\{(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\}$ is bounded under Robinson's condition at $x^{*}$. Suppose not, and consider the normalized sequence

(λ~(j),μ~(j),w~(j)):=(λ(j),μ(j),w(j))(λ(j),μ(j),w(j))2,j.(\widetilde{\lambda}^{(j_{\ell})},\widetilde{\mu}^{(j_{\ell})},\widetilde{w}^{(j_{\ell})}):=\frac{(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})}{\|(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\|_{2}},\qquad\forall\ {j_{\ell}}.

By boundedness, there exists a convergent subsequence of $(\widetilde{\lambda}^{(j_{\ell})},\widetilde{\mu}^{(j_{\ell})},\widetilde{w}^{(j_{\ell})})$ with limit $(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})$ satisfying $\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|_{2}=1$. In view of $x^{*}=y^{*}$, $y^{*}_{\mathcal{L}^{c}}=0$, and the boundedness of $(M^{(j_{\ell})})$, passing to the limit yields

λ~A0x+μ~x+w~=0,-\widetilde{\lambda}_{*}A_{0}x^{*}+\widetilde{\mu}_{*}x^{*}+\widetilde{w}^{*}=0, (23)

where λ~0\widetilde{\lambda}_{*}\geq 0, xc=0x^{*}_{\mathcal{L}^{c}}=0, and w~=0\widetilde{w}^{*}_{\mathcal{L}}=0. Let us consider two cases: (x)TA0x=ϕ(x^{*})^{T}A_{0}x^{*}=\phi, and (x)TA0x>ϕ(x^{*})^{T}A_{0}x^{*}>\phi as follows. When (x)TA0x=ϕ(x^{*})^{T}A_{0}x^{*}=\phi, by the Robinson’s conditions at xx^{*}, there exist a vector dnd\in\mathbb{R}^{n} and a constant vv\in\mathbb{R}_{-} such that 2dTA0xv=2λ~-2d^{T}A_{0}x^{*}-v=-2\widetilde{\lambda}_{*}, 2dTx=2μ~2d^{T}x^{*}=-2\widetilde{\mu}_{*}, and dc=w~cd_{\mathcal{L}^{c}}=-\widetilde{w}^{*}_{\mathcal{L}^{c}}. Since dc=w~cd_{\mathcal{L}^{c}}=-\widetilde{w}^{*}_{\mathcal{L}^{c}} and w~=0\widetilde{w}^{*}_{\mathcal{L}}=0, we see that dTw~=w~22d^{T}\widetilde{w}^{*}=-\|\widetilde{w}^{*}\|^{2}_{2}. Therefore, 0=λ~dTA0x+μ~dTx+dTw~=(λ~)2+λ~v2(μ~)2w~22=(λ~,μ~,w~)22+λ~v2,0=-\widetilde{\lambda}_{*}d^{T}A_{0}x^{*}+\widetilde{\mu}d^{T}x^{*}+d^{T}\widetilde{w}^{*}=-(\widetilde{\lambda}_{*})^{2}+\frac{\widetilde{\lambda}_{*}v}{2}-(\widetilde{\mu}_{*})^{2}-\|\widetilde{w}^{*}\|^{2}_{2}=-\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}+\frac{\widetilde{\lambda}_{*}v}{2}, which implies that (λ~,μ~,w~)22=λ~v2\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}=\frac{\widetilde{\lambda}_{*}v}{2}. Since λ~0\widetilde{\lambda}_{*}\geq 0 and v0v\leq 0, we have (λ~,μ~,w~)22=0\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}=0, which is a contradiction. Therefore, the sequence ((λ(j),μ(j),w(j)))\big{(}(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\big{)} is bounded. Now, suppose (x)TA0x>ϕ(x^{*})^{T}A_{0}x^{*}>\phi. In this case, since (xj)(x^{j_{\ell}}) converges to xx^{*}, we have (xj)TA0xj>ϕ(x^{j_{\ell}})^{T}A_{0}x^{j_{\ell}}>\phi for all jj_{\ell} sufficiently large. Hence, λ(j)=0\lambda^{(j_{\ell})}=0 for all large jj_{\ell}. This shows that λ~=0\widetilde{\lambda}_{*}=0. 
In view of Robinson's condition at $x^{*}$ given in (2.2), there exist a vector $d\in\mathbb{R}^{n}$ and a constant $v\in\mathbb{R}$ such that $-2d^{T}A_{0}x^{*}-v=-2\widetilde{\lambda}_{*}$, $2d^{T}x^{*}=-2\widetilde{\mu}_{*}$, and $d_{\mathcal{L}^{c}}=-\widetilde{w}^{*}_{\mathcal{L}^{c}}$. Using these equations together with (23) and $\widetilde{\lambda}_{*}=0$, we see that $0=-\widetilde{\lambda}_{*}d^{T}A_{0}x^{*}+\widetilde{\mu}_{*}d^{T}x^{*}+d^{T}\widetilde{w}^{*}=-\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}+\frac{\widetilde{\lambda}_{*}v}{2}$; implying that $\|(\widetilde{\lambda}_{*},\widetilde{\mu}_{*},\widetilde{w}^{*})\|^{2}_{2}=0$, which is again a contradiction.

Hence, $\{(\lambda^{(j_{\ell})},\mu^{(j_{\ell})},w^{(j_{\ell})})\}$ is bounded and has a convergent subsequence with limit $(\lambda,\mu,w)$. Thus, passing to the limit in (5.3) and applying the results of part (i), we have:
$$\alpha A_{1}x^{*}+2\gamma\sum_{i=2}^{q}(x^{*})^{T}A_{i}x^{*}\,A_{i}x^{*}-\lambda A_{0}x^{*}+\mu x^{*}+w=0,\qquad \|x^{*}\|=1,\qquad (x^{*})_{\mathcal{L}^{c}}=0,\qquad w_{\mathcal{L}}=0,$$
where $0\leq\lambda\perp(x^{*})^{T}A_{0}x^{*}-\phi\geq 0$. This means that $x^{*}$ satisfies the first-order optimality conditions of $(P)$ given in (15). ∎

5.4 Proof of Theorem 2.3.

Proof.

Only a few of the arguments given in the above proofs must be partially modified to ensure that all the subproblems remain well-defined and that the convergence analysis carries over to the nonconvex case. To avoid repetition, we briefly discuss the arguments that must be revisited. Specifically, we must impose the following conditions on $\rho$.
(1) $2\rho>\lambda_{\max}(A_{0})-\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})$, which guarantees the existence of a strictly feasible point in the dual of the SDP relaxation of $(P_{x})$; namely, inequality (6).
(2) $\rho>-\gamma\lambda_{\min}(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})$, which ensures that $z_{*}$ in (8) is well-defined.
(3) $2\rho>-\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})$, which is needed for inequality (18).

Since $A_{0}\succ 0$, condition (3) is satisfied whenever (1) holds. It is easy to show that conditions (1) and (2) are satisfied if $\rho>\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})$. Recall that for a symmetric matrix $Q_{1}$, we have $x^{T}Q_{1}x\geq\lambda_{\min}(Q_{1})\|x\|^{2}$, and that for two symmetric matrices $Q_{1}$ and $Q_{2}$, $\lambda_{\min}(Q_{1}+Q_{2})\geq\lambda_{\min}(Q_{1})+\lambda_{\min}(Q_{2})$. Thus,
$$\lambda_{\min}\Big(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i}\Big)\geq\alpha\lambda_{\min}(A_{1})+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)\lambda_{\min}(A_{i})\geq\alpha\lambda_{\min}(A_{1})+\gamma\sum_{i=2}^{q}\|z\|^{2}\lambda^{2}_{\min}(A_{i})\geq\alpha\lambda_{\min}(A_{1}),$$
implying that $\lambda_{\max}(A_{0})-\lambda_{\min}(\alpha A_{1}+\gamma\sum_{i=2}^{q}(z^{T}A_{i}z)A_{i})\leq\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})$. In the same manner,
$$\lambda_{\min}\Big(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i}\Big)\geq\sum_{i=2}^{q}(x^{T}A_{i}x)\lambda_{\min}(A_{i})\geq\sum_{i=2}^{q}\|x\|^{2}\lambda^{2}_{\min}(A_{i})\geq 0,$$
so that $-\gamma\lambda_{\min}(\sum_{i=2}^{q}(x^{T}A_{i}x)A_{i})\leq 0$.
Therefore, conditions (1) and (2) are both satisfied whenever $\rho>|\lambda_{\max}(A_{0})-\alpha\lambda_{\min}(A_{1})|$.
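The eigenvalue chain above can be checked numerically. The sketch below is illustrative only: the matrices are randomly generated symmetric positive semidefinite stand-ins (with $A_0 \succ 0$, as in the paper), and the dimensions and weights are hypothetical.

```python
import numpy as np

# Sanity check of the bound used above:
#   lambda_min(alpha*A1 + gamma * sum_i (z^T A_i z) A_i) >= alpha * lambda_min(A1),
# so that rho > |lambda_max(A0) - alpha*lambda_min(A1)| implies condition (1):
#   2*rho > lambda_max(A0) - lambda_min(alpha*A1 + gamma * sum_i (z^T A_i z) A_i).

rng = np.random.default_rng(0)

def random_psd(n):
    M = rng.standard_normal((n, n))
    return M @ M.T  # symmetric positive semidefinite

n, q, alpha, gamma = 5, 4, 0.7, 0.3
A0 = random_psd(n) + np.eye(n)              # A0 positive definite
A1 = random_psd(n)
As = [random_psd(n) for _ in range(q - 1)]  # stand-ins for A_2, ..., A_q
z = rng.standard_normal(n)

weighted = alpha * A1 + gamma * sum((z @ A @ z) * A for A in As)
lam_min_weighted = np.linalg.eigvalsh(weighted)[0]   # smallest eigenvalue

# The chain of inequalities: lambda_min(weighted) >= alpha * lambda_min(A1).
assert lam_min_weighted >= alpha * np.linalg.eigvalsh(A1)[0] - 1e-9

# Any rho exceeding |lambda_max(A0) - alpha*lambda_min(A1)| satisfies condition (1).
rho = abs(np.linalg.eigvalsh(A0)[-1] - alpha * np.linalg.eigvalsh(A1)[0]) + 1e-6
assert 2 * rho > np.linalg.eigvalsh(A0)[-1] - lam_min_weighted
print("eigenvalue bound verified")
```

Note that the nonnegativity of each weight $z^{T}A_{i}z$ (used when dropping the sum) relies on the $A_i$ being positive semidefinite, which the random construction above guarantees.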
Moreover, for part (i) of Theorem 2.2, we have
$$\alpha(x^{(j)})^{T}A_{1}x^{(j)}+\gamma\sum_{i=2}^{q}(z^{(j)})^{T}A_{i}z^{(j)}\,(x^{(j)})^{T}A_{i}x^{(j)}+\rho^{(j)}\left(\|x^{(j)}-y^{(j)}\|^{2}+\|x^{(j)}-z^{(j)}\|^{2}\right)\leq\Upsilon,$$
which, by the new assumption that the level set $X_{\Upsilon}$ is bounded, yields
$$\rho^{(j)}\left(\|x^{(j)}-y^{(j)}\|^{2}+\|x^{(j)}-z^{(j)}\|^{2}\right)\leq\Upsilon-\min_{x\in X_{\Upsilon}}f(x),$$
which leads to $x^{*}=y^{*}=z^{*}$. Consequently, (i) and (ii) in Theorem 2.2 hold in this case as well. ∎
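The mechanism by which the increasing penalty parameter forces the splitting variables to coincide can be illustrated on a toy problem. The sketch below is a deliberate simplification of the penalty decomposition idea, not the paper's algorithm: it keeps only the $\alpha\, x^{T}A_{1}x$ term, uses a single splitting variable $y$ carrying the sparsity and unit-norm constraints, and all names and parameter values are hypothetical.

```python
import numpy as np

# Toy penalty decomposition sketch: minimize alpha * x^T A1 x subject to
# ||x||_0 <= k and ||x|| = 1, via block coordinate descent on the penalized
# objective  alpha * x^T A1 x + rho * ||x - y||^2,  with rho increased
# geometrically so the splitting gap ||x - y|| is driven to zero.
def penalty_decomposition_toy(A1, k, alpha=1.0, rho0=1.0, growth=2.0,
                              outer=25, inner=20):
    n = A1.shape[0]
    y = np.ones(n) / np.sqrt(n)   # feasible start: unit norm
    rho = rho0
    for _ in range(outer):
        for _ in range(inner):
            # x-block: unconstrained strongly convex quadratic, closed form
            x = np.linalg.solve(alpha * A1 + rho * np.eye(n), rho * y)
            # y-block: project x onto {||y||_0 <= k, ||y|| = 1}:
            # keep the k largest entries in magnitude, then renormalize
            y = np.zeros(n)
            idx = np.argsort(-np.abs(x))[:k]
            y[idx] = x[idx]
            y /= np.linalg.norm(y)
        rho *= growth             # increase the penalty parameter
    return x, y, rho

A1 = np.array([[2.0, 0.3, 0.0],
               [0.3, 1.0, 0.2],
               [0.0, 0.2, 3.0]])
x, y, _ = penalty_decomposition_toy(A1, k=2)
print(np.linalg.norm(x - y))      # splitting gap (should be tiny)
```

As the proof indicates, the bound $\rho^{(j)}\|x^{(j)}-y^{(j)}\|^{2}\leq\Upsilon-\min f$ with $\rho^{(j)}\to\infty$ is exactly what makes the printed gap vanish; $y$ remains feasible ($k$-sparse, unit norm) throughout.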