A Riemannian Primal-dual Algorithm Based on Proximal Operator and its Application in Metric Learning
Abstract
In this paper, we consider optimizing a smooth, convex, lower semicontinuous function in Riemannian space with constraints. To solve the problem, we first convert it to a dual problem and then propose a general primal-dual algorithm that optimizes the primal and dual variables iteratively. In each iteration, we employ a proximal operator to search for the optimal solution in the primal space. We prove convergence of the proposed algorithm and show its non-asymptotic convergence rate. By utilizing the proposed primal-dual optimization technique, we propose a novel metric learning algorithm which learns an optimal feature transformation matrix in the Riemannian space of positive definite matrices. Preliminary experimental results on an optimal fund selection problem in fund of funds (FOF) management for quantitative investment show its efficacy.
I Introduction
Many machine learning problems can be solved by optimization algorithms which minimize or maximize a predefined objective function under certain constraints, if there are any. In the past decades, searching for optimal variables in Euclidean space has been the mainstream direction for optimization techniques. In recent years, there has been a shift from Euclidean space to Riemannian space due to the manifold structures that exist in many machine learning problems [1, 2, 3, 4, 5, 6]. To solve optimization problems in Riemannian space, a straightforward method is to generalize optimization algorithms developed in Euclidean space to Riemannian space while taking into account the manifold constraints on the variables to be optimized. Gradient descent methods, Newton's methods and conjugate gradient methods can be naturally extended from Euclidean space to Riemannian space; see [7, 8, 9, 10] and references therein. Studies on Riemannian accelerated gradient methods, quasi-Newton algorithms like BFGS, and adaptive optimization methods can be found in [11, 12, 13]. P.-A. Absil et al. proposed a trust-region approach for optimizing a smooth function on a Riemannian manifold in which the trust-region subproblems are solved using a truncated conjugate gradient algorithm [2]. Furthermore, Qi and Agarwal generalized the adaptive regularization with cubics algorithm to Riemannian manifolds and obtained an upper bound on the iteration complexity which is optimal compared to the complexity of steepest descent and trust-region methods [14, 15]. In recent years, variance reduction techniques have drawn tremendous attention for optimizing finite-sum problems [16, 17, 18]. Extensions of the variance reduction idea to finite sums of geodesically smooth functions on Riemannian manifolds can be found in [19, 20, 21].
Despite the intensive studies on Riemannian manifold optimization, all the above works consider only unconstrained manifold problems: the only constraint is that the solution has to lie on the manifold. This severely limits the scope of possible applications of those methods. In many problems, additional equality or inequality constraints need to be imposed. Few works have addressed this problem. Hauswirth et al. extended the projected gradient descent algorithm to Riemannian manifolds with inequality constraints and showed its well-behaved convergence behaviour, but the manifold is restricted to submanifolds of Euclidean space [22]. Zhang et al. developed an ADMM-like primal-dual approach to solve nonconvex and nonsmooth multi-block optimization over Riemannian manifolds with coupled linear equality constraints [23].
In this paper, we consider the following general nonlinear primal problem, constrained to a complete Riemannian manifold $\mathcal{M}$:
$$\min_{x \in \mathcal{M}} f(x) \qquad\qquad (1)$$
subject to the constraints $g_i(x) \le 0$, $i = 1, \dots, m$, where $f$ and $g_i$ are closed, proper, convex, lower semicontinuous (l.s.c.) real-valued functions. A function is of class $C^2$ if its first and second derivatives both exist and are continuous.
I-A Previous Works
Khuzani and Li studied a stochastic primal-dual method on Riemannian manifolds with bounded sectional curvature [24]. They proved non-asymptotic convergence of the primal-dual method and established a connection between the convergence rate and the sectional curvature lower bound. In their algorithm, a standard gradient descent step followed by an exponential map is used to search for the optimal variable in each iteration. In recent years, proximal algorithms have emerged in many machine learning applications due to their capability of handling nonsmooth, constrained, large-scale or distributed problems [25, 26]. Ferreira and Oliveira considered the minimization problem on a Riemannian manifold with nonpositive sectional curvature. They solved the problem by extending the proximal method in Euclidean space to Riemannian space [27]. Recent advances on Riemannian proximal point methods can be found in [28, 29].
I-B Our Contributions
Previous works concentrate on either constrained Euclidean space optimization or unconstrained Riemannian space optimization. In contrast, we propose a novel algorithm to solve the constrained optimization problem (1) over a Riemannian manifold. We first convert it to a dual problem and then use a general primal-dual algorithm to optimize the primal and dual variables iteratively. In each optimization iteration, we employ the proximal point algorithm and the gradient ascent method alternately to search for the optimal solution in the primal and dual spaces. We prove convergence of the proposed algorithm and show its non-asymptotic convergence rate. To show the efficacy of the proposed primal-dual algorithm for optimizing nonlinear l.s.c. functions in Riemannian space with constraints, we propose a novel metric learning algorithm and solve it using the proposed Riemannian primal-dual method.
II Notation and Preliminaries
Let $\mathcal{M}$ be a connected, finite-dimensional manifold of dimension $n$. We denote by $T_x\mathcal{M}$ the tangent space of $\mathcal{M}$ at $x$. Let $\mathcal{M}$ be endowed with a Riemannian metric $\langle\cdot,\cdot\rangle_x$, with corresponding norm denoted by $\|\cdot\|_x$, so that $\mathcal{M}$ is now a Riemannian manifold [30]. We use $\ell(\gamma)$ to denote the length of a piecewise smooth curve $\gamma:[a,b]\to\mathcal{M}$ joining $x$ to $y$, i.e., such that $\gamma(a)=x$ and $\gamma(b)=y$. Minimizing this length functional over the set of all piecewise smooth curves passing through $x$ and $y$, we obtain a Riemannian distance $d(x,y)$ which induces the original topology on $\mathcal{M}$. The exponential map $\exp_x: T_x\mathcal{M} \to \mathcal{M}$ maps a tangent vector $v \in T_x\mathcal{M}$ to the point $\exp_x(v) = \gamma(1)$ along the geodesic $\gamma$ with $\gamma(0)=x$ and $\dot\gamma(0)=v$. For any $x, y \in \mathcal{M}$ we define the inverse exponential map $\exp_x^{-1}: \mathcal{M} \to T_x\mathcal{M}$, which maps a point $y \in \mathcal{M}$ to a tangent vector at $x$ with $d(x,y) = \|\exp_x^{-1} y\|_x$. We assume $(\mathcal{M}, d)$ is a complete metric space in which bounded and closed subsets are compact. For a given convex function $f$ at $x \in \mathcal{M}$, a vector $g \in T_x\mathcal{M}$ is called a subgradient of $f$ at $x$ if $f(y) \ge f(x) + \langle g, \exp_x^{-1} y\rangle$ for all $y \in \mathcal{M}$. The set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$, denoted by $\partial f(x)$. If $\mathcal{M}$ is a Hadamard manifold, i.e., complete, simply connected and with everywhere non-positive sectional curvature, the subdifferential of $f$ at any point of $\mathcal{M}$ is nonempty [27].
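Although the development above is for a general manifold $\mathcal{M}$, these objects have closed forms on the manifold of symmetric positive definite (SPD) matrices with the affine-invariant metric, which is the manifold used in Section IV. The following sketch is our own illustration (function names such as `spd_exp` are not from the paper) of the exponential map, its inverse, and the induced geodesic distance:

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm, inv

def spd_exp(X, V):
    """Exponential map exp_X(V) on the SPD manifold (affine-invariant metric)."""
    Xh = sqrtm(X)          # X^{1/2}
    Xih = inv(Xh)          # X^{-1/2}
    return Xh @ expm(Xih @ V @ Xih) @ Xh

def spd_log(X, Y):
    """Inverse exponential map exp_X^{-1}(Y): tangent vector at X pointing to Y."""
    Xh = sqrtm(X)
    Xih = inv(Xh)
    return Xh @ logm(Xih @ Y @ Xih) @ Xh

def spd_dist(X, Y):
    """Geodesic distance d(X, Y) = ||logm(X^{-1/2} Y X^{-1/2})||_F."""
    Xih = inv(sqrtm(X))
    return np.linalg.norm(logm(Xih @ Y @ Xih), 'fro')

# Consistency check: d(X, Y) equals the norm of exp_X^{-1}(Y) in the metric at X,
# i.e. sqrt(tr(X^{-1} V X^{-1} V)) with V = exp_X^{-1}(Y).
X = np.array([[2.0, 0.3], [0.3, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 3.0]])
V = spd_log(X, Y)
Xi = inv(X)
assert np.isclose(spd_dist(X, Y), np.sqrt(np.real(np.trace(Xi @ V @ Xi @ V))))
```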
III The Algorithm
By employing duality, we convert the original optimization problem (1) into an augmented Lagrangian function (a generic saddle-point problem):
$$\mathcal{L}(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i\, g_i(x) - \frac{\delta}{2}\, \|\lambda\|^2 \qquad (2)$$
where $\lambda = (\lambda_1, \dots, \lambda_m) \ge 0$ is the Lagrangian dual vector for the inequality constraints, and $\delta > 0$ is a regularization parameter which weights the norm of the dual variables. The norm of the Lagrangian dual variables is upper bounded, and so is the gradient of the deterministic Lagrangian function $\mathcal{L}$.
To solve the generic primal-dual problem shown in eq. (2), we propose a primal-dual algorithm based on proximal operator (see Algorithm 1).
$$x_{t+1} = \arg\min_{x \in \mathcal{M}} \Big\{ \mathcal{L}(x, \lambda_t) + \frac{1}{2\eta_t}\, d^2(x, x_t) \Big\} \qquad (3)$$
$$\lambda_{t+1} = \big[ \lambda_t + \eta_t \nabla_{\lambda} \mathcal{L}(x_{t+1}, \lambda_t) \big]_{+} \qquad (4)$$
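The overall structure of Algorithm 1, a proximal step in the primal variable followed by projected gradient ascent in the dual variable, can be sketched as follows. This is only an illustration of eqs. (3)-(4): the subroutine `prox_step` (which approximately solves the proximal subproblem on the manifold), the constraint map `g`, and the parameter names `eta`, `delta` are our placeholders, not the paper's notation.

```python
import numpy as np

def primal_dual(x0, lam0, g, prox_step, T, eta, delta):
    """Sketch of the primal-dual iteration: proximal step on the manifold for
    the primal variable, projected gradient ascent for the dual variable."""
    x, lam = x0, lam0
    for t in range(T):
        # Eq. (3): x_{t+1} = argmin_x  L(x, lam_t) + (1/(2*eta)) d(x, x_t)^2
        x = prox_step(x, lam, eta)
        # Eq. (4): ascend on L(x_{t+1}, .); the -delta/2 ||lam||^2 term
        # regularizes the dual variable, and [.]_+ keeps it nonnegative.
        lam = np.maximum(lam + eta * (g(x) - delta * lam), 0.0)
    return x, lam
```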
Assumption 1. Assume $\mathcal{M}$ is a compact Riemannian manifold with finite diameter and non-positive sectional curvature. For any $x \in \mathcal{M}$, the gradients of the objective $f$ and of the constraint functions $g_i$, $i = 1, \dots, m$, are bounded.
With Assumption 1, we have the following theorem.
Theorem 1. Assume the saddle-point problem (2) has a saddle point and Assumption 1 holds. Let $\{(x_t, \lambda_t)\}$ be the finite sequence generated by Algorithm 1 with appropriately chosen step sizes $\eta_t$. Then,
(5)
for all .
Proof of Theorem 1 is shown in appendix A.
From Theorem 1, we can derive the following corollary:
Corollary 1. (Non-asymptotic Convergence) By choosing the step sizes $\eta_t$ appropriately for all $t$, the sequence generated by Algorithm 1 converges at rate:
(6)
Proof of Corollary 1 is shown in appendix B.
IV Application in Metric Learning
Metric learning is a technique for learning a distance metric in the data feature space, and it finds application in various machine learning tasks that rely on distance or similarity measures, such as classification, clustering, dimensionality reduction and domain adaptation, to name a few [31, 32, 33, 34, 35, 36, 37, 38]. Most methods learn the metric (a positive definite matrix $A$) in a weakly supervised way from pairwise or triplet constraints on data points. In general, metric learning can be formulated as an optimization problem that shares the same form as standard regularized empirical risk minimization:
$$\min_{A \succeq 0}\; \ell(A, \mathcal{D}) + \gamma\, R(A) \qquad (7)$$
where $\mathcal{D}$ denotes the training samples, $\ell$ is the loss function associated with the sample constraints, $R$ is the regularizer, and $\gamma$ is the trade-off parameter. Many methods are specified as a constrained optimization problem by writing the loss explicitly as inequality constraints, although we can always transform it into an unconstrained problem using the hinge loss or other tricks [31, 37].
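To make eq. (7) concrete, the sketch below evaluates a generic pairwise hinge loss under a Mahalanobis-type metric $A$; the bound values and the pair sets are illustrative assumptions of ours, not the specific loss optimized later in eq. (8).

```python
import numpy as np

def mahalanobis_sq(A, xi, xj):
    """Squared distance d_A(x_i, x_j) = (x_i - x_j)^T A (x_i - x_j)."""
    d = xi - xj
    return float(d @ A @ d)

def pairwise_hinge_loss(A, similar, dissimilar, u=1.0, l=4.0):
    """Hinge penalties for violated pair constraints: similar pairs should
    satisfy d_A <= u, dissimilar pairs should satisfy d_A >= l."""
    loss = 0.0
    for xi, xj in similar:
        loss += max(0.0, mahalanobis_sq(A, xi, xj) - u)
    for xi, xj in dissimilar:
        loss += max(0.0, l - mahalanobis_sq(A, xi, xj))
    return loss
```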
Several techniques have been developed to solve the metric learning problem in eq. (7). Projected gradient descent and its stochastic version use traditional (stochastic) gradient descent followed by an orthogonal projection onto the positive semi-definite cone [33, 39, 40, 41]. Bregman projections update the metric based on a single constraint at each iteration and perform a general non-orthogonal projection so that the chosen constraint is satisfied; an appropriate correction is then applied [35].
However, these methods do not fully exploit the intrinsic manifold structure of the problem, i.e., the learned metric must lie in the Riemannian space of positive definite matrices. It is therefore naturally an optimization problem on a Riemannian manifold rather than in Euclidean space. In this section, we apply the proposed method to the metric learning problem and illustrate how to optimize a convex objective function on a Riemannian manifold.
IV-A Metric Learning Problem Formulation
We consider the following convex metric learning problem, with the metric $A$ in the Riemannian space of positive definite matrices:
$$\min_{A \succ 0}\; D_{ld}(A, A_0) \qquad (8)$$
s.t. $\; d_A(x_i, x_j) \le u \;\; \forall (i,j) \in \mathcal{S}, \qquad d_A(x_i, x_j) \ge l \;\; \forall (i,j) \in \mathcal{D},$
where $d_A(x_i, x_j) = (x_i - x_j)^{\top} A\, (x_i - x_j)$, and $D_{ld}$ is the LogDet divergence, which is a scale-invariant distance measure on the Riemannian manifold of metrics [35]. $A_0$ is a target transformation matrix initialized to the identity (corresponding to the Euclidean distance) or to the inverse of the data covariance matrix (corresponding to the Mahalanobis distance), and $\mathcal{S}$ ($\mathcal{D}$) is the set of all sample pairs with the same (different) labels. $u$ and $l$ are the upper/lower distance bounds for similar/dissimilar pairs of points and are set to the 5th/95th percentiles of the observed distribution of distances in the following experiments. It is known that the space of all positive definite Hermitian matrices is a Cartan-Hadamard manifold, i.e., a simply connected, complete Riemannian manifold with non-positive sectional curvature.
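For reference, the LogDet (Burg) divergence used as the objective above can be evaluated as follows; the check at the end illustrates the scale invariance mentioned in the text. This is the standard formula, written with our own function name.

```python
import numpy as np

def logdet_div(A, A0):
    """LogDet divergence D_ld(A, A0) = tr(A A0^{-1}) - logdet(A A0^{-1}) - n."""
    n = A.shape[0]
    M = A @ np.linalg.inv(A0)
    _, logdet = np.linalg.slogdet(M)
    return float(np.trace(M) - logdet - n)

# Scale invariance: D_ld(c*A, c*A0) == D_ld(A, A0) for any c > 0.
A, A0 = np.array([[2.0, 0.5], [0.5, 1.0]]), np.eye(2)
assert np.isclose(logdet_div(3.0 * A, 3.0 * A0), logdet_div(A, A0))
```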
IV-B Optimization by the Proposed Riemannian Primal-dual Algorithm
By introducing relaxation variables, we have
(9)
s.t.
, ,
, .
Let us define $\mathbf{1}$ as a vector whose entries are all one and $\mathrm{diag}(\cdot)$ as the operator that extracts the diagonal elements of an input matrix and writes them into a vector. Let $X_S$ and $X_D$ be the matrices composed from the sample pairs with the same / different labels (shape: number of samples by feature dimension), and let $s_S$ and $s_D$ be the corresponding relaxation vectors, which are greater than or equal to zero; a vectorized evaluation of the resulting pair-distance terms is sketched after the constraints of eq. (10) below. So we have
(10)
s.t.
,
,
,
.
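As referenced above, the pair-distance terms appearing in the constraints of eq. (10) can be evaluated in vectorized form: if the rows of a matrix hold the pair vectors, then $\mathrm{diag}(X A X^{\top})$ collects all pair distances at once. A sketch under that assumption (the names `X_S`, `X_D`, `u`, `l` follow our reading of the text):

```python
import numpy as np

def pair_distances(X_pairs, A):
    """diag(X A X^T): the d_A value of every pair stored as a row of X_pairs,
    computed with einsum to avoid forming the full n x n product."""
    return np.einsum('ij,jk,ik->i', X_pairs, A, X_pairs)

def constraint_residuals(A, X_S, X_D, u, l):
    """Residuals of the inequality constraints (<= 0 when satisfied)."""
    g_sim = pair_distances(X_S, A) - u   # similar pairs: distance at most u
    g_dis = l - pair_distances(X_D, A)   # dissimilar pairs: distance at least l
    return g_sim, g_dis
```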
Furthermore, stacking the two groups of constraints, we define a combined constraint matrix and a combined relaxation vector; then
(11)
s.t.
,
.
By employing duality, we have the following augmented Lagrangian function:
.
Now let us solve the above Lagrangian function using Algorithm 1. At each step $t$, we have the following updates:
,
,
.
So,
,
,
,
.
In the following paragraphs, we will show how to update primal and dual variables in each iteration.
(1) We employ the Riemannian gradient descent method to search for the optimal $A$. Define
.
We update $A$ by moving along the negative Riemannian gradient direction and applying a retraction, where the Riemannian gradient is the projection of the Euclidean gradient onto the tangent space and the retraction maps the result back onto the manifold. See Appendix C for the full procedure; a sketch of the Euclidean gradient involved is given after step (4) below.
(2) Define , and .
.
(3)
(4) .
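As referenced in step (1), the primal update needs the Euclidean gradient of the Lagrangian with respect to $A$ before projecting it onto the tangent space (Appendix C). The following is a hedged sketch of that gradient under our reading of eqs. (8)-(11): the LogDet term contributes $A_0^{-1} - A^{-1}$, and each pair constraint contributes its multiplier times the outer product of the pair difference vector; the exact signs and scaling in the paper may differ.

```python
import numpy as np

def euclidean_grad_A(A, A0, lam_S, lam_D, X_S, X_D):
    """Euclidean gradient of the Lagrangian w.r.t. A (illustrative sketch).
    Rows of X_S / X_D are pair difference vectors; lam_S / lam_D are the
    corresponding nonnegative dual variables."""
    grad = np.linalg.inv(A0) - np.linalg.inv(A)      # gradient of D_ld(A, A0)
    grad += (X_S * lam_S[:, None]).T @ X_S           # similar pairs (d_A <= u)
    grad -= (X_D * lam_D[:, None]).T @ X_D           # dissimilar pairs (d_A >= l)
    return grad
```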
IV-C Experimental Results
Both investors and machine learning researchers have shown great interest in applying machine learning to the finance area in recent years. The Holy Grail of quantitative investment is the selection of high-quality financial assets with good timing, to achieve higher returns with less risk [42]. To measure the quality and trend of financial assets, technical, fundamental, and macroeconomic factors or features are developed. Usually, multi-factor regression models [43] are deployed to find the most effective features for achieving higher asset returns. However, when the number of features is large, or heterogeneity/multicollinearity exists in these features, traditional factor-oriented asset selection models tend to fail and may not give encouraging results.
Each asset can be represented by a data point in a high-dimensional feature space, so a good distance metric in that space is crucial for more accurate similarity measurement between assets. In our treatment, asset selection can be regarded as a classification problem: assets are divided into two groups, one with positive return and the other with negative return, and the aim is to find an optimal distance metric in the feature space that separates these two groups, which is exactly what eq. (8) formulates. Using the metric learning approach to asset selection, the factor-model problems mentioned above can be largely alleviated. In the following sections, we apply the proposed Riemannian primal-dual metric learning (RPDML) algorithm to the fund of funds (FOF) management problem. An FOF is a multi-manager investment strategy whose portfolios are composed of mutual or hedge funds which invest directly in stocks, bonds or other financial assets.
IV-D Data
In this research, we consider Chinese mutual funds that were publicly traded for at least 12 consecutive months in the period 2012-01-01 to 2018-12-31. We select a total of 697 funds with capital size larger than 100 million RMB. Fund features consist of 70 technical factors in total, computed over different rolling windows (10, 14, 21, 28, 42, 63 and 90 trading days), including ROC, EMA, MDD, STDDEV, Sharpe ratio, Sortino ratio, Calmar ratio, RSI, MACD, and Stability [https://www.investopedia.com/]. All features are normalized to have zero mean and unit variance.
IV-E Backtest Protocol
For mutual fund management, the typical rebalancing interval is one quarter of a year, so we split the original sequential fund data into quarterly segments. We use a rolling-window prediction scheme: in the training set, we learn a distance metric from the 70 technical factors of the previous quarter, with the quarterly return of the current quarter as the target. In the test set, features from the current quarter are used to predict the quarterly return of the next quarter. To validate the learned metric, a simple k-nearest neighbors (k-NN) algorithm is employed, and the predicted return of the next quarter for each fund is based on the metric learned from the training set. The hyper-parameter k of k-NN is set to 10. To avoid overfitting, rolling data before 2017-01-01 are used for validation and data from 2017-01-01 to 2018-12-31 for test. For convenience, the following results are based on both the validation and test sets.
For each fund, given its k nearest neighbor funds, we use the average of the k neighbors' returns in the current quarter as the prediction of the fund's return in the next quarter. At each quarterly rebalance day (the last trading day of each quarter), a top-10 buy trading strategy is used based on the prediction for each fund. In this strategy, we rank all funds by their predicted quarterly return and select the top 10 funds for portfolio construction and rebalancing with equal weights.
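A minimal sketch of the prediction-and-rebalance step described above, assuming a learned metric `A`, training features/returns from the previous quarter and test features from the current quarter (all names are ours):

```python
import numpy as np

def predict_next_quarter_returns(A, X_train, r_train, X_test, k=10):
    """k-NN prediction under the learned metric A: a fund's next-quarter return
    is the average current-quarter return of its k nearest training funds."""
    L = np.linalg.cholesky(A)                 # distance under A: ||L^T (x - y)||
    Zs, Zt = X_train @ L, X_test @ L
    D = np.linalg.norm(Zt[:, None, :] - Zs[None, :, :], axis=-1)
    idx = np.argsort(D, axis=1)[:, :k]        # k nearest neighbours per test fund
    return r_train[idx].mean(axis=1)

def top10_portfolio(pred_returns, fund_ids):
    """Equal-weight top-10 selection at the quarterly rebalance day."""
    top = np.argsort(pred_returns)[::-1][:10]
    return {fund_ids[i]: 0.1 for i in top}
```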
We compare RPDML with the following five distance metrics and a baseline fund index:
- Euclidean distance metric.
- Mahalanobis distance metric, computed as the inverse of the covariance matrix of the training samples (a minimal sketch is given after this list).
- LMNN, a distance metric learning algorithm based on the large margin nearest neighbor classification [34].
- ITML, an information-theoretic distance metric learning algorithm [35].
- GMML, a distance metric learning algorithm, the learned metric can be viewed as “matrix geometric mean” on the Riemannian manifold of positive definite matrices [37].
- Baseline fund index, CSI Partial Equity Funds Index [http://www.csindex.com.cn/en/indices/index-detail/930950]
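For reference, the Mahalanobis baseline in the list above is simply the inverse of the (possibly regularized) sample covariance; a minimal sketch, where the ridge term is our own numerical safeguard:

```python
import numpy as np

def mahalanobis_metric(X, ridge=1e-6):
    """Baseline metric: inverse of the sample covariance of the training features."""
    C = np.cov(X, rowvar=False)
    return np.linalg.inv(C + ridge * np.eye(C.shape[0]))
```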
IV-F Result
Algorithm | Euclidean | Mahalanobis | LMNN | ITML | GMML | RPDML
---|---|---|---|---|---|---
IC (mean ± std) | | | | | |
In the FOF setting, we care more about the order of predicted returns than their absolute values. So we calculate the Spearman rank-order correlation between the predicted return and the true return for each algorithm; a higher correlation means better predictive power. In the financial community, this correlation is often called the information coefficient (IC). The calculation is done on a rolling basis, and we show the mean and standard deviation of the IC in Table 1. We can see that RPDML achieves the highest mean correlation. The backtest performance using different prediction models is shown in Figure 1. We also show the CSI Partial Equity Funds Index (dashed curves) as the baseline fund index, which reflects the overall performance of all partial equity funds in China's financial market. From the top panel, we observe that all the experimental algorithms outperform the baseline index, and among them RPDML achieves the best performance, with a total accumulated return of 148% over the whole backtest period, while the worst portfolio, based on the Euclidean distance metric, achieves only 25%, even less than the CSI Partial Equity Funds Index. We also plot the one-year rolling maximum drawdown (MDD) in the bottom panel; the MDD of the RPDML algorithm is quite low considering its superior accumulated return.
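The rolling IC computation described above reduces to a Spearman rank correlation per quarter; a minimal sketch using SciPy (the list-of-quarters data layout is our assumption):

```python
import numpy as np
from scipy.stats import spearmanr

def rolling_ic(pred_by_quarter, true_by_quarter):
    """Information Coefficient: Spearman rank correlation between predicted and
    realized quarterly returns, computed quarter by quarter."""
    ics = [spearmanr(p, t).correlation
           for p, t in zip(pred_by_quarter, true_by_quarter)]
    return float(np.mean(ics)), float(np.std(ics))
```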
In Figure 2, we show the annual returns of the FOF portfolio of each algorithm. We can see that the proposed algorithm RPDML achieved the highest returns in most years. Besides, we also notice that LMNN performed quite well in some years.
V Conclusion
In this paper, we propose a Riemannian primal-dual algorithm based on proximal operator for optimizing a smooth, convex, lower semicontinuous function on Riemannian manifolds with constraints. We prove convergence of the proposed algorithm and show its non-asymptotic rate. By utilizing the proposed primal-dual optimization technique, we propose a novel metric learning algorithm which learns an optimal feature transformation matrix in the Riemannian space of positive definite matrices. Preliminary experimental results on an optimal fund selection problem in FOF management for quantitative investment showed its efficacy.
VI Appendix
VI-A Proof of Theorem 1
Due to the convexity of the Lagrangian with respect to the primal variable, for any point on the manifold we have
(12)
where and .
Because $\mathcal{M}$ is a Hadamard manifold, which has non-positive curvature, we have
(13)
(see Proposition 1 of ref. [44])
From eq. (3), we have . So
.
Since this term is zero or positive, dropping it yields
.
Let $(x^{\ast}, \lambda^{\ast})$ be the saddle (min-max) point, which satisfies the saddle-point inequality. Evaluating the above at the saddle point, we have
(15)
Multiplying eq. (15) by the step size and summing over $t$,
(16)
For any dual variable $\lambda$, we have
Summing over $t$ and using the telescoping property of the sum, we have
.
Using the fact that and , we have
(17)
Because the Lagrangian is concave with respect to the dual variable $\lambda$,
.
Substituting this into eq. (17), we have
(18)
Now let’s expand the left side of eq. (20),
Since the remaining terms are nonnegative, by removing the positive terms we have the following inequality
(21)
By maximizing , we have , and
Since this holds for any value of $\lambda$ in its domain,
By removing the maximum term, which is positive, from the l.h.s. of the above equation, we have
Since
we have
(23)
End of proof.
VI-B Proof of Corollary 1
The proof simply follows the proof of Corollary 8 in ref.[24].
First we have the following bounds:
By using the fact that , we have
End of proof.
VI-C Riemannian Gradient with Retraction
We can write the Euclidean gradient as
, where the operator constructs a block diagonal matrix whose diagonal blocks are formed from the columns / rows of its argument; in other words, each such block is the covariance matrix of a sample pair vector (assuming the expectation of the pair vector is zero).
The remaining term is a tensor dot product between the dual variable vector and a vector whose $i$-th element is the covariance matrix of the $i$-th sample pair.
1) Riemannian Gradient
First we show that in our case, .
Let us define a factorization of the metric whose orthonormal factor lies on the Stiefel manifold of real, orthonormal matrices; the Stiefel manifold is a Riemannian manifold with tangent space [45, 3]:
, where is an arbitrary matrix, , and .
For a given objective function which depends on an input matrix $W$, we write its Euclidean gradient with respect to $W$, and obtain its Riemannian gradient by projecting the Euclidean gradient onto the tangent space of the manifold:
, where , , , and .
For our problem, since the metric is a real symmetric positive definite matrix, it admits an eigendecomposition in which the eigenvector factor is an orthogonal matrix and the diagonal factor contains the eigenvalues, which are greater than or equal to zero.
Using the relations above, the projection of the Euclidean gradient onto the tangent space is
.
2) Retraction
With the Riemannian gradient shown above, we compute the next iterate using a retraction. From Ref. [4], any tangent vector admits a retraction back onto the manifold; in our case it is computed from the eigendecomposition of the updated matrix, where $\lambda_i$ and $v_i$ are its $i$-th eigenvalue and eigenvector.
The gradient calculation and retraction shown above are repeated until the metric converges.
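One common way to realize an eigendecomposition-based retraction like the one described above is to symmetrize the updated matrix and floor its eigenvalues at a small positive value so that the iterate stays positive definite. The sketch below is a generic recipe under that assumption, not necessarily the exact retraction of Ref. [4].

```python
import numpy as np

def retract_spd(W, xi, eps=1e-10):
    """Retraction sketch: move in the tangent direction xi, then map back to the
    SPD cone via eigendecomposition with eigenvalues floored at eps."""
    S = W + xi
    S = (S + S.T) / 2.0                # enforce symmetry against round-off
    w, V = np.linalg.eigh(S)           # w: eigenvalues, V: eigenvectors
    w = np.maximum(w, eps)
    return V @ np.diag(w) @ V.T
```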
References
- [1] Y. Nishimori, S. Akaho, and M. D. Plumbley, “Riemannian optimization method on the flag manifold for independent subspace analysis,” in International Conference on Independent Component Analysis and Signal Separation. Springer, 2006, pp. 295–302.
- [2] P.-A. Absil, C. G. Baker, and K. A. Gallivan, “Trust-region methods on riemannian manifolds,” Foundations of Computational Mathematics, vol. 7, no. 3, pp. 303–330, 2007.
- [3] B. Vandereycken, “Low-rank matrix completion by Riemannian optimization,” SIAM Journal on Optimization, vol. 23, No. 2, pp. 1214–1236, 2013.
- [4] L. Cheng, “Riemannian similarity learning,” in International Conference on Machine Learning, 2013, pp. 540–548.
- [5] A. Cherian and S. Sra, “Riemannian dictionary learning and sparse coding for positive definite matrices,” IEEE transactions on neural networks and learning systems, vol. 28, no. 12, pp. 2859–2871, 2017.
- [6] P.-A. Absil, R. Mahony, and R. Sepulchre, “Optimization on manifolds: Methods and applications,” in Recent Advances in Optimization and its Applications in Engineering. Springer, 2010, pp. 125–144.
- [7] D. Gabay, “Minimizing a differentiable function over a differential manifold,” Journal of Optimization Theory and Applications, vol. 37, no. 2, pp. 177–219, 1982.
- [8] S. T. Smith, “Optimization techniques on riemannian manifolds,” Fields institute communications, vol. 3, no. 3, pp. 113–135, 1994.
- [9] G. de Carvalho Bento, O. P. Ferreira, and P. R. Oliveira, “Unconstrained steepest descent method for multicriteria optimization on riemannian manifolds,” J. Optimization Theory and Applications, vol. 154, no. 1, pp. 88–107, 2012.
- [10] H. Sato and T. Iwai, “A new, globally convergent riemannian conjugate gradient method,” Optimization, vol. 64, no. 4, pp. 1011–1031, 2015.
- [11] Y. Liu, F. Shang, J. Cheng, H. Cheng, and L. Jiao, “Accelerated first-order methods for geodesically convex optimization on riemannian manifolds,” in Advances in Neural Information Processing Systems, 2017, pp. 4868–4877.
- [12] C. Qi, K. A. Gallivan, and P.-A. Absil, “Riemannian bfgs algorithm with applications,” in Recent advances in optimization and its applications in engineering. Springer, 2010, pp. 183–192.
- [13] G. Bécigneul and O. Ganea, “Riemannian adaptive optimization methods,” CoRR, vol. abs/1810.00760, 2018. [Online]. Available: http://arxiv.org/abs/1810.00760
- [14] C. Qi, “Numerical optimization methods on riemannian manifolds,” 2011.
- [15] N. Agarwal, N. Boumal, B. Bullins, and C. Cartis, “Adaptive regularization with cubics on manifolds with a first-order analysis,” arXiv preprint arXiv:1806.00065, 2018.
- [16] M. Schmidt, N. Le Roux, and F. Bach, “Minimizing finite sums with the stochastic average gradient,” Mathematical Programming, vol. 162, no. 1-2, pp. 83–112, 2017.
- [17] R. Johnson and T. Zhang, “Accelerating stochastic gradient descent using predictive variance reduction,” in Advances in neural information processing systems, 2013, pp. 315–323.
- [18] A. Defazio, F. Bach, and S. Lacoste-Julien, “Saga: A fast incremental gradient method with support for non-strongly convex composite objectives,” in Advances in neural information processing systems, 2014, pp. 1646–1654.
- [19] H. Zhang, S. J. Reddi, and S. Sra, “Riemannian svrg: Fast stochastic optimization on riemannian manifolds,” in Advances in Neural Information Processing Systems, 2016, pp. 4592–4600.
- [20] H. Sato, H. Kasai, and B. Mishra, “Riemannian stochastic variance reduced gradient,” arXiv preprint arXiv:1702.05594, 2017.
- [21] H. Kasai, H. Sato, and B. Mishra, “Riemannian stochastic quasi-newton algorithm with variance reduction and its convergence analysis,” arXiv preprint arXiv:1703.04890, 2017.
- [22] A. Hauswirth, S. Bolognani, G. Hug, and F. Dörfler, “Projected gradient descent on riemannian manifolds with applications to online power system optimization,” in 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2016, pp. 225–232.
- [23] J. Zhang, S. Ma, and S. Zhang, “Primal-dual optimization algorithms over riemannian manifolds: an iteration complexity analysis,” arXiv preprint arXiv:1710.02236, 2017.
- [24] M. B. Khuzani and N. Li, “Stochastic primal-dual method on riemannian manifolds of bounded sectional curvature,” in Machine Learning and Applications (ICMLA), 2017 16th IEEE International Conference on. IEEE, 2017, pp. 133–140.
- [25] R. T. Rockafellar, “Augmented lagrangians and applications of the proximal point algorithm in convex programming,” Mathematics of operations research, vol. 1, no. 2, pp. 97–116, 1976.
- [26] N. Parikh, S. Boyd et al., “Proximal algorithms,” Foundations and Trends® in Optimization, vol. 1, no. 3, pp. 127–239, 2014.
- [27] O. Ferreira and P. Oliveira, “Proximal point algorithm on riemannian manifolds,” Optimization, vol. 51, no. 2, pp. 257–270, 2002.
- [28] M. Bačák, “The proximal point algorithm in metric spaces,” Israel Journal of Mathematics, vol. 194, no. 2, pp. 689–701, 2013.
- [29] S. Chen, S. Ma, A. M.-C. So, and T. Zhang, “Proximal gradient method for manifold optimization,” arXiv preprint arXiv:1811.00980, 2018.
- [30] L. P. Eisenhart, Riemannian geometry. Princeton university press, 2016.
- [31] B. Kulis et al., “Metric learning: A survey,” Foundations and Trends® in Machine Learning, vol. 5, no. 4, pp. 287–364, 2013.
- [32] A. Bellet, A. Habrard, and M. Sebban, “A survey on metric learning for feature vectors and structured data,” arXiv preprint arXiv:1306.6709, 2013.
- [33] E. P. Xing, M. I. Jordan, S. J. Russell, and A. Y. Ng, “Distance metric learning with application to clustering with side-information,” in Advances in neural information processing systems, 2003, pp. 521–528.
- [34] K. Q. Weinberger, J. Blitzer, and L. K. Saul, “Distance metric learning for large margin nearest neighbor classification,” in Advances in neural information processing systems, 2006, pp. 1473–1480.
- [35] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon, “Information-theoretic metric learning,” in Proceedings of the 24th international conference on Machine learning. ACM, 2007, pp. 209–216.
- [36] S. Wang and R. Jin, “An information geometry approach for distance metric learning,” in Artificial Intelligence and Statistics, 2009, pp. 591–598.
- [37] P. Zadeh, R. Hosseini, and S. Sra, “Geometric mean metric learning,” in International Conference on Machine Learning, 2016, pp. 2464–2471.
- [38] S. Mahadevan, B. Mishra, and S. Ghosh, “A unified framework for domain adaptation using metric learning on manifolds,” arXiv preprint arXiv:1804.10834, 2018.
- [39] E. S. Levitin and B. T. Polyak, “Constrained minimization methods,” Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, vol. 6, no. 5, pp. 787–823, 1966.
- [40] S. Shalev-Shwartz, Y. Singer, and A. Y. Ng, “Online and batch learning of pseudo-metrics,” in Proceedings of the twenty-first international conference on Machine learning. ACM, 2004, p. 94.
- [41] P. Jain, B. Kulis, I. S. Dhillon, and K. Grauman, “Online metric learning and fast similarity search,” in Advances in neural information processing systems, 2009, pp. 761–768.
- [42] R. A. DeFusco, D. W. McLeavey, J. E. Pinto, M. J. Anson, and D. E. Runkle, Quantitative investment analysis. John Wiley & Sons, 2015.
- [43] E. F. Fama and K. R. French, “The capital asset pricing model: Theory and evidence,” Journal of economic perspectives, vol. 18, no. 3, pp. 25–46, 2004.
- [44] G. C. Bento, O. P. Ferreira, and J. G. Melo, “Iteration-complexity of gradient, subgradient and proximal point methods on riemannian manifolds,” Journal of Optimization Theory and Applications, vol. 173, no. 2, pp. 548–562, 2017.
- [45] P. A. Absil, R. Mahony, and R. Sepulchre, Optimization algorithms on matrix manifolds. Princeton University Press, 2009.