ACML 2021
Vector Transport Free Riemannian LBFGS for Optimization on Symmetric Positive Definite Matrix Manifolds
Abstract
This work concentrates on optimization on Riemannian manifolds. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) algorithm is a commonly used quasi-Newton method for numerical optimization in Euclidean spaces. Riemannian LBFGS (RLBFGS) is an extension of this method to Riemannian manifolds. RLBFGS involves computationally expensive vector transports as well as unfolding recursions using adjoint vector transports. In this article, we propose two mappings in the tangent space using the inverse second root and the Cholesky decomposition. These mappings make both the vector transport and the adjoint vector transport identity and therefore isometric. The identity vector transport makes RLBFGS less computationally expensive, and its isometry is also very useful in the convergence analysis of RLBFGS. Moreover, under the proposed mappings, the Riemannian metric reduces to the Euclidean inner product, which is much less computationally expensive. We focus on the manifold of Symmetric Positive Definite (SPD) matrices, which is beneficial in various fields such as data science and statistics. This work opens a research opportunity for extension of the proposed mappings to other well-known manifolds.
Keywords: LBFGS, Riemannian optimization, positive definite manifolds, isometric vector transport, quasi-Newton's method
1 Introduction
Various numerical optimization methods have been developed for Euclidean spaces, and they can be categorized into first-order and second-order methods (Nocedal and Wright, 2006). Examples of the former category are steepest descent and gradient descent, and an example of the latter group is Newton's method. Computation of the Hessian matrix is usually expensive in Newton's method, encouraging many practical problems to either use quasi-Newton methods for approximating the Hessian matrix or use non-linear conjugate gradient (Hestenes and Stiefel, 1952). The most well-known algorithm for quasi-Newton optimization is Broyden-Fletcher-Goldfarb-Shanno (BFGS) (Fletcher, 2013). Limited-memory BFGS (LBFGS) is a simplified version of BFGS which utilizes less memory (Nocedal, 1980; Liu and Nocedal, 1989). It has recursive unfoldings which approximate the descent directions in optimization.
Unlike Euclidean spaces, in which the optimization directions lie in a linear coordinate system, Riemannian spaces have curved coordinates. Recently, the extension of optimization methods from Euclidean spaces to Riemannian manifolds has received extensive attention in the literature (Absil et al., 2009; Boumal, 2020; Hu et al., 2020). For example, Euclidean BFGS has been extended to Riemannian manifolds, named Riemannian BFGS (RBFGS) (Qi et al., 2010); its convergence has been proven (Ring and Wirth, 2012; Huang et al., 2015) and its properties have been analyzed in the literature (Seibert et al., 2013). As vector transport is computationally expensive in RBFGS, cautious RBFGS was proposed (Huang et al., 2016), which ignores the curvature condition in the Wolfe conditions (Wolfe, 1969) and only checks the Armijo condition (Armijo, 1966). Since the curvature condition is what guarantees that the approximation of the Hessian remains positive definite, cautious RBFGS compensates for dropping it by checking a cautious condition (Li and Fukushima, 2001) before updating the approximation of the Hessian. This cautious RBFGS is used in the Manopt optimization toolbox (Boumal et al., 2014). Another approach is an extension of the Euclidean LBFGS to Riemannian manifolds, named Riemannian LBFGS (RLBFGS), which uses both Wolfe conditions (Wolfe, 1969) in linesearch and can be found in (Sra and Hosseini, 2015, 2016; Hosseini and Sra, 2020). Some other direct extensions of Euclidean BFGS to Riemannian spaces exist (e.g., see (Ji, 2007, Chapter 7)).
In this paper, we address the computationally expensive parts of the RLBFGS algorithm, which are the computation of vector transports, their adjoints, and Riemannian metrics. To achieve this, we propose two mappings in the tangent space which make RLBFGS free of vector transport. We name the obtained algorithm Vector Transport Free (VTF)-RLBFGS. One mapping uses the inverse second root and the other uses the Cholesky decomposition, which is computationally very efficient (Golub and Van Loan, 2013). The proposed mappings make both the vector transport and the adjoint vector transport, which are used in RLBFGS, identity. This reduction of transports to identity makes optimization much less computationally expensive. Moreover, as the vector transports become identity, they are isometric, which is a suitable property mostly used in the convergence proofs of the RBFGS and RLBFGS algorithms (Ring and Wirth, 2012; Huang et al., 2015). Furthermore, under the proposed mappings, the Riemannian metric reduces to the Euclidean inner product, which is much less computationally expensive. In this paper, we concentrate on the Symmetric Positive Definite (SPD) manifolds (Sra and Hosseini, 2015, 2016; Bhatia, 2009), which are very useful in data science and machine learning, such as in mixture models (Hosseini and Sra, 2020; Hosseini and Mash'al, 2015). This paper opens a new research path for extension of the proposed mappings to other well-known manifolds such as Grassmann and Stiefel (Edelman et al., 1998).
The remainder of this paper is organized as follows. Section 2 reviews the notations and technical background on Euclidean LBFGS, the Wolfe conditions, Riemannian LBFGS, and the SPD manifold. The proposed mappings using the inverse second root and the Cholesky decomposition are introduced in Sections 3 and 4, respectively. Section 5 provides an analytical discussion and complexity analysis. Simulation results are reported in Section 6. Finally, Section 7 concludes the paper and proposes possible future directions.
2 Background and Notations
2.1 Euclidean BFGS and LBFGS
Consider minimization of a cost function $f(\mathbf{x})$ where the point $\mathbf{x}$ belongs to some domain in $\mathbb{R}^d$. In Newton's method, the descent direction at iteration $k$ is calculated as $\mathbf{p}_k = -\mathbf{B}_k^{-1}\mathbf{g}_k$ (Nocedal and Wright, 2006), where $\mathbf{B}_k$ is the Hessian or an approximation of the Hessian and $\mathbf{g}_k := \nabla f(\mathbf{x}_k)$ is the gradient of the function at iteration $k$. The Euclidean BFGS method is a quasi-Newton method which approximates the Hessian matrix as (Fletcher, 2013; Nocedal and Wright, 2006)
$\mathbf{B}_{k+1} = \mathbf{B}_k - \frac{\mathbf{B}_k \mathbf{s}_k \mathbf{s}_k^\top \mathbf{B}_k}{\mathbf{s}_k^\top \mathbf{B}_k \mathbf{s}_k} + \frac{\mathbf{y}_k \mathbf{y}_k^\top}{\mathbf{y}_k^\top \mathbf{s}_k},$
where $\mathbf{s}_k := \mathbf{x}_{k+1} - \mathbf{x}_k$ and $\mathbf{y}_k := \mathbf{g}_{k+1} - \mathbf{g}_k$. The descent direction is then $\mathbf{p}_k = -\mathbf{B}_k^{-1}\mathbf{g}_k$, whose expression was provided above.
The Euclidean LBFGS calculates the descent direction recursively, where it uses the approximation of the inverse Hessian as (Nocedal, 1980; Liu and Nocedal, 1989)
$\mathbf{H}_{k+1} = (\mathbf{I} - \rho_k \mathbf{s}_k \mathbf{y}_k^\top)\, \mathbf{H}_k\, (\mathbf{I} - \rho_k \mathbf{y}_k \mathbf{s}_k^\top) + \rho_k \mathbf{s}_k \mathbf{s}_k^\top,$
where $\mathbf{H}_k$ denotes the approximation of the inverse of the Hessian at iteration $k$, $\rho_k := 1/(\mathbf{y}_k^\top \mathbf{s}_k)$, and the recursion is initialized with (a scaled) identity matrix $\mathbf{I}$. The LBFGS algorithm updates the approximation of the inverse of the Hessian matrix recursively, and for that it always stores a memory window of the $m$ most recent pairs $(\mathbf{s}_i, \mathbf{y}_i)$ (Liu and Nocedal, 1989).
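The following MATLAB function is a minimal illustrative sketch of the standard Euclidean LBFGS two-loop recursion described above; it is not the paper's code, and the function name lbfgs_direction and the cell arrays S and Yv (oldest pair first, newest pair last) are names chosen here for illustration.

```matlab
% Sketch of the Euclidean LBFGS two-loop recursion: turns the current
% gradient g into a descent direction using the m stored pairs (s_i, y_i).
function p = lbfgs_direction(g, S, Yv)
    m = numel(S);
    q = g;  alpha = zeros(m, 1);  rho = zeros(m, 1);
    for i = m:-1:1                          % first loop: newest pair to oldest
        rho(i)   = 1 / (Yv{i}' * S{i});
        alpha(i) = rho(i) * (S{i}' * q);
        q = q - alpha(i) * Yv{i};
    end
    gamma = (S{m}' * Yv{m}) / (Yv{m}' * Yv{m});   % initial inverse-Hessian scaling
    r = gamma * q;
    for i = 1:m                             % second loop: oldest pair to newest
        beta = rho(i) * (Yv{i}' * r);
        r = r + S{i} * (alpha(i) - beta);
    end
    p = -r;                                 % descent direction p = -H_k g
end
```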
2.2 Linesearch and Wolfe Conditions
After finding the descent direction $\mathbf{p}_k$ at each iteration of optimization, one needs to know what step size $\alpha_k$ should be taken in that direction. Linesearch should be performed to find the largest step, for faster progress, satisfying the Wolfe conditions (Wolfe, 1969), which are
$f(\mathbf{x}_k + \alpha_k \mathbf{p}_k) \le f(\mathbf{x}_k) + c_1 \alpha_k\, \mathbf{g}_k^\top \mathbf{p}_k$ and $\nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)^\top \mathbf{p}_k \ge c_2\, \mathbf{g}_k^\top \mathbf{p}_k,$
where the parameters are recommended to be $c_1 = 10^{-4}$ and $c_2 = 0.9$ (Nocedal and Wright, 2006). The former condition is the Armijo condition, which checks whether the cost decreases sufficiently (Armijo, 1966), while the latter is the curvature condition, making sure that the slope is reduced sufficiently in a way that the approximation of the Hessian remains positive definite. Note that there also exists a strong curvature condition, i.e., $|\nabla f(\mathbf{x}_k + \alpha_k \mathbf{p}_k)^\top \mathbf{p}_k| \le c_2\, |\mathbf{g}_k^\top \mathbf{p}_k|$.
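As a small illustrative sketch (not the paper's code), the MATLAB function below checks both Wolfe conditions for a candidate step size; the handles f and gf (cost and gradient) and the name wolfe_ok are assumptions made for this example.

```matlab
% Check the two Wolfe conditions for a candidate step size alpha,
% with the recommended c1 = 1e-4 and c2 = 0.9.
function ok = wolfe_ok(f, gf, x, p, alpha, c1, c2)
    fx = f(x);  gx = gf(x);
    x_new = x + alpha * p;
    armijo    = f(x_new) <= fx + c1 * alpha * (gx' * p);   % sufficient decrease
    curvature = gf(x_new)' * p >= c2 * (gx' * p);          % curvature condition
    ok = armijo && curvature;
end
```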
2.3 Riemannian Notations
Consider a Riemannian manifold denoted by $\mathcal{M}$. At every point $x \in \mathcal{M}$, there is a tangent space to the manifold, denoted by $T_x\mathcal{M}$. A tangent space includes tangent vectors. We denote a tangent vector by $\xi$. For $\xi, \eta \in T_x\mathcal{M}$, a metric on this manifold is the inner product defined on the manifold and is denoted by $g_x(\xi, \eta)$. Note that the gradient of a cost function whose domain is a manifold is a tangent vector in the tangent space and is denoted by $\mathrm{grad}\, f(x)$ for the point $x$. Vector transport is an operator which maps a tangent vector from its tangent space at a point $x$ to the tangent space at another point $y$. We denote this vector transport by $\mathcal{T}_{x \to y}$. Now, consider a point $x$ on a manifold and a descent direction $\xi$ in the tangent space $T_x\mathcal{M}$. The retraction $R_x(\xi)$ retracts or maps the direction in the tangent space onto the manifold $\mathcal{M}$. The exponential map, denoted by $\mathrm{Exp}_x(\xi)$, is also capable of this mapping by moving along the geodesic. One can use the second-order Taylor expansion of the exponential map, which is positivity-preserving (Jeuris et al., 2012) for the case of the SPD manifold, to approximate the exponential map. In this paper, $\mathrm{tr}(\cdot)$ denotes the trace of a matrix and $\|\cdot\|_F$ denotes the Frobenius norm.
2.4 Riemannian BFGS and LBFGS
The Riemannian extension of Euclidean BFGS (RBFGS) (Qi et al., 2010; Ring and Wirth, 2012; Huang et al., 2015) performs the update of the Hessian approximation (see Section 2.1) using Riemannian operators. RBFGS methods check both Wolfe conditions (see Section 2.2), which is computationally expensive. Cautious RBFGS (Huang et al., 2016) ignores the curvature condition and only checks the Armijo condition for linesearch. However, for ensuring that the Hessian approximation remains positive definite, it checks a cautious condition (Li and Fukushima, 2001) before updating the Hessian approximation.
Riemannian LBFGS (Sra and Hosseini, 2015, 2016; Hosseini and Sra, 2020) performs the recursions of LBFGS in the Riemannian space for finding the descent direction and checks both Wolfe conditions in linesearch. In every iteration of optimization, the recursion starts from the Riemannian gradient of the cost at the current point. Let the recursive function GetDirection(·) return the descent direction. Inside every step of this recursion, we have (Hosseini and Sra, 2020, Algorithm 3):
(1)
(2)
(3)
where, inspired by the $\mathbf{s}_k$ and $\mathbf{y}_k$ introduced for Euclidean spaces in Section 2.1, we have:
(4)
(5)
(6)
Note that the new point in every iteration is found by retraction, or an exponential map, of the current point along the descent direction onto the manifold. According to Eq. (2) in the recursion and Eq. (5), RLBFGS involves both the adjoint vector transport and the vector transport, which are computationally expensive. Moreover, Eqs. (1) and (3) show that the Riemannian metric is utilized many times inside the recursions. Our proposed mappings simplify all of the vector transport, the adjoint vector transport, and the metric used in the RLBFGS algorithm, as illustrated by the sketch below.
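To make the role of these operators concrete, the following MATLAB sketch shows a Riemannian variant of the LBFGS two-loop recursion. It is not the paper's recursive routine of (Hosseini and Sra, 2020, Algorithm 3), and the handles transp and inner are hypothetical, but it illustrates where vector transports and metric evaluations repeatedly enter. Under the mappings proposed in Sections 3 and 4, transp becomes the identity and inner reduces to the Euclidean inner product.

```matlab
% Sketch only: a Riemannian LBFGS-style two-loop recursion.  transp{i} is
% assumed to transport the stored pair (S{i}, Yv{i}) into the current tangent
% space and inner(a, b) evaluates the metric at the current point; both are
% hypothetical function handles, not an existing API.
function p = rlbfgs_direction(grad, S, Yv, transp, inner)
    m = numel(S);
    q = grad;  alpha = zeros(m, 1);  rho = zeros(m, 1);
    for i = m:-1:1
        s_i = transp{i}(S{i});   y_i = transp{i}(Yv{i});   % vector transports
        rho(i)   = 1 / inner(y_i, s_i);                    % Riemannian metric
        alpha(i) = rho(i) * inner(s_i, q);
        q = q - alpha(i) * y_i;
    end
    s_m = transp{m}(S{m});  y_m = transp{m}(Yv{m});
    r = (inner(s_m, y_m) / inner(y_m, y_m)) * q;           % initial scaling
    for i = 1:m
        s_i = transp{i}(S{i});   y_i = transp{i}(Yv{i});
        beta = rho(i) * inner(y_i, r);
        r = r + s_i * (alpha(i) - beta);
    end
    p = -r;
end
```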
2.5 Symmetric Positive Definite (SPD) Manifold
Consider the SPD manifold $\mathcal{M} := \{X \in \mathbb{R}^{d \times d} : X = X^\top,\ X \succ 0\}$ (Sra and Hosseini, 2015, 2016), every point of which is an SPD matrix, i.e., $X = X^\top$ and $X \succ 0$. It can be shown that the tangent space to the SPD manifold is the space of symmetric matrices, i.e., $T_X\mathcal{M} = \{\xi \in \mathbb{R}^{d \times d} : \xi = \xi^\top\}$ (Bhatia, 2009). In this paper, we focus on SPD manifolds, which are widely used in data science. The operators for the metric, gradient, exponential map, and vector transport on SPD manifolds are listed in Table 1 (Sra and Hosseini, 2016; Hosseini and Sra, 2020). In this table, $\nabla f(X)$ denotes the (symmetric) Euclidean gradient, and $L_X$ and $L_Y$ are the lower-triangular matrices in the Cholesky decompositions of the points $X$ and $Y$, respectively.
Table 1: Riemannian operators of the SPD manifold without the proposed mappings and under the two proposed mappings (the mapped forms are derived in Propositions 3.1 and 4.1).

| Operator | No mapping | Mapping by inverse second root | Mapping by Cholesky decomposition |
|---|---|---|---|
| Mapping | $\xi$ | $\tilde{\xi} = X^{-1/2}\, \xi\, X^{-1/2}$ | $\tilde{\xi} = L_X^{-1}\, \xi\, L_X^{-\top}$, with $X = L_X L_X^\top$ |
| Metric, $g_X(\xi, \eta)$ | $\mathrm{tr}(X^{-1} \xi X^{-1} \eta)$ | $\mathrm{tr}(\tilde{\xi}\, \tilde{\eta})$ | $\mathrm{tr}(\tilde{\xi}\, \tilde{\eta})$ |
| Gradient, $\mathrm{grad}\, f(X)$ | $X\, \nabla f(X)\, X$ | $X^{1/2}\, \nabla f(X)\, X^{1/2}$ | $L_X^\top\, \nabla f(X)\, L_X$ |
| Exponential map, $\mathrm{Exp}_X(\xi)$ | $X^{1/2} \exp(X^{-1/2} \xi X^{-1/2})\, X^{1/2}$ | $X^{1/2} \exp(\tilde{\xi})\, X^{1/2}$ | $L_X \exp(\tilde{\xi})\, L_X^\top$ |
| Vector transport, $\mathcal{T}_{X \to Y}(\xi)$ | $E\, \xi\, E^\top$ with $E = Y^{1/2} X^{-1/2}$ or $E = L_Y L_X^{-1}$ | $\tilde{\xi}$ (identity) | $\tilde{\xi}$ (identity) |
| Approx. Euclidean retraction, $R_X(\xi)$ | $X + \xi + \frac{1}{2}\, \xi X^{-1} \xi$ | $X^{1/2} \big(I + \tilde{\xi} + \frac{1}{2}\tilde{\xi}^2\big)\, X^{1/2}$ | $L_X \big(I + \tilde{\xi} + \frac{1}{2}\tilde{\xi}^2\big)\, L_X^\top$ |
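As a concrete illustration of the operators in the "No mapping" column of Table 1, the following MATLAB sketch (not the paper's code) evaluates them at random SPD points; all variable names are chosen here only for illustration.

```matlab
% Riemannian operators of the SPD manifold (Table 1, "No mapping" column).
d = 4;
A = randn(d);  X = A*A' + d*eye(d);        % a random SPD point
B = randn(d);  Y = B*B' + d*eye(d);        % another SPD point
xi  = randn(d);  xi  = (xi  + xi' )/2;     % symmetric tangent vectors
eta = randn(d);  eta = (eta + eta')/2;

metric = trace((X\xi) * (X\eta));          % g_X(xi, eta) = tr(X^-1 xi X^-1 eta)

Xh = sqrtm(X);
expmap = Xh * expm(Xh\xi/Xh) * Xh;         % Exp_X(xi)
retr   = X + xi + 0.5*(xi*(X\xi));         % second-order Taylor approx. of Exp_X(xi)

E = chol(Y, 'lower') / chol(X, 'lower');   % E = L_Y * inv(L_X)
transported = E * xi * E';                 % vector transport T_{X->Y}(xi)
```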
3 Vector Transport Free Riemannian LBFGS Using Mapping by Inverse Second Root
We propose two mappings on tangent vectors in the tangent space, where the first mapping is by the inverse second root. For every point $X$ on the SPD manifold, our first proposed mapping of a tangent vector $\xi \in T_X\mathcal{M}$ is:
$\tilde{\xi} := X^{-1/2}\, \xi\, X^{-1/2}, \qquad (7)$
where the mapped tangent vector still remains in the tangent space, i.e., $\tilde{\xi}^\top = X^{-1/2}\, \xi^\top X^{-1/2} = \tilde{\xi}$, so $\tilde{\xi} \in T_X\mathcal{M}$. It is important that the proposed mapping is bijective and keeps the tangent vector in the tangent space while it simplifies the vector transport, the adjoint vector transport, and the metric.
Under the proposed mapping (7), the Riemannian operators on the SPD manifold are modified and mostly simplified. These operators are listed in Table 1. In the following, we provide proofs for these modifications.
Proposition 3.1.
After mapping (7), we have:
- Metric: the metric on the SPD manifold is reduced to the Euclidean inner product, i.e., $g_X(\xi, \eta) = \mathrm{tr}(\tilde{\xi}\tilde{\eta}) = \langle\tilde{\xi}, \tilde{\eta}\rangle$.
- Gradient: the gradient on the SPD manifold is changed to $X^{1/2}\, \nabla f(X)\, X^{1/2}$, where $\nabla f(X)$ denotes the Euclidean gradient.
- Vector transport: the vector transport is changed to identity, i.e., $\mathcal{T}_{X \to Y}(\tilde{\xi}) = \tilde{\xi}$; hence, optimization becomes vector transport free.
- Exponential map: the exponential map on the SPD manifold becomes $\mathrm{Exp}_X(\tilde{\xi}) = X^{1/2} \exp(\tilde{\xi})\, X^{1/2}$.
- Adjoint vector transport: the adjoint of the vector transport on the SPD manifold becomes the same as the vector transport, i.e., identity. In other words, if before mapping the adjoint vector transport $\mathcal{T}^{\sharp}$ is defined by $g_X(\xi, \mathcal{T}^{\sharp}_{X \to Y}\, \eta) = g_Y(\mathcal{T}_{X \to Y}\, \xi, \eta)$ (Ring and Wirth, 2012), then after the mapping we have $\mathcal{T}^{\sharp}_{X \to Y} = \mathcal{T}_{X \to Y} = \mathrm{id}$.
- Retraction: the approximation of the Euclidean retraction, using the second-order Taylor expansion, on the SPD manifold becomes $R_X(\tilde{\xi}) = X^{1/2} \big(I + \tilde{\xi} + \frac{1}{2}\tilde{\xi}^2\big)\, X^{1/2}$.
Proof 3.2.
Metric: $g_X(\xi, \eta) \overset{(a)}{=} \mathrm{tr}(X^{-1}\xi X^{-1}\eta) = \mathrm{tr}(X^{-1} X^{1/2}\tilde{\xi}X^{1/2}\, X^{-1} X^{1/2}\tilde{\eta}X^{1/2}) = \mathrm{tr}(X^{-1/2}\tilde{\xi}\tilde{\eta}X^{1/2}) \overset{(b)}{=} \mathrm{tr}(\tilde{\xi}\tilde{\eta})$, where $(a)$ is because of the definition of the metric on SPD manifolds (see Table 1) and $(b)$ is thanks to the cyclic property of trace.
Gradient: $X^{-1/2}\, \mathrm{grad}f(X)\, X^{-1/2} \overset{(a)}{=} X^{-1/2}\big(X\, \nabla f(X)\, X\big)X^{-1/2} = X^{1/2}\, \nabla f(X)\, X^{1/2}$, where $(a)$ is due to the definition of the gradient on the SPD manifold (see Table 1).
Vector transport: $Y^{-1/2}\, \mathcal{T}_{X \to Y}(\xi)\, Y^{-1/2} \overset{(a)}{=} Y^{-1/2}\big(Y^{1/2}X^{-1/2}\big)\, \xi\, \big(X^{-1/2}Y^{1/2}\big)Y^{-1/2} = X^{-1/2}\, \xi\, X^{-1/2} = \tilde{\xi}$, where $(a)$ is due to the definition of the vector transport on the SPD manifold with $E = Y^{1/2}X^{-1/2}$ (see Table 1); hence, the mapped transported vector equals the mapped vector itself and the transport is identity.
Exponential map: $\mathrm{Exp}_X(\xi) \overset{(a)}{=} X^{1/2} \exp(X^{-1/2}\, \xi\, X^{-1/2})\, X^{1/2} \overset{(b)}{=} X^{1/2} \exp(\tilde{\xi})\, X^{1/2}$, where $(a)$ is shown in (Sra and Hosseini, 2015, Eq. 3.3) and $(b)$ follows from mapping (7) (see also (Sra and Hosseini, 2015, Eq. 3.2) and Table 1).
Retraction: $R_X(\xi) \overset{(a)}{=} X + \xi + \frac{1}{2}\, \xi X^{-1} \xi = X^{1/2}\big(I + \tilde{\xi} + \frac{1}{2}\tilde{\xi}^2\big)X^{1/2}$, where $(a)$ is because of the approximation of the Euclidean retraction, using the second-order Taylor expansion, on SPD manifolds (see Table 1), and the second equality substitutes $\xi = X^{1/2}\tilde{\xi}X^{1/2}$.
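The following MATLAB sketch (not the paper's code) numerically checks two claims of Proposition 3.1 on random SPD points, using the transport with $E = Y^{1/2}X^{-1/2}$ from Table 1; the variable names are illustrative. Both printed errors should be at the level of machine precision.

```matlab
% Numerical check of Proposition 3.1 under the inverse-second-root mapping.
d = 4;
A = randn(d); X = A*A' + d*eye(d);
B = randn(d); Y = B*B' + d*eye(d);
xi  = randn(d); xi  = (xi  + xi' )/2;
eta = randn(d); eta = (eta + eta')/2;

Xh = sqrtm(X); Yh = sqrtm(Y);
xi_t  = Xh\xi/Xh;                          % mapped tangent vector at X, Eq. (7)
eta_t = Xh\eta/Xh;

% metric: tr(X^-1 xi X^-1 eta) equals the Euclidean inner product tr(xi_t*eta_t)
err_metric = trace((X\xi)*(X\eta)) - trace(xi_t*eta_t);

% vector transport with E = Y^{1/2} X^{-1/2} becomes identity on mapped vectors
E = Yh/Xh;
err_transport = norm(Yh\(E*xi*E')/Yh - xi_t, 'fro');

fprintf('metric error: %g, transport error: %g\n', err_metric, err_transport);
```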
4 Vector Transport Free Riemannian LBFGS Using Mapping by Cholesky Decomposition
Our second proposed mapping is by the Cholesky decomposition, which is computationally very efficient. Consider the Cholesky decomposition of a point $X$ on the SPD manifold (Golub and Van Loan, 2013):
$X = L_X L_X^\top, \qquad (8)$
where $L_X$ is the lower-triangular matrix in the Cholesky decomposition. It is noteworthy that MATLAB's solvers for SPD linear systems (such as the backslash operator), which the Manopt toolbox (Boumal et al., 2014) also relies on, apply the Cholesky decomposition internally due to its efficiency.
In the tangent space of every point $X$, the proposed mapping is:
$\tilde{\xi} := L_X^{-1}\, \xi\, L_X^{-\top}, \qquad (9)$
where $\xi \in T_X\mathcal{M}$. Note that, similar to the previous mapping, the tangent matrix is still symmetric under this mapping; hence, it remains in the tangent space of the SPD manifold (Bhatia, 2009). Similar to the previous mapping, under the second proposed mapping (9), the Riemannian operators on the SPD manifold are simplified. These operators can be found in Table 1. In the following, we provide proofs for these operators.
Proposition 4.1.
After mapping (9), we have:
- Metric: the metric on the SPD manifold is reduced to the Euclidean inner product, i.e., $g_X(\xi, \eta) = \mathrm{tr}(\tilde{\xi}\tilde{\eta}) = \langle\tilde{\xi}, \tilde{\eta}\rangle$.
- Gradient: the gradient on the SPD manifold is changed to $L_X^\top\, \nabla f(X)\, L_X$, where $\nabla f(X)$ denotes the Euclidean gradient.
- Vector transport: the vector transport is changed to identity, i.e., $\mathcal{T}_{X \to Y}(\tilde{\xi}) = \tilde{\xi}$; hence, optimization becomes vector transport free.
- Exponential map: the exponential map becomes $\mathrm{Exp}_X(\tilde{\xi}) = L_X \exp(\tilde{\xi})\, L_X^\top$.
- Adjoint vector transport: the adjoint vector transport becomes identity, as we had in Proposition 3.1.
- Retraction: the approximation of the Euclidean retraction by the second-order Taylor expansion on the SPD manifold becomes $R_X(\tilde{\xi}) = L_X \big(I + \tilde{\xi} + \frac{1}{2}\tilde{\xi}^2\big)\, L_X^\top$.
Proof 4.2.
Metric: $g_X(\xi, \eta) \overset{(a)}{=} \mathrm{tr}(X^{-1}\xi X^{-1}\eta) = \mathrm{tr}(L_X^{-\top}L_X^{-1}\, L_X\tilde{\xi}L_X^\top\, L_X^{-\top}L_X^{-1}\, L_X\tilde{\eta}L_X^\top) = \mathrm{tr}(L_X^{-\top}\tilde{\xi}\tilde{\eta}L_X^\top) \overset{(b)}{=} \mathrm{tr}(\tilde{\xi}\tilde{\eta})$, where $(a)$ is because of the definition of the metric on SPD manifolds (see Table 1) and $(b)$ is thanks to the cyclic property of trace.
Gradient: $L_X^{-1}\, \mathrm{grad}f(X)\, L_X^{-\top} \overset{(a)}{=} L_X^{-1}\big(X\, \nabla f(X)\, X\big)L_X^{-\top} = L_X^\top\, \nabla f(X)\, L_X$, where $(a)$ is due to the definition of the gradient on the SPD manifold (see Table 1).
Vector transport: $L_Y^{-1}\, \mathcal{T}_{X \to Y}(\xi)\, L_Y^{-\top} \overset{(a)}{=} L_Y^{-1}\big(L_Y L_X^{-1}\big)\, \xi\, \big(L_X^{-\top}L_Y^\top\big)L_Y^{-\top} = L_X^{-1}\, \xi\, L_X^{-\top} = \tilde{\xi}$, where $(a)$ is due to the definition of the vector transport on the SPD manifold with $E = L_Y L_X^{-1}$ (see Table 1); hence, the transport is identity.
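Analogously, the following MATLAB sketch (not the paper's code) numerically checks the exponential map and the identity vector transport under the Cholesky mapping (9) on random SPD points; both printed errors should be at the level of machine precision.

```matlab
% Numerical check of Proposition 4.1 under the Cholesky mapping.
d = 4;
A = randn(d); X = A*A' + d*eye(d);
B = randn(d); Y = B*B' + d*eye(d);
xi = randn(d); xi = (xi + xi')/2;

LX = chol(X, 'lower');  LY = chol(Y, 'lower');
xi_t = LX\xi/LX';                          % mapped tangent vector, Eq. (9)

% exponential map in the mapped representation: L_X expm(xi_t) L_X'
Exp1 = LX * expm(xi_t) * LX';
Xh = sqrtm(X);
Exp2 = Xh * expm(Xh\xi/Xh) * Xh;           % standard form from Table 1
err_exp = norm(Exp1 - Exp2, 'fro');

% Cholesky-based transport E = L_Y * inv(L_X) becomes identity on mapped vectors
E = LY/LX;
err_transport = norm(LY\(E*xi*E')/LY' - xi_t, 'fro');

fprintf('exp-map error: %g, transport error: %g\n', err_exp, err_transport);
```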
5 Analytical Discussion and Complexity Analysis
Corollary 5.1.
Propositions 3.1 and 4.1 show that under mapping (7) or (9), both the vector transport and the adjoint vector transport are identity. As these transports become identity, they also become isometric, because inner products of vectors do not change under them. Being identity, these transports also preserve the length of vectors under the proposed mappings.
Propositions 3.1 and 4.1 and Corollary 5.1 show that the two proposed mappings simplify the vector transport and the adjoint vector transport to the isometric identity and reduce the Riemannian metric to the Euclidean inner product. These reductions and simplifications decrease the computations significantly during optimization on the manifold. The VTF-RLBFGS algorithm using either of the proposed mappings is shown in Algorithm 1. As the algorithm shows, at every new point, all quantities are computed in the mapped representation because the manifold operators, calculated as in Table 1, are expressed in that representation. This simplifies operators such as the metric and removes vector transports from RLBFGS. In case the Riemannian gradient is given directly by the user to RLBFGS, the Riemannian gradient, which is in the tangent space, should be mapped explicitly by Eq. (7) or (9) at every iteration. However, if the Riemannian gradient is calculated from the Euclidean gradient, it should not be mapped explicitly, since it is already expressed in the mapped representation implicitly because the operators of Table 1 are used in that representation.
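As a rough illustration, a single step of the vector-transport-free scheme might look as follows in MATLAB; this is only a sketch, not Algorithm 1 of the paper, and egrad_f (a handle returning the symmetric Euclidean gradient) and get_direction (the now transport-free LBFGS recursion, in which the metric is the plain Euclidean inner product trace(a*b)) are hypothetical names.

```matlab
% One step of the vector-transport-free scheme under the Cholesky mapping (9).
L = chol(X, 'lower');
rgrad_t = L' * egrad_f(X) * L;          % mapped Riemannian gradient (Table 1)
p_t = get_direction(rgrad_t, S, Yv);    % descent direction in mapped coordinates
alpha = 1;                              % in practice, chosen by Wolfe linesearch
X_new = L * expm(alpha * p_t) * L';     % exponential map in mapped coordinates
% The stored pairs in S and Yv are reused at X_new without any vector transport.
```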
Lemma 5.2.
The identity vector transport obtained under mapping (7) or (9) is a valid vector transport.
Proof 5.3.
A valid vector transport should satisfy three properties (Hosseini and Sra, 2020) (also see (Absil et al., 2009, Definition 8.1.1) and (Boumal, 2020, Definition 10.62)):
(1) The transported vector must lie in the tangent space at the target point. This holds because every tangent space of the SPD manifold is isomorphic to the space of symmetric matrices and the identity vector transport returns the tangent vector itself, which is in the space of symmetric matrices.
(2) The vector transport of a tangent vector from one point to itself should be the same tangent vector. This holds because the vector transport is identity under the proposed mappings.
(3) The vector transport should be linear, which holds because it is equal to the identity map under the proposed mappings.
As the vector transport satisfies the three above properties after applying the mappings, it is a valid transport. Q.E.D.
Proposition 5.4.
The time complexity of the recursion part in RLBFGS is improved from $O(m d^3)$ to $O(m d^2)$ after mapping (7) or (9), where $d$ is the size of the SPD matrices and $m$ is the memory limit, i.e., the maximum number of recursions in RLBFGS (the proof is available in the Supplementary Material). This time improvement shows up better in problems where the computation of the gradients in Eq. (6), or the corresponding step in Algorithm 1, is not dominant in complexity.
6 Simulations
In this section, we evaluate the effectiveness of the proposed mappings, i.e., (7) and (9), in the tangent space. Here, we show that these mappings often improve the performance and speed of RLBFGS. The code of this article is available at https://github.com/bghojogh/LBFGS-Vector-Transport-Free. In our reports, we denote the proposed Vector Transport Free (VTF) RLBFGS by VTF-RLBFGS, where ISR and Cholesky (or Chol.) stand for VTF-RLBFGS using the mapping by the inverse second root and by the Cholesky decomposition, respectively. For RLBFGS, with and without the proposed mappings, we use both Wolfe linesearch conditions. The programming language used for the experiments was MATLAB and the hardware was an Intel Core-i7 CPU with a base frequency of 2.20 GHz and 12 GB of RAM. For every experiment, we performed optimization for ten runs, and the reported results are the average of performances over the runs. We evaluated our mappings on various application problems, explained below.
6.1 Gaussian Mixture Model
Formulation: An optimization problem which we selected for evaluation is the Riemannian optimization for the Gaussian Mixture Model (GMM) without the use of expectation maximization. We employ RLBFGS, with and without the proposed mappings, for fitting the GMM, whose algorithm can be found in (Hosseini and Sra, 2020; Hosseini and Mash'al, 2015). This is a suitable problem for evaluation of the proposed mappings because the covariance matrices are SPD (Bhatia, 2009). For this, we minimize the negative log-likelihood of the GMM, where every covariance matrix is constrained to belong to the SPD matrix manifold (Hosseini and Sra, 2020). For a $d$-dimensional GMM, the optimization problem is:
$\min_{\{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{K}}\ -\sum_{i=1}^{n} \log\Big(\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)\Big), \qquad (10)$
subject to $\Sigma_j \succ 0$, $\pi_j \ge 0$, and $\sum_{j=1}^{K} \pi_j = 1$,
where $n$ denotes the sample size, $K$ denotes the number of components in the mixture model, and $\pi_j$, $\mu_j$, and $\Sigma_j$ are the mixing probability, mean, and covariance of the $j$-th component, respectively. We use the same reformulation trick of (Hosseini and Sra, 2020) to reformulate the cost function of (10). The mixture parameters were initialized using K-means++ (Arthur and Vassilvitskii, 2007) following (Hosseini and Sra, 2020). Three different levels of separation of the Gaussian models, namely low, mid, and high, were used. The reader can refer to (Hosseini and Sra, 2020) for the mathematical details of these separation levels.
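For illustration, a MATLAB sketch (not the paper's code) of the raw cost of Eq. (10) is given below; it assumes mu{j} is a 1-by-d row vector and uses the log-sum-exp trick for numerical stability, whereas the paper optimizes the reformulated cost of (Hosseini and Sra, 2020).

```matlab
% Negative log-likelihood of a GMM, Eq. (10), evaluated on an n-by-d data matrix.
function cost = gmm_nll(data, w, mu, Sigma)
    % w: 1-by-K mixing probabilities; mu{j}: 1-by-d mean; Sigma{j}: d-by-d SPD.
    [n, d] = size(data);
    K = numel(w);
    log_pdf = zeros(n, K);
    for j = 1:K
        dif = data - mu{j};                       % implicit expansion (R2016b+)
        L = chol(Sigma{j}, 'lower');
        z = dif / L';                             % rows of z solve z*L' = dif
        log_pdf(:, j) = log(w(j)) - 0.5*sum(z.^2, 2) ...
            - sum(log(diag(L))) - 0.5*d*log(2*pi);
    end
    mx = max(log_pdf, [], 2);                     % log-sum-exp for stability
    cost = -sum(mx + log(sum(exp(log_pdf - mx), 2)));
end
```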
Table 2: Comparison of RLBFGS with and without the proposed mappings for fitting GMM, where the exponential map is used for retraction (mean ± standard deviation over ten runs).

| K | d | Separation | Algorithm | #iters | conv. time | time diff. std | iter. time | last cost |
|---|---|---|---|---|---|---|---|---|
| 2 | 2 | Low | VTF (ISR) | 53.100±18.248 | 68.380±52.834 | 13.405 | 1.140±0.504 | 0.364±0.444 |
| | | | VTF (Chol.) | 52.100±17.866 | 61.096±48.433 | 17.443 | 1.030±0.463 | 0.364±0.444 |
| | | | RLBFGS | 52.500±16.595 | 65.890±49.208 | – | 1.125±0.458 | 0.364±0.444 |
| | | Mid | VTF (ISR) | 56.400±21.813 | 76.124±69.189 | 10.712 | 1.150±0.574 | 0.657±0.344 |
| | | | VTF (Chol.) | 54.400±19.156 | 68.456±64.290 | 48.016 | 1.099±0.504 | 0.638±0.333 |
| | | | RLBFGS | 57.700±19.844 | 81.957±64.463 | – | 1.250±0.550 | 0.657±0.344 |
| | | High | VTF (ISR) | 25.500±3.064 | 9.518±2.922 | 0.333 | 0.366±0.067 | 0.341±0.371 |
| | | | VTF (Chol.) | 26.000±4.738 | 10.254±5.468 | 3.132 | 0.377±0.108 | 0.341±0.371 |
| | | | RLBFGS | 25.500±3.064 | 10.054±3.141 | – | 0.386±0.073 | 0.341±0.371 |
| | 10 | Low | VTF (ISR) | 77.600±54.175 | 235.615±466.119 | 12.040 | 1.936±1.751 | 4.210±0.889 |
| | | | VTF (Chol.) | 84.100±73.260 | 313.398±714.908 | 257.528 | 2.041±2.150 | 4.209±0.889 |
| | | | RLBFGS | 77.600±52.754 | 242.743±458.641 | – | 2.069±1.733 | 4.209±0.889 |
| | | Mid | VTF (ISR) | 44.600±7.260 | 43.654±16.872 | 4.139 | 0.948±0.208 | 4.262±1.098 |
| | | | VTF (Chol.) | 45.900±8.647 | 45.661±20.088 | 5.615 | 0.955±0.236 | 4.262±1.098 |
| | | | RLBFGS | 45.200±8.080 | 48.104±19.981 | – | 1.025±0.243 | 4.262±1.098 |
| | | High | VTF (ISR) | 44.400±9.252 | 43.684±22.460 | 11.746 | 0.936±0.254 | 3.874±1.395 |
| | | | VTF (Chol.) | 47.100±8.333 | 48.278±21.112 | 8.992 | 0.987±0.240 | 3.874±1.395 |
| | | | RLBFGS | 43.300±7.150 | 43.904±17.626 | – | 0.981±0.225 | 3.874±1.395 |
| 5 | 2 | Low | VTF (ISR) | 152.400±62.819 | 1344.573±999.949 | 649.595 | 7.560±3.406 | 0.260±0.458 |
| | | | VTF (Chol.) | 174.800±114.139 | 2168.886±3090.820 | 2347.949 | 8.680±6.348 | 0.265±0.466 |
| | | | RLBFGS | 166.300±76.884 | 1767.676±1406.454 | – | 8.788±4.427 | 0.267±0.454 |
| | | Mid | VTF (ISR) | 120.700±69.620 | 942.713±1063.075 | 1120.404 | 5.807±3.877 | 0.781±0.207 |
| | | | VTF (Chol.) | 121.600±65.030 | 897.110±988.826 | 1315.410 | 5.720±3.445 | 0.781±0.207 |
| | | | RLBFGS | 136.300±91.493 | 1404.654±1986.798 | – | 7.057±5.378 | 0.764±0.225 |
| | | High | VTF (ISR) | 46.400±17.322 | 94.741±105.045 | 22.071 | 1.739±0.902 | 1.805±0.385 |
| | | | VTF (Chol.) | 46.200±19.803 | 98.065±122.950 | 6.436 | 1.729±1.020 | 1.805±0.385 |
| | | | RLBFGS | 46.200±18.937 | 104.934±126.734 | – | 1.879±1.064 | 1.805±0.385 |
| | 10 | Low | VTF (ISR) | 295.900±86.577 | 5524.495±3173.855 | 2664.627 | 17.299±5.221 | 6.175±0.745 |
| | | | VTF (Chol.) | 292.300±83.828 | 5294.565±3165.735 | 5106.693 | 16.737±5.350 | 6.197±0.718 |
| | | | RLBFGS | 318.800±100.216 | 7083.082±4801.718 | – | 20.279±6.859 | 6.173±0.744 |
| | | Mid | VTF (ISR) | 134.200±54.956 | 1139.332±919.917 | 306.469 | 7.302±3.227 | 6.753±0.708 |
| | | | VTF (Chol.) | 133.300±52.415 | 1100.519±842.840 | 296.996 | 7.183±3.045 | 6.753±0.708 |
| | | | RLBFGS | 135.300±54.965 | 1268.707±967.678 | – | 8.070±3.580 | 6.753±0.708 |
| | | High | VTF (ISR) | 68.600±12.367 | 241.487±87.593 | 18.122 | 3.398±0.764 | 6.599±0.836 |
| | | | VTF (Chol.) | 74.000±14.071 | 279.271±105.828 | 40.158 | 3.632±0.838 | 6.599±0.836 |
| | | | RLBFGS | 68.300±11.982 | 258.475±93.959 | – | 3.661±0.790 | 6.599±0.836 |
Results: We compared RLBFGS performances with and without our proposed mappings. The average number of iterations, convergence time, time per iteration, and the cost value of the last iteration are reported in Table 2, where the exponential map is used for retraction. Results with the Taylor approximation of the exponential map used as retraction are available in Table 1 of the Supplementary Material. In these experiments, we report performances for the values of the number of components K and the dimensionality d listed in Table 2. For the sake of fairness, all three compared algorithms start with the same initial points in every run. The reader can see more extensive experiments for larger sample size and dimensionality in the Supplementary Material. In these tables, we have also provided the standard deviation (std) of the time differences between VTF-RLBFGS and RLBFGS over the ten runs. This value shows how reliable the average convergence-time improvements over RLBFGS are.
Discussion by Varying Separability: As Table 2 and Table 1 of the Supplementary Material show, the proposed ISR mapping converges faster than no mapping most often in all separability levels. Both its time per iteration and its number of iterations are often smaller than with no mapping. These tables also show that the proposed Cholesky mapping converges faster than no mapping in most, although not all, cases. Overall, we see that the two proposed mappings make RLBFGS faster and more efficient most of the time. This improvement in pace is more noticeable for the low and mid separability levels because they are harder cases for convergence. In terms of the quality of the local minimum, the proposed mappings often find the same local minimum as no mapping, but faster. The equality of the found local optima of the three methods is because they use the same RLBFGS algorithm as their base, in addition to having the same initial points. In some cases, the proposed mappings have even found better local minima. Moreover, as expected, the time difference std decreases for higher separability, which is a simpler task.
Discussion by Varying Dimensionality: We can also discuss the results of Table 2 and Table 1 of the Supplementary Material by varying the dimensionality. Tables 2 and 3 of the Supplementary Material also report performance for larger dimensionality and larger sample size. As expected, increasing the dimensionality increases the convergence time, and the faster pace of the proposed mappings, compared to no mapping, becomes more noticeable.
Discussion by Varying the Number of Components: The results of Table 2 and Table 1 of the Supplementary Material can also be interpreted by varying the number of mixture components. A larger number of components makes the optimization problem harder. Hence, the difference in speed between the proposed mappings and no mapping is more noticeable for the larger number of components; although, for both values, the proposed mappings are often faster than no mapping.
Discussion by Varying Retraction Type: We can also compare the performance of the algorithms in terms of the type of retraction. Table 2 and Table 1 of the Supplementary Material report performances where the exponential map and the Taylor approximation of the exponential map are used for retraction, respectively. Comparing these tables shows that using the Taylor approximation usually converges with fewer iterations than using the exponential map. This makes sense because the exponential map requires moving along geodesics. This difference in pace is more obvious for the larger number of components. Our two proposed mappings outperform no mapping for both types of retraction. This shows that our mappings are effective regardless of the details of the operators.
Discussion by Varying the Sample Size: Table 2 and Table 1 of the Supplementary Material report results for the smaller sample size. More experiments for larger sample size and dimensionality can be found in Tables 2 and 3 of the Supplementary Material. Comparing those tables with Table 2 and Table 1 of the Supplementary Material shows that a larger sample size and/or dimensionality takes more time to converge, as expected. Still, our proposed mappings often converge faster than no mapping. The difference in pace is mostly smaller for the larger sample size compared to the smaller sample size. This is because a very large sample size spends most of the time on computation of the cost function, and the difference is not given much chance to show itself in that case.
Table 3: Average number of iterations of the compared algorithms for fitting GMM (mean ± standard deviation over ten runs).

| Algorithm | Low separation, #iters () | Low separation, #iters () | Mid separation, #iters () | Mid separation, #iters () | High separation, #iters () | High separation, #iters () |
|---|---|---|---|---|---|---|
| VTF (ISR) | 51.500±8.885 | 123.400±26.069 | 54.100±20.464 | 106.500±39.328 | 24.500±4.353 | 35.500±6.996 |
| VTF (Chol.) | 54.200±11.263 | 123.000±35.065 | 58.800±24.943 | 107.500±44.490 | 25.100±4.149 | 35.900±7.622 |
| RLBFGS | 52.900±8.825 | 147.400±35.926 | 57.900±29.622 | 110.200±38.064 | 24.400±4.006 | 35.600±7.516 |
| CG | 83.100±22.684 | 142.200±44.236 | 69.500±40.175 | 155.100±48.732 | 28.500±9.192 | 51.600±16.392 |
| EM | 94.400±40.114 | 277.900±142.549 | 118.100±89.794 | 224.000±101.576 | 3.200±0.422 | 3.600±0.966 |
[Figure 1: log-scale cost difference progress of several insightful runs of the compared algorithms.]
Comparison with Other Algorithms: We compared RLBFGS, with and without the proposed mappings, with some other algorithms, i.e., nonlinear Conjugate Gradient (CG) and Expectation Maximization (EM), for fitting GMM. The log-scale cost difference progress of several insightful runs is illustrated in Fig. 1. The average numbers of iterations of the algorithms are compared in Table 3. A complete set of plots for the cost difference progress can be seen in Figs. 1, 2, 3, and 4 of the Supplementary Material. Figs. 1-a and 1-b show that in some cases the ISR mapping is faster than the Cholesky mapping, and in some other cases it is the other way around. As Fig. 1 and Table 3 show, our proposed mappings outperform CG and EM in the number of iterations.
6.2 Geometric Metric Learning
Formulation: As another application, we experiment with geometric metric learning. We implemented an iterative version of (Zadeh et al., 2016), whose regularized problem minimizes
$\sum_{(x_i, x_j) \in \mathcal{S}} (x_i - x_j)^\top A\, (x_i - x_j) + \sum_{(x_i, x_j) \in \mathcal{D}} (x_i - x_j)^\top A^{-1} (x_i - x_j)$, plus a regularization term on $A$ as in (Zadeh et al., 2016),
s.t. $A \succ 0$,
where $\mathcal{S}$ and $\mathcal{D}$ denote the sets of similar and dissimilar points, respectively. We used class labels to randomly sample the points for these sets. This metric learning behaves like a triplet loss in which the intra- and inter-class variances are decreased and increased, respectively, for better discrimination of classes (Ghojogh et al., 2020). The Euclidean gradient of the unregularized part of this problem is $\sum_{\mathcal{S}} (x_i - x_j)(x_i - x_j)^\top - A^{-1} \big(\sum_{\mathcal{D}} (x_i - x_j)(x_i - x_j)^\top\big) A^{-1}$.
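The following MATLAB sketch (not the paper's code) illustrates the unregularized part of this objective and its Euclidean gradient; XS and XD are assumed to be matrices whose rows are the differences (x_i - x_j) of similar and dissimilar pairs, respectively, and the regularization term is omitted.

```matlab
% Unregularized geometric metric learning cost and its Euclidean gradient.
S = XS' * XS;                           % scatter matrix of similar-pair differences
D = XD' * XD;                           % scatter matrix of dissimilar-pair differences
cost  = trace(A * S) + trace(A \ D);    % sum_S d_A^2 + sum_D d_{A^{-1}}^2
egrad = S - (A \ D) / A;                % S - A^{-1} * D * A^{-1}
```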
Results: We evaluated our mappings with the geometric metric learning optimization on three public datasets, i.e., Fisher Iris, USPS digits, and MNIST digits. The average results over ten runs are reported in Table 4. On the Iris data, both mappings converged faster than RLBFGS and reached a better final cost. On the USPS and MNIST data, the Cholesky and ISR mappings outperformed RLBFGS without mapping, respectively.
Table 4: Results of geometric metric learning on three datasets (mean ± standard deviation over ten runs).

| Data | Algorithm | #iters | conv. time | iter. time | last cost |
|---|---|---|---|---|---|
| Iris | VTF (ISR) | 23.500±4.528 | 2.461±1.174 | 0.110±0.070 | 1512.602±347.004 |
| | VTF (Chol.) | 25.000±5.676 | 2.474±1.009 | 0.096±0.028 | 1594.975±124.087 |
| | RLBFGS | 25.000±4.714 | 2.914±1.239 | 0.113±0.037 | 1620.207±125.989 |
| USPS | VTF (ISR) | 14.700±3.234 | 1.885±1.024 | 0.133±0.094 | 14223.215±0.001 |
| | VTF (Chol.) | 13.100±1.524 | 1.246±0.217 | 0.095±0.012 | 14223.216±0.001 |
| | RLBFGS | 13.100±2.234 | 1.307±0.454 | 0.098±0.025 | 14223.215±0.001 |
| MNIST | VTF (ISR) | 11.700±3.335 | 1.288±0.619 | 0.108±0.041 | 6254.283±0.001 |
| | VTF (Chol.) | 13.100±3.479 | 1.435±0.793 | 0.104±0.029 | 6254.284±0.001 |
| | RLBFGS | 12.100±3.381 | 1.266±0.754 | 0.099±0.024 | 6254.284±0.001 |
7 Conclusion and Future Direction
In this paper, we proposed two mappings in the tangent space of SPD manifolds, by the inverse second root and by the Cholesky decomposition. The proposed mappings simplify the vector transports and adjoint vector transports, which are widely used in RLBFGS quasi-Newton optimization, to identity. They also reduce the Riemannian metric to the Euclidean inner product, which is computationally more efficient. Simulation results verified the effectiveness of the proposed mappings for two optimization tasks on SPD matrix manifolds. In this work, we focused on mappings for SPD manifolds, which are widely used in machine learning and data science. A possible future direction is to extend the proposed mappings to other well-known Riemannian manifolds, such as Grassmann and Stiefel (Edelman et al., 1998), as well as to other Riemannian optimization methods. This paper opens a new research path for such mappings in the tangent space, and we conjecture that such mappings can make numerical Riemannian optimization more efficient.
References
- Absil et al. (2009) P-A Absil, Robert Mahony, and Rodolphe Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, 2009.
- Armijo (1966) Larry Armijo. Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of mathematics, 16(1):1–3, 1966.
- Arthur and Vassilvitskii (2007) David Arthur and Sergei Vassilvitskii. k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 1027–1035, 2007.
- Bhatia (2009) Rajendra Bhatia. Positive definite matrices. Princeton University Press, 2009.
- Boumal (2020) Nicolas Boumal. An introduction to optimization on smooth manifolds. Available online, 2020.
- Boumal et al. (2014) Nicolas Boumal, Bamdev Mishra, P-A Absil, and Rodolphe Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. The Journal of Machine Learning Research, 15(1):1455–1459, 2014.
- Edelman et al. (1998) Alan Edelman, Tomás A Arias, and Steven T Smith. The geometry of algorithms with orthogonality constraints. SIAM journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
- Fletcher (2013) Roger Fletcher. Practical methods of optimization. John Wiley & Sons, 2013.
- Ghojogh et al. (2020) Benyamin Ghojogh, Milad Sikaroudi, Sobhan Shafiei, Hamid R Tizhoosh, Fakhri Karray, and Mark Crowley. Fisher discriminant triplet and contrastive losses for training Siamese networks. In International joint conference on neural networks, pages 1–7. IEEE, 2020.
- Golub and Van Loan (2013) Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU press, 2013.
- Hestenes and Stiefel (1952) Magnus Rudolph Hestenes and Eduard Stiefel. Methods of conjugate gradients for solving linear systems, volume 49. NBS Washington, DC, 1952.
- Hosseini and Mash’al (2015) Reshad Hosseini and Mohamadreza Mash’al. Mixest: An estimation toolbox for mixture models. arXiv preprint arXiv:1507.06065, 2015.
- Hosseini and Sra (2020) Reshad Hosseini and Suvrit Sra. An alternative to EM for Gaussian mixture models: batch and stochastic Riemannian optimization. Mathematical Programming, 181(1):187–223, 2020.
- Hu et al. (2020) Jiang Hu, Xin Liu, Zai-Wen Wen, and Ya-Xiang Yuan. A brief introduction to manifold optimization. Journal of the Operations Research Society of China, 8(2):199–248, 2020.
- Huang et al. (2015) Wen Huang, Kyle A Gallivan, and P-A Absil. A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM Journal on Optimization, 25(3):1660–1685, 2015.
- Huang et al. (2016) Wen Huang, P-A Absil, and Kyle A Gallivan. A Riemannian BFGS method for nonconvex optimization problems. In Numerical Mathematics and Advanced Applications ENUMATH 2015, pages 627–634. Springer, 2016.
- Jeuris et al. (2012) Ben Jeuris, Raf Vandebril, and Bart Vandereycken. A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electronic Transactions on Numerical Analysis, 39(ARTICLE):379–402, 2012.
- Ji (2007) Huibo Ji. Optimization approaches on smooth manifolds. PhD thesis, Australian National University, 2007.
- Li and Fukushima (2001) Dong-Hui Li and Masao Fukushima. On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM Journal on Optimization, 11(4):1054–1064, 2001.
- Liu and Nocedal (1989) Dong C Liu and Jorge Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical programming, 45(1):503–528, 1989.
- Nocedal (1980) Jorge Nocedal. Updating quasi-Newton matrices with limited storage. Mathematics of computation, 35(151):773–782, 1980.
- Nocedal and Wright (2006) Jorge Nocedal and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.
- Qi et al. (2010) Chunhong Qi, Kyle A Gallivan, and P-A Absil. Riemannian BFGS algorithm with applications. In Recent advances in optimization and its applications in engineering, pages 183–192. Springer, 2010.
- Ring and Wirth (2012) Wolfgang Ring and Benedikt Wirth. Optimization methods on Riemannian manifolds and their application to shape space. SIAM Journal on Optimization, 22(2):596–627, 2012.
- Seibert et al. (2013) Matthias Seibert, Martin Kleinsteuber, and Knut Hüper. Properties of the BFGS method on Riemannian manifolds. Mathematical System Theory C Festschrift in Honor of Uwe Helmke on the Occasion of his Sixtieth Birthday, pages 395–412, 2013.
- Sra and Hosseini (2015) Suvrit Sra and Reshad Hosseini. Conic geometric optimization on the manifold of positive definite matrices. SIAM Journal on Optimization, 25(1):713–739, 2015.
- Sra and Hosseini (2016) Suvrit Sra and Reshad Hosseini. Geometric optimization in machine learning. In Algorithmic Advances in Riemannian Geometry and Applications, pages 73–91. Springer, 2016.
- Wolfe (1969) Philip Wolfe. Convergence conditions for ascent methods. SIAM Review, 11(2):226–235, 1969.
- Zadeh et al. (2016) Pourya Zadeh, Reshad Hosseini, and Suvrit Sra. Geometric mean metric learning. In International conference on machine learning, pages 2464–2471. PMLR, 2016.
Vector Transport Free Riemannian LBFGS for Optimization on Symmetric Positive Definite Matrix Manifolds (Supplementary Material)
1 Proof for Proposition 5.4
– Upper bound analysis: The most time-consuming part of the RLBFGS algorithm is the recursive function GetDirection(·) defined in Section 2.4 of the main paper. In every call of the recursion, Eqs. (1)-(3) are performed. The metric used in Eqs. (1) and (3) takes $O(d^3)$ and $O(d^2)$ time before and after applying the mappings, respectively (cf. Table 1 in the main paper). This is because matrix inversion takes $O(d^3)$ time, whereas after the mappings the matrices are vectorized within the trace operator in the algorithm implementation, which costs $O(d^2)$. The vector transport used in Eq. (2) takes $O(d^3)$ and $O(1)$ time before and after applying the mappings, respectively (cf. Table 1 in the main paper). The recursion in RLBFGS is usually done a constant number of times, equal to the memory limit $m$ (see Ring and Wirth, 2012). Overall, the complexity of the recursion is $O(m d^2)$ and $O(m d^3)$ for RLBFGS with and without the mappings, respectively.
– Lower bound analysis: Computing the matrix inversion in the metric before applying our mappings (cf. Table 1 in the main paper) admits a non-trivial lower bound (see article [1] referenced below), while the metric only requires reading the $O(d^2)$ matrix entries after the mappings. The vector transport is likewise lower bounded by the cost of the involved matrix products before the mappings and is $O(1)$ after them. Overall, the lower bounds on the recursion match the corresponding upper bounds for RLBFGS with and without the mappings.
[1] A. Tveit, “On the complexity of matrix inversion,” Norwegian University of Science and Technology, Trondheim, Technical Report, 2003.
– Tight bound analysis: As the upper-bound and lower-bound complexities of each algorithm are equal, we can conclude that the complexity bounds are tight. Therefore, the recursion of RLBFGS takes on the order of $m d^2$ and $m d^3$ operations with and without our mappings, respectively. This shows that our proposed mappings improve the time of optimization. This time improvement shows up better if the computation of the gradients in Eq. (6) is not dominant in complexity.
2 Simulations for Larger Sample Size and Dimensionality
Tables 2 and 3 in this Supplementary Material report the average simulation results for a larger sample size. In these tables, the exponential map and the Taylor approximation are used for retraction, respectively. We have also reported results for larger dimensionality in these two tables.
Table 1: Comparison of RLBFGS with and without the proposed mappings for fitting GMM, where the Taylor approximation of the exponential map is used for retraction (cf. Table 2 of the main paper; mean ± standard deviation over ten runs).

| K | d | Separation | Algorithm | #iters | conv. time | time diff. std | iter. time | last cost |
|---|---|---|---|---|---|---|---|---|
| 2 | 2 | Low | VTF (ISR) | 59.800±22.553 | 93.135±88.437 | 19.280 | 1.320±0.711 | 0.610±0.366 |
| | | | VTF (Chol.) | 62.600±29.079 | 106.545±117.633 | 48.611 | 1.355±0.830 | 0.619±0.382 |
| | | | RLBFGS | 59.800±26.803 | 100.424±120.310 | – | 1.363±0.787 | 0.610±0.366 |
| | | Mid | VTF (ISR) | 45.800±16.982 | 48.155±38.045 | 10.833 | 0.900±0.454 | 0.482±0.497 |
| | | | VTF (Chol.) | 47.000±18.074 | 51.529±41.465 | 11.146 | 0.930±0.483 | 0.482±0.497 |
| | | | RLBFGS | 45.000±16.330 | 48.150±37.285 | – | 0.921±0.458 | 0.482±0.497 |
| | | High | VTF (ISR) | 25.600±4.142 | 9.816±3.837 | 1.380 | 0.370±0.093 | 0.270±0.456 |
| | | | VTF (Chol.) | 26.300±4.448 | 10.540±4.190 | 1.932 | 0.385±0.101 | 0.270±0.456 |
| | | | RLBFGS | 25.500±4.353 | 10.526±4.819 | – | 0.396±0.113 | 0.270±0.456 |
| | 10 | Low | VTF (ISR) | 75.800±25.607 | 166.736±137.600 | 63.108 | 1.955±0.816 | 4.129±0.771 |
| | | | VTF (Chol.) | 74.500±22.629 | 152.074±107.531 | 98.424 | 1.855±0.682 | 4.129±0.771 |
| | | | RLBFGS | 74.800±25.442 | 172.641±143.198 | – | 2.054±0.834 | 4.129±0.771 |
| | | Mid | VTF (ISR) | 50.900±13.956 | 64.501±38.055 | 11.794 | 1.170±0.395 | 3.472±1.154 |
| | | | VTF (Chol.) | 52.100±13.892 | 66.775±41.257 | 7.815 | 1.184±0.410 | 3.472±1.154 |
| | | | RLBFGS | 51.000±14.063 | 70.132±46.421 | – | 1.264±0.450 | 3.472±1.154 |
| | | High | VTF (ISR) | 43.600±5.522 | 42.763±11.897 | 12.828 | 0.963±0.168 | 4.353±1.081 |
| | | | VTF (Chol.) | 47.400±6.620 | 50.438±15.187 | 8.731 | 1.041±0.182 | 4.353±1.081 |
| | | | RLBFGS | 44.300±6.464 | 47.830±14.747 | – | 1.055±0.192 | 4.353±1.081 |
| 5 | 2 | Low | VTF (ISR) | 262.500±126.532 | 4430.577±4455.877 | 13196.851 | 13.849±6.991 | 0.082±0.281 |
| | | | VTF (Chol.) | 253.500±124.970 | 4086.076±3967.586 | 10698.273 | 13.084±6.850 | 0.099±0.279 |
| | | | RLBFGS | 270.700±144.145 | 5305.821±5495.383 | – | 15.560±8.452 | 0.090±0.282 |
| | | Mid | VTF (ISR) | 110.900±50.886 | 731.318±769.639 | 184.154 | 5.445±2.806 | 1.010±0.349 |
| | | | VTF (Chol.) | 112.200±60.736 | 792.756±1053.832 | 416.142 | 5.436±3.358 | 1.010±0.349 |
| | | | RLBFGS | 111.900±55.183 | 814.994±975.737 | – | 5.827±3.289 | 1.010±0.349 |
| | | High | VTF (ISR) | 52.000±13.565 | 116.485±69.535 | 12.185 | 2.071±0.728 | 1.626±0.431 |
| | | | VTF (Chol.) | 51.400±12.616 | 114.032±67.916 | 19.838 | 2.057±0.735 | 1.626±0.431 |
| | | | RLBFGS | 53.400±14.081 | 139.989±89.213 | – | 2.408±0.936 | 1.626±0.431 |
| | 10 | Low | VTF (ISR) | 219.700±57.908 | 2946.040±1513.372 | 1239.375 | 12.579±3.507 | 6.643±0.662 |
| | | | VTF (Chol.) | 226.100±86.010 | 3272.111±2808.634 | 2103.162 | 12.707±5.156 | 6.625±0.650 |
| | | | RLBFGS | 231.300±64.484 | 3607.215±1947.580 | – | 14.516±4.305 | 6.642±0.663 |
| | | Mid | VTF (ISR) | 87.100±21.502 | 413.616±223.510 | 54.437 | 4.458±1.310 | 6.585±0.569 |
| | | | VTF (Chol.) | 87.200±21.186 | 405.484±214.985 | 47.791 | 4.374±1.262 | 6.585±0.569 |
| | | | RLBFGS | 87.400±20.403 | 454.434±227.556 | – | 4.918±1.339 | 6.585±0.569 |
| | | High | VTF (ISR) | 60.500±10.533 | 181.113±76.070 | 16.691 | 2.892±0.649 | 6.827±0.836 |
| | | | VTF (Chol.) | 62.700±11.295 | 197.565±93.653 | 24.531 | 3.019±0.827 | 6.827±0.836 |
| | | | RLBFGS | 58.900±11.040 | 187.518±86.254 | – | 3.058±0.746 | 6.827±0.836 |
Table 2: Results for the larger sample size, where the exponential map is used for retraction (mean ± standard deviation over ten runs).

| d | Separation | Algorithm | #iters | conv. time | time diff. std | iter. time | last cost |
|---|---|---|---|---|---|---|---|
| 2 | Low | VTF (ISR) | 72.000±22.949 | 127.902±98.372 | 31.248 | 1.616±0.584 | 0.281±0.394 |
| | | VTF (Chol.) | 69.800±23.136 | 116.250±86.811 | 23.937 | 1.490±0.590 | 0.281±0.394 |
| | | RLBFGS | 68.900±24.875 | 123.064±105.594 | – | 1.561±0.694 | 0.281±0.394 |
| | Mid | VTF (ISR) | 59.400±16.460 | 78.124±53.880 | 10.139 | 1.210±0.421 | 0.520±0.392 |
| | | VTF (Chol.) | 54.900±12.862 | 63.241±34.382 | 31.027 | 1.082±0.332 | 0.516±0.393 |
| | | RLBFGS | 58.400±17.037 | 80.042±62.183 | – | 1.240±0.497 | 0.520±0.392 |
| | High | VTF (ISR) | 23.300±2.627 | 7.434±2.288 | 0.738 | 0.313±0.057 | 0.102±0.288 |
| | | VTF (Chol.) | 22.900±2.685 | 7.273±2.084 | 0.873 | 0.312±0.052 | 0.102±0.288 |
| | | RLBFGS | 23.300±2.497 | 7.917±2.135 | – | 0.335±0.053 | 0.102±0.288 |
| 10 | Low | VTF (ISR) | 63.400±16.728 | 112.682±74.903 | 17.031 | 1.664±0.489 | 4.364±1.299 |
| | | VTF (Chol.) | 65.000±17.994 | 117.215±82.258 | 13.637 | 1.670±0.538 | 4.364±1.299 |
| | | RLBFGS | 64.900±17.866 | 126.987±87.668 | – | 1.819±0.561 | 4.364±1.299 |
| | Mid | VTF (ISR) | 48.200±8.766 | 58.096±23.089 | 10.899 | 1.164±0.256 | 4.024±1.627 |
| | | VTF (Chol.) | 49.800±11.811 | 64.021±34.770 | 6.571 | 1.208±0.364 | 4.024±1.627 |
| | | RLBFGS | 48.400±10.906 | 64.099±32.746 | – | 1.251±0.361 | 4.024±1.627 |
| | High | VTF (ISR) | 45.100±7.385 | 49.628±20.550 | 7.336 | 1.065±0.243 | 3.290±1.129 |
| | | VTF (Chol.) | 47.800±8.121 | 55.217±22.622 | 6.572 | 1.117±0.251 | 3.290±1.129 |
| | | RLBFGS | 45.200±7.131 | 52.883±20.627 | – | 1.137±0.236 | 3.290±1.129 |
| 100 | Low | VTF (ISR) | 131.900±32.402 | 54826.327±30454.295 | 5842.849 | 389.761±121.060 | 63.703±2.843 |
| | | VTF (Chol.) | 141.200±36.908 | 59082.278±32385.087 | 3775.919 | 392.658±110.916 | 63.703±2.843 |
| | | RLBFGS | 136.300±36.059 | 56627.788±32602.224 | – | 387.367±119.404 | 63.703±2.843 |
| | Mid | VTF (ISR) | 87.800±20.225 | 23635.623±11743.237 | 5738.406 | 259.026±52.061 | 61.263±4.734 |
| | | VTF (Chol.) | 95.800±25.170 | 28033.655±16196.321 | 1410.113 | 278.456±64.670 | 61.263±4.734 |
| | | RLBFGS | 94.900±25.653 | 28527.524±17341.755 | – | 284.548±69.821 | 61.263±4.734 |
| | High | VTF (ISR) | 97.000±24.240 | 29048.836±12857.812 | 3441.882 | 286.602±63.250 | 61.241±3.991 |
| | | VTF (Chol.) | 105.100±27.477 | 33967.543±15707.841 | 2014.574 | 308.668±70.065 | 61.241±3.991 |
| | | RLBFGS | 103.200±26.389 | 33101.986±15790.066 | – | 305.002±74.361 | 61.241±3.991 |
Table 3: Results for the larger sample size, where the Taylor approximation of the exponential map is used for retraction (mean ± standard deviation over ten runs).

| d | Separation | Algorithm | #iters | conv. time | time diff. std | iter. time | last cost |
|---|---|---|---|---|---|---|---|
| 2 | Low | VTF (ISR) | 79.800±49.973 | 206.935±343.153 | 105.106 | 1.839±1.340 | 0.403±0.578 |
| | | VTF (Chol.) | 72.200±20.004 | 125.636±83.113 | 334.239 | 1.608±0.536 | 0.402±0.576 |
| | | RLBFGS | 83.900±51.054 | 237.936±364.382 | – | 2.064±1.412 | 0.404±0.578 |
| | Mid | VTF (ISR) | 48.100±12.688 | 49.277±30.486 | 12.185 | 0.946±0.334 | 0.196±0.436 |
| | | VTF (Chol.) | 50.100±14.271 | 56.329±40.420 | 14.674 | 1.017±0.422 | 0.196±0.436 |
| | | RLBFGS | 51.000±13.622 | 61.036±39.550 | – | 1.099±0.412 | 0.196±0.436 |
| | High | VTF (ISR) | 25.300±3.945 | 9.887±3.903 | 3.565 | 0.377±0.101 | 0.111±0.261 |
| | | VTF (Chol.) | 26.500±4.927 | 11.003±5.644 | 4.938 | 0.395±0.124 | 0.111±0.261 |
| | | RLBFGS | 25.100±4.012 | 10.203±4.224 | – | 0.392±0.106 | 0.111±0.261 |
| 10 | Low | VTF (ISR) | 65.500±18.435 | 123.552±77.898 | 14.558 | 1.747±0.560 | 4.164±1.142 |
| | | VTF (Chol.) | 66.500±19.558 | 122.281±79.423 | 19.994 | 1.688±0.569 | 4.164±1.142 |
| | | RLBFGS | 65.200±18.743 | 130.098±84.241 | – | 1.835±0.622 | 4.164±1.142 |
| | Mid | VTF (ISR) | 40.600±5.211 | 38.915±12.591 | 8.862 | 0.938±0.177 | 4.589±1.278 |
| | | VTF (Chol.) | 41.000±5.538 | 38.732±12.699 | 9.319 | 0.924±0.171 | 4.589±1.278 |
| | | RLBFGS | 40.400±6.186 | 41.513±15.303 | – | 0.998±0.213 | 4.589±1.278 |
| | High | VTF (ISR) | 44.400±6.310 | 46.957±15.325 | 11.008 | 1.032±0.198 | 3.786±1.113 |
| | | VTF (Chol.) | 46.000±6.616 | 50.393±15.950 | 12.688 | 1.070±0.195 | 3.786±1.113 |
| | | RLBFGS | 44.200±7.099 | 50.707±18.070 | – | 1.116±0.222 | 3.786±1.113 |
| 100 | Low | VTF (ISR) | 118.500±11.404 | 39986.299±6719.992 | 2227.745 | 335.341±26.657 | 62.241±5.708 |
| | | VTF (Chol.) | 124.400±11.157 | 42882.658±7219.289 | 3669.708 | 342.538±28.830 | 62.241±5.708 |
| | | RLBFGS | 121.300±11.235 | 41171.059±7987.985 | – | 336.547±36.783 | 62.241±5.708 |
| | Mid | VTF (ISR) | 103.800±24.679 | 32533.037±16124.089 | 3314.944 | 297.268±76.267 | 60.256±3.662 |
| | | VTF (Chol.) | 111.900±28.781 | 37328.925±18924.066 | 2916.610 | 314.512±82.724 | 60.256±3.662 |
| | | RLBFGS | 111.300±27.941 | 36926.995±17705.262 | – | 314.564±76.642 | 60.256±3.662 |
| | High | VTF (ISR) | 96.300±25.880 | 28818.656±14923.845 | 1492.788 | 283.824±67.267 | 60.601±4.436 |
| | | VTF (Chol.) | 103.900±26.126 | 33669.987±16494.757 | 3368.579 | 308.353±73.281 | 60.601±4.436 |
| | | RLBFGS | 102.800±25.258 | 31630.478±15755.934 | – | 292.771±68.396 | 60.601±4.436 |
3 Cost Difference Progress for the Smaller Sample Size Simulations
The log-scale cost difference progress for these simulations is depicted in Figs. 1 and 2 of this Supplementary Material.
[Figures 1 and 2 of the Supplementary Material: log-scale cost difference progress.]
4 Cost Difference Progress for the Larger Sample Size Simulations
The log-scale cost difference progress for these simulations is depicted in Figs. 3 and 4 of this Supplementary Material, where the exponential map and the Taylor approximation are used for retraction, respectively.
[Figures 3 and 4 of the Supplementary Material: log-scale cost difference progress.]