Convergence directions of the randomized Gauss–Seidel method and its extension
The work is supported by the National Natural Science Foundation of China (No. 11671060) and the Natural Science Foundation Project of CQ CSTC (No. cstc2019jcyj-msxmX0267).
Abstract
The randomized Gauss–Seidel method and its extension have attracted much attention recently, and their convergence rates have been considered extensively. However, the convergence rates are usually described by upper bounds, which cannot fully reflect the actual convergence. In this paper, we make a detailed analysis of their convergence behaviors. The analysis shows that the larger the singular value of $A$ is, the faster the error decays in the corresponding singular vector space, and that the convergence directions are mainly driven by the large singular values at the beginning, then gradually driven by the small singular values, and finally by the smallest nonzero singular value. These results explain the phenomenon, found in the extensive numerical experiments appearing in the literature, that these two methods seem to converge faster at the beginning. Numerical examples are provided to confirm the above findings.
keywords:
convergence direction, randomized Gauss–Seidel method, randomized extended Gauss–Seidel method, singular vector, least squares

1 Introduction
The linear least squares problem is ubiquitous, arising frequently in data analysis and scientific computing. Specifically, given a data matrix $A \in \mathbb{R}^{m \times n}$ and a data vector $b \in \mathbb{R}^m$, a linear least squares problem can be written as follows
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2. \qquad (1)$$
In the literature, several direct methods have been proposed for solving this problem, based on the normal equations, the QR factorization, or the singular value decomposition (SVD) of $A$ [1, 2]; these can be prohibitive when the matrix is large-scale. Hence, iterative methods are considered for solving large linear least squares problems, such as the famous Gauss–Seidel method [3].
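For reference, on small and medium problems a direct solution of (1) can be obtained with a dense SVD-based solver in a few lines. The following NumPy sketch is our own illustration (the problem sizes are arbitrary); it computes the least squares solution that the randomized iterative methods discussed below aim to approximate much more cheaply per step.

import numpy as np

rng = np.random.default_rng(0)
m, n = 2000, 100                       # illustrative problem sizes
A = rng.standard_normal((m, n))        # data matrix
b = rng.standard_normal(m)             # data vector

# Dense SVD-based least squares solve; its cost grows quickly with m and n,
# which motivates randomized row/column action methods for large-scale problems.
x_ls, residual, rank, sigma = np.linalg.lstsq(A, b, rcond=None)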
In [4], Leventhal and Lewis proved that the randomized Gauss–Seidel (RGS) method, also known as the randomized coordinate descent method, converges to the solution of (1) at a linear rate in expectation. This method works on the columns of the matrix $A$ at random with probability proportional to their squared Euclidean norms. Later, Ma, Needell and Ramdas [5] provided a unified theory of the RGS method and the randomized Kaczmarz (RK) method [6], where the latter method works on the rows of $A$, and showed that the RGS method converges to the minimum Euclidean norm least squares solution $x_\star$ of (1) only when the matrix $A$ is of full column rank. To further develop the RGS method for more general matrices, and inspired by the randomized extended Kaczmarz (REK) method [7], Ma et al. [5] presented a variant of the RGS method, i.e., the randomized extended Gauss–Seidel (REGS) method, and proved that the REGS method converges to $x_\star$ regardless of whether the matrix $A$ has full column rank. After that, many variants of the RGS (or REGS) method were developed and studied extensively; see for example [8, 9, 10, 11, 12, 13, 14] and the references therein.
To the best of our knowledge, existing studies of the convergence properties of the RGS and REGS methods mainly pay attention to their convergence rates and usually give corresponding upper bounds; no work focuses on what determines their convergence rates, what drives their convergence directions, and what their ultimate directions are. As is well known, such an upper bound can only serve as a reference for the convergence rate and cannot truly reflect the empirical convergence of the method. So it is interesting to consider the above three questions.
In 2017, Jiao, Jin and Lu [15] analyzed the preasymptotic convergence of the RK method. Recently, Steinerberger [16] made a more detailed analysis of the convergence property of the RK method for overdetermined full rank linear systems. The author showed that the right singular vectors of the matrix describe the directions of distinguished dynamics and that the RK method converges along small right singular vectors. After that, Zhang and Li [17] considered the convergence property of the REK method for all types of linear systems (consistent or inconsistent, overdetermined or underdetermined, full rank or rank deficient) and showed that the REK method converges to the minimum Euclidean norm least squares solution with different decay rates in different right singular vector spaces.
In this paper, we analyze the convergence properties of the RGS and REGS methods for the linear least squares problem and show that the decay rates of the sequences generated by the RGS method (resp., the REGS method) depend on the size of the singular values of $A$. Specifically, the larger the singular value of $A$ is, the faster the error decays in the corresponding singular vector space, and the convergence directions are mainly driven by the large singular values at the beginning, then gradually driven by the small singular values, and finally by the smallest nonzero singular value.
2 Notations and preliminaries
Throughout the paper, for a matrix $A$, the symbols $A^T$, $A_{(i)}$, $A^{(j)}$, $\sigma_i(A)$, $\sigma_{\min}(A)$, $\|A\|_F$, and $\mathcal{R}(A)$ denote its transpose, $i$th row (or $i$th entry in the case of a vector), $j$th column, $i$th singular value, smallest nonzero singular value, Frobenius norm, and column space, respectively. For any integer $\ell$, let $[\ell] = \{1, 2, \ldots, \ell\}$. If the matrix $G$ is positive definite, we define the energy norm of any vector $x$ as $\|x\|_G = \sqrt{x^T G x}$. In addition, we denote the identity matrix by $I$, its $i$th column by $e_i$, and the expectation of any random variable $\xi$ by $\mathbb{E}[\xi]$.
In the following, we use $x_\star = A^\dagger b$ to denote the minimum Euclidean norm least squares solution of (1), where $A^\dagger$ denotes the Moore–Penrose pseudoinverse of the matrix $A$. Because the SVD is the basic tool for the convergence analysis in the next two sections, we denote the SVD [18] of $A$ by
$$A = U \Sigma V^T,$$
where $U = [u_1, u_2, \ldots, u_r]$ and $V = [v_1, v_2, \ldots, v_r]$ are column orthonormal matrices whose column vectors are known as the left and right singular vectors, respectively, and $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$ is diagonal with the diagonal elements ordered nonincreasingly, i.e., $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, with $r$ being the rank of $A$.
3 Convergence directions of the RGS method
Algorithm 1
The RGS method
1. INPUT: $A$, $b$, $\ell$, $x_0$
2. For $k = 1, 2, \ldots, \ell$ do
3. Select $j \in [n]$ with probability $\|A^{(j)}\|_2^2 / \|A\|_F^2$
4. Set $x_k = x_{k-1} + \dfrac{(A^{(j)})^T (b - A x_{k-1})}{\|A^{(j)}\|_2^2} \, e_j$
5. End for
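A minimal NumPy sketch of Algorithm 1 is given below. It is only an illustration of the column-sampling and coordinate-update steps described above; the function name, argument list, and stopping rule (a fixed number of iterations) are our own choices.

import numpy as np

def rgs(A, b, x0, num_iters, rng=None):
    """Randomized Gauss-Seidel (randomized coordinate descent) sketch for min ||Ax - b||_2."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    col_norms_sq = np.sum(A * A, axis=0)          # ||A^{(j)}||_2^2 for every column j
    probs = col_norms_sq / col_norms_sq.sum()     # sampling probabilities ||A^{(j)}||_2^2 / ||A||_F^2
    r = b - A @ x                                 # maintain the residual b - A x
    for _ in range(num_iters):
        j = rng.choice(A.shape[1], p=probs)       # select a column at random
        step = A[:, j] @ r / col_norms_sq[j]      # exact minimization along the j-th coordinate
        x[j] += step                              # x_k = x_{k-1} + step * e_j
        r -= step * A[:, j]                       # keep the residual consistent with the new x
    return x

For a matrix $A$ of full column rank, a call such as rgs(A, b, np.zeros(A.shape[1]), 5000) returns an approximation to the least squares solution of (1).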
Theorem 1 ([4, 5])
Let $x_k$ be the $k$th approximation generated by Algorithm 1 with initial guess $x_0$. Then
$$\mathbb{E}\left[\|A x_k - A x_\star\|_2^2\right] \leq \left(1 - \frac{\sigma_{\min}^2(A)}{\|A\|_F^2}\right)^k \|A x_0 - A x_\star\|_2^2. \qquad (2)$$
Remark 1
Theorem 1 shows that $A x_k$ converges linearly in expectation to $A x_\star$ regardless of whether the matrix $A$ has full rank. Since $\|A x_k - A x_\star\|_2 \geq \sigma_{\min}(A) \|x_k - x_\star\|_2$ when $A$ has full column rank, it follows from (2) that
$$\mathbb{E}\left[\|x_k - x_\star\|_2^2\right] \leq \frac{1}{\sigma_{\min}^2(A)} \left(1 - \frac{\sigma_{\min}^2(A)}{\|A\|_F^2}\right)^k \|A x_0 - A x_\star\|_2^2,$$
which implies that $x_k$ converges linearly in expectation to the minimum Euclidean norm least squares solution $x_\star$ when the matrix $A$ is overdetermined and of full column rank, but cannot converge to $x_\star$ when $A$ is not of full column rank. So, we assume that the matrix $A$ is of full column rank in this section.
Now, we give our three main results for the RGS method.
Theorem 2
Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_\star$ be the minimum Euclidean norm least squares solution, and $x_k$ be the $k$th approximation of the RGS method generated by Algorithm 1 with initial guess $x_0$. Then
(3)
Proof 1
Remark 2
Theorem 2 shows that the decay rates of the error are different in different left singular vector spaces. Specifically, the decay rates depend on the singular values: the larger the singular value of $A$ is, the faster the error decays in the corresponding left singular vector space. This implies that the smallest singular value leads to the slowest rate of convergence, which is the rate in (2). So, the convergence bound presented in [4, 5] is optimal.
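The qualitative content of Remark 2 is easy to check numerically. The sketch below is our own illustration: the tracked quantity, the projection of the residual onto each left singular vector, is an assumption in the spirit of the analogous RK analysis in [16]. It runs RGS on a random consistent system and compares how much the components associated with a large, a medium, and the smallest singular value have decayed after a fixed number of iterations.

import numpy as np

rng = np.random.default_rng(1)
m, n, iters = 300, 50, 1500
A = rng.standard_normal((m, n))
b = A @ rng.standard_normal(n)                   # consistent system, full column rank with high probability
U, S, _ = np.linalg.svd(A, full_matrices=False)  # left singular vectors u_1, ..., u_n

col_norms_sq = np.sum(A * A, axis=0)
probs = col_norms_sq / col_norms_sq.sum()
x = np.zeros(n)
p0 = np.abs(U.T @ (A @ x - b))                   # initial projections |<A x_0 - b, u_l>|
for _ in range(iters):
    j = rng.choice(n, p=probs)
    x[j] += A[:, j] @ (b - A @ x) / col_norms_sq[j]
pk = np.abs(U.T @ (A @ x - b))                   # projections after the run

# The component tied to the largest singular value has decayed by far the most,
# the one tied to the smallest singular value by far the least (single run, so only qualitative).
for l in (0, n // 2, n - 1):
    print(f"sigma_{l+1} = {S[l]:.2f},  decay of projection: {pk[l] / p0[l]:.2e}")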
Remark 3
Remark 4
Theorem 3
Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_\star$ be the minimum Euclidean norm least squares solution, and $x_k$ be the $k$th approximation of the RGS method generated by Algorithm 1 with initial guess $x_0$. Then
Proof 2
Following an argument similar to the corresponding proof in [5], we can derive the desired result.
Remark 5
Theorem 3 implies that the RGS method actually converges faster if the initial residual is not close to the left singular vectors corresponding to the small singular values of $A$.
Theorem 4
Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_\star$ be the minimum Euclidean norm least squares solution, and $x_k$ be the $k$th approximation of the RGS method generated by Algorithm 1 with initial guess $x_0$. Then
(4)
Proof 3
Remark 6
Let $x$ and $y$ be two unit vectors, i.e., $\|x\|_2 = \|y\|_2 = 1$. We use the inner product $\langle x, y \rangle$ to represent the angle between $x$ and $y$: the smaller $\langle x, y \rangle$ is, the bigger the angle is, and the bigger the fluctuation becomes from $x$ to $y$. Theorem 4 shows the fluctuation between two adjacent iterations. Specifically, when the corresponding singular value is large, the angle between two adjacent iterates is large, which implies a large fluctuation; when it is small, the angle is small, which implies very little fluctuation.
Theorem 4 implies that if the residual is mainly composed of left singular vectors corresponding to the small singular values of $A$, its direction hardly changes, which means that the RGS method finally converges along the left singular vector corresponding to the smallest singular value of $A$.
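The fluctuation discussed above can also be monitored directly. In the sketch below (an illustration; tracking the residual direction is our choice of proxy for the forward direction, and the test matrix is built with an artificial, widely spread spectrum so the effect is easy to see) we record $1 - |\cos|$ of the angle between consecutive residuals: it is clearly larger in the early iterations than in the late ones, i.e., the direction hardly changes once the small singular directions dominate.

import numpy as np

rng = np.random.default_rng(2)
m, n, iters = 300, 50, 3000
# Build a matrix with a wide range of singular values (assumed construction).
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigmas = np.logspace(0, -2, n)                   # singular values from 1 down to 0.01
A = (U * sigmas) @ V.T
b = A @ rng.standard_normal(n)                   # consistent right-hand side

col_norms_sq = np.sum(A * A, axis=0)
probs = col_norms_sq / col_norms_sq.sum()
x = np.zeros(n)
r_prev = b - A @ x
one_minus_cos = []                               # 1 - |cos(angle)| between consecutive residuals
for _ in range(iters):
    j = rng.choice(n, p=probs)
    x[j] += A[:, j] @ (b - A @ x) / col_norms_sq[j]
    r = b - A @ x
    denom = np.linalg.norm(r) * np.linalg.norm(r_prev)
    if denom > 0:
        one_minus_cos.append(1.0 - abs(r @ r_prev) / denom)
    r_prev = r

# Early steps change the residual direction far more than late steps.
print("early:", np.mean(one_minus_cos[:100]))
print("late :", np.mean(one_minus_cos[-100:]))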
4 Convergence directions of the REGS method
Recalling Remark 1, when the matrix $A$ is not of full column rank, the sequence $x_k$ generated by the RGS method does not converge to the minimum Euclidean norm least squares solution $x_\star$, even though $A x_k$ does converge to $A x_\star$. In [5], Ma et al. proposed an extended variant of the RGS method, i.e., the REGS method, to allow for convergence to $x_\star$ regardless of whether $A$ has full column rank or not.
Now, we list the REGS method presented in [13] in Algorithm 2, which is an equivalent variant of the original REGS method [5], and restate its convergence bound presented in [13] in Theorem 5. From the algorithm we find that, in each iteration, $x_k$ is the $k$th approximation of the RGS method and $z_k$ is a one-step RK update, starting from $z_{k-1}$, for the linear system $A z = A x_k$.
Algorithm 2
The REGS method
1. INPUT: $A$, $b$, $\ell$, $x_0$, $z_0$
2. For $k = 1, 2, \ldots, \ell$ do
3. Select $j \in [n]$ with probability $\|A^{(j)}\|_2^2 / \|A\|_F^2$
4. Set $x_k = x_{k-1} + \dfrac{(A^{(j)})^T (b - A x_{k-1})}{\|A^{(j)}\|_2^2} \, e_j$
5. Select $i \in [m]$ with probability $\|A_{(i)}\|_2^2 / \|A\|_F^2$
6. Set $z_k = z_{k-1} + \dfrac{A_{(i)} (x_k - z_{k-1})}{\|A_{(i)}\|_2^2} \, (A_{(i)})^T$
7. End for
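A matching NumPy sketch of Algorithm 2 follows. As with the RGS sketch, the names and the fixed iteration count are ours; the $z$ update is the one-step RK move toward the system $A z = A x_k$ described above.

import numpy as np

def regs(A, b, x0, z0, num_iters, rng=None):
    """Randomized extended Gauss-Seidel sketch: an RGS step for x, then one RK step for z."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    x = np.asarray(x0, dtype=float).copy()
    z = np.asarray(z0, dtype=float).copy()
    col_norms_sq = np.sum(A * A, axis=0)
    row_norms_sq = np.sum(A * A, axis=1)
    col_probs = col_norms_sq / col_norms_sq.sum()
    row_probs = row_norms_sq / row_norms_sq.sum()
    for _ in range(num_iters):
        j = rng.choice(n, p=col_probs)                     # RGS step on column j
        x[j] += A[:, j] @ (b - A @ x) / col_norms_sq[j]
        i = rng.choice(m, p=row_probs)                     # RK step on row i toward A z = A x
        z += (A[i, :] @ (x - z) / row_norms_sq[i]) * A[i, :]
    return x, z

In the rank-deficient case it is the sequence $z_k$, rather than $x_k$, that approximates the minimum Euclidean norm solution $x_\star$: starting from $z_0 = 0$, the RK-type projection keeps $z_k$ in the row space of $A$, which is consistent with Remark 1.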
Theorem 5
For the REGS method, we first discuss the convergence behavior of $x_k$ in Theorem 6 and Theorem 7, and then consider the convergence behavior of $z_k$ in Theorem 8.
Theorem 6
Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_\star$ be the minimum Euclidean norm least squares solution, and let the $k$th approximation be generated by the REGS method in Algorithm 2 with initial guesses $x_0$ and $z_0$. Then
(6)
Proof 4
We first introduce some conditional expectations. Let $\mathbb{E}_{k-1}[\cdot]$ denote the conditional expectation conditioned on the first $k-1$ iterations of the REGS method. That is,
where $j_t$ is the $t$th column chosen and $i_t$ is the $t$th row chosen. We denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th column chosen as
Similarly, we denote the conditional expectation conditioned on the first $k-1$ iterations and the $k$th row chosen as
Then, by the law of total expectation, we have
Thus, we obtain
Further, by making use of the preceding relations, we get
which together with the orthogonality of the left singular vectors yields
As a result, by taking the full expectation on both sides, we have
(8)
Remark 7
Theorem 7
Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_\star$ be the minimum Euclidean norm least squares solution, and let the $k$th approximation be generated by the REGS method in Algorithm 2 with initial guesses $x_0$ and $z_0$. Then
(11)
Proof 5
Remark 8
Theorem 7 implies that the iterates of the REGS method actually converge faster if the initial error is not close to the right singular vectors corresponding to the small singular values of $A$.
Theorem 8
Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $x_\star$ be the minimum Euclidean norm least squares solution, and let the $k$th approximation be generated by the REGS method in Algorithm 2 with initial guesses $x_0$ and $z_0$. Then
(12)
Proof 6
We now consider the sequence $z_k$. Exploiting (10), we have
(15)
5 Numerical experiments
Now we present two simple examples to illustrate the convergence directions of the RGS and REGS methods. To this end, let $G \in \mathbb{R}^{500 \times 500}$ be a Gaussian matrix with i.i.d. entries and $D$ be a diagonal matrix whose diagonal elements are all 100. Further, we set $\widehat{A} = G + D$ and replace its last row by a tiny perturbation of one of the other rows, i.e., adding 0.01 to each entry of that row. Then, we normalize all rows of $\widehat{A}$, i.e., set $\widehat{A}_{(i)} = \widehat{A}_{(i)} / \|\widehat{A}_{(i)}\|_2$, $i \in [500]$. After that, we pad $\widehat{A}$ with zero blocks to obtain an overdetermined matrix of full column rank and a matrix that is not of full column rank. By construction, the first 499 singular values of these two matrices are of the same moderate order of magnitude, while the smallest nonzero singular value is much smaller.
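The construction above can be reproduced along the following lines. This is only our reading of the setup: the 500-by-500 size is taken from the count of 499 moderate singular values, while the row used for the perturbation and the exact zero-padding pattern are assumptions, chosen so that the resulting matrices have one tiny nonzero singular value, as described.

import numpy as np

rng = np.random.default_rng(3)
n = 500
G = rng.standard_normal((n, n))                        # Gaussian matrix with i.i.d. entries
A_hat = G + 100.0 * np.eye(n)                          # add the diagonal matrix with entries 100
A_hat[-1, :] = A_hat[-2, :] + 0.01                     # last row: tiny perturbation of another row (assumed)
A_hat /= np.linalg.norm(A_hat, axis=1, keepdims=True)  # normalize every row

Z = np.zeros((n, n))
A_tall = np.vstack([A_hat, Z])                         # overdetermined, full column rank (assumed padding)
A_wide = np.hstack([A_hat, Z])                         # zero columns appended: not of full column rank

s = np.linalg.svd(A_hat, compute_uv=False)
print(s[0], s[-2], s[-1])                              # 499 moderate singular values and one tiny one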
We first consider the convergence directions of the RGS method. We generate a vector $x_\star$ using the MATLAB function randn, take the full column rank matrix constructed above as the coefficient matrix $A$, and set the right-hand side $b = A x_\star + r$, where $r$ is a nonzero vector belonging to the null space of $A^T$, which is generated by the MATLAB function null. With $x_0 = 0$, we plot the two quantities discussed below in Figure 1 and Figure 2, respectively.
Figure 1.
From Figure 1, we find that initially the inner product is very small, almost 0, which indicates that the residual is not close to the left singular vector corresponding to the smallest singular value. Considering the analysis in Remark 5, this phenomenon implies the ‘preconvergence’ behavior of the RGS method, that is, the RGS method seems to converge quickly at the beginning. In addition, as $k$ increases, the inner product approaches 1. This phenomenon implies that the residual tends to the left singular vector corresponding to the smallest singular value of $A$.
Figure 2.
From Figure 2, we observe that the plotted values decrease with $k$ and finally approach the smallest singular value. This phenomenon implies that the forward direction of $x_k$ is mainly determined by the right singular vectors corresponding to the large singular values of $A$ at the beginning. With the increase of $k$, the direction is mainly determined by the right singular vectors corresponding to the small singular values. Finally, the direction tends to the right singular vector space corresponding to the smallest singular value. Furthermore, this phenomenon also allows for an interesting application, i.e., finding nonzero vectors $x$ such that $\|Ax\|_2$ is small.
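The quantities behind Figures 1 and 2 can be tracked with a few lines of NumPy. In the sketch below we monitor (i) the alignment of the normalized residual with the left singular vector of the smallest singular value and (ii) the ratio $\|A(x_k - x_\star)\|_2 / \|x_k - x_\star\|_2$; these are our reading of the plotted quantities and may differ in detail from the figures, but they exhibit the same behavior: the first grows toward 1 and the second decays toward the smallest singular value. The test matrix is an illustrative stand-in with one well-separated tiny singular value.

import numpy as np

rng = np.random.default_rng(4)
m, n, iters = 200, 100, 5000
# Full column rank test matrix with one tiny, well-separated singular value (assumed sizes and spectrum).
Q1, _ = np.linalg.qr(rng.standard_normal((m, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigmas = np.linspace(2.0, 1.0, n)
sigmas[-1] = 0.01                                 # the tiny smallest singular value
A = (Q1 * sigmas) @ Q2.T
x_star = rng.standard_normal(n)
b = A @ x_star                                    # consistent system for simplicity

U, S, _ = np.linalg.svd(A, full_matrices=False)
u_min = U[:, -1]                                  # left singular vector of the smallest singular value
col_norms_sq = np.sum(A * A, axis=0)
probs = col_norms_sq / col_norms_sq.sum()
x = np.zeros(n)
align, ratio = [], []
for _ in range(iters):
    j = rng.choice(n, p=probs)
    x[j] += A[:, j] @ (b - A @ x) / col_norms_sq[j]
    e = x - x_star
    Ae = A @ e
    align.append(abs(u_min @ Ae) / np.linalg.norm(Ae))    # Figure 1-type quantity (assumed)
    ratio.append(np.linalg.norm(Ae) / np.linalg.norm(e))  # Figure 2-type quantity (assumed)

print("alignment: first", align[0], " last", align[-1])   # grows from near 0 toward 1
print("ratio:     first", ratio[0], " last", ratio[-1], " sigma_min =", S[-1])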
We now consider the convergence directions of the REGS method. We generate a vector $x_\star$ using the MATLAB function randn, take the matrix constructed above that is not of full column rank as the coefficient matrix $A$, and set the right-hand side $b$. With $x_0 = 0$ and $z_0 = 0$, we plot the corresponding quantities in Figure 3 and Figure 4, respectively.
Figure 3.
Figure 4.
Figure 3 and Figure 4 show results similar to those obtained for the RGS method. That is, the convergence directions of the REGS method are initially determined by the large singular values, then mainly by the small singular values, and finally by the smallest nonzero singular value of $A$.
References
- [1] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.
- [2] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, 2002.
- [3] Y. Saad, Iterative Methods for Sparse Linear Systems, SIAM, Philadelphia, 2003.
- [4] D. Leventhal, A. S. Lewis, Randomized methods for linear constraints: convergence rates and conditioning, Math. Oper. Res. 35 (2010) 641–654.
- [5] A. Ma, D. Needell, A. Ramdas, Convergence properties of the randomized extended Gauss–Seidel and Kaczmarz methods, SIAM J. Matrix Anal. Appl. 36 (2015) 1590–1604.
- [6] T. Strohmer, R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, J. Fourier Anal. Appl. 15 (2009) 262–278.
- [7] A. Zouzias, M. N. Freris, Randomized extended Kaczmarz for solving least squares, SIAM J. Matrix Anal. Appl. 34 (2013) 773–793.
- [8] R. M. Gower, P. Richtárik, Randomized iterative methods for linear systems, SIAM J. Matrix Anal. Appl. 36 (2015) 1660–1690.
- [9] J. Nutini, M. Schmidt, I. Laradji, M. Friedlander, H. Koepke, Coordinate descent converges faster with the Gauss–Southwell rule than random selection, in: International Conference on Machine Learning, PMLR, 2015, pp. 1632–1641.
- [10] A. Hefny, D. Needell, A. Ramdas, Rows versus columns: randomized Kaczmarz or Gauss–Seidel for ridge regression, SIAM J. Sci. Comput. 39 (2017) S528–S542.
- [11] S. Tu, S. Venkataraman, A. C. Wilson, A. Gittens, M. I. Jordan, B. Recht, Breaking locality accelerates block Gauss–Seidel, in: International Conference on Machine Learning, PMLR, 2017, pp. 3482–3491.
- [12] Y. Y. Xu, Hybrid Jacobian and Gauss–Seidel proximal block coordinate update methods for linearly constrained convex programming, SIAM J. Optim. 28 (2018) 646–670.
- [13] K. Du, Tight upper bounds for the convergence of the randomized extended Kaczmarz and Gauss–Seidel algorithms, Numer. Linear Algebra Appl. 26 (2019) e2233.
- [14] M. Razaviyayn, M. Y. Hong, N. Reyhanian, Z. Q. Luo, A linearly convergent doubly stochastic Gauss–Seidel algorithm for solving linear equations and a certain class of over-parameterized optimization problems, Math. Program. 176 (2019) 465–496.
- [15] Y. L. Jiao, B. T. Jin, X. L. Lu, Preasymptotic convergence of randomized Kaczmarz method, Inverse Problems 33 (2017) 125012.
- [16] S. Steinerberger, Randomized Kaczmarz converges along small singular vectors, SIAM J. Matrix Anal. Appl. 42 (2021) 608–615.
- [17] Y. J. Zhang, H. Y. Li, Preconvergence of the randomized extended Kaczmarz method, arXiv preprint arXiv:2105.04924, 2021.
- [18] G. H. Golub, C. F. Van Loan, Matrix Computations, fourth ed., Johns Hopkins University Press, Baltimore, 2013.