A Semi-Randomized and Augmented Kaczmarz Method with Simple Random Sampling for Large-Scale Inconsistent Linear Systems
Abstract
A greedy randomized augmented Kaczmarz (GRAK) method was proposed in [Z.-Z. Bai and W.-T. Wu, SIAM J. Sci. Comput., 43 (2021), pp. A3892–A3911] for large and sparse inconsistent linear systems. However, one has to construct two new index sets by computing the residual vector of the augmented linear system in each iteration. Thus, the computational overhead of this method is large for extremely large-scale problems. Moreover, there is no reliable stopping criterion for this method. In this work, we are interested in solving large-scale sparse or dense inconsistent linear systems, and we try to enhance the numerical performance of the GRAK method. First, we propose an accelerated greedy randomized augmented Kaczmarz method. Theoretical analysis indicates that it converges faster than the GRAK method under very weak assumptions. Second, in order to further reduce the overhead, we propose a semi-randomized augmented Kaczmarz method with simple random sampling. Third, to the best of our knowledge, there are no practical stopping criteria for randomized Kaczmarz-type methods to date. To fill this gap, we introduce a practical stopping criterion for Kaczmarz-type methods, and show its rationality from a theoretical point of view. Numerical experiments are performed on both real-world and synthetic data sets, which demonstrate the efficiency of the proposed methods and the effectiveness of our stopping criterion.
keywords:
Randomized Kaczmarz Method, Greedy Randomized Augmented Kaczmarz Method (GRAK), Semi-Randomized Kaczmarz Method, Inconsistent Linear System, Simple Random Sampling.
AMS: 65F10, 65F15
1 Introduction
Consider the large-scale linear system
(1)   $Ax = b$,
where $A$ is an $m\times n$ coefficient matrix, $b$ is a right-hand side vector of length $m$, and $x$ is an unknown vector of length $n$. In this paper, we assume that there are no zero rows or columns in $A$. On the one hand, if the linear system (1) is consistent, the interest is to find the least-norm solution. The Kaczmarz method [23] is an effective iterative method for consistent linear systems, which has been applied in many real-world applications, e.g., signal processing [2, 16, 24], image and sound reconstruction [15, 18, 38, 42], computed tomography [1, 16, 24], and so on; see [11] and the references therein. At the $k$-th iteration of the Kaczmarz method, the approximate solution $x_{k+1}$ is obtained by orthogonally projecting the current iterate $x_k$ onto the hyperplane $\{x : A^{(i_k)}x = b_{i_k}\}$, where $A^{(i_k)}$ denotes the $i_k$-th row of $A$ and $b_{i_k}$ the $i_k$-th entry of $b$:
(2)   $x_{k+1} = x_k + \dfrac{b_{i_k} - A^{(i_k)}x_k}{\|A^{(i_k)}\|_2^2}\,(A^{(i_k)})^{*}$.
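For concreteness, the following minimal MATLAB sketch (our own illustration, not taken from the paper) performs the projection (2) with the rows swept cyclically; the matrix, right-hand side, number of sweeps, and the zero initial guess are illustrative assumptions.

```matlab
% Minimal sketch of the classical Kaczmarz iteration (cyclic sweep over the rows).
function x = kaczmarz_cyclic(A, b, nsweeps)
    [m, n] = size(A);
    x = zeros(n, 1);                            % assumed initial guess
    rownorm2 = full(sum(abs(A).^2, 2));         % squared Euclidean norms of the rows
    for s = 1:nsweeps
        for i = 1:m                             % cyclic choice of the working row
            ai = A(i, :);                       % i-th row of A
            x = x + ((b(i) - ai*x) / rownorm2(i)) * ai';   % projection (2)
        end
    end
end
```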
However, if the angle between the hyperplanes used in two consecutive iterations is small, the convergence rate of the Kaczmarz method may be slow. In order to partially overcome this difficulty, Strohmer and Vershynin [41] proposed a randomized version and proved its expected exponential rate of convergence, which is called the randomized Kaczmarz (RK) method. However, it is difficult for RK to utilize real-time information when selecting working rows in each iteration. Another disadvantage is that the probability of each row being selected is directly proportional to the square of its Euclidean norm. If the Euclidean norms of a few rows of the coefficient matrix are much larger than those of the others, the RK method will converge very slowly or even diverge. In order to improve the convergence of the RK method, many variations have been proposed, such as the sampling Kaczmarz-Motzkin method [28], the accelerated randomized Kaczmarz method [27], the weighted randomized Kaczmarz method [40], and so on [6, 14, 17, 45, 46, 47]. Some randomized block Kaczmarz methods were introduced in [9, 20, 26, 30, 32, 34, 35].
In [3], Bai and Wu proposed a greedy randomized Kaczmarz (GRK) method for large consistent linear systems. Theoretical analysis and numerical experiments demonstrate that GRK works better than RK. However, in each iteration of the GRK method, one has to compute the residual vector of the linear system and construct a new index set, which incurs a large workload, especially for big-data problems. In order to reduce the computational overhead, Jiang, Wu and Jiang [21] proposed a semi-randomized Kaczmarz method with simple random sampling. This method only uses a small portion of the rows of the matrix, and there is no need to calculate probabilities nor to construct index sets for choosing working rows.
On the other hand, if the linear system (1) is inconsistent, the Kaczmarz-type methods cannot converge to the least-norm least-squares solution [4]. Inspired by the works in [36, 37], Zouzias and Freris [48] introduced a randomized extended Kaczmarz (REK) method, and proved that the REK method converges to the least-norm least-squares solution. Indeed, the REK method can be understood as using the randomized Kaczmarz method to solve the two linear systems $A^{*}z = 0$ and $Ax = b - z$. Based on the REK method, Bai and Wu [5] proposed a partially random extended Kaczmarz (PREK) method. However, both the REK method and the PREK method will converge slowly if one of the two linear systems does so. A number of randomized block Kaczmarz methods for solving inconsistent linear systems were investigated in [13, 33].
Recently, Bai and Wu proposed a greedy randomized augmented Kaczmarz (GRAK) method for solving large sparse inconsistent linear systems [4]. In essence, the GRAK method first equivalently transforms the inconsistent linear system (1) into a consistent augmented linear system, and then applies the GRK method to solve this augmented linear system. A relaxed version of the GRAK method was proposed in [7]. However, similar to the GRK method, it is required to calculate the residual vector with respect to the augmented linear system and to construct two index sets in each iteration. This is unfavorable for extremely large-scale problems.
In this paper, we focus on solving large (dense or sparse) inconsistent linear systems, and try to improve the performance of the GRAK method proposed in [4]. The contributions of this work are as follows. First, we modify the iteration scheme for updating the approximate solution, and propose an accelerated greedy randomized augmented Kaczmarz method. We prove that it converges faster than the GRAK method under a mild condition. Second, we propose a semi-randomized augmented Kaczmarz method with simple random sampling for large inconsistent linear systems. In this method, we are free of computing the residual vector corresponding to the augmented linear system, and there is no need to construct index sets. Therefore, the proposed method is much cheaper than the GRAK method per iteration. The convergence of the method is established. Third, as far as we know, there are no practical stopping criteria for Kaczmarz-type methods. The third contribution of this work is to introduce a practical stopping criterion for Kaczmarz-type methods. By using this strategy, there is no need to know the exact solution a priori nor to compute the residual vector.
The organization of this paper is as follows. In Section 2, we briefly review the randomized extended Kaczmarz method and the greedy randomized augmented Kaczmarz method for large inconsistent linear systems. In Section 3, we propose an accelerated greedy randomized augmented Kaczmarz method and an accelerated greedy randomized augmented Kaczmarz method with simple random sampling. Convergence results of the two methods are established. In Section 4, we present a practical stopping criterion for Kaczmarz-type methods, and show its rationality theoretically. In Section 5, we perform comprehensive numerical experiments on both synthetic and real-world data sets, to show the efficiency of the proposed methods as well as the effectiveness of our stopping criterion. Some concluding remarks are given in Section 6.
Let us introduce some notation. For a given matrix $G$, we denote by $G^{*}$, $\|G\|_2$, $G^{\dagger}$, $G^{(i)}$, and $G_{(j)}$ its conjugate transpose, Euclidean norm, Moore–Penrose inverse, $i$-th row, and $j$-th column, respectively. Let $\mathcal{R}(G)$ and $\mathcal{R}(G)^{\perp}$ be the range space of $G$ and the orthogonal complement of this subspace, respectively. For a vector $u$, let $u_{\mathcal{R}(G)}$ and $u_{\mathcal{R}(G)^{\perp}}$ be the orthogonal projections of $u$ onto $\mathcal{R}(G)$ and $\mathcal{R}(G)^{\perp}$, respectively. Denote by $\lambda_{\min}(G)$ the minimal nonzero eigenvalue of the matrix $G$, and by $e_j$ the $j$-th column of the identity matrix $I$. Let $\mathbb{E}[\,\cdot\,]$ be the full expectation, and let $\mathbb{E}_k[\,\cdot\,]=\mathbb{E}[\,\cdot \mid i_0,i_1,\ldots,i_{k-1}]$ be the conditional expectation on the first $k$ iterations, where $i_t$ is the index selected at the $t$-th iteration [3].
In light of the properties of the full expectation and the conditional expectation, we have that $\mathbb{E}\big[\mathbb{E}_k[\,\cdot\,]\big]=\mathbb{E}[\,\cdot\,]$.
2 The randomized extended Kaczmarz method and the greedy randomized augmented Kaczmarz method
In this section, we briefly introduce the randomized extended Kaczmarz (REK) method and the greedy randomized augmented Kaczmarz (GRAK) method for large-scale inconsistent linear systems. The REK method is a combination of the randomized orthogonal projection algorithm and the randomized Kaczmarz method [48]. The main idea is to efficiently reduce the norm of the noisy part of $b$, i.e., the component $b_{\mathcal{R}(A)^{\perp}}$, by using the randomized orthogonal projection algorithm, and then to apply the randomized Kaczmarz method to a new linear system whose right-hand side is now (arbitrarily) close to the column space of $A$. The framework of the REK method is as follows; for more details, refer to [48].
Algorithm 1.
The randomized extended Kaczmarz (REK) method [48]
Input: Given , , the convergence tolerance , and let and .
Output: The approximate solution .
1. for do
2. Pick with probability .
3. Pick with probability .
4. Set .
5. Set .
6. Check every iterations and terminate if
7. end for
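Since several formulas in Algorithm 1 are given only implicitly above, the following minimal MATLAB sketch records one common formulation of the REK iteration from [48]; the initialization $x_0 = 0$, $z_0 = b$, the squared-norm sampling probabilities, and the use of the freshly updated $z$ in the $x$-update are our reading and should be checked against the original paper.

```matlab
% Minimal sketch of the REK iteration (one common formulation of [48]).
function x = rek_sketch(A, b, maxit)
    [m, n] = size(A);
    x = zeros(n, 1);  z = b;                              % assumed initialization
    rowp = full(sum(abs(A).^2, 2)); rowp = rowp/sum(rowp);  % row probabilities
    colp = full(sum(abs(A).^2, 1)'); colp = colp/sum(colp); % column probabilities
    for k = 1:maxit
        i = find(cumsum(rowp) >= rand, 1);     % sample a row index
        j = find(cumsum(colp) >= rand, 1);     % sample a column index
        aj = A(:, j);                          % project z away from the selected column
        z  = z - ((aj'*z) / (aj'*aj)) * aj;
        ai = A(i, :);                          % Kaczmarz step for A*x = b - z
        x  = x + ((b(i) - z(i) - ai*x) / (ai*ai')) * ai';
    end
end
```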
In [4], Bai and Wu proposed a greedy randomized augmented Kaczmarz (GRAK) method. The main idea is to rewrite the large-scale inconsistent linear system into a consistent augmented linear system
(1)   $\hat{A}\hat{x} = \hat{b}$,
where
(2)   $\hat{A} = \begin{pmatrix} I & A \\ A^{*} & 0 \end{pmatrix}, \qquad \hat{x} = \begin{pmatrix} z \\ x \end{pmatrix}, \qquad \hat{b} = \begin{pmatrix} b \\ 0 \end{pmatrix},$
and then apply the greedy randomized Kaczmarz (GRK) method [3] to the augmented linear system (1).
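As a quick sanity check (our own illustration, assuming $A$ has full column rank so that the least-squares solution is unique), the $x$-part of the solution of the augmented system (1)–(2) coincides with the least-squares solution of $Ax=b$:

```matlab
% The augmented system [I A; A' 0][z; x] = [b; 0] is consistent, and its
% x-part recovers the least-squares solution of the inconsistent system Ax = b.
m = 500;  n = 50;                         % illustrative sizes
A = randn(m, n);  b = randn(m, 1);        % a generically inconsistent system
Ahat = [eye(m), A; A', zeros(n)];         % augmented coefficient matrix, cf. (2)
bhat = [b; zeros(n, 1)];
u = Ahat \ bhat;                          % direct solve of the augmented system
x_aug = u(m+1:end);                       % extract the x-part
x_ls  = A \ b;                            % MATLAB's least-squares solution
fprintf('||x_aug - x_ls|| = %.2e\n', norm(x_aug - x_ls));
```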
More precisely, define the two index sets
(3) |
and
(4) |
where
(5) |
with
(6) |
and
(7) |
Let
(8) |
and
(9) |
The greedy randomized augmented Kaczmarz method is given as follows; for more details, refer to [4].
Algorithm 2.
The greedy randomized augmented Kaczmarz (GRAK) method [4]
Input: , , , , and .
Output: and .
1. for do
2. Compute the tolerances , and by (5)–(6), and identify the index sets and by the criteria (3) and (4), and determine the projected residuals and by the definitions (8) and (9), respectively.
3. Select an index satisfying , with the probability
4. If , then set and compute
and
else if , then set , and compute
5. endfor
The following theorem indicates that the GRAK method converges in expectation to the least-norm least-squares solution.
Theorem 1.
[4] Starting from any initial vectors and , the iteration sequences and , generated by the GRAK method, converge in expectation to the least-norm least-squares solution of the linear system (1) and the orthogonal projection vector , respectively. Moreover, the global solution error in expectation with respect to both iteration sequences and obeys
and
(10) |
where
and
(11) |
with
(12) |
and
(13) |
As a result, it holds that
Theorem 1 shows that the convergence rate of GRAK strongly depends on the minimum singular value and the Frobenius norm, the number of rows, and the size of the Euclidean norms of all rows and columns of the matrix .
3 The Proposed Methods
In each iteration of the GRAK method, one has to compute two residual vectors and the associated probabilities in order to construct two new index sets. Thus, one has to scan all the rows of the coefficient matrix, which is very time-consuming when the matrix is extremely large, and this is contrary to the original motivation of the Kaczmarz-type methods.
To deal with these problems, in this section we first propose an accelerated greedy randomized augmented Kaczmarz method. Convergence analysis shows that, under a mild condition, the accelerated method often converges faster than the GRAK method. To further reduce the computational overhead of the accelerated method, we then propose an accelerated greedy randomized augmented Kaczmarz method with simple random sampling. The convergence of this method is also established.
3.1 An accelerated greedy randomized augmented Kaczmarz method
It is seen from Algorithm 2 that, in some iterations, the GRAK method only updates the auxiliary vector $z$ rather than the approximate solution $x$, which may slow down the rate of convergence of the algorithm. Indeed, just like the REK method, GRAK may converge slowly if one of the two iteration sequences does so. Thus, it is necessary to give new insight into the GRAK method and to improve its numerical performance.
The idea is that, in an iteration where only the auxiliary vector $z$ would be updated, after computing the new $z$ we additionally select a working row at random to further update the approximate solution for the linear equation $Ax = b - z$ by a randomized Kaczmarz step. Motivated by the semi-randomized Kaczmarz method proposed in [21], and in order to refrain from evaluating probabilities and from constructing two index sets in each iteration, we propose an accelerated greedy randomized augmented Kaczmarz (AGRAK) method to solve the augmented linear system (1).
Algorithm 3.
An accelerated greedy randomized augmented Kaczmarz (AGRAK) method
Input: , , , , and .
Output: and .
1. for do
2. Select according to
3. If , then set and compute
and
Else if , then set ,
and select an index with probability , and compute
4. endfor
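Since several formulas in Algorithm 3 are given only implicitly above, the following schematic MATLAB sketch records our reading of its key modification: the working row of the augmented system is chosen by the semi-randomized rule of [21] (largest scaled residual), and whenever the selected row belongs to the lower block, so that only $z$ would be updated, an extra randomized Kaczmarz step is performed on $Ax = b - z$. The exact selection rule and probabilities of Algorithm 3 should be taken from the paper itself.

```matlab
% Schematic sketch of one AGRAK-type iteration (our reading of Algorithm 3).
% z approximates the part of b outside the range of A; x approximates x_*.
function [x, z] = agrak_step_sketch(A, b, x, z)
    m = size(A, 1);
    r_top = b - z - A*x;                       % residuals of the rows [I  A | b]
    r_bot = -(A'*z);                           % residuals of the rows [A' 0 | 0]
    w = [abs(r_top).^2 ./ (1 + full(sum(abs(A).^2, 2))); ...
         abs(r_bot).^2 ./ full(sum(abs(A).^2, 1))'];
    [~, k] = max(w);                           % semi-randomized rule: largest scaled residual
    if k <= m                                  % row of the upper block: update z and x
        ai = A(k, :);
        t  = (b(k) - z(k) - ai*x) / (1 + ai*ai');
        z(k) = z(k) + t;
        x  = x + t * ai';
    else                                       % row of the lower block: update z only,
        aj = A(:, k - m);                      % then an extra RK step on A*x = b - z
        z  = z - ((aj'*z) / (aj'*aj)) * aj;
        p  = full(sum(abs(A).^2, 2)); p = p / sum(p);
        i  = find(cumsum(p) >= rand, 1);
        ai = A(i, :);
        x  = x + ((b(i) - z(i) - ai*x) / (ai*ai')) * ai';
    end
end
```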
Next, we consider the convergence of the AGRAK method. The following lemma is needed.
Lemma 3.1.
Let , and . Then there exists , such that .
Proof.
Let , then
and
which satisfies . ∎
First, notice that if , Algorithm 3 applies the semi-randomized Kaczmarz method [21] to solve the augmented linear system (1). Denote by
from [21, Theorem 3.2], we have that
(1) |
Thanks to the structures of $\hat{A}$, $\hat{x}$, and $\hat{b}$ in (2), one can rewrite (1) as
(2) |
Second, we consider the convergence of as . Notice that if , then
Consequently,
and
which is nothing but applying the semi-randomized Kaczmarz method to solve the equation . Therefore, by [21, Theorem 3.2],
(4) |
Take full expectation on both sides of (4), and set
we get
(5) |
Next, we update the approximate solution for the linear equation $Ax = b - z$ by using the randomized Kaczmarz method. From [31, 41], we obtain
(6) |
Take full expectation on both sides of (6), and set
(7) |
we get
(8) |
where the second inequality is from (5). Suppose that 111We mention that this assumption is very weak in practice, since Kaczmarz-type methods often applies to very large (dense) linear systems., such that
or in other words,
(9) |
Combining (5), (3.1) and (9), we have from Lemma 3.1 that there exists , such that
(10) |
where the high-order term is omitted. In conclusion, we have the following theorem for the convergence of the proposed AGRAK method.
Theorem 2.
Under the above condition, the following theorem indicates that Algorithm 3 converges faster than the GRAK method; refer to (10).
Theorem 3.
Proof.
First, we prove that . Recall that
and
As there are no zero columns in , we obtain
and .
Second, we prove that . Indeed, it is sufficient to show that . Let
Notice that . Let , then
(11) |
3.2 An accelerated greedy randomized augmented Kaczmarz method with simple random sampling
Although Algorithm 3 can converge faster than GRAK, just like the GRAK method, one still has to compute the residual vectors of the augmented linear system in each iteration. That is, we have to access all the rows and columns of the data matrix, which is contrary to the motivation of the Kaczmarz-type methods. This is particularly unfavorable when the coefficient matrix is large and dense, and it is interesting to accelerate Algorithm 3 further.
To deal with this problem, we apply the simple random sampling strategy advocated in [21] to solve (1). The key idea is that, in each iteration, we first generate a subset of rows by using simple random sampling, and then select working rows within this subset. Notice that the cost of generating such a subset via simple random sampling is negligible compared with that of the overall algorithm. The advantages are that we avoid accessing all the rows or columns of the data matrix, and there is no need to compute the whole residual vector associated with the augmented linear system. The algorithm is given as follows.
Algorithm 4.
A semi-randomized augmented Kaczmarz method with simple random sampling
Input: , , , , a parameter and the maximal iteration number .
Output: and .
1. for do
2. Generate an indicator set with rows by using the simple random sampling method, where means rounding down. Let and . Then, select according to
3. If , then set , and compute
and
Else if , then set , and compute
and select an index with probability , and compute
4. end for
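The sampled row-selection step of Algorithm 4 can be sketched as follows in MATLAB (our own illustration; the augmented matrix is formed explicitly here only for clarity, and the scaled-residual rule mirrors the semi-randomized selection of [21] restricted to the sampled rows):

```matlab
% Select a working row of the augmented system within a simple random sample.
% Ahat and bhat denote the augmented matrix and right-hand side in (2),
% u = [z; x] is the current iterate, and eta in (0, 1] is the sampling ratio.
function k = sample_and_select(Ahat, bhat, u, eta)
    M = size(Ahat, 1);
    s = max(1, floor(eta * M));                % sample size, rounding down
    idx = randperm(M, s);                      % simple random sampling without replacement
    r = bhat(idx) - Ahat(idx, :) * u;          % residuals of the sampled rows only
    w = abs(r).^2 ./ full(sum(abs(Ahat(idx, :)).^2, 2));   % scaled residuals
    [~, t] = max(w);                           % greedy choice within the sample
    k = idx(t);                                % index of the selected working row
end
```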
We are in a position to consider the convergence of the semi-randomized augmented Kaczmarz method with simple random sampling. First, if the selected index falls in the first case of Algorithm 4, then, similar to the proof of Theorem 4.2 in [21], as the number of rows of the matrix is sufficiently large, we have from Chebyshev's (weak) law of large numbers [22] that there are two scalars such that
It can be rewritten as
(13) |
where the two quantities are defined in (12) and (13), respectively. Take full expectation on both sides of (13) and set
we obtain
Second, if the selected index falls in the second case, then, analogous to the analysis of Algorithm 3, we have that
As the number of rows of the matrix is sufficiently large, we have from Chebyshev's (weak) law of large numbers [22] that there are two scalars such that
where is defined in (7), and
Theorem 4.
Remark 3.1.
As we only use a small portion of the rows of the data matrix in each iteration, the convergence factors of Algorithm 4 are often (slightly) larger than those of Algorithm 3. That is, Algorithm 4 may use more iterations than Algorithm 3. However, this does not mean that Algorithm 4 definitely runs slower than Algorithm 3. Indeed, Algorithm 4 is much cheaper than Algorithm 3 per iteration, and it may use less CPU time in practice. We refer to Section 5 for numerical comparisons of the algorithms.
4 A practical stopping criterion for Kaczmarz-type methods
To the best of our knowledge, there are no practical stopping criteria for randomized Kaczmarz-type methods to date. To fill this gap, we introduce a practical stopping criterion for randomized Kaczmarz-type methods in this section. Let $\{x_k\}$ be an iterative sequence generated by any Kaczmarz-type method, and let $x_{\star}$ be the least-squares solution of the linear system (1). One scheme is to use the relative solution error (RSE) as the stopping criterion [3, 5, 6, 21]
(1) |
In [29], the absolute solution error (ASE) is used
(2) |
Obviously, both (1) and (2) are impractical since $x_{\star}$ is unknown in advance. In [43], the relative residual (RRes) is utilized
However, it is unsuitable for inconsistent linear systems, and one has to calculate the residual vector. In other words, we have to access all the rows of the coefficient matrix in each iteration. In [25], Li and Wu suggested using the adjacent iterative solution error (AISE for short) as the stopping criterion
(3) |
We observe that (3) is free of the exact solution $x_{\star}$, and there is no need to compute the residual vector. Unfortunately, the fact that the difference between adjacent iterates converges to zero is only a necessary condition for the convergence of $x_k$ to $x_{\star}$; it is not sufficient.
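To illustrate this point (with our own example, not taken from [25]), consider the partial sums of the harmonic series, $s_k=\sum_{i=1}^{k}1/i$: the adjacent differences $s_{k+1}-s_k=1/(k+1)$ tend to zero, yet the sequence $\{s_k\}$ diverges. Hence a small adjacent difference alone cannot certify that $x_k$ is close to its limit, which motivates the criterion proposed below.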
In this section, we will propose a practical stopping criterion for all the randomized Kaczmarz-type methods, and show its rationality theoretically. We need the following lemma.
Lemma 4.1.
Let be an iterative sequence obtained from a convergent Kaczmarz-type method. Then, there is a positive integer such that
Proof.
We prove it by contradiction. Suppose that , for all the positive integers . If we take , then
which contradicts the fact that the sequence is generated by a convergent Kaczmarz-type method. ∎
Let
then the backward divided difference of at is defined as [10]
From Lemma 4.1, we see that is a strictly monotonically decreasing discrete function whose infimum is 0, and
Thus, for any , there exists a positive integer , such that . Next, we will show that there are and , if , then . Indeed, let , there exists , such that
(4) |
On one hand, if , then . On the other hand, if , then there is a scalar
(5) |
such that
(6) |
Combining the inequalities (4) and (6), we arrive at
From (5), we have . Hence,
In conclusion, we have the following theorem.
Theorem 5.
Let be a strictly monotonically decreasing discrete function whose infimum is 0. For any , there are and , if
then .
Remark 4.1.
Based on Theorem 5, we propose the following stopping criterion for Kaczmarz-type methods
(7) |
where is a user-provided number and is the convergence tolerance.
It is seen that the parameter is crucial to our stopping criterion. Next, we briefly discuss how to choose it in practice. Assume that
where , and
(8) |
Therefore,
5 Numerical Experiments
In this section, we perform numerical experiments to show the efficiency of our algorithms as well as the effectiveness of the proposed stopping criterion. All the numerical experiments are run on an HP workstation with two Intel(R) Xeon(R) E5-2640 v3 processors (20 cores in total, 2.60 GHz) and 256 GB of RAM. The operating system is 64-bit Windows 10, and all the numerical results are obtained by using MATLAB 2018b. To show the efficiency of the proposed algorithms, we compare Algorithm 3 and Algorithm 4 with the following state-of-the-art Kaczmarz-type methods for solving large-scale (dense or sparse) inconsistent linear systems:
REK [48]: The randomized extended Kaczmarz method.
PREK [5]: The partially randomized extended Kaczmarz method.
GRAK [4]: The greedy randomized augmented Kaczmarz method.
RGRAK [7]: The relaxed greedy randomized augmented Kaczmarz method, in which the relaxation parameter is set to be 1 [7].
In the tables below, we denote by “IT” the number of iterations, by “CPU” the CPU time in seconds, and by “RSE” the relative error defined as
(10) |
where $x_{\star}$ is the minimum 2-norm least-squares solution of the inconsistent linear system (1), which is generated by a MATLAB built-in function. All the numerical results are the means over 10 runs.
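For reproducibility, the reference solution and the RSE can be evaluated as in the following MATLAB sketch; lsqminnorm is only our assumption for the elided built-in function (pinv(A)*b serves the same purpose), and the squared form of the error is likewise an assumption, cf. (10).

```matlab
% Reference minimum 2-norm least-squares solution and the relative solution error.
x_star = lsqminnorm(A, b);                  % assumed built-in; pinv(A)*b also works
RSE = norm(x - x_star)^2 / norm(x_star)^2;  % relative solution error, cf. (10)
```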
5.1 Effectiveness of the stopping criterion (7)
In this subsection, we show the effectiveness of our stopping criterion for large inconsistent linear systems. The test matrices are synthetic and are generated by using a MATLAB built-in function. Similar to [4], the right-hand side is chosen as $b = Ax_{\star} + r$, where $x_{\star}$ is one of the least-squares solutions of the inconsistent linear system (1), and $r$ is a nonzero vector in the null space of $A^{*}$, generated by a MATLAB built-in function. Two examples are performed in this subsection.
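A minimal MATLAB sketch of this construction is given below; the generators randn and null are our assumptions for the elided built-in functions, and the sizes are illustrative.

```matlab
% Construct a synthetic inconsistent system with a known least-squares solution.
m = 2000;  n = 200;                  % illustrative sizes
A = randn(m, n);                     % assumed generator of the dense test matrix
x_star = randn(n, 1);
N = null(A');                        % orthonormal basis of the null space of A'
r = N * randn(size(N, 2), 1);        % nonzero vector orthogonal to the range of A
b = A * x_star + r;                  % inconsistent right-hand side: A'*(b - A*x_star) = 0
```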
In the first example, we run the six algorithms using their own stopping criteria. More precisely, in the REK method, we use [48]
(11) |
as the stopping criterion; in the PREK method, we use [5]
(12) |
as the stopping criterion; in the GRAK method and the RGRAK method, we use [4, 7]
(13) |
as the stopping criteria, where is computed by . In Algorithm 3 and Algorithm 4, we make use of
(14) |
as the stopping criterion, where the quantities involved refer to (2), and the convergence tolerance is chosen as . Table 1 lists the numerical results, where the values in bold are the best ones.
REK | IT | 19916 | 37204.9 | 64410.7 | 109528.8 | 185659.2 |
CPU | 161.9 | 375.7 | 646.4 | 1362.3 | 2618.0 | |
RSE | 4.38e-3 | 6.53e-3 | 9.44e-3 | 1.34e-2 | 1.98e-2 | |
PREK | IT | 13191.5 | 24724.2 | 47252.7 | 87730.9 | 169481.5 |
CPU | 3.5 | 7.3 | 14.9 | 31.4 | 64.3 | |
RSE | 1.00e-2 | 1.00e-2 | 1.00e-2 | 1.00e-2 | 1.00e-2 | |
GRAK | IT | 3003.5 | 5645.6 | 10181.9 | 16879.9 | 29117.3 |
CPU | 17.1 | 49.9 | 84.7 | 139.9 | 282.1 | |
RSE | 4.79e-1 | 4.50e-1 | 4.16e-1 | 3.94e-1 | 3.61e-1 | |
RGRAK | IT | 2977.3 | 5601.4 | 10206 | 16702.6 | 28927.7 |
CPU | 15.7 | 47.6 | 84.1 | 139.6 | 283.5 | |
RSE | 4.90e-1 | 4.73e-1 | 4.55e-1 | 4.44e-1 | 4.10e-1 | |
Algorithm 3 | IT | 9600 | 19200 | 36880 | 65480 | 127200 |
CPU | 61.1 | 117.9 | 253.6 | 581.5 | 1356.9 | |
RSE | 7.52e-4 | 1.17e-3 | 1.73e-3 | 2.22e-3 | 3.19e-3 | |
Algorithm 4 | IT | 10240 | 19920 | 37960 | 68080 | 131360 |
CPU | 14.0 | 38.9 | 107.5 | 250.7 | 748.6 | |
RSE | 6.61e-4 | 1.06e-3 | 1.63e-3 | 2.09e-3 | 2.99e-3 |
It is seen from Table 1 that the PREK method and the RGRAK method may require less CPU time and fewer iterations when using their own stopping criteria. However, the computed solutions obtained from these methods are unreliable, and the "real" relative errors of the corresponding approximations can be large. For instance, the RSE values of the GRAK and RGRAK methods are only in the order of $10^{-1}$, and the RSE values of the REK and PREK methods are only in the order of $10^{-2}$ or $10^{-3}$, while those of Algorithm 3 and Algorithm 4 are in the order of $10^{-3}$ or $10^{-4}$.
Indeed, we find that the denominator of the stopping criterion (13) can be quite large, which leads to an early termination of the GRAK and RGRAK methods even when the iterate is far from being a good approximation to the least-squares solution. For example, for the largest test matrix in Table 1, the value of this denominator can be extremely large. Moreover, both (12) and (13) are impractical as the exact solution is unknown in advance, and one has to perform two matrix-vector products with respect to $A$ and $A^{*}$ for computing (11). That is to say, one has to access all the rows and columns of the coefficient matrix. Thus, compared with the available stopping criteria, our stopping criterion (14) is effective and is more appropriate for Kaczmarz-type methods.
REK | IT | 27680 | 54360 | 98480 | 173520 | 306360 |
CPU | 9.5 | 21.3 | 41.4 | 73.3 | 139.1 | |
RSE | 4.43e-4 | 7.04e-4 | 9.94e-4 | 1.46e-3 | 2.28e-3 | |
PREK | IT | 17600 | 33360 | 60520 | 106200 | 186120 |
CPU | 4.7 | 9.9 | 19.4 | 36.7 | 66.5 | |
RSE | 2.17e-3 | 2.87e-3 | 3.49e-3 | 4.52e-3 | 5.84e-3 | |
GRAK | IT | 11200 | 23040 | 43760 | 81960 | 146280 |
CPU | 73.9 | 230.9 | 455.4 | 1013.4 | 1889.4 | |
RSE | 1.01e-3 | 1.63e-3 | 2.22e-3 | 3.04e-3 | 4.13e-3 | |
RGRAK | IT | 11040 | 22640 | 42520 | 78640 | 140800 |
CPU | 73.5 | 184.1 | 501.6 | 1123.3 | 1961.4 | |
RSE | 1.02e-3 | 1.76e-3 | 2.59e-3 | 3.61e-3 | 5.58e-3 | |
Algorithm 3 | IT | 9600 | 19200 | 36040 | 66440 | 116600 |
CPU | 57.1 | 96.1 | 289.2 | 592.9 | 1119.5 | |
RSE | 7.77e-4 | 1.19e-3 | 1.66e-3 | 2.23e-3 | 2.99e-3 | |
Algorithm 4 | IT | 10120 | 20320 | 37720 | 68920 | 120760 |
CPU | 14.0 | 40.6 | 110.2 | 223.3 | 713.5 | |
RSE | 6.99e-4 | 1.10e-3 | 1.54e-3 | 2.09e-3 | 2.88e-3 |
In view of the numerical results given in the first example, in the second example, we run the six algorithms using the proposed stopping criterion (7) with and . We generate another five test matrices by using the MATLAB built-in function . The numerical results are given in Table 2, where the best results are in bold.
Some remarks are in order. First, comparing the results of the GRAK, RGRAK, REK, and PREK methods in Table 1 and Table 2, we find that the approximate solutions obtained from these four methods are more accurate than those obtained when their own stopping criteria are used. This means that the proposed stopping criterion is feasible and more effective than the available ones. Second, we observe from Table 2 and Table 1 that REK runs much faster when using (14) as the stopping criterion. This is due to the fact that one has to perform two matrix-vector products with respect to $A$ and $A^{*}$ when evaluating (11); the computational cost of these matrix-vector products is high when the matrix is large and dense. Third, it is seen from the table that, for this synthetic problem, REK enjoys the highest accuracy and PREK runs the fastest, while Algorithm 3 needs the fewest iterations. This implies that the convergence speed of Algorithm 3 is the highest among the six. Of course, an algorithm with the fewest iterations does not necessarily run the fastest; indeed, the frameworks of REK and PREK are different from those of the other four algorithms. Fourth, we see that Algorithm 3 and Algorithm 4 outperform GRAK and RGRAK significantly, both in terms of CPU time and iteration numbers. Specifically, Algorithm 3 needs the fewest iterations and Algorithm 4 uses the least CPU time. This is because Algorithm 4 only utilizes a small portion of the rows of the matrix in each iteration.
5.2 Efficiency of the proposed algorithms
In this subsection, we perform numerical experiments to show the superiority of the proposed algorithms over many state-of-the-art Kaczmarz-type methods for large inconsistent linear systems. Based on the numerical experiments made in Section 5.1, and for the sake of a fair comparison, we make use of (7) as the stopping criterion for all the algorithms, with . In order to demonstrate the superiority of Algorithm 4 over the GRAK method, we consider the value of
(15)   $\text{Speed-up} = \dfrac{\text{CPU time of GRAK}}{\text{CPU time of Algorithm 4}}$.
There are three numerical experiments in this section.
Matrix | Size | Nnz | Background
— | 51307 | Combinatorial Problem |
— | 137228 | Combinatorial Problem |
— | 1748122 | 6804304 | Least Squares Problem
— | 16150 | Directed Graph |
— | 233921 | Linear Programming Problem |
— | 81671 | Combinatorial Problem
In the first experiment, we show the numerical behavior of the proposed algorithms on some large sparse inconsistent linear systems. The test matrices are from the University of Florida Sparse Matrix Collection (https://sparse.tamu.edu/), and the details of these data matrices are listed in Table 3. In this experiment, the right-hand sides corresponding to five of the matrices are constructed in the same way as in Section 5.1, while for the remaining matrix the right-hand side is generated by using a MATLAB built-in function. The convergence tolerance is chosen as , and the sampling ratio in Algorithm 4 is set to be . Table 4 lists the numerical results, where the values in bold are the best ones.
Matrix& Size | 1748122 | ||||||
REK | IT | 49640 | 64760 | / | 476680 | 69440 | 103840 |
CPU | 21.8 | 51.6 | / | 396.7 | 273.4 | 152.8 | |
PREK | IT | 36640 | 41800 | / | / | 100880 | / |
CPU | 14.1 | 29.7 | / | / | 175.6 | / | |
GRAK | IT | 38880 | 56200 | / | 300320 | 18680 | 73880 |
CPU | 35.1 | 99.4 | / | 418.2 | 119.6 | 190.9 | |
RGRAK | IT | 33680 | 47440 | / | 398080 | 18720 | 93240 |
CPU | 29.5 | 85.5 | / | 531.9 | 118.2 | 202.9 | |
Algorithm 3 | IT | 25680 | 32560 | / | 223280 | 11800 | 54440 |
CPU | 11.9 | 29.2 | / | 165.7 | 33.4 | 70.1 | |
Algorithm 4 | IT | 33880 | 45920 | 406640 | 251760 | 12680 | 59040 |
CPU | 15.6 | 45.1 | 59120.4 | 161.2 | 31.8 | 68.5 | |
Speed-up | 2.25 | 2.20 | / | 2.59 | 3.76 | 2.79 |
Some remarks are in order. First, it is seen from Table 4 that Algorithm 3 and Algorithm 4 are far superior to the other algorithms in terms of CPU time and number of iterations. In terms of the values of “Speed-up”, we observe from Table 4 that Algorithm 4 is about 2 to 3 times faster than the GRAK method, and the improvements are impressive. Specifically, for the large matrix , all the algorithms fail to converge within 24 hours, except for Algorithm 4. Therefore, we benefit from the strategy of simple sampling and there is no need to compute the whole residual vector during iterations. Thus, Algorithm 4 is competitive for solving large and sparse inconsistent linear systems. Notice that the PREK method does not work for the two matrices and , either. Indeed, it is required that the coefficient matrix is tall and of full column rank in the PREK method [5, Theorem 3.1], while the two matrices are not of full column rank.
In the second experiment, we consider the problem of tomographic image reconstruction and run the six algorithms on a 2-D parallel-beam tomography problem. In this experiment, a sparse matrix and an "exact solution" are generated by using a function in the AIR Tools toolbox [19]. We set the right-hand side by adding to the consistent right-hand side a nonzero noise vector lying in the null space of $A^{*}$, generated by a MATLAB built-in function.
In this experiment, we solve the resulting inconsistent linear system, and all the algorithms are run for 20000 iterations. Consider the signal-to-noise ratio (SNR) [44]
(16) |
where the original clean signal is the "exact solution" and the denoised signal is the computed solution obtained from one of the six algorithms. Notice that the larger the SNR, the better the denoising effect of an algorithm. The original image (Exact), the reconstructed ones, as well as the values of SNR are shown in Figure 1. It is obvious from the figures and the SNR values that Algorithm 3 and Algorithm 4 can remove the noise and restore the real image efficiently, and our algorithms are superior to the other methods in terms of SNR.
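Assuming the common decibel definition of the SNR (the exact formula (16) follows [44] and is not reproduced here), the value reported above can be computed as follows.

```matlab
% Signal-to-noise ratio in dB (a common definition; cf. (16) and [44]).
SNR = 20 * log10(norm(x_exact) / norm(x_exact - x_computed));
```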
Spectral Regression Discriminant Analysis (SRDA) is one of the most popular methods for large-scale discriminant analysis [12]. By using the tool of spectral graph analysis, SRDA casts discriminant analysis into a regression framework which facilitates both efficient computation and the use of regularization techniques. Specifically, SRDA only needs to solve a set of least squares problems and there is no eigenvector computation involved, which can save both the running time and the memory compared with some classical linear discriminant analysis methods.
Let be a set of training samples in a -dimensional feature space. Assume that the data matrix is partitioned into classes as , where is the -th set with being the number of samples. Denote by the centroid vector of , and by the global centroid vector of the training data. Denote by
where and are the one vector of size and , respectively, and is the -by- identity matrix. Let and be an orthonormal basis of .
Let . In essence, the SRDA method resorts to the following least squares problems [12, 39]:
(17) |
where is the -th column vector of .
Without loss of generality, we solve the least squares problem (17) by using Algorithm 3 and Algorithm 4, where the sampling ratio is chosen as  in the latter algorithm. We compare the two proposed methods with the REK method, the PREK method, the GRAK method, and the RGRAK method. In the stopping criterion (7), we use and . We run the algorithms on five dense databases: the MNIST dataset (http://yann.lecun.com/exdb/mnist/), the Isolet dataset (http://archive.ics.uci.edu/dataset/54/isolet), the USPS dataset (https://hyper.ai/datasets/16041) [8], the CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html), as well as the YouTube Faces dataset (http://www.cs.tau.ac.il/~wolf/ytfaces/). In all the experiments, we randomly choose samples as the training set, where is the total number of samples. The details of the data sets are listed in Table 5, and Table 6 lists the numerical results obtained.
Datasets | Dimensionality | Number of total samples | Background | Type
MNIST | 784 | 70000 | Handwritten digits recognition | dense
Isolet | 617 | 7797 | Classification | dense
USPS | 256 | 9598 | Handwritten digits recognition | dense
CIFAR-10 | 3072 | 50000 | Object recognition | dense
YouTube Faces | 4096 | 370319 | Face recognition | dense
Matrix& Size | 9598 | 50000 | ||||
REK | IT | 166800 | 246440 | 367000 | / | / |
CPU | 237.6 | 83.3 | 111.9 | / | / | |
PREK | IT | / | 374680 | 162600 | / | / |
CPU | / | 101.6 | 40.6 | / | / | |
GRAK | IT | 19200 | 124840 | 188480 | 465080 | / |
CPU | 419.9 | 312.5 | 153.1 | 30428.1 | / | |
RGRAK | IT | 17760 | 88600 | 98440 | 130800 | / |
CPU | 350.2 | 217.9 | 79.9 | 14115.9 | / | |
Algorithm 3 | IT | 17200 | 82200 | 72320 | 130760 | / |
CPU | 1451.5 | 187.2 | 46.2 | 26437.2 | / | |
Algorithm 4 | IT | 88000 | 236240 | 338280 | 975600 | 905200 |
CPU | 183.4 | 72.7 | 94.6 | 4393.7 | 44789.7 | |
Speed-up | 2.3 | 4.3 | 1.6 | 6.9 | / |
From Table 6, we observe that Algorithm 4 is the fastest while Algorithm 3 needs the fewest iterations in most cases. Specifically, Algorithm 4 is about two to six times faster than GRAK, and the improvement is impressive. For the extremely large-scale YouTube Faces data set, all the algorithms fail to converge except for Algorithm 4. Thus, we benefit from the strategy of random sampling, and there is no need to calculate the whole residual vector at each iteration. All these results show that Algorithm 4 is advantageous in solving large-scale least squares problems with dense data. Notice that the PREK method does not work for one of the data sets; this is because this method may fail to converge if the coefficient matrix is not of full column rank [5].
6 Concluding Remarks
The greedy randomized augmented Kaczmarz method is an effective approach proposed recently for large and sparse inconsistent linear systems. The key of this method is to equivalently transform an $m$-by-$n$ inconsistent linear system into an $(m+n)$-by-$(m+n)$ consistent augmented linear system, and then to apply the GRK method to the augmented linear system. However, in the GRAK method, it is required to calculate the residual of the augmented linear system and to construct two index sets at each iteration. Moreover, in some iterations it only updates the auxiliary vector rather than the approximate solution, which may slow down the convergence rate. Thus, the computational overhead of this method is large, especially for big-data problems, and it is urgent to enhance the numerical performance of this method.
The main contributions of this work are two-fold. First, we propose an accelerated greedy randomized augmented Kaczmarz method for large-scale sparse or dense inconsistent linear systems, and apply the strategy of simple random sampling to further reduce the workload. Theoretical analysis shows that, under very weak assumptions, the accelerated greedy randomized augmented Kaczmarz method converges faster than the greedy randomized augmented Kaczmarz method. Second, as far as we know, there are no practical stopping criteria for randomized Kaczmarz-type methods to date. To fill this gap, we introduce a practical stopping criterion that is applicable to all the randomized Kaczmarz-type methods. Numerical experiments demonstrate the efficiency of our proposed algorithms and the effectiveness of the stopping criterion. However, there is a parameter involved in the proposed stopping criterion; how to pick its optimal value is interesting and deserves further investigation, and we believe it is problem- and data-dependent.
References
- [1] R. Ansorge, Connections between the Cimmino-method and the Kaczmarz-method for the solution of singular and regular systems of equations, Computing, 33 (1984), pp. 367–375.
- [2] V. Borkar, N. Karamchandani, and S. Mirani, Randomized Kaczmarz for rank aggregation from pairwise comparisons, IEEE Information Theory Workshop (ITW), Cambridge, 2016, pp. 389–393.
- [3] Z.-Z. Bai and W.-T. Wu, On greedy randomized Kaczmarz method for solving large sparse linear systems, SIAM Journal on Scientific Computing, 40 (2018), pp. A592–A606.
- [4] Z.-Z. Bai and W.-T. Wu, On greedy randomized augmented Kaczmarz method for solving large sparse inconsistent linear systems, SIAM Journal on Scientific Computing, 43 (2021), pp. A3892–A3911.
- [5] Z.-Z. Bai and W.-T. Wu, On partially randomized extended Kaczmarz method for solving large sparse overdetermined inconsistent linear systems, Linear Algebra and its Applications, 578 (2019), pp. 225–250.
- [6] Z.-Z. Bai and W.-T. Wu, On relaxed greedy randomized Kaczmarz methods for solving large sparse linear systems, Applied Mathematics Letters, 83 (2018), pp. 21–26.
- [7] Z.-Z. Bai, L. Wang, and G. V. Muratova, On relaxed greedy randomized augmented Kaczmarz methods for solving large sparse inconsistent linear systems, East Asian Journal on Applied Mathematics, (27) 2021, pp. 323–332.
- [8] L. Bai, J. Liang and Y. Zhao, Self-constrained spectral clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 (2023), pp. 5126–5138.
- [9] J. Chen and Z.-D. Huang, On a fast deterministic block Kaczmarz method for solving large scale linear system, Numerical Algorithms, 89 (2022), pp. 1007–1029.
- [10] S. Chapra and R. Canale, Numerical Methods for Engineers, 7th ed., McGraw-Hill, New York, 2005.
- [11] Y. Censor, Row-action methods for huge and sparse systems and their applications, SIAM Review, 23 (1981), pp. 444–466.
- [12] D. Cai, X.-F. He, and J.-W. Han, SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis, IEEE Transactions on Knowledge and Data Engineering, 20 (2008), pp. 1–12.
- [13] K. Du, W. Si, and X. Sun, Randomized extended average block Kaczmarz for solving least squares, SIAM Journal on Scientific Computing, 42 (2020), pp. A3541–A3559.
- [14] C. Gu and Y. Liu, Variant of greedy randomized Kaczmarz for ridge regression, Applied Numerical Mathematics, 143 (2019), pp. 223–246.
- [15] R. Gordon, R. Bender, and G.T. Herman, Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography, Journal of Theoretical Biology, 29 (1970), pp. 471–481.
- [16] W. Guo, H. Chen, W. Geng, and L. Lei, A modified Kaczmarz algorithm for computerized tomographic image reconstruction, IEEE International Conference on Biomedical Engineering and Informatics, 3 (2009), pp. 1–4.
- [17] A. Hefny, D. Needell, and A. Ramdas, Rows versus columns: randomized Kaczmarz or Gauss-Seidel for ridge regression, SIAM Journal on Scientific Computing, 39 (2017), pp. S528–S542.
- [18] G. Herman, Fundamentals of Computerized Tomography: Image Reconstruction from Projections, 2nd ed., Springer, Dordrecht, 2009.
- [19] P. C. Hansen and J. S. Jørgensen, AIR Tools II: algebraic iterative reconstruction methods, improved implementation, Numerical Algorithms, 79 (2018), pp. 107–137.
- [20] X. Jiang, K. Zhang, and J. Yin, Randomized block Kaczmarz methods with k-means clustering for solving large linear systems, Journal of Computational and Applied Mathematics, 403 (2022), Article 113828.
- [21] Y. Jiang, G. Wu, and L. Jiang, A semi-randomized Kaczmarz method with simple random sampling for large-scale linear systems, Advances in Computational Mathematics, 49 (2023), Article 20.
- [22] J. Johnson, Probability and Statistics for Computer Scientists, The John Wiley & Sons Press, 1997.
- [23] S. Kaczmarz, Approximate solution of systems of linear equations, International Journal of Control, 35 (1937), pp. 355–357.
- [24] S. Lee and H. Kim, Noise properties of reconstructed images in a kilo-voltage on-board imaging system with iterative reconstruction techniques: A phantom study, Physica Medica, 30 (2014), pp. 365–373.
- [25] K. Li and G. Wu, Randomized approximate class-specific kernel spectral regression analysis for large-scale face verification, Machine Learning, 111 (2022), pp. 2037–2091.
- [26] R. Li and H. Liu, On global randomized block Kaczmarz method for image reconstruction, Electronic Research Archive, 30 (2022), pp. 1442–1453.
- [27] J. Liu and S. Wright, An accelerated randomized Kaczmarz algorithm, Mathematics of Computation, 85 (2016), pp. 153–178.
- [28] J. Loera, J. Haddock, and D. Needell, A sampling Kaczmarz-Motzkin algorithm for linear feasibility, SIAM Journal on Scientific Computing, 39 (2017), pp. S66–S87.
- [29] A. Ma, D. Needell, and A. Ramdas, Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods, SIAM Journal on Matrix Analysis and Applications, 36 (2015), pp. 1590–1604.
- [30] C.-Q. Miao and W.-T. Wu, On greedy randomized average block Kaczmarz method for solving large linear systems, Journal of Computational and Applied Mathematics, 413 (2022), Article 114372.
- [31] D. Needell, Randomized Kaczmarz solver for noisy linear systems, Bit Numerical Mathematics, 50 (2010), pp. 395–403.
- [32] D. Needell and J. Tropp, Paved with good intentions: Analysis of a randomized block Kaczmarz method, Linear Algebra and Its Applications, 441 (2014), pp. 199–221.
- [33] D. Needell, R. Zhao, and A. Zouzias, Randomized block Kaczmarz method with projection for solving least squares, Linear Algebra and Its Applications, 484 (2015), pp. 322–343.
- [34] I. Necoara, Faster randomized block Kaczmarz algorithms, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 1425–1452.
- [35] Y. Niu and B. Zheng, A greedy block Kaczmarz algorithm for solving large-scale linear systems, Applied Mathematics Letters, 104 (2020), Article 106294.
- [36] C. Popa, Least-squares solution of overdetermined inconsistent linear systems using Kaczmarz's relaxation, International Journal of Computer Mathematics, 55 (1995), pp. 79–89.
- [37] C. Popa, Extensions of block-projections methods with relaxation parameters to inconsistent and rank-deficient least-squares problems, Bit Numerical Mathematics, 38 (1998), pp. 151–176.
- [38] R. Ramlau and M. Rosensteiner, An efficient solution to the atmospheric turbulence tomography problem using Kaczmarz iteration, Inverse Problems, 28 (2012), Article 095004.
- [39] W. Shi and G. Wu, New algorithms for trace-ratio problem with application to high-dimension and large-sample data dimensionality reduction, Machine Learning (2021), pp. 1–28.
- [40] S. Steinerberger, A weighted randomized Kaczmarz method for solving linear systems, Mathematics of Computation, 90 (2021), pp. 2815–2826.
- [41] T. Strohmer and R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, Journal of Fourier Analysis and Applications, 15 (2009), pp. 262–278.
- [42] G. Thoppe, V. Borkar, and D. Manjunath, A stochastic Kaczmarz algorithm for network tomography, Automatica, 50 (2014), pp. 910–914.
- [43] X. Yang, A geometric probability randomized Kaczmarz method for large scale linear systems, Applied Numerical Mathematics, 164 (2021), pp. 139-160.
- [44] J. Yin, N. Li, and N. Zheng, Restarted randomized surrounding methods for solving large linear equations, Applied Mathematics Letters, 133 (2022), Article 108290.
- [45] J. Zhang, A new greedy Kaczmarz algorithm for the solution of very large linear systems, Applied Mathematics Letters, 91 (2019), pp. 207–212.
- [46] Y. Zhang and H. Li, Greedy Motzkin-Kaczmarz methods for solving linear systems, Numerical Linear Algebra with Applications, 29 (2022), e2429.
- [47] Y. Zhang and H. Li, A count sketch maximal weighted residual Kaczmarz method for solving highly overdetermined linear systems, Applied Mathematics and Computation, 410 (2021), Article 126486.
- [48] A. Zouzias and N.M. Freris, Randomized extended Kaczmarz for solving least-squares, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 773–793.