A Semi-Randomized and Augmented Kaczmarz Method with Simple Random Sampling for Large-Scale Inconsistent Linear Systems

Shunchang Li School of Mathematics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, P.R. China. E-mail: [email protected].    Gang Wu Corresponding author. School of Mathematics, China University of Mining and Technology, Xuzhou 221116, Jiangsu, P.R. China. E-mail: [email protected]. This work is supported by the National Natural Science Foundation of China under Grant 12271518, the Fujian Natural Science Foundation under Grant 2023J01354, the Key Research and Development Project of Xuzhou Natural Science Foundation under Grant KC22288, and the Open Project of Key Laboratory of Data Science and Intelligence Education of the Ministry of Education under Grant DSIE202203.
Abstract

A greedy randomized augmented Kaczmarz (GRAK) method was proposed in [Z.-Z. Bai and W.-T. Wu, SIAM J. Sci. Comput., 43 (2021), pp. A3892–A3911] for large and sparse inconsistent linear systems. However, in each iteration one has to construct two new index sets and compute the residual vector of the augmented linear system. Thus, the computational overhead of this method is large for extremely large-scale problems. Moreover, there is no reliable stopping criterion for this method. In this work, we are interested in solving large-scale sparse or dense inconsistent linear systems, and try to enhance the numerical performance of the GRAK method. First, we propose an accelerated greedy randomized augmented Kaczmarz method. Theoretical analysis indicates that it converges faster than the GRAK method under very weak assumptions. Second, in order to further reduce the overhead, we propose a semi-randomized augmented Kaczmarz method with simple random sampling. Third, to the best of our knowledge, there have been no practical stopping criteria for randomized Kaczmarz-type methods to date. To fill in this gap, we introduce a practical stopping criterion for Kaczmarz-type methods, and show its rationality from a theoretical point of view. Numerical experiments are performed on both real-world and synthetic data sets, which demonstrate the efficiency of the proposed methods and the effectiveness of our stopping criterion.

keywords:
Randomized Kaczmarz Method, Greedy Randomized Augmented Kaczmarz Method (GRAK), Semi-Randomized Kaczmarz Method, Inconsistent Linear System, Simple Random Sampling.
AMS:
65F10, 65F15

1 Introduction

Consider the large-scale linear system

A𝐱=𝐛,A{\bf x}={\bf b}, (1)

where Am×nA\in\mathbb{C}^{m\times n}, 𝐛m{\bf b}\in\mathbb{C}^{m}, and 𝐱n{\bf x}\in\mathbb{C}^{n} is an unknown vector. In this paper, we make the assumption that there are no zero rows or columns in AA. On one hand, if the linear system (1) is consistent, the interest is to find the least-norm solution 𝐱=A𝐛{\bf x}_{\star}=A^{\dagger}{\bf b}. The Kaczmarz method [23] is an effective iterative method for consistent linear systems, which has been applied in many real-world applications, e.g., signal processing [2, 16, 24], image and sound reconstructions [15, 18, 38, 42], computed tomography [1, 16, 24], and so on; see [11] and the references therein. At the kk-th iteration of the Kaczmarz method, the approximate solution 𝐱k+1{\bf x}_{k+1} is obtained from orthogonally projecting the current iteration vector 𝐱k{\bf x}_{k} onto the hyperplane Hik={𝐱|A(ik)𝐱=𝐛(ik)}H_{i_{k}}=\big{\{}{\bf x}|A^{\left(i_{k}\right)}{\bf x}={\bf b}^{\left(i_{k}\right)}\big{\}}:

𝐱k+1=𝐱k+𝐛(ik)A(ik)𝐱kA(ik)22(A(ik))H,ik=(kmodm)+1.{\bf x}_{k+1}={\bf x}_{k}+\frac{{\bf b}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}}{\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H},\quad i_{k}=(k~{}mod~{}m)+1. (2)
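
For concreteness, the following MATLAB sketch runs the cyclic update (2) on a small consistent test problem; the test data, the number of sweeps and all variable names are our own and serve only as an illustration.

% A minimal MATLAB sketch of the cyclic Kaczmarz update (2) on a small
% consistent test problem (the data below are ours, for illustration only).
rng(1);
m = 200;  n = 50;
A = randn(m, n);
xTrue = randn(n, 1);
b = A * xTrue;                           % consistent right-hand side
rowNorms = sum(abs(A).^2, 2);            % squared Euclidean norms of the rows
x = zeros(n, 1);
for k = 0:20*m-1
    i = mod(k, m) + 1;                   % i_k = (k mod m) + 1
    x = x + ((b(i) - A(i, :) * x) / rowNorms(i)) * A(i, :)';   % project onto H_{i_k}
end
disp(norm(x - xTrue) / norm(xTrue));     % relative error after 20 sweeps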

However, if the angle between the hyperplanes related to two consecutive iterations is small, the convergence rate of the Kaczmarz method may be slow. In order to partially overcome this difficulty, Strohmer and Vershynin [41] proposed a randomized version, called the randomized Kaczmarz (RK) method, and proved its expected exponential rate of convergence. However, it is difficult for RK to exploit real-time information when selecting working rows in each iteration. Another disadvantage is that the probability of each row being selected is directly proportional to the square of its Euclidean norm. If the Euclidean norms of a few rows of the coefficient matrix are much larger than those of the others, the RK method will converge very slowly or even diverge. In order to improve the convergence of the RK method, many variations have been proposed, such as the sampling Kaczmarz-Motzkin method [28], the accelerated randomized Kaczmarz method [27], the weighted randomized Kaczmarz method [40], and so on [6, 14, 17, 45, 46, 47]. Some randomized block Kaczmarz methods were introduced in [9, 20, 26, 30, 32, 34, 35].

In [3], Bai and Wu proposed a greedy randomized Kaczmarz (GRK) method for large consistent linear systems. Theoretical analysis and numerical experiments demonstrate that GRK works better than RK. However, in each iteration of the GRK method, one has to compute the residual vector and construct a new index set, which incurs a large amount of work, especially for big-data problems. In order to reduce the computational overhead, Jiang, Wu and Jiang [21] proposed a semi-randomized Kaczmarz method with simple random sampling. This method only uses a small portion of the rows of the matrix $A$, and there is no need to calculate probabilities nor to construct an index set for choosing working rows.

On the other hand, if the linear system (1) is inconsistent, the Kaczmarz-type methods cannot converge to the least-norm least-squares solution ${\bf x}_{\star}=A^{\dagger}{\bf b}$ [4]. Inspired by the works in [36, 37], Zouzias and Freris [48] introduced a randomized extended Kaczmarz (REK) method, and proved that the REK method converges to the least-norm least-squares solution. Indeed, the REK method can be understood as using the randomized Kaczmarz method to solve the two linear systems $A^{H}{\bf z}=0$ and $A{\bf x}={\bf b}-{\bf z}$. Based on the REK method, Bai and Wu [5] proposed a partially random extended Kaczmarz (PREK) method. However, both the REK method and the PREK method will converge slowly if one of the two linear systems does so. A number of randomized block Kaczmarz methods for solving inconsistent linear systems were investigated in [13, 33].

Recently, Bai and Wu proposed a greedy randomized augmented Kaczmarz (GRAK) method for solving large sparse inconsistent linear systems [4]. In essence, the GRAK method first transforms the inconsistent linear system (1) into an equivalent consistent augmented linear system, and then applies the GRK method to this augmented linear system. A relaxed version of the GRAK method was proposed in [7]. However, similar to the GRK method, it is required to calculate the residual vector with respect to the augmented linear system and to construct two index sets in each iteration. This is unfavorable for extremely large-scale problems.

In this paper, we focus on solving large (dense or sparse) inconsistent linear systems, and try to improve the performance of the GRAK method proposed in [4]. The contributions of this work are as follows. First, we modify the iteration scheme for updating the approximate solution, and propose an accelerated greedy randomized augmented Kaczmarz method. We prove that it converges faster than the GRAK method if $\left\|A\right\|_{F}^{2}\gg 1$. Second, we propose a semi-randomized augmented Kaczmarz method with simple random sampling for large inconsistent linear systems. In this method, we do not compute the residual vector corresponding to the augmented linear system, and there is no need to construct index sets. Hence, the proposed method is much cheaper than the GRAK method per iteration. The convergence of the method is established. Third, as far as we know, there are no practical stopping criteria for Kaczmarz-type methods; the third contribution of this work is to introduce one. With this strategy, there is no need to know the exact solution a priori nor to compute the residual vector.

The organization of this paper is as follows. In Section 2, we briefly review the randomized extended Kaczmarz method and the greedy randomized augmented Kaczmarz method for large inconsistent linear systems. In Section 3, we propose an accelerated greedy randomized augmented Kaczmarz method and an accelerated greedy randomized augmented Kaczmarz method with simple random sampling. Convergence results of the two methods are established. In Section 4, we present a practical stopping criterion for Kaczmarz-type methods, and show its rationality theoretically. In Section 5, we perform comprehensive numerical experiments on both synthetic and real-world data sets, to show the efficiency of the proposed methods as well as the effectiveness of our stopping criterion. Some concluding remarks are given in Section 6.

Let us introduce some notations. For a given matrix Am×nA\in\mathbb{C}^{m\times n}, we denote by AHA^{H}, A2\left\|A\right\|_{2}, AA^{\dagger}, A(i)A^{\left(i\right)} and A(j)A_{\left(j\right)} its conjugate transpose, Euclidean norm, Moore-Penrose inverse, the iith row and the jjth column of AA, respectively. Let (A)\mathcal{R}\left(A\right) and (A)\mathcal{R}\left(A\right)^{\bot} be the range space of AA and the orthogonal complement subspace of (A)\mathcal{R}\left(A\right). Let 𝐛(A){\bf b}_{\mathcal{R}\left(A\right)} and 𝐛(A){\bf b}_{\mathcal{R}\left(A\right)^{\bot}} be the orthogonal projection of the vector 𝐛{\bf b} onto (A)\mathcal{R}\left(A\right) and (A)\mathcal{R}\left(A\right)^{\bot}, respectively. Denote by λmin(AHA)\lambda_{\min}(A^{H}A) the minimal nonzero eigenvalue of the matrix AHAA^{H}A, and by 𝐞𝐢{\bf e_{i}} the iith column of the identity matrix II. Let 𝔼[]\mathbb{E}\left[\cdot\right] be the full expectation, and let 𝔼k[]\mathbb{E}_{k}\left[\cdot\right] be the conditional expectation on the first kk iterations [3]. That is,

𝔼k[]=𝔼[|i0,i1,,ik1],k=1,2,\mathbb{E}_{k}\left[\cdot\right]=\mathbb{E}\left[\cdot|i_{0},i_{1},\ldots,i_{k-1}\right],\quad k=1,2,\ldots

In the light of the properties of full expectation and conditional expectation, we have that 𝔼[𝔼k[]]=𝔼[]\mathbb{E}\left[\mathbb{E}_{k}\left[\cdot\right]\right]=\mathbb{E}\left[\cdot\right].

2 The randomized extended Kaczmarz method and the greedy randomized augmented Kaczmarz method

In this section, we briefly introduce the randomized extended Kaczmarz (REK) method and the greedy randomized augmented Kaczmarz (GRAK) method for large-scale inconsistent linear systems. The REK method is a combination of the randomized orthogonal projection algorithm and the randomized Kaczmarz method [48]. The main idea is to efficiently reduce the norm of the "noisy" part of ${\bf b}$, i.e., ${\bf b}_{\mathcal{R}\left(A\right)^{\bot}}$, by using the randomized orthogonal projection algorithm, and then to apply the randomized Kaczmarz method to a new linear system whose right-hand side is (arbitrarily) close to the column space of $A$. The framework of the REK method is as follows; for more details, refer to [48].

Algorithm 1.

The randomized extended Kaczmarz (REK) method [48]
Input: Given AA, 𝐛\bf b, the convergence tolerance ϵ>0\epsilon>0, and let 𝐱0=0{\bf x}_{0}=0 and 𝐳0=𝐛{\bf z}_{0}={\bf b}.
Output: The approximate solution 𝐱~\widetilde{\bf x}.
1. for k=0,1,2,,k=0,1,2,\ldots, do
2. Pick ik{1,2,,m}i_{k}\in\left\{1,2,\ldots,m\right\} with probability qi=A(i)22/AF2,i{1,2,,m}q_{i}=\left\|A^{\left(i\right)}\right\|_{2}^{2}/\left\|A\right\|_{F}^{2},~{}i\in\left\{1,2,\ldots,m\right\}.
3. Pick jk{1,2,,n}j_{k}\in\left\{1,2,\ldots,n\right\} with probability pj=A(j)22/AF2,j{1,2,,n}p_{j}=\left\|A^{\left(j\right)}\right\|_{2}^{2}/\left\|A\right\|_{F}^{2},~{}j\in\left\{1,2,\ldots,n\right\}.
4. Set 𝐳k+1=𝐳kA(jk)H𝐳kA(jk)22A(jk){\bf z}_{k+1}={\bf z}_{k}-\frac{A_{\left(j_{k}\right)}^{H}{\bf z}_{k}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}A_{\left(j_{k}\right)}.
5. Set ${\bf x}_{k+1}={\bf x}_{k}+\frac{{\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}}{\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H}$.
6. Check every $8\min\left(m,n\right)$ iterations and terminate if

A𝐱k(𝐛𝐳k)2AF𝐱k2εandAH𝐳k2AF2𝐱k2ε.\frac{\left\|A{\bf x}_{k}-\left({\bf b}-{\bf z}_{k}\right)\right\|_{2}}{\left\|A\right\|_{F}\left\|{\bf x}_{k}\right\|_{2}}\leqslant\varepsilon\,\,~{}~{}and~{}~{}\,\,\frac{\left\|A^{H}{\bf z}_{k}\right\|_{2}}{\left\|A\right\|_{F}^{2}\left\|{\bf x}_{k}\right\|_{2}}\leqslant\varepsilon.

7. end for
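
To make Algorithm 1 concrete, here is a minimal MATLAB sketch of REK on a randomly generated inconsistent problem. The test data, the fixed iteration count and the variable names are our own choices, and the periodic stopping test of step 6 is omitted for brevity.

% A sketch of Algorithm 1 (REK); data and names are ours, step 6 is omitted.
rng(1);
m = 500;  n = 100;
A = randn(m, n);
b = randn(m, 1);                          % generically inconsistent right-hand side
rowNorms = sum(abs(A).^2, 2);             % squared row norms
colNorms = sum(abs(A).^2, 1)';            % squared column norms
fro2 = sum(rowNorms);                     % ||A||_F^2
x = zeros(n, 1);
z = b;
for k = 1:50*(m+n)
    i = find(cumsum(rowNorms) >= rand * fro2, 1);   % step 2: row with probability q_i
    j = find(cumsum(colNorms) >= rand * fro2, 1);   % step 3: column with probability p_j
    z = z - (A(:, j)' * z / colNorms(j)) * A(:, j); % step 4: push z toward R(A)^perp
    x = x + ((b(i) - z(i) - A(i, :) * x) / rowNorms(i)) * A(i, :)';   % step 5
end
xStar = pinv(A) * b;                      % reference least-squares solution
disp(norm(x - xStar) / norm(xStar));      % relative solution error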

In [4], Bai and Wu proposed a greedy randomized augmented Kaczmarz (GRAK) method. The main idea is to rewrite the large-scale inconsistent linear system A𝐱=𝐛A{\bf x}={\bf b} into a consistent augmented linear system

A~𝐱~=𝐛~,\tilde{A}\tilde{{\bf x}}=\tilde{{\bf b}}, (1)

where

A~=(IAAH0),𝐱~=(𝐳𝐱),and𝐛~=[𝐛𝟎],\tilde{A}=\left(\begin{matrix}I&A\\ A^{H}&0\\ \end{matrix}\right),\quad\tilde{{\bf x}}=\left(\begin{array}[]{c}{\bf z}\\ {\bf x}\\ \end{array}\right),\quad and\quad\tilde{{\bf b}}=\left[\begin{array}[]{c}{\bf b}\\ {\bf 0}\\ \end{array}\right], (2)

and then apply the greedy randomized augmented Kaczmarz (GRK) method [3] to the augmented linear system (1).

More precisely, define the two index sets

ΩkR={ik{1,2,,m}:|𝐛(ik)𝐳k(ik)A(ik)𝐱k|21+A(ik)22ϵk(𝐛𝐳kA𝐱k22+AH𝐳k22)},\varOmega_{k}^{R}=\left\{i_{k}\in\left\{1,2,\ldots,m\right\}:\frac{\left|{\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right|^{2}}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\geqslant\epsilon_{k}\left(\left\|{\bf b}-{\bf z}_{k}-A{\bf x}_{k}\right\|_{2}^{2}+\left\|A^{H}{\bf z}_{k}\right\|_{2}^{2}\right)\right\}, (3)

and

ΩkC={jk{1,2,,n}:|A(jk)H𝐳k|2A(jk)22εk(𝐛𝐳kA𝐱k22+AH𝐳k22)},\varOmega_{k}^{C}=\left\{j_{k}\in\left\{1,2,\ldots,n\right\}:\frac{\left|A_{\left(j_{k}\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}\geqslant\varepsilon_{k}\left(\left\|{\bf b}-{\bf z}_{k}-A{\bf x}_{k}\right\|_{2}^{2}+\left\|A^{H}{\bf z}_{k}\right\|_{2}^{2}\right)\right\}, (4)

where

ϵk=max{ϵkR,ϵkC},\epsilon_{k}=\max\left\{\epsilon_{k}^{R},\epsilon_{k}^{C}\right\}, (5)

with

ϵkR=12(1𝐛𝐳kA𝐱k22+AH𝐳k22max1ikm{|𝐛(ik)𝐳k(ik)A(ik)𝐱k|21+A(ik)22}+1m+2AF2)\epsilon_{k}^{R}=\frac{1}{2}\left(\frac{1}{\left\|{\bf b}-{\bf z}_{k}-A{\bf x}_{k}\right\|_{2}^{2}+\left\|A^{H}{\bf z}_{k}\right\|_{2}^{2}}\underset{1\leqslant i_{k}\leqslant m}{\max}\left\{\frac{\left|{\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right|^{2}}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\right\}+\frac{1}{m+2\left\|A\right\|_{F}^{2}}\right) (6)

and

\epsilon_{k}^{C}=\frac{1}{2}\left(\frac{1}{\left\|{\bf b}-{\bf z}_{k}-A{\bf x}_{k}\right\|_{2}^{2}+\left\|A^{H}{\bf z}_{k}\right\|_{2}^{2}}\underset{1\leqslant j_{k}\leqslant n}{\max}\left\{\frac{\left|A_{\left(j_{k}\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}\right\}+\frac{1}{m+2\left\|A\right\|_{F}^{2}}\right). (7)

Let

𝐫~k(i)={𝐛(i)𝐳k(i)A(i)𝐱kifiΩkR,0otherwise,\tilde{{\bf r}}_{k}^{\left(i\right)}=\begin{cases}{\bf b}^{\left(i\right)}-{\bf z}_{k}^{\left(i\right)}-A^{\left(i\right)}{\bf x}_{k}&if\,\,i\in\varOmega_{k}^{R},\\ 0&otherwise,\\ \end{cases} (8)

and

𝐬~k(j)={A(j)H𝐳kifjΩkC,0otherwise.\tilde{{\bf s}}_{k}^{\left(j\right)}=\begin{cases}-A_{\left(j\right)}^{H}{\bf z}_{k}&if\,\,j\in\varOmega_{k}^{C},\\ 0&otherwise.\\ \end{cases} (9)
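
In code, the greedy machinery (3)–(9) amounts to a few vectorized operations. The following MATLAB helper (our own sketch, saved as grakQuantities.m; the function and variable names are not from [4]) computes $\epsilon_{k}$ and the projected residuals $\tilde{{\bf r}}_{k}$ and $\tilde{{\bf s}}_{k}$ from the current iterates, and is reused in the sketch of the GRAK iteration given after Algorithm 2 below.

function [rTilde, sTilde, epsK] = grakQuantities(A, b, x, z, rowNorms, colNorms)
% Compute the projected residuals (8)-(9) and the threshold (5)-(7) of GRAK.
% rowNorms and colNorms hold the squared row and column norms of A.
    m = size(A, 1);
    fro2 = sum(rowNorms);                         % ||A||_F^2
    r = b - z - A * x;                            % residual of the first block row
    s = -(A' * z);                                % residual of the second block row
    normSq = norm(r)^2 + norm(s)^2;               % ||b - z - A x||^2 + ||A^H z||^2
    wR = abs(r).^2 ./ (1 + rowNorms);             % row ratios used in (3) and (6)
    wC = abs(s).^2 ./ colNorms(:);                % column ratios used in (4) and (7)
    epsR = 0.5 * (max(wR) / normSq + 1 / (m + 2 * fro2));   % (6)
    epsC = 0.5 * (max(wC) / normSq + 1 / (m + 2 * fro2));   % (7)
    epsK = max(epsR, epsC);                       % (5)
    rTilde = r .* (wR >= epsK * normSq);          % (8): entries with i in Omega_k^R
    sTilde = s .* (wC >= epsK * normSq);          % (9): entries with j in Omega_k^C
end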

The greedy randomized augmented Kaczmarz method is given as follows; for more details, refer to [4].

Algorithm 2.

The greedy randomized augmented Kaczmarz  (GRAK) method [4]
Input: AA, 𝐛\bf b, ll, 𝐱0{\bf x}_{0}, and 𝐳0{\bf z}_{0}.
Output: 𝐱l{\bf x}_{l} and 𝐳l{\bf z}_{l}.
1. for k=0,1,,l1k=0,1,\ldots,l-1 do
2. Compute the tolerances $\epsilon_{k}^{R}$, $\epsilon_{k}^{C}$ and $\epsilon_{k}$ by (5)–(7), identify the index sets $\varOmega_{k}^{R}$ and $\varOmega_{k}^{C}$ by the criteria (3) and (4), and determine the projected residuals $\tilde{\bf r}_{k}$ and $\tilde{\bf s}_{k}$ by the definitions (8) and (9), respectively.
3. Select an index tkt_{k} satisfying 1tkm+n1\leqslant t_{k}\leqslant m+n, with the probability

Pr(index=tk)={|𝐫~k(tk)|2𝐫~k22+𝐬~k22if  1tkm,|𝐬~k(tkm)|2𝐫~k22+𝐬~k22ifm+1tkm+n.Pr\left(index=t_{k}\right)=\begin{cases}\frac{\left|\tilde{{\bf r}}_{k}^{\left(t_{k}\right)}\right|^{2}}{\left\|\tilde{{\bf r}}_{k}\right\|_{2}^{2}+\left\|\tilde{{\bf s}}_{k}\right\|_{2}^{2}}&if\,\,1\leqslant t_{k}\leqslant m,\\ \frac{\left|\tilde{{\bf s}}_{k}^{\left(t_{k}-m\right)}\right|^{2}}{\left\|\tilde{{\bf r}}_{k}\right\|_{2}^{2}+\left\|\tilde{{\bf s}}_{k}\right\|_{2}^{2}}&if\,\,m+1\leqslant t_{k}\leqslant m+n.\\ \end{cases}

4. If 1tkm1\leqslant t_{k}\leqslant m, then set ik=tki_{k}=t_{k} and compute

𝐳k+1=𝐳k+(𝐛(ik)𝐳k(ik)A(ik)𝐱k)1+A(ik)22𝐞ik{\bf z}_{k+1}={\bf z}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}{\bf e}_{i_{k}}

and

𝐱k+1=𝐱k+(𝐛(ik)𝐳k(ik)A(ik)𝐱k)1+A(ik)22(A(ik))H{\bf x}_{k+1}={\bf x}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H}

else if   m+1tkm+nm+1\leqslant t_{k}\leqslant m+n, then set jk=tkmj_{k}=t_{k}-m, and compute

𝐳k+1=𝐳kA(jk)H𝐳kA(jk)22A(jk)and𝐱k+1=𝐱k{\bf z}_{k+1}={\bf z}_{k}-\frac{A_{\left(j_{k}\right)}^{H}{\bf z}_{k}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}A_{\left(j_{k}\right)}\quad and\quad{\bf x}_{k+1}={\bf x}_{k}

5. endfor
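
Built on the helper grakQuantities above, one possible MATLAB realization of Algorithm 2 reads as follows; the random test data, the fixed number of iterations and the inverse-CDF sampling used for step 3 are our own choices.

% A sketch of Algorithm 2 (GRAK) using grakQuantities; data and names are ours.
rng(1);
m = 500;  n = 100;
A = randn(m, n);  b = randn(m, 1);
rowNorms = sum(abs(A).^2, 2);  colNorms = sum(abs(A).^2, 1)';
x = zeros(n, 1);  z = b;
for k = 1:20000
    [rTilde, sTilde] = grakQuantities(A, b, x, z, rowNorms, colNorms);   % step 2
    w = [abs(rTilde).^2; abs(sTilde).^2];           % unnormalized probabilities, step 3
    t = find(cumsum(w) >= rand * sum(w), 1);        % draw t_k with probability w(t)/sum(w)
    if t <= m                                       % step 4: update both z and x
        i = t;
        d = (b(i) - z(i) - A(i, :) * x) / (1 + rowNorms(i));
        z(i) = z(i) + d;
        x = x + d * A(i, :)';
    else                                            % step 4: update z only
        j = t - m;
        z = z - (A(:, j)' * z / colNorms(j)) * A(:, j);
    end
end
xStar = pinv(A) * b;                                % reference least-squares solution
disp(norm(x - xStar) / norm(xStar));                % relative solution error (RSE)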

The following theorem indicates that the GRAK method converges in expectation to the least-norm least-squares solution ${\bf x}_{\star}=A^{\dagger}{\bf b}$.

Theorem 1.

[4] Starting from any initial vectors 𝐱0range(AH){\bf x}_{0}\in range\left(A^{H}\right) and 𝐳0m{\bf z}_{0}\in\mathbb{C}^{m}, the iteration sequences {𝐱k}k=0\left\{{\bf x}_{k}\right\}_{k=0}^{\infty} and {𝐳k}k=0\left\{{\bf z}_{k}\right\}_{k=0}^{\infty}, generated by the GRAK method, converge in expectation to the least-norm least-squares solution 𝐱=A𝐛{\bf x}_{\star}=A^{\dagger}{\bf b} of the linear system (1) and the orthogonal projection vector 𝐳=𝐛(A){\bf z}_{\star}={\bf b}_{\mathcal{R}\left(A\right)^{\bot}}, respectively. Moreover, the global solution error in expectation with respect to both iteration sequences {𝐱k}k=0\left\{{\bf x}_{k}\right\}_{k=0}^{\infty} and {𝐳k}k=0\left\{{\bf z}_{k}\right\}_{k=0}^{\infty} obeys

𝔼(𝐱1𝐱22+𝐳1𝐳22)ζ(𝐱0𝐱22+𝐳0𝐳22)\mathbb{E}\left(\left\|{\bf x}_{1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\zeta\left(\left\|{\bf x}_{0}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{0}-{\bf z}_{\star}\right\|_{2}^{2}\right)

and

𝔼k(𝐱k+1𝐱22+𝐳k+1𝐳22)β(𝐱k𝐱22+𝐳k𝐳22),k=1,2\mathbb{E}_{k}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\beta\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right),k=1,2\ldots (10)

where

ζ=1η2AF2+m\zeta=1-\frac{\eta}{2\left\|A\right\|_{F}^{2}+m}

and

β=112(2AF2+mγ+1)η2AF2+m\beta=1-\frac{1}{2}\left(\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\right)\frac{\eta}{2\left\|A\right\|_{F}^{2}+m} (11)

with

η=min{1,(λmin(AHA)+1412)2}\eta=\min\left\{1,\left(\sqrt{\lambda_{\min}\left(A^{H}A\right)+\frac{1}{4}}-\frac{1}{2}\right)^{2}\right\} (12)

and

γ=2AF2+mmin{1+min1imA(i)22,min1jnA(j)22}.\gamma=2\left\|A\right\|_{F}^{2}+m-\min\left\{1+\underset{1\leqslant i\leqslant m}{\min}\left\|A^{\left(i\right)}\right\|_{2}^{2},\underset{1\leqslant j\leqslant n}{\min}\left\|A_{\left(j\right)}\right\|_{2}^{2}\right\}. (13)

As a result, it holds that

𝔼(𝐱k𝐱22+𝐳k𝐳22)βk1ζ(𝐱0𝐱22+𝐳0𝐳22)\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\beta^{k-1}\zeta\cdot\left(\left\|{\bf x}_{0}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{0}-{\bf z}_{\star}\right\|_{2}^{2}\right)

Theorem 1 shows that the convergence rate of GRAK depends on the smallest nonzero eigenvalue of $A^{H}A$, the Frobenius norm of $A$, the number of rows $m$, and the Euclidean norms of the rows and columns of $A$.
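
The quantities appearing in Theorem 1 are easy to evaluate for matrices of moderate size. The following MATLAB script (our own illustration; computing $\lambda_{\min}(A^{H}A)$ with eig is only affordable for small problems) evaluates $\eta$, $\gamma$, $\zeta$ and $\beta$ for a random test matrix.

% Evaluate the convergence factors of Theorem 1 for a small test matrix (ours).
rng(1);
A = randn(300, 60);
m = size(A, 1);
fro2 = norm(A, 'fro')^2;
lamMin = min(eig(A' * A));                       % smallest (nonzero) eigenvalue of A^H A
eta = min(1, (sqrt(lamMin + 1/4) - 1/2)^2);      % (12)
gamma = 2*fro2 + m - min(1 + min(sum(abs(A).^2, 2)), min(sum(abs(A).^2, 1)));   % (13)
zeta = 1 - eta / (2*fro2 + m);
beta = 1 - 0.5 * ((2*fro2 + m) / gamma + 1) * eta / (2*fro2 + m);               % (11)
fprintf('eta = %.3e, gamma = %.3e, zeta = %.6f, beta = %.6f\n', eta, gamma, zeta, beta);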

3 The Proposed Methods

In each iteration of the GRAK method, one has to compute the two residual vectors $\tilde{{\bf r}}_{k}$ and $\tilde{{\bf s}}_{k}$ and the probabilities $\Pr(\cdot)$, and to construct the two index sets $\varOmega_{k}^{R}$ and $\varOmega_{k}^{C}$. Thus, one has to scan all the $m+n$ rows of the coefficient matrix $\tilde{A}$, which is very time consuming when the matrix $A$ is extremely large, and runs counter to the original motivation of the Kaczmarz-type methods.

To deal with these problems, in this section we first propose an accelerated greedy randomized augmented Kaczmarz method. Convergence analysis shows that, if $\|A\|_{F}^{2}\gg 1$, the accelerated method often converges faster than the GRAK method. To further reduce the computational overhead of the accelerated method, we then propose an accelerated greedy randomized augmented Kaczmarz method with simple random sampling. The convergence of this method is also established.

3.1 An accelerated greedy randomized augmented Kaczmarz method

It is seen from Algorithm 2 that, as m+1tkm+nm+1\leqslant t_{k}\leqslant m+n, the GRAK method only updates the vector 𝐳k{\bf z}_{k} rather than the approximate solution 𝐱k{\bf x}_{k}, which may slow down the rate of convergence of this algorithm. Indeed, just like the REK method, GRAK may converge slowly if one of the iteration sequences {𝐱k}k=0\left\{{\bf x}_{k}\right\}_{k=0}^{\infty} or {𝐳k}k=0\left\{{\bf z}_{k}\right\}_{k=0}^{\infty} does so. Thus, it is necessary to give new insight into the GRAK method and improve its numerical performance.

The idea is as follows: when $m+1\leqslant t_{k}\leqslant m+n$, after computing ${\bf z}_{k+1}$, we select a working row with probability $\|A^{\left(i_{k}\right)}\|_{2}^{2}/\|A\|_{F}^{2}$ to further update ${\bf x}_{k}$ for the linear system $A{\bf x}={\bf b}-{\bf z}_{k+1}$. Motivated by the semi-randomized Kaczmarz method proposed in [21], and in order to avoid evaluating probabilities and constructing two index sets in each iteration, we propose an accelerated greedy randomized augmented Kaczmarz (AGRAK) method to solve the augmented linear system (1).

Algorithm 3.

An accelerated greedy randomized augmented Kaczmarz (AGRAK) method
Input:
AA, 𝐛\bf b, ll, 𝐱0=𝟎{\bf x}_{0}={\bf 0}, and 𝐳0=𝐛{\bf z}_{0}={\bf b}.
Output: 𝐱l{\bf x}_{l} and 𝐳l{\bf z}_{l}.
1. for k=0,1,,l1k=0,1,\ldots,l-1 do
2. Select tk{1,2,,m+n}t_{k}\in\left\{1,2,\dots,m+n\right\} according to

tk=argmax{max1im{|𝐛(i)𝐳k(i)A(i)𝐱k|21+A(i)22},maxm+1im+n{|A(im)H𝐳k|2A(im)22}}.t_{k}=\mathop{\arg\max}\left\{\underset{1\leqslant i\leqslant m}{\max}\left\{\frac{\left|{\bf b}^{\left(i\right)}-{\bf z}_{k}^{\left(i\right)}-A^{\left(i\right)}{\bf x}_{k}\right|^{2}}{1+\left\|A^{\left(i\right)}\right\|_{2}^{2}}\right\},\underset{m+1\leqslant i\leqslant m+n}{\max}\left\{\frac{\left|A_{\left(i-m\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(i-m\right)}\right\|_{2}^{2}}\right\}\right\}.

3. If 1tkm1\leqslant t_{k}\leqslant m, then set ik=tki_{k}=t_{k} and compute

𝐳k+1=𝐳k+(𝐛(ik)𝐳k(ik)A(ik)𝐱k)1+A(ik)22𝐞ik,{\bf z}_{k+1}={\bf z}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}{\bf e}_{i_{k}},

and

𝐱k+1=𝐱k+(𝐛(ik)𝐳k(ik)A(ik)𝐱k)1+A(ik)22(A(ik))H.{\bf x}_{k+1}={\bf x}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H}.

Else if m+1tkm+nm+1\leqslant t_{k}\leqslant m+n, then set jk=tkmj_{k}=t_{k}-m,

𝐳k+1=𝐳kA(jk)H𝐳kA(jk)22A(jk),{\bf z}_{k+1}={\bf z}_{k}-\frac{A_{\left(j_{k}\right)}^{H}{\bf z}_{k}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}A_{\left(j_{k}\right)},

and select an index ik{1,2,,m}i_{k}\in\left\{1,2,\dots,m\right\} with probability Pr(row=ik)=A(ik)22AF2P_{r}\left(row=i_{k}\right)=\frac{\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}{\left\|A\right\|_{F}^{2}}, and compute

𝐱k+1=𝐱k+(𝐛(ik)𝐳k+1(ik)A(ik)𝐱k)A(ik)22(A(ik))H.{\bf x}_{k+1}={\bf x}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k+1}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H}.

4. endfor
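
Compared with the GRAK sketch given after Algorithm 2, only the selection rule of step 2 and the second branch of step 3 change. The following MATLAB fragment shows one AGRAK iteration; it assumes that A, b, x, z, rowNorms, colNorms and fro2 = norm(A,'fro')^2 have been set up as in that sketch, and all names are our own.

% One iteration of Algorithm 3 (AGRAK); the setup is as in the GRAK sketch above.
r = b - z - A * x;
s = A' * z;
wR = abs(r).^2 ./ (1 + rowNorms);                % row ratios of step 2
wC = abs(s).^2 ./ colNorms;                      % column ratios of step 2
[~, t] = max([wR; wC]);                          % greedy choice of t_k, no probabilities
if t <= size(A, 1)                               % step 3, first branch: as in GRAK
    i = t;
    d = (b(i) - z(i) - A(i, :) * x) / (1 + rowNorms(i));
    z(i) = z(i) + d;
    x = x + d * A(i, :)';
else                                             % step 3, second branch: also update x
    j = t - size(A, 1);
    z = z - (A(:, j)' * z / colNorms(j)) * A(:, j);
    i = find(cumsum(rowNorms) >= rand * fro2, 1);   % row with prob. ||A^(i)||^2/||A||_F^2
    x = x + ((b(i) - z(i) - A(i, :) * x) / rowNorms(i)) * A(i, :)';
end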

Next, we consider the convergence of the AGRAK method. The following lemma is needed.

Lemma 3.1.

Let a1,a2,b1,b2+a_{1},a_{2},b_{1},b_{2}\in\mathbb{R}^{+}, and a1<a2a_{1}<a_{2}. Then there exists υ(a1,a2)\upsilon\in\left(a_{1},a_{2}\right), such that a1b1+a2b2=υ(b1+b2)a_{1}b_{1}+a_{2}b_{2}=\upsilon\left(b_{1}+b_{2}\right).

Proof.

Let υ=a1b1+a2b2b1+b2\upsilon=\frac{a_{1}b_{1}+a_{2}b_{2}}{b_{1}+b_{2}}, then

υ=a1b1+a2b2b1+b2<a2b1+a2b2b1+b2=a2(b1+b2)b1+b2=a2,\upsilon=\frac{a_{1}b_{1}+a_{2}b_{2}}{b_{1}+b_{2}}<\frac{a_{2}b_{1}+a_{2}b_{2}}{b_{1}+b_{2}}=\frac{a_{2}\left(b_{1}+b_{2}\right)}{b_{1}+b_{2}}=a_{2},

and

υ=a1b1+a2b2b1+b2>a1b1+a1b2b1+b2=a1(b1+b2)b1+b2=a1,\upsilon=\frac{a_{1}b_{1}+a_{2}b_{2}}{b_{1}+b_{2}}>\frac{a_{1}b_{1}+a_{1}b_{2}}{b_{1}+b_{2}}=\frac{a_{1}\left(b_{1}+b_{2}\right)}{b_{1}+b_{2}}=a_{1},

which satisfies υ(b1+b2)=a1b1+a2b2\upsilon\left(b_{1}+b_{2}\right)=a_{1}b_{1}+a_{2}b_{2}. ∎
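
A quick numerical illustration of Lemma 3.1, with arbitrarily chosen positive values:

% Numerical check of Lemma 3.1 with arbitrary a1 < a2 and b1, b2 > 0 (our values).
a1 = 0.3;  a2 = 0.9;  b1 = 2;  b2 = 5;
v = (a1*b1 + a2*b2) / (b1 + b2);                 % the weighted mean used in the proof
fprintf('a1 < v < a2: %d, identity residual: %.2e\n', a1 < v && v < a2, v*(b1+b2) - (a1*b1 + a2*b2));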

First, notice that if 1tkm1\leqslant t_{k}\leqslant m, Algorithm 3 applies the semi-randomized Kaczmarz method [21] to solve the augmented linear system (1). Denote by

𝐱~k=[𝐳k𝐱k]and𝐱~=[𝐳𝐱],\tilde{{\bf x}}_{k}=\left[\begin{array}[]{c}{\bf z}_{k}\\ {\bf x}_{k}\\ \end{array}\right]\quad and\quad\tilde{{\bf x}}_{\star}=\left[\begin{array}[]{c}{\bf z}_{\star}\\ {\bf x}_{\star}\\ \end{array}\right],

then, from [21, Theorem 3.2], we have that

𝔼k𝐱~k+1𝐱~22(1λmin(A~HA~)A~F2min1im+n{A~(i)22})𝐱~k𝐱~22.\mathbb{E}_{k}\left\|\tilde{{\bf x}}_{k+1}-\tilde{{\bf x}}_{\star}\right\|_{2}^{2}\leqslant\left(1-\frac{\lambda_{\min}\left(\tilde{A}^{H}\tilde{A}\right)}{\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}}\right)\left\|\tilde{{\bf x}}_{k}-\tilde{{\bf x}}_{\star}\right\|_{2}^{2}. (1)

Thanks to the structure of A~\tilde{A}, 𝐱~k+1\tilde{{\bf x}}_{k+1} and 𝐱~\tilde{{\bf x}}_{\star}; see (2), one can rewrite (1) as

𝔼k(𝐱k+1𝐱22+𝐳k+1𝐳22)(1ηγ)(𝐱k𝐱22+𝐳k𝐳22),\mathbb{E}_{k}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\left(1-\frac{\eta}{\gamma}\right)\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right), (2)

where $\eta$ and $\gamma$ are defined in (12) and (13), respectively.

Taking the full expectation on both sides of (2), and setting

β~=1ηγ,\tilde{\beta}=1-\frac{\eta}{\gamma}, (3)

we arrive at

𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)β~𝔼(𝐱k𝐱22+𝐳k𝐳22),1tkm.\mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\tilde{\beta}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right),\quad 1\leqslant t_{k}\leqslant m.

Second, we consider the convergence of 𝐳k{\bf z}_{k} as m+1tkm+nm+1\leqslant t_{k}\leqslant m+n. Notice that if m+1tkm+nm+1\leqslant t_{k}\leqslant m+n, then

max{max1im{|𝐛(i)𝐳k(i)A(i)𝐱k|21+A(i)22},maxm+1im+n{|A(im)H𝐳k|2A(im)22}}=maxm+1im+n{|A(im)H𝐳k|2A(im)22}.\max\left\{\underset{1\leqslant i\leqslant m}{\max}\left\{\frac{\left|{\bf b}^{\left(i\right)}-{\bf z}_{k}^{\left(i\right)}-A^{\left(i\right)}{\bf x}_{k}\right|^{2}}{1+\left\|A^{\left(i\right)}\right\|_{2}^{2}}\right\},\underset{m+1\leqslant i\leqslant m+n}{\max}\left\{\frac{\left|A_{\left(i-m\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(i-m\right)}\right\|_{2}^{2}}\right\}\right\}=\underset{m+1\leqslant i\leqslant m+n}{\max}\left\{\frac{\left|A_{\left(i-m\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(i-m\right)}\right\|_{2}^{2}}\right\}.

Consequently,

jk=tkm=argmax1jn{|A(j)H𝐳k|2A(j)22},j_{k}=t_{k}-m=\mathop{\arg\max}_{1\leqslant j\leqslant n}\left\{\frac{\left|A_{\left(j\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(j\right)}\right\|_{2}^{2}}\right\},

and

𝐳k+1=𝐳kA(jk)H𝐳kA(jk)22A(jk),wherejk=argmax1jn{|A(j)H𝐳k|2A(j)22},{\bf z}_{k+1}={\bf z}_{k}-\frac{A_{\left(j_{k}\right)}^{H}{\bf z}_{k}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}A_{\left(j_{k}\right)},\quad where\quad j_{k}=\mathop{\arg\max}_{1\leqslant j\leqslant n}\left\{\frac{\left|A_{\left(j\right)}^{H}{\bf z}_{k}\right|^{2}}{\left\|A_{\left(j\right)}\right\|_{2}^{2}}\right\},

which is nothing but applying the semi-randomized Kaczmarz method to solve the equation AH𝐳=0A^{H}{\bf z}=0. Therefore, by [21, Theorem 3.2],

𝔼k𝐳k+1𝐳22(1λmin(AHA)AF2min1jn{A(j)22})𝐳k𝐳22,m+1tkm+n.\mathbb{E}_{k}\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\leqslant\left(1-\frac{\lambda_{\min}\left(A^{H}A\right)}{\left\|A\right\|_{F}^{2}-\underset{1\leqslant j\leqslant n}{\min}\left\{\left\|A_{\left(j\right)}\right\|_{2}^{2}\right\}}\right)\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2},\quad m+1\leqslant t_{k}\leqslant m+n. (4)

Taking the full expectation on both sides of (4), and setting

δ=1λmin(AHA)AF2min1jn{A(j)22},\delta=1-\frac{\lambda_{\min}\left(A^{H}A\right)}{\left\|A\right\|_{F}^{2}-\underset{1\leqslant j\leqslant n}{\min}\left\{\left\|A_{\left(j\right)}\right\|_{2}^{2}\right\}},

we get

𝔼𝐳k+1𝐳22δ𝔼𝐳k𝐳22.\mathbb{E}\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\leqslant\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}. (5)

Next, we update 𝐱k{\bf x}_{k} for the linear equation A𝐱=𝐛𝐳k+1A{\bf x}={\bf b}-{\bf z}_{k+1} by using the randomized Kaczmarz method. From [31, 41], we obtain

𝔼k𝐱k+1𝐱22(1λmin(AHA)AF2)𝐱k𝐱22+𝐳k+1𝐳22AF2,m+1tkm+n.\mathbb{E}_{k}\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant\left(1-\frac{\lambda_{\min}\left(A^{H}A\right)}{\left\|A\right\|_{F}^{2}}\right)\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\frac{\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}}{\left\|A\right\|_{F}^{2}},\quad m+1\leqslant t_{k}\leqslant m+n. (6)

Taking the full expectation on both sides of (6), and setting

α=1λmin(AHA)AF2,\alpha=1-\frac{\lambda_{\min}\left(A^{H}A\right)}{\left\|A\right\|_{F}^{2}}, (7)

we get

𝔼𝐱k+1𝐱22\displaystyle\mathbb{E}\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2} α𝔼𝐱k𝐱22+𝔼𝐳k+1𝐳22AF2\displaystyle\leqslant\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\frac{\mathbb{E}\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}}{\left\|A\right\|_{F}^{2}}
α𝔼𝐱k𝐱22+δAF2𝔼𝐳k𝐳22,\displaystyle\leqslant\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\frac{\delta}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}, (8)

where the second inequality follows from (5). Suppose that $\left\|A\right\|_{F}^{2}\gg 1$ (we mention that this assumption is very weak in practice, since Kaczmarz-type methods are often applied to very large, possibly dense, linear systems), such that

δAF2𝔼𝐳k𝐳22α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22,\frac{\delta}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\ll\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2},

or in other words,

δAF2𝔼𝐳k𝐳22=o(α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22).\frac{\delta}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}=o\left(\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right). (9)

Combining (5), (3.1) and (9), we have from Lemma 3.1 that there exists δ<θk<α\delta<\theta_{k}<\alpha, such that

𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)\displaystyle\mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right) α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22+δAF2𝔼𝐳k𝐳22\displaystyle\leqslant\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}+\frac{\delta}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}
=α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22+o(α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22)\displaystyle=\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}+o\big{(}\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\big{)}
α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22\displaystyle\approx\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}
=θk𝔼(𝐱k𝐱22+𝐳k𝐳22),m+1tkm+n,\displaystyle=\theta_{k}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right),\quad m+1\leqslant t_{k}\leqslant m+n, (10)

where the high order term $o\big{(}\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\big{)}$ is omitted. In conclusion, we have the following theorem on the convergence of the proposed AGRAK method.

Theorem 2.

Let AF21\left\|A\right\|_{F}^{2}\gg 1 such that (9) is satisfied. Then under the above notations, the iteration sequences {𝐱k}k=1\left\{{\bf x}_{k}\right\}_{k=1}^{\infty} and {𝐳k}k=1\left\{{\bf z}_{k}\right\}_{k=1}^{\infty} generated by Algorithm 3 converge in expectation to 𝐱{\bf x}_{\star} and 𝐳{\bf z}_{\star}, with

{𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)β~𝔼(𝐱k𝐱22+𝐳k𝐳22)if  1tkm,𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)θk𝔼(𝐱k𝐱22+𝐳k𝐳22)ifm+1tkm+n,\begin{cases}\mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\tilde{\beta}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right)&if\,\,1\leqslant t_{k}\leqslant m,\\ \mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\lesssim\theta_{k}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right)&if\,\,m+1\leqslant t_{k}\leqslant m+n,\\ \end{cases}

where “\lesssim” means the high order term o(α𝔼𝐱k𝐱22+δ𝔼𝐳k𝐳22)o\big{(}\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\delta\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\big{)} is omitted.

Under the condition that AF21\|A\|_{F}^{2}\gg 1, the following theorem indicates that Algorithm 3 converges faster than the GRAK method; refer to (10).

Theorem 3.

Let β,β~\beta,\tilde{\beta} and θk\theta_{k} be defined in (11), (3) and (3.1), respectively. If AF21\left\|A\right\|_{F}^{2}\gg 1 such that (9) is satisfied, then

0<β~<βand0<θk<β.0<\tilde{\beta}<\beta\quad{\rm and}\quad 0<\theta_{k}<\beta.
Proof.

First, we prove that β~<β\tilde{\beta}<\beta. Recall that

β=112(A~F2A~F2min1im+n{A~(i)22}+1)λmin(A~HA~)A~F2,\beta=1-\frac{1}{2}\left(\frac{\left\|\tilde{A}\right\|_{F}^{2}}{\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}}+1\right)\frac{\lambda_{\min}\left(\tilde{A}^{H}\tilde{A}\right)}{\left\|\tilde{A}\right\|_{F}^{2}},

and

β~\displaystyle\tilde{\beta} =1λmin(A~HA~)A~F2min1im+n{A~(i)22}=1A~F2A~F2min1im+n{A~(i)22}λmin(A~HA~)A~F2.\displaystyle=1-\frac{\lambda_{\min}\left(\tilde{A}^{H}\tilde{A}\right)}{\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}}=1-\frac{\left\|\tilde{A}\right\|_{F}^{2}}{\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}}\cdot\frac{\lambda_{\min}\left(\tilde{A}^{H}\tilde{A}\right)}{\left\|\tilde{A}\right\|_{F}^{2}}.

As there are no zero columns in AA, we obtain

12(A~F2A~F2min1im+n{A~(i)22}+1)<A~F2A~F2min1im+n{A~(i)22},\frac{1}{2}\left(\frac{\left\|\tilde{A}\right\|_{F}^{2}}{\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}}+1\right)<\frac{\left\|\tilde{A}\right\|_{F}^{2}}{\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}},

and β~<β\tilde{\beta}<\beta.

Second, we prove that θk<β\theta_{k}<\beta. Indeed, it is sufficient to show that α<β\alpha<\beta. Let

{η=1<λmin(AHA),ifλmin(AHA)2,η=(λmin(AHA)+1412)2,ifλmin(AHA)<2.\begin{cases}\eta=1<\lambda_{\min}\left(A^{H}A\right),&if\,\,\lambda_{\min}\left(A^{H}A\right)\geqslant 2,\\ \eta=\left(\sqrt{\lambda_{\min}\left(A^{H}A\right)+\frac{1}{4}}-\frac{1}{2}\right)^{2},&if\,\,\lambda_{\min}\left(A^{H}A\right)<2.\\ \end{cases}

Notice that η<λmin(AHA)\eta<\lambda_{\min}\left(A^{H}A\right). Let ξ=2+mAF2\xi=2+\frac{m}{\left\|A\right\|_{F}^{2}}, then

12(2AF2+mγ+1)η2AF2+m=12ξ(2AF2+mγ+1)ηAF2.\frac{1}{2}\left(\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\right)\frac{\eta}{2\left\|A\right\|_{F}^{2}+m}=\frac{1}{2\xi}\left(\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\right)\frac{\eta}{\left\|A\right\|_{F}^{2}}. (11)

Next, we show that 2AF2+mγ+1<4.\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1<4. Otherwise, if 2AF2+mγ+14\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\geqslant 4, then

3AF23min{1+min1im{A(i)22},min1jn{A(j)22}}4AF2+2m,3\left\|A\right\|_{F}^{2}\geqslant 3\cdot\min\left\{1+\underset{1\leqslant i\leqslant m}{\min}\left\{\left\|A^{\left(i\right)}\right\|_{2}^{2}\right\},\underset{1\leqslant j\leqslant n}{\min}\left\{\left\|A_{\left(j\right)}\right\|_{2}^{2}\right\}\right\}\geq 4\left\|A\right\|_{F}^{2}+2m,

which is a contradiction. As a result,

12ξ(2AF2+mγ+1)<1,\frac{1}{2\xi}\left(\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\right)<1,

and

12ξ(2AF2+mγ+1)ηAF2<λmin(AHA)AF2.\frac{1}{2\xi}\left(\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\right)\frac{\eta}{\left\|A\right\|_{F}^{2}}<\frac{\lambda_{\min}\left(A^{H}A\right)}{\left\|A\right\|_{F}^{2}}. (12)

A combination of (11) and (12) yields

12(2AF2+mγ+1)η2AF2+m<λmin(AHA)AF2,\frac{1}{2}\left(\frac{2\left\|A\right\|_{F}^{2}+m}{\gamma}+1\right)\frac{\eta}{2\left\|A\right\|_{F}^{2}+m}<\frac{\lambda_{\min}\left(A^{H}A\right)}{\left\|A\right\|_{F}^{2}},

and it follows from (11) and (7) that θk<α<β.\theta_{k}<\alpha<\beta.

3.2 An accelerated greedy randomized augmented Kaczmarz method with simple random sampling

Although Algorithm 3 can converge faster than GRAK, just like the GRAK method it requires computing the residual vectors $A{\bf x}_{k}-({\bf b}-{\bf z}_{k})$ and $A^{H}{\bf z}_{k}$ in each iteration. That is, we have to access all the rows and columns of the data matrix $A$, which is contrary to the motivation of the Kaczmarz-type methods. This is particularly unfavorable when the coefficient matrix is large and dense, and it is therefore of interest to accelerate Algorithm 3 further.

To deal with this problem, we apply the simple random sampling strategy advocated in [21] to solve (1). The key idea is that, in each iteration, we first generate a subset by using simple random sampling, and then select working rows in this subset. Notice that the cost of generating a subset via simple random sampling is negligible compared with that of the overall algorithm. The advantages are that we are free of accessing all the rows or columns of the data matrix, and there is no need to compute the residual vector associated with the augmented linear system. The algorithm is given as follows.

Algorithm 4.

A semi-randomized augmented Kaczmarz method with simple random sampling
Input:
AA, 𝐛\bf b, 𝐱0{\bf x}_{0}, 𝐳0{\bf z}_{0}, a parameter 0<η10<\eta\ll 1 and the maximal iteration number ll.
Output: 𝐱l{\bf x}_{l} and 𝐳l{\bf z}_{l}.
1. for k=0,1,,l1k=0,1,\ldots,l-1 do
2. Generate an indicator set $\varOmega_{k}$ with $\lfloor\left(m+n\right)\eta\rfloor$ rows by using the simple random sampling method, where $\lfloor\cdot\rfloor$ means rounding down. Let $\varOmega_{k}^{1}=\varOmega_{k}\cap\left\{1,2,\dots,m\right\}$ and $\varOmega_{k}^{2}=\varOmega_{k}\cap\left\{m+1,m+2,\dots,m+n\right\}$. Then, select $t_{k}$ according to

tk=argmax{maxjkΩk1{|𝐛(jk)𝐳k(jk)A(jk)𝐱k|1+A(jk)22},maxjkΩk2{|A(jkm)H𝐳k|A(jkm)H2}}t_{k}=\mathop{\arg\max}\left\{\max_{j_{k}\in\varOmega_{k}^{1}}\left\{\frac{\left|{\bf b}^{\left(j_{k}\right)}-{\bf z}_{k}^{\left(j_{k}\right)}-A^{\left(j_{k}\right)}{\bf x}_{k}\right|}{\sqrt{1+\left\|A^{\left(j_{k}\right)}\right\|_{2}^{2}}}\right\},\max_{j_{k}\in\varOmega_{k}^{2}}\left\{\frac{\left|-A_{\left(j_{k}-m\right)}^{H}{\bf z}_{k}\right|}{\left\|A_{\left(j_{k}-m\right)}^{H}\right\|_{2}}\right\}\right\}

3. If 1tkm1\leqslant t_{k}\leqslant m, then set ik=tki_{k}=t_{k}, and compute

𝐳k+1=𝐳k+(𝐛(ik)𝐳k(ik)A(ik)𝐱k)1+A(ik)22𝐞ik,{\bf z}_{k+1}={\bf z}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}{\bf e}_{i_{k}},

and

𝐱k+1=𝐱k+(𝐛(ik)𝐳k(ik)A(ik)𝐱k)1+A(ik)22(A(ik))H.{\bf x}_{k+1}={\bf x}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{1+\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H}.

Else if m+1tkm+nm+1\leqslant t_{k}\leqslant m+n, then set jk=tkmj_{k}=t_{k}-m, and compute

𝐳k+1=𝐳kA(jk)H𝐳kA(jk)22A(jk),{\bf z}_{k+1}={\bf z}_{k}-\frac{A_{\left(j_{k}\right)}^{H}{\bf z}_{k}}{\left\|A_{\left(j_{k}\right)}\right\|_{2}^{2}}A_{\left(j_{k}\right)},

and select an index ik{1,2,,m}i_{k}\in\left\{1,2,\dots,m\right\} with probability Pr(row=ik)=A(ik)22AF2P_{r}\left(row=i_{k}\right)=\frac{\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}{\left\|A\right\|_{F}^{2}}, and compute

𝐱k+1=𝐱k+(𝐛(ik)𝐳k+1(ik)A(ik)𝐱k)A(ik)22(A(ik))H.{\bf x}_{k+1}={\bf x}_{k}+\frac{\left({\bf b}^{\left(i_{k}\right)}-{\bf z}_{k+1}^{\left(i_{k}\right)}-A^{\left(i_{k}\right)}{\bf x}_{k}\right)}{\left\|A^{\left(i_{k}\right)}\right\|_{2}^{2}}\left(A^{\left(i_{k}\right)}\right)^{H}.

4. end for
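
The essential difference from Algorithm 3 is step 2, where only the sampled rows and columns are touched. A MATLAB sketch of one iteration of Algorithm 4 is given below; it assumes the same setup (A, b, x, z, rowNorms, colNorms, fro2) as in the earlier sketches, eta denotes the sampling parameter $\eta$ (e.g., eta = 0.01), and all names are our own.

% One iteration of Algorithm 4; the setup follows the earlier sketches (ours).
m = size(A, 1);  n = size(A, 2);
Omega = randperm(m + n, floor((m + n) * eta));   % step 2: simple random sampling
Omega = Omega(:);
O1 = Omega(Omega <= m);                          % sampled row indices
O2 = Omega(Omega > m) - m;                       % sampled column indices
wR = abs(b(O1) - z(O1) - A(O1, :) * x) ./ sqrt(1 + rowNorms(O1));
wC = abs(A(:, O2)' * z) ./ sqrt(colNorms(O2));
[vR, pR] = max([wR; -Inf]);                      % -Inf guards against an empty O1
[vC, pC] = max([wC; -Inf]);                      % -Inf guards against an empty O2
if vR >= vC                                      % step 3, first branch
    i = O1(pR);
    d = (b(i) - z(i) - A(i, :) * x) / (1 + rowNorms(i));
    z(i) = z(i) + d;
    x = x + d * A(i, :)';
else                                             % step 3, second branch
    j = O2(pC);
    z = z - (A(:, j)' * z / colNorms(j)) * A(:, j);
    i = find(cumsum(rowNorms) >= rand * fro2, 1);
    x = x + ((b(i) - z(i) - A(i, :) * x) / rowNorms(i)) * A(i, :)';
end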

We are now in a position to consider the convergence of the semi-randomized augmented Kaczmarz method with simple random sampling. First, if $1\leqslant t_{k}\leqslant m$, then, similarly to the proof of Theorem 4.2 in [21], since the number of rows of $A$ is sufficiently large, it follows from Chebyshev's (weak) law of large numbers [22] that there are two scalars $0\leqslant\hat{\varepsilon}_{k},\tilde{\varepsilon}_{k}\ll 1$ such that

𝔼k𝐱~k+1𝐱~22(1(1ε^k)λmin(A~HA~)(1+ε~k)(A~F2min1im+n{A~(i)22}))𝐱~k𝐱~22.\mathbb{E}_{k}\left\|\tilde{{\bf x}}_{k+1}-\tilde{{\bf x}}_{\star}\right\|_{2}^{2}\leqslant\left(1-\frac{\left(1-\hat{\varepsilon}_{k}\right)\lambda_{\min}\left(\tilde{A}^{H}\tilde{A}\right)}{\left(1+\tilde{\varepsilon}_{k}\right)\left(\left\|\tilde{A}\right\|_{F}^{2}-\underset{1\leqslant i\leqslant m+n}{\min}\left\{\left\|\tilde{A}^{\left(i\right)}\right\|_{2}^{2}\right\}\right)}\right)\left\|\tilde{{\bf x}}_{k}-\tilde{{\bf x}}_{\star}\right\|_{2}^{2}.

It can be rewritten as

𝔼k(𝐱k+1𝐱22+𝐳k+1𝐳22)(1(1ε^k)η(1+ε~k)γ)(𝐱k𝐱22+𝐳k𝐳22),\mathbb{E}_{k}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\left(1-\frac{\left(1-\hat{\varepsilon}_{k}\right)\eta}{\left(1+\tilde{\varepsilon}_{k}\right)\gamma}\right)\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right), (13)

where $\eta$ and $\gamma$ are defined in (12) and (13), respectively. Taking the full expectation on both sides of (13) and setting

β^=1(1ε^k)η(1+ε~k)γ,\hat{\beta}=1-\frac{\left(1-\hat{\varepsilon}_{k}\right)\eta}{\left(1+\tilde{\varepsilon}_{k}\right)\gamma},

we obtain

𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)β^𝔼(𝐱k𝐱22+𝐳k𝐳22),1tkm.\mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\hat{\beta}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right),\quad 1\leqslant t_{k}\leqslant m.

Second, if m+1tkm+nm+1\leqslant t_{k}\leqslant m+n, analogous to the analysis of Algorithm 3, we have that

𝔼𝐳k+1𝐳22δ~𝔼𝐳k𝐳22.\mathbb{E}\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\leqslant\tilde{\delta}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}.

As the number of rows of $A$ is sufficiently large, it follows from Chebyshev's (weak) law of large numbers [22] that there are two scalars $0\leqslant\check{\varepsilon}_{k},\mathring{\varepsilon}_{k}\ll 1$ such that

𝔼𝐱k+1𝐱22α𝔼𝐱k𝐱22+δ~AF2𝔼𝐳k𝐳22,\mathbb{E}\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\frac{\tilde{\delta}}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2},

where α\alpha is defined in (7), and

δ~=1(1εˇk)λmin(AHA)(1+ε̊k)(AF2min1jn{A(j)22}).\tilde{\delta}=1-\frac{\left(1-\check{\varepsilon}_{k}\right)\lambda_{\min}\left(A^{H}A\right)}{\left(1+\mathring{\varepsilon}_{k}\right)\left(\left\|A\right\|_{F}^{2}-\underset{1\leqslant j\leqslant n}{\min}\left\{\left\|A_{\left(j\right)}\right\|_{2}^{2}\right\}\right)}.

If AF21\left\|A\right\|_{F}^{2}\gg 1 such that

δ~AF2𝔼𝐳k𝐳22α𝔼𝐱k𝐱22+δ~𝔼𝐳k𝐳22,\frac{\tilde{\delta}}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\ll\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\tilde{\delta}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2},

i.e.,

δ~AF2𝔼𝐳k𝐳22=o(α𝔼𝐱k𝐱22+δ~𝔼𝐳k𝐳22).\frac{\tilde{\delta}}{\left\|A\right\|_{F}^{2}}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}=o\left(\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\tilde{\delta}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right). (14)

Then we have the following theorem on the convergence of Algorithm 4.

Theorem 4.

Suppose that AF21\left\|A\right\|_{F}^{2}\gg 1 such that (14) is satisfied. Then the iteration sequences {𝐱k}k=1\left\{{\bf x}_{k}\right\}_{k=1}^{\infty} and {𝐳k}k=1\left\{{\bf z}_{k}\right\}_{k=1}^{\infty} generated by Algorithm 4, converge in expectation to 𝐱{\bf x}_{\star} and 𝐳{\bf z}_{\star}, with

{𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)β^𝔼(𝐱k𝐱22+𝐳k𝐳22)if  1tkm,𝔼(𝐱k+1𝐱22+𝐳k+1𝐳22)θ~k𝔼(𝐱k𝐱22+𝐳k𝐳22)ifm+1tkm+n,\begin{cases}\mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\leqslant\hat{\beta}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right)&if\,\,1\leqslant t_{k}\leqslant m,\\ \mathbb{E}\left(\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k+1}-{\bf z}_{\star}\right\|_{2}^{2}\right)\lesssim\tilde{\theta}_{k}\mathbb{E}\left(\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right)&if\,\,m+1\leqslant t_{k}\leqslant m+n,\\ \end{cases}

where “\lesssim” means the high order term o(α𝔼𝐱k𝐱22+δ~𝔼𝐳k𝐳22)o\left(\alpha\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\tilde{\delta}\mathbb{E}\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}\right) is omitted, and θ~k<max{α,δ~}<1\tilde{\theta}_{k}<\max\big{\{}\alpha,\tilde{\delta}\big{\}}<1.

Remark 3.1.

As we only use a small portion of the rows of the data matrix AA, we often have that β^>β~\hat{\beta}>\tilde{\beta} and θ~k>θk\tilde{\theta}_{k}>\theta_{k}. That is, Algorithm 4 may use more iterations than Algorithm 3. However, this does not mean that Algorithm 4 definitely runs slower than Algorithm 3. Indeed, Algorithm 4 is much cheaper than Algorithm 3 per iteration, and it may use less CPU time in practice. One refers to Section 5 for numerical comparisons of the algorithms.

4 A practical stopping criterion for Kaczmarz-type methods

To the best of our knowledge, there have been no practical stopping criteria for randomized Kaczmarz-type methods to date. To fill in this gap, we introduce a practical stopping criterion for randomized Kaczmarz-type methods in this section. Let $\left\{{\bf x}_{i}\right\}_{i=0}^{\infty}$ be an iterative sequence generated by any Kaczmarz-type method, and let ${\bf x}_{\star}=A^{\dagger}{\bf b}$ be the least-squares solution of the linear system $A{\bf x}={\bf b}$. One scheme is to use the relative solution error (RSE) as the stopping criterion [3, 5, 6, 21]

RSE=𝐱k𝐱2𝐱2.RSE=\frac{\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}}{\left\|{\bf x}_{\star}\right\|_{2}}. (1)

In [29], the absolute solution error (ASE) is used

ASE=𝐱k𝐱2.ASE=\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}. (2)

Obviously, both (1) and (2) are impractical since xx_{\star} is unknown in advance. In [43], the relative residual (RRes) is utilized

RRes=𝐛A𝐱k2𝐛2.RRes=\frac{\left\|{\bf b}-A{\bf x}_{k}\right\|_{2}}{\left\|{\bf b}\right\|_{2}}.

However, it is unsuitable for inconsistent linear systems, and one has to calculate the residual ${\bf b}-A{\bf x}_{k}$. In other words, we have to access all the rows of $A$ in each iteration. In [25], Li and Wu suggested using the adjacent iterative solution error (AISE for short) as the stopping criterion

AISE=𝐱k𝐱k12𝐛2.AISE=\frac{\left\|{\bf x}_{k}-{\bf x}_{k-1}\right\|_{2}}{\left\|{\bf b}\right\|_{2}}. (3)

We observe that (3) involves neither ${\bf x}_{\star}$ nor the residual vector. Unfortunately, the convergence of $\left\|{\bf x}_{k}-{\bf x}_{k-1}\right\|_{2}/\left\|{\bf b}\right\|_{2}$ to zero is only a necessary condition for the convergence of $\left\{{\bf x}_{i}\right\}_{i=0}^{\infty}$ to ${\bf x}_{\star}$. More precisely,

|𝐱k𝐱2𝐱k1𝐱2|𝐛2𝐱k𝐱k12𝐛2𝐱k𝐱2+𝐱k1𝐱2𝐛2.\frac{\left|\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}-\left\|{\bf x}_{k-1}-{\bf x}_{\star}\right\|_{2}\right|}{\left\|{\bf b}\right\|_{2}}\leqslant\frac{\left\|{\bf x}_{k}-{\bf x}_{k-1}\right\|_{2}}{\left\|{\bf b}\right\|_{2}}\leqslant\frac{\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}+\left\|{\bf x}_{k-1}-{\bf x}_{\star}\right\|_{2}}{\left\|{\bf b}\right\|_{2}}.

In this section, we will propose a practical stopping criterion for all the randomized Kaczmarz-type methods, and show its rationality theoretically. We need the following lemma.

Lemma 4.1.

Let {𝐱i}i=0\left\{{\bf x}_{i}\right\}_{i=0}^{\infty} be an iterative sequence obtained from a convergent Kaczmarz-type method. Then, there is a positive integer LL such that 𝐱(k+1)L𝐱2<𝐱kL𝐱2,k=0,1,2,\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}<\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2},~{}k=0,1,2,\ldots

Proof.

We prove it by contradiction. Suppose that 𝐱(k+1)L𝐱2𝐱kL𝐱2,k=0,1,2,\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\geqslant\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2},k=0,1,2,\ldots, for all the positive integers LL. If we take L=1L=1, then

𝐱k+1𝐱2𝐱k𝐱2,k=0,1,2,,\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}\geqslant\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2},\quad k=0,1,2,\ldots,

which contradicts the fact that $\left\{{\bf x}_{i}\right\}_{i=0}^{\infty}$ is generated by a convergent Kaczmarz-type method. ∎

Let

g()=𝐱𝐱2,=0,L,2L,g\left(\ell\right)=\left\|{\bf x}_{\ell}-{\bf x}_{\star}\right\|_{2}\,\,,\quad\ell=0,L,2L,\ldots

then the backward divided difference of g()g\left(\ell\right) at kLkL is defined as [10]

|g(kL)g((k1)L)|kL(k1)L=|𝐱kL𝐱2𝐱(k1)L𝐱2|L.\frac{\left|g\left(kL\right)-g\left(\left(k-1\right)L\right)\right|}{kL-\left(k-1\right)L}=\frac{\left|\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2}-\left\|{\bf x}_{\left(k-1\right)L}-{\bf x}_{\star}\right\|_{2}\right|}{L}.

From Lemma 4.1, we see that g()g\left(\ell\right) is a strictly monotonically decreasing discrete function whose infimum is 0, and

limkg(kL)=0.\lim_{k\rightarrow\infty}g\left(kL\right)=0.

Thus, for any $\varepsilon>0$, there exists a positive integer $k_{1}>0$ such that $g\left(kL\right)\leqslant\frac{\varepsilon}{2}$ for all $k\geqslant k_{1}$. Next, we show that there are $k_{2}>0$ and $\varepsilon_{1}<\varepsilon$ such that, if $\left\|{\bf x}_{k_{2}L}-{\bf x}_{\left(k_{2}-1\right)L}\right\|_{2}/L\leqslant\varepsilon_{1}$, then $g\left(k_{2}L\right)<\varepsilon$. Indeed, let $\varepsilon_{1}=\frac{\varepsilon}{2Lk_{1}}$; then there exists $k_{2}>0$ such that

|g(k2L)g((k21)L)|L\displaystyle\frac{\left|g\left(k_{2}L\right)-g\left(\left(k_{2}-1\right)L\right)\right|}{L} =|𝐱k2L𝐱2𝐱(k21)L𝐱2|L\displaystyle=\frac{\left|\left\|{\bf x}_{k_{2}L}-{\bf x}_{\star}\right\|_{2}-\left\|{\bf x}_{\left(k_{2}-1\right)L}-{\bf x}_{\star}\right\|_{2}\right|}{L}
𝐱k2L𝐱(k21)L2Lε2Lk1.\displaystyle\leqslant\frac{\left\|{\bf x}_{k_{2}L}-{\bf x}_{\left(k_{2}-1\right)L}\right\|_{2}}{L}\leqslant\frac{\varepsilon}{2Lk_{1}}. (4)

On one hand, if k1k2k_{1}\leqslant k_{2}, then g(k2L)ε2<εg\left(k_{2}L\right)\leqslant\frac{\varepsilon}{2}<\varepsilon. On the other hand, if k1>k2k_{1}>k_{2}, then there is a scalar

a=g(k1L)g(k2L)(k1k2)L,a=\frac{g\left(k_{1}L\right)-g\left(k_{2}L\right)}{\left(k_{1}-k_{2}\right)L}, (5)

such that

g(k2L)g((k21)L)Lag(k1L)g((k11)L)L<0.\frac{g\left(k_{2}L\right)-g\left(\left(k_{2}-1\right)L\right)}{L}\leqslant a\leqslant\frac{g\left(k_{1}L\right)-g\left(\left(k_{1}-1\right)L\right)}{L}<0. (6)

Combining the inequalities (4) and (6), we arrive at

|a||g(k2L)g((k21)L)|Lε2k1L.\left|a\right|\leqslant\frac{\left|g\left(k_{2}L\right)-g\left(\left(k_{2}-1\right)L\right)\right|}{L}\leqslant\frac{\varepsilon}{2k_{1}L}.

From (5), we have g(k2L)=g(k1L)a(k1k2)Lg\left(k_{2}L\right)=g\left(k_{1}L\right)-a\left(k_{1}-k_{2}\right)L. Hence,

g(k2L)\displaystyle g\left(k_{2}L\right) =|g(k2L)|=|g(k1L)a(k1k2)L|\displaystyle=\left|g\left(k_{2}L\right)\right|=\left|g\left(k_{1}L\right)-a\left(k_{1}-k_{2}\right)L\right|
|g(k1L)|+|a||k1k2|Lε2+ε2k1L|k1k2|L\displaystyle\leqslant\left|g\left(k_{1}L\right)\right|+\left|a\right|\cdot\left|k_{1}-k_{2}\right|L\leqslant\frac{\varepsilon}{2}+\frac{\varepsilon}{2k_{1}L}\left|k_{1}-k_{2}\right|L
=ε2+k1k2k1ε2ε2+ε2=ε.\displaystyle=\frac{\varepsilon}{2}+\frac{k_{1}-k_{2}}{k_{1}}\cdot\frac{\varepsilon}{2}\leqslant\frac{\varepsilon}{2}+\frac{\varepsilon}{2}=\varepsilon.

In conclusion, we have the following theorem.

Theorem 5.

Let $g\left(\ell\right)=\left\|{\bf x}_{\ell}-{\bf x}_{\star}\right\|_{2}~{}(\ell=0,L,2L,\ldots)$ be a strictly monotonically decreasing discrete function whose infimum is 0. For any $\varepsilon>0$, there are $k_{2}>0$ and $\varepsilon_{1}<\varepsilon$ such that, if

\frac{\left\|{\bf x}_{k_{2}L}-{\bf x}_{\left(k_{2}-1\right)L}\right\|_{2}}{L}\leqslant\varepsilon_{1},

then $g\left(k_{2}L\right)<\varepsilon$.

Remark 4.1.

Based on Theorem 5, we propose the following stopping criterion for Kaczmarz-type methods

LISE=𝐱kL𝐱(k1)L2L<tol,LISE=\frac{\left\|{\bf x}_{kL}-{\bf x}_{\left(k-1\right)L}\right\|_{2}}{L}<tol, (7)

where L>0L>0 is a user-provided number and 𝑡𝑜𝑙\it tol is the convergence tolerance.
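
In code, the criterion (7) only requires caching the iterate every $L$ steps. The following MATLAB sketch (our own wrapper; for brevity the inner update is the plain randomized Kaczmarz step on a consistent test problem rather than one of the augmented methods above) shows how (7) is embedded into the iteration loop.

% Sketch of the stopping criterion (7) around a Kaczmarz-type loop (our example).
rng(1);
A = randn(2000, 100);  xTrue = randn(100, 1);  b = A * xTrue;
rowNorms = sum(abs(A).^2, 2);  fro2 = sum(rowNorms);
x = zeros(100, 1);  xOld = x;
L = 50;  tol = 1e-8;  maxit = 1e6;
for k = 1:maxit
    i = find(cumsum(rowNorms) >= rand * fro2, 1);          % randomized Kaczmarz step
    x = x + ((b(i) - A(i, :) * x) / rowNorms(i)) * A(i, :)';
    if mod(k, L) == 0                                      % test LISE every L iterations
        if norm(x - xOld) / L < tol, break; end            % criterion (7)
        xOld = x;                                          % cache x_{kL} for the next test
    end
end
fprintf('stopped at k = %d, RSE = %.2e\n', k, norm(x - xTrue) / norm(xTrue));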

It is seen that the parameter LL is crucial to our stopping criterion. Next, we briefly discuss how to choose it in practice. Assume that

𝔼𝐱k+1𝐱22c𝔼𝐱k𝐱22,\mathbb{E}\left\|{\bf x}_{k+1}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant c\mathbb{E}\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2},

where $0<c<1$, so that

𝔼𝐱(k+1)L𝐱22cL𝔼𝐱kL𝐱22.\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant c^{L}\mathbb{E}\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2}^{2}. (8)

Therefore,

𝔼𝐱kL𝐱22\displaystyle\mathbb{E}\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2}^{2} =𝔼𝐱kL𝐱(k+1)L+𝐱(k+1)L𝐱22𝔼(𝐱(k+1)L𝐱kL2+𝐱(k+1)L𝐱2)2\displaystyle=\mathbb{E}\left\|{\bf x}_{kL}-{\bf x}_{\left(k+1\right)L}+{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}+\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\right)^{2}
=𝔼(𝐱(k+1)L𝐱kL22+2𝐱(k+1)L𝐱kL2𝐱(k+1)L𝐱2+𝐱(k+1)L𝐱22)\displaystyle=\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}^{2}+2\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}+\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}^{2}\right)
=𝔼𝐱(k+1)L𝐱kL22+2𝔼(𝐱(k+1)L𝐱kL2𝐱(k+1)L𝐱2)+𝔼𝐱(k+1)L𝐱22\displaystyle=\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}^{2}+2\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\right)+\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}^{2}
𝔼𝐱(k+1)L𝐱kL22+2𝔼(𝐱(k+1)L𝐱kL2𝐱(k+1)L𝐱2)+cL𝔼𝐱kL𝐱22.\displaystyle\leqslant\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}^{2}+2\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\right)+c^{L}\mathbb{E}\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2}^{2}.

As a result,

(1cL)𝔼𝐱kL𝐱22𝔼𝐱(k+1)L𝐱kL22+2𝔼(𝐱(k+1)L𝐱kL2𝐱(k+1)L𝐱2),\left(1-c^{L}\right)\mathbb{E}\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}^{2}+2\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\right),

and

𝔼𝐱kL𝐱2211cL𝔼𝐱(k+1)L𝐱kL22+21cL𝔼(𝐱(k+1)L𝐱kL2𝐱(k+1)L𝐱2).\mathbb{E}\left\|{\bf x}_{kL}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant\frac{1}{1-c^{L}}\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}^{2}+\frac{2}{1-c^{L}}\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\right). (9)

From (8) and (9), it follows that

𝔼𝐱(k+1)L𝐱22cL1cL𝔼𝐱(k+1)L𝐱kL22+2cL1cL𝔼(𝐱(k+1)L𝐱kL2𝐱(k+1)L𝐱2).\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}^{2}\leqslant\frac{c^{L}}{1-c^{L}}\mathbb{E}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}^{2}+\frac{2c^{L}}{1-c^{L}}\mathbb{E}\left(\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{kL}\right\|_{2}\left\|{\bf x}_{\left(k+1\right)L}-{\bf x}_{\star}\right\|_{2}\right).

Since c^{L}/(1-c^{L}) decreases rapidly as L grows, the error \mathbb{E}\left\|{\bf x}_{(k+1)L}-{\bf x}_{\star}\right\|_{2}^{2} is essentially controlled by the computable difference \left\|{\bf x}_{(k+1)L}-{\bf x}_{kL}\right\|_{2} provided that L is sufficiently large. This implies that we need to use a sufficiently large L in practical calculations; one can use, say, L=50 or 80 in practice.
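To make the use of (7) concrete, the following is a minimal MATLAB sketch of the criterion embedded in a plain randomized Kaczmarz loop. The projection step is the classical randomized Kaczmarz update and only serves as a stand-in for Algorithm 3 or Algorithm 4; the function and variable names are illustrative.

function x = kaczmarz_lise(A, b, L, tol, maxit)
    % Sketch: stopping rule (7) inside a generic row-action loop.
    n = size(A, 2);
    x = zeros(n, 1);
    x_old = x;                              % iterate stored L steps ago
    rn2 = sum(abs(A).^2, 2);                % squared row norms of A
    cdf = cumsum(rn2) / sum(rn2);           % sampling distribution prop. to rn2
    for k = 1:maxit
        i = find(cdf >= rand, 1);                        % sample a row index
        x = x + ((b(i) - A(i,:)*x) / rn2(i)) * A(i,:)';  % project onto H_i
        if mod(k, L) == 0
            if norm(x - x_old) / L < tol                 % LISE test (7)
                return;
            end
            x_old = x;
        end
    end
end

The test is evaluated only once every L iterations and uses nothing beyond two stored iterates, so its overhead is negligible compared with one sweep of L projection steps.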

5 Numerical Experiments

In this section, we perform numerical experiments to show the efficiency of our algorithms as well as the effectiveness of the proposed stopping criterion. All the numerical experiments are run on an HP workstation with dual Intel(R) Xeon(R) E5-2640 v3 processors (20 cores), a 2.60 GHz CPU and 256 GB of RAM. The operating system is 64-bit Windows 10, and all the numerical results are obtained with MATLAB 2018b. In order to show the efficiency of our proposed algorithms, we compare Algorithm 3 and Algorithm 4 with the following state-of-the-art Kaczmarz-type methods for solving large-scale (dense or sparse) inconsistent linear systems:
\bullet REK [48]: The randomized extended Kaczmarz method.
\bullet PREK [5]: The partially randomized extended Kaczmarz method.
\bullet GRAK [4]: The greedy randomized augmented Kaczmarz method.
\bullet RGRAK [7]: The relaxed greedy randomized augmented Kaczmarz method, in which the relaxation parameter is set to be 1 [7].

In the tables below, we denote by “IT” the number of iterations, by “CPU” the CPU time in seconds, and by “RSE” the relative error defined as

RSE=𝐱k𝐱2𝐱2,{\rm RSE}=\frac{\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}}{\left\|{\bf x}_{\star}\right\|_{2}}, (10)

where {\bf x}_{\star}=A^{\dagger}{\bf b} is the minimum 2-norm least-squares solution of the inconsistent linear system (1), computed by the MATLAB function pinv.m. All the numerical results are the means over 10 runs.

5.1 Effectiveness of the stopping criterion (7)

In this subsection, we show the effectiveness of our stopping criterion for large inconsistent linear systems. The test matrix A is synthetic and is generated by the MATLAB built-in function randn(m,n). Similar to [4], the right-hand side is chosen as {\bf b}=A{\bf x}_{\ast}+{\bf r}, where {\bf x}_{\ast}\in\mathbb{R}^{n} is a least-squares solution of the inconsistent linear system (1), and {\bf r}\in\mathbb{R}^{m} is a nonzero vector in the null space of the matrix A^{H}, generated by the MATLAB function null.m. Two examples are performed in this subsection.
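For concreteness, a small MATLAB sketch of this construction is given below; the dimensions are illustrative, and null is used only to obtain a vector orthogonal to the range of A so that the resulting system is inconsistent.

% Sketch of the synthetic inconsistent system used in this subsection:
% b = A*x_ast + r with A'*r = 0, so A*x = b has no exact solution.
m = 5000;  n = 1000;
A = randn(m, n);
x_ast = randn(n, 1);              % one least-squares solution
N = null(A');                     % orthonormal basis of the null space of A^H
r = N * randn(size(N, 2), 1);     % nonzero vector with A'*r = 0
b = A * x_ast + r;                % inconsistent right-hand side
x_star = pinv(A) * b;             % reference solution used in RSE (10)

Note that forming null(A') and pinv(A) is affordable only for moderate sizes; it is used here merely to build a controlled test problem and the reference solution.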

\bullet In the first example, we run the six algorithms using their own stopping criteria. More precisely, in the REK method, we use [48]

A𝐱k(𝐛𝐳k)2AF𝐱k2tolandAH𝐳k2AF2𝐱k2tol\frac{\left\|A{\bf x}_{k}-\left({\bf b}-{\bf z}_{k}\right)\right\|_{2}}{\left\|A\right\|_{F}\left\|{\bf x}_{k}\right\|_{2}}\leqslant tol\quad{\rm and}\quad\frac{\left\|A^{H}{\bf z}_{k}\right\|_{2}}{\left\|A\right\|_{F}^{2}\left\|{\bf x}_{k}\right\|_{2}}\leqslant tol (11)

as the stopping criterion; in the PREK method, we use [5]

𝐱k𝐱22𝐱22tol\frac{\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}}{\left\|{\bf x}_{\star}\right\|_{2}^{2}}\leqslant tol (12)

as the stopping criterion; in the GRAK method and the RGRAK method, we use [4, 7]

𝐱k𝐱22+𝐳k𝐳22𝐱22+𝐛(A)22tol\frac{\left\|{\bf x}_{k}-{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf z}_{k}-{\bf z}_{\star}\right\|_{2}^{2}}{\left\|{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf b}_{\mathcal{R}\left(A\right)}\right\|_{2}^{2}}\leqslant tol (13)

as the stopping criteria, where 𝐳=𝐛(A){\bf z}_{\star}={\bf b}_{\mathcal{R}\left(A\right)^{\bot}} is computed by 𝐛(A)=𝐛A𝐱{\bf b}_{\mathcal{R}\left(A\right)^{\bot}}={\bf b}-A{\bf x}_{\star}. In Algorithm 3 and Algorithm 4, we make use of

LISE=𝐱~kL𝐱~(k1)L2LtolLISE=\frac{\left\|\tilde{{\bf x}}_{kL}-\tilde{{\bf x}}_{\left(k-1\right)L}\right\|_{2}}{L}\leqslant tol (14)

as the stopping criterion, where L=400, \tilde{{\bf x}}_{kL}=\left[{\bf z}_{kL}^{H},{\bf x}_{kL}^{H}\right]^{H} (cf. (2)), and the convergence tolerance is chosen as tol=10^{-4}. Table 1 lists the numerical results, where the values in bold are the best ones.

Table 1: Section 5.1. Example 1: Numerical results of the algorithms using their own stopping criteria (11)–(14); A=randn(m,n)A=randn(m,n), and tol=104tol=10^{-4}. The sampling ratio is chosen as η=0.01\eta=0.01 in Algorithm 4.
m×nm\times n 5000×10005000\times 1000 5000×15005000\times 1500 5000×20005000\times 2000 5000×25005000\times 2500 5000×30005000\times 3000
REK IT 19916 37204.9 64410.7 109528.8 185659.2
CPU 161.9 375.7 646.4 1362.3 2618.0
RSE 4.38e-3 6.53e-3 9.44e-3 1.34e-2 1.98e-2
PREK IT 13191.5 24724.2 47252.7 87730.9 169481.5
CPU 3.5 7.3 14.9 31.4 64.3
RSE 1.00e-2 1.00e-2 1.00e-2 1.00e-2 1.00e-2
GRAK IT 3003.5 5645.6 10181.9 16879.9 29117.3
CPU 17.1 49.9 84.7 139.9 282.1
RSE 4.79e-1 4.50e-1 4.16e-1 3.94e-1 3.61e-1
RGRAK IT 2977.3 5601.4 10206 16702.6 28927.7
CPU 15.7 47.6 84.1 139.6 283.5
RSE 4.90e-1 4.73e-1 4.55e-1 4.44e-1 4.10e-1
Algorithm 3 IT 9600 19200 36880 65480 127200
CPU 61.1 117.9 253.6 581.5 1356.9
RSE 7.52e-4 1.17e-3 1.73e-3 2.22e-3 3.19e-3
Algorithm 4 IT 10240 19920 37960 68080 131360
CPU 14.0 38.9 107.5 250.7 748.6
RSE 6.61e-4 1.06e-3 1.63e-3 2.09e-3 2.99e-3

It is seen from Table 1 that the PREK method and the RGRAK method may require less CPU time and fewer iterations when using their own stopping criteria. However, the computed solutions obtained from these methods are unreliable, and the “real” relative errors of the corresponding approximations can be large. For instance, the RSE values of the GRAK and RGRAK methods are only of the order of \mathcal{O}(10^{-1}), and those of the REK and PREK methods are of the order of \mathcal{O}(10^{-2}) or \mathcal{O}(10^{-3}), while those of Algorithm 3 and Algorithm 4 are of the order of \mathcal{O}(10^{-3}) or \mathcal{O}(10^{-4}).

Indeed, we find that the denominator of the stopping criterion (13) can be quite large, which leads to an early termination of the GRAK and RGRAK methods even when {\bf x}_{k} is far from being a good approximation to {\bf x}_{\star}. For example, for the matrix A with m=5000 and n=1000, the value of \left\|{\bf x}_{\star}\right\|_{2}^{2}+\left\|{\bf b}_{\mathcal{R}\left(A\right)}\right\|_{2}^{2} can be over 5\times 10^{6}. Moreover, both (12) and (13) are impractical since {\bf x}_{\star} is unknown in advance, and one has to perform two matrix-vector products with respect to A and A^{H} to evaluate (11); that is, one has to access all the rows and columns of A. Thus, compared with the available stopping criteria, our stopping criterion (14) is effective and more appropriate for Kaczmarz-type methods.
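The difference in cost can be seen from the following MATLAB fragment (a sketch only; the current iterates xk, zk and the iterates xk_old, zk_old stored L steps earlier are assumed to be available from the solver):

% Sketch: cost of the two practical tests.  Evaluating (11) needs two
% matrix-vector products with A and A' per check, i.e., access to all rows
% and columns of A, while (14) only reuses iterates the method already stores.
normAF = norm(A, 'fro');
test11 = norm(A*xk - (b - zk))/(normAF*norm(xk)) <= tol && ...
         norm(A'*zk)/(normAF^2*norm(xk)) <= tol;

xt_new = [zk; xk];                 % current augmented iterate [z_{kL}; x_{kL}]
xt_old = [zk_old; xk_old];         % augmented iterate from L steps earlier
test14 = norm(xt_new - xt_old)/L <= tol;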

Table 2: Section 5.1. Example 2: Numerical results of the algorithms using the proposed stopping criterion (14) with L=400L=400; A=randn(m,n)A=randn(m,n) and tol=104tol=10^{-4}. The sampling ratio is chosen as η=0.01\eta=0.01 in Algorithm 4.
m×nm\times n 5000×10005000\times 1000 5000×15005000\times 1500 5000×20005000\times 2000 5000×25005000\times 2500 5000×30005000\times 3000
REK IT 27680 54360 98480 173520 306360
CPU 9.5 21.3 41.4 73.3 139.1
RSE 4.43e-4 7.04e-4 9.94e-4 1.46e-3 2.28e-3
PREK IT 17600 33360 60520 106200 186120
CPU 4.7 9.9 19.4 36.7 66.5
RSE 2.17e-3 2.87e-3 3.49e-3 4.52e-3 5.84e-3
GRAK IT 11200 23040 43760 81960 146280
CPU 73.9 230.9 455.4 1013.4 1889.4
RSE 1.01e-3 1.63e-3 2.22e-3 3.04e-3 4.13e-3
RGRAK IT 11040 22640 42520 78640 140800
CPU 73.5 184.1 501.6 1123.3 1961.4
RSE 1.02e-3 1.76e-3 2.59e-3 3.61e-3 5.58e-3
Algorithm 3 IT 9600 19200 36040 66440 116600
CPU 57.1 96.1 289.2 592.9 1119.5
RSE 7.77e-4 1.19e-3 1.66e-3 2.23e-3 2.99e-3
Algorithm 4 IT 10120 20320 37720 68920 120760
CPU 14.0 40.6 110.2 223.3 713.5
RSE 6.99e-4 1.10e-3 1.54e-3 2.09e-3 2.88e-3

\bullet In view of the numerical results given in the first example, in the second example, we run the six algorithms using the proposed stopping criterion (7) with L=400L=400 and tol=104tol=10^{-4}. We generate another five test matrices by using the MATLAB built-in function randn(m,n)randn(m,n). The numerical results are given in Table 2, where the best results are in bold.

Some remarks are in order. First, comparing the RSE values of the GRAK, RGRAK, REK and PREK methods in Table 1 and Table 2, we find that the approximate solutions obtained from these four methods are more accurate than before when the proposed criterion (14) is used as the stopping criterion. This means that the proposed stopping criterion is feasible and more effective than the available ones. Second, we observe from Table 1 and Table 2 that REK runs much faster when using (14) as the stopping criterion. This is due to the fact that one has to perform two matrix-vector products with respect to A and A^{H} when evaluating (11), and the cost of matrix-vector products is high when A is large and dense. Third, it is seen from the table that, for this synthetic problem, REK enjoys the highest accuracy and PREK runs the fastest, while Algorithm 3 needs the fewest iterations. This implies that the convergence speed of Algorithm 3 is the highest among the six. Of course, the algorithm with the fewest iterations is not necessarily the fastest; indeed, the frameworks of REK and PREK are different from those of the other four algorithms. Fourth, we see that Algorithm 3 and Algorithm 4 outperform GRAK and RGRAK significantly, both in terms of CPU time and iterations. Specifically, Algorithm 3 needs the fewest iterations and Algorithm 4 uses the least CPU time, because Algorithm 4 only utilizes a small portion of the rows of the matrix \tilde{A} in each iteration.

5.2 Efficiency of the proposed algorithms

In this section, we perform numerical experiments to show the superiority of the proposed algorithms over some state-of-the-art Kaczmarz-type methods for large and inconsistent linear systems. Based on the numerical experiments made in Section 5.1, and for the sake of a fair comparison, we make use of (7) as the stopping criterion for all the algorithms, with L=400. In order to demonstrate the superiority of Algorithm 4 over the GRAK method, we consider the value of

Speed-up=CPUtimeofGRAKCPUtimeofAlgorithm4.{\rm Speed\text{-}up}=\frac{\rm CPU~{}time~{}of~{}GRAK}{\rm CPU~{}time~{}of~{}Algorithm~{}\ref{alg6}}. (15)

There are three numerical experiments in this section.

Table 3: Section 5.2. Example 1: Test matrices for solving large inconsistent linear systems.
Matrix (A)   Size (m×n)   Nnz   Background
abtaha1   14596×209   51307   Combinatorial Problem
abtaha2   37932×331   137228   Combinatorial Problem
sls   1748122×62729   6804304   Least Squares Problem
California   9664×9664   16150   Directed Graph
stat96v5   2307×75779   233921   Linear Programming Problem
GL7d25   2798×21074   81671   Combinatorial Problem

\bullet In the first experiment, we show the numerical behavior of the proposed algorithms on some large sparse inconsistent linear systems. The test matrices are from the University of Florida Sparse Matrix Collection (https://sparse.tamu.edu/). The details of these data matrices are listed in Table 3. In this experiment, the right-hand side {\bf b} corresponding to the matrices abtaha1, abtaha2, California, stat96v5 and GL7d25 is constructed in the same way as in Section 5.1, while for the matrix sls it is generated by the MATLAB built-in function randn. The convergence tolerance is chosen as tol=10^{-4}, and the sampling ratio in Algorithm 4 is set to \eta=0.01. Table 4 lists the numerical results, where the values in bold are the best ones.

Table 4: Section 5.2. Example 1: Numerical results of the six algorithms on some sparse matrices from the University of Florida Sparse Matrix Collection, tol=104tol=10^{-4} and L=400L=400. The sampling ratio is chosen as η=0.01\eta=0.01 in Algorithm 4. Here “/” means that the number of iterations exceeds 1000000 or the CPU time exceeds 24 hours.
Matrix & Size   abtaha1 (14596×209)   abtaha2 (37932×331)   sls (1748122×62729)   California (9664×9664)   stat96v5 (2307×75779)   GL7d25 (2798×21074)
REK IT 49640 64760 / 476680 69440 103840
CPU 21.8 51.6 / 396.7 273.4 152.8
PREK IT 36640 41800 / / 100880 /
CPU 14.1 29.7 / / 175.6 /
GRAK IT 38880 56200 / 300320 18680 73880
CPU 35.1 99.4 / 418.2 119.6 190.9
RGRAK IT 33680 47440 / 398080 18720 93240
CPU 29.5 85.5 / 531.9 118.2 202.9
Algorithm 3 IT 25680 32560 / 223280 11800 54440
CPU 11.9 29.2 / 165.7 33.4 70.1
Algorithm 4 IT 33880 45920 406640 251760 12680 59040
CPU 15.6 45.1 59120.4 161.2 31.8 68.5
Speed-up 2.25 2.20 / 2.59 3.76 2.79

Some remarks are in order. First, it is seen from Table 4 that Algorithm 3 and Algorithm 4 are far superior to the other algorithms in terms of CPU time and number of iterations. In terms of the values of “Speed-up”, we observe from Table 4 that Algorithm 4 is about 2 to 3 times faster than the GRAK method, and the improvements are impressive. Specifically, for the large matrix sls, all the algorithms fail to converge within 24 hours, except for Algorithm 4. Therefore, we benefit from the strategy of simple random sampling, as there is no need to compute the whole residual vector during the iterations. Thus, Algorithm 4 is competitive for solving large and sparse inconsistent linear systems. Notice that the PREK method does not work for the two matrices California and GL7d25, either. Indeed, the PREK method requires the coefficient matrix A to be tall and of full column rank [5, Theorem 3.1], while these two matrices are not of full column rank.

\bullet In the second experiment, we consider the problem of tomographic image reconstruction and run the six algorithms on a 2-D parallel-beam tomography problem. In this experiment, a sparse matrix A\in\mathbb{R}^{m\times n} and an “exact solution” {\bf x}_{\star}\in\mathbb{R}^{n} are generated by using the function paralleltomo(N,\theta,p) in the AIR toolbox [19], with N=60, \theta=0:1:178\degree and p=125. The matrix A is of size 22375\times 3600, and we set the right-hand side {\bf b}=A{\bf x}_{\star}+{\bf r}, where the noise {\bf r}\in\mathbb{R}^{m} is a nonzero vector in the null space of the matrix A^{H}, generated by the MATLAB function null.

In this experiment, we solve the inconsistent linear system A{\bf x}={\bf b}, and all the algorithms are run for 20000 iterations. Consider the signal-to-noise ratio (SNR) [44]

SNR=i=1n𝐱i2i=1n(𝐱i𝐱^i)2,SNR=\frac{\sum_{i=1}^{n}{{\bf x}_{i}^{2}}}{\sum_{i=1}^{n}{\left({\bf x}_{i}-\hat{{\bf x}}_{i}\right)^{2}}}, (16)

where {\bf x} is the original clean signal (i.e., the “exact solution” {\bf x}_{\star}), and \hat{{\bf x}} is the denoised signal (i.e., the computed solution obtained from one of the six algorithms). Notice that the larger the SNR, the better the denoising effect of an algorithm. The original image (Exact), the reconstructed ones, and the values of SNR are shown in Figure 1. It is clearly seen from the figures and the SNR values that Algorithm 3 and Algorithm 4 can remove the noise and restore the true image efficiently, and our algorithms are superior to the other methods in terms of SNR.
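As an illustration, the SNR in (16) and the displayed reconstruction can be obtained as follows (a sketch; x_star denotes the vectorized exact phantom, xhat the computed solution of one algorithm, and N=60):

% Sketch: SNR (16) of a reconstruction and its display as an N-by-N image.
SNR = sum(x_star.^2) / sum((x_star - xhat).^2);
imagesc(reshape(xhat, N, N));  axis image off;  colormap gray;
title(sprintf('SNR = %.2f', SNR));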



Fig. 1: Example 2: The original image (Exact), the reconstructed ones, and the values of SNR obtained from the six algorithms REK, PREK, GRAK, RGRAK, Algorithm 3 and Algorithm 4 (η=0.01)\left(\eta=0.01\right). All the algorithms were run for 20000 iterations.

\bullet Spectral Regression Discriminant Analysis (SRDA) is one of the most popular methods for large-scale discriminant analysis [12]. By using the tool of spectral graph analysis, SRDA casts discriminant analysis into a regression framework which facilitates both efficient computation and the use of regularization techniques. Specifically, SRDA only needs to solve a set of least squares problems and there is no eigenvector computation involved, which can save both the running time and the memory compared with some classical linear discriminant analysis methods.

Let X=[𝐱1,𝐱2,,𝐱n]d×nX=\left[{\bf x}_{1},{\bf x}_{2},\dots,{\bf x}_{n}\right]\in\mathbb{R}^{d\times n} be a set of nn training samples in a dd-dimensional feature space. Assume that the data matrix is partitioned into kk classes as X=[X1,,Xk]X=\left[X_{1},\dots,X_{k}\right], where XjX_{j} is the jj-th set with njn_{j} being the number of samples. Denote by 𝐜j{\bf c}_{j} the centroid vector of XjX_{j}, and by 𝐜{\bf c} the global centroid vector of the training data. Denote by

Y=[𝐲1,𝐲2,,𝐲k]=[𝟏n1𝟎𝟎𝟎𝟏n2𝟎𝟎𝟎𝟏nk]n×k,Y=\left[{\bf y}_{1},{\bf y}_{2},\dots,{\bf y}_{k}\right]=\left[\begin{matrix}{\bf 1}_{n_{1}}&{\bf 0}&\cdots&{\bf 0}\\ {\bf 0}&{\bf 1}_{n_{2}}&\cdots&{\bf 0}\\ \vdots&\vdots&\ddots&\vdots\\ {\bf 0}&{\bf 0}&\cdots&{\bf 1}_{n_{k}}\\ \end{matrix}\right]\in\mathbb{R}^{n\times k},

where {\bf 1}_{n_{i}} and {\bf 1}_{n} are the all-ones vectors of size n_{i} and n, respectively, and I_{n} is the n-by-n identity matrix. Let \hat{Y}=\left(I_{n}-\frac{1}{n}{\bf 1}_{n}{\bf 1}_{n}^{T}\right)Y, and let \bar{Y}\in\mathbb{R}^{n\times\left(k-1\right)} be an orthonormal basis of {\rm span}\{\hat{Y}\}.

Let X^=[XT,𝟏n]T\hat{X}=\left[X^{T},{\bf 1}_{n}\right]^{T}. In essence, the SRDA method resorts to the following least squares problems [12, 39]:

\hat{a}_{i}=\mathop{\arg\min}\limits_{a_{i}\in\mathbb{R}^{d+1}}\left\{\left\|\hat{X}^{T}a_{i}-{\bf y}_{i}\right\|_{2}\right\},\quad i=1,2,\dots,k-1, (17)

where 𝐲i{\bf y}_{i} is the ii-th column vector of Y¯,i=1,2,,k1\bar{Y},~{}i=1,2,\dots,k-1.
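The following MATLAB sketch shows one way to assemble the quantities appearing in (17); the variables X (d-by-n, columns grouped by class), labels and k are assumptions introduced only for illustration.

% Sketch: assembling the SRDA least-squares problems (17).
n = size(X, 2);
Y = zeros(n, k);
for j = 1:n
    Y(j, labels(j)) = 1;                    % class-indicator matrix
end
Yhat = Y - repmat(mean(Y, 1), n, 1);        % centering, (I - (1/n)*1*1')*Y
Ybar = orth(Yhat);                          % orthonormal basis, n-by-(k-1)
Xhat = [X; ones(1, n)];                     % \hat{X} = [X^T, 1_n]^T
y1 = Ybar(:, 1);                            % right-hand side for i = 1
% The problem handed to the Kaczmarz-type methods is then
%   min_a || Xhat' * a - y1 ||_2 ,
% an overdetermined and, in general, inconsistent linear system.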

Without loss of generality, we solve the problem (17) for i=1 by using Algorithm 3 and Algorithm 4, where the sampling ratio is chosen as \eta=0.001 in the latter algorithm. We compare the two proposed methods with the REK method, the PREK method, the GRAK method, and the RGRAK method. In the stopping criterion (7), we use tol=10^{-6} and L=400. We run the algorithms on five dense databases, including the MNIST dataset (http://yann.lecun.com/exdb/mnist/), the ISOLET dataset (http://archive.ics.uci.edu/dataset/54/isolet), the USPS dataset (https://hyper.ai/datasets/16041) [8], the CIFAR-10 dataset (http://www.cs.toronto.edu/~kriz/cifar.html), as well as the YouTube Faces dataset (http://www.cs.tau.ac.il/~wolf/ytfaces/). In all the experiments, we randomly choose n=70\%\cdot N samples as the training set, where N is the total number of samples. The details of the data sets are listed in Table 5, and Table 6 lists the numerical results.

Table 5: Section 5.2. Example 3: Details of the databases.
Datasets Dimensionality (dd) Number of total samples (NN) Background Type
MNIST   784   70000   Handwritten digits recognition   dense
ISOLET   617   7797   Classification   dense
USPS   256   9598   Handwritten digits recognition   dense
CIFAR-10   3072   50000   Object recognition   dense
YouTubeFaceArrange_64×64   4096   370319   Face recognition   dense
Table 6: Section 5.2. Example 3: Numerical results of the algorithms for solving the least squares problem (17) with i=1. Here “/” means that the number of iterations exceeds 1000000 or the CPU time exceeds 24 hours.
Matrix & Size   MNIST (70000×784)   ISOLET (7797×617)   USPS (9598×256)   CIFAR-10 (50000×3072)   YouTubeFaceArrange_64×64 (370319×4096)
REK IT 166800 246440 367000 / /
CPU 237.6 83.3 111.9 / /
PREK IT / 374680 162600 / /
CPU / 101.6 40.6 / /
GRAK IT 19200 124840 188480 465080 /
CPU 419.9 312.5 153.1 30428.1 /
RGRAK IT 17760 88600 98440 130800 /
CPU 350.2 217.9 79.9 14115.9 /
Algorithm 3 IT 17200 82200 72320 130760 /
CPU 1451.5 187.2 46.2 26437.2 /
Algorithm 4 IT 88000 236240 338280 975600 905200
CPU 183.4 72.7 94.6 4393.7 44789.7
Speed-up 2.3 4.3 1.6 6.9 /

From Table 6, we observe that Algorithm 4 is the fastest, while Algorithm 3 needs the fewest iterations in most cases. Specifically, Algorithm 4 is about two to six times faster than GRAK, and the improvement is impressive. For the extremely large-scale data set YouTubeFaceArrange_64×64, all the algorithms fail to converge except for Algorithm 4. Thus, we benefit from the strategy of simple random sampling, as there is no need to calculate the whole residual vector at each iteration. All these results show that Algorithm 4 is advantageous for solving large-scale least squares problems with dense data. Notice that the PREK method does not work for the dataset MNIST; this is because the method may fail to converge if the coefficient matrix is not of full column rank [5].

6 Concluding Remarks

The greedy randomized augmented Kaczmarz method is an effective approach proposed recently for large and sparse inconsistent linear systems. The key idea of this method is to equivalently transform an inconsistent linear system of size m-by-n into a consistent augmented linear system of size (m+n)-by-(m+n), and then apply the GRK method to the augmented linear system. However, in the GRAK method, one has to compute the residual of the augmented linear system and construct two index sets at each iteration. Moreover, it only updates the vector {\bf z}_{k} rather than the approximate solution {\bf x}_{k} whenever m+1\leqslant t_{k}\leqslant m+n, which may slow down the convergence. Thus, the computational overhead of this method is large, especially for big-data problems, and it is desirable to enhance its numerical performance.

The main contributions of this work are two-fold. First, we propose an accelerated greedy randomized augmented Kaczmarz method for large-scale sparse or dense inconsistent linear systems, and apply the strategy of simple random sampling to further reduce the workload. Theoretical analysis shows that, under very weak assumptions, the accelerated greedy randomized augmented Kaczmarz method converges faster than the greedy randomized augmented Kaczmarz method. Second, as far as we know, there have been no practical stopping criteria for randomized Kaczmarz-type methods so far. To fill this gap, we introduce a practical stopping criterion that is applicable to all the randomized Kaczmarz-type methods. Numerical experiments demonstrate the numerical behavior of the proposed algorithms and the effectiveness of the stopping criterion. However, there is a parameter L involved in the proposed stopping criterion. How to pick its optimal value is interesting and deserves further investigation; we believe it is problem- and data-dependent.

References

  • [1] R. Ansorge, Connections between the Cimmino-method and the Kaczmarz-method for the solution of singular and regular systems of equations, Computing, 33 (1984), pp. 367–375.
  • [2] V. Borkar, N. Karamchandani, and S. Mirani, Randomized Kaczmarz for rank aggregation from pairwise comparisons, IEEE Information Theory Workshop (ITW), Cambridge, 2016, pp. 389–393.
  • [3] Z.-Z. Bai and W.-T. Wu, On greedy randomized Kaczmarz method for solving large sparse linear systems, SIAM Journal on Scientific Computing, 40 (2018), pp. A592–A606.
  • [4] Z.-Z. Bai and W.-T. Wu, On greedy randomized augmented Kaczmarz method for solving large sparse inconsistent linear systems, SIAM Journal on Scientific Computing, 43 (2021), pp. A3892–A3911.
  • [5] Z.-Z. Bai and W.-T. Wu, On partially randomized extended Kaczmarz method for solving large sparse overdetermined inconsistent linear systems, Linear Algebra and its Applications, 578 (2019), pp. 225–250.
  • [6] Z.-Z. Bai and W.-T. Wu, On relaxed greedy randomized Kaczmarz methods for solving large sparse linear systems, Applied Mathematics Letters, 83 (2018), pp. 21–26.
  • [7] Z.-Z. Bai, L. Wang, and G.V. Muratova, On relaxed greedy randomized augmented Kaczmarz methods for solving large sparse inconsistent linear systems, East Asian Journal on Applied Mathematics, 11 (2021), pp. 323–332.
  • [8] L. Bai, J. Liang and Y. Zhao, Self-constrained spectral clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 45 (2023), pp. 5126–5138.
  • [9] J. Chen and Z.-D. Huang, On a fast deterministic block Kaczmarz method for solving large scale linear system, Numerical Algorithms, 89 (2022), pp. 1007–1029.
  • [10] S. Chapra and R. Canale, Numerical Methods for Engineers, 7th ed., McGraw-Hill, New York, 2005.
  • [11] Y. Censor, Row-action methods for huge and sparse systems and their applications, SIAM Review, 23 (1981), pp. 444–466.
  • [12] D. Cai, X.-F. He, and J.-W. Han, SRDA: An Efficient Algorithm for Large-Scale Discriminant Analysis, IEEE Transactions on Knowledge and Data Engineering, 20 (2008), pp. 1–12.
  • [13] K. Du, W. Si, and X. Sun, Randomized extended average block Kaczmarz for solving least squares, SIAM Journal on Scientific Computing, 42 (2020), pp. A3541–A3559.
  • [14] C. Gu and Y. Liu, Variant of greedy randomized Kaczmarz for ridge regression, Applied Numerical Mathematics, 143 (2019), pp. 223–246.
  • [15] R. Gordon, R. Bender, and G.T. Herman, Algebraic reconstruction techniques (ART) for three-dimensional electron microscopy and X-ray photography, Journal of Theoretical Biology, 29 (1970), pp. 471–481.
  • [16] W. Guo, H. Chen, W. Geng, and L. Lei, A modified Kaczmarz algorithm for computerized tomographic image reconstruction, IEEE International Conference on Biomedical Engineering and Informatics, 3 (2009), pp. 1–4.
  • [17] A. Hefny, D. Needell, and A. Ramdas, Rows versus columns: randomized Kaczmarz or Gauss-Seidel for ridge regression, SIAM Journal on Scientific Computing, 39 (2017), pp. S528–S542.
  • [18] G. Herman, Fundamentals of Computerized Tomography: Image Reconstruction from Projections, 2nd. ed., Springer, Dordrecht, 2009.
  • [19] P. C. Hansen and J. S. Jørgensen, AIR Tools II: algebraic iterative reconstruction methods, improved implementation, Numerical Algorithms, 79 (2018), pp. 107–137.
  • [20] X. Jiang, K. Zhang, and J. Yin, Randomized block Kaczmarz methods with k-means clustering for solving large linear systems, Journal of Computational and Applied Mathematics, 403 (2022), Article 113828.
  • [21] Y. Jiang, G. Wu, and L. Jiang, A semi-randomized Kaczmarz method with simple random sampling for large-scale linear systems, Advances in Computational Mathematics, 49 (2023), Article 20.
  • [22] J. Johnson, Probability and Statistics for Computer Scientists, The John Wiley & Sons Press, 1997.
  • [23] S. Kaczmarz, Approximate solution of systems of linear equations, International Journal of Control, 35 (1937), pp. 355–357.
  • [24] S. Lee and H. Kim, Noise properties of reconstructed images in a kilo-voltage on-board imaging system with iterative reconstruction techniques: A phantom study, Physica Medica, 30 (2014), pp. 365–373.
  • [25] K. Li and G. Wu, Randomized approximate class-specific kernel spectral regression analysis for large-scale face verification, Machine Learning, 111 (2022), pp. 2037–2091.
  • [26] R. Li and H. Liu, On global randomized block Kaczmarz method for image reconstruction, Electronic Research Archive, 30 (2022), pp. 1442–1453.
  • [27] J. Liu and S. Wright, An accelerated randomized Kaczmarz algorithm, Mathematics of Computation, 85 (2016), pp. 153–178.
  • [28] J. Loera, J. Haddock, and D. Needell, A sampling Kaczmarz-Motzkin algorithm for linear feasibility, SIAM Journal on Scientific Computing, 39 (2017), pp. S66–S87.
  • [29] A. Ma, D. Needell, and A. Ramdas, Convergence properties of the randomized extended Gauss-Seidel and Kaczmarz methods, SIAM Journal on Matrix Analysis and Applications, 36 (2015), pp. 1590–1604.
  • [30] C.-Q. Miao and W.-T. Wu, On greedy randomized average block Kaczmarz method for solving large linear systems, Journal of Computational and Applied Mathematics, 413 (2022), Article 114372.
  • [31] D. Needell, Randomized Kaczmarz solver for noisy linear systems, Bit Numerical Mathematics, 50 (2010), pp. 395–403.
  • [32] D. Needell and J. Tropp, Paved with good intentions: Analysis of a randomized block Kaczmarz method, Linear Algebra and Its Applications, 441 (2014), pp. 199–221.
  • [33] D. Needell, R. Zhao, and A. Zouzias, Randomized block Kaczmarz method with projection for solving least squares, Linear Algebra and Its Applications, 484 (2015), pp. 322–343.
  • [34] I. Necoara, Faster randomized block Kaczmarz algorithms, SIAM Journal on Matrix Analysis and Applications, 40 (2019), pp. 1425–1452.
  • [35] Y. Niu and B. Zheng, A greedy block Kaczmarz algorithm for solving large-scale linear systems, Applied Mathematics Letters, 104 (2020), Article 106294.
  • [36] C. Popa, Least-squares solution of overdetermined inconsistent linear systems using kaczmarz’s relaxation, International Journal of Computer Mathematics, 55 (1995), pp. 79–89.
  • [37] C. Popa, Extensions of block-projections methods with relaxation parameters to inconsistent and rank-deficient least-squares problems, Bit Numerical Mathematics, 38 (1998), pp. 151–176.
  • [38] R. Ramlau and M. Rosensteiner, An efficient solution to the atmospheric turbulence tomography problem using Kaczmarz iteration, Inverse Problems, 28 (2012), Article 095004.
  • [39] W. Shi and G. Wu, New algorithms for trace-ratio problem with application to high-dimension and large-sample data dimensionality reduction, Machine Learning (2021), pp. 1–28.
  • [40] S. Steinerberger, A weighted randomized Kaczmarz method for solving linear systems, Mathematics of Computation, 90 (2021), pp. 2815–2826.
  • [41] T. Strohmer and R. Vershynin, A randomized Kaczmarz algorithm with exponential convergence, Journal of Fourier Analysis and Applications, 15 (2009), pp. 262–278.
  • [42] G. Thoppe, V. Borkar, and D. Manjunath, A stochastic Kaczmarz algorithm for network tomography, Automatica, 50 (2014), pp. 910–914.
  • [43] X. Yang, A geometric probability randomized Kaczmarz method for large scale linear systems, Applied Numerical Mathematics, 164 (2021), pp. 139–160.
  • [44] J. Yin, N. Li, and N. Zheng, Restarted randomized surrounding methods for solving large linear equations, Applied Mathematics Letters, 133 (2022), Article 108290.
  • [45] J. Zhang, A new greedy Kaczmarz algorithm for the solution of very large linear systems, Applied Mathematics Letters, 91 (2019), pp. 207–212.
  • [46] Y. Zhang and H. Li, Greedy Motzkin-Kaczmarz methods for solving linear systems, Numerical Linear Algebra with Applications, 29 (2022), e2429.
  • [47] Y. Zhang and H. Li, A count sketch maximal weighted residual Kaczmarz method for solving highly overdetermined linear systems, Applied Mathematics and Computation, 410 (2021), Article 126486.
  • [48] A. Zouzias and N.M. Freris, Randomized extended Kaczmarz for solving least-squares, SIAM Journal on Matrix Analysis and Applications, 34 (2013), pp. 773–793.