
Sublinear Time Eigenvalue Approximation via Random Sampling

Rajarshi Bhattacharjee111Manning College of Information and Computer Sciences, University of Massachusetts, Amherst, {rbhattacharj, cmusco, ray}@cs.umass.edu    Gregory Dexter222Department of Computer Science, Purdue University, West Lafayette, {gdexter, pdrineas}@purdue.edu    Petros Drineas22footnotemark: 2    Cameron Musco11footnotemark: 1    Archan Ray11footnotemark: 1
Abstract

We study the problem of approximating the eigenspectrum of a symmetric matrix 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} with bounded entries (i.e., 𝐀1\|\mathbf{A}\|_{\infty}\leq 1). We present a simple sublinear time algorithm that approximates all eigenvalues of 𝐀\mathbf{A} up to additive error ±ϵn\pm\epsilon n using those of a randomly sampled O~(log3nϵ3)×O~(log3nϵ3)\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{3}}\right)\times\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{3}}\right) principal submatrix. Our result can be viewed as a concentration bound on the complete eigenspectrum of a random submatrix, significantly extending known bounds on just the singular values (the magnitudes of the eigenvalues). We give improved error bounds of ±ϵnnz(𝐀)\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} and ±ϵ𝐀F\pm\epsilon\|\mathbf{A}\|_{F} when the rows of 𝐀\mathbf{A} can be sampled with probabilities proportional to their sparsities or their squared 2\ell_{2} norms respectively. Here nnz(𝐀)\operatorname{nnz}(\mathbf{A}) is the number of non-zero entries in 𝐀\mathbf{A} and 𝐀F\|\mathbf{A}\|_{F} is its Frobenius norm. Even for the strictly easier problems of approximating the singular values or testing the existence of large negative eigenvalues (Bakshi, Chepurko, and Jayaram, FOCS ’20), our results are the first that take advantage of non-uniform sampling to give improved error bounds. From a technical perspective, our results require several new eigenvalue concentration and perturbation bounds for matrices with bounded entries. Our non-uniform sampling bounds require a new algorithmic approach, which judiciously zeroes out entries of a randomly sampled submatrix to reduce variance, before computing the eigenvalues of that submatrix as estimates for those of 𝐀\mathbf{A}. We complement our theoretical results with numerical simulations, which demonstrate the effectiveness of our algorithms in practice.

1 Introduction

Approximating the eigenvalues of a symmetric matrix is a fundamental problem – with applications in engineering, optimization, data analysis, spectral graph theory, and beyond. For an n×nn\times n matrix, all eigenvalues can be computed to high accuracy using direct eigendecomposition in O(nω)O(n^{\omega}) time, where ω2.37\omega\approx 2.37 is the exponent of matrix multiplication [DDHK07, AW21]. When just a few of the largest magnitude eigenvalues are of interest, the power method and other iterative Krylov methods can be applied [Saa11]. These methods repeatedly multiply the matrix of interest by query vectors, requiring O(n2)O(n^{2}) time per multiplication when the matrix is dense and unstructured.

For large nn, it is desirable to have even faster eigenvalue approximation algorithms, running in o(n2)o(n^{2}) time – i.e., sublinear in the size of the input matrix. Unfortunately, for general matrices, no non-trivial approximation can be computed in o(n2)o(n^{2}) time: without reading Ω(n2)\Omega(n^{2}) entries, it is impossible to distinguish with reasonable probability if all entries (and hence all eigenvalues) are equal to zero, or if there is a single pair of arbitrarily large entries at positions (i,j)(i,j) and (j,i)(j,i), leading to a pair of arbitrarily large eigenvalues. Given this, we seek to address the following question:

Under what assumptions on a symmetric n×nn\times n input matrix, can we compute non-trivial approximations to its eigenvalues in o(n2)o(n^{2}) time?

It is well known that o(n2)o(n^{2}) time eigenvalue computation is possible for highly structured inputs, like tridiagonal or Toeplitz matrices [GE95]. For sparse or structured matrices that admit fast matrix vector multiplication, one can compute a small number of the largest in magnitude eigenvalues in o(n2)o(n^{2}) time using iterative methods. Through the use of robust iterative methods, fast top eigenvalue estimation is also possible for matrices that admit fast approximate matrix-vector multiplication, such as kernel similarity matrices [GS91, HP14, BIMW21]. Our goal is to study simple, sampling-based sublinear time algorithms that work under much weaker assumptions on the input matrix.

1.1 Our Contributions

Our main contribution is to show that a very simple algorithm can be used to approximate all eigenvalues of any symmetric matrix with bounded entries. In particular, for any 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} with maximum entry magnitude 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, sampling an s×ss\times s principal submatrix 𝐀S\mathbf{A}_{S} of 𝐀\mathbf{A} with s=O~(log3nϵ3)s=\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{3}}\right) and scaling its eigenvalues by n/sn/s yields a ±ϵn\pm\epsilon n additive error approximation to all eigenvalues of 𝐀\mathbf{A} with good probability.333Here and throughout, O~()\tilde{O}(\cdot) hides logarithmic factors in the argument. Note that by scaling, our algorithm gives a ±ϵn𝐀\pm\epsilon n\cdot\|\mathbf{A}\|_{\infty} approximation for any 𝐀\mathbf{A}. This result is formally stated below, where [n]=def{1,,n}[n]\mathbin{\stackrel{{\scriptstyle\rm def}}{{=}}}\{1,\ldots,n\}.

Theorem 1 (Sublinear Time Eigenvalue Approximation).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 and eigenvalues λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}). Let S[n]S\subseteq[n] be formed by including each index independently with probability s/ns/n as in Algorithm 1. Let 𝐀S\mathbf{A}_{S} be the corresponding principal submatrix of 𝐀\mathbf{A}, with eigenvalues λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}_{S}).

For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}_{S})\geq 0, let λ~i(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all i[|S|]i\in[|S|] with λi(𝐀S)<0\lambda_{i}(\mathbf{A}_{S})<0, let λ~n(|S|i)(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all other i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0. If sclog(1/(ϵδ))log3nϵ3δs\geq\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}{\delta}}, for large enough constant cc, then with probability 1δ\geq 1-\delta, for all i[n]i\in[n],

λi(𝐀)ϵnλ~i(𝐀)λi(𝐀)+ϵn.\displaystyle\lambda_{i}(\mathbf{A})-\epsilon n\leq\tilde{\lambda}_{i}(\mathbf{A})\leq\lambda_{i}(\mathbf{A})+\epsilon n.

See Figure 1 for an illustration of how the |S||S| eigenvalues of 𝐀S\mathbf{A}_{S} are mapped to estimates for all nn eigenvalues of 𝐀\mathbf{A}. Note that the principal submatrix 𝐀S\mathbf{A}_{S} sampled in Theorem 1 will have O(s)=O~(log3nϵ3δ)O(s)=\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{3}\delta}\right) rows/columns with high probability. Thus, with high probability, the algorithm reads just O~(log6nϵ6δ2)\tilde{O}\left(\frac{\log^{6}n}{\epsilon^{6}\delta^{2}}\right) entries of 𝐀\mathbf{A} and runs in poly(logn,1/ϵ,1/δ)\operatorname{poly}(\log n,1/\epsilon,1/\delta) time. Standard matrix concentration bounds imply that one can sample O(slog(1/δ)ϵ2)O\left(\frac{s\log(1/\delta)}{\epsilon^{2}}\right) random entries from the O(s)×O(s)O(s)\times O(s) random submatrix 𝐀S\mathbf{A}_{S} and preserve its eigenvalues to error ±ϵs\pm\epsilon s with probability 1δ1-\delta [AM07]. See Appendix F for a proof. This can be directly combined with Theorem 1 to give improved sample complexity:

Corollary 1 (Improved Sample Complexity via Entrywise Sampling).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 and eigenvalues λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}). For any ϵ,δ(0,1)\epsilon,\delta\in(0,1), there is an algorithm that reads O~(log3nϵ5δ)\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{5}\delta}\right) entries of 𝐀\mathbf{A} and returns, with probability at least 1δ1-\delta, λ~i(𝐀)\tilde{\lambda}_{i}(\mathbf{A}) for each i[n]i\in[n] satisfying |λ~i(𝐀)λi(𝐀)|ϵn|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})|\leq\epsilon n.

Observe that the dependence on δ\delta in Theorem 1 and Corollary 1 can be improved via standard arguments: running the algorithm with failure probability δ=1/3\delta^{\prime}=1/3, repeating O(log(1/δ))O(\log(1/\delta)) times, and taking the median estimate for each λi(𝐀)\lambda_{i}(\mathbf{A}). This guarantees that the algorithm succeeds with probability at least 1δ1-\delta at the expense of a log(1/δ)\log(1/\delta) factor in the complexity.
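As an illustration of this standard boosting step, the following minimal Python/NumPy sketch takes coordinate-wise medians over independent runs. The routine estimate_eigenvalues is a hypothetical placeholder for any estimator meeting the Theorem 1 guarantee with constant failure probability, and the repetition constant is an arbitrary choice.

import numpy as np

def boost_estimates(estimate_eigenvalues, A, eps, delta, rng):
    # estimate_eigenvalues(A, eps, rng) is assumed to return a length-n vector of
    # eigenvalue estimates within +/- eps*n of the truth with probability >= 2/3
    # (e.g., a single run of Algorithm 1).
    reps = max(1, int(np.ceil(24 * np.log(1.0 / delta))))     # O(log(1/delta)) independent runs
    runs = np.stack([estimate_eigenvalues(A, eps, rng) for _ in range(reps)])
    return np.median(runs, axis=0)                            # coordinate-wise median estimate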

[Figure 1: number lines illustrating how the eigenvalues of the sampled submatrix, scaled by n/s, are aligned with the eigenvalue estimates for the full matrix.]
Figure 1: Alignment of eigenvalues in Thm. 1 and Algo. 1. We illustrate how the eigenvalues of 𝐀S\mathbf{A}_{S}, scaled by ns\frac{n}{s}, are used to approximate all eigenvalues of 𝐀\mathbf{A}. If 𝐀S\mathbf{A}_{S} has p1p-1 positive eigenvalues, they are set to the top p1p-1 eigenvalue estimates. Its |S|p+1|S|-p+1 negative eigenvalues are set to the bottom eigenvalue estimates. All remaining eigenvalues are simply approximated as zero.

Comparison to known bounds. Theorem 1 can be viewed as a concentration inequality on the full eigenspectrum of a random principal submatrix 𝐀S\mathbf{A}_{S} of 𝐀\mathbf{A}. This significantly extends prior work, which was able to bound just the spectral norm (i.e., the magnitude of the top eigenvalue) of a random principal submatrix [RV07, Tro08a]. Bakshi, Chepurko, and Jayaram [BCJ20] recently identified developing such full eigenspectrum concentration inequalities as an important step in expanding our knowledge of sublinear time property testing algorithms for bounded entry matrices.

Standard matrix concentration bounds [GT11] can be used to show that the singular values of 𝐀\mathbf{A} (i.e., the magnitudes of its eigenvalues) are approximated by those of a O(lognϵ2)×O(lognϵ2)O\left(\frac{\log n}{\epsilon^{2}}\right)\times O\left(\frac{\log n}{\epsilon^{2}}\right) random submatrix (see Appendix G) with independently sampled rows and columns. However, such a random matrix will not be symmetric or even have real eigenvalues in general, and thus no analogous bounds were previously known for the eigenvalues themselves.

Recently, Bakshi, Chepurko, and Jayaram [BCJ20] studied the closely related problem of testing positive semidefiniteness in the bounded entry model. They show how to test whether the minimum eigenvalue of 𝐀\mathbf{A} is either greater than 0 or smaller than ϵn-\epsilon n by reading just O~(1ϵ2)\tilde{O}(\frac{1}{\epsilon^{2}}) entries. They show that this result is optimal in terms of query complexity, up to logarithmic factors. Like our approach, their algorithm is based on random principal submatrix sampling. Our eigenvalue approximation guarantee strictly strengthens the testing guarantee – given ±ϵn\pm\epsilon n approximations to all eigenvalues, we immediately solve the testing problem. Thus, our query complexity is tight up to a poly(logn,1/ϵ)\operatorname{poly}(\log n,1/\epsilon) factor. It is open if our higher sample complexity is necessary to solve the harder full eigenspectrum estimation problem. See Section 1.4 for further discussion.

Improved bounds for non-uniform sampling. Our second main contribution is to show that, when it is possible to efficiently sample rows/columns of 𝐀\mathbf{A} with probabilities proportional to their sparsities or their squared 2\ell_{2} norms, significantly stronger eigenvalue estimates can be obtained. In particular, letting nnz(𝐀)\operatorname{nnz}(\mathbf{A}) denote the number of nonzero entries in 𝐀\mathbf{A} and 𝐀F\|\mathbf{A}\|_{F} denote its Frobenius norm, we show that sparsity-based sampling yields eigenvalue estimates with error ±ϵnnz(𝐀)\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} and norm-based sampling gives error ±ϵ𝐀F\pm\epsilon\|\mathbf{A}\|_{F}. See Theorems 2 and 3 for formal statements. Observe that when 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, its eigenvalues are bounded in magnitude by 𝐀2𝐀Fnnz(𝐀)n\|\mathbf{A}\|_{2}\leq\|\mathbf{A}\|_{F}\leq\sqrt{\operatorname{nnz}(\mathbf{A})}\leq n. Thus, Theorems 2 and 3 are natural strengthenings of Theorem 1. Row norm-based sampling (Theorem 3) additionally removes the bounded entry requirement of Theorems 1 and 2.

As discussed in Section 1.3.1, sparsity-based sampling can be performed in sublinear time when 𝐀\mathbf{A} is stored in a slightly augmented sparse matrix format or when 𝐀\mathbf{A} is the adjacency matrix of a graph accessed in the standard graph query model of the sublinear algorithms literature [GR97]. Norm-based sampling can also be performed efficiently with an augmented matrix format, and is commonly studied in randomized and ‘quantum-inspired’ algorithms for linear algebra [FKV04, Tan18].

Theorem 2 (Sparse Matrix Eigenvalue Approximation).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 and eigenvalues λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}). Let S[n]S\subseteq[n] be formed by including the iith index independently with probability pi=min(1,snnz(𝐀i)nnz(𝐀))p_{i}=\min\left(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}\right) as in Algorithm 2. Here nnz(𝐀i)\operatorname{nnz}(\mathbf{A}_{i}) is the number of non-zero entries in the ithi^{th} row of 𝐀\mathbf{A}. Let 𝐀S\mathbf{A}_{S} be the corresponding principal submatrix of 𝐀\mathbf{A}, and let λ~i(𝐀)\tilde{\lambda}_{i}(\mathbf{A}) be the estimate of λi(𝐀)\lambda_{i}(\mathbf{A}) computed from 𝐀S\mathbf{A}_{S} as in Algorithm 2. If sclog8nϵ8δ4s\geq\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}}, for large enough constant cc, then with probability 1δ\geq 1-\delta, for all i[n]i\in[n], |λ~i(𝐀)λi(𝐀)|ϵnnz(𝐀)|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})|\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}.

Theorem 3 (Row Norm Based Matrix Eigenvalue Approximation).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with eigenvalues λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}). Let S[n]S\subseteq[n] be formed by including the iith index independently with probability pi=min(1,s𝐀i22𝐀F2+1n2)p_{i}=\min\left(1,\frac{s\|\mathbf{A}_{i}\|^{2}_{2}}{\|\mathbf{A}\|^{2}_{F}}+\frac{1}{n^{2}}\right) as in Algorithm 3. Here 𝐀i2\|\mathbf{A}_{i}\|_{2} is the 2\ell_{2} norm of the ithi^{th} row of 𝐀\mathbf{A}. Let 𝐀S\mathbf{A}_{S} be the corresponding principal submatrix of 𝐀\mathbf{A}, and let λ~i(𝐀)\tilde{\lambda}_{i}(\mathbf{A}) be the estimate of λi(𝐀)\lambda_{i}(\mathbf{A}) computed from 𝐀S\mathbf{A}_{S} as in Algorithm 3. If sclog10nϵ8δ4s\geq\frac{c\log^{10}n}{\epsilon^{8}\delta^{4}}, for large enough constant cc, then with probability 1δ\geq 1-\delta, for all i[n]i\in[n], |λ~i(𝐀)λi(𝐀)|ϵ𝐀F.|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})|\leq\epsilon\|\mathbf{A}\|_{F}.

The above non-uniform sampling theorems immediately yield algorithms for testing the presence of a negative eigenvalue with magnitude at least ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} or ϵ𝐀F\epsilon\|\mathbf{A}\|_{F} respectively, strengthening the testing results of [BCJ20], which require eigenvalue magnitude at least ϵn\epsilon n. In the graph property testing literature, there is a rich line of work exploring the testing of bounded degree or sparse graphs [GR97, BSS10]. Theorem 2 can be thought of as a first step in establishing a related theory of sublinear time approximation algorithms and property testers for sparse matrices.

Surprisingly, in the non-uniform sampling case, the eigenvalue estimates derived from 𝐀S\mathbf{A}_{S} cannot simply be its scaled eigenvalues, as in Theorem 1. E.g., when 𝐀\mathbf{A} is the identity, our row sampling probabilities are uniform in all cases. However, the scaled submatrix ns𝐀S\frac{n}{s}\cdot\mathbf{A}_{S} will be a scaled identity, and have eigenvalues equal to n/sn/s – failing to give a ±ϵnnz(𝐀)=±ϵ𝐀F=±ϵn\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}=\pm\epsilon\|\mathbf{A}\|_{F}=\pm\epsilon\sqrt{n} approximation to the true eigenvalues (all of which are 11) unless snϵs\gtrsim\frac{\sqrt{n}}{\epsilon}. To handle this, and related cases, we must argue that selectively zeroing out entries in sufficiently low probability rows/columns of 𝐀\mathbf{A} (see Algorithms 2 and 3) does not significantly change the spectrum, and ensures concentration of the submatrix eigenvalues. It is not hard to see that simple random submatrix sampling fails even for the easier problem of singular value estimation. Theorems 2 and 3 give the first results of their kinds for this problem as well.
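To make this failure mode concrete, the following small NumPy illustration (a sketch with arbitrary parameter choices, not one of our algorithms) applies the naive n/s scaling of Theorem 1 to the identity matrix:

import numpy as np

# Naive (n/s)-scaling fails on A = I_n: the true eigenvalues are all 1, but any
# principal submatrix of the identity is itself an identity, so every scaled
# estimate equals n/s, far above the target error eps * sqrt(nnz(A)) = eps * sqrt(n).
n, s, eps = 100_000, 200, 0.1
A_S = np.eye(s)                                   # a sampled principal submatrix of I_n
scaled_eigs = (n / s) * np.linalg.eigvalsh(A_S)   # all estimates equal n/s = 500
print(scaled_eigs.max(), eps * np.sqrt(n))        # 500.0 vs. allowed error ~31.6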

1.2 Related Work

Eigenspectrum estimation is a key primitive in numerical linear algebra, typically known as spectral density estimation. The eigenspectrum is viewed as a distribution with mass 1/n1/n at each of the nn eigenvalues, and the goal is to approximate this distribution [WWAF06, LSY16]. Applications include identifying motifs in social networks [DBB19], studying Hessian and weight matrix spectra in deep learning [SBL16, YGL+18, GKX19], ‘spectrum splitting’ in parallel eigensolvers [LXES19], and the study of many systems in experimental physics and chemistry [Wan94, SR94, HBT19].

Recent work has studied sublinear time spectral density estimation for graph structured matrices – Braverman, Krishnan, and Musco [BKM22] show that the spectral density of a normalized graph adjacency or Laplacian matrix can be estimated to ϵ\epsilon error in the Wasserstein distance in O~(n/poly(ϵ))\tilde{O}(n/\operatorname{poly}(\epsilon)) time. Cohen-Steiner, Kong, Sohler, and Valiant study a similar setting, giving runtime 2O(1/ϵ)2^{O(1/\epsilon)} [CSKSV18]. We note that the additive error eigenvalue approximation result of Theorem 1 (analogously Theorems 2 and 3) directly gives an ϵn\epsilon n approximation to the spectral density in the Wasserstein distance – extending the above results to a much broader class of matrices. When 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, 𝐀\mathbf{A} can have eigenvalues as large as nn, while the normalized adjacency matrices studied in [CSKSV18, BKM22] have eigenvalues in [1,1][-1,1]. So, while the results are not directly comparable, our Wasserstein error can be thought of as on the order of their error of ϵ\epsilon after scaling.

Our work is also closely related to a line of work on sublinear time property testing for bounded entry matrices, initiated by Balcan et al. [BLWZ19]. In that work, they study testing of rank, Schatten-pp norms, and several other global spectral properties. Sublinear time testing algorithms for the rank and other properties have also been studied under low-rank and bounded row norm assumptions on the input matrix [KS03, LWW14]. Recent work studies positive semidefiniteness testing and eigenvalue estimation in the matrix-vector query model, where each query computes 𝐀𝐱\mathbf{A}\mathbf{x} for some 𝐱n\mathbf{x}\in\mathbb{R}^{n}. As in Theorem 3, ±ϵ𝐀F\pm\epsilon\|\mathbf{A}\|_{F} eigenvalue estimation can be achieved with poly(logn,1/ϵ)\operatorname{poly}(\log n,1/\epsilon) queries in this model [NSW22]. Finally, several works study streaming algorithms for eigenspectrum approximation [AN13, LNW14, LW16]. These algorithms are not sublinear time – they require at least linear time to process the input matrix. However, they use sublinear working memory. Note that Theorem 1 immediately gives a sublinear space streaming algorithm for eigenvalue estimation. We can simply store the sampled submatrix 𝐀S\mathbf{A}_{S} as its entries are updated.

1.3 Technical Overview

In this section, we overview the main techniques used to prove Theorem 1, and then describe how these techniques are extended to prove Theorems 2 and 3. We start by defining a decomposition of any symmetric 𝐀\mathbf{A} into the sum of two matrices containing its large and small magnitude eigendirections.

Definition 1.1 (Eigenvalue Split).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric. For any ϵ,δ(0,1)\epsilon,\delta\in(0,1), let 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}_{o}=\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T} where 𝚲o\mathbf{\Lambda}_{o} is diagonal, with the eigenvalues of 𝐀\mathbf{A} with magnitude ϵδn\geq\epsilon\sqrt{\delta}n on its diagonal, and 𝐕o\mathbf{V}_{o} has the corresponding eigenvectors as columns. Similarly, let 𝐀m=𝐕m𝚲m𝐕mT\mathbf{A}_{m}=\mathbf{V}_{m}\mathbf{\Lambda}_{m}\mathbf{V}_{m}^{T} where 𝚲m\mathbf{\Lambda}_{m} has the eigenvalues of 𝐀\mathbf{A} with magnitude <ϵδn<\epsilon\sqrt{\delta}n on its diagonal and 𝐕m\mathbf{V}_{m} has the corresponding eigenvectors as columns. Then, 𝐀\mathbf{A} can be decomposed as

𝐀=𝐀o+𝐀m=𝐕o𝚲o𝐕oT+𝐕m𝚲m𝐕mT.\mathbf{A}=\mathbf{A}_{o}+\mathbf{A}_{m}=\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T}+\mathbf{V}_{m}\mathbf{\Lambda}_{m}\mathbf{V}_{m}^{T}.

Any principal submatrix of 𝐀\mathbf{A}, 𝐀S\mathbf{A}_{S}, can be similarly written as

𝐀S=𝐀o,S+𝐀m,S=𝐕o,S𝚲o𝐕o,ST+𝐕m,S𝚲m𝐕m,ST,\mathbf{A}_{S}=\mathbf{A}_{o,S}+\mathbf{A}_{m,S}=\mathbf{V}_{o,S}\mathbf{\Lambda}_{o}\mathbf{V}_{o,S}^{T}+\mathbf{V}_{m,S}\mathbf{\Lambda}_{m}\mathbf{V}_{m,S}^{T},

where 𝐕o,S,𝐕m,S\mathbf{V}_{o,S},\mathbf{V}_{m,S} are the corresponding submatrices obtained by sampling rows of 𝐕o,𝐕m\mathbf{V}_{o},\mathbf{V}_{m}.

Since 𝐀S\mathbf{A}_{S}, 𝐀m,S\mathbf{A}_{m,S} and 𝐀o,S\mathbf{A}_{o,S} are all symmetric, we can use Weyl’s eigenvalue perturbation theorem [Wey12] to show that for all eigenvalues of 𝐀S\mathbf{A}_{S},

|λi(𝐀S)λi(𝐀o,S)|𝐀m,S2.\displaystyle\lvert\lambda_{i}(\mathbf{A}_{S})-\lambda_{i}(\mathbf{A}_{o,S})\rvert\leq\|\mathbf{A}_{m,S}\|_{2}. (1)

We will argue that the eigenvalues of 𝐀o,S\mathbf{A}_{o,S} approximate those of 𝐀o\mathbf{A}_{o} – i.e. all eigenvalues of 𝐀\mathbf{A} with magnitude ϵδn\geq\epsilon\sqrt{\delta}n. Further, we will show that 𝐀m,S2\|\mathbf{A}_{m,S}\|_{2} is small with good probability. Thus, via (1), the eigenvalues of 𝐀S\mathbf{A}_{S} approximate those of 𝐀o\mathbf{A}_{o}. In the estimation procedure of Theorem 1, all other small magnitude eigenvalues of 𝐀\mathbf{A} are estimated to be 0, which will immediately give our ±ϵn\pm\epsilon n approximation bound when the original eigenvalue has magnitude ϵn\leq\epsilon n.

Bounding the eigenvalues of 𝐀o,S\mathbf{A}_{o,S}. The first step is to show that the eigenvalues of 𝐀o,S\mathbf{A}_{o,S} well-approximate those of 𝐀o\mathbf{A}_{o}. As in [BCJ20], we critically use that the eigenvectors corresponding to large eigenvalues are incoherent – intuitively, since 𝐀\|\mathbf{A}\|_{\infty} is bounded, their mass must be spread out in order to witness a large eigenvalue. Specifically, [BCJ20] shows that for any eigenvector 𝐯\mathbf{v} of 𝐀\mathbf{A} with corresponding eigenvalue ϵδn\geq\epsilon\sqrt{\delta}n, 𝐯1ϵδn\|\mathbf{v}\|_{\infty}\leq\frac{1}{\epsilon\sqrt{\delta}\cdot\sqrt{n}}. We give related bounds on the Euclidean norms of the rows of 𝐕o\mathbf{V}_{o} (the leverage scores of 𝐀o\mathbf{A}_{o}), and on these rows after weighting by 𝚲o\mathbf{\Lambda}_{o}.

Using these incoherence bounds, we argue that the eigenvalues of 𝐀o,S\mathbf{A}_{o,S} approximate those of 𝐀o\mathbf{A}_{o} up to ±ϵn\pm\epsilon n error. A key idea is to bound the eigenvalues of 𝚲o1/2𝐕o,ST𝐕o,S𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,S}^{T}\mathbf{V}_{o,S}\mathbf{\Lambda}_{o}^{1/2}, which are identical to the non-zero eigenvalues of 𝐀o,S=𝐕o,S𝚲o𝐕o,ST\mathbf{A}_{o,S}=\mathbf{V}_{o,S}\mathbf{\Lambda}_{o}\mathbf{V}_{o,S}^{T}. Via a matrix Bernstein bound and our incoherence bounds on 𝐕o\mathbf{V}_{o}, we show that this matrix is close to 𝚲o\mathbf{\Lambda}_{o} with high probability. However, since 𝚲o1/2\mathbf{\Lambda}_{o}^{1/2} may be complex, the matrix is not necessarily Hermitian and standard perturbation bounds [SgS90, HJ12] do not apply. Thus, to derive an eigenvalue bound, we apply a perturbation bound of Bhatia [Bha13], which generalizes Weyl’s inequality to the non-Hermitian case, with a logn\log n factor loss. To the best of our knowledge, this is the first time that perturbation theory bounds for non-Hermitian matrices have been used to prove improved algorithmic results in the theoretical computer science literature.

We note that in Appendix B, we give an alternate bound, which instead analyzes the Hermitian matrix (𝐕o,ST𝐕o,S)1/2𝚲o(𝐕o,ST𝐕o,S)1/2(\mathbf{V}_{o,S}^{T}\mathbf{V}_{o,S})^{1/2}\mathbf{\Lambda}_{o}(\mathbf{V}_{o,S}^{T}\mathbf{V}_{o,S})^{1/2}, whose eigenvalues are again identical to those of 𝐀o,S\mathbf{A}_{o,S}. This approach only requires Weyl’s inequality, and yields an overall bound of s=O(lognϵ4δ)s=O\left(\frac{\log n}{\epsilon^{4}\delta}\right), improving the logn\log n factors of Theorem 1 at the cost of worse ϵ\epsilon dependence.

Bounding the spectral norm of 𝐀m,S\mathbf{A}_{m,S}. The next step is to show that all eigenvalues of 𝐀m,S\mathbf{A}_{m,S} are small provided a sufficiently large submatrix is sampled. This means that the “middle” eigenvalues of 𝐀\mathbf{A}, i.e., those with magnitude ϵδn\leq\epsilon\sqrt{\delta}n, do not contribute much to any eigenvalue λi(𝐀S)\lambda_{i}(\mathbf{A}_{S}). To do so, we apply a theorem of [RV07, Tro08a] which shows concentration of the spectral norm of a uniformly random submatrix of an entrywise bounded matrix. Observe that while 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, such a bound will not in general hold for 𝐀m\|\mathbf{A}_{m}\|_{\infty}. Nevertheless, we can use the incoherence of 𝐕o\mathbf{V}_{o} to show that 𝐀o\|\mathbf{A}_{o}\|_{\infty} is bounded, which, via the triangle inequality, yields a bound on 𝐀m𝐀+𝐀o\|\mathbf{A}_{m}\|_{\infty}\leq\|\mathbf{A}\|_{\infty}+\|\mathbf{A}_{o}\|_{\infty}. In the end, we show that if sO(lognϵ2δ)s\geq{O}(\frac{\log n}{\epsilon^{2}\delta}), with probability at least 1δ1-\delta, 𝐀m,S2ϵs\|\mathbf{A}_{m,S}\|_{2}\leq\epsilon s. After the n/sn/s scaling in the estimation procedure of Theorem 1, this spectral norm bound translates into an additive ϵn\epsilon n error in approximating the eigenvalues of 𝐀\mathbf{A}.

Completing the argument. Once we establish the above bounds on 𝐀o,S\mathbf{A}_{o,S} and 𝐀m,S\mathbf{A}_{m,S}, Theorem 1 is essentially complete. Any eigenvalue in 𝐀\mathbf{A} with magnitude ϵn\geq\epsilon n will correspond to a nearby eigenvalue in ns𝐀o,S\frac{n}{s}\cdot\mathbf{A}_{o,S} and, in turn, in ns𝐀S\frac{n}{s}\cdot\mathbf{A}_{S}, given our spectral norm bound on 𝐀m,S\mathbf{A}_{m,S}. An eigenvalue in 𝐀\mathbf{A} with magnitude ϵn\leq\epsilon n may or may not correspond to a nearby eigenvalue in 𝐀o,S\mathbf{A}_{o,S} (it will only if its magnitude lies in the range [ϵδn,ϵn][\epsilon\sqrt{\delta}n,\epsilon n]). In any case, however, in the estimation procedure of Theorem 1, such an eigenvalue will either be estimated using a small eigenvalue of 𝐀S\mathbf{A}_{S}, or be estimated as 0. In both instances, the estimate will give ±ϵn\pm\epsilon n error, as required.

Can we beat additive error? It is natural to ask if our approach can be improved to yield sublinear time algorithms with stronger relative error approximation guarantees for 𝐀\mathbf{A}’s eigenvalues. Unfortunately, this is not possible – consider a matrix with just a single pair of entries 𝐀i,j,𝐀j,i\mathbf{A}_{i,j},\mathbf{A}_{j,i} set to 11. To obtain relative error approximations to the two non-zero eigenvalues, we must find the pair (i,j)(i,j), as otherwise we cannot distinguish 𝐀\mathbf{A} from the all zeros matrix. This requires reading Ω(n2)\Omega(n^{2}) of 𝐀\mathbf{A}’s entries. More generally, consider 𝐀\mathbf{A} with a random n/t×n/tn/t\times n/t principal submatrix populated by all 11s, and with all other entries equal to 0. 𝐀\mathbf{A} has largest eigenvalue n/tn/t. However, if we read st2s\ll t^{2} entries of 𝐀\mathbf{A}, with good probability, we will not see even a single one, and thus we will not be able to distinguish 𝐀\mathbf{A} from the all zeros matrix. This example establishes that any sublinear time algorithm with query complexity ss must incur additive error at least Ω(n/s)\Omega(n/\sqrt{s}).
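The second construction is easy to check numerically; the following small simulation (an illustration with arbitrary parameters, not part of our algorithms or experiments) estimates how often s uniformly random entry reads reveal the planted all-ones block when s is well below t²:

import numpy as np

# A has a planted (n/t) x (n/t) all-ones principal block, so its top eigenvalue is n/t.
# A uniformly random entry read lands in the block only with probability 1/t^2, so with
# s reads the expected number of hits is s/t^2 = 0.1 and typically no ones are seen.
n, t, s, trials = 10_000, 100, 1_000, 200
rng = np.random.default_rng(1)
block = rng.choice(n, size=n // t, replace=False)
in_block = np.zeros(n, dtype=bool)
in_block[block] = True
hits = 0
for _ in range(trials):
    I = rng.integers(0, n, size=s)      # row indices of the s entry reads
    J = rng.integers(0, n, size=s)      # column indices of the s entry reads
    hits += bool(np.any(in_block[I] & in_block[J]))
print(hits / trials)                    # fraction of trials that see any one (about 0.1)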

1.3.1 Improved Bounds via Non-Uniform Sampling

We now discuss how to give improved approximation bounds via non-uniform sampling. We focus on the ±ϵnnz(𝐀)\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} bound of Theorem 2 using sparsity-based sampling. The proof of Theorem 3 for row norm sampling follows the same general ideas, but with some additional complications.

Theorem 2 requires sampling a submatrix 𝐀S\mathbf{A}_{S}, where each index ii is included in SS with probability pi=min(1,snnz(𝐀i)nnz(𝐀))p_{i}=\min(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}). We reweight each sampled row by 1pi\frac{1}{\sqrt{p_{i}}}. Thus, if entry 𝐀ij\mathbf{A}_{ij} is sampled, it is scaled by 1pipj\frac{1}{\sqrt{p_{i}\cdot p_{j}}}. When the rows have uniform sparsity (so all pi=s/np_{i}=s/n), this ensures that the full submatrix is scaled by n/sn/s, as in Theorem 1.

The proof of Theorem 2 follows the same outline as that of Theorem 1: we first argue that the outlying eigenvectors in 𝐕o\mathbf{V}_{o} are incoherent, giving a bound on the norm of each row of 𝐕o\mathbf{V}_{o} in terms of nnz(𝐀i)\operatorname{nnz}(\mathbf{A}_{i}). We then apply a matrix Bernstein bound and Bhatia’s non-Hermitian eigenvalue perturbation bound to show that the eigenvalues of 𝐀o,S\mathbf{A}_{o,S} approximate those of 𝐀o\mathbf{A}_{o} up to ±ϵnnz(𝐀)\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}.

Bounding the spectral norm of 𝐀m,S\mathbf{A}_{m,S}. The major challenge is showing that the subsampled middle eigendirections do not significantly increase the approximation error by bounding 𝐀m,S2\|\mathbf{A}_{m,S}\|_{2} by ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}. This is difficult since the indices in 𝐀m,S\mathbf{A}_{m,S} are sampled nonuniformly, so existing bounds [Tro08a] on the spectral norm of uniformly random submatrices do not apply. We extend these bounds to the non-uniform sampling case, but still face an issue due to the rescaling of entries by 1pipj\frac{1}{\sqrt{p_{i}p_{j}}}. In fact, without additional algorithmic modifications, 𝐀m,S2\|\mathbf{A}_{m,S}\|_{2} is simply not bounded by ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}! For example, as already discussed, if 𝐀=𝐈\mathbf{A}=\mathbf{I} is the identity matrix, we get 𝐀m,S=ns𝐈\mathbf{A}_{m,S}=\frac{n}{s}\cdot\mathbf{I} and so 𝐀m,S2=ns>ϵnnz(𝐀)\|\mathbf{A}_{m,S}\|_{2}=\frac{n}{s}>\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}, assuming s<nϵs<\frac{\sqrt{n}}{\epsilon}. Relatedly, suppose that 𝐀\mathbf{A} is tridiagonal, with zeros on the diagonal and ones on the first diagonals above and below the main diagonal. Then, if sns\geq\sqrt{n}, with constant probability, one of the ones will be sampled and scaled by ns\frac{n}{s}. Thus, we will again have 𝐀m,S2nsϵnnz(𝐀)\|\mathbf{A}_{m,S}\|_{2}\geq\frac{n}{s}\geq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}, assuming s<n2ϵs<\frac{\sqrt{n}}{2\epsilon}. Observe that this issue arises even when trying to approximate just the singular values (the eigenvalue magnitudes) via sampling. Thus, while an analogous bound to the uniform sampling result of Theorem 1 can easily be given for singular value estimation via matrix concentration inequalities (see Appendix G), to the best of our knowledge, Theorems 2 and 3 are the first of their kind even for singular value estimation.

Zeroing out entries in sparse rows/columns. To handle the above cases, we prove a novel perturbation bound, arguing that if we zero out any entry 𝐀ij\mathbf{A}_{ij} of 𝐀\mathbf{A} where nnz(𝐀i)nnz(𝐀j)ϵnnz(𝐀)clogn\sqrt{\operatorname{nnz}(\mathbf{A}_{i})\cdot\operatorname{nnz}(\mathbf{A}_{j})}\leq\frac{\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}}{c\log n}, then the eigenvalues of 𝐀\mathbf{A} are not perturbed by more than ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}. This can be thought of as a strengthening of Gershgorin’s circle theorem, which would ensure that zeroing out entries in rows/columns with nnz(𝐀i)ϵnnz(𝐀)\operatorname{nnz}(\mathbf{A}_{i})\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} does not perturb the eigenvalues by more than ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}. Armed with this perturbation bound, we argue that if we zero out the appropriate entries of 𝐀S\mathbf{A}_{S} before computing its eigenvalues, then since we have removed entries in very sparse rows and columns which would be scaled by a large 1pipj\frac{1}{\sqrt{p_{i}p_{j}}} factor in 𝐀S\mathbf{A}_{S}, we can bound 𝐀m,S2\|\mathbf{A}_{m,S}\|_{2}. This requires relating the magnitudes of the entries in 𝐀m,S\mathbf{A}_{m,S} to those in 𝐀S\mathbf{A}_{S} using the incoherence of the top eigenvectors, which gives bounds on the entries of 𝐀o,S=𝐀S𝐀m,S\mathbf{A}_{o,S}=\mathbf{A}_{S}-\mathbf{A}_{m,S}.
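To make this procedure concrete, here is a minimal NumPy sketch in the spirit of Algorithm 2 (referenced in Theorem 2 but not reproduced in this section). The dense representation of the matrix, the constant in the zeroing threshold, and the estimate-assembly step mirroring Algorithm 1 are illustrative assumptions rather than the exact algorithm.

import numpy as np

def sparsity_sampled_estimates(A, s, eps, rng):
    # Sketch of sparsity-based eigenvalue estimation (in the spirit of Algorithm 2).
    # A: dense symmetric numpy array (for illustration only); s, eps as in Theorem 2.
    n = A.shape[0]
    nnz_rows = np.count_nonzero(A, axis=1)
    nnz_total = np.count_nonzero(A)
    p = np.minimum(1.0, s * nnz_rows / nnz_total)            # sampling probabilities p_i
    S = np.flatnonzero(rng.random(n) < p)                     # sampled index set
    A_S = A[np.ix_(S, S)] / np.sqrt(np.outer(p[S], p[S]))     # scale entry (i,j) by 1/sqrt(p_i p_j)
    # Zero out entries lying in overly sparse rows/columns (hypothetical constant c = 4).
    thresh = eps * np.sqrt(nnz_total) / (4 * np.log(n))
    A_S[np.sqrt(np.outer(nnz_rows[S], nnz_rows[S])) <= thresh] = 0.0
    eigs = np.linalg.eigvalsh(A_S)
    # Assemble estimates as in Algorithm 1: positive eigenvalues fill the top slots,
    # negative eigenvalues fill the bottom slots, everything else is estimated as 0.
    est = np.zeros(n)
    pos = np.sort(eigs[eigs >= 0])[::-1]
    neg = np.sort(eigs[eigs < 0])[::-1]
    est[:len(pos)] = pos
    if len(neg) > 0:
        est[-len(neg):] = neg
    return est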

Sampling model. We note that the sparsity-based sampling of Theorem 2 can be efficiently implemented in several natural settings. Given a matrix stored in sparse format, i.e., as a list of nonzero entries, we can easily sample a row with probability nnz(𝐀i)nnz(𝐀)\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})} by sampling a uniformly random non-zero entry and looking at its corresponding row. Via standard techniques, we can convert several such samples into a sampled set SS close in distribution to having each i[n]i\in[n] included independently with probability min(1,snnz(𝐀i)nnz(𝐀))\min\left(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}\right). If we store the values of nnz(𝐀),nnz(𝐀1),,nnz(𝐀n)\operatorname{nnz}(\mathbf{A}),\operatorname{nnz}(\mathbf{A}_{1}),\ldots,\operatorname{nnz}(\mathbf{A}_{n}), we can also efficiently access each pip_{i}, which is needed for rescaling and zeroing out entries. Also observe that if 𝐀\mathbf{A} is the adjacency matrix of a graph, in the standard graph query model [GR97], it is well known how to approximately count edges and sample them uniformly at random, i.e., compute nnz(𝐀)\operatorname{nnz}(\mathbf{A}) and sample its nonzero entries, in sublinear time [GR08, ER18]. Further, it is typically assumed that one has access to the node degrees, i.e., nnz(𝐀1),,nnz(𝐀n)\operatorname{nnz}(\mathbf{A}_{1}),\ldots,\operatorname{nnz}(\mathbf{A}_{n}). Thus, our algorithm can naturally be used to estimate spectral graph properties in sublinear time.
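For example, given 𝐀 in coordinate (COO) sparse format, a row index can be sampled with probability nnz(𝐀_i)/nnz(𝐀) by drawing a uniformly random stored nonzero entry and returning its row. A minimal sketch, assuming SciPy's COO representation:

import numpy as np
from scipy.sparse import coo_matrix

def sample_row_by_sparsity(A_coo: coo_matrix, rng: np.random.Generator) -> int:
    # Pick one of the nnz(A) stored entries uniformly at random and return its row
    # index, which is then distributed with probability nnz(A_i) / nnz(A).
    k = rng.integers(0, A_coo.nnz)
    return int(A_coo.row[k])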

The 2\ell_{2} norm-based sampling of Theorem 3 can also be performed efficiently using an augmented data structure for storing 𝐀\mathbf{A}. Such data structures have been used extensively in the literature on quantum-inspired algorithms, and require just O(nnz(𝐀))O(\operatorname{nnz}(\mathbf{A})) time to construct, O(nnz(𝐀))O(\operatorname{nnz}(\mathbf{A})) space, and O(logn)O(\log n) time to update given an update to an entry of 𝐀\mathbf{A} [Tan18, CCH+20].
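One simple way to support such norm-based sampling is a Fenwick tree over the squared row norms; the following is a generic sketch of this standard structure (not necessarily the exact data structure of [Tan18, CCH+20]), supporting O(log n) weight updates and sampling a row with probability proportional to its squared norm:

import numpy as np

class RowNormSampler:
    # Fenwick tree over squared row norms ||A_i||_2^2: O(log n) weight updates and
    # sampling of a row index with probability ||A_i||_2^2 / ||A||_F^2
    # (sampling below uses a binary search over prefix sums, i.e. O(log^2 n) time).
    def __init__(self, squared_norms):
        self.n = len(squared_norms)
        self.tree = np.zeros(self.n + 1)
        for i, w in enumerate(squared_norms):
            self.add(i, w)

    def add(self, i, delta):          # add delta to the weight of row i
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & (-i)

    def prefix(self, i):              # total weight of rows 0..i
        i += 1
        total = 0.0
        while i > 0:
            total += self.tree[i]
            i -= i & (-i)
        return total

    def sample(self, rng):            # sample row i with probability weight_i / total weight
        u = rng.random() * self.prefix(self.n - 1)
        lo, hi = 0, self.n - 1
        while lo < hi:                # smallest index whose prefix sum reaches u
            mid = (lo + hi) // 2
            if self.prefix(mid) < u:
                lo = mid + 1
            else:
                hi = mid
        return lo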

1.4 Towards Optimal Query Complexity

As discussed, Bakshi et al. [BCJ20] show that any algorithm which can test with good probability whether 𝐀\mathbf{A} has an eigenvalue ϵn\leq-\epsilon n or else has all non-negative eigenvalues must read Ω~(1ϵ2)\tilde{\Omega}\left(\frac{1}{\epsilon^{2}}\right) entries of 𝐀\mathbf{A}. This testing problem is strictly easier than outputting ±ϵn\pm\epsilon n error estimates of all eigenvalues, so gives a lower bound for our setting. If the queried entries are restricted to fall in a submatrix, [BCJ20] shows that this submatrix must have dimensions Ω(1ϵ2)×Ω(1ϵ2)\Omega\left(\frac{1}{\epsilon^{2}}\right)\times\Omega\left(\frac{1}{\epsilon^{2}}\right), giving total query complexity Ω(1ϵ4)\Omega\left(\frac{1}{\epsilon^{4}}\right). Closing the gap between our upper bound of O~(log3nϵ3)×O~(log3nϵ3)\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{3}}\right)\times\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{3}}\right) and the lower bound of Ω(1ϵ2)×Ω(1ϵ2)\Omega\left(\frac{1}{\epsilon^{2}}\right)\times\Omega\left(\frac{1}{\epsilon^{2}}\right) for submatrix queries is an intriguing open question.

We show in Appendix A that this gap can be easily closed via a surprisingly simple argument if 𝐀\mathbf{A} is positive semidefinite (PSD). In that case, 𝐀=𝐁𝐁T\mathbf{A}=\mathbf{B}\mathbf{B}^{T} with 𝐁n×n\mathbf{B}\in\mathbb{R}^{n\times n}. Writing 𝐀S=𝐒T𝐀𝐒\mathbf{A}_{S}=\mathbf{S}^{T}\mathbf{A}\mathbf{S} for a sampling matrix 𝐒n×|S|\mathbf{S}\in\mathbb{R}^{n\times|S|}, the non-zero eigenvalues of 𝐀S\mathbf{A}_{S} are identical to those of 𝐁𝐒𝐒T𝐁T\mathbf{B}\mathbf{S}\mathbf{S}^{T}\mathbf{B}^{T}. Via a standard approximate matrix multiplication analysis [DK01], one can then show that, for s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta}, with probability at least 1δ1-\delta, 𝐁𝐁T𝐁𝐒𝐒T𝐁TFϵn\|\mathbf{BB}^{T}-\mathbf{B}\mathbf{S}\mathbf{S}^{T}\mathbf{B}^{T}\|_{F}\leq\epsilon n. Via Weyl’s inequality, this shows that the eigenvalues of 𝐁𝐒𝐒T𝐁T\mathbf{B}\mathbf{S}\mathbf{S}^{T}\mathbf{B}^{T}, and hence 𝐀S\mathbf{A}_{S}, approximate those of 𝐀\mathbf{A} up to ±ϵn\pm\epsilon n error.444In fact, via more refined eigenvalue perturbation bounds [Bha13] one can show an 2\ell_{2} norm bound on the eigenvalue approximation errors, which can be much stronger than the \ell_{\infty} norm bound of Theorem 1.
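A sketch of the variance calculation behind this bound, under the same Bernoulli sampling model as Algorithm 1 (each index kept independently with probability s/n and rescaled by \sqrt{n/s}; here \delta_{i} is the indicator that index i is kept and b_{i} is the i-th row of \mathbf{B}, so \|b_{i}\|_{2}^{2}=\mathbf{A}_{ii}\leq 1):

\mathbb{E}\left\|\mathbf{B}\mathbf{S}\mathbf{S}^{T}\mathbf{B}^{T}-\mathbf{B}\mathbf{B}^{T}\right\|_{F}^{2}=\sum_{i=1}^{n}\mathbb{E}\left[\left(\tfrac{n}{s}\delta_{i}-1\right)^{2}\right]\cdot\|b_{i}b_{i}^{T}\|_{F}^{2}=\left(\tfrac{n}{s}-1\right)\sum_{i=1}^{n}\|b_{i}\|_{2}^{4}\leq\frac{n^{2}}{s},

so Markov's inequality gives \|\mathbf{B}\mathbf{S}\mathbf{S}^{T}\mathbf{B}^{T}-\mathbf{B}\mathbf{B}^{T}\|_{F}\leq\epsilon n with probability at least 1-\delta once s\geq\frac{1}{\epsilon^{2}\delta}.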

Unfortunately, this approach breaks down when 𝐀\mathbf{A} has negative eigenvalues, and so cannot be factored as 𝐁𝐁T\mathbf{BB}^{T} for real 𝐁n×n\mathbf{B}\in\mathbb{R}^{n\times n}. This is more than a technical issue: observe that when 𝐀\mathbf{A} is PSD and has 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, it can have at most 1/ϵ1/\epsilon eigenvalues larger than ϵn\epsilon n – since its trace, which is equal to the sum of its eigenvalues, is bounded by nn, and since all eigenvalues are non-negative. When 𝐀\mathbf{A} is not PSD, it can have Ω(1/ϵ2)\Omega(1/\epsilon^{2}) eigenvalues with magnitude larger than ϵn\epsilon n. In particular, if 𝐀\mathbf{A} is the tensor product of a 1/ϵ2×1/ϵ21/\epsilon^{2}\times 1/\epsilon^{2} random ±1\pm 1 matrix and the ϵ2n×ϵ2n\epsilon^{2}n\times\epsilon^{2}n all ones matrix, the bulk of its eigenvalues (of which there are 1/ϵ21/\epsilon^{2}) will concentrate around 1/ϵϵ2n=ϵn1/\epsilon\cdot\epsilon^{2}n=\epsilon n. As a result it remains unclear whether we can match the 1/ϵ21/\epsilon^{2} dependence of the PSD case, or if a stronger lower bound can be shown for indefinite matrices.

Beyond the ϵ\epsilon dependence, it is unknown if full eigenspectrum approximation can be performed with sample complexity independent of the matrix size nn. [BCJ20] achieve this for the easier positive semidefiniteness testing problem, giving sample complexity O~(1/ϵ2)\tilde{O}(1/\epsilon^{2}). However, our bounds have additional logn\log n factors. As discussed, in Appendix B we give an alternate analysis for Theorem 1, which shows that sampling an O(lognϵ4δ)×O(lognϵ4δ)O\left(\frac{\log n}{\epsilon^{4}\delta}\right)\times O\left(\frac{\log n}{\epsilon^{4}\delta}\right) submatrix suffices for ±ϵn\pm\epsilon n eigenvalue approximation, saving a log2n\log^{2}n factor at the cost of worse ϵ\epsilon dependence. However, removing the final logn\log n seems difficult – it arises when bounding 𝐀m,S2\|\mathbf{A}_{m,S}\|_{2} via bounds on the spectral norms of random principal submatrices [RV07]. Removing it would seem to require either improving such bounds or taking a different algorithmic approach.

Also note that our logn\log n and ϵ\epsilon dependencies for non-uniform sampling (Theorems 2 and 3) are likely not tight. It is not hard to check that the lower bounds of [BCJ20] still hold in these settings. For example, in the sparsity-based sampling setting, by simply having the matrix entirely supported on a nnz(𝐀)×nnz(𝐀)\sqrt{\operatorname{nnz}(\mathbf{A})}\times\sqrt{\operatorname{nnz}(\mathbf{A})} submatrix, the lower bounds of [BCJ20] directly carry over. Giving tight query complexity bounds here would also be interesting. Finally, it would be interesting to go beyond principal submatrix based algorithms, to achieve improved query complexity, as in Corollary 1. Finding an algorithm matching the O~(1ϵ2)\tilde{O}\left(\frac{1}{\epsilon^{2}}\right) overall query complexity lower bound of [BCJ20] is open even in the much simpler PSD setting.

2 Notation and Preliminaries

We now define notation and foundational results that we use throughout our work. For any integer nn, let [n][n] denote the set {1,2,,n}\{1,2,\ldots,n\}. We write matrices and vectors in bold literals – e.g., 𝐀\mathbf{A} or 𝐱\mathbf{x}. We denote the eigenvalues of a symmetric matrix 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} by λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}), in decreasing order. A symmetric matrix is positive semidefinite if all its eigenvalues are non-negative. For two matrices 𝐀,𝐁\mathbf{A},\mathbf{B}, we let 𝐀𝐁\mathbf{A}\succeq\mathbf{B} denote that 𝐀𝐁\mathbf{A}-\mathbf{B} is positive semidefinite. For any matrix 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} and i[n]i\in[n], we let 𝐀i\mathbf{A}_{i} denote the ithi^{th} row of 𝐀\mathbf{A}, nnz(𝐀i)\operatorname{nnz}(\mathbf{A}_{i}) denote the number of non-zero elements in this row, and 𝐀i2\|\mathbf{A}_{i}\|_{2} denote its 2\ell_{2} norm. We let nnz(𝐀)\operatorname{nnz}(\mathbf{A}) denote the total number of non-zero elements in 𝐀\mathbf{A}. For a vector 𝐱\mathbf{x}, we let 𝐱2\|\mathbf{x}\|_{2} denote its Euclidean norm. For a matrix 𝐀\mathbf{A}, we let 𝐀\|\mathbf{A}\|_{\infty} denote the largest magnitude of an entry, 𝐀2=max𝐱𝐀𝐱2𝐱2\|\mathbf{A}\|_{2}=\max_{\mathbf{x}}\frac{\|\mathbf{Ax}\|_{2}}{\|\mathbf{x}\|_{2}} denote the spectral norm, 𝐀F=(i,j𝐀ij2)1/2\|\mathbf{A}\|_{F}=(\sum_{i,j}\mathbf{A}_{ij}^{2})^{1/2} denote the Frobenius norm, and 𝐀12\|\mathbf{A}\|_{1\rightarrow 2} denote the maximum Euclidean norm of a column. For 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} and S[n]S\subseteq[n] we let 𝐀S\mathbf{A}_{S} denote the principal submatrix corresponding to SS. We let 𝔼2\mathbb{E}_{2} denote the L2L_{2} norm of a random variable, 𝔼2[X]=(𝔼[X2])1/2\mathbb{E}_{2}[X]=(\mathbb{E}[X^{2}])^{1/2}, where 𝔼[]\mathbb{E}[\cdot] denotes expectation.

We use the following basic facts and identities on eigenvalues throughout our proofs.

Fact 1 (Eigenvalue of Matrix Product).

For any two matrices 𝐀n×m,𝐁m×n\mathbf{A}\in\mathbb{C}^{n\times m},\mathbf{B}\in\mathbb{C}^{m\times n}, the non-zero eigenvalues of 𝐀𝐁\mathbf{AB} are identical to those of 𝐁𝐀\mathbf{B}\mathbf{A}.

Fact 2 (Gershgorin’s circle theorem [Ger31]).

Let 𝐀n×n\mathbf{A}\in\mathbb{C}^{n\times n} with entries 𝐀ij\mathbf{A}_{ij}. For i[n]i\in[n], let 𝐑i\mathbf{R}_{i} be the sum of absolute values of non-diagonal entries in the iith row. Let D(𝐀ii,𝐑i)D(\mathbf{A}_{ii},\mathbf{R}_{i}) be the closed disc centered at 𝐀ii\mathbf{A}_{ii} with radius 𝐑i\mathbf{R}_{i}. Then every eigenvalue of 𝐀\mathbf{A} lies within one of the discs D(𝐀ii,𝐑i)D(\mathbf{A}_{ii},\mathbf{R}_{i}).

Fact 3 (Weyl’s Inequality [Wey12]).

For any two Hermitian matrices 𝐀,𝐁n×n\mathbf{A},\mathbf{B}\in\mathbb{C}^{n\times n} with 𝐀𝐁=𝐄\mathbf{A}-\mathbf{B}=\mathbf{E},

maxi|λi(𝐀)λi(𝐁)|𝐄2.\displaystyle\max_{i}|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{B})|\leq\|\mathbf{E}\|_{2}.

Weyl’s inequality ensures that a small Hermitian perturbation of a Hermitian matrix will not significantly change its eigenvalues. The bound can be extended to the case when the perturbation is not Hermitian, with a loss of an O(logn)O(\log n) factor; to the best of our knowledge this loss is necessary:

Fact 4 (Non-Hermitian perturbation bound [Bha13]).

Let 𝐀n×n\mathbf{A}\in\mathbb{C}^{n\times n} be Hermitian and 𝐁n×n\mathbf{B}\in\mathbb{C}^{n\times n} be any matrix whose eigenvalues are λ1(𝐁),,λn(𝐁)\lambda_{1}(\mathbf{B}),\ldots,\lambda_{n}(\mathbf{B}) such that Re(λ1(𝐁))Re(λn(𝐁))Re(\lambda_{1}(\mathbf{B}))\geq\ldots\geq Re(\lambda_{n}(\mathbf{B})) (where Re(λi(𝐁))Re(\lambda_{i}(\mathbf{B})) denotes the real part of λi(𝐁)\lambda_{i}(\mathbf{B})). Let 𝐀𝐁=𝐄\mathbf{A}-\mathbf{B}=\mathbf{E}. For some universal constant CC,

maxi|λi(𝐀)λi(𝐁)|Clogn𝐄2.\displaystyle\max_{i}|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{B})|\leq C\log n\|\mathbf{E}\|_{2}.

Beyond the above facts, we use several theorems to obtain eigenvalue concentration bounds. We first state a theorem from [Tro08a], which bounds the spectral norm of a principal submatrix sampled uniformly at random from a bounded entry matrix. We build on this to prove the full eigenspectrum concentration result of Theorem 1.

Theorem 4 (Random principal submatrix spectral norm bound [RV07, Tro08a]).

Let 𝐀n×n\mathbf{A}\in\mathbb{C}^{n\times n} be Hermitian, decomposed into diagonal and off-diagonal parts: 𝐀=𝐃+𝐇\mathbf{A}=\mathbf{D}+\mathbf{H}. Let 𝐒n×n\mathbf{S}\in\mathbb{R}^{n\times n} be a diagonal sampling matrix with the jthj^{th} diagonal entry set to 11 independently with probability s/ns/n and 0 otherwise. Then, for some universal constant CC,

𝔼2𝐒𝐀𝐒2C[logn𝔼2𝐒𝐇𝐒+slognn𝔼2𝐇𝐒12+sn𝐇2]+𝔼2𝐒𝐃𝐒2.\displaystyle\mathbb{E}_{2}\|\mathbf{S}\mathbf{AS}\|_{2}\leq C\left[\log n\cdot\mathbb{E}_{2}\|\mathbf{S}\mathbf{HS}\|_{\infty}+\sqrt{\frac{s\log n}{n}}\cdot\mathbb{E}_{2}\|\mathbf{HS}\|_{1\rightarrow 2}+\frac{s}{n}\cdot\|\mathbf{H}\|_{2}\right]+\mathbb{E}_{2}\|\mathbf{S}\mathbf{DS}\|_{2}.

For Theorems 2 and 3, we need an extension of Theorem 4 to the setting where rows are sampled non-uniformly. We will use two bounds here. The first is a decoupling and recoupling result for matrix norms. One can prove this lemma following an analogous result in [Tro08a] for sampling rows/columns uniformly. The proof is almost identical so we omit it.

Lemma 1 (Decoupling and recoupling).

Let 𝐇\mathbf{H} be a Hermitian matrix with zero diagonal. Let δj\delta_{j} be a sequence of independent random variables such that δj=1pj\delta_{j}=\frac{1}{\sqrt{p_{j}}} with probability pjp_{j} and 0 otherwise. Let 𝐒\mathbf{S} be a square diagonal sampling matrix with jthj^{th} diagonal entry set to δj\delta_{j}. Then:

𝔼2𝐒𝐇𝐒22𝔼2𝐒𝐇𝐒^2and𝔼2𝐒𝐇𝐒^4𝔼2𝐒𝐇𝐒,\mathbb{E}_{2}\|\mathbf{SHS}\|_{2}\leq 2\mathbb{E}_{2}\|\mathbf{SH\hat{S}}\|_{2}\hskip 10.00002pt\text{and}\hskip 10.00002pt\mathbb{E}_{2}\|\mathbf{SH\hat{S}}\|_{\infty}\leq 4\mathbb{E}_{2}\|\mathbf{SHS}\|_{\infty},

where 𝐒^\mathbf{\hat{S}} is an independent diagonal sampling matrix drawn from the same distribution as 𝐒\mathbf{S}.

The second theorem bounds the spectral norm of a non-uniform random column sample of a matrix. We give a proof in Appendix D, again following a theorem in [Tro08b] for uniform sampling.

Theorem 5 (Non-uniform column sampling – spectral norm bound).

Let 𝐀\mathbf{A} be an m×nm\times n matrix with rank rr. Let δj\delta_{j} be a sequence of independent random variables such that δj=1pj\delta_{j}=\frac{1}{\sqrt{p_{j}}} with probability pjp_{j} and 0 otherwise. Let 𝐒\mathbf{S} be a square diagonal sampling matrix with jthj^{th} diagonal entry set to δj\delta_{j}. Then,

𝔼2𝐀𝐒25logr𝔼2𝐀𝐒12+𝐀2\mathbb{E}_{2}\|\mathbf{AS}\|_{2}\leq 5\sqrt{\log r}\cdot\mathbb{E}_{2}\|\mathbf{AS}\|_{1\rightarrow 2}+\|\mathbf{A}\|_{2}

We use a standard Matrix Bernstein inequality to bound the spectral norm of random submatrices.

Theorem 6 (Matrix Bernstein [Tro15]).

Consider a finite sequence {𝐒k}\{\mathbf{S}_{k}\} of random matrices in d×d\mathbb{R}^{d\times d}. Assume that for all kk, 𝔼[𝐒k]=𝟎and𝐒k2L.\mathbb{E}[\mathbf{S}_{k}]=\mathbf{0}\quad\text{and}\quad\|\mathbf{S}_{k}\|_{2}\leq L. Let 𝐙=k𝐒k\mathbf{Z}=\sum_{k}\mathbf{S}_{k} and let 𝐕1,𝐕2\mathbf{V}_{1},\mathbf{V}_{2} be semidefinite upper-bounds for the matrix valued variances 𝐕𝐚𝐫1(𝐙)\mathbf{Var}_{1}(\mathbf{Z}) and 𝐕𝐚𝐫2(𝐙)\mathbf{Var}_{2}(\mathbf{Z}):

𝐕1\displaystyle\mathbf{V}_{1} 𝐕𝐚𝐫1(𝐙)=def𝔼(𝐙𝐙T)=k𝔼(𝐒k𝐒kT),and\displaystyle\succeq\mathbf{Var}_{1}(\mathbf{Z})\mathbin{\stackrel{{\scriptstyle\rm def}}{{=}}}\mathbb{E}\left(\mathbf{ZZ}^{T}\right)=\sum_{k}\mathbb{E}\left(\mathbf{S}_{k}\mathbf{S}_{k}^{T}\right),\quad\text{and}
𝐕2\displaystyle\mathbf{V}_{2} 𝐕𝐚𝐫2(𝐙)=def𝔼(𝐙T𝐙)=k𝔼(𝐒kT𝐒k).\displaystyle\succeq\mathbf{Var}_{2}(\mathbf{Z})\mathbin{\stackrel{{\scriptstyle\rm def}}{{=}}}\mathbb{E}\left(\mathbf{Z}^{T}\mathbf{Z}\right)=\sum_{k}\mathbb{E}\left(\mathbf{S}_{k}^{T}\mathbf{S}_{k}\right).

Then, letting v=max(𝐕12,𝐕22)v=\max(\|\mathbf{V}_{1}\|_{2},\|\mathbf{V}_{2}\|_{2}), for any t0t\geq 0,

(𝐙2t)\displaystyle\operatorname*{\mathbb{P}}(\|\mathbf{Z}\|_{2}\geq t) 2dexp(t2/2v+Lt/3).\displaystyle\leq 2d\cdot\exp\left(\frac{-t^{2}/2}{v+Lt/3}\right).

For real valued random variables, we use the standard Bernstein inequality.

Theorem 7 (Bernstein inequality [Ber27]).

Let {zj}\{z_{j}\} for j[n]j\in[n] be independent random variables with zero mean such that |zj|M\lvert z_{j}\rvert\leq M for all jj. Then for all positive tt,

(|j=1nzj|t)exp(t2/2i=1n𝔼[zi2]+Mt/3).\displaystyle\operatorname*{\mathbb{P}}\left(\left\lvert\sum_{j=1}^{n}z_{j}\right\rvert\geq t\right)\leq\exp\left(\frac{-t^{2}/2}{\sum_{i=1}^{n}\mathbb{E}[z_{i}^{2}]+Mt/3}\right).

3 Sublinear Time Eigenvalue Estimation using Uniform Sampling

We now prove our main eigenvalue estimation result – Theorem 1. We give the pseudocode for our principal submatrix based estimation procedure in Algorithm 1. We will show that any positive or negative eigenvalue of 𝐀\mathbf{A} with magnitude ϵn\geq\epsilon n will appear as an approximate eigenvalue in 𝐀S\mathbf{A}_{S} with good probability. Thus, in step 5 of Algorithm 1, the positive and negative eigenvalues of 𝐀S\mathbf{A}_{S} are used to estimate the outlying largest and smallest eigenvalues of 𝐀\mathbf{A}. All other interior eigenvalues of 𝐀\mathbf{A} are estimated to be 0, which will immediately give our ±ϵn\pm\epsilon n approximation bound when the original eigenvalue has magnitude ϵn\leq\epsilon n.

Algorithm 1 Eigenvalue estimator using uniform sampling
1:  Input: Symmetric 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, Accuracy ϵ(0,1)\epsilon\in(0,1), failure prob. δ(0,1)\delta\in(0,1).
2:  Fix s=clog(1/(ϵδ))log3nϵ3δs=\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}{\delta}} where cc is a sufficiently large constant.
3:  Add each index i[n]i\in[n] to the sample set SS independently with probability sn\frac{s}{n}. Let the principal submatrix of 𝐀\mathbf{A} corresponding to SS be 𝐀S\mathbf{A}_{S}.
4:  Compute the eigenvalues of 𝐀S\mathbf{A}_{S}: λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}_{S}).
5:  For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}_{S})\geq 0, let λ~i(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all i[|S|]i\in[|S|] with λi(𝐀S)<0\lambda_{i}(\mathbf{A}_{S})<0, let λ~n(|S|i)(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all remaining i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0.
6:  Return: Eigenvalue estimates λ~1(𝐀)λ~n(𝐀)\tilde{\lambda}_{1}(\mathbf{A})\geq\ldots\geq\tilde{\lambda}_{n}(\mathbf{A}).

Running time. Observe that the expected number of indices chosen by Algorithm 1 is s=clog(1/(ϵδ))log3nϵ3δs=\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}{\delta}}. A standard concentration bound can be used to show that with high probability (11/poly(n))(1-1/\operatorname{poly}(n)), the number of sampled indices is O(s)O(s). Thus, the algorithm reads a total of O(s2)O(s^{2}) entries of 𝐀\mathbf{A} and runs in O(sω)O(s^{\omega}) time – the time to compute a full eigendecomposition of 𝐀S\mathbf{A}_{S}.
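For reference, a minimal NumPy sketch of Algorithm 1 (illustrative only: it materializes 𝐀 as a dense array rather than reading entries on demand, and the constant c is an arbitrary choice):

import numpy as np

def uniform_eigenvalue_estimates(A, eps, delta, rng):
    # Sketch of Algorithm 1. A: symmetric numpy array with |A_ij| <= 1.
    # Returns n eigenvalue estimates in decreasing order.
    n = A.shape[0]
    c = 40.0                                               # unspecified constant; arbitrary here
    s = min(n, c * np.log(1 / (eps * delta)) * np.log(n) ** 3 / (eps ** 3 * delta))
    keep = rng.random(n) < s / n                           # include each index w.p. s/n
    A_S = A[np.ix_(keep, keep)]                            # sampled principal submatrix
    eigs = np.linalg.eigvalsh(A_S)                         # eigenvalues of A_S (ascending)
    est = np.zeros(n)
    pos = (n / s) * np.sort(eigs[eigs >= 0])[::-1]         # scaled positive eigenvalues -> top
    neg = (n / s) * np.sort(eigs[eigs < 0])[::-1]          # scaled negative eigenvalues -> bottom
    est[:len(pos)] = pos
    if len(neg) > 0:
        est[-len(neg):] = neg
    return est                                             # interior estimates remain 0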

3.1 Outer and Middle Eigenvalue Bounds

Recall that we will split 𝐀\mathbf{A} into two symmetric matrices (Definition 1.1): 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}_{o}=\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T} which contains its large magnitude (outlying) eigendirections with eigenvalue magnitudes ϵδn\geq\epsilon\sqrt{\delta}n and 𝐀m=𝐕m𝚲m𝐕mT\mathbf{A}_{m}=\mathbf{V}_{m}\mathbf{\Lambda}_{m}\mathbf{V}_{m}^{T} which contains its small magnitude (middle) eigendirections.

We first show that the eigenvectors in 𝐕o\mathbf{V}_{o} are incoherent. I.e., that their (eigenvalue weighted) squared row norms are bounded. This ensures that the outlying eigenspace of 𝐀\mathbf{A} is well-approximated via uniform sampling.

Lemma 2 (Incoherence of outlying eigenvectors).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Let 𝐕o\mathbf{V}_{o} be as in Definition 1.1. Let 𝐕o,i\mathbf{V}_{o,i} denote the iith row of 𝐕o\mathbf{V}_{o}. Then,

𝚲o1/2𝐕o,i221ϵδ and 𝐕o,i221ϵ2δn.\displaystyle\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{1}{\epsilon\sqrt{\delta}}\hskip 20.00003pt\text{ and }\hskip 20.00003pt\|\mathbf{V}_{o,i}\|^{2}_{2}\leq\frac{1}{\epsilon^{2}\delta n}.
Proof.

Observe that 𝐀𝐕o=𝐕o𝚲o\mathbf{A}\mathbf{V}_{o}=\mathbf{V}_{o}\mathbf{\Lambda}_{o}. Let [𝐀𝐕o]i[\mathbf{A}\mathbf{V}_{o}]_{i} denote the iith row of 𝐀𝐕o\mathbf{A}\mathbf{V}_{o}. Then we have

[𝐀𝐕o]i22=[𝐕o𝚲o]i22=j=1rλj2𝐕o,i,j2,\|[\mathbf{A}\mathbf{V}_{o}]_{i}\|_{2}^{2}=\|[\mathbf{V}_{o}\mathbf{\Lambda}_{o}]_{i}\|_{2}^{2}=\sum_{j=1}^{r}\lambda_{j}^{2}\cdot\mathbf{V}_{o,i,j}^{2}, (2)

where r=rank(𝐀o)r=\operatorname{rank}(\mathbf{A}_{o}), 𝐕o,i,j\mathbf{V}_{o,i,j} is the (i,j)(i,j)th element of 𝐕o\mathbf{V}_{o}, and λj=𝚲o(j,j)\lambda_{j}=\mathbf{\Lambda}_{o}(j,j). Since 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 by assumption, and since 𝐕o\mathbf{V}_{o} has orthonormal columns so that its spectral norm is bounded by 11, we have

[𝐀𝐕o]i22=[𝐀]i𝐕o22[𝐀]i22𝐕o22n.\|[\mathbf{A}\mathbf{V}_{o}]_{i}\|_{2}^{2}=\|[\mathbf{A}]_{i}\mathbf{V}_{o}\|_{2}^{2}\leq\|[\mathbf{A}]_{i}\|_{2}^{2}\cdot\|\mathbf{V}_{o}\|^{2}_{2}\leq n.

Therefore, by (2), we have:

j=1rλj2𝐕o,i,j2n.\sum_{j=1}^{r}\lambda_{j}^{2}\cdot\mathbf{V}_{o,i,j}^{2}\leq n. (3)

Since by definition of 𝚲o\mathbf{\Lambda}_{o}, |λj|ϵδn\lvert\lambda_{j}\rvert\geq\epsilon\sqrt{\delta}n for all jj, we finally have

𝚲o1/2𝐕o,i22=j=1rλj𝐕o,i,j2nϵδn=1ϵδ\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}=\sum_{j=1}^{r}\lambda_{j}\cdot\mathbf{V}_{o,i,j}^{2}\leq\frac{n}{\epsilon\sqrt{\delta}n}=\frac{1}{\epsilon\sqrt{\delta}}

and

𝐕o,i22=j=1r𝐕o,i,j2\displaystyle\|\mathbf{V}_{o,i}\|_{2}^{2}=\sum_{j=1}^{r}\mathbf{V}_{o,i,j}^{2} nϵ2δn2=1ϵ2δn.\displaystyle\leq\frac{n}{\epsilon^{2}\delta n^{2}}=\frac{1}{\epsilon^{2}\delta n}.

Let 𝐒¯n×|S|\mathbf{\bar{S}}\in\mathbb{R}^{n\times|S|} be the scaled sampling matrix satisfying 𝐒¯T𝐀𝐒¯=ns𝐀S\mathbf{\bar{S}}^{T}\mathbf{A}\mathbf{\bar{S}}=\frac{n}{s}\cdot\mathbf{A}_{S}. We next apply Lemma 2 in conjunction with a matrix Bernstein bound to show that 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2} concentrates around its expectation, 𝚲o\mathbf{\Lambda}_{o}. Since by Fact 1, this matrix has identical eigenvalues to ns𝐀o,S=𝐒¯T𝐕o𝚲o𝐕oT𝐒¯\frac{n}{s}\cdot\mathbf{A}_{o,S}=\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}, this allows us to argue that the eigenvalues of ns𝐀o,S\frac{n}{s}\cdot\mathbf{A}_{o,S} approximate those of 𝚲o\mathbf{\Lambda}_{o}.

Lemma 3 (Concentration of outlying eigenvalues).

Let S[n]S\subseteq[n] be sampled as in Algorithm 1 for sclog(1/(ϵδ))ϵ3δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{3}\sqrt{\delta}} where cc is a sufficiently large constant. Let 𝐒¯n×|S|\mathbf{\bar{S}}\in\mathbb{R}^{n\times|S|} be the scaled sampling matrix satisfying 𝐒¯T𝐀𝐒¯=ns𝐀S\mathbf{\bar{S}}^{T}\mathbf{A}\mathbf{\bar{S}}=\frac{n}{s}\cdot\mathbf{A}_{S}. Letting 𝚲o,𝐕o\mathbf{\Lambda}_{o},\mathbf{V}_{o} be as in Definition 1.1, with probability at least 1δ1-\delta,

𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2ϵn.\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}\|_{2}\leq\epsilon n.
Proof.

Define 𝐄=𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o\mathbf{E}=\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}. For all i[n]i\in[n], let 𝐕o,i\mathbf{V}_{o,i} be the ithi^{th} row of 𝐕o\mathbf{V}_{o} and define the matrix valued random variable

𝐘i={ns𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2,with probability s/n0otherwise.\displaystyle\mathbf{Y}_{i}=\begin{cases}\frac{n}{s}\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2},&\text{with probability }s/n\\ 0&\text{otherwise.}\end{cases} (4)

Define 𝐐i=𝐘i𝔼[𝐘i]\mathbf{Q}_{i}=\mathbf{Y}_{i}-\mathbb{E}\left[\mathbf{Y}_{i}\right]. Observe that 𝐐1,,𝐐n\mathbf{Q}_{1},\ldots,\mathbf{Q}_{n} are independent random variables and that i=1n𝐐i=𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o=𝐄\sum_{i=1}^{n}\mathbf{Q}_{i}=\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}=\mathbf{E}. Further, observe that 𝐐i2max(1,ns1)𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/22max(1,ns1)𝚲o1/2𝐕o,i22\|\mathbf{Q}_{i}\|_{2}\leq\max\left(1,\frac{n}{s}-1\right)\cdot\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2}\|_{2}\leq\max\left(1,\frac{n}{s}-1\right)\cdot\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}. Now, 𝚲o1/2𝐕o,i221ϵδ\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{1}{\epsilon\sqrt{\delta}} by Lemma 2. Thus, 𝐐i2nϵδs\|\mathbf{Q}_{i}\|_{2}\leq\frac{n}{\epsilon\sqrt{\delta}s}. The variance 𝐕𝐚𝐫(𝐄)=def𝔼(𝐄𝐄T)=𝔼(𝐄T𝐄)=i=1n𝔼[𝐐i2]\mathbf{Var}(\mathbf{E})\mathbin{\stackrel{{\scriptstyle\rm def}}{{=}}}\mathbb{E}(\mathbf{EE}^{T})=\mathbb{E}(\mathbf{E}^{T}\mathbf{E})=\sum_{i=1}^{n}\mathbb{E}[\mathbf{Q}_{i}^{2}] can be bounded as:

i=1n𝔼[𝐐i2]\displaystyle\sum_{i=1}^{n}\mathbb{E}[\mathbf{Q}_{i}^{2}] =i=1n[sn(ns1)2+(1sn)](𝚲o1/2𝐕o,i𝐕o,iT𝚲o𝐕o,i𝐕o,iT𝚲o1/2)\displaystyle=\sum_{i=1}^{n}\left[\frac{s}{n}\cdot\left(\frac{n}{s}-1\right)^{2}+\left(1-\frac{s}{n}\right)\right]\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})
i=1nns𝚲o1/2𝐕o,i22(𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2).\displaystyle\preceq\sum_{i=1}^{n}\frac{n}{s}\cdot\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2}). (5)

Again by Lemma 2, 𝚲o1/2𝐕o,i221ϵδ\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{1}{\epsilon\sqrt{\delta}}. Plugging back into (5) we can bound,

i=1n𝔼[𝐐i2]\displaystyle\sum_{i=1}^{n}\mathbb{E}[\mathbf{Q}_{i}^{2}] i=1nns1ϵδ(𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2)=nsϵδ𝚲on2sϵδ𝐈.\displaystyle\preceq\sum_{i=1}^{n}\frac{n}{s}\cdot\frac{1}{\epsilon\sqrt{\delta}}\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})=\frac{n}{s\epsilon\sqrt{\delta}}\mathbf{\Lambda}_{o}\preceq\frac{n^{2}}{s\epsilon\sqrt{\delta}}\cdot\mathbf{I}.

Since 𝐐i2\mathbf{Q}_{i}^{2} is PSD, this establishes that 𝐕𝐚𝐫(𝐄)2n2sϵδ\|\mathbf{Var}(\mathbf{E})\|_{2}\leq\frac{n^{2}}{s\epsilon\sqrt{\delta}}. We then apply Theorem 6 (the matrix Bernstein inequality) with L=nsϵδL=\frac{n}{s\epsilon\sqrt{\delta}}, v=n2sϵδv=\frac{n^{2}}{s\epsilon\sqrt{\delta}}, and d1ϵ2δd\leq\frac{1}{\epsilon^{2}\delta} since there are at most 𝐀F2δϵ2n21ϵ2δ\frac{\|\mathbf{A}\|_{F}^{2}}{\delta\epsilon^{2}n^{2}}\leq\frac{1}{\epsilon^{2}\delta} outlying eigenvalues with magnitude δϵn\geq\sqrt{\delta}\epsilon n in 𝚲o\boldsymbol{\Lambda}_{o}. This gives:

(𝐄2ϵn)\displaystyle\operatorname*{\mathbb{P}}\left(\left\|\mathbf{E}\right\|_{2}\geq\epsilon n\right) 2ϵ2δexp(ϵ2n2/2v+Lϵn/3)\displaystyle\leq\frac{2}{\epsilon^{2}\delta}\cdot\exp\left(\frac{-\epsilon^{2}n^{2}/2}{v+L\epsilon n/3}\right)
2ϵ2δexp(ϵ2n2/2n2sϵδ+ϵn23sϵδ)\displaystyle\leq\frac{2}{\epsilon^{2}\delta}\cdot\exp\left(\frac{-\epsilon^{2}n^{2}/2}{\frac{n^{2}}{s\epsilon\sqrt{\delta}}+\frac{\epsilon n^{2}}{3s\epsilon\sqrt{\delta}}}\right)
2ϵ2δexp(sϵ3δ4).\displaystyle\leq\frac{2}{\epsilon^{2}\delta}\cdot\exp\left(\frac{-s\epsilon^{3}\sqrt{\delta}}{4}\right).

Thus, if we set sclog(1/(ϵδ))ϵ3δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{3}\sqrt{\delta}} for large enough cc, then the probability is bounded above by δ\delta, completing the proof. ∎
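
As a quick numerical sanity check of Lemma 3 (this snippet is our own illustration, not part of the paper's experiments; the test matrix, the planted spectrum, and the values of n, s, eps, and delta are illustrative assumptions), one can sample indices uniformly, form the matrix Lambda_o^{1/2} V_o^T S S^T V_o Lambda_o^{1/2} directly, and compare it to Lambda_o in spectral norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, s, eps, delta = 2000, 400, 0.1, 0.1

# Bounded-entry symmetric matrix with two planted outlying eigenvalues
# (roughly 0.5*n and -0.3*n) plus small random sign noise.
u = np.sign(rng.standard_normal((n, 2))) / np.sqrt(n)
A = 0.5 * n * np.outer(u[:, 0], u[:, 0]) - 0.3 * n * np.outer(u[:, 1], u[:, 1])
noise = 0.05 * rng.choice([-1.0, 1.0], size=(n, n))
A = np.clip(A + (noise + noise.T) / 2, -1.0, 1.0)         # enforce ||A||_inf <= 1

lam, V = np.linalg.eigh(A)
outlying = np.abs(lam) >= eps * np.sqrt(delta) * n         # outlying eigenvalues
Lam_o, V_o = lam[outlying], V[:, outlying]
Lam_half = np.sqrt(Lam_o.astype(complex))                  # square root may be imaginary

S = np.flatnonzero(rng.random(n) < s / n)                  # keep each index w.p. s/n
G = V_o[S].T @ V_o[S]                                      # Gram matrix of sampled rows
M = (n / s) * (Lam_half[:, None] * G * Lam_half[None, :])
err = np.linalg.norm(M - np.diag(Lam_o), 2)
print(f"spectral error {err:.1f} vs. eps*n = {eps * n:.1f}")
```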

We cannot prove an analogous leverage score bound to Lemma 2 for the interior eigenvectors of 𝐀\mathbf{A} appearing in 𝐕m\mathbf{V}_{m}. Thus we cannot apply a matrix Bernstein bound as in Lemma 3. However, we can use Theorem 4 to show that the spectral norm of the random principal submatrix 𝐀m,S\mathbf{A}_{m,S} is not too large, and thus that the eigenvalues of 𝐀S=𝐀o,S+𝐀m,S\mathbf{A}_{S}=\mathbf{A}_{o,S}+\mathbf{A}_{m,S} are close to those of 𝐀o,S\mathbf{A}_{o,S}.

Lemma 4 (Spectral norm bound – sampled middle eigenvalues).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Let 𝐀m\mathbf{A}_{m} be as in Definition 1.1. Let SS be sampled as in Algorithm 1. If sclognϵ2δs\geq\frac{c\log n}{\epsilon^{2}\delta} for some sufficiently large constant cc, then with probability at least 1δ1-\delta, 𝐀m,S2ϵs\|\mathbf{A}_{m,S}\|_{2}\leq\epsilon s.

Proof.

Let 𝐀m=𝐃m+𝐇m\mathbf{A}_{m}=\mathbf{D}_{m}+\mathbf{H}_{m} where 𝐃m\mathbf{D}_{m} is the matrix of diagonal elements and 𝐇m\mathbf{H}_{m} the matrix of off-diagonal elements. Let 𝐒n×|S|\mathbf{S}\in\mathbb{R}^{n\times|S|} be the binary sampling matrix with 𝐀m,S=𝐒T𝐀m𝐒\mathbf{A}_{m,S}=\mathbf{S}^{T}\mathbf{A}_{m}\mathbf{S}. From Theorem 4, we have for some constant CC,

𝔼2[𝐀m,S2]C[logn𝔼2[𝐒T𝐇m𝐒]+slognn𝔼2[𝐇m𝐒12]+sn𝐇m2]+𝔼2[𝐒T𝐃m𝐒].\mathbb{E}_{2}[\|\mathbf{A}_{m,S}\|_{2}]\leq C\bigg{[}\log n\cdot\mathbb{E}_{2}[\|\mathbf{S}^{T}\mathbf{H}_{m}\mathbf{S}\|_{\infty}]+\sqrt{\frac{s\log n}{n}}\mathbb{E}_{2}[\|\mathbf{H}_{m}\mathbf{S}\|_{1\rightarrow 2}]+\frac{s}{n}\|\mathbf{H}_{m}\|_{2}\bigg{]}+\mathbb{E}_{2}[\|\mathbf{S}^{T}\mathbf{D}_{m}\mathbf{S}\|]. (6)

Considering the various terms in (6), we have 𝐒T𝐇m𝐒𝐀m\|\mathbf{S}^{T}\mathbf{H}_{m}\mathbf{S}\|_{\infty}\leq\|\mathbf{A}_{m}\|_{\infty} and 𝐒T𝐃m𝐒2=𝐒T𝐃m𝐒𝐀m\|\mathbf{S}^{T}\mathbf{D}_{m}\mathbf{S}\|_{2}=\|\mathbf{S}^{T}\mathbf{D}_{m}\mathbf{S}\|_{\infty}\leq\|\mathbf{A}_{m}\|_{\infty}. We also have

𝐇m2𝐀m2+𝐃m2𝐀m2+𝐀mϵδ1/2n+𝐀m\|\mathbf{H}_{m}\|_{2}\leq\|\mathbf{A}_{m}\|_{2}+\|\mathbf{D}_{m}\|_{2}\leq\|\mathbf{A}_{m}\|_{2}+\|\mathbf{A}_{m}\|_{\infty}\leq\epsilon\delta^{1/2}n+\|\mathbf{A}_{m}\|_{\infty}

and

𝐇m𝐒12𝐀m𝐒12𝐀m12n.\|\mathbf{H}_{m}\mathbf{S}\|_{1\rightarrow 2}\leq\|\mathbf{A}_{m}\mathbf{S}\|_{1\rightarrow 2}\leq\|\mathbf{A}_{m}\|_{1\rightarrow 2}\leq\sqrt{n}.

The final bound follows since 𝐀m=𝐕m𝐕mT𝐀\mathbf{A}_{m}=\mathbf{V}_{m}\mathbf{V}_{m}^{T}\mathbf{A}, where 𝐕m𝐕mT\mathbf{V}_{m}\mathbf{V}_{m}^{T} is an orthogonal projection matrix. Thus, 𝐀m12𝐀12n\|\mathbf{A}_{m}\|_{1\rightarrow 2}\leq\|\mathbf{A}\|_{1\rightarrow 2}\leq\sqrt{n} by our assumption that 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Plugging all these bounds into (6) we have, for some constant CC,

𝔼2[𝐀m,S2]C[logn𝐀m+logns+sϵδ1/2].\displaystyle\mathbb{E}_{2}[\|\mathbf{A}_{m,S}\|_{2}]\leq C\bigg{[}\log n\cdot\|\mathbf{A}_{m}\|_{\infty}+\sqrt{\log n\cdot s}+s\cdot\epsilon\delta^{1/2}\bigg{]}. (7)

It remains to bound 𝐀m\|\mathbf{A}_{m}\|_{\infty}. We have 𝐀=𝐀m+𝐀o\mathbf{A}=\mathbf{A}_{m}+\mathbf{A}_{o} and thus by triangle inequality,

𝐀m𝐀+𝐀o=1+𝐀o.\displaystyle\|\mathbf{A}_{m}\|_{\infty}\leq\|\mathbf{A}\|_{\infty}+\|\mathbf{A}_{o}\|_{\infty}=1+\|\mathbf{A}_{o}\|_{\infty}. (8)

Writing 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}_{o}=\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T} (see Definition 1.1), and letting 𝐕o,i\mathbf{V}_{o,i} denote the iith row of 𝐕o\mathbf{V}_{o}, the (i,j)(i,j)th element of 𝐀o\mathbf{A}_{o} has magnitude

|𝐀o,i,j|=|𝐕o,i𝚲o𝐕o,jT|𝐕o,i2𝚲o𝐕o,jT2,|\mathbf{A}_{o,i,j}|=|\mathbf{V}_{o,i}\mathbf{\Lambda}_{o}\mathbf{V}^{T}_{o,j}|\leq\|\mathbf{V}_{o,i}\|_{2}\cdot\|\mathbf{\Lambda}_{o}\mathbf{V}^{T}_{o,j}\|_{2},

by Cauchy-Schwarz. From Lemma 2, we have 𝐕o,i21ϵδ1/2n\|\mathbf{V}_{o,i}\|_{2}\leq\frac{1}{\epsilon\delta^{1/2}\sqrt{n}}. Also, from (2), 𝚲o𝐕o,jT2=[𝐀𝐕o]j2n\|\mathbf{\Lambda}_{o}\mathbf{V}^{T}_{o,j}\|_{2}=\|[\mathbf{A}\mathbf{V}_{o}]_{j}\|_{2}\leq\sqrt{n}. Overall, for all i,ji,j we have 𝐀o,i,j1ϵδ1/2nn=1ϵδ1/2\mathbf{A}_{o,i,j}\leq\frac{1}{\epsilon\delta^{1/2}\sqrt{n}}\cdot\sqrt{n}=\frac{1}{\epsilon\delta^{1/2}}, giving 𝐀o1ϵδ1/2\|\mathbf{A}_{o}\|_{\infty}\leq\frac{1}{\epsilon\delta^{1/2}}. Plugging back into (8) and in turn (7), we have for some constant CC,

𝔼2[𝐀m,S2]C[lognϵδ1/2+slogn+sϵδ1/2].\mathbb{E}_{2}[\|\mathbf{A}_{m,S}\|_{2}]\leq C\bigg{[}\frac{\log n}{\epsilon\delta^{1/2}}+\sqrt{s\log n}+s\epsilon\delta^{1/2}\bigg{]}.

Setting sclognϵ2δs\geq\frac{c\log n}{\epsilon^{2}\delta} for sufficiently large cc, all terms in the right hand side of the above equation are bounded by ϵδs\epsilon\sqrt{\delta}s and so

𝔼2[𝐀m,S2]3ϵδs\mathbb{E}_{2}[\|\mathbf{A}_{m,S}\|_{2}]\leq 3\epsilon\sqrt{\delta}s

Thus, by Markov’s inequality, with probability at least 1δ1-\delta, we have 𝐀m,S23ϵs\|\mathbf{A}_{m,S}\|_{2}\leq 3\epsilon s. We can adjust ϵ\epsilon by a constant to obtain the required bound. ∎

3.2 Main Accuracy Bounds

We now restate our main result (Theorem 1), and give its proof via Lemmas 3 and 4.

Proof.

Let 𝐒n×|S|\mathbf{S}\in\mathbb{R}^{n\times|S|} be the binary sampling matrix with a single one in each column such that 𝐒T𝐀𝐒=𝐀S\mathbf{S}^{T}\mathbf{A}\mathbf{S}=\mathbf{A}_{S}. Let 𝐒¯=n/s𝐒\bar{\mathbf{S}}=\sqrt{n/s}\cdot\mathbf{S}. Following Definition 1.1, we write 𝐀=𝐀o+𝐀m\mathbf{A}=\mathbf{A}_{o}+\mathbf{A}_{m}. By Fact 1 we have that the nonzero eigenvalues of ns𝐀o,S=𝐒¯T𝐕o𝚲o𝐕oT𝐒¯\frac{n}{s}\cdot\mathbf{A}_{o,S}=\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T}\bar{\mathbf{S}} are identical to those of 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2} where 𝚲o1/2\mathbf{\Lambda}_{o}^{1/2} is the square root matrix of 𝚲o\mathbf{\Lambda}_{o} such that 𝚲o1/2𝚲o1/2=𝚲o\mathbf{\Lambda}_{o}^{1/2}\mathbf{\Lambda}_{o}^{1/2}=\mathbf{\Lambda}_{o}.

Note that 𝚲o\mathbf{\Lambda}_{o} is Hermitian. However 𝚲o1/2\mathbf{\Lambda}_{o}^{1/2} may be complex, and hence 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2} is not necessarily Hermitian, although it does have real eigenvalues. Thus, we can apply the perturbation bound of Fact 4 to 𝚲o\mathbf{\Lambda}_{o} and 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2} to claim for all i[n]i\in[n], and some constant CC,

|λi(𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2)λi(𝚲o)|Clogn𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2.\lvert\lambda_{i}(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2})-\lambda_{i}(\mathbf{\Lambda}_{o})\rvert\leq C\log n\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}\|_{2}.

By Lemma 3 applied with error ϵ2Clogn\frac{\epsilon}{2C\log n}, with probability at least 1δ1-\delta, for any sclog(1/(ϵδ))log3nϵ3δs\geq\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}} (for a large enough constant cc) we have 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2ϵn2Clogn\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}\|_{2}\leq\frac{\epsilon n}{2C\log n}. Thus, for all ii,

|λi(𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2)λi(𝚲o)|\displaystyle\left|\lambda_{i}(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2})-\lambda_{i}(\mathbf{\Lambda}_{o})\right| <ϵn2.\displaystyle<\frac{\epsilon n}{2}. (9)

We note that the conceptual part of the proof is essentially complete: the nonzero eigenvalues of ns𝐀o,S\frac{n}{s}\cdot\mathbf{A}_{o,S} are identical to those of 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}, which we have shown approximate well those of 𝚲o\mathbf{\Lambda}_{o} and in turn 𝐀o\mathbf{A}_{o}. That is, the non-zero eigenvalues of ns𝐀o,S\frac{n}{s}\cdot\mathbf{A}_{o,S} approximate all outlying eigenvalues of 𝐀\mathbf{A}. It remains to carefully argue how these approximations should be ‘lined up’ given the presence of zero eigenvalues in the spectrum of these matrices. We must also account for the impact of the interior eigenvalues in 𝐀m,S\mathbf{A}_{m,S}, which is limited by the spectral norm bound of Lemma 4.

Eigenvalue alignment and effect of interior eigenvalues. First recall that 𝐀S=𝐀o,S+𝐀m,S\mathbf{A}_{S}=\mathbf{A}_{o,S}+\mathbf{A}_{m,S}. By Lemma 4 applied with error ϵ/2\epsilon/2, we have 𝐀m,S2ϵ/2s\|\mathbf{A}_{m,S}\|_{2}\leq\epsilon/2\cdot s with probability at least 1δ1-\delta when sclognϵ2δs\geq\frac{c\log n}{\epsilon^{2}\delta}. By Weyl’s inequality (Fact 3), for all i[|S|]i\in[|S|] we thus have

|nsλi(𝐀S)nsλi(𝐀o,S)|\displaystyle\left\lvert\frac{n}{s}\lambda_{i}(\mathbf{A}_{S})-\frac{n}{s}\lambda_{i}(\mathbf{A}_{o,S})\right\rvert nsϵs2=ϵn2.\displaystyle\leq\frac{n}{s}\cdot\frac{\epsilon s}{2}=\frac{\epsilon n}{2}. (10)

Consider i[|S|]i\in[|S|] with λi(𝐀o,S)>0\lambda_{i}(\mathbf{A}_{o,S})>0. Since the nonzero eigenvalues of ns𝐀o,S\frac{n}{s}\cdot\mathbf{A}_{o,S} are identical to those of 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}, nsλi(𝐀o,S)=λi(𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2)\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{o,S})=\lambda_{i}(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}), and so by (9),

|nsλi(𝐀o,S)λi(𝚲o)|\displaystyle\left|\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{o,S})-\lambda_{i}(\mathbf{\Lambda}_{o})\right| <ϵn2.\displaystyle<\frac{\epsilon n}{2}. (11)

Analogously, consider i[|S|]i\in[|S|] such that λi(𝐀o,S)<0\lambda_{i}(\mathbf{A}_{o,S})<0. We have nsλi(𝐀o,S)=λr(|S|i)(𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2)\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{o,S})=\lambda_{r-(|S|-i)}(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}), where rr is the dimension of 𝚲o\mathbf{\Lambda}_{o} – i.e., the number of outlying eigenvalues in 𝐀\mathbf{A}. Again by (9) we have

|nsλi(𝐀o,S)λr(|S|i)(𝚲o)|\displaystyle\left|\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{o,S})-\lambda_{r-(|S|-i)}(\mathbf{\Lambda}_{o})\right| <ϵn2.\displaystyle<\frac{\epsilon n}{2}. (12)

Now the nonzero eigenvalues of 𝐀o\mathbf{A}_{o} are identical to those of 𝚲o\mathbf{\Lambda}_{o}. Consider i[|S|]i\in[|S|] such that λi(𝐀S)ϵs\lambda_{i}(\mathbf{A}_{S})\geq\epsilon s. In this case, by (10), (11), and the triangle inequality, we have λi(𝚲o)>0\lambda_{i}(\mathbf{\Lambda}_{o})>0 and thus we have λi(𝚲o)=λi(𝐀o)\lambda_{i}(\mathbf{\Lambda}_{o})=\lambda_{i}(\mathbf{A}_{o}). In turn, again applying (10), (11), and the triangle inequality, we have

|nsλi(𝐀S)λi(𝐀o)||nsλi(𝐀o,S)λi(𝐀o)|+|nsλi(𝐀S)λi(𝐀o,S)|ϵn.\left|\frac{n}{s}\lambda_{i}(\mathbf{A}_{S})-\lambda_{i}(\mathbf{A}_{o})\right|\leq\left|\frac{n}{s}\lambda_{i}(\mathbf{A}_{o,S})-\lambda_{i}(\mathbf{A}_{o})\right|+\left|\frac{n}{s}\lambda_{i}(\mathbf{A}_{S})-\lambda_{i}(\mathbf{A}_{o,S})\right|\leq\epsilon n.

Analogously, for i[|S|]i\in[|S|] such that λi(𝐀S)ϵs\lambda_{i}(\mathbf{A}_{S})\leq-\epsilon s, we have by (10) and (12) that λr(|S|i)(𝚲o)<0\lambda_{r-(|S|-i)}(\mathbf{\Lambda}_{o})<0. Thus λr(|S|i)(𝚲o)=λn(|S|i)(𝐀o)\lambda_{r-(|S|-i)}(\mathbf{\Lambda}_{o})=\lambda_{n-(|S|-i)}(\mathbf{A}_{o}). Again by (10), (12), and the triangle inequality this gives

|nsλi(𝐀S)λn(|S|i)(𝐀o)|ϵn.\left|\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S})-\lambda_{n-(|S|-i)}(\mathbf{A}_{o})\right|\leq\epsilon n.

Now, consider all i[n]i\in[n] such that λi(𝐀o)\lambda_{i}(\mathbf{A}_{o}) is not well approximated by one of the outlying eigenvalues of 𝐀S\mathbf{A}_{S} as argued above. By (10), (11), and (12), all such eigenvalues must have |λi(𝐀o)|2ϵn|\lambda_{i}(\mathbf{A}_{o})|\leq 2\epsilon n. Thus, if we approximate them in any way either by the remaining eigenvalues of 𝐀S\mathbf{A}_{S} with magnitude ϵs\leq\epsilon s, or else by 0, we will approximate all to error at most 3ϵn3\epsilon n. Thus, if (as in Algorithm 1) for i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}_{S})\geq 0, we let λ~i(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}) and for i[|S|]i\in[|S|] with λi(𝐀S)<0\lambda_{i}(\mathbf{A}_{S})<0, let λ~n(|S|i)(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}), and let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0 for all other ii, we will have for all ii,

|λ~i(𝐀)λi(𝐀o)|3ϵn.\displaystyle\left|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}_{o})\right|\leq 3\epsilon n.

Finally by definition, for all ii, |λi(𝐀)λi(𝐀o)|ϵδnϵn|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}_{o})|\leq\epsilon\sqrt{\delta}n\leq\epsilon n and thus, via triangle inequality, |λ~i(𝐀)λi(𝐀)|4ϵn.\left|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})\right|\leq 4\epsilon n. This gives our final error bound after adjusting constants on ϵ\epsilon.

Recall that we require sclog(1/(ϵδ))log3nϵ3δs\geq\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}} for the outer eigenvalue bound of (9) to hold with probability 1δ1-\delta. We require sclognϵ2δs\geq\frac{c\log n}{\epsilon^{2}\delta} for 𝐀m,S2ϵ/2s\|\mathbf{A}_{m,S}\|_{2}\leq\epsilon/2\cdot s to hold with probability 1δ1-\delta by Lemma 4. Thus, for both conditions to hold simultaneously with probability 12δ1-2\delta by a union bound, it suffices to set s=clog(1/(ϵδ))log3nϵ3δmax(clog(1/(ϵδ))log3nϵ3δ,clognϵ2δ)s=\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}{\delta}}\geq\max\left(\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}},\frac{c\log n}{\epsilon^{2}\delta}\right), where we use that log(1/(ϵδ))O(logn)\log(1/(\epsilon\delta))\leq O(\log n), as otherwise our algorithm can take 𝐀S\mathbf{A}_{S} to be the full matrix 𝐀\mathbf{A}. Adjusting δ\delta to δ/2\delta/2 completes the theorem. ∎
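
To make the estimation procedure concrete, the following is a minimal Python sketch (our own illustration, not the authors' implementation; the rank-two test matrix and the values of n and s are assumptions made for the example) of the eigenvalue assignment described above: compute the eigenvalues of a uniformly sampled principal submatrix, scale them by n/s, assign the nonnegative estimates to the top of the spectrum and the negative estimates to the bottom, and set all remaining estimates to zero.

```python
import numpy as np

def estimate_eigenvalues(A, s, rng):
    """Estimate all n eigenvalues of symmetric A from a random principal
    submatrix whose indices are kept independently with probability s/n."""
    n = A.shape[0]
    S = np.flatnonzero(rng.random(n) < s / n)                # uniform index sample
    lam_S = np.sort(np.linalg.eigvalsh(A[np.ix_(S, S)]))[::-1]
    est = np.zeros(n)
    pos, neg = lam_S[lam_S >= 0], lam_S[lam_S < 0]
    est[:len(pos)] = (n / s) * pos                           # align at the top
    if len(neg) > 0:
        est[n - len(neg):] = (n / s) * neg                   # align at the bottom
    return est                                               # descending order

# Example: bounded-entry matrix with one large positive and one large negative eigenvalue.
rng = np.random.default_rng(0)
n, s = 1500, 300
u = np.sign(rng.standard_normal((n, 2))) / np.sqrt(n)
A = np.clip(0.6 * n * np.outer(u[:, 0], u[:, 0]) - 0.4 * n * np.outer(u[:, 1], u[:, 1]), -1, 1)
est = estimate_eigenvalues(A, s, rng)
true = np.sort(np.linalg.eigvalsh(A))[::-1]
print("largest:", round(est[0], 1), "vs", round(true[0], 1))
print("smallest:", round(est[-1], 1), "vs", round(true[-1], 1))
print("max additive error (should be a small fraction of n):", round(np.max(np.abs(est - true)), 1))
```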

Remark: The proof of Lemma 3, and consequently Theorem 1, can be modified to give better bounds for the case when the eigenvalues of 𝐀o\mathbf{A}_{o} lie in a bounded range – between ϵaδn\epsilon^{a}\sqrt{\delta}n and ϵbn\epsilon^{b}n where 0ba10\leq b\leq a\leq 1. See Theorem 9 in Appendix C for details. For example, if all the top eigenvalues are equal, one can show that s=O~(log2nϵ2)s=\tilde{O}\left(\frac{\log^{2}n}{\epsilon^{2}}\right) suffices to give ±ϵn\pm\epsilon n error, nearly matching the lower bound of [BCJ20]. This seems to indicate that improving Theorem 1 in general requires tackling the case when the outlying eigenvalues in 𝚲o\boldsymbol{\Lambda}_{o} have a wide range.

4 Improved Bounds via Sparsity-Based Sampling

We now prove the ±ϵnnz(𝐀)\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} approximation bound of Theorem 2, assuming the ability to sample each row with probability proportional to nnz(𝐀i)nnz(𝐀)\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}. Pseudocode for our algorithm is given in Algorithm 2. Unlike in the uniform sampling case (Algorithm 1), we cannot simply sample a principal submatrix of 𝐀\mathbf{A} and compute its eigenvalues. We must carefully zero out entries lying at the intersection of sparse rows and columns to ensure accuracy of our estimates. A similar approach is taken for the norm-based sampling result of Theorem 3. We defer that proof to Appendix E.

4.1 Preliminary Lemmas

Our first step is to argue that zeroing out entries in sparse rows/columns in step 5 of Algorithm 2 does not introduce significant error. We define 𝐀n×n\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} to be the result of applying this zeroing rule to the full matrix 𝐀\mathbf{A} – i.e., 𝐀ij=0\mathbf{A}^{\prime}_{ij}=0 whenever i=ji=j or nnz(𝐀i)nnz(𝐀j)<ϵ2nnz(𝐀)c2log2n\operatorname{nnz}(\mathbf{A}_{i})\operatorname{nnz}(\mathbf{A}_{j})<\frac{\epsilon^{2}\operatorname{nnz}({\mathbf{A}})}{c_{2}\log^{2}n}. Otherwise 𝐀ij=𝐀ij\mathbf{A}^{\prime}_{ij}=\mathbf{A}_{ij}. We argue via a strengthening of Gershgorin’s theorem that |λi(𝐀)λi(𝐀)|ϵnnz(𝐀)|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime})|\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} for all ii.

After this step is complete, our proof follows the same general outline as that of Theorem 1 in Section 3. We split 𝐀=𝐀o+𝐀m\mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{o}+\mathbf{A}^{\prime}_{m}, arguing that (1) after sampling 𝐀m,S2ϵnnz(𝐀)\|\mathbf{A}_{m,S}^{\prime}\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} and (2) that the eigenvalues of 𝐀o,S\mathbf{A}^{\prime}_{o,S} are ±ϵnnz(𝐀)\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} approximations to those of 𝐀o\mathbf{A}^{\prime}_{o}. In both cases, we critically use that the rescaling factors introduced in line 4 of Algorithm 2 do not introduce too much variance, due to the zeroing out of entries in 𝐀\mathbf{A}^{\prime}.

Algorithm 2 Eigenvalue estimator using sparsity-based sampling
1:  Input: Symmetric 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1, Accuracy ϵ(0,1)\epsilon\in(0,1), failure prob. δ(0,1)\delta\in(0,1). nnz(𝐀i)\operatorname{nnz}(\mathbf{A}_{i}) for all i[n]i\in[n] and nnz(𝐀)\operatorname{nnz}(\mathbf{A}).
2:  Fix s=c1log8nϵ8δ4s=\frac{c_{1}\log^{8}n}{\epsilon^{8}\delta^{4}} where c1c_{1} is a sufficiently large constant.
3:  Add each i[n]i\in[n] to sample set SS independently, with probability pi=min(1,snnz(𝐀i)nnz(𝐀))p_{i}=\min\left(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}\right). Let the principal submatrix of 𝐀\mathbf{A} corresponding to SS be 𝐀S\mathbf{A}_{S}.
4:  Let 𝐀S=𝐃𝐀S𝐃\mathbf{A}_{S}=\mathbf{D}\mathbf{A}_{S}\mathbf{D} where 𝐃|S|×|S|\mathbf{D}\in\mathbb{R}^{|S|\times|S|} is diagonal with 𝐃i,i=1pj\mathbf{D}_{i,i}=\frac{1}{\sqrt{p_{j}}} if the ithi^{th} element of SS is jj.
5:  Construct 𝐀S|S|×|S|\mathbf{A}^{\prime}_{S}\in\mathbb{R}^{|S|\times|S|} from 𝐀S\mathbf{A}_{S} as follows:
[𝐀S]i,j\displaystyle\mathbf{[}\mathbf{A}^{\prime}_{S}]_{i,j} ={0if i=j or nnz(𝐀i)nnz(𝐀j)<ϵ2nnz(𝐀)c2log2n for sufficiently large constant c2[𝐀S]i,jotherwise.\displaystyle=\begin{cases}0&\text{if $i=j$ or }\operatorname{nnz}(\mathbf{A}_{i})\operatorname{nnz}(\mathbf{A}_{j})<\frac{\epsilon^{2}\operatorname{nnz}({\mathbf{A}})}{c_{2}\log^{2}n}\text{ for sufficiently large constant $c_{2}$}\\ [\mathbf{A}_{S}]_{i,j}&\text{otherwise}.\end{cases}
6:  Compute the eigenvalues of 𝐀S\mathbf{A}^{\prime}_{S}: λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}^{\prime}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}^{\prime}_{S}).
7:  For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}^{\prime}_{S})\geq 0, let λ~i(𝐀)=λi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\lambda_{i}(\mathbf{A}^{\prime}_{S}). For all i[|S|]i\in[|S|] with λi(𝐀S)<0\lambda_{i}(\mathbf{A}^{\prime}_{S})<0, let λ~n(|S|i)(𝐀)=λi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\lambda_{i}(\mathbf{A}^{\prime}_{S}). For all remaining i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0.
8:  Return: Eigenvalue estimates λ~1(𝐀)λ~n(𝐀)\tilde{\lambda}_{1}(\mathbf{A})\geq\ldots\geq\tilde{\lambda}_{n}(\mathbf{A}).

Remark: Throughout, we will assume that 𝐀\mathbf{A} does not have any rows/columns that are all 0, as such rows will never be sampled and will have no effect on the output of Algorithm 2. Additionally, we will assume that nnz(𝐀)c1log8nϵ8δ4\operatorname{nnz}(\mathbf{A})\geq\frac{c_{1}\log^{8}n}{\epsilon^{8}\delta^{4}}, as otherwise, 𝐀\mathbf{A} has at most s=c1log8nϵ8δ4s=\frac{c_{1}\log^{8}n}{\epsilon^{8}\delta^{4}} non-zero rows. Thus, rather than running Algorithm 2, we can directly compute the eigenvalues of 𝐀\mathbf{A}.
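
The following Python sketch mirrors the main steps of Algorithm 2 (this is our own illustrative translation, not the authors' code; the planted-block test matrix and the concrete choices of s, eps, and the constant c2 are assumptions, and the theoretical sample size c1*log^8(n)/(eps^8*delta^4) is replaced by a small fixed number so the example runs quickly).

```python
import numpy as np

def sparsity_sampling_estimates(A, s, eps, c2, rng):
    """Sketch of Algorithm 2: sparsity-proportional row sampling, 1/sqrt(p_i p_j)
    rescaling, zeroing of entries at intersections of sparse rows/columns and of
    the diagonal, and the eigenvalue alignment of step 7."""
    n = A.shape[0]
    nnz_row = (A != 0).sum(axis=1)
    nnz_A = (A != 0).sum()
    p = np.minimum(1.0, s * nnz_row / nnz_A)                 # sampling probabilities
    S = np.flatnonzero(rng.random(n) < p)                    # sample set
    D = 1.0 / np.sqrt(p[S])
    A_S = D[:, None] * A[np.ix_(S, S)] * D[None, :]          # rescaled principal submatrix

    thresh = eps ** 2 * nnz_A / (c2 * np.log(n) ** 2)
    sparse_pair = np.outer(nnz_row[S], nnz_row[S]) < thresh
    A_S = np.where(sparse_pair, 0.0, A_S)                    # zero out sparse intersections
    np.fill_diagonal(A_S, 0.0)                               # zero out the diagonal

    lam_S = np.sort(np.linalg.eigvalsh(A_S))[::-1]
    est = np.zeros(n)
    pos, neg = lam_S[lam_S >= 0], lam_S[lam_S < 0]
    est[:len(pos)] = pos                     # no extra n/s factor: D carries the rescaling
    if len(neg) > 0:
        est[n - len(neg):] = neg
    return est

# Example: sparse +-1 matrix with a planted dense block, so row sparsities vary widely
# and the top eigenvalue (about the block size) is well above the eps*sqrt(nnz(A)) scale.
rng = np.random.default_rng(0)
n, b = 1200, 150
noise = (rng.random((n, n)) < 0.01) * rng.choice([-1.0, 1.0], size=(n, n))
A = np.triu(noise, 1)
A[:b, :b] = np.triu(np.ones((b, b)), 1)                      # dense block of +1s
A = A + A.T
est = sparsity_sampling_estimates(A, s=300, eps=0.25, c2=1.0, rng=rng)
true = np.sort(np.linalg.eigvalsh(A))[::-1]
print("top eigenvalue: estimate", round(est[0], 1), "vs truth", round(true[0], 1))
```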

Lemma 5.

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 and nnz(𝐀)2/ϵ2\operatorname{nnz}(\mathbf{A})\geq 2/\epsilon^{2}. Let 𝐀n×n\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} have 𝐀ij=0\mathbf{A}^{\prime}_{ij}=0 if i=ji=j or nnz(𝐀i)nnz(𝐀j)<ϵ2nnz(𝐀)c2log2n\operatorname{nnz}(\mathbf{A}_{i})\cdot\operatorname{nnz}(\mathbf{A}_{j})<\frac{\epsilon^{2}\operatorname{nnz}({\mathbf{A}})}{c_{2}\log^{2}n} for a sufficiently large constant c2c_{2} and 𝐀ij=𝐀ij\mathbf{A}^{\prime}_{ij}=\mathbf{A}_{ij} otherwise. Then, for all i[n]i\in[n],

|λi(𝐀)λi(𝐀)|ϵnnz(𝐀).|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime})|\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}.
Proof.

We consider the matrix 𝐀′′\mathbf{A}^{\prime\prime}, which is defined identically to 𝐀\mathbf{A}^{\prime} except we only set 𝐀ij′′=0\mathbf{A}^{\prime\prime}_{ij}=0 if nnz(𝐀i)nnz(𝐀j)<ϵ2nnz(𝐀)c2log2n\operatorname{nnz}(\mathbf{A}_{i})\cdot\operatorname{nnz}(\mathbf{A}_{j})<\frac{\epsilon^{2}\operatorname{nnz}({\mathbf{A}})}{c_{2}\log^{2}n}. I.e., we do not have the condition requiring setting the diagonal to 0. We will show that |λi(𝐀)λi(𝐀′′)|ϵ/2nnz(𝐀)|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime\prime})|\leq\epsilon/2\cdot\sqrt{\operatorname{nnz}(\mathbf{A})}. By Weyl’s inequality, and the assumption that nnz(𝐀)2/ϵ2\operatorname{nnz}(\mathbf{A})\geq 2/\epsilon^{2}, we then have |λi(𝐀)λi(𝐀)|ϵ/2nnz(𝐀)+1ϵnnz(𝐀)|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime})|\leq\epsilon/2\cdot\sqrt{\operatorname{nnz}(\mathbf{A})}+1\leq\epsilon\cdot\sqrt{\operatorname{nnz}(\mathbf{A})} as required.

Let k[n]\mathcal{I}_{k}\subset[n] be the set of rows/columns with nnz(𝐀i)[nnz(𝐀)2k,nnz(𝐀)2k1)\operatorname{nnz}(\mathbf{A}_{i})\in\left[\frac{\operatorname{nnz}(\mathbf{A})}{2^{k}},\frac{\operatorname{nnz}(\mathbf{A})}{2^{k-1}}\right) and 𝐀kl=𝐀(k,l)\mathbf{A}_{kl}=\mathbf{A}(\mathcal{I}_{k},\mathcal{I}_{l}) be the submatrix of 𝐀\mathbf{A} formed with rows in k\mathcal{I}_{k} and columns in l\mathcal{I}_{l}. Define 𝐀kl′′\mathbf{A}^{\prime\prime}_{kl} in the same way and observe that 𝐀kl′′=𝐀kl\mathbf{A}^{\prime\prime}_{kl}=\mathbf{A}_{kl} whenever 2k+lc2nnz(𝐀)log2nϵ22^{k+l}\leq\frac{c_{2}\operatorname{nnz}(\mathbf{A})\log^{2}n}{\epsilon^{2}}.

When 2k+l>c2nnz(𝐀)log2nϵ22^{k+l}>\frac{c_{2}\operatorname{nnz}(\mathbf{A})\log^{2}n}{\epsilon^{2}}, we may zero out some entries of 𝐀kl\mathbf{A}_{kl} to produce 𝐀kl′′\mathbf{A}_{kl}^{\prime\prime}. Let 𝐀^kl\mathbf{\widehat{A}}_{kl} be equal to 𝐀kl\mathbf{A}_{kl} on this set of zeroed out entries, and 0 everywhere else. Observe that (𝐀^kl𝐀^klT)m,:=(𝐀^kl)m,:𝐀^klT(\mathbf{\widehat{A}}_{kl}\mathbf{\widehat{A}}_{kl}^{T})_{m,:}=(\mathbf{\widehat{A}}_{kl})_{m,:}\mathbf{\widehat{A}}_{kl}^{T}. Next observe that (𝐀^kl)m,:(\mathbf{\widehat{A}}_{kl})_{m,:} has at most nnz(𝐀m)nnz(𝐀)2k1\operatorname{nnz}(\mathbf{A}_{m})\leq\frac{\operatorname{nnz}(\mathbf{A})}{2^{k-1}} non-zero entries. Similarly, each row of 𝐀^klT\mathbf{\widehat{A}}_{kl}^{T} has at most nnz(𝐀)2l1\frac{\operatorname{nnz}(\mathbf{A})}{2^{l-1}} non-zero elements. Thus, for all m|k|m\in|\mathcal{I}_{k}|, using that 𝐀1\|\mathbf{A}\|_{\infty}\leq 1,

(𝐀^kl𝐀^klT)m,:1nnz(𝐀)22k+l2=4nnz(𝐀)22k+l.\displaystyle\|(\mathbf{\widehat{A}}_{kl}\mathbf{\widehat{A}}_{kl}^{T})_{m,:}\|_{1}\leq\frac{\operatorname{nnz}(\mathbf{A})^{2}}{2^{k+l-2}}=\frac{4\operatorname{nnz}(\mathbf{A})^{2}}{2^{k+l}}.

Applying Gershgorin’s circle theorem (Theorem 2) we thus have:

𝐀^kl22=𝐀^kl𝐀^klT2maxm(𝐀^kl𝐀^klT)m,:14nnz(𝐀)22k+l.\displaystyle\|\mathbf{\widehat{A}}_{kl}\|_{2}^{2}=\|\mathbf{\widehat{A}}_{kl}\mathbf{\widehat{A}}_{kl}^{T}\|_{2}\leq\max_{m}\|(\mathbf{\widehat{A}}_{kl}\mathbf{\widehat{A}}_{kl}^{T})_{m,:}\|_{1}\leq\frac{4\operatorname{nnz}(\mathbf{A})^{2}}{2^{k+l}}. (13)

Let 𝐀¯kln×n\bar{\mathbf{A}}_{kl}\in\mathbb{R}^{n\times n} be a symmetric matrix such that 𝐀¯kl(k,l)=𝐀^kl\bar{\mathbf{A}}_{kl}(\mathcal{I}_{k},\mathcal{I}_{l})=\mathbf{\widehat{A}}_{kl}, 𝐀¯kl(l,k)=𝐀^lk\bar{\mathbf{A}}_{kl}(\mathcal{I}_{l},\mathcal{I}_{k})=\mathbf{\widehat{A}}_{lk}, and 𝐀¯kl\bar{\mathbf{A}}_{kl} is zero everywhere else. By triangle inequality and the bound of (13),

𝐀¯kl2𝐀^kl2+𝐀^lk24nnz(𝐀)2(k+l)/2.\displaystyle\|\bar{\mathbf{A}}_{kl}\|_{2}\leq\|\mathbf{\widehat{A}}_{kl}\|_{2}+\|\mathbf{\widehat{A}}_{lk}\|_{2}\leq\frac{4\operatorname{nnz}(\mathbf{A})}{2^{(k+l)/2}}.

Observe that, since we assume all rows have at least one non-zero entry, nnz(𝐀i)1\operatorname{nnz}(\mathbf{A}_{i})\geq 1 and nnz(𝐀)n2\operatorname{nnz}(\mathbf{A})\leq n^{2}. Therefore, k,lk,l can range from 11 to log(n2)=2logn\log(n^{2})=2\log n. By triangle inequality,

𝐀𝐀′′2\displaystyle\|\mathbf{A}-\mathbf{A}^{\prime\prime}\|_{2} (k,l):2k+l>c2nnz(𝐀)log2nϵ2𝐀¯kl\displaystyle\leq\sum_{(k,l):2^{k+l}>\frac{c_{2}\operatorname{nnz}(\mathbf{A})\log^{2}n}{\epsilon^{2}}}\|\mathbf{\bar{A}}_{kl}\|
k=12logn4ϵnnz(𝐀)c2logni=12logn12i1\displaystyle\leq\sum_{k=1}^{2\log n}\frac{4\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}}{\sqrt{c_{2}}\cdot\log n}\cdot\sum_{i=1}^{2\log n}\frac{1}{2^{i-1}}
16ϵnnz(𝐀)c2.\displaystyle\leq\frac{16\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}}{\sqrt{c_{2}}}.

Finally, setting c2c_{2} large enough and using Weyl’s inequality (Fact 3) we have the required bound:

|λi(𝐀)λi(𝐀′′)|\displaystyle|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime\prime})| ϵ/2nnz(𝐀).\displaystyle\leq\epsilon/2\sqrt{\operatorname{nnz}(\mathbf{A})}.
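
As a small self-contained illustration of this zeroing step (our own example, not from the paper; the matrix, the value of eps, and especially the choice c2 = 1 are assumptions made so that the zeroing visibly fires on a small instance, whereas the lemma itself asks for a sufficiently large c2), one can build A', count how many entries are removed, and compare the sorted eigenvalues of A and A' against the eps*sqrt(nnz(A)) bound:

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, c2 = 1000, 0.5, 1.0
# A few dense rows/columns and many very sparse ones, entries in {-1, 0, +1}.
p_row = np.where(np.arange(n) < 30, 0.3, 0.004)
mask = rng.random((n, n)) < np.sqrt(np.outer(p_row, p_row))
A = np.triu(mask * rng.choice([-1.0, 1.0], size=(n, n)), 1)
A = A + A.T                                                  # symmetric, ||A||_inf <= 1

nnz_row = (A != 0).sum(axis=1)
nnz_A = (A != 0).sum()
thresh = eps ** 2 * nnz_A / (c2 * np.log(n) ** 2)
A_prime = np.where(np.outer(nnz_row, nnz_row) < thresh, 0.0, A)   # zero sparse intersections
np.fill_diagonal(A_prime, 0.0)                                    # zero the diagonal

shift = np.max(np.abs(np.linalg.eigvalsh(A) - np.linalg.eigvalsh(A_prime)))
print(f"zeroed {int(nnz_A - (A_prime != 0).sum())} entries;"
      f" max eigenvalue shift {shift:.2f} vs eps*sqrt(nnz(A)) = {eps * np.sqrt(nnz_A):.2f}")
```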

We next give a bound on the coherence of the outlying eigenvectors of 𝐀\mathbf{A}^{\prime}. This bound is analogous to Lemma 2, but is more refined, taking into account the sparsity of each row.

Lemma 6 (Incoherence of outlying eigenvectors in terms of sparsity).

Let 𝐀,𝐀n×n\mathbf{A},\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} be as in Lemma 5. Let 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{V}_{o}^{{}^{\prime}T} where 𝚲o\mathbf{\Lambda}^{\prime}_{o} is diagonal, with the eigenvalues of 𝐀\mathbf{A}^{\prime} with magnitude ϵδnnz(𝐀)\geq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} on its diagonal, and 𝐕o\mathbf{V}^{\prime}_{o} has columns equal to the corresponding eigenvectors. Let 𝐕o,i\mathbf{V}^{\prime}_{o,i} denote the iith row of 𝐕o\mathbf{V}^{\prime}_{o}. Then,

𝚲o1/2𝐕o,i22nnz(𝐀i)ϵδnnz(𝐀)and𝐕o,i22nnz(𝐀i)ϵ2δnnz(𝐀).\displaystyle\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}^{\prime}_{o,i}\|_{2}^{2}\leq\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}}\hskip 10.00002ptand\hskip 10.00002pt\|\mathbf{V}^{\prime}_{o,i}\|^{2}_{2}\leq\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\epsilon^{2}\delta\operatorname{nnz}(\mathbf{A})}.
Proof.

The proof is nearly identical to that of Lemma 2. Observe that 𝐀𝐕o=𝐕o𝚲o\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}. Letting [𝐀𝐕o]i[\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}]_{i} denote the iith row of the 𝐀𝐕o\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}, we have

[𝐀𝐕o]i22=[𝐕o𝚲o]i22=j=1rλj2𝐕o,i,j2,\|[\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}]_{i}\|_{2}^{2}=\|[\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}]_{i}\|_{2}^{2}=\sum_{j=1}^{r}\lambda_{j}^{2}\cdot\mathbf{V}_{o,i,j}^{{}^{\prime}2}, (14)

where r=rank(𝐀o)r=\operatorname{rank}(\mathbf{A}^{\prime}_{o}), 𝐕o,i,j\mathbf{V}^{\prime}_{o,i,j} is the (i,j)(i,j)th element of 𝐕o\mathbf{V}^{\prime}_{o} and λj=𝚲o(j,j)\lambda_{j}=\mathbf{\Lambda}^{\prime}_{o}(j,j). Since 𝐕o\mathbf{V}^{\prime}_{o} has orthonormal columns, we thus have [𝐀𝐕o]i22𝐀i22𝐀i22nnz(𝐀i)\|[\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}]_{i}\|_{2}^{2}\leq\|\mathbf{A}^{\prime}_{i}\|_{2}^{2}\leq\|\mathbf{A}_{i}\|_{2}^{2}\leq\operatorname{nnz}(\mathbf{A}_{i}). Therefore, by (14),

j=1rλj2𝐕o,i,j2nnz(𝐀i).\sum_{j=1}^{r}\lambda_{j}^{2}\cdot\mathbf{V}_{o,i,j}^{{}^{\prime}2}\leq\operatorname{nnz}(\mathbf{A}_{i}). (15)

Since by definition |λj|ϵδnnz(𝐀)\lvert\lambda_{j}\rvert\geq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} for all jj, we can conclude that 𝚲o1/2𝐕o,i22=j=1r|λj|𝐕o,i,j2nnz(𝐀i)ϵδnnz(𝐀)\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}^{\prime}_{o,i}\|_{2}^{2}=\sum_{j=1}^{r}\lvert\lambda_{j}\rvert\cdot\mathbf{V}_{o,i,j}^{{}^{\prime}2}\leq\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}} and 𝐕o,i22=j=1r𝐕o,i,j2nnz(𝐀i)ϵ2δnnz(𝐀)\|\mathbf{V}^{\prime}_{o,i}\|_{2}^{2}=\sum_{j=1}^{r}\mathbf{V}_{o,i,j}^{{}^{\prime}2}\leq\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\epsilon^{2}\delta\operatorname{nnz}(\mathbf{A})}, which completes the lemma. ∎

4.2 Outer and Middle Eigenvalue Bounds

Using Lemma 6, we next argue that the eigenvalues of 𝐀o,S\mathbf{A}_{o,S}^{\prime} will approximate those of 𝐀\mathbf{A}^{\prime}, and in turn those of 𝐀\mathbf{A}. The proof is very similar to Lemma 3 in the uniform sampling case.

Lemma 7 (Concentration of outlying eigenvalues with sparsity-based sampling).

Let 𝐀,𝐀n×n\mathbf{A},\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} be as in Lemmas 5 and 6. Let 𝐀=𝐀m+𝐀o\mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{m}+\mathbf{A}^{\prime}_{o}, where 𝐀m=𝐕m𝚲m𝐕mT\mathbf{A}^{\prime}_{m}=\mathbf{V}^{\prime}_{m}\mathbf{\Lambda}^{\prime}_{m}\mathbf{\mathbf{V}^{\prime}}_{m}^{T}, and 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{\mathbf{V}^{\prime}}_{o}^{T} are projections onto the eigenspaces with magnitude <ϵδnnz(𝐀)<\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} and ϵδnnz(𝐀)\geq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} respectively (analogous to Definition 1.1). As in Algorithm 2, for all i[n]i\in[n] let pi=min(1,snnz(𝐀i)nnz(𝐀))p_{i}=\min\left(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}\right) and let 𝐒¯\bar{\mathbf{S}} be a scaled diagonal sampling matrix such that 𝐒¯ii=1pi\bar{\mathbf{S}}_{ii}=\frac{1}{\sqrt{p_{i}}} with probability pip_{i} and 𝐒¯ii=0\bar{\mathbf{S}}_{ii}=0 otherwise. If sclog(1/(ϵδ))ϵ3δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{3}\sqrt{\delta}} for a large enough constant cc, then with probability at least 1δ1-\delta, 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2ϵnnz(𝐀)\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}-\mathbf{\Lambda}^{\prime}_{o}\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}.

Proof.

Define 𝐄=𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o\mathbf{E}=\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}-\mathbf{\Lambda}^{\prime}_{o}. For all i[n]i\in[n], let 𝐕o,i\mathbf{V}_{o,i} be the ithi^{th} row of 𝐕o\mathbf{V}^{\prime}_{o} and define the matrix valued random variable

𝐘i={1pi𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2,with probability pi0otherwise.\displaystyle\mathbf{Y}_{i}=\begin{cases}\frac{1}{p_{i}}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}^{\prime}_{o,i}\mathbf{V}_{o,i}^{{}^{\prime}T}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2},&\text{with probability }p_{i}\\ 0&\text{otherwise.}\end{cases} (16)

Define 𝐐i=𝐘i𝔼[𝐘i]\mathbf{Q}_{i}=\mathbf{Y}_{i}-\mathbb{E}\left[\mathbf{Y}_{i}\right]. We can observe that 𝐐1,𝐐2,,𝐐n\mathbf{Q}_{1},\mathbf{Q}_{2},\ldots,\mathbf{Q}_{n} are independent random variables and that i=1n𝐐i=𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o=𝐄\sum_{i=1}^{n}\mathbf{Q}_{i}=\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}-\mathbf{\Lambda}^{\prime}_{o}=\mathbf{E}. Let P={i[n]:pi<1}P=\{i\in[n]:p_{i}<1\}. Then, observe that i[n]P𝐐i=0\sum_{i\in[n]\setminus P}\mathbf{Q}_{i}=0. So, 𝐄=iP𝐐i\mathbf{E}=\sum_{i\in P}\mathbf{Q}_{i}. Then, similar to the proof of Lemma 3, we need to bound 𝐐i2\|\mathbf{Q}_{i}\|_{2} for all iPi\in P and 𝐕𝐚𝐫(𝐄)=def𝔼(𝐄𝐄T)=𝔼(𝐄T𝐄)=iP𝔼[𝐐i2]\mathbf{Var}(\mathbf{E})\mathbin{\stackrel{{\scriptstyle\rm def}}{{=}}}\mathbb{E}(\mathbf{EE}^{T})=\mathbb{E}(\mathbf{E}^{T}\mathbf{E})=\sum_{i\in P}\mathbb{E}[\mathbf{Q}_{i}^{2}] using the improved row norm bounds of Lemma 6. In particular, we have

iP𝔼[𝐐i2]\displaystyle\sum_{i\in P}\mathbb{E}[\mathbf{Q}_{i}^{2}] =iP[pi(1pi1)2+(1pi)](𝚲o1/2𝐕o,i𝐕o,iT𝚲o𝐕o,i𝐕o,iT𝚲o1/2)\displaystyle=\sum_{i\in P}\left[p_{i}\cdot\left(\frac{1}{p_{i}}-1\right)^{2}+\left(1-p_{i}\right)\right]\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})
iP1pi𝚲o1/2𝐕o,i22(𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2).\displaystyle\preceq\sum_{i\in P}\frac{1}{p_{i}}\cdot\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2}). (17)

By Lemma 6, 𝚲o1/2𝐕o,i22nnz(𝐀i)ϵδnnz(𝐀)\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}}. Plugging back into (17),

iP𝔼[𝐐i2]\displaystyle\sum_{i\in P}\mathbb{E}[\mathbf{Q}_{i}^{2}] iP1pinnz(𝐀i)ϵδnnz(𝐀)(𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2)\displaystyle\preceq\sum_{i\in P}\frac{1}{p_{i}}\cdot\frac{\operatorname{nnz}(\mathbf{A}_{i})}{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}}\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})
nnz(𝐀)sϵδ(iPΛo1/2𝐕o,i𝐕o,iT𝚲o1/2)\displaystyle\preceq\frac{\sqrt{\operatorname{nnz}(\mathbf{A})}}{s\epsilon\sqrt{\delta}}(\sum_{i\in P}\Lambda_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})
=nnz(𝐀)sϵδ𝚲onnz(𝐀)sϵδ𝐈.\displaystyle=\frac{\sqrt{\operatorname{nnz}(\mathbf{A})}}{s\epsilon\sqrt{\delta}}\mathbf{\Lambda}_{o}\preceq\frac{\operatorname{nnz}(\mathbf{A})}{s\epsilon\sqrt{\delta}}\cdot\mathbf{I}.

Since 𝐐i2\mathbf{Q}_{i}^{2} is PSD, this establishes that Var(E)2nnz(𝐀)sϵδ\|\textbf{Var(E)}\|_{2}\leq\frac{\operatorname{nnz}(\mathbf{A})}{s\epsilon\sqrt{\delta}}, so we may take v=nnz(𝐀)sϵδv=\frac{\operatorname{nnz}(\mathbf{A})}{s\epsilon\sqrt{\delta}}. Since there are at most nnz(𝐀)δϵ2nnz(𝐀)=1ϵ2δ\frac{\operatorname{nnz}(\mathbf{A})}{\delta\epsilon^{2}\operatorname{nnz}(\mathbf{A})}=\frac{1}{\epsilon^{2}\delta} eigenvalues with absolute value ϵδnnz(𝐀)\geq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}, we can apply the matrix Bernstein inequality exactly as in the proof of Lemma 3 with d=1ϵ2δd=\frac{1}{\epsilon^{2}\delta} to show that when sclog(1/(ϵδ))ϵ3δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{3}\sqrt{\delta}} for large enough cc, with probability at least 1δ1-\delta, 𝐄2ϵnnz(𝐀)\left\|\mathbf{E}\right\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}. ∎

We next bound the spectral norm of 𝐀m,S\mathbf{A}^{\prime}_{m,S}. This is the most challenging part of the proof – the rows of this matrix are sampled non-uniformly and scaled in proportion to their inverse sampling probabilities, so we cannot apply existing bounds on the spectral norms of uniformly sampled random submatrices [RV07]. We extend these bounds to the non-uniform case, critically using that entries which would be scaled up significantly after sampling (i.e., those lying in sparse rows/columns) have already been set to 0 in 𝐀m,S\mathbf{A}^{\prime}_{m,S}, and thus do not contribute to the spectral norm.

Lemma 8 (Concentration of middle eigenvalues with sparsity-based sampling).

Let 𝐀,𝐀n×n\mathbf{A},\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} be as in Lemmas 5 and 6. Let 𝐀=𝐀m+𝐀o\mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{m}+\mathbf{A}^{\prime}_{o}, where 𝐀m=𝐕m𝚲m𝐕mT\mathbf{A}^{\prime}_{m}=\mathbf{V}^{\prime}_{m}\mathbf{\Lambda}^{\prime}_{m}\mathbf{\mathbf{V}^{\prime}}_{m}^{T}, and 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{\mathbf{V}^{\prime}}_{o}^{T} are projections onto the eigenspaces with magnitude <ϵδnnz(𝐀)<\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} and ϵδnnz(𝐀)\geq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} respectively (analogous to Definition 1.1). As in Algorithm 2, for all i[n]i\in[n] let pi=min(1,snnz(𝐀i)nnz(𝐀))p_{i}=\min\left(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}\right) and let 𝐒¯\bar{\mathbf{S}} be a scaled diagonal sampling matrix such that the 𝐒¯ii=1pi\bar{\mathbf{S}}_{ii}=\frac{1}{\sqrt{p_{i}}} with probability pip_{i} and 𝐒¯ii=0\bar{\mathbf{S}}_{ii}=0 otherwise. If sclog8nϵ8δ4s\geq\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}} for a large enough constant cc, then with probability at least 1δ1-\delta,

𝐒¯𝐀m𝐒¯2ϵnnz(𝐀).\|\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m}\bar{\mathbf{S}}\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}.
Proof.

The initial part of the proof follows the outline of the proof of the spectral norm bound for uniformly random submatrices (Theorem 4) of [Tro08a]. From Lemma 6, we have 𝐕o,i2nnz(𝐀i)ϵδnnz(𝐀)\|{\mathbf{V}^{\prime}}_{o,i}\|_{2}\leq\frac{\sqrt{\operatorname{nnz}({\mathbf{A}}_{i})}}{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}}. Also, following the proof of Lemma 6, we have 𝚲o𝐕o,jT2=[𝐀𝐕o]j2nnz(𝐀j)\|{\mathbf{\Lambda}^{\prime}}_{o}{\mathbf{V}^{\prime}}^{T}_{o,j}\|_{2}=\|[{\mathbf{A}^{\prime}}{\mathbf{V}^{\prime}}_{o}]_{j}\|_{2}\leq\sqrt{\operatorname{nnz}({\mathbf{A}}_{j})}. Thus, for all i,j[n]i,j\in[n], using the Cauchy-Schwarz inequality, we have

|𝐀o,i,j|=|𝐕o,i𝚲o𝐕o,jT|𝐕o,i2𝚲o𝐕o,jT2nnz(𝐀i)ϵδnnz(𝐀)nnz(𝐀j).\displaystyle|{\mathbf{A}^{\prime}}_{o,i,j}|=|{\mathbf{V}^{\prime}}_{o,i}{\mathbf{\Lambda}^{\prime}}_{o}{\mathbf{V}^{\prime}}_{o,j}^{T}|\leq\|{\mathbf{V}^{\prime}}_{o,i}\|_{2}\cdot\|{\mathbf{\Lambda}^{\prime}}_{o}{\mathbf{V}^{\prime}}_{o,j}^{T}\|_{2}\leq\frac{\sqrt{\operatorname{nnz}({\mathbf{A}}_{i})}}{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}}\cdot\sqrt{\operatorname{nnz}({\mathbf{A}}_{j})}. (18)

Let 𝐀m=𝐇m+𝐃m{\mathbf{A}^{\prime}}_{m}=\mathbf{H}_{m}+\mathbf{D}_{m} where 𝐇m\mathbf{H}_{m} and 𝐃m\mathbf{D}_{m} contain the off-diagonal and diagonal elements of 𝐀m\mathbf{A}^{\prime}_{m} respectively. Note that while 𝐀\mathbf{A}^{\prime} is zero on the diagonal, 𝐀m\mathbf{A}^{\prime}_{m} may not be. We have:

𝔼2𝐒¯𝐀m𝐒¯2𝔼2𝐒¯𝐇m𝐒¯2+𝔼2𝐒¯𝐃m𝐒¯2.\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}\leq\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\bar{\mathbf{S}}\|_{2}+\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}.

Using Lemma 1 (decoupling) on 𝔼2𝐒¯𝐇m𝐒¯2\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\bar{\mathbf{S}}\|_{2}, we get

𝔼2𝐒¯𝐀m𝐒¯22𝔼2𝐒¯𝐇m𝐒^2+𝔼2𝐒¯𝐃m𝐒¯2,\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}\leq 2\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{2}+\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}, (19)

where 𝐒^\hat{\mathbf{S}} is an independent copy of 𝐒¯\bar{\mathbf{S}}. Upper bounding the rank of 𝐇m\mathbf{H}_{m} as nn and applying Theorem 5 twice to 𝔼2𝐒¯𝐇m𝐒^2\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{2}, once for each operator, we get

𝔼2𝐒¯𝐇m𝐒^2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{2} 5logn𝔼2𝐒¯𝐇m𝐒^12+𝔼2𝐒^𝐇m2\displaystyle\leq 5\sqrt{\log n}\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+\mathbb{E}_{2}\|\hat{\mathbf{S}}\mathbf{H}_{m}\|_{2}
5logn𝔼2𝐒¯𝐇m𝐒^12+5logn𝔼2𝐇m𝐒^12+𝐇m2.\displaystyle\leq 5\sqrt{\log n}\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+5\sqrt{\log n}\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+\|\mathbf{H}_{m}\|_{2}. (20)

Plugging (20) into (19), we have:

𝔼2𝐒¯𝐀m𝐒¯210logn(𝔼2𝐒¯𝐇m𝐒^12+𝔼2𝐇m𝐒^12)+2𝐇m2+𝔼2𝐒¯𝐃m𝐒¯2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}\leq 10\sqrt{\log n}\left(\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\right)+2\|\mathbf{H}_{m}\|_{2}+\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2} (21)

We now proceed to bound each of the terms on the right hand side of (21). We start with 𝔼2𝐒¯𝐃m𝐒¯2\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}. First, observe that 𝔼2𝐒¯𝐃m𝐒¯2maxi1pi|(𝐃m)ii|\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}\leq\max_{i}\frac{1}{p_{i}}\lvert(\mathbf{D}_{m})_{ii}\rvert. We consider two cases.

Case 1: pi<1p_{i}<1. Then, pi=snnz(𝐀i)nnz(𝐀)p_{i}=\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})} and |(𝐃m)ii|=|(𝐀m)ii|=|(𝐀o)ii|\lvert(\mathbf{D}_{m})_{ii}\rvert=\lvert({\mathbf{A}^{\prime}}_{m})_{ii}\rvert=\lvert(\mathbf{A}^{\prime}_{o})_{ii}\rvert (since 𝐀ii=0\mathbf{A}^{\prime}_{ii}=0). Then by (18), we have 1pi|(𝐃m)ii|nnz(𝐀)sϵδ\frac{1}{p_{i}}\lvert(\mathbf{D}_{m})_{ii}\rvert\leq\frac{\sqrt{\operatorname{nnz}(\mathbf{A})}}{s\epsilon\sqrt{\delta}}.

Case 2: pi=1p_{i}=1. Then we have 1pi|(𝐃m)ii|=|(𝐃m)ii|maxj|(𝐃m)jj|𝐀m2ϵδnnz(𝐀)\frac{1}{p_{i}}\lvert(\mathbf{D}_{m})_{ii}\rvert=\lvert(\mathbf{D}_{m})_{ii}\rvert\leq\max_{j}\lvert(\mathbf{D}_{m})_{jj}\rvert\leq\|\mathbf{A}^{\prime}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}.
From the two cases above, for s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta}, we have:

𝔼2𝐒¯𝐃m𝐒¯2ϵδnnz(𝐀).\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}. (22)

We can bound 𝐇m2\|\mathbf{H}_{m}\|_{2} similarly. Since 𝐇m=𝐀m𝐃m\mathbf{H}_{m}={\mathbf{A}^{\prime}}_{m}-\mathbf{D}_{m} and 𝐀m2ϵδnnz(𝐀)\|{\mathbf{A}^{\prime}}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})},

𝐇m2\displaystyle\|\mathbf{H}_{m}\|_{2} 𝐀m2+𝐃m2\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m}\|_{2}+\|\mathbf{D}_{m}\|_{2}
ϵδnnz(𝐀)+ϵδnnz(𝐀)\displaystyle\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}+\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}
=2ϵδnnz(𝐀)\displaystyle=2\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})} (23)

where the second step follows from the fact that 𝐃m2maxi|(𝐃m)ii|𝐀m2\|\mathbf{D}_{m}\|_{2}\leq\max_{i}\lvert(\mathbf{D}_{m})_{ii}\rvert\leq\|\mathbf{A}^{\prime}_{m}\|_{2}.

We next bound the term 𝔼2𝐇m𝐒^12\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}. Observe that 𝔼2𝐇m𝐒^12maxi𝐀m,i2pi\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\leq\frac{\max_{i}\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}}{\sqrt{p_{i}}}, where 𝐀m,i\mathbf{A^{\prime}}_{m,i} is the iith column/row of 𝐀m\mathbf{A}^{\prime}_{m}. We again consider the two cases when pi=1p_{i}=1 and pi<1p_{i}<1:

Case 1: pi=1p_{i}=1. Then 𝐀m,i2𝐀m2ϵδnnz(𝐀)\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}\leq\|{\mathbf{A}^{\prime}}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}.

Case 2: pi<1p_{i}<1. Then 𝐀m,i2𝐀i2nnz(𝐀i)\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}\leq\|{\mathbf{A}^{\prime}}_{i}\|_{2}\leq\sqrt{\operatorname{nnz}({\mathbf{A}}_{i})}. Thus, setting s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta} we have:

𝐀m,i2pi\displaystyle\frac{\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}}{\sqrt{p_{i}}} nnz(𝐀)snnz(𝐀i)𝐀i2\displaystyle\leq\sqrt{\frac{\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{i})}}\cdot\|{\mathbf{A}^{\prime}}_{i}\|_{2}
nnz(𝐀)sϵδnnz(𝐀).\displaystyle\leq\sqrt{\frac{\operatorname{nnz}({\mathbf{A}})}{s}}\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}.

Thus, from the two cases above, for all i[n]i\in[n], adjusting ϵ\epsilon by a 1logn\frac{1}{\sqrt{\log n}} factor, we have for slognϵ2δs\geq\frac{\log n}{\epsilon^{2}\delta}:

𝔼2𝐇m𝐒^12\displaystyle\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2} ϵδnnz(𝐀)logn.\displaystyle\leq\frac{\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}}{\sqrt{\log n}}. (24)

Overall, plugging (22), (23), and (24) back into (21), we have :

𝔼2𝐒¯𝐀m𝐒¯210logn𝔼2𝐒¯𝐇m𝐒^12+15ϵδnnz(𝐀).\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}\leq 10\sqrt{\log n}\cdot\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+15\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}. (25)

It remains to bound 𝔼2𝐒¯𝐇m𝐒^12\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}, which is the most complex part of the proof. Since 𝐒^\hat{\mathbf{S}} is an independent copy of 𝐒¯\bar{\mathbf{S}}, we denote the norm of the iith column of 𝐒¯𝐇m𝐒^\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}} as (𝐒¯𝐇m):,i2pi\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}. Then 𝔼2𝐒¯𝐇m𝐒^12𝔼2(maxi:i[n](𝐒¯𝐇m):,i2pi)\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\leq\mathbb{E}_{2}\left(\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right). We will argue that maxi:i[n](𝐒¯𝐇m):,i2pi\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}} is bounded by ϵδnnz(𝐀)\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})} with probability 11/poly(n)1-1/\operatorname{poly}(n). Since our sampling probabilities are all at least 1/n21/n^{2} and since 𝐇mF𝐀Fn\|\mathbf{H}_{m}\|_{F}\leq\|\mathbf{A}\|_{F}\leq n, this value is also deterministically bounded by n2n^{2}. Thus, our high probability bound implies the needed bound on 𝔼2(maxi:i[n](𝐒¯𝐇m):,i2pi)\mathbb{E}_{2}\left(\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right).

We begin by observing that since 𝐀m=𝐇m+𝐃m{\mathbf{A}^{\prime}}_{m}=\mathbf{H}_{m}+\mathbf{D}_{m}, (𝐒¯𝐀m):,i2(𝐒¯𝐇m):,i2\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}\geq\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}, and so to bound maxi:i[n](𝐒¯𝐇m):,i2pi\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}, it suffices to bound (𝐒¯𝐀m):,i2pi\frac{\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}} for all i[n]i\in[n]. Towards this end, for a fixed ii and any j[n]j\in[n], define

zj\displaystyle z_{j} ={1pj|𝐀m,i,j|2with probability pj0otherwise.\displaystyle=\begin{cases}\frac{1}{p_{j}}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}&\text{with probability $p_{j}$}\\ 0&\text{otherwise}.\end{cases}

Then j=1nzj=(𝐒¯𝐀m):,i22\sum_{j=1}^{n}z_{j}=\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}^{2} and 𝔼[j=1nzj]=𝐀m,i22𝐀i22nnz(𝐀i)\mathbb{E}[\sum_{j=1}^{n}z_{j}]=\|\mathbf{A}^{\prime}_{m,i}\|_{2}^{2}\leq\|\mathbf{A}^{\prime}_{i}\|_{2}^{2}\leq\operatorname{nnz}(\mathbf{A}_{i}). Since j=1nzj=(𝐒¯𝐀m):,i22\sum_{j=1}^{n}z_{j}=\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}^{2} is a sum of independent random variables, we can bound this quantity by applying Bernstein’s inequality. To do this, we must bound |zj||z_{j}| for all j[n]j\in[n] and 𝐕𝐚𝐫(j=1nzj)\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right). We will again consider the cases of pi<1p_{i}<1 and pi=1p_{i}=1 separately.

Case 1: pi<1p_{i}<1. Then, we have pi=snnz(𝐀i)/nnz(𝐀)p_{i}=s\operatorname{nnz}({\mathbf{A}}_{i})/\operatorname{nnz}({\mathbf{A}}). If 𝐀i,j0{\mathbf{A}^{\prime}}_{i,j}\neq 0 then

|zj|\displaystyle|z_{j}| 1pj|𝐀m,i,j|2max(1,nnz(𝐀)snnz(𝐀j))|𝐀m,i,j|2\displaystyle\leq\frac{1}{p_{j}}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\max\left(1,\frac{\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}\right)|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}
|𝐀m,i,j|2+2nnz(𝐀)snnz(𝐀j)(|𝐀i,j|2+|𝐀o,i,j|2)\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}\left(|{\mathbf{A}^{\prime}}_{i,j}|^{2}+|{\mathbf{A}^{\prime}}_{o,i,j}|^{2}\right)
|𝐀m,i,j|2+2nnz(𝐀)snnz(𝐀j)(|𝐀i,j|2+nnz(𝐀i)nnz(𝐀j)ϵ2δnnz(𝐀))\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}\left(|{\mathbf{A}^{\prime}}_{i,j}|^{2}+\frac{\operatorname{nnz}({\mathbf{A}}_{i})\operatorname{nnz}({\mathbf{A}}_{j})}{\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}})}\right)
|𝐀m,i,j|2+2nnz(𝐀)snnz(𝐀j)|𝐀i,j|2+2nnz(𝐀i)ϵ2δs,\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}|{\mathbf{A}^{\prime}}_{i,j}|^{2}+\frac{2\operatorname{nnz}({\mathbf{A}}_{i})}{\epsilon^{2}\delta s},

where the fourth inequality uses (18). By the thresholding procedure which defines 𝐀\mathbf{A}^{\prime}, if 𝐀ij0\mathbf{A}^{\prime}_{ij}\neq 0,

nnz(𝐀i)nnz(𝐀j)ϵ2nnz(𝐀)c2log2nnnz(𝐀j)ϵ2nnz(𝐀)c2log2nnnz(𝐀i),\displaystyle\operatorname{nnz}({\mathbf{A}}_{i})\cdot\operatorname{nnz}({\mathbf{A}}_{j})\geq\frac{\epsilon^{2}\operatorname{nnz}({\mathbf{A}})}{c_{2}\log^{2}n}\Rightarrow\operatorname{nnz}({\mathbf{A}}_{j})\geq\frac{\epsilon^{2}\operatorname{nnz}({\mathbf{A}})}{c_{2}\log^{2}n\operatorname{nnz}({\mathbf{A}}_{i})}, (26)

and thus for pi<1p_{i}<1 and 𝐀ij0{\mathbf{A}^{\prime}}_{ij}\neq 0 we have

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2c2log2nnnz(𝐀i)sϵ2+2nnz(𝐀i)ϵ2δs.\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2c_{2}\log^{2}n\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}}+\frac{2\operatorname{nnz}({\mathbf{A}}_{i})}{\epsilon^{2}\delta s}.

If 𝐀i,j=0{\mathbf{A}^{\prime}}_{i,j}=0 then we simply have

|zj|\displaystyle|z_{j}| |𝐀m,ij|2+nnz(𝐀i)sϵ2δ.\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,ij}|^{2}+\frac{\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}\delta}.

Overall for all j[n]j\in[n],

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2nnz(𝐀i)sϵ2δ+2c2log2nnnz(𝐀i)sϵ2,\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{2}n\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}}, (27)

and since |𝐀m,i,j|2j=1n|𝐀m,i,j|2=𝐀m,i22𝐀i22nnz(𝐀i)|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}=\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{2}\leq\|{\mathbf{A}^{\prime}}_{i}\|_{2}^{2}\leq\operatorname{nnz}({\mathbf{A}}_{i}),

|zj|\displaystyle|z_{j}| nnz(𝐀i)+2nnz(𝐀i)sϵ2δ+2c2log2nnnz(𝐀i)sϵ2.\displaystyle\leq\operatorname{nnz}({\mathbf{A}}_{i})+\frac{2\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{2}n\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}}. (28)

For sc(log2nϵ2+1ϵ2δ)s\geq c\left(\frac{\log^{2}n}{\epsilon^{2}}+\frac{1}{\epsilon^{2}\delta}\right) and large enough cc, we thus have |zj|2nnz(𝐀i)|z_{j}|\leq 2\operatorname{nnz}({\mathbf{A}}_{i}).

We next bound the variance by:

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) j=1n𝔼[zj2]j=1npj1pj2|𝐀m,i,j|4\displaystyle\leq\sum_{j=1}^{n}\mathbb{E}[z_{j}^{2}]\leq\sum_{j=1}^{n}p_{j}\frac{1}{p_{j}^{2}}|{\mathbf{A}^{\prime}}_{m,i,j}|^{4}
=j=1nmax(1,nnz(𝐀)snnz(𝐀j))|𝐀m,i,j|4\displaystyle=\sum_{j=1}^{n}\max\left(1,\frac{\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}\right)|{\mathbf{A}^{\prime}}_{m,i,j}|^{4}
j=1n|𝐀m,i,j|4+j=1n12nnz(𝐀)snnz(𝐀j)(|𝐀i,j|4+|𝐀o,i,j|4)\displaystyle\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{4}+\sum_{j=1}^{n}\frac{12\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}\left(|{\mathbf{A}^{\prime}}_{i,j}|^{4}+|{\mathbf{A}^{\prime}}_{o,i,j}|^{4}\right)
𝐀m,i24+j=1n12nnz(𝐀)snnz(𝐀j)(|𝐀i,j|4+nnz(𝐀i)2nnz(𝐀j)2ϵ4δ2nnz(𝐀)2),\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\sum_{j=1}^{n}\frac{12\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}\left(|\mathbf{A}_{i,j}^{\prime}|^{4}+\frac{\operatorname{nnz}(\mathbf{A}_{i})^{2}\operatorname{nnz}(\mathbf{A}_{j})^{2}}{\epsilon^{4}\delta^{2}\operatorname{nnz}(\mathbf{A})^{2}}\right),

where the last inequality uses (18). Now since 𝐀ii=0\mathbf{A}_{ii}^{\prime}=0 for all ii and 𝐀1\|\mathbf{A}^{\prime}\|_{\infty}\leq 1 we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i24+j:𝐀i,j012nnz(𝐀)snnz(𝐀j)+j=1n12nnz(𝐀i)2nnz(𝐀j)sϵ4δ2nnz(𝐀).\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\sum_{j:{\mathbf{A}^{\prime}}_{i,j}\neq 0}\frac{12\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{j})}+\sum_{j=1}^{n}\frac{12\operatorname{nnz}({\mathbf{A}}_{i})^{2}\operatorname{nnz}({\mathbf{A}}_{j})}{s\epsilon^{4}\delta^{2}\operatorname{nnz}({\mathbf{A}})}. (29)

Combining (26) with the second term to the right of (29) we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i24+j:𝐀i,j012c2log2nnnz(𝐀i)sϵ2+j=1n12nnz(𝐀i)2nnz(𝐀j)sϵ4δ2nnz(𝐀),\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\sum_{j:{\mathbf{A}^{\prime}}_{i,j}\neq 0}\frac{12c_{2}\log^{2}n\cdot\operatorname{nnz}({\mathbf{A}_{i}})}{s\epsilon^{2}}+\sum_{j=1}^{n}\frac{12\operatorname{nnz}({\mathbf{A}}_{i})^{2}\operatorname{nnz}({\mathbf{A}}_{j})}{s\epsilon^{4}\delta^{2}\operatorname{nnz}({\mathbf{A}})},

and since |{j:𝐀i,j0}|=nnz(𝐀i)|\{j:{\mathbf{A}^{\prime}}_{i,j}\neq 0\}|=\operatorname{nnz}({\mathbf{A}}_{i}), we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i24+12c2log2nnnz(𝐀i)2sϵ2+j=1n12nnz(𝐀i)2nnz(𝐀j)sϵ4δ2nnz(𝐀).\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\frac{12c_{2}\log^{2}n\cdot\operatorname{nnz}({\mathbf{A}_{i}})^{2}}{s\epsilon^{2}}+\sum_{j=1}^{n}\frac{12\operatorname{nnz}({\mathbf{A}}_{i})^{2}\operatorname{nnz}({\mathbf{A}}_{j})}{s\epsilon^{4}\delta^{2}\operatorname{nnz}({\mathbf{A}})}. (30)

Finally since j=1nnnz(𝐀j)=nnz(𝐀)\sum_{j=1}^{n}\operatorname{nnz}(\mathbf{A}_{j})=\operatorname{nnz}(\mathbf{A}) and 𝐀m,i24𝐀i24nnz(𝐀i)2\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}\leq\|\mathbf{A^{\prime}}_{i}\|_{2}^{4}\leq\operatorname{nnz}(\mathbf{A}_{i})^{2} we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) nnz(𝐀i)2+12c2log2nnnz(𝐀i)2sϵ2+12nnz(𝐀i)2sϵ4δ2.\displaystyle\leq\operatorname{nnz}({\mathbf{A}}_{i})^{2}+\frac{12c_{2}\log^{2}n\cdot\operatorname{nnz}({\mathbf{A}}_{i})^{2}}{s\epsilon^{2}}+\frac{12\operatorname{nnz}({\mathbf{A}}_{i})^{2}}{s\epsilon^{4}\delta^{2}}. (31)

For sc(log2nϵ2+1ϵ4δ2)s\geq c\left(\frac{\log^{2}n}{\epsilon^{2}}+\frac{1}{\epsilon^{4}\delta^{2}}\right) for large enough cc, we have 𝐕𝐚𝐫(j=1nzj)2nnz(𝐀i)2\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)\leq 2\operatorname{nnz}({\mathbf{A}}_{i})^{2}.

Therefore, using (28) and (31) with $s\geq c\left(\frac{\log^{2}n}{\epsilon^{2}}+\frac{1}{\epsilon^{4}\delta^{2}}\right)$, we can apply the Bernstein inequality (Theorem 7) to get, for some constant $c$,

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+t)\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+t\right) (j=1nzjnnz(𝐀i)+t)\displaystyle\leq\operatorname*{\mathbb{P}}\left(\sum_{j=1}^{n}z_{j}\geq\operatorname{nnz}(\mathbf{A}_{i})+t\right)
exp(t2/2cnnz(𝐀i)2+ctnnz(𝐀i)/3).\displaystyle\leq\exp\left(\frac{-t^{2}/2}{c\operatorname{nnz}({\mathbf{A}}_{i})^{2}+ct\operatorname{nnz}({\mathbf{A}}_{i})/3}\right).

If we set t=lognnnz(𝐀i)t=\log n\cdot\operatorname{nnz}({\mathbf{A}}_{i}), for some constant cc^{\prime} we have

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+lognnnz(𝐀i))\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+\log n\cdot\operatorname{nnz}({\mathbf{A}}_{i})\right) exp((logn)2/2c+c(logn)/3)exp(clogn)1/nc.\displaystyle\leq\exp\left(\frac{-(\log n)^{2}/2}{c+c(\log n)/3}\right)\leq\exp(-c^{\prime}\log n)\leq 1/n^{c^{\prime}}.

Since 𝐀m=𝐇m+𝐃m{\mathbf{A}^{\prime}}_{m}=\mathbf{H}_{m}+\mathbf{D}_{m}, we have (𝐒¯𝐀m):,i2(𝐒¯𝐇m):,i2\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}\geq\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}. Then with probability at least 11/nc1δ1-1/n^{c^{\prime}}\geq 1-\delta, for any row ii with pi<1p_{i}<1, we have

1pi(𝐒¯𝐇m):,i22nnz(𝐀)snnz(𝐀i)c(logn)nnz(𝐀i)ϵ2δnnz(𝐀)logn,\displaystyle\frac{1}{p_{i}}\cdot\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}\leq\frac{\operatorname{nnz}({\mathbf{A}})}{s\operatorname{nnz}({\mathbf{A}}_{i})}\cdot c(\log n)\operatorname{nnz}({\mathbf{A}}_{i})\leq\frac{\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}})}{\log n},

for $s\geq c\left(\frac{\log^{2}n}{\epsilon^{2}}+\frac{1}{\epsilon^{4}\delta^{2}}\right)$ for large enough $c$. Observe that, as in Lemma 3, w.l.o.g. we have assumed $1-1/n^{c^{\prime}}\geq 1-\delta$, since otherwise our algorithm would read all $n^{2}$ entries of the matrix.

Case 2: pi=1p_{i}=1. Then, we have nnz(𝐀i)nnz(𝐀)/s\operatorname{nnz}({\mathbf{A}}_{i})\geq\operatorname{nnz}({\mathbf{A}})/s. As in the pi<1p_{i}<1 case, we have from (27):

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2nnz(𝐀i)sϵ2δ+2c2log2nnnz(𝐀i)sϵ2.\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{2}n\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}}.

Now, we observe that $|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}=\|\mathbf{A}^{\prime}_{m,i}\|^{2}_{2}\leq\|\mathbf{A}^{\prime}_{m}\|^{2}_{2}\leq\epsilon^{2}\delta\operatorname{nnz}(\mathbf{A})$, which gives us

|zj|\displaystyle|z_{j}| ϵ2δnnz(𝐀)+2nnz(𝐀i)sϵ2δ+2c2log2nnnz(𝐀i)sϵ2.\displaystyle\leq\epsilon^{2}\delta\operatorname{nnz}(\mathbf{A})+\frac{2\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{2}n\operatorname{nnz}({\mathbf{A}}_{i})}{s\epsilon^{2}}. (32)

Thus, for sc(log2nϵ4δ+1ϵ4δ2)s\geq c\left(\frac{\log^{2}n}{\epsilon^{4}\delta}+\frac{1}{\epsilon^{4}\delta^{2}}\right) for a large enough constant cc and adjusting for other constants we have |zj|2ϵ2δnnz(𝐀)|z_{j}|\leq 2\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}}). Also observe that the expectation of zj\sum z_{j} can be bounded by:

𝔼(j=1nzj)=𝔼(𝐒¯𝐀m):,i22=𝐀m,i22𝐀m22ϵ2δnnz(𝐀).\displaystyle\mathbb{E}\left(\sum_{j=1}^{n}z_{j}\right)=\mathbb{E}\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}^{2}=\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{2}\leq\|{\mathbf{A}^{\prime}}_{m}\|_{2}^{2}\leq\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}}).

Next, the variance of the sum of the random variables $\{z_{j}\}$ can again be bounded by following the analysis leading to (30) and (31):

\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\frac{12c_{2}\log^{2}n\cdot\operatorname{nnz}({\mathbf{A}}_{i})^{2}}{s\epsilon^{2}}+\frac{12\operatorname{nnz}({\mathbf{A}}_{i})^{2}}{s\epsilon^{4}\delta^{2}}
\leq\epsilon^{4}\delta^{2}\operatorname{nnz}({\mathbf{A}})^{2}+\frac{12c_{2}\log^{2}n\cdot\operatorname{nnz}({\mathbf{A}}_{i})^{2}}{s\epsilon^{2}}+\frac{12\operatorname{nnz}({\mathbf{A}}_{i})^{2}}{s\epsilon^{4}\delta^{2}}, (33)

where we again bound $\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}$ using

|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}=\|\mathbf{A}^{\prime}_{m,i}\|^{2}_{2}\leq\|\mathbf{A}^{\prime}_{m}\|^{2}_{2}\leq\epsilon^{2}\delta\operatorname{nnz}(\mathbf{A}).

Then for sc(log2nϵ6δ2+1ϵ8δ4)s\geq c(\frac{\log^{2}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}), we have 𝐕𝐚𝐫(j=1nzj)2ϵ4δ2nnz(𝐀)2\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)\leq 2\epsilon^{4}\delta^{2}\operatorname{nnz}({\mathbf{A}})^{2} for large enough constant cc.

Using (32) and (33), and noting that $\sum_{j=1}^{n}\mathbb{E}\left(z_{j}^{2}\right)\leq\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)+\mathbb{E}^{2}\left(\sum_{j=1}^{n}z_{j}\right)$ (since each $z_{j}\geq 0$), we can apply the Bernstein inequality (Theorem 7):

\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+t\right)\leq\operatorname*{\mathbb{P}}\left(\sum_{j=1}^{n}z_{j}\geq\epsilon^{2}\delta\operatorname{nnz}(\mathbf{A})+t\right)
exp(t2/2cϵ4δ2nnz(𝐀)2+cϵ2δnnz(𝐀)t/3).\displaystyle\leq\exp\left(\frac{-t^{2}/2}{c\epsilon^{4}\delta^{2}\operatorname{nnz}({\mathbf{A}})^{2}+c\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}})t/3}\right).

If we set t=(logn)ϵ2δnnz(𝐀)t=(\log n)\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}}), then for some constant cc^{\prime} we have

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+t)\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+t\right) exp(clogn)1/nc.\displaystyle\leq\exp(-c^{\prime}\log n)\leq 1/n^{c^{\prime}}.

Thus, since $\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}\leq\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}$ when $p_{i}=1$, setting $s\geq c\left(\frac{\log^{2}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}\right)$ for large enough $c$, we have with probability $\geq 1-1/n^{c^{\prime}}$ that $\frac{1}{p_{i}}\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}=\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}\leq\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\leq(\log n)\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}})$.

We thus have, that with probability 11/nc\geq 1-1/n^{c^{\prime}}, for both cases when pi<1p_{i}<1 and pi=1p_{i}=1, (𝐒¯𝐇m):,i22pi(logn)ϵ2δnnz(𝐀)\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}}{p_{i}}\leq(\log n)\epsilon^{2}\delta\operatorname{nnz}({\mathbf{A}}). Taking a union bound over all i[n]i\in[n], with probability at least 11/nc11-1/n^{c^{\prime}-1}, maxi(𝐒¯𝐇m):,i2pilognϵδnnz(𝐀)\max_{i}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\leq\sqrt{\log n}\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})} for sc(log2nϵ6δ2+1ϵ8δ4)s\geq c(\frac{\log^{2}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}). As stated before, since pi1n2p_{i}\geq\frac{1}{n^{2}} for all i[n]i\in[n], and since 𝐇mF𝐀Fn\|\mathbf{H}_{m}\|_{F}\leq\|\mathbf{A}\|_{F}\leq n, we also have maxi(𝐒¯𝐇m):,i2pin2\max_{i}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\leq n^{2}. Thus,

𝔼2(maxi:i[n](𝐒¯𝐇m):,i2pi)lognϵδnnz(𝐀)(11nc1)+1nc3lognϵδnnz(𝐀).\displaystyle\mathbb{E}_{2}\left(\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right)\leq\sqrt{\log n}\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}(1-\frac{1}{n^{c^{\prime}-1}})+\frac{1}{n^{c^{\prime}-3}}\leq\sqrt{\log n}\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}.

after adjusting $\epsilon$ by at most a constant factor. Overall, we finally get

𝔼2𝐒¯𝐇m𝐒^12𝔼2(maxi:i[n](𝐒¯𝐇m):,i2pi)ϵlognδnnz(𝐀).\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\leq\mathbb{E}_{2}\left(\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right)\leq\epsilon\sqrt{\log n}\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}.

Plugging this bound into (25), we have for sc(log2nϵ6δ2+1ϵ8δ4)s\geq c(\frac{\log^{2}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}),

𝔼2𝐒¯𝐀m𝐒¯2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2} (logn)ϵδnnz(𝐀).\displaystyle\leq(\log n)\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}.

Finally, after adjusting $\epsilon$ by a $\frac{1}{\log n}$ factor, we have, for $s\geq c\left(\frac{\log^{8}n}{\epsilon^{6}\delta^{2}}+\frac{\log^{8}n}{\epsilon^{8}\delta^{4}}\right)$, or simply $s\geq\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}}$ (the second term dominates up to constants since $\epsilon,\delta\leq 1$),

𝔼2𝐒¯𝐀m𝐒¯2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2} ϵδnnz(𝐀).\displaystyle\leq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}({\mathbf{A}})}.

The final bound then follows via Markov’s inequality on 𝐒¯𝐀m𝐒¯2\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}. ∎

4.3 Main Accuracy Bound

We are finally ready to prove our main result for sparsity-based sampling (Theorem 2).

Proof.

With Lemmas 7 and 8 in place, the proof is nearly identical to that of Theorem 1, with the additional need to apply Lemma 5 to show that the eigenvalues of 𝐀\mathbf{A}^{\prime} are close to those of 𝐀\mathbf{A}.

For all i[n]i\in[n] let pi=min(1,snnz(𝐀i)nnz(𝐀))p_{i}=\min\left(1,\frac{s\operatorname{nnz}(\mathbf{A}_{i})}{\operatorname{nnz}(\mathbf{A})}\right) and let 𝐒¯\bar{\mathbf{S}} be a scaled diagonal sampling matrix such that the 𝐒¯ii=1pi\bar{\mathbf{S}}_{ii}=\frac{1}{\sqrt{p_{i}}} with probability pip_{i} and 𝐒¯ii=0\bar{\mathbf{S}}_{ii}=0 otherwise. Let 𝐀\mathbf{A}^{\prime} be the matrix constructed from 𝐀\mathbf{A} by zeroing out its elements as described in Lemma 5. Then, note that 𝐒¯𝐀𝐒¯=𝐀S\bar{\mathbf{S}}\mathbf{A}^{\prime}\bar{\mathbf{S}}=\mathbf{A}^{\prime}_{S} where 𝐀S\mathbf{A}^{\prime}_{S} is the submatrix constructed as in Algorithm 2. We first show that the eigenvalues of 𝐀S\mathbf{A}^{\prime}_{S} approximate those of 𝐀\mathbf{A}^{\prime} up to error ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}. The steps are almost identical to those in the proof of Theorem 1. We provide a brief outline of the steps but skip the details.

We split $\mathbf{A}^{\prime}$ as $\mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{o}+\mathbf{A}^{\prime}_{m}$, where $\mathbf{A}^{\prime}_{o}$ and $\mathbf{A}^{\prime}_{m}$ contain the eigenvalues of $\mathbf{A}^{\prime}$ of magnitude $\geq\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}$ and $<\epsilon\sqrt{\delta}\sqrt{\operatorname{nnz}(\mathbf{A})}$, respectively. This implies $\mathbf{A}^{\prime}_{S}=\mathbf{A}^{\prime}_{o,S}+\mathbf{A}^{\prime}_{m,S}$ where $\mathbf{A}^{\prime}_{o,S}=\bar{\mathbf{S}}\mathbf{A}^{\prime}_{o}\bar{\mathbf{S}}$ and $\mathbf{A}^{\prime}_{m,S}=\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m}\bar{\mathbf{S}}$. By Fact 1, the nonzero eigenvalues of $\mathbf{A}^{\prime}_{o,S}=\bar{\mathbf{S}}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}$ are identical to those of $\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}$. Thus, applying the perturbation bound of Fact 4, we have:

|λi(𝚲o1/2𝐕oT𝐒¯𝐒¯𝐕o𝚲o1/2)λi(𝚲o)|Clogn𝚲o1/2𝐕oT𝐒¯𝐒¯𝐕o𝚲o1/2𝚲o2.\displaystyle\left|\lambda_{i}(\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2})-\lambda_{i}(\mathbf{\Lambda}^{\prime}_{o})\right|\leq C\log n\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}-\mathbf{\Lambda}^{\prime}_{o}\|_{2}.

From Lemma 7, we get 𝚲o1/2𝐕oT𝐒¯𝐒¯𝐕o𝚲o1/2𝚲o2ϵnnz(𝐀)\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}-\mathbf{\Lambda}^{\prime}_{o}\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} for sclog(1/(ϵδ))ϵ3δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{3}\sqrt{\delta}} with probability at least 1δ1-\delta. Thus, setting the error parameter to ϵlogn\frac{\epsilon}{\log n} in Lemma 7, for sclog(1/(ϵδ))log3nϵ3δs\geq\frac{c\log(1/(\epsilon\delta))\log^{3}n}{\epsilon^{3}\sqrt{\delta}}, with probability at least 1δ1-\delta we have:

|λi(𝚲o1/2𝐕oT𝐒¯𝐒¯𝐕o𝚲o1/2)λi(𝚲o)|\displaystyle\left|\lambda_{i}(\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2})-\lambda_{i}(\mathbf{\Lambda}^{\prime}_{o})\right| <ϵnnz(𝐀).\displaystyle<\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}. (34)

We have thus shown that the non-zero eigenvalues of 𝐀o,S\mathbf{A}^{\prime}_{o,S} approximate all outlying eigenvalues of 𝐀\mathbf{A}^{\prime}. Note that by Lemma 8, we also have 𝐀m,S2ϵnnz(𝐀)\|\mathbf{A}^{\prime}_{m,S}\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} with probability at least 1δ1-\delta for sclog8nϵ8δ4s\geq\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}}. Then, similarly to the section on eigenvalue alignment of Theorem 1, we can argue how these approximations ‘line up’ in the presence of zero eigenvalues in the spectrum of these matrices, concluding that, for all i[n]i\in[n],

|λ~i(𝐀)λi(𝐀)|ϵnnz(𝐀).\displaystyle\left|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime})\right|\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}.

Finally, by Lemma 5, we have |λi(𝐀)λi(𝐀)|ϵnnz(𝐀)\lvert\lambda_{i}(\mathbf{A}^{\prime})-\lambda_{i}(\mathbf{A})\rvert\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} for all i[n]i\in[n]. Thus, via triangle inequality, |λ~i(𝐀)λi(𝐀)|2ϵnnz(𝐀)\left|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})\right|\leq 2\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}, which gives the required bound after adjusting ϵ\epsilon to ϵ/2\epsilon/2.

Recall that we require $s\geq\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}}$ for (34) to hold with probability $1-\delta$. We also require $s\geq\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}}$ for $\|\mathbf{A}_{m,S}^{\prime}\|_{2}\leq\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}$ to hold with probability $1-\delta$ by Lemma 8. Thus, for both conditions to hold simultaneously with probability $1-2\delta$ by a union bound, it suffices to set $s=\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}}\geq\max\left(\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}},\frac{c\log^{8}n}{\epsilon^{8}\delta^{4}}\right)$, where we use that $\log(1/(\epsilon\delta))\leq O(\log n)$, as otherwise our algorithm can take $\mathbf{A}_{S}$ to be the full matrix $\mathbf{A}$. Adjusting $\delta$ to $\delta/2$ completes the theorem. ∎

5 Empirical Evaluation

We complement our theoretical results by evaluating Algorithm 1 (uniform sampling) and Algorithm 2 (sparsity-based sampling) on the task of approximating the eigenvalues of several symmetric matrices. We defer an evaluation of Algorithm 3 (norm-based sampling) to later work. Both algorithms perform very well in practice: their sample size seems to scale roughly as $1/\epsilon^{2}$, compared to the $1/\epsilon^{3}$ dependence proven in Theorem 1 and the $1/\epsilon^{8}$ dependence proven in Theorem 2. Closing this gap between the theory and the observed results would be very interesting.

5.1 Datasets

We test Algorithm 1 (uniform sampler) on three dense matrices. We also compare the relative performance of Algorithm 1 and Algorithm 2 (sparsity sampler) on three other synthetic and real world matrices.

The first two dense matrices, following [CNX21], are created by sampling $5000$ points from a binary image. We then normalize all points to the range $[0,1]$ on both axes. The original image and the resulting set of points are shown in Figure 2. We then compute a similarity matrix for the points using two similarity functions common in machine learning and computer graphics: $\delta(\mathbf{x},\mathbf{y})=\tanh\left(\frac{\langle\mathbf{x},\mathbf{y}\rangle}{2}\right)$, the hyperbolic tangent, and $\delta(\mathbf{x},\mathbf{y})=\|\mathbf{x}-\mathbf{y}\|_{2}^{2}\cdot\log\left(\|\mathbf{x}-\mathbf{y}\|_{2}^{2}\right)$, the thin plate spline. These measures lead to symmetric, indefinite, and entrywise bounded similarity matrices.
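For concreteness, a minimal numpy sketch of how such similarity matrices can be formed from the sampled points is given below; the function name and the convention that $0\cdot\log 0=0$ for coincident points are illustrative and may differ from our released code.

import numpy as np

def similarity_matrices(X):
    # Build the two similarity matrices described above for points X (one point per row):
    # the hyperbolic tangent similarity tanh(<x,y>/2), and the thin plate spline
    # ||x-y||^2 * log(||x-y||^2), with 0*log(0) taken to be 0 (e.g., on the diagonal).
    G = X @ X.T                                                  # inner products <x, y>
    tanh_sim = np.tanh(G / 2.0)
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * G, 0.0)   # squared distances
    with np.errstate(divide="ignore", invalid="ignore"):
        tps = np.where(D2 > 0, D2 * np.log(D2), 0.0)
    return tanh_sim, tps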

Our next dense matrix (called the block matrix) is based on the construction of the hard instance for the lower bound in [BCJ20] which shows that we need Ω(1/ϵ2)×Ω(1/ϵ2)\Omega(1/\epsilon^{2})\times\Omega(1/\epsilon^{2}) samples to compute ϵn\epsilon n approximations to the eigenvalues of a bounded entry matrix. It is a 5000×50005000\times 5000 matrix containing a 2500×25002500\times 2500 principal submatrix of all 11s, with the rest of the entries set to 0. It has λ1(𝐀)=2500\lambda_{1}(\mathbf{A})=2500 and all other eigenvalues equal to 0.
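A sketch of this construction is below (variable names are illustrative); the all-ones block is rank one, so its only nonzero eigenvalue is $2500$.

import numpy as np

# The "block matrix" hard instance: a 5000 x 5000 matrix with a 2500 x 2500
# all-ones principal block; np.linalg.eigvalsh(A_block) returns one eigenvalue
# equal to 2500 and the rest equal to 0 (up to numerical roundoff).
n, k = 5000, 2500
A_block = np.zeros((n, n))
A_block[:k, :k] = 1.0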

We now describe the three matrices used to compare Algorithm 1 and Algorithm 2. All three are graph adjacency matrices, which are symmetric, indefinite, entrywise bounded, and sparse. Spectral density estimation for graph structured matrices is an important primitive in network analysis [DBB19]. The first is a dense Erdös-Rényi graph with $5000$ nodes and connection probability $0.1$. The other two are real-world graphs taken from SNAP [LK14], namely Facebook [ML12] and Arxiv COND-MAT [LKF07]. The Facebook graph contains $4039$ nodes and $88234$ directed edges; we symmetrize its adjacency matrix. Arxiv COND-MAT is a collaboration network between authors of Condensed Matter papers published on arXiv, containing $23133$ nodes and $93497$ undirected edges. Both of these graphs are very sparse – the number of edges is $\leq 1\%$ of the number of edges in a complete graph with the same number of nodes.
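The synthetic graph can be generated as in the following sketch (the seed and variable names are illustrative):

import numpy as np

# Erdos-Renyi adjacency matrix with 5000 nodes and connection probability 0.1,
# built as a symmetric 0/1 matrix with zero diagonal.
rng = np.random.default_rng(0)
n, p = 5000, 0.1
upper = np.triu(rng.random((n, n)) < p, k=1)   # independent edge coin flips above the diagonal
A_er = (upper | upper.T).astype(float)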

Figure 2: Kong dataset. The image on the left is the original synthetic binary image and the image on the right shows the 5000 sampled points from the outline used as dataset in our experiments.

5.2 Implementation Details

Apart from uniform random sampling (Algorithm 1), we also apply the sparsity-based sampling technique in Algorithm 2 and a modification to Algorithm 2, where we do not zero out the elements of the sampled submatrix 𝐀S\mathbf{A}_{S} (we call this simple sparsity sampler). In practice, to apply Algorithm 2, we zero out element [𝐀S]i,j[\mathbf{A}_{S}]_{i,j} (line 5 of Algorithm 2) if i=ji=j or nnz(𝐀i)nnz(𝐀j)<nnz(𝐀)c2s\operatorname{nnz}(\mathbf{A}_{i})\operatorname{nnz}(\mathbf{A}_{j})<\frac{\operatorname{nnz}(\mathbf{A})}{c_{2}s}, where c2c_{2} is a constant and ss is the size of the sample. We set c2=0.1c_{2}=0.1 experimentally as this results in consistent behavior across datasets.
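A minimal sketch of the sparsity-based sampler together with this practical zeroing rule is given below; the function name, interface, and defaults are illustrative and may differ from our released code.

import numpy as np

def sparsity_sample(A, s, c2=0.1, zero_out=True, rng=np.random.default_rng()):
    # Keep index i with probability p_i = min(1, s * nnz(A_i) / nnz(A)) and rescale the
    # sampled principal submatrix entrywise by 1/sqrt(p_i * p_j) (i.e., S A S restricted
    # to the sampled indices). If zero_out is set, apply the practical rule above:
    # zero entry (i, j) if i == j or nnz(A_i) * nnz(A_j) < nnz(A) / (c2 * s).
    n = A.shape[0]
    row_nnz = (A != 0).sum(axis=1)
    total_nnz = row_nnz.sum()
    p = np.minimum(1.0, s * row_nnz / total_nnz)
    keep = rng.random(n) < p
    scale = 1.0 / np.sqrt(p[keep])
    A_S = A[np.ix_(keep, keep)] * np.outer(scale, scale)
    if zero_out:
        nnz_i = row_nnz[keep]
        A_S[np.outer(nnz_i, nnz_i) < total_nnz / (c2 * s)] = 0.0
        np.fill_diagonal(A_S, 0.0)
    return A_S

With zero_out=False, this corresponds to the simple sparsity sampler described above.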

5.3 Experimental Setup

We subsample each matrix and compute its eigenvalues using numpy [Com21]. We then use our approximation algorithms to estimate the eigenvalues of 𝐀\mathbf{A} by scaling the eigenvalues of the sampled submatrix. For tt trials, we report the logarithm of the average absolute scaled error, log(1t|λ~i,t(𝐀)λi(𝐀)|nnz(𝐀))\log\left(\frac{1}{t}\sum\frac{|\tilde{\lambda}_{i,t}(\mathbf{A})-\lambda_{i}(\mathbf{A})|}{\sqrt{\operatorname{nnz}(\mathbf{A})}}\right), where λ~i,t(𝐀)\tilde{\lambda}_{i,t}(\mathbf{A}) is the estimated eigenvalue in the ttht^{th} trial, λi(𝐀)\lambda_{i}(\mathbf{A}) is the true eigenvalue and nnz(𝐀)\operatorname{nnz}(\mathbf{A}) is the number of non-zero elements in 𝐀\mathbf{A}. Recall that nnz(𝐀)𝐀F\sqrt{\operatorname{nnz}(\mathbf{A})}\geq\|\mathbf{A}\|_{F} is an upper bound on all eigenvalue magnitudes. Also note that for the fully dense matrices, nnz(𝐀)n\sqrt{\operatorname{nnz}(\mathbf{A})}\approx n.
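As a concrete reference, a minimal sketch of the uniform-sampling estimator (Algorithm 1) and this error metric follows; the function name is illustrative and details may differ from our released code.

import numpy as np

def uniform_estimates(A, s, rng=np.random.default_rng()):
    # Algorithm 1's estimator: keep each index with probability s/n, compute the
    # eigenvalues of the principal submatrix, scale by n/s, and align them --
    # nonnegative estimates fill the top of the spectrum, negative estimates the
    # bottom, and all remaining estimates are 0.
    n = A.shape[0]
    keep = rng.random(n) < s / n
    sub = np.sort(np.linalg.eigvalsh(A[np.ix_(keep, keep)]))[::-1]   # descending order
    est = np.zeros(n)
    pos, neg = sub[sub >= 0], sub[sub < 0]
    est[:len(pos)] = (n / s) * pos
    if len(neg):
        est[-len(neg):] = (n / s) * neg
    return est   # est[i] approximates the (i+1)-st largest eigenvalue of A

# The reported quantity for eigenvalue index i over t trials is then
# log( mean over trials of |est[i] - lambda_i(A)| / sqrt(nnz(A)) ).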

We repeat our experiments for t=50t=50 trials at different sampling rates and aggregate the results. The resultant errors of estimation for dense matrices are plotted in Figure 3 and for the graph matrices are plotted in Figure 4. The xx-axis is the log proportion of the number of random samples chosen from the matrix. If we sample 1%1\% of the rows/columns, then the log\log comes to around 4.5-4.5. In these log-log plots, if the sample size has polynomial dependence on ϵ\epsilon, e.g., ϵn\epsilon n or ϵnnz(𝐀)\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} error is achieved with sample size proportional to 1/ϵp1/\epsilon^{p}, we expect to see error falling off linearly, with slope equal to 1/p-1/p where pp is the exponent on ϵ\epsilon.
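Concretely, if the error at sample size $s$ scales as $\epsilon\propto s^{-1/p}$ (equivalently, error $\pm\epsilon n$ or $\pm\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}$ is achieved with $s\propto 1/\epsilon^{p}$), then

\log(\text{error})=-\frac{1}{p}\log(s)+\text{constant},

which is a line of slope $-1/p$ in the log-log plot.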

As a baseline we also show the error if we approximate all eigenvalues with 0 which results in an error of λinnz(𝐀)\frac{\lambda_{i}}{\sqrt{\operatorname{nnz}(\mathbf{A})}}. This helps us to observe how the approximation algorithms perform for both large and small order eigenvalues, as opposed to just approximating everything by 0.

Code. All code is written in Python and is available at https://github.com/archanray/eigenvalue_estimation.

5.4 Summary of Results

Our results are plotted in Figures 3 and 4. We observe relatively small error in approximating all eigenvalues, with the error decreasing as the number of samples increases. More interestingly, the relationship between sample size and error $\epsilon n$ seems to be generally on the order of $1/\epsilon^{2}$, our expected lower bound for approximating eigenvalues by randomly sampling a principal submatrix. This can be seen from the slope of approximately $-1/2$ in the log-log error plots. In some cases, we do better in approximating small eigenvalues of $\mathbf{A}$: if an eigenvalue lies well within the range of middle eigenvalues, i.e., in $(-\epsilon n,\epsilon n)$, we may achieve a very good absolute error estimate simply by approximating it by $0$.

As expected, on the graph adjacency matrices (Figure 4), sparsity-based sampling techniques generally achieve better error than uniform sampling. For the Erdös-Rényi graph, we expect the node degrees (and hence row sparsities) to be similar. Thus the sampling probability for each row is roughly uniform, which leads to similar performance of sparsity-based techniques and uniform sampling. For the real-world graphs, which have power law degree distributions, sparsity-based sampling has a significant effect. As a result, Algorithm 2 and the simple sparsity sampler variant significantly outperform uniform sampling.

Algorithm 2 almost always dominates the simple sparsity sampler, though in some cases the simple sparsity sampler performs as well as or better than Algorithm 2. This may happen for two reasons: 1) if Algorithm 2 zeroes out almost all of the sampled submatrix $\mathbf{A}_{S}$ for small sample sizes, the algorithm will underestimate the corresponding eigenvalues, and 2) the cut-off threshold for the term $\operatorname{nnz}(\mathbf{A}_{i})\operatorname{nnz}(\mathbf{A}_{j})$ may be too high, leading to no difference between the simple sparsity sampler and Algorithm 2.

We also observe that approximating all eigenvalues with $0$ gives a very good approximation for the small eigenvalues of the Erdös-Rényi graph. We believe this is because the smaller eigenvalues are significantly smaller in magnitude than the largest eigenvalue (which is of the order of $3500$). We see similar trends when approximating eigenvalues by zero for the real-world graphs; however, since the eigenvalues at the extremes of the spectrum are of larger magnitude, the sampling algorithms achieve reasonably good approximations for them. Algorithm 2 outperforms approximation by $0$ in all of these cases.

For the dense matrices, uniform sampling almost always outperforms approximation by $0$ when estimating any reasonably large eigenvalue. Additionally, note that the block matrix is rank-$1$ with true eigenvalues $\{2500,0,\ldots,0\}$. Any sampled principal submatrix will also have rank at most $1$. Thus, outside the top eigenvalue, the submatrix will have all zero eigenvalues. So, in theory, our algorithm should give perfect error for all eigenvalues outside the top one – we see that this is nearly the case. The very small and sporadic error in the plots for these eigenvalues arises due to numerical roundoff in the eigensolver. The only non-trivial approximation for this matrix is for the top eigenvalue. This approximation seems to have error dependence around $1/\epsilon^{2}$, as expected.

6 Conclusion

We present efficient algorithms for estimating all eigenvalues of a symmetric matrix with bounded entries up to additive error $\epsilon n$, by reading just a $\operatorname{poly}(\log n,1/\epsilon)\times\operatorname{poly}(\log n,1/\epsilon)$ random principal submatrix. We give improved error bounds of $\epsilon\sqrt{\operatorname{nnz}(\mathbf{A})}$ and $\epsilon\|\mathbf{A}\|_{F}$ when the rows/columns are sampled with probabilities proportional to their sparsities or squared $\ell_{2}$ norms, respectively.

As discussed, our work leaves several open questions. In particular, it is open whether our query complexity for $\pm\epsilon n$ approximation can be improved, possibly to $\tilde{O}(\log^{c}n/\epsilon^{4})$ total entries using principal submatrix queries, or $\tilde{O}(\log^{c}n/\epsilon^{2})$ entries using general queries. The latter bound is open even when $\mathbf{A}$ is PSD, a setting where we know that sampling a $O(1/\epsilon^{2})\times O(1/\epsilon^{2})$ principal submatrix (with $O(1/\epsilon^{4})$ total entries) does suffice. Additionally, it is open whether we can achieve sample complexity independent of $n$, by removing all $\log n$ factors, as has been done for the easier problem of testing positive semidefiniteness [BCJ20]. See Section 1.4 for more details.

It would also be interesting to extend our results to give improved approximation bounds for other properties of the matrix spectrum, such as various Schatten-pp norms and spectral summaries. For many of these problems large gaps in understanding exist – e.g., for ±n3/2\pm n^{3/2} approximation to the Schatten-11 norm, which requires Ω(n)\Omega(n) queries, but for which no o(n2)o(n^{2}) query algorithm is known. Applying our techniques to improve sublinear time PSD testing algorithms under an 2\ell_{2} rather than \ell_{\infty} approximation requirement [BCJ20] would also be interesting. Finally, it would be interesting to identify additional assumptions on 𝐀\mathbf{A} or on the sampling model where stronger approximation guarantees (e.g., relative error) can be achieved in sublinear time.

Acknowledgements

We thank Ainesh Bakshi, Rajesh Jayaram, Anil Damle, and Christopher Musco for helpful conversations about this work. RB, CM, and AR were partially supported by an Adobe Research grant, along with NSF Grants 2046235 and 1763618. PD and GD were partially supported by NSF AF 1814041, NSF FRG 1760353, and DOE-SC0022085.

(a) Hyperbolic tangent similarity matrix. (b) Thin plate spline similarity matrix. (c) Block matrix.
Figure 3: Approximation error of eigenvalues of dense matrices. Log scale absolute error vs. log sampling rate for Algorithm 1 and approximation by 0, as described in Section 5.3, for approximating the largest, smallest, and fourth largest eigenvalues of three of the example matrices. The corresponding true eigenvalues for each matrix, in order, are: (hyperbolic tangent) $\{4.52\mathrm{e}{+03},-7.85\mathrm{e}{+00},3.18\mathrm{e}{-01}\}$, (thin plate spline) $\{3.54\mathrm{e}{+02},-1.22\mathrm{e}{+03},1.28\mathrm{e}{+02}\}$ and (block matrix) $\{2.50\mathrm{e}{+03},-5.08\mathrm{e}{-14},1.49\mathrm{e}{-23}\}$.
(a) Erdös-Rényi graph adjacency matrix [ER59]. (b) Facebook graph adjacency matrix [ML12]. (c) ArXiv collaboration network adjacency matrix [LKF07].
Figure 4: Approximation error of eigenvalues of sparse matrices. Log scale absolute error vs. log sampling rate for Algorithm 1, Algorithm 2, the simple sparsity sampler, and approximation by 0, as described in Section 5.3, for approximating the largest, smallest, and fourth largest eigenvalues of the remaining three example matrices. The corresponding true eigenvalues for each matrix, in order, are: (Erdös-Rényi) $\{500.57,-42.52,42.02\}$, (Facebook) $\{162.37,-23.75,73.28\}$ and (arXiv) $\{37.95,-15.58,26.92\}$.

References

  • [AM07] Dimitris Achlioptas and Frank McSherry. Fast computation of low-rank matrix approximations. Journal of the ACM (JACM), 54(2):9–es, 2007.
  • [AN13] Alexandr Andoni and Huy L Nguyên. Eigenvalues of a matrix in the streaming model. In Proceedings of the \nth24 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2013.
  • [AW21] Josh Alman and Virginia Vassilevska Williams. A refined laser method and faster matrix multiplication. In Proceedings of the \nth32 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2021.
  • [BCJ20] Ainesh Bakshi, Nadiia Chepurko, and Rajesh Jayaram. Testing positive semi-definiteness via random submatrices. Proceedings of the \nth61 Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2020.
  • [Ber27] Serge Bernstein. Sur l’extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Mathematische Annalen, 97(1):1–59, 1927.
  • [Bha13] Rajendra Bhatia. Matrix analysis. Springer Science & Business Media, 2013.
  • [BIMW21] Arturs Backurs, Piotr Indyk, Cameron Musco, and Tal Wagner. Faster kernel matrix algebra via density estimation. Proceedings of the \nth38 International Conference on Machine Learning (ICML), 2021.
  • [BKKS21] Vladimir Braverman, Robert Krauthgamer, Aditya R Krishnan, and Shay Sapir. Near-optimal entrywise sampling of numerically sparse matrices. In Proceedings of the \nth34 Annual Conference on Computational Learning Theory (COLT), 2021.
  • [BKM22] Vladimir Braverman, Aditya Krishnan, and Christopher Musco. Linear and sublinear time spectral density estimation. Proceedings of the \nth54 Annual ACM Symposium on Theory of Computing (STOC), 2022.
  • [BLWZ19] Maria-Florina Balcan, Yi Li, David P Woodruff, and Hongyang Zhang. Testing matrix rank, optimally. In Proceedings of the \nth30 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2019.
  • [BSS10] Itai Benjamini, Oded Schramm, and Asaf Shapira. Every minor-closed property of sparse graphs is testable. Advances in Mathematics, 223(6):2200–2218, 2010.
  • [CCH+20] Nadiia Chepurko, Kenneth L Clarkson, Lior Horesh, Honghao Lin, and David P Woodruff. Quantum-inspired algorithms from randomized numerical linear algebra. arXiv:2011.04125, 2020.
  • [CLM+15] Michael B Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, and Aaron Sidford. Uniform sampling for matrix approximation. In Proceedings of the \nth6 Conference on Innovations in Theoretical Computer Science (ITCS), 2015.
  • [CNX21] Difeng Cai, James Nagy, and Yuanzhe Xi. Fast and stable deterministic approximation of general symmetric kernel matrices in high dimensions. arXiv:2102.05215, 2021.
  • [Com21] The Numpy Community. numpy.linalg.eigvals. https://numpy.org/doc/stable/reference/generated/numpy.linalg.eigvals.html, 2021.
  • [CSKSV18] David Cohen-Steiner, Weihao Kong, Christian Sohler, and Gregory Valiant. Approximating the spectrum of a graph. In Proceedings of the \nth24 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2018.
  • [DBB19] Kun Dong, Austin R Benson, and David Bindel. Network density of states. In Proceedings of the \nth25 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019.
  • [DDHK07] James Demmel, Ioana Dumitriu, Olga Holtz, and Robert Kleinberg. Fast matrix multiplication is stable. Numerische Mathematik, 2007.
  • [DK01] Petros Drineas and Ravi Kannan. Fast monte-carlo algorithms for approximate matrix multiplication. In Proceedings of the \nth42 Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2001.
  • [ER59] P. Erdös and A. Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 1959.
  • [ER18] Talya Eden and Will Rosenbaum. On sampling edges almost uniformly. SIAM Symposium on Simplicty in Algorithms (SOSA), 2018.
  • [FKV04] Alan Frieze, Ravi Kannan, and Santosh Vempala. Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM (JACM), 51(6):1025–1041, 2004.
  • [GE95] Ming Gu and Stanley C Eisenstat. A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. SIAM Journal on Matrix Analysis and Applications, 1995.
  • [Ger31] Semyon Aranovich Gershgorin. Uber die abgrenzung der eigenwerte einer matrix. Izvestiya Rossiyskoy akademii nauk. Seriya matematicheskaya, (6):749–754, 1931.
  • [GKX19] Behrooz Ghorbani, Shankar Krishnan, and Ying Xiao. An investigation into neural net optimization via hessian eigenvalue density. In Proceedings of the \nth36 International Conference on Machine Learning (ICML), 2019.
  • [GR97] Oded Goldreich and Dana Ron. Property testing in bounded degree graphs. In Proceedings of the \nth29 Annual ACM Symposium on Theory of Computing (STOC), 1997.
  • [GR08] Oded Goldreich and Dana Ron. Approximating average parameters of graphs. Random Structures & Algorithms, 32(4):473–493, 2008.
  • [GS91] Leslie Greengard and John Strain. The fast gauss transform. SIAM Journal on Scientific and Statistical Computing, 1991.
  • [GT11] Alex Gittens and Joel A Tropp. Tail bounds for all eigenvalues of a sum of random matrices. arXiv:1104.4513, 2011.
  • [HBT19] Jonas Helsen, Francesco Battistel, and Barbara M Terhal. Spectral quantum tomography. Quantum Information, 2019.
  • [HJ12] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, USA, 2nd edition, 2012.
  • [HP14] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. Advances in Neural Information Processing Systems 27 (NIPS), 2014.
  • [KS03] Robert Krauthgamer and Ori Sasson. Property testing of data dimensionality. In Proceedings of the \nth14 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2003.
  • [LK14] Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, 2014.
  • [LKF07] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM transactions on Knowledge Discovery from Data (TKDD), 2007.
  • [LNW14] Yi Li, Huy L. Nguyễn, and David P Woodruff. On sketching matrix norms and the top singular vector. In Proceedings of the \nth25 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014.
  • [LSY16] Lin Lin, Yousef Saad, and Chao Yang. Approximating spectral densities of large matrices. SIAM Review, 2016.
  • [LW16] Yi Li and David P Woodruff. On approximating functions of the singular values in a stream. In Proceedings of the \nth48 Annual ACM Symposium on Theory of Computing (STOC), 2016.
  • [LWW14] Yi Li, Zhengyu Wang, and David P Woodruff. Improved testing of low rank matrices. In Proceedings of the \nth20 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2014.
  • [LXES19] Ruipeng Li, Yuanzhe Xi, Lucas Erlandson, and Yousef Saad. The eigenvalues slicing library (EVSL): Algorithms, implementation, and software. SIAM Journal on Scientific Computing, 2019.
  • [ML12] Julian J McAuley and Jure Leskovec. Learning to discover social circles in ego networks. In Advances in Neural Information Processing Systems 25 (NIPS), 2012.
  • [MU17] Michael Mitzenmacher and Eli Upfal. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017.
  • [NSW22] Deanna Needell, William Swartworth, and David P Woodruff. Testing positive semidefiniteness using linear measurements. In Proceedings of the \nth63 Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2022.
  • [RV07] Mark Rudelson and Roman Vershynin. Sampling from large matrices: An approach through geometric functional analysis. Journal of the ACM (JACM), 2007.
  • [Saa11] Yousef Saad. Numerical methods for large eigenvalue problems: revised edition. SIAM, 2011.
  • [SBL16] Levent Sagun, Leon Bottou, and Yann LeCun. Eigenvalues of the hessian in deep learning: Singularity and beyond. arXiv:1611.07476, 2016.
  • [SgS90] G. W. Stewart and Ji guang Sun. Matrix Perturbation Theory. Academic Press, 1990.
  • [SR94] RN Silver and H Röder. Densities of states of mega-dimensional Hamiltonian matrices. International Journal of Modern Physics C, 1994.
  • [Tan18] Ewin Tang. Quantum-inspired classical algorithms for principal component analysis and supervised clustering. arXiv:1811.00414, 2018.
  • [Tro08a] Joel A Tropp. Norms of random submatrices and sparse approximation. Comptes Rendus Mathematique, 2008.
  • [Tro08b] Joel A. Tropp. The random paving property for uniformly bounded matrices. Studia Mathematica, 185:67–82, 2008.
  • [Tro15] Joel A Tropp. An introduction to matrix concentration inequalities. arXiv:1501.01571, 2015.
  • [Wan94] Lin-Wang Wang. Calculating the density of states and optical-absorption spectra of large quantum systems by the plane-wave moments method. Physical Review B, 1994.
  • [Wey12] Hermann Weyl. The asymptotic distribution law of the eigenvalues of linear partial differential equations (with an application to the theory of cavity radiation). Mathematical Annals, 1912.
  • [WWAF06] Alexander Weiße, Gerhard Wellein, Andreas Alvermann, and Holger Fehske. The kernel polynomial method. Reviews of Modern Physics, 2006.
  • [YGL+18] Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, and Michael W Mahoney. Hessian-based analysis of large batch training and robustness to adversaries. arXiv:1802.08241, 2018.

Appendix A Eigenvalue Approximation for PSD Matrices

Here we give a simple proof showing that if Algorithm 1 is used to approximate the eigenvalues of a positive semidefinite (PSD) matrix (i.e., one with all non-negative eigenvalues) using a $O(1/\epsilon^{2})\times O(1/\epsilon^{2})$ random submatrix, then the $\ell_{2}$ norm of the vector of eigenvalue approximation errors is bounded by $\epsilon n$. This much stronger result immediately implies that each eigenvalue of a PSD matrix can be approximated to $\pm\epsilon n$ additive error using just a $O(1/\epsilon^{2})\times O(1/\epsilon^{2})$ random submatrix. The proof follows from a bound in [Bha13], which bounds the $\ell_{2}$ norm of the difference of the eigenvalue vectors of a Hermitian matrix and any other matrix by the Frobenius norm of the difference of the two matrices. This improves on the bound of Theorem 1 for general entrywise bounded matrices by a $1/\epsilon^{2}$ factor, and matches the $O(1/\epsilon^{4})$ lower bound for principal submatrix queries in [BCJ20]. Note that the hard instance used to prove the lower bound in [BCJ20] can in fact be negated to be PSD, thus showing that our upper bound here is tight.

We first state the result from [Bha13] which we will be using in our proof.

Fact 5 (2\ell_{2}-norm bound on eigenvalues [Bha13]).

Let 𝐀n×n\mathbf{A}\in\mathbb{C}^{n\times n} be Hermitian and 𝐁n×n\mathbf{B}\in\mathbb{C}^{n\times n} be any matrix whose eigenvalues are λ1(𝐁),,λn(𝐁)\lambda_{1}(\mathbf{B}),\ldots,\lambda_{n}(\mathbf{B}) such that Re(λ1(𝐁))Re(λn(𝐁))Re(\lambda_{1}(\mathbf{B}))\geq\ldots\geq Re(\lambda_{n}(\mathbf{B})) (where Re(λi(𝐁))Re(\lambda_{i}(\mathbf{B})) denotes the real part of λi(𝐁)\lambda_{i}(\mathbf{B})). Let 𝐀𝐁=𝐄\mathbf{A}-\mathbf{B}=\mathbf{E}. Then

(i=1n|λi(𝐀)λi(𝐁)|2)1/22𝐄F.\displaystyle\left(\sum_{i=1}^{n}\left|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{B})\right|^{2}\right)^{1/2}\leq\sqrt{2}\|\mathbf{E}\|_{F}.

Our result is based on the following lemma, which we prove at the end of the section.

Lemma 9.

Consider a PSD matrix 𝐀=𝐁𝐁T\mathbf{A}=\mathbf{BB}^{T} with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Let SS be sampled as in Algorithm 1 for s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta}. Let 𝐒¯n×|S|\mathbf{\bar{S}}\in\mathbb{R}^{n\times|S|} be the scaled sampling matrix satisfying 𝐒¯T𝐀𝐒¯=ns𝐀S\mathbf{\bar{S}}^{T}\mathbf{A}\mathbf{\bar{S}}=\frac{n}{s}\cdot\mathbf{A}_{S}. Then with probability at least 1δ1-\delta,

𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁Fϵn.\displaystyle\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}\leq\epsilon n.

From the above Lemma we have:

Corollary 2 (Spectral norm bound – PSD matrices).

Consider a PSD matrix 𝐀\mathbf{A} with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Let SS be a subset of indices formed by including each index in [n][n] independently with probability s/ns/n as in Algorithm 1. Let 𝐀S\mathbf{A}_{S} be the corresponding principal submatrix of 𝐀\mathbf{A}, with eigenvalues λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}_{S}).

For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}_{S})\geq 0, let λ~i(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all other i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0. Then if s2ϵ2δs\geq\frac{2}{\epsilon^{2}\delta}, with probability at least 1δ1-\delta,

(i=1n|λ~i(𝐀)λi(𝐀)|2)1/2\displaystyle\left(\sum_{i=1}^{n}\left|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})\right|^{2}\right)^{1/2} ϵn,\displaystyle\leq\epsilon n,

which implies that for all i[n]i\in[n],

λi(𝐀)ϵnλ~i(𝐀)λi(𝐀)+ϵn.\displaystyle\lambda_{i}(\mathbf{A})-\epsilon n\leq\tilde{\lambda}_{i}(\mathbf{A})\leq\lambda_{i}(\mathbf{A})+\epsilon n.
Proof.

Let SS be sampled as in Algorithm 1 and let 𝐒¯n×|S|\mathbf{\bar{S}}\in\mathbb{R}^{n\times|S|} be the scaled sampling matrix satisfying 𝐒¯T𝐀𝐒¯=ns𝐀S\mathbf{\bar{S}}^{T}\mathbf{A}\mathbf{\bar{S}}=\frac{n}{s}\cdot\mathbf{A}_{S}. Since 𝐀\mathbf{A} is PSD, we can write 𝐀=𝐁𝐁T\mathbf{A}=\mathbf{B}\mathbf{B}^{T} for some matrix 𝐁n×rank(𝐀)\mathbf{B}\in\mathbb{R}^{n\times\operatorname{rank}(\mathbf{A})}. From Lemma 9, for s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta}, we have with probability at least 1δ1-\delta:

𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁Fϵn\displaystyle\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}\leq\epsilon n

Using Fact 5, we have,

(i=1rank(𝐀)|λi(𝐁T𝐒¯𝐒¯T𝐁)λi(𝐁T𝐁)|2)1/22𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁F2ϵn.\displaystyle\left(\sum_{i=1}^{\operatorname{rank}(\mathbf{A})}\left|\lambda_{i}(\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B})-\lambda_{i}(\mathbf{B}^{T}\mathbf{B})\right|^{2}\right)^{1/2}\leq\sqrt{2}\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}\leq\sqrt{2}\epsilon n. (35)

Also from Fact 1, we have λi(𝐁T𝐁)=λi(𝐁𝐁T)=λi(𝐀)\lambda_{i}(\mathbf{B}^{T}\mathbf{B})=\lambda_{i}(\mathbf{B}\mathbf{B}^{T})=\lambda_{i}(\mathbf{A}) for all irank(𝐀)i\leq\operatorname{rank}(\mathbf{A}). Thus,

(i=1rank(𝐀)|λi(𝐁T𝐒¯𝐒¯T𝐁)λi(𝐀)|2)1/22ϵn\displaystyle\left(\sum_{i=1}^{\operatorname{rank}(\mathbf{A})}\left|\lambda_{i}(\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B})-\lambda_{i}(\mathbf{A})\right|^{2}\right)^{1/2}\leq\sqrt{2}\epsilon n

Also by Fact 1, all non-zero eigenvalues of 𝐁T𝐒¯𝐒¯T𝐁\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B} are equal to those of 𝐒¯T𝐁𝐁T𝐒¯=ns𝐀S\mathbf{\bar{S}}^{T}\mathbf{BB}^{T}\mathbf{\bar{S}}=\frac{n}{s}\cdot\mathbf{A}_{S}. All other eigenvalue estimates are set to 0. Further, for all i>rank(𝐀)i>\operatorname{rank}(\mathbf{A}), λi(𝐀)=0\lambda_{i}(\mathbf{A})=0. Thus,

(i=1n|λ~i(𝐀)λi(𝐀)|2)1/22ϵn.\displaystyle\left(\sum_{i=1}^{n}\left|\tilde{\lambda}_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A})\right|^{2}\right)^{1/2}\leq\sqrt{2}\epsilon n.

Adjusting ϵ\epsilon to ϵ/2\epsilon/\sqrt{2} then gives us the bound. ∎

We now prove Lemma 9, using a standard approach for sampling based approximate matrix multiplication – see e.g. [DK01].

Proof of Lemma 9.

For k=1,,nk=1,\ldots,n let 𝐘k=ns1\mathbf{Y}_{k}=\frac{n}{s}-1 with probability sn\frac{s}{n} and 𝐘k=1\mathbf{Y}_{k}=-1 with probability 1sn1-\frac{s}{n}. Thus 𝔼[𝐘k]=0\mathbb{E}[\mathbf{Y}_{k}]=0 and

𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁F2=i=1nj=1n(k=1n𝐘k𝐁ik𝐁jk)2.\displaystyle\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\sum_{k=1}^{n}\mathbf{Y}_{k}\cdot\mathbf{B}_{ik}\mathbf{B}_{jk}\right)^{2}.

Fixing $i,j$, the $\mathbf{Y}_{k}\cdot\mathbf{B}_{ik}\mathbf{B}_{jk}$ are independent, mean-zero random variables. Thus we have:

𝔼[𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁F2]\displaystyle\mathbb{E}\left[\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}^{2}\right] =i=1nj=1n𝔼[(k=1n𝐘k𝐁ik𝐁jk)2]\displaystyle=\sum_{i=1}^{n}\sum_{j=1}^{n}\mathbb{E}\left[\left(\sum_{k=1}^{n}\mathbf{Y}_{k}\cdot\mathbf{B}_{ik}\mathbf{B}_{jk}\right)^{2}\right]
=i=1nj=1nVar[k=1n𝐘k𝐁ik𝐁jk]\displaystyle=\sum_{i=1}^{n}\sum_{j=1}^{n}\operatorname{Var}\left[\sum_{k=1}^{n}\mathbf{Y}_{k}\cdot\mathbf{B}_{ik}\mathbf{B}_{jk}\right]
=i=1nj=1nk=1nVar[𝐘k𝐁ik𝐁jk]\displaystyle=\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\operatorname{Var}\left[\mathbf{Y}_{k}\cdot\mathbf{B}_{ik}\mathbf{B}_{jk}\right]
i=1nj=1nk=1nns𝐁ik2𝐁jk2.\displaystyle\leq\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\frac{n}{s}\cdot\mathbf{B}_{ik}^{2}\mathbf{B}_{jk}^{2}.

since Var[𝐘k]=(ns1)2sn+(1sn)=ns2+sn+1sn=ns1\operatorname{Var}[\mathbf{Y}_{k}]=\left(\frac{n}{s}-1\right)^{2}\cdot\frac{s}{n}+\left(1-\frac{s}{n}\right)=\frac{n}{s}-2+\frac{s}{n}+1-\frac{s}{n}=\frac{n}{s}-1. Rearranging the sums we have:

𝔼[𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁F2]nsk=1ni=1n𝐁ik2j=1n𝐁jk2.\displaystyle\mathbb{E}[\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}^{2}]\leq\frac{n}{s}\sum_{k=1}^{n}\sum_{i=1}^{n}\mathbf{B}_{ik}^{2}\sum_{j=1}^{n}\mathbf{B}_{jk}^{2}.

Observe that j=1n𝐁jk2=𝐀kk𝐀1\sum_{j=1}^{n}\mathbf{B}_{jk}^{2}=\mathbf{A}_{kk}\leq\|\mathbf{A}\|_{\infty}\leq 1, thus overall we have:

𝔼[𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁F2]n2sϵ2δn2.\displaystyle\mathbb{E}[\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}^{2}]\leq\frac{n^{2}}{s}\leq\epsilon^{2}\delta n^{2}.

So by Markov’s inequality, with probability 1δ\geq 1-\delta, 𝐁T𝐒¯𝐒¯T𝐁𝐁T𝐁F2ϵ2n2\|\mathbf{B}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{B}-\mathbf{B}^{T}\mathbf{B}\|_{F}^{2}\leq\epsilon^{2}n^{2}. This completes the theorem after taking a square root. ∎

Remark: The proof of Lemma 9 can be easily modified to show that the iith row of 𝐀\mathbf{A} can be sampled with probability proportional to |𝐀ii|tr(𝐀)\frac{|\mathbf{A}_{ii}|}{\text{tr}(\mathbf{A})} to approximate the eigenvalues of any PSD 𝐀\mathbf{A} up to ±ϵtr(𝐀)\pm\epsilon\cdot\text{tr}(\mathbf{A}) error (tr(𝐀)\text{tr}(\mathbf{A}) is the trace of 𝐀\mathbf{A}). When sampling with probability proportional to |𝐀ii|tr(𝐀)\frac{|\mathbf{A}_{ii}|}{\text{tr}(\mathbf{A})}, we do not require a bounded entry assumption on 𝐀\mathbf{A}.
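A minimal sketch of this diagonal (trace)-proportional variant follows; the $1/\sqrt{p_{i}p_{j}}$ rescaling mirrors our sparsity- and norm-based samplers and is one natural reading of the remark, with illustrative function and variable names.

import numpy as np

def trace_sample_estimates(A, s, rng=np.random.default_rng()):
    # For PSD A, keep index i with probability p_i = min(1, s * A_ii / tr(A)), rescale
    # the sampled principal submatrix entrywise by 1/sqrt(p_i * p_j), and use its
    # eigenvalues (clipped at 0, since A is PSD) as estimates of the top eigenvalues of A.
    n = A.shape[0]
    diag = np.diag(A).astype(float)
    p = np.minimum(1.0, s * diag / diag.sum())
    keep = rng.random(n) < p
    scale = 1.0 / np.sqrt(p[keep])
    A_S = A[np.ix_(keep, keep)] * np.outer(scale, scale)
    eigs = np.sort(np.linalg.eigvalsh(A_S))[::-1]
    est = np.zeros(n)
    est[:len(eigs)] = np.maximum(eigs, 0.0)
    return est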

Appendix B Alternate Bound for Uniform Sampling

In this section we provide an alternate bound for approximating eigenvalues with uniform sampling. The sample complexity is worse by a factor of 1/ϵ1/\epsilon for this approach, but better by a factor log2n\log^{2}n as compared to Theorem 1. We start with an analog to Lemma 3, showing that the outlying eigenspace remains nearly orthogonal after sampling. In particular, we show concentration of the Hermitian matrix 𝐕oT𝐒¯𝐒¯T𝐕o\mathbf{V}_{o}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{V}_{o} about its expectation 𝐕oT𝐕o=𝐈\mathbf{V}_{o}^{T}\mathbf{V}_{o}=\mathbf{I} rather than the non-Hermitian 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\boldsymbol{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\mathbf{\bar{S}}\mathbf{\bar{S}}^{T}\mathbf{V}_{o}\boldsymbol{\Lambda}_{o}^{1/2} as in Lemma 3. This allows us to use Weyl’s inequality in our final analysis, rather than the non-Hermitian eigenvalue perturbation bound of Fact 4, saving a log2n\log^{2}n factor in the sample complexity.

Lemma 10 (Near orthonormality – sampled outlying eigenvalues).

Let SS be sampled as in Algorithm 1 for sclog(1/(ϵδ))ϵ4δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{4}\delta} where cc is a sufficiently large constant. Let 𝐒¯n×|S|\mathbf{\bar{S}}\in\mathbb{R}^{n\times|S|} be the scaled sampling matrix satisfying 𝐒¯T𝐀𝐒¯=ns𝐀S\mathbf{\bar{S}}^{T}\mathbf{A}\mathbf{\bar{S}}=\frac{n}{s}\cdot\mathbf{A}_{S}. Then with probability at least 1δ1-\delta, 𝐕oT𝐒¯𝐒¯T𝐕o𝐈2ϵ.\|\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}-\mathbf{I}\|_{2}\leq\epsilon.

Proof.

The result is standard in randomized numerical linear algebra – see e.g., [CLM+15]. For completeness, we give a proof here. Define 𝐄=𝐕oT𝐒¯𝐒¯T𝐕o𝐈\mathbf{E}=\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}-\mathbf{I}. For all i[n]i\in[n], let 𝐕o,i\mathbf{V}_{o,i} be the ithi^{th} row of 𝐕o\mathbf{V}_{o} and define the matrix valued random variable

𝐘i={ns𝐕o,i𝐕o,iT,with probability s/n0otherwise.\displaystyle\mathbf{Y}_{i}=\begin{cases}\frac{n}{s}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T},&\text{with probability }s/n\\ 0&\text{otherwise.}\end{cases}

Then, similar to the proof of Lemma 3, define 𝐐i=𝐘i𝔼[𝐘i]\mathbf{Q}_{i}=\mathbf{Y}_{i}-\mathbb{E}\left[\mathbf{Y}_{i}\right]. Since 𝐐1,𝐐2,,𝐐n\mathbf{Q}_{1},\mathbf{Q}_{2},\ldots,\mathbf{Q}_{n} are independent random variables and i=1n𝐐i=𝐕oT𝐒¯𝐒¯T𝐕o𝐈=𝐄\sum_{i=1}^{n}\mathbf{Q}_{i}=\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}-\mathbf{I}=\mathbf{E} , we need to bound 𝐐i2\|\mathbf{Q}_{i}\|_{2} for all i[n]i\in[n] and 𝐕𝐚𝐫(𝐄)=def𝔼(𝐄𝐄T)=𝔼(𝐄T𝐄)=i=1n𝔼[𝐐i2]\mathbf{Var}(\mathbf{E})\mathbin{\stackrel{{\scriptstyle\rm def}}{{=}}}\mathbb{E}(\mathbf{EE}^{T})=\mathbb{E}(\mathbf{E}^{T}\mathbf{E})=\sum_{i=1}^{n}\mathbb{E}[\mathbf{Q}_{i}^{2}]. Observe 𝐐i2max(1,ns1)𝐕o,i𝐕o,iT2=max(1,ns1)𝐕o,i221ϵ2δs\|\mathbf{Q}_{i}\|_{2}\leq\max\left(1,\frac{n}{s}-1\right)\|\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\|_{2}=\max\left(1,\frac{n}{s}-1\right)\|\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{1}{\epsilon^{2}\delta s}, by row norm bounds of Lemma 2. Again, using Lemma 2 we have

i=1n𝔼[𝐐i2]\displaystyle\sum_{i=1}^{n}\mathbb{E}[\mathbf{Q}_{i}^{2}] =i=1nsn(ns1)2(𝐕o,i𝐕o,iT)2+(1sn)(𝐕o,i𝐕o,iT)2\displaystyle=\sum_{i=1}^{n}\frac{s}{n}\cdot\left(\frac{n}{s}-1\right)^{2}(\mathbf{V}_{o,i}\mathbf{V}^{T}_{o,i})^{2}+\left(1-\frac{s}{n}\right)(\mathbf{V}_{o,i}\mathbf{V}^{T}_{o,i})^{2}
i=1nns𝐕o,i22(𝐕o,i𝐕o,iT)\displaystyle\preceq\sum_{i=1}^{n}\frac{n}{s}\|\mathbf{V}_{o,i}\|_{2}^{2}(\mathbf{V}_{o,i}\mathbf{V}^{T}_{o,i})
i=1nns1ϵ2δn(𝐕o,i𝐕o,iT)\displaystyle\preceq\sum_{i=1}^{n}\frac{n}{s}\frac{1}{\epsilon^{2}\delta n}(\mathbf{V}_{o,i}\mathbf{V}^{T}_{o,i})
1sϵ2δ𝐈\displaystyle\preceq\frac{1}{s\epsilon^{2}\delta}\cdot\mathbf{I}

where $\mathbf{I}$ is the identity matrix of appropriate dimension. By setting $d=\frac{1}{\epsilon^{2}\delta}$, we can finally bound the probability of the event $\|\mathbf{E}\|_{2}\geq\epsilon$ by $\delta$ using Theorem 6 (the matrix Bernstein inequality), provided $s\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{4}\delta}$. Since these steps follow Lemma 3 nearly exactly, we omit them here. ∎
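As a quick numerical illustration of this concentration (not part of the formal argument), the sketch below draws a random orthonormal basis, whose rows are incoherent (qualitatively matching the row norm bounds of Lemma 2), samples indices at rate $s/n$, and checks that the rescaled Gram matrix is close to the identity; the dimensions and seed are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n, k, s = 5000, 10, 500
V_o, _ = np.linalg.qr(rng.standard_normal((n, k)))   # random orthonormal n x k basis V_o
keep = rng.random(n) < s / n                          # uniform sampling at rate s/n
G = (n / s) * V_o[keep].T @ V_o[keep]                 # equals V_o^T S S^T V_o
print(np.linalg.norm(G - np.eye(k), 2))               # typically well below 1 at these sizes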

With Lemma 10 in place, we can now give our alternate sample complexity bound.

Theorem 8 (Sublinear Time Eigenvalue Approximation).

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 and eigenvalues λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}). Let S[n]S\subseteq[n] be formed by including each index independently with probability s/ns/n as in Algorithm 1. Let 𝐀S\mathbf{A}_{S} be the corresponding principal submatrix of 𝐀\mathbf{A}, with eigenvalues λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}_{S}).

For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}_{S})\geq 0, let λ~i(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all i{1,,|S|}i\in\{1,\ldots,|S|\} with λi(𝐀S)<0\lambda_{i}(\mathbf{A}_{S})<0, let λ~n(|S|i)(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all other i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0. If sclognϵ4δs\geq\frac{c\log n}{\epsilon^{4}\delta}, for a large enough constant cc, then with probability 1δ\geq 1-\delta, for all i[n]i\in[n],

λi(𝐀)ϵnλ~i(𝐀)λi(𝐀)+ϵn.\displaystyle\lambda_{i}(\mathbf{A})-\epsilon n\leq\tilde{\lambda}_{i}(\mathbf{A})\leq\lambda_{i}(\mathbf{A})+\epsilon n.
Proof.

Let 𝐒n×|S|\mathbf{S}\in\mathbb{R}^{n\times|S|} be the binary sampling matrix with a single one in each column such that 𝐒T𝐀𝐒=𝐀S\mathbf{S}^{T}\mathbf{A}\mathbf{S}=\mathbf{A}_{S}. Let 𝐒¯=n/s𝐒\bar{\mathbf{S}}=\sqrt{n/s}\cdot\mathbf{S}. Following Definition 1.1, we write 𝐀=𝐀o+𝐀m\mathbf{A}=\mathbf{A}_{o}+\mathbf{A}_{m}. By Fact 1 we have that the nonzero eigenvalues of ns𝐀o,S=𝐒¯T𝐕o𝚲o𝐕oT𝐒¯\frac{n}{s}\cdot\mathbf{A}_{o,S}=\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T}\bar{\mathbf{S}} are identical to those of 𝚲o𝐕oT𝐒¯𝐒¯T𝐕o\mathbf{\Lambda}_{o}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}.

Note that 𝐇=𝐕oT𝐒¯𝐒¯T𝐕o\mathbf{H}=\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o} is positive semidefinite. Writing its eigendecomposition 𝐇=𝐔𝐖𝐔T\mathbf{H}=\mathbf{U}\mathbf{W}\mathbf{U}^{T} we can define the matrix squareroot 𝐇1/2=𝐔𝐖1/2𝐔T\mathbf{H}^{1/2}=\mathbf{U}\mathbf{W}^{1/2}\mathbf{U}^{T} with 𝐇1/2𝐇1/2=𝐇\mathbf{H}^{1/2}\mathbf{H}^{1/2}=\mathbf{H}. By Lemma 10 applied with error ϵ/6\epsilon/6, with probability at least 1δ1-\delta, all eigenvalues of 𝐇\mathbf{H} lie in the range [1ϵ/6,1+ϵ/6][1-\epsilon/6,1+\epsilon/6]. In turn, all eigenvalues of 𝐇1/2\mathbf{H}^{1/2} also lie in this range. Again using Fact 1, we have that the nonzero eigenvalues of 𝚲o𝐇\mathbf{\Lambda}_{o}\mathbf{H}, and in turn those of ns𝐀o,S\frac{n}{s}\cdot\mathbf{A}_{o,S}, are identical to those of 𝐇1/2𝚲o𝐇1/2\mathbf{H}^{1/2}\mathbf{\Lambda}_{o}\mathbf{H}^{1/2}.

Let 𝐄=𝐇1/2𝐈=𝐔𝐖1/2𝐔T𝐔𝐔T=𝐔(𝐖1/2𝐈)𝐔T\mathbf{E}=\mathbf{H}^{1/2}-\mathbf{I}=\mathbf{U}\mathbf{W}^{1/2}\mathbf{U}^{T}-\mathbf{U}\mathbf{U}^{T}=\mathbf{U}(\mathbf{W}^{1/2}-\mathbf{I})\mathbf{U}^{T}. Since the diagonal entries of 𝐖1/2\mathbf{W}^{1/2} lie in [1ϵ/6,1+ϵ/6][1-\epsilon/6,1+\epsilon/6], those of 𝐖1/2𝐈\mathbf{W}^{1/2}-\mathbf{I} lie in [ϵ/6,ϵ/6][-\epsilon/6,\epsilon/6]. Thus, 𝐄2ϵ/6\|\mathbf{E}\|_{2}\leq\epsilon/6. We can write

λi(𝐇1/2𝚲o𝐇1/2)=λi((𝐈+𝐄)𝚲o(𝐈+𝐄))=λi(𝚲o+𝐄𝚲o+𝚲o𝐄+𝐄𝚲o𝐄).\displaystyle\lambda_{i}(\mathbf{H}^{1/2}\mathbf{\Lambda}_{o}\mathbf{H}^{1/2})=\lambda_{i}((\mathbf{I}+\mathbf{E})\mathbf{\Lambda}_{o}(\mathbf{I}+\mathbf{E}))=\lambda_{i}(\mathbf{\Lambda}_{o}+\mathbf{E}\mathbf{\Lambda}_{o}+\mathbf{\Lambda}_{o}\mathbf{E}+\mathbf{E}\mathbf{\Lambda}_{o}\mathbf{E}).

We can then bound

𝐄𝚲o+𝚲o𝐄+𝐄𝚲o𝐄2\displaystyle\|\mathbf{E}\mathbf{\Lambda}_{o}+\mathbf{\Lambda}_{o}\mathbf{E}+\mathbf{E}\mathbf{\Lambda}_{o}\mathbf{E}\|_{2} 𝐄𝚲o2+𝚲o𝐄2+𝐄𝚲o𝐄2\displaystyle\leq\|\mathbf{E}\mathbf{\Lambda}_{o}\|_{2}+\|\mathbf{\Lambda}_{o}\mathbf{E}\|_{2}+\|\mathbf{E}\mathbf{\Lambda}_{o}\mathbf{E}\|_{2}
𝐄2𝚲o2+𝚲o2𝐄2+𝐄2𝚲o2𝐄2\displaystyle\leq\|\mathbf{E}\|_{2}\|\mathbf{\Lambda}_{o}\|_{2}+\|\mathbf{\Lambda}_{o}\|_{2}\|\mathbf{E}\|_{2}+\|\mathbf{E}\|_{2}\|\mathbf{\Lambda}_{o}\|_{2}\|\mathbf{E}\|_{2}
ϵn/6+nϵ/6+ϵ2n/36\displaystyle\leq\epsilon n/6+n\epsilon/6+\epsilon^{2}n/36
ϵ/2n.\displaystyle\leq\epsilon/2\cdot n.

Applying Weyl’s eigenvalue perturbation theorem (Fact 3), we thus have for all ii,

|λi(𝐇1/2𝚲o𝐇1/2)λi(𝚲o)|\displaystyle\left|\lambda_{i}(\mathbf{H}^{1/2}\mathbf{\Lambda}_{o}\mathbf{H}^{1/2})-\lambda_{i}(\mathbf{\Lambda}_{o})\right| <ϵ/2n.\displaystyle<\epsilon/2\cdot n. (36)

We have thus shown that the nonzero eigenvalues of \frac{n}{s}\cdot\mathbf{A}_{o,S} are identical to those of \mathbf{H}^{1/2}\mathbf{\Lambda}_{o}\mathbf{H}^{1/2}, which in turn approximate those of \mathbf{\Lambda}_{o}, and hence of \mathbf{A}_{o}, to within \pm\frac{\epsilon}{2}n by (36); i.e., the nonzero eigenvalues of \frac{n}{s}\cdot\mathbf{A}_{o,S} approximate all outlying eigenvalues of \mathbf{A}. We can also bound the middle eigenvalues using Lemma 4, as in the proof of Theorem 1. It remains to argue how these approximations ‘line up’ in the presence of zero eigenvalues in the spectra of these matrices; this part of the proof again proceeds as in the proof of Theorem 1 in Section 3.2.

Analogously to the proof of Theorem 1, by Lemma 10, equation (36) holds with probability 1-\delta if s\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{4}\delta}. We also require s\geq\frac{c\log n}{\epsilon^{2}\delta} for \|\mathbf{A}_{m,S}\|_{2}\leq\epsilon n to hold with probability 1-\delta by Lemma 4. Thus, for both conditions to hold simultaneously with probability 1-2\delta by a union bound, it suffices to set s=\frac{c\log n}{\epsilon^{4}\delta}\geq\max\left(\frac{c\log(1/(\epsilon\delta))}{\epsilon^{4}\delta},\frac{c\log n}{\epsilon^{2}\delta}\right), where we use that \log(1/(\epsilon\delta))=O(\log n), as otherwise our algorithm can take \mathbf{A}_{S} to be all of \mathbf{A}. Adjusting \delta to \delta/2 completes the theorem. ∎
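For concreteness, the following numpy sketch mirrors the estimator analyzed above (uniform sampling, as in Algorithm 1): sample a principal submatrix, scale its eigenvalues by n/s, align nonnegative estimates with the top of the spectrum and negative estimates with the bottom, and fill the rest with zeros. The function name, the constant c, and the edge-case handling are illustrative only and are not part of the analysis.

import numpy as np

def estimate_eigenvalues_uniform(A, eps, delta, c=10.0, rng=None):
    # Sketch of the uniform-sampling estimator analyzed above. Each index is
    # kept independently with probability s/n; the eigenvalues of the sampled
    # principal submatrix are scaled by n/s; nonnegative estimates are aligned
    # with the top of the spectrum and negative estimates with the bottom.
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    s = min(n, c * np.log(n) / (eps ** 4 * delta))   # sampling budget from the theorem
    est = np.zeros(n)
    keep = rng.random(n) < s / n                      # the random index set S
    if not keep.any():
        return est
    A_S = A[np.ix_(keep, keep)]
    sub_eigs = (n / s) * np.linalg.eigvalsh(A_S)      # scaled eigenvalues of A_S

    pos = np.sort(sub_eigs[sub_eigs >= 0])[::-1]      # nonnegative estimates, descending
    neg = np.sort(sub_eigs[sub_eigs < 0])[::-1]       # negative estimates, descending
    est[:len(pos)] = pos                              # estimates of the largest eigenvalues
    if len(neg) > 0:
        est[n - len(neg):] = neg                      # estimates of the smallest eigenvalues
    return est                                        # est[i] approximates lambda_{i+1}(A)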

Appendix C Refined Bounds

In this section, we show how to obtain better query complexity or tighter approximation factors by modifying the proofs of Theorem 1 and Lemmas 2 and 3 under additional assumptions. We give an extension of Theorem 1 in Theorem 9 for the case when the magnitudes of the nonzero eigenvalues of \mathbf{A}_{o} lie in a bounded range – between \epsilon^{a}\sqrt{\delta}n and \epsilon^{b}n, where 0\leq b\leq a\leq 1.

Theorem 9.

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1 and eigenvalues λ1(𝐀)λn(𝐀)\lambda_{1}(\mathbf{A})\geq\ldots\geq\lambda_{n}(\mathbf{A}). Let 𝐀o\mathbf{A}_{o} be as in Definition 1.1 such that for all eigenvalues λi(𝐀o)\lambda_{i}(\mathbf{A}_{o}) we have either ϵaδn|λi(𝐀o)|ϵbn\epsilon^{a}\sqrt{\delta}n\leq\lvert\lambda_{i}(\mathbf{A}_{o})\rvert\leq\epsilon^{b}n or λi(𝐀o)=0\lambda_{i}(\mathbf{A}_{o})=0 where 0ba10\leq b\leq a\leq 1. Let S[n]S\subseteq[n] be formed by including each index independently with probability s/ns/n as in Algorithm 1. Let 𝐀S\mathbf{A}_{S} be the corresponding principal submatrix of 𝐀\mathbf{A}, with eigenvalues λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}_{S}).

For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}_{S})\geq 0, let λ~i(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all i[|S|]i\in[|S|] with λi(𝐀S)<0\lambda_{i}(\mathbf{A}_{S})<0, let λ~n(|S|i)(𝐀)=nsλi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\frac{n}{s}\cdot\lambda_{i}(\mathbf{A}_{S}). For all other i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0. If sclog(1/(ϵδ))log2+abnϵ2+abδs\geq\frac{c\log(1/(\epsilon\delta))\log^{2+a-b}n}{\epsilon^{2+a-b}\delta}, for large enough cc, then with probability at least 1δ1-\delta, for all i[n]i\in[n],

λi(𝐀)ϵnλ~i(𝐀)λi(𝐀)+ϵn.\displaystyle\lambda_{i}(\mathbf{A})-\epsilon n\leq\tilde{\lambda}_{i}(\mathbf{A})\leq\lambda_{i}(\mathbf{A})+\epsilon n.
Proof.

The proof follows by modifying the proofs of Theorem 1 and Lemmas 2 and 3 to account for the tighter intervals. First observe that since \lvert\lambda_{i}(\mathbf{A}_{o})\rvert\geq\epsilon^{a}\sqrt{\delta}n for all nonzero eigenvalues of \mathbf{A}_{o}, we can give a tighter row norm bound for \mathbf{V}_{o} than in the proof of Lemma 2. In particular, from equation (3) we get:

𝚲o1/2𝐕o,i221ϵaδ and 𝐕o,i22\displaystyle\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{1}{\epsilon^{a}\sqrt{\delta}}\hskip 20.00003pt\text{ and }\hskip 20.00003pt\|\mathbf{V}_{o,i}\|_{2}^{2} nϵ2aδn2=1ϵ2aδn.\displaystyle\leq\frac{n}{\epsilon^{2a}\delta n^{2}}=\frac{1}{\epsilon^{2a}\delta n}.

We can then bound the number of samples we need to take such that for 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2} (as defined in Theorem 8) we have 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2ϵn\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}\|_{2}\leq\epsilon n with probability at least 1δ1-\delta via a matrix Bernstein bound. By appropriately modifying the proof of Lemma 3 to incorporate the stronger row norm bound for 𝐕o\mathbf{V}_{o}, we can show that sampling with probability s/ns/n for sclog(1/(ϵδ))ϵ2+abδs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{2+a-b}\delta} for large enough cc suffices. Specifically, we get LnϵaδsL\leq\frac{n}{\epsilon^{a}\sqrt{\delta}s}, vn2ϵabδsv\leq\frac{n^{2}}{\epsilon^{a-b}\sqrt{\delta}s} and dlog(1/(ϵ2δ))d\leq\log(1/(\epsilon^{2}\delta)) for the Bernstein bound in Lemma 3 which enables us to get the tighter bound. Thus, we have 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2ϵn\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}\|_{2}\leq\epsilon n with probability 1δ1-\delta for sclog(1/(ϵδ))ϵ2+abδs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{2+a-b}\sqrt{\delta}} following Lemma 3. We also require sclognϵ2δs\geq\frac{c\log n}{\epsilon^{2}\delta} for 𝐀m,S2ϵn\|\mathbf{A}_{m,S}\|_{2}\leq\epsilon n to hold with probability 1δ1-\delta by Lemma 4. Then, following the proof of Theorem 1, by Fact 4, for all i[n]i\in[n], and some constant CC, we have:

|λi(𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2)λi(𝚲o)|Clogn𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2.\lvert\lambda_{i}(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2})-\lambda_{i}(\mathbf{\Lambda}_{o})\rvert\leq C\log n\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2}-\mathbf{\Lambda}_{o}\|_{2}.

As in the proof of Theorem 1, adjusting ϵ\epsilon by a 1Clogn\frac{1}{C\log n} factor, we get |λi(𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2)λi(𝚲o)|ϵn\lvert\lambda_{i}(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o}^{T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}_{o}\mathbf{\Lambda}_{o}^{1/2})-\lambda_{i}(\mathbf{\Lambda}_{o})\rvert\leq\epsilon n with probability 1δ1-\delta for sclog(1/(ϵδ))log2+abnϵ2+abδs\geq\frac{c\log(1/(\epsilon\delta))\log^{2+a-b}n}{\epsilon^{2+a-b}\sqrt{\delta}}. Then we follow the proof of Theorem 1 to align the eigenvalues completing the proof. ∎
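As a concrete illustration, consider two direct substitutions into the bound above. If a=b, so that all outlying eigenvalues share a single scale, the requirement becomes s\geq\frac{c\log(1/(\epsilon\delta))\log^{2}n}{\epsilon^{2}\delta}, improving the 1/\epsilon^{4} dependence of the general uniform sampling bound to 1/\epsilon^{2} at the cost of additional logarithmic factors. At the other extreme, a=1 and b=0 holds whenever, as in Definition 1.1, the outlying eigenvalues are those of magnitude at least \epsilon\sqrt{\delta}n (their magnitude is at most n automatically, since \|\mathbf{A}\|_{\infty}\leq 1), and the bound reads s\geq\frac{c\log(1/(\epsilon\delta))\log^{3}n}{\epsilon^{3}\delta}.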

Appendix D Spectral Norm Bounds for Non-Uniform Random Submatrices

See 5

Proof.

The proof follows the approach of [Tro08b]. We begin by defining the quantity

E𝔼2𝐀𝐒2.\displaystyle E\coloneqq\mathbb{E}_{2}\|\mathbf{AS}\|_{2}.

Now we have

E2=𝔼𝐀𝐒22=𝔼𝐀𝐒𝐒𝐀2=𝔼j=1nδj2𝐀j𝐀j2,\displaystyle E^{2}=\mathbb{E}\|\mathbf{A}\mathbf{S}\|^{2}_{2}=\mathbb{E}\|\mathbf{A}\mathbf{S}\mathbf{S}\mathbf{A}^{*}\|_{2}=\mathbb{E}\left\|\sum_{j=1}^{n}\delta^{2}_{j}\mathbf{A}_{j}\mathbf{A}_{j}^{*}\right\|_{2},

where δj\delta_{j} is the sequence of independent random variables such that δj=1pj\delta_{j}=\frac{1}{\sqrt{p_{j}}} with probability pjp_{j} and 0 otherwise, and 𝐀j\mathbf{A}_{j} is the jjth column of 𝐀\mathbf{A}. Then, μj=𝔼[(δj)2]=1\mu_{j}=\mathbb{E}[(\delta_{j})^{2}]=1. Let {δj}\{\delta_{j}^{\prime}\} be an independent copy of the sequence {δj}\{\delta_{j}\}. Subtracting the mean and applying triangle inequality we have

\displaystyle E^{2}\leq\mathbb{E}\left\|\sum_{j=1}^{n}(\delta^{2}_{j}-\mathbb{E}[(\delta_{j}^{\prime})^{2}])\mathbf{A}_{j}\mathbf{A}_{j}^{*}\right\|_{2}+\left\|\sum_{j=1}^{n}\mathbf{A}_{j}\mathbf{A}_{j}^{*}\right\|_{2}.

Using Jensen’s inequality we have

E2\displaystyle E^{2} 𝔼j=1n(δj2(δj)2)𝐀j𝐀j2+𝐀𝐀2.\displaystyle\leq\mathbb{E}\left\|\sum_{j=1}^{n}(\delta^{2}_{j}-(\delta_{j}^{\prime})^{2})\mathbf{A}_{j}\mathbf{A}_{j}^{*}\right\|_{2}+\left\|\mathbf{A}\mathbf{A}^{*}\right\|_{2}.

The random variables (\delta^{2}_{j}-(\delta_{j}^{\prime})^{2}) are symmetric and independent. Let \{\epsilon_{j}\}_{j\in[n]} be i.i.d. Rademacher random variables. Then applying the standard symmetrization argument followed by the triangle inequality, we have:

E2\displaystyle E^{2} 2𝔼j=1nϵjδj2𝐀j𝐀j2+𝐀𝐀2.\displaystyle\leq 2\mathbb{E}\left\|\sum_{j=1}^{n}\epsilon_{j}\delta^{2}_{j}\mathbf{A}_{j}\mathbf{A}_{j}^{*}\right\|_{2}+\left\|\mathbf{A}\mathbf{A}^{*}\right\|_{2}.

Let \Omega=\{j:\delta_{j}=\frac{1}{\sqrt{p_{j}}}\}. Let \mathbb{E}_{\epsilon} denote the partial expectation with respect to \{\epsilon_{j}\}, keeping the other random variables fixed. Then, we get:

E2\displaystyle E^{2} 2𝔼Ω[𝔼ϵΩϵjδj2𝐀j𝐀jT2]+𝐀22.\displaystyle\leq 2\mathbb{E}_{\Omega}\left[\mathbb{E}_{\epsilon}\left\|\sum_{\Omega}\epsilon_{j}\delta_{j}^{2}\mathbf{A}_{j}\mathbf{A}_{j}^{T}\right\|_{2}\right]+\|\mathbf{A}\|^{2}_{2}.

Using Rudelson's lemma (Lemma 11 of [Tro08b]), for any matrix \mathbf{X} with columns \mathbf{x}_{1},\mathbf{x}_{2},\ldots,\mathbf{x}_{n} and q=2\log n, we have

(𝔼j=1nϵj𝐱j𝐱j2q)1/q\displaystyle\left(\mathbb{E}\left\|\sum_{j=1}^{n}\epsilon_{j}\mathbf{x}_{j}\mathbf{x}_{j}^{*}\right\|_{2}^{q}\right)^{1/q} 1.5q𝐗12𝐗2.\displaystyle\leq 1.5\sqrt{q}\|\mathbf{X}\|_{1\to 2}\|\mathbf{X}\|_{2}.

Since (.)1/q(.)^{1/q} is concave for q1q\geq 1, using Jensen’s inequality we get:

𝔼j=1nϵj𝐱j𝐱j2\displaystyle\mathbb{E}\left\|\sum_{j=1}^{n}\epsilon_{j}\mathbf{x}_{j}\mathbf{x}_{j}^{*}\right\|_{2} 1.5q𝐗12𝐗2\displaystyle\leq 1.5\sqrt{q}\|\mathbf{X}\|_{1\to 2}\|\mathbf{X}\|_{2}

Applying the above result to the matrix 𝐀𝐒\mathbf{A}\mathbf{S}, we get:

E2\displaystyle E^{2} 3q[𝔼(𝐀𝐒12𝐀𝐒2)]+𝐀22.\displaystyle\leq 3\sqrt{q}\left[\mathbb{E}(\|\mathbf{A}\mathbf{S}\|_{1\rightarrow 2}\|\mathbf{A}\mathbf{S}\|_{2})\right]+\|\mathbf{A}\|_{2}^{2}.

Applying the Cauchy–Schwarz inequality, we get:

E2\displaystyle E^{2} 3q(𝔼𝐀𝐒122)1/2(𝔼𝐀𝐒22)1/2+𝐀22.\displaystyle\leq 3\sqrt{q}(\mathbb{E}\|\mathbf{A}\mathbf{S}\|^{2}_{1\rightarrow 2})^{1/2}(\mathbb{E}\|\mathbf{A}\mathbf{S}\|^{2}_{2})^{1/2}+\|\mathbf{A}\|_{2}^{2}.

The above inequality is of the form E^{2}\leq bE+c. Thus, the values of E for which it holds satisfy E\leq\frac{b+\sqrt{b^{2}+4c}}{2}\leq b+\sqrt{c}, where the last step uses \sqrt{b^{2}+4c}\leq b+2\sqrt{c}. Thus, we get:

𝔼2𝐀𝐒23q𝔼2𝐀𝐒12+𝐀2.\displaystyle\mathbb{E}_{2}\|\mathbf{AS}\|_{2}\leq 3\sqrt{q}\mathbb{E}_{2}\|\mathbf{AS}\|_{1\rightarrow 2}+\|\mathbf{A}\|_{2}.

This gives us the final bound. ∎
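The bound above can also be sanity-checked numerically. The following small numpy sketch (purely illustrative; the helper name, per-column probabilities p, and trial count are arbitrary choices) estimates both sides of \mathbb{E}_{2}\|\mathbf{AS}\|_{2}\leq 3\sqrt{q}\,\mathbb{E}_{2}\|\mathbf{AS}\|_{1\to 2}+\|\mathbf{A}\|_{2} with q=2\log n by Monte Carlo, where \mathbf{S} is diagonal with \mathbf{S}_{jj}=1/\sqrt{p_{j}} with probability p_{j} and 0 otherwise.

import numpy as np

def check_norm_bound(A, p, trials=200, rng=None):
    # Monte Carlo estimates of E_2||AS||_2 (lhs) and 3*sqrt(2 log n)*E_2||AS||_{1->2} + ||A||_2 (rhs),
    # where E_2 X = sqrt(E[X^2]) and ||.||_{1->2} is the largest column norm.
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    p = np.broadcast_to(np.asarray(p, dtype=float), (n,))
    spec_sq, col_sq = [], []
    for _ in range(trials):
        d = (rng.random(n) < p) / np.sqrt(p)      # diagonal of the sampling matrix S
        AS = A * d[None, :]                        # A times S
        spec_sq.append(np.linalg.norm(AS, 2) ** 2)
        col_sq.append(np.max(np.linalg.norm(AS, axis=0)) ** 2)
    lhs = np.sqrt(np.mean(spec_sq))
    rhs = 3.0 * np.sqrt(2.0 * np.log(n)) * np.sqrt(np.mean(col_sq)) + np.linalg.norm(A, 2)
    return lhs, rhs                                # empirically, lhs should not exceed rhs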

Appendix E Improved Bounds via Row-Norm-Based Sampling

Building on the sparsity-based sampling results presented in Section 4, we now show how to obtain improved approximation error of \pm\epsilon\|\mathbf{A}\|_{F}, assuming we can sample the rows of \mathbf{A} with probabilities proportional to their squared \ell_{2} norms. The ability to sample by norms also allows us to remove the assumption that \mathbf{A} has bounded entries – our results apply to any symmetric matrix.

For technical reasons, we mix row norm sampling with uniform sampling, forming a random principal submatrix by sampling each index i[n]i\in[n] independently with probability pi=min(1,s𝐀i22𝐀F2+1n2)p_{i}=\min\left(1,\frac{s\|\mathbf{A}_{i}\|^{2}_{2}}{\|\mathbf{A}\|^{2}_{F}}+\frac{1}{n^{2}}\right) and rescaling each sampled row/column by 1/pi1/\sqrt{p_{i}}. As in the sparsity-based sampling setting, we must carefully zero out entries of the sampled submatrix to ensure concentration of the sampled eigenvalues. Pseudocode for the full algorithm is given in Algorithm 3.

E.1 Preliminary Lemmas

Our proof closely follows that of Theorem 2 in Section 4. We start by defining \mathbf{A}^{\prime}\in\mathbb{R}^{n\times n}, obtained by zeroing out entries of \mathbf{A} as described in Algorithm 3. We have \mathbf{A}^{\prime}_{ij}=0 whenever 1) i=j and \|\mathbf{A}_{i}\|^{2}_{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|^{2}_{F} or 2) i\neq j and \|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2}<\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}\cdot\lvert\mathbf{A}_{ij}\rvert^{2}}{c_{2}\log^{4}n}. Otherwise \mathbf{A}^{\prime}_{ij}=\mathbf{A}_{ij}. Similar to the sparsity sampling case, we argue that the eigenvalues of \mathbf{A}^{\prime} are close to those of \mathbf{A}, i.e., zeroing out entries of \mathbf{A} according to the given condition does not change its eigenvalues by too much (Lemma 11). Then, we again split \mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{o}+\mathbf{A}^{\prime}_{m} such that \|\mathbf{A}^{\prime}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}. We argue that after sampling, we have \|\mathbf{A}^{\prime}_{m,S}\|_{2}\leq\epsilon\|\mathbf{A}\|_{F} and that the eigenvalues of \mathbf{A}^{\prime}_{o,S} approximate those of \mathbf{A}^{\prime}_{o} up to \pm\epsilon\|\mathbf{A}\|_{F} error.

Algorithm 3 Eigenvalue estimator using 2\ell_{2} norm-based sampling
1:  Input: Symmetric 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n}, Accuracy ϵ(0,1)\epsilon\in(0,1), failure prob. δ(0,1)\delta\in(0,1). 𝐀i2\|\mathbf{A}_{i}\|_{2} for all i[n]i\in[n].
2:  Fix s=c1log10nϵ8δ4s=\frac{c_{1}\log^{10}n}{\epsilon^{8}\delta^{4}} where c1c_{1} is a sufficiently large constant.
3:  Add each i[n]i\in[n] to sample set SS independently, with probability pi=min(1,s𝐀i22𝐀F2+1n2)p_{i}=\min\left(1,\frac{s\|\mathbf{A}_{i}\|^{2}_{2}}{\|\mathbf{A}\|^{2}_{F}}+\frac{1}{n^{2}}\right). Let the principal submatrix of 𝐀\mathbf{A} corresponding to SS be 𝐀S\mathbf{A}_{S}.
4:  Let 𝐀S=𝐃𝐀S𝐃\mathbf{A}_{S}=\mathbf{D}\mathbf{A}_{S}\mathbf{D} where 𝐃|S|×|S|\mathbf{D}\in\mathbb{R}^{|S|\times|S|} is diagonal with 𝐃i,i=1pj\mathbf{D}_{i,i}=\frac{1}{\sqrt{p_{j}}} if the ithi^{th} element of SS is jj.
5:  Construct 𝐀S|S|×|S|\mathbf{A}^{\prime}_{S}\in\mathbb{R}^{|S|\times|S|} from 𝐀S\mathbf{A}_{S} as follows:
[𝐀S]i,j\displaystyle\mathbf{[}\mathbf{A}^{\prime}_{S}]_{i,j} ={0if i=j and 𝐀i22<ϵ24𝐀F20if ij and 𝐀i22𝐀j22<ϵ2𝐀F2|𝐀ij|2c2log4n for sufficient large constant c2[𝐀S]i,jotherwise.\displaystyle=\begin{cases}0&\text{if $i=j$ and }\|\mathbf{A}_{i}\|_{2}^{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|_{F}^{2}\\ 0&\text{if $i\neq j$ and }\|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2}<\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}\cdot\lvert\mathbf{A}_{ij}\rvert^{2}}{c_{2}\log^{4}n}\text{ for sufficient large constant $c_{2}$}\\ [\mathbf{A}_{S}]_{i,j}&\text{otherwise}.\end{cases}
6:  Compute the eigenvalues of 𝐀S\mathbf{A}^{\prime}_{S}: λ1(𝐀S)λ|S|(𝐀S)\lambda_{1}(\mathbf{A}^{\prime}_{S})\geq\ldots\geq\lambda_{|S|}(\mathbf{A}^{\prime}_{S}).
7:  For all i[|S|]i\in[|S|] with λi(𝐀S)0\lambda_{i}(\mathbf{A}^{\prime}_{S})\geq 0, let λ~i(𝐀)=λi(𝐀S)\tilde{\lambda}_{i}(\mathbf{A})=\lambda_{i}(\mathbf{A}^{\prime}_{S}). For all i[|S|]i\in[|S|] with λi(𝐀S)<0\lambda_{i}(\mathbf{A}^{\prime}_{S})<0, let λ~n(|S|i)(𝐀)=λi(𝐀S)\tilde{\lambda}_{n-(|S|-i)}(\mathbf{A})=\lambda_{i}(\mathbf{A}^{\prime}_{S}). For all remaining i[n]i\in[n], let λ~i(𝐀)=0\tilde{\lambda}_{i}(\mathbf{A})=0.
8:  Return: Eigenvalue estimates λ~1(𝐀)λ~n(𝐀)\tilde{\lambda}_{1}(\mathbf{A})\geq\ldots\geq\tilde{\lambda}_{n}(\mathbf{A}).
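To make the pseudocode concrete, the following numpy sketch mirrors Algorithm 3. It is a minimal illustration only: the constants c1 and c2 stand in for the "sufficiently large" constants above, the function name is arbitrary, and the row norms are computed from the full matrix here purely for convenience, whereas the algorithm assumes they are supplied as input.

import numpy as np

def estimate_eigenvalues_l2(A, eps, delta, c1=10.0, c2=64.0, rng=None):
    # Sketch of Algorithm 3 (ell_2 norm based sampling); constants are illustrative.
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    row_norm_sq = np.sum(A ** 2, axis=1)                    # ||A_i||_2^2
    fro_sq = row_norm_sq.sum()                              # ||A||_F^2
    s = c1 * np.log(n) ** 10 / (eps ** 8 * delta ** 4)
    p = np.minimum(1.0, s * row_norm_sq / fro_sq + 1.0 / n ** 2)

    est = np.zeros(n)
    idx = np.flatnonzero(rng.random(n) < p)                 # sampled index set S
    if idx.size == 0:
        return est
    A_sub = A[np.ix_(idx, idx)]
    D = 1.0 / np.sqrt(p[idx])
    A_S = D[:, None] * A_sub * D[None, :]                   # rescaled principal submatrix

    # Zeroing step (line 5 of Algorithm 3), with conditions on the original entries of A.
    rn = row_norm_sq[idx]
    diag_small = rn < (eps ** 2 / 4.0) * fro_sq
    offdiag_small = np.outer(rn, rn) < (eps ** 2) * fro_sq * A_sub ** 2 / (c2 * np.log(n) ** 4)
    mask = np.where(np.eye(idx.size, dtype=bool), diag_small[:, None], offdiag_small)
    A_S[mask] = 0.0

    sub_eigs = np.linalg.eigvalsh(A_S)
    pos = np.sort(sub_eigs[sub_eigs >= 0])[::-1]
    neg = np.sort(sub_eigs[sub_eigs < 0])[::-1]
    est[:len(pos)] = pos
    if len(neg) > 0:
        est[n - len(neg):] = neg
    return est

Note that, unlike the uniform-sampling estimator, no n/s rescaling is applied to the eigenvalues here: the 1/\sqrt{p_{i}p_{j}} rescaling of the sampled entries in line 4 plays that role.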
Lemma 11.

Let 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} be symmetric. Let 𝐀n×n\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} have 𝐀ij=0\mathbf{A}^{\prime}_{ij}=0 if either 1) i=ji=j and 𝐀i22<ϵ24𝐀F2\|\mathbf{A}_{i}\|_{2}^{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|_{F}^{2} or 2) iji\neq j and 𝐀i22𝐀j22<ϵ2𝐀F2|𝐀ij|2c2log4n\|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2}<\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}\cdot\lvert\mathbf{A}_{ij}\rvert^{2}}{c_{2}\log^{4}n} for a sufficiently large constant c2c_{2}. Otherwise, 𝐀ij=𝐀ij\mathbf{A}^{\prime}_{ij}=\mathbf{A}_{ij}. Then, for all i[n]i\in[n],

|λi(𝐀)λi(𝐀)|ϵ𝐀F.|\lambda_{i}(\mathbf{A})-\lambda_{i}(\mathbf{A}^{\prime})|\leq\epsilon\|\mathbf{A}\|_{F}.
Proof.

Consider the matrix \mathbf{A}^{\prime\prime}, which is defined identically to \mathbf{A}^{\prime} except that we only set \mathbf{A}^{\prime\prime}_{ij}=0 if i\neq j and \|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2}<\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}\lvert\mathbf{A}_{ij}\rvert^{2}}{c_{2}\log^{4}n}. I.e., we do not zero out any entries on the diagonal as in \mathbf{A}^{\prime}. We will show that \|\mathbf{A}-\mathbf{A}^{\prime\prime}\|_{2}\leq\frac{\epsilon}{2}\|\mathbf{A}\|_{F}. If \mathbf{A}_{ii} is zeroed out in \mathbf{A}^{\prime}, this implies that \|\mathbf{A}_{i}\|_{2}^{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|_{F}^{2}. Thus, |\mathbf{A}_{ii}|\leq\|\mathbf{A}_{i}\|_{2}\leq\frac{\epsilon}{2}\|\mathbf{A}\|_{F}, and so \|\mathbf{A}^{\prime\prime}-\mathbf{A}^{\prime}\|_{2}\leq\frac{\epsilon}{2}\|\mathbf{A}\|_{F}. So, by the triangle inequality, we will then have \|\mathbf{A}-\mathbf{A}^{\prime}\|_{2}\leq\epsilon\cdot\|\mathbf{A}\|_{F}. The lemma then follows from Weyl's inequality (Fact 3).

To show that 𝐀𝐀′′2ϵ2𝐀F\|\mathbf{A}-\mathbf{A}^{\prime\prime}\|_{2}\leq\frac{\epsilon}{2}\|\mathbf{A}\|_{F}, we use a variant of Girshgorin’s theorem, as in the proof of Lemma 5. First, we split the entries of 𝐀\mathbf{A} into level sets, according to their magnitudes. Let 𝐀=k=0lognϵ𝐀k\mathbf{A}=\sum_{k=0}^{\log\frac{n}{\epsilon}}\mathbf{A}_{k} where (𝐀0)ij=𝐀ij(\mathbf{A}_{0})_{ij}=\mathbf{A}_{ij} if |𝐀ij|[0,ϵn𝐀F)\lvert\mathbf{A}_{ij}\rvert\in\left[0,\frac{\epsilon}{n}\|\mathbf{A}\|_{F}\right) and (𝐀0)ij=0(\mathbf{A}_{0})_{ij}=0 otherwise. For 1klognϵ1\leq k\leq\log\frac{n}{\epsilon}, (𝐀k)ij=𝐀ij(\mathbf{A}_{k})_{ij}=\mathbf{A}_{ij} if |𝐀ij|[𝐀F2k,𝐀F2k1)\lvert\mathbf{A}_{ij}\rvert\in\left[\frac{\|\mathbf{A}\|_{F}}{2^{k}},\frac{\|\mathbf{A}\|_{F}}{2^{k-1}}\right) and (𝐀k)ij=0(\mathbf{A}_{k})_{ij}=0 otherwise. We can also define 𝐀′′=k=0lognϵ𝐀k′′\mathbf{A}^{\prime\prime}=\sum_{k=0}^{\log\frac{n}{\epsilon}}\mathbf{A}^{\prime\prime}_{k} where each 𝐀k′′\mathbf{A}^{\prime\prime}_{k} are defined similarly. By triangle inequality, 𝐀𝐀′′2k=0logn/ϵ𝐀k𝐀k′′2\|\mathbf{A}-\mathbf{A}^{\prime\prime}\|_{2}\leq\sum_{k=0}^{\log n/\epsilon}\|\mathbf{A}_{k}-\mathbf{A}_{k}^{\prime\prime}\|_{2}. First observe that 𝐀0𝐀0′′2𝐀0𝐀0′′Fn𝐀0ϵ𝐀F\|\mathbf{A}_{0}-\mathbf{A}_{0}^{\prime\prime}\|_{2}\leq\|\mathbf{A}_{0}-\mathbf{A}_{0}^{\prime\prime}\|_{F}\leq n\cdot\|\mathbf{A}_{0}\|_{\infty}\leq\epsilon\|\mathbf{A}\|_{F}. Further, we can assume without loss of generality that ϵ>1/n\epsilon>1/n and so log(n/ϵ)2logn\log(n/\epsilon)\leq 2\log n, as otherwise our algorithm can afford to read all of 𝐀\mathbf{A}. So, it suffices to show that for all k1k\geq 1,

𝐀k𝐀k′′2ϵlogn𝐀F.\displaystyle\|\mathbf{A}_{k}-\mathbf{A}_{k}^{\prime\prime}\|_{2}\leq\frac{\epsilon}{\log n}\cdot\|\mathbf{A}\|_{F}. (37)

This will give 𝐀𝐀′′2ϵ𝐀F+k=1logn/ϵϵlogn𝐀F3ϵ𝐀F\|\mathbf{A}-\mathbf{A}^{\prime\prime}\|_{2}\leq\epsilon\cdot\|\mathbf{A}\|_{F}+\sum_{k=1}^{\log n/\epsilon}\frac{\epsilon}{\log n}\cdot\|\mathbf{A}\|_{F}\leq 3\epsilon\cdot\|\mathbf{A}\|_{F}, which gives the lemma after adjusting ϵ\epsilon by a constant factor.

We now prove (37) for each k1k\geq 1. For p{0,1,log(n2)}p\in\{0,1,\ldots\log(n^{2})\}, let p[n]\mathcal{I}_{p}\subset[n] be the set of rows/columns in 𝐀k\mathbf{A}_{k} with nnz((𝐀k)i)[nnz(𝐀k)2p,nnz(𝐀k)2p1)\operatorname{nnz}((\mathbf{A}_{k})_{i})\in\left[\frac{\operatorname{nnz}(\mathbf{A}_{k})}{2^{p}},\frac{\operatorname{nnz}(\mathbf{A}_{k})}{2^{p-1}}\right) and let 𝐀kpq=𝐀k(p,q)\mathbf{A}_{kpq}=\mathbf{A}_{k}(\mathcal{I}_{p},\mathcal{I}_{q}) be the submatrix of 𝐀k\mathbf{A}_{k} formed with rows in p\mathcal{I}_{p} and columns in q\mathcal{I}_{q}. Define the submatrix 𝐀kpq′′\mathbf{A}^{\prime\prime}_{kpq} of 𝐀k′′\mathbf{A}^{\prime\prime}_{k} in the same way. Let 𝐀^kpq=𝐀kpq𝐀kpq′′\mathbf{\widehat{A}}_{kpq}=\mathbf{A}_{kpq}-\mathbf{A}^{\prime\prime}_{kpq} and finally, let 𝐀¯kpqn×n\bar{\mathbf{A}}_{kpq}\in\mathbb{R}^{n\times n} be the symmetric error matrix such that 𝐀¯kpq(p,q)=𝐀^kpq\bar{\mathbf{A}}_{kpq}(\mathcal{I}_{p},\mathcal{I}_{q})=\mathbf{\widehat{A}}_{kpq} and 𝐀¯kpq(q,p)=𝐀^kpqT\bar{\mathbf{A}}_{kpq}(\mathcal{I}_{q},\mathcal{I}_{p})=\mathbf{\widehat{A}}_{kpq}^{T}.

Note that every row from which we zero out entries must have at least one non-zero entry, i.e., \operatorname{nnz}((\mathbf{A}_{k})_{i})\geq 1 (otherwise all entries in that row/column are already zero). Thus, all such rows have \operatorname{nnz}((\mathbf{A}_{k})_{i})\geq\frac{\operatorname{nnz}(\mathbf{A}_{k})}{n^{2}} and so are covered by the submatrices \mathbf{A}_{kpq}. Thus, by the triangle inequality, we can bound

𝐀k𝐀k′′2p=0log(n2)q=0log(n2)𝐀¯kpq2.\displaystyle\|\mathbf{A}_{k}-\mathbf{A}_{k}^{\prime\prime}\|_{2}\leq\sum_{p=0}^{\log(n^{2})}\sum_{q=0}^{\log(n^{2})}\|\mathbf{\bar{A}}_{kpq}\|_{2}. (38)

To prove (37), we need to bound 𝐀kpq𝐀kpq′′2\|\mathbf{A}_{kpq}-\mathbf{A}_{kpq}^{\prime\prime}\|_{2} for all k1k\geq 1 and p,qp,q. We use a case analysis.

Case 1: \frac{4\operatorname{nnz}(\mathbf{A}_{k})^{2}\cdot c_{2}\log^{4}n}{\epsilon^{2}\cdot 2^{2k}}>2^{p+q}. In this case, first observe that since the nonzero entries of \mathbf{A}_{k} lie in \left[\frac{\|\mathbf{A}\|_{F}}{2^{k}},\frac{\|\mathbf{A}\|_{F}}{2^{k-1}}\right), for any i\in\mathcal{I}_{p}, j\in\mathcal{I}_{q},

𝐀i22𝐀j22\displaystyle\|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2} (𝐀k)i22(𝐀k)j22\displaystyle\geq\|(\mathbf{A}_{k})_{i}\|_{2}^{2}\cdot\|(\mathbf{A}_{k})_{j}\|_{2}^{2}
𝐀F424knnz((𝐀k)i)nnz((𝐀k)j)\displaystyle\geq\frac{\|\mathbf{A}\|_{F}^{4}}{2^{4k}}\cdot\operatorname{nnz}((\mathbf{A}_{k})_{i})\cdot\operatorname{nnz}((\mathbf{A}_{k})_{j})
𝐀F424k2p+qnnz(𝐀k)2.\displaystyle\geq\frac{\|\mathbf{A}\|_{F}^{4}}{2^{4k}\cdot 2^{p+q}}\cdot\operatorname{nnz}(\mathbf{A}_{k})^{2}.

Thus, by the assumed bound on 2p+q2^{p+q}, we have for any i,ji,j where (𝐀k)ij(\mathbf{A}_{k})_{ij} is nonzero,

𝐀i22𝐀j22ϵ2𝐀F4422kc2log4nϵ2𝐀F2|𝐀ij|2c2log4n,\displaystyle\|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2}\geq\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{4}}{4\cdot 2^{2k}c_{2}\log^{4}n}\geq\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}\cdot|\mathbf{A}_{ij}|^{2}}{c_{2}\log^{4}n},

where the second inequality follows again from the fact that the nonzero entries of 𝐀k\mathbf{A}_{k} lie in [𝐀F2k,𝐀F2k1)\left[\frac{\|\mathbf{A}\|_{F}}{2^{k}},\frac{\|\mathbf{A}\|_{F}}{2^{k-1}}\right). Thus, any i,ji,j with (𝐀kpq)ij(\mathbf{A}_{kpq})_{ij} nonzero is not zeroed out in line 5 of Algorithm 3. So 𝐀¯kpq=𝟎\mathbf{\bar{A}}_{kpq}=\mathbf{0}. Plugging into (38), we thus have:

𝐀k𝐀k′′2p=0log(n2)q:2p+q16nnz(𝐀k)2c2log4nϵ222k𝐀¯kpq2.\displaystyle\|\mathbf{A}_{k}-\mathbf{A}_{k}^{\prime\prime}\|_{2}\leq\sum_{p=0}^{\log(n^{2})}\sum_{q:2^{p+q}\geq\frac{16\operatorname{nnz}(\mathbf{A}_{k})^{2}\cdot c_{2}\log^{4}n}{\epsilon^{2}\cdot 2^{2k}}}\|\mathbf{\bar{A}}_{kpq}\|_{2}. (39)

Case 2: 16nnz(𝐀k)2c2log4nϵ222k2p+q.\frac{16\operatorname{nnz}(\mathbf{A}_{k})^{2}\cdot c_{2}\log^{4}n}{\epsilon^{2}\cdot 2^{2k}}\leq 2^{p+q}. In this case, observe that (𝐀^kpq𝐀^kpqT)m=(𝐀^kpq)m𝐀^kpqT(\mathbf{\widehat{A}}_{kpq}\mathbf{\widehat{A}}_{kpq}^{T})_{m}=(\mathbf{\widehat{A}}_{kpq})_{m}\mathbf{\widehat{A}}_{kpq}^{T}. We can see that (𝐀^kpq)m(\mathbf{\widehat{A}}_{kpq})_{m} has at most nnz((𝐀k)m)nnz(𝐀k)2p1\operatorname{nnz}((\mathbf{A}_{k})_{m})\leq\frac{\operatorname{nnz}(\mathbf{A}_{k})}{2^{p-1}} non-zero entries. Similarly, each row of 𝐀^kpqT\mathbf{\widehat{A}}_{kpq}^{T} has at most nnz(𝐀k)2q1\frac{\operatorname{nnz}(\mathbf{A}_{k})}{2^{q-1}} non-zero elements. Thus, for all m|p|m\in|\mathcal{I}_{p}|, using the fact that all non-zero entries of 𝐀kpq\mathbf{A}_{kpq} are bounded by 𝐀F2k1\frac{\|\mathbf{A}\|_{F}}{2^{k-1}}, we have:

(𝐀^kpq𝐀^kpqT)m1nnz(𝐀k)22p+q2𝐀F222k2.\displaystyle\|(\mathbf{\widehat{A}}_{kpq}\mathbf{\widehat{A}}^{T}_{kpq})_{m}\|_{1}\leq\frac{\operatorname{nnz}(\mathbf{A}_{k})^{2}}{2^{p+q-2}}\cdot\frac{\|\mathbf{A}\|_{F}^{2}}{2^{2k-2}}.

Applying Girshgorin’s circle theorem (Theorem 2) we thus have:

𝐀^kpq22=𝐀^kpq𝐀^kpqT2nnz(𝐀k)22p+q2𝐀F222k2\displaystyle\|\mathbf{\widehat{A}}_{kpq}\|_{2}^{2}=\|\mathbf{\widehat{A}}_{kpq}\mathbf{\widehat{A}}_{kpq}^{T}\|_{2}\leq\frac{\operatorname{nnz}(\mathbf{A}_{k})^{2}}{2^{p+q-2}}\cdot\frac{\|\mathbf{A}\|_{F}^{2}}{2^{2k-2}}

and so

𝐀¯kpq22𝐀^kpq28𝐀Fnnz(𝐀k)2k2(p+q)2.\displaystyle\|\bar{\mathbf{A}}_{kpq}\|_{2}\leq 2\|\mathbf{\widehat{A}}_{kpq}\|_{2}\leq\frac{8\cdot\|\mathbf{A}\|_{F}\cdot\operatorname{nnz}(\mathbf{A}_{k})}{2^{k}2^{\frac{(p+q)}{2}}}.

Plugging to (39), we thus have:

𝐀k𝐀k′′2\displaystyle\|\mathbf{A}_{k}-\mathbf{A}_{k}^{\prime\prime}\|_{2} p=0log(n2)q:2p+q16nnz(𝐀k)2c2log4nϵ222k8𝐀Fnnz(𝐀k)2k2(p+q)2\displaystyle\leq\sum_{p=0}^{\log(n^{2})}\sum_{q:2^{p+q}\geq\frac{16\operatorname{nnz}(\mathbf{A}_{k})^{2}\cdot c_{2}\log^{4}n}{\epsilon^{2}\cdot 2^{2k}}}\frac{8\cdot\|\mathbf{A}\|_{F}\cdot\operatorname{nnz}(\mathbf{A}_{k})}{2^{k}2^{\frac{(p+q)}{2}}}
p=0log(n2)2ϵ𝐀Fc2log2ni=0128ϵ𝐀Fc2.\displaystyle\leq\sum_{p=0}^{\log(n^{2})}\frac{2\epsilon\cdot\|\mathbf{A}\|_{F}}{\sqrt{c_{2}}\log^{2}n}\cdot\sum_{i=0}^{\infty}\frac{1}{\sqrt{2}}\leq\frac{8\epsilon\|\mathbf{A}\|_{F}}{\sqrt{c_{2}}}.

Setting c264c_{2}\geq 64, we thus have (37), and in turn the lemma. ∎

We next give a bound on the incoherence of the outlying eigenvectors of 𝐀\mathbf{A}^{\prime}. This bound is again similar to Lemmas 2 and 6.

Lemma 12 (Incoherence of outlying eigenvectors in terms of 2\ell_{2} norms).

Let 𝐀,𝐀n×n\mathbf{A},\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} be as in Lemma 11. Let 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{V}_{o}^{{}^{\prime}T} where 𝚲o\mathbf{\Lambda}^{\prime}_{o} is diagonal, with the eigenvalues of 𝐀\mathbf{A}^{\prime} with magnitude ϵδ𝐀F\geq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} on its diagonal, and 𝐕o\mathbf{V}^{\prime}_{o} has columns equal to the corresponding eigenvectors. Let 𝐕o,i\mathbf{V}^{\prime}_{o,i} denote the iith row of 𝐕o\mathbf{V}^{\prime}_{o}. Then,

𝚲o1/2𝐕o,i22𝐀i22ϵδ𝐀Fand𝐕o,i22𝐀i22ϵ2δ𝐀F2.\displaystyle\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}^{\prime}_{o,i}\|_{2}^{2}\leq\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}}\hskip 10.00002ptand\hskip 10.00002pt\|\mathbf{V}^{\prime}_{o,i}\|^{2}_{2}\leq\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}}.
Proof.

The proof is again nearly identical to that of Lemma 2. Observe that 𝐀𝐕o=𝐕o𝚲o\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}. Letting [𝐀𝐕o]i[\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}]_{i} denote the iith row of the 𝐀𝐕o\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}, we have

[𝐀𝐕o]i22=[𝐕o𝚲o]i22=j=1rλj2𝐕o,i,j2,\|[\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}]_{i}\|_{2}^{2}=\|[\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}]_{i}\|_{2}^{2}=\sum_{j=1}^{r}\lambda_{j}^{2}\cdot\mathbf{V}_{o,i,j}^{{}^{\prime}2}, (40)

where r=rank(𝐀o)r=\operatorname{rank}(\mathbf{A}^{\prime}_{o}), 𝐕o,i,j\mathbf{V}^{\prime}_{o,i,j} is the (i,j)(i,j)th element of 𝐕o\mathbf{V}^{\prime}_{o} and λj=𝚲o(j,j)\lambda_{j}=\mathbf{\Lambda}^{\prime}_{o}(j,j). Since 𝐕o\mathbf{V}^{\prime}_{o} has orthonormal columns, we have [𝐀𝐕o]i22𝐀i22𝐀i22\|[\mathbf{A}^{\prime}\mathbf{V}^{\prime}_{o}]_{i}\|_{2}^{2}\leq\|\mathbf{A}^{\prime}_{i}\|_{2}^{2}\leq\|\mathbf{A}_{i}\|_{2}^{2}. Therefore, by (40),

j=1rλj2𝐕o,i,j2𝐀i22.\sum_{j=1}^{r}\lambda_{j}^{2}\cdot\mathbf{V}_{o,i,j}^{{}^{\prime}2}\leq\|\mathbf{A}_{i}\|_{2}^{2}. (41)

Since by definition |λj|ϵδ𝐀F\lvert\lambda_{j}\rvert\geq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} for all jj, we can conclude that 𝚲o1/2𝐕o,i22=j=1rλj𝐕o,i,j2𝐀i22ϵδ𝐀F\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}^{\prime}_{o,i}\|_{2}^{2}=\sum_{j=1}^{r}\lambda_{j}\cdot\mathbf{V}_{o,i,j}^{{}^{\prime}2}\leq\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}} and 𝐕o,i22=j=1r𝐕o,i,j2𝐀i22ϵ2δ𝐀F2\|\mathbf{V}^{\prime}_{o,i}\|_{2}^{2}=\sum_{j=1}^{r}\mathbf{V}_{o,i,j}^{{}^{\prime}2}\leq\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}}, which completes the lemma. ∎

E.2 Outer and Middle Eigenvalue Bounds

Using Lemma 12, we next argue that the eigenvalues of 𝐀o,S\mathbf{A}_{o,S}^{\prime} will approximate those of 𝐀\mathbf{A}^{\prime}, and in turn those of 𝐀\mathbf{A}. The proof is very similar to Lemmas 3 and 7.

Lemma 13 (Concentration of outlying eigenvalues with 2\ell_{2} norm based sampling).

Let 𝐀,𝐀n×n\mathbf{A},\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} be as in algorithm 3. Let 𝐀=𝐀m+𝐀o\mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{m}+\mathbf{A}^{\prime}_{o}, where 𝐀m=𝐕m𝚲m𝐕mT\mathbf{A}^{\prime}_{m}=\mathbf{V}^{\prime}_{m}\mathbf{\Lambda}^{\prime}_{m}\mathbf{\mathbf{V}^{\prime}}_{m}^{T}, and 𝐀o=𝐕o𝚲o𝐕oT\mathbf{A}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{\mathbf{V}^{\prime}}_{o}^{T} are projections onto the eigenspaces with magnitude <ϵδ𝐀F<\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} and ϵδ𝐀F\geq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} respectively. For all i[n]i\in[n] let pi=min(1,s𝐀i22𝐀F2+1n2)p_{i}=\min\left(1,\frac{s\|\mathbf{A}_{i}\|^{2}_{2}}{\|\mathbf{A}\|^{2}_{F}}+\frac{1}{n^{2}}\right) and let 𝐒¯\bar{\mathbf{S}} be a scaled diagonal sampling matrix such that the 𝐒¯ii=1pi\bar{\mathbf{S}}_{ii}=\frac{1}{\sqrt{p_{i}}} with probability pip_{i} and 𝐒¯ii=0\bar{\mathbf{S}}_{ii}=0 otherwise. If sclog(1/(ϵδ))ϵ3δs\geq\frac{c\log(1/(\epsilon\delta))}{\epsilon^{3}\sqrt{\delta}} for a large enough constant cc, then with probability at least 1δ1-\delta, 𝚲o1/2𝐕oT𝐒¯𝐒¯T𝐕o𝚲o1/2𝚲o2ϵ𝐀F\|\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}\mathbf{V}_{o}^{{}^{\prime}T}\bar{\mathbf{S}}\bar{\mathbf{S}}^{T}\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}_{o}^{{}^{\prime}1/2}-\mathbf{\Lambda}^{\prime}_{o}\|_{2}\leq\epsilon\|\mathbf{A}\|_{F}.

Proof.

We define the random variables \mathbf{Q}_{1},\ldots,\mathbf{Q}_{n} and the set P=\{i\in[n]:p_{i}<1\} exactly as in the proof of Lemma 7. Then, as explained in the proof of Lemma 7, it is sufficient to bound \sum_{i\in P}\mathbb{E}[\mathbf{Q}_{i}^{2}]. From (17) we have \sum_{i\in P}\mathbb{E}[\mathbf{Q}_{i}^{2}]\preceq\sum_{i\in P}\frac{1}{p_{i}}\cdot\|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2}). Also, from Lemma 12, we have \|\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\|_{2}^{2}\leq\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}} and, for all i\in P, \frac{1}{p_{i}}\leq\frac{\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{i}\|_{2}^{2}}. We thus get,

iP𝔼[𝐐i2]\displaystyle\sum_{i\in P}\mathbb{E}[\mathbf{Q}_{i}^{2}] iP1pi𝐀i22ϵδ𝐀F(𝚲o1/2𝐕o,i𝐕o,iT𝚲o1/2)\displaystyle\preceq\sum_{i\in P}\frac{1}{p_{i}}\cdot\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}}\cdot(\mathbf{\Lambda}_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})
𝐀Fsϵδ(iPΛo1/2𝐕o,i𝐕o,iT𝚲o1/2)\displaystyle\preceq\frac{\|\mathbf{A}\|_{F}}{s\epsilon\sqrt{\delta}}(\sum_{i\in P}\Lambda_{o}^{1/2}\mathbf{V}_{o,i}\mathbf{V}_{o,i}^{T}\mathbf{\Lambda}_{o}^{1/2})
=𝐀Fsϵδ𝚲o𝐀F2sϵδ𝐈.\displaystyle=\frac{\|\mathbf{A}\|_{F}}{s\epsilon\sqrt{\delta}}\mathbf{\Lambda}_{o}\preceq\frac{\|\mathbf{A}\|_{F}^{2}}{s\epsilon\sqrt{\delta}}\cdot\mathbf{I}.

Since 𝐐i2\mathbf{Q}_{i}^{2} is PSD this establishes that vVar(E)2𝐀F2sϵδv\leq\|\textbf{Var(E)}\|_{2}\leq\frac{\|\mathbf{A}\|_{F}^{2}}{s\epsilon\sqrt{\delta}}. We can then apply the matrix Bernstein inequality exactly as in the proof of Lemma 3 to show that when scϵ3δs\geq\frac{c}{\epsilon^{3}\sqrt{\delta}} for large enough cc, with probability at least 1δ1-\delta, 𝐄2ϵ𝐀F\left\|\mathbf{E}\right\|_{2}\leq\epsilon\|\mathbf{A}\|_{F}. ∎

We now bound the middle eigenvalues.

Lemma 14 (Concentration of middle eigenvalues with \ell_{2} norm based sampling).

Let \mathbf{A},\mathbf{A}^{\prime}\in\mathbb{R}^{n\times n} be as in Lemma 12. Let \mathbf{A}^{\prime}=\mathbf{A}^{\prime}_{m}+\mathbf{A}^{\prime}_{o}, where \mathbf{A}^{\prime}_{m}=\mathbf{V}^{\prime}_{m}\mathbf{\Lambda}^{\prime}_{m}\mathbf{V}_{m}^{\prime T} and \mathbf{A}^{\prime}_{o}=\mathbf{V}^{\prime}_{o}\mathbf{\Lambda}^{\prime}_{o}\mathbf{V}_{o}^{\prime T} are the projections onto the eigenspaces with eigenvalue magnitude <\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} and \geq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} respectively (analogous to Definition 1.1). As in Algorithm 3, for all i\in[n] let p_{i}=\min\left(1,\frac{s\|\mathbf{A}_{i}\|_{2}^{2}}{\|\mathbf{A}\|_{F}^{2}}+\frac{1}{n^{2}}\right) and let \bar{\mathbf{S}} be a scaled diagonal sampling matrix such that \bar{\mathbf{S}}_{ii}=\frac{1}{\sqrt{p_{i}}} with probability p_{i} and \bar{\mathbf{S}}_{ii}=0 otherwise. If s\geq\frac{c\log^{10}n}{\epsilon^{8}\delta^{4}} for a large enough constant c, then with probability at least 1-\delta,

𝐒¯𝐀m𝐒¯2ϵ𝐀F.\|\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m}\bar{\mathbf{S}}\|_{2}\leq\epsilon\|\mathbf{A}\|_{F}.
Proof.

First observe that since s4ϵ2s\geq\frac{4}{\epsilon^{2}} (for large enough cc), the results of Lemmas 11 and 12 still hold. The proof follows the same structure as the proof of bounding the middle eigenvalues for sparsity sampling in Lemma 8. From Lemma 12, we have 𝐕o,i2𝐀i2ϵδ𝐀F\|{\mathbf{V}^{\prime}}_{o,i}\|_{2}\leq\frac{\|\mathbf{A}_{i}\|_{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}}. Also, following the proof of Lemma 12, we have 𝚲o𝐕o,jT2=[𝐀𝐕o]j2𝐀j2\|{\mathbf{\Lambda}^{\prime}}_{o}{\mathbf{V}^{\prime}}^{T}_{o,j}\|_{2}=\|[{\mathbf{A}^{\prime}}{\mathbf{V}^{\prime}}_{o}]_{j}\|_{2}\leq\|\mathbf{A}_{j}\|_{2}. Thus, for all i,j[n]i,j\in[n], using Cauchy Schwarz’s inequality, we have

|𝐀o,i,j|=|𝐕o,i𝚲o𝐕o,jT|𝐕o,i2𝚲o𝐕o,jT2𝐀i2ϵδ𝐀F𝐀j2.\displaystyle|{\mathbf{A}^{\prime}}_{o,i,j}|=|{\mathbf{V}^{\prime}}_{o,i}{\mathbf{\Lambda}^{\prime}}_{o}{\mathbf{V}^{\prime}}_{o,j}^{T}|\leq\|{\mathbf{V}^{\prime}}_{o,i}\|_{2}\cdot\|{\mathbf{\Lambda}^{\prime}}_{o}{\mathbf{V}^{\prime}}_{o,j}^{T}\|_{2}\leq\frac{\|\mathbf{A}_{i}\|_{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}}\cdot\|\mathbf{A}_{j}\|_{2}. (42)

Let 𝐀m=𝐇m+𝐃m{\mathbf{A}^{\prime}}_{m}=\mathbf{H}_{m}+\mathbf{D}_{m} where 𝐇m\mathbf{H}_{m} and 𝐃m\mathbf{D}_{m} contain the off-diagonal and diagonal elements of 𝐀m\mathbf{A}^{\prime}_{m} respectively. Then following the proof of Lemma 8, we get:

𝔼2𝐒¯𝐀m𝐒¯210logn(𝔼2𝐒¯𝐇m𝐒^12+𝔼2𝐇m𝐒^12)+2𝐇m2+𝔼2𝐒¯𝐃m𝐒¯2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}\leq 10\sqrt{\log n}\left(\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\right)+2\|\mathbf{H}_{m}\|_{2}+\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2} (43)

We now proceed to bound each of the terms on the right hand side of (43). We start with 𝔼2𝐒¯𝐃m𝐒¯2\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}. First, observe that 𝔼2𝐒¯𝐃m𝐒¯2maxi1pi|(𝐃m)ii|\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}\leq\max_{i}\frac{1}{p_{i}}\lvert(\mathbf{D}_{m})_{ii}\rvert. We consider two cases.

Case 1: p_{i}<1. Then s\|\mathbf{A}_{i}\|_{2}^{2}/\|\mathbf{A}\|_{F}^{2}<1, so \|\mathbf{A}_{i}\|_{2}^{2}<\frac{1}{s}\|\mathbf{A}\|_{F}^{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|_{F}^{2} since \frac{1}{s}<\frac{\epsilon^{2}}{4}. Hence \mathbf{A}^{\prime}_{ii}=0, and so we must have \lvert(\mathbf{D}_{m})_{ii}\rvert=\lvert({\mathbf{A}^{\prime}}_{m})_{ii}\rvert=\lvert(\mathbf{A}^{\prime}_{o})_{ii}\rvert. Then, by (42) and since p_{i}\geq\frac{s\|\mathbf{A}_{i}\|_{2}^{2}}{\|\mathbf{A}\|_{F}^{2}}, we have \frac{1}{p_{i}}\lvert(\mathbf{D}_{m})_{ii}\rvert\leq\frac{\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{i}\|_{2}^{2}}\cdot\frac{\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}}=\frac{\|\mathbf{A}\|_{F}}{s\epsilon\sqrt{\delta}}.

Case 2: pi=1p_{i}=1. Then we have 1pi|(𝐃m)ii|=|(𝐃m)ii|maxj|(𝐃m)jj|𝐀m2ϵδ𝐀F\frac{1}{p_{i}}\lvert(\mathbf{D}_{m})_{ii}\rvert=\lvert(\mathbf{D}_{m})_{ii}\rvert\leq\max_{j}\lvert(\mathbf{D}_{m})_{jj}\rvert\leq\|\mathbf{A}^{\prime}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.
From the two cases above, for s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta}, we have:

𝔼2𝐒¯𝐃m𝐒¯2ϵδ𝐀F.\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{D}_{m}\bar{\mathbf{S}}\|_{2}\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}. (44)

We can bound \|\mathbf{H}_{m}\|_{2} similarly. Since \mathbf{H}_{m}={\mathbf{A}^{\prime}}_{m}-\mathbf{D}_{m} and \|{\mathbf{A}^{\prime}}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F},

𝐇m2\displaystyle\|\mathbf{H}_{m}\|_{2} 𝐀m2+𝐃m2\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m}\|_{2}+\|\mathbf{D}_{m}\|_{2}
ϵδ𝐀F+ϵδ𝐀F.\displaystyle\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}+\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.
=2ϵδ𝐀F.\displaystyle=2\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}. (45)

where the second step follows from the fact that 𝐃m2maxi|(𝐃m)ii|𝐀m2\|\mathbf{D}_{m}\|_{2}\leq\max_{i}\lvert(\mathbf{D}_{m})_{ii}\rvert\leq\|\mathbf{A}^{\prime}_{m}\|_{2}.

We next bound the term 𝔼2𝐇m𝐒^12\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}. Observe that 𝔼2𝐇m𝐒^12maxi𝐀m,i2pi\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\leq\frac{\max_{i}\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}}{\sqrt{p_{i}}}, where 𝐀m,i\mathbf{A^{\prime}}_{m,i} is the iith column/row of 𝐀m\mathbf{A}^{\prime}_{m}. We again consider the two cases when pi=1p_{i}=1 and pi<1p_{i}<1:

Case 1: pi=1p_{i}=1. Then 𝐀m,i2𝐀m2ϵδ𝐀F\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}\leq\|{\mathbf{A}^{\prime}}_{m}\|_{2}\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.

Case 2: pi<1p_{i}<1. Then 𝐀m,i2𝐀i2𝐀F\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}\leq\|{\mathbf{A}^{\prime}}_{i}\|_{2}\leq\|\mathbf{A}\|_{F}. Thus, setting s1ϵ2δs\geq\frac{1}{\epsilon^{2}\delta} we have:

𝐀m,i2pi\displaystyle\frac{\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}}{\sqrt{p_{i}}} 𝐀Fs𝐀i2𝐀i2\displaystyle\leq\frac{\|\mathbf{A}\|_{F}}{\sqrt{s}\|\mathbf{A}_{i}\|_{2}}\cdot\|{\mathbf{A}^{\prime}}_{i}\|_{2}
𝐀Fsϵδ𝐀F.\displaystyle\leq\frac{\|\mathbf{A}\|_{F}}{\sqrt{s}}\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.

Thus, from the two cases above, for all i[n]i\in[n], adjusting ϵ\epsilon by a 1logn\frac{1}{\sqrt{\log n}} factor, we have for slognϵ2δs\geq\frac{\log n}{\epsilon^{2}\delta}:

𝔼2𝐇m𝐒^12\displaystyle\mathbb{E}_{2}\|\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2} ϵδ𝐀Flogn.\displaystyle\leq\frac{\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}}{\sqrt{\log n}}. (46)

Overall, plugging (44), (45), and (46) back into (43), we have :

𝔼2𝐒¯𝐀m𝐒¯210logn𝔼2𝐒¯𝐇m𝐒^12+15ϵδ𝐀F.\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}\leq 10\sqrt{\log n}\cdot\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}+15\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}. (47)

Finally, we bound \mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}. As in the proof of Lemma 8, we have \mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\leq\mathbb{E}_{2}\left(\max_{i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right), and we will argue that \max_{i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}} is bounded by \epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} with probability 1-1/\mathrm{poly}(n). Also, as argued in the proof of Lemma 8, since p_{i}\geq\frac{1}{n^{2}}, it suffices to bound \frac{\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}} for all i\in[n] with high probability. Again, for a fixed i and any j\in[n], define the random variables z_{j} as:

zj\displaystyle z_{j} ={1pj|𝐀m,i,j|2with probability pj0otherwise.\displaystyle=\begin{cases}\frac{1}{p_{j}}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}&\text{with probability $p_{j}$}\\ 0&\text{otherwise}.\end{cases}

Then \sum_{j=1}^{n}z_{j}=\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}^{2} and \mathbb{E}[\sum_{j=1}^{n}z_{j}]=\|\mathbf{A}^{\prime}_{m,i}\|_{2}^{2}\leq\|\mathbf{A}^{\prime}_{i}\|_{2}^{2}\leq\|\mathbf{A}\|_{F}^{2}. We will again use Bernstein's inequality to bound \sum_{j=1}^{n}z_{j}=\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}^{2}, by bounding |z_{j}| for all j\in[n] and \mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right). We consider the cases of p_{i}<1 and p_{i}=1 separately.

Case 1: pi<1p_{i}<1. Then, we have pis𝐀i22/𝐀F2p_{i}\geq s\|\mathbf{A}_{i}\|_{2}^{2}/\|\mathbf{A}\|_{F}^{2}. If 𝐀i,j0{\mathbf{A}^{\prime}}_{i,j}\neq 0 then

|zj|\displaystyle|z_{j}| 1pj|𝐀m,i,j|2max(1,𝐀F2s𝐀j22)|𝐀m,i,j|2\displaystyle\leq\frac{1}{p_{j}}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\max\left(1,\frac{\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}\right)|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}
|𝐀m,i,j|2+2𝐀F2s𝐀j22(|𝐀i,j|2+|𝐀o,i,j|2)\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}\left(|{\mathbf{A}^{\prime}}_{i,j}|^{2}+|{\mathbf{A}^{\prime}}_{o,i,j}|^{2}\right)
|𝐀m,i,j|2+2𝐀F2s𝐀j22(|𝐀i,j|2+𝐀i22𝐀j22ϵ2δ𝐀F2)\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}\left(|{\mathbf{A}^{\prime}}_{i,j}|^{2}+\frac{\|\mathbf{A}_{i}\|_{2}^{2}\|\mathbf{A}_{j}\|_{2}^{2}}{\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}}\right)
|𝐀m,i,j|2+2𝐀F2s𝐀j22|𝐀i,j|2+2𝐀i22ϵ2δs,\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}|{\mathbf{A}^{\prime}}_{i,j}|^{2}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon^{2}\delta s},

where the fourth inequality uses (42). By the thresholding procedure which defines 𝐀\mathbf{A}^{\prime}, if iji\neq j and 𝐀ij0\mathbf{A}^{\prime}_{ij}\neq 0,

𝐀i22𝐀j22ϵ2𝐀F2|𝐀ij|2c2log4n𝐀j22|𝐀i,j|2ϵ2𝐀F2c2log4n𝐀i22,\displaystyle\|\mathbf{A}_{i}\|_{2}^{2}\cdot\|\mathbf{A}_{j}\|_{2}^{2}\geq\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}|\mathbf{A}^{\prime}_{ij}|^{2}}{c_{2}\log^{4}n}\Rightarrow\frac{\|\mathbf{A}_{j}\|_{2}^{2}}{|\mathbf{A}^{\prime}_{i,j}|^{2}}\geq\frac{\epsilon^{2}\|\mathbf{A}\|_{F}^{2}}{c_{2}\cdot\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}}, (48)

and thus for pi<1p_{i}<1 and 𝐀ij0{\mathbf{A}^{\prime}}_{ij}\neq 0 we have

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2c2log4n𝐀i22sϵ2+2𝐀i22ϵ2δs.\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{\epsilon^{2}\delta s}.

Also 𝐀ii=0\mathbf{A}^{\prime}_{ii}=0 since we must have 𝐀i22<ϵ24𝐀F2\|\mathbf{A}_{i}\|_{2}^{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|_{F}^{2} as pi<1p_{i}<1. If 𝐀i,j=0{\mathbf{A}^{\prime}}_{i,j}=0 or i=ji=j, then we simply have

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2𝐀i22sϵ2δ.\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}\delta}.

Overall for all j[n]j\in[n],

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2𝐀i22sϵ2δ+2c2log4n𝐀i22sϵ2,\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}}, (49)

and since |𝐀m,i,j|2j=1n|𝐀m,i,j|2=𝐀m,i22𝐀i22𝐀i22|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}=\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{2}\leq\|{\mathbf{A}^{\prime}}_{i}\|_{2}^{2}\leq\|\mathbf{A}_{i}\|_{2}^{2},

|zj|\displaystyle|z_{j}| 𝐀i22+2𝐀i22sϵ2δ+2c2log4n𝐀i22sϵ2.\displaystyle\leq\|\mathbf{A}_{i}\|_{2}^{2}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}\delta}+\frac{2c_{2}\cdot\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}}. (50)

For sc(log4nϵ2+1ϵ2δ)s\geq c\left(\frac{\log^{4}n}{\epsilon^{2}}+\frac{1}{\epsilon^{2}\delta}\right) and large enough cc, we thus have |zj|2𝐀i22|z_{j}|\leq 2\|\mathbf{A}_{i}\|_{2}^{2}.

We next bound the variance by:

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) j=1n𝔼[zj2]j=1npj1pj2|𝐀m,i,j|4\displaystyle\leq\sum_{j=1}^{n}\mathbb{E}[z_{j}^{2}]\leq\sum_{j=1}^{n}p_{j}\frac{1}{p_{j}^{2}}|{\mathbf{A}^{\prime}}_{m,i,j}|^{4}
=j=1nmax(1,𝐀F2s𝐀j22)|𝐀m,i,j|4\displaystyle=\sum_{j=1}^{n}\max\left(1,\frac{\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}\right)|{\mathbf{A}^{\prime}}_{m,i,j}|^{4}
j=1n|𝐀m,i,j|4+j=1n12𝐀F2s𝐀j22(|𝐀i,j|4+|𝐀o,i,j|4)\displaystyle\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{4}+\sum_{j=1}^{n}\frac{12\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}\left(|{\mathbf{A}^{\prime}}_{i,j}|^{4}+|{\mathbf{A}^{\prime}}_{o,i,j}|^{4}\right)
𝐀m,i24+j=1n12𝐀F2s𝐀j22(|𝐀i,j|4+𝐀i24𝐀j24ϵ4δ2𝐀F4),\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\sum_{j=1}^{n}\frac{12\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{j}\|_{2}^{2}}\left(|\mathbf{A}_{i,j}^{\prime}|^{4}+\frac{\|\mathbf{A}_{i}\|_{2}^{4}\|\mathbf{A}_{j}\|_{2}^{4}}{\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{4}}\right),

where the last inequality uses (42). We thus get:

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i24+j:𝐀i,j012𝐀F2|𝐀ij|4s𝐀j22+j=1n12𝐀i24𝐀j22sϵ4δ2𝐀F2.\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\sum_{j:{\mathbf{A}^{\prime}}_{i,j}\neq 0}\frac{12\|\mathbf{A}\|_{F}^{2}|\mathbf{A}^{\prime}_{ij}|^{4}}{s\|\mathbf{A}_{j}\|_{2}^{2}}+\sum_{j=1}^{n}\frac{12\|\mathbf{A}_{i}\|_{2}^{4}\|\mathbf{A}_{j}\|_{2}^{2}}{s\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{2}}. (51)

Now \mathbf{A}_{ii}^{\prime}=0 as p_{i}<1 (and thus \|\mathbf{A}_{i}\|_{2}^{2}<\frac{\epsilon^{2}}{4}\|\mathbf{A}\|_{F}^{2}). Combining (48) with the second term on the right-hand side of (51), we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i24+j:𝐀i,j012c2log4n𝐀i22|𝐀ij|2sϵ2+j=1n12𝐀i24𝐀j22sϵ4δ2𝐀F2,\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\sum_{j:{\mathbf{A}^{\prime}}_{i,j}\neq 0}\frac{12c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}\cdot|\mathbf{A}^{\prime}_{ij}|^{2}}{s\epsilon^{2}}+\sum_{j=1}^{n}\frac{12\|\mathbf{A}_{i}\|_{2}^{4}\|\mathbf{A}_{j}\|_{2}^{2}}{s\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{2}},

and since \sum_{j}|\mathbf{A}^{\prime}_{ij}|^{2}\leq\|\mathbf{A}_{i}\|_{2}^{2}, we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i24+12c2log4n𝐀i24sϵ2+j=1n12𝐀i24𝐀j22sϵ4δ2𝐀F2.\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}+\frac{12c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{2}}+\sum_{j=1}^{n}\frac{12\|\mathbf{A}_{i}\|_{2}^{4}\|\mathbf{A}_{j}\|_{2}^{2}}{s\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{2}}. (52)

Finally since j=1n𝐀j22=𝐀F2\sum_{j=1}^{n}\|\mathbf{A}_{j}\|_{2}^{2}=\|\mathbf{A}\|_{F}^{2} and 𝐀m,i24𝐀i24𝐀i24\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{4}\leq\|\mathbf{A^{\prime}}_{i}\|_{2}^{4}\leq\|\mathbf{A}_{i}\|_{2}^{4} we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀i24+12c2log4n𝐀i24sϵ2+12𝐀i24sϵ4δ2.\displaystyle\leq\|\mathbf{A}_{i}\|_{2}^{4}+\frac{12c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{2}}+\frac{12\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{4}\delta^{2}}. (53)

For sc(log4nϵ2+1ϵ4δ2)s\geq c\left(\frac{\log^{4}n}{\epsilon^{2}}+\frac{1}{\epsilon^{4}\delta^{2}}\right) for large enough cc, we have 𝐕𝐚𝐫(j=1nzj)2𝐀i24\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)\leq 2\|\mathbf{A}_{i}\|_{2}^{4}.

Therefore, using (50) and (53) with sc(log4nϵ2+1ϵ4δ2)s\geq c\left(\frac{\log^{4}n}{\epsilon^{2}}+\frac{1}{\epsilon^{4}\delta^{2}}\right), we can apply Bernstein inequality (Theorem 7) (for some constant cc) to get

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+t)\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+t\right) (j=1nzj𝐀i22+t)\displaystyle\leq\operatorname*{\mathbb{P}}\left(\sum_{j=1}^{n}z_{j}\geq\|\mathbf{A}_{i}\|_{2}^{2}+t\right)
exp(t2/2c𝐀i24+ct𝐀i22/3).\displaystyle\leq\exp\left(\frac{-t^{2}/2}{c\|\mathbf{A}_{i}\|_{2}^{4}+ct\|\mathbf{A}_{i}\|_{2}^{2}/3}\right).

If we set t=logn𝐀i22t=\log n\cdot\|\mathbf{A}_{i}\|_{2}^{2}, for some constant cc^{\prime} we have

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+logn𝐀i22)\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+\log n\cdot\|\mathbf{A}_{i}\|_{2}^{2}\right) exp((logn)2/2c+c(logn)/3)exp(clogn)1/nc.\displaystyle\leq\exp\left(\frac{-(\log n)^{2}/2}{c+c(\log n)/3}\right)\leq\exp(-c^{\prime}\log n)\leq 1/n^{c^{\prime}}.

Since 𝐀m=𝐇m+𝐃m{\mathbf{A}^{\prime}}_{m}=\mathbf{H}_{m}+\mathbf{D}_{m}, we have (𝐒¯𝐀m):,i2(𝐒¯𝐇m):,i2\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}\geq\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}. Then with probability at least 11/nc1δ1-1/n^{c^{\prime}}\geq 1-\delta, for any row ii with pi<1p_{i}<1, we have

1pi(𝐒¯𝐇m):,i22𝐀F2s𝐀i22c(logn)𝐀i22ϵ2δ𝐀F2logn,\displaystyle\frac{1}{p_{i}}\cdot\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}\leq\frac{\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{i}\|_{2}^{2}}\cdot c(\log n)\|\mathbf{A}_{i}\|_{2}^{2}\leq\frac{\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}}{\log n},

for sc(log4nϵ2+1ϵ4δ2)s\geq c\left(\frac{\log^{4}n}{\epsilon^{2}}+\frac{1}{\epsilon^{4}\delta^{2}}\right) for large enough cc. Observe that, as in Lemma 3 w.l.o.g. we have assumed 11nc1δ1-\frac{1}{n^{c^{\prime}}}\geq 1-\delta, since otherwise, our algorithm would read all n2n^{2} entries of the matrix.

Case 2: p_{i}=1. Then, we have \|\mathbf{A}_{i}\|_{2}^{2}\geq\|\mathbf{A}\|_{F}^{2}/s. As in the p_{i}<1 case, when \mathbf{A}_{ii}=0 (and thus \mathbf{A}^{\prime}_{ii}=\mathbf{A}_{ii}=0), we have from (49):

|zj|\displaystyle|z_{j}| |𝐀m,i,j|2+2𝐀i22sϵ2δ+2c2log4n𝐀i22sϵ2.\displaystyle\leq|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}}.

Now, we observe that |𝐀m,i,j|2j=1n|𝐀m,i,j|2𝐀m,i22𝐀m22ϵ2δ𝐀F2|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\|\mathbf{A}^{\prime}_{m,i}\|^{2}_{2}\leq\|\mathbf{A}^{\prime}_{m}\|^{2}_{2}\leq\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}, which gives us

|zj|\displaystyle|z_{j}| ϵ2δ𝐀F2+2𝐀i22sϵ2δ+2c2log4n𝐀i22sϵ2.\displaystyle\leq\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}+\frac{2\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}\delta}+\frac{2c_{2}\log^{4}n\cdot\|\mathbf{A}_{i}\|_{2}^{2}}{s\epsilon^{2}}. (54)

Note that if 𝐀ii0\mathbf{A}_{ii}\neq 0, the second term in (49) is bounded as 2𝐀F2s𝐀i22|𝐀ii|22𝐀F2s2ϵ2δ𝐀F2\frac{2\|\mathbf{A}\|_{F}^{2}}{s\|\mathbf{A}_{i}\|_{2}^{2}}\cdot|\mathbf{A}^{\prime}_{ii}|^{2}\leq\frac{2\|\mathbf{A}\|_{F}^{2}}{s}\leq 2\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2} for sO(1ϵ2δ)s\geq O(\frac{1}{\epsilon^{2}\delta}). Thus, for sc(log4nϵ4δ+1ϵ4δ2)s\geq c\left(\frac{\log^{4}n}{\epsilon^{4}\delta}+\frac{1}{\epsilon^{4}\delta^{2}}\right) for a large enough constant cc and adjusting for other constants we have |zj|2ϵ2δ𝐀F2|z_{j}|\leq 2\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}. Also observe that the expectation of zj\sum z_{j} can be bounded by:

𝔼(j=1nzj)=𝔼(𝐒¯𝐀m):,i22=𝐀m,i22𝐀m22ϵ2δ𝐀F2.\displaystyle\mathbb{E}\left(\sum_{j=1}^{n}z_{j}\right)=\mathbb{E}\|(\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m})_{:,i}\|_{2}^{2}=\|{\mathbf{A}^{\prime}}_{m,i}\|_{2}^{2}\leq\|{\mathbf{A}^{\prime}}_{m}\|_{2}^{2}\leq\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}.

Next, the variance of the sum of the random variables {zj}\{z_{j}\} can again be bounded by following the analysis presented in and prior to (52) and (53) we have

𝐕𝐚𝐫(j=1nzj)\displaystyle\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right) 𝐀m,i,j24+12c2log2n𝐀i24sϵ2+12𝐀i24sϵ4δ2\displaystyle\leq\|{\mathbf{A}^{\prime}}_{m,i,j}\|_{2}^{4}+\frac{12c_{2}\log^{2}n\cdot\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{2}}+\frac{12\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{4}\delta^{2}}
ϵ4δ2𝐀F4+12c2log2n𝐀i24sϵ2+12𝐀i24sϵ4δ2,\displaystyle\leq\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{4}+\frac{12c_{2}\log^{2}n\cdot\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{2}}+\frac{12\|\mathbf{A}_{i}\|_{2}^{4}}{s\epsilon^{4}\delta^{2}}, (55)

where we again bound 𝐀m,i,j24\|{\mathbf{A}^{\prime}}_{m,i,j}\|_{2}^{4} using

|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\sum_{j=1}^{n}|{\mathbf{A}^{\prime}}_{m,i,j}|^{2}\leq\|\mathbf{A}^{\prime}_{m,i}\|^{2}_{2}\leq\|\mathbf{A}^{\prime}_{m}\|^{2}_{2}\leq\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}.

Then for sc(log4nϵ6δ2+1ϵ8δ4)s\geq c(\frac{\log^{4}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}), we have 𝐕𝐚𝐫(j=1nzj)2ϵ4δ2𝐀F4\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)\leq 2\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{4} for large enough constant cc.

Using (54) and (55) and noting that j=1n𝔼(zj2)𝐕𝐚𝐫(j=1nzj)𝔼2(j=1nzj)\sum_{j=1}^{n}\mathbb{E}\left(z_{j}^{2}\right)\geq\mathbf{Var}\left(\sum_{j=1}^{n}z_{j}\right)-\mathbb{E}^{2}\left(\sum_{j=1}^{n}z_{j}\right) we can apply the Bernstein inequality (Theorem 7):

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+t)\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+t\right) (j=1nzjϵ2δ𝐀i22+t)\displaystyle\leq\operatorname*{\mathbb{P}}\left(\sum_{j=1}^{n}z_{j}\geq\epsilon^{2}\delta\|\mathbf{A}_{i}\|_{2}^{2}+t\right)
exp(t2/2cϵ4δ2𝐀F4+cϵ2δ𝐀F2t/3).\displaystyle\leq\exp\left(\frac{-t^{2}/2}{c\epsilon^{4}\delta^{2}\|\mathbf{A}\|_{F}^{4}+c\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}t/3}\right).

If we set t=(logn)ϵ2δ𝐀F2t=(\log n)\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}, then for some constant cc^{\prime} we have

((𝐒¯𝐀m):,i22𝔼(𝐒¯𝐀m):,i22+t)\displaystyle\operatorname*{\mathbb{P}}\left(\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\geq\mathbb{E}\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}+t\right) exp(clogn)1/nc.\displaystyle\leq\exp(-c^{\prime}\log n)\leq 1/n^{c^{\prime}}.

Thus, since \|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}\leq\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}, when p_{i}=1, setting s\geq c\left(\frac{\log^{4}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}\right) for large enough c, we have with probability \geq 1-1/n^{c^{\prime}} that \frac{1}{p_{i}}\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}=\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}\leq\|(\bar{\mathbf{S}}\mathbf{A}^{\prime}_{m})_{:,i}\|_{2}^{2}\leq(\log n)\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}.

We have proven that with probability 11/nc\geq 1-1/n^{c^{\prime}}, for both cases when pi<1p_{i}<1 and pi=1p_{i}=1, (𝐒¯𝐇m):,i22pi(logn)ϵ2δ𝐀F2\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}^{2}}{p_{i}}\leq(\log n)\epsilon^{2}\delta\|\mathbf{A}\|_{F}^{2}. Taking a union bound over all i[n]i\in[n], with probability at least 11/nc11-1/n^{c^{\prime}-1}, maxi(𝐒¯𝐇m):,i2pilognϵδ𝐀F\max_{i}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\leq\sqrt{\log n}\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F} for sc(log4nϵ6δ2+1ϵ8δ4)s\geq c(\frac{\log^{4}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}). Also, since pi1n2p_{i}\geq\frac{1}{n^{2}} for all i[n]i\in[n], (𝐒¯𝐇m):,i2pij=1n𝐀m,i,j2pipjn𝐀Fs\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\leq\sqrt{\sum_{j=1}^{n}\frac{\mathbf{A}_{m,i,j}^{2}}{p_{i}\cdot p_{j}}}\leq\frac{n\cdot\|\mathbf{A}\|_{F}}{\sqrt{s}}. Thus, maxi(𝐒¯𝐇m):,i2pin𝐀F\max_{i}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\leq n\|\mathbf{A}\|_{F} and we get,

𝔼2(maxi:i[n](𝐒¯𝐇m):,i2pi)lognϵδ𝐀F+1nc3lognϵδ𝐀F.\displaystyle\mathbb{E}_{2}\left(\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right)\leq\sqrt{\log n}\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}+\frac{1}{n^{c^{\prime}-3}}\leq\sqrt{\log n}\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.

after adjusting \epsilon by at most a constant factor. Overall, we finally get

𝔼2𝐒¯𝐇m𝐒^12𝔼2(maxi:i[n](𝐒¯𝐇m):,i2pi)ϵlognδ𝐀F.\mathbb{E}_{2}\|\bar{\mathbf{S}}\mathbf{H}_{m}\hat{\mathbf{S}}\|_{1\to 2}\leq\mathbb{E}_{2}\left(\max_{i:i\in[n]}\frac{\|(\bar{\mathbf{S}}\mathbf{H}_{m})_{:,i}\|_{2}}{\sqrt{p_{i}}}\right)\leq\epsilon\sqrt{\log n}\sqrt{\delta}\|\mathbf{A}\|_{F}.

Plugging this bound into (47), we have for sc(log4nϵ6δ2+1ϵ8δ4)s\geq c(\frac{\log^{4}n}{\epsilon^{6}\delta^{2}}+\frac{1}{\epsilon^{8}\delta^{4}}),

𝔼2𝐒¯𝐀m𝐒¯2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2} (logn)ϵδ𝐀F.\displaystyle\leq(\log n)\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.

Finally, after adjusting \epsilon by a \frac{1}{\log n} factor, we have for s\geq c\left(\frac{\log^{10}n}{\epsilon^{6}\delta^{2}}+\frac{\log^{8}n}{\epsilon^{8}\delta^{4}}\right), and in particular (after adjusting the constant c) for s\geq\frac{c\log^{10}n}{\epsilon^{8}\delta^{4}},

𝔼2𝐒¯𝐀m𝐒¯2\displaystyle\mathbb{E}_{2}\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2} ϵδ𝐀F.\displaystyle\leq\epsilon\sqrt{\delta}\|\mathbf{A}\|_{F}.

The final bound then follows via Markov’s inequality on 𝐒¯𝐀m𝐒¯2\|\bar{\mathbf{S}}{\mathbf{A}^{\prime}}_{m}\bar{\mathbf{S}}\|_{2}. ∎

E.3 Main Accuracy Bound

We are finally ready to state our main result for 2\ell_{2} norm based sampling.

See 3

Proof.

The proof follows exactly the same structure as the proofs of Theorems 1 and 2 for uniform and sparsity-based sampling, respectively. We use the results of Lemmas 13 and 14 on the concentration of the large and middle eigenvalues, respectively, for \ell_{2} norm based sampling.

Analogous to Theorem 2, from Lemma 13 applied with error parameter \frac{\epsilon}{\log n}, the eigenvalues of \mathbf{A}^{\prime}_{o,S} approximate those of \mathbf{A}_{o}^{\prime} up to error \epsilon\|\mathbf{A}\|_{F} with probability 1-\delta if s\geq\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}}. We also require s\geq\frac{c\log^{10}n}{\epsilon^{8}\delta^{4}} for \|\mathbf{A}_{m,S}^{\prime}\|_{2}\leq\epsilon\|\mathbf{A}\|_{F} to hold with probability 1-\delta by Lemma 14. Thus, for both conditions to hold simultaneously with probability 1-2\delta by a union bound, it suffices to set s=\max\left(\frac{c\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\sqrt{\delta}},\frac{c\log^{10}n}{\epsilon^{8}\delta^{4}}\right)=\frac{c\log^{10}n}{\epsilon^{8}\delta^{4}}, where we use that \log(1/(\epsilon\delta))\leq\log n, as otherwise our algorithm can take \mathbf{A}_{S} to be the full matrix \mathbf{A}. Adjusting \delta to \delta/2 completes the proof. ∎
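For intuition, the following numpy sketch implements only the sampling and rescaling at the core of the \ell_{2} norm based estimator: index i is kept with a probability proportional to \|\mathbf{A}_{i}\|_{2}^{2} (taken here as \min(1,s\|\mathbf{A}_{i}\|_{2}^{2}/\|\mathbf{A}\|_{F}^{2}), an assumption for illustration), and the sampled principal submatrix is rescaled entrywise by 1/\sqrt{p_{i}p_{j}}. It deliberately omits the zeroing out of large sampled entries that the full algorithm uses to control \|\mathbf{A}^{\prime}_{m,S}\|_{2}, and it scans \mathbf{A} to compute row norms, which the sublinear access model assumes can instead be sampled directly; it is a simplified sketch, not the exact procedure analyzed above.

```python
import numpy as np

def l2_sampled_submatrix_eigs(A, s, rng=None):
    """Hedged sketch of l2-norm-based sampling: keep index i with probability
    p_i = min(1, s * ||A_i||_2^2 / ||A||_F^2), rescale the sampled principal
    submatrix by 1/sqrt(p_i p_j), and return its eigenvalues as estimates of
    the eigenvalues of A. NOTE: the zeroing-out step of the full algorithm
    (used to bound the middle eigenvalues) is intentionally omitted here."""
    rng = np.random.default_rng(rng)
    row_norms_sq = np.sum(A * A, axis=1)
    fro_sq = row_norms_sq.sum()
    p = np.minimum(1.0, s * row_norms_sq / fro_sq)   # assumed sampling probabilities
    S = np.flatnonzero(rng.random(A.shape[0]) < p)   # sampled index set
    d = 1.0 / np.sqrt(p[S])                          # rescaling 1/sqrt(p_i)
    A_S = d[:, None] * A[np.ix_(S, S)] * d[None, :]
    return np.linalg.eigvalsh(A_S)
```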

Appendix F Eigenvalue Approximation via Entrywise Sampling

In this section we show that sampling O~(n/ϵ2)\tilde{O}(n/\epsilon^{2}) entries from a bounded entry matrix preserves its eigenvalues up to error ±ϵn\pm\epsilon n. We use this result to improve the sample complexity of Theorem 1 from O~(log6nϵ6)\tilde{O}\left(\frac{\log^{6}n}{\epsilon^{6}}\right) to O~(log3nϵ5)\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{5}}\right) by applying entrywise sampling to further sparsify the submatrix 𝐀S\mathbf{A}_{S} that is sampled in Algorithm 1. Entrywise sampling results similar to what we show are well-known in the literature. See for example [AM07] and [BKKS21]. For completeness, we give a proof here using standard matrix concentration bounds.

Theorem 10 (Entrywise sampling – spectral norm bound).

Consider 𝐀n×n\mathbf{A}\in\mathbb{R}^{n\times n} with 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Let 𝐂n×n\mathbf{C}\in\mathbb{R}^{n\times n} be constructed by setting 𝐂i,i=𝐀i,i\mathbf{C}_{i,i}=\mathbf{A}_{i,i} for all i[n]i\in[n] and

\mathbf{C}_{j,i}=\mathbf{C}_{i,j}=\begin{cases}\frac{1}{p}\cdot\mathbf{A}_{i,j}&\text{with probability }p\\ 0&\text{otherwise}.\end{cases}

For any ϵ,δ(0,1)\epsilon,\delta\in(0,1), if pclog(n/δ)nϵ2p\geq\frac{c\log(n/\delta)}{n\epsilon^{2}} for a large enough constant cc, then with probability at least 1δ1-\delta, 𝐀𝐂2ϵn\|\mathbf{A}-\mathbf{C}\|_{2}\leq\epsilon n.

Note that by Weyl’s inequality (Fact 3), Theorem 10 immediately implies that the eigenvalues of 𝐂\mathbf{C} approximate those of 𝐀\mathbf{A} up to ±ϵn\pm\epsilon n error with good probability.

Proof.

For any i<ji<j, define the symmetric random matrix 𝐄(ij)\mathbf{E}^{(ij)} with

𝐄i,j(ij)=𝐄j,i(ij)\displaystyle\mathbf{E}^{(ij)}_{i,j}=\mathbf{E}^{(ij)}_{j,i} ={(1p1)𝐀i,jwith probability p𝐀i,jotherwise.\displaystyle=\begin{cases}(\frac{1}{p}-1)\cdot\mathbf{A}_{i,j}&\text{with probability }p\\ -\mathbf{A}_{i,j}&\text{otherwise}.\end{cases}

Observe that \mathbf{C}-\mathbf{A}=\sum_{i,j\in[n],i<j}\mathbf{E}^{(ij)}. Further, each \mathbf{E}^{(ij)} has just two non-zero entries, which lie in different rows and columns. So

\|\mathbf{E}^{(ij)}\|_{2}=|\mathbf{C}_{i,j}-\mathbf{A}_{i,j}|\leq\left(\frac{1}{p}-1\right)\cdot|\mathbf{A}_{i,j}|\leq\frac{1}{p},

where the last inequality uses that \|\mathbf{A}\|_{\infty}\leq 1. Additionally, \mathbf{E}^{(ij)}\mathbf{E}^{(ij)T} is diagonal, with its two non-zero diagonal entries, at positions (i,i) and (j,j), both equal to (\mathbf{C}_{i,j}-\mathbf{A}_{i,j})^{2}. Thus, \mathbf{V}=\sum_{i,j\in[n],i<j}\mathbb{E}[\mathbf{E}^{(ij)}\mathbf{E}^{(ij)T}] is also diagonal. We have

𝐕i,i=ji𝔼[(𝐂i,j𝐀i,j)2]\displaystyle\mathbf{V}_{i,i}=\sum_{j\neq i}\mathbb{E}[(\mathbf{C}_{i,j}-\mathbf{A}_{i,j})^{2}] =ji𝐀i,j2(p(1p1)2+(1p)(1)2)\displaystyle=\sum_{j\neq i}\mathbf{A}_{i,j}^{2}\cdot\left(p\cdot\left(\frac{1}{p}-1\right)^{2}+(1-p)\cdot(-1)^{2}\right)
=ji𝐀i,j2(1p1)np,\displaystyle=\sum_{j\neq i}\mathbf{A}_{i,j}^{2}\cdot\left(\frac{1}{p}-1\right)\leq\frac{n}{p},

where in the final inequality we use that 𝐀1\|\mathbf{A}\|_{\infty}\leq 1. Thus, since 𝐕\mathbf{V} is diagonal, 𝐕2np\|\mathbf{V}\|_{2}\leq\frac{n}{p}. Putting the above together using Theorem 6 we get,

(𝐀𝐂2ϵn)=(i,j[n],i<j𝐄(ij)2ϵn)\displaystyle\operatorname*{\mathbb{P}}\left(\|\mathbf{A}-\mathbf{C}\|_{2}\geq\epsilon n\right)=\operatorname*{\mathbb{P}}\left(\left\|\sum_{i,j\in[n],i<j}\mathbf{E}^{(ij)}\right\|_{2}\geq\epsilon n\right) 2nexp(ϵ2n2/2np+ϵn3p).\displaystyle\leq 2n\cdot\exp\left(\frac{-\epsilon^{2}n^{2}/2}{\frac{n}{p}+\frac{\epsilon n}{3p}}\right).

Thus, for pclog(n/δ)nϵ2p\geq\frac{c\log(n/\delta)}{n\epsilon^{2}} for large enough cc, with probability at least 1δ1-\delta we have 𝐀𝐂2ϵn\|\mathbf{A}-\mathbf{C}\|_{2}\leq\epsilon n. ∎
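To make the construction in Theorem 10 concrete, the following numpy sketch builds \mathbf{C} from a symmetric, bounded-entry \mathbf{A}; the function name and the choice to sample the upper triangle once and then symmetrize are implementation assumptions, not part of the formal statement. By Weyl's inequality, comparing np.linalg.eigvalsh(C) to np.linalg.eigvalsh(A) entrywise should show differences of order \epsilon n once p\approx c\log(n/\delta)/(n\epsilon^{2}).

```python
import numpy as np

def entrywise_sample(A, p, rng=None):
    """Sketch of the construction in Theorem 10: keep each off-diagonal entry
    of a symmetric A (with |A_ij| <= 1) independently with probability p,
    rescale kept entries by 1/p, and copy the diagonal exactly."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    C = np.zeros_like(A, dtype=float)
    iu = np.triu_indices(n, k=1)                 # sample the upper triangle once
    keep = rng.random(iu[0].size) < p
    C[iu] = np.where(keep, A[iu] / p, 0.0)
    C = C + C.T                                  # symmetrize: C_ji = C_ij
    np.fill_diagonal(C, np.diag(A))
    return C
```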

F.1 Improved Sample Complexity via Entrywise Sampling

We can combine Theorem 10 directly with Theorem 1 to give an improved sample complexity for eigenvalue estimation. Specifically, we have: See 1

Proof.

Letting s=\frac{c_{1}\log(1/(\epsilon\delta))\cdot\log^{3}n}{\epsilon^{3}\delta} for a large enough constant c_{1}, by Theorem 1, for a random principal submatrix \mathbf{A}_{S} formed by sampling each index with probability s/n, the eigenvalues of \mathbf{A}_{S}, after scaling up by a factor of n/s, approximate those of \mathbf{A} to error \pm\epsilon n with probability at least 1-\delta. By Theorem 10, if we sample off-diagonal entries of \mathbf{A}_{S} with probability p\geq\frac{c_{2}\log(|S|/\delta)}{|S|\cdot\epsilon^{2}} to produce \mathbf{C}, then we preserve its eigenvalues to error \pm\epsilon|S|. Thus, after scaling by \frac{n}{s}, the eigenvalues of \mathbf{C} approximate those of \mathbf{A} to error \pm\left(\epsilon n+\frac{n}{s}\cdot\epsilon|S|\right). Finally, observe that by a standard Chernoff bound, |S|\leq 2s with probability at least 1-\delta. So, adjusting \epsilon by a constant, the scaled eigenvalues of \mathbf{C} give \pm\epsilon n approximations to \mathbf{A}'s eigenvalues. The expected number of entries read is |S|+p\cdot|S|^{2}=\tilde{O}\left(\frac{s\cdot\log(1/\delta)}{\epsilon^{2}}\right)=\tilde{O}\left(\frac{\log^{3}n}{\epsilon^{5}\delta}\right). Additionally, by a standard Chernoff bound, at most \tilde{O}\left(\frac{\log^{3}n}{\epsilon^{5}\delta}\right) entries are read with probability at least 1-\delta. ∎
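As a concrete illustration of the two-stage estimator analyzed in this proof, the sketch below first samples a uniformly random principal submatrix and then applies the entrywise sparsification of Theorem 10 to it. For clarity it indexes a dense in-memory matrix, whereas a genuinely sublinear implementation would query only the |S|+p|S|^{2} sampled entries; the function name and its parameters are assumptions for illustration.

```python
import numpy as np

def sparsified_submatrix_eigs(A, s, p, rng=None):
    """Two-stage sketch: (1) keep each index with probability s/n to form the
    principal submatrix A_S, (2) sparsify A_S entrywise as in Theorem 10 with
    probability p, (3) return eigenvalues of the result scaled by n/s as
    estimates of the eigenvalues of A."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    S = np.flatnonzero(rng.random(n) < s / n)    # sampled index set
    A_S = A[np.ix_(S, S)]
    m = S.size
    C = np.zeros((m, m))
    iu = np.triu_indices(m, k=1)
    keep = rng.random(iu[0].size) < p
    C[iu] = np.where(keep, A_S[iu] / p, 0.0)
    C = C + C.T                                  # symmetrize the off-diagonal part
    np.fill_diagonal(C, np.diag(A_S))            # keep the diagonal exactly
    return (n / s) * np.linalg.eigvalsh(C)
```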

Appendix G Singular Value Approximation via Sampling

We now show how to estimate the singular values of a bounded-entry matrix via random subsampling. Unlike in eigenvalue estimation, instead of sampling a random principal submatrix, here we sample a random submatrix with independently sampled rows and columns. This allows us to apply known interior eigenvalue matrix Chernoff bounds to bound the perturbation of the singular values [GT11, BCJ20]. We first state a version of Theorem 4.1 from [GT11] (also stated as Theorem 4.6 in [BCJ20]), simplified using standard upper bounds on the Chernoff bounds from [MU17].

Theorem 11 (Interior Eigenvalue Matrix Chernoff bounds – Theorem 4.1 of [GT11]).

Let {𝐗j}\{\mathbf{X}_{j}\} be a finite sequence of independent, random, positive-semidefinite matrices with dimension nn, and assume that 𝐗j2L\|\mathbf{X}_{j}\|_{2}\leq L for some value LL almost surely. Given an integer knk\leq n, define

μk=λk(j𝔼[𝐗j]).\mu_{k}=\lambda_{k}\left(\sum_{j}\mathbb{E}[\mathbf{X}_{j}]\right).

Then we have the tail inequalities:

{(λk(j𝐗j)(1+Δ)μk)(nk+1)eΔμk3L,for Δ1(λk(j𝐗j)(1+Δ)μk)(nk+1)eΔ2μk3L,for Δ[0,1)(λk(j𝐗j)(1Δ)μk)keΔ2μk2L,for Δ[0,1)\displaystyle\begin{cases}\operatorname*{\mathbb{P}}\left(\lambda_{k}(\sum_{j}\mathbf{X}_{j})\geq(1+\Delta)\mu_{k}\right)\leq(n-k+1)\cdot e^{-\frac{\Delta\mu_{k}}{3L}},&\text{for }\Delta\geq 1\\ \operatorname*{\mathbb{P}}\left(\lambda_{k}(\sum_{j}\mathbf{X}_{j})\geq(1+\Delta)\mu_{k}\right)\leq(n-k+1)\cdot e^{-\frac{\Delta^{2}\mu_{k}}{3L}},&\text{for }\Delta\in[0,1)\\ \operatorname*{\mathbb{P}}\left(\lambda_{k}(\sum_{j}\mathbf{X}_{j})\leq(1-\Delta)\mu_{k}\right)\leq k\cdot e^{-\frac{\Delta^{2}\mu_{k}}{2L}},&\text{for }\Delta\in[0,1)\end{cases}

We are now ready to state and prove the main theorem.

Theorem 12.

Let \mathbf{A}\in\mathbb{R}^{n\times n} be a matrix with \|\mathbf{A}\|_{\infty}\leq 1 and singular values \sigma_{1}(\mathbf{A})\geq\ldots\geq\sigma_{n}(\mathbf{A}). Let \mathbf{\bar{S}}\in\mathbb{R}^{n\times n} be a scaled diagonal sampling matrix such that \mathbf{\bar{S}}_{ii}=\sqrt{\frac{n}{s}} with probability \frac{s}{n} and \mathbf{\bar{S}}_{ii}=0 otherwise. Let \mathbf{\bar{T}}\in\mathbb{R}^{n\times n} be a random sampling matrix drawn independently from the same distribution as \mathbf{\bar{S}}. Let \mathbf{Z}=\mathbf{\bar{S}A\bar{T}} be the sampled submatrix of \mathbf{A}, with singular values \sigma_{1}(\mathbf{Z})\geq\ldots\geq\sigma_{n}(\mathbf{Z}). Then, if s\geq\frac{c\log(n/\delta)}{\epsilon^{2}} for some constant c, with probability at least 1-\delta, for all i\in[n],

σi(𝐀)ϵnσi(𝐙)σi(𝐀)+ϵn.\displaystyle\sigma_{i}(\mathbf{A})-\epsilon n\leq\sigma_{i}(\mathbf{Z})\leq\sigma_{i}(\mathbf{A})+\epsilon n.
Proof.

We first prove that the singular values of \bar{\mathbf{S}}\mathbf{A} are close to those of \mathbf{A}. For i\in[n], let \mathbf{X}_{i}\in\mathbb{R}^{n\times n} be matrix-valued random variables such that:

𝐗i={ns𝐀i𝐀iT,with probability s/n0otherwise\displaystyle\mathbf{X}_{i}=\begin{cases}\frac{n}{s}\mathbf{A}_{i}\mathbf{A}_{i}^{T},&\text{with probability }s/n\\ 0&\text{otherwise}\end{cases}

where 𝐀i\mathbf{A}_{i} is the iith row of 𝐀\mathbf{A} written as a column vector. Then, i𝐗i=(𝐒¯𝐀)T(𝐒¯𝐀)\sum_{i}\mathbf{X}_{i}=(\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}) and 𝔼[i𝐗i]=𝐀T𝐀\mathbb{E}[\sum_{i}\mathbf{X}_{i}]=\mathbf{A}^{T}\mathbf{A}. We have 𝐗i2maxjns𝐀j22n2s\|\mathbf{X}_{i}\|_{2}\leq\max_{j}\frac{n}{s}\|\mathbf{A}_{j}\|^{2}_{2}\leq\frac{n^{2}}{s} and λk(𝔼[i𝐗i])=λk(𝐀T𝐀)=σk2(𝐀)\lambda_{k}(\mathbb{E}[\sum_{i}\mathbf{X}_{i}])=\lambda_{k}(\mathbf{A}^{T}\mathbf{A})=\sigma_{k}^{2}(\mathbf{A}) for k[n]k\in[n].

Case 1: We will first prove that σk(𝐀)ϵnσk(𝐒¯𝐀)\sigma_{k}(\mathbf{A})-\epsilon n\leq\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}) for all k[n]k\in[n]. Note that when σk(𝐀)ϵn\sigma_{k}(\mathbf{A})\leq\epsilon n, σk(𝐀)ϵnσk(𝐒¯𝐀)\sigma_{k}(\mathbf{A})-\epsilon n\leq\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}) is trivially true. We now consider all k[n]k\in[n] such that σk(𝐀)>ϵn\sigma_{k}(\mathbf{A})>\epsilon n. Setting μk=λk(𝐀T𝐀)\mu_{k}=\lambda_{k}(\mathbf{A}^{T}\mathbf{A}), L=n2sL=\frac{n^{2}}{s} and Δ=ϵnσk(𝐀)\Delta=\frac{\epsilon n}{\sigma_{k}(\mathbf{A})} (note that Δ<1\Delta<1) in Theorem 11, we get:

\operatorname*{\mathbb{P}}\left(\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\leq(1-\Delta)\lambda_{k}(\mathbf{A}^{T}\mathbf{A})\right)\leq k\cdot e^{-c\frac{\Delta^{2}\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}{L}}\leq k\cdot e^{-c\frac{\epsilon^{2}n^{2}}{\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}\cdot\frac{\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}{(n^{2}/s)}}

where cc is constant. So, for sO(log(n/δ)ϵ2)s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}) for any kk, we have λk((𝐒¯𝐀)T(𝐒¯𝐀))=σk2(𝐒¯𝐀)(1Δ)σk2(𝐀)\lambda_{k}((\bar{\mathbf{S}}\mathbf{A})^{T}(\bar{\mathbf{S}}\mathbf{A}))=\sigma_{k}^{2}(\bar{\mathbf{S}}\mathbf{A})\geq(1-\Delta)\sigma_{k}^{2}(\mathbf{A}) with probability at least 1δn1-\frac{\delta}{n}. Taking a square root on both sides we get σk(𝐒¯𝐀)1Δσk(𝐀)(1Δ)σk(𝐀)=σk(𝐀)ϵn\sigma_{k}(\bar{\mathbf{S}}\mathbf{A})\geq\sqrt{1-\Delta}\sigma_{k}(\mathbf{A})\geq(1-\Delta)\sigma_{k}(\mathbf{A})=\sigma_{k}(\mathbf{A})-\epsilon n. Taking a union bound over all kk with σk(𝐀)>ϵn\sigma_{k}(\mathbf{A})>\epsilon n, σk(𝐀)ϵnσk(𝐒¯𝐀)\sigma_{k}(\mathbf{A})-\epsilon n\leq\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}) holds for all such kk with probability at least 1δ1-\delta.

Case 2: We now prove that σk(𝐒¯𝐀)σk(𝐀)+ϵn\sigma_{k}(\bar{\mathbf{S}}\mathbf{A})\leq\sigma_{k}(\mathbf{A})+\epsilon n for all k[n]k\in[n]. We first consider the case when σk(𝐀)ϵn\sigma_{k}(\mathbf{A})\leq\epsilon n. Setting μk=λk(𝐀T𝐀)\mu_{k}=\lambda_{k}(\mathbf{A}^{T}\mathbf{A}), L=n2sL=\frac{n^{2}}{s} and Δ=ϵ2n2σk2(𝐀)\Delta=\frac{\epsilon^{2}n^{2}}{\sigma^{2}_{k}(\mathbf{A})} (note that Δ1\Delta\geq 1) in Theorem 11, we get (for some constant cc):

\operatorname*{\mathbb{P}}\left(\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\geq(1+\Delta)\lambda_{k}(\mathbf{A}^{T}\mathbf{A})\right)\leq n\cdot e^{-\frac{c\Delta\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}{L}}\leq n\cdot e^{-\frac{c\epsilon^{2}n^{2}}{\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}\cdot\frac{\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}{(n^{2}/s)}}

Thus, if sO(log(n/δ)ϵ2)s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}), we have λk((𝐒¯𝐀)T(𝐒¯𝐀))(1+Δ)λk(𝐀T𝐀)λk(𝐀T𝐀)+ϵ2n2\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\leq(1+\Delta)\lambda_{k}(\mathbf{A}^{T}\mathbf{A})\leq\lambda_{k}(\mathbf{A}^{T}\mathbf{A})+\epsilon^{2}n^{2} for all k[n]k\in[n] such that σk(𝐀)ϵn\sigma_{k}(\mathbf{A})\leq\epsilon n with probability at least 1δ1-\delta via a union bound. Taking square root on both sides and using the facts that λk(𝐀T𝐀)=σk2(𝐀)\lambda_{k}(\mathbf{A}^{T}\mathbf{A})=\sigma_{k}^{2}(\mathbf{A}), λk((𝐒¯𝐀)T(𝐒¯𝐀))=σk2(𝐒¯𝐀)\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))=\sigma_{k}^{2}(\mathbf{\bar{S}}\mathbf{A}) and a+b<a+b\sqrt{a+b}<\sqrt{a}+\sqrt{b}, we get σk(𝐒¯𝐀)σk(𝐀)+ϵn\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})\leq\sigma_{k}(\mathbf{A})+\epsilon n.

We now consider the case σk(𝐀)>ϵn\sigma_{k}(\mathbf{A})>\epsilon n. Setting μk=λk(𝐀T𝐀)\mu_{k}=\lambda_{k}(\mathbf{A}^{T}\mathbf{A}), L=n2sL=\frac{n^{2}}{s} and Δ=ϵnσk(𝐀)\Delta=\frac{\epsilon n}{\sigma_{k}(\mathbf{A})} (note that Δ<1\Delta<1) in Theorem 11, we get (for some constant cc):

\operatorname*{\mathbb{P}}\left(\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\geq(1+\Delta)\lambda_{k}(\mathbf{A}^{T}\mathbf{A})\right)\leq n\cdot e^{-\frac{c\Delta^{2}\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}{L}}\leq n\cdot e^{-\frac{c\epsilon^{2}n^{2}}{\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}\cdot\frac{\lambda_{k}(\mathbf{A}^{T}\mathbf{A})}{(n^{2}/s)}}.

Thus, if sO(log(n/δ)ϵ2)s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}), we have λk((𝐒¯𝐀)T(𝐒¯𝐀))(1+Δ)λk(𝐀T𝐀)\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\leq(1+\Delta)\lambda_{k}(\mathbf{A}^{T}\mathbf{A}) for all k[n]k\in[n] such that σk(𝐀)>ϵn\sigma_{k}(\mathbf{A})>\epsilon n with probability at least 1δ1-\delta via a union bound. Taking square root on both sides and using the fact that λk(𝐀T𝐀)=σk2(𝐀)\lambda_{k}(\mathbf{A}^{T}\mathbf{A})=\sigma_{k}^{2}(\mathbf{A}), λk((𝐒¯𝐀)T(𝐒¯𝐀))=σk2(𝐒¯𝐀)\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))=\sigma_{k}^{2}(\mathbf{\bar{S}}\mathbf{A}) and a<a\sqrt{a}<a for any a>1a>1, we get σk(𝐒¯𝐀)(1+Δ)σk(𝐀)σk(𝐀)+ϵn\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})\leq(1+\Delta)\sigma_{k}(\mathbf{A})\leq\sigma_{k}(\mathbf{A})+\epsilon n. Thus, via a union bound over all k[n]k\in[n], we have σk(𝐒¯𝐀)σk(𝐀)+ϵn\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})\leq\sigma_{k}(\mathbf{A})+\epsilon n with probability 12δ1-2\delta.

Thus, via a union bound over the two cases above, with probability at least 1-3\delta, for s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}) we have, for all k\in[n],

|σk(𝐒¯𝐀)σk(𝐀)|ϵn.\displaystyle|\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})-\sigma_{k}(\mathbf{A})|\leq\epsilon n. (56)

Next we prove that the singular values of \mathbf{\bar{S}}\mathbf{A}\bar{\mathbf{T}} are close to those of \mathbf{\bar{S}}\mathbf{A}, using essentially the same approach as above. For i\in[n], let \mathbf{Y}_{i} be matrix-valued random variables such that:

𝐘i={ns(𝐒¯𝐀)i(𝐒¯𝐀)iT,with probability s/n0otherwise\displaystyle\mathbf{Y}_{i}=\begin{cases}\frac{n}{s}(\mathbf{\bar{S}}\mathbf{A})_{i}(\mathbf{\bar{S}}\mathbf{A})^{T}_{i},&\text{with probability }s/n\\ 0&\text{otherwise}\end{cases}

where (\mathbf{\bar{S}}\mathbf{A})_{i} is the ith column of \mathbf{\bar{S}}\mathbf{A}. Then, \sum_{i}\mathbf{Y}_{i}=(\mathbf{\bar{S}}\mathbf{A}\bar{\mathbf{T}})^{T}(\mathbf{\bar{S}}\mathbf{A}\bar{\mathbf{T}}). Also, we have \lambda_{k}(\mathbb{E}[\sum_{i}\mathbf{Y}_{i}])=\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))=\sigma_{k}^{2}(\mathbf{\bar{S}}\mathbf{A}). First, by a standard Chernoff bound, \bar{\mathbf{S}} samples at most 2s rows of \mathbf{A} with probability at least 1-\delta for any s\geq O(\log(1/\delta)). Thus, we have \|\mathbf{Y}_{i}\|_{2}=\frac{n}{s}\|(\mathbf{\bar{S}}\mathbf{A})_{i}\|_{2}^{2}\leq\frac{n}{s}\cdot\frac{n}{s}\cdot 2s\leq\frac{2n^{2}}{s} with probability 1-\delta. Let this event be called E_{2}. We now consider two cases conditioned on the event E_{2}.

Case 1: We first prove that \sigma_{k}(\mathbf{\bar{S}}\mathbf{A})-\epsilon n\leq\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}}) for all k\in[n]. Again, note that when \sigma_{k}(\mathbf{\bar{S}}\mathbf{A})\leq\epsilon n this is trivially true. So we consider all k\in[n] such that \sigma_{k}(\mathbf{\bar{S}}\mathbf{A})>\epsilon n. Setting \mu_{k}=\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A})), L=\frac{2n^{2}}{s} (as we have conditioned on E_{2}) and \Delta=\frac{\epsilon n}{\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})} (note that \Delta<1) in Theorem 11, we get:

\operatorname*{\mathbb{P}}\left(\lambda_{k}((\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})^{T}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}}))\leq(1-\Delta)\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\right)\leq k\cdot e^{-c\frac{\Delta^{2}\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))}{L}}\leq k\cdot e^{-c\frac{\epsilon^{2}n^{2}}{\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))}\cdot\frac{\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))}{(n^{2}/s)}}

where c is some constant. So, for s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}), for any k, we have \lambda_{k}((\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})^{T}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}}))=\sigma_{k}^{2}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})\geq(1-\Delta)\sigma_{k}^{2}(\bar{\mathbf{S}}\mathbf{A}) with probability at least 1-\frac{\delta}{n}. Taking a square root on both sides we get \sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})\geq\sqrt{1-\Delta}\,\sigma_{k}(\bar{\mathbf{S}}\mathbf{A})\geq(1-\Delta)\sigma_{k}(\bar{\mathbf{S}}\mathbf{A})=\sigma_{k}(\bar{\mathbf{S}}\mathbf{A})-\epsilon n. Taking a union bound over all k with \sigma_{k}(\bar{\mathbf{S}}\mathbf{A})>\epsilon n, \sigma_{k}(\bar{\mathbf{S}}\mathbf{A})-\epsilon n\leq\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}}) holds for all such k with probability at least 1-\delta.

Case 2: We now prove \sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})\leq\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})+\epsilon n for all k\in[n]. We again first consider the case \sigma_{k}(\mathbf{\bar{S}}\mathbf{A})\leq\epsilon n. Setting \mu_{k}=\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A})), L=\frac{2n^{2}}{s} and \Delta=\frac{\epsilon^{2}n^{2}}{\sigma^{2}_{k}(\mathbf{\bar{S}}\mathbf{A})} (note that \Delta\geq 1) in Theorem 11:

(λk((𝐒¯𝐀𝐓¯)T(𝐒¯𝐀𝐓¯))(1+Δ)λk((𝐒¯𝐀)T(𝐒¯𝐀)))\displaystyle\operatorname*{\mathbb{P}}\left(\lambda_{k}((\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})^{T}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}}))\geq(1+\Delta)\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))\right) necΔλk((𝐒¯𝐀)T(𝐒¯𝐀))L\displaystyle\leq n\cdot e^{-\frac{c\Delta\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))}{L}}
necϵ2n2λk((𝐒¯𝐀)T(𝐒¯𝐀))λk((𝐒¯𝐀)T(𝐒¯𝐀))(n2/s)\displaystyle\leq n\cdot e^{-\frac{c\epsilon^{2}n^{2}}{\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))}\cdot\frac{\lambda_{k}((\mathbf{\bar{S}}\mathbf{A})^{T}(\mathbf{\bar{S}}\mathbf{A}))}{(n^{2}/s)}}

Then, as in the case \sigma_{k}(\mathbf{A})\leq\epsilon n of Case 2 for \bar{\mathbf{S}}\mathbf{A}, taking square roots on both sides and applying a union bound, we get \sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})\leq\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})+\epsilon n for all k\in[n] such that \sigma_{k}(\bar{\mathbf{S}}\mathbf{A})\leq\epsilon n with probability at least 1-\delta for s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}). The case \sigma_{k}(\bar{\mathbf{S}}\mathbf{A})>\epsilon n is handled analogously to the case \sigma_{k}(\mathbf{A})>\epsilon n above: we set \Delta=\frac{\epsilon n}{\sigma_{k}(\bar{\mathbf{S}}\mathbf{A})}, apply Theorem 11, and take square roots on both sides to get \sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})\leq\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})+\epsilon n with probability 1-\delta for all such k\in[n], for s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}). Thus, with probability 1-2\delta, conditioned on the event E_{2}, we have \sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})\leq\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})+\epsilon n for all k\in[n]. Finally, via a union bound over the two cases above, and conditioned on E_{2}, with probability at least 1-2\delta for s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}) we get, for all k\in[n],

|σk(𝐒¯𝐀𝐓¯)σk(𝐒¯𝐀)|ϵn.\displaystyle|\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})-\sigma_{k}(\mathbf{\bar{S}}\mathbf{A})|\leq\epsilon n. (57)

Thus, taking a union bound over all the cases above (including E_{2}), from equations (56) and (57) and via the triangle inequality, we get |\sigma_{k}(\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}})-\sigma_{k}(\mathbf{A})|\leq 2\epsilon n with probability at least 1-c\delta (where c is a small constant) for s\geq O(\frac{\log(n/\delta)}{\epsilon^{2}}). Adjusting \epsilon and \delta by constant factors gives the final bound. ∎
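The construction in Theorem 12 is straightforward to instantiate. The numpy sketch below samples rows and columns independently, rescales the kept block by n/s (the product of the two \sqrt{n/s} factors), and pads its singular values with zeros so that they correspond to those of the n\times n matrix \mathbf{Z}=\bar{\mathbf{S}}\mathbf{A}\bar{\mathbf{T}}; the function name and the padding convention are illustrative assumptions.

```python
import numpy as np

def sampled_singular_values(A, s, rng=None):
    """Sketch of Theorem 12: keep each row and each column independently with
    probability s/n, rescale the kept block by n/s (= sqrt(n/s) from each
    side), and return its singular values padded with zeros to length n."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    rows = np.flatnonzero(rng.random(n) < s / n)
    cols = np.flatnonzero(rng.random(n) < s / n)
    Z_block = (n / s) * A[np.ix_(rows, cols)]
    if rows.size and cols.size:
        sv = np.linalg.svd(Z_block, compute_uv=False)
    else:
        sv = np.array([])                       # degenerate case: nothing sampled
    return np.concatenate([sv, np.zeros(n - sv.size)])  # estimates of sigma_i(A)
```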

Remark on Rectangular Matrices: Though we have considered 𝐀\mathbf{A} to be a square matrix for simplicity, notice that Theorem 12 also holds for any arbitrary (non-square) matrix 𝐀n×m\mathbf{A}\in\mathbb{R}^{n\times m}, with nn replaced by max(n,m)\max(n,m) in the sample complexity bound.

Remark on Non-Uniform Sampling: As discussed in Section 1.3.1, simple non-uniform random submatrix sampling via row/column sparsities or norms does not suffice to estimate the singular values up to the improved error bounds of \epsilon\sqrt{\operatorname{nnz}(\mathbf{A})} or \epsilon\|\mathbf{A}\|_{F}. A more complex strategy, such as the zeroing out used in Theorems 2 and 3, must be used. It is worth noting that, following the same proof as Theorem 12, it is easy to show that if \mathbf{\bar{S}} is sampled according to row norms or sparsities and appropriately weighted, then the singular values of \mathbf{\bar{S}}\mathbf{A} do approximate those of \mathbf{A} up to these improved error bounds. The proof breaks down when analyzing \mathbf{\bar{S}}\mathbf{A}\mathbf{\bar{T}}: for the proof to go through, \mathbf{\bar{T}} would have to be sampled according to the row norms/sparsities of \mathbf{\bar{S}}\mathbf{A}, not \mathbf{A}. However, in general, these sampling probabilities can differ significantly between \mathbf{\bar{S}}\mathbf{A} and \mathbf{A}.