Hutchinson’s Estimator is Bad at Kronecker-Trace-Estimation
Abstract
We study the problem of estimating the trace of a matrix that can only be accessed through Kronecker-matrix-vector products. That is, for any Kronecker-structured vector , we can compute . We focus on the natural generalization of Hutchinson’s Estimator to this setting, proving tight rates for the number of matrix-vector products this estimator needs to find an approximation to the trace of .
We find an exact equation for the variance of the estimator when using a Kronecker of Gaussian vectors, revealing an intimate relationship between Hutchinson’s Estimator, the partial trace operator, and the partial transpose operator. Using this equation, we show that when using real vectors, in the worst case, this estimator needs products to recover an approximation of the trace of any PSD , with a matching lower bound for certain PSD . However, when using complex vectors, this can be exponentially improved to . We show that Hutchinson’s Estimator converges slowest when itself also has Kronecker structure. We conclude with some theoretical evidence suggesting that, by combining Hutchinson’s Estimator with other techniques, it may be possible to avoid the exponential dependence on .
1 Introduction
A ubiquitous problem in scientific computing is that of estimating the trace of a matrix that can only be accessed via matrix-vector products. In other words, we are given access to an oracle that can evaluate for any , and our goal is to approximate with as few oracle queries as possible. This matrix-vector oracle model or implicit matrix model is a common computational framework used in much of the numerical linear algebra community [Sun et al., 2021a, Chen and Hallman, 2022, Halikias and Townsend, 2022, Bakshi et al., 2022, Persson et al., 2022]. However, in many applications, we cannot efficiently evaluate for all vectors . Instead, only some matrix-vector products can be efficiently computed.
In this paper, we study trace estimation in the Kronecker-matrix-vector oracle model, also known as the rank-one vector model, where we can evaluate for any vector that can be written as a Kronecker product of smaller vectors, i.e. for some vectors . Here, denotes the Kronecker product, which is a common primitive in many tensor-based algorithms and in much of quantum physics. The number of vectors () and the number of entries () in each of are parameters of the model.
Our study is motivated by recent literature on entanglement estimation in quantum states [Feldman et al., 2022]. This computational model has also been studied in other applied domains [Chen and Hallman, 2022, Kueng et al., 2017]. An important setting where we can efficiently compute Kronecker-matrix-vector products is when is the sum of a small number of matrices, where each summand is a Kronecker product of matrices. Such matrices occur as Hamiltonians of various quantum models [Wang et al., 2023]. Another instance in which we can efficiently compute Kronecker-matrix-vector products is when is the flattening of a tensor network. This is the case in [Feldman et al., 2022], where the oracle is efficiently implemented when describes a density operator that represents certain systems of quantum particles. Recently, there has been some effort to find algorithms that can efficiently solve linear algebra problems using this oracle, although the papers do not explicitly define such a Kronecker-matrix-vector oracle [Bujanovic and Kressner, 2021, Avron et al., 2014, Ahle et al., 2020, Sun et al., 2021b]. In this paper, we focus on the fundamental problem of trace estimation.
The trace estimation problem in the Kronecker-matrix-vector oracle model can be exactly solved using oracle queries. Since each standard basis vector obeys Kronecker structure, we can exactly look up each diagonal entry of and evaluate . So, our goal is to estimate the trace of to relative error using matrix-vector products. In the standard non-Kronecker setting, the most fundamental algorithm for trace estimation is Hutchinson’s Estimator [Girard, 1987, Hutchinson, 1989]:
If the distribution of random vectors has and , then . Further, if is a Gaussian or Rademacher distribution, then . In particular, under the common assumption that is PSD, so that , we then find that the standard deviation of Hutchinson’s estimator is
when we take queries. For large dimension , this very simple algorithm is much more efficient than exactly computing the diagonal entries of , since the estimator has no dependence on the size of at all.
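As a concrete reference point, here is a minimal NumPy sketch of the classical estimator with Rademacher query vectors; the function names and the matrix-vector interface are our own illustration rather than notation from the paper.

```python
import numpy as np

def hutchinson(matvec, d, ell, rng=None):
    """Classical Hutchinson estimate of tr(A) from ell matrix-vector products.

    matvec: function x -> A @ x, the only access we have to A.
    d:      dimension of A.
    ell:    number of random queries.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(ell):
        x = rng.choice([-1.0, 1.0], size=d)   # Rademacher query vector
        total += x @ matvec(x)                # one quadratic form x^T A x
    return total / ell

# Example: estimate the trace of an explicit PSD test matrix.
rng = np.random.default_rng(0)
G = rng.standard_normal((100, 100))
A = G @ G.T
print(np.trace(A), hutchinson(lambda x: A @ x, d=100, ell=500))
```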
This estimator can be naturally generalized to the Kronecker-matrix-vector oracle model by simply changing the distribution . Instead of taking to be a distribution over Gaussian or Rademacher vectors, we now take to be the Kronecker of vectors drawn from some other isotropic zero-mean distribution :
Notably, if and , then we also immediately have that and , so that . This simple estimator, which we refer to as the Kronecker-Hutchinson Estimator, fits the Kronecker-matrix-vector oracle model, and we may hope that it is efficient enough to estimate the trace of very quickly. At a high level, we answer this in the negative, proving novel and very tight bounds on the variance of the Kronecker-Hutchinson Estimator, showing that the sample complexity of the estimator has an unavoidable exponential dependence on .
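A matching sketch of the Kronecker-Hutchinson Estimator, again with illustrative names of our own: the only change from the classical estimator is that each query vector is assembled as a Kronecker product of independent Rademacher factors, so every query is legal in the Kronecker-matrix-vector oracle model.

```python
import numpy as np
from functools import reduce

def kron_hutchinson(kron_matvec, d, k, ell, rng=None):
    """Kronecker-Hutchinson estimate of tr(A) for A of size d^k x d^k.

    kron_matvec: function x -> A @ x, which we only ever call on vectors x
                 that are Kronecker products of k vectors of length d.
    """
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(ell):
        factors = [rng.choice([-1.0, 1.0], size=d) for _ in range(k)]
        x = reduce(np.kron, factors)          # Kronecker-structured query vector
        total += x @ kron_matvec(x)
    return total / ell

# Sanity check on a small explicit matrix (d = 2, k = 3, so A is 8 x 8).
rng = np.random.default_rng(1)
G = rng.standard_normal((8, 8))
A = G @ G.T
print(np.trace(A), kron_hutchinson(lambda x: A @ x, d=2, k=3, ell=20000))
```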
1.1 Prior Work
We first remark there is a great deal of literature on trace estimation without the Kronecker constraint. The design of Hutchinson’s Estimator is attributed to [Girard, 1987, Hutchinson, 1989], and some of the core papers that analyze the estimator are [Avron and Toledo, 2011, Roosta-Khorasani and Ascher, 2015, Cortinovis and Kressner, 2021].
Although a relatively new topic, trace estimation in the Kronecker-matrix-vector oracle model has recently been explored in the literature. From the physics literature, the works of [Feldman et al., 2022] and [Huang et al., 2020] analyze trace estimators when , and show some empirical results for their performance. From a theoretical perspective, [Vershynin, 2020] provides bounds that can be used to analyze the Kronecker-Hutchinson Estimator. However, the paper provides very coarse bounds for this problem, showing that if then we have , where is the operator norm of . To obtain a relative error , this suggests an overall sample complexity of , which is generally more than queries.
The research of [Bujanovic and Kressner, 2021] directly analyzes the Kronecker-Hutchinson Estimator when , showing first that samples suffice to ensure that , and second that samples suffice to ensure that . Lastly, Remark 2 of the arXiv version of [Ahle et al., 2020] is a result on using a Kronecker-Johnson-Lindenstrauss transform, and implies that samples suffice for the Kronecker-Hutchinson Estimator with Rademacher vectors to achieve relative error . While this worst-case bound is tight, the paper gives no way to understand when this bound is loose, and for what matrices fewer samples are needed. Further, all the aforementioned theoretical papers rely on heavy-duty analysis, for instance involving Gaussian and Rademacher chaos theory. In contrast, our contributions give tighter results using simpler techniques.
We also note that [Ahle et al., 2020] includes a Johnson-Lindenstrauss-style guarantee for working with Kronecker-shaped data using a sketch that has rows. However, this sketch does not actually operate in the Kronecker-matrix-vector oracle model.
1.2 Our Contributions
Our core technical contribution is an exact expression of the variance of Hutchinson’s Estimator when our query vectors are formed as a Kronecker product of Gaussian vectors. This also provides an upper bound on the variance when the estimator instead uses a Kronecker product of Rademacher vectors. For PSD matrices, our results imply a worst-case upper bound of on the number of samples needed to estimate the trace of to relative error . We match this upper bound with a lower bound showing that the Kronecker-Hutchinson Estimator with iid zero-mean unit-variance vectors has to use samples to achieve standard deviation . Lastly, we show that if we are allowed to use complex Kronecker-matrix-vector products instead of real matrix-vector products (i.e. if where ), then this rate improves to .
To more concretely state our core technical contribution, we recall the partial trace operator from quantum physics. In quantum physics, a Hermitian density matrix is, roughly speaking, used to represent the joint probability distribution for a set of particles. Suppose these particles are partitioned into two sets, so that . Further suppose that Alice can observe the particles in , and that Bob can observe the particles in , but they cannot observe each other’s particles. Then, the partial trace of with respect to , denoted , describes the marginal distribution of the particles that Bob observes. This is often called “tracing out the subsystem ”.
We include a general formal definition of the partial trace in Section 2, but a rough intuition is easily gained by examining the case. Here, we can break down as a block matrix full of :
Then, the partial trace of with respect to is the sum of the diagonal blocks, i.e. , and the partial trace of with respect to is the matrix that contains the trace of each block matrix, i.e.
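For concreteness, the two partial traces in the two-subsystem case can be computed directly from this block view. The following NumPy sketch, with our own function names and with the first subsystem indexing the blocks, implements both and checks that each preserves the trace.

```python
import numpy as np

def partial_trace_first(A, d1, d2):
    """Trace out the first subsystem: sum the d1 diagonal d2 x d2 blocks of A."""
    A4 = A.reshape(d1, d2, d1, d2)      # A[(i,a),(j,b)] -> A4[i, a, j, b]
    return np.einsum('iaib->ab', A4)    # sum over the block-diagonal index i

def partial_trace_second(A, d1, d2):
    """Trace out the second subsystem: replace each d2 x d2 block by its trace."""
    A4 = A.reshape(d1, d2, d1, d2)
    return np.einsum('iaja->ij', A4)

# Both partial traces preserve the trace of A.
A = np.arange(36.0).reshape(6, 6)
print(np.trace(A),
      np.trace(partial_trace_first(A, 2, 3)),
      np.trace(partial_trace_second(A, 2, 3)))
```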
We will also need the partial transpose operator, which is likewise common in quantum physics but lacks an intuitive physical meaning analogous to the one the partial trace enjoys. Like the partial trace, the partial transpose can also be taken over any subset of particles . We denote the partial transpose of with respect to as , and call this “transposing the subsystem ”. We give a formal definition of the partial transpose in Section 2, but when , the two partial transposes of are:
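A matching sketch for the two partial transposes in the two-subsystem case, again with our own function names: composing both recovers the full transpose, and either one leaves the diagonal, and hence the trace, unchanged.

```python
import numpy as np

def partial_transpose_first(A, d1, d2):
    """Transpose the first subsystem: swap the block indices (i, j), keep (a, b)."""
    A4 = A.reshape(d1, d2, d1, d2)
    return A4.transpose(2, 1, 0, 3).reshape(d1 * d2, d1 * d2)

def partial_transpose_second(A, d1, d2):
    """Transpose the second subsystem: transpose each d2 x d2 block in place."""
    A4 = A.reshape(d1, d2, d1, d2)
    return A4.transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

# Composing both partial transposes gives the full transpose,
# and each partial transpose preserves the trace.
A = np.arange(36.0).reshape(6, 6)
B = partial_transpose_second(partial_transpose_first(A, 2, 3), 2, 3)
print(np.allclose(B, A.T), np.trace(partial_transpose_first(A, 2, 3)) == np.trace(A))
```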
Having set up both the partial trace and the partial transpose, we can now state our core theorem:
Theorem 1.
Fix , and let . Let , where each is independent and is either a Rademacher vector or a Gaussian vector. Let be the average of all partial transposes of . Then, . Furthermore, if all are , then this expression is an equality.
The matrix seems odd at a glance, so we first justify why it is natural. For the classical setting with Gaussian vectors, we can write that for all symmetric matrices [Avron and Toledo, 2011, Meyer, 2021]. For asymmetric matrices, this equality does not hold. Instead, we let , and note that for all . Since is symmetric, we then find that . Theorem 1 generalizes this idea to all possible ways to partially transpose across all possible subsystems .
Next, note that , and that symmetric matrices have . So, we know that holds for all matrices , with equality for symmetric . This bound can be easier to work with, since it does not require worrying about . For large , we similarly bound , providing a more approachable version of Theorem 1:
Theorem 2.
Fix , and let . Let , where each is independent and is either a Rademacher vector or a Gaussian vector. Then, . Furthermore, if all are and if , then this expression is an equality.
Note that many reasonable matrices will have . For instance, recall that the Kronecker-matrix-vector oracle model may be used when is the sum of a small number of matrices, where each summand is a Kronecker product of matrices. If each summand is the Kronecker product of symmetric matrices, then we have .
We demonstrate the utility of Theorem 2 by using it to show the following guarantee on estimating the trace of a random rank-one matrix:
Theorem 3.
Fix . Let where is a vector of iid standard normal Gaussians. Then, with probability , we know that . In particular, with probability , we know that the Kronecker-Hutchinson Estimator returns a relative error estimate to the trace of if .
This rate of is obviously expensive, but our sample complexity is lower than what is achievable with any other technique that we are aware of. Nevertheless, even without , this sum of partial traces may sometimes still be too unwieldy to work with. For such cases, we provide a worst-case bound for PSD matrices:
Theorem 4.
Fix , and let be a PSD matrix. Let , where each is either a Rademacher vector or a Gaussian vector. Then, .
In particular, by Chebyshev’s Inequality, Theorem 4 implies that samples suffice to estimate to relative error . For the case of [Bujanovic and Kressner, 2021], when , we get the following:
Corollary 5.
Let . Let , where each is independent and is either a Rademacher vector or a Gaussian vector. Let . Then, . If is PSD, with probability , we know that the Kronecker-Hutchinson Estimator returns a relative error estimate to the trace of if .
This success probability can be boosted to for an arbitrary , at the cost of picking up a dependence in the sample complexity, by returning the median of iid trials of . We further show that the expensive rate of is essentially tight in the worst case for the Kronecker-Hutchinson Estimator:
Theorem 6.
Fix . Consider Hutchinson’s Estimator run with vectors where are vectors of iid real zero-mean unit-variance entries. Then, there exists a PSD matrix such that Hutchinson’s Estimator needs samples in order to have standard deviation .
1.2.1 Complex Kronecker-Matrix-Vector Products
Although certainly cheaper than exactly computing the trace with queries to the diagonal of , the rate of is prohibitively expensive for large . Our next set of results shows that the bound improves if we can issue complex queries instead of real ones, that is, when where . It seems realistic to say that any software implementation of the Kronecker-matrix-vector oracle could handle complex queries instead of real ones.
Note that for , there is almost no distinction between the real and complex oracle models, as any complex matrix-vector product can be simulated by running two real matrix-vector products. However, this is no longer the case for large values of , and real simulation becomes expensive. To see this, note that the real part of the Kronecker product of complex vectors might not have a Kronecker structure. Because of this, it may take real-valued Kronecker-matrix-vector products to simulate a single complex-valued Kronecker-matrix-vector product. In a vague, intuitive sense, the complex Kronecker-structured vectors are exponentially less constrained than the real Kronecker-structured vectors. Running with this intuition, we prove the following analogues of Theorems 1, 2, 4, 3 and 6:
Theorem 7.
Fix , and let . Let , where each is either a complex Rademacher vector or a complex Gaussian vector. That is, we have where and are iid real-valued vectors of either Rademacher or standard normal Gaussian entries. Let be the average of all partial transposes of . Then, . If are complex Gaussians, then the first expression is an equality. If we further know that , then both expressions are equalities. Further, if is PSD, then we have .
Theorem 8.
Fix . Let where is a vector of iid standard normal Gaussians. Then, with probability , we know that . In particular, with probability , we know that the Kronecker-Hutchinson Estimator returns a relative error estimate to the trace of if .
Theorem 9.
Fix . Consider Hutchinson’s Estimator run with vectors where are vectors of iid complex zero-mean unit-variance entries. Then, there exists a PSD matrix such that Hutchinson’s Estimator needs samples in order to have standard deviation .
By applying Chebyshev’s Inequality to Theorems 4 and 7, we see an exponential speedup from queries in the real-valued case to queries in the complex-valued case. For large , this gap from to is a very significant speedup, and we are not aware of any prior work that discusses this potential.
Another way to view this speedup is by comparing the Kronecker-Hutchinson Estimator to just summing the diagonal of , which takes exactly queries. When , the real-valued Kronecker-Hutchinson Estimator is exponentially more expensive than just summing the diagonal directly. However, the complex-valued Kronecker-Hutchinson Estimator is only times more expensive in the worst case. This is another way to frame the power of the complex-valued estimator: it is at most polynomially slower than summing the diagonal, while the real-valued estimator can be exponentially slower.
We can also directly compare Theorems 3 and 8. In the real-valued case, the sample complexity grows as , while in the complex-valued case, the sample complexity grows as . Notably, if then this rate grows as for real-valued queries and for complex-valued queries. This shows an exponential gap between our bounds, obtained only by leveraging the difference between complex-valued and real-valued vectors.
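As an illustration of this gap, the small Monte Carlo sketch below compares the per-sample variance of the real and complex estimators. The helper names are our own, and we use the all-ones matrix as the test instance, mirroring the Kronecker-structured hard instances in our lower bounds; this is a sanity check under those assumptions, not part of the formal argument.

```python
import numpy as np
from functools import reduce

def single_sample(A, d, k, rng, complex_queries=False):
    """One Kronecker-Hutchinson sample x^* A x with Gaussian (or complex Gaussian) factors."""
    def gaussian(size):
        if complex_queries:
            return (rng.standard_normal(size) + 1j * rng.standard_normal(size)) / np.sqrt(2)
        return rng.standard_normal(size)
    x = reduce(np.kron, [gaussian(d) for _ in range(k)])
    return np.real(np.conj(x) @ (A @ x))

# Test instance with Kronecker structure (the all-ones matrix); here tr(A) = d^k.
d, k = 2, 8
A = np.ones((d ** k, d ** k))
rng = np.random.default_rng(0)
real_var    = np.var([single_sample(A, d, k, rng)                       for _ in range(5000)])
complex_var = np.var([single_sample(A, d, k, rng, complex_queries=True) for _ in range(5000)])
print("tr(A) =", d ** k, "| real query variance:", real_var, "| complex query variance:", complex_var)
```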
1.2.2 Additional Results and Hope for a Faster Algorithm
We complement the aforementioned results with a few informative additions. First, we show a lower bound against all Kronecker-matrix-vector algorithms. By adapting the techniques of [Braverman et al., 2020] and [Jiang et al., 2021], we show that any Kronecker-matrix-vector algorithm that estimates the trace of with standard deviation must make at least oracle queries.
Second, we note that the lower bounds in Theorems 6 and 9 both use a construction of such that . That is, the Kronecker-Hutchinson Estimator converges slowest for matrices that have a Kronecker structure. However, if we truly are given with the promise that for some unknown matrices , then in Section 7 we show that the trace of can be recovered exactly using just Kronecker-matrix-vector products. Similarly, Theorems 3 and 8 suggest that the Kronecker-Hutchinson Estimator converges exponentially slowly for random rank-one matrices. But in this rank-one case, we show that we can exactly recover the trace of with a single Kronecker-matrix-vector query!
This trend, where the matrices that make the Kronecker-Hutchinson Estimator converge slowly can have their traces efficiently estimated by another algorithm, is reminiscent of the setup for the analysis in [Meyer et al., 2021]. In that paper, the authors recall the analysis of the standard deviation of Hutchinson’s Estimator:
where the first inequality uses that for PSD , and the second inequality comes from picking the sample complexity . Since the first inequality is only tight for matrices that are nearly low-rank, the authors propose an algorithm which combines Hutchinson’s Estimator with a low-rank approximation method that can efficiently compute the trace of nearly-low-rank matrices. This way, their algorithm has a worst-case matrix-vector complexity of , while Hutchinson’s Estimator is stuck with .
Our analysis similarly characterizes two cases when the Kronecker-Hutchinson Estimator converges slowly: when is either low-rank or has Kronecker structure. We also show that we can compute the trace of such matrices very efficiently. This suggests that a more involved algorithm can interpolate between these low query complexities and perhaps even avoid any exponential dependence on at all. We leave this as an important problem for future work.
Lastly, we remark that the analysis from [Meyer et al., 2021] crucially uses a tight bound for the variance of Hutchinson’s Estimator in terms of . We believe that our core results, Theorems 1 and 2, can serve an analogous role in constructing a faster trace-estimating algorithm.
2 Preliminaries
We use capital bold letters to denote matrices, lowercase bold letters to denote vectors, and lowercase non-bold letters to denote scalars. is the set of reals, and is the set of complex numbers. and denote the real and imaginary parts of a vector or scalar. denotes the transpose of a vector, while denotes the conjugate transpose of a matrix or vector. denotes our input matrix. We use the bracket notation to denote the entry of matrix . We let denote the standard basis vector, whose exact dimension may depend on the context. We let always refer to the identity matrix. We let denote the Frobenius norm, the trace, the partial trace, and the partial transpose. We let be the set of integers from 1 to .
2.1 Kronecker product
The Kronecker product of two matrices and is , where
We also use the shorthands , as well as . The Kronecker product has three key properties that we repeatedly use:
Lemma 10 (Mixed-Product Property (Folklore)). For matrices of compatible dimensions, $(\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{A}\mathbf{C}) \otimes (\mathbf{B}\mathbf{D})$.
Lemma 11 (Transposing (Folklore)). $(\mathbf{A} \otimes \mathbf{B})^\top = \mathbf{A}^\top \otimes \mathbf{B}^\top$.
Lemma 12 (Fact 7.4.1 in [Bernstein, 2011]).
Let and . Then, .
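These Kronecker facts are easy to sanity-check numerically. The snippet below, with our own variable names, verifies the mixed-product and transpose properties on random instances, together with the standard trace identity for Kronecker products of square matrices (a related folklore fact; it is not necessarily the exact statement of Lemma 12).

```python
import numpy as np

rng = np.random.default_rng(0)
A, C = rng.standard_normal((3, 4)), rng.standard_normal((4, 2))
B, D = rng.standard_normal((5, 6)), rng.standard_normal((6, 2))

# Mixed-product property (Lemma 10): (A kron B)(C kron D) = (AC) kron (BD).
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

# Transposing (Lemma 11): (A kron B)^T = A^T kron B^T.
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))

# Related trace identity: tr(S kron T) = tr(S) tr(T) for square S, T.
S, T = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
print(np.isclose(np.trace(np.kron(S, T)), np.trace(S) * np.trace(T)))
```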
At one point, we will also require a short lemma about the squared Frobenius norm and the Kronecker product. This lemma is roughly equivalent to noting that in the block matrix decomposition of when in Section 1.2.
Lemma 13.
Let and for any . Then, we have
Proof.
First, recall there exists a permutation matrix such that for all . Now let . Then, expanding as a block matrix of matrices and noting that , we get that
We can also notice that
Then, noting that the Frobenius norm is invariant to permutation, we find that
where the last equality comes from again permuting the matrices inside the norm. ∎
2.2 The Partial Trace Operator
We take a moment to formalize the partial trace operator and characterize some useful properties of it. We start by defining the partial trace with respect to a single subsystem:
Definition 14.
Let . Then, the partial trace of with respect to its subsystem is
(1)
where these identity matrices are matrices, and where is the standard basis vector.
Equivalently, the partial trace operator can be defined as the unique linear operator which guarantees that for all , , and , we have that .
We will often consider the partial trace of with respect to a set of subsystems . We can think of this as either composing the operation in Equation 1 many times, or as extending the summation in Equation 1 to sum over all basis vectors that span the product of subspaces defined in . To see this equivalence, consider the partial trace of with respect to :
(Lemma 10)
Next, note that is a standard basis vector in uniquely identified by the pair . That is, we can replace this double sum above with a single sum over standard basis vectors in :
( now)
This now looks remarkably similar to Definition 14, but with a larger notion of “subsystem” that the standard basis vector acts on. Formally, we define the partial trace of with respect to a set as follows:
Definition 15.
Let , and let . Let be the indices in , sorted so that . Then, the partial trace of with respect to the subsystem is
In the special case that , we denote this partial trace as .
In principle, the order of subscripts does not really matter. Tracing out the first subsystem and then tracing out the second subsystem produces the exact same matrix as tracing out the second subsystem and then tracing out the first subsystem. However, notationally, a non-ascending order becomes messy since removing the second subsystem changes the indices. To see this messiness, note that the above statement would be formalized as . To avoid this issue, we always expand the partial traces in in sorted order as in Definition 15.
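The following sketch implements the partial trace over an arbitrary set of subsystems by viewing the matrix as a 2k-way tensor and contracting one traced subsystem at a time, tracing the highest index first in line with the ordering discussion above; the function name and argument conventions are our own.

```python
import numpy as np

def partial_trace(A, dims, traced):
    """Partial trace of A over the subsystems listed in `traced` (0-indexed).

    A:      square matrix of size prod(dims), acting on k subsystems of sizes dims[i].
    dims:   list [d_1, ..., d_k] of subsystem dimensions.
    traced: set of subsystem indices to trace out.
    """
    k = len(dims)
    T = A.reshape(dims + dims)              # axes: (row_1, ..., row_k, col_1, ..., col_k)
    m = k
    for s in sorted(traced, reverse=True):  # highest index first, so earlier indices stay valid
        T = np.trace(T, axis1=s, axis2=m + s)
        m -= 1
    remaining = int(np.prod([d for i, d in enumerate(dims) if i not in traced]))
    return T.reshape(remaining, remaining)

# The partial trace is trace-preserving (Lemma 16), for any choice of traced subsystems.
rng = np.random.default_rng(0)
dims = [2, 3, 2]
A = rng.standard_normal((12, 12))
print(np.isclose(np.trace(A), np.trace(partial_trace(A, dims, {0, 2}))))
```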
Moving on, a vital property of the partial trace is that it is trace-preserving:
Lemma 16 (Folklore).
For all and , we have .
2.3 Post-Measurement Reduced Density Matrices
Our analysis also heavily involves post-measurement reduced density matrices, which we now define, and whose long name originates from quantum physics. As mentioned in Section 1.2, if is a matrix that describes the joint probability distribution of many particles (a density matrix), then describes the joint probability distribution of the particles that Bob can view. The Post-measurement Reduced Density Matrix (PMRDM) of with respect to vector describes the joint distribution of the particles that Bob can view given that Alice measured her particles in the direction of the vector . (For the purpose of understanding the paper, there is no particular need to understand this intuition; those interested in learning more are recommended to read through Chapter 6 of [Aaronson, 2018].) For brevity, we refer to this matrix as the PMRDM.
Definition 17.
Let , , and . Then the PMRDM of after measuring along the subsystem is
Further let . Then, the PMRDM of after measuring according to in the first subsystems of is
where .
Notice that the notation for the PMRDM does not explicitly show the vector that defines the measurement. This is done for visual clarity, and is safe because the meaning of the vector will always be clear from context. (Namely, we always take to be the term in the expansion of , where has either iid Gaussian or Rademacher entries.) Further, while we could define the PMRDM with respect to an arbitrary set of subsystems , as done in Definition 15, our proofs only need to examine the PMRDM when looking at the first subsystems, so for simplicity we only define the PMRDM for that case.
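A sketch of the PMRDM for the case used in our proofs, measuring the first subsystem along a single vector. Reading the PMRDM as a compression of the matrix by that vector tensored with an identity is our own interpretation of Definition 17, and the final check illustrates the relationship to the partial trace that the next lemma formalizes: averaging the PMRDM over an isotropic measurement vector recovers the partial trace over the first subsystem.

```python
import numpy as np

def pmrdm_first(A, x1, d1):
    """(Unnormalized) PMRDM of A after measuring the first subsystem along x1:
    the compression (x1 kron I)^T A (x1 kron I), acting on the remaining subsystems."""
    d_rest = A.shape[0] // d1
    M = np.kron(x1[:, None], np.eye(d_rest))   # shape (d1 * d_rest, d_rest)
    return M.T @ A @ M

# Averaging over isotropic measurement vectors recovers the partial trace
# over the first subsystem (up to Monte Carlo error).
rng = np.random.default_rng(0)
d1, d_rest = 3, 4
A = rng.standard_normal((d1 * d_rest, d1 * d_rest))
avg = np.mean([pmrdm_first(A, rng.standard_normal(d1), d1) for _ in range(100000)], axis=0)
A4 = A.reshape(d1, d_rest, d1, d_rest)
print(np.allclose(avg, np.einsum('iaib->ab', A4), atol=0.1))
```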
We need one important property from the PMRDM, which relates it to the partial trace:
Lemma 18.
Let , , and . Define . Then, .
Proof.
2.4 The Partial Transpose Operator
Finally, we formalize the partial transpose operator, and characterize a useful property of it. We start by defining the partial transpose of with respect to a single subsystem.
Definition 19.
Let and . Then the partial transpose of with respect to the subsystem is
Equivalently, the partial transpose operator is the unique linear operator which guarantees that for all , , , we have that . To verify this equivalence, notice that we can express the transpose of a matrix as
Then, we see that
Like the partial trace, we will often be interested in taking the partial transpose of with respect to a set of subsystems . We extend the definition of the partial transpose directly to this case:
Definition 20.
Let and . Let be the indices in . Then, the partial transpose of with respect to the subsystem is
Notably, the order of partial transposes does not matter in this case. That is, we have . We next establish two useful properties of the partial transpose:
Lemma 21.
Let and . Then, .
Proof.
We directly compute
∎
Lemma 22.
Let , , , and . Then, for all we have
Proof.
We expand the right-hand-side:
where we remove the sum by noting that and are zero unless and . ∎
We now have enough tools to exactly understand the variance of the Kronecker-Hutchinson Estimator.
3 The Exact Variance of Kronecker-Hutchinson Estimator
We begin by recalling the exact variance of the classical () Hutchinson’s Estimator:
Lemma 23 ([Girard, 1987, Hutchinson, 1989, Avron and Toledo, 2011, Meyer, 2021]).
Let and let be a vector of either iid Rademacher or standard normal Gaussian entries. Let . Then, and . The first inequality is tight if is Gaussian, and the second is tight if is symmetric.
This factor of 2 is important for our analysis, and is the base of the exponent in Theorem 1. The key to our proof technique is to use Lemma 12 to expand
(2)
which then is just a classical () Hutchinson’s Estimate of the trace of . Lemma 23 then gives (for Gaussians) an exact understanding of the variance of this estimator. As a warm-up, we first use this proof technique to show that the Kronecker-Hutchinson Estimator is unbiased, though shorter proofs are certainly possible.
Theorem 24.
Let . Let be iid Rademacher or standard normal Gaussian vectors, and let . Then, .
Proof.
For any ,
(Equation 2)
(Tower Rule)
(Lemma 23)
(Lemma 18)
Repeating this argument times in a row, and letting , we conclude by Lemma 16 that
which completes the proof.
∎
Analyzing the variance of the estimator requires a bit more effort due to the Frobenius norm term and matrix that appear in Lemma 23. More concretely, we approach the analysis of the variance of via the classical formula
Since we already know that , we turn our attention to evaluating . Note that rearranging Lemma 23 gives that
(3) |
with equality if is Gaussian. Then, noting that , and taking the same approach as in the bias analysis, we see that
(Equation 2)
(Tower Rule)
(Equation 3)
(Lemma 18)
where we let . The first term admits a recursive relationship just like in the proof of Theorem 24, so we can easily handle that. The second term, however, requires more effort. The key observation we use is that the squared Frobenius norm of the PMRDM is in fact a sum of the squares of many Kronecker-Hutchinson Estimators:
(Lemma 12)
In particular, this last line is the sum of squares of many Kronecker-Hutchinson Estimators for a wide variety of matrices. Despite these being trace estimates for different matrices, we can just carefully analyze the resulting terms using the Mixed-Product Property (Lemma 10), Lemma 12, and Equation 3. We get the following recurrence relation:
Lemma 25.
Fix , and let for some . Then,
with equality if is a standard normal Gaussian vector.
Proof.
We first extract the vector via the same technique as in Equation 2. Recall that we always take to be the identity in dimensions.
We take a moment to examine the variance of this Hutchinson’s sample. Letting , we get that the variance of the sample is at most
By Lemma 22, we know the term on the right is equivalent to , so that the norm above becomes . Lemma 13 further tells us that the sum of this variance across all is . Returning to our overall analysis, we get
Note that the only inequality applies Equation 3, which is an equality if is a standard normal Gaussian.
∎
Notice that this lemma takes in an arbitrary matrix , instead of the specific choice of from earlier. This is useful, as we apply Lemma 25 repeatedly in the following proof:
Lemma 26.
Fix , and let for some . Let . Then,
with equality if are iid standard normal Gaussian vectors.
Proof.
We prove this by induction over . For notational clarity in this induction, we let . Then, our goal is to prove that for all and . We start with the base case . First note that by Lemma 21. Then Lemma 25 gives us that
Next, for the induction case, we assume the lemma holds for . Since is the average of all partial transposes in the first subsystems of , we can expand . So, for instance, by our induction hypothesis we have that
Therefore, by Lemma 25, by our inductive hypothesis, and by the composition of partial traces, we have that
(4)
(5)
The second inequality applies the induction hypothesis to the matrix in the left term, and to the matrix in the right term.
Further, Equation 5 holds by comparing the first and second terms in Equation 4, where the left sum covers all such that , and where the right term covers all such that . Note that the only inequalities used are Equation 3, Lemma 25, and the induction hypothesis, which are all equalities when considering iid standard normal Gaussian vectors. ∎
We now have all the tools needed to prove the bound on the variance of the Kronecker-Hutchinson Estimator.
Theorem 1 Restated.
Fix , and let . Let , where each is independent and is either a Rademacher vector or a Gaussian vector. Let be the average of all partial transposes of . Then, . Furthermore, if all are , then this expression is an equality.
Proof.
We start again with the observation in Equation 2, which we use to prove the following claim by induction over :
(6) |
where we interpret . The base case for Equation 6 is when , where we immediately have
which completes the base case. Now we assume Equation 6 holds for , and we show that it holds for :
(Lemma 23)
where the last line applies the induction hypothesis. This completes the proof by induction of Equation 6. We can then appeal to Lemma 26 and merge terms together. First, let , and note that . Then,
We proceed by simplifying the partial trace . First, note that is the average of all partial transposes of in the first subsystems. By Lemma 21, we can extend this to be the average of all partial transposes of in across all subsystems , so that
The left term above is the average of all partial transposes of that do not include subsystem . Since where , the right term above is the average of all partial transposes of that do include . So, we find that . We are therefore left with the expression
(7) |
We now focus on the double summation in Equation 7. We start by understanding the inner summation in terms of the index from the outer iteration. Let
so that
We see that is a weighted sum of across all of the form where . That is, we consider all that satisfy two properties:
1. Indices are all included:
2. Index is not included:
We can intuitively think of as a “Gap” that precedes the “Suffix” . Crucially, if appears in a term of , then does not appear in a term of any other . This is because each has a unique gap: for any , let be the smallest value such that . Then is the gap of . We note an edge-case, where if then the gap is and we recover the term in Equation 7.
Next, note that every is scaled by . Since , we know , so we can write . So, we have shown that every appears exactly once in Equation 7, and each term is scaled by . That is,
which completes the proof since
Note that all inequalities in the proof become equalities if we know that are all iid standard normal Gaussians. ∎
3.1 Proof of Theorem 2
The variance expression in Theorem 1 is extremely tight, being an equality in the case that our query vectors are Kroneckers of Gaussian vectors. However, it is not immediately clear how we should interpret this expression. We now prove Theorem 2, which is a more approachable upper bound of Theorem 1. We start with a short lemma:
Lemma 27.
Let and . Then, .
Proof.
Consider any . Then, recall that each standard basis vector can be decomposed as where , , and . Then, we can expand the norm of as
where we remove the sum over and by noting that and are zero unless and . We then complete the lemma by decomposing , where . ∎
We now prove the theorem:
Theorem 2 Restated.
Fix , and let . Let , where each is independent and is either a Rademacher vector or a Gaussian vector. Then, . Furthermore, if all are and if , then this expression is an equality.
Proof.
We prove that , which implies the theorem. First, recall that the partial trace is linear, so that by the triangle inequality we can write
Next, we examine the norm . First, by Lemma 21, we can assume without loss of generality that . Then, for notational simplicity, suppose that . We can then write . To see this, fix any and . By our without-loss-of-generality claim, we know that , and can therefore expand
So, we can pull all of the partial transposes from inside of the partial trace to outside the partial trace. Then, by Lemma 27, we can conclude that
∎
4 Cost of the Kronecker-Hutchinson Estimator for PSD Matrices
Trace estimation is often studied under the assumption that the input matrix is PSD. This assumption is very common and reasonable in practice, and without such an assumption, it is impossible for an algorithm such as Hutchinson’s Estimator to achieve a relative error approximation to the trace of . This is because an indefinite matrix can have while having large Frobenius norm, meaning that Hutchinson’s Estimator would have to perfectly recover the trace of despite having a nonzero variance. This would incur infinite sample complexity. However, if we assume that is PSD, then , and so we know that the standard deviation of the classical Hutchinson’s Estimator is . That is, samples suffice to achieve relative error with at least constant probability.
We now make the same assumption and study the Kronecker-Hutchinson Estimator, showing that samples suffice to estimate the trace of a PSD matrix to relative error. To start, we need a basic property of the partial trace:
Lemma 28.
Let be a PSD matrix, and let . Then, is also PSD.
Proof.
Recall that if is PSD, then is also PSD for any matrix of compatible dimension. Further recall that if and are both PSD, then is also PSD. Since is a composition of such operations on , we get that is also PSD. ∎
Lemma 29 (Folklore).
Let be a PSD matrix. Then, .
We now prove our next core theorem:
Theorem 4 Restated.
Fix , and let be a PSD matrix. Let , where each is either a Rademacher vector or a Gaussian vector. Then, .
Proof.
We first apply Theorems 2, 29 and 16 to get
So we turn our attention to bounding this sum. If we let , then we can write , and so
We then conclude that . ∎
In particular, if we take samples in the Kronecker-Hutchinson Estimator, we get a standard deviation of , which is less than for . More formally, by Chebyshev’s inequality, we find that samples suffice to get
with constant probability.
4.1 A Matching Lower Bound
We might hope that the painful rate of results from a loose analysis of the Kronecker-Hutchinson Estimator. Or, perhaps, that some more clever choice of random vector distribution can avoid this painful exponential dependence. We show that this is not the case, at least not among vectors with real iid mean-zero unit-variance components:
Theorem 6 Restated.
Fix . Consider Hutchinson’s Estimator run with independent vectors where are vectors of real iid zero-mean unit-variance entries. Then, there exists a PSD matrix such that Hutchinson’s Estimator needs samples in order to have standard deviation .
Proof.
We let be the all-ones matrix. If we let denote the all-ones vector, then we can write . This is helpful because we can then use the Mixed-Product property (Lemma 10) to show that
where we can replace the Kronecker product with a scalar product because the Kronecker product of scalars is just the scalar product. Notably, we see that is a product of iid terms. Since we are interested in the variance of the estimator, we actually want to examine
So, letting in order to simplify notation, we turn our attention to lower bounding the expectation of where is a vector of iid zero-mean unit-variance terms. We expand this sum as
This expectation is zero if any random variable appears exactly once, like how by independence. So, we can reduce the sum above to terms that either look like or like . Terms like occur exactly times. Terms like occur exactly times. (This can happen 3 ways: if or if or if . Each such pattern occurs times, so the overall pattern occurs times.) We can then bound by Jensen’s Inequality, and by independence. So, we get that
Noting that , this in turn gives a variance lower bound of
Then, since , in order for the Kronecker-Hutchinson’s Estimator to have standard deviation , we need to take
∎
We note that this lower bound uses a rank-one matrix, which is interesting because we can exactly compute the trace of any rank-one matrix with a single Kronecker-matrix-vector product. In particular, if for some , then for any vector not orthogonal to , we have that
Since a Kronecker of Gaussian vectors is not orthogonal to any fixed vector with probability 1, we can exactly compute this trace with probability 1. That is to say, the hard instance used in this lower bound is truly only hard for the Kronecker-Hutchinson Estimator, and is not hard in general for the Kronecker-matrix-vector oracle model.
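Below is a sketch of this one-query recovery, assuming a symmetric rank-one matrix as in the hard instance above; the formula tr(A) = ||Ax||^2 / (x^T A x) and the helper names are our own phrasing of the argument.

```python
import numpy as np
from functools import reduce

def rank_one_trace(kron_matvec, d, k, rng=None):
    """Exact trace of a symmetric rank-one A = c c^T from one Kronecker-matrix-vector product.

    If y = A x = c (c^T x), then ||y||^2 = ||c||^2 (c^T x)^2 and x^T y = (c^T x)^2,
    so tr(A) = ||c||^2 = ||y||^2 / (x^T y) whenever x is not orthogonal to c.
    """
    rng = np.random.default_rng(rng)
    x = reduce(np.kron, [rng.standard_normal(d) for _ in range(k)])
    y = kron_matvec(x)                       # the single oracle query
    return (y @ y) / (x @ y)

# The hard instance from Theorem 6: the all-ones matrix (c is the all-ones vector).
d, k = 2, 6
A = np.ones((d ** k, d ** k))
print(d ** k, rank_one_trace(lambda v: A @ v, d, k, rng=0))
```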
The lower bound is also structured in a second way: the matrix is not only rank-one but also has Kronecker structure. In Section 7 we prove that we can exactly compute the trace of any matrix with Kronecker structure by using Kronecker-matrix-vector products. That is to say, the hard instance used in this lower bound is in a second sense only hard for the Kronecker-Hutchinson Estimator, and its class of hard input matrices is not hard in general for the Kronecker-matrix-vector oracle model. That said, we do not have a clear sense of what algorithms may be able to bridge this gap and admit much faster trace estimation.
4.2 Estimating the Trace of a Random Rank-One Matrix
To round off our discussion of PSD matrix trace estimation, we show that estimating the trace of a random rank-one matrix still has an exponential dependence on , but a milder dependence than shown in the worst-case bound of Theorem 6. This proof serves two purposes. First, supposing our upper bound is tight, it suggests that the Kronecker-Hutchinson Estimator can still have an exponentially slow rate of convergence for matrices that are far from having Kronecker structure. Second, it shows how the variance bound from Theorem 2 can be used to get a tightened understanding of the concentration of the Kronecker-Hutchinson Estimator for particular matrices.
We study the random rank-one matrix , where . By analyzing the expected squared Frobenius norm of the partial traces of , we prove the following:
Theorem 3 Restated.
Fix . Let where is a vector of iid standard normal Gaussians. Then, with probability , we know that . In particular, with probability , we know that the Kronecker-Hutchinson Estimator returns a relative error estimate to the trace of if .
Proof.
We start by analyzing the expected squared Frobenius norm of the partial trace of with respect to a set of subsystems . Let , and note that without loss of generality we can assume that when trying to compute . We then decompose the vector where are standard normal Gaussian vectors. Then, note that we can decompose . This lets us write out
If we define , then we see that
Noticing that is just a matrix full of iid standard normal Gaussians, we can then bound in expectation by just using known properties of the norms of Gaussians, where denotes the Schatten 4-norm. For instance, Lemma A.1 from [Tropp and Webber, 2023] implies that a standard normal Gaussian matrix has . So,
This in turn implies that
The last line above recognizes that we only depended on , so we could just sum over all values of , scaled by the number of times that appears. Rearranging terms, and recalling that , we get
Then, by Markov’s Inequality, we know that
holds with probability at least . Then, since is a chi-square random variable with parameter , we can use Chebyshev’s Inequality to bound with probability at least if . We condition on both of these events, together holding with probability at least . Given these events, the variance of the Kronecker-Hutchinson Estimator is therefore
We can now examine how many samples are needed for the Kronecker-Hutchinson Estimator to achieve a relative error estimation to with constant probability. By Chebyshev’s Inequality, we get that with probability . To make this additive error at most , we take
That is, with constant probability, samples suffice to get a relative error approximation to the trace of a random rank-one matrix. By allowing larger and values, and by using sharper concentrations, the constant of 1152 can be significantly improved. ∎
5 The Complex-Valued Kronecker-Matrix-Vector Oracle
Until now, we have only considered using a real-valued Kronecker-matrix-vector oracle, where we are only allowed to compute for vectors where . In this section, we consider how our results change when we instead allow . If we did not have any Kronecker constraint on our oracle, then this change of oracles would be banal. If some algorithm computes complex-valued non-Kronecker matrix-vector products with , then we can find another algorithm which produces the exact same result, but instead uses real-valued non-Kronecker matrix-vector products with . Each complex-valued query gets decomposed as , which can be evaluated with just two real-valued oracle queries.
However, this picture changes with a Kronecker constraint. If for , then the real part of the vector does not necessarily have Kronecker structure as well, and therefore we cannot reproduce the strategy above. We can see this even with very small vectors:
The real and imaginary component vectors do not have Kronecker structure. We can still simulate a complex-valued oracle with a real-valued oracle, but this requires far more real-valued oracle queries in the worst case. Specifically, we expand the real and imaginary parts of each vector . To keep the notation simple, let decompose the real and imaginary parts of , so that . Then,
where we continue to expand this out into a sum of real-valued Kronecker-structured vectors. So, if some algorithm solves a problem using complex-valued Kronecker-structure matrix-vector products with , then we can only immediately guarantee the existence of an algorithm that computes real-valued Kronecker-structured products with that also solves the problem.
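To make the expansion concrete, the sketch below simulates one complex Kronecker query by enumerating all 2^k choices of real or imaginary part in each factor and combining the resulting real Kronecker queries with the appropriate powers of i. The names and the oracle interface are our own.

```python
import numpy as np
from functools import reduce
from itertools import product

def complex_query_via_real(real_kron_matvec, factors):
    """Simulate the complex Kronecker query A (x_1 kron ... kron x_k) using 2^k real queries.

    Each complex factor x_j = a_j + i b_j is expanded, so the Kronecker product becomes a
    sum over all 2^k choices of taking the real or imaginary part in each position.
    """
    parts = [(np.real(x), np.imag(x)) for x in factors]
    out = 0.0 + 0.0j
    for choice in product((0, 1), repeat=len(factors)):        # 0 = real part, 1 = imaginary part
        v = reduce(np.kron, [parts[j][c] for j, c in enumerate(choice)])
        out = out + (1j ** sum(choice)) * real_kron_matvec(v)  # one real Kronecker query
    return out

rng = np.random.default_rng(0)
d, k = 3, 4
A = rng.standard_normal((d ** k, d ** k))
xs = [rng.standard_normal(d) + 1j * rng.standard_normal(d) for _ in range(k)]
direct = A @ reduce(np.kron, xs)
print(np.allclose(direct, complex_query_via_real(lambda v: A @ v, xs)))
```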
Keeping this exponential gap between the real-valued and complex-valued oracles in mind, we now turn our attention back to the Kronecker-Hutchinson Estimator. In the non-Kronecker case, it is well known that the Hutchinson Estimator performs well with complex queries:
Lemma 30.
Let and let be a vector of either iid complex Rademacher or complex standard normal Gaussian entries. That is, we have where and are iid real-valued vectors of either Rademacher or standard normal Gaussian entries. Then, and , where the first inequality is tight if is complex Gaussian, and the second is tight if is symmetric.
Note that the difference between Lemma 23 and Lemma 30 is just a factor of 2 in the variance term. However, this factor of 2 has been extremely influential in all of our proofs so far. Improving this constant to be 1 and following the exact same proofs give the following results:
Theorem 7 Restated.
Fix , and let . Let , where each is either a complex Rademacher vector or a complex Gaussian vector. Let be the average of all partial transposes of . Then, . If are complex Gaussians, then the first expression is an equality. If we further know that , then both expressions are equalities. Further, if is PSD, then we have .
Proof.
Retrace the proofs of Lemmas 25 and 26 as well as Theorems 2 and 4. Wherever the factor of 2 from Lemma 23 appears, replace it with the factor of 1 from Lemma 30. ∎
We can similarly show that this dependence of is essentially tight:
Theorem 9 Restated.
Fix . Consider Hutchinson’s Estimator run with vectors where are vectors of iid complex zero-mean unit-variance entries. Then, there exists a PSD matrix such that Hutchinson’s Estimator needs samples in order to have standard deviation .
Proof.
Like in the proof of Theorem 6, we take where is the all-ones vector. In fact, we follow the proof of Theorem 6 rather closely, but we recreate much of it to handle the complex conjugation correctly. First, we expand the product
We let to simplify the notation. Then, we expand this inner expectation:
This expectation is zero if any random variable appears exactly once, like how by independence. So, we only need to consider terms in this sum where we do not have any one term appear alone. This can happen in three different ways.
First, if , then by Jensen’s inequality, we get . This case occurs times in the sum. Second, if , then by the iid distribution of the entry of , we get . Lastly, if or , then . This case occurs times in the sum. So, we get that
Noting that , this in turn gives a variance lower bound of
Then, since , in order for the complex Kronecker-Hutchinson’s Estimator to have standard deviation , we need to take
∎
Lastly, we produce a much nicer bound on the sample complexity of the Kronecker-Hutchinson’s Estimator when working with random rank-one matrices:
Theorem 8 Restated.
Fix . Let where is a vector of iid standard normal Gaussians. Then, with probability , we know that . In particular, with probability , we know that the Kronecker-Hutchinson Estimator returns a relative error estimate to the trace of if .
Proof.
Recall from the proof of Theorem 3 that the expected squared Frobenius norm of the partial trace of is
Then, we can directly bound
where the last line recognizes that we only depended on , so we could just sum over all values of , scaled by the number of times that appears. Rearranging terms, and recalling that , we get
As in the proof of Theorem 3, we know that with probability . Further, by Markov’s Inequality, we can bound
Following the same probability arguments as in the proof of Theorem 3, we get that taking
samples suffice for the complex-valued Kronecker-Hutchinson Estimator to recover a relative error approximation to the trace of with probability . ∎
Recall that the rate in the real-valued case was , and the rate here in the complex-valued case is . In the real-valued case, this rate grows at least as fast as , preventing any polynomial dependence on from being possible. However, in the complex-valued case, if , then the Kronecker-Hutchinson Estimator actually converges in samples. That is, the real-valued estimator converges exponentially slowly, while the complex-valued estimator converges polynomially quickly!
6 Looking Beyond the Kronecker-Hutchinson Estimator
We have shown that the Kronecker-Hutchinson Estimator has a painful exponential dependence on in the worst case. However, we are broadly interested in the class of all algorithms that operate in the Kronecker-matrix-vector oracle model. We show that all trace estimation algorithms in this model must have query complexity that grows with :
Theorem 31.
Any algorithm that accesses a PSD matrix via Kronecker-matrix-vector products , where are (possibly adaptively) chosen real-valued vectors, requires in order to output an estimate such that .
If we constrain ourselves to estimators that are unbiased (i.e. ), then this lower bound says that Kronecker-matrix-vector products are required to achieve standard deviation . Our analysis builds off of tools in [Braverman et al., 2020, Jiang et al., 2021], which study matrix-vector lower bounds without the Kronecker constraint. In particular, we use the following definition and imported theorem:
Definition 32.
Let be a matrix of iid standard normal Gaussian entries. Then, the matrix is distributed as a random matrix.
Imported Theorem 33 (Lemma 13 from [Braverman et al., 2020]).
Let . Then, for any sequence of (possibly adaptively chosen) matrix-vector queries and responses (so that ), there exists a rotation matrix constructed as a function of such that the matrix can be written as
where conditioned on the values of is distributed as a Wishart matrix. That is, . Further, is a PSD matrix that is a deterministic function of .
This theorem is powerful because it says that any algorithm that has accessed a Wishart matrix via matrix-vector products must have a submatrix which the algorithm has absolutely no knowledge of. We note that in [Braverman et al., 2020], they do not state that is a deterministic function of , but this fact is easily verified by examining the proof of Lemma 13 in [Braverman et al., 2020].
We start by extending Imported Theorem 33 to the Kronecker-matrix-vector oracle model. We construct our hard instance input matrix as , where are iid Wishart matrices:
Corollary 34.
Let be iid matrices, and let . Then, for any sequence of (possibly adaptively chosen) Kronecker-matrix-vector queries and responses (so that ), there exist orthogonal matrices such that has
Let and be decompositions of the query and response vectors. Then, each conditioned on the values of is distributed as a Wishart matrix. That is, . Further, each is a PSD matrix that is a deterministic function of .
With this setup, we can now form a lower bound against any trace estimating algorithm. We first prove in Lemma 35 that, in terms of minimizing mean squared error, no estimator can outperform the conditional expectation . Next, in Lemma 36, we compute the exact variance of this estimator. Lastly, in the final proof of Theorem 31, we compare this variance to the trace of , to conclude that .
To simplify notation, we let denote the list of hidden Wishart matrices, and let be the variables that is conditioned on. Note that the algorithm only observes the response vectors , and that the algorithm does not exactly know the vectors . Therefore, the algorithm might not observe all of , but definitely covers everything that the algorithm does see.
Lemma 35.
Consider an algorithm that interacts with via Kronecker-matrix-vector products as in the setting of Corollary 34. Let be the output of that algorithm, and let . Then, has better mean squared error than :
Proof.
Note that the algorithm only observes , so the estimator has to be a (possibly randomized) function of . We then expand the mean squared error of by using the tower rule:
where the inequality comes from being the minimum mean square error estimator for that particular conditional expectation. ∎
For any specific instantiation of , we can ask what the expected mean squared error of is:
We do not really care about this value for a specific choice of , but instead care about it on average across all values of . That is, we care about the squared error . We then want to ensure that the error is less than . So, we want to find out what value of ensures that
We start by computing the variance inside the square root on the left.
Lemma 36.
The mean squared error of is .
Proof.
Let and , so that is a product of independent random variables. Then, since these terms are independent, we know that
Further, since is independent of for , we know that
The last expectation on the right can be simplified a bit further. Since , we can write
which in turn means we have
(8)
Now we can directly analyze these means and expectations. First, note that is a chi-squared random variable. Therefore, . Next, we focus on . By the linearity and cyclic property of the trace, and since , we have that
Therefore the only term in that depends on is , and so we can expand
where the last equality notes that is also a chi-squared random variable. Plugging our expectations back into Equation 8, we find that
which completes the proof. ∎
We now have built up the tools we need to prove the central lower bound:
Theorem 31 Restated.
Any algorithm that accesses a PSD matrix via Kronecker-matrix-vector products , where are (possibly adaptively) chosen real-valued vectors, requires in order to output an estimate such that .
Proof.
We consider as in Corollary 34. By Lemmas 35 and 36, we know that
We then want to show that this term on the right hand side is at most . First, we quickly note that , since is a chi-squared random variable. So, we want to find what values of ensure that
In fact, it will be mathematically convenient to demand slightly less accuracy, which will still give a meaningful lower bound:
We can rearrange this inequality to produce
Note that for and , we have that , so that . So, by additionally bounding for , we can simplify the above bound to be
Solving this inequality for gives . Optimizing over gives , completing the proof. ∎
7 Exactly Computing the Trace of in Queries
In this section, we are promised that for some unknown matrices . We show that by using exactly Kronecker-matrix-vector products, we can recover a set of matrices such that . Note that we cannot exactly recover the matrices because the Kronecker product does not admit a unique decomposition: we have for all scalars and matrices , . However, we are able to recover the trace exactly: since , we can compute .
input: Kronecker-matrix-vector oracle access to .
output: Factor matrices .
Theorem 37.
Let . Then, with probability 1, Algorithm 1 computes exactly Kronecker-matrix-vector products with and returns matrices such that .
Proof.
At a high level, the quadratic form is trying to recover the entry of . To see why, notice that
If we let denote the various components of , then we can write the above expression as
So, we get that the matrix has . We therefore conclude that
This analysis implicitly assumes that for all . Since are random Gaussian vectors, we know that does not lie in the nullspace of with probability 1 so long as is not the all-zeros matrix. So, Algorithm 1 is correct so long as no factor is the all-zeros matrix. If some is the all-zeros matrix, then we get that is also the all-zeros matrix, so that , and therefore our output will have all being all-zeros matrices, which is correct.
To count the number of Kronecker-matrix-vector products computed, note that Algorithm 1 only computes such products on lines 3 and 6. Line 3 computes exactly 1 product, and line 6 computes exactly products. ∎
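The listing below is a sketch of one way to implement the recovery procedure described in this proof: a single query with a fully Gaussian Kronecker vector supplies the normalizing scalar, and for each factor, one query per standard basis vector in that position recovers the factor up to a common scaling. The helper names and the exact bookkeeping are our own and are not the authors' verbatim pseudocode for Algorithm 1.

```python
import numpy as np
from functools import reduce

def recover_kron_factors(kron_matvec, d, k, rng=None):
    """Recover B_1, ..., B_k with B_1 kron ... kron B_k = A, assuming A = A_1 kron ... kron A_k.

    One query with a fully Gaussian Kronecker vector gives s = prod_j g_j^T A_j g_j,
    and d further queries per factor (with a standard basis vector in position i)
    give A_i up to a scaling that s lets us remove.
    """
    rng = np.random.default_rng(rng)
    I = np.eye(d)
    g = [rng.standard_normal(d) for _ in range(k)]
    g_full = reduce(np.kron, g)
    s = g_full @ kron_matvec(g_full)                   # the single fully Gaussian query
    B = []
    for i in range(k):
        M = np.empty((d, d))
        for b in range(d):
            v = reduce(np.kron, [I[b] if j == i else g[j] for j in range(k)])
            r = kron_matvec(v)                         # one of the d queries for factor i
            for a in range(d):
                u = reduce(np.kron, [I[a] if j == i else g[j] for j in range(k)])
                M[a, b] = u @ r                        # = (prod_{j != i} g_j^T A_j g_j) * A_i[a, b]
        B.append(M)
    B[0] = B[0] / s ** (k - 1)                         # fix the global scaling so the Kronecker product is A
    return B

# Sanity check: the recovered factors reproduce A and hence its trace.
rng = np.random.default_rng(0)
d, k = 2, 3
A = reduce(np.kron, [rng.standard_normal((d, d)) for _ in range(k)])
B = recover_kron_factors(lambda v: A @ v, d, k, rng=1)
print(np.allclose(reduce(np.kron, B), A),
      np.isclose(np.prod([np.trace(Bi) for Bi in B]), np.trace(A)))
```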
8 Conclusion
In this paper, we produced new and tight bounds on the rate of convergence of the Kronecker-Hutchinson Estimator. We showed that the estimator generally has an exponential dependence on , and we gave an expression for the exact variance of the estimator when using the Kronecker of Gaussian vectors.
There are several natural future directions for this research. It is worth exploring the variance of more classes of matrices under Theorem 4. In particular, is there a natural property of matrices which exactly characterizes the input matrices for which the Kronecker-Hutchinson Estimator has a polynomial rate of convergence? If so, this would be a significant step towards a sub-exponential runtime for trace estimation in the Kronecker-matrix-vector oracle model.
It is also worth looking at algorithms in the Kronecker-matrix-vector oracle model beyond the Kronecker-Hutchinson Estimator. For instance, when has a quickly decaying spectrum, could a low-rank approximation method like the ones used in the non-Kronecker case provide good trace estimation guarantees [Li and Zhu, 2021, Saibaba et al., 2017]?
There is also potential for newer and stronger lower bounds. We provide a lower bound against possibly adaptive algorithms, which is very general, but is also perhaps too general to be descriptive of most algorithms people are currently using. Is there perhaps a lower bound against all sketching-based methods which is exponential in ? Or, even more narrowly, is there a lower bound against all quadratic-form-based algorithms, in the vein of [Wimmer et al., 2014]?
Lastly, there is also plenty of potential to study more problems in the Kronecker-matrix-vector oracle model. Is it possible to accurately and efficiently compute the top eigenvalue or eigenvector of a matrix from Kronecker-matrix-vector products? How about solving a linear system? These problems all need to be explored in great detail.
Acknowledgements
Raphael Meyer was partially supported by a GAANN fellowship from the US Department of Education. Haim Avron was partially supported by the US-Israel Binational Science Foundation (Grant no. 2017698). We thank Moshe Goldstein and Noa Feldman for discussions on quantum entanglement estimation, which inspired this work. We thank Tyler Chen for the design of Algorithm 1. We thank Christopher Musco for pointing us to relevant results in [Ahle et al., 2020].
References
- [Aaronson, 2018] Aaronson, S. (2018). Introduction to quantum information science lecture notes. https://www.scottaaronson.com/qclec.pdf.
- [Ahle et al., 2020] Ahle, T. D., Kapralov, M., Knudsen, J. B., Pagh, R., Velingker, A., Woodruff, D. P., and Zandieh, A. (2020). Oblivious sketching of high-degree polynomial kernels. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 141–160. SIAM.
- [Avron et al., 2014] Avron, H., Nguyen, H., and Woodruff, D. (2014). Subspace embeddings for the polynomial kernel. Advances in neural information processing systems, 27.
- [Avron and Toledo, 2011] Avron, H. and Toledo, S. (2011). Randomized algorithms for estimating the trace of an implicit symmetric positive semi-definite matrix. J. ACM, 58(2).
- [Bakshi et al., 2022] Bakshi, A., Clarkson, K. L., and Woodruff, D. P. (2022). Low-rank approximation with 1/ matrix-vector products. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1130–1143.
- [Bernstein, 2011] Bernstein, D. S. (2011). Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, second edition.
- [Braverman et al., 2020] Braverman, M., Hazan, E., Simchowitz, M., and Woodworth, B. (2020). The gradient complexity of linear regression. In Conference on Learning Theory, pages 627–647. PMLR.
- [Bujanovic and Kressner, 2021] Bujanovic, Z. and Kressner, D. (2021). Norm and trace estimation with random rank-one vectors. SIAM Journal on Matrix Analysis and Applications, 42(1):202–223.
- [Chen and Hallman, 2022] Chen, T. and Hallman, E. (2022). Krylov-aware stochastic trace estimation. arXiv preprint arXiv:2205.01736.
- [Cortinovis and Kressner, 2021] Cortinovis, A. and Kressner, D. (2021). On randomized trace estimates for indefinite matrices with an application to determinants. Foundations of Computational Mathematics, pages 1–29.
- [Feldman et al., 2022] Feldman, N., Kshetrimayum, A., Eisert, J., and Goldstein, M. (2022). Entanglement estimation in tensor network states via sampling. PRX Quantum, 3(3):030312.
- [Girard, 1987] Girard, D. (1987). Un algorithme simple et rapide pour la validation croisée généralisée sur des problèmes de grande taille: applications à la restauration d’image. Informatique et Mathématiques Appliquées de Grenoble.
- [Halikias and Townsend, 2022] Halikias, D. and Townsend, A. (2022). Matrix recovery from matrix-vector products. arXiv preprint arXiv:2212.09841.
- [Huang et al., 2020] Huang, H.-Y., Kueng, R., and Preskill, J. (2020). Predicting many properties of a quantum system from very few measurements. Nature Physics, 16(10):1050–1057.
- [Hutchinson, 1989] Hutchinson, M. F. (1989). A stochastic estimator of the trace of the influence matrix for laplacian smoothing splines. Communications in Statistics-Simulation and Computation, 18(3):1059–1076.
- [Jiang et al., 2021] Jiang, S., Pham, H., Woodruff, D., and Zhang, R. (2021). Optimal sketching for trace estimation. Advances in Neural Information Processing Systems, 34:23741–23753.
- [Kueng et al., 2017] Kueng, R., Rauhut, H., and Terstiege, U. (2017). Low rank matrix recovery from rank one measurements. Applied and Computational Harmonic Analysis, 42(1):88–116.
- [Li and Zhu, 2021] Li, H. and Zhu, Y. (2021). Randomized block krylov subspace methods for trace and log-determinant estimators. BIT Numerical Mathematics, 61:911–939.
- [Meyer, 2021] Meyer, R. A. (2021). Updates for Hutch++: Hutch++ for undergrads. https://ram900.hosting.nyu.edu/hutchplusplus/#hutch_for_undergrads.
- [Meyer et al., 2021] Meyer, R. A., Musco, C., Musco, C., and Woodruff, D. P. (2021). Hutch++: Optimal stochastic trace estimation. In Symposium on Simplicity in Algorithms (SOSA), pages 142–155. SIAM.
- [Persson et al., 2022] Persson, D., Cortinovis, A., and Kressner, D. (2022). Improved variants of the Hutch++ algorithm for trace estimation. SIAM Journal on Matrix Analysis and Applications, 43(3):1162–1185.
- [Roosta-Khorasani and Ascher, 2015] Roosta-Khorasani, F. and Ascher, U. (2015). Improved bounds on sample size for implicit matrix trace estimators. Foundations of Computational Mathematics, 15(5):1187–1212.
- [Saibaba et al., 2017] Saibaba, A. K., Alexanderian, A., and Ipsen, I. C. (2017). Randomized matrix-free trace and log-determinant estimators. Numerische Mathematik, 137(2):353–395.
- [Sun et al., 2021a] Sun, X., Woodruff, D. P., Yang, G., and Zhang, J. (2021a). Querying a matrix through matrix-vector products. ACM Transactions on Algorithms (TALG), 17(4):1–19.
- [Sun et al., 2021b] Sun, Y., Guo, Y., Tropp, J. A., and Udell, M. (2021b). Tensor random projection for low memory dimension reduction. arXiv preprint arXiv:2105.00105.
- [Tropp and Webber, 2023] Tropp, J. A. and Webber, R. J. (2023). Randomized algorithms for low-rank matrix approximation: Design, analysis, and applications. arXiv preprint arXiv:2306.12418.
- [Vershynin, 2020] Vershynin, R. (2020). Concentration inequalities for random tensors. Bernoulli.
- [Wang et al., 2023] Wang, S., McArdle, S., and Berta, M. (2023). Qubit-efficient randomized quantum algorithms for linear algebra. arXiv preprint arXiv:2302.01873.
- [Wimmer et al., 2014] Wimmer, K., Wu, Y., and Zhang, P. (2014). Optimal query complexity for estimating the trace of a matrix. In Automata, Languages, and Programming: 41st International Colloquium, ICALP 2014, Copenhagen, Denmark, July 8-11, 2014, Proceedings, Part I 41, pages 1051–1062. Springer.