Memory-Sample Lower Bounds for Learning with Classical-Quantum Hybrid Memory
Abstract
In a work by Raz (J. ACM and FOCS 16), it was proved that any algorithm for parity learning on $n$ bits requires either $\Omega(n^2)$ bits of classical memory or an exponential number (in $n$) of random samples. A line of recent works continued that research direction and showed that for a large collection of classical learning tasks, either super-linear classical memory size or super-polynomially many samples are needed. All these works model the learning algorithm as a classical branching program, which performs classical computation within bounded memory.
However, these results do not capture all physical computational models, most notably quantum computers and the use of quantum memory. This leaves open the possibility that a small piece of quantum memory could significantly reduce the need for classical memory or samples, and thus completely change the nature of the classical learning task. Despite recent research on the necessity of quantum memory for intrinsically quantum learning problems like shadow tomography and purity testing, the role of quantum memory in classical learning tasks remains obscure.
In this work, we study classical learning tasks in the presence of quantum memory. We prove that any quantum algorithm with both classical memory and quantum memory, for parity learning on $n$ bits, requires either $\Omega(n^2)$ bits of classical memory or $\Omega(n)$ bits of quantum memory or an exponential number of samples. In other words, the memory-sample lower bound for parity learning remains qualitatively the same, even if the learning algorithm can use, in addition to the classical memory, a quantum memory of size $c n$ (for some constant $c > 0$).
Our result is more general and applies to many other classical learning tasks. Following previous works, we represent by the matrix $M : A \times X \to \{-1, 1\}$ the following learning task. An unknown $x$ is sampled uniformly at random from a concept class $X$, and a learning algorithm tries to uncover $x$ by seeing a stream of random samples $(a_1, b_1 = M(a_1, x)), (a_2, b_2 = M(a_2, x)), \ldots$ where for every $i$, $a_i \in A$ is chosen uniformly at random. Assume that $k, \ell, r$ are integers such that any submatrix of $M$ of at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns has a bias of at most $2^{-r}$. We prove that any algorithm with classical and quantum hybrid memory for the learning problem corresponding to $M$ needs either (1) $\Omega(k \cdot \ell)$ bits of classical memory, or (2) $\Omega(r)$ qubits of quantum memory, or (3) $2^{\Omega(r)}$ random samples, to achieve a success probability of at least $2^{-O(r)}$.
Our results refute the possibility that a small amount of quantum memory significantly reduces the size of the classical memory needed for efficient learning on these problems. Our results also imply improved security of several existing cryptographical protocols in the bounded-storage model (protocols that are based on parity learning on $n$ bits), proving that security holds even in the presence of a quantum adversary with at most $c n^2$ bits of classical memory and $c n$ bits of quantum memory (for some constant $c > 0$).
1 Introduction
Memory plays an important role in learning. Starting from the seminal works by Shamir [Sha14] and Steinhardt, Valiant and Wager [SVW16], a sequence of works initiated and deepened the study of lower bounds for learning under memory constraints. Steinhardt, Valiant, and Wager [SVW16] conjectured that in order to learn an unknown $n$-bit string from samples of random-subset parity, an algorithm needs either memory of size quadratic in $n$ or exponentially many (in $n$) random samples. This conjecture was later proved by Raz [Raz18], showing for the first time that for some learning problems, super-linear memory size is required for efficient learning. This result was then generalized to a broad class of learning problems [KRT17, Raz17, MM18, BGY18, GRT18, SSV19, GKR20, GKLR21].
Although we have a comprehensive understanding of the (in)feasibility of learning under limitations on a particular computational resource (memory), the previous works mentioned above do not capture all physical computational models; most notably, quantum computation and the power of quantum memory. Many researchers believe that large-scale quantum computers will eventually become viable. Recent experiments demonstrated quantum advantages, for example [AAB+19], and suggested that there are possibly no fundamental barriers to achieving quantum memory and quantum computers. Questions on the role of quantum memory in learning were proposed by Wright in the context of general state tomography [Wri16] and by Aaronson for shadow tomography [Aar18]. A line of works [HHJ+16, BCL20, HKP20, CCHL22, ACQ22, CLO22] pioneered this direction and showed either polynomial or exponential separations for learning with/without quantum memory, but all for intrinsically quantum learning tasks like state tomography, shadow tomography and purity testing.
In light of the above, it is appealing to consider classical learning tasks in the presence of quantum memory, as well as hybrid classical-quantum memory. A direct implication of all the aforementioned classical results gives only trivial conclusions. As $q$ qubits of quantum memory can always be simulated by $2^{O(q)}$ classical bits, we can only conclude (say, for parity learning) that either $\Omega(\log n)$ qubits of quantum memory or exponentially many samples are needed. Prior to our work, it could have been the case that even if only a very small quantum memory was available, it might have significantly reduced the need for classical memory and led to an efficient learning algorithm.
In this work, we prove memory-sample lower bounds in the presence of hybrid memory for a wide collection of classical learning problems. As in [Raz17, GRT18], we will represent a learning problem by a matrix $M : A \times X \to \{-1, 1\}$ whose columns correspond to concepts in the concept class and rows correspond to possible samples. In the learning task, an unknown concept $x \in X$ is sampled uniformly at random, and each random sample is given as $(a, b = M(a, x))$ for a uniformly picked $a \in A$. The learner's goal is to uncover $x$. In [GRT18], it is proved that when the underlying matrix $M$ is a $(k, \ell)$-two-source extractor with error $2^{-r}$ (roughly speaking, this means that every submatrix of $M$ with at least $2^{-k} \cdot |A|$ rows and at least $2^{-\ell} \cdot |X|$ columns has a relative bias of at most $2^{-r}$), a learning algorithm requires either $\Omega(k \cdot \ell)$ bits of memory or $2^{\Omega(r)}$ samples to achieve a success probability of at least $2^{-O(r)}$ for the learning task.
1.1 Our Results
In this work, we model a quantum learning algorithm as a program with hybrid memory consisting of $q$ qubits of quantum memory and $m$ bits of classical memory. At each stage, a random sample is given to the algorithm. The quantum learning algorithm applies an arbitrary quantum channel to the hybrid memory, controlled by the random sample. Although the channel can be arbitrary, we require the outcome to be a hybrid classical-quantum state of at most $q$ qubits and $m$ bits. We stress that there is no limitation on the complexity of the quantum channel (this only makes our results stronger, as we are proving lower bounds for such algorithms).
With the above model, we give the following main theorem.
Theorem 1 (Main Theorem, Informal).
Let $M : A \times X \to \{-1, 1\}$ be a matrix. If $M$ is a $(k, \ell)$-two-source extractor with error $2^{-r}$, a quantum learning algorithm requires either

1. $\Omega(k \cdot \ell)$ bits of classical memory; or,
2. $\Omega(r)$ qubits of quantum memory; or,
3. $2^{\Omega(r)}$ samples,

to succeed with probability at least $2^{-O(r)}$ in the corresponding learning task.
Our main theorem implies that for many learning problems, the availability of a quantum memory of size up to $c \cdot r$ (for some constant $c > 0$) does not reduce the size of classical memory or the number of samples that are needed. As coherent quantum memory is challenging for near-term intermediate-scale quantum devices, and will probably remain expensive even if and when quantum computers are widely viable, the impact of quantum memory on these learning problems is further limited.
To make the theorem more precise, let us take parity learning as an example. The above theorem says that a quantum learning algorithm needs either $\Omega(n^2)$ bits of classical memory or $\Omega(n)$ qubits of quantum memory to conduct efficient learning; otherwise, it requires $2^{\Omega(n)}$ random samples. At first glance, the constraint on quantum memory seems trivial: if the target is to learn an $n$-bit unknown secret, a linear amount of memory always seems necessary just to store the secret. However, our main theorem applies to quantum learning algorithms with hybrid memory, and rules out algorithms that combine $c n^2$ bits of classical memory with $c n$ qubits of quantum memory for parity learning; hence it yields non-trivial and compelling memory-sample lower bounds. Note also that our results (and previous results) are valid even if the goal is to output only one bit of the secret. Currently, we do not know whether our main theorem is tight: for parity learning, we are not aware of any quantum learning algorithm that succeeds with only $O(n)$ qubits of quantum memory. We leave closing this gap as a fascinating open question.
The main theorem naturally applies to other learning problems considered in [GRT18], including learning sparse parities, learning from sparse linear equations, and many others. We do not present an exhaustive list here but refer the reader to [GRT18] for more details.
Along the way, we propose a new approach for proving classical memory-sample lower bounds, which we call the “badness levels” method. The approach is technically equivalent to the previous approach of [Raz17, GRT18] but is conceptually simpler to work with, and we are able to lift it to the quantum case.
We note that proving a linear lower bound on the size of the quantum memory, without classical memory, is significantly simpler (although, to the best of our knowledge, such a proof has not appeared prior to our work). In Appendix C, we state and prove Theorem 3, which gives such a lower bound on the quantum-memory size (without classical memory). While Theorem 3 is qualitatively weaker than our main result in most cases, as it only applies to programs with quantum memory but without a (possibly quadratic) classical memory, it is technically incomparable, being stated in terms of quantum extractors rather than classical extractors. Its proof is also significantly simpler than the proof of our main theorem.
Implications to Cryptography in the Bounded-Storage Model.
Since learning theory and cryptography can be viewed as two sides of the same coin, our theorem also lifts the security of many existing cryptographical protocols in the bounded-storage model (protocols that are based on parity learning) to the quantum setting. To the best of our knowledge, these are the first proofs that classical cryptographical protocols are secure against space-bounded quantum attackers. (On the other hand, there are known examples of classically-secure bounded-storage protocols that are breakable with an exponentially smaller amount of quantum memory [GKK+08].) We elaborate more below.
Cryptography in the (classical) bounded-storage model was first proposed by Maurer [Mau92]. In this model, no computational assumption is needed. Honest execution is performed over a long streaming sequence of bits. Eavesdroppers have bounded storage and only a limited capability of storing the conversation, and thus cannot break the protocol. A line of works [CM97, AR99, ADR02, DR02, DM02, Lu02, DM04, MSTS04, HN06, DQW21, DQW22, …] builds efficient and secure protocols for key agreement, oblivious transfer, bit commitment and time stamping in this model.
Based on the memory-sample lower bounds for parity learning on $n$ bits, [Raz18] suggested an encryption scheme in the bounded-storage model. Guan and Zhandry [GZ19] proposed key agreement, oblivious transfer and bit commitment with fewer rounds and better correctness, secure against attackers with up to $O(n^2)$ bits of memory. Following a similar idea, Liu and Vusirikala [LV21] showed that semi-honest multiparty computation can be achieved against attackers with up to $O(n^2)$ bits of memory. More recently, Dodis, Quach, and Wichs [DQW22] considered message authentication in the bounded-storage model based on parity learning. Our result on parity learning directly lifts all the results above. When the cryptographic protocols are based on parity learning on $n$ bits (often treated as a security parameter), our result shows that security holds even in the presence of a quantum adversary with at most $c n^2$ bits of classical memory and $c n$ qubits of quantum memory (for some constant $c > 0$).
Despite many previous works on cryptography in the quantum bounded-storage model [DFR+07, DFSS07, Sch07, DFSS08, WW08, PMLA13, BY21], they all rely on streaming quantum states. Our memory-sample lower bounds give, for the first time, a rich class of classical cryptographical schemes (key agreement, oblivious transfer, and bit commitment) that are secure against space-bounded quantum attackers.
2 Proof Overview
2.1 Recap of Proofs for Classical Lower Bounds
Since our proof builds on the previous line of works on classical memory-sample lower bounds for learning, specifically on the proof technique of [Raz17, GRT18], we provide a brief review of these proofs, using parity learning [Raz18] as an example. Below, $a \cdot x$ denotes the inner product of $a$ and $x$ in $\mathbb{F}_2^n$.
Consider a classical branching program that tries to learn an unknown and uniformly random $x \in \{0,1\}^n$ from samples $(a, b)$, where $a \in \{0,1\}^n$ is uniformly random and $b = a \cdot x$. We can associate every state $v$ of the branching program with a distribution $P_v$ over $\{0,1\}^n$, indicating the distribution of $x$ conditioned on reaching that state. At the initial state, without any information about $x$, the distribution is uniform (which has the smallest possible $\ell_2$-norm). Along a computational path of the branching program, the distribution evolves and eventually becomes concentrated (with large $\ell_2$-norm) in order to output $x$ correctly. Therefore, during the evolution, the distribution should at some stage have a mildly large $\ell_2$-norm ($2^{\varepsilon n}$ times larger than uniform for some small constant $\varepsilon$). If we set such a distribution as a target, the target is hard to achieve with random samples. Only with small probability can the branching program make significant progress towards the target; most of the time a sample just splits the distributions (both the current and the target distribution) into two roughly even parts, which does not help much in getting closer to the target distribution (with large norm).
To put it more rigorously, we examine the evolution of the inner product $\langle P_v, P' \rangle$ between the distribution $P_v$ on the current state $v$ and a target distribution $P'$. Receiving a sample $(a, b)$ implies that $a \cdot x = b$, hence only the part of $P_v$ supported on such $x$ proceeds. If this part has probability close to $1/2$, we say that $a$ divides $P_v$ evenly. Denoting the new distribution by $P_{v'}$, after proper normalization the new inner product is

$\langle P_{v'}, P' \rangle = \dfrac{\langle P_v \cdot \mathbb{1}_{a,b},\, P' \rangle}{\langle P_v,\, \mathbb{1}_{a,b} \rangle}$,   (1)

where $\mathbb{1}_{a,b}(x)$ indicates whether $a \cdot x = b$.
Ideally, both $P'$ and the point-wise product vector $P_v \cdot P'$ should have reasonably small $\ell_2$-norms. Due to the extractor property of $M$, most $a$ divide both vectors evenly, and thus the denominator is close to $1/2$ while the numerator is close to $\frac{1}{2}\langle P_v, P' \rangle$. That means that, given a uniformly random $a$, we make only limited progress on the inner product. On the other hand, to get from the initial state with the uniform distribution to the target $P'$, the branching program needs to make many steps of progress. Therefore this happens only with an extremely small probability.
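As a toy numerical sanity check of this even-division phenomenon (not from the paper; all parameters below are illustrative), one can verify that for parity learning on a small $n$, a uniformly random row $a$ splits a typical posterior into two nearly equal halves, so the denominator in Eq. (1) stays close to $1/2$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                      # toy parity length
N = 2 ** n
xs = np.arange(N)

# P_v: a posterior over x, here uniform on a random half of {0,1}^n.
support = rng.permutation(N)[: N // 2]
P = np.zeros(N)
P[support] = 1.0 / len(support)

def parity(a, x):           # M(a, x) = a.x mod 2
    return bin(a & x).count("1") % 2

fracs = []
for _ in range(200):
    a = int(rng.integers(1, N))               # uniformly random nonzero row
    mask = np.array([parity(a, x) == 0 for x in xs])
    fracs.append(P[mask].sum())               # mass consistent with b = 0

fracs = np.array(fracs)
# Almost every a divides P_v close to evenly, so conditioning on one random
# sample yields limited progress after renormalization.
assert (np.abs(fracs - 0.5) < 0.1).mean() > 0.9
```

Here the posterior and the number of trial rows are arbitrary choices; the same behavior appears for any posterior with small $\ell_2$-norm.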
To ensure that the above argument goes through, we require the following properties of every state $v$ in the branching program:

• The $\ell_2$-norm $\|P_v\|_2$ is small.
• The $\ell_\infty$-norm $\|P_v\|_\infty$ is small, which is implied when the $\ell_2$-norm is small.
• The denominator in Eq. (1) is bounded away from $0$ for every sample $(a, b)$.
These properties do not hold by themselves. Instead, we execute a truncation procedure on the branching program before choosing a target distribution. More specifically, the branching program is modified so that it stops whenever it:
• ($\ell_2$ truncation): Reaches a state $v$ with large $\|P_v\|_2$;
• ($\ell_\infty$ truncation): Reaches a state $v$ with large $P_v(x)$, when the unknown concept is $x$;
• (Sample truncation): Or, for the next sample $(a, b)$, $a$ does not divide $P_v$ evenly.
It turns out that after $\ell_2$ truncation, the other two truncation steps add only $2^{-\Omega(n)}$ error in each stage of the branching program. Therefore the proof boils down to proving a $2^{-\Omega(n^2)}$ bound on the probability of reaching a state with large $\|P_v\|_2$, from which, by a standard union bound, we can prove the memory-sample lower bound for parity learning: either $2^{\Omega(n)}$ samples or $\Omega(n^2)$ bits of memory are necessary.
2.2 Badness Levels
As mentioned above, to bound the probability of reaching a state with a large $\ell_2$-norm, the basic idea is to fix its distribution as the target distribution $P'$, and bound the growth of the inner product $\langle P_v, P' \rangle$. This was done in [Raz17, GRT18] by designing a potential function that tracks a high moment of $\langle P_v, P' \rangle$, averaged over the states in the same stage of the branching program. Here we propose another approach using the concept of badness levels. Although it is technically equivalent to the potential-function approach in the classical case, it is more pliable and easier to adapt to the quantum case. We view this approach as a separate contribution of our work.
We first define a bad event to be a pair $(v, a)$ of a state $v$ and the upcoming part $a$ of the sample, such that $\langle P_v, P' \rangle$ is above a baseline threshold, and for one of the two possible outcomes $b \in \{-1, 1\}$,

$\langle P_v \cdot \mathbb{1}_{a,b},\, P' \rangle \ge \left(\tfrac{1}{2} + \varepsilon\right) \langle P_v, P' \rangle$   (2)

for some small constant $\varepsilon$. In other words, the inner product is large enough, while not being divided evenly by $a$. From Eq. (1) we know that the inner product at most roughly doubles through a bad event. In contrast, in the good case, the inner product either gains a mere $(1 + O(\varepsilon))$ multiplicative factor or is already smaller than the baseline. Also, the extractor property of $M$ ensures that for every state $v$, over a uniformly random $a$, a bad event happens with at most small probability.
Now, the badness level $B(v)$ of a state $v$ keeps track of how many times the computational path went through bad events before reaching $v$. (For now we think of $B(v)$ as a natural number. In the actual proof, $B(v)$ is a distribution over natural numbers, as for different computational paths reaching the same state, the count of bad events can be different.) The above observations on the bad events imply that (omitting the smaller factors):
• For every state $v$, $\langle P_v, P' \rangle$ is bounded by roughly $2^{B(v)}$ times the baseline;
• Heading to the next stage, $B$ increases by $1$ with small probability.
Therefore at each stage, the total weight of the states with badness level $B$ decays geometrically in $B$. Thus any state whose inner product reaches the target threshold must have a large badness level, and hence exponentially small probability.
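The geometric decay of the badness-level weights can be checked with a toy computation (illustrative parameters only: $p$ stands in for the small per-stage bad-event probability and $T$ for the number of stages), since the badness level along a path is dominated by a binomial random variable:

```python
import math

T, p = 100, 0.01   # T stages; a bad event occurs with probability at most p each
weights = [math.comb(T, B) * p**B * (1 - p) ** (T - B) for B in range(T + 1)]

for B in range(T + 1):
    # C(T, B) * p^B <= (T * p)^B / B!, so the weight decays geometrically in B.
    assert weights[B] <= (T * p) ** B / math.factorial(B) + 1e-18

assert abs(sum(weights) - 1) < 1e-9   # badness levels partition all paths
```

In the actual proof the increments are only stochastically dominated by such a binomial law, but the union-bound computation is of the same shape.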
2.3 Obstacles for Proving Quantum Lower Bounds
In this section, we present an attempt to prove the same $2^{\Omega(n)}$-sample or $\Omega(n^2)$-quantum-memory lower bound in the purely quantum case. Along the way, we identify some obstacles to proving memory-sample lower bounds for quantum learning algorithms, and in the next section we show how to overcome these obstacles while proving lower bounds for hybrid learning algorithms, with quadratic-size classical memory and linear-size quantum memory.
Following the same framework as the classical proof described above, we first need to transfer all the notions to quantum algorithms:
• The state $|v\rangle$ is a quantum state in the Hilbert space of the quantum memory;
• The distribution $P_v$ is still well-defined: it is the distribution of $x$ conditioned on the quantum memory being measured to $|v\rangle$ (see Section 3.4 and Eq. (3));
• We are still able to implement $\ell_2$ truncation: if $P_v$ has a large $\ell_2$-norm, project the entire system onto the orthogonal subspace of $|v\rangle$ and repeat, until no such state remains (see Section 4.1 for details).
• We are also able to implement sample truncation, in a manner similar to $\ell_2$ truncation. As the criterion here depends on $a$, we separately create a copy of the current system for each $a$, truncate the states $|v\rangle$ using projection when $P_v$ is not evenly divided by $a$ in each copy, and then merge the copies back together. We prove that the error introduced by this truncation is small.
Here comes the first major obstacle: $\ell_\infty$ truncation. In the classical case, $\ell_\infty$ truncation is applied for each individual $x$, in contrast to $\ell_2$ truncation where states are removed altogether. Relying on the fact that the $\ell_2$-norm of the distribution is already known to be small, one can prove via Markov's inequality that the error introduced by the $\ell_\infty$ truncation is small.
However, when we try to emulate the classical implementation of $\ell_\infty$ truncation with quantum truncation, that is, to project the system conditioned only on the specific $x$ where $P_v(x)$ is large, instead of for every $x$, it may lead to huge changes in the distributions $P_u$ on states $|u\rangle$ that are non-orthogonal to $|v\rangle$. The following example illustrates such a scenario:
Example.
Consider a quantum learning algorithm, and assume that at some stage of the computation, for each $x \in \{0,1\}^n$, the quantum memory is in some pure state $|\varphi_x\rangle$. We pick each $|\varphi_x\rangle$ uniformly at random in a Hilbert space of dimension $2^{n/2}$ and consider a typical configuration of the states $|\varphi_x\rangle$. Now the $\ell_2$-norms $\|P_u\|_2$ are bounded for every quantum state $|u\rangle$: the worst cases happen when $|u\rangle = |\varphi_x\rangle$ for some $x$, where $\|P_u\|_2$ is typically around $2^{-n/2}$, close to the $\ell_2$-norm of the uniform distribution. However, those worst distributions also have $\ell_\infty$-norms close to $2^{-n/2}$, which is much larger than the $\ell_\infty$-norm $2^{-n}$ of the uniform distribution, and need to be truncated. But truncating off $P_u(x)$ for $|u\rangle = |\varphi_x\rangle$ means that $|\varphi_x\rangle$ is completely erased, and we end up removing everything.
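This obstacle is easy to reproduce numerically. The following toy computation (ours, with hypothetical small parameters: $d = 16$ standing in for the dimension $2^{n/2}$ and $N = 256$ for $2^n$) samples one random pure state per concept and shows that the induced distribution for the worst-case memory state has an $\ell_\infty$-norm far above uniform:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 16, 256     # toy stand-ins for dimension 2^(n/2) and |X| = 2^n

# One uniformly random pure memory state per concept x.
phis = rng.standard_normal((N, d)) + 1j * rng.standard_normal((N, d))
phis /= np.linalg.norm(phis, axis=1, keepdims=True)

# The distribution P_u for the worst-case state u = phi_0: P_u(x) ~ |<u|phi_x>|^2.
u = phis[0]
overlaps = np.abs(phis @ u.conj()) ** 2
P_u = overlaps / overlaps.sum()

# P_u(0) is far above the uniform value 1/N, so an l_inf truncation targeted at
# x = 0 would have to erase phi_0 (and hence the entire branch) completely.
assert np.argmax(P_u) == 0
assert P_u.max() > 5 / N
```

With these parameters $P_u(0)$ is roughly $d/N$, matching the back-of-the-envelope calculation in the example above.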
Moving on, we fix a target state $|u'\rangle$ whose target distribution $P' = P_{u'}$ exceeds the $\ell_2$-norm threshold, and the goal is again to prove an exponentially small bound on the amplitude of $|u'\rangle$. The bad event should still be defined as a pair satisfying Eq. (2), with the state now being a quantum state. We then run into the second major obstacle: it is not clear how to define badness levels.
If we define the badness level for each state individually, by examining the bad events over the historical states, then it is not clear how to measure the total weight of a badness level $B$. In the classical case, we simply define the total weight as the total probability of the states with badness level $B$. But in the quantum case, it turns out that such a definition either depends on the choice of basis, which might incur a large increment at each stage, or completely fails to imply the desired amplitude bound on the target state.
The other choice is to have a more operational definition of badness levels, and it is indeed tempting to define $B$ as another register whose updates are controlled by the quantum memory. The problem with such definitions is that the bad event (Eq. (2)) is not linear in the quantum state. Therefore an operational definition of the badness level, which is a linear operator, inevitably introduces error that escalates quickly with the number of stages.
2.4 Hybrid Memory Lower Bounds with Small Quantum Memory
The obstacles in the previous section concern proving a quadratic quantum memory lower bound. We note that proving a linear quantum memory lower bound (without classical memory) is not hard: the proof can be entirely information-theoretic, since with very limited memory, say $c n$ qubits for a small constant $c$, the information gained from each sample is exponentially small, despite the memory being quantum. We present such a proof in Appendix C.
The lower bounds that we prove here are for hybrid memory: to learn parity with both classical and quantum memory, an algorithm needs either $2^{\Omega(n)}$ samples, or $\Omega(n^2)$ bits of classical memory, or $\Omega(n)$ qubits of quantum memory (Theorem 1). We now describe how we overcome the previously mentioned obstacles.
$\ell_\infty$ Truncation.
When there is only a small quantum memory and no classical memory, the treatment of $\ell_\infty$ truncation is straightforward. We remove all quantum states $|v\rangle$ whose distributions $P_v$ have large $\ell_\infty$-norm, by projecting the system onto the orthogonal subspace of $|v\rangle$, just like the process of $\ell_2$ truncation. As the overall distribution on $x$ is uniform, any state $|v\rangle$ with large $P_v(x)$ must have small weight, since the weight of $|v\rangle$ times $P_v(x)$ is at most $\Pr[x] = 2^{-n}$. Therefore, as long as the dimension of the Hilbert space is much smaller than $2^n$, the error introduced by this truncation is small. (The example in the previous section, which shows the infeasibility of treating $\ell_\infty$ truncation the same way as $\ell_2$ truncation, does not apply here, as it requires $n/2$ qubits of memory, while here the memory is smaller.)
With classical memory present, the actual $\ell_\infty$ truncation step (see Section 4.2, Step 2) is more complicated. We first apply the original classical $\ell_\infty$ truncation on the classical memory. Once $P_w(x)$ is bounded for each classical memory state $w$, we can remove the quantum states with large $\ell_\infty$-norm by projection, as described above. Since the classical truncation depends on $x$, it could change the distributions $P_v$. However, as in the classical case, $P_w$ does not change by much. Thus, wherever $P_v$ changes drastically, it must have a small weight and can also be removed by projection. This removal corresponds to the additional truncation step in Section 4.2.
Badness Levels.
Interestingly, we are able to avoid the problems of defining the badness level on the quantum memory altogether, by keeping it a property of the classical memory only. To do so, we need to alter the definition of a bad event: it is now a pair $(w, a)$ of a classical memory state $w$ and a sample part $a$, such that there exists some quantum memory state $|v\rangle$ within the classical memory state $w$ satisfying Eq. (2).
For each fixed classical memory state $w$, we still need to ensure that bad events happen with small probability. We prove this (Lemma 5.2) by showing that if there are many different samples $a$, each associated with some quantum state $|v_a\rangle$ satisfying Eq. (2), then there is some quantum state $|v^*\rangle$ that simultaneously satisfies Eq. (2) with most of these $a$ (which is impossible because of the extractor property). This is ultimately due to the continuous nature of Eq. (2): under a proper congruent transformation, Eq. (2) becomes a simple threshold inequality on quadratic forms over quantum states. Now if it is satisfied by some state, it is going to be satisfied by most states, for a somewhat smaller threshold, and hence a simultaneously satisfying $|v^*\rangle$ exists. (We note that the error bound for sample truncation (Lemma 4.12) is also proved using this argument. In this argument, we use Lemma 3.1, which is derived from the anti-concentration bound for Gaussian quadratic forms, and crucially relies on the fact that the dimension of the quantum memory is small.)
Another technical problem is that, to use the extractor property, we need the inner product with the target distribution to stay above the baseline for the simultaneously satisfying $|v^*\rangle$. Thus, what we do in Lemma 5.2 is to first conceptually remove the parts where the inner product is too small, using projections similar to the truncation steps. After the removal, we are left with a subspace where the inner product is always lower bounded, and we show that for every state that satisfies Eq. (2), the inequality is still close to being satisfied after projecting onto this subspace. Therefore we can still apply the above argument and find a simultaneously satisfying $|v^*\rangle$ within the subspace.
3 Preliminaries
3.1 Vectors and Matrices
For a vector $v \in \mathbb{C}^X$ and $p \ge 1$, we define the $\ell_p$-norm of $v$ as $\|v\|_p = \big(\sum_{x \in X} |v(x)|^p\big)^{1/p}$.
For two vectors $u, v \in \mathbb{C}^X$, define their inner product as $\langle u, v \rangle = \sum_{x \in X} \overline{u(x)}\, v(x)$, so that $\langle v, v \rangle = \|v\|_2^2$. We also view every distribution $P$ over a set $X$ as a non-negative real vector with $\|P\|_1 = 1$.
We specifically use Dirac notation for unit vectors: writing $|v\rangle$ implies that $\|v\|_2 = 1$. For a non-zero vector $v$, let $|v\rangle$ denote the normalization of $v$, that is, $|v\rangle = v / \|v\|_2$.
For every vector $v$, let $\mathrm{Diag}(v)$ be the diagonal matrix whose diagonal entries are given by $v$. Conversely, for every square matrix $H$, let $\mathrm{diag}(H)$ be the vector consisting of the diagonal entries of $H$. For a matrix (or, more generally, a linear operator) $H$, we use $\|H\|_1$ and $\|H\|$ to denote its trace norm and spectral norm respectively, that is, $\|H\|_1 = \mathrm{tr}\sqrt{H^\dagger H}$ and $\|H\| = \max_{\|u\|_2 = 1} \|H u\|_2$.
A Hermitian operator $H$ is positive semi-definite if $\langle u | H | u \rangle \ge 0$ for every $|u\rangle$. A (partial) density operator is a positive semi-definite operator whose trace is $1$ (or at most $1$, respectively).
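These norms are easy to experiment with numerically. The following sketch (ours, not the paper's, with arbitrary dimensions) checks the standard relations between the spectral norm, the trace norm, and density operators:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

s = np.linalg.svd(G, compute_uv=False)
trace_norm = s.sum()        # ||G||_1: sum of singular values
spectral_norm = s.max()     # ||G||: largest singular value
assert spectral_norm <= trace_norm <= d * spectral_norm

# A density operator: positive semi-definite with unit trace.
rho = G @ G.conj().T
rho /= np.trace(rho).real
evals = np.linalg.eigvalsh(rho)
assert evals.min() >= -1e-10 and abs(evals.sum() - 1) < 1e-9
```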
Viewing a Learning Problem as a Matrix
Let $M : A \times X \to \{-1, 1\}$ be a matrix. The matrix $M$ corresponds to the following learning problem. There is an unknown element $x \in X$ that was chosen uniformly at random. A learner tries to learn $x$ from samples $(a, b)$, where $a \in A$ is chosen uniformly at random and $b = M(a, x)$. That is, the learning algorithm is given a stream of samples $(a_1, b_1), (a_2, b_2), \ldots$, where each $a_t$ is uniformly distributed and, for every $t$, $b_t = M(a_t, x)$. For each $a \in A$, we use $M_a$ to denote the vector corresponding to the $a$-th row of $M$.
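Concretely, the sample stream can be mocked up as follows (a toy instance of our own with hypothetical parameters, using parity on 3 bits as the matrix $M$):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
X = np.arange(2 ** n)       # concepts x (columns); rows a range over the same set

# Parity learning as a matrix: M(a, x) = (-1)^(a.x mod 2).
M = np.array([[(-1) ** bin(a & x).count("1") for x in X] for a in X])

x_secret = int(rng.integers(2 ** n))                 # unknown x, uniform over X
stream = [(int(a), int(M[a, x_secret])) for a in rng.integers(2 ** n, size=5)]

# Each sample is a uniformly random row index a_t together with b_t = M(a_t, x).
assert all(b == M[a, x_secret] for a, b in stream)
```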
Extractors
A matrix $M : A \times X \to \{-1, 1\}$ is a $(k, \ell)$-$L_2$-extractor with error $2^{-r}$, if for every distribution $P$ over $X$ with $\|P\|_2 \le \frac{2^{\ell}}{\sqrt{|X|}}$, there are at most $2^{-k} \cdot |A|$ rows $a \in A$ such that
$|\langle M_a, P \rangle| \ge 2^{-r}$.
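As a minimal illustration of bias (ours; not a proof of the extractor property), for the parity matrix every nonzero row is perfectly balanced, so against the uniform distribution only the single all-ones row $a = 0$ has any bias:

```python
import numpy as np

n = 4
X = np.arange(2 ** n)
M = np.array([[(-1) ** bin(a & x).count("1") for x in X] for a in X])

P_uniform = np.full(2 ** n, 2.0 ** -n)
biases = np.abs(M @ P_uniform)          # |<M_a, P>| for each row a

assert biases[0] == 1.0                 # row a = 0 is all ones
assert np.allclose(biases[1:], 0.0)     # every other row is exactly balanced
print(int((biases >= 0.5).sum()))       # number of high-bias rows -> 1
```

Establishing the extractor property requires controlling the bias for all distributions with small $\ell_2$-norm, not just the uniform one; this snippet only shows the base case.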
3.2 Anti-Concentration Bound for Quadratic Form on Unit Vectors
Lemma 3.1.
There exists an absolute constant $c > 0$ such that the following holds. Let $H$ be a Hermitian operator over a $d$-dimensional Hilbert space, and let $|u\rangle$ be a uniformly random unit vector in that space. Then for every $\varepsilon > 0$, we have
Proof.
Let $g_1, \ldots, g_d$ be independent standard Gaussians. Notice that $\sum_i g_i^2$ follows a $\chi^2$-distribution, and $\langle u | H | u \rangle$ is equidistributed as the corresponding Gaussian quadratic form normalized by $\sum_i g_i^2$. Therefore, by the union bound, we have
For the first term, notice that (see e.g. [RS08, Chapter 5]) which is no smaller than . Therefore, by Carbery–Wright inequality [CW01], there exists an absolute constant such that
For the second term, the standard Laurent–Massart bound on $\chi^2$-distributions [LM00] gives:
3.3 Multipartite Quantum Systems
The state of qubits can be represented in a Hilbert space . In a product of Hilbert spaces , a multipartite partial system is represented by a partial density operator . For a subset of indices, the subsystem on (or for short) is defined by tracing out , that is,
Now for any two disjoint subsets , given some , the conditional system on is defined as
which is a partial density operator on . Note that the trace
only depends on the system and , while being independent of the choice of .
Another simple fact that will be repeatedly used later on is that for an orthogonal basis of , we have
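The basis-independence of such partial-trace expressions can be verified numerically. The sketch below (ours, with arbitrary small dimensions) traces out $\mathcal{H}_B$ once in the computational basis and once in a random orthonormal basis, and checks that the results agree:

```python
import numpy as np

rng = np.random.default_rng(4)
dA, dB = 3, 4

G = rng.standard_normal((dA * dB, dA * dB)) + 1j * rng.standard_normal((dA * dB, dA * dB))
rho = G @ G.conj().T                    # an (unnormalized) PSD operator on H_A (x) H_B

def partial_trace_B(rho, basis_vectors):
    """Trace out H_B using the given orthonormal basis of H_B."""
    out = np.zeros((dA, dA), dtype=complex)
    for e in basis_vectors:
        K = np.kron(np.eye(dA), e.conj().reshape(1, dB))   # (I (x) <e|)
        out += K @ rho @ K.conj().T
    return out

comp = partial_trace_B(rho, np.eye(dB))                    # computational basis
U, _ = np.linalg.qr(rng.standard_normal((dB, dB)) + 1j * rng.standard_normal((dB, dB)))
other = partial_trace_B(rho, U.T)                          # columns of a random unitary
assert np.allclose(comp, other)
assert np.isclose(np.trace(comp), np.trace(rho))
```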
3.4 Classical-Quantum Systems
In the underlying space of the multipartite system, we say is classical if there is a fixed orthogonal basis of , such that for every multipartite system , every pair of distinct and every two states , we have
Without loss of generality, in the rest of the work we always assume is the set of computational basis states. We also identify with the discrete set , and remove the Dirac brackets when we talk about the classical elements in . In this case every multipartite system can be written as a direct sum
The reader may find this direct sum viewpoint easier to handle in some later scenarios.
When $W$ is classical, conditioned on any event on registers disjoint from $W$, the resulting system on $W$ is represented as a diagonal matrix $\rho$ on $\mathcal{H}_W$. If $\mathrm{tr}(\rho) \ne 0$, it induces a distribution over the computational basis states of $\mathcal{H}_W$, defined as

$P(w) = \dfrac{\langle w | \rho | w \rangle}{\mathrm{tr}(\rho)}$.   (3)
In the rest of this paper, whenever we use this notation , it is always implicitly assumed that and the distribution exists.
In this work we typically consider the following scenario: There is a quantum memory register ranging in the Hilbert space , and a classical memory register ranging in the set of memory states , along with some classical information (later in the work, it is the concept to be learned) that is correlated with and . We will make use of the following fact:
Claim 3.2.
Let $\rho$ be a classical-quantum system over a classical register $W$ and a quantum register $V$. For every $w \in W$, the conditional distribution given $w$ is a convex combination of the conditional distributions given $(w, |v\rangle)$ for some states $|v\rangle$ in $\mathcal{H}_V$.
Proof.
Let be an orthogonal basis of , so that we have (from the end of last section)
Therefore the conditional distribution given $w$ is a linear combination of the conditional distributions given $(w, |v\rangle)$ for basis states $|v\rangle$, with non-negative coefficients. Since these are all distributions, it must be a convex combination. ∎
Characterization of operators over classical-quantum hybrid systems.
Now we identify all possible operators on the classical-quantum hybrid memory space. Before imposing the assumption that the classical register is classical, we think of a quantum channel operating on the system as acting on the underlying space. We then consider the set of all such quantum channels satisfying the following: for every classical-quantum system, the output is still classical. That is, for every two states and every pair of distinct classical values, we have
Note that not all channels in are physically realizable. For instance, with one-bit classical memory and no quantum memory, the channel
is not a classical operator. However, since we are constrained to classical-quantum systems, this channel is effectively equivalent to an identity channel on one-bit classical memory. Generally speaking, every channel in the set is equivalent to a channel controlled by the classical memory. Below, we prove this observation and use it to show the following claim:
Claim 3.3.
Let $\rho$ be a classical-quantum system over a classical register $W$ and a quantum register $V$. Let $\Phi$ be a channel as above, and let $\rho'$ denote the system after applying $\Phi$ to $(V, W)$ and the identity to the rest. Then for every $w' \in W$ and every quantum state $|v'\rangle$, the conditional distribution of $\rho'$ given $(w', |v'\rangle)$ is a convex combination of the conditional distributions of $\rho$ given $(w, |v\rangle)$ for some $w \in W$ and states $|v\rangle$.
One difference between Claim 3.2 and Claim 3.3 is that in Claim 3.3 it is not always possible to take the states $|v\rangle$ from an orthogonal basis of $\mathcal{H}_V$, while in Claim 3.2 it is. Although the difference does not matter in this work, we mention it here for clarity.
Proof.
Since , the following channel is functionally equivalent to for classical-quantum systems:
The physical meaning of is to measure in the computational basis (which should not change the functionality we care about) and then apply .
By defining the channel , the above can be alternatively written as:
Now consider the Kraus representation of each , that is, a finite set of linear operators such that
We can write
where in each term of the summation, . Similarly to the argument in 3.2, is a convex combination of . ∎
3.5 Branching Program with Hybrid Memory
For a learning problem that corresponds to the matrix , a branching program of hybrid memory with -bit classical memory, -qubit quantum memory and length is specified as follows.
At each stage , the memory state of the branching program is described as a classical-quantum system over quantum memory space and classical memory space . The memory state evolves based on the samples that the branching program receives, and therefore depends on the unknown element . We can then consider the overall system over , in which holds the unknown concept , resulting in a classical-quantum system . It always holds that the distribution of is uniform, i.e.,
Initially the memory is independent of and can be arbitrarily initialized. We assume that
At each stage , the branching program receives a sample , where and , and applies an operation over its memory state. Thus the evolution of the entire system can be written as
Finally, at stage , a measurement in the computational basis is applied to , and the branching program outputs an element as a function of the measurement result . The success probability of the program is the probability that , which can be formulated as
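To contrast with the memory-bounded setting, the following Python sketch (illustrative only; the dimension and sample counts are arbitrary) simulates the sample stream for parity learning and shows that a learner with unrestricted memory, which simply stores all samples (roughly quadratically many classical bits) and runs Gaussian elimination over GF(2), recovers the concept from linearly many samples:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = rng.integers(0, 2, size=n)          # the unknown concept in {0,1}^n

def sample():
    a = rng.integers(0, 2, size=n)      # a chosen uniformly at random
    return a, int(a @ x) % 2            # b = <a, x> mod 2

def solve_gf2(A, b):
    # Gaussian elimination over GF(2) on the augmented system [A | b]
    A, b = A.copy() % 2, b.copy() % 2
    m, ncols = A.shape
    r, piv_cols = 0, []
    for c in range(ncols):
        piv = next((i for i in range(r, m) if A[i, c]), None)
        if piv is None:
            continue
        A[[r, piv]], b[[r, piv]] = A[[piv, r]], b[[piv, r]]
        for i in range(m):
            if i != r and A[i, c]:
                A[i] = (A[i] + A[r]) % 2
                b[i] ^= b[r]
        piv_cols.append(c)
        r += 1
    sol = np.zeros(ncols, dtype=int)
    for i, c in enumerate(piv_cols):
        sol[c] = b[i]
    return sol, r                        # candidate solution and rank

# Store 3n samples; with high probability the system has full rank.
samples = [sample() for _ in range(3 * n)]
A = np.array([a for a, _ in samples])
b = np.array([bb for _, bb in samples])
recovered, rank = solve_gf2(A, b)
if rank == n:
    assert np.array_equal(recovered, x)
```

It is exactly this "store everything" strategy that becomes impossible once both the classical and the quantum memory are restricted.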
4 Main Result
Theorem 2.
Let be two finite sets with . Let be a matrix which is a - extractor with error for sufficiently large and , where . Let
Let be a branching program for the learning problem corresponding to , described by classical-quantum systems , with -qubit quantum memory , -bit classical memory and length . If , and , the success probability of is at most .
From now on we let and . Then we have the following inequalities to be used later:
(4)
(5)
(6)
We leave a detailed calculation for the above inequalities in Appendix A.
4.1 Truncated Classical-Quantum Systems
Here we describe how to truncate a partial classical-quantum system according to some desired property on . The goal is to remove the parts of where is not satisfied. We execute the following procedure:
1. Maintain a partial system initialized as , and subspaces initialized as for each .
2. Pick and such that and is false.
3. Change the partial system into the following system by projection:
and change to its subspace orthogonal to , that is
4. Repeat from step 2 until there is no such and . Denote the final system as .
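The mechanics of the procedure can be illustrated on a toy system. The sketch below (not the property used in the proof; the predicate, dimension, and threshold are hypothetical) repeatedly projects out directions violating a property and confirms that the truncation only removes weight and preserves positivity:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
A = rng.normal(size=(d, d))
rho = A @ A.T
rho /= np.trace(rho)                 # a (normalized) partial system, PSD

def prop_holds(weight, total):
    # hypothetical property: a direction is acceptable unless it
    # carries more than 40% of the remaining weight
    return weight <= 0.4 * total

P = np.eye(d)                        # projector onto the remaining subspace
while True:
    cur = P @ rho @ P
    w, V = np.linalg.eigh(cur)
    bad = [i for i in range(d)
           if w[i] > 1e-12 and not prop_holds(w[i], np.trace(cur))]
    if not bad:
        break
    v = V[:, bad[0]]                 # step 2: pick a violating direction
    Q = np.eye(d) - np.outer(v, v)   # step 3: project it out
    P = Q @ P                        # shrink the remaining subspace

rho_trunc = P @ rho @ P              # step 4: the final truncated system
assert np.trace(rho_trunc) <= np.trace(rho) + 1e-9   # trace only decreases
assert np.min(np.linalg.eigvalsh(rho_trunc)) >= -1e-9  # still PSD
```

Each round removes one dimension from the support, so the loop terminates after at most d rounds, mirroring the bound on the number of truncation steps used later.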
In step 2 we pick and arbitrarily as long as they satisfy the requirements; however, we can always think of the procedure as iterating over and processing each separately. The choices of for each do affect the final system ; yet, as we will see later, these choices are irrelevant to our proof.
Below, we give two useful lemmas on truncated systems.
Lemma 4.1.
For every and such that , there exists in the remaining subspace such that
Proof.
It suffices to prove the lemma with one round of the truncation procedure executed. Suppose the is picked in step 2, resulting in the partial system
We can write
• If , then
and the lemma holds directly by choosing .
• If , then with , we have
By the fact that , we must have . Therefore if we let , which is the normalized projection of onto the orthogonal subspace of , the above equality implies that . Meanwhile, since we have , which completes the proof. ∎
A direct corollary of the above lemma is that if only depends on the distribution , then holds for every and in the truncated system , even when is not in the remaining subspace .
Our second lemma is based on the following fact, which bounds the trace distance between a partial system and its projection; its proof can be found in Appendix B.
Proposition 4.2.
For every partial system and projection operator on , we have
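Proposition 4.2 bounds the trace distance between a partial system and its projection. As a numerical sanity check of a standard gentle-measurement-type bound of this form (the exact statement of Proposition 4.2 may use different constants; the operators below are random illustrative data), consider:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real            # a density operator

# a random projector Pi of rank 4
Q, _ = np.linalg.qr(rng.normal(size=(d, 4)) + 1j * rng.normal(size=(d, 4)))
Pi = Q @ Q.conj().T

def trace_norm(M):
    # trace norm of a Hermitian matrix: sum of |eigenvalues|
    return np.abs(np.linalg.eigvalsh(M)).sum()

eps = 1 - np.trace(Pi @ rho).real    # weight removed by the projection
td = trace_norm(rho - Pi @ rho @ Pi)
# gentle-measurement-type bound: ||rho - Pi rho Pi||_1 <= 2 sqrt(eps)
assert td <= 2 * np.sqrt(eps) + 1e-9
```

The qualitative message matches the proposition: if the projection removes little weight, the projected system is close in trace distance to the original.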
Lemma 4.3.
For each , let be the states picked in step 2 within . Then
Proof.
Since always holds, by summing over all we get the following corollary:
Corollary 4.4.
Let be all of the memory states picked in step 2. Then
4.2 Truncated Branching Program
The properties that we desire for the partial system consist of three parts:
• Small norm: Let be the property that
• Small norm: Let be the property that
• Even division: For every , let be the property that
Now we define the truncated branching program, by specifying the truncated partial classical-quantum system for each stage . Initially let . For each stage , the truncation consists of three ingredients (below we ignore the superscripts on for convenience):
1. Remove parts where is large. That is, let .
2. Remove parts where is large. This is done in two steps.
- First, let be an indicator vector such that if and only if
Let , where is the projection operator acting on .
- To make sure that the distributions do not change too much after the projection , for each , let be the property that
Let .
3. For each , remove (only for this ) parts where is not evenly divided by . That is, for each , let .
Then, if , for each we evolve the system by applying the sample operations as the original branching program on , so that we have
4.3 Bounding the Truncation Difference
In order to show that the success probability of the original branching program is low, the plan is to prove an upper bound on the success probability of the truncated branching program , and bound the difference between the two probabilities.
Here we bound the difference by the trace distance between the two systems and . We will show that the contribution to the trace distance from each one of the truncation ingredients is small, and in addition the evolution preserves the trace distance.
4.3.1 Truncation by
Lemma 4.5.
For every , and such that is violated (that is, ), we must have .
The lemma says that for any direction picked by the truncation procedure, the weight is small, so the truncation does not change the state significantly.
Proof.
This is our main technical lemma and we defer the proof to Section 5. ∎
Since there are at most such directions picked in the truncation procedure, we conclude the following corollary.
Corollary 4.6.
For every , we have .
Proof.
Recall that . Since , the truncation lasts for at most rounds. Since in every round the picked has , by Corollary 4.4 we have
4.3.2 Truncation by
Lemma 4.7.
For every and we have
Proof.
Corollary 4.8.
For every and every , we have , and
Moreover, we have .
Proof.
Since and are both classical and , we have
which is positive semi-definite. Recalling that (Equation 3)
we have
And therefore, as is positive semi-definite, we have
Lemma 4.9.
For every , and such that is violated (that is, ) or is violated (that is, ), we must have .
Proof.
If is violated, let be the one such that . If then , while if then by Corollary 4.8,
Hence we always have
where the first inequality comes from the fact that and Equation 3.
Corollary 4.10.
For every , we have .
Proof.
Recall that . For each , the truncation picks at most states , each with . Therefore by applying Lemma 4.3 for each , we have
4.3.3 Truncation by
Notice that in the truncation step from to , the distribution might change and no longer satisfy . However, with the truncation by , any such distribution that changes too much is eliminated, and we have the following guarantee.
Lemma 4.11.
For every , and , we have
Proof.
By Lemma 4.1, there exists such that . The truncation by ensures that , and therefore
Lemma 4.12.
For every partial classical-quantum system over such that holds for every , we have
Proof.
Notice that we can think of as being . This is because we can first assume that is full rank (otherwise restrict to its subspace, and the conclusion of this lemma still holds), and if we have a diagonalization for some non-singular , then consider the new system
and the set of distributions and over are the same, since for . With we have for every , and thus .
Let be the set of such that there exists with . For each , let
which is a Hermitian operator on . There exists such that
which means that . Now let be a uniformly random unit vector in , and by Lemma 3.1 we know that for some absolute constant ,
The second-to-last inequality comes from Eq. 4, while the last inequality holds because is assumed to be sufficiently large.
Since the above holds for every , it implies that is at least . It means that there exists some such that for at least half of . On the other hand, since is a -extractor with error , and , there are at most fraction of such that . That means
Here , by the definition of . ∎
Corollary 4.13.
For every , we have .
Proof.
For each , the partial system satisfies the condition of Lemma 4.12 since for every ,
Notice that for each such that there does not exist with (that is, when holds for every ), the subsystem is not touched in the truncation by and we have . Therefore
4.3.4 Evolution preserves trace distance
Lemma 4.14.
For every , we have .
Proof.
Recall that
Therefore by triangle inequality and contractivity of quantum channels under trace norms,
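The contractivity of quantum channels under the trace norm, invoked above, can be checked numerically. The sketch below (illustrative parameters) builds a random CPTP map from Kraus operators obtained by slicing a random isometry:

```python
import numpy as np

rng = np.random.default_rng(5)
d, k = 4, 3

# random CPTP map via k Kraus operators: slice a (k*d x d) isometry
G = rng.normal(size=(k * d, d)) + 1j * rng.normal(size=(k * d, d))
Q, _ = np.linalg.qr(G)               # Q has orthonormal columns
K = [Q[i * d:(i + 1) * d, :] for i in range(k)]
# completeness relation: sum_i K_i^dagger K_i = I
assert np.allclose(sum(Ki.conj().T @ Ki for Ki in K), np.eye(d))

def channel(r):
    return sum(Ki @ r @ Ki.conj().T for Ki in K)

def rand_density(dd):
    A = rng.normal(size=(dd, dd)) + 1j * rng.normal(size=(dd, dd))
    r = A @ A.conj().T
    return r / np.trace(r).real

def trace_norm(M):
    return np.abs(np.linalg.eigvalsh(M)).sum()   # M Hermitian

rho, sigma = rand_density(d), rand_density(d)
# quantum channels are contractive under the trace norm
assert trace_norm(channel(rho) - channel(sigma)) <= trace_norm(rho - sigma) + 1e-9
```

This is the only property of the evolution step used in Lemma 4.14: applying the same channel to both systems cannot increase their trace distance.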
4.4 Proof of Theorem 2
Proof.
First, combining Corollaries 4.6, 4.8, 4.10, 4.13 and 4.14 we have
Since , by triangle inequality we know that , and thus
This bounds the difference between the measurement probabilities of and under any measurement, specifically the difference between the success probability of the branching program and the following value on :
Since and , the above value is at most . Therefore the success probability of the branching program is at most (recall that )
5 Proof of Lemma 4.5
The first step towards proving Lemma 4.5 is to analyze how evolves according to the rule
We introduce the following notations. For every and , let
which is a - vector that indicates whether . Let
(7)
so that we can write
(8)
Thus 3.3 implies that is a convex combination of for some and .
5.1 Target Distribution and Badness
Before considering the target distribution, let us first establish that the -norms of cannot be too large:
Lemma 5.1.
For every , , , we have
Proof.
When the statement is clearly true as is always uniform.
Now assume . By Lemma 4.1 and Lemma 4.11 we know that
for every and , as is truncated from . Since is true, meaning that the distribution is evenly divided by , we further have
Since is a convex combination of , by convexity its -norm is bounded by . ∎
From now on we use to denote a fixed target distribution (which we will later choose to be the distribution in Lemma 4.5), such that
We want to bound the progress of , which starts off as at , and becomes at least when . Note that by Cauchy-Schwarz we always have
(9)
In order to bound the progress, we introduce some new notations. For any superscript (such as ) on the partial systems, we use to denote . Notice that
Similarly, can be deduced from via
(10)
Therefore we can bound the norm of as
Now we can identify the places where increases by a lot, which happens when the inner product is not evenly divided by some (we will see the reason in the analysis later). Formally, at stage , we say is bad if
(11)
Lemma 5.2.
For every and , we have
Proof.
Since is truncated from , Lemma 4.1 shows that for every , and there is such that
and by Eq. 10 it also implies that
Now fix some , and let be the set of of such that
Then contains all such that is bad, and our goal is to bound the fraction of in .
In the rest of the proof we temporarily omit the superscript and write and simply as and . For the same reason as in Lemma 4.12 we can assume that , and thus
where the last inequality is by Lemma 4.11 and Cauchy-Schwarz, in the same way as Eq. 9.
Suppose that we have diagonalization , where is unitary and is diagonal and non-negative. Let be the subspace spanned by over the computational basis vectors such that . So for every we have
We claim that for every , there exists such that . To prove the claim, let be the projection operator from to , and then can be conceptually seen as a truncated partial system where holds when for the fixed . By Lemma 4.3 we have
Since , assume for we have and . Let , then we have
where the last step is due to . Thus
Similarly to the proof for Lemma 4.12, for each let
which is a Hermitian operator on . For each , let . Recall that , and therefore
And that means
We showed above that there exists , and thus such that
which means that for , the restriction of on , we have . Now consider a uniformly random unit vector in , and by Lemma 3.1 we know that for some absolute constant ,
Therefore, for the random vector where is uniform in , we conclude that
On the other hand, as , it also holds that , therefore is always true. Thus there exists a that simultaneously satisfies
for at least of . Since
and is a -extractor with error , there are at most fraction of such that , which means that
5.2 Badness Levels
At stage , for each classical memory state we count how many times the path to it has been bad; this count is a random variable depending on the previous random choices of . It is stored in another classical register , which we call the badness level and which takes values in . It is initially set to , that is, we let
We ensure that the distribution of always only depends on and is independent of and conditioned on , using the following updating rules on the combined system for each stage :
• The truncation steps are executed independently of . Therefore, for each we let
(12)
• The value of updates before the evolution step, where for each and we let
Here is a permutation operator, depending on , acting on such that
• For the evolution step, we apply the channels on the memories and to get
Notice that the evolution step might introduce dependencies between and . However, such dependencies are eliminated later due to how we handle the truncation steps (12), and thus do not affect our proof.
We can check that the combined partial system defined above is consistent with the partial system that we discussed in previous sections, in the sense that always holds:
• For the truncation step, it is straightforward to check that
•
• The evolution step can be checked directly from the formula without (Eq. 8):
So all previously proved properties about are preserved. In addition, we prove the following two properties about badness levels.
Lemma 5.3.
For every , and , we have
Proof.
We prove it by induction on . For the lemma is true as and .
Suppose the lemma holds true for some . By a similar argument as in Lemma 4.11 and applying Lemma 4.1 multiple times, we know that for every and , there exists and such that
and therefore
(13)
Also, the truncation step by implies that . That is, for both ,
Therefore we have, unconditionally
(14)
When the inner product is evenly divided, i.e. , we further have
which means that
(15)
Now there are three cases to discuss:
•
•
•
The last inequality follows from . Hence we obtain the same conclusion from all three cases.
For the evolution step, since is classical we can view and as a whole and apply 3.3 on , which asserts that is a convex combination of for some and . Then by linearity we conclude that (it should be noted that in , and are not independent, while in they are independent conditioned on ; nevertheless, independence in is neither needed nor used here, and we can conclude the final inequality by linearity, taking the corresponding convex combination of all the inequalities).
Lemma 5.4.
For every we have
Proof.
We prove it by induction on . For the lemma holds as . Also notice that the lemma is trivially true for every when .
Now suppose the lemma holds for some . By definition we have
Therefore
By Lemma 5.2 we know that for every , the probability that is bad for is at most . In other words, for every ,
where the probability is taken over the random choice of . It means that
Notice that
and thus we conclude that
With the lemmas above in hand, we can finally prove Lemma 4.5.
Acknowledgement
We are grateful to Uma Girish for many important discussions and suggestions on the draft of this paper, and to the anonymous reviewers for their helpful comments.
References
- [AAB+19] Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando GSL Brandao, David A Buell, et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019.
- [Aar18] Scott Aaronson. Shadow tomography of quantum states. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 325–338, 2018.
- [ACQ22] Dorit Aharonov, Jordan Cotler, and Xiao-Liang Qi. Quantum algorithmic measurement. Nature Communications, 13(1):1–9, 2022.
- [ADR02] Yonatan Aumann, Yan Zong Ding, and Michael O Rabin. Everlasting security in the bounded storage model. IEEE Transactions on Information Theory, 48(6):1668–1680, 2002.
- [AR99] Yonatan Aumann and Michael O Rabin. Information theoretically secure communication in the limited storage space model. In Annual International Cryptology Conference, pages 65–79. Springer, 1999.
- [Aud07] Koenraad MR Audenaert. A sharp continuity estimate for the von Neumann entropy. Journal of Physics A: Mathematical and Theoretical, 40(28):8127, 2007.
- [BCL20] Sebastien Bubeck, Sitan Chen, and Jerry Li. Entanglement is necessary for optimal quantum property testing. In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 692–703. IEEE, 2020.
- [BGY18] Paul Beame, Shayan Oveis Gharan, and Xin Yang. Time-space tradeoffs for learning finite functions from random evaluations, with applications to polynomials. In COLT, volume 75 of Proceedings of Machine Learning Research, pages 843–856. PMLR, 2018.
- [BY21] Anne Broadbent and Peter Yuen. Device-independent oblivious transfer from the bounded-quantum-storage-model and computational assumptions. arXiv preprint arXiv:2111.08595, 2021.
- [CCHL22] Sitan Chen, Jordan Cotler, Hsin-Yuan Huang, and Jerry Li. Exponential separations between learning with and without quantum memory. In 2021 IEEE 62nd Annual Symposium on Foundations of Computer Science (FOCS), pages 574–585. IEEE, 2022.
- [CLO22] Sitan Chen, Jerry Li, and Ryan O’Donnell. Toward instance-optimal state certification with incoherent measurements. In Conference on Learning Theory, pages 2541–2596. PMLR, 2022.
- [CM97] Christian Cachin and Ueli Maurer. Unconditional security against memory-bounded adversaries. In Annual International Cryptology Conference, pages 292–306. Springer, 1997.
- [CW01] Anthony Carbery and James Wright. Distributional and norm inequalities for polynomials over convex bodies in . Mathematical Research Letters, 8(3):233–248, 2001.
- [DFR+07] Ivan B Damgård, Serge Fehr, Renato Renner, Louis Salvail, and Christian Schaffner. A tight high-order entropic quantum uncertainty relation with applications. In Annual International Cryptology Conference, pages 360–378. Springer, 2007.
- [DFSS07] Ivan B Damgård, Serge Fehr, Louis Salvail, and Christian Schaffner. Secure identification and qkd in the bounded-quantum-storage model. In Annual International Cryptology Conference, pages 342–359. Springer, 2007.
- [DFSS08] Ivan B Damgård, Serge Fehr, Louis Salvail, and Christian Schaffner. Cryptography in the bounded-quantum-storage model. SIAM Journal on Computing, 37(6):1865–1890, 2008.
- [DM02] Stefan Dziembowski and Ueli Maurer. Tight security proofs for the bounded-storage model. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pages 341–350, 2002.
- [DM04] Stefan Dziembowski and Ueli Maurer. On generating the initial key in the bounded-storage model. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 126–137. Springer, 2004.
- [DQW21] Yevgeniy Dodis, Willy Quach, and Daniel Wichs. Speak much, remember little: Cryptography in the bounded storage model, revisited. Cryptology ePrint Archive, 2021.
- [DQW22] Yevgeniy Dodis, Willy Quach, and Daniel Wichs. Authentication in the bounded storage model. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 737–766. Springer, 2022.
- [DR02] Yan Zong Ding and Michael O Rabin. Hyper-encryption and everlasting security. In Annual Symposium on Theoretical Aspects of Computer Science, pages 1–26. Springer, 2002.
- [FvdG99] Christopher A Fuchs and Jeroen van de Graaf. Cryptographic distinguishability measures for quantum-mechanical states. IEEE Transactions on Information Theory, 45(4):1216–1227, 1999.
- [GKK+08] Dmitry Gavinsky, Julia Kempe, Iordanis Kerenidis, Ran Raz, and Ronald de Wolf. Exponential separation for one-way quantum communication complexity, with applications to cryptography. SIAM J. Comput., 38(5):1695–1708, 2008.
- [GKLR21] Sumegha Garg, Pravesh K. Kothari, Pengda Liu, and Ran Raz. Memory-sample lower bounds for learning parity with noise. In APPROX-RANDOM, volume 207 of LIPIcs, pages 60:1–60:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
- [GKR20] Sumegha Garg, Pravesh K. Kothari, and Ran Raz. Time-space tradeoffs for distinguishing distributions and applications to security of goldreich’s PRG. In APPROX-RANDOM, volume 176 of LIPIcs, pages 21:1–21:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020.
- [GRT18] Sumegha Garg, Ran Raz, and Avishay Tal. Extractor-based time-space lower bounds for learning. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2018, page 990–1002, New York, NY, USA, 2018. Association for Computing Machinery.
- [GZ19] Jiaxin Guan and Mark Zhandry. Simple schemes in the bounded storage model. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 500–524. Springer, 2019.
- [HHJ+16] Jeongwan Haah, Aram W Harrow, Zhengfeng Ji, Xiaodi Wu, and Nengkun Yu. Sample-optimal tomography of quantum states. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 913–925, 2016.
- [HKP20] Hsin-Yuan Huang, Richard Kueng, and John Preskill. Predicting many properties of a quantum system from very few measurements. Nature Physics, 16(10):1050–1057, 2020.
- [HN06] Danny Harnik and Moni Naor. On everlasting security in the hybrid bounded storage model. In International Colloquium on Automata, Languages, and Programming, pages 192–203. Springer, 2006.
- [KK12] Roy Kasher and Julia Kempe. Two-source extractors secure against quantum adversaries. Theory of Computing, 8(1):461–486, 2012.
- [KRT17] Gillat Kol, Ran Raz, and Avishay Tal. Time-space hardness of learning sparse parities. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 1067–1080, 2017.
- [LM00] Beatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. The Annals of Statistics, 28(5), October 2000.
- [Lu02] Chi-Jen Lu. Hyper-encryption against space-bounded adversaries from on-line strong extractors. In Annual International Cryptology Conference, pages 257–271. Springer, 2002.
- [LV21] Jiahui Liu and Satyanarayana Vusirikala. Secure multiparty computation in the bounded storage model. In IMA International Conference on Cryptography and Coding, pages 289–325. Springer, 2021.
- [Mau92] Ueli M Maurer. Conditionally-perfect secrecy and a provably-secure randomized cipher. Journal of Cryptology, 5(1):53–66, 1992.
- [MM18] Dana Moshkovitz and Michal Moshkovitz. Entropy samplers and strong generic lower bounds for space bounded learning. In ITCS, volume 94 of LIPIcs, pages 28:1–28:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018.
- [MSTS04] Tal Moran, Ronen Shaltiel, and Amnon Ta-Shma. Non-interactive timestamping in the bounded storage model. In Annual International Cryptology Conference, pages 460–476. Springer, 2004.
- [OP04] Masanori Ohya and Dénes Petz. Quantum entropy and its use. Springer Science & Business Media, 2004.
- [PMLA13] Stefano Pironio, Ll Masanes, Anthony Leverrier, and Antonio Acín. Security of device-independent quantum key distribution in the bounded-quantum-storage model. Physical Review X, 3(3):031007, 2013.
- [Raz17] Ran Raz. A time-space lower bound for a large class of learning problems. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 732–742, 2017.
- [Raz18] Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. J. ACM, 66(1), dec 2018.
- [RS08] Alvin C Rencher and G Bruce Schaalje. Linear models in statistics. John Wiley & Sons, 2008.
- [Sch07] Christian Schaffner. Cryptography in the bounded-quantum-storage model. arXiv preprint arXiv:0709.0289, 2007.
- [Sha14] Ohad Shamir. Fundamental limits of online and distributed algorithms for statistical learning and estimation. Advances in Neural Information Processing Systems, 27, 2014.
- [SSV19] Vatsal Sharan, Aaron Sidford, and Gregory Valiant. Memory-sample tradeoffs for linear regression with small error. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 890–901, 2019.
- [SVW16] Jacob Steinhardt, Gregory Valiant, and Stefan Wager. Memory, communication, and statistical queries. In Conference on Learning Theory, pages 1490–1516. PMLR, 2016.
- [Uhl76] Armin Uhlmann. The “transition probability” in the state space of a *-algebra. Reports on Mathematical Physics, 9(2):273–279, 1976.
- [Wri16] John Wright. How to learn a quantum state. PhD thesis, Carnegie Mellon University, 2016.
- [WW08] Stephanie Wehner and Jürg Wullschleger. Composable security in the bounded-quantum-storage model. In International Colloquium on Automata, Languages, and Programming, pages 604–615. Springer, 2008.
Appendix A Bounding Parameters for Theorem 2
Proof.
For Equation 4, since by the definition of and , we have:
For Equation 5, we use the fact that , therefore
Solving the inequality (together with the fact that ) gives us , which follows since by definition.
Appendix B Proof for Proposition 4.2
We first state a variant of the Fuchs-van de Graaf inequality [FvdG99] on fidelity, which is defined for two partial density operators and as
Lemma B.1.
Let be two semi-definite operators. Assume . Then
Notice that when , the above inequality is the original Fuchs-van de Graaf inequality.
Proof.
Let and be purifications of and , that is, with and , where is the ambient space of and , and is some finite-dimensional Hilbert space.
Let be a unitary that diagonalizes , that is, there is a diagonal matrix such that . Let be the diagonals of and respectively. We have
Therefore, by Cauchy-Schwarz inequality,
Notice that , and . By Uhlmann’s theorem [Uhl76], we can also choose and such that . Plugging them into the above inequality concludes the proof. ∎
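The original Fuchs-van de Graaf inequality can be verified numerically for density operators. The sketch below (random illustrative states) computes the fidelity as the trace norm of the product of operator square roots and checks both directions of the inequality:

```python
import numpy as np

rng = np.random.default_rng(6)

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = A @ A.conj().T
    return r / np.trace(r).real

def sqrtm_psd(M):
    # square root of a positive semi-definite matrix via eigendecomposition
    w, U = np.linalg.eigh(M)
    return (U * np.sqrt(np.clip(w, 0, None))) @ U.conj().T

def fidelity(rho, sigma):
    # F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1  (nuclear norm)
    s = np.linalg.svd(sqrtm_psd(rho) @ sqrtm_psd(sigma), compute_uv=False)
    return s.sum()

def trace_dist(rho, sigma):
    return 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()

d = 4
rho, sigma = rand_density(d), rand_density(d)
F, T = fidelity(rho, sigma), trace_dist(rho, sigma)
# Fuchs-van de Graaf: 1 - F <= T <= sqrt(1 - F^2)
assert 1 - F <= T + 1e-9
assert T <= np.sqrt(max(0.0, 1 - F * F)) + 1e-9
```

For subnormalized (partial) density operators, the variant stated in Lemma B.1 replaces the constants by quantities depending on the traces.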
Now we are ready to prove Proposition 4.2.
Proof for Proposition 4.2.
By Fuchs-van de Graaf inequality, it suffices to prove the following bound on fidelity:
Let be a purification of , that is, for some Hilbert space . Then is a purification of . By Uhlmann’s theorem we have
Appendix C Linear Quantum Memory Lower Bound
In this appendix, we prove Theorem 3, which gives a simpler proof of a linear quantum-memory lower bound (without classical memory). While Theorem 3 is qualitatively weaker than our main results in most cases, as it only gives a lower bound for programs with a linear-size quantum memory but without a (possibly quadratic-size) classical memory, it is technically incomparable to the main results: it is stated in terms of quantum extractors, and the bound on the quantum-memory size depends on a different parameter of the extractor. Additionally, the proof of Theorem 3 is significantly simpler than that of our main theorem.
We first define the quantum extractor property that we need, which is a simplified version of the ones considered in [KK12]. Given a matrix , consider two independent sources and uniformly distributed over and respectively. Suppose there is some quantum register whose state depends on and , and they together form a classical-quantum system
where is the state of when and . For any function on , we say that depends only on if for any and , whenever we have . In particular, depending only on is equivalent to being independent of , or .
We say that is an -strong -quantum extractor, if for every classical-quantum system , as above, with the -qubit quantum subsystem that depends only on , it holds that
Here is the uniform operator over one bit, and is the classical-quantum system constructed by adding a new classical register storing the value of and tracing out . In other words,
Notice that if we choose to be trivial, the above inequality immediately implies that .
As an example, the results in [KK12] imply that the inner product function on bits, where and
is an -strong -quantum extractor for every .
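As an illustration of the extractor behavior of the inner product function, the following sketch measures the empirical bias of the inner product over a random combinatorial box and checks it against the classical bound given by Lindsey's lemma (the quantum extractor property from [KK12] is stronger and is not tested here; all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10
N = 2 ** n

def rand_subset(k):
    # k distinct n-bit strings, as rows of a 0/1 matrix
    idx = rng.choice(N, size=k, replace=False)
    return (idx[:, None] >> np.arange(n)) & 1

A, B = rand_subset(128), rand_subset(128)
signs = (-1.0) ** ((A @ B.T) % 2)        # (-1)^{<a,b>} over the box A x B
bias = abs(signs.mean())

# Lindsey's lemma: the bias over any A x B box is at most sqrt(2^n/(|A||B|))
assert bias <= np.sqrt(N / (128 * 128)) + 1e-12
```

Here the guaranteed bound is 0.25, and typical random boxes exhibit a far smaller empirical bias; this is the classical analogue of the bounded-bias submatrix condition in the main theorem.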
In this section we prove the following theorem:
Theorem 3.
Let be two finite sets with . Let be a matrix which is a -strong -quantum extractor. Let be a branching program for the learning problem corresponding to , described by classical-quantum systems , with -qubit quantum memory and length , and without classical memory. Then the success probability of is at most
To prove the theorem, we first need to define the following measure of dependency:
Definition.
Let be a classical-quantum system over classical and quantum . The dependency of on in is defined as
where is taken over all density operators on . Notice that in this definition taking is almost optimal as we have
(16)
We also consider the quantum mutual information between and in , defined as
where denotes the von Neumann entropy, and denotes quantum relative entropy. When consists of qubits, we have the following relationship between our dependency measure and quantum mutual information:
Lemma C.1.
.
Proof.
On one hand, using the inequality on quantum relative entropy and trace distance (see e.g. [OP04, Theorem 1.15]), we have
On the other hand, Fannes-Audenaert inequality [Aud07] tells us that for every , the difference between the von Neumann entropies of any two states and on is bounded by
where is the binary entropy function. Since the state of conditioned on is , we have
as is concave and . Now let be the optimal density operator in the definition of . Plugging in Eq. 16, we conclude that
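The first inequality in this proof, relating quantum relative entropy to trace distance, is a Pinsker-type bound; the following sketch checks it numerically for random density operators (natural logarithms, so the bound is in nats; states are illustrative and full rank almost surely):

```python
import numpy as np

rng = np.random.default_rng(8)

def rand_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    r = A @ A.conj().T
    return r / np.trace(r).real

def mlog(M):
    # matrix logarithm of a positive definite matrix
    w, U = np.linalg.eigh(M)
    return (U * np.log(w)) @ U.conj().T

d = 4
rho, sigma = rand_density(d), rand_density(d)
rel_ent = np.trace(rho @ (mlog(rho) - mlog(sigma))).real   # S(rho||sigma), nats
tdist = np.abs(np.linalg.eigvalsh(rho - sigma)).sum()      # ||rho - sigma||_1
# Pinsker-type inequality (cf. [OP04, Theorem 1.15]):
# S(rho || sigma) >= ||rho - sigma||_1^2 / 2
assert rel_ent >= 0.5 * tdist ** 2 - 1e-9
```

The reverse direction in Lemma C.1, via the Fannes-Audenaert inequality, bounds entropy differences by trace distance and is dimension-dependent.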
Lemma C.2.
For every classical-quantum system with the -qubit quantum subsystem that depends only on , we have
Proof.
Since , it suffices to bound . To bound the latter, we first notice that since is a strong -quantum extractor,
As the total dimension of and is , by Lemma C.1 we have
Lemma C.3.
For every classical-quantum system with -qubit quantum subsystem that depends only on and , we have
Proof.
Let , where is the density matrix of when and . Then is a -bit quantum system that depends only on . Since can be decided from and , we have
We are now ready to prove Theorem 3. Let be the quantum channel applied on at stage with sample , and recall that the evolution of the system can be expressed as
Proof of Theorem 3.
We are going to bound the increment of , which is shorthand for . For now let us focus on some stage , and let be the density operator that minimizes . Notice that for every .
Since is a fixed quantum state, we can prepare and apply on to obtain a new quantum register , which depends only on and . Notice that
and therefore similarly to Lemma 4.14 (that evolution does not increase trace distance), we can show that
Hence we have
Since , we conclude that
This value bounds the difference between the success probability of and that of a quantum branching program whose memory is independent of . The latter is clearly at most , which finishes the proof. ∎