Indiana University Bloomington, USA and https://homes.luddy.indiana.edu/qzhangcs/ [email protected]://orcid.org/0000-0002-6851-3115supported by NSF CCF-1844234 and IU Luddy Faculty FellowshipIndiana University Bloomington, USA and https://homes.luddy.indiana.edu/mheidar/[email protected] supported by NSF CCF-2211423 \CopyrightQin Zhang and Mohsen Heidari \ccsdesc[500]Theory of computation Quantum query complexity \ccsdesc[500]Information systems Query operators \ccsdesc[500]Information systems Query optimization \EventEditorsJohn Q. Open and Joan R. Access \EventNoEds2 \EventLongTitle42nd Conference on Very Important Topics (CVIT 2016) \EventShortTitleCVIT 2016 \EventAcronymCVIT \EventYear2016 \EventDateDecember 24–27, 2016 \EventLocationLittle Whinging, United Kingdom \EventLogo \SeriesVolume42 \ArticleNo23
Quantum Data Sketches
Abstract
Recent advancements in quantum technologies, particularly in quantum sensing and simulation, have facilitated the generation and analysis of inherently quantum data. This progress underscores the necessity for developing efficient and scalable quantum data management strategies. This goal faces immense challenges due to the exponential dimensionality of quantum data and its unique quantum properties such as no-cloning and measurement stochasticity. Specifically, classical storage and manipulation of an arbitrary -qubit quantum state requires exponential space and time. Hence, there is a critical need to revisit foundational data management concepts and algorithms for quantum data. In this paper, we propose succinct quantum data sketches to support basic database operations such as search and selection. We view our work as an initial step towards the development of quantum data management model, opening up many possibilities for future research in this direction.
keywords:
quantum data representation, data sketching, query execution1 Introduction
Quantum information and computing, rooted in the principles of quantum mechanics, have emerged as an important field of study with far-reaching effects across a broad spectrum of disciplines. Central to the concept of quantum computing are quantum bits (or qubits), which set themselves apart from classical bits due to their ability to exist in a superposition of states, allowing a quantum computer to offer the potential computational advantage against classical computing.
Although significant advancements have been made in the development of quantum algorithms after several decades of research, only a handful provably outperform their classical counterparts. Notable examples include Shor’s algorithm for factorization [54], Grover’s algorithm for search [22], and linear system solvers [26]. These quantum algorithms typically start by encoding classical input data into quantum states, execute a series of quantum operations, and then measure the resulting quantum states and carry out specific post-processing on the measurement outcomes. The reasons for the difficulties in the design of quantum algorithms that can outperform classical counterparts on classical input data remain elusive.
In this paper, we take a different perspective, directing our attention towards quantum data themselves. The nature, along with scientific experiments spanning physics, chemistry, material science, biology, and other fields, generates massive quantities of quantum data every day. Sources include Hawking Radiation, Cosmic Microwave Background, quantum effects in neutron stars, quantum states in ultra-cold atoms, quantum information in DNA replication, etc. In many scenarios, there is a need for us to preserve quantum data that has been collected from nature or generated in labs for future analysis. For example, scientists often use photons collected from remote stars to study the properties of those astronomical objects. It would be beneficial to store those photons as quantum states in a database, since it may not be feasible to collect fresh photons from those astronomical objects at the time of data analysis. In the case that the quantum states are prepared in the labs, generating fresh copies of quantum states on demand is often time-consuming. Let us use quantum simulation as an example. Quantum simulation is a prominent advantage of quantum computers, with significant implications for numerous areas of scientific research, including computer-aided drug design [50], high-energy physics [41], quantum chemistry [58] and many-body physics [55]. Quantum simulation typically relies on solving the Schrödinger equation for the underlying Hamiltonian. The Hamiltonian is implemented by a quantum circuit, which is applied to an initial quantum state to generate target quantum states. The construction of the Hamiltonians and the preparation of the target states can be rather time-consuming.111For instance, the Hamiltonian of the two-dimensional Fermi-Hubbard model on an lattice already requires approximately Toffoli gates [43], which directly contribute to the query time if states need to be generated from scratch at query. Storing the generated molecular quantum states in a database would eliminate the need to repeat the state preparation procedures during data analysis.
Once the quantum states are stored in a database, and assuming each state is associated with additional information such as the nature sources recorded at the time of collection or parameters of the experimental setup used to produce them, numerous applications can be envisioned. For example, if scientists receive photons from an unknown remote star, they can search a photon database to find a matching quantum state. Upon finding a match, they can retrieve its associated properties and other information, such as the time and method of its previous observation. They may also want to sort the states using a local observable (see Definition 3.5 in Section 3) with respect to certain properties (such as energy or momentum) to get an order of the photons in the database, aiding in the understanding of the spectrum of the corresponding stars in the universe. In quantum simulation, if we want to produce molecular states with average energy levels above a certain threshold relative to a specific local observable, we can perform a selection operation in our database to identify those states, and then use the associated parameters for the experimental setup to produce more of such quantum states.
Nevertheless, quantum data management remains in its infant stage. Some of the previously mentioned motivating examples are more like anticipated future problems. There has been research that leverages quantum data for learning or optimization, such as quantum machine learning [33, 24, 3], quantum variational optimization algorithm [28, 17]. and quantum neural network [52, 48, 18, 31, 45, 20]. However, their primary focus is on the sample complexity (namely, the number of copies of the quantum state needed for the task) and the convergence to optimal points, rather than on developing methods for the efficient representation and storage of quantum data for subsequent analysis.
In this paper, we introduce several quantum data sketches to support basic database operations in a sustainable and efficient manner. This paper does not aim to formulate a comprehensive quantum data management model. Rather, we view our work as an initial step towards developing a sustainable model for representing, querying, and analyzing quantum data at scale.
Unique Challenges in the Quantum World
The quantum world possesses several unique properties, such as superposition and entanglement, that can be leveraged to reduce resource usage in computing and information exchange. However, some of these features also post significant challenges to quantum data management. We highlight a few below.
Post-Measurement State Disturbance. The only way to extract information from a quantum state is to perform quantum measurements and observe probabilistic outcomes. However, each measurement has the effect of perturbing the quantum state. This characteristic implies that a quantum state might not be reusable post-measurement. In other words, we may need to consume many identical copies of a quantum state in order to derive enough useful information about it. This phenomenon is in stark contrast with the classical setting, in which we can consistently access the same data element for a number of times, always yielding the same result.
No-cloning. A natural thought to resolve the issue caused by state disturbance is to clone the quantum state before the operations. Unfortunately, the no-cloning theorem (see, e.g., [46]) in quantum mechanics asserts that it is impossible to create an exact copy of an arbitrary unknown quantum state.
Lack of Large-Scale Quantum Storage Systems. At the time of writing this paper, we are not aware of any reliable large-scale quantum storage systems. One reason for this is that qubits are highly susceptible to environmental disruptions such as temperature variations, electromagnetic radiation, or particle interactions. These disruptions lead to what is known as decoherence [40], resulting in the loss of quantum information.
Moreover, due to the quantum state disturbance and the no-cloning principle, even if we successfully build viable large-scale quantum storage systems in the future, we still need many identical copies of the quantum state for any nontrivial database operation. This implies that in order to accommodate an unlimited number of database operations (i.e., to be sustainable), we must prepare an unlimited number of copies for each quantum state in the storage, which is certainly not practical.
An alternative approach is to first learn the classical description of each quantum state and store it in a classical memory for future operations. Indeed, we believe that for the purpose of quantum data management, we have to store quantum states in the classical format. However, learning and storing the full information of a quantum state as a classical object is both time and space expensive, as the dimensionality of a quantum state is exponential in terms of the number of qubits.
We thus propose to design succinct classical representations (or, sketches) of quantum states that can be used to perform database operations efficiently. Based on the particular database operation it is intended to support, each sketch preserves only partial information of a quantum state. This is also the reason why we may be able to make the size of the sketch to be , where is the dimension of the quantum state. We also note that the sample complexity for constructing data sketches is a secondary consideration for database management systems, as it is just a one-time preprocessing step in the database design. This is where our work departs from the quantum state learning/tomography literature, which we will discuss in Section 1.1.
Our Contribution
We give the first systematic approach to designing space-efficient sketches for quantum states. These sketches can then be used to develop time-efficient algorithms for basic database operations. In particular:
-
1.
In Section 3, we have formalized a set of basic database operations for quantum data, including search, selection, sorting, and join. These operations differ from those for classical data as they inherently incorporate approximation in their definitions.
-
2.
Our main technical results are the first set of classical vector sketches that preserve, up to a distortion of for an arbitrarily small , the trace distance of the quantum states with probability . Our sketches have sizes , which is independent of the dimension of the states. Coupled with efficient nearest neighbor search via locality sensitive hashing, they can be used to support the search and join operations with time sublinear in the database size and independent of the state dimension. See Section 4.1.
-
3.
We make use of classical shadow seeds of quantum states [37] to approximate the expectation value of any given -local observable (to be defined in Section 3.3) using time and space independent of the dimension of the state. We also present a new hybrid quantum-classical algorithm to accelerate the query time. This sketch can be used for selection and sorting operations. See Section 4.2.
Paper Outline
In Section 2, we review some background on quantum information and computing as well as tools for classical data management. In Section 3, we define a set of basic database operations for quantum data. After these preparations, in Section 4, we present our classical sketches of quantum states and illustrate how to perform various database operations using these sketches. We review works that are most relevant to this paper in Section 1.1 and propose several directions for future research in Section 5.
1.1 Related Work
We are not aware of any prior work on designing classical sketches of quantum data, except for the paper [37] discussed in Section 4.2. There have been effort aiming to introduce quantum computing, quantum algorithms and quantum machine learning to the database community [13, 61, 44, 57, 9, 59]. We refer the readers to the recent tutorial [25] for an overview of these works. However, these initiatives either attempt to design and perform database operations directly on quantum data (i.e., assuming database elements are stored as quantum states) or focused on speeding up databases query optimization and transactions on classical data, setting them apart from the objectives pursued in this paper.
There are works [60, 38] focusing on applying classical data compression techniques (such as quantization) to the quantum state vector during quantum simulation. We note that our approach with sketches is quite different, as we aim to extract relevant information (often independent of the quantum states’ dimension) for various database operations.
Quantum State Learning
Many studies have explored the task of characterizing and learning properties of a quantum state using multiple copies of the state, including approximate state discrimination [12], quantum state discrimination [29], quantum state tomography [23, 49], quantum state property testing [27], quantum state certification [7], shadow tomography [2, 6], and pretty good tomography [1].
In the problem of approximate state discrimination, we are promised that a query quantum state belongs to a set of quantum states. The algorithm’s task is to return a state such that . The algorithm for approximate state discrimination proposed in [12] can be used together with the equality testing to handle the search operation when the available number of copies of the query state is limited, at the cost of larger time and space complexities. However, the need of fresh copies of database states for equality testing would undermine the long-term sustainability of the database system.
The problem of quantum state discrimination is very similar: We are again promised that the query state belongs to a set , but now the algorithm needs to return the exact . Harrow and Winter [29] gave an algorithm for this problem where the sample complexity of the query state depends on a parameter , which is the maximum pairwise fidelity of states in the set .
In the quantum state tomography, we wanted to learn an unknown quantum state up to a trace distance . Optimal sample complexity has been identified [23, 49].
Quantum state property testing [27] and quantum state certification [7] can be seen as relaxations of aforementioned problems. In the former, we are given a query state and a set of quantum states, and asked to test whether or is -far from (that is, for any state , we have ), and in the latter, we are given a query state and a known state , and asked to test whether or . The main issue with property testing and certification in the setting of data management is that the decision can be arbitrary even if the query state is very close to (but not the same as) a database state.
Both shadow tomography and pretty good tomography focus on approximating for a query state and a set of known binary measurements [2, 6], or a distribution on them [1]. However, these algorithms cannot be used for the -selection for an arbitrary observable given at the time of query. Their running time is also polynomial in terms of the state dimension . Recently, Gong and Aaronson [21] generalized shadow tomography to a fixed set of measurements with multiple outcomes.
To the best of our knowledge, all the previous work on quantum state learning focuses on the sample complexity, but not on the space complexity for representing the quantum states for various data management operations.
2 Preliminaries
We start by giving a gentle introduction of the basics of quantum information and computing, particularly for readers who are not in the field yet. For a comprehensive treatment on this topic, we refer the readers to standard textbooks in the field, such as [46].
Quantum States and Qubits
The first axiom of quantum mechanics is concerned with quantum state as a way to describe a quantum system, such as a qubit. For accessibility of the paper we focus on pure state that are represented by complex-valued vectors. Moreover, we assume that each quantum data point is stored in -qubits. Therefore, the dimentionality of the space is . In that case, the quantum stats are unit-norm vectors in . Following the Dirac bra-ket notation, a vector is simply denoted by the ket . As an example, a qubit is a -dimensional vector represented as , where . This decomposition is typically called a superposition. A well-known superposition is the state . Similarly, an -qubit state is represented by a superposition as where . For compactness, we use to represent each , where is the decimal representation of the binary string .
Quantum Operations
The second axiom of quantum mechanics states that the evolution of quantum states are described via unitary transformation. A unitary transformation is represented by a unitary matrix such that . If the initial state is , then the evolved state is . In quantum computing is typically implemented in terms of elementary quantum logical gates. In this perspective, one can study the gate complexity of implementing . This axiom implies a unique feature of quantum, known as the no-cloning principle that prohibits making copies of quantum data. As a result one needs to adopt data management procedures that abide this rule.
Quantum Measurements
The third axiom of quantum mechanics asserts that any classical information about a quantum state is obtained via measuring it. The act of measuring a quantum system will collapse the quantum state inevitably. The specific outcome of a measurement is probabilistic and is governed by the Born’s law. These probabilities are determined by the initial state of the system and the nature of the interaction between the system and the measuring device. Measuring in an -qubit system is typically modeled in the so-called computational basis. When the quantum state is in the superposition , the outcome of the measurement in the computational basis is going to be with probability . For instance, measuring the state produces a random uniform binary output. The stochasticity of quantum measurements is another feature that calls for probabilistic data management frameworks. Moreover, the state collapse phenomenon significantly complicates the tasks, as the quantum state cannot be entirely “recycled” following a measurement.
One may attempt to think of a quantum state - as far as measurement is concerned - as a discrete probability distribution , but there are two fundamental differences. First, the coefficients (called amplitudes) ’s are complex numbers that make superposition and interference possible. Second, the probability of an outcome in quantum mechanics is found by taking the absolute square of the amplitude, that is, .
In general, a certain measurement on a quantum state can be obtained in three stages: (i) applying an appropriate quantum operator to the state, (ii) measuring the evolved state in the computational basis; and (iii) applying classical post processing on the measurement outcomes. This procedure is compactly modeled as a matrix called an observable that is multiplied by the original state . The eigenvalues of represent the possible values of the measurement outcomes. Moreover, by we denote the probability distribution of the measurement outcomes after applying on . Because the outcomes are probabilistic, we are often interested in their expectation values. The expectation of the outcome distributed by is equal to , where is the complex conjugate transpose of the vector .
Standard Math Notations Versus Dirac Notations
As this paper is intended for an audience within the database community, we recognize that the Dirac bra-ket notation might appear unfamiliar to database researchers without a background in quantum information and computing. To simplify, in the main text we express a pure quantum state as a column vector with dimensions denoted as , and use and to denote and , respectively. We use to denote the expectation value of an observable . Throughout the paper, we reserve the notations and for quantum states.
We have also included a more formal (but still gentle) introduction of quantum information and computing using Dirac bra-ket notations in Appendix A. We will use Dirac bra-ket notations in all the proofs in the appendix.
Trace Distance
Given two quantum states and , we define their trace distance to be . The trace distance is the most widely used distance measure for quantum states in the literature.
2.1 Performance Metrics
In the context of quantum data, similar to classical database design, the efficiency of space and time is crucial during database initialization, indexing, and querying. Minimizing the number of quantum state copies used for constructing sketches is also important, as obtaining state copies can be costly and they cannot be fully recycled due to post-measurement disturbance. However, as we mentioned earlier, sample complexity is a secondary consideration in the data management setting, since the sketch-building/initialization is a one-time process.
A unit-time quantum operation comprises standard single-qubit gates like the Hadamard gate, Pauli gates, phase gate, and gate, as well as a two-qubit gate, such as the Controlled-NOT (CNOT) gate, that enables entangling operations.222We refer the readers to [46] for a detailed introduction of these gates. The combination of these gates is sufficient to approximate any unitary operation to arbitrary accuracy. We call these gates unit gates, and define the size of a circuit (for representing a unitary operation) to be the number of unit gates in the circuit.
As mentioned, a typical quantum measurement on qubit systems consists of a unitary operator followed by measurement in computational basis and classical post processing. Assuming that the classical post processing is polynomial, the overall time cost is typically dominated by the gate complexity of . It has been shown in [56] that a circuit depth of (i.e., ) is needed for constructing an arbitrary unitary operator . To simplify matters, we assume that both executing an arbitrary -dimensional quantum measurement and preparing an arbitrary -dimensional state require quantum time.
2.2 Nearest Neighbor in High Dimensions
As quantum states are inherently high dimensional, even after effective sketching and summarization that we will illustrate in the subsequent sections, we will thus use Approximate Nearest Neighbor (ANN) via Locality Sensitive Hashing (LSH) to further speed up some database operations. This subsection will take a brief detour from our discussion of quantum data management.
Definition 2.1 (-ANN-search).
Let be a database containing a set of vectors in and be a query vector. Let be a distance function. If there is at least one vector with , return any with . Otherwise, either return a with or return .
Let us focus on the case that the distance function is or . Indyk and Motwani [39] showed that -ANN can be solved efficiently via LSH. The idea is that we first apply multiple hash functions to each vector in ; this part can be pre-computed and stored as an indexing. At the time of query, we apply the same set of hash functions to the query vector . We then run over all vectors such that and collide (i.e., fall into the same bin) on at least one hash function, and return the first vector if . If no such found after traversing a certain number of vectors in , we return .
We will use to denote the -ANN search for a query vector in database . The following is a summary of results on LSH-based ANN for distances.
Theorem 2.2 ([39, 15, 5]).
For being or , a database of vectors, and a -dimensional vector , there is an algorithm that solves using space and classical time, where for distance and for distance.
Remark 2.3.
We note that if we do not terminate the algorithm after encounter the first such that , then the same algorithm can return a subset including all vectors such that , and excluding all vectors such that .
Remark 2.4.
We can also use LSH to find a set of pairs of vectors such that includes all pairs such that , and excludes all pairs such that . To this end, we first hash all vectors, and then check the distances of all pairs of vectors that collide on at least one hash function.
3 Basic Operations on Quantum Data
The characteristics of quantum information dictate that we can only obtain an approximation of a quantum state with a finite number of quantum state copies. A celebrated result in quantum state tomography states that to learn an unknown -qubit quantum state up to a trace distance , we already need copies of the quantum state, where is the dimension of [23, 49]. We thus consider two quantum states with the same state. Consequently, all the operations that we support in a quantum database also need to be approximate. The precise definition of ‘approximation’ varies for different operations.
In this section, we formulate basic quantum data operations that we aim to support using our proposed sketches. When we say the return of a quantum state , we are referring to its identifier.
3.1 Equality Test
In the classical data setting, the equality test on two data objects returns if , and returns otherwise. In the quantum setting, since we cannot distinguish two quantum states using copies of the states if their trace distance is at most , we need to introduce the approximation version of the equality test:
Definition 3.1 (-equality-test).
Given two quantum states and , output if , and if . The output can be arbitrary if .
In words, we consider two quantum states the same if their trace distance is at most , and different if their trace distance is more than . If the distance falls between the two values, then the decision can be arbitrary. The gap between yes and no is inevitable for quantum data.
Given two quantum states and , which may be unknown, the standard method for estimating their trace distance is the swap test [8]. The algorithm uses a controlled-SWAP gate (can be implemented using unit gates) and two single-qubit Hadamard gates. The test outputs with probability , and otherwise. Therefore, using such tests (the constant hidden in the big- depends on the constant ), we can differentiate the case from with a probability .
The main issue with this algorithm is that we have to consume fresh copies of database states for each equality test, which is unsustainable for a database system that is designed to answer an unlimited number of queries.
3.2 Search and Join
In the classical data setting, given a set of objects and a query object , the search operation returns some such that if such exists, and otherwise. In the quantum setting, again due to the difficulty of distinguishing two quantum states within a distance of , we propose the following approximation version.
Definition 3.2 (-search).
Given a query state and a database , if there exist a state such that , return a state with . Otherwise, either return a state with or return .
In other words, if there exists a state in the database which has a trace distance no more than from the query state , we return a state in whose distance is no more than from (similar to the ANN search). Else if all states in the database have distances larger than from the query state, we return . In other cases, we either return a database state with distance no more than from the query state or return .
The most straightforward way is to perform the -equality-test for each database state with the query state . By the above algorithm for equality test (setting ), we can determine with probability whether there exists a state such that the -equality-test on and returns . The above procedure takes quantum time, which is linear in terms of the number of states in the database. Another significant limitation of this method is the necessity of using fresh copies of the database states for each search operation because of the equality test, making the database system unsustainable.
A closely related operation to search is join, which is one of the most important operations in relational database systems. We introduce the quantum version of natural join as follows.
Definition 3.3 (-natural-join).
Given two databases and of quantum states, we want to output a set that includes all pairs of states such that , and excludes all pairs such that . The decisions for other pairs can be arbitrary.
3.3 Selection and Sorting
In relational databases for classical data, selection is typically denoted by , where is a relation and is a propositional formula that involves an attribute, a comparison operator in the set , and a constant value for comparison (e.g., age ). However, in the quantum data setting, quantum states cannot be directly compared. We can only apply a measurement on the state and get a random outcome according to the distribution . As a classical analog, we would say a person’s age is with probability and with probability .333This assembles probabilistic databases, but in the quantum data setting the probability distribution is not given explicitly, and the support size of the distribution is exponential in terms of the number of qubits of each quantum state. We thus look at the expectation value for the observable corresponding to .
The quantity holds significant importance in quantum mechanics (see, e.g., the textbook [51]). It can be used to provide an estimate of the system’s average energy in a particular state, describe the level of non-classical correlations between entangled particles, quantify quantum information such as entropy, coherence, and entanglement, etc.
We define the -approximate ‘’ selection operation for quantum data as follows.
Definition 3.4 (-selection).
Given a database , an observable , a threshold , and an error parameter , return a set of states such that includes all database states such that , but excludes all such that .
Note that the -approximate equality selection can be implemented by taking the difference between -selection and -selection, which includes all with and excludes all with or . In the context of approximation, we can consider ‘’ and ‘’ the same as ‘’ and ‘’, respectively.
We also note that -search can also be handled by looking at for a specific observable , although this solution is not as efficient as that using the particular sketches that we shall design for the search operation. We have included a reduction from -search to -selection in Appendix D.
In the context of databases, we are particularly interested in the following type of observables.
Definition 3.5 (-local observable).
An observable of a system with qubits is called -local if it can be written as a sum of a constant number of terms, each acting on at most qubits. For instance, a -local observable in a -qubit system might look like:
Where and are operators acting on the pairs of qubits (1,2) and (2,3) respectively, while and are the identity operators acting on the remaining qubits.
-local observables have been well studied in the literature (see [11, 42] and references therein). They are interesting because, in most practical scenarios, our goal is to identify specific properties of a quantum state (e.g., the energy, momentum, or spin of a photon) that rely on a small subset of qubits of the state. This is similar to the classical setting where most queries depend on a few attributes of a relational database table. For example, suppose we want to retrieve all records in a table containing patient information for individuals aged years or older with systolic blood pressure at least , we only need to look at two attributes in the table: age and blood pressure. If we view each qubit of a quantum state as an attribute (e.g., spin, position, momentum, polarization, etc.), then a -local observable performs selection on at most attributes of the quantum state.
A related problem of selection is sorting. As a motivation, we would like to sort a set of given quantum states according to their average energy with respect to an observable determined by a particular application. Note that there is no natural order between the quantum states themselves. Therefore, introducing an observable and computing the expectation value is somewhat necessary to establish a total order between the quantum states.
We define the sorting operation with respect to an observable as follows. Similar to the selection operation, we introduce an additive approximation in the sorted order.
Definition 3.6 (-sorting).
Given a database of states, an observable , and an error parameter , return an order of the states in such that for all , we have .
4 Sketches for Quantum Data Operations
In this section, we introduce two quantum data sketches, vector sketches and shadow seeds, which are summaries of the original states for efficiently handling previously mentioned database operations.
Before delving into the details, let us use metaphors to provide some very high-level intuition of the two data summarizing methods. The vector sketches can be seen as capturing snapshots of the state from different angles, while each shadow seed can be seen as a piece of information gleaned from the state. Using multiple shadow seeds, we can reconstruct the original state at varying levels of resolution.
4.1 Vector Sketches for Equality-Test, Search, and Join
The concept of vector sketch is to represent a quantum state as a vector in with instead of a vector in , while preserve certain distance properties. In this section, we design vector sketches for quantum states and then use them to conduct equality test, search, and join.
A natural way to construct the sketch is to take a number of random measurements on , and write down the measurement outcomes as a vector. The following result is due to Sen [53], rewritten for pure quantum states.
Theorem 4.1 ([53]).
Let and be two pure quantum states in . With probability at least over the choice of a random measurement basis , there exists a universal constant such that
(1) |
Theorem 4.1 connects the trace distance of two quantum states to the distance of their measurement outcome distributions. We note that the distortion in (1), , is a big constant whose value left unspecified in [53].
Vectors and are discrete distributions with outcomes . It is well-known that for a discrete distribution over a domain of size , using samples we can obtain an empirical distribution such that with probability (see, e.g., [10]).
Corollary 4.2.
Let and be the empirical distributions of measurement outcomes by applying in Theorem 4.1 to (for a sufficiently large constant ) copies of and , respectively. With probability , we have
where is a universal constant.
We can view and as two empirical probability vectors. However, since for a -qubit state, it is both space-expensive to store and time-expensive to use it for database operations.
Embedding to -space
We aim to address the issue of efficiency in both time and space by showing that there is another distribution of measurements whose number of outcomes is independent of the state dimension , for which a similar connection exists between the trace distance of two quantum states and the distance of the corresponding measurement outcome distributions. Moreover, the distortion of our sketching can be made arbitrarily close to (compared with in (1)). It is worth noting that this distortion will significantly impact the efficiency of the search and join operations, as we will discuss shortly.
Our result is summarized in the following theorem.
Theorem 4.3.
Let and be two pure -dimensional quantum states. For any , there is a distribution of measurements with outcomes for a sufficiently large constant , such that a measurement sampled randomly from satisfies
with probability at least , where is a universal computable constant. Additionally, the measurement sampling can be completed in time, and the sampled measurement can be represented as a quantum circuit with a gate complexity of .
Proof 4.4 (Proof Overview).
At a high level, our approach leverages form dimension reduction through quantum measurements. We make use of a technique called pretty good measurement [36] to generate random projective quantum measurements with outcomes. The output of these measurements are random vectors serving as the embedding of the state into .
We start by picking a random basis for based on the Haar measure [35]. Let be independent Gaussian random variables with mean zero and variance , and let be a random vector where . We repeat this process and generate complex Gaussian random vectors . These vectors are linearly independent with probability one; but they are not necessarily orthonormal. We make use of pretty good measurement to orthogonalize and normalize these vectors. More precisely, we construct the operator (matrix) , and define the vector for each . We can show that are linearly independent and are orthonormal. Moreover, the distribution of is unitary invariant, and hence the Haar measure. Intuitively, is distributed uniformly over surface of the unit sphere in . Next, we randomly group ’s into groups and form random projection operators as
Let be the corresponding measurement. Clearly, is a valid measurement with probability one. This random measurement facilitates an embedding of the quantum states in into . We carefully analyze the distortion of the embedding (i.e., the outcome distribution by applying to the quantum state) using tools from the concentration of measures and properties of the Haar distribution. We show that the distortion of this embedding is no more than with probability when for a constant . The complete proof can be found in Appendix B.1.
The measurement construction described above could require polynomial time in . However, we demonstrate that it can be sampled more efficiently from the Clifford group in classical time , leveraging the properties of unitary 2-designs from quantum information theory. The details are provided in Appendix B.1.1.
To approximate up to an additive error , we have to approximate up to . We have the following immediate corollary.
Corollary 4.5.
For any , let for a sufficiently large constant , and let and be the empirical distributions of the outcomes by applying independent random measurements in Theorem 4.3 to (for a sufficiently large constant ) copies of and , respectively. With probability at least , we have
where is the same constant in Theorem 4.3.
Embedding to -space
The sketch we have constructed for the -space can also be applied to the -space, albeit through a different analysis. The distance is interesting since we know from Theorem 2.2 that enjoys a slightly better ANN scheme in term of time and space complexities, which will be useful for speeding up search and join operations. The proof of the following theorem can be found in Appendix B.2.
Theorem 4.6.
Let and be two pure -dimensional quantum states. For any , there is a distribution of measurements with outcomes for a sufficiently large constant , such that a measurement sampled randomly from satisfies
with probability at least . Additionally, the measurement sampling can be completed in time, and the sampled measurement can be represented as a quantum circuit with a gate complexity of .
For a discrete distribution over a domain of size for any , it takes samples to obtain an empirical distribution such that with probability (see, e.g., [10]). We have the following corollary.
Corollary 4.7.
For any , let for a sufficiently large constant , and let and be the empirical distributions of the outcomes by applying independent random measurements in Theorem 4.6 to (for a sufficiently large constant ) copies of and , respectively. With probability , we have
Johnson-Lindenstrauss Lemma in Our Context. It is natural to ask whether existing dimension reduction techniques, such as the Johnson–Lindenstrauss (JL) lemma, can be applied directly to the -dimensional vector representation of a quantum state , or the outcome distribution when measured in the computational basis. After all, we can use quantum tomography to learn the representation approximately. We would like to first point out that a direct application will not work, since we can construct simple examples demonstrating inherent distortions between the trace distance of quantum states and the distances of their -dimensional vector representations ( or ), even when all the coordinates are real-valued and before any dimension reduction step. We leave the detailed examples and calculation to Appendix C. In our examples, for the vector representation, the distortions between the trace distance of quantum states and the and distances of the two corresponding vectors are at least and , respectively. And for the vector representation, the distortions between the trace distance of quantum states and the and distances of the two corresponding vectors are at least and , respectively. Moreover, the JL lemma only takes real vectors.
We also note that there exists a near-linear lower bound for dimension reduction in the space [4], indicating that, unlike the JL lemma for space, dimension reduction in the space is not generally possible.
We note that there is a way to circumvent the issues for embedding quantum states into the space: for each state , we write its density matrix as a real-valued dimensional vector . By some calculation, we can show that the distance of and preserves the trace distance of the two original pure states and . We then perform dimension reduction on the vectors using the JL lemma. Our sketching algorithm has the following advantages compared with this “full tomography plus JL lemma” approach (setting the error probability ):
-
1.
The memory usage of our sketch construction is independent of , while the memory needed for storing the classical vector representation of the quantum state is and that for the density matrix is .
-
2.
Our sketch construction takes time, while the full (pure) quantum state tomography takes [19] time and the dimension reduction using the JL lemma needs another time.
These comparisons demonstrate that our sketch construction using direct quantum measurements significantly outperforms the method of first converting the quantum state to its classical description followed by dimension reduction, both in terms of time and space, which are the main focus of this paper.
We now apply our embedding results to database operations.
The Equality-Test Operation
The Search Operation
We now illustrate how to use vector sketches and approximate nearest neighbor (ANN) to perform -search on quantum states.
Let and be the additive error and multiplicative error in Corollary 4.5/Corollary 4.7 for building , respectively. We assume that an LSH indexing structure has already been built on top of ’s to achieve the time and space usages stated in Theorem 2.2. To handle -search, we call , where is the parameter for the tradeoff between the distortion and the time/space complexity in ANN. By Corollary 4.5/Corollary 4.7 and Theorem 2.2, if there exists a state such that , then ANN returns a state such that . On the other hand, ANN either returns a state with , or returns .
By Theorem 2.2, it takes classical time to perform the search. The space for storing the LHS index is , where for and for .
We note that in the above approach, we have to make sure that . In other words, we can only handle -search with . However, since and can be positive constants arbitrarily close to , we can essentially handle all constants . Certainly, the higher the value of , the larger that we can pick for reducing the query time and space usage in the ANN search. In practice, a reasonably large constant may be okay, as the trace distance between two quantum states that are generated by separate entities or experiments is typically much larger than that between two states originating from the same entity or experiment (due to quantum noise or preparation errors).
Setting , and , we have , and consequently . Applying our vector sketch with respect to the distance and the corresponding ANN search, we have the following theorem.
Theorem 4.8.
There is an index of size , using which we can solve -search on a database of quantum states with success probability and classical time .
Note that the index space cost is independent of , and the query time is sublinear in (for ) and independent of the state dimension .
The Join Operation
The sketch-based approach can also be used for join. Given a set of sketch vectors , we can apply the same hashing process as that for the ANN search, and then verify (by computing the actual distance) all pairs of vectors that collide on at least one hash function. The space cost is the same as that of the search. The query time is dependent on the size of the join output, but it is still independent of the state dimension .
4.2 Shadow Seeds for Selection and Sorting
In this section, we develop a classical data summarization that can be used to estimate the expectation value for an arbitrary -local observable . We make use of the classical shadow tomography (CST), introduced in [37], to approximate up to a small additive error. CST tries to extract minimal information about the quantum state, without performing complete tomography, to estimate certain properties of the state described by observables.
For completeness, let us briefly describe the CST procedure using Pauli measurements. For each of the copies of , we select unitary operators, , randomly and independently from the set , where is the Hadamard gate and is the square root of the Pauli- gate; see Appendix A.2 for their matrix representations. We then apply to the -th qubit of and measure the state on the computational basis. The result is a binary string . The pairs form a row vector, where is the index of in the set . We then repeat this process for times, getting rows, forming the seed matrix . We call the shadow seeds. Clearly, can be stored using classical bits, since each entry of belongs to .
At the time of query, given a -local observable , we first construct -local classical shadows of the database state from each row of its seed matrix with respect to the -local observable . Suppose depends non-trivially on the qubits indexed by . Let be the standard basis vectors in the two dimensional plane. For each row and column , we first construct a vector . Next, we construct the -th shadow as a matrix where is the identity matrix. Finally, the estimator for is given by . The following theorem states that is a good approximation of the expectation value .
Theorem 4.9 (Based on [37]).
The above procedure prepares an shadow seed matrix given copies of an -qubit quantum state , such that for any given -local observable , if , the estimator approximates up to an additive error with probability using . Moreover, the time for computing using is bounded by , and the space for storing is classical bits.
Note that the space cost and query time are both independent of the state dimension .
Typically, the -local observable can be expressed as a quantum circuit with gate complexity. In this case, we propose a new estimation algorithm to further improve the total query time from to (omitting other less critical factors) by an approach we call QCQC (quantumclassicalquantumclassical). We have the following theorem, whose proof can be found in Appendix B.3.
Theorem 4.10.
There is a procedure for preparing an shadow seed matrix given copies of an -qubit quantum state , such that for any given -local observable with gate complexity, if , we can approximate up to an additive error with probability using . Moreover, the quantum time for computing using is bounded by , and the space for storing is classical bits.
The Selection Operation
It is easy to see that Theorem 4.10 directly implies an algorithm for handling -selection: Setting , we can estimate up to an additive error with probability for each -qubit database state using an shadow seed matrix, where . By a union bound over database states, we can solve the -selection problem with probability . The query time is bounded by .
Theorem 4.11.
There is an index of size , using which we can solve for any -local observable the -selection on a database of -qubit quantum states with success probability and quantum time .
The Sorting Operation
Since the shadow seed matrix can be used for estimating the expectation value up to an additive error , we can use it for -sorting with the same space and time complexity as that for the selection operation.
5 Conclusion and Future Work
In this paper, we have defined basic database queries for quantum data and proposed several classical sketches of quantum states to facilitate these queries. We consider our work a preliminary step towards a comprehensive quantum data management system. Numerous questions and directions remain open following this work. We list a few below.
Support More Data Operations
This paper primarily focuses on two basic database operations: search and selection, along with several related operations. We would like to expand the support to more complex operations for data analytics, such as clustering and classification, for which we may need to develop new classical summaries of the quantum states for the sake of efficiency.
Mixed States
In various scenarios, such as when the description of a quantum system is unknown due to quantum noise, the use of a density operator (or, density matrix) for describing mixed quantum states becomes more convenient. Suppose the quantum system is in one of a collection of -dimensional pure states , we can represent a mixed quantum state as , where and . We can view as a convex combination of outer products of pure states , where each is associated with a probability . We anticipate that results presented in this paper can be extended to mixed states, although the technical aspects of this generalization require further investigation.
The Integration with the Theory of Relational Databases
A key feature of our proposed model is that quantum data is represented entirely in the classical format. This unique aspect enables us to integrate our model with established theories related to indexing, query execution, and query optimization in relational databases designed for classical data. However, the integration process will likely require the redesign of multiple components to accommodate the inherent differences stemming from the distinct definitions of database operations for quantum data.
References
- [1] Scott Aaronson. The learnability of quantum states. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2088):3089–3114, 2007.
- [2] Scott Aaronson. Shadow tomography of quantum states. In Ilias Diakonikolas, David Kempe, and Monika Henzinger, editors, STOC, pages 325–338. ACM, 2018.
- [3] Scott Aaronson and Daniel Gottesman. Improved simulation of stabilizer circuits. Physical Review A, 70(5):052328, nov 2004. doi:10.1103/physreva.70.052328.
- [4] Alexandr Andoni, Moses Charikar, Ofer Neiman, and Huy L. Nguyen. Near linear lower bound for dimension reduction in L1. In Rafail Ostrovsky, editor, FOCS, pages 315–323. IEEE Computer Society, 2011.
- [5] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In FOCS, pages 459–468. IEEE Computer Society, 2006.
- [6] Costin Badescu and Ryan O’Donnell. Improved quantum data analysis. In Samir Khuller and Virginia Vassilevska Williams, editors, STOC, pages 1398–1411. ACM, 2021.
- [7] Costin Badescu, Ryan O’Donnell, and John Wright. Quantum state certification. CoRR, abs/1708.06002, 2017.
- [8] Harry Buhrman, Richard Cleve, John Watrous, and Ronald De Wolf. Quantum fingerprinting. Physical Review Letters, 87(16):167902, 2001.
- [9] Umut Çalikyilmaz, Sven Groppe, Jinghua Groppe, Tobias Winker, Stefan Prestel, Farida Shagieva, Daanish Arya, Florian Preis, and Le Gruenwald. Opportunities for quantum acceleration of databases: Optimization of queries and transaction schedules. Proc. VLDB Endow., 16(9):2344–2353, 2023.
- [10] Clément L. Canonne. A short note on learning discrete distributions, 2020. arXiv:2002.11457.
- [11] Thomas Chen, Shivam Nadimpalli, and Henry Yuen. Testing and learning quantum juntas nearly optimally. In Nikhil Bansal and Viswanath Nagarajan, editors, SODA, pages 1163–1185.
- [12] Kai-Min Chung and Han-Hsuan Lin. Sample efficient algorithms for learning quantum channels in PAC model and the approximate state discrimination problem. In Min-Hsiu Hsieh, editor, TQC, volume 197 of LIPIcs, pages 3:1–3:22. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
- [13] Paul Cockshott. Quantum relational databases, 1997. arXiv:quant-ph/9712025.
- [14] Christoph Dankert, Richard Cleve, Joseph Emerson, and Etera Livine. Exact and approximate unitary 2-designs and their application to fidelity estimation. Physical Review A, 80(1):012304, July 2009. doi:10.1103/physreva.80.012304.
- [15] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Jack Snoeyink and Jean-Daniel Boissonnat, editors, SOCG, pages 253–262. ACM, 2004.
- [16] D.P. DiVincenzo, D.W. Leung, and B.M. Terhal. Quantum data hiding. IEEE Transactions on Information Theory, 48(3):580–598, March 2002. doi:10.1109/18.985948.
- [17] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm, 2014. arXiv:1411.4028, doi:10.48550/ARXIV.1411.4028.
- [18] Edward Farhi and Hartmut Neven. Classification with quantum neural networks on near term processors. February 2018. arXiv:1802.06002.
- [19] Daniel Stilck França, Fernando G. S. L. Brandão, and Richard Kueng. Fast and robust quantum state tomography from few basis measurements. In Min-Hsiu Hsieh, editor, 16th Conference on the Theory of Quantum Computation, Communication and Cryptography, TQC 2021, July 5-8, 2021, Virtual Conference, volume 197 of LIPIcs, pages 7:1–7:13. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2021.
- [20] Siddhant Garg and Goutham Ramakrishnan. Advances in quantum deep learning: An overview. arXiv:2005.04316, May 2020. arXiv:2005.04316.
- [21] Weiyuan Gong and Scott Aaronson. Learning distributions over quantum measurement outcomes. CoRR, abs/2209.03007, 2022.
- [22] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Gary L. Miller, editor, STOC, pages 212–219. ACM, 1996.
- [23] Jeongwan Haah, Aram W. Harrow, Zheng-Feng Ji, Xiaodi Wu, and Nengkun Yu. Sample-optimal tomography of quantum states. In Daniel Wichs and Yishay Mansour, editors, STOC, pages 913–925. ACM, 2016.
- [24] Jeongwan Haah, Aram W. Harrow, Zhengfeng Ji, Xiaodi Wu, and Nengkun Yu. Sample-optimal tomography of quantum states. IEEE Transactions on Information Theory, pages 1–1, 2017. doi:10.1109/tit.2017.2719044.
- [25] Rihan Hai, Shih-Han Hung, and Sebastian Feld. Quantum data management: From theory to opportunities. In ICDE, pages 5376–5381. IEEE, 2024. URL: https://doi.org/10.1109/ICDE60146.2024.00410, doi:10.1109/ICDE60146.2024.00410.
- [26] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical review letters, 103(15):150502, 2009.
- [27] Aram W. Harrow, Cedric Yen-Yu Lin, and Ashley Montanaro. Sequential measurements, disturbance and property testing. In Philip N. Klein, editor, SODA, pages 1598–1611. SIAM, 2017.
- [28] Aram W. Harrow and John C. Napp. Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms. Physical Review Letters, 126(14):140502, apr 2021. doi:10.1103/physrevlett.126.140502.
- [29] Aram W. Harrow and Andreas J. Winter. How many copies are needed for state discrimination? IEEE Trans. Inf. Theory, 58(1):1–2, 2012.
- [30] Patrick Hayden, Peter W. Shor, and Andreas Winter. Random quantum codes from gaussian ensembles and an uncertainty relation. Open Systems & Information Dynamics, 15(01):71–89, mar 2008.
- [31] Mohsen Heidari, Ananth Y. Grama, and Wojciech Szpankowski. Toward physically realizable quantum neural networks. Association for the Advancement of Articial Intelligence (AAAI), 2022.
- [32] Mohsen Heidari, Mobasshir A Naved, Zahra Honjani, Wenbo Xie, Arjun Jacob Grama, and Wojciech Szpankowski. Quantum shadow gradient descent for variational quantum algorithms. 2024. URL: https://arxiv.org/abs/2310.06935, arXiv:2310.06935.
- [33] Mohsen Heidari, Arun Padakandla, and Wojciech Szpankowski. A theoretical framework for learning from quantum data. In IEEE International Symposium on Information Theory (ISIT), 2021.
- [34] Fumio Hiai and Denes Petz. The Semicircle Law, Free Random Variables and Entropy (Mathematical Surveys & Monographs). American Mathematical Society, 2006.
- [35] Alexander S. Holevo. Quantum Systems, Channels, Information. De Gruyter, Berlin, Boston, 2013. URL: https://doi.org/10.1515/9783110273403 [cited 2023-08-27], doi:doi:10.1515/9783110273403.
- [36] Alexander Semenovich Holevo. On asymptotically optimal hypotheses testing in quantum statistics. Teoriya Veroyatnostei i ee Primeneniya, 23(2):429–432, 1978.
- [37] Hsin-Yuan Huang, Richard Kueng, and John Preskill. Predicting many properties of a quantum system from very few measurements. Nature Physics 16, 1050–1057 (2020), February 2020.
- [38] Noah Huffman, Dmitri Pavlichin, and Tsachy Weissman. Lossy compression for schrödinger-style quantum simulations, 2024. URL: https://arxiv.org/abs/2401.11088, arXiv:2401.11088.
- [39] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Jeffrey Scott Vitter, editor, STOC, pages 604–613. ACM, 1998.
- [40] E. Joos, H. D. Zeh, C. Kiefer, D. J. W. Giulini, J. Kupsch, and I. O. Stamatescu. Decoherence and the Appearance of a Classical World in Quantum Theory. Springer, 2003.
- [41] Stephen P. Jordan, Keith S. M. Lee, and John Preskill. Quantum algorithms for quantum field theories. Science, 336(6085):1130–1133, jun 2012. doi:10.1126/science.1217069.
- [42] Julia Kempe, Alexei Y. Kitaev, and Oded Regev. The complexity of the local hamiltonian problem. SIAM J. Comput., 35(5):1070–1097, 2006.
- [43] Ian D. Kivlichan, Craig Gidney, Dominic W. Berry, Nathan Wiebe, Jarrod McClean, Wei Sun, Zhang Jiang, Nicholas Rubin, Austin Fowler, Alán Aspuru-Guzik, Hartmut Neven, and Ryan Babbush. Improved fault-tolerant quantum simulation of condensed-phase correlated electrons via trotterization. Quantum, 4:296, jul 2020. doi:10.22331/q-2020-07-16-296.
- [44] Yang Liu and Gui Lu Long. Deleting a marked item from an unsorted database with a single query, 2007. arXiv:0710.3301.
- [45] Fabio Valerio Massoli, Lucia Vadicamo, Giuseppe Amato, and Fabrizio Falchi. A leap among entanglement and neural networks: A quantum survey. arXiv:2107.03313, July 2021. arXiv:2107.03313.
- [46] Isaac L. Chuang Michael A. Nielsen. Quantum Computation and Quantum Information. Cambridge University Pr., December 2010.
- [47] Isaac L. Chuang Michael A. Nielsen. Quantum Computation and Quantum Information. Cambridge University Pr., December 2010. URL: https://www.ebook.de/de/product/13055864/michael_a_nielsen_isaac_l_chuang_quantum_computation_and_quantum_information.html.
- [48] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii. Quantum circuit learning. Physical Review A, 98(3):032309, sep 2018. doi:10.1103/physreva.98.032309.
- [49] Ryan O’Donnell and John Wright. Efficient quantum tomography. In Daniel Wichs and Yishay Mansour, editors, STOC, pages 899–912. ACM, 2016.
- [50] Alexey Pyrkov, Alex Aliper, Dmitry Bezrukov, Yen-Chu Lin, Daniil Polykovskiy, Petrina Kamya, Feng Ren, and Alex Zhavoronkov. Quantum computing for near-term applications in generative chemistry and drug discovery. Drug Discovery Today, page 103675, 2023.
- [51] Jun John Sakurai and Eugene D Commins. Modern quantum mechanics, revised edition, 1995.
- [52] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. The quest for a quantum neural network. Quantum Information Processing, 13(11):2567–2586, Aug 2014. doi:10.1007/s11128-014-0809-8.
- [53] Pranab Sen. Random measurement bases, quantum state distinction and applications to the hidden subgroup problem. In CCC, pages 274–287. IEEE Computer Society, 2006.
- [54] Peter W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM J. Comput., 26(5):1484–1509, 1997.
- [55] Adam Smith, MS Kim, Frank Pollmann, and Johannes Knolle. Simulating quantum many-body dynamics on a current digital quantum computer. npj Quantum Information, 5(1):106, 2019.
- [56] Xiaoming Sun, Guojing Tian, Shuai Yang, Pei Yuan, and Shengyu Zhang. Asymptotically optimal circuit depth for quantum state preparation and general unitary synthesis, 2023. arXiv:2108.06150.
- [57] Immanuel Trummer and Christoph Koch. Multiple query optimization on the d-wave 2x adiabatic quantum computer. Proc. VLDB Endow., 9(9):648–659, 2016. URL: http://www.vldb.org/pvldb/vol9/p648-trummer.pdf, doi:10.14778/2947618.2947621.
- [58] James D. Whitfield, Jacob Biamonte, and Alán Aspuru-Guzik. Simulation of electronic structure hamiltonians using quantum computers. Molecular Physics, 109(5):735–750, mar 2011. doi:10.1080/00268976.2011.552441.
- [59] Tobias Winker, Sven Groppe, Valter Uotila, Zhengtong Yan, Jiaheng Lu, Maja Franz, and Wolfgang Mauerer. Quantum machine learning: Foundation, new techniques, and opportunities for database research. In Sudipto Das, Ippokratis Pandis, K. Selçuk Candan, and Sihem Amer-Yahia, editors, SIGMOD, pages 45–52. ACM, 2023.
- [60] Xin-Chuan Wu, Sheng Di, Emma Maitreyee Dasgupta, Franck Cappello, Hal Finkel, Yuri Alexeev, and Frederic T. Chong. Full-state quantum circuit simulation by using data compression. In Michela Taufer, Pavan Balaji, and Antonio J. Peña, editors, SC, pages 80:1–80:24. ACM, 2019.
- [61] Ahmed Younes. Database manipulation on quantum computers, 2007. arXiv:0705.4303.
Appendix A More Preliminaries
A.1 Basics of Quantum Information (A More Formal Approach)
Quantum States and Qubits. We can represent a -dimensional pure quantum state as a vector in . Using the standard Dirac bra-ket notation, we write where is an orthonormal basis in (referred as the computational basis), and ’s are called amplitudes with the property .
Let denote the conjugate transpose of , and let and denote the inner product and outer product of vectors and , respectively.
A qubit is a -dimensional quantum state and can be represented as , where . Generally, a -qubit state can be represented as
where are the computational basis states of the -qubit system, and it holds that .
A quantum state is called separable if it can be written as a tensor product of at least two states , which is often abbreviated as , or . Otherwise, the state is called entangled. A classical entangled state is the Bell state .
Quantum Operations
There are two types of quantum operations. The first is called unitary transformation. That is, we apply a unitary operator to a quantum state and get .444A unitary operator is a linear operator such that . This type of operation is used to describe the evolution of a closed quantum system. The second type of operations are measurements. Quantum measurements are the interface to obtain classical information about quantum states. Under the POVM (Positive Operator Valued Measures) formalism, a quantum measurement is described as a collection of positive semi-definite operators with ; each is associated with a measurement outcome , which can be chosen by the experimentalist. When performing on a quantum state , the probability of getting the outcome is given by . Let be the probability distribution of the measurement outcomes after applying on .
A projective measurement is a special case of a POVM where ’s are projective operators, i.e., . An example of a projective measurement is the measurement on the computational basis where . Any POVM can be written as a unitary operator followed by a projective measurement.
Observables are physical variables that can be measured. An observable is represented by a Hermitian operator whose eigenvalues are the set of possible outcomes. The observable spectrally decomposes as , where represents the projector onto the eigenspace of associated with the eigenvalue . The observable can be associated with the measurement with outcomes . The expected value of an observable on a state is expressed as .
When we say a measurement is performed on a -dimensional quantum state in the computational basis, we mean that the measurement with is applied on . It is noteworthy that the probability of observing the measurement outcome , denoted by , is equal to , wherein is the -th amplitude of the quantum state .
A crucial property of quantum mechanics is that each measurement would cause a disturbance to the quantum states. If the measurement outcome is , then the post-measurement state of can be written as
where satisfies . This particular phenomenon significantly complicates the quantum data management, as the quantum state cannot be entirely “recycled” following a measurement.
Trace Distance
Given two quantum states and , we define their trace distance to be
The trace distance is the most widely used distance measure for quantum states in the literature. It also has a nice physical meaning: Let be a POVM, and let and . That is, and are the probabilities of obtaining measurement outcome on and , respectively. We have which implies that if two quantum states are close in terms of trace distance, then any measurement conducted on these quantum states will yield probability distributions that are close in terms of the total variance distance. In other words, two quantum states that are close in terms of the trace distance are statistically indistinguishable under measurements.
A.2 Some Basic Quantum States and Gates
We list the vector representations of some basic quantum states that we need to use in this paper.
-
•
:
-
•
:
-
•
:
-
•
:
-
•
:
-
•
:
We make use of two basic quantum gates:
-
•
Hadamard gate
It turns to , and turns to .
-
•
Phase gate
It leaves unchanged, and turns to .
-
•
Another ser of special unitary operators are the Pauli operators defined as:
Here, is the identity operator, represents a bit-flip operation, represents a phase-flip operation, and combines both bit-flip and phase-flip with an imaginary phase factor.
A.3 Clifford Group
In this paper, we make use of a special class of quantum unitary operations called the Clifford group, which is a fundamental construct in quantum information theory. The Clifford group consists of unitary operations that map Pauli operators to other Pauli operators. Furthermore, the Clifford group is known for its efficient classical simulation, as described by the Gottesman-Knill theorem [47]. Any unitary in in this group can be implemented (up to a global phase factor) using a circuit with only Hadamard, Phase, and CNOT gates. The formal definition of the Clifford group is given below.
Definition A.1.
The Clifford group on -qubits is defined as the normalizer of the Pauli group under the action of conjugation, that is
where is the -qubit Pauli group generated by the identity and the Pauli matrices on each qubit, along with the phase factor .
A.4 Mathematical Tools
Lemma A.2 (Hoeffding’s inequality).
Let be i.i.d. random variables and . Then
Lemma A.3 (Generic Chernoff bound).
For any and random variable with moment generating function ,
Lemma A.4 (Berry-Esseen theorem).
Let be i.i.d. random variables with and . If , then
where is the CDF of and is a constant.
Appendix B Missing Proofs
B.1 Proof of Theorem 4.3
Measurements Construction
We first describe how to generate random measurements. We start by picking a random basis for based on the Haar measure, which can be done using a Gaussian ensemble of pure states as is used in [30]: Let be independent Gaussian random variables with mean zero and variance , and let be a random vector where . We repeat this process and generate Gaussian random vectors (written in the ket notation) .
We next create an orthonormal basis for using . It is clear that with probability one, ’s are linearly independent, which means that they span . However, they are not necessarily orthogonal. To address this issue, we use the pretty good measurement technique [35]. Define the operator , and define the vector for each .
We note that computing may be time expensive. In Section B.1.1, we will discuss a more efficient measurement construction via -design, which is a concept in quantum information theory that generalizes the idea of random sampling over the unitary group of unitary matrices.
Claim 1.
The set forms an orthonormal basis for .
Proof B.1.
Observe that
Moreover, are linearly independent since ’s are linearly independent. Hence, ’s are orthonormal.
We also note that the distribution of is unitary invariant. Hence, it is the Haar measure.
Next, we randomly group ’s into groups and form random projection operators as
(2) |
Let be the corresponding measurement. Clearly, is a valid POVM with probability one.
The Analysis of Distortion
Let and be the probability distributions of the measurement outcomes when the states are and , respectively. Then, the total variation distance between the two probability distributions can be written as
(3) |
where we have used the Born’s law and set .
We next show that has two eigenvalues , where is the trace distance. To this end, suppose is an eigenstate of , and , where as is a Hermitian operator. Multiplying both sides by gives
(4) |
Similarly, multiplying both sides by gives
(5) |
Hence, , which implies
(6) |
Now without loss of generality, let and denote the two eigenstates of . Hence,
(7) |
The right-hand side of (3) can be written as
(8) | |||||
(9) |
Let . Combining (3) and (9), we have
Multiplying both sides of the above equality by gives
(10) | |||||
We try to analyze the expectation and variance of each . First, note that , because the distribution of is unitary invariant, which implies that and have an identical distribution.
The variance of equals to
(11) | |||||
where is a random pure state generated based on the Haar measure. Since the Haar measure is invariant under Unitary transformation, and have the same joint distribution as and , where is a random unitary matrix (a Haar unitary) and refers to the entry on the ths row and th column of . Note that all entries of a Haar unitary are identically distributed [34]. Moreover, they can be written as with the distribution given by where and . Therefore, the distribution of is given by . Consequently, from (11) we have
(12) |
The first expectation in (12) calculates as
where is the Beta function that is defined as
The Beta function at positive integers can be calculated combinatorically as
Therefore,
Similarly, the second expectation in (12) equals to
Consequently, (12) reduces to
(13) |
Implying that . Now we continue our investigation of Equality (10). Let
(10) simplifies to
(14) |
Let
(15) |
Applying the generic Chernoff bound (Lemma A.3) to gives
(16) |
where is the moment generating function.
Let . We use to upper bound the RHS of (16). Since ’s are i.i.d., we have
(17) |
where . Apply Taylor’s expansion on around , we have
(18) |
where (for a universal constant ) is the remainder term.
It is clear that . By the definition of the moment generating function, and
(18) simplifies to the following:
where we have used the fact that (see (13)).
In the rest of the analysis, we will focus on the parameter ; the actual value of will be determined later.
For the other direction, we have
(20) |
Hence, for any , setting for a sufficiently large constant , we have that with probability at least . Combining (20), (15), and (14), we have that with probability at least ,
(21) |
where is distributed identically as ’s.
It remains to investigate . We show that . By the definition we know that
We start with the upper bound.
(22) |
For the lower bound, Markov inequality implies that
for any . Since is distributed identically as sum of i.i.d. random variables with , , and , by the Berry-Esseen theorem (Lemma A.4) we get
Hence,
(23) |
Recall that (see (13)). By a numerical computation, the maximum of the RHS of (23) attends at , which gives
(24) |
as long as is a sufficiently large constant.
B.1.1 Efficient Implementations of Random Measurements
Lastly, we address the space and time complexity of the sketching procedure in this proof. Specifically, we consider efficient construction of . Note that the runtime of the original measurement construction based on pretty good measurements is polynomial in , hence exponential in the number of qubits. This is because it relies on generating kets based on Haar measure and via complex Gaussian vectors in and the inverse-square root of the matrix . Generally, the classical time and the circuit gate complexity for sampling from Haar distribution is exponential. In what follows, we present a construction for with time and space.
Our approach is based on 2-design methods. A unitary -design is a concept in quantum information theory that generalizes the idea of random sampling over the unitary group of unitary matrices. It provides a way to approximate certain statistical properties of quantum states or operations without needing to sample from the entire group, which can be computationally expensive. Below we highlight basics of this concept.
Let denote a polynomial which is homogeneous with degree at most in the matrix elements of , and at most degree in the complex conjugates of these elements.
Definition B.2.
A unitary -design is a finite set of unitary matrices such that for any homogeneous polynomial
where is the Haar measure on the space of unitary matrices.
Intuitively, this definition implies that a unitary -design is indistinguishable from Haar measure when only polynomials of degree at most are used. We show that a 2-design is sufficient to obtain Theorem 4.3.
Note that the analysis of sample complexity in our proof relies on the generic Chernoff bound which, per (18), depends on the first two moments of
We show that can be written as a polynomial of degree in the elements of a unitary matrix . Let be a matrix with columns being the vectors of . That is is the element of , denoted by , located at the row and the column indexed by . With this definition, we can write , implying that . Therefore, the first moment of is a polynomial of degree in . Moreover, the second moment of is a polynomial of degree in . This is because based on (11), the second moment is a function of , which is written as . Hence, as long as the first and the second moments of resemble the Haar measure, the proof remains valid, indicating that a 2-design is sufficient.
Based on the above argument, instead of using the pretty good measurement construction, we sample from -design unitaries. This can be implemented efficiently using the Clifford group, a specialized class of quantum operators (for more details see Section A.3). It is well-known that the Clifford group is a 2-design [14]. Furthermore, there exists an algorithm that samples uniformly from the Clifford group in classical time and outputs a circuit representing the measurement with a gate complexity of , where is the number of qubits [16]. To construct the desired measurement , we first randomly generate a Clifford circuit, apply it to the input quantum state, and then measure in the computational basis. The outcome of the measurement is a binary string in . Lastly, we design a binning function that takes the -bit string and outputs an index in . This is achieved by partitioning the set into equal-size bins, which can be efficiently implemented using a decision tree that only reads the first bits of the binary measurement outcomes. There are possible outputs of the decision tree, meaning that it can be represented by a function that determines the bin index for any binary string of length . This construction is summarized as Algorithm 1.
In conclusion, the above construction generates a random measurement with quantum gates and classical time. Moreover, this measurement enjoys the same bound on the sample complexity as for the original construction based on the Haar measure.
B.2 Proof of Theorem 4.6
The proof for Theorem 4.6 (the -norm case) is similar to that for Theorem 4.3 (the -norm case), but the calculation will be different due to different distance functions. Let be the same random POVM generated as that in the proof of Theorem 4.3.
The distance between the output probability vectors can be written as
(25) |
where , which can be rewritten as , where and are two eigenstates of . Hence, we can rewrite (25) as
(26) | |||||
Multiplying both sides of (26) by a factor of , and letting , we have
(27) |
Let
(28) |
By a similar analysis as that in the proof of Theorem 4.3 (particularly, recall (6) and (13)), the expectation of the RHS of (27) is calculated to be
(29) |
We next analyze the variance of the RHS of (27).
We write
(30) |
By the generic Chernoff bound (Lemma A.3), we have
(31) |
where . Set for a constant to be determined later . We use to upper bound the RHS of (31). Since ’s are i.i.d., we have
(32) |
where .
Apply Taylor’s expansion on around , we get
(33) |
where (for a universal constant ) is the remainder term. We again have , , and for a universal constant .
In the rest of the analysis, we will focus on parameter ; the actual value of will be determined later.
We extend (33) as
(34) |
Plugging (34) to (32), we have
(35) |
Plugging (35) to (31), we have
where for the last equality to hold, we set constant .
For the other direction, we have
B.3 Proof of Theorem 4.10
For the first part, we use the same procedure is CST to create the seed matrix , where each row consists of the pairs .
Next, for the query algorithm, we propose an encoding to turn the classical shadows into quantum states. This is a deviation from CST as we push the computations back to quantum via a QCQC approach. Let be the index of the relevant qubits of a local observable in the query phase. For each , we create a random binary pair using the stored bit as follows:
(37) |
We then prepare a qubit and apply the corresponding operator to create qubit . Note that takes one of the following states:
that are easy to prepare (see Appendix A.2 for their vector representations). Then, we prepare the for the rest of qubits not indexed in . Let
(38) |
We construct the -th shadow sample as (written as the outer-product form)
(39) |
We next measure each using the observable , and obtain an outcome . Let
(40) |
Then, our estimator is the empirical average
We now show that when , then approximates up to an additive error with probability , proving the correctness part of Theorem 4.10. We will also give the query time analysis. Recall that the space needed for storing the seed matrix is classical bits.
In the rest of the proof, we will focus on the random variable for particular . Recall that the final approximation is the average of i.i.d. copies of .
Let , , , be the corresponding random variables of , , , , respectively. Since we focus on a particular , we will omit all subscripts in those random variables and write them as , , , . We also write as .
The following result can be inferred from [32]. We include a proof for completeness.
Lemma B.3.
For any , let . We have , where .
Proof B.4.
Let denote the shadow channel for Pauli measurements as defined in [37], which is given by
for any single qubit operator . By direct calculation, we have for a generic state ,
It is known that has an inverse as it is a linear mapping. Applying on , we have, again by direct calculation, that
The next lemma shows that is an unbiased estimator of the quantity .
Lemma B.5.
.
Proof B.6.
The correctness part of Theorem 4.10 follows immediately from Lemma B.5, Hoeffding’s inequality (Lemma A.2), and the fact that for all , we have .
Running time
We now analyze the running time of the query estimation procedure. First, the preparation of each quantum state takes quantum time . The construction of each shadow sample (Eq. (39)) takes quantum time; note that we do not actually need to prepare those ’s with , since the -local observable does not depend on those qubits. The quantum time for measuring each with a -local observable is , as we have assumed that has a gate complexity. The computation of each (Eq. (40)) can be done in classical time, and that of can be bounded by classical time. Summing up everything, the total running time can be bounded by quantum time plus classical time; the classical time can be ignored if we assume that a unit quantum time is at least a unit classical time.
Appendix C Distortions Between The Trace Distance and Distances of Quantum States
& | & | distortion w.r.t. | |
---|---|---|---|
– | |||
Let be the vector representation of a -dimensional quantum state , where the first coordinate is and the others are . Let be a -dimensional vector with the second coordinate being and the others being .
Let be the -dimensional vector with the first coordinators being and second half being , and
be the -dimensional vector with the middle coordinates being and the rest being .
Let denote the trace distance between two quantum states and . Let and denote the and distances between and , respectively. Let and denote the and distances between and , where is after taking the coordinate-wise absolute square; that is, .
The distortion of a distance with respect to is lower bounded by
In Table 1, we have calculated the distortions between the trace distance of the quantum states and the and distances of their corresponding classical vector representations. It is easy to see that all distortions are larger than .
Appendix D Search As A Special Case of Selection
We note that the selection operation can also be used for search. Given two quantum states and , letting , we have
Therefore, if we can estimate up to an additive factor (for a sufficiently small constant ) for any database state , we can also solve -search for any constant . By Theorem 4.10 and the fact that when , for a database consisting of states, the query (quantum) time using expectation value estimations is bounded by . This approach is certainly much more time-expensive than that using state sketches presented in Section 4.1.