Quantum search in a dictionary based on fingerprinting-hashing
Abstract
In this work, we present a quantum query algorithm for searching a word of length in an unsorted dictionary of size . The algorithm uses queries (Grover operators), like previously known algorithms.
What is new is that the algorithm is based on the quantum fingerprinting-hashing technique, which (a) provides a first level of amplitude amplification before applying the sequence of Grover amplitude amplification operators and (b) makes the algorithm more efficient in terms of memory use — it requires qubits. Note that previously developed algorithms by other researchers without hashing require qubits.
1 Introduction
The problems of information retrieval in databases are well known in theoretical and applied computer science. The problem is an enumeration problem, which classically requires an enumeration of all possible options. The well-known classical unsorted list search is a sequential search query algorithm. Such algorithms, for our convenience, can be described as follows. We need to solve a Boolean equation , where is a Boolean function on , . It is assumed that the function is given as an oracle. In the query model, one can ask the oracle only a question like: “what is given ?”, and use the answer in further computations. The number of queries is a characteristic of the complexity (time) of such a query algorithm.
The above problem of solving the equation is a general form of the iteration problem, which classically requires successive iteration over all elements of for the case where is unsorted. The quantum algorithm finds some root of the equation using calls to the function .
Such a quantum search, which accelerates search by a quadratic factor compared to classical algorithms, was proposed by L. Grover in 1996 [1] and has been modified in a number of subsequent works (see, the paper [2], lecture notes [3] and the book [4] for more information and citations). The development of quantum algorithms for information retrieval in databases continues. For example, see reviews of [5, 6].
Search for a word in the text.
Searching for a word in a text, as a task of searching for information in a database, has been considered in many works and has some specific features. First, the oracle , defined by the task, is used to find the occurrence of the word in the dictionary . Namely, if and only if . Secondly, for searching for a word in a text to speed up search in the 1970s and 1980s, hashing-based algorithms were proposed. The Knuth-Morris-Pratt (KMP) algorithm [7] and the Rabin-Karp algorithm [8] solve the problem in linear number of comparisons (in linear time), with complexity.
It is clear that quantum algorithms for this task, based on Grover’s idea, provide a quadratic saving in the number of calls to the oracle. Note that to describe Grover’s algorithm, we first need to specify an efficient oracle circuit for for such algorithm. A number of similar studies in recent decades in the field of developing algorithms for searching for occurrences of a word in a text (in an unsorted dictionary constructed from the text) were devoted to the construction of efficient oracles . A number of constructions of such oracles in terms of quantum circuits were proposed and the complexity of such circuits was studied: Practical Implementation of a Quantum String Matching Algorithm [9], Algorithm of Ramesh, Hariharan, and Vinay [10] (2003). Montanaro [11] (2017). Soni and Rasool [12] (2020).
Note that all of these listed results require qubits to search for a word of length in a dictionary of size .
The idea of using hashing methods for quantum information retrieval in an unordered database was realized in 2024 [13]. In [13], a hybrid classical-quantum (probabilistic-quantum) algorithm is proposed for searching a word of length in a dictionary of size .
The key idea of the hybrid classical-quantum algorithm is as follows: (a) choose a universal hash family of hash functions, (b) uniformly randomly choose a hash function , (c) hash the elements of with the hash function , and (d) apply the quantum amplification technique to search for the hash image of the word in the hashed dictionary . It has been proven that using hashing methods can exponentially reduce the number of qubits required. Specifically, qubits are sufficient when using hashing, instead of qubits in algorithms without hashing.
Quantum fingerprinting-hashing.
The quantum fingerprinting-hashing technique (quantum fingerprinting-hashing technique or quantum hashing technique for short) is a key component of the algorithms considered in this paper. The quantum fingerprinting was formalized as a method of quantum information compression in 2001 by Burman et al. [14]. The quantum fingerprinting function presented in [14] was built on error-correcting codes. Various designs of quantum hash functions and their cryptographic properties were discussed in [15]. See also the book [16]. The quantum hash function maps words of length to qubit states. Such a map has the following important (for our algorithms) properties: (a) the map is contractive, i.e. (for example for fingerprinting function [14] ), (b) the resulting quantum states are highly distinguishable, and (c) the function is invertible.
Our contribution.
In this work, we make a further step in applying hashing technique for finding a specific word of length from an unordered collection (vocabulary) of words, each of which has length .
The quantum search algorithm works as follows. The vocabulary is represented as a quantum state , whose elements are hashed by the quantum hash function . The search for an occurrence of the word is performed in two stages as follows. In the first stage, the quantum state is quantum-parallel inverted by the mapping determined by . This stage provides the first level of amplitude amplification. Then, in the second stage, a known sequence of Grover amplitude amplification operators is applied. Finally, the resulting quantum state is measured, and the algorithm produces the result.
The high probability of the correct result is based on properties (b) and (c) of the quantum hash function — they provides a first level of amplitude amplification before applying the sequence of Grover amplitude amplification operators
The quantum hash function is memory efficient: searching for a word of length in a dictionary of size requires qubits. The number of oracle calls is .
Note that, as in the previous work [13], we present the algorithm in terms of the quantum query model. This level of representation is sufficient to demonstrate the efficiency of memory use by the algorithm. We leave the issues of efficient implementation of the oracle scheme for further consideration.
The paper is organized as follows. In Section 2, we describe the quantum fingerprinting function and the necessary operators for its further implementation. In Section 3, we present a dictionary search algorithm based on the quantum fingerprinting function in technical detail. In Section 4, we formulate and prove theorems analyzing the operation of the algorithm and its complexity characteristics.
In Section 5.2, we generalize the specific construction of the algorithm from Section 3 to the general case of constructing algorithms based on the quantum hashing technique in terms of an arbitrary -collision-resistant quantum hash function . More precisely, in Section 5 we present a group of algorithms that can differ depending on which quantum hash function (see Section 5.1) we use.
In Section 6, we present its characteristics. We conclude Section 6 with a discussion of the features of the algorithm.
2 Quantum fingerprinting-hashing
The quantum function , defined in this section, is considered in the paper [17]. The authors called their function “quantum fingerprinting function”. The function is defined based on a binary error-correcting code.
Error Correcting Codes.
The error-correcting code is defined by the mapping
with the following condition: for any different words
their maps
satisfy the condition that the Hamming distance between them is at least .
Such a code is called an -code. A binary code is one in which .
For arbitrary , there exists an -error-correcting code with , .
Function generated by error correcting code.
The function generated by the binary -error-correcting code defined as follows.
For the function
is defined by the condition
(1) |
where is the -th bit of the codeword .
In content: a word with length is mapped to a -qubit quantum state . This state represents the encoding of the original word . The state is a superposition over qubits.
Realization of the quantum fingerprinting function .
Let .
The transformation
(2) |
acts on -qubits. We define the transformation . It is defined by a unitary matrix. It is a composition of the Hadamard transformation , the identity transformation and the transformation . It is convenient to define the transformation by describing its action on the vectors of the computational basis of the space. The arguments of the transformation are words :
is defined by the codeword , which has a length of . In terms of content, the transformation “changes” the last -st qubit in the basis state to .
Property 1
Transformation defines the function . Namely, for any word the following is true
Proof. The transformation , transforms the basis state into an equal-amplitude superposition of the first qubits (Hadamard transformation of the first qubits)
Next, the transformation transforms the state to the state :
Next, we also need the transformation
given by words , and it is the inverse of the transformation in the following sense
Property 2
The transformation is the inverse of the transformation .
Proof. Indeed, for an arbitrary word , for a -qubit state it is true that :
3 Algorithm .
Problem
Given an unordered set composed of binary sequences , each of length .
where for .
Given a binary string of length , where . It is required to find the index of the occurrence of in the sequence . Namely, it is required to find an index such that .
Note that this is essentially a database search problem.
Algorithm
The algorithm consists of two parts:
-
1.
First part: preparing the initial state based on the sequence .
-
2.
Second part: reading the search word and searching for its position in the sequence.
Note that the algorithm takes different input data in the two parts: and , respectively.
Description of the algorithm .
Input data:
For the first part: the sequence of binary words with a length of .
For the second part: a binary string with length .
Output data: The index , which denotes the number of an element in the sequence, such that .
Thus, the algorithm implements the mapping
-
1.
The first part of the algorithm (Initialization of hashed vocabulary) involves preparing the initial state .
The initial state is
Here, – is the zero state of qubits. According to this sequence , based on the transformation , which defines the quantum function , the state is converted into the initial state:
Recall that
(3) -
2.
The second part of the algorithm consists of reading the searched word and searching for a position , such that .
The search procedure consists of applying the amplitude amplification method presented in the paper [2] and described in the book [4].
-
•
Conversion hash transformation
The operator is applied to the state ,
where for .
-
•
Amplification
The amplification procedure of the amplitudes of basic states (“good states” as they are called in [4]) is applied to state .
The amplitude amplification procedure consists in the use of Grover macro steps according to the description in [4].
-
•
Measurement
The first qubits of the final state are measured in a computational basis. If the last qubits are all zero, then the result of measuring the first qubits is the position of element , where .
-
•
3.1 Characteristics of the algorithm.
The characteristics of a quantum algorithm include: the probability of error, the number of queries made to the analyzed data, and the amount of memory used.
The probability of success. We denote by the probability of a successful outcome of the algorithm .
Query complexity. The number of queries (the number of requests to the oracle) is the query complexity of the quantum algorithm .
Memory complexity. The number of used qubits is a measure of the memory complexity of quantum algorithm .
4 Analysis of the algorithm.
In this section, we define the theorem 2 and present its proof. The main states result is that the algorithm provides the correct result quickly with high probability and low memory complexity. Informally, this result is presented in theorem 1.
Denote by the probability of the event that the result of algorithm is a number , such that .
Meaningfully, the algorithm has the following characteristics.
Theorem 1
For the algorithm which searches for an occurrence of an element with length in a sequence of elements, the following is true
The Theorem 1 is based on the following formal statement.
Theorem 2
Let , and , where .
We present the proof of Theorem 2 in the next section. We conclude that section by demonstrating that parameters and can be selected to make close to 1.
4.1 Proof of the theorem 2.
In this section, we present a proof of the estimates for Theorem 2. For convenience, the evaluation of each characteristic has been presented in its own section.
4.1.1 Estimation of the probability of the algorithm success and query complexity
The following Property 3 is key to the assertion of the Lemma 1. This Lemma 1, in turn, is key to proving the high probability of correct operation of the algorithm.
Property 3
Let the function (3) generated by a binary error correcting -code with for some . Then for an arbitrary pair of different words the following is true
In this case, it is natural to say that the states and are -orthogonal.
Lemma 1
Let the function (3) generated by a binary error correcting -code with for some . Then for the state
the following is true. If , then . If , then .
Proof. The transformation is the inverse of . is applied to state to get . Therefore, for the word
For all other words , their function values and are pairwise -orthogonal (due to the Property 3).
Unitary transformation of states and saves the scalar product. This means that all states for , are pairwise -orthogonal
Therefore, for states
for it is true that .
The latter proves the statement of the lemma.
Amplification.
The amplification procedure corresponds to the one presented in the article [2]. For convenience, we will follow the description and notations of the procedure in chapter 8 of the book [4].
The amplification procedure is implemented by repeatedly applying the unitary operator to the state . In the text below, for simplicity, we will use to indicate the state of .
To prove the theorem, we first divide the vector into two parts and .
and the vector consists of the remaining components of the state . Let’s denote
is the probability of measuring basic states for . We will call these basic states good states, and the probability of obtaining them good probability . We will call other basic states bad states, with the probability of obtaining them being called bad probability . We renormalize components and and get the following states
(5) |
Then we can record
or
where defined by equality .
We define the search iteration operator as follows.
For an arbitrary real number , the operation acts as follows:
and thus, performs reflection relative to the axis defined by the vector .
More precisely, the operator changes the amplitude sign for the basic states of the form .
Let’s denote the state
which is orthogonal to .
and
are orthonormal bases for the same 2-dimensional space.
The operator acts as follows:
and thus, performs reflection relative to an axis defined by a vector .
Indeed,
and can be described in the basis
as
Repeating the application of the operator once brings the initial state to
(6) |
Let . Note that for small : for small (). In particular, in our case
and , for small . We choose such that , meaning (). Choosing provides .
is the probability of measuring a unique basic state among all basic states , of the part of the state (6). From this and from (4), (5) it follows that
Thus, selecting and choosing close to 0( chosen according to respectively) provides .
4.1.2 Estimation of the memory
The dimension of the algorithm is defined as the total number of qubits used in the state . So we have
For some selected and we have
Comment 1
Recall that is a parameter of the error-correcting code , and the code defines the quantum fingerprinting function . The value of plays an important role in determining the pairwise “almost” orthogonality (-orthogonality) of the values of . This is essential for proving the high probability of a correct result from the algorithm.
5 Algorithm based on quantum hashing
In this section, we present a generalization of the algorithm . The generalization is based on the application of quantum functions called “-stable”.
5.1 Quantum -stable functions
Definition 1
(-stable function) The quantum function
that maps words of length in the finite alphabet to a set of -qubit states will be called -stable, if for all words their quantum images and are pairwise -orthogonal
Definition 2
((,,)-quantum hash function) The quantum function
that maps words of length in the finite alphabet to a set of -qubit states will be called (,,)-quantum hash function if quantum function is -stable and
The requirement of -stable property of the function imposes restrictions on the compression ratio. The following lower bound for is was presented in [18].
Theorem 3
Let be a (,,)-quantum hash function. Then
Transformations and .
The algorithm uses the unitary transformation on the space and its inverse .
The transformation is an implementation of the (,,)-function . For the transformation, the following must be true
Accordingly, for the inverse transformation , the following must be true
5.2 Algorithm
The algorithm is a generalization of the quantum algorithm in terms of an arbitrary (,,)-quantum hash function .
The algorithm consists of two parts:
-
1.
First part: preparing the initial state based on sequence .
-
2.
Second part: reading the search word and searching for its position in the sequence.
Note that the algorithm has different input data for the two parts: and the word , respectively.
Description of the algorithm .
Input data:
For the first part: sequence of binary words with length .
For the second part: Binary word with length .
Output data: The index , which denotes the word number such that .
Thus, the algorithm implements the mapping
-
1.
The first part of the algorithm consists of preparing the initial state.
According to this sequence , using the transformation , which defines the (,, )-quantum hash function , a state is prepared:
-
2.
The second part of the algorithm – amplification consists of reading a word to be searched for, , and finding a number such that .
-
(a)
Conversion hash transformation – first level of amplification
The operator is applied to the state ,
where is the inverse transformation controlled by the searched word . We get the state
where for .
-
(b)
Amplification
-
(c)
Measurement
The first qubits of the final state are measured in a computational basis. If the last qubits are all zero, then the measurement result of the first qubits is declared as the index of the word in the sequence , for which .
-
(a)
6 Analysis of the algorithm
Meaningfully, the algorithm has the following characteristics.
Theorem 4
For the algorithm , which searches for the occurrence of an element with length in an unordered sequence of elements, the following is true
where is the number of qubits allocated to the (,,) - quantum hash function value.
6.1 Proof of the Theorem 4.
The rest (main) two parts of the proof of the statement of the Theorem 4 a) space complexity of the algorithm and b) the proof of the correctness of the algorithm are completely determined by properties of -quantum hash function .
6.1.1 Estimation of the probability of the algorithm success and query complexity
The proof of the correctness of the algorithm that is, the proof of the high probability of the success of the algorithm follows the proof of the Theorem 2 with one modification.
Property 4
Let the function be an -stable function. Then for the state
the following is true: If , then . If , then .
The proof of the Property 4 repeats the proof of the Lemma 1 with the simplification that we do not need to prove the -orthogonality (-stable) property of function . Remind that such the -stability of the function is essential point in the proof of Lemma 1. But here we have such property (-stability) from the definition of (,,)-quantum hash function .
Query complexity of the algorithm is determined by the number of applications of Grover iterations to search for occurrences of the word in a dictionary consisting of words. The implementation of the “query” part of the algorithm is described in the section 2b and in fact repeats the proof for the presented characteristics in Theorem 2.
We present only the final calculations from the proof.
Let is the probability of measuring one of the states after applying all Grover iterations. Note that for small : for small (). In particular, in our case
We choose such that , meaning (). Choosing provides .
is the probability of measuring a unique basic state among all basic states , of the part of the state (6). From this and from (4), (5) it follows that
Thus, selecting and choosing close to 0 provides .
Comment 2
Here we would like to comment on the second part of the algorithm (amplification) and its stages as follows. We call (following the [4]) the basic states in the form as “good states”, which are states whose amplitudes we amplify using well-known quantum amplification procedure. The remaining basic states are called “bad states”.
Note that there are good basic states
and only (some of them) states for which among the “good states” are “correct”. The necessary distinction between correct and incorrect states among the “good states” is implemented by means of the reverse hash transformation. This is the first level of amplification, and it works as follows.
Reverse hashing with a given word creates a state that is divided into two sets: a set of good states and a set of bad states. The set of good states is further divided into correct states and incorrect states. Reverse hashing operates such that the amplitudes of incorrect states is reduced from the initial value of to , while remaining amplitudes remain at . This is the first step in amplitude amplification. Subsequent steps of Grover’s algorithm amplify the amplitude of both correct and incorrect good states, such that the probability of finding correct states approaches 1, while the probability for incorrect states does not grow significantly.
6.1.2 Estimation of the memory
The space complexity is determined by the number of qubits representing the initial state and its transformations. The essential component of here is — the compression ratio achieved by the hash function .
Upper bound for depends on the type of the ()-quantum hash function.
7 Conclusion
In conclusion, we note that the algorithm is based on the application of a -quantum hash function.
We will call a (,,)-quantum hash function “good” if it is highly compressive (that is, the number of qubits allocated for the value of the hash function should be close to the lower bound defined in Theorem 3). Otherwise, we will consider the (,,)-quantum hash function to be “bad”.
Choosing a “good” -quantum hash function allows you to achieve a significant reduction in the number of qubits required for the algorithm to work.
The fingerprinting function described in Section 3, which is based on error-correcting codes, is a (,,)-quantum hash function. In addition, it is a good hash function: the estimate of is defined as .
Another function, that is also a ()-quantum function, is based on Freivald’s fingerprinting. This technique was proposed in the 1970s and has a long history of various applications. In particular, for quantum computations it has been used to construct efficient quantum automata [19] and quantum branching programs [20].
Freivald’s fingerprinting also defines a ()-quantum function (for the construction, see, e.g., [18] and the book [16]). In the content of this work, such the ()-quantum function is also “good”, i.e. for and with equal to almost .
So, the above two (,,)-quantum hash functions are “good”. That is they use qubits which are exponentially less than the length of a word from the dictionary.
An example of a “bad” ()-quantum hash function is one based on the universal family of linear hash functions [18], see also the book [16]. This hash function is not asymptotically optimal in terms of the number of qubits required for construction.
The study of various ()- quantum hash functions, problems of their efficient implementation and development of algorithms based on them continues.
References
- [1] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Gary L. Miller, editor, Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, Philadelphia, Pennsylvania, USA, May 22-24, 1996, pages 212–219. ACM, 1996.
- [2] Gilles Brassard, Peter Hoyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. Contemporary Mathematics, 305:53–74, 2002.
- [3] Ronald de Wolf. Quantum computing: Lecture notes. CoRR, abs/1907.09415, 2019.
- [4] Phillip Kaye, Raymond Laflamme, and Michele Mosca. An Introduction to Quantum Computing. Oxford University Press, 2006.
- [5] Farid M. Ablayev, Marat Ablayev, Joshua Zhexue Huang, Kamil Khadiev, Nailya Salikhova, and Dingming Wu. On quantum methods for machine learning problems part I: quantum tools. Big Data Min. Anal., 3(1):41–55, 2020.
- [6] Farid M. Ablayev, Marat Ablayev, Joshua Zhexue Huang, Kamil Khadiev, Nailya Salikhova, and Dingming Wu. On quantum methods for machine learning problems part II: quantum classification algorithms. Big Data Min. Anal., 3(1):56–67, 2020.
- [7] Knuth, Donald E and Morris, Jr, James H and Pratt, Vaughan R. Fast pattern matching in strings. SIAM journal on computing, 6(2):323–350, 1977.
- [8] RM Karp. Efficient randomized pattern-matching algorithms, the ibm journal of research and development. http://www. research. ibm. com/journal/rd/312/ibmrd3102P. pdf, 31, 1987.
- [9] Francesco Pio Marino, Simone Faro, and Antonio Scardace. Practical implementation of a quantum string matching algorithm. In Proceedings of the 2024 Workshop on Quantum Search and Information Retrieval, pages 17–24, 2024.
- [10] Hariharan Ramesh and V Vinay. String matching in o (n+ m) quantum time. Journal of Discrete Algorithms, 1(1):103–110, 2003.
- [11] Ashley Montanaro. Quantum pattern matching fast on average. Algorithmica, 77(1):16–39, 2017.
- [12] Kapil Kumar Soni and Akhtar Rasool. Pattern matching: a quantum oriented approach. Procedia Computer Science, 167:1991–2002, 2020.
- [13] Farid Ablayev, Nailya Salikhova, and Marat Ablayev. Hybrid classical–quantum text search based on hashing. Mathematics, 12(12):1858, 2024.
- [14] Harry Buhrman, Richard Cleve, John Watrous, and Ronald De Wolf. Quantum fingerprinting. Physical Review Letters, 87(16):167902, 2001.
- [15] Farid M. Ablayev, Marat Ablayev, Alexander Vasiliev, and Mansur Ziatdinov. Quantum fingerprinting and quantum hashing. computational and cryptographical aspects. Balt. J. Mod. Comput., 4(4), 2016.
- [16] Ablayev F.and Ablayev M. and Vasiliev A. Quantum hashing: effective constructions (in Russian). Lamport, 2023.
- [17] Harry Buhrman, Richard Cleve, John Watrous, and Ronald de Wolf. Quantum fingerprinting. Phys. Rev. Lett., 87(16):167902, 2001.
- [18] Farid Ablayev and Marat Ablayev. Quantum hashing via classical -universal hashing constructions. arXiv preprint arXiv:1404.1503, 2014.
- [19] Andris Ambainis and Rusins Freivalds. 1-way quantum finite automata: strengths, weaknesses and generalizations. In Proceedings 39th Annual Symposium on Foundations of Computer Science (Cat. No. 98CB36280), pages 332–341. IEEE, 1998.
- [20] Farid Ablayev and Alexander Vasiliev. Algorithms for quantum branching programs based on fingerprinting. arXiv preprint arXiv:0911.2317, 2009.