Quantum implementation of circulant matrices and its use in quantum string processing
Abstract
String problems in general can be solved faster by using special data structures such as suffixes in many cases structured as trees and arrays. In this paper, we show that suffixes used in those data structures can be obtained by using circulant matrices as a quantum operator which can be implemented in logarithmic time. Hence, if the strings are given as quantum states, using the presented circuit implementation one can do string processing efficiently on quantum computers.
Index Terms:
Quantum algorithms, circulant matrix, suffix trees, Burrows-Wheeler Transform, string processingI Introduction
For any given vector , the circulant matrix with the first row is defined as follows ([1, 2, 3]):
(1) |
The eigenspace of the circulant matrices are formed by Fourier matrices; therefore, the eigenvectors and the eigenvalues of are defined analytically in the following form of pairs: A th () eigenvector is given by
(2) |
where . And its corresponding eigenvalue is
(3) |
As a result, the eigendecomposition of any circulant matrix can be written as where represents the normalized matrix for the discrete Fourier transform and , which is a diagonal matrix representing the eigenvalues.
I-A Direct Implementation as a quantum circuit
One can implement as a quantum circuit by using its eigendecomposition : It is well known that can be implemented in time as the quantum Fourier transform [4]. In addition, since any vector of dimension can be implemented in time, we can also implement in linear time. Therefore, can be implemented as a quantum circuit by using two and single qubit quantum gates, which is less than the classical matrix size .
I-B Circuit implementation through permutations
can be represented as a function of the following permutation matrix, which is also circulant:
(4) |
Then, we can redefine as a polynomial function of with the coefficients given by :
(5) |
Since is an orthogonal matrix, we can construct its exact quantum circuit decomposition. Moreover, it is known that sum of unitary matrices can be implemented as a subpart of a larger quantum system with the help of ancilla registers (e.g., [5, 6, 7]). Here, we shall show that can be constructed as a vector by using where represents a combinations of the vector and includes the powers of used in the above polynomial. Later in the paper, we shall show that this way of implementation eases the way to solve some string problems.
I-B1 Circuit for
Since is a circulant matrix, we can define its any th eigenvalue by using Eq.(3):
(6) |
Therefore, its eigendecomposition can be written as:
(7) |
As a quantum circuit, we can implement as follows: First, observe that the power in is equal to the row index which can be found from the binary expansion of . Therefore, if a qubit is in , we apply a control phase gate to the first qubit. The phase of these gates are determined from the decimal index of the qubit in the binary expansion. We define the phase gate as follows:
(8) |
where is the order of the control qubit in the binary expansion . The resulting circuit is drawn in Fig.1 for four qubits. The number of quantum gates for this implementation is equivalent to the number of qubits which is .
\Qcircuit@C=0.7em @R=1em
\lstick &\qw\qw \qw\ctrl3 \qw
\lstick \qw\qw \ctrl2 \qw \qw
\lstick \qw\ctrl1\qw \qw \qw
\lstick\gateR_w(0) \gateR_w(1) \gateR_w(2) \gateR_w(3)\qw
I-B2 Circuit for
Consider the following matrix:
(9) |
which highly resembles given in Eq.(7). Therefore, we can implement the same way as we implemented : In this case, we only need to somehow replace with the power of :
(10) |
Since all the powers of s have the same eigenvectors, we apply and only once at the beginning and at the end of the circuit. We use the following quantum operation in the circuit:
(11) |
Here, when we look at the circuit for in Fig.1, we can see that the power can be distributed to the quantum gates inside the circuit since the order of the gates is not important. That means, we can obtain any power of by simply adjusting the angle values of the quantum gates. Therefore, for power of , we adjust the parameter of the rotation gates in Fig.1 as: , and so on. The final circuit for is just composed of the controlled versions of these quantum gates which is shown in Fig.2 and 3.
@C=0.7em @R=1em \lstick &\qw\qw\qw \qw\ctrl3 \qw \qw
2 \qw \qw \qw
1\qw \qw \qw \qw
F^-1\gateU_P(0) \gateU_P(1) \gateU_P(2) \gateU_P(3)\gateF\qw
@C=0.7em @R=1em
\lstick &\ctrl4\ctrl4 \ctrl4 \ctrl4 \qw
\lstick \qw\qw \qw\ctrl3 \qw
\lstick \qw\qw \ctrl2 \qw \qw
\lstick \qw\ctrl1\qw \qw \qw
\lstick\gateR_w(0×k) \gateR_w(1×k) \gateR_w(2×k) \gateR_w(3×k)\qw
I-C The overall complexity
As shown in Fig.2 and 3 the number of quantum gates are limited to the number of qubits used in the main system. The size of the permutation matrix is by , therefore the main system is described by qubits. Then, an additional qubit is used to construct . Therefore, in total the circuit requires qubits. The quantum gates are at most controlled by two qubits, therefore the total number of required single and two qubit gates are bounded by in general. More specifically, if we assume the general implementation of quantum Fourier transform requires gates (with an optimization, it may require ) and the decomposition of each Toffoli gate requires less than gates [4], then the total complexity becomes bounded by . This shows that we can form on quantum computers very efficiently since it requires a number of quantum gates which is a poly-logarithm of the dimension .
II Applications
II-A Suffix trees and arrays
Many string problems can be solved easier if they are stored by using well-known data structures such as suffix tries, trees, and arrays [8]. Below, we will follow the lecture notes in Ref.[9] to give a brief introduction of these data structures and explain how to implement them on quantum computers with the help of the circuits introduced in the previous section.
From a given string, a suffix can be chosen by determining a starting position. For instance, we can choose the following suffixes from string “banana”:
(12) |
Here, “$” is used to indicate the end of the string and considered less than any letter of the alphabet in lexicographical order.
In suffix arrays, these strings are put in the buckets based on their orders and sorted. In tries, the prefixes are put in the rooted tree in which every node has a child for each letter of the alphabet used in the string. In the suffix tries, the suffixes are put in a rooted tree in which every node has a child for each letter of the alphabet used in the string. Therefore, by following the paths from the root of the tree, it is possible to determine if a letter follows another and if the prefix exists or not. In the suffix trees, the tries are built for the suffixes and the number of nodes in the tries are optimized by compressing the nodes in the straight paths ( i.e. the paths that do not go more than one way in any nodes on the path.). Each path in the suffix tree ends with the letter “$”.
II-B Burrows Wheeler Transform[12]
The main idea in Burrows Wheeler transform (BWT) is to first generate a circulant matrix by using the given string and then sort this matrix column by column. In the sorted matrix, the last column is used as the equivalent transformed string. As an illustration, consider the following example string ”banana”, which is used in most textbooks:
(13) |
From the above, BWT(“banana”) = “annb$aa”. In the above matrix, if one considers “$” as 1s and the rest of the letters as 0s, then BWT of any string can be represented as a permutation matrix.
II-C Quantum implementation of suffixes
In the previous section we show how to construct . Here, consider that we are given the text including also the “$” sign. We apply in the main register and in the ancilla register. This generates the following vector:
(14) |
If we multiply this vector by , we get in Eq.(1) in vector form:
(15) |
where each represents a circularly permuted string. The above vector provides the unsorted suffixes whose end indicated by some end of file character ”$”. Since the order of this character provides information about the suffixes, the amplitude of this char may be adjusted higher so that from the measurements, the order can be obtained in an efficient way.
By using the indices of the end of file characters, we can collapse the quantum state onto any desired direction. In addition, we can draw some conclusion from the measurements on the collapsed state: e.g., . Therefore, the measurement statistics can be used to draw conclusions about the most common prefixes in the suffixes in the collapsed state.
Since in most string algorithms sorted structures allow us to develop more efficient algorithms, the same sorting can be also done on this vector.
III Sorting
Sorting problem simply can be described as finding an ordered list of items given as an unordered list. It is known that comparison based sorting algorithms such as merge sort and quicksort have running time for a list of items. This lower bound can be broken by using bucket sorting (counting sort) where each item is considered as a direct pointer to the bucket; therefore, the sorting is done in time. This simple approach is further improved to give time, where is the number of bits used to represent each item and time. Because of the memory constraints and requirements, in practice comparison sorts are used more often in practice than these kinds of sorting algorithms. [13]
On quantum computers, the sorting problem is considered based on registers . Here, each register represents an item in the unsorted list. Therefore, the algorithms try to prepare the registers in the output to encode the natural ordered list of the given items: i.e., . Based on this representation, it is shown that quantum algorithms based on comparison models have similar complexity bounds for the sorting problem: i.e. [14, 15].
As done in the classical sorting algorithms, bucket sort with an order preserving hash function can be used to sort the items stored in memory as:
(16) |
where is the index of the item in the given unordered list. The sorting problem becomes constructing the following quantum state:
(17) |
where represents the index of the item in the sorted list. We can rewrite this in terms of a hash function that maps an item to the index :
(18) |
can be as simple as a direct map or more general hash function. In particular consider as a partial order preserving hash function: i.e., if and , their real ordered locations at some distance from each other so that , then . Then, since we can apply an operator simultaneously to all items, we can generate their orders in time. Sometimes knowing the elements’ rough order in the array may be considered enough. In those cases, the hash function need not be perfect; therefore, one can use similar ideas to classical sorting algorithms such as bucket sort or the shell-sort 111Shell sort is a generalization of insertion sort algorithm, where items at certain distances are compared and if necessary swapped to have a k-sorted array: i.e. an array where the numbers are grouped into regions based on their orders. to generate some k-sorted array in which buckets are sorted, however, the items in the same buckets are not sorted.
As mentioned above, if we use comparison based sorting algorithms, then the sorting is almost the same as classical sorting algorithm: Let us consider the following vector whose construction is given in the previous section:
(19) |
In Burrows Wheeler transform, the items are sorted by columns. We can do the same sorting on this vector: A particular direct sorting may be as follows:
-
•
First, we compare each element with its left neighbor, then if it is necessary we swap them.
-
–
This step can be done in parallel, if the swap and comparison can be implemented in number of quantum gates, then it takes time 222Here, since the comparison and swap operations depend on the number of qubits, it may require controlled gates whose decomposition requires number of gates that are polynomial in the system size . If this step can be done in , then sorting can be done more efficiently..
-
–
-
•
In the second step we do the same thing for the right neighbors (we group two elements together and compare them.).
-
–
This step also requires number of quantum gates.
-
–
-
•
If we repeat the above steps for O(n) time, we basically get a simple time sorting algorithm.
IV Conclusion
In this paper, we describe how to generate suffix structures efficiently as a vector by using quantum circuits for circulant matrices. We discuss how the generated vector can be used in the string algorithms and sorted if necessary. As a future direction, we will apply this circuit to the sequence alignment and pattern matching problems. Since circulant matrices are used in convolutions, it can be also applied to problems in different areas such as convolutional neural network, time series analysis [16, 17].
References
- [1] G. H. Golub and C. F. Van Loan, Matrix computations. JHU press, 2013.
- [2] R. M. Gray et al., “Toeplitz and circulant matrices: A review,” Foundations and Trends® in Communications and Information Theory, vol. 2, no. 3, pp. 155–239, 2006.
- [3] H. Karner, J. Schneid, and C. W. Ueberhuber, “Spectral decomposition of real circulant matrices,” Linear Algebra and Its Applications, vol. 367, pp. 301–311, 2003.
- [4] M. A. Nielsen and I. Chuang, “Quantum computation and quantum information,” 2002.
- [5] A. M. Childs and N. Wiebe, “Hamiltonian simulation using linear combinations of unitary operations,” arXiv preprint arXiv:1202.5822, 2012.
- [6] A. Daskin, A. Grama, G. Kollias, and S. Kais, “Universal programmable quantum circuit schemes to emulate an operator,” The Journal of chemical physics, vol. 137, no. 23, p. 234112, 2012.
- [7] G. H. Low and I. L. Chuang, “Hamiltonian simulation by qubitization,” Quantum, vol. 3, p. 163, 2019.
- [8] U. Manber and G. Myers, “Suffix arrays: a new method for on-line string searches,” siam Journal on Computing, vol. 22, no. 5, pp. 935–948, 1993.
- [9] B. Langmead, “Burrows-wheeler transform and fm index,” Johns Hopkins University, accessed in 2022. [Online]. Available: https://www.cs.jhu.edu/~langmea/resources/lecture_notes/10_bwt_and_fm_index_v2.pdf
- [10] E. Ukkonen, “On-line construction of suffix trees,” Algorithmica, vol. 14, no. 3, pp. 249–260, 1995.
- [11] M. Mielczarek and J. Szyda, “Review of alignment and snp calling algorithms for next-generation sequencing data,” Journal of applied genetics, vol. 57, no. 1, pp. 71–79, 2016.
- [12] M. Burrows and D. Wheeler, “A block-sorting lossless data compression algorithm,” in Digital SRC Research Report. Citeseer, 1994.
- [13] T. Hagerup, “Sorting and searching on the word ram,” in Annual Symposium on Theoretical Aspects of Computer Science. Springer, 1998, pp. 366–398.
- [14] P. Høyer, J. Neerbek, and Y. Shi, “Quantum complexities of ordered searching, sorting, and element distinctness,” Algorithmica, vol. 34, no. 4, pp. 429–448, 2002.
- [15] A. Ambainis, “Quantum lower bounds by quantum arguments,” Journal of Computer and System Sciences, vol. 64, no. 4, pp. 750–767, 2002.
- [16] D. S. G. Pollock, “Circulant matrices and time-series analysis,” International Journal of Mathematical Education in Science and Technology, vol. 33, no. 2, pp. 213–230, 2002.
- [17] A. Daskin, “A walk through of time series analysis on quantum computers,” arXiv preprint arXiv:2205.00986, 2022.