This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Quantum algorithm for position weight matrix matching

Koichi Miyamoto [email protected] Center for Quantum Information and Quantum Biology, Osaka University, Toyonaka, Osaka 560-0043, Japan    Naoki Yamamoto Department of Applied Physics and Physico-Informatics, Keio University, Yokohama, Kanagawa, Japan Quantum Computing Center, Keio University, Yokohama, Kanagawa, Japan    Yasubumi Sakakibara Department of Biosciences and Informatics, Keio University, Yokohama, Kanagawa, Japan Quantum Computing Center, Keio University, Yokohama, Kanagawa, Japan
Abstract

We propose two quantum algorithms for a problem in bioinformatics, position weight matrix (PWM) matching, which aims to find segments (sequence motifs) in a biological sequence such as DNA and protein that have high scores defined by the PWM and are thus of informational importance related to biological function. The two proposed algorithms, the naive iteration method and the Monte-Carlo-based method, output matched segments, given the oracular accesses to the entries in the biological sequence and the PWM. The former uses quantum amplitude amplification (QAA) for sequence motif search, resulting in the query complexity scaling on the sequence length nn, the sequence motif length mm and the number of the PWMs KK as O~(mKn)\widetilde{O}\left(m\sqrt{Kn}\right), which means speedup over existing classical algorithms with respect to nn and KK. The latter also uses QAA, and further, quantum Monte Carlo integration for segment score calculation, instead of iteratively operating quantum circuits for arithmetic in the naive iteration method; then it provides the additional speedup with respect to mm in some situation. As a drawback, these algorithms use quantum random access memories and their initialization takes O(n)O(n) time. Nevertheless, our algorithms keep the advantage especially when we search matches in a sequence for many PWMs in parallel.

I Introduction

Quantum computing [1] is an emerging technology that has a potential to provide large benefits to various fields. Many quantum algorithms that speedup time-consuming problems in classical computing have been proposed, and their applications to practical problems in industry and science have been studied. In this paper, we focus on an important problem in bioinformatics: position weight matrix matching.

As a field in bioinformatics, sequence analysis, which focuses on biological sequences such as DNA sequences and protein amino-acid sequences, has a long history. Such a sequence is represented as a string of alphabets 𝒜\mathcal{A}, e.g. {A,G,T,C}\{A,G,T,C\} for nucleabases in DNA and 20 letters for 20 types of amino acids in a protein, and holds biological information. As a tool to extract important information from a sequence, position weight matrices (PWMs), also known as position specific scoring matrices (PSSMs), are often used. More concretely, a PWM is a tool to find segments with fixed length mm that seems to hold specific information from a sequence with length nn. As explained in details later, a PWM MM is a matrix of real values, and its entries determine a score given to a segment. For example, if the iith position in a segment has the jjth alphabet, the (i,j)(i,j)th entry of MM is the score of that position. The score of the segment is defined as the sum of the scores at all the positions. We then search segments that have scores higher than the predetermined threshold. In this way, we can find some specific patterns (sequence motifs) in a sequence, admitting not only exact match but also approximate match to some extent. PWMs is in fact used to, for example, find transcription factor binding sites in DNA [2] and infer the three-dimensional structure of a protein [3].

Following recent developments of next-generation DNA sequencing technology, the volume of data handled in sequence analysis is exponentially growing. Although many classical algorithms and tools for PWM matching have been devised so far [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16], it is interesting to investigate the potential of novel technologies such as quantum computing to speedup this numerical problem to the extent that classical algorithms cannot reach.

Based on such a motivation, in this paper, we propose two quantum algorithms for PWM matching. As far as the authors know, this is the first proposal on quantum algorithms for this problem, although there are some quantum algorithms for exact string match [17, 18, 19].

Our quantum algorithms are two-fold: calculating scores of segments and searching high-score segments. For the latter, we use quantum amplitude amplification (QAA) [20], which is an extension of Grover’s algorithm for unstructured database search [21]. As is well known, this provides the quadratic quantum speedup with respect to the number of searched data points, which now corresponds to nn the length of the sequence.

On the former, we consider two approaches for score calculation, which differentiate the two proposed algorithms. The first one is calculating the segment score by adding the position-wise scores one by one using the quantum circuits for arithmetic. We name the PWM matching quantum algorithm based on this approach the naive iteration method. By this method, for any sequence with length nn and any KK PWMs for sequence motifs with length mm, given the oracles to get the specified entry in them, we can find nsoln_{\rm sol} matches with high probability making O~(mKnnsol)\widetilde{O}\left(m\sqrt{Knn_{\rm sol}}\right) queries111The symbol O~()\widetilde{O}(\cdot) hides logarithmic factors in O()O(\cdot). to the oracles. As far as the authors know, there is no known classical PWM matching algorithm whose worst-case complexity is sublinear to nn, and thus the above complexity just shows the quantum speedup. Moreover, note that we aim to search the matches for the multiple PWMs at the same time, which has not been considered in classical algorithms. We achieve the quantum speedup with respect to KK too, compared with the KK-times sequential runs of the algorithm for the different PWMs, whose complexity obviously scales with KK linearly.

The second quantum algorithm uses the quantum Monte Carlo integration (QMCI) [22] to calculate the segment score; we call this method the QMCI-based method. QMCI is, similarly to classical Monte Carlo, the method to estimate expectations of random variables, integrals and sums, and also provides the quadratic speedup compared with the classical counterpart. We therefore use this to calculate the segment score, which is the sum of the position-wise scores, expecting the further speedup from the naive iteration method, especially when mm is large and thus the sum has many terms. This combination of QMCI and QAA is similar to the quantum algorithm for gravitational wave (GW) matched filtering proposed in [23]. A drawback of this approach is the possibility of false detection: the result of QMCI inevitably has an error and the erroneous estimate on the segment score can exceed the threshold even if the true score does not. To cope with this issue, we introduce two levels of the threshold, wsoftw_{\rm soft} and whardw_{\rm hard}, which have the following meaning: we never want to miss segments with scores higher than whardw_{\rm hard} or falsely find those with scores lower than wsoftw_{\rm soft}, and it is not necessary but good to find those with scores between wsoftw_{\rm soft} and whardw_{\rm hard}. In this setting, the QMCI-based method outputs all the segments with scores higher than whardw_{\rm hard} possibly along with some of those with in-between scores with high probability, making O~(mnsoftKnwhardwsoft)\widetilde{O}\left(\frac{mn_{\rm soft}\sqrt{Kn}}{w_{\rm hard}-w_{\rm soft}}\right) oracle calls at most, where nsoftn_{\rm soft} is the number of the segments with scores higher than wsoftw_{\rm soft}. Although this complexity seemingly has the same dependency on mm as the naive iteration method, it can be sublinear to mm, since, as explained later, a reasonable choice of wsoftw_{\rm soft} and whardw_{\rm hard} is such that whardwsoftmw_{\rm hard}-w_{\rm soft}\sim\sqrt{m}.

Although the complexities of the proposed quantum algorithms are sublinear to nn and/or mm, they require time for preparation. The oracles used in the algorithms can be implemented by quantum random access memory (QRAM) [24], and initialization of QRAM, that is, registering the values of the entries in the sequence and the PWMs takes time, which is estimated as O(n)O(n) in usual situations. Despite of this initialization cost, our quantum algorithms still have the advantage over the existing classical algorithms, since, among classical ones with O(n)O(n) initialization cost, none has the worst-case complexity sublinear to nn in the main part. Also note that, once we prepare the QRAM for a sequence, our quantum algorithms can search the matches between that sequence and many PWMs, with much smaller initialization cost of the QRAM for the PWMs.

The rest of this paper is organized as follows. Sec. II is preliminary one, where we introduce PWM matching and some building-block quantum algorithms such as QAA and QMCI. Sec. III is the main part, where we explain our quantum algorithms for PWM matching, the naive iteration method and the QMCI-based method, presenting the detailed procedures in them and the estimations on their complexities. In Sec. IV, we discuss the aforementioned issues on our algorithms, the preparation cost for the QRAMs and the plausible setting on the segment score thresholds. Sec. V summarizes this paper.

II Preliminary

II.1 Notation

We denote the set of all positive real numbers by +\mathbb{R}_{+} and the set of all nonnegative real numbers by 0\mathbb{R}_{\geq 0}.

For each nn\in\mathbb{N}, we define [n]:={1,,n}[n]:=\{1,...,n\}, [n]0:={0,,n1}[n]_{0}:=\{0,...,n-1\} and n:={m|mn}\mathbb{N}_{\geq n}:=\{m\in\mathbb{N}\ |\ m\geq n\}.

For any probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P}) and any random variable XX on it, we denote the expectation of XX by E[X]E_{\mathbb{P}}[X].

For any finite set 𝒮\mathcal{S} and any nn\in\mathbb{N}, we denote S=(s0,,sn1)𝒮nS=(s_{0},...,s_{n-1})\in\mathcal{S}^{n}, where si𝒮s_{i}\in\mathcal{S} for each i[n]0i\in[n]_{0}, by S=s0..sn1S=s_{0}..s_{n-1}.

For any equation or inequality CC, 𝟙C\mathbbm{1}_{C} takes 1 if CC is satisfied, and 0 otherwise.

For any xx\in\mathbb{R}, if |xy|ϵ|x-y|\leq\epsilon holds for some yy\in\mathbb{R} and ϵ+\epsilon\in\mathbb{R}_{+}, we say that xx is an ϵ\epsilon-approximation of yy.

II.2 Position weight matrix matching

Here, we formally define the problem we hereafter consider.

Problem 1 (PWM matching).

Suppose that we are given the following:

  • A finite set 𝒜\mathcal{A} called the alphabet, whose elements are labeled with integers in [|𝒜|]0[|\mathcal{A}|]_{0}.

  • KK matrices Mk=(Mk(i,a))i[m]0,a𝒜m×|𝒜|,k[K]0M_{k}=(M_{k}(i,a))_{i\in[m]_{0},a\in\mathcal{A}}\in\mathbb{R}^{m\times|\mathcal{A}|},k\in[K]_{0} called the PWMs, where KK\in\mathbb{N} and m2m\in\mathbb{N}_{\geq 2} is called the length of the PWM.

  • An element S=s0..sn1S=s_{0}..s_{n-1} in 𝒜n\mathcal{A}^{n}, where nn is an integer larger than mm. We call it a sequence.

  • wthw_{\rm th}\in\mathbb{R} called the threshold.

For each (k,i)𝒫all:=[K]0×[nm+1]0(k,i)\in\mathcal{P}_{\rm all}:=[K]_{0}\times[n-m+1]_{0}, define

wk,i:=WMk(si..si+m1),w_{k,i}:=W_{M_{k}}(s_{i}..s_{i+m-1}), (1)

where, for every M=(M(i,a))i[m]0,a𝒜m×|𝒜|M=(M(i,a))_{i\in[m]_{0},a\in\mathcal{A}}\in\mathbb{R}^{m\times|\mathcal{A}|} and u0..um1𝒜mu_{0}..u_{m-1}\in\mathcal{A}^{m},

WM(u0..um1):=j=0m1M(j,uj).W_{M}(u_{0}..u_{m-1}):=\sum_{j=0}^{m-1}M(j,u_{j}). (2)

Then, we want to find all the elements in the set

𝒫sol:={(k,i)𝒫all|wk,iwth}.\mathcal{P}_{\rm sol}:=\{(k,i)\in\mathcal{P}_{\rm all}\ |\ w_{k,i}\geq w_{\rm th}\}. (3)
Example 1 (PWM score calculation).

An example of calculating scores for segments using PWM is shown below. The following PWM of length 8 represents the binding site motif for a transcription factor:

A 1.31-1.31 0.62-0.62 1.31-1.31 +0.63+0.63 1.31-1.31 1.31-1.31 1.31-1.31 +0.48+0.48
C 0.83-0.83 0.83-0.83 +1.12+1.12 0.83-0.83 +1.12+1.12 +1.12+1.12 +1.37+1.37 0.83-0.83
G 0.83-0.83 +1.25+1.25 +0.27+0.27 +0.27+0.27 0.83-0.83 +0.27+0.27 0.83-0.83 +0.56+0.56
T +0.89+0.89 1.31-1.31 1.31-1.31 1.31-1.31 0.21-0.21 1.31-1.31 1.31-1.31 1.31-1.31
(1) (2) (3) (4) (5) (6) (7) (8)

Given this PWM MM, the score WMW_{M} for the DNA sequence (segment) “TACATGCA” is calculated as follows:

WM(𝚃𝙰𝙲𝙰𝚃𝙶𝙲𝙰)\displaystyle W_{M}(\mathtt{TACATGCA})
=+0.89𝚃0.62𝙰+1.12𝙲+0.63𝙰0.21𝚃+0.27𝙶+1.37𝙲+0.48𝙰\displaystyle=\ \overbrace{+0.89}^{\mathtt{T}}\overbrace{-0.62}^{\mathtt{A}}\overbrace{+1.12}^{\mathtt{C}}\overbrace{+0.63}^{\mathtt{A}}\overbrace{-0.21}^{\mathtt{T}}\overbrace{+0.27}^{\mathtt{G}}\overbrace{+1.37}^{\mathtt{C}}\overbrace{+0.48}^{\mathtt{A}}
= 3.93\displaystyle=\ 3.93

This PWM matching is then applied to a long genome DNA sequence of million bases such that every segment ii in the DNA sequence is assigned a score WM(uiui+m1)W_{M}(u_{i}\ldots u_{i+m-1}) and we search 𝒫sol\mathcal{P}_{\rm sol}, segments with scores higher than the threshold wthw_{\rm th}.

In the Problem 3, we consider the match with the multiple PWMs simultaneously (K2K\geq 2), although Example 1 is a single-PWM case (K=1K=1). In general, the DNA sequence of a genome contains hundreds of sequence motifs and the annotation for a genome sequence must be completed by finding all sequence motifs using multiple PWMs simultaneously. As we will see later, we can achieve the quantum speedup with respect to the number KK of mulitple PWMs; that is, our quantum algorithm finds all the matches between the sequence SS and the multiple PWMs {Mk}k=0K1\{M_{k}\}_{k=0}^{K-1} faster than iterating individually searching for the matches between SS and each PWM. One might concern that, although it is assumed that all the KK PWSs have same length, this is not always the case. This point is easily settled as follows. Denoting the lengths of M0,,MK1M_{0},...,M_{K-1} by m0,,mK1m_{0},...,m_{K-1}, respectively, we set m:=max{m0,,mK1}m:=\max\{m_{0},...,m_{K-1}\} and, for each k[K]0k\in[K]_{0}, replace MkM_{k} with the m×|𝒜|m\times|\mathcal{A}| matrix whose first to mkm_{k}th rows are those in the original MkM_{k} and (mk+1)(m_{k}+1)th to mmth rows are filled with 0. Note that the score of each segment in SS does not change under this modification222Note that, for k[K]0k\in[K]_{0} such that mk<mm_{k}<m, this modification makes the last mmkm-m_{k} segments with length mkm_{k} out of the scope of the matching, although they should be considered. We calculate the scores for such segments and check whether they exceed the threshold, separately from our algorithm. We can reasonably assume that these additional calculations and checks take the negligible time, as far as the number of these exceptional segments, mmkm-m_{k}, is much smaller than that of all the segment, nmk+1n-m_{k}+1..

The typical orders of magnitudes of the parameters in PWM matching are as follows. The sequence length nn can be of order 10810^{8} (resp. 10610710^{6}-10^{7}) and the number of PWMs KK may be of order 10210^{2} (resp. 10410^{4}) for DNA (resp. protein) [11]. The sequence motif length mm is typically about ten or several tens [11], but motifs with lengths greater than 10210^{2} are sometimes considered for protein [25].

Hereafter, we assume that entries of PWMs are bounded:

0Mk(i,a)10\leq M_{k}(i,a)\leq 1 (4)

for every k[K]0k\in[K]_{0}, i[m]0i\in[m]_{0} and a𝒜a\in\mathcal{A}. This is just for the later convenience in using QMCI for score calculation. Although this condition is not satisfied in general cases including Example 1, we can meet it by rescaling. That is, we redefine MkM^{\prime}_{k} as MkM_{k}, where

Mk(i,a):=Mk(i,a)MminMmaxMminM_{k}^{\prime}(i,a):=\frac{M_{k}(i,a)-M_{\rm min}}{M_{\rm max}-M_{\rm min}} (5)

with

Mmax\displaystyle M_{\rm max} :=\displaystyle:= max(k,i,a)[K]0×[m]0×𝒜Mk(i,a),\displaystyle\max_{(k,i,a)\in[K]_{0}\times[m]_{0}\times\mathcal{A}}M_{k}(i,a),
Mmin\displaystyle M_{\rm min} :=\displaystyle:= min(k,i,a)[K]0×[m]0×𝒜Mk(i,a).\displaystyle\min_{(k,i,a)\in[K]_{0}\times[m]_{0}\times\mathcal{A}}M_{k}(i,a). (6)

It is easy to see that after this redefinition Eq. (4) holds. We also need to replace the threshold wthw_{\rm th} with

wth:=wthmMminMmaxMmin.w^{\prime}_{\rm th}:=\frac{w_{\rm th}-mM_{\rm min}}{M_{\rm max}-M_{\rm min}}. (7)

Note that the set (3) is invariant under the above rescaling.

II.3 Quantum algorithms

Here, we briefly explain the building-block quantum algorithms for our PWM matching algorithm.

II.3.1 Arithmetic on a quantum computer

Before introducing the quantum algorithms, let us summarize the setup for quantum computation and the elementary operations we use in this paper.

We consider computation on the system consisting of the multiple quantum registers. We treat real numbers in fixed-point binary representation and, for each xx\in\mathbb{R}, we denote by |x\ket{x} the computational basis state on a quantum register where the bit string on the register corresponds to the binary representation of xx. We assume that each register has a sufficient number of qubits and thus the error from finite-precision representation is negligible.

We use the oracles for elementary arithmetic such as the adder Oadd|x|y=|x|x+yO_{\rm add}\ket{x}\ket{y}=\ket{x}\ket{x+y}, the subtracter Osub|x|y=|xy|yO_{\rm sub}\ket{x}\ket{y}=\ket{x-y}\ket{y} and the multiplier Omul|x|y=|x|xyO_{\rm mul}\ket{x}\ket{y}=\ket{x}\ket{xy}, where x,yx,y\in\mathbb{R}. Many proposals on implementations of such oracles have been made so far: see [26] and the references therein.

Besides, we also assume the availability of the following oracles. The oracle O=O_{=} checks whether two numbers are equal or not: for any x,yx,y\in\mathbb{R}, O=|x|y|0=|x|y(𝟙x=y|0+𝟙xy|1)O_{=}\ket{x}\ket{y}\ket{0}=\ket{x}\ket{y}\left(\mathbbm{1}_{x=y}\ket{0}+\mathbbm{1}_{x\neq y}\ket{1}\right). Also, the comparator OcompO_{\rm comp} acts as Ocomp|x|y|0=|x|y(𝟙xy|1+𝟙x<y|0)O_{\rm comp}\ket{x}\ket{y}\ket{0}=\ket{x}\ket{y}\left(\mathbbm{1}_{x\geq y}\ket{1}+\mathbbm{1}_{x<y}\ket{0}\right) for any x,yx,y\in\mathbb{R}. These oracles can be implemented via subtraction. To check x=yx=y or not, we may calculate xyx-y and see whether it is 0 or not. Therefore, we can implement O=O_{=} by using a subtracter, followed by a multiple controlled-NOT (CNOT) gate activated if and only if all the bits of xyx-y are 0, and at last a NOT gate. Moreover, we can implement OcompO_{\rm comp} by combining a substacter with a CNOT gate activated if and only if the most significant bit of xyx-y is 0; this is because, if we adopt 2’s complement method to represent negative numbers, the most significant bit represents the sign of a number [27].

In addition, for any N2N\in\mathbb{N}_{\rm 2}, we assume the availability of the oracle ONEqPrO^{\rm EqPr}_{N} that generates the equiprobable superposition of |0,|1,,|N1\ket{0},\ket{1},...,\ket{N-1}: ONEqPr|0=1Ni=0N1|iO^{\rm EqPr}_{N}\ket{0}=\frac{1}{\sqrt{N}}\sum_{i=0}^{N-1}\ket{i}. If N=2nN=2^{n} with some nNn\in N, we can implement this oracle just by operating a Hadamard gate on each qubit of the nn-qubit register. If not, letting nn be log2N\lceil\log_{2}N\rceil, we can implement ONEqPrO^{\rm EqPr}_{N} by the method in [28] to generate a state in which a given probability density function p(x)p(x) is amplitude-encoded, with p(x)p(x) defined on [0,1][0,1] as p(x)=𝟙xN/2np(x)=\mathbbm{1}_{x\leq N/2^{n}}.

Lastly, we use the oracle ONmedO^{\rm med}_{N} that outputs the median med(x1,,xN){\rm med}(x_{1},...,x_{N}) of any NN real numbers x1,,xNx_{1},...,x_{N}, that is, ONmed|x1|xN|0=|x1|xN|med(x1,,xN)O^{\rm med}_{N}\ket{x_{1}}\cdots\ket{x_{N}}\ket{0}=\ket{x_{1}}\cdots\ket{x_{N}}\ket{{\rm med}(x_{1},...,x_{N})}. The implementations of this oracle have been discussed in [23].

Hereafter, we collectively call the above oracles the arithmetic oracles.

II.3.2 Quantum amplitude amplification

The first building block for our PWM matching algorithm is QAA [20]. Given the oracle to generate the superposition state |Φ\ket{\Phi}, QAA amplifies the amplitude of the “marked state” in |Φ\ket{\Phi} so that we can obtain it quadratically faster than naively iterating the process of generating |Φ\ket{\Phi} and measurement on it. Here, we give the following theorem, which was presented in [23] as a slight modification of the original one in [20].

Theorem 1 (Theorem 2 in [23], originally Theorem 3 in [20]).

Suppose that we are given an access to an oracle VV that acts on the system RR consisting of a quantum register R1R_{1} and a qubit R2R_{2}, as

V|0|0¯=a|ϕ1|1¯+1a|ϕ0|0¯=:|Φ,V\ket{0}\ket{\bar{0}}=\sqrt{a}\ket{\phi_{1}}\ket{\bar{1}}+\sqrt{1-a}\ket{\phi_{0}}\ket{\bar{0}}=:\ket{\Phi}, (8)

where |ϕ1\ket{\phi_{1}} and |ϕ0\ket{\phi_{0}} are some quantum states on R1R_{1} and a[0,1)a\in[0,1). Then, for any γ,δ(0,1)\gamma,\delta\in(0,1), there exists a quantum algorithm QAA(V,γ,δ)\textup{{QAA}}(V,\gamma,\delta) that behaves as follows:

  • The output of the algorithm is either of

    1. (A)

      the message “success” and the quantum state |ϕ1\ket{\phi_{1}}

    2. (B)

      the message “failure”

  • If aγa\geq\gamma, the algorithm outputs (A) with probability at least 1δ1-\delta, making O(logδ1a)O\left(\frac{\log\delta^{-1}}{\sqrt{a}}\right) queries to VV.

  • If 0<a<γ0<a<\gamma, the algorithm outputs either (A) or (B), making O(logδ1γ)O\left(\frac{\log\delta^{-1}}{\sqrt{\gamma}}\right) queries to VV.

  • If a=0a=0, the algorithm certainly outputs (B), making O(logδ1γ)O\left(\frac{\log\delta^{-1}}{\sqrt{\gamma}}\right) queries to VV333Although Theorem 2 in [23] does not mention the case that a=0a=0, the statement on this case is proved in Appendix A.

For the detailed procedure of QAA(V,γ,δ)\textup{{QAA}}(V,\gamma,\delta) and the proof of Theorem 1, see [23] and the original paper [20].

II.3.3 Quantum Monte Carlo integration

QAA leads to quantum amplitude estimation (QAE) algorithm [20], which estimates the amplitude of the marked state; QAE is further extended to the quantum algorithm for estimating the expectation of a random variable [22], which we call QMCI in this paper. This is the second building block. There are various versions of QMCI for different situations. For the PWM matching problem, we can use the following one, which assumes that the variable is bounded.

Theorem 2 (Theorem 2.3 in [22], modified).

Let NN\in\mathbb{N} and 𝒳\mathcal{X} be a set of NN real numbers X0,,XN1X_{0},...,X_{N-1}, each of which satisfies 0Xi10\leq X_{i}\leq 1. Suppose that we are given an oracle OXO_{X} that acts as

O𝒳|i|0=|i|Xi,O_{\mathcal{X}}\ket{i}\ket{0}=\ket{i}\ket{X_{i}}, (9)

for any i[N]0i\in[N]_{0}. Then, for any ϵ\epsilon and δ\delta in (0,1)(0,1), there is an oracle O𝒳,ϵ,δmeanO_{\mathcal{X},\epsilon,\delta}^{\rm mean} such that

O𝒳,ϵ,δmean|0=y𝒴αy|y,O_{\mathcal{X},\epsilon,\delta}^{\rm mean}\ket{0}=\sum_{y\in\mathcal{Y}}\alpha_{y}\ket{y}, (10)

where some ancillary qubits are undisplayed. Here, 𝒴\mathcal{Y} is a finite set of real numbers that includes a subset 𝒴~\tilde{\mathcal{Y}} consisting of ϵ\epsilon-approximations of the mean of X0,,XN1X_{0},...,X_{N-1},

μ:=1Ni=0N1Xi,\mu:=\frac{1}{N}\sum_{i=0}^{N-1}X_{i}, (11)

and {αy}y𝒴\{\alpha_{y}\}_{y\in\mathcal{Y}} are complex numbers satisfying

y~𝒴~|αy~|21δ.\sum_{\tilde{y}\in\tilde{\mathcal{Y}}}|\alpha_{\tilde{y}}|^{2}\geq 1-\delta. (12)

In O𝒳,ϵ,δmeanO_{\mathcal{X},\epsilon,\delta}^{\rm mean},

O(1ϵlog(1δ))O\left(\frac{1}{\epsilon}\log\left(\frac{1}{\delta}\right)\right) (13)

queries to O𝒳O_{\mathcal{X}} are made.

The proof is presented in Appendix B, where the detailed way to construct the oracle O𝒳,ϵ,δmeanO_{\mathcal{X},\epsilon,\delta}^{\rm mean} is also shown. Note that this theorem is slightly modified from the original one, Theorem 2.3 in [22], in some points. First, our Theorem 2 is on the algorithm to calculate the average μ\mu of a sequence and the sum, which is instantly obtained by multiplying the sequence size NN to the average. On the other hand, Theorem 2.3 in [22] presents the algorithm to calculate the expectation of a random variable, and so calculation of the average and the sum is a special case. This is sufficient for us, since we use QMCI to calculate the score of a segment in the sequence SS, which is in fact the sum of the scores of the entries in the segment. Second, although the algorithm in Theorem 2.3 in [22] outputs the approximation of μ\mu, the above Theorem 2 mentions only generating the state in Eq. (10). Although we obtain the approximation of μ\mu by measuring the state, we do not do so in our PWM matching algorithm. This is because our algorithm uses QMCI as a subroutine to calculate the score of each segment in the sequence SS in the high-score segment search by QAA. This modification is similar to that in [23], which also presents QMCI with no measurement. However, QMCI in this paper is different from that in [23] too, since the former assumes that each XiX_{i} is bounded but the latter assumes that the upper bound on the variance of the sequence is given.

III Quantum algorithm for PWM matching

We now present the quantum algorithm for PWM matching. The basic strategy is as follows: we calculate {wk,i}k,i\{w_{k,i}\}_{k,i} for all the pairs (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all} parallelly in a quantum superposition and find the pairs with high scores by QAA. We present the two versions of the quantum algorithm, the naive iteration method and the QMCI-based method, whose difference is how to calculate wk,iw_{k,i}.

III.1 Assumption on the quantum accesses to the sequence and the PWMs

Before presenting the quantum algorithm, we need to make some assumptions on the available oracles. First, for score calculation on a quantum computer, we need to load the entries in the sequence SS and the PWMs {Mk}k\{M_{k}\}_{k} onto quantum registers. This is formally stated as follows.

Assumption 1.

We have accesses to the following oracles:

  • OseqO_{\rm seq}: for any i[n]0i\in[n]_{0},

    Oseq|i|0=|i|si.O_{\rm seq}\ket{i}\ket{0}=\ket{i}\ket{s_{i}}. (14)
  • OPWMO_{\rm PWM}: for any (k,i,a)[K]0×[m]0×𝒜(k,i,a)\in[K]_{0}\times[m]_{0}\times\mathcal{A},

    OPWM|k|i|a|0=|k|i|a|Mk(i,a).O_{\rm PWM}\ket{k}\ket{i}\ket{a}\ket{0}=\ket{k}\ket{i}\ket{a}\ket{M_{k}(i,a)}. (15)

Here and hereafter, |a\ket{a} with a𝒜a\in\mathcal{A} is regarded as the computational basis state corresponding to the integer that labels aa. We can implement these oracles if QRAM [24] is available, but non-negligible preprocessing cost is needed. We will discuss this point in Sec. IV.1.

Besides, we assume that we can use the oracle that determines whether a given index pair (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all} is in a given subset 𝒫𝒫all\mathcal{P}\subset\mathcal{P}_{\rm all} or not.

Assumption 2.

For any subset 𝒫𝒫all\mathcal{P}\subset\mathcal{P}_{\rm all}, we have an access to the oracle O𝒫O_{\mathcal{P}} that acts as

O𝒫|k|i|0=|k|i(𝟙(k,i)𝒫|0+𝟙(k,i)𝒫|1)O_{\mathcal{P}}\ket{k}\ket{i}\ket{0}=\ket{k}\ket{i}\left(\mathbbm{1}_{(k,i)\in\mathcal{P}}\ket{0}+\mathbbm{1}_{(k,i)\notin\mathcal{P}}\ket{1}\right) (16)

for any (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all}.

We can also implement this oracle using QRAM as discussed in Sec. IV.1.

Since OseqO_{\rm seq}, OPWMO_{\rm PWM} and O𝒫O_{\mathcal{P}} are supposed to be realized by QRAM, we hereafter consider the number of queries to them as a metric of the complexity of our algorithm.

III.2 Algorithm \@slowromancapi@: the naive iteration method

We then explain the first algorithm, the naive iteration method.

We can calculate wk,iw_{k,i} by naively iterating the queries to OseqO_{\rm seq} and OPWMO_{\rm PWM} and additions as Procedure 1.

Input: k[K]0,i[m]0k\in[K]_{0},i\in[m]_{0}
1 \orighyper@refstepcounterprocedures
2Prepare the quantum registers R1,,R7R_{1},...,R_{7} and initialize R1R_{1}, R2R_{2}, R4R_{4} and the others to |k\ket{k}, |i\ket{i}, |i\ket{i} and |0\ket{0}, respectively.\orighyper@refstepcounterprocedures
3 \orighyper@refstepcounterprocedures
4for j=0,,m1j=0,...,m-1 do\orighyper@refstepcounterprocedures
5       \orighyper@refstepcounterprocedures
6      Set R5R_{5} to |sl\ket{s_{l}} by OseqO_{\rm seq} with the value on R4R_{4} being ll.\orighyper@refstepcounterprocedures
7       \orighyper@refstepcounterprocedures
8      Set R6R_{6} to |Mk(l,a)\ket{M_{k}(l,a)} by OPWMO_{\rm PWM} with the values on R1R_{1}, R3R_{3} and R5R_{5} being kk, ll and aa, respectively.\orighyper@refstepcounterprocedures
9       \orighyper@refstepcounterprocedures
10      Using OaddO_{\rm add}, add the value on R6R_{6} to the value on R7R_{7}.\orighyper@refstepcounterprocedures
11       \orighyper@refstepcounterprocedures
12      if j<m1j<m-1 then\orighyper@refstepcounterprocedures
13             \orighyper@refstepcounterprocedures
14            Reset R5R_{5} and R6R_{6} to |0\ket{0} using the inverses of OseqO_{\rm seq} and OPWMO_{\rm PWM}, respectively.\orighyper@refstepcounterprocedures
15             \orighyper@refstepcounterprocedures
16            Increment the values on R3R_{3} and R4R_{4} by 1, using OaddO_{\rm add}.\orighyper@refstepcounterprocedures
17             \orighyper@refstepcounterprocedures
18      \orighyper@refstepcounterprocedures
19\orighyper@refstepcounterprocedures
Procedure 1 calculate wk,iw_{k,i} by naive iteration

In this procedure, the quantum state is transformed as follows:

|k|i|0|i|0|0|0\displaystyle\ket{k}\ket{i}\ket{0}\ket{i}\ket{0}\ket{0}\ket{0} (17)
3\displaystyle\xrightarrow{3} |k|i|0|i|si|0|0\displaystyle\ket{k}\ket{i}\ket{0}\ket{i}\ket{s_{i}}\ket{0}\ket{0}
4\displaystyle\xrightarrow{4} |k|i|0|i|si|Mk(0,si)|0\displaystyle\ket{k}\ket{i}\ket{0}\ket{i}\ket{s_{i}}\ket{M_{k}(0,s_{i})}\ket{0}
5\displaystyle\xrightarrow{5} |k|i|0|i|si|Mk(0,si)|Mk(0,si)\displaystyle\ket{k}\ket{i}\ket{0}\ket{i}\ket{s_{i}}\ket{M_{k}(0,s_{i})}\ket{M_{k}(0,s_{i})}
7\displaystyle\xrightarrow{7} |k|i|0|i|0|0|Mk(0,si)\displaystyle\ket{k}\ket{i}\ket{0}\ket{i}\ket{0}\ket{0}\ket{M_{k}(0,s_{i})}
8\displaystyle\xrightarrow{8} |k|i|1|i+1|0|0|Mk(0,si)\displaystyle\ket{k}\ket{i}\ket{1}\ket{i+1}\ket{0}\ket{0}\ket{M_{k}(0,s_{i})}
3\displaystyle\xrightarrow{3} |k|i|1|i+1|si+1|0|Mk(0,si)\displaystyle\ket{k}\ket{i}\ket{1}\ket{i+1}\ket{s_{i+1}}\ket{0}\ket{M_{k}(0,s_{i})}
4\displaystyle\xrightarrow{4} |k|i|1|i+1|si+1|Mk(1,si+1)|Mk(0,si)\displaystyle\ket{k}\ket{i}\ket{1}\ket{i+1}\ket{s_{i+1}}\ket{M_{k}(1,s_{i+1})}\ket{M_{k}(0,s_{i})}
5\displaystyle\xrightarrow{5} |k|i|1|i+1|si+1|Mk(1,si+1)|j=01Mk(j,si+j)\displaystyle\ket{k}\ket{i}\ket{1}\ket{i+1}\ket{s_{i+1}}\ket{M_{k}(1,s_{i+1})}\Ket{\sum_{j=0}^{1}M_{k}(j,s_{i+j})}
7\displaystyle\xrightarrow{7} |k|i|1|i+1|0|0|j=01Mk(j,si+j)\displaystyle\ket{k}\ket{i}\ket{1}\ket{i+1}\ket{0}\ket{0}\Ket{\sum_{j=0}^{1}M_{k}(j,s_{i+j})}
\displaystyle\rightarrow \displaystyle...
5\displaystyle\xrightarrow{5} |k|i|m1|i+m1|si+m1|Mk(m1,si+m1)\displaystyle\ket{k}\ket{i}\ket{m-1}\ket{i+m-1}\ket{s_{i+m-1}}\ket{M_{k}(m-1,s_{i+m-1})}
|j=0m1Mk(j,si+j)=wk,i.\displaystyle\quad\otimes\ \Bigg{|}\underbrace{\sum_{j=0}^{m-1}M_{k}(j,s_{i+j})}_{=w_{k,i}}\Bigg{\rangle}.

Here, the numbers on the arrows correspond to the steps in Procedure 1. We denote by Osc,itO_{\rm sc,it} the quantum circuit for the above operation.

Then, using Osc,itO_{\rm sc,it}, we can construct the quantum algorithm to find the high-score segments.

Theorem 3.

Consider Problem 3 under Assumptions 1 and 2. Suppose that we are given δ(0,1)\delta\in(0,1). Then, there exists a quantum algorithm that behaves as follows.

  1. (i)

    If nsol:=|𝒫sol|>0n_{\rm sol}:=|\mathcal{P}_{\rm sol}|>0, the algorithm outputs all the elements in 𝒫sol\mathcal{P}_{\rm sol} with probability at least 1δ1-\delta, making

    O(mKnnsollog(Knδ))O\left(m\sqrt{Knn_{\rm sol}}\log\left(\frac{Kn}{\delta}\right)\right) (18)

    queries to OseqO_{\rm seq} and OPWMO_{\rm PWM}, and

    O(Knnsollog(Knδ))O\left(\sqrt{Knn_{\rm sol}}\log\left(\frac{Kn}{\delta}\right)\right) (19)

    queries to O𝒫O_{\mathcal{P}} with 𝒫\mathcal{P} being some subsets in 𝒫all\mathcal{P}_{\rm all}.

  2. (ii)

    If 𝒫sol\mathcal{P}_{\rm sol} is empty, the algorithm certainly outputs the message “no match”, making

    O(mKnlog(Knδ))O\left(m\sqrt{Kn}\log\left(\frac{Kn}{\delta}\right)\right) (20)

    queries to OseqO_{\rm seq} and OPWMO_{\rm PWM}, and

    O(Knlog(Knδ))O\left(\sqrt{Kn}\log\left(\frac{Kn}{\delta}\right)\right) (21)

    queries to O𝒫O_{\mathcal{P}} with 𝒫\mathcal{P} being some subsets in 𝒫all\mathcal{P}_{\rm all}.

Proof.

First, note that we can perform the following operation for any subset 𝒫𝒫all\mathcal{P}\subset\mathcal{P}_{\rm all}:

|0|0|0|0|0|0|0\displaystyle\ket{0}\ket{0}\ket{0}\ket{0}\ket{0}\ket{0}\ket{0} (22)
\displaystyle\rightarrow 1Knk=0K1i=0n1|k|i|0|0|0|0|0,\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\ket{k}\ket{i}\ket{0}\ket{0}\ket{0}\ket{0}\ket{0},
\displaystyle\rightarrow 1Knk=0K1i=0n1|k|i|wk,i|0|0|0|0\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\ket{k}\ket{i}\ket{w_{k,i}}\ket{0}\ket{0}\ket{0}\ket{0}
\displaystyle\rightarrow 1Knk=0K1i=0n1|k|i|wk,i|wth|0|0|0\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\ket{k}\ket{i}\ket{w_{k,i}}\ket{w_{\rm th}}\ket{0}\ket{0}\ket{0}
\displaystyle\rightarrow 1Knk=0K1i=0n1|k|i|wk,i|wth(𝟙wk,iwth|1+𝟙wk,i<wth|0)|0|0\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\ket{k}\ket{i}\ket{w_{k,i}}\ket{w_{\rm th}}\left(\mathbbm{1}_{w_{k,i}\geq w_{\rm th}}\ket{1}+\mathbbm{1}_{w_{k,i}<w_{\rm th}}\ket{0}\right)\ket{0}\ket{0}
\displaystyle\rightarrow 1Knk=0K1i=0n1|k|i|wk,i|wth(𝟙wk,iwth|1+𝟙wk,i<wth|0)(𝟙(k,i)𝒫|0+𝟙(k,i)𝒫|1)|0\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\ket{k}\ket{i}\ket{w_{k,i}}\ket{w_{\rm th}}\left(\mathbbm{1}_{w_{k,i}\geq w_{\rm th}}\ket{1}+\mathbbm{1}_{w_{k,i}<w_{\rm th}}\ket{0}\right)\left(\mathbbm{1}_{(k,i)\in\mathcal{P}}\ket{0}+\mathbbm{1}_{(k,i)\notin\mathcal{P}}\ket{1}\right)\ket{0}
\displaystyle\rightarrow 1Knk=0K1i=0n1|k|i|wk,i|wth\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\ket{k}\ket{i}\ket{w_{k,i}}\ket{w_{\rm th}}\otimes
(𝟙wk,iwth(k,i)𝒫|1|1|1+𝟙wk,iwth(k,i)𝒫|1|0|0+𝟙wk,i<wth(k,i)𝒫|0|1|0+𝟙wk,i<wth(k,i)𝒫|0|0|0)\displaystyle\qquad\qquad\qquad\left(\mathbbm{1}_{w_{k,i}\geq w_{\rm th}\ \wedge\ (k,i)\notin\mathcal{P}}\ket{1}\ket{1}\ket{1}+\mathbbm{1}_{w_{k,i}\geq w_{\rm th}\ \wedge\ (k,i)\in\mathcal{P}}\ket{1}\ket{0}\ket{0}+\mathbbm{1}_{w_{k,i}<w_{\rm th}\ \wedge\ (k,i)\notin\mathcal{P}}\ket{0}\ket{1}\ket{0}+\mathbbm{1}_{w_{k,i}<w_{\rm th}\ \wedge\ (k,i)\in\mathcal{P}}\ket{0}\ket{0}\ket{0}\right)
=:\displaystyle=: |𝒫sol𝒫¯|Kn|ψ𝒫,1|1+|𝒫sol¯𝒫|Kn|ψ𝒫,0|0=:|Ψ𝒫\displaystyle\sqrt{\frac{\left|\mathcal{P}_{\rm sol}\cap\overline{\mathcal{P}}\right|}{Kn}}\ket{\psi_{\mathcal{P},1}}\ket{1}+\sqrt{\frac{\left|\overline{\mathcal{P}_{\rm sol}}\cup\mathcal{P}\right|}{Kn}}\ket{\psi_{\mathcal{P},0}}\ket{0}=:\ket{\Psi_{\mathcal{P}}}

Here, the seven kets correspond to the seven quantum registers, among which the first four ones have a sufficient number of qubits and the last three ones are single-qubit. The complement of a set is determined with the universal set being 𝒫all\mathcal{P}_{\rm all}.

|ψ𝒫,1:=1|𝒫sol𝒫¯|(k,i)𝒫sol𝒫¯|k|i|wk,i|wth|1|1\ket{\psi_{\mathcal{P},1}}:=\frac{1}{\sqrt{\left|\mathcal{P}_{\rm sol}\cap\overline{\mathcal{P}}\right|}}\sum_{(k,i)\in\mathcal{P}_{\rm sol}\cap\overline{\mathcal{P}}}\ket{k}\ket{i}\ket{w_{k,i}}\ket{w_{\rm th}}\ket{1}\ket{1} (23)

is the quantum state on the system consisting of the first to sixth registers, and |ψ𝒫,0\ket{\psi_{\mathcal{P},0}} is another state on the same system. In Eq. (22), OKEqPrO^{\rm EqPr}_{K} and OnEqPrO^{\rm EqPr}_{n} are used at the first arrow. At the second arrow, Osc,itO_{\rm sc,it} is used with the register R3,,R6R_{3},...,R_{6} in Procedure 1 undisplayed. At the third arrow, we just set wthw_{\rm th} on the fourth register. The fourth and fifth transformations are done by OcompO_{\rm comp} and O𝒫O_{\mathcal{P}}, respectively. Then, the last transformation is done by a Toffoli gate on the last three qubits. We denote by O~sc,it𝒫\tilde{O}^{\mathcal{P}}_{\rm sc,it} the oracle for the operation in Eq. (22). Note that O~sc,it𝒫\tilde{O}^{\mathcal{P}}_{\rm sc,it} contains one call to O𝒫O_{\mathcal{P}} and mm calls to OseqO_{\rm seq} and OPWMO_{\rm PWM}, since Osc,itO_{\rm sc,it} makes O(m)O(m) calls to them.

Then, the naive iteration method is presented as Algorithm 2.

Input: δ(0,1)\delta\in(0,1)
1
2Set 𝒫temp=\mathcal{P}_{\rm temp}=\emptyset and StopFlg=0{\rm StopFlg}=0
3
4while StopFlg=0{\rm StopFlg}=0 do
5       Perform QAA(O~sc,it𝒫temp,1Kn,δKn)\textup{{QAA}}\left(\tilde{O}^{\mathcal{P}_{\rm temp}}_{\rm sc,it},\frac{1}{Kn},\frac{\delta}{Kn}\right).
6      
7      if The output message is “success” then
8             Measure the first and second registers in the output state |ψ𝒫temp,1\ket{\psi_{\mathcal{P}_{\rm temp},1}}.
9            
10            Add the measurement outcome (k,i)(k,i) to 𝒫temp\mathcal{P}_{\rm temp}.
11            
12       else
13             Set StopFlg=1{\rm StopFlg}=1
14      
15
16if 𝒫temp\mathcal{P}_{\rm temp}\neq\emptyset then
17       Output 𝒫temp\mathcal{P}_{\rm temp}.
18 else
19       Output ”no match”.
20
Algorithm 2 the naive iteration method

Let us consider the case that nsol1n_{\rm sol}\geq 1. In this case, we run QAA repeatedly, and, if each run finishes with the message “success”, we obtain an element in 𝒫sol𝒫temp\mathcal{P}_{\rm sol}\setminus\mathcal{P}_{\rm temp}, that is, an element in 𝒫sol\mathcal{P}_{\rm sol} that has not been obtained in the previous runs yet. Thus, if QAAs finish with “success” nsoln_{\rm sol} times in a row, we obtain all the nsoln_{\rm sol} elements in 𝒫sol\mathcal{P}_{\rm sol}. This happens with probability at least

(1δKn)nsol(1δKn)Kn1δ,\left(1-\frac{\delta}{Kn}\right)^{n_{\rm sol}}\geq\left(1-\frac{\delta}{Kn}\right)^{Kn}\geq 1-\delta, (24)

since each QAA finishes with the message “success” with probability at least 1δKn1-\frac{\delta}{Kn}, according to Theorem 1. In the llth QAA, at which the nsoll+1n_{\rm sol}-l+1 elements in 𝒫sol\mathcal{P}_{\rm sol} remain not to be obtained, the number of the queries to O~sc,it𝒫temp\tilde{O}^{\mathcal{P}_{\rm temp}}_{\rm sc,it} is

O(Knnsoll+1log(Knδ))O\left(\sqrt{\frac{Kn}{n_{\rm sol}-l+1}}\log\left(\frac{Kn}{\delta}\right)\right) (25)

according to Theorem 1, since the amplitude of |ψ𝒫temp,1\ket{\psi_{\mathcal{P}_{\rm temp},1}} in |Ψ𝒫temp\Ket{\Psi_{\mathcal{P}_{\rm temp}}} is

|𝒫sol𝒫temp¯|Kn=nsoll+1Kn.\sqrt{\frac{\left|\mathcal{P}_{\rm sol}\cap\overline{\mathcal{P}_{\rm temp}}\right|}{Kn}}=\sqrt{\frac{n_{\rm sol}-l+1}{Kn}}. (26)

Thus, in this step, the number of the queries to OseqO_{\rm seq} and OPWMO_{\rm PWM} is

O(mKnnsoll+1log(Knδ)),O\left(m\sqrt{\frac{Kn}{n_{\rm sol}-l+1}}\log\left(\frac{Kn}{\delta}\right)\right), (27)

since O~sc,it𝒫temp\tilde{O}^{\mathcal{P}_{\rm temp}}_{\rm sc,it} contains O(m)O(m) calls to them, and that of the queries to O𝒫O_{\mathcal{P}} is of order (25), since O~sc,it𝒫temp\tilde{O}^{\mathcal{P}_{\rm temp}}_{\rm sc,it} calls it once with 𝒫\mathcal{P} being 𝒫temp\mathcal{P}_{\rm temp}. Therefore, for O𝒫O_{\mathcal{P}}, the total query number in the series of QAAs is

O(l=1nsolKnnsoll+1log(Knδ)),O\left(\sum_{l=1}^{n_{\rm sol}}\sqrt{\frac{Kn}{n_{\rm sol}-l+1}}\log\left(\frac{Kn}{\delta}\right)\right), (28)

which turns into Eq. (19) by simple algebra, and that for OseqO_{\rm seq} and OPWMO_{\rm PWM} is this times mm, that is, Eq. (18). After nsoln_{\rm sol} QAAs with “success”, we run another QAA that outputs “failure” and end the algorithm, since now 𝒫sol𝒫temp¯\mathcal{P}_{\rm sol}\cap\overline{\mathcal{P}_{\rm temp}} is empty and the amplitude of |ψ𝒫temp,1\ket{\psi_{\mathcal{P}_{\rm temp},1}} in |Ψ𝒫temp\Ket{\Psi_{\mathcal{P}_{\rm temp}}} is 0. In this last QAA, the query number for OseqO_{\rm seq} and OPWMO_{\rm PWM} is of order (20) and that for O𝒫O_{\mathcal{P}} is of order (21), according to Theorem 1, but the total query number in the algorithm remains of order (18) and (19).

In the case that nsol=0n_{\rm sol}=0, the first QAA outputs “failure” and then the algorithm ends. The query number in this is of same order as that in the last QAA in the case that nsol>0n_{\rm sol}>0, that is, Eqs. (20) and (21).

Let us comment on the number of qubits used in the naive iteration method. As we see in Eqs. (17) and (22), this algorithm uses several quantum registers to represent the indexes for PWMs, positions in a sequence, and positions in a segment. The sufficient qubit numbers in these types of registers are O(logK)O(\log K), O(logn)O(\log n) and O(logm)O(\log m), respectively, and thus the total qubit number is O(logK+logn+logm)O(\log K+\log n+\log m), logarithmic on the parameters that characterize the problem. Besides, the algorithm uses a few registers to represent real numbers such as an entry Mk(j,s)M_{k}(j,s) of a PWM and a segment score wk,iw_{k,i}. If we take some typical setting for binary representation of real numbers (say, double precision with 64 bits) independently of the problem, a few registers for real numbers amount qubits of order 10210^{2}, which surpasses the qubits for the indexes for typical values of the parameters K,nK,n and mm mentioned in Sec. II.2. In summary, for a typical PWM matching problem, the qubit number used in the naive iteration method is of order 10210^{2}. This also applies to the QMCI-based method, which is explained in Sec. III.3.

III.3 Algorithm \@slowromancapii@: QMCI-based method

Next, we present the QMCI-based method. It is basically same as the naive iteration method, but it calculate the score of each segment by QMCI. Although in the naive iteration method we calculate the score of one segment, which is a sum of the mm position-wise scores, calling OseqO_{\rm seq} and OPWMO_{\rm PWM} O(m)O(m) times, this query number can be reduced by QMCI, whose complexity depends on the required accuracy, if we can set it sufficiently loose.

This method is inspired by the algorithm for GW matchied filtering presented in [23]. It is the two-fold problem of calculating the quantity called SNR, which is given as the sum of many terms, for each template waveform of the GW signal, and searching the high-SNR templates. In this regard, it has a same structure as PWM matching, which consists of calculating the scores of the segments and searching the high-score segments. Therefore, we naturally conceive the idea to apply the algorithm in [23], which is a combination of QMCI and QAA, to PWM matching.

As pointed out in [23], there is an issue in using QMCI. The result of QMCI inevitably contains the error, and it can cause the false match. That is, even if wk,iw_{k,i} for some (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all} is smaller than the threshold wthw_{\rm th}, the estimation of it by QMCI might exceed wthw_{\rm th} due to the error, and we might misjudge that the iith segment matches with MkM_{k}. To cope with this, we set the threshold in the similar way to [23]. That is, we set the two levels of the threshold, wsoftw_{\rm soft} and wsoftw_{\rm soft}, which have the following meanings:

  • We want to find (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all} such that wk,iwhardw_{k,i}\geq w_{\rm hard} with high probability.

  • We never want to falsely find (k,i)(k,i) such that wk,i<wsoftw_{k,i}<w_{\rm soft}.

  • If there are (k,i)(k,i) such that wsoftwk,i<whardw_{\rm soft}\leq w_{k,i}<w_{\rm hard}, it is not necessary but fine to find them.

Then, we set the accuracy of QMCI to whardwsoft2\frac{w_{\rm hard}-w_{\rm soft}}{2} and take the following policy: if the estimation of wk,iw_{k,i} by QMCI is larger than or equal to

wmid:=wsoft+whard2,w_{\rm mid}:=\frac{w_{\rm soft}+w_{\rm hard}}{2}, (29)

we judge (k,i)(k,i) as “matched”, and if not, we judge as “mismatched”. Under this policy, (k,i)(k,i) is judged as “matched” if wk,iwhardw_{k,i}\geq w_{\rm hard} and “mismatched” if wk,i<wsoftw_{k,i}<w_{\rm soft} with high probability. We will discuss the validity to assume that such two threshold levels are set in Sec. IV.2.

Now, let us present the theorem on the QMCI-based method.

Theorem 4.

Consider Problem 3 under Assumptions 1 and 2. Suppose that we are given δ(0,1)\delta\in(0,1) and wsoft,whardw_{\rm soft},w_{\rm hard}\in\mathbb{R} such that 0<wsoft<whard<m0<w_{\rm soft}<w_{\rm hard}<m. Define

𝒫hard:={(k,i)𝒫all|wk,iwhard}\mathcal{P}_{\rm hard}:=\{(k,i)\in\mathcal{P}_{\rm all}\ |\ w_{k,i}\geq w_{\rm hard}\} (30)

and

𝒫soft:={(k,i)𝒫all|wk,iwsoft}.\mathcal{P}_{\rm soft}:=\{(k,i)\in\mathcal{P}_{\rm all}\ |w_{k,i}\geq w_{\rm soft}\}. (31)

Then, there is a quantum algorithm that makes queries to OseqO_{\rm seq}, OPWMO_{\rm PWM} and O𝒫O_{\mathcal{P}} with 𝒫\mathcal{P} being some subsets in 𝒫all\mathcal{P}_{\rm all}, and behaves as follows:

  1. (i)

    If nhard:=|𝒫hard|>0n_{\rm hard}:=|\mathcal{P}_{\rm hard}|>0, the algorithm outputs all the elements in 𝒫hard\mathcal{P}_{\rm hard} and 0 or more elements in 𝒫soft𝒫hard\mathcal{P}_{\rm soft}\setminus\mathcal{P}_{\rm hard} with probability at least 1δ1-\delta. In the algorithm, OseqO_{\rm seq} and OPWMO_{\rm PWM} are called

    O(mnsoftKnwhardwsoftlog(K2n2δ)log(Knδ))O\left(\frac{mn_{\rm soft}\sqrt{Kn}}{w_{\rm hard}-w_{\rm soft}}\log\left(\frac{K^{2}n^{2}}{\delta}\right)\log\left(\frac{Kn}{\delta}\right)\right) (32)

    times, and O𝒫O_{\mathcal{P}} are called

    O(nsoftKnlog(K2n2δ)log(Knδ))O\left(n_{\rm soft}\sqrt{Kn}\log\left(\frac{K^{2}n^{2}}{\delta}\right)\log\left(\frac{Kn}{\delta}\right)\right) (33)

    times, where nsoft:=|𝒫soft|n_{\rm soft}:=|\mathcal{P}_{\rm soft}|.

  2. (ii)

    If nsoft=0n_{\rm soft}=0, the algorithm certainly outputs the message “no match”. In the algorithm, OseqO_{\rm seq} and OPWMO_{\rm PWM} are called

    O(mKnwhardwsoftlog(K2n2δ)log(Knδ))O\left(\frac{m\sqrt{Kn}}{w_{\rm hard}-w_{\rm soft}}\log\left(\frac{K^{2}n^{2}}{\delta}\right)\log\left(\frac{Kn}{\delta}\right)\right) (34)

    times, and O𝒫O_{\mathcal{P}} are called

    O(Knlog(K2n2δ)log(Knδ))O\left(\sqrt{Kn}\log\left(\frac{K^{2}n^{2}}{\delta}\right)\log\left(\frac{Kn}{\delta}\right)\right) (35)

    times.

  3. (iii)

    If nsoft>0n_{\rm soft}>0 and nhard=0n_{\rm hard}=0, the algorithm certainly outputs the message “no match” or 1 or more elements in 𝒫soft\mathcal{P}_{\rm soft}. In the algorithm, the number of queries to OseqO_{\rm seq} and OPWMO_{\rm PWM} is of order (32), and the number of queries to O𝒫O_{\mathcal{P}} is of order (33).

Proof.

First, note that, for any k[K]0k\in[K]_{0}, i[nm+1]0i\in[n-m+1]_{0} and j[m]0j\in[m]_{0}, we can perform the following operation:

|k|i|j|0|0|0\displaystyle\ket{k}\ket{i}\ket{j}\ket{0}\ket{0}\ket{0} (36)
\displaystyle\rightarrow |k|i|j|i+j|0|0\displaystyle\ket{k}\ket{i}\ket{j}\ket{i+j}\ket{0}\ket{0}
\displaystyle\rightarrow |k|i|j|i+j|si+j|0\displaystyle\ket{k}\ket{i}\ket{j}\ket{i+j}\ket{s_{i+j}}\ket{0}
\displaystyle\rightarrow |k|i|j|i+j|si+j|Mk(j,si+j),\displaystyle\ket{k}\ket{i}\ket{j}\ket{i+j}\ket{s_{i+j}}\ket{M_{k}(j,s_{i+j})},

where OaddO_{\rm add}, OseqO_{\rm seq} and OPWMO_{\rm PWM} are used at the first, second and third arrows, respectively. We denote by Osc,oneO_{\rm sc,one} the oracle for the above operation. According to Theorem 2, for any ϵ,δ(0,1)\epsilon,\delta\in(0,1), we use Osc,oneO_{\rm sc,one} O(ϵ1log(δ1))O(\epsilon^{-1}\log(\delta^{-1})) times to construct the oracle Oϵ,δsc,QMCIO^{\rm sc,QMCI}_{\epsilon,\delta} that acts as

Oϵ,δsc,QMCI|k|i|0=|k|iy𝒴k,iαyk,i|y.O^{\rm sc,QMCI}_{\epsilon,\delta}\ket{k}\ket{i}\ket{0}=\ket{k}\ket{i}\sum_{y\in\mathcal{Y}_{k,i}}\alpha^{k,i}_{y}\ket{y}. (37)

Here, 𝒴k,i\mathcal{Y}_{k,i} is a finite set of real numbers that includes a subset 𝒴~k,i\tilde{\mathcal{Y}}_{k,i} consisting of ϵ\epsilon-approximations of wk,im\frac{w_{k,i}}{m}, and {αyk,i}y𝒴k,i\{\alpha^{k,i}_{y}\}_{y\in\mathcal{Y}_{k,i}} are complex numbers satisfying

y~𝒴~k,i|αy~k,i|21δ.\sum_{\tilde{y}\in\tilde{\mathcal{Y}}_{k,i}}|\alpha^{k,i}_{\tilde{y}}|^{2}\geq 1-\delta. (38)

Furthermore, as Eq. (22), we can construct O~ϵ,δ,𝒫sc,QMCI\tilde{O}^{\rm sc,QMCI}_{\epsilon,\delta,\mathcal{P}} that acts on the seven-register system as

O~ϵ,δ,𝒫sc,QMCI|0|0|0|0|0|0|0\displaystyle\tilde{O}^{\rm sc,QMCI}_{\epsilon,\delta,\mathcal{P}}\ket{0}\ket{0}\ket{0}\ket{0}\ket{0}\ket{0}\ket{0} (39)
=\displaystyle= 1Knk=0K1i=0n1y𝒴k,iαyk,i|k|i|y|wmidm(𝟙ywmidm(k,i)𝒫|1|1|1+𝟙ywmidm(k,i)𝒫|1|0|0+\displaystyle\frac{1}{\sqrt{Kn}}\sum_{k=0}^{K-1}\sum_{i=0}^{n-1}\sum_{y\in\mathcal{Y}_{k,i}}\alpha^{k,i}_{y}\ket{k}\ket{i}\ket{y}\Ket{\frac{w_{\rm mid}}{m}}\left(\mathbbm{1}_{y\geq\frac{w_{\rm mid}}{m}\ \wedge\ (k,i)\notin\mathcal{P}}\ket{1}\ket{1}\ket{1}+\mathbbm{1}_{y\geq\frac{w_{\rm mid}}{m}\ \wedge\ (k,i)\in\mathcal{P}}\ket{1}\ket{0}\ket{0}+\right.
𝟙y<wmidm(k,i)𝒫|0|1|0+𝟙y<wmidm(k,i)𝒫|0|0|0)\displaystyle\ \quad\qquad\qquad\qquad\qquad\qquad\qquad\qquad\left.\mathbbm{1}_{y<\frac{w_{\rm mid}}{m}\ \wedge\ (k,i)\notin\mathcal{P}}\ket{0}\ket{1}\ket{0}+\mathbbm{1}_{y<\frac{w_{\rm mid}}{m}\ \wedge\ (k,i)\in\mathcal{P}}\ket{0}\ket{0}\ket{0}\right)
=:\displaystyle=: β𝒫,1|ξ𝒫,1|1+β𝒫,0|ξ𝒫,0|0=:|Ξ𝒫\displaystyle\beta_{\mathcal{P},1}\ket{\xi_{\mathcal{P},1}}\ket{1}+\beta_{\mathcal{P},0}\ket{\xi_{\mathcal{P},0}}\ket{0}=:\ket{\Xi_{\mathcal{P}}}

for any subset 𝒫𝒫all\mathcal{P}\subset\mathcal{P}_{\rm all}, using Oϵ,δsc,QMCIO^{\rm sc,QMCI}_{\epsilon,\delta}, O𝒫O_{\mathcal{P}} and some arithmetic oracles. Here,

|ξ𝒫,1:=1(k,i)𝒫¯y𝒴k,iywmid/m|αyk,i|2×\displaystyle\ket{\xi_{\mathcal{P},1}}:=\frac{1}{\sqrt{\sum_{(k,i)\in\overline{\mathcal{P}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}}\times
(k,i)𝒫¯y𝒴k,iywmid/mαyk,i|k|i|y|wmidm|1|1\displaystyle\qquad\qquad\sum_{(k,i)\in\overline{\mathcal{P}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}\alpha^{k,i}_{y}\ket{k}\ket{i}\ket{y}\Ket{\frac{w_{\rm mid}}{m}}\ket{1}\ket{1} (40)

is the quantum states on the first six register, and |ξ𝒫,0\ket{\xi_{\mathcal{P},0}} is another state on the same system.

β𝒫,1=(k,i)𝒫¯y𝒴k,iywmid/m|αyk,i|2Kn,\beta_{\mathcal{P},1}=\sqrt{\frac{\sum_{(k,i)\in\overline{\mathcal{P}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}{Kn}}, (41)

and β𝒫,0\beta_{\mathcal{P},0} is another complex number satisfying |β𝒫,0|2+|β𝒫,1|2=1|\beta_{\mathcal{P},0}|^{2}+|\beta_{\mathcal{P},1}|^{2}=1. Note that O~ϵ,δ,𝒫sc,QMCI\tilde{O}^{\rm sc,QMCI}_{\epsilon,\delta,\mathcal{P}} uses Oϵ,δsc,QMCIO^{\rm sc,QMCI}_{\epsilon,\delta} once, and thus Osc,oneO_{\rm sc,one} O(ϵ1log(δ1))O(\epsilon^{-1}\log(\delta^{-1})) times. Since Osc,oneO_{\rm sc,one} calls OseqO_{\rm seq} and OPWMO_{\rm PWM} once each, O~ϵ,δ,𝒫sc,QMCI\tilde{O}^{\rm sc,QMCI}_{\epsilon,\delta,\mathcal{P}} calls them O(ϵ1log(δ1))O(\epsilon^{-1}\log(\delta^{-1})) times, consequently. Also note that O~ϵ,δ,𝒫sc,QMCI\tilde{O}^{\rm sc,QMCI}_{\epsilon,\delta,\mathcal{P}} uses O𝒫O_{\mathcal{P}} once.

Then, we present the QMCI-based method as Algorithm 3.

Input:
  • δ(0,1)\delta\in(0,1)

  • wsoftw_{\rm soft} and whardw_{\rm hard} such that 0<wsoft<whard<m0<w_{\rm soft}<w_{\rm hard}<m

1
2Set 𝒫temp=\mathcal{P}_{\rm temp}=\emptyset, StopFlg=0{\rm StopFlg}=0, δ=δ4K2n2\delta^{\prime}=\frac{\delta}{4K^{2}n^{2}}, δ′′=δ2Kn\delta^{\prime\prime}=\frac{\delta}{2Kn} and
ϵ=whardwsoft2m.\epsilon^{\prime}=\frac{w_{\rm hard}-w_{\rm soft}}{2m}. (42)
3
4while StopFlg=0{\rm StopFlg}=0 do
5       Perform QAA(O~ϵ,δ,𝒫tempsc,QMCI,12Kn,δ′′)\textup{{QAA}}\left(\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}},\frac{1}{2Kn},\delta^{\prime\prime}\right).
6      
7      if The output message is “success” then
8             Measure the first and second registers in the output state |ξ𝒫temp,1\ket{\xi_{\mathcal{P}_{\rm temp},1}} and let the outcome be (k,i)(k,i).
9            
10            Calculate wk,iw_{k,i} classically and let the result be wk,iclw^{\rm cl}_{k,i}.
11            if wk,iclwsoftw^{\rm cl}_{k,i}\geq w_{\rm soft} then
12                   Add (k,i)(k,i) to 𝒫temp\mathcal{P}_{\rm temp}.
13            else
14                   Set StopFlg=1{\rm StopFlg}=1
15            
16       else
17             Set StopFlg=1{\rm StopFlg}=1
18      
19
20if 𝒫temp\mathcal{P}_{\rm temp}\neq\emptyset then
21       Output 𝒫temp\mathcal{P}_{\rm temp}.
22 else
23       Output ”no match”.
Algorithm 3 The QMCI-based method

Then, let us consider the behavior of this algorithm in the following cases.

(i) nhard>0n_{\rm hard}>0

For any (k,i)𝒫hard(k,i)\in\mathcal{P}_{\rm hard},

|ywk,im|ϵywmidm\left|y-\frac{w_{k,i}}{m}\right|\leq\epsilon^{\prime}\Rightarrow y\geq\frac{w_{\rm mid}}{m} (43)

holds for any yy\in\mathbb{R} under the definitions (29) and (42), and thus

y𝒴k,iywmid/m|αyk,i|2y𝒴k,i|ywk,i|ϵ|αyk,i|21δ12\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}\geq\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ |y-w_{k,i}|\leq\epsilon^{\prime}\end{subarray}}|\alpha^{k,i}_{y}|^{2}\geq 1-\delta^{\prime}\geq\frac{1}{2} (44)

holds. This means that, if 𝒫hard𝒫temp¯\mathcal{P}_{\rm hard}\cap\overline{\mathcal{P}_{\rm temp}}\neq\emptyset,

|β𝒫temp,1|2\displaystyle|\beta_{\mathcal{P}_{\rm temp},1}|^{2} =\displaystyle= (k,i)𝒫~¯tempy𝒴k,iywmid/m|αyk,i|2Kn\displaystyle\frac{\sum_{(k,i)\in\overline{\tilde{\mathcal{P}}}_{\rm temp}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}{Kn} (45)
\displaystyle\geq (k,i)𝒫hard𝒫temp¯y𝒴k,iywmid/m|αyk,i|2Kn\displaystyle\frac{\sum_{(k,i)\in\mathcal{P}_{\rm hard}\cap\overline{\mathcal{P}_{\rm temp}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}{Kn}
\displaystyle\geq (k,i)𝒫hard𝒫temp¯12Kn\displaystyle\sum_{(k,i)\in\mathcal{P}_{\rm hard}\cap\overline{\mathcal{P}_{\rm temp}}}\frac{1}{2Kn}
\displaystyle\geq 12Kn,\displaystyle\frac{1}{2Kn},

and thus QAA(O~ϵ,δ,𝒫tempsc,QMCI,12Kn,δ′′)\textup{{QAA}}\left(\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}},\frac{1}{2Kn},\delta^{\prime\prime}\right) outputs |ξ𝒫temp,1\ket{\xi_{\mathcal{P}_{\rm temp},1}} with probability at least 1δ′′=1δ2Kn1-\delta^{\prime\prime}=1-\frac{\delta}{2Kn}.

On the other hand, for (k,i)𝒫soft(k,i)\notin\mathcal{P}_{\rm soft},

ywmidm|ywk,im|>ϵy𝒴~k,iy\geq\frac{w_{\rm mid}}{m}\Rightarrow\left|y-\frac{w_{k,i}}{m}\right|>\epsilon^{\prime}\Rightarrow y\notin\tilde{\mathcal{Y}}_{k,i} (46)

holds for any yy\in\mathbb{R}, and thus

y𝒴k,iywmid/m|αyk,i|2=y𝒴k,i𝒴~k,iywmid/m|αyk,i|2y𝒴k,i𝒴~k,i|αyk,i|2<δ\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}=\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\setminus\tilde{\mathcal{Y}}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}\leq\sum_{y\in\mathcal{Y}_{k,i}\setminus\tilde{\mathcal{Y}}_{k,i}}|\alpha^{k,i}_{y}|^{2}<\delta^{\prime} (47)

holds. From this, the probability that we obtain (k,i)𝒫soft(k,i)\in\mathcal{P}_{\rm soft} in measuring the first two registers in |ξ𝒫temp,1\ket{\xi_{\mathcal{P}_{\rm temp},1}} is evaluated as

(k,i)𝒫soft𝒫temp¯y𝒴k,iywmid/m|αyk,i|2(k,i)𝒫temp¯y𝒴k,iywmid/m|αyk,i|2\displaystyle\frac{\sum_{(k,i)\in\mathcal{P}_{\rm soft}\cap\overline{\mathcal{P}_{\rm temp}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}{\sum_{(k,i)\in\overline{\mathcal{P}_{\rm temp}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}} (48)
=\displaystyle= 1(k,i)𝒫soft¯𝒫temp¯y𝒴k,iywmid/m|αyk,i|2(k,i)𝒫temp¯y𝒴k,iywmid/m|αyk,i|2\displaystyle 1-\frac{\sum_{(k,i)\in\overline{\mathcal{P}_{\rm soft}}\cap\overline{\mathcal{P}_{\rm temp}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}{\sum_{(k,i)\in\overline{\mathcal{P}_{\rm temp}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}}
\displaystyle\geq 12(k,i)𝒫soft¯𝒫temp¯δ\displaystyle 1-2\sum_{(k,i)\in\overline{\mathcal{P}_{\rm soft}}\cap\overline{\mathcal{P}_{\rm temp}}}\delta^{\prime}
\displaystyle\geq 12Knδ\displaystyle 1-2Kn\delta^{\prime}
=\displaystyle= 1δ2Kn.\displaystyle 1-\frac{\delta}{2Kn}.

At the first inequality, we used Eq. (47) and

(k,i)𝒫temp¯y𝒴k,iywmid/m|αyk,i|2=Kn|β𝒫temp,1|212,\sum_{(k,i)\in\overline{\mathcal{P}_{\rm temp}}}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}|\alpha^{k,i}_{y}|^{2}=Kn|\beta_{\mathcal{P}_{\rm temp},1}|^{2}\geq\frac{1}{2}, (49)

which follows from Eq. (45).

Combining the above discussions, we see that, if 𝒫hard𝒫temp¯\mathcal{P}_{\rm hard}\cap\overline{\mathcal{P}_{\rm temp}}\neq\emptyset, we obtain an element in 𝒫soft𝒫temp¯\mathcal{P}_{\rm soft}\cap\overline{\mathcal{P}_{\rm temp}} by QAA(O~ϵ,δ,𝒫tempsc,QMCI,12Kn,δ′′)\textup{{QAA}}\left(\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}},\frac{1}{2Kn},\delta^{\prime\prime}\right) and the subsequent measurement on |ξ𝒫temp,1\ket{\xi_{\mathcal{P}_{\rm temp},1}} with probability at least (1δ2Kn)21δKn\left(1-\frac{\delta}{2Kn}\right)^{2}\geq 1-\frac{\delta}{Kn}. Therefore, with some probability, the following happens: we successively obtain elements in 𝒫soft\mathcal{P}_{\rm soft} in loop 2-12 in Algorithm 3, until we get all the elements in 𝒫hard\mathcal{P}_{\rm hard}. Since the number of loops is at most nsoftn_{\rm soft}, the probability that this happens is at least

(1δKn)nsoft(1δKn)Kn1δ.\left(1-\frac{\delta}{Kn}\right)^{n_{\rm soft}}\geq\left(1-\frac{\delta}{Kn}\right)^{Kn}\geq 1-\delta. (50)

We can evaluate the query complexity in this loop as Eqs. (32) and (33) under the current setting of ϵ,δ\epsilon^{\prime},\delta^{\prime} and δ′′\delta^{\prime\prime}, recalling that QAA(O~ϵ,δ,𝒫tempsc,QMCI,12Kn,δ′′)\textup{{QAA}}\left(\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}},\frac{1}{2Kn},\delta^{\prime\prime}\right) calls O~ϵ,δ,𝒫tempsc,QMCI\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}} O(Knlog(1δ′′))O\left(\sqrt{Kn}\log\left(\frac{1}{\delta^{\prime\prime}}\right)\right) times and that O~ϵ,δ,𝒫tempsc,QMCI\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}} calls OseqO_{\rm seq} and OPWMO_{\rm PWM} O(1ϵlog(1δ))O\left(\frac{1}{\epsilon^{\prime}}\log\left(\frac{1}{\delta^{\prime}}\right)\right) times and O𝒫O_{\mathcal{P}} O(1)O(1) times.

(ii) nsoft>0n_{\rm soft}>0

With certainty, the first run of QAA outputs the message “failure” or, even if not, wk,iw_{k,i} classically calculated at step 6 in Algorithm 3 is smaller than wsoftw_{\rm soft}, since every (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all} yields wk,i<wsoftw_{k,i}<w_{\rm soft} in this case. Therefore, the algorithm certainly ends outputting “no match”, with QAA run only once. The query complexity of this is evaluated as Eqs. (34) and (35).

(iii) nhard=0n_{\rm hard}=0 and nsoft>0n_{\rm soft}>0

The algorithm ends with only one QAA that outputs “failure” or (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all} such that wk,i<wsoftw_{k,i}<w_{\rm soft}, or QAA runs some times outputting (k,i)𝒫soft(k,i)\in\mathcal{P}_{\rm soft}. The number of QAA loops is at most nsoftn_{\rm soft}, and thus the query complexity is evaluated as Eqs. (32) and (33) similarly to the case (i).

Let us make some comments. First, note that, although Eqs. (18) and (20) scales with nsoln_{\rm sol} as O(nsol)O(\sqrt{n_{\rm sol}}), Eqs. (32) and (33) scales linearly with nsoftn_{\rm soft}, which means that the QMCI-based method has the worse scaling with respect to the number of matches. This difference can be understood as follows. In the naive iteration method, among the computational basis states contained in the state |Ψ𝒫temp\ket{\Psi_{\mathcal{P}_{\rm temp}}}, those with the last qubit taking |1\ket{1} are |k|i|wk,i|wth|1|1|1\ket{k}\ket{i}\ket{w_{k,i}}\ket{w_{\rm th}}\ket{1}\ket{1}\ket{1} with (k,i)𝒫sol𝒫temp¯(k,i)\in\mathcal{P}_{\rm sol}\cap\overline{\mathcal{P}_{\rm temp}}, each of which having the amplitude 1Kn\sqrt{\frac{1}{Kn}}. They constitute |ψ𝒫temp,1|1\ket{\psi_{\mathcal{P}_{\rm temp},1}}\ket{1}, the target state of QAA, whose amplitude decreases as nsolKn,nsol1Kn,,1Kn\sqrt{\frac{n_{\rm sol}}{Kn}},\sqrt{\frac{n_{\rm sol}-1}{Kn}},...,\sqrt{\frac{1}{Kn}} in the QAA loop, and this leads to the evaluation of the total complexity in Eqs. (32) and (33). On the other hand, in the QMCI-based method, among the computational basis states contained in the state |Ξ𝒫temp\ket{\Xi_{\mathcal{P}_{\rm temp}}}, those with the last qubit taking |1\ket{1} are |k|i|y|wthm|1|1|1\ket{k}\ket{i}\ket{y}\Ket{\frac{w_{\rm th}}{m}}\ket{1}\ket{1}\ket{1} with (k,i)(k,i) being any elements in 𝒫temp¯\overline{\mathcal{P}_{\rm temp}}, although those for (k,i)𝒫soft(k,i)\in\mathcal{P}_{\rm soft} constitute the most part of |Ξ𝒫temp\ket{\Xi_{\mathcal{P}_{\rm temp}}} in terms of the squared amplitude. When we write |Ξ𝒫temp\ket{\Xi_{\mathcal{P}_{\rm temp}}} as

|Ξ𝒫temp=1Kn(k,i)𝒫temp¯γk,i|ξ~𝒫temp,1;k,i+β𝒫temp,0|ξ𝒫temp,0|0\ket{\Xi_{\mathcal{P}_{\rm temp}}}=\frac{1}{\sqrt{Kn}}\sum_{(k,i)\in\overline{\mathcal{P}_{\rm temp}}}\gamma_{k,i}\ket{\tilde{\xi}_{\mathcal{P}_{\rm temp},1;k,i}}+\beta_{\mathcal{P}_{\rm temp},0}\ket{\xi_{\mathcal{P}_{\rm temp},0}}\ket{0} (51)

with

|ξ~𝒫temp,1;k,i\displaystyle\ket{\tilde{\xi}_{\mathcal{P}_{\rm temp},1;k,i}} :=\displaystyle:= 1γk,i|k|iy𝒴k,iywmid/mαyk,i|y|wthm|1|1|1,\displaystyle\frac{1}{\gamma_{k,i}}\ket{k}\ket{i}\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}\alpha^{k,i}_{y}\ket{y}\Ket{\frac{w_{\rm th}}{m}}\ket{1}\ket{1}\ket{1},
γk,i\displaystyle\gamma_{k,i} :=\displaystyle:= y𝒴k,iywmid/m|αyk,i|2,\displaystyle\sqrt{\sum_{\begin{subarray}{c}y\in\mathcal{Y}_{k,i}\\ y\geq w_{\rm mid}/m\end{subarray}}\left|\alpha^{k,i}_{y}\right|^{2}}, (52)

the squared amplitude of |ξ~𝒫temp,1;k,i\ket{\tilde{\xi}_{\mathcal{P}_{\rm temp},1;k,i}}, |γk,i|2Kn\frac{|\gamma_{k,i}|^{2}}{Kn}, is at least 12Kn\frac{1}{2Kn} for (k,i)𝒫hard(k,i)\in\mathcal{P}_{\rm hard} as we see from Eq. (44), but that for (k,i)𝒫soft𝒫hard(k,i)\in\mathcal{P}_{\rm soft}\setminus\mathcal{P}_{\rm hard} can be much smaller than it. Nevertheless, the squared amplitudes of the states |ξ~𝒫temp,1;k,i\ket{\tilde{\xi}_{\mathcal{P}_{\rm temp},1;k,i}} for (k,i)𝒫soft𝒫hard(k,i)\in\mathcal{P}_{\rm soft}\setminus\mathcal{P}_{\rm hard} can pile up to the value comparable with 12Kn\frac{1}{2Kn}. In such a situation, it is possible that, in the QAA loop, QAA(O~ϵ,δ,𝒫tempsc,QMCI,12Kn,δ′′)\textup{{QAA}}\left(\tilde{O}^{\rm sc,QMCI}_{\epsilon^{\prime},\delta^{\prime},\mathcal{P}_{\rm temp}},\frac{1}{2Kn},\delta^{\prime\prime}\right) continues to output |ξ𝒫temp,1\ket{\xi_{\mathcal{P}_{\rm temp},1}} and we continue to get (k,i)𝒫soft(k,i)\in\mathcal{P}_{\rm soft}, until we get O(nsoft)O(n_{\rm soft}) elements in 𝒫soft\mathcal{P}_{\rm soft} and the squared amplitude |β𝒫temp,1|2|\beta_{\mathcal{P}_{\rm temp},1}|^{2} of |ξ𝒫temp,1|1\ket{\xi_{\mathcal{P}_{\rm temp},1}}\ket{1} in |Ξ𝒫temp\ket{\Xi_{\mathcal{P}_{\rm temp}}} decreases below 12Kn\frac{1}{2Kn}. When this happens, the query complexity becomes comparable with the bounds (32) and (33).

Second, seemingly, the bounds on the number of queries to OseqO_{\rm seq} and OPWMO_{\rm PWM} in Eqs. (32) and (34) linearly scale with mm, which is similar to Eq. (18) and (20) and makes us consider that there is no speedup with respect to mm compared to the naive iteration method. However, if we can set wsoftw_{\rm soft} and whardw_{\rm hard} with larger difference for larger mm, the dependence of the bounds in Eqs. (32) and (34) on mm becomes milder than linear. This seems reasonable because, naively thinking, the typical value of the segment score, which is the sum of mm terms, becomes larger for larger mm, and so do wsoftw_{\rm soft}, whardw_{\rm hard} and their difference. In fact, in Sec. IV.2, we argue that it is reasonable to take whardw_{\rm hard} and wsoftw_{\rm soft} so that

whardwsoft=Ω(m),w_{\rm hard}-w_{\rm soft}=\Omega(\sqrt{m}), (53)

from which Eqs. (32) and (34) turns into

O(nsoftKnmlog(K2n2δ)log(Knδ))O\left(n_{\rm soft}\sqrt{Knm}\log\left(\frac{K^{2}n^{2}}{\delta}\right)\log\left(\frac{Kn}{\delta}\right)\right) (54)

and

O(Knmlog(K2n2δ)log(Knδ)),O\left(\sqrt{Knm}\log\left(\frac{K^{2}n^{2}}{\delta}\right)\log\left(\frac{Kn}{\delta}\right)\right), (55)

respectively. If so, the QMCI-based method can be beneficial compared to the naive iteration method for small nsoftn_{\rm soft} and large mm, that is, in the case that there is a small number of matches and the sequence motif length is large.

IV Discussion

IV.1 Implementations of the oracles with QRAMs and the cost to prepare them

Now, we consider how to implement the oracles OseqO_{\rm seq}, OPWMO_{\rm PWM} and O𝒫O_{\mathcal{P}}, which we have simply assumed are implementable so far.

It seems that, in order to realize the quantum access to the elements in the sequence SS like Eq. (14), we need to use a QRAM [24]. Although some difficulties in constructing it in reality have been pointed out [29], it is the very device that provides the access to the indexed data in superposition in O(logN)O(\log N) time with respect to NN the number of the data points. Of course, preparing a QRAM, that is, registering the NN data points into the QRAM requires O(N)O(N) time. To prepare OseqO_{\rm seq}, we need O(n)O(n) time.

We can use a QRAM also for OPWMO_{\rm PWM} in Eq. (15). Although the indexes are now three-fold, (k,i,a)(k,i,a), it is straightforward to combine them and regard it as an integer. Preparing this takes O(m|𝒜|K)O(m|\mathcal{A}|K) time, which is expected to be much shorter than O(N)O(N) in usual situations.

We can also construct O𝒫O_{\mathcal{P}}, especially O𝒫tempO_{\mathcal{P}_{\rm temp}}, using a QRAM. Naively thinking, we can do this by registering 0 or 1, which represents (k,i)𝒫(k,i)\in\mathcal{P} or not, for every (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all}. However, this takes O(nK)O(nK) time, which exceeds O(nm)O(nm) time for the classical exhaustive search if K>mK>m. Therefore, we adopt the following approach that takes the shorter time for QRAM preparation. First, we plausibly assume that the number of the matched PWMs at every position ii in the sequence SS is at most κ\kappa, which is O(1)O(1). Then, we prepare the QRAM O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} that outputs κ\kappa indices ki,1,,ki,κ[K]0k_{i,1},...,k_{i,\kappa}\in[K]_{0} such that (ki,1,i),,(ki,κ,i)𝒫temp(k_{i,1},i),...,(k_{i,\kappa},i)\in\mathcal{P}_{\rm temp} for each i[n]0i\in[n]_{0}:

O~𝒫temp|i|0|0κ=|i|ki,1|ki,κ.\tilde{O}_{\mathcal{P}_{\rm temp}}\ket{i}\underbrace{\ket{0}\cdots\ket{0}}_{\kappa}=\ket{i}\ket{k_{i,1}}\cdots\ket{k_{i,\kappa}}. (56)

If κ\kappa^{\prime} the number of such indices is smaller than κ\kappa, we set ki,κ+1,,ki,κk_{i,\kappa^{\prime}+1},...,k_{i,\kappa} to some dummy number (say, 1-1) not contained in [K]0[K]_{0}. Using this, we can perform the following operation for any (k,i)𝒫all(k,i)\in\mathcal{P}_{\rm all}:

|k|i|0|0κ|0|0κ|0\displaystyle\ket{k}\ket{i}\underbrace{\ket{0}\cdots\ket{0}}_{\kappa}\underbrace{\ket{0}\cdots\ket{0}}_{\kappa}\ket{0}
\displaystyle\rightarrow |k|i|ki,1|ki,κ|0|0|0\displaystyle\ket{k}\ket{i}\ket{k_{i,1}}\cdots\ket{k_{i,\kappa}}\ket{0}\cdots\ket{0}\ket{0}
\displaystyle\rightarrow |k|i|ki,1|ki,κ|𝟙kki,1|𝟙kki,κ|0\displaystyle\ket{k}\ket{i}\ket{k_{i,1}}\cdots\ket{k_{i,\kappa}}\Ket{\mathbbm{1}_{k\neq k_{i,1}}}\cdots\Ket{\mathbbm{1}_{k\neq k_{i,\kappa}}}\ket{0}
\displaystyle\rightarrow |k|i|ki,1|ki,κ|𝟙kki,1|𝟙kki,κ|𝟙kki,1kki,κ.\displaystyle\ket{k}\ket{i}\ket{k_{i,1}}\cdots\ket{k_{i,\kappa}}\Ket{\mathbbm{1}_{k\neq k_{i,1}}}\cdots\Ket{\mathbbm{1}_{k\neq k_{i,\kappa}}}\Ket{\mathbbm{1}_{k\neq k_{i,1}\wedge\cdots\wedge k\neq k_{i,\kappa}}}.

Here, the first to (κ+2)(\kappa+2)th kets correspond to registers with the sufficient number of qubits and the other kets correspond to the single qubits. In Eq. (LABEL:eq:OPImpl), we use O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} at the first arrow and O=O_{=}’s at the second arrow, and the last operation is done by the multiply controlled NOT gate on the last κ+1\kappa+1 qubits. Note that ‘1’ on the last qubit means (k,i)𝒫temp(k,i)\notin\mathcal{P}_{\rm temp}. Therefore, the above operation is in fact O𝒫tempO_{\mathcal{P}_{\rm temp}}, with some registers in Eq. (LABEL:eq:OPImpl) regarded as ancillas. In this implementation, the QRAM O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} is queried once in a call to the oracle O𝒫tempO_{\mathcal{P}_{\rm temp}}, along with O(κ)O(\kappa) uses of arithmetic oracles. For initializing O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}}, O(κn)O(\kappa n) time is taken at the very beginning of Algorithm 3, where 𝒫temp=\mathcal{P}_{\rm temp}=\emptyset and thus ki,1==ki,κ=1k_{i,1}=\cdots=k_{i,\kappa}=-1 for any i[n]0i\in[n]_{0}. After that, every time an index pair (k,i)(k,i) is added to 𝒫temp\mathcal{P}_{\rm temp} in the QAA loop in Algorithm 3, one memory cell in O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} is updated, which takes O(1)O(1) time.

Let us summarize the above discussion. At the beginning both of the naive iteration method and the QMCI method, we need to initialize the QRAMs OseqO_{\rm seq}, OPWMO_{\rm PWM} and O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}}, which takes O(n+m|𝒜|K+κn)O(n+m|\mathcal{A}|K+\kappa n) time in total. If we reasonably assume that m|𝒜|K<nm|\mathcal{A}|K<n and κ=O(1)\kappa=O(1), the time complexity is estimated as O(n)O(n).

Although we need to take O(n)O(n) time at the preliminary stage, after that the quantum algorithms run with complexities shown in Theorems 3 and 4, which scales with nn as O(n)O(\sqrt{n}), for any sequence SS and any PWMs {Mk}k\{M_{k}\}_{k}. Also note that, once we prepare OseqO_{\rm seq}, whose preparation is the bottleneck under the current assumption, we can search the matches between SS and another set of KK PWMs {Mk}k\{M^{\prime}_{k}\}_{k} by preparing OPWMO_{\rm PWM} and O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} and running the quantum algorithm, which no longer takes O(n)O(n) time444Initializing O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} seems to take O(κn)O(\kappa n) time again. However, if we have O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}} used in the previous algorithm run, resetting its updated memory cells gives us the properly initialized O~𝒫temp\tilde{O}_{\mathcal{P}_{\rm temp}}. Since the number of the memory cells to be reset is equal to that of the matches found in the previous run and it is usually much smaller than nn, this reset-based initialization does not take O(n)O(n) time.. As far as the authors know, there is no known method for PWM matching in which initialization takes O(n)O(n) time and the main search algorithm takes the sublinear complexity to nn. As an algorithm having the initialization cost of same order, we refer to [11], for example. In this classical algorithm based on an enhanced suffix array (ESA), it takes O(n)O(n) time to construct ESA. After that, the worst-case complexity to find matches is O(n+m)O(n+m) if some condition is satisfied, but it can be O(nm)O(nm) in the general case.

IV.2 Score threshold in the large mm limit

Here, we consider the asymptotic distribution of scores of segment when the sequence motif length mm is large and, based on it, discuss the plausible setting on the score threshold.

In many cases, the score threshold wthw_{\rm th} in matching with a PWM Mm×|𝒜|M\in\mathbb{R}^{m\times|\mathcal{A}|} is determined by the pp-value. That is, we set wthw_{\rm th} so that the probability that the score of a segment becomes equal to or larger than wthw_{\rm th} is equal to the given value p(0,1)p\in(0,1) in the background model. Here, the background model means the assumption that, when we take a segment of length mm in the sequence SS randomly and denote by uiu_{i} the alphabet in the iith position in the segment, u0,,um1u_{0},...,u_{m-1} are independent and identically distributed. The rigorous definition is as follows. Supposing that every a𝒜a\in\mathcal{A} is associated with pa(0,1)p_{a}\in(0,1) satisfying a𝒜pa=1\sum_{a\in\mathcal{A}}p_{a}=1, we consider the finite probability space (𝒜m,BG)(\mathcal{A}^{m},\mathbb{P}_{\rm BG}) consisting of the sample space 𝒜m\mathcal{A}^{m} and the probability function BG:𝒜m0\mathbb{P}_{\rm BG}:\mathcal{A}^{m}\rightarrow\mathbb{R}_{\geq 0} such that, for any u0..um1𝒜mu_{0}..u_{m-1}\in\mathcal{A}^{m},

BG(u0..um1)=i=0m1pai,\mathbb{P}_{\rm BG}(u_{0}..u_{m-1})=\prod_{i=0}^{m-1}p_{a_{i}}, (58)

if u0=a0,,um1=am1u_{0}=a_{0},...,u_{m-1}=a_{m-1} with a0,,am1𝒜a_{0},...,a_{m-1}\in\mathcal{A}. Then, we define

wth:=max{w|BG({u0..um1|WM(u0..um1)w})p},w_{\rm th}:=\max\{w\in\mathbb{R}\ |\ \mathbb{P}_{\rm BG}\left(\{u_{0}..u_{m-1}\ |\ W_{M}(u_{0}..u_{m-1})\geq w\}\right)\geq p\}, (59)

where, for any subset U𝒜mU\in\mathcal{A}^{m}, we define BG(U):=uUBG(u)\mathbb{P}_{\rm BG}(U):=\sum_{u\in U}\mathbb{P}_{\rm BG}(u).

Now, we regard W:=WM(u0..um1)W:=W_{M}(u_{0}..u_{m-1}) as a random variable and consider its asymptotic distribution in the case of large mm. We use the following theorem, a variant of the central limit theorem.

Theorem 5 (Theorem 27.4 in [30], modified).

Let {Xn}n0\{X_{n}\}_{n\in\mathbb{N}_{\geq 0}} be the sequence of the independent random variables on some probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P}) such that, for any n0n\in\mathbb{N}_{\geq 0}, XnX_{n} has the expectation μn\mu_{n} and the finite variance σn2\sigma_{n}^{2}. For each nn\in\mathbb{N}, define Sn:=i=0n1(Xiμi)S_{n}:=\sum_{i=0}^{n-1}(X_{i}-\mu_{i}) and sn2:=i=0n1σi2s_{n}^{2}:=\sum_{i=0}^{n-1}\sigma_{i}^{2}. Suppose that there exists δ+\delta\in\mathbb{R}_{+} such that

limn1sn2+δi=0n1E[|Xiμi|2+δ]=0.\lim_{n\rightarrow\infty}\frac{1}{s_{n}^{2+\delta}}\sum_{i=0}^{n-1}E_{\mathbb{P}}[|X_{i}-\mu_{i}|^{2+\delta}]=0. (60)

Then, Sn/snS_{n}/s_{n} converges in distribution to a standard normal random variable, as nn goes to infinity.

We can apply this theorem to the case of PWM matching by, for each i[m]0i\in[m]_{0}, regarding XiX_{i} in Theorem 5 as M(i,ui)M(i,u_{i}), the score of the alphabet in the iith position in the background model. μi\mu_{i} and σi\sigma_{i} correspond to the mean and the variance of the score of the alphabet in the iith position, that is,

μi=a𝒜paM(i,a)\mu_{i}=\sum_{a\in\mathcal{A}}p_{a}M(i,a) (61)

and

σi2=a𝒜pa(M(i,a)μi)2,\sigma_{i}^{2}=\sum_{a\in\mathcal{A}}p_{a}(M(i,a)-\mu_{i})^{2}, (62)

respectively. We must check the condition (60) is satisfied, and it is in fact satisfied in the following plausible situation. First, recall that we have rescaled the PWM so that Eq. (4) holds. Therefore,

EBG[|Xiμi|2+δ]1E_{\mathbb{P}_{\rm BG}}[|X_{i}-\mu_{i}|^{2+\delta}]\leq 1 (63)

holds obviously. Besides, we may additionally assume that there exist r(0,1)r\in(0,1) and σmin2+\sigma_{\rm min}^{2}\in\mathbb{R}_{+} independent of mm such that, for at least rm\lceil rm\rceil elements ii in [m]0[m]_{0}, σi2σmin2\sigma_{i}^{2}\geq\sigma_{\rm min}^{2} holds. This means that, although in some part of the positions the position-wise score variances might be small, at least in the certain ratio rr of the positions the variances exceeds the level σmin2\sigma_{\rm min}^{2}. This assumption yields

sm2rmσmin2.s_{m}^{2}\geq rm\sigma_{\rm min}^{2}. (64)

Combining Eqs. (63) and (64), we have

1sm2+δi=0m1EBG[|Xiμi|2+δ]m(rm)1+δ/2σmin2+δ\frac{1}{s_{m}^{2+\delta}}\sum_{i=0}^{m-1}E_{\mathbb{P}_{\rm BG}}[|X_{i}-\mu_{i}|^{2+\delta}]\leq\frac{m}{(rm)^{1+\delta/2}\sigma_{\rm min}^{2+\delta}} (65)

for any δ+\delta\in\mathbb{R}_{+}, which converges to 0 in the large mm limit.

Therefore, in the large mm limit, we can approximate as

BG(Wwth)(wthμ~m)/sm12πex2/2𝑑x,\mathbb{P}_{\rm BG}\left(W\geq w_{\rm th}\right)\approx\int^{\infty}_{(w_{\rm th}-\tilde{\mu}_{m})/s_{m}}\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}dx, (66)

with μ~m:=i=0m1μi=EBG[W]\tilde{\mu}_{m}:=\sum_{i=0}^{m-1}\mu_{i}=E_{\mathbb{P}_{\rm BG}}[W] being the expected segment score under the background model.

Considering the above asymptotic distribution of SS, it seems reasonable to set wthw_{\rm th} as μ~m+xσtot\tilde{\mu}_{m}+x\sigma_{\rm tot} with some x+x\in\mathbb{R}_{+}. Alternatively, if we set the two levels of the score whardw_{\rm hard} and wsoftw_{\rm soft} in the QMCI-based method, it seems plausible to set them as wsoft=μ~m+xsoftsmw_{\rm soft}=\tilde{\mu}_{m}+x_{\rm soft}s_{m} and whard=μ~m+xhardsmw_{\rm hard}=\tilde{\mu}_{m}+x_{\rm hard}s_{m} with xsoft,xhard+x_{\rm soft},x_{\rm hard}\in\mathbb{R}_{+} such that xsoft<xhardx_{\rm soft}<x_{\rm hard} and xhardxsoft=O(1)x_{\rm hard}-x_{\rm soft}=O(1). For example, xsoft=3x_{\rm soft}=3 and xhard=4x_{\rm hard}=4, which correspond to the pp-values 1.35×1031.35\times 10^{-3} and 3.17×1053.17\times 10^{-5}, respectively, in the approximation as Eq. (66). In such a setting, using sm=Ω(m)s_{m}=\Omega(\sqrt{m}) that follows from Eq. (64), we get Eq. (53) and then the complexity bounds (54) and (55).

V Summary

In this paper, we have proposed the two quantum algorithms for an important but time-consuming problem in bioinformatics, PWM matching, which aims to find sequence motifs in a biological sequence whose scores defined by PWMs exceed the threshold. Both of these algorithms, the naive iteration method and the QMCI-based method, utilize QAA for search of high-score segments. They are differentiated by how to calculate the segment score. The former calculates it by simply iterating to add up each position-wise score by quantum circuits for arithmetic. The latter uses QMCI for this summation, coping with false detection due to the QMCI error by setting two levels of threshold wsoftw_{\rm soft} and whardw_{\rm hard}. Given the oracular accesses to the entries in the sequence and PWMs, both of the quantum algorithms runs with query complexity scaling with the sequence length nn and the number of PWMs KK as O(Kn)O(\sqrt{Kn}), thanks to the well-known quadratic speedup by QAA. Furthermore, under some setting on wsoftw_{\rm soft} and whardw_{\rm hard}, the complexity of the QMCI-based method scales with the sequence motif length mm as O(m)O(\sqrt{m}). These mean the quantum speedup over existing classical algorithms. Although our quantum algorithms take O(n)O(n) preparation time for initialization of QRAMs, they still have the advantage especially when we perform matching between a sequence and many PWMs.

It is interesting that these algorithms for bioinformatics are inspired by the algorithm in [23] for a completely different field, gravitational wave astronomy. We expect that the scheme used in these algorithms, the combination of QMCI and QAA, is widely useful over various industrial and scientific fields. In future work, we will explore other applications of quantum algorithms in bioinformatics and other fields.

Acknowledgement

This work is supported by MEXT Quantum Leap Flagship Program (MEXT Q-LEAP) Grant no. JPMXS0120319794 and JPMXS0118067285. K.M. is supported by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant no. JP22K11924.

Appendix A Proof of Theorem 1 for the case that a=0a=0

Proof.

As described in [23] and the original paper [20], in QAA, we repeatedly generate Gj|ΦG^{j}\ket{\Phi} with various j0j\in\mathbb{N}_{\geq 0} and measure R2R_{2}, and output (A) if and only if the measurement outcome is 1. Here,

G:=AS0A1Sχ,G:=-AS_{0}A^{-1}S_{\chi}, (67)

where S0S_{0} and SχS_{\chi} are the unitary operators on the system under consideration acting as follows:

Sχ|ϕ|x={|ϕ|0;ifx=0|ϕ|1;ifx=1S_{\chi}\ket{\phi}\ket{x}=\begin{cases}\ket{\phi}\ket{0}&;\ {\rm if}\ x=0\\ -\ket{\phi}\ket{1}&;\ {\rm if}\ x=1\end{cases} (68)

with any state |ϕ\ket{\phi} on R1R_{1}, and

S0|Φ={|0|0;if|Φ=|0|0|Φ;ifΦ|(|0|0)=0.S_{0}\ket{\Phi^{\prime}}=\begin{cases}-\ket{0}\ket{0}&;\ {\rm if}\ \ket{\Phi^{\prime}}=\ket{0}\ket{0}\\ \ket{\Phi^{\prime}}&;\ {\rm if}\ \bra{\Phi^{\prime}}\left(\ket{0}\ket{0}\right)=0\end{cases}. (69)

As we can see easily, if a=0a=0, G|Φ=|Φ=|ϕ0|0G\ket{\Phi}=\ket{\Phi}=\ket{\phi_{0}}\ket{0} holds, and thus we never get 1 in measuring R2R_{2} in Gj|ΦG^{j}\ket{\Phi} for any jj. Therefore, (A) is never output, which means that (B) is output certainly. ∎

For the detailed procedure of QAA(A,γ,δ)\textup{{QAA}}(A,\gamma,\delta) and the proof of Theorem 1 in the other cases, see [23] and the original paper [20].

Appendix B Proof of Theorem 2

Before the proof, we present the following fact, Theorem 4 in [23]. It is almost same as Lemma 2.1 in [22], but slightly modified in reference to the original one, Lemma 6.1 in [31].

Lemma 1.

Let μ\mu\in\mathbb{R} and ϵ+\epsilon\in\mathbb{R}_{+}. Let 𝒜\mathcal{A} be an algorithm that outputs an ϵ\epsilon-approximation of μ\mu with probability γ34\gamma\geq\frac{3}{4}. Then, for any δ(0,1)\delta\in(0,1), the median of outputs in 12logδ1+112\left\lceil\log\delta^{-1}\right\rceil+1 runs of 𝒜\mathcal{A} is an ϵ\epsilon-approximation of μ\mu with probability at least 1δ1-\delta.

Then, the proof of Theorem 2 is as follows.

Proof of Theorem 2.

According to Theorem 7 in [23], for any integer tt larger than 2, we can construct the oracle O~𝒳,tmean\tilde{O}_{\mathcal{X},t}^{\rm mean} that acts as O~𝒳,tmean|0=y𝒴αy|y\tilde{O}_{\mathcal{X},t}^{\rm mean}\ket{0}=\sum_{y\in\mathcal{Y}}\alpha_{y}\ket{y} using O𝒳O_{\mathcal{X}} O(t)O(t) times. Here, 𝒴\mathcal{Y} is a finite set of real numbers that includes a subset 𝒴~\tilde{\mathcal{Y}} consisting of elements μ~\tilde{\mu} satisfying

|μ~μ|C(μt+1t2)|\tilde{\mu}-\mu|\leq C\left(\frac{\sqrt{\mu}}{t}+\frac{1}{t^{2}}\right) (70)

with a universal real constant CC, and {αy}y𝒴\{\alpha_{y}\}_{y\in\mathcal{Y}} are complex numbers satisfying y~𝒴~|αy~|28/π2\sum_{\tilde{y}\in\tilde{\mathcal{Y}}}|\alpha_{\tilde{y}}|^{2}\geq 8/\pi^{2}. Following this, we prepare a system with JJ quantum registers and generate the state

|Ψ:=(y1𝒴αy1|y1)(yJ𝒴αyJ|yJ)\ket{\Psi}:=\left(\sum_{y_{1}\in\mathcal{Y}}\alpha_{y_{1}}\ket{y_{1}}\right)\otimes\cdots\otimes\left(\sum_{y_{J}\in\mathcal{Y}}\alpha_{y_{J}}\ket{y_{J}}\right) (71)

by operating O~𝒳,tmean\tilde{O}_{\mathcal{X},t}^{\rm mean} on each register. Here, JJ and tt are set as

J=12logδ1+1,t=2Cϵ.J=12\left\lceil\log\delta^{-1}\right\rceil+1,t=\left\lceil\frac{2C}{\epsilon}\right\rceil. (72)

It can be shown by easy algebra that, in this setting, each μ~𝒴~\tilde{\mu}\in\tilde{\mathcal{Y}} satisfies |μ~μ|ϵ|\tilde{\mu}-\mu|\leq\epsilon. By measuring |Ψ\ket{\Psi}, we obtain JJ real numbers y1,,yJy_{1},...,y_{J}, each of which is a ϵ\epsilon-approximation of μ\mu with probability at least 8π2>34\frac{8}{\pi^{2}}>\frac{3}{4}. Therefore, because of Lemma 1, the median of y1,,yJy_{1},...,y_{J} is a ϵ\epsilon-approximation of μ\mu with probability at least 1δ1-\delta. This means that, if we generate

|Ψ:=(y1𝒴αy1|y1)(yJ𝒴αyJ|yJ)|med(y1,,yJ)\ket{\Psi^{\prime}}:=\left(\sum_{y_{1}\in\mathcal{Y}}\alpha_{y_{1}}\ket{y_{1}}\right)\otimes\cdots\otimes\left(\sum_{y_{J}\in\mathcal{Y}}\alpha_{y_{J}}\ket{y_{J}}\right)\ket{{\rm med}(y_{1},...,y_{J})} (73)

by adding one more register to |Ψ\ket{\Psi} and then using OJmedO^{\rm med}_{J}, this is actually the state in Eq. (10), with the first JJ registers regarded as undisplayed. In summary, we can construct O𝒳,ϵ,δmeanO_{\mathcal{X},\epsilon,\delta}^{\rm mean} as

O𝒳,ϵ,δmean=OJmed(O~𝒳,tmeanO~𝒳,tmeanJI),O_{\mathcal{X},\epsilon,\delta}^{\rm mean}=O^{\rm med}_{J}\left(\underbrace{\tilde{O}_{\mathcal{X},t}^{\rm mean}\otimes\cdots\otimes\tilde{O}_{\mathcal{X},t}^{\rm mean}}_{J}\otimes I\right), (74)

where II is the identity operator on the Hilbert space for the register. Since each O~𝒳,tmean\tilde{O}_{\mathcal{X},t}^{\rm mean} uses O𝒳O_{\mathcal{X}} O(t)O(t) times, O𝒳,ϵ,δmeanO_{\mathcal{X},\epsilon,\delta}^{\rm mean} uses O𝒳O_{\mathcal{X}} O(tJ)O(tJ) times, that is, O(1ϵlog(1δ))O\left(\frac{1}{\epsilon}\log\left(\frac{1}{\delta}\right)\right) times, in total. ∎

References