
W.-J. Liu
Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, P.R. China
e-mail: [email protected]

W.-J. Liu · P.-P. Gao · W.-B. Yu · Z.-G. Qu
School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, P.R. China

C.-N. Yang
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien 974, Taiwan

Quantum Relief Algorithm

Wen-Jie Liu    Pei-Pei Gao    Wen-Bin Yu    Zhi-Guo Qu    Ching-Nung Yang
(Received: date / Accepted: date)
Abstract

The Relief algorithm is a feature selection algorithm for binary classification proposed by Kira and Rendell, and its computational complexity increases remarkably with both the number of samples and the number of features. In order to reduce this complexity, a quantum feature selection algorithm based on the Relief algorithm, also called the quantum Relief algorithm, is proposed. In the algorithm, all features of each sample are superposed in a certain quantum state through the CMP and rotation operations; then the swap test and measurement are applied to this state to obtain the similarity between two samples. After that, Near-hit and Near-miss are obtained by finding the maximal similarity, and are further applied to update the feature weight vector WT to get WT′, which determines the relevant features with the threshold τ. In order to verify our algorithm, a simulation experiment based on IBM Q with a simple example is performed. Efficiency analysis shows that the computational complexity of our proposed algorithm is O(M), while the complexity of the original Relief algorithm is O(NM), where N is the number of features of each sample, and M is the size of the sample set. Obviously, our quantum Relief algorithm achieves a significant speedup over the classical one.

Keywords:
quantum Relief algorithm · feature selection · CMP operation · rotation operation · swap test · IBM Q

1 Introduction

Machine learning refers to an area of computer science in which patterns are derived (“learned”) from data with the goal to make sense of previously unknown inputs. As part of both artificial intelligence and statistics, machine learning algorithms process large amounts of information for tasks that come naturally to the human brain, such as image recognition, pattern identification and strategy optimization.

Machine learning tasks are typically classified into two broad categories [1], supervised and unsupervised machine learning, depending on whether a learning "signal" or "feedback" is available to the learning system. In supervised machine learning, the learner is provided with a set of training examples, with features presented in the form of high-dimensional vectors and with corresponding labels to mark their categories. In unsupervised machine learning, on the contrary, no labels are given to the learning algorithm, leaving it on its own to find structure. The core mathematical task for both supervised and unsupervised machine learning algorithms is to evaluate distances and inner products between high-dimensional vectors, which requires time proportional to the size of the vectors on classical computers. As we know, calculating distances in high dimensions suffers from the "curse of dimensionality" [2]. One possible solution is dimension reduction [3]; another is feature selection [4, 5].

The Relief algorithm is one of the most representative feature selection algorithms, first proposed by Kira et al. [6]. The algorithm devises a "relevance statistic" to weigh the importance of each feature, and it has been widely applied in many fields, such as hand gesture recognition [7], electricity price forecasting [8] and power system transient stability assessment [9]. However, the Relief algorithm occupies a large amount of computational resources when the numbers of samples and features become huge, which restricts its application.

Since the concept of the quantum computer was proposed by the famous physicist Feynman [10], a number of remarkable results have been obtained. For example, Deutsch's algorithms [11, 12] embody the superiority of quantum parallel computation, Shor's algorithm [13] solves the integer factorization problem in polynomial time, and Grover's algorithm [14] offers a quadratic speedup for searching an unstructured space. With the properties of superposition and entanglement, quantum computation has a potential advantage in dealing with high-dimensional vectors, which attracts researchers to apply quantum mechanics to classical machine learning tasks, such as quantum pattern matching [15], quantum probably approximately correct learning [16], feedback learning for quantum measurement [17], quantum binary classifiers [18, 19], and quantum support vector machines [20].

Even in quantum machine learning, we still face the "curse of dimensionality"; thus dimension reduction or feature selection is a necessary preliminary step before training on high-dimensional samples. Inspired by the idea of computing the inner product between two vectors [21, 22], we propose a Relief-based quantum parallel algorithm, also named the quantum Relief algorithm, to perform feature selection effectively.

The outline of this paper is as follows. The classical Relief algorithm is briefly reviewed in Sect. 2, the proposed quantum Relief algorithm is described in detail in Sect. 3, and a simulation experiment based on IBM Q with a simple example is given in Sect. 4. Subsequently, the efficiency of the algorithm is analyzed in Sect. 5, and a brief conclusion and the outlook of our work are given in the last section.

2 Review of Relief algorithm

Relief is a feature selection algorithm for binary classification (generalizable to multi-class classification by decomposition into a number of binary problems) proposed by Kira and Rendell [6] in 1992. It is efficient in estimating features according to how well their values distinguish among samples that are near each other.

We can divide an M-sample set into two vector sets: A = {v_j | v_j ∈ ℝ^N, j = 1, 2, …, M_1} and B = {w_k | w_k ∈ ℝ^N, k = 1, 2, …, M_2}, where v_j, w_k are N-feature samples: v_j = (v_j1, v_j2, …, v_jN)^T, w_k = (w_k1, w_k2, …, w_kN)^T, with v_j1, …, v_jN, w_k1, …, w_kN ∈ {0, 1}, and the weight vector of the N features, WT = (wt_1, wt_2, …, wt_N)^T, is initialized to all zeros. Suppose the upper limit of iterations is T, and the relevance threshold (which differentiates the relevant and irrelevant features) is τ (0 ≤ τ ≤ 1). The process of the Relief algorithm is described in Algorithm 1.

Init WT = (0, …, 0)^T.
for t = 1 to T do
      Pick a sample u randomly.
      Find the closest samples Near-hit and Near-miss in the two classes A and B.
      for i = 1 to N do
            WT[i] = WT[i] − diff(i, u, Near-hit)² + diff(i, u, Near-miss)².
      end for
end for
Select the most relevant features according to WT and τ.
ALGORITHM 1 Relief(A, B, T, τ)

At each iteration, pick a random sample u, and then select the samples closest to u (by N-dimensional Euclidean distance) from each class. The closest same-class sample is called Near-hit, and the closest different-class sample is called Near-miss. Update the weight vector such that,

WT[i] = WT[i] − diff(i, u, Near-hit)² + diff(i, u, Near-miss)²,   (1)

where the function diff(i, u, v) is defined as below,

diff(i, u, v) = { 0 if u_i = v_i;  1 if u_i ≠ v_i }.   (2)

Relief was also described as generalizable to multi-class classification by decomposition into a number of binary problems. However, as the scale of the samples and the number of features increase, its efficiency drastically declines.
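For concreteness, the loop of Algorithm 1 can be sketched in plain Python for binary features. This is our own illustration: the random alternation between classes, the tie-breaking of `min`, and the function names are assumptions, not fixed by the original description.

```python
import random

def diff(i, u, v):
    """Eq. (2): 0 if the i-th features agree, 1 otherwise."""
    return 0 if u[i] == v[i] else 1

def hamming(u, v):
    # For binary features, squared Euclidean distance equals Hamming distance.
    return sum(a != b for a, b in zip(u, v))

def relief(A, B, T, tau, rng=random.Random(0)):
    """Classical Relief: returns indices of features whose averaged weight reaches tau."""
    N = len(A[0])
    WT = [0.0] * N
    for _ in range(T):
        # Pick a random sample u from a randomly chosen class.
        same, other = (A, B) if rng.random() < 0.5 else (B, A)
        u = rng.choice(same)
        near_hit = min((v for v in same if v is not u), key=lambda v: hamming(u, v))
        near_miss = min(other, key=lambda v: hamming(u, v))
        for i in range(N):  # Eq. (1)
            WT[i] += -diff(i, u, near_hit) ** 2 + diff(i, u, near_miss) ** 2
    # Average over T iterations and threshold, as in the selection step.
    return [i for i, wt in enumerate(WT) if wt / T >= tau]
```

On the four-sample example used later in Sect. 4 (with, e.g., τ = 0.5), every iteration adds the same increment (1, 1, −1, 0), so the first two features come out as relevant.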

3 Quantum Relief Algorithm

In order to handle the problem of large numbers of samples and features, we propose a quantum Relief algorithm. Suppose the sample sets A = {v_j, j = 1, 2, …, M_1} and B = {w_k, k = 1, 2, …, M_2}, the weight vector WT, the upper limit T and the threshold τ are the same as in the classical Relief algorithm defined in Sect. 2. Different from the classical case, all the features of each sample are represented in a quantum superposition state, and the similarity between two samples can be calculated in parallel. Algorithm 2 shows the detailed procedure of the quantum Relief algorithm.

Init WT = (0, …, 0)^T.
Prepare the states for sample sets A and B through the CMP and rotation operations,
      |ϕ_A⟩_j = (1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|v_ji|²)|0⟩ + v_ji|1⟩),
      |ϕ_B⟩_k = (1/√N) |k⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|w_ki|²)|0⟩ + w_ki|1⟩).
for t = 1 to T do
      Select a state |ϕ⟩ randomly from the state set {|ϕ_A⟩_j} or {|ϕ_B⟩_k}.
      Perform a swap operation on |ϕ⟩ to get |φ⟩ = (1/√N) |l⟩ Σ_{i=0}^{N−1} |i⟩(√(1−|u_i|²)|0⟩ + u_i|1⟩)|1⟩.
      Get |⟨u|v_j⟩|² = (1 − 2P_j^{l(A)}(1))N² and |⟨u|w_k⟩|² = (1 − 2P_k^{l(B)}(1))N² through the swap test and measurement operations.
      Obtain the maximum similarities: max{|⟨u|v_j⟩|²} and max{|⟨u|w_k⟩|²}.
      if u belongs to class A then
            Near-hit = v_max, Near-miss = w_max.
      else
            Near-hit = w_max, Near-miss = v_max.
      end if
      for i = 1 to N do
            wt_i = wt_i − diff(i, u, Near-hit)² + diff(i, u, Near-miss)².
      end for
end for
WT̄ = (1/T) WT.
for i = 1 to N do
      if WT̄_i ≥ τ then
            the i-th feature is relevant.
      else
            the i-th feature is not relevant.
      end if
end for
ALGORITHM 2 QRelief(A, B, T, τ)

3.1 State preparation

At the beginning of the algorithm, the classical information is converted into quantum superposition states. The quantum superposition state sets {|ϕ_A⟩_j | j = 1, 2, …, M_1} and {|ϕ_B⟩_k | k = 1, 2, …, M_2}, which store all the feature values of v_j ∈ A and w_k ∈ B, are prepared as below,

|ϕ_A⟩_j = (1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|v_ji|²)|0⟩ + v_ji|1⟩),
|ϕ_B⟩_k = (1/√N) |k⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|w_ki|²)|0⟩ + w_ki|1⟩),   (3)

where v_ji and w_ki represent the i-th feature value of the vectors v_j and w_k, respectively. Suppose the initial state is |j⟩|0⟩^⊗n|1⟩|0⟩ (n = ⌈log₂ N⌉) and we want to prepare the state |ϕ_A⟩_j; its construction consists of the following steps.

First of all, the H (i.e., Hadamard gate) and CMP operations are performed on |0⟩^⊗n to obtain the state (1/√N) Σ_{i=0}^{N−1} |i⟩,

|0⟩^⊗n —(H and CMP operations)→ (1/√N) Σ_{i=0}^{N−1} |i⟩.   (4)

Fig. 1 depicts the detailed circuit of these operations, where the CMP operation is the key component, used to tailor the state (1/√(2^n)) Σ_{i=0}^{2^n−1} |i⟩ into the target state (1/√N) Σ_{i=0}^{N−1} |i⟩; its definition is

CMP |i⟩|N⟩|0⟩ = { |i⟩|N⟩|0⟩ if i < N;  |i⟩|N⟩|1⟩ if i ≥ N }.   (5)

After the CMP operation, the basis states whose index is not less than N (i.e., whose last qubit is |1⟩) are clipped off.
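The effect of Eqs. (4)–(5) can be checked with a small classical simulation. This sketch is our own illustration (the function name and the explicit postselection step are assumptions, not the paper's circuit): H^⊗n produces a uniform superposition over 2^n indices, CMP flags every index i ≥ N on an ancilla, and measuring the ancilla as 0 keeps and renormalizes the i < N branch.

```python
import math

def tailored_superposition(N):
    """Classically simulate H + CMP: amplitudes of (1/sqrt(N)) * sum_{i<N} |i>."""
    n = max(1, math.ceil(math.log2(N)))
    amps = [1 / math.sqrt(2 ** n)] * (2 ** n)            # after H^(tensor n)
    flags = [0 if i < N else 1 for i in range(2 ** n)]   # CMP ancilla, Eq. (5)
    kept = [a for a, f in zip(amps, flags) if f == 0]    # measure ancilla = 0
    norm = math.sqrt(sum(a * a for a in kept))           # renormalize the kept branch
    return [a / norm for a in kept]
```

Note that the postselection succeeds with probability N/2^n, which is at least 1/2; this cost is not visible in the circuit of Fig. 1.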

Figure 1: The circuit of the quantum operations performed on |0⟩^⊗n to obtain the state (1/√N) Σ_{i=0}^{N−1} |i⟩ when the measurement outcome is 0. Here, the thick horizontal line represents multiple qubits, while the thin horizontal line represents one qubit; H is the Hadamard gate, and CMP can be implemented using the circuit in Fig. 2.
Figure 2: The circuit for performing CMP, illustrated for single-qubit inputs |i_q⟩ and |N_q⟩. Here, the third line is the auxiliary qubit, while the last one is the result qubit, which will contain |1⟩ if i ≥ N, and |0⟩ if i < N.

Then, a unitary rotation operation

R_y(2 sin⁻¹ v_ji) = [ √(1−|v_ji|²)   −v_ji
                      v_ji           √(1−|v_ji|²) ]   (6)

is performed on the last qubit to obtain |ϕ_A⟩_j,

(1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩|0⟩ —R_y→ (1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|v_ji|²)|0⟩ + v_ji|1⟩).   (7)
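Eq. (6) is the standard single-qubit R_y rotation with angle 2 sin⁻¹ v_ji. A quick numerical sketch (helper names are our own) confirms that it sends |0⟩ to √(1−v²)|0⟩ + v|1⟩, which is exactly the per-feature factor in Eq. (7):

```python
import math

def ry(theta):
    """Standard rotation R_y(theta) as a 2x2 matrix (nested lists)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def encode_feature(v):
    """Apply R_y(2*arcsin(v)) to |0> = (1, 0); returns (sqrt(1-v^2), v)."""
    m = ry(2 * math.asin(v))
    return [m[0][0], m[1][0]]  # first column = image of |0>
```

For binary feature values this reduces to the two gates used in the example of Sect. 4: v = 1 gives R_y(π) = −iY, and v = 0 gives the identity.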

3.2 Similarity calculation

The similarity is a coefficient that describes how close two samples are, and it is obviously inversely related to the Euclidean distance. After the state preparation, the main work of this phase is to calculate the similarity between |ϕ⟩ and the other states in the sets {|ϕ_A⟩_j} and {|ϕ_B⟩_k}, where |ϕ⟩ is a state selected randomly from {|ϕ_A⟩_j} or {|ϕ_B⟩_k}. For simplicity, suppose |ϕ⟩ is the l-th state of {|ϕ_A⟩_j},

|ϕ⟩ = (1/√N) |l⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|u_i|²)|0⟩ + u_i|1⟩),   (8)

which corresponds to the chosen sample uu from sample set AA in the classical scenario. The detailed process is as follows.

First, a swap operation is performed on the last two qubits of |ϕ\left|\phi\right\rangle to obtain a new state,

|φ⟩ = (1/√N) |l⟩ Σ_{i=0}^{N−1} |i⟩(√(1−|u_i|²)|0⟩ + u_i|1⟩)|1⟩.   (9)

Second, the swap test [23] operation (its circuit is given in Fig. 3) is applied on (|φ⟩, |ϕ_A⟩_j),

|0⟩|φ⟩|ϕ_A⟩_j —swap test→ ½|0⟩(|φ⟩|ϕ_A⟩_j + |ϕ_A⟩_j|φ⟩) + ½|1⟩(|φ⟩|ϕ_A⟩_j − |ϕ_A⟩_j|φ⟩).   (10)
Figure 3: The circuit of the swap test operation. Here H is the Hadamard gate, and • represents the control qubit conditioned on being set to one.

If we measure the first qubit with |1⟩⟨1| ⊗ I ⊗ I, the probability of the measurement result being |1⟩ is

P_j^l(1) = [½⟨0|(⟨φ|⟨ϕ_A|_j + ⟨ϕ_A|_j⟨φ|) + ½⟨1|(⟨φ|⟨ϕ_A|_j − ⟨ϕ_A|_j⟨φ|)] (|1⟩⟨1| ⊗ I ⊗ I) [½|0⟩(|φ⟩|ϕ_A⟩_j + |ϕ_A⟩_j|φ⟩) + ½|1⟩(|φ⟩|ϕ_A⟩_j − |ϕ_A⟩_j|φ⟩)]
        = [½⟨1|(⟨φ|⟨ϕ_A|_j − ⟨ϕ_A|_j⟨φ|)] [½|1⟩(|φ⟩|ϕ_A⟩_j − |ϕ_A⟩_j|φ⟩)]
        = ½ − ½|⟨φ|ϕ_A⟩_j|².   (11)

As we know, the inner product between |φ\left|\varphi\right\rangle and prepared state |ϕAj{\left|{{\phi_{A}}}\right\rangle_{j}} can be calculated as below,

⟨φ|ϕ_A⟩_j = (1/N) Σ_i (u_i)* v_ji = (1/N) ⟨u|v_j⟩.   (12)

From Eqs. (11) and (12), we can get the similarity between samples u and v_j,

|⟨u|v_j⟩|² = (1 − 2P_j^{l(A)}(1)) N².   (13)

Finally, we can find the max-similarity sample v_max ∈ A through a classical maximum-traversal search over the set {|⟨u|v_j⟩|²}.

Through the same method, we can also find the other max-similarity sample w_max ∈ B. When all these steps are finished, the algorithm turns to the next phase.
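Eqs. (11)–(13) chain together as follows. This sketch is our own illustration (function names are assumptions, and it uses exact probabilities rather than finite-shot estimates): compute the overlap of Eq. (12), obtain the ancilla statistics of Eq. (11), and invert Eq. (13) to recover the similarity.

```python
def swap_test_p1(overlap):
    """Ancilla P(1) in the swap test, Eq. (11): 1/2 - 1/2 |<phi'|phi_A>|^2."""
    return 0.5 - 0.5 * abs(overlap) ** 2

def similarity_from_p1(p1, N):
    """Invert Eq. (13): |<u|v_j>|^2 = (1 - 2 P(1)) N^2."""
    return (1 - 2 * p1) * N ** 2

def estimate_similarity(u, v):
    """Exact (infinite-shot) simulation of the pipeline for feature vectors u, v."""
    N = len(u)
    overlap = sum(a * b for a, b in zip(u, v)) / N  # Eq. (12)
    return similarity_from_p1(swap_test_p1(overlap), N)
```

On real hardware P(1) is only estimated from repeated measurements, so the recovered similarity carries statistical error scaled by N²; the exact values here are the noise-free limit.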

3.3 Weight vector update

The first step is to determine the closest same-class sample Near-hit and the closest different-class sample Near-miss, which obey the following rule,

Near-hit = v_max, Near-miss = w_max   if u ∈ A;
Near-hit = w_max, Near-miss = v_max   if u ∈ B.   (14)

After determining Near-hit and Near-miss, we can update every element of the weight vector WT = (wt_1, wt_2, …, wt_N)^T with them,

wt_i = wt_i − diff(i, u, Near-hit)² + diff(i, u, Near-miss)²,   (15)

where 1iN1\leq i\leq N.

3.4 Feature selection

After iterating the similarity calculation and weight vector update T times, the algorithm exits the loop with the final vector WT as the output. The remainder of the algorithm selects the "really" relevant features.

We first divide WT by T, and obtain its mean vector WT̄,

WT̄ = (1/T) WT.   (16)

Then, we select the relevant features according to the preset threshold τ. To be specific, a feature is selected if its corresponding value in WT̄ is not less than τ, and discarded otherwise,

the i-th feature is relevant        if WT̄_i ≥ τ;
the i-th feature is NOT relevant    if WT̄_i < τ.   (17)
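The selection phase of Eqs. (16)–(17) is purely classical and can be sketched in a few lines (our own helper name; it returns indices rather than printing verdicts):

```python
def select_features(WT, T, tau):
    """Average the accumulated weights (Eq. (16)) and threshold them (Eq. (17)).

    Returns the indices of the relevant features.
    """
    WT_bar = [wt / T for wt in WT]          # Eq. (16)
    return [i for i, w in enumerate(WT_bar) if w >= tau]  # Eq. (17)
```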

4 Example and its experiment

Suppose there are four samples (see Tab. 1): S_0 = (1,0,1,0), S_1 = (1,0,0,0), S_2 = (0,1,1,0), S_3 = (0,1,0,0); thus the n in Eq. (4) is 2. They belong to two classes, A = {S_0, S_1} and B = {S_2, S_3}, as illustrated in Fig. 4.

Table 1: The feature values of four samples. Each row represents all the feature values of a certain sample, while each column denotes a certain feature value of all the samples.
      F0   F1   F2   F3
S0    1    0    1    0
S1    1    0    0    0
S2    0    1    1    0
S3    0    1    0    0
Figure 4: The simple example with four samples in classes A and B.

First, the four initial quantum states are prepared as follows,

|ψ⟩_{S0} = |00⟩|0⟩^⊗2|1⟩|0⟩,
|ψ⟩_{S1} = |01⟩|0⟩^⊗2|1⟩|0⟩,
|ψ⟩_{S2} = |10⟩|0⟩^⊗2|1⟩|0⟩,
|ψ⟩_{S3} = |11⟩|0⟩^⊗2|1⟩|0⟩.   (18)

Taking |ψ⟩_{S0} as an example, the H^⊗2 operation is applied on the third and fourth qubits,

|00⟩|0⟩^⊗2|1⟩|0⟩ —H^⊗2→ ½|00⟩ Σ_{i=0}^{3} |i⟩|1⟩|0⟩.   (19)

Then we perform the R_y rotation (see Eq. (6)) on the last qubit, where

R_y(2 sin⁻¹ v_00) = R_y(2 sin⁻¹ v_02) = −iY = [ 0  −1 ; 1  0 ],
R_y(2 sin⁻¹ v_01) = R_y(2 sin⁻¹ v_03) = I = [ 1  0 ; 0  1 ],   (20)

and can get

|ϕ⟩_{S0} = ½|00⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_0i|²)|0⟩ + v_0i|1⟩).   (21)

The other quantum states are prepared in the same way and they are listed as below,

|ϕ⟩_{S0} = ½|00⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_0i|²)|0⟩ + v_0i|1⟩),
|ϕ⟩_{S1} = ½|01⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_1i|²)|0⟩ + v_1i|1⟩),
|ϕ⟩_{S2} = ½|10⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_2i|²)|0⟩ + v_2i|1⟩),
|ϕ⟩_{S3} = ½|11⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_3i|²)|0⟩ + v_3i|1⟩).   (22)

Second, we randomly select a sample (assume |ϕ⟩_{S0} is the one), and perform the similarity calculation with the other samples (i.e., |ϕ⟩_{S1}, |ϕ⟩_{S2}, |ϕ⟩_{S3}). Taking |ϕ⟩_{S0} and |ϕ⟩_{S1} as an example, we perform a swap operation between the last two qubits of |ϕ⟩_{S0},

|ϕ⟩_{S0} —swap→ |φ⟩ = ½|00⟩ Σ_{i=0}^{3} |i⟩(√(1−|v_0i|²)|0⟩ + v_0i|1⟩)|1⟩.   (23)

After that, the swap test operation is applied on (|φ⟩, |ϕ⟩_{S1}),

|0⟩|φ⟩|ϕ⟩_{S1} —swap test→ ½|0⟩(|φ⟩|ϕ⟩_{S1} + |ϕ⟩_{S1}|φ⟩) + ½|1⟩(|φ⟩|ϕ⟩_{S1} − |ϕ⟩_{S1}|φ⟩),   (24)

then we measure the result shown in Eq. (24), and obtain the probability of the first qubit being |1⟩ as

P_1^0(1) = ½ − ½|⟨φ|ϕ⟩_{S1}|².   (25)

In terms of Eq. (12), the inner product between $\left|\varphi\right\rangle$ and $\left|\phi\right\rangle_{S_1}$ can be represented as

\left\langle \varphi \middle| \phi \right\rangle_{S_1} = \frac{1}{4}\sum_{i} S_{0i}^{*} S_{1i} = \frac{1}{4}\left\langle S_0 \middle| S_1 \right\rangle. \qquad (26)

From Eqs. (25) and (26), the similarity between $\left|S_0\right\rangle$ and $\left|S_1\right\rangle$ is

\left| \left\langle S_0 \middle| S_1 \right\rangle \right|^{2} = 16\left(1 - 2P_1^0(1)\right), \qquad (27)

where the value of $P_1^0(1)$ is determined by measurement.
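As a sanity check on Eqs. (25)-(27), the swap-test identity can be simulated classically. The following sketch uses hypothetical 2-feature unit vectors (not the paper's sample set) and embeds them with the $1/2$ amplitude factor of Eq. (23), padding with mutually orthogonal components so that $\langle\varphi|\psi\rangle = \frac{1}{4}\langle S_0|S_1\rangle$ as in Eq. (26):

```python
import numpy as np

def swap_test_p1(phi, psi):
    # Ancilla |1> probability of a swap test (Eq. (25)):
    # P(1) = 1/2 - 1/2 * |<phi|psi>|^2
    return 0.5 - 0.5 * abs(np.vdot(phi, psi)) ** 2

# Hypothetical 2-feature unit vectors (not the paper's sample set)
S0 = np.array([0.6, 0.8])
S1 = np.array([0.8, 0.6])

# Embed each with the 1/2 amplitude factor of Eq. (23); the padding
# components are mutually orthogonal, so <phi|psi> = (1/4)<S0|S1> (Eq. (26))
phi = np.array([*(S0 / 2), np.sqrt(3) / 2, 0.0])
psi = np.array([*(S1 / 2), 0.0, np.sqrt(3) / 2])

p1 = swap_test_p1(phi, psi)
similarity = 16 * (1 - 2 * p1)          # Eq. (27)
print(round(similarity, 4))             # 0.9216 = <S0|S1>^2
```

The recovered value equals $\left|\langle S_0|S_1\rangle\right|^2 = (0.96)^2 = 0.9216$, confirming the factor of 16 in Eq. (27).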

In order to obtain the measurement result and also to verify our algorithm, we choose IBM Q [24] to perform the quantum processing (Fig. 5 gives the schematic diagram of our experimental circuit). In the experiment, we program our algorithm with the QISKit toolkit [25] in the Python language, and remotely connect to the online IBM QX5 device to execute the quantum processing. After the experiment, we obtain the values of $P_1^0(1)$ shown in Tab. 2.
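For readers without access to the IBM QX5 device, the swap-test circuit of Fig. 5 can also be checked with a plain statevector simulation. The sketch below is a minimal single-qubit version (the experiment itself uses 6-qubit registers) written with NumPy only; the two demo states are assumptions for illustration:

```python
import numpy as np

# Hypothetical single-qubit demo states (the experiment itself uses the
# 6-qubit registers q[0]-q[5] and q[6]-q[11])
phi = np.array([np.cos(0.3), np.sin(0.3)])
psi = np.array([np.cos(1.0), np.sin(1.0)])

# 3-qubit statevector |ancilla>|phi>|psi>, ancilla (qubit 0) starts in |0>
state = np.kron(np.array([1.0, 0.0]), np.kron(phi, psi))

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I4 = np.eye(4)
H_anc = np.kron(H, I4)                  # Hadamard on the ancilla only

# Controlled-SWAP: swap the two data qubits when the ancilla is |1>
SWAP = np.eye(4)[[0, 2, 1, 3]]
CSWAP = np.block([[I4, np.zeros((4, 4))],
                  [np.zeros((4, 4)), SWAP]])

# H - CSWAP - H, i.e., the standard swap-test circuit
state = H_anc @ (CSWAP @ (H_anc @ state))

# P(1): total probability of the ancilla-|1> components (indices 4..7)
p1 = np.sum(np.abs(state[4:]) ** 2)
expected = 0.5 - 0.5 * abs(np.vdot(phi, psi)) ** 2   # Eq. (25)
```

The simulated $P(1)$ agrees with the analytic value of Eq. (25), which is what the hardware runs in Tab. 2 estimate statistically.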

Refer to caption
Figure 5: One of the ideal experimental circuits of the QRelief algorithm running on the IBM Q platform. $q[0]$-$q[5]$ represent the randomly selected quantum state $\left|\phi\right\rangle_{S_0}$, $q[6]$-$q[11]$ represent $\left|\phi\right\rangle_{S_1}$, and $q[12]$ is the result qubit. H is the Hadamard gate; X and Y are the Pauli-X and Pauli-Y gates; two crosses connected by a line denote the swap operation; $\circ$ denotes a control conditioned on the qubit being $\left|0\right\rangle$, and $\bullet$ a control conditioned on the qubit being $\left|1\right\rangle$.
Table 2: The probabilities $P_j^l(1)$ of the first qubit being $\left|1\right\rangle$

Iteration times ($T$)   $u$     Sample   $P_j^l(1)$
1                       $S_0$   $S_1$    0.49023438
                                $S_2$    0.49902344
                                $S_3$    0.49121094
2                       $S_1$   $S_0$    0.50097656
                                $S_2$    0.52246094
                                $S_3$    0.53417969
3                       $S_2$   $S_0$    0.50683594
                                $S_1$    0.50878906
                                $S_3$    0.49218750
4                       $S_3$   $S_0$    0.49804688
                                $S_1$    0.49218750
                                $S_2$    0.50195312

According to $P_1^0(1)$ (from Tab. 2) and Eq. (27), we can calculate the similarity between $\left|S_0\right\rangle$ and $\left|S_1\right\rangle$:

\left| \left\langle S_0 \middle| S_1 \right\rangle \right|^{2} = 16\left(1 - 2P_1^0(1)\right) = 16\left(1 - 2\times 0.49023438\right) \approx 0.3125. \qquad (28)

In the same way, the other two similarities, ($\left|S_0\right\rangle$, $\left|S_2\right\rangle$) and ($\left|S_0\right\rangle$, $\left|S_3\right\rangle$), can also be obtained (as illustrated in Tab. 3).

Table 3: Similarities between samples

        $S_0$   $S_1$    $S_2$     $S_3$
$S_0$   -       0.3125   0.03125   0.28125
$S_1$           -        0.71875   1.09375
$S_2$                    -         0.25
$S_3$                              -
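The first row of Tab. 3 follows directly from the $T=1$ probabilities measured in Tab. 2 via Eq. (27); a quick numerical check:

```python
# Recover the (S0, Sx) similarities of Tab. 3 from the T=1 probabilities
# in Tab. 2 via Eq. (27): |<S0|Sx>|^2 = 16 * (1 - 2 * P)
P = {"S1": 0.49023438, "S2": 0.49902344, "S3": 0.49121094}
sim = {k: round(16 * (1 - 2 * p), 5) for k, p in P.items()}
print(sim)   # {'S1': 0.3125, 'S2': 0.03125, 'S3': 0.28125}
```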

Third, from Tab. 3 it is easy to see that the Near-hit is $S_1$ and the Near-miss is $S_3$ (as shown in Fig. 6). Then the weight vector is updated by applying Eq. (15); the resulting $WT$ for the first iteration is listed in the second row of Tab. 4.

Refer to caption
Figure 6: Finding Near-hit and Near-miss.
Table 4: The update result of $WT$

Iteration times ($T$)   Weight vector ($WT$)
1                       [1 1 0 0]
2                       [2 2 -1 0]
3                       [3 3 -1 0]
4                       [4 4 -2 0]

The algorithm iterates the above steps $T$ times (in our example, $T=4$) and obtains all the $WT$ results shown in Tab. 4. After the $T$ iterations, $WT=[4,4,-2,0]$, and thus $\overline{WT}=[1,1,-1/2,0]$. Since the threshold is $\tau=0.5$, the selected features are $F_0$ and $F_1$, i.e., the first and second features.
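For comparison, one iteration of the classical weight update has the following shape. This is a hedged sketch using the textbook Relief rule (Eq. (15) in the paper may differ in detail), with hypothetical binary feature vectors rather than the sample set of our example:

```python
import numpy as np

def relief_update(WT, u, near_hit, near_miss):
    # Penalize features differing from the Near-hit,
    # reward features differing from the Near-miss
    return WT - np.abs(u - near_hit) + np.abs(u - near_miss)

# Hypothetical binary feature vectors with 4 features (as in Tab. 4)
u         = np.array([1, 1, 0, 0])
near_hit  = np.array([1, 1, 1, 0])
near_miss = np.array([0, 0, 0, 0])

WT = relief_update(np.zeros(4, dtype=int), u, near_hit, near_miss)
print(WT)   # [ 1  1 -1  0]
```

Features that agree with the Near-hit and disagree with the Near-miss accumulate positive weight, which is exactly the pattern visible in Tab. 4.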

5 Efficiency analysis

In the classical Relief algorithm, it takes O(N) time to calculate the Euclidean distance between $u$ and each of the $M$ samples, the complexity of finding the Near-hit and Near-miss is proportional to $M$, and the loop iterates $T$ times, so the computational complexity of the classical Relief algorithm is O(TMN). Since $T$ is a constant that affects the accuracy of the relevance levels but is chosen independently of $M$ and $N$, the complexity can be simplified to O(MN). In addition, an $N$-dimensional vector in the Hilbert space is represented by $N$ bits in a classical computer, and $M$ samples (i.e., $M$ $N$-dimensional vectors) need to be stored in the algorithm, so the classical Relief algorithm consumes $O(MN)$ bits of storage.
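The classical O(TMN) loop structure described above can be sketched as follows (a minimal NumPy implementation with the textbook update rule; the variable names and the toy data set are ours, not the paper's):

```python
import numpy as np

def classical_relief(X, y, T, tau, rng=None):
    """Plain Relief: T iterations, each scanning all M samples and all
    N features, hence O(TMN) time over the O(MN) stored data array."""
    if rng is None:
        rng = np.random.default_rng(0)
    M, N = X.shape
    WT = np.zeros(N)
    for _ in range(T):
        u = rng.integers(M)                       # random sample index
        d = np.linalg.norm(X - X[u], axis=1)      # O(MN) distance scan
        d[u] = np.inf                             # exclude u itself
        near_hit  = X[np.where(y == y[u], d, np.inf).argmin()]
        near_miss = X[np.where(y != y[u], d, np.inf).argmin()]
        WT += np.abs(X[u] - near_miss) - np.abs(X[u] - near_hit)
    return np.where(WT / T >= tau)[0]             # relevant feature indices

# Toy set where only the first feature separates the two classes
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]], dtype=float)
y = np.array([1, 1, 0, 0])
print(classical_relief(X, y, T=4, tau=0.5))       # [0]
```

The distance scan inside the loop is the O(MN) bottleneck that the quantum version replaces with swap-test estimates.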

In our quantum Relief algorithm, all the features of each sample are superposed in a quantum state $\left|\phi_A\right\rangle_j$ or $\left|\phi_B\right\rangle_k$, so the similarity calculation between two states, shown in Eq. (13), takes only $O(1)$ time. As in the Relief algorithm, the similarity between the selected state $\left|\varphi\right\rangle$ and each state in $\left\{\left|\phi_A\right\rangle\right\}$, $\left\{\left|\phi_B\right\rangle\right\}$ is calculated, taking $O(M)$ time, to obtain the Near-hit and Near-miss, and the loop iterates $T$ times, so the proposed algorithm takes O(TM) time in total. Since $T$ is a constant, the computational complexity can be rewritten as O(M). From the viewpoint of resource consumption, each quantum state in the state sets $\left\{\left|\phi_A\right\rangle_j\right\}$ and $\left\{\left|\phi_B\right\rangle_k\right\}$ is represented as

\left|\phi_A\right\rangle_j = \frac{1}{\sqrt{N}}\left|j\right\rangle \sum_{i=0}^{N-1} \left|i\right\rangle\left|1\right\rangle \left( \sqrt{1-\left|v_{ji}\right|^{2}}\,\left|0\right\rangle + v_{ji}\left|1\right\rangle \right)

or

\left|\phi_B\right\rangle_k = \frac{1}{\sqrt{N}}\left|k\right\rangle \sum_{i=0}^{N-1} \left|i\right\rangle\left|1\right\rangle \left( \sqrt{1-\left|w_{ki}\right|^{2}}\,\left|0\right\rangle + w_{ki}\left|1\right\rangle \right),

and it consists of $O(\log_2 N)$ qubits. Since $j=1,2,\cdots,M_1$, $k=1,2,\cdots,M_2$ and $M=M_1+M_2$, there are $M$ such quantum states to be stored in our algorithm, so it consumes $O(M\log N)$ qubits of storage.
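As a concrete (purely illustrative) count, assume $N=1024$ features, $M=1000$ samples, and a constant two extra work qubits per state beyond the $\lceil\log_2 N\rceil$-qubit index register; the exact constant depends on the encoding:

```python
from math import ceil, log2

M, N = 1000, 1024                 # illustrative sizes, not the paper's example
classical_bits = M * N            # O(MN): one value per feature per sample
qubits = M * (ceil(log2(N)) + 2)  # O(M log N): index register + work qubits
print(classical_bits, qubits)     # 1024000 12000
```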

Tab. 5 illustrates the efficiency comparison, covering both computational complexity and resource consumption, between the classical Relief and our quantum Relief algorithms. As shown in the table, the computational complexity of our algorithm is $O(M)$, which is clearly superior to the $O(MN)$ of the classical Relief algorithm. On the other hand, our algorithm consumes $O(M\log N)$ qubits, while the classical Relief algorithm consumes $O(MN)$ bits.

Table 5: Efficiency comparison between classical Relief and quantum Relief algorithms

                            Complexity   Resource consumption
Relief algorithm            $O(MN)$      $O(MN)$ bits
Quantum Relief algorithm    $O(M)$       $O(M\log N)$ qubits

6 Conclusion and Discussion

With quantum computing technologies nearing the era of commercialization and quantum supremacy, machine learning has recently been regarded as one of their "killer" applications. In this study, we utilize quantum computing technologies to solve a simple feature selection problem (used in binary classification) and propose the quantum Relief algorithm, which consists of four phases: state preparation, similarity calculation, weight vector update, and feature selection. Furthermore, we verify our algorithm by performing quantum experiments on the IBM Q platform. Compared with the classical Relief algorithm, our algorithm has lower computational complexity and less resource consumption (in terms of the number of storage units). To be specific, the complexity is reduced from $O(MN)$ to $O(M)$, and the consumed resource is reduced from $O(MN)$ bits to $O(M\log N)$ qubits.

Although this work focuses only on the feature selection problem (and the simple binary classification setting), the method can be generalized to implement other Relief-like algorithms, such as ReliefF [26] and RReliefF [27]. Besides, we are interested in utilizing quantum technologies to deal with classical high-dimensional massive-data processing tasks, such as text classification, video stream processing, data mining, computer vision, information retrieval, and bioinformatics.

Acknowledgements.
The authors would like to thank the anonymous reviewers and editor for their comments that improved the quality of this paper. This work is supported by the National Nature Science Foundation of China (Grant Nos. 61502101, 61501247 and 61672290), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20171458), the Six Talent Peaks Project of Jiangsu Province, China (Grant No. 2015-XXRJ-013), the Natural Science Foundation for Colleges and Universities of Jiangsu Province, China (Grant No. 16KJB520030), the Practice Innovation Training Program Projects for Jiangsu College Students (Grant No. 201810300016Z), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

References

  • (1) Mohri M., Rostamizadeh A., Talwalkar A.: Foundations of machine learning. MIT Press, Cambridge, Massachusetts (2012)
  • (2) Bellman, R. E.: Dynamic programming. Princeton University Press, Princeton, NJ (1957)
  • (3) Roweis, S. T., Saul, L. K.: Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326 (2000)
  • (4) Liu, H., Motoda, H.: Feature selection for knowledge discovery and data mining. Springer US (2012)
  • (5) Liu, H., Motoda, H.: Computational methods of feature selection. CRC Press (2007)
  • (6) Kira, K., Rendell, L. A.: A practical approach to feature selection. In: Proc. of 9th International Workshop on Machine Learning, pp. 249-256, Morgan Kaufmann Publishers, San Francisco, CA, USA (1992)
  • (7) Zhang, J., Lin, H., Zhao, M.: A fast algorithm for hand gesture recognition using Relief. In: Proc. of 6th International Conference on Fuzzy Systems and Knowledge Discovery, vol. 1, pp. 8-12, IEEE Press, Piscataway, NJ, USA (2009)
  • (8) Amjady, N., Daraeepour, A.: Day-ahead electricity price forecasting using the relief algorithm and neural networks. In: Proc. of 5th International Conference on the European Electricity Market, pp. 1-7, IEEE Press, Lisboa, Portugal (2008)
  • (9) Dai, Y., Chen, L., Zhang, W., Min, Y.: Multi-support vector machine power system transient stability assessment based on relief algorithm. In: Proc. of 2015 IEEE PES Asia-Pacific Power and Energy Engineering, pp. 1-5. IEEE Press, Brisbane, QLD, Australia (2015)
  • (10) Feynman, R. P.: Quantum mechanical computers. Found. Phys. 16(6), 507-531 (1986)
  • (11) Deutsch, D., Jozsa, R.: Rapid solution of problems by quantum computation. In: Proc. of Royal Society of London, Series A: Mathematical and Physical Sciences. 439(1907), pp. 553-558, the Royal Society (1992)
  • (12) Deutsch, D.: Quantum theory, the church-turing principle and the universal quantum computer. In: Proc. of Royal Society of London, Series A: Mathematical and Physical Sciences. 400(1818), pp. 97-117, the Royal Society (1985)
  • (13) Shor, P. W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41(2), 303-332 (1999)
  • (14) Grover, L. K.: A fast quantum mechanical algorithm for database search. In: Proc. of 28th Annual ACM Symposium on Theory of Computing, pp. 212-219, ACM Press, New York, NY, USA (1996)
  • (15) Sasaki, M., Carlini, A.: Quantum learning and universal quantum matching machine. Phys. Rev. A. 66(2), 022303 (2002)
  • (16) Servedio, R. A., Gortler, S. J.: Equivalences and separations between quantum and classical learnability. SIAM J. Comput. 33(5), 1067-1092 (2004)
  • (17) Hentschel, A., Sanders, B. C.: Machine learning for precise quantum measurement. Phys. Rev. Lett. 104(6), 063603 (2010)
  • (18) Neven, H., Denchev, V. S., Rose, G., Macready, W. G.: Training a large scale classifier with the quantum adiabatic algorithm. arXiv:quant-ph/0912.0779v1 (2009)
  • (19) Pudenz, K. L., Lidar, D. A.: Quantum adiabatic machine learning. Quantum Inf. Process. 12(5), 2027-2070 (2013)
  • (20) Anguita, D., Ridella, S., Rivieccio, F., Zunino, R.: Quantum optimization for training support vector machines. Neural Netw. 16(5-6), 763-770 (2003)
  • (21) Wiebe, N., Kapoor, A., Svore, K.: Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quantum Info. Comput. 15(3-4), 316-356 (2015)
  • (22) Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum algorithms for supervised and unsupervised machine learning. arXiv:quant-ph/1307.0411 (2013)
  • (23) Buhrman, H., Cleve, R., Watrous, J., De, W. R.: Quantum fingerprinting. Phys. Rev. Lett. 87(16), 167902 (2001)
  • (24) IBM quantum computing platform. https://www.research.ibm.com/ibm-q/. Online accessed 12-April-2018
  • (25) QISKit: Open Source Quantum Information Science Kit. https://qiskit.org/. Online accessed 12-April-2018
  • (26) Kononenko, I., Simec, E., Robniksikonja, M.: Overcoming the myopia of inductive learning algorithms with RELIEFF. Appl. Intell. 7(1), 39-55 (1997)
  • (27) Robnik-Sikonja, M., Kononenko, I.: An adaptation of Relief for attribute estimation in regression. In: Proc. of 14th International conference on Machine Learning, pp. 296-304, Morgan Kaufmann Publishers, Nashville, TN, USA (1997)