
W.-J. Liu
Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science & Technology, Nanjing 210044, P.R. China
e-mail: [email protected]

W.-J. Liu · P.-P. Gao · W.-B. Yu · Z.-G. Qu
School of Computer and Software, Nanjing University of Information Science & Technology, Nanjing 210044, P.R. China

C.-N. Yang
Department of Computer Science and Information Engineering, National Dong Hwa University, Hualien 974, Taiwan

Quantum Relief Algorithm

Wen-Jie Liu    Pei-Pei Gao    Wen-Bin Yu    Zhi-Guo Qu    Ching-Nung Yang
(Received: date / Accepted: date)
Abstract

The Relief algorithm is a feature selection algorithm for binary classification proposed by Kira and Rendell, and its computational complexity increases remarkably with both the number of samples and the number of features. In order to reduce this complexity, a quantum feature selection algorithm based on the Relief algorithm, also called the quantum Relief algorithm, is proposed. In the algorithm, all features of each sample are superposed in a certain quantum state through the CMP and rotation operations; then the swap test and measurement are applied to this state to obtain the similarity between two samples. After that, Near-hit and Near-miss are obtained by finding the maximal similarity, and are further applied to update the feature weight vector WT to get WT′, which determines the relevant features with the threshold τ. In order to verify our algorithm, a simulation experiment based on IBM Q with a simple example is performed. Efficiency analysis shows that the computational complexity of our proposed algorithm is O(M), while the complexity of the original Relief algorithm is O(NM), where N is the number of features of each sample, and M is the size of the sample set. Obviously, our quantum Relief algorithm achieves a significant speedup over the classical one.

Keywords:
quantum Relief algorithm · feature selection · CMP operation · rotation operation · swap test · IBM Q

1 Introduction

Machine learning refers to an area of computer science in which patterns are derived (“learned”) from data with the goal to make sense of previously unknown inputs. As part of both artificial intelligence and statistics, machine learning algorithms process large amounts of information for tasks that come naturally to the human brain, such as image recognition, pattern identification and strategy optimization.

Machine learning tasks are typically classified into two broad categories [1], supervised and unsupervised machine learning, depending on whether a learning "signal" or "feedback" is available to the learning system. In supervised machine learning, the learner is provided with a set of training examples, with features presented in the form of high-dimensional vectors and with corresponding labels to mark their categories. In unsupervised machine learning, on the contrary, no labels are given to the learning algorithm, leaving it on its own to find structure. The core mathematical task for both supervised and unsupervised machine learning algorithms is to evaluate distances and inner products between high-dimensional vectors, which requires time proportional to the size of the vectors on classical computers. As we know, calculating distances in high dimensions suffers from the "curse of dimensionality" [2]. One possible solution is dimension reduction [3]; another is feature selection [4, 5].

The Relief algorithm is one of the most representative feature selection algorithms, first proposed by Kira et al. [6]. The algorithm devises a "relevance statistic" to weigh the importance of each feature, and it has been widely applied in many fields, such as hand gesture recognition [7], electricity price forecasting [8] and power system transient stability assessment [9]. However, the Relief algorithm occupies a large amount of computational resources when the numbers of samples and features become huge, which restricts its application.

Since the concept of the quantum computer was proposed by the famous physicist Feynman [10], a number of remarkable results have been obtained. For example, Deutsch's algorithms [11, 12] embody the superiority of quantum parallel computation, Shor's algorithm [13] solves the integer factorization problem in polynomial time, and Grover's algorithm [14] offers a quadratic speedup for searching an unstructured space. With the properties of superposition and entanglement, quantum computation has a potential advantage in dealing with high-dimensional vectors, which attracts researchers to apply quantum mechanics to classical machine learning tasks, such as quantum pattern matching [15], quantum probably approximately correct learning [16], feedback learning for quantum measurement [17], quantum binary classifiers [18, 19], and quantum support vector machines [20].

Even in quantum machine learning, we still face the "curse of dimensionality"; thus dimension reduction or feature selection is a necessary preliminary step before training on high-dimensional samples. Inspired by the idea of computing the inner product between two vectors [21, 22], we propose a Relief-based quantum parallel algorithm, also named the quantum Relief algorithm, to perform feature selection effectively.

The outline of this paper is as follows. The classical Relief algorithm is briefly reviewed in Sect. 2, the proposed quantum Relief algorithm is described in detail in Sect. 3, and a simulation experiment based on IBM Q with a simple example is given in Sect. 4. Subsequently, the efficiency of the algorithm is analyzed in Sect. 5, and a brief conclusion and the outlook of our work are given in the last section.

2 Review of Relief algorithm

Relief is a feature selection algorithm for binary classification (generalizable to multi-class classification by decomposition into a number of binary problems) proposed by Kira and Rendell [6] in 1992. It is efficient in estimating features according to how well their values distinguish among samples that are near each other.

We can divide an M-sample set into two vector sets: A = {v_j | v_j ∈ ℝ^N, j = 1, 2, …, M_1} and B = {w_k | w_k ∈ ℝ^N, k = 1, 2, …, M_2}, where v_j, w_k are N-feature samples: v_j = (v_j1, v_j2, …, v_jN)^T, w_k = (w_k1, w_k2, …, w_kN)^T, with v_j1, …, v_jN, w_k1, …, w_kN ∈ {0, 1}, and the weight vector of the N features, WT = (wt_1, wt_2, …, wt_N)^T, is initialized to all zeros. Suppose the upper limit of iterations is T, and the relevance threshold (which differentiates the relevant and irrelevant features) is τ (0 ≤ τ ≤ 1). The process of the Relief algorithm is described in Algorithm 1.

Init WT = (0, …, 0)^T.
for t = 1 to T do
      Pick a sample u randomly.
      Find the closest samples Near-hit and Near-miss in the two classes A and B.
      for i = 1 to N do
            WT[i] = WT[i] − diff(i, u, Near-hit)² + diff(i, u, Near-miss)².
      end for
end for
Select the most relevant features according to WT and τ.
ALGORITHM 1 Relief(A, B, T, τ)

At each iteration, pick a random sample u, and then select the samples closest to u (by N-dimensional Euclidean distance) from each class. The closest same-class sample is called Near-hit, and the closest different-class sample is called Near-miss. Update the weight vector such that,

WT[i] = WT[i] − diff(i, u, Near-hit)² + diff(i, u, Near-miss)²,   (1)

where the function diff(i, u, v) is defined as below,

diff(i, u, v) = { 0 if u_i = v_i;  1 if u_i ≠ v_i }.   (2)

Relief was also described as generalizable to multi-class classification by decomposition into a number of binary problems. However, as the scale of the samples and the number of features increase, its efficiency drastically declines.
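For concreteness, the loop of Algorithm 1 can be sketched in plain Python for binary features. This is our own illustration: the random alternation between classes, the tie-breaking of `min`, and the function names are assumptions, not fixed by the original description.

```python
import random

def diff(i, u, v):
    """Eq. (2): 0 if the i-th features agree, 1 otherwise."""
    return 0 if u[i] == v[i] else 1

def hamming(u, v):
    # For binary features, squared Euclidean distance equals Hamming distance.
    return sum(a != b for a, b in zip(u, v))

def relief(A, B, T, tau, rng=random.Random(0)):
    """Classical Relief: returns indices of features whose averaged weight reaches tau."""
    N = len(A[0])
    WT = [0.0] * N
    for _ in range(T):
        # Pick a random sample u from a randomly chosen class.
        same, other = (A, B) if rng.random() < 0.5 else (B, A)
        u = rng.choice(same)
        near_hit = min((v for v in same if v is not u), key=lambda v: hamming(u, v))
        near_miss = min(other, key=lambda v: hamming(u, v))
        for i in range(N):  # Eq. (1)
            WT[i] += -diff(i, u, near_hit) ** 2 + diff(i, u, near_miss) ** 2
    # Average over T iterations and threshold, as in the selection step.
    return [i for i, wt in enumerate(WT) if wt / T >= tau]
```

On the four-sample example used later in Sect. 4 (with, e.g., τ = 0.5), every iteration adds the same increment (1, 1, −1, 0), so the first two features come out as relevant.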

3 Quantum Relief Algorithm

In order to handle the problem of large numbers of samples and features, we propose a quantum Relief algorithm. Suppose the sample sets A = {v_j, j = 1, 2, …, M_1} and B = {w_k, k = 1, 2, …, M_2}, the weight vector WT, the upper limit T and the threshold τ are the same as in the classical Relief algorithm defined in Sect. 2. Different from the classical case, all the features of each sample are represented in a quantum superposition state, and the similarity between two samples can be calculated in parallel. Algorithm 2 shows the detailed procedure of the quantum Relief algorithm.

Init WT = (0, …, 0)^T.
Prepare the states for sample sets A and B through the CMP and rotation operations,
      |ϕ_A⟩_j = (1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|v_ji|²)|0⟩ + v_ji|1⟩),
      |ϕ_B⟩_k = (1/√N) |k⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|w_ki|²)|0⟩ + w_ki|1⟩).
for t = 1 to T do
      Select a state |ϕ⟩ randomly from the state set {|ϕ_A⟩_j} or {|ϕ_B⟩_k}.
      Perform a swap operation on |ϕ⟩ to get |φ⟩ = (1/√N) |l⟩ Σ_{i=0}^{N−1} |i⟩(√(1−|u_i|²)|0⟩ + u_i|1⟩)|1⟩.
      Get |⟨u|v_j⟩|² = (1 − 2P_j^{l(A)}(1))N² and |⟨u|w_k⟩|² = (1 − 2P_k^{l(B)}(1))N² through the swap test and measurement operations.
      Obtain the maximum similarities: max{|⟨u|v_j⟩|²} and max{|⟨u|w_k⟩|²}.
      if u belongs to class A then
            Near-hit = v_max, Near-miss = w_max.
      else
            Near-hit = w_max, Near-miss = v_max.
      end if
      for i = 1 to N do
            wt_i = wt_i − diff(i, u, Near-hit)² + diff(i, u, Near-miss)².
      end for
end for
WT̄ = (1/T) WT.
for i = 1 to N do
      if WT̄_i ≥ τ then
            the i-th feature is relevant.
      else
            the i-th feature is not relevant.
      end if
end for
ALGORITHM 2 QRelief(A, B, T, τ)

3.1 State preparation

At the beginning of the algorithm, the classical information is converted into quantum superposition states. The quantum superposition state sets {|ϕ_A⟩_j | j = 1, 2, …, M_1} and {|ϕ_B⟩_k | k = 1, 2, …, M_2}, which store all the feature values of v_j ∈ A and w_k ∈ B, are prepared as below,

|ϕ_A⟩_j = (1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|v_ji|²)|0⟩ + v_ji|1⟩),
|ϕ_B⟩_k = (1/√N) |k⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|w_ki|²)|0⟩ + w_ki|1⟩),   (3)

where v_ji and w_ki represent the i-th feature value of the vectors v_j and w_k, respectively. Suppose the initial state is |j⟩|0⟩^⊗n|1⟩|0⟩ (n = ⌈log₂ N⌉) and we want to prepare the state |ϕ_A⟩_j; its construction consists of the following steps.

First of all, the H (i.e., Hadamard gate) and CMP operations are performed on |0⟩^⊗n to obtain the state (1/√N) Σ_{i=0}^{N−1} |i⟩,

|0⟩^⊗n —(H and CMP operations)→ (1/√N) Σ_{i=0}^{N−1} |i⟩.   (4)

Fig. 1 depicts the detailed circuit of these operations, where the CMP operation is the key component, used to tailor the state (1/√(2^n)) Σ_{i=0}^{2^n−1} |i⟩ into the target state (1/√N) Σ_{i=0}^{N−1} |i⟩; its definition is

CMP |i⟩|N⟩|0⟩ = { |i⟩|N⟩|0⟩ if i < N;  |i⟩|N⟩|1⟩ if i ≥ N }.   (5)

After the CMP operation, the basis states whose index is not less than N (i.e., whose last qubit is |1⟩) are clipped off.
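The effect of Eqs. (4)–(5) can be checked with a small classical simulation. This sketch is our own illustration (the function name and the explicit postselection step are assumptions, not the paper's circuit): H^⊗n produces a uniform superposition over 2^n indices, CMP flags every index i ≥ N on an ancilla, and measuring the ancilla as 0 keeps and renormalizes the i < N branch.

```python
import math

def tailored_superposition(N):
    """Classically simulate H + CMP: amplitudes of (1/sqrt(N)) * sum_{i<N} |i>."""
    n = max(1, math.ceil(math.log2(N)))
    amps = [1 / math.sqrt(2 ** n)] * (2 ** n)            # after H^(tensor n)
    flags = [0 if i < N else 1 for i in range(2 ** n)]   # CMP ancilla, Eq. (5)
    kept = [a for a, f in zip(amps, flags) if f == 0]    # measure ancilla = 0
    norm = math.sqrt(sum(a * a for a in kept))           # renormalize the kept branch
    return [a / norm for a in kept]
```

Note that the postselection succeeds with probability N/2^n, which is at least 1/2; this cost is not visible in the circuit of Fig. 1.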

Figure 1: The circuit of the quantum operations performed on |0⟩^⊗n to obtain the state (1/√N) Σ_{i=0}^{N−1} |i⟩ when the measurement outcome is 0. Here, the thick horizontal line represents multiple qubits, while the thin horizontal line represents one qubit; H is the Hadamard gate, and CMP can be implemented using the circuit in Fig. 2.
Figure 2: The circuit for performing CMP, illustrated for single-qubit inputs |i_q⟩ and |N_q⟩. Here, the third line is the auxiliary qubit, while the last one is the result qubit, which will contain |1⟩ if i ≥ N, and |0⟩ if i < N.

Then, a unitary rotation operation

R_y(2 sin⁻¹ v_ji) = [ √(1−|v_ji|²)   −v_ji
                      v_ji           √(1−|v_ji|²) ]   (6)

is performed on the last qubit to obtain |ϕ_A⟩_j,

(1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩|0⟩ —R_y→ (1/√N) |j⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|v_ji|²)|0⟩ + v_ji|1⟩).   (7)
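Eq. (6) is the standard single-qubit R_y rotation with angle 2 sin⁻¹ v_ji. A quick numerical sketch (helper names are our own) confirms that it sends |0⟩ to √(1−v²)|0⟩ + v|1⟩, which is exactly the per-feature factor in Eq. (7):

```python
import math

def ry(theta):
    """Standard rotation R_y(theta) as a 2x2 matrix (nested lists)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def encode_feature(v):
    """Apply R_y(2*arcsin(v)) to |0> = (1, 0); returns (sqrt(1-v^2), v)."""
    m = ry(2 * math.asin(v))
    return [m[0][0], m[1][0]]  # first column = image of |0>
```

For binary feature values this reduces to the two gates used in the example of Sect. 4: v = 1 gives R_y(π) = −iY, and v = 0 gives the identity.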

3.2 Similarity calculation

The similarity is a coefficient that describes how close two samples are, and it is obviously inversely related to the Euclidean distance. After the state preparation, the main work of this phase is to calculate the similarity between |ϕ⟩ and the other states in the sets {|ϕ_A⟩_j} and {|ϕ_B⟩_k}, where |ϕ⟩ is a state selected randomly from {|ϕ_A⟩_j} or {|ϕ_B⟩_k}. For simplicity, suppose |ϕ⟩ is the l-th state of {|ϕ_A⟩_j},

|ϕ⟩ = (1/√N) |l⟩ Σ_{i=0}^{N−1} |i⟩|1⟩(√(1−|u_i|²)|0⟩ + u_i|1⟩),   (8)

which corresponds to the chosen sample uu from sample set AA in the classical scenario. The detailed process is as follows.

First, a swap operation is performed on the last two qubits of |ϕ\left|\phi\right\rangle to obtain a new state,

|φ⟩ = (1/√N) |l⟩ Σ_{i=0}^{N−1} |i⟩(√(1−|u_i|²)|0⟩ + u_i|1⟩)|1⟩.   (9)

Second, the swap test [23] operation (its circuit is given in Fig. 3) is applied on (|φ⟩, |ϕ_A⟩_j),

|0⟩|φ⟩|ϕ_A⟩_j —swap test→ ½|0⟩(|φ⟩|ϕ_A⟩_j + |ϕ_A⟩_j|φ⟩) + ½|1⟩(|φ⟩|ϕ_A⟩_j − |ϕ_A⟩_j|φ⟩).   (10)
Figure 3: The circuit of the swap test operation. Here H is the Hadamard gate, and • represents the control qubit conditioned on being set to one.

If we measure the first qubit with |1⟩⟨1| ⊗ I ⊗ I, the probability of the measurement result being |1⟩ is

P_j^l(1) = [½⟨0|(⟨φ|⟨ϕ_A|_j + ⟨ϕ_A|_j⟨φ|) + ½⟨1|(⟨φ|⟨ϕ_A|_j − ⟨ϕ_A|_j⟨φ|)] (|1⟩⟨1| ⊗ I ⊗ I) [½|0⟩(|φ⟩|ϕ_A⟩_j + |ϕ_A⟩_j|φ⟩) + ½|1⟩(|φ⟩|ϕ_A⟩_j − |ϕ_A⟩_j|φ⟩)]
        = [½⟨1|(⟨φ|⟨ϕ_A|_j − ⟨ϕ_A|_j⟨φ|)] [½|1⟩(|φ⟩|ϕ_A⟩_j − |ϕ_A⟩_j|φ⟩)]
        = ½ − ½|⟨φ|ϕ_A⟩_j|².   (11)

As we know, the inner product between |φ\left|\varphi\right\rangle and prepared state |ϕAj{\left|{{\phi_{A}}}\right\rangle_{j}} can be calculated as below,

⟨φ|ϕ_A⟩_j = (1/N) Σ_i (u_i)* v_ji = (1/N) ⟨u|v_j⟩.   (12)

From Eqs. (11) and (12), we can get the similarity between samples u and v_j,

|⟨u|v_j⟩|² = (1 − 2P_j^{l(A)}(1)) N².   (13)

Finally, we can find the max-similarity sample v_max ∈ A through a classical maximum-traversal search over the set {|⟨u|v_j⟩|²}.

Through the same method, we can also find the other max-similarity sample w_max ∈ B. When all these steps are finished, the algorithm turns to the next phase.
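Eqs. (11)–(13) chain together as follows. This sketch is our own illustration (function names are assumptions, and it uses exact probabilities rather than finite-shot estimates): compute the overlap of Eq. (12), obtain the ancilla statistics of Eq. (11), and invert Eq. (13) to recover the similarity.

```python
def swap_test_p1(overlap):
    """Ancilla P(1) in the swap test, Eq. (11): 1/2 - 1/2 |<phi'|phi_A>|^2."""
    return 0.5 - 0.5 * abs(overlap) ** 2

def similarity_from_p1(p1, N):
    """Invert Eq. (13): |<u|v_j>|^2 = (1 - 2 P(1)) N^2."""
    return (1 - 2 * p1) * N ** 2

def estimate_similarity(u, v):
    """Exact (infinite-shot) simulation of the pipeline for feature vectors u, v."""
    N = len(u)
    overlap = sum(a * b for a, b in zip(u, v)) / N  # Eq. (12)
    return similarity_from_p1(swap_test_p1(overlap), N)
```

On real hardware P(1) is only estimated from repeated measurements, so the recovered similarity carries statistical error scaled by N²; the exact values here are the noise-free limit.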

3.3 Weight vector update

The first step is to determine the closest same-class sample Near-hit and the closest different-class sample Near-miss, which obey the following rule,

Near-hit = v_max, Near-miss = w_max   if u ∈ A;
Near-hit = w_max, Near-miss = v_max   if u ∈ B.   (14)

After determining Near-hit and Near-miss, we can update every element of the weight vector WT = (wt_1, wt_2, …, wt_N)^T with them,

wt_i = wt_i − diff(i, u, Near-hit)² + diff(i, u, Near-miss)²,   (15)

where 1iN1\leq i\leq N.

3.4 Feature selection

After iterating the similarity calculation and weight vector update T times, the algorithm exits the loop with the final vector WT as the output. The remainder of the algorithm selects the "really" relevant features.

We first divide WT by T, and obtain its mean vector WT̄,

WT̄ = (1/T) WT.   (16)

Then, we select the relevant features according to the preset threshold τ. To be specific, a feature is selected if its corresponding value in WT̄ is not less than τ, and discarded otherwise,

the i-th feature is relevant        if WT̄_i ≥ τ;
the i-th feature is NOT relevant    if WT̄_i < τ.   (17)
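The selection phase of Eqs. (16)–(17) is purely classical and can be sketched in a few lines (our own helper name; it returns indices rather than printing verdicts):

```python
def select_features(WT, T, tau):
    """Average the accumulated weights (Eq. (16)) and threshold them (Eq. (17)).

    Returns the indices of the relevant features.
    """
    WT_bar = [wt / T for wt in WT]          # Eq. (16)
    return [i for i, w in enumerate(WT_bar) if w >= tau]  # Eq. (17)
```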

4 Example and its experiment

Suppose there are four samples (see Tab. 1): S_0 = (1,0,1,0), S_1 = (1,0,0,0), S_2 = (0,1,1,0), S_3 = (0,1,0,0); thus the n in Eq. (4) is 2. They belong to two classes, A = {S_0, S_1} and B = {S_2, S_3}, as illustrated in Fig. 4.

Table 1: The feature values of four samples. Each row represents all the feature values of a certain sample, while each column denotes a certain feature value of all the samples.
      F0   F1   F2   F3
S0    1    0    1    0
S1    1    0    0    0
S2    0    1    1    0
S3    0    1    0    0
Figure 4: The simple example with four samples in classes A and B.

First, the four initial quantum states are prepared as follows,

|ψ⟩_{S0} = |00⟩|0⟩^⊗2|1⟩|0⟩,
|ψ⟩_{S1} = |01⟩|0⟩^⊗2|1⟩|0⟩,
|ψ⟩_{S2} = |10⟩|0⟩^⊗2|1⟩|0⟩,
|ψ⟩_{S3} = |11⟩|0⟩^⊗2|1⟩|0⟩.   (18)

Taking |ψ⟩_{S0} as an example, the H^⊗2 operation is applied on the third and fourth qubits,

|00⟩|0⟩^⊗2|1⟩|0⟩ —H^⊗2→ ½|00⟩ Σ_{i=0}^{3} |i⟩|1⟩|0⟩.   (19)

Then we perform the R_y rotation (see Eq. (6)) on the last qubit, where

R_y(2 sin⁻¹ v_00) = R_y(2 sin⁻¹ v_02) = −iY = [ 0  −1 ; 1  0 ],
R_y(2 sin⁻¹ v_01) = R_y(2 sin⁻¹ v_03) = I = [ 1  0 ; 0  1 ],   (20)

and can get

|ϕ⟩_{S0} = ½|00⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_0i|²)|0⟩ + v_0i|1⟩).   (21)

The other quantum states are prepared in the same way and they are listed as below,

|ϕ⟩_{S0} = ½|00⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_0i|²)|0⟩ + v_0i|1⟩),
|ϕ⟩_{S1} = ½|01⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_1i|²)|0⟩ + v_1i|1⟩),
|ϕ⟩_{S2} = ½|10⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_2i|²)|0⟩ + v_2i|1⟩),
|ϕ⟩_{S3} = ½|11⟩ Σ_{i=0}^{3} |i⟩|1⟩(√(1−|v_3i|²)|0⟩ + v_3i|1⟩).   (22)

Second, we randomly select a sample (assume |ϕ⟩_{S0} is the one), and perform the similarity calculation with the other samples (i.e., |ϕ⟩_{S1}, |ϕ⟩_{S2}, |ϕ⟩_{S3}). Taking |ϕ⟩_{S0} and |ϕ⟩_{S1} as an example, we perform a swap operation between the last two qubits of |ϕ⟩_{S0},

|ϕ⟩_{S0} —swap→ |φ⟩ = ½|00⟩ Σ_{i=0}^{3} |i⟩(√(1−|v_0i|²)|0⟩ + v_0i|1⟩)|1⟩.   (23)

After that, the swap test operation is applied on (|φ⟩, |ϕ⟩_{S1}),

|0⟩|φ⟩|ϕ⟩_{S1} —swap test→ ½|0⟩(|φ⟩|ϕ⟩_{S1} + |ϕ⟩_{S1}|φ⟩) + ½|1⟩(|φ⟩|ϕ⟩_{S1} − |ϕ⟩_{S1}|φ⟩),   (24)

then we measure the result shown in Eq. (24), and obtain the probability of the first qubit being |1⟩ as

P_1^0(1) = ½ − ½|⟨φ|ϕ⟩_{S1}|².   (25)

In terms of Eq. (12), the inner product between $\left|\varphi\right\rangle$ and $\left|\phi\right\rangle_{S_1}$ can be represented as

\left\langle \varphi \middle| \phi \right\rangle_{S_1} = \frac{1}{4}\sum_{i} S_{0i}^{*} S_{1i} = \frac{1}{4}\left\langle S_0 \middle| S_1 \right\rangle. \qquad (26)

From Eqs. (25) and (26), the similarity between $\left|S_0\right\rangle$ and $\left|S_1\right\rangle$ is

\left| \left\langle S_0 \middle| S_1 \right\rangle \right|^{2} = 16\left(1 - 2P_1^0(1)\right), \qquad (27)

where the value of $P_1^0(1)$ is determined by measurement.
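As a sanity check on Eqs. (25)-(27), the swap-test identity can be simulated classically. The following sketch uses hypothetical 2-feature unit vectors (not the paper's sample set) and embeds them with the $1/2$ amplitude factor of Eq. (23), padding with mutually orthogonal components so that $\langle\varphi|\psi\rangle = \frac{1}{4}\langle S_0|S_1\rangle$ as in Eq. (26):

```python
import numpy as np

def swap_test_p1(phi, psi):
    # Ancilla |1> probability of a swap test (Eq. (25)):
    # P(1) = 1/2 - 1/2 * |<phi|psi>|^2
    return 0.5 - 0.5 * abs(np.vdot(phi, psi)) ** 2

# Hypothetical 2-feature unit vectors (not the paper's sample set)
S0 = np.array([0.6, 0.8])
S1 = np.array([0.8, 0.6])

# Embed each with the 1/2 amplitude factor of Eq. (23); the padding
# components are mutually orthogonal, so <phi|psi> = (1/4)<S0|S1> (Eq. (26))
phi = np.array([*(S0 / 2), np.sqrt(3) / 2, 0.0])
psi = np.array([*(S1 / 2), 0.0, np.sqrt(3) / 2])

p1 = swap_test_p1(phi, psi)
similarity = 16 * (1 - 2 * p1)          # Eq. (27)
print(round(similarity, 4))             # 0.9216 = <S0|S1>^2
```

The recovered value equals $\left|\langle S_0|S_1\rangle\right|^2 = (0.96)^2 = 0.9216$, confirming the factor of 16 in Eq. (27).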

In order to obtain the measurement result and also to verify our algorithm, we choose IBM Q [24] to perform the quantum processing (Fig. 5 gives the schematic diagram of our experimental circuit). In the experiment, we program our algorithm with the QISKit toolkit [25] in the Python language, and remotely connect to the online IBM QX5 device to execute the quantum processing. After the experiment, we obtain the values of $P_1^0(1)$ shown in Tab. 2.
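For readers without access to the IBM QX5 device, the swap-test circuit of Fig. 5 can also be checked with a plain statevector simulation. The sketch below is a minimal single-qubit version (the experiment itself uses 6-qubit registers) written with NumPy only; the two demo states are assumptions for illustration:

```python
import numpy as np

# Hypothetical single-qubit demo states (the experiment itself uses the
# 6-qubit registers q[0]-q[5] and q[6]-q[11])
phi = np.array([np.cos(0.3), np.sin(0.3)])
psi = np.array([np.cos(1.0), np.sin(1.0)])

# 3-qubit statevector |ancilla>|phi>|psi>, ancilla (qubit 0) starts in |0>
state = np.kron(np.array([1.0, 0.0]), np.kron(phi, psi))

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I4 = np.eye(4)
H_anc = np.kron(H, I4)                  # Hadamard on the ancilla only

# Controlled-SWAP: swap the two data qubits when the ancilla is |1>
SWAP = np.eye(4)[[0, 2, 1, 3]]
CSWAP = np.block([[I4, np.zeros((4, 4))],
                  [np.zeros((4, 4)), SWAP]])

# H - CSWAP - H, i.e., the standard swap-test circuit
state = H_anc @ (CSWAP @ (H_anc @ state))

# P(1): total probability of the ancilla-|1> components (indices 4..7)
p1 = np.sum(np.abs(state[4:]) ** 2)
expected = 0.5 - 0.5 * abs(np.vdot(phi, psi)) ** 2   # Eq. (25)
```

The simulated $P(1)$ agrees with the analytic value of Eq. (25), which is what the hardware runs in Tab. 2 estimate statistically.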

Refer to caption
Figure 5: One of the ideal experimental circuits of the QRelief algorithm running on the IBM Q platform. $q[0]$-$q[5]$ represent the randomly selected quantum state $\left|\phi\right\rangle_{S_0}$, $q[6]$-$q[11]$ represent $\left|\phi\right\rangle_{S_1}$, and $q[12]$ is the result qubit. H is the Hadamard gate; X and Y are the Pauli-X and Pauli-Y gates; two crosses connected by a line denote the swap operation; $\circ$ denotes a control conditioned on the qubit being $\left|0\right\rangle$, and $\bullet$ a control conditioned on the qubit being $\left|1\right\rangle$.
Table 2: The probabilities $P_j^l(1)$ of the first qubit being $\left|1\right\rangle$

Iteration times ($T$)   $u$     Sample   $P_j^l(1)$
1                       $S_0$   $S_1$    0.49023438
                                $S_2$    0.49902344
                                $S_3$    0.49121094
2                       $S_1$   $S_0$    0.50097656
                                $S_2$    0.52246094
                                $S_3$    0.53417969
3                       $S_2$   $S_0$    0.50683594
                                $S_1$    0.50878906
                                $S_3$    0.49218750
4                       $S_3$   $S_0$    0.49804688
                                $S_1$    0.49218750
                                $S_2$    0.50195312

According to $P_1^0(1)$ (from Tab. 2) and Eq. (27), we can calculate the similarity between $\left|S_0\right\rangle$ and $\left|S_1\right\rangle$:

\left| \left\langle S_0 \middle| S_1 \right\rangle \right|^{2} = 16\left(1 - 2P_1^0(1)\right) = 16\left(1 - 2\times 0.49023438\right) \approx 0.3125. \qquad (28)

In the same way, the other two similarities, ($\left|S_0\right\rangle$, $\left|S_2\right\rangle$) and ($\left|S_0\right\rangle$, $\left|S_3\right\rangle$), can also be obtained (as illustrated in Tab. 3).

Table 3: Similarities between samples

        $S_0$   $S_1$    $S_2$     $S_3$
$S_0$   -       0.3125   0.03125   0.28125
$S_1$           -        0.71875   1.09375
$S_2$                    -         0.25
$S_3$                              -
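The first row of Tab. 3 follows directly from the $T=1$ probabilities measured in Tab. 2 via Eq. (27); a quick numerical check:

```python
# Recover the (S0, Sx) similarities of Tab. 3 from the T=1 probabilities
# in Tab. 2 via Eq. (27): |<S0|Sx>|^2 = 16 * (1 - 2 * P)
P = {"S1": 0.49023438, "S2": 0.49902344, "S3": 0.49121094}
sim = {k: round(16 * (1 - 2 * p), 5) for k, p in P.items()}
print(sim)   # {'S1': 0.3125, 'S2': 0.03125, 'S3': 0.28125}
```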

Third, from Tab. 3 it is easy to see that the Near-hit is $S_1$ and the Near-miss is $S_3$ (as shown in Fig. 6). Then the weight vector is updated by applying Eq. (15); the resulting $WT$ for the first iteration is listed in the second row of Tab. 4.

Refer to caption
Figure 6: Finding Near-hit and Near-miss.
Table 4: The update result of $WT$

Iteration times ($T$)   Weight vector ($WT$)
1                       [1 1 0 0]
2                       [2 2 -1 0]
3                       [3 3 -1 0]
4                       [4 4 -2 0]

The algorithm iterates the above steps $T$ times (in our example, $T=4$) and obtains all the $WT$ results shown in Tab. 4. After the $T$ iterations, $WT=[4,4,-2,0]$, and thus $\overline{WT}=[1,1,-1/2,0]$. Since the threshold is $\tau=0.5$, the selected features are $F_0$ and $F_1$, i.e., the first and second features.
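For comparison, one iteration of the classical weight update has the following shape. This is a hedged sketch using the textbook Relief rule (Eq. (15) in the paper may differ in detail), with hypothetical binary feature vectors rather than the sample set of our example:

```python
import numpy as np

def relief_update(WT, u, near_hit, near_miss):
    # Penalize features differing from the Near-hit,
    # reward features differing from the Near-miss
    return WT - np.abs(u - near_hit) + np.abs(u - near_miss)

# Hypothetical binary feature vectors with 4 features (as in Tab. 4)
u         = np.array([1, 1, 0, 0])
near_hit  = np.array([1, 1, 1, 0])
near_miss = np.array([0, 0, 0, 0])

WT = relief_update(np.zeros(4, dtype=int), u, near_hit, near_miss)
print(WT)   # [ 1  1 -1  0]
```

Features that agree with the Near-hit and disagree with the Near-miss accumulate positive weight, which is exactly the pattern visible in Tab. 4.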

5 Efficiency analysis

In the classical Relief algorithm, it takes O(N) time to calculate the Euclidean distance between $u$ and each of the $M$ samples, the complexity of finding the Near-hit and Near-miss is proportional to $M$, and the loop iterates $T$ times, so the computational complexity of the classical Relief algorithm is O(TMN). Since $T$ is a constant that affects the accuracy of the relevance levels but is chosen independently of $M$ and $N$, the complexity can be simplified to O(MN). In addition, an $N$-dimensional vector in the Hilbert space is represented by $N$ bits in a classical computer, and $M$ samples (i.e., $M$ $N$-dimensional vectors) need to be stored in the algorithm, so the classical Relief algorithm consumes $O(MN)$ bits of storage.
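The classical O(TMN) loop structure described above can be sketched as follows (a minimal NumPy implementation with the textbook update rule; the variable names and the toy data set are ours, not the paper's):

```python
import numpy as np

def classical_relief(X, y, T, tau, rng=None):
    """Plain Relief: T iterations, each scanning all M samples and all
    N features, hence O(TMN) time over the O(MN) stored data array."""
    if rng is None:
        rng = np.random.default_rng(0)
    M, N = X.shape
    WT = np.zeros(N)
    for _ in range(T):
        u = rng.integers(M)                       # random sample index
        d = np.linalg.norm(X - X[u], axis=1)      # O(MN) distance scan
        d[u] = np.inf                             # exclude u itself
        near_hit  = X[np.where(y == y[u], d, np.inf).argmin()]
        near_miss = X[np.where(y != y[u], d, np.inf).argmin()]
        WT += np.abs(X[u] - near_miss) - np.abs(X[u] - near_hit)
    return np.where(WT / T >= tau)[0]             # relevant feature indices

# Toy set where only the first feature separates the two classes
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]], dtype=float)
y = np.array([1, 1, 0, 0])
print(classical_relief(X, y, T=4, tau=0.5))       # [0]
```

The distance scan inside the loop is the O(MN) bottleneck that the quantum version replaces with swap-test estimates.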

In our quantum Relief algorithm, all the features of each sample are superposed in a quantum state $\left|\phi_A\right\rangle_j$ or $\left|\phi_B\right\rangle_k$, so the similarity calculation between two states, shown in Eq. (13), takes only $O(1)$ time. As in the Relief algorithm, the similarity between the selected state $\left|\varphi\right\rangle$ and each state in $\left\{\left|\phi_A\right\rangle\right\}$, $\left\{\left|\phi_B\right\rangle\right\}$ is calculated, taking $O(M)$ time, to obtain the Near-hit and Near-miss, and the loop iterates $T$ times, so the proposed algorithm takes O(TM) time in total. Since $T$ is a constant, the computational complexity can be rewritten as O(M). From the viewpoint of resource consumption, each quantum state in the state sets $\left\{\left|\phi_A\right\rangle_j\right\}$ and $\left\{\left|\phi_B\right\rangle_k\right\}$ is represented as

\left|\phi_A\right\rangle_j = \frac{1}{\sqrt{N}}\left|j\right\rangle \sum_{i=0}^{N-1} \left|i\right\rangle\left|1\right\rangle \left( \sqrt{1-\left|v_{ji}\right|^{2}}\,\left|0\right\rangle + v_{ji}\left|1\right\rangle \right)

or

\left|\phi_B\right\rangle_k = \frac{1}{\sqrt{N}}\left|k\right\rangle \sum_{i=0}^{N-1} \left|i\right\rangle\left|1\right\rangle \left( \sqrt{1-\left|w_{ki}\right|^{2}}\,\left|0\right\rangle + w_{ki}\left|1\right\rangle \right),

and it consists of $O(\log_2 N)$ qubits. Since $j=1,2,\cdots,M_1$, $k=1,2,\cdots,M_2$ and $M=M_1+M_2$, there are $M$ such quantum states to be stored in our algorithm, so it consumes $O(M\log N)$ qubits of storage.
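As a concrete (purely illustrative) count, assume $N=1024$ features, $M=1000$ samples, and a constant two extra work qubits per state beyond the $\lceil\log_2 N\rceil$-qubit index register; the exact constant depends on the encoding:

```python
from math import ceil, log2

M, N = 1000, 1024                 # illustrative sizes, not the paper's example
classical_bits = M * N            # O(MN): one value per feature per sample
qubits = M * (ceil(log2(N)) + 2)  # O(M log N): index register + work qubits
print(classical_bits, qubits)     # 1024000 12000
```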

Tab. 5 illustrates the efficiency comparison, covering both computational complexity and resource consumption, between the classical Relief and our quantum Relief algorithms. As shown in the table, the computational complexity of our algorithm is $O(M)$, which is clearly superior to the $O(MN)$ of the classical Relief algorithm. On the other hand, our algorithm consumes $O(M\log N)$ qubits, while the classical Relief algorithm consumes $O(MN)$ bits.

Table 5: Efficiency comparison between classical Relief and quantum Relief algorithms

                            Complexity   Resource consumption
Relief algorithm            $O(MN)$      $O(MN)$ bits
Quantum Relief algorithm    $O(M)$       $O(M\log N)$ qubits

6 Conclusion and Discussion

With quantum computing technologies nearing the era of commercialization and quantum supremacy, machine learning has recently been regarded as one of their "killer" applications. In this study, we utilize quantum computing technologies to solve a simple feature selection problem (used in binary classification) and propose the quantum Relief algorithm, which consists of four phases: state preparation, similarity calculation, weight vector update, and feature selection. Furthermore, we verify our algorithm by performing quantum experiments on the IBM Q platform. Compared with the classical Relief algorithm, our algorithm has lower computational complexity and less resource consumption (in terms of the number of storage units). To be specific, the complexity is reduced from $O(MN)$ to $O(M)$, and the consumed resource is reduced from $O(MN)$ bits to $O(M\log N)$ qubits.

Although this work focuses only on the feature selection problem (and the simple binary classification setting), the method can be generalized to implement other Relief-like algorithms, such as ReliefF [26] and RReliefF [27]. Besides, we are interested in utilizing quantum technologies to deal with classical high-dimensional massive-data processing tasks, such as text classification, video stream processing, data mining, computer vision, information retrieval, and bioinformatics.

Acknowledgements.
The authors would like to thank the anonymous reviewers and editor for their comments that improved the quality of this paper. This work is supported by the National Nature Science Foundation of China (Grant Nos. 61502101, 61501247 and 61672290), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20171458), the Six Talent Peaks Project of Jiangsu Province, China (Grant No. 2015-XXRJ-013), the Natural Science Foundation for Colleges and Universities of Jiangsu Province, China (Grant No. 16KJB520030), the Practice Innovation Training Program Projects for Jiangsu College Students (Grant No. 201810300016Z), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

References

  • (1) Mohri M., Rostamizadeh A., Talwalkar A.: Foundations of machine learning. MIT Press, Cambridge, Massachusetts (2012)
  • (2) Bellman, R. E.: Dynamic programming. Princeton University Press, Princeton, NJ (1957)
  • (3) Roweis, S. T., Saul, L. K.: Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326 (2000)
  • (4) Liu, H., Motoda, H.: Feature selection for knowledge discovery and data mining. Springer US (2012)
  • (5) Liu, H., Motoda, H.: Computational methods of feature selection. CRC Press (2007)
  • (6) Kira, K., Rendell, L. A.: A practical approach to feature selection. In: Proc. of 9th International Workshop on Machine Learning, pp. 249-256, Morgan Kaufmann Publishers, San Francisco, CA, USA (1992)
  • (7) Zhang, J., Lin, H., Zhao, M.: A fast algorithm for hand gesture recognition using Relief. In: Proc. of 6th International Conference on Fuzzy Systems and Knowledge Discovery, vol. 1, pp. 8-12, IEEE Press, Piscataway, NJ, USA (2009)
  • (8) Amjady, N., Daraeepour, A.: Day-ahead electricity price forecasting using the relief algorithm and neural networks. In: Proc. of 5th International Conference on the European Electricity Market, pp. 1-7, IEEE Press, Lisboa, Portugal (2008)
  • (9) Dai, Y., Chen, L., Zhang, W., Min, Y.: Multi-support vector machine power system transient stability assessment based on relief algorithm. In: Proc. of 2015 IEEE PES Asia-Pacific Power and Energy Engineering, pp. 1-5. IEEE Press, Brisbane, QLD, Australia (2015)
  • (10) Feynman, R. P.: Quantum mechanical computers. Found. Phys. 16(6), 507-531 (1986)
  • (11) Deutsch, D., Jozsa, R.: Rapid solution of problems by quantum computation. In: Proc. of Royal Society of London, Series A: Mathematical and Physical Sciences. 439(1907), pp. 553-558, the Royal Society (1992)
  • (12) Deutsch, D.: Quantum theory, the church-turing principle and the universal quantum computer. In: Proc. of Royal Society of London, Series A: Mathematical and Physical Sciences. 400(1818), pp. 97-117, the Royal Society (1985)
  • (13) Shor, P. W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Rev. 41(2), 303-332 (1999)
  • (14) Grover, L. K.: A fast quantum mechanical algorithm for database search. In: Proc. of 28th Annual ACM Symposium on Theory of Computing, pp. 212-219, ACM Press, New York, NY, USA (1996)
  • (15) Sasaki, M., Carlini, A.: Quantum learning and universal quantum matching machine. Phys. Rev. A. 66(2), 022303 (2002)
  • (16) Servedio, R. A., Gortler, S. J.: Equivalences and separations between quantum and classical learnability. SIAM J. Comput. 33(5), 1067-1092 (2004)
  • (17) Hentschel, A., Sanders, B. C.: Machine learning for precise quantum measurement. Phys. Rev. Lett. 104(6), 063603 (2010)
  • (18) Neven, H., Denchev, V. S., Rose, G., Macready, W. G.: Training a large scale classifier with the quantum adiabatic algorithm. arXiv:quant-ph/0912.0779v1 (2009)
  • (19) Pudenz, K. L., Lidar, D. A.: Quantum adiabatic machine learning. Quantum Inf. Process. 12(5), 2027-2070 (2013)
  • (20) Anguita, D., Ridella, S., Rivieccio, F., Zunino, R.: Quantum optimization for training support vector machines. Neural Netw. 16(5-6), 763-770 (2003)
  • (21) Wiebe, N., Kapoor, A., Svore, K.: Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quantum Info. Comput. 15(3-4), 316-356 (2015)
  • (22) Lloyd, S., Mohseni, M., Rebentrost, P.: Quantum algorithms for supervised and unsupervised machine learning. arXiv:quant-ph/1307.0411 (2013)
  • (23) Buhrman, H., Cleve, R., Watrous, J., De, W. R.: Quantum fingerprinting. Phys. Rev. Lett. 87(16), 167902 (2001)
  • (24) IBM quantum computing platform. https://www.research.ibm.com/ibm-q/. Online accessed 12-April-2018
  • (25) QISKit: Open Source Quantum Information Science Kit. https://qiskit.org/. Online accessed 12-April-2018
  • (26) Kononenko, I., Simec, E., Robniksikonja, M.: Overcoming the myopia of inductive learning algorithms with RELIEFF. Appl. Intell. 7(1), 39-55 (1997)
  • (27) Robnik-Sikonja, M., Kononenko, I.: An adaptation of Relief for attribute estimation in regression. In: Proc. of 14th International conference on Machine Learning, pp. 296-304, Morgan Kaufmann Publishers, Nashville, TN, USA (1997)