Poster: Implementing Quantum Machine Learning in Qiskit for Genomic Sequence Classification

Navneet Singh and Shiva Raj Pokhrel

An Independent Implementation of Quantum Machine Learning Algorithms in Qiskit for Genomic Data

Abstract.

In this paper, we explore the power of Quantum Machine Learning by extending, implementing, and evaluating algorithms such as the Quantum Support Vector Classifier (QSVC), Pegasos-QSVC, Variational Quantum Circuits (VQC), and Quantum Neural Networks (QNN) in Qiskit, using diverse feature mapping techniques for genomic sequence classification.¹

¹ The authors are with the IoT & SE Lab, School of IT, Deakin University, Australia.

Keywords: Quantum Machine Learning, Quantum Support Vector Classifier (QSVC), Pegasos-QSVC, Variational Quantum Circuits (VQC), Feature Map, Genomic Sequence Classification, Quantum Neural Networks (QNN)


1. Introduction

Quantum Machine Learning (QML) models have advanced rapidly in medical imaging analysis; however, a notable gap remains in genomic sequence classification. This gap prompts our research to evaluate, rethink, and extend the baseline performance of the Quantum Support Vector Classifier (QSVC), Pegasos-QSVC (Gentinetta et al., 2024), the Variational Quantum Classifier (VQC), and the Quantum Neural Network (QNN) (Abbas et al., 2021) on genomic data (Pokhrel et al., 2024).

In this preliminary work, we employ three feature mapping techniques, ZFeatureMap, ZZFeatureMap, and PauliFeatureMap (Havlíček et al., 2019), and extend, implement, and evaluate QML algorithms in Qiskit on genomic data. Our goal is to analyze how algorithmic and feature-map parameters affect performance, to better understand and apply QML in genomic sequence classification.

2. Background and Implementation

Feature maps enable QML models by encoding classical data points $\vec{x}_{i}$ (with labels $y_{i}$) into quantum states $\psi(\vec{x}_{i})$ that quantum algorithms can process and analyze. The ZFeatureMap applies data-dependent phase rotations about the Z axis to each qubit individually, rotating each qubit state according to its feature; it creates no entanglement. In contrast, the ZZFeatureMap adds two-qubit ZZ interactions that entangle pairs of qubits, introducing phase factors that depend jointly on pairs of encoded features and thereby structuring the interactions between qubit pairs according to the data. The PauliFeatureMap generalizes both by encoding the data through configurable combinations of Pauli operators (X, Y, and Z).

Our QML algorithms and Qiskit implementations are open source. In our QSVC (Algorithm 1), we first encode data points $\vec{x}_{i}$ into quantum states $\psi$ using a tailored quantum circuit $\mathcal{C}$ built around a feature map $U_{\text{feature}}(\vec{x}_{i},\theta)$ with parameters $\theta$. We then compute a quantum kernel matrix $K$ whose entries are the inner products $\langle\psi(\vec{x}_{i}),\psi(\vec{x}_{j})\rangle$ between encoded states; this matrix drives the SVM optimization. Training solves the dual quadratic programming problem (step 4, Algorithm 1) with a classical solver to find the optimal separating hyperplane for labels $y_{i}$ in the quantum-enhanced feature space. Prediction prepares $\psi$ for each new data point and evaluates the decision function $f(\vec{x})$, which combines the support vectors, kernel evaluations, and a bias term $b$.

Algorithm 1 Quantum Support Vector Classifier (QSVC)

1: Encode data $\vec{x}_{i}$ into quantum states $\psi\leftarrow U_{\text{feature}}(\vec{x}_{i},\theta)$

2: Construct $\mathcal{C}\to$ compute kernel matrix $K$

3: Measure inner products, $K_{ij}\leftarrow\langle\psi(\vec{x}_{i}),\psi(\vec{x}_{j})\rangle$

4: Optimize the support vectors and coefficients $\alpha_{i}$ by solving the QP problem:

$$\vec{\alpha}\leftarrow\arg\max_{\vec{\alpha}}\left(\sum_{i=1}^{n}\alpha_{i}-\frac{1}{2}\sum_{i,j=1}^{n}y_{i}y_{j}\alpha_{i}\alpha_{j}K_{ij}\right)$$

subject to $0\leq\alpha_{i}\leq C$ and $\sum_{i=1}^{n}\alpha_{i}y_{i}=0$

5: Prepare $\psi\leftarrow U_{\text{feature}}(\vec{x},\theta)$ for a new data point $\vec{x}$

6: return decision function $f(\vec{x})$ for prediction:

$$f(\vec{x})=\text{sign}\left(\sum_{i=1}^{n}\alpha_{i}y_{i}K(\vec{x},\vec{x}_{i})-b\right)$$

Algorithm 2 Pegasos-QSVC

1: Encode data $\vec{x}_{i}$ into quantum states $\psi\leftarrow U_{\text{feature}}(\vec{x}_{i},\theta)$

2: Construct $\mathcal{C}\to$ compute kernel matrix $K$

3: for $t=1$ to $T$ do

4: Randomly select a subset of samples

5: Compute $y_{i}\langle\vec{w},\phi(\vec{x}_{i})\rangle$ for each sample

6: if $y_{i}\langle\vec{w},\phi(\vec{x}_{i})\rangle<1$ then $\vec{w}\leftarrow(1-\eta\lambda)\vec{w}+\eta y_{i}\phi(\vec{x}_{i})$

7: else $\vec{w}\leftarrow(1-\eta\lambda)\vec{w}$

8: end if

9: Normalize $\vec{w}$: $\vec{w}\leftarrow\min\left(1,\frac{1/\sqrt{\lambda}}{\|\vec{w}\|}\right)\vec{w}$

10: end for

11: Prepare $\psi\leftarrow U_{\text{feature}}(\vec{x},\theta)$ for a new data point $\vec{x}$

12: return decision function $f(\vec{x})=\text{sign}(\langle\vec{w},\phi(\vec{x})\rangle)$

Our implementation of Pegasos-QSVC (Algorithm 2) initializes the qubits and the parameters $\theta$, together with the learning rate $\eta$ and the regularization factor $\lambda$, and encodes the data in $\psi$ to compute the quantum kernel $K$. Over $T$ training iterations, it evaluates randomly selected subsets of samples, updates the weight vector $\vec{w}$ according to the margin condition, and rescales $\vec{w}$ for regularization (step 9, Algorithm 2). Predictions $f(\vec{x})$ for new data points are obtained from their quantum-state representations $\psi$ using the final model $\langle\vec{w},\phi(\vec{x})\rangle$.
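Because $\vec{w}$ lives in the feature space, Algorithm 2 can be run in kernel form without ever storing $\vec{w}$. The sketch below follows the kernelized Pegasos of Shalev-Shwartz et al.: it keeps a count $\alpha_{j}$ of margin violations per sample, uses the step size $\eta_{t}=1/(\lambda t)$, samples one example per iteration (a subset of size one), and omits the projection of step 9. An RBF kernel on toy 2-D data stands in for the quantum kernel $K$; the kernel, data, $\lambda$, and $T$ are all assumptions for illustration.

```python
import numpy as np

LAMBDA, NUM_ITERS = 0.01, 1000  # regularization lambda and T (assumed values)


def rbf_kernel(A, B, gamma=0.5):
    """Classical RBF kernel used here only as a stand-in for the quantum K."""
    sq_dist = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq_dist)


def pegasos_fit(K, y, lam=LAMBDA, num_iters=NUM_ITERS, seed=0):
    """Kernelized Pegasos on a precomputed kernel; y must be in {-1, +1}.

    alpha[j] counts margin violations of sample j, so that
    w_t is proportional to sum_j alpha[j] * y[j] * phi(x_j).
    """
    rng = np.random.default_rng(seed)
    alpha = np.zeros(len(y))
    for t in range(1, num_iters + 1):
        i = rng.integers(len(y))                          # step 4: random sample
        score = np.sum(alpha * y * K[i]) / (lam * t)      # step 5: <w_t, phi(x_i)>
        if y[i] * score < 1:                              # step 6: margin violated
            alpha[i] += 1
        # Step 7: otherwise only the (1 - eta * lam) shrinkage applies, which
        # the 1 / (lam * t) factor already accounts for.
    return alpha


def pegasos_decision(K_new, alpha, y, lam=LAMBDA, num_iters=NUM_ITERS):
    """<w_T, phi(x)> for each row of K_new (shape: n_new x n_train)."""
    return K_new @ (alpha * y) / (lam * num_iters)


# Hypothetical toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
K = rbf_kernel(X, X)
alpha = pegasos_fit(K, y)
y_hat = np.sign(pegasos_decision(K, alpha, y))  # step 12: sign(<w, phi(x)>)
print("training accuracy:", np.mean(y_hat == y))
```

Swapping `rbf_kernel` for a quantum kernel matrix (for example one built from statevector overlaps) leaves the training loop unchanged, since it only reads rows of `K`.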

Algorithm 3 Variational Quantum Classifier (VQC)

1: Encode data $\vec{x}_{i}$ into quantum states $\psi\leftarrow U_{\text{feature}}(\vec{x}_{i})$

2: Construct variational circuit of parameterized gates $U(\vec{\theta})$

3: Apply $U(\vec{\theta})$ to the encoded states $\psi$

4: Measurement: $M\leftarrow\text{measure}_{\{|0\rangle,|1\rangle\}}(\psi)$

5: Compute the cost function $C(\vec{\theta})$

6: while not converged do

7: Use classical optimizer: $\vec{\theta}\leftarrow\text{optimize}(C(\vec{\theta}))$

8: end while

9: return optimized parameters $\vec{\theta}_{\text{opt}}$

Our implementation of VQC (Algorithm 3) begins by initializing the qubits and the parameters $\vec{\theta}$, then encodes the data into the qubits with $U_{\text{feature}}(\vec{x}_{i})$. It then constructs a parameterized quantum circuit $U(\vec{\theta})$ whose gate angles serve as the learnable parameters $\vec{\theta}$. The circuit is applied to the encoded states and the qubits are measured; the measurement outcomes $M$ determine the cost function $C(\vec{\theta})$. A classical optimizer iteratively adjusts the gate parameters to minimize this cost until convergence, yielding $\vec{\theta}_{\text{opt}}$.

Algorithm 4 Quantum Neural Network (QNN) Operation

1: Encode $\vec{x}_{i}$ into quantum states: $\psi\leftarrow U_{\text{feature}}(\vec{x}_{i})$

2: for each layer $l$ from $1$ to $L$ do

3: Perform unitary transformation $U_{l}(\theta_{l})$

4: end for

5: Perform CNOT (or custom entangling gate)

6: Measure output qubits and obtain $|\Phi\rangle$

7: while ($E>\varepsilon$) do

8: Renew $\theta$ (quantum gradient descent): $\theta=\theta-\eta\nabla_{\theta}E$

9: $E=|\text{expected\_output}-\text{measurement}(\Phi)|^{2}$

10: end while

Our implementation of the QNN (Algorithm 4) initializes the qubits and $\theta$, and encodes data into quantum states $\psi$ using $U_{\text{feature}}(\vec{x}_{i})$. The data are then processed by parameterized quantum gates across layers $l$, with entangling gates capturing correlations between qubits, and the output qubits are measured to obtain $|\Phi\rangle$. During training, a quantum analogue of backpropagation (gradient descent) adjusts $\theta$ using the error $E$ and the learning rate $\eta$ until $E$ falls below the threshold $\varepsilon$.
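To make the quantum gradient descent of Algorithm 4 concrete, the NumPy sketch below simulates a two-qubit QNN with RY encoding, $L=2$ layers of RY rotations each followed by a CNOT, and a $\langle Z\rangle$ readout on the CNOT target qubit. Gradients come from the parameter-shift rule, which is exact for RY gates because each angle appears in exactly one gate. Placing the CNOT inside every layer, the readout qubit, the sample, the target, $\eta$, and $\varepsilon$ are all illustrative assumptions.

```python
import numpy as np


def ry(angle):
    """Matrix of the single-qubit rotation RY(angle)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


# Two-qubit basis order |q0 q1>: CNOT with control q0 and target q1,
# and Pauli Z measured on the target qubit q1.
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
Z_TARGET = np.diag([1.0, -1.0, 1.0, -1.0])


def qnn_output(x, theta):
    """Expectation <Z> on q1; x has 2 features, theta has shape (L, 2)."""
    state = np.array([1.0, 0.0, 0.0, 0.0])                 # |00>
    state = np.kron(ry(x[0]), ry(x[1])) @ state             # step 1: U_feature(x)
    for layer in theta:                                     # steps 2-4: U_l(theta_l)
        state = np.kron(ry(layer[0]), ry(layer[1])) @ state
        state = CNOT @ state                                # step 5 (assumed per layer)
    return float(state @ Z_TARGET @ state)                  # step 6: measurement


def loss(x, target, theta):
    """E = |expected_output - measurement|^2 for a single sample."""
    return (target - qnn_output(x, theta)) ** 2


def parameter_shift_grad(x, target, theta):
    """dE/dtheta using the parameter-shift rule for each RY angle."""
    out = qnn_output(x, theta)
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        shift = np.zeros_like(theta)
        shift[idx] = np.pi / 2
        d_out = (qnn_output(x, theta + shift) - qnn_output(x, theta - shift)) / 2
        grad[idx] = -2.0 * (target - out) * d_out           # chain rule on E
    return grad


# Hypothetical sample, target, and hyperparameters (L = 2 layers).
x, target = np.array([0.3, 1.1]), 0.5
theta_init = np.array([[0.1, 0.2], [0.3, 0.4]])
eta, eps, max_steps = 0.1, 1e-4, 500

theta = theta_init.copy()
history = [loss(x, target, theta)]
while history[-1] > eps and len(history) <= max_steps:      # step 7: while E > epsilon
    theta = theta - eta * parameter_shift_grad(x, target, theta)  # step 8
    history.append(loss(x, target, theta))                  # step 9: recompute E
print(f"E: {history[0]:.4f} -> {history[-1]:.6f} after {len(history) - 1} updates")
```

On hardware, each shifted evaluation would be an estimate from repeated measurements, which is why the parameter-shift rule is preferred over finite differences there.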

In our experiments, we used the demo coding vs. intergenomic sequences dataset (Grešová et al., 2023), which contains sequences of different transcript types, with four qubits, allowing a thorough evaluation of the four algorithms. A subset of 100,000 genome sequences, divided into training and test sets, is selected to evaluate the QML models. The sequences are first converted to numerical form through text vectorization and reduced in dimensionality using PCA, then encoded into quantum states using the feature mapping techniques, preparing them for processing by the QML models.
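The preprocessing pipeline can be sketched with scikit-learn. The text does not specify the vectorizer, so this sketch assumes character k-mer counts (k = 3) via `CountVectorizer`, followed by PCA to four components (one per qubit) and min-max scaling to $[0,\pi]$, a common range for angle-based feature maps. The random sequences are hypothetical placeholders for the benchmark data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MinMaxScaler

NUM_QUBITS = 4  # PCA keeps one component per qubit


def preprocess(train_seqs, test_seqs, k=3):
    """k-mer counts -> PCA(NUM_QUBITS) -> angles in [0, pi], fit on train only."""
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(k, k), lowercase=False)
    pca = PCA(n_components=NUM_QUBITS)
    scaler = MinMaxScaler(feature_range=(0, np.pi))

    counts_train = vectorizer.fit_transform(train_seqs).toarray()
    X_train = scaler.fit_transform(pca.fit_transform(counts_train))

    counts_test = vectorizer.transform(test_seqs).toarray()
    X_test = scaler.transform(pca.transform(counts_test))
    return X_train, X_test


# Hypothetical random DNA strings standing in for the benchmark sequences.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=60)) for _ in range(30)]
X_train, X_test = preprocess(seqs[:24], seqs[24:])
print(X_train.shape, X_test.shape)  # (24, 4) (6, 4)
```

Fitting every stage on the training split avoids leaking test statistics; as a consequence, test features can fall slightly outside $[0,\pi]$.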

Figure 1. Convergence of QNN & VQC objective functions.

Figure 1 shows the convergence of the objective function during training for QNN and VQC with different feature mapping schemes, which improves on the convergence reported by Abbas et al. (2021). Our detailed evaluation with classification metrics is shown in Table 1, which reports substantial improvements over (Abbas et al., 2021), (Gentinetta et al., 2024), and (Pokhrel et al., 2024) in training and test accuracy, precision, recall, F1-score, and AUROC (area under the receiver operating characteristic curve), that is, in the ability to distinguish between the two classes of genome sequences.
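The metrics listed for Table 1 can be computed with scikit-learn as sketched below; the helper name `evaluate_binary` and the toy labels and scores are hypothetical. AUROC needs a continuous score (for example an SVM decision value or a class-1 probability) rather than hard predictions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate_binary(y_true, y_pred, y_score):
    """Hypothetical helper: threshold metrics on y_pred, AUROC on y_score."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auroc": roc_auc_score(y_true, y_score),
    }


# Toy labels and scores (e.g. SVM decision values or class-1 probabilities).
y_true = np.array([0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.4, 0.9, 0.2])
y_pred = (y_score > 0.5).astype(int)
for name, value in evaluate_binary(y_true, y_pred, y_score).items():
    print(f"{name}: {value:.3f}")
```

With these toy values, accuracy, precision, recall, and F1 all equal 2/3, while AUROC is 8/9 because 8 of the 9 positive-negative pairs are ranked correctly.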

3. Conclusion

Although our preliminary analysis shows improved performance, ongoing detailed investigations of QML for genomic classification (Pokhrel et al., 2024), together with evaluations on larger datasets, will further clarify the potential of these advances.

References

Abbas et al. (2021) Amira Abbas, David Sutter, Christa Zoufal, Aurélien Lucchi, Alessio Figalli, and Stefan Woerner. 2021. The power of quantum neural networks. Nature Computational Science 1, 6 (2021), 403–409.
Gentinetta et al. (2024) Gian Gentinetta, Arne Thomsen, David Sutter, and Stefan Woerner. 2024. The complexity of quantum support vector machines. Quantum 8 (2024), 1225.
Grešová et al. (2023) Katarína Grešová, Vlastimil Martinek, David Čechák, Petr Šimeček, and Panagiotis Alexiou. 2023. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data 24, 1 (2023), 25.
Havlíček et al. (2019) Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. 2019. Supervised learning with quantum-enhanced feature spaces. Nature 567, 7747 (2019), 209–212.
Pokhrel et al. (2024) Shiva Raj Pokhrel, Naman Yash, Jonathan Kua, Gang Li, and Lei Pan. 2024. Quantum Federated Learning Experiments in the Cloud with Data Encoding. arXiv preprint arXiv:2405.00909 (2024).