Poster: Implementing Quantum Machine Learning in Qiskit for Genomic Sequence Classification
An Independent Implementation of Quantum Machine Learning Algorithms in Qiskit for Genomic Data
Abstract.
In this paper, we explore the power of Quantum Machine Learning (QML) by extending, implementing, and evaluating the Quantum Support Vector Classifier (QSVC), Pegasos-QSVC, Variational Quantum Classifier (VQC), and Quantum Neural Network (QNN) algorithms in Qiskit, using diverse feature mapping techniques for genomic sequence classification.¹
¹ Authors are from IoT & SE Lab, School of IT, Deakin University, Australia; [email protected].
1. Introduction
Quantum Machine Learning (QML) models have advanced rapidly and shown promise in medical imaging analysis; however, their application to genomic sequence classification remains largely unexplored. This gap motivates our research to evaluate, rethink, and extend the baseline performance of the Quantum Support Vector Classifier (QSVC), Pegasos-QSVC (Gentinetta et al., 2024), Variational Quantum Classifier (VQC), and Quantum Neural Network (QNN) (Abbas et al., 2021) on genomic data (Pokhrel et al., 2024).
In this preliminary study, we employ three feature mapping techniques, ZFeatureMap, ZZFeatureMap, and PauliFeatureMap (Havlíček et al., 2019), and we extend, implement, and evaluate QML algorithms in Qiskit on genomic data to analyze how algorithmic and feature mapping parameters affect genomic sequence classification.
2. Background and Implementation
Feature maps enable QML models by translating classical data into quantum states, allowing the algorithms to process and analyze the data in a quantum feature space. The ZZFeatureMap uses ZZ interactions to entangle pairs of qubits, introducing phase factors that depend on the encoded classical data so that the interactions between qubit pairs reflect the data. In contrast, the ZFeatureMap applies Z-axis phase rotations to individual qubits, shifting each qubit's phase according to the classical data without entangling the qubits. The PauliFeatureMap generalizes both by using combinations of Pauli gates (X, Y, and Z) to encode classical data.
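To make the difference between the Z and ZZ encodings concrete, here is a minimal NumPy statevector sketch of the two circuits; it is not the authors' code. It assumes Qiskit's default conventions as we understand them: `reps=2`, a Hadamard layer followed by phase rotations P(2x_i) on each qubit, and, for the ZZ map, full pairwise entanglement with pair angle 2(π − x_i)(π − x_j). The function names are hypothetical, and the PauliFeatureMap (which also admits X and Y terms) is not sketched.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def phase(theta):
    """Single-qubit phase gate P(theta) = diag(1, e^{i theta})."""
    return np.array([[1, 0], [0, np.exp(1j * theta)]], dtype=complex)

def z_feature_map_state(x, reps=2):
    """ZFeatureMap-style state: per repetition, H then P(2 * x_i) on every qubit."""
    state = np.array([1.0 + 0j])
    for xi in x:
        q = np.array([1, 0], dtype=complex)  # each qubit starts in |0>
        for _ in range(reps):
            q = phase(2 * xi) @ (H @ q)
        state = np.kron(state, q)  # no entangling gates, so the state is a product state
    return state

def zz_feature_map_state(x, reps=2):
    """ZZFeatureMap-style state with full pairwise entanglement (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dim = 2 ** n
    # bits[b, i] is the value of qubit i in computational basis state b
    bits = (np.arange(dim)[:, None] >> np.arange(n)) & 1
    # The encoding layer is diagonal: single-qubit phases plus pairwise parity phases
    angle = 2 * bits @ x
    for i in range(n):
        for j in range(i + 1, n):
            parity = bits[:, i] ^ bits[:, j]  # CX-P-CX adds a phase only when parity is 1
            angle = angle + 2 * (np.pi - x[i]) * (np.pi - x[j]) * parity
    diag = np.exp(1j * angle)
    h_all = np.array([[1.0 + 0j]])
    for _ in range(n):
        h_all = np.kron(h_all, H)  # Hadamard on every qubit
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0
    for _ in range(reps):
        state = diag * (h_all @ state)
    return state
```

For a single qubit the two sketches coincide, since there are no pairs to entangle. In Qiskit itself the corresponding circuits are `ZFeatureMap`, `ZZFeatureMap`, and `PauliFeatureMap` in `qiskit.circuit.library`.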
Our QML algorithms and their Qiskit implementations are open source. In our QSVC (Algorithm 1), we begin by encoding data points into quantum states using a tailored quantum circuit and a parameterized feature map. A quantum kernel matrix is then computed whose entries are the inner products between the encoded quantum states; this matrix drives the optimization of the SVM. Training entails solving a dual quadratic programming problem (step 4, Algo. 1) with a classical solver to find the optimal separating hyperplane for the labels in the quantum-enhanced feature space. Prediction involves preparing the quantum state of each new data point and evaluating the decision function, which combines the support vectors, kernel evaluations, and a bias term.
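The kernel and decision-function steps can be illustrated with a small NumPy sketch; it is not the authors' implementation. It assumes a ZFeatureMap-style encoding with `reps=2`, exact statevectors instead of sampled circuits, and the fidelity kernel K(x, x') = |⟨φ(x)|φ(x')⟩|². The names `encode`, `quantum_kernel`, and `decision_function` are hypothetical. The dual coefficients α and bias b would come from a classical QP solver, for example scikit-learn's `SVC(kernel="precomputed")`, which is not shown here.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

def encode(x, reps=2):
    """ZFeatureMap-style product state: per repetition, H then P(2 * x_i) on each qubit."""
    state = np.array([1.0 + 0j])
    for xi in x:
        q = np.array([1, 0], dtype=complex)
        p = np.array([[1, 0], [0, np.exp(2j * xi)]])  # phase gate P(2 * x_i)
        for _ in range(reps):
            q = p @ (H @ q)
        state = np.kron(state, q)
    return state

def quantum_kernel(X1, X2, reps=2):
    """Fidelity kernel K[i, j] = |<phi(x_i)|phi(x_j)>|^2 between two sets of samples."""
    S1 = np.array([encode(x, reps) for x in X1])
    S2 = np.array([encode(x, reps) for x in X2])
    return np.abs(S1.conj() @ S2.T) ** 2

def decision_function(K_test_train, alpha, y, b):
    """Kernel SVM decision values sum_i alpha_i * y_i * K(x, x_i) + b for each test point."""
    return K_test_train @ (alpha * y) + b
```

The predicted label is the sign of the decision value. Because the training kernel is a Gram matrix of state overlaps, it is symmetric, positive semidefinite, and has ones on the diagonal, which is what a standard SVM dual solver expects.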
In our implementation, the Pegasos-QSVC (Algorithm 2) initializes the qubits and the hyperparameters, including the learning rate and the regularization factor, and encodes the data into quantum states to compute the quantum kernel. In the training loop, iterative updates are evaluated on subsets of the training data: the weight vector is updated according to the margin condition and then normalized for regularization (step 9, Algo. 2). Predictions for new data points are based on their quantum state representations under the final trained model.
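The training loop can be sketched with the standard kernelized Pegasos update, in which the weight vector is never formed explicitly and α_i counts how often sample i violated the margin. This is a NumPy sketch under stated assumptions, not the authors' Algorithm 2: the kernel is precomputed; as a stand-in for a simulated quantum kernel it uses the closed form Π_i cos²(x_i − x'_i), which is the fidelity kernel of a single-repetition ZFeatureMap (states P(2x)H|0⟩ per qubit); there is no bias term; and the optional projection (normalization) step is omitted. All function names are hypothetical.

```python
import numpy as np

def zfm_kernel(X1, X2):
    """Closed-form fidelity kernel of a one-repetition ZFeatureMap: prod_i cos^2(x_i - x'_i)."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    return np.prod(np.cos(X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)

def pegasos_kernel_train(K, y, lam=0.01, num_steps=1000, seed=0):
    """Kernelized Pegasos on a precomputed (n, n) kernel with labels in {-1, +1}.

    Returns alpha, where alpha[i] counts the updates triggered by sample i.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    alpha = np.zeros(n)
    for t in range(1, num_steps + 1):
        i = rng.integers(n)  # pick one training sample at random
        # Margin of sample i under the implicit weight vector at step t
        margin = y[i] / (lam * t) * np.sum(alpha * y * K[:, i])
        if margin < 1:
            alpha[i] += 1  # margin violated: add sample i to the expansion
    return alpha

def pegasos_kernel_predict(K_test_train, alpha, y):
    """Labels sign(sum_j alpha_j y_j K(x, x_j)); the 1/(lam*T) scale does not change the sign."""
    scores = K_test_train @ (alpha * y)
    return np.where(scores >= 0, 1, -1)
```

Each update needs only one row of the kernel matrix, which is the property that makes Pegasos attractive when kernel entries come from costly quantum circuit evaluations.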
Our implementation of VQC (Algorithm 3) begins by initializing the qubits and parameters, followed by encoding the data into the qubits. A parameterized quantum circuit is then constructed whose gate parameters serve as learnable parameters. Quantum operations are applied and the qubits are measured to compute the resulting cost function. A classical optimizer iteratively minimizes this cost by adjusting the gate parameters until convergence, yielding the trained parameters.
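A minimal two-qubit sketch of these steps in NumPy follows; it is not the authors' implementation. It assumes angle encoding with RY(x_i), one trainable RY layer, a CNOT, a measurement of Z on the second qubit, and a mean-squared-error cost against labels in {−1, +1}. Plain gradient descent with central finite-difference gradients stands in for the classical optimizer; a gradient-free optimizer such as COBYLA would be an equally valid choice. The circuit layout, cost, and function names are all assumptions.

```python
import numpy as np

def ry(a):
    """Single-qubit RY rotation (real-valued matrix)."""
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with qubit 0 (left Kronecker factor) as control and qubit 1 as target
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
Z1 = np.kron(np.eye(2), np.diag([1.0, -1.0]))  # Pauli Z on qubit 1

def circuit_output(x, theta):
    """<Z_1> after RY(x_i) encoding, a trainable RY(theta_i) layer, and a CNOT."""
    state = np.array([1.0, 0.0, 0.0, 0.0])  # |00>
    state = np.kron(ry(x[0]), ry(x[1])) @ state  # feature encoding
    state = np.kron(ry(theta[0]), ry(theta[1])) @ state  # variational layer
    state = CNOT @ state  # entangling gate
    return state @ Z1 @ state  # amplitudes are real, so no conjugation is needed

def cost(theta, X, y):
    """Mean squared error between circuit outputs and labels in {-1, +1}."""
    preds = np.array([circuit_output(x, theta) for x in X])
    return np.mean((preds - y) ** 2)

def train(X, y, theta0, lr=0.1, steps=100, h=1e-4):
    """Gradient descent with central finite-difference gradients of the cost."""
    theta = np.array(theta0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for k in range(len(theta)):
            e = np.zeros_like(theta)
            e[k] = h
            grad[k] = (cost(theta + e, X, y) - cost(theta - e, X, y)) / (2 * h)
        theta -= lr * grad
    return theta
```

Because the CNOT writes the parity of the two qubits onto the measured qubit, the output works out to cos(x₀ + θ₀)·cos(x₁ + θ₁), a parity-like decision function that measuring either qubit of the unentangled state could not produce. A new point is classified by the sign of `circuit_output(x, theta)`.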
Our implementation of the QNN (Algorithm 4) initializes the qubits and parameters and encodes the data into quantum states using a feature map. Parameterized quantum gates arranged in layers process the data and capture correlations through entanglement, and the outputs are computed by measuring the qubits. During training, a quantum version of backpropagation adjusts the parameters to minimize the loss until a threshold is reached.
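The training step can be pictured with a small NumPy sketch in which the "quantum backpropagation" is realized with the parameter-shift rule: for a gate RY(θ) = exp(−iθY/2), ∂f/∂θ = [f(θ + π/2) − f(θ − π/2)]/2, obtained from two extra circuit evaluations per parameter, and the classical chain rule then carries it into the loss. Parameter shift is one common way hybrid frameworks compute such gradients; whether it matches the authors' variant is an assumption. The two-qubit layout (RY encoding, layers of RY rotations each followed by a CNOT, Z measured on the second qubit), the squared-error loss, the stopping tolerance, and all function names are hypothetical.

```python
import numpy as np

def ry(a):
    """Single-qubit RY(a) = exp(-i a Y / 2), which is real-valued."""
    c, s = np.cos(a / 2), np.sin(a / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with qubit 0 (left Kronecker factor) as control and qubit 1 as target
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=float)
Z1 = np.kron(np.eye(2), np.diag([1.0, -1.0]))  # Pauli Z on qubit 1

def qnn_forward(x, theta):
    """<Z_1> of a layered QNN; theta has shape (num_layers, 2), one RY angle per qubit per layer."""
    state = np.array([1.0, 0.0, 0.0, 0.0])  # |00>
    state = np.kron(ry(x[0]), ry(x[1])) @ state  # angle encoding of the features
    for layer in theta:
        state = np.kron(ry(layer[0]), ry(layer[1])) @ state  # trainable rotations
        state = CNOT @ state  # entangle the two qubits
    return state @ Z1 @ state

def parameter_shift_grad(x, theta):
    """Exact d<Z_1>/dtheta via the parameter-shift rule (two circuit runs per parameter)."""
    grad = np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        shift = np.zeros_like(theta)
        shift[idx] = np.pi / 2
        grad[idx] = 0.5 * (qnn_forward(x, theta + shift) - qnn_forward(x, theta - shift))
    return grad

def mse(X, y, theta):
    """Mean squared error of the QNN outputs against labels in {-1, +1}."""
    preds = np.array([qnn_forward(x, theta) for x in X])
    return np.mean((preds - y) ** 2)

def train(X, y, theta, lr=0.05, tol=1e-3, max_steps=200):
    """Gradient descent with parameter-shift gradients; stops once the loss drops below tol."""
    theta = np.array(theta, dtype=float)
    for _ in range(max_steps):
        preds = np.array([qnn_forward(x, theta) for x in X])
        if np.mean((preds - y) ** 2) < tol:
            break
        # Chain rule: dL/dtheta = mean_i 2 * (f_i - y_i) * df_i/dtheta
        grad = np.mean([2 * (p - t) * parameter_shift_grad(x, theta)
                        for x, p, t in zip(X, preds, y)], axis=0)
        theta -= lr * grad
    return theta, mse(X, y, theta)
```

Unlike a finite difference, the shifted evaluations give the exact derivative, so the only error on hardware comes from measurement sampling.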
We used the demo coding vs. intergenomic sequences dataset (Grešová et al., 2023), which contains sequences of different transcript types, in experiments with four qubits, allowing a thorough evaluation of the four proposed algorithms. A subset of 100,000 genome sequences (divided into training and test sets) is selected to evaluate the QML models. The sequences are first transformed into numerical form through text vectorization and reduced in dimensionality using PCA; they are then encoded into quantum states with the feature mapping techniques, preparing them for processing by the QML models.
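A sketch of this preprocessing pipeline, under stated assumptions: the text does not specify the vectorizer, so overlapping k-mer counts (k = 3) are used here as one common choice for DNA; PCA is computed with NumPy's SVD and keeps four components, one per qubit; and the components are min-max scaled to [0, π] so they can serve as rotation angles. The example sequences and helper names are hypothetical, not taken from the dataset or the authors' code.

```python
import numpy as np
from itertools import product

def kmer_counts(seqs, k=3):
    """Bag-of-k-mers vectorization over the alphabet ACGT (one column per k-mer)."""
    vocab = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    X = np.zeros((len(seqs), len(vocab)))
    for row, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            j = vocab.get(s[i:i + k])
            if j is not None:  # skip k-mers containing N or other symbols
                X[row, j] += 1
    return X

def pca(X, n_components=4):
    """Project mean-centred data onto its top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def scale_to_angles(Z):
    """Min-max scale each column to [0, pi] so it can act as a rotation angle."""
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    return np.pi * (Z - lo) / np.where(hi > lo, hi - lo, 1.0)

seqs = ["ACGTACGTACGT", "TTTTGGGGCCCC", "ACACACACACAC",
        "GATTACAGATTA", "CCGGCCGGAATT", "TGCATGCATGCA"]
angles = scale_to_angles(pca(kmer_counts(seqs), n_components=4))  # shape (6, 4)
```

In a real experiment the PCA projection and the scaling bounds would be fitted on the training split only and then applied unchanged to the test split, to avoid leaking test statistics into training.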

Figure 1 illustrates the improved convergence of the objective function during training for QNN and VQC with different feature mapping schemes, compared with Abbas et al. (2021). Our detailed evaluation with classification metrics is shown in Table 1, illustrating substantial improvement over Abbas et al. (2021), Gentinetta et al. (2024), and Pokhrel et al. (2024) in training and test accuracy, precision, recall, F1-score, and AUROC (Area Under the Receiver Operating Characteristic curve), that is, in distinguishing between the two classes of genome sequences.
![Table 1](https://cdn.awesomepapers.org/papers/f5d16b6d-d73b-4063-bc27-7f8506bb15ab/tab.png)
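For reference, the metrics reported in Table 1 can be computed from hard predictions and classifier scores as in the following NumPy sketch, which is not the authors' evaluation code. It assumes labels in {0, 1} and computes AUROC as the probability that a randomly chosen positive receives a higher score than a randomly chosen negative (ties count one half), which equals the area under the ROC curve. The function name is hypothetical.

```python
import numpy as np

def binary_metrics(y_true, y_pred, scores):
    """Accuracy, precision, recall, F1 and AUROC for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.asarray(scores, dtype=float)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUROC as a Mann-Whitney statistic over all positive/negative score pairs
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auroc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "auroc": auroc}
```

For example, `binary_metrics([1, 1, 0, 0], [1, 0, 0, 1], [0.9, 0.4, 0.2, 0.6])` gives 0.5 for accuracy, precision, recall, and F1, and an AUROC of 0.75, because three of the four positive/negative score pairs are correctly ordered.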
3. Conclusion
Although our preliminary analysis demonstrates improved performance, ongoing detailed investigations (Pokhrel et al., 2024) of QML for genomic classification, together with evaluations on larger datasets, will further illuminate the potential of these advancements.
References
- Abbas et al. (2021) Amira Abbas, David Sutter, Christa Zoufal, Aurélien Lucchi, Alessio Figalli, and Stefan Woerner. 2021. The power of quantum neural networks. Nature Computational Science 1, 6 (2021), 403–409.
- Gentinetta et al. (2024) Gian Gentinetta, Arne Thomsen, David Sutter, and Stefan Woerner. 2024. The complexity of quantum support vector machines. Quantum 8 (2024), 1225.
- Grešová et al. (2023) Katarína Grešová, Vlastimil Martinek, David Čechák, Petr Šimeček, and Panagiotis Alexiou. 2023. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data 24, 1 (2023), 25.
- Havlíček et al. (2019) Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. 2019. Supervised learning with quantum-enhanced feature spaces. Nature 567, 7747 (2019), 209–212.
- Pokhrel et al. (2024) Shiva Raj Pokhrel, Naman Yash, Jonathan Kua, Gang Li, and Lei Pan. 2024. Quantum Federated Learning Experiments in the Cloud with Data Encoding. arXiv preprint arXiv:2405.00909 (2024).