Poster: Implementing Quantum Machine Learning in Qiskit for Genomic Sequence Classification

Navneet Singh and Shiva Raj Pokhrel

An Independent Implementation of Quantum Machine Learning Algorithms in Qiskit for Genomic Data

Abstract.

In this paper, we explore the power of Quantum Machine Learning as we extend, implement and evaluate algorithms like Quantum Support Vector Classifier (QSVC), Pegasos-QSVC, Variational Quantum Circuits (VQC), and Quantum Neural Networks (QNN) in Qiskit with diverse feature mapping techniques for genomic sequence classification.

Authors are from IoT & SE Lab, School of IT, Deakin University, Australia; [email protected].

Keywords: Quantum Machine Learning, Quantum Support Vector Classifier (QSVC), Pegasos-QSVC, Variational Quantum Circuits (VQC), Feature Map, Genomic Sequence Classification, Quantum Neural Networks (QNN)

1. Introduction

Quantum Machine Learning (QML) models have advanced rapidly in medical imaging analysis; however, a notable gap remains in genomic sequence classification. This gap prompts our research to evaluate, rethink, and extend the baseline performance of the Quantum Support Vector Classifier (QSVC), Pegasos-QSVC (Gentinetta et al., 2024), the Variational Quantum Classifier (VQC), and the Quantum Neural Network (QNN) (Abbas et al., 2021) over genomic data (Pokhrel et al., 2024).

In this preliminary research, we employ three feature mapping techniques, ZFeatureMap, ZZFeatureMap, and PauliFeatureMap (Havlíček et al., 2019), and we extend, implement, and evaluate QML algorithms in Qiskit on genomic data. Our aim is to analyze the impact of algorithmic and feature mapping parameters for better understanding and application in genomic sequence classification.

2. Background and Implementation

Feature maps facilitate QML models by translating classical data $(x_i, y_i)$ into operational quantum states $\psi$, enabling algorithms to process and analyze information efficiently. The ZZFeatureMap uses the ZZ gate to entangle pairs of qubits, introducing phase factors based on encoded classical data and structuring interactions between qubit pairs according to the data distribution. In contrast, the ZFeatureMap employs the Z gate to introduce phase shifts to individual qubit states based on classical data, rotating qubit states accordingly. The PauliFeatureMap uses combinations of Pauli gates (X, Y, and Z) to encode classical data.
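
To make the encoding concrete, the following minimal sketch (plain Python, not the Qiskit implementation evaluated in this paper) builds the state produced by one repetition of a Z-type feature map, that is, a Hadamard followed by a phase rotation $P(2x_k)$ on each qubit, and evaluates the squared overlap of two encoded points. The single repetition and the helper names `z_feature_state` and `fidelity` are assumptions made for illustration.

```python
import cmath
import math


def z_feature_state(x):
    """Statevector after H and P(2 * x_k) on each qubit (one repetition, assumed)."""
    state = [1.0 + 0j]
    for xk in x:
        # Single-qubit state (|0> + exp(2i * x_k) |1>) / sqrt(2)
        qubit = [1 / math.sqrt(2), cmath.exp(2j * xk) / math.sqrt(2)]
        # Tensor product with the qubits encoded so far
        state = [a * b for a in state for b in qubit]
    return state


def fidelity(psi, phi):
    """Squared magnitude of the inner product <psi|phi>."""
    overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
    return abs(overlap) ** 2


psi = z_feature_state([0.1, 0.2])
phi = z_feature_state([0.4, 0.0])
# For this product-state map the fidelity reduces to prod_k cos^2(x_k - y_k).
print(fidelity(psi, phi))
```

Because this map does not entangle qubits, the overlap factorizes per qubit; the ZZFeatureMap adds pairwise phases, so its overlaps do not factorize in this way.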

Our developed QML algorithms and Qiskit implementations are open source. In our QSVC (Algorithm 1), we begin by encoding data points $\vec{x}_i$ into quantum states $\psi$ using a tailored quantum circuit $\mathcal{C}$ and a feature map with parameters $\theta$, $U_{\text{feature}}(\vec{x}_i, \theta)$. A quantum kernel matrix $K$ is then computed, whose entries are the inner products $\langle\psi(\vec{x}_i), \psi(\vec{x}_j)\rangle$, which facilitates the optimization of the SVM. Training entails solving a dual quadratic programming problem (step 4, Algorithm 1) with a classical solver to find the optimal separating hyperplane for labels $y_i$ in the quantum-enhanced feature space, while prediction involves preparing $\psi$ for new data points and evaluating the decision function $f(\vec{x})$, which incorporates support vectors, kernel evaluations, and a bias term.

Algorithm 1 Quantum Support Vector Classifier (QSVC)
1: Encode data $\vec{x}_i$ into quantum states: $\psi \leftarrow U_{\text{feature}}(\vec{x}_i, \theta)$.
2: Construct $\mathcal{C}$ $\to$ compute kernel matrix $K$.
3: Measure inner product: $K_{ij} \leftarrow \langle\psi(\vec{x}_i), \psi(\vec{x}_j)\rangle$.
4: Optimize support vectors and coefficients $\alpha_i$ and solve the QP problem:
     $\vec{\alpha} \leftarrow \text{solve}\left(\max\left(\sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} y_i y_j \alpha_i \alpha_j K_{ij}\right)\right)$
     subject to: $0 \leq \alpha_i \leq C$ and $\sum_{i=1}^{n}\alpha_i y_i = 0$.
5: Prepare $\psi \leftarrow U_{\text{feature}}(\vec{x}_i, \theta)$.
6: return Decision function $f(\vec{x})$ for prediction:
     $f(\vec{x}) = \text{sign}\left(\sum_{i=1}^{n}\alpha_i y_i K(\vec{x}, \vec{x}_i) - b\right)$
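
Steps 2, 3, and 6 of Algorithm 1 can be illustrated classically. The sketch below is an illustration under stated assumptions, not the evaluated implementation: it replaces circuit measurements with the closed-form fidelity of a one-repetition Z-type feature map, $K(\vec{x}, \vec{y}) = \prod_k \cos^2(x_k - y_k)$, and it uses hypothetical dual coefficients `alpha` and bias `b` instead of solving the QP of step 4. The toy data are invented for the example.

```python
import math


def kernel(x, y):
    # Assumed closed-form fidelity kernel of a one-repetition Z-type feature map.
    return math.prod(math.cos(a - b) ** 2 for a, b in zip(x, y))


def kernel_matrix(X):
    """Gram matrix K_ij = K(x_i, x_j) over the training points."""
    return [[kernel(xi, xj) for xj in X] for xi in X]


def decision(x, X, y, alpha, b):
    """f(x) = sign(sum_i alpha_i * y_i * K(x, x_i) - b), as in step 6."""
    score = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(alpha, y, X)) - b
    return 1 if score >= 0 else -1


# Toy two-class data (hypothetical, not from the genomic dataset).
X = [[0.1, 0.2], [0.2, 0.1], [1.4, 1.5], [1.5, 1.3]]
y = [1, 1, -1, -1]
alpha = [0.5, 0.5, 0.5, 0.5]  # hypothetical; satisfies sum_i alpha_i * y_i = 0
b = 0.0

K = kernel_matrix(X)
print(decision([0.15, 0.15], X, y, alpha, b))
print(decision([1.45, 1.40], X, y, alpha, b))
```

In practice the coefficients would come from a classical QP solver applied to `K`, with the box constraint $0 \leq \alpha_i \leq C$ enforced.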

Algorithm 2 Pegasos-QSVC
1: Encode data $\vec{x}_i$ into quantum states: $\psi \leftarrow U_{\text{feature}}(\vec{x}_i, \theta)$.
2: Construct $\mathcal{C}$ $\to$ compute kernel matrix $K$.
3: for $t = 1$ to $T$ do
4:      Randomly select a subset of samples
5:      Compute $y_i \langle\vec{w}, \phi(\vec{x}_i)\rangle$ for each sample
6:      if $y_i \langle\vec{w}, \phi(\vec{x}_i)\rangle < 1$ then $\vec{w} \leftarrow (1 - \eta\lambda)\vec{w} + \eta y_i \phi(\vec{x}_i)$
7:      else $\vec{w} \leftarrow (1 - \eta\lambda)\vec{w}$
8:      end if
9:      Normalize $\vec{w}$: $\vec{w} \leftarrow \min\left(1, \frac{1/\sqrt{\lambda}}{\|\vec{w}\|}\right)\vec{w}$
10: end for
11: Prepare $\psi \leftarrow U_{\text{feature}}(\vec{x}_i, \theta)$.
12: return Decision function: $f(\vec{x}) = \text{sign}(\langle\vec{w}, \phi(\vec{x})\rangle)$.

The Pegasos-QSVC (Algorithm 2), in our implementation, initializes the qubits and the parameters $\theta$, including the learning rate $\eta$ and regularization factor $\lambda$, and encodes the data in $\psi$ for the computation of the quantum kernel $K$. Over the $T$ iterations of the training loop, a subset of samples is evaluated at each step, and the weight vector $\vec{w}$ is updated according to the margin condition and then normalized for regularization (step 9, Algorithm 2). The predictions $f(\vec{x})$ for new data points are based on their quantum state representations $\psi$, using the final model $\langle\vec{w}, \phi(\vec{x})\rangle$.
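
Since $\phi(\vec{x})$ is only accessible through the kernel, the update of Algorithm 2 is commonly run in kernelized form. The sketch below is a simplified illustration rather than the evaluated implementation: it assumes the step size $\eta_t = 1/(\lambda t)$, represents $\vec{w}$ implicitly through per-sample counts, samples one point per iteration, and omits the normalization of step 9. The kernel is an assumed closed form for a one-repetition Z-type feature map, and the data are invented.

```python
import math
import random


def kernel(x, y):
    # Assumed closed-form fidelity kernel of a one-repetition Z-type feature map.
    return math.prod(math.cos(a - b) ** 2 for a, b in zip(x, y))


def pegasos_train(X, y, lam=0.1, steps=200, seed=0):
    """Kernelized Pegasos: counts[j] records how often sample j violated the margin."""
    rng = random.Random(seed)
    counts = [0] * len(X)
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))  # one randomly selected sample per iteration
        score = sum(c * yj * kernel(X[i], xj) for c, yj, xj in zip(counts, y, X))
        if y[i] * score / (lam * t) < 1:  # margin condition of step 6
            counts[i] += 1
    return counts


def predict(x, X, y, counts):
    """Sign of the implicit <w, phi(x)>; the positive scale 1/(lam*T) is dropped."""
    score = sum(c * yj * kernel(x, xj) for c, yj, xj in zip(counts, y, X))
    return 1 if score >= 0 else -1


# Toy two-class data (hypothetical, not from the genomic dataset).
X = [[0.1, 0.2], [0.2, 0.1], [1.4, 1.5], [1.5, 1.3]]
y = [1, 1, -1, -1]
counts = pegasos_train(X, y)
print([predict(x, X, y, counts) for x in X])
```

The appeal of this scheme is that each iteration needs only the kernel values between the sampled point and the points with nonzero counts, rather than the full kernel matrix.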

Algorithm 3 Variational Quantum Classifier (VQC)
1: Encode data $\vec{x}_i$ into quantum states: $\psi \leftarrow U_{\text{feature}}(\vec{x}_i)$.
2: Construct variational circuit: parameterized gates $U(\vec{\theta})$.
3: Apply $U(\vec{\theta})$ to the encoded states $\psi$.
4: Measurement: $M \leftarrow \text{measure}_{\{|0\rangle, |1\rangle\}}(\psi)$
5: Compute the cost function $C(\vec{\theta})$.
6: while not converged do
7:      Use classical optimizer: $\vec{\theta} \leftarrow \text{optimize}(C(\vec{\theta}))$
8: end while
9: return Optimized parameters: $\vec{\theta}_{\text{opt}}$.

Our implementation of VQC (Algorithm 3) begins by initializing the qubits and $\theta$, followed by encoding the data into qubits with $U_{\text{feature}}(\vec{x}_i)$. A parameterized quantum circuit $U(\vec{\theta})$ is then constructed, with the gate angles serving as the learnable parameters $\vec{\theta}$. The circuit is applied to the encoded states and measured, and the measurement $M$ is used to compute the cost function $C(\vec{\theta})$. A classical optimizer iteratively minimizes this cost by adjusting the gate parameters until convergence, yielding $\vec{\theta}_{\text{opt}}$.
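
The loop of Algorithm 3 can be illustrated with a one-qubit toy model. The sketch below is an assumption-laden stand-in, not the four-qubit Qiskit classifier: the encoding is a single $R_Y(x)$ rotation, the ansatz is a single $R_Y(\theta)$ rotation, the cost is the mean squared error between $\langle Z \rangle$ and labels in $\{+1, -1\}$, and the classical optimizer is plain gradient descent with finite differences. All names and data are hypothetical.

```python
import math


def expectation_z(x, theta):
    """<Z> after RY(x) (encoding) and RY(theta) (ansatz) applied to |0>."""
    half = (x + theta) / 2  # consecutive RY rotations add their angles
    amp0, amp1 = math.cos(half), math.sin(half)  # statevector amplitudes
    return amp0 ** 2 - amp1 ** 2


def cost(theta, xs, ys):
    """Mean squared error between the measured expectation and the labels."""
    return sum((y - expectation_z(x, theta)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, theta=2.0, lr=0.2, tol=1e-6, max_steps=2000, h=1e-5):
    """Gradient descent with central finite differences as the classical optimizer."""
    for _ in range(max_steps):
        grad = (cost(theta + h, xs, ys) - cost(theta - h, xs, ys)) / (2 * h)
        if abs(grad) < tol:  # convergence test of the while loop
            break
        theta -= lr * grad
    return theta


# Toy one-feature data (hypothetical): class +1 near 0, class -1 near pi.
xs = [0.1, 0.3, 2.8, 3.0]
ys = [1, 1, -1, -1]
theta_opt = train(xs, ys)
predictions = [1 if expectation_z(x, theta_opt) >= 0 else -1 for x in xs]
print(theta_opt, predictions)
```

Any classical optimizer can take the place of `train`; only evaluations of the cost, which on hardware come from repeated measurements, are required.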

Algorithm 4 Quantum Neural Network (QNN) Operation
1: Encode $\vec{x}_i$ into quantum states: $\psi \leftarrow U_{\text{feature}}(\vec{x}_i)$.
2: for each layer $l$ from $1$ to $L$ do
3:      Perform unitary transformation: $U_l(\theta_l)$
4: end for
5: Perform CNOT (or custom entangling gate)
6: Measure output qubits and obtain $|\Phi\rangle$
7: while $E > \varepsilon$ do
8:      Renew $\theta$ (quantum gradient descent): $\theta = \theta - \eta\nabla_{\theta}E$
9:      $E = |\text{expected\_output} - \text{measurement}(\Phi)|^2$
10: end while

Our implementation of the QNN (Algorithm 4) initializes the qubits and $\theta$, encoding data into quantum states $\psi$ using $U_{\text{feature}}(\vec{x}_i)$. Through parameterized quantum gates across the layers $l$, we process the data, capture correlations through entanglement, and obtain $\Phi$ by measuring the qubits. In training, a quantum version of backpropagation adjusts $\theta$ based on the error $E$ and the learning rate $\eta$ until $E$ falls below the threshold $\varepsilon$.
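
The training loop of Algorithm 4 can be sketched with a small two-qubit simulation. The example below is illustrative only and rests on several assumptions: $R_Y$ rotations serve as both the encoding and the layer unitaries, a single CNOT follows the layers, the output is $\langle Z \rangle$ on the second qubit, and gradients are obtained with the parameter-shift rule. The parity-style data and all function names are hypothetical.

```python
import math


def apply_ry(state, qubit, angle):
    """Apply RY(angle) to one qubit of a real 2-qubit statevector (index = 2*q0 + q1)."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    step = 2 if qubit == 0 else 1
    new = state[:]
    for i in range(4):
        if (i // step) % 2 == 0:  # basis states where this qubit is 0
            j = i + step
            new[i] = c * state[i] - s * state[j]
            new[j] = s * state[i] + c * state[j]
    return new


def measure(x, theta):
    """Encode x, apply the layers and a CNOT, and return <Z> on qubit 1."""
    state = [1.0, 0.0, 0.0, 0.0]
    for q in range(2):
        state = apply_ry(state, q, x[q])  # feature encoding
    for layer in theta:
        for q in range(2):
            state = apply_ry(state, q, layer[q])  # U_l(theta_l)
    state[2], state[3] = state[3], state[2]  # CNOT: control q0, target q1
    return state[0] ** 2 - state[1] ** 2 + state[2] ** 2 - state[3] ** 2


def error(theta, data):
    """E: mean of |expected_output - measurement|^2 over the data."""
    return sum((y - measure(x, theta)) ** 2 for x, y in data) / len(data)


def gradient(theta, data):
    """Gradient of E with respect to every angle via the parameter-shift rule."""
    grad = [[0.0, 0.0] for _ in theta]
    for l in range(len(theta)):
        for q in range(2):
            plus = [row[:] for row in theta]
            minus = [row[:] for row in theta]
            plus[l][q] += math.pi / 2
            minus[l][q] -= math.pi / 2
            for x, y in data:
                dm = (measure(x, plus) - measure(x, minus)) / 2
                grad[l][q] += 2 * (measure(x, theta) - y) * dm / len(data)
    return grad


def train(theta, data, eta=0.5, eps=1e-4, max_steps=500):
    steps = 0
    while error(theta, data) > eps and steps < max_steps:
        grad = gradient(theta, data)
        theta = [[t - eta * g for t, g in zip(tr, gr)] for tr, gr in zip(theta, grad)]
        steps += 1
    return theta


# Toy parity-style data (hypothetical): label +1 when both features agree.
data = [([0.0, 0.0], 1), ([0.0, math.pi], -1),
        ([math.pi, 0.0], -1), ([math.pi, math.pi], 1)]
theta0 = [[0.4, -0.3], [0.3, 0.5]]  # two layers, one angle per qubit
theta_fit = train(theta0, data)
print(error(theta0, data), error(theta_fit, data))
```

The parameter-shift rule is used here because it needs only circuit evaluations at shifted angles, which is what a quantum device can provide.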

We use the demo coding vs. intergenomic sequences dataset (Grešová et al., 2023), which contains sequences of different transcript types, in our experiments with four qubits, allowing a thorough evaluation of the four proposed algorithms. A subset of 100,000 genome sequences (divided into training and test sets) is selected to evaluate the QML models. The data are first transformed into numerical form through text vectorization and reduced in dimensionality using PCA, and are then encoded into quantum states using the feature mapping techniques, preparing them for processing by the QML models.
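
As a rough illustration of the vectorization step only, the sketch below counts k-mers in a DNA string and rescales each feature to $[0, \pi]$ so that it could serve as a rotation angle. The choice of k-mer counting, the value $k = 2$, the rescaling range, and the toy sequences are all assumptions made for the example; the PCA reduction to four features is not shown.

```python
import itertools
import math


def kmer_counts(sequence, k=2):
    """Count occurrences of every k-mer over the alphabet ACGT (fixed order)."""
    vocab = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip k-mers with ambiguous bases such as N
            counts[kmer] += 1
    return [counts[m] for m in vocab]


def scale_to_angles(rows):
    """Min-max scale each column to [0, pi]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [math.pi * (v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


sequences = ["ACGTACGT", "GGGCCCAA", "ATATATAT"]  # toy strings, not from the dataset
features = [kmer_counts(s) for s in sequences]
angles = scale_to_angles(features)
print(len(features[0]), angles[2][3])
```

With $k = 2$ each sequence yields 16 features, so a dimensionality reduction step is still needed before a four-qubit feature map can be applied.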

Figure 1. Convergence of QNN & VQC objective functions.

Figure 1 illustrates the improved convergence of the objective function during training for QNN and VQC with different feature mapping schemes, compared to (Abbas et al., 2021). Our detailed evaluation with classification metrics is shown in Table 1, illustrating substantial improvement compared to (Abbas et al., 2021), (Gentinetta et al., 2024), and (Pokhrel et al., 2024) in accuracy (train and test), precision, recall, F1-score, and AUROC (Area Under the Receiver Operating Characteristic curve), as well as in distinguishing between the two classes of genome sequences.

[Uncaptioned image]

3. Conclusion

Although our preliminary analysis demonstrates improved performance, ongoing detailed investigations (Pokhrel et al., 2024) in the realm of QML for genomic classification and evaluations with larger datasets will further illuminate the potential of these advancements.

References

  • Abbas et al. (2021) Amira Abbas, David Sutter, Christa Zoufal, Aurélien Lucchi, Alessio Figalli, and Stefan Woerner. 2021. The power of quantum neural networks. Nature Computational Science 1, 6 (2021), 403–409.
  • Gentinetta et al. (2024) Gian Gentinetta, Arne Thomsen, David Sutter, and Stefan Woerner. 2024. The complexity of quantum support vector machines. Quantum 8 (2024), 1225.
  • Grešová et al. (2023) Katarína Grešová, Vlastimil Martinek, David Čechák, Petr Šimeček, and Panagiotis Alexiou. 2023. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data 24, 1 (2023), 25.
  • Havlíček et al. (2019) Vojtěch Havlíček, Antonio D Córcoles, Kristan Temme, Aram W Harrow, Abhinav Kandala, Jerry M Chow, and Jay M Gambetta. 2019. Supervised learning with quantum-enhanced feature spaces. Nature 567, 7747 (2019), 209–212.
  • Pokhrel et al. (2024) Shiva Raj Pokhrel, Naman Yash, Jonathan Kua, Gang Li, and Lei Pan. 2024. Quantum Federated Learning Experiments in the Cloud with Data Encoding. arXiv preprint arXiv:2405.00909 (2024).