This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Adiabatic Quantum Support Vector Machines

Prasanna Date    Dong Jun Woun    Kathleen Hamilton    Eduardo A. Coello Perez    Mayanka Chandra Shekhar    Francisco Rios    John Gounley    In-Saeng Suh    Travis Humble    Georgia Tourassi
Abstract

Adiabatic quantum computers can solve difficult optimization problems (e.g., the quadratic unconstrained binary optimization problem), and they seem well suited to train machine learning models. In this paper, we describe an adiabatic quantum approach for training support vector machines. We show that the time complexity of our quantum approach is an order of magnitude better than the classical approach. Next, we compare the test accuracy of our quantum approach against a classical approach that uses the Scikit-learn library in Python across five benchmark datasets (Iris, Wisconsin Breast Cancer (WBC), Wine, Digits, and Lambeq). We show that our quantum approach obtains accuracies on par with the classical approach. Finally, we perform a scalability study in which we compute the total training times of the quantum approach and the classical approach with increasing number of features and number of data points in the training dataset. Our scalability results show that the quantum approach obtains a 3.5–4.5×\times speedup over the classical approach on datasets with many (millions of) features.

Quantum Machine Learning, Quantum AI, Support Vector Machine, Machine Learning, Quantum Computing

1 Introduction

In the 21st century, computer science has witnessed the rise of machine learning technologies and an explosion of their application areas (LeCun et al., 2015). Machine learning applications now run on all devices—ranging from edge devices such as smartphones to large supercomputing systems (Date, 2019). When developing any end-to-end machine learning application, training the machine learning model takes a significant amount of time and is a major bottleneck (Munson, 2012). Training a machine learning model generally refers to obtaining a set of optimal learning parameters that minimize a well-defined error function (i.e., an optimization problem) (Abu-Mostafa et al., 2012).

Although algorithms for solving optimization problems exist on classical computers, quantum computers are thought to be better at solving them (Moll et al., 2018). Interest in quantum computing has been growing rapidly in recent years as Moore’s law reaches its inevitable conclusion—a scenario in which classical computing can no longer sustain exponential leaps in performance (Theis & Wong, 2017). For certain problems, some quantum algorithms outperform their classical contemporaries, including the Fourier transformation (Coppersmith, 2002), integer factorization (Shor, 1994), and database search (Grover, 1996), and have attained quantum supremacy (Arute et al., 2019). Developing machine learning algorithms for quantum computing should result in shorter training times (Perdomo-Ortiz et al., 2018), but this approach must be tested thoroughly.

To this end, we quantify the gains obtained from using an adiabatic quantum computer (AQC) to train support vector machines (SVM). AQCs are adept at approximately solving the quadratic unconstrained binary optimization (QUBO) problem, which is known to be NP-hard (Kochenberger et al., 2014). In this paper, we formulate SVM training as a QUBO problem and use the D-Wave Advantage AQC to solve it. The main contributions of this work are as follows:

  1. 1.

    We formulate an adiabatic quantum approach for training SVMs.

  2. 2.

    We show that the time complexity of our quantum approach is an order of magnitude faster than the current classical approach.

  3. 3.

    We compare the test accuracies of our quantum approach to those of the classical approach across five benchmark datasets: Iris, Wisconsin Breast Cancer (WBC), Wine, Digits, and Lambeq. Our results show that the quantum approach obtains accuracies on par with the classical approach.

  4. 4.

    We show that the quantum approach is 3.5–4.5×\times faster than the classical approach for training SVMs on large (millions of features) synthetic datasets.

2 Related Work

The incorporation of quantum information and quantum computing is expected to have a significant effect on machine learning (Pudenz & Lidar, 2013; Dunjko et al., 2016; Biamonte et al., 2017). A hallmark of quantum algorithms is leveraging physical phenomena such as superposition and entanglement in addition to an exponentially large Hilbert space for computational advantages (Humble et al., 2022; Delgado et al., 2022). Two main areas of research in quantum machine learning are the development of quantum algorithms that can speed up the training of classical models and the development of quantum models that act as parameterized learning models (Dunjko & Briegel, 2018; Benedetti et al., 2019).

Quantum machine learning has been explored primarily on universal quantum computers. For example, the Harrow-Hassim-Lloyd algorithm (Harrow et al., 2009) is used to speed up matrix inversion. Jordan’s algorithm (Jordan, 2005) was adapted by Gilyen et al. (Gilyén et al., 2019) to leverage the quantum Fourier transform to compute the gradient of a classical function with a sublinear number of steps. Date proposed the quantum discriminator, which is a quantum discriminant model used for binary classification (Date & Smith, 2024). Quiroga et al. propose the quantum k-means classifier on the IBM quantum computers for discriminating quantum states on hardware (Quiroga et al., 2021).

Quantum gradient descent methods have also seen development (Rebentrost et al., 2019). Furthermore, the general training of machine learning classifiers has been studied, and these investigations show that sublinear training is possible (Li et al., 2019). However, these algorithms require either fault-tolerant quantum systems or quantum registers that far exceed current quantum hardware. Li et al. propose the ST-VQC algorithm to integrate non-linearity in quantum learning and improve the robustness of the learning model to noise. The number of qubits available on near-term, gate-based hardware has limited the direct application of quantum computing to standard machine learning benchmarks. Quantum classifiers that can analyze standard benchmark datasets (e.g., Wine, Sonar) have been studied via numerical simulations (Schuld et al., 2018), but studies that use hardware either require compression methods to reduce the number of training features to a tractable size or are limited to a small number of features (Havlíček et al., 2019). The size of qubit registers continues to increase, and the quality of gates continues to improve; as a result, exploring the applications of parameterized circuit models remains an active area of research (Benedetti et al., 2019).

Instead of implementing a quantum algorithm as a series of unitary gates applied to a qubit register, the D-Wave processor encodes a problem into a system of interacting quantum spins. Through a gradual change in the system’s Hamiltonian, the process of quantum annealing is used to find the ground state (and correspondingly the solution) of the encoded problem. Casting machine learning tasks as AQC problems through the use of QUBOs is different from constructing a quantum circuit (Pudenz & Lidar, 2013). But, because of the similarities between QUBOs and Ising spin models (Lucas, 2014), several QUBO-based analogues of restricted Boltamann machines, deep belief networks, and Hopfield networks have been proposed in the literature (Amin et al., 2018; Date et al., 2019b). Chen et al. use the D-Wave system to solve an NP-hard problem that pertains to energy-efficient routing in wireless sensor networks. QUBO-based implementations of conventional machine learning models (e.g., linear regression, k-means clustering, SVMs) have also been developed for the D-Wave platform (Amin et al., 2018; Date et al., 2019b; Arthur & Date, 2021; Date & Potok, 2021; Date et al., 2021). The strength of quantum annealing lies in the ability to solve difficult optimization problems, and this ability means the D-Wave platform could speed up training for classical machine learning models (Neven et al., 2012; Adachi & Henderson, 2015; Willsch et al., 2019).

Recent efforts have explored quantum approaches for SVMs as well. Otgonbaatar successfully demonstrated training SVMs on large remote sensing data by using the D-Wave quantum annealer (Otgonbaatar et al., 2022). To train large datasets, his team employed corsets, which are smaller representative subsets of the data. Barbosa analyzed the factors that contribute to the difficulty of solving a Maximum Clique problem on the D-Wave quantum computer (Barbosa et al., 2021). Lee investigated more effective ways to formulate QUBO problems for linear systems on the D-Wave AQCs (Lee et al., 2022), and Simoes’s research shows that quantum SVMs and neural networks trained on universal quantum computers can achieve higher accuracy than classical approaches (Simões et al., 2023).

3 SVMs

We use the following notation throughout this paper:

  • \mathbb{R}, \mathbb{N}, 𝔹\mathbb{B}: Set of real, natural, and binary numbers, respectively.

  • XX: Training dataset, XN×dX\in\mathbb{R}^{N\times d}; N,dN,d\in\mathbb{N}.

  • YY: Labels for binary classification; yiy_{i} is +1+1 (1-1) if data point xiXx_{i}\in X belongs to the first (second) class.

  • ww, bb: Weights and bias of the SVM; wdw\in\mathbb{R}^{d}, bb\in\mathbb{R}.

Refer to caption
Figure 1: SVMs.

We now state the SVM training problem shown in Figure 1, where we have two classes of data shown by red pluses and blue circles. We would like to find a separating hyperplane, H:wTx+b=0H:w^{T}x+b=0, that maximizes the distance between the nearest points that belong to the two classes. Hyperplanes H1:wTx+b=+1H_{1}:w^{T}x+b=+1 and H2:wTx+b=1H_{2}:w^{T}x+b=-1 are parallel to the separating hyperplane HH and demarcate the boundary of the corridor in which HH lies. H1H_{1} and H2H_{2} can be thought of as the hyperplanes in which the first (H1H_{1}) and second (H2H_{2}) classes begin. The orthogonal distance between H1H_{1} and H2H_{2} is given by 2w2\frac{2}{||w||^{2}}, which we would like to maximize. This is equivalent to minimizing w2||w||^{2}.

We must also ensure that all the points are classified correctly. All the red pluses in Figure 1 belong to the first class (i.e., yi=+1y_{i}=+1). All the blue circles in Figure 1 belong to the second class (i.e., yi=1y_{i}=-1). We must ensure that we have wTxi+b1w^{T}x_{i}+b\geq 1 for all the points in the first class and wTxi+b1w^{T}x_{i}+b\leq-1 for all the points in the second class. By incorporating these constraints for all points in the training dataset, the SVM training problem can be stated as follows:

minw,bw2\displaystyle\min_{w,b}\ ||w||^{2} (1)
subject to: yi(wTxi+b)1i=1,2,,N\displaystyle y_{i}(w^{T}x_{i}+b)\geq 1\qquad\forall i=1,2,\ldots,N

The objective function is convex because its Hessian is positive semi-definite. Furthermore, the constraints are linear, and hence, convex. Problem 1 is therefore a quadratic programming problem. To solve Problem 1, we first compute the Lagrangian dual as follows:

maxw,b,λ(w,b,λ)=w2i=1Nλi[yi(wTxi+b)1],\displaystyle\max_{w,b,\lambda}\ \mathcal{L}(w,b,\lambda)=||w||^{2}-\sum_{i=1}^{N}\lambda_{i}\left[y_{i}(w^{T}x_{i}+b)-1\right], (2)

where, λ\lambda is the vector that contains all the Lagrangian multipliers (i.e., λ=[λ1λ2λN]T\lambda=[\lambda_{1}\ \lambda_{2}\ \cdots\ \lambda_{N}]^{T}, and λi0i\lambda_{i}\geq 0\ \forall i). The non-zero Lagrangian multipliers in the final solution correspond to the support vectors and determine the hyperplanes H1H_{1} and H2H_{2} in Figure 1. The Lagrangian dual problem (Eq. 2) is solved in 𝒪(N3)\mathcal{O}(N^{3}) time on classical computers by applying the Karush-Kuhn-Tucker (KKT) conditions (Karush, 1939; Kuhn & Tucker, 2014). As part of the KKT conditions, we set the gradient of (w,b,λ)\mathcal{L}(w,b,\lambda) with respect to ww to 0. We also set the partial derivative of (w,b,λ)\mathcal{L}(w,b,\lambda) with respect to bb to zero. Doing so yields the following:

w(w,b,λ)\displaystyle\nabla_{w}\mathcal{L}(w,b,\lambda) =wi=1Nλiyixi=0\displaystyle=w-\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i}=0
w\displaystyle\therefore\quad w =i=1Nλiyixi\displaystyle=\sum_{i=1}^{N}\lambda_{i}y_{i}x_{i} (3)
(w,b,λ)b\displaystyle\frac{\partial\mathcal{L}(w,b,\lambda)}{\partial b} =i=1Nλiyi=0\displaystyle=-\sum_{i=1}^{N}\lambda_{i}y_{i}=0
i=1Nλiyi\displaystyle\therefore\quad\sum_{i=1}^{N}\lambda_{i}y_{i} =0\displaystyle=0 (4)

Substituting Eqs. 3 and 4 into Eq. 2, we have

(λ)=i=1Nλi12i=1Nj=1Nλiλjxixjyiyj\displaystyle\mathcal{L}(\lambda)=\sum_{i=1}^{N}\lambda_{i}-\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_{i}\lambda_{j}x_{i}x_{j}y_{i}y_{j} (5)

Note that Eq. 5 is a function of λ\lambda only. We want to maximize Eq. 5 for the Lagrangian multipliers and ensure that λi,λj0i,j\lambda_{i},\lambda_{j}\geq 0\quad\forall i,j while satisfying Eq. 4.

4 Formulation for AQCs

AQCs are adept at solving QUBO problems, which are NP-hard and defined as follows:

minz𝔹MzTAz+zTb,\displaystyle\min_{z\in\mathbb{B}^{M}}z^{T}Az+z^{T}b, (6)

where z𝔹Mz\in\mathbb{B}^{M} is the binary decision vector (MM\in\mathbb{N}), AM×MA\in\mathbb{R}^{M\times M} is the QUBO matrix, and bMb\in\mathbb{R}^{M} is the QUBO vector.

To convert the SVM training problem into a QUBO problem, we write Eq. 5 as a minimization problem:

minλ(λ)\displaystyle\min_{\lambda}\ \mathcal{L}(\lambda) =12i=1Nj=1Nλiλjxixjyiyji=1Nλi\displaystyle=\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\lambda_{i}\lambda_{j}x_{i}x_{j}y_{i}y_{j}-\sum_{i=1}^{N}\lambda_{i} (7)
λi,λj\displaystyle\lambda_{i},\lambda_{j} 0i,j\displaystyle\geq 0\quad\forall i,j

This can be written in matrix form as follows:

minλ(λ)=12λT(XXTYYT)λλT1Nλ0N,\displaystyle\min_{\lambda}\mathcal{L}(\lambda)=\frac{1}{2}\lambda^{T}(XX^{T}\odot YY^{T})\lambda-\lambda^{T}1_{N}\qquad\lambda\geq 0_{N}, (8)

where 1N1_{N} and 0N0_{N} represent NN-dimensional vectors of ones and zeros, respectively, and \odot is the element-wise multiplication operation.

We now introduce a KK-dimensional precision vector, P=[p1,p2,,pK]TP=[p_{1},p_{2},\ldots,p_{K}]^{T}, where each entry pkp_{k} is a power of 22. This is required to impose the non-negativity constraint on λ\lambda. The precision vector must also be sorted. For example, a precision vector could be P=[14,12,1,2,]TP=\left[\frac{1}{4},\frac{1}{2},1,2,\right]^{T}. We also introduce KK binary variables λ^ik\hat{\lambda}_{ik} for each Lagrangian multiplier such that

λi\displaystyle\lambda_{i} =k=1Kpkλ^iki=1,2,,N,\displaystyle=\sum_{k=1}^{K}p_{k}\hat{\lambda}_{ik}\qquad\forall i=1,2,\ldots,N, (9)

where pkp_{k} denotes the kthk^{\text{th}} entry in the precision vector PP. Next, we vertically stack all binary variables:

λ^\displaystyle\hat{\lambda} =[λ^11λ^1Kλ^21λ^2Kλ^N1λ^NK]T\displaystyle=[\hat{\lambda}_{11}\ \ldots\ \hat{\lambda}_{1K}\ \hat{\lambda}_{21}\ \ldots\ \hat{\lambda}_{2K}\ \ldots\ \hat{\lambda}_{N1}\ \ldots\ \hat{\lambda}_{NK}]^{T} (10)

We now define a precision matrix as follows:

𝒫=INPT\displaystyle\mathcal{P}=I_{N}\otimes P^{T} (11)

Notice that

λ\displaystyle\lambda =𝒫λ^\displaystyle=\mathcal{P}\hat{\lambda} (12)

Finally, we substitute the value of λ\lambda from Eq. 12 into Eq. 8:

minλ^𝔹NK(λ^)=12λ^T𝒫T(XXTYYT)𝒫λ^λ^T𝒫T1N\displaystyle\min_{\hat{\lambda}\in\mathbb{B}^{NK}}\ \mathcal{L}(\hat{\lambda})=\frac{1}{2}\hat{\lambda}^{T}\mathcal{P}^{T}(XX^{T}\odot YY^{T})\mathcal{P}\hat{\lambda}-\hat{\lambda}^{T}\mathcal{P}^{T}1_{N} (13)

Equation 13 is identical to Eq. 6 with z=λ^z=\hat{\lambda}, A=12𝒫T(XXTYYT)𝒫A=\frac{1}{2}\mathcal{P}^{T}(XX^{T}\odot YY^{T})\mathcal{P}, b=𝒫T1Nb=-\mathcal{P}^{T}1_{N}, and M=KNM=KN. Hence, we converted the SVM training problem from Eq. 2 into a QUBO problem in Eq. 13, which can then be solved on AQCs.

4.1 Computational Complexity

We begin our theoretical analysis by defining the space complexity for the number of qubits needed to solve the QUBO. The SVM training problem stated in Eq. 8 contains 𝒪(N)\mathcal{O}(N) variables (λ\lambda) and 𝒪(Nd)\mathcal{O}(Nd) data (XX and YY). The QUBO formulation of the SVM training problem stated in Eq. 13 consists of the same amount of data. However, as part of the QUBO formulation, we introduced KK binary variables for each Lagrangian multiplier in the original problem (Eq. 8). So, the total number of variables in Eq. 13 is 𝒪(KN)\mathcal{O}(KN). So, the qubit footprint (or space complexity) of this formulation would be 𝒪(N2K2)\mathcal{O}(N^{2}K^{2}) after embedding onto the hardware.

The time complexity of classical SVM algorithms is 𝒪(N3)\mathcal{O}(N^{3}) (Bottou & Lin, 2007). We analyze the time complexity for our approach in three parts. First, the time complexity of converting Problem 1 into a QUBO problem can be inferred from Eqs. 7 and 9 as 𝒪(N2K2)\mathcal{O}(N^{2}K^{2}). Because we have 𝒪(NK)\mathcal{O}(NK) variables in the QUBO formulation, embedding can be done in 𝒪(N2K2)\mathcal{O}(N^{2}K^{2}) time by using the embedding algorithm proposed by Date et al. (Date et al., 2019a). Although the theoretical time complexity of quantum annealing used to obtain an exact solution is exponential (𝒪(ed)\mathcal{O}(e^{\sqrt{d}})) (Mukherjee & Chakrabarti, 2015), a more realistic estimate of the running time can be made by using measures such as ST99 and ST99(OPT) (Wang & Jonckheere, 2019), which give the expected number of iterations to reach a certain level of optimality with 99%99\% certainty. Quantum annealing performs well on problems in which the energy barriers between local optima are tall and narrow because such an energy landscape is more conducive to quantum tunneling. To estimate ST99 and ST99(OPT) for our approach, details on specific instances of the SVM problem are required. Estimating ST99 and ST99(OPT) for generic QUBO formulation of the SVM problem is beyond the scope of the present work.

That said, we would like to shed some light on the quantum annealing running times observed in practice. An AQC can accommodate only finite-sized problems—for example, D-Wave 2000Q can accommodate problems with 64 or fewer binary variables that require all-to-all connectivity (Date et al., 2019a). For problems within this range, a constant annealing time and a constant number of repetitions seem to work well in practice. So, the total time to convert and solve a linear regression problem on an AQC would be 𝒪(N2K2)\mathcal{O}(N^{2}K^{2}).

Note that the qubit footprint 𝒪(N2K2)\mathcal{O}(N^{2}K^{2}) and time complexity 𝒪(N2K2)\mathcal{O}(N^{2}K^{2}) assume that KK is a variable. If the precision for all parameters (λ^\hat{\lambda}) is fixed (e.g., limited to 3232-bit or 6464-bit precision), then KK becomes a constant factor. The resulting qubit footprint would be 𝒪(N2)\mathcal{O}(N^{2}), and time complexity would also be be 𝒪(N2)\mathcal{O}(N^{2}). This time complexity is an order of magnitude better than the classical algorithm (𝒪(N3)\mathcal{O}(N^{3})).

5 Empirical Analysis

5.1 Methodology and Performance Metrics

Our investigation compares the accuracy of the classical SVM implemented in Scikit-learn with the quantum SVM that utilizes the D-Wave Advantage AQC. In addition to the quantum approach, we also consider simulated annealers to compare accuracy. Scikit-learn solves SVMs by using the sequential minimal optimization algorithm, whereas the quantum approach solves SVMs by transforming the problem into a QUBO problem. In our study, we convert Problem 1 to a Lagrangian dual (Problem 8) and then into a QUBO (Problem 13) for the quantum and simulated annealers. Our research evaluates the performance of the quantum approach, the classical approach, and the simulated annealing approach across two key metrics: (i) accuracy of the trained model and (ii) total compute time. We calculate the accuracy by dividing the number of correct classifications by the total number of samples. For the classical approach, we measure the compute time as the time taken for training. In contrast, the compute time for the quantum approach encompasses the time required for converting the problem into a QUBO problem, embedding the QUBO problem onto the quantum hardware, and performing quantum annealing to solve the problem. Notably, we observed that the simulated annealers failed to find a viable solution, especially when there were many features, so we decided not to examine the computing time scalability for simulated annealers.

5.2 Hardware Configuration

We conducted the implementations of the classical approach, simulated annealing, and preprocessing stage on two different hardware configurations without altering the underlying methodology. The first system, equipped with an AMD Ryzen 4600H 6-core CPU running at 3 GHz and 16 GB of DDR4 RAM running at 2,666 MHz, was used to evaluate the accuracy results described in Section 5.4. The second system, equipped with an Intel Xeon E5-2690v4 14-core CPU running at 2.6 GHz and 64 GB of DDR4 RAM running at 2,400 MHz, was used to evaluate the compute times presented in Section 5.6. We used the two different machines for the accuracy and compute time comparisons because the AMD machine could not support the larger synthetic datasets used in the compute time experiments.

We used the D-Wave Advantage quantum annealer to evaluate the quantum approach; the annealer offers 5,627 functioning qubits. After preprocessing the problem on a classical computer, we employed the EmbeddingComposite class from the D-Wave library to embed the problem onto the quantum hardware. We conducted 10 annealing runs for each problem instance to ensure accuracy and selected the solution with the lowest energy sample.

5.3 Datasets

We compare the classical, quantum, and simulated annealing approaches across the following datasets: synthetic datasets, Iris, WBC, Wine, Digits, and Lambeq. When datasets were split for training and testing, the training data was uniformly split per class. For generating the synthetic datasets, we employ two methods. The first method utilizes the make_blob function from the Scikit-learn library. This function generates synthetic data with two linearly separable centers and dd-dimensional sides. In the second method, we implement a function that generates data points and their corresponding class labels based on the following specified parameters: number of data points, number of features, SVM weights, and SVM bias. We generate each data point by sampling the feature values uniformly at random in the range [-1, 1]. We then classify these points based on a linear decision boundary. This process iteratively generates the data until we have the desired number of data points. This approach allows us to generate synthetic datasets that, when trained with an SVM, result in predictable hyperplanes. The Iris dataset contains 150 data points with four features, and the WBC dataset consists of 369 data points with 30 features. The Wine dataset contains 178 data points with 13 features for three classes, and the Digits dataset holds 1,797 data points with 64 features and 10 classes, but we only used two classes of 168 data points.

Refer to caption
Figure 2: Comparison of hyperplanes created by the support vectors with Scikit-learn (+), simulated annealing(- -), and D-Wave (—) on positive synthetic data (blue and green circles).
Refer to caption
Figure 3: Comparison of hyperplanes created by the support vectors with Scikit-learn (+), simulated annealing(- -) and D-Wave (—) on negative synthetic data (blue and green circles).
Refer to caption
Figure 4: Comparison of hyperplanes created by the support vectors with Scikit-learn (+), simulated annealing(- -), and D-Wave (—) on random synthetic data (blue and green circles split by classification).
Table 1: Training and Testing Accuracy of Datasets
Data Trained Tested Scikit-learn Scikit-learn D-Wave D-Wave Simulated Annealing Simulated Annealing
Points Points Training Accuracy Testing Accuracy Training Accuracy Testing Accuracy Training Accuracy Testing Accuracy
Synthetic positive 20 100 100 100 100 100 100 99.6 ± 0.8
Synthetic negative 20 100 100 100 100 100 100 100
Synthetic random 20 100 100 100 100 100 100 100
Setosa - Virginica 20 80 100 100 100 100 100 99.6 ± 1.2
Setosa - Versicolor 20 80 100 99.6 ± 0.6 100 99.5 ± 1.6 100 99.7 ± 0.5
Versicolor - Virginica 20 80 90.5 ± 6 85.6 ± 3.3 70.5 ± 33.1 67.9 ± 28.8 93.5 ± 6.7 88.1 ± 6.7
WBC 52 517 97.7 ± 1.2 95 ± 1.4 93.3 ± 21 93.1 ± 1.3 91.7 ± 2.6 92.6 ± 1.4
Wine 0—1 52 55 100 100 100 100 99 ± 1.4 99.5 ± 1.7
Wine 0—2 52 78 100 98.2 ± 1.8 96 ± 2.1 96 ± 1.3 95 ± 2.1 94.7 ± 2.3
Wine 1—2 52 67 99.2 ± 1 96 ± 1.7 98.5 ± 1.5 96.6 ± 2.4 96 ± 4 94.2 ± 4.5
Digits 0—1 20 158 100 99.2 ± 1.3 99 ± 2.1 97.5 ± 2.2
Lambeq 52 30 100 100 93.5 ± 2.9 88.7 ± 4.8

5.4 Accuracy

We compare the classification accuracy of three distinct SVM approaches: the classical SVM, simulated annealing SVM, and quantum SVM. We employ multiple datasets and conduct 10 rounds of training and testing for each dataset. In many of these datasets, we use only 52 points chosen uniformly at random for training the classifiers because 52 is the largest size of the training dataset that could be accommodated on the annealer. The accuracy results are presented in Table 1.

5.4.1 Synthetic Data

We parameterize the synthetic datasets into three types: positive, negative, and random. Positive datasets are designed to be linearly separable when separated by a positive hyperplane (see Figure 2). Conversely, negative datasets are designed to be separed by a negative hyperplane (see Figure 3). To achieve this, we classify randomly sampled points based on a linear decision boundary. Additionally, random datasets with linearly separable data points are generated using the Scikit-learn make_blob function (see Figure 4). Before training the SVM models, we ensure that all data points are normalized. Each training dataset consistently contains 20 points with two features. For testing accuracies, we use 100 points. We provide a detailed comparison of scaling data points and features later in this work.

All three approaches—classical SVM, simulated annealing SVM, and quantum SVM—accurately identify the support vectors for the synthetic datasets. Figures 2, 3, and 4 illustrate the datasets and corresponding hyperplanes constructed by the support vectors. Both classical SVM and quantum SVM exhibit 100%100\% accuracy during training and testing across all three datasets. The simulated annealing approach also achieved 100%100\% accuracy on the negative and random datasets. For the positive datasets, the simulated annealing approach achieved an average accuracy of 99.6%99.6\%. These results highlight the effectiveness of all three methods in accurately identifying support vectors.

Table 2: Training and Testing Accuracy of Varying Annealing Time
Data Annealing Trained Tested Scikit-learn Scikit-learn D-Wave D-Wave Simulated Annealing Simulated Annealing
Time(µs) Points Points Training Accuracy Testing Accuracy Training Accuracy Testing Accuracy Training Accuracy Testing Accuracy
Versicolor - Virginica 20 52 48 93.8 ± 3.1 94.2 ± 3.2 87.5 ± 5.4 86.2 ± 6.4 87.5 ± 4.6 85.4 ± 4.7
Versicolor - Virginica 100 52 48 95.0 ± 2.3 93.1 ± 4.3 88.1 ± 4.1 85.8 ± 5.8 87.1 ± 3.4 86.7 ± 5.0
Versicolor - Virginica 1000 52 48 93.7 ± 2.4 95.0 ± 2.2 86.7 ± 2.8 89.8 ± 4.0 88.1 ± 3.2 89.0 ± 3.9
WBC 20 52 517 95.6 ± 2.0 95.7 ± 0.9 90.8 ± 3.9 92.7 ± 1.2 91.0 ± 3.4 93.2 ± 0.7
WBC 100 52 517 97.3 ± 2.6 95.4 ± 1.3 90.6 ± 3.2 92.9 ± 1.1 91.3 ± 2.8 93.2 ± 1.0
WBC 1000 52 517 96.7 ± 1.3 95.4 ± 1.1 91.3 ± 3.9 92.2 ± 1.1 91.7 ± 3.1 92.4 ± 1.7
Wine 1—2 20 52 67 97.9 ± 1.9 97.2 ± 2.0 97.3 ± 2.7 97.0 ± 1.4 95.0 ± 4.3 96.3 ± 3.3
Wine 1—2 100 52 67 98.7 ± 1.8 96.3 ± 1.8 97.5 ± 2.6 96.0 ± 2.9 96.7 ± 3.6 96.7 ± 2.2
Wine 1—2 1000 52 67 98.8 ± 1.3 97.2 ± 2.2 96.7 ± 2.0 95.7 ± 2.9 97.1 ± 1.6 95.7 ± 4.3

5.4.2 Iris

The Iris dataset consists of 150 samples, three classes, and four features. The three classes describe the Setosa, Versicolor, and Virginica flowers, and the four features represent the length and width of the petal and sepal of the flower. The objective is to perform binary classification for each pair of classes: (Setosa, Versicolor), (Setosa, Virginica), and (Versicolor, Virginica). We normalize the data for training. For each pair, we utilize 20 data points across both classes for training and evaluate the model’s accuracy by using the remaining 80 points. The accuracy results are shown in Table 1.

First, all three approaches classify the Setosa and Virginica flowers well. The classical and quantum approaches achieve 100%100\% accuracy in the training and testing data classification. The simulated annealer achieves 100%100\% accuracy in the training data and 99.6%99.6\% accuracy in the testing data. Second, all three approaches demonstrate near perfect classification performance for Setosa and Versicolor flowers. The classical approach attains an average of 100%100\% training accuracy and 99.6%99.6\% testing accuracy. The quantum annealer achieves 100%100\% training accuracy and 99.5%99.5\% testing accuracy. The simulated annealer attains 100%100\% training accuracy and 99.7%99.7\% testing accuracy. Notably, there is no statistically significant difference in accuracy across the three training methods. Third, all three methods exhibit lower classification performance for the Versicolor and Virginica flower species, which are known to be linearly inseparable. The classical approach achieves 90.5%90.5\% training accuracy and 85.6%85.6\% testing accuracy. The quantum annealer achieves 70.5%70.5\% training accuracy and 67.9%67.9\% testing accuracy. The simulated annealer achieves 93.5%93.5\% training accuracy and 88.1%88.1\% testing accuracy.

5.4.3 WBC

The WBC dataset consists of 569 samples, two classes, and 30 features. The data represents digitalized characteristics of breast cell nuclei. The data is normalized for training. For evaluation, 52 points are used for training, and 517 points are used for testing accuracy. The classical approach demonstrates the highest accuracy and achieves an average of 97.7%97.7\% for training data and 95.0%95.0\% for testing data. The quantum approach follows with a training accuracy of 93.3%93.3\% and a testing accuracy of 93.1%93.1\%. Last, the simulated annealing approach achieves a training accuracy of 91.7%91.7\% and a testing accuracy of 92.6%92.6\%.

5.4.4 Wine

The Wine dataset has 178 samples, three classes, and 13 features. The features describe characteristics of the wine, including alcohol, ash, and magnesium content. We compare the three classes in combination: (class 0 and class 1), (class 0 and class 2), and (class 1 and class 2). In Table 1, the classes are represented as Wine Class # - Class #. The data is normalized for training, and 52 points are used. The remaining points are used for testing accuracy for the three combinations of datasets.

First, when comparing class 0 and class 1, 52 points are trained and 55 points are tested. The training and testing accuracy averaged 100%100\% for the classical SVM and quantum SVM and 99%99\% and 99.5%99.5\% for the simulated annealer. Second, when comparing class 0 and class 2, 52 points are trained and 78 points are tested. The training and testing accuracy averaged 100%100\% and 98.2%98.2\% for the classical SVM. The quantum SVM averaged 96%96\% in accuracy for both training and training, and the simulated annealing SVM averaged 95.0%95.0\% and 94.7%94.7\% in accuracy. Third, when comparing class 1 and class 2, 52 points are trained and 67 points are tested. The training and testing accuracy averaged 99.2%99.2\% and 96.0%96.0\% for the classical SVM, 98.5%98.5\% and 96.6%96.6\% for the quantum SVM, and 96.0%96.0\% and 94.2%94.2\% for the simulated annealer.

Table 3: Scalability with Number of Features
Features Scikit-learn D-Wave Embedding D-Wave Preprocessing D-Wave Access D-Wave Compute D-Wave Network
Time (ms) Time (ms) Time (ms) Time (ms) Time (ms) Overhead (ms)
(Preprocess +
Embedding + Access)
2 1.175 ±\pm 0.072 1,379.469 ±\pm 203.994 239.732 ±\pm 28.529 18.39 ±\pm 0.057 1,637.592 ±\pm 211.0 422,713.522 ±\pm 111,923.977
4 1.158 ±\pm 0.024 1,405.793 ±\pm 377.299 231.087 ±\pm 1.612 18.33 ±\pm 0.095 1,655.21 ±\pm 377.127 330,455.173 ±\pm 90,493.002
8 1.185 ±\pm 0.047 1,164.539 ±\pm 378.807 231.308 ±\pm 1.815 18.29 ±\pm 0.099 1,414.137 ±\pm 378.851 450,387.29 ±\pm 206,827.975
16 1.192 ±\pm 0.014 1,232.807 ±\pm 568.895 232.896 ±\pm 1.886 18.37 ±\pm 0.048 1,484.073 ±\pm 568.916 322,623.635 ±\pm 106,189.784
32 1.18 ±\pm 0.09 1,120.983 ±\pm 97.316 231.457 ±\pm 1.17 18.39 ±\pm 0.057 1,370.829 ±\pm 97.337 370,004.535 ±\pm 126,229.111
64 1.253 ±\pm 0.098 1,381.844 ±\pm 839.621 243.054 ±\pm 30.388 18.38 ±\pm 0.042 1,643.278 ±\pm 836.636 552,217.955 ±\pm 286,747.464
128 1.249 ±\pm 0.043 1,073.689 ±\pm 54.481 234.919 ±\pm 11.097 18.33 ±\pm 0.067 1,326.939 ±\pm 59.699 418,365.701 ±\pm 239,712.534
256 1.206 ±\pm 0.082 1,386.806 ±\pm 161.495 234.472 ±\pm 29.823 18.37 ±\pm 0.048 1,639.648 ±\pm 164.609 310,426.496 ±\pm 138,119.185
512 1.261 ±\pm 0.031 1,502.025 ±\pm 388.979 230.431 ±\pm 2.023 18.35 ±\pm 0.071 1,750.806 ±\pm 388.302 317,514.671 ±\pm 156,945.844
1,024 1.493 ±\pm 0.075 1,423.757 ±\pm 355.441 236.499 ±\pm 16.529 18.29 ±\pm 0.11 1,678.546 ±\pm 353.131 390,337.469 ±\pm 123,918.024
2,048 2.011 ±\pm 0.145 1,646.613 ±\pm 727.704 258.914 ±\pm 42.713 18.34 ±\pm 0.07 1,923.867 ±\pm 720.138 288,349.625 ±\pm 135,981.113
4,096 4.598 ±\pm 0.704 1,482.837 ±\pm 686.698 257.999 ±\pm 40.914 18.34 ±\pm 0.097 1,759.176 ±\pm 681.247 399,891.987 ±\pm 196,877.404
8,192 5.228 ±\pm 0.888 1,280.656 ±\pm 72.575 260.521 ±\pm 40.755 18.39 ±\pm 0.074 1,559.567 ±\pm 105.107 316,780.627 ±\pm 139,171.499
16,384 9.234 ±\pm 0.907 1,321.959 ±\pm 90.673 249.479 ±\pm 35.551 18.37 ±\pm 0.048 1,589.807 ±\pm 112.371 393,245.32 ±\pm 97,614.288
32,768 16.969 ±\pm 1.247 1,577.783 ±\pm 1013.991 242.255 ±\pm 23.858 18.36 ±\pm 0.107 1,838.398 ±\pm 1007.663 347,998.856 ±\pm 129,216.235
65,536 34.06 ±\pm 2.547 1,692.453 ±\pm 1249.103 234.506 ±\pm 3.644 18.37 ±\pm 0.048 1,945.328 ±\pm 1248.834 387,732.537 ±\pm 135,155.523
131,072 76.238 ±\pm 4.305 1,302.487 ±\pm 84.587 254.051 ±\pm 44.079 18.34 ±\pm 0.07 1,574.878 ±\pm 93.238 303,776.797 ±\pm 82,509.468
262,144 182.478 ±\pm 10.178 1,654.28 ±\pm 1011.998 243.556 ±\pm 31.683 18.37 ±\pm 0.062 1,916.393 ±\pm 1013.422 356,133.736 ±\pm 129,222.717
524,288 430.502 ±\pm 26.294 1,530.793 ±\pm 910.643 253.15 ±\pm 42.777 18.37 ±\pm 0.058 1,791.273 ±\pm 908.992 334,643.888 ±\pm 109,913.759
1,048,576 822.135 ±\pm 23.357 1,389.431 ±\pm 115.631 281.621 ±\pm 39.667 18.36 ±\pm 0.07 1,689.412 ±\pm 132.526 404,630.828 ±\pm 274,147.286
2,097,152 1,543.618 ±\pm 139.181 1,318.885 ±\pm 88.77 294.537 ±\pm 42.083 18.35 ±\pm 0.071 1,631.772 ±\pm 107.334 328,081.545 ±\pm 142,239.908
4,194,304 3,166.03 ±\pm 316.12 1,300.346 ±\pm 128.42 294.486 ±\pm 32.706 18.36 ±\pm 0.084 1,613.192 ±\pm 124.671 420,615.812 ±\pm 195,816.586
8,388,608 6,920.407 ±\pm 329.305 1,516.167 ±\pm 332.535 343.214 ±\pm 19.5 18.36 ±\pm 0.052 1,877.741 ±\pm 326.011 428,409.075 ±\pm 256,341.033
Refer to caption
(a) Few features (dd)
Refer to caption
(b) Many features (dd)
Figure 5: Scalability comparison of the Scikit-learn SVM (blue bar and bold line) and quantum SVM (light, medium, and dark green bars and dotted line). The xx-axis indicates the number of features (dd), and the yy-axis logarithmically represents time in seconds. The xx-axis ranges from 22 to 8,388,6088,388,608 across the two figures. In Figure 5(a), dd varies between 22 and 4,0964,096, and in Figure 5(b), dd varies between 8,1928,192 and 8,388,6088,388,608. In Figure 5(b), at 8 million features, the quantum approach demonstrated a speedup of 3.69×\times over the classical approach.
Table 4: Scalability with Number of Points
Trained Points Scikit-learn D-Wave Embedding D-Wave Preprocessing D-Wave Access D-Wave Compute D-Wave Network
Time (ms) Time (ms) Time (ms) Time (ms) Time (ms) Overhead (ms)
(Preprocess +
Embedding + Access)
4 397.597 ±\pm 12.209 1,430.915 ±\pm 297.767 52.004 ±\pm 7.233 17.0 ±\pm 0.156 1,499.919 ±\pm 297.346 120.693 ±\pm 15.664
6 628.511 ±\pm 7.604 1,422.757 ±\pm 382.261 71.3 ±\pm 1.359 16.92 ±\pm 0.187 1,510.977 ±\pm 382.923 440.845 ±\pm 81.854
8 865.129 ±\pm 14.27 1,303.998 ±\pm 516.343 83.676 ±\pm 9.532 16.95 ±\pm 0.217 1,404.623 ±\pm 515.02 1,112.126 ±\pm 360.746
10 1,131.034 ±\pm 12.523 1,633.786 ±\pm 985.969 122.318 ±\pm 14.208 17.01 ±\pm 0.311 1,773.115 ±\pm 992.71 2,049.21 ±\pm 661.929
12 1,414.692 ±\pm 42.061 1,334.734 ±\pm 177.382 141.291 ±\pm 11.683 17.03 ±\pm 0.134 1,493.055 ±\pm 179.878 3,905.956 ±\pm 1,651.884
14 1,709.829 ±\pm 30.631 1,378.001 ±\pm 380.399 168.314 ±\pm 2.797 17.29 ±\pm 0.179 1,563.604 ±\pm 382.467 8,745.695 ±\pm 3,737.998
16 2,057.462 ±\pm 21.925 1,154.179 ±\pm 311.386 49.982 ±\pm 2.321 17.18 ±\pm 0.244 1,221.341 ±\pm 310.555 10,334.816 ±\pm 3,908.58
18 2,298.212 ±\pm 102.983 1,217.663 ±\pm 419.797 66.299 ±\pm 11.212 17.1 ±\pm 0.189 1,301.062 ±\pm 419.291 19,508.428 ±\pm 7,737.284
20 2,623.09 ±\pm 145.095 1,257.566 ±\pm 530.37 72.883 ±\pm 11.114 17.32 ±\pm 0.266 1,347.769 ±\pm 527.727 23,161.555 ±\pm 11,342.062
22 3,117.991 ±\pm 116.612 1,370.357 ±\pm 674.162 84.43 ±\pm 13.221 17.47 ±\pm 0.291 1,472.257 ±\pm 686.392 38,135.434 ±\pm 20,395.38
24 3,402.623 ±\pm 140.536 1,225.826 ±\pm 309.729 91.573 ±\pm 4.39 17.62 ±\pm 0.215 1,335.019 ±\pm 308.578 39,167.391 ±\pm 25,129.559
26 3,854.724 ±\pm 161.398 1,078.185 ±\pm 58.742 102.241 ±\pm 3.995 17.66 ±\pm 0.143 1,198.086 ±\pm 59.46 49,917.087 ±\pm 24,788.842
28 4,158.047 ±\pm 237.876 1,409.069 ±\pm 1,002.37 116.227 ±\pm 2.926 17.54 ±\pm 0.272 1,542.835 ±\pm 1,003.154 74,291.927 ±\pm 38,792.915
30 4,325.487 ±\pm 132.322 1,442.387 ±\pm 1,252.714 145.596 ±\pm 30.139 17.85 ±\pm 0.085 1,605.834 ±\pm 1,247.755 84,428.49 ±\pm 44,035.425
32 4,736.955 ±\pm 146.286 1,074.25 ±\pm 54.508 150.959 ±\pm 23.223 17.87 ±\pm 0.048 1,243.08 ±\pm 63.392 109,971.795 ±\pm 33,452.821
34 4,895.237 ±\pm 209.984 1,099.025 ±\pm 75.892 173.559 ±\pm 28.753 17.97 ±\pm 0.095 1,290.554 ±\pm 65.265 100,535.682 ±\pm 52,262.449
36 5,259.747 ±\pm 252.264 2,015.937 ±\pm 2,794.639 183.582 ±\pm 23.139 17.93 ±\pm 0.125 2,217.45 ±\pm 2,791.223 112,656.481 ±\pm 54,607.342
38 5,390.033 ±\pm 195.683 1,079.114 ±\pm 59.131 217.531 ±\pm 35.153 17.98 ±\pm 0.169 1,314.624 ±\pm 62.542 181,915.305 ±\pm 142,277.523
40 5,656.504 ±\pm 101.086 1,868.626 ±\pm 2,396.907 214.248 ±\pm 7.722 18.1 ±\pm 0.067 2,100.973 ±\pm 2,396.668 215,169.1 ±\pm 130,688.927
42 5,962.922 ±\pm 125.236 1,156.912 ±\pm 102.021 237.135 ±\pm 11.716 18.1 ±\pm 0.105 1,412.147 ±\pm 101.156 175,136.911 ±\pm 66,927.252
44 6,253.6 ±\pm 91.927 1,142.343 ±\pm 87.118 246.837 ±\pm 7.396 18.16 ±\pm 0.084 1,407.339 ±\pm 91.611 279,223.122 ±\pm 119,973.036
46 6,463.876 ±\pm 145.813 1,179.851 ±\pm 110.547 291.388 ±\pm 29.706 18.18 ±\pm 0.092 1,489.419 ±\pm 106.889 248,253.607 ±\pm 83,297.36
48 6,771.785 ±\pm 130.662 2,191.238 ±\pm 3,049.933 292.981 ±\pm 12.052 18.31 ±\pm 0.074 2,502.53 ±\pm 3,046.364 297,207.318 ±\pm 178,830.42
50 7,080.465 ±\pm 96.879 1,222.904 ±\pm 114.113 328.075 ±\pm 36.607 18.3 ±\pm 0.125 1,569.279 ±\pm 104.291 292,123.268 ±\pm 141,273.153
52 7,194.556 ±\pm 195.74 1,219.074 ±\pm 88.527 369.37 ±\pm 39.163 18.34 ±\pm 0.07 1,606.784 ±\pm 90.774 407,913.426 ±\pm 209,009.574
54 7,623.851 ±\pm 530.636 4,031.514 ±\pm 8,823.785 383.31 ±\pm 34.857 18.38 ±\pm 0.079 4,433.204 ±\pm 8,824.813 399,734.428 ±\pm 191,104.466
Refer to caption
Figure 6: Scalability comparison of the Scikit-learn SVM (blue bar and bold line) and quantum SVM (light, medium, and dark green bars and dotted line). The xx-axis indicates the number of points (NN), and the yy-axis logarithmically represents time in seconds. The xx-axis ranges from 44 to 5454. In Figure 6, at 52 points, the quantum approach demonstrates a speedup of 4.48×\times over the classical approach. At 54 points, the quantum approach demonstrates a speedup of 1.72×\times over the classical approach.

5.4.5 Digits (0–1)

The Digits dataset depicts numerals between 0 and 9 in black-and-white 8×88\times 8 images. Each pixel value is between 0 and 16, where 16 is the darkest. Each image has its 64 pixels represented in one row. So, in training, each point has 64 features. The value of each pixel is normalized. We train 20 points each of class 0 and class 1 for 10 iterations to observe the accuracy. To test accuracy, 158 points are used. In Table 1, the classes are represented as Digits Class 0–1. The classical approach resulted in a 100%100\% training accuracy and a 99.2%99.2\% testing accuracy. The quantum approach resulted in a 99%99\% training accuracy and a 97.5%97.5\% testing accuracy. The simulated annealer failed to find a solution for the QUBO problems.

5.4.6 Lambeq

Lambeq is a dataset consisting of concise three to four word sentences classified into the domains of cooking and technology, with each word labeled according to its respective part of speech (e.g., verb, noun). We train fifty-two of the seventy sentences ten times and test thirty points each time. The dataset underwent parsing and embedding by using Bert’s natural language embedder. This was followed by training through an SVM model. The classical approach resulted in a 100%100\% training and testing accuracy. The quantum approach resulted in a 93.5%93.5\% training accuracy and a 88.7%88.7\% testing accuracy. The simulated annealer failed to find a solution for the QUBO problems.

5.5 Annealing Time

The accuracy of the quantum SVM comes second to that of the classical SVM in certain data sets. So, we saw the need to compare the impact of the annealing time parameter, which may influence the quantum annealer’s accuracy. We tested to see if the variance in annealing time has a significant effect on the accuracies of the trained models. Each dataset is trained and tested 10 times for each condition. The data used for the experiment included the Iris dataset with the classes Versicolor and Virginica, Wisconsin Breast Cancer, and Wine dataset with the classes 1 and 2. The annealing time used for the experiment included 20μs\,\mu s, 100μs\,\mu s, and 1000μs\,\mu s. Each training set contained 52 training points. The results, which include the mean and standard deviation of the accuracy, are presented in Table 2.

Even as the annealing time increased, the Scikit-learn model consistently performed better and consisted less variance in accuracy than the quantum and simulated annealing models. There wasn’t a statistically significant difference in the accuracy by increasing the annealing time as the mean accuracies of each test were within one standard deviation of each other. However, the Iris data set with the classes Virginica and Versicolor saw less variance and higher training and testing accuracy when trained for 1,000μs1,000\mu s as compared to 20μs20\mu s.

5.6 Time

We conduct a scalability analysis of the training time of the classical and quantum approach using synthetic datasets. We generate the random datasets using the Scikit-learn make_blob function. Before training the SVM models, we normalize the data points. The D-Wave access time represents the programming and sampling time. The sampling time includes annealing, read out, delay time. The simulated annealing approach failed to find a solution when there are many features, so we decided it was inappropriate to examine the scalability of the simulated annealing approach. The complexity of the classical approach is O(N3N^{3}), and the complexity of the quantum approach is O(N2N^{2}).

5.6.1 Number of Features

We conduct a scalability study to investigate the impact of feature size on the compute time of our classical and quantum SVM approaches. We also describe the runtime performance as the number of features increase from 2 to 8,388,608. The number of data points used for training remains constant at 52 across all the datasets because that is the most data points we could stably embed on the D-Wave Advantage quantum computer. The number of qubits available limits the number of data points. We are also testing each configuration 10 times.

The results, including the mean and standard deviation obtained from 10 iterations, are presented in Table 3. The scalability is shown in Figure 5, where the xx-axis indicates the number of features (dd), and the yy-axis logarithmically represents time in seconds. We depict the total Scikit-learn time with light blue bars. The medium green bars represent the D-Wave preprocessing time. The light green bars represent the D-Wave access time, and the dark green bars show the D-Wave embedding time. As shown, the access time stays constant at 18 ms. With smaller number of features (dd\leq 1,048,576), the classical approach’s runtime is faster than the D-Wave approach. As the number of features increases (dd\geq 2,098,152), the quantum approach’s compute time is faster than the classical approach. With fewer features, the time taken for embedding greatly influences the overall compute time on the D-Wave system. In contrast, the difference in access and preprocessing time has less influence. Similarly, with many features, the embedding time remains the dominant factor in the D-Wave’s compute time, with the access and preprocessing times have minimal impact. However, the preprocessing time does increase, whereas the embedding and access time do not. The performance of both approaches is comparable when there are 2 million features. When training with up to 8 million features, the quantum approach exhibits a speedup of 3.69×\times over the classical approach. Although not considered for the total computation time, the network overhead time increases as the number of points increases. The increased points result in an increased number of qubits used, which means longer data transfer times to the D-Wave server. The testing and training accuracies are both 100%100\%.

5.6.2 Number of Data Points

We also conduct a scalability study to investigate how the number of data points affects the compute time of our classical and quantum SVM approaches. We examine the runtime performance as the dataset expands from 4 to 54 data points. The number of features used for training remains constant at 8 million across all the datasets because the runtime performance of both approaches is comparable at 12 points and 8 million features. The number of qubits available limits the number of points.

The results, including the mean and standard deviation obtained from 10 iterations, are presented in Table 4. We present the scalability results in Figure 6, where the xx-axis indicates the number of data points (NN), and the yy-axis logarithmically represents time in seconds. We depict the total Scikit-learn time with blue bars. The green bars represent the D-Wave preprocessing time. The light green bars represent the D-Wave access time, and the dark green bars show the D-Wave embedding time. The dominant factor in the total time taken for the Quantum SVM is the embedding time. The preprocessing and access time increase as the number of points increases, while the embedding time remains relatively constant, except when training 54 points. Considering the available number of qubits, this outcome is expected as the data size increases. However, even at 54 points, the quantum SVM is 1.72×\times faster than its classical counterpart. At 52 points, where the quantum computer could find an embedding with minimal time fluctuation, the quantum computer outperforms the classical computer by a factor of 4.48.

5.7 Discussion

To the best of our knowledge, this is one of the first studies that shows a demonstrable quantum speedup for training machine learning models on the NISQ-era quantum computers. Our study, coupled with the linear regression study by Date and Potok in 2021 (Date & Potok, 2021), provides a compelling evidence that a quantum speedup can be realized on datasets that have either a large number of data points or a large number of features. On smaller datasets, the overheads of running jobs on the quantum computer overshadow any speedup obtained by the quantum approach.

The testing accuracies presented in Table 1 show that the quantum approach performs better than the classical approach in case of Wine 1–2. On some datasets such as synthetic and Iris, it equals the performance of the classical approach. On other datasets however (Iris Versicolor-Virginica, WBC, Wine 0–2, Digits, and Lambeq), it is outperformed by the classical approach. On these datasets, the lower accuracy of the quantum approach is partly attributed to the fact that the datasets were linearly inseparable, for e.g., Versicolor-Virginica. For other datasets, the accuracy of the weights learned in the quantum approach was directly affected by the numerical precision (i.e., number of bits) allocated to each Lagrangian multiplier in the QUBO problem. This precision was in turn constrained by the hardware architecture, i.e., the number of qubits and their connectivity available on the quantum hardware. Within the constraints imposed by the quantum architecture, the quantum approach was able to produce accuracies that were in the same ballpark as the classical approach for most of the datasets.

Our results in this paper demonstrate that the quantum approach trains SVMs faster than the Scikit-learn classical approach on larger-sized datasets. While the Scikit-learn’s algorithm for training SVMs may not be the most efficient classical algorithm, it certainly is one of the most widely used algorithms in the literature and runs in 𝒪(N3)\mathcal{O}(N^{3}) time. The classical algorithms have been optimized for decades, whereas the quantum algorithms are still in their nascent stages. In this light, our motivation in this paper was to demonstrate that our quantum approach can perform faster than a widely used classical approach. Our results are intended to serve as a baseline for more optimized quantum approaches in the future. We certainly hope that the future quantum approaches would outperform not only the most widely used classical approaches, but also the best performing classical approaches.

6 Conclusion

A critical limitation of current state-of-the-art machine learning is the extensive computational resources required for training. The duration of this process varies significantly—ranging from a few hours to several months—and is contingent upon the size of the training dataset. To address this problem, we introduce a quantum approach for solving the SVM problem. This paper describes our empirical analysis of transforming SVMs to QUBO problems, which D-Wave Advantage computers can solve. Through theoretical analysis, we establish that our quantum approach outperforms the current classical approach for runtime. Next, we compare the performance of an SVM implementation on a D-Wave Advantage quantum annealer with a classical implementation on AMD Ryzen 5 4600H and Intel Xeon E5-2690v4 CPUs. Our training results demonstrate that the quantum approach’s accuracy is comparable with the classical approach. When training with a dataset with 8 million features, the quantum annealer outperforms the classical computer by achieving up to 4.48×\times faster computation. Our research contribution shows that quantum computing can effectively reduce training times and lead to accelerated scientific discoveries and improved performance for machine learning models.

In the future, we would like to explore novel approaches to train larger datasets effectively. We want to expand our quantum approach to encompass variants of SVMs that leverage kernel methods. Finally, we are interested in exploring techniques to mitigate the effects of noise and errors in quantum annealing.

7 Acknowledgment

This manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://www.energy.gov/doe-public-access-plan). This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank Sam Crawford of Oak Ridge National Laboratory for editing this manuscript.

References

  • Abu-Mostafa et al. (2012) Abu-Mostafa, Y. S., Magdon-Ismail, M., and Lin, H.-T. Learning from data, volume 4. AMLBook New York, NY, USA:, 2012.
  • Adachi & Henderson (2015) Adachi, S. H. and Henderson, M. P. Application of quantum annealing to training of deep neural networks. arXiv preprint arXiv:1510.06356, 2015.
  • Amin et al. (2018) Amin, M. H., Andriyash, E., Rolfe, J., Kulchytskyy, B., and Melko, R. Quantum Boltzmann machine. Physical Review X, 8(2):021050, 2018.
  • Arthur & Date (2021) Arthur, D. and Date, P. Balanced k-means clustering on an adiabatic quantum computer. Quantum Information Processing, 20(9):1–30, 2021.
  • Arute et al. (2019) Arute, F., Arya, K., Babbush, R., Bacon, D., Bardin, J. C., Barends, R., Biswas, R., Boixo, S., Brandao, F. G., Buell, D. A., et al. Quantum supremacy using a programmable superconducting processor. Nature, 574(7779):505–510, 2019.
  • Barbosa et al. (2021) Barbosa, A., Pelofske, E., Hahn, G., and Djidjev, H. N. Using machine learning for quantum annealing accuracy prediction. Algorithms, 14(6), 2021. ISSN 1999-4893. doi: 10.3390/a14060187. URL https://www.mdpi.com/1999-4893/14/6/187.
  • Benedetti et al. (2019) Benedetti, M., Lloyd, E., Sack, S., and Fiorentini, M. Parameterized quantum circuits as machine learning models. Quantum Science and Technology, 4(4):043001, 2019.
  • Biamonte et al. (2017) Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., and Lloyd, S. Quantum machine learning. Nature, 549(7671):195–202, 2017.
  • Bottou & Lin (2007) Bottou, L. and Lin, C.-J. Support vector machine solvers. Large scale kernel machines, 3(1):301–320, 2007.
  • Coppersmith (2002) Coppersmith, D. An approximate fourier transform useful in quantum factoring. arXiv preprint quant-ph/0201067, 2002.
  • Date (2019) Date, P. Combinatorial Neural Network Training Algorithm for Neuromorphic Computing. PhD thesis, Rensselaer Polytechnic Institute, 2019.
  • Date & Potok (2021) Date, P. and Potok, T. Adiabatic quantum linear regression. Scientific reports, 11(1):21905, 2021.
  • Date & Smith (2024) Date, P. and Smith, W. Quantum discriminator for binary classification. Scientific Reports, 14(1):1328, 2024.
  • Date et al. (2019a) Date, P., Patton, R., Schuman, C., and Potok, T. Efficiently embedding QUBO problems on adiabatic quantum computers. Quantum Information Processing, 18(4):117, 2019a.
  • Date et al. (2019b) Date, P., Schuman, C., Patton, R., and Potok, T. A classical-quantum hybrid approach for unsupervised probabilistic machine learning. In Future of Information and Communication Conference, pp. 98–117. Springer, 2019b.
  • Date et al. (2021) Date, P., Arthur, D., and Pusey-Nazzaro, L. Qubo formulations for training machine learning models. Scientific Reports, 11, 2021.
  • Delgado et al. (2022) Delgado, A., Hamilton, K. E., Vlimant, J.-R., Magano, D., Omar, Y., Bargassa, P., Francis, A., Gianelle, A., Sestini, L., Lucchesi, D., et al. Quantum computing for data analysis in high energy physics. arXiv preprint arXiv:2203.08805, 2022.
  • Dunjko & Briegel (2018) Dunjko, V. and Briegel, H. J. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Reports on Progress in Physics, 81(7):074001, 2018.
  • Dunjko et al. (2016) Dunjko, V., Taylor, J. M., and Briegel, H. J. Quantum-enhanced machine learning. Physical review letters, 117(13):130501, 2016.
  • Gilyén et al. (2019) Gilyén, A., Arunachalam, S., and Wiebe, N. Optimizing quantum optimization algorithms via faster quantum gradient computation. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.  1425–1444. SIAM, 2019.
  • Grover (1996) Grover, L. K. A fast quantum mechanical algorithm for database search. In Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, pp.  212–219, 1996.
  • Harrow et al. (2009) Harrow, A. W., Hassidim, A., and Lloyd, S. Quantum algorithm for linear systems of equations. Physical review letters, 103(15):150502, 2009.
  • Havlíček et al. (2019) Havlíček, V., Córcoles, A. D., Temme, K., Harrow, A. W., Kandala, A., Chow, J. M., and Gambetta, J. M. Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747):209–212, 2019.
  • Humble et al. (2022) Humble, T. S., Delgado, A., Pooser, R., Seck, C., Bennink, R., Leyton-Ortega, V., Wang, C.-C. J., Dumitrescu, E., Morris, T., Hamilton, K., et al. Snowmass white paper: Quantum computing systems and software for high-energy physics research. arXiv preprint arXiv:2203.07091, 2022.
  • Jordan (2005) Jordan, S. P. Fast quantum algorithm for numerical gradient estimation. Physical review letters, 95(5):050501, 2005.
  • Karush (1939) Karush, W. Minima of functions of several variables with inequalities as side constraints. M. Sc. Dissertation. Dept. of Mathematics, Univ. of Chicago, 1939.
  • Kochenberger et al. (2014) Kochenberger, G., Hao, J.-K., Glover, F., Lewis, M., Lü, Z., Wang, H., and Wang, Y. The unconstrained binary quadratic programming problem: a survey. Journal of Combinatorial Optimization, 28(1):58–81, 2014.
  • Kuhn & Tucker (2014) Kuhn, H. W. and Tucker, A. W. Nonlinear programming. In Traces and emergence of nonlinear programming, pp. 247–258. Springer, 2014.
  • LeCun et al. (2015) LeCun, Y., Bengio, Y., and Hinton, G. Deep learning. Nature, 521(7553):436–444, 2015.
  • Lee et al. (2022) Lee, H., Noh, S., and Jun, K. Effective QUBO modeling for solving linear systems on D-wave quantum annealing device. In Donkor, E., Hayduk, M., Frey, M. R., Jr., S. J. L., and Myers, J. M. (eds.), Quantum Information Science, Sensing, and Computation XIV, volume 12093, pp.  120930F. International Society for Optics and Photonics, SPIE, 2022. doi: 10.1117/12.2632416. URL https://doi.org/10.1117/12.2632416.
  • Li et al. (2019) Li, T., Chakrabarti, S., and Wu, X. Sublinear quantum algorithms for training linear and kernel-based classifiers. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp.  3815–3824, Long Beach, California, USA, 09–15 Jun 2019. PMLR. URL http://proceedings.mlr.press/v97/li19b.html.
  • Lucas (2014) Lucas, A. Ising formulations of many np problems. Frontiers in Physics, 2:5, 2014.
  • Moll et al. (2018) Moll, N., Barkoutsos, P., Bishop, L. S., Chow, J. M., Cross, A., Egger, D. J., Filipp, S., Fuhrer, A., Gambetta, J. M., Ganzhorn, M., et al. Quantum optimization using variational algorithms on near-term quantum devices. Quantum Science and Technology, 3(3):030503, 2018.
  • Mukherjee & Chakrabarti (2015) Mukherjee, S. and Chakrabarti, B. K. Multivariable optimization: Quantum annealing and computation. The European Physical Journal Special Topics, 224(1):17–24, 2015.
  • Munson (2012) Munson, M. A. A study on the importance of and time spent on different modeling steps. ACM SIGKDD Explorations Newsletter, 13(2):65–71, 2012.
  • Neven et al. (2012) Neven, H., Denchev, V. S., Rose, G., and Macready, W. G. QBoost: Large scale classifier training with adiabatic quantum optimization. In Hoi, S. C. H. and Buntine, W. (eds.), Proceedings of the Asian Conference on Machine Learning, volume 25 of Proceedings of Machine Learning Research, pp.  333–348, Singapore Management University, Singapore, 04–06 Nov 2012. PMLR. URL http://proceedings.mlr.press/v25/neven12.html.
  • Otgonbaatar et al. (2022) Otgonbaatar, S., Datcu, M., and Demir, B. Coreset of hyperspectral images on a small quantum computer. In IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium, pp.  4923–4926, 2022. doi: 10.1109/IGARSS46834.2022.9884273.
  • Perdomo-Ortiz et al. (2018) Perdomo-Ortiz, A., Benedetti, M., Realpe-Gómez, J., and Biswas, R. Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Science and Technology, 3(3):030502, 2018.
  • Pudenz & Lidar (2013) Pudenz, K. L. and Lidar, D. A. Quantum adiabatic machine learning. Quantum information processing, 12(5):2027–2070, 2013.
  • Quiroga et al. (2021) Quiroga, D., Date, P., and Pooser, R. Discriminating quantum states with quantum machine learning. In 2021 International Conference on Rebooting Computing (ICRC), pp.  56–63. IEEE, 2021.
  • Rebentrost et al. (2019) Rebentrost, P., Schuld, M., Wossnig, L., Petruccione, F., and Lloyd, S. Quantum gradient descent and newton’s method for constrained polynomial optimization. New Journal of Physics, 21(7):073023, 2019.
  • Schuld et al. (2018) Schuld, M., Bocharov, A., Svore, K., and Wiebe, N. Circuit-centric quantum classifiers. arXiv preprint arXiv:1804.00633, 2018.
  • Shor (1994) Shor, P. W. Algorithms for quantum computation: discrete logarithms and factoring. In Proceedings 35th annual symposium on foundations of computer science, pp.  124–134. Ieee, 1994.
  • Simões et al. (2023) Simões, R. D. M., Huber, P., Meier, N., Smailov, N., Füchslin, R. M., and Stockinger, K. Experimental evaluation of quantum machine learning algorithms. IEEE Access, 11:6197–6208, 2023. doi: 10.1109/ACCESS.2023.3236409.
  • Theis & Wong (2017) Theis, T. N. and Wong, H.-S. P. The end of moore’s law: A new beginning for information technology. Computing in Science & Engineering, 19(2):41–50, 2017.
  • Wang & Jonckheere (2019) Wang, C. and Jonckheere, E. Simulated versus reduced noise quantum annealing in maximum independent set solution to wireless network scheduling. Quantum Information Processing, 18(1):1–25, 2019.
  • Willsch et al. (2019) Willsch, D., Willsch, M., De Raedt, H., and Michielsen, K. Support vector machines on the D-Wave quantum annealer. Computer Physics Communications, pp.  107006, 2019.