
CSE, IIT Delhi. Email: [email protected]

A Quantum Approximation Scheme for $k$-Means

Ragesh Jaiswal
Abstract

We give a quantum approximation scheme (i.e., a $(1+\varepsilon)$-approximation for every $\varepsilon>0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^{d}$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}\eta^{2}d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $cost(V,C)\leq(1+\varepsilon)\cdot cost(V,C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$ centers, $cost(.)$ denotes the standard $k$-means cost function (i.e., the sum of squared distances of points to the closest center), and $\eta$ is the aspect ratio (i.e., the ratio of the maximum to the minimum distance). This is the first quantum algorithm with polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Moreover, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines, and its running time is independent of parameters (e.g., the condition number) that appear in such procedures.

1 Introduction

Data clustering and the $k$-means problem, in particular, have many applications in data processing. The $k$-means problem is defined as: given a set of $n$ points $V=\{v_{1},...,v_{n}\}\subset\mathbb{R}^{d}$ and a positive integer $k$, find a set $C\subset\mathbb{R}^{d}$ of $k$ centers such that the cost function,

$$\Phi(V,C)\equiv\sum_{v\in V}\min_{c\in C}D^{2}(v,c),$$

is minimised. Here, $D(v,c)\equiv\norm{v-c}$ is the Euclidean distance between points $v$ and $c$. Partitioning the points based on the closest center in the center set $C$ gives a natural clustering of the data points. Due to its applications in data processing, a lot of work goes into designing algorithms for this problem from both theoretical and practical standpoints. The $k$-means problem is known to be $\mathsf{NP}$-hard, so it is unlikely to have a polynomial-time algorithm. Much research has been done on designing polynomial-time approximation algorithms for the $k$-means problem. However, the algorithm used in practice to solve $k$-means instances is a heuristic, popularly known as the $k$-means algorithm (not to be confused with the $k$-means problem). This heuristic, also known as Lloyd's iterations [Llo82], iteratively improves the solution over several rounds. It starts with an arbitrarily chosen set of $k$ centers. In every iteration, it (i) partitions the points based on the nearest center and (ii) updates the center set to the centroids of the $k$ parts. In the classical computational model, it is easy to see that every Lloyd iteration costs $O(Nkd)$ time. This hill-climbing approach may get stuck in a local minimum or take a huge amount of time to converge, and hence does not give provable guarantees on the quality of the final solution or on the running time. In practice, Lloyd's iterations are usually preceded by the $k$-means++ algorithm [AV07], a fast sampling-based approach for picking the initial $k$ centers that also gives an approximation guarantee. So, Lloyd's iterations preceded by the $k$-means++ algorithm give the best of both worlds: theory and practice. Hence, it is unsurprising that a lot of work has been done on these two algorithms, ranging from efficiency improvements in specific settings to implementations in distributed and parallel models. With the quantum computing revolution imminent, it is natural to ask about quantum versions of these algorithms and quantum algorithms for the $k$-means problem in general.
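For reference, one Lloyd iteration as described above can be sketched classically as follows. This is a minimal NumPy-based illustration (the function name and array conventions are ours, not from the paper); the assignment step dominates and costs $O(Nkd)$ time.

```python
import numpy as np

def lloyd_iteration(V, C):
    """One Lloyd iteration: (i) assign each point to its nearest center,
    (ii) move each center to the centroid of its assigned points.
    V: (N, d) array of data points, C: (k, d) array of current centers."""
    # Assignment step: all pairwise squared distances, O(Nkd) time.
    d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # shape (N, k)
    labels = d2.argmin(axis=1)
    # Update step: replace each center by the centroid of its cluster.
    new_C = C.copy()
    for j in range(C.shape[0]):
        members = V[labels == j]
        if len(members) > 0:          # keep the old center if the cluster is empty
            new_C[j] = members.mean(axis=0)
    return new_C, labels
```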

Early work on the $k$-means problem in the quantum setting involved efficiency gains from quantizing Lloyd's iterations. In particular, Aïmeur, Brassard, and Gambs [ABG13] gave an $O(\frac{N^{3/2}}{\sqrt{k}})$ time algorithm for executing a single Lloyd iteration for the metric $k$-median clustering problem, which is similar to the $k$-means problem. This used the quantum minimum-finding algorithm of Durr and Hoyer [DH99]. Using quantum distance estimation techniques and assuming quantum data access, Lloyd, Mohseni, and Rebentrost [LMR13] gave an $O(kN\log{d})$ time algorithm for executing a single Lloyd iteration for the $k$-means problem. More recently, [KLLP19] gave an approximate quantization of the $k$-means++ method and Lloyd's iterations assuming QRAM data structure [kp17] access to the data. Interestingly, the running time has only polylogarithmic dependence on the size $N$ of the dataset. The algorithm uses quantum linear algebra procedures, and hence there is a dependence on certain parameters that appear in such procedures, such as the condition number $\kappa(V)$. Since Lloyd's iterations do not give an approximation guarantee, their quantum version is also a heuristic without a provable approximation guarantee. (Even though [KLLP19] gives a quantum version of the $k$-means++ algorithm, which has an $O(\log{k})$ approximation guarantee, the guarantee for the quantum version (which has errors) is not shown explicitly.) Our work on the $k$-means problem builds upon the techniques developed in the above and other works on quantum unsupervised learning to design algorithms with provable approximation guarantees. Specifically, we want to design an approximation scheme for the $k$-means problem with a running time that has only a polylogarithmic dependence on the data size $N$, as in the algorithm of [KLLP19]. An approximation scheme is an algorithm that, in addition to the dataset and $k$, takes an error parameter $\varepsilon>0$ as input and outputs a solution with cost within a $(1+\varepsilon)$ factor of the optimal. We do this by quantizing the highly parallel, sampling-based approximation scheme of [BGJK20]. The tradeoff in obtaining this fine-grained approximation is that the running time of our algorithm has an exponential dependence on $k$ and the error parameter $\varepsilon$. In the classical setting, such algorithms are categorized as fixed-parameter approximation schemes (FPT-AS). Such $(1+\varepsilon)$-approximation algorithms are allowed exponential running-time dependence on the parameter (e.g., the number of clusters $k$ in our setting). The practical motivation for studying fixed-parameter tractability for computationally hard problems is that when the parameter is small (e.g., number of clusters $k\sim 5$), the running time is not prohibitively large. We state our main result as the following theorem, which we will prove in the remainder of the paper.

Theorem 1.1 (Main Theorem)

Let $0<\varepsilon<1/2$ be the error parameter. There is a quantum algorithm that, when given QRAM data structure access to a dataset $V\in\mathbb{R}^{N\times d}$, runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}d\eta^{2}\right)$ and outputs a $k$ center set $C\in\mathbb{R}^{k\times d}$ such that, with high probability, $\Phi(V,C)\leq(1+\varepsilon)\cdot OPT$. Here, $\eta$ is the aspect ratio, i.e., the ratio of the maximum to the minimum distance between two given points in $V$. (The $\tilde{O}$ notation hides logarithmic factors in $N$. The $\tilde{O}$ in the exponent hides logarithmic factors in $k$ and $1/\varepsilon$.)

1.1 An approximation scheme in the classical setting

We convert the $D^{2}$-sampling-based approximation scheme of [BGJK20] to a quantum version. The approximation scheme is simple and highly parallel, and can be described in the following few lines:

Input: Dataset $V$, integer $k>0$, and error $\varepsilon>0$

Output: A center set $C^{\prime}$ with $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$

  1. (Constant approximation) Find a center set $C$ that is a constant factor approximate solution. An $(\alpha,\beta)$ pseudo-approximate solution, for constants $\alpha,\beta$, also works.

  2. ($D^{2}$-sampling) Pick a set $T$ of $poly(\frac{k}{\varepsilon})$ points independently from the dataset using $D^{2}$-sampling with respect to the center set $C$.

  3. (All subsets) Out of all $k$-tuples $(S_{1},...,S_{k})$ of (multi)subsets of $T\cup\{\textrm{copies of points in } C\}$, each $S_{i}$ of size $O(\frac{1}{\varepsilon})$, return the tuple $(\mu(S_{1}),...,\mu(S_{k}))$ that gives the least $k$-means cost. Here, $\mu(S_{i})$ denotes the centroid of the points in $S_{i}$.

We will discuss the quantization of the above three steps of the approximation scheme of [BGJK20], thus obtaining a quantum approximation scheme. A classical sketch of the scheme is given below. (Steps (2) and (3) of the algorithm are within a loop for probability amplification. This loop is skipped in this high-level description for simplicity.)
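The following is a minimal classical sketch of this scheme (our own simplified rendering, for illustration only: it enumerates plain subsets of $T\cup C$ rather than the multisets with many copies of $C$ used in the actual algorithm, and the parameters `num_samples` and `subset_size` stand in for $poly(\frac{k}{\varepsilon})$ and $O(\frac{1}{\varepsilon})$):

```python
import itertools
import numpy as np

def kmeans_cost(V, centers):
    """Standard k-means cost: sum of squared distances to the closest center."""
    d2 = ((V[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def d2_sample(V, C, size, rng):
    """D^2-sampling: pick points with probability proportional to the
    squared distance to the closest center in C."""
    d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    idx = rng.choice(len(V), size=size, p=d2 / d2.sum())
    return V[idx]

def approximation_scheme(V, k, C, rng, num_samples, subset_size):
    """Steps (2) and (3): D^2-sample w.r.t. a constant-factor solution C, then
    try every k-tuple of small subsets of the sample (plus C) and keep the
    tuple of centroids with the least k-means cost."""
    T = np.concatenate([d2_sample(V, C, num_samples, rng), C])
    best_cost, best_centers = np.inf, None
    for combo in itertools.product(
            itertools.combinations(range(len(T)), subset_size), repeat=k):
        centers = np.array([T[list(S)].mean(axis=0) for S in combo])
        cost = kmeans_cost(V, centers)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers
```

With $|T|=poly(\frac{k}{\varepsilon})$ and subsets of size $O(\frac{1}{\varepsilon})$, the loop runs over roughly $\left(\frac{k}{\varepsilon}\right)^{\tilde{O}(\frac{k}{\varepsilon})}$ tuples, which is the source of the exponential dependence on $\frac{k}{\varepsilon}$.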

1. (Constant approximation)

The first step requires finding a constant factor approximate solution for the $k$-means problem. Even though several constant factor approximation algorithms are known, we need one with a quantum counterpart that runs in time polylogarithmic in the input size $N$. One such algorithm is the $k$-means++ seeding algorithm [AV07], which picks $k$ centers in a sequence, with the $i^{th}$ center picked using $D^{2}$-sampling with respect to the previously chosen $(i-1)$ centers. ($D^{2}$-sampling: given a center set $C$, $D^{2}$-sampling picks a data point with probability proportional to the squared distance of the point to its closest center in $C$.) [KLLP19] gives an approximate quantum version of $D^{2}$-sampling. The approximation guarantee of the $k$-means++ algorithm is $O(\log{k})$ instead of the constant approximation required in the approximation scheme of [BGJK20]. It is known from the work of [ADK09] that if the $D^{2}$-sampling in $k$-means++ is continued for $2k$ steps instead of stopping after sampling $k$ centers, then we obtain a center set of size $2k$ that is a $(2,O(1))$-pseudo-approximate solution. This means that this $2k$-size center set has a $k$-means cost that is at most some constant times the optimal. Such a pseudo-approximate solution is sufficient for the approximation scheme of [BGJK20] to work. We show that the pseudo-approximation guarantee of [ADK09] also holds when using the approximate quantum version of the $D^{2}$-sampling procedure.

2. ($D^{2}$-sampling)

The second step of [BGJK20] involves $D^{2}$-sampling, which we have already discussed how to quantize; it is no different from the $D^{2}$-sampling involved in the $k$-means++ algorithm of the previous step. The sampling in this step is in fact simpler, since the center set $C$ with respect to which the $D^{2}$-sampling is performed does not change (as it does in the $k$-means++ algorithm).

3. (All subsets)

Since the number of points sampled in the previous step is $poly(\frac{k}{\varepsilon})$, we need to consider a list of $\left(\frac{k}{\varepsilon}\right)^{\tilde{O}(\frac{k}{\varepsilon})}$ tuples of subsets, each giving a $k$-center set (a tuple $(S_{1},...,S_{k})$ defines $(\mu(S_{1}),...,\mu(S_{k}))$). We need to compute the $k$-means cost for each $k$-center set in the list and then pick the one with the least cost. We give a quantization of these steps. (Note that when picking the center set with the least cost, we could get a quadratic improvement in the search for the best $k$-center set using quantum search. Given that the search space is of size $\left(\frac{k}{\varepsilon}\right)^{\tilde{O}(\frac{k}{\varepsilon})}$, this results only in a constant factor improvement in the exponent, so we leave quantum search out of the discussion for simplicity.)

Note that the quantization of the classical steps of [BGJK20] will incur precision errors. So, we first need to ensure that the approximation guarantee of [BGJK20] is robust against small errors in distance estimates, $D^{2}$-sampling probabilities, and $k$-means cost estimates. We must carefully account for errors and ensure that the quantum algorithm retains the $(1+\varepsilon)$ approximation guarantee of the robust version of [BGJK20].

Organization

We begin the technical discussion in the next section by showing that the approximation scheme of [BGJK20] is robust against errors. We will also show the robustness of the $k$-means++ procedure. In the subsequent section, we give the quantization of the steps of [BGJK20]. But first, we briefly discuss related work.

1.2 Related work

We have already discussed past research on quantum versions of the $k$-means algorithm (i.e., Lloyd's iterations); this includes [ABG13], [LMR13], and [KLLP19]. All of these are built using various quantum tools and techniques developed for problems in quantum unsupervised learning, such as coherent amplitude and median estimation, distance estimation, minimum finding, etc. See [WKS15] for examples of several such tools. Other directions on quantum $k$-means include adiabatic algorithms (e.g., [LMR13]) and algorithms using the QAOA framework (e.g., [OMA+17, FGG14]). However, these are without provable guarantees. A line of work has also suggested that quantum algorithms can outperform classical ones because of the QRAM data structure access. A more level playing field is to assume that a similar sample and query data access is available in the classical setting. Under this assumption, several “dequantization” results for unsupervised machine learning algorithms have been given; this includes [Tan19, CGL+20, Tan21]. It will be interesting to see if similar dequantization is possible for the quantum algorithm presented in this work, since the main ingredient of both our algorithm and the dequantization results is length-squared sampling.

2 A Robust Approximation Scheme

We start the discussion with the $D^{2}$-sampling method. In particular, we would like to check the robustness of the approximation guarantee provided by the $D^{2}$-sampling method against errors in estimating the distances between points. We will show that the $D^{2}$-sampling method gives a constant pseudo-approximation even under sampling errors.

2.1 Pseudo-approximation using $D^{2}$-sampling

Let the matrix $V\in\mathbb{R}^{N\times d}$ denote the dataset, where row $i$ contains the $i^{th}$ data point $v_{i}\in\mathbb{R}^{d}$. Let the matrix $C\in\mathbb{R}^{t\times d}$ denote any $t$-center set, where row $i$ contains the $i^{th}$ center $c_{i}\in\mathbb{R}^{d}$ out of the $t$ centers. Sampling a data point using the $D^{2}$ distribution w.r.t. (short for "with respect to") a center set $C$ means that the data point $v_{i}$ gets sampled with probability proportional to the squared distance to its nearest center in $C$. This is also known as $D^{2}$-sampling w.r.t. center set $C$. More formally, data points are sampled using the distribution $\left(\frac{D^{2}(v_{1},C)}{\sum_{j}D^{2}(v_{j},C)},...,\frac{D^{2}(v_{N},C)}{\sum_{j}D^{2}(v_{j},C)}\right)$, where $D^{2}(v_{j},C)\equiv\min_{c\in C}D^{2}(v_{j},c)$. For the special case $C=\emptyset$, $D^{2}$-sampling is the same as uniform sampling. The $k$-means++ seeding algorithm starts with an empty center set $C$ and, over $k$ iterations, adds a center to $C$ in every iteration by $D^{2}$-sampling w.r.t. the current center set $C$. It is known from the result of [AV07] that this $k$-means++ algorithm gives an $O(\log{k})$ approximation in expectation. It is also known from the result of [ADK09] that if $2k$ centers are sampled instead of $k$ (i.e., the for-loop runs from $1$ to $2k$), the cost with respect to these $2k$ centers is at most some constant times the optimal $k$-means cost. Such an algorithm is called a pseudo-approximation algorithm, and it is sufficient for the approximation scheme of [BGJK20]. So, we will quantize the following constant factor pseudo-approximation algorithm.

  Input: $(V,k)$
  $C\leftarrow\{\}$
  for $i=1$ to $2k$ do
     Pick $c$ using $D^{2}$-sampling w.r.t. center set $C$
     $C\leftarrow C\cup\{c\}$
  end for
  return $C$
Algorithm 1: A pseudo-approximation algorithm based on $D^{2}$-sampling.
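A minimal classical sketch of Algorithm 1 follows (our own rendering, for illustration; the optional `delta` argument multiplies every distance estimate by a $(1\pm\delta)$ factor to mimic the noisy estimates discussed next):

```python
import numpy as np

def d2_seeding(V, k, rng, delta=0.0):
    """Pick 2k centers by (noisy) D^2-sampling, as in Algorithm 1."""
    N = V.shape[0]
    C = [V[rng.integers(N)]]          # C = {} => the first pick is uniform
    for _ in range(2 * k - 1):
        d2 = ((V[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Distance estimates within a (1 +/- delta) relative error.
        noise = 1.0 + delta * rng.uniform(-1.0, 1.0, size=N)
        d2_tilde = d2 * noise ** 2
        p = d2_tilde / d2_tilde.sum()
        C.append(V[rng.choice(N, p=p)])
    return np.array(C)
```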

In the quantum simulation of the above sampling procedure, there will be small errors in the sampling probabilities in each iteration. We need to ensure that the constant approximation guarantee of the above procedure is robust against small errors in the sampling probabilities owing to errors in distance estimation. We will work with a relative error of $(1\pm\delta)$ for small $\delta$. Following is a crucial lemma from [AV07] needed to show the pseudo-approximation property of Algorithm 1.

Lemma 1 (Lemma 3.2 in [AV07])

Let $A$ be an arbitrary optimal cluster, and let $C$ be an arbitrary set of centers. Let $c$ be a center chosen from $A$ with $D^{2}$-sampling with respect to $C$. Then $\mathbf{E}[cost(A,C\cup\{c\})]\leq 8\cdot OPT(A)$.

The above lemma is used as a black box in the analysis of Algorithm 1 in [ADK09]. The following version of the lemma holds for distance estimates with a relative error of $(1\pm\delta)$ and gives a constant factor approximation guarantee. Since Lemma 1 is used as a black box in the analysis of Algorithm 1, replacing this lemma with Lemma 2 also gives a constant factor approximation to the $k$-means objective. We will use the following notion of the closeness of two distance functions.

Definition 1

A distance function $D_{1}$ is said to be $\delta$-close to a distance function $D_{2}$, denoted by $D_{1}\sim_{\delta}D_{2}$, if for every pair of points $x,y\in\mathbb{R}^{d}$, $D_{1}(x,y)\in(1\pm\delta)\cdot D_{2}(x,y)$. (We use the notation that for positive reals $P,Q$, $P\in(1\pm\delta)\cdot Q$ if $(1-\delta)\cdot Q\leq P\leq(1+\delta)\cdot Q$.)

Lemma 2

Let $0<\delta\leq 1/2$. Let $A$ be an arbitrary optimal cluster and $C$ be an arbitrary set of centers. Let $c$ be a center chosen from $A$ with $\tilde{D}^{2}$-sampling with respect to $C$, where $\tilde{D}\sim_{\delta}D$. Then $\mathbf{E}[cost(A,C\cup\{c\})]\leq 72\cdot OPT(A)$.

Proof

Let $D(a)$ denote the distance of the point $a$ from its nearest center in $C$ and let $\tilde{D}(a)$ denote the estimated distance. We have $\tilde{D}(a)\in(1\pm\delta)\cdot D(a)$. The following expression gives the expectation:

$$\sum_{a_{0}\in A}\frac{\tilde{D}^{2}(a_{0})}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right).$$

Note that for all $a_{0},a\in A$, $D(a_{0})\leq D(a)+D(a,a_{0})$. This gives $\tilde{D}(a_{0})\leq\frac{1+\delta}{1-\delta}\cdot\tilde{D}(a)+(1+\delta)\cdot D(a_{0},a)$, which further gives $\tilde{D}^{2}(a_{0})\leq 2\left(\frac{1+\delta}{1-\delta}\right)^{2}\cdot\tilde{D}^{2}(a)+2(1+\delta)^{2}\cdot D^{2}(a_{0},a)$ and hence $\tilde{D}^{2}(a_{0})\leq\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}\cdot\sum_{a\in A}\tilde{D}^{2}(a)+\frac{2}{|A|}(1+\delta)^{2}\cdot\sum_{a\in A}D^{2}(a_{0},a)$. We use this to obtain the following upper bound on the expectation $\mathbf{E}[cost(A,C\cup\{c\})]$:

\begin{align*}
\mathbf{E}[cost(A,C\cup\{c\})] &\leq \sum_{a_{0}\in A}\frac{\tilde{D}^{2}(a_{0})}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right)\\
&\leq \sum_{a_{0}\in A}\frac{\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}\sum_{a\in A}\tilde{D}^{2}(a)}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right)\\
&\quad+\sum_{a_{0}\in A}\frac{\frac{2}{|A|}(1+\delta)^{2}\sum_{a\in A}D^{2}(a_{0},a)}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right)\\
&\leq \sum_{a_{0}\in A}\sum_{a^{\prime}\in A}\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}D^{2}(a^{\prime},a_{0})+\sum_{a_{0}\in A}\sum_{a\in A}\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}D^{2}(a_{0},a)\\
&= \frac{4}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}\sum_{a_{0}\in A}\sum_{a\in A}D^{2}(a_{0},a)\\
&= 8\left(\frac{1+\delta}{1-\delta}\right)^{2}\cdot OPT(A)\\
&\leq 72\cdot OPT(A).
\end{align*}

This completes the proof of the lemma.∎

We will use this lemma in the approximation scheme of [BGJK20]. However, the lemma may be of independent interest, since it gives a quantum pseudo-approximation algorithm with a constant factor approximation that runs in time polylogarithmic in the data size and linear in $k$ and $d$. We will discuss this quantum algorithm in the next section.

2.2 Approximation scheme of [BGJK20]

A high-level description of the approximation scheme of [BGJK20] was given in the introduction. We give a more detailed pseudocode in Algorithm 2.

1:  Input: $(V,k,\varepsilon,C)$, where $V$ is the dataset, $k>0$ is the number of clusters, $\varepsilon>0$ is the error parameter, and $C$ is a $k$ center set that gives a constant (pseudo)approximation.
2:  Output: A list $\mathcal{L}$ of $k$ center sets such that for at least one $C^{\prime}\in\mathcal{L}$, $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$.
3:  Constants: $\rho=O(\frac{k}{\varepsilon^{4}})$; $\tau=O(\frac{1}{\varepsilon})$
4:  $\mathcal{L}\leftarrow\emptyset$; $count\leftarrow 1$
5:  repeat
6:     Sample a multi-set $M$ of $\rho k$ points from $V$ using $D^{2}$-sampling w.r.t. center set $C$
7:     $M\leftarrow M\cup\{\tau k \textrm{ copies of each element in } C\}$
8:     for all disjoint subsets $S_{1},...,S_{k}$ of $M$ such that $\forall i,\ |S_{i}|=\tau$ do
9:        $\mathcal{L}\leftarrow\mathcal{L}\cup\{(\mu(S_{1}),...,\mu(S_{k}))\}$
10:     end for
11:     $count\leftarrow count+1$
12:  until $count\geq 2^{k}$
13:  return $\mathcal{L}$
Algorithm 2: Algorithm of [BGJK20]

In addition to the input instance $(V,k)$ and error parameter $\varepsilon$, the algorithm is also given a constant approximate solution $C$, which is used for $D^{2}$-sampling. A pseudo-approximate solution $C$ is sufficient for the analysis in [BGJK20]. The discussion from the previous subsection gives a robust algorithm that outputs a pseudo-approximate solution even under errors in distance estimates. So, the input requirement of Algorithm 2 can be met. Now, since the main ingredient is $D^{2}$-sampling, we need to ensure that errors in distance estimates do not seriously impact the approximation analysis of Algorithm 2. We state the main theorem of [BGJK20] before giving the analogous statement for the modified algorithm where $D$ is replaced with a $\tilde{D}$ that is $\delta$-close to $D$.

Theorem 2.1 (Theorem 1 in [BGJK20])

Let $0<\varepsilon\leq 1/2$ be the error parameter, $V\in\mathbb{R}^{N\times d}$ be the dataset, $k$ be a positive integer, and let $C$ be a constant approximate solution for dataset $V$. Let $\mathcal{L}$ be the list returned by Algorithm 2 on input $(V,k,\varepsilon,C)$ using the Euclidean distance function $D$. Then with probability at least $3/4$, $\mathcal{L}$ contains a center set $C^{\prime}$ such that $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$. Moreover, $|\mathcal{L}|=\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}\right)$ and the running time of the algorithm is $O(Nd|\mathcal{L}|)$.

We give the analogous theorem with access to the Euclidean distance function $D$ replaced with a function $\tilde{D}$ that is $\delta$-close to $D$.

Theorem 2.2

Let $0<\varepsilon\leq\frac{1}{2}$ be the error parameter, $0<\delta<1$ be the closeness parameter, $V\in\mathbb{R}^{N\times d}$ be the dataset, $k$ be a positive integer, and let $C$ be a constant approximate solution for dataset $V$. Let $\mathcal{L}$ be the list returned by Algorithm 2 on input $(V,k,\varepsilon,C)$ using a distance function $\tilde{D}$ that is $\delta$-close to the Euclidean distance function $D$. Then with probability at least $3/4$, $\mathcal{L}$ contains a center set $C^{\prime}$ such that $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$. Moreover, $|\mathcal{L}|=\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon(1-\delta)})}\right)$ and the running time of the algorithm is $O(Nd|\mathcal{L}|)$.

The proof of the above theorem closely follows the proof of Theorem 2.1 in [BGJK20], just as the proof of Lemma 2 closely followed that of Lemma 1; the minor changes relate to using approximate distance estimates $\tilde{D}$ instead of exact distances $D$. The statement of Theorem 2.2 is not surprising in this light. Instead of repeating the entire proof of [BGJK20], we point out the one change in their argument caused by using $\tilde{D}$ instead of $D$ as the distance function. The analysis of [BGJK20] works by partitioning the points in any optimal cluster $X_{j}$ into those that are close to $C$ and those that are far. For the far points, it is shown that when doing $D^{2}$-sampling, a far point is sampled with probability at least $\gamma$ times the uniform sampling probability (see Lemma 21 in [GJK20], which is the full version of [BGJK20]). It is then argued that a reasonably sized set of $D^{2}$-sampled points contains a uniform sub-sample, and that a combination of the uniform sub-sample with copies of points in $C$ gives a good center for this optimal cluster $X_{j}$. Replacing $D$ with $\tilde{D}$ decreases the value of $\gamma$ by a multiplicative factor of $\frac{(1-\delta)^{2}}{(1+\delta)^{2}}\geq(1-\delta)^{4}$, as shown in the short calculation below. This means that the number of points sampled should increase by a factor of $O(\frac{1}{1-\delta})$, and hence that the list size increases to $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon(1-\delta)})}\right)$. Note that when $\delta\leq\frac{1}{2}$, the list size and running time retain the same form as in [BGJK20] (i.e., $|\mathcal{L}|=\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}\right)$ and time $O(Nd|\mathcal{L}|)$).
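To see where this factor comes from, here is the calculation (our own one-line derivation, using $\tilde{D}\in(1\pm\delta)\cdot D$ in both the numerator and the denominator of the sampling probability): for any point $x$,

$$\Pr_{\tilde{D}^{2}}[x]=\frac{\tilde{D}^{2}(x,C)}{\sum_{y\in V}\tilde{D}^{2}(y,C)}\geq\frac{(1-\delta)^{2}\,D^{2}(x,C)}{(1+\delta)^{2}\sum_{y\in V}D^{2}(y,C)}=\left(\frac{1-\delta}{1+\delta}\right)^{2}\Pr_{D^{2}}[x]\geq(1-\delta)^{4}\Pr_{D^{2}}[x],$$

where the last inequality uses $\frac{1}{1+\delta}\geq 1-\delta$ for $\delta\in[0,1)$. So any lower bound of the form $\gamma$ times the uniform sampling probability for a far point degrades by at most this factor.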

3 Quantum Algorithms

We will work under the assumption that the minimum distance between two data points is $1$, which can be achieved by scaling. This makes the aspect ratio $\eta$ simply the maximum distance between two data points. We will use $i$ for an index into the rows of the data matrix $V\in\mathbb{R}^{N\times d}$, and $j$ for an index into the rows of the center matrix $C\in\mathbb{R}^{k\times d}$. We would ideally like to design a quantum algorithm that performs the transformation:

$$\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{D(v_{i},c_{j})}.$$

Let us call the state on the right $\ket{\Psi_{ideal}}$. This is an ideal quantum state for us since $\ket{\Psi_{ideal}}$ helps to perform $D^{2}$-sampling and to find the $k$-means cost of a clustering, which are the main components of the approximation scheme of [BGJK20] that we intend to use. One caveat is that we will only be able to perform the following transformation (instead of the transformation above)

$$\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{\psi_{i,j}},$$

where $\ket{\psi_{i,j}}$ is an approximation for $\ket{\tilde{D}(v_{i},c_{j})}$ in a sense that we will make precise below. We will use $\ket{\Psi_{real}}$ to denote the state $\ket{i}\ket{j}\ket{\psi_{i,j}}$. This state is prepared using tools such as the swap test followed by coherent amplitude estimation and median estimation. Since these tools and techniques are known from previous works [WKS15, LMR13, KLLP19], we summarise the discussion (see Sections 4.1 and 4.2 in [KLLP19]) in the following lemma.

Lemma 3 ([KLLP19] and [WKS15])

Assume that for a data matrix $V\in\mathbb{R}^{N\times d}$ and a center set matrix $C\in\mathbb{R}^{t\times d}$, the unitaries (i) $\ket{i}\ket{0}\rightarrow\ket{i}\ket{v_{i}}$ and (ii) $\ket{j}\ket{0}\rightarrow\ket{j}\ket{c_{j}}$ can be performed in time $T$ and the norms of the vectors are known. For any $\Delta>0$, there is a quantum algorithm that in time $O\left(\frac{T\log{\frac{1}{\Delta}}}{\varepsilon}\right)$ computes:

$$\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{\psi_{i,j}},$$

where $\ket{\psi_{i,j}}$ satisfies the following two conditions for every $i\in[N]$ and $j\in[t]$:

  (i) $\norm{\ket{\psi_{i,j}}-\ket{0^{\otimes\ell}}\ket{\tilde{D}(v_{i},c_{j})}}\leq\Delta$, and

  (ii) for every $i,j$, $\tilde{D}(v_{i},c_{j})\in(1\pm\varepsilon)\cdot D(v_{i},c_{j})$.

In the subsequent discussions, we will use $T$ as the time to access the QRAM data structure [kp17], i.e., for the transitions $\ket{i}\ket{0}\rightarrow\ket{i}\ket{v_{i}}$ and $\ket{j}\ket{0}\rightarrow\ket{j}\ket{c_{j}}$ as given in the above lemma. This is known to be $T=O(\log^{2}{(Nd)})$. Moreover, the time to update each entry in this data structure is also $O(\log^{2}{(Nd)})$. This is the logarithmic factor hidden in the $\tilde{O}$ notation. In the following subsections, we discuss the utility of $\ket{\Psi_{real}}$ for the various components of the approximation scheme of [BGJK20]. During these discussions, it will be easier to see the utility first with the ideal state $\ket{\Psi_{ideal}}$ and then with the real state $\ket{\Psi_{real}}$ that can actually be prepared. We will see how $\ket{\Psi_{real}}$ is sufficient within a reasonable error bound.

3.1 Finding distance to closest center

Let us see how we can estimate the distance of any point to its closest center in a center set $C$ with $t\leq k$ centers. We can use the transformation $\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{D(v_{i},c_{j})}$ to prepare the following state for any $i$:

$$\ket{i}\ket{D(v_{i},c_{1})}\ket{D(v_{i},c_{2})}...\ket{D(v_{i},c_{t})}.$$

We can then iteratively compare and swap pairs of registers to prepare the state $\ket{i}\ket{\min_{j\in[t]}D(v_{i},c_{j})}$. If we apply the same procedure to $\ket{i}\ket{\psi_{i,1}}...\ket{\psi_{i,t}}$, then with probability at least $(1-2\Delta)^{t}$, the resulting state will be $\ket{i}\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}$. So, the contents of the second register will be an estimate of the distance of the $i^{th}$ point to its closest center in the center set $C$. This further means that the following state can be prepared with probability at least $(1-2\Delta)^{Nt}$ (the state prepared is actually $\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\left(\alpha\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}+\beta\ket{G}\right)$ with $|\alpha|^{2}\geq(1-2\Delta)^{Nk}$; however, the subsequent discussion becomes much simpler if we assume that $\ket{\Psi_{C}}$ is prepared with probability $|\alpha|^{2}$):

$$\ket{\Psi_{C}}\equiv\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}.$$

This quantum state can be used to find the approximate clustering cost of the center set $C$, which we discuss in the following subsection. However, before we do that, let us summarise the main ideas of this subsection in the following lemma.

Lemma 4

There is a quantum algorithm that, with probability at least $(1-2\Delta)^{Nt}$, prepares the quantum state $\ket{\Psi_{C}}$ in time $O\left(\frac{Tk\log{\frac{1}{\Delta}}}{\varepsilon}\right)$.

3.2 Computing cost of clustering

Suppose we want to compute the $k$-means cost, $\Phi(V,C)\equiv\sum_{i=1}^{N}\min_{j\in[k]}D^{2}(v_{i},c_{j})$, of the clustering given by a $k$ center set $C\in\mathbb{R}^{k\times d}$. We can prepare $m$ copies of the state $\ket{\Psi_{C}}$, measure them, and estimate the cost of the clustering by summing the squares of the second registers. If $m$ is sufficiently large, we obtain a close estimate of $\Phi(V,C)$. To show this formally, we will use the following Hoeffding tail inequality.

Theorem 3.1 (Hoeffding bound)

Let $X_{1},...,X_{m}$ be independent, bounded random variables such that $X_{i}\in[a,b]$. Let $S_{m}=X_{1}+...+X_{m}$. Then for any $\theta>0$, we have:

$$\mathbf{Pr}[|S_{m}-\mathbf{E}[S_{m}]|\geq\theta]\leq 2\cdot e^{\frac{-2\theta^{2}}{m(b-a)^{2}}}.$$

Let $X_{1},...,X_{m}$ denote the squares of the measured values of the second register in the $m$ copies of $\ket{\Psi_{C}}$. These are random values in the range $[1,\eta^{2}]$, where $\eta=\max_{i,j}\tilde{D}(v_{i},v_{j})\in(1\pm\varepsilon)\cdot\max_{i,j}D(v_{i},v_{j})$. First, note that the expectation of each of these random variables equals $\frac{\tilde{\Phi}(V,C)}{N}$, where $\tilde{\Phi}(V,C)\equiv\sum_{i=1}^{N}\min_{j\in[k]}\tilde{D}^{2}(v_{i},c_{j})\in(1\pm\varepsilon)^{2}\cdot\Phi(V,C)$. We define $S_{m}=X_{1}+X_{2}+...+X_{m}$ and apply the Hoeffding bound on these bounded random variables to get a concentration result that can then be used.
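A minimal classical sketch of this estimator follows (our own illustration; the multiplicative `eps_dist` noise stands in for the $(1\pm\varepsilon)$ distance-estimation error of Lemma 3): draw $m$ uniform indices, square the (noisy) distance to the closest center, and rescale the sample mean by $N$. This is exactly the quantity $\alpha_{m}=S_{m}\cdot\frac{N}{m}$ analysed in Lemma 5 below.

```python
import numpy as np

def estimate_cost(V, C, m, rng, eps_dist=0.0):
    """Monte-Carlo analogue of measuring m copies of |Psi_C>."""
    N = V.shape[0]
    idx = rng.integers(N, size=m)                               # uniform |i> register
    d2 = ((V[idx, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    noise = 1.0 + eps_dist * rng.uniform(-1.0, 1.0, size=m)     # (1 +/- eps) estimates
    X = d2 * noise ** 2                                         # X_1, ..., X_m
    return N * X.mean()                                         # alpha_m = S_m * N / m
```

Lemma 5 below quantifies how large $m$ must be for this estimate to concentrate around $\tilde{\Phi}(V,C)$.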

Lemma 5

Let $\alpha_{m}=S_{m}\cdot\frac{N}{m}$ and $L>0$. If $m=\frac{\eta^{2}\ln(10L)}{\varepsilon^{2}}$, then we have:

$$\mathbf{Pr}[\alpha_{m}\in(1\pm\varepsilon)\cdot\tilde{\Phi}(V,C)]\geq 1-\frac{1}{5L}.$$
Proof

We know that $\mathbf{E}[S_{m}]=\frac{m}{N}\cdot\tilde{\Phi}(V,C)$. From the Hoeffding tail inequality, we get the following:

$$\mathbf{Pr}[|S_{m}-\mathbf{E}[S_{m}]|\geq\varepsilon\cdot\mathbf{E}[S_{m}]]\leq 2\cdot e^{\frac{-2\varepsilon^{2}\mathbf{E}[S_{m}]^{2}}{m\eta^{2}}}=2\cdot e^{\frac{-2\varepsilon^{2}m}{\eta^{2}}\cdot\left(\frac{\tilde{\Phi}(V,C)}{N}\right)^{2}}\leq 2\cdot e^{-\ln(10L)}\leq\frac{1}{5L}.$$

This implies that:

$$\mathbf{Pr}[|\alpha_{m}-\tilde{\Phi}(V,C)|\geq\varepsilon\cdot\tilde{\Phi}(V,C)]\leq\frac{1}{5L}.$$

This completes the proof of the lemma.∎

So, conditioned on having $m$ copies of the state $\ket{\Psi_{C}}$, we get an estimate $\alpha_{m}$ of the clustering cost that is within a relative error of $(1\pm\varepsilon)$ of $\tilde{\Phi}(V,C)$ with probability at least $(1-\frac{1}{5L})$. Removing the conditioning, we get the same with probability at least $(1-2\Delta)^{Nkm}\cdot(1-\frac{1}{5L})$. We want to use the above cost estimation technique to calculate the cost for a list of center sets $C_{1},...,C_{L}$ and then pick the center set from the list with the least cost. We must apply the union bound appropriately to do this with high probability. We summarise these results in the following lemma. Let us first set some of the parameters with the values that we will use to implement the approximation scheme of [BGJK20].

  • $L$ denotes the size of the list of $k$-center sets we will iterate over to find the one with the least cost. This quantity is bounded as $L=\left(\frac{k}{\varepsilon}\right)^{O(\frac{k}{\varepsilon})}$.

  • $m$ is the number of copies of the state $\ket{\Psi_{C}}$ used to estimate the cost of a center set $C$. As given in Lemma 5, this is $m=\frac{\eta^{2}\ln(10L)}{\varepsilon^{2}}$, where $\eta=(1+\varepsilon)\cdot\max_{i,j}D(v_{i},v_{j})$.

Lemma 6

Let $L=\left(\frac{k}{\varepsilon}\right)^{O(\frac{k}{\varepsilon})}$, $m=\frac{\eta^{2}\ln(10L)}{\varepsilon^{2}}$, and $\Delta=O\left(\frac{1}{NkmL}\right)$. Given a point set $V$ and a list of center sets $C_{1},...,C_{L}$ in the QRAM model, there is a quantum algorithm that runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}T\eta^{2}\right)$ and outputs an index $l$ such that $\Phi(V,C_{l})\leq(1+\varepsilon)^{2}\min_{j\in[L]}\Phi(V,C_{j})$ with probability at least $\frac{3}{5}$.

Proof

The algorithm estimates the cost of $C_{1},...,C_{L}$ using $m$ copies each of $\ket{\Psi_{C_{1}}},...,\ket{\Psi_{C_{L}}}$ and picks the index with the minimum estimated value, in time $O\left(\frac{TkmL\log{\frac{1}{\Delta}}}{\varepsilon}\right)$. Plugging in the values of $L$, $m$, and $\Delta$, we get the running time stated in the lemma.

Let us bound the error probability of this procedure. By Lemma 4, the probability that we do not have the correct $m$ copies each of $\ket{\Psi_{C_{1}}},...,\ket{\Psi_{C_{L}}}$ is bounded by $1-(1-2\Delta)^{NkmL}$. Conditioned on having $\ket{\Psi_{C_{1}}},...,\ket{\Psi_{C_{L}}}$, the probability that there exists an index $j\in[L]$ for which the estimate is off by more than a $(1\pm\varepsilon)^{2}$ factor is upper bounded by $\frac{1}{5}$ by the union bound. So, the probability that the algorithm finds an index $l$ such that $\Phi(V,C_{l})>(1+\varepsilon)^{2}\min_{j\in[L]}\Phi(V,C_{j})$ is upper bounded by $\left(1-(1-2\Delta)^{NkmL}\right)+\frac{1}{5}$. This probability is at most $\frac{2}{5}$ since $\Delta=O(\frac{1}{NkmL})$. This completes the proof of the lemma. ∎

3.3 D2D^{2}-sampling

$D^{2}$-sampling from the point set $V$ with respect to a center set $C\in\mathbb{R}^{t\times d}$ with $t$ centers samples $v_{i}$ with probability proportional to $\min_{j\in[t]}D^{2}(v_{i},c_{j})$. Let us see whether our state $\ket{\Psi_{C}}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}$ is useful for performing this sampling. If we can pull the value of the second register out as an amplitude, then measurement will give us something close to $D^{2}$-sampling. This is possible since we have an estimate of the clustering cost from the previous subsection. We can use controlled rotations on an ancilla qubit to prepare the state:

$$\ket{\Psi_{sample}}\equiv\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\left(\beta_{i}\ket{0}+\frac{1}{\sqrt{2}}\ket{1}\right),$$

where $\beta_{i}=\frac{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}{\sqrt{2\cdot\tilde{\Phi}(V,C)}}$. So, the probability of measuring $(i,0)$ is $\frac{\min_{j\in[t]}\tilde{D}^{2}(v_{i},c_{j})}{2N\cdot\tilde{\Phi}(V,C)}$. Since we do rejection sampling (ignoring the $(.,1)$ outcomes, which appear with total probability $\frac{1}{2}$), we end up sampling from a distribution in which the probability of sampling $i$ is $\frac{\min_{j\in[t]}\tilde{D}^{2}(v_{i},c_{j})}{\tilde{\Phi}(V,C)}\in(1\pm\varepsilon)\cdot\frac{\min_{j\in[t]}D^{2}(v_{i},c_{j})}{\Phi(V,C)}$. This means that points get sampled with probability close to the actual $D^{2}$-sampling probability. As we have mentioned earlier, this is sufficient for the approximation guarantees of [BGJK20] to hold. We summarise the observations of this section in the next lemma; a classical sketch of this rejection-sampling step is given after Lemma 7. We will need the following notion of the relative similarity of two distributions.

Definition 2

Let $0<\varepsilon<1$. For two distributions $D_{1}$ and $D_{2}$ over a finite set $X$, we say that $D_{1}\sim_{\varepsilon}D_{2}$ if for every $x\in X$, $D_{1}(x)\in(1\pm\varepsilon)\cdot D_{2}(x)$.

Lemma 7

Given a dataset $V\in\mathbb{R}^{N\times d}$ and a center set $C\in\mathbb{R}^{t\times d}$ in the QRAM model, there is a quantum algorithm that runs in time $O\left(\frac{TkS\log{\frac{1}{\Delta}}}{\varepsilon}\right)$ and with probability at least $(1-2\Delta)^{NtS}$ outputs $S$ independent samples with distribution $Z$ such that $Z\sim_{\varepsilon}D^{2}$, where $D^{2}$ denotes the $D^{2}$-sampling distribution.

Proof

The proof follows from Lemma 4 and the preceding discussion.∎
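For intuition, here is a minimal classical sketch of the rejection-sampling step described at the start of this subsection (our own illustration; `d2_tilde` holds the estimated squared distances $\min_{j}\tilde{D}^{2}(v_{i},c_{j})$ and `phi_tilde` their sum): accepting an index $i$ with probability proportional to `d2_tilde[i]` reproduces, conditioned on acceptance, the approximate $D^{2}$-sampling distribution.

```python
import numpy as np

def rejection_d2_sample(d2_tilde, phi_tilde, rng):
    """Draw one index approximately D^2-distributed via rejection sampling."""
    N = len(d2_tilde)
    while True:
        i = rng.integers(N)                          # the |i> register is uniform
        beta_sq = d2_tilde[i] / (2.0 * phi_tilde)    # acceptance prob. = beta_i^2
        if rng.random() < beta_sq:
            return i                                 # accepted: P(i) proportional to d2_tilde[i]
        # rejected: repeat (in this classical mimic the expected number of
        # repetitions is about 2N, since the average acceptance prob. is 1/(2N))
```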

Lemma 7 says that for $\Delta=O(\frac{1}{NkS})$, we obtain the required samples with high probability. We can now give the proof of Theorem 1.1 by assembling the quantum tools of this section.

Proof (Proof of Theorem 1.1)

The first requirement for executing the algorithm of [BGJK20] is a constant pseudo-approximation algorithm, using which we obtain the initial center set $C$. By Lemma 2, we know that $2k$ points sampled using $\tilde{D}^{2}$-sampling give such a center set. From Lemma 7, this can be done quantumly in time $\tilde{O}(\frac{k^{2}d}{\varepsilon})$, which also includes the time $O(kd\log^{2}{(kd)})$ to set up the QRAM data structure over all $k$ iterations. The algorithm of [BGJK20] has an outer repeat loop for probability amplification. Within the outer loop, $poly(\frac{k}{\varepsilon})$ points are $D^{2}$-sampled with respect to the center set $C$ (line 6). This can again be done quantumly using Lemma 7 in time $\tilde{O}(d(k/\varepsilon)^{O(1)})$. We can then classically process the point set $M$ (see line 7 in Algorithm 2) and create the QRAM data structure for the list $C_{1},...,C_{L}$ of $k$-center sets that correspond to all possible disjoint subsets of $M$ (see line 8 in Algorithm 2). This takes time $\tilde{O}(Lkd)$, where $L=\left(\frac{k}{\varepsilon}\right)^{O(\frac{k}{\varepsilon})}$. Theorem 2.2 shows that at least one center set in the list gives a $(1+\varepsilon)$-approximation. We use this fact in conjunction with Lemma 6 to conclude that the resulting quantum algorithm runs in time $\tilde{O}(2^{\tilde{O}(\frac{k}{\varepsilon})}d\eta^{2})$ and with high probability outputs a center set $C^{\prime}$ such that $\Phi(V,C^{\prime})\leq(1+\varepsilon)^{3}\cdot OPT$. (We needed $(1+\varepsilon)$ but got $(1+\varepsilon)^{3}$ instead; however, this can be handled by running the algorithm with $\varepsilon^{\prime}=\varepsilon/4$.)

4 Discussion and Open Problems

We give a quantum algorithm for the $k$-means problem with a provable approximation guarantee of $(1+\varepsilon)$ for arbitrary $\varepsilon$, with polylogarithmic running time dependence on the data size $N$ and exponential dependence on $\frac{k}{\varepsilon}$. In the classical setting, there are FPT (fixed-parameter tractable) algorithms that have polynomial running time dependence on the input size $N$ but are allowed exponential dependence on the parameters (e.g., $k$ in the $k$-means problem, which is typically a small number). In this paper, we saw a case where such a classical FPT algorithm could be taken into the quantum setting, lowering the dependence on $N$ from linear in the classical setting [BGJK20] to polylogarithmic (this paper) while keeping the dependence on the parameters ($k,d,\varepsilon$) intact. The aspect ratio $\eta$ can be considered an additional parameter. It would be interesting to see if there are other problems where such quantization is possible. If so, it would make sense to talk about quantum FPT (QFPT) algorithms with polylogarithmic dependence on the input size and possibly exponential dependence on the parameters. Another future direction is to check whether the sample and query access defined by [Tan19] is sufficient to obtain comparable results in the classical setting.

References

  • [ABG13] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum speed-up for unsupervised learning. Machine Learning, 90:261–287, 2013.
  • [ADK09] Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. Adaptive sampling for k-means clustering. In Proceedings of the 12th International Workshop and 13th International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX ’09 / RANDOM ’09, pages 15–28, Berlin, Heidelberg, 2009. Springer-Verlag.
  • [AV07] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 1027–1035, USA, 2007. Society for Industrial and Applied Mathematics.
  • [BGJK20] Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. On Sampling Based Algorithms for k-Means. In Nitin Saxena and Sunil Simon, editors, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020), volume 182 of Leibniz International Proceedings in Informatics (LIPIcs), pages 13:1–13:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
  • [CGL+20] Nai-Hui Chia, András Gilyén, Han-Hsuan Lin, Seth Lloyd, Ewin Tang, and Chunhao Wang. Quantum-Inspired Algorithms for Solving Low-Rank Linear Equation Systems with Logarithmic Dependence on the Dimension. In Yixin Cao, Siu-Wing Cheng, and Minming Li, editors, 31st International Symposium on Algorithms and Computation (ISAAC 2020), volume 181 of Leibniz International Proceedings in Informatics (LIPIcs), pages 47:1–47:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
  • [DH99] Christoph Durr and Peter Hoyer. A quantum algorithm for finding the minimum, 1999.
  • [FGG14] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm, 2014.
  • [GJK20] Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. Streaming ptas for constrained k-means, 2020.
  • [KLLP19] Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, and Anupam Prakash. q-means: A quantum algorithm for unsupervised machine learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • [Llo82] S. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
  • [LMR13] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning, 2013.
  • [OMA+17] J. S. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. Schuyler Fried, S. Hong, P. Karalekas, C. B. Osborn, A. Papageorge, E. C. Peterson, G. Prawiroatmodjo, N. Rubin, Colm A. Ryan, D. Scarabelli, M. Scheer, E. A. Sete, P. Sivarajah, Robert S. Smith, A. Staley, N. Tezak, W. J. Zeng, A. Hudson, Blake R. Johnson, M. Reagor, M. P. da Silva, and C. Rigetti. Unsupervised machine learning on a hybrid quantum computer, 2017.
  • [Tan19] Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 217–228, New York, NY, USA, 2019. Association for Computing Machinery.
  • [Tan21] Ewin Tang. Quantum principal component analysis only achieves an exponential speedup because of its state preparation assumptions. Physical Review Letters, 127(6), aug 2021.
  • [WKS15] Nathan Wiebe, Ashish Kapoor, and Krysta M. Svore. Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quantum Info. Comput., 15(3–4):316–356, March 2015.