
CSE, IIT Delhi. Email: [email protected]

A Quantum Approximation Scheme for $k$-Means

Ragesh Jaiswal
Abstract

We give a quantum approximation scheme (i.e., a $(1+\varepsilon)$-approximation for every $\varepsilon>0$) for the classical $k$-means clustering problem in the QRAM model with a running time that has only polylogarithmic dependence on the number of data points. More specifically, given a dataset $V$ with $N$ points in $\mathbb{R}^{d}$ stored in a QRAM data structure, our quantum algorithm runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}\eta^{2}d\right)$ and with high probability outputs a set $C$ of $k$ centers such that $cost(V,C)\leq(1+\varepsilon)\cdot cost(V,C_{OPT})$. Here $C_{OPT}$ denotes the optimal $k$ centers, $cost(.)$ denotes the standard $k$-means cost function (i.e., the sum of squared distances of points to the closest center), and $\eta$ is the aspect ratio (i.e., the ratio of the maximum to the minimum distance). This is the first quantum algorithm with polylogarithmic running time that gives a provable approximation guarantee of $(1+\varepsilon)$ for the $k$-means problem. Moreover, unlike previous works on unsupervised learning, our quantum algorithm does not require quantum linear algebra subroutines, and its running time is independent of parameters (e.g., the condition number) that appear in such procedures.

1 Introduction

Data clustering and the $k$-means problem, in particular, have many applications in data processing. The $k$-means problem is defined as: given a set of $n$ points $V=\{v_{1},...,v_{n}\}\subset\mathbb{R}^{d}$ and a positive integer $k$, find a set $C\subset\mathbb{R}^{d}$ of $k$ centers such that the cost function,

$$\Phi(V,C)\equiv\sum_{v\in V}\min_{c\in C}D^{2}(v,c),$$

is minimised. Here, $D(v,c)\equiv\norm{v-c}$ is the Euclidean distance between points $v$ and $c$. Partitioning the points based on the closest center in the center set $C$ gives a natural clustering of the data points. Due to its applications in data processing, a lot of work goes into designing algorithms for this problem from both theoretical and practical standpoints. The $k$-means problem is known to be $\mathsf{NP}$-hard, so it is unlikely to have a polynomial-time algorithm. Much research has been done on designing polynomial-time approximation algorithms for the $k$-means problem. However, the algorithm used in practice to solve $k$-means instances is a heuristic, popularly known as the $k$-means algorithm (not to be confused with the $k$-means problem). This heuristic, also known as Lloyd's iterations [Llo82], iteratively improves the solution over several rounds. It starts with an arbitrarily chosen set of $k$ centers. In every iteration, it (i) partitions the points based on the nearest center and (ii) updates the center set to the centroids of the $k$ parts. In the classical computational model, it is easy to see that every Lloyd iteration costs $O(Nkd)$ time. This hill-climbing approach may get stuck in a local minimum or take a huge amount of time to converge, and hence does not give provable guarantees on the quality of the final solution or on the running time. In practice, Lloyd's iterations are usually preceded by the $k$-means++ algorithm [AV07], a fast sampling-based approach for picking the initial $k$ centers that also gives an approximation guarantee. So, Lloyd's iterations preceded by the $k$-means++ algorithm give the best of both worlds: theory and practice. Hence, it is unsurprising that a lot of work has been done on these two algorithms, ranging from efficiency improvements in specific settings to implementations in distributed and parallel models. With the quantum computing revolution imminent, it is natural to ask about quantum versions of these algorithms and quantum algorithms for the $k$-means problem in general.
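For reference, one Lloyd iteration as described above can be sketched classically as follows. This is a minimal NumPy-based illustration (the function name and array conventions are ours, not from the paper); the assignment step dominates and costs $O(Nkd)$ time.

```python
import numpy as np

def lloyd_iteration(V, C):
    """One Lloyd iteration: (i) assign each point to its nearest center,
    (ii) move each center to the centroid of its assigned points.
    V: (N, d) array of data points, C: (k, d) array of current centers."""
    # Assignment step: all pairwise squared distances, O(Nkd) time.
    d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)   # shape (N, k)
    labels = d2.argmin(axis=1)
    # Update step: replace each center by the centroid of its cluster.
    new_C = C.copy()
    for j in range(C.shape[0]):
        members = V[labels == j]
        if len(members) > 0:          # keep the old center if the cluster is empty
            new_C[j] = members.mean(axis=0)
    return new_C, labels
```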

Early work on the $k$-means problem in the quantum setting involved efficiency gains from quantizing Lloyd's iterations. In particular, Aïmeur, Brassard, and Gambs [ABG13] gave an $O(\frac{N^{3/2}}{\sqrt{k}})$ time algorithm for executing a single Lloyd iteration for the metric $k$-median clustering problem, which is similar to the $k$-means problem. This used the quantum minimum-finding algorithm of Durr and Hoyer [DH99]. Using quantum distance estimation techniques and assuming quantum data access, Lloyd, Mohseni, and Rebentrost [LMR13] gave an $O(kN\log{d})$ time algorithm for executing a single Lloyd iteration for the $k$-means problem. More recently, [KLLP19] gave an approximate quantization of the $k$-means++ method and Lloyd's iterations assuming QRAM data structure [kp17] access to the data. Interestingly, the running time has only polylogarithmic dependence on the size $N$ of the dataset. The algorithm uses quantum linear algebra procedures, and hence there is a dependence on certain parameters that appear in such procedures, such as the condition number $\kappa(V)$. Since Lloyd's iterations do not give an approximation guarantee, their quantum version is also a heuristic without a provable approximation guarantee. (Even though [KLLP19] gives a quantum version of the $k$-means++ algorithm, which has an $O(\log{k})$ approximation guarantee, the guarantee for the quantum version (which has errors) is not shown explicitly.) Our work on the $k$-means problem builds upon the techniques developed in the above and other works on quantum unsupervised learning to design algorithms with provable approximation guarantees. Specifically, we want to design an approximation scheme for the $k$-means problem with a running time that has only a polylogarithmic dependence on the data size $N$, as in the algorithm of [KLLP19]. An approximation scheme is an algorithm that, in addition to the dataset and $k$, takes an error parameter $\varepsilon>0$ as input and outputs a solution with cost within a $(1+\varepsilon)$ factor of the optimal. We do this by quantizing the highly parallel, sampling-based approximation scheme of [BGJK20]. The tradeoff in obtaining this fine-grained approximation is that the running time of our algorithm has an exponential dependence on $k$ and the error parameter $\varepsilon$. In the classical setting, such algorithms are categorized as fixed-parameter approximation schemes (FPT-AS). Such $(1+\varepsilon)$-approximation algorithms are allowed exponential running-time dependence on the parameter (e.g., the number of clusters $k$ in our setting). The practical motivation for studying fixed-parameter tractability for computationally hard problems is that when the parameter is small (e.g., number of clusters $k\sim 5$), the running time is not prohibitively large. We state our main result as the following theorem, which we will prove in the remainder of the paper.

Theorem 1.1 (Main Theorem)

Let $0<\varepsilon<1/2$ be the error parameter. There is a quantum algorithm that, when given QRAM data structure access to a dataset $V\in\mathbb{R}^{N\times d}$, runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}d\eta^{2}\right)$ and outputs a $k$ center set $C\in\mathbb{R}^{k\times d}$ such that, with high probability, $\Phi(V,C)\leq(1+\varepsilon)\cdot OPT$. Here, $\eta$ is the aspect ratio, i.e., the ratio of the maximum to the minimum distance between two given points in $V$. (The $\tilde{O}$ notation hides logarithmic factors in $N$. The $\tilde{O}$ in the exponent hides logarithmic factors in $k$ and $1/\varepsilon$.)

1.1 An approximation scheme in the classical setting

We convert the $D^{2}$-sampling-based approximation scheme of [BGJK20] to a quantum version. The approximation scheme is simple and highly parallel, and can be described in the following few lines:

Input: Dataset $V$, integer $k>0$, and error $\varepsilon>0$

Output: A center set $C^{\prime}$ with $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$

  1. (Constant approximation) Find a center set $C$ that is a constant factor approximate solution. An $(\alpha,\beta)$ pseudo-approximate solution, for constants $\alpha,\beta$, also works.

  2. ($D^{2}$-sampling) Pick a set $T$ of $poly(\frac{k}{\varepsilon})$ points independently from the dataset using $D^{2}$-sampling with respect to the center set $C$.

  3. (All subsets) Out of all $k$-tuples $(S_{1},...,S_{k})$ of (multi)subsets of $T\cup\{\textrm{copies of points in } C\}$, each $S_{i}$ of size $O(\frac{1}{\varepsilon})$, return the tuple $(\mu(S_{1}),...,\mu(S_{k}))$ that gives the least $k$-means cost. Here, $\mu(S_{i})$ denotes the centroid of the points in $S_{i}$.

We will discuss the quantization of the above three steps of the approximation scheme of [BGJK20], thus obtaining a quantum approximation scheme. A classical sketch of the scheme is given below. (Steps (2) and (3) of the algorithm are within a loop for probability amplification. This loop is skipped in this high-level description for simplicity.)
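The following is a minimal classical sketch of this scheme (our own simplified rendering, for illustration only: it enumerates plain subsets of $T\cup C$ rather than the multisets with many copies of $C$ used in the actual algorithm, and the parameters `num_samples` and `subset_size` stand in for $poly(\frac{k}{\varepsilon})$ and $O(\frac{1}{\varepsilon})$):

```python
import itertools
import numpy as np

def kmeans_cost(V, centers):
    """Standard k-means cost: sum of squared distances to the closest center."""
    d2 = ((V[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def d2_sample(V, C, size, rng):
    """D^2-sampling: pick points with probability proportional to the
    squared distance to the closest center in C."""
    d2 = ((V[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    idx = rng.choice(len(V), size=size, p=d2 / d2.sum())
    return V[idx]

def approximation_scheme(V, k, C, rng, num_samples, subset_size):
    """Steps (2) and (3): D^2-sample w.r.t. a constant-factor solution C, then
    try every k-tuple of small subsets of the sample (plus C) and keep the
    tuple of centroids with the least k-means cost."""
    T = np.concatenate([d2_sample(V, C, num_samples, rng), C])
    best_cost, best_centers = np.inf, None
    for combo in itertools.product(
            itertools.combinations(range(len(T)), subset_size), repeat=k):
        centers = np.array([T[list(S)].mean(axis=0) for S in combo])
        cost = kmeans_cost(V, centers)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers
```

With $|T|=poly(\frac{k}{\varepsilon})$ and subsets of size $O(\frac{1}{\varepsilon})$, the loop runs over roughly $\left(\frac{k}{\varepsilon}\right)^{\tilde{O}(\frac{k}{\varepsilon})}$ tuples, which is the source of the exponential dependence on $\frac{k}{\varepsilon}$.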

1. (Constant approximation)

The first step requires finding a constant factor approximate solution for the $k$-means problem. Even though several constant factor approximation algorithms are known, we need one with a quantum counterpart that runs in time polylogarithmic in the input size $N$. One such algorithm is the $k$-means++ seeding algorithm [AV07], which picks $k$ centers in a sequence, with the $i^{th}$ center picked using $D^{2}$-sampling with respect to the previously chosen $(i-1)$ centers. ($D^{2}$-sampling: given a center set $C$, $D^{2}$-sampling picks a data point with probability proportional to the squared distance of the point to its closest center in $C$.) [KLLP19] gives an approximate quantum version of $D^{2}$-sampling. The approximation guarantee of the $k$-means++ algorithm is $O(\log{k})$ instead of the constant approximation required in the approximation scheme of [BGJK20]. It is known from the work of [ADK09] that if the $D^{2}$-sampling in $k$-means++ is continued for $2k$ steps instead of stopping after sampling $k$ centers, then we obtain a center set of size $2k$ that is a $(2,O(1))$-pseudo-approximate solution. This means that this $2k$-size center set has a $k$-means cost that is at most some constant times the optimal. Such a pseudo-approximate solution is sufficient for the approximation scheme of [BGJK20] to work. We show that the pseudo-approximation guarantee of [ADK09] also holds when using the approximate quantum version of the $D^{2}$-sampling procedure.

2. ($D^{2}$-sampling)

The second step of [BGJK20] involves $D^{2}$-sampling, which we have already discussed how to quantize; it is no different from the $D^{2}$-sampling involved in the $k$-means++ algorithm of the previous step. The sampling in this step is in fact simpler, since the center set $C$ with respect to which the $D^{2}$-sampling is performed does not change (as it does in the $k$-means++ algorithm).

3. (All subsets)

Since the number of points sampled in the previous step is $poly(\frac{k}{\varepsilon})$, we need to consider a list of $\left(\frac{k}{\varepsilon}\right)^{\tilde{O}(\frac{k}{\varepsilon})}$ tuples of subsets, each giving a $k$-center set (a tuple $(S_{1},...,S_{k})$ defines $(\mu(S_{1}),...,\mu(S_{k}))$). We need to compute the $k$-means cost for each $k$-center set in the list and then pick the one with the least cost. We give a quantization of these steps. (Note that when picking the center set with the least cost, we could get a quadratic improvement in the search for the best $k$-center set using quantum search. Given that the search space is of size $\left(\frac{k}{\varepsilon}\right)^{\tilde{O}(\frac{k}{\varepsilon})}$, this results only in a constant factor improvement in the exponent, so we leave quantum search out of the discussion for simplicity.)

Note that the quantization of the classical steps of [BGJK20] will incur precision errors. So, we first need to ensure that the approximation guarantee of [BGJK20] is robust against small errors in distance estimates, $D^{2}$-sampling probabilities, and $k$-means cost estimates. We must carefully account for errors and ensure that the quantum algorithm retains the $(1+\varepsilon)$ approximation guarantee of the robust version of [BGJK20].

Organization

We begin the technical discussion in the next section by showing that the approximation scheme of [BGJK20] is robust against errors. We will also show the robustness of the $k$-means++ procedure. In the subsequent section, we give the quantization of the steps of [BGJK20]. But first, we briefly discuss related work.

1.2 Related work

We have already discussed past research on quantum versions of the $k$-means algorithm (i.e., Lloyd's iterations); this includes [ABG13], [LMR13], and [KLLP19]. All of these are built using various quantum tools and techniques developed for problems in quantum unsupervised learning, such as coherent amplitude and median estimation, distance estimation, minimum finding, etc. See [WKS15] for examples of several such tools. Other directions on quantum $k$-means include adiabatic algorithms (e.g., [LMR13]) and algorithms using the QAOA framework (e.g., [OMA+17, FGG14]). However, these are without provable guarantees. A line of work has also suggested that quantum algorithms can outperform classical ones because of the QRAM data structure access. A more level playing field is to assume that a similar sample and query data access is available in the classical setting. Under this assumption, several “dequantization” results for unsupervised machine learning algorithms have been given; this includes [Tan19, CGL+20, Tan21]. It will be interesting to see if similar dequantization is possible for the quantum algorithm presented in this work, since the main ingredient of both our algorithm and the dequantization results is length-squared sampling.

2 A Robust Approximation Scheme

We start the discussion with the $D^{2}$-sampling method. In particular, we would like to check the robustness of the approximation guarantee provided by the $D^{2}$-sampling method against errors in estimating the distances between points. We will show that the $D^{2}$-sampling method gives a constant pseudo-approximation even under sampling errors.

2.1 Pseudo-approximation using $D^{2}$-sampling

Let the matrix $V\in\mathbb{R}^{N\times d}$ denote the dataset, where row $i$ contains the $i^{th}$ data point $v_{i}\in\mathbb{R}^{d}$. Let the matrix $C\in\mathbb{R}^{t\times d}$ denote any $t$-center set, where row $i$ contains the $i^{th}$ center $c_{i}\in\mathbb{R}^{d}$ out of the $t$ centers. Sampling a data point using the $D^{2}$ distribution w.r.t. (short for "with respect to") a center set $C$ means that the data point $v_{i}$ gets sampled with probability proportional to the squared distance to its nearest center in $C$. This is also known as $D^{2}$-sampling w.r.t. center set $C$. More formally, data points are sampled using the distribution $\left(\frac{D^{2}(v_{1},C)}{\sum_{j}D^{2}(v_{j},C)},...,\frac{D^{2}(v_{N},C)}{\sum_{j}D^{2}(v_{j},C)}\right)$, where $D^{2}(v_{j},C)\equiv\min_{c\in C}D^{2}(v_{j},c)$. For the special case $C=\emptyset$, $D^{2}$-sampling is the same as uniform sampling. The $k$-means++ seeding algorithm starts with an empty center set $C$ and, over $k$ iterations, adds a center to $C$ in every iteration by $D^{2}$-sampling w.r.t. the current center set $C$. It is known from the result of [AV07] that this $k$-means++ algorithm gives an $O(\log{k})$ approximation in expectation. It is also known from the result of [ADK09] that if $2k$ centers are sampled instead of $k$ (i.e., the for-loop runs from $1$ to $2k$), the cost with respect to these $2k$ centers is at most some constant times the optimal $k$-means cost. Such an algorithm is called a pseudo-approximation algorithm, and it is sufficient for the approximation scheme of [BGJK20]. So, we will quantize the following constant factor pseudo-approximation algorithm.

  Input: $(V,k)$
  $C\leftarrow\{\}$
  for $i=1$ to $2k$ do
     Pick $c$ using $D^{2}$-sampling w.r.t. center set $C$
     $C\leftarrow C\cup\{c\}$
  end for
  return $C$
Algorithm 1: A pseudo-approximation algorithm based on $D^{2}$-sampling.
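A minimal classical sketch of Algorithm 1 follows (our own rendering, for illustration; the optional `delta` argument multiplies every distance estimate by a $(1\pm\delta)$ factor to mimic the noisy estimates discussed next):

```python
import numpy as np

def d2_seeding(V, k, rng, delta=0.0):
    """Pick 2k centers by (noisy) D^2-sampling, as in Algorithm 1."""
    N = V.shape[0]
    C = [V[rng.integers(N)]]          # C = {} => the first pick is uniform
    for _ in range(2 * k - 1):
        d2 = ((V[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        # Distance estimates within a (1 +/- delta) relative error.
        noise = 1.0 + delta * rng.uniform(-1.0, 1.0, size=N)
        d2_tilde = d2 * noise ** 2
        p = d2_tilde / d2_tilde.sum()
        C.append(V[rng.choice(N, p=p)])
    return np.array(C)
```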

In the quantum simulation of the above sampling procedure, there will be small errors in the sampling probabilities in each iteration. We need to ensure that the constant approximation guarantee of the above procedure is robust against small errors in the sampling probabilities owing to errors in distance estimation. We will work with a relative error of $(1\pm\delta)$ for small $\delta$. Following is a crucial lemma from [AV07] needed to show the pseudo-approximation property of Algorithm 1.

Lemma 1 (Lemma 3.2 in [AV07])

Let $A$ be an arbitrary optimal cluster, and let $C$ be an arbitrary set of centers. Let $c$ be a center chosen from $A$ with $D^{2}$-sampling with respect to $C$. Then $\mathbf{E}[cost(A,C\cup\{c\})]\leq 8\cdot OPT(A)$.

The above lemma is used as a black box in the analysis of Algorithm 1 in [ADK09]. The following version of the lemma holds for distance estimates with a relative error of $(1\pm\delta)$ and gives a constant factor approximation guarantee. Since Lemma 1 is used as a black box in the analysis of Algorithm 1, replacing this lemma with Lemma 2 also gives a constant factor approximation to the $k$-means objective. We will use the following notion of the closeness of two distance functions.

Definition 1

A distance function $D_{1}$ is said to be $\delta$-close to a distance function $D_{2}$, denoted by $D_{1}\sim_{\delta}D_{2}$, if for every pair of points $x,y\in\mathbb{R}^{d}$, $D_{1}(x,y)\in(1\pm\delta)\cdot D_{2}(x,y)$. (We use the notation that for positive reals $P,Q$, $P\in(1\pm\delta)\cdot Q$ if $(1-\delta)\cdot Q\leq P\leq(1+\delta)\cdot Q$.)

Lemma 2

Let $0<\delta\leq 1/2$. Let $A$ be an arbitrary optimal cluster and $C$ be an arbitrary set of centers. Let $c$ be a center chosen from $A$ with $\tilde{D}^{2}$-sampling with respect to $C$, where $\tilde{D}\sim_{\delta}D$. Then $\mathbf{E}[cost(A,C\cup\{c\})]\leq 72\cdot OPT(A)$.

Proof

Let $D(a)$ denote the distance of the point $a$ from its nearest center in $C$ and let $\tilde{D}(a)$ denote the estimated distance. We have $\tilde{D}(a)\in(1\pm\delta)\cdot D(a)$. The following expression gives the expectation:

$$\sum_{a_{0}\in A}\frac{\tilde{D}^{2}(a_{0})}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right).$$

Note that for all $a_{0},a\in A$, $D(a_{0})\leq D(a)+D(a,a_{0})$. This gives $\tilde{D}(a_{0})\leq\frac{1+\delta}{1-\delta}\cdot\tilde{D}(a)+(1+\delta)\cdot D(a_{0},a)$, which further gives $\tilde{D}^{2}(a_{0})\leq 2\left(\frac{1+\delta}{1-\delta}\right)^{2}\cdot\tilde{D}^{2}(a)+2(1+\delta)^{2}\cdot D^{2}(a_{0},a)$ and hence $\tilde{D}^{2}(a_{0})\leq\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}\cdot\sum_{a\in A}\tilde{D}^{2}(a)+\frac{2}{|A|}(1+\delta)^{2}\cdot\sum_{a\in A}D^{2}(a_{0},a)$. We use this to obtain the following upper bound on the expectation $\mathbf{E}[cost(A,C\cup\{c\})]$:

\begin{align*}
\mathbf{E}[cost(A,C\cup\{c\})] &\leq \sum_{a_{0}\in A}\frac{\tilde{D}^{2}(a_{0})}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right)\\
&\leq \sum_{a_{0}\in A}\frac{\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}\sum_{a\in A}\tilde{D}^{2}(a)}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right)\\
&\quad+\sum_{a_{0}\in A}\frac{\frac{2}{|A|}(1+\delta)^{2}\sum_{a\in A}D^{2}(a_{0},a)}{\sum_{a\in A}\tilde{D}^{2}(a)}\cdot\sum_{a^{\prime}\in A}\min\left(D^{2}(a^{\prime}),D^{2}(a^{\prime},a_{0})\right)\\
&\leq \sum_{a_{0}\in A}\sum_{a^{\prime}\in A}\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}D^{2}(a^{\prime},a_{0})+\sum_{a_{0}\in A}\sum_{a\in A}\frac{2}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}D^{2}(a_{0},a)\\
&= \frac{4}{|A|}\left(\frac{1+\delta}{1-\delta}\right)^{2}\sum_{a_{0}\in A}\sum_{a\in A}D^{2}(a_{0},a)\\
&= 8\left(\frac{1+\delta}{1-\delta}\right)^{2}\cdot OPT(A)\\
&\leq 72\cdot OPT(A).
\end{align*}

This completes the proof of the lemma.∎

We will use this lemma in the approximation scheme of [BGJK20]. However, the lemma may be of independent interest, since it gives a quantum pseudo-approximation algorithm with a constant factor approximation that runs in time polylogarithmic in the data size and linear in $k$ and $d$. We will discuss this quantum algorithm in the next section.

2.2 Approximation scheme of [BGJK20]

A high-level description of the approximation scheme of [BGJK20] was given in the introduction. We give a more detailed pseudocode in Algorithm 2.

1:  Input: $(V,k,\varepsilon,C)$, where $V$ is the dataset, $k>0$ is the number of clusters, $\varepsilon>0$ is the error parameter, and $C$ is a $k$ center set that gives a constant (pseudo)approximation.
2:  Output: A list $\mathcal{L}$ of $k$ center sets such that for at least one $C^{\prime}\in\mathcal{L}$, $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$.
3:  Constants: $\rho=O(\frac{k}{\varepsilon^{4}})$; $\tau=O(\frac{1}{\varepsilon})$
4:  $\mathcal{L}\leftarrow\emptyset$; $count\leftarrow 1$
5:  repeat
6:     Sample a multi-set $M$ of $\rho k$ points from $V$ using $D^{2}$-sampling w.r.t. center set $C$
7:     $M\leftarrow M\cup\{\tau k \textrm{ copies of each element in } C\}$
8:     for all disjoint subsets $S_{1},...,S_{k}$ of $M$ such that $\forall i,\ |S_{i}|=\tau$ do
9:        $\mathcal{L}\leftarrow\mathcal{L}\cup\{(\mu(S_{1}),...,\mu(S_{k}))\}$
10:     end for
11:     $count\leftarrow count+1$
12:  until $count\geq 2^{k}$
13:  return $\mathcal{L}$
Algorithm 2: Algorithm of [BGJK20]

In addition to the input instance $(V,k)$ and error parameter $\varepsilon$, the algorithm is also given a constant approximate solution $C$, which is used for $D^{2}$-sampling. A pseudo-approximate solution $C$ is sufficient for the analysis in [BGJK20]. The discussion from the previous subsection gives a robust algorithm that outputs a pseudo-approximate solution even under errors in distance estimates. So, the input requirement of Algorithm 2 can be met. Now, since the main ingredient is $D^{2}$-sampling, we need to ensure that errors in distance estimates do not seriously impact the approximation analysis of Algorithm 2. We state the main theorem of [BGJK20] before giving the analogous statement for the modified algorithm where $D$ is replaced with a $\tilde{D}$ that is $\delta$-close to $D$.

Theorem 2.1 (Theorem 1 in [BGJK20])

Let $0<\varepsilon\leq 1/2$ be the error parameter, $V\in\mathbb{R}^{N\times d}$ be the dataset, $k$ be a positive integer, and let $C$ be a constant approximate solution for dataset $V$. Let $\mathcal{L}$ be the list returned by Algorithm 2 on input $(V,k,\varepsilon,C)$ using the Euclidean distance function $D$. Then with probability at least $3/4$, $\mathcal{L}$ contains a center set $C^{\prime}$ such that $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$. Moreover, $|\mathcal{L}|=\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}\right)$ and the running time of the algorithm is $O(Nd|\mathcal{L}|)$.

We give the analogous theorem with access to the Euclidean distance function $D$ replaced with a function $\tilde{D}$ that is $\delta$-close to $D$.

Theorem 2.2

Let $0<\varepsilon\leq\frac{1}{2}$ be the error parameter, $0<\delta<1$ be the closeness parameter, $V\in\mathbb{R}^{N\times d}$ be the dataset, $k$ be a positive integer, and let $C$ be a constant approximate solution for dataset $V$. Let $\mathcal{L}$ be the list returned by Algorithm 2 on input $(V,k,\varepsilon,C)$ using a distance function $\tilde{D}$ that is $\delta$-close to the Euclidean distance function $D$. Then with probability at least $3/4$, $\mathcal{L}$ contains a center set $C^{\prime}$ such that $\Phi(V,C^{\prime})\leq(1+\varepsilon)\cdot OPT$. Moreover, $|\mathcal{L}|=\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon(1-\delta)})}\right)$ and the running time of the algorithm is $O(Nd|\mathcal{L}|)$.

The proof of the above theorem closely follows the proof of Theorem 2.1 in [BGJK20], just as the proof of Lemma 2 closely followed that of Lemma 1; the minor changes relate to using approximate distance estimates $\tilde{D}$ instead of exact distances $D$. The statement of Theorem 2.2 is not surprising in this light. Instead of repeating the entire proof of [BGJK20], we point out the one change in their argument caused by using $\tilde{D}$ instead of $D$ as the distance function. The analysis of [BGJK20] works by partitioning the points in any optimal cluster $X_{j}$ into those that are close to $C$ and those that are far. For the far points, it is shown that when doing $D^{2}$-sampling, a far point is sampled with probability at least $\gamma$ times the uniform sampling probability (see Lemma 21 in [GJK20], which is the full version of [BGJK20]). It is then argued that a reasonably sized set of $D^{2}$-sampled points contains a uniform sub-sample, and that a combination of the uniform sub-sample with copies of points in $C$ gives a good center for this optimal cluster $X_{j}$. Replacing $D$ with $\tilde{D}$ decreases the value of $\gamma$ by a multiplicative factor of $\frac{(1-\delta)^{2}}{(1+\delta)^{2}}\geq(1-\delta)^{4}$, as shown in the short calculation below. This means that the number of points sampled should increase by a factor of $O(\frac{1}{1-\delta})$, and hence that the list size increases to $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon(1-\delta)})}\right)$. Note that when $\delta\leq\frac{1}{2}$, the list size and running time retain the same form as in [BGJK20] (i.e., $|\mathcal{L}|=\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}\right)$ and time $O(Nd|\mathcal{L}|)$).
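To see where this factor comes from, here is the calculation (our own one-line derivation, using $\tilde{D}\in(1\pm\delta)\cdot D$ in both the numerator and the denominator of the sampling probability): for any point $x$,

$$\Pr_{\tilde{D}^{2}}[x]=\frac{\tilde{D}^{2}(x,C)}{\sum_{y\in V}\tilde{D}^{2}(y,C)}\geq\frac{(1-\delta)^{2}\,D^{2}(x,C)}{(1+\delta)^{2}\sum_{y\in V}D^{2}(y,C)}=\left(\frac{1-\delta}{1+\delta}\right)^{2}\Pr_{D^{2}}[x]\geq(1-\delta)^{4}\Pr_{D^{2}}[x],$$

where the last inequality uses $\frac{1}{1+\delta}\geq 1-\delta$ for $\delta\in[0,1)$. So any lower bound of the form $\gamma$ times the uniform sampling probability for a far point degrades by at most this factor.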

3 Quantum Algorithms

We will work under the assumption that the minimum distance between two data points is $1$, which can be achieved by scaling. This makes the aspect ratio $\eta$ simply the maximum distance between two data points. We will use $i$ for an index into the rows of the data matrix $V\in\mathbb{R}^{N\times d}$, and $j$ for an index into the rows of the center matrix $C\in\mathbb{R}^{k\times d}$. We would ideally like to design a quantum algorithm that performs the transformation:

$$\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{D(v_{i},c_{j})}.$$

Let us call the state on the right $\ket{\Psi_{ideal}}$. This is an ideal quantum state for us since $\ket{\Psi_{ideal}}$ helps to perform $D^{2}$-sampling and to find the $k$-means cost of a clustering, which are the main components of the approximation scheme of [BGJK20] that we intend to use. One caveat is that we will only be able to perform the following transformation (instead of the transformation above)

$$\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{\psi_{i,j}},$$

where $\ket{\psi_{i,j}}$ is an approximation for $\ket{\tilde{D}(v_{i},c_{j})}$ in a sense that we will make precise below. We will use $\ket{\Psi_{real}}$ to denote the state $\ket{i}\ket{j}\ket{\psi_{i,j}}$. This state is prepared using tools such as the swap test followed by coherent amplitude estimation and median estimation. Since these tools and techniques are known from previous works [WKS15, LMR13, KLLP19], we summarise the discussion (see Sections 4.1 and 4.2 in [KLLP19]) in the following lemma.

Lemma 3 ([KLLP19] and [WKS15])

Assume that for a data matrix $V\in\mathbb{R}^{N\times d}$ and a center set matrix $C\in\mathbb{R}^{t\times d}$, the unitaries (i) $\ket{i}\ket{0}\rightarrow\ket{i}\ket{v_{i}}$ and (ii) $\ket{j}\ket{0}\rightarrow\ket{j}\ket{c_{j}}$ can be performed in time $T$ and the norms of the vectors are known. For any $\Delta>0$, there is a quantum algorithm that in time $O\left(\frac{T\log{\frac{1}{\Delta}}}{\varepsilon}\right)$ computes:

$$\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{\psi_{i,j}},$$

where $\ket{\psi_{i,j}}$ satisfies the following two conditions for every $i\in[N]$ and $j\in[t]$:

  (i) $\norm{\ket{\psi_{i,j}}-\ket{0^{\otimes\ell}}\ket{\tilde{D}(v_{i},c_{j})}}\leq\Delta$, and

  (ii) for every $i,j$, $\tilde{D}(v_{i},c_{j})\in(1\pm\varepsilon)\cdot D(v_{i},c_{j})$.

In the subsequent discussions, we will use $T$ as the time to access the QRAM data structure [kp17], i.e., for the transitions $\ket{i}\ket{0}\rightarrow\ket{i}\ket{v_{i}}$ and $\ket{j}\ket{0}\rightarrow\ket{j}\ket{c_{j}}$ as given in the above lemma. This is known to be $T=O(\log^{2}{(Nd)})$. Moreover, the time to update each entry in this data structure is also $O(\log^{2}{(Nd)})$. This is the logarithmic factor hidden in the $\tilde{O}$ notation. In the following subsections, we discuss the utility of $\ket{\Psi_{real}}$ for the various components of the approximation scheme of [BGJK20]. During these discussions, it will be easier to see the utility first with the ideal state $\ket{\Psi_{ideal}}$ and then with the real state $\ket{\Psi_{real}}$ that can actually be prepared. We will see how $\ket{\Psi_{real}}$ is sufficient within a reasonable error bound.

3.1 Finding distance to closest center

Let us see how we can estimate the distance of any point to its closest center in a center set $C$ with $t\leq k$ centers. We can use the transformation $\ket{i}\ket{j}\ket{0}\rightarrow\ket{i}\ket{j}\ket{D(v_{i},c_{j})}$ to prepare the following state for any $i$:

$$\ket{i}\ket{D(v_{i},c_{1})}\ket{D(v_{i},c_{2})}...\ket{D(v_{i},c_{t})}.$$

We can then iteratively compare and swap pairs of registers to prepare the state $\ket{i}\ket{\min_{j\in[t]}D(v_{i},c_{j})}$. If we apply the same procedure to $\ket{i}\ket{\psi_{i,1}}...\ket{\psi_{i,t}}$, then with probability at least $(1-2\Delta)^{t}$, the resulting state will be $\ket{i}\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}$. So, the contents of the second register will be an estimate of the distance of the $i^{th}$ point to its closest center in the center set $C$. This further means that the following state can be prepared with probability at least $(1-2\Delta)^{Nt}$ (the state prepared is actually $\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\left(\alpha\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}+\beta\ket{G}\right)$ with $|\alpha|^{2}\geq(1-2\Delta)^{Nk}$; however, the subsequent discussion becomes much simpler if we assume that $\ket{\Psi_{C}}$ is prepared with probability $|\alpha|^{2}$):

$$\ket{\Psi_{C}}\equiv\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}.$$

This quantum state can be used to find the approximate clustering cost of the center set $C$, which we discuss in the following subsection. However, before we do that, let us summarise the main ideas of this subsection in the following lemma.

Lemma 4

There is a quantum algorithm that, with probability at least $(1-2\Delta)^{Nt}$, prepares the quantum state $\ket{\Psi_{C}}$ in time $O\left(\frac{Tk\log{\frac{1}{\Delta}}}{\varepsilon}\right)$.

3.2 Computing cost of clustering

Suppose we want to compute the $k$-means cost, $\Phi(V,C)\equiv\sum_{i=1}^{N}\min_{j\in[k]}D^{2}(v_{i},c_{j})$, of the clustering given by a $k$ center set $C\in\mathbb{R}^{k\times d}$. We can prepare $m$ copies of the state $\ket{\Psi_{C}}$, measure them, and estimate the cost of the clustering by summing the squares of the second registers. If $m$ is sufficiently large, we obtain a close estimate of $\Phi(V,C)$. To show this formally, we will use the following Hoeffding tail inequality.

Theorem 3.1 (Hoeffding bound)

Let $X_{1},...,X_{m}$ be independent, bounded random variables such that $X_{i}\in[a,b]$. Let $S_{m}=X_{1}+...+X_{m}$. Then for any $\theta>0$, we have:

$$\mathbf{Pr}[|S_{m}-\mathbf{E}[S_{m}]|\geq\theta]\leq 2\cdot e^{\frac{-2\theta^{2}}{m(b-a)^{2}}}.$$

Let $X_{1},...,X_{m}$ denote the squares of the measured values of the second register in the $m$ copies of $\ket{\Psi_{C}}$. These are random values in the range $[1,\eta^{2}]$, where $\eta=\max_{i,j}\tilde{D}(v_{i},v_{j})\in(1\pm\varepsilon)\cdot\max_{i,j}D(v_{i},v_{j})$. First, note that the expectation of each of these random variables equals $\frac{\tilde{\Phi}(V,C)}{N}$, where $\tilde{\Phi}(V,C)\equiv\sum_{i=1}^{N}\min_{j\in[k]}\tilde{D}^{2}(v_{i},c_{j})\in(1\pm\varepsilon)^{2}\cdot\Phi(V,C)$. We define $S_{m}=X_{1}+X_{2}+...+X_{m}$ and apply the Hoeffding bound on these bounded random variables to get a concentration result that can then be used.
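A minimal classical sketch of this estimator follows (our own illustration; the multiplicative `eps_dist` noise stands in for the $(1\pm\varepsilon)$ distance-estimation error of Lemma 3): draw $m$ uniform indices, square the (noisy) distance to the closest center, and rescale the sample mean by $N$. This is exactly the quantity $\alpha_{m}=S_{m}\cdot\frac{N}{m}$ analysed in Lemma 5 below.

```python
import numpy as np

def estimate_cost(V, C, m, rng, eps_dist=0.0):
    """Monte-Carlo analogue of measuring m copies of |Psi_C>."""
    N = V.shape[0]
    idx = rng.integers(N, size=m)                               # uniform |i> register
    d2 = ((V[idx, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
    noise = 1.0 + eps_dist * rng.uniform(-1.0, 1.0, size=m)     # (1 +/- eps) estimates
    X = d2 * noise ** 2                                         # X_1, ..., X_m
    return N * X.mean()                                         # alpha_m = S_m * N / m
```

Lemma 5 below quantifies how large $m$ must be for this estimate to concentrate around $\tilde{\Phi}(V,C)$.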

Lemma 5

Let $\alpha_{m}=S_{m}\cdot\frac{N}{m}$ and $L>0$. If $m=\frac{\eta^{2}\ln(10L)}{\varepsilon^{2}}$, then we have:

$$\mathbf{Pr}[\alpha_{m}\in(1\pm\varepsilon)\cdot\tilde{\Phi}(V,C)]\geq 1-\frac{1}{5L}.$$
Proof

We know that $\mathbf{E}[S_{m}]=\frac{m}{N}\cdot\tilde{\Phi}(V,C)$. From the Hoeffding tail inequality, we get the following:

$$\mathbf{Pr}[|S_{m}-\mathbf{E}[S_{m}]|\geq\varepsilon\cdot\mathbf{E}[S_{m}]]\leq 2\cdot e^{\frac{-2\varepsilon^{2}\mathbf{E}[S_{m}]^{2}}{m\eta^{2}}}=2\cdot e^{\frac{-2\varepsilon^{2}m}{\eta^{2}}\cdot\left(\frac{\tilde{\Phi}(V,C)}{N}\right)^{2}}\leq 2\cdot e^{-\ln(10L)}\leq\frac{1}{5L}.$$

This implies that:

$$\mathbf{Pr}[|\alpha_{m}-\tilde{\Phi}(V,C)|\geq\varepsilon\cdot\tilde{\Phi}(V,C)]\leq\frac{1}{5L}.$$

This completes the proof of the lemma.∎

So, conditioned on having $m$ copies of the state $\ket{\Psi_{C}}$, we get an estimate $\alpha_{m}$ of the clustering cost that is within a relative error of $(1\pm\varepsilon)$ of $\tilde{\Phi}(V,C)$ with probability at least $(1-\frac{1}{5L})$. Removing the conditioning, we get the same with probability at least $(1-2\Delta)^{Nkm}\cdot(1-\frac{1}{5L})$. We want to use the above cost estimation technique to calculate the cost for a list of center sets $C_{1},...,C_{L}$ and then pick the center set from the list with the least cost. We must apply the union bound appropriately to do this with high probability. We summarise these results in the following lemma. Let us first set some of the parameters with the values that we will use to implement the approximation scheme of [BGJK20].

  • $L$ denotes the size of the list of $k$-center sets we will iterate over to find the one with the least cost. This quantity is bounded as $L=\left(\frac{k}{\varepsilon}\right)^{O(\frac{k}{\varepsilon})}$.

  • $m$ is the number of copies of the state $\ket{\Psi_{C}}$ used to estimate the cost of a center set $C$. As given in Lemma 5, this is $m=\frac{\eta^{2}\ln(10L)}{\varepsilon^{2}}$, where $\eta=(1+\varepsilon)\cdot\max_{i,j}D(v_{i},v_{j})$.

Lemma 6

Let $L=\left(\frac{k}{\varepsilon}\right)^{O(\frac{k}{\varepsilon})}$, $m=\frac{\eta^{2}\ln(10L)}{\varepsilon^{2}}$, and $\Delta=O\left(\frac{1}{NkmL}\right)$. Given a point set $V$ and a list of center sets $C_{1},...,C_{L}$ in the QRAM model, there is a quantum algorithm that runs in time $\tilde{O}\left(2^{\tilde{O}(\frac{k}{\varepsilon})}T\eta^{2}\right)$ and outputs an index $l$ such that $\Phi(V,C_{l})\leq(1+\varepsilon)^{2}\min_{j\in[L]}\Phi(V,C_{j})$ with probability at least $\frac{3}{5}$.

Proof

The algorithm estimates the cost of $C_{1},...,C_{L}$ using $m$ copies each of $\ket{\Psi_{C_{1}}},...,\ket{\Psi_{C_{L}}}$ and picks the index with the minimum estimated value, in time $O\left(\frac{TkmL\log{\frac{1}{\Delta}}}{\varepsilon}\right)$. Plugging in the values of $L$, $m$, and $\Delta$, we get the running time stated in the lemma.

Let us bound the error probability of this procedure. By Lemma 4, the probability that we do not have the correct $m$ copies each of $\ket{\Psi_{C_{1}}},...,\ket{\Psi_{C_{L}}}$ is bounded by $1-(1-2\Delta)^{NkmL}$. Conditioned on having $\ket{\Psi_{C_{1}}},...,\ket{\Psi_{C_{L}}}$, the probability that there exists an index $j\in[L]$ for which the estimate is off by more than a $(1\pm\varepsilon)^{2}$ factor is upper bounded by $\frac{1}{5}$ by the union bound. So, the probability that the algorithm finds an index $l$ such that $\Phi(V,C_{l})>(1+\varepsilon)^{2}\min_{j\in[L]}\Phi(V,C_{j})$ is upper bounded by $\left(1-(1-2\Delta)^{NkmL}\right)+\frac{1}{5}$. This probability is at most $\frac{2}{5}$ since $\Delta=O(\frac{1}{NkmL})$. This completes the proof of the lemma. ∎

3.3 D2D^{2}-sampling

$D^{2}$-sampling from the point set $V$ with respect to a center set $C\in\mathbb{R}^{t\times d}$ with $t$ centers samples $v_{i}$ with probability proportional to $\min_{j\in[t]}D^{2}(v_{i},c_{j})$. Let us see whether our state $\ket{\Psi_{C}}=\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\ket{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}$ is useful for performing this sampling. If we can pull the value of the second register out as an amplitude, then measurement will give us something close to $D^{2}$-sampling. This is possible since we have an estimate of the clustering cost from the previous subsection. We can use controlled rotations on an ancilla qubit to prepare the state:

$$\ket{\Psi_{sample}}\equiv\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\ket{i}\left(\beta_{i}\ket{0}+\frac{1}{\sqrt{2}}\ket{1}\right),$$

where $\beta_{i}=\frac{\min_{j\in[t]}\tilde{D}(v_{i},c_{j})}{\sqrt{2\cdot\tilde{\Phi}(V,C)}}$. So, the probability of measuring $(i,0)$ is $\frac{\min_{j\in[t]}\tilde{D}^{2}(v_{i},c_{j})}{2N\cdot\tilde{\Phi}(V,C)}$. Since we do rejection sampling (ignoring the $(.,1)$ outcomes, which appear with total probability $\frac{1}{2}$), we end up sampling from a distribution in which the probability of sampling $i$ is $\frac{\min_{j\in[t]}\tilde{D}^{2}(v_{i},c_{j})}{\tilde{\Phi}(V,C)}\in(1\pm\varepsilon)\cdot\frac{\min_{j\in[t]}D^{2}(v_{i},c_{j})}{\Phi(V,C)}$. This means that points get sampled with probability close to the actual $D^{2}$-sampling probability. As we have mentioned earlier, this is sufficient for the approximation guarantees of [BGJK20] to hold. We summarise the observations of this section in the next lemma; a classical sketch of this rejection-sampling step is given after Lemma 7. We will need the following notion of the relative similarity of two distributions.

Definition 2

Let $0<\varepsilon<1$. For two distributions $D_{1}$ and $D_{2}$ over a finite set $X$, we say that $D_{1}\sim_{\varepsilon}D_{2}$ if for every $x\in X$, $D_{1}(x)\in(1\pm\varepsilon)\cdot D_{2}(x)$.

Lemma 7

Given a dataset $V\in\mathbb{R}^{N\times d}$ and a center set $C\in\mathbb{R}^{t\times d}$ in the QRAM model, there is a quantum algorithm that runs in time $O\left(\frac{TkS\log{\frac{1}{\Delta}}}{\varepsilon}\right)$ and with probability at least $(1-2\Delta)^{NtS}$ outputs $S$ independent samples with distribution $Z$ such that $Z\sim_{\varepsilon}D^{2}$, where $D^{2}$ denotes the $D^{2}$-sampling distribution.

Proof

The proof follows from Lemma 4 and the preceding discussion.∎
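For intuition, here is a minimal classical sketch of the rejection-sampling step described at the start of this subsection (our own illustration; `d2_tilde` holds the estimated squared distances $\min_{j}\tilde{D}^{2}(v_{i},c_{j})$ and `phi_tilde` their sum): accepting an index $i$ with probability proportional to `d2_tilde[i]` reproduces, conditioned on acceptance, the approximate $D^{2}$-sampling distribution.

```python
import numpy as np

def rejection_d2_sample(d2_tilde, phi_tilde, rng):
    """Draw one index approximately D^2-distributed via rejection sampling."""
    N = len(d2_tilde)
    while True:
        i = rng.integers(N)                          # the |i> register is uniform
        beta_sq = d2_tilde[i] / (2.0 * phi_tilde)    # acceptance prob. = beta_i^2
        if rng.random() < beta_sq:
            return i                                 # accepted: P(i) proportional to d2_tilde[i]
        # rejected: repeat (in this classical mimic the expected number of
        # repetitions is about 2N, since the average acceptance prob. is 1/(2N))
```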

Lemma 7 says that for $\Delta=O(\frac{1}{NkS})$, we obtain the required samples with high probability. We can now give the proof of Theorem 1.1 by assembling the quantum tools of this section.

Proof (Proof of Theorem 1.1)

The first requirement for executing the algorithm of [BGJK20] is a constant pseudo-approximation algorithm, using which we obtain the initial center set $C$. By Lemma 2, we know that $2k$ points sampled using $\tilde{D}^{2}$-sampling give such a center set. From Lemma 7, this can be done quantumly in time $\tilde{O}(\frac{k^{2}d}{\varepsilon})$, which also includes the time $O(kd\log^{2}{(kd)})$ to set up the QRAM data structure over all $k$ iterations. The algorithm of [BGJK20] has an outer repeat loop for probability amplification. Within the outer loop, $poly(\frac{k}{\varepsilon})$ points are $D^{2}$-sampled with respect to the center set $C$ (line 6). This can again be done quantumly using Lemma 7 in time $\tilde{O}(d(k/\varepsilon)^{O(1)})$. We can then classically process the point set $M$ (see line 7 in Algorithm 2) and create the QRAM data structure for the list $C_{1},...,C_{L}$ of $k$-center sets that correspond to all possible disjoint subsets of $M$ (see line 8 in Algorithm 2). This takes time $\tilde{O}(Lkd)$, where $L=\left(\frac{k}{\varepsilon}\right)^{O(\frac{k}{\varepsilon})}$. Theorem 2.2 shows that at least one center set in the list gives a $(1+\varepsilon)$-approximation. We use this fact in conjunction with Lemma 6 to conclude that the resulting quantum algorithm runs in time $\tilde{O}(2^{\tilde{O}(\frac{k}{\varepsilon})}d\eta^{2})$ and with high probability outputs a center set $C^{\prime}$ such that $\Phi(V,C^{\prime})\leq(1+\varepsilon)^{3}\cdot OPT$. (We needed $(1+\varepsilon)$ but got $(1+\varepsilon)^{3}$ instead; however, this can be handled by running the algorithm with $\varepsilon^{\prime}=\varepsilon/4$.)

4 Discussion and Open Problems

We give a quantum algorithm for the $k$-means problem with a provable approximation guarantee of $(1+\varepsilon)$ for arbitrary $\varepsilon$, with polylogarithmic running time dependence on the data size $N$ and exponential dependence on $\frac{k}{\varepsilon}$. In the classical setting, there are FPT (fixed-parameter tractable) algorithms that have polynomial running time dependence on the input size $N$ but are allowed exponential dependence on the parameters (e.g., $k$ in the $k$-means problem, which is typically a small number). In this paper, we saw a case where such a classical FPT algorithm could be taken into the quantum setting, lowering the dependence on $N$ from linear in the classical setting [BGJK20] to polylogarithmic (this paper) while keeping the dependence on the parameters ($k,d,\varepsilon$) intact. The aspect ratio $\eta$ can be considered an additional parameter. It would be interesting to see if there are other problems where such quantization is possible. If so, it would make sense to talk about quantum FPT (QFPT) algorithms with polylogarithmic dependence on the input size and possibly exponential dependence on the parameters. Another future direction is to check whether the sample and query access defined by [Tan19] is sufficient to obtain comparable results in the classical setting.

References

  • [ABG13] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum speed-up for unsupervised learning. Machine Learning, 90:261–287, 2013.
  • [ADK09] Ankit Aggarwal, Amit Deshpande, and Ravi Kannan. Adaptive sampling for k-means clustering. In Proceedings of the 12th International Workshop and 13th International Workshop on Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX ’09 / RANDOM ’09, pages 15–28, Berlin, Heidelberg, 2009. Springer-Verlag.
  • [AV07] David Arthur and Sergei Vassilvitskii. K-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, page 1027–1035, USA, 2007. Society for Industrial and Applied Mathematics.
  • [BGJK20] Anup Bhattacharya, Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. On Sampling Based Algorithms for k-Means. In Nitin Saxena and Sunil Simon, editors, 40th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2020), volume 182 of Leibniz International Proceedings in Informatics (LIPIcs), pages 13:1–13:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
  • [CGL+20] Nai-Hui Chia, András Gilyén, Han-Hsuan Lin, Seth Lloyd, Ewin Tang, and Chunhao Wang. Quantum-Inspired Algorithms for Solving Low-Rank Linear Equation Systems with Logarithmic Dependence on the Dimension. In Yixin Cao, Siu-Wing Cheng, and Minming Li, editors, 31st International Symposium on Algorithms and Computation (ISAAC 2020), volume 181 of Leibniz International Proceedings in Informatics (LIPIcs), pages 47:1–47:17, Dagstuhl, Germany, 2020. Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
  • [DH99] Christoph Durr and Peter Hoyer. A quantum algorithm for finding the minimum, 1999.
  • [FGG14] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. A quantum approximate optimization algorithm, 2014.
  • [GJK20] Dishant Goyal, Ragesh Jaiswal, and Amit Kumar. Streaming ptas for constrained k-means, 2020.
  • [KLLP19] Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, and Anupam Prakash. q-means: A quantum algorithm for unsupervised machine learning. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019.
  • [Llo82] S. Lloyd. Least squares quantization in pcm. IEEE Transactions on Information Theory, 28(2):129–137, 1982.
  • [LMR13] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning, 2013.
  • [OMA+17] J. S. Otterbach, R. Manenti, N. Alidoust, A. Bestwick, M. Block, B. Bloom, S. Caldwell, N. Didier, E. Schuyler Fried, S. Hong, P. Karalekas, C. B. Osborn, A. Papageorge, E. C. Peterson, G. Prawiroatmodjo, N. Rubin, Colm A. Ryan, D. Scarabelli, M. Scheer, E. A. Sete, P. Sivarajah, Robert S. Smith, A. Staley, N. Tezak, W. J. Zeng, A. Hudson, Blake R. Johnson, M. Reagor, M. P. da Silva, and C. Rigetti. Unsupervised machine learning on a hybrid quantum computer, 2017.
  • [Tan19] Ewin Tang. A quantum-inspired classical algorithm for recommendation systems. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, pages 217–228, New York, NY, USA, 2019. Association for Computing Machinery.
  • [Tan21] Ewin Tang. Quantum principal component analysis only achieves an exponential speedup because of its state preparation assumptions. Physical Review Letters, 127(6), aug 2021.
  • [WKS15] Nathan Wiebe, Ashish Kapoor, and Krysta M. Svore. Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning. Quantum Info. Comput., 15(3–4):316–356, March 2015.