Improved Approximations for Euclidean $k$ -means and $k$ -median, via Nested Quasi-Independent Sets

Vincent Cohen-Addad , Hossein Esfandiari , Vahab Mirrokni , Shyam Narayanan Google Research. [email protected] Research. [email protected] Research. [email protected] Institute of Technology. Work done as an intern at Google Research. [email protected]

Abstract

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$ -median and $k$ -means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani [30] and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward [1]. Our algorithm achieves an approximation ratio of $2.406$ and $5.912$ for Euclidean $k$ -median and $k$ -means, respectively, improving upon the 2.633 approximation ratio of Ahmadian et al. [1] and the 6.1291 approximation ratio of Grandoni, Ostrovsky, Rabani, Schulman, and Venkat [25].

Our techniques involve a much stronger exploitation of the Euclidean metric than previous work on Euclidean clustering. In addition, we introduce a new method of removing excess centers using a variant of independent sets over graphs that we dub a “nested quasi-independent set”. In turn, this technique may be of interest for other optimization problems in Euclidean and $\ell_{p}$ metric spaces.

1 Introduction

The $k$ -means and $k$ -median problems are among the oldest and most fundamental clustering problems. Originally motivated by operations research and statistics problems when they first appeared in the late 50s [43, 37, 28] they are now at the heart of several unsupervised and semi-supervised machine learning models and data mining techniques and are thus a main part of the toolbox of modern data analysis techniques in a variety of fields. In addition to their practical relevance, these two problems exhibit strong ties with some classic optimization problems, such as set cover, and understanding their complexity has thus been a long standing problem which has inspired several breakthrough techniques.

Given two sets $\mathcal{D}$ , $\mathcal{F}$ of points in a metric space and an integer $k$ , the goal of the $k$ -median problem is to find a set $S$ of $k$ points in $\mathcal{F}$ , called centers, minimizing the sum of distances from each point in $\mathcal{D}$ to the closest point in $S$ . The goal of the $k$ -means problem is to minimize the sum of distances squared. The complexity of the $k$ -median or $k$ -means problems heavily depends on the underlying metric space. In general metric spaces, namely when the distances only need to obey the triangle inequality, the $k$ -median and $k$ -means problem are known to admit a 2.675-approximation [9] and a 9-approximation [1], respectively, and cannot be approximated better than $1+2/e\sim 1.736$ for $k$ -median and $1+8/e\sim 3.943$ for $k$ -means assuming P $\neq$ NP. We know that the upper and lower bounds are tight when we allow a running time of $f(k)n^{O(1)}$ (i.e.: in the fixed-parameter tractability setting) for arbitrary computable functions $f$ [17], suggesting that the lower bounds cannot be improved for the general case either. When we turn to slightly more structured metric spaces such as Euclidean metrics the picture changes drastically. While the problem remains NP-hard when the dimension $d=2$ (and $k$ is large) [41] or $k=2$ (and $d$ is large) [23], both problems admit $(1+\varepsilon)$ -approximation algorithms with running times $f(k,\varepsilon)nd$ [33], with an exponential dependency in $k$ , and $f(d,\varepsilon)n\log^{O(1)}n$ [15, 14], with a doubly exponential dependency in $d$ (the latter extends to doubling metrics), a prohibitive running time in practice.

Arguably, the most practically important setting is when the input points lie in Euclidean space of large dimension and the number of clusters is non-constant, namely when both $k$ and $d$ are part of the input ¹¹1Note here that $d$ can always be assumed to be of $O(\log k/\varepsilon^{-2})$ using dimensionality reduction techniques [38] (see also [8] for a slightly worse bound). Unfortunately, the complexity of the problem in this regime is far from being understood. Recent results have proven new inapproximability results: respectively 1.17 and 1.07 for the Euclidean k-means and k-median problems assuming P $\neq$ NP and 1.73 and 1.27 assuming the Johnson Coverage hypothesis of [18, 19]. For the continuous case, the same series of papers show a hardness of 1.06 and 1.015 for Euclidean $k$ -means and $k$ -median respectively assuming P $\neq$ NP and 1.36 and 1.08 assuming the Johnson-Coverage hypothesis (see also [21] for further related work on continuous $k$ -median and $k$ -means in other metric spaces). Interestingly, the above hardness results implies that there is no algorithmic benefit that could be gained from the $\ell_{1}$ -metric: Assuming the Johnson Coverage hypothesis the hardness bounds for $k$ -median and $k$ -means are the same in the $\ell_{1}$ -metric that in general metrics [19]. However, it seems plausible to leverage the structure of the $\ell_{2}$ -metric to obtain approximation algorithms bypassing the lower bounds for the general metric or $\ell_{1}$ case (e.g.: obtaining an approximation ratio better than $1+2/e$ for $k$ -median).

In a breakthrough result, Ahmadian et al. [1] were the first to exploit the structure of high-dimensional Euclidean metrics to obtain better bounds than the current best-known bounds for the general metric case. Concretely, they showed how to obtain a 6.3574-approximation for $k$ -means (improving upon the 9-approximation of Kanungo et al. [31]) and a $1+\sqrt{8/3}+\varepsilon\approx 2.633$ -approximation for $k$ -median (improving upon the 2.675-approximation for general metrics [9]). The bound for $k$ -means was recently improved to a 6.1291-approximation (or more precisely, the unique real root to $4x^{3}-24x^{2}-3x-1=0$ ) by [25], by tweaking the analysis of Ahmadian et al. [1], and no progress has been made for Euclidean $k$ -median after [1].

This very active line of research mixes both new hardness of approximation results and approximation algorithms and aims at answering a very fundamental question: How much can we leverage the Euclidean geometry to obtain better approximation algorithms? And conversely, what do we learn about Euclidean geometry when studying basic computational problems? Our results aim at making progress toward answering the above questions.

1.1 Our Results

Our main result consists of better approximation algorithms for both $k$ -median and $k$ -means, with ratio $2.406$ for $k$ -median and $5.912$ for $k$ -means, improving the 2.633-approximation of Ahmadian et al. [1] for $k$ -median and $6.1291$ -approximation of [25] for $k$ -means.

Theorem 1.1.

For any $\varepsilon>0$ , there exists a polynomial-time algorithm that returns a solution to the Euclidean $k$ -median problem whose cost is at most $2.406+\varepsilon$ times the optimum.

For any $\varepsilon>0$ , there exists a polynomial-time algorithm that returns a solution to the Euclidean $k$ -means problem whose cost is at most $5.912+\varepsilon$ times the optimum.

Our approximation ratio for $k$ -median breaks the natural barrier of $1+\sqrt{2}>2.41$ and our approximation ratio for $k$ -means is the first below 6. The approximation bound of $1+\sqrt{2}$ for Euclidean $k$ -median is indeed a natural barrier for the state-of-the-art approach of Ahmadian et al. [1] that relies on using the primal-dual approach of Jain and Vazirani [30]. At a high level, the approximation bound of 3 for general metrics for the algorithm of Jain and Vazirani can be interpreted as an approximation bound of 1+2, where 1 is the optimum service cost and an additional cost of 2 times the optimum is added for input clients poorly served by the solution. Since general metrics are only required to satisfy the triangle inequality, the 2 naturally arises from bounding the distance from a client to its center in the solution output and an application of the triangle inequality to bound this distance. Therefore, one can hope to then obtain a substantial gain when working in Euclidean spaces: The triangle inequality is rarely tight (unless points are aligned) and this leads to the hope of replacing the 2 by $\sqrt{1+1}$ in the above the bound, making the approximation ratio of $1+\sqrt{2}$ a natural target for $k$ -median. In fact, this high-level discussion can be made slightly more formal: we show that the analysis of the result of Ahmadian et al. [1] cannot be improved below $1+\sqrt{2}$ for $k$ -median, exhibiting a limit of the state-of-the-art approaches.

In this paper, we take one step further, similar to the result of Li and Svensson [36] for the general metric case who improved for the first below the approximation ratio of 3, we show how to bypass $1+\sqrt{2}$ for Euclidean $k$ -median (and the bound of 6.129 for $k$ -means).

Furthermore, one of our main contributions is to obtain better Lagrangian Multiplier Preserving (LMP) approximations for the Lagrangian relaxations of both problems. To understand this, we need to give a little more background on previous work and how previous approximation algorithms were derived. A natural approach to the $k$ -median and $k$ -means problem is to (1) relax the constraint on the number of centers $k$ in the solution, (2) find an approximate solution for the relaxed problem, and (3) derive an approximate solution that satisfies the constraint on the number of centers. Roughly, an LMP approximate solution $S$ is a solution where we bound the ratio of the cost of $S$ to the optimum cost, but pay a penalty proportional to some $\lambda\geq 0$ for each center in $S$ . Importantly, if $|S|=k$ , an LMP $\rho$ -approximation that outputs $S$ is a $\rho$ -approximation for $k$ -means (or $k$ -median). We formally define an LMP approximation in Section 2. LMP solutions have played a central role in obtaining better approximation algorithms for $k$ -median in general metrics and more recently in high-dimensional Euclidean metrics. Thus obtaining better LMP solutions for the Lagrangian relaxation of $k$ -median and $k$ -means has been an important problem. A byproduct of our approach is a new 2.395-LMP for Euclidean $k$ -median and $3+2\sqrt{2}$ -LMP for Euclidean $k$ -means.

Our techniques may be of use in other clustering and combinatorial optimization problems over Euclidean space as well, such as Facility Location. In addition, by exploiting the geometric structure similarly, these techniques likely extend to $\ell_{p}$ -metric spaces (for $p>1$ ).

1.2 Related Work

The first $O(1)$ -approximation for the $k$ -median problem in general metrics is due to Charikar et al. [12]. The $k$ -median problem has then been a testbed for a variety of powerful approaches such as the primal-dual schema [30, 10], greedy algorithms (and dual fitting) [29], improved LP rounding [13], local-search [4, 16], and LMP-based-approximation [36]. The current best approximation guarantee is 2.675 [9] and the best hardness result is $(1+2/e)$ [26]. For $k$ -means in general metrics, the current best approximation guarantee is 9 [1] and the current best hardness result is $(1+8/e)$ (which implicitly follows from [26], as noted in [1]).

We have already covered in the introduction the history of algorithms for high-dimensional Euclidean $k$ -median and $k$ -means with running time polynomial in both $k$ and the dimension and that leverage the properties of the Euclidean metrics ([31, 1, 25]). In terms of lower bounds, the first to show that the high-dimensional $k$ -median and $k$ -means problems were APX-hard were Guruswami and Indyk [27], and later Awasthi et al. [6] showed that the APX-hardness holds even if the centers can be placed arbitrarily in $\mathbb{R}^{d}$ . The inapproximability bound was later slightly improved by Lee et al. [34] until the recent best known bounds of [18, 19]. From a more practical point of view, Arthur and Vassilvitskii showed that the widely-used popular heuristic of Lloyd [37] can lead to solutions with arbitrarily bad approximation guarantees [3], but can be improved by a simple seeding strategy, called $k$ -means++, so as to guarantee that the output is within an $O(\log k)$ factor of the optimum [2].

For fixed $k$ , there are several known approximation schemes, typically using small coresets [8, 24, 33] There also exists a large body of bicriteria approximations (namely outputting a solution with $(1+c)k$ centers for some constant $c>0$ ): see, e.g., [7, 11, 20, 32, 39]. There has also been a long line or work on the metric facility location problem, culminating with the result of Li [35] who gave a 1.488-approximation algorithm, almost matching the lower bound of 1.463 of Guha and Khuller [26]. Note that no better bound is known for high-dimensional Euclidean facility location.

1.3 Roadmap

In Section 2, we describe some preliminary definitions. We also formally define the LMP approximation and introduce the LMP framework of Jain and Vazirani [30] and Ahmadian et al. [1]. In Section 3, we provide an overview of the new technical results we developed to obtain the improved bounds. In Section 4, we obtain a $3+2\sqrt{2}\approx 5.828$ LMP approximation for the Euclidean $k$ -means problem. In Section 5, extend our LMP approximation to a $5.912$ -approximation for standard Euclidean $k$ -means. Finally, in Section 6, we obtain a $2.395$ LMP approximation for Euclidean $k$ -median that can be extended to a $2.406$ -approximation for standard Euclidean $k$ -median.

In Appendix E, we briefly show that the result of [1] cannot be extended beyond $1+\sqrt{2}$ for Euclidean $k$ -median: this demonstrates the need of our new techniques for breaking this barrier.

2 Preliminaries

Our goal is to provide approximation algorithms for either the $k$ -means or $k$ -median problem in Euclidean space on a set $\mathcal{D}$ of clients of size $n$ . For the entirety of this paper, we consider the discrete $k$ -means and $k$ -median problems, where rather than having the $k$ centers allowed to be anywhere, we are given a fixed set of facilities $\mathcal{F}$ of size $m$ which is polynomial in $n$ , from which the $k$ centers must be chosen from. It is well-known (e.g., [40]) that a polynomial-time algorithm providing a $\rho$ -approximation for discrete $k$ -means (resp., median) implies a polynomial-time $\rho+\varepsilon$ -approximation for standard $k$ -means (resp., median) for an arbitrarily small constant $\varepsilon$ .

For two points $x,y$ in Euclidean space, we define $d(x,y)$ as the Euclidean distance (a.k.a. $\ell_{2}$ -distance) between $x$ and $y$ . In addition, to avoid redefining everything or restating identical results for both $k$ -means and $k$ -median, we define $c(x,y):=d(x,y)^{2}$ in the context of $k$ -means and $c(x,y):=d(x,y)$ in the context of $k$ -means. For a subset $S$ of Euclidean space, we define $d(x,S):=\min_{s\in S}d(x,s)$ and $c(x,S):=\min_{s\in S}c(x,s)$ .

For the $k$ -means (or $k$ -median) problem, for a subset $S\subset\mathcal{F}$ , we define $\text{cost}(\mathcal{D},S):=\sum_{j\in\mathcal{D}}c(j,S)$ . In addition, we define $\text{OPT}_{k}$ to be the optimum $k$ -means (or $k$ -median) cost for a set $\mathcal{D}$ and a set of facilities $\mathcal{F}$ , i.e., $\text{OPT}_{k}=\min_{S\subset\mathcal{F},|S|=k}\text{cost}(\mathcal{D},S)$ . Recall that a $\rho$ -approximation algorithm is an algorithm that produces a subset of $k$ facilities $S\subset\mathcal{F}$ with $\text{cost}(\mathcal{D},S)\leq\rho\cdot\text{OPT}_{k}$ in the worst-case.

2.1 The Lagrangian LP Relaxation and LMP Solutions

We first look at the standard LP formulation for $k$ -means/medians. The variables of the LP include a variable $y_{i}$ for each facility $i\in\mathcal{F}$ and a variable $x_{i,j}$ for each pair $(i,j)$ for $i\in\mathcal{F}$ and $j\in\mathcal{D}$ . The standard LP relaxation is the following:

minimize	$\displaystyle\sum_{i\in\mathcal{F},j\in\mathcal{D}}x_{i,j}\cdot c(j,i)\hskip 14.22636pt$				(1)
such that		$\displaystyle\sum_{i\in\mathcal{F}}x_{i,j}$	$\displaystyle\geq 1$	$\displaystyle\qquad\forall j\in\mathcal{D}$	(2)
$\displaystyle\quad\sum_{i\in\mathcal{F}}y_{i}$	$\displaystyle\leq k$				(3)
$\displaystyle 0\leq x_{i,j}$	$\displaystyle\leq y_{i}$	$\displaystyle\qquad\forall j\in\mathcal{D},i\in\mathcal{F}$			(4)

The intuition behind this linear program is that we can think of $x_{i,j}$ as the indicator variable of client $j$ being assigned to facility $i$ , and $y_{i}$ as the indicator variable of facility $i$ being opened. We need every facility $j\in\mathcal{D}$ to be assigned to at least one client, that at most $k$ facilities $i$ are opened, and that $x_{i,j}$ is $1$ only if $y_{i}=1$ (since clients can only be assigned to open facilities). We also ensure a nonnegativity constraint on $x_{i,j}$ and $y_{i}$ by ensuring that $0\leq x_{i,j}$ . Finally, our goal is to minimize the sum of distances (for $k$ -median) or the sum of squared distances (for $k$ -means) from each client to its closest facility – or simply the facility it is assigned to, and if exactly one of the $x_{i,j}$ values is $1$ for a fixed client $j$ and the rest are $0$ , then $\sum_{i\in\mathcal{F}}x_{i,j}c(j,i)$ is precisely the distance (or squared distance) from $j$ to its corresponding facility. By relaxing the linear program to have real variables, we can only decrease the optimum, so if we let $L$ be the optimum value of the LP relaxation, then $L\leq\text{OPT}_{k}$ .

Jain and Vazirani [30] considered the Lagrangian relaxation of this linear program, by relaxing the constraint (3) and adding a dependence on a Lagrangian parameter $\lambda\geq 0$ . By doing this, the number of facilities no longer has to be at most $k$ in the relaxed linear program but the objective function penalizes for opening more than $k$ centers. Namely, the goal becomes to minimize

\sum_{i\in\mathcal{F},j\in\mathcal{D}}x_{i,j}\cdot c(j,i)+\lambda\cdot\left(\sum_{i\in\mathcal{F}}y_{i}-k\right)

(5)

subject to Constraints (2) and (4). Indeed, for $\lambda\geq 0$ , the objective only decreases from (1) to (5) for any feasible solution to the original LP. Therefore, this new linear program, which we will call $\text{LP}(\lambda)$ , has optimum $L(\lambda)\leq L$ . Now, it is known that the Dual linear program to this Lagrangian relaxation of the original linear program can be written as the following, which has variables $\alpha=\{\alpha_{j}\}_{j\in\mathcal{D}}$ :

maximize	$\displaystyle\left(\sum_{j\in\mathcal{D}}\alpha_{j}\right)-\lambda\cdot k\hskip 14.22636pt$				(6)
such that		$\displaystyle\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)$	$\displaystyle\leq\lambda$	$\displaystyle\qquad\forall i\in\mathcal{F}$	(7)
$\displaystyle\hskip 102.43008pt\alpha$	$\displaystyle\geq 0$				(8)

We call this linear program $\text{DUAL}(\lambda)$ . Because the optimum to $\text{DUAL}(\lambda)$ equals the optimum to the primal $\text{LP}(\lambda)$ by strong duality, this means that for any $\alpha=\{\alpha_{j}\}_{j\in\mathcal{D}}$ satisfying Conditions (7) and (8), we have that $\left(\sum_{j\in\mathcal{D}}\alpha_{j}\right)-\lambda\cdot k\leq L(\lambda)\leq L\leq\text{OPT}_{k}$ .

For a fixed $\lambda$ , we say that $\alpha$ is feasible if it satisfies both (7) and (8). Thus, to provide a $\rho$ -approximation to $k$ -means (or $k$ -median), it suffices to provide both a feasible $\alpha$ and a subset $S\subset\mathcal{F}$ of size $k$ such that $\text{cost}(\mathcal{D},S)$ is at most $\rho\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda\cdot|S|\right).$

In both the work of Jain and Vazirani [30] and the work of Ahmadian et al. [1], they start with a weaker type of algorithm, called a Lagrangian Multiplier Preserving (LMP) approximation algorithm. To explain this notion, let $\text{OPT}(\lambda)$ represent the optimum (minimum) for the modified linear program $\text{LP}^{\prime}(\lambda)$ , which is the same as $\text{LP}(\lambda)$ except without the subtraction of $\lambda\cdot k$ in the objective function (5). (So, $\text{LP}^{\prime}(\lambda)$ has no dependence on $k$ ). Note that this is also the optimum (maximum) for $\text{DUAL}^{\prime}(\lambda)$ , which is the same as $\text{DUAL}(\lambda)$ except without the subtraction of $\lambda\cdot k$ in the objective function (6). Then, for some fixed $\lambda\geq 0$ , we say that a $\rho$ -approximation algorithm is LMP if it returns a solution $S\subset\mathcal{F}$ satisfying

\sum_{j\in\mathcal{D}}c(j,S)\leq\rho\cdot\left(\text{OPT}(\lambda)-\lambda\cdot|S|\right).

Indeed, if we could find a choice of $\lambda$ and an LMP $\rho$ -approximate solution $S$ with size $|S|=k$ , we would have found a $\rho$ -approximation for $k$ -means (or $k$ -median) clustering.

2.2 Witnesses and Conflict Graphs

Jain and Vazirani [30] proposed a simple primal-dual approach to create a feasible solution $\alpha$ of $\text{DUAL}(\lambda)$ with certain additional properties that are useful for providing an efficient solution to the original $k$ -median (or $k$ -means) problem efficiently. We describe it briefly as follows, based on the exposition of Ahmadian et al. [1, Subsection 3.1].

Start with $\alpha=\textbf{0}$ , i.e., $\alpha_{j}=0$ for all $j\in\mathcal{D}$ . We increase all $\alpha_{j}$ ’s continuously at a uniform rate, but stop growing each $\alpha_{j}$ once one of the following two events occurs:

1.

For some $i\in\mathcal{F}$ , a dual constraint $\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)\leq\lambda$ becomes tight (i.e., reaches equality). Once this happens, we stop growing $\alpha_{j}$ , and declare that facility $i$ is tight for all $i$ such that the constraint became equality. In addition, we will say that $i$ is the witness of $j$ .
2.

For some already tight facility $i$ , we grow $\alpha_{j}$ until $\alpha_{j}=c(j,i)$ . In this case, we also say that $i$ is the witness of $j$ .

We note that this process must eventually terminate for all $j$ (e.g., once $\alpha_{j}$ reaches $\min_{i\in\mathcal{F}}c(j,i)+\lambda$ ). This completes our creation of the dual solution $\alpha$ (it is simple to see that $\alpha$ is feasible).

For any client $j$ , we define $N(j):=\{i\in\mathcal{F}:\alpha_{j}>c(j,i)\}$ , and likewise, for any client $i$ , we define $N(i):=\{j\in\mathcal{D}:\alpha_{j}>c(j,i)\}$ . For any tight facility $i$ , we will define $t_{i}:=\max_{j\in N(i)}\alpha_{j}$ , where $t_{i}=0$ by default if $N(i)=\emptyset$ . For each client $j$ , its witness $i$ will have the useful properties that $t_{i}\leq\alpha_{j}$ and $c(j,i)\leq\alpha_{j}$ .

We have already created our dual solution $\alpha$ : to create our set of $k$ facilities, we will choose a subset of the tight facilities. First, we define the conflict graph on the set of tight facilities. Indeed, Jain and Vazirani [30], Ahmadian et al. [1], and we all have slightly different definitions: so we contrast the three.

•

[30] Here, we say that $(i,i^{\prime})$ forms an edge in the conflict graph $H$ if there exists a client $j$ such that $i,i^{\prime}\in N(j)$ (or equivalently, $\alpha_{j}\geq c(j,i)$ and $\alpha_{j}\geq c(j,i^{\prime})$ ).
•

[1] Here, we say that $(i,i^{\prime})$ forms an edge in the conflict graph $H(\delta)$ (where $\delta>0$ is some parameter) if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}})$ and there exists a client $j$ such that $i,i^{\prime}\in N(j)$ .
•

In our definition, we completely drop the condition from [30], and just say that $(i,i^{\prime})$ forms an edge in the conflict graph $H(\delta)$ if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}})$ .

It turns out that in the algorithm of Ahmadian et al. [1], the approximation factor is not affected by whether they use their definition or our definition. But in our case, it turns out that dropping the condition from [30] in fact allows us to obtain a better approximation.

To provide an LMP approximation, both Jain and Vazirani [30] and Ahmadian et al. [1] constructed a maximal independent set $I$ of the conflict graph $H$ (or $H(\delta)$ for an appropriate choice of $\delta>0$ ) and used $I=S$ as the set of centers. For Jain and Vazirani’s definition, the independent set $I$ obtains an LMP $9$ -approximation for metric $k$ -means. For Ahmadian et al.’s definition, the independent set $I$ obtains an LMP $6.1291$ -approximation for Euclidean $k$ -means if $\delta$ is chosen properly. (We note that Ahmadian et al. only proved a factor of $6.3574$ , though their argument can be improved to show a $6.1291$ -approximation factor as proven by Grandoni et al. [25]). While we will not explicitly prove it, our definition of $H(\delta)$ also obtains the same bound with the same choice of $\delta$ . For the Euclidean $k$ -median problem, using either Ahmadian et al.’s or our definition, one can obtain an LMP $(1+\sqrt{2})$ -approximation: Ahmadian et al. [1] only proved it for the weaker $1+\sqrt{8/3}$ approximation factor, but we prove the improved approximation in Subsection 6.1. We then show how to obtain a better LMP solution and then a better approximation bound.

3 Technical Overview

The algorithms of both Jain and Vazirani [30] and Ahmadian et al [1] begin by constructing an LMP approximation. Their approximation follows two phases: a growing phase and a pruning phase. In the growing phase, as described in Subsection 2.2, they grow the solution $\alpha$ starting from $\alpha=\textbf{0}$ , until they obtain a suitable dual solution $\alpha$ for $\text{DUAL}(\lambda)$ . In addition, they have a list of tight facilities $i$ , which we think of as our candidate centers. The pruning phase removes unnecessary facilities: as described in Subsection 2.2, we create a conflict graph $H(\delta)$ over the tight facilities, and only choose a maximal independent set $I$ . Hence, we are pruning out tight facilities to make sure we do not have too many nearby facilities. This way we ensure that the total number of centers is not unnecessarily large. Our main contributions are to improve the pruning phase with a new algorithm and several new geometric insights, and to show how our LMP approximation can be extended to improved algorithms for standard Euclidean $k$ -means (and $k$ -median) clustering.

Improved LMP Approximation:

To simplify the exposition, we focus on Euclidean $k$ -means. To analyze the approximation, Ahmadian et al. [1] compare the cost of each client $j$ in the final solution to its contribution to the dual objective function. The cost of a client $j$ is simply $c(j,I)$ where $I$ is our set of centers, and $j$ ’s contribution to the dual is $\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))$ , where we recall the definition of $N(j)$ from Subsection 2.2. One can show that the sum of the individual dual contributions equals the dual objective (6). By casework on the size $a=|N(j)\cap I|$ , [25] (by modifying the work of [1]) shows that $c(j,I)\leq\rho\cdot\left[\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right],$ where $\rho\approx 6.1291$ if $\delta\approx 2.1777$ is chosen appropriately (in general, we think of $\delta$ as slightly greater than $2$ ). The only bottlenecks (i.e., where the cost-dual ratio could equal $\rho$ ) are when $a\in\{0,2\}$ . Our LMP approximation improves the pruning phase by reducing the cost-dual ratio in these cases.

Due to the disjointed nature of the bottleneck cases, our first observation was that averaging between two or more independent sets may be beneficial. This way, if for some client $j$ , the first independent set $I_{1}$ had $a=0$ and the second set $I_{2}$ had $a=2$ , perhaps by taking a weighted average of $I_{1}$ and $I_{2}$ we can obtain a set $I$ where $a=1$ with reasonable probability. Hence, the expected cost-dual ratio of $j$ will be below $\rho$ . The second useful observation comes from the fact that $t_{i^{*}}\leq\alpha_{j}$ and $c(j,i^{*})\leq\alpha_{j}$ , if $i^{*}$ is the witness of $j$ . In the $a=0$ case, [1] applies this to show that $d(j,i^{*})\leq\sqrt{\alpha_{j}}$ and $d(i^{*},I)\leq\sqrt{\delta\cdot t_{i^{*}}}\leq\sqrt{\delta\cdot\alpha_{j}}$ , which follows by the definition of conflict graph. Hence, the bottleneck occurs when $i^{*}$ has distance exactly $\sqrt{\alpha_{j}}$ from $j$ , and the nearest point $i\in I$ has distance exactly $\sqrt{\delta\cdot\alpha_{j}}$ from $i^{*}$ , in the direction opposite $j$ . This causes $c(j,I)$ to be $(1+\sqrt{\delta})^{2}\cdot\alpha_{j}$ in the worst case, whereas $j$ ’s contribution to the dual is merely $\alpha_{j}$ . To reduce this ratio, we either need to make sure that the distance from $i^{*}$ to either $j$ or $i$ is smaller, or that $j,i^{*},i$ do not lie in a line in that order.

A reasonable first attempt is to select two choices $\delta_{1}\geq\delta\geq\delta_{2}$ , and consider the nested conflict graphs $H(\delta_{1})\supset H(\delta_{2})$ . We can then create nested independent sets by first creating $I_{1}$ as a maximal independent set of $H(\delta_{1})$ , then extending it to $(I_{1}\cup I_{2})\supset I_{1}$ for $H(\delta_{2})$ . Our final set $S$ will be an average of $I_{1}$ and $(I_{1}\cup I_{2})$ : we include all of $I_{1}$ and each point in $I_{2}$ with some probability. The motivation behind this attempt is that if both $N(j)\cap I_{1}$ and $N(j)\cap I_{2}$ are empty, the witness $i^{*}$ of $j$ should be adjacent in $H(\delta_{1})$ to some point $i_{1}\in I_{1}$ , and adjacent in $H(\delta_{2})$ to some point $i_{2}\in(I_{1}\cup I_{2})$ . Hence, either $i_{1}=i_{2}$ , in which case the distance from $j$ to $I_{1}$ is now only $(1+\sqrt{\delta_{2}})\cdot\sqrt{\alpha_{j}}$ instead of $(1+\sqrt{\delta_{1}})\cdot\sqrt{\alpha_{j}}$ (see Figure 1), or $i_{1}\neq i_{2}$ , in which case $i_{1},i_{2}$ must be far apart because $i_{1},i_{2}$ are both in the independent set $I_{1}\cup I_{2}$ (see Figure 1). In the latter case, we cannot have the bottleneck case for both $i_{1}$ and $i_{2}$ , as that would imply $j,i^{*},i_{2},i_{1}$ are collinear in that order with $d(i^{*},i_{2})=\sqrt{\delta_{2}\cdot\alpha_{j}}$ and $d(i^{*},i_{1})=\sqrt{\delta_{1}\cdot\alpha_{j}}$ , so $i_{2},i_{1}$ are too close. Hence, it appears that we have improved one of the main bottlenecks.

Unfortunately, we run into a new issue, if $i_{2}=j$ and no other points in $I_{1}\cup I_{2}$ were in $N(j)$ . While this case may look good because $|N(j)\cap I_{2}|=1$ which is not a bottleneck, it is only not a bottleneck because the contribution of $j$ to the clustering cost and the dual both equal $0$ in this case. So, if we condition on $i_{2}\in S$ , the cost and dual are not affected by $j$ , but if we condition on $i_{2}\not\in S$ , the cost-dual ratio of $j$ could be $(1+\sqrt{\delta_{1}})^{2}$ – hence, we have again made no improvement. While one could attempt to fix this by creating a series of nested independent sets, this approach also fails to work for the same reasons. Hence, we have a new main bottleneck case, where $i_{2}=j$ , and $j,i^{*},i_{1}$ are collinear in that order with $d(j,i^{*})=\sqrt{\alpha_{j}}$ and $d(i^{*},i_{1})=\sqrt{\delta_{1}\cdot\alpha_{j}}$ (see Figure 1).

We now explain the intuition for fixing this. In the main bottleneck case, if we could add the witness $i^{*}$ of $j$ to $S$ , this would reduce $c(j,S)$ significantly if $i_{2}\not\in S$ , yet does not affect $j$ ’s contribution to the dual. Unfortunately, adding $i^{*}$ to $S$ reduces the dual nonetheless, due to other clients. Instead, we will consider a subset of tight facilities that are close, but not too close, to exactly one tight facility in $I_{2}$ but not close to any other facilities in $I_{1}\cup I_{2}$ . In the main bottleneck cases, the witnesses $i^{*}$ precisely satisfy the condition, as may some additional points. We again prune these points by creating a conflict graph just on these vertices, and pick another maximal independent set $I_{3}$ . Finally, with some probability we will replace each point $i_{2}\in I_{2}$ with the points in $I_{3}$ that are close to $i_{2}$ . In our main bottleneck case, we will either pick $i^{*}\in I_{3}$ , or pick another point $i_{3}$ that is within distance $\sqrt{\delta_{1}\cdot\alpha_{j}}$ of $i^{*}$ , but now must be far from all points in $I_{1}\cup I_{2}$ and therefore will not have distance $(1+\sqrt{\delta_{1}})\cdot\sqrt{\alpha_{j}}$ from $j$ (see Figure 1). In addition, $i_{3}$ might be close, but is not allowed to be too close to $i_{2}=j$ , so if we replace $i_{2}$ with $i_{3}$ , we do not run into the issue of $j$ ’s contribution being $0$ for both the clustering cost and the dual solution.

Given $I_{1},I_{2},I_{3}$ , our overall procedure for generating $S$ is to include all of $I_{1}$ , and include points in $I_{2}$ and $I_{3}$ with some probability. In addition, each point $i_{3}\in I_{3}$ is close to a unique point $i_{2}\in I_{2}$ , and we anti-correlate them being in $S$ . We call the triple $(I_{1},I_{2},I_{3})$ a nested quasi-independent set, since $I_{1}$ and $I_{1}\cup I_{2}$ are independent sets and $I_{1}\cup I_{2}\cup I_{3}$ has similar properties, and these sets $I_{1},I_{1}\cup I_{2},I_{1}\cup I_{2}\cup I_{3}$ are nested.

We remark that the anti-correlation between selecting $i_{2}$ and points $i_{3}$ that are close to $i_{2}$ is reminiscent of a step of the rounding of bipoint solution of Li and Svensson [36] (there the authors combine two solutions by building a collection of stars where the center of the star belongs to a solution, while the leaves belong to another, and choose to open either the center, or the leaves at random). In this context, we will also allow our algorithm to open slightly $k+C$ centers (instead of $k$ ) for some absolute constant $C$ . Similar to Li and Svensson [36] and Byrka et al. [9], we show (Lemma 5.2) that this is without loss of generality (such an algorithm can be used to obtain an approximation algorithm opening at most $k$ centers).

The analysis of the approximation bound is very casework heavy, depending on the values of $a=|N(j)\cap I_{1}|,$ $b=|N(j)\cap I_{2}|$ , and $c=|N(j)\cap I_{3}|$ for each $j$ , and requiring many geometric insights that heavily exploit the structure of the Euclidean metric.

Figure 1: Subfigures (a) and (b) represent the cases when

N(j)\cap I_{1},N(j)\cap I_{2}

are both empty. In (a), we can bound

d(i^{*},i_{1})

\sqrt{\delta_{2}\cdot\alpha_{j}}<\sqrt{\delta_{1}\cdot\alpha_{j}}

, and in (b), we can obtain better bounds for

d(j,i_{1})

and

d(j,i_{2})

than by just the triangle inequality. Subfigure (c) represents the problem with having nested two independent sets, when

N(j)\cap I_{1}=\emptyset

but

N(j)\cap I_{2}

has a unique point

i_{2}=j

. If we replace

i_{2}=j

with the blue node

i_{3}

with some probability, the cost-to-dual ratio improves.

Finally, we remark that this method of generating a set $S$ of centers also can be used to provide an LMP approximation below $1+\sqrt{2}$ for Euclidean $k$ -median. However, due to the differing nature of what the bottleneck cases are, the choices of $\delta_{1},\delta_{2}$ , and the construction of the final set $I_{3}$ will differ. Both to obtain and break the $1+\sqrt{2}$ -approximation for Euclidean $k$ -median, one must understand how the sum of distances from a facility $j$ to its close facilities $i_{1},\dots,i_{a}\in N(j)\cap S$ relates to the pairwise distances between $i_{1},\dots,i_{a}$ . In the $k$ -means case, the squared distances make this straightfoward, but the $k$ -median case requires more geometric insights (see Lemma 6.1) to improve the $1+\sqrt{8/3}$ -approximation of Ahmadian et al. [1] to a $1+\sqrt{2}$ -approximation, even with the same algorithm.

Polynomial-Time Algorithm:

While we can successfully obtain an LMP $(3+2\sqrt{2})$ -approximation for Euclidean $k$ -means, this does not imply a $(3+2\sqrt{2})$ -approximation for the overall Euclidean $k$ -means problem, since our LMP approximation may not produce the right number of cluster centers. To address this issue, Ahmadian et al. [1] developed a method of slowly raising the parameter $\lambda$ and generating a series of dual solutions $\alpha^{(0)},\alpha^{(1)},\alpha^{(2)},\dots$ , where each pair of consecutive solutions $\alpha^{(j)}$ and $\alpha^{(j+1)}$ are sufficiently similar. The procedure of generating dual solutions $\alpha^{(j)}$ in [1] is very complicated, but will not change significantly from their result to ours. Given these dual solutions, [1] shows how to interpolate between consecutive dual solutions $\alpha^{(j)}$ and $\alpha^{(j+1)}$ , creating a series of conflict graphs where the subsequent graph removes at most one facility at a time (but occasionally adds many facilities). This property of removing at most one facility ensures that the maximal independent set decreases by at most 1 at each step (but could increase by a large number if the removed vertex allows for more points to be added). Using an intermediate-value theorem, they could ensure that at some point, there is an independent set $I$ of the conflict graph with size exactly $k$ . Hence, they apply the LMP technique to obtain the same approximation ratio.

In our setting, we are not so lucky, because we are dealing with nested independent sets (and even more complicated versions of them). Even if the size of the first part $I_{1}$ never decreases by more than $1$ at a time, $I_{1}$ could increase which could potentially make the sizes of $I_{2}$ or $I_{3}$ decrease rapidly, even if we only remove one facility from the conflict graph. To deal with this, we instead consider the first time that the expected size of $S$ drops below $k$ (where we recall that all points in $I_{1}$ are in $S$ , but points in $I_{2}$ and $I_{3}$ are inserted into our final set $S$ with some probability). Let $(I_{1},I_{2},I_{3})$ represent the set of chosen facilities right before the expected size of $S$ drops below $k$ for the first time, and $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ represent the set of chosen facilities right after.

To obtain size exactly $k$ , we show that one can always do one of the following: either (i) modify the probabilities of selecting points in $I_{2}$ and $I_{3}$ so that the size is exactly $k$ , or (ii) use submodularity properties of the $k$ -means objective function to interpolate between the set $S$ generated by $(I_{1},I_{2},I_{3})$ and the set $S^{\prime}$ generated by $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ . While $|S|\geq k$ and $|S^{\prime}|<k$ , we do not necessarily have that $S^{\prime}\subset S$ , and in fact we will interpolate between $S^{\prime}$ and $S\cup S^{\prime}$ instead by adding random points from $S\backslash S^{\prime}$ to $S^{\prime}$ . These procedures are relatively computational and require a good understanding of how the dual objective behaves as we modify the probabilities of selecting elements. In addition, because we modify the probabilities to make the size exactly $k$ and because we interpolate between $S^{\prime}$ and $S\cup S^{\prime}$ instead of $S^{\prime}$ and $S$ to obtain our final set of centers, we lose a slight approximation factor from the LMP approximation, but we still significantly improve over the old bound. In addition, this procedure works well for the Euclidean $k$ -means and $k$ -median problems.

One minor problem that we will run into is that a point $i_{2}$ may have many points in $I_{3}$ that are all “close” to it. As a result, even if the expected size of $S$ is exactly $k$ , the variance may be large because we are either selecting $i_{2}$ or all of the points that are close to $i_{2}$ . In this case, we can use the submodularity properties of the $k$ -means objective and coupling methods from probability theory to show that if $i_{2}$ has too many close points in $I_{3}$ , we could instead include $i_{2}$ and include an average number of these close points from $I_{3}$ , without increasing the expected cost.

4 LMP Approximation for Euclidean $k$ -means

In this section, we provide a $(3+2\sqrt{2})$ LMP approximation algorithm (in expectation) for the Euclidean $k$ -means problem. Our algorithm will be parameterized by four parameters, $\delta_{1},\delta_{2},\delta_{3}$ , and $p$ . We fix $\delta_{1},\delta_{2},\delta_{3}$ and allow $p$ to vary, and obtain an approximation constant $\rho(p)$ that is a function of $p$ : for appropriate choices of $p$ , $\rho(p)$ will equal $3+2\sqrt{2}$ .

In Subsection 4.1, we describe the LMP algorithm, which is based on the LMP approximation algorithms by Jain and Vazirani [30] and Ahmadian et al. [1], but using our technique of generating what we call a nested quasi-independent set. In Subsection 4.2, we analyze the approximation ratio, which spans a large amount of casework.

4.1 The algorithm and setup

Recall the conflict graph $H:=H(\delta)$ , where we define two tight facilities $(i,i^{\prime})$ to be connected if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}).$ We set parameters $\delta_{1}\geq\delta_{2}\geq 2\geq\delta_{3}$ and $0<p<1$ , and define $V_{1}$ to be the set of all tight facilities. Given the set of tight facilities $V_{1}$ and conflict graphs $H(\delta)$ over $V_{1}$ for all $\delta>0$ , our algorithm works by applying the procedure described in Algorithm 1 (on the next page) to $V_{1}$ , with parameters $\delta_{1},\delta_{2},\delta_{3}$ , and $p$ .

Algorithm 1 Generate a Nested Quasi-Independent Set of

V_{1}

, as well as a set of centers

S

providing an LMP approximation for Euclidean

k

-means

LMP( $V_{1},\{H(\delta)\},\delta_{1},\delta_{2},\delta_{3},p$ ):

1:Create a maximal independent set

I_{1}

H(\delta_{1})

2:Let

V_{2}

be the set of points in

V_{1}\backslash I_{1}

that are not adjacent to

I_{1}

H(\delta_{2})

3:Create a maximal independent set

I_{2}

of the induced subgraph

H(\delta_{2})[V_{2}]

4:Let

V_{3}

be the set of points

i

V_{2}\backslash I_{2}

such that there is exactly one point in

I_{2}

that is a neighbor of

i

H(\delta_{2})

, there are no points in

I_{1}

that are neighbors of

i

H(\delta_{2})

, and there are no points in

I_{2}

that are neighbors of

i

H(\delta_{3})

5:Create a maximal independent set

I_{3}

of the induced subgraph

H(\delta_{2})[V_{3}]

6:Note that every point

i\in I_{3}

has a unique adjacent neighbor

q(i)\in I_{2}

H(\delta_{2})

. We create the final set

S

as follows:

•

Include every point $i\in I_{1}$ .
•

For each point $i\in I_{2}$ , flip a fair coin. If the coin lands heads, include $i$ with probability $2p$ . Otherwise, include each point in $q^{-1}(i)$ independently with probability $2p$ .

We will call the triple $(I_{1},I_{2},I_{3})$ generated by Algorithm 1 a nested quasi-independent set. Although $I_{1},I_{2},I_{3}$ are disjoint, we call it a nested quasi-independent set since $I_{1}\subset I_{1}\cup I_{2}\subset I_{1}\cup I_{2}\cup I_{3}$ are nested, and $I_{1}$ is a maximal independent set for $H(\delta_{1})$ and $I_{1}\cup I_{2}$ is a maximal independent set for $H(\delta_{2})$ . While $I_{1}\cup I_{2}\cup I_{3}$ is not an independent set, it shares similar properties. As described in the technical overview (and in Algorithm 1), the LMP approximation algorithm uses $(I_{1},I_{2},I_{3})$ to create our output set of centers $S$ . $S$ contains all of $I_{1}$ and each point in $I_{2}\cup I_{3}$ with probability $p$ , but the choices of which points in $I_{2}\cup I_{3}$ are in $S$ are not fully independent.

For the dual solution $\alpha=\{\alpha_{j}\}_{j\in\mathcal{D}}$ and values $t_{i}$ for tight $i$ as generated as in Subsection 2.2, we note the following simple yet crucial facts.

Proposition 4.1.

[1] The following hold.

1.

For any client $j$ and its witness $i$ , $i$ is tight and $\alpha_{j}\geq t_{i}$ .
2.

For any client $j$ and its witness $i$ , $\alpha_{j}\geq c(j,i)$ .
3.

For any tight facility $i$ and any client $j^{\prime}\in N(i)$ , $t_{i}\geq\alpha_{j^{\prime}}$ .

These will essentially be the only facts relating witnesses and clients that we will need to use.

4.2 Main lemma

We consider a more general setup, as it will be required when converting the LMP approximation to a full polynomial-time algorithm. Let $\mathcal{V}\subset\mathcal{F}$ be a subset of facilities (for instance, this may represent the set of tight facilities) and let $\mathcal{D}$ be the full set of clients. For each $j\in\mathcal{D},$ let $\alpha_{j}\geq 0$ be some real number, and for each $i\in\mathcal{V}$ , let $t_{i}\geq 0$ be some real number. For each client $j\in\mathcal{D}$ , we associate with it a set $N(j)\subset\mathcal{V}$ . (For instance, this could be the set $N(j)=\{i\in\mathcal{V}:\alpha_{j}>c(j,i)\}$ ). In addition, suppose that for each client $j\in\mathcal{D},$ there exists a “witness” facility $w(j)\in\mathcal{V}$ . Finally, suppose that we have the following assumptions. (These assumptions will hold by Proposition 4.1 when $\alpha=\{\alpha_{j}\}$ is generated by the procedure in Subsection 2.2, $\mathcal{V}$ is the set of tight facilities, and $N(j)=\{i\in\mathcal{V}:\alpha_{j}>c(j,i)\}.$ )

1.

For any client $j\in\mathcal{D}$ , the witness $w(j)\in\mathcal{V}$ satisfies $\alpha_{j}\geq t_{w(j)}$ and $\alpha_{j}\geq c(j,w(j))$ .
2.

For any client $j\in\mathcal{D}$ and any facility $i\in N(j)$ , $t_{i}\geq\alpha_{j}>c(j,i)$ .

For the Euclidean k-means problem with the above assumptions, we will show the following:

Lemma 4.2.

Consider the set of conflict graphs $\{H(\delta)\}_{\delta>0}$ created on the vertices $\mathcal{V}$ , where $(i,i^{\prime})$ is an edge in $H(\delta)$ if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}})$ . Fix $\delta_{1}=\frac{4+8\sqrt{2}}{7}\approx 2.1877,\delta_{2}=2,\delta_{3}=6-4\sqrt{2}\approx 0.3432,$ and let $p<0.5$ be variable. Now, let $S$ be the randomized set created by applying Algorithm 1 on $V_{1}=\mathcal{V}$ . Then, for any $j\in\mathcal{D}$ ,

\mathbb{E}[c(j,S)]\leq\rho(p)\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right],

(9)

where $\rho(p)$ is some constant that only depends on $p$ (since $\delta_{1},\delta_{2},\delta_{3}$ are fixed).

Remark.

While we have not defined $\rho(p)$ , we will implicitly define it through our cases. We provide detailed visualizations of the bounds we obtain for each of our cases in Desmos (see Appendix A for the links). Importantly, we show that we can set $p$ such that $\rho(p)\leq 3+2\sqrt{2}$ .

To see why Lemma 4.2 implies an LMP approximation (in expectation), fix some $\lambda\geq 0$ . Then, we perform the procedure in Subsection 2.2 and let $\mathcal{V}$ be the set of tight facilities, $N(j)=\{i\in\mathcal{V}:\alpha_{j}>c(j,i)\}$ , and $N(i)=\{j\in\mathcal{D}:\alpha_{j}>c(j,i)\}$ . Then, by adding (9) over all $j\in\mathcal{D}$ , we have that

	$\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S)]$	$\displaystyle\leq\rho(p)\cdot\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{j\in\mathcal{D}}\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]$
		$\displaystyle=\rho(p)\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\mathbb{E}\left[\sum_{i\in S}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\right]\right)$
		$\displaystyle=\rho(p)\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda\cdot\mathbb{E}[\|S\|]\right).$

Above, the final line follows because $\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)=\lambda$ , since $i$ is assumed to be tight. Thus, we obtain an LMP approximation with approximation factor $\rho(p)$ for any choice of $p$ . Given this, we now prove Lemma 4.2.

Proof of Lemma 4.2.

We fix $j$ and do casework based on the sizes $a=|N(j)\cap I_{1}|$ , $b=|N(j)\cap I_{2}|,$ and $c=|N(j)\cap I_{3}|$ . We show that $\mathbb{E}[c(j,S)]\leq\rho(p)\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]$ for each case of $a,b,c.$ We will call $\mathbb{E}[c(j,S)]$ the numerator and $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]$ the denominator, and attempt to show this fraction is at most $\rho(p).$ By scaling all distances (and all $\alpha_{j},t_{i}$ values accordingly), we will assume WLOG that $\alpha_{j}=1$ , so $d(j,w(j)),t_{w(j)}\leq 1$ .

First, we have the following basic proposition.

Proposition 4.3.

$d(j,S)\leq d(j,I_{1})\leq 1+\sqrt{\delta_{1}}$ .

Proof.

Note that $d(j,I_{1})\leq d(j,w(j))+d(w(j),I_{1})$ . But $d(j,w(j))\leq 1$ and by properties of the independent set $I_{1}$ , there exists $i_{1}\in I_{1}$ such that $d(w(j),i_{1})\leq\sqrt{\delta_{1}\cdot\min(t_{w(j)},t_{i_{1}})}\leq\sqrt{\delta_{1}}$ , since $t_{w(j)}\leq 1$ . So, $d(j,I_{1})\leq 1+\sqrt{\delta_{1}}$ , as desired. ∎

In addition, we have the following simple proposition about Euclidean space, the proof of which is deferred to Appendix B.

Proposition 4.4.

Suppose that we have $4$ points $A,B,C,D$ in Euclidean space and parameters $\nu_{1},\nu_{2},\nu_{3},\sigma_{1},\sigma_{2},\sigma_{3}\geq 0$ such that $d(A,B)^{2}\leq 1$ , $d(B,C)^{2}\leq\nu_{1}\cdot\min(\sigma_{1},\sigma_{2})$ , $d(B,D)^{2}\leq\nu_{2}\cdot\min(\sigma_{1},\sigma_{3})$ , and $d(C,D)^{2}\geq\nu_{3}\cdot\min(\sigma_{2},\sigma_{3})$ . Moreover, assume that $\sigma_{1}\leq 1$ and that $\nu_{1},\nu_{2}\geq\nu_{3}$ . Then,

p\cdot\|C-A\|_{2}^{2}+(1-p)\cdot\|D-A\|_{2}^{2}\leq 1+p\cdot\nu_{1}+(1-p)\cdot\nu_{2}+2\sqrt{p\cdot\nu_{1}+(1-p)\cdot\nu_{2}-p(1-p)\cdot\nu_{3}}.

Finally, we will also make frequent use of the well-known fact that for any set of $h\geq 1$ points $I=\{i_{1},\dots,i_{h}\}$ and any $j$ in Euclidean space, $\frac{1}{2h}\sum_{i,i^{\prime}\in I}\|i-i^{\prime}\|_{2}^{2}\leq\sum_{i\in I}\|i-j\|_{2}^{2}.$ As a direct result of this, if $c(j,i)\leq 1$ for all $i\in I$ but $c(i,i^{\prime})\geq 2$ for all $i\neq i^{\prime}\in I$ , then $\sum_{i\in I}(1-c(j,i))\leq 1$ .

We are now ready to investigate the several cases needed to prove Lemma 4.2.

Case 1: $\boldsymbol{a=0,b=1,c=0}$ .

Let $i_{2}$ be the unique point in $I_{2}\cap N(j),$ and let $i^{*}=w(j)$ be the witness of $j$ . Recall that $i^{*}$ is tight, so $i^{*}\in\mathcal{V}_{1}$ . Note that $d(j,i^{*})\leq 1$ and $t_{i^{*}}\leq 1$ .

There are numerous sub-cases to consider, which we enumerate.

$\boldsymbol{i^{*}\not\in V_{2}}$ . In this case, either $i^{*}\in I_{1}$ so $d(i^{*},I_{1})=0$ , or there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})}\leq\sqrt{\delta_{2}}$ . So, $d(j,I_{1})\leq 1+\sqrt{\delta_{2}}$ . In addition, we have that $i_{2}\in S$ with probability $p$ . So, if we let $t:=d(j,i_{2})$ , we can bound the fraction by

\frac{p\cdot t^{2}+(1-p)\cdot(1+\sqrt{\delta_{2}})^{2}}{1-p(1-t^{2})}=\frac{p\cdot t^{2}+(1-p)\cdot(1+\sqrt{\delta_{2}})^{2}}{p\cdot t^{2}+(1-p)}.

Note that $0\leq t<1$ since $i_{2}\in N(j),$ and the above fraction is maximized for $t=0$ , in which case we get that the fraction is at most

(1+\sqrt{\delta_{2}})^{2}.

(1.a)

b)

$\boldsymbol{i^{*}\in V_{3}}.$ In this case, there exists $i_{3}\in I_{3}$ (possibly $i_{3}=i^{*}$ ) such that $d(i^{*},i_{3})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{3}})}.$ In addition, there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\sqrt{\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})}$ . Finally, since $I_{3}\subset V_{2}$ , we must have that $d(i_{1},i_{3})\geq\sqrt{\delta_{2}\cdot\min(t_{i_{1}},t_{i_{3}})}$ . If we condition on $i_{2}\in S$ , then the numerator and denominator both equal $c(j,i_{2})$ , so the fraction is $1$ (or $0/0$ ). Else, if we condition on $i_{2}\not\in S$ , then the denominator is $1$ , and the numerator is at most $p\cdot\|i_{3}-j\|_{2}^{2}+(1-p)\cdot\|i_{1}-j\|_{2}^{2}$ , since $i_{1}\in S$ always, and either $q(i_{3})\neq i_{2}$ , in which case $\mathbb{P}(i_{3}\in S|i_{2}\not\in S)=p$ , or $q(i_{3})=i_{2}$ , in which case $\mathbb{P}(i_{3}\in S|i_{2}\not\in S)=\frac{p}{1-p}\geq p$ .

Note that $d(j,i^{*})\leq 1$ , that $t_{i^{*}}\leq 1$ , and that $\delta_{2},\delta_{1}\geq\delta_{2}$ . So, we may apply Proposition 4.4 with $A=j,B=i^{*},C=i_{3},D=i_{1}$ and $\nu_{1}=\nu_{3}=\delta_{2},\nu_{2}=\delta_{1},\sigma_{1}=t_{i^{*}},\sigma_{2}=t_{i_{3}},\sigma_{3}=t_{i_{1}}$ to bound numerator (and thus the overall fraction since the denominator equals $1$ ) by

$1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}}.$ (1.b)

In the remaining cases, we may assume that $i^{*}\in V_{2}\backslash V_{3}$ . Then, one of the following must occur:

$\boldsymbol{i^{*}=i_{2}}$ . In this case, define $t=d(j,i^{*})\in[0,1]$ , and note that $d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq t+\sqrt{\delta_{1}}$ . So, with probability $p$ , we have that $d(j,S)\leq d(j,i^{*})=t$ , and otherwise, we have that $d(j,S)\leq d(j,I_{1})=t+\sqrt{\delta_{1}}$ . So, we can bound the ratio by

\max_{0\leq t\leq 1}\frac{p\cdot t^{2}+(1-p)\cdot(t+\sqrt{\delta_{1}})^{2}}{1-p\cdot(1-t^{2})}\leq\max\left((\sqrt{0.75}+\sqrt{\delta_{1}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+3p/4}{1-p/4}\right).

(1.c)

We prove this final inequality in Appendix B.

d)

$\boldsymbol{i^{*}\in I_{2}}$ but $\boldsymbol{i^{*}\neq i_{2}}$ . First, we recall that $d(j,i^{*})\leq 1$ . Now, let $t=d(j,i_{2})$ . In this case, with probability $p$ , $d(j,S)\leq t$ (if we select $i_{2}$ to be in $S$ ), with probability $p(1-p)$ , $d(j,S)\leq 1$ (if we select $i^{*}$ but not $i_{2}$ to be in $S$ ), and in the remaining event of $(1-p)^{2}$ probability, we still have that $d(j,S)\leq d(j,I_{1})\leq 1+\sqrt{\delta_{1}}$ by Proposition 4.3. So, we can bound the ratio by

$\max_{0\leq t\leq 1}\frac{p\cdot t^{2}+p(1-p)\cdot 1+(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}}{1-p\cdot(1-t^{2})}.$

Note that this is maximized when $t=0$ (since the numerator and denominator increase at the same rate when $t$ increases), so we can bound the ratio by

$\frac{p(1-p)+(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}}{1-p}=p+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}.$ (1.d)
e)

There is more than one neighbor of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{2})}$ that is in $\boldsymbol{I_{2}}$ . In this case, there is some other point $i_{2}^{\prime}\in I_{2}$ not in $N(j)$ such that $d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})}.$ So, we have four points $j,i^{*},i_{1}\in I_{1},i_{2}^{\prime}\in I_{2}$ such that $d(j,i^{*})\leq 1,$ $d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})},$ $d(i^{*},i_{1})\leq\sqrt{\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})},$ and $d(i_{1},i_{2}^{\prime})\geq\sqrt{\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}^{\prime}})}.$

If we condition on $i_{2}\in S$ , then the denominator equals $c(j,i_{2})$ and the numerator is at most $c(j,i_{2})$ , so the fraction is $1$ (or $0/0$ ). Else, if we condition on $i_{2}\not\in S$ , then the denominator is $1$ , and the numerator is at most $p\cdot\|i_{2}^{\prime}-j\|_{2}^{2}+(1-p)\cdot\|i_{1}-j\|_{2}^{2}$ . Note that $d(j,i^{*})\leq 1$ , that $t_{i^{*}}\leq 1$ , and that $\delta_{2},\delta_{1}\geq\delta_{2}$ . So, as in 1.b, we may apply Proposition 4.4 to bound the ratio by

$1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}}.$ (1.e)
f)

There are no neighbors of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{2})}$ that are in $\boldsymbol{I_{2}}$ . In this case, we would actually have that $i^{*}\in I_{2},$ because we defined $I_{2}$ to be a maximal independent set in the induced subgraph $H(\delta_{2})[V_{2}].$ So, if there were no such neighbors and $i^{*}\in V_{2}$ , then we could add $i^{*}$ to $I_{2}$ , contradicting the maximality of $I_{2}$ . Having $i^{*}\in I_{2}$ was already covered by subcases c) and d).

There is a neighbor of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{3})}$ that is also in $\boldsymbol{I_{2}}$ , which means that either $d(i^{*},i_{2})\leq\sqrt{\delta_{3}\cdot t_{i^{*}}}$ so $d(i_{2},j)\geq\max(0,d(j,i^{*})-\sqrt{\delta_{3}\cdot t_{i^{*}}})$ , or there is some other point $i_{2}^{\prime}\in I_{2}$ not in $N(j)$ such that $d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})}.$ If $d(i^{*},i_{2})\leq\sqrt{\delta_{3}\cdot t_{i^{*}}}$ , then define $t=t_{i^{*}}$ and $u=d(j,i^{*})$ . In this case, $d(j,I_{1})\leq u+\sqrt{\delta_{1}\cdot t},$ and $d(j,i_{2})\geq\max(0,u-\sqrt{\delta_{3}\cdot t}).$ Since $t=t_{i^{*}}\leq 1$ and $u=d(j,i^{*})\leq 1$ , we can bound the overall fraction as at most

	$\displaystyle\hskip 14.22636pt\max_{0\leq t\leq 1}\max_{0\leq u\leq 1}\max_{d(j,i_{2})\geq\max(0,u-\sqrt{\delta_{3}\cdot t})}\frac{(1-p)\cdot(u+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot d(j,i_{2})^{2}}{1-p+p\cdot d(j,i_{2})^{2}}$
	$\displaystyle\leq\max\left((\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p\cdot(1-\sqrt{\delta_{3}})^{2}}\right).$		(1.g.i)

We derive the final inequality in Appendix B.

Alternatively, if $d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})},$ then if we condition on $i_{2}\in S,$ the fraction is $1$ (or $0/0$ ), and if we condition on $i_{2}\not\in S$ , the denominator is $1$ and the numerator is at most $p\cdot d(j,i_{2}^{\prime})^{2}+(1-p)\cdot d(j,i_{1})^{2}\leq p\cdot(1+\sqrt{\delta_{3}})^{2}+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}.$ (Note that $i_{2}\in S$ and $i_{2}^{\prime}\in S$ are independent.) Therefore, we can also bound the overall fraction by

p\cdot(1+\sqrt{\delta_{3}})^{2}+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}.

(1.g.ii)

h)

There is a neighbor of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{2})}$ that is also in $\boldsymbol{I_{1}}$ . In this case, $i^{*}$ would not be in $V_{2}$ , so we are back to sub-case 1.a.

Case 2: $\boldsymbol{a=0,b=1,c\geq 1}$ .

Let $i_{2}$ be the unique point in $N(j)\cap I_{2},$ and let $i_{3}^{(1)},\dots,i_{3}^{(c)}$ represent the points in $N(j)\cap I_{3}$ . Let $c_{1}$ be the number of points in $N(j)\cap I_{3}$ that are in $q^{-1}(i_{2})$ , and let $c_{2}=c-c_{1}$ be the number of points in $N(j)\cap I_{3}$ not in $q^{-1}(i_{2}).$ We will have four subcases. For simplicity, in this case we keep $\delta_{2}=2.$

Before delving into the subcases, we first prove the following propositions regarding the probability of some point in $I_{3}$ being selected.

Proposition 4.5.

Let $c=|N(j)\cap I_{3}|$ . Then, the probability that no point in $N(j)\cap I_{3}$ is in $S$ is at most $\frac{1}{2}+\frac{1}{2}(1-2p)^{c}$ .

Proof.

First, note that $\left(\frac{1}{2}+\frac{1}{2}x\right)\cdot\left(\frac{1}{2}+\frac{1}{2}y\right)+\left(\frac{1}{2}-\frac{1}{2}x\right)\cdot\left(\frac{1}{2}-\frac{1}{2}y\right)=\frac{1}{2}+\frac{1}{2}xy$ . Therefore, if $0\leq x,y\leq 1$ , then $\left(\frac{1}{2}+\frac{1}{2}x\right)\cdot\left(\frac{1}{2}+\frac{1}{2}y\right)\leq\frac{1}{2}+\frac{1}{2}xy.$ In general, through induction we have that for any $0\leq x_{1},\dots,x_{r}\leq 1,$ that $\prod_{s=1}^{r}\left(\frac{1}{2}+\frac{1}{2}x_{s}\right)\leq\frac{1}{2}+\frac{1}{2}x_{1}\cdots x_{r}$ .

Now, group the points $i_{3}^{(1)},\dots,i_{3}^{(c)}\in N(j)\cap I_{3}$ into $r\leq c$ groups of sizes $c_{1},\dots,c_{r}$ , where each group is points that map to the same point in $I_{2}$ under $q$ . Then, for each group $s$ , the probability that no point in the group is in $S$ is precisely $\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{s}}$ , because with probability $\frac{1}{2}$ we will only consider picking the point $q(i)$ (for $i$ in group $s$ ), and otherwise each point in the group will still not be in $S$ with probability $1-2p$ . So, the overall probability is

\prod_{s=1}^{r}\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{s}}\right)\leq\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{1}+\cdots+c_{r}}=\frac{1}{2}+\frac{1}{2}(1-2p)^{c}.\qed

We also note the following related proposition.

Proposition 4.6.

Let $c=|N(j)\cap I_{3}|$ , and $i^{\prime}=q(i)$ for some arbitrary $i\in N(j)\cap I_{3}$ . Then, the probability that no point in $N(j)\cap I_{3}$ nor $i^{\prime}$ is in $S$ is at most $\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}$ .

Proof.

Similar to the previous proposition, we group the points $i_{3}^{(1)},\dots,i_{3}^{(c)}\in N(j)\cap I_{3}$ into $r\leq c$ groups of sizes $c_{1},\dots,c_{r}$ . Assume WLOG that $i$ is in the first group. Then, the probability that no point in the first group nor $i^{\prime}$ is in $S$ is $\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c_{1}}=(1-2p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{1}-1}\right)$ . So, the overall probability is

	$\displaystyle(1-2p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{1}-1}\right)\cdot\prod_{s=2}^{r}\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{s}}\right)$	$\displaystyle\leq(1-2p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{(c_{1}-1)+c_{2}+\cdots+c_{r}}\right)$
		$\displaystyle=\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}.\qed$

$\boldsymbol{c_{1}=0}$ . In this case, we have that no pair of points in $N(j)\cap(I_{2}\cup I_{3})$ are connected in $H(\delta_{2})$ , which means that they have pairwise distances at least $\sqrt{\delta_{2}}$ from each other (since $t_{i}\geq 1$ if $i\in N(j)$ ). So, $\sum_{i\in N(j)\cap(I_{2}\cup I_{3})}c(j,i)\geq\frac{1}{2(c+1)}\cdot c(c+1)(\sqrt{\delta_{2}})^{2}=c$ since $|N(j)\cap(I_{2}\cup I_{3})|=c+1$ and $\delta_{2}=2$ . Consequently, $\sum_{i\in N(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))\leq(c+1)-c=1$ . Therefore, the denominator is at least $1-p$ . To bound the numerator, we note that the probability of none of the points in $I_{2}\cup I_{3}$ being in $S$ is at most $(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)$ . This is because $i_{2}\not\in S$ with probability $1-p$ , no point in $N(j)\cap I_{3}$ is in $S$ with probability at most $\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}$ by Proposition 4.5, and these two events are independent since $c_{1}=0$ . If some point in $N(j)\cap(I_{2}\cup I_{3})$ is in $S$ , then $c(j,S)\leq 1$ , and otherwise, $c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}$ . Therefore, we can bound the numerator as at most $(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\right).$ Overall, we have that the fraction is at most

\frac{(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\right)}{1-p}.

(2.a)

$\boldsymbol{c_{2}=0}$ and $\boldsymbol{c\geq 2}$ . In this case, we have that $q(i_{3}^{(k)})=i_{2}$ for all $k\in[c]$ (note that $c=c_{1}$ ). Since the points $i_{3}^{(k)}$ for all $k\in[c]$ have pairwise distance at least $\sqrt{2}$ from each other, we have that $\sum_{i\in N(j)\cap I_{3}}(1-c(j,i))\leq 1$ . Letting $t$ be such that $1-t=\sum_{i\in N(j)\cap I_{3}}(1-c(j,i)),$ we have that $\mathbb{E}_{i\sim N(j)\cap I_{3}}c(j,i)=\frac{c-1}{c}+\frac{t}{c}$ . In addition, let $u=d(j,i_{2}).$ In this case, the denominator of our fraction is $1-p(1-u^{2})-p(1-t)=p\cdot u^{2}+p\cdot t+(1-2p).$ To bound the numerator, with probability $p$ we have that $i_{2}\in S$ , in which case $c(j,S)\leq u^{2}$ . In addition, there is a disjoint $\frac{1}{2}\cdot\left(1-(1-2p)^{c}\right)$ -probability event where some $i_{3}^{(k)}\in S$ , and conditioned on this event, $\mathbb{E}[c(j,S)]\leq\frac{c-1}{c}+\frac{t}{c}$ . Otherwise, we still have that $c(j,S)\leq c(j,I_{1})\leq(1+\sqrt{\delta_{1}})^{2}$ . So, overall we have that the fraction is at most

\frac{p\cdot u^{2}+\frac{1}{2}\cdot\left(1-(1-2p)^{c}\right)\cdot\left(\frac{c-1}{c}+\frac{t}{c}\right)+\left(1-p-\frac{1}{2}\cdot\left(1-(1-2p)^{c}\right)\right)\cdot(1+\sqrt{\delta_{1}})^{2}}{p\cdot u^{2}+p\cdot t+(1-2p)}.

This function clearly decreases as $u$ increases (since the numerator and denominator increase at the same rate). In addition, since $\frac{1}{2}\cdot(1-(1-2p)^{c})\cdot\frac{1}{c}<p$ (whenever $0<p<\frac{1}{2}$ ), we have that the numerator increases at a slower rate than the denominator when $t$ increases, so this function also decreases as $t$ increases. So, we may assume that $t=u=0$ to get that the fraction is at most

\frac{\frac{1}{2}\cdot(1-(1-2p)^{c})\cdot\frac{c-1}{c}+(1-p-\frac{1}{2}\cdot(1-(1-2p)^{c}))\cdot(1+\sqrt{\delta_{1}})^{2}}{1-2p}.

(2.b)

$\boldsymbol{c_{1},c_{2}\geq 1}$ . In this case, we have that there exists a point $i_{3}^{(k)}\in N(j)\cap I_{3}$ not in $q^{-1}(i_{2})$ , so it has distance at least $\sqrt{\delta_{2}}$ from $i_{2}$ . Therefore, by the triangle inequality, $d(i_{2},j)\geq\sqrt{\delta_{2}}-1.$ Let $t=d(i_{2},j)$ . Next, since all of the points in $N(j)\cap I_{3}$ have pairwise distance at least $\sqrt{2}$ from each other, $\sum_{i\in N(j)\cap I_{3}}(1-c(j,i))\leq 1$ . Therefore, the denominator of the fraction is at most $1-p(1-t^{2})-p\leq 1-2p+t^{2}p$ .

We now bound the numerator. First, by Proposition 4.6 and since $c\geq 2$ , the probability that no point in $N(j)\cap(I_{2}\cup I_{3})$ is in $S$ is some $p^{\prime}$ , where $p^{\prime}\leq\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\leq(1-2p)(1-p)$ . In this event, we have that $c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}$ . In addition, there is a $p$ probability that $i_{2}\in S$ , in which case $c(j,S)\leq t^{2}$ . Finally, with $1-p-p^{\prime}$ probability, we have that $i_{2}\not\in S$ but there is some $i_{3}^{(k)}\in S$ , so $c(j,S)\leq 1$ . Overall, the fraction is at most

\frac{p\cdot t^{2}+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-p-p^{\prime})}{1-2p+t^{2}p}.

This function clearly decreases as $t$ increases (since the numerator and denominator increase at the same rate), so we may assume that $t=\sqrt{2}-1$ as this is our lower bound on $t$ . So, the fraction is at most

\frac{p\cdot(\sqrt{2}-1)^{2}+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-p-p^{\prime})}{1-2p+(\sqrt{2}-1)^{2}p}\leq\frac{1-p(2\sqrt{2}-2)+(1-2p)(1-p)\cdot\left((1+\sqrt{\delta_{1}})^{2}-1\right)}{1-p(2\sqrt{2}-1)}.

(2.c)

$\boldsymbol{c_{1}=1}$ and $\boldsymbol{c_{2}=0}$ . In this case, $c=1$ , so we simply write $i_{3}$ as the unique point in $I_{3}\cap N(j)$ . Let $i^{*}=w(j)$ be the witness of $j$ . Since $c_{1}=1$ , this means that $i_{2},i_{3}$ are neighbors in $H(\delta_{2})$ and $q(i_{3})=i_{2}$ . Finally, we have that $i_{2},i_{3}$ are not connected in $H(\delta_{3})$ . So, $d(i_{2},i_{3})\geq\sqrt{\delta_{3}\cdot\min(t_{i_{2}},t_{i_{3}})}.$ Now, note that since $i_{2},i_{3}$ are not in $I_{1}$ , we either have that the witness $i^{*}$ is in $I_{1}$ , in which case $d(j,I_{1})=1$ , or all of $i_{2},i_{3},i^{*}$ are adjacent to $I_{1}$ in $H(\delta_{1})$ since $I_{1}$ was a maximal independent set in $H(\delta_{1})$ . Therefore, if we define $\beta=d(j,i_{2})$ and $\gamma=d(j,i_{3})$ , this means that $d(j,I_{1})\leq\min\left(1+\sqrt{\delta_{1}},\beta+\sqrt{\delta_{1}\cdot t_{i_{2}}},\gamma+\sqrt{\delta_{1}\cdot t_{i_{3}}}\right),$ and $d(i_{2},i_{3})\leq\beta+\gamma$ by triangle inequality.

Note that the denominator equals $1-p(1-\beta^{2})-p(1-\gamma^{2})$ . To bound the numerator, note that with probability $p$ , $i_{2}\in S$ in which case $d(j,S)\leq\beta$ and with probability $p$ , $i_{3}\in S$ in which case $d(j,S)\leq\gamma$ . Also, these two events are disjoint since $q(i_{3})=i_{2}$ . Finally, in the remaining $1-2p$ probability event, $d(j,S)\leq d(j,I_{1})\leq\min(1+\sqrt{\delta_{1}},\beta+\sqrt{\delta_{1}\cdot t_{i_{2}}},\gamma+\sqrt{\delta_{1}\cdot t_{i_{3}}})$ .

Letting $t=\min(t_{i_{2}},t_{i_{3}})\geq 1$ , we have that $\min\left(\beta+\sqrt{\delta_{1}\cdot t_{i_{2}}},\gamma+\sqrt{\delta_{1}\cdot t_{i_{3}}}\right)\leq\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t}$ . We also know that $\sqrt{\delta_{3}\cdot t}\leq d(i_{2},i_{3})\leq\beta+\gamma$ . Therefore, we can bound the ratio by

	$\displaystyle\hskip 14.22636pt\max_{\begin{subarray}{c}t\geq 1\\ \beta+\gamma\geq\sqrt{\delta_{3}\cdot t}\end{subarray}}\frac{(1-2p)\cdot\min(1+\sqrt{\delta_{1}},\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot\beta^{2}+p\cdot\gamma^{2}}{1-p(1-\beta^{2})-p(1-\gamma^{2})}$
	$\displaystyle=\frac{\left(1-2p\right)\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\left(\delta_{1}+\left(\sqrt{\delta_{1}}+\sqrt{\delta_{3}}\right)^{2}\right)+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}{\left(1-2p\right)\cdot\left(\delta_{1}+\left(\sqrt{\delta_{1}}+\sqrt{\delta_{3}}\right)^{2}\right)+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}.$		(2.d)

We prove the final equality in Appendix B.

Case 3: $\boldsymbol{a=0,b\geq 2}$ .

We split this case into three cases. First, we recall that each point $i\in I_{3}$ corresponds to some point $q(i)\in I_{2}$ . Let $c_{1}$ represent the number of points such $i\in N(j)\cap I_{3}$ such that $q(i)\in N(j)\cap I_{2}$ , and let $c_{2}=c-c_{1}$ . Note that if $c=0$ , then $c_{1}=c_{2}=0$ .

a)

$\boldsymbol{c_{1}=0}$ . In this case, all of the points in $N(j)\cap(I_{2}\cup I_{3})$ must not be connected in $H(\delta_{2})$ . Therefore, $\sum_{i\in N(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))\leq 1$ and since $a=0$ , this means that the denominator is at least $1-p$ in expectation. In addition, since $b\geq 2,$ the probability of no point in $N(j)\cap I_{2}$ being in $S$ is at most $(1-p)^{2}$ , in which case $d(j,S)\leq(1+\sqrt{\delta_{1}})^{2}$ . If there is some point in $N(j)\cap I_{2}$ in $S$ , then $d(j,S)\leq 1.$ Therefore, the numerator is at most $(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-(1-p)^{2}).$ So, we can bound the fraction by

$\frac{(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-(1-p)^{2})}{1-p}.$ (3.a)

$\boldsymbol{c_{1}=1,c_{2}=0}$ . In this case, the probability of there being a point in $N(j)\cap(I_{2}\cup I_{3})$ that is part of $S$ is $(1-2p)\cdot(1-p)^{b-1}$ . Conditioned on this event, $c(j,S)\leq 1$ , and otherwise, $c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}$ . So, the numerator is at most $(1-2p)\cdot(1-p)^{b-1}\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-2p)\cdot(1-p)^{b-1}\right)\cdot 1.$ Finally, we have that all of the points in $N(j)\cap(I_{2}\cup I_{3})$ are separated by at least $\sqrt{2}$ , except for the unique point $i_{3}\in N(j)\cap I_{3}$ and $q(i_{3})$ , which are separated by at least $\sqrt{\delta_{3}}$ . So, $\sum_{N(j)\cap(I_{2}\cup I_{3})}c(j,i)\geq\frac{1}{b+1}\cdot\left(\frac{b(b+1)}{2}\cdot 2+(\delta_{3}-2)\right)=b+\frac{\delta_{3}-2}{b+1},$ which means that $\sum_{N(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))\leq 1+\frac{2-\delta_{3}}{b+1}.$ Therefore, the denominator is at least $1-p\cdot\left(1+\frac{2-\delta_{3}}{b+1}\right).$ Overall, the fraction is at most

\frac{(1-p)^{b-1}\cdot(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-p)^{b-1}(1-2p)\right)}{1-\left(1+\frac{2-\delta_{3}}{b+1}\right)\cdot p}

(3.b)

$\boldsymbol{c_{1}\geq 2}$ or $\boldsymbol{c_{1}=1,c_{2}\geq 1}$ . In this case, we first note that since all of the points in $N(j)\cap I_{2}$ have distance at least $\sqrt{2}$ from each other and all of the points in $N(j)\cap I_{3}$ have distance at least $\sqrt{2}$ from each other, both $\sum_{i\in N(j)\cap I_{2}}(1-c(j,i))$ and $\sum_{i\in N(j)\cap I_{3}}(1-c(j,i))$ are at most $1$ . Let $t$ be such that $1-t=\sum_{i\in N(j)\cap I_{2}}(1-c(j,i)).$ Then, the denominator is at least $1-p(2-t)$ . In addition, with probability $1-(1-p)^{b}$ , at least one of the points in $N(j)\cap I_{2}$ will be in $S$ , conditioned on which the expected value of $c(j,S)$ is at most $\frac{1}{b}\cdot\sum_{i\in N(j)\cap I_{2}}c(j,i)=\frac{b-1}{b}+\frac{t}{b}.$ Next, note that the probability of no point in $N(j)\cap(I_{2}\cup I_{3})$ being in $S$ is maximized when all points $i\in N(j)\cap I_{3}$ with $q(i)\in N(j)$ map to a single point $i_{2}\in N(j)\cap I_{2}$ , and all other points in $N(j)\cap I_{3}$ map to a single point $i_{2}^{\prime}\in I_{2}\backslash N(j)$ . In this case, the probability that no point in $N(j)\cap(I_{2}\cup I_{3})$ is in $S$ is at most $p^{\prime}:=(1-p)^{b-1}\cdot\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c_{1}}\right)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right).$

Overall, we have that with probability $1-(1-p)^{b}$ , some point in $I_{2}\cap N(j)$ is in $S$ , conditioned on which $\mathbb{E}[c(j,S)]\leq\frac{b-1}{b}+\frac{t}{b}$ , with probability at most $p^{\prime},$ no point in $N(j)\cap(I_{2}\cup I_{3})$ is in $S$ , conditioned on which $c(j,S)\leq(1+\sqrt{\delta_{1}})^{2},$ and otherwise, some point in $N(j)\cap I_{3}$ is in $S$ , which means $c(j,S)\leq 1.$ So, we can bound this fraction overall by

\frac{(1-(1-p)^{b})\cdot\left(\frac{b-1}{b}+\frac{t}{b}\right)+p^{\prime}\cdot(\sqrt{\delta_{1}})^{2}+((1-p)^{b}-p^{\prime})\cdot 1}{1-2p+p\cdot t}.

Noting that $(1-(1-p)^{b})\cdot\frac{1}{b}\leq p$ , we have that the numerator increases at a slower rate than the denominator as $t$ increases. Therefore, this fraction is maximized when $t=0$ . So, we can bound the fraction by

		$\displaystyle\hskip 14.22636pt\frac{(1-(1-p)^{b})\cdot\frac{b-1}{b}+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+((1-p)^{b}-p^{\prime})}{1-2p}$
		$\displaystyle=\frac{\frac{b-1}{b}+\frac{(1-p)^{b}}{b}+(1-p)^{b-1}\cdot\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c_{1}}\right)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\cdot\left((1+\sqrt{\delta_{1}})^{2}-1\right)}{1-2p}.$		(3.c)

Case 4: $\boldsymbol{a=0,b=0}$ .

We split this case into three subcases.

a)

$\boldsymbol{c=0}.$ In this case, $N(j)\cap S$ is always empty, so the denominator is $1$ . To bound the numerator, we consider the witness $i^{*}$ of $j$ . If $i^{*}\in I_{1}$ , the numerator is at most $c(j,i^{*})\leq 1$ so we can bound the fraction by $1$ . Else, if $i^{*}\not\in V_{2},$ then there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}})}\leq\sqrt{\delta_{2}}$ , and since $d(j,i^{*})\leq 1,$ we have that $d(j,I_{1})\leq 1+\sqrt{\delta_{2}})$ . Thus, the fraction in the case where $i\in I_{1}$ or $i^{*}\not\in V_{2}$ is at most

$(1+\sqrt{\delta_{2}})^{2}.$ (4.a.i)

Otherwise, there is some $i_{1}\in I_{1}$ of distance at most $\sqrt{\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})}\leq\sqrt{\delta_{1}}$ away from $i^{*}$ . Next, if $i^{*}\in I_{2},$ the numerator is at most $p\cdot 1+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}$ . Otherwise, there is some $i_{2}\in I_{2}$ of distance at most $\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}})}$ away from $i^{*}$ . Finally, $d(i_{1},i_{2})\geq\sqrt{\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}})}$ . Therefore, as in 1.b, we can apply Proposition 4.4 to obtain that the numerator, and therefore, the fraction is at most

$1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}},$ (4.a.ii)

since the above (4.a.ii) is greater than both $1$ and $p+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}$ for any $0<p<1.$
b)

$\boldsymbol{c=1}.$ In this case, let $i_{3}$ be the unique element in $N(j)\cap I_{3}$ . Conditioned in $i_{3}\in S$ , the denominator equals $c(j,i_{3})$ and the numerator is at most $c(j,i_{3})$ . Otherwise, the denominator is $1$ , and the numerator can again be bounded in an identical way to the previous case, since the probability of $i_{2}\in S$ is either $p$ (if $q(i_{3})\neq i_{2}$ ) or $\frac{p}{1-p}>p$ (if $q(i_{3})=i_{2}$ ). Therefore, the fraction is again at most

$(1+\sqrt{\delta_{2}})^{2}$ (4.b.i)

if the witness $i^{*}$ of $j$ satisfies $i^{*}\in I_{1}$ or $i^{*}\not\in V_{2}$ , and is at most

$1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}}$ (4.b.ii)

otherwise.

$\boldsymbol{c\geq 2}$ . In this case, note that all points in $N(j)\cap I_{3}$ are separated by at least $\sqrt{2}$ , which means that $\sum_{i\in N(j)\cap I_{3}}(1-c(j,i))\leq 1$ . Letting $t$ be such that $1-t=\sum_{i\in N(j)\cap I_{3}}(1-c(j,i)),$ we have that $\sum_{i\in N(j)\cap I_{3}}c(j,i)=c-1+t$ , so there exists $i_{3}\in N(j)\cap I_{3}$ such that $c(j,i_{3})\leq\frac{c-1+t}{c}$ . In addition, we know that $c(j,I_{1})\leq(1+\sqrt{\delta_{1}})^{2}$ . Finally, we also note the denominator equals $1-p(1-t)=1-p+pt$ .

Next, note that

	$\displaystyle\sum_{i,i^{\prime}\in N(j)\cap I_{3}}c(i,i^{\prime})$	$\displaystyle=\sum_{i,i^{\prime}\in N(j)\cap I_{3}}\left[\\|i-j\\|^{2}+\\|i^{\prime}-j\\|^{2}-2\langle i-j,i^{\prime}-j\rangle\right]$
		$\displaystyle=2c\cdot\left[\sum_{i\in N(j)\cap I_{3}}c(j,i)\right]-2\left\langle\sum_{i\in N(j)\cap I_{3}}(i-j),\sum_{i^{\prime}\in N(j)\cap I_{3}}(i^{\prime}-j)\right\rangle$
		$\displaystyle\leq 2c\cdot\left[\sum_{i\in N(j)\cap I_{3}}c(j,i)\right]$
		$\displaystyle=2c\cdot(c-1+t).$

Since $c(i,i)=0$ , this means there exists $i\neq i^{\prime}$ such that $c(i,i^{\prime})\leq\frac{2(c-1+t)}{c-1},$ and since $c(i,i^{\prime})\geq 2\cdot\min(t_{i},t_{i^{\prime}})$ for any $i,i^{\prime}\in I_{3}$ , this means that $\min_{i\in N(j)\cap I_{3}}t_{i}\leq\frac{c-1+t}{c-1}.$

Let $i=\arg\min_{i\in N(j)\cap I_{3}}t_{i}$ , and let $i_{2}=q(i)\in I_{2}$ . Let $\mathcal{E}_{1}$ be the event that no point in $N(j)\cap I_{3}$ nor $i_{2}$ is in $S$ , let $\mathcal{E}_{2}$ be the event that no point in $N(j)\cap I_{3}$ is in $S$ , and let $\mathcal{E}_{3}$ be the event that $i_{3}$ is not in $S$ . Note that $\mathcal{E}_{1}$ implies $\mathcal{E}_{2}$ , which implies $\mathcal{E}_{3}$ . Now, by Proposition 4.6, $\mathbb{P}(\mathcal{E}_{1})$ equals some $p_{1}\leq\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}$ . Likewise, By Proposition 4.5, $\mathbb{P}(\mathcal{E}_{2})$ equals some $p_{2}\leq\frac{1}{2}+\frac{1}{2}(1-2p)^{c}$ . In addition, $\mathbb{P}(\mathcal{E}_{3})=p_{3}=1-p$ . Under the event $\mathcal{E}_{3}^{c}$ , we have that $c(j,S)\leq c(j,i_{3})\leq\frac{c-1+t}{c}$ . Next, under the event $\mathcal{E}_{3}\backslash\mathcal{E}_{2},$ we have that some point in $N(j)\cap I_{3}$ is in $S$ , so $c(j,S)\leq 1$ . Under the event $\mathcal{E}_{2}\backslash\mathcal{E}_{1}$ , we know that $i_{2}\in S$ , so $c(j,S)\leq d(j,i_{2})^{2}\leq(c(j,i)+\sqrt{\delta_{2}\cdot t_{i}})^{2}\leq\left(1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)^{2}$ . Finally, we always have that $c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}$ .

Therefore, we can bound the overall fraction by

\frac{p_{1}\cdot(1+\sqrt{\delta_{1}})^{2}+(p_{2}-p_{1})\cdot\min\left(1+\sqrt{\delta_{1}},1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)^{2}+(p_{3}-p_{2})\cdot 1+(1-p_{3})\cdot\frac{c-1+t}{c}}{1-p+pt}.

Since $(1+\sqrt{\delta_{1}})^{2}\geq\min\left(1+\sqrt{\delta_{1}},1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)^{2}\geq 1\geq\frac{c-1+t}{c},$ the above fraction is an increasing function in the variables $p_{1},p_{2},p_{3}$ . So, we can upper bound this fraction by replacing $p_{1},p_{2},p_{3}$ with their respective upper bounds $\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c},$ $\frac{1}{2}+\frac{1}{2}(1-2p)^{c}$ , and $1-p$ , as well as replacing $\min\left(1+\sqrt{\delta_{1}},1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)$ with simply $1+\sqrt{2\cdot\frac{c-1+t}{c-1}}$ . Next, since $p_{2}-p_{1}=p$ and $1-p_{3}=p$ , the derivative of the numerator with respect to $t$ is $p\cdot\left(\frac{2}{c-1}+\sqrt{\frac{2}{(c-1)(c-1+t)}}\right)+p\cdot\frac{1}{c}\leq p\cdot(2.5+\sqrt{2})$ , and the derivative of the denominator with respect to $t$ is $pt$ . Hence, this fraction decreases as $t$ increases, unless the fraction is less than $2.5+\sqrt{2}$ . Therefore, we can bound the fraction by

	$\displaystyle\hskip 14.22636pt\max\left(2.5+\sqrt{2},\frac{p_{1}\cdot(1+\sqrt{\delta_{1}})^{2}+(p_{2}-p_{1})\cdot(1+\sqrt{2})^{2}+(p_{3}-p_{2})\cdot 1+(1-p_{3})\cdot\frac{c-1}{c}}{1-p}\right)$
	$\displaystyle=\max\left(2.5+\sqrt{2},\frac{p_{1}\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1+\sqrt{2})^{2}+(1-p-p_{2})\cdot 1+p\cdot\frac{c-1}{c}}{1-p}\right)$
	$\displaystyle=\max\left(2.5+\sqrt{2},\frac{\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\right)\cdot(1+\sqrt{\delta_{1}})^{2}-\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c}\right)+p(1+\sqrt{2})^{2}+1-\frac{p}{c}}{1-p}\right),$		(4.c)

where we used the facts that $p_{3}=1-p$ and $p_{2}-p_{1}=p$ .

Case 5: $\boldsymbol{a\geq 1}$ .

In this case, note that $\mathbb{E}[c(j,S)]\leq\mathbb{E}[c(j,I_{1})]\leq 1$ deterministically, since $a=|I_{1}\cap N(j)|\geq 1$ . However, we can improve upon this, since we may have some points in $I_{1}\cap N(j)$ much closer to $j$ , or we may have some points in $(I_{2}\cup I_{3})\cap N(j)$ which are closer and appear with some probability.

Recall that in our algorithm, we flip a coin for each $i\in I_{2}$ to decide whether we include $i\in S$ with probability $2p$ or include each $q^{-1}(i)$ in $S$ independently with probability $2p$ . Let us condition on all of these fair coin flips, and say that a point $i\in I_{2}\cup I_{3}$ survives the coin flips if they could be in the set $S$ with probability $2p$ afterwards. For simplicity, we replace $p$ with $\accentset{\rule{2.79996pt}{0.7pt}}{p}:=2p$ . We also let $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ represent the points in $I_{2}\cup I_{3}$ that survive the fair coin flips.

Let the squared distances from $j$ to each of the points in $I_{1}\cap N(j)$ be $r_{1},\dots,r_{a}$ , and the squared distances from $j$ to each of the points in $\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j)$ be $s_{1},\dots,s_{h}$ , where $h=|\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j)|$ . It is trivial to see that $\mathbb{E}[c(j,S)]\leq\min_{1\leq i\leq a}r_{i}\leq\frac{r_{1}+\cdots+r_{a}}{a},$ and conditioned on at least one of the points in $\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j)$ being selected, we have that $c(j,S)$ in expectation is at most $\frac{s_{1}+\cdots+s_{h}}{h}.$ The probability of at least one of the points in $\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j)$ being selected in $S$ is

1-(1-\accentset{\rule{2.79996pt}{0.7pt}}{p})^{h}\geq 1-\frac{1}{(1+\accentset{\rule{2.79996pt}{0.7pt}}{p})^{h}}\geq 1-\frac{1}{1+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}=\frac{\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}{1+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}\geq\frac{\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h},

since conditioned on the initial coin flips, each surviving point in $(\accentset{\rule{2.79996pt}{0.7pt}}{I_{2}}\cup\accentset{\rule{2.79996pt}{0.7pt}}{I_{3}})$ is included in $S$ independently with probability $\accentset{\rule{2.79996pt}{0.7pt}}{p}$ . Therefore, we can say that

\mathbb{E}[c(j,S)]\leq\frac{r_{1}+\cdots+r_{a}}{a}\cdot\frac{a}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}+\frac{s_{1}+\cdots+s_{h}}{h}\cdot\frac{\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}=\frac{(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot(s_{1}+\cdots+s_{h})}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}.

Next, we have that

	$\displaystyle\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]$	$\displaystyle=\alpha_{j}-(a\cdot\alpha_{j}-(r_{1}+\cdots+r_{a}))-\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot(h\cdot\alpha_{j}-(s_{1}+\cdots+s_{h}))$
		$\displaystyle=\alpha_{j}\cdot(1-(a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h))+[(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot(s_{1}+\cdots+s_{h})].$

Now, we provide a lower bound for $(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}(s_{1}+\cdots+s_{h}).$ To do so, we use the fact that all the points in $N(j)\cap I_{1}$ are separated by at least $\delta_{1}\cdot\alpha_{j}$ in squared distance, and all the surviving points in $N(j)\cap(I_{1}\cup\accentset{\rule{2.79996pt}{0.7pt}}{I_{2}}\cup\accentset{\rule{2.79996pt}{0.7pt}}{I_{3}})$ are separated by at least $\delta_{2}\cdot\alpha_{j}$ in squared distance, to get

(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}(s_{1}+\cdots+s_{h})\geq\frac{1}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}\cdot\alpha_{j}\cdot\left(\delta_{1}\cdot\frac{a(a-1)}{2}+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot a\cdot h+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}^{2}\cdot\frac{h(h-1)}{2}\right).

So, if $a\geq 1$ and $(a,h)\neq(1,0)$ , then if we let $T_{1}=a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h,$ $T_{3}=\delta_{1}\cdot\frac{a(a-1)}{2}+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot a\cdot h+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}^{2}\cdot\frac{h(h-1)}{2},$ and $T_{2}=T_{3}/T_{1}$ , then the ratio is at most

\frac{T_{2}/T_{1}}{1-T_{1}+T_{2}}=\frac{T_{3}}{T_{1}(T_{1}-T_{1}^{2}+T_{3})}=\frac{1}{T_{1}}+\frac{T_{1}-1}{T_{1}-T_{1}^{2}+T_{3}}.

(5.a)

In the case where $a=1,h=0,$ this fraction is undefined. However, we note that in this case, $N(j)\cap S$ deterministically contains a unique center $i^{*}$ and nothing else, so $\mathbb{E}[c(j,S)]\leq c(j,i^{*})$ and $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]=\alpha_{j}-(\alpha_{j}-c(j,i^{*}))=c(j,i^{*}).$ Therefore, the fraction is $1$ . ∎

Therefore, we have the LMP approximation is at most $\rho(p)$ , where $\rho(p)$ is determined via the numerous cases in the proof of Lemma 4.2. The final step is to actually bound $\rho(p)$ based on the cases. Indeed, by casework one can show the following proposition.

Proposition 4.7.

For $p\in[0.096,0.402]$ and $\delta_{1}=\frac{4+8\sqrt{2}}{7},\delta_{2}=2$ , and $\delta_{3}=0.265$ , we have that

\rho(p)\leq\max\bigg{(}3+2\sqrt{2},1+2p+(1-p)\cdot\delta_{1}+2\sqrt{2p^{2}+(1-p)\cdot\delta_{1}},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p(1-\sqrt{\delta_{3}})^{2}},\\ \frac{(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}{(1-2p)\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot(1+\sqrt{\delta_{1}})^{2}\cdot\delta_{3}}\bigg{)}.

As a consequence, for $p_{1}:=0.402$ , $\rho(p_{1})\leq 3+2\sqrt{2}$ .

We defer the casework to Lemma 5.19, and the above proposition follows immediately from it. Therefore, we get a $(3+2\sqrt{2})\approx 5.828$ LMP approximation for the Euclidean $k$ -means problem.

5 Polynomial-time Approximation Algorithm for Euclidean $k$ -means

In this section, we describe how we improve the LMP approximation for Euclidean $k$ -means to a polynomial-time approximation algorithm. Unfortunately, we will lose a slight factor in our approximation, but we still obtain a significant improvement over the previous state-of-the-art approximation factor. While we focus on the $k$ -means problem, we note that this improvement can also be applied to the $k$ -median problem as well, with some small modifications that we will make note of. In Section 6, we will provide an LMP approximation for $k$ -median, and explain how we can use the same techniques in this section to also obtain an improved polynomial-time algorithm for $k$ -median as well.

In Subsection 5.1, we describe the polynomial-time algorithm to generate two nested quasi-independent sets $I$ and $I^{\prime}$ , which will be crucial in developing our final set of centers of size $k$ with low clustering cost. This procedure is based on a similar algorithm of Ahmadian et al. [1], but with some important changes in how we update our graphs and independent sets. In Subsection 5.2, we describe and state a few additional preliminary results. In Subsection 5.3, we analyze the algorithm and show how we can use $I$ and $I^{\prime}$ to generate our final set of centers, to obtain a $6.013$ -approximation algorithm. Finally, in Subsection 5.4, we show that our analysis in Subsection 5.3 can be further improved, to obtain a $(5.912+\varepsilon)$ -approximation guarantee.

We remark that our approximation guarantee of $(5.912+\varepsilon)$ is only in expectation. However, this can be made to be with exponential failure probability, as with probability $\varepsilon$ the approximation ratio will be $(5.912+O(\varepsilon))$ . So, by running the algorithm $\varepsilon^{-1}\cdot\text{poly}(n)$ times in parallel, and outputting the best solution of these, we obtain a $(5.912+\varepsilon)$ -approximation ratio with probability at least $1-(1-\varepsilon)^{\varepsilon^{-1}\cdot\text{poly}(n)}\geq 1-e^{-\text{poly}(n)}$ .

5.1 The algorithm and setup

First, we make some assumptions on the clients and facilities. We first assume that the number of facilities, $m=|\mathcal{F}|$ , is at most polynomial in the number of clients $n=|\mathcal{D}|$ . In addition, we also assume that the distances between clients and facilities are all in the range $[1,n^{6}]$ . Indeed, both of these assumptions can be made via standard discretization techniques, and we only lose an $1+o(1)$ -approximation factor by removing these assumptions [1]. Note that this means the optimal clustering cost, which we call $\text{OPT}_{k}$ , is at least $n$ for both $k$ -means and $k$ -medians. Finally, we assume that $k\leq n-1$ (else this problem is trivial in polynomial time).

Next, we describe the setup relating to dual solutions. Consider the tuple $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ , where $\alpha\in\mathbb{R}^{\mathcal{D}}$ , $z\in\mathbb{R}^{\mathcal{F}}$ , $\mathcal{F}_{S}\subset\mathcal{F}$ , and $\mathcal{D}_{S}:\mathcal{F}_{S}\to\{0,1\}^{\mathcal{D}}$ . Here, $\alpha$ represents the set $\{\alpha_{j}\}_{j\in\mathcal{D}}$ which will be a solution to the dual linear program, $z$ represents $\{z_{i}\}_{i\in\mathcal{F}}$ , where each $z_{i}\in\{\lambda,\lambda+\frac{1}{n}\}$ will be a modified value representing the threshold for tightness of facility $i$ , $\mathcal{F}_{S}$ represents a subset of facilities that we deem “special”, and $\mathcal{D}_{S}$ is a function that maps each special facility to a subset of the clients that we deem special clients for that facility.

When talking about a single solution $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ , we define $\beta_{ij}=\max(0,\alpha_{j}-c(j,i))$ for any $i\in\mathcal{F},j\in\mathcal{D}$ , and define $N(i)=\{j\in\mathcal{D}:\beta_{ij}>0\}$ . We say that a facility $i$ is tight if $\sum_{j\in\mathcal{D}}\beta_{ij}=z_{i}$ . Now, we define $\tau_{i}$ for each $i$ that is either tight or special (i.e., in $\mathcal{F}_{S}$ ). For each tight facility, we define $\tau_{i}=\max_{j\in N(i)}\alpha_{j}$ , and for each special facility, we define $\tau_{i}=\max_{j\in N(i)\cap\mathcal{D}_{S}(i)}\alpha_{j}$ . We default the maximum of an empty set to be $0$ . We also consider a modified conflict graph $H:=H(\delta)$ on the set of tight or special facilities, with an edge between $i$ and $i^{\prime}$ if $c(i,i^{\prime})\leq\delta\cdot\min(\tau_{i},\tau_{i^{\prime}})$ .

We can now define the notion of roundable solutions: our definition is slightly modified from [1, Definition 5.1].

Definition 5.1.

Let $\alpha\in\mathbb{R}^{\mathcal{D}},$ $z\in\mathbb{R}^{\mathcal{F}},$ $\mathcal{F}_{S}\subset\mathcal{F}$ be the set of special facilities, and $\mathcal{D}_{S}:\mathcal{F}_{S}\to\{0,1\}^{\mathcal{D}}$ be the function assigning each special facility $i\in\mathcal{F}_{S}$ to a subset of special clients $\mathcal{D}_{S}(i)$ . Then, the tuple $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ is $(\lambda,k^{\prime})$ -roundable if

1.

$\alpha$ is a feasible solution of $\text{DUAL}(\lambda+\frac{1}{n})$ and $\alpha_{j}\geq 1$ for all $j$ .
2.

For all $i\in\mathcal{F},$ $\lambda\leq z_{i}\leq\lambda+\frac{1}{n}.$
3.
There exists a subset $\mathcal{D}_{B}$ of “bad” clients so that for all $j\in\mathcal{D}$ , there is a facility $w(j)$ that is either tight or in $\mathcal{F}_{S}$ , such that:
1. (a)
  
  For all $j\in\mathcal{D}\backslash\mathcal{D}_{B}$ , $(1+\varepsilon)\cdot\alpha_{j}\geq c(j,w(j))$
2. (b)
  
  For all $j\in\mathcal{D}\backslash\mathcal{D}_{B}$ , $(1+\varepsilon)\cdot\alpha_{j}\geq\tau_{w(j)}$
3. (c)
  
  $\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\tau_{w(j)}\right)$
4.

$\sum_{i\in\mathcal{F}_{S}}\sum_{j\in\mathcal{D}_{S}(i)}\max(0,\alpha_{j}-c(j,i))\geq\lambda\cdot|\mathcal{F}_{S}|-\gamma\cdot\text{OPT}_{k^{\prime}}$ , and $|\mathcal{F}_{S}|\leq n.$

Here, $\gamma\ll\varepsilon\ll 1$ are arbitrarily small constants, which are implicit parameters in the definition. Finally, we say that $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ is $k^{\prime}$ -roundable if it is $(\lambda,k^{\prime})$ -roundable for some choice of $\lambda\geq 0$ .

Algorithm 2 Generate Sequence of Nested Quasi-Independent Sets

1:Initialize

\mathcal{S}^{(0)}=(\alpha^{(0)},z^{(0)},\mathcal{F}_{S}^{(0)},\mathcal{D}_{S}^{(0)})

, and set

\lambda\leftarrow 0

I_{1}^{(0)}\leftarrow\mathcal{F}

I_{2}^{(0)}\leftarrow\emptyset

I_{3}^{(0)}\leftarrow\emptyset

2:Set

I^{(0)}=(I_{1}^{(0)},I_{2}^{(0)},I_{3}^{(0)})

\varepsilon_{z}\leftarrow n^{-\text{poly}(\gamma^{-1})}

L\leftarrow 4n^{7}\cdot\varepsilon_{z}^{-1}

k^{\prime}\leftarrow\min(k,|I_{1}^{(0)}|),

and

p_{1}=0.402

3:for

\lambda=0,\varepsilon_{z},\dots,L\cdot\varepsilon_{z}

4: for

i\in\mathcal{F}

5: Call RaisePrice(

\alpha^{(0)},z^{(0)},I_{1}^{(0)},i

) to generate a polynomial-size sequence

\mathcal{S}^{(1)},\dots,\mathcal{S}^{(q)}

of close,

k^{\prime}

-roundable solutions

6: for

\ell=0

q-1

7: Call GraphUpdate(

\mathcal{S}^{(\ell)},\mathcal{S}^{(\ell+1)},I^{(\ell)}

) to produce a sequence

\{I^{(\ell,r)}\}_{r=0}^{p_{\ell}}

8: for

r=1

p_{\ell}

9: if

|I_{1}^{(\ell,r)}|+p_{1}|I_{2}^{(\ell,r)}\cup I_{3}^{(\ell,r)}|<k

then

10: Let

I=(I_{1}^{(\ell,r-1)},I_{2}^{(\ell,r-1)},I_{3}^{(\ell,r-1)})

I^{\prime}=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)})

, and return

(I,I^{\prime})

11: else

12:

I^{(\ell+1)}\leftarrow I^{(\ell,p_{\ell})}

13:

\mathcal{S}^{(0)}\leftarrow\mathcal{S}^{(q)}

I^{(0)}\leftarrow I^{(q)}

14:

k^{\prime}\leftarrow\min(k^{\prime},|I_{1}^{(0)}|)

Our main algorithm is described in Figure 2. This algorithm outputs two nested quasi-independent sets $I$ and $I^{\prime}$ . The final set $S$ will be obtained either from one of these two sets, or from some hybridization of them. We will defer the actual construction of $S$ to Theorem 5.17.

The method of RaisePrice comes from [1], and will not be of importance, apart from the results that they give us for the overall algorithm (see Theorem 5.7). We will make some important definitions and set up the overall algorithm, and then describe the method of GraphUpdate, which is slightly modified from [1].

First, we describe the initialization phase of Algorithm 2 to generate $\mathcal{S}^{(0)}$ , which is almost identical to that of Ahmadian et al. [1, P. 22-23 of journal version]. The main difference is that we parameterized the procedure by $1+\kappa,1/\kappa$ (instead of 2 and 6 in Ahmadian et al.) Start by setting $\lambda=0$ , $z_{i}^{(0)}=0$ for all $i\in\mathcal{F}$ , and $\mathcal{F}_{S}^{(0)}=\emptyset$ (so $\mathcal{D}_{S}^{(0)}$ has empty domain). We then set $\alpha_{j}^{(0)}=0$ for all $j\in\mathcal{D}$ .

Now, we increase all of the $\alpha_{j}^{(0)}$ values simultaneously at a uniform rate, and for each $j$ , we stop increasing $\alpha_{j}^{(0)}$ once one of the following $2$ events occur:

1.

$\alpha_{j}^{(0)}=c(j,i)$ for some $i$ .
2.

$(1+\kappa)\cdot\sqrt{\alpha_{j}}\geq d(j,j^{\prime})+\frac{1}{\kappa}\cdot\sqrt{\alpha_{j^{\prime}}}$ for some $j^{\prime}\neq j$ (or $(1+\kappa)\cdot\alpha_{j}\geq d(j,j^{\prime})+\frac{1}{\kappa}\cdot\alpha_{j^{\prime}}$ in the $k$ -median case). Here, $\kappa$ will be a fixed small constant (see Appendix D for details on how to set $\kappa$ ).

In the initial solution, $\alpha_{j}^{(0)}\leq\min_{i\in\mathcal{F}}c(j,i)$ for all $i\in\mathcal{F}$ , which means that $N(i)$ is empty for all $i\in\mathcal{F}$ , so $\tau_{i}=0$ . In addition, since every $z_{i}=0$ , every facility is tight. This means that the conflict graph $H(\delta)$ on the set of tight facilities for the initial solution we construct is just an empty graph on the full set of facilities, since $c(i,i^{\prime})>0=\delta\cdot\min(\tau_{i},\tau_{i^{\prime}}).$ So, this means that if we apply Algorithm 1 to $V_{1}=\mathcal{F}$ , we will obtain that $I_{1}^{(0)}=\mathcal{F}$ and $I_{2}^{(0)}=I_{3}^{(0)}=\emptyset$ .

We now set up some definitions that will be important for the remainder of the algorithm and analysis. Define two dual solutions $\alpha=\{\alpha_{j}\}$ and $\alpha^{\prime}=\{\alpha_{j}^{\prime}\}$ to be close if $\max_{j\in\mathcal{D}}|\alpha_{j}-\alpha_{j}^{\prime}|\leq\frac{1}{n^{2}}$ . Consider two solutions $\mathcal{S}^{(\ell)}=(\alpha^{(\ell)},z^{(\ell)},\mathcal{F}_{S}^{(\ell)},\mathcal{D}_{S}^{(\ell)})$ and $\mathcal{S}^{(\ell+1)}=(\alpha^{(\ell+1)},z^{(\ell+1)},\mathcal{F}_{S}^{(\ell+1)},\mathcal{D}_{S}^{(\ell+1)})$ that are each $(\lambda,k^{\prime})$ -roundable for some choice of $\lambda$ , such that $\alpha^{(\ell)}$ and $\alpha^{(\ell+1)}$ are close. This means that $|\alpha_{j}^{(\ell)}-\alpha_{j}^{(\ell+1)}|\leq\frac{1}{n^{2}}$ for all $j\in\mathcal{D},$ and that $\lambda\leq z_{i}^{(\ell)},z_{i}^{(\ell+1)}\leq\lambda+\frac{1}{n}$ for all $i\in\mathcal{F}$ , even if $i$ is not tight. Let $\mathcal{V}^{(\ell)}$ represent the set of tight or special facilities in $\mathcal{S}^{(\ell)}$ and define $\mathcal{V}^{(\ell+1)}$ likewise.

Let $\sqcup$ denote the disjoint union, i.e., $S\sqcup T$ is a set consisting of a copy of each element in $S$ and a distinct copy of each element in $T$ . For each point $i\in\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}$ , we define $\mathcal{D}_{S}(i)$ (if $i$ were a special facility), $\tau_{i}$ , and $z_{i}$ based on whether $i$ came from $\mathcal{V}^{(\ell)}$ or from $\mathcal{V}^{(\ell+1)}$ . This means that for $i\in\mathcal{V}^{(\ell)}$ , $\mathcal{D}_{S}(i)=\mathcal{D}^{(\ell)}_{S}(i)$ , $z_{i}=z_{i}^{(\ell)}$ , and $\tau_{i}=\tau_{i}^{(\ell)}=\max_{j:\alpha_{j}^{(\ell)}>c(j,i)}\alpha_{j}^{(\ell)}$ if $i$ is tight and $\tau_{i}=\tau_{i}^{(\ell)}=\max_{j:\alpha_{j}^{(\ell)}>c(j,i),j\in\mathcal{D}_{S}^{(\ell)}(i)}\alpha_{j}^{(\ell)}$ if $i$ is special (and likewise for $i\in\mathcal{V}^{(\ell+1)}$ ). In addition, for each client $j\in\mathcal{D}$ , we define $\alpha_{j}=\min(\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)})$ ; for each $j\in\mathcal{D}$ and $i\in\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}$ , we define $\beta_{ij}=\max(0,\alpha_{j}-c(j,i))$ ; and for each $i\in\mathcal{F}$ , we let $N(i)=\{j\in\mathcal{D}:\alpha_{j}-c(j,i)>0\}$ . Since $\alpha_{j}=\min(\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}),$ this means $N(i)\subset N^{(\ell)}(i)$ if $i\in\mathcal{V}^{(\ell)}$ and $N(i)\subset N^{(\ell+1)}(i)$ if $i\in\mathcal{V}^{(\ell+1)}.$

We create a hybrid conflict graph on a subset of the disjoint union $\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}$ . First, we let $H^{(\ell,0)}(\delta)$ represent the conflict graph on $\mathcal{V}^{(\ell,0)}:=\mathcal{V}^{(\ell)}$ . The conflict graph on a set of facilities means there is an edge between two vertices $(i,i^{\prime})$ if $c(i,i^{\prime})\leq\delta\cdot\min(\tau_{i},\tau_{i^{\prime}})$ . Next, we choose some ordering of the vertices in $\mathcal{V}^{(\ell)}$ , and for each $1\leq r\leq p_{\ell}:=|\mathcal{V}^{(\ell)}|+1$ , we let $V^{(\ell,r)}$ represent the vertices of the so-called merged vertex set defined as $\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}$ after we removed the first $r-1$ vertices in $\mathcal{V}^{(\ell)}$ , and let $H^{(\ell,r)}(\delta)$ represent the conflict graph on $\mathcal{V}^{(\ell,r)}$ , where again the conflict graph means that $i,i^{\prime}\in\mathcal{V}^{(\ell,r)}$ share an edge if $c(i,i^{\prime})\leq\delta\cdot\min(\tau_{i},\tau_{i^{\prime}})$ . Note that $\mathcal{V}^{(\ell,1)}=\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}$ . For simplicity of notation, we may abbreviate $\mathcal{V}^{(\ell,r)}$ as $\mathcal{V}$ and $H^{(\ell,r)}(\delta)$ as $H(\delta)$ if the context is clear.

In the context of a hybrid conflict graph $\mathcal{V}^{(\ell,r)}$ , for any client $j$ , let $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ be the subset of $\mathcal{V}^{(\ell,r)}$ consisting of all tight facilities $i$ such that $j\in N(i)$ and all special facilities $i$ such that $j\in N(i)\cap\mathcal{D}_{S}(i)$ . We also define $w(j)$ as the witness for $j$ in the solution $\mathcal{S}^{(\ell+1)}$ , and $\mathcal{D}_{B}$ as the set of bad clients from the solution $\mathcal{S}^{(\ell+1)}$ .

Finally, to describe the actual GraphUpdate procedure, it works as follows. First, we note that $I^{(0,0)}=I^{(0)}$ which has already been decided, either by the first initialized solution or from the previous solution $I^{(q)}$ before we reset $I^{(0)}=I^{(q)}$ . Otherwise, $I^{(\ell+1,0)}=I^{(\ell,p_{\ell})}$ since the set of vertices $\mathcal{V}^{(\ell+1,0)}=\mathcal{V}^{(\ell+1)}=\mathcal{V}^{(\ell,p_{\ell})}$ . Finally, we note that $|\mathcal{V}^{(\ell,r)}\backslash\mathcal{V}^{(\ell,r+1)}|\leq 1,$ since for $r=0$ $\mathcal{V}^{(\ell,r)}\subset\mathcal{V}^{(\ell,r+1)}$ and otherwise, $\mathcal{V}^{(\ell,r+1)}$ is created by simply removing one element from $\mathcal{V}^{(\ell,r)}$ . So, the maximal independent set $I_{1}^{(\ell,r)}$ of $H^{(\ell,r)}(\delta_{1})$ can easily be extended to a maximal independent set $I_{1}^{(\ell,r+1)}$ of $H^{(\ell,r+1)}(\delta_{1})$ by deleting at most $1$ element and then possibly extending the independent set. We then extend $I_{1}^{(\ell,r+1)}$ arbitrarily based on Steps 2 to 5 to create $I_{2}^{(\ell,r+1)}$ and $I_{3}^{(\ell,r+1)}$ , where $I_{2}^{(\ell,r+1)}$ and $I_{3}^{(\ell,r+1)}$ may have no relation to $I_{2}^{(\ell,r)}$ and $I_{3}^{(\ell,r)}$ . So, we inductively create $I^{(\ell,r+1)}=(I_{1}^{(\ell,r+1)},I_{2}^{(\ell,r+1)},I_{3}^{(\ell,r+1)})$ from $I^{(\ell,r)}=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)})$ , where importantly $|I_{1}^{(\ell,r)}\backslash I_{1}^{(\ell,r+1)}|\leq 1$ .

5.2 Additional preliminaries

For our approximation guarantees, we require two additional preliminaries. The first is to show a rough equivalence between solving $k$ -means (resp., $k$ -median) clustering and solving $k$ -means (resp., $k$ -median) clustering if allowed $O(1)$ additional clusters. The second is to define the notion of negative-submodularity and its application for $k$ -means and $k$ -median.

First, we show that for any constant $C$ and parameter $\varepsilon$ , if there exists a polynomial-time $\alpha$ -approximation algorithm for $k$ -means or $k$ -median in any metric space that uses $k+C$ centers, then for any constant $\varepsilon>0$ , there exists a polynomial-time $\alpha(1+\varepsilon)$ -approximation algorithm that opens exactly $k$ centers.

More formally, the statement we prove is the following. Note that a similar statement was proven in [36].

Lemma 5.2.

Let $C,\alpha$ be some absolute constants. Let $\mathcal{A}$ be an $\alpha$ -approximation algorithm with running time $T$ for $k$ -median (resp. $k$ -means) that open $k+C$ centers. Then, for any $1/3>\varepsilon>0$ , there exists an $\alpha(1+\varepsilon)$ -approximation for $k$ -median (resp. $k$ -means) with running time $O(T+n^{\text{poly}(C/\varepsilon)})$ .

Proof.

We give the proof of the $k$ -median problem, the proof for the $k$ -means problem is identical, up to adjustment of the constants.

To proceed, we need the following notion (due to [42]): A $k$ -median instance is said to be $(1+\alpha)$ -ORSS-separable if the ratio of the cost of the optimum solution with $k-1$ centers to the cost of the optimum solution with $k$ centers is at least $1+\alpha$ .

We can now present our algorithm; For any $k$ , the algorithm is as follows. Compute an $\alpha$ -approximate solution $S_{C}$ (with $k+C-C=k$ centers) to the $(k-C)$ -median instance using $\mathcal{A}$ . Then, for any $i=1,\ldots,C$ , compute a solution $S_{i-1}$ with $k-i$ centers using the algorithms for $(1+\nicefrac{{\varepsilon}}{{C}})^{\varepsilon/10}$ -ORSS-separable instances of [5, 22] to obtain a $(1+\varepsilon/3)$ -approximate solution in time $n^{\text{poly}(C/\varepsilon)}$ . Output the solution $S^{*}$ of $S_{0},\ldots,S_{C}$ with minimum $k$ -median cost.

We now turn to the analysis of the above algorithm. The running time follows immediately from its definition and the results of [5, 22]. Let’s then consider the approximation guarantee of the solution produced. For $0\leq i\leq C$ , let $\text{OPT}_{i}$ be the solution to $(k-i)$ -median, i.e.: the $k$ -median problem with $k-i$ centers. Our goal is thus to show that the solution $S^{*}$ output by the above algorithm is an $\alpha(1+\varepsilon)$ -approximate solution to the $\text{OPT}_{0}$ .

If the cost of $\text{OPT}_{C}$ is within a $(1+\varepsilon)$ -factor of the cost of $\text{OPT}_{0}$ , then the cost of the solution output by the algorithm is no larger than the cost of solution $S_{C}$ whose total cost is thus at most $\alpha\text{cost}(\text{OPT}_{C})\leq\alpha(1+\varepsilon)\text{cost}(\text{OPT}_{0})$ , as desired.

Otherwise, since $\varepsilon<1/3$ we have that there exists a $i>0$ such that $\text{cost}(\text{OPT}_{i})\geq(1+\nicefrac{{\varepsilon}}{{C}})^{\varepsilon/10}\text{cost}(\text{OPT}_{i-1})$ . Let $i^{*}$ be the smallest $i$ such that the above holds. In which case, we have both that $\text{cost}(\text{OPT}_{i^{*}-1})\leq(1+\varepsilon/3)\text{cost}(\text{OPT}_{0})$ and that the $(k-(i^{*}-1))$ -median instance is $(1+\nicefrac{{\varepsilon}}{{C}})^{\varepsilon/10}$ -ORSS-separable. Therefore, by the results of [5, 22], the cost of the solution output by our algorithm is no larger than $(1+\varepsilon/3)\text{cost}(\text{OPT}_{i^{*}-1})$ and so at most $(1+\varepsilon/3)^{2}\text{cost}(\text{OPT}_{0})\leq(1+\varepsilon)\text{cost}(\text{OPT}_{0})$ by our choice of $\varepsilon$ , hence the lemma. ∎

Next, we describe the definition of submodular and negative-submodular set functions.

Definition 5.3.

Let $\Omega$ be a finite set, and let $f$ be a function from the set of subsets of $\Omega$ , $\mathcal{P}(\Omega)$ , to the real numbers $\mathbb{R}$ . Then, $f$ is submodular over $\Omega$ if for any $X\subset Y\subsetneq\Omega$ and any $x\in\Omega\backslash Y$ , we have that $f(X\cup\{x\})-f(X)\geq f(Y\cup\{x\})-f(Y)$ . Likewise, $f$ is negative-submodular over $\Omega$ if $-f$ is a submodular function: equivalently, if for any $X\subset Y\subsetneq\Omega$ and any $x\in\Omega\backslash Y$ , we have that $f(X\cup\{x\})-f(X)\leq f(Y\cup\{x\})-f(Y)$ .

The following claim, proven in [17], proves that the $k$ -means and $k$ -median objective functions are both negative-submodular functions.

Proposition 5.4.

[17, Claim 10] Fix $\mathcal{F}$ and $\mathcal{D}$ , and let $f:\mathcal{P}(\mathcal{F})\to\mathbb{R}$ be the function sending each subset $S\subset\mathcal{F}$ to $\text{cost}(\mathcal{D},S)$ . Then, $f$ is a negative-submodular function over $\mathcal{F}$ , either if the cost is the $k$ -median cost or if the cost is the $k$ -means cost.

Given Proposition 5.4, we can use standard properties of submodular functions to infer the following claims.

Proposition 5.5.

Let $S_{0}\subset S_{1}$ be sets of facilities, where $S_{0}$ has size $k_{0}$ and $S_{1}$ has size $k_{0}+k_{1}$ . Then, for any $0\leq p\leq 1$ , if $S:S_{0}\subset S\subset S_{1}$ is a set created including all of $S_{0}$ and then independently including each element in $S_{1}\backslash S_{0}$ with probability $p$ , then $\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq p\cdot\text{cost}(\mathcal{D},S_{1})+(1-p)\cdot\text{cost}(\mathcal{D},S_{0})$ .

Proposition 5.6.

Let $S_{0}\subset S_{1}$ be sets of facilities, where $S_{0}$ has size $k_{0}$ and $S_{1}$ has size $k_{0}+k_{1}$ . Then, if $S:S_{0}\subset S\subset S_{1}$ is a set created by randomly adding exactly $k_{2}$ items from $S_{1}\backslash S_{0}$ to $S_{0}$ , for some fixed $0\leq k_{2}\leq k_{1}$ , then we have $\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\frac{k_{2}}{k_{1}}\cdot\text{cost}(\mathcal{D},S_{1})+(1-\frac{k_{2}}{k_{1}})\cdot\text{cost}(\mathcal{D},S_{0})$ .

5.3 Analysis

The following theorem relating to the above algorithm was (essentially) proven by Ahmadian et al. [1], and will be very important in our analysis.

Theorem 5.7.

Algorithm 2 runs in $n^{O(1)}$ time (where the $O(1)$ may depend on $\varepsilon$ and $\gamma$ ), and the following conditions hold.

1.

Let $k^{\prime}$ be the minimum of $k$ and the sizes of all sets that become $I_{1}^{(0)}$ (i.e., the first part of each nested quasi-independent set $I^{(q)}$ that becomes $I^{(0)}$ , as done in line 13 of the pseudocode). Then, every solution $\mathcal{S}^{(\ell)}$ that is generated when $\lambda$ is a certain value is $(\lambda,k^{\prime})$ -roundable.
2.

For any solution $\mathcal{S}$ that becomes $\mathcal{S}^{(0)}$ , $\mathcal{F}_{S}^{(0)}=\emptyset$ (and so $\mathcal{D}_{S}^{(0)}$ is an empty function). In addition, $\mathcal{S}=\mathcal{S}^{(0)}$ has no corresponding bad clients, i.e., $\mathcal{D}_{B}=\emptyset$ .
3.

For any two consecutive solutions $\mathcal{S}^{(\ell)}$ and $\mathcal{S}^{(\ell+1)}$ , we have that $\alpha^{(\ell)}$ and $\alpha^{(\ell+1)}$ are close.
4.

Every $I^{(\ell,r)}=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)})$ is a nested quasi-independent set for the set of facilities $\mathcal{V}^{(\ell,r)}$ . In addition, for every $1\leq r\leq p_{\ell}$ , $|I_{1}^{(\ell,r-1)}\backslash I_{1}^{(\ell,r)}|\leq 1.$

This theorem is technically stronger than what was proven in [1], but follows from a nearly identical analysis to their paper, with a very minor tweak to the algorithm. We explain why Theorem 5.7 follows from their analysis in Appendix D.

For the following lemmas (Lemmas 5.8 until Proposition 5.15), we consider a fixed family of conflict graphs $\{H(\delta)\}_{\delta>0}=\{H^{(\ell,r)}(\delta)\}_{\delta>0}$ on a hybrid set $\mathcal{V}=\mathcal{V}^{(\ell,r)}$ , for some $r\geq 1$ , where both $\mathcal{S}^{(\ell)}$ and $\mathcal{S}^{(\ell+1)}$ are $(\lambda,k^{\prime})$ -roundable. For some fixed $\delta_{1}\geq\delta_{2}\geq 2\geq\delta_{3},$ we let $(I_{1},I_{2},I_{3})$ be a nested quasi-independent set of $\{H(\delta)\}$ , i.e., the output of running all but the final step of Algorithm 1 with $V_{1}=\mathcal{V}$ , and treat it as fixed. However, we will let $p$ (required in the final step 6) be variable, though we may consider $p$ as initialized to some fixed $p_{1}$ .

Many of these results apply for both $k$ -means and $k$ -median. While we focus on $k$ -means, we later explain how to make simple modifications to apply our results to the $k$ -median problem. In addition, we will treat $\delta_{1},\delta_{2},\delta_{3}$ as fixed but $p$ as potentially variable. We let $\rho(p)$ represent the approximation constant from the LMP algorithm (i.e., in Lemma 4.2) with probability $p$ (either for $k$ -means or $k$ -median, depending on constant).

We first show some crucial preliminary claims relating to the hybrid graph $\mathcal{V}^{(\ell,r)}$ , where $r\geq 1.$

Lemma 5.8.

For any client $j\in\mathcal{D}$ and any facility $i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ , $\tau_{i}\geq\alpha_{j}>c(j,i)$ .

Proof.

Note that if $i\in\mathcal{V}^{(\ell)}$ , then $\tau_{i}$ is the maximum $\alpha_{j^{\prime}}^{(\ell)}$ over $j^{\prime}$ such that $\alpha_{j^{\prime}}^{(\ell)}>c(j^{\prime},i)$ and $j^{\prime}\in\mathcal{D}_{S}^{(\ell)}(i)$ if $i$ is special. Since $\alpha_{j^{\prime}}^{(\ell)}>\alpha_{j^{\prime}}$ , this is at least the maximum $\alpha_{j^{\prime}}$ over $j^{\prime}$ such that $\alpha_{j^{\prime}}>c(j^{\prime},i)$ and $j^{\prime}\in\mathcal{D}_{S}^{(\ell)}(i)$ if $i$ is special. But if $i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ , then indeed $\alpha_{j}>c(j,i)$ and $j\in\mathcal{D}_{S}^{(\ell)}(i)$ if $i$ is special (recall that $\mathcal{D}_{S}$ was defined based on whether $i$ is in $\mathcal{V}^{(\ell)}$ or $\mathcal{V}^{(\ell+1)}$ ). So, $\tau_{i}\geq\alpha_{j}$ . By an identical argument, the same holds if $i\in\mathcal{V}^{(\ell+1)}$ .

Finally, note that we defined $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ to precisely be the set of tight facilities $i$ in $\mathcal{V}$ with $\alpha_{j}>c(j,i)$ , or special facilities $i$ in $\mathcal{V}$ with $\alpha_{j}>c(j,i)$ and $j\in\mathcal{D}_{S}(i)$ . So, we always have $\alpha_{j}>c(j,i)$ . ∎

Lemma 5.9.

Suppose that $S\subset I_{1}\cup I_{2}\cup I_{3}$ contains every point in $I_{1}$ , and each point in $I_{2}$ and each point in $I_{3}$ with probability $p\leq 0.5$ (not necessarily independently). Then, for any point $j$ ,

\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0.

Remark.

We note that this lemma holds even for bad clients $j\in\mathcal{D}_{B}$ . In addition, we remark that we will be applying this lemma on $S$ as a nested quasi-independent set or something similar.

Finally, we note that this lemma (and the following lemma, Lemma 5.10) are the only results where we directly use the fact that we are studying the $k$ -means as opposed to the $k$ -median problem.

Proof of Lemma 5.9.

Note that every point $i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S$ satisfies $\tau_{i}\geq\alpha_{j}$ and $\alpha_{j}-c(j,i)>0$ , by Lemma 5.8. So, by linearity of expectation, it suffices to show that

\alpha_{j}\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))+\frac{1}{2}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i))+\frac{1}{2}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i)).

(10)

Now, note that by the definition of $I_{2}$ , every pair of points $(i,i^{\prime})$ in $(I_{1}\cup I_{2})\cap\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ are separated by at least $\sqrt{\delta_{2}\cdot\min(\tau_{i},\tau_{i^{\prime}})}$ distance. But since $i,i^{\prime}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ , $\min(\tau_{i},\tau_{i^{\prime}})\geq\alpha_{j}$ by Lemma 5.8. So, $d(i,i^{\prime})\geq\sqrt{2\cdot\alpha_{j}}$ . Therefore,

$\displaystyle\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{2})}(\alpha_{j}-c(j,i))$	$\displaystyle\leq\|I_{1}\cup I_{2}\|\cdot\alpha_{j}-\frac{1}{2\cdot\|I_{1}\cup I_{2}\|}\cdot\sum_{i,i^{\prime}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{2})}d(i,i^{\prime})^{2}$
	$\displaystyle\leq\|I_{1}\cup I_{2}\|\cdot\alpha_{j}-\frac{1}{2\cdot\|I_{1}\cup I_{2}\|}\cdot(\|I_{1}\cup I_{2}\|)\cdot(\|I_{1}\cup I_{2}\|-1)\cdot 2\cdot\alpha_{j}$
	$\displaystyle=\alpha_{j}.$	(11)

Likewise, every pair of points $(i,i^{\prime})$ in $(I_{1}\cup I_{3})\cap\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ are also separated by at least $\sqrt{\delta_{2}\cdot\min(\tau_{i},\tau_{i^{\prime}})}\geq\sqrt{2\cdot\alpha_{j}}$ distance. Therefore, the same calculations as in (11) give us

\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{3})}(\alpha_{j}-c(j,i))\leq\alpha_{j}.

(12)

Averaging Equations (11) and (12) gives us Equation (10), which finishes the lemma. ∎

We next have the following lemma, which bounds the cost for clients that are not bad.

Lemma 5.10.

Let $p<0.5$ and $S$ be generated by applying Step 6 to $(I_{1},I_{2},I_{3})$ . Then, for every client $j\not\in\mathcal{D}_{B}$ ,

\mathbb{E}[c(j,S)]\leq\rho(p)\cdot(1+O(\varepsilon))\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right],

where $\rho(p)$ represents the constant from Lemma 4.2.

Proof.

By Lemma 5.8, we have that $\tau_{i}\geq\alpha_{j}$ for all $i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)$ . In addition, for every $j\not\in\mathcal{D}_{B}$ , there exists a point $w(j)$ in $\mathcal{V}^{(\ell+1)}$ (so it has not been removed, i.e., it is still in $\mathcal{V}=\mathcal{V}^{(\ell,r)}$ ) such that $(1+\varepsilon)\cdot\alpha_{j}^{(\ell+1)}\geq c(j,w(j))$ and $(1+\varepsilon)\cdot\alpha_{j}^{(\ell+1)}\geq\tau_{w(j)}$ . Since $\alpha_{j}^{(\ell+1)}\geq 1$ for all $j$ and $|\alpha_{j}-\alpha_{j}^{(\ell+1)}|\leq\frac{1}{n^{2}},$ this means that $(1+O(\varepsilon))\cdot\alpha_{j}\geq c(j,w(j)),\tau_{w(j)}$ . These pieces of information are sufficient for all of our calculations from Lemma 4.2 to go through. ∎

We note that the expression $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))$ may be somewhat unwieldy. Therefore, we provide an upper bound on its sum over $j\in\mathcal{D}.$

Lemma 5.11.

Let $S$ be any subset of $\mathcal{V}=\mathcal{V}^{(\ell,r)}$ . Then,

\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot|S|+4\gamma\cdot\text{OPT}_{k^{\prime}}.

Remark.

We note that this lemma holds even when $\lambda<\frac{1}{n}$ , i.e., $\lambda-\frac{1}{n}<0$ .

Proof.

First, by splitting the sum based on tight and special facilities, we have that

	$\displaystyle\hskip 14.22636pt\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)$
	$\displaystyle=\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{\begin{subarray}{c}i\in S\\ i\text{ tight}\end{subarray}}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))-\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i)).$		(13)

Now, we note that for any tight facility $i$ , either $\sum_{j\in\mathcal{D}}\max(0,\alpha_{j}^{(\ell)}-c(j,i))=z_{i}^{(\ell)}\in[\lambda,\lambda+\frac{1}{n}]$ or $\sum_{j\in\mathcal{D}}\max(0,\alpha_{j}^{(\ell+1)}-c(j,i))=z_{i}^{(\ell+1)}\in[\lambda,\lambda+\frac{1}{n}]$ . Since $\alpha^{(\ell)}$ and $\alpha^{(\ell+1)}$ are close, this means $0\leq\alpha_{j}^{(\ell)}-\alpha_{j},\alpha_{j}^{(\ell+1)}-\alpha_{j}\leq\frac{1}{n^{2}}$ , and since there are $n$ clients in $\mathcal{D}$ , this means that

\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)\geq\sum_{j\in\mathcal{D}}\max\left(\alpha_{j}^{(\ell^{\prime})}-c(j,i)-\frac{1}{n^{2}},0\right)\geq\lambda-\frac{1}{n},

(14)

for some choice of $\ell^{\prime}$ in $\{\ell,\ell+1\}$ .

In addition, we know that both $\alpha^{(\ell)}$ and $\alpha^{(\ell+1)}$ are feasible solutions of $\text{DUAL}(\lambda+\frac{1}{n})$ , and that $\alpha_{j}\leq\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}$ . Therefore, for any special facility $i\in\mathcal{F}_{S}^{(\ell)}\sqcup\mathcal{F}_{S}^{(\ell+1)}$ , $\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i))\leq\sum_{j\in\mathcal{D}}\max(0,\alpha_{j}-c(j,i))\leq\lambda+\frac{1}{n}$ . But, we have that

	$\displaystyle\sum_{i\in\mathcal{F}_{S}^{(\ell^{\prime})}}\sum_{j\in N(i)\cap\mathcal{D}_{S}^{(\ell^{\prime})}(i)}(\alpha_{j}-c(j,i))$	$\displaystyle\geq\sum_{i\in\mathcal{F}_{S}^{(\ell^{\prime})}}\sum_{j\in\mathcal{D}_{S}^{(\ell^{\prime})}(i)}\max\left(0,\alpha_{j}^{(\ell^{\prime})}-\frac{1}{n^{2}}-c(j,i)\right)$
		$\displaystyle\geq\lambda\cdot\|\mathcal{F}_{S}^{(\ell^{\prime})}\|-\gamma\cdot\text{OPT}_{k^{\prime}}-\frac{\|\mathcal{F}_{S}^{(\ell^{\prime})}\|}{n},$

for both $\ell^{\prime}=\ell$ and $\ell^{\prime}=\ell+1$ (the last inequality follows by Condition 4 of Definition 5.1). So, if we let $e_{i}$ represent $\lambda+\frac{1}{n}-\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i))$ for each special facility $i$ , we have that $e_{i}\geq 0$ but $\sum_{i\in\mathcal{F}_{S}^{(\ell)}}e_{i}\leq\frac{2}{n}\cdot|\mathcal{F}_{S}^{(\ell)}|+\gamma\cdot\text{OPT}_{k^{\prime}}\leq 2\gamma\cdot\text{OPT}_{k^{\prime}}$ , since $|\mathcal{F}_{S}^{(\ell)}|\leq n$ and $\text{OPT}_{k^{\prime}}\geq n\geq\frac{2}{\gamma}$ . Similarly, $\sum_{i\in\mathcal{F}_{S}^{(\ell+1)}}e_{i}\leq 2\gamma\cdot\text{OPT}_{k^{\prime}}.$ So, this means that

\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}e_{i}\leq 4\gamma\cdot\text{OPT}_{k^{\prime}},

which means that

	$\displaystyle\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i))$	$\displaystyle\geq\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}\left(\lambda+\frac{1}{n}-e_{i}\right)$
		$\displaystyle\geq\left(\lambda+\frac{1}{n}\right)\cdot\|\{i\in S:i\text{ special}\}\|-4\gamma\cdot\text{OPT}_{k^{\prime}}.$		(15)

Thus, by combining Equations (13), (14), and (15), we get

	$\displaystyle\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)$	$\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{\begin{subarray}{c}i\in S\\ i\text{ tight}\end{subarray}}\left(\lambda-\frac{1}{n}\right)-\left(\lambda+\frac{1}{n}\right)\cdot\|\{i\in S:i\text{ special}\}\|+4\gamma\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\|S\|+4\gamma\cdot\text{OPT}_{k^{\prime}}.\qed$

Next, we show that the bad clients do not contribute much to the total cost.

Lemma 5.12.

Let $S$ be a subset of $\mathcal{V}$ containing $I_{1}$ . Then, we have that

\sum_{j\in\mathcal{D}_{B}}c(j,S)\leq O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

Proof.

Note that for every point $j\in\mathcal{D}_{B}$ , there exists a facility $w(j)\in\mathcal{V}^{(\ell+1)}$ such that

\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\tau_{w(j)}\right)\leq O(\gamma)\cdot\text{OPT}_{k^{\prime}},

because $\mathcal{S}^{(\ell+1)}$ is $(\lambda,k^{\prime})$ -roundable. Now, note that $d(j,S)\leq d(j,I_{1})\leq d(j,w(j))+d(w(j),I_{1})$ , so $c(j,S)\leq 2[c(j,w(j))+c(w(j),I_{1})]$ . But since $w(j)\in\mathcal{V}^{(\ell+1)}\subset\mathcal{V}$ , $c(w(j),I_{1})\leq\delta_{1}\cdot\tau_{w(j)}$ , and therefore,

\sum_{j\in\mathcal{D}_{B}}c(j,S)\leq 2\cdot\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\delta_{1}\cdot\tau_{w(j)}\right)\leq O(\gamma)\cdot\text{OPT}_{k^{\prime}},

where the final inequality follows by Condition 3c) of Definition 5.1. ∎

We now combine our previous lemmas to obtain the following bound on the expected cost of $S$ , giving a result that bounds the overall expected cost in terms of the dual solution.

Lemma 5.13.

Suppose that $S$ is generated by applying Step 6 to $(I_{1},I_{2},I_{3})$ with the probability set to $p<0.5$ . Then, the expected cost $\mathbb{E}[\text{cost}(\mathcal{D},S)]$ is at most

\rho(p)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[|S|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}},

where $\rho(p)$ represents the constant from Lemma 4.2.

Proof.

We will abbreviate $\rho:=\rho(p)$ . We can split up the cost based on good (i.e., not in $\mathcal{D}_{B}$ ) points $j$ and bad points $j$ . Indeed, doing this, we get

	$\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S)]$	$\displaystyle=\sum_{j\not\in\mathcal{D}_{B}}\mathbb{E}[c(j,S)]+\sum_{j\in\mathcal{D}_{B}}\mathbb{E}[c(j,S)]$
		$\displaystyle\leq\rho\cdot(1+O(\varepsilon))\cdot\sum_{j\not\in\mathcal{D}_{B}}\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho\cdot(1+O(\varepsilon))\cdot\sum_{j\in\mathcal{D}}\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[\|S\|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.$

In the above equation, the second line follows from Lemmas 5.10 and 5.12. The third line is true since $\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0$ for all $j$ , even in $\mathcal{D}_{B}$ , by Lemma 5.9. Finally, the fourth line is true because of Lemma 5.11. ∎

Next, we show that under certain circumstances, we can find a solution $S$ of size at most $k$ satisfying a similar condition to Lemma 5.13, with high probability.

Lemma 5.14.

Suppose that $|I_{1}|+p\cdot|I_{2}\cup I_{3}|=k$ for some integer $k$ (which may be larger than $n$ ) and some $p\in[0.01,0.49]$ . Then, for any sufficiently large integer $C$ , if $|I_{2}\cup I_{3}|\geq 100\cdot C^{4}$ , then there exists a polynomial-time randomized algorithm that outputs a set $S$ such that $I_{1}\subset S\subset I_{1}\cup I_{2}\cup I_{3}$ , and with probability at least $9/10$ , $|S|\leq k$ and

\text{cost}(\mathcal{D},S)\leq\rho\left(p-\frac{2}{C}\right)\cdot\left(1+\frac{300}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

Remark.

By repeating this randomized algorithm polynomially many times and outputting the lowest-cost solution $S$ with $|S|\leq k$ , we can make the failure probability exponentially small.

Proof.

Let $r=|I_{2}|$ , and partition $I_{2}\cup I_{3}$ into $T_{1},\dots,T_{r}$ , where each $T_{\ell}$ consists of a point $i_{2}\in I_{2}$ and $q^{-1}(i_{2})$ , i.e., $i_{2}$ ’s preimage under the map $q$ . We assume WLOG that the $T_{\ell}$ ’s are sorted in non-increasing order of size, and write $x_{\ell}$ as the unique point in $T_{\ell}\cap I_{2}$ . Define $s$ such that $|T_{1}|\geq\dots\geq|T_{s}|\geq C>|T_{s+1}|\geq\dots\geq|T_{r}|$ (note that $s$ may be $0$ or $r$ ). Note that $s\leq\frac{|I_{2}\cup I_{3}|}{C}$ so $\frac{s}{|I_{2}\cup I_{3}|}\leq\frac{1}{C}$ . Now, set $p^{\prime}=p-\frac{2}{C}$ , and consider creating the following set $S$ :

•

For each $i\in I_{1},$ include $i\in S$ .
•

For each $\ell\leq s,$ include $x_{\ell}\in S$ , and for each $i$ in $T_{\ell}\backslash\{x_{\ell}\}$ , include $i\in S$ independently with probability $p^{\prime}$ .
•

For each $\ell>s,$ flip a fair coin. If it lands heads, include $x_{\ell}$ with probability $2p^{\prime}$ , and if it lands tails, include each point in $T_{\ell}\backslash\{x_{\ell}\}$ independently with probability $2p^{\prime}$ .

The expected size of $S$ is $|I_{1}|+s+(|T_{1}|+\cdots+|T_{s}|-s)\cdot p^{\prime}+(|T_{s+1}|+\cdots+|T_{r}|)\cdot p^{\prime}=|I_{1}|+p^{\prime}\cdot|I_{2}\cup I_{3}|+(1-p^{\prime})\cdot s\leq|I_{1}|+p^{\prime}\cdot|I_{2}\cup I_{3}|+s.$ Therefore, since $p^{\prime}=p-\frac{2}{C},$ the expected size of $S$ as at most $|I_{1}|+(p-\frac{2}{C})\cdot|I_{2}\cup I_{3}|+\frac{|I_{2}\cup I_{3}|}{C}=|I_{1}|+(p-\frac{1}{C})\cdot|I_{2}\cup I_{3}|.$ To bound the variance of $S$ , we note that each point in $I_{1}$ and each $x_{\ell}$ for $\ell\leq s$ is deterministically in $S$ , each point in $T_{\ell}\backslash\{x_{\ell}\}$ for $\ell\leq s$ is independently selected with probability $p^{\prime}$ , and the number of points from each $T_{\ell}$ for $\ell>s$ is some independent random variable bounded by $|T_{\ell}|\leq C$ . So, the variance can be bounded by $(|T_{1}|+\cdots+|T_{s}|)+\sum_{\ell=s+1}^{r}|T_{\ell}|^{2}\leq\max_{s+1\leq\ell\leq r}|T_{\ell}|\cdot\left(|T_{1}|+\cdots+|T_{r}|\right)\leq C\cdot|I_{2}\cup I_{3}|$ . So, by Chebyshev’s inequality, with probability at least $1-\frac{1}{10C}$ ,

	$\displaystyle\|S\|$	$\displaystyle\leq\|I_{1}\|+(p-\frac{1}{C})\cdot\|I_{2}\cup I_{3}\|+\sqrt{10C\cdot C\cdot\|I_{2}\cup I_{3}\|}$
		$\displaystyle\leq\|I_{1}\|+p\|I_{2}\cup I_{3}\|-\frac{1}{C}\|I_{2}\cup I_{3}\|+\sqrt{10}C\cdot\sqrt{\|I_{2}\cup I_{3}\|}$
		$\displaystyle\leq\|I_{1}\|+p\|I_{2}\cup I_{3}\|=k,$

where the final inequality is true since $|I_{2}\cup I_{3}|\geq 100C^{4}$ .

Next, we bound the expected cost of $S$ . First, consider running the final step 6 of the LMP algorithm on $(I_{1},I_{2},I_{3})$ using probability $p^{\prime}$ . This would produce a set $S_{0}$ such that

$\displaystyle\mathbb{E}\left[\text{cost}(\mathcal{D},S_{0})\right]$	$\displaystyle\leq\rho(p^{\prime})\cdot(1+O(\varepsilon))\cdot\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[\|S_{0}\|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
	$\displaystyle=\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(\|I_{1}\|+\left(p-\frac{2}{C}\right)\cdot\|I_{2}\cup I_{3}\|\right)\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
	$\displaystyle=\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{2}{C}\cdot\left(\lambda-\frac{1}{n}\right)\cdot\|I_{2}\cup I_{3}\|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.$	(16)

Above, the first line follows by Lemma 5.13, the second line follows by definition of $p^{\prime}$ and $S_{0}$ , and the third line follows from the fact that $|I_{1}|+p|I_{2}\cup I_{3}|=k$ .

Now, note that if we had performed the final step of the LMP algorithm on $(I_{1},I_{2},I_{3})$ using probability $1/2$ instead of $p^{\prime}$ , the set (call it $S_{1}$ ) would have satisfied

\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(|I_{1}|+\frac{1}{2}(|I_{2}|+|I_{3}|)\right)+4\gamma\cdot\text{OPT}_{k^{\prime}}\geq\mathbb{E}\left[\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)\right]\geq 0

by Lemmas 5.11 and 5.9. This means that $\sum_{j\in\mathcal{D}}\alpha_{j}\geq(\lambda-\frac{1}{n})\cdot(|I_{1}|+1/2\cdot|I_{2}\cup I_{3}|)-4\gamma\cdot\text{OPT}_{k^{\prime}}.$ Therefore, again using the fact that $|I_{1}|+p\cdot|I_{2}\cup I_{3}|=k$ , we have that

\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\geq\left(\lambda-\frac{1}{n}\right)\cdot\left(\frac{1}{2}-p\right)\cdot|I_{2}\cup I_{3}|-4\gamma\cdot\text{OPT}_{k^{\prime}}.

We can rewrite this to obtain

\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|\leq\frac{1}{1/2-p}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right)+O(\gamma)\cdot\text{OPT}_{k^{\prime}},

(17)

since $p\in[0.01,0.49]$ .

In this and the next paragraph, we prove that $\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})]$ . To see why, we consider a coupling of the randomness to generate a sequence of sets $S_{0},S_{1},\dots,S_{s}=S$ . To do so, for each point $i\in I_{1}$ , we automatically include $i\in S=S_{s}$ and $i\in S_{h}$ for all $0\leq h<s$ . Now, for each $1\leq\ell\leq r$ , we first create a temporary set $\tilde{T}_{\ell}\subset T_{\ell}\backslash\{x_{\ell}\}$ by including each point $i\in T_{\ell}\backslash\{x_{\ell}\}$ to be in $\tilde{T}_{\ell}$ independently with probability $2p$ . Then, we create two sets $T_{\ell}^{(0)}\subset T_{\ell}$ and $T_{\ell}^{(1)}\subset T_{\ell}$ as follows. For $T_{\ell}^{(0)}$ , we include each point in $\tilde{T}_{\ell}$ independently, with probability $1/2$ , and always include $x_{\ell}\in T_{\ell}^{(0)}$ . For $T_{\ell}^{(1)}$ , we flip a fair coin: if the coin lands heads, we only include $x_{\ell}$ , but if the coin lands tails, we do not include $x_{\ell}$ but include all of $\tilde{T}_{\ell}$ . We remark that overall, $T_{\ell}^{(0)}$ includes each point in $T_{\ell}\backslash\{x_{\ell}\}$ independently with probability $p$ .

Now, for each $0\leq h\leq s$ , we define $S_{h}:=\left(\bigcup_{1\leq\ell\leq h}T_{\ell}^{(0)}\right)\cup\left(\bigcup_{h<\ell\leq r}T_{\ell}^{(1)}\right)$ . One can verify that $S$ and $S_{0}$ have the desired distribution, since $S_{0}=\bigcup_{1\leq\ell\leq r}T_{\ell}^{(1)}$ is precisely the distribution obtained after applying step 6 of the LMP algorithm on $(I_{1},I_{2},I_{3})$ , but $S$ takes $T_{\ell}^{(0)}$ instead of $T_{\ell}^{(1)}$ for each $\ell\leq s$ , which is precisely the desired distribution for $S$ (as we defined $S$ at the beginning of this lemma’s proof). To show that $\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})]$ , it suffices to show that $\mathbb{E}[\text{cost}(\mathcal{D},S_{h})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1})]$ for all $1\leq h\leq s$ . However, note that because of our coupling, the only difference between $S_{h}$ and $S_{h-1}$ relates to points in $T_{h}$ . If we let $S_{h-1}^{\prime}=S_{h-1}\cup\{x_{\ell}\}$ be the set that always includes $x_{\ell}$ but includes the entirety of $\tilde{T}_{h}$ with probability $1/2$ , then clearly $\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1}^{\prime})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1})]$ . In addition, if we condition on the set $\tilde{T}_{h}$ , the only difference between $S_{h-1}^{\prime}$ and $S_{h}$ is that $S_{h}$ includes each point in $\tilde{T}_{h}$ with $1/2$ probability, whereas $S_{h-1}$ either includes the entirety of $\tilde{T}_{h}$ with $1/2$ probability or includes none of $\tilde{T}_{h}$ . Therefore, by Proposition 5.5, using the negative-submodularity of $k$ -means [17], we have that $\mathbb{E}[\text{cost}(\mathcal{D},S_{h})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1}^{\prime})]$ . So, we have that $\mathbb{E}[\text{cost}(\mathcal{D},S_{h})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1}^{\prime})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1})]$ , which means that

\mathbb{E}[\text{cost}(\mathcal{D},S)]=\mathbb{E}[\text{cost}(\mathcal{D},S_{s})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{s-1})]\leq\cdots\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{1})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})].

(18)

In summary,

	$\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S)]$	$\displaystyle\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})]$
		$\displaystyle\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{2}{C}\cdot\left(\lambda-\frac{1}{n}\right)\cdot\|I_{2}\cup I_{3}\|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left(1+\frac{2/C}{1/2-p}\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left(1+\frac{200}{C}\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.$

Above, the first line follows from Equation (18), the second line follows from Equation (16), the third line follows from Equation (17), and the fourth line follows since $1/2-p\geq 0.01.$ So, with probability at least $1-\frac{1}{10C}$ , $|S|\leq k$ , and by Markov’s inequality,

\text{cost}(\mathcal{D},S)\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left(1+\frac{300}{C}\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}

with probability at least $\frac{10}{C}$ . So, both of these hold simultaneously with probability at least $\frac{9}{C},$ and by repeating the procedure $O(C)$ times, we will find our desired set $S$ with probability $9/10$ . ∎

Our upper bound on the cost has so far been based on terms of the form $\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k.$ We note that this value is at most roughly $\text{OPT}_{k}$ . Specifically, we note the following:

Proposition 5.15.

If $\alpha^{(\ell+1)}$ is a feasible solution to $\text{DUAL}(\lambda),$ then $\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{1}{n}\right)\cdot k\leq\text{OPT}_{k}.$

Proof.

Recall that $\alpha_{j}=\min(\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}),$ and that $\{\alpha_{j}^{(\ell+1)}\}$ is a feasible solution to $\text{DUAL}(\lambda+\frac{1}{n})$ . Therefore, by duality, we have that

\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{1}{n}\right)\cdot k\leq\sum_{j\in\mathcal{D}}\alpha_{j}^{(\ell+1)}-\left(\lambda+\frac{1}{n}\right)\cdot k\leq\text{OPT}_{k}.\qed

One potential issue is that if our goal is to obtain a good approximation to optimal $k$ -means, the $\gamma\cdot\text{OPT}_{k^{\prime}}$ error, which should be negligible, may appear too large if $k^{\prime}$ is smaller than $k$ . To fix this, we show that $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ in certain cases, which we later show we will satisfy. For the following lemma, we return to considering a single roundable solution $\mathcal{S}=(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ , and let $\mathcal{V}$ represent the corresponding set of tight or special facilities corresponding to $\mathcal{S}.$

Lemma 5.16.

Let $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ be $(\lambda^{\prime},k^{\prime})$ -roundable for some $\lambda^{\prime}\geq 0$ , where $\mathcal{F}_{S}=\emptyset$ and the set of corresponding bad clients is $\mathcal{D}_{B}=\emptyset$ . Define $\{H(\delta)\}$ as the corresponding family of conflict graphs, with some fixed nested quasi-independent set $(I_{1},I_{2},I_{3})$ . Suppose that $k\leq\min\left(n-1,|I_{1}|+p\cdot|I_{2}\cup I_{3}|\right)$ for some $p\leq 0.49$ , and that $k^{\prime}=\min(k,|I_{1}|)$ . Then, $\text{OPT}_{k^{\prime}}=O\left(\text{OPT}_{k}\right)$ .

Proof.

If $k^{\prime}=k$ , the result is trivial. So, we assume that $k^{\prime}=|I_{1}|$ .

Since $\mathcal{D}_{B}$ is empty, we have that every client $j$ has a tight witness $w(j)$ (since there are no special facilities) such that $(1+\varepsilon)\cdot\alpha_{j}\geq c(j,w(j))$ and $(1+\varepsilon)\cdot\alpha_{j}\geq\tau_{w(j)}$ . In addition, we have that for any $i\in I_{1}\cup I_{2}\cup I_{3}$ , $i$ is tight which means $\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=z_{i}\in[\lambda^{\prime},\lambda^{\prime}+\frac{1}{n}]$ . Therefore,

\sum_{j\in\mathcal{D}}\left[\alpha_{j}-\sum_{i\in N(j)\cap I_{1}}(\alpha_{j}-c(j,i))\right]=\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{i\in I_{1}}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda^{\prime}\cdot|I_{1}|.

Then, we can use the LMP approximation to get that

\sum_{j\in\mathcal{D}}c(j,I_{1})\leq\rho_{1}\cdot(1+O(\varepsilon))\cdot\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in N(j)\cap I_{1}}(\alpha_{j}-c(j,i))\right)\leq\rho_{1}\cdot(1+O(\varepsilon))\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda^{\prime}\cdot|I_{1}|\right),

where $\rho_{1}=\rho(0)$ , i.e., assuming that no point in $I_{2}$ or $I_{3}$ is included as part of the set. In addition, we know that if $S$ is created by including all of $I_{1}$ and each point in $I_{2}\cup I_{3}$ with probability $\frac{1}{2}$ , then

	$\displaystyle\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda^{\prime}\cdot\left(\|I_{1}\|+\frac{1}{2}\|I_{2}\cup I_{3}\|\right)$	$\displaystyle\geq\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{i\in S}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\right]$
		$\displaystyle=\sum_{j\in\mathcal{D}}\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]$
		$\displaystyle\geq 0.$

Above, the first inequality follows since $\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=z_{i}\geq\lambda^{\prime}$ for any tight $i$ , and the final inequality follows because of Lemma 5.9.

To summarize, we have that there exists a constant $\theta=\frac{1}{\rho_{1}\cdot(1+O(\varepsilon))}$ such that

	$\displaystyle\theta\cdot\sum_{j\in\mathcal{D}}c(j,I_{1})$	$\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\|I_{1}\|\cdot\lambda^{\prime}$	(19)
and
	$\displaystyle 0$	$\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\|I_{1}\|+\frac{1}{2}\cdot\|I_{2}\cup I_{3}\|\right)\cdot\lambda^{\prime}.$	(20)

Therefore, by taking a weighted average of Equations (19) and (20), we get

(1-2p)\cdot\theta\cdot\text{OPT}_{k^{\prime}}\leq(1-2p)\cdot\theta\cdot\sum_{j\in\mathcal{D}}c(j,I_{1})\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(|I_{1}|+p\cdot|I_{2}\cup I_{3}|\right)\cdot\lambda^{\prime}\leq\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot\lambda^{\prime},

where the first inequality is true since $k^{\prime}=|I_{1}|$ and the last inequality is true since $|I_{1}|+p\cdot|I_{2}\cup I_{3}|\geq k.$ Thus, since $p\leq 0.49$ and since $\rho_{1}=O(1)$ , we have that $\text{OPT}_{k^{\prime}}=O\left(\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot\lambda^{\prime}\right)$ .

Finally, since $\{\alpha_{j}\}$ is a feasible solution to $\text{DUAL}(\lambda^{\prime}+\frac{1}{n})$ , this means that $\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot\lambda^{\prime}=\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot(\lambda^{\prime}+\frac{1}{n})+\frac{k}{n}\leq\text{OPT}_{k}+\frac{k}{n}.$ However, if $k\leq n-1$ , then $\frac{k}{n}\leq 1\leq\text{OPT}_{k}$ . So, $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ . ∎

Recall that $H(\delta)$ represents the conflict graph $H^{(\ell,r)}(\delta)$ . We will also let $H^{\prime}(\delta)$ represent the conflict graph $H^{(\ell,r+1)}(\delta)$ . In that case, $H^{\prime}(\delta)$ is the same as $H(\delta)$ except with one vertex removed. Recall $(I_{1},I_{2},I_{3})$ was a nested quasi-independent set of $\{H(\delta)\}$ , and let $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ be a nested quasi-independent set of $\{H^{\prime}(\delta)\}$ , such that $|I_{1}\backslash I_{1}^{\prime}|\leq 1$ and $|I_{1}|+p_{1}|I_{2}\cup I_{3}|\geq k>|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|$ .

Theorem 5.17.

Let $C>0$ be an arbitrarily large constant, and $\varepsilon<0$ be an arbitrarily small constant. Given the sets $(I,I^{\prime})$ obtained in Algorithm 2, in polynomial time we can obtain a solution for Euclidean $k$ -means with approximation factor at most

\left(1+\varepsilon\right)\cdot\max_{r\geq 1}\min\left(\rho\left(\frac{p_{1}}{r}\right),\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{r}{2p_{1}}-1\right)}\right)\right).

(21)

Proof.

First, we remark that it suffices to obtain a set of facilities of size at most $k+c_{0}\cdot C^{4}$ with cost at most $K\cdot\text{OPT}_{k-c_{0}\cdot C^{4}}$ for any fixed constant $c_{0}$ (for all $1\leq k\leq n-1$ ), where $K$ is the value in Equation (21). Indeed, we can apply Lemma 5.2 to obtain a solution of size $k-c_{0}\cdot C^{4}$ and cost $(1+\varepsilon)\cdot K\cdot\text{OPT}_{k-c_{0}\cdot C^{4}}$ in polynomial time, hence obtaining a $(1+\varepsilon)\cdot K$ -approximate solution for $(k-c_{0}\cdot C^{4})$ means clustering for all $1\leq k\leq n-1$ . Thus, we get the desired approximation ratio for all $k\leq n-c_{0}C^{4}-1$ , but for $k\geq n-c_{0}C^{4}-1$ , we can enumerate all the $(2n)^{c_{0}C^{4}-1}$ different clusterings of the input that have at most $c_{0}C^{4}-1$ non-singleton parts and solve $k$ -clustering exactly in $n^{O(C^{4})}$ time.

Algorithm 2 stops once we have found the first hybrid conflict graph $\mathcal{V}^{(\ell,r)}$ for some $r\geq 1$ where the corresponding nested quasi-independent set $(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)})$ satisfies $|I_{1}^{(\ell,r)}|+p_{1}\cdot|I_{2}^{(\ell,r)}\cup I_{3}^{(\ell,r)}|<k$ . Let $\mathcal{V}^{\prime}:=\mathcal{V}^{(\ell,r)}$ and let $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}):=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)})$ . In addition, let $\mathcal{V}:=\mathcal{V}^{(\ell,r-1)}$ and $(I_{1},I_{2},I_{3}):=(I_{1}^{(\ell,r-1)},I_{2}^{(\ell,r-1)},I_{3}^{(\ell,r-1)})$ . If $r=1$ then $r-1=0$ , which may be problematic since our previous lemmas can only be used for $r\geq 1$ . However, we note that $I^{(\ell,0)}=I^{(\ell)}=I^{(\ell-1,p_{\ell-1})}$ if $\ell\geq 1$ , and that $I^{(0,0)}=I^{(0)}$ was previously labeled as $I^{(q)}=I^{(q-1,p_{q-1})}.$ The only exception to this is the case when $\ell=0,r=1$ , and $I^{(0,0)}$ is the initialized solution created in the first line of the algorithm. In this case, however, recall from our initialization that $I_{1}^{(0,0)}=\mathcal{F}$ is the full set of facilities, and $I_{1}^{(0,1)}$ will just be an extension of this set, so $|I_{1}^{(0,1)}|\geq|\mathcal{F}|\geq k$ . Therefore, $I=(I_{1},I_{2},I_{3})$ and $I^{\prime}=(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ are both expressible as nested quasi-independent sets of merged conflict graphs. However, if $I=I^{(0,1)}$ and $I^{\prime}=I^{(0,0)}$ , then we may need to express $I^{\prime}=I^{(q-1,p_{q-1}+1)}$ based on a previous labeling, so it is possible that $I$ comes from a $(\lambda+\frac{1}{n},k^{\prime})$ -roundable solution and $I^{\prime}$ comes from a $(\lambda,k^{\prime})$ -roundable solution, rather than both nested quasi-independent sets coming from $(\lambda,k^{\prime})$ -roundable solutions.

First, we show that for the value of $k^{\prime}$ at the end of the algorithm (which means all solutions found are $(\lambda^{\prime},k^{\prime})$ -roundable for some $\lambda^{\prime}$ ), that $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ . To see why this is the case, note that either $k^{\prime}=k$ , so the claim is trivial, or $k^{\prime}=|I_{1}^{(0)}|$ for some $I_{1}^{(0)}$ that corresponds to a solution that was at some point labeled as $\mathcal{S}^{(0)}$ . Note that the corresponding nested quasi-independent set $(I_{1}^{(0)},I_{2}^{(0)},I_{3}^{(0)})$ is not the final set $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ , because if so we would have stopped the algorithm before we decided to label $\mathcal{S}^{(0)}$ as such. Therefore, $k\leq|I_{1}^{(0)}|+p_{1}\cdot|I_{2}^{(0)}\cup I_{3}^{(0)}|$ and we are assuming that $k\leq n-1.$ Finally, since $(I_{1}^{(0)},I_{2}^{(0)},I_{3}^{(0)})$ arises from a $(\lambda^{\prime},k^{\prime})$ -roundable solution with no special facilities or bad clients (by Condition 2), we may apply Lemma 5.16 to obtain that $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ .

Note that $|I_{1}^{\prime}|\geq|I_{1}|-1$ , and that $|I_{1}^{\prime}|\leq|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|<k$ , which means that $|I_{1}|<k+1$ so $|I_{1}|\leq k$ . First, suppose that $|I_{1}|\geq k-100C^{4}$ , where we recall that $C$ is an arbitrarily large but fixed constant. In this case, this means that $|I_{1}^{\prime}|\geq k-100C^{4}-1$ and $p_{1}\cdot|I_{2}^{\prime}\cup I_{3}^{\prime}|\leq 100C^{4}+1$ so $|I_{2}^{\prime}\cup I_{3}^{\prime}|=O(C^{4}).$ In this case, we can apply Lemma 5.13 to find a randomized set $I_{1}^{\prime}\subset S\subset I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime}$ such that

\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[|S|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}},

where $\{\alpha_{j}^{\prime}\}_{j\in\mathcal{D}}$ corresponds to the merged solution that produces $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ . Since $|I_{2}^{\prime}\cup I_{3}^{\prime}|=O(C^{4})$ , we can try every possible $I_{1}^{\prime}\subset S\subset I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime}$ to get a deterministic set $S$ of size at most $|I_{1}^{\prime}|+|I_{2}^{\prime}\cup I_{3}^{\prime}|\leq k+O(C^{4})$ and size at least $|I_{1}^{\prime}|\geq k-O(C^{4})$ such that

	$\displaystyle\text{cost}(\mathcal{D},S)$	$\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda-\frac{1}{n}\right)\cdot\|S\|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda+\frac{2}{n}\right)\cdot\|S\|\right]+O(1/n)\cdot\|S\|+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{\|S\|}+O(\gamma)\cdot\text{OPT}_{k}.$

The final line follows by Proposition 5.15, since $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ , and since $|S|\leq k+O(C^{4})\leq O(k)$ so $O(1/n)\cdot|S|=O(1)\leq O(\gamma)\cdot\text{OPT}_{k}$ . Therefore, there exists an absolute constant $c_{0}$ such that we have a set of size at most $|I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime}|\leq k+c_{0}\cdot C^{4}$ with cost at most $\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{|S|}+O(\gamma)\cdot\text{OPT}_{k}\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{k-c_{0}\cdot C^{4}}$ . As argued in the first paragraph of this proof, this is sufficient since we can apply Lemma 5.2.

Otherwise, namely when $|I_{1}|\leq k-100C^{4}$ , we have $|I_{2}\cup I_{3}|\geq 100C^{4}$ since $|I_{1}|+p_{1}|I_{2}\cup I_{3}|\geq k$ . Then, recall that $|I_{1}\backslash I_{1}^{\prime}|\leq 1,$ so let $\kappa=|I_{1}\backslash I_{1}^{\prime}|\in\{0,1\}$ . Set $t>0$ and $c\geq 0$ such that $|I_{1}|+p_{1}|I_{2}\cup I_{3}|=k+c\cdot t$ and $|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|=k-t.$ Then, $|I_{1}\cap I_{1}^{\prime}|=k-(1+d)t$ for some $d\geq 0$ , so $|I_{1}|=k-(1+d)t+\kappa$ . In this case, $p_{1}|I_{2}\cup I_{3}|=(1+c+d)t-\kappa$ , so if we set $p=p_{1}\cdot\frac{(1+d)t-\kappa}{(1+c+d)t-\kappa}$ , then $|I_{1}|+p|I_{2}\cup I_{3}|=k$ . Also, since $|I_{2}\cup I_{3}|\geq 100C^{4}$ , we have that $(1+c+d)t\geq p_{1}\cdot 100C^{4}\geq C$ , so $p=p_{1}\cdot\frac{1+d}{1+c+d}-\eta$ for some $\eta\leq 1/C$ . In this case, assuming that $p>0.01,$ we can use Lemma 5.14 to obtain a solution of size at most $k$ with cost at most

\rho\left(p_{1}\cdot\frac{1+d}{1+c+d}-\eta-\frac{2}{C}\right)\cdot\left(1+\frac{300}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

Now, since $p_{1}\cdot\frac{1+d}{1+c+d}\leq p_{1}=.402$ , it is straightforward to verify that $\rho$ has bounded derivative. (Indeed, each case produces a function that is continuously differentiable on $[0,0.5)$ , so has bounded derivative on $[0,0.402]$ .) Therefore, since $\eta\leq\frac{1}{C}$ , the solution in fact has cost at most

\rho\left(p_{1}\cdot\frac{1+d}{1+c+d}\right)\cdot\left(1+O\left(\frac{1}{C}+\varepsilon\right)\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

But since $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ , and since

\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k=\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{2}{n}\right)\cdot k+\frac{3k}{n}\leq\text{OPT}_{k}+3\leq(1+O(\gamma))\cdot\text{OPT}_{k},

we obtain a solution of cost at most

\rho\left(p_{1}\cdot\frac{1+d}{1+c+d}\right)\cdot\left(1+O\left(\frac{1}{C}+\varepsilon+\gamma\right)\right)\cdot\text{OPT}_{k}.

(22)

In addition, note that we can use Lemma 5.14 to obtain a solution $S$ for $(k+ct)$ -means and $S^{\prime}$ for $(k-t)$ -means. Also, $|S\cup S^{\prime}|=|S|+|S^{\prime}|-|S\cap S^{\prime}|\leq|S|+|S^{\prime}|-|I_{1}\cap I_{1}^{\prime}|=k+(c+d)t.$ So, if we define $\rho^{\prime}:=\max_{0\leq\eta\leq 1/C}\rho(p_{1}-\eta-\frac{2}{C})\cdot(1+\frac{300}{C})\cdot(1+O(\varepsilon))=\rho(p_{1})\cdot(1+O(\frac{1}{C}+\varepsilon))$ , then

\text{cost}(\mathcal{D},S^{\prime})\leq\rho^{\prime}\cdot\left(\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\right)+O(\gamma)\cdot\text{OPT}_{k^{\prime}}

(23)

and

\text{cost}(\mathcal{D},S\cup S^{\prime})\leq\text{cost}(\mathcal{D},S)\leq\rho^{\prime}\cdot\left(\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k+ct)\right)+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

(24)

Therefore, by Proposition 5.6, if we randomly add $t$ of the items in $S\backslash S^{\prime}$ , we will get a set $S^{\prime\prime}$ of size $k$ with expected cost

$\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})]$	$\displaystyle\leq\rho^{\prime}\cdot\Bigg{(}\frac{c+d}{1+c+d}\cdot\Bigg{(}\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\Bigg{)}$
	$\displaystyle\hskip 96.73918pt+\frac{1}{1+c+d}\cdot\Bigg{(}\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k+ct)\Bigg{)}\Bigg{)}+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
	$\displaystyle=\rho^{\prime}\cdot\Bigg{(}\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{d}{1+c+d}\cdot\left(\lambda-\frac{1}{n}\right)\cdot t\Bigg{)}+O(\gamma)\cdot\text{OPT}_{k}.$	(25)

Note that if $\lambda-\frac{1}{n}\leq 0$ , then this means that $\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})]\leq\rho^{\prime}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}\right)+O(\gamma)\cdot\text{OPT}_{k},$ and Proposition 5.15 tells us that $\sum_{j\in\mathcal{D}}\alpha_{j}\leq\text{OPT}_{k}+O(k/n)\leq(1+O(\gamma))\cdot\text{OPT}_{k}$ , so the expected cost is at most $\rho(p_{1})\cdot(1+O(\frac{1}{C}+\varepsilon+\gamma))\cdot\text{OPT}_{k}.$ Alternatively, we may assume that $\lambda-\frac{1}{n}>0$ .

In this case, note that by Lemmas 5.9 and 5.11, we have that

\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(|I_{1}|+\frac{1}{2}|I_{2}\cup I_{3}|\right)+4\gamma\cdot\text{OPT}_{k^{\prime}}\geq 0.

We can rewrite $|I_{1}|+\frac{1}{2}|I_{2}\cup I_{3}|$ as $(k-(1+d)t+\kappa)+\frac{1}{2p_{1}}\cdot((1+c+d)t-\kappa)=k+t\cdot\left(\frac{1}{2p_{1}}(1+c+d)-(1+d)\right)-O(1)\geq k+t\cdot\left(\frac{1}{2p_{1}}(1+c+d)-(1+d)\right)\cdot\left(1-\frac{O(1)}{C^{4}}\right)$ , where the last inequality is true since $t\cdot\left[\frac{1}{2p_{1}}(1+c+d)-(1+d)\right]\geq\Omega(t\cdot(1+c+d))\geq\Omega(|I_{2}\cup I_{3}|)\geq 100C^{4}$ . Thus, we have that

\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+4\gamma\cdot\text{OPT}_{k^{\prime}}\geq\left(\lambda-\frac{1}{n}\right)\cdot t\cdot\left(\frac{1}{2p_{1}}(1+c+d)-(1+d)\right)\cdot\left(1-\frac{O(1)}{C^{4}}\right).

Therefore, combining the above equation with (25), we have that

$\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})]$	$\displaystyle\leq\rho^{\prime}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+O(\gamma)\cdot\text{OPT}_{k^{\prime}}\right)\cdot\left(1+\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}\cdot\left(1+\frac{O(1)}{C^{4}}\right)\right)$
	$\displaystyle\leq\rho^{\prime}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{2}{n}\right)\cdot k+O(\gamma)\cdot\text{OPT}_{k}\right)\cdot\left(1+\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}\cdot\left(1+\frac{O(1)}{C^{4}}\right)\right)$
	$\displaystyle\leq\rho^{\prime}\cdot\left(1+O(\frac{1}{C}+\gamma)\right)\cdot\left(1+\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}\right)\cdot\text{OPT}_{k}.$	(26)

If we set $r\geq 1$ such that $c=(r-1)(1+d)$ , then $\frac{1+d}{1+c+d}=\frac{1}{r}$ and

\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}=\frac{d}{r(1+d)}\cdot\frac{1}{(\frac{r}{2p_{1}}-1)\cdot(1+d)}=\frac{d}{(1+d)^{2}}\cdot\frac{1}{r\cdot(\frac{r}{2p_{1}}-1)}\leq\frac{1}{4r\cdot(\frac{r}{2p_{1}}-1)}.

So, by combining Equations (22) and (26), we can always guarantee an approximation factor of at most

\left(1+O(\frac{1}{C}+\varepsilon+\gamma)\right)\cdot\max_{r\geq 1}\min\left(\rho\left(\frac{p_{1}}{r}\right),\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{r}{2p_{1}}-1\right)}\right)\right).

(27)

This approximation factor also holds in the case when $\lambda-\frac{1}{n}\leq 0$ , by setting $r=1$ . So, by letting $C$ be an arbitrarily large constant and $\gamma\ll\varepsilon\ll 1$ be arbitrarily small constants, the result follows. ∎

Since we have set $p_{1}=0.402$ and $\delta_{1}=\frac{4+8\sqrt{2}}{7},\delta_{2}=2,\delta_{3}=0.265$ , by Proposition 4.7 we have $\rho(p_{1})=3+2\sqrt{2}$ . If $r\geq 3.221,$ one can verify that $(3+2\sqrt{2})\cdot\left(1+\frac{1}{4r\cdot(\frac{r}{2p_{1}}-1)}\right)\leq 5.979$ . Alternatively, if $1\leq r\leq 3.221,$ then $\frac{p_{1}}{r}\geq\frac{.402}{3.221}\geq.1248,$ and it is straightfoward to verify that $\rho(p)\leq 5.979$ for all $p\in[.1248,.402]$ using Proposition 4.7.²²2We remark that while Proposition 4.7 follows from Lemma 5.19, Lemma 5.19 only depends on Lemma 4.2, so there is no circular reasoning. Hence, we have a $\boxed{5.979}$ -approximation to $k$ -means.

5.4 Improving the approximation further

First, we define some important quantities. For any client $j$ , we define $A_{j}:=\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))$ , and we define $B_{j}:=\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i))$ . Note that $B_{j}\geq 0$ always.

We split the clients into $5$ groups. Let $\mathcal{D}_{1}$ be the set of all clients $j\not\in\mathcal{D}_{B}$ corresponding to subcases 1.a, 1.c, 1.d, 1.g.ii, 1.h, 2.a, 3.a, 4.a.i, 4.b.i, and 4.c, as well as the clients in 5.a where there do not exist $i_{2}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}$ and $i_{3}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}$ such that $q(i_{3})=i_{2}$ . (In the casework, our choice of $a$ is $|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S|$ rather than $N(j)\cap S$ , similar for $b$ and $c$ .) Let $Q_{1}$ be the sum of $A_{j}$ for these clients, and $R_{1}$ be the sum of $B_{j}$ for these clients. Next, let $\mathcal{D}_{2}$ be the set of all clients $j\not\in\mathcal{D}_{B}$ corresponding to subcases 1.b, 1.e, 4.a.ii, and 4.b.ii. Let $\mathcal{D}_{3}$ be the set of all clients $j\not\in\mathcal{D}_{B}$ corresponding to subcase 1.g.i. Let $\mathcal{D}_{4}$ be the set of all clients $j\not\in\mathcal{D}_{B}$ corresponding to subcase 2.d, further restricted to $c(j,i_{2})+c(j,i_{3})\geq 0.25\cdot\alpha_{j}$ (or equivalently in the language of Case 2.d in Lemma 4.2, $\beta^{2}+\gamma^{2}\geq 0.25$ ). Finally, let $\mathcal{D}_{5}$ be the set of all bad clients $j\in\mathcal{D}_{B}$ , as well as all remaining subcases (2.b, 2.c, 2.d when $\beta^{2}+\gamma^{2}<0.25$ , 3.b, 3.c, and the clients in 5.a not covered by $\mathcal{D}_{1}$ ). Note these cover all cases (recall that 1.f is a non-existent case). Finally, we define $Q_{2},Q_{3},Q_{4},Q_{5},R_{2},R_{3},R_{4},R_{5}$ similarly to how we defined $Q_{1}$ and $R_{1}.$

Now, we have the following result, which improves over Lemma 5.9.

Lemma 5.18.

For any client $j$ , $A_{j}-\frac{1}{2}B_{j}\geq 0$ . In addition, if the client $j$ corresponds to any of the subcases in case $1$ or case $4$ , or to subcases $2.a$ or $3.a$ , then $A_{j}-B_{j}\geq 0$ . Also, if the client $j$ corresponds to subcase $2.d$ where $\beta^{2}+\gamma^{2}\geq 0.25$ , then $A_{j}-\frac{4}{7}B_{j}\geq 0$ .

Remark.

As in Lemma 5.9, this lemma holds even for bad clients $j\in\mathcal{D}_{B}$ .

Proof.

The proof that $A_{j}-\frac{1}{2}B_{j}\geq 0$ for any client $j$ is identical to that of Lemma 5.9. So, we focus on the next two claims. For the subcases in Case 1, note that $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}=\emptyset$ , so we just need to show that $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i))$ , which is implied by Equation (11). Likewise, for the subcases in Case 4, note that $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}=\emptyset$ , so we just need to show that $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i))$ , which is implied by Equation (12).

We recall that in subcases 2.a and 3.a, we noted in both cases that the points in $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}$ and $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}$ were all separated in $H(\delta_{2})$ , and that $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}=\emptyset$ since $a=0$ . So, we have that $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i))$ in both cases.

Finally, we consider subcase 2.d when $\beta^{2}+\gamma^{2}\geq 0.25.$ In this case, we have (when $\alpha_{j}=1$ ) that $1-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}=1$ , and $\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))=(1-c(j,i_{2}))+(1-c(j,i_{3}))=2-\beta^{2}-\gamma^{2}$ , which is at most $1.75$ if $\beta^{2}+\gamma^{2}\geq 0.25.$ So, for general $\alpha_{j}$ , we have that $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}\geq\frac{4}{7}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)).$ ∎

Therefore, we have that

R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq Q_{2},\hskip 28.45274ptR_{3}\leq Q_{3},\hskip 28.45274ptR_{4}\leq 1.75\cdot Q_{4},\hskip 14.22636pt\text{and}\hskip 14.22636ptR_{5}\leq 2Q_{5}.

(28)

Next, we define $\rho^{(i)}(p)$ to be the maximum fraction $\rho(p)$ obtained from the casework corresponding to the (not bad) clients in $\mathcal{D}_{i}$ . (Note that $\rho(p)=\max\left(\rho^{(1)}(p),\rho^{(2)}(p),\rho^{(3)}(p),\rho^{(4)}(p),\rho^{(5)}(p)\right)$ .) We have the following result:

Lemma 5.19.

Let $\delta_{1}=\frac{4+8\sqrt{2}}{7},$ $\delta_{2}=2$ , and $\delta_{3}=0.265$ . Then, for all $p\in[0.096,0.402],$ we have that $\rho^{(1)}(p)\leq 3+2\sqrt{2}$ , $\rho^{(2)}(p)\leq 1+2p+(1-p)\cdot\delta_{1}+2\sqrt{2p^{2}+(1-p)\cdot\delta_{1}}$ , and $\rho^{(5)}(p)\leq 5.68$ .

Proof.

We start by considering $\rho^{(1)}(p)$ , covered by subcases 1.a, 1.c, 1.d, 1.g.ii, 1.h, 2.a, 3.a, 4.a.i, 4.b.i, and 4.c, and certain subcases of 5.a. All subcases except 2.a, 4.c, and 5.a can easily be verified (see our Desmos file for K-means Case 1, the link is in Appendix A). For subcase 2.a, we have to verify it for all choices of $c\geq 1$ . However, it is simple to see that the numerator of the fraction decreases as $c_{2}$ increases whenever $p\in[0,0.5]$ , so in fact we just have to verify it for $c=c_{2}=1$ , which is straightforward. For subcase 4.c, we have to verify it for all choices of $c\geq 2$ . For $c=2$ it is straightforward to verify. For $c\geq 3,$ since $2.5+\sqrt{2}\leq 3+2\sqrt{2},$ it suffices to show

\frac{\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\right)\cdot(1+\sqrt{\delta_{1}})^{2}-\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c}\right)+p(1+\sqrt{2})^{2}+1}{1-p}\leq 3+2\sqrt{2},

where we have taken the fraction from 4.c and added back a $\frac{p}{c}$ term to the numerator. Now, this fraction is decreasing as $c$ increases, so it suffices to verify it for $c=3$ , which is straightforward.

The last case for $\rho^{(1)}(p)$ is Case 5.a. We show that in all cases the fraction is bounded by $3+2\sqrt{2}$ for $p\in[0.096,0.402]$ , and if $h\geq 1$ then the fraction can further be bounded by $5.68$ . This is clearly sufficient for bounding $\rho^{(1)}(p)$ . It will also be important in bounding $\rho^{(5)}(p)$ - indeed, if there exist $i_{2}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}$ and $i_{3}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}$ such that $q(i_{3})=i_{2},$ then regardless of the outcomes of the initial fair coins, $h\geq 1$ since exactly one of $i_{3}$ or $q(i_{3})=i_{2}$ will contribute to the value of $h$ .

First, we note that $T_{1}-T_{1}^{2}+T_{3}$ can be rewritten as

a+2ph-(a^{2}+4aph+4p^{2}h^{2})+\frac{\delta_{1}}{2}\cdot(a^{2}-a)+4pah+4p^{2}\cdot h(h-1)=\left(\frac{\delta_{1}}{2}-1\right)\cdot a(a-1)+2p(1-2p)\cdot h.

In the case where $a=1$ and $h\geq 1$ , we can therefore simplify the fraction in (5.a) to $\frac{1}{a+2ph}+\frac{2ph}{2p(1-2p)\cdot h}=\frac{1}{a+2ph}+\frac{1}{1-2p}\leq\frac{1}{1+2p}+\frac{1}{1-2p}=\frac{2}{1-4p^{2}}.$ This is at most $5.68$ for any $p\leq 0.402$ . When $a\geq 2$ , we can write the fraction as

\frac{1}{a+2ph}+\frac{(a-1)+(2p)h}{\left(\frac{\delta_{1}}{2}-1\right)\cdot a(a-1)+(2p)h\cdot(1-2p)}.

(29)

When $a\geq 2$ and $h=0$ , (29) can be simplified as

\frac{1}{a}+\frac{1}{a\cdot\left(\frac{\delta_{1}}{2}-1\right)}\leq\frac{1}{2}\left(1+\frac{1}{\frac{\delta_{1}}{2}-1}\right)=3+2\sqrt{2}.

	$\displaystyle\frac{1}{a+2ph}+\frac{(a-1)+(2p)h}{\left(\frac{\delta_{1}}{2}-1\right)\cdot a(a-1)+(2p)h\cdot(1-2p)}$	$\displaystyle=\frac{1}{2+2ph}+\frac{[1+2p]+2p(h-1)}{\left[\left(\frac{\delta_{1}}{2}-1\right)\cdot 2+2p\right]+2p(h-1)\cdot(1-2p)}$
		$\displaystyle\leq\frac{1}{a+2p}+\max\left(\frac{1}{\left(\frac{\delta_{1}}{2}-1\right)\cdot a},\frac{1}{1-2p}\right).$

When $a=2$ and $h\geq 1$ , we can rewrite (29) as

	$\displaystyle\frac{1}{2+2ph}+\frac{1+2ph}{\delta_{1}-2+2ph(1-2p)}$	$\displaystyle=\frac{1}{2+2ph}+\frac{1+2p+2p(h-1)}{\delta_{1}-2+2p(1-2p)+2p(1-2p)(h-1)}$
		$\displaystyle\leq\frac{1}{2+2p}+\max\left(\frac{1+2p}{\delta_{1}-2+2p(1-2p)},\frac{1}{1-2p}\right),$

which is easily verifiable to be at most $5.68$ for $p\in[0.096,0.402]$ . When $a\geq 3$ and $h\geq 1$ , (29) is at most

\frac{1}{3}+\max\left(\frac{1}{3\cdot\left(\frac{\delta_{1}}{2}-1\right)},\frac{1}{1-2p}\right),

which is easily verifiable to be at most $5.68$ for $p\in[0.096,0.402]$ . The final case is when $a=1,h=0$ , but here we saw in our analysis of 5.a that the fraction is at most $1$ , or that the numerator and denominator are both $0$ .

Next, consider $\rho^{(2)}(p)$ , which is covered by subcases 1.b, 1.e, 4.a.ii, and 4.b.ii. Indeed, since $\delta_{2}=2$ , these all have the exact same bound of $1+2p+(1-p)\delta_{1}+2\sqrt{2p^{2}+(1-p)\delta_{1}}$ .

Finally, we deal with $\rho^{(5)}(p)$ , which deals with subcases 2.b, 2.c, 3.b, 3.c, and 5.a, along with 2.d when $\beta^{2}+\gamma^{2}<0.25$ .

Subcase 2.b can be easily verified to be at most $5.664$ in the range $p\in[0.096,0.402]$ when $c_{2}=0$ and $c=c_{1}$ is between $2$ and $5$ . Beyond this, we assume that $c_{1}\geq 6$ , so we can apply the crude bound that the fraction is at most

\frac{\frac{1}{2}+\left(1-p-\frac{1}{2}\cdot\left(1-(1-2p)^{6}\right)\right)\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}}{1-2p},

which is at most $5.68$ for $p\in[0.096,0.402]$ . It is easy to verify that the fraction in Subcase 2.c is at most $5.68$ for $p\in[0.096,0.402]$ .

Subcase 3.b is easy to verify for $2\leq b\leq 5.$ For $b\geq 6$ , we can apply the crude bound that the fraction is at most

\frac{(1-p)^{5}\cdot(1-2p)\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}+1}{1-\left(1+\frac{2-\delta_{3}}{7}\right)p},

which trivially satisfies the desired bounds. Finally, we note that in Subcase 3.c, the fraction decreases as $c_{1}$ and $c_{2}$ increase, so we may assume that either $c_{1}=c_{2}=1$ or $c_{1}=2$ and $c_{2}=0$ . These are easy to verify for $2\leq b\leq 5$ , and for $b\geq 6$ , we may apply a crude bound to say the fraction is at most

\frac{1+(1-p)^{5}\cdot(1-2p)\cdot\left((1+\sqrt{\delta_{1}})^{2}-1\right)}{1-2p}

as long as $c_{1}\geq 1$ and $c_{2}\geq 0.$ This is at most $5.68$ in the range $[0.096,0.402]$ .

Subcase 5.a was dealt with previously (as we only have to consider when $h\geq 1$ ), so the final case is 2.d when $\beta^{2}+\gamma^{2}<0.25$ . In this case, we recall the fraction is

\frac{(1-2p)\cdot\min(1+\sqrt{\delta_{1}},\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot(\beta^{2}+\gamma^{2})}{1-(2-\beta^{2}-\gamma^{2})\cdot p},

where $t\geq 1$ , $\beta+\gamma\geq\sqrt{\delta_{3}\cdot t}$ , and also $\beta^{2}+\gamma^{2}\leq 0.25$ . By the symmetry of $\beta$ and $\gamma$ , we may replace $\max(\beta,\gamma)$ with $\beta$ . So, by defining $\zeta=\beta^{2}+\gamma^{2},$ we can upperbound the above expression by

\frac{(1-2p)\cdot(\beta+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot\zeta}{1-2p+p\cdot\zeta}\leq\frac{(1-2p)\cdot(\beta+\sqrt{\delta_{1}/\delta_{3}}\cdot(\beta+\gamma))^{2}+p\cdot\zeta}{1-2p+p\cdot\zeta},

since $\zeta\leq 0.25$ and since $\beta+\gamma\geq\sqrt{\delta_{3}\cdot t}$ . By Cauchy-Schwarz, $\left(\beta\cdot x+\gamma\cdot y\right)^{2}\leq(\beta^{2}+\gamma^{2})\cdot(x^{2}+y^{2})\leq\zeta\cdot(x^{2}+y^{2})$ . So, we can bound the above expression by

\frac{(1-2p)\cdot\zeta\cdot\left((1+\sqrt{\delta_{1}/\delta_{3}})^{2}+\delta_{1}/\delta_{3}\right)+p\cdot\zeta}{1-2p+p\cdot\zeta}=\frac{(1-2p)\cdot\left((1+\sqrt{\delta_{1}/\delta_{3}})^{2}+\delta_{1}/\delta_{3}\right)+p}{\frac{1-2p}{\zeta}+p}.

For $p\leq 0.5,$ this fraction clearly increases with $\zeta$ , so we maximize this when $\zeta=0.25$ . When setting $\zeta=0.25$ , this can easily be verified to be at most $5.68$ for all $p\in[0.096,0.5]$ .

This concludes all cases, thus proving the proposition. ∎

Next, we recall Lemma 5.11. First, by setting $S$ to be $I_{1}$ in Lemma 5.11, we obtain that

\sum_{i=1}^{5}Q_{i}=\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\right)\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot|I_{1}|+4\gamma\cdot\text{OPT}_{k^{\prime}}.

(30)

Next, by setting $S$ to be $I_{2}\cup I_{3}$ in Lemma 5.11, we obtain that

\sum_{i=1}^{5}R_{i}=\sum_{j\in\mathcal{D}}\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i))\geq\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|-4\gamma\cdot\text{OPT}_{k^{\prime}}.

(31)

Next, we recall Lemma 5.13. By splitting $\mathbb{E}[\text{cost}(\mathcal{D},S)]$ based on whether $j$ is in $\mathcal{D}_{1}$ , $\mathcal{D}_{2}$ , $\mathcal{D}_{3}$ , $\mathcal{D}_{4}$ , $\mathcal{D}_{5}\backslash\mathcal{D}_{B}$ , or $\mathcal{D}_{B}$ , we obtain that

\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq(1+O(\varepsilon))\cdot\left[\sum_{i=1}^{5}\rho^{(i)}(p)\cdot(Q_{i}-p\cdot R_{i})\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

Therefore, the argument of Lemma 5.14 implies that if $|I_{1}|+p\cdot|I_{2}\cup I_{3}|=k$ , if $p\in[0.01,0.49]$ , and if $|I_{2}\cup I_{3}|\geq 100C^{4}$ , then we can choose a set $I_{1}\subset S\subset I_{1}\cup I_{2}\cup I_{3}$ such that $|S|\leq k$ and

	$\displaystyle\text{cost}(\mathcal{D},S)$	$\displaystyle\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(p-\frac{2}{C}\right)\cdot\left(Q_{i}-\left(p-\frac{2}{C}\right)\cdot R_{i}\right)\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(p\right)\cdot(Q_{i}-p\cdot R_{i})\right]+O(\gamma)\cdot\text{OPT}_{k}.$		(32)

To explain the second line, note that $\rho^{(i)}$ has bounded derivative on $[0.01,0.49]$ and that $Q_{i}\geq 0.5\cdot R_{i}$ . Therefore, since $p\in[0.01,0.49]$ , $\rho^{(i)}\left(p-\frac{2}{C}\right)=\rho^{(i)}(p)\cdot\left(1+O(1/C)\right)$ , and $Q_{i}-p\cdot R_{i}=\Omega(R_{i})$ which means $Q_{i}-\left(p-\frac{2}{C}\right)\cdot R_{i}=(Q_{i}-p\cdot R_{i})\cdot\left(1+O(1/C)\right)$ . In addition, we still have that $\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k})$ , as in our proof of Theorem 5.17.

We now return to the setup of Theorem 5.17, where we have $(I_{1},I_{2},I_{3})$ and $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ . Suppose that $|I_{1}|+p_{1}|I_{2}\cup I_{3}|=k+ct$ , $|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|=k-t$ , $|I_{1}\cap I_{1}^{\prime}|=k-(1+d)t$ , and $|I_{1}\backslash I_{1}^{\prime}|=\kappa\in\{0,1\}$ . In addition, suppose that $|I_{1}|\geq k-100C^{4},$ which means that $|I_{1}^{\prime}|\geq k-100C^{4}-1$ . In this case, we may follow the same approach as in our Theorem 5.17 to obtain a $\rho(p_{1})\cdot(1+O(\varepsilon))$ -approximation to $k$ -means.

Alternatively, we may suppose that $|I_{1}|\leq k-100C^{4}$ , which implies that $|I_{2}\cup I_{3}|\geq 100C^{4}$ . Then, defining $r\geq 1$ such that $c=(r-1)\cdot(1+d)$ , we can use Equation (32) to find a solution of size at most $k$ with cost at most

\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right)\right]+O(\gamma)\cdot\text{OPT}_{k},

(33)

in the same manner as (32), by setting $p=p_{1}\cdot\frac{(1+d)t-\kappa}{(1+c+d)t-\kappa}=\frac{p_{1}}{r}-O(1/C)$ . Alternatively, we can obtain two separate solutions $I_{1}\subset S\subset I_{1}\cup I_{2}\cup I_{3}$ of size $k+ct$ , and a solution $I_{1}^{\prime}\subset S^{\prime}\subset I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime}$ of size $k-t$ , such that $|S\cup S^{\prime}|=k+(c+d)t$ . We have that

\text{cost}(\mathcal{D},S\cup S^{\prime})\leq\text{cost}(\mathcal{D},S)\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(p_{1}\right)\cdot(Q_{i}-p_{1}\cdot R_{i})\right]+O(\gamma)\cdot\text{OPT}_{k}.

Finally, using the bound (23) for the cost of $S^{\prime}$ , we have

\text{cost}(\mathcal{D},S^{\prime})\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\rho(p_{1})\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\right)+O(\gamma)\cdot\text{OPT}_{k}.

Note that we are not able to use a more sophisticated bound for $\text{cost}(\mathcal{D},S^{\prime})$ , because our values of $\{Q_{i}\}$ and $\{R_{i}\}$ only apply to $(I_{1},I_{2},I_{3})$ and not to $(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime})$ . By combining the solutions $S\cup S^{\prime}$ and $S^{\prime}$ , by adding $t$ random points from $S\backslash S^{\prime}$ to $S^{\prime}$ , and using Proposition 5.6, we obtain a solution $S^{\prime\prime}$ with expected cost

\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})]\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\Biggr{[}\frac{1}{r(1+d)}\cdot\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})\\ +\left(1-\frac{1}{r(1+d)}\right)\cdot\rho(p_{1})\cdot\Biggr{(}\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\Biggr{)}\Biggr{]}+O(\gamma)\cdot\text{OPT}_{k}.

(34)

This is because we combine the solution $S\cup S^{\prime}$ , which has size $k+(c+d)t$ , with the solution $S^{\prime}$ , which has size $k-t,$ so we assign the first solution relative weight $\frac{1}{1+c+d}=\frac{1}{r(1+d)}$ and the second solution relative weight $\frac{c+d}{1+c+d}=1-\frac{1}{r(1+d)}$ .

Now, let $\mathfrak{D}$ equal $\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k$ . Then, since $|I_{1}|+\frac{p_{1}}{r}|I_{2}\cup I_{3}|\geq k$ , we can combine Equations (30) and (31) to get that

	$\displaystyle\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right)$	$\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(\|I_{1}\|+\frac{p_{1}}{r}\|I_{2}\cup I_{3}\|\right)+O(\gamma)\cdot\text{OPT}_{k}$
		$\displaystyle\leq\mathfrak{D}+O(\gamma)\cdot\text{OPT}_{k}.$		(35)

Next, recall (by Equation (33)) that we have a solution of size at most $k$ with cost at most

\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right)\right]+O(\gamma)\cdot\text{OPT}_{k}.

(36)

Finally, we note that since $|I_{2}\cup I_{3}|=\frac{r(1+d)t-\kappa}{p_{1}}\geq(1-O(1/C))\cdot\frac{r(1+d)t}{p_{1}},$ we have that

\sum_{i=1}^{5}R_{i}+O(\gamma)\cdot\text{OPT}_{k}\geq\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|\geq\left(1-O(\frac{1}{C})\right)\cdot\frac{r(1+d)}{p_{1}}\cdot\left(\lambda-\frac{1}{n}\right)\cdot t.

Therefore, we can bound the expected cost of $S^{\prime\prime}$ by

\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\Biggr{[}\frac{1}{r(1+d)}\cdot\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})\\ +\left(1-\frac{1}{r(1+d)}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}+\frac{p_{1}}{r(1+d)}\cdot\sum_{i=1}^{5}R_{i}\right)\Biggr{]}+O(\gamma)\cdot\text{OPT}_{k}.

(37)

Now, we have that $r\geq 1$ , and if we let $\theta=\frac{1}{1+d}$ , we have that $\theta\in[0,1]$ . Hence, to show that we obtain an approximation $\rho+O(\varepsilon+\gamma+1/C)$ , it suffices to show that for all choices of $\theta\in[0,1]$ and $r\geq 1,$ that if we let $\mathfrak{D}^{\prime}=\mathfrak{D}+O(\gamma)\cdot\text{OPT}_{k}$ , one cannot simultaneously satisfy

$\displaystyle\mathfrak{D}^{\prime}$	$\displaystyle\geq\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right)$	(38)
$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\frac{\theta}{r}\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta}{r}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+p_{1}\cdot\frac{\theta}{r}\sum_{i=1}^{5}R_{i}\right)$	(39)
$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right)$	(40)

and

R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq Q_{2},\hskip 28.45274ptR_{3}\leq Q_{3},\hskip 28.45274ptR_{4}\leq 1.75Q_{4},\hskip 28.45274ptR_{5}\leq 2Q_{5}.

(41)

Indeed, we already know that (38) is true (same as (35)) and that (41) is true (same as (28)). So if we can show we can’t simultaneously satisfy all of (38), (39), (40), and (41), then either (39) or (40) is false. But we have a clustering with at most $k$ centers and cost at most the right hand sides of each of (39) and (40) up to a $1+O(1/C+\varepsilon+\gamma)$ multiplicative factor, due to (37) and (36), respectively. Therefore, we successfully obtain a solution of cost at most $\rho\cdot\left(1+O(1/C+\varepsilon+\gamma)\right)\cdot\mathfrak{D}^{\prime}$ . Moreover, $\mathfrak{D}^{\prime}\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+O(\gamma)\cdot\text{OPT}_{k}\leq(1+O(\gamma))\cdot\text{OPT}_{k}$ , since $\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{2}{n}\right)\cdot k\leq\text{OPT}_{k}$ by Proposition 5.15 as both $\alpha^{(\ell)},\alpha^{(\ell+1)}$ are solutions to $\text{DUAL}(\lambda+\frac{1}{n})$ , and since $\frac{3k}{n}\leq 3\leq O(\gamma)\cdot\text{OPT}_{k}$ . Therefore, $\mathfrak{D}^{\prime}\leq(1+O(\gamma))\cdot\text{OPT}_{k}$ , which means that we have found a $\rho\cdot(1+O(1/C+\varepsilon+\gamma))$ approximation to $k$ -means clustering.

Indeed, by numerical analysis of these linear constraints and based on the functions $\rho^{(i)}$ , we obtain a $\boxed{5.912}$ -approximation algorithm for Euclidean $k$ -means clustering. We defer the details to Appendix C.

6 Improved Approximation Algorithm for $k$ -median

6.1 Improvement to $1+\sqrt{2}$ -approximation

In this subsection, we show that a $1+\sqrt{2}+\varepsilon$ -approximation can be obtained by a simple modification of the Ahmadian et al. [1] analysis. Because we use the same algorithm as [1], the reduction from an LMP algorithm to a full polynomial-time algorithm is identical, so it suffices to improve the analysis of their LMP algorithm to a $1+\sqrt{2}$ -approximation. The main difficulty in this subsection will be obtaining a tight bound on the norms (as opposed to squared norms) of points that are pairwise separated, which we prove in Lemma 6.1. In the next subsection, we show how to break the $1+\sqrt{2}$ barrier that this algorithm runs into, which will follow a similar approach to our improved $k$ -means algorithm.

We first recall the setup of the LMP approximation of [1]. Let $c(j,i)=d(j,i)$ be the distance between a client $j\in\mathcal{D}$ and a facility $i\in\mathcal{F}$ . Suppose we have a solution $\alpha$ to $\text{DUAL}(\lambda)$ , such that every client $j$ has a tight witness $w(j)\in\mathcal{F}$ with $\alpha_{j}\geq t_{w(j)}$ and $\alpha_{j}\geq c(j,w(j)).$ Recall that $t_{i}=\max_{j\in N(i)}\alpha_{j}$ , where $N(i)=\{j\in\mathcal{D}:\alpha_{j}>c(j,i)\}$ , and likewise, $N(j)=\{i\in\mathcal{F}:\alpha_{j}>c(j,i)\}$ . Now, we let the conflict graph $H(\delta)$ on tight facilities (i.e., facilities $i$ with $\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=\lambda$ ) have an edge $(i,i^{\prime})$ if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}})$ .

We let $\delta=\sqrt{2}$ and return a maximal independent set $I$ of $H(\delta)$ as our LMP-approximation. It suffices to show that for each client $j\in\mathcal{D},$ that $c(j,I)\leq(1+\sqrt{2})\cdot\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right).$ To see why, by adding over all clients $j$ , we obtain that

\text{cost}(\mathcal{D},I)\leq(1+\sqrt{2})\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{i\in I}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\right)=(1+\sqrt{2})\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda\cdot|I|\right).

Finally, since $\alpha$ is a feasible solution to $\text{DUAL}(\lambda),$ this implies that $\text{cost}(\mathcal{D},I)\leq(1+\sqrt{2})\cdot\text{OPT}_{|I|}.$

Before we verify the LMP approximation, we need the following lemma about points in Euclidean space.

Lemma 6.1.

Let $h\geq 2$ and suppose that $x_{1},\dots,x_{h}$ are points in Euclidean space $\mathbb{R}^{d}$ (for some $d$ ) such that $\|x_{i}-x_{j}\|_{2}^{2}\geq 2$ for all $i\neq j$ . Then, $\sum_{i=1}^{h}\|x_{i}\|_{2}\geq\sqrt{h\cdot(h-1)}$ .

Proof.

Note that for any positive real numbers $t_{1},t_{2},\dots,t_{h}$ that add to $1$ , we have that

\sum_{i=1}^{h}t_{i}\cdot\|x_{i}\|_{2}^{2}\geq\sum_{i=1}^{h}t_{i}\cdot\|x_{i}\|_{2}^{2}-\left\|\sum t_{i}x_{i}\right\|_{2}^{2}=\sum_{i<j}t_{i}t_{j}\|x_{i}-x_{j}\|_{2}^{2}\geq 2\cdot\sum_{i<j}t_{i}t_{j}.

Then, by setting $a_{i}=\|x_{i}\|_{2}$ for each $i$ and scaling by $t_{1}+\cdots+t_{h}$ accordingly to remove the assumption that $t_{1}+\cdots+t_{h}=1$ , we have that

\left(\sum_{i=1}^{h}t_{i}\cdot a_{i}^{2}\right)\cdot\left(\sum_{i=1}^{h}t_{i}\right)\geq 2\cdot\sum_{i<j}t_{i}t_{j}

for all $t_{1},\dots,t_{h}\geq 0$ . Now, if some $a_{i}=0$ , then $\|x_{j}\|_{2}=\|x_{i}-x_{j}\|_{2}\geq\sqrt{2}$ , which means that $\sum_{i=1}^{h}\|x_{j}\|_{2}\geq(h-1)\cdot\sqrt{2}\geq\sqrt{h(h-1)}$ for all $h\geq 2$ . Alternatively, $a_{i}\neq 0$ for any $i$ , so we can set $t_{i}=\frac{1}{a_{i}}$ , to obtain that

\left(\sum_{i=1}^{h}a_{i}\right)\cdot\left(\sum_{i=1}^{h}\frac{1}{a_{i}}\right)\geq 2\cdot\sum_{i<j}\frac{1}{a_{i}a_{j}}.

(42)

From now on, for any polynomial $P(a_{1},\dots,a_{h})$ , we denote $\sum_{\text{sym}}P(a_{1},\dots,a_{h})$ to be the sum of all distinct terms of the form $P(a_{\pi(1)},\dots,a_{\pi(h)})$ over all permutations of $[h]$ . For instance, $\sum_{\text{sym}}a_{1}a_{2}a_{3}=\sum_{1\leq i<j<k\leq h}a_{i}a_{j}a_{k}$ and $\sum_{\text{sym}}a_{1}^{2}a_{2}=\sum_{1\leq i<j\leq h}a_{i}^{2}a_{j}+\sum_{1\leq j<i\leq h}a_{i}^{2}a_{j}$ .

In the case when $h=2$ , this means that $(a_{1}+a_{2})\cdot\frac{a_{1}+a_{2}}{a_{1}a_{2}}\geq\frac{2}{a_{1}a_{2}},$ so $a_{1}+a_{2}\geq\sqrt{2}=\sqrt{h(h-1)}.$ Alternatively, we assume that $h\geq 3$ . When $h\geq 3$ , note that

$\displaystyle\left(\sum_{\text{sym}}a_{1}\cdots a_{h-2}\right)\cdot\left(\sum a_{i}\right)$	$\displaystyle=(h-1)\cdot\sum_{\text{sym}}a_{1}\cdots a_{h-1}+\sum_{\text{sym}}a_{1}^{2}a_{2}\cdots a_{h-2}$
	$\displaystyle\geq\left[(h-1)+\frac{h(h-1)(h-2)/2}{h}\right]\cdot\sum_{\text{sym}}a_{1}\cdots a_{h-1}$
	$\displaystyle=\frac{h(h-1)}{2}\cdot\sum_{\text{sym}}a_{1}\cdots a_{h-1},$	(43)

where the second line above follows by Muirhead’s inequality. Therefore, we have that

	$\displaystyle\left(\sum_{i=1}^{h}a_{i}\right)^{2}$	$\displaystyle\geq\left(\sum_{i=1}^{h}a_{i}\right)\cdot\frac{h(h-1)}{2}\cdot\frac{\sum_{\text{sym}}a_{1}\cdots a_{h-1}}{\sum_{\text{sym}}a_{1}\cdots a_{h-2}}$
		$\displaystyle=\left(\sum_{i=1}^{h}a_{i}\right)\cdot\frac{h(h-1)}{2}\cdot\frac{\sum_{\text{sym}}\frac{1}{a_{i}}}{\sum_{\text{sym}}\frac{1}{a_{i}a_{j}}}$
		$\displaystyle\geq 2\cdot\frac{h(h-1)}{2}$
		$\displaystyle=h(h-1),$

where the first line follows from (43) and the third line follows from (42). Therefore, we indeed have that $\sum_{i=1}^{h}\|x_{i}\|_{2}\geq\sqrt{h(h-1)}$ . ∎

To verify the LMP approximation, it suffices to show that for every $j$ , $c(j,I)\leq(1+\sqrt{2})\cdot\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right).$ We split this up into $3$ cases.

Case 1: $\boldsymbol{|I\cap N(j)|=0}$ .

In this case, $d(j,I)\leq d(j,w(j))+d(w(j),I)$ by the Triangle Inequality. But we know that $d(j,w(j))\leq\alpha_{j}$ , and that $d(w(j),I)\leq\sqrt{2}\cdot t_{w(j)}\leq\sqrt{2}\cdot\alpha_{j}$ , using the fact that $I$ is a maximal independent set so $w(j)$ has some neighbor of $I$ in the conflict graph. Thus, $d(j,I)\leq(1+\sqrt{2})\cdot\alpha_{j}$ . However, since $N(j)\cap I=\emptyset$ , this means that $\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right)=\alpha_{j}$ . So, the desired inequality holds.

Case 2: $\boldsymbol{|I\cap N(j)|=1}$ .

In this case, let $i_{1}$ be the unique point in $N(j)\cap I$ . Then, $d(j,I)\leq d(j,i_{1})$ . In addition, $\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right)=\alpha_{j}-(\alpha_{j}-c(j,i_{1}))=c(j,i_{1})=d(j,i_{1})$ . Since $d(j,i_{1})\geq 0$ , the desired inequality holds (even with a ratio of $1<1+\sqrt{2}$ ).

Case 3: $\boldsymbol{|I\cap N(j)|=s\geq 2}$ .

In this case, let $i_{1},\dots,i_{s}$ be the set of points in $N(j)\cap I$ . Then, we know that $d(i_{r},i_{r^{\prime}})\geq\delta\cdot\min(t_{i_{r}},t_{i_{r^{\prime}}})$ for any $r\neq r^{\prime}$ . But $t_{i_{r}},t_{i_{r^{\prime}}}\geq\alpha_{j}$ by definition of $t_{i}$ (since $i_{r},i_{r^{\prime}}\in N(j)$ ), so this means that $d(i_{r},i_{r^{\prime}})\geq\sqrt{2}\cdot\alpha_{j}$ for every $r\neq r^{\prime}$ .

Now, by applying Lemma 6.1, we have that $\sum_{r=1}^{s}d(j,i_{r})\geq\sqrt{s\cdot(s-1)}\cdot\alpha_{j}$ . Now, let $t=\frac{1}{\alpha_{j}}\cdot\sum_{r=1}^{s}d(j,i_{r})$ , so $t\geq\sqrt{s\cdot(s-1)}$ . Then, $d(j,I)\leq\min_{1\leq r\leq s}d(j,i_{r})\leq\frac{1}{s}\cdot\sum_{r=1}^{s}d(j,i_{r})=\frac{T}{s}\cdot\alpha_{j}$ . Ina ddition, we have that $\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))=\alpha_{j}-s\cdot\alpha_{j}+T\cdot\alpha_{j}=(T-(s-1))\cdot\alpha_{j}$ . So, the ratio is

\frac{T/s}{T-(s-1)}\leq\frac{\sqrt{s\cdot s-1}/s}{\sqrt{s(s-1)}-(s-1)}=\frac{1}{s-\sqrt{s(s-1)}}\leq 2.

Above, the first inequality follows because as $T$ increases, the numerator increases at a slower rate than the denominator, so assuming that the fraction is at least $1$ , we wish for $T$ to be as small as possible to maximize the fraction. The final inequality holds because $s-\sqrt{s(s-1)}\geq\frac{1}{2}$ for all $s\geq 2$ . Therefore, the desired inequality holds (even with a ratio of $2<1+\sqrt{2}$ ).

So in fact, there is a simple improvement from the $1+\sqrt{8/3}\approx 2.633$ approximation algorithm to a $1+\sqrt{2}\approx 2.414$ algorithm. A natural question is whether this can be improved further without any significant changes to the algorithm or analysis. Indeed, there only seems to be one bottleneck, when $|I\cap N(j)|=0$ , so naturally one may assume that by slightly reducing $\delta=\sqrt{2}$ , the approximation from Case 1 should improve below $1+\sqrt{2}$ and the approximation from Case 3 should become worse than $2$ , but can still be below $1+\sqrt{2}$ .

Unfortunately, such a hope cannot be realized. Indeed, if we replace $\delta=\sqrt{2}$ with some $\delta<\sqrt{2}$ , we may have that $d(j,i_{1})=d(j,i_{2})=\cdots=d(j,i_{s})=\delta\cdot\sqrt{\frac{s-1}{2s}}\cdot\alpha_{j}$ and the pairwise distances are all exactly $\delta\cdot\alpha_{j}$ between each $i_{r},i_{r^{\prime}}$ . However, in this case, $\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))=\alpha_{j}\cdot\left(1-s+\delta\cdot\sqrt{s(s-1)/2}\right),$ which for $\delta<\sqrt{2}$ is in fact negative for sufficiently large $s$ . Hence, even for $\delta=\sqrt{2}-\varepsilon$ for a very small choice of $\varepsilon>0$ , we cannot even guarantee a constant factor approximation with this analysis approach. So, this approach gets stuck at a $1+\sqrt{2}$ approximation.

In the following subsection, we show how an improved LMP approximation algorithm for Euclidean $k$ -median, breaking the $1+\sqrt{2}$ approximation barrier. We will then show that we can also break this barrier for a polynomial-time $k$ -median algorithm as well.

6.2 An improved LMP algorithm for Euclidean $k$ -median

Recall the conflict graph $H:=H(\delta)$ , where we define two tight facilities $(i,i^{\prime})$ to be connected if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}).$ We set parameters $\delta_{1}\geq\delta_{2}\geq\delta_{3}$ and $0<p<1$ , and define $V_{1}$ to be the set of all tight facilities. Given the set of tight facilities $V_{1}$ and conflict graphs $H(\delta)$ for all $\delta>0$ , our algorithm works by applying the procedure described in Algorithm 3 to $V_{1}$ .

Algorithm 3 Generate a Nested Quasi-Independent Set of

V_{1}

, as well as a set of centers

S

providing an LMP approximation for Euclidean

k

-median

LMPMedian( $V_{1},\{H(\delta)\},\delta_{1},\delta_{2},\delta_{3},p$ ):

1:Create a maximal independent set

I_{1}

H(\delta_{1})

2:Let

V_{2}

be the set of points in

V_{1}\backslash I_{1}

that are not adjacent to

I_{1}

H(\delta_{2})

3:Create a maximal independent set

I_{2}

of the induced subgraph

H(\delta_{1})[V_{2}]

4:Let

V_{3}

be the set of points

i

V_{2}\backslash I_{2}

such that there is exactly one point in

I_{2}

that is a neighbor of

i

H(\delta_{1})

, there are no points in

I_{1}

that are neighbors of

i

H(\delta_{2})

, and there are no points in

I_{2}

that are neighbors of

i

H(\delta_{3})

5:Create a maximal independent set

I_{3}

of the induced subgraph

H(\delta_{1})[V_{3}]

6:Note that every point

i\in I_{3}

has a unique adjacent neighbor

q(i)\in I_{2}

H(\delta_{1})

. We create the final set

S

as follows:

•

Include every point $i\in I_{1}$ .
•

For each point $i\in I_{2}$ , flip a fair coin. If the coin lands heads, include $i$ with probability $2p$ . Otherwise, include each point in $q^{-1}(i)$ independently with probability $2p$ .

As in the $k$ -means case, we consider a more general setup, so that we can convert the LMP approximation to a full polynomial-time algorithm. Instead of $V_{1},$ let $\mathcal{V}\subset\mathcal{F}$ be a subset of facilities and let $\mathcal{D}$ be the full set of clients. For each $j\in\mathcal{D},$ let $\alpha_{j}\geq 0$ be some real number, and for each $i\in\mathcal{V}$ , let $t_{i}\geq 0$ be some real number. In addition, for each client $j\in\mathcal{D}$ , we associate with it a set $N(j)\subset\mathcal{V}$ and a “witness” facility $w(j)\in\mathcal{V}$ . Finally, suppose that we have the following assumptions:

1.

For any client $j\in\mathcal{D}$ , the witness $w(j)\in\mathcal{V}$ satisfies $\alpha_{j}\geq t_{w(j)}$ and $\alpha_{j}\geq c(j,w(j))$ .
2.

For any client $j\in\mathcal{D}$ and any facility $i\in N(j)$ , $t_{i}\geq\alpha_{j}>c(j,i)$ .

Then, for the graph $H(\delta)$ on $\mathcal{V}$ where $i,i^{\prime}\in\mathcal{V}$ are connected if and only if $c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}})$ (recall that now, $c(i,i^{\prime})=d(i,i^{\prime})$ instead of $d(i,i^{\prime})^{2}$ ), we have the following main lemma.

Lemma 6.2.

Fix $\delta_{1}=\sqrt{2}$ , $\delta_{2}=1.395$ , and $\delta_{3}=2-\sqrt{2}\approx 0.5858,$ and let $p<0.337$ be variable. Now, let $S$ be the randomized set created by applying Algorithm 3 on $V_{1}=\mathcal{V}$ . Then, for any $j\in\mathcal{D}$ ,

\mathbb{E}[c(j,S)]\leq\rho(p)\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right],

where $\rho(p)$ is some constant that only depends on $p$ (since $\delta_{1},\delta_{2},\delta_{3}$ are fixed).

Proof.

As in the $k$ -means case, we fix $j\in\mathcal{D}$ , and we do casework based on the sizes of $a=|I_{1}\cap N(j)|$ , $b=|I_{2}\cap N(j)|$ , and $c=|I_{3}\cap N(j)|$ .

Case 1: $\boldsymbol{a=0,b=1,c=0}$ .

Let $i_{2}$ be the unique point in $I_{2}\cap N(j),$ and let $i^{*}=w(j)$ be the witness of $j$ . We have the following subcases:

$\boldsymbol{i^{*}\not\in V_{2}}$ . In this case, either $i^{*}\in I_{1}$ so $d(i^{*},I_{1})=0$ , or there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})\leq\delta_{2}$ . So, $d(j,I_{1})\leq 1+\delta_{2}$ . In addition, we have that $i_{2}\in S$ with probability $p$ . So, if we let $t:=d(j,i_{2})$ , we can bound the ratio by

\frac{p\cdot t+(1-p)\cdot(1+\delta_{2})}{1-p(1-t)}=\frac{p\cdot t+(1-p)\cdot(1+\delta_{2})}{p\cdot t+(1-p)}\leq 1+\delta_{2},

(1.a’)

since $t\geq 0$ .

b)

$\boldsymbol{i^{*}\in V_{3}}.$ In this case, there exists $i_{3}\in I_{3}$ (possibly $i_{3}=i^{*}$ ) such that $d(i^{*},i_{3})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{3}}).$ In addition, there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})$ . In addition, we have that $t_{i^{*}}\leq\alpha_{j}=1$ . Finally, since $I_{3}\subset V_{2}$ , we must have that $d(i_{1},i_{3})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{3}})$ . If we condition on $i_{2}\in S$ , then the numerator and denominator both equal $c(j,i_{2})$ , so the fraction is $1$ (or $0/0$ ). Else, if we condition on $i_{2}\not\in S$ , then the denominator is $1$ , and $i_{3}\in S$ with probability either $p$ or $\frac{p}{1-p}>p$ . Therefore, $\mathbb{E}[d(j,S)|i_{2}\not\in S]\leq p\cdot\|i_{3}-j\|_{2}+(1-p)\cdot\|i_{1}-j\|_{2}$ . We can bound this (we defer the details to Appendix B) by

$\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}},$ (1.b’)

where $X=p^{2}+p(1-p)\cdot T$ and $Y=(1-p)^{2}+\frac{p(1-p)}{T}.$

In the remaining cases, we may assume that $i^{*}\in V_{2}\backslash V_{3}$ . Then, one of the following must occur:

$\boldsymbol{i^{*}=i_{2}}$ . In this case, define $t=d(j,i^{*})\in[0,1]$ , and note that $d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq t+\delta_{1}$ . So, with probability $p$ , we have that $d(j,S)\leq d(j,i^{*})=t$ , and otherwise, we have that $d(j,S)\leq d(j,I_{1})=t+\delta_{1}$ . So, we can bound the ratio by

\max_{0\leq t\leq 1}\frac{p\cdot t+(1-p)\cdot(t+\delta_{1})}{1-p\cdot(1-t)}=\max_{0\leq t\leq 1}\frac{t+(1-p)\delta_{1}}{p\cdot t+(1-p)}.

For $p$ such that $1/p>\delta_{1},$ it is clear that this function increases as $t$ increases, so it is maximized when $t=1$ , which means we can bound the ratio by

1+(1-p)\cdot\delta_{1}.

(1.c’)

d)

$\boldsymbol{i^{*}\in I_{2}}$ but $\boldsymbol{i^{*}\neq i_{2}}$ . First, we recall that $d(j,i^{*})\leq 1$ . Now, let $t=d(j,i_{2})$ . In this case, with probability $p$ , $d(j,S)=t$ (if we select $i_{2}$ to be in $S$ ), with probability $p(1-p)$ , $d(j,S)\leq 1$ (if we select $i^{*}$ but not $i_{2}$ to be in $S$ ), and in the remaining event of $(1-p)^{2}$ probability, we still have that $d(j,S)\leq d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq 1+\delta_{1}.$ So, we can bound the ratio by

$\max_{0\leq t\leq 1}\frac{p\cdot t+p(1-p)\cdot 1+(1-p)^{2}\cdot(1+\delta_{1})}{1-p\cdot(1-t)}.$

Note that this is maximized when $t=0$ (since the numerator and denominator increase at the same rate when $t$ increases), so we can bound the ratio by

$\frac{p(1-p)+(1-p)^{2}\cdot(1+\delta_{1})}{1-p}=1+(1-p)\cdot\delta_{1}.$ (1.d’)
e)

There is more than one neighbor of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{1})}$ that is in $\boldsymbol{I_{2}}$ . In this case, there is some other point $i_{2}^{\prime}\in I_{2}$ not in $N(j)$ such that $d(i^{*},i_{2}^{\prime})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}).$ So, we have four points $j,i^{*},i_{1}\in I_{1},i_{2}^{\prime}\in I_{2}$ such that $d(j,i^{*})\leq 1,$ $d(i^{*},i_{2}^{\prime})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}),$ $d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}}),$ and $d(i_{1},i_{2}^{\prime})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}^{\prime}}).$

If we condition on $i_{2}\in S$ , then the denominator equals $c(j,i_{2})$ and the numerator is at most $c(j,i_{2})$ , so the fraction is $1$ (or $0/0$ ). Else, if we condition on $i_{2}\not\in S$ , then the denominator is $1$ , and the numerator is at most $p\cdot\|i_{2}^{\prime}-j\|_{2}^{2}+(1-p)\cdot\|i_{1}-j\|_{2}^{2}$ . Note that $d(j,i^{*})\leq 1$ , that $t_{i^{*}}\leq 1$ , and that $\delta_{2},\delta_{1}\geq\delta_{2}$ . So, as in case b), the overall fraction is at most

$\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}},$ (1.e’)

where $X=p^{2}+p(1-p)\cdot T$ and $Y=(1-p)^{2}+\frac{p(1-p)}{T}.$

There are no neighbors of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{1})}$ that are in $\boldsymbol{I_{2}}$ . In this case, $d(i^{*},i_{2})\geq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}}).$ Define $t=\min(t_{i^{*}},t_{i_{2}})$ . Since $d(j,i^{*})\leq 1,$ by the triangle inequality we have that $d(j,i_{2})\geq\max\left(0,\delta_{1}\cdot t-1\right)$ . In addition, we still have that $d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq 1+\delta_{1}\cdot t_{i^{*}},$ and $d(j,I_{1})\leq d(j,i_{2})+d(i_{2},I_{1})\leq 1+\delta_{1}\cdot t_{i_{2}}$ , so together we have that $d(j,I_{1})\leq 1+\delta_{1}\cdot t$ . Since $i_{2}\in S$ with probability $p$ , the ratio is at most

	$\displaystyle\hskip 14.22636pt\max_{0\leq t\leq 1}\max_{d(j,i_{2})\geq\delta_{1}\cdot t-1}\frac{p\cdot d(j,i_{2})+(1-p)\cdot(1+\delta_{1}\cdot t)}{1-p(1-d(j,i_{2}))}$
	$\displaystyle=\max_{0\leq t\leq 1}\max_{d(j,i_{2})\geq\delta_{1}\cdot t-1}\frac{p\cdot d(j,i_{2})+(1-p)\cdot(1+\delta_{1}\cdot t)}{p\cdot d(j,i_{2})+(1-p)}.$

It is clear that this function is decreasing as $d(j,i_{2})$ is increasing (and nonnegative). So, we may assume WLOG that $d(j,i_{2})=\max(0,\sqrt{\delta_{2}\cdot t}-1)$ to bound this ratio by

\max_{0\leq t\leq 1}\frac{p\cdot\max(0,\delta_{1}\cdot t-1)+(1-p)\cdot(1+\delta_{1}\cdot t)}{p\cdot\max(0,\delta_{1}\cdot t-1)+(1-p)}

If $\delta_{1}\cdot t-1\leq 0$ , then $\delta_{1}\cdot t+1\leq 2,$ so we can bound the above equation by $2$ . Otherwise, the above fraction can be rewritten as $\frac{\delta_{1}\cdot t+(1-2p)}{p\cdot\delta_{1}\cdot t+(1-2p)}$ . For $p<0.5,$ this is maximized when $t=1$ over the range $t\in[0,1]$ , so we can bound the ratio by

\frac{1-2p+\delta_{1}}{1-2p+p\cdot\delta_{1}}.

(1.f’)

There is a neighbor of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{3})}$ that is also in $\boldsymbol{I_{2}}$ . In this case, either $d(i^{*},i_{2})\leq\delta_{3}\cdot t_{i^{*}}$ so $d(i_{2},j)\geq\max(0,d(j,i^{*})-\delta_{3}\cdot t_{i^{*}})$ , or there is some other point $i_{2}^{\prime}\in I_{2}$ not in $N(j)$ such that $d(i^{*},i_{2}^{\prime})\leq\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}).$ If $d(i^{*},i_{2})\leq\delta_{3}\cdot t_{i^{*}}$ , then define $t=t_{i^{*}}$ and $u=d(j,i^{*})$ . In this case, $d(j,I_{1})\leq u+\delta_{1}\cdot t,$ and $d(j,i_{2})\geq\max(0,u-\delta_{3}\cdot t).$ So, the fraction is at most

\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot d(j,i_{2})}{1-p+p\cdot d(j,i_{2})}\leq\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot\max(0,u-t\cdot\delta_{3})}{1-p+p\cdot\max(0,u-t\cdot\delta_{3})}.

Since $t=t_{i^{*}}\leq 1$ and $d(j,i^{*})\leq 1$ , we can bound the overall fraction as at most

	$\displaystyle\hskip 14.22636pt\max_{0\leq t\leq 1}\max_{0\leq u\leq 1}\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot\max(0,u-t\cdot\delta_{3})}{1-p+p\cdot\max(0,u-t\cdot\delta_{3})}$
	$\displaystyle\leq\max\left(\delta_{1}+\delta_{3},\frac{1+\delta_{1}-p(\delta_{1}+\delta_{3})}{1-p\cdot\delta_{3}}\right)$		(1.g.i’)

We derive the final equality in Appendix B.

Alternatively, if $d(i^{*},i_{2}^{\prime})\leq\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}),$ then if we condition on $i_{2}\in S,$ the fraction is $1$ (or $0/0$ ), and if we condition on $i_{2}\not\in S$ , the denominator is $1$ and the numerator is at most $p\cdot d(j,i_{2}^{\prime})+(1-p)\cdot d(j,i_{1})\leq p\cdot(1+\delta_{3})+(1-p)\cdot(1+\delta_{1}).$ (Note that $i_{2}\in S$ and $i_{2}^{\prime}\in S$ are independent.) Therefore, we can also bound the overall fraction by

p\cdot(1+\delta_{3})+(1-p)\cdot(1+\delta_{1}).

(1.g.ii’)

h)

There is a neighbor of $\boldsymbol{i^{*}}$ in $\boldsymbol{H(\delta_{2})}$ that is also in $\boldsymbol{I_{1}}$ . In this case, $i^{*}$ would not be in $V_{2}$ , so we are back to sub-case 1.a’.

Case 2: $\boldsymbol{a=0,b=0,c\leq 1}$ .

We again let $i^{*}$ be the witness of $j$ . In this case, if $i\not\in V_{2}$ , then there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})\leq\delta_{2}$ , in which case $d(j,I_{1})\leq 1+\delta_{2}.$ Otherwise, there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})$ , and there exists $i_{2}\in I_{2}$ such that $d(i^{*},i_{2})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}})$ . Finally, in this case we also have that $d(i_{1},i_{2})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}})$ . Now, we consider two subcases, either $c=0$ or $c=1$ .

a)

$\boldsymbol{c=0.}$ In this case, we have that the denominator is $1$ , and the numerator is either at most $1+\delta_{2}$ , or is at most $p\cdot\|j-i_{2}\|_{2}+(1-p)\cdot\|j-i_{1}\|_{2}$ , where $d(j,i^{*})\leq 1$ , $d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})$ , $d(i^{*},i_{2})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}})$ , and $d(i_{1},i_{2})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}})$ . Hence, we can bound the overall fraction, by the same computation as in the $k$ -median subcase 1.b), as

$\max\left(1+\delta_{2},\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right),$ (2.a’)

where $X=p^{2}+p(1-p)\cdot T$ and $Y=(1-p)^{2}+\frac{p(1-p)}{T}.$
b)

$\boldsymbol{c=1.}$ In this case, let $i_{3}$ be the unique point in $N(j)\cap I_{3}.$ Then, conditioned on $i_{3}$ being in $S$ , the numerator and denominator both equal $d(j,i_{3})$ . Otherwise, the denominator is $1$ and we can bound the numerator the same way as in subcase 2a), since the probability of $i_{2}\in S$ is either $p$ (if $q(i_{3})\neq i_{2}$ ) or $\frac{p}{1-p}\geq p$ (if $q(i_{3})=i_{2}$ ). So, we can bound the overall fraction again as

$\max\left(1+\delta_{2},\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right),$ (2.b’)

where $X=p^{2}+p(1-p)\cdot T$ and $Y=(1-p)^{2}+\frac{p(1-p)}{T}.$

Case 3: $\boldsymbol{a=0}$ , all other cases.

Note that in this case, we may assume $b+c=|N(j)\cap(I_{2}\cup I_{3})|\geq 2$ , since we already took care of all cases when $a=0$ and $b+c\leq 1$ . We split into two main subcases.

Every point $\boldsymbol{i}$ in $\boldsymbol{N(j)\cap(I_{2}\cup I_{3})}$ satisfies $\boldsymbol{d(j,i)\geq\delta_{1}-1.}$ In this case, let $\accentset{\rule{2.79996pt}{0.7pt}}{I}\subset I_{2}\cup I_{3}$ represent the set of points selected to be in $S$ . Note that $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ is a random set.

Note that with probability at least $2p-2p^{2}$ , $|N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1$ . (Since $N(j)\cap(I_{2}\cup I_{3})$ has size at least $2$ , the probability of $\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j)$ being nonempty is minimized when $b=0,c=2$ , and the two points in $N(j)\cap I_{3}$ map to the same point under $q$ .) In this event, let $h=|N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}|$ , and let $r_{1},\dots,r_{h}$ represent the distances from $j$ to each of the points in $N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}$ . Then, by Lemma 6.1, $\frac{r_{1}+\cdots+r_{h}}{h}\geq\sqrt{\frac{h-1}{h}}.$ So, if we set $r=\frac{r_{1}+\cdots+r_{h}}{h},$ then $r\leq 2\left[r\cdot h-(h-1)\right]$ for any $h\geq 1$ and $r\geq\sqrt{\frac{h-1}{h}},$ which means that $\min r_{i}\leq\frac{r_{1}+\cdots+r_{h}}{h}\leq 2\cdot\left(1-\sum_{i=1}^{h}(1-r_{i})\right)$ .

In addition, if $|\accentset{\rule{2.79996pt}{0.7pt}}{I}|=1$ , then $1-\sum(1-r_{i})=r_{1}\geq\delta_{1}-1=\sqrt{2}-1$ , and otherwise, because every point in $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ is separated by at least $\delta_{1}=\sqrt{2}$ distance, $1-\sum(1-r_{i})\geq\sqrt{h(h-1)}-(h-1)\geq\sqrt{2}-1$ by Lemma 6.1. Overall, this means that whenever $h=|N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1,$ $\min r_{i}\leq 2\cdot\left(1-\sum(1-r_{i})\right)$ and $1-\sum(1-r_{i})\geq\sqrt{2}-1$ .

In addition, if $\accentset{\rule{2.79996pt}{0.7pt}}{I}=\emptyset$ , then the denominator is $1$ and the numerator is at most $d(j,w(j))+d(w(j),I_{1})\leq 1+\sqrt{2}$ . Therefore, if we let $q$ be the probability that $|\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1$ and $t$ be the expectation of $1-\sum(1-r_{i})$ conditioned on $|\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1$ , the overall fraction is at most

	$\displaystyle\frac{(1+\sqrt{2})\cdot(1-q)+2\cdot t\cdot q}{(1-q)+t\cdot q}$	$\displaystyle\leq\frac{(1+\sqrt{2})\cdot(1-q)+2\cdot(\sqrt{2}-1)\cdot q}{(1-q)+(\sqrt{2}-1)q}$
		$\displaystyle\leq\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})}.$		(3.a’)

There exists a point $\boldsymbol{i\in N(j)\cap(I_{2}\cup I_{3})}$ such that $\boldsymbol{d(j,i)<\delta_{1}-1.}$ In this case, note that $d(i,i^{\prime})<\delta_{1}$ for all points $i^{\prime}\in N(j)\cap(I_{2}\cup I_{3})$ . Assuming $b+c\geq 2$ , this is only possible if either:

i)

$b=1,c=1$ and the unique points $i_{2}\in N(j)\cap I_{2}$ and $i_{3}\in N(j)\cap I_{3}$ satisfy $q(i_{3})=i_{2}$ , or
ii)

$b=1,c\geq 2$ , the unique point $i_{2}\in N(j)\cap I_{2}$ is the only point with $d(j,i)<\delta_{1}-1$ , and every point in $N(j)\cap I_{3}$ maps to $i_{2}$ under $q$ .

First, assume Case b)i. Let $r=d(j,i_{2})$ and $s=d(j,i_{3})$ . Then, $\mathbb{E}\left[1-\sum_{i\in N(j)\cap S}(1-d(j,i))\right]$ $=(1-2p)\cdot 1+p\cdot r+p\cdot s$ , and the expected distance $d(j,S)$ is at most $p\cdot r+p\cdot s+(1-2p)\cdot(1+\sqrt{2})$ . Since $d(r,s)\geq\delta_{3}$ , this means that $r+s\geq\delta_{3},$ so the overall fraction is at most

\frac{(1+\sqrt{2})\cdot(1-2p)+\delta_{3}\cdot p}{(1-2p)+\delta_{3}\cdot p}.

(3.b.i’)

Next, assume Case b)ii. Let $r=d(j,i_{2})$ , and let $s_{1},\dots,s_{c}$ be the distances from $j$ to each of the $c$ points in $N(j)\cap I_{3}$ . Let $s=\frac{s_{1}+\cdots+s_{c}}{c}$ Then, $\mathbb{E}\left[1-\sum_{i\in N(j)\cap S}(1-d(j,i))\right]=1-p(1-r)-\sum_{i=1}^{c}p(1-s_{i})\geq 1-p(1+c)+p\cdot(r+s\cdot c).$ In addition, $\mathbb{E}[c(j,S)]$ is at most $(1+\sqrt{2})\cdot(\frac{1}{2}-p)+(1+\sqrt{2})\cdot\frac{1}{2}(1-2p)^{c}+p\cdot r+\frac{1}{2}\left(1-(1-2p)^{c}\right)\cdot s$ . Since the numerator and denominator grow at the same rate with respect to $r$ , and the numerator grows slower with respect to $s$ than the denominator, we wish to minimize $r$ and $s$ to maximize the fraction. So, we set $r=0$ , and $s=\sqrt{\frac{c-1}{c}}$ by Lemma 6.1. Therefore, the fraction is at most

\frac{(1+\sqrt{2})\cdot\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\right)+\frac{1}{2}\left(1-(1-2p)^{c}\right)\cdot\sqrt{\frac{c-1}{c}}}{1-p(1+c-\sqrt{c(c-1)})}

(3.b.ii’)

Case 4: $\boldsymbol{a\geq 1}$ .

First, we will condition on the fair coin flips, and let $\accentset{\rule{2.79996pt}{0.7pt}}{I}\subset N(j)\cap(I_{2}\cup I_{3})$ be the set of “surviving” points, i.e., the points that will be included in $S$ with probability $2p$ . Note all points in $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ have pairwise distance at least $\delta_{1}=\sqrt{2}$ from each other, and all points in $N(j)\cap I_{1}$ have pairwise distance at least $\delta_{1}$ from each other also. However, the points in $N(j)\cap I_{1}$ and $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ are only guaranteed to have pairwise distance at least $\delta_{2}$ from each other. Let $h$ represent the size $|\accentset{\rule{2.79996pt}{0.7pt}}{I}|$ .

We consider several subcases.

a)

$\boldsymbol{h=0.}$ In this case, we can use the same bounds as Cases 2 and 3 of the simpler $1+\sqrt{2}$ -approximation, since we only have to worry about points in $I_{1}\cap N(j).$ Indeed, the same bounds on the numerator and denominator still hold, so the ratio is at most

$2.$ (4.a’)
b)

$\boldsymbol{a=1,h=1.}$ In this case, let $i_{1}$ be the unique point in $N(j)\cap I_{1}$ , and let $i_{2}$ be the unique point in $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ . Then, $d(i_{1},i_{2})\geq\delta_{2},$ so if $t=d(j,i_{1})$ and $u=d(j,i_{2})$ , then the denominator in expectation is $1-(1-t)-2p(1-u)=t-2p(1-u)\geq t\cdot(1-2p)$ , since $1-u\leq t$ . But, the numerator $\mathbb{E}[c(j,S)]$ is at most $t$ , so the overall fraction is at most

$\frac{1}{1-2p}.$ (4.b’)
c)

$\boldsymbol{a=1,h\geq 2.}$ Let $i_{1}$ be the unique point in $N(j)\cap I_{1}.$ Then, we must have that $d(j,i_{1})\geq\delta_{2}-1.$ Letting $t=d(j,i_{1})$ , we have that $d(j,S)\leq t$ , but $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq t-2p\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{I}}(1-c(j,i)).$ However, we know that $\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{I}}(1-c(j,i))\geq h-\sqrt{h(h-1)}\geq 2-\sqrt{2}$ by Lemma 6.1, so the denominator is at least $t-(2-\sqrt{2})\cdot 2p$ . So, the ratio is at most $\frac{t}{t-2(2-\sqrt{2})p},$ which is maximized when $t$ is as small as possible, namely $t=\delta_{2}-1$ . So, the ratio is at most

$\frac{\delta_{2}-1}{(\delta_{2}-1)-2(2-\sqrt{2})p}.$ (4.c’)

$\boldsymbol{a\geq 2,h=1.}$ In this case, let $i_{2}$ be the unique point in $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ , and let $t=d(j,i_{2})$ . Note that $d(j,i_{2})\geq\delta_{2}-1$ , so $1-d(j,i_{2})\leq 2-\delta_{2}$ . In addition, if the distances from $j$ to the points in $N(j)\cap I_{1}$ are $r_{1},\dots,r_{a}$ , then $d(j,S)\leq\frac{r_{1}+\cdots+r_{a}}{a}.$ If we let $r=\frac{r_{1}+\cdots+r_{a}}{a}$ , then $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]=1-a(1-r)-2p(1-t)\geq 1-a(1-r)-(2-\delta_{2})\cdot 2p.$ So, the overall fraction is at most $\frac{r}{1-a(1-r)-(2-\delta_{2})\cdot 2p}.$ It is clear that this function decreases as $r$ increases, so we want to set $r$ as small as possible. However, we know that $r\geq\sqrt{\frac{a-1}{a}}$ by Lemma 6.1, so the overall fraction is at most

\frac{\sqrt{\frac{a-1}{a}}}{\sqrt{a(a-1)}-(a-1)-(2-\delta_{2})\cdot 2p}=\frac{1}{(a-\sqrt{a(a-1)})-(2-\delta_{2})\cdot 2p\cdot\sqrt{\frac{a-1}{a}}}.

The denominator clearly decreases as $a\to\infty$ , so the overall fraction is at most the limit of the above as $a\to\infty$ , which is

\frac{1}{\frac{1}{2}-(2-\delta_{2})\cdot 2p}.

(4.d’)

e)

$\boldsymbol{a,h\geq 2.}$ In this case, let the distances from $j$ to the points in $N(j)\cap I_{1}$ be $r_{1},\dots,r_{a}$ , and let the distances from the points from $j$ to the points in $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ be $s_{1},\dots,s_{h}$ . Also, let $r=\frac{r_{1}+\cdots+r_{a}}{a},$ and let $s=\frac{s_{1}+\cdots+s_{h}}{h}.$ Then, we have that the numerator is at most $r$ , and the denominator is at least $1-a\cdot(1-r)-2p\cdot h\cdot(1-s)$ . Next, note that by Lemma 6.1, $s\geq\sqrt{\frac{h-1}{h}}$ , so $h\cdot(1-s)\geq h\cdot\left(1-\sqrt{\frac{h-1}{h}}\right)\geq h-\sqrt{h-1}\geq 2-\sqrt{2}$ . So, the fraction is at most $\frac{r}{1-a(1-r)-2p\cdot(2-\sqrt{2})}$ . This is exactly the same as in subcase 4d), except there the denominator was $1-a(1-r)-(2-\delta_{2})\cdot 2p,$ i.e., we just replaced $\delta_{2}$ with $\sqrt{2}$ . So, the same calculations give us that we can bound the overall fraction by at most

$\frac{1}{\frac{1}{2}-(2-\sqrt{2})\cdot 2p}.$ (4.e’)

∎

Finally, we bound the actual LMP approximation constant, similar to Proposition 4.7 for the $k$ -means case. We have the following proposition, which will immediately follow from analyzing all subases carefully (see Lemma 6.5).

Proposition 6.3.

For $p=0.068,$ $\rho(p)\leq 2.395$ . Hence, we can obtain a $2.395$ -LMP approximation.

6.3 Improved $k$ -median approximation

In this section, we explain how our LMP approximation for $k$ -median implies an improved polynomial time $k$ -median approximation for any fixed $k$ . We set $p_{1}=0.068$ and $\delta_{1}=\sqrt{2},\delta_{2}=1.395$ , and $\delta_{3}=2-\sqrt{2}$ . In this case, we have that $\rho(\delta_{1})\leq 2.395$ by Proposition 6.3.

Next, we have that all of the results in Subsections 5.1 and 5.3 hold in the $k$ -median context, with two changes. The first, more obvious, change is that Lemma 5.10 (and all subsequent results in Section 5.3) needs to use the function $\rho$ associated with $k$ -median as opposed to the function associated with $k$ -means.

The second change is that Lemma 5.9 no longer holds for $p\leq 0.5$ , but still holds for $p\leq p_{0}$ for some fixed choice $p_{0}$ . Indeed, for Cases 1, 2, and 3 (i.e., when $a=0$ ), we have that $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0$ for any $p\leq 0.5$ , since $I_{1}=\emptyset$ , and $\sum_{i\in N(j)\cap I_{2}}(\alpha_{j}-c(j,i))\leq(b-\sqrt{b(b-1)})\cdot\alpha_{j}\leq\alpha_{j}$ if $|N(j)\cap I_{2}|=b$ , and likewise $\sum_{i\in N(j)\cap I_{3}}(\alpha_{j}-c(j,i))\leq(c-\sqrt{c(c-1)})\cdot\alpha_{j}\leq\alpha_{j}$ if $|N(j)\cap I_{3}|=c$ . So, $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq(1-2p)\cdot\alpha_{j}\geq 0$ for $p\leq 0.5$ . For case $4$ of $k$ -median, we verify that $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0$ after conditioning on $\accentset{\rule{2.79996pt}{0.7pt}}{I}$ . Indeed, if $\accentset{\rule{2.79996pt}{0.7pt}}{I}=0$ (i.e., subcase 4.a), then this just equals $\alpha_{j}-\sum_{i\in N(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\alpha_{j}\cdot(\sqrt{a(a-1)}-(a-1))\geq 0$ . In the remaining subcases, the value $\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0$ is always nonnegative as long as the denominator of our final fractions are also nonnegative. So, we just need that $1-2p\geq 0$ , $(\delta_{2}-1)-2(2-\sqrt{2})p\geq 0$ , $\frac{1}{2}-(2-\delta_{2})\cdot 2p\geq 0$ , and $\frac{1}{2}-(2-\sqrt{2})\cdot 2p\geq 0$ . These are all true as long as $p\leq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\leq 0.337$ . Thus, we replace $0.5$ with $p_{0}=0.337$ .

Overall, the rest of Subsection 5.3 goes through, except that our final bound will be

\left(1+O(\frac{1}{C}+\varepsilon+\gamma)\right)\cdot\max_{r\geq 1}\min\left(\rho\left(\frac{p_{1}}{r}\right),\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{p_{0}\cdot r}{p_{1}}-1\right)}\right)\right),

(44)

where $p_{1}=0.068$ and $p_{0}=0.337$ . The main replacement here is that we replaced $\frac{r}{2p_{1}}=\frac{0.5\cdot r}{p_{1}}$ with $\frac{0.337\cdot r}{p_{1}}.$ We can use this to obtain a $2.408$ -approximation, improving over $1+\sqrt{2}$ . We will not elaborate on this, however, as we will see that using the method in Subsection 5.4, we can further improve this to $2.406$ .

We split the clients this time into $3$ groups. We let $\mathcal{D}_{1}$ be the set of clients $j\not\in\mathcal{D}_{B}$ corresponding to all subcases in Cases 1 and 2, $\mathcal{D}_{2}$ be the set of clients $j\not\in\mathcal{D}_{B}$ corresponding to all subcases in Case 3, and $\mathcal{D}_{3}$ be the set of clients corresponding to all subcases in Case 4, and all bad clients $j\in\mathcal{D}_{B}.$ For any client $j$ , as in Subsection 5.4, we define $A_{j}:=\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))$ and $B_{j}:=\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i))$ . We also define $Q_{1},Q_{2},Q_{3},R_{1},R_{2},R_{3}$ similar to how we did for the $k$ -means case.

Similar to Lemma 5.18 in the $k$ -means case, we have the following result for the $k$ -median case.

Lemma 6.4.

Let $\delta_{1}=\sqrt{2}$ , $\delta_{2}=1.395$ , and $\delta_{3}=2-\sqrt{2}$ . For any client $j\in\mathcal{D}_{1}$ , $A_{j}\geq B_{j}$ . For any client $j\in\mathcal{D}_{2}$ , $A_{j}\geq\frac{1}{2}B_{j}$ . Finally, for any client $j\in\mathcal{D}_{3},$ $A_{j}\geq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\cdot B_{j}.$

Proof.

Recall that $a=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}|$ , $b=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}|$ , and $c=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}|$ . In case $1$ or $2$ , we have that $a=0$ and $b+c\leq 1$ , so $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}=\alpha_{j}$ , and the sum of $\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{2})}(\alpha_{j}-c(j,i))$ is merely over a single point, so is at most $\alpha_{j}$ . Thus, if $j\in\mathcal{D}_{1},$ $A_{j}\geq B_{j}$ . (Note that this even holds for bad clients $j\in\mathcal{D}_{B}$ .)

In case $3$ , we have that $a=0$ , so $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}=\alpha_{j}$ . In addition, all points $i,i^{\prime}$ in $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}$ are separated by at least $\sqrt{2}\cdot\min(\tau_{i},\tau_{i^{\prime}})\geq\sqrt{2}\cdot\alpha_{j}$ . Hence, by Lemma 6.1, $\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i))\leq\alpha_{j}\cdot(b-\sqrt{b(b-1)})\leq\alpha_{j}$ . Likewise, $\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i))\leq\alpha_{j}\cdot(c-\sqrt{c(c-1)})\leq\alpha_{j}$ . So, $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}=\alpha_{j}\geq\frac{1}{2}\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)).$ Thus, if $j\in\mathcal{D}_{2},$ $A_{j}\geq B_{j}$ .

In case $4$ , we have that $a\geq 1$ . We claim that in this case, $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\frac{\delta_{2}-1}{2-\sqrt{2}}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i))$ . This will follow from the fact that all points in $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}$ are separated by at least $\sqrt{2}\cdot\alpha_{j}$ , all points in $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}$ are also separated by at least $\sqrt{2}\cdot\alpha_{j}$ , and all points in $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}$ are separated by at least $\delta_{2}\cdot\alpha_{j}$ from all points in $\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}$ . In fact, this immediately follows from the bounding of the denominators in subcases 4.a’, 4.b’, 4.c’, 4.d’, and 4.e’, where we replace $h$ with $b$ . Likewise, $\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\frac{\delta_{2}-1}{2-\sqrt{2}}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i))$ . Overall, we have that for all clients in case $4$ , $A_{j}\geq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\cdot B_{j}$ . Since $A_{j}\geq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\cdot B_{j}$ in all cases, it also holds for the bad clients $j\in\mathcal{D}_{B}$ as well. ∎

As a direct corollary, we have that

R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq 2Q_{2},\hskip 14.22636pt\text{and}\hskip 14.22636ptR_{3}\leq\frac{2(2-\sqrt{2})}{\delta_{2}-1}Q_{3}.

(45)

Next, similar to Lemma 5.19 in the $k$ -means case, we have the following result.

Lemma 6.5.

Let $\delta_{1}=\sqrt{2},$ $\delta_{2}=1.395$ , and $\delta_{3}=2-\sqrt{2}$ . Then, for all $p\in[0.01,0.068],$ we have that $\rho^{(1)}(p)\leq\max\left(1+\delta_{2},\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right),$ where $X=p^{2}+p(1-p)\cdot 1.1$ and $Y=(1-p)^{2}+\frac{p(1-p)}{1.1}.$ In addition, for all $p\in[0.01,0.068]$ , we have that $\rho^{(2)}(p)\leq\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})},$ and that $\rho^{(3)}(p)\leq\frac{1}{\frac{1}{2}-2\cdot(2-d_{2})\cdot p}$ .

Proof.

To bound $\rho^{(1)}(p)$ , we simply analyze all subcases in Case 1 and Case 2 and set $T=1.1$ . This is straightfoward to verify (see, for instance, our Desmos files on $k$ -median in Appendix A).

To bound $\rho^{(2)}(p)$ , we analyze all subcases in Case 3. Subcase 3.a’ and 3.b.i’ are straightfoward to verify. For subcase 3.b.ii’, we have to verify for all choices of $c\geq 2$ . For $c=2$ , we can verify manually. For $c\geq 3$ , it is easy to see that the numerator of 3.b.ii’ is at most

	$\displaystyle\frac{1}{2}\left[(1+\sqrt{2})\cdot\left((1-2p)+(1-2p)^{c}\right)+1-(1-2p)^{c}\right]$	$\displaystyle=\frac{1}{2}\left[1+(1+\sqrt{2})\cdot(1-2p)+\sqrt{2}\cdot(1-2p)^{c}\right]$
		$\displaystyle\leq\frac{1}{2}\left[1+(1+\sqrt{2})\cdot(1-2p)+\sqrt{2}\cdot(1-2p)^{3}\right],$

and the denominator is at least $1-2p$ . So, the fraction is at most

\frac{\frac{1}{2}\cdot\left[1+(1+\sqrt{2})\cdot(1-2p)+\sqrt{2}\cdot(1-2p)^{3}\right]}{1-2p},

which is at most $\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})}$ for all $p\in[0.01,0.068]$ .

Finally, it is straightfoward to check that all $5$ subcases in Case $4$ are at most $\frac{1}{\frac{1}{2}-2\cdot(2-d_{2})\cdot p}$ for all $p\in[0.1,0.068]$ . ∎

One can modify the remainder of the proof analogously to as in the $k$ -means case in Section 5.4. Hence, to show that we obtain an approximation $\rho+O(\varepsilon+\gamma+1/C)$ , it suffices to show that for all choices of $\theta\in[0,1]$ and $r\geq 1,$ that if we let $\mathfrak{D}^{\prime}=\mathfrak{D}+O(\gamma)\cdot\text{OPT}_{k}$ , one cannot simultaneously satisfy

$\displaystyle\mathfrak{D}^{\prime}$	$\displaystyle\geq\sum_{i=1}^{3}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right)$	(46)
$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\frac{\theta}{r}\sum_{i=1}^{3}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta}{r}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+p_{1}\cdot\frac{\theta}{r}\sum_{i=1}^{4}R_{i}\right)$	(47)
$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\sum_{i=1}^{3}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right)$	(48)

and

R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq 2Q_{2},\hskip 28.45274ptR_{3}\leq\frac{2(2-\sqrt{2})}{\delta_{2}-1}Q_{3}.

(49)

By numerical analysis of these linear constraints and based on the functions $\rho^{(i)}$ , we obtain a $\boxed{2.406}$ -approximation algorithm for Euclidean $k$ -median clustering. We defer the details to Appendix C.

Acknowledgments

The authors thank Ashkan Norouzi-Fard for helpful discussions relating to modifying the previous results on roundable solutions. The authors also thank Fabrizio Grandoni, Piotr Indyk, Euiwoong Lee, and Chris Schwiegelshohn for helpful conversations. Finally, we would like to thank an anonymous reviewer for providing a useful suggestion in removing one of the cases for $k$ -means.

References

[1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. SIAM Journal on Computing, 49(4):FOCS17–97–FOCS17–156, 2019.
[2] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 1027–1035, 2007.
[3] David Arthur and Sergei Vassilvitskii. Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM J. Comput., 39(2):766–782, 2009.
[4] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544–562, 2004.
[5] Pranjal Awasthi, Avrim Blum, and Or Sheffet. Stability yields a PTAS for k-median and k-means clustering. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 309–318, 2010.
[6] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of euclidean k-means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, volume 34 of LIPIcs, pages 754–767. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
[7] Sayan Bandyapadhyay and Kasturi Varadarajan. On variants of k-means clustering. In 32nd International Symposium on Computational Geometry (SoCG 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
[8] Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and the johnson-lindenstrauss lemma. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 1039–1050, 2019.
[9] Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1–23:31, 2017.
[10] Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for the facility location and k-median problems. In 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA, pages 378–388, 1999.
[11] Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for facility location problems. SIAM J. Comput., 34(4):803–824, 2005.
[12] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the $k$ -median problem. J. Comput. Syst. Sci., 65(1):129–149, 2002.
[13] Moses Charikar and Shi Li. A dependent LP-rounding approach for the k-median problem. In Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, pages 194–205, 2012.
[14] Vincent Cohen-Addad. A fast approximation scheme for low-dimensional k-means. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 430–440. SIAM, 2018.
[15] Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximations schemes for clustering in doubling metrics. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 540–559. IEEE, 2019.
[16] Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, and David Saulpic. An improved local search algorithm for $k$ -median. In Proceedings of the Thirty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 1556–1612. SIAM, 2022.
[17] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight fpt approximations for k-median and k-means. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 42:1–42:14. Ieee, 2019.
[18] Vincent Cohen-Addad and Karthik C. S. Inapproximability of clustering in lp metrics. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 519–539. IEEE Computer Society, 2019.
[19] Vincent Cohen-Addad, Euiwoong Lee, and Karthik C. S. Johnson coverage hypothesis: Inapproximability of $k$ -means and $k$ -median in $\ell_{p}$ metrics. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022. SIAM, 2022.
[20] Vincent Cohen-Addad and Claire Mathieu. Effectiveness of local search for geometric optimization. In 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, pages 329–343, 2015.
[21] Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. On approximability of clustering problems without candidate centers. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 2635–2648. SIAM, 2021.
[22] Vincent Cohen-Addad and Chris Schwiegelshohn. On the local structure of stable clustering instances. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 49–60. IEEE Computer Society, 2017.
[23] Sanjoy Dasgupta. The hardness of k-means clustering. Department of Computer Science and Engineering, University of California …, 2008.
[24] D. Feldman and M. Langberg. A unified framework for approximating and clustering data. In STOC, pages 569–578, 2011.
[25] Fabrizio Grandoni, Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Rakesh Venkat. A refined approximation for euclidean k-means. Inf. Process. Lett., 176:106251, 2022.
[26] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. J. Algorithms, 31(1):228–248, 1999.
[27] Venkatesan Guruswami and Piotr Indyk. Embeddings and non-approximability of geometric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 537–538, 2003.
[28] S Louis Hakimi. Optimum locations of switching centers and the absolute centers and medians of a graph. Operations research, 12(3):450–459, 1964.
[29] Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. J. ACM, 50(6):795–824, 2003.
[30] Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. Journal of the ACM, 48(2):274–296, 2001.
[31] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89–112, 2004.
[32] Madhukar R. Korupolu, C. Greg Plaxton, and Rajmohan Rajaraman. Analysis of a local search heuristic for facility location problems. J. Algorithms, 37(1):146–188, 2000.
[33] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1–5:32, 2010.
[34] Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Inf. Process. Lett., 120:40–43, 2017.
[35] Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Inf. Comput., 222:45–58, 2013.
[36] Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. SIAM J. Comput., 45(2):530–547, 2016.
[37] SP Lloyd. Least square quantization in pcm. bell telephone laboratories paper. published in journal much later: Lloyd, sp: Least squares quantization in pcm. IEEE Trans. Inform. Theor.(1957/1982), 18, 1957.
[38] Konstantin Makarychev, Yury Makarychev, and Ilya P. Razenshteyn. Performance of johnson-lindenstrauss transform for k-means and k-medians clustering. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 1027–1038. ACM, 2019.
[39] Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, and Justin Ward. A bi-criteria approximation algorithm for k-means. In Klaus Jansen, Claire Mathieu, José D. P. Rolim, and Chris Umans, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2016, September 7-9, 2016, Paris, France, volume 60 of LIPIcs, pages 14:1–14:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
[40] Jirí Matousek. On approximate geometric k-clustering. Discrete & Computational Geometry, 24(1):61–84, 2000.
[41] Nimrod Megiddo and Kenneth J Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182–196, 1984.
[42] Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. J. ACM, 59(6):28, 2012.
[43] Hugo Steinhaus. Sur la division des corps matériels en parties. Bull. Acad. Pol. Sci., Cl. III, 4:801–804, 1957.

Appendix A Desmos Graphs and Code

Here, we provide links for the Desmos files used to visualize the LMP approximations for both $k$ -means and $k$ -median, and the Python code used to improve the approximation factor for $k$ -means.

We provide graphs on Desmos for the LMP Approximation bounds for $k$ -means and $k$ -medians, as functions of the probability $p$ . We remark that in some of these cases, there may be parameters (such as $a,b,c,c_{1},c_{2},h$ ) that need to be set properly (which can be done via toggles on the respective Desmos link) to see the actual approximation ratio as a function of $k$ .

Case $1$ of $k$ -means is available here: https://www.desmos.com/calculator/jd8ud6h2e9

Case $2$ of $k$ -means is available here: https://www.desmos.com/calculator/pgtylk9eui

Case $3$ of $k$ -means is available here: https://www.desmos.com/calculator/zjshynypsh

Case $4$ of $k$ -means is available here: https://www.desmos.com/calculator/ibwult8qzs

Case $5$ of $k$ -means is available here: https://www.desmos.com/calculator/pgtylk9eui

Case $1$ of $k$ -median is available here: https://www.desmos.com/calculator/9qmscsfvrr

Case $2$ of $k$ -median is available here: https://www.desmos.com/calculator/rdidyxhs2o

Case $3$ of $k$ -median is available here: https://www.desmos.com/calculator/zoeswetvyz

Case $4$ of $k$ -median is available here: https://www.desmos.com/calculator/mpwrmz7mhe

The Python source code for $k$ -means is available at
https://drive.google.com/file/d/1mzKPr4ZbXe7FPDtx8JBkz2ReCK4OriQz/view?usp=sharing, and the Python source code for $k$ -median is available at
https://drive.google.com/file/d/1SEfUvHaOd78QgxCFaFA8OjuDsKb3Qg0I/view?usp=sharing. One can also view the code in a more readable PDF format for $k$ -means at
https://drive.google.com/file/d/1Ujcd6znbwxOkG-72zBZGX3otXPPAScud/view?usp=sharing, and for $k$ -median at
https://drive.google.com/file/d/15HP3wBN20tCanwAc1dAA3rvjI23drdcO/view?usp=sharing.

Appendix B Omitted Details for the LMP Approximations

First, we prove Proposition 4.4.

Proof of Proposition 4.4.

Let $v_{1}=B-A$ , $v_{2}=C-B$ , and $v_{3}=D-B.$ Then,

	$\displaystyle p\cdot\\|C-A\\|_{2}^{2}+(1-p)\cdot\\|D-A\\|_{2}^{2}$	$\displaystyle=p\cdot\\|v_{1}+v_{2}\\|_{2}^{2}+(1-p)\cdot\\|v_{1}+v_{3}\\|_{2}^{2}$
		$\displaystyle=\\|v_{1}\\|_{2}^{2}+2\cdot\langle v_{1},pv_{2}+(1-p)v_{3}\rangle+p\cdot\\|v_{2}\\|_{2}^{2}+(1-p)\cdot\\|v_{3}\\|_{2}^{2}$
		$\displaystyle\leq 1+2\cdot\\|pv_{2}+(1-p)v_{3}\\|_{2}+p\cdot\\|v_{2}\\|_{2}^{2}+(1-p)\cdot\\|v_{3}\\|_{2}^{2},$

since $\|v_{1}\|_{2}\leq 1$ . Now, we can write

\|pv_{2}+(1-p)v_{3}\|_{2}=\sqrt{\|pv_{2}+(1-p)v_{3}\|_{2}^{2}}=\sqrt{p\cdot\|v_{2}\|_{2}^{2}+(1-p)\cdot\|v_{3}\|_{2}^{2}-p(1-p)\cdot\|v_{2}-v_{3}\|_{2}^{2}}.

So, we have that $p\cdot\|c-a\|_{2}^{2}+(1-p)\cdot\|d-a\|_{2}^{2}$ is at most

1+2\cdot\sqrt{p\cdot\nu_{1}\cdot\min(\sigma_{1},\sigma_{2})+(1-p)\cdot\nu_{2}\cdot\min(\sigma_{1},\sigma_{3})-p(1-p)\cdot\nu_{3}\cdot\min(\sigma_{2},\sigma_{3})}\\ +p\cdot\nu_{1}\cdot\min(\sigma_{1},\sigma_{2})+(1-p)\cdot\nu_{2}\cdot\min(\sigma_{1},\sigma_{3}).

(50)

It is simple to see that (50) is nondecreasing in $\sigma_{1}$ for a fixed $\sigma_{2},\sigma_{3}$ , so (50) is maximized when $\sigma_{1}=1$ . Next, when $\sigma_{1}=1$ , it is clear that (50) is non-increasing in $\sigma_{2}$ if $\sigma_{2}\geq 1$ and likewise for $\sigma_{3}$ , so (50) is maximized for some $\sigma_{2},\sigma_{3}\leq 1$ . In this case, (50) simplifies to

1+2\sqrt{p\cdot\nu_{1}\cdot\sigma_{2}+(1-p)\cdot\nu_{2}\cdot\sigma_{3}-p(1-p)\cdot\nu_{3}\cdot\min(\sigma_{2},\sigma_{3})}+p\cdot\nu_{1}\cdot\sigma_{2}+(1-p)\cdot\nu_{2}\cdot\sigma_{3}.

Now, using the fact that $\nu_{1},\nu_{2}\geq\nu_{3}$ and that $p,(1-p)\geq p(1-p)$ , we have that this expression is nondecreasing in both $\sigma_{2},\sigma_{3}$ as long as $\sigma_{2},\sigma_{3}\leq 1$ . So, we may upper bound (50), and thus $p\cdot\|C-A\|_{2}^{2}+(1-p)\cdot\|D-A\|_{2}^{2}$ , by

1+2\cdot\sqrt{p\cdot\nu_{1}+(1-p)\cdot\nu_{2}-p(1-p)\cdot\nu_{3}}+p\cdot\nu_{1}+(1-p)\cdot\nu_{2},

by setting $\sigma_{2}=\sigma_{3}=1.$ ∎

Next, we complete the details in Lemmas 4.2 and 6.2 that we did not complete in the main body of the paper.

K-means: Case 1.c:

We wish to maximize

\frac{(1-p)\cdot(t+\sqrt{\delta_{1}})^{2}+p\cdot t^{2}}{1-p(1-t^{2})}=\frac{(1-p)\cdot(t+\sqrt{\delta_{1}})^{2}+p\cdot t^{2}}{(1-p)+p\cdot t^{2}},

over $0\leq t\leq 1.$ First, note that if $t\leq\sqrt{0.75}$ , then we can bound this fraction by $(t+\sqrt{\delta_{1}})^{2}\leq(\sqrt{0.75}+\sqrt{\delta_{1}})^{2}$ . Alternatively, if $t\geq\sqrt{0.75}$ , then we can bound this fraction by at most

\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot t^{2}}{(1-p)+p\cdot t^{2}}\leq\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+3p/4}{1-p/4},

where the left-hand side in the above equation has the numerator and denominator increasing at the same rate in terms of $t$ , so it is maximized when $t$ is minimized, i.e., $t=\sqrt{0.75}$ . Thus, we can bound the overall fraction as at most

\max\left((\sqrt{0.75}+\sqrt{\delta_{1}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+3p/4}{1-p/4}\right).

K-Means: Case 1.g.i:

We wish to maximize

\frac{(1-p)\cdot(u+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot d(j,i_{2})^{2}}{1-p+p\cdot d(j,i_{2})^{2}}

over $t,u\in[0,1]$ and $d(j,i_{2})\geq u-\sqrt{\delta_{3}\cdot t}$ . First, note that if we treat $d(j,i_{2})$ as a variable, the numerator and denominator increase at the same rate as $d(j,i_{2})^{2}$ increases, so this fraction is maximized when $d(j,i_{2})=\max(0,u-\sqrt{\delta_{3}\cdot t})$ . If $u-\sqrt{\delta_{3}\cdot t}\leq 0$ , then this fraction equals $(u+\sqrt{\delta_{1}\cdot t})^{2}$ , but $u\leq\sqrt{\delta_{3}\cdot t}\leq\sqrt{\delta_{3}}$ since $t\leq 1,$ and this means that $(u+\sqrt{\delta_{1}\cdot t})^{2}\leq(\sqrt{\delta_{3}}+\sqrt{\delta_{1}})^{2}$ . Alternatively, we are maximizing

\frac{(1-p)\cdot(u+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot(u-\sqrt{\delta_{3}\cdot t})^{2}}{1-p+p\cdot(u-\sqrt{\delta_{3}\cdot t})^{2}}

over $t,u\in[0,1]$ . Next, note that if $u>\sqrt{\delta_{3}\cdot t}$ and $t<1,$ then increasing $t$ will decrease $(u-\sqrt{\delta_{3}\cdot t})^{2}$ and increase $(u+\sqrt{\delta_{1}\cdot t})^{2}$ . So, the denominator decreases and the numerator either increases or decreases at a slower rate. Thus, we may assume that either $u-\sqrt{\delta_{3}\cdot t}\leq 0$ or that $t=1$ . In the case where $t=1$ , we wish to maximize

\frac{(1-p)\cdot(u+\sqrt{\delta_{1}})^{2}+p\cdot(u-\sqrt{\delta_{3}})^{2}}{1-p+p\cdot(u-\sqrt{\delta_{3}})^{2}}=\frac{p\cdot\left[(u+\sqrt{\delta_{1}})^{2}+(u-\sqrt{\delta_{3}})^{2}\right]+(1-2p)\cdot(u+\sqrt{\delta_{1}})^{2}}{p\cdot\left[(u-\sqrt{\delta_{3}})^{2}+1\right]+(1-2p)}.

Writing $A(u)=(u+\sqrt{\delta_{1}})^{2}+(u-\sqrt{\delta_{3}})^{2}$ , $B(u)=(u+\sqrt{\delta_{1}})^{2},$ and $C(u)=(u-\sqrt{\delta_{3}})^{2}+1,$ we can verify that $\frac{A(u)}{C(u)}$ and $B(u)$ are both increasing functions in $u$ over $[0,1]$ , which means so is $\frac{p\cdot A(u)+(1-2p)\cdot B(u)}{p\cdot C(u)+(1-2p)}.$ Therefore, the overall maximum is at most

\max\left((\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p\cdot(1-\sqrt{\delta_{3}})^{2}}\right)

K-Means: Case 2.d:

Our goal is to maximize

\frac{(1-2p)\cdot\min(1+\sqrt{\delta_{1}},\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot\beta^{2}+p\cdot\gamma^{2}}{1-p(1-\beta^{2})-p(1-\gamma^{2})}

over $t\geq 1$ and $\beta+\gamma\geq\sqrt{\delta_{3}\cdot t}$ (and where $\beta,\gamma\geq 0$ ). By symmetry, we may assume WLOG that $\beta\geq\gamma$ , and replace $\max(\beta,\gamma)$ with $\beta$ . Next, note that increasing $t$ only increases the overall fraction, so we may increase $t$ until we have that $\beta+\gamma=\sqrt{\delta_{3}\cdot t}$ . So, we now wish to maximize

\frac{(1-2p)\cdot\min\left(1+\sqrt{\delta_{1}},\beta+\sqrt{\delta_{1}/\delta_{3}}\cdot(\beta+\gamma)\right)^{2}+p\cdot(\beta^{2}+\gamma^{2})}{1-2p+p\cdot(\beta^{2}+\gamma^{2})}

over $\beta,\gamma\geq 0$ subject to $\beta+\gamma\geq\sqrt{\delta_{3}}$ (since $\beta+\gamma\geq\sqrt{\delta_{3}\cdot t}$ and $t\geq 1$ ). But, note that if $\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)>1+\sqrt{\delta_{1}}$ , then any decrease in either $\beta$ or $\gamma$ until we have that $\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)=1+\sqrt{\delta_{1}}$ will decrease both the numerator and the denominator by the same amount, and so will increase the fraction. Thus, we may assume that $\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)\leq 1+\sqrt{\delta_{1}}$ .

In this case, we may rewrite our goal as maximizing

f(\beta,\gamma):=\frac{(1-2p)\cdot\left(\beta+\sqrt{\delta_{1}/\delta_{3}}\cdot(\beta+\gamma)\right)^{2}+p\cdot(\beta^{2}+\gamma^{2})}{1-2p+p\cdot(\beta^{2}+\gamma^{2})}

(51)

over $\beta,\gamma\geq 0$ subject to $\beta+\gamma\geq\sqrt{\delta_{3}}$ and $\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)\leq 1+\sqrt{\delta_{1}}$ . Now, for any fixed $\beta,\gamma$ , we note that $f(\lambda\beta,\lambda\gamma)$ for any $\lambda\geq 1$ multiplies the numerator of the fraction in (51) by a $\lambda^{2}$ factor, but multiplies the denominator of the fraction by less than a $\lambda^{2}$ factor, since $1-2p\geq 0$ . Therefore, the fraction increases overall, which means that to maximize $f(\beta,\gamma)$ , we may always assume that $\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)=1+\sqrt{\delta_{1}}$ . It is easy to see that this automatically implies that $\beta+\gamma\geq\sqrt{\delta_{3}}$ when $\beta,\gamma\geq 0$ .

Thus, our goal is to maximize

\frac{(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}+p(\beta^{2}+\gamma^{2})}{(1-2p)+p\cdot(\beta^{2}+\gamma^{2})}

subject to $\beta,\gamma\geq 0$ and $\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)=1+\sqrt{\delta_{1}}$ . Maximizing this, however, just entails to minimizing $\beta^{2}+\gamma^{2}$ , which is easy to solve as $\beta=(\sqrt{\delta_{3}}+\sqrt{\delta_{1}})\cdot\frac{\sqrt{\delta_{3}}(1+\sqrt{\delta_{1}})}{\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2}}$ and $\gamma=\sqrt{\delta_{1}}\cdot\frac{\sqrt{\delta_{3}}(1+\sqrt{\delta_{1}})}{\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2}},$ which means that $\beta^{2}+\gamma^{2}=\frac{\delta_{3}\cdot(1+\sqrt{\delta_{1}})^{2}}{\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2}}.$

K-median: Case 1.b’:

It suffices to prove the following proposition.

Proposition B.1.

Let $0<p\leq 1/2$ , and suppose that we have $4$ points $i^{*},i_{1},i_{3},j$ in Euclidean space such that $d(j,i^{*})\leq 1,$ $d(i^{*},i_{1})\leq\sqrt{2}\cdot\min(t_{i^{*}},t_{i_{1}}),$ $d(i^{*},i_{3})\leq\sqrt{2}\cdot\min(t_{i^{*}},t_{i_{3}}),$ $d(i_{1},i_{3})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{3}}),$ and $t_{i^{*}}\leq 1$ . Then, for any $T>0$ ,

(1-p)\cdot d(j,i_{1})+p\cdot d(j,i_{3})\leq\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-X\cdot\delta_{2}^{2}}}

where $X=p^{2}+p(1-p)\cdot T$ and $Y=(1-p)^{2}+\frac{p(1-p)}{T}.$

Proof.

We write $d(j,i_{1})=\|j-i_{1}\|_{2}$ and $d(j,i_{3})=\|j-i_{3}\|_{2}$ . First, we have that

	$\displaystyle\hskip 14.22636pt\left((1-p)\cdot\\|j-i_{1}\\|_{2}+p\cdot\\|j-i_{3}\\|_{2}\right)^{2}$
	$\displaystyle=(1-p)^{2}\cdot\\|j-i_{1}\\|_{2}^{2}+p^{2}\cdot\\|j-i_{3}\\|_{2}^{2}+2p(1-p)\cdot\\|j-i_{1}\\|_{2}\cdot\\|j-i_{3}\\|_{2}$
	$\displaystyle\leq(1-p)^{2}\cdot\\|j-i_{1}\\|_{2}^{2}+p^{2}\cdot\\|j-i_{3}\\|_{2}^{2}+p(1-p)\cdot\left(\frac{1}{T}\cdot\\|j-i_{1}\\|_{2}^{2}+T\cdot\\|j-i_{3}\\|_{2}^{2}\right)$

for any $T>0$ . Writing $X=p^{2}+p(1-p)\cdot T$ and $Y=(1-p)^{2}+\frac{p(1-p)}{T},$ we have that

\left((1-p)\cdot\|j-i_{1}\|_{2}+p\cdot\|j-i_{3}\|_{2}\right)^{2}\leq X\cdot\|j-i_{3}\|_{2}^{2}+Y\cdot\|j-i_{1}\|_{2}^{2}.

We can now apply Proposition 4.4 on the points $A=j,B=i^{*},C=i_{3},D=i_{1}$ , with $\nu_{1}=\nu_{2}=2,\nu_{3}=\delta_{2}^{2}$ , and $\sigma_{1}=t_{i^{*}},\sigma_{2}=t_{i_{3}},\sigma_{3}=t_{i_{1}}$ and where we replace the parameter $p$ in Proposition 4.4 with $\frac{X}{X+Y}$ , to say that

	$\displaystyle\hskip 14.22636ptX\cdot\\|j-i_{3}\\|_{2}^{2}+Y\cdot\\|j-i_{1}\\|_{2}^{2}$
	$\displaystyle\leq(X+Y)\cdot\left(1+\frac{X}{X+Y}\cdot 2+\frac{Y}{X+Y}\cdot 2+2\sqrt{\frac{X}{X+Y}\cdot 2+\frac{Y}{X+Y}\cdot 2-\frac{X}{X+Y}\cdot\frac{Y}{X+Y}\cdot\delta_{2}^{2}}\right)$
	$\displaystyle=3(X+Y)+2\sqrt{2(X+Y)^{2}-\delta_{2}^{2}\cdot XY}.$

In summary, we have that for any choice of $T>0$ ,

	$\displaystyle\left((1-p)\cdot\\|j-i_{1}\\|_{2}+p\cdot\\|j-i_{3}\\|_{2}\right)^{2}$	$\displaystyle\leq X\cdot\\|j-i_{3}\\|_{2}^{2}+Y\cdot\\|j-i_{1}\\|_{2}^{2}$
		$\displaystyle\leq 3(X+Y)+2\sqrt{2(X+Y)^{2}-\delta_{2}^{2}\cdot XY}.\qed$

K-median: Case 1.g.i’:

Our goal is to maximize

\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot\max(0,u-t\cdot\delta_{3})}{1-p+p\cdot\max(0,u-t\cdot\delta_{3})}

over $0\leq t,u\leq 1.$ First, note that if $u-t\cdot\delta_{3}\leq 0,$ then since $t\leq 1,$ this means that $u\leq\delta_{3}$ . In this case, the fraction equals $u+\delta_{1}\cdot t\leq\delta_{1}+\delta_{3}=2$ , since $\delta_{1}=\sqrt{2}$ and $\delta_{3}=2-\sqrt{2}$ .

Alternatively, we have that $\max(0,u-t\cdot\delta_{3})=u-t\cdot\delta_{3}.$ Let $u^{\prime}=u-t\cdot\delta_{3}\geq 0$ , so $u=u^{\prime}+t\cdot\delta_{3}$ . In this case, we wish to maximize

\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot(u-t\cdot\delta_{3})}{1-p+p\cdot(u-t\cdot\delta_{3})}=\frac{u^{\prime}+(1-p)\cdot(\delta_{1}+\delta_{3})t}{p\cdot u^{\prime}+(1-p)}=\frac{u^{\prime}+(1-p)\cdot 2t}{p\cdot u^{\prime}+(1-p)}.

over $0\leq t\leq 1$ and $0\leq u^{\prime}\leq 1-t\cdot\delta_{3}$ . Since $\frac{1}{p}\geq 2\geq 2t$ , we have that increasing $u^{\prime}$ increases the fraction overall. So, we may assume that $u^{\prime}=1-t\cdot\delta_{3}.$ In this case, we are trying to maximize the fraction

\frac{1-t\cdot\delta_{3}+(1-p)\cdot 2t}{p\cdot(1-t\cdot\delta_{3})+(1-p)}=\frac{1+(2-\delta_{3})t-2pt}{1-p\cdot\delta_{3}\cdot t}.

Since $p\leq\frac{1}{2}$ , the numerator increases and the denominator decreases as $t$ increases, so the fraction increases overall. Thus, this fraction is maximized when $t=1$ , and equals

\frac{1+(2-\delta_{3})-2p}{1-p\cdot\delta_{3}}=\frac{1+\delta_{1}-(\delta_{1}+\delta_{3})p}{1-p\cdot\delta_{3}}.

Appendix C Numerical Analysis for Euclidean $k$ -means and $k$ -median

C.1 The $k$ -means case

We recall that our goal is to show, for an appropriate choice of $\rho$ , that for any $0\leq\theta\leq 1$ and any $r\geq 1$ , we cannot simultaneously satisfy

$\displaystyle\mathfrak{D}^{\prime}$	$\displaystyle\geq\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right),$	(52)
$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\frac{\theta}{r}\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta}{r}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+p_{1}\cdot\frac{\theta}{r}\sum_{i=1}^{5}R_{i}\right),$	(53)
$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right),$	(54)

and

R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq Q_{2},\hskip 28.45274ptR_{3}\leq Q_{3},\hskip 28.45274ptR_{4}\leq 1.75Q_{4},\hskip 28.45274ptR_{5}\leq 2Q_{5},

(55)

where we will let $Q_{1},Q_{2},Q_{3},Q_{4},Q_{5},R_{1},R_{2},R_{3},R_{4},R_{5}$ and $\mathfrak{D}^{\prime}$ be arbitrary nonnegative reals. For $p_{1}=0.402$ , we recall that $\rho(p_{1})=3+2\sqrt{2}$ . Now, note that if we increase $\mathfrak{D}^{\prime}$ , Equations (53) and (54) become harder to satisfy, since in both equations, the left hand side has a greater slope as a function of $\mathfrak{D}^{\prime}$ than the right hand side. As a result, we may assume that $\mathfrak{D}^{\prime}=\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right)$ , which we know is nonnegative since $p_{1}<0.5$ and $r\geq 1$ , so $Q_{i}\geq\frac{p_{1}}{r}\cdot R_{i}$ for all $1\leq i\leq 5$ .

Now, we note that we may assume $r\geq 2.37$ . This is because if $r\leq 2.37,$ then $\frac{p_{1}}{r}\geq 0.169,$ and it is easy to verify that $\rho(p)\leq 5.912$ for any $p\in[0.169,0.402]$ (for instance, by using Lemma 5.19 to bound $\rho^{(1)}(p),\rho^{(2)}(p)$ , and $\rho^{(5)}(p)$ , and using Cases 1.g.i and 2.d for $\rho^{(3)}(p)$ and $\rho^{(4)}(p)$ ). Therefore, for any $\rho\geq 5.912,$ if $r\leq 2.37$ then Equations (52) and (54) cannot hold simultaneously. In addition, we may also assume that $r\leq 4.18,$ since if $r\geq 3.5$ , then we can use the simpler bound of $\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot(r/(2p_{1})-1)}\right)\leq 5.912$ .

We recall that for $p\in[0.096,0.402]$ , $\rho^{(1)}(p)=3+2\sqrt{2}$ , $\rho^{(2)}(p)=1+2\cdot p+(1-p)\cdot\delta_{1}+2\sqrt{2\cdot p^{2}+(1-p)\cdot\delta_{1}},$ $\rho^{(3)}(p)=\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p(1-\sqrt{\delta_{3}})^{2}}$ , $\rho^{(4)}(p)=\frac{(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}{(1-2p)\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot(1+\sqrt{\delta_{1}})^{2}\cdot\delta_{3}}$ , and $\rho^{(5)}(p)=5.68$ .

Now, let $\rho=5.912$ , and suppose there exist $0\leq\theta_{0}\leq\theta\leq\theta_{1}\leq 1$ and $1\leq r_{0}\leq r\leq r_{1}$ such that Equations (52), (53), (54), and (55) can be simultaneously satisfies for nonnegative $Q_{1},Q_{2},Q_{3},Q_{4},Q_{5},R_{1},R_{2},R_{3},R_{4},R_{5}$ . Then, in fact we must be able to satisfy the weaker conditions

	$\displaystyle\mathfrak{D}^{\prime}$	$\displaystyle=\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r_{0}}\cdot R_{i}\right)$
	$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{1})+\left(1-\frac{\theta_{0}}{r_{1}}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{4}R_{i}\right)$
	$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r_{1}}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r_{1}}\cdot R_{i}\right),$

and (55), while having $Q_{1},Q_{2},Q_{3},Q_{4},Q_{5},R_{1},R_{2},R_{3},R_{4},R_{5}$ all be nonnegative. Indeed, the conditions are weaker since we have decreased the value of $\mathcal{D}^{\prime}$ and increased all terms on the right-hand side (noting that each $\rho^{(i)}$ is a non-increasing function in the range $[0,0.402]$ ).

For every $0\leq\theta_{0}\leq 0.99$ and $2.37\leq r_{0}\leq 4.17$ such that $\theta_{0},r_{0}$ are integral multiples of $0.01$ , we look at the intervals $\theta\in[\theta_{0},\theta_{0}+0.01]$ and $r\in[r_{0},r_{0}+0.01]$ . If this region has a nonnegative solution to these inequalities, then we further partition the region $[\theta_{0},\theta_{0}+0.01]\times[r_{0},r_{0}+0.01]$ into a $10\times 10$ grid of dimensions $0.001$ . If one of these regions has a nonnegative solution to these inequalities, we partition one step further into a grid of dimensions $0.0001$ (we will not need to partition beyond this). Using this procecdure, we are able to obtain that there is no solution in $\theta\in[0,1]$ and $r\in[2.37,4.18]$ when $\rho=5.912$ , which allows us to establish that our algorithm provides a polynomial-time $\boxed{5.912}$ -approximation for Euclidean $k$ -means clustering.

See Appendix A for links to the Python code.

C.2 The $k$ -median case

The $k$ -median case is almost identical, except for the modified equations and modified choices of $\rho^{(i)}$ . This time, we wish to show that for any $\theta\in[0,1]$ and $r\geq 1$ , there exists $0\leq\theta_{0}\leq\theta\leq\theta_{1}\leq 1$ and $1\leq r_{0}\leq r\leq r_{1}$ such that one cannot satisfy

	$\displaystyle\mathfrak{D}^{\prime}$	$\displaystyle=\sum_{i=1}^{3}\left(Q_{i}-\frac{p_{1}}{r_{0}}\cdot R_{i}\right)$
	$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{3}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{1})+\left(1-\frac{\theta_{0}}{r_{1}}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{4}R_{i}\right)$
	$\displaystyle\rho\cdot\mathfrak{D}^{\prime}$	$\displaystyle<\rho^{(1)}\left(\frac{p_{1}}{r_{1}}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r_{1}}\cdot R_{i}\right)+\rho^{(2)}\left(\frac{p_{1}}{r_{1}}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r_{1}}\cdot R_{i}\right)+\rho^{(3)}\left(\frac{p_{1}}{r_{0}}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r_{1}}\cdot R_{i}\right),$

where $Q_{1},Q_{2},Q_{3},R_{1},R_{2},R_{3}$ are nonnegative. In the above equations, we set $p_{1}=0.068$ , $\delta_{1}=\sqrt{2},\delta_{2}=1.395$ , and $\delta_{3}=2-\sqrt{2}$ . Also, recall that for $p\in[0.01,0.068],$ we have that $\rho^{(1)}(p)\leq\max\left(1+\delta_{2},\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right),$ where $X=p^{2}+p(1-p)\cdot 1.1$ and $Y=(1-p)^{2}+\frac{p(1-p)}{1.1},$ $\rho^{(2)}(p)\leq\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})},$ and that $\rho^{(3)}(p)\leq\frac{1}{\frac{1}{2}-2\cdot(2-d_{2})\cdot p}$ .

We remark that in the final equation, we use $\rho^{(3)}(p_{1}/r_{0})$ instead of $\rho^{(3)}(p_{1}/r_{1})$ - this is because $\rho^{(3)}$ is a decreasing function on the region $[0,0.068]$ , as opposed to $\rho^{(1)}$ and $\rho^{(2)}$ which are both decreasing functions.

First, we may assume that $r\in[2.4,3.42].$ Indeed, if $1\leq r<2.4$ , one can use the more naive bound of $\rho(p_{1}/r)$ , which is less than $2.406$ . If $r>3.42$ , one can instead use the bound of

\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{p_{0}\cdot r}{p_{1}}-1\right)}\right)\leq 2.395\cdot\left(1+\frac{1}{4\cdot 3.42\cdot\left(\frac{0.337\cdot 3.42}{0.068}-1\right)}\right)<2.406.

To finish, we apply a similar method as in the $k$ -means case. We split the region $(\theta,r)\in[0,1]\times[2.4,3.42]$ into grid blocks of size $0.005\times 0.005$ with $\theta_{0},\theta_{1},r_{0},r_{1}$ being the endpoints in each direction. We verify that the linear program has no solution when $\rho=2.406$ for each grid block: if it does, we further refine the grid block into smaller $0.001\times 0.001$ -sized pieces and verify each of the smaller pieces.

See Appendix A for links to the Python code.

Appendix D Changes to Construction of Roundable Solutions

In this section, we explain how Ahmadian et al. [1] implicitly prove Theorem 5.7, up to some minor modifications of their algorithm and analysis. Because the algorithm and analysis is almost entirely the same, we only describe the differences between the algorithm and analysis in [1] and what we need for our Theorem 5.7.

The only changes in the overall algorithm will be as follows. We will set some small constant $\kappa$ such that $\varepsilon=\kappa^{2}$ . We will then set $K=\Theta(\varepsilon^{-1}\gamma^{-4}\log\kappa)$ as opposed to $K=\Theta(\varepsilon^{-1}\gamma^{-4})$ in [1, Algorithm 2, Line 4], and set the definition of stopped [1, Section 7] to be that $j\in\mathcal{D}$ is stopped if $\exists j^{\prime}\neq j\in\mathcal{D}$ such that $(1+\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,j^{\prime})+\kappa^{-1}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j^{\prime}}$ , as opposed to $\exists j^{\prime}\neq j\in\mathcal{D}$ such that $2\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,j^{\prime})+6\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j^{\prime}}$ . Here, we are letting $\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}=\sqrt{\alpha_{j}}$ for $j\in\mathcal{D}$ .

We now describe how this changes the claims throughout [1, Sections 7-8]. We only describe the changes to the statements, because the proofs do not change at all. For nearly the remainder of this appendix, we will consider a new definition of roundable, neither the one in [1] nor our Definition 5.1. Our modified definition will instead have that: for all $j\in\mathcal{D}\backslash\mathcal{D}_{B}$ and all $A\in[2\kappa,1/(2\kappa)]$ , $(1+A+10\varepsilon/\kappa)^{2}\cdot\alpha_{j}\geq\left(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}}\right)^{2}$ , which replaces Conditions 3a and 3b in Definition 5.1 (or Condition 2a in [1, Definition 5.1]). In addition, our modified definition will have that for all clients $j\in\mathcal{D}$ , $\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2}$ for all $A\in[2\kappa,1/(2\kappa)]$ , which replaces our Condition 3c in Definition 5.1 (or condition 2b in [1, Definition 5.1])

We are now ready to describe how each claim in [1] changes (or stays the same).

[1, Lemma 7.1] still holds with our new definition of stopped: the same proof still works.

The (unnumbered) claim in the first paragraph of [1, Section 8] still goes through, with our modified definition of roundable (i.e., the one presented in this section).

In [1, Section 8.1], both [1, Lemma 8.1] and [1, Lemma 8.2] still hold with our new definition of stopped, with essentially no changes to the proof. Likewise, in [1, Section 8.2], Lemmas 8.3, 8.4, 8.5 and Corollary 8.6 in [1] still hold with our new definition of stopped.

In [1, Section 8.3], we change the definition of $\mathcal{B}$ , the “potentially bad” clients (see [1, Equation (8.1)]), to be

\mathcal{B}=\left\{j\in\mathcal{D}:j\text{ is undecided and }(1+\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}<d(j,j^{\prime})+\frac{1}{\kappa}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}\right\},

where $\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}=\sqrt{\alpha_{j}^{(0)}}$ refers to the value of $\alpha_{j}$ at the start of a RaisePrice(s)olution (i.e., when the solution $\mathcal{S}$ is labeled as $\mathcal{S}^{(0)}$ in our Algorithm 2). This contrasts to the original definition of $\mathcal{B}$ , which was the set of undecided clients with $2\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}<d(j,j^{\prime})+6\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}$ , in a similar way to how our definition of stopped contrasts with the original definition.

[1, Lemma 8.7] is now as follows. For any $(\alpha,z)$ produced during RaisePrice, for every client $j\in\mathcal{D}$ the following holds:

•

If $j\in\mathcal{D}\backslash\mathcal{B}$ then there exists a tight facility $i$ such that $(1+A+\varepsilon/\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,i)\cdot A\cdot\sqrt{t_{i}}$ for all $A\in[2\kappa,1/(2\kappa)]$ .
•

There exists a tight facility $i$ such that $\frac{1}{\kappa}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}\geq d(j,i)+A\cdot\sqrt{t_{i}}$ for all $A\in[2\kappa,1/(2\kappa)]$ .

Again, the same proof holds.

We now move to [1, Section 8.4]. We update [1, Lemma 8.8] to be that if $j$ has a tight edge to some facility $i$ , then $\alpha_{j^{\prime}}\leq\frac{5^{2}}{\kappa^{4}}\cdot\alpha_{j}$ for any $j^{\prime}$ with a tight edge to $i$ . In the proof, we would replace the stronger statement [1, Equation (8.2)] with: $(1+\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j^{\prime}}\leq d(j^{\prime},j)+\frac{4}{\kappa}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}$ . In addition, we would update the Claim inside the proof of [1, Lemma 8.8] to be that: there is some tight facility $i^{*}$ in $(\alpha^{(0)},z^{(0)})$ and also:

d(j_{1},i^{*})\leq(1+\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j_{1}}^{(0)}\leq(1+\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\hskip 14.22636pt\text{and}\hskip 14.22636pt\alpha_{j^{\prime\prime}}^{(0)}\leq(1+\varepsilon)\alpha_{j_{1}}^{(0)}\leq(1+\varepsilon)\alpha_{j}\text{ for all }j^{\prime\prime}\in N^{(0)}(i^{*}).

Up to these changes, the rest of the proof of Lemma 8.8 in [1] is essentially unchanged.

Next, we update [1, Lemma 8.9] to replace “ $\alpha_{j}^{(0)}\geq 20^{2}\theta_{s}$ or $\alpha_{j}\geq 20^{2}\theta_{s}$ ” with “ $\alpha_{j}^{(0)}\geq\frac{6^{2}}{\kappa^{4}}\theta_{s}$ or $\alpha_{j}\geq\frac{6^{2}}{\kappa^{4}}\theta_{s}$ ” - again the same proof still holds.

[1, Proposition 8.10] and its proof still hold, except that we have to replace $C_{1}=\lceil\log_{1+\varepsilon}(20^{4})\rceil$ with $C_{1}=\lceil\log_{1+\varepsilon}6^{4}/\kappa^{8}\rceil=O(\varepsilon^{-1}\log\kappa^{-1})$ . So, if we set $\varepsilon_{z}=n^{-6(K+C_{1}+2)-3}$ for our new choice of $C_{1},$ Proposition 8.10 in [1] holds.

We update [1, Proposition 8.11] to say that if $j\in\mathcal{B}$ for some $(\alpha,z)$ produced by RaisePrice, then $\kappa^{4}\cdot\theta_{s}\leq\alpha_{j}^{(0)}\leq\frac{6^{4}}{\kappa^{12}}\theta_{s}$ . We replace the last equation in the Claim in the Proposition’s proof with: $\kappa^{2}\cdot\theta_{s}\leq\alpha_{j^{\prime}}^{(0)}\leq\frac{6^{4}}{\kappa^{8}}\cdot\theta_{s}$ . The same proof still holds. We also update the definition of $\mathcal{W}(\sigma)$ to be the set $\{j\in\mathcal{D}:\kappa^{8}/6^{2}\cdot\theta_{s}\leq\alpha_{j}^{(0)}\leq 6^{6}/\kappa^{16}\cdot\theta_{s}\text{ for some s}\},$ where RaisePrice defines the parameters $\theta_{s}$ based on the shift parameter $\sigma\in[0,K/2)$ . With these definitions, and our modified choice of $K=\Theta(\varepsilon^{-1}\gamma^{-4}\log\kappa^{-1})$ , we will have that [1, Corollary 8.12] still holds.

We now move to [1, Section 8.5]. We keep their definitions of $\gamma$ -close neighborhoods and of dense facilities and clients. We also let the sets $\mathcal{F}_{D},\mathcal{D}_{D},\mathcal{F}_{S}^{(\ell)},\mathcal{D}_{S}^{(\ell)}(i)$ , and $\mathcal{D}_{B}$ be defined the same way (modulo our change in definition of $\mathcal{B}$ ). We also define $\tau_{i}$ the same way, and we will let $H^{(0)}$ and $IS^{(0)}$ simply represent the conflict graphs $H^{(0)}(\delta_{1})$ and $IS_{1}^{(0)}$ as generated by our Algorithm 2, respectively, at each iteration corresponding to making a new solution $\mathcal{S}^{(0)}$ . Note that we are choosing $\delta=\delta_{1}=\frac{4+8\sqrt{2}}{7}$ , so $\sqrt{2}\leq\sqrt{\delta}\leq 2$ .

With these, it is quite simple to see that [1, Lemma 8.13] still holds, where the choice $\rho=\rho(0)$ in the proof is the approximation constant of the LMP algorithm in [1] with only a single parameter $\rho$ based on $\delta=\frac{4+8\sqrt{2}}{7}.$ In addition, [1, Lemma 8.14] still holds, except that we replace $\text{OPT}_{k}$ with $\text{OPT}_{k^{\prime}}$ , since the final inequality in the proof relates $\sum_{j\in\mathcal{D}_{>\gamma}}d(j,IS^{(0)})^{2}\leq\sum_{j\in\mathcal{D}}d(j,IS^{(0)})^{2}$ to $\text{OPT}_{k^{\prime}}$ now since $k^{\prime}$ is the minimum of $k$ and all sizes of the sets that become $IS^{(0)}$ at some point. [1, Corollary 8.15] (and the following Remark 8.16) also hold, due to our updated definition of $\mathcal{W}(\sigma)$ and $K$ .

Now, we update [1, Lemma 8.17] to say: for any $j\in\mathcal{D}_{D}\cap\mathcal{B}$ , either:

•

There exists a tight facility $i\in\mathcal{F}$ such that for all $A\in[2\kappa,1/(2\kappa)]$ , $(1+A+10\cdot\varepsilon/\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,i)+A\cdot\sqrt{t_{i}}$
•

There exists a special facility $i\in\mathcal{F}_{S}$ such that for all $A\in[2\kappa,1/(2\kappa)]$ , $(1+A+10\cdot\varepsilon/\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,i)+A\cdot\sqrt{\tau_{i}}.$

Again, the proof holds with minimal change.

We now move to [1, Section 8.6], the final section of Ahmadian et al.’s analysis. We first look at how [1, Proposition 8.18] changes. The fact that $\alpha$ is feasible for $\text{DUAL}(\lambda+\frac{1}{n})$ , that $\alpha_{j}\geq 1$ for all $j$ , and that $z_{i}\in[\lambda,\lambda+\frac{1}{n}]$ are all still true. We now have that $(1+A+10\varepsilon/\kappa)^{2}\alpha_{j}\geq(d(j,i)+A\sqrt{\tau_{i}})^{2}$ for all clients $j$ not in $\mathcal{B}\backslash\mathcal{D}_{D}\subset\mathcal{D}_{B}$ by using our modified versions of Lemma 8.7 and Lemma 8.17. In addition, our modified Lemma 8.7 tells us that even for bad clients $j\in\mathcal{D}_{B}$ , there exists a tight facility $i$ such that $\kappa^{-2}\cdot\alpha_{j}^{(0)}\geq(d(j,i)+A\cdot\sqrt{t_{i}})^{2}$ for all $A\in[2\kappa,1/(2\kappa)]$ . Hence, we precisely have that $\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2}$ for all $A\in[2\kappa,1/(2\kappa)]$ , by adding over all clients $j\in\mathcal{D}_{B}$ and using Corollary 8.15, which still holds unchanged, apart from replacing $\text{OPT}_{k}$ with $\text{OPT}_{k^{\prime}}$ . The final part of proving [1, Proposition 8.18], i.e., verifying Condition 4 in our definition 5.1, holds where the only change is replacing $\text{OPT}_{k}$ with $\text{OPT}_{k^{\prime}}$ . Thus, we have that each solution that is generated is $(\lambda,k^{\prime})$ -roundable.

Finally, we have that [1, Theorem 8.19] still holds with essentially no change, meaning that each call to RaisePrice takes polynomial time and generates a polynomial number of $(\lambda,k^{\prime})$ -roundable solutions for our modified definition of roundable. This also implies that Algorithm 2 runs in polynomial time, since GraphUpdate clearly takes polynomial time, and since the total number of times we call RaisePrice is at most $|\mathcal{F}|\cdot L,$ which is polynomial since $\varepsilon_{z}^{-1}$ and $m=|\mathcal{F}|$ are both polynomial in $n$ .

Now, we note that each time we update our quasi-independent set in GraphUpdate, the new set $(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)})$ only depends on $I_{1}^{(\ell,r-1)}$ and has no dependence on our choice of $I_{2}^{(\ell,r-1)}$ or $I_{3}^{(\ell,r-1)}$ . Therefore, if we ignore the sets $I_{2},I_{3}$ and only focus on $I_{1}$ , the procedure of generating the sequence $\{I_{1}^{(\ell,r)}\}$ is in fact identical to the procedure in Ahmadian et al.[1]. The only difference is that we choose our stopping point based on the first time that $|I_{1}^{(\ell,r)}|+p_{1}\cdot|I_{2}^{(\ell,r)}\cup I_{3}^{(\ell,r)}|<k,$ as opposed to the first time that $|I_{1}^{(\ell,r)}|\leq k$ as done in [1]. Because of this, our Algorithm 2 in fact works exactly as the main algorithm in [1] if we only focus on $I_{1}^{(\ell,r)}$ and set $\delta=\delta_{1}=\frac{4+8\sqrt{2}}{7}.$ The only differences are the way we choose when to stop the procedure and the way we update $K$ and the definition of stopped clients and definition of the set $\mathcal{B}$ .

As a result, we have that each solution $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ generated is $\lambda$ is $(\lambda,k)$ -roundable, up to our modified definition of roundable (where $\lambda$ is chosen accordingly). By this, we mean that we replace Condition 3 in 5.1 with:

a)

For all $j\in\mathcal{D}\backslash\mathcal{D}_{B}$ and all $A\in[2\kappa,1/(2\kappa)],$ $(1+A+10\varepsilon/\kappa)^{2}\cdot\alpha_{j}\geq(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2}$ .
b)

For all $j\in\mathcal{D}$ and all $A\in[2\kappa,1/(2\kappa)]$ , $\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(d(j,w(j)+A\cdot\sqrt{\tau_{w(j)}})^{2}$ .

Recall that $\varepsilon=\kappa^{2}\ll 1$ . Now, by setting $A=2\kappa$ , we have that

(1+12\kappa)^{2}\cdot\alpha_{j}=\left(1+2\kappa+\frac{10\varepsilon}{\kappa}\right)^{2}\cdot\alpha_{j}\geq\left(d(j,w(j))+2\kappa\cdot\sqrt{\tau_{w(j)}}\right)^{2}\geq c(j,w(j)),

and by setting $A=1/(2\kappa)$ , we have

\left(\frac{1+12\kappa}{2\kappa}\right)^{2}\cdot\alpha_{j}=\left(1+\frac{1}{2\kappa}+10\cdot\frac{\varepsilon}{\kappa}\right)^{2}\cdot\alpha_{j}\geq\left(d(j,w(j))+\frac{1}{2\kappa}\cdot\sqrt{\tau_{w(j)}}\right)^{2}\geq\frac{1}{(2\kappa)^{2}}\cdot\tau_{w(j)}.

Finally, by setting $A=1$ , we have

\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}\left(d(j,w(j))+\sqrt{\tau_{w(j)}}\right)^{2}\geq\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\tau_{w(j)}\right).

Therefore, by setting $\varepsilon^{\prime}=(1+12\kappa)^{2}-1=O(\sqrt{\varepsilon})$ , we have that $(1+\varepsilon^{\prime})\cdot\alpha_{j}\geq c(j,w(j))$ and $(1+\varepsilon^{\prime})\cdot\alpha_{j}\geq\tau_{w(j)}$ . In addition, if we set $\gamma^{\prime}=\kappa^{-2}\cdot\gamma$ , we have that $\gamma^{\prime}\cdot\text{OPT}_{k}\geq\sum_{j\in\mathcal{D}_{B}}(c(j,w(j))+\tau_{w(j)}).$ Therefore, since $\varepsilon=\kappa^{2}$ , and if we assume that $\gamma\ll\varepsilon^{2}$ , then we have that $\varepsilon^{\prime}\ll 1$ and $\gamma^{\prime}\ll\varepsilon\ll\varepsilon^{\prime}$ . So, we can replace $\varepsilon^{\prime}$ with $\varepsilon$ and $\gamma^{\prime}$ with $\gamma$ , we have an algorithm that still runs in polynomial time (since the old values of $\kappa,\gamma,\varepsilon$ are still polynomial factors in the new values of $\varepsilon,\gamma$ which are all constants, even if they are arbitrary small). But now, we have that each solution $(\alpha^{(\ell)},z^{(\ell)},\mathcal{F}_{S}^{(\ell)},\mathcal{D}_{S}^{(\ell)})$ satisfies the actual Condition 3 in Definition 5.1, for our new values of $\varepsilon$ and $\gamma$ .

Overall, we have that the algorithm runs in polynomial time, and each solution is $k^{\prime}$ -roundable, where $k^{\prime}$ is the minimum of $k$ and $\min|I_{1}^{(0)}|$ over the course of the algorithm. Each pair of consecutive solutions is close as in Theorem 8.19 in [1] (which follows from their Proposition 8.10). Next, we have that each time we create a solution $\mathcal{S}^{(0)}$ , Lemma 8.1 in [1], which holds in our setting, tells us that every client $(\alpha^{(0)},z^{(0)})$ is decided. Since $\mathcal{B}$ is a subset of undecided facilities, $\mathcal{B}=\emptyset$ for a solution $\mathcal{S}^{(0)}$ , which means that $\mathcal{F}_{S}=\emptyset$ based on the definition of $\mathcal{F}_{S}$ . In addition, our modified version of [1, Lemma 8.7] holds for all $j$ since $\mathcal{B}=\emptyset,$ which means that we can set the bad clients $\mathcal{D}_{B}$ to be $\emptyset$ . So, for each $\mathcal{S}^{(0)}$ , the special facilities and bad clients are both empty. Next, we have that $I^{(\ell,r)}$ is a nested quasi-independent set because of how we defined $\mathcal{V}^{(\ell,r)}$ and how we defined $I^{(\ell,r)}$ in our GraphUpdate procedure. Finally, we had that $|\mathcal{V}^{(\ell,r)}\backslash\mathcal{V}^{(\ell,r+1)}|\leq 1$ as described at the end of Subsection 5.1, and that we created $I_{1}^{(\ell,r+1)}$ from $I_{1}^{(\ell,r)}$ by removing a single point from $\mathcal{V}^{(\ell,r)}$ if $|\mathcal{V}^{(\ell,r)}\backslash\mathcal{V}^{(\ell,r+1)}|=1$ (which may or may not be in $I_{1}^{(\ell,r)}$ ), and then extending to a maximal independent set of $\mathcal{V}^{(\ell+1,r)}$ . So, we have that $|I_{1}^{(\ell,r)}\backslash I_{1}^{(\ell,r+1)}|\leq 1.$ This means that all of the statements of Theorem 5.7 hold.

Appendix E Limit of [1] in Obtaining Improved Approximations

In this section, we show that the algorithm of Ahmadian et al. [1] cannot guarantee an LMP approximation better than $1+\sqrt{2}$ in the case of $k$ -median. In more detail, we show that there exists a set of clients $\mathcal{D}$ , facilities $\mathcal{F}$ , and parameter $\lambda>0$ such that for any choice $\delta\geq 1$ in the pruning phase, the LMP algorithm described in the preliminary subsection 5.1 does not obtain better than a $(1+\sqrt{2})$ -approximation for $k$ -median. As a result, their technique cannot guarantee an LMP approximation for all choices $\lambda$ , which means any improvement to their analysis would have to move significantly outside the LMP framework.

We start with the $k$ -median case. First, consider the points $j$ , $i_{1}=j_{1}$ , and $i_{2}=j_{2}$ such that $j,j_{1},j_{2}$ are collinear in that order, $d(j,j_{1})=T$ , and $d(j_{1},j_{2})=\sqrt{2}\cdot T$ for some choice of $T>0$ . Consider applying the LMP algorithm described in Section 2.2 on just these points $\mathcal{F}=\{i_{1},i_{2}\}$ and $\mathcal{D}=\{j,j,\dots,j,j_{1},j_{2}\}$ , where we set $\lambda=T$ and will include a large number $N$ of copies of $j$ . In this case, the growing phase will set $\alpha_{j}=\alpha_{j_{1}}=\alpha_{j_{2}}=T$ , where $i_{1}$ and $i_{2}$ both become tight. Also, $N(i_{1})=\{j_{1}\}$ (with each copy of $j$ barely not being in it) and $N(i_{2})=\{j_{2}\}$ . One also obtains that $t_{i_{1}}=t_{i_{2}}=T$ . Then, if $\delta\geq\sqrt{2}$ , then $i_{1},i_{2}$ are connected in the conflict graph $H(\delta),$ which means that the pruning phase will only allow either $i_{1}$ or $i_{2}$ to be in our set $S$ . The algorithm is arbitrary, and may set $S=\{i_{2}\}$ . In this case, the total clustering cost is $N\cdot T\cdot(1+\sqrt{2})+\sqrt{2}\cdot T=(1+\sqrt{2})\cdot T\cdot(N+1)-T,$ whereas the dual is $\sum\alpha_{j}-\lambda\cdot 1=T\cdot(N+2)-T=T\cdot(N+1)$ . If $\delta<\sqrt{2},$ then both $i_{1},i_{2}$ are included, so the primal is $T\cdot N$ and the dual is $\sum\alpha_{j}-\lambda\cdot 2=T\cdot N$ .

Next, we consider a point $j$ as well as points $i_{1},\dots,i_{h}$ , such that $i_{1},\dots,i_{h}$ form a regular simplex with centroid $j$ and pairwise distances $T^{\prime}\cdot\sqrt{2}\cdot(1-\varepsilon)$ between each $i_{r}$ and $i_{s}$ , for some $T^{\prime}>0$ and arbitrarily small $\varepsilon$ . Consider applying the LMP algorithm described in Section 2.2 on just these points $\mathcal{F}=\{i_{1},\dots,i_{h}\}$ and $\mathcal{D}=\{j\}$ , where we set $\lambda=T^{\prime}\cdot\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)$ . In this case, we will have that since $d(\alpha_{j},i_{r})=T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}$ for all $1\leq r\leq h$ , all facilities $i_{r}$ will become tight with $\alpha_{j}=t_{i_{r}}=T^{\prime}$ for all $1\leq r\leq h$ . If $\delta<\sqrt{2}$ , since the pairwise distances are more than $T^{\prime}\cdot\delta$ , the conflict graph will be empty so all facilities will be in the independent set. Therefore, the clustering cost will be $T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}$ , and the dual will be

\alpha_{j}-\lambda\cdot h=T^{\prime}\left(1-h\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)\right)=T^{\prime}\cdot\left((1-\varepsilon)\sqrt{h(h-1)}-(h-1)\right)\leq T^{\prime}\cdot\left(1-\varepsilon\cdot h\right).

Else, if $\delta\geq\sqrt{2}$ , the conflict graph $H(\delta)$ is complete on $i_{1},\dots,i_{h}$ , so only one facility will be in the independent set. The clustering cost is still $T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}$ , and the dual will be

\alpha_{j}-\lambda\cdot 1=T^{\prime}\left(1-\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)\right)=T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}.

Now, we fix $\varepsilon$ as a very small constant, and $h=\Theta(\varepsilon^{-3})$ . Finally, we set $T=1=\lambda$ and $T^{\prime}=1/\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)=\Theta(\varepsilon^{-1})$ . Finally, we set $N=\Theta(\varepsilon^{-2})$ , and consider the concatenation of each of the two cases described above, where the corresponding cases are sufficiently far apart in Euclidean space that there is no interaction.

If $\delta\geq\sqrt{2}$ , then the overall clustering cost is

(1+\sqrt{2})\cdot T\cdot(N+1)-T+T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}=(1+\sqrt{2})\cdot N\cdot(1+O(\varepsilon))

whereas the total dual is

T\cdot N+T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}=N\cdot(1+O(\varepsilon)).

So, we do not obtain better than a $1+\sqrt{2}-O(\varepsilon)$ approximation in this case. If $\delta<\sqrt{2}$ , then the total dual is in fact negative, as it is at most

T\cdot N+T^{\prime}\cdot(1-\varepsilon\cdot h)\leq-\Omega(\varepsilon^{-3}).

Overall, there is no choice of $\delta$ that we can set to improve over a $1+\sqrt{2}$ approximation.

	$\displaystyle\|S\|$	$\displaystyle\leq\|I_{1}\|+(p-\frac{1}{C})\cdot\|I_{2}\cup I_{3}\|+\sqrt{10C\cdot C\cdot\|I_{2}\cup I_{3}\|}$
		$\displaystyle\leq\|I_{1}\|+p\|I_{2}\cup I_{3}\|-\frac{1}{C}\|I_{2}\cup I_{3}\|+\sqrt{10}C\cdot\sqrt{\|I_{2}\cup I_{3}\|}$
		$\displaystyle\leq\|I_{1}\|+p\|I_{2}\cup I_{3}\|=k,$

$\displaystyle\mathbb{E}\left[\text{cost}(\mathcal{D},S_{0})\right]$	$\displaystyle\leq\rho(p^{\prime})\cdot(1+O(\varepsilon))\cdot\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[\|S_{0}\|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
	$\displaystyle=\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(\|I_{1}\|+\left(p-\frac{2}{C}\right)\cdot\|I_{2}\cup I_{3}\|\right)\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
	$\displaystyle=\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{2}{C}\cdot\left(\lambda-\frac{1}{n}\right)\cdot\|I_{2}\cup I_{3}\|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.$	(16)

	$\displaystyle\text{cost}(\mathcal{D},S)$	$\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda-\frac{1}{n}\right)\cdot\|S\|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda+\frac{2}{n}\right)\cdot\|S\|\right]+O(1/n)\cdot\|S\|+O(\gamma)\cdot\text{OPT}_{k^{\prime}}$
		$\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{\|S\|}+O(\gamma)\cdot\text{OPT}_{k}.$

	$\displaystyle p\cdot\\|C-A\\|_{2}^{2}+(1-p)\cdot\\|D-A\\|_{2}^{2}$	$\displaystyle=p\cdot\\|v_{1}+v_{2}\\|_{2}^{2}+(1-p)\cdot\\|v_{1}+v_{3}\\|_{2}^{2}$
		$\displaystyle=\\|v_{1}\\|_{2}^{2}+2\cdot\langle v_{1},pv_{2}+(1-p)v_{3}\rangle+p\cdot\\|v_{2}\\|_{2}^{2}+(1-p)\cdot\\|v_{3}\\|_{2}^{2}$
		$\displaystyle\leq 1+2\cdot\\|pv_{2}+(1-p)v_{3}\\|_{2}+p\cdot\\|v_{2}\\|_{2}^{2}+(1-p)\cdot\\|v_{3}\\|_{2}^{2},$

Improved Approximations for Euclidean kk-means and kk-median, via Nested Quasi-Independent Sets

Abstract

1 Introduction

1.1 Our Results

Theorem 1.1.

1.2 Related Work

1.3 Roadmap

2 Preliminaries

2.1 The Lagrangian LP Relaxation and LMP Solutions

2.2 Witnesses and Conflict Graphs

3 Technical Overview

Improved LMP Approximation:

Polynomial-Time Algorithm:

4 LMP Approximation for Euclidean kk-means

4.1 The algorithm and setup

Proposition 4.1.

4.2 Main lemma

Lemma 4.2.

Remark.

Proof of Lemma 4.2.

Proposition 4.3.

Proof.

Proposition 4.4.

Case 1: 𝒂=𝟎,𝒃=𝟏,𝒄=𝟎\boldsymbol{a=0,b=1,c=0}.

Case 2: 𝒂=𝟎,𝒃=𝟏,𝒄≥𝟏\boldsymbol{a=0,b=1,c\geq 1}.

Proposition 4.5.

Proof.

Proposition 4.6.

Proof.

Case 3: 𝒂=𝟎,𝒃≥𝟐\boldsymbol{a=0,b\geq 2}.

Case 4: 𝒂=𝟎,𝒃=𝟎\boldsymbol{a=0,b=0}.

Case 5: 𝒂≥𝟏\boldsymbol{a\geq 1}.

Proposition 4.7.

5 Polynomial-time Approximation Algorithm for Euclidean kk-means

5.1 The algorithm and setup

Definition 5.1.

5.2 Additional preliminaries

Lemma 5.2.

Proof.

Definition 5.3.

Proposition 5.4.

Proposition 5.5.

Proposition 5.6.

5.3 Analysis

Theorem 5.7.

Lemma 5.8.

Proof.

Lemma 5.9.

Remark.

Proof of Lemma 5.9.

Lemma 5.10.

Proof.

Lemma 5.11.

Remark.

Proof.

Lemma 5.12.

Proof.

Lemma 5.13.

Proof.

Lemma 5.14.

Remark.

Proof.

Proposition 5.15.

Proof.

Lemma 5.16.

Proof.

Theorem 5.17.

Proof.

5.4 Improving the approximation further

Lemma 5.18.

Remark.

Proof.

Lemma 5.19.

Proof.

6 Improved Approximation Algorithm for kk-median

6.1 Improvement to 1+21+\sqrt{2}-approximation

Lemma 6.1.

Proof.

Case 1: |𝑰∩𝑵​(𝒋)|=𝟎\boldsymbol{|I\cap N(j)|=0}.

Case 2: |𝑰∩𝑵​(𝒋)|=𝟏\boldsymbol{|I\cap N(j)|=1}.

Improved Approximations for Euclidean $k$ -means and $k$ -median, via Nested Quasi-Independent Sets

4 LMP Approximation for Euclidean $k$ -means

Case 1: $\boldsymbol{a=0,b=1,c=0}$ .

Case 2: $\boldsymbol{a=0,b=1,c\geq 1}$ .

Case 3: $\boldsymbol{a=0,b\geq 2}$ .

Case 4: $\boldsymbol{a=0,b=0}$ .

Case 5: $\boldsymbol{a\geq 1}$ .

5 Polynomial-time Approximation Algorithm for Euclidean $k$ -means

6 Improved Approximation Algorithm for $k$ -median

6.1 Improvement to $1+\sqrt{2}$ -approximation

Case 1: $\boldsymbol{|I\cap N(j)|=0}$ .

Case 2: $\boldsymbol{|I\cap N(j)|=1}$ .

Case 3: $\boldsymbol{|I\cap N(j)|=s\geq 2}$ .

6.2 An improved LMP algorithm for Euclidean $k$ -median

Case 1: $\boldsymbol{a=0,b=1,c=0}$ .

Case 2: $\boldsymbol{a=0,b=0,c\leq 1}$ .

Case 3: $\boldsymbol{a=0}$ , all other cases.

Case 4: $\boldsymbol{a\geq 1}$ .

6.3 Improved $k$ -median approximation

Appendix C Numerical Analysis for Euclidean $k$ -means and $k$ -median

C.1 The $k$ -means case

C.2 The $k$ -median case