
Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets

Vincent Cohen-Addad (Google Research), Hossein Esfandiari (Google Research), Vahab Mirrokni (Google Research), and Shyam Narayanan (Massachusetts Institute of Technology; work done as an intern at Google Research).
Abstract

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani [30] and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward [1]. Our algorithm achieves an approximation ratio of $2.406$ for Euclidean $k$-median and $5.912$ for Euclidean $k$-means, improving upon the $2.633$ approximation ratio of Ahmadian et al. [1] and the $6.1291$ approximation ratio of Grandoni, Ostrovsky, Rabani, Schulman, and Venkat [25].

Our techniques involve a much stronger exploitation of the Euclidean metric than previous work on Euclidean clustering. In addition, we introduce a new method of removing excess centers using a variant of independent sets over graphs that we dub a “nested quasi-independent set”. In turn, this technique may be of interest for other optimization problems in Euclidean and $\ell_p$ metric spaces.

1 Introduction

The $k$-means and $k$-median problems are among the oldest and most fundamental clustering problems. Originally motivated by operations research and statistics problems when they first appeared in the late 50s [43, 37, 28], they are now at the heart of several unsupervised and semi-supervised machine learning models and data mining techniques, and are thus a main part of the toolbox of modern data analysis in a variety of fields. In addition to their practical relevance, these two problems exhibit strong ties with some classic optimization problems, such as set cover, and understanding their complexity has thus been a long-standing problem which has inspired several breakthrough techniques.

Given two sets $\mathcal{D}$, $\mathcal{F}$ of points in a metric space and an integer $k$, the goal of the $k$-median problem is to find a set $S$ of $k$ points in $\mathcal{F}$, called centers, minimizing the sum of distances from each point in $\mathcal{D}$ to the closest point in $S$. The goal of the $k$-means problem is to minimize the sum of squared distances. The complexity of the $k$-median and $k$-means problems heavily depends on the underlying metric space. In general metric spaces, namely when the distances only need to obey the triangle inequality, the $k$-median and $k$-means problems are known to admit a 2.675-approximation [9] and a 9-approximation [1], respectively, and cannot be approximated better than $1+2/e\sim 1.736$ for $k$-median and $1+8/e\sim 3.943$ for $k$-means assuming P $\neq$ NP. We know that the upper and lower bounds are tight when we allow a running time of $f(k)\,n^{O(1)}$ (i.e., in the fixed-parameter tractability setting) for arbitrary computable functions $f$ [17], suggesting that the lower bounds cannot be improved for the general case either. When we turn to slightly more structured metric spaces such as Euclidean metrics, the picture changes drastically. While the problem remains NP-hard when the dimension $d=2$ (and $k$ is large) [41] or $k=2$ (and $d$ is large) [23], both problems admit $(1+\varepsilon)$-approximation algorithms with running times $f(k,\varepsilon)\,nd$ [33], with an exponential dependency in $k$, and $f(d,\varepsilon)\,n\log^{O(1)}n$ [15, 14], with a doubly exponential dependency in $d$ (the latter extends to doubling metrics), a prohibitive running time in practice.

Arguably, the most practically important setting is when the input points lie in Euclidean space of large dimension and the number of clusters is non-constant, namely when both $k$ and $d$ are part of the input. (Note that $d$ can always be assumed to be $O(\log k/\varepsilon^{2})$ using dimensionality reduction techniques [38]; see also [8] for a slightly worse bound.) Unfortunately, the complexity of the problem in this regime is far from being understood. Recent results have proven new inapproximability bounds: respectively 1.17 and 1.07 for the Euclidean $k$-means and $k$-median problems assuming P $\neq$ NP, and 1.73 and 1.27 assuming the Johnson Coverage hypothesis of [18, 19]. For the continuous case, the same series of papers shows a hardness of 1.06 and 1.015 for Euclidean $k$-means and $k$-median respectively assuming P $\neq$ NP, and 1.36 and 1.08 assuming the Johnson Coverage hypothesis (see also [21] for further related work on continuous $k$-median and $k$-means in other metric spaces). Interestingly, the above hardness results imply that there is no algorithmic benefit that could be gained from the $\ell_1$-metric: assuming the Johnson Coverage hypothesis, the hardness bounds for $k$-median and $k$-means are the same in the $\ell_1$-metric as in general metrics [19]. However, it seems plausible to leverage the structure of the $\ell_2$-metric to obtain approximation algorithms bypassing the lower bounds for the general metric or $\ell_1$ case (e.g., obtaining an approximation ratio better than $1+2/e$ for $k$-median).

In a breakthrough result, Ahmadian et al. [1] were the first to exploit the structure of high-dimensional Euclidean metrics to obtain better bounds than the current best-known bounds for the general metric case. Concretely, they showed how to obtain a 6.3574-approximation for $k$-means (improving upon the 9-approximation of Kanungo et al. [31]) and a $1+\sqrt{8/3}+\varepsilon\approx 2.633$-approximation for $k$-median (improving upon the 2.675-approximation for general metrics [9]). The bound for $k$-means was recently improved to a 6.1291-approximation (or more precisely, the unique real root of $4x^{3}-24x^{2}-3x-1=0$) by [25], by tweaking the analysis of Ahmadian et al. [1], and no progress has been made for Euclidean $k$-median since [1].

This very active line of research mixes both new hardness of approximation results and approximation algorithms and aims at answering a very fundamental question: How much can we leverage the Euclidean geometry to obtain better approximation algorithms? And conversely, what do we learn about Euclidean geometry when studying basic computational problems? Our results aim at making progress toward answering the above questions.

1.1 Our Results

Our main result consists of better approximation algorithms for both $k$-median and $k$-means, with ratio $2.406$ for $k$-median and $5.912$ for $k$-means, improving upon the $2.633$-approximation of Ahmadian et al. [1] for $k$-median and the $6.1291$-approximation of [25] for $k$-means.

Theorem 1.1.

For any $\varepsilon>0$, there exists a polynomial-time algorithm that returns a solution to the Euclidean $k$-median problem whose cost is at most $2.406+\varepsilon$ times the optimum.

For any $\varepsilon>0$, there exists a polynomial-time algorithm that returns a solution to the Euclidean $k$-means problem whose cost is at most $5.912+\varepsilon$ times the optimum.

Our approximation ratio for $k$-median breaks the natural barrier of $1+\sqrt{2}>2.41$, and our approximation ratio for $k$-means is the first below 6. The approximation bound of $1+\sqrt{2}$ for Euclidean $k$-median is indeed a natural barrier for the state-of-the-art approach of Ahmadian et al. [1], which relies on the primal-dual approach of Jain and Vazirani [30]. At a high level, the approximation bound of 3 for general metrics for the algorithm of Jain and Vazirani can be interpreted as an approximation bound of $1+2$, where 1 accounts for the optimum service cost and an additional cost of 2 times the optimum is added for input clients poorly served by the solution. Since general metrics are only required to satisfy the triangle inequality, the 2 naturally arises from bounding the distance from a client to its center in the output solution via an application of the triangle inequality. One can therefore hope for a substantial gain when working in Euclidean spaces: the triangle inequality is rarely tight (unless points are aligned), and this leads to the hope of replacing the 2 by $\sqrt{1+1}=\sqrt{2}$ in the above bound, making the approximation ratio of $1+\sqrt{2}$ a natural target for $k$-median. In fact, this high-level discussion can be made slightly more formal: we show that the analysis of the result of Ahmadian et al. [1] cannot be improved below $1+\sqrt{2}$ for $k$-median, exhibiting a limit of the state-of-the-art approaches.

In this paper, we take one step further: similar to the result of Li and Svensson [36] for the general metric case, who were the first to improve below the approximation ratio of 3, we show how to bypass $1+\sqrt{2}$ for Euclidean $k$-median (and the bound of $6.1291$ for $k$-means).

Furthermore, one of our main contributions is to obtain better Lagrangian Multiplier Preserving (LMP) approximations for the Lagrangian relaxations of both problems. To understand this, we need to give a little more background on how previous approximation algorithms were derived. A natural approach to the $k$-median and $k$-means problems is to (1) relax the constraint on the number of centers $k$ in the solution, (2) find an approximate solution for the relaxed problem, and (3) derive from it an approximate solution that satisfies the constraint on the number of centers. Roughly, an LMP approximate solution $S$ is a solution where we bound the ratio of the cost of $S$ to the optimum cost, but pay a penalty proportional to some $\lambda\geq 0$ for each center in $S$. Importantly, if $|S|=k$, an LMP $\rho$-approximation that outputs $S$ is a $\rho$-approximation for $k$-means (or $k$-median). We formally define an LMP approximation in Section 2. LMP solutions have played a central role in obtaining better approximation algorithms for $k$-median in general metrics and, more recently, in high-dimensional Euclidean metrics. Thus, obtaining better LMP solutions for the Lagrangian relaxations of $k$-median and $k$-means has been an important problem. A byproduct of our approach is a new LMP $2.395$-approximation for Euclidean $k$-median and an LMP $(3+2\sqrt{2})$-approximation for Euclidean $k$-means.

Our techniques may be of use in other clustering and combinatorial optimization problems over Euclidean space as well, such as Facility Location. In addition, by exploiting the geometric structure similarly, these techniques likely extend to $\ell_p$-metric spaces (for $p>1$).

1.2 Related Work

The first $O(1)$-approximation for the $k$-median problem in general metrics is due to Charikar et al. [12]. The $k$-median problem has since been a testbed for a variety of powerful approaches such as the primal-dual schema [30, 10], greedy algorithms (and dual fitting) [29], improved LP rounding [13], local search [4, 16], and LMP-based approximation [36]. The current best approximation guarantee is 2.675 [9] and the best hardness result is $(1+2/e)$ [26]. For $k$-means in general metrics, the current best approximation guarantee is 9 [1] and the current best hardness result is $(1+8/e)$ (which implicitly follows from [26], as noted in [1]).

We have already covered in the introduction the history of algorithms for high-dimensional Euclidean $k$-median and $k$-means with running time polynomial in both $k$ and the dimension and that leverage the properties of Euclidean metrics ([31, 1, 25]). In terms of lower bounds, Guruswami and Indyk [27] were the first to show that the high-dimensional $k$-median and $k$-means problems are APX-hard, and later Awasthi et al. [6] showed that the APX-hardness holds even if the centers can be placed arbitrarily in $\mathbb{R}^{d}$. The inapproximability bound was later slightly improved by Lee et al. [34], until the recent best-known bounds of [18, 19]. From a more practical point of view, Arthur and Vassilvitskii showed that the widely used heuristic of Lloyd [37] can lead to solutions with arbitrarily bad approximation guarantees [3], but can be improved by a simple seeding strategy, called $k$-means++, so as to guarantee that the output is within an $O(\log k)$ factor of the optimum [2].

For fixed $k$, there are several known approximation schemes, typically using small coresets [8, 24, 33]. There also exists a large body of bicriteria approximations (namely, outputting a solution with $(1+c)k$ centers for some constant $c>0$): see, e.g., [7, 11, 20, 32, 39]. There has also been a long line of work on the metric facility location problem, culminating with the result of Li [35], who gave a 1.488-approximation algorithm, almost matching the lower bound of 1.463 of Guha and Khuller [26]. Note that no better bound is known for high-dimensional Euclidean facility location.

1.3 Roadmap

In Section 2, we describe some preliminary definitions. We also formally define the LMP approximation and introduce the LMP framework of Jain and Vazirani [30] and Ahmadian et al. [1]. In Section 3, we provide an overview of the new technical results we developed to obtain the improved bounds. In Section 4, we obtain a $3+2\sqrt{2}\approx 5.828$ LMP approximation for the Euclidean $k$-means problem. In Section 5, we extend our LMP approximation to a $5.912$-approximation for standard Euclidean $k$-means. Finally, in Section 6, we obtain a $2.395$ LMP approximation for Euclidean $k$-median that can be extended to a $2.406$-approximation for standard Euclidean $k$-median.

In Appendix E, we briefly show that the result of [1] cannot be extended beyond $1+\sqrt{2}$ for Euclidean $k$-median: this demonstrates the need for our new techniques for breaking this barrier.

2 Preliminaries

Our goal is to provide approximation algorithms for either the $k$-means or $k$-median problem in Euclidean space on a set $\mathcal{D}$ of clients of size $n$. For the entirety of this paper, we consider the discrete $k$-means and $k$-median problems, where rather than allowing the $k$ centers to be anywhere, we are given a fixed set of facilities $\mathcal{F}$ of size $m$, polynomial in $n$, from which the $k$ centers must be chosen. It is well-known (e.g., [40]) that a polynomial-time algorithm providing a $\rho$-approximation for discrete $k$-means (resp., $k$-median) implies a polynomial-time $(\rho+\varepsilon)$-approximation for standard $k$-means (resp., $k$-median) for an arbitrarily small constant $\varepsilon$.

For two points $x,y$ in Euclidean space, we define $d(x,y)$ as the Euclidean distance (a.k.a. $\ell_2$-distance) between $x$ and $y$. In addition, to avoid redefining everything or restating identical results for both $k$-means and $k$-median, we define $c(x,y):=d(x,y)^{2}$ in the context of $k$-means and $c(x,y):=d(x,y)$ in the context of $k$-median. For a subset $S$ of Euclidean space, we define $d(x,S):=\min_{s\in S}d(x,s)$ and $c(x,S):=\min_{s\in S}c(x,s)$.

For the $k$-means (or $k$-median) problem, for a subset $S\subset\mathcal{F}$, we define $\text{cost}(\mathcal{D},S):=\sum_{j\in\mathcal{D}}c(j,S)$. In addition, we define $\text{OPT}_{k}$ to be the optimum $k$-means (or $k$-median) cost for a set $\mathcal{D}$ and a set of facilities $\mathcal{F}$, i.e., $\text{OPT}_{k}=\min_{S\subset\mathcal{F},|S|=k}\text{cost}(\mathcal{D},S)$. Recall that a $\rho$-approximation algorithm is an algorithm that produces a subset of $k$ facilities $S\subset\mathcal{F}$ with $\text{cost}(\mathcal{D},S)\leq\rho\cdot\text{OPT}_{k}$ in the worst case.
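As a concrete illustration of these definitions, the following minimal Python sketch (the helper name `cost` and the use of coordinate tuples are our own, not from the paper) computes $\text{cost}(\mathcal{D},S)$ for both objectives:

```python
import math

def cost(clients, centers, objective="means"):
    """Connection cost of clients to their nearest center.

    objective="means" uses c(x, y) = d(x, y)^2 (k-means);
    objective="median" uses c(x, y) = d(x, y) (k-median).
    """
    total = 0.0
    for p in clients:
        d = min(math.dist(p, s) for s in centers)  # d(p, S)
        total += d * d if objective == "means" else d
    return total

# Two clients at distance 2 apart, a single center at one of them:
# k-means cost is 0 + 2^2 = 4, k-median cost is 0 + 2 = 2.
print(cost([(0, 0), (2, 0)], [(0, 0)], "means"))   # 4.0
print(cost([(0, 0), (2, 0)], [(0, 0)], "median"))  # 2.0
```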

2.1 The Lagrangian LP Relaxation and LMP Solutions

We first look at the standard LP formulation for $k$-means/$k$-median. The variables of the LP include a variable $y_{i}$ for each facility $i\in\mathcal{F}$ and a variable $x_{i,j}$ for each pair $(i,j)$ with $i\in\mathcal{F}$ and $j\in\mathcal{D}$. The standard LP relaxation is the following:

minimize $\sum_{i\in\mathcal{F},\,j\in\mathcal{D}} x_{i,j}\cdot c(j,i)$   (1)
such that $\sum_{i\in\mathcal{F}} x_{i,j} \geq 1 \qquad \forall j\in\mathcal{D}$   (2)
$\sum_{i\in\mathcal{F}} y_{i} \leq k$   (3)
$0 \leq x_{i,j} \leq y_{i} \qquad \forall j\in\mathcal{D},\, i\in\mathcal{F}$   (4)

The intuition behind this linear program is that we can think of $x_{i,j}$ as the indicator variable of client $j$ being assigned to facility $i$, and $y_{i}$ as the indicator variable of facility $i$ being opened. We need every client $j\in\mathcal{D}$ to be assigned to at least one facility, at most $k$ facilities $i$ to be opened, and $x_{i,j}$ to be $1$ only if $y_{i}=1$ (since clients can only be assigned to open facilities). We also ensure nonnegativity of $x_{i,j}$ and $y_{i}$ via the constraint $0\leq x_{i,j}$. Finally, our goal is to minimize the sum of distances (for $k$-median) or the sum of squared distances (for $k$-means) from each client to its closest facility, or simply the facility it is assigned to; if exactly one of the $x_{i,j}$ values is $1$ for a fixed client $j$ and the rest are $0$, then $\sum_{i\in\mathcal{F}}x_{i,j}\,c(j,i)$ is precisely the distance (or squared distance) from $j$ to its corresponding facility. By relaxing the linear program to have real variables, we can only decrease the optimum, so if we let $L$ be the optimum value of the LP relaxation, then $L\leq\text{OPT}_{k}$.
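To make the correspondence between integral solutions and LP points concrete, here is a small Python sketch (the helper name `integral_lp_point` and the toy 1-D instance are our own, for illustration only) that builds the integral point $(x,y)$ induced by opening a set $S$ and assigning each client to its nearest open facility; its objective value equals $\text{cost}(\mathcal{D},S)$, which is why $L\leq\text{OPT}_{k}$:

```python
def integral_lp_point(clients, facilities, S, c):
    """Integral feasible point of the LP induced by opening S:
    y_i = 1 iff i in S, and each client j sets x_{i,j} = 1 for its
    nearest open facility i. Returns (x, y, objective value)."""
    y = {i: (1.0 if i in S else 0.0) for i in facilities}
    x, obj = {}, 0.0
    for j in clients:
        best = min(S, key=lambda i: c(j, i))  # nearest open facility
        for i in facilities:
            x[(i, j)] = 1.0 if i == best else 0.0
        obj += c(j, best)
    return x, y, obj

# 1-D k-median toy instance: facilities {0, 10}, clients {0, 10}, S = {0}.
c = lambda j, i: abs(j - i)
x, y, obj = integral_lp_point([0.0, 10.0], [0.0, 10.0], [0.0], c)
print(obj)  # 10.0
```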

Jain and Vazirani [30] considered the Lagrangian relaxation of this linear program, obtained by relaxing constraint (3) and adding a dependence on a Lagrangian parameter $\lambda\geq 0$. By doing this, the number of facilities no longer has to be at most $k$ in the relaxed linear program, but the objective function penalizes opening more than $k$ centers. Namely, the goal becomes to minimize

$\sum_{i\in\mathcal{F},\,j\in\mathcal{D}} x_{i,j}\cdot c(j,i) + \lambda\cdot\left(\sum_{i\in\mathcal{F}} y_{i} - k\right)$   (5)

subject to constraints (2) and (4). Indeed, for $\lambda\geq 0$, the objective only decreases from (1) to (5) for any feasible solution to the original LP. Therefore, this new linear program, which we will call $\text{LP}(\lambda)$, has optimum $L(\lambda)\leq L$. Now, it is known that the dual linear program to this Lagrangian relaxation of the original linear program can be written as follows, with variables $\alpha=\{\alpha_{j}\}_{j\in\mathcal{D}}$:

maximize $\left(\sum_{j\in\mathcal{D}} \alpha_{j}\right) - \lambda\cdot k$   (6)
such that $\sum_{j\in\mathcal{D}} \max(\alpha_{j} - c(j,i),\, 0) \leq \lambda \qquad \forall i\in\mathcal{F}$   (7)
$\alpha \geq 0$   (8)

We call this linear program $\text{DUAL}(\lambda)$. Because the optimum of $\text{DUAL}(\lambda)$ equals the optimum of the primal $\text{LP}(\lambda)$ by strong duality, this means that for any $\alpha=\{\alpha_{j}\}_{j\in\mathcal{D}}$ satisfying conditions (7) and (8), we have $\left(\sum_{j\in\mathcal{D}}\alpha_{j}\right)-\lambda\cdot k\leq L(\lambda)\leq L\leq\text{OPT}_{k}$.

For a fixed $\lambda$, we say that $\alpha$ is feasible if it satisfies both (7) and (8). Thus, to provide a $\rho$-approximation to $k$-means (or $k$-median), it suffices to provide both a feasible $\alpha$ and a subset $S\subset\mathcal{F}$ of size $k$ such that $\text{cost}(\mathcal{D},S)$ is at most $\rho\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda\cdot|S|\right)$.
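The weak-duality argument above is easy to check mechanically. The following Python sketch (our own illustrative helper, not from the paper) verifies constraints (7) and (8) for a candidate $\alpha$ and returns the lower bound $\left(\sum_{j}\alpha_{j}\right)-\lambda k\leq\text{OPT}_{k}$:

```python
def dual_lower_bound(alpha, lam, k, clients, facilities, c):
    """If alpha is feasible for DUAL(lambda), return the weak-duality
    lower bound (sum_j alpha_j) - lambda * k on OPT_k."""
    # Constraint (7): total "overpayment" toward each facility is <= lambda.
    for i in facilities:
        load = sum(max(alpha[j] - c(clients[j], i), 0.0)
                   for j in range(len(clients)))
        if load > lam + 1e-9:
            raise ValueError(f"constraint (7) violated at facility {i}")
    # Constraint (8): alpha >= 0.
    if any(a < 0 for a in alpha):
        raise ValueError("constraint (8) violated")
    return sum(alpha) - lam * k

# 1-D k-median instance: each client fully "pays" lambda toward its own
# co-located facility, so alpha = (1, 1) is feasible for lambda = 1.
c = lambda j, i: abs(j - i)
print(dual_lower_bound([1.0, 1.0], 1.0, 1, [0.0, 10.0], [0.0, 10.0], c))  # 1.0
```

Here the bound $1.0$ is far below $\text{OPT}_{1}=10$; tuning $\lambda$ and $\alpha$ tightens it.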

In both the work of Jain and Vazirani [30] and the work of Ahmadian et al. [1], the starting point is a weaker type of algorithm, called a Lagrangian Multiplier Preserving (LMP) approximation algorithm. To explain this notion, let $\text{OPT}(\lambda)$ represent the optimum (minimum) of the modified linear program $\text{LP}'(\lambda)$, which is the same as $\text{LP}(\lambda)$ except without the subtraction of $\lambda\cdot k$ in the objective function (5). (So, $\text{LP}'(\lambda)$ has no dependence on $k$.) Note that this is also the optimum (maximum) of $\text{DUAL}'(\lambda)$, which is the same as $\text{DUAL}(\lambda)$ except without the subtraction of $\lambda\cdot k$ in the objective function (6). Then, for some fixed $\lambda\geq 0$, we say that a $\rho$-approximation algorithm is LMP if it returns a solution $S\subset\mathcal{F}$ satisfying

$\sum_{j\in\mathcal{D}} c(j,S) \leq \rho\cdot\left(\text{OPT}(\lambda) - \lambda\cdot|S|\right).$

Indeed, if we could find a choice of $\lambda$ and an LMP $\rho$-approximate solution $S$ with size $|S|=k$, we would have found a $\rho$-approximation for $k$-means (or $k$-median) clustering.

2.2 Witnesses and Conflict Graphs

Jain and Vazirani [30] proposed a simple primal-dual approach to create a feasible solution $\alpha$ of $\text{DUAL}(\lambda)$ with certain additional properties that are useful for efficiently solving the original $k$-median (or $k$-means) problem. We describe it briefly as follows, based on the exposition of Ahmadian et al. [1, Subsection 3.1].

Start with $\alpha=\textbf{0}$, i.e., $\alpha_{j}=0$ for all $j\in\mathcal{D}$. We increase all $\alpha_{j}$’s continuously at a uniform rate, but stop growing each $\alpha_{j}$ once one of the following two events occurs:

  1. For some $i\in\mathcal{F}$, a dual constraint $\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)\leq\lambda$ becomes tight (i.e., reaches equality). Once this happens, we stop growing $\alpha_{j}$, and we declare tight every facility $i$ whose constraint reached equality. In addition, we say that $i$ is the witness of $j$.

  2. For some already tight facility $i$, we grow $\alpha_{j}$ until $\alpha_{j}=c(j,i)$. In this case, we also say that $i$ is the witness of $j$.

We note that this process must eventually terminate for all $j$ (e.g., once $\alpha_{j}$ reaches $\min_{i\in\mathcal{F}}c(j,i)+\lambda$). This completes our creation of the dual solution $\alpha$ (it is simple to see that $\alpha$ is feasible).
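The growing phase can be sketched as a coarse, time-discretized simulation; the continuous process above is event-driven, and the step size `dt`, the helper names, and the tiny 1-D instance below are our own illustrative choices, not from the paper:

```python
def grow_duals(clients, facilities, lam, c, dt=1e-3, t_max=100.0):
    """Time-discretized sketch of the dual-growing phase: all active
    alpha_j grow at a uniform rate; a facility becomes tight when its
    constraint (7) reaches lambda, and a client freezes (recording its
    witness) once some tight facility i satisfies alpha_j >= c(j, i)."""
    n = len(clients)
    alpha, active = [0.0] * n, [True] * n
    tight, witness = set(), [None] * n
    t = 0.0
    while any(active) and t < t_max:
        t += dt
        for j in range(n):
            if active[j]:
                alpha[j] += dt
        for i in facilities:
            load = sum(max(alpha[j] - c(clients[j], i), 0.0) for j in range(n))
            if load >= lam:
                tight.add(i)
        for j in range(n):
            if active[j]:
                for i in tight:
                    if alpha[j] >= c(clients[j], i):  # events 1 and 2
                        witness[j], active[j] = i, False
                        break
    return alpha, witness, tight

# Two clients co-located with one facility, lambda = 1: the facility's
# constraint becomes tight when alpha_1 + alpha_2 = 1, i.e. at time 0.5.
c = lambda j, i: abs(j - i)
alpha, witness, tight = grow_duals([0.0, 0.0], [0.0], 1.0, c)
```

On this instance both dual values freeze near $0.5$ and both clients take the single facility as their witness.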

For any client $j$, we define $N(j):=\{i\in\mathcal{F}:\alpha_{j}>c(j,i)\}$, and likewise, for any facility $i$, we define $N(i):=\{j\in\mathcal{D}:\alpha_{j}>c(j,i)\}$. For any tight facility $i$, we define $t_{i}:=\max_{j\in N(i)}\alpha_{j}$, where $t_{i}=0$ by default if $N(i)=\emptyset$. For each client $j$, its witness $i$ has the useful properties that $t_{i}\leq\alpha_{j}$ and $c(j,i)\leq\alpha_{j}$.

We have already created our dual solution $\alpha$; to create our set of $k$ facilities, we will choose a subset of the tight facilities. First, we define the conflict graph on the set of tight facilities. Jain and Vazirani [30], Ahmadian et al. [1], and this paper each use slightly different definitions, so we contrast the three.

  • [30] Here, we say that $(i,i')$ forms an edge in the conflict graph $H$ if there exists a client $j$ such that $i,i'\in N(j)$ (or equivalently, $\alpha_{j}\geq c(j,i)$ and $\alpha_{j}\geq c(j,i')$).

  • [1] Here, we say that $(i,i')$ forms an edge in the conflict graph $H(\delta)$ (where $\delta>0$ is some parameter) if $c(i,i')\leq\delta\cdot\min(t_{i},t_{i'})$ and there exists a client $j$ such that $i,i'\in N(j)$.

  • In our definition, we completely drop the condition from [30], and just say that $(i,i')$ forms an edge in the conflict graph $H(\delta)$ if $c(i,i')\leq\delta\cdot\min(t_{i},t_{i'})$.

It turns out that in the algorithm of Ahmadian et al. [1], the approximation factor is not affected by whether they use their definition or our definition. But in our case, it turns out that dropping the condition from [30] in fact allows us to obtain a better approximation.

To provide an LMP approximation, both Jain and Vazirani [30] and Ahmadian et al. [1] constructed a maximal independent set $I$ of the conflict graph $H$ (or $H(\delta)$ for an appropriate choice of $\delta>0$) and used $I=S$ as the set of centers. For Jain and Vazirani’s definition, the independent set $I$ yields an LMP $9$-approximation for metric $k$-means. For Ahmadian et al.’s definition, the independent set $I$ yields an LMP $6.1291$-approximation for Euclidean $k$-means if $\delta$ is chosen properly. (We note that Ahmadian et al. only proved a factor of $6.3574$, though their argument can be improved to give a $6.1291$-approximation factor, as proven by Grandoni et al. [25].) While we will not explicitly prove it, our definition of $H(\delta)$ also obtains the same bound with the same choice of $\delta$. For the Euclidean $k$-median problem, using either Ahmadian et al.’s definition or ours, one can obtain an LMP $(1+\sqrt{2})$-approximation: Ahmadian et al. [1] only proved the weaker $1+\sqrt{8/3}$ approximation factor, but we prove the improved approximation in Subsection 6.1. We then show how to obtain a better LMP solution and then a better approximation bound.
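The pruning step under our definition of $H(\delta)$ is straightforward to express. The sketch below (illustrative helper names; facilities represented as 1-D coordinates for simplicity) builds the edge set of $H(\delta)$ and extracts a greedy maximal independent set:

```python
def conflict_edges(tight, t_val, delta, c):
    """Edges of H(delta) under our definition: (i, i') is an edge iff
    c(i, i') <= delta * min(t_i, t_{i'})."""
    ids = list(tight)
    return {(ids[a], ids[b])
            for a in range(len(ids)) for b in range(a + 1, len(ids))
            if c(ids[a], ids[b]) <= delta * min(t_val[ids[a]], t_val[ids[b]])}

def maximal_independent_set(vertices, edges):
    """Greedy maximal independent set: scan vertices, keep each one
    that conflicts with no previously kept vertex."""
    adj = {v: set() for v in vertices}
    for (u, v) in edges:
        adj[u].add(v)
        adj[v].add(u)
    indep = []
    for v in vertices:
        if not (adj[v] & set(indep)):
            indep.append(v)
    return indep

# Three tight facilities on a line with t_i = 1 and delta = 2: only the
# pair at distance 1 conflicts, so the greedy scan keeps {0.0, 5.0}.
c = lambda i, ip: abs(i - ip)
tight = [0.0, 1.0, 5.0]
E = conflict_edges(tight, {v: 1.0 for v in tight}, 2.0, c)
I = maximal_independent_set(tight, E)
```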

3 Technical Overview

The algorithms of both Jain and Vazirani [30] and Ahmadian et al. [1] begin by constructing an LMP approximation. Their approximation follows two phases: a growing phase and a pruning phase. In the growing phase, as described in Subsection 2.2, they grow the solution $\alpha$ starting from $\alpha=\textbf{0}$, until they obtain a suitable dual solution $\alpha$ for $\text{DUAL}(\lambda)$. In addition, they have a list of tight facilities $i$, which we think of as our candidate centers. The pruning phase removes unnecessary facilities: as described in Subsection 2.2, we create a conflict graph $H(\delta)$ over the tight facilities, and only choose a maximal independent set $I$. Hence, we are pruning out tight facilities to make sure we do not have too many nearby facilities. This way we ensure that the total number of centers is not unnecessarily large. Our main contributions are to improve the pruning phase with a new algorithm and several new geometric insights, and to show how our LMP approximation can be extended to improved algorithms for standard Euclidean $k$-means (and $k$-median) clustering.

Improved LMP Approximation:

To simplify the exposition, we focus on Euclidean $k$-means. To analyze the approximation, Ahmadian et al. [1] compare the cost of each client $j$ in the final solution to its contribution to the dual objective function. The cost of a client $j$ is simply $c(j,I)$, where $I$ is our set of centers, and $j$’s contribution to the dual is $\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))$, where we recall the definition of $N(j)$ from Subsection 2.2. One can show that the sum of the individual dual contributions equals the dual objective (6). By casework on the size $a=|N(j)\cap I|$, [25] (by modifying the work of [1]) shows that $c(j,I)\leq\rho\cdot\left[\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right]$, where $\rho\approx 6.1291$ if $\delta\approx 2.1777$ is chosen appropriately (in general, we think of $\delta$ as slightly greater than $2$). The only bottlenecks (i.e., where the cost-dual ratio could equal $\rho$) are when $a\in\{0,2\}$. Our LMP approximation improves the pruning phase by reducing the cost-dual ratio in these cases.

Due to the disjointed nature of the bottleneck cases, our first observation is that averaging between two or more independent sets may be beneficial. This way, if for some client $j$ the first independent set $I_{1}$ had $a=0$ and the second set $I_{2}$ had $a=2$, perhaps by taking a weighted average of $I_{1}$ and $I_{2}$ we can obtain a set $I$ where $a=1$ with reasonable probability. Hence, the expected cost-dual ratio of $j$ will be below $\rho$. The second useful observation comes from the fact that $t_{i^{*}}\leq\alpha_{j}$ and $c(j,i^{*})\leq\alpha_{j}$, if $i^{*}$ is the witness of $j$. In the $a=0$ case, [1] applies this to show that $d(j,i^{*})\leq\sqrt{\alpha_{j}}$ and $d(i^{*},I)\leq\sqrt{\delta\cdot t_{i^{*}}}\leq\sqrt{\delta\cdot\alpha_{j}}$, which follows from the definition of the conflict graph. Hence, the bottleneck occurs when $i^{*}$ has distance exactly $\sqrt{\alpha_{j}}$ from $j$, and the nearest point $i\in I$ has distance exactly $\sqrt{\delta\cdot\alpha_{j}}$ from $i^{*}$, in the direction opposite $j$. This causes $c(j,I)$ to be $(1+\sqrt{\delta})^{2}\cdot\alpha_{j}$ in the worst case, whereas $j$’s contribution to the dual is merely $\alpha_{j}$. To reduce this ratio, we either need to make sure that the distance from $i^{*}$ to either $j$ or $i$ is smaller, or that $j,i^{*},i$ do not lie on a line in that order.

A reasonable first attempt is to select two parameters $\delta_{1}\geq\delta\geq\delta_{2}$, and consider the nested conflict graphs $H(\delta_{1})\supset H(\delta_{2})$. We can then create nested independent sets by first creating $I_{1}$ as a maximal independent set of $H(\delta_{1})$, then extending it to $(I_{1}\cup I_{2})\supset I_{1}$ for $H(\delta_{2})$. Our final set $S$ will be an average of $I_{1}$ and $(I_{1}\cup I_{2})$: we include all of $I_{1}$ and each point in $I_{2}$ with some probability. The motivation behind this attempt is that if both $N(j)\cap I_{1}$ and $N(j)\cap I_{2}$ are empty, the witness $i^{*}$ of $j$ should be adjacent in $H(\delta_{1})$ to some point $i_{1}\in I_{1}$, and adjacent in $H(\delta_{2})$ to some point $i_{2}\in(I_{1}\cup I_{2})$. Hence, either $i_{1}=i_{2}$, in which case the distance from $j$ to $I_{1}$ is now only $(1+\sqrt{\delta_{2}})\cdot\sqrt{\alpha_{j}}$ instead of $(1+\sqrt{\delta_{1}})\cdot\sqrt{\alpha_{j}}$ (see Figure 1), or $i_{1}\neq i_{2}$, in which case $i_{1},i_{2}$ must be far apart because $i_{1},i_{2}$ are both in the independent set $I_{1}\cup I_{2}$ (see Figure 1). In the latter case, we cannot have the bottleneck case for both $i_{1}$ and $i_{2}$, as that would imply $j,i^{*},i_{2},i_{1}$ are collinear in that order with $d(i^{*},i_{2})=\sqrt{\delta_{2}\cdot\alpha_{j}}$ and $d(i^{*},i_{1})=\sqrt{\delta_{1}\cdot\alpha_{j}}$, so $i_{2},i_{1}$ would be too close. Hence, it appears that we have improved one of the main bottlenecks.

Unfortunately, we run into a new issue if $i_{2}=j$ and no other points in $I_{1}\cup I_{2}$ were in $N(j)$. While this case may look good because $|N(j)\cap I_{2}|=1$, which is not a bottleneck, it is only not a bottleneck because the contribution of $j$ to the clustering cost and the dual both equal $0$ in this case. So, if we condition on $i_{2}\in S$, the cost and dual are not affected by $j$, but if we condition on $i_{2}\not\in S$, the cost-dual ratio of $j$ could be $(1+\sqrt{\delta_{1}})^{2}$; hence, we have again made no improvement. While one could attempt to fix this by creating a series of nested independent sets, this approach also fails for the same reasons. Hence, we have a new main bottleneck case, where $i_{2}=j$, and $j,i^{*},i_{1}$ are collinear in that order with $d(j,i^{*})=\sqrt{\alpha_{j}}$ and $d(i^{*},i_{1})=\sqrt{\delta_{1}\cdot\alpha_{j}}$ (see Figure 1).

We now explain the intuition for fixing this. In the main bottleneck case, if we could add the witness $i^{*}$ of $j$ to $S$, this would reduce $c(j,S)$ significantly if $i_{2}\not\in S$, yet it does not affect $j$’s contribution to the dual. Unfortunately, adding $i^{*}$ to $S$ reduces the dual nonetheless, due to other clients. Instead, we will consider a subset of tight facilities that are close, but not too close, to exactly one tight facility in $I_{2}$, but not close to any other facilities in $I_{1}\cup I_{2}$. In the main bottleneck cases, the witnesses $i^{*}$ precisely satisfy this condition, as may some additional points. We again prune these points by creating a conflict graph just on these vertices, and pick another maximal independent set $I_{3}$. Finally, with some probability we will replace each point $i_{2}\in I_{2}$ with the points in $I_{3}$ that are close to $i_{2}$. In our main bottleneck case, we will either pick $i^{*}\in I_{3}$, or pick another point $i_{3}$ that is within distance $\sqrt{\delta_{1}\cdot\alpha_{j}}$ of $i^{*}$, but now must be far from all points in $I_{1}\cup I_{2}$ and therefore will not have distance $(1+\sqrt{\delta_{1}})\cdot\sqrt{\alpha_{j}}$ from $j$ (see Figure 1). In addition, $i_{3}$ might be close, but is not allowed to be too close, to $i_{2}=j$, so if we replace $i_{2}$ with $i_{3}$, we do not run into the issue of $j$’s contribution being $0$ for both the clustering cost and the dual solution.

Given I1,I2,I3I_{1},I_{2},I_{3}, our overall procedure for generating SS is to include all of I1I_{1}, and include points in I2I_{2} and I3I_{3} with some probability. In addition, each point i3I3i_{3}\in I_{3} is close to a unique point i2I2i_{2}\in I_{2}, and we anti-correlate them being in SS. We call the triple (I1,I2,I3)(I_{1},I_{2},I_{3}) a nested quasi-independent set, since I1I_{1} and I1I2I_{1}\cup I_{2} are independent sets and I1I2I3I_{1}\cup I_{2}\cup I_{3} has similar properties, and these sets I1,I1I2,I1I2I3I_{1},I_{1}\cup I_{2},I_{1}\cup I_{2}\cup I_{3} are nested.

We remark that the anti-correlation between selecting $i_2$ and the points $i_3$ that are close to $i_2$ is reminiscent of a step in the rounding of the bipoint solution of Li and Svensson [36] (there, the authors combine two solutions by building a collection of stars where the center of each star belongs to one solution while the leaves belong to the other, and open either the center or the leaves at random). In this context, we will also allow our algorithm to open slightly more centers, namely $k+C$ centers (instead of $k$) for some absolute constant $C$. Similar to Li and Svensson [36] and Byrka et al. [9], we show (Lemma 5.2) that this is without loss of generality (such an algorithm can be used to obtain an approximation algorithm opening at most $k$ centers).

The analysis of the approximation bound is casework-heavy, depending on the values of $a=|N(j)\cap I_1|$, $b=|N(j)\cap I_2|$, and $c=|N(j)\cap I_3|$ for each $j$, and requires many geometric insights that heavily exploit the structure of the Euclidean metric.

(Figure 1 comprises three subfigures (a), (b), and (c), each depicting the points $j$, $i^*$, $i_1$, $i_2$ (and, in (c), $i_3$) together with the distances $\sqrt{\alpha_j}$, $\sqrt{\delta_1\cdot\alpha_j}$, and $\sqrt{\delta_2\cdot\alpha_j}$.)
Figure 1: Subfigures (a) and (b) represent the cases when $N(j)\cap I_1$ and $N(j)\cap I_2$ are both empty. In (a), we can bound $d(i^*,i_1)$ by $\sqrt{\delta_2\cdot\alpha_j}<\sqrt{\delta_1\cdot\alpha_j}$, and in (b), we can obtain better bounds for $d(j,i_1)$ and $d(j,i_2)$ than by the triangle inequality alone. Subfigure (c) represents the problem with having two nested independent sets, when $N(j)\cap I_1=\emptyset$ but $N(j)\cap I_2$ contains the unique point $i_2=j$. If we replace $i_2=j$ with the blue node $i_3$ with some probability, the cost-to-dual ratio improves.

Finally, we remark that this method of generating a set $S$ of centers can also be used to provide an LMP approximation below $1+\sqrt{2}$ for Euclidean $k$-median. However, due to the differing nature of the bottleneck cases, the choices of $\delta_1,\delta_2$, and the construction of the final set $I_3$ will differ. Both to obtain and to break the $(1+\sqrt{2})$-approximation for Euclidean $k$-median, one must understand how the sum of distances from a client $j$ to its close facilities $i_1,\dots,i_a\in N(j)\cap S$ relates to the pairwise distances between $i_1,\dots,i_a$. In the $k$-means case, the squared distances make this straightforward, but the $k$-median case requires more geometric insights (see Lemma 6.1) to improve the $(1+\sqrt{8/3})$-approximation of Ahmadian et al. [1] to a $(1+\sqrt{2})$-approximation, even with the same algorithm.

Polynomial-Time Algorithm:

While we can successfully obtain an LMP $(3+2\sqrt{2})$-approximation for Euclidean $k$-means, this does not immediately imply a $(3+2\sqrt{2})$-approximation for the overall Euclidean $k$-means problem, since our LMP approximation may not produce the right number of cluster centers. To address this issue, Ahmadian et al. [1] developed a method of slowly raising the parameter $\lambda$ and generating a series of dual solutions $\alpha^{(0)},\alpha^{(1)},\alpha^{(2)},\dots$, where each pair of consecutive solutions $\alpha^{(j)}$ and $\alpha^{(j+1)}$ are sufficiently similar. The procedure of generating the dual solutions $\alpha^{(j)}$ in [1] is quite involved, but does not change significantly from their result to ours. Given these dual solutions, [1] shows how to interpolate between consecutive dual solutions $\alpha^{(j)}$ and $\alpha^{(j+1)}$, creating a series of conflict graphs where each subsequent graph removes at most one facility at a time (but occasionally adds many facilities). This property of removing at most one facility ensures that the maximal independent set decreases in size by at most $1$ at each step (but could increase by a large number if the removed vertex allows more points to be added). Using an intermediate-value argument, they ensure that at some point there is an independent set $I$ of the conflict graph of size exactly $k$. Hence, they can apply the LMP technique to obtain the same approximation ratio.

In our setting, we are not so lucky, because we are dealing with nested independent sets (and even more complicated versions of them). Even if the size of the first part $I_1$ never decreases by more than $1$ at a time, $I_1$ could increase, which could in turn make the sizes of $I_2$ or $I_3$ decrease rapidly, even if we only remove one facility from the conflict graph. To deal with this, we instead consider the first time that the expected size of $S$ drops below $k$ (where we recall that all points in $I_1$ are in $S$, but points in $I_2$ and $I_3$ are inserted into our final set $S$ with some probability). Let $(I_1,I_2,I_3)$ represent the set of chosen facilities right before the expected size of $S$ drops below $k$ for the first time, and $(I_1',I_2',I_3')$ represent the set of chosen facilities right after.

To obtain size exactly $k$, we show that one can always do one of the following: either (i) modify the probabilities of selecting points in $I_2$ and $I_3$ so that the size is exactly $k$, or (ii) use submodularity properties of the $k$-means objective function to interpolate between the set $S$ generated by $(I_1,I_2,I_3)$ and the set $S'$ generated by $(I_1',I_2',I_3')$. While $|S|\geq k$ and $|S'|<k$, we do not necessarily have that $S'\subset S$; in fact, we will interpolate between $S'$ and $S\cup S'$ instead, by adding random points from $S\backslash S'$ to $S'$. These procedures are computationally involved and require a good understanding of how the dual objective behaves as we modify the probabilities of selecting elements. In addition, because we modify the probabilities to make the size exactly $k$, and because we interpolate between $S'$ and $S\cup S'$ instead of between $S'$ and $S$ to obtain our final set of centers, we lose a slight factor relative to the LMP approximation, but we still significantly improve over the old bound. This procedure works for both the Euclidean $k$-means and $k$-median problems.

One minor problem that we will run into is that a point i2i_{2} may have many points in I3I_{3} that are all “close” to it. As a result, even if the expected size of SS is exactly kk, the variance may be large because we are either selecting i2i_{2} or all of the points that are close to i2i_{2}. In this case, we can use the submodularity properties of the kk-means objective and coupling methods from probability theory to show that if i2i_{2} has too many close points in I3I_{3}, we could instead include i2i_{2} and include an average number of these close points from I3I_{3}, without increasing the expected cost.

4 LMP Approximation for Euclidean kk-means

In this section, we provide a (3+22)(3+2\sqrt{2}) LMP approximation algorithm (in expectation) for the Euclidean kk-means problem. Our algorithm will be parameterized by four parameters, δ1,δ2,δ3\delta_{1},\delta_{2},\delta_{3}, and pp. We fix δ1,δ2,δ3\delta_{1},\delta_{2},\delta_{3} and allow pp to vary, and obtain an approximation constant ρ(p)\rho(p) that is a function of pp: for appropriate choices of pp, ρ(p)\rho(p) will equal 3+223+2\sqrt{2}.

In Subsection 4.1, we describe the LMP algorithm, which is based on the LMP approximation algorithms by Jain and Vazirani [30] and Ahmadian et al. [1], but using our technique of generating what we call a nested quasi-independent set. In Subsection 4.2, we analyze the approximation ratio, which spans a large amount of casework.

4.1 The algorithm and setup

Recall the conflict graph H:=H(δ)H:=H(\delta), where we define two tight facilities (i,i)(i,i^{\prime}) to be connected if c(i,i)δmin(ti,ti).c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}). We set parameters δ1δ22δ3\delta_{1}\geq\delta_{2}\geq 2\geq\delta_{3} and 0<p<10<p<1, and define V1V_{1} to be the set of all tight facilities. Given the set of tight facilities V1V_{1} and conflict graphs H(δ)H(\delta) over V1V_{1} for all δ>0\delta>0, our algorithm works by applying the procedure described in Algorithm 1 (on the next page) to V1V_{1}, with parameters δ1,δ2,δ3\delta_{1},\delta_{2},\delta_{3}, and pp.

Algorithm 1 Generate a Nested Quasi-Independent Set of V1V_{1}, as well as a set of centers SS providing an LMP approximation for Euclidean kk-means

LMP(V1,{H(δ)},δ1,δ2,δ3,pV_{1},\{H(\delta)\},\delta_{1},\delta_{2},\delta_{3},p):

1:Create a maximal independent set I1I_{1} of H(δ1)H(\delta_{1}).
2:Let V2V_{2} be the set of points in V1\I1V_{1}\backslash I_{1} that are not adjacent to I1I_{1} in H(δ2)H(\delta_{2}).
3:Create a maximal independent set I2I_{2} of the induced subgraph H(δ2)[V2]H(\delta_{2})[V_{2}].
4:Let V3V_{3} be the set of points ii in V2\I2V_{2}\backslash I_{2} such that there is exactly one point in I2I_{2} that is a neighbor of ii in H(δ2)H(\delta_{2}), there are no points in I1I_{1} that are neighbors of ii in H(δ2)H(\delta_{2}), and there are no points in I2I_{2} that are neighbors of ii in H(δ3)H(\delta_{3}).
5:Create a maximal independent set I3I_{3} of the induced subgraph H(δ2)[V3]H(\delta_{2})[V_{3}].
6:Note that every point iI3i\in I_{3} has a unique adjacent neighbor q(i)I2q(i)\in I_{2} in H(δ2)H(\delta_{2}). We create the final set SS as follows:
  • Include every point iI1i\in I_{1}.

  • For each point iI2i\in I_{2}, flip a fair coin. If the coin lands heads, include ii with probability 2p2p. Otherwise, include each point in q1(i)q^{-1}(i) independently with probability 2p2p.
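To make the construction concrete, the following is a short Python sketch of Algorithm 1 on a toy instance. The instance, the greedy maximal-independent-set routine, and all names (`greedy_mis`, `sample_S`, etc.) are our own illustrative choices; the paper does not prescribe a particular way of computing a maximal independent set or of representing the data.

```python
import random

def nested_quasi_independent_set(V1, cost, t, d1, d2, d3):
    """Steps 1-5 of Algorithm 1.  cost[i][j] = c(i, j); H(delta) joins
    tight facilities i, i' when c(i, i') <= delta * min(t_i, t_i')."""
    def edge(i, j, delta):
        return i != j and cost[i][j] <= delta * min(t[i], t[j])

    def greedy_mis(verts, delta):
        out = []                        # greedy scan gives a maximal independent set
        for v in verts:
            if not any(edge(v, u, delta) for u in out):
                out.append(v)
        return out

    I1 = greedy_mis(V1, d1)                                        # step 1
    V2 = [v for v in V1 if v not in I1
          and not any(edge(v, u, d2) for u in I1)]                 # step 2
    I2 = greedy_mis(V2, d2)                                        # step 3
    V3 = [v for v in V2 if v not in I2                             # step 4
          and sum(edge(v, u, d2) for u in I2) == 1   # exactly one I2-neighbor in H(d2)
          and not any(edge(v, u, d3) for u in I2)]   # no I2-neighbor in H(d3)
          # (the "no I1-neighbor in H(d2)" condition already holds for all of V2)
    I3 = greedy_mis(V3, d2)                                        # step 5
    q = {v: next(u for u in I2 if edge(v, u, d2)) for v in I3}
    return I1, I2, I3, q

def sample_S(I1, I2, I3, q, p, rng):
    """Step 6: every point of I2 u I3 lands in S with marginal probability p,
    and a center i2 is anti-correlated with its leaves q^{-1}(i2)."""
    S = list(I1)                               # all of I1 is always opened
    for i2 in I2:
        if rng.random() < 0.5:                 # heads: maybe open the center
            if rng.random() < 2 * p:
                S.append(i2)
        else:                                  # tails: maybe open the leaves
            S.extend(v for v in I3 if q[v] == i2 and rng.random() < 2 * p)
    return S

# Toy instance: five facilities in the plane, all t_i = 1, c = squared distance.
pts = [(0.0, 0.0), (3.0, 0.0), (1.45, 0.0), (7.0, 0.0), (1.0, 1.05)]
cost = [[(a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 for b in pts] for a in pts]
t = [1.0] * 5
d1, d2, d3 = (4 + 8 * 2 ** 0.5) / 7, 2.0, 6 - 4 * 2 ** 0.5
I1, I2, I3, q = nested_quasi_independent_set(range(5), cost, t, d1, d2, d3)
```

On this instance facility 2 is within the $H(\delta_1)$ radius of facility 0 but outside the $H(\delta_2)$ radius of all of $I_1$, so it survives into $I_2$, and facility 4 becomes its single leaf in $I_3$.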

We will call the triple (I1,I2,I3)(I_{1},I_{2},I_{3}) generated by Algorithm 1 a nested quasi-independent set. Although I1,I2,I3I_{1},I_{2},I_{3} are disjoint, we call it a nested quasi-independent set since I1I1I2I1I2I3I_{1}\subset I_{1}\cup I_{2}\subset I_{1}\cup I_{2}\cup I_{3} are nested, and I1I_{1} is a maximal independent set for H(δ1)H(\delta_{1}) and I1I2I_{1}\cup I_{2} is a maximal independent set for H(δ2)H(\delta_{2}). While I1I2I3I_{1}\cup I_{2}\cup I_{3} is not an independent set, it shares similar properties. As described in the technical overview (and in Algorithm 1), the LMP approximation algorithm uses (I1,I2,I3)(I_{1},I_{2},I_{3}) to create our output set of centers SS. SS contains all of I1I_{1} and each point in I2I3I_{2}\cup I_{3} with probability pp, but the choices of which points in I2I3I_{2}\cup I_{3} are in SS are not fully independent.

For the dual solution $\alpha=\{\alpha_j\}_{j\in\mathcal{D}}$ and the values $t_i$ for tight facilities $i$, as generated in Subsection 2.2, we note the following simple yet crucial facts.

Proposition 4.1.

[1] The following hold.

  1. For any client $j$ and its witness $i$, $i$ is tight and $\alpha_j\geq t_i$.

  2. For any client $j$ and its witness $i$, $\alpha_j\geq c(j,i)$.

  3. For any tight facility $i$ and any client $j'\in N(i)$, $t_i\geq\alpha_{j'}$.

These will essentially be the only facts relating witnesses and clients that we will need to use.

4.2 Main lemma

We consider a more general setup, as it will be required when converting the LMP approximation to a full polynomial-time algorithm. Let 𝒱\mathcal{V}\subset\mathcal{F} be a subset of facilities (for instance, this may represent the set of tight facilities) and let 𝒟\mathcal{D} be the full set of clients. For each j𝒟,j\in\mathcal{D}, let αj0\alpha_{j}\geq 0 be some real number, and for each i𝒱i\in\mathcal{V}, let ti0t_{i}\geq 0 be some real number. For each client j𝒟j\in\mathcal{D}, we associate with it a set N(j)𝒱N(j)\subset\mathcal{V}. (For instance, this could be the set N(j)={i𝒱:αj>c(j,i)}N(j)=\{i\in\mathcal{V}:\alpha_{j}>c(j,i)\}). In addition, suppose that for each client j𝒟,j\in\mathcal{D}, there exists a “witness” facility w(j)𝒱w(j)\in\mathcal{V}. Finally, suppose that we have the following assumptions. (These assumptions will hold by Proposition 4.1 when α={αj}\alpha=\{\alpha_{j}\} is generated by the procedure in Subsection 2.2, 𝒱\mathcal{V} is the set of tight facilities, and N(j)={i𝒱:αj>c(j,i)}.N(j)=\{i\in\mathcal{V}:\alpha_{j}>c(j,i)\}.)

  1. For any client $j\in\mathcal{D}$, the witness $w(j)\in\mathcal{V}$ satisfies $\alpha_j\geq t_{w(j)}$ and $\alpha_j\geq c(j,w(j))$.

  2. For any client $j\in\mathcal{D}$ and any facility $i\in N(j)$, $t_i\geq\alpha_j>c(j,i)$.

For the Euclidean k-means problem with the above assumptions, we will show the following:

Lemma 4.2.

Consider the set of conflict graphs {H(δ)}δ>0\{H(\delta)\}_{\delta>0} created on the vertices 𝒱\mathcal{V}, where (i,i)(i,i^{\prime}) is an edge in H(δ)H(\delta) if c(i,i)δmin(ti,ti)c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}). Fix δ1=4+8272.1877,δ2=2,δ3=6420.3432,\delta_{1}=\frac{4+8\sqrt{2}}{7}\approx 2.1877,\delta_{2}=2,\delta_{3}=6-4\sqrt{2}\approx 0.3432, and let p<0.5p<0.5 be variable. Now, let SS be the randomized set created by applying Algorithm 1 on V1=𝒱V_{1}=\mathcal{V}. Then, for any j𝒟j\in\mathcal{D},

𝔼[c(j,S)]ρ(p)𝔼[αjiN(j)S(αjc(j,i))],\mathbb{E}[c(j,S)]\leq\rho(p)\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right], (9)

where ρ(p)\rho(p) is some constant that only depends on pp (since δ1,δ2,δ3\delta_{1},\delta_{2},\delta_{3} are fixed).

Remark.

While we have not defined ρ(p)\rho(p), we will implicitly define it through our cases. We provide detailed visualizations of the bounds we obtain for each of our cases in Desmos (see Appendix A for the links). Importantly, we show that we can set pp such that ρ(p)3+22\rho(p)\leq 3+2\sqrt{2}.
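As a numerical sanity check on the shape of $\rho(p)$, the closed-form subcase bounds derived in this section (labels (1.a) through (1.g.ii) and (2.a)) can be evaluated at the fixed $\delta_1,\delta_2,\delta_3$. The sketch below does this at the illustrative choice $p=0.4$ (the paper leaves $p$ variable, and nothing here pins down its optimal value) and records that each listed bound is at most $3+2\sqrt{2}$, with bound (1.a) tight.

```python
import math

d1 = (4 + 8 * math.sqrt(2)) / 7          # delta_1 ~ 2.1877
d2 = 2.0                                 # delta_2
d3 = 6 - 4 * math.sqrt(2)                # delta_3 ~ 0.3432
p = 0.4                                  # illustrative choice of p only
s1, s2, s3 = map(math.sqrt, (d1, d2, d3))
target = 3 + 2 * math.sqrt(2)            # the claimed value of rho(p)

bounds = {
    "1.a": (1 + s2) ** 2,
    "1.b/1.e": 1 + p * d2 + (1 - p) * d1
               + 2 * math.sqrt(p ** 2 * d2 + (1 - p) * d1),
    "1.c": max((math.sqrt(0.75) + s1) ** 2,
               ((1 - p) * (1 + s1) ** 2 + 3 * p / 4) / (1 - p / 4)),
    "1.d": p + (1 - p) * (1 + s1) ** 2,
    "1.g.i": max((s1 + s3) ** 2,
                 ((1 - p) * (1 + s1) ** 2 + p * (1 - s3) ** 2)
                 / (1 - p + p * (1 - s3) ** 2)),
    "1.g.ii": p * (1 + s3) ** 2 + (1 - p) * (1 + s1) ** 2,
}
# bound (2.a), for each count c2 >= 1 of I3-points outside q^{-1}(i2)
for c2 in range(1, 11):
    f = (1 - p) * (0.5 + 0.5 * (1 - 2 * p) ** c2)
    bounds[f"2.a (c2={c2})"] = (f * (1 + s1) ** 2 + (1 - f)) / (1 - p)

worst = max(bounds.values())
```

At $p=0.4$ the largest of these expressions is (1.a) $=(1+\sqrt{2})^2=3+2\sqrt{2}$; the other listed subcases evaluate strictly below it.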

To see why Lemma 4.2 implies an LMP approximation (in expectation), fix some λ0\lambda\geq 0. Then, we perform the procedure in Subsection 2.2 and let 𝒱\mathcal{V} be the set of tight facilities, N(j)={i𝒱:αj>c(j,i)}N(j)=\{i\in\mathcal{V}:\alpha_{j}>c(j,i)\}, and N(i)={j𝒟:αj>c(j,i)}N(i)=\{j\in\mathcal{D}:\alpha_{j}>c(j,i)\}. Then, by adding (9) over all j𝒟j\in\mathcal{D}, we have that

𝔼[cost(𝒟,S)]\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S)] ρ(p)𝔼[j𝒟αjj𝒟iN(j)S(αjc(j,i))]\displaystyle\leq\rho(p)\cdot\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{j\in\mathcal{D}}\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]
=ρ(p)(j𝒟αj𝔼[iSjN(i)(αjc(j,i))])\displaystyle=\rho(p)\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\mathbb{E}\left[\sum_{i\in S}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\right]\right)
=ρ(p)(j𝒟αjλ𝔼[|S|]).\displaystyle=\rho(p)\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda\cdot\mathbb{E}[|S|]\right).

Above, the final line follows because jN(i)(αjc(j,i))=j𝒟max(αjc(j,i),0)=λ\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)=\lambda, since ii is assumed to be tight. Thus, we obtain an LMP approximation with approximation factor ρ(p)\rho(p) for any choice of pp. Given this, we now prove Lemma 4.2.

Proof of Lemma 4.2.

We fix $j$ and do casework based on the sizes $a=|N(j)\cap I_1|$, $b=|N(j)\cap I_2|$, and $c=|N(j)\cap I_3|$. We show that $\mathbb{E}[c(j,S)]\leq\rho(p)\cdot\mathbb{E}\left[\alpha_j-\sum_{i\in N(j)\cap S}(\alpha_j-c(j,i))\right]$ for each case of $a,b,c$. We will call $\mathbb{E}[c(j,S)]$ the numerator and $\mathbb{E}\left[\alpha_j-\sum_{i\in N(j)\cap S}(\alpha_j-c(j,i))\right]$ the denominator, and attempt to show this fraction is at most $\rho(p)$. By scaling all distances (and all $\alpha_j,t_i$ values accordingly), we will assume WLOG that $\alpha_j=1$, so $d(j,w(j)),t_{w(j)}\leq 1$.

First, we have the following basic proposition.

Proposition 4.3.

d(j,S)d(j,I1)1+δ1d(j,S)\leq d(j,I_{1})\leq 1+\sqrt{\delta_{1}}.

Proof.

Note that d(j,I1)d(j,w(j))+d(w(j),I1)d(j,I_{1})\leq d(j,w(j))+d(w(j),I_{1}). But d(j,w(j))1d(j,w(j))\leq 1 and by properties of the independent set I1I_{1}, there exists i1I1i_{1}\in I_{1} such that d(w(j),i1)δ1min(tw(j),ti1)δ1d(w(j),i_{1})\leq\sqrt{\delta_{1}\cdot\min(t_{w(j)},t_{i_{1}})}\leq\sqrt{\delta_{1}}, since tw(j)1t_{w(j)}\leq 1. So, d(j,I1)1+δ1d(j,I_{1})\leq 1+\sqrt{\delta_{1}}, as desired. ∎

In addition, we have the following simple proposition about Euclidean space, the proof of which is deferred to Appendix B.

Proposition 4.4.

Suppose that we have 44 points A,B,C,DA,B,C,D in Euclidean space and parameters ν1,ν2,ν3,σ1,σ2,σ30\nu_{1},\nu_{2},\nu_{3},\sigma_{1},\sigma_{2},\sigma_{3}\geq 0 such that d(A,B)21d(A,B)^{2}\leq 1, d(B,C)2ν1min(σ1,σ2)d(B,C)^{2}\leq\nu_{1}\cdot\min(\sigma_{1},\sigma_{2}), d(B,D)2ν2min(σ1,σ3)d(B,D)^{2}\leq\nu_{2}\cdot\min(\sigma_{1},\sigma_{3}), and d(C,D)2ν3min(σ2,σ3)d(C,D)^{2}\geq\nu_{3}\cdot\min(\sigma_{2},\sigma_{3}). Moreover, assume that σ11\sigma_{1}\leq 1 and that ν1,ν2ν3\nu_{1},\nu_{2}\geq\nu_{3}. Then,

pCA22+(1p)DA221+pν1+(1p)ν2+2pν1+(1p)ν2p(1p)ν3.p\cdot\|C-A\|_{2}^{2}+(1-p)\cdot\|D-A\|_{2}^{2}\leq 1+p\cdot\nu_{1}+(1-p)\cdot\nu_{2}+2\sqrt{p\cdot\nu_{1}+(1-p)\cdot\nu_{2}-p(1-p)\cdot\nu_{3}}.
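Proposition 4.4 can be spot-checked numerically by sampling planar configurations that satisfy its hypotheses and verifying the conclusion. In the sketch below we take $\sigma_1=\sigma_2=\sigma_3=1$, $\nu_1=\nu_2=\delta_1$, and $\nu_3=\delta_2$; these parameter choices, the sampling box, and the grid of $p$ values are all our own illustrative choices.

```python
import math, random

nu1 = nu2 = (4 + 8 * math.sqrt(2)) / 7   # illustrative: nu1, nu2 >= nu3 as required
nu3 = 2.0
A, B = (0.0, 0.0), (1.0, 0.0)            # d(A,B)^2 = 1

def sq(u, v):
    return (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2

rng = random.Random(0)
checked, max_gap = 0, -math.inf
while checked < 500:
    C = (B[0] + rng.uniform(-1.6, 1.6), B[1] + rng.uniform(-1.6, 1.6))
    D = (B[0] + rng.uniform(-1.6, 1.6), B[1] + rng.uniform(-1.6, 1.6))
    # keep only configurations satisfying the hypotheses of Proposition 4.4
    if sq(B, C) > nu1 or sq(B, D) > nu2 or sq(C, D) < nu3:
        continue
    checked += 1
    for k in range(11):
        p = k / 10
        lhs = p * sq(C, A) + (1 - p) * sq(D, A)
        mix = p * nu1 + (1 - p) * nu2 - p * (1 - p) * nu3
        rhs = 1 + p * nu1 + (1 - p) * nu2 + 2 * math.sqrt(mix)
        max_gap = max(max_gap, lhs - rhs)    # conclusion: lhs <= rhs
```

Since the proposition is tight when $C$ (or $D$) is collinear with $A,B$ at maximal distance, random samples approach but never exceed the bound.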

Finally, we will also make frequent use of the well-known fact that for any set of h1h\geq 1 points I={i1,,ih}I=\{i_{1},\dots,i_{h}\} and any jj in Euclidean space, 12hi,iIii22iIij22.\frac{1}{2h}\sum_{i,i^{\prime}\in I}\|i-i^{\prime}\|_{2}^{2}\leq\sum_{i\in I}\|i-j\|_{2}^{2}. As a direct result of this, if c(j,i)1c(j,i)\leq 1 for all iIi\in I but c(i,i)2c(i,i^{\prime})\geq 2 for all iiIi\neq i^{\prime}\in I, then iI(1c(j,i))1\sum_{i\in I}(1-c(j,i))\leq 1.
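This fact follows because the ordered-pair sum $\sum_{i,i'\in I}\|i-i'\|_2^2$ equals $2h\sum_{i\in I}\|i-m\|_2^2$ for the centroid $m$ of $I$, and the centroid minimizes $\sum_{i\in I}\|i-j\|_2^2$ over $j$. A quick numerical sketch (random data of our own construction):

```python
import random

rng = random.Random(1)
dim, h = 3, 6
I = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(h)]
j = [rng.uniform(-1, 1) for _ in range(dim)]

def sq(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

pairwise = sum(sq(u, v) for u in I for v in I)   # sum over ordered pairs i, i'
to_j = sum(sq(u, j) for u in I)
lhs = pairwise / (2 * h)                         # (1/2h) * pairwise sum

# equality is attained when j is the centroid of I
m = [sum(u[k] for u in I) / h for k in range(dim)]
at_centroid = sum(sq(u, m) for u in I)
```

Here `lhs` coincides with `at_centroid` (up to floating point), confirming that the inequality is tight exactly at the centroid.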

We are now ready to investigate the several cases needed to prove Lemma 4.2.

Case 1: 𝒂=𝟎,𝒃=𝟏,𝒄=𝟎\boldsymbol{a=0,b=1,c=0}.

Let i2i_{2} be the unique point in I2N(j),I_{2}\cap N(j), and let i=w(j)i^{*}=w(j) be the witness of jj. Recall that ii^{*} is tight, so i𝒱1i^{*}\in\mathcal{V}_{1}. Note that d(j,i)1d(j,i^{*})\leq 1 and ti1t_{i^{*}}\leq 1.

There are numerous sub-cases to consider, which we enumerate.

a)

    𝒊𝑽𝟐\boldsymbol{i^{*}\not\in V_{2}}. In this case, either iI1i^{*}\in I_{1} so d(i,I1)=0d(i^{*},I_{1})=0, or there exists i1I1i_{1}\in I_{1} such that d(i,i1)δ2min(ti,ti1)δ2d(i^{*},i_{1})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})}\leq\sqrt{\delta_{2}}. So, d(j,I1)1+δ2d(j,I_{1})\leq 1+\sqrt{\delta_{2}}. In addition, we have that i2Si_{2}\in S with probability pp. So, if we let t:=d(j,i2)t:=d(j,i_{2}), we can bound the fraction by

    pt2+(1p)(1+δ2)21p(1t2)=pt2+(1p)(1+δ2)2pt2+(1p).\frac{p\cdot t^{2}+(1-p)\cdot(1+\sqrt{\delta_{2}})^{2}}{1-p(1-t^{2})}=\frac{p\cdot t^{2}+(1-p)\cdot(1+\sqrt{\delta_{2}})^{2}}{p\cdot t^{2}+(1-p)}.

    Note that 0t<10\leq t<1 since i2N(j),i_{2}\in N(j), and the above fraction is maximized for t=0t=0, in which case we get that the fraction is at most

    (1+δ2)2.(1+\sqrt{\delta_{2}})^{2}. (1.a)
b)

    𝒊𝑽𝟑.\boldsymbol{i^{*}\in V_{3}}. In this case, there exists i3I3i_{3}\in I_{3} (possibly i3=ii_{3}=i^{*}) such that d(i,i3)δ2min(ti,ti3).d(i^{*},i_{3})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{3}})}. In addition, there exists i1I1i_{1}\in I_{1} such that d(i,i1)δ1min(ti,ti1)d(i^{*},i_{1})\leq\sqrt{\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})}. Finally, since I3V2I_{3}\subset V_{2}, we must have that d(i1,i3)δ2min(ti1,ti3)d(i_{1},i_{3})\geq\sqrt{\delta_{2}\cdot\min(t_{i_{1}},t_{i_{3}})}. If we condition on i2Si_{2}\in S, then the numerator and denominator both equal c(j,i2)c(j,i_{2}), so the fraction is 11 (or 0/00/0). Else, if we condition on i2Si_{2}\not\in S, then the denominator is 11, and the numerator is at most pi3j22+(1p)i1j22p\cdot\|i_{3}-j\|_{2}^{2}+(1-p)\cdot\|i_{1}-j\|_{2}^{2}, since i1Si_{1}\in S always, and either q(i3)i2q(i_{3})\neq i_{2}, in which case (i3S|i2S)=p\mathbb{P}(i_{3}\in S|i_{2}\not\in S)=p, or q(i3)=i2q(i_{3})=i_{2}, in which case (i3S|i2S)=p1pp\mathbb{P}(i_{3}\in S|i_{2}\not\in S)=\frac{p}{1-p}\geq p.

Note that $d(j,i^*)\leq 1$, that $t_{i^*}\leq 1$, and that $\delta_1\geq\delta_2$, so the hypothesis $\nu_1,\nu_2\geq\nu_3$ holds. We may therefore apply Proposition 4.4 with $A=j$, $B=i^*$, $C=i_3$, $D=i_1$ and $\nu_1=\nu_3=\delta_2$, $\nu_2=\delta_1$, $\sigma_1=t_{i^*}$, $\sigma_2=t_{i_3}$, $\sigma_3=t_{i_1}$ to bound the numerator (and thus the overall fraction, since the denominator equals $1$) by

    1+pδ2+(1p)δ1+2p2δ2+(1p)δ1.1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}}. (1.b)

In the remaining cases, we may assume that iV2\V3i^{*}\in V_{2}\backslash V_{3}. Then, one of the following must occur:

c)

$\boldsymbol{i^*=i_2}$. In this case, define $t=d(j,i^*)\in[0,1]$, and note that $d(j,I_1)\leq d(j,i^*)+d(i^*,I_1)\leq t+\sqrt{\delta_1}$. So, with probability $p$, we have that $d(j,S)\leq d(j,i^*)=t$, and otherwise, we have that $d(j,S)\leq d(j,I_1)\leq t+\sqrt{\delta_1}$. So, we can bound the ratio by

    max0t1pt2+(1p)(t+δ1)21p(1t2)max((0.75+δ1)2,(1p)(1+δ1)2+3p/41p/4).\max_{0\leq t\leq 1}\frac{p\cdot t^{2}+(1-p)\cdot(t+\sqrt{\delta_{1}})^{2}}{1-p\cdot(1-t^{2})}\leq\max\left((\sqrt{0.75}+\sqrt{\delta_{1}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+3p/4}{1-p/4}\right). (1.c)

    We prove this final inequality in Appendix B.

d)

    𝒊𝑰𝟐\boldsymbol{i^{*}\in I_{2}} but 𝒊𝒊𝟐\boldsymbol{i^{*}\neq i_{2}}. First, we recall that d(j,i)1d(j,i^{*})\leq 1. Now, let t=d(j,i2)t=d(j,i_{2}). In this case, with probability pp, d(j,S)td(j,S)\leq t (if we select i2i_{2} to be in SS), with probability p(1p)p(1-p), d(j,S)1d(j,S)\leq 1 (if we select ii^{*} but not i2i_{2} to be in SS), and in the remaining event of (1p)2(1-p)^{2} probability, we still have that d(j,S)d(j,I1)1+δ1d(j,S)\leq d(j,I_{1})\leq 1+\sqrt{\delta_{1}} by Proposition 4.3. So, we can bound the ratio by

    max0t1pt2+p(1p)1+(1p)2(1+δ1)21p(1t2).\max_{0\leq t\leq 1}\frac{p\cdot t^{2}+p(1-p)\cdot 1+(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}}{1-p\cdot(1-t^{2})}.

    Note that this is maximized when t=0t=0 (since the numerator and denominator increase at the same rate when tt increases), so we can bound the ratio by

    p(1p)+(1p)2(1+δ1)21p=p+(1p)(1+δ1)2.\frac{p(1-p)+(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}}{1-p}=p+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}. (1.d)
e)

    There is more than one neighbor of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟐)\boldsymbol{H(\delta_{2})} that is in 𝑰𝟐\boldsymbol{I_{2}}. In this case, there is some other point i2I2i_{2}^{\prime}\in I_{2} not in N(j)N(j) such that d(i,i2)δ2min(ti,ti2).d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})}. So, we have four points j,i,i1I1,i2I2j,i^{*},i_{1}\in I_{1},i_{2}^{\prime}\in I_{2} such that d(j,i)1,d(j,i^{*})\leq 1, d(i,i2)δ2min(ti,ti2),d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})}, d(i,i1)δ1min(ti,ti1),d(i^{*},i_{1})\leq\sqrt{\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})}, and d(i1,i2)δ2min(ti1,ti2).d(i_{1},i_{2}^{\prime})\geq\sqrt{\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}^{\prime}})}.

    If we condition on i2Si_{2}\in S, then the denominator equals c(j,i2)c(j,i_{2}) and the numerator is at most c(j,i2)c(j,i_{2}), so the fraction is 11 (or 0/00/0). Else, if we condition on i2Si_{2}\not\in S, then the denominator is 11, and the numerator is at most pi2j22+(1p)i1j22p\cdot\|i_{2}^{\prime}-j\|_{2}^{2}+(1-p)\cdot\|i_{1}-j\|_{2}^{2}. Note that d(j,i)1d(j,i^{*})\leq 1, that ti1t_{i^{*}}\leq 1, and that δ2,δ1δ2\delta_{2},\delta_{1}\geq\delta_{2}. So, as in 1.b, we may apply Proposition 4.4 to bound the ratio by

    1+pδ2+(1p)δ1+2p2δ2+(1p)δ1.1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}}. (1.e)
f)

    There are no neighbors of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟐)\boldsymbol{H(\delta_{2})} that are in 𝑰𝟐\boldsymbol{I_{2}}. In this case, we would actually have that iI2,i^{*}\in I_{2}, because we defined I2I_{2} to be a maximal independent set in the induced subgraph H(δ2)[V2].H(\delta_{2})[V_{2}]. So, if there were no such neighbors and iV2i^{*}\in V_{2}, then we could add ii^{*} to I2I_{2}, contradicting the maximality of I2I_{2}. Having iI2i^{*}\in I_{2} was already covered by subcases c) and d).

g)

    There is a neighbor of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟑)\boldsymbol{H(\delta_{3})} that is also in 𝑰𝟐\boldsymbol{I_{2}}, which means that either d(i,i2)δ3tid(i^{*},i_{2})\leq\sqrt{\delta_{3}\cdot t_{i^{*}}} so d(i2,j)max(0,d(j,i)δ3ti)d(i_{2},j)\geq\max(0,d(j,i^{*})-\sqrt{\delta_{3}\cdot t_{i^{*}}}), or there is some other point i2I2i_{2}^{\prime}\in I_{2} not in N(j)N(j) such that d(i,i2)δ3min(ti,ti2).d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})}. If d(i,i2)δ3tid(i^{*},i_{2})\leq\sqrt{\delta_{3}\cdot t_{i^{*}}}, then define t=tit=t_{i^{*}} and u=d(j,i)u=d(j,i^{*}). In this case, d(j,I1)u+δ1t,d(j,I_{1})\leq u+\sqrt{\delta_{1}\cdot t}, and d(j,i2)max(0,uδ3t).d(j,i_{2})\geq\max(0,u-\sqrt{\delta_{3}\cdot t}). Since t=ti1t=t_{i^{*}}\leq 1 and u=d(j,i)1u=d(j,i^{*})\leq 1, we can bound the overall fraction as at most

    max0t1max0u1maxd(j,i2)max(0,uδ3t)(1p)(u+δ1t)2+pd(j,i2)21p+pd(j,i2)2\displaystyle\hskip 14.22636pt\max_{0\leq t\leq 1}\max_{0\leq u\leq 1}\max_{d(j,i_{2})\geq\max(0,u-\sqrt{\delta_{3}\cdot t})}\frac{(1-p)\cdot(u+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot d(j,i_{2})^{2}}{1-p+p\cdot d(j,i_{2})^{2}}
    max((δ1+δ3)2,(1p)(1+δ1)2+p(1δ3)21p+p(1δ3)2).\displaystyle\leq\max\left((\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p\cdot(1-\sqrt{\delta_{3}})^{2}}\right). (1.g.i)

    We derive the final inequality in Appendix B.

    Alternatively, if d(i,i2)δ3min(ti,ti2),d(i^{*},i_{2}^{\prime})\leq\sqrt{\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}})}, then if we condition on i2S,i_{2}\in S, the fraction is 11 (or 0/00/0), and if we condition on i2Si_{2}\not\in S, the denominator is 11 and the numerator is at most pd(j,i2)2+(1p)d(j,i1)2p(1+δ3)2+(1p)(1+δ1)2.p\cdot d(j,i_{2}^{\prime})^{2}+(1-p)\cdot d(j,i_{1})^{2}\leq p\cdot(1+\sqrt{\delta_{3}})^{2}+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}. (Note that i2Si_{2}\in S and i2Si_{2}^{\prime}\in S are independent.) Therefore, we can also bound the overall fraction by

    p(1+δ3)2+(1p)(1+δ1)2.p\cdot(1+\sqrt{\delta_{3}})^{2}+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}. (1.g.ii)
h)

    There is a neighbor of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟐)\boldsymbol{H(\delta_{2})} that is also in 𝑰𝟏\boldsymbol{I_{1}}. In this case, ii^{*} would not be in V2V_{2}, so we are back to sub-case 1.a.

Case 2: 𝒂=𝟎,𝒃=𝟏,𝒄𝟏\boldsymbol{a=0,b=1,c\geq 1}.

Let $i_2$ be the unique point in $N(j)\cap I_2$, and let $i_3^{(1)},\dots,i_3^{(c)}$ represent the points in $N(j)\cap I_3$. Let $c_1$ be the number of points in $N(j)\cap I_3$ that are in $q^{-1}(i_2)$, and let $c_2=c-c_1$ be the number of points in $N(j)\cap I_3$ not in $q^{-1}(i_2)$. We will have four subcases. For simplicity, throughout this case we substitute the fixed value $\delta_2=2$ directly.

Before delving into the subcases, we first prove the following propositions regarding the probability of some point in I3I_{3} being selected.

Proposition 4.5.

Let c=|N(j)I3|c=|N(j)\cap I_{3}|. Then, the probability that no point in N(j)I3N(j)\cap I_{3} is in SS is at most 12+12(12p)c\frac{1}{2}+\frac{1}{2}(1-2p)^{c}.

Proof.

First, note that (12+12x)(12+12y)+(1212x)(1212y)=12+12xy\left(\frac{1}{2}+\frac{1}{2}x\right)\cdot\left(\frac{1}{2}+\frac{1}{2}y\right)+\left(\frac{1}{2}-\frac{1}{2}x\right)\cdot\left(\frac{1}{2}-\frac{1}{2}y\right)=\frac{1}{2}+\frac{1}{2}xy. Therefore, if 0x,y10\leq x,y\leq 1, then (12+12x)(12+12y)12+12xy.\left(\frac{1}{2}+\frac{1}{2}x\right)\cdot\left(\frac{1}{2}+\frac{1}{2}y\right)\leq\frac{1}{2}+\frac{1}{2}xy. In general, through induction we have that for any 0x1,,xr1,0\leq x_{1},\dots,x_{r}\leq 1, that s=1r(12+12xs)12+12x1xr\prod_{s=1}^{r}\left(\frac{1}{2}+\frac{1}{2}x_{s}\right)\leq\frac{1}{2}+\frac{1}{2}x_{1}\cdots x_{r}.

Now, group the points i3(1),,i3(c)N(j)I3i_{3}^{(1)},\dots,i_{3}^{(c)}\in N(j)\cap I_{3} into rcr\leq c groups of sizes c1,,crc_{1},\dots,c_{r}, where each group is points that map to the same point in I2I_{2} under qq. Then, for each group ss, the probability that no point in the group is in SS is precisely 12+12(12p)cs\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{s}}, because with probability 12\frac{1}{2} we will only consider picking the point q(i)q(i) (for ii in group ss), and otherwise each point in the group will still not be in SS with probability 12p1-2p. So, the overall probability is

s=1r(12+12(12p)cs)12+12(12p)c1++cr=12+12(12p)c.\prod_{s=1}^{r}\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{s}}\right)\leq\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{1}+\cdots+c_{r}}=\frac{1}{2}+\frac{1}{2}(1-2p)^{c}.\qed

We also note the following related proposition.

Proposition 4.6.

Let c=|N(j)I3|c=|N(j)\cap I_{3}|, and i=q(i)i^{\prime}=q(i) for some arbitrary iN(j)I3i\in N(j)\cap I_{3}. Then, the probability that no point in N(j)I3N(j)\cap I_{3} nor ii^{\prime} is in SS is at most 12(12p)+12(12p)c\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}.

Proof.

Similar to the previous proposition, we group the points i3(1),,i3(c)N(j)I3i_{3}^{(1)},\dots,i_{3}^{(c)}\in N(j)\cap I_{3} into rcr\leq c groups of sizes c1,,crc_{1},\dots,c_{r}. Assume WLOG that ii is in the first group. Then, the probability that no point in the first group nor ii^{\prime} is in SS is 12(12p)+12(12p)c1=(12p)(12+12(12p)c11)\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c_{1}}=(1-2p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{1}-1}\right). So, the overall probability is

(12p)(12+12(12p)c11)s=2r(12+12(12p)cs)\displaystyle(1-2p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{1}-1}\right)\cdot\prod_{s=2}^{r}\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{s}}\right) (12p)(12+12(12p)(c11)+c2++cr)\displaystyle\leq(1-2p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{(c_{1}-1)+c_{2}+\cdots+c_{r}}\right)
=12(12p)+12(12p)c.\displaystyle=\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}.\qed
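For a single star, Propositions 4.5 and 4.6 can be verified exactly by brute-force enumeration of the randomness in step 6 of Algorithm 1 (the coin, the center's Bernoulli trial, and each leaf's Bernoulli trial). A sketch, with all names our own:

```python
import itertools

def star_probs(p, c):
    """Exact P[no leaf of the star opened] and P[neither a leaf nor the
    center opened], for a center i' in I2 with c leaves q^{-1}(i') in I3."""
    pr_no_leaf = pr_neither = 0.0
    for heads, center, leaves in itertools.product(
            (True, False), (True, False),
            itertools.product((True, False), repeat=c)):
        w = 0.5 * (2 * p if center else 1 - 2 * p)   # coin * center's Bernoulli
        for leaf in leaves:
            w *= 2 * p if leaf else 1 - 2 * p        # each leaf's Bernoulli
        center_open = heads and center               # heads: only the center may open
        leaf_open = (not heads) and any(leaves)      # tails: only the leaves may open
        if not leaf_open:
            pr_no_leaf += w
            if not center_open:
                pr_neither += w
    return pr_no_leaf, pr_neither

p, c = 0.2, 3
no_leaf, neither = star_probs(p, c)

# two independent stars with 2 and 3 leaves: the product form of Proposition 4.5
prod_two_stars = star_probs(p, 2)[0] * star_probs(p, 3)[0]
bound_two_stars = 0.5 + 0.5 * (1 - 2 * p) ** 5
```

The enumeration reproduces the closed forms $\frac{1}{2}+\frac{1}{2}(1-2p)^c$ and $\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^c$ exactly, and the two-star product respects the bound of Proposition 4.5.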
a)

    𝒄𝟏=𝟎\boldsymbol{c_{1}=0}. In this case, we have that no pair of points in N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) are connected in H(δ2)H(\delta_{2}), which means that they have pairwise distances at least δ2\sqrt{\delta_{2}} from each other (since ti1t_{i}\geq 1 if iN(j)i\in N(j)). So, iN(j)(I2I3)c(j,i)12(c+1)c(c+1)(δ2)2=c\sum_{i\in N(j)\cap(I_{2}\cup I_{3})}c(j,i)\geq\frac{1}{2(c+1)}\cdot c(c+1)(\sqrt{\delta_{2}})^{2}=c since |N(j)(I2I3)|=c+1|N(j)\cap(I_{2}\cup I_{3})|=c+1 and δ2=2\delta_{2}=2. Consequently, iN(j)(I2I3)(1c(j,i))(c+1)c=1\sum_{i\in N(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))\leq(c+1)-c=1. Therefore, the denominator is at least 1p1-p. To bound the numerator, we note that the probability of none of the points in I2I3I_{2}\cup I_{3} being in SS is at most (1p)(12+12(12p)c2)(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right). This is because i2Si_{2}\not\in S with probability 1p1-p, no point in N(j)I3N(j)\cap I_{3} is in SS with probability at most 12+12(12p)c2\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}} by Proposition 4.5, and these two events are independent since c1=0c_{1}=0. If some point in N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) is in SS, then c(j,S)1c(j,S)\leq 1, and otherwise, c(j,S)(1+δ1)2c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}. Therefore, we can bound the numerator as at most (1p)(12+12(12p)c2)(1+δ1)2+(1(1p)(12+12(12p)c2)).(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\right). Overall, we have that the fraction is at most

    (1p)(12+12(12p)c2)(1+δ1)2+(1(1p)(12+12(12p)c2))1p.\frac{(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-p)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\right)}{1-p}. (2.a)
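The lower bound $\sum_{i\in N(j)\cap(I_{2}\cup I_{3})}c(j,i)\geq c$ used in this subcase is an instance of a standard averaging fact: for any points $x_{1},\dots,x_{m}$ and any $j$, $\sum_{i}\|x_{i}-j\|^{2}\geq\frac{1}{2m}\sum_{i,i^{\prime}}\|x_{i}-x_{i^{\prime}}\|^{2}$, since the left side is minimized when $j$ is the centroid; with $m=c+1$ points at pairwise squared distance at least $\delta_{2}=2$, this gives $\geq c$. A randomized check of the fact (a sketch, not part of the proof):

```python
import random

# Randomized check: if x_1, ..., x_m have pairwise squared distance >= delta,
# then for any j,
#   sum_i ||x_i - j||^2 >= (1/(2m)) sum_{i,i'} ||x_i - x_{i'}||^2
#                       >= (m - 1) * delta / 2.
def sqdist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

random.seed(1)
ok = True
for _ in range(500):
    m, d = random.randint(2, 8), random.randint(1, 5)
    pts = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    j = [random.gauss(0, 1) for _ in range(d)]
    delta = min(sqdist(u, v) for a, u in enumerate(pts)
                for b, v in enumerate(pts) if a < b)
    lhs = sum(sqdist(u, j) for u in pts)
    mid = sum(sqdist(u, v) for u in pts for v in pts) / (2 * m)
    ok = ok and lhs >= mid - 1e-9 and mid >= (m - 1) * delta / 2 - 1e-9
```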
  b)

    𝒄𝟐=𝟎\boldsymbol{c_{2}=0} and 𝒄𝟐\boldsymbol{c\geq 2}. In this case, we have that q(i3(k))=i2q(i_{3}^{(k)})=i_{2} for all k[c]k\in[c] (note that c=c1c=c_{1}). Since the points i3(k)i_{3}^{(k)} for all k[c]k\in[c] have pairwise distance at least 2\sqrt{2} from each other, we have that iN(j)I3(1c(j,i))1\sum_{i\in N(j)\cap I_{3}}(1-c(j,i))\leq 1. Letting tt be such that 1t=iN(j)I3(1c(j,i)),1-t=\sum_{i\in N(j)\cap I_{3}}(1-c(j,i)), we have that 𝔼iN(j)I3c(j,i)=c1c+tc\mathbb{E}_{i\sim N(j)\cap I_{3}}c(j,i)=\frac{c-1}{c}+\frac{t}{c}. In addition, let u=d(j,i2).u=d(j,i_{2}). In this case, the denominator of our fraction is 1p(1u2)p(1t)=pu2+pt+(12p).1-p(1-u^{2})-p(1-t)=p\cdot u^{2}+p\cdot t+(1-2p). To bound the numerator, with probability pp we have that i2Si_{2}\in S, in which case c(j,S)u2c(j,S)\leq u^{2}. In addition, there is a disjoint 12(1(12p)c)\frac{1}{2}\cdot\left(1-(1-2p)^{c}\right)-probability event where some i3(k)Si_{3}^{(k)}\in S, and conditioned on this event, 𝔼[c(j,S)]c1c+tc\mathbb{E}[c(j,S)]\leq\frac{c-1}{c}+\frac{t}{c}. Otherwise, we still have that c(j,S)c(j,I1)(1+δ1)2c(j,S)\leq c(j,I_{1})\leq(1+\sqrt{\delta_{1}})^{2}. So, overall we have that the fraction is at most

    pu2+12(1(12p)c)(c1c+tc)+(1p12(1(12p)c))(1+δ1)2pu2+pt+(12p).\frac{p\cdot u^{2}+\frac{1}{2}\cdot\left(1-(1-2p)^{c}\right)\cdot\left(\frac{c-1}{c}+\frac{t}{c}\right)+\left(1-p-\frac{1}{2}\cdot\left(1-(1-2p)^{c}\right)\right)\cdot(1+\sqrt{\delta_{1}})^{2}}{p\cdot u^{2}+p\cdot t+(1-2p)}.

    This function clearly decreases as uu increases (since the numerator and denominator increase at the same rate). In addition, since 12(1(12p)c)1c<p\frac{1}{2}\cdot(1-(1-2p)^{c})\cdot\frac{1}{c}<p (whenever 0<p<120<p<\frac{1}{2}), we have that the numerator increases at a slower rate than the denominator when tt increases, so this function also decreases as tt increases. So, we may assume that t=u=0t=u=0 to get that the fraction is at most

    12(1(12p)c)c1c+(1p12(1(12p)c))(1+δ1)212p.\frac{\frac{1}{2}\cdot(1-(1-2p)^{c})\cdot\frac{c-1}{c}+(1-p-\frac{1}{2}\cdot(1-(1-2p)^{c}))\cdot(1+\sqrt{\delta_{1}})^{2}}{1-2p}. (2.b)
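The monotonicity claims behind (2.b) can be spot-checked numerically. The following sketch (not part of the proof) verifies on random parameters that the case 2(b) fraction is indeed maximized at $t=u=0$:

```python
import math, random

def frac_2b(p, c, delta1, u, t):
    # Fraction from case 2(b), as a function of u = d(j, i_2) and t.
    q = 0.5 * (1 - (1 - 2 * p) ** c)  # prob. some i_3^(k) is in S
    num = (p * u * u + q * ((c - 1) / c + t / c)
           + (1 - p - q) * (1 + math.sqrt(delta1)) ** 2)
    den = p * u * u + p * t + (1 - 2 * p)
    return num / den

random.seed(2)
ok = True
for _ in range(2000):
    p = random.uniform(0.05, 0.45)
    c = random.randint(2, 6)
    d1 = random.uniform(1.0, 3.0)
    u, t = random.uniform(0, 2), random.uniform(0, 1)
    ok = ok and frac_2b(p, c, d1, u, t) <= frac_2b(p, c, d1, 0, 0) + 1e-9
```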
  c)

\boldsymbol{c_{1},c_{2}\geq 1}. In this case, there exists a point i_{3}^{(k)}\in N(j)\cap I_{3} that is not in q^{-1}(i_{2}), so it has distance at least \sqrt{\delta_{2}} from i_{2}. Therefore, by the triangle inequality, d(i_{2},j)\geq\sqrt{\delta_{2}}-1. Let t=d(i_{2},j). Next, since all of the points in N(j)\cap I_{3} have pairwise distance at least \sqrt{2} from each other, \sum_{i\in N(j)\cap I_{3}}(1-c(j,i))\leq 1. Therefore, the denominator of the fraction is at least 1-p(1-t^{2})-p=1-2p+t^{2}\cdot p.

    We now bound the numerator. First, by Proposition 4.6 and since c2c\geq 2, the probability that no point in N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) is in SS is some pp^{\prime}, where p12(12p)+12(12p)c(12p)(1p)p^{\prime}\leq\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\leq(1-2p)(1-p). In this event, we have that c(j,S)(1+δ1)2c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}. In addition, there is a pp probability that i2Si_{2}\in S, in which case c(j,S)t2c(j,S)\leq t^{2}. Finally, with 1pp1-p-p^{\prime} probability, we have that i2Si_{2}\not\in S but there is some i3(k)Si_{3}^{(k)}\in S, so c(j,S)1c(j,S)\leq 1. Overall, the fraction is at most

    pt2+p(1+δ1)2+(1pp)12p+t2p.\frac{p\cdot t^{2}+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-p-p^{\prime})}{1-2p+t^{2}p}.

    This function clearly decreases as tt increases (since the numerator and denominator increase at the same rate), so we may assume that t=21t=\sqrt{2}-1 as this is our lower bound on tt. So, the fraction is at most

    p(21)2+p(1+δ1)2+(1pp)12p+(21)2p1p(222)+(12p)(1p)((1+δ1)21)1p(221).\frac{p\cdot(\sqrt{2}-1)^{2}+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-p-p^{\prime})}{1-2p+(\sqrt{2}-1)^{2}p}\leq\frac{1-p(2\sqrt{2}-2)+(1-2p)(1-p)\cdot\left((1+\sqrt{\delta_{1}})^{2}-1\right)}{1-p(2\sqrt{2}-1)}. (2.c)
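The final inequality in (2.c) substitutes the upper bound $p^{\prime}\leq(1-2p)(1-p)$ (the fraction is increasing in $p^{\prime}$) and simplifies. A numeric check of both steps (a sketch, separate from the proof):

```python
import math, random

# Check: (i) for c >= 2, (1-2p)/2 + (1-2p)^c/2 <= (1-2p)(1-p); and
# (ii) substituting t = sqrt(2)-1 and p' = (1-2p)(1-p) into the left-hand
# fraction of (2.c) gives exactly the stated right-hand side.
random.seed(3)
ok = True
for _ in range(2000):
    p = random.uniform(0.05, 0.45)
    d1 = random.uniform(1.0, 3.0)
    c = random.randint(2, 6)
    x = 1 - 2 * p
    ok = ok and 0.5 * x + 0.5 * x ** c <= x * (1 - p) + 1e-12
    t = math.sqrt(2) - 1
    pp = x * (1 - p)
    lhs = (p * t * t + pp * (1 + math.sqrt(d1)) ** 2 + (1 - p - pp)) \
        / (1 - 2 * p + t * t * p)
    rhs = (1 - p * (2 * math.sqrt(2) - 2)
           + x * (1 - p) * ((1 + math.sqrt(d1)) ** 2 - 1)) \
        / (1 - p * (2 * math.sqrt(2) - 1))
    ok = ok and abs(lhs - rhs) < 1e-9
```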
  d)

    𝒄𝟏=𝟏\boldsymbol{c_{1}=1} and 𝒄𝟐=𝟎\boldsymbol{c_{2}=0}. In this case, c=1c=1, so we simply write i3i_{3} as the unique point in I3N(j)I_{3}\cap N(j). Let i=w(j)i^{*}=w(j) be the witness of jj. Since c1=1c_{1}=1, this means that i2,i3i_{2},i_{3} are neighbors in H(δ2)H(\delta_{2}) and q(i3)=i2q(i_{3})=i_{2}. Finally, we have that i2,i3i_{2},i_{3} are not connected in H(δ3)H(\delta_{3}). So, d(i2,i3)δ3min(ti2,ti3).d(i_{2},i_{3})\geq\sqrt{\delta_{3}\cdot\min(t_{i_{2}},t_{i_{3}})}. Now, note that since i2,i3i_{2},i_{3} are not in I1I_{1}, we either have that the witness ii^{*} is in I1I_{1}, in which case d(j,I1)=1d(j,I_{1})=1, or all of i2,i3,ii_{2},i_{3},i^{*} are adjacent to I1I_{1} in H(δ1)H(\delta_{1}) since I1I_{1} was a maximal independent set in H(δ1)H(\delta_{1}). Therefore, if we define β=d(j,i2)\beta=d(j,i_{2}) and γ=d(j,i3)\gamma=d(j,i_{3}), this means that d(j,I1)min(1+δ1,β+δ1ti2,γ+δ1ti3),d(j,I_{1})\leq\min\left(1+\sqrt{\delta_{1}},\beta+\sqrt{\delta_{1}\cdot t_{i_{2}}},\gamma+\sqrt{\delta_{1}\cdot t_{i_{3}}}\right), and d(i2,i3)β+γd(i_{2},i_{3})\leq\beta+\gamma by triangle inequality.

    Note that the denominator equals 1p(1β2)p(1γ2)1-p(1-\beta^{2})-p(1-\gamma^{2}). To bound the numerator, note that with probability pp, i2Si_{2}\in S in which case d(j,S)βd(j,S)\leq\beta and with probability pp, i3Si_{3}\in S in which case d(j,S)γd(j,S)\leq\gamma. Also, these two events are disjoint since q(i3)=i2q(i_{3})=i_{2}. Finally, in the remaining 12p1-2p probability event, d(j,S)d(j,I1)min(1+δ1,β+δ1ti2,γ+δ1ti3)d(j,S)\leq d(j,I_{1})\leq\min(1+\sqrt{\delta_{1}},\beta+\sqrt{\delta_{1}\cdot t_{i_{2}}},\gamma+\sqrt{\delta_{1}\cdot t_{i_{3}}}).

    Letting t=min(ti2,ti3)1t=\min(t_{i_{2}},t_{i_{3}})\geq 1, we have that min(β+δ1ti2,γ+δ1ti3)max(β,γ)+δ1t\min\left(\beta+\sqrt{\delta_{1}\cdot t_{i_{2}}},\gamma+\sqrt{\delta_{1}\cdot t_{i_{3}}}\right)\leq\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t}. We also know that δ3td(i2,i3)β+γ\sqrt{\delta_{3}\cdot t}\leq d(i_{2},i_{3})\leq\beta+\gamma. Therefore, we can bound the ratio by

    maxt1β+γδ3t(12p)min(1+δ1,max(β,γ)+δ1t)2+pβ2+pγ21p(1β2)p(1γ2)\displaystyle\hskip 14.22636pt\max_{\begin{subarray}{c}t\geq 1\\ \beta+\gamma\geq\sqrt{\delta_{3}\cdot t}\end{subarray}}\frac{(1-2p)\cdot\min(1+\sqrt{\delta_{1}},\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot\beta^{2}+p\cdot\gamma^{2}}{1-p(1-\beta^{2})-p(1-\gamma^{2})}
    =(12p)(1+δ1)2(δ1+(δ1+δ3)2)+p(1+δ1)2δ3(12p)(δ1+(δ1+δ3)2)+p(1+δ1)2δ3.\displaystyle=\frac{\left(1-2p\right)\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\left(\delta_{1}+\left(\sqrt{\delta_{1}}+\sqrt{\delta_{3}}\right)^{2}\right)+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}{\left(1-2p\right)\cdot\left(\delta_{1}+\left(\sqrt{\delta_{1}}+\sqrt{\delta_{3}}\right)^{2}\right)+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}. (2.d)

    We prove the final equality in Appendix B.
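Since the final equality is only established in Appendix B, a one-sided numeric spot check may be reassuring. The sketch below (illustrative only, not a substitute for the appendix) samples feasible $(t,\beta,\gamma)$ at the parameter values used later ($p=0.402$, $\delta_{1}=\frac{4+8\sqrt{2}}{7}$, $\delta_{3}=0.265$, as in Proposition 4.7) and confirms that the ratio never exceeds the closed-form value:

```python
import math, random

# One-sided check of (2.d): every feasible (t, beta, gamma) with t >= 1 and
# beta + gamma >= sqrt(delta_3 * t) should give a ratio at most the closed form.
p, d1, d3 = 0.402, (4 + 8 * math.sqrt(2)) / 7, 0.265
s1 = math.sqrt(d1)
A = d1 + (s1 + math.sqrt(d3)) ** 2
closed = ((1 - 2 * p) * (1 + s1) ** 2 * A + p * (1 + s1) ** 2 * d3) \
       / ((1 - 2 * p) * A + p * (1 + s1) ** 2 * d3)

random.seed(4)
ok = True
for _ in range(20000):
    t = random.uniform(1, 25)
    lo = math.sqrt(d3 * t)
    beta = random.uniform(0, lo + 3)
    gamma = random.uniform(max(0.0, lo - beta), lo + 3)
    num = ((1 - 2 * p) * min(1 + s1, max(beta, gamma) + math.sqrt(d1 * t)) ** 2
           + p * beta ** 2 + p * gamma ** 2)
    den = 1 - p * (1 - beta ** 2) - p * (1 - gamma ** 2)
    ok = ok and num / den <= closed + 1e-7
```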

Case 3: 𝒂=𝟎,𝒃𝟐\boldsymbol{a=0,b\geq 2}.

We split this case into three subcases. First, recall that each point i\in I_{3} corresponds to some point q(i)\in I_{2}. Let c_{1} denote the number of points i\in N(j)\cap I_{3} such that q(i)\in N(j)\cap I_{2}, and let c_{2}=c-c_{1}. Note that if c=0, then c_{1}=c_{2}=0.

  a)

\boldsymbol{c_{1}=0}. In this case, all of the points in N(j)\cap(I_{2}\cup I_{3}) must not be connected in H(\delta_{2}). Therefore, \sum_{i\in N(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))\leq 1, and since a=0, the denominator is at least 1-p. In addition, since b\geq 2, the probability that no point in N(j)\cap I_{2} is in S is at most (1-p)^{2}, in which case c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}. If some point in N(j)\cap I_{2} is in S, then c(j,S)\leq 1. Therefore, the numerator is at most (1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-(1-p)^{2}). So, we can bound the fraction by

    (1p)2(1+δ1)2+(1(1p)2)1p.\frac{(1-p)^{2}\cdot(1+\sqrt{\delta_{1}})^{2}+(1-(1-p)^{2})}{1-p}. (3.a)
  b)

\boldsymbol{c_{1}=1,c_{2}=0}. In this case, the probability that no point in N(j)\cap(I_{2}\cup I_{3}) is in S is (1-2p)\cdot(1-p)^{b-1}. Conditioned on this event, c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}, and otherwise, c(j,S)\leq 1. So, the numerator is at most (1-2p)\cdot(1-p)^{b-1}\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-2p)\cdot(1-p)^{b-1}\right)\cdot 1. Finally, we have that all of the points in N(j)\cap(I_{2}\cup I_{3}) are separated by at least \sqrt{2}, except for the unique point i_{3}\in N(j)\cap I_{3} and q(i_{3}), which are separated by at least \sqrt{\delta_{3}}. So, \sum_{i\in N(j)\cap(I_{2}\cup I_{3})}c(j,i)\geq\frac{1}{b+1}\cdot\left(\frac{b(b+1)}{2}\cdot 2+(\delta_{3}-2)\right)=b+\frac{\delta_{3}-2}{b+1}, which means that \sum_{i\in N(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))\leq 1+\frac{2-\delta_{3}}{b+1}. Therefore, the denominator is at least 1-p\cdot\left(1+\frac{2-\delta_{3}}{b+1}\right). Overall, the fraction is at most

    (1p)b1(12p)(1+δ1)2+(1(1p)b1(12p))1(1+2δ3b+1)p\frac{(1-p)^{b-1}\cdot(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}+\left(1-(1-p)^{b-1}(1-2p)\right)}{1-\left(1+\frac{2-\delta_{3}}{b+1}\right)\cdot p} (3.b)
  c)

    𝒄𝟏𝟐\boldsymbol{c_{1}\geq 2} or 𝒄𝟏=𝟏,𝒄𝟐𝟏\boldsymbol{c_{1}=1,c_{2}\geq 1}. In this case, we first note that since all of the points in N(j)I2N(j)\cap I_{2} have distance at least 2\sqrt{2} from each other and all of the points in N(j)I3N(j)\cap I_{3} have distance at least 2\sqrt{2} from each other, both iN(j)I2(1c(j,i))\sum_{i\in N(j)\cap I_{2}}(1-c(j,i)) and iN(j)I3(1c(j,i))\sum_{i\in N(j)\cap I_{3}}(1-c(j,i)) are at most 11. Let tt be such that 1t=iN(j)I2(1c(j,i)).1-t=\sum_{i\in N(j)\cap I_{2}}(1-c(j,i)). Then, the denominator is at least 1p(2t)1-p(2-t). In addition, with probability 1(1p)b1-(1-p)^{b}, at least one of the points in N(j)I2N(j)\cap I_{2} will be in SS, conditioned on which the expected value of c(j,S)c(j,S) is at most 1biN(j)I2c(j,i)=b1b+tb.\frac{1}{b}\cdot\sum_{i\in N(j)\cap I_{2}}c(j,i)=\frac{b-1}{b}+\frac{t}{b}. Next, note that the probability of no point in N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) being in SS is maximized when all points iN(j)I3i\in N(j)\cap I_{3} with q(i)N(j)q(i)\in N(j) map to a single point i2N(j)I2i_{2}\in N(j)\cap I_{2}, and all other points in N(j)I3N(j)\cap I_{3} map to a single point i2I2\N(j)i_{2}^{\prime}\in I_{2}\backslash N(j). In this case, the probability that no point in N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) is in SS is at most p:=(1p)b1(12(12p)+12(12p)c1)(12+12(12p)c2).p^{\prime}:=(1-p)^{b-1}\cdot\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c_{1}}\right)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right).

    Overall, we have that with probability 1(1p)b1-(1-p)^{b}, some point in I2N(j)I_{2}\cap N(j) is in SS, conditioned on which 𝔼[c(j,S)]b1b+tb\mathbb{E}[c(j,S)]\leq\frac{b-1}{b}+\frac{t}{b}, with probability at most p,p^{\prime}, no point in N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) is in SS, conditioned on which c(j,S)(1+δ1)2,c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}, and otherwise, some point in N(j)I3N(j)\cap I_{3} is in SS, which means c(j,S)1.c(j,S)\leq 1. So, we can bound this fraction overall by

\frac{(1-(1-p)^{b})\cdot\left(\frac{b-1}{b}+\frac{t}{b}\right)+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+((1-p)^{b}-p^{\prime})\cdot 1}{1-2p+p\cdot t}.

    Noting that (1(1p)b)1bp(1-(1-p)^{b})\cdot\frac{1}{b}\leq p, we have that the numerator increases at a slower rate than the denominator as tt increases. Therefore, this fraction is maximized when t=0t=0. So, we can bound the fraction by

    (1(1p)b)b1b+p(1+δ1)2+((1p)bp)12p\displaystyle\hskip 14.22636pt\frac{(1-(1-p)^{b})\cdot\frac{b-1}{b}+p^{\prime}\cdot(1+\sqrt{\delta_{1}})^{2}+((1-p)^{b}-p^{\prime})}{1-2p}
    =b1b+(1p)bb+(1p)b1(12(12p)+12(12p)c1)(12+12(12p)c2)((1+δ1)21)12p.\displaystyle=\frac{\frac{b-1}{b}+\frac{(1-p)^{b}}{b}+(1-p)^{b-1}\cdot\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c_{1}}\right)\cdot\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c_{2}}\right)\cdot\left((1+\sqrt{\delta_{1}})^{2}-1\right)}{1-2p}. (3.c)
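The algebraic simplification leading to (3.c) can be verified numerically; a quick randomized check (a sketch, separate from the proof):

```python
import math, random

# Check that the two displayed expressions for the bound (3.c) agree, with
# p' = (1-p)^(b-1) * ((1-2p)/2 + (1-2p)^(c_1)/2) * (1/2 + (1-2p)^(c_2)/2).
random.seed(5)
ok = True
for _ in range(2000):
    p = random.uniform(0.05, 0.45)
    b = random.randint(2, 6)
    c1 = random.randint(1, 4)
    c2 = random.randint(0, 4)
    d1 = random.uniform(1.0, 3.0)
    x = 1 - 2 * p
    pp = (1 - p) ** (b - 1) * (0.5 * x + 0.5 * x ** c1) * (0.5 + 0.5 * x ** c2)
    first = ((1 - (1 - p) ** b) * (b - 1) / b
             + pp * (1 + math.sqrt(d1)) ** 2
             + ((1 - p) ** b - pp)) / x
    second = ((b - 1) / b + (1 - p) ** b / b
              + pp * ((1 + math.sqrt(d1)) ** 2 - 1)) / x
    ok = ok and abs(first - second) < 1e-9
```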

Case 4: 𝒂=𝟎,𝒃=𝟎\boldsymbol{a=0,b=0}.

We split this case into three subcases.

  a)

\boldsymbol{c=0}. In this case, N(j)\cap S is always empty, so the denominator is 1. To bound the numerator, we consider the witness i^{*} of j. If i^{*}\in I_{1}, the numerator is at most c(j,i^{*})\leq 1, so we can bound the fraction by 1. Else, if i^{*}\not\in V_{2}, then there exists i_{1}\in I_{1} such that d(i^{*},i_{1})\leq\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})}\leq\sqrt{\delta_{2}}, and since d(j,i^{*})\leq 1, we have that d(j,I_{1})\leq 1+\sqrt{\delta_{2}}. Thus, the fraction in the case where i^{*}\in I_{1} or i^{*}\not\in V_{2} is at most

    (1+δ2)2.(1+\sqrt{\delta_{2}})^{2}. (4.a.i)

    Otherwise, there is some i1I1i_{1}\in I_{1} of distance at most δ1min(ti,ti1)δ1\sqrt{\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})}\leq\sqrt{\delta_{1}} away from ii^{*}. Next, if iI2,i^{*}\in I_{2}, the numerator is at most p1+(1p)(1+δ1)2p\cdot 1+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}. Otherwise, there is some i2I2i_{2}\in I_{2} of distance at most δ2min(ti,ti2)\sqrt{\delta_{2}\cdot\min(t_{i^{*}},t_{i_{2}})} away from ii^{*}. Finally, d(i1,i2)δ2min(ti1,ti2)d(i_{1},i_{2})\geq\sqrt{\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}})}. Therefore, as in 1.b, we can apply Proposition 4.4 to obtain that the numerator, and therefore, the fraction is at most

    1+pδ2+(1p)δ1+2p2δ2+(1p)δ1,1+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}}, (4.a.ii)

    since the above (4.a.ii) is greater than both 11 and p+(1p)(1+δ1)2p+(1-p)\cdot(1+\sqrt{\delta_{1}})^{2} for any 0<p<1.0<p<1.

  b)

\boldsymbol{c=1}. In this case, let i_{3} be the unique element in N(j)\cap I_{3}. Conditioned on i_{3}\in S, the denominator equals c(j,i_{3}) and the numerator is at most c(j,i_{3}). Otherwise, the denominator is 1, and the numerator can again be bounded in an identical way to the previous case, since the probability of i_{2}\in S is either p (if q(i_{3})\neq i_{2}) or \frac{p}{1-p}>p (if q(i_{3})=i_{2}). Therefore, the fraction is again at most

    (1+δ2)2(1+\sqrt{\delta_{2}})^{2} (4.b.i)

    if the witness ii^{*} of jj satisfies iI1i^{*}\in I_{1} or iV2i^{*}\not\in V_{2}, and is at most

    1+pδ2+(1p)δ1+2p2δ2+(1p)δ11+p\cdot\delta_{2}+(1-p)\cdot\delta_{1}+2\sqrt{p^{2}\cdot\delta_{2}+(1-p)\cdot\delta_{1}} (4.b.ii)

    otherwise.

  c)

    𝒄𝟐\boldsymbol{c\geq 2}. In this case, note that all points in N(j)I3N(j)\cap I_{3} are separated by at least 2\sqrt{2}, which means that iN(j)I3(1c(j,i))1\sum_{i\in N(j)\cap I_{3}}(1-c(j,i))\leq 1. Letting tt be such that 1t=iN(j)I3(1c(j,i)),1-t=\sum_{i\in N(j)\cap I_{3}}(1-c(j,i)), we have that iN(j)I3c(j,i)=c1+t\sum_{i\in N(j)\cap I_{3}}c(j,i)=c-1+t, so there exists i3N(j)I3i_{3}\in N(j)\cap I_{3} such that c(j,i3)c1+tcc(j,i_{3})\leq\frac{c-1+t}{c}. In addition, we know that c(j,I1)(1+δ1)2c(j,I_{1})\leq(1+\sqrt{\delta_{1}})^{2}. Finally, we also note the denominator equals 1p(1t)=1p+pt1-p(1-t)=1-p+pt.

    Next, note that

    i,iN(j)I3c(i,i)\displaystyle\sum_{i,i^{\prime}\in N(j)\cap I_{3}}c(i,i^{\prime}) =i,iN(j)I3[ij2+ij22ij,ij]\displaystyle=\sum_{i,i^{\prime}\in N(j)\cap I_{3}}\left[\|i-j\|^{2}+\|i^{\prime}-j\|^{2}-2\langle i-j,i^{\prime}-j\rangle\right]
    =2c[iN(j)I3c(j,i)]2iN(j)I3(ij),iN(j)I3(ij)\displaystyle=2c\cdot\left[\sum_{i\in N(j)\cap I_{3}}c(j,i)\right]-2\left\langle\sum_{i\in N(j)\cap I_{3}}(i-j),\sum_{i^{\prime}\in N(j)\cap I_{3}}(i^{\prime}-j)\right\rangle
    2c[iN(j)I3c(j,i)]\displaystyle\leq 2c\cdot\left[\sum_{i\in N(j)\cap I_{3}}c(j,i)\right]
    =2c(c1+t).\displaystyle=2c\cdot(c-1+t).

    Since c(i,i)=0c(i,i)=0, this means there exists iii\neq i^{\prime} such that c(i,i)2(c1+t)c1,c(i,i^{\prime})\leq\frac{2(c-1+t)}{c-1}, and since c(i,i)2min(ti,ti)c(i,i^{\prime})\geq 2\cdot\min(t_{i},t_{i^{\prime}}) for any i,iI3i,i^{\prime}\in I_{3}, this means that miniN(j)I3tic1+tc1.\min_{i\in N(j)\cap I_{3}}t_{i}\leq\frac{c-1+t}{c-1}.

Let i=\arg\min_{i\in N(j)\cap I_{3}}t_{i}, and let i_{2}=q(i)\in I_{2}. Let \mathcal{E}_{1} be the event that no point in N(j)\cap I_{3} nor i_{2} is in S, let \mathcal{E}_{2} be the event that no point in N(j)\cap I_{3} is in S, and let \mathcal{E}_{3} be the event that i_{3} is not in S. Note that \mathcal{E}_{1} implies \mathcal{E}_{2}, which implies \mathcal{E}_{3}. Now, by Proposition 4.6, \mathbb{P}(\mathcal{E}_{1}) equals some p_{1}\leq\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}. Likewise, by Proposition 4.5, \mathbb{P}(\mathcal{E}_{2}) equals some p_{2}\leq\frac{1}{2}+\frac{1}{2}(1-2p)^{c}. In addition, \mathbb{P}(\mathcal{E}_{3})=p_{3}=1-p. Under the event \mathcal{E}_{3}^{c}, we have that c(j,S)\leq c(j,i_{3})\leq\frac{c-1+t}{c}. Next, under the event \mathcal{E}_{3}\backslash\mathcal{E}_{2}, some point in N(j)\cap I_{3} is in S, so c(j,S)\leq 1. Under the event \mathcal{E}_{2}\backslash\mathcal{E}_{1}, we know that i_{2}\in S, so c(j,S)\leq d(j,i_{2})^{2}\leq(d(j,i)+\sqrt{\delta_{2}\cdot t_{i}})^{2}\leq\left(1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)^{2}. Finally, we always have that c(j,S)\leq(1+\sqrt{\delta_{1}})^{2}.

    Therefore, we can bound the overall fraction by

    p1(1+δ1)2+(p2p1)min(1+δ1,1+2c1+tc1)2+(p3p2)1+(1p3)c1+tc1p+pt.\frac{p_{1}\cdot(1+\sqrt{\delta_{1}})^{2}+(p_{2}-p_{1})\cdot\min\left(1+\sqrt{\delta_{1}},1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)^{2}+(p_{3}-p_{2})\cdot 1+(1-p_{3})\cdot\frac{c-1+t}{c}}{1-p+pt}.

Since (1+\sqrt{\delta_{1}})^{2}\geq\min\left(1+\sqrt{\delta_{1}},1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right)^{2}\geq 1\geq\frac{c-1+t}{c}, the above fraction is an increasing function in the variables p_{1},p_{2},p_{3}. So, we can upper bound this fraction by replacing p_{1},p_{2},p_{3} with their respective upper bounds \frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}, \frac{1}{2}+\frac{1}{2}(1-2p)^{c}, and 1-p, as well as replacing \min\left(1+\sqrt{\delta_{1}},1+\sqrt{2\cdot\frac{c-1+t}{c-1}}\right) with simply 1+\sqrt{2\cdot\frac{c-1+t}{c-1}}. Next, since p_{2}-p_{1}=p and 1-p_{3}=p, the derivative of the numerator with respect to t is p\cdot\left(\frac{2}{c-1}+\sqrt{\frac{2}{(c-1)(c-1+t)}}\right)+p\cdot\frac{1}{c}\leq p\cdot(2.5+\sqrt{2}), and the derivative of the denominator with respect to t is p. Hence, this fraction decreases as t increases, unless the fraction is less than 2.5+\sqrt{2}. Therefore, we can bound the fraction by

    max(2.5+2,p1(1+δ1)2+(p2p1)(1+2)2+(p3p2)1+(1p3)c1c1p)\displaystyle\hskip 14.22636pt\max\left(2.5+\sqrt{2},\frac{p_{1}\cdot(1+\sqrt{\delta_{1}})^{2}+(p_{2}-p_{1})\cdot(1+\sqrt{2})^{2}+(p_{3}-p_{2})\cdot 1+(1-p_{3})\cdot\frac{c-1}{c}}{1-p}\right)
    =max(2.5+2,p1(1+δ1)2+p(1+2)2+(1pp2)1+pc1c1p)\displaystyle=\max\left(2.5+\sqrt{2},\frac{p_{1}\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1+\sqrt{2})^{2}+(1-p-p_{2})\cdot 1+p\cdot\frac{c-1}{c}}{1-p}\right)
    =max(2.5+2,(12(12p)+12(12p)c)(1+δ1)2(12+12(12p)c)+p(1+2)2+1pc1p),\displaystyle=\max\left(2.5+\sqrt{2},\frac{\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\right)\cdot(1+\sqrt{\delta_{1}})^{2}-\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c}\right)+p(1+\sqrt{2})^{2}+1-\frac{p}{c}}{1-p}\right), (4.c)

    where we used the facts that p3=1pp_{3}=1-p and p2p1=pp_{2}-p_{1}=p.
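The inner-product expansion used above in this subcase, $\sum_{i,i^{\prime}}c(i,i^{\prime})=2c\sum_{i}c(j,i)-2\|\sum_{i}(i-j)\|^{2}$, is an exact identity; the following randomized check (a sketch, separate from the proof) confirms it:

```python
import random

# Check: sum_{i,i'} ||x_i - x_{i'}||^2
#        = 2m * sum_i ||x_i - j||^2 - 2 * || sum_i (x_i - j) ||^2,
# which implies the inequality since the subtracted term is nonnegative.
def sqdist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))

random.seed(6)
ok = True
for _ in range(500):
    m, d = random.randint(2, 8), random.randint(1, 5)
    pts = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    j = [random.gauss(0, 1) for _ in range(d)]
    pair_sum = sum(sqdist(u, v) for u in pts for v in pts)
    point_sum = sum(sqdist(u, j) for u in pts)
    shift = [sum(u[k] - j[k] for u in pts) for k in range(d)]
    norm_sq = sum(x ** 2 for x in shift)
    ok = ok and abs(pair_sum - (2 * m * point_sum - 2 * norm_sq)) < 1e-6
```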

Case 5: 𝒂𝟏\boldsymbol{a\geq 1}.

In this case, note that 𝔼[c(j,S)]𝔼[c(j,I1)]1\mathbb{E}[c(j,S)]\leq\mathbb{E}[c(j,I_{1})]\leq 1 deterministically, since a=|I1N(j)|1a=|I_{1}\cap N(j)|\geq 1. However, we can improve upon this, since we may have some points in I1N(j)I_{1}\cap N(j) much closer to jj, or we may have some points in (I2I3)N(j)(I_{2}\cup I_{3})\cap N(j) which are closer and appear with some probability.

Recall that in our algorithm, we flip a fair coin for each i\in I_{2} to decide whether to include i in S with probability 2p or to instead include each point of q^{-1}(i) in S independently with probability 2p. Let us condition on all of these fair coin flips, and say that a point i\in I_{2}\cup I_{3} survives the coin flips if, after the flips, it is still included in S with probability 2p. For simplicity, we write \accentset{\rule{2.79996pt}{0.7pt}}{p}:=2p. We also let \accentset{\rule{2.79996pt}{0.7pt}}{I} denote the set of points in I_{2}\cup I_{3} that survive the fair coin flips.

Let the squared distances from jj to each of the points in I1N(j)I_{1}\cap N(j) be r1,,rar_{1},\dots,r_{a}, and the squared distances from jj to each of the points in I N(j)\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j) be s1,,shs_{1},\dots,s_{h}, where h=|I N(j)|h=|\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j)|. It is trivial to see that 𝔼[c(j,S)]min1iarir1++raa,\mathbb{E}[c(j,S)]\leq\min_{1\leq i\leq a}r_{i}\leq\frac{r_{1}+\cdots+r_{a}}{a}, and conditioned on at least one of the points in I N(j)\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j) being selected, we have that c(j,S)c(j,S) in expectation is at most s1++shh.\frac{s_{1}+\cdots+s_{h}}{h}. The probability of at least one of the points in I N(j)\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j) being selected in SS is

1(1p )h11(1+p )h111+p h=p h1+p hp ha+p h,1-(1-\accentset{\rule{2.79996pt}{0.7pt}}{p})^{h}\geq 1-\frac{1}{(1+\accentset{\rule{2.79996pt}{0.7pt}}{p})^{h}}\geq 1-\frac{1}{1+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}=\frac{\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}{1+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}\geq\frac{\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h},

since conditioned on the initial coin flips, each surviving point in (I2 I3 )(\accentset{\rule{2.79996pt}{0.7pt}}{I_{2}}\cup\accentset{\rule{2.79996pt}{0.7pt}}{I_{3}}) is included in SS independently with probability p \accentset{\rule{2.79996pt}{0.7pt}}{p}. Therefore, we can say that

𝔼[c(j,S)]r1++raaaa+p h+s1++shhp ha+p h=(r1++ra)+p (s1++sh)a+p h.\mathbb{E}[c(j,S)]\leq\frac{r_{1}+\cdots+r_{a}}{a}\cdot\frac{a}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}+\frac{s_{1}+\cdots+s_{h}}{h}\cdot\frac{\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}=\frac{(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot(s_{1}+\cdots+s_{h})}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}.
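Both the probability chain above and the final combination step can be sanity-checked numerically (a sketch, separate from the proof):

```python
import random

# Check the chain 1-(1-pb)^h >= 1-1/(1+pb)^h >= pb*h/(1+pb*h) >= pb*h/(a+pb*h)
# for pb in (0,1), h >= 0, a >= 1, and the combination
#   (r_avg)*(a/(a+pb*h)) + (s_avg)*(pb*h/(a+pb*h))
#     = (sum r + pb * sum s) / (a + pb*h).
random.seed(7)
ok = True
for _ in range(5000):
    pb = random.uniform(0.01, 0.99)
    h = random.randint(0, 20)
    a = random.randint(1, 10)
    v1 = 1 - (1 - pb) ** h
    v2 = 1 - 1 / (1 + pb) ** h
    v3 = pb * h / (1 + pb * h)
    v4 = pb * h / (a + pb * h)
    ok = ok and v1 >= v2 - 1e-12 and v2 >= v3 - 1e-12 and v3 >= v4 - 1e-12
    if h > 0:
        r = [random.uniform(0, 1) for _ in range(a)]
        s = [random.uniform(0, 1) for _ in range(h)]
        combo = (sum(r) / a) * (a / (a + pb * h)) \
              + (sum(s) / h) * (pb * h / (a + pb * h))
        target = (sum(r) + pb * sum(s)) / (a + pb * h)
        ok = ok and abs(combo - target) < 1e-9
```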

Next, we have that

𝔼[αjiN(j)S(αjc(j,i))]\displaystyle\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right] =αj(aαj(r1++ra))p (hαj(s1++sh))\displaystyle=\alpha_{j}-(a\cdot\alpha_{j}-(r_{1}+\cdots+r_{a}))-\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot(h\cdot\alpha_{j}-(s_{1}+\cdots+s_{h}))
=αj(1(a+p h))+[(r1++ra)+p (s1++sh)].\displaystyle=\alpha_{j}\cdot(1-(a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h))+[(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot(s_{1}+\cdots+s_{h})].

Now, we provide a lower bound for (r1++ra)+p (s1++sh).(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}(s_{1}+\cdots+s_{h}). To do so, we use the fact that all the points in N(j)I1N(j)\cap I_{1} are separated by at least δ1αj\delta_{1}\cdot\alpha_{j} in squared distance, and all the surviving points in N(j)(I1I2 I3 )N(j)\cap(I_{1}\cup\accentset{\rule{2.79996pt}{0.7pt}}{I_{2}}\cup\accentset{\rule{2.79996pt}{0.7pt}}{I_{3}}) are separated by at least δ2αj\delta_{2}\cdot\alpha_{j} in squared distance, to get

(r1++ra)+p (s1++sh)1a+p hαj(δ1a(a1)2+δ2p ah+δ2p 2h(h1)2).(r_{1}+\cdots+r_{a})+\accentset{\rule{2.79996pt}{0.7pt}}{p}(s_{1}+\cdots+s_{h})\geq\frac{1}{a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h}\cdot\alpha_{j}\cdot\left(\delta_{1}\cdot\frac{a(a-1)}{2}+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot a\cdot h+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}^{2}\cdot\frac{h(h-1)}{2}\right).

So, if a1a\geq 1 and (a,h)(1,0)(a,h)\neq(1,0), then if we let T1=a+p h,T_{1}=a+\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot h, T3=δ1a(a1)2+δ2p ah+δ2p 2h(h1)2,T_{3}=\delta_{1}\cdot\frac{a(a-1)}{2}+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}\cdot a\cdot h+\delta_{2}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{p}^{2}\cdot\frac{h(h-1)}{2}, and T2=T3/T1T_{2}=T_{3}/T_{1}, then the ratio is at most

T2/T11T1+T2=T3T1(T1T12+T3)=1T1+T11T1T12+T3.\frac{T_{2}/T_{1}}{1-T_{1}+T_{2}}=\frac{T_{3}}{T_{1}(T_{1}-T_{1}^{2}+T_{3})}=\frac{1}{T_{1}}+\frac{T_{1}-1}{T_{1}-T_{1}^{2}+T_{3}}. (5.a)
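The algebraic identity behind (5.a) can be confirmed numerically (a sketch, separate from the proof):

```python
import random

# Check: with T2 = T3/T1,
#   (T2/T1)/(1 - T1 + T2) = T3/(T1*(T1 - T1^2 + T3))
#                         = 1/T1 + (T1 - 1)/(T1 - T1^2 + T3),
# away from the degenerate case T1 - T1^2 + T3 = 0 noted in the text.
random.seed(8)
ok = True
for _ in range(5000):
    T1 = random.uniform(0.1, 10.0)
    T3 = random.uniform(0.0, 10.0)
    denom = T1 - T1 ** 2 + T3
    if abs(denom) < 1e-3:
        continue  # the common denominator vanishes here
    T2 = T3 / T1
    lhs = (T2 / T1) / (1 - T1 + T2)
    mid = T3 / (T1 * denom)
    rhs = 1 / T1 + (T1 - 1) / denom
    tol = 1e-6 * (1 + abs(lhs))
    ok = ok and abs(lhs - mid) < tol and abs(lhs - rhs) < tol
```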

In the case where a=1,h=0,a=1,h=0, this fraction is undefined. However, we note that in this case, N(j)SN(j)\cap S deterministically contains a unique center ii^{*} and nothing else, so 𝔼[c(j,S)]c(j,i)\mathbb{E}[c(j,S)]\leq c(j,i^{*}) and 𝔼[αjiN(j)S(αjc(j,i))]=αj(αjc(j,i))=c(j,i).\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]=\alpha_{j}-(\alpha_{j}-c(j,i^{*}))=c(j,i^{*}). Therefore, the fraction is 11. ∎

Therefore, we have shown that the LMP approximation ratio is at most \rho(p), where \rho(p) is determined by the numerous cases in the proof of Lemma 4.2. The final step is to bound \rho(p) across these cases. Indeed, by casework one can show the following proposition.

Proposition 4.7.

For p[0.096,0.402]p\in[0.096,0.402] and δ1=4+827,δ2=2\delta_{1}=\frac{4+8\sqrt{2}}{7},\delta_{2}=2, and δ3=0.265\delta_{3}=0.265, we have that

ρ(p)max(3+22,1+2p+(1p)δ1+22p2+(1p)δ1,(1p)(1+δ1)2+p(1δ3)21p+p(1δ3)2,(12p)(1+δ1)2(δ1+(δ1+δ3)2)+p(1+δ1)2δ3(12p)(δ1+(δ1+δ3)2)+p(1+δ1)2δ3).\rho(p)\leq\max\bigg{(}3+2\sqrt{2},1+2p+(1-p)\cdot\delta_{1}+2\sqrt{2p^{2}+(1-p)\cdot\delta_{1}},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p(1-\sqrt{\delta_{3}})^{2}},\\ \frac{(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}{(1-2p)\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot(1+\sqrt{\delta_{1}})^{2}\cdot\delta_{3}}\bigg{)}.

As a consequence, for p1:=0.402p_{1}:=0.402, ρ(p1)3+22\rho(p_{1})\leq 3+2\sqrt{2}.

We defer the casework to Lemma 5.19, from which the above proposition follows immediately. Therefore, we obtain a (3+2\sqrt{2})\approx 5.828 LMP approximation for the Euclidean k-means problem.
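The numeric content of Proposition 4.7 at $p_{1}=0.402$ can be evaluated directly; the sketch below (an illustration, not a proof) computes the four terms and confirms that the first, $3+2\sqrt{2}$, dominates:

```python
import math

# Evaluate the four terms of Proposition 4.7 at p = p_1 = 0.402,
# delta_1 = (4 + 8*sqrt(2))/7, delta_3 = 0.265.
p = 0.402
d1 = (4 + 8 * math.sqrt(2)) / 7
d3 = 0.265
s1, s3 = math.sqrt(d1), math.sqrt(d3)

e1 = 3 + 2 * math.sqrt(2)
e2 = 1 + 2 * p + (1 - p) * d1 + 2 * math.sqrt(2 * p * p + (1 - p) * d1)
e3 = ((1 - p) * (1 + s1) ** 2 + p * (1 - s3) ** 2) \
   / (1 - p + p * (1 - s3) ** 2)
A = d1 + (s1 + s3) ** 2
e4 = ((1 - 2 * p) * (1 + s1) ** 2 * A + p * (1 + s1) ** 2 * d3) \
   / ((1 - 2 * p) * A + p * (1 + s1) ** 2 * d3)

rho_bound = max(e1, e2, e3, e4)
```

At these parameters the terms come out to roughly 5.828, 5.667, 5.443, and 4.337, so the bound is $\rho(p_{1})\leq 3+2\sqrt{2}$, consistent with the proposition.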

5 Polynomial-time Approximation Algorithm for Euclidean kk-means

In this section, we describe how to convert the LMP approximation for Euclidean k-means into a polynomial-time approximation algorithm. We lose a small factor in the approximation guarantee, but still obtain a significant improvement over the previous state-of-the-art approximation factor. While we focus on the k-means problem, the improvement also applies to the k-median problem with some small modifications. In Section 6, we provide an LMP approximation for k-median and explain how the techniques of this section yield an improved polynomial-time algorithm for k-median as well.

In Subsection 5.1, we describe the polynomial-time algorithm to generate two nested quasi-independent sets II and II^{\prime}, which will be crucial in developing our final set of centers of size kk with low clustering cost. This procedure is based on a similar algorithm of Ahmadian et al. [1], but with some important changes in how we update our graphs and independent sets. In Subsection 5.2, we describe and state a few additional preliminary results. In Subsection 5.3, we analyze the algorithm and show how we can use II and II^{\prime} to generate our final set of centers, to obtain a 6.0136.013-approximation algorithm. Finally, in Subsection 5.4, we show that our analysis in Subsection 5.3 can be further improved, to obtain a (5.912+ε)(5.912+\varepsilon)-approximation guarantee.

We remark that our approximation guarantee of (5.912+\varepsilon) holds only in expectation. However, it can be made to hold with all but exponentially small failure probability: with probability at least \varepsilon, the approximation ratio is (5.912+O(\varepsilon)). So, by running the algorithm \varepsilon^{-1}\cdot\text{poly}(n) times in parallel and outputting the best of the resulting solutions, we obtain a (5.912+O(\varepsilon))-approximation ratio with probability at least 1-(1-\varepsilon)^{\varepsilon^{-1}\cdot\text{poly}(n)}\geq 1-e^{-\text{poly}(n)}.

5.1 The algorithm and setup

First, we make some assumptions on the clients and facilities. We assume that the number of facilities, m=|\mathcal{F}|, is at most polynomial in the number of clients n=|\mathcal{D}|. In addition, we assume that the distances between clients and facilities all lie in the range [1,n^{6}]. Both of these assumptions can be enforced via standard discretization techniques, and we lose only a 1+o(1) factor in the approximation by making them [1]. Note that this means the optimal clustering cost, which we call \text{OPT}_{k}, is at least n for both k-means and k-median. Finally, we assume that k\leq n-1 (otherwise the problem is trivial in polynomial time).

Next, we describe the setup relating to dual solutions. Consider the tuple (α,z,S,𝒟S)(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S}), where α𝒟\alpha\in\mathbb{R}^{\mathcal{D}}, zz\in\mathbb{R}^{\mathcal{F}}, S\mathcal{F}_{S}\subset\mathcal{F}, and 𝒟S:S{0,1}𝒟\mathcal{D}_{S}:\mathcal{F}_{S}\to\{0,1\}^{\mathcal{D}}. Here, α\alpha represents the set {αj}j𝒟\{\alpha_{j}\}_{j\in\mathcal{D}} which will be a solution to the dual linear program, zz represents {zi}i\{z_{i}\}_{i\in\mathcal{F}}, where each zi{λ,λ+1n}z_{i}\in\{\lambda,\lambda+\frac{1}{n}\} will be a modified value representing the threshold for tightness of facility ii, S\mathcal{F}_{S} represents a subset of facilities that we deem “special”, and 𝒟S\mathcal{D}_{S} is a function that maps each special facility to a subset of the clients that we deem special clients for that facility.

When talking about a single solution (α,z,S,𝒟S)(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S}), we define βij=max(0,αjc(j,i))\beta_{ij}=\max(0,\alpha_{j}-c(j,i)) for any i,j𝒟i\in\mathcal{F},j\in\mathcal{D}, and define N(i)={j𝒟:βij>0}N(i)=\{j\in\mathcal{D}:\beta_{ij}>0\}. We say that a facility ii is tight if j𝒟βij=zi\sum_{j\in\mathcal{D}}\beta_{ij}=z_{i}. Now, we define τi\tau_{i} for each ii that is either tight or special (i.e., in S\mathcal{F}_{S}). For each tight facility, we define τi=maxjN(i)αj\tau_{i}=\max_{j\in N(i)}\alpha_{j}, and for each special facility, we define τi=maxjN(i)𝒟S(i)αj\tau_{i}=\max_{j\in N(i)\cap\mathcal{D}_{S}(i)}\alpha_{j}. We default the maximum of an empty set to be 0. We also consider a modified conflict graph H:=H(δ)H:=H(\delta) on the set of tight or special facilities, with an edge between ii and ii^{\prime} if c(i,i)δmin(τi,τi)c(i,i^{\prime})\leq\delta\cdot\min(\tau_{i},\tau_{i^{\prime}}).
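As an illustration of the modified conflict graph (a minimal sketch, not the paper's implementation; the facility coordinates and \tau values below are hypothetical, and c is the squared Euclidean distance):

```python
import itertools

def conflict_graph(facilities, tau, delta):
    """Build H(delta): an edge {i, i'} whenever
    c(i, i') <= delta * min(tau_i, tau_{i'})."""
    def c(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    edges = set()
    for i, ip in itertools.combinations(sorted(facilities), 2):
        if c(facilities[i], facilities[ip]) <= delta * min(tau[i], tau[ip]):
            edges.add((i, ip))
    return edges

# Toy example: three facilities on a line with unit tau values and delta = 2.
# Only f1 and f2 (squared distance 1 <= 2) are in conflict.
facs = {"f1": (0.0,), "f2": (1.0,), "f3": (3.0,)}
taus = {"f1": 1.0, "f2": 1.0, "f3": 1.0}
E = conflict_graph(facs, taus, delta=2.0)
```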

We can now define the notion of roundable solutions: our definition is slightly modified from [1, Definition 5.1].

Definition 5.1.

Let α𝒟,\alpha\in\mathbb{R}^{\mathcal{D}}, z,z\in\mathbb{R}^{\mathcal{F}}, S\mathcal{F}_{S}\subset\mathcal{F} be the set of special facilities, and 𝒟S:S{0,1}𝒟\mathcal{D}_{S}:\mathcal{F}_{S}\to\{0,1\}^{\mathcal{D}} be the function assigning each special facility iSi\in\mathcal{F}_{S} to a subset of special clients 𝒟S(i)\mathcal{D}_{S}(i). Then, the tuple (α,z,S,𝒟S)(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S}) is (λ,k)(\lambda,k^{\prime})-roundable if

  1. 1.

    α\alpha is a feasible solution of DUAL(λ+1n)\text{DUAL}(\lambda+\frac{1}{n}) and αj1\alpha_{j}\geq 1 for all jj.

  2. 2.

    For all i,i\in\mathcal{F}, λziλ+1n.\lambda\leq z_{i}\leq\lambda+\frac{1}{n}.

  3. 3.

    There exists a subset 𝒟B\mathcal{D}_{B} of “bad” clients so that for all j𝒟j\in\mathcal{D}, there is a facility w(j)w(j) that is either tight or in S\mathcal{F}_{S}, such that:

    1. (a)

      For all j𝒟\𝒟Bj\in\mathcal{D}\backslash\mathcal{D}_{B}, (1+ε)αjc(j,w(j))(1+\varepsilon)\cdot\alpha_{j}\geq c(j,w(j))

    2. (b)

      For all j𝒟\𝒟Bj\in\mathcal{D}\backslash\mathcal{D}_{B}, (1+ε)αjτw(j)(1+\varepsilon)\cdot\alpha_{j}\geq\tau_{w(j)}

    3. (c)

      γOPTkj𝒟B(c(j,w(j))+τw(j))\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\tau_{w(j)}\right)

  4. 4.

    iSj𝒟S(i)max(0,αjc(j,i))λ|S|γOPTk\sum_{i\in\mathcal{F}_{S}}\sum_{j\in\mathcal{D}_{S}(i)}\max(0,\alpha_{j}-c(j,i))\geq\lambda\cdot|\mathcal{F}_{S}|-\gamma\cdot\text{OPT}_{k^{\prime}}, and |S|n.|\mathcal{F}_{S}|\leq n.

Here, γε1\gamma\ll\varepsilon\ll 1 are arbitrarily small constants, which are implicit parameters in the definition. Finally, we say that (α,z,S,𝒟S)(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S}) is kk^{\prime}-roundable if it is (λ,k)(\lambda,k^{\prime})-roundable for some choice of λ0\lambda\geq 0.

Algorithm 2 Generate Sequence of Nested Quasi-Independent Sets
1:Initialize 𝒮(0)=(α(0),z(0),S(0),𝒟S(0))\mathcal{S}^{(0)}=(\alpha^{(0)},z^{(0)},\mathcal{F}_{S}^{(0)},\mathcal{D}_{S}^{(0)}), and set λ0\lambda\leftarrow 0, I1(0)I_{1}^{(0)}\leftarrow\mathcal{F}, I2(0)I_{2}^{(0)}\leftarrow\emptyset, I3(0)I_{3}^{(0)}\leftarrow\emptyset
2:Set I(0)=(I1(0),I2(0),I3(0))I^{(0)}=(I_{1}^{(0)},I_{2}^{(0)},I_{3}^{(0)}), εznpoly(γ1)\varepsilon_{z}\leftarrow n^{-\text{poly}(\gamma^{-1})}, L4n7εz1L\leftarrow 4n^{7}\cdot\varepsilon_{z}^{-1}, kmin(k,|I1(0)|),k^{\prime}\leftarrow\min(k,|I_{1}^{(0)}|), and p1=0.402p_{1}=0.402.
3:for λ=0,εz,,Lεz\lambda=0,\varepsilon_{z},\dots,L\cdot\varepsilon_{z} do
4:     for ii\in\mathcal{F} do
5:         Call RaisePrice(α(0),z(0),I1(0),i\alpha^{(0)},z^{(0)},I_{1}^{(0)},i) to generate a polynomial-size sequence 𝒮(1),,𝒮(q)\mathcal{S}^{(1)},\dots,\mathcal{S}^{(q)} of close, kk^{\prime}-roundable solutions
6:         for =0\ell=0 to q1q-1 do
7:              Call GraphUpdate(𝒮(),𝒮(+1),I()\mathcal{S}^{(\ell)},\mathcal{S}^{(\ell+1)},I^{(\ell)}) to produce a sequence {I(,r)}r=0p\{I^{(\ell,r)}\}_{r=0}^{p_{\ell}}
8:              for r=1r=1 to pp_{\ell} do
9:                  if |I1(,r)|+p1|I2(,r)I3(,r)|<k|I_{1}^{(\ell,r)}|+p_{1}|I_{2}^{(\ell,r)}\cup I_{3}^{(\ell,r)}|<k then
10:                       Let I=(I1(,r1),I2(,r1),I3(,r1))I=(I_{1}^{(\ell,r-1)},I_{2}^{(\ell,r-1)},I_{3}^{(\ell,r-1)}), I=(I1(,r),I2(,r),I3(,r))I^{\prime}=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)}), and return (I,I)(I,I^{\prime})
11:                  else
12:                       I(+1)I(,p)I^{(\ell+1)}\leftarrow I^{(\ell,p_{\ell})}                                          
13:         𝒮(0)𝒮(q)\mathcal{S}^{(0)}\leftarrow\mathcal{S}^{(q)}, I(0)I(q)I^{(0)}\leftarrow I^{(q)}
14:         kmin(k,|I1(0)|)k^{\prime}\leftarrow\min(k^{\prime},|I_{1}^{(0)}|)      

Our main algorithm is described in Algorithm 2. This algorithm outputs two nested quasi-independent sets II and II^{\prime}. The final set SS will be obtained either from one of these two sets, or from some hybridization of them. We defer the actual construction of SS to Theorem 5.17.

The RaisePrice procedure comes from [1]; its internals will not matter here, apart from the guarantees it yields for the overall algorithm (see Theorem 5.7). We first make some important definitions and set up the overall algorithm, and then describe the GraphUpdate procedure, which is slightly modified from [1].

First, we describe the initialization phase of Algorithm 2 to generate 𝒮(0)\mathcal{S}^{(0)}, which is almost identical to that of Ahmadian et al. [1, P. 22-23 of journal version]. The main difference is that we parameterize the procedure by 1+κ,1/κ1+\kappa,1/\kappa (instead of 2 and 6 in Ahmadian et al.). Start by setting λ=0\lambda=0, zi(0)=0z_{i}^{(0)}=0 for all ii\in\mathcal{F}, and S(0)=\mathcal{F}_{S}^{(0)}=\emptyset (so 𝒟S(0)\mathcal{D}_{S}^{(0)} has empty domain). We then set αj(0)=0\alpha_{j}^{(0)}=0 for all j𝒟j\in\mathcal{D}.

Now, we increase all of the αj(0)\alpha_{j}^{(0)} values simultaneously at a uniform rate, and for each jj, we stop increasing αj(0)\alpha_{j}^{(0)} as soon as one of the following two events occurs:

  1. 1.

    αj(0)=c(j,i)\alpha_{j}^{(0)}=c(j,i) for some ii.

  2. 2.

    (1+κ)αjd(j,j)+1καj(1+\kappa)\cdot\sqrt{\alpha_{j}}\geq d(j,j^{\prime})+\frac{1}{\kappa}\cdot\sqrt{\alpha_{j^{\prime}}} for some jjj^{\prime}\neq j (or (1+κ)αjd(j,j)+1καj(1+\kappa)\cdot\alpha_{j}\geq d(j,j^{\prime})+\frac{1}{\kappa}\cdot\alpha_{j^{\prime}} in the kk-median case). Here, κ\kappa will be a fixed small constant (see Appendix D for details on how to set κ\kappa).

In the initial solution, αj(0)minic(j,i)\alpha_{j}^{(0)}\leq\min_{i\in\mathcal{F}}c(j,i) for all j𝒟j\in\mathcal{D}, which means that N(i)N(i) is empty for all ii\in\mathcal{F}, so τi=0\tau_{i}=0. In addition, since every zi=0z_{i}=0, every facility is tight. This means that the conflict graph H(δ)H(\delta) on the set of tight facilities for the initial solution we construct is just an empty graph on the full set of facilities, since c(i,i)>0=δmin(τi,τi).c(i,i^{\prime})>0=\delta\cdot\min(\tau_{i},\tau_{i^{\prime}}). Thus, if we apply Algorithm 1 to V1=V_{1}=\mathcal{F}, we obtain that I1(0)=I_{1}^{(0)}=\mathcal{F} and I2(0)=I3(0)=I_{2}^{(0)}=I_{3}^{(0)}=\emptyset.
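The initialization phase above can be sketched as a simple event-driven simulation (shown for the k-median variant of the second stopping condition). This is a naive illustration under our own naming (`d` for client-facility distances, `dd` for client-client distances); simultaneous events and the exact tie-breaking of the real procedure are not modeled.

```python
def init_alphas(d, dd, kappa):
    """Grow all alpha_j at a uniform rate; freeze alpha_j at the earliest of
    event (1): alpha_j reaches c(j, i) for some facility i, i.e. min(d[j]), or
    event (2): (1 + kappa) * alpha_j >= dd[j][jp] + alpha_jp / kappa
    for some already-frozen client jp (k-median variant)."""
    n = len(d)
    alpha = [None] * n          # None = still growing
    active = set(range(n))
    while active:
        best_j, best_t = None, float("inf")
        for j in active:
            tj = min(d[j])      # event (1)
            for jp in range(n):
                if jp != j and alpha[jp] is not None:   # event (2)
                    tj = min(tj, (dd[j][jp] + alpha[jp] / kappa) / (1 + kappa))
            if tj < best_t:
                best_j, best_t = j, tj
        alpha[best_j] = best_t  # freeze the client that stops first
        active.remove(best_j)
    return alpha
```

For instance, with one facility, a far-away second client freezes at its facility distance, while a nearby one may be stopped earlier by event (2).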

We now set up some definitions that will be important for the remainder of the algorithm and analysis. Define two dual solutions α={αj}\alpha=\{\alpha_{j}\} and α={αj}\alpha^{\prime}=\{\alpha_{j}^{\prime}\} to be close if maxj𝒟|αjαj|1n2\max_{j\in\mathcal{D}}|\alpha_{j}-\alpha_{j}^{\prime}|\leq\frac{1}{n^{2}}. Consider two solutions 𝒮()=(α(),z(),S(),𝒟S())\mathcal{S}^{(\ell)}=(\alpha^{(\ell)},z^{(\ell)},\mathcal{F}_{S}^{(\ell)},\mathcal{D}_{S}^{(\ell)}) and 𝒮(+1)=(α(+1),z(+1),S(+1),𝒟S(+1))\mathcal{S}^{(\ell+1)}=(\alpha^{(\ell+1)},z^{(\ell+1)},\mathcal{F}_{S}^{(\ell+1)},\mathcal{D}_{S}^{(\ell+1)}) that are each (λ,k)(\lambda,k^{\prime})-roundable for some choice of λ\lambda, such that α()\alpha^{(\ell)} and α(+1)\alpha^{(\ell+1)} are close. This means that |αj()αj(+1)|1n2|\alpha_{j}^{(\ell)}-\alpha_{j}^{(\ell+1)}|\leq\frac{1}{n^{2}} for all j𝒟,j\in\mathcal{D}, and that λzi(),zi(+1)λ+1n\lambda\leq z_{i}^{(\ell)},z_{i}^{(\ell+1)}\leq\lambda+\frac{1}{n} for all ii\in\mathcal{F}, even if ii is not tight. Let 𝒱()\mathcal{V}^{(\ell)} represent the set of tight or special facilities in 𝒮()\mathcal{S}^{(\ell)} and define 𝒱(+1)\mathcal{V}^{(\ell+1)} likewise.

Let \sqcup denote the disjoint union, i.e., STS\sqcup T is a set consisting of a copy of each element in SS and a distinct copy of each element in TT. For each point i𝒱()𝒱(+1)i\in\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}, we define 𝒟S(i)\mathcal{D}_{S}(i) (if ii were a special facility), τi\tau_{i}, and ziz_{i} based on whether ii came from 𝒱()\mathcal{V}^{(\ell)} or from 𝒱(+1)\mathcal{V}^{(\ell+1)}. This means that for i𝒱()i\in\mathcal{V}^{(\ell)}, 𝒟S(i)=𝒟S()(i)\mathcal{D}_{S}(i)=\mathcal{D}^{(\ell)}_{S}(i), zi=zi()z_{i}=z_{i}^{(\ell)}, and τi=τi()=maxj:αj()>c(j,i)αj()\tau_{i}=\tau_{i}^{(\ell)}=\max_{j:\alpha_{j}^{(\ell)}>c(j,i)}\alpha_{j}^{(\ell)} if ii is tight and τi=τi()=maxj:αj()>c(j,i),j𝒟S()(i)αj()\tau_{i}=\tau_{i}^{(\ell)}=\max_{j:\alpha_{j}^{(\ell)}>c(j,i),j\in\mathcal{D}_{S}^{(\ell)}(i)}\alpha_{j}^{(\ell)} if ii is special (and likewise for i𝒱(+1)i\in\mathcal{V}^{(\ell+1)}). In addition, for each client j𝒟j\in\mathcal{D}, we define αj=min(αj(),αj(+1))\alpha_{j}=\min(\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}); for each j𝒟j\in\mathcal{D} and i𝒱()𝒱(+1)i\in\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}, we define βij=max(0,αjc(j,i))\beta_{ij}=\max(0,\alpha_{j}-c(j,i)); and for each ii\in\mathcal{F}, we let N(i)={j𝒟:αjc(j,i)>0}N(i)=\{j\in\mathcal{D}:\alpha_{j}-c(j,i)>0\}. Since αj=min(αj(),αj(+1)),\alpha_{j}=\min(\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}), this means N(i)N()(i)N(i)\subset N^{(\ell)}(i) if i𝒱()i\in\mathcal{V}^{(\ell)} and N(i)N(+1)(i)N(i)\subset N^{(\ell+1)}(i) if i𝒱(+1).i\in\mathcal{V}^{(\ell+1)}.

We create a hybrid conflict graph on a subset of the disjoint union 𝒱()𝒱(+1)\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}. First, we let H(,0)(δ)H^{(\ell,0)}(\delta) represent the conflict graph on 𝒱(,0):=𝒱()\mathcal{V}^{(\ell,0)}:=\mathcal{V}^{(\ell)}. In the conflict graph on a set of facilities, there is an edge between two vertices (i,i)(i,i^{\prime}) if c(i,i)δmin(τi,τi)c(i,i^{\prime})\leq\delta\cdot\min(\tau_{i},\tau_{i^{\prime}}). Next, we choose some ordering of the vertices in 𝒱()\mathcal{V}^{(\ell)}, and for each 1rp:=|𝒱()|+11\leq r\leq p_{\ell}:=|\mathcal{V}^{(\ell)}|+1, we let 𝒱(,r)\mathcal{V}^{(\ell,r)} represent the so-called merged vertex set, defined as 𝒱()𝒱(+1)\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)} after removing the first r1r-1 vertices in 𝒱()\mathcal{V}^{(\ell)}, and let H(,r)(δ)H^{(\ell,r)}(\delta) represent the conflict graph on 𝒱(,r)\mathcal{V}^{(\ell,r)}, where again i,i𝒱(,r)i,i^{\prime}\in\mathcal{V}^{(\ell,r)} share an edge if c(i,i)δmin(τi,τi)c(i,i^{\prime})\leq\delta\cdot\min(\tau_{i},\tau_{i^{\prime}}). Note that 𝒱(,1)=𝒱()𝒱(+1)\mathcal{V}^{(\ell,1)}=\mathcal{V}^{(\ell)}\sqcup\mathcal{V}^{(\ell+1)}. For simplicity of notation, we may abbreviate 𝒱(,r)\mathcal{V}^{(\ell,r)} as 𝒱\mathcal{V} and H(,r)(δ)H^{(\ell,r)}(\delta) as H(δ)H(\delta) when the context is clear.

In the context of a hybrid conflict graph 𝒱(,r)\mathcal{V}^{(\ell,r)}, for any client jj, let N (j)\accentset{\rule{2.79996pt}{0.7pt}}{N}(j) be the subset of 𝒱(,r)\mathcal{V}^{(\ell,r)} consisting of all tight facilities ii such that jN(i)j\in N(i) and all special facilities ii such that jN(i)𝒟S(i)j\in N(i)\cap\mathcal{D}_{S}(i). We also define w(j)w(j) as the witness for jj in the solution 𝒮(+1)\mathcal{S}^{(\ell+1)}, and 𝒟B\mathcal{D}_{B} as the set of bad clients from the solution 𝒮(+1)\mathcal{S}^{(\ell+1)}.

Finally, we describe the actual GraphUpdate procedure. First, we note that I(0,0)=I(0)I^{(0,0)}=I^{(0)}, which has already been decided, either by the first initialized solution or by the previous solution I(q)I^{(q)} before we reset I(0)=I(q)I^{(0)}=I^{(q)}. Otherwise, I(+1,0)=I(,p)I^{(\ell+1,0)}=I^{(\ell,p_{\ell})}, since the set of vertices 𝒱(+1,0)=𝒱(+1)=𝒱(,p)\mathcal{V}^{(\ell+1,0)}=\mathcal{V}^{(\ell+1)}=\mathcal{V}^{(\ell,p_{\ell})}. Next, we note that |𝒱(,r)\𝒱(,r+1)|1,|\mathcal{V}^{(\ell,r)}\backslash\mathcal{V}^{(\ell,r+1)}|\leq 1, since for r=0r=0 we have 𝒱(,r)𝒱(,r+1)\mathcal{V}^{(\ell,r)}\subset\mathcal{V}^{(\ell,r+1)}, and otherwise 𝒱(,r+1)\mathcal{V}^{(\ell,r+1)} is created by simply removing one element from 𝒱(,r)\mathcal{V}^{(\ell,r)}. So, the maximal independent set I1(,r)I_{1}^{(\ell,r)} of H(,r)(δ1)H^{(\ell,r)}(\delta_{1}) can easily be converted into a maximal independent set I1(,r+1)I_{1}^{(\ell,r+1)} of H(,r+1)(δ1)H^{(\ell,r+1)}(\delta_{1}) by deleting at most 11 element and then possibly extending the independent set. We then extend I1(,r+1)I_{1}^{(\ell,r+1)} arbitrarily based on Steps 2 to 5 to create I2(,r+1)I_{2}^{(\ell,r+1)} and I3(,r+1)I_{3}^{(\ell,r+1)}, where I2(,r+1)I_{2}^{(\ell,r+1)} and I3(,r+1)I_{3}^{(\ell,r+1)} may have no relation to I2(,r)I_{2}^{(\ell,r)} and I3(,r)I_{3}^{(\ell,r)}. So, we inductively create I(,r+1)=(I1(,r+1),I2(,r+1),I3(,r+1))I^{(\ell,r+1)}=(I_{1}^{(\ell,r+1)},I_{2}^{(\ell,r+1)},I_{3}^{(\ell,r+1)}) from I(,r)=(I1(,r),I2(,r),I3(,r))I^{(\ell,r)}=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)}), where importantly |I1(,r)\I1(,r+1)|1|I_{1}^{(\ell,r)}\backslash I_{1}^{(\ell,r+1)}|\leq 1.
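The key maintenance step — delete at most one vertex from the maximal independent set, then greedily re-extend to maximality — can be sketched as follows (a minimal illustration with our own names; `adj` is the adjacency of the conflict graph after the vertex removal):

```python
def update_mis(adj, mis, removed):
    """adj: adjacency dict of the graph *after* `removed` is deleted.
    Drop `removed` from the independent set (so at most one old element
    is lost), then greedily add any vertex with no neighbor in the set."""
    new = set(mis) - {removed}
    for v in adj:
        # v joins if it is not already in and conflicts with nothing chosen
        if v not in new and all(u not in new for u in adj[v]):
            new.add(v)
    return new
```

Since `new` only grows during the greedy pass, the result is both independent and maximal in the updated graph.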

5.2 Additional preliminaries

For our approximation guarantees, we require two additional preliminaries. The first is a rough equivalence between solving kk-means (resp., kk-median) clustering with exactly kk centers and solving it when allowed O(1)O(1) additional centers. The second is the notion of negative-submodularity and its application to kk-means and kk-median.

First, we show that for any constant CC, if there exists a polynomial-time α\alpha-approximation algorithm for kk-means or kk-median in any metric space that uses k+Ck+C centers, then for any constant ε>0\varepsilon>0, there exists a polynomial-time α(1+ε)\alpha(1+\varepsilon)-approximation algorithm that opens exactly kk centers.

More formally, the statement we prove is the following. Note that a similar statement was proven in [36].

Lemma 5.2.

Let C,αC,\alpha be some absolute constants. Let 𝒜\mathcal{A} be an α\alpha-approximation algorithm with running time TT for kk-median (resp. kk-means) that opens k+Ck+C centers. Then, for any 1/3>ε>01/3>\varepsilon>0, there exists an α(1+ε)\alpha(1+\varepsilon)-approximation algorithm for kk-median (resp. kk-means) with running time O(T+npoly(C/ε))O(T+n^{\text{poly}(C/\varepsilon)}).

Proof.

We give the proof for the kk-median problem; the proof for the kk-means problem is identical up to adjusting the constants.

To proceed, we need the following notion (due to [42]): A kk-median instance is said to be (1+α)(1+\alpha)-ORSS-separable if the ratio of the cost of the optimum solution with k1k-1 centers to the cost of the optimum solution with kk centers is at least 1+α1+\alpha.

We can now present our algorithm. For any kk, it proceeds as follows. Compute an α\alpha-approximate solution SCS_{C} (with k+CC=kk+C-C=k centers) to the (kC)(k-C)-median instance using 𝒜\mathcal{A}. Then, for each i=1,,Ci=1,\ldots,C, compute a solution Si1S_{i-1} with k(i1)k-(i-1) centers using the algorithms for (1+ε/C)ε/10(1+\nicefrac{{\varepsilon}}{{C}})^{\varepsilon/10}-ORSS-separable instances of [5, 22], which obtain a (1+ε/3)(1+\varepsilon/3)-approximate solution in time npoly(C/ε)n^{\text{poly}(C/\varepsilon)}. Output the solution SS^{*} of S0,,SCS_{0},\ldots,S_{C} with minimum kk-median cost.

We now turn to the analysis of the above algorithm. The running time follows immediately from its definition and the results of [5, 22]. Let us now consider the approximation guarantee of the solution produced. For 0iC0\leq i\leq C, let OPTi\text{OPT}_{i} be the optimal solution to the (ki)(k-i)-median problem, i.e., the kk-median problem with kik-i centers. Our goal is thus to show that the solution SS^{*} output by the above algorithm is an α(1+ε)\alpha(1+\varepsilon)-approximation to OPT0\text{OPT}_{0}.

If the cost of OPTC\text{OPT}_{C} is within a (1+ε)(1+\varepsilon)-factor of the cost of OPT0\text{OPT}_{0}, then the cost of the solution output by the algorithm is no larger than the cost of the solution SCS_{C}, which is at most αcost(OPTC)α(1+ε)cost(OPT0)\alpha\text{cost}(\text{OPT}_{C})\leq\alpha(1+\varepsilon)\text{cost}(\text{OPT}_{0}), as desired.

Otherwise, since ε<1/3\varepsilon<1/3, there exists an i>0i>0 such that cost(OPTi)(1+ε/C)ε/10cost(OPTi1)\text{cost}(\text{OPT}_{i})\geq(1+\nicefrac{{\varepsilon}}{{C}})^{\varepsilon/10}\text{cost}(\text{OPT}_{i-1}). Let ii^{*} be the smallest such ii. In that case, we have both that cost(OPTi1)(1+ε/3)cost(OPT0)\text{cost}(\text{OPT}_{i^{*}-1})\leq(1+\varepsilon/3)\text{cost}(\text{OPT}_{0}) and that the (k(i1))(k-(i^{*}-1))-median instance is (1+ε/C)ε/10(1+\nicefrac{{\varepsilon}}{{C}})^{\varepsilon/10}-ORSS-separable. Therefore, by the results of [5, 22], the cost of the solution output by our algorithm is no larger than (1+ε/3)cost(OPTi1)(1+\varepsilon/3)\text{cost}(\text{OPT}_{i^{*}-1}), which is at most (1+ε/3)2cost(OPT0)(1+ε)cost(OPT0)(1+\varepsilon/3)^{2}\text{cost}(\text{OPT}_{0})\leq(1+\varepsilon)\text{cost}(\text{OPT}_{0}) by our choice of ε\varepsilon, hence the lemma. ∎
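The candidate-generation scheme from this proof can be sketched as follows, with `A` standing in for the black-box algorithm 𝒜 run on the (k−C)-instance and `B` for the ORSS-separable solver of [5, 22]; both are passed in as callables, and all names here are ours.

```python
def best_of_candidates(A, B, cost, k, C):
    """Return the cheapest of the C + 1 candidate solutions:
    S_C from A on the (k - C)-instance (so it opens exactly k centers),
    and S_{i-1} from B on the (k - (i - 1))-center instance, i = 1..C
    (the instance used in the separability analysis)."""
    candidates = [A(k - C)]                  # S_C
    for i in range(1, C + 1):
        candidates.append(B(k - (i - 1)))    # S_{i-1}
    return min(candidates, key=cost)
```

Only one of the candidates needs to be good: either OPT_C is already close to OPT_0 (so S_C wins), or some intermediate instance is separable and B handles it.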

Next, we recall the definitions of submodular and negative-submodular set functions.

Definition 5.3.

Let Ω\Omega be a finite set, and let ff be a function from the set of subsets of Ω\Omega, 𝒫(Ω)\mathcal{P}(\Omega), to the real numbers \mathbb{R}. Then, ff is submodular over Ω\Omega if for any XYΩX\subset Y\subsetneq\Omega and any xΩ\Yx\in\Omega\backslash Y, we have that f(X{x})f(X)f(Y{x})f(Y)f(X\cup\{x\})-f(X)\geq f(Y\cup\{x\})-f(Y). Likewise, ff is negative-submodular over Ω\Omega if f-f is a submodular function: equivalently, if for any XYΩX\subset Y\subsetneq\Omega and any xΩ\Yx\in\Omega\backslash Y, we have that f(X{x})f(X)f(Y{x})f(Y)f(X\cup\{x\})-f(X)\leq f(Y\cup\{x\})-f(Y).
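As a small illustration of Definition 5.3, the following sketch checks the negative-submodular inequality for a one-dimensional k-median cost on a toy instance of our own choosing.

```python
def kmedian_cost(clients, centers):
    """1-D k-median cost: each client pays the distance to its nearest center."""
    return sum(min(abs(j - i) for i in centers) for j in clients)

def marginal(f, base, x):
    """Marginal change f(base + {x}) - f(base) of adding center x."""
    return f(base | {x}) - f(base)
```

Negative-submodularity says the marginal gain of adding a center can only increase (i.e., the cost drop can only shrink) as the base set of centers grows.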

The following claim, proven in [17], shows that the kk-means and kk-median objective functions are both negative-submodular.

Proposition 5.4.

[17, Claim 10] Fix \mathcal{F} and 𝒟\mathcal{D}, and let f:𝒫()f:\mathcal{P}(\mathcal{F})\to\mathbb{R} be the function sending each subset SS\subset\mathcal{F} to cost(𝒟,S)\text{cost}(\mathcal{D},S). Then, ff is a negative-submodular function over \mathcal{F}, either if the cost is the kk-median cost or if the cost is the kk-means cost.

Given Proposition 5.4, we can use standard properties of submodular functions to infer the following claims.

Proposition 5.5.

Let S0S1S_{0}\subset S_{1} be sets of facilities, where S0S_{0} has size k0k_{0} and S1S_{1} has size k0+k1k_{0}+k_{1}. Then, for any 0p10\leq p\leq 1, if S:S0SS1S:S_{0}\subset S\subset S_{1} is a set created including all of S0S_{0} and then independently including each element in S1\S0S_{1}\backslash S_{0} with probability pp, then 𝔼[cost(𝒟,S)]pcost(𝒟,S1)+(1p)cost(𝒟,S0)\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq p\cdot\text{cost}(\mathcal{D},S_{1})+(1-p)\cdot\text{cost}(\mathcal{D},S_{0}).

Proposition 5.6.

Let S0S1S_{0}\subset S_{1} be sets of facilities, where S0S_{0} has size k0k_{0} and S1S_{1} has size k0+k1k_{0}+k_{1}. Then, if S:S0SS1S:S_{0}\subset S\subset S_{1} is a set created by randomly adding exactly k2k_{2} items from S1\S0S_{1}\backslash S_{0} to S0S_{0}, for some fixed 0k2k10\leq k_{2}\leq k_{1}, then we have 𝔼[cost(𝒟,S)]k2k1cost(𝒟,S1)+(1k2k1)cost(𝒟,S0)\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\frac{k_{2}}{k_{1}}\cdot\text{cost}(\mathcal{D},S_{1})+(1-\frac{k_{2}}{k_{1}})\cdot\text{cost}(\mathcal{D},S_{0}).
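Proposition 5.6 can be verified exactly on a small instance by enumerating all size-k2 extensions; the one-dimensional instance and names below are ours.

```python
from itertools import combinations

def kmedian_cost(clients, centers):
    """1-D k-median cost: each client pays the distance to its nearest center."""
    return sum(min(abs(j - i) for i in centers) for j in clients)

def expected_extension_cost(clients, S0, S1, k2):
    """Average cost over all ways of adding exactly k2 facilities from
    S1 minus S0 to S0 (the distribution in Proposition 5.6)."""
    extra = sorted(set(S1) - set(S0))
    subs = list(combinations(extra, k2))
    total = sum(kmedian_cost(clients, set(S0) | set(c)) for c in subs)
    return total / len(subs)
```

The proposition then bounds this average by the convex combination (k2/k1)·cost(S1) + (1 − k2/k1)·cost(S0).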

5.3 Analysis

The following theorem relating to the above algorithm was (essentially) proven by Ahmadian et al. [1], and will be very important in our analysis.

Theorem 5.7.

Algorithm 2 runs in nO(1)n^{O(1)} time (where the O(1)O(1) may depend on ε\varepsilon and γ\gamma), and the following conditions hold.

  1. 1.

Let kk^{\prime} be the minimum of kk and the sizes of all sets that become I1(0)I_{1}^{(0)} (i.e., the first part of each nested quasi-independent set I(q)I^{(q)} that becomes I(0)I^{(0)}, as done in line 13 of the pseudocode). Then, every solution 𝒮()\mathcal{S}^{(\ell)} generated for a given value of λ\lambda is (λ,k)(\lambda,k^{\prime})-roundable.

  2. 2.

    For any solution 𝒮\mathcal{S} that becomes 𝒮(0)\mathcal{S}^{(0)}, S(0)=\mathcal{F}_{S}^{(0)}=\emptyset (and so 𝒟S(0)\mathcal{D}_{S}^{(0)} is an empty function). In addition, 𝒮=𝒮(0)\mathcal{S}=\mathcal{S}^{(0)} has no corresponding bad clients, i.e., 𝒟B=\mathcal{D}_{B}=\emptyset.

  3. 3.

    For any two consecutive solutions 𝒮()\mathcal{S}^{(\ell)} and 𝒮(+1)\mathcal{S}^{(\ell+1)}, we have that α()\alpha^{(\ell)} and α(+1)\alpha^{(\ell+1)} are close.

  4. 4.

    Every I(,r)=(I1(,r),I2(,r),I3(,r))I^{(\ell,r)}=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)}) is a nested quasi-independent set for the set of facilities 𝒱(,r)\mathcal{V}^{(\ell,r)}. In addition, for every 1rp1\leq r\leq p_{\ell}, |I1(,r1)\I1(,r)|1.|I_{1}^{(\ell,r-1)}\backslash I_{1}^{(\ell,r)}|\leq 1.

This theorem is technically stronger than what was proven in [1], but follows from a nearly identical analysis to their paper, with a very minor tweak to the algorithm. We explain why Theorem 5.7 follows from their analysis in Appendix D.

For the following lemmas (Lemma 5.8 through Proposition 5.15), we consider a fixed family of conflict graphs {H(δ)}δ>0={H(,r)(δ)}δ>0\{H(\delta)\}_{\delta>0}=\{H^{(\ell,r)}(\delta)\}_{\delta>0} on a hybrid set 𝒱=𝒱(,r)\mathcal{V}=\mathcal{V}^{(\ell,r)}, for some r1r\geq 1, where both 𝒮()\mathcal{S}^{(\ell)} and 𝒮(+1)\mathcal{S}^{(\ell+1)} are (λ,k)(\lambda,k^{\prime})-roundable. For some fixed δ1δ22δ3,\delta_{1}\geq\delta_{2}\geq 2\geq\delta_{3}, we let (I1,I2,I3)(I_{1},I_{2},I_{3}) be a nested quasi-independent set of {H(δ)}\{H(\delta)\}, i.e., the output of running all but the final step of Algorithm 1 with V1=𝒱V_{1}=\mathcal{V}, and treat it as fixed. However, we will let pp (required in the final step 6) be variable, though we may consider pp as initialized to some fixed p1p_{1}.

Many of these results apply to both kk-means and kk-median. While we focus on kk-means, we later explain how simple modifications extend our results to the kk-median problem. In addition, we will treat δ1,δ2,δ3\delta_{1},\delta_{2},\delta_{3} as fixed but pp as potentially variable. We let ρ(p)\rho(p) represent the approximation constant from the LMP algorithm (i.e., in Lemma 4.2) with probability pp (for either kk-means or kk-median, depending on the setting).

We first show some crucial preliminary claims relating to the hybrid graph 𝒱(,r)\mathcal{V}^{(\ell,r)}, where r1.r\geq 1.

Lemma 5.8.

For any client j𝒟j\in\mathcal{D} and any facility iN (j)i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j), τiαj>c(j,i)\tau_{i}\geq\alpha_{j}>c(j,i).

Proof.

Note that if i𝒱()i\in\mathcal{V}^{(\ell)}, then τi\tau_{i} is the maximum αj()\alpha_{j^{\prime}}^{(\ell)} over jj^{\prime} such that αj()>c(j,i)\alpha_{j^{\prime}}^{(\ell)}>c(j^{\prime},i) and j𝒟S()(i)j^{\prime}\in\mathcal{D}_{S}^{(\ell)}(i) if ii is special. Since αj()αj\alpha_{j^{\prime}}^{(\ell)}\geq\alpha_{j^{\prime}}, this is at least the maximum αj\alpha_{j^{\prime}} over jj^{\prime} such that αj>c(j,i)\alpha_{j^{\prime}}>c(j^{\prime},i) and j𝒟S()(i)j^{\prime}\in\mathcal{D}_{S}^{(\ell)}(i) if ii is special. But if iN (j)i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j), then indeed αj>c(j,i)\alpha_{j}>c(j,i) and j𝒟S()(i)j\in\mathcal{D}_{S}^{(\ell)}(i) if ii is special (recall that 𝒟S\mathcal{D}_{S} was defined based on whether ii is in 𝒱()\mathcal{V}^{(\ell)} or 𝒱(+1)\mathcal{V}^{(\ell+1)}). So, τiαj\tau_{i}\geq\alpha_{j}. By an identical argument, the same holds if i𝒱(+1)i\in\mathcal{V}^{(\ell+1)}.

Finally, note that we defined N (j)\accentset{\rule{2.79996pt}{0.7pt}}{N}(j) to precisely be the set of tight facilities ii in 𝒱\mathcal{V} with αj>c(j,i)\alpha_{j}>c(j,i), or special facilities ii in 𝒱\mathcal{V} with αj>c(j,i)\alpha_{j}>c(j,i) and j𝒟S(i)j\in\mathcal{D}_{S}(i). So, we always have αj>c(j,i)\alpha_{j}>c(j,i). ∎

Lemma 5.9.

Suppose that SI1I2I3S\subset I_{1}\cup I_{2}\cup I_{3} contains every point in I1I_{1}, and each point in I2I_{2} and each point in I3I_{3} with probability p0.5p\leq 0.5 (not necessarily independently). Then, for any point jj,

𝔼[αjiN (j)S(αjc(j,i))]0.\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0.
Remark.

We note that this lemma holds even for bad clients j𝒟Bj\in\mathcal{D}_{B}. In addition, we remark that we will be applying this lemma on SS as a nested quasi-independent set or something similar.

Finally, we note that this lemma (and the following lemma, Lemma 5.10) are the only results where we directly use the fact that we are studying the kk-means as opposed to the kk-median problem.

Proof of Lemma 5.9.

Note that every point iN (j)Si\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S satisfies τiαj\tau_{i}\geq\alpha_{j} and αjc(j,i)>0\alpha_{j}-c(j,i)>0, by Lemma 5.8. So, by linearity of expectation, it suffices to show that

αjiN (j)I1(αjc(j,i))+12iN (j)I2(αjc(j,i))+12iN (j)I3(αjc(j,i)).\alpha_{j}\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))+\frac{1}{2}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i))+\frac{1}{2}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i)). (10)

Now, note that by the definition of I2I_{2}, every pair of points (i,i)(i,i^{\prime}) in (I1I2)N (j)(I_{1}\cup I_{2})\cap\accentset{\rule{2.79996pt}{0.7pt}}{N}(j) is separated by at least δ2min(τi,τi)\sqrt{\delta_{2}\cdot\min(\tau_{i},\tau_{i^{\prime}})} distance. But since i,iN (j)i,i^{\prime}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j), min(τi,τi)αj\min(\tau_{i},\tau_{i^{\prime}})\geq\alpha_{j} by Lemma 5.8. So, d(i,i)2αjd(i,i^{\prime})\geq\sqrt{2\cdot\alpha_{j}}. Therefore, writing m:=|N (j)(I1I2)|m:=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{2})| and using the fact that in Euclidean space the sum of squared distances from mm points to any fixed point is at least 12m\frac{1}{2m} times the sum of their squared pairwise distances (over ordered pairs),

iN (j)(I1I2)(αjc(j,i))\displaystyle\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{2})}(\alpha_{j}-c(j,i)) mαj12mi,iN (j)(I1I2)d(i,i)2\displaystyle\leq m\cdot\alpha_{j}-\frac{1}{2m}\cdot\sum_{i,i^{\prime}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{2})}d(i,i^{\prime})^{2}
mαj12mm(m1)2αj\displaystyle\leq m\cdot\alpha_{j}-\frac{1}{2m}\cdot m\cdot(m-1)\cdot 2\cdot\alpha_{j}
=αj.\displaystyle=\alpha_{j}. (11)

Likewise, every pair of points (i,i)(i,i^{\prime}) in (I1I3)N (j)(I_{1}\cup I_{3})\cap\accentset{\rule{2.79996pt}{0.7pt}}{N}(j) are also separated by at least δ2min(τi,τi)2αj\sqrt{\delta_{2}\cdot\min(\tau_{i},\tau_{i^{\prime}})}\geq\sqrt{2\cdot\alpha_{j}} distance. Therefore, the same calculations as in (11) give us

iN (j)(I1I3)(αjc(j,i))αj.\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{1}\cup I_{3})}(\alpha_{j}-c(j,i))\leq\alpha_{j}. (12)

Averaging Equations (11) and (12) gives us Equation (10), which finishes the lemma. ∎

We next have the following lemma, which bounds the cost for clients that are not bad.

Lemma 5.10.

Let p<0.5p<0.5 and SS be generated by applying Step 6 to (I1,I2,I3)(I_{1},I_{2},I_{3}). Then, for every client j𝒟Bj\not\in\mathcal{D}_{B},

𝔼[c(j,S)]ρ(p)(1+O(ε))𝔼[αjiN (j)S(αjc(j,i))],\mathbb{E}[c(j,S)]\leq\rho(p)\cdot(1+O(\varepsilon))\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right],

where ρ(p)\rho(p) represents the constant from Lemma 4.2.

Proof.

By Lemma 5.8, we have that τiαj\tau_{i}\geq\alpha_{j} for all iN (j)i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j). In addition, for every j𝒟Bj\not\in\mathcal{D}_{B}, there exists a point w(j)w(j) in 𝒱(+1)\mathcal{V}^{(\ell+1)} (so it has not been removed, i.e., it is still in 𝒱=𝒱(,r)\mathcal{V}=\mathcal{V}^{(\ell,r)}) such that (1+ε)αj(+1)c(j,w(j))(1+\varepsilon)\cdot\alpha_{j}^{(\ell+1)}\geq c(j,w(j)) and (1+ε)αj(+1)τw(j)(1+\varepsilon)\cdot\alpha_{j}^{(\ell+1)}\geq\tau_{w(j)}. Since αj(+1)1\alpha_{j}^{(\ell+1)}\geq 1 for all jj and |αjαj(+1)|1n2,|\alpha_{j}-\alpha_{j}^{(\ell+1)}|\leq\frac{1}{n^{2}}, this means that (1+O(ε))αjc(j,w(j)),τw(j)(1+O(\varepsilon))\cdot\alpha_{j}\geq c(j,w(j)),\tau_{w(j)}. These pieces of information are sufficient for all of our calculations from Lemma 4.2 to go through. ∎

We note that the expression αjiN (j)S(αjc(j,i))\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i)) may be somewhat unwieldy. Therefore, we provide an upper bound on its sum over j𝒟.j\in\mathcal{D}.

Lemma 5.11.

Let SS be any subset of 𝒱=𝒱(,r)\mathcal{V}=\mathcal{V}^{(\ell,r)}. Then,

j𝒟(αjiN (j)S(αjc(j,i)))j𝒟αj(λ1n)|S|+4γOPTk.\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot|S|+4\gamma\cdot\text{OPT}_{k^{\prime}}.
Remark.

We note that this lemma holds even when λ<1n\lambda<\frac{1}{n}, i.e., λ1n<0\lambda-\frac{1}{n}<0.

Proof.

First, by splitting the sum based on tight and special facilities, we have that

j𝒟(αjiN (j)S(αjc(j,i)))\displaystyle\hskip 14.22636pt\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)
=j𝒟αjiSi tightjN(i)(αjc(j,i))iSi specialjN(i)𝒟S(i)(αjc(j,i)).\displaystyle=\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{\begin{subarray}{c}i\in S\\ i\text{ tight}\end{subarray}}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))-\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i)). (13)

Now, we note that for any tight facility ii, either j𝒟max(0,αj()c(j,i))=zi()[λ,λ+1n]\sum_{j\in\mathcal{D}}\max(0,\alpha_{j}^{(\ell)}-c(j,i))=z_{i}^{(\ell)}\in[\lambda,\lambda+\frac{1}{n}] or j𝒟max(0,αj(+1)c(j,i))=zi(+1)[λ,λ+1n]\sum_{j\in\mathcal{D}}\max(0,\alpha_{j}^{(\ell+1)}-c(j,i))=z_{i}^{(\ell+1)}\in[\lambda,\lambda+\frac{1}{n}]. Since α()\alpha^{(\ell)} and α(+1)\alpha^{(\ell+1)} are close, this means 0αj()αj,αj(+1)αj1n20\leq\alpha_{j}^{(\ell)}-\alpha_{j},\alpha_{j}^{(\ell+1)}-\alpha_{j}\leq\frac{1}{n^{2}}, and since there are nn clients in 𝒟\mathcal{D}, this means that

jN(i)(αjc(j,i))=j𝒟max(αjc(j,i),0)j𝒟max(αj()c(j,i)1n2,0)λ1n,\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=\sum_{j\in\mathcal{D}}\max(\alpha_{j}-c(j,i),0)\geq\sum_{j\in\mathcal{D}}\max\left(\alpha_{j}^{(\ell^{\prime})}-c(j,i)-\frac{1}{n^{2}},0\right)\geq\lambda-\frac{1}{n}, (14)

for some choice of \ell^{\prime} in {,+1}\{\ell,\ell+1\}.

In addition, we know that both α()\alpha^{(\ell)} and α(+1)\alpha^{(\ell+1)} are feasible solutions of DUAL(λ+1n)\text{DUAL}(\lambda+\frac{1}{n}), and that αjαj(),αj(+1)\alpha_{j}\leq\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}. Therefore, for any special facility iS()S(+1)i\in\mathcal{F}_{S}^{(\ell)}\sqcup\mathcal{F}_{S}^{(\ell+1)}, jN(i)𝒟S(i)(αjc(j,i))j𝒟max(0,αjc(j,i))λ+1n\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i))\leq\sum_{j\in\mathcal{D}}\max(0,\alpha_{j}-c(j,i))\leq\lambda+\frac{1}{n}. But, we have that

iS()jN(i)𝒟S()(i)(αjc(j,i))\displaystyle\sum_{i\in\mathcal{F}_{S}^{(\ell^{\prime})}}\sum_{j\in N(i)\cap\mathcal{D}_{S}^{(\ell^{\prime})}(i)}(\alpha_{j}-c(j,i)) iS()j𝒟S()(i)max(0,αj()1n2c(j,i))\displaystyle\geq\sum_{i\in\mathcal{F}_{S}^{(\ell^{\prime})}}\sum_{j\in\mathcal{D}_{S}^{(\ell^{\prime})}(i)}\max\left(0,\alpha_{j}^{(\ell^{\prime})}-\frac{1}{n^{2}}-c(j,i)\right)
λ|S()|γOPTk|S()|n,\displaystyle\geq\lambda\cdot|\mathcal{F}_{S}^{(\ell^{\prime})}|-\gamma\cdot\text{OPT}_{k^{\prime}}-\frac{|\mathcal{F}_{S}^{(\ell^{\prime})}|}{n},

for both =\ell^{\prime}=\ell and =+1\ell^{\prime}=\ell+1 (the last inequality follows by Condition 4 of Definition 5.1). So, if we let eie_{i} represent λ+1njN(i)𝒟S(i)(αjc(j,i))\lambda+\frac{1}{n}-\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i)) for each special facility ii, we have that ei0e_{i}\geq 0 but iS()ei2n|S()|+γOPTk2γOPTk\sum_{i\in\mathcal{F}_{S}^{(\ell)}}e_{i}\leq\frac{2}{n}\cdot|\mathcal{F}_{S}^{(\ell)}|+\gamma\cdot\text{OPT}_{k^{\prime}}\leq 2\gamma\cdot\text{OPT}_{k^{\prime}}, since |S()|n|\mathcal{F}_{S}^{(\ell)}|\leq n and OPTkn2γ\text{OPT}_{k^{\prime}}\geq n\geq\frac{2}{\gamma}. Similarly, iS(+1)ei2γOPTk.\sum_{i\in\mathcal{F}_{S}^{(\ell+1)}}e_{i}\leq 2\gamma\cdot\text{OPT}_{k^{\prime}}. So, this means that

iSi specialei4γOPTk,\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}e_{i}\leq 4\gamma\cdot\text{OPT}_{k^{\prime}},

which means that

iSi specialjN(i)𝒟S(i)(αjc(j,i))\displaystyle\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}\sum_{j\in N(i)\cap\mathcal{D}_{S}(i)}(\alpha_{j}-c(j,i)) iSi special(λ+1nei)\displaystyle\geq\sum_{\begin{subarray}{c}i\in S\\ i\text{ special}\end{subarray}}\left(\lambda+\frac{1}{n}-e_{i}\right)
(λ+1n)|{iS:i special}|4γOPTk.\displaystyle\geq\left(\lambda+\frac{1}{n}\right)\cdot|\{i\in S:i\text{ special}\}|-4\gamma\cdot\text{OPT}_{k^{\prime}}. (15)

Thus, by combining Equations (13), (14), and (15), we get

j𝒟(αjiN (j)S(αjc(j,i)))\displaystyle\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right) j𝒟αjiSi tight(λ1n)(λ+1n)|{iS:i special}|+4γOPTk\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{\begin{subarray}{c}i\in S\\ i\text{ tight}\end{subarray}}\left(\lambda-\frac{1}{n}\right)-\left(\lambda+\frac{1}{n}\right)\cdot|\{i\in S:i\text{ special}\}|+4\gamma\cdot\text{OPT}_{k^{\prime}}
j𝒟αj(λ1n)|S|+4γOPTk.\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot|S|+4\gamma\cdot\text{OPT}_{k^{\prime}}.\qed

Next, we show that the bad clients do not contribute much to the total cost.

Lemma 5.12.

Let SS be a subset of 𝒱\mathcal{V} containing I1I_{1}. Then, we have that

j𝒟Bc(j,S)O(γ)OPTk.\sum_{j\in\mathcal{D}_{B}}c(j,S)\leq O(\gamma)\cdot\text{OPT}_{k^{\prime}}.
Proof.

Note that every point j𝒟Bj\in\mathcal{D}_{B} has an associated facility w(j)𝒱(+1)w(j)\in\mathcal{V}^{(\ell+1)} such that, summing over all bad clients,

j𝒟B(c(j,w(j))+τw(j))O(γ)OPTk,\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\tau_{w(j)}\right)\leq O(\gamma)\cdot\text{OPT}_{k^{\prime}},

because 𝒮(+1)\mathcal{S}^{(\ell+1)} is (λ,k)(\lambda,k^{\prime})-roundable. Now, note that d(j,S)d(j,I1)d(j,w(j))+d(w(j),I1)d(j,S)\leq d(j,I_{1})\leq d(j,w(j))+d(w(j),I_{1}), so c(j,S)2[c(j,w(j))+c(w(j),I1)]c(j,S)\leq 2[c(j,w(j))+c(w(j),I_{1})]. But since w(j)𝒱(+1)𝒱w(j)\in\mathcal{V}^{(\ell+1)}\subset\mathcal{V}, c(w(j),I1)δ1τw(j)c(w(j),I_{1})\leq\delta_{1}\cdot\tau_{w(j)}, and therefore,

j𝒟Bc(j,S)2j𝒟B(c(j,w(j))+δ1τw(j))O(γ)OPTk,\sum_{j\in\mathcal{D}_{B}}c(j,S)\leq 2\cdot\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\delta_{1}\cdot\tau_{w(j)}\right)\leq O(\gamma)\cdot\text{OPT}_{k^{\prime}},

where the final inequality follows by Condition 3c) of Definition 5.1. ∎
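Since costs here are squared distances, the step c(j,S) ≤ 2[c(j,w(j)) + c(w(j),I1)] only uses the relaxed triangle inequality (a+b)² ≤ 2a² + 2b². A quick numeric sanity check of this inequality (values are illustrative, not from the paper):

```python
import random

# The step c(j, S) <= 2*[c(j, w(j)) + c(w(j), I1)] only uses that costs are
# squared distances: (a + b)^2 <= 2*a^2 + 2*b^2 for all real a, b.
random.seed(0)
for _ in range(1000):
    a = random.uniform(0, 10)  # plays the role of d(j, w(j))
    b = random.uniform(0, 10)  # plays the role of d(w(j), I1)
    assert (a + b) ** 2 <= 2 * a ** 2 + 2 * b ** 2 + 1e-9
```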

We now combine our previous lemmas to bound the overall expected cost of SS in terms of the dual solution.

Lemma 5.13.

Suppose that SS is generated by applying Step 6 to (I1,I2,I3)(I_{1},I_{2},I_{3}) with the probability set to p<0.5p<0.5. Then, the expected cost 𝔼[cost(𝒟,S)]\mathbb{E}[\text{cost}(\mathcal{D},S)] is at most

ρ(p)(1+O(ε))[j𝒟αj(λ1n)𝔼[|S|]]+O(γ)OPTk,\rho(p)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[|S|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}},

where ρ(p)\rho(p) represents the constant from Lemma 4.2.

Proof.

We will abbreviate ρ:=ρ(p)\rho:=\rho(p). We can split up the cost based on good (i.e., not in 𝒟B\mathcal{D}_{B}) points jj and bad points jj. Indeed, doing this, we get

𝔼[cost(𝒟,S)]\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S)] =j𝒟B𝔼[c(j,S)]+j𝒟B𝔼[c(j,S)]\displaystyle=\sum_{j\not\in\mathcal{D}_{B}}\mathbb{E}[c(j,S)]+\sum_{j\in\mathcal{D}_{B}}\mathbb{E}[c(j,S)]
ρ(1+O(ε))j𝒟B𝔼[αjiN (j)S(αjc(j,i))]+O(γ)OPTk\displaystyle\leq\rho\cdot(1+O(\varepsilon))\cdot\sum_{j\not\in\mathcal{D}_{B}}\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
ρ(1+O(ε))j𝒟𝔼[αjiN (j)S(αjc(j,i))]+O(γ)OPTk\displaystyle\leq\rho\cdot(1+O(\varepsilon))\cdot\sum_{j\in\mathcal{D}}\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
ρ(1+O(ε))[j𝒟αj(λ1n)𝔼[|S|]]+O(γ)OPTk.\displaystyle\leq\rho\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[|S|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

In the above equation, the second line follows from Lemmas 5.10 and 5.12. The third line is true since 𝔼[αjiN (j)S(αjc(j,i))]0\mathbb{E}\left[\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0 for all jj, even in 𝒟B\mathcal{D}_{B}, by Lemma 5.9. Finally, the fourth line is true because of Lemma 5.11. ∎

Next, we show that under certain circumstances, we can find a solution SS of size at most kk satisfying a similar condition to Lemma 5.13, with high probability.

Lemma 5.14.

Suppose that |I1|+p|I2I3|=k|I_{1}|+p\cdot|I_{2}\cup I_{3}|=k for some integer kk (which may be larger than nn) and some p[0.01,0.49]p\in[0.01,0.49]. Then, for any sufficiently large integer CC, if |I2I3|100C4|I_{2}\cup I_{3}|\geq 100\cdot C^{4}, then there exists a polynomial-time randomized algorithm that outputs a set SS such that I1SI1I2I3I_{1}\subset S\subset I_{1}\cup I_{2}\cup I_{3}, and with probability at least 9/109/10, |S|k|S|\leq k and

cost(𝒟,S)ρ(p2C)(1+300C)(1+O(ε))[j𝒟αj(λ1n)k]+O(γ)OPTk.\text{cost}(\mathcal{D},S)\leq\rho\left(p-\frac{2}{C}\right)\cdot\left(1+\frac{300}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.
Remark.

By repeating this randomized algorithm polynomially many times and outputting the lowest-cost solution SS with |S|k|S|\leq k, we can make the failure probability exponentially small.

Proof.

Let r=|I2|r=|I_{2}|, and partition I2I3I_{2}\cup I_{3} into T1,,TrT_{1},\dots,T_{r}, where each TT_{\ell} consists of a point i2I2i_{2}\in I_{2} and q1(i2)q^{-1}(i_{2}), i.e., i2i_{2}’s preimage under the map qq. We assume WLOG that the TT_{\ell}’s are sorted in non-increasing order of size, and write xx_{\ell} as the unique point in TI2T_{\ell}\cap I_{2}. Define ss such that |T1||Ts|C>|Ts+1||Tr||T_{1}|\geq\dots\geq|T_{s}|\geq C>|T_{s+1}|\geq\dots\geq|T_{r}| (note that ss may be 0 or rr). Note that s|I2I3|Cs\leq\frac{|I_{2}\cup I_{3}|}{C} so s|I2I3|1C\frac{s}{|I_{2}\cup I_{3}|}\leq\frac{1}{C}. Now, set p=p2Cp^{\prime}=p-\frac{2}{C}, and consider creating the following set SS:

  • For each iI1,i\in I_{1}, include iSi\in S.

  • For each s,\ell\leq s, include xSx_{\ell}\in S, and for each ii in T\{x}T_{\ell}\backslash\{x_{\ell}\}, include iSi\in S independently with probability pp^{\prime}.

  • For each >s,\ell>s, flip a fair coin. If it lands heads, include xx_{\ell} with probability 2p2p^{\prime}, and if it lands tails, include each point in T\{x}T_{\ell}\backslash\{x_{\ell}\} independently with probability 2p2p^{\prime}.
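The three sampling rules above can be sketched in code; this is only an illustrative rendering (the data layout and the helper `sample_S` are ours, not part of the formal algorithm):

```python
import random

def sample_S(I1, groups, s, p_prime, rng):
    """Illustrative sketch of the three sampling rules for building S.

    groups[l] = (x_l, rest_l), where rest_l stands for T_l minus {x_l};
    the first s groups are the 'large' ones with |T_l| >= C.
    """
    S = set(I1)  # rule 1: every facility of I1 is always kept
    for l, (x, rest) in enumerate(groups):
        if l < s:
            # rule 2 (large group): keep x_l; keep the rest independently w.p. p'
            S.add(x)
            S.update(i for i in rest if rng.random() < p_prime)
        else:
            # rule 3 (small group): a fair coin decides which side of T_l survives
            if rng.random() < 0.5:  # heads: x_l alone, with probability 2p'
                if rng.random() < 2 * p_prime:
                    S.add(x)
            else:  # tails: each remaining point independently w.p. 2p'
                S.update(i for i in rest if rng.random() < 2 * p_prime)
    return S

rng = random.Random(42)
I1 = {"f0", "f1"}
groups = [("x0", ["a", "b", "c"]), ("x1", ["d"])]
S = sample_S(I1, groups, s=1, p_prime=0.4, rng=rng)
assert I1 <= S <= I1 | {"x0", "a", "b", "c", "x1", "d"}
```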

The expected size of SS is |I1|+s+(|T1|++|Ts|s)p+(|Ts+1|++|Tr|)p=|I1|+p|I2I3|+(1p)s|I1|+p|I2I3|+s.|I_1|+s+(|T_{1}|+\cdots+|T_{s}|-s)\cdot p^{\prime}+(|T_{s+1}|+\cdots+|T_{r}|)\cdot p^{\prime}=|I_{1}|+p^{\prime}\cdot|I_{2}\cup I_{3}|+(1-p^{\prime})\cdot s\leq|I_{1}|+p^{\prime}\cdot|I_{2}\cup I_{3}|+s. Therefore, since p=p2C,p^{\prime}=p-\frac{2}{C}, the expected size of SS is at most |I1|+(p2C)|I2I3|+|I2I3|C=|I1|+(p1C)|I2I3|.|I_1|+(p-\frac{2}{C})\cdot|I_2\cup I_3|+\frac{|I_{2}\cup I_{3}|}{C}=|I_{1}|+(p-\frac{1}{C})\cdot|I_{2}\cup I_{3}|. To bound the variance of |S||S|, we note that each point in I1I_{1} and each xx_{\ell} for s\ell\leq s is deterministically in SS, each point in T\{x}T_{\ell}\backslash\{x_{\ell}\} for s\ell\leq s is independently selected with probability pp^{\prime}, and the number of points from each TT_{\ell} for >s\ell>s is some independent random variable bounded by |T|C|T_{\ell}|\leq C. So, the variance can be bounded by (|T1|++|Ts|)+=s+1r|T|2maxs+1r|T|(|T1|++|Tr|)C|I2I3|(|T_{1}|+\cdots+|T_{s}|)+\sum_{\ell=s+1}^{r}|T_{\ell}|^{2}\leq\max_{s+1\leq\ell\leq r}|T_{\ell}|\cdot\left(|T_{1}|+\cdots+|T_{r}|\right)\leq C\cdot|I_{2}\cup I_{3}|. So, by Chebyshev’s inequality, with probability at least 1110C1-\frac{1}{10C},

|S|\displaystyle|S| |I1|+(p1C)|I2I3|+10CC|I2I3|\displaystyle\leq|I_{1}|+(p-\frac{1}{C})\cdot|I_{2}\cup I_{3}|+\sqrt{10C\cdot C\cdot|I_{2}\cup I_{3}|}
|I1|+p|I2I3|1C|I2I3|+10C|I2I3|\displaystyle\leq|I_{1}|+p|I_{2}\cup I_{3}|-\frac{1}{C}|I_{2}\cup I_{3}|+\sqrt{10}C\cdot\sqrt{|I_{2}\cup I_{3}|}
|I1|+p|I2I3|=k,\displaystyle\leq|I_{1}|+p|I_{2}\cup I_{3}|=k,

where the final inequality is true since |I2I3|100C4|I_{2}\cup I_{3}|\geq 100C^{4}.
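The final inequality reduces to √(10C·C·M) ≤ M/C for M = |I2 ∪ I3|, which holds whenever M ≥ 10C⁴; the hypothesis M ≥ 100C⁴ gives this with room to spare. A quick numeric check:

```python
# Verify sqrt(10*C * C * M) <= M / C whenever M >= 100*C^4, the inequality
# that closes the Chebyshev bound above (M stands for |I2 ∪ I3|).
for C in [2, 5, 10, 100]:
    for M in [100 * C ** 4, 200 * C ** 4, 1000 * C ** 4]:
        assert (10 * C * C * M) ** 0.5 <= M / C
```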

Next, we bound the expected cost of SS. First, consider running the final step 6 of the LMP algorithm on (I1,I2,I3)(I_{1},I_{2},I_{3}) using probability pp^{\prime}. This would produce a set S0S_{0} such that

𝔼[cost(𝒟,S0)]\displaystyle\mathbb{E}\left[\text{cost}(\mathcal{D},S_{0})\right] ρ(p)(1+O(ε))𝔼[j𝒟αj(λ1n)𝔼[|S0|]]+O(γ)OPTk\displaystyle\leq\rho(p^{\prime})\cdot(1+O(\varepsilon))\cdot\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[|S_{0}|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
=ρ(p2C)(1+O(ε))[j𝒟αj(λ1n)(|I1|+(p2C)|I2I3|)]+O(γ)OPTk\displaystyle=\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(|I_{1}|+\left(p-\frac{2}{C}\right)\cdot|I_{2}\cup I_{3}|\right)\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
=ρ(p2C)(1+O(ε))[j𝒟αj(λ1n)k+2C(λ1n)|I2I3|]+O(γ)OPTk.\displaystyle=\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{2}{C}\cdot\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}. (16)

Above, the first line follows by Lemma 5.13, the second line follows by definition of pp^{\prime} and S0S_{0}, and the third line follows from the fact that |I1|+p|I2I3|=k|I_{1}|+p|I_{2}\cup I_{3}|=k.

Now, note that if we had performed the final step of the LMP algorithm on (I1,I2,I3)(I_{1},I_{2},I_{3}) using probability 1/21/2 instead of pp^{\prime}, the set (call it S1S_{1}) would have satisfied

j𝒟αj(λ1n)(|I1|+12(|I2|+|I3|))+4γOPTk𝔼[j𝒟(αjiN (j)S(αjc(j,i)))]0\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(|I_{1}|+\frac{1}{2}(|I_{2}|+|I_{3}|)\right)+4\gamma\cdot\text{OPT}_{k^{\prime}}\geq\mathbb{E}\left[\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S}(\alpha_{j}-c(j,i))\right)\right]\geq 0

by Lemmas 5.11 and 5.9. This means that j𝒟αj(λ1n)(|I1|+1/2|I2I3|)4γOPTk.\sum_{j\in\mathcal{D}}\alpha_{j}\geq(\lambda-\frac{1}{n})\cdot(|I_{1}|+1/2\cdot|I_{2}\cup I_{3}|)-4\gamma\cdot\text{OPT}_{k^{\prime}}. Therefore, again using the fact that |I1|+p|I2I3|=k|I_{1}|+p\cdot|I_{2}\cup I_{3}|=k, we have that

j𝒟αj(λ1n)k(λ1n)(12p)|I2I3|4γOPTk.\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\geq\left(\lambda-\frac{1}{n}\right)\cdot\left(\frac{1}{2}-p\right)\cdot|I_{2}\cup I_{3}|-4\gamma\cdot\text{OPT}_{k^{\prime}}.

We can rewrite this to obtain

(λ1n)|I2I3|11/2p(j𝒟αj(λ1n)k)+O(γ)OPTk,\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|\leq\frac{1}{1/2-p}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right)+O(\gamma)\cdot\text{OPT}_{k^{\prime}}, (17)

since p[0.01,0.49]p\in[0.01,0.49].

In this and the next paragraph, we prove that 𝔼[cost(𝒟,S)]𝔼[cost(𝒟,S0)]\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})]. To see why, we consider a coupling of the randomness to generate a sequence of sets S0,S1,,Ss=SS_{0},S_{1},\dots,S_{s}=S. To do so, for each point iI1i\in I_{1}, we automatically include iS=Ssi\in S=S_{s} and iShi\in S_{h} for all 0h<s0\leq h<s. Now, for each 1r1\leq\ell\leq r, we first create a temporary set T~T\{x}\tilde{T}_{\ell}\subset T_{\ell}\backslash\{x_{\ell}\} by including each point iT\{x}i\in T_{\ell}\backslash\{x_{\ell}\} in T~\tilde{T}_{\ell} independently with probability 2p2p^{\prime}. Then, we create two sets T(0)TT_{\ell}^{(0)}\subset T_{\ell} and T(1)TT_{\ell}^{(1)}\subset T_{\ell} as follows. For T(0)T_{\ell}^{(0)}, we include each point in T~\tilde{T}_{\ell} independently, with probability 1/21/2, and always include xT(0)x_{\ell}\in T_{\ell}^{(0)}. For T(1)T_{\ell}^{(1)}, we flip a fair coin: if the coin lands heads, we only include xx_{\ell}, but if the coin lands tails, we do not include xx_{\ell} but include all of T~\tilde{T}_{\ell}. We remark that overall, T(0)T_{\ell}^{(0)} includes each point in T\{x}T_{\ell}\backslash\{x_{\ell}\} independently with probability pp^{\prime}.

Now, for each 0hs0\leq h\leq s, we define Sh:=(1hT(0))(h<rT(1))S_{h}:=\left(\bigcup_{1\leq\ell\leq h}T_{\ell}^{(0)}\right)\cup\left(\bigcup_{h<\ell\leq r}T_{\ell}^{(1)}\right). One can verify that SS and S0S_{0} have the desired distribution, since S0=1rT(1)S_{0}=\bigcup_{1\leq\ell\leq r}T_{\ell}^{(1)} is precisely the distribution obtained after applying step 6 of the LMP algorithm on (I1,I2,I3)(I_{1},I_{2},I_{3}), but SS takes T(0)T_{\ell}^{(0)} instead of T(1)T_{\ell}^{(1)} for each s\ell\leq s, which is precisely the desired distribution for SS (as we defined SS at the beginning of this lemma’s proof). To show that 𝔼[cost(𝒟,S)]𝔼[cost(𝒟,S0)]\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})], it suffices to show that 𝔼[cost(𝒟,Sh)]𝔼[cost(𝒟,Sh1)]\mathbb{E}[\text{cost}(\mathcal{D},S_{h})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1})] for all 1hs1\leq h\leq s. However, note that because of our coupling, the only difference between ShS_{h} and Sh1S_{h-1} relates to points in ThT_{h}. If we let Sh1=Sh1{x}S_{h-1}^{\prime}=S_{h-1}\cup\{x_{h}\} be the set that always includes xx_{h} but includes the entirety of T~h\tilde{T}_{h} with probability 1/21/2, then clearly 𝔼[cost(𝒟,Sh1)]𝔼[cost(𝒟,Sh1)]\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1}^{\prime})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1})]. In addition, if we condition on the set T~h\tilde{T}_{h}, the only difference between Sh1S_{h-1}^{\prime} and ShS_{h} is that ShS_{h} includes each point in T~h\tilde{T}_{h} with 1/21/2 probability, whereas Sh1S_{h-1}^{\prime} either includes the entirety of T~h\tilde{T}_{h} with 1/21/2 probability or includes none of T~h\tilde{T}_{h}. Therefore, by Proposition 5.5, using the negative-submodularity of kk-means [17], we have that 𝔼[cost(𝒟,Sh)]𝔼[cost(𝒟,Sh1)]\mathbb{E}[\text{cost}(\mathcal{D},S_{h})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1}^{\prime})].
So, we have that 𝔼[cost(𝒟,Sh)]𝔼[cost(𝒟,Sh1)]𝔼[cost(𝒟,Sh1)]\mathbb{E}[\text{cost}(\mathcal{D},S_{h})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1}^{\prime})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{h-1})], which means that

𝔼[cost(𝒟,S)]=𝔼[cost(𝒟,Ss)]𝔼[cost(𝒟,Ss1)]𝔼[cost(𝒟,S1)]𝔼[cost(𝒟,S0)].\mathbb{E}[\text{cost}(\mathcal{D},S)]=\mathbb{E}[\text{cost}(\mathcal{D},S_{s})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{s-1})]\leq\cdots\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{1})]\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})]. (18)
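The coupling S₀, …, Sₛ can be made concrete with a short sketch; the data layout and the helper `coupled_sets` are illustrative (here `p` plays the role of p′):

```python
import random

def coupled_sets(I1, groups, s, p, rng):
    """Illustrative sketch of the coupled chain S_0, ..., S_s.

    groups[l] = (x_l, rest_l), with rest_l standing for T_l minus {x_l};
    p plays the role of p' from the proof.
    """
    # T~_l: each point of rest_l survives independently with probability 2p
    T_tilde = [{i for i in rest if rng.random() < 2 * p} for _, rest in groups]
    # T_l^(0): always x_l, plus each T~_l point independently with probability 1/2
    T0 = [{x} | {i for i in T_tilde[l] if rng.random() < 0.5}
          for l, (x, _) in enumerate(groups)]
    # T_l^(1): fair coin -- heads keeps only x_l, tails keeps all of T~_l
    T1 = [{x} if rng.random() < 0.5 else set(T_tilde[l])
          for l, (x, _) in enumerate(groups)]
    # S_h uses T^(0) on the first h groups and T^(1) on the remaining ones
    return [set(I1).union(*T0[:h], *T1[h:]) for h in range(s + 1)]

rng = random.Random(7)
groups = [("x0", ["a", "b"]), ("x1", ["c", "d"]), ("x2", ["e"])]
chain = coupled_sets({"f"}, groups, s=2, p=0.3, rng=rng)
# consecutive sets differ only inside group h, where T_h^(1) is swapped for T_h^(0)
for h in range(1, len(chain)):
    diff = chain[h] ^ chain[h - 1]
    assert diff <= {groups[h - 1][0]} | set(groups[h - 1][1])
```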

In summary,

𝔼[cost(𝒟,S)]\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S)] 𝔼[cost(𝒟,S0)]\displaystyle\leq\mathbb{E}[\text{cost}(\mathcal{D},S_{0})]
ρ(p2C)(1+O(ε))[j𝒟αj(λ1n)k+2C(λ1n)|I2I3|]+O(γ)OPTk\displaystyle\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{2}{C}\cdot\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
ρ(p2C)(1+O(ε))(1+2/C1/2p)[j𝒟αj(λ1n)k]+O(γ)OPTk\displaystyle\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left(1+\frac{2/C}{1/2-p}\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
ρ(p2C)(1+O(ε))(1+200C)[j𝒟αj(λ1n)k]+O(γ)OPTk.\displaystyle\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left(1+\frac{200}{C}\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

Above, the first line follows from Equation (18), the second line follows from Equation (16), the third line follows from Equation (17), and the fourth line follows since 1/2p0.01.1/2-p\geq 0.01. So, with probability at least 1110C1-\frac{1}{10C}, |S|k|S|\leq k, and by Markov’s inequality,

cost(𝒟,S)ρ(p2C)(1+O(ε))(1+300C)[j𝒟αj(λ1n)k]+O(γ)OPTk\text{cost}(\mathcal{D},S)\leq\rho\left(p-\frac{2}{C}\right)\cdot(1+O(\varepsilon))\cdot\left(1+\frac{300}{C}\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}

with probability at least 10C\frac{10}{C}. So, both of these hold simultaneously with probability at least 9C,\frac{9}{C}, and by repeating the procedure O(C)O(C) times, we will find our desired set SS with probability 9/109/10. ∎

Our upper bounds on the cost have so far been in terms of the quantity j𝒟αj(λ1n)k.\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k. This value is at most roughly OPTk\text{OPT}_{k}; specifically:

Proposition 5.15.

If α(+1)\alpha^{(\ell+1)} is a feasible solution to DUAL(λ+1n),\text{DUAL}(\lambda+\frac{1}{n}), then j𝒟αj(λ+1n)kOPTk.\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{1}{n}\right)\cdot k\leq\text{OPT}_{k}.

Proof.

Recall that αj=min(αj(),αj(+1)),\alpha_{j}=\min(\alpha_{j}^{(\ell)},\alpha_{j}^{(\ell+1)}), and that {αj(+1)}\{\alpha_{j}^{(\ell+1)}\} is a feasible solution to DUAL(λ+1n)\text{DUAL}(\lambda+\frac{1}{n}). Therefore, by duality, we have that

j𝒟αj(λ+1n)kj𝒟αj(+1)(λ+1n)kOPTk.\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{1}{n}\right)\cdot k\leq\sum_{j\in\mathcal{D}}\alpha_{j}^{(\ell+1)}-\left(\lambda+\frac{1}{n}\right)\cdot k\leq\text{OPT}_{k}.\qed

One potential issue is that if our goal is to obtain a good approximation to optimal kk-means, the γOPTk\gamma\cdot\text{OPT}_{k^{\prime}} error, which should be negligible, may appear too large if kk^{\prime} is smaller than kk. To fix this, we show that OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}) in certain cases, which we will later show are satisfied. For the following lemma, we return to considering a single roundable solution 𝒮=(α,z,S,𝒟S)\mathcal{S}=(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S}), and let 𝒱\mathcal{V} represent the set of tight or special facilities corresponding to 𝒮.\mathcal{S}.

Lemma 5.16.

Let (α,z,S,𝒟S)(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S}) be (λ,k)(\lambda^{\prime},k^{\prime})-roundable for some λ0\lambda^{\prime}\geq 0, where S=\mathcal{F}_{S}=\emptyset and the set of corresponding bad clients is 𝒟B=\mathcal{D}_{B}=\emptyset. Define {H(δ)}\{H(\delta)\} as the corresponding family of conflict graphs, with some fixed nested quasi-independent set (I1,I2,I3)(I_{1},I_{2},I_{3}). Suppose that kmin(n1,|I1|+p|I2I3|)k\leq\min\left(n-1,|I_{1}|+p\cdot|I_{2}\cup I_{3}|\right) for some p0.49p\leq 0.49, and that k=min(k,|I1|)k^{\prime}=\min(k,|I_{1}|). Then, OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O\left(\text{OPT}_{k}\right).

Proof.

If k=kk^{\prime}=k, the result is trivial. So, we assume that k=|I1|k^{\prime}=|I_{1}|.

Since 𝒟B\mathcal{D}_{B} is empty, we have that every client jj has a tight witness w(j)w(j) (since there are no special facilities) such that (1+ε)αjc(j,w(j))(1+\varepsilon)\cdot\alpha_{j}\geq c(j,w(j)) and (1+ε)αjτw(j)(1+\varepsilon)\cdot\alpha_{j}\geq\tau_{w(j)}. In addition, we have that for any iI1I2I3i\in I_{1}\cup I_{2}\cup I_{3}, ii is tight which means jN(i)(αjc(j,i))=zi[λ,λ+1n]\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=z_{i}\in[\lambda^{\prime},\lambda^{\prime}+\frac{1}{n}]. Therefore,

j𝒟[αjiN(j)I1(αjc(j,i))]=j𝒟αjiI1jN(i)(αjc(j,i))j𝒟αjλ|I1|.\sum_{j\in\mathcal{D}}\left[\alpha_{j}-\sum_{i\in N(j)\cap I_{1}}(\alpha_{j}-c(j,i))\right]=\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{i\in I_{1}}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda^{\prime}\cdot|I_{1}|.

Then, we can use the LMP approximation to get that

j𝒟c(j,I1)ρ1(1+O(ε))j𝒟(αjiN(j)I1(αjc(j,i)))ρ1(1+O(ε))(j𝒟αjλ|I1|),\sum_{j\in\mathcal{D}}c(j,I_{1})\leq\rho_{1}\cdot(1+O(\varepsilon))\cdot\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in N(j)\cap I_{1}}(\alpha_{j}-c(j,i))\right)\leq\rho_{1}\cdot(1+O(\varepsilon))\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda^{\prime}\cdot|I_{1}|\right),

where ρ1=ρ(0)\rho_{1}=\rho(0), i.e., assuming that no point in I2I_{2} or I3I_{3} is included as part of the set. In addition, we know that if SS is created by including all of I1I_{1} and each point in I2I3I_{2}\cup I_{3} with probability 12\frac{1}{2}, then

j𝒟αjλ(|I1|+12|I2I3|)\displaystyle\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda^{\prime}\cdot\left(|I_{1}|+\frac{1}{2}|I_{2}\cup I_{3}|\right) 𝔼[j𝒟αjiSjN(i)(αjc(j,i))]\displaystyle\geq\mathbb{E}\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{i\in S}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\right]
=j𝒟𝔼[αjiN(j)S(αjc(j,i))]\displaystyle=\sum_{j\in\mathcal{D}}\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]
0.\displaystyle\geq 0.

Above, the first inequality follows since jN(i)(αjc(j,i))=ziλ\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=z_{i}\geq\lambda^{\prime} for any tight ii, and the final inequality follows because of Lemma 5.9.

To summarize, we have that there exists a constant θ=1ρ1(1+O(ε))\theta=\frac{1}{\rho_{1}\cdot(1+O(\varepsilon))} such that

θj𝒟c(j,I1)\displaystyle\theta\cdot\sum_{j\in\mathcal{D}}c(j,I_{1}) j𝒟αj|I1|λ\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-|I_{1}|\cdot\lambda^{\prime} (19)
and
0\displaystyle 0 j𝒟αj(|I1|+12|I2I3|)λ.\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(|I_{1}|+\frac{1}{2}\cdot|I_{2}\cup I_{3}|\right)\cdot\lambda^{\prime}. (20)

Therefore, by taking a weighted average of Equations (19) and (20), we get

(12p)θOPTk(12p)θj𝒟c(j,I1)j𝒟αj(|I1|+p|I2I3|)λj𝒟αjkλ,(1-2p)\cdot\theta\cdot\text{OPT}_{k^{\prime}}\leq(1-2p)\cdot\theta\cdot\sum_{j\in\mathcal{D}}c(j,I_{1})\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(|I_{1}|+p\cdot|I_{2}\cup I_{3}|\right)\cdot\lambda^{\prime}\leq\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot\lambda^{\prime},

where the first inequality is true since k=|I1|k^{\prime}=|I_{1}| and the last inequality is true since |I1|+p|I2I3|k.|I_{1}|+p\cdot|I_{2}\cup I_{3}|\geq k. Thus, since p0.49p\leq 0.49 and since ρ1=O(1)\rho_{1}=O(1), we have that OPTk=O(j𝒟αjkλ)\text{OPT}_{k^{\prime}}=O\left(\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot\lambda^{\prime}\right).
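The weighted average works because the facility-count coefficients combine exactly: (1−2p)·|I1| + 2p·(|I1| + ½|I2 ∪ I3|) = |I1| + p·|I2 ∪ I3|. A quick numeric check with illustrative values:

```python
# The facility-count coefficients in the weighted average combine exactly:
#   (1 - 2p) * |I1| + 2p * (|I1| + M/2) = |I1| + p*M,   M = |I2 ∪ I3|.
for p in [0.1, 0.25, 0.49]:
    for a, M in [(5, 12), (100, 7), (3, 1000)]:  # a stands for |I1|
        lhs = (1 - 2 * p) * a + 2 * p * (a + M / 2)
        assert abs(lhs - (a + p * M)) < 1e-9
```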

Finally, since {αj}\{\alpha_{j}\} is a feasible solution to DUAL(λ+1n)\text{DUAL}(\lambda^{\prime}+\frac{1}{n}), this means that j𝒟αjkλ=j𝒟αjk(λ+1n)+knOPTk+kn.\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot\lambda^{\prime}=\sum_{j\in\mathcal{D}}\alpha_{j}-k\cdot(\lambda^{\prime}+\frac{1}{n})+\frac{k}{n}\leq\text{OPT}_{k}+\frac{k}{n}. However, if kn1k\leq n-1, then kn1OPTk\frac{k}{n}\leq 1\leq\text{OPT}_{k}. So, OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}). ∎

Recall that H(δ)H(\delta) represents the conflict graph H(,r)(δ)H^{(\ell,r)}(\delta). We will also let H(δ)H^{\prime}(\delta) represent the conflict graph H(,r+1)(δ)H^{(\ell,r+1)}(\delta). In that case, H(δ)H^{\prime}(\delta) is the same as H(δ)H(\delta) except with one vertex removed. Recall (I1,I2,I3)(I_{1},I_{2},I_{3}) was a nested quasi-independent set of {H(δ)}\{H(\delta)\}, and let (I1,I2,I3)(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}) be a nested quasi-independent set of {H(δ)}\{H^{\prime}(\delta)\}, such that |I1\I1|1|I_{1}\backslash I_{1}^{\prime}|\leq 1 and |I1|+p1|I2I3|k>|I1|+p1|I2I3||I_{1}|+p_{1}|I_{2}\cup I_{3}|\geq k>|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|.

Theorem 5.17.

Let C>0C>0 be an arbitrarily large constant, and ε>0\varepsilon>0 be an arbitrarily small constant. Given the sets (I,I)(I,I^{\prime}) obtained in Algorithm 2, in polynomial time we can obtain a solution for Euclidean kk-means with approximation factor at most

(1+ε)maxr1min(ρ(p1r),ρ(p1)(1+14r(r2p11))).\left(1+\varepsilon\right)\cdot\max_{r\geq 1}\min\left(\rho\left(\frac{p_{1}}{r}\right),\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{r}{2p_{1}}-1\right)}\right)\right). (21)
Proof.

First, we remark that it suffices to obtain a set of facilities of size at most k+c0C4k+c_{0}\cdot C^{4} with cost at most KOPTkc0C4K\cdot\text{OPT}_{k-c_{0}\cdot C^{4}} for any fixed constant c0c_{0} (for all 1kn11\leq k\leq n-1), where KK is the value in Equation (21). Indeed, we can apply Lemma 5.2 to obtain a solution of size kc0C4k-c_{0}\cdot C^{4} and cost (1+ε)KOPTkc0C4(1+\varepsilon)\cdot K\cdot\text{OPT}_{k-c_{0}\cdot C^{4}} in polynomial time, hence obtaining a (1+ε)K(1+\varepsilon)\cdot K-approximate solution for (kc0C4)(k-c_{0}\cdot C^{4})-means clustering for all 1kn11\leq k\leq n-1. Thus, we get the desired approximation ratio for all knc0C41k\leq n-c_{0}C^{4}-1, but for knc0C41k\geq n-c_{0}C^{4}-1, we can enumerate all the (2n)c0C41(2n)^{c_{0}C^{4}-1} different clusterings of the input that have at most c0C41c_{0}C^{4}-1 non-singleton parts and solve kk-clustering exactly in nO(C4)n^{O(C^{4})} time.

Algorithm 2 stops once we have found the first hybrid conflict graph 𝒱(,r)\mathcal{V}^{(\ell,r)} for some r1r\geq 1 where the corresponding nested quasi-independent set (I1(,r),I2(,r),I3(,r))(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)}) satisfies |I1(,r)|+p1|I2(,r)I3(,r)|<k|I_{1}^{(\ell,r)}|+p_{1}\cdot|I_{2}^{(\ell,r)}\cup I_{3}^{(\ell,r)}|<k. Let 𝒱:=𝒱(,r)\mathcal{V}^{\prime}:=\mathcal{V}^{(\ell,r)} and let (I1,I2,I3):=(I1(,r),I2(,r),I3(,r))(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}):=(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)}). In addition, let 𝒱:=𝒱(,r1)\mathcal{V}:=\mathcal{V}^{(\ell,r-1)} and (I1,I2,I3):=(I1(,r1),I2(,r1),I3(,r1))(I_{1},I_{2},I_{3}):=(I_{1}^{(\ell,r-1)},I_{2}^{(\ell,r-1)},I_{3}^{(\ell,r-1)}). If r=1r=1 then r1=0r-1=0, which may be problematic since our previous lemmas can only be used for r1r\geq 1. However, we note that I(,0)=I()=I(1,p1)I^{(\ell,0)}=I^{(\ell)}=I^{(\ell-1,p_{\ell-1})} if 1\ell\geq 1, and that I(0,0)=I(0)I^{(0,0)}=I^{(0)} was previously labeled as I(q)=I(q1,pq1).I^{(q)}=I^{(q-1,p_{q-1})}. The only exception to this is the case when =0,r=1\ell=0,r=1, and I(0,0)I^{(0,0)} is the initialized solution created in the first line of the algorithm. In this case, however, recall from our initialization that I1(0,0)=I_{1}^{(0,0)}=\mathcal{F} is the full set of facilities, and I1(0,1)I_{1}^{(0,1)} will just be an extension of this set, so |I1(0,1)|||k|I_{1}^{(0,1)}|\geq|\mathcal{F}|\geq k. Therefore, I=(I1,I2,I3)I=(I_{1},I_{2},I_{3}) and I=(I1,I2,I3)I^{\prime}=(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}) are both expressible as nested quasi-independent sets of merged conflict graphs. 
However, if I=I(0,1)I=I^{(0,1)} and I=I(0,0)I^{\prime}=I^{(0,0)}, then we may need to express I=I(q1,pq1+1)I^{\prime}=I^{(q-1,p_{q-1}+1)} based on a previous labeling, so it is possible that II comes from a (λ+1n,k)(\lambda+\frac{1}{n},k^{\prime})-roundable solution and II^{\prime} comes from a (λ,k)(\lambda,k^{\prime})-roundable solution, rather than both nested quasi-independent sets coming from (λ,k)(\lambda,k^{\prime})-roundable solutions.

First, we show that, for the value of kk^{\prime} at the end of the algorithm (at which point all solutions found are (λ,k)(\lambda^{\prime},k^{\prime})-roundable for some λ\lambda^{\prime}), we have OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}). To see why this is the case, note that either k=kk^{\prime}=k, so the claim is trivial, or k=|I1(0)|k^{\prime}=|I_{1}^{(0)}| for some I1(0)I_{1}^{(0)} that corresponds to a solution that was at some point labeled as 𝒮(0)\mathcal{S}^{(0)}. Note that the corresponding nested quasi-independent set (I1(0),I2(0),I3(0))(I_{1}^{(0)},I_{2}^{(0)},I_{3}^{(0)}) is not the final set (I1,I2,I3)(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}), because if so we would have stopped the algorithm before we decided to label 𝒮(0)\mathcal{S}^{(0)} as such. Therefore, k|I1(0)|+p1|I2(0)I3(0)|k\leq|I_{1}^{(0)}|+p_{1}\cdot|I_{2}^{(0)}\cup I_{3}^{(0)}| and we are assuming that kn1.k\leq n-1. Finally, since (I1(0),I2(0),I3(0))(I_{1}^{(0)},I_{2}^{(0)},I_{3}^{(0)}) arises from a (λ,k)(\lambda^{\prime},k^{\prime})-roundable solution with no special facilities or bad clients (by Condition 2), we may apply Lemma 5.16 to obtain that OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}).

Note that |I1||I1|1|I_{1}^{\prime}|\geq|I_{1}|-1, and that |I1||I1|+p1|I2I3|<k|I_{1}^{\prime}|\leq|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|<k, which means that |I1|<k+1|I_{1}|<k+1 so |I1|k|I_{1}|\leq k. First, suppose that |I1|k100C4|I_{1}|\geq k-100C^{4}, where we recall that CC is an arbitrarily large but fixed constant. In this case, this means that |I1|k100C41|I_{1}^{\prime}|\geq k-100C^{4}-1 and p1|I2I3|100C4+1p_{1}\cdot|I_{2}^{\prime}\cup I_{3}^{\prime}|\leq 100C^{4}+1 so |I2I3|=O(C4).|I_{2}^{\prime}\cup I_{3}^{\prime}|=O(C^{4}). In this case, we can apply Lemma 5.13 to find a randomized set I1SI1I2I3I_{1}^{\prime}\subset S\subset I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime} such that

𝔼[cost(𝒟,S)]ρ(p1)(1+O(ε))[j𝒟αj(λ1n)𝔼[|S|]]+O(γ)OPTk,\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda-\frac{1}{n}\right)\cdot\mathbb{E}[|S|]\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}},

where {αj}j𝒟\{\alpha_{j}^{\prime}\}_{j\in\mathcal{D}} corresponds to the merged solution that produces (I1,I2,I3)(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}). Since |I2I3|=O(C4)|I_{2}^{\prime}\cup I_{3}^{\prime}|=O(C^{4}), we can try every possible I1SI1I2I3I_{1}^{\prime}\subset S\subset I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime} to get a deterministic set SS of size at most |I1|+|I2I3|k+O(C4)|I_{1}^{\prime}|+|I_{2}^{\prime}\cup I_{3}^{\prime}|\leq k+O(C^{4}) and size at least |I1|kO(C4)|I_{1}^{\prime}|\geq k-O(C^{4}) such that

cost(𝒟,S)\displaystyle\text{cost}(\mathcal{D},S) ρ(p1)(1+O(ε))[j𝒟αj(λ1n)|S|]+O(γ)OPTk\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda-\frac{1}{n}\right)\cdot|S|\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
ρ(p1)(1+O(ε))[j𝒟αj(λ+2n)|S|]+O(1/n)|S|+O(γ)OPTk\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}^{\prime}-\left(\lambda+\frac{2}{n}\right)\cdot|S|\right]+O(1/n)\cdot|S|+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
ρ(p1)(1+O(ε))OPT|S|+O(γ)OPTk.\displaystyle\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{|S|}+O(\gamma)\cdot\text{OPT}_{k}.

The final line follows by Proposition 5.15, since OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}), and since |S|k+O(C4)O(k)|S|\leq k+O(C^{4})\leq O(k) so O(1/n)|S|=O(1)O(γ)OPTkO(1/n)\cdot|S|=O(1)\leq O(\gamma)\cdot\text{OPT}_{k}. Therefore, there exists an absolute constant c0c_{0} such that we have a set of size at most |I1I2I3|k+c0C4|I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime}|\leq k+c_{0}\cdot C^{4} with cost at most ρ(p1)(1+O(ε))OPT|S|+O(γ)OPTkρ(p1)(1+O(ε))OPTkc0C4\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{|S|}+O(\gamma)\cdot\text{OPT}_{k}\leq\rho(p_{1})\cdot(1+O(\varepsilon))\cdot\text{OPT}_{k-c_{0}\cdot C^{4}}. As argued in the first paragraph of this proof, this is sufficient since we can apply Lemma 5.2.

Otherwise, namely when |I1|k100C4|I_{1}|\leq k-100C^{4}, we have |I2I3|100C4|I_{2}\cup I_{3}|\geq 100C^{4} since |I1|+p1|I2I3|k|I_{1}|+p_{1}|I_{2}\cup I_{3}|\geq k. Then, recall that |I1\I1|1,|I_{1}\backslash I_{1}^{\prime}|\leq 1, so let κ=|I1\I1|{0,1}\kappa=|I_{1}\backslash I_{1}^{\prime}|\in\{0,1\}. Set t>0t>0 and c0c\geq 0 such that |I1|+p1|I2I3|=k+ct|I_{1}|+p_{1}|I_{2}\cup I_{3}|=k+c\cdot t and |I1|+p1|I2I3|=kt.|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|=k-t. Then, |I1I1|=k(1+d)t|I_{1}\cap I_{1}^{\prime}|=k-(1+d)t for some d0d\geq 0, so |I1|=k(1+d)t+κ|I_{1}|=k-(1+d)t+\kappa. In this case, p1|I2I3|=(1+c+d)tκp_{1}|I_{2}\cup I_{3}|=(1+c+d)t-\kappa, so if we set p=p1(1+d)tκ(1+c+d)tκp=p_{1}\cdot\frac{(1+d)t-\kappa}{(1+c+d)t-\kappa}, then |I1|+p|I2I3|=k|I_{1}|+p|I_{2}\cup I_{3}|=k. Also, since |I2I3|100C4|I_{2}\cup I_{3}|\geq 100C^{4}, we have that (1+c+d)tp1100C4C(1+c+d)t\geq p_{1}\cdot 100C^{4}\geq C, so p=p11+d1+c+dηp=p_{1}\cdot\frac{1+d}{1+c+d}-\eta for some η1/C\eta\leq 1/C. In this case, assuming that p>0.01,p>0.01, we can use Lemma 5.14 to obtain a solution of size at most kk with cost at most

ρ(p11+d1+c+dη2C)(1+300C)(1+O(ε))[j𝒟αj(λ1n)k]+O(γ)OPTk.\rho\left(p_{1}\cdot\frac{1+d}{1+c+d}-\eta-\frac{2}{C}\right)\cdot\left(1+\frac{300}{C}\right)\cdot(1+O(\varepsilon))\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.
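As a sanity check on the bookkeeping of t, c, d, and κ above, one can verify numerically that the chosen p indeed gives |I1| + p·|I2 ∪ I3| = k (values below are illustrative):

```python
# With |I1| = k - (1+d)t + kappa and p1*|I2 ∪ I3| = (1+c+d)t - kappa,
# the choice p = p1 * ((1+d)t - kappa) / ((1+c+d)t - kappa) gives
# |I1| + p*|I2 ∪ I3| = k, as claimed above.
p1 = 0.402
for (k, t, c, d, kappa) in [(50, 3, 1.0, 0.5, 1), (200, 10, 0.0, 2.0, 0)]:
    I1_size = k - (1 + d) * t + kappa
    M = ((1 + c + d) * t - kappa) / p1  # M stands for |I2 ∪ I3|
    p = p1 * ((1 + d) * t - kappa) / ((1 + c + d) * t - kappa)
    assert abs(I1_size + p * M - k) < 1e-9
```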

Now, since p11+d1+c+dp1=0.402p_{1}\cdot\frac{1+d}{1+c+d}\leq p_{1}=0.402, it is straightforward to verify that ρ\rho has bounded derivative. (Indeed, each case produces a function that is continuously differentiable on [0,0.5)[0,0.5), so has bounded derivative on [0,0.402][0,0.402].) Therefore, since η1C\eta\leq\frac{1}{C}, the solution in fact has cost at most

ρ(p11+d1+c+d)(1+O(1C+ε))[j𝒟αj(λ1n)k]+O(γ)OPTk.\rho\left(p_{1}\cdot\frac{1+d}{1+c+d}\right)\cdot\left(1+O\left(\frac{1}{C}+\varepsilon\right)\right)\cdot\left[\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

But since OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}), and since

j𝒟αj(λ1n)k=j𝒟αj(λ+2n)k+3knOPTk+3(1+O(γ))OPTk,\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k=\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{2}{n}\right)\cdot k+\frac{3k}{n}\leq\text{OPT}_{k}+3\leq(1+O(\gamma))\cdot\text{OPT}_{k},

we obtain a solution of cost at most

ρ(p11+d1+c+d)(1+O(1C+ε+γ))OPTk.\rho\left(p_{1}\cdot\frac{1+d}{1+c+d}\right)\cdot\left(1+O\left(\frac{1}{C}+\varepsilon+\gamma\right)\right)\cdot\text{OPT}_{k}. (22)

In addition, note that we can use Lemma 5.14 to obtain a solution SS for (k+ct)(k+ct)-means and SS^{\prime} for (kt)(k-t)-means. Also, |SS|=|S|+|S||SS||S|+|S||I1I1|=k+(c+d)t.|S\cup S^{\prime}|=|S|+|S^{\prime}|-|S\cap S^{\prime}|\leq|S|+|S^{\prime}|-|I_{1}\cap I_{1}^{\prime}|=k+(c+d)t. So, if we define ρ:=max0η1/Cρ(p1η2C)(1+300C)(1+O(ε))=ρ(p1)(1+O(1C+ε))\rho^{\prime}:=\max_{0\leq\eta\leq 1/C}\rho(p_{1}-\eta-\frac{2}{C})\cdot(1+\frac{300}{C})\cdot(1+O(\varepsilon))=\rho(p_{1})\cdot(1+O(\frac{1}{C}+\varepsilon)), then

cost(𝒟,S)ρ(jDαj(λ1n)(kt))+O(γ)OPTk\text{cost}(\mathcal{D},S^{\prime})\leq\rho^{\prime}\cdot\left(\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\right)+O(\gamma)\cdot\text{OPT}_{k^{\prime}} (23)

and

cost(𝒟,SS)cost(𝒟,S)ρ(jDαj(λ1n)(k+ct))+O(γ)OPTk.\text{cost}(\mathcal{D},S\cup S^{\prime})\leq\text{cost}(\mathcal{D},S)\leq\rho^{\prime}\cdot\left(\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k+ct)\right)+O(\gamma)\cdot\text{OPT}_{k^{\prime}}. (24)

Therefore, by Proposition 5.6, if we randomly add tt of the items in S\SS\backslash S^{\prime}, we will get a set S′′S^{\prime\prime} of size kk with expected cost

𝔼[cost(𝒟,S′′)]\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})] ρ(c+d1+c+d(jDαj(λ1n)(kt))\displaystyle\leq\rho^{\prime}\cdot\Bigg{(}\frac{c+d}{1+c+d}\cdot\Bigg{(}\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\Bigg{)}
+11+c+d(jDαj(λ1n)(k+ct)))+O(γ)OPTk\displaystyle\hskip 96.73918pt+\frac{1}{1+c+d}\cdot\Bigg{(}\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k+ct)\Bigg{)}\Bigg{)}+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
=ρ(jDαj(λ1n)k+d1+c+d(λ1n)t)+O(γ)OPTk.\displaystyle=\rho^{\prime}\cdot\Bigg{(}\sum_{j\in D}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+\frac{d}{1+c+d}\cdot\left(\lambda-\frac{1}{n}\right)\cdot t\Bigg{)}+O(\gamma)\cdot\text{OPT}_{k}. (25)
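To make the averaging in (25) explicit: the bound (23) enters with weight c+d over 1+c+d and the bound (24) with weight 1 over 1+c+d, so the coefficient of the term multiplying λ - 1/n averages to

```latex
\frac{c+d}{1+c+d}\,(k-t) + \frac{1}{1+c+d}\,(k+ct)
= \frac{(1+c+d)k - (c+d)t + ct}{1+c+d}
= k - \frac{d}{1+c+d}\,t,
```

which is exactly how the extra term with the factor d/(1+c+d) arises in (25).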

Note that if λ1n0\lambda-\frac{1}{n}\leq 0, then this means that 𝔼[cost(𝒟,S′′)]ρ(j𝒟αj)+O(γ)OPTk,\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})]\leq\rho^{\prime}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}\right)+O(\gamma)\cdot\text{OPT}_{k}, and Proposition 5.15 tells us that j𝒟αjOPTk+O(k/n)(1+O(γ))OPTk\sum_{j\in\mathcal{D}}\alpha_{j}\leq\text{OPT}_{k}+O(k/n)\leq(1+O(\gamma))\cdot\text{OPT}_{k}, so the expected cost is at most ρ(p1)(1+O(1C+ε+γ))OPTk.\rho(p_{1})\cdot(1+O(\frac{1}{C}+\varepsilon+\gamma))\cdot\text{OPT}_{k}. Alternatively, we may assume that λ1n>0\lambda-\frac{1}{n}>0.

In this case, note that by Lemmas 5.9 and 5.11, we have that

j𝒟αj(λ1n)(|I1|+12|I2I3|)+4γOPTk0.\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(|I_{1}|+\frac{1}{2}|I_{2}\cup I_{3}|\right)+4\gamma\cdot\text{OPT}_{k^{\prime}}\geq 0.

We can rewrite |I1|+12|I2I3||I_{1}|+\frac{1}{2}|I_{2}\cup I_{3}| as (k(1+d)t+κ)+12p1((1+c+d)tκ)=k+t(12p1(1+c+d)(1+d))O(1)k+t(12p1(1+c+d)(1+d))(1O(1)C4)(k-(1+d)t+\kappa)+\frac{1}{2p_{1}}\cdot((1+c+d)t-\kappa)=k+t\cdot\left(\frac{1}{2p_{1}}(1+c+d)-(1+d)\right)-O(1)\geq k+t\cdot\left(\frac{1}{2p_{1}}(1+c+d)-(1+d)\right)\cdot\left(1-\frac{O(1)}{C^{4}}\right), where the last inequality is true since t[12p1(1+c+d)(1+d)]Ω(t(1+c+d))Ω(|I2I3|)Ω(C4)t\cdot\left[\frac{1}{2p_{1}}(1+c+d)-(1+d)\right]\geq\Omega(t\cdot(1+c+d))\geq\Omega(|I_{2}\cup I_{3}|)\geq\Omega(C^{4}), as |I2I3|100C4|I_{2}\cup I_{3}|\geq 100C^{4}. Thus, we have that

j𝒟αj(λ1n)k+4γOPTk(λ1n)t(12p1(1+c+d)(1+d))(1O(1)C4).\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+4\gamma\cdot\text{OPT}_{k^{\prime}}\geq\left(\lambda-\frac{1}{n}\right)\cdot t\cdot\left(\frac{1}{2p_{1}}(1+c+d)-(1+d)\right)\cdot\left(1-\frac{O(1)}{C^{4}}\right).

Therefore, combining the above equation with (25), we have that

𝔼[cost(𝒟,S′′)]\displaystyle\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})] ρ(j𝒟αj(λ1n)k+O(γ)OPTk)(1+d1+c+d12p1(1+c+d)(1+d)(1+O(1)C4))\displaystyle\leq\rho^{\prime}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+O(\gamma)\cdot\text{OPT}_{k^{\prime}}\right)\cdot\left(1+\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}\cdot\left(1+\frac{O(1)}{C^{4}}\right)\right)
ρ(j𝒟αj(λ+2n)k+O(γ)OPTk)(1+d1+c+d12p1(1+c+d)(1+d)(1+O(1)C4))\displaystyle\leq\rho^{\prime}\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{2}{n}\right)\cdot k+O(\gamma)\cdot\text{OPT}_{k}\right)\cdot\left(1+\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}\cdot\left(1+\frac{O(1)}{C^{4}}\right)\right)
ρ(1+O(1C+γ))(1+d1+c+d12p1(1+c+d)(1+d))OPTk.\displaystyle\leq\rho^{\prime}\cdot\left(1+O(\frac{1}{C}+\gamma)\right)\cdot\left(1+\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}\right)\cdot\text{OPT}_{k}. (26)

If we set r1r\geq 1 such that c=(r1)(1+d)c=(r-1)(1+d), then 1+d1+c+d=1r\frac{1+d}{1+c+d}=\frac{1}{r} and

d1+c+d12p1(1+c+d)(1+d)=dr(1+d)1(r2p11)(1+d)=d(1+d)21r(r2p11)14r(r2p11).\frac{\frac{d}{1+c+d}}{\frac{1}{2p_{1}}(1+c+d)-(1+d)}=\frac{d}{r(1+d)}\cdot\frac{1}{(\frac{r}{2p_{1}}-1)\cdot(1+d)}=\frac{d}{(1+d)^{2}}\cdot\frac{1}{r\cdot(\frac{r}{2p_{1}}-1)}\leq\frac{1}{4r\cdot(\frac{r}{2p_{1}}-1)}.
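The final inequality above uses only the elementary bound d/(1+d)² ≤ 1/4 for all d ≥ 0, which follows from

```latex
(1-d)^2 \geq 0
\;\Longleftrightarrow\; (1+d)^2 \geq 4d
\;\Longleftrightarrow\; \frac{d}{(1+d)^2} \leq \frac{1}{4},
```

with equality exactly at d = 1.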

So, by combining Equations (22) and (26), we can always guarantee an approximation factor of at most

(1+O(1C+ε+γ))maxr1min(ρ(p1r),ρ(p1)(1+14r(r2p11))).\left(1+O(\frac{1}{C}+\varepsilon+\gamma)\right)\cdot\max_{r\geq 1}\min\left(\rho\left(\frac{p_{1}}{r}\right),\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{r}{2p_{1}}-1\right)}\right)\right). (27)

This approximation factor also holds in the case when λ1n0\lambda-\frac{1}{n}\leq 0, by setting r=1r=1. So, by letting CC be an arbitrarily large constant and γε1\gamma\ll\varepsilon\ll 1 be arbitrarily small constants, the result follows. ∎

Since we have set p1=0.402p_{1}=0.402 and δ1=4+827,δ2=2,δ3=0.265\delta_{1}=\frac{4+8\sqrt{2}}{7},\delta_{2}=2,\delta_{3}=0.265, by Proposition 4.7 we have ρ(p1)=3+22\rho(p_{1})=3+2\sqrt{2}. If r3.221,r\geq 3.221, one can verify that (3+22)(1+14r(r2p11))5.979(3+2\sqrt{2})\cdot\left(1+\frac{1}{4r\cdot(\frac{r}{2p_{1}}-1)}\right)\leq 5.979. Alternatively, if 1r3.221,1\leq r\leq 3.221, then p1r.4023.221.1248,\frac{p_{1}}{r}\geq\frac{.402}{3.221}\geq.1248, and it is straightforward to verify that ρ(p)5.979\rho(p)\leq 5.979 for all p[.1248,.402]p\in[.1248,.402] using Proposition 4.7. (We remark that while Proposition 4.7 follows from Lemma 5.19, Lemma 5.19 only depends on Lemma 4.2, so there is no circular reasoning.) Hence, we have a 5.979\boxed{5.979}-approximation to kk-means.
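These numerical claims are quick to verify directly; a small check with the values fixed in the text (p_1 = 0.402 and rho(p_1) = 3 + 2√2):

```python
import math

# Numerical check of the two branches combined in (27).
p1 = 0.402
rho_p1 = 3 + 2 * math.sqrt(2)

def second_branch(r):
    # rho(p_1) * (1 + 1 / (4r (r/(2 p_1) - 1)))
    return rho_p1 * (1 + 1 / (4 * r * (r / (2 * p1) - 1)))

assert second_branch(3.221) <= 5.979            # bound at the threshold r = 3.221
assert all(second_branch(r) <= second_branch(3.221)
           for r in [3.5, 4.0, 5.0, 10.0, 100.0])  # the bound only improves as r grows
assert p1 / 3.221 >= 0.1248                     # so the first branch covers 1 <= r <= 3.221
```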

5.4 Improving the approximation further

First, we define some important quantities. For any client jj, we define Aj:=αjiN (j)I1(αjc(j,i))A_{j}:=\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i)), and we define Bj:=iN (j)(I2I3)(αjc(j,i))B_{j}:=\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)). Note that Bj0B_{j}\geq 0 always.

We split the clients into 55 groups. Let 𝒟1\mathcal{D}_{1} be the set of all clients j𝒟Bj\not\in\mathcal{D}_{B} corresponding to subcases 1.a, 1.c, 1.d, 1.g.ii, 1.h, 2.a, 3.a, 4.a.i, 4.b.i, and 4.c, as well as the clients in 5.a where there do not exist i2N (j)I2i_{2}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2} and i3N (j)I3i_{3}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3} such that q(i3)=i2q(i_{3})=i_{2}. (In the casework, our choice of aa is |N (j)S||\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap S| rather than |N(j)S||N(j)\cap S|; similarly for bb and cc.) Let Q1Q_{1} be the sum of AjA_{j} for these clients, and R1R_{1} be the sum of BjB_{j} for these clients. Next, let 𝒟2\mathcal{D}_{2} be the set of all clients j𝒟Bj\not\in\mathcal{D}_{B} corresponding to subcases 1.b, 1.e, 4.a.ii, and 4.b.ii. Let 𝒟3\mathcal{D}_{3} be the set of all clients j𝒟Bj\not\in\mathcal{D}_{B} corresponding to subcase 1.g.i. Let 𝒟4\mathcal{D}_{4} be the set of all clients j𝒟Bj\not\in\mathcal{D}_{B} corresponding to subcase 2.d, further restricted to c(j,i2)+c(j,i3)0.25αjc(j,i_{2})+c(j,i_{3})\geq 0.25\cdot\alpha_{j} (or equivalently, in the language of Case 2.d in Lemma 4.2, β2+γ20.25\beta^{2}+\gamma^{2}\geq 0.25). Finally, let 𝒟5\mathcal{D}_{5} be the set of all bad clients j𝒟Bj\in\mathcal{D}_{B}, as well as all remaining subcases (2.b, 2.c, 2.d when β2+γ2<0.25\beta^{2}+\gamma^{2}<0.25, 3.b, 3.c, and the clients in 5.a not covered by 𝒟1\mathcal{D}_{1}). Note that these cover all cases (recall that 1.f is a non-existent case). Lastly, we define Q2,Q3,Q4,Q5,R2,R3,R4,R5Q_{2},Q_{3},Q_{4},Q_{5},R_{2},R_{3},R_{4},R_{5} similarly to how we defined Q1Q_{1} and R1.R_{1}.

Now, we have the following result, which improves over Lemma 5.9.

Lemma 5.18.

For any client jj, Aj12Bj0A_{j}-\frac{1}{2}B_{j}\geq 0. In addition, if the client jj corresponds to any of the subcases in case 11 or case 44, or to subcases 2.a2.a or 3.a3.a, then AjBj0A_{j}-B_{j}\geq 0. Also, if the client jj corresponds to subcase 2.d2.d where β2+γ20.25\beta^{2}+\gamma^{2}\geq 0.25, then Aj47Bj0A_{j}-\frac{4}{7}B_{j}\geq 0.

Remark.

As in Lemma 5.9, this lemma holds even for bad clients j𝒟Bj\in\mathcal{D}_{B}.

Proof.

The proof that Aj12Bj0A_{j}-\frac{1}{2}B_{j}\geq 0 for any client jj is identical to that of Lemma 5.9. So, we focus on the next two claims. For the subcases in Case 1, note that N (j)I3=\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}=\emptyset, so we just need to show that αjiN (j)I1(αjc(j,i))iN (j)I2(αjc(j,i))\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i)), which is implied by Equation (11). Likewise, for the subcases in Case 4, note that N (j)I2=\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}=\emptyset, so we just need to show that αjiN (j)I1(αjc(j,i))iN (j)I3(αjc(j,i))\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i)), which is implied by Equation (12).

We recall that in subcases 2.a and 3.a, we noted in both cases that the points in N (j)I2\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2} and N (j)I3\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3} were all separated in H(δ2)H(\delta_{2}), and that N (j)I1=\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}=\emptyset since a=0a=0. So, we have that αjiN (j)I1(αjc(j,i))=αjiN (j)(I2I3)(αjc(j,i))\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}\geq\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)) in both cases.

Finally, we consider subcase 2.d when β2+γ20.25.\beta^{2}+\gamma^{2}\geq 0.25. In this case, since N (j)I1=\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}=\emptyset, we have (when αj=1\alpha_{j}=1) that 1iN (j)I1(1c(j,i))=11-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(1-c(j,i))=1, and iN (j)(I2I3)(1c(j,i))=(1c(j,i2))+(1c(j,i3))=2β2γ2\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(1-c(j,i))=(1-c(j,i_{2}))+(1-c(j,i_{3}))=2-\beta^{2}-\gamma^{2}, which is at most 1.751.75 if β2+γ20.25.\beta^{2}+\gamma^{2}\geq 0.25. So, for general αj\alpha_{j}, we have that αjiN (j)I1(αjc(j,i))=αj47iN (j)(I2I3)(αjc(j,i)).\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}\geq\frac{4}{7}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)).

Therefore, we have that

R1Q1,R2Q2,R3Q3,R41.75Q4,andR52Q5.R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq Q_{2},\hskip 28.45274ptR_{3}\leq Q_{3},\hskip 28.45274ptR_{4}\leq 1.75\cdot Q_{4},\hskip 14.22636pt\text{and}\hskip 14.22636ptR_{5}\leq 2Q_{5}. (28)

Next, we define ρ(i)(p)\rho^{(i)}(p) to be the maximum fraction ρ(p)\rho(p) obtained from the casework corresponding to the (not bad) clients in 𝒟i\mathcal{D}_{i}. (Note that ρ(p)=max(ρ(1)(p),ρ(2)(p),ρ(3)(p),ρ(4)(p),ρ(5)(p))\rho(p)=\max\left(\rho^{(1)}(p),\rho^{(2)}(p),\rho^{(3)}(p),\rho^{(4)}(p),\rho^{(5)}(p)\right).) We have the following result:

Lemma 5.19.

Let δ1=4+827,\delta_{1}=\frac{4+8\sqrt{2}}{7}, δ2=2\delta_{2}=2, and δ3=0.265\delta_{3}=0.265. Then, for all p[0.096,0.402],p\in[0.096,0.402], we have that ρ(1)(p)3+22\rho^{(1)}(p)\leq 3+2\sqrt{2}, ρ(2)(p)1+2p+(1p)δ1+22p2+(1p)δ1\rho^{(2)}(p)\leq 1+2p+(1-p)\cdot\delta_{1}+2\sqrt{2p^{2}+(1-p)\cdot\delta_{1}}, and ρ(5)(p)5.68\rho^{(5)}(p)\leq 5.68.

Proof.

We start by considering ρ(1)(p)\rho^{(1)}(p), covered by subcases 1.a, 1.c, 1.d, 1.g.ii, 1.h, 2.a, 3.a, 4.a.i, 4.b.i, and 4.c, and certain subcases of 5.a. All subcases except 2.a, 4.c, and 5.a can easily be verified (see our Desmos file for K-means Case 1; the link is in Appendix A). For subcase 2.a, we have to verify it for all choices of c=c21c=c_{2}\geq 1. However, it is simple to see that the numerator of the fraction decreases as c2c_{2} increases whenever p[0,0.5]p\in[0,0.5], so in fact we just have to verify it for c=c2=1c=c_{2}=1, which is straightforward. For subcase 4.c, we have to verify it for all choices of c2c\geq 2. For c=2c=2 it is straightforward to verify. For c3,c\geq 3, since 2.5+23+22,2.5+\sqrt{2}\leq 3+2\sqrt{2}, it suffices to show

(12(12p)+12(12p)c)(1+δ1)2(12+12(12p)c)+p(1+2)2+11p3+22,\frac{\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\right)\cdot(1+\sqrt{\delta_{1}})^{2}-\left(\frac{1}{2}+\frac{1}{2}(1-2p)^{c}\right)+p(1+\sqrt{2})^{2}+1}{1-p}\leq 3+2\sqrt{2},

where we have taken the fraction from 4.c and added back a pc\frac{p}{c} term to the numerator. Now, this fraction is decreasing as cc increases, so it suffices to verify it for c=3c=3, which is straightforward.

The last case for ρ(1)(p)\rho^{(1)}(p) is Case 5.a. We show that in all cases the fraction is bounded by 3+223+2\sqrt{2} for p[0.096,0.402]p\in[0.096,0.402], and if h1h\geq 1 then the fraction can further be bounded by 5.685.68. This is clearly sufficient for bounding ρ(1)(p)\rho^{(1)}(p). It will also be important in bounding ρ(5)(p)\rho^{(5)}(p): indeed, if there exist i2N (j)I2i_{2}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2} and i3N (j)I3i_{3}\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3} such that q(i3)=i2,q(i_{3})=i_{2}, then regardless of the outcomes of the initial fair coins, h1h\geq 1 since exactly one of i3i_{3} or q(i3)=i2q(i_{3})=i_{2} will contribute to the value of hh.

First, we note that T1T12+T3T_{1}-T_{1}^{2}+T_{3} can be rewritten as

a+2ph(a2+4aph+4p2h2)+δ12(a2a)+4pah+4p2h(h1)=(δ121)a(a1)+2p(12p)h.a+2ph-(a^{2}+4aph+4p^{2}h^{2})+\frac{\delta_{1}}{2}\cdot(a^{2}-a)+4pah+4p^{2}\cdot h(h-1)=\left(\frac{\delta_{1}}{2}-1\right)\cdot a(a-1)+2p(1-2p)\cdot h.
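This simplification, and the resulting a = 1, h ≥ 1 bound of 2/(1 - 4p²) below, can be checked numerically; a brief sketch:

```python
import random

delta1 = (4 + 8 * 2 ** 0.5) / 7  # delta_1 = (4 + 8*sqrt(2)) / 7

# Check T1 - T1^2 + T3 = (delta1/2 - 1) a(a-1) + 2p(1-2p) h on random inputs.
def lhs(a, h, p):
    return (a + 2 * p * h - (a ** 2 + 4 * a * p * h + 4 * p ** 2 * h ** 2)
            + (delta1 / 2) * (a ** 2 - a) + 4 * p * a * h
            + 4 * p ** 2 * h * (h - 1))

def rhs(a, h, p):
    return (delta1 / 2 - 1) * a * (a - 1) + 2 * p * (1 - 2 * p) * h

random.seed(0)
for _ in range(1000):
    a, h, p = random.randint(0, 6), random.randint(0, 6), random.random()
    assert abs(lhs(a, h, p) - rhs(a, h, p)) < 1e-9

# Consequently the a = 1, h >= 1 fraction is at most
# 1/(1+2p) + 1/(1-2p) = 2/(1-4p^2), which stays below 5.68 up to p = 0.402:
assert 2 / (1 - 4 * 0.402 ** 2) <= 5.68
```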

In the case where a=1a=1 and h1h\geq 1, we can therefore simplify the fraction in (5.a) to 1a+2ph+2ph2p(12p)h=1a+2ph+112p11+2p+112p=214p2.\frac{1}{a+2ph}+\frac{2ph}{2p(1-2p)\cdot h}=\frac{1}{a+2ph}+\frac{1}{1-2p}\leq\frac{1}{1+2p}+\frac{1}{1-2p}=\frac{2}{1-4p^{2}}. This is at most 5.685.68 for any p0.402p\leq 0.402. When a2a\geq 2, we can write the fraction as

1a+2ph+(a1)+(2p)h(δ121)a(a1)+(2p)h(12p).\frac{1}{a+2ph}+\frac{(a-1)+(2p)h}{\left(\frac{\delta_{1}}{2}-1\right)\cdot a(a-1)+(2p)h\cdot(1-2p)}. (29)

When a2a\geq 2 and h=0h=0, (29) can be simplified as

1a+1a(δ121)12(1+1δ121)=3+22.\frac{1}{a}+\frac{1}{a\cdot\left(\frac{\delta_{1}}{2}-1\right)}\leq\frac{1}{2}\left(1+\frac{1}{\frac{\delta_{1}}{2}-1}\right)=3+2\sqrt{2}.
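The last equality is exact rather than merely numerical: with δ₁ = (4 + 8√2)/7, one computes

```latex
\frac{\delta_1}{2}-1 = \frac{2+4\sqrt{2}}{7}-1 = \frac{4\sqrt{2}-5}{7},
\qquad
\frac{1}{\frac{\delta_1}{2}-1} = \frac{7\,(4\sqrt{2}+5)}{(4\sqrt{2})^{2}-5^{2}} = 4\sqrt{2}+5,
```

so that the bound equals ½(1 + 4√2 + 5) = 3 + 2√2.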

When a=2a=2 and h1h\geq 1, we can rewrite (29) as

12+2ph+1+2phδ12+2ph(12p)\displaystyle\frac{1}{2+2ph}+\frac{1+2ph}{\delta_{1}-2+2ph(1-2p)} =12+2ph+1+2p+2p(h1)δ12+2p(12p)+2p(12p)(h1)\displaystyle=\frac{1}{2+2ph}+\frac{1+2p+2p(h-1)}{\delta_{1}-2+2p(1-2p)+2p(1-2p)(h-1)}
12+2p+max(1+2pδ12+2p(12p),112p),\displaystyle\leq\frac{1}{2+2p}+\max\left(\frac{1+2p}{\delta_{1}-2+2p(1-2p)},\frac{1}{1-2p}\right),

which is easily verifiable to be at most 5.685.68 for p[0.096,0.402]p\in[0.096,0.402]. When a3a\geq 3 and h1h\geq 1, (29) is at most

13+max(13(δ121),112p),\frac{1}{3}+\max\left(\frac{1}{3\cdot\left(\frac{\delta_{1}}{2}-1\right)},\frac{1}{1-2p}\right),

which is easily verifiable to be at most 5.685.68 for p[0.096,0.402]p\in[0.096,0.402]. The final case is when a=1,h=0a=1,h=0, but here we saw in our analysis of 5.a that the fraction is at most 11, or that the numerator and denominator are both 0.
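The two "easily verifiable" claims in the h ≥ 1 cases can be confirmed by a direct scan over p ∈ [0.096, 0.402]; a sketch:

```python
delta1 = (4 + 8 * 2 ** 0.5) / 7

def bound_a2(p):
    # a = 2, h >= 1 case
    return 1 / (2 + 2 * p) + max((1 + 2 * p) / (delta1 - 2 + 2 * p * (1 - 2 * p)),
                                 1 / (1 - 2 * p))

def bound_a3(p):
    # a >= 3, h >= 1 case
    return 1 / 3 + max(1 / (3 * (delta1 / 2 - 1)), 1 / (1 - 2 * p))

grid = [0.096 + i * (0.402 - 0.096) / 10000 for i in range(10001)]
assert max(bound_a2(p) for p in grid) <= 5.68
assert max(bound_a3(p) for p in grid) <= 5.68
```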

Next, consider ρ(2)(p)\rho^{(2)}(p), which is covered by subcases 1.b, 1.e, 4.a.ii, and 4.b.ii. Indeed, since δ2=2\delta_{2}=2, these all have the exact same bound of 1+2p+(1p)δ1+22p2+(1p)δ11+2p+(1-p)\delta_{1}+2\sqrt{2p^{2}+(1-p)\delta_{1}}.

Finally, we deal with ρ(5)(p)\rho^{(5)}(p), which deals with subcases 2.b, 2.c, 3.b, 3.c, and 5.a, along with 2.d when β2+γ2<0.25\beta^{2}+\gamma^{2}<0.25.

Subcase 2.b can be easily verified to be at most 5.6645.664 in the range p[0.096,0.402]p\in[0.096,0.402] when c2=0c_{2}=0 and c=c1c=c_{1} is between 22 and 55. Beyond this, we assume that c16c_{1}\geq 6, so we can apply the crude bound that the fraction is at most

12+(1p12(1(12p)6))(1+δ1)212p,\frac{\frac{1}{2}+\left(1-p-\frac{1}{2}\cdot\left(1-(1-2p)^{6}\right)\right)\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}}{1-2p},

which is at most 5.685.68 for p[0.096,0.402]p\in[0.096,0.402]. It is easy to verify that the fraction in Subcase 2.c is at most 5.685.68 for p[0.096,0.402]p\in[0.096,0.402].

Subcase 3.b is easy to verify for 2b5.2\leq b\leq 5. For b6b\geq 6, we can apply the crude bound that the fraction is at most

(1p)5(12p)(1+δ1)2+11(1+2δ37)p,\frac{(1-p)^{5}\cdot(1-2p)\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}+1}{1-\left(1+\frac{2-\delta_{3}}{7}\right)p},

which trivially satisfies the desired bounds. Finally, we note that in Subcase 3.c, the fraction decreases as c1c_{1} and c2c_{2} increase, so we may assume that either c1=c2=1c_{1}=c_{2}=1 or c1=2c_{1}=2 and c2=0c_{2}=0. These are easy to verify for 2b52\leq b\leq 5, and for b6b\geq 6, we may apply a crude bound to say the fraction is at most

1+(1p)5(12p)((1+δ1)21)12p\frac{1+(1-p)^{5}\cdot(1-2p)\cdot\left((1+\sqrt{\delta_{1}})^{2}-1\right)}{1-2p}

as long as c11c_{1}\geq 1 and c20.c_{2}\geq 0. This is at most 5.685.68 in the range [0.096,0.402][0.096,0.402].
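The three crude bounds above (subcase 2.b with c₁ ≥ 6, and subcases 3.b and 3.c with b ≥ 6) can likewise be verified by a direct scan over the range; a sketch:

```python
delta1 = (4 + 8 * 2 ** 0.5) / 7
delta3 = 0.265
s = (1 + delta1 ** 0.5) ** 2   # s = (1 + sqrt(delta1))^2

def crude_2b(p):   # subcase 2.b, c_1 >= 6
    return (0.5 + (1 - p - 0.5 * (1 - (1 - 2 * p) ** 6)) * s) / (1 - 2 * p)

def crude_3b(p):   # subcase 3.b, b >= 6
    return ((1 - p) ** 5 * (1 - 2 * p) * s + 1) / (1 - (1 + (2 - delta3) / 7) * p)

def crude_3c(p):   # subcase 3.c, b >= 6
    return (1 + (1 - p) ** 5 * (1 - 2 * p) * (s - 1)) / (1 - 2 * p)

grid = [0.096 + i * (0.402 - 0.096) / 10000 for i in range(10001)]
assert max(crude_2b(p) for p in grid) <= 5.68
assert max(crude_3b(p) for p in grid) <= 5.68
assert max(crude_3c(p) for p in grid) <= 5.68
```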

Subcase 5.a was dealt with previously (as we only have to consider when h1h\geq 1), so the final case is 2.d when β2+γ2<0.25\beta^{2}+\gamma^{2}<0.25. In this case, we recall the fraction is

(12p)min(1+δ1,max(β,γ)+δ1t)2+p(β2+γ2)1(2β2γ2)p,\frac{(1-2p)\cdot\min(1+\sqrt{\delta_{1}},\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot(\beta^{2}+\gamma^{2})}{1-(2-\beta^{2}-\gamma^{2})\cdot p},

where t1t\geq 1, β+γδ3t\beta+\gamma\geq\sqrt{\delta_{3}\cdot t}, and also β2+γ20.25\beta^{2}+\gamma^{2}\leq 0.25. By the symmetry of β\beta and γ\gamma, we may replace max(β,γ)\max(\beta,\gamma) with β\beta. So, by defining ζ=β2+γ2,\zeta=\beta^{2}+\gamma^{2}, we can upper bound the above expression by

(12p)(β+δ1t)2+pζ12p+pζ(12p)(β+δ1/δ3(β+γ))2+pζ12p+pζ,\frac{(1-2p)\cdot(\beta+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot\zeta}{1-2p+p\cdot\zeta}\leq\frac{(1-2p)\cdot(\beta+\sqrt{\delta_{1}/\delta_{3}}\cdot(\beta+\gamma))^{2}+p\cdot\zeta}{1-2p+p\cdot\zeta},

since ζ0.25\zeta\leq 0.25 and since β+γδ3t\beta+\gamma\geq\sqrt{\delta_{3}\cdot t}. By Cauchy-Schwarz, (βx+γy)2(β2+γ2)(x2+y2)ζ(x2+y2)\left(\beta\cdot x+\gamma\cdot y\right)^{2}\leq(\beta^{2}+\gamma^{2})\cdot(x^{2}+y^{2})\leq\zeta\cdot(x^{2}+y^{2}). So, we can bound the above expression by

(12p)ζ((1+δ1/δ3)2+δ1/δ3)+pζ12p+pζ=(12p)((1+δ1/δ3)2+δ1/δ3)+p12pζ+p.\frac{(1-2p)\cdot\zeta\cdot\left((1+\sqrt{\delta_{1}/\delta_{3}})^{2}+\delta_{1}/\delta_{3}\right)+p\cdot\zeta}{1-2p+p\cdot\zeta}=\frac{(1-2p)\cdot\left((1+\sqrt{\delta_{1}/\delta_{3}})^{2}+\delta_{1}/\delta_{3}\right)+p}{\frac{1-2p}{\zeta}+p}.

For p0.5,p\leq 0.5, this fraction clearly increases with ζ\zeta, so we maximize this when ζ=0.25\zeta=0.25. When setting ζ=0.25\zeta=0.25, this can easily be verified to be at most 5.685.68 for all p[0.096,0.5]p\in[0.096,0.5].
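With ζ = 0.25, δ₁ = (4 + 8√2)/7, and δ₃ = 0.265, the resulting bound is tightest near the left endpoint p = 0.096; a quick scan confirming it stays below 5.68:

```python
delta1 = (4 + 8 * 2 ** 0.5) / 7
delta3 = 0.265
K = (1 + (delta1 / delta3) ** 0.5) ** 2 + delta1 / delta3

def case_2d(p, zeta=0.25):
    # ((1-2p) K + p) / ((1-2p)/zeta + p), the final bound for subcase 2.d
    return ((1 - 2 * p) * K + p) / ((1 - 2 * p) / zeta + p)

grid = [0.096 + i * (0.5 - 0.096) / 10000 for i in range(10001)]
assert max(case_2d(p) for p in grid) <= 5.68
```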

This concludes all cases, thus proving the lemma. ∎

Next, we recall Lemma 5.11. First, by setting SS to be I1I_{1} in Lemma 5.11, we obtain that

i=15Qi=j𝒟(αjiN (j)I1(αjc(j,i)))j𝒟αj(λ1n)|I1|+4γOPTk.\sum_{i=1}^{5}Q_{i}=\sum_{j\in\mathcal{D}}\left(\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\right)\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot|I_{1}|+4\gamma\cdot\text{OPT}_{k^{\prime}}. (30)

Next, by setting SS to be I2I3I_{2}\cup I_{3} in Lemma 5.11, we obtain that

i=15Ri=j𝒟iN (j)(I2I3)(αjc(j,i))(λ1n)|I2I3|4γOPTk.\sum_{i=1}^{5}R_{i}=\sum_{j\in\mathcal{D}}\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i))\geq\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|-4\gamma\cdot\text{OPT}_{k^{\prime}}. (31)

Next, we recall Lemma 5.13. By splitting 𝔼[cost(𝒟,S)]\mathbb{E}[\text{cost}(\mathcal{D},S)] based on whether jj is in 𝒟1\mathcal{D}_{1}, 𝒟2\mathcal{D}_{2}, 𝒟3\mathcal{D}_{3}, 𝒟4\mathcal{D}_{4}, 𝒟5\𝒟B\mathcal{D}_{5}\backslash\mathcal{D}_{B}, or 𝒟B\mathcal{D}_{B}, we obtain that

𝔼[cost(𝒟,S)](1+O(ε))[i=15ρ(i)(p)(QipRi)]+O(γ)OPTk.\mathbb{E}[\text{cost}(\mathcal{D},S)]\leq(1+O(\varepsilon))\cdot\left[\sum_{i=1}^{5}\rho^{(i)}(p)\cdot(Q_{i}-p\cdot R_{i})\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}.

Therefore, the argument of Lemma 5.14 implies that if |I1|+p|I2I3|=k|I_{1}|+p\cdot|I_{2}\cup I_{3}|=k, if p[0.01,0.49]p\in[0.01,0.49], and if |I2I3|100C4|I_{2}\cup I_{3}|\geq 100C^{4}, then we can choose a set I1SI1I2I3I_{1}\subset S\subset I_{1}\cup I_{2}\cup I_{3} such that |S|k|S|\leq k and

cost(𝒟,S)\displaystyle\text{cost}(\mathcal{D},S) (1+O(ε+1C))[i=15ρ(i)(p2C)(Qi(p2C)Ri)]+O(γ)OPTk\displaystyle\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(p-\frac{2}{C}\right)\cdot\left(Q_{i}-\left(p-\frac{2}{C}\right)\cdot R_{i}\right)\right]+O(\gamma)\cdot\text{OPT}_{k^{\prime}}
(1+O(ε+1C))[i=15ρ(i)(p)(QipRi)]+O(γ)OPTk.\displaystyle\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(p\right)\cdot(Q_{i}-p\cdot R_{i})\right]+O(\gamma)\cdot\text{OPT}_{k}. (32)

To explain the second line, note that ρ(i)\rho^{(i)} has bounded derivative on [0.01,0.49][0.01,0.49] and that Qi0.5RiQ_{i}\geq 0.5\cdot R_{i}. Therefore, since p[0.01,0.49]p\in[0.01,0.49], ρ(i)(p2C)=ρ(i)(p)(1+O(1/C))\rho^{(i)}\left(p-\frac{2}{C}\right)=\rho^{(i)}(p)\cdot\left(1+O(1/C)\right), and QipRi=Ω(Ri)Q_{i}-p\cdot R_{i}=\Omega(R_{i}) which means Qi(p2C)Ri=(QipRi)(1+O(1/C))Q_{i}-\left(p-\frac{2}{C}\right)\cdot R_{i}=(Q_{i}-p\cdot R_{i})\cdot\left(1+O(1/C)\right). In addition, we still have that OPTk=O(OPTk)\text{OPT}_{k^{\prime}}=O(\text{OPT}_{k}), as in our proof of Theorem 5.17.

We now return to the setup of Theorem 5.17, where we have (I1,I2,I3)(I_{1},I_{2},I_{3}) and (I1,I2,I3)(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}). Suppose that |I1|+p1|I2I3|=k+ct|I_{1}|+p_{1}|I_{2}\cup I_{3}|=k+ct, |I1|+p1|I2I3|=kt|I_{1}^{\prime}|+p_{1}|I_{2}^{\prime}\cup I_{3}^{\prime}|=k-t, |I1I1|=k(1+d)t|I_{1}\cap I_{1}^{\prime}|=k-(1+d)t, and |I1\I1|=κ{0,1}|I_{1}\backslash I_{1}^{\prime}|=\kappa\in\{0,1\}. In addition, suppose that |I1|k100C4,|I_{1}|\geq k-100C^{4}, which means that |I1|k100C41|I_{1}^{\prime}|\geq k-100C^{4}-1. In this case, we may follow the same approach as in our Theorem 5.17 to obtain a ρ(p1)(1+O(ε))\rho(p_{1})\cdot(1+O(\varepsilon))-approximation to kk-means.

Alternatively, we may suppose that |I1|k100C4|I_{1}|\leq k-100C^{4}, which implies that |I2I3|100C4|I_{2}\cup I_{3}|\geq 100C^{4}. Then, defining r1r\geq 1 such that c=(r1)(1+d)c=(r-1)\cdot(1+d), we can use Equation (32) to find a solution of size at most kk with cost at most

(1+O(ε+1C))[i=15ρ(i)(p1r)(Qip1rRi)]+O(γ)OPTk,\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right)\right]+O(\gamma)\cdot\text{OPT}_{k}, (33)

in the same manner as (32), by setting p=p1(1+d)tκ(1+c+d)tκ=p1rO(1/C)p=p_{1}\cdot\frac{(1+d)t-\kappa}{(1+c+d)t-\kappa}=\frac{p_{1}}{r}-O(1/C). Alternatively, we can obtain two separate solutions: a solution I1SI1I2I3I_{1}\subset S\subset I_{1}\cup I_{2}\cup I_{3} of size k+ctk+ct, and a solution I1SI1I2I3I_{1}^{\prime}\subset S^{\prime}\subset I_{1}^{\prime}\cup I_{2}^{\prime}\cup I_{3}^{\prime} of size ktk-t, such that |SS|=k+(c+d)t|S\cup S^{\prime}|=k+(c+d)t. We have that

cost(𝒟,SS)cost(𝒟,S)(1+O(ε+1C))[i=15ρ(i)(p1)(Qip1Ri)]+O(γ)OPTk.\text{cost}(\mathcal{D},S\cup S^{\prime})\leq\text{cost}(\mathcal{D},S)\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(p_{1}\right)\cdot(Q_{i}-p_{1}\cdot R_{i})\right]+O(\gamma)\cdot\text{OPT}_{k}.

Finally, using the bound (23) for the cost of SS^{\prime}, we have

cost(𝒟,S)(1+O(ε+1C))ρ(p1)(j𝒟αj(λ1n)(kt))+O(γ)OPTk.\text{cost}(\mathcal{D},S^{\prime})\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\rho(p_{1})\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\right)+O(\gamma)\cdot\text{OPT}_{k}.

Note that we are not able to use a more sophisticated bound for cost(𝒟,S)\text{cost}(\mathcal{D},S^{\prime}), because our values of {Qi}\{Q_{i}\} and {Ri}\{R_{i}\} only apply to (I1,I2,I3)(I_{1},I_{2},I_{3}) and not to (I1,I2,I3)(I_{1}^{\prime},I_{2}^{\prime},I_{3}^{\prime}). By combining the solutions SSS\cup S^{\prime} and SS^{\prime}, by adding tt random points from S\SS\backslash S^{\prime} to SS^{\prime}, and using Proposition 5.6, we obtain a solution S′′S^{\prime\prime} with expected cost

𝔼[cost(𝒟,S′′)](1+O(ε+1C))[1r(1+d)i=15ρ(i)(p1)(Qip1Ri)+(11r(1+d))ρ(p1)(j𝒟αj(λ1n)(kt))]+O(γ)OPTk.\mathbb{E}[\text{cost}(\mathcal{D},S^{\prime\prime})]\leq\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\Biggr{[}\frac{1}{r(1+d)}\cdot\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})\\ +\left(1-\frac{1}{r(1+d)}\right)\cdot\rho(p_{1})\cdot\Biggr{(}\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot(k-t)\Biggr{)}\Biggr{]}+O(\gamma)\cdot\text{OPT}_{k}. (34)

This is because we combine the solution SSS\cup S^{\prime}, which has size k+(c+d)tk+(c+d)t, with the solution SS^{\prime}, which has size kt,k-t, so we assign the first solution relative weight 11+c+d=1r(1+d)\frac{1}{1+c+d}=\frac{1}{r(1+d)} and the second solution relative weight c+d1+c+d=11r(1+d)\frac{c+d}{1+c+d}=1-\frac{1}{r(1+d)}.

Now, let 𝔇\mathfrak{D} equal j𝒟αj(λ1n)k\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k. Then, since |I1|+p1r|I2I3|k|I_{1}|+\frac{p_{1}}{r}|I_{2}\cup I_{3}|\geq k, we can combine Equations (30) and (31) to get that

i=15(Qip1rRi)\displaystyle\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right) j𝒟αj(λ1n)(|I1|+p1r|I2I3|)+O(γ)OPTk\displaystyle\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot\left(|I_{1}|+\frac{p_{1}}{r}|I_{2}\cup I_{3}|\right)+O(\gamma)\cdot\text{OPT}_{k}
𝔇+O(γ)OPTk.\displaystyle\leq\mathfrak{D}+O(\gamma)\cdot\text{OPT}_{k}. (35)

Next, recall (by Equation (33)) that we have a solution of size at most kk with cost at most

(1+O(ε+1C))[i=15ρ(i)(p1r)(Qip1rRi)]+O(γ)OPTk.\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\left[\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right)\right]+O(\gamma)\cdot\text{OPT}_{k}. (36)

Finally, we note that since |I2I3|=r(1+d)tκp1(1O(1/C))r(1+d)tp1,|I_{2}\cup I_{3}|=\frac{r(1+d)t-\kappa}{p_{1}}\geq(1-O(1/C))\cdot\frac{r(1+d)t}{p_{1}}, we have that

i=15Ri+O(γ)OPTk(λ1n)|I2I3|(1O(1C))r(1+d)p1(λ1n)t.\sum_{i=1}^{5}R_{i}+O(\gamma)\cdot\text{OPT}_{k}\geq\left(\lambda-\frac{1}{n}\right)\cdot|I_{2}\cup I_{3}|\geq\left(1-O(\frac{1}{C})\right)\cdot\frac{r(1+d)}{p_{1}}\cdot\left(\lambda-\frac{1}{n}\right)\cdot t.

Therefore, we can bound the expected cost of S′′S^{\prime\prime} by

(1+O(ε+1C))[1r(1+d)i=15ρ(i)(p1)(Qip1Ri)+(11r(1+d))ρ(p1)(𝔇+p1r(1+d)i=15Ri)]+O(γ)OPTk.\left(1+O(\varepsilon+\frac{1}{C})\right)\cdot\Biggr{[}\frac{1}{r(1+d)}\cdot\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})\\ +\left(1-\frac{1}{r(1+d)}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}+\frac{p_{1}}{r(1+d)}\cdot\sum_{i=1}^{5}R_{i}\right)\Biggr{]}+O(\gamma)\cdot\text{OPT}_{k}. (37)

Now, we have that r1r\geq 1, and if we let θ=11+d\theta=\frac{1}{1+d}, we have that θ[0,1]\theta\in[0,1]. Hence, to show that we obtain a ρ(1+O(ε+γ+1/C))\rho\cdot(1+O(\varepsilon+\gamma+1/C))-approximation, it suffices to show that for all choices of θ[0,1]\theta\in[0,1] and r1r\geq 1, if we let 𝔇=𝔇+O(γ)OPTk\mathfrak{D}^{\prime}=\mathfrak{D}+O(\gamma)\cdot\text{OPT}_{k}, then one cannot simultaneously satisfy

𝔇\displaystyle\mathfrak{D}^{\prime} i=15(Qip1rRi)\displaystyle\geq\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right) (38)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <θri=15ρ(i)(p1)(Qip1Ri)+(1θr)ρ(p1)(𝔇+p1θri=15Ri)\displaystyle<\frac{\theta}{r}\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta}{r}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+p_{1}\cdot\frac{\theta}{r}\sum_{i=1}^{5}R_{i}\right) (39)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <i=15ρ(i)(p1r)(Qip1rRi)\displaystyle<\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right) (40)

and

R1Q1,R2Q2,R3Q3,R41.75Q4,R52Q5.R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq Q_{2},\hskip 28.45274ptR_{3}\leq Q_{3},\hskip 28.45274ptR_{4}\leq 1.75Q_{4},\hskip 28.45274ptR_{5}\leq 2Q_{5}. (41)

Indeed, we already know that (38) is true (same as (35)) and that (41) is true (same as (28)). So if we can show that one cannot simultaneously satisfy all of (38), (39), (40), and (41), then either (39) or (40) is false. But we have a clustering with at most kk centers and cost at most the right-hand sides of each of (39) and (40) up to a 1+O(1/C+ε+γ)1+O(1/C+\varepsilon+\gamma) multiplicative factor, due to (37) and (36), respectively. Therefore, we successfully obtain a solution of cost at most ρ(1+O(1/C+ε+γ))𝔇\rho\cdot\left(1+O(1/C+\varepsilon+\gamma)\right)\cdot\mathfrak{D}^{\prime}. Moreover, 𝔇j𝒟αj(λ1n)k+O(γ)OPTk(1+O(γ))OPTk\mathfrak{D}^{\prime}\leq\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda-\frac{1}{n}\right)\cdot k+O(\gamma)\cdot\text{OPT}_{k}\leq(1+O(\gamma))\cdot\text{OPT}_{k}, since j𝒟αj(λ+2n)kOPTk\sum_{j\in\mathcal{D}}\alpha_{j}-\left(\lambda+\frac{2}{n}\right)\cdot k\leq\text{OPT}_{k} by Proposition 5.15 as both α(),α(+1)\alpha^{(\ell)},\alpha^{(\ell+1)} are solutions to DUAL(λ+1n)\text{DUAL}(\lambda+\frac{1}{n}), and since 3kn3O(γ)OPTk\frac{3k}{n}\leq 3\leq O(\gamma)\cdot\text{OPT}_{k}. Therefore, 𝔇(1+O(γ))OPTk\mathfrak{D}^{\prime}\leq(1+O(\gamma))\cdot\text{OPT}_{k}, which means that we have found a ρ(1+O(1/C+ε+γ))\rho\cdot(1+O(1/C+\varepsilon+\gamma))-approximation to kk-means clustering.

Indeed, by numerical analysis of these linear constraints and based on the functions ρ(i)\rho^{(i)}, we obtain a 5.912\boxed{5.912}-approximation algorithm for Euclidean kk-means clustering. We defer the details to Appendix C.

6 Improved Approximation Algorithm for kk-median

6.1 Improvement to 1+21+\sqrt{2}-approximation

In this subsection, we show that a 1+2+ε1+\sqrt{2}+\varepsilon-approximation can be obtained by a simple modification of the Ahmadian et al. [1] analysis. Because we use the same algorithm as [1], the reduction from an LMP algorithm to a full polynomial-time algorithm is identical, so it suffices to improve the analysis of their LMP algorithm to a 1+21+\sqrt{2}-approximation. The main difficulty in this subsection will be obtaining a tight bound on the norms (as opposed to squared norms) of points that are pairwise separated, which we prove in Lemma 6.1. In the next subsection, we show how to break the 1+21+\sqrt{2} barrier that this algorithm runs into, which will follow a similar approach to our improved kk-means algorithm.

We first recall the setup of the LMP approximation of [1]. Let c(j,i)=d(j,i)c(j,i)=d(j,i) be the distance between a client j𝒟j\in\mathcal{D} and a facility ii\in\mathcal{F}. Suppose we have a solution α\alpha to DUAL(λ)\text{DUAL}(\lambda), such that every client jj has a tight witness w(j)w(j)\in\mathcal{F} with αjtw(j)\alpha_{j}\geq t_{w(j)} and αjc(j,w(j)).\alpha_{j}\geq c(j,w(j)). Recall that ti=maxjN(i)αjt_{i}=\max_{j\in N(i)}\alpha_{j}, where N(i)={j𝒟:αj>c(j,i)}N(i)=\{j\in\mathcal{D}:\alpha_{j}>c(j,i)\}, and likewise, N(j)={i:αj>c(j,i)}N(j)=\{i\in\mathcal{F}:\alpha_{j}>c(j,i)\}. Now, we let the conflict graph H(δ)H(\delta) on tight facilities (i.e., facilities ii with jN(i)(αjc(j,i))=λ\sum_{j\in N(i)}(\alpha_{j}-c(j,i))=\lambda) have an edge (i,i)(i,i^{\prime}) if c(i,i)δmin(ti,ti)c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}).

We let δ=2\delta=\sqrt{2} and return a maximal independent set II of H(δ)H(\delta) as our LMP-approximation. It suffices to show that for each client j𝒟,j\in\mathcal{D}, that c(j,I)(1+2)(αjiN(j)I(αjc(j,i))).c(j,I)\leq(1+\sqrt{2})\cdot\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right). To see why, by adding over all clients jj, we obtain that

cost(𝒟,I)(1+2)(j𝒟αjiIjN(i)(αjc(j,i)))=(1+2)(j𝒟αjλ|I|).\text{cost}(\mathcal{D},I)\leq(1+\sqrt{2})\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\sum_{i\in I}\sum_{j\in N(i)}(\alpha_{j}-c(j,i))\right)=(1+\sqrt{2})\cdot\left(\sum_{j\in\mathcal{D}}\alpha_{j}-\lambda\cdot|I|\right).

Finally, since α\alpha is a feasible solution to DUAL(λ),\text{DUAL}(\lambda), this implies that cost(𝒟,I)(1+2)OPT|I|.\text{cost}(\mathcal{D},I)\leq(1+\sqrt{2})\cdot\text{OPT}_{|I|}.

Before we verify the LMP approximation, we need the following lemma about points in Euclidean space.

Lemma 6.1.

Let h2h\geq 2 and suppose that x1,,xhx_{1},\dots,x_{h} are points in Euclidean space d\mathbb{R}^{d} (for some dd) such that xixj222\|x_{i}-x_{j}\|_{2}^{2}\geq 2 for all iji\neq j. Then, i=1hxi2h(h1)\sum_{i=1}^{h}\|x_{i}\|_{2}\geq\sqrt{h\cdot(h-1)}.

Proof.

Note that for any positive real numbers t1,t2,,tht_{1},t_{2},\dots,t_{h} that add to 11, we have that

i=1htixi22i=1htixi22tixi22=i<jtitjxixj222i<jtitj.\sum_{i=1}^{h}t_{i}\cdot\|x_{i}\|_{2}^{2}\geq\sum_{i=1}^{h}t_{i}\cdot\|x_{i}\|_{2}^{2}-\left\|\sum t_{i}x_{i}\right\|_{2}^{2}=\sum_{i<j}t_{i}t_{j}\|x_{i}-x_{j}\|_{2}^{2}\geq 2\cdot\sum_{i<j}t_{i}t_{j}.

Then, by setting ai=xi2a_{i}=\|x_{i}\|_{2} for each ii and scaling by t1++tht_{1}+\cdots+t_{h} accordingly to remove the assumption that t1++th=1t_{1}+\cdots+t_{h}=1, we have that

(i=1htiai2)(i=1hti)2i<jtitj\left(\sum_{i=1}^{h}t_{i}\cdot a_{i}^{2}\right)\cdot\left(\sum_{i=1}^{h}t_{i}\right)\geq 2\cdot\sum_{i<j}t_{i}t_{j}

for all $t_{1},\dots,t_{h}\geq 0$. Now, if some $a_{i}=0$, then for every $j\neq i$ we have $\|x_{j}\|_{2}=\|x_{i}-x_{j}\|_{2}\geq\sqrt{2}$, which means that $\sum_{j=1}^{h}\|x_{j}\|_{2}\geq(h-1)\cdot\sqrt{2}\geq\sqrt{h(h-1)}$ for all $h\geq 2$. Otherwise, $a_{i}\neq 0$ for all $i$, so we can set $t_{i}=\frac{1}{a_{i}}$ to obtain that

(i=1hai)(i=1h1ai)2i<j1aiaj.\left(\sum_{i=1}^{h}a_{i}\right)\cdot\left(\sum_{i=1}^{h}\frac{1}{a_{i}}\right)\geq 2\cdot\sum_{i<j}\frac{1}{a_{i}a_{j}}. (42)

From now on, for any polynomial P(a1,,ah)P(a_{1},\dots,a_{h}), we denote symP(a1,,ah)\sum_{\text{sym}}P(a_{1},\dots,a_{h}) to be the sum of all distinct terms of the form P(aπ(1),,aπ(h))P(a_{\pi(1)},\dots,a_{\pi(h)}) over all permutations of [h][h]. For instance, syma1a2a3=1i<j<khaiajak\sum_{\text{sym}}a_{1}a_{2}a_{3}=\sum_{1\leq i<j<k\leq h}a_{i}a_{j}a_{k} and syma12a2=1i<jhai2aj+1j<ihai2aj\sum_{\text{sym}}a_{1}^{2}a_{2}=\sum_{1\leq i<j\leq h}a_{i}^{2}a_{j}+\sum_{1\leq j<i\leq h}a_{i}^{2}a_{j}.

In the case when h=2h=2, this means that (a1+a2)a1+a2a1a22a1a2,(a_{1}+a_{2})\cdot\frac{a_{1}+a_{2}}{a_{1}a_{2}}\geq\frac{2}{a_{1}a_{2}}, so a1+a22=h(h1).a_{1}+a_{2}\geq\sqrt{2}=\sqrt{h(h-1)}. Alternatively, we assume that h3h\geq 3. When h3h\geq 3, note that

(syma1ah2)(ai)\displaystyle\left(\sum_{\text{sym}}a_{1}\cdots a_{h-2}\right)\cdot\left(\sum a_{i}\right) =(h1)syma1ah1+syma12a2ah2\displaystyle=(h-1)\cdot\sum_{\text{sym}}a_{1}\cdots a_{h-1}+\sum_{\text{sym}}a_{1}^{2}a_{2}\cdots a_{h-2}
[(h1)+h(h1)(h2)/2h]syma1ah1\displaystyle\geq\left[(h-1)+\frac{h(h-1)(h-2)/2}{h}\right]\cdot\sum_{\text{sym}}a_{1}\cdots a_{h-1}
=h(h1)2syma1ah1,\displaystyle=\frac{h(h-1)}{2}\cdot\sum_{\text{sym}}a_{1}\cdots a_{h-1}, (43)

where the second line above follows by Muirhead’s inequality. Therefore, we have that

(i=1hai)2\displaystyle\left(\sum_{i=1}^{h}a_{i}\right)^{2} (i=1hai)h(h1)2syma1ah1syma1ah2\displaystyle\geq\left(\sum_{i=1}^{h}a_{i}\right)\cdot\frac{h(h-1)}{2}\cdot\frac{\sum_{\text{sym}}a_{1}\cdots a_{h-1}}{\sum_{\text{sym}}a_{1}\cdots a_{h-2}}
=(i=1hai)h(h1)2sym1aisym1aiaj\displaystyle=\left(\sum_{i=1}^{h}a_{i}\right)\cdot\frac{h(h-1)}{2}\cdot\frac{\sum_{\text{sym}}\frac{1}{a_{i}}}{\sum_{\text{sym}}\frac{1}{a_{i}a_{j}}}
2h(h1)2\displaystyle\geq 2\cdot\frac{h(h-1)}{2}
=h(h1),\displaystyle=h(h-1),

where the first line follows from (43) and the third line follows from (42). Therefore, we indeed have that i=1hxi2h(h1)\sum_{i=1}^{h}\|x_{i}\|_{2}\geq\sqrt{h(h-1)}. ∎
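As a quick numeric sanity check (not part of the proof), the bound of Lemma 6.1 is tight: the vertices of a regular simplex with pairwise distances exactly $\sqrt{2}$, e.g. the centered standard basis vectors of $\mathbb{R}^{h}$, attain $\sum_{i=1}^{h}\|x_{i}\|_{2}=\sqrt{h(h-1)}$ exactly. A short Python sketch:

```python
import math
from itertools import combinations

def sum_of_norms(points):
    # sum of the Euclidean norms ||x_i||_2
    return sum(math.sqrt(sum(c * c for c in x)) for x in points)

for h in range(2, 10):
    # centered standard basis vectors form a regular simplex whose
    # pairwise squared distances are exactly 2 (the lemma's hypothesis)
    centroid = 1.0 / h
    pts = [[(1.0 if i == j else 0.0) - centroid for j in range(h)]
           for i in range(h)]
    for x, y in combinations(pts, 2):
        assert abs(sum((a - b) ** 2 for a, b in zip(x, y)) - 2.0) < 1e-9
    total = sum_of_norms(pts)
    bound = math.sqrt(h * (h - 1))
    assert total >= bound - 1e-9        # the lemma's conclusion
    assert abs(total - bound) < 1e-9    # ...attained exactly by the simplex
```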

To verify the LMP approximation, it suffices to show that for every jj, c(j,I)(1+2)(αjiN(j)I(αjc(j,i))).c(j,I)\leq(1+\sqrt{2})\cdot\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right). We split this up into 33 cases.

Case 1: |𝑰𝑵(𝒋)|=𝟎\boldsymbol{|I\cap N(j)|=0}.

In this case, d(j,I)d(j,w(j))+d(w(j),I)d(j,I)\leq d(j,w(j))+d(w(j),I) by the Triangle Inequality. But we know that d(j,w(j))αjd(j,w(j))\leq\alpha_{j}, and that d(w(j),I)2tw(j)2αjd(w(j),I)\leq\sqrt{2}\cdot t_{w(j)}\leq\sqrt{2}\cdot\alpha_{j}, using the fact that II is a maximal independent set so w(j)w(j) has some neighbor of II in the conflict graph. Thus, d(j,I)(1+2)αjd(j,I)\leq(1+\sqrt{2})\cdot\alpha_{j}. However, since N(j)I=N(j)\cap I=\emptyset, this means that (αjiN(j)I(αjc(j,i)))=αj\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right)=\alpha_{j}. So, the desired inequality holds.

Case 2: |𝑰𝑵(𝒋)|=𝟏\boldsymbol{|I\cap N(j)|=1}.

In this case, let i1i_{1} be the unique point in N(j)IN(j)\cap I. Then, d(j,I)d(j,i1)d(j,I)\leq d(j,i_{1}). In addition, (αjiN(j)I(αjc(j,i)))=αj(αjc(j,i1))=c(j,i1)=d(j,i1)\left(\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))\right)=\alpha_{j}-(\alpha_{j}-c(j,i_{1}))=c(j,i_{1})=d(j,i_{1}). Since d(j,i1)0d(j,i_{1})\geq 0, the desired inequality holds (even with a ratio of 1<1+21<1+\sqrt{2}).

Case 3: |𝑰𝑵(𝒋)|=𝒔𝟐\boldsymbol{|I\cap N(j)|=s\geq 2}.

In this case, let i1,,isi_{1},\dots,i_{s} be the set of points in N(j)IN(j)\cap I. Then, we know that d(ir,ir)δmin(tir,tir)d(i_{r},i_{r^{\prime}})\geq\delta\cdot\min(t_{i_{r}},t_{i_{r^{\prime}}}) for any rrr\neq r^{\prime}. But tir,tirαjt_{i_{r}},t_{i_{r^{\prime}}}\geq\alpha_{j} by definition of tit_{i} (since ir,irN(j)i_{r},i_{r^{\prime}}\in N(j)), so this means that d(ir,ir)2αjd(i_{r},i_{r^{\prime}})\geq\sqrt{2}\cdot\alpha_{j} for every rrr\neq r^{\prime}.

Now, by applying Lemma 6.1, we have that $\sum_{r=1}^{s}d(j,i_{r})\geq\sqrt{s(s-1)}\cdot\alpha_{j}$. Now, let $T=\frac{1}{\alpha_{j}}\cdot\sum_{r=1}^{s}d(j,i_{r})$, so $T\geq\sqrt{s(s-1)}$. Then, $d(j,I)\leq\min_{1\leq r\leq s}d(j,i_{r})\leq\frac{1}{s}\cdot\sum_{r=1}^{s}d(j,i_{r})=\frac{T}{s}\cdot\alpha_{j}$. In addition, we have that $\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))=\alpha_{j}-s\cdot\alpha_{j}+T\cdot\alpha_{j}=(T-(s-1))\cdot\alpha_{j}$. So, the ratio is

\frac{T/s}{T-(s-1)}\leq\frac{\sqrt{s(s-1)}/s}{\sqrt{s(s-1)}-(s-1)}=\frac{1}{s-\sqrt{s(s-1)}}\leq 2.

Above, the first inequality follows because as TT increases, the numerator increases at a slower rate than the denominator, so assuming that the fraction is at least 11, we wish for TT to be as small as possible to maximize the fraction. The final inequality holds because ss(s1)12s-\sqrt{s(s-1)}\geq\frac{1}{2} for all s2s\geq 2. Therefore, the desired inequality holds (even with a ratio of 2<1+22<1+\sqrt{2}).
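The two numeric facts used above, that the ratio is maximized at $T=\sqrt{s(s-1)}$ and that $s-\sqrt{s(s-1)}\geq\frac{1}{2}$ for all $s\geq 2$, can be checked directly; the following is a small sketch (not part of the proof):

```python
import math

for s in range(2, 200):
    # s - sqrt(s(s-1)) decreases toward 1/2 as s grows but never goes below it
    assert s - math.sqrt(s * (s - 1)) >= 0.5
    # the ratio (T/s) / (T - (s-1)) is decreasing in T, so its maximum over
    # T >= sqrt(s(s-1)) is attained at T = sqrt(s(s-1)), where it equals
    # 1 / (s - sqrt(s(s-1))) <= 2
    T0 = math.sqrt(s * (s - 1))
    for k in range(200):
        T = T0 + 0.05 * k
        assert (T / s) / (T - (s - 1)) <= 2.0 + 1e-9
```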

So in fact, there is a simple improvement from the 1+8/32.6331+\sqrt{8/3}\approx 2.633 approximation algorithm to a 1+22.4141+\sqrt{2}\approx 2.414 algorithm. A natural question is whether this can be improved further without any significant changes to the algorithm or analysis. Indeed, there only seems to be one bottleneck, when |IN(j)|=0|I\cap N(j)|=0, so naturally one may assume that by slightly reducing δ=2\delta=\sqrt{2}, the approximation from Case 1 should improve below 1+21+\sqrt{2} and the approximation from Case 3 should become worse than 22, but can still be below 1+21+\sqrt{2}.

Unfortunately, such a hope cannot be realized. Indeed, if we replace δ=2\delta=\sqrt{2} with some δ<2\delta<\sqrt{2}, we may have that d(j,i1)=d(j,i2)==d(j,is)=δs12sαjd(j,i_{1})=d(j,i_{2})=\cdots=d(j,i_{s})=\delta\cdot\sqrt{\frac{s-1}{2s}}\cdot\alpha_{j} and the pairwise distances are all exactly δαj\delta\cdot\alpha_{j} between each ir,iri_{r},i_{r^{\prime}}. However, in this case, αjiN(j)I(αjc(j,i))=αj(1s+δs(s1)/2),\alpha_{j}-\sum_{i\in N(j)\cap I}(\alpha_{j}-c(j,i))=\alpha_{j}\cdot\left(1-s+\delta\cdot\sqrt{s(s-1)/2}\right), which for δ<2\delta<\sqrt{2} is in fact negative for sufficiently large ss. Hence, even for δ=2ε\delta=\sqrt{2}-\varepsilon for a very small choice of ε>0\varepsilon>0, we cannot even guarantee a constant factor approximation with this analysis approach. So, this approach gets stuck at a 1+21+\sqrt{2} approximation.
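This breakdown is easy to observe numerically. The sketch below (with an illustrative choice of $\delta$ slightly below $\sqrt{2}$) shows that the quantity $1-s+\delta\cdot\sqrt{s(s-1)/2}$, which lower-bounds the normalized denominator in the hard configuration above, eventually turns negative:

```python
import math

delta = math.sqrt(2) - 0.005   # illustrative choice just below sqrt(2)

def slack(s):
    # normalized value of alpha_j - sum_{i in N(j) cap I} (alpha_j - c(j,i))
    # in the hard configuration described above
    return 1 - s + delta * math.sqrt(s * (s - 1) / 2)

assert slack(2) > 0                                # harmless for small s
assert min(slack(s) for s in range(2, 10000)) < 0  # negative for large s
```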

In the following subsection, we show how to obtain an improved LMP approximation algorithm for Euclidean $k$-median, breaking the $1+\sqrt{2}$ approximation barrier. We then show that this barrier can also be broken by a polynomial-time $k$-median algorithm.

6.2 An improved LMP algorithm for Euclidean kk-median

Recall the conflict graph H:=H(δ)H:=H(\delta), where we define two tight facilities (i,i)(i,i^{\prime}) to be connected if c(i,i)δmin(ti,ti).c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}). We set parameters δ1δ2δ3\delta_{1}\geq\delta_{2}\geq\delta_{3} and 0<p<10<p<1, and define V1V_{1} to be the set of all tight facilities. Given the set of tight facilities V1V_{1} and conflict graphs H(δ)H(\delta) for all δ>0\delta>0, our algorithm works by applying the procedure described in Algorithm 3 to V1V_{1}.

Algorithm 3 Generate a Nested Quasi-Independent Set of V1V_{1}, as well as a set of centers SS providing an LMP approximation for Euclidean kk-median

LMPMedian(V1,{H(δ)},δ1,δ2,δ3,pV_{1},\{H(\delta)\},\delta_{1},\delta_{2},\delta_{3},p):

1:Create a maximal independent set I1I_{1} of H(δ1)H(\delta_{1}).
2:Let V2V_{2} be the set of points in V1\I1V_{1}\backslash I_{1} that are not adjacent to I1I_{1} in H(δ2)H(\delta_{2}).
3:Create a maximal independent set I2I_{2} of the induced subgraph H(δ1)[V2]H(\delta_{1})[V_{2}].
4:Let V3V_{3} be the set of points ii in V2\I2V_{2}\backslash I_{2} such that there is exactly one point in I2I_{2} that is a neighbor of ii in H(δ1)H(\delta_{1}), there are no points in I1I_{1} that are neighbors of ii in H(δ2)H(\delta_{2}), and there are no points in I2I_{2} that are neighbors of ii in H(δ3)H(\delta_{3}).
5:Create a maximal independent set I3I_{3} of the induced subgraph H(δ1)[V3]H(\delta_{1})[V_{3}].
6:Note that every point iI3i\in I_{3} has a unique adjacent neighbor q(i)I2q(i)\in I_{2} in H(δ1)H(\delta_{1}). We create the final set SS as follows:
  • Include every point iI1i\in I_{1}.

  • For each point iI2i\in I_{2}, flip a fair coin. If the coin lands heads, include ii with probability 2p2p. Otherwise, include each point in q1(i)q^{-1}(i) independently with probability 2p2p.
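A minimal Python sketch of this procedure, assuming the conflict graphs $H(\delta)$ are given through a predicate `conflict(u, v, delta)`; the function names and the toy line-metric instance below are illustrative, not from the paper:

```python
import math
import random
from itertools import combinations

def maximal_independent_set(vertices, adj):
    # greedy: scan vertices, keep each one not adjacent to anything kept
    kept = []
    for v in vertices:
        if all(not adj(v, u) for u in kept):
            kept.append(v)
    return kept

def lmp_median(V1, conflict, d1, d2, d3, p, rng=random):
    adj1 = lambda u, v: conflict(u, v, d1)
    I1 = maximal_independent_set(V1, adj1)
    # V2: points of V1 \ I1 with no H(d2)-neighbor in I1
    V2 = [v for v in V1 if v not in I1
          and not any(conflict(v, u, d2) for u in I1)]
    I2 = maximal_independent_set(V2, adj1)
    # V3: exactly one H(d1)-neighbor in I2, no H(d2)-neighbor in I1,
    # and no H(d3)-neighbor in I2
    V3 = [v for v in V2 if v not in I2
          and sum(1 for u in I2 if conflict(v, u, d1)) == 1
          and not any(conflict(v, u, d2) for u in I1)
          and not any(conflict(v, u, d3) for u in I2)]
    I3 = maximal_independent_set(V3, adj1)
    # q(i): the unique H(d1)-neighbor of i in I2
    q = {i: next(u for u in I2 if conflict(i, u, d1)) for i in I3}
    S = list(I1)
    for i in I2:
        if rng.random() < 0.5:              # heads: take i with probability 2p
            if rng.random() < 2 * p:
                S.append(i)
        else:                               # tails: each preimage with prob. 2p
            for i3 in I3:
                if q[i3] == i and rng.random() < 2 * p:
                    S.append(i3)
    return I1, I2, I3, S

# toy instance: points on a line with all t_i = 1, so
# conflict(u, v, delta) just compares |u - v| with delta
pts = [0.0, 4.2, 1.4, 2.8]
conflict = lambda u, v, delta: abs(u - v) <= delta
d1, d2, d3 = math.sqrt(2), 1.395, 2 - math.sqrt(2)
I1, I2, I3, S = lmp_median(pts, conflict, d1, d2, d3, p=0.3)
assert (I1, I2, I3) == ([0.0, 4.2], [1.4], [2.8])
assert all(i in S for i in I1)                          # I1 is always kept
assert all(abs(u - v) > d3 for u, v in combinations(S, 2))
```

On an actual instance, `conflict(u, v, delta)` would compare $c(u,v)$ against $\delta\cdot\min(t_{u},t_{v})$; the structural invariants checked at the end (all of $I_{1}$ is kept, and no two selected centers are $\delta_{3}$-close) hold for every realization of the coins.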

As in the kk-means case, we consider a more general setup, so that we can convert the LMP approximation to a full polynomial-time algorithm. Instead of V1,V_{1}, let 𝒱\mathcal{V}\subset\mathcal{F} be a subset of facilities and let 𝒟\mathcal{D} be the full set of clients. For each j𝒟,j\in\mathcal{D}, let αj0\alpha_{j}\geq 0 be some real number, and for each i𝒱i\in\mathcal{V}, let ti0t_{i}\geq 0 be some real number. In addition, for each client j𝒟j\in\mathcal{D}, we associate with it a set N(j)𝒱N(j)\subset\mathcal{V} and a “witness” facility w(j)𝒱w(j)\in\mathcal{V}. Finally, suppose that we have the following assumptions:

  1. 1.

    For any client j𝒟j\in\mathcal{D}, the witness w(j)𝒱w(j)\in\mathcal{V} satisfies αjtw(j)\alpha_{j}\geq t_{w(j)} and αjc(j,w(j))\alpha_{j}\geq c(j,w(j)).

  2. 2.

    For any client j𝒟j\in\mathcal{D} and any facility iN(j)i\in N(j), tiαj>c(j,i)t_{i}\geq\alpha_{j}>c(j,i).

Then, for the graph H(δ)H(\delta) on 𝒱\mathcal{V} where i,i𝒱i,i^{\prime}\in\mathcal{V} are connected if and only if c(i,i)δmin(ti,ti)c(i,i^{\prime})\leq\delta\cdot\min(t_{i},t_{i^{\prime}}) (recall that now, c(i,i)=d(i,i)c(i,i^{\prime})=d(i,i^{\prime}) instead of d(i,i)2d(i,i^{\prime})^{2}), we have the following main lemma.

Lemma 6.2.

Fix $\delta_{1}=\sqrt{2}$, $\delta_{2}=1.395$, and $\delta_{3}=2-\sqrt{2}\approx 0.5858$, and let $p<0.337$ be a free parameter. Now, let $S$ be the randomized set created by applying Algorithm 3 on $V_{1}=\mathcal{V}$. Then, for any $j\in\mathcal{D}$,

𝔼[c(j,S)]ρ(p)𝔼[αjiN(j)S(αjc(j,i))],\mathbb{E}[c(j,S)]\leq\rho(p)\cdot\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right],

where ρ(p)\rho(p) is some constant that only depends on pp (since δ1,δ2,δ3\delta_{1},\delta_{2},\delta_{3} are fixed).

Proof.

As in the kk-means case, we fix j𝒟j\in\mathcal{D}, and we do casework based on the sizes of a=|I1N(j)|a=|I_{1}\cap N(j)|, b=|I2N(j)|b=|I_{2}\cap N(j)|, and c=|I3N(j)|c=|I_{3}\cap N(j)|.

Case 1: 𝒂=𝟎,𝒃=𝟏,𝒄=𝟎\boldsymbol{a=0,b=1,c=0}.

Let i2i_{2} be the unique point in I2N(j),I_{2}\cap N(j), and let i=w(j)i^{*}=w(j) be the witness of jj. We have the following subcases:

  1. a)

    𝒊𝑽𝟐\boldsymbol{i^{*}\not\in V_{2}}. In this case, either iI1i^{*}\in I_{1} so d(i,I1)=0d(i^{*},I_{1})=0, or there exists i1I1i_{1}\in I_{1} such that d(i,i1)δ2min(ti,ti1)δ2d(i^{*},i_{1})\leq\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})\leq\delta_{2}. So, d(j,I1)1+δ2d(j,I_{1})\leq 1+\delta_{2}. In addition, we have that i2Si_{2}\in S with probability pp. So, if we let t:=d(j,i2)t:=d(j,i_{2}), we can bound the ratio by

    pt+(1p)(1+δ2)1p(1t)=pt+(1p)(1+δ2)pt+(1p)1+δ2,\frac{p\cdot t+(1-p)\cdot(1+\delta_{2})}{1-p(1-t)}=\frac{p\cdot t+(1-p)\cdot(1+\delta_{2})}{p\cdot t+(1-p)}\leq 1+\delta_{2}, (1.a’)

    since t0t\geq 0.

  2. b)

    𝒊𝑽𝟑.\boldsymbol{i^{*}\in V_{3}}. In this case, there exists i3I3i_{3}\in I_{3} (possibly i3=ii_{3}=i^{*}) such that d(i,i3)δ1min(ti,ti3).d(i^{*},i_{3})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{3}}). In addition, there exists i1I1i_{1}\in I_{1} such that d(i,i1)δ1min(ti,ti1)d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}}). In addition, we have that tiαj=1t_{i^{*}}\leq\alpha_{j}=1. Finally, since I3V2I_{3}\subset V_{2}, we must have that d(i1,i3)δ2min(ti1,ti3)d(i_{1},i_{3})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{3}}). If we condition on i2Si_{2}\in S, then the numerator and denominator both equal c(j,i2)c(j,i_{2}), so the fraction is 11 (or 0/00/0). Else, if we condition on i2Si_{2}\not\in S, then the denominator is 11, and i3Si_{3}\in S with probability either pp or p1p>p\frac{p}{1-p}>p. Therefore, 𝔼[d(j,S)|i2S]pi3j2+(1p)i1j2\mathbb{E}[d(j,S)|i_{2}\not\in S]\leq p\cdot\|i_{3}-j\|_{2}+(1-p)\cdot\|i_{1}-j\|_{2}. We can bound this (we defer the details to Appendix B) by

    infT>03(X+Y)+22(X+Y)2δ22XY,\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}, (1.b’)

    where X=p2+p(1p)TX=p^{2}+p(1-p)\cdot T and Y=(1p)2+p(1p)T.Y=(1-p)^{2}+\frac{p(1-p)}{T}.

In the remaining cases, we may assume that iV2\V3i^{*}\in V_{2}\backslash V_{3}. Then, one of the following must occur:

  1. c)

    𝒊=𝒊𝟐\boldsymbol{i^{*}=i_{2}}. In this case, define t=d(j,i)[0,1]t=d(j,i^{*})\in[0,1], and note that d(j,I1)d(j,i)+d(i,I1)t+δ1d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq t+\delta_{1}. So, with probability pp, we have that d(j,S)d(j,i)=td(j,S)\leq d(j,i^{*})=t, and otherwise, we have that d(j,S)d(j,I1)=t+δ1d(j,S)\leq d(j,I_{1})=t+\delta_{1}. So, we can bound the ratio by

    max0t1pt+(1p)(t+δ1)1p(1t)=max0t1t+(1p)δ1pt+(1p).\max_{0\leq t\leq 1}\frac{p\cdot t+(1-p)\cdot(t+\delta_{1})}{1-p\cdot(1-t)}=\max_{0\leq t\leq 1}\frac{t+(1-p)\delta_{1}}{p\cdot t+(1-p)}.

    For pp such that 1/p>δ1,1/p>\delta_{1}, it is clear that this function increases as tt increases, so it is maximized when t=1t=1, which means we can bound the ratio by

    1+(1p)δ1.1+(1-p)\cdot\delta_{1}. (1.c’)
  2. d)

    𝒊𝑰𝟐\boldsymbol{i^{*}\in I_{2}} but 𝒊𝒊𝟐\boldsymbol{i^{*}\neq i_{2}}. First, we recall that d(j,i)1d(j,i^{*})\leq 1. Now, let t=d(j,i2)t=d(j,i_{2}). In this case, with probability pp, d(j,S)=td(j,S)=t (if we select i2i_{2} to be in SS), with probability p(1p)p(1-p), d(j,S)1d(j,S)\leq 1 (if we select ii^{*} but not i2i_{2} to be in SS), and in the remaining event of (1p)2(1-p)^{2} probability, we still have that d(j,S)d(j,I1)d(j,i)+d(i,I1)1+δ1.d(j,S)\leq d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq 1+\delta_{1}. So, we can bound the ratio by

    max0t1pt+p(1p)1+(1p)2(1+δ1)1p(1t).\max_{0\leq t\leq 1}\frac{p\cdot t+p(1-p)\cdot 1+(1-p)^{2}\cdot(1+\delta_{1})}{1-p\cdot(1-t)}.

    Note that this is maximized when t=0t=0 (since the numerator and denominator increase at the same rate when tt increases), so we can bound the ratio by

    p(1p)+(1p)2(1+δ1)1p=1+(1p)δ1.\frac{p(1-p)+(1-p)^{2}\cdot(1+\delta_{1})}{1-p}=1+(1-p)\cdot\delta_{1}. (1.d’)
  3. e)

    There is more than one neighbor of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟏)\boldsymbol{H(\delta_{1})} that is in 𝑰𝟐\boldsymbol{I_{2}}. In this case, there is some other point i2I2i_{2}^{\prime}\in I_{2} not in N(j)N(j) such that d(i,i2)δ1min(ti,ti2).d(i^{*},i_{2}^{\prime})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}). So, we have four points j,i,i1I1,i2I2j,i^{*},i_{1}\in I_{1},i_{2}^{\prime}\in I_{2} such that d(j,i)1,d(j,i^{*})\leq 1, d(i,i2)δ1min(ti,ti2),d(i^{*},i_{2}^{\prime})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}), d(i,i1)δ1min(ti,ti1),d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}}), and d(i1,i2)δ2min(ti1,ti2).d(i_{1},i_{2}^{\prime})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}^{\prime}}).

    If we condition on $i_{2}\in S$, then the denominator equals $c(j,i_{2})$ and the numerator is at most $c(j,i_{2})$, so the fraction is $1$ (or $0/0$). Else, if we condition on $i_{2}\not\in S$, then the denominator is $1$, and the numerator is at most $p\cdot\|i_{2}^{\prime}-j\|_{2}+(1-p)\cdot\|i_{1}-j\|_{2}$. Note that $d(j,i^{*})\leq 1$, that $t_{i^{*}}\leq 1$, and that $\delta_{1}\geq\delta_{2}$. So, as in case b), the overall fraction is at most

    infT>03(X+Y)+22(X+Y)2δ22XY,\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}, (1.e’)

    where X=p2+p(1p)TX=p^{2}+p(1-p)\cdot T and Y=(1p)2+p(1p)T.Y=(1-p)^{2}+\frac{p(1-p)}{T}.

  4. f)

    There are no neighbors of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟏)\boldsymbol{H(\delta_{1})} that are in 𝑰𝟐\boldsymbol{I_{2}}. In this case, d(i,i2)δ1min(ti,ti2).d(i^{*},i_{2})\geq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}}). Define t=min(ti,ti2)t=\min(t_{i^{*}},t_{i_{2}}). Since d(j,i)1,d(j,i^{*})\leq 1, by the triangle inequality we have that d(j,i2)max(0,δ1t1)d(j,i_{2})\geq\max\left(0,\delta_{1}\cdot t-1\right). In addition, we still have that d(j,I1)d(j,i)+d(i,I1)1+δ1ti,d(j,I_{1})\leq d(j,i^{*})+d(i^{*},I_{1})\leq 1+\delta_{1}\cdot t_{i^{*}}, and d(j,I1)d(j,i2)+d(i2,I1)1+δ1ti2d(j,I_{1})\leq d(j,i_{2})+d(i_{2},I_{1})\leq 1+\delta_{1}\cdot t_{i_{2}}, so together we have that d(j,I1)1+δ1td(j,I_{1})\leq 1+\delta_{1}\cdot t. Since i2Si_{2}\in S with probability pp, the ratio is at most

    max0t1maxd(j,i2)δ1t1pd(j,i2)+(1p)(1+δ1t)1p(1d(j,i2))\displaystyle\hskip 14.22636pt\max_{0\leq t\leq 1}\max_{d(j,i_{2})\geq\delta_{1}\cdot t-1}\frac{p\cdot d(j,i_{2})+(1-p)\cdot(1+\delta_{1}\cdot t)}{1-p(1-d(j,i_{2}))}
    =max0t1maxd(j,i2)δ1t1pd(j,i2)+(1p)(1+δ1t)pd(j,i2)+(1p).\displaystyle=\max_{0\leq t\leq 1}\max_{d(j,i_{2})\geq\delta_{1}\cdot t-1}\frac{p\cdot d(j,i_{2})+(1-p)\cdot(1+\delta_{1}\cdot t)}{p\cdot d(j,i_{2})+(1-p)}.

    It is clear that this function is decreasing as $d(j,i_{2})$ increases (and nonnegative). So, we may assume WLOG that $d(j,i_{2})=\max(0,\delta_{1}\cdot t-1)$ to bound this ratio by

    max0t1pmax(0,δ1t1)+(1p)(1+δ1t)pmax(0,δ1t1)+(1p)\max_{0\leq t\leq 1}\frac{p\cdot\max(0,\delta_{1}\cdot t-1)+(1-p)\cdot(1+\delta_{1}\cdot t)}{p\cdot\max(0,\delta_{1}\cdot t-1)+(1-p)}

    If δ1t10\delta_{1}\cdot t-1\leq 0, then δ1t+12,\delta_{1}\cdot t+1\leq 2, so we can bound the above equation by 22. Otherwise, the above fraction can be rewritten as δ1t+(12p)pδ1t+(12p)\frac{\delta_{1}\cdot t+(1-2p)}{p\cdot\delta_{1}\cdot t+(1-2p)}. For p<0.5,p<0.5, this is maximized when t=1t=1 over the range t[0,1]t\in[0,1], so we can bound the ratio by

    12p+δ112p+pδ1.\frac{1-2p+\delta_{1}}{1-2p+p\cdot\delta_{1}}. (1.f’)
  5. g)

    There is a neighbor of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟑)\boldsymbol{H(\delta_{3})} that is also in 𝑰𝟐\boldsymbol{I_{2}}. In this case, either d(i,i2)δ3tid(i^{*},i_{2})\leq\delta_{3}\cdot t_{i^{*}} so d(i2,j)max(0,d(j,i)δ3ti)d(i_{2},j)\geq\max(0,d(j,i^{*})-\delta_{3}\cdot t_{i^{*}}), or there is some other point i2I2i_{2}^{\prime}\in I_{2} not in N(j)N(j) such that d(i,i2)δ3min(ti,ti2).d(i^{*},i_{2}^{\prime})\leq\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}). If d(i,i2)δ3tid(i^{*},i_{2})\leq\delta_{3}\cdot t_{i^{*}}, then define t=tit=t_{i^{*}} and u=d(j,i)u=d(j,i^{*}). In this case, d(j,I1)u+δ1t,d(j,I_{1})\leq u+\delta_{1}\cdot t, and d(j,i2)max(0,uδ3t).d(j,i_{2})\geq\max(0,u-\delta_{3}\cdot t). So, the fraction is at most

    (1p)(u+δ1t)+pd(j,i2)1p+pd(j,i2)(1p)(u+δ1t)+pmax(0,utδ3)1p+pmax(0,utδ3).\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot d(j,i_{2})}{1-p+p\cdot d(j,i_{2})}\leq\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot\max(0,u-t\cdot\delta_{3})}{1-p+p\cdot\max(0,u-t\cdot\delta_{3})}.

    Since t=ti1t=t_{i^{*}}\leq 1 and d(j,i)1d(j,i^{*})\leq 1, we can bound the overall fraction as at most

    max0t1max0u1(1p)(u+δ1t)+pmax(0,utδ3)1p+pmax(0,utδ3)\displaystyle\hskip 14.22636pt\max_{0\leq t\leq 1}\max_{0\leq u\leq 1}\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot\max(0,u-t\cdot\delta_{3})}{1-p+p\cdot\max(0,u-t\cdot\delta_{3})}
    max(δ1+δ3,1+δ1p(δ1+δ3)1pδ3)\displaystyle\leq\max\left(\delta_{1}+\delta_{3},\frac{1+\delta_{1}-p(\delta_{1}+\delta_{3})}{1-p\cdot\delta_{3}}\right) (1.g.i’)

    We derive the final inequality in Appendix B.

    Alternatively, if d(i,i2)δ3min(ti,ti2),d(i^{*},i_{2}^{\prime})\leq\delta_{3}\cdot\min(t_{i^{*}},t_{i_{2}^{\prime}}), then if we condition on i2S,i_{2}\in S, the fraction is 11 (or 0/00/0), and if we condition on i2Si_{2}\not\in S, the denominator is 11 and the numerator is at most pd(j,i2)+(1p)d(j,i1)p(1+δ3)+(1p)(1+δ1).p\cdot d(j,i_{2}^{\prime})+(1-p)\cdot d(j,i_{1})\leq p\cdot(1+\delta_{3})+(1-p)\cdot(1+\delta_{1}). (Note that i2Si_{2}\in S and i2Si_{2}^{\prime}\in S are independent.) Therefore, we can also bound the overall fraction by

    p(1+δ3)+(1p)(1+δ1).p\cdot(1+\delta_{3})+(1-p)\cdot(1+\delta_{1}). (1.g.ii’)
  6. h)

    There is a neighbor of 𝒊\boldsymbol{i^{*}} in 𝑯(𝜹𝟐)\boldsymbol{H(\delta_{2})} that is also in 𝑰𝟏\boldsymbol{I_{1}}. In this case, ii^{*} would not be in V2V_{2}, so we are back to sub-case 1.a’.
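To get a feel for the case analysis, the closed-form bounds from Case 1 can be evaluated at a sample parameter value; the sketch below (illustrative, using $p=0.3$ with the fixed $\delta_{1},\delta_{2},\delta_{3}$ of Lemma 6.2) checks that every sub-case bound stays strictly below $1+\sqrt{2}$:

```python
import math

d1, d2, d3, p = math.sqrt(2), 1.395, 2 - math.sqrt(2), 0.3

def bound_b(T):
    # the expression shared by (1.b'), (1.e'), (2.a'), (2.b');
    # the inner square root is nonnegative since d2^2 * X * Y < 2 (X + Y)^2
    X = p * p + p * (1 - p) * T
    Y = (1 - p) ** 2 + p * (1 - p) / T
    return math.sqrt(3 * (X + Y)
                     + 2 * math.sqrt(2 * (X + Y) ** 2 - d2 ** 2 * X * Y))

case1_bounds = {
    "1.a'": 1 + d2,
    "1.b'": min(bound_b(0.1 + 0.01 * k) for k in range(500)),  # grid for inf_T
    "1.c'/1.d'": 1 + (1 - p) * d1,
    "1.f'": (1 - 2 * p + d1) / (1 - 2 * p + p * d1),
    "1.g.i'": max(d1 + d3, (1 + d1 - p * (d1 + d3)) / (1 - p * d3)),
    "1.g.ii'": p * (1 + d3) + (1 - p) * (1 + d1),
}
assert max(case1_bounds.values()) < 1 + math.sqrt(2)
```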

Case 2: 𝒂=𝟎,𝒃=𝟎,𝒄𝟏\boldsymbol{a=0,b=0,c\leq 1}.

We again let $i^{*}$ be the witness of $j$. In this case, if $i^{*}\not\in V_{2}$, then there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\delta_{2}\cdot\min(t_{i^{*}},t_{i_{1}})\leq\delta_{2}$, in which case $d(j,I_{1})\leq 1+\delta_{2}$. Otherwise, there exists $i_{1}\in I_{1}$ such that $d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}})$, and there exists $i_{2}\in I_{2}$ such that $d(i^{*},i_{2})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}})$. Finally, in this case we also have that $d(i_{1},i_{2})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}})$. Now, we consider two subcases, either $c=0$ or $c=1$.

  1. a)

    𝒄=𝟎.\boldsymbol{c=0.} In this case, we have that the denominator is 11, and the numerator is either at most 1+δ21+\delta_{2}, or is at most pji22+(1p)ji12p\cdot\|j-i_{2}\|_{2}+(1-p)\cdot\|j-i_{1}\|_{2}, where d(j,i)1d(j,i^{*})\leq 1, d(i,i1)δ1min(ti,ti1)d(i^{*},i_{1})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{1}}), d(i,i2)δ1min(ti,ti2)d(i^{*},i_{2})\leq\delta_{1}\cdot\min(t_{i^{*}},t_{i_{2}}), and d(i1,i2)δ2min(ti1,ti2)d(i_{1},i_{2})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{2}}). Hence, we can bound the overall fraction, by the same computation as in the kk-median subcase 1.b), as

    max(1+δ2,infT>03(X+Y)+22(X+Y)2δ22XY),\max\left(1+\delta_{2},\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right), (2.a’)

    where X=p2+p(1p)TX=p^{2}+p(1-p)\cdot T and Y=(1p)2+p(1p)T.Y=(1-p)^{2}+\frac{p(1-p)}{T}.

  2. b)

    \boldsymbol{c=1.} In this case, let $i_{3}$ be the unique point in $N(j)\cap I_{3}$. Then, conditioned on $i_{3}$ being in $S$, the numerator and denominator both equal $d(j,i_{3})$. Otherwise, the denominator is $1$ and we can bound the numerator the same way as in subcase 2a), since the probability of $i_{2}\in S$ is either $p$ (if $q(i_{3})\neq i_{2}$) or $\frac{p}{1-p}\geq p$ (if $q(i_{3})=i_{2}$). So, we can bound the overall fraction again as

    max(1+δ2,infT>03(X+Y)+22(X+Y)2δ22XY),\max\left(1+\delta_{2},\inf_{T>0}\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right), (2.b’)

    where X=p2+p(1p)TX=p^{2}+p(1-p)\cdot T and Y=(1p)2+p(1p)T.Y=(1-p)^{2}+\frac{p(1-p)}{T}.

Case 3: 𝒂=𝟎\boldsymbol{a=0}, all other cases.

Note that in this case, we may assume b+c=|N(j)(I2I3)|2b+c=|N(j)\cap(I_{2}\cup I_{3})|\geq 2, since we already took care of all cases when a=0a=0 and b+c1b+c\leq 1. We split into two main subcases.

  1. a)

    Every point 𝒊\boldsymbol{i} in 𝑵(𝒋)(𝑰𝟐𝑰𝟑)\boldsymbol{N(j)\cap(I_{2}\cup I_{3})} satisfies 𝒅(𝒋,𝒊)𝜹𝟏𝟏.\boldsymbol{d(j,i)\geq\delta_{1}-1.} In this case, let I I2I3\accentset{\rule{2.79996pt}{0.7pt}}{I}\subset I_{2}\cup I_{3} represent the set of points selected to be in SS. Note that I \accentset{\rule{2.79996pt}{0.7pt}}{I} is a random set.

    Note that with probability at least 2p2p22p-2p^{2}, |N(j)I |1|N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1. (Since N(j)(I2I3)N(j)\cap(I_{2}\cup I_{3}) has size at least 22, the probability of I N(j)\accentset{\rule{2.79996pt}{0.7pt}}{I}\cap N(j) being nonempty is minimized when b=0,c=2b=0,c=2, and the two points in N(j)I3N(j)\cap I_{3} map to the same point under qq.) In this event, let h=|N(j)I |h=|N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}|, and let r1,,rhr_{1},\dots,r_{h} represent the distances from jj to each of the points in N(j)I N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}. Then, by Lemma 6.1, r1++rhhh1h.\frac{r_{1}+\cdots+r_{h}}{h}\geq\sqrt{\frac{h-1}{h}}. So, if we set r=r1++rhh,r=\frac{r_{1}+\cdots+r_{h}}{h}, then r2[rh(h1)]r\leq 2\left[r\cdot h-(h-1)\right] for any h1h\geq 1 and rh1h,r\geq\sqrt{\frac{h-1}{h}}, which means that minrir1++rhh2(1i=1h(1ri))\min r_{i}\leq\frac{r_{1}+\cdots+r_{h}}{h}\leq 2\cdot\left(1-\sum_{i=1}^{h}(1-r_{i})\right).

    In addition, if |I |=1|\accentset{\rule{2.79996pt}{0.7pt}}{I}|=1, then 1(1ri)=r1δ11=211-\sum(1-r_{i})=r_{1}\geq\delta_{1}-1=\sqrt{2}-1, and otherwise, because every point in I \accentset{\rule{2.79996pt}{0.7pt}}{I} is separated by at least δ1=2\delta_{1}=\sqrt{2} distance, 1(1ri)h(h1)(h1)211-\sum(1-r_{i})\geq\sqrt{h(h-1)}-(h-1)\geq\sqrt{2}-1 by Lemma 6.1. Overall, this means that whenever h=|N(j)I |1,h=|N(j)\cap\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1, minri2(1(1ri))\min r_{i}\leq 2\cdot\left(1-\sum(1-r_{i})\right) and 1(1ri)211-\sum(1-r_{i})\geq\sqrt{2}-1.

    In addition, if I =\accentset{\rule{2.79996pt}{0.7pt}}{I}=\emptyset, then the denominator is 11 and the numerator is at most d(j,w(j))+d(w(j),I1)1+2d(j,w(j))+d(w(j),I_{1})\leq 1+\sqrt{2}. Therefore, if we let qq be the probability that |I |1|\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1 and tt be the expectation of 1(1ri)1-\sum(1-r_{i}) conditioned on |I |1|\accentset{\rule{2.79996pt}{0.7pt}}{I}|\geq 1, the overall fraction is at most

    (1+2)(1q)+2tq(1q)+tq\displaystyle\frac{(1+\sqrt{2})\cdot(1-q)+2\cdot t\cdot q}{(1-q)+t\cdot q} (1+2)(1q)+2(21)q(1q)+(21)q\displaystyle\leq\frac{(1+\sqrt{2})\cdot(1-q)+2\cdot(\sqrt{2}-1)\cdot q}{(1-q)+(\sqrt{2}-1)q}
    (1+2)(32)(2p2p2)1(22)(2p2p2).\displaystyle\leq\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})}. (3.a’)
  2. b)

    There exists a point 𝒊𝑵(𝒋)(𝑰𝟐𝑰𝟑)\boldsymbol{i\in N(j)\cap(I_{2}\cup I_{3})} such that 𝒅(𝒋,𝒊)<𝜹𝟏𝟏.\boldsymbol{d(j,i)<\delta_{1}-1.} In this case, note that d(i,i)<δ1d(i,i^{\prime})<\delta_{1} for all points iN(j)(I2I3)i^{\prime}\in N(j)\cap(I_{2}\cup I_{3}). Assuming b+c2b+c\geq 2, this is only possible if either:

    1. i)

      b=1,c=1b=1,c=1 and the unique points i2N(j)I2i_{2}\in N(j)\cap I_{2} and i3N(j)I3i_{3}\in N(j)\cap I_{3} satisfy q(i3)=i2q(i_{3})=i_{2}, or

    2. ii)

      b=1,c2b=1,c\geq 2, the unique point i2N(j)I2i_{2}\in N(j)\cap I_{2} is the only point with d(j,i)<δ11d(j,i)<\delta_{1}-1, and every point in N(j)I3N(j)\cap I_{3} maps to i2i_{2} under qq.

    First, assume Case b)i. Let $r=d(j,i_{2})$ and $s=d(j,i_{3})$. Then, $\mathbb{E}\left[1-\sum_{i\in N(j)\cap S}(1-d(j,i))\right]=(1-2p)\cdot 1+p\cdot r+p\cdot s$, and the expected distance $d(j,S)$ is at most $p\cdot r+p\cdot s+(1-2p)\cdot(1+\sqrt{2})$. Since $d(i_{2},i_{3})\geq\delta_{3}$, the triangle inequality gives $r+s\geq\delta_{3}$, so the overall fraction is at most

    (1+2)(12p)+δ3p(12p)+δ3p.\frac{(1+\sqrt{2})\cdot(1-2p)+\delta_{3}\cdot p}{(1-2p)+\delta_{3}\cdot p}. (3.b.i’)

    Next, assume Case b)ii. Let $r=d(j,i_{2})$, and let $s_{1},\dots,s_{c}$ be the distances from $j$ to each of the $c$ points in $N(j)\cap I_{3}$. Let $s=\frac{s_{1}+\cdots+s_{c}}{c}$. Then, $\mathbb{E}\left[1-\sum_{i\in N(j)\cap S}(1-d(j,i))\right]=1-p(1-r)-\sum_{i=1}^{c}p(1-s_{i})\geq 1-p(1+c)+p\cdot(r+s\cdot c)$. In addition, $\mathbb{E}[c(j,S)]$ is at most $(1+\sqrt{2})\cdot(\frac{1}{2}-p)+(1+\sqrt{2})\cdot\frac{1}{2}(1-2p)^{c}+p\cdot r+\frac{1}{2}\left(1-(1-2p)^{c}\right)\cdot s$. Since the numerator and denominator grow at the same rate with respect to $r$, and the numerator grows slower with respect to $s$ than the denominator, we wish to minimize $r$ and $s$ to maximize the fraction. So, we set $r=0$, and $s=\sqrt{\frac{c-1}{c}}$ by Lemma 6.1. Therefore, the fraction is at most

    (1+2)(12(12p)+12(12p)c)+12(1(12p)c)c1c1p(1+cc(c1))\frac{(1+\sqrt{2})\cdot\left(\frac{1}{2}(1-2p)+\frac{1}{2}(1-2p)^{c}\right)+\frac{1}{2}\left(1-(1-2p)^{c}\right)\cdot\sqrt{\frac{c-1}{c}}}{1-p(1+c-\sqrt{c(c-1)})} (3.b.ii’)
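As a quick numerical sanity check (our own sketch, not part of the paper's verification code), the two displayed bounds can be evaluated at p = 0.068, the value used later in Proposition 6.3; both stay below the claimed 2.395:

```python
import math

p = 0.068
sqrt2 = math.sqrt(2)
delta3 = 2 - sqrt2

# Subcase (3.b.i'): worst-case ratio when b = c = 1.
bi = ((1 + sqrt2)*(1 - 2*p) + delta3*p) / ((1 - 2*p) + delta3*p)

# Subcase (3.b.ii') as a function of c >= 2.
def bii(c):
    num = ((1 + sqrt2)*(0.5*(1 - 2*p) + 0.5*(1 - 2*p)**c)
           + 0.5*(1 - (1 - 2*p)**c)*math.sqrt((c - 1)/c))
    den = 1 - p*(1 + c - math.sqrt(c*(c - 1)))
    return num / den

worst = max([bi] + [bii(c) for c in range(2, 200)])
print(round(worst, 4))
```

On this grid the binding bound is (3.b.i'), at roughly 2.35.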

Case 4: 𝒂𝟏\boldsymbol{a\geq 1}.

First, we will condition on the fair coin flips, and let I N(j)(I2I3)\accentset{\rule{2.79996pt}{0.7pt}}{I}\subset N(j)\cap(I_{2}\cup I_{3}) be the set of “surviving” points, i.e., the points that will be included in SS with probability 2p2p. Note all points in I \accentset{\rule{2.79996pt}{0.7pt}}{I} have pairwise distance at least δ1=2\delta_{1}=\sqrt{2} from each other, and all points in N(j)I1N(j)\cap I_{1} have pairwise distance at least δ1\delta_{1} from each other also. However, the points in N(j)I1N(j)\cap I_{1} and I \accentset{\rule{2.79996pt}{0.7pt}}{I} are only guaranteed to have pairwise distance at least δ2\delta_{2} from each other. Let hh represent the size |I ||\accentset{\rule{2.79996pt}{0.7pt}}{I}|.

We consider several subcases.

  1. a)

    𝒉=𝟎.\boldsymbol{h=0.} In this case, we can use the same bounds as Cases 2 and 3 of the simpler 1+21+\sqrt{2}-approximation, since we only have to worry about points in I1N(j).I_{1}\cap N(j). Indeed, the same bounds on the numerator and denominator still hold, so the ratio is at most

    2. (4.a’)
  2. b)

    𝒂=𝟏,𝒉=𝟏.\boldsymbol{a=1,h=1.} In this case, let i1i_{1} be the unique point in N(j)I1N(j)\cap I_{1}, and let i2i_{2} be the unique point in I \accentset{\rule{2.79996pt}{0.7pt}}{I}. Then, d(i1,i2)δ2,d(i_{1},i_{2})\geq\delta_{2}, so if t=d(j,i1)t=d(j,i_{1}) and u=d(j,i2)u=d(j,i_{2}), then the denominator in expectation is 1(1t)2p(1u)=t2p(1u)t(12p)1-(1-t)-2p(1-u)=t-2p(1-u)\geq t\cdot(1-2p), since t+ud(i1,i2)δ2>1t+u\geq d(i_{1},i_{2})\geq\delta_{2}>1 implies 1ut1-u\leq t. But, the numerator 𝔼[c(j,S)]\mathbb{E}[c(j,S)] is at most tt, so the overall fraction is at most

    112p.\frac{1}{1-2p}. (4.b’)
  3. c)

    𝒂=𝟏,𝒉𝟐.\boldsymbol{a=1,h\geq 2.} Let i1i_{1} be the unique point in N(j)I1.N(j)\cap I_{1}. Then, we must have that d(j,i1)δ21.d(j,i_{1})\geq\delta_{2}-1. Letting t=d(j,i1)t=d(j,i_{1}), we have that d(j,S)td(j,S)\leq t, but 𝔼[αjiN(j)S(αjc(j,i))]t2piI (1c(j,i)).\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq t-2p\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{I}}(1-c(j,i)). However, we know that iI (1c(j,i))hh(h1)22\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{I}}(1-c(j,i))\leq h-\sqrt{h(h-1)}\leq 2-\sqrt{2} by Lemma 6.1, so the denominator is at least t(22)2pt-(2-\sqrt{2})\cdot 2p. So, the ratio is at most tt2(22)p,\frac{t}{t-2(2-\sqrt{2})p}, which is maximized when tt is as small as possible, namely t=δ21t=\delta_{2}-1. So, the ratio is at most

    δ21(δ21)2(22)p.\frac{\delta_{2}-1}{(\delta_{2}-1)-2(2-\sqrt{2})p}. (4.c’)
  4. d)

    𝒂𝟐,𝒉=𝟏.\boldsymbol{a\geq 2,h=1.} In this case, let i2i_{2} be the unique point in I \accentset{\rule{2.79996pt}{0.7pt}}{I}, and let t=d(j,i2)t=d(j,i_{2}). Note that d(j,i2)δ21d(j,i_{2})\geq\delta_{2}-1, so 1d(j,i2)2δ21-d(j,i_{2})\leq 2-\delta_{2}. In addition, if the distances from jj to the points in N(j)I1N(j)\cap I_{1} are r1,,rar_{1},\dots,r_{a}, then d(j,S)r1++raa.d(j,S)\leq\frac{r_{1}+\cdots+r_{a}}{a}. If we let r=r1++raar=\frac{r_{1}+\cdots+r_{a}}{a}, then 𝔼[αjiN(j)S(αjc(j,i))]=1a(1r)2p(1t)1a(1r)(2δ2)2p.\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]=1-a(1-r)-2p(1-t)\geq 1-a(1-r)-(2-\delta_{2})\cdot 2p. So, the overall fraction is at most r1a(1r)(2δ2)2p.\frac{r}{1-a(1-r)-(2-\delta_{2})\cdot 2p}. It is clear that this function decreases as rr increases, so we want to set rr as small as possible. However, we know that ra1ar\geq\sqrt{\frac{a-1}{a}} by Lemma 6.1, so the overall fraction is at most

    a1aa(a1)(a1)(2δ2)2p=1(aa(a1))(2δ2)2paa1.\frac{\sqrt{\frac{a-1}{a}}}{\sqrt{a(a-1)}-(a-1)-(2-\delta_{2})\cdot 2p}=\frac{1}{(a-\sqrt{a(a-1)})-(2-\delta_{2})\cdot 2p\cdot\sqrt{\frac{a}{a-1}}}.

    The denominator decreases as aa grows, so the overall fraction is at most its limit as aa\to\infty, which is

    112(2δ2)2p.\frac{1}{\frac{1}{2}-(2-\delta_{2})\cdot 2p}. (4.d’)
  5. e)

    𝒂,𝒉𝟐.\boldsymbol{a,h\geq 2.} In this case, let the distances from jj to the points in N(j)I1N(j)\cap I_{1} be r1,,rar_{1},\dots,r_{a}, and let the distances from jj to the points in I \accentset{\rule{2.79996pt}{0.7pt}}{I} be s1,,shs_{1},\dots,s_{h}. Also, let r=r1++raa,r=\frac{r_{1}+\cdots+r_{a}}{a}, and let s=s1++shh.s=\frac{s_{1}+\cdots+s_{h}}{h}. Then, we have that the numerator is at most rr, and the denominator is at least 1a(1r)2ph(1s)1-a\cdot(1-r)-2p\cdot h\cdot(1-s). Next, note that by Lemma 6.1, sh1hs\geq\sqrt{\frac{h-1}{h}}, so h(1s)h(1h1h)=hh(h1)22h\cdot(1-s)\leq h\cdot\left(1-\sqrt{\frac{h-1}{h}}\right)=h-\sqrt{h(h-1)}\leq 2-\sqrt{2}. So, the fraction is at most r1a(1r)2p(22)\frac{r}{1-a(1-r)-2p\cdot(2-\sqrt{2})}. This is exactly the same as in subcase 4d), except there the denominator was 1a(1r)(2δ2)2p,1-a(1-r)-(2-\delta_{2})\cdot 2p, i.e., we just replaced δ2\delta_{2} with 2\sqrt{2}. So, the same calculations give us that we can bound the overall fraction by at most

    112(22)2p.\frac{1}{\frac{1}{2}-(2-\sqrt{2})\cdot 2p}. (4.e’)
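The five Case 4 bounds (4.a')–(4.e') can likewise be evaluated at p = 0.068 and δ2 = 1.395 (a sketch of ours, not the paper's verification code); the maximum, attained by (4.d'), stays below 2.395:

```python
import math

p = 0.068
delta2 = 1.395
sqrt2 = math.sqrt(2)

bounds = {
    "4.a'": 2.0,
    "4.b'": 1 / (1 - 2*p),
    "4.c'": (delta2 - 1) / ((delta2 - 1) - 2*(2 - sqrt2)*p),
    "4.d'": 1 / (0.5 - (2 - delta2)*2*p),
    "4.e'": 1 / (0.5 - (2 - sqrt2)*2*p),
}
# The a -> infinity limit taken in subcase 4.d is safe because
# a - sqrt(a(a-1)) decreases toward 1/2 from above:
assert all(a - math.sqrt(a*(a - 1)) > 0.5 for a in range(1, 100))

worst = max(bounds.values())
print(max(bounds, key=bounds.get), round(worst, 4))
```

This prints the bottleneck subcase together with a value just under 2.395.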

Finally, we bound the actual LMP approximation constant, similar to Proposition 4.7 for the kk-means case. We have the following proposition, which will immediately follow from analyzing all subcases carefully (see Lemma 6.5).

Proposition 6.3.

For p=0.068,p=0.068, ρ(p)2.395\rho(p)\leq 2.395. Hence, we can obtain a 2.3952.395-LMP approximation.

6.3 Improved kk-median approximation

In this section, we explain how our LMP approximation for kk-median implies an improved polynomial time kk-median approximation for any fixed kk. We set p1=0.068p_{1}=0.068 and δ1=2,δ2=1.395\delta_{1}=\sqrt{2},\delta_{2}=1.395, and δ3=22\delta_{3}=2-\sqrt{2}. In this case, we have that ρ(p1)2.395\rho(p_{1})\leq 2.395 by Proposition 6.3.

Next, we have that all of the results in Subsections 5.1 and 5.3 hold in the kk-median context, with two changes. The first, more obvious, change is that Lemma 5.10 (and all subsequent results in Section 5.3) needs to use the function ρ\rho associated with kk-median as opposed to the function associated with kk-means.

The second change is that Lemma 5.9 no longer holds for p0.5p\leq 0.5, but still holds for pp0p\leq p_{0} for some fixed choice p0p_{0}. Indeed, for Cases 1, 2, and 3 (i.e., when a=0a=0), we have that 𝔼[αjiN(j)S(αjc(j,i))]0\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0 for any p0.5p\leq 0.5, since I1=I_{1}=\emptyset, and iN(j)I2(αjc(j,i))(bb(b1))αjαj\sum_{i\in N(j)\cap I_{2}}(\alpha_{j}-c(j,i))\leq(b-\sqrt{b(b-1)})\cdot\alpha_{j}\leq\alpha_{j} if |N(j)I2|=b|N(j)\cap I_{2}|=b, and likewise iN(j)I3(αjc(j,i))(cc(c1))αjαj\sum_{i\in N(j)\cap I_{3}}(\alpha_{j}-c(j,i))\leq(c-\sqrt{c(c-1)})\cdot\alpha_{j}\leq\alpha_{j} if |N(j)I3|=c|N(j)\cap I_{3}|=c. So, 𝔼[αjiN(j)S(αjc(j,i))](12p)αj0\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq(1-2p)\cdot\alpha_{j}\geq 0 for p0.5p\leq 0.5. For case 44 of kk-median, we verify that 𝔼[αjiN(j)S(αjc(j,i))]0\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right]\geq 0 after conditioning on I \accentset{\rule{2.79996pt}{0.7pt}}{I}. Indeed, if I =\accentset{\rule{2.79996pt}{0.7pt}}{I}=\emptyset (i.e., subcase 4.a), then this just equals αjiN(j)I1(αjc(j,i))αj(a(a1)(a1))0\alpha_{j}-\sum_{i\in N(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\alpha_{j}\cdot(\sqrt{a(a-1)}-(a-1))\geq 0. In the remaining subcases, the value 𝔼[αjiN(j)S(αjc(j,i))]\mathbb{E}\left[\alpha_{j}-\sum_{i\in N(j)\cap S}(\alpha_{j}-c(j,i))\right] is always nonnegative as long as the denominators of our final fractions are also nonnegative. So, we just need that 12p01-2p\geq 0, (δ21)2(22)p0(\delta_{2}-1)-2(2-\sqrt{2})p\geq 0, 12(2δ2)2p0\frac{1}{2}-(2-\delta_{2})\cdot 2p\geq 0, and 12(22)2p0\frac{1}{2}-(2-\sqrt{2})\cdot 2p\geq 0. These are all true as long as pδ212(22)0.337p\leq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\approx 0.337. Thus, we replace 0.50.5 with p0=0.337p_{0}=0.337.
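The threshold p0 = 0.337 can be recovered mechanically: each of the four denominators above is linear in p, so each constraint yields an upper limit on p, and p0 is their minimum. A small sketch of ours with δ2 = 1.395:

```python
import math

delta2 = 1.395
sqrt2 = math.sqrt(2)

# Solve each "denominator >= 0" constraint for p.
p_limits = [
    0.5,                               # 1 - 2p >= 0
    (delta2 - 1) / (2*(2 - sqrt2)),    # (delta2 - 1) - 2(2 - sqrt2)p >= 0
    1 / (4*(2 - delta2)),              # 1/2 - (2 - delta2) * 2p >= 0
    1 / (4*(2 - sqrt2)),               # 1/2 - (2 - sqrt2) * 2p >= 0
]
p0 = min(p_limits)
print(round(p0, 4))
```

The binding constraint is the second one, at about 0.3372, so rounding down to 0.337 is valid.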

Overall, the rest of Subsection 5.3 goes through, except that our final bound will be

(1+O(1C+ε+γ))maxr1min(ρ(p1r),ρ(p1)(1+14r(p0rp11))),\left(1+O(\frac{1}{C}+\varepsilon+\gamma)\right)\cdot\max_{r\geq 1}\min\left(\rho\left(\frac{p_{1}}{r}\right),\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{p_{0}\cdot r}{p_{1}}-1\right)}\right)\right), (44)

where p1=0.068p_{1}=0.068 and p0=0.337p_{0}=0.337. The main replacement here is that we replaced r2p1=0.5rp1\frac{r}{2p_{1}}=\frac{0.5\cdot r}{p_{1}} with 0.337rp1.\frac{0.337\cdot r}{p_{1}}. We can use this to obtain a 2.4082.408-approximation, improving over 1+21+\sqrt{2}. We will not elaborate on this, however, as we will see that using the method in Subsection 5.4, we can further improve this to 2.4062.406.
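As a sanity check on (44), the max-min can be evaluated numerically. The sketch below is ours: it uses the closed-form upper bounds of Lemma 6.5 as a stand-in for ρ (hence only r with p1/r ≥ 0.01, the lemma's range, is scanned), and the grid step is an arbitrary choice:

```python
import math

p1, p0 = 0.068, 0.337
sqrt2 = math.sqrt(2)
delta2 = 1.395

def rho(p):
    # Max of the three per-case upper bounds from Lemma 6.5
    # (valid for p in [0.01, 0.068]); used here as a stand-in for rho(p).
    X = p*p + p*(1 - p)*1.1
    Y = (1 - p)**2 + p*(1 - p)/1.1
    r1 = max(1 + delta2,
             math.sqrt(3*(X + Y) + 2*math.sqrt(2*(X + Y)**2 - delta2**2*X*Y)))
    q = 2*p - 2*p*p
    r2 = ((1 + sqrt2) - (3 - sqrt2)*q) / (1 - (2 - sqrt2)*q)
    r3 = 1 / (0.5 - 2*(2 - delta2)*p)
    return max(r1, r2, r3)

best, r = 0.0, 1.0
while p1 / r >= 0.01:               # keep p1/r inside the lemma's range
    second = rho(p1) * (1 + 1 / (4*r*(p0*r/p1 - 1)))
    best = max(best, min(rho(p1 / r), second))
    r += 0.001
print(round(best, 3))
```

Up to grid resolution, the printed value lands at the 2.408 figure quoted above.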

We split the clients this time into 33 groups. We let 𝒟1\mathcal{D}_{1} be the set of clients j𝒟Bj\not\in\mathcal{D}_{B} corresponding to all subcases in Cases 1 and 2, 𝒟2\mathcal{D}_{2} be the set of clients j𝒟Bj\not\in\mathcal{D}_{B} corresponding to all subcases in Case 3, and 𝒟3\mathcal{D}_{3} be the set of clients corresponding to all subcases in Case 4, and all bad clients j𝒟B.j\in\mathcal{D}_{B}. For any client jj, as in Subsection 5.4, we define Aj:=αjiN (j)I1(αjc(j,i))A_{j}:=\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i)) and Bj:=iN (j)(I2I3)(αjc(j,i))B_{j}:=\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)). We also define Q1,Q2,Q3,R1,R2,R3Q_{1},Q_{2},Q_{3},R_{1},R_{2},R_{3} similar to how we did for the kk-means case.

Similar to Lemma 5.18 in the kk-means case, we have the following result for the kk-median case.

Lemma 6.4.

Let δ1=2\delta_{1}=\sqrt{2}, δ2=1.395\delta_{2}=1.395, and δ3=22\delta_{3}=2-\sqrt{2}. For any client j𝒟1j\in\mathcal{D}_{1}, AjBjA_{j}\geq B_{j}. For any client j𝒟2j\in\mathcal{D}_{2}, Aj12BjA_{j}\geq\frac{1}{2}B_{j}. Finally, for any client j𝒟3,j\in\mathcal{D}_{3}, Ajδ212(22)Bj.A_{j}\geq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\cdot B_{j}.

Proof.

Recall that a=|N (j)I1|a=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}|, b=|N (j)I2|b=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}|, and c=|N (j)I3|c=|\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}|. In case 11 or 22, we have that a=0a=0 and b+c1b+c\leq 1, so αjiN (j)I1(αjc(j,i))=αj\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}, and the sum iN (j)(I2I3)(αjc(j,i))\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)) is over at most a single point, so is at most αj\alpha_{j}. Thus, if j𝒟1,j\in\mathcal{D}_{1}, AjBjA_{j}\geq B_{j}. (Note that this even holds for bad clients j𝒟Bj\in\mathcal{D}_{B}.)

In case 33, we have that a=0a=0, so αjiN (j)I1(αjc(j,i))=αj\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}. In addition, all points i,ii,i^{\prime} in N (j)I2\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2} are separated by at least 2min(τi,τi)2αj\sqrt{2}\cdot\min(\tau_{i},\tau_{i^{\prime}})\geq\sqrt{2}\cdot\alpha_{j}. Hence, by Lemma 6.1, iN (j)I2(αjc(j,i))αj(bb(b1))αj\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i))\leq\alpha_{j}\cdot(b-\sqrt{b(b-1)})\leq\alpha_{j}. Likewise, iN (j)I3(αjc(j,i))αj(cc(c1))αj\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i))\leq\alpha_{j}\cdot(c-\sqrt{c(c-1)})\leq\alpha_{j}. So, αjiN (j)I1(αjc(j,i))=αj12iN (j)(I2I3)(αjc(j,i)).\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))=\alpha_{j}\geq\frac{1}{2}\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap(I_{2}\cup I_{3})}(\alpha_{j}-c(j,i)). Thus, if j𝒟2,j\in\mathcal{D}_{2}, Aj12BjA_{j}\geq\frac{1}{2}B_{j}.

In case 44, we have that a1a\geq 1. We claim that in this case, αjiN (j)I1(αjc(j,i))δ2122iN (j)I2(αjc(j,i))\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\frac{\delta_{2}-1}{2-\sqrt{2}}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}}(\alpha_{j}-c(j,i)). This will follow from the fact that all points in N (j)I1\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1} are separated by at least 2αj\sqrt{2}\cdot\alpha_{j}, all points in N (j)I2\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2} are also separated by at least 2αj\sqrt{2}\cdot\alpha_{j}, and all points in N (j)I1\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1} are separated by at least δ2αj\delta_{2}\cdot\alpha_{j} from all points in N (j)I2\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{2}. In fact, this immediately follows from the bounding of the denominators in subcases 4.a’, 4.b’, 4.c’, 4.d’, and 4.e’, where we replace hh with bb. Likewise, αjiN (j)I1(αjc(j,i))δ2122iN (j)I3(αjc(j,i))\alpha_{j}-\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{1}}(\alpha_{j}-c(j,i))\geq\frac{\delta_{2}-1}{2-\sqrt{2}}\cdot\sum_{i\in\accentset{\rule{2.79996pt}{0.7pt}}{N}(j)\cap I_{3}}(\alpha_{j}-c(j,i)). Overall, we have that for all clients in case 44, Ajδ212(22)BjA_{j}\geq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\cdot B_{j}. Since Ajδ212(22)BjA_{j}\geq\frac{\delta_{2}-1}{2(2-\sqrt{2})}\cdot B_{j} in all cases, it also holds for the bad clients j𝒟Bj\in\mathcal{D}_{B} as well. ∎
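Lemma 6.1 enters the proof only through the quantity n − √(n(n−1)); a quick check of ours that it is at most 1, strictly decreasing, and at most 2 − √2 once n ≥ 2:

```python
import math

vals = [n - math.sqrt(n*(n - 1)) for n in range(1, 1000)]
assert all(v <= 1 for v in vals)                      # at most 1 (tight at n = 1)
assert all(x > y for x, y in zip(vals, vals[1:]))     # strictly decreasing
assert all(v <= 2 - math.sqrt(2) for v in vals[1:])   # n >= 2: at most 2 - sqrt(2)
print("ok")
```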

As a direct corollary, we have that

R1Q1,R22Q2,andR32(22)δ21Q3.R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq 2Q_{2},\hskip 14.22636pt\text{and}\hskip 14.22636ptR_{3}\leq\frac{2(2-\sqrt{2})}{\delta_{2}-1}Q_{3}. (45)

Next, similar to Lemma 5.19 in the kk-means case, we have the following result.

Lemma 6.5.

Let δ1=2,\delta_{1}=\sqrt{2}, δ2=1.395\delta_{2}=1.395, and δ3=22\delta_{3}=2-\sqrt{2}. Then, for all p[0.01,0.068],p\in[0.01,0.068], we have that ρ(1)(p)max(1+δ2,3(X+Y)+22(X+Y)2δ22XY),\rho^{(1)}(p)\leq\max\left(1+\delta_{2},\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right), where X=p2+p(1p)1.1X=p^{2}+p(1-p)\cdot 1.1 and Y=(1p)2+p(1p)1.1.Y=(1-p)^{2}+\frac{p(1-p)}{1.1}. In addition, for all p[0.01,0.068]p\in[0.01,0.068], we have that ρ(2)(p)(1+2)(32)(2p2p2)1(22)(2p2p2),\rho^{(2)}(p)\leq\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})}, and that ρ(3)(p)1122(2δ2)p\rho^{(3)}(p)\leq\frac{1}{\frac{1}{2}-2\cdot(2-\delta_{2})\cdot p}.

Proof.

To bound ρ(1)(p)\rho^{(1)}(p), we simply analyze all subcases in Case 1 and Case 2 and set T=1.1T=1.1. This is straightforward to verify (see, for instance, our Desmos files on kk-median in Appendix A).

To bound ρ(2)(p)\rho^{(2)}(p), we analyze all subcases in Case 3. Subcases 3.a’ and 3.b.i’ are straightforward to verify. For subcase 3.b.ii’, we must verify the bound for all choices of c2c\geq 2. For c=2c=2, we can verify manually. For c3c\geq 3, it is easy to see that the numerator of 3.b.ii’ is at most

12[(1+2)((12p)+(12p)c)+1(12p)c]\displaystyle\frac{1}{2}\left[(1+\sqrt{2})\cdot\left((1-2p)+(1-2p)^{c}\right)+1-(1-2p)^{c}\right] =12[1+(1+2)(12p)+2(12p)c]\displaystyle=\frac{1}{2}\left[1+(1+\sqrt{2})\cdot(1-2p)+\sqrt{2}\cdot(1-2p)^{c}\right]
12[1+(1+2)(12p)+2(12p)3],\displaystyle\leq\frac{1}{2}\left[1+(1+\sqrt{2})\cdot(1-2p)+\sqrt{2}\cdot(1-2p)^{3}\right],

and the denominator is at least 12p1-2p. So, the fraction is at most

12[1+(1+2)(12p)+2(12p)3]12p,\frac{\frac{1}{2}\cdot\left[1+(1+\sqrt{2})\cdot(1-2p)+\sqrt{2}\cdot(1-2p)^{3}\right]}{1-2p},

which is at most (1+2)(32)(2p2p2)1(22)(2p2p2)\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})} for all p[0.01,0.068]p\in[0.01,0.068].

Finally, it is straightforward to check that all 55 subcases in Case 44 are at most 1122(2δ2)p\frac{1}{\frac{1}{2}-2\cdot(2-\delta_{2})\cdot p} for all p[0.01,0.068]p\in[0.01,0.068]. ∎
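The closed forms in Lemma 6.5 can be spot-checked numerically; the sketch below (ours, not the paper's verification code) verifies the c ≥ 3 reduction used for ρ(2) on a grid of p ∈ [0.01, 0.068] and confirms that the maximum of the three bounds at p = 0.068 matches the 2.395 constant of Proposition 6.3:

```python
import math

sqrt2 = math.sqrt(2)
delta2 = 1.395

def rho1(p):
    X = p*p + p*(1 - p)*1.1
    Y = (1 - p)**2 + p*(1 - p)/1.1
    return max(1 + delta2,
               math.sqrt(3*(X + Y) + 2*math.sqrt(2*(X + Y)**2 - delta2**2*X*Y)))

def rho2(p):
    q = 2*p - 2*p*p
    return ((1 + sqrt2) - (3 - sqrt2)*q) / (1 - (2 - sqrt2)*q)

def rho3(p):
    return 1 / (0.5 - 2*(2 - delta2)*p)

def c3_bound(p):
    # The c >= 3 reduction derived in the proof of the rho^(2) bound.
    return 0.5*(1 + (1 + sqrt2)*(1 - 2*p) + sqrt2*(1 - 2*p)**3) / (1 - 2*p)

ps = [0.01 + i*(0.068 - 0.01)/1000 for i in range(1001)]
assert all(c3_bound(p) <= rho2(p) + 1e-12 for p in ps)
print(round(max(rho1(0.068), rho2(0.068), rho3(0.068)), 4))
```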

One can modify the remainder of the proof analogously to the kk-means case in Section 5.4. Hence, to show that we obtain an approximation ρ+O(ε+γ+1/C)\rho+O(\varepsilon+\gamma+1/C), it suffices to show that for all choices of θ[0,1]\theta\in[0,1] and r1,r\geq 1, if we let 𝔇=𝔇+O(γ)OPTk\mathfrak{D}^{\prime}=\mathfrak{D}+O(\gamma)\cdot\text{OPT}_{k}, one cannot simultaneously satisfy

𝔇\displaystyle\mathfrak{D}^{\prime} i=13(Qip1rRi)\displaystyle\geq\sum_{i=1}^{3}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right) (46)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <θri=13ρ(i)(p1)(Qip1Ri)+(1θr)ρ(p1)(𝔇+p1θri=13Ri)\displaystyle<\frac{\theta}{r}\sum_{i=1}^{3}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta}{r}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+p_{1}\cdot\frac{\theta}{r}\sum_{i=1}^{3}R_{i}\right) (47)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <i=13ρ(i)(p1r)(Qip1rRi)\displaystyle<\sum_{i=1}^{3}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right) (48)

and

R1Q1,R22Q2,R32(22)δ21Q3.R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq 2Q_{2},\hskip 28.45274ptR_{3}\leq\frac{2(2-\sqrt{2})}{\delta_{2}-1}Q_{3}. (49)

By numerical analysis of these linear constraints and based on the functions ρ(i)\rho^{(i)}, we obtain a 2.406\boxed{2.406}-approximation algorithm for Euclidean kk-median clustering. We defer the details to Appendix C.

Acknowledgments

The authors thank Ashkan Norouzi-Fard for helpful discussions relating to modifying the previous results on roundable solutions. The authors also thank Fabrizio Grandoni, Piotr Indyk, Euiwoong Lee, and Chris Schwiegelshohn for helpful conversations. Finally, we would like to thank an anonymous reviewer for providing a useful suggestion in removing one of the cases for kk-means.

References

  • [1] Sara Ahmadian, Ashkan Norouzi-Fard, Ola Svensson, and Justin Ward. Better guarantees for k-means and Euclidean k-median by primal-dual algorithms. SIAM Journal on Computing, 49(4):FOCS17–97–FOCS17–156, 2019.
  • [2] David Arthur and Sergei Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7-9, 2007, pages 1027–1035, 2007.
  • [3] David Arthur and Sergei Vassilvitskii. Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM J. Comput., 39(2):766–782, 2009.
  • [4] Vijay Arya, Naveen Garg, Rohit Khandekar, Adam Meyerson, Kamesh Munagala, and Vinayaka Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544–562, 2004.
  • [5] Pranjal Awasthi, Avrim Blum, and Or Sheffet. Stability yields a PTAS for k-median and k-means clustering. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pages 309–318, 2010.
  • [6] Pranjal Awasthi, Moses Charikar, Ravishankar Krishnaswamy, and Ali Kemal Sinop. The hardness of approximation of euclidean k-means. In Lars Arge and János Pach, editors, 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, volume 34 of LIPIcs, pages 754–767. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2015.
  • [7] Sayan Bandyapadhyay and Kasturi Varadarajan. On variants of k-means clustering. In 32nd International Symposium on Computational Geometry (SoCG 2016). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2016.
  • [8] Luca Becchetti, Marc Bury, Vincent Cohen-Addad, Fabrizio Grandoni, and Chris Schwiegelshohn. Oblivious dimension reduction for k-means: beyond subspaces and the johnson-lindenstrauss lemma. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 1039–1050, 2019.
  • [9] Jaroslaw Byrka, Thomas W. Pensyl, Bartosz Rybicki, Aravind Srinivasan, and Khoa Trinh. An improved approximation for k-median and positive correlation in budgeted optimization. ACM Trans. Algorithms, 13(2):23:1–23:31, 2017.
  • [10] Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for the facility location and k-median problems. In 40th Annual Symposium on Foundations of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA, pages 378–388, 1999.
  • [11] Moses Charikar and Sudipto Guha. Improved combinatorial algorithms for facility location problems. SIAM J. Comput., 34(4):803–824, 2005.
  • [12] Moses Charikar, Sudipto Guha, Éva Tardos, and David B. Shmoys. A constant-factor approximation algorithm for the kk-median problem. J. Comput. Syst. Sci., 65(1):129–149, 2002.
  • [13] Moses Charikar and Shi Li. A dependent LP-rounding approach for the k-median problem. In Automata, Languages, and Programming - 39th International Colloquium, ICALP 2012, Warwick, UK, July 9-13, 2012, Proceedings, Part I, pages 194–205, 2012.
  • [14] Vincent Cohen-Addad. A fast approximation scheme for low-dimensional k-means. In Artur Czumaj, editor, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 430–440. SIAM, 2018.
  • [15] Vincent Cohen-Addad, Andreas Emil Feldmann, and David Saulpic. Near-linear time approximations schemes for clustering in doubling metrics. In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pages 540–559. IEEE, 2019.
  • [16] Vincent Cohen-Addad, Anupam Gupta, Lunjia Hu, Hoon Oh, and David Saulpic. An improved local search algorithm for kk-median. In Proceedings of the Thirty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2022, pages 1556–1612. SIAM, 2022.
  • [17] Vincent Cohen-Addad, Anupam Gupta, Amit Kumar, Euiwoong Lee, and Jason Li. Tight FPT approximations for k-median and k-means. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 42:1–42:14. IEEE, 2019.
  • [18] Vincent Cohen-Addad and Karthik C. S. Inapproximability of clustering in lp metrics. In David Zuckerman, editor, 60th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2019, Baltimore, Maryland, USA, November 9-12, 2019, pages 519–539. IEEE Computer Society, 2019.
  • [19] Vincent Cohen-Addad, Euiwoong Lee, and Karthik C. S. Johnson coverage hypothesis: Inapproximability of kk-means and kk-median in p\ell_{p} metrics. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms, SODA 2022. SIAM, 2022.
  • [20] Vincent Cohen-Addad and Claire Mathieu. Effectiveness of local search for geometric optimization. In 31st International Symposium on Computational Geometry, SoCG 2015, June 22-25, 2015, Eindhoven, The Netherlands, pages 329–343, 2015.
  • [21] Vincent Cohen-Addad, Karthik C. S., and Euiwoong Lee. On approximability of clustering problems without candidate centers. In Dániel Marx, editor, Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms, SODA 2021, Virtual Conference, January 10 - 13, 2021, pages 2635–2648. SIAM, 2021.
  • [22] Vincent Cohen-Addad and Chris Schwiegelshohn. On the local structure of stable clustering instances. In Chris Umans, editor, 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017, pages 49–60. IEEE Computer Society, 2017.
  • [23] Sanjoy Dasgupta. The hardness of k-means clustering. Department of Computer Science and Engineering, University of California …, 2008.
  • [24] D. Feldman and M. Langberg. A unified framework for approximating and clustering data. In STOC, pages 569–578, 2011.
  • [25] Fabrizio Grandoni, Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Rakesh Venkat. A refined approximation for Euclidean k-means. Inf. Process. Lett., 176:106251, 2022.
  • [26] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. J. Algorithms, 31(1):228–248, 1999.
  • [27] Venkatesan Guruswami and Piotr Indyk. Embeddings and non-approximability of geometric problems. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, January 12-14, 2003, Baltimore, Maryland, USA., pages 537–538, 2003.
  • [28] S Louis Hakimi. Optimum locations of switching centers and the absolute centers and medians of a graph. Operations research, 12(3):450–459, 1964.
  • [29] Kamal Jain, Mohammad Mahdian, Evangelos Markakis, Amin Saberi, and Vijay V. Vazirani. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP. J. ACM, 50(6):795–824, 2003.
  • [30] Kamal Jain and Vijay V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. Journal of the ACM, 48(2):274–296, 2001.
  • [31] Tapas Kanungo, David M. Mount, Nathan S. Netanyahu, Christine D. Piatko, Ruth Silverman, and Angela Y. Wu. A local search approximation algorithm for k-means clustering. Comput. Geom., 28(2-3):89–112, 2004.
  • [32] Madhukar R. Korupolu, C. Greg Plaxton, and Rajmohan Rajaraman. Analysis of a local search heuristic for facility location problems. J. Algorithms, 37(1):146–188, 2000.
  • [33] Amit Kumar, Yogish Sabharwal, and Sandeep Sen. Linear-time approximation schemes for clustering problems in any dimensions. J. ACM, 57(2):5:1–5:32, 2010.
  • [34] Euiwoong Lee, Melanie Schmidt, and John Wright. Improved and simplified inapproximability for k-means. Inf. Process. Lett., 120:40–43, 2017.
  • [35] Shi Li. A 1.488 approximation algorithm for the uncapacitated facility location problem. Inf. Comput., 222:45–58, 2013.
  • [36] Shi Li and Ola Svensson. Approximating k-median via pseudo-approximation. SIAM J. Comput., 45(2):530–547, 2016.
  • [37] Stuart P. Lloyd. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137, 1982. Originally a Bell Telephone Laboratories technical note, 1957.
  • [38] Konstantin Makarychev, Yury Makarychev, and Ilya P. Razenshteyn. Performance of johnson-lindenstrauss transform for k-means and k-medians clustering. In Moses Charikar and Edith Cohen, editors, Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, STOC 2019, Phoenix, AZ, USA, June 23-26, 2019, pages 1027–1038. ACM, 2019.
  • [39] Konstantin Makarychev, Yury Makarychev, Maxim Sviridenko, and Justin Ward. A bi-criteria approximation algorithm for k-means. In Klaus Jansen, Claire Mathieu, José D. P. Rolim, and Chris Umans, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2016, September 7-9, 2016, Paris, France, volume 60 of LIPIcs, pages 14:1–14:20. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2016.
  • [40] Jirí Matousek. On approximate geometric k-clustering. Discrete & Computational Geometry, 24(1):61–84, 2000.
  • [41] Nimrod Megiddo and Kenneth J Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182–196, 1984.
  • [42] Rafail Ostrovsky, Yuval Rabani, Leonard J. Schulman, and Chaitanya Swamy. The effectiveness of Lloyd-type methods for the k-means problem. J. ACM, 59(6):28, 2012.
  • [43] Hugo Steinhaus. Sur la division des corps matériels en parties. Bull. Acad. Pol. Sci., Cl. III, 4:801–804, 1957.

Appendix A Desmos Graphs and Code

Here, we provide links for the Desmos files used to visualize the LMP approximations for both kk-means and kk-median, and the Python code used to improve the approximation factor for kk-means.

We provide graphs on Desmos for the LMP Approximation bounds for kk-means and kk-median, as functions of the probability pp. We remark that in some of these cases, there may be parameters (such as a,b,c,c1,c2,ha,b,c,c_{1},c_{2},h) that need to be set properly (which can be done via toggles on the respective Desmos link) to see the actual approximation ratio as a function of pp.

Case 11 of kk-means is available here: https://www.desmos.com/calculator/jd8ud6h2e9

Case 22 of kk-means is available here: https://www.desmos.com/calculator/pgtylk9eui

Case 33 of kk-means is available here: https://www.desmos.com/calculator/zjshynypsh

Case 44 of kk-means is available here: https://www.desmos.com/calculator/ibwult8qzs

Case 55 of kk-means is available here: https://www.desmos.com/calculator/pgtylk9eui

Case 11 of kk-median is available here: https://www.desmos.com/calculator/9qmscsfvrr

Case 22 of kk-median is available here: https://www.desmos.com/calculator/rdidyxhs2o

Case 33 of kk-median is available here: https://www.desmos.com/calculator/zoeswetvyz

Case 44 of kk-median is available here: https://www.desmos.com/calculator/mpwrmz7mhe

Appendix B Omitted Details for the LMP Approximations

First, we prove Proposition 4.4.

Proof of Proposition 4.4.

Let v1=BAv_{1}=B-A, v2=CBv_{2}=C-B, and v3=DB.v_{3}=D-B. Then,

pCA22+(1p)DA22\displaystyle p\cdot\|C-A\|_{2}^{2}+(1-p)\cdot\|D-A\|_{2}^{2} =pv1+v222+(1p)v1+v322\displaystyle=p\cdot\|v_{1}+v_{2}\|_{2}^{2}+(1-p)\cdot\|v_{1}+v_{3}\|_{2}^{2}
=v122+2v1,pv2+(1p)v3+pv222+(1p)v322\displaystyle=\|v_{1}\|_{2}^{2}+2\cdot\langle v_{1},pv_{2}+(1-p)v_{3}\rangle+p\cdot\|v_{2}\|_{2}^{2}+(1-p)\cdot\|v_{3}\|_{2}^{2}
1+2pv2+(1p)v32+pv222+(1p)v322,\displaystyle\leq 1+2\cdot\|pv_{2}+(1-p)v_{3}\|_{2}+p\cdot\|v_{2}\|_{2}^{2}+(1-p)\cdot\|v_{3}\|_{2}^{2},

since v121\|v_{1}\|_{2}\leq 1. Now, we can write

pv2+(1p)v32=pv2+(1p)v322=pv222+(1p)v322p(1p)v2v322.\|pv_{2}+(1-p)v_{3}\|_{2}=\sqrt{\|pv_{2}+(1-p)v_{3}\|_{2}^{2}}=\sqrt{p\cdot\|v_{2}\|_{2}^{2}+(1-p)\cdot\|v_{3}\|_{2}^{2}-p(1-p)\cdot\|v_{2}-v_{3}\|_{2}^{2}}.

So, we have that pCA22+(1p)DA22p\cdot\|C-A\|_{2}^{2}+(1-p)\cdot\|D-A\|_{2}^{2} is at most

1+2pν1min(σ1,σ2)+(1p)ν2min(σ1,σ3)p(1p)ν3min(σ2,σ3)+pν1min(σ1,σ2)+(1p)ν2min(σ1,σ3).1+2\cdot\sqrt{p\cdot\nu_{1}\cdot\min(\sigma_{1},\sigma_{2})+(1-p)\cdot\nu_{2}\cdot\min(\sigma_{1},\sigma_{3})-p(1-p)\cdot\nu_{3}\cdot\min(\sigma_{2},\sigma_{3})}\\ +p\cdot\nu_{1}\cdot\min(\sigma_{1},\sigma_{2})+(1-p)\cdot\nu_{2}\cdot\min(\sigma_{1},\sigma_{3}). (50)

It is simple to see that (50) is nondecreasing in σ1\sigma_{1} for a fixed σ2,σ3\sigma_{2},\sigma_{3}, so (50) is maximized when σ1=1\sigma_{1}=1. Next, when σ1=1\sigma_{1}=1, it is clear that (50) is non-increasing in σ2\sigma_{2} if σ21\sigma_{2}\geq 1 and likewise for σ3\sigma_{3}, so (50) is maximized for some σ2,σ31\sigma_{2},\sigma_{3}\leq 1. In this case, (50) simplifies to

1+2pν1σ2+(1p)ν2σ3p(1p)ν3min(σ2,σ3)+pν1σ2+(1p)ν2σ3.1+2\sqrt{p\cdot\nu_{1}\cdot\sigma_{2}+(1-p)\cdot\nu_{2}\cdot\sigma_{3}-p(1-p)\cdot\nu_{3}\cdot\min(\sigma_{2},\sigma_{3})}+p\cdot\nu_{1}\cdot\sigma_{2}+(1-p)\cdot\nu_{2}\cdot\sigma_{3}.

Now, using the fact that ν1,ν2ν3\nu_{1},\nu_{2}\geq\nu_{3} and that p,(1p)p(1p)p,(1-p)\geq p(1-p), we have that this expression is nondecreasing in both σ2,σ3\sigma_{2},\sigma_{3} as long as σ2,σ31\sigma_{2},\sigma_{3}\leq 1. So, we may upper bound (50), and thus pCA22+(1p)DA22p\cdot\|C-A\|_{2}^{2}+(1-p)\cdot\|D-A\|_{2}^{2}, by

1+2pν1+(1p)ν2p(1p)ν3+pν1+(1p)ν2,1+2\cdot\sqrt{p\cdot\nu_{1}+(1-p)\cdot\nu_{2}-p(1-p)\cdot\nu_{3}}+p\cdot\nu_{1}+(1-p)\cdot\nu_{2},

by setting σ2=σ3=1.\sigma_{2}=\sigma_{3}=1.
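The vector identity used in the middle step, ∥pv2+(1−p)v3∥² = p∥v2∥² + (1−p)∥v3∥² − p(1−p)∥v2−v3∥², is straightforward to sanity-check on random vectors (a self-contained sketch of ours):

```python
import random

def dot(u, v):
    return sum(x*y for x, y in zip(u, v))

random.seed(0)
for _ in range(100):
    p = random.random()
    v2 = [random.gauss(0, 1) for _ in range(5)]
    v3 = [random.gauss(0, 1) for _ in range(5)]
    mix = [p*x + (1 - p)*y for x, y in zip(v2, v3)]
    diff = [x - y for x, y in zip(v2, v3)]
    lhs = dot(mix, mix)
    rhs = p*dot(v2, v2) + (1 - p)*dot(v3, v3) - p*(1 - p)*dot(diff, diff)
    assert abs(lhs - rhs) < 1e-9
print("identity verified")
```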

Next, we complete the details in Lemmas 4.2 and 6.2 that we did not complete in the main body of the paper.

K-means: Case 1.c:

We wish to maximize

(1p)(t+δ1)2+pt21p(1t2)=(1p)(t+δ1)2+pt2(1p)+pt2,\frac{(1-p)\cdot(t+\sqrt{\delta_{1}})^{2}+p\cdot t^{2}}{1-p(1-t^{2})}=\frac{(1-p)\cdot(t+\sqrt{\delta_{1}})^{2}+p\cdot t^{2}}{(1-p)+p\cdot t^{2}},

over 0t1.0\leq t\leq 1. First, note that if t0.75t\leq\sqrt{0.75}, then we can bound this fraction by (t+δ1)2(0.75+δ1)2(t+\sqrt{\delta_{1}})^{2}\leq(\sqrt{0.75}+\sqrt{\delta_{1}})^{2}. Alternatively, if t0.75t\geq\sqrt{0.75}, then we can bound this fraction by at most

(1p)(1+δ1)2+pt2(1p)+pt2(1p)(1+δ1)2+3p/41p/4,\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot t^{2}}{(1-p)+p\cdot t^{2}}\leq\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+3p/4}{1-p/4},

where the left-hand side in the above equation has the numerator and denominator increasing at the same rate in terms of tt, so it is maximized when tt is minimized, i.e., t=0.75t=\sqrt{0.75}. Thus, we can bound the overall fraction as at most

max((0.75+δ1)2,(1p)(1+δ1)2+3p/41p/4).\max\left((\sqrt{0.75}+\sqrt{\delta_{1}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+3p/4}{1-p/4}\right).
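This bound is easy to confirm by a grid search over tt. A minimal sketch; the test values p1=0.402p_{1}=0.402 and δ1=(4+82)/7\delta_{1}=(4+8\sqrt{2})/7 are the paper's parameters, but any p[0,1)p\in[0,1), δ1>0\delta_{1}>0 should work:

```python
def case_1c_fraction(t, p, d1):
    # ((1-p)(t + sqrt(d1))^2 + p t^2) / ((1-p) + p t^2)
    return ((1 - p) * (t + d1 ** 0.5) ** 2 + p * t * t) / ((1 - p) + p * t * t)

def case_1c_bound(p, d1):
    # max((sqrt(0.75) + sqrt(d1))^2, ((1-p)(1 + sqrt(d1))^2 + 3p/4) / (1 - p/4))
    return max((0.75 ** 0.5 + d1 ** 0.5) ** 2,
               ((1 - p) * (1 + d1 ** 0.5) ** 2 + 3 * p / 4) / (1 - p / 4))

def check_case_1c(p, d1, steps=10_000):
    # grid search over t in [0, 1]; the fraction should never exceed the bound
    bound = case_1c_bound(p, d1) + 1e-9
    return all(case_1c_fraction(i / steps, p, d1) <= bound
               for i in range(steps + 1))
```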

K-means: Case 1.g.i:

We wish to maximize

(1p)(u+δ1t)2+pd(j,i2)21p+pd(j,i2)2\frac{(1-p)\cdot(u+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot d(j,i_{2})^{2}}{1-p+p\cdot d(j,i_{2})^{2}}

over t,u[0,1]t,u\in[0,1] and d(j,i2)uδ3td(j,i_{2})\geq u-\sqrt{\delta_{3}\cdot t}. First, note that if we treat d(j,i2)d(j,i_{2}) as a variable, the numerator and denominator increase at the same rate as d(j,i2)2d(j,i_{2})^{2} increases, so this fraction is maximized when d(j,i2)=max(0,uδ3t)d(j,i_{2})=\max(0,u-\sqrt{\delta_{3}\cdot t}). If uδ3t0u-\sqrt{\delta_{3}\cdot t}\leq 0, then this fraction equals (u+δ1t)2(u+\sqrt{\delta_{1}\cdot t})^{2}, but uδ3tδ3u\leq\sqrt{\delta_{3}\cdot t}\leq\sqrt{\delta_{3}} since t1,t\leq 1, and this means that (u+δ1t)2(δ3+δ1)2(u+\sqrt{\delta_{1}\cdot t})^{2}\leq(\sqrt{\delta_{3}}+\sqrt{\delta_{1}})^{2}. Alternatively, we are maximizing

(1p)(u+δ1t)2+p(uδ3t)21p+p(uδ3t)2\frac{(1-p)\cdot(u+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot(u-\sqrt{\delta_{3}\cdot t})^{2}}{1-p+p\cdot(u-\sqrt{\delta_{3}\cdot t})^{2}}

over t,u[0,1]t,u\in[0,1]. Next, note that if u>δ3tu>\sqrt{\delta_{3}\cdot t} and t<1,t<1, then increasing tt will decrease (uδ3t)2(u-\sqrt{\delta_{3}\cdot t})^{2} and increase (u+δ1t)2(u+\sqrt{\delta_{1}\cdot t})^{2}. So, the denominator decreases and the numerator either increases or decreases at a slower rate. Thus, we may assume that either uδ3t0u-\sqrt{\delta_{3}\cdot t}\leq 0 or that t=1t=1. In the case where t=1t=1, we wish to maximize

(1p)(u+δ1)2+p(uδ3)21p+p(uδ3)2=p[(u+δ1)2+(uδ3)2]+(12p)(u+δ1)2p[(uδ3)2+1]+(12p).\frac{(1-p)\cdot(u+\sqrt{\delta_{1}})^{2}+p\cdot(u-\sqrt{\delta_{3}})^{2}}{1-p+p\cdot(u-\sqrt{\delta_{3}})^{2}}=\frac{p\cdot\left[(u+\sqrt{\delta_{1}})^{2}+(u-\sqrt{\delta_{3}})^{2}\right]+(1-2p)\cdot(u+\sqrt{\delta_{1}})^{2}}{p\cdot\left[(u-\sqrt{\delta_{3}})^{2}+1\right]+(1-2p)}.

Writing A(u)=(u+δ1)2+(uδ3)2A(u)=(u+\sqrt{\delta_{1}})^{2}+(u-\sqrt{\delta_{3}})^{2}, B(u)=(u+δ1)2,B(u)=(u+\sqrt{\delta_{1}})^{2}, and C(u)=(uδ3)2+1,C(u)=(u-\sqrt{\delta_{3}})^{2}+1, we can verify that A(u)C(u)\frac{A(u)}{C(u)} and B(u)B(u) are both increasing functions in uu over [0,1][0,1], which means so is pA(u)+(12p)B(u)pC(u)+(12p).\frac{p\cdot A(u)+(1-2p)\cdot B(u)}{p\cdot C(u)+(1-2p)}. Therefore, the overall maximum is at most

max((δ1+δ3)2,(1p)(1+δ1)2+p(1δ3)21p+p(1δ3)2)\max\left((\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2},\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p\cdot(1-\sqrt{\delta_{3}})^{2}}\right)
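This case can likewise be sanity-checked by a two-dimensional grid search over t,u[0,1]t,u\in[0,1], with d(j,i2)d(j,i_{2}) set to its worst-case value max(0,uδ3t)\max(0,u-\sqrt{\delta_{3}\cdot t}). A minimal sketch; the δ1,δ3\delta_{1},\delta_{3} values in the test are borrowed from the kk-median setting purely for illustration, since the kk-means values are fixed elsewhere:

```python
def case_1gi_fraction(t, u, p, d1, d3):
    # worst case sets d(j, i2) = max(0, u - sqrt(d3 * t))
    d = max(0.0, u - (d3 * t) ** 0.5)
    return ((1 - p) * (u + (d1 * t) ** 0.5) ** 2 + p * d * d) / (1 - p + p * d * d)

def case_1gi_bound(p, d1, d3):
    # max((sqrt(d1)+sqrt(d3))^2, ((1-p)(1+sqrt(d1))^2 + p(1-sqrt(d3))^2) / (1-p+p(1-sqrt(d3))^2))
    s = (1 - d3 ** 0.5) ** 2
    return max((d1 ** 0.5 + d3 ** 0.5) ** 2,
               ((1 - p) * (1 + d1 ** 0.5) ** 2 + p * s) / (1 - p + p * s))

def check_case_1gi(p, d1, d3, steps=400):
    bound = case_1gi_bound(p, d1, d3) + 1e-9
    return all(case_1gi_fraction(i / steps, j / steps, p, d1, d3) <= bound
               for i in range(steps + 1) for j in range(steps + 1))
```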

K-means: Case 2.d:

Our goal is to maximize

(12p)min(1+δ1,max(β,γ)+δ1t)2+pβ2+pγ21p(1β2)p(1γ2)\frac{(1-2p)\cdot\min(1+\sqrt{\delta_{1}},\max(\beta,\gamma)+\sqrt{\delta_{1}\cdot t})^{2}+p\cdot\beta^{2}+p\cdot\gamma^{2}}{1-p(1-\beta^{2})-p(1-\gamma^{2})}

over t1t\geq 1 and β+γδ3t\beta+\gamma\geq\sqrt{\delta_{3}\cdot t} (and where β,γ0\beta,\gamma\geq 0). By symmetry, we may assume WLOG that βγ\beta\geq\gamma, and replace max(β,γ)\max(\beta,\gamma) with β\beta. Next, note that increasing tt only increases the overall fraction, so we may increase tt until we have that β+γ=δ3t\beta+\gamma=\sqrt{\delta_{3}\cdot t}. So, we now wish to maximize

(12p)min(1+δ1,β+δ1/δ3(β+γ))2+p(β2+γ2)12p+p(β2+γ2)\frac{(1-2p)\cdot\min\left(1+\sqrt{\delta_{1}},\beta+\sqrt{\delta_{1}/\delta_{3}}\cdot(\beta+\gamma)\right)^{2}+p\cdot(\beta^{2}+\gamma^{2})}{1-2p+p\cdot(\beta^{2}+\gamma^{2})}

over β,γ0\beta,\gamma\geq 0 subject to β+γδ3\beta+\gamma\geq\sqrt{\delta_{3}} (since β+γδ3t\beta+\gamma\geq\sqrt{\delta_{3}\cdot t} and t1t\geq 1). But, note that if β+δ1/δ3(β+γ)>1+δ1\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)>1+\sqrt{\delta_{1}}, then any decrease in either β\beta or γ\gamma until we have that β+δ1/δ3(β+γ)=1+δ1\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)=1+\sqrt{\delta_{1}} will decrease both the numerator and the denominator by the same amount, and so will increase the fraction. Thus, we may assume that β+δ1/δ3(β+γ)1+δ1\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)\leq 1+\sqrt{\delta_{1}}.

In this case, we may rewrite our goal as maximizing

f(β,γ):=(12p)(β+δ1/δ3(β+γ))2+p(β2+γ2)12p+p(β2+γ2)f(\beta,\gamma):=\frac{(1-2p)\cdot\left(\beta+\sqrt{\delta_{1}/\delta_{3}}\cdot(\beta+\gamma)\right)^{2}+p\cdot(\beta^{2}+\gamma^{2})}{1-2p+p\cdot(\beta^{2}+\gamma^{2})} (51)

over β,γ0\beta,\gamma\geq 0 subject to β+γδ3\beta+\gamma\geq\sqrt{\delta_{3}} and β+δ1/δ3(β+γ)1+δ1\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)\leq 1+\sqrt{\delta_{1}}. Now, for any fixed β,γ\beta,\gamma, replacing (β,γ)(\beta,\gamma) with (λβ,λγ)(\lambda\beta,\lambda\gamma) for any λ>1\lambda>1 multiplies the numerator of the fraction in (51) by a λ2\lambda^{2} factor, but multiplies the denominator of the fraction by less than a λ2\lambda^{2} factor, since 12p01-2p\geq 0. Therefore, the fraction increases overall, which means that to maximize f(β,γ)f(\beta,\gamma), we may always assume that β+δ1/δ3(β+γ)=1+δ1\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)=1+\sqrt{\delta_{1}}. It is easy to see that this automatically implies that β+γδ3\beta+\gamma\geq\sqrt{\delta_{3}} when β,γ0\beta,\gamma\geq 0.

Thus, our goal is to maximize

(12p)(1+δ1)2+p(β2+γ2)(12p)+p(β2+γ2)\frac{(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}+p(\beta^{2}+\gamma^{2})}{(1-2p)+p\cdot(\beta^{2}+\gamma^{2})}

subject to β,γ0\beta,\gamma\geq 0 and β+δ1/δ3(β+γ)=1+δ1\beta+\sqrt{\delta_{1}/\delta_{3}}(\beta+\gamma)=1+\sqrt{\delta_{1}}. Maximizing this, however, just amounts to minimizing β2+γ2\beta^{2}+\gamma^{2}, which is easy to solve as β=(δ3+δ1)δ3(1+δ1)δ1+(δ1+δ3)2\beta=(\sqrt{\delta_{3}}+\sqrt{\delta_{1}})\cdot\frac{\sqrt{\delta_{3}}(1+\sqrt{\delta_{1}})}{\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2}} and γ=δ1δ3(1+δ1)δ1+(δ1+δ3)2,\gamma=\sqrt{\delta_{1}}\cdot\frac{\sqrt{\delta_{3}}(1+\sqrt{\delta_{1}})}{\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2}}, which means that β2+γ2=δ3(1+δ1)2δ1+(δ1+δ3)2.\beta^{2}+\gamma^{2}=\frac{\delta_{3}\cdot(1+\sqrt{\delta_{1}})^{2}}{\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2}}.
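The closed form above is the least-norm point on the line aβ+bγ=ca\beta+b\gamma=c with a=1+δ1/δ3a=1+\sqrt{\delta_{1}/\delta_{3}}, b=δ1/δ3b=\sqrt{\delta_{1}/\delta_{3}}, c=1+δ1c=1+\sqrt{\delta_{1}}, so it can be cross-checked against the standard projection formula (β,γ)=c(a,b)/(a2+b2)(\beta,\gamma)=c\cdot(a,b)/(a^{2}+b^{2}). A small sketch with illustrative δ\delta values:

```python
def closed_form(d1, d3):
    # the minimizer stated in the text
    r1, r3 = d1 ** 0.5, d3 ** 0.5
    denom = d1 + (r1 + r3) ** 2
    beta = (r3 + r1) * r3 * (1 + r1) / denom
    gamma = r1 * r3 * (1 + r1) / denom
    return beta, gamma

def projection(d1, d3):
    # least-norm solution of a*beta + b*gamma = c
    a = 1 + (d1 / d3) ** 0.5
    b = (d1 / d3) ** 0.5
    c = 1 + d1 ** 0.5
    scale = c / (a * a + b * b)
    return a * scale, b * scale
```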

K-median: Case 1.b’:

It suffices to prove the following proposition.

Proposition B.1.

Let 0<p1/20<p\leq 1/2, and suppose that we have 44 points i,i1,i3,ji^{*},i_{1},i_{3},j in Euclidean space such that d(j,i)1,d(j,i^{*})\leq 1, d(i,i1)2min(ti,ti1),d(i^{*},i_{1})\leq\sqrt{2}\cdot\min(t_{i^{*}},t_{i_{1}}), d(i,i3)2min(ti,ti3),d(i^{*},i_{3})\leq\sqrt{2}\cdot\min(t_{i^{*}},t_{i_{3}}), d(i1,i3)δ2min(ti1,ti3),d(i_{1},i_{3})\geq\delta_{2}\cdot\min(t_{i_{1}},t_{i_{3}}), and ti1t_{i^{*}}\leq 1. Then, for any T>0T>0,

(1p)d(j,i1)+pd(j,i3)3(X+Y)+22(X+Y)2δ22XY(1-p)\cdot d(j,i_{1})+p\cdot d(j,i_{3})\leq\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}

where X=p2+p(1p)TX=p^{2}+p(1-p)\cdot T and Y=(1p)2+p(1p)T.Y=(1-p)^{2}+\frac{p(1-p)}{T}.

Proof.

We write d(j,i1)=ji12d(j,i_{1})=\|j-i_{1}\|_{2} and d(j,i3)=ji32d(j,i_{3})=\|j-i_{3}\|_{2}. First, we have that

((1p)ji12+pji32)2\displaystyle\hskip 14.22636pt\left((1-p)\cdot\|j-i_{1}\|_{2}+p\cdot\|j-i_{3}\|_{2}\right)^{2}
=(1p)2ji122+p2ji322+2p(1p)ji12ji32\displaystyle=(1-p)^{2}\cdot\|j-i_{1}\|_{2}^{2}+p^{2}\cdot\|j-i_{3}\|_{2}^{2}+2p(1-p)\cdot\|j-i_{1}\|_{2}\cdot\|j-i_{3}\|_{2}
(1p)2ji122+p2ji322+p(1p)(1Tji122+Tji322)\displaystyle\leq(1-p)^{2}\cdot\|j-i_{1}\|_{2}^{2}+p^{2}\cdot\|j-i_{3}\|_{2}^{2}+p(1-p)\cdot\left(\frac{1}{T}\cdot\|j-i_{1}\|_{2}^{2}+T\cdot\|j-i_{3}\|_{2}^{2}\right)

for any T>0T>0. Writing X=p2+p(1p)TX=p^{2}+p(1-p)\cdot T and Y=(1p)2+p(1p)T,Y=(1-p)^{2}+\frac{p(1-p)}{T}, we have that

((1p)ji12+pji32)2Xji322+Yji122.\left((1-p)\cdot\|j-i_{1}\|_{2}+p\cdot\|j-i_{3}\|_{2}\right)^{2}\leq X\cdot\|j-i_{3}\|_{2}^{2}+Y\cdot\|j-i_{1}\|_{2}^{2}.

We can now apply Proposition 4.4 on the points A=j,B=i,C=i3,D=i1A=j,B=i^{*},C=i_{3},D=i_{1}, with ν1=ν2=2,ν3=δ22\nu_{1}=\nu_{2}=2,\nu_{3}=\delta_{2}^{2}, and σ1=ti,σ2=ti3,σ3=ti1\sigma_{1}=t_{i^{*}},\sigma_{2}=t_{i_{3}},\sigma_{3}=t_{i_{1}} and where we replace the parameter pp in Proposition 4.4 with XX+Y\frac{X}{X+Y}, to say that

Xji322+Yji122\displaystyle\hskip 14.22636ptX\cdot\|j-i_{3}\|_{2}^{2}+Y\cdot\|j-i_{1}\|_{2}^{2}
(X+Y)(1+XX+Y2+YX+Y2+2XX+Y2+YX+Y2XX+YYX+Yδ22)\displaystyle\leq(X+Y)\cdot\left(1+\frac{X}{X+Y}\cdot 2+\frac{Y}{X+Y}\cdot 2+2\sqrt{\frac{X}{X+Y}\cdot 2+\frac{Y}{X+Y}\cdot 2-\frac{X}{X+Y}\cdot\frac{Y}{X+Y}\cdot\delta_{2}^{2}}\right)
=3(X+Y)+22(X+Y)2δ22XY.\displaystyle=3(X+Y)+2\sqrt{2(X+Y)^{2}-\delta_{2}^{2}\cdot XY}.

In summary, we have that for any choice of T>0T>0,

((1p)ji12+pji32)2\displaystyle\left((1-p)\cdot\|j-i_{1}\|_{2}+p\cdot\|j-i_{3}\|_{2}\right)^{2} Xji322+Yji122\displaystyle\leq X\cdot\|j-i_{3}\|_{2}^{2}+Y\cdot\|j-i_{1}\|_{2}^{2}
3(X+Y)+22(X+Y)2δ22XY.\displaystyle\leq 3(X+Y)+2\sqrt{2(X+Y)^{2}-\delta_{2}^{2}\cdot XY}.\qed
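The first step of the proof, the weighted AM-GM bound ((1p)ji12+pji32)2Xji322+Yji122\left((1-p)\|j-i_{1}\|_{2}+p\|j-i_{3}\|_{2}\right)^{2}\leq X\|j-i_{3}\|_{2}^{2}+Y\|j-i_{1}\|_{2}^{2}, holds for arbitrary points and parameters, which is easy to confirm on random inputs. A minimal sketch:

```python
import random

def amgm_bound_holds(trials=1000, dim=3, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        p = rng.uniform(0.01, 0.5)
        T = rng.uniform(0.1, 10.0)
        j, i1, i3 = ([rng.gauss(0, 1) for _ in range(dim)] for _ in range(3))
        a = sum((x - y) ** 2 for x, y in zip(j, i1)) ** 0.5  # ||j - i1||
        b = sum((x - y) ** 2 for x, y in zip(j, i3)) ** 0.5  # ||j - i3||
        X = p * p + p * (1 - p) * T
        Y = (1 - p) ** 2 + p * (1 - p) / T
        if ((1 - p) * a + p * b) ** 2 > X * b * b + Y * a * a + 1e-9:
            return False
    return True
```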

K-median: Case 1.g.i’:

Our goal is to maximize

(1p)(u+δ1t)+pmax(0,utδ3)1p+pmax(0,utδ3)\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot\max(0,u-t\cdot\delta_{3})}{1-p+p\cdot\max(0,u-t\cdot\delta_{3})}

over 0t,u1.0\leq t,u\leq 1. First, note that if utδ30,u-t\cdot\delta_{3}\leq 0, then since t1,t\leq 1, this means that uδ3u\leq\delta_{3}. In this case, the fraction equals u+δ1tδ1+δ3=2u+\delta_{1}\cdot t\leq\delta_{1}+\delta_{3}=2, since δ1=2\delta_{1}=\sqrt{2} and δ3=22\delta_{3}=2-\sqrt{2}.

Alternatively, we have that max(0,utδ3)=utδ3.\max(0,u-t\cdot\delta_{3})=u-t\cdot\delta_{3}. Let u=utδ30u^{\prime}=u-t\cdot\delta_{3}\geq 0, so u=u+tδ3u=u^{\prime}+t\cdot\delta_{3}. In this case, we wish to maximize

(1p)(u+δ1t)+p(utδ3)1p+p(utδ3)=u+(1p)(δ1+δ3)tpu+(1p)=u+(1p)2tpu+(1p).\frac{(1-p)\cdot(u+\delta_{1}\cdot t)+p\cdot(u-t\cdot\delta_{3})}{1-p+p\cdot(u-t\cdot\delta_{3})}=\frac{u^{\prime}+(1-p)\cdot(\delta_{1}+\delta_{3})t}{p\cdot u^{\prime}+(1-p)}=\frac{u^{\prime}+(1-p)\cdot 2t}{p\cdot u^{\prime}+(1-p)}.

over 0t10\leq t\leq 1 and 0u1tδ30\leq u^{\prime}\leq 1-t\cdot\delta_{3}. Since 1p22t\frac{1}{p}\geq 2\geq 2t, we have that increasing uu^{\prime} increases the fraction overall. So, we may assume that u=1tδ3.u^{\prime}=1-t\cdot\delta_{3}. In this case, we are trying to maximize the fraction

1tδ3+(1p)2tp(1tδ3)+(1p)=1+(2δ3)t2pt1pδ3t.\frac{1-t\cdot\delta_{3}+(1-p)\cdot 2t}{p\cdot(1-t\cdot\delta_{3})+(1-p)}=\frac{1+(2-\delta_{3})t-2pt}{1-p\cdot\delta_{3}\cdot t}.

Since p12p\leq\frac{1}{2}, the numerator increases and the denominator decreases as tt increases, so the fraction increases overall. Thus, this fraction is maximized when t=1t=1, and equals

1+(2δ3)2p1pδ3=1+δ1(δ1+δ3)p1pδ3.\frac{1+(2-\delta_{3})-2p}{1-p\cdot\delta_{3}}=\frac{1+\delta_{1}-(\delta_{1}+\delta_{3})p}{1-p\cdot\delta_{3}}.
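Since δ1=2\delta_{1}=\sqrt{2} and δ3=22\delta_{3}=2-\sqrt{2} here, the maximum can be confirmed by a grid search over t,u[0,1]t,u\in[0,1]. A minimal sketch:

```python
def kmedian_1gi_fraction(t, u, p, d1=2 ** 0.5, d3=2 - 2 ** 0.5):
    m = max(0.0, u - t * d3)
    return ((1 - p) * (u + d1 * t) + p * m) / (1 - p + p * m)

def kmedian_1gi_max(p, d1=2 ** 0.5, d3=2 - 2 ** 0.5):
    # the claimed maximum, attained at t = u = 1
    return (1 + d1 - (d1 + d3) * p) / (1 - p * d3)

def check_kmedian_1gi(p, steps=500):
    bound = kmedian_1gi_max(p) + 1e-9
    return all(kmedian_1gi_fraction(i / steps, j / steps, p) <= bound
               for i in range(steps + 1) for j in range(steps + 1))
```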

Appendix C Numerical Analysis for Euclidean kk-means and kk-median

C.1 The kk-means case

We recall that our goal is to show, for an appropriate choice of ρ\rho, that for any 0θ10\leq\theta\leq 1 and any r1r\geq 1, we cannot simultaneously satisfy

𝔇\displaystyle\mathfrak{D}^{\prime} i=15(Qip1rRi),\displaystyle\geq\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right), (52)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <θri=15ρ(i)(p1)(Qip1Ri)+(1θr)ρ(p1)(𝔇+p1θri=15Ri),\displaystyle<\frac{\theta}{r}\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta}{r}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+p_{1}\cdot\frac{\theta}{r}\sum_{i=1}^{5}R_{i}\right), (53)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <i=15ρ(i)(p1r)(Qip1rRi),\displaystyle<\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r}\cdot R_{i}\right), (54)

and

R1Q1,R2Q2,R3Q3,R41.75Q4,R52Q5,R_{1}\leq Q_{1},\hskip 28.45274ptR_{2}\leq Q_{2},\hskip 28.45274ptR_{3}\leq Q_{3},\hskip 28.45274ptR_{4}\leq 1.75Q_{4},\hskip 28.45274ptR_{5}\leq 2Q_{5}, (55)

where we will let Q1,Q2,Q3,Q4,Q5,R1,R2,R3,R4,R5Q_{1},Q_{2},Q_{3},Q_{4},Q_{5},R_{1},R_{2},R_{3},R_{4},R_{5} and 𝔇\mathfrak{D}^{\prime} be arbitrary nonnegative reals. For p1=0.402p_{1}=0.402, we recall that ρ(p1)=3+22\rho(p_{1})=3+2\sqrt{2}. Now, note that if we increase 𝔇\mathfrak{D}^{\prime}, Equations (53) and (54) become harder to satisfy, since in both equations, the left hand side has a greater slope as a function of 𝔇\mathfrak{D}^{\prime} than the right hand side. As a result, we may assume that 𝔇=i=15(Qip1rRi)\mathfrak{D}^{\prime}=\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r}R_{i}\right), which we know is nonnegative since p1<0.5p_{1}<0.5 and r1r\geq 1, so Qip1rRiQ_{i}\geq\frac{p_{1}}{r}\cdot R_{i} for all 1i51\leq i\leq 5.

Now, we note that we may assume r2.37r\geq 2.37. This is because if r2.37,r\leq 2.37, then p1r0.169,\frac{p_{1}}{r}\geq 0.169, and it is easy to verify that ρ(p)5.912\rho(p)\leq 5.912 for any p[0.169,0.402]p\in[0.169,0.402] (for instance, by using Lemma 5.19 to bound ρ(1)(p),ρ(2)(p)\rho^{(1)}(p),\rho^{(2)}(p), and ρ(5)(p)\rho^{(5)}(p), and using Cases 1.g.i and 2.d for ρ(3)(p)\rho^{(3)}(p) and ρ(4)(p)\rho^{(4)}(p)). Therefore, for any ρ5.912,\rho\geq 5.912, if r2.37r\leq 2.37 then Equations (52) and (54) cannot hold simultaneously. In addition, we may also assume that r4.18,r\leq 4.18, since if r4.18r\geq 4.18, then we can use the simpler bound of ρ(p1)(1+14r(r/(2p1)1))5.912\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot(r/(2p_{1})-1)}\right)\leq 5.912.
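The second of these thresholds is pure arithmetic and can be checked directly, using ρ(p1)=3+22\rho(p_{1})=3+2\sqrt{2} and p1=0.402p_{1}=0.402 from the text; the bound is decreasing in rr, so it suffices to evaluate it at r=4.18r=4.18:

```python
def simple_kmeans_bound(r, p1=0.402):
    # rho(p1) * (1 + 1 / (4r (r/(2 p1) - 1))) with rho(p1) = 3 + 2 sqrt(2)
    rho_p1 = 3 + 2 * 2 ** 0.5
    return rho_p1 * (1 + 1 / (4 * r * (r / (2 * p1) - 1)))
```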

We recall that for p[0.096,0.402]p\in[0.096,0.402], ρ(1)(p)=3+22\rho^{(1)}(p)=3+2\sqrt{2}, ρ(2)(p)=1+2p+(1p)δ1+22p2+(1p)δ1,\rho^{(2)}(p)=1+2\cdot p+(1-p)\cdot\delta_{1}+2\sqrt{2\cdot p^{2}+(1-p)\cdot\delta_{1}}, ρ(3)(p)=(1p)(1+δ1)2+p(1δ3)21p+p(1δ3)2\rho^{(3)}(p)=\frac{(1-p)\cdot(1+\sqrt{\delta_{1}})^{2}+p\cdot(1-\sqrt{\delta_{3}})^{2}}{1-p+p(1-\sqrt{\delta_{3}})^{2}}, ρ(4)(p)=(12p)(1+δ1)2(δ1+(δ1+δ3)2)+p(1+δ1)2δ3(12p)(δ1+(δ1+δ3)2)+p(1+δ1)2δ3\rho^{(4)}(p)=\frac{(1-2p)\cdot(1+\sqrt{\delta_{1}})^{2}\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot\left(1+\sqrt{\delta_{1}}\right)^{2}\cdot\delta_{3}}{(1-2p)\cdot(\delta_{1}+(\sqrt{\delta_{1}}+\sqrt{\delta_{3}})^{2})+p\cdot(1+\sqrt{\delta_{1}})^{2}\cdot\delta_{3}}, and ρ(5)(p)=5.68\rho^{(5)}(p)=5.68.

Now, let ρ=5.912\rho=5.912, and suppose there exist 0θ0θθ110\leq\theta_{0}\leq\theta\leq\theta_{1}\leq 1 and 1r0rr11\leq r_{0}\leq r\leq r_{1} such that Equations (52), (53), (54), and (55) can be simultaneously satisfied for nonnegative Q1,Q2,Q3,Q4,Q5,R1,R2,R3,R4,R5Q_{1},Q_{2},Q_{3},Q_{4},Q_{5},R_{1},R_{2},R_{3},R_{4},R_{5}. Then, in fact we must be able to satisfy the weaker conditions

𝔇\displaystyle\mathfrak{D}^{\prime} =i=15(Qip1r0Ri)\displaystyle=\sum_{i=1}^{5}\left(Q_{i}-\frac{p_{1}}{r_{0}}\cdot R_{i}\right)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <θ1r0i=15ρ(i)(p1)(Qip1Ri)+(1θ0r1)ρ(p1)(𝔇+θ1r0i=15Ri)\displaystyle<\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{5}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta_{0}}{r_{1}}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{5}R_{i}\right)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <i=15ρ(i)(p1r1)(Qip1r1Ri),\displaystyle<\sum_{i=1}^{5}\rho^{(i)}\left(\frac{p_{1}}{r_{1}}\right)\cdot\left(Q_{i}-\frac{p_{1}}{r_{1}}\cdot R_{i}\right),

and (55), while having Q1,Q2,Q3,Q4,Q5,R1,R2,R3,R4,R5Q_{1},Q_{2},Q_{3},Q_{4},Q_{5},R_{1},R_{2},R_{3},R_{4},R_{5} all be nonnegative. Indeed, the conditions are weaker since we have decreased the value of 𝔇\mathfrak{D}^{\prime} and increased all terms on the right-hand side (noting that each ρ(i)\rho^{(i)} is a non-increasing function in the range [0,0.402][0,0.402]).

For every 0θ00.990\leq\theta_{0}\leq 0.99 and 2.37r04.172.37\leq r_{0}\leq 4.17 such that θ0,r0\theta_{0},r_{0} are integral multiples of 0.010.01, we look at the intervals θ[θ0,θ0+0.01]\theta\in[\theta_{0},\theta_{0}+0.01] and r[r0,r0+0.01]r\in[r_{0},r_{0}+0.01]. If this region has a nonnegative solution to these inequalities, then we further partition the region [θ0,θ0+0.01]×[r0,r0+0.01][\theta_{0},\theta_{0}+0.01]\times[r_{0},r_{0}+0.01] into a 10×1010\times 10 grid of dimensions 0.0010.001. If one of these regions has a nonnegative solution to these inequalities, we partition one step further into a grid of dimensions 0.00010.0001 (we will not need to partition beyond this). Using this procedure, we are able to obtain that there is no solution in θ[0,1]\theta\in[0,1] and r[2.37,4.18]r\in[2.37,4.18] when ρ=5.912\rho=5.912, which allows us to establish that our algorithm provides a polynomial-time 5.912\boxed{5.912}-approximation for Euclidean kk-means clustering.
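The refinement loop itself can be sketched as follows, with a stand-in feasibility oracle in place of the actual linear-program check described above (the real verification solves the LP on each grid block; the oracle below is a toy placeholder used only to exercise the cascade):

```python
def refine(block, feasible, depth=0, max_depth=2, splits=10):
    """Return True iff no sub-block at the finest resolution admits a solution.

    block = (theta0, r0, size); this mirrors the 0.01 -> 0.001 -> 0.0001 cascade.
    """
    theta0, r0, size = block
    if not feasible(theta0, r0, size):
        return True  # the LP already has no solution on this whole block
    if depth == max_depth:
        return False  # still feasible at the finest grid: verification fails
    step = size / splits
    return all(refine((theta0 + i * step, r0 + j * step, step), feasible,
                      depth + 1, max_depth, splits)
               for i in range(splits) for j in range(splits))
```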

See Appendix A for links to the Python code.

C.2 The kk-median case

The kk-median case is almost identical, except for the modified equations and modified choices of ρ(i)\rho^{(i)}. This time, we wish to show that for any θ[0,1]\theta\in[0,1] and r1r\geq 1, there exists 0θ0θθ110\leq\theta_{0}\leq\theta\leq\theta_{1}\leq 1 and 1r0rr11\leq r_{0}\leq r\leq r_{1} such that one cannot satisfy

𝔇\displaystyle\mathfrak{D}^{\prime} =i=13(Qip1r0Ri)\displaystyle=\sum_{i=1}^{3}\left(Q_{i}-\frac{p_{1}}{r_{0}}\cdot R_{i}\right)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <θ1r0i=13ρ(i)(p1)(Qip1Ri)+(1θ0r1)ρ(p1)(𝔇+θ1r0i=13Ri)\displaystyle<\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{3}\rho^{(i)}(p_{1})\cdot(Q_{i}-p_{1}\cdot R_{i})+\left(1-\frac{\theta_{0}}{r_{1}}\right)\cdot\rho(p_{1})\cdot\left(\mathfrak{D}^{\prime}+\frac{\theta_{1}}{r_{0}}\cdot\sum_{i=1}^{3}R_{i}\right)
ρ𝔇\displaystyle\rho\cdot\mathfrak{D}^{\prime} <ρ(1)(p1r1)(Q1p1r1R1)+ρ(2)(p1r1)(Q2p1r1R2)+ρ(3)(p1r0)(Q3p1r1R3),\displaystyle<\rho^{(1)}\left(\frac{p_{1}}{r_{1}}\right)\cdot\left(Q_{1}-\frac{p_{1}}{r_{1}}\cdot R_{1}\right)+\rho^{(2)}\left(\frac{p_{1}}{r_{1}}\right)\cdot\left(Q_{2}-\frac{p_{1}}{r_{1}}\cdot R_{2}\right)+\rho^{(3)}\left(\frac{p_{1}}{r_{0}}\right)\cdot\left(Q_{3}-\frac{p_{1}}{r_{1}}\cdot R_{3}\right),

where Q1,Q2,Q3,R1,R2,R3Q_{1},Q_{2},Q_{3},R_{1},R_{2},R_{3} are nonnegative. In the above equations, we set p1=0.068p_{1}=0.068, δ1=2,δ2=1.395\delta_{1}=\sqrt{2},\delta_{2}=1.395, and δ3=22\delta_{3}=2-\sqrt{2}. Also, recall that for p[0.01,0.068],p\in[0.01,0.068], we have that ρ(1)(p)max(1+δ2,3(X+Y)+22(X+Y)2δ22XY),\rho^{(1)}(p)\leq\max\left(1+\delta_{2},\sqrt{3\left(X+Y\right)+2\sqrt{2\left(X+Y\right)^{2}-\delta_{2}^{2}\cdot XY}}\right), where X=p2+p(1p)1.1X=p^{2}+p(1-p)\cdot 1.1 and Y=(1p)2+p(1p)1.1,Y=(1-p)^{2}+\frac{p(1-p)}{1.1}, ρ(2)(p)(1+2)(32)(2p2p2)1(22)(2p2p2),\rho^{(2)}(p)\leq\frac{(1+\sqrt{2})-(3-\sqrt{2})\cdot(2p-2p^{2})}{1-(2-\sqrt{2})\cdot(2p-2p^{2})}, and that ρ(3)(p)1122(2δ2)p\rho^{(3)}(p)\leq\frac{1}{\frac{1}{2}-2\cdot(2-\delta_{2})\cdot p}.

We remark that in the final equation, we use ρ(3)(p1/r0)\rho^{(3)}(p_{1}/r_{0}) instead of ρ(3)(p1/r1)\rho^{(3)}(p_{1}/r_{1}): this is because ρ(3)\rho^{(3)} is an increasing function on the region [0,0.068][0,0.068], as opposed to ρ(1)\rho^{(1)} and ρ(2)\rho^{(2)}, which are both decreasing functions.

First, we may assume that r[2.4,3.42].r\in[2.4,3.42]. Indeed, if 1r<2.41\leq r<2.4, one can use the more naive bound of ρ(p1/r)\rho(p_{1}/r), which is less than 2.4062.406. If r>3.42r>3.42, one can instead use the bound of

ρ(p1)(1+14r(p0rp11))2.395(1+143.42(0.3373.420.0681))<2.406.\rho(p_{1})\cdot\left(1+\frac{1}{4r\cdot\left(\frac{p_{0}\cdot r}{p_{1}}-1\right)}\right)\leq 2.395\cdot\left(1+\frac{1}{4\cdot 3.42\cdot\left(\frac{0.337\cdot 3.42}{0.068}-1\right)}\right)<2.406.
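This chain of inequalities is pure arithmetic and can be checked directly, using the constants ρ(p1)2.395\rho(p_{1})\leq 2.395, p0=0.337p_{0}=0.337, and p1=0.068p_{1}=0.068 from the text; the bound is decreasing in rr:

```python
def kmedian_tail_bound(r, rho_p1=2.395, p0=0.337, p1=0.068):
    # rho(p1) * (1 + 1 / (4r (p0 r / p1 - 1)))
    return rho_p1 * (1 + 1 / (4 * r * (p0 * r / p1 - 1)))
```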

To finish, we apply a similar method as in the kk-means case. We split the region (θ,r)[0,1]×[2.4,3.42](\theta,r)\in[0,1]\times[2.4,3.42] into grid blocks of size 0.005×0.0050.005\times 0.005 with θ0,θ1,r0,r1\theta_{0},\theta_{1},r_{0},r_{1} being the endpoints in each direction. We verify that the linear program has no solution when ρ=2.406\rho=2.406 for each grid block: if it does, we further refine the grid block into smaller 0.001×0.0010.001\times 0.001-sized pieces and verify each of the smaller pieces.

See Appendix A for links to the Python code.

Appendix D Changes to Construction of Roundable Solutions

In this section, we explain how Ahmadian et al. [1] implicitly prove Theorem 5.7, up to some minor modifications of their algorithm and analysis. Because the algorithm and analysis is almost entirely the same, we only describe the differences between the algorithm and analysis in [1] and what we need for our Theorem 5.7.

The only changes in the overall algorithm will be as follows. We will set some small constant κ\kappa such that ε=κ2\varepsilon=\kappa^{2}. We will then set K=Θ(ε1γ4logκ1)K=\Theta(\varepsilon^{-1}\gamma^{-4}\log\kappa^{-1}) as opposed to K=Θ(ε1γ4)K=\Theta(\varepsilon^{-1}\gamma^{-4}) in [1, Algorithm 2, Line 4], and set the definition of stopped in [1, Section 7] to be that j𝒟j\in\mathcal{D} is stopped if jj𝒟\exists j^{\prime}\neq j\in\mathcal{D} such that (1+κ)α jd(j,j)+κ1α j(1+\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,j^{\prime})+\kappa^{-1}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j^{\prime}}, as opposed to jj𝒟\exists j^{\prime}\neq j\in\mathcal{D} such that 2α jd(j,j)+6α j2\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,j^{\prime})+6\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j^{\prime}}. Here, we are letting α j=αj\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}=\sqrt{\alpha_{j}} for j𝒟j\in\mathcal{D}.

We now describe how this changes the claims throughout [1, Sections 7-8]. We only describe the changes to the statements, because the proofs do not change at all. For nearly the remainder of this appendix, we will consider a new definition of roundable, neither the one in [1] nor our Definition 5.1. Our modified definition will instead have that: for all j𝒟\𝒟Bj\in\mathcal{D}\backslash\mathcal{D}_{B} and all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)], (1+A+10ε/κ)2αj(d(j,w(j))+Aτw(j))2(1+A+10\varepsilon/\kappa)^{2}\cdot\alpha_{j}\geq\left(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}}\right)^{2}, which replaces Conditions 3a and 3b in Definition 5.1 (or Condition 2a in [1, Definition 5.1]). In addition, our modified definition will have that κ2γOPTkj𝒟B(d(j,w(j))+Aτw(j))2\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2} for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)], which replaces our Condition 3c in Definition 5.1 (or Condition 2b in [1, Definition 5.1]).

We are now ready to describe how each claim in [1] changes (or stays the same).

[1, Lemma 7.1] still holds with our new definition of stopped: the same proof still works.

The (unnumbered) claim in the first paragraph of [1, Section 8] still goes through, with our modified definition of roundable (i.e., the one presented in this section).

In [1, Section 8.1], both [1, Lemma 8.1] and [1, Lemma 8.2] still hold with our new definition of stopped, with essentially no changes to the proof. Likewise, in [1, Section 8.2], Lemmas 8.3, 8.4, 8.5 and Corollary 8.6 in [1] still hold with our new definition of stopped.

In [1, Section 8.3], we change the definition of \mathcal{B}, the “potentially bad” clients (see [1, Equation (8.1)]), to be

={j𝒟:j is undecided and (1+κ)α j<d(j,j)+1κα j(0)},\mathcal{B}=\left\{j\in\mathcal{D}:j\text{ is undecided and }(1+\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}<d(j,j^{\prime})+\frac{1}{\kappa}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}\right\},

where α j(0)=αj(0)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}=\sqrt{\alpha_{j}^{(0)}} refers to the value of αj\alpha_{j} at the start of a call to RaisePrice (i.e., when the solution 𝒮\mathcal{S} is labeled as 𝒮(0)\mathcal{S}^{(0)} in our Algorithm 2). This contrasts with the original definition of \mathcal{B}, which was the set of undecided clients with 2α j<d(j,j)+6α j(0)2\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}<d(j,j^{\prime})+6\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}, in a similar way to how our definition of stopped contrasts with the original definition.

[1, Lemma 8.7] is now as follows. For any (α,z)(\alpha,z) produced during RaisePrice, for every client j𝒟j\in\mathcal{D} the following holds:

  • If j𝒟\j\in\mathcal{D}\backslash\mathcal{B} then there exists a tight facility ii such that (1+A+ε/κ)α jd(j,i)+Ati(1+A+\varepsilon/\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,i)+A\cdot\sqrt{t_{i}} for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)].

  • There exists a tight facility ii such that 1κα j(0)d(j,i)+Ati\frac{1}{\kappa}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}^{(0)}\geq d(j,i)+A\cdot\sqrt{t_{i}} for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)].

Again, the same proof holds.

We now move to [1, Section 8.4]. We update [1, Lemma 8.8] to be that if jj has a tight edge to some facility ii, then αj52κ4αj\alpha_{j^{\prime}}\leq\frac{5^{2}}{\kappa^{4}}\cdot\alpha_{j} for any jj^{\prime} with a tight edge to ii. In the proof, we would replace the stronger statement [1, Equation (8.2)] with: (1+κ)α jd(j,j)+4κα j(1+\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j^{\prime}}\leq d(j^{\prime},j)+\frac{4}{\kappa}\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}. In addition, we would update the Claim inside the proof of [1, Lemma 8.8] to be that: there is some tight facility ii^{*} in (α(0),z(0))(\alpha^{(0)},z^{(0)}) and also:

d(j1,i)(1+κ)α j1(0)(1+κ)α jandαj′′(0)(1+ε)αj1(0)(1+ε)αj for all j′′N(0)(i).d(j_{1},i^{*})\leq(1+\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j_{1}}^{(0)}\leq(1+\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\hskip 14.22636pt\text{and}\hskip 14.22636pt\alpha_{j^{\prime\prime}}^{(0)}\leq(1+\varepsilon)\alpha_{j_{1}}^{(0)}\leq(1+\varepsilon)\alpha_{j}\text{ for all }j^{\prime\prime}\in N^{(0)}(i^{*}).

Up to these changes, the rest of the proof of Lemma 8.8 in [1] is essentially unchanged.

Next, we update [1, Lemma 8.9] to replace “αj(0)202θs\alpha_{j}^{(0)}\geq 20^{2}\theta_{s} or αj202θs\alpha_{j}\geq 20^{2}\theta_{s}” with “αj(0)62κ4θs\alpha_{j}^{(0)}\geq\frac{6^{2}}{\kappa^{4}}\theta_{s} or αj62κ4θs\alpha_{j}\geq\frac{6^{2}}{\kappa^{4}}\theta_{s}”; again, the same proof still holds.

[1, Proposition 8.10] and its proof still hold, except that we have to replace C1=log1+ε(204)C_{1}=\lceil\log_{1+\varepsilon}(20^{4})\rceil with C1=log1+ε64/κ8=O(ε1logκ1)C_{1}=\lceil\log_{1+\varepsilon}6^{4}/\kappa^{8}\rceil=O(\varepsilon^{-1}\log\kappa^{-1}). So, if we set εz=n6(K+C1+2)3\varepsilon_{z}=n^{-6(K+C_{1}+2)-3} for our new choice of C1,C_{1}, Proposition 8.10 in [1] holds.

We update [1, Proposition 8.11] to say that if jj\in\mathcal{B} for some (α,z)(\alpha,z) produced by RaisePrice, then κ4θsαj(0)64κ12θs\kappa^{4}\cdot\theta_{s}\leq\alpha_{j}^{(0)}\leq\frac{6^{4}}{\kappa^{12}}\theta_{s}. We replace the last equation in the Claim in the Proposition’s proof with: κ2θsαj(0)64κ8θs\kappa^{2}\cdot\theta_{s}\leq\alpha_{j^{\prime}}^{(0)}\leq\frac{6^{4}}{\kappa^{8}}\cdot\theta_{s}. The same proof still holds. We also update the definition of 𝒲(σ)\mathcal{W}(\sigma) to be the set {j𝒟:κ8/62θsαj(0)66/κ16θs for some s},\{j\in\mathcal{D}:\kappa^{8}/6^{2}\cdot\theta_{s}\leq\alpha_{j}^{(0)}\leq 6^{6}/\kappa^{16}\cdot\theta_{s}\text{ for some s}\}, where RaisePrice defines the parameters θs\theta_{s} based on the shift parameter σ[0,K/2)\sigma\in[0,K/2). With these definitions, and our modified choice of K=Θ(ε1γ4logκ1)K=\Theta(\varepsilon^{-1}\gamma^{-4}\log\kappa^{-1}), we will have that [1, Corollary 8.12] still holds.

We now move to [1, Section 8.5]. We keep their definitions of γ\gamma-close neighborhoods and of dense facilities and clients. We also let the sets D,𝒟D,S(),𝒟S()(i)\mathcal{F}_{D},\mathcal{D}_{D},\mathcal{F}_{S}^{(\ell)},\mathcal{D}_{S}^{(\ell)}(i), and 𝒟B\mathcal{D}_{B} be defined the same way (modulo our change in definition of \mathcal{B}). We also define τi\tau_{i} the same way, and we will let H(0)H^{(0)} and IS(0)IS^{(0)} simply represent the conflict graphs H(0)(δ1)H^{(0)}(\delta_{1}) and IS1(0)IS_{1}^{(0)} as generated by our Algorithm 2, respectively, at each iteration corresponding to making a new solution 𝒮(0)\mathcal{S}^{(0)}. Note that we are choosing δ=δ1=4+827\delta=\delta_{1}=\frac{4+8\sqrt{2}}{7}, so 2δ2\sqrt{2}\leq\sqrt{\delta}\leq 2.

With these, it is quite simple to see that [1, Lemma 8.13] still holds, where the choice ρ=ρ(0)\rho=\rho(0) in the proof is the approximation constant of the LMP algorithm in [1] with only a single parameter ρ\rho based on δ=4+827.\delta=\frac{4+8\sqrt{2}}{7}. In addition, [1, Lemma 8.14] still holds, except that we replace OPTk\text{OPT}_{k} with OPTk\text{OPT}_{k^{\prime}}, since the final inequality in the proof now relates j𝒟>γd(j,IS(0))2j𝒟d(j,IS(0))2\sum_{j\in\mathcal{D}_{>\gamma}}d(j,IS^{(0)})^{2}\leq\sum_{j\in\mathcal{D}}d(j,IS^{(0)})^{2} to OPTk\text{OPT}_{k^{\prime}}, as kk^{\prime} is the minimum of kk and the sizes of all sets that become IS(0)IS^{(0)} at some point. [1, Corollary 8.15] (and the following Remark 8.16) also hold, due to our updated definition of 𝒲(σ)\mathcal{W}(\sigma) and KK.

Now, we update [1, Lemma 8.17] to say: for any j𝒟Dj\in\mathcal{D}_{D}\cap\mathcal{B}, either:

  • There exists a tight facility ii\in\mathcal{F} such that for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)], (1+A+10ε/κ)α jd(j,i)+Ati(1+A+10\cdot\varepsilon/\kappa)\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,i)+A\cdot\sqrt{t_{i}}

  • There exists a special facility iSi\in\mathcal{F}_{S} such that for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)], (1+A+10ε/κ)α jd(j,i)+Aτi.(1+A+10\cdot\varepsilon/\kappa)\cdot\accentset{\rule{2.79996pt}{0.7pt}}{\alpha}_{j}\geq d(j,i)+A\cdot\sqrt{\tau_{i}}.

Again, the proof holds with minimal change.

We now move to [1, Section 8.6], the final section of Ahmadian et al.’s analysis. We first look at how [1, Proposition 8.18] changes. The fact that α\alpha is feasible for DUAL(λ+1n)\text{DUAL}(\lambda+\frac{1}{n}), that αj1\alpha_{j}\geq 1 for all jj, and that zi[λ,λ+1n]z_{i}\in[\lambda,\lambda+\frac{1}{n}] are all still true. We now have that (1+A+10ε/κ)2αj(d(j,i)+Aτi)2(1+A+10\varepsilon/\kappa)^{2}\alpha_{j}\geq(d(j,i)+A\sqrt{\tau_{i}})^{2} for all clients jj not in \𝒟D𝒟B\mathcal{B}\backslash\mathcal{D}_{D}\subset\mathcal{D}_{B} by using our modified versions of Lemma 8.7 and Lemma 8.17. In addition, our modified Lemma 8.7 tells us that even for bad clients j𝒟Bj\in\mathcal{D}_{B}, there exists a tight facility ii such that κ2αj(0)(d(j,i)+Ati)2\kappa^{-2}\cdot\alpha_{j}^{(0)}\geq(d(j,i)+A\cdot\sqrt{t_{i}})^{2} for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)]. Hence, we precisely have that κ2γOPTkj𝒟B(d(j,w(j))+Aτw(j))2\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2} for all A[2κ,1/(2κ)]A\in[2\kappa,1/(2\kappa)], by adding over all clients j𝒟Bj\in\mathcal{D}_{B} and using Corollary 8.15, which still holds unchanged, apart from replacing OPTk\text{OPT}_{k} with OPTk\text{OPT}_{k^{\prime}}. The final part of proving [1, Proposition 8.18], i.e., verifying Condition 4 in our definition 5.1, holds where the only change is replacing OPTk\text{OPT}_{k} with OPTk\text{OPT}_{k^{\prime}}. Thus, we have that each solution that is generated is (λ,k)(\lambda,k^{\prime})-roundable.

Finally, we have that [1, Theorem 8.19] still holds with essentially no change, meaning that each call to RaisePrice takes polynomial time and generates a polynomial number of (λ,k)(\lambda,k^{\prime})-roundable solutions for our modified definition of roundable. This also implies that Algorithm 2 runs in polynomial time, since GraphUpdate clearly takes polynomial time, and since the total number of times we call RaisePrice is at most ||L,|\mathcal{F}|\cdot L, which is polynomial since εz1\varepsilon_{z}^{-1} and m=||m=|\mathcal{F}| are both polynomial in nn.

Now, we note that each time we update our quasi-independent set in GraphUpdate, the new set (I1(,r),I2(,r),I3(,r))(I_{1}^{(\ell,r)},I_{2}^{(\ell,r)},I_{3}^{(\ell,r)}) only depends on I1(,r1)I_{1}^{(\ell,r-1)} and has no dependence on our choice of I2(,r1)I_{2}^{(\ell,r-1)} or I3(,r1)I_{3}^{(\ell,r-1)}. Therefore, if we ignore the sets I2,I3I_{2},I_{3} and only focus on I1I_{1}, the procedure of generating the sequence {I1(,r)}\{I_{1}^{(\ell,r)}\} is in fact identical to the procedure in Ahmadian et al.[1]. The only difference is that we choose our stopping point based on the first time that |I1(,r)|+p1|I2(,r)I3(,r)|<k,|I_{1}^{(\ell,r)}|+p_{1}\cdot|I_{2}^{(\ell,r)}\cup I_{3}^{(\ell,r)}|<k, as opposed to the first time that |I1(,r)|k|I_{1}^{(\ell,r)}|\leq k as done in [1]. Because of this, our Algorithm 2 in fact works exactly as the main algorithm in [1] if we only focus on I1(,r)I_{1}^{(\ell,r)} and set δ=δ1=4+827.\delta=\delta_{1}=\frac{4+8\sqrt{2}}{7}. The only differences are the way we choose when to stop the procedure and the way we update KK and the definition of stopped clients and definition of the set \mathcal{B}.

As a result, we have that each solution $(\alpha,z,\mathcal{F}_{S},\mathcal{D}_{S})$ generated is $(\lambda,k^{\prime})$-roundable, up to our modified definition of roundable (where $\lambda$ is chosen accordingly). By this, we mean that we replace Condition 3 in Definition 5.1 with:

  (a) For all $j\in\mathcal{D}\backslash\mathcal{D}_{B}$ and all $A\in[2\kappa,1/(2\kappa)]$, $(1+A+10\varepsilon/\kappa)^{2}\cdot\alpha_{j}\geq(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2}$.

  (b) For all $A\in[2\kappa,1/(2\kappa)]$, $\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(d(j,w(j))+A\cdot\sqrt{\tau_{w(j)}})^{2}$.

Recall that ε=κ21\varepsilon=\kappa^{2}\ll 1. Now, by setting A=2κA=2\kappa, we have that

(1+12κ)2αj=(1+2κ+10εκ)2αj(d(j,w(j))+2κτw(j))2c(j,w(j)),(1+12\kappa)^{2}\cdot\alpha_{j}=\left(1+2\kappa+\frac{10\varepsilon}{\kappa}\right)^{2}\cdot\alpha_{j}\geq\left(d(j,w(j))+2\kappa\cdot\sqrt{\tau_{w(j)}}\right)^{2}\geq c(j,w(j)),

and by setting A=1/(2κ)A=1/(2\kappa), we have

\left(\frac{1+12\kappa}{2\kappa}\right)^{2}\cdot\alpha_{j}\geq\left(1+\frac{1}{2\kappa}+10\cdot\frac{\varepsilon}{\kappa}\right)^{2}\cdot\alpha_{j}\geq\left(d(j,w(j))+\frac{1}{2\kappa}\cdot\sqrt{\tau_{w(j)}}\right)^{2}\geq\frac{1}{(2\kappa)^{2}}\cdot\tau_{w(j)},

where the first inequality holds since $\frac{1+12\kappa}{2\kappa}=\frac{1}{2\kappa}+6\geq 1+\frac{1}{2\kappa}+10\kappa$ for $\kappa\leq 1/2$.

Finally, by setting A=1A=1, we have

κ2γOPTkj𝒟B(d(j,w(j))+τw(j))2j𝒟B(c(j,w(j))+τw(j)).\kappa^{-2}\cdot\gamma\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}\left(d(j,w(j))+\sqrt{\tau_{w(j)}}\right)^{2}\geq\sum_{j\in\mathcal{D}_{B}}\left(c(j,w(j))+\tau_{w(j)}\right).

Therefore, by setting $\varepsilon^{\prime}=(1+12\kappa)^{2}-1=O(\sqrt{\varepsilon})$, we have that $(1+\varepsilon^{\prime})\cdot\alpha_{j}\geq c(j,w(j))$ and $(1+\varepsilon^{\prime})\cdot\alpha_{j}\geq\tau_{w(j)}$. In addition, if we set $\gamma^{\prime}=\kappa^{-2}\cdot\gamma$, we have that $\gamma^{\prime}\cdot\text{OPT}_{k^{\prime}}\geq\sum_{j\in\mathcal{D}_{B}}(c(j,w(j))+\tau_{w(j)})$. Since $\varepsilon=\kappa^{2}$, if we assume that $\gamma\ll\varepsilon^{2}$, then $\varepsilon^{\prime}\ll 1$ and $\gamma^{\prime}\ll\varepsilon\ll\varepsilon^{\prime}$. So, replacing $\varepsilon^{\prime}$ with $\varepsilon$ and $\gamma^{\prime}$ with $\gamma$, we obtain an algorithm that still runs in polynomial time (since the old values of $\kappa,\gamma,\varepsilon$ are still polynomial factors in the new values of $\varepsilon,\gamma$, which are all constants, even if arbitrarily small). But now, each solution $(\alpha^{(\ell)},z^{(\ell)},\mathcal{F}_{S}^{(\ell)},\mathcal{D}_{S}^{(\ell)})$ satisfies the actual Condition 3 in Definition 5.1, for our new values of $\varepsilon$ and $\gamma$.
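As a sanity check on the parameter substitutions above, the following Python sketch verifies the two prefactor bounds and the scaling of $\varepsilon^{\prime}$, assuming $\varepsilon=\kappa^{2}$; the value $\kappa=0.01$ is illustrative only.

```python
import math

# Sanity check of the substitutions above, assuming eps = kappa^2
# (so that 10 * eps / kappa = 10 * kappa); kappa = 0.01 is illustrative.
kappa = 0.01
eps = kappa ** 2

# A = 2*kappa: the prefactor collapses exactly to (1 + 12*kappa)^2.
assert abs((1 + 2 * kappa + 10 * eps / kappa) ** 2 - (1 + 12 * kappa) ** 2) < 1e-12

# A = 1/(2*kappa): ((1 + 12*kappa)/(2*kappa))^2 dominates the prefactor,
# since 1/(2*kappa) + 6 >= 1 + 1/(2*kappa) + 10*kappa whenever kappa <= 1/2.
assert ((1 + 12 * kappa) / (2 * kappa)) ** 2 >= (1 + 1 / (2 * kappa) + 10 * eps / kappa) ** 2

# eps' = (1 + 12*kappa)^2 - 1 = O(sqrt(eps)), since kappa = sqrt(eps)
eps_prime = (1 + 12 * kappa) ** 2 - 1
assert eps_prime <= 30 * math.sqrt(eps)
```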

Overall, we have that the algorithm runs in polynomial time, and each solution is $k^{\prime}$-roundable, where $k^{\prime}$ is the minimum of $k$ and $\min|I_{1}^{(0)}|$ over the course of the algorithm. Each pair of consecutive solutions is close as in Theorem 8.19 in [1] (which follows from their Proposition 8.10). Next, each time we create a solution $\mathcal{S}^{(0)}$, Lemma 8.1 in [1], which holds in our setting, tells us that every client is decided in $(\alpha^{(0)},z^{(0)})$. Since $\mathcal{B}$ is a subset of the undecided facilities, $\mathcal{B}=\emptyset$ for a solution $\mathcal{S}^{(0)}$, which means that $\mathcal{F}_{S}=\emptyset$ based on the definition of $\mathcal{F}_{S}$. In addition, our modified version of [1, Lemma 8.7] holds for all $j$ since $\mathcal{B}=\emptyset$, which means that we can set the bad clients $\mathcal{D}_{B}$ to be $\emptyset$. So, for each $\mathcal{S}^{(0)}$, the special facilities and bad clients are both empty. Next, we have that $I^{(\ell,r)}$ is a nested quasi-independent set because of how we defined $\mathcal{V}^{(\ell,r)}$ and $I^{(\ell,r)}$ in our GraphUpdate procedure. Finally, we have that $|\mathcal{V}^{(\ell,r)}\backslash\mathcal{V}^{(\ell,r+1)}|\leq 1$ as described at the end of Subsection 5.1, and that we created $I_{1}^{(\ell,r+1)}$ from $I_{1}^{(\ell,r)}$ by removing a single point of $\mathcal{V}^{(\ell,r)}$ if $|\mathcal{V}^{(\ell,r)}\backslash\mathcal{V}^{(\ell,r+1)}|=1$ (a point which may or may not be in $I_{1}^{(\ell,r)}$), and then extending to a maximal independent set of $\mathcal{V}^{(\ell,r+1)}$. So, we have that $|I_{1}^{(\ell,r)}\backslash I_{1}^{(\ell,r+1)}|\leq 1$. This means that all of the statements of Theorem 5.7 hold.

Appendix E Limit of [1] in Obtaining Improved Approximations

In this section, we show that the algorithm of Ahmadian et al. [1] cannot guarantee an LMP approximation better than $1+\sqrt{2}$ in the case of $k$-median. In more detail, we show that there exist a set of clients $\mathcal{D}$, facilities $\mathcal{F}$, and a parameter $\lambda>0$ such that for any choice of $\delta\geq 1$ in the pruning phase, the LMP algorithm described in the preliminary Subsection 5.1 does not obtain better than a $(1+\sqrt{2})$-approximation for $k$-median. As a result, their technique cannot guarantee a better LMP approximation for all choices of $\lambda$, which means any improvement to their analysis would have to move significantly outside the LMP framework.

We start with the $k$-median case. First, consider the points $j$, $i_{1}=j_{1}$, and $i_{2}=j_{2}$ such that $j,j_{1},j_{2}$ are collinear in that order, $d(j,j_{1})=T$, and $d(j_{1},j_{2})=\sqrt{2}\cdot T$ for some choice of $T>0$. Consider applying the LMP algorithm described in Section 2.2 on just these points, with $\mathcal{F}=\{i_{1},i_{2}\}$ and $\mathcal{D}=\{j,j,\dots,j,j_{1},j_{2}\}$, where we set $\lambda=T$ and include a large number $N$ of copies of $j$. In this case, the growing phase will set $\alpha_{j}=\alpha_{j_{1}}=\alpha_{j_{2}}=T$, where $i_{1}$ and $i_{2}$ both become tight. Also, $N(i_{1})=\{j_{1}\}$ (with each copy of $j$ barely not being in it) and $N(i_{2})=\{j_{2}\}$. One also obtains that $t_{i_{1}}=t_{i_{2}}=T$. Then, if $\delta\geq\sqrt{2}$, the facilities $i_{1},i_{2}$ are connected in the conflict graph $H(\delta)$, which means that the pruning phase will only allow either $i_{1}$ or $i_{2}$ to be in our set $S$. The choice is arbitrary, and the algorithm may set $S=\{i_{2}\}$. In this case, the total clustering cost is $N\cdot T\cdot(1+\sqrt{2})+\sqrt{2}\cdot T=(1+\sqrt{2})\cdot T\cdot(N+1)-T$, whereas the dual is $\sum\alpha_{j}-\lambda\cdot 1=T\cdot(N+2)-T=T\cdot(N+1)$. If $\delta<\sqrt{2}$, then both $i_{1},i_{2}$ are included, so the primal is $T\cdot N$ and the dual is $\sum\alpha_{j}-\lambda\cdot 2=T\cdot N$.
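The arithmetic for this collinear gadget can be checked numerically; the following sketch recomputes the primal and dual values in both cases, with the illustrative (hypothetical) values $T=1$ and $N=1000$.

```python
import math

# Numeric check of the collinear gadget: N copies of client j, plus j1, j2,
# with d(j, j1) = T and d(j1, j2) = sqrt(2) * T; facilities i1 = j1, i2 = j2;
# lambda = T. The values T = 1 and N = 1000 are illustrative.
T, N = 1.0, 1000

# delta >= sqrt(2): pruning keeps only S = {i2}, so each copy of j pays
# (1 + sqrt(2)) * T and j1 pays sqrt(2) * T.
primal_one = N * T * (1 + math.sqrt(2)) + math.sqrt(2) * T
dual_one = T * (N + 2) - T * 1  # sum of alphas minus lambda * |S|
ratio = primal_one / dual_one   # approaches 1 + sqrt(2) as N grows

# delta < sqrt(2): both facilities survive, and primal equals dual.
primal_two = T * N
dual_two = T * (N + 2) - T * 2
assert primal_two == dual_two
```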

Next, we consider a point $j$ as well as points $i_{1},\dots,i_{h}$, such that $i_{1},\dots,i_{h}$ form a regular simplex with centroid $j$ and pairwise distance $T^{\prime}\cdot\sqrt{2}\cdot(1-\varepsilon)$ between each $i_{r}$ and $i_{s}$, for some $T^{\prime}>0$ and arbitrarily small $\varepsilon$. Consider applying the LMP algorithm described in Section 2.2 on just these points, with $\mathcal{F}=\{i_{1},\dots,i_{h}\}$ and $\mathcal{D}=\{j\}$, where we set $\lambda=T^{\prime}\cdot\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)$. In this case, since $d(j,i_{r})=T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}$ for all $1\leq r\leq h$, all facilities $i_{r}$ will become tight, with $\alpha_{j}=t_{i_{r}}=T^{\prime}$ for all $1\leq r\leq h$. If $\delta<\sqrt{2}$, since the pairwise distances are more than $T^{\prime}\cdot\delta$, the conflict graph will be empty, so all facilities will be in the independent set. Therefore, the clustering cost will be $T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}$, and the dual will be

αjλh=T(1h(1(1ε)h1h))=T((1ε)h(h1)(h1))T(1εh).\alpha_{j}-\lambda\cdot h=T^{\prime}\left(1-h\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)\right)=T^{\prime}\cdot\left((1-\varepsilon)\sqrt{h(h-1)}-(h-1)\right)\leq T^{\prime}\cdot\left(1-\varepsilon\cdot h\right).

Else, if δ2\delta\geq\sqrt{2}, the conflict graph H(δ)H(\delta) is complete on i1,,ihi_{1},\dots,i_{h}, so only one facility will be in the independent set. The clustering cost is still T(1ε)h1hT^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}, and the dual will be

αjλ1=T(1(1(1ε)h1h))=T(1ε)h1h.\alpha_{j}-\lambda\cdot 1=T^{\prime}\left(1-\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)\right)=T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}.
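The quantities in this simplex gadget can likewise be checked numerically; the sketch below, with illustrative values of $\varepsilon$, $h$, and $T^{\prime}$ (`Tp` in the code), verifies the centroid-to-vertex distance of the regular simplex and the two dual values.

```python
import math

# Numeric check of the simplex gadget: h facilities at the vertices of a
# regular simplex with pairwise distance Tp * sqrt(2) * (1 - eps), one client
# j at the centroid. The values eps = 0.01, h = 50, Tp = 1 are illustrative.
eps, h, Tp = 0.01, 50, 1.0

# Circumradius of a regular simplex with edge length s is s * sqrt((h-1)/(2h)),
# which matches the stated distance d(j, i_r) = Tp * (1 - eps) * sqrt((h-1)/h).
s = Tp * math.sqrt(2) * (1 - eps)
radius = s * math.sqrt((h - 1) / (2 * h))
assert abs(radius - Tp * (1 - eps) * math.sqrt((h - 1) / h)) < 1e-9

lam = Tp * (1 - (1 - eps) * math.sqrt((h - 1) / h))
alpha = Tp  # every facility becomes tight at alpha_j = Tp

dual_all = alpha - lam * h  # delta < sqrt(2): all h facilities opened
dual_one = alpha - lam      # delta >= sqrt(2): a single facility opened
# dual_all = Tp * ((1-eps) * sqrt(h*(h-1)) - (h-1)) <= Tp * (1 - eps*h)
assert abs(dual_all - Tp * ((1 - eps) * math.sqrt(h * (h - 1)) - (h - 1))) < 1e-9
assert dual_all <= Tp * (1 - eps * h) + 1e-9
```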

Now, we fix $\varepsilon$ as a very small constant, and set $h=\Theta(\varepsilon^{-3})$, $T=1=\lambda$, $T^{\prime}=1/\left(1-(1-\varepsilon)\sqrt{\frac{h-1}{h}}\right)=\Theta(\varepsilon^{-1})$, and $N=\Theta(\varepsilon^{-2})$. We then consider the concatenation of the two instances described above, placed sufficiently far apart in Euclidean space that there is no interaction.

If δ2\delta\geq\sqrt{2}, then the overall clustering cost is

(1+2)T(N+1)T+T(1ε)h1h=(1+2)N(1+O(ε))(1+\sqrt{2})\cdot T\cdot(N+1)-T+T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}=(1+\sqrt{2})\cdot N\cdot(1+O(\varepsilon))

whereas the total dual is

T\cdot(N+1)+T^{\prime}\cdot(1-\varepsilon)\cdot\sqrt{\frac{h-1}{h}}=N\cdot(1+O(\varepsilon)).

So, we do not obtain better than a 1+2O(ε)1+\sqrt{2}-O(\varepsilon) approximation in this case. If δ<2\delta<\sqrt{2}, then the total dual is in fact negative, as it is at most

TN+T(1εh)Ω(ε3).T\cdot N+T^{\prime}\cdot(1-\varepsilon\cdot h)\leq-\Omega(\varepsilon^{-3}).
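Putting the two gadgets together, both conclusions can be confirmed numerically; the following sketch, with the illustrative choice $\varepsilon=0.01$, checks that the primal-to-dual ratio approaches $1+\sqrt{2}$ when $\delta\geq\sqrt{2}$ and that the total dual is negative when $\delta<\sqrt{2}$.

```python
import math

# Numeric check of the combined instance; eps = 0.01 is illustrative, with
# h = eps^-3, N = eps^-2, T = lambda = 1, and Tp chosen so that both gadgets
# share the same value of lambda.
eps = 0.01
h, N, T = round(eps ** -3), round(eps ** -2), 1.0
Tp = 1.0 / (1 - (1 - eps) * math.sqrt((h - 1) / h))

simplex_cost = Tp * (1 - eps) * math.sqrt((h - 1) / h)

# delta >= sqrt(2): the primal-to-dual ratio is 1 + sqrt(2) - O(eps).
primal = (1 + math.sqrt(2)) * T * (N + 1) - T + simplex_cost
dual = T * (N + 1) + simplex_cost
ratio = primal / dual

# delta < sqrt(2): the total dual is negative (at most -Omega(eps^-3)).
dual_neg = T * N + Tp * ((1 - eps) * math.sqrt(h * (h - 1)) - (h - 1))
assert dual_neg < 0
```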

Overall, there is no choice of δ\delta that we can set to improve over a 1+21+\sqrt{2} approximation.