This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

New Approximation Algorithms for Fair kk-median Problem

Di Wu School of Computer Science and Engineering, Central South University, Changsha, China. Email: [email protected]    Qilong Feng School of Computer Science and Engineering, Central South University, Changsha, China. Email: [email protected]    Jianxin Wang School of Computer Science and Engineering, Central South University, Changsha, China. Email: [email protected]
Abstract

The fair kk-median problem is one of the important clustering problems. The current best approximation ratio is 4.675 for this problem with 1-fair violation, which was proposed by Bercea et al. [APPROX-RANDOM’2019]. As far as we know, there is no available approximation algorithm for the problem without any fair violation. In this paper, we consider the fair kk-median problem in bounded doubling metrics and general metrics. We provide the first QPTAS for fair kk-median problem in doubling metrics. Based on the split-tree decomposition of doubling metrics, we present a dynamic programming process to find the candidate centers, and apply min-cost max-flow method to deal with the assignment of clients. Especially, for overcoming the difficulties caused by the fair constraints, we construct an auxiliary graph and use minimum weighted perfect matching to get part of the cost. For the fair kk-median problem in general metrics, we present an approximation algorithm with ratio O(logk)O(\log k), which is based on the embedding of given space into tree metrics, and the dynamic programming method. Our two approximation algorithms for the fair kk-median problem are the first results for the corresponding problems without any fair violation, respectively.

1 Introduction

As one of the most commonly studied problems, clustering aims to give a ”good” partition of a set of points of a metric space into different groups so that points with similar attributes should be in the same group. The clustering algorithm may be biased if the data set involves some sensitive attributes (gender, race, or age). To solve this deviation caused by some sensitive attributes of the data set, the fair clustering problem was proposed. Over the past few years, lots of attention have been paid on the fair clustering problem (Abbasi et al. (2021); Ahmadian et al. (2020); Backurs et al. (2019); Chen et al. (2019); Chierichetti et al. (2017); Ghadiri et al. (2021); Bera et al. (2019); Esmaeili et al. (2020)).

Chierichetti et al. (2017) first proposed the definition of redistricting fairness for clustering, which refers to the balance of disparate impact and fair representation of each protected class in every cluster. Bercea et al. (2019) first proposed a generalized and tunable notion of redistricting fairness where asserts that protected classes should maintain an upper bound and a lower bound proportion of the cluster size in clustering. This kind of fairness model is called (α,β)(\alpha,\beta)-proportional fair model. The input for fair kk-median problem is a set of points 𝒞\mathcal{C}, a set of candidate centers \mathcal{F} in a metric space together with an integer ll, and two vectors 𝜶\boldsymbol{\alpha} and 𝜷\boldsymbol{\beta}, the fair kk-median problem asks for a set FF\subseteq\mathcal{F} of kk points, called centers\emph{c}enters, together with an assignment μ:𝒞F\mu:\mathcal{C}\rightarrow F, where each client in the data set 𝒞\mathcal{C} is entitled a color color(i)color(i) and 𝒞\mathcal{C} is divided into ll disjoint groups 𝒞1,𝒞2,,𝒞l\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{l}, the goal is to minimize the kk-median objective function satisfying that the number of each type clients within color(i)color(i) in each cluster conforms to a certain proportion between αi\alpha_{i} and βi\beta_{i}.

Bercea et al. (2019) first proposed a bicriteria approximation scheme with 4.6754.675-approximation and 11-fairness violation in polynomial time for the fair kk-median problem in metric space, which can be extended to a 3/3.488/62.8563/3.488/62.856-approximation for the fair kk-center/facility location/kk-means problem within a 11-fairness violation, respectively. For the special case of αi=α,βi=0\alpha_{i}=\alpha,\beta_{i}=0 (α\alpha is a fixed number and i[l]i\in[l]), Ahmadian et al. (2019) gave a 33-approximation with 22-fairness violation algorithm for fair kk-center problem in metric space. As a generalization for the above (α,β)(\alpha,\beta)-proportional fairness model, Bera et al. (2019) proposed a more generalized notion of fairness such that each client can belong to multiple types. They gave an approximation algorithm with ρ+2\rho+2-approximation and 4Δ+34\Delta+3-fairness violation in polynomial time for the fair kk-median, fair kk-center and fair kk-means problems, where ρ\rho is the approximation ratio of the known kk-clustering algorithm, and Δ\Delta denotes the maximum number of types that each client belongs to. Harb and Lam (2020) gave a 33-approximation and 4Δ+34\Delta+3-fairness violation for the fair kk-center problem in polynomial time. As far as we know, for the fair kk-median problem, no approximation algorithm without fairness violation is available for any non-trivial metrics.

2 Results and Techniques

We propose the first qusi-polynomial time approximation scheme(QPTAS) for the fair kk-median problem in doubling metrics, the first O(logk)O(\log k)-approximation algorithm for the fair kk-median problem in general metrics, and these two results have no fairness violation.

Theorem 2.1.

Given an instance of size nn of the fair kk-median problem in metrics of fixed doubling dimension dd, there exists a randomized (1+ϵ)(1+\epsilon)-approximation algorithm with running time O~(n(lognϵ)O(d)l)\tilde{O}(n^{{(\frac{\log n}{\epsilon})}^{O(d)}l}).

Theorem 2.2.

Given an instance of size nn of the fair kk-median problem in general metric space, there exists a randomized O(logk)O(\log k)-approximation algorithm with running time nO(l)n^{O(l)}.

Our method solving the fair kk-median problem in doubling metrics is based on the dynamic programming process on split-tree decomposition. Firstly, the split-tree decomposition is to partition the metrics into a number of subsets randomly, called blocks, and keeps dividing these blocks recursively until each block only contains one point. For each block, we construct a subset of points, called portals, as the bridges to get the distance of points in split-tree. Our goal is to prove that there exists a near-optimal solution such that different blocks ”interact” only through portals, which is one of the major challenges to solve the fair kk-median problem in doubling metrics. The dynamic programming algorithm proceeds on the split-tree from the leaves to the root to fill in the DP table. For a table entry of a specific block, we save the information that how many clients of each type located outside (respectively inside) is assigned to facilities located inside (respectively outside) this block through each portal, and the number of facilities opened in this block. For a subproblem of a specific block, the value of the table entry of this block is calculated by considering the following two parts: the total value of table entries of the children of this block which can be extracted from the DP table; the cost between the portals of this block and the ones in its children. How to calculate the cost between the portals of a block and the ones in its children is another challenge to solve the fair kk-median problem in doubling metrics. To solve this problem, we construct an auxiliary graph and use minimum weighted perfect matching to get the cost between the portals of a block and the ones in its children. The dynamic programming procedure returns a set of facilities that need to be opened, and the number of clients assigned to each open facility satisfying the fairness constraints. Finally, we apply the min-cost max-flow method to assign the clients to the facilities.

The algorithm solving the fair kk-median problem in general metric space is based on the embedding of given space to tree metrics such that a Hierarchically Separated Tree (HST) can be obtained. Moreover, a dynamic programming process is presented on the Hierarchically Separated Tree to get the candidate opened facilities.

3 Preliminaries

Definition 3.1 (metric space).

Given a space 𝒳=(X,dist)\mathcal{X}=(X,dist) where XX is a set of points associated with a distance function dist:X×X0dist:X\times X\rightarrow\Re_{0}, space 𝒳\mathcal{X} is called a metric space if for any point x,y,zx,y,z in XX, the following properties are satisfied: (1) dist(x,y)=0dist(x,y)=0 iff x=yx=y; (2) dist(x,y)=dist(y,x)dist(x,y)=dist(y,x); (3) dist(x,y)+dist(y,z)dist(x,z)dist(x,y)+dist(y,z)\geq dist(x,z).

Definition 3.2 (doubling dimension).

Given a metric space (X,dist)(X,dist) and a non-negative real number rr, for any point xXx\in X, let B(x,r)={yXdist(x,y)r}B(x,r)=\{y\in X\mid dist(x,y)\leq r\} denote the ball around xx with radius rr. The doubling dimension of the metric space (X,dist)(X,dist) is the smallest integer dd such that any ball B(x,2r)B(x,2r) can be covered by at most 2d2^{d} ball B(x,r)B(x,r). A metric space with doubling dimension is called a doubling metric space.

Given a metric space 𝒳=(X,dist)\mathcal{X}=(X,dist), let Δ\Delta be the ratio between the largest and the smallest distance in 𝒳\mathcal{X}, which is called the aspect ratio of 𝒳\mathcal{X}. For any subset YXY\subseteq X, the aspect ratio of YY is defined as maxy,yYd(y,y)miny,yYd(y,y)\frac{\max_{y,y^{\prime}\in Y}d(y,y^{\prime})}{\min_{y,y^{\prime}\in Y}d(y,y^{\prime})}. For a positive integer ll, let [l]={1,2,,l}[l]=\{1,2,...,l\}.

Definition 3.3 (the fair kk-median problem).

Given a metric space (X,dist)(X,dist) with a set of clients 𝒞\mathcal{C} and a set of facilities \mathcal{F}, a non-negative integer kk and two vectors 𝛂,𝛃l\boldsymbol{\alpha},\boldsymbol{\beta}\in\mathcal{R}^{l} (0αiβi1)(0\leq\alpha_{i}\leq\beta_{i}\leq 1), assume that 𝒞\mathcal{C} is partitioned into ll disjoint groups, i.e., 𝒞={𝒞1,𝒞2,,𝒞l}\mathcal{C}=\{\mathcal{C}_{1},\mathcal{C}_{2},\ldots,\mathcal{C}_{l}\} and 𝒞i𝒞j=\mathcal{C}_{i}\cap\mathcal{C}_{j}=\emptyset (ij)(i\neq j). For any i[l]i\in[l], if x𝒞ix\in\mathcal{C}_{i}, xx is called an ii-th type client. The fair kk-median problem is to find a subset FF\subseteq\mathcal{F} of size at most kk and an assignment function μ:𝒞F\mu:\mathcal{C}\rightarrow F such that: (1) for any fFf\in F, i[l]i\in[l], αi{c𝒞iμ(c)=f}{c𝒞μ(c)=f}βi\alpha_{i}\leq\frac{\mid\{c\in\mathcal{C}_{i}\mid\mu(c)=f\}\mid}{\mid\{c\in\mathcal{C}\mid\mu(c)=f\}\mid}\leq\beta_{i}; (2) c𝒞dist(c,μ(c))\sum_{c\in\mathcal{C}}dist(c,\mu(c)) is minimized.

For an instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem, given a solution (F,μ)(F,\mu) of instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}), the facilities in FF are called candidate facilities. For a facility fFf\in F and a client cCc\in C, if μ(c)=f\mu(c)=f, then it is called that ff serves client cc. Let X=𝒞X=\mathcal{C}\cup\mathcal{F}, and assume that the size of XX is nn.

4 A QPTAS for Fair kk-median Problem in Doubling Metrics

For the fair kk-median problem in doubling metrics, we first construct a split-tree, and design a dynamic programming procedure to get the solution.

4.1 Split-tree Decomposition

Given an instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem, following the assumptions in Behsaz et al. (2019), the aspect-ratio Δ\Delta of the input metrics is at most O(n4/ϵ)O(n^{4}/\epsilon) (where nn is the size of 𝒞\mathcal{C}\cup\mathcal{F}, and ϵ\epsilon is a constant). In order to satisfy the aspect-ratio assumption, we first deal with the points in XX by the following preprocessing steps and we use OPTOPT to denote an optimal solution of fair kk-median problem. Find an O(1)O(1)-approximation feasible solution LL of the fair kk-median problem (Bercea et al. (2019)), and let cost(L)cost(L) be the cost of the feasible solution obtained. Then, if there exists a pair of points x,yXx,y\in X with distance less than ϵcost(L)/n4\epsilon cost(L)/n^{4}, remove xx, copy a point xx^{\prime} of xx, and add xx^{\prime} at yy (note that if there are two clients at yy, the two clients may not necessarily be assigned to the same facility in the solution). In the new instance obtained, a point xx is within distance at most nϵcost(L)/n4n\cdot\epsilon cost(L)/n^{4} from its original location. Thus, the cost of any solution for this new instance compared with the original instance is increased by at most nϵcost(L)/n3ϵcost(OPT)n\cdot\epsilon cost(L)/n^{3}\leq\epsilon cost(OPT).

For any set YY of points, a subset SYS\subseteq Y is called a ρ\rho-coveringcovering of YY if for any point yYy\in Y, there is a point sSs\in S such that dist(y,s)ρdist(y,s)\leq\rho. A subset SYS\subseteq Y is called a ρ\rho-packingpacking of YY if for any s,sSs,s^{\prime}\in Sdist(s,s)ρdist(s,s^{\prime})\geq\rho. A subset SYS\subseteq Y is a ρ\rho-netnet in YY if it is both a ρ\rho-coveringcovering of YY and a ρ\rho-packingpacking of YY. The size of a net in metrics with fixed doubling dimension dd is bounded by the following lemma.

Lemma 4.1.

Let (X,dist)(X,dist) be a metric space with doubling dimension dd and aspect-ratio Δ\Delta, and let SXS\subseteq X be a ρ\rho-netnet. Then S(Δρ)O(d)\mid S\mid\leq(\frac{\Delta}{\rho})^{O(d)}.

A decomposition of the metric (X,dist)(X,dist) is a partition of XX into subsets, which are called blocks. A hierarchical decomposition is a sequence of +1\ell+1 decompositions H0,H1,,HH_{0},H_{1},...,H_{\ell} such that every block of Hi+1H_{i+1} is the union of blocks in HiH_{i} and H={X}H_{\ell}=\{X\} and H0={{x}xX}H_{0}=\{\{x\}\mid x\in X\}. The blocks of H_iH\_i are called the level ii blocks. A split-tree of the metric space is a complete hierarchical decomposition: the root node is the level \ell block H={X}H_{\ell}=\{X\} and the leaves are the singletons such that H0={{x}xX}H_{0}=\{\{x\}\mid x\in X\}. Given a metric space (X,dist)(X,dist), let Δ=2\Delta=2^{\ell} be the diameter of the metrics. We first construct a sequence of sets Y1Y2Y1Y0=XY_{\ell-1}\subseteq\ldots\subseteq Y_{2}\subseteq Y_{1}\subseteq Y_{0}=X such that YiY_{i} is a 2i22^{i-2}-net of Yi1Y_{i-1}. Starting from H={X}H_{\ell}=\{X\}, we now discuss the relation between the blocks in (i+1i+1)-th level and ii-th level, where 0i10\leq i\leq\ell-1. Let ri=2iϱr_{i}=2^{i}\varrho where 12ϱ<1\frac{1}{2}\leq\varrho<1, and let π\pi be a random ordering of the points in XX. For each point xXx\in X, let π(x)\pi(x) be the order of xx in π\pi. For the set YiY_{i}, let {y1,,yh}\{y_{1},\ldots,y_{h}\} be the set of points in YiY_{i} with order from left to right in π\pi. For the point y1y_{1}, a new block By1B^{y_{1}} can be obtained at level ii such that By1={xXdist(x,y1)ri}B^{y_{1}}=\{x\in X\mid dist(x,y_{1})\leq r_{i}\}. For any point yjy_{j} (1<jh)(1<j\leq h), a new block ByjB^{y_{j}} can be obtained at level ii by the following way: let Z={xXdist(x,yj)ri}Z=\{x\in X\mid dist(x,y_{j})\leq r_{i}\}. For each point xx in ZZ, if xx is not contained in any blocks {By1,,Byj1}\{B^{y_{1}},\ldots,B^{y_{j-1}}\}, then xx is contained in ByjB^{y_{j}}.

Lemma 4.2 (Talwar (2004)).

Given a split-tree of metric space (X,dist)(X,dist) with a sequence of decompositions H0,H1,,HH_{0},H_{1},...,H_{\ell}, the split-tree decomposition has the following properties:

  1. 1.

    The total number of levels \ell is O(logn)O(\log n) (since Δ=O(n4/ϵ)\Delta=O(n^{4}/\epsilon)).

  2. 2.

    Each block of level ii has diameter at most 2i+12^{i+1}, namely the maximum distance between any pair of points in a block of level ii is at most 2i+12^{i+1}.

  3. 3.

    Each level ii block is the union of at most 2O(d)2^{O(d)} level i1i-1 blocks.

  4. 4.

    For any pair of points u,vXu,v\in X, the probability that they are in different sets corresponding to blocks at level ii of split-tree is at most O(d)dist(u,v)2iO(d)\cdot\frac{dist(u,v)}{2^{i}}.

Given an instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem, let TT be the split-tree obtained by using the methods in Talwar (2004). For each block BB of level ii in TT, we compute a ρ2i+1\rho 2^{i+1}-net PP of block BB. Each point in PP is called a portal, and PP is also called a portal set. By Lemma 4.1, it follows that the number of portals at a given block is ρO(d)\rho^{-O(d)}. Moreover, the split-tree within portal set can be found in time (1/ρ)O(d)nlogΔ(1/\rho)^{O(d)}n\log\Delta (Bartal and Gottlieb (2013); Cohen-Addad et al. (2019)). For a portal set PP of a block BB of level ii in TT, there is an important property as follows.

Lemma 4.3 (Bartal et al. (2016)).

For a portal set PP of a block BB of level ii in TT, assume that the children of BB at level i1i-1 is B1,B2,,BuB_{1},B_{2},\ldots,B_{u} and each block BjB_{j} has a portal set PjBjP_{j}\subseteq B_{j}. PP is a subset of the portal sets computed for the descendant blocks of BB. That is, PP1P2PuP\subseteq P_{1}\cup P_{2}\cup\ldots\ P_{u}.

Given a split-tree TT and two blocks B1B_{1} and B2B_{2} at level ii of TT, and for two points uB1u\in B_{1}, and vB2v\in B_{2}, the distance between uu and vv on the split-tree is defined as the length of the path which is constructed by the subpath from uu to a portal pip_{i} of B1B_{1}, the subpath from pip_{i} to a portal pjp_{j} of B2B_{2}, and the subpath from pip_{i} to vv. Given a block BB at level ii of split-tree TT and a portal set PBP\subseteq B, if there is a client c1c_{1} outside of BB that is assigned to a facility f1f_{1} inside of BB crossing at a portal p1Pp_{1}\in P, then we say that client c1c_{1} enters BB through portal p1p_{1}. Similarly, if there is a client c2c_{2} inside of BB that is assigned to a facility f2f_{2} outside of BB crossing at a portal p2Pp_{2}\in P, then we say that client c2c_{2} leaves BB through portal p2p_{2}. In the following, all the distances considered are the distances on the split-tree TT.

Lemma 4.4.

For any metric (X,dist)(X,dist) with doubling dimension dd and any ρ>0\rho>0, given a randomized split-tree TT, a pair of blocks B1B_{1} and B2B_{2} of level ii, a ρ2i+1\rho 2^{i+1}-net P1P_{1} of block B1B_{1}, a ρ2i+1\rho 2^{i+1}-net P2P_{2} of block B2B_{2}, and any two points uB1u\in B_{1} and vB2v\in B_{2}, for the distance of uu and vv on the split-tree TT and the distance dist(u,v)dist(u,v) in the metric space, we have: minpiP1,pjP2{dist(u,pi)+dist(pi,pj)+dist(pj,v)}dist(u,v)+O(ρ2i+1)\min_{p_{i}\in P_{1},p_{j}\in P_{2}}\{dist(u,p_{i})+dist(p_{i},p_{j})+dist(p_{j},v)\}\leq dist(u,v)+O(\rho 2^{i+1}).

Proof.

By the triangle inequality, we have

dist(pi,pj)dist(u,pi)+dist(u,pj)dist(u,pi)+dist(u,v)+dist(v,pj).\displaystyle dist(p_{i},p_{j})\leq dist(u,p_{i})+dist(u,p_{j})\leq dist(u,p_{i})+dist(u,v)+dist(v,p_{j}).
dist(u,pi)+dist(pi,pj)+dist(v,pj)2dist(u,pi)+dist(u,v)+2dist(v,pj).\displaystyle dist(u,p_{i})+dist(p_{i},p_{j})+dist(v,p_{j})\leq 2dist(u,p_{i})+dist(u,v)+2dist(v,p_{j}).

Then, we have

minpiP1,pjP2{dist(u,pi)+dist(pi,pj)+dist(pj,v)}\displaystyle\min_{p_{i}\in P_{1},p_{j}\in P_{2}}\{dist(u,p_{i})+dist(p_{i},p_{j})+dist(p_{j},v)\}
minpiP1,pjP2{2dist(u,pi)+dist(u,v)+2dist(v,pj)}\displaystyle\leq\min_{p_{i}\in P_{1},p_{j}\in P_{2}}\{2dist(u,p_{i})+dist(u,v)+2dist(v,p_{j})\}
=minpiP1,pjP2{2dist(u,pi)+2dist(v,pj)}+dist(u,v)\displaystyle=\min_{p_{i}\in P_{1},p_{j}\in P_{2}}\{2dist(u,p_{i})+2dist(v,p_{j})\}\!+\!dist(u,v)
=minpiP1,pjP22dist(u,pi)+minpiP1,pjP22dist(v,pj)+dist(u,v).\displaystyle=\min_{p_{i}\in P_{1},p_{j}\in P_{2}}2dist(u,p_{i})\!+\!\min_{p_{i}\in P_{1},p_{j}\in P_{2}}2dist(v,p_{j})+dist(u,v).

Since P1P_{1} is a ρ2i+1\rho 2^{i+1}-net of block B1B_{1} and P2P_{2} is a ρ2i+1\rho 2^{i+1}-net of block B2B_{2}, we have that minpiP1,pjP22dist(u,pi)2dist(u,pi)2ρ2i+1\min\limits_{p_{i}\in P_{1},p_{j}\in P_{2}}2dist(u,p_{i})\leq 2dist(u,p_{i}^{\prime})\leq 2\cdot\rho 2^{i+1}, where pip_{i}^{\prime} is any point in P1P_{1}. Similarly, we have that minpiP1,pjP22dist(u,pi)2ρ2i+1\min\limits_{p_{i}\in P_{1},p_{j}\in P_{2}}2dist(u,p_{i})\leq 2\cdot\rho 2^{i+1}. Thus, we have

minpiP1,pjP2{dist(u,pi)+dist(pi,pj)+dist(pj,v)}dist(u,v)+O(ρ2i+1)\min_{p_{i}\in P_{1},p_{j}\in P_{2}}\{dist(u,p_{i})+dist(p_{i},p_{j})+dist(p_{j},v)\}\leq dist(u,v)+O(\rho 2^{i+1})

4.2 Algorithm for Fair kk-median Problem in Doubling Metrics

In this section, we show how to design the dynamic programming process based on the split-tree.

4.2.1 Dynamic Programming State

Given a split-tree TT, the dynamic programming algorithm proceeds on split-tree TT from the leaves to the root. For any block BB of split-tree TT, we define the subtree rooted at block BB as TBT_{B}, and each subtree is a subproblem in our dynamic programming process, which includes the information that how many clients of each type entering or leaving the subtree through the portals of the subtree.

A table entry in the dynamic programming process is a tuple DP[B,kB,𝑸1E,𝑸1L,𝑸2E,𝑸2L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\boldsymbol{Q}_{2}^{E},\boldsymbol{Q}_{2}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}] where the parameters are defined as follows:

  1. 1.

    mm is the size of portal set in BB and m=ρO(d)m=\rho^{-O(d)}.

  2. 2.

    BB is the root node of the subtree TBT_{B}.

  3. 3.

    kBk_{B}(0kBk0\leq k_{B}\leq k) is the number of open facilities in BB.

  4. 4.

    𝑸iE\boldsymbol{Q}_{i}^{E} (i[m]i\in[m]) is a vector with ll values and for t[l]t\in[l], qtE(i)q_{t}^{E(i)}(0qtE(i)n0\leq q_{t}^{E(i)}\leq n) in 𝑸iE\boldsymbol{Q}_{i}^{E} denotes the number of type tt clients that enter BB through the ii-th portal.

  5. 5.

    𝑸iL\boldsymbol{Q}_{i}^{L} (i[m]i\in[m]) is a vector with ll values and for t[l]t\in[l], qtL(i)q_{t}^{L(i)}(nqtL(i)0-n\leq q_{t}^{L(i)}\leq 0) in 𝑸iL\boldsymbol{Q}_{i}^{L} denotes the number of type tt clients that leave BB through the ii-th portal.

The cost of table entry DP[B,kB,𝑸1E,𝑸1L,𝑸2E,𝑸2L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\boldsymbol{Q}_{2}^{E},\boldsymbol{Q}_{2}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}] consists of the following three parts: (1) The cost of assigning clients inside of BB to facilities inside of BB; (2) For the clients inside of BB assigned to facilities outside of BB through portals of BB, the cost from the clients inside of BB to portals of BB; (3) For the clients outside of BB assigned to facilities inside of BB through portals of BB, the cost from portals of BB to the assigned facilities inside of BB.

The solutions at the root RR of TT are in the table entry DP[R,k,𝟎,𝟎,,𝟎]DP[R,k,\boldsymbol{0},\boldsymbol{0},\ldots,\boldsymbol{0}], where 𝟎\boldsymbol{0} is a vector with ll zero components. Among all these solutions in DP[R,k,𝟎,𝟎,,𝟎]DP[R,k,\boldsymbol{0},\boldsymbol{0},\ldots,\boldsymbol{0}], the algorithm outputs the one with the minimum cost.

The base case of the dynamic programming is located at the leaves (which are singletons) of split-tree. For a leaf of split-tree, the corresponding block has only one portal and we only need to set 𝑸1E\boldsymbol{Q}_{1}^{E} and 𝑸1L\boldsymbol{Q}_{1}^{L}, while the remaining 𝑸iE\boldsymbol{Q}_{i}^{E}’s and 𝑸iL\boldsymbol{Q}_{i}^{L}’s can be assigned 𝟎\boldsymbol{0}. Since there is only one facility or client in each leaf of split-tree, we consider the following three cases for the table entry of leaf node BB:

  1. 1.

    If there is only one facility in the block and this facility is opened, then the table value of this block is DP[B,1,𝑸1E,𝟎,𝟎,𝟎,,𝟎,𝟎]=0DP[B,1,\boldsymbol{Q}_{1}^{E},\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\ldots,\boldsymbol{0},\boldsymbol{0}]=0. In addition, the tt-th value of 𝑸1E\boldsymbol{Q}_{1}^{E} is denoted as qtq_{t}, i.e., qtq_{t} is the number of type tt clients outside of BB that are served by the facility inside BB. Thus, in order to satisfy the fair constraints, the number of each type clients should satisfy the following constraint: for each t[l]t\in[l], αtqtt=1lqtβt\alpha_{t}\leq\frac{q_{t}}{\sum_{t=1}^{l}q_{t}}\leq\beta_{t}.

  2. 2.

    If there is only one facility in the block but this facility is not opened, then the table value of this block is DP[B,0,𝟎,𝟎,𝟎,𝟎,,𝟎,𝟎]=0DP[B,0,\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\ldots,\boldsymbol{0},\boldsymbol{0}]=0.

  3. 3.

    If there is only one client in the block, then the table value of this block is DP[B,0,𝟎,𝑸1L,𝟎,𝟎,,𝟎,𝟎]=0DP[B,0,\boldsymbol{0},\boldsymbol{Q}_{1}^{L},\boldsymbol{0},\boldsymbol{0},\ldots,\boldsymbol{0},\boldsymbol{0}]=0. If the client is a type tt (t[l]t\in[l]) client, then the tt-th value in 𝑸1L\boldsymbol{Q}_{1}^{L} is -1 and the other values in 𝑸1L\boldsymbol{Q}_{1}^{L} is 0.

4.2.2 Computing Table Entries

Now we consider the subproblem on the subtree TBT_{B} where block BB is at level ii of split-tree TT. For the block BB, let B1,B2,,BuB_{1},B_{2},...,B_{u} be the children of block BB where uu is at most 2O(d)2^{O(d)}. Let (B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) be the configuration of BB, and let (Bj,kBj,𝑸1E(j),𝑸1L(j),,𝑸mE(j),𝑸mL(j))(B_{j},k_{B_{j}},\boldsymbol{Q}_{1}^{E(j)},\boldsymbol{Q}_{1}^{L(j)},\ldots,\boldsymbol{Q}_{m}^{E(j)},\boldsymbol{Q}_{m}^{L(j)}) be the configuration of children BjB_{j} (1ju1\leq j\leq u), where BjB_{j} (j[u]j\in[u]) denotes the jj-th child of BB, kBjk_{B_{j}} is the number of facilities that are opened in BjB_{j}, 𝑸iE(j)\boldsymbol{Q}_{i}^{E(j)} (i[m]i\in[m]) is a vector with ll values and each value qtE(ij)q_{t}^{E(ij)} (0qtE(ij)n,t[l]0\leq q_{t}^{E(ij)}\leq n,t\in[l]) in 𝑸iE(j)\boldsymbol{Q}_{i}^{E(j)} denotes the number of the tt-th type clients entering BjB_{j} through the ii-th portal of BjB_{j}, 𝑸iL(j)\boldsymbol{Q}_{i}^{L(j)} is a vector with ll values and each value qtL(ij)q_{t}^{L(ij)} (0qtL(ij)n,t[l]0\leq q_{t}^{L(ij)}\leq n,t\in[l]) in 𝑸iL(j)\boldsymbol{Q}_{i}^{L(j)} denotes the number of the tt-th type clients leaving BjB_{j} through the ii-th portal of BjB_{j}. The values of kBjk_{B_{j}}, 𝑸iE(j)\boldsymbol{Q}_{i}^{E(j)} and 𝑸iL(j)\boldsymbol{Q}_{i}^{L(j)} have the following constraints:

  1. (1)

    kB1+kB2++kBukBk_{B_{1}}+k_{B_{2}}+\ldots+k_{B_{u}}\leq k_{B};

  2. (2)

    i=1mj=1u𝑸iE(j)+i=1mj=1u𝑸iL(j)\sum_{i=1}^{m}{\sum_{j=1}^{u}\boldsymbol{Q}_{i}^{E(j)}}+\sum_{i=1}^{m}{\sum_{j=1}^{u}\boldsymbol{Q}_{i}^{L(j)}}=i=1m𝑸iE+i=1m𝑸iL\sum_{i=1}^{m}\boldsymbol{Q}_{i}^{E}+\sum_{i=1}^{m}\boldsymbol{Q}_{i}^{L}.

Lemma 4.5.

Given a subproblem BB at level ii of split-tree TT (i0i\neq 0) and its children B1,B2,,BuB_{1},B_{2},\ldots,B_{u} where uu is at most 2O(d)2^{O(d)}, for the configuration (B,kB,𝐐1E,𝐐1L,,𝐐mE,𝐐mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) of BB, and the configuration (Bj,kBj,𝐐1E(j),𝐐1L(j),,𝐐mE(j),𝐐mL(j))(B_{j},k_{B_{j}},\boldsymbol{Q}_{1}^{E(j)},\boldsymbol{Q}_{1}^{L(j)},\ldots,\boldsymbol{Q}_{m}^{E(j)},\boldsymbol{Q}_{m}^{L(j)}) of each children BjB_{j} (1ju1\leq j\leq u), we have

DP[B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL]=min{j=1uDP[Bj,kBj,𝑸1E(j),𝑸1L(j),,𝑸mE(j),𝑸mL(j)]+τ,\displaystyle DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}]=\min\{\sum_{j=1}^{u}DP[B_{j},k_{B_{j}},\boldsymbol{Q}_{1}^{E(j)},\boldsymbol{Q}_{1}^{L(j)},\ldots,\boldsymbol{Q}_{m}^{E(j)},\boldsymbol{Q}_{m}^{L(j)}]+\tau,
withkB1+kB2++kBukB,i=1mj=1u𝑸iE(j)+i=1mj=1u𝑸iL(j)=i=1m𝑸iE+i=1m𝑸iL}\displaystyle with\ k_{B_{1}}+k_{B_{2}}+\ldots+k_{B_{u}}\leq k_{B},\sum_{i=1}^{m}\sum_{j=1}^{u}\boldsymbol{Q}_{i}^{E(j)}+\sum_{i=1}^{m}{\sum_{j=1}^{u}\boldsymbol{Q}_{i}^{L(j)}}=\sum_{i=1}^{m}\boldsymbol{Q}_{i}^{E}+\sum_{i=1}^{m}\boldsymbol{Q}_{i}^{L}\}

where τ\tau is the cost between portals of B1,B2,,BuB_{1},B_{2},\ldots,B_{u} and the ones in BB, and can be calculated in polynomial time.

Proof.

Since there exists consistence relation between the configurations of BB and its children, that is, the number of clients entering or leaving in the configuration (B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) is equal to the number of clients entering or leaving in the configurations (Bj,kBj,𝑸1E(j),𝑸1L(j),,𝑸mE(j),𝑸mL(j))(B_{j},k_{B_{j}},\boldsymbol{Q}_{1}^{E(j)},\boldsymbol{Q}_{1}^{L(j)},\ldots,\boldsymbol{Q}_{m}^{E(j)},\boldsymbol{Q}_{m}^{L(j)}), we have

i=1mj=1u𝑸iE(j)+i=1mj=1u𝑸iL(j)=i=1m𝑸iE+i=1m𝑸iL\displaystyle\sum_{i=1}^{m}{\sum_{j=1}^{u}\boldsymbol{Q}_{i}^{E(j)}}+\sum_{i=1}^{m}{\sum_{j=1}^{u}\boldsymbol{Q}_{i}^{L(j)}}=\sum_{i=1}^{m}\boldsymbol{Q}_{i}^{E}+\sum_{i=1}^{m}\boldsymbol{Q}_{i}^{L}

For a particular type tt (t[l]t\in[l]), the number of type tt clients entering or leaving in the configuration (B,kB,𝑸1E,𝑸1L,𝑸2E,𝑸2L,,𝑸mE,𝑸mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\boldsymbol{Q}_{2}^{E},\boldsymbol{Q}_{2}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) is equal to the number of type tt clients entering or leaving in the configurations (Bj,kBj,𝑸1E(j),𝑸1L(j),𝑸2E(j),𝑸2L(j),,𝑸mE(j),𝑸mL(j))(B_{j},k_{B_{j}},\boldsymbol{Q}_{1}^{E(j)},\boldsymbol{Q}_{1}^{L(j)},\boldsymbol{Q}_{2}^{E(j)},\boldsymbol{Q}_{2}^{L(j)},\ldots,\boldsymbol{Q}_{m}^{E(j)},\boldsymbol{Q}_{m}^{L(j)}). Thus, we have

i=1mj=1uqtE(ij)+i=1mj=1uqtL(ij)=i=1mqtE(i)+i=1mqtL(i)\displaystyle\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{E(ij)}}+\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{L(ij)}}=\sum_{i=1}^{m}q_{t}^{E(i)}+\sum_{i=1}^{m}q_{t}^{L(i)}

Since nqtL(ij)0-n\leq q_{t}^{L(ij)}\leq 0, nqtL(i)0-n\leq q_{t}^{L(i)}\leq 0, we have:

i=1mj=1uqtE(ij)i=1mj=1uqtL(ij)=i=1mqtE(i)i=1mqtL(i)\displaystyle\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{E(ij)}}-\sum_{i=1}^{m}{\sum_{j=1}^{u}\mid q_{t}^{L(ij)}\mid}=\sum_{i=1}^{m}q_{t}^{E(i)}-\sum_{i=1}^{m}\mid q_{t}^{L(i)}\mid (1)

Then, we have:

i=1mj=1uqtE(ij)+i=1mqtL(i)=i=1mi=1mqtE(i)+j=1uqtL(ij)\displaystyle\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{E(ij)}}+\sum_{i=1}^{m}\mid q_{t}^{L(i)}\mid=\sum_{i=1}^{m}{\sum_{i=1}^{m}q_{t}^{E(i)}+\sum_{j=1}^{u}\mid q_{t}^{L(ij)}\mid} (2)

Based on the constraints of kBjk_{B_{j}}, 𝑸iE(j)\boldsymbol{Q}_{i}^{E(j)}, 𝑸iL(j)\boldsymbol{Q}_{i}^{L(j)}, to find the minimum cost of the solutions in subproblem (B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) with table entry DP[B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}], we need to enumerate all consistent subproblems from its children B1,B2,,BuB_{1},B_{2},\ldots,B_{u}. Moreover, the cost DP[B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}] is also closely related to the cost between portals of B1,B2,,BuB_{1},B_{2},\ldots,B_{u} and the ones of BB, denoted by τ\tau.

We now give the general idea to calculate τ\tau. For each type t[l]t\in[l], a weighted bipartite graph Φt\Phi_{t} is constructed. The value of τ\tau is obtained based on perfect matching on each bipartite graph constructed. If we find the minimum cost perfect matching in Φt\Phi_{t}, it gives the optimal case for the type tt clients entering or leaving B1,,BuB_{1},\ldots,B_{u} and BB through portals of B1,,BuB_{1},\ldots,B_{u} and the ones in BB.

For each type t[l]t\in[l], a weighted bipartite graph Φt=(RS,E)\Phi_{t}=(R\cup S,E) can be constructed by the following way. Let R=XEFXLSR=X_{E}^{F}\cup X_{L}^{S}, and S=XESXLFS=X_{E}^{S}\cup X_{L}^{F}. For the block BB, and for each portal uu in BB, if there exist qq clients entering BB through uu, then add qq new vertices to XEFX_{E}^{F}. It is easy to see that XEF=i=1mqtE(i)\mid X_{E}^{F}\mid=\sum_{i=1}^{m}q_{t}^{E(i)}. Similarly, for each portal uu in BB, if there exist qq clients leaving BB through uu, then add qq new vertices to XLFX_{L}^{F}. Then, we have XLF=i=1mqtL(i)\mid X_{L}^{F}\mid=\sum_{i=1}^{m}\mid q_{t}^{L(i)}\mid. For each child BjB_{j} of BB, and for each portal uu of BjB_{j}, if there exist qq clients entering BjB_{j} through uu, then add qq new vertices to XESX_{E}^{S}. Then, we have XES=i=1mj=1uqtE(ij)\mid X_{E}^{S}\mid=\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{E(ij)}}. Similarly, for each child BjB_{j} of BB, and for each portal uu of BjB_{j}, if there exist qq clients leaving BjB_{j} through uu, then add qq new vertices to XLSX_{L}^{S}. Then, we have XLS=i=1mj=1uqtL(ij)\mid X_{L}^{S}\mid=\sum_{i=1}^{m}{\sum_{j=1}^{u}\mid q_{t}^{L(ij)}\mid}.

For each vertex vv in RSR\cup S, let P(v)P(v) be the portal in BB or BjB_{j} such that vertex vv is added by the clients entering or leaving block through the portal.

The set EE of edges in Φt\Phi_{t} can be constructed by the following methods.

  1. 1.

    Add all the edges in {(u,v)uXEF,vXES}\{(u,v)\mid u\in X_{E}^{F},v\in X_{E}^{S}\} to EE.

  2. 2.

    Add all the edges in {(u,v)uXLS,vXLF}\{(u,v)\mid u\in X_{L}^{S},v\in X_{L}^{F}\} to EE.

  3. 3.

    Add all edges in {(u,v)uXLS,vXES\{(u,v)\mid u\in X_{L}^{S},v\in X_{E}^{S}, where P(u)BiP(u)\in B_{i} and P(v)BjP(v)\in B_{j} and ij}i\neq j\} to EE.

  4. 4.

    For each edge (u,v)(u,v) in EE, the weight of edge (u,v)(u,v) is denoted as dist(P(u),P(v))dist(P(u),P(v)).

Claim 4.6.

Given a bipartite graph Φt=(RS,E)\Phi_{t}=(R\cup S,E) as described above, there must exist a perfect matching in Φt=(RS,E)\Phi_{t}=(R\cup S,E).

Proof.

Assume that there does not exist a perfect matching in Φt=(RS,E)\Phi_{t}=(R\cup S,E). Let MM be one of the maximum matchings in Φt\Phi_{t}. Since MM is not a perfect matching, there exists at least two unmatched vertices in Φt\Phi_{t}. By equation 1, we have R=S\mid R\mid=\mid S\mid. Thus, the number of unmatched vertices must be even. Assume that a,ba,b are two unmatched vertices in Φt\Phi_{t}, we have the following cases for the unmatched vertices aa and bb: Case 1, aXEF,bXESa\in X_{E}^{F},b\in X_{E}^{S} or aXLS,bXLFa\in X_{L}^{S},b\in X_{L}^{F}. According to the construction process of EE, there must exist an edge between aa and bb. Thus, based on MM, a matching MM^{\prime} with one more edge can be obtained, contradicting the fact that MM is a maximum matching in Φt\Phi_{t}.

Case 2, aXEF,bXLFa\in X_{E}^{F},b\in X_{L}^{F}. We have the following two subcases: 1) There is no edge between vertices in XLSX_{L}^{S} and vertices in XESX_{E}^{S}. Since XEF=XES,XLS=XLF\mid X_{E}^{F}\mid=\mid X_{E}^{S}\mid,\mid X_{L}^{S}\mid=\mid X_{L}^{F}\mid, there must exist two unmatched vertices aXESa^{\prime}\in X_{E}^{S} and bXLSb^{\prime}\in X_{L}^{S} such that an augmenting path from aa to aa^{\prime} and an augmenting path from bb to bb^{\prime} can be found. Thus, based on MM, a matching MM^{\prime} with larger size can be obtained, contradicting the fact that MM is a maximum matching in Φt\Phi_{t}. 2) There exist two vertices cXLS,dXESc\in X_{L}^{S},d\in X_{E}^{S} such that edge (c,d)(c,d) is in EE. According to the construction process of EE, edges (a,d)(a,d), (c,b)(c,b) are contained in EE. Thus, an augmenting path from aa to bb can be found, and a matching MM^{\prime} with larger size can be obtained, which is a contradiction.

Case 3, aXES,bXLSa\in X_{E}^{S},b\in X_{L}^{S}. We consider the following four cases. 1) There is no edge from aa to the vertices in XLSX_{L}^{S}, and there is no edge from bb to the vertices in XESX_{E}^{S}. Since XEF=XES,XLS=XLF\mid X_{E}^{F}\mid=\mid X_{E}^{S}\mid,\mid X_{L}^{S}\mid=\mid X_{L}^{F}\mid, there must exist two unmatched vertices aXEFa^{\prime}\in X_{E}^{F} and bXLFb^{\prime}\in X_{L}^{F} such that an augmenting path from aa to aa^{\prime} and an augmenting path from bb to bb^{\prime} can be found. Thus, based on MM, a matching MM^{\prime} with larger size can be obtained, contradicting tha fact that MM is a maximum matching in Φt\Phi_{t}. 2) There exists a vertex cc in XLSX_{L}^{S} such that edge (a,c)(a,c) is in EE. According to the construction process of EE, there exist a vertex dd in XLFX_{L}^{F} such that edges (c,d)(c,d), (d,b)(d,b) are in EE. Thus, an augmenting path from aa to bb can be found, and a matching MM^{\prime} with larger size can be obtained, which is a contradiction. 3) There exists a vertex cc in XESX_{E}^{S} such that edge (b,c)(b,c) is in EE. According to the construction process of EE, there exist a vertex dd in XEFX_{E}^{F} such that edges (c,d)(c,d), (d,a)(d,a) are in EE. Thus, an augmenting path from aa to bb can be found, and a matching MM^{\prime} with larger size can be obtained, which is a contradiction. 4) There exists a vertex cc in XLSX_{L}^{S} such that edge (a,c)(a,c) is in EE and there exists a vertex dd in XESX_{E}^{S} such that edge (b,d)(b,d) is in EE. Based on MM, it is easy to find an augmenting path from aa to bb to get a matching with larger size, which is a contradiction.

Therefore, there must exist a perfect matching in bipartite graph Φt=(RS,E)\Phi_{t}=(R\cup S,E). ∎

Let τt\tau_{t} be the sum weights of the minimum weighted perfect matching in Φt=(RS,E)\Phi_{t}=(R\cup S,E).

Claim 4.7.

The value of τ\tau is t=1lτt\sum_{t=1}^{l}\tau_{t}.

Proof.

By equation 1, we have

i=1mj=1uqtE(ij)i=1mqtE(i)=i=1mj=1uqtL(ij)i=1mqtL(i)\displaystyle\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{E(ij)}}-\sum_{i=1}^{m}q_{t}^{E(i)}=\sum_{i=1}^{m}{\sum_{j=1}^{u}\mid q_{t}^{L(ij)}\mid}-\sum_{i=1}^{m}\mid q_{t}^{L(i)}\mid (3)

We denote i=1mj=1uqtE(ij)i=1mqtE(i)\sum_{i=1}^{m}{\sum_{j=1}^{u}q_{t}^{E(ij)}}-\sum_{i=1}^{m}q_{t}^{E(i)} and i=1mj=1uqtL(ij)i=1mqtL(i)\sum_{i=1}^{m}{\sum_{j=1}^{u}\mid q_{t}^{L(ij)}\mid}-\sum_{i=1}^{m}\mid q_{t}^{L(i)}\mid as the remaining number of clients entering or leaving B1,,BuB_{1},\ldots,B_{u}, respectively. Then, the remaining number of clients entering B1,,BuB_{1},\ldots,B_{u} is equal to the remaining number of clients leaving B1,,BuB_{1},\ldots,B_{u}.

For each type t[l]t\in[l], a weighted bipartite graph Φt\Phi_{t} is constructed. Let HiH_{i} be the number of edges in perfect matching of Φt\Phi_{t}, and let H=i=1lHiH=\sum_{i=1}^{l}H_{i}. Since the number of clients entering or leaving in the configuration (B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) is equal to the number of clients entering or leaving in the configurations (Bj,kBj,𝑸1E(j),𝑸1L(j),,𝑸mE(j),𝑸mL(j))(B_{j},k_{B_{j}},\boldsymbol{Q}_{1}^{E(j)},\boldsymbol{Q}_{1}^{L(j)},\ldots,\boldsymbol{Q}_{m}^{E(j)},\boldsymbol{Q}_{m}^{L(j)}), we can get that the number of pair portals to get τ\tau value is equal to HH.

In each graph Φt\Phi_{t}, the perfect matching in Φt\Phi_{t} gives the relation between the portals of B1,,BuB_{1},\ldots,B_{u} and the ones in BB, where the clients of type tt entering or leaving through those portals. The minimum cost of the solutions in subproblem (B,kB,𝑸1E,𝑸1L,𝑸2E,𝑸2L,,𝑸mE,𝑸mL)(B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\boldsymbol{Q}_{2}^{E},\boldsymbol{Q}_{2}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}) with table entry DP[B,kB,𝑸1E,𝑸1L,𝑸2E,𝑸2L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\boldsymbol{Q}_{2}^{E},\boldsymbol{Q}_{2}^{L},\ldots,\\ \boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}] is based on the value of τ\tau, which is the cost between portals of B1,B2,,BuB_{1},B_{2},\ldots,B_{u} and the ones in BB. Thus, we can get that τt=1lτt\tau\leq\sum_{t=1}^{l}\tau_{t}. We now prove that the value of τ\tau cannot be less than t=1lτt\sum_{t=1}^{l}\tau_{t}. Assume that τ<t=1lτt\tau<\sum_{t=1}^{l}\tau_{t}. The value of τ\tau is the cost between portals of B1,B2,,BuB_{1},B_{2},\ldots,B_{u} and the ones in BB, and by the fair constraints, there are ll types of clients to deal with. Thus, the value of τ\tau can be divided into ll values, i.e., τ=τ1++τl\tau=\tau_{1}^{\prime}+\ldots+\tau_{l}^{\prime}. By assumption, τ<t=1lτt\tau<\sum_{t=1}^{l}\tau_{t}. Thus, there is a type tt such that τt<τt\tau_{t}^{\prime}<\tau_{t}. Let MM be the perfect matching in Φt\Phi_{t} to get the value τt\tau_{t}, and let SS be the set of pair portals used to get the value τt\tau_{t}^{\prime}. By the above discussion, the number of pair portals in SS is equal to the number of edges in MM. For each pair portal (p,p)(p,p^{\prime}) in SS, by the construction process of edge set EE, there exists a corresponding edge with weight dist(p,p)dist(p,p^{\prime}) in graph Φt\Phi_{t}. Moreover, let MM^{\prime} be the set of corresponding edges of the pair portals in SS. It is easy to get that MM^{\prime} is a perfect matching in Φt\Phi_{t} with weight smaller than MM, contradicting that MM is a minimum weighted perfect matching in Φt\Phi_{t}. Thus, τ=t=1lτt\tau=\sum_{t=1}^{l}\tau_{t}. ∎

Since the number of vertices and the number of edges in graph Φt\Phi_{t} are polynomial, we can find the minimum cost perfect matching of graph Φt=(RS,E)\Phi_{t}=(R\cup S,E) and compute the value of τ\tau in polynomial time (Kuhn (1955); Munkres (1957)). ∎

4.2.3 Assigning Clients to Facilities

Given an instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem, the above dynamic programming procedure obtains a set FF\subseteq\mathcal{F} of size at most kk which is a set of open facilities for the fair kk-median problem and the number of clients assigned to each open facility satisfying the fairness constraints. However, for each client c𝒞c\in\mathcal{C}, the dynamic programming procedure only ensures that the number of clients assigned to each open facility satisfies fair constraints, and does not obtain the assignment which maps each client c𝒞c\in\mathcal{C} to a facility fFf\in F. We apply the min-cost max-flow technique to obtain the assignment which maps each client c𝒞c\in\mathcal{C} to a facility fFf\in F for fair kk-median problem. Consider the solution obtained by the above dynamic programming, we obtain a set FF\subseteq\mathcal{F} of size at most kk which is a set of open facilities for fair kk-median problem and each fFf\in F is given some numbers to denote the number of each type clients assigned to it.

Given an open facilities set F={f1,f2,,fη}F=\{f_{1},f_{2},\ldots,f_{\eta}\} (where ηk\eta\leq k, FF\subseteq\mathcal{F}) and a vector 𝝀i\boldsymbol{\lambda}_{i} with ll values for each open facility in FF, where 𝝀i\boldsymbol{\lambda}_{i} (i[η]i\in[\eta]) is a vector within ll values, and each value λih\lambda_{i}^{h} (0λhin,h[l]0\leq\lambda_{h}^{i}\leq n,h\in[l]) in 𝝀i\boldsymbol{\lambda}_{i} denotes the number of the hh-th type clients that are assigned to fif_{i}, we construct a bipartite graph G=(UV,A)G=(U\cup V,A) with both capacities and weights on the edges and bb values on the vertices as follows. Let U={cjhh[l],j[𝒞h]}{s}U=\{c_{j}^{h}\mid h\in[l],j\in[\mid\mathcal{C}_{h}\mid]\}\cup\{s\}, V={fihi[η],h[l]}{t}V=\{f_{i}^{h}\mid i\in[\eta],h\in[l]\}\cup\{t\}, where ss denote the source point, tt denotes the sink point, cjhc_{j}^{h} denotes the jj-th client of the type hh clients (cjh𝒞hc_{j}^{h}\in\mathcal{C}_{h}), fihf_{i}^{h} denotes the ii-th facility fiFf_{i}\in F and we label this facility as type hh (h[l]h\in[l]).

The edges in GG can be constructed as follows. For each fixed h[l]h\in[l], and for each vertex cjhUc_{j}^{h}\in U and fihVf_{i}^{h}\in V (where j[𝒞h],i[η]j\in[\mid\mathcal{C}_{h}\mid],i\in[\eta]), add a directed edge ee from cjhc_{j}^{h} to fihf_{i}^{h} in GG. The capacity of the edge is 1 and the cost of the edge is dist(cjh,fi)dist(c_{j}^{h},f_{i}). Note that parallel edges may exit (parallel edges are kept in GG). Furthermore, for each vertex cjhU{s}c_{j}^{h}\in U-\{s\}, add a directed edge from ss to cjhc_{j}^{h} with capacity 0 and cost 0. For each vertex fihVf_{i}^{h}\in V, add a directed edge from fihf_{i}^{h} to tt with capacity 0 and cost 0.

Finally, we set bb valuevalue for each vertex in bipartite graph G=(UV,A)G=(U\cup V,A). For each vertex vGv\in G, let b(v)b(v) denote the value of each vertex. Then, positive b(v)b(v) means that the vertex vv supplies b(v)b(v) units of flow, and negative b(v)b(v) means that the vertex vv demands b(v)b(v) units of flow. For each vertex cjhUc_{j}^{h}\in U, set b(cjh)=1b(c_{j}^{h})=1. For each vertex fihVf_{i}^{h}\in V, set b(fih)=λihb(f_{i}^{h})=-\lambda_{i}^{h}. Since i=1ηh=1lb(fih)=𝒞\sum\limits_{i=1}^{\eta}{\sum\limits_{h=1}^{l}b(f_{i}^{h})}=-\mid\mathcal{C}\mid and h=1lj=1𝒞hb(cjh)=𝒞\sum\limits_{h=1}^{l}{\sum\limits_{j=1}^{\mid\mathcal{C}_{h}\mid}b(c_{j}^{h})}=\mid\mathcal{C}\mid , we have vGb(v)=0\sum\limits_{v\in G}{b(v)}=0. Then, we run a min-cost max-flow algorithm on the bipartite graph G=(UV,A)G=(U\cup V,A). For the min-cost max-flow found in graph GG, and for each edge (cjh,fih)(c_{j}^{h},f_{i}^{h}) in GG, if edge (cjh,fih)(c_{j}^{h},f_{i}^{h}) has a flow of 1, then we assign the client cjhc_{j}^{h} to facility fif_{i}. Thus, by applying the min-cost max-flow method in graph GG, we obtain the assignment which maps each client c𝒞c\in\mathcal{C} to a facility fFf\in F in polynomial time.

4.2.4 Analysis and Running Time

We show that our algorithm outputs a solution of cost at most (1+ϵ)(1+\epsilon) times the cost of the optimal solution and it gives a QPTAS for the fair kk-median problem in doubling metrics. Let OPTOPT denote one of the optimal solution of the fair kk-median problem in doubling metrics, and let cost(OPT)cost(OPT) denote the cost of this optimal solution.

Theorem 4.8.

Given an instance (X,dist,𝒞,,k,l,𝛂,𝛃)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem in doubling metrics, there exists an algorithm such that a solution of cost at most (1+ϵ)cost(OPT)(1+\epsilon)cost(OPT) can be obtained, and the running time is at most O~(n(lognϵ)O(d)l)\tilde{O}(n^{{(\frac{\log n}{\epsilon})}^{O(d)}l}).

Proof.

Let 𝒢\mathcal{G} be the solution obtained by the dynamic programming and clients assignment procedure. For each client c𝒞c\in\mathcal{C}, we say that f𝒢f\in\mathcal{G} serves a client cc if 𝒢(c)=f\mathcal{G}(c)=f. By lemma 4.2, and for any pair of points uu and vv in the metrics, the probability that uu and vv are divided into different blocks at level ii is at most O(d)dist(u,v)2iO(d)\cdot\frac{dist(u,v)}{2^{i}}. By lemma 4.4, for any two points uu and vv at level ii, the existence of portal set in each block of split-tree incurs an additional cost of O(ρ2i+1)O(\rho 2^{i+1}), and this is incurred with probability at most O(d)dist(u,v)2iO(d)\cdot\frac{dist(u,v)}{2^{i}}. Then, the total additional expected cost for the path between uu and vv is at most

i=0O(d)(dist(u,v)2i)O(ρ2i+1)=i=0O(ρddist(u,v))=(+1)(O(dρdist(u,v))).\displaystyle\sum\limits_{i=0}^{\ell}O(d)(\frac{dist(u,v)}{2^{i}})\cdot O(\rho 2^{i+1})=\sum\limits_{i=0}^{\ell}O(\rho d\cdot dist(u,v))=(\ell+1)(O(d\rho dist(u,v))). (4)

Let ρ=O(ϵ/dlogn)\rho=O(\epsilon/d\log n), the total additional expected cost for the path between uu and vv is at most ϵdist(u,v)\epsilon dist(u,v). Consider a pair of points cc and OPT(c)OPT(c) (OPT(c)OPT(c) denotes the facility that serves cc), where c𝒞c\in\mathcal{C} is assigned to OPT(c)OPT(c) in the optimal solution OPTOPT. By equation 4, we have that total additional expected cost for the path between cc and OPT(c)OPT(c) is at most ϵdist(c,OPT(c))\epsilon dist(c,OPT(c)). Namely, the total additional expected cost of solution OPTOPT is ϵcost(OPT)\epsilon cost(OPT). Let costT(OPT)cost_{T}(OPT) be the total cost of solution OPTOPT based on split-tree TT. Then, we have that costT(OPT)(1+ϵ)cost(OPT)cost_{T}(OPT)\leq(1+\epsilon)cost(OPT). Then, we have that costT(𝒢)costT(OPT)(1+ϵ)cost(OPT)cost_{T}(\mathcal{G})\leq cost_{T}(OPT)\leq(1+\epsilon)cost(OPT). Thus, the total cost of the solution obtained is no more than (1+ϵ)cost(OPT)(1+\epsilon)cost(OPT).

The running time of our algorithm contains the following three parts: constructing the split-tree TT, executing the dynamic programming procedure, and executing the min-cost max-flow algorithm. Firstly, the split-tree decomposition within portal set can be found in time (1/ρ)O(d)nlogΔ(1/\rho)^{O(d)}n\log\Delta (where Δ=O(n4/ϵ)\Delta=O(n^{4}/\epsilon)).

We now analyze the running time of the dynamic programming process. The running time of the dynamic programming process is related to the number of entries in the dynamic programming table, and the time to calculate the value of each table entry. For a table entry DP[B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}] in the dynamic programming table, we analyze the values of the parameters in table entry. For the given split-tree TT, the total number of levels in the tree is O(logn)O(\log n), and there are nn leaves in TT. The number of distinct value of BB is equal to the number of nodes in the split-tree TT, which is at most O(nlogn)O(n\cdot\log n). Moreover, the value of kBk_{B} is a non-negative integer bounded by nn, and the parameters of qtE(i)q_{t}^{E(i)} and qtL(i)q_{t}^{L(i)} (t[l]t\in[l]) in 𝑸iE\boldsymbol{Q}_{i}^{E} and 𝑸iL\boldsymbol{Q}_{i}^{L} (i[m]i\in[m]), which are non-negative integers bounded by nn. Thus, the total number of entries in our dynamic programming table is at most O(n2ml+2logn)O(n^{2ml+2}\log n).

We now analyze the time to calculate the value of each table entry. For the table entry DP[B,kB,𝑸1E,𝑸1L,,𝑸mE,𝑸mL]DP[B,k_{B},\boldsymbol{Q}_{1}^{E},\boldsymbol{Q}_{1}^{L},\\ \ldots,\boldsymbol{Q}_{m}^{E},\boldsymbol{Q}_{m}^{L}], we consider all possibilities based on the values of variables kBj,𝑸iE(j),𝑸iL(j)k_{B_{j}},\boldsymbol{Q}_{i}^{E(j)},\boldsymbol{Q}_{i}^{L(j)} (i[m],j[u]i\in[m],j\in[u]). Since each of these 2mlu+u2mlu+u variables are non-negative integers bounded by nn, there are O(n2mlu+u)O(n^{2mlu+u}) cases to try all possible values of those parameters to satisfy certain constraints. In the process of dynamic programming procedure, the value of τ\tau is calculated, and it is closely related to the minimum weighted perfect matching in ll bipartite graphs, which can be solved in polynomial time (Kuhn (1955); Munkres (1957)). Specifically, for any bipartite graph Φt=(RS,E)\Phi_{t}=(R\cup S,E), the number of vertices in RR (where R=S\mid R\mid=\mid S\mid) is at most nm(u+1)nm\cdot(u+1). Thus, the time to calculate the minimum weighted perfect matching of Φt\Phi_{t} is bounded by O((nm(u+1))3)O({(nm\cdot(u+1))}^{3}), and the time to calculate the value of τ\tau is O((nm(u+1))3l)O({(nm\cdot(u+1))}^{3}l).

Therefore, the time to calculate the value of each table entry in the dynamic programming process is at most O(n2mlu+u(O(1)+O((nm(u+1))3l)))O(n^{2mlu+u}\cdot(O(1)+O({(nm\cdot(u+1))}^{3}l))) (which can be simplified to O(n2mlu+u+3m3u3l)O(n^{2mlu+u+3}m^{3}u^{3}l). As the total number of entries in the table is bounded by O(n2ml+2logn)O(n^{2ml+2}\log n), and the number of portals in a portal set ρ2i+1\rho 2^{i+1}-net of a block at level ii is m=O(dlognϵ)dm=O(d\frac{\log n}{\epsilon})^{d}, the running time of the dynamic programming process is bounded by O~(n(dlognϵ)O(d)l)\tilde{O}(n^{{(d\frac{\log n}{\epsilon})}^{O(d)}l}). Moreover, for the set FF of opened facilities obtained by dynamic programming process, the clients can be assigned to FF by applying the min-cost max-flow method in poly(n)poly(n) time.

For the instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem in doubling metrics, a solution of cost at most (1+ϵ)cost(OPT)(1+\epsilon)cost(OPT) can be obtained in time O~(n(lognϵ)O(d)l)\tilde{O}(n^{{(\frac{\log n}{\epsilon})}^{O(d)}l}). ∎

5 An O(logk)O(\log k)-Approximation for Fair kk-median Problem General Metrics

In this section, to solve the fair kk-median problem in general metrics, we first deal with the points in XX by a general preprocessing step, then construct a Hierarchically Separated Trees (HST) to embed metric space into tree metrics, finally apply a dynamic programming procedure to get the solution.

5.1 Hierarchically Separated Trees

Given an instance (X,dist,𝒞,,k,l,α,β)(X,dist,\mathcal{C},\mathcal{F},k,l,\alpha,\beta) of the fair kk-median problem, in order to reduce the size of input, we deal with the points in XX by the following preprocessing steps and we use OPTOPT to denote an optimal solution of fair kk-median problem. Find an O(1)O(1)-approximation feasible solution LL of the fair kk-median problem (Bercea et al. (2019)), and let 𝒞(L)\mathcal{C}(L) be the kk centers of the feasible solution obtained. Then, for each point xXx\in X, remove xx, copy a point xx^{\prime} of xx and add xx^{\prime} at 𝒞x\mathcal{C}_{x} where 𝒞x\mathcal{C}_{x} is the nearest center in 𝒞(L)\mathcal{C}(L) to xx. In the new instance obtained, we reduce the size of input locations from nn to kk with constant factor distortion. Thus, the cost of any solution for this new instance compared with the original instance is increased by at most O(1)cost(OPT)O(1)cost(OPT).

A decomposition of the metric space (X,dist)(X,dist) is a partitioning of XX into subsets, which are called blocks. A hierarchically decomposition of the metric (X,dist)(X,dist) is a sequence of δ+1\delta+1 decompositions D0,D1,,DδD_{0},D_{1},\ldots,D_{\delta} such that every block of Di+1D_{i+1} is the union of blocks in DiD_{i} and Dδ={X}D_{\delta}=\{X\} and D0={xxX}D_{0}=\{{x}\mid x\in X\}. The blocks of DiD_{i} are called the level ii blocks. A hierarchically separated tree (HST) of the metric space is a complete hierarchically decomposition: the root node is the level δ\delta block Dδ={X}D_{\delta}=\{X\} and the leaves are the singletons such that D0={xxX}D_{0}=\{{x}\mid x\in X\}. Given a metric space (X,dist)(X,dist), let Δ=2δ\Delta=2^{\delta} be the diameter of the metrics. Starting from Dδ={X}D_{\delta}=\{X\}, we now discuss the relation between the blocks in i+1i+1-th level and ii-th level, where 0iδ10\leq i\leq\delta-1. Let ri=2iϱr_{i}=2^{i}\varrho where 12ϱ<1\frac{1}{2}\leq\varrho<1, and let π\pi be a random ordering of the points in XX. For each point xXx\in X, let π(x)\pi(x) be the order of xx in π\pi. For each block SS at Di+1D_{i+1}, let {y1,,yh}\{y_{1},\ldots,y_{h}\} be the set of points in SS with order from left to right in π\pi. For the point y1y_{1}, a new block Sy1S^{y_{1}} can be obtained at level ii such that Sy1={xXdist(x,y1)ri}S^{y_{1}}=\{x\in X\mid dist(x,y_{1})\leq r_{i}\}. For any point yjy_{j} (1<jh)(1<j\leq h), a new block SyjS^{y_{j}} can be obtained at level ii by the following way: let Z={xXdist(x,yj)ri}Z=\{x\in X\mid dist(x,y_{j})\leq r_{i}\}. For each point xx in ZZ, if xx is not contained in any blocks {Sy1,,Syj1}\{S^{y_{1}},\ldots,S^{y_{j-1}}\}, then xx is contained in SyjS^{y_{j}}.

Lemma 5.1 (Fakcharoenphol et al. (2004)).

Given a HST of metric space (X,dist)(X,dist) with a sequence of decompositions D0,D1,,DδD_{0},D_{1},\ldots,D_{\delta}, the HST decomposition has the following properties:

  1. 1.

    The number of level δ\delta is O(logn)O(\log n) (since the aspect ratio of the input metrics XX is O(n4/ϵ)O(n^{4}/\epsilon)).

  2. 2.

    Each block of level ii has diameter at most 2i+12^{i+1}, namely the maximum distance between any pair of points in a block of level ii is at most 2i+12^{i+1}.

  3. 3.

    For each block SS of level ii, there are some edges between the block and its children at level i1i-1, the length from a block SS to each of its children at level i1i-1 is equal to 2i2^{i} (which is also the upper bound of the radius of SS).

  4. 4.

    For any pair of points u,vu,v must be separated at level ii, where i=(logdist(u,v)1)i=(\lfloor\log dist(u,v)\rfloor-1).

A general metrics can be probabilistically embedded into HST with a distortion of O(log𝒳)O(\log\mid\mathcal{X}\mid) where 𝒳\mid\mathcal{X}\mid is the size of input locations and equals to kk from the above preprocessing step and this embedding can be constructed in polynomial time (Fakcharoenphol et al. (2004)). We now define the distance function in HST. For any two blocks Si,SjS_{i},S_{j} in HST TT, and for any two points uSiu\in S_{i}, vSjv\in S_{j}, the distance between uu and vv in TT is denoted as the length of the shortest path distance in TT between block SiS_{i} and block SjS_{j}.

5.2 Algorithm for Fair kk-median Problem in General Metrics

In this section, we give the algorithm to solve the fair kk-median problem in general metrics.

5.2.1 Dynamic Programming States

Given a HST TT, the dynamic programming proceeds on TT from the leaves to the root. For any block SS of HST TT, let TBT_{B} be the subtree rooted at block BB, and let (S)\ell(S) be the number of children of block SS, and assume that the set of children of block SS is {S1,,S(S)}\{S_{1},\ldots,S_{\ell(S)}\}. For 1i(S)1\leq i\leq\ell(S), let TS,iT_{S,i} denote the subtree induced by the blocks in {S}TS1TS2TSi\{S\}\cup T_{S_{1}}\cup T_{S_{2}}\cup\ldots\cup T_{S_{i}}, and each subtree TS,iT_{S,i} is a subproblem in our dynamic programming process, which includes the information that how many clients of each type entering or leaving TS,iT_{S,i}.

A table entry in the dynamic programming process is a tuple DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}], where the parameters are defined as follows:

  1. 1.

    SS is a block of HST TT.

  2. 2.

    kSk_{S} is the number of facilities that are opened in the subtree TS,iT_{S,i}.

  3. 3.

    ii is an integer indicating how many blocks in children are used to get subtree TS,iT_{S,i}.

  4. 4.

    𝑸Senter\boldsymbol{Q}_{S}^{enter} is a vector with ll values, and qtEq_{t}^{E} (0qtEn0\leq q_{t}^{E}\leq n) in 𝑸Senter\boldsymbol{Q}_{S}^{enter} denotes the number of type tt clients inside the subtree TSTS,iT_{S}\setminus T_{S,i} that are served by facilities located in TS,iT_{S,i}.

  5. 5.

    𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter} is a vector with ll values, and qtE(out)q_{t}^{E(out)} (0qtE(out)n0\leq q_{t}^{E(out)}\leq n) in 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter} denotes the number of type tt clients inside the subtree TTST\setminus T_{S} that are served by facilities located in TS,iT_{S,i}.

  6. 6.

    𝑸Sleave\boldsymbol{Q}_{S}^{leave} is a vector with ll values, and qtLq_{t}^{L} (nqtL0-n\leq q_{t}^{L}\leq 0) in 𝑸Sleave\boldsymbol{Q}_{S}^{leave} denotes the number of type tt clients in the subtree TS,iT_{S,i} that are served by facilities located inside the subtree TSTS,iT_{S}\setminus T_{S,i}.

  7. 7.

    𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave} is a vector with ll values, and qtL(out)q_{t}^{L(out)} (nqtL(out)0-n\leq q_{t}^{L(out)}\leq 0) in 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave} denotes the number of type tt clients in the subtree TS,iT_{S,i} that are served by facilities located inside the subtree TTST\setminus T_{S}.

The solutions at the root RR of HST TT are in the table entry DP[R,k,(R),𝟎,𝟎,𝟎,𝟎]DP[R,k,\ell(R),\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\boldsymbol{0}], where 𝟎\boldsymbol{0} is a vector with ll zero components. Among all these solutions in DP[R,k,(R),𝟎,𝟎,𝟎,𝟎]DP[R,k,\ell(R),\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\boldsymbol{0}], the algorithm outputs the one with the minimum cost.

The base case of the dynamic programming consist of leaves of the HST where at most one facility can be opened at a given location. Let TS,0T_{S,0} denote the simple tree within the singleton {S}\{S\}. Since there is only one facility or client in each leaf of HST, we consider the following three cases for the table entry of leaf node SS:

  1. 1.

    For a block SS at level 0 of HST TT, there is only a facility in the block and this facility is opened. The table entry of the subproblem at SS is DP[S,1,0,𝟎,𝑸Soutenter,𝟎,𝟎]=0DP[S,1,0,\boldsymbol{0},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{0},\boldsymbol{0}]=0. In addition, the tt-th value in 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter}, denoted as qtq_{t}, is the number of type tt clients inside the subtree TTST\setminus T_{S} that are assigned to the facility located in TST_{S}. Thus, in order to satisfy the fair constraints, for each t[l]t\in[l], αtqtt=1lqtβt\alpha_{t}\leq\frac{q_{t}}{\sum_{t=1}^{l}q_{t}}\leq\beta_{t}.

  2. 2.

    For a block SS at level 0 of HST TT, there is only a facility in the block but this facility is not opened. The table entry of the subproblem at SS is DP[S,0,0,𝟎,𝟎,𝟎,𝟎]=0DP[S,0,0,\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\boldsymbol{0}]=0.

  3. 3.

    For a block SS at level 0 of HST TT, there is only a client in the block. The table entry of the subproblem at SS is DP[S,0,0,𝟎,𝟎,𝟎,𝑸Soutleave]=0DP[S,0,0,\boldsymbol{0},\boldsymbol{0},\boldsymbol{0},\boldsymbol{Q}_{Sout}^{leave}]=0. Since this client must be a client of type tt (t[l]t\in[l]), the tt-th value in 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave} will be -1.

5.2.2 Computing Table Entries

Assume that we consider the subproblem on the subtree TS,iT_{S,i} where SS is a block at level jj of HST TT. Let (S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave)(S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}) be the configuration of TS,iT_{S,i} with table entry DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\\ \boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}], where 1i(v)1\leq i\leq\ell(v). To find the minimum cost of DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}], we enumerate all consistent subproblems corresponding to TSiT_{S_{i}} and TS,i1T_{S,i-1} (if i2i\geq 2). Moreover, the value of the table entries corresponding to TSiT_{S_{i}} and TS,i1T_{S,i-1} can be extracted from the dynamic programming table. The recursive expression of the dynamic programming procedure has the following two cases.

Case 1. i=1i=1. In this case, we consider the subtree TS,1T_{S,1} of TT induced by the blocks in TS1ST_{S_{1}}\cup{S}. As S1S_{1} is the first node in the total order of the children of SS, the assignments of the subproblem TS,1T_{S,1} is the same as the assignments of TS1T_{S_{1}}, and the facilities and clients in any feasible solution to the subproblem TS,1T_{S,1} must come from TS1T_{S_{1}}. Thus, we have

DP[S,kS,1,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]=DP[S1,kS1,(S1),𝟎,𝑸Soutenter,𝟎,𝑸Soutleave]+\displaystyle DP[S,k_{S},1,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}]=DP[S_{1},k_{S_{1}},\ell(S_{1}),\boldsymbol{0},\boldsymbol{Q}_{Sout}^{enter*},\boldsymbol{0},\boldsymbol{Q}_{Sout}^{leave*}]+
distT(S,S1)(t=1lqtE(out)+t=1lqtL(out))\displaystyle dist_{T}(S,S_{1})\cdot(\sum_{t=1}^{l}q_{t}^{E(out)*}+\sum_{t=1}^{l}\mid q_{t}^{L(out)*}\mid)
withkS1=kS,𝑸Soutenter=𝑸Senter+𝑸Soutenter,𝑸Soutleave=𝑸Sleave+𝑸Soutleave.\displaystyle with\ k_{S_{1}}=k_{S},\boldsymbol{Q}_{Sout}^{enter*}=\boldsymbol{Q}_{S}^{enter}+\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{Sout}^{leave*}=\boldsymbol{Q}_{S}^{leave}+\boldsymbol{Q}_{Sout}^{leave}.

where kS1k_{S_{1}} is the number of open facilities in the subtree TS1T_{S_{1}} satisfying the constraint kS1=kSk_{S_{1}}=k_{S}. 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter*} denotes the number of each type clients inside the subtree TTS1T\setminus T_{S_{1}} that are served by facilities inside TS1T_{S_{1}} and qtE(out)q_{t}^{E(out)*} in 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter*} denotes the number of type tt (t[l]t\in[l]) clients inside the subtree TTS1T\setminus T_{S_{1}} that are served by facilities inside TS1T_{S_{1}}, 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave*} denotes the number of each type clients inside TS1T_{S_{1}} that are served by facilities inside the subtree TTS1T\setminus T_{S_{1}} and qtL(out)q_{t}^{L(out)*} in 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave*} denotes the number of type tt (t[l]t\in[l]) clients inside TS1T_{S_{1}} that are served by facilities inside the subtree TTS1T\setminus T_{S_{1}}. distT(S,S1)dist_{T}(S,S_{1}) denotes the length of the edge (S,S1)(S,S_{1}). Moreover, the values of 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter*} and 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave*} satisfy the following constraints.

  1. 1.

    (1) 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter*}=𝑸Senter\boldsymbol{Q}_{S}^{enter}+𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter}. From the definition of 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter*} above, the set of clients to get value 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter*} are either in the subtree TSTS1T_{S}\setminus T_{S_{1}} or in the subtree TTST\setminus T_{S}. From the table entry, the number of clients in the subtree TSTS1T_{S}\setminus T_{S_{1}} that are served by the facilities in TS,1T_{S,1} is in 𝑸Senter\boldsymbol{Q}_{S}^{enter}, and the number of clients in the subtree TTST\setminus T_{S} that are served by the facilities in TS,1T_{S,1} is in 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter}.

  2. 2.

    (2) 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave*}=𝑸Sleave\boldsymbol{Q}_{S}^{leave}+𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave}. From the definition of 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave*}, the set of facilities to get value 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave*} are either in the subtree TSTS1T_{S}\setminus T_{S_{1}} or in the subtree TTST\setminus T_{S}. From the table entry, the number of clients in TS1T_{S_{1}} that are served by the facilities in the subtree TSTS1T_{S}\setminus T_{S_{1}} is in 𝑸Sleave\boldsymbol{Q}_{S}^{leave}, and the number of clients in TS1T_{S_{1}} that are served by the facilities in the subtree TTST\setminus T_{S} is in 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave}.

Case 2. 2i(S)2\leq i\leq\ell(S). In this case, the value of table entry on the subtree TS,iT_{S,i} is obtained from the subproblems on the subtrees TS,i1T_{S,i-1} and TSiT_{S_{i}}, by considering all various possible values of the parameters kS,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleavek_{S},\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}. Thus, we have

DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]=min{DP[S,kS,i1,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]\displaystyle DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}]=\min\{DP[S,k_{S}^{\prime},i-1,\boldsymbol{Q}_{S}^{enter^{\prime}},\boldsymbol{Q}_{Sout}^{enter^{\prime}},\boldsymbol{Q}_{S}^{leave^{\prime}},\boldsymbol{Q}_{Sout}^{leave^{\prime}}]
+DP[Si,kSi,(Si),𝟎,𝑸Sioutenter,𝟎,𝑸Sioutleave]+distT(S,Si)(t=1lqtE(out)+t=1lqtL(out))\displaystyle+DP[S_{i},k_{S_{i}},\ell(S_{i}),\boldsymbol{0},\boldsymbol{Q}_{S_{i}out}^{enter*},\boldsymbol{0},\boldsymbol{Q}_{S_{i}out}^{leave*}]+dist_{T}(S,S_{i})\cdot(\sum_{t=1}^{l}q_{t}^{E(out)*}+\sum_{t=1}^{l}\mid q_{t}^{L(out)*}\mid)
withkS+kSi=kS,𝑸Senter+𝑸Soutenter+𝑸Sleave+𝑸Soutleave+𝑸Sioutenter+𝑸Sioutleave\displaystyle\ with\ k_{S}^{\prime}+k_{S_{i}}=k_{S},\boldsymbol{Q}_{S}^{enter^{\prime}}+\boldsymbol{Q}_{Sout}^{enter^{\prime}}+\boldsymbol{Q}_{S}^{leave^{\prime}}+\boldsymbol{Q}_{Sout}^{leave^{\prime}}+\boldsymbol{Q}_{S_{i}out}^{enter*}+\boldsymbol{Q}_{S_{i}out}^{leave*}
=𝑸Senter+𝑸Soutenter+𝑸Sleave+𝑸Soutleave}\displaystyle=\boldsymbol{Q}_{S}^{enter}+\boldsymbol{Q}_{Sout}^{enter}+\boldsymbol{Q}_{S}^{leave}+\boldsymbol{Q}_{Sout}^{leave}\}

where 𝑸Senter\boldsymbol{Q}_{S}^{enter^{\prime}} denotes the number of each type clients inside the subtree TSTS,i1T_{S}\setminus T_{S,i-1} that are served by facilities inside TS,i1T_{S,i-1} and 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter^{\prime}} denotes the number of each type clients inside the subtree TTST\setminus T_{S} that are served by facilities inside TS,i1T_{S,i-1}, 𝑸Sleave\boldsymbol{Q}_{S}^{leave^{\prime}} denotes the number of each type clients inside the subtree TS,i1T_{S,i-1} that are assigned to facilities inside the subtree TSTS,i1T_{S}\setminus T_{S,i-1} and 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave^{\prime}} denotes the number of each type clients inside the subtree TS,i1T_{S,i-1} that are assigned to facilities inside the subtree TTST\setminus T_{S}. Moreover, 𝑸Sioutenter\boldsymbol{Q}_{S_{i}out}^{enter*} denotes the number of each type clients inside the subtree TTSiT\setminus T_{S_{i}} that served by facilities inside TSiT_{S_{i}}, qtE(out)q_{t}^{E(out)*} in 𝑸Sioutenter\boldsymbol{Q}_{S_{i}out}^{enter*} denotes the number of type tt (t[l]t\in[l]) clients inside the subtree TTSiT\setminus T_{S_{i}} that are served by facilities inside TSiT_{S_{i}}, 𝑸Sioutleave\boldsymbol{Q}_{S_{i}out}^{leave*} denotes the number of each type clients inside TSiT_{S_{i}} that are served by facilities inside the subtree TTSiT\setminus T_{S_{i}}, qtL(out)q_{t}^{L(out)*} in 𝑸Sioutleave\boldsymbol{Q}_{S_{i}out}^{leave*} denotes the number of type tt (t[l]t\in[l]) clients inside TSiT_{S_{i}} that are served by facilities inside the subtree TTSiT\setminus T_{S_{i}}, and distT(S,Si)dist_{T}(S,S_{i}) denotes the length of the edge (S,Si)(S,S_{i}). The values of 𝑸Senter\boldsymbol{Q}_{S}^{enter^{\prime}}, 𝑸Soutenter\boldsymbol{Q}_{Sout}^{enter^{\prime}}, 𝑸Sleave\boldsymbol{Q}_{S}^{leave^{\prime}}, 𝑸Soutleave\boldsymbol{Q}_{Sout}^{leave^{\prime}} and 𝑸Sioutenter\boldsymbol{Q}_{S_{i}out}^{enter*}, 𝑸Sioutleave\boldsymbol{Q}_{S_{i}out}^{leave*} should satisfy the following consistent constraints:

  1. 1.

    (1) kS+kSi=kSk_{S}^{\prime}+k_{S_{i}}=k_{S}. kSk_{S}^{\prime} is the number of open facilities in the subtree TS,i1T_{S,i-1} and kSik_{S_{i}} is the number of open facilities in the subtree TSiT_{S_{i}} satisfying the constraint kS+kSi=kSk_{S}^{\prime}+k_{S_{i}}=k_{S}.

  2. 2.

    (2) 𝑸Senter+𝑸Soutenter+𝑸Sleave+𝑸Soutleave+𝑸Sioutenter+𝑸Sioutleave=𝑸Senter+𝑸Soutenter+𝑸Sleave+𝑸Soutleave\boldsymbol{Q}_{S}^{enter^{\prime}}+\boldsymbol{Q}_{Sout}^{enter^{\prime}}+\boldsymbol{Q}_{S}^{leave^{\prime}}+\boldsymbol{Q}_{Sout}^{leave^{\prime}}+\boldsymbol{Q}_{S_{i}out}^{enter*}+\boldsymbol{Q}_{S_{i}out}^{leave*}=\boldsymbol{Q}_{S}^{enter}+\boldsymbol{Q}_{Sout}^{enter}+\boldsymbol{Q}_{S}^{leave}+\boldsymbol{Q}_{Sout}^{leave}. The subproblem on the subtree TS,iT_{S,i} corresponding to DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}] is consistent with subproblems on the subtrees TS,i1T_{S,i-1} and TSiT_{S_{i}} corresponding to DP[S,kS,i1,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S}^{\prime},i-1,\boldsymbol{Q}_{S}^{enter^{\prime}},\boldsymbol{Q}_{Sout}^{enter^{\prime}},\boldsymbol{Q}_{S}^{leave^{\prime}},\\ \boldsymbol{Q}_{Sout}^{leave^{\prime}}] and DP[Si,kSi,(Si),𝟎,𝑸Sioutenter,𝟎,𝑸Sioutleave]DP[S_{i},k_{S_{i}},\ell(S_{i}),\boldsymbol{0},\boldsymbol{Q}_{S_{i}out}^{enter*},\boldsymbol{0},\boldsymbol{Q}_{S_{i}out}^{leave*}].

Finally, based on the solution obtained by the dynamic programming, we can apply min-cost max-flow technique same as the one in doubling metrics to guarantee the assignment function for fair kk-median problem in general metrics.

5.2.3 Analysis and Running time

In this section, we show that our algorithm outputs a solution of cost at most O(logk)O(\log k) times the cost of the optimal solution in polynomial time. Let OPTOPT denote a optimal solution of the fair kk-median problem in general metrics, and let cost(OPT)cost(OPT) denote the cost of this optimal solution.

Theorem 5.2.

Given an instance (X,dist,𝒞,,k,l,𝛂,𝛃)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem in general metrics, there exists an algorithm such that a solution of cost at most O(logk)cost(OPT)O(\log k)cost(OPT) can be obtained, and the running time is at most nO(l)n^{O(l)}.

Proof.

Let 𝒢\mathcal{G} be the solution obtained by the dynamic programming process and clients assignment procedure. For each client c𝒞c\in\mathcal{C}, we say that f𝒢f\in\mathcal{G} serves the client cc if 𝒢(c)=f\mathcal{G}(c)=f. By lemma 5.1, for any pair of points u,vu,v, we have that uu and vv must be separated at level ii where i=(logdist(u,v)1)i=(\lfloor\log dist(u,v)\rfloor-1). Then, the total expected cost for the path between uu and vv on the HST is at most

distT(u,v)=2j=0i2i=2(2i+11)=22(logdist(u,v))22dist(u,v)\displaystyle dist_{T}(u,v)=2\sum_{j=0}^{i}2^{i}=2(2^{i+1}-1)=2\cdot 2^{(\lfloor\log dist(u,v)\rfloor)}-2\leq 2dist(u,v) (5)

Consider a pair of points cc and OPT(c)OPT(c) where c𝒞c\in\mathcal{C} is assigned to OPT(c)OPT(c) (OPT(c)OPT(c) denotes the facility that serves cc) in the optimal solution OPTOPT. By equation 5, we have that the total expected cost for the path between cc and OPT(c)OPT(c) is at most distT(c,OPT(c))2dist(c,OPT(c)dist_{T}(c,OPT(c))\leq 2dist(c,OPT(c). Namely, the total cost of solution OPTOPT on the tree metrics is at most costT(OPT)2cost(OPT)cost_{T}(OPT)\leq 2cost(OPT). Then, we have that costT(𝒢)costT(OPT)2cost(OPT)cost_{T}(\mathcal{G})\leq cost_{T}(OPT)\leq 2cost(OPT). Moreover, we have that a metric can be probabilistically embedded into HST with a distortion of O(logk)O(\log k). Thus, the total cost of the solution 𝒢\mathcal{G} obtained by the above dynamic programming process is no more than O(logk)cost(OPT)O(\log k)cost(OPT)

The running time of our algorithm is the sum of three main parts: constructing the HST TT, executing the dynamic programming procedure and executing the min-cost max-flow algorithm. Firstly, as described above, we have that a HST can be constructed in polynomial time.

For a table entry DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}] in the dynamic programming table, we analyze the the value of the parameters in this table entry. For the given HST TT, the total number of levels is O(logn)O(\log n) and there are nn leaves in TT. The number of distinct value of SS is equal to the number of nodes in TT, which from Fakcharoenphol et al. (2004) is at most O(nlogn)O(n\cdot\log n). Moreover, the value of parameters kSk_{S} is non-negative integer bounded above by nn. The parameters in 𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave\boldsymbol{Q}_{S}^{enter},\boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave} are also non-negative integers bounded above by nn. For a fixed SS, each ii denotes the first ii children of SS from the total ordering of the children of SS in TT. Since TT has exactly nn leaves, SS can have no more than nn children. Thus, ii is bounded above by nn. From the above arguments, the total number of table entries in our dynamic programming table is at most O(n4l+3logn)O(n^{4l+3}\log n).

We now analyze the time to calculate the value of each table entry. For a table entry DP[S,kS,i,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave]DP[S,k_{S},i,\boldsymbol{Q}_{S}^{enter},\\ \boldsymbol{Q}_{Sout}^{enter},\boldsymbol{Q}_{S}^{leave},\boldsymbol{Q}_{Sout}^{leave}], we consider all possibilities based on the parameters kS,kSi,𝑸Senter,𝑸Soutenter,𝑸Sleave,𝑸Soutleave,𝑸Sioutenter,𝑸Sioutleavek_{S}^{\prime},k_{S_{i}},\boldsymbol{Q}_{S}^{enter^{\prime}},\boldsymbol{Q}_{Sout}^{enter^{\prime}},\boldsymbol{Q}_{S}^{leave^{\prime}},\\ \boldsymbol{Q}_{Sout}^{leave^{\prime}},\boldsymbol{Q}_{S_{i}out}^{enter*},\boldsymbol{Q}_{S_{i}out}^{leave*}. Since each of these O(6l+2)O({6l+2}) parameters are non-negative integers bounded above by nn, there are O(n6l)O(n^{6l}) cases to try all possible values of these parameters to satisfy certain constraints.

Thus, the running time of our dynamic programming algorithm is bounded by O(n4l+3lognn6l)=O(n10llogn)O(n^{4l+3}\log n\cdot n^{6l})=O(n^{10l}\log n). Moreover, for the set FF of opened facilities obtained by dynamic programming process, the clients can be assigned to FF by applying the min-cost max-flow method in poly(n)poly(n) time.

For the instance (X,dist,𝒞,,k,l,𝜶,𝜷)(X,dist,\mathcal{C},\mathcal{F},k,l,\boldsymbol{\alpha},\boldsymbol{\beta}) of the fair kk-median problem in general metrics, a solution of cost at most O(logk)cost(OPT)O(\log k)cost(OPT) can be obtained in time nO(l)n^{O(l)}.

6 Conclusions

In this paper, we study the fair kk-median problem in doubling metrics and general metrics. We present a (1+ϵ)(1+\epsilon)-approximation algorithm that runs in QPTAS. We further show that our algorithm works in a more general metrics and we present an O(logk)O(\log k)-approximation algorithm that runs in time nO(l)n^{O(l)}. Our two approximation algorithms for the fair kk-median problem are the first results for the corresponding problems without any fair violation, respectively.

References

  • Abbasi et al. [2021] M. Abbasi, A. Bhaskara, and S. Venkatasubramanian. Fair clustering via equitable group representations. In Proc. 4th ACM Conference on Fairness, Accountability, and Transparency, pages 504–514, 2021.
  • Ahmadian et al. [2019] S. Ahmadian, A. Epasto, R. Kumar, and M. Mahdian. Clustering without over-representation. In Proc. 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 267–275, 2019.
  • Ahmadian et al. [2020] S. Ahmadian, A. Epasto, M. Knittel, R. Kumar, M. Mahdian, B. Moseley, P. Pham, S. Vassilvitskii, and Y. Wang. Fair hierarchical clustering. In Proc. 34th International Conference on Neural Information Processing Systems, 2020.
  • Backurs et al. [2019] A. Backurs, P. Indyk, K. Onak, B. Schieber, A. Vakilian, and T. Wagner. Scalable fair clustering. In Proc. 36th International Conference on Machine Learning, pages 405–413, 2019.
  • Bartal and Gottlieb [2013] Y. Bartal and L.-A. Gottlieb. A linear time approximation scheme for euclidean tsp. In Proc. 54th Annual Symposium on Foundations of Computer Science, pages 698–706, 2013.
  • Bartal et al. [2016] Y. Bartal, L.-A. Gottlieb, and R. Krauthgamer. The traveling salesman problem: low-dimensionality implies a polynomial time approximation scheme. SIAM Journal on Computing, 45(4):1563–1581, 2016.
  • Behsaz et al. [2019] B. Behsaz, Z. Friggstad, M. R. Salavatipour, and R. Sivakumar. Approximation algorithms for min-sum kk-clustering and balanced kk-median. Algorithmica, 81(3):1006–1030, 2019.
  • Bera et al. [2019] S. K. Bera, D. Chakrabarty, N. Flores, and M. Negahbani. Fair algorithms for clustering. In Proc. 33rd International Conference on Neural Information Processing Systems, pages 4955–4966, 2019.
  • Bercea et al. [2019] I. O. Bercea, M. Groß, S. Khuller, A. Kumar, C. Rösner, D. R. Schmidt, and M. Schmidt. On the cost of essentially fair clusterings. In Proc. 22nd International Conference on Approximation Algorithms for Combinatorial Optimization Problems and 23rd International Conference on Randomization and Computation, pages 18:1–18:22, 2019.
  • Chen et al. [2019] X. Chen, B. Fain, L. Lyu, and K. Munagala. Proportionally fair clustering. In Proc. 36th International Conference on Machine Learning, pages 1032–1041, 2019.
  • Chierichetti et al. [2017] F. Chierichetti, R. Kumar, S. Lattanzi, and S. Vassilvitskii. Fair clustering through fairlets. In Proc. 31st International Conference on Neural Information Processing Systems, pages 5029–5037, 2017.
  • Cohen-Addad et al. [2019] V. Cohen-Addad, A. E. Feldmann, and D. Saulpic. Near-linear time approximations schemes for clustering in doubling metrics. In Proc. 60th Annual Symposium on Foundations of Computer Science, pages 540–559, 2019.
  • Esmaeili et al. [2020] S. Esmaeili, B. Brubach, L. Tsepenekas, and J. Dickerson. Probabilistic fair clustering. Proc. 34th International Conference on Neural Information Processing Systems, 33, 2020.
  • Fakcharoenphol et al. [2004] J. Fakcharoenphol, S. Rao, and K. Talwar. A tight bound on approximating arbitrary metrics by tree metrics. Journal of Computer and System Sciences, 69(3):485–497, 2004.
  • Ghadiri et al. [2021] M. Ghadiri, S. Samadi, and S. Vempala. Socially fair kk-means clustering. In Proc. 4th ACM Conference on Fairness, Accountability, and Transparency, pages 438–448, 2021.
  • Harb and Lam [2020] E. Harb and H. S. Lam. KFC: A scalable approximation algorithm for kk-center fair clustering. In Proc. 34th International Conference on Neural Information Processing Systems, 2020.
  • Kuhn [1955] H. W. Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97, 1955.
  • Munkres [1957] J. Munkres. Algorithms for the assignment and transportation problems. Journal of the society for industrial and applied mathematics, 5(1):32–38, 1957.
  • Talwar [2004] K. Talwar. Bypassing the embedding: algorithms for low dimensional metrics. In Proc. 36th Annual ACM symposium on Theory of computing, pages 281–290, 2004.