
Sub-quadratic (1+\epsilon)-approximate Euclidean Spanners, with Applications

Alexandr Andoni ([email protected]) and Hengjie Zhang ([email protected]), Columbia University.
Abstract

We study graph spanners for point sets in high-dimensional Euclidean space. On the one hand, we prove that spanners with stretch <\sqrt{2} and subquadratic size are not possible, even if we add Steiner points. On the other hand, if we add extra nodes to the graph (non-metric Steiner points), then we can obtain (1+\epsilon)-approximate spanners of subquadratic size. We show how to construct a spanner of size n^{2-\Omega(\epsilon^{3})}, as well as a directed version of the spanner of size n^{2-\Omega(\epsilon^{2})}.

We use our directed spanner to obtain an algorithm for computing a (1+\epsilon)-approximation to the Earth-Mover Distance (optimal transport) between two sets of size n in time n^{2-\Omega(\epsilon^{2})}.

1 Introduction

A spanner for the metric defined by a graph G is a subgraph H such that all shortest path distances in H approximate the shortest path distances in G up to some approximation factor \alpha, termed the stretch. Originally introduced in [PS89], the notion has been very influential in theory and practice; see, for example, the recent survey [ABS+20] and the earlier surveys [Epp99, Zwi01]. The classic result is that, for any integer k\geq 1 and any weighted graph G, one can construct a spanner of size O(n^{1+1/k}) approximating all distances up to a factor 2k-1 [PS89]. This is tight under the Erdős girth conjecture [Erd63].

Graph spanners have been studied in a number of settings and variants. While we refer the reader to the survey [ABS+20] for the vast list of relevant papers, we highlight that there are a number of parameters/versions of spanners considered: approximation (e.g., multiplicative, additive, and others), which pairs to preserve, properties of the graph G, computation time, data structures (distance oracles), etc.

An almost disjoint line of research concerns geometric spanners, where the graph G is defined implicitly by the pairwise distances in some metric M; these were introduced in [Che86], even before graph spanners were defined. In particular, here G is a complete graph, where each pair of nodes is connected by an edge whose weight equals their distance. The classic results here show that an n-point metric in the Euclidean space \mathbb{R}^{d} admits a (1+\epsilon)-approximation spanner of size O(n\cdot\epsilon^{-d+1}) [Cla87, Kei88, KG92, ADD+93]; see also the survey [NS07].

More recent progress [LS22, BT22] showed tightness of the bound n\epsilon^{-d+1}, but also that Steiner spanners can improve the size to O(n\epsilon^{\frac{-d+1}{2}}). A Steiner spanner is a graph whose nodes are the pointset P\subset\mathbb{R}^{d} together with some extra Steiner points in \mathbb{R}^{d}. Whenever an edge is added to the spanner, its weight is the direct distance between the corresponding points (so that no pair is shortcut).

In contrast to the two extremely well-studied settings from above, high-dimensional geometric spanners have surprisingly received much less attention. In particular, we are aware of only one (theoretical) result: [HPIS13], who obtained a c-spanner of size O(n^{1+O(1/c^{2})}) for any c\geq 3.4. This is despite the ubiquity of high-dimensional datasets and the fact that many (massive) graphs are actually “similarity graphs”, obtained by connecting close points of some dataset P\subset\mathbb{R}^{d} and subsequently used for various clustering applications; see the recent [DEL+22] and references therein. (While a similarity graph is a bit different from the aforementioned complete graph, e.g., the edges are only for pairs of points at distance \leq r, the tools from here apply to such graphs nonetheless.) Indeed, the spanner from [HPIS13] has been implemented in [CHJ+22], yielding orders of magnitude speed-up in graph building, without affecting the downstream clustering, which leverages the spanner only.

To put our lack of understanding into perspective, we do not even know whether we can obtain subquadratic-size spanners for approximation c\leq 3.4.

1.1 Our contributions

We build new high-dimensional geometric spanners in the 1+\epsilon approximation regime, asking the following main question:

Question 1.1.

Do there exist spanners for high-dimensional Euclidean point sets with 1+\epsilon approximation and subquadratic size?

From now on, we assume we have an n-point dataset P\subset\mathbb{R}^{d} under the Euclidean distance, where d\gg\log n. For a graph H, we use \mathrm{d}_{H}(p,q) to denote the shortest path distance between p and q in H. We also let \Delta be the aspect ratio of P: \Delta=\frac{\max_{p,q\in P}\|p-q\|_{2}}{\min_{p,q\in P,p\neq q}\|p-q\|_{2}}.

Lower bound for spanners.

We first give an indication that it is impossible to obtain (1+\epsilon) approximation with a sub-quadratic-size spanner, even if we allow Steiner points. In particular, we prove the following two lower bounds for the Hamming metric and the Euclidean metric, respectively.

Theorem 1.2 (See Section 2.1).

For any \epsilon>0, there exists a dataset P\subset\{0,1\}^{d} such that the following holds. Fix a graph H with vertices identified with P, such that for any p,q\in P,

\|p-q\|_{1}\leq\mathrm{d}_{H}(p,q)<(2-\epsilon)\cdot\|p-q\|_{1}.

Then H must have \Omega(n^{2}\epsilon^{4}) edges. This remains true even if H contains Steiner vertices corresponding to points in \{0,1\}^{d}.

Theorem 1.3 (See Section 2.2).

For any \epsilon>0, there exists a dataset P\subset S^{d-1}(0,1) such that the following holds. Fix a graph H with vertices identified with P, such that for any p,q\in P,

\|p-q\|_{2}\leq\mathrm{d}_{H}(p,q)<(\sqrt{2}-\epsilon)\cdot\|p-q\|_{2}.

Then H must have \Omega(n^{2}\epsilon^{4}/\log^{2}n) edges. This remains true even if H contains Steiner vertices corresponding to points in \mathbb{R}^{d}.

Spanners with non-metric Steiner nodes.

Next, we show that if we consider a more general class of Steiner spanners, where we allow graph nodes that do not correspond to points in \mathbb{R}^{d}, we can indeed obtain (1+\epsilon)-spanners of subquadratic size, answering the above question positively. We refer to such spanners as non-metric Steiner spanners.

Furthermore, we distinguish two types of non-metric Steiner spanners: undirected and directed. In particular, a directed spanner H for sets P,Q with stretch c>1 is a directed graph with vertex-set P\cup Q\cup S, satisfying for all p\in P,q\in Q:

\|p-q\|\leq\vec{\mathrm{d}}_{H}(p,q)\leq c\|p-q\|,

where \vec{\mathrm{d}}_{H}(p,q) is the shortest path distance from p\in P to q\in Q. When we have one dataset P, we take Q to be a copy of P. Note that a directed spanner only makes sense in the non-metric Steiner setting.

Theorem 1.4 (See Section 3).

Fix \epsilon\in(0,1). For any dataset P\subset\mathbb{R}^{d}, there exists a directed graph H on vertices P\cup S, such that for any p,q\in P,

\|p-q\|_{2}\leq\vec{\mathrm{d}}_{H}(p,q)<(1+\epsilon)\cdot\|p-q\|_{2}.

Furthermore, the number of edges of H, the size of S, and the construction time are all bounded by O(n^{2-\Omega(\epsilon^{2})}\log\Delta), where \Delta is the aspect ratio of P.

While the directed spanner turns out to be already sufficient for our application below, there are other applications which may require an undirected graph (as many graph problems are easier for undirected graphs). Thus, we also show that one can construct an undirected spanner, albeit with parameters less efficient than in the directed case.

Theorem 1.5 (See Section 4).

Fix \epsilon\in(0,1). For any dataset P\subset\mathbb{R}^{d}, we can construct an undirected graph H on vertices P\cup S, such that for any p,q\in P,

\|p-q\|_{2}\leq\mathrm{d}_{H}(p,q)<(1+\epsilon)\cdot\|p-q\|_{2}.

Furthermore, the construction time, the number of edges of H, and the size of S are bounded by O(n^{2-\Omega(\epsilon^{3})}\log\Delta), where \Delta is the aspect ratio of P.

Sub-quadratic algorithms for the Earth-Mover Distance problem in high dimensions

We highlight one concrete algorithmic application of our spanner construction. In particular, we consider the Earth-Mover Distance problem in high-dimensional spaces: given two n-point sets A,B\subset\mathbb{R}^{d}, compute the minimum-cost bi-chromatic matching \min_{\pi:A\to B}\frac{1}{n}\sum_{a\in A}\|a-\pi(a)\|, where \pi ranges over all bijections from A to B. In different communities, the EMD problem is also termed optimal transport, Wasserstein distance, Monge-Kantorovich, and others. (Formally, all these are defined over distributions over n points, equivalently weighted pointsets. All our results apply to the weighted version as well, and hence we ignore the difference for simplicity of presentation.)
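To make the objective concrete, here is a minimal exact baseline (a cubic-time Hungarian-matching computation, not the sub-quadratic algorithm of Theorem 1.6), assuming SciPy is available; the function name exact_emd is ours.

```python
# Baseline: exact EMD between two equal-size point sets via the Hungarian
# algorithm (O(n^3) time), shown only to make the objective concrete.
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def exact_emd(A: np.ndarray, B: np.ndarray) -> float:
    """A, B: arrays of shape (n, d). Returns min_pi (1/n) * sum_a ||a - pi(a)||_2."""
    cost = cdist(A, B)                        # all pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # optimal bipartite matching
    return cost[rows, cols].sum() / len(A)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 32))
    B = rng.standard_normal((100, 32))
    print(exact_emd(A, B))
```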

The basic problem is to compute the (near-)optimal \pi efficiently. When the dimension d is small, a long progression of results, starting from [Vai89] and culminating in [KNP19], yielded a (1+\epsilon)-approximate algorithm running in time \widetilde{O}(n)\cdot\epsilon^{-O(d)}.

Another line of research concerns the general metric (or cost) setting, i.e., settings more general than the Euclidean one. Here, the best possible runtime is \sim n^{2}, which is necessary in general. One approach uses the Sinkhorn iteration algorithm to obtain additive error \epsilon\max_{p\in A,q\in B}\|p-q\| [Cut13, ANWR17]. A different approach uses faster linear programming algorithms to solve the standard bipartite matching problem in time near-linear in the size of the (complete) bipartite graph on A\times B [VDBLL+21, CKL+22]; this approach can obtain an exact result.

Nevertheless, the most common setting for general cost is still the high-dimensional Euclidean (or \ell_{1}) space (see, e.g., the experiments from [ANWR17], or the most recent paper [ARSS23]). In this setting, one can hope for runtime \ll n^{2}, as the input size is only O(nd).

The best algorithm for the high-dimensional setting obtains runtime O(n^{1+O(1/c^{2})}) for c-approximation, for c\geq 3.4. This follows from using the spanner of [HPIS13] in the following general approach. One can take the pointset A\cup B and build a spanner H on it. Then the EMD problem reduces to solving the “graphical EMD” case: EMD under the metric defined by the shortest path distance in H. The latter problem is equivalent to the problem of uncapacitated min-cost flow, or the transshipment problem. This problem in turn can be solved in time near-linear in the graph size [She17, ASZ20, Li20]. While these algorithms are for undirected graphs (and hence would require an undirected spanner), the most recent algorithm solves the min-cost flow problem in directed graphs in near-linear time [CKL+22].
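The pipeline described above can be sketched as follows. This is a schematic only: it assumes a directed spanner edge list is already available (e.g., from Theorem 1.4), and it substitutes an off-the-shelf min-cost-flow solver (networkx, which expects integral costs, hence the rounding) for the near-linear-time algorithms cited in the text.

```python
# Schematic of the EMD pipeline: build a directed spanner on A ∪ B, then solve
# uncapacitated min-cost flow ("transshipment") on the spanner graph.
import networkx as nx

def emd_via_spanner(A_ids, B_ids, spanner_edges, scale=10**6):
    """spanner_edges: iterable of (u, v, length) directed edges whose shortest
    paths (1+eps)-approximate ||u - v|| for u in A, v in B."""
    G = nx.DiGraph()
    for a in A_ids:
        G.add_node(a, demand=-1)          # each source ships one unit
    for b in B_ids:
        G.add_node(b, demand=1)           # each sink receives one unit
    for u, v, length in spanner_edges:
        # Steiner nodes get default demand 0 (pure transshipment nodes);
        # costs are rounded to integers for the network-simplex solver.
        G.add_edge(u, v, weight=int(round(length * scale)))
    flow = nx.min_cost_flow(G)
    return nx.cost_of_flow(G, flow) / scale / len(A_ids)
```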

Thus, we obtain the following statement, which is a direct corollary of our directed spanner Theorem 1.4 combined with [CKL+22]:

Theorem 1.6.

For any \epsilon>0, given two sets of points A,B\subset\mathbb{R}^{d} of size n, we can solve the EMD(A,B) problem within (1+\epsilon) approximation in O(n^{2-\Omega(\epsilon^{2})}\log\Delta) time.

1.2 Our techniques

The natural place to start is to understand why [HPIS13] does not answer our main Question 1.1, instead obtaining a subquadratic-size spanner only for approximation c>3.4. First, we note that it is enough to solve the problem for some fixed scale r>0: for the given dataset P, build a spanner H_{r} where all pairs of points at distance \leq r have a path of length \leq cr, while no pairwise distance contracts. For this, the authors use Locality-Sensitive Hashing, originally designed to solve the Approximate Near Neighbor Search (ANNS) problem in \mathbb{R}^{d}:

Definition 1.7 ([HIM12]).

Fix dimension d, approximation c>1, and scale r>0. A distribution {\cal H} over hash functions h:\mathbb{R}^{d}\to U, for some discrete set U, is called (c,p_{1},p_{2})-Locality Sensitive Hashing (LSH) if:

  • for any two points p,q\in\mathbb{R}^{d} with \|p-q\|\leq r, we have \Pr_{h}[h(p)=h(q)]\geq p_{1};

  • for any two points p,q\in\mathbb{R}^{d} with \|p-q\|>cr, we have \Pr_{h}[h(p)=h(q)]<p_{2}.

It is known that one can obtain (p_{1},p_{2})-LSH with p_{1}\leq p_{2}^{1/c^{2}+o(1)} for any p_{2}<d^{-\omega(1)} [AI08]. The overall [HPIS13] algorithm proceeds as follows. Sample L\approx 1/p_{1} hash functions h from a (c/2,p_{1},p_{2})-LSH with p_{2}=1/n^{3}. Each such hash function partitions the dataset P into buckets. In each bucket, we connect all points to a fixed point with an edge of length cr/2, i.e., we add a star to the graph H_{r}. One can immediately deduce that points at distance \leq r will be connected by a 2-hop path of length cr, and that the total size is O(nL)=O(n/p_{1})=O(n^{1+12/c^{2}+o(1)}). Furthermore, since p_{2}=1/n^{3}, one can argue that no pair at distance <cr ends up in the same bucket.
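For illustration, here is a hedged sketch of this one-scale construction in Hamming space, with bit-sampling LSH standing in for the Euclidean LSH of [AI08]; parameter choices are illustrative rather than the tuned ones from the text, and the number of repetitions can be large.

```python
# Simplified one-scale star-spanner construction in the spirit of [HPIS13]:
# bucket the points with LSH, then attach each bucket to one representative
# by edges of length c*r/2 ("star"). Assumes r << d.
import math
import random
from collections import defaultdict

def star_spanner_one_scale(P, r, c, reps=None):
    """P: list of equal-length 0/1 tuples. Returns weighted edges (i, j, w)."""
    n, d = len(P), len(P[0])
    # choose k so that pairs at distance > c*r collide with prob ~ 1/n^3
    k = max(1, int(math.ceil(3.0 * math.log(n) * d / (c * r))))
    p1 = (1.0 - r / d) ** k                   # collision prob. of close pairs
    reps = reps or int(math.ceil(1.0 / p1))   # L ~ 1/p1 repetitions
    edges = []
    for _ in range(reps):
        coords = random.sample(range(d), min(k, d))   # bit-sampling hash
        buckets = defaultdict(list)
        for idx, p in enumerate(P):
            buckets[tuple(p[i] for i in coords)].append(idx)
        for bucket in buckets.values():
            center = bucket[0]                # representative of the star
            edges.extend((center, j, c * r / 2.0) for j in bucket[1:])
    return edges
```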

Note that the size is sub-quadratic only for c>\sqrt{12}>3.4, and there are two contributions to the exponent 12. The first one is the “star” gadget inside the bucket: most colliding points are connected by a 2-hop path, which imposes a factor-2 stretch. The second one is the fact that we need to set p_{2}=1/n^{3} to ensure that no pair of points at distance >cr collides in a bucket.

As we discuss below, both these sources of stretch loss are intrinsic to the algorithm and require new ideas to overcome.

Lower bound.

First, we show that we cannot obtain approximation c<2 using vanilla spanners or even spanners with Steiner points in the Euclidean/Hamming metric. Indeed, consider a random pointset P on the surface of a unit sphere in \mathbb{R}^{d}. Then all distances are concentrated around \sqrt{2}. It is immediate that, for any spanner H_{r}, any pair of points without a direct edge will have stretch 2. One can also show that even if we allow Steiner points in \mathbb{R}^{d}, a subquadratic-size spanner must have stretch at least \sqrt{2}. For intuition, note that the best Steiner point for P is the center of the sphere. A similar lower bound holds for Steiner spanners in the Hamming space.
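The arithmetic behind this intuition (a back-of-the-envelope sketch, not the formal argument of Section 2) is:

```latex
% All pairwise distances concentrate around \sqrt{2}. Without Steiner points,
% a pair p,q with no direct edge uses at least two edges of length ~\sqrt{2}:
d_H(p,q) \;\ge\; 2\cdot(1-o(1))\sqrt{2} \;=\; (2-o(1))\,\|p-q\|_2 .
% With the center 0 of the sphere as a Steiner point, the 2-hop path p -> 0 -> q
% has length \|p\|_2+\|q\|_2 = 2 = (\sqrt{2}\pm o(1))\,\|p-q\|_2, i.e. stretch
% about \sqrt{2}, matching the (\sqrt{2}-\epsilon) threshold of Theorem 1.3.
```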

Upper bound.

To deal with the first source of approximation loss in the [HPIS13] algorithm, we consider adding Steiner vertices to the spanner that do not represent points in the metric itself. In particular, in the above equidistant case, the solution is to add a node s to the graph, and connect it to each of the points in P with an edge of length cr/2. We term this a “star gadget” below.

Now we can focus on the second source of stretch: that we need to set p_{2} as low as 1/n^{3}, which is a much more significant technical challenge. The natural first question would be: why not set p_{2}\approx 0.1/n, and hence p_{1}=1/n^{1-\epsilon} for c=1+\epsilon?

Consider a pair of points p,q\in P at distance \leq r that we would like to collide in some bucket, in which case they are connected by a path of length cr (using our new star gadget from above). It is immediate to argue that p,q collide with probability p_{1} under a single hash function, and hence with probability 1-o(1) over all L hash functions. The tricky part is to ensure that this successful bucket does not also contain a pair of far points (i.e., points at distance >cr from each other), in which case the star gadget would shortcut that pair.

To analyze the probability of a far pair of points falling into the bucket, we can consider, for example, the probability that a “far” point f\in P at distance \|q-f\|>cr collides with q. A standard analysis says that the expected number of far points is \leq p_{2}n, and hence, by Markov's inequality, there is, say, \Omega(1) probability that no far point collides with q. The crucial caveat is that this probability is not independent of the event that p,q collide: i.e., it may be the case that whenever p,q collide, the number of colliding far points is much larger than the expectation.

We are led to consider the “3-point collision” probability: \Pr[(h(f)=h(q))\wedge(h(q)=h(p))]. If we were able to prove that these two events are essentially independent, we would be all set. Such a property has indeed been considered, at least in [AR15], which used it for data-dependent LSH algorithms. They proved the desired independence by a very intricate reduction to the “pseudo-random” case: where P is essentially random on the sphere, except for the pair of interest p,q. Alas, their reduction does not seem applicable here for a couple of reasons. The first is that the independence holds only in the case when the far point f is far from both p and q. The second is that, while in [AR15] one can think of p as being a “rare” close point to q (this helps in the reduction), in our setting there could be both \Omega(n) close points and \Omega(n) far points for most q's!

Our approach is different and uses two main ideas: the first is enough to get directed spanners, and the second is necessary for undirected spanners.

Upper bound: first idea.

We circumvent the “3-point collision” analysis by considering a different setting of the hashing, roughly corresponding to the ANNS algorithm with sub-polynomial query time. In particular, we use the hashing scheme from [ALRW17]. Think of the dataset as being two sets P and Q, and we need to worry only about the collision of points p\in P vs q\in Q (in the end we take Q=P). Then [ALRW17] designs a hashing scheme such that for some L=n^{O(1)}, there is a function h_{P}:\mathbb{R}^{d}\to 2^{[L]}, which describes for each point p\in P which set of buckets it is hashed to, and there is an analogous function h_{Q}:\mathbb{R}^{d}\to 2^{[L]}. Then, for any pair p,q at distance \leq r, we have h_{P}(p)\cap h_{Q}(q)\neq\emptyset with constant probability. Furthermore, |h_{P}(p)|=n^{\rho_{P}+o(1)} and |h_{Q}(q)|=n^{\rho_{Q}+o(1)} for \rho_{Q},\rho_{P}\geq 0 satisfying some relation, and the runtime to compute h_{P}(p),h_{Q}(q) is roughly proportional to their output size. It is possible to set \rho_{Q}=0, in which case \rho_{P}\approx 1/\epsilon^{2} (we note that this bound is tight [ALRW17]).

Our solution uses the above asymmetric hashing as follows: split the dataset P into \approx n^{1-\epsilon^{2}/2} groups of size m=n^{\epsilon^{2}/2}. For each group, preprocess as above, taking a total of \approx n/m\cdot m^{\rho_{P}}<n^{1.5} time, and then query each q\in Q, taking a total of \approx n\cdot n/m=n^{2-\epsilon^{2}/2} time. We can then prove that, for every close pair p\in P,q\in Q, with probability at least 1/2, there exists a bucket which 1) p,q both hash to, but 2) contains no point f that is far from q.
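Expanding the exponent arithmetic behind the two bounds just quoted (with \rho_{Q}=0 and the group-level costs exactly as stated above):

```latex
m = n^{\epsilon^2/2},\qquad \rho_P \approx 1/\epsilon^2,\ \rho_Q = 0
\;\;\Longrightarrow\;\; m^{\rho_P} \approx n^{1/2},
\\
\text{preprocessing: } \tfrac{n}{m}\cdot m^{\rho_P} = n^{3/2-\epsilon^2/2} < n^{1.5},
\qquad
\text{querying: } n\cdot\tfrac{n}{m}\cdot m^{\rho_Q+o(1)} = n^{2-\epsilon^2/2+o(1)}.
```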

The remaining caveat is that we need to verify that a bucket succeeded, i.e., that all hashed p,q are close to each other, as we cannot add a star gadget to the failed ones (otherwise, we create shortcuts). For this, we need to employ a furthest neighbor search data structure. A naïve adaptation of the best currently known data structure would result in extra runtime \approx n^{2-\epsilon^{3}}, a slow-down we manage to avoid.

Upper bound: second idea.

The above procedure obtains only a directed spanner, since we guarantee distances between p\in P and q\in Q, but not among p,p^{\prime}\in P pairs or q,q^{\prime}\in Q pairs. The latter turns out to be much more difficult to ensure.

In fact, our original “star gadget” is not even enough to get an undirected spanner. In particular, consider the case where we have n/2 points P drawn from a sphere of radius a=2 and n/2 points Q drawn from a sphere of radius b=1. Then the distance between a pair p\in P,q\in Q is typically \sqrt{a^{2}+b^{2}}=\sqrt{5}; fix r=\sqrt{5}. At the same time, the distance between points p,p^{\prime}\in P is \sqrt{2}a=\sqrt{8}>r. It seems impossible to construct an undirected spanner of sub-quadratic size with stretch 1+\epsilon using the “star gadget” from before, because it would shortcut P-to-P distances.

Instead, we introduce a new gadget: the asymmetric star gadget. In particular, in the above example, we add a Steiner vertex s that is connected with distance \sqrt{2}a/2 to the points p\in P and with distance \sqrt{a^{2}+b^{2}}-\sqrt{2}a/2\geq\sqrt{2}b/2 to the points q\in Q. Now the P-to-Q distance is r; the P-to-P distance is \sqrt{2}a; and the Q-to-Q distance is \geq\sqrt{2}b as well. While the Q-to-Q distance is stretched (it will be taken care of at a different scale r), the r-scale distances are taken care of, and no shortcuts are created.
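As a sanity check of the gadget with the concrete numbers a=2, b=1, r=\sqrt{5}:

```latex
\text{edge lengths: } s\!-\!P:\ \tfrac{\sqrt{2}a}{2} = \sqrt{2}, \qquad
s\!-\!Q:\ \sqrt{a^2+b^2} - \tfrac{\sqrt{2}a}{2} = \sqrt{5}-\sqrt{2}
  \;\ge\; \tfrac{\sqrt{2}b}{2},
\\
P\text{-to-}Q:\ \sqrt{2} + (\sqrt{5}-\sqrt{2}) = \sqrt{5} = r, \qquad
P\text{-to-}P:\ 2\sqrt{2} = \sqrt{2}a, \qquad
Q\text{-to-}Q:\ 2(\sqrt{5}-\sqrt{2}) \approx 1.64 \;\ge\; \sqrt{2}b \approx 1.41 .
```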

To be able to use this gadget, we start from where we left off with the directed spanner: a pair of sets P^{\prime}\subset P, Q^{\prime}\subset Q such that each p\in P^{\prime} is close to each q\in Q^{\prime}. We show that in this case, there exist a,b with a+b\leq r, such that we can decompose P^{\prime} (and Q^{\prime}) into smaller parts so that the diameter of each part is (1+\epsilon)a (and (1+\epsilon)b, respectively). The number of such parts is bounded by |P^{\prime}|^{1-\Theta(\epsilon)} and |Q^{\prime}|^{1-\Theta(\epsilon)}, respectively. Then we use the asymmetric gadget on each pair of such parts.

Acknowledgements

We thank Negev Shekel Nosatzki for important discussions and suggestions for the undirected spanner construction in Section 4. Research supported in part by NSF (CCF2008733) and ONR (N00014-22-1-2713).

2 Lower Bound for Spanners, with or without Steiner Points

In this section, we prove the lower bounds on the size of spanners for geometric pointsets. We consider spanners that contain Steiner nodes corresponding to points in the metric. We note that, since the spanner cannot shortcut any pair, for any spanner edge connecting two points x,y\in\mathbb{R}^{d}, the optimal length of the edge is the distance between the points x and y.

We prove lower bounds for the Hamming and Euclidean (\ell_{2}) spaces separately. (While it is known that \ell_{2} can be embedded into \ell_{1}, and \ell_{1} into \ell_{2}-squared, a lower bound for the \ell_{2} (resp. \ell_{1}) space does not immediately imply a lower bound for the \ell_{1} (resp. \ell_{2}) space, since the Steiner points are to be taken in the considered space.) Both lower bounds use the following lemma, which we prove in Section 2.3.

Lemma 2.1.

Consider a graph G=(V,E) and an integer k. Let P_{x},P_{y}\subseteq V be any disjoint sets. Suppose |P_{x}|=|P_{y}|=n and |V|\leq|E|=o(n^{2}/k^{2}). For every x\in P_{x} and y\in P_{y}, let S_{x,y} be any path from x to y in G. Then there exist k distinct points x_{1},\cdots,x_{k}\in P_{x} and k distinct points y_{1},\cdots,y_{k}\in P_{y}, such that there is a node that lies on all paths S_{x_{1},y_{1}},S_{x_{2},y_{2}},\cdots,S_{x_{k},y_{k}}.

2.1 Lower bound in the Hamming metric

We prove the following lower bound on Steiner spanner size in the \ell_{1} metric.

Theorem 2.2 (Restatement of Theorem 1.2).

For any \epsilon\in(0,1), there exists a point set P\subset\{0,1\}^{d} such that the following holds. Let H=(P\cup S,E) be any spanner that can use extra Steiner points S\subseteq\{0,1\}^{d}, but can only use the \ell_{1} distance as the edge weight between two points. Suppose that for all u,v\in P, the distance between u and v in the graph H satisfies

\|u-v\|_{1}\leq\mathrm{d}_{H}(u,v)\leq(1+\epsilon)\|u-v\|_{1},

then H must use \Omega(n^{2}(1-\epsilon)^{4}) edges.

Proof of Theorem 2.2.

Let P be chosen by sampling every point i.i.d. uniformly from the Hamming cube \{0,1\}^{d} with d=\omega(\log^{2}n). With high probability, every pair of points in P has distance in d/2\pm o(1)\cdot d.

For every two points x,y\in\{0,1\}^{d}, we define A_{x,y}:=\{z\in\{0,1\}^{d}\mid\|x-z\|_{1}+\|y-z\|_{1}\leq\frac{\delta}{2}\cdot d+\|x-y\|_{1}\}, where \delta=(1+\epsilon)/2 is the average of 1 and \epsilon. Intuitively, for x,y\in P, the set A_{x,y} contains all possible Steiner points s in \{0,1\}^{d} such that the path x\rightarrow s\rightarrow y does not incur a large approximation: if \|x-s\|_{1}+\|s-y\|_{1}\leq(1+\epsilon)\|x-y\|_{1}, then s\in A_{x,y}, since (1+\epsilon)\|x-y\|_{1}\leq\frac{\delta}{2}\cdot d+\|x-y\|_{1}.

The following claim shows that Ax,yA_{x,y} has small volume.

Claim 2.3.

For every p\in\{0,1\}^{d}, if x,y are uniformly sampled from \{0,1\}^{d}, then

\Pr_{x,y}[p\in A_{x,y}]\leq\exp\left(-\tfrac{1}{8}(1-\delta)^{2}d\right).
Proof.

Let S=\{i\in[d]\mid x_{i}=y_{i}\} be the set of coordinates on which x and y agree. Then A_{x,y} has the equivalent definition A_{x,y}=\{z\in\{0,1\}^{d}\mid\|z_{S}-x_{S}\|_{1}\leq\frac{\delta}{4}\cdot d\}. For i\in[d], let X_{i} be the random variable with X_{i}=1 if i\in S and p_{i}\neq x_{i}, and X_{i}=0 otherwise. If x,y are sampled uniformly at random from \{0,1\}^{d}, then each X_{i}=1 with probability \frac{1}{4} and X_{i}=0 with probability \frac{3}{4}, and all X_{i} are independent. Moreover, p\in A_{x,y} is equivalent to \sum_{i}X_{i}\leq\frac{\delta}{4}\cdot d.

By the Chernoff bound \Pr[X\leq(1-\alpha)\operatorname*{{\mathbb{E}}}[X]]\leq e^{-\alpha^{2}\operatorname*{{\mathbb{E}}}[X]/2} applied to X=\sum_{i}X_{i} with \operatorname*{{\mathbb{E}}}[X]=d/4 and \alpha=1-\delta, we have

\Pr_{x,y}[p\in A_{x,y}]=\Pr\left[\sum_{i}X_{i}\leq\frac{\delta}{4}\cdot d\right]\leq\exp\left(-\frac{1}{8}(1-\delta)^{2}d\right).

We are now ready to proceed with the main argument. For any P, let H be any spanner as stated in the theorem. Suppose H has o(n^{2}/k^{2}) edges for some k to be determined. By Lemma 2.1, there must exist 2k distinct points x_{1},y_{1},\cdots,x_{k},y_{k}\in P such that the shortest paths between (x_{1},y_{1}),\cdots,(x_{k},y_{k}) intersect at some vertex v of H (which could be a Steiner point or an original point in P). Since the edge weights can only use the actual Hamming distance, we must have \|x_{i}-v\|_{1}+\|y_{i}-v\|_{1}\leq(1+\epsilon)\|x_{i}-y_{i}\|_{1}\leq\|x_{i}-y_{i}\|_{1}+\frac{\delta}{2}d for all i. Thus, v must be in every A_{x_{i},y_{i}} by the definition of A_{x,y}.

Define the event E as follows: there exists a point p\in\{0,1\}^{d} and 2k distinct points x_{1},y_{1},\cdots,x_{k},y_{k}\in P such that p\in A_{x_{i},y_{i}} for all i\in[k]. We have argued that if the claimed spanner with o(n^{2}/k^{2}) edges exists for every P, then \Pr_{P}[E]=1. We will show next that when k>24(1-\delta)^{-2}, the probability that E happens (over the random choice of P) is less than 1. Hence, there exists a dataset P^{*} for which E does not happen, and therefore a claimed spanner H does not exist for P^{*}.

By Claim 2.3, for a fixed p\in\{0,1\}^{d}, we have

\Pr_{x,y\sim\{0,1\}^{d}}[p\in A_{x,y}]\leq\exp\left(-\frac{1}{8}(1-\delta)^{2}d\right).

Therefore, for k>24(1-\delta)^{-2},

\Pr[\exists\text{ distinct }x_{1},y_{1},\cdots,x_{k},y_{k}\in P\text{ such that }p\in\cap_{i\in[k]}A_{x_{i},y_{i}}]
\leq n^{2k}\cdot(\Pr_{x,y}[p\in A_{x,y}])^{k}
\leq\exp\left(2k\ln n-\frac{k}{8}(1-\delta)^{2}d\right)
\leq\exp(-2d).

Thus, we conclude \Pr[E]<2^{d}\cdot\exp(-2d)=o(1) by a union bound over all points p\in\{0,1\}^{d}, and therefore the theorem follows. ∎

2.2 Lower bound in 2\ell_{2} metric

We now prove the lower bound for the \ell_{2} metric:

Theorem 2.4 (2\ell_{2} lower bound).

For any \epsilon\in(0,\sqrt{2}-1), there exists a dataset P\subset S^{d-1}(0,1) such that the following holds. Let H=(P\cup S,E) be any spanner that can use extra Steiner points S\subset\mathbb{R}^{d}, but can only use the \ell_{2} distance as the edge weight between two points. Suppose that for all u,v\in P, the distance between u and v in the graph H satisfies

\|u-v\|_{2}\leq\mathrm{d}_{H}(u,v)\leq(\sqrt{2}-\epsilon)\|u-v\|_{2},

then H must use \Omega(n^{2}\epsilon^{4}/\log^{2}n) edges, where n=|P|.

First, we give some definitions. For any \epsilon\in(0,\sqrt{2}-1) and two points x,y\in\mathbb{R}^{d}, let A_{x,y}^{\epsilon} be defined as A_{x,y}^{\epsilon}:=\{p\in\mathbb{R}^{d}\mid\|x-p\|_{2}+\|y-p\|_{2}\leq(\sqrt{2}-\epsilon)\|x-y\|_{2}\}. If \epsilon is clear from the context, we write it as A_{x,y}.

The following lemma shows that, for a fixed point p, it is rare that A_{x,y} contains p for a random pair of points x and y.

Lemma 2.5.

For \epsilon\in(0,\sqrt{2}-1), let p\in B^{d}(0,1) be any point in the unit ball. Let x,y be sampled uniformly from the unit sphere S^{d-1}. Then

\Pr_{x,y\sim S^{d-1}}[p\in A_{x,y}^{\epsilon}\mid\|x-y\|_{2}\in(1\pm o_{\epsilon}(1))\sqrt{2}]\leq e^{-\epsilon^{2}d/2}.
Proof.

It is enough to bound \Pr_{x,y\sim S^{d-1}}[p\in A_{x,y}\mid\|x-y\|_{2}=2t] for any fixed t\in(1\pm o_{\epsilon}(1))\sqrt{2}/2. By symmetry, this is equivalent to bounding

\Pr_{p^{\prime}\sim S^{d-1}(0,r)}[p^{\prime}\in A_{x,y}]=\frac{\operatorname{Vol}(A_{x,y}\cap S^{d-1}(0,r))}{\operatorname{Vol}(S^{d-1}(0,r))},

where r:=\|p\|_{2}, and x,y can be any two fixed points on S^{d-1} with \|x-y\|_{2}=2t, say, x=(-t,\sqrt{1-t^{2}},0_{d-2}) and y=(t,\sqrt{1-t^{2}},0_{d-2}).

Note that A_{x,y} is an ellipsoid with focal points x and y. Let c=(0,1,0_{d-2}). Let

s:=\min_{z\in A_{x,y}}\langle z,c\rangle\geq\sqrt{1-t^{2}}-\sqrt{(\sqrt{2}-\epsilon)^{2}t^{2}-t^{2}}
=\frac{1}{\sqrt{2}}+o_{\epsilon}(1)-\frac{\sqrt{1-2\sqrt{2}\epsilon+\epsilon^{2}}}{\sqrt{2}}-o_{\epsilon}(1)
\geq\epsilon.

This means A_{x,y}\cap S^{d-1}(0,r)\subseteq\textrm{Cap}(r,c,s), where \textrm{Cap}(r,c,s) is the spherical cap consisting of all points x\in S^{d-1}(0,r) for which x\cdot c\geq s. Using the bound on the volume of a high-dimensional spherical cap given by [Tko12], we get

\frac{\operatorname{Vol}(\textrm{Cap}(r,c,s))}{\operatorname{Vol}(S^{d-1}(0,r))}\leq e^{-d(s/r)^{2}/2}\leq e^{-\epsilon^{2}d/2}.

Proof of Theorem 2.4.

Let P be generated by sampling every point i.i.d. uniformly from the unit sphere S^{d-1}(0,1) with d=\omega(\log^{2}n). With high probability, every pair of points in P has distance in \sqrt{2}\pm o(1).

Let H be any spanner as stated in the theorem with o(n^{2}/k^{2}) edges, for k=10\log n/\epsilon^{2}. Wlog, we can assume every Steiner point of H is in the unit ball B^{d}(0,1), since otherwise we can project it onto the unit sphere. By Lemma 2.1, there must exist 2k distinct points x_{1},y_{1},\cdots,x_{k},y_{k}\in P such that the shortest paths between (x_{1},y_{1}),\cdots,(x_{k},y_{k}) intersect at some point v of H. Since the edge weights can only use the actual distance in the Euclidean space, we must have \|x_{i}-v\|_{2}+\|y_{i}-v\|_{2}\leq(\sqrt{2}-\epsilon)\|x_{i}-y_{i}\|_{2} for all i. Let v^{\prime} be the closest point to v such that every coordinate of v^{\prime} is a multiple of \frac{\epsilon}{10\sqrt{d}}; thus \|v-v^{\prime}\|_{2}\leq\epsilon/10, and therefore v^{\prime} satisfies, for all i,

\|x_{i}-v^{\prime}\|_{2}+\|y_{i}-v^{\prime}\|_{2}\leq(\sqrt{2}-\epsilon/2)\|x_{i}-y_{i}\|_{2}.

Let N consist of all points in the unit ball B^{d}(0,1) whose coordinates are multiples of \frac{\epsilon}{10\sqrt{d}}, and let the event E be: there exists a point p\in N and 2k distinct points x_{1},y_{1},\cdots,x_{k},y_{k}\in P such that p\in A^{\epsilon/2}_{x_{i},y_{i}} for all i\in[k]. The above argument shows that if such a spanner exists, then E must happen. Next, we will show that E happens with probability <1 for a random dataset P; therefore, such a spanner H does not exist for some P.

Let p\in N be any point. Since every pair of points in P has distance in \sqrt{2}\pm o(1), by Lemma 2.5 we have

\Pr[\exists x_{1},y_{1},\cdots,x_{k},y_{k},\text{ s.t. }p\in\cap_{i}A^{\epsilon/2}_{x_{i},y_{i}}]
\leq n^{2k}\cdot\left(\Pr_{x,y}[p\in A^{\epsilon/2}_{x,y}\mid\|x-y\|_{2}=\sqrt{2}\pm o(1)]\right)^{k}
\leq\exp(2k\ln n-\epsilon^{2}dk/8)
\leq\exp(-d\log n).

By a union bound over all p\in N, we have

\Pr[E]\leq\exp(-d\log n)\cdot(20\sqrt{d}/\epsilon)^{d}=o(1),

therefore, the theorem follows. ∎

2.3 Proof of Lemma 2.1

Proof of Lemma 2.1.

Suppose there is a graph G for which the property does not hold. We will turn G into a bipartite graph H=(P_{x}\cup P_{y},E^{\prime}) with fewer than n^{2} edges that connects every pair of points x\in P_{x} and y\in P_{y} by a direct edge, which is impossible. Thus, by contradiction, we conclude that such a graph G does not exist.

First of all, we may assume every path S_{x,y} never touches P_{x} or P_{y} except at its endpoints, since otherwise we can create dummy points for every point in P_{x} and P_{y} and let the path S_{x,y} use the dummy points. If the new graph has k paths intersecting at one point, then so does the original graph.

Next, to simplify the proof, we give directions to the edges. Let S_{x,y} be any path, and let \{v_{1},\cdots,v_{k}\} be the points on the path in order. We say the directed edge e=(a,b)\in S_{x,y} if a=v_{i} and b=v_{i+1} for some i. We assume G only has useful edges. From now on, G is a directed graph and

E=\bigcup_{x\in P_{x},y\in P_{y}}\bigcup_{(a,b)\in S_{x,y}}\{(a,b)\}.

To turn G into H, we have three intermediate steps, from G to G_{1}, G_{2}, G_{3}, and then H; each graph has better properties. In the first graph G_{1}=(V_{1},E_{1}), we will have |V_{1}|=O(|V|), |E_{1}|=O(|E|), and every vertex will be light, a notion we explain later. The procedure for constructing G_{1} is as follows.

For every vertex v\in V, let U(v) be initially empty. For every path S_{x,y} with x\in P_{x} and y\in P_{y}, let \{v_{1},\cdots,v_{k}\} be the points on the path. For every i\in[k], we add (x,y) to U(v_{i}).

We now modify the graph to obtain a simpler condition: that U(v) contains fewer than k distinct x's or fewer than k distinct y's. For every vertex v\in V\backslash(P_{x}\cup P_{y}), U(v) can be regarded as a bipartite graph, namely a collection of edges between P_{x} and P_{y}. In this bipartite graph, the maximum matching is less than k, since otherwise there would be k paths with distinct endpoints intersecting at v.

Theorem 2.6 (Extended Hall’s theorem).

For any bipartite graph G=(L,R,E) with |L|=|R|=n, the cardinality of the maximum matching is

\mu(G)=n-\max_{T}(|T|-|N(T)|),

where T ranges over all subsets of L and of R separately, and N(T) denotes the set of neighbors of the set T in G.

Applying the above theorem to the bipartite graph U(v), there must be a set T\subseteq P_{x} or T\subseteq P_{y} such that n-|T|+|N(T)|=\mu(U(v))<k. We must have |T|\geq n-k and |N(T)|\leq k. The analysis for the cases T\subseteq P_{x} and T\subseteq P_{y} is similar, so we assume the first case. Let T_{L}=T and T_{R}=P_{y}\backslash N(T). Note that there is no path S_{x,y} in G crossing v with x\in T_{L} and y\in T_{R}. In the new graph G_{1}, we create three copies of v: v_{T_{L},\bar{T}_{R}}, v_{\bar{T}_{L},T_{R}} and v_{\bar{T}_{L},\bar{T}_{R}}, where \bar{T}_{L}:=P_{x}\backslash T_{L} and \bar{T}_{R}:=P_{y}\backslash T_{R}. So |V_{1}|\leq 3|V|. For each edge (u,v) in G, we create at most 9 edges linking each pair of copies of u and v. Thus, |E_{1}|\leq 9|E|.

Let \{v_{1},\cdots,v_{l}\} be the vertices on path S_{x,y}. For all 2\leq i\leq l-1, we replace v_{i} with its copy v_{S,T} where x\in S,y\in T, and obtain a path S^{\prime}_{x,y} in G_{1}. In the new graph G_{1}, for every vertex v\in V_{1}, let L(v),R(v) be initially empty. For every path S^{\prime}_{x,y} with x\in P_{x} and y\in P_{y}, let \{v_{1},\cdots,v_{l}\} be the points on the path. For every i\in[l], we add x to L(v_{i}) and add y to R(v_{i}).

Since |\bar{T}_{L}|,|\bar{T}_{R}|\leq k by the construction of G_{1}, for every vertex v\in V_{1}\backslash(P_{x}\cup P_{y}), we have either |L(v)|\leq k or |R(v)|\leq k. If |L(v)|\leq k, we call v left-light, otherwise we call v right-light. So there are four types of points in G_{1}: P_{x}, P_{y}, left-light and right-light points.

For any graph G=(V,E) and a vertex v\in V, we write \textrm{in}_{G}(v) for the set of edges entering v, and \textrm{out}_{G}(v) for the set of edges leaving v. In some contexts, we also use \textrm{in}_{G}(v), \textrm{out}_{G}(v) to refer to the vertices connected to v by an edge entering/leaving v.

Then we construct the next graph G_{2}=(V_{2},E_{2}) as follows. Initially, E_{2}=E_{1}. Then, for every left-light vertex v, we delete all edges in \textrm{in}_{G_{1}}(v) and add an edge (a,v) for every a\in L(v). For every right-light vertex v, we delete all edges in \textrm{out}_{G_{1}}(v) and add an edge (v,b) for every b\in R(v). Intuitively, the construction replaces edges with shortcuts. One can check that every pair of points x\in P_{x},y\in P_{y} is still connected by a path in G_{2}. The number of edges is bounded by |E_{2}|\leq|E_{1}|+|V_{1}|\cdot k. The purpose of constructing G_{2} is to have \textrm{in}_{G_{2}}(v)\subseteq P_{x} for left-light vertices v, and \textrm{out}_{G_{2}}(v)\subseteq P_{y} for right-light vertices v. Thus, G_{2} is in fact a four-layered graph, with layers ordered as P_{x}, left-light vertices, right-light vertices, P_{y}, and every edge goes strictly from a lower layer to a higher layer.

To construct H from G_{2}, we need to remove all points in V_{2}\backslash(P_{x}\cup P_{y}). Define the removal procedure applied to a vertex v in a graph G^{*}, \textsc{Remove}_{G^{*}}(v), as follows: delete all edges incident to v, and then, for every vertex a\in\textrm{in}_{G^{*}}(v) and b\in\textrm{out}_{G^{*}}(v), add an edge from a to b.

We apply \textsc{Remove}_{G_{2}}(v) to every left-light vertex v in G_{2} to get the graph G_{3}=(V_{3},E_{3}), then apply \textsc{Remove}_{G_{3}}(v) to every right-light vertex v in G_{3} to get the graph G_{4}=(P_{x}\cup P_{y},E_{4}). The order of deleting the vertices does not affect the outcome, since there are no edges connecting two left-light vertices, nor two right-light vertices. We have

|E_{3}|\leq|E_{2}|+\sum_{v\text{ is left-light}}k\cdot|\textrm{out}_{G_{2}}(v)|\leq|E_{2}|+k|E_{1}|,

where the last step follows since \textrm{out}_{G_{2}}(v)=\textrm{out}_{G_{1}}(v) for left-light vertices v.

Furthermore, we have

|E_{4}|\leq|E_{3}|+\sum_{v\text{ is right-light}}k\cdot|\textrm{in}_{G_{3}}(v)|\leq(k+1)|E_{3}|.

Thus, |E_{4}|\leq(k+1)|E_{3}|\leq(k+1)k(|V_{1}|+|E_{1}|)=O(k^{2})\cdot|E|<n^{2}. And G_{4} is our H: a bipartite graph that connects all pairs of points x\in P_{x} and y\in P_{y} with fewer than n^{2} edges. ∎

3 Directed Non-metric Steiner Spanners

In this section we construct a directed non-metric Steiner spanner, proving Theorem 1.4.

Theorem 3.1 (Restatement of Theorem 1.4).

Fix \epsilon\in(0,1). For any dataset P\subset\mathbb{R}^{d}, there exists a directed graph H on vertices P\cup S, such that for every p,q\in P,

\|p-q\|_{2}\leq\vec{\mathrm{d}}_{H}(p,q)<(1+\epsilon)\cdot\|p-q\|_{2}.

Furthermore, the number of edges of H, the size of S, and the construction time are all bounded by O(n^{2-\Omega(\epsilon^{2})}\log\Delta), where \Delta is the aspect ratio of P.

We prove this theorem by reducing to the one-scale case: we fix a distance r and construct a spanner H_{r} such that all pairs of points at distance in [r,(1+\epsilon)r] are connected in H_{r} by a suitable path. Here is the formal statement.

Lemma 3.2.

Let P\subset\mathbb{R}^{d} be a point set on the unit sphere. Let n=|P|, r=(\log n)^{-1/8}, and let \epsilon be the approximation parameter. There is an algorithm running in n^{2-\Omega(\epsilon^{2})} time that generates a directed graph H=(P\cup S,E) with |E|\leq n^{2-\Omega(\epsilon^{2})}, where S is the set of extra non-metric Steiner points. With high probability, H satisfies:

  (1) \forall p,q\in P, \vec{\mathrm{d}}_{H}(p,q)\geq\|p-q\|_{2};

  (2) \forall p,q\in P with \|p-q\|_{2}\leq r, there is a 2-hop directed path in H from p to q of length \leq(1+\epsilon)r.

The proof of this lemma is in the rest of this section. For now, we finish the proof of our main theorem using Lemma 3.2.

Proof of Theorem 3.1.

In Lemma 3.2, we show how to build a spanner so that every pair of points p,q\in P with \|p-q\|_{2}\in[r,(1+\epsilon)r] is handled, for any given parameter r. We can then apply the construction for geometrically increasing r, from the smallest distance to the largest, to create a sequence of spanners H_{1},\cdots,H_{\log\Delta}. Then H=\cup H_{i} is the final spanner.

There is only one issue to resolve in applying Lemma 3.2: we need P to be on the sphere and r to be about 1/\log^{1/8}n. Indeed, it is possible to reduce any Euclidean instance to such an instance while preserving the distances at scale r. See one such reduction in Corollary A.2 from [AIMN23] (based on [BRS11, Lemma 6]).

3.1 Algorithm for one scale

The algorithm heavily uses the data-independent (asymmetric) locality sensitive hashing from [ALRW17] as a subroutine. We use the following primitive from [ALRW17]; while this lemma is not explicitly given in [ALRW17], we show how it is obtained in Appendix A.

Lemma 3.3 (Data-independent LSH from Section 3 of [ALRW17]).

Let r=(\log n)^{-1/8} be a fixed parameter and c=1+\epsilon the approximation ratio. There is a data structure that, given a random seed s, implicitly (T can be super-polynomially large) defines two random collections of subsets A_{1},A_{2},\cdots,A_{T}\subset\mathbb{R}^{d} and B_{1},B_{2},\cdots,B_{T}\subset\mathbb{R}^{d}. For any point p, denote by \mathcal{A}_{p}=\{A_{i}:p\in A_{i}\} the set of subsets A_{i} that contain p. Similarly, define \mathcal{B}_{p}=\{B_{i}:p\in B_{i}\}. Then, for any n=\omega(1), we have:

  1. \forall p\in S^{d-1}, \operatorname*{{\mathbb{E}}}_{s}[|\mathcal{A}_{p}|]\leq n^{1/\epsilon^{2}+O(1/\epsilon)};

  2. \forall p\in S^{d-1}, \operatorname*{{\mathbb{E}}}_{s}[|\mathcal{B}_{p}|]=n^{o(1)};

  3. \forall p,q\in S^{d-1} with \|p-q\|_{2}\leq r and \forall i\in[T],

     \Pr_{s}[p\in A_{i}\mid q\in B_{i}]\geq n^{-o(1)};

  4. \forall p,q\in S^{d-1} with \|p-q\|_{2}\geq c\cdot r and \forall i\in[T],

     \Pr_{s}[p\in A_{i}\mid q\in B_{i}]\leq\frac{1}{n};

  5. \forall p\in S^{d-1}, one can find all A_{i}\in\mathcal{A}_{p} in O(|\mathcal{A}_{p}|\cdot n^{o(1)}) time, and all B_{i}\in\mathcal{B}_{p} in O(|\mathcal{B}_{p}|\cdot n^{o(1)}) time.

Description of the algorithm for the directed spanner construction. The formal algorithm is presented as Algorithm 1. The algorithm takes as input a dataset P\subseteq S^{d-1}, a distance parameter r, and an approximation factor \epsilon, and outputs a Steiner spanner graph H. It never creates a “shortcut”, i.e., \vec{\mathrm{d}}_{H}(p,q) cannot be smaller than \|p-q\|_{2}. For pairs with \|p-q\|_{2}\in[r,(1+\epsilon)r], it guarantees that \vec{\mathrm{d}}_{H}(p,q)\leq(1+\epsilon)\|p-q\|_{2}.

The algorithm consists of n^{o(1)} rounds. In each round, every pair of points p,q at distance \leq r has an n^{-o(1)} chance of being connected by a 2-hop path of total length (1+\epsilon)r. Furthermore, we avoid creating a shortcut for any pair of points. Thus, after n^{o(1)} rounds, the guarantee is satisfied with high probability.

In each round, we first arbitrarily partition all points into groups of size m=n^{\epsilon^{2}/4}. Denote the groups by P_{1},\cdots,P_{n/m}. Then, we use the Locality Sensitive Hashing from Lemma 3.3 (with approximation parameter c_{1}\leftarrow 1+\epsilon and n\leftarrow Km for some K=|P|^{o(1)}). Note that we use the same LSH function for all groups. Let \{A_{j},B_{j}\}_{j\in[T]} be the subsets generated by the LSH function. Define the sets Q_{j}\leftarrow P\cap B_{j} and P_{i,j}\leftarrow P_{i}\cap A_{j}. Intuitively, every point in Q_{j} should be close to every point in P_{i,j} for all i, because this is exactly the building and querying procedure of a nearest neighbor data structure: in the pre-processing phase, p\in P_{i,j} falls into the j-th bucket, and in the query phase, q\in Q_{j} queries the j-th bucket.

To guarantee that a given q\in Q_{j} is at distance \leq(1+\epsilon)r from all the points in P_{i,j} (which may fail to happen with non-trivial probability), we perform a Furthest Neighbor Search (FNS) check. For the latter, we use an FNS data structure, stated in the lemma below. It follows immediately from the reduction of [Ind01] to a nearest neighbor search data structure, which we instantiate with the data structure from [ALRW17].

Lemma 3.4 (FNS data structure [Ind01, ALRW17]).

There is a c-approximate Furthest Neighbor Search data structure with O(n^{1+\rho_{P}+o(1)}) preprocessing time and O(n^{\rho_{Q}+o(1)}) query time, for any \rho_{Q},\rho_{P}>0 satisfying the trade-off

c^{2}\sqrt{\rho_{Q}}+(c^{2}-1)\sqrt{\rho_{P}}=\sqrt{2c^{2}-1}.
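As a quick check (our own arithmetic, not a statement from [Ind01, ALRW17]), setting \rho_{Q}=0, which is the regime used in Algorithm 1 below, forces \rho_{P}=\Theta(1/\epsilon^{2}); the choice \rho_{P}\leftarrow 2/\epsilon^{2} made later is of the same order.

```latex
% With \rho_Q = 0 and c = 1+\epsilon, solving the trade-off for \rho_P:
(c^2-1)\sqrt{\rho_P} = \sqrt{2c^2-1}
\;\Longrightarrow\;
\rho_P = \frac{2c^2-1}{(c^2-1)^2}
       = \frac{1+4\epsilon+2\epsilon^2}{\epsilon^2(2+\epsilon)^2}
       = \Theta\!\left(\frac{1}{\epsilon^2}\right).
```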

We build an FNS data structure (Lemma 3.4) for each P_{i,j} with approximation c_{2}\leftarrow 1+\epsilon and parameters \rho_{P}\leftarrow 2/\epsilon^{2} and \rho_{Q}\leftarrow 0. For every point q\in Q_{j}, we check whether every point in P_{i,j} is, up to the c_{2} approximation, (1+\epsilon)r-close to q. If so, we can safely put q into V_{i,j}. As a result, every point q\in V_{i,j} and every point p\in P_{i,j} are at Euclidean distance \leq(1+3\epsilon)r, so we can create an extra Steiner node as the center and add a star gadget. We will prove later that if p,q are close, they fall into P_{i,j} and V_{i,j}, for some i,j, with probability n^{-o(1)}.

Algorithm 1
1: function Build Spanner(P\subseteq S^{d-1}, r,\epsilon>0)
2:     Let n=|P| and m=n^{\epsilon^{2}/4}
3:     Initialize the spanner H=(P,\emptyset) as a graph with vertex set P but no edges
4:     for it=1,2,\cdots,n^{o(1)} do    ▷ Repeat n^{o(1)} rounds to boost the success probability
5:         Arbitrarily partition P into n/m groups P_{1},\cdots,P_{n/m}, each of size m
6:         Let \{A_{j},B_{j}\}_{j\in[T]} be the LSH function stated in Lemma 3.3
7:             ▷ Plug in parameters c_{1}\leftarrow 1+\epsilon and n\leftarrow Km for some K=|P|^{o(1)}
8:         Compute Q_{j}\leftarrow P\cap B_{j} for all j\in[T]
9:         for i\in[n/m], j\in[T] do
10:             Calculate P_{i,j}\leftarrow P_{i}\cap A_{j}
11:             V_{i,j}\leftarrow\emptyset
12:             Build an FNS data structure for P_{i,j} using Lemma 3.4
13:                 ▷ Plug in parameters c_{2}\leftarrow 1+\epsilon, \rho_{P}\leftarrow 2/\epsilon^{2} and \rho_{Q}\leftarrow 0
14:             for q\in Q_{j} do
15:                 Query q in the FNS data structure for P_{i,j} to get f, the c_{2}-approximate furthest point
16:                 if \|q-f\|_{2}\leq c_{1}r then
17:                     Add q to V_{i,j}
18:                 end if
19:             end for
20:             Create a new Steiner node s_{ij} in H
21:             For all q\in V_{i,j}, add a directed edge (q,s_{ij}) with length 0 to H
22:             For all p\in P_{i,j}, add a directed edge (s_{ij},p) with length (1+3\epsilon)r to H
23:         end for
24:     end for
25:     return H
26: end function
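For intuition, the following toy sketch mirrors the structure of Algorithm 1 (groups, buckets, FNS check, star gadget). It is not the actual construction: SimHash stands in for the asymmetric LSH of Lemma 3.3, and a brute-force scan stands in for the FNS data structure of Lemma 3.4, so none of the stated asymptotics are reproduced.

```python
# Toy rendering of Algorithm 1's bucket/star logic (quadratic time, for intuition only).
import numpy as np
from collections import defaultdict

def toy_directed_spanner_one_scale(P, r, eps, rounds=5, bits=8):
    n, d = P.shape
    m = max(1, int(round(n ** (eps * eps / 4))))        # group size (line 2)
    c1 = 1 + eps
    rng = np.random.default_rng(0)
    edges, n_steiner = [], 0                            # edges: (tail, head, length)
    for _ in range(rounds):
        order = rng.permutation(n)
        groups = [order[g:g + m] for g in range(0, n, m)]   # line 5
        H = rng.standard_normal((bits, d))              # SimHash stand-in (line 6)
        keys = [tuple(row) for row in (P @ H.T > 0)]
        for group in groups:
            buckets = defaultdict(list)
            for idx in group:
                buckets[keys[idx]].append(idx)          # P_{i,j} = P_i ∩ A_j (line 10)
            for j, P_ij in buckets.items():
                # V_{i,j}: points hashing to bucket j whose furthest point of
                # P_{i,j} is within c1 * r (brute-force FNS check, lines 14-19)
                V_ij = [q for q in range(n) if keys[q] == j and
                        max(np.linalg.norm(P[q] - P[p]) for p in P_ij) <= c1 * r]
                if not V_ij:
                    continue
                s = ('steiner', n_steiner)              # line 20
                n_steiner += 1
                edges += [(q, s, 0.0) for q in V_ij]                   # line 21
                edges += [(s, p, (1 + 3 * eps) * r) for p in P_ij]     # line 22
    return edges
```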

3.2 One scale correctness: Proof of Lemma 3.2

We now prove Lemma 3.2. We first analyze the sizes of the sets P_{i,j} and Q_{j}, and then bound the size of the spanner.

Observation 3.5.

Let P_{i,j} and Q_{j} be defined as in Algorithm 1. We have:

  1. for all i,j, P_{i,j}\subseteq P_{i} and |P_{i}|=m;

  2. \sum_{j}|Q_{j}|=n^{1+o(1)};

  3. \sum_{i,j}\operatorname*{{\mathbb{E}}}[|P_{i,j}|]=n^{5/4+O(\epsilon)};

  4. \forall j, \sum_{i}|P_{i,j}|\leq n;

  5. i\in[n/m] and m=n^{\epsilon^{2}/4}.

Proof.

We need to prove (2), (3), and (4). For (2) and (3), by Lemma 3.3, each q\in P is mapped to (Km)^{o(1)}=n^{o(1)} different Q_{j}'s, and each p\in P_{i} is mapped to (Km)^{1/\epsilon^{2}+O(1/\epsilon)}=n^{1/4+O(\epsilon)} different P_{i,j}'s. For (4), we note that, for every fixed j, any p\in P can appear in P_{i,j} for at most one value of i. ∎

Proof of Lemma 3.2.

We prove the lemma in four parts: the graph output by our algorithm satisfies both properties (1) and (2), and the number of edges and the running time are both upper bounded by n^{2-\Omega(\epsilon^{2})}. For simplicity, we run Algorithm 1 with \epsilon=\frac{1}{3}\epsilon_{\mathrm{input}}. The lemma then follows by substitution.

Property 1: \forall p,q\in P, \vec{\mathrm{d}}_{H}(p,q)\geq\|p-q\|_{2}.

The only situation in which we add edges is when creating paths between P_{i,j} and V_{i,j}, and all such paths have length (1+3\epsilon)r. Moreover, a point q enters V_{i,j} only after it passes the FNS test (Line 16), which means every point in P_{i,j} is within distance c_{1}c_{2}r<(1+3\epsilon)r of q. So property (1) holds.

Property 2: \forall p,q\in P with \|p-q\|_{2}\leq r, there is a 2-hop directed path in H from p to q of length \leq(1+\epsilon)r.

Let p,q\in P be any pair of points with \|p-q\|_{2}\leq r. Let us compute the probability of connecting q to p with a path in one round.

Suppose p is placed in group P_{i}. We calculate the probability that p\in P_{i,j} and q\in V_{i,j} for some j\in[T]. Define F:=\{f\in P_{i}\mid\|q-f\|_{2}>(1+\epsilon)r\} as the set of far points of q. If F\cap A_{j}=\emptyset while q is included in B_{j}, then q will pass the FNS test and thus be included in V_{i,j}.

By Lemma 3.3 and the choice of our parameters (c_{1}\leftarrow 1+\epsilon and n\leftarrow Km), we have, for all j\in[T],

\Pr[p\in A_{j}\mid q\in B_{j}]\geq(Km)^{-o(1)}

and

\Pr[f\in A_{j}\mid q\in B_{j}]\leq 1/(Km).

Since |F|\leq|P_{i}|=m, by a union bound over all f\in F, we can bound the probability that the bad event happens:

\Pr[\exists f\in F\text{ s.t. }f\in A_{j}\mid q\in B_{j}]\leq 1/K.

Thus, by choosing a proper K=n^{o(1)}, we have

\Pr[q\in V_{i,j}\text{ and }p\in A_{j}\mid q\in B_{j}]\geq(Km)^{-o(1)}-1/K=(Km)^{-o(1)}(1-o(1)).

Also, by Lemma 3.3, we have

|\{j\in[T]:q\in B_{j}\}|=(Km)^{o(1)}\geq 1.

Thus, with probability \geq n^{-o(1)}, we have p\in P_{i,j} and q\in V_{i,j} for some i,j, and hence the spanner has a 2-hop path from q to p, via s_{ij}, of length (1+3\epsilon)r. By repeating for n^{o(1)} rounds, q connects to p with high probability.

Bounding the number of edges. In one round, the total number of edges added from S to P is at most \sum_{i,j}|P_{i,j}|\leq n^{5/4+O(\epsilon)}, where the last step is by Observation 3.5. So, over all rounds, the total number of edges from S to P is at most n^{5/4+O(\epsilon)}.

On the other hand, the total number of edges from Q=P to S is at most

n^{o(1)}\cdot\sum_{i,j}|V_{i,j}|\leq n^{o(1)}\cdot\frac{n}{m}\cdot\sum_{j}|Q_{j}|=n^{2+o(1)}/m=n^{2-\Omega(\epsilon^{2})},

where the first inequality is because V_{i,j}\subseteq Q_{j}, and the second step is by \sum_{j}|Q_{j}|=n^{1+o(1)} (Observation 3.5).

Thus, |E|\leq n^{2-\Omega(\epsilon^{2})}.

Running time. By Lemma 3.3, we can find all A_{j} that contain p, over all p\in P, in n\cdot(Km)^{1/\epsilon^{2}+O(1/\epsilon)}=n^{5/4+O(\epsilon)} time, and all B_{j} that contain q, over all q\in P, in n\cdot(Km)^{o(1)}=n^{1+o(1)} time.

The major running time cost is in building and querying the FNS data structures on the sets P_{i,j}. We use Lemma 3.4 with \rho_{P}=2/\epsilon^{2} and \rho_{Q}=0 (see Line 13).

In the preprocessing phase, building the FNS data structures for all the sets P_{i,j} takes \sum_{i,j}|P_{i,j}|^{\rho_{P}} time, which is bounded by

\sum_{i,j}|P_{i,j}|^{\rho_{P}}\leq\Big(\sum_{i,j}|P_{i,j}|\Big)\cdot\max_{i,j}|P_{i,j}|^{\rho_{P}}\leq n^{5/4+O(\epsilon)}\cdot m^{\rho_{P}}=n^{7/4+O(\epsilon)},

where the second step is because \sum_{i,j}|P_{i,j}|=n^{5/4+O(\epsilon)} and |P_{i,j}|\leq|P_{i}|=m, and the last step is by our choice of m=n^{\epsilon^{2}/4}.

In the query phase, for every point qPq\in P, if qq is in QjQ_{j}, then qq needs to query the FNS data structure built on Pi,jP_{i,j} for every i[n/m]i\in[n/m]. Since qq is in at most no(1)n^{o(1)} different QjQ_{j}’s, with our choice of ρQ=0\rho_{Q}=0, the query time cost for each point qPq\in P is at most i[n/m],j:qQj|Pi,j|ρQ=n1+o(1)/m\sum_{i\in[n/m],j:q\in Q_{j}}|P_{i,j}|^{\rho_{Q}}=n^{1+o(1)}/m. Thus, the total query time over all qPq\in P, and hence the overall running time, is

nn1+o(1)/m=n2Ω(ϵ2).n\cdot n^{1+o(1)}/m=n^{2-\Omega(\epsilon^{2})}.

4 Undirected Non-metric Steiner Spanners

In this section, we show how to turn the directed spanner from Theorem 3.1 into an undirected spanner, with some loss in its size. Here is the formal statement:

Theorem 4.1 (Restatement of Theorem 1.5).

Fix ϵ(0,1)\epsilon\in(0,1). For any dataset PdP\in\mathbb{R}^{d}, we can construct an undirected graph HH on vertices PSP\cup S, such that for any p,qPp,q\in P,

pq2dH(p,q)<(1+ϵ)pq2.\|p-q\|_{2}\leq\mathrm{d}_{H}(p,q)<(1+\epsilon)\cdot\|p-q\|_{2}.

Furthermore, the construction time, the number of edges of HH and the size of SS are bounded by O(n2Ω(ϵ3)logΔ)O(n^{2-\Omega(\epsilon^{3})}\log\Delta), where Δ\Delta is the aspect ratio.

The proof of this theorem starts with the directed spanner constructed in the previous section, modifying it further. We note that the directed edges appear in line 20 to line 22 in Algorithm 1, where we link all qVi,jq\in V_{i,j} to Steiner point sijs_{ij} with edge length (1+ϵ)r(1+\epsilon)r and link sijs_{ij} to all pPi,jp\in P_{i,j} with edge length 0. If these directed edges lose their direction, they will potentially create shortcuts between p,pPi,jp,p^{\prime}\in P_{i,j} or q,qVi,jq,q^{\prime}\in V_{i,j} (depending on how we would assign the weights to these edges).
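For reference, the directed gadget just described amounts to only a few lines of code. Below is a minimal Python sketch over a plain adjacency-list dictionary; the graph representation and the function name are ours, purely for illustration, and are not part of Algorithm 1.

```python
from collections import defaultdict

def add_directed_gadget(graph, V_ij, P_ij, r, eps, steiner_id):
    """Sketch of the directed gadget of lines 20-22 (Algorithm 1): every q in
    V_{i,j} gets an edge of length (1+eps)*r to the Steiner node s_{ij}, and
    s_{ij} gets an edge of length 0 to every p in P_{i,j}.  `graph` is a plain
    adjacency dict {node: [(neighbor, weight), ...]}."""
    s = ("s", steiner_id)
    for q in V_ij:
        graph[q].append((s, (1.0 + eps) * r))   # q -> s_{ij}, length (1+eps)r
    for p in P_ij:
        graph[s].append((p, 0.0))               # s_{ij} -> p, length 0
    return s

# usage: H = defaultdict(list); add_directed_gadget(H, ["q1", "q2"], ["p1"], r=1.0, eps=0.1, steiner_id=(4, 7))
```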

We show how to replace the above (directed) gadget with a more involved (undirected) graph. For now, we focus on one (Vi,j,Pi,j)(V_{i,j},P_{i,j}) pair. The high-level idea is to partition Vi,jV_{i,j} and Pi,jP_{i,j} into clusters of bounded diameter aa and bb, respectively, with a+b=2(1+ϵ)ra+b=2(1+\epsilon)r. Then, for each pair of clusters VijVi,j,PijPijV_{ij}^{\prime}\subset V_{i,j},P_{ij}^{\prime}\subset P_{ij}, we add a Steiner node, connecting it to each qVi,jq\in V_{i,j}^{\prime} with an edge of length a/2a/2, and to each point of Pi,jP_{i,j}^{\prime} with an edge of length b/2b/2. In this way, we connect Vi,jV_{i,j} and Pi,jP_{i,j} with paths of length a/2+b/2=(1+ϵ)ra/2+b/2=(1+\epsilon)r without shortcutting any pair of points. As long as the number of clusters is |Vij|,|Pij|\ll|V_{ij}|,|P_{ij}|, the number of edges is |Vi,j||Pi,j|\ll|V_{i,j}|\cdot|P_{i,j}|.

We start with some definitions.

Definition 4.2 (Average square distance).

For a point set AA, define avg(A)=𝔼p,qA[pq22]\operatorname{avg}(A)=\sqrt{\operatorname*{{\mathbb{E}}}_{p,q\in A}[\|p-q\|_{2}^{2}]}. Define avgmax(A)=maxSAavg(S)\operatorname{avg}_{\max}(A)=\max_{S\subseteq A}\operatorname{avg}(S).

Remark 4.3.

When SS has only two points x,yx,y, 𝔼p,qS[pq22]xy22\operatorname*{{\mathbb{E}}}_{p,q\in S}[\|p-q\|_{2}^{2}]\neq\|x-y\|_{2}^{2}, since with probability 1/21/2 we have p=q=xp=q=x or p=q=yp=q=y. Thus, avgmax2(A)\operatorname{avg}_{\max}^{2}(A) can be smaller than maxp,qApq22\max_{p,q\in A}\|p-q\|_{2}^{2}.

The average square distance has the following properties.

Claim 4.4.

If two sets of points A,BA,B satisfy maxpA,qBpq2r\max_{p\in A,q\in B}\|p-q\|_{2}\leq r, then avg(A)+avg(B)2r\operatorname{avg}(A)+\operatorname{avg}(B)\leq 2r.

Proof.

The proof follows from non-embeddability of the Kn,nK_{n,n} metric into the square of 2\ell_{2}. In particular, consider the following negative-type inequality applied to A,BA,B (see, e.g., [Sei91] or [Sch35]): for any real ap,bqa_{p},b_{q} with pAap+qBbq=0\sum_{p\in A}a_{p}+\sum_{q\in B}b_{q}=0, we must have:

\sum_{p,p^{\prime}\in A}a_{p}a_{p^{\prime}}\|p-p^{\prime}\|_{2}^{2}+\sum_{q,q^{\prime}\in B}b_{q}b_{q^{\prime}}\|q-q^{\prime}\|_{2}^{2}+2\sum_{p\in A,q\in B}a_{p}b_{q}\|p-q\|_{2}^{2}\leq 0.

Now set ap=1/|A|a_{p}=1/|A| and bq=1/|B|b_{q}=-1/|B|. The inequality from above becomes avg2(A)+avg2(B)2𝔼pA,qB[pq22]2r2\operatorname{avg}^{2}(A)+\operatorname{avg}^{2}(B)\leq 2\operatorname*{{\mathbb{E}}}_{p\in A,q\in B}[\|p-q\|_{2}^{2}]\leq 2r^{2}. Thus, avg(A)+avg(B)22r2=2r\operatorname{avg}(A)+\operatorname{avg}(B)\leq\sqrt{2\cdot 2r^{2}}=2r. ∎

Note that Claim 4.4 is also true for any subset of AA and BB. Thus, we have the following proposition.

Proposition 4.5.

If two sets of points A,BA,B satisfy maxpA,qBpq2r\max_{p\in A,q\in B}\|p-q\|_{2}\leq r, then avgmax(A)+avgmax(B)2r\operatorname{avg}_{\max}(A)+\operatorname{avg}_{\max}(B)\leq 2r.
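As a quick numerical sanity check of Claim 4.4 and Proposition 4.5 (illustrative only, not used anywhere in the construction), the following Python snippet samples two point sets whose cross distances are at most r and verifies that avg(A)+avg(B) ≤ 2r.

```python
import numpy as np

def avg(X):
    """avg(X) = sqrt(E_{p,q in X} ||p - q||^2), expectation over ordered pairs (Def. 4.2)."""
    diffs = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.mean(np.sum(diffs ** 2, axis=-1)))

rng = np.random.default_rng(0)
d, r = 20, 1.0

def random_ball(n, radius):
    """n random points in the ball of radius `radius` centered at the origin."""
    g = rng.normal(size=(n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)          # random directions
    return g * radius * rng.random((n, 1)) ** (1.0 / d)    # random radii

# A and B both lie in a ball of radius r/2, so every cross pair is r-close.
A, B = random_ball(30, r / 2), random_ball(40, r / 2)
assert np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1).max() <= r
print(avg(A) + avg(B), "<=", 2 * r)   # Claim 4.4 guarantees this holds
```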

Definition 4.6 (Low diameter decomposition).

Let AA be a point set. We say A1,,ATA_{1},\cdots,A_{T} is an rr-diameter decomposition of AA, if they form a partition of AA, and every AiA_{i} has diameter at most rr.

Lemma 4.7.

Fix r,ϵ>0r,\epsilon>0. Suppose we have two point sets A,BA,B with maxpA,qBpq2r\max_{p\in A,q\in B}\|p-q\|_{2}\leq r. Let x,y>0x,y>0 be any real number such that x+y2(1+ϵ)rx+y\leq 2(1+\epsilon)r. Given A1,,AkA_{1},\cdots,A_{k}, an xx-diameter decomposition of AA, and B1,BlB_{1},\cdots B_{l}, a yy-diameter decomposition of BB, we can build an undirected graph HH such that for dH\mathrm{d}_{H} the shortest path distance in HH:

  • for all pAp\in A, qBq\in B, dH(p,q)(1+ϵ)r\mathrm{d}_{H}(p,q)\leq(1+\epsilon)r;

  • for all p,qABp,q\in A\cup B, dH(p,q)pq2\mathrm{d}_{H}(p,q)\geq\|p-q\|_{2},

and the size of HH is O(|A|l+k|B|)O(|A|\cdot l+k\cdot|B|). The construction time is O(|H|)O(|H|).

Proof.

Choose any x0xx_{0}\geq x and y0yy_{0}\geq y such that x0+y0=2(1+ϵ)rx_{0}+y_{0}=2(1+\epsilon)r.

For every AiA_{i} and BjB_{j}, we create a Steiner point si,js_{i,j}. For every pAip\in A_{i}, add edge (p,si,j)(p,s_{i,j}) with length x0/2x_{0}/2. For every qBjq\in B_{j}, add edge (q,si,j)(q,s_{i,j}) with length y0/2y_{0}/2. Now, every p1,p2Aip_{1},p_{2}\in A_{i} are connected with distance x0x_{0}, every q1,q2Bjq_{1},q_{2}\in B_{j} are connected by y0y_{0}, and every pAi,qBjp\in A_{i},q\in B_{j} are connected by (x0+y0)/2=(1+ϵ)r(x_{0}+y_{0})/2=(1+\epsilon)r. Because AiA_{i} has diameter at most xx0x\leq x_{0}, BjB_{j} has diameter at most yy0y\leq y_{0}, the constraint dH(p,q)pq2\mathrm{d}_{H}(p,q)\geq\|p-q\|_{2} always holds throughout this procedure.

The number of edges is at most i[k],j[l](|Ai|+|Bj|)=|A|l+k|B|\sum_{i\in[k],j\in[l]}(|A_{i}|+|B_{j}|)=|A|\cdot l+k\cdot|B|. ∎
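In code, the gadget of Lemma 4.7 is equally simple; the following Python sketch (our own illustration, returning a plain undirected edge list) adds one Steiner node per cluster pair and assigns the edge lengths x0/2 and y0/2 from the proof.

```python
def cluster_gadget(A_clusters, B_clusters, r, eps, x, y):
    """Sketch of the undirected gadget in the proof of Lemma 4.7.

    A_clusters / B_clusters: lists of clusters (lists of point ids) forming an
    x-diameter decomposition of A and a y-diameter decomposition of B, with
    x + y <= 2*(1+eps)*r.  Returns an undirected weighted edge list."""
    assert x + y <= 2 * (1 + eps) * r
    slack = 2 * (1 + eps) * r - (x + y)        # stretch (x, y) to (x0, y0)
    x0, y0 = x + slack / 2.0, y + slack / 2.0  # x0 >= x, y0 >= y, x0 + y0 = 2(1+eps)r
    edges = []
    for i, Ai in enumerate(A_clusters):
        for j, Bj in enumerate(B_clusters):
            s = ("s", i, j)                           # Steiner node for (A_i, B_j)
            edges += [(p, s, x0 / 2.0) for p in Ai]   # within A_i: distance x0 >= true distance
            edges += [(q, s, y0 / 2.0) for q in Bj]   # cross pairs: (x0 + y0)/2 = (1+eps)r
    return edges
```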

With the above lemma, it remains to show how to construct small low-diameter decompositions, which we will do next.

4.1 Computing low diameter decomposition

We now show that we can compute efficient low-diameter decompositions to be used in Lemma 4.7. We note that we use xavgmax(A)x\approx\operatorname{avg}_{\max}(A) and yavgmax(B)y\approx\operatorname{avg}_{\max}(B). Hence we need an efficient algorithm to compute an avgmax(A)\operatorname{avg}_{\max}(A)-diameter decomposition for any point set AA. We will use the notions of rr-close and rr-far. Namely, if two points p,qp,q have distance r\leq r, we say they are an rr-close pair. If their distance is >r>r, we say they are an rr-far pair.

This section gives an important subroutine for the decomposition, presented as Algorithm 2. The input set PP has the guarantee that at least |P|2t|P|^{2-t} pairs of points are aa-close. The algorithm will then output disjoint clusters P1,,PTPP_{1},\cdots,P_{T}\subseteq P, in which every PiP_{i} has diameter at most (1+ϵ)a(1+\epsilon)a, and Pi\cup P_{i} contains a significant portion of PP. Here cc is a parameter that the algorithm can choose from (0,0.2](0,0.2].

Algorithm 2
1:function Extract Clusters(PSd1P\subseteq S^{d-1}, a>0a>0, ϵ>0\epsilon>0, c>0c>0, t>0t>0)
2:     \triangleright Input guarantee: at least |P|2t|P|^{2-t} pairs of points are aa-close.
3:     Let n=|P|n=|P|.
4:     Let hh be a random (a,(1+ϵ)a,p1,p2)(a,(1+\epsilon)a,p_{1},p_{2})-LSH function, with p1=ncp_{1}=n^{-c}, p2=n(1+ϵ)cp_{2}=n^{-(1+\epsilon)c}.
5:     Partition PP into 1/p21/p_{2} buckets using hh.
6:     \triangleright See the proof of Lemma 4.8 for the reason why there are 1/p2\leq 1/p_{2} buckets.
7:     For each bucket ii, let Ch,iC_{h,i} be the number of aa-close pairs in this bucket. Similarly, let Fh,iF_{h,i} be the number of (1+ϵ)a(1+\epsilon)a-far pairs.
8:     if i\exists i, such that Ch,ip1p24n2tC_{h,i}\geq\frac{p_{1}p_{2}}{4}n^{2-t} and Fh,i/Ch,i16p2p1ntF_{h,i}/C_{h,i}\leq 16\frac{p_{2}}{p_{1}}n^{t} then
9:         Let BiB_{i} be all points in bucket ii.
10:         repeat
11:              randomly pick k=n(ϵct)/2/10k=n^{(\epsilon c-t)/2}/10 points from BiB_{i}.
12:              if they are pairwise (1+ϵ)a(1+\epsilon)a-close then
13:                  form a cluster with them and delete them from BiB_{i}
14:              end if
15:         until formed Ch,i/(10nk)C_{h,i}/(10nk) clusters.
16:     else
17:         Go back to Line 4
18:     end if
19:     return all formed clusters
20:end function
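Before analyzing Algorithm 2, here is an unoptimized Python sketch of it. The hash family lsh_family and the metric dist are assumed primitives (they are not specified by the algorithm itself), and close/far pairs are counted exactly for readability, whereas the analysis below replaces the exact counts by sampling estimates.

```python
import itertools, random

def extract_clusters(P, a, eps, c, t, lsh_family, dist):
    """Sketch of Extract Clusters (Algorithm 2).  `P` is a list of points,
    `dist(p, q)` their distance, and `lsh_family(a, eps, n, c)` is assumed to
    return a random integer-valued hash that collides a-close pairs with
    probability >= n^-c and (1+eps)a-far pairs with probability <= n^-(1+eps)c."""
    n = len(P)
    p1, p2 = n ** (-c), n ** (-(1 + eps) * c)
    k = max(1, int(n ** ((eps * c - t) / 2) / 10))
    while True:                                    # retry with a fresh hash (line 17)
        h = lsh_family(a, eps, n, c)
        buckets = {}
        for idx, p in enumerate(P):                # at most ~1/p2 buckets
            buckets.setdefault(h(p) % max(1, int(1 / p2)), []).append(idx)
        for bucket in buckets.values():
            pairs = list(itertools.combinations(bucket, 2))
            C = sum(dist(P[i], P[j]) <= a for i, j in pairs)
            F = sum(dist(P[i], P[j]) > (1 + eps) * a for i, j in pairs)
            if C >= (p1 * p2 / 4) * n ** (2 - t) and F <= 16 * (p2 / p1) * n ** t * C:
                clusters, B = [], list(bucket)
                target = int(C / (10 * n * k))
                while len(clusters) < target and len(B) >= k:
                    S = random.sample(B, k)        # k random points (line 11)
                    if all(dist(P[i], P[j]) <= (1 + eps) * a
                           for i, j in itertools.combinations(S, 2)):
                        clusters.append(S)         # form a cluster (line 13)
                        B = [i for i in B if i not in set(S)]
                return clusters                    # clusters of indices into P
```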

The guarantees of the Algorithm 2 are captured by the following lemma.

Lemma 4.8 (cluster extraction).

Fix an nn-point set PdP\subset\mathbb{R}^{d}. Let ϵ(0,2)\epsilon\in(0,2) be the approximation parameter. Let a>0a>0 be a parameter such that there are at least n2tn^{2-t} aa-close pairs, where t=O(ϵ)t=O(\epsilon).

For any parameter c(0,0.2]c\in(0,0.2] with ϵc>t\epsilon c>t, Algorithm 2 outputs disjoint clusters each with size Ω(n(ϵct)/2)\Omega(n^{(\epsilon c-t)/2}). Each cluster has diameter (1+ϵ)a(1+\epsilon)a, i.e., all its points are (1+ϵ)a(1+\epsilon)a-close. Moreover, the total number of points in all clusters is Ω(n12cϵct)\Omega(n^{1-2c-\epsilon c-t}). Algorithm 2 can be implemented to run in O(n1+c+t)O(n^{1+c+t}) expected time.

Proof.

In our algorithm, we pick a random (a,(1+ϵ)a,p1,p2)(a,(1+\epsilon)a,p_{1},p_{2})-LSH function with p1=1/ncp_{1}=1/n^{c} and p2=1/n(1+ϵ)cp_{2}=1/n^{(1+\epsilon)c}. We note that an (a,(1+ϵ)a,p1,p2)(a,(1+\epsilon)a,p_{1},p_{2})-LSH function maps an aa-close pair into the same bucket with probability p1\geq p_{1}, and maps a (1+ϵ)a(1+\epsilon)a-far pair into the same bucket with probability p2\leq p_{2}.

We can assume, wlog, that the number of buckets is at most B=1/p2B=1/p_{2}, since otherwise, we can randomly group buckets into 1/p21/p_{2} buckets, increasing p2p_{2} by at most a factor of 22. The expected number of (1+ϵ)a(1+\epsilon)a-far pairs that collide is χFp2n2\chi_{F}\leq p_{2}n^{2}, and the expected number of colliding aa-close pairs is χCp1n2t\chi_{C}\geq p_{1}n^{2-t}.

For any such LSH function hh, we denote FhF_{h} and ChC_{h} as the number of colliding far and close pairs, respectively. We use 𝟙[A]=1{\mathbb{1}}\left[{A}\right]=1 if AA is true, otherwise 𝟙[A]=0{\mathbb{1}}\left[{A}\right]=0.

Since 𝔼[Ch]=χC\operatorname*{{\mathbb{E}}}[C_{h}]=\chi_{C}, we have

𝔼[Ch𝟙[Ch12χC]]12χC.\operatorname*{{\mathbb{E}}}[C_{h}\cdot{\mathbb{1}}\left[{C_{h}\geq\frac{1}{2}\chi_{C}}\right]]\geq\frac{1}{2}\chi_{C}.

Also,

χF=𝔼[Fh]𝔼[ChFhCh𝟙[FhCh4χFχC]]𝔼[Ch𝟙[Ch12χC and FhCh4χFχC]]4χFχC.\chi_{F}=\operatorname*{{\mathbb{E}}}[F_{h}]\geq\operatorname*{{\mathbb{E}}}[C_{h}\cdot\frac{F_{h}}{C_{h}}\cdot{\mathbb{1}}\left[{\frac{F_{h}}{C_{h}}\geq\frac{4\chi_{F}}{\chi_{C}}}\right]]\geq\operatorname*{{\mathbb{E}}}[C_{h}\cdot{\mathbb{1}}\left[{C_{h}\geq\frac{1}{2}\chi_{C}\text{\leavevmode\nobreak\ and\leavevmode\nobreak\ }\frac{F_{h}}{C_{h}}\geq\frac{4\chi_{F}}{\chi_{C}}}\right]]\cdot\frac{4\chi_{F}}{\chi_{C}}.

Combining the above two inequalities, we get

𝔼[Ch𝟙[Ch12χC and FhCh4χFχC]]12χCχFχC4χF14χC.\operatorname*{{\mathbb{E}}}[C_{h}\cdot{\mathbb{1}}\left[{C_{h}\geq\frac{1}{2}\chi_{C}\text{\leavevmode\nobreak\ and\leavevmode\nobreak\ }\frac{F_{h}}{C_{h}}\leq\frac{4\chi_{F}}{\chi_{C}}}\right]]\geq\frac{1}{2}\chi_{C}-\chi_{F}\cdot\frac{\chi_{C}}{4\chi_{F}}\geq\frac{1}{4}\chi_{C}.

Since Chn2C_{h}\leq n^{2}, we get

Pr[Ch12χC and Ch/Fh14χC/χF]14χC/n2.\Pr[C_{h}\geq\frac{1}{2}\chi_{C}\text{\leavevmode\nobreak\ and\leavevmode\nobreak\ }C_{h}/F_{h}\geq\frac{1}{4}\chi_{C}/\chi_{F}]\geq\frac{1}{4}\chi_{C}/n^{2}.

Furthermore, there must exist some bucket ii such that: 1) the number of close pairs colliding in bucket ii is Ch,iCh/2BC_{h,i}\geq C_{h}/2B, and 2) the number Fh,iF_{h,i} of colliding far pairs satisfies that Ch,i/Fh,i14Ch/FhC_{h,i}/F_{h,i}\geq\frac{1}{4}C_{h}/F_{h}.

This means, with probability 14χC/n2p1nt/4\geq\frac{1}{4}\chi_{C}/n^{2}\geq p_{1}n^{-t}/4, we succeed at the if-check in line 8, because we have a bucket with Ch,iCh/2Bp1p24n2tC_{h,i}\geq C_{h}/2B\geq\tfrac{p_{1}p_{2}}{4}n^{2-t} and Fh,i/Ch,i16χF/χC16p2p1ntF_{h,i}/C_{h,i}\leq 16\chi_{F}/\chi_{C}\leq 16\frac{p_{2}}{p_{1}}n^{t}. Pick k=p1nt/(160p2)=Θ(n(ϵct)/2)1k=\sqrt{p_{1}n^{-t}/(160p_{2})}=\Theta(n^{(\epsilon c-t)/2})\geq 1 points at random from this bucket ii, and call them AA. Then, for set FF defined as the (1+ϵ)a(1+\epsilon)a-far pairs in BiB_{i}, we have 𝔼[|A2F|]16p2p1ntk2<1/10\operatorname*{{\mathbb{E}}}[|A^{2}\cap F|]\leq 16\frac{p_{2}}{p_{1}}n^{t}\cdot k^{2}<1/10. Thus, with 9/10\geq 9/10 probability, kk points are pairwise (1+ϵ)a(1+\epsilon)a-close and they are extracted in line 13. Note that each extraction will decrease Ch,iC_{h,i} by at most nknk, so we can extract Ch,i/(10nk)C_{h,i}/(10nk) times safely as the same analysis holds.

Running time. We check line 8 O(nt/p1)O(n^{t}/p_{1}) times in expectation. Each time we use O(n)O(n) time to implement the hashing. In each check, instead of spending n2n^{2} time to compute Ch,iC_{h,i} and Fh,iF_{h,i} exactly, we estimate them by sampling: O(ntlognp1p2)O(\frac{n^{t}\log n}{p_{1}p_{2}}) samples suffice to approximate Ch,iC_{h,i} within a 1+o(1)1+o(1) factor w.h.p., and O(lognp22)O(\frac{\log n}{p_{2}^{2}}) samples suffice for Fh,iF_{h,i}.

Total running time would be

O(ntp1(n+B(ntlognp1p2+lognp22))+Ch,ik210nk)=\displaystyle O\left(\frac{n^{t}}{p_{1}}\Big{(}n+B\big{(}\frac{n^{t}\log n}{p_{1}p_{2}}+\frac{\log n}{p_{2}^{2}}\big{)}\Big{)}+\frac{C_{h,i}k^{2}}{10nk}\right)= O(n1+c+t+n4c+2ϵc+2t+o(1)+n4c+3ϵc+t+o(1)+n12c)\displaystyle\leavevmode\nobreak\ O(n^{1+c+t}+n^{4c+2\epsilon c+2t+o(1)}+n^{4c+3\epsilon c+t+o(1)}+n^{1-2c})
=\displaystyle= O(n1+c+t).\displaystyle\leavevmode\nobreak\ O(n^{1+c+t}).

4.2 Proof of Theorem 4.1

The last step is to compute small-diameter decomposition for sets Vij,PijV_{ij},P_{ij}. In particular, we will use Algorithm 2 to compute (avgmax(A)+ϵdia(A))(\operatorname{avg}_{\max}(A)+\epsilon\cdot\operatorname{dia}(A))-diameter decomposition of a set AA, where dia(A)\operatorname{dia}(A) is the diameter of AA.

We accomplish this in the next three lemmas. Lemma 4.9 shows that at least an ϵ\epsilon fraction of the pairs inside a set AA are at distance at most (1+ϵ)avg(A)(1+\epsilon)\operatorname{avg}(A). Lemma 4.10 shows a decomposition of a set AA into clusters with diameter (avgmax(A)+ϵdia(A))\leq(\operatorname{avg}_{\max}(A)+\epsilon\cdot\operatorname{dia}(A)). Lemma 4.11 shows how to decompose specifically Vi,jV_{i,j} from Algorithm 1.

Lemma 4.9.

Given a point set AA and ϵ(0,1/2)\epsilon\in(0,1/2), let the average distance be as defined in Def. 4.2, i.e., avg(A)=𝔼p,qA[pq22]\operatorname{avg}(A)=\sqrt{\operatorname*{{\mathbb{E}}}_{p,q\in A}[\|p-q\|_{2}^{2}]}. Then, there are at least |A|2ϵ|A|^{2}\epsilon pairs p,qAp,q\in A such that pq2(1+ϵ)avg(A)\|p-q\|_{2}\leq(1+\epsilon)\operatorname{avg}(A).

Proof.

By Markov’s inequality, Prp,qA[pq22>(1+ϵ)2avg2(A)]avg2(A)(1+ϵ)2avg2(A)1ϵ\Pr_{p,q\in A}[\|p-q\|_{2}^{2}>(1+\epsilon)^{2}\operatorname{avg}^{2}(A)]\leq\frac{\operatorname{avg}^{2}(A)}{(1+\epsilon)^{2}\operatorname{avg}^{2}(A)}\leq 1-\epsilon, where the last inequality uses ϵ(0,1/2)\epsilon\in(0,1/2). Thus, at least an ϵ\epsilon fraction of the pairs p,qAp,q\in A have pq2(1+ϵ)avg(A)\|p-q\|_{2}\leq(1+\epsilon)\operatorname{avg}(A). ∎

Lemma 4.10.

Fix ϵ>0\epsilon>0. Consider a set AA of size n=ω(1/ϵ)n=\omega(1/\epsilon). We can construct A1,,ATA_{1},\cdots,A_{T}, which is an (avgmax(A)+ϵdia(A))(\operatorname{avg}_{\max}(A)+\epsilon\cdot\operatorname{dia}(A))-diameter decomposition of AA, in O(n1.4)O(n^{1.4}) time. Furthermore, we have Tn1Ω(ϵ)T\leq n^{1-\Omega(\epsilon)}.

Proof.

Since every p,qAp,q\in A has pq2dia(A)\|p-q\|_{2}\leq\operatorname{dia}(A), we can compute a quantity avg^(A)\widehat{\operatorname{avg}}(A) in poly(1/ϵ,logn)\operatorname{poly}(1/\epsilon,\log n) time satisfying avg(A)avg^(A)avg(A)+ϵdia(A)\operatorname{avg}(A)\leq\widehat{\operatorname{avg}}(A)\leq\operatorname{avg}(A)+\epsilon\cdot\operatorname{dia}(A) (simply by computing an empirical average of a number of sampled pairs).

By Lemma 4.9, there are at least n2ϵn^{2}\epsilon number of (1+ϵ)avg^(A)(1+\epsilon)\widehat{\operatorname{avg}}(A)-close pairs. Let t=o(1)t=o(1) so that n2ϵ=n2tn^{2}\epsilon=n^{2-t}. Fix c=0.05c=0.05. Run Extract Clusters(A,a(1+ϵ)avg^(A),ϵ,c0.05,t)\textsc{Extract Clusters}(A,a\leftarrow(1+\epsilon)\widehat{\operatorname{avg}}(A),\epsilon,c\leftarrow 0.05,t). By Lemma 4.8, we extract Ω(n12cϵct)\Omega(n^{1-2c-\epsilon c-t}) points from AA and that takes O(n1+c+t)O(n^{1+c+t}) time.

Denote the remaining part of AA as AA^{\prime}. Now, iteratively recalculate avg^(A)\widehat{\operatorname{avg}}(A^{\prime}) and run Extract Clusters on AA^{\prime} until no points remain. Denote by A1,,ATA_{1},\cdots,A_{T} the final partition of AA. By Lemma 4.8, TT is at most lognnn(ϵct)/2=n1Ω(ϵ)\log n\cdot\frac{n}{n^{(\epsilon c-t)/2}}=n^{1-\Omega(\epsilon)}, since each AiA_{i} has the size lower bound from Lemma 4.8. Moreover, every AiA_{i} has diameter at most (1+ϵ)(avgmax(A)+ϵdia(A))avgmax(A)+3ϵdia(A)(1+\epsilon)(\operatorname{avg}_{\max}(A)+\epsilon\cdot\operatorname{dia}(A))\leq\operatorname{avg}_{\max}(A)+3\epsilon\cdot\operatorname{dia}(A). The total running time is O(lognnn12cϵctn1+c+t)=O(n1.4)O(\log n\cdot\frac{n}{n^{1-2c-\epsilon c-t}}n^{1+c+t})=O(n^{1.4}). Replace ϵ\epsilon with ϵ/3\epsilon/3 to conclude the lemma. ∎
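For concreteness, the loop in this proof can be sketched as follows (Python, reusing the extract_clusters sketch given after Algorithm 2 and estimating avg by sampling; the singleton fallbacks are only there to guarantee termination of the sketch).

```python
import math, random

def avg_estimate(A, dist, samples=2000):
    """Empirical estimate of avg(A) (Def. 4.2) over sampled ordered pairs."""
    pairs = [(random.choice(A), random.choice(A)) for _ in range(samples)]
    return math.sqrt(sum(dist(p, q) ** 2 for p, q in pairs) / samples)

def low_diameter_decomposition(A, eps, dist, lsh_family):
    """Sketch of the loop in the proof of Lemma 4.10: repeatedly estimate the
    average distance of the remaining points and peel clusters off with
    extract_clusters (previous sketch, returning clusters of indices)."""
    remaining, decomposition = list(A), []
    while remaining:
        n = len(remaining)
        if n <= 4 / eps:                       # tiny leftover: singletons suffice
            decomposition += [[p] for p in remaining]
            break
        a = (1 + eps) * avg_estimate(remaining, dist)
        t = math.log(1 / eps) / math.log(n)    # so that eps * n^2 = n^(2 - t)
        clusters = extract_clusters(remaining, a, eps, 0.05, t, lsh_family, dist)
        if not clusters:                       # termination guard for the sketch
            decomposition += [[p] for p in remaining]
            break
        used = {i for C in clusters for i in C}
        decomposition += [[remaining[i] for i in C] for C in clusters]
        remaining = [p for i, p in enumerate(remaining) if i not in used]
    return decomposition
```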

Lemma 4.11 (Decompose Vi,jV_{i,j}).

Suppose we are in Build Spanner(P,r,ϵ)\textsc{Build Spanner}(P,r,\epsilon) (Algorithm 1) and let nn, QjQ_{j}, Vi,jV_{i,j} be defined in Algorithm 1. Using n2Ω(ϵ2)n^{2-\Omega(\epsilon^{2})} time, we can construct (avgmax(Vi,j)+3ϵr)(\operatorname{avg}_{\max}(V_{i,j})+3\epsilon r)-diameter decomposition of Vi,jV_{i,j} with at most |Qj|1Ω(ϵ)|Q_{j}|^{1-\Omega(\epsilon)} clusters for every Vi,jV_{i,j}.

Proof.

The idea is to compute low-diameter decomposition for all of QjQ_{j}, and the decomposition for Vi,jV_{i,j} can be calculated in a straightforward way.

Fix QjQ_{j}. For every r0{0,ϵr,2ϵr,,2r}r_{0}\in\{0,\epsilon r,2\epsilon r,\cdots,2r\}, we iteratively run Extract Clusters(QjQ_{j}, ar0a\leftarrow r_{0}, ϵϵ\epsilon\leftarrow\epsilon, c0.1c\leftarrow 0.1, t0.1ϵt\leftarrow 0.1\epsilon) (Algorithm 2) and delete the extracted points from QjQ_{j}, until we cannot extract anymore — specifically, until the Repeat loop (line 10) cannot be completed within the time bound that Lemma 4.8 gives, which can be checked directly. Let the extracted clusters be A1r0,ATr0A_{1}^{r_{0}},\cdots A_{T}^{r_{0}} (T|Qj|1Ω(ϵ)T\leq|Q_{j}|^{1-\Omega(\epsilon)}) and the remaining set be Arer0A_{\textrm{re}}^{r_{0}} (|Arer0||Qj||A_{\textrm{re}}^{r_{0}}|\leq|Q_{j}|). Note that every Air0A^{r_{0}}_{i} has diameter r0r_{0}.

Now consider one Vi,jV_{i,j}. Let avgmax(Vi,j)=maxSVi,javg(S)\operatorname{avg}_{\max}(V_{i,j})=\max_{S\subseteq V_{i,j}}\operatorname{avg}(S) be as in Def. 4.2. Because every point in Vi,jV_{i,j} is rr-close to Pi,jP_{i,j}, we have avgmax(Vi,j)2r\operatorname{avg}_{\max}(V_{i,j})\leq 2r. We claim that there exists r0avgmax(Vi,j)+3ϵrr_{0}\leq\operatorname{avg}_{\max}(V_{i,j})+3\epsilon r such that the partition of Vi,jV_{i,j} defined as {A1r0Vi,j,,ATr0Vi,j}{v}vArer0Vi,j\{A_{1}^{r_{0}}\cap V_{i,j},\cdots,A_{T}^{r_{0}}\cap V_{i,j}\}\cup\{v\}_{v\in A^{r_{0}}_{\textrm{re}}\cap V_{i,j}} consists of at most |Qj|1Ω(ϵ)|Q_{j}|^{1-\Omega(\epsilon)} clusters, and therefore, this is the decomposition of Vi,jV_{i,j} we want. The only thing we need to prove is that |Arer0Vi,j|=|Qj|1Ω(ϵ)|A^{r_{0}}_{\textrm{re}}\cap V_{i,j}|=|Q_{j}|^{1-\Omega(\epsilon)}.

Let B=Arer0Vi,jB=A^{r_{0}}_{\textrm{re}}\cap V_{i,j}. By Lemma 4.9, there are at least |B|2ϵ|B|^{2}\epsilon pairs p,qBp,q\in B with pq2(1+ϵ)avg(B)\|p-q\|_{2}\leq(1+\epsilon)\operatorname{avg}(B). Let r0(1+ϵ)avg(B)r_{0}\geq(1+\epsilon)\operatorname{avg}(B) be the least multiple of ϵr\epsilon r. So r0avgmax(Vi,j)+3ϵrr_{0}\leq\operatorname{avg}_{\max}(V_{i,j})+3\epsilon r. Since the remaining set Arer0A^{r_{0}}_{\mathrm{re}} fails Extract Clusters(Arer0,r0,ϵ,c0.1,t0.1ϵ)\textsc{Extract Clusters}(A^{r_{0}}_{\mathrm{re}},r_{0},\epsilon,c\leftarrow 0.1,t\leftarrow 0.1\epsilon), Lemma 4.8 implies that the number of r0r_{0}-close pairs in it is at most |Arer0|2t|Qj|20.1ϵ|A^{r_{0}}_{\mathrm{re}}|^{2-t}\leq|Q_{j}|^{2-0.1\epsilon}. This means |B|2ϵ|Qj|20.1ϵ|B|^{2}\epsilon\leq|Q_{j}|^{2-0.1\epsilon}. Therefore, |B|=|Qj|1Ω(ϵ)|B|=|Q_{j}|^{1-\Omega(\epsilon)}.

There are at most O(1/ϵ)O(1/\epsilon) different values of r0r_{0}, and we can try all of them to find the correct one. By Lemma 4.10, computing the decompositions for all QjQ_{j} and r0r_{0} takes at most 1ϵj|Qj|1.4n1.5\frac{1}{\epsilon}\sum_{j}|Q_{j}|^{1.4}\leq n^{1.5} time. Thus, decomposing all Vi,jV_{i,j} is done in time

O(n1.5+1ϵi,j|Vi,j|)n2Ω(ϵ2).O(n^{1.5}+\frac{1}{\epsilon}\sum_{i,j}|V_{i,j}|)\leq n^{2-\Omega(\epsilon^{2})}.

Proof of Theorem 4.1.

Suppose we are in Build Spanner(P,r,ϵ)\textsc{Build Spanner}(P,r,\epsilon) (Algorithm 1). Let QjQ_{j}, Vi,jV_{i,j}, Pi,jP_{i,j} be as in the algorithm. Now, instead of doing line 20 to line 22, we do the following.

First, compute an (avgmax(Pi,j)+2ϵr)(\operatorname{avg}_{\max}(P_{i,j})+2\epsilon\cdot r)-diameter decomposition of Pi,jP_{i,j}, denoted as A1,,AkA_{1},\cdots,A_{k}, and compute an (avgmax(Vi,j)+3ϵr)(\operatorname{avg}_{\max}(V_{i,j})+3\epsilon\cdot r)-diameter decomposition of Vi,jV_{i,j}, denoted as B1,,BlB_{1},\cdots,B_{l}. Then, use the gadget in Lemma 4.7 to add edges in our undirected spanner. The reason we can use the gadget is because avgmax(Vi,j)+avgmax(Pi,j)2r\operatorname{avg}_{\max}(V_{i,j})+\operatorname{avg}_{\max}(P_{i,j})\leq 2r by Proposition 4.5. Thus pairs in Vij×PijV_{ij}\times P_{ij} are distorted by at most 1+O(ϵ)1+O(\epsilon).

Decomposing all Pi,jP_{i,j} can be done in running time proportional to

i,j|Pi,j|1.4(i,j|Pi,j|)1.4n1.251.4=n1.75.\sum_{i,j}|P_{i,j}|^{1.4}\leq(\sum_{i,j}|P_{i,j}|)^{1.4}\leq n^{1.25\cdot 1.4}=n^{1.75}.

Computing the decompositions of all Vi,jV_{i,j} can be done in n2Ω(ϵ2)n^{2-\Omega(\epsilon^{2})} time.

Size of spanner. By Lemma 4.7, every pair of Vi,jV_{i,j} and Pi,jP_{i,j} adds |Vi,j|k+l|Pi,j||V_{i,j}|\cdot k+l\cdot|P_{i,j}| edges to the spanner. We have k|Pi,j|1Ω(ϵ)k\leq|P_{i,j}|^{1-\Omega(\epsilon)} (Lemma 4.10) and l|Qj|1Ω(ϵ)l\leq|Q_{j}|^{1-\Omega(\epsilon)} (Lemma 4.11). Since Vi,jQjV_{i,j}\subseteq Q_{j}, the total size of our spanner is at most

logΔ(i,j|Qj||Pi,j|1Ω(ϵ)+|Qj|1Ω(ϵ)|Pi,j|).\log\Delta\cdot\left(\sum_{i,j}|Q_{j}|\cdot|P_{i,j}|^{1-\Omega(\epsilon)}+|Q_{j}|^{1-\Omega(\epsilon)}\cdot|P_{i,j}|\right).

Here we recall the constraints on QjQ_{j} and Pi,jP_{i,j} from Observation 3.5:

  1. 1.

    for all i,ji,j, Pi,jPiP_{i,j}\subseteq P_{i} and |Pi|=m|P_{i}|=m.

  2. 2.

    j|Qj|=n1+o(1)\sum_{j}|Q_{j}|=n^{1+o(1)};

  3. 3.

    i,j𝔼[|Pi,j|]=n5/4+O(ϵ)\sum_{i,j}\operatorname*{{\mathbb{E}}}[|P_{i,j}|]=n^{5/4+O(\epsilon)};

  4. 4.

    j,i|Pi,j||P|=n\forall j,\sum_{i}|P_{i,j}|\leq|P|=n,

  5. 5.

    i[n/m]i\in[n/m] and m=nϵ2/4m=n^{\epsilon^{2}/4}.

We bound the two terms as follows.

  1. 1.
    i,j|Qj||Pi,j|1Ω(ϵ)\displaystyle\leavevmode\nobreak\ \sum_{i,j}|Q_{j}|\cdot|P_{i,j}|^{1-\Omega(\epsilon)}
    \displaystyle\leq i,j|Qj|(maxi,j|Pi,j|1Ω(ϵ))\displaystyle\leavevmode\nobreak\ \sum_{i,j}|Q_{j}|\cdot(\max_{i,j}|P_{i,j}|^{1-\Omega(\epsilon)})
    \displaystyle\leq nmj|Qj|m1Ω(ϵ)\displaystyle\leavevmode\nobreak\ \frac{n}{m}\cdot\sum_{j}|Q_{j}|\cdot m^{1-\Omega(\epsilon)}
    \displaystyle\leq n2/mϵ=n2Ω(ϵ3),\displaystyle\leavevmode\nobreak\ n^{2}/m^{\epsilon}=n^{2-\Omega(\epsilon^{3})},

    where the second step is because i[n/m]i\in[n/m] (5) and (1), the third step is by (2).

  2. 2.
    i,j|Qj|1Ω(ϵ)|Pi,j|\displaystyle\leavevmode\nobreak\ \sum_{i,j}|Q_{j}|^{1-\Omega(\epsilon)}\cdot|P_{i,j}|
    \displaystyle\leq i,j,|Qj|n|Qj|1Ω(ϵ)|Pi,j|+i,j,|Qj|n|Qj|1Ω(ϵ)|Pi,j|\displaystyle\leavevmode\nobreak\ \sum_{i,j,|Q_{j}|\leq\sqrt{n}}|Q_{j}|^{1-\Omega(\epsilon)}\cdot|P_{i,j}|+\sum_{i,j,|Q_{j}|\geq\sqrt{n}}|Q_{j}|^{1-\Omega(\epsilon)}\cdot|P_{i,j}|
    \displaystyle\leq n1Ω(ϵ)i,j|Pi,j|+j,|Qj|n|Qj|1Ω(ϵ)n\displaystyle\leavevmode\nobreak\ \sqrt{n}^{1-\Omega(\epsilon)}\cdot\sum_{i,j}|P_{i,j}|+\sum_{j,\leavevmode\nobreak\ |Q_{j}|\geq\sqrt{n}}|Q_{j}|^{1-\Omega(\epsilon)}\cdot n
    \displaystyle\leq n1.75+O(ϵ)+n2Ω(ϵ),\displaystyle\leavevmode\nobreak\ n^{1.75+O(\epsilon)}+n^{2-\Omega(\epsilon)},

    where the second step is by i|Pi,j|n\sum_{i}|P_{i,j}|\leq n (4), and the last step is by (2) and (3).

Concluding, we proved that the size of our spanner is n2Ω(ϵ3)logΔn^{2-\Omega(\epsilon^{3})}\log\Delta. ∎

5 Removing dependence on Δ\Delta

If we ignore the construction time, we can remove the dependence on Δ\Delta in the number of edges of the spanners. Below we show this for the directed spanner, reducing its size from n2Ω(ϵ2)logΔn^{2-\Omega(\epsilon^{2})}\log\Delta to n2Ω(ϵ2)n^{2-\Omega(\epsilon^{2})}; the analogous result for the undirected spanner follows in the same way. We use the idea from [HPIS13].

We start by defining the δ\delta-net.

Definition 5.1 (δ\delta-net).

Let (X,ρ)(X,\rho) be a metric space, and let δ>0\delta>0. A maximal set YXY\subseteq X, such that for any x,yYx,y\in Y, ρ(x,y)>δ\rho(x,y)>\delta, is called a δ\delta-net for (X,ρ)(X,\rho).

Assume the dataset has minimum pairwise distance aa and maximum pairwise distance aΔa\cdot\Delta. Let b=aϵb=a\cdot\epsilon. Define K=log1+ϵ(Δ/ϵ)K=\log_{1+\epsilon}(\Delta/\epsilon) so that b(1+ϵ)K=aΔb\cdot(1+\epsilon)^{K}=a\cdot\Delta. For every i{0,1,,K}i\in\{0,1,\cdots,K\}, let NiPN_{i}\subseteq P be a b(1+ϵ)ib(1+\epsilon)^{i}-net for (P,2)(P,\ell_{2}). This means every point pp not in NiN_{i} has a point of NiN_{i} within distance b(1+ϵ)ib(1+\epsilon)^{i}; we call this point the leader of pp, denoted Li(p)L_{i}(p).

We can pick those nets so that

P=N0N1N2NK.P=N_{0}\supseteq N_{1}\supseteq N_{2}\supseteq\cdots\supseteq N_{K}.

We would like to run Lemma 3.2 on each net NiN_{i} with parameter ri=a(1+ϵ)ir_{i}=a(1+\epsilon)^{i}. But note that many points in NiN_{i} may not even have a neighbor within distance rir_{i}. Thus, we will do the following to reduce the size of the spanner. For each ii, let ViV_{i} be the set of points in NiN_{i} that have at least one neighbor within distance rir_{i} in NiN_{i}. Then we run Lemma 3.2 on each ViV_{i} with parameter r=rir=r_{i} to get HiH_{i}. We claim that iHi\cup_{i}H_{i} is a spanner with (1+7ϵ)(1+7\epsilon) dilation.
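The nested nets and the sets V_i can be computed greedily; the following Python sketch is quadratic-time and purely illustrative, since the argument below only needs the existence of these objects. Building level i greedily inside level i−1 keeps the nets nested, at the cost of a constant-factor slack in the covering radius, which the (1+O(ε)) analysis can absorb.

```python
import numpy as np

def greedy_net(points, idx, delta):
    """Greedily pick a maximal delta-separated subset of {points[i] : i in idx}
    (a delta-net of that subset, in the sense of Definition 5.1)."""
    net = []
    for i in idx:
        if all(np.linalg.norm(points[i] - points[j]) > delta for j in net):
            net.append(i)
    return net

def nets_and_Vs(points, a, eps, Delta):
    """Sketch: nested nets N_0 = P, N_1, ..., N_K (N_i is a b(1+eps)^i-net
    built greedily inside N_{i-1}, so the nets are nested), together with
    V_i = points of N_i having another point of N_i within r_i = a(1+eps)^i."""
    points = np.asarray(points, dtype=float)
    b = a * eps
    K = int(np.ceil(np.log(Delta / eps) / np.log(1 + eps)))
    nets = [list(range(len(points)))]                  # N_0 = P, since b < a
    for i in range(1, K + 1):
        nets.append(greedy_net(points, nets[-1], b * (1 + eps) ** i))
    Vs = []
    for i, Ni in enumerate(nets):
        r_i = a * (1 + eps) ** i
        Vs.append([p for p in Ni if any(
            q != p and np.linalg.norm(points[p] - points[q]) <= r_i for q in Ni)])
    return nets, Vs
```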

We prove by induction on the distance rr that for every pair of points with pq2r\|p-q\|_{2}\leq r, dH(p,q)(1+7ϵ)r\mathrm{d}_{H}(p,q)\leq(1+7\epsilon)r. Assume this is true for all pairs of points at distance <r<r. Let p,qp,q be such that pq2=r[a(1+ϵ)i4,a(1+ϵ)i3]\|p-q\|_{2}=r\in[a(1+\epsilon)^{i-4},a(1+\epsilon)^{i-3}]. If pNip\in N_{i}, we let x=px=p; otherwise we let x=Li(p)Nix=L_{i}(p)\in N_{i} be the leader of pp. Similarly, we define y=qy=q if qNiq\in N_{i}, and otherwise we let y=Li(q)y=L_{i}(q). Then dH(p,q)dH(p,x)+dH(x,y)+dH(y,q)\vec{d}_{H}(p,q)\leq\vec{d}_{H}(p,x)+\vec{d}_{H}(x,y)+\vec{d}_{H}(y,q). Since px2,qy2b(1+ϵ)i<r\|p-x\|_{2},\|q-y\|_{2}\leq b(1+\epsilon)^{i}<r, by induction we must have dH(p,x),dH(q,y)(1+7ϵ)b(1+ϵ)iϵ(1+11ϵ)r\vec{d}_{H}(p,x),\vec{d}_{H}(q,y)\leq(1+7\epsilon)\cdot b(1+\epsilon)^{i}\leq\epsilon(1+11\epsilon)r. Since xy2xp2+pq2+qy2(1+3ϵ)rri\|x-y\|_{2}\leq\|x-p\|_{2}+\|p-q\|_{2}+\|q-y\|_{2}\leq(1+3\epsilon)r\leq r_{i} and x,yNix,y\in N_{i}, both xx and yy must also be in ViV_{i}. Thus, we have

dHi(x,y)(1+ϵ)ri(1+4ϵ)r.\vec{d}_{H_{i}}(x,y)\leq(1+\epsilon)r_{i}\leq(1+4\epsilon)r.

Thus, dH(p,q)dH(p,x)+dHi(x,y)+dH(q,y)(1+7ϵ)pq2\vec{d}_{H}(p,q)\leq\vec{d}_{H}(p,x)+\vec{d}_{H_{i}}(x,y)+\vec{d}_{H}(q,y)\leq(1+7\epsilon)\|p-q\|_{2}.

Also, because p,qVi,dHi(p,q)pq2\forall p,q\in V_{i},\vec{d}_{H_{i}}(p,q)\geq\|p-q\|_{2}, we have p,qP,dH(p,q)pq2\forall p,q\in P,\vec{d}_{H}(p,q)\geq\|p-q\|_{2}.

Next, we upper bound the total size i|Vi|\sum_{i}|V_{i}|.

For every point pPp\in P and i[K]i\in[K], we create an imaginary point called (i,p)(i,p) if pNip\in N_{i}. Denote by z(p)z(p) the largest integer such that pNz(p)p\in N_{z(p)}. We build a tree on all such points as follows. We add an edge between (i,p)(i,p) and (i+1,p)(i+1,p) for all iz(p)1i\leq z(p)-1, and link (z(p),p)(z(p),p) to (z(p)+1,q)(z(p)+1,q), where q=Lz(p)+1(p)q=L_{z(p)+1}(p) is the leader of pp at level z(p)+1z(p)+1. The root of the tree is (K,p)(K,p^{*}), where NK={p}N_{K}=\{p^{*}\} has only one point.

Consider any i[K]i\in[K] and pPp\in P such that pVip\in V_{i}. This means there is a point qNiq\in N_{i} that is a(1+ϵ)ia(1+\epsilon)^{i}-close to pp. By the properties of δ\delta-nets, pp and qq cannot co-exist in the net Ni+kN_{i+k}, where k:=log1+ϵ(a/b)k:=\log_{1+\epsilon}(a/b). Hence, there must be a vertex on the chain {(i,p),(i+1,p),,(i+k,p)}\{(i,p),(i+1,p),\cdots,(i+k,p)\} having two children.

The tree has nn leaves and it has at most n1n-1 vertices having two children, so |Vi|=O(kn)\sum|V_{i}|=O(kn).

The size of the spanner HH is

\sum_{i}|V_{i}|^{2-\Omega(\epsilon^{2})}\leq(n/\epsilon^{2})^{2-\Omega(\epsilon^{2})}=n^{2-\Omega(\epsilon^{2})}.
Theorem 5.2 (Existence result for directed spanner).

Fix ϵ(0,1)\epsilon\in(0,1). For any dataset PdP\in\mathbb{R}^{d}, there exists a directed graph HH on vertices PSP\cup S with size n2Ω(ϵ2)n^{2-\Omega(\epsilon^{2})}, such that for any p,qPp,q\in P,

pq2dH(p,q)<(1+ϵ)pq2.\|p-q\|_{2}\leq\vec{\mathrm{d}}_{H}(p,q)<(1+\epsilon)\cdot\|p-q\|_{2}.

Using the same technique, we can prove a similar result for undirected spanners.

Theorem 5.3 (Existence result for undirected spanner).

Fix ϵ(0,1)\epsilon\in(0,1). For any dataset PdP\in\mathbb{R}^{d}, there exists an undirected graph HH on vertices PSP\cup S with size n2Ω(ϵ3)n^{2-\Omega(\epsilon^{3})}, such that for any p,qPp,q\in P,

pq2dH(p,q)<(1+ϵ)pq2.\|p-q\|_{2}\leq\mathrm{d}_{H}(p,q)<(1+\epsilon)\cdot\|p-q\|_{2}.

Appendix

Appendix A Data-independent LSH from [ALRW17]

We show here how Lemma 3.3 follows from results in [ALRW17, Section 3]. We use the notations from [ALRW17], noting that our ρQ/ρP\rho_{Q}/\rho_{P} corresponds to ρq/ρu\rho_{q}/\rho_{u} from [ALRW17].

Definition A.1 (Definition of α(s)\alpha(s), β(s)\beta(s)).

For any 0<s<20<s<2, let u,vu,v be any two points on a unit sphere Sd1S^{d-1} at distance ss. Define α(s):=1s2/2\alpha(s):=1-s^{2}/2 as the cosine of the angle between uu and vv, and β(s):=1α2(s)\beta(s):=\sqrt{1-\alpha^{2}(s)} as the sine of the same angle.

Definition A.2 (Definition of F(η)F(\eta) and G(s,η,σ)G(s,\eta,\sigma)).

Let uu be any point on Sd1S^{d-1}. Define

F(η):=PrzN(0,1)d[u,zη].F(\eta):=\Pr_{z\sim N(0,1)^{d}}[\langle u,z\rangle\geq\eta].

For any 0<s<20<s<2, let vv be any point on Sd1S^{d-1} with dis(u,v)=s\operatorname{dis}(u,v)=s. Define

G(s,η,σ):=PrzN(0,1)d[u,zηv,zσ].G(s,\eta,\sigma):=\Pr_{z\sim N(0,1)^{d}}[\langle u,z\rangle\geq\eta\wedge\langle v,z\rangle\geq\sigma].

Note that F(η)F(\eta) and G(s,η,σ)G(s,\eta,\sigma) do not depend on the specific choice of uu and vv, due to the rotational symmetry of the Gaussian distribution.
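Since F and G are plain Gaussian (tail) probabilities, they are easy to estimate numerically; the snippet below (for intuition only) does so by direct sampling for two unit vectors at distance s, and its output can be compared against Lemmas A.3 and A.4.

```python
import numpy as np

def estimate_F_G(s, eta, sigma, d=200, trials=200_000, seed=0):
    """Monte Carlo estimates of F(eta) and G(s, eta, sigma) from Def. A.2:
    u, v are unit vectors at distance s, so <u, v> = alpha(s) = 1 - s^2/2."""
    rng = np.random.default_rng(seed)
    alpha = 1 - s ** 2 / 2
    u = np.zeros(d); u[0] = 1.0
    v = np.zeros(d); v[0] = alpha; v[1] = np.sqrt(1 - alpha ** 2)   # = beta(s)
    z = rng.normal(size=(trials, d))
    F = np.mean(z @ u >= eta)
    G = np.mean((z @ u >= eta) & (z @ v >= sigma))
    return F, G

# e.g. estimate_F_G(s=0.5, eta=1.0, sigma=1.0); compare with Lemmas A.3 and A.4
```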

Lemma A.3 (Lemma 3.1 of [ALRW17]).

For η\eta\rightarrow\infty,

F(η)=e(1+o(1))η22.F(\eta)=e^{-(1+o(1))\cdot\frac{\eta^{2}}{2}}.
Lemma A.4 (Lemma 3.2 of [ALRW17]).

For every s(0,2)s\in(0,2), if η,σ\eta,\sigma\rightarrow\infty and α(s)η<σ<1α(s)η\alpha(s)\cdot\eta<\sigma<\frac{1}{\alpha(s)}\cdot\eta, one has

G(s,η,σ)=e(1+o(1))η2+σ22α(s)ησ2β2(s).G(s,\eta,\sigma)=e^{-(1+o(1))\cdot\frac{\eta^{2}+\sigma^{2}-2\alpha(s)\eta\sigma}{2\beta^{2}(s)}}.
Lemma A.5 (Data-independent LSH from Section 3 of [ALRW17]).

Let r(0,2)r\in(0,2) be a fixed parameter. For any integer K1K\geq 1 and ηq,ηu>0\eta_{q},\eta_{u}>0, there is a data structure that, given a random seed ss, implicitly defines two random collections of subspaces A1,A2,,ATdA_{1},A_{2},\cdots,A_{T}\subset\mathbb{R}^{d} and B1,B2,,BTdB_{1},B_{2},\cdots,B_{T}\subset\mathbb{R}^{d}. For any point pp, denote by 𝒜p={Ai:pAi}\mathcal{A}_{p}=\{A_{i}:p\in A_{i}\} the set of subspaces AiA_{i} that contain pp. Similarly, define p={Bi:pBi}\mathcal{B}_{p}=\{B_{i}:p\in B_{i}\}.

  1. 1.

    pSd1\forall p\in S^{d-1}, 𝔼s[|𝒜p|]=Θ(F(ηu)/G(r,ηu,ηq))K\operatorname*{{\mathbb{E}}}_{s}[|\mathcal{A}_{p}|]=\Theta(F(\eta_{u})/G(r,\eta_{u},\eta_{q}))^{K} ;

  2. 2.

    pSd1\forall p\in S^{d-1}, 𝔼s[|p|]=Θ(F(ηq)/G(r,ηu,ηq))K\operatorname*{{\mathbb{E}}}_{s}[|\mathcal{B}_{p}|]=\Theta(F(\eta_{q})/G(r,\eta_{u},\eta_{q}))^{K} ;

  3. 3.

    p,qSd1,i[T]\forall p,q\in S^{d-1},i\in[T],

    Pr[pAiqBi]=(G(pq2,ηu,ηq)/F(ηq))K\Pr[p\in A_{i}\mid q\in B_{i}]=(G(\|p-q\|_{2},\eta_{u},\eta_{q})/F(\eta_{q}))^{K}
  4. 4.

    pSd1\forall p\in S^{d-1}, one can find all Ai𝒜pA_{i}\in\mathcal{A}_{p} in O(|𝒜p|logn1/G(r,ηu,ηq))O(|\mathcal{A}_{p}|\cdot\log n\cdot 1/G(r,\eta_{u},\eta_{q})) time, and all BipB_{i}\in\mathcal{B}_{p} in O(|p|logn1/G(r,ηu,ηq))O(|\mathcal{B}_{p}|\cdot\log n\cdot 1/G(r,\eta_{u},\eta_{q})) time;

The following lemma shows that with properly chosen ηu\eta_{u}, ηq\eta_{q} and KK, Lemma 3.3 can be proven from Lemma A.5.

Lemma A.6 (parameters).

Let r=(logn)1/8r=(\log n)^{-1/8} and let c=1+ϵc=1+\epsilon be the approximation ratio. Then, for any n=ω(1)n=\omega(1), there is a choice of ηu,ηq>0\eta_{u},\eta_{q}>0 and 1KO(lnn)1\leq K\leq O(\sqrt{\ln n}) such that

  1. 1.

    (F(ηu)/G(r,ηu,ηq))K=n1/ϵ2+O(1/ϵ)(F(\eta_{u})/G(r,\eta_{u},\eta_{q}))^{K}=n^{1/\epsilon^{2}+O(1/\epsilon)};

  2. 2.

    (F(ηq)/G(r,ηu,ηq))K=no(1)(F(\eta_{q})/G(r,\eta_{u},\eta_{q}))^{K}=n^{o(1)};

  3. 3.

    (G(cr,ηu,ηq)/F(ηq))K=1/n(G(cr,\eta_{u},\eta_{q})/F(\eta_{q}))^{K}=1/n.

  4. 4.

    1/G(r,ηu,ηq)=no(1)1/G(r,\eta_{u},\eta_{q})=n^{o(1)}.

Proof.

Let σ,τ\sigma,\tau be two parameters such that F(ηu)K=nσF(\eta_{u})^{K}=n^{-\sigma} and F(ηq)K=nτF(\eta_{q})^{K}=n^{-\tau}. Thus, by Lemma A.4, G(r,ηu,ηq)KG(r,\eta_{u},\eta_{q})^{K} can be written as nσ+τ2α(r)τσβ2(r)n^{-\frac{\sigma+\tau-2\alpha(r)\sqrt{\tau\sigma}}{\beta^{2}(r)}}, up to a 1+o(1)1+o(1) factor in the exponent. We set τ:=β(cr)α(r)α(cr)\sqrt{\tau}:=\frac{\beta(cr)}{\alpha(r)-\alpha(cr)} and σ:=α(r)τ\sqrt{\sigma}:=\alpha(r)\cdot\sqrt{\tau}, as [ALRW17] did in Section 3.3.3, for the case where ρq=0\rho_{q}=0.

The proof of part 3 follows the same proof as in [ALRW17]. Here we have

ln(G(cr,ηu,ηq)F(ηq))K/ln(n)=(ττ+σ2α(cr)τσβ2(cr))=1.\displaystyle\ln\left(\frac{G(cr,\eta_{u},\eta_{q})}{F(\eta_{q})}\right)^{K}/\ln(n)=\left(\tau-\frac{\tau+\sigma-2\alpha(cr)\sqrt{\tau\sigma}}{\beta^{2}(cr)}\right)=-1.

Also, part 2 is true because

(F(ηq)/G(r,ηu,ηq))K=nρq+o(1),\displaystyle(F(\eta_{q})/G(r,\eta_{u},\eta_{q}))^{K}=n^{\rho_{q}+o(1)},

where

ρq=σ+τ2α(r)τσβ2(r)τ=(σα(r)τ)2β2(r)=0.\displaystyle\rho_{q}=\frac{\sigma+\tau-2\alpha(r)\sqrt{\tau\sigma}}{\beta^{2}(r)}-\tau=\frac{(\sqrt{\sigma}-\alpha(r)\sqrt{\tau})^{2}}{\beta^{2}(r)}=0.

For part 1, we have

(F(ηu)/G(r,ηu,ηq))K=nρu+o(1),\displaystyle(F(\eta_{u})/G(r,\eta_{u},\eta_{q}))^{K}=n^{\rho_{u}+o(1)},

where

ρu=σ+τ2α(r)τσβ2(r)σ=(α(r)στ)2β2(r)=1/ϵ2+O(1/ϵ)+O(r).\displaystyle\rho_{u}=\frac{\sigma+\tau-2\alpha(r)\sqrt{\tau\sigma}}{\beta^{2}(r)}-\sigma=\frac{(\alpha(r)\sqrt{\sigma}-\sqrt{\tau})^{2}}{\beta^{2}(r)}=1/\epsilon^{2}+O(1/\epsilon)+O(r).

Part 4 follows by computing GG directly and using the fact that r=(logn)1/8r=(\log n)^{-1/8}. ∎
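As a numerical sanity check of this parameter setting (ours, ignoring all 1+o(1) factors), one can plug the choices of τ and σ into the three exponents and confirm that the part 3 exponent is exactly −1, that ρ_q = 0, and that ρ_u ≈ 4c²/(c²−1)² = 1/ε²+O(1/ε) for small r.

```python
import math

def rho_exponents(eps, r):
    """Plug sqrt(tau) = beta(cr)/(alpha(r)-alpha(cr)) and sqrt(sigma) =
    alpha(r)*sqrt(tau) into the exponents appearing in Lemma A.6
    (1+o(1) factors ignored)."""
    c = 1 + eps
    alpha = lambda s: 1 - s ** 2 / 2
    beta = lambda s: math.sqrt(1 - alpha(s) ** 2)
    sqrt_tau = beta(c * r) / (alpha(r) - alpha(c * r))
    sqrt_sigma = alpha(r) * sqrt_tau
    tau, sigma = sqrt_tau ** 2, sqrt_sigma ** 2
    part3 = tau - (tau + sigma - 2 * alpha(c * r) * sqrt_tau * sqrt_sigma) / beta(c * r) ** 2
    rho_q = (sqrt_sigma - alpha(r) * sqrt_tau) ** 2 / beta(r) ** 2
    rho_u = (alpha(r) * sqrt_sigma - sqrt_tau) ** 2 / beta(r) ** 2
    return part3, rho_q, rho_u

# e.g. rho_exponents(0.5, 0.01) returns approximately (-1, 0, 5.76),
# and 5.76 = 4c^2/(c^2-1)^2 for c = 1.5.
print(rho_exponents(0.5, 0.01))
```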

References

  • [ABS+20] Reyan Ahmed, Greg Bodwin, Faryad Darabi Sahneh, Keaton Hamm, Mohammad Javad Latifi Jebelli, Stephen Kobourov, and Richard Spence. Graph spanners: A tutorial review. Computer Science Review, 37:100253, 2020.
  • [ADD+93] Ingo Althöfer, Gautam Das, David Dobkin, Deborah Joseph, and José Soares. On sparse spanners of weighted graphs. Discrete & Computational Geometry, 9(1):81–100, 1993.
  • [AI08] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Communications of the ACM, 51(1):117–122, 2008.
  • [AIMN23] Alexandr Andoni, Piotr Indyk, Sepideh Mahabadi, and Shyam S. Narayanan. Differentially private approximate near neighbor counting in high dimensions. In NeurIPS, 2023.
  • [ALRW17] Alexandr Andoni, Thijs Laarhoven, Ilya Razenshteyn, and Erik Waingarten. Optimal hashing-based time-space trade-offs for approximate near neighbors. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 47–66. SIAM, 2017.
  • [ANWR17] Jason Altschuler, Jonathan Niles-Weed, and Philippe Rigollet. Near-linear time approximation algorithms for optimal transport via sinkhorn iteration. Advances in neural information processing systems, 30, 2017.
  • [AR15] Alexandr Andoni and Ilya Razenshteyn. Optimal data-dependent hashing for approximate near neighbors. 2015. Full version at http://arxiv.org/abs/1501.01062.
  • [ARSS23] Pankaj K Agarwal, Sharath Raghvendra, Pouyan Shirzadian, and Rachita Sowle. A higher precision algorithm for computing the 11-wasserstein distance. In The Eleventh International Conference on Learning Representations, 2023.
  • [ASZ20] Alexandr Andoni, Clifford Stein, and Peilin Zhong. Parallel approximate undirected shortest paths via low hop emulators. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 322–335, 2020.
  • [BRS11] Yair Bartal, Ben Recht, and Leonard J. Schulman. Dimensionality reduction: Beyond the johnson-lindenstrauss bound. In Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 868–887. SIAM, 2011.
  • [BT22] Sujoy Bhore and Csaba D Tóth. Euclidean steiner spanners: Light and sparse. SIAM Journal on Discrete Mathematics, 36(3):2411–2444, 2022.
  • [Che86] Paul Chew. There is a planar graph almost as good as the complete graph. In Proceedings of the second annual symposium on Computational geometry, pages 169–177, 1986.
  • [CHJ+22] CJ Carey, Jonathan Halcrow, Rajesh Jayaram, Vahab Mirrokni, Warren Schudy, and Peilin Zhong. Stars: Tera-scale graph building for clustering and graph learning. arXiv preprint arXiv:2212.02635, 2022.
  • [CKL+22] Li Chen, Rasmus Kyng, Yang P Liu, Richard Peng, Maximilian Probst Gutenberg, and Sushant Sachdeva. Maximum flow and minimum-cost flow in almost-linear time. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 612–623. IEEE, 2022.
  • [Cla87] Ken Clarkson. Approximation algorithms for shortest path motion planning. In Proceedings of the nineteenth annual ACM symposium on Theory of computing, pages 56–65, 1987.
  • [Cut13] Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems, 26, 2013.
  • [DEL+22] Laxman Dhulipala, David Eisenstat, Jakub Lacki, Vahab Mirrokni, and Jessica Shi. Hierarchical agglomerative graph clustering in poly-logarithmic depth. Advances in Neural Information Processing Systems, 35:22925–22940, 2022.
  • [Epp99] David Eppstein. Spanning trees and spanners. handbook of computational geometry, j.-r. sack and j. urrutia, eds, 1999.
  • [Erd63] P. Erdös. Extremal problems in graph theory. Theory of Graphs and its Applications (Proc. Symp. Smolenice, 1963), pages 29–36, 1963.
  • [HIM12] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: Towards removing the curse of dimensionality. Theory of Computing, 1(8):321–350, 2012.
  • [HPIS13] Sariel Har-Peled, Piotr Indyk, and Anastasios Sidiropoulos. Euclidean spanners in high dimensions. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms, pages 804–809. SIAM, 2013.
  • [Ind01] Piotr Indyk. High-dimensional computational geometry. Ph.D. Thesis. Department of Computer Science, Stanford University, 2001.
  • [Kei88] J Mark Keil. Approximating the complete euclidean graph. In SWAT 88: 1st Scandinavian Workshop on Algorithm Theory Halmstad, Sweden, July 5–8, 1988 Proceedings 1, pages 208–213. Springer, 1988.
  • [KG92] J Mark Keil and Carl A Gutwin. Classes of graphs which approximate the complete euclidean graph. Discrete & Computational Geometry, 7:13–28, 1992.
  • [KNP19] Andrey Boris Khesin, Aleksandar Nikolov, and Dmitry Paramonov. Preconditioning for the geometric transportation problem. In 35th International Symposium on Computational Geometry (SoCG 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  • [Li20] Jason Li. Faster parallel algorithm for approximate shortest path. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pages 308–321, 2020.
  • [LS22] Hung Le and Shay Solomon. Truly optimal euclidean spanners. SIAM Journal on Computing, (0):FOCS19–135, 2022.
  • [NS07] Giri Narasimhan and Michiel Smid. Geometric spanner networks. Cambridge University Press, 2007.
  • [PS89] David Peleg and Alejandro A Schäffer. Graph spanners. Journal of graph theory, 13(1):99–116, 1989.
  • [Sch35] Isaac J Schoenberg. Remarks to Maurice Frechet’s article “sur la definition axiomatique d’une classe d’espace distances vectoriellement applicable sur l’espace de hilbert”. Annals of Mathematics, pages 724–732, 1935.
  • [Sei91] JJ Seidel. Quasiregular two-distance sets. In Geometry and Combinatorics, pages 267–273. Elsevier, 1991.
  • [She17] Jonah Sherman. Generalized preconditioning and undirected minimum-cost flow. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 772–780. SIAM, 2017.
  • [Tko12] Tomasz Tkocz. An upper bound for spherical caps. The American Mathematical Monthly, 119(7):606–607, 2012.
  • [Vai89] Pravin M. Vaidya. Geometry helps in matching. SIAM Journal on Computing, 18(6):1201–1225, 1989.
  • [VDBLL+21] Jan Van Den Brand, Yin Tat Lee, Yang P Liu, Thatchaphol Saranurak, Aaron Sidford, Zhao Song, and Di Wang. Minimum cost flows, mdps, and 1\ell_{1}-regression in nearly linear time for dense instances. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 859–869, 2021.
  • [Zwi01] Uri Zwick. Exact and approximate distances in graphs—a survey. In Algorithms—ESA 2001: 9th Annual European Symposium Århus, Denmark, August 28–31, 2001 Proceedings, pages 33–48. Springer, 2001.