
Dynamic Metric Embedding into $\ell_p$ Space

Kiarash Banihashem    MohammadTaghi Hajiaghayi    Dariusz R. Kowalski    Jan Olkowski    Max Springer
Abstract

We give the first non-trivial decremental dynamic embedding of a weighted, undirected graph $G$ into $\ell_p$ space. Given a weighted graph $G$ undergoing a sequence of edge weight increases, the goal of this problem is to maintain a (randomized) mapping $\phi:(G,d)\to(X,\ell_p)$ from the set of vertices of the graph to the $\ell_p$ space such that for every pair of vertices $u$ and $v$, the expected distance between $\phi(u)$ and $\phi(v)$ in the $\ell_p$ metric is within a small multiplicative factor, referred to as the distortion, of their distance in $G$. Our main result is a dynamic algorithm with expected distortion $O(\log^3 n)$ and total update time $O\left((m^{1+o(1)}\log^2 W+Q\log n)\log(nW)\right)$, where $W$ is the maximum weight of the edges, $Q$ is the total number of updates, and $n,m$ denote the number of vertices and edges in $G$, respectively. This is the first result of its kind, extending the seminal result of Bourgain (Bourgain, 1985) to the growing field of dynamic algorithms. Moreover, we demonstrate that in the fully dynamic regime, where we tolerate edge insertions as well as deletions, no algorithm can explicitly maintain an embedding into $\ell_p$ space that has low distortion with high probability.¹

¹ An earlier version of this paper claimed an expected distortion bound of $O(\log^2 n)$ and a total update time of $O((m^{1+o(1)}\log^2 W+Q)\log(nW))$. The new version obtains the slightly worse bounds of $O(\log^3 n)$ and $O((m^{1+o(1)}\log^2 W+Q\log n)\log(nW))$, respectively.

Machine Learning, ICML


1 Introduction

A low distortion embedding between two metric spaces, $M=(X,d)$ and $M'=(X',d')$, is a mapping $f$ such that for every pair of points $x,y\in X$ we have

$$d(x,y)\leq d'(f(x),f(y))\leq C\cdot d(x,y)\ ,$$

where $C$ is often referred to as the distortion of such an embedding. Low-distortion embeddings have been extensively employed to simplify graph theoretic problems prevalent in the algorithm design literature (Indyk, 2001). This effectiveness stems primarily from the ability to represent any graph $G$ using a metric, wherein distances correspond to the shortest paths between two nodes. However, computing numerous graph properties within such a metric is inherently challenging. Thus, by first embedding the graph into an “easy” metric, we can facilitate simplified problem-solving, albeit with an approximation factor determined by the distortion introduced by the embedding. For example, approximation algorithms for the sparsest cut (Linial et al., 1995), bandwidth (Blum et al., 1998) and buy-at-bulk (Awerbuch & Azar, 1997b) graph problems leverage embeddings into low-distortion metric spaces to obtain their near-optimal guarantees.

In the present work, we investigate fundamental embedding problems in the dynamic setting, where the input graph $G$ is subject to modification at each iteration by an adversary. Specifically, we address the following question:

Problem 1.

Is it possible to embed any graph $G$, undergoing a dynamic sequence of edge updates, into Euclidean (and more broadly the $\ell_p$-metric) space with minimal distortion of the underlying metric’s pairwise distances?

Unsurprisingly, the use of randomization is essential in demonstrating that such a data structure is indeed attainable. Most notably, we build upon the fundamental building blocks of Bourgain, Johnson, and Lindenstrauss, further demonstrating the power of randomized decompositions of a graph to efficiently map such an input, undergoing dynamic updates, into Euclidean space for ease of computation, with only polylogarithmic expected distortion (see Section 2 for the formal definition) of the original distances between nodes of $G$. These are the first results of their kind in the dynamic input setting.

1.1 Motivation

Metric Embedding. From a mathematical perspective, the embedding of finite metric spaces into normed spaces is a natural extension of the local theory of Banach spaces (J., 2002). The goal of this area of research is to devise mappings, $f$, that preserve pairwise distances up to an additive or multiplicative distortion. In tandem with ensuring this metric is not too heavily distorted, we also seek to ensure that the resulting embedding of a point in the original space has low dimension (i.e., it can be represented by a small number of coordinates), so that the representation is spatially efficient.

Within this problem framework, the classic question is that of embedding metric spaces into Hilbert space. Considerable literature has investigated embeddings into $\ell_p$ normed spaces (see the survey (Abraham et al., 2006) for a comprehensive overview of the main results). Most crucially, the cornerstone of the field is the following theorem by Bourgain in 1985:

Theorem 1.1 ((Bourgain, 1985)).

For every $n$-point metric space, there exists an embedding into Euclidean space with distortion $O(\log n)$.

This landmark result is foundational in the theory of embedding finite metric spaces. Moreover, it was further shown in (Linial et al., 1995) that Bourgain’s embedding yields an embedding into any $\ell_p$-metric with distortion $O(\log n)$ and dimension $O(\log^2 n)$ – demonstrating a highly efficient and precise algorithm.
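To make the construction behind these results concrete, the following is a minimal static sketch of the Frechet-style embedding underlying Bourgain’s proof: for $O(\log n)$ density scales, sample random vertex subsets and use the distance to each subset as a coordinate. This is an illustrative reconstruction only; the helper names (`shortest_paths`, `bourgain_embedding`) and the exact number of repetitions per scale are our own choices, not part of any cited implementation.

```python
import math
import random

def shortest_paths(n, edges):
    # All-pairs shortest paths via Floyd-Warshall on a small weighted
    # graph given as a dict {(u, v): weight} of undirected edges.
    d = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for (u, v), w in edges.items():
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def bourgain_embedding(d, rng):
    # Frechet-style coordinates: for scale i, sample vertex sets S by
    # including each vertex with probability 2^-i; the coordinate of v
    # is d(v, S). Each such coordinate is 1-Lipschitz in the graph
    # metric, which gives the non-expansion half of the analysis.
    n = len(d)
    scales = max(1, math.ceil(math.log2(n)))
    reps = 2 * max(1, math.ceil(math.log2(n)))  # O(log n) sets per scale
    sets = []
    for i in range(1, scales + 1):
        for _ in range(reps):
            sets.append([v for v in range(n) if rng.random() < 2.0 ** (-i)])
    return [[min((d[v][s] for s in S), default=0.0) for S in sets]
            for v in range(n)]
```

Each coordinate $v \mapsto d(v,S)$ contracts no pair by the triangle inequality; the harder contraction bound over all coordinates is where the $O(\log n)$ distortion arises.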

We highlight that the above results are of immense value to the field of computer science in the age of big data, where the construction of appropriately sized data structures is no longer efficient (or even feasible). For instance, in the field of social media analysis, processing billions of daily tweets to identify trending topics and perform sentiment analysis would require impractical amounts of storage and computational resources without dimension reduction techniques like topic modeling algorithms (Church, 2017; Subercaze et al., 2015). It is thus essential to reduce these inputs to a more approachable metric space to prevent computational bottlenecking. In this paper, we present the first extension of these seminal tools to the emerging domain of dynamic algorithms. Specifically, we maintain a polylogarithmic (expected) distortion embedding into the $\ell_p$-metric through a sequence of updates to the input graph.

Dynamic Algorithm. A dynamic graph algorithm is a data structure that supports edge insertions and edge deletions, and can answer queries on certain properties of the input with respect to the original space’s metric. While one can trivially run a static algorithm on the graph after each update and rebuild a structure equipped to answer queries, the now-large body of work on dynamic algorithms aims to devise solutions with considerably faster update and query times. In the present work, we maintain a dynamic data structure that both reduces the dimension of the input for ease of computation and exhibits only a modest expansion of the original metric’s pairwise distances in expectation.

Similar to the fundamental motivation underlying metric embeddings, the emergence of big data has intensified the need for dynamic algorithms capable of efficiently storing representations of massive input graphs, while promptly adapting to any changes that may occur, on a variety of machine learning and optimization problems (Bhattacharya et al., 2022; Dütting et al., 2023). As an illustrative example, consider the problem of maintaining connectivity information in a large graph that undergoes edge insertions and deletions – an essential problem in the field of route planning and navigation. In a static scenario, the solution can be trivially achieved by rebuilding the shortest paths between nodes using Dijkstra’s algorithm from every source vertex after each update to the graph. However, it is easy to see that for connectivity graphs employed in big data systems, this procedure quickly becomes intractable. Recent advancements in the field of dynamic algorithms have revealed that it is possible to maintain such connectivity information with considerably less overhead in terms of the update time to the data structure, without a large loss of accuracy for the paths (Bernstein, 2009; Roditty & Zwick, 2004, 2012). This capacity to adapt data structures to effectively handle diverse queries is rapidly transitioning from being merely helpful to absolutely essential. Building upon this existing body of literature, we present a novel contribution by developing a dynamic embedding structure tailored to capturing the distances between nodes in a graph, specifically within the context of the simplified $\ell_p$ metric – a highly useful computation in the field of dimension reduction for big data. Importantly, our approach guarantees a polylogarithmic expected distortion, thereby striking a balance between efficiency and accuracy.

1.2 Our Results

We first explore the decremental setting, where edge weights can only increase dynamically (i.e., nodes move further apart); this is the setting under which our primary algorithmic contributions are effective. For the fully dynamic setting, which allows both increases and decreases in edge weights, we show a partial negative result proving that explicitly maintaining an embedding into the $\ell_p$-metric that has low distortion with high probability is not feasible. Here, explicitly maintaining an embedding means that the entire embedding is updated efficiently, rather than just reporting any changes to the data structure (see Section 2 for a more precise definition of these problem regimes).

Theorem 1.2.

There is no fully dynamic algorithm that can explicitly maintain a dynamic embedding into $\ell_p$ space with high probability.

Though computation is efficient in the target space, we demonstrate that an adversarially selected sequence of updates to the graph can force an update of the embedding for $\Omega(n)$ nodes in each step, which becomes intractable to maintain. Intuitively, this result is derived from the fact that changing a large number of pairwise distances in the $\ell_p$ metric is only possible by moving a large number of points, while making a similar change in the input graph can be done easily by, essentially, connecting and disconnecting two components. We expand more formally on this result in Section 3.

The main idea underpinning our primary algorithmic result is a novel combination of the static randomized decomposition of a graph (as utilized by Bourgain) with a decremental clustering algorithm to maintain an embedding into $\ell_p$ space that exhibits $O(\log^3 n)$ expected distortion and can answer distance queries with polylogarithmic update time. Our algorithmic result is stated formally as follows.

Theorem 1.3.

For every graph $G$ with max edge weight $W$ and a metric $\ell_p$, there is a decremental dynamic algorithm that maintains an embedding, $\rho:V\rightarrow\mathbb{R}^{O(\log(nW))}$, of the metric induced by the dynamically updated graph $G$ into $\ell_p$ space of dimension $O(\log(nW))$ that has expected (over the internal randomization of the algorithm) distortion at most $O(\log^3 n)$, and whose running time is at most $O\left((m^{1+o(1)}\log^2 W+Q\log n)\log(nW)\right)$ with high probability², where $Q$ denotes the total number of updates. Within this running time, the algorithm explicitly outputs all changes to the embedding and can answer distance queries between any pair of vertices in $O(\log(nW))$ time.

² Throughout the paper, we say that an event holds with high probability (whp for short) if its probability is at least $1-n^{-a}$ for some absolute constant $a$.

To prove the guarantees of this algorithm, we require an alternative, constructive proof of Bourgain’s lemma. Our algorithm is different from standard approaches to the problem, which can be classified as “Frechet embeddings.” In these embeddings, each coordinate $\rho_i(v)$ takes the form $d_G(v,S_i)$, where $S_i$ is a specific set. However, these approaches are not suitable for the dynamic setting due to limitations in analyzing their upper bound on $\|\rho(u)-\rho(v)\|_p$ for every given $u$ and $v$. Specifically, the distances can be maintained only approximately at best, prohibiting us from obtaining an upper bound.

Starting from the static case, we introduce the notion of a (random) $(\beta,R,\epsilon)$-distance preserving cut. There are two main properties of a $(\beta,R,\epsilon)$-distance preserving cut. Ignoring for now the technical $\epsilon$ parameter of this notation, the parameters $\beta$ and $R$ control the following. First, we require that the probability that two vertices are in different sets is at most $\beta$ times the distance between these vertices in $G$. Intuitively, we can expect many close vertices to be on the same side of the cut. On the other hand, for every pair of vertices whose distance in $G$ is larger than $R$, we require probability at least $\frac{1}{2}$ that they are on different sides of the cut. The rationale behind the latter property is that such a cut will, with constant probability, properly distribute vertices that are of distance at least $R$ in $G$. We then construct $O(\log(nW))$ such cuts, where the $i$-th cut corresponds to a different choice of the distance steering parameter $R$, i.e., $R_i=2^i$. The final embedding is made by assigning every vertex a vector of $O(\log nW)$ coordinates, one coordinate corresponding to each parameter choice $R_i$. For every cut we denote its two sides as “left” and “right”. If a vertex is on the left side of the $i$-th cut, we set its $i$-th coordinate to 0; if it is on the right side, we set the coordinate to $R_i$. Using both aforementioned properties of a $(\beta,R,\epsilon)$-distance preserving cut, we show that such an assignment is an embedding with $O(\log^3 n)$ stretch.
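The coordinate assignment just described is simple to state in code. The sketch below assumes the cuts themselves are already sampled (their construction is described next); `embed_from_cuts` is a hypothetical helper name, not part of the paper’s pseudocode.

```python
def embed_from_cuts(vertices, cuts):
    # cuts is a list of pairs (R_i, S_i), where S_i is the set of
    # vertices on the "right" side of the i-th cut. A vertex receives
    # coordinate 0 if it lies on the left side, and R_i otherwise.
    return {v: [R if v in S else 0.0 for (R, S) in cuts] for v in vertices}
```

For instance, with two cuts at scales $R_1=2$ and $R_2=4$, two vertices separated by both cuts end up at $\ell_p$ distance $\|(2,4)\|_p$, reflecting both scales at which they were split.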

To implement this algorithm in the dynamically changing graph $G$, we prove that $(\beta,R,\epsilon)$-distance preserving cuts can be efficiently obtained from a $(\beta,\delta)$-weak decomposition of $G$, a probabilistic graph partitioning introduced by Bartal (Bartal, 1996). In this decomposition, we partition the vertices of $G$ into clusters such that the distance (with respect to $G$) between every pair of vertices in a cluster is at most $\delta$, while, on the other hand, for every edge the probability that it connects two different clusters is at most $\beta$ times its weight. To proceed to $(\beta,R,\epsilon)$-distance preserving cuts, we augment this construction by randomly assigning each cluster to one of the two sides of the cut. In the analysis, we show that such simple random assignments guarantee the properties we require from a $(\beta,R,\epsilon)$-distance preserving cut. Moreover, provided that we are able to dynamically maintain a $(\beta,\delta)$-weak decomposition of $G$, it is simple to update the random assignment after each update. To maintain a $(\beta,\delta)$-weak decomposition of $G$ under dynamic updates, we lean on the result of (Forster et al., 2021), who showed how to maintain such a decomposition under edge deletions. We observe that their framework, with a few technical changes, translates to our setting.
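As a static illustration of this two-step recipe (carve low-diameter clusters, then flip a fair coin per cluster), the sketch below uses simple exponential ball carving in the spirit of Bartal’s decomposition. It is a toy stand-in under our own parameter choices, not the dynamic maintenance procedure of (Forster et al., 2021).

```python
import math
import random

def weak_decomposition(d, delta, rng):
    # Carve clusters of radius at most delta/2 (hence weak diameter at
    # most delta) around random centers, with exponentially distributed
    # radii so that short edges are unlikely to be cut.
    n = len(d)
    unassigned = set(range(n))
    clusters = []
    rate = math.log(n + 1) / (delta / 2.0)
    while unassigned:
        center = rng.choice(sorted(unassigned))
        radius = min(rng.expovariate(rate), delta / 2.0)
        ball = {v for v in unassigned if d[center][v] <= radius}
        clusters.append(ball)
        unassigned -= ball
    return clusters

def random_cut(clusters, rng):
    # Assign each cluster to one side of the cut uniformly at random.
    side = set()
    for cluster in clusters:
        if rng.random() < 0.5:
            side |= cluster
    return side
```

Because every cluster fits in a ball of radius $\delta/2$, no cluster ever separates two vertices of distance greater than $\delta$, while the coin flips produce the desired separation of far-apart vertices with constant probability.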

We discuss the details of the underlying static tools used to maintain this structure in Section 4 and proceed to augment these procedures to handle edge weight updates in Section 5. Moreover, we note that the embedding can be used to implement a dynamic distance oracle (all-pairs shortest paths), as for each pair of vertices in the graph, we can estimate their distance efficiently by calculating the distance between their embeddings. While our distance guarantees only hold in expectation, the update time of a distance oracle based on our algorithm nearly matches the best known bounds for the APSP problem with $O(\log^2 n)$ stretch (Chechik, 2018; Forster et al., 2023), which further shows the tightness of our analysis.

1.3 Related Work

Metric Embedding.

The foundational result for the algorithmic applications of metric embedding is that of Bourgain in 1985 (Bourgain, 1985), which embeds into any $\ell_p$ metric with logarithmic distortion. When the input metric is already the $\ell_2$ metric, the result of Johnson and Lindenstrauss (Johnson et al., 1986) shows that its dimension can be reduced to $O(\log n/\varepsilon^2)$ with $(1+\varepsilon)$ distortion for any $\varepsilon>0$. Recent works have studied lower bounds on the minimum number of dimensions necessary for this compression; e.g., see (Larsen & Nelson, 2017). To the best of our knowledge, these embedding results have no analogous algorithm in the dynamic setting, which we formulate in the present work.

While $\ell_p$ space is extremely useful for functional approximation and other challenging mathematical problems, there also exists a line of research on embeddings of an input metric into a tree metric, which inherently lends itself to dynamic problems. For embedding into these tree structures, an emphasis is placed on algorithms for probabilistic tree embeddings (PTE), where the host metric is embedded into a distribution of trees. Concretely, given a graph $G$, the objective is to find a distribution over a set $\tau$ of trees such that distances in $G$ do not get contracted and the expected distances over the randomly sampled tree distribution do not exceed a multiplicative stretch of $\alpha$ (stretch here can be considered interchangeable with the concept of distortion). The preliminary work on such embeddings by Bartal (Bartal, 1996) demonstrated that, by a “ball growing” approach, we can embed any graph with $O(\log^2 n)$ stretch, with a nearly matching lower bound of $\Omega(\log n)$ stretch for any such embedding. This work was later improved to obtain a PTE procedure with optimal $O(\log n)$ stretch (Fakcharoenphol et al., 2003), which has applications in problems for metric labeling (Kleinberg & Tardos, 2002), buy-at-bulk network design (Awerbuch & Azar, 1997a), vehicle routing (Charikar et al., 1998), and many other such contexts (Bartal, 2004; Garg et al., 2000). Our dynamic embedding procedure combines this ball growing approach with a decremental clustering procedure to efficiently maintain an embedding into the $\ell_p$-metric.

Dynamic Embedding.

Closely related to our work is the study of dynamic embeddings into trees. The work of (Forster & Goranci, 2019) initiates the study of the dynamic maintenance of such low-stretch spanning trees, devising an algorithm that yields an average distortion of $n^{o(1)}$ in expectation with $n^{1/2+o(1)}$ update time per operation. This result was later improved to $n^{o(1)}$ average distortion and update time bounded by $n^{o(1)}$ (Chechik & Zhang, 2020).

The restriction of these prior works to the maintenance of spanning trees is an inherently more difficult and limited problem instance. To improve upon the above bounds, (Forster et al., 2021) removes this restriction and designs an embedding procedure that guarantees an expected distortion of $n^{o(1)}$ in $n^{o(1)}$ update time, or $O(\log^4 n)$ stretch with $m^{1/2+o(1)}$ update time, when embedding into a distribution of trees. This work also devises a decremental clustering procedure that we build upon in the present work to devise our embeddings. We additionally note that the expected distortion objective more closely aligns with our primary result; however, our embedding into the $\ell_p$-metric is better suited for the class of NP-hard optimization problems whose approximation algorithms rely on the geometry of Euclidean space, such as sparsest cut (Arora et al., 2005; Aumann & Rabani, 1998; Chawla et al., 2008), graph decompositions (Arora et al., 2009; Linial et al., 1995), and the bandwidth problem (Dunagan & Vempala, 2001; Feige, 1998; Krauthgamer et al., 2004).

Similar to the present work is the study of dynamic distance oracles, as originally studied by (Thorup & Zwick, 2005) in the static setting, and later extended to the decremental setting with a data structure that maintains the distance between any two points from the input metric with $O(2k-1)$ stretch, $\tilde{O}(mn)$ total update time and $O(m+n^{1+1/k})$ space (where $k$ is any positive integer) (Roditty & Zwick, 2012). This result can be further improved to a distortion of $1+\varepsilon$ with $\tilde{O}(n^2)$ space for every $\varepsilon>0$. (Chechik, 2018) further presents a decremental algorithm for the all pairs shortest path (APSP) problem which admits $(2+\varepsilon)k-1$ distortion with total update time $O(mn^{1/k+o(1)}\log(nW))$ and query time $O(\log\log(nW))$. Our embedding, which generalizes this notion of a distance oracle, yields a nearly equivalent update time for $O(\log^2 n)$ expected distortion, further demonstrating the tightness of our analysis.

In the next section, we precisely define the mathematical framework and formalization within which our algorithmic techniques reside.

2 Model and Preliminaries

Let $G=(V,E)$ be a weighted, undirected graph on $n$ vertices with (at most) $m$ edges of positive integer weights in the range from 1 to $W$, where $W$ is a fixed parameter known to the algorithm. For an edge $(u,v)\in E$, we denote its weight by $w_G(u,v)$. For every pair of nodes $u,v\in V$, let $d_G(u,v)$ be the length of the shortest weighted path between nodes $u,v$ in $G$, where we define the weight of a path as the sum of the weights of its edges. Throughout, we let $\Delta$ denote a power of two that is always larger than the diameter of the graph; note that $\Delta\in O(nW)$. We note that $(V,d_G)$ is a metric space.

Given a set of vertices $V'\subseteq V$, we define the weak diameter of $V'$ as the maximum distance between the vertices of $V'$ in the original graph, i.e., $\textnormal{wdiam}(V')=\sup_{u,v\in V'}d_G(u,v)$. For all $u\in V$ and $r\geq 0$, let $B_G(u,r)$ denote the set of all vertices that are within distance $r$ from $u$ in the graph $G$, i.e., $B_G(u,r):=\left\{v\in V:d_G(u,v)\leq r\right\}$.
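These quantities can be computed directly from the weighted adjacency lists with single-source shortest paths; a minimal sketch (the helper names are ours):

```python
import heapq

def dijkstra(adj, src):
    # Single-source shortest path distances; adj[u] is a list of
    # (neighbor, weight) pairs of the undirected graph G.
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            if du + w < dist.get(v, float("inf")):
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def ball(adj, u, r):
    # B_G(u, r): all vertices within distance r of u.
    return {v for v, dv in dijkstra(adj, u).items() if dv <= r}

def weak_diameter(adj, subset):
    # wdiam(V'): maximum distance, measured in the full graph G,
    # between vertices of the subset V'.
    return max(dijkstra(adj, u).get(v, float("inf"))
               for u in subset for v in subset)
```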

Metric Embedding. The objective of this paper is to construct and maintain an embedding of the metric defined by an input graph $G$ into an $\ell_p$ metric space without distorting the original distances by too much. More formally, given a metric space $(X,d_X)$, an injective mapping $f:G\rightarrow X$ is called an embedding from $G$ into $X$. We define the expansion (or stretch) and the contraction of the embedding $f$, respectively, as:

$$\textnormal{expans}(f)=\sup_{u,v\in V;\,u\neq v}\frac{d_X(f(u),f(v))}{d_G(u,v)}$$
$$\textnormal{contr}(f)=\sup_{u,v\in V;\,u\neq v}\frac{d_G(u,v)}{d_X(f(u),f(v))}\ .$$

We define the distortion of the embedding $f$ as $\textnormal{distort}(f)=\textnormal{expans}(f)\cdot\textnormal{contr}(f)$. Note that any embedding $f$ satisfies $\frac{1}{\textnormal{contr}(f)}\cdot d_G(u,v)\leq d_X(f(u),f(v))\leq\textnormal{expans}(f)\cdot d_G(u,v)$. The embeddings in this paper are random functions, and are constructed by randomized algorithms. Given a random embedding $f:V\to X$, we define its expected distortion as the smallest value $\alpha>0$ for which there exist positive values $a,b$ satisfying $ab=\alpha$ such that for all $u,v\in V$:³

³ Throughout the paper, we mostly consider $a=1$. As such, we sometimes use distortion and stretch interchangeably, since we are only concerned with the expansion of distances between points.

$$\frac{1}{a}\cdot d_G(u,v)\leq\mathbb{E}\left[d_X(f(u),f(v))\right]\leq b\cdot d_G(u,v)\ . \qquad (1)$$

In this paper, we focus on embeddings into the $\ell_p$ metric space. In this metric space, the ground set $X$ equals $\mathbb{R}^d$, for some positive integer $d$, and for every pair of points $x,y\in X$, the distance $d_X$ is defined as

$$d_X(x,y)=\|x-y\|_p=\left(\sum_{i=1}^{d}|x_i-y_i|^p\right)^{1/p},$$

where $x_i$ and $y_i$ refer to the $i$-th coordinate of $x$ and $y$, respectively.
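Combining the last few definitions, the distortion of a concrete (deterministic) embedding can be checked mechanically. The sketch below assumes a dense distance matrix `d` and an embedding given as a list of coordinate vectors; the function names are our own.

```python
def lp_dist(x, y, p):
    # ell_p distance between two points x, y of R^d.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def distortion(d, emb, p):
    # distort(f) = expans(f) * contr(f), maximized over all pairs
    # u != v, exactly as in the definitions above.
    n = len(d)
    pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    expans = max(lp_dist(emb[u], emb[v], p) / d[u][v] for u, v in pairs)
    contr = max(d[u][v] / lp_dist(emb[u], emb[v], p) for u, v in pairs)
    return expans * contr
```

An embedding that reproduces the metric exactly has distortion 1; any scaling of a distortion-1 embedding still has distortion 1, which is why distortion is defined as the product of expansion and contraction rather than either term alone.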

Dynamic Model. We consider a model where the underlying input graph $G$ undergoes a sequence of updates as specified by an oblivious adversary. We assume that the adversary knows the algorithm, but does not have access to the random bits the algorithm uses. We use $G_0,G_1,\dots$ to denote the corresponding sequence of graphs, where $G_i$ refers to the graph after $i$ updates. Throughout, we will use $Q$ to denote the total number of updates to an input graph. This sequence is fixed by the adversary before the execution of the algorithm, but is revealed to the algorithm gradually, one by one. Our goal is to explicitly maintain an embedding after each update, as formally defined below:

Definition 2.1 (Maintain).

We say that a dynamic algorithm $\mathcal{A}$ explicitly maintains an embedding of the input graph into a metric space $(X,d_X)$ if there exists a sequence of mappings $\phi_0,\phi_1,\dots$, where $\phi_i:V\to X$, and $\mathcal{A}$ outputs the changes in $\phi$ after every update. Formally, after update $t$, the algorithm should output $v$ and $\phi_t(v)$ for all $v$ such that $\phi_t(v)\neq\phi_{t-1}(v)$.
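Definition 2.1 can be read as an interface contract: after each update, the algorithm reports exactly the vertices whose images changed. A toy illustration of that contract (the class and method names are hypothetical, not from the paper):

```python
class ExplicitEmbedding:
    # Maintains the current mapping phi and, per Definition 2.1,
    # reports only the (v, phi_t(v)) pairs that changed in an update.
    def __init__(self, phi):
        self.phi = dict(phi)

    def apply_update(self, phi_new):
        changed = {v: x for v, x in phi_new.items()
                   if self.phi.get(v) != x}
        self.phi.update(changed)
        return changed
```

The cost of an update under this contract is at least the number of changed images, which is precisely the quantity the lower bound of Section 3 drives up to $\Omega(n)$.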

We operate in the decremental setting and assume that each update takes the form of an edge weight increase, i.e., for an edge $(u,v)\in E$, the value of $w_G(u,v)$ increases. We note that this is slightly different from the standard definition of the decremental setting, which permits the deletion of edges in the input graph. The deletion of an edge can cause the input graph to become disconnected, which means we may have $d_{G_t}(u,v)=\infty$ for some time step $t$ and $u,v\in V$. This is problematic, however, because regardless of the values of $\phi_t(u)$ and $\phi_t(v)$, we will always have $\|\phi_t(u)-\phi_t(v)\|_p<\infty$, because the $\ell_p$ metrics do not allow for infinite distances. This in turn means that we cannot satisfy the bounds for expected distortion (Equation (1)), and as such cannot design a low-distortion embedding. To avoid this issue, we restrict the updates to edge weight increases only, and we note that in practice the removal of an edge can be simulated by choosing a large $W$, as the dependence of our bounds on $W$ is polylogarithmic. Thus, edge weight increases serve as a necessary stand-in for edge deletions, as both lead to pairwise distances increasing.

In the section that follows, we show that maintaining a fully dynamic embedding, where edge weights are subject to both increases and decreases, that has low distortion with high probability is infeasible in the $\ell_p$-metric space. This limitation underpins the rationale for the decremental problem setting introduced above.

3 Lower Bound for Explicit Maintenance of Fully Dynamic Embeddings

We first present an (oblivious) adversarial construction of edge weight modifications to a graph in the fully dynamic model that cannot be explicitly maintained in the geometry of $\ell_p$ space without modifying the embedding of $\Omega(n)$ nodes of the original graph. We highlight that this is a high probability result, whereas the main algorithmic results we obtain hold in expectation.

Theorem 3.1.

Any fully dynamic algorithm that maintains an embedding into the $\ell_p$-metric space which guarantees a distortion of at most $o(W)$ with high probability must have an update time of at least $\Omega(n)$.

Proof.

Let $\mathcal{A}$ be a fully dynamic algorithm which guarantees a stretch of at most $o(W)$ with high probability. Consider an input graph $G$ that consists of two separate complete graphs on $n$ vertices, $H$ and $H'$, comprised of unit edges. Further consider two fixed vertices $v\in H$, $v'\in H'$. If there is a unit edge between these two vertices, then the distance between any two vertices of $H$ and $H'$ is at most 3 in the graph metric, and therefore in the $\ell_p$ embedding it cannot be more than $o(W)$.

Now, assume an adversary increases the weight of the edge connecting the vertices $v$ and $v'$ to a value of $W$. In the original graph metric, all pairwise distances between the nodes of $H$ and $H'$ must now be at least $W$. Therefore, the embedded points of one cluster ($H$ or $H'$) must be updated so as to not contract the original metric and maintain the distortion of at most $W$ with high probability (see Figure 1 for a depiction of this construction). Therefore, the algorithm $\mathcal{A}$ must update the embedding for all $n$ nodes of one of the complete components of $G$ to be at least $W$ away from the other with respect to the $\ell_p$ norm and satisfy the distortion constraints with high probability. Thus, we charge at least $\Omega(n)$ to the update time of $\mathcal{A}$ in the worst case for the maintenance of the embedding.⁴ ∎

⁴ Formally, for each pair of vertices in $H\times H'$, at least one of them needs to be updated. Since there are $\Omega(n^2)$ pairs and each vertex update resolves the issue for $O(n)$ pairs, we need $\Omega(n)$ vertex updates. Moreover, we cannot amortize this worst case update occurrence since, in the subsequent iteration, the adversary can change the edge weight back to 1 and repeat the cycle – resulting in $\Omega(n)$ updates per iteration.

Figure 1: Adversarial sequence of graph updates

Though this sequence is simplistic, it highlights the inherent limitations of embedding into a complete and locally compact metric like the $\ell_p$ normed space. We additionally remark that, in the expected distortion setting of our algorithmic result, this lower bound does not persist, since a large increase in the expected pairwise distances between nodes does not necessarily imply that the embedding of every pair of points has been updated.

4 Static algorithm

We present our algorithm by first describing the static partitioning procedure, which is used to initialize our data structure and is subsequently maintained through the sequence of updates specified by the adversary. While our ideas are based on prior work, to our knowledge, this static algorithm is a novel construction that has not appeared before in the literature.

4.1 Distance Preserving Cuts

Our algorithm dynamically maintains an embedding based on a set of cuts in the graph, where each cut is designed to separate vertices with distances above some threshold $R$, while simultaneously preserving distances of the vertices in the graph. We formally define the notion of a distance preserving cut.

Definition 4.1 (Distance preserving cut).

Given a graph $G=(V,E)$, let $S\subseteq V$ be a random subset of the vertices. For vertices $u,v$, let $\texttt{cut}_{u,v}$ denote the event that $u,v$ are on different sides of the partition $(S,V\setminus S)$, i.e.,

$$\texttt{cut}_{u,v}=\left\{u\in S\text{ and }v\notin S\right\}\text{ or }\left\{u\notin S\text{ and }v\in S\right\}\ .$$

We say that SS is a (β,R,ϵ)(\beta,R,\epsilon)-distance preserving cut, or (β,R,ϵ)(\beta,R,\epsilon)-cut for short, if it has the following three properties:

  • Pr[cutu,v]βd(u,v)\mathrm{Pr}\left[{\texttt{cut}_{u,v}}\right]\leq\beta\cdot d(u,v) for every u,vu,v,

  • Pr[cutu,v]=0\mathrm{Pr}\left[{\texttt{cut}_{u,v}}\right]=0 for every u,vu,v such that d(u,v)<ϵd(u,v)<\epsilon,

  • Pr[cutu,v]12\mathrm{Pr}\left[{\texttt{cut}_{u,v}}\right]\geq\frac{1}{2} for every u,vu,v such that d(u,v)>Rd(u,v)>R.

This algorithmic technique of decomposing the graph into smaller sets with desirable pairwise distance bounds is a refinement of the ball-growing approach of Bartal (Bartal, 1996) and of the padded decompositions at the heart of Bourgain’s embedding results (Bourgain, 1985). Most importantly, this efficient cut-set construction allows us to contract edges that are small enough to ignore in the approximation factor, while also providing logarithmic distortion bounds on the longer paths – a fact that will be verified in the following analysis.

The main result of this section is the following lemma, which guarantees the existence of such cut sets. It will be used heavily in our pre-processing graph decomposition at various granularities, in turn leading to the desired distortion bounds promised in Theorem 1.3.

Lemma 4.2.

For every 1RΔ1\leq R\leq\Delta, there exists a (β,R,ϵ)(\beta,R,\epsilon)-distance preserving cut with β=O(lognR)\beta=O\left(\frac{\log n}{R}\right) and ϵ=Ω(Rn)\epsilon=\Omega\left(\frac{R}{n}\right).

We now present the proof of this lemma which uses the randomized decomposition method of Bartal (Bartal, 1996) in conjunction with a novel probabilistic compression procedure. First, we review the definition of an RR-partition of a graph GG (as originally defined in Bartal (Bartal, 1996)).

Definition 4.3.

An RR-partition of GG is a collection of subsets of vertices P={V1,,Vk}P=\{V_{1},...,V_{k}\} such that

  • For all i[k]i\in[k], ViVV_{i}\subseteq V and i[k]Vi=V\bigcup_{i\in[k]}V_{i}=V.

  • For all i,j[k]i,j\in[k] such that iji\neq j, ViVj=V_{i}\cap V_{j}=\emptyset.

  • Let C(Vi)C(V_{i}) denote the subgraph of GG induced on the vertices of ViV_{i}. Such a subgraph is referred to as a “cluster” of the partition, and for every i[k]i\in[k], its weak diameter is bounded above as wdiam(C(Vi))R\textnormal{wdiam}(C(V_{i}))\leq R.

Next, we give the definition of a (β,R)(\beta,R)-weak decomposition, a modification of the RR-partition that probabilistically ensures vertices close to each other appear in the same cluster.

Definition 4.4.

Given a graph G=(V,E)G=(V,E), let 𝒞={C1,,Ck}\mathcal{C}=\left\{\,C_{1},\dots,C_{k}\,\right\} be a (random) partitioning of the vertices. For every uVu\in V, let C(u)[k]C(u)\in[k] denote the index ii such that uCiu\in C_{i}. We say that 𝒞\mathcal{C} is a (β,R)(\beta,R)-weak decomposition of GG if for every u,vu,v we have

Pr[C(u)C(v)]βd(u,v)\displaystyle\mathrm{Pr}\left[{C(u)\neq C(v)}\right]\leq\beta\cdot d(u,v)

and for every ii and pair u,vCiu,v\in C_{i} we have d(u,v)Rd(u,v)\leq R.

Following Bartal (Bartal, 1996), we prove that for all RR there exists a (O(logn/R),R)(O(\log n/R),R)-weak decomposition of GG that has the additional property that vertices which are closer than R2n\frac{R}{2n} are necessarily in the same cluster. Formally,

Theorem 4.5.

Given a graph G=(V,E)G=(V,E) with parameters 1RΔ1\leq R\leq\Delta, there exists a (β,R)(\beta,R)-weak decomposition {C1,,Ck}\{C_{1},...,C_{k}\} with β=O(logn/R)\beta=O(\log n/R) such that for every pair of vertices u,vVu,v\in V:

  • Pr[C(u)C(v)]βdG(u,v)\mathrm{Pr}\left[{C(u)\neq C(v)}\right]\leq\beta\cdot d_{G}(u,v)

  • If dG(u,v)<R2nd_{G}(u,v)<\frac{R}{2n} then uu and vv are in the same cluster.

Algorithm 1 Low-Diameter Randomized Decomposition (Ldrd) (Bartal, 1996)
1:  Input: Graph G=(V,E)G=(V,E) and parameter 1RΔ1\leq R\leq\Delta
2:  Output: An LDRD of G, denoted 𝒞={Ci}i=1n\mathcal{C}=\{C_{i}\}_{i=1}^{n}
3:  Contract edges of GG whose weight is at most R2n\frac{R}{2n}
4:  Set UVU\leftarrow V
5:  for uUu\in U do
6:     Sample rG(β)r\sim G(\beta)
7:     rmin{r,R}r\leftarrow\min\{r,R\}
8:     C(u){vU:dG(u,v)r}C(u)\leftarrow\{v\in U:d_{G}(u,v)\leq r\}
9:     UUC(u)U\leftarrow U\setminus C(u)
10:  end for
11:  Return: {Ci}i=1n\{C_{i}\}_{i=1}^{n}
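To make the ball-growing procedure concrete, the following is a minimal Python sketch of Algorithm 1. It assumes the graph is given as an adjacency dictionary mapping each vertex to a list of (neighbor, weight) pairs; the initial contraction of short edges is omitted, and the geometric radius G(β) is approximated by its continuous (exponential) analogue. Both simplifications are ours and not part of the algorithm's guarantees.

```python
import heapq
import random

def dijkstra(adj, source):
    """Single-source shortest-path distances on a weighted adjacency dict."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def ldrd(adj, R, beta, rng=random):
    """Sketch of Ldrd: grow balls of capped random radii around
    arbitrary still-unclustered vertices (edge contraction omitted)."""
    unclustered = set(adj)
    clusters = []
    while unclustered:
        u = next(iter(unclustered))
        # Radius ~ exponential(beta), capped at R: a continuous stand-in
        # for the geometric distribution G(beta) in the pseudocode.
        r = min(rng.expovariate(beta), R)
        dist = dijkstra(adj, u)
        ball = {v for v in unclustered if dist.get(v, float("inf")) <= r}
        clusters.append(ball)
        unclustered -= ball
    return clusters
```

Since each ball contains its own center, the loop always makes progress, and the returned clusters form a partition of the vertex set.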

Given this randomized decomposition of our graph, we can construct the desired “cut” set that preserves distances via a simple random compression scheme that combines clusters from the above process. Specifically, we take each cluster from Theorem 4.5 and independently assign it to one side of the constructed cut, grouping all the clusters into one of two groups. Within these groups, we then merge the clusters to obtain our desired cut sets, SS and VSV\setminus S. The following lemma verifies that this is a distance preserving cut, and the pseudocode is presented in Algorithm 2 for clarity. The proof is deferred to the appendix due to space constraints.

Algorithm 2 Randomized (β,R,ϵ)(\beta,R,\epsilon)-Cut Decomposition
1:  𝒞Ldrd(G,R)\mathcal{C}\leftarrow\textsc{Ldrd}(G,R)
2:  SS\leftarrow\emptyset
3:  for C𝒞C\in\mathcal{C} do
4:     Pick τ{0,1}\tau\in\{0,1\} uniformly at random
5:     if τ=1\tau=1 then
6:        SSCS\leftarrow S\cup C
7:     end if
8:  end for
9:  Return: (S,VS)(S,V\setminus S)
Lemma 4.6.

Given a value 1RΔ1\leq R\leq\Delta, let {Ci}i=1k\{C_{i}\}_{i=1}^{k} be the weak decomposition of the graph satisfying the properties of Theorem 4.5, and define the cut SS as S:=i[k]:xi=1Ci,S:=\cup_{i\in[k]:x_{i}=1}C_{i}, where x1,,xkx_{1},\dots,x_{k} is a sequence of i.i.d. Bernoulli variables with parameter 12\frac{1}{2}. The cut SS is a (β,R,ϵ)(\beta,R,\epsilon)-distance preserving cut with ϵ=R/2n\epsilon=R/2n.
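A minimal sketch of Algorithm 2's coin-flipping step, together with a helper for the cut event of Definition 4.1 (the clustering itself is assumed to be given, e.g., by the LDRD procedure):

```python
import random

def random_cut(clusters, rng=random):
    """Algorithm 2 sketch: include each cluster in S with probability 1/2."""
    S = set()
    for C in clusters:
        if rng.random() < 0.5:
            S |= set(C)
    return S

def is_cut(S, u, v):
    """The event cut_{u,v}: u and v land on different sides of (S, V \\ S)."""
    return (u in S) != (v in S)
```

On a toy clustering, repeated draws confirm the two behaviors the lemma relies on: vertices sharing a cluster are never separated, while vertices in different clusters are separated in roughly half of the draws.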

4.2 Embedding Procedure

We now proceed to show how to obtain an embedding of the graph using the distance preserving cuts of the previous section. Let Δ\Delta be an upper bound on the diameter of the graph GG. We define our embedding that builds upon Definition 4.1 as follows.

Definition 4.7.

Given a sequence of cuts (S1,,Sr)(S_{1},\dots,S_{r}) and parameters (R1,,Rr)(R_{1},\dots,R_{r}), we define the characteristic embedding of (Si,Ri)i=1r\left(\,S_{i},R_{i}\,\right)_{i=1}^{r} as a mapping ρ:Vr\rho:V\to{\mathbb{R}}^{r} that sets the ii-th coordinate of ρ(v)\rho(v) to RiR_{i} if vSiv\in S_{i} and to 0 otherwise, i.e., ρ(v)i:=Ri𝟙{vSi}.\rho(v)_{i}:=R_{i}\cdot{\mathds{1}\left\{v\in S_{i}\right\}}.
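As a sketch, the characteristic embedding of Definition 4.7 is a one-liner, with ℓp distances between embedded points computed coordinate-wise (the cut sets and radii below are illustrative placeholders):

```python
def characteristic_embedding(v, cuts, radii):
    """rho(v)_i = R_i if v is in S_i, and 0 otherwise (Definition 4.7)."""
    return [R if v in S else 0.0 for S, R in zip(cuts, radii)]

def lp_distance(x, y, p):
    """The l_p distance between two embedded points."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
```

For example, with cuts {1, 2} and {2, 3} at radii 1 and 2, vertex 2 maps to (1, 2) while vertices 1 and 3 each receive a single nonzero coordinate.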

We note the difference between our embedding procedure and existing embedding procedures into p\ell_{p} space. The standard approach to the problem is to use Fréchet embeddings: each coordinate ρi(v)\rho_{i}(v) is of the form dG(v,Si)d_{G}(v,S_{i}) for some set SiS_{i}. These sets are obtained either randomly or using the partitioning scheme of the Fakcharoenphol-Rao-Talwar (FRT) embedding (Fakcharoenphol et al., 2003). These procedures are not well-suited for the dynamic setting, however, because of the way their analysis upper bounds ρ(u)ρ(v)p\|{\rho(u)-\rho(v)}\|_{p} for every pair u,vu,v. Specifically, in order to bound ρ(u)ρ(v)p\|{\rho(u)-\rho(v)}\|_{p}, these approaches rely on

|ρi(u)ρi(v)|=|dG(u,Si)dG(v,Si)|dG(u,v),\displaystyle\left|\rho_{i}(u)-\rho_{i}(v)\right|=\left|d_{G}(u,S_{i})-d_{G}(v,S_{i})\right|\leq d_{G}(u,v)\ ,

where the inequality follows from the triangle inequality. In the dynamic setting, however, distances can only be maintained (efficiently) approximately. This means that ρi(u)\rho_{i}(u) and ρi(v)\rho_{i}(v) would each only be within a (1+ϵ)(1+\epsilon) factor of dG(u,Si)d_{G}(u,S_{i}) and dG(v,Si)d_{G}(v,S_{i}); the difference |ρi(u)ρi(v)|\left|\rho_{i}(u)-\rho_{i}(v)\right| can then exceed dG(u,v)d_{G}(u,v) by an additive term of order ϵdG(u,Si)\epsilon\cdot d_{G}(u,S_{i}), which may be far larger than dG(u,v)d_{G}(u,v) and would degrade our guarantees when maintained dynamically.
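To see concretely why a (1+ϵ) approximation hurts here, consider the following toy numbers (our own illustration, with hypothetical distance values not taken from the paper): exact Fréchet coordinates are 1-Lipschitz, but an approximate coordinate can shift by ϵ·dG(u,Si), which can dwarf dG(u,v).

```python
# Exact Frechet coordinates satisfy |d(u,S) - d(v,S)| <= d(u,v).
d_u_S, d_v_S, d_u_v = 100.0, 101.0, 1.0
assert abs(d_u_S - d_v_S) <= d_u_v

# With (1+eps)-approximate distances, the coordinate gap can blow up:
eps = 0.25
approx_u = d_u_S * (1 + eps)  # 125.0: a valid (1+eps)-approximation of 100.0
approx_v = d_v_S              # 101.0: exact, hence also a valid approximation
gap = abs(approx_u - approx_v)
assert gap == 24.0            # far larger than d(u,v) = 1.0
```

The coordinate gap of 24 against a true distance of 1 illustrates why the characteristic embedding avoids distance-valued coordinates altogether.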

We now leverage the key characteristics for a set of distance preserving cuts to demonstrate that the corresponding characteristic embedding preserves the original distances in the p\ell_{p}-metric with only polylogarithmic distortion.

Theorem 4.8.

Given a graph G=(V,E)G=(V,E) and a parameter Δ\Delta that is a power of 22 satisfying Δdiam(G)\Delta\geq\text{diam}(G), let S1,,Slog(Δ)+1S_{1},\dots,S_{\log(\Delta)+1} be (random) subsets of VV such that SiS_{i} is a (βi,Ri,ϵi)(\beta_{i},R_{i},\epsilon_{i})-cut with Ri=2i2R_{i}=2^{i-2} and (βi,ϵi)=(O(log2nRi),Ω(Rin))(\beta_{i},\epsilon_{i})=(O(\frac{\log^{2}n}{R_{i}}),\Omega(\frac{R_{i}}{n})), and let ρ:VlogΔ+1\rho:V\to{\mathbb{R}}^{\log\Delta+1} be the characteristic embedding of these cuts. For every pair of vertices u,vu,v and any p[1,)p\in[1,\infty):

14d(u,v)𝔼[ρ(u)ρ(v)p]O(log3n)d(u,v).\displaystyle\frac{1}{4}\cdot d(u,v)\leq\mathbb{E}\left[{\|{\rho(u)-\rho(v)}\|_{p}}\right]\leq O(\log^{3}n)d(u,v).

We note that the constant 1/41/4 can be easily removed by multiplying the embedding vectors by 44. Equipped with this static embedding that only distorts pairwise distances by a polylogarithmic factor, we proceed to adapt the structure to efficiently modify the cut-set decomposition of GG through a sequence of (adversarially chosen) edge weight increases.

5 Dynamic Algorithm

In this section, we prove Theorem 1.3 (because of the page limit, we avoid repeating statements of longer theorems). We do so by constructing an algorithm that dynamically maintains an embedding of a metric induced by a graph GG into p\ell_{p} space of dimension O(logΔ)O(\log{\Delta}).

Our construction starts by observing a reduction. Informally, in the next theorem we show that in order to dynamically maintain the desired embedding, it suffices to have a dynamic algorithm that, for every 1R2Δ1\leq R\leq 2\Delta, maintains a (O(log2nR),R,Ω(Rn))(O(\frac{\log^{2}{n}}{R}),R,\Omega(\frac{R}{n}))-distance preserving cut.

Theorem 5.1.

Assume we are given an algorithm 𝒜\mathcal{A} that takes as input the parameter RR and decrementally maintains a (β,R,ϵ)(\beta,R,\epsilon) distance preserving cut SS for a graph GG undergoing edge weight increases, outputting changes to SS after each such update, where β:=O(log2nR)\beta:=O(\frac{\log^{2}{n}}{R}) and ϵ=Ω(Rn)\epsilon=\Omega(\frac{R}{n}). Assume further that the total running time of the algorithm 𝒜\mathcal{A} is bounded by t(m,n)t(m,n) whp. Then there is a decremental dynamic algorithm that maintains an embedding of the vertices ρ:VlogΔ+1\rho:V\to{\mathbb{R}}^{\log{\Delta}+1}, where Δ\Delta is a power of 22 always satisfying Δdiam(G)\Delta\geq\text{diam}(G) (for instance, we can set Δ\Delta to be the smallest power of 22 larger than nWnW), that has expected (over the internal randomization of the algorithm) distortion at most O(log3n)O(\log^{3}{n}) and running time at most O(t(m,n)logΔ)O\left(t(m,n)\log{\Delta}\right), whp.

We now show how to maintain a distance preserving cut dynamically, which in turn leads to a dynamic embedding algorithm via Theorem 5.1 and completes the proof of the main result, Theorem 1.3. We start by observing that a (β,δ)(\beta,\delta)-weak decomposition of GG can be dynamically maintained. We highlight that the authors of (Forster et al., 2021), building upon the approach of (Chechik & Zhang, 2020), have already proved such a result under edge deletions (their Corollary 3.8). The proof of our observation adapts their techniques to the slightly modified definition of dynamic changes that we invoke here to handle the continuous nature of p\ell_{p} space.

Lemma 5.2.

For every β(0,1)\beta\in(0,1) and δ=(6(a+2)(2+logm)lnn)β1=O(aβ1log2n)\delta=(6(a+2)(2+\log m)\ln n)\beta^{-1}=O(a\beta^{-1}\log^{2}n), where a1a\geq 1 is a given constant controlling the success probability, there is a decremental algorithm to maintain a probabilistic weak (β,δ)(\beta,\delta)-decomposition of a weighted, undirected graph undergoing increases of edge weights that with high probability has total update time O(m1+o(1)logW+Qlogn)O(m^{1+o(1)}\log W+Q\log n), where QQ is the total number of updates to the input graph, and (within this running time) is able to report all nodes and incident edges of every cluster that is formed. Over the course of the algorithm, each change to the partitioning of the nodes into clusters happens by splitting an existing cluster into two or several clusters and each node changes its cluster at most O(logn)O(\log n) times.

Equipped with this tool, we can present the main contribution of this section: the maintenance of a (β,R,ϵ)(\beta,R,\epsilon)-distance preserving cut under dynamic edge weight increases.

Lemma 5.3.

For every 0R2Δ0\leq R\leq 2\Delta, there is a decremental dynamic algorithm that maintains a (β,R,ϵ)\left(\beta,R,\epsilon\right)-distance preserving cut of a weighted, undirected graph GG, where β=O(log2(n)/R)\beta=O(\log^{2}(n)/R) and ϵ=Ω(R/n)\epsilon=\Omega(R/n). Its total update time is O(m1+o(1)log2W+Qlogn)O(m^{1+o(1)}\log^{2}W+Q\log n) with high probability, where QQ is the total number of updates to the input graph, and, within this running time, it explicitly reports all changes to the maintained cut.

The synthesis of these two lemmas with the result of Theorem 5.1 yields the overall dynamic embedding of Theorem 1.3.

6 Conclusion

We here present the first dynamic embedding into p\ell_{p} space which is equipped to handle edge weight increases – a non-trivial extension of the seminal Bourgain and JL embedding results (Bourgain, 1985; Johnson et al., 1986). Most notably, our embedding incurs only a polylogarithmic distortion of the base metric and exhibits an update time on par with the best known results for APSP and other embedding-based problems. Our embedding procedure additionally reports any modifications within polylogarithmic time and is naturally well suited to the class of NP-hard optimization problems that rely on Euclidean geometry for approximations to the optimal solution. To supplement our algorithmic result, we further present a lower bound for the fully dynamic setting where edge weights can be increased or decreased. In particular, we show that no algorithm can achieve low distortion with high probability without inheriting an update time of Ω(n)\Omega(n), which makes such a procedure inefficient in practice.

Impact Statement

This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Acknowledgements

We thank the anonymous reviewers for their valuable feedback. This work is partially supported by DARPA QuICC, an ONR MURI 2024 award on Algorithms, Learning, and Game Theory, Army Research Laboratory (ARL) grant W911NF2410052, and NSF AF:Small grants 2218678, 2114269, and 2347322. Max Springer was supported by the National Science Foundation Graduate Research Fellowship Program under Grant No. DGE 1840340. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

References

  • Abraham et al. (2006) Abraham, I., Bartal, Y., and Neimany, O. Advances in metric embedding theory. In Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pp.  271–286, 2006.
  • Arora et al. (2005) Arora, S., Lee, J. R., and Naor, A. Euclidean distortion and the sparsest cut. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pp.  553–562, 2005.
  • Arora et al. (2009) Arora, S., Rao, S., and Vazirani, U. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):1–37, 2009.
  • Aumann & Rabani (1998) Aumann, Y. and Rabani, Y. An O(log k) approximate min-cut max-flow theorem and approximation algorithm. SIAM Journal on Computing, 27(1):291–301, 1998.
  • Awerbuch & Azar (1997a) Awerbuch, B. and Azar, Y. Buy-at-bulk network design. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pp.  542–547, 1997a. doi: 10.1109/SFCS.1997.646143.
  • Awerbuch & Azar (1997b) Awerbuch, B. and Azar, Y. Buy-at-bulk network design. In Proceedings 38th Annual Symposium on Foundations of Computer Science, pp.  542–547. IEEE, 1997b.
  • Bartal (1996) Bartal, Y. Probabilistic approximation of metric spaces and its algorithmic applications. In Proceedings of 37th Conference on Foundations of Computer Science, pp.  184–193. IEEE, 1996.
  • Bartal (2004) Bartal, Y. Graph decomposition lemmas and their role in metric embedding methods. In Albers, S. and Radzik, T. (eds.), Algorithms – ESA 2004, pp.  89–97, Berlin, Heidelberg, 2004. Springer Berlin Heidelberg. ISBN 978-3-540-30140-0.
  • Bernstein (2009) Bernstein, A. Fully dynamic (2+ ε\varepsilon) approximate all-pairs shortest paths with fast query and close to linear update time. In 2009 50th Annual IEEE Symposium on Foundations of Computer Science, pp.  693–702. IEEE, 2009.
  • Bhattacharya et al. (2022) Bhattacharya, S., Lattanzi, S., and Parotsidis, N. Efficient and stable fully dynamic facility location. Advances in neural information processing systems, 35:23358–23370, 2022.
  • Blum et al. (1998) Blum, A., Konjevod, G., Ravi, R., and Vempala, S. Semi-definite relaxations for minimum bandwidth and other vertex-ordering problems. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp.  100–105, 1998.
  • Bourgain (1985) Bourgain, J. On lipschitz embedding of finite metric spaces in hilbert space. Israel Journal of Mathematics, 52:46–52, 1985.
  • Charikar et al. (1998) Charikar, M., Chekuri, C., Goel, A., and Guha, S. Rounding via trees: Deterministic approximation algorithms for group steiner trees and k-median. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp.  114–123, 1998. ISSN 0734-9025.
  • Chawla et al. (2008) Chawla, S., Gupta, A., and Räcke, H. Embeddings of negative-type metrics and an improved approximation to generalized sparsest cut. ACM Transactions on Algorithms (TALG), 4(2):1–18, 2008.
  • Chechik (2018) Chechik, S. Near-optimal approximate decremental all pairs shortest paths. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp.  170–181. IEEE, 2018.
  • Chechik & Zhang (2020) Chechik, S. and Zhang, T. Dynamic low-stretch spanning trees in subpolynomial time. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.  463–475. SIAM, 2020.
  • Church (2017) Church, K. W. Word2vec. Natural Language Engineering, 23(1):155–162, 2017.
  • Dunagan & Vempala (2001) Dunagan, J. and Vempala, S. On euclidean embeddings and bandwidth minimization. In International Workshop on Randomization and Approximation Techniques in Computer Science, pp.  229–240. Springer, 2001.
  • Dütting et al. (2023) Dütting, P., Fusco, F., Lattanzi, S., Norouzi-Fard, A., and Zadimoghaddam, M. Fully dynamic submodular maximization over matroids. In International Conference on Machine Learning, pp. 8821–8835. PMLR, 2023.
  • Fakcharoenphol et al. (2003) Fakcharoenphol, J., Rao, S., and Talwar, K. A tight bound on approximating arbitrary metrics by tree metrics. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC ’03, pp.  448–455, New York, NY, USA, 2003. Association for Computing Machinery. ISBN 1581136749. doi: 10.1145/780542.780608. URL https://doi.org/10.1145/780542.780608.
  • Feige (1998) Feige, U. Approximating the bandwidth via volume respecting embeddings. In Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp.  90–99, 1998.
  • Forster & Goranci (2019) Forster, S. and Goranci, G. Dynamic low-stretch trees via dynamic low-diameter decompositions. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pp.  377–388, 2019.
  • Forster et al. (2021) Forster, S., Goranci, G., and Henzinger, M. Dynamic maintenance of low-stretch probabilistic tree embeddings with applications. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp.  1226–1245. SIAM, 2021.
  • Forster et al. (2023) Forster, S., Goranci, G., Nazari, Y., and Skarlatos, A. Bootstrapping dynamic distance oracles. arXiv preprint arXiv:2303.06102, 2023.
  • Garg et al. (2000) Garg, N., Konjevod, G., and Ravi, R. A polylogarithmic approximation algorithm for the group steiner tree problem. Journal of Algorithms, 37(1):66–84, 2000. ISSN 0196-6774. doi: https://doi.org/10.1006/jagm.2000.1096. URL https://www.sciencedirect.com/science/article/pii/S0196677400910964.
  • Henzinger et al. (2018) Henzinger, M., Krinninger, S., and Nanongkai, D. Decremental single-source shortest paths on undirected graphs in near-linear total update time. Journal of the ACM (JACM), 65(6):1–40, 2018.
  • Indyk (2001) Indyk, P. Algorithmic applications of low-distortion embeddings. In Proc. 42nd IEEE Symposium on Foundations of Computer Science, pp.  1, 2001.
  • Matoušek (2002) Matoušek, J. Lectures on discrete geometry. Graduate Texts in Mathematics, Springer, 2002. ISSN 0072-5285. doi: 10.1007/978-1-4613-0039-7.
  • Johnson et al. (1986) Johnson, W. B., Lindenstrauss, J., and Schechtman, G. Extensions of lipschitz maps into banach spaces. Israel Journal of Mathematics, 54(2):129–138, 1986. doi: 10.1007/BF02764938. URL https://doi.org/10.1007/BF02764938.
  • Kleinberg & Tardos (2002) Kleinberg, J. and Tardos, E. Approximation algorithms for classification problems with pairwise relationships: Metric labeling and markov random fields. J. ACM, 49(5):616–639, sep 2002. ISSN 0004-5411. doi: 10.1145/585265.585268. URL https://doi.org/10.1145/585265.585268.
  • Krauthgamer et al. (2004) Krauthgamer, R., Lee, J. R., Mendel, M., and Naor, A. Measured descent: A new embedding method for finite metrics. In 45th Annual IEEE Symposium on Foundations of Computer Science, pp.  434–443. IEEE, 2004.
  • Larsen & Nelson (2017) Larsen, K. G. and Nelson, J. Optimality of the johnson-lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pp.  633–638. IEEE, 2017.
  • Leskovec & Krevl (2014) Leskovec, J. and Krevl, A. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
  • Linial et al. (1995) Linial, N., London, E., and Rabinovich, Y. The geometry of graphs and some of its algorithmic applications. Combinatorica, 15:215–245, 1995.
  • Roditty & Zwick (2004) Roditty, L. and Zwick, U. On dynamic shortest paths problems. In Algorithms–ESA 2004: 12th Annual European Symposium, Bergen, Norway, September 14-17, 2004. Proceedings 12, pp.  580–591. Springer, 2004.
  • Roditty & Zwick (2012) Roditty, L. and Zwick, U. Dynamic approximate all-pairs shortest paths in undirected graphs. SIAM Journal on Computing, 41(3):670–683, 2012.
  • Subercaze et al. (2015) Subercaze, J., Gravier, C., and Laforest, F. On metric embedding for boosting semantic similarity computations. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp.  8–14, 2015.
  • Thorup & Zwick (2005) Thorup, M. and Zwick, U. Approximate distance oracles. Journal of the ACM (JACM), 52(1):1–24, 2005.

Appendix A Empirical validation

We tested the theoretical guarantees of the algorithm on three different graphs.

Data set preparation. As the backbone for each graph, we used the social network of LastFM users from Asia, available in the Stanford Network Analysis Project (SNAP) dataset (Leskovec & Krevl, 2014). To adhere to our dynamic setting, we randomly chose subsets of 150, 300, and 600 connected nodes to form three different bases of the dynamically changing network. We added random weights, drawn from a uniform distribution, to these graphs. We then augmented each graph with 10000, 5000, and 1000 changes to the topology (queries), respectively. Each change increases the weight of a uniformly random edge of the graph by a number chosen from a uniform distribution whose range increases as the process progresses.

Evaluation. We implemented the cut-preserving embedding from Theorem 1.3 and computed the distances between every pair of nodes in the graph after each query. We compared the average of these distances with the average distance computed by an exact algorithm that computes the shortest-path distances offline after each query. Visualized results are presented in Figure 2.
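The comparison we plot can be sketched as follows (a self-contained toy mirroring the evaluation metric, not our actual implementation; the embedding and exact pairwise distances are passed in as plain dictionaries):

```python
from itertools import combinations

def average_embedded_distance(embedding, p=2):
    """Average l_p distance over all pairs of embedded points."""
    pairs = list(combinations(sorted(embedding), 2))
    total = 0.0
    for u, v in pairs:
        total += sum(abs(a - b) ** p
                     for a, b in zip(embedding[u], embedding[v])) ** (1.0 / p)
    return total / len(pairs)

def average_exact_distance(dist):
    """Average of exact pairwise distances given as {(u, v): d}."""
    return sum(dist.values()) / len(dist)
```

The ratio plotted in Figure 3 is then the first quantity divided by the second, recomputed after every query.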

Refer to caption
Refer to caption
Refer to caption
Figure 2: Visualization of average distances in a dynamically changing metric. The orange line represents the average distance between all pairs of nodes computed exactly, using a deterministic shortest path algorithm, after every query. The blue line represents the average distance computed based on the embedding given by the dynamic embedding algorithm proposed in the paper.

To allow more direct reasoning about the distortion of our embedding, in Figure 3 we provide plots representing, after each query, the ratio of the average distance based on our dynamic embedding to the average distance computed exactly. We note that our embedding is guaranteed to be non-contractive only in expectation; in practice, small fluctuations may appear, which are particularly visible in the case of a small number of queries.

Refer to caption
Refer to caption
Refer to caption
Figure 3: The ratio of the average distance between all pairs of points based on our embedding to the exact average distance between all pairs, after each query.

Conclusions. We observed that in these experiments the achieved stretch is within a constant factor of the average over the exact distances, which adheres to the O(log3n)O(\log^{3}{n}) theoretical bound of Theorem 1.3 and even surpasses it. This advantage might be a consequence of several factors, e.g., the random process involved in the data generation or the small number of test instances. Nevertheless, we find these results promising and hope that they can serve as a starting point for further investigation of the practicality of dynamic embeddings.

Appendix B Omitted Proofs

B.1 Static Algorithm

We briefly provide a sketch of the proof of Theorem 4.5 below and refer to Bartal (Bartal, 1996) for the full details.

Proof sketch.

We construct the so-called “low diameter randomized decomposition” (LDRD) for the input graph GG with parameter RR via the following high-level procedure: first, contract all paths between vertices that are shorter than R2n\frac{R}{2n}. Then, starting with the contracted graph, we pick an arbitrary vertex, uu, and select a radius rr sampled from the geometric distribution with success parameter β\beta; if rRr\geq R, we simply replace it with RR. Note that, with high probability, rr will not be replaced because of the choice of the parameters. Mark all unmarked vertices contained within the set Br(u)={vG:d(u,v)r}B_{r}(u)=\{v\in G:d(u,v)\leq r\} and define this to be a new cluster of the partition. Repeat this ball-growing process with the next unmarked vertex in GG, considering only the remaining unmarked vertices for inclusion in future clusters. This procedure is repeated until all vertices are marked, at which point the created clusters are returned. Pseudocode is provided in Algorithm 1.

To verify the theorem, we proceed to prove that each property of a (β,R)(\beta,R)-weak decomposition holds with the additional probabilistic bound. The second property holds by contraction of small edges. Therefore, we need only obtain an upper bound on the probability that the two vertices uu and vv are not assigned to the same cluster, i.e., C(u)C(v)C(u)\neq C(v). This final point is a standard argument attributed to (Bartal, 1996), which we overview below.

For some edge e=(u,v)e=(u,v), we need to bound the probability that ee is contained within none of the clusters of the decomposition. Specifically, we need to ensure that after each ball-growing step (Line 8 of Algorithm 1), ee has not been added to a cluster. We can therefore decompose this value into the probability that exactly one endpoint of ee was contained in the last ball grown and the probability that neither was. Let tt denote the stage of the ball-growing procedure and let XtX_{t} denote the event that edge (u,v)Cj(u,v)\notin C_{j} for all jtj\leq t. Then we can recursively compute Pr[Xt]\mathrm{Pr}\left[{X_{t}}\right] from the probability that exactly one of u,vu,v is contained in CtC_{t} and the probability that neither is, conditioned on Pr[Xt+1]\mathrm{Pr}\left[{X_{t+1}}\right]. The direct computation of these probabilities, using the probability density function of the radius distribution and the bound on the probability that an endpoint of (u,v)(u,v) is contained in CtC_{t}, is carried out in Section 3 of (Bartal, 1996) and yields the desired O(lognR)O\left(\frac{\log n}{R}\right) upper bound on the probability that C(u)C(v)C(u)\neq C(v). ∎

Proof of Lemma 4.6.

By the guarantee on the partition given by Theorem 4.5, for every pair of vertices u,vVu,v\in V we must have that Pr[u and v in different clusters]βd(u,v)\mathrm{Pr}\left[{u\text{ and }v\text{ in different clusters}}\right]\leq\beta\cdot d(u,v), and Pr[u and v in different clusters]=0\mathrm{Pr}\left[{u\text{ and }v\text{ in different clusters}}\right]=0 for all u,vu,v such that dG(u,v)<R2nd_{G}(u,v)<\frac{R}{2n}. Note that if the two nodes are in the same cluster, then they must be on the same side of the cut. Therefore, the first two properties of (β,R,ϵ)(\beta,R,\epsilon)-cuts are proved.

Now, further assume that u,vVu,v\in V are such that d(u,v)>Rd(u,v)>R. By construction, for all CiC_{i}, we have wdiam(Ci)R\textnormal{wdiam}(C_{i})\leq R. Therefore, the vertices uu and vv cannot be contained in the same cluster. Let CuC_{u} and CvCuC_{v}\neq C_{u} denote the clusters containing uu and vv respectively. By construction of the cut set SS, we have that Pr[cutCu,Cv]=12\mathrm{Pr}\left[{\texttt{cut}_{C_{u},C_{v}}}\right]=\frac{1}{2}. Therefore, Pr[cutu,v]1/2\mathrm{Pr}\left[{\texttt{cut}_{u,v}}\right]\geq 1/2 as claimed. ∎

Proof of Theorem 4.8.

We divide the proof into a few key Claims. For every pair of vertices u,vu,v, define dp(u,v):=ρ(u)ρ(v)pd^{\prime}_{p}(u,v):=\|{\rho(u)-\rho(v)}\|_{p}. We first lower bound 𝔼[dp(u,v)]\mathbb{E}\left[{d^{\prime}_{p}(u,v)}\right].

Claim 1.

For every pair of vertices u,vVu,v\in V:

𝔼[dp(u,v)]𝔼[d(u,v)]d(u,v)4.\displaystyle\mathbb{E}\left[{d^{\prime}_{p}(u,v)}\right]\geq\mathbb{E}\left[{d^{\prime}_{\infty}(u,v)}\right]\geq\frac{d(u,v)}{4}\ .
Proof.

The first inequality holds for any embedding ρ\rho since .p\|{.}\|_{p} is decreasing in pp; we therefore focus on the second inequality. We assume that d(u,v)0d(u,v)\neq 0 as otherwise the claim is trivially true.

Fix u,vu,v and set ii to be the maximal integer satisfying 2i2<d(u,v)2^{i-2}<d(u,v). By definition of ii, we must have 2i1d(u,v)2^{i-1}\geq d(u,v). We note that i[logΔ+1]i\in[\log\Delta+1]; we have i1i\geq 1 because 212<1d(u,v)2^{1-2}<1\leq d(u,v) and we have i<logΔ+2i<\log\Delta+2 because 2(logΔ+2)2=Δd(u,v)2^{(\log\Delta+2)-2}=\Delta\geq d(u,v).

Consider the cut SiS_{i}. Since d(u,v)>Rid(u,v)>R_{i}, by definition of a distance preserving cut, we have Pr[cutu,v(Si)]12\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right]\geq\frac{1}{2}, which implies that with probability at least 12\frac{1}{2}, the ii-th coordinates of ρ(u)\rho(u) and ρ(v)\rho(v) differ. This implies the claim by the definition of the characteristic embedding. Formally,

𝔼[d(u,v)]\displaystyle\mathbb{E}\left[{d^{\prime}_{\infty}(u,v)}\right] 𝔼[|ρi(u)ρi(v)|]\displaystyle\geq\mathbb{E}\left[{\left|\rho_{i}(u)-\rho_{i}(v)\right|}\right] (Definition of .\|{.}\|_{\infty})
=RiPr[cutu,v(Si)]\displaystyle=R_{i}\cdot\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right] (Definition of ρi\rho_{i})
Ri2\displaystyle\geq\frac{R_{i}}{2} (Assumption on SiS_{i})
d(u,v)4\displaystyle\geq\frac{d(u,v)}{4} (Since 2Rid(u,v)).\displaystyle\text{\emph{(Since $2R_{i}\geq d(u,v)$)}}.

thus proving the claim. ∎

We now upper bound 𝔼[dp(u,v)]\mathbb{E}\left[{d^{\prime}_{p}(u,v)}\right].

Claim 2.

For every pair of vertices u,vVu,v\in V:

𝔼[dp(u,v)]𝔼[d1(u,v)]O(log2n)d(u,v).\displaystyle\mathbb{E}\left[{d^{\prime}_{p}(u,v)}\right]\leq\mathbb{E}\left[{d^{\prime}_{1}(u,v)}\right]\leq O(\log^{2}n)\cdot d(u,v)\ .
Proof.

As before, the first inequality follows from the fact that .p\|{.}\|_{p} is decreasing in pp. We therefore focus on the second inequality. For convenience, we denote d:=d(u,v)d:=d(u,v). Note that, for every scale ii, we have

𝔼[ρi(u)ρi(v)1]\displaystyle\mathbb{E}\left[{\|{\rho_{i}(u)-\rho_{i}(v)}\|_{1}}\right] =𝔼[Ri𝟙{cutu,v(Si)}]\displaystyle=\mathbb{E}\left[{R_{i}\cdot{\mathds{1}\left\{\texttt{cut}_{u,v}(S_{i})\right\}}}\right]
=RiPr[cutu,v(Si)].\displaystyle=R_{i}\cdot\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right]. (2)

By definition, the 1\ell_{1} norm is simply the sum of the above over all scales RiR_{i}:

𝔼[ρ(u)ρ(v)1]\displaystyle\mathbb{E}\left[{\|{\rho(u)-\rho(v)}\|_{1}}\right] =i=1logΔ+1𝔼[|ρi(u)ρi(v)|]\displaystyle=\sum_{i=1}^{\log\Delta+1}\mathbb{E}\left[{|\rho_{i}(u)-\rho_{i}(v)|}\right]
=i=1logΔ+1RiPr[cutu,v(Si)].\displaystyle=\sum_{i=1}^{\log\Delta+1}R_{i}\cdot\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right].

We proceed to bound this summation by bounding the individual summands at each manageable scale, using the properties of our distance preserving cuts from Section 4.1.

We divide the above sum into three parts, depending on the relationship between the parameters and dd:

  • Case 1: d>Rid>R_{i}. For these ii, we simply bound Pr[cutu,v(Si)]\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right] with 11.

  • Case 2: d<ϵid<\epsilon_{i}. In this case, since the cut SiS_{i} is distance preserving, we must have Pr[cutu,v(Si)]=0\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right]=0.

  • Case 3: ϵidRi\epsilon_{i}\leq d\leq R_{i}. Since SiS_{i} is distance preserving, we know that Pr[cutu,v(Si)]βid(u,v)\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right]\leq\beta_{i}d(u,v).

We proceed to combine the three cases. We first note that any ii in Case 11 satisfies Ri<dR_{i}<d which means 2i2<d2^{i-2}<d or equivalently i<log(d)+2i<\log(d)+2. Therefore,

i:Ri<dRii:2i2<d2i2i=1logd+22i22logd+1=O(d).\displaystyle\sum_{i:R_{i}<d}R_{i}\leq\sum_{i:2^{i-2}<d}2^{i-2}\leq\sum_{i=1}^{\left\lceil\log d\right\rceil+2}2^{i-2}\leq 2^{\left\lceil\log d\right\rceil+1}=O(d).
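This geometric-sum bound is easy to check numerically. The following standalone sketch (illustrative only; the function name is ours) sums R_i = 2^{i-2} over the Case 1 scales and confirms the total stays within a constant factor of dd:

```python
def case1_mass(d: float) -> float:
    """Sum of R_i = 2**(i - 2) over all scales i >= 1 with R_i < d."""
    total, i = 0.0, 1
    while 2 ** (i - 2) < d:
        total += 2 ** (i - 2)
        i += 1
    return total

# The sum telescopes to less than twice its largest term, so it is below 2d,
# a concrete instance of the O(d) bound.
for d in [1.0, 3.0, 100.0, 2.0 ** 20]:
    assert case1_mass(d) < 2 * d
```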

As for Case 3,

i:ϵidRiRiPr[cutu,v(Si)]i:ϵidRiβiRid(u,v)d(u,v)O(log2n)|i:ϵidRi|,\displaystyle\sum_{i:\epsilon_{i}\leq d\leq R_{i}}R_{i}\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right]\leq\sum_{i:\epsilon_{i}\leq d\leq R_{i}}\beta_{i}R_{i}d(u,v)\leq d(u,v)\cdot O(\log^{2}n)\cdot\left|i:\epsilon_{i}\leq d\leq R_{i}\right|, (3)

where for the final inequality we have used the assumption βiO(log2nRi)\beta_{i}\leq O(\frac{\log^{2}n}{R_{i}}).

We proceed to bound |i:ϵidRi|\left|i:\epsilon_{i}\leq d\leq R_{i}\right|. Let cc be the constant such that ϵicRi/n\epsilon_{i}\geq cR_{i}/n. In order for dd to satisfy d[ϵi,Ri]d\in[\epsilon_{i},R_{i}], we must have d2i2d\leq 2^{i-2} which means ilog(d)+2i\geq\log(d)+2 and c2i2/ndc2^{i-2}/n\leq d which means ilog(nd/c)+2i\leq\log(nd/c)+2. It follows that

|i:ϵidRi|(log(nd/c)+2)(log(d)+2)+1=log(n)+log(c1)+1,\displaystyle\left|i:\epsilon_{i}\leq d\leq R_{i}\right|\leq(\log(nd/c)+2)-(\log(d)+2)+1=\log(n)+\log(c^{-1})+1,

which is O(log(n)+1)=O(logn)O(\log(n)+1)=O(\log n). Plugging this back into Equation (3), we obtain

i:ϵidRiRiPr[cutu,v(Si)]O(log3n)d(u,v).\displaystyle\sum_{i:\epsilon_{i}\leq d\leq R_{i}}R_{i}\mathrm{Pr}\left[{\texttt{cut}_{u,v}(S_{i})}\right]\leq O(\log^{3}n)d(u,v).

Combining the above cases, and using Equation (2), we conclude that

i=1logΔ+1𝔼[|ρi(u)ρi(v)|]O(log3(n))d(u,v).\displaystyle\sum_{i=1}^{\log\Delta+1}\mathbb{E}\left[{\left|\rho_{i}(u)-\rho_{i}(v)\right|}\right]\leq O(\log^{3}(n))d(u,v).

Combining the two claims, we have the result of Theorem 4.8. ∎

B.2 Dynamic Algorithm

Proof of Theorem 5.1.

The decremental algorithm uses log(Δ)+1\log(\Delta)+1 instances of algorithm 𝒜\mathcal{A} with the given choice of parameters, which allows us to follow the argument provided for the static case in Section 4. Let Ri=2iR_{i}=2^{i} for 1ilog(Δ)+11\leq i\leq\log(\Delta)+1. The ii-th instance of the algorithm 𝒜\mathcal{A} maintains a (O(log2nRi),Ri,Ω(Rin))\left(O(\frac{\log^{2}{n}}{R_{i}}),R_{i},\Omega(\frac{R_{i}}{n})\right)-distance preserving cut SiS_{i}. The final embedding ρ:Vlog(Δ)+1\rho:V\rightarrow{\mathbb{R}}^{\log(\Delta)+1} is the characteristic embedding of the cuts (S1,,Slog(Δ)+1)(S_{1},\ldots,S_{\log(\Delta)+1}) and parameters (R1,,Rlog(Δ)+1)\left(R_{1},\ldots,R_{\log(\Delta)+1}\right). Formally, we set the ii-th coordinate of ρ(v)\rho(v) to be RiR_{i} if vSiv\in S_{i} and to 0 otherwise.
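Once the cuts are available, the characteristic embedding is immediate to compute. The following Python sketch (names are illustrative; the paper does not prescribe an implementation) sets the i-th coordinate of a vertex to R_i exactly when the vertex lies in S_i:

```python
def characteristic_embedding(vertices, cuts, radii):
    """Map each vertex to the vector whose i-th coordinate is radii[i]
    if the vertex belongs to the cut cuts[i], and 0 otherwise."""
    return {
        v: [radii[i] if v in cuts[i] else 0.0 for i in range(len(cuts))]
        for v in vertices
    }

# Toy example with two scales, R_1 = 2 and R_2 = 4.
emb = characteristic_embedding(["u", "v", "w"], [{"u", "v"}, {"u"}], [2.0, 4.0])
assert emb["u"] == [2.0, 4.0] and emb["v"] == [2.0, 0.0] and emb["w"] == [0.0, 0.0]
# Only the second cut separates u and v, so their l_1 distance is R_2 = 4.
assert sum(abs(a - b) for a, b in zip(emb["u"], emb["v"])) == 4.0
```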

Upon the arrival of a decremental change, the algorithm forwards this change to every instance of the dynamic decremental algorithm 𝒜\mathcal{A} it runs. By assumption on the input algorithm 𝒜\mathcal{A}, each run explicitly outputs the changes to the structure of the ii-th cut. Therefore, the main algorithm can adapt to these changes by appropriately changing the coordinates of vertices. Specifically, if a vertex is added to the ii-th cut, its ii-th coordinate changes from 0 to RiR_{i}, and if it is removed the opposite change takes place. Since processing each such change takes constant time, and there are t(m,n)t(m,n) updates in total by assumption on 𝒜\mathcal{A}, the total time for the ii-th instance is t(m,n)t(m,n). Therefore, by charging the time used for these changes to the total update time of the O(logΔ)O(\log{\Delta}) instances of the algorithm 𝒜\mathcal{A}, the total running time of the decremental algorithm is at most O(t(m,n)logΔ)O\left(t(m,n)\log{\Delta}\right). As for the distortion, by assumption on algorithm 𝒜\mathcal{A}, each run maintains a (O(log2nRi),Ri,Ω(Rin))\left(O(\frac{\log^{2}{n}}{R_{i}}),R_{i},\Omega(\frac{R_{i}}{n})\right)-distance preserving cut. Thus, the distortion of the maintained embedding ρ\rho follows from applying Theorem 4.8. ∎
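The coordinate maintenance described in the proof amounts to flipping one entry per reported cut change; a minimal sketch (the update format and names are assumptions for illustration, not the paper's interface):

```python
def apply_cut_changes(embedding, i, radius, added, removed):
    """Update the i-th coordinate of affected vertices after the i-th
    instance reports changes to its cut: a vertex entering the cut gets
    coordinate R_i, a vertex leaving it gets 0. Each change costs O(1)."""
    for v in added:
        embedding[v][i] = radius
    for v in removed:
        embedding[v][i] = 0.0

emb = {"u": [2.0, 4.0], "v": [2.0, 0.0]}
apply_cut_changes(emb, 1, 4.0, added=["v"], removed=["u"])
assert emb == {"u": [2.0, 0.0], "v": [2.0, 4.0]}
```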

Proof sketch of Lemma 5.2.

At a high level, the dynamic algorithm presented in (Forster et al., 2021) for maintaining a weak decomposition undergoing edge deletions relies on the concept of assigning a center to each cluster in the decomposition, an idea initially introduced in (Chechik & Zhang, 2020).

This technique employs a dynamic Single Source Shortest Paths (SSSP) algorithm (specifically, the SSSP algorithm described in (Henzinger et al., 2018)) to monitor the distances from a center to every vertex within the cluster. Whenever an edge is deleted, the change is also forwarded to the SSSP algorithm. The SSSP algorithm then outputs the vertices of the cluster whose distance to the center is greater than a certain threshold, such that keeping these vertices within the cluster could potentially violate the requirements of a (β,R)(\beta,R)-weak decomposition. To prevent this, the appearance of such a vertex triggers either a re-centering of the current cluster or a split of the cluster into two or more disjoint new clusters. These operations ensure that eventually the diameter of each cluster satisfies the requirements of the (β,R)(\beta,R)-weak decomposition. Crucially, the authors show that the number of times a cluster is re-centered can be bounded by alogna\log{n}, for some absolute constant aa. On the other hand, the splitting procedure is designed in such a way that the size of a cluster that is split off from the previous cluster shrinks by at least a factor of 22. As a result, any vertex can be moved to a new cluster at most O(logn)O(\log{n}) times.
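The counting behind the O(log n) bound on cluster changes can be isolated in a toy calculation (this only illustrates the charging argument, not the algorithm of (Forster et al., 2021)): each time a vertex moves, it lands in a part of at most half the size of its old cluster, so it can move at most roughly log2(n) times.

```python
import math

def max_moves(n: int) -> int:
    """Worst-case number of cluster changes for a single vertex when every
    split moves the vertex into a part of at most half the old size."""
    size, moves = n, 0
    while size > 1:
        size //= 2  # the vertex falls into the (at most) half-size part
        moves += 1
    return moves

# The number of cluster changes per vertex is at most ceil(log2(n)).
for n in [2, 7, 1024]:
    assert max_moves(n) <= math.ceil(math.log2(n))
```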

Now, the crucial observation that allows us to carry this approach over to the case of edge weight increases is that the SSSP algorithm of (Henzinger et al., 2018) also handles edge weight increases while preserving the same complexity bounds. The algorithm is then the same as in (Forster et al., 2021), with the only change that, to monitor the distance from the center of a cluster to every other vertex, we use the SSSP version that supports edge weight increases. As for correctness, we can carry over the analysis from (Forster et al., 2021) with minor changes. In Lemma 3.3, we show that the probability of being an inter-cluster edge is at most βwk(e)\beta w_{k}(e), where wk(e)w_{k}(e) denotes the weight of edge ee in the current dynamic graph GkG_{k} after kk updates. By the union bound, this implies that for every pair of vertices u,vGku,v\in G_{k} it holds that Pr[C(u)C(v)]βdGk(u,v)\mathrm{Pr}\left[{C(u)\neq C(v)}\right]\leq\beta\cdot d_{G_{k}}(u,v). From the fact that weight updates only increase distances, we observe that Lemmas 3.5 and 3.6 from (Forster et al., 2021) still hold (here, it is also important that updates are independent of the algorithm's randomness, which is true in our model). This observation ultimately gives us that each cluster undergoes a center re-assignment at most O(alogn)O(a\cdot\log{n}) times. As a consequence, our derivation of the total update time follows from the one in (Forster et al., 2021), with the change that we account for O(Qlogn)O(Q\log n) time to parse all updates, since each edge is present in at most O(logn)O(\log n) different SSSP instances in their algorithm. ∎

Proof of Lemma 5.3.

Let GG^{\prime} denote the graph obtained by replacing the weight of every edge in GG of weight ϵ\leq\epsilon with 0. It is easy to see that we can maintain the graph GG^{\prime} with Θ(m+Q)\Theta(m+Q) total update time; we start by preprocessing GG at the beginning of the algorithm. Next, whenever there is an update to an edge weight, we pass the update along to GG^{\prime} if the new value of the weight is strictly larger than ϵ\epsilon and ignore it if it is at most ϵ\epsilon. Note that in the latter case the old value is at most ϵ\epsilon as well, since we only allow edge-weight increases.
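The maintenance of G′ is a simple filter on the update stream; a sketch under the rule above (the edge representation and the threshold value are illustrative assumptions):

```python
EPSILON = 0.25  # illustrative value; the proof later takes epsilon <= R/(2n)

def filtered_weight(w: float) -> float:
    """Weight of an edge in G' given its weight w in G."""
    return 0.0 if w <= EPSILON else w

def forward_update(g_prime, edge, new_weight):
    """Pass an edge-weight increase along to G' only when visible there.
    If new_weight <= EPSILON, the old weight was also <= EPSILON (weights
    only increase), so the edge stays at weight 0 in G' and the update
    is dropped."""
    if new_weight > EPSILON:
        g_prime[edge] = new_weight

g_prime = {("u", "v"): filtered_weight(0.1), ("v", "w"): filtered_weight(3.0)}
assert g_prime == {("u", "v"): 0.0, ("v", "w"): 3.0}
forward_update(g_prime, ("u", "v"), 0.2)  # still below threshold: ignored
forward_update(g_prime, ("u", "v"), 1.0)  # now visible in G'
assert g_prime == {("u", "v"): 1.0, ("v", "w"): 3.0}
```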

We then use the dynamic decremental algorithm from Lemma 5.2 for the choice of parameters (β,R/2)(\beta,R/2) and a=Θ(1)a=\Theta(1) on the graph GG^{\prime}. As a consequence of the initialization, a (β,R/2)\left(\beta,R/2\right)-weak decomposition of the preprocessed GG^{\prime} is computed. Denote the clusters of this decomposition C1,,CkC_{1},\ldots,C_{k}. We claim that the clusters satisfy the following properties.

  • The weak diameter of any CiC_{i} in GG is at most RR.

  • For any u,vu,v, the probability that they are in different clusters is at most βdG(u,v)\beta d_{G}(u,v).

  • Any u,vu,v such that dG(u,v)ϵd_{G}(u,v)\leq\epsilon are in the same cluster.

We start with the first property. By definition of a (β,R/2)(\beta,R/2)-weak decomposition, if dG(u,v)>R/2d_{G^{\prime}}(u,v)>R/2 then uu and vv are in different clusters. We note however that dG(u,v)dG(u,v)+ϵnd_{G}(u,v)\leq d_{G^{\prime}}(u,v)+\epsilon n; this is because any path connecting uu and vv in GG^{\prime} has at most nn edges whose weights were replaced, each of original weight at most ϵ\epsilon. Choosing ϵR/(2n)\epsilon\leq R/(2n), it follows that dG(u,v)Rd_{G}(u,v)\leq R for all uu and vv in the same cluster. For the second property, we note that the probability that uu and vv are in different clusters is at most βdG(u,v)\beta d_{G^{\prime}}(u,v), and therefore the property follows from dG(u,v)dG(u,v)d_{G^{\prime}}(u,v)\leq d_{G}(u,v). Finally, the third property holds because any two such vertices have distance 0 in GG^{\prime} and are therefore always in the same cluster.

After initialization, the algorithm samples kk uniform and independent values from {0,1}\{0,1\}, one value for each cluster. Next, a cut of GG is created by grouping the vertices of the clusters that have been assigned the value 11; we denote this side of the cut by SS. By Lemma 4.6, the cut SS is a (β,R,ϵ)\left(\beta,R,\epsilon\right)-distance preserving cut. We also note that the cut SS can be generated with O(k)=O(n)O(k)=O(n) additive overhead to the time complexity of the dynamic decremental algorithm from Lemma 5.2.

Finally, we discuss the algorithm's actions upon a decremental update of an edge. As mentioned, we maintain GG^{\prime} by forwarding the update of an edge if necessary, depending on whether the new weight is ϵ\leq\epsilon or >ϵ>\epsilon. The update to GG^{\prime} is then in turn forwarded to the algorithm that dynamically maintains a (β,R/2)\left(\beta,R/2\right)-weak decomposition of GG^{\prime}. According to Lemma 5.2, the only changes to the partitioning occur when an existing cluster is split into two or more new clusters, whose members need to be explicitly listed. Let 𝒞={C1,,Cj}\mathcal{C}^{\prime}=\{C^{\prime}_{1},\ldots,C^{\prime}_{j}\} be the set of these newly formed clusters. The main algorithm temporarily deletes the vertices belonging to the clusters of 𝒞\mathcal{C}^{\prime} from the dynamic cut SS it maintains. Then, for each new cluster, it samples a value from {0,1}\{0,1\} independently and uniformly. Again, the vertices of clusters that sampled 11 are added to the side SS (joining the vertices whose clusters already had the value 11 before the decremental update), while those whose clusters sampled 0 are assigned to the other side of the cut. Since the decremental change is oblivious to the randomness of the algorithm, the random bits assigned to all clusters of the new partitioning obtained from the decremental update are distributed i.i.d. according to a Bernoulli distribution with parameter 12\frac{1}{2}, and by applying Lemma 4.6 we obtain that the updated cut is also a (β,R,ϵ)(\beta,R,\epsilon)-distance preserving cut. As for the time complexity, we observe that the number of changes made to the structure is linearly proportional to the number of vertices changing clusters after an update. It therefore follows that the total update time needed for maintaining the cut is dominated by the time needed to report the changes in the dynamic partitioning, which, according to Lemma 5.2, is at most O(m1+o(1)log2W+Qlogn)O\left(m^{1+o(1)}\log^{2}W+Q\log n\right). ∎
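The cut-maintenance step above can be sketched as follows (a simplified illustration of the coin-flipping scheme with illustrative names; the bookkeeping in the actual algorithm is more involved):

```python
import random

def assign_sides(clusters, side_of):
    """Flip an unbiased coin per cluster; all vertices of a cluster that
    draws 1 join side S of the cut, the rest join the other side."""
    for cluster in clusters:
        bit = random.randint(0, 1)
        for v in cluster:
            side_of[v] = bit

# Initialization: one independent bit per cluster.
random.seed(0)  # seeded only to make this illustration reproducible
side_of = {}
assign_sides([{"a", "b"}, {"c"}], side_of)
assert side_of["a"] == side_of["b"]  # vertices of one cluster stay together

# A split of {a, b}: re-sample fresh independent bits for the new clusters
# only; untouched clusters (here {c}) keep their old bits.
assign_sides([{"a"}, {"b"}], side_of)
assert set(side_of) == {"a", "b", "c"}
assert all(bit in (0, 1) for bit in side_of.values())
```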